* Why the Makefile is so eager to re-build & re-link @ 2021-06-24 13:16 Ævar Arnfjörð Bjarmason 2021-06-24 15:16 ` Jeff King ` (2 more replies) 0 siblings, 3 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-06-24 13:16 UTC (permalink / raw) To: Git List Cc: Junio C Hamano, Jeff King, Taylor Blau, Emily Shaffer, Eric Sunshine, Johannes Schindelin This is probably all stuff that's been on list-before / known by some/all people in the CC list, but in case not: I looked a bit into why we'e so frequently re-linking and re compiling things these days, slowing down e.g. "git rebase --exec='make ...'". These are all fixable issues, I haven't worked on them, just some notes in case anyone has better ideas: * version.c: The biggest offender, every time you move to a new commit we have a FORCE on GIT-VERSION-FILE, if it changes (which it will, if you're in a git checkout), we in turn re-compile version.o, link libgit to contain it, and then need to re-link everything that uses libgit. Some of this can be micro-optimized by moving that out of libgit, e.g. we re-link git-shell, which doesn't ever report the version, we do the same for all the perl scripts, just because obscure codepaths of them that could shell out to "git version" want to report it, and we embed it in the generated code. But I think the best approach here is to piggy-back on SKIP_DASHED_BUILT_INS, if you have that enabled then only "git" (and git-upload-pack etc.) need to be aware of the version, which they can then set in the environment for the others. This also applies to e.g. the git-http* stuff, which has a user agent derived from the version. * {command,config}-list.h (and in-flight, my hook-list.h): Every time you touch a Documentation/git-*.txt we need to re-generate these, and since their mtime changes we re-compile and re-link all the way up to libgit and our other tools. I think the best solution here is to make the generate-*.sh shellscripts faster (just one takes ~300ms of nested shellscripting, just to grep out the first few lines of every git-*.txt, in e.g. Perl or a smarter awk script this would be <5ms). Then we make those FORCE, but most of the time the config or command summary (or list of hooks) doesn't change, so we don't need to replace the file. Perhaps even better would be to piggy-back on the RUNTIME_PREFIX support, and simply drop in generated plain-text files, so in your build checkout the list of hooks, commands etc. would be parsed instead of compiled in. Then we wouldn't need to re-build or re-link anything for the version or this other data. We could/should still have some facility to compile those in for "install", see also [1] for some concerns / ideas for other similar things that could use that. 1. https://lore.kernel.org/git/87czvoowg2.fsf@evledraar.gmail.com/ ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: Why the Makefile is so eager to re-build & re-link 2021-06-24 13:16 Why the Makefile is so eager to re-build & re-link Ævar Arnfjörð Bjarmason @ 2021-06-24 15:16 ` Jeff King 2021-06-24 15:28 ` Ævar Arnfjörð Bjarmason ` (3 more replies) 2021-06-24 23:35 ` Øystein Walle 2021-07-02 11:58 ` [PATCH] Documentation/Makefile: don't re-build on 'git version' changes Ævar Arnfjörð Bjarmason 2 siblings, 4 replies; 87+ messages in thread From: Jeff King @ 2021-06-24 15:16 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Git List, Junio C Hamano, Taylor Blau, Emily Shaffer, Eric Sunshine, Johannes Schindelin On Thu, Jun 24, 2021 at 03:16:48PM +0200, Ævar Arnfjörð Bjarmason wrote: > This is probably all stuff that's been on list-before / known by > some/all people in the CC list, but in case not: I looked a bit into why > we'e so frequently re-linking and re compiling things these days, > slowing down e.g. "git rebase --exec='make ...'". > > These are all fixable issues, I haven't worked on them, just some notes > in case anyone has better ideas: From a quick skim I didn't see anything wrong in your analysis or suggestions. I do kind of wonder if we are hitting a point of diminishing returns here. "make -j16" on my system takes ~175ms for a noop, and ~650ms if I have to regenerate version.h (it's something like 2s total of CPU, but I have 8 cores). I know I've probably got a nicer machine than many other folks. But at some point correctness and avoiding complexity in the Makefile become a win over shaving off a second from compile times. You'd probably find lower hanging fruit in the test suite which could shave off tens of seconds. :) > * {command,config}-list.h (and in-flight, my hook-list.h): Every time > you touch a Documentation/git-*.txt we need to re-generate these, and > since their mtime changes we re-compile and re-link all the way up to > libgit and our other tools. > > I think the best solution here is to make the generate-*.sh > shellscripts faster (just one takes ~300ms of nested shellscripting, > just to grep out the first few lines of every git-*.txt, in e.g. Perl > or a smarter awk script this would be <5ms). Yeah, I think Eric mentioned he had looked into doing this in perl, but we weren't entirely happy with the dependency. Here's another really odd thing I noticed: $ time sh ./generate-cmdlist.sh command-list.txt >one real 0m1.323s user 0m1.531s sys 0m0.834s $ time sh -x ./generate-cmdlist.sh command-list.txt >two [a bunch of trace output] real 0m0.513s user 0m0.754s sys 0m0.168s $ cmp one two [no output] Er, what? Running with "-x" makes it almost 3 times faster to generate the same output? I'd have said this is an anomaly, but it's repeatable (and swapping the order produces the same output, so it's not some weird priming thing). And then to top it all off, redirecting the trace is slow again: $ time sh -x ./generate-cmdlist.sh command-list.txt >two 2>/dev/null real 0m1.363s user 0m1.538s sys 0m0.902s A little mini-mystery that I think I may leave unsolved for now. > Then we make those FORCE, but most of the time the config or command > summary (or list of hooks) doesn't change, so we don't need to > replace the file. Yes, possibly we could use the "if it hasn't changed, don't update the file" trick to avoid cascading updates. > Perhaps even better would be to piggy-back on the RUNTIME_PREFIX > support, and simply drop in generated plain-text files, so in your build > checkout the list of hooks, commands etc. would be parsed instead of > compiled in. Then we wouldn't need to re-build or re-link anything for > the version or this other data. Yeah, that would work. I worry a bit that the value of something like "version.h" is lost with a runtime file, though. The point is to bake it into the binary so you can't accidentally get the wrong value (say, from running "./git" from the build directory, which looks at the runtime file where the binary _would_ be installed, except you haven't run "make install" yet). For cmdlist stuff it could be a good match, though. -Peff ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: Why the Makefile is so eager to re-build & re-link 2021-06-24 15:16 ` Jeff King @ 2021-06-24 15:28 ` Ævar Arnfjörð Bjarmason 2021-06-24 21:30 ` Johannes Sixt ` (2 subsequent siblings) 3 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-06-24 15:28 UTC (permalink / raw) To: Jeff King Cc: Git List, Junio C Hamano, Taylor Blau, Emily Shaffer, Eric Sunshine, Johannes Schindelin On Thu, Jun 24 2021, Jeff King wrote: > On Thu, Jun 24, 2021 at 03:16:48PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> This is probably all stuff that's been on list-before / known by >> some/all people in the CC list, but in case not: I looked a bit into why >> we'e so frequently re-linking and re compiling things these days, >> slowing down e.g. "git rebase --exec='make ...'". >> >> These are all fixable issues, I haven't worked on them, just some notes >> in case anyone has better ideas: > > From a quick skim I didn't see anything wrong in your analysis or > suggestions. I do kind of wonder if we are hitting a point of > diminishing returns here. "make -j16" on my system takes ~175ms for a > noop, and ~650ms if I have to regenerate version.h (it's something like > 2s total of CPU, but I have 8 cores). > > I know I've probably got a nicer machine than many other folks. But at > some point correctness and avoiding complexity in the Makefile become a > win over shaving off a second from compile times. You'd probably find > lower hanging fruit in the test suite which could shave off tens of > seconds. :) It's mainly annoying when e.g. doing a rebase of an N patch series, those ~700ms v.s. ~200ms add up quickly. >> * {command,config}-list.h (and in-flight, my hook-list.h): Every time >> you touch a Documentation/git-*.txt we need to re-generate these, and >> since their mtime changes we re-compile and re-link all the way up to >> libgit and our other tools. >> >> I think the best solution here is to make the generate-*.sh >> shellscripts faster (just one takes ~300ms of nested shellscripting, >> just to grep out the first few lines of every git-*.txt, in e.g. Perl >> or a smarter awk script this would be <5ms). > > Yeah, I think Eric mentioned he had looked into doing this in perl, but > we weren't entirely happy with the dependency. Here's another really odd > thing I noticed: > > $ time sh ./generate-cmdlist.sh command-list.txt >one > real 0m1.323s > user 0m1.531s > sys 0m0.834s > > $ time sh -x ./generate-cmdlist.sh command-list.txt >two > [a bunch of trace output] > real 0m0.513s > user 0m0.754s > sys 0m0.168s > > $ cmp one two > [no output] > > Er, what? Running with "-x" makes it almost 3 times faster to generate > the same output? I'd have said this is an anomaly, but it's repeatable > (and swapping the order produces the same output, so it's not some weird > priming thing). And then to top it all off, redirecting the trace is > slow again: > > $ time sh -x ./generate-cmdlist.sh command-list.txt >two 2>/dev/null > real 0m1.363s > user 0m1.538s > sys 0m0.902s > > A little mini-mystery that I think I may leave unsolved for now. Sounds interesting if true, I haven't looked into it. >> Then we make those FORCE, but most of the time the config or command >> summary (or list of hooks) doesn't change, so we don't need to >> replace the file. > > Yes, possibly we could use the "if it hasn't changed, don't update the > file" trick to avoid cascading updates. The problem is also that you can only do it at the lowest level, or you'll get into a dead-end of something else depending on the FORCE target continually re-making it, even though the target itself decided there was nothing to do based on a cmp(1). >> Perhaps even better would be to piggy-back on the RUNTIME_PREFIX >> support, and simply drop in generated plain-text files, so in your build >> checkout the list of hooks, commands etc. would be parsed instead of >> compiled in. Then we wouldn't need to re-build or re-link anything for >> the version or this other data. > > Yeah, that would work. I worry a bit that the value of something like > "version.h" is lost with a runtime file, though. The point is to bake it > into the binary so you can't accidentally get the wrong value (say, from > running "./git" from the build directory, which looks at the runtime > file where the binary _would_ be installed, except you haven't run "make > install" yet). I think all of those concerns are covered under RUNTIME_PREFIX, it discovers files relative to git whether you have it installed or not. I still haven't looked into why I sometimes need --exec-path=$PWD in the build checkout, and sometimes not though... ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: Why the Makefile is so eager to re-build & re-link 2021-06-24 15:16 ` Jeff King 2021-06-24 15:28 ` Ævar Arnfjörð Bjarmason @ 2021-06-24 21:30 ` Johannes Sixt 2021-06-25 8:34 ` Ævar Arnfjörð Bjarmason 2021-06-25 21:17 ` Why the Makefile is so eager to re-build & re-link Felipe Contreras 2021-06-29 5:04 ` Eric Sunshine 3 siblings, 1 reply; 87+ messages in thread From: Johannes Sixt @ 2021-06-24 21:30 UTC (permalink / raw) To: Jeff King, Ævar Arnfjörð Bjarmason Cc: Git List, Junio C Hamano, Taylor Blau, Emily Shaffer, Eric Sunshine, Johannes Schindelin Am 24.06.21 um 17:16 schrieb Jeff King: > On Thu, Jun 24, 2021 at 03:16:48PM +0200, Ævar Arnfjörð Bjarmason wrote: >> * {command,config}-list.h (and in-flight, my hook-list.h): Every time >> you touch a Documentation/git-*.txt we need to re-generate these, and >> since their mtime changes we re-compile and re-link all the way up to >> libgit and our other tools. >> >> I think the best solution here is to make the generate-*.sh >> shellscripts faster (just one takes ~300ms of nested shellscripting, >> just to grep out the first few lines of every git-*.txt, in e.g. Perl >> or a smarter awk script this would be <5ms). > > Yeah, I think Eric mentioned he had looked into doing this in perl, but > we weren't entirely happy with the dependency. Here's another really odd > thing I noticed: > > $ time sh ./generate-cmdlist.sh command-list.txt >one > real 0m1.323s > user 0m1.531s > sys 0m0.834s > > $ time sh -x ./generate-cmdlist.sh command-list.txt >two > [a bunch of trace output] > real 0m0.513s > user 0m0.754s > sys 0m0.168s > > $ cmp one two > [no output] > > Er, what? Running with "-x" makes it almost 3 times faster to generate > the same output? I'd have said this is an anomaly, but it's repeatable > (and swapping the order produces the same output, so it's not some weird > priming thing). And then to top it all off, redirecting the trace is > slow again: > > $ time sh -x ./generate-cmdlist.sh command-list.txt >two 2>/dev/null > real 0m1.363s > user 0m1.538s > sys 0m0.902s > > A little mini-mystery that I think I may leave unsolved for now. Strange, really. Reminds me of the case that the `read` built-in must read input byte by byte if stdin is not connected to a (searchable) file. I have two patches that speed up generate-cmdlist.sh a bit: generate-cmdlist.sh: replace for loop by printf's auto-repeat feature (https://github.com/j6t/git/commit/b6d05f653bede727bc001f299b57969f62d3bc03) generate-cmdlist.sh: spawn fewer processes (https://github.com/j6t/git/commit/fd8721ee8fae06d7b584fa5166f32bf5521ca304) that I can submit (give me some time, though) or interested parties could pick up. -- Hannes ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: Why the Makefile is so eager to re-build & re-link 2021-06-24 21:30 ` Johannes Sixt @ 2021-06-25 8:34 ` Ævar Arnfjörð Bjarmason 2021-06-25 9:01 ` Ævar Arnfjörð Bjarmason 2021-06-29 2:13 ` Jeff King 0 siblings, 2 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-06-25 8:34 UTC (permalink / raw) To: Johannes Sixt Cc: Jeff King, Git List, Junio C Hamano, Taylor Blau, Emily Shaffer, Eric Sunshine, Johannes Schindelin On Thu, Jun 24 2021, Johannes Sixt wrote: > Am 24.06.21 um 17:16 schrieb Jeff King: >> On Thu, Jun 24, 2021 at 03:16:48PM +0200, Ævar Arnfjörð Bjarmason wrote: >>> * {command,config}-list.h (and in-flight, my hook-list.h): Every time >>> you touch a Documentation/git-*.txt we need to re-generate these, and >>> since their mtime changes we re-compile and re-link all the way up to >>> libgit and our other tools. >>> >>> I think the best solution here is to make the generate-*.sh >>> shellscripts faster (just one takes ~300ms of nested shellscripting, >>> just to grep out the first few lines of every git-*.txt, in e.g. Perl >>> or a smarter awk script this would be <5ms). >> >> Yeah, I think Eric mentioned he had looked into doing this in perl, but >> we weren't entirely happy with the dependency. Here's another really odd >> thing I noticed: >> >> $ time sh ./generate-cmdlist.sh command-list.txt >one >> real 0m1.323s >> user 0m1.531s >> sys 0m0.834s >> >> $ time sh -x ./generate-cmdlist.sh command-list.txt >two >> [a bunch of trace output] >> real 0m0.513s >> user 0m0.754s >> sys 0m0.168s >> >> $ cmp one two >> [no output] >> >> Er, what? Running with "-x" makes it almost 3 times faster to generate >> the same output? I'd have said this is an anomaly, but it's repeatable >> (and swapping the order produces the same output, so it's not some weird >> priming thing). And then to top it all off, redirecting the trace is >> slow again: >> >> $ time sh -x ./generate-cmdlist.sh command-list.txt >two 2>/dev/null >> real 0m1.363s >> user 0m1.538s >> sys 0m0.902s >> >> A little mini-mystery that I think I may leave unsolved for now. > > Strange, really. Reminds me of the case that the `read` built-in must > read input byte by byte if stdin is not connected to a (searchable) file. > > I have two patches that speed up generate-cmdlist.sh a bit: > > generate-cmdlist.sh: replace for loop by printf's auto-repeat feature > (https://github.com/j6t/git/commit/b6d05f653bede727bc001f299b57969f62d3bc03) > generate-cmdlist.sh: spawn fewer processes > (https://github.com/j6t/git/commit/fd8721ee8fae06d7b584fa5166f32bf5521ca304) > > that I can submit (give me some time, though) or interested parties > could pick up. Interesting, but I think rather than micro-optimizing the O(n) loop it makes more sense to turn it into a series of O(1) in -j parallel, i.e. actually use the make dependency graph for this as I suggested in: https://lore.kernel.org/git/87wnqiyejg.fsf@evledraar.gmail.com/ Something like the hacky throwaway patch that follows. Now when you touch a file in Documentation/git-*.txt you re-make just that file chain, which gets assembled into the command-list.h: $ touch Documentation/git-add.txt; time make -k -j $(nproc) git 2>&1 GIT_VERSION = 2.32.0.94.g87ef2a6b7ed.dirty GEN build/Documentation/git-add.txt.cmdlist.in CC version.o GEN build/Documentation/git-add.txt.cmdlist GEN command-list.h CC help.o AR libgit.a LINK git Those build/* files are, respectively, the relevant line corresponding to the *.txt file from command-list.txt: $ cat build/Documentation/git-add.txt.cmdlist.in git-add mainporcelain worktree And then the line we want in command-list.h at the end: $ cat build/Documentation/git-add.txt.cmdlist { "git-add", N_("Add file contents to the index"), 0 | CAT_mainporcelain | CAT_worktree }, The build process in then just a matter of keeping those files up-to-date, and having "make" create the command-list.h is just a matter of: cat build/Documentation/*.cmdlist Well, with the categories still being an O(n) affair, but the cost of generating those is trivial. I think that aside from optimizing for speed it just makes things clearer, if you modify e.g. git-add.txt you see a clear chain of output from make about how we generate output from that, and then make the command-list.h. Now we'd just show a mysterious command-list.h entry. Whenever you have "make" create one file from many it becomes hard to see at a glance what's going on. I'd be curious to see what how that performs e.g. on Windows, on Linux I get e.g. (warm cache): $ touch Documentation/git-a*.txt; time make -k -j $(nproc) command-list.h 2>&1 GEN build/Documentation/git-apply.txt.cmdlist.in GEN build/Documentation/git-annotate.txt.cmdlist.in GEN build/Documentation/git-am.txt.cmdlist.in GEN build/Documentation/git-add.txt.cmdlist.in GEN build/Documentation/git-archimport.txt.cmdlist.in GEN build/Documentation/git-archive.txt.cmdlist.in GEN build/Documentation/git-apply.txt.cmdlist GEN build/Documentation/git-annotate.txt.cmdlist GEN build/Documentation/git-am.txt.cmdlist GEN build/Documentation/git-add.txt.cmdlist GEN build/Documentation/git-archimport.txt.cmdlist GEN build/Documentation/git-archive.txt.cmdlist GEN command-list.h real 0m0.214s user 0m0.196s sys 0m0.071s Doing the same on master takes around 600ms for me at best: $ touch Documentation/git-a*.txt; time make -k -j $(nproc) command-list.h 2>&1 GEN command-list.h real 0m0.611s user 0m0.756s sys 0m0.112s It's even faster when I have to make all of them (my -j is = 8), or around 450ms after a "touch Documentation/git-*.txt". We have ~170 lines we process in command-list.txt, I'd think on Windows you'd get much better results, instead of optimizing those number of "sort | uniq" to the same number of "sort -u" the common case is just running 1-5 of those, as that's all the *.txt files you changed, then just "cat-ing" the full set. diff --git a/.gitignore b/.gitignore index 311841f9bed..9b365395496 100644 --- a/.gitignore +++ b/.gitignore @@ -13,6 +13,7 @@ /GIT-USER-AGENT /GIT-VERSION-FILE /bin-wrappers/ +/build/ /git /git-add /git-add--interactive diff --git a/Makefile b/Makefile index c3565fc0f8f..5e845bd0f69 100644 --- a/Makefile +++ b/Makefile @@ -2231,12 +2231,30 @@ config-list.h: Documentation/*config.txt Documentation/config/*.txt $(QUIET_GEN)$(SHELL_PATH) ./generate-configlist.sh \ >$@+ && mv $@+ $@ -command-list.h: generate-cmdlist.sh command-list.txt +build/Documentation: + $(QUIET_GEN)mkdir -p build/Documentation +.PRECIOUS: build/Documentation/git%.txt.cmdlist.in +build/Documentation/git%.txt.cmdlist.in: Documentation/git%.txt + $(QUIET_GEN)if ! grep -w $(patsubst build/Documentation/%.txt.cmdlist.in,%,$@) command-list.txt >$@; \ + then \ + # For e.g. git-init-db, which has a *.txt file, but no \ + # command-list.h entry \ + >$@; \ + fi +build/Documentation/git%.txt.cmdlist: build/Documentation/git%.txt.cmdlist.in + $(QUIET_GEN)./generate-cmdlist.sh --tail $< >$@ -command-list.h: $(wildcard Documentation/git*.txt) - $(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh \ +COMMAND_LIST_H_FILES = $(filter-out Documentation/git.txt,$(wildcard Documentation/git*.txt)) + +COMMAND_LIST_GEN = $(patsubst Documentation/%.txt,build/Documentation/%.txt.cmdlist,$(COMMAND_LIST_H_FILES)) +command-list.h: build/Documentation generate-cmdlist.sh command-list.txt $(COMMAND_LIST_GEN) + $(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh --header \ $(patsubst %,--exclude-program %,$(EXCLUDED_PROGRAMS)) \ - command-list.txt >$@+ && mv $@+ $@ + command-list.txt >$@+ && \ + echo "static struct cmdname_help command_list[] = {" >>$@+ && \ + cat build/Documentation/*.txt.cmdlist >>$@+ && \ + echo "};" >>$@+ && \ + mv $@+ $@ SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\ $(localedir_SQ):$(NO_CURL):$(USE_GETTEXT_SCHEME):$(SANE_TOOL_PATH_SQ):\ diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index 9dbbb08e70a..f80266d5138 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -1,5 +1,8 @@ #!/bin/sh +mode=$1 +shift + die () { echo "$@" >&2 exit 1 @@ -61,8 +64,6 @@ define_category_names () { } print_command_list () { - echo "static struct cmdname_help command_list[] = {" - command_list "$1" | while read cmd rest do @@ -73,6 +74,9 @@ print_command_list () { done echo " }," done +} + +end_print_command_list () { echo "};" } @@ -84,6 +88,12 @@ do shift done +if test "$mode" = "--tail" +then + print_command_list "$1" + exit 0 +fi + echo "/* Automatically generated by generate-cmdlist.sh */ struct cmdname_help { const char *name; @@ -94,5 +104,3 @@ struct cmdname_help { define_categories "$1" echo define_category_names "$1" -echo -print_command_list "$1" ^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: Why the Makefile is so eager to re-build & re-link 2021-06-25 8:34 ` Ævar Arnfjörð Bjarmason @ 2021-06-25 9:01 ` Ævar Arnfjörð Bjarmason 2021-06-29 2:13 ` Jeff King 1 sibling, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-06-25 9:01 UTC (permalink / raw) To: Johannes Sixt Cc: Jeff King, Git List, Junio C Hamano, Taylor Blau, Emily Shaffer, Eric Sunshine, Johannes Schindelin, Johannes Schindelin On Fri, Jun 25 2021, Ævar Arnfjörð Bjarmason wrote: Just one small thing I forgot to note: > On Thu, Jun 24 2021, Johannes Sixt wrote: >> I have two patches that speed up generate-cmdlist.sh a bit: > [...] > Interesting, but I think rather than micro-optimizing the O(n) loop it > makes more sense to turn it into a series of O(1) in -j parallel, > i.e. actually use the make dependency graph for this as I suggested in: > https://lore.kernel.org/git/87wnqiyejg.fsf@evledraar.gmail.com/ > [...] > diff --git a/.gitignore b/.gitignore > index 311841f9bed..9b365395496 100644 > --- a/.gitignore > +++ b/.gitignore > @@ -13,6 +13,7 @@ > /GIT-USER-AGENT > /GIT-VERSION-FILE > /bin-wrappers/ > +/build/ > /git > /git-add > /git-add--interactive > diff --git a/Makefile b/Makefile > index c3565fc0f8f..5e845bd0f69 100644 > --- a/Makefile > +++ b/Makefile > @@ -2231,12 +2231,30 @@ config-list.h: Documentation/*config.txt Documentation/config/*.txt > $(QUIET_GEN)$(SHELL_PATH) ./generate-configlist.sh \ > >$@+ && mv $@+ $@ > > -command-list.h: generate-cmdlist.sh command-list.txt > +build/Documentation: > + $(QUIET_GEN)mkdir -p build/Documentation > +.PRECIOUS: build/Documentation/git%.txt.cmdlist.in > +build/Documentation/git%.txt.cmdlist.in: Documentation/git%.txt > + $(QUIET_GEN)if ! grep -w $(patsubst build/Documentation/%.txt.cmdlist.in,%,$@) command-list.txt >$@; \ > + then \ > + # For e.g. git-init-db, which has a *.txt file, but no \ > + # command-list.h entry \ > + >$@; \ > + fi > +build/Documentation/git%.txt.cmdlist: build/Documentation/git%.txt.cmdlist.in > + $(QUIET_GEN)./generate-cmdlist.sh --tail $< >$@ > > -command-list.h: $(wildcard Documentation/git*.txt) > - $(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh \ > +COMMAND_LIST_H_FILES = $(filter-out Documentation/git.txt,$(wildcard Documentation/git*.txt)) > + > +COMMAND_LIST_GEN = $(patsubst Documentation/%.txt,build/Documentation/%.txt.cmdlist,$(COMMAND_LIST_H_FILES)) > +command-list.h: build/Documentation generate-cmdlist.sh command-list.txt $(COMMAND_LIST_GEN) > + $(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh --header \ > $(patsubst %,--exclude-program %,$(EXCLUDED_PROGRAMS)) \ I left this --exclude-program code (added in 724d63569fe (help -a: do not list commands that are excluded from the build, 2019-04-18)) in just because I couldn't be bothered to rewrite it for this demo. But of course that's also something that becomes much easier once we do this in a Makefile-native way, we'd just $(filter) that list out ourselves, so no need to have some shellscript deal with it. That's also true on "master" right now, 724d63569fe doesn't discuss why it went for the shellscript approach. I suspect it's just that the author thought it was easier than dealing with some combination of $(filter) and (a possibly nested) $(pathsubst). > - command-list.txt >$@+ && mv $@+ $@ > + command-list.txt >$@+ && \ > + echo "static struct cmdname_help command_list[] = {" >>$@+ && \ > + cat build/Documentation/*.txt.cmdlist >>$@+ && \ > + echo "};" >>$@+ && \ > + mv $@+ $@ ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: Why the Makefile is so eager to re-build & re-link 2021-06-25 8:34 ` Ævar Arnfjörð Bjarmason 2021-06-25 9:01 ` Ævar Arnfjörð Bjarmason @ 2021-06-29 2:13 ` Jeff King 2021-10-20 18:39 ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason 1 sibling, 1 reply; 87+ messages in thread From: Jeff King @ 2021-06-29 2:13 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Johannes Sixt, Git List, Junio C Hamano, Taylor Blau, Emily Shaffer, Eric Sunshine, Johannes Schindelin On Fri, Jun 25, 2021 at 10:34:20AM +0200, Ævar Arnfjörð Bjarmason wrote: > Interesting, but I think rather than micro-optimizing the O(n) loop it > makes more sense to turn it into a series of O(1) in -j parallel, > i.e. actually use the make dependency graph for this as I suggested in: > https://lore.kernel.org/git/87wnqiyejg.fsf@evledraar.gmail.com/ I have mixed feelings on that. I do like the general notion of breaking apart tasks and feeding the dependencies to "make", because that lets it do a better job of parallelizing or avoiding already-done work. But there's a cost to running any job, so eventually you get to a unit of work that's so small the overhead dominates. For instance, starting from a built Git but dirtying all doc files with "touch Documentation/*.txt", running "time make -j16" yields: real 0m1.749s user 0m2.963s sys 0m1.146s With your patch to break it apart into many jobs, the same operation gives: real 0m0.762s user 0m3.054s sys 0m0.600s So that took fewer wall-clock seconds, but we actually spent more CPU. On a system with fewer cores, it would probably be a loss in both places. Now maybe that's a good tradeoff, especially because the common case (aside from a build-from-scratch, which will spend loads more time actually compiling anyway) is that only a handful of files would be updated. But if we can just make the actual operation faster, then even O(n) repeated work might be a win in all cases, because it's avoiding the overhead of extra jobs. I dunno. I think there's a formula here that depends on the overhead of a job versus the time to handle a single file in the script, coupled with the expected number of changed files for any run. I'm not sure of the exact values of those numbers in this case, but am mostly pointing out that it's a tradeoff and not always a pure win. :) > Something like the hacky throwaway patch that follows. Now when you > touch a file in Documentation/git-*.txt you re-make just that file > chain, which gets assembled into the command-list.h: I know you said this was throwaway, but in case you do pursue it further, my first run hit: $ time make GIT_VERSION = 2.32.0.94.gaa5e6f14dd * new prefix flags GEN build/Documentation GEN build/Documentation/git-add.txt.cmdlist.in /bin/sh: 1: cannot create build/Documentation/git-add.txt.cmdlist.in: Directory nonexistent /bin/sh: 5: cannot create build/Documentation/git-add.txt.cmdlist.in: Directory nonexistent GEN build/Documentation/git-am.txt.cmdlist.in /bin/sh: 1: cannot create build/Documentation/git-am.txt.cmdlist.in: Directory nonexistent /bin/sh: 5: cannot create build/Documentation/git-am.txt.cmdlist.in: Directory nonexistent So I'd guess there's some race with creating the build/Documentation directory (a subsequent run worked fine). -Peff ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN 2021-06-29 2:13 ` Jeff King @ 2021-10-20 18:39 ` Ævar Arnfjörð Bjarmason 2021-10-20 18:39 ` [PATCH 1/8] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason ` (9 more replies) 0 siblings, 10 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Ævar Arnfjörð Bjarmason This series is based off an off-hand comment I made about making the cmdlist.sh faster, in the meantime much of the same methods are already cooking in "next" for the "lint-docs" target. See 7/8 for the main performance numbers, along the way I stole some patches from Johannes Sixt who'd worked on optimizing the script before, which compliment this new method of generating this file by piggy-backing more on GNU make for managing a dependency graph for us. 1. https://lore.kernel.org/git/87r1gqxqxn.fsf@evledraar.gmail.com/ Johannes Sixt (2): generate-cmdlist.sh: spawn fewer processes generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason (6): command-list.txt: sort with "LC_ALL=C sort" generate-cmdlist.sh: trivial whitespace change generate-cmdlist.sh: don't call get_categories() from category_list() generate-cmdlist.sh: run "grep | sort", not "sort | grep" Makefile: stop having command-list.h depend on a wildcard Makefile: assert correct generate-cmdlist.sh output .gitignore | 1 + Makefile | 57 ++++++++++++++++++++++++++++++++++++++++----- command-list.txt | 20 ++++++++-------- generate-cmdlist.sh | 53 ++++++++++++++++++++++++++++------------- help.c | 3 --- 5 files changed, 99 insertions(+), 35 deletions(-) -- 2.33.1.1338.g20da966911a ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH 1/8] command-list.txt: sort with "LC_ALL=C sort" 2021-10-20 18:39 ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 ` Ævar Arnfjörð Bjarmason 2021-10-20 18:39 ` [PATCH 2/8] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason ` (8 subsequent siblings) 9 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Ævar Arnfjörð Bjarmason We should keep these files sorted in the C locale, e.g. in the C locale the order is: git-check-mailmap git-check-ref-format git-checkout But under en_US.UTF-8 it's: git-check-mailmap git-checkout git-check-ref-format In a subsequent commit I'll change generate-cmdlist.sh to use C sort order, and without this change we'd be led to believe that that change caused a meaningful change in the output, so let's do this as a separate step, right now the generate-cmdlist.sh script just uses the order found in this file. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- command-list.txt | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/command-list.txt b/command-list.txt index a289f09ed6f..02fc7ddde68 100644 --- a/command-list.txt +++ b/command-list.txt @@ -60,9 +60,9 @@ git-cat-file plumbinginterrogators git-check-attr purehelpers git-check-ignore purehelpers git-check-mailmap purehelpers +git-check-ref-format purehelpers git-checkout mainporcelain git-checkout-index plumbingmanipulators -git-check-ref-format purehelpers git-cherry plumbinginterrogators complete git-cherry-pick mainporcelain git-citool mainporcelain @@ -111,7 +111,6 @@ git-index-pack plumbingmanipulators git-init mainporcelain init git-instaweb ancillaryinterrogators complete git-interpret-trailers purehelpers -gitk mainporcelain git-log mainporcelain info git-ls-files plumbinginterrogators git-ls-remote plumbinginterrogators @@ -124,11 +123,11 @@ git-merge-base plumbinginterrogators git-merge-file plumbingmanipulators git-merge-index plumbingmanipulators git-merge-one-file purehelpers -git-mergetool ancillarymanipulators complete git-merge-tree ancillaryinterrogators -git-multi-pack-index plumbingmanipulators +git-mergetool ancillarymanipulators complete git-mktag plumbingmanipulators git-mktree plumbingmanipulators +git-multi-pack-index plumbingmanipulators git-mv mainporcelain worktree git-name-rev plumbinginterrogators git-notes mainporcelain @@ -154,23 +153,23 @@ git-request-pull foreignscminterface complete git-rerere ancillaryinterrogators git-reset mainporcelain history git-restore mainporcelain worktree -git-revert mainporcelain git-rev-list plumbinginterrogators git-rev-parse plumbinginterrogators +git-revert mainporcelain git-rm mainporcelain worktree git-send-email foreignscminterface complete git-send-pack synchingrepositories +git-sh-i18n purehelpers +git-sh-setup purehelpers git-shell synchelpers git-shortlog mainporcelain git-show mainporcelain info git-show-branch ancillaryinterrogators complete git-show-index plumbinginterrogators git-show-ref plumbinginterrogators -git-sh-i18n purehelpers -git-sh-setup purehelpers git-sparse-checkout mainporcelain worktree -git-stash mainporcelain git-stage complete +git-stash mainporcelain git-status mainporcelain info git-stripspace purehelpers git-submodule mainporcelain @@ -189,10 +188,11 @@ git-var plumbinginterrogators git-verify-commit ancillaryinterrogators git-verify-pack plumbinginterrogators git-verify-tag ancillaryinterrogators -gitweb ancillaryinterrogators git-whatchanged ancillaryinterrogators complete git-worktree mainporcelain git-write-tree plumbingmanipulators +gitk mainporcelain +gitweb ancillaryinterrogators gitattributes guide gitcli guide gitcore-tutorial guide @@ -211,6 +211,6 @@ gitremote-helpers guide gitrepository-layout guide gitrevisions guide gitsubmodules guide -gittutorial-2 guide gittutorial guide +gittutorial-2 guide gitworkflows guide -- 2.33.1.1338.g20da966911a ^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH 2/8] generate-cmdlist.sh: trivial whitespace change 2021-10-20 18:39 ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason 2021-10-20 18:39 ` [PATCH 1/8] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 ` Ævar Arnfjörð Bjarmason 2021-10-20 18:39 ` [PATCH 3/8] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason ` (7 subsequent siblings) 9 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Ævar Arnfjörð Bjarmason This makes a subsequent diff smaller, and won't leave us with this syntax nit at the end. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- generate-cmdlist.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index 9dbbb08e70a..5114f46680a 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -10,7 +10,7 @@ command_list () { } get_categories () { - tr ' ' '\012'| + tr ' ' '\012' | grep -v '^$' | sort | uniq -- 2.33.1.1338.g20da966911a ^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH 3/8] generate-cmdlist.sh: spawn fewer processes 2021-10-20 18:39 ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason 2021-10-20 18:39 ` [PATCH 1/8] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason 2021-10-20 18:39 ` [PATCH 2/8] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 ` Ævar Arnfjörð Bjarmason 2021-10-20 18:39 ` [PATCH 4/8] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason ` (6 subsequent siblings) 9 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Ævar Arnfjörð Bjarmason From: Johannes Sixt <j6t@kdbg.org> The function get_categories() is invoked in a loop over all commands. As it runs several processes, this takes an awful lot of time on Windows. To reduce the number of processes, move the process that filters empty lines to the other invoker of the function, where it is needed. The invocation of get_categories() in the loop does not need the empty line filtered away because the result is word-split by the shell, which eliminates the empty line automatically. Furthermore, use sort -u instead of sort | uniq to remove yet another process. [Ævar: on Linux this seems to speed things up a bit, although with hyperfine(1) the results are fuzzy enough to land within the confidence interval]: $ git show HEAD~:generate-cmdlist.sh >generate-cmdlist.sh.old $ hyperfine --warmup 1 -L s ,.old -p 'make clean' 'sh generate-cmdlist.sh{s} command-list.txt' Benchmark #1: sh generate-cmdlist.sh command-list.txt Time (mean ± σ): 371.3 ms ± 64.2 ms [User: 430.4 ms, System: 72.5 ms] Range (min … max): 320.5 ms … 517.7 ms 10 runs Benchmark #2: sh generate-cmdlist.sh.old command-list.txt Time (mean ± σ): 489.9 ms ± 185.4 ms [User: 724.7 ms, System: 141.3 ms] Range (min … max): 346.0 ms … 885.3 ms 10 runs Summary 'sh generate-cmdlist.sh command-list.txt' ran 1.32 ± 0.55 times faster than 'sh generate-cmdlist.sh.old command-list.txt' Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- generate-cmdlist.sh | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index 5114f46680a..27367915611 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -11,15 +11,14 @@ command_list () { get_categories () { tr ' ' '\012' | - grep -v '^$' | - sort | - uniq + LC_ALL=C sort -u } category_list () { command_list "$1" | cut -c 40- | - get_categories + get_categories | + grep -v '^$' } get_synopsis () { -- 2.33.1.1338.g20da966911a ^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH 4/8] generate-cmdlist.sh: don't call get_categories() from category_list() 2021-10-20 18:39 ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason ` (2 preceding siblings ...) 2021-10-20 18:39 ` [PATCH 3/8] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 ` Ævar Arnfjörð Bjarmason 2021-10-20 18:39 ` [PATCH 5/8] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason ` (5 subsequent siblings) 9 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Ævar Arnfjörð Bjarmason This isn't for optimization as the get_categories() is a purely shell function, but rather for ease of readability, let's just inline these two lines. We'll be changing this code some more in subsequent commits to make this worth it. Rename the get_categories() function to get_category_line(), since that's what it's doing now. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- generate-cmdlist.sh | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index 27367915611..16043e38476 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -9,7 +9,7 @@ command_list () { eval "grep -ve '^#' $exclude_programs" <"$1" } -get_categories () { +get_category_line () { tr ' ' '\012' | LC_ALL=C sort -u } @@ -17,7 +17,8 @@ get_categories () { category_list () { command_list "$1" | cut -c 40- | - get_categories | + tr ' ' '\012' | + LC_ALL=C sort -u | grep -v '^$' } @@ -66,7 +67,7 @@ print_command_list () { while read cmd rest do printf " { \"$cmd\", $(get_synopsis $cmd), 0" - for cat in $(echo "$rest" | get_categories) + for cat in $(echo "$rest" | get_category_line) do printf " | CAT_$cat" done -- 2.33.1.1338.g20da966911a ^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH 5/8] generate-cmdlist.sh: run "grep | sort", not "sort | grep" 2021-10-20 18:39 ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason ` (3 preceding siblings ...) 2021-10-20 18:39 ` [PATCH 4/8] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 ` Ævar Arnfjörð Bjarmason 2021-10-20 18:39 ` [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason ` (4 subsequent siblings) 9 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Ævar Arnfjörð Bjarmason This doesn't matter for performance, but let's not include the empty lines in our sorting. This makes the intent of the code clearer. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- generate-cmdlist.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index 16043e38476..e517c33710a 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -18,8 +18,8 @@ category_list () { command_list "$1" | cut -c 40- | tr ' ' '\012' | - LC_ALL=C sort -u | - grep -v '^$' + grep -v '^$' | + LC_ALL=C sort -u } get_synopsis () { -- 2.33.1.1338.g20da966911a ^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature 2021-10-20 18:39 ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason ` (4 preceding siblings ...) 2021-10-20 18:39 ` [PATCH 5/8] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 ` Ævar Arnfjörð Bjarmason 2021-10-21 14:42 ` Jeff King 2021-10-20 18:39 ` [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard Ævar Arnfjörð Bjarmason ` (3 subsequent siblings) 9 siblings, 1 reply; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Ævar Arnfjörð Bjarmason From: Johannes Sixt <j6t@kdbg.org> This is just a small code reduction. There is a small probability that the new code breaks when the category list is empty. But that would be noticed during the compile step. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- generate-cmdlist.sh | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index e517c33710a..a1ab2b1f077 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -67,10 +67,7 @@ print_command_list () { while read cmd rest do printf " { \"$cmd\", $(get_synopsis $cmd), 0" - for cat in $(echo "$rest" | get_category_line) - do - printf " | CAT_$cat" - done + printf " | CAT_%s" $(echo "$rest" | get_category_line) echo " }," done echo "};" -- 2.33.1.1338.g20da966911a ^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature 2021-10-20 18:39 ` [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason @ 2021-10-21 14:42 ` Jeff King 2021-10-21 16:25 ` Jeff King 0 siblings, 1 reply; 87+ messages in thread From: Jeff King @ 2021-10-21 14:42 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Junio C Hamano, Johannes Sixt, Øystein Walle On Wed, Oct 20, 2021 at 08:39:57PM +0200, Ævar Arnfjörð Bjarmason wrote: > From: Johannes Sixt <j6t@kdbg.org> > > This is just a small code reduction. There is a small probability that > the new code breaks when the category list is empty. But that would be > noticed during the compile step. > > Signed-off-by: Johannes Sixt <j6t@kdbg.org> > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > --- > generate-cmdlist.sh | 5 +---- > 1 file changed, 1 insertion(+), 4 deletions(-) > > diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh > index e517c33710a..a1ab2b1f077 100755 > --- a/generate-cmdlist.sh > +++ b/generate-cmdlist.sh > @@ -67,10 +67,7 @@ print_command_list () { > while read cmd rest > do > printf " { \"$cmd\", $(get_synopsis $cmd), 0" > - for cat in $(echo "$rest" | get_category_line) > - do > - printf " | CAT_$cat" > - done > + printf " | CAT_%s" $(echo "$rest" | get_category_line) > echo " }," I think this is fine, but regardless of what happens in patch 7, it's probably worth dropping this get_category_line call. All it does is sort and de-dup the tokens in $rest, but we don't care because we're just OR-ing them together. And of the 3 processes spawned by each loop, it is responsible for 2 of them. Even if this loop is broken out into individual bits of Makefile snippet, avoiding the extra processes is worth doing. The patch below gives me: $ git show HEAD:generate-cmdlist.sh >generate-cmdlist.sh.old $ hyperfine --warmup 1 -L s ,.old -p 'make clean' 'sh generate-cmdlist.sh{s} command-list.txt' Benchmark #1: sh generate-cmdlist.sh command-list.txt Time (mean ± σ): 591.3 ms ± 31.5 ms [User: 392.9 ms, System: 243.7 ms] Range (min … max): 543.7 ms … 630.6 ms 10 runs Benchmark #2: sh generate-cmdlist.sh.old command-list.txt Time (mean ± σ): 1.236 s ± 0.060 s [User: 1.100 s, System: 0.556 s] Range (min … max): 1.089 s … 1.275 s 10 runs Summary 'sh generate-cmdlist.sh command-list.txt' ran 2.09 ± 0.15 times faster than 'sh generate-cmdlist.sh.old command-list.txt' --- diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index a1ab2b1f07..fab9e6a671 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -67,7 +67,7 @@ print_command_list () { while read cmd rest do printf " { \"$cmd\", $(get_synopsis $cmd), 0" - printf " | CAT_%s" $(echo "$rest" | get_category_line) + printf " | CAT_%s" $rest echo " }," done echo "};" I think you could also delete get_category_line, as it was inlined in the other caller. -Peff ^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature 2021-10-21 14:42 ` Jeff King @ 2021-10-21 16:25 ` Jeff King 0 siblings, 0 replies; 87+ messages in thread From: Jeff King @ 2021-10-21 16:25 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Junio C Hamano, Johannes Sixt, Øystein Walle On Thu, Oct 21, 2021 at 10:42:52AM -0400, Jeff King wrote: > diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh > index a1ab2b1f07..fab9e6a671 100755 > --- a/generate-cmdlist.sh > +++ b/generate-cmdlist.sh > @@ -67,7 +67,7 @@ print_command_list () { > while read cmd rest > do > printf " { \"$cmd\", $(get_synopsis $cmd), 0" > - printf " | CAT_%s" $(echo "$rest" | get_category_line) > + printf " | CAT_%s" $rest > echo " }," > done > echo "};" > > I think you could also delete get_category_line, as it was inlined in > the other caller. Just for fun, I did a pure-shell loop to drop get_synopsis, which means we don't exec any processes inside the loop. That patch is below, which yields the timings (orig is up to your patch 6, no-sort is the patch above, and pure-shell is the patch below on top): $ hyperfine --warmup 1 -L v orig,no-sort,pure-shell -p 'make clean' 'sh generate-cmdlist.sh.{v} command-list.txt' Benchmark #1: sh generate-cmdlist.sh.orig command-list.txt Time (mean ± σ): 1.286 s ± 0.148 s [User: 1.503 s, System: 0.781 s] Range (min … max): 0.938 s … 1.451 s 10 runs Benchmark #2: sh generate-cmdlist.sh.no-sort command-list.txt Time (mean ± σ): 553.6 ms ± 143.3 ms [User: 396.7 ms, System: 198.3 ms] Range (min … max): 192.6 ms … 683.5 ms 10 runs Benchmark #3: sh generate-cmdlist.sh.pure-shell command-list.txt Time (mean ± σ): 29.7 ms ± 15.6 ms [User: 22.6 ms, System: 19.4 ms] Range (min … max): 12.0 ms … 49.1 ms 10 runs Summary 'sh generate-cmdlist.sh.pure-shell command-list.txt' ran 18.65 ± 10.93 times faster than 'sh generate-cmdlist.sh.no-sort command-list.txt' 43.33 ± 23.32 times faster than 'sh generate-cmdlist.sh.orig command-list.txt' So that's building all of the commands faster than I could get even "touch Documentation/git-add.txt && make command-list.h" to run with your patch (not entirely fair; I'm not invoking make here, which probably does add 100ms of overhead, but I think it's still a net win). The patch below doesn't enforce the /NAME/ section as the sed does. IMHO that's not of much value because it uses the line with the command-name as the lower bound. But it could be done pretty easily with an extra $seen_name variable. diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index fab9e6a671..eae4bbb4c7 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -22,16 +22,6 @@ category_list () { LC_ALL=C sort -u } -get_synopsis () { - sed -n ' - /^NAME/,/'"$1"'/H - ${ - x - s/.*'"$1"' - \(.*\)/N_("\1")/ - p - }' "Documentation/$1.txt" -} - define_categories () { echo echo "/* Command categories */" @@ -66,7 +56,18 @@ print_command_list () { command_list "$1" | while read cmd rest do - printf " { \"$cmd\", $(get_synopsis $cmd), 0" + synopsis= + while read line + do + case "$line" in + "$cmd - "*) + synopsis=${line#$cmd - } + break + ;; + esac + done <"Documentation/$cmd.txt" + + printf '\t{ "%s", N_("%s"), 0' "$cmd" "$synopsis" printf " | CAT_%s" $rest echo " }," done ^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard 2021-10-20 18:39 ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason ` (5 preceding siblings ...) 2021-10-20 18:39 ` [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 ` Ævar Arnfjörð Bjarmason 2021-10-21 14:45 ` Jeff King 2021-10-21 22:46 ` Øystein Walle 2021-10-20 18:39 ` [PATCH 8/8] Makefile: assert correct generate-cmdlist.sh output Ævar Arnfjörð Bjarmason ` (2 subsequent siblings) 9 siblings, 2 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Ævar Arnfjörð Bjarmason Change the dependency graph for "command-list.h" so that it's not re-built in its entirety every time one file in Documentation/git*.txt changes. This makes the generation of the command-list.h use a pattern similar to that established in 8650c6298c1 (doc lint: make "lint-docs" non-.PHONY, 2021-10-15) for the "lint-docs" target. This change replaces the monolithic "generate-cmdlist.sh" invocation with a dependency chain like: Documentation/git-add.txt -> .build/command-list.h.d/git-add.gen (and "git-add.txt" intermediary) -> command-list.h I.e. we'll now generate intermediary files when extracting the "NAME" section the documentation. Here the 6th line of "Documentation/git.txt" will be extracted to ".build/command-list.h.d/git-add.gen.txt": git-add - Add file contents to the index We'll then create an intermediate ".build/command-list.h.d/git-add.gen" (leading "\t" stripped): ./generate-cmdlist.sh --entry-only .build/command-list.h.d/git-add.gen.txt { "git-add", N_("Add file contents to the index"), 0 | CAT_mainporcelain | CAT_worktree }, The "command-list.h" itself can the be made by having "generate-cmdlist.sh" emit only the header section, followed by: LC_ALL=sort .build/command-list.h.d/*.gen Unfortunately we can't drop the old code completely due to the CMake integration, see 061c2240b1b (Introduce CMake support for configuring Git, 2020-06-12). It will keep using the older and slower script. With this the initial creation of the command-list.h is a bit slower with -j1, but around 2x as fast -j8[1]. The real benefit comes from the more common case of an incremental build, say when only "Documentation/git-add.txt" was updated after a "pull". There we're 4-5x as fast with this new method[2]. The benefit of optimizing this is because this file is very frequently re-generated, e.g. for "git rebase -i --exec 'make git'" with "git-add.txt" modified we're around 1.7 times as fast[3]. That target will need to re-make "git" (via "help.o") due to help.c's use of command-list.h. In terms of implementation: I'm adding $(QUIET) while I'm at it, which is here so we don't quiet the equivalent of trace output under V=1, this could be used in other places that use "@cmd" to quiet "cmd" output. Some of the dependencies between "command-list.h" and "$(COMMAND_LIST_GEN)" are redundant, considering that the former depends on the latter. I'm sticking to listing dependencies mentioned in the rule itself, e.g. "command-list.h" itself calls "generate-cmdlist.sh", so I list the dependency even though it would get it via a recursive dependency. These rules can be lazy about leaving behind files on error thanks to the .DELETE_ON_ERROR flag, see 7b76d6bf221 (Makefile: add and use the ".DELETE_ON_ERROR" flag, 2021-06-29), or in the case of the "*.txt.gen" files because we'll unconditionally clobber them anyway if the relevant source file is touched, so we can leave them for "make clean". 1. Piped through: <cmd> 2>/dev/null | grep -P -v "^\s*$"; ditto below: hyperfine -s basic -L j ", -j8" -L s ,.old -p 'rm -rf .build command-list.h' 'make{j} -f Makefile{s} command-list.h' Benchmark #1: make -f Makefile command-list.h Time (mean ± σ): 769.3 ms ± 90.4 ms [User: 892.4 ms, System: 98.8 ms] Range (min … max): 665.6 ms … 941.9 ms 10 runs Benchmark #2: make -j8 -f Makefile command-list.h Time (mean ± σ): 212.5 ms ± 45.0 ms [User: 954.4 ms, System: 136.9 ms] Range (min … max): 187.6 ms … 326.2 ms 11 runs Benchmark #3: make -f Makefile.old command-list.h Time (mean ± σ): 515.0 ms ± 70.5 ms [User: 613.8 ms, System: 110.9 ms] Range (min … max): 407.6 ms … 603.9 ms 10 runs Benchmark #4: make -j8 -f Makefile.old command-list.h Time (mean ± σ): 474.9 ms ± 62.7 ms [User: 569.7 ms, System: 106.0 ms] Range (min … max): 407.0 ms … 556.8 ms 10 runs Summary 'make -j8 -f Makefile command-list.h' ran 2.24 ± 0.56 times faster than 'make -j8 -f Makefile.old command-list.h' 2.42 ± 0.61 times faster than 'make -f Makefile.old command-list.h' 3.62 ± 0.88 times faster than 'make -f Makefile command-list.h' 2. hyperfine -s basic -L j ", -j8" -L s ,.old -p 'touch Documentation/git-add.txt' 'make{j} -f Makefile{s} command-list.h' Benchmark #1: make -f Makefile command-list.h Time (mean ± σ): 94.6 ms ± 26.7 ms [User: 83.0 ms, System: 14.1 ms] Range (min … max): 79.4 ms … 186.0 ms 15 runs Benchmark #2: make -j8 -f Makefile command-list.h Time (mean ± σ): 80.3 ms ± 10.5 ms [User: 81.3 ms, System: 12.9 ms] Range (min … max): 76.9 ms … 127.6 ms 37 runs Benchmark #3: make -f Makefile.old command-list.h Time (mean ± σ): 348.9 ms ± 174.5 ms [User: 415.3 ms, System: 70.6 ms] Range (min … max): 66.0 ms … 550.3 ms 43 runs Benchmark #4: make -j8 -f Makefile.old command-list.h Time (mean ± σ): 406.7 ms ± 157.1 ms [User: 473.4 ms, System: 78.3 ms] Range (min … max): 67.4 ms … 560.4 ms 34 runs Summary 'make -j8 -f Makefile command-list.h' ran 1.18 ± 0.37 times faster than 'make -f Makefile command-list.h' 4.34 ± 2.25 times faster than 'make -f Makefile.old command-list.h' 5.07 ± 2.07 times faster than 'make -j8 -f Makefile.old command-list.h' 3. hyperfine -s basic -L j " -j8" -L s ,.old -p 'touch Documentation/git-add.txt' 'make{j} -f Makefile{s} git' Benchmark #1: make -j8 -f Makefile git Time (mean ± σ): 484.2 ms ± 53.5 ms [User: 253.2 ms, System: 83.5 ms] Range (min … max): 400.7 ms … 580.5 ms 10 runs Benchmark #2: make -j8 -f Makefile.old git Time (mean ± σ): 836.7 ms ± 37.3 ms [User: 729.2 ms, System: 157.1 ms] Range (min … max): 774.2 ms … 885.6 ms 10 runs Summary 'make -j8 -f Makefile git' ran 1.73 ± 0.21 times faster than 'make -j8 -f Makefile.old git' Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- .gitignore | 1 + Makefile | 52 +++++++++++++++++++++++++++++++++++++++------ generate-cmdlist.sh | 36 +++++++++++++++++++++++++------ 3 files changed, 77 insertions(+), 12 deletions(-) diff --git a/.gitignore b/.gitignore index 054249b20a8..dc03a988b16 100644 --- a/.gitignore +++ b/.gitignore @@ -198,6 +198,7 @@ *.[aos] *.o.json *.py[co] +.build/ .depend/ *.gcda *.gcno diff --git a/Makefile b/Makefile index 381bed2c1d2..ce4cce57eb8 100644 --- a/Makefile +++ b/Makefile @@ -1954,6 +1954,7 @@ endif ifneq ($(findstring s,$(MAKEFLAGS)),s) ifndef V + QUIET = @ QUIET_CC = @echo ' ' CC $@; QUIET_AR = @echo ' ' AR $@; QUIET_LINK = @echo ' ' LINK $@; @@ -2242,12 +2243,50 @@ config-list.h: generate-configlist.sh config-list.h: Documentation/*config.txt Documentation/config/*.txt $(QUIET_GEN)$(SHELL_PATH) ./generate-configlist.sh >$@ -command-list.h: generate-cmdlist.sh command-list.txt - -command-list.h: $(wildcard Documentation/git*.txt) - $(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh \ - $(patsubst %,--exclude-program %,$(EXCLUDED_PROGRAMS)) \ - command-list.txt >$@ +EXCLUDED_COMMAND_LIST = +EXCLUDED_COMMAND_LIST += git +EXCLUDED_COMMAND_LIST += git-bisect-lk2009 +EXCLUDED_COMMAND_LIST += git-credential-cache--daemon +EXCLUDED_COMMAND_LIST += git-fsck-objects +EXCLUDED_COMMAND_LIST += git-init-db +EXCLUDED_COMMAND_LIST += git-mergetool--lib +EXCLUDED_COMMAND_LIST += git-remote-ext +EXCLUDED_COMMAND_LIST += git-remote-fd +EXCLUDED_COMMAND_LIST += git-sh-i18n--envsubst +EXCLUDED_COMMAND_LIST += git-submodules +EXCLUDED_COMMAND_LIST += git-tools +EXCLUDED_COMMAND_LIST += git-version +EXCLUDED_COMMAND_LIST += git-web--browse +EXCLUDED_COMMAND_LIST += gitweb.conf + +EXCLUDED_TXT += $(patsubst %,Documentation/%.txt,$(EXCLUDED_PROGRAMS) $(EXCLUDED_COMMAND_LIST)) +COMMAND_LIST_TXT_DEP = $(filter-out $(EXCLUDED_TXT), $(wildcard Documentation/git*.txt)) + +COMMAND_LIST_GEN = $(patsubst Documentation/%.txt,.build/command-list.h.d/%.gen,$(COMMAND_LIST_TXT_DEP)) + +.build: + $(QUIET)mkdir .build +.build/command-list.h.d: | .build + $(QUIET)mkdir -p .build/command-list.h.d + +# We must depend on .build/command-list.h as an "order-only" +# prerequisite, its mtime will change when these targets run. +$(COMMAND_LIST_GEN): | .build/command-list.h.d +$(COMMAND_LIST_GEN): command-list.txt +$(COMMAND_LIST_GEN): generate-cmdlist.sh +$(COMMAND_LIST_GEN): .build/command-list.h.d/%.gen: Documentation/%.txt + $(QUIET)grep "^$(patsubst .build/command-list.h.d/%.gen,%,$@) " command-list.txt >$@.txt && \ + ./generate-cmdlist.sh --entry-only $@.txt >$@ + +command-list.h: $(COMMAND_LIST_GEN) +command-list.h: generate-cmdlist.sh +command-list.h: command-list.txt + $(QUIET_GEN){ \ + $(SHELL_PATH) ./generate-cmdlist.sh --header-only command-list.txt && \ + echo "static struct cmdname_help command_list[] = {" && \ + LC_ALL=C sort $(COMMAND_LIST_GEN) && \ + echo "};"; \ + } >$@ hook-list.h: generate-hooklist.sh Documentation/githooks.txt $(QUIET_GEN)$(SHELL_PATH) ./generate-hooklist.sh >$@ @@ -3238,6 +3277,7 @@ clean: profile-clean coverage-clean cocciclean $(RM) $(SP_OBJ) $(RM) $(HCC) $(RM) -r bin-wrappers $(dep_dirs) $(compdb_dir) compile_commands.json + $(RM) -r .build/ $(RM) -r po/build/ $(RM) *.pyc *.pyo */*.pyc */*.pyo $(GENERATED_H) $(ETAGS_TARGET) tags cscope* $(RM) -r .dist-tmp-dir .doc-tmp-dir diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index a1ab2b1f077..2bc528e8cae 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -15,7 +15,7 @@ get_category_line () { } category_list () { - command_list "$1" | + grep -v '^#' "$1" | cut -c 40- | tr ' ' '\012' | grep -v '^$' | @@ -23,13 +23,14 @@ category_list () { } get_synopsis () { + head -n 10 "Documentation/$1.txt" | sed -n ' /^NAME/,/'"$1"'/H ${ x s/.*'"$1"' - \(.*\)/N_("\1")/ p - }' "Documentation/$1.txt" + }' } define_categories () { @@ -61,16 +62,12 @@ define_category_names () { } print_command_list () { - echo "static struct cmdname_help command_list[] = {" - - command_list "$1" | while read cmd rest do printf " { \"$cmd\", $(get_synopsis $cmd), 0" printf " | CAT_%s" $(echo "$rest" | get_category_line) echo " }," done - echo "};" } exclude_programs= @@ -81,6 +78,19 @@ do shift done +header_only= +case $1 in +--entry-only) + shift + print_command_list $1 <"$1" + exit 0 + ;; +--header-only) + shift + header_only=t + ;; +esac + echo "/* Automatically generated by generate-cmdlist.sh */ struct cmdname_help { const char *name; @@ -92,4 +102,18 @@ define_categories "$1" echo define_category_names "$1" echo + +if test -n "$header_only" +then + exit 0 +fi + +# The old compatibility mode for CMmake. See 061c2240b1b (Introduce +# CMake support for configuring Git, 2020-06-12) +echo "static struct cmdname_help command_list[] = {" +grep -v \ + -e '^#' \ + -e '^git-fsck-objects ' \ + "$1" | print_command_list "$1" +echo "};" -- 2.33.1.1338.g20da966911a ^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard 2021-10-20 18:39 ` [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard Ævar Arnfjörð Bjarmason @ 2021-10-21 14:45 ` Jeff King 2021-10-21 18:24 ` Junio C Hamano 2021-10-21 22:46 ` Øystein Walle 1 sibling, 1 reply; 87+ messages in thread From: Jeff King @ 2021-10-21 14:45 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Junio C Hamano, Johannes Sixt, Øystein Walle On Wed, Oct 20, 2021 at 08:39:58PM +0200, Ævar Arnfjörð Bjarmason wrote: > get_synopsis () { > + head -n 10 "Documentation/$1.txt" | > sed -n ' > /^NAME/,/'"$1"'/H > ${ > x > s/.*'"$1"' - \(.*\)/N_("\1")/ > p > - }' "Documentation/$1.txt" > + }' > } By the way, I'm not sure about the utility of this change. It reduces the number of lines that sed looks at, but at the cost of an extra process. That's probably a net loss. And if we did want to limit the data sed covers, doing "pq" after we matched would be simpler. It also feels like it's orthogonal to what this patch is doing, but maybe there's some subtle non-performance reason to want this. -Peff ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard 2021-10-21 14:45 ` Jeff King @ 2021-10-21 18:24 ` Junio C Hamano 0 siblings, 0 replies; 87+ messages in thread From: Junio C Hamano @ 2021-10-21 18:24 UTC (permalink / raw) To: Jeff King Cc: Ævar Arnfjörð Bjarmason, git, Johannes Sixt, Øystein Walle Jeff King <peff@peff.net> writes: > On Wed, Oct 20, 2021 at 08:39:58PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> get_synopsis () { >> + head -n 10 "Documentation/$1.txt" | >> sed -n ' >> /^NAME/,/'"$1"'/H >> ${ >> x >> s/.*'"$1"' - \(.*\)/N_("\1")/ >> p >> - }' "Documentation/$1.txt" >> + }' >> } > > By the way, I'm not sure about the utility of this change. It reduces > the number of lines that sed looks at, but at the cost of an extra > process. That's probably a net loss. And if we did want to limit the > data sed covers, doing "pq" after we matched would be simpler. Doesn't the above - slurp the lines into hold space while we are in the synopsis part; - otherwise keep reading and discarding; - and do the processing at end So presumably, instead of waiting till the end, can't we immediately process the thing and exit? I guess your "'pq' after we matched" is saying the same thing. I agree that extra process with an ad-hoc limitation of 10 does leave a bad taste in my mouth. ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard 2021-10-20 18:39 ` [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard Ævar Arnfjörð Bjarmason 2021-10-21 14:45 ` Jeff King @ 2021-10-21 22:46 ` Øystein Walle 1 sibling, 0 replies; 87+ messages in thread From: Øystein Walle @ 2021-10-21 22:46 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Git, Junio C Hamano, Jeff King, Johannes Sixt On Wed, 20 Oct 2021 at 20:40, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > Unfortunately we can't drop the old code completely due to the CMake > integration, see 061c2240b1b (Introduce CMake support for configuring > Git, 2020-06-12). It will keep using the older and slower script. I took a quick stab at this earlier today and it's not too difficult to implement. I can take a deeper dive over the weekend if this patch series gets traction. > +header_only= > +case $1 in > +--entry-only) > + shift > + print_command_list $1 <"$1" > + exit 0 > + ;; Why is "$1" given twice here? As far as I can tell print_command_list doesn't reference its arguments anymore. > +# The old compatibility mode for CMmake. See 061c2240b1b (Introduce > +# CMake support for configuring Git, 2020-06-12) s/CMmake/CMake/ > +echo "static struct cmdname_help command_list[] = {" > +grep -v \ > + -e '^#' \ > + -e '^git-fsck-objects ' \ > + "$1" | > print_command_list "$1" > +echo "};" I suppose it stems from this unchanged line. That can be changed to just `print_command_list`. Øsse ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH 8/8] Makefile: assert correct generate-cmdlist.sh output 2021-10-20 18:39 ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason ` (6 preceding siblings ...) 2021-10-20 18:39 ` [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 ` Ævar Arnfjörð Bjarmason 2021-10-20 20:35 ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Jeff King 2021-10-22 19:36 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason 9 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-20 18:39 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Ævar Arnfjörð Bjarmason Because generate-cmdlist.sh invokes "sed" to extract the "NAME" blurb from the Documentation/git-*.txt files, we can end up with bad content if those files aren't what we expected. E.g. we'll emit multiple lines from that "sed" one-liner if the "NAME" section were to move further down than what our "head -n 10" covers. Let's assert that this can't happen by checking that the number of lines we make is what we'd expect to get. This means we can remove an assertion added in cfb22a02ab5 (help: use command-list.h for common command list, 2018-05-10). We'll catch this during compilation instead. There are still other cases where it's possible to generate a bad command-list.h, but the rest should be caught by a C compilation error. It would be possible to change our generated "command-list.h" to simply include this new ".build/command-list.h.gen" instead of cat-ing it in the Makefile, but let's leave the generated file as it is. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- Makefile | 7 ++++++- generate-cmdlist.sh | 2 +- help.c | 3 --- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/Makefile b/Makefile index ce4cce57eb8..711f0c064e6 100644 --- a/Makefile +++ b/Makefile @@ -2278,13 +2278,18 @@ $(COMMAND_LIST_GEN): .build/command-list.h.d/%.gen: Documentation/%.txt $(QUIET)grep "^$(patsubst .build/command-list.h.d/%.gen,%,$@) " command-list.txt >$@.txt && \ ./generate-cmdlist.sh --entry-only $@.txt >$@ +.build/command-list.h.gen: $(COMMAND_LIST_GEN) + $(QUIET)LC_ALL=C sort $(COMMAND_LIST_GEN) >$@ && \ + test $$(wc -l <$@) -eq $(words $(COMMAND_LIST_GEN)) + command-list.h: $(COMMAND_LIST_GEN) command-list.h: generate-cmdlist.sh command-list.h: command-list.txt +command-list.h: .build/command-list.h.gen $(QUIET_GEN){ \ $(SHELL_PATH) ./generate-cmdlist.sh --header-only command-list.txt && \ echo "static struct cmdname_help command_list[] = {" && \ - LC_ALL=C sort $(COMMAND_LIST_GEN) && \ + cat $< && \ echo "};"; \ } >$@ diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index 2bc528e8cae..bdefa151ae1 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -23,7 +23,7 @@ category_list () { } get_synopsis () { - head -n 10 "Documentation/$1.txt" | + head -n 6 "Documentation/$1.txt" | sed -n ' /^NAME/,/'"$1"'/H ${ diff --git a/help.c b/help.c index 973e47cdc30..cd6dd4c2440 100644 --- a/help.c +++ b/help.c @@ -57,9 +57,6 @@ static void extract_cmds(struct cmdname_help **p_cmds, uint32_t mask) int i, nr = 0; struct cmdname_help *cmds; - if (ARRAY_SIZE(command_list) == 0) - BUG("empty command_list[] is a sign of broken generate-cmdlist.sh"); - ALLOC_ARRAY(cmds, ARRAY_SIZE(command_list) + 1); for (i = 0; i < ARRAY_SIZE(command_list); i++) { -- 2.33.1.1338.g20da966911a ^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN 2021-10-20 18:39 ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason ` (7 preceding siblings ...) 2021-10-20 18:39 ` [PATCH 8/8] Makefile: assert correct generate-cmdlist.sh output Ævar Arnfjörð Bjarmason @ 2021-10-20 20:35 ` Jeff King 2021-10-20 21:31 ` Taylor Blau 2021-10-22 19:36 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason 9 siblings, 1 reply; 87+ messages in thread From: Jeff King @ 2021-10-20 20:35 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Junio C Hamano, Johannes Sixt, Øystein Walle On Wed, Oct 20, 2021 at 08:39:51PM +0200, Ævar Arnfjörð Bjarmason wrote: > This series is based off an off-hand comment I made about making the > cmdlist.sh faster, in the meantime much of the same methods are > already cooking in "next" for the "lint-docs" target. > > See 7/8 for the main performance numbers, along the way I stole some > patches from Johannes Sixt who'd worked on optimizing the script > before, which compliment this new method of generating this file by > piggy-backing more on GNU make for managing a dependency graph for us. I still think this is a much more complicated and error-prone approach than just making the script faster. I know we can't rely on perl, but could we use it optimistically? The proof-of-concept below on top of your patch 6 does two things: - observes that there is no need for get_category_line in the loop; it is just sorting and de-duping the bitfields, but since we just OR them together, neither of those things matters - uses perl to open each individual doc file to get the synopsis. It _feels_ like this should be something that sed or awk could do, but it is beyond me. However, speculatively trying perl is an easy win, and we can fall back to the shell loop. Here are my timings: Benchmark #1: sh generate-cmdlist.sh command-list.txt Time (mean ± σ): 40.4 ms ± 18.1 ms [User: 44.9 ms, System: 7.1 ms] Range (min … max): 20.3 ms … 65.5 ms 10 runs Benchmark #2: sh generate-cmdlist.sh.old command-list.txt Time (mean ± σ): 1.414 s ± 0.038 s [User: 1.641 s, System: 0.863 s] Range (min … max): 1.344 s … 1.451 s 10 runs Summary 'sh generate-cmdlist.sh command-list.txt' ran 34.96 ± 15.66 times faster than 'sh generate-cmdlist.sh.old command-list.txt' I hate having fallbacks, because the seldom-used version may bitrot. I'm tempted to just write that loop in C, but there's a circular dependency with using any of libgit.a (even though it's really only the git porcelain that cares about command-list.h, it goes into help.o which goes into libgit.a. We could break that dependency if we wanted, though). If we can do it in awk, that may be worthwhile. But either way, I think this is superior to trying to parallelize the Makefile: - it actually uses less CPU, rather than just trying to improve wall-clock time by using more cores - there's little chance of having some subtle dependency problem Parallelizing makes a lot of sense to me when the operation is truly expensive. But in this case it's literally just opening a file, and the only reason it's slow is because we spawn a ton of processes to do it. --- diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index a1ab2b1f07..f922eebe23 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -63,11 +63,23 @@ define_category_names () { print_command_list () { echo "static struct cmdname_help command_list[] = {" + # try perl first, as we can do it all in one process + command_list "$1" | + perl -ne ' + my ($cmd, @rest) = split; + open(my $fh, "<", "Documentation/$cmd.txt"); + while (<$fh>) { + next unless /^$cmd - (.*)/; + print "{ \"$cmd\", N_(\"$1\"), 0"; + print " | CAT_$_" for (@rest); + print " },\n"; + } + ' || command_list "$1" | while read cmd rest do printf " { \"$cmd\", $(get_synopsis $cmd), 0" - printf " | CAT_%s" $(echo "$rest" | get_category_line) + printf " | CAT_%s" $rest echo " }," done echo "};" ^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN 2021-10-20 20:35 ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Jeff King @ 2021-10-20 21:31 ` Taylor Blau 2021-10-20 23:14 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 87+ messages in thread From: Taylor Blau @ 2021-10-20 21:31 UTC (permalink / raw) To: Jeff King Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano, Johannes Sixt, Øystein Walle On Wed, Oct 20, 2021 at 04:35:38PM -0400, Jeff King wrote: > On Wed, Oct 20, 2021 at 08:39:51PM +0200, Ævar Arnfjörð Bjarmason wrote: > > > This series is based off an off-hand comment I made about making the > > cmdlist.sh faster, in the meantime much of the same methods are > > already cooking in "next" for the "lint-docs" target. > > > > See 7/8 for the main performance numbers, along the way I stole some > > patches from Johannes Sixt who'd worked on optimizing the script > > before, which compliment this new method of generating this file by > > piggy-backing more on GNU make for managing a dependency graph for us. > > I still think this is a much more complicated and error-prone approach > than just making the script faster. I know we can't rely on perl, but > could we use it optimistically? I'll take credit for this terrible idea of using Perl when available. But I don't think we even need to, since we could just rely on Awk. That has all the benefits you described while still avoiding the circular dependency on libgit.a. But the killer feature is that we don't have to rely on two implementations, the lesser-used of which is likely to bitrot over time. The resulting awk is a little ugly, because of the nested-ness. I'm also no awk-spert, but I think that something like the below gets the job done. It also has the benefit of being slightly faster than the equivalent Perl implementation, for whatever those extra ~9 ms are worth ;). Benchmark #1: sh generate-cmdlist.sh command-list.txt Time (mean ± σ): 25.3 ms ± 5.3 ms [User: 31.1 ms, System: 8.3 ms] Range (min … max): 15.5 ms … 31.7 ms 95 runs Benchmark #2: sh generate-cmdlist.sh.old command-list.txt Time (mean ± σ): 34.9 ms ± 9.8 ms [User: 41.0 ms, System: 6.9 ms] Range (min … max): 22.4 ms … 54.8 ms 64 runs Summary 'sh generate-cmdlist.sh command-list.txt' ran 1.38 ± 0.49 times faster than 'sh generate-cmdlist.sh.old command-list.txt' --- diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index a1ab2b1f07..39338ef1cc 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -64,12 +64,19 @@ print_command_list () { echo "static struct cmdname_help command_list[] = {" command_list "$1" | - while read cmd rest - do - printf " { \"$cmd\", $(get_synopsis $cmd), 0" - printf " | CAT_%s" $(echo "$rest" | get_category_line) - echo " }," - done + awk '{ + f="Documentation/" $1 ".txt" + while((getline line<f) > 0) { + if (match(line, "^" $1 " - ")) { + syn=substr(line, RLENGTH+1) + printf "\t{ \"%s\", N_(\"%s\"), 0", $1, syn + for (i=2; i<=NF; i++) { + printf " | CAT_%s", $i + } + print " }," + } + } + }' echo "};" } ^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN 2021-10-20 21:31 ` Taylor Blau @ 2021-10-20 23:14 ` Ævar Arnfjörð Bjarmason 2021-10-20 23:46 ` Jeff King 2021-10-21 5:39 ` Eric Sunshine 0 siblings, 2 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-20 23:14 UTC (permalink / raw) To: Taylor Blau Cc: Jeff King, git, Junio C Hamano, Johannes Sixt, Øystein Walle On Wed, Oct 20 2021, Taylor Blau wrote: > On Wed, Oct 20, 2021 at 04:35:38PM -0400, Jeff King wrote: >> On Wed, Oct 20, 2021 at 08:39:51PM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> > This series is based off an off-hand comment I made about making the >> > cmdlist.sh faster, in the meantime much of the same methods are >> > already cooking in "next" for the "lint-docs" target. >> > >> > See 7/8 for the main performance numbers, along the way I stole some >> > patches from Johannes Sixt who'd worked on optimizing the script >> > before, which compliment this new method of generating this file by >> > piggy-backing more on GNU make for managing a dependency graph for us. >> >> I still think this is a much more complicated and error-prone approach >> than just making the script faster. I know we can't rely on perl, but >> could we use it optimistically? Jeff: Just in terms of error prone both of these implementations will accept bad input that's being caught in 8/8 of this series. We accept a lot of bad input now, ending up with some combinations of bad output or compile errors if you screw with the input *.txt files. I think I've addressed all of those in this series. If you mean the general concept of making a "foo.gen" from a "foo.txt" as an intermediate with make as a way to get to "many-foo.h" I don't really see how it's error prone conceptually. You get error checking each step of the way, and it encourages logic that's simpler each step of the way. > I'll take credit for this terrible idea of using Perl when available. > > But I don't think we even need to, since we could just rely on Awk. That > has all the benefits you described while still avoiding the circular > dependency on libgit.a. But the killer feature is that we don't have to > rely on two implementations, the lesser-used of which is likely to > bitrot over time. > > The resulting awk is a little ugly, because of the nested-ness. I'm also > no awk-spert, but I think that something like the below gets the job > done. > > It also has the benefit of being slightly faster than the equivalent > Perl implementation, for whatever those extra ~9 ms are worth ;). > > Benchmark #1: sh generate-cmdlist.sh command-list.txt > Time (mean ± σ): 25.3 ms ± 5.3 ms [User: 31.1 ms, System: 8.3 ms] > Range (min … max): 15.5 ms … 31.7 ms 95 runs > > Benchmark #2: sh generate-cmdlist.sh.old command-list.txt > Time (mean ± σ): 34.9 ms ± 9.8 ms [User: 41.0 ms, System: 6.9 ms] > Range (min … max): 22.4 ms … 54.8 ms 64 runs > > Summary > 'sh generate-cmdlist.sh command-list.txt' ran > 1.38 ± 0.49 times faster than 'sh generate-cmdlist.sh.old command-list.txt' > > --- > > diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh > index a1ab2b1f07..39338ef1cc 100755 > --- a/generate-cmdlist.sh > +++ b/generate-cmdlist.sh > @@ -64,12 +64,19 @@ print_command_list () { > echo "static struct cmdname_help command_list[] = {" > > command_list "$1" | > - while read cmd rest > - do > - printf " { \"$cmd\", $(get_synopsis $cmd), 0" > - printf " | CAT_%s" $(echo "$rest" | get_category_line) > - echo " }," > - done > + awk '{ > + f="Documentation/" $1 ".txt" > + while((getline line<f) > 0) { > + if (match(line, "^" $1 " - ")) { > + syn=substr(line, RLENGTH+1) > + printf "\t{ \"%s\", N_(\"%s\"), 0", $1, syn > + for (i=2; i<=NF; i++) { > + printf " | CAT_%s", $i > + } > + print " }," > + } > + } > + }' > echo "};" > } Per Eric's Sunshine's upthread comments an awk and Perl implementation were both considered before[1]. I also care a bit about the timings of the from-scratch build, but I think they're way less interesting than a partial build. I.e. I think if you e.g. touch Documentation/git-a*.txt with this series with/without this awk version the difference in runtime is within the error bars. I.e. making the loop faster isn't necessary. It's better to get to a point where make can save you from doing all/most of the work by checking modification times, rather than making an O(n) loop faster. The only reason there's even a loop there is because it's used by the cmake logic in contrib/* (how we've ended up with a hard dependency in contrib is another matter...). I'm also interested in (and have WIP patches for) simplifying things more generally in the Makefile. Once we have a file exploded out has just the synopsis line that can be used to replace what's now in Documentation/cmd-list.perl, i.e. those summary blurbs also end up in "man git". There's subtle dependency issues there as well, and just having a one-off solution for the the command-list.h doesn't get us closer to addressing that sibling implementation. In terms of future Makefile work I was hoping to get this in, untangle some of the complexity between the inter-dependency of Makefile & Documentation/Makefile (eventually just merging the two, and leaving a stub in Documentation/Makefile). I've also got a working implementation for getting rid of all of the "FORCE" dependencies (except the version one). 1. https://lore.kernel.org/git/CAPig+cSzKoOzU-zPOZqfNpPYBFpcWqvDP3mwLvAn5WkiNW0UMw@mail.gmail.com/ ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN 2021-10-20 23:14 ` Ævar Arnfjörð Bjarmason @ 2021-10-20 23:46 ` Jeff King 2021-10-21 0:48 ` Ævar Arnfjörð Bjarmason 2021-10-21 5:39 ` Eric Sunshine 1 sibling, 1 reply; 87+ messages in thread From: Jeff King @ 2021-10-20 23:46 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Taylor Blau, git, Junio C Hamano, Johannes Sixt, Øystein Walle On Thu, Oct 21, 2021 at 01:14:37AM +0200, Ævar Arnfjörð Bjarmason wrote: > Jeff: Just in terms of error prone both of these implementations will > accept bad input that's being caught in 8/8 of this series. > > We accept a lot of bad input now, ending up with some combinations of > bad output or compile errors if you screw with the input *.txt files. I > think I've addressed all of those in this series. I don't mind more error-checking, though TBH I don't find a huge value in it. But what I did mean was: > If you mean the general concept of making a "foo.gen" from a "foo.txt" > as an intermediate with make as a way to get to "many-foo.h" I don't > really see how it's error prone conceptually. You get error checking > each step of the way, and it encourages logic that's simpler each step > of the way. Yes. It just seems like the Makefile gets more complicated, and sometimes that can lead to subtle dependency issues (e.g., the ".build" dependency in the earlier iteration of the series). And in general I'd much rather debug an awk script than a Makefile. > Per Eric's Sunshine's upthread comments an awk and Perl implementation > were both considered before[1]. Ah sorry, I thought it was just a perl one that had been the show-stopper. I hadn't noticed the awk one. However, the point of my patch was to use perl if available, and fall back otherwise. Maybe that's too ugly, but it does address the concern with Eric's implementation. > I.e. I think if you e.g. touch Documentation/git-a*.txt with this series > with/without this awk version the difference in runtime is within the > error bars. I.e. making the loop faster isn't necessary. It's better to > get to a point where make can save you from doing all/most of the work > by checking modification times, rather than making an O(n) loop faster. FWIW, I don't agree with this paragraph at all. Parallelizing or reusing partial results is IMHO inferior to just making things faster. > I'm also interested in (and have WIP patches for) simplifying things > more generally in the Makefile. Once we have a file exploded out has > just the synopsis line that can be used to replace what's now in > Documentation/cmd-list.perl, i.e. those summary blurbs also end up in > "man git". > > There's subtle dependency issues there as well, and just having a > one-off solution for the the command-list.h doesn't get us closer to > addressing that sibling implementation. So I don't know what "subtle dependency issues" you found here, but this is exactly the kind of complexity it was my goal to avoid. -Peff ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN 2021-10-20 23:46 ` Jeff King @ 2021-10-21 0:48 ` Ævar Arnfjörð Bjarmason 2021-10-21 2:20 ` Taylor Blau 2021-10-21 14:34 ` Jeff King 0 siblings, 2 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-21 0:48 UTC (permalink / raw) To: Jeff King Cc: Taylor Blau, git, Junio C Hamano, Johannes Sixt, Øystein Walle On Wed, Oct 20 2021, Jeff King wrote: > On Thu, Oct 21, 2021 at 01:14:37AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> Jeff: Just in terms of error prone both of these implementations will >> accept bad input that's being caught in 8/8 of this series. >> >> We accept a lot of bad input now, ending up with some combinations of >> bad output or compile errors if you screw with the input *.txt files. I >> think I've addressed all of those in this series. > > I don't mind more error-checking, though TBH I don't find a huge value > in it. But what I did mean was: > >> If you mean the general concept of making a "foo.gen" from a "foo.txt" >> as an intermediate with make as a way to get to "many-foo.h" I don't >> really see how it's error prone conceptually. You get error checking >> each step of the way, and it encourages logic that's simpler each step >> of the way. > > Yes. It just seems like the Makefile gets more complicated, and > sometimes that can lead to subtle dependency issues (e.g., the ".build" > dependency in the earlier iteration of the series). FWIW there wasn't an earlier version of the series, just a POC patch I had as a comment in https://lore.kernel.org/git/87r1gqxqxn.fsf@evledraar.gmail.com/ > And in general I'd much rather debug an awk script than a Makefile. > >> Per Eric's Sunshine's upthread comments an awk and Perl implementation >> were both considered before[1]. > > Ah sorry, I thought it was just a perl one that had been the > show-stopper. I hadn't noticed the awk one. However, the point of my > patch was to use perl if available, and fall back otherwise. Maybe > that's too ugly, but it does address the concern with Eric's > implementation. I think carrying two implementations is worse than just having the one slightly slower one. >> I.e. I think if you e.g. touch Documentation/git-a*.txt with this series >> with/without this awk version the difference in runtime is within the >> error bars. I.e. making the loop faster isn't necessary. It's better to >> get to a point where make can save you from doing all/most of the work >> by checking modification times, rather than making an O(n) loop faster. > > FWIW, I don't agree with this paragraph at all. Parallelizing or reusing > partial results is IMHO inferior to just making things faster. I agree with you in the general case, but for something that's consumed by a make dependency graph I find it easier to debug things if e.g. changing git-add.txt results in a change to git-add.gen, which is then cat'd together. IOW if we had a sufficiently fast C compiler I think I'd still prefer make's existing rules over some equivalent of: cat *.c | super-fast-cc Since similar to how the *.sp files depend on the the *.o files now, declaring the dependency graph allows you to easily add more built things. >> I'm also interested in (and have WIP patches for) simplifying things >> more generally in the Makefile. Once we have a file exploded out has >> just the synopsis line that can be used to replace what's now in >> Documentation/cmd-list.perl, i.e. those summary blurbs also end up in >> "man git". >> >> There's subtle dependency issues there as well, and just having a >> one-off solution for the the command-list.h doesn't get us closer to >> addressing that sibling implementation. > > So I don't know what "subtle dependency issues" you found here, but this > is exactly the kind of complexity it was my goal to avoid. But how? I don't see how narrowly making the loop in generate-cmdlist.sh gets us closer to generating the "cmds_txt" in the Documentation/Makefile. Whereas after this series we're pretty much there in terms of generating those files. i.e. try: cat Documentation/cmds-mainporcelain.txt All of those synopsis blurbs are extracted, and reverse-attributable to the corresponding files. The dependencies there are (arguably) subtly broken because those files aren't re-made if a "cmd-list.made" is more recent, so if you remove one of the generated text files the Makefile logic will get stuck because the graph is incomplete (which can happen e.g. if "make clean" is interrupted, or you run a "git clean -dxf '*.txt'". I did the latter and ran into that recently, because I was trying to ad-hoc fix another more general dependency issue we tend to have, which is using wildcards on potentially generated files, so if you checkout a new verison, build, and then checkout an old version (or are adding one of the files involved) a script like build-docdep.perl will "helpfully" pick up bad dependencies. I guess you could argue that those are all problems with the Makefile, but I think they're ultimately best solved by driving the dependencies from the Makefile. I.e. all we need is the one list of built-ins in command-list.txt, pair that up with the "category" and we can always generated everything down to the manpages correctly without relying on FS wildcards. ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN 2021-10-21 0:48 ` Ævar Arnfjörð Bjarmason @ 2021-10-21 2:20 ` Taylor Blau 2021-10-22 12:37 ` Ævar Arnfjörð Bjarmason 2021-10-21 14:34 ` Jeff King 1 sibling, 1 reply; 87+ messages in thread From: Taylor Blau @ 2021-10-21 2:20 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Jeff King, Taylor Blau, git, Junio C Hamano, Johannes Sixt, Øystein Walle On Thu, Oct 21, 2021 at 02:48:24AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> Per Eric's Sunshine's upthread comments an awk and Perl implementation > >> were both considered before[1]. > > > > Ah sorry, I thought it was just a perl one that had been the > > show-stopper. I hadn't noticed the awk one. However, the point of my > > patch was to use perl if available, and fall back otherwise. Maybe > > that's too ugly, but it does address the concern with Eric's > > implementation. > > I think carrying two implementations is worse than just having the one > slightly slower one. I have no opinion on whether or not assuming that awk or Perl exists and can be relied upon during the build is reasonable or not. It seems like the former might be a slightly safer assumption than the latter, but in all honesty it seems like both are always likely to be around. In any case, I think the point was that we could improve upon Peff's patch by just having a single implementation done in awk. And when I wrote that I definitely was in the mindset of being able to rely on awk during compilation. > >> I.e. I think if you e.g. touch Documentation/git-a*.txt with this series > >> with/without this awk version the difference in runtime is within the > >> error bars. I.e. making the loop faster isn't necessary. It's better to > >> get to a point where make can save you from doing all/most of the work > >> by checking modification times, rather than making an O(n) loop faster. > > > > FWIW, I don't agree with this paragraph at all. Parallelizing or reusing > > partial results is IMHO inferior to just making things faster. > > I agree with you in the general case, but for something that's consumed > by a make dependency graph I find it easier to debug things if > e.g. changing git-add.txt results in a change to git-add.gen, which is > then cat'd together. > > IOW if we had a sufficiently fast C compiler I think I'd still prefer > make's existing rules over some equivalent of: > > cat *.c | super-fast-cc > > Since similar to how the *.sp files depend on the the *.o files now, > declaring the dependency graph allows you to easily add more built > things. This seems like an unfair comparison to me. I might be more sympathetic if we were generating a more complicated artifact by running generate-cmdlist.sh, but its inputs and outputs seem very well defined (and non-complicated) to me. In any case, I agree with Peff that this isn't the approach that I would have taken. But I also think that *just* parallelizing isn't necessarily a win here. There are two reasons I think that: - The cognitive load required to parallelize this process is complicated; the .build directory seems like another thing to keep track of, and it's not clear to me what updates it, or what the result of touching some file in that directory is. - But even if the parallelization was achievable by more straightforward means, you still have to do the slow thing when you're rebuilding from scratch. So this is strictly worse the first time you are compiling, at least on machines with fewer cores. In any case, this is all overkill in my mind for what we are talking about. I agree that 'cat *.c | super-fast-cc' is worse than a competent Makefile that knows what to build and when. But the problem here is a slow loop in shell that is easily made much faster by implementing it in a language that can execute the whole loop in a single process. Thanks, Taylor ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN 2021-10-21 2:20 ` Taylor Blau @ 2021-10-22 12:37 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-22 12:37 UTC (permalink / raw) To: Taylor Blau Cc: Jeff King, git, Junio C Hamano, Johannes Sixt, Øystein Walle On Wed, Oct 20 2021, Taylor Blau wrote: > On Thu, Oct 21, 2021 at 02:48:24AM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> Per Eric's Sunshine's upthread comments an awk and Perl implementation >> >> were both considered before[1]. >> > >> > Ah sorry, I thought it was just a perl one that had been the >> > show-stopper. I hadn't noticed the awk one. However, the point of my >> > patch was to use perl if available, and fall back otherwise. Maybe >> > that's too ugly, but it does address the concern with Eric's >> > implementation. >> >> I think carrying two implementations is worse than just having the one >> slightly slower one. > > I have no opinion on whether or not assuming that awk or Perl exists and > can be relied upon during the build is reasonable or not. It seems like > the former might be a slightly safer assumption than the latter, but in > all honesty it seems like both are always likely to be around. > > In any case, I think the point was that we could improve upon Peff's > patch by just having a single implementation done in awk. And when I > wrote that I definitely was in the mindset of being able to rely on awk > during compilation. > >> >> I.e. I think if you e.g. touch Documentation/git-a*.txt with this series >> >> with/without this awk version the difference in runtime is within the >> >> error bars. I.e. making the loop faster isn't necessary. It's better to >> >> get to a point where make can save you from doing all/most of the work >> >> by checking modification times, rather than making an O(n) loop faster. >> > >> > FWIW, I don't agree with this paragraph at all. Parallelizing or reusing >> > partial results is IMHO inferior to just making things faster. >> >> I agree with you in the general case, but for something that's consumed >> by a make dependency graph I find it easier to debug things if >> e.g. changing git-add.txt results in a change to git-add.gen, which is >> then cat'd together. >> >> IOW if we had a sufficiently fast C compiler I think I'd still prefer >> make's existing rules over some equivalent of: >> >> cat *.c | super-fast-cc >> >> Since similar to how the *.sp files depend on the the *.o files now, >> declaring the dependency graph allows you to easily add more built >> things. > > This seems like an unfair comparison to me. I might be more sympathetic > if we were generating a more complicated artifact by running > generate-cmdlist.sh, but its inputs and outputs seem very well defined > (and non-complicated) to me. They are? a foo.o to foo.o input is relatively uncomplicated, and you can discover the exact dependencies with 3rd party tools, like the GCC and Clang switches we use generate the .depends dirs[1]. Whereas with the custom shellscripts that have for-loops of their own like generate-cmdlist.sh what it depends on exactly is relatively opaque to you until you read the shellscript. I guess it's a matter of taste, but if you run this with/without this series: touch Documentation/git-a*.txt; time make -j1 command-list.h --debug=b V=1 You'll see that before we'd spot that e.g. git-add.txt changed, but we'll run one target in response to that at the end. So it's just like what you'd get when you make %.o from %.c to produce a "program" that links all those %.o together at the end. So I do think it's a fair comparison, if anything it's unfair to this series, because as noted you can discover these dependency independently with GCC etc. for C code. But for a custom *.txt format with an ad-hoc *.sh to parse it there's no such aid available. 1. $ cat .depend/help.o.d help.o: help.c cache.h git-compat-util.h compat/bswap.h wildmatch.h \ banned.h strbuf.h hashmap.h hash.h repository.h path.h sha1dc_git.h \ sha1collisiondetection/lib/sha1.h sha256/block/sha256.h list.h advice.h \ gettext.h convert.h string-list.h trace.h trace2.h pack-revindex.h \ oid-array.h mem-pool.h config.h builtin.h commit.h object.h tree.h \ decorate.h gpg-interface.h pretty.h commit-slab.h commit-slab-decl.h \ commit-slab-impl.h exec-cmd.h run-command.h thread-utils.h strvec.h \ levenshtein.h help.h command-list.h column.h version.h refs.h \ parse-options.h prompt.h [...] ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN 2021-10-21 0:48 ` Ævar Arnfjörð Bjarmason 2021-10-21 2:20 ` Taylor Blau @ 2021-10-21 14:34 ` Jeff King 2021-10-21 22:34 ` Junio C Hamano 2021-10-22 10:51 ` Ævar Arnfjörð Bjarmason 1 sibling, 2 replies; 87+ messages in thread From: Jeff King @ 2021-10-21 14:34 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Taylor Blau, git, Junio C Hamano, Johannes Sixt, Øystein Walle On Thu, Oct 21, 2021 at 02:48:24AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> There's subtle dependency issues there as well, and just having a > >> one-off solution for the the command-list.h doesn't get us closer to > >> addressing that sibling implementation. > > > > So I don't know what "subtle dependency issues" you found here, but this > > is exactly the kind of complexity it was my goal to avoid. > > But how? I don't see how narrowly making the loop in generate-cmdlist.sh > gets us closer to generating the "cmds_txt" in the > Documentation/Makefile. What I meant is that the work to get everything right in the Makefile to correctly handle dependencies and a partial rebuild can be tricky. For instance, you're still stuck with a big wildcard dependency on Documentation/git*.txt (and a manual list of exclusions in the Makefile) because it's hard in make to do make new dynamic rules based on an existing one (i.e., the list _should_ come from what's in command-list.txt). Or the fact that we apparently need to keep the old script around or cmake anyway. It's also much slower. Here are from-scratch builds before and after your patch 7: $ hyperfine --warmup 1 -p 'make clean' 'make command-list.h' Benchmark #1: make command-list.h Time (mean ± σ): 1.527 s ± 0.060 s [User: 1.320 s, System: 0.649 s] Range (min … max): 1.433 s … 1.625 s 10 runs $ hyperfine --warmup 1 -p 'make clean' 'make command-list.h' Benchmark #1: make command-list.h Time (mean ± σ): 2.661 s ± 0.080 s [User: 2.359 s, System: 1.082 s] Range (min … max): 2.481 s … 2.756 s 10 runs I know that partial builds will offset that in some cases, but it still feels like a step in the wrong direction. Even with a partial build, swapping out "make clean" for "touch Documentation/git-add.txt" takes about 200ms for me. Whereas with the faster version of generate-cmdlist.sh I showed, it takes 150ms. Now performance isn't everything, and it's possible these partial snippets will be useful in other places. But I'm not sure I see any real advantage in this series, and it seems like we're taking a hit in both performance and complexity in the meantime. -Peff ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN 2021-10-21 14:34 ` Jeff King @ 2021-10-21 22:34 ` Junio C Hamano 2021-10-22 10:51 ` Ævar Arnfjörð Bjarmason 1 sibling, 0 replies; 87+ messages in thread From: Junio C Hamano @ 2021-10-21 22:34 UTC (permalink / raw) To: Jeff King Cc: Ævar Arnfjörð Bjarmason, Taylor Blau, git, Johannes Sixt, Øystein Walle Jeff King <peff@peff.net> writes: > Now performance isn't everything, and it's possible these partial > snippets will be useful in other places. But I'm not sure I see any real > advantage in this series, and it seems like we're taking a hit in both > performance and complexity in the meantime. Let's not forget about a hit we are taking in reviewer bandwidth. And added complexity will cost more over time. I am not sure if these 8-patches deserved this much attention, compared to other neglected topics. ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN 2021-10-21 14:34 ` Jeff King 2021-10-21 22:34 ` Junio C Hamano @ 2021-10-22 10:51 ` Ævar Arnfjörð Bjarmason 2021-10-22 18:31 ` Jeff King 1 sibling, 1 reply; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-22 10:51 UTC (permalink / raw) To: Jeff King Cc: Taylor Blau, git, Junio C Hamano, Johannes Sixt, Øystein Walle On Thu, Oct 21 2021, Jeff King wrote: > On Thu, Oct 21, 2021 at 02:48:24AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> >> There's subtle dependency issues there as well, and just having a >> >> one-off solution for the the command-list.h doesn't get us closer to >> >> addressing that sibling implementation. >> > >> > So I don't know what "subtle dependency issues" you found here, but this >> > is exactly the kind of complexity it was my goal to avoid. >> >> But how? I don't see how narrowly making the loop in generate-cmdlist.sh >> gets us closer to generating the "cmds_txt" in the >> Documentation/Makefile. > > What I meant is that the work to get everything right in the Makefile to > correctly handle dependencies and a partial rebuild can be tricky. For > instance, you're still stuck with a big wildcard dependency on > Documentation/git*.txt (and a manual list of exclusions in the Makefile) > because it's hard in make to do make new dynamic rules based on an > existing one (i.e., the list _should_ come from what's in > command-list.txt). Or the fact that we apparently need to keep the old > script around or cmake anyway. > > It's also much slower. Here are from-scratch builds before and after > your patch 7: > > $ hyperfine --warmup 1 -p 'make clean' 'make command-list.h' > Benchmark #1: make command-list.h > Time (mean ± σ): 1.527 s ± 0.060 s [User: 1.320 s, System: 0.649 s] > Range (min … max): 1.433 s … 1.625 s 10 runs > > > $ hyperfine --warmup 1 -p 'make clean' 'make command-list.h' > Benchmark #1: make command-list.h > Time (mean ± σ): 2.661 s ± 0.080 s [User: 2.359 s, System: 1.082 s] > Range (min … max): 2.481 s … 2.756 s 10 runs > > I know that partial builds will offset that in some cases, but it still > feels like a step in the wrong direction. Even with a partial build, > swapping out "make clean" for "touch Documentation/git-add.txt" takes > about 200ms for me. Whereas with the faster version of > generate-cmdlist.sh I showed, it takes 150ms. > > Now performance isn't everything, and it's possible these partial > snippets will be useful in other places. But I'm not sure I see any real > advantage in this series, and it seems like we're taking a hit in both > performance and complexity in the meantime. Yes, the same numbers are noted in the 7/8 commit message. I.e. it's slower on -j1, but faster with higher -j<n> numbers. Aside from any changes I'm proposing here it seems rather pointless to me to optimize the runtime of -j1 runs. I think we use those in e.g. CI, so of course if they become *really* slow it will matter, but the purpose of this change is to make hacking on git easier, both in terms of runtime and discovering what the Makefile is doing wih V=1. I think anyone hacking on git is going to be on a system with -j2 at least. So again, separate from these specific changes, if we've got a change that speeds up -jN runs at the cost of a -j1 run that seems like good thing. In terms of the utility of benchmarks this benchmark uses "make" and is meaningful, but e.g. <YXCKqAEwtwFozWk6@nand.local> (and I think some other ones?) in this thread invoke the shellscript directly. Those sorts of benchmarks may or may not matter, and in this case the script is always called in the context of a Makefile, so that's really the only meaningful way to test it. If e.g. its performance changes in a way that won't be noticed in other Makefile noise it probably won't matter to anyone. ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN 2021-10-22 10:51 ` Ævar Arnfjörð Bjarmason @ 2021-10-22 18:31 ` Jeff King 2021-10-22 20:50 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 87+ messages in thread From: Jeff King @ 2021-10-22 18:31 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Taylor Blau, git, Junio C Hamano, Johannes Sixt, Øystein Walle On Fri, Oct 22, 2021 at 12:51:11PM +0200, Ævar Arnfjörð Bjarmason wrote: > Yes, the same numbers are noted in the 7/8 commit message. I.e. it's > slower on -j1, but faster with higher -j<n> numbers. > > Aside from any changes I'm proposing here it seems rather pointless to > me to optimize the runtime of -j1 runs. > > I think we use those in e.g. CI, so of course if they become *really* > slow it will matter, but the purpose of this change is to make hacking > on git easier, both in terms of runtime and discovering what the > Makefile is doing wih V=1. > > I think anyone hacking on git is going to be on a system with -j2 at > least. So again, separate from these specific changes, if we've got a > change that speeds up -jN runs at the cost of a -j1 run that seems like > good thing. It seems weird to me to assume that all of our processors are available to build command-list.h. In most cases you are not running "make -j16 command-list.h", but rather "make -j16", and those other processors are doing useful things, like say, building actual C code. So counting CPU time is the interesting thing, because every cycle you save there gets used for other work. And "make -j1" just brings wall-clock and CPU time together. -Peff ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN 2021-10-22 18:31 ` Jeff King @ 2021-10-22 20:50 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-22 20:50 UTC (permalink / raw) To: Jeff King Cc: Taylor Blau, git, Junio C Hamano, Johannes Sixt, Øystein Walle On Fri, Oct 22 2021, Jeff King wrote: > On Fri, Oct 22, 2021 at 12:51:11PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> Yes, the same numbers are noted in the 7/8 commit message. I.e. it's >> slower on -j1, but faster with higher -j<n> numbers. >> >> Aside from any changes I'm proposing here it seems rather pointless to >> me to optimize the runtime of -j1 runs. >> >> I think we use those in e.g. CI, so of course if they become *really* >> slow it will matter, but the purpose of this change is to make hacking >> on git easier, both in terms of runtime and discovering what the >> Makefile is doing wih V=1. >> >> I think anyone hacking on git is going to be on a system with -j2 at >> least. So again, separate from these specific changes, if we've got a >> change that speeds up -jN runs at the cost of a -j1 run that seems like >> good thing. > > It seems weird to me to assume that all of our processors are available > to build command-list.h. In most cases you are not running "make -j16 > command-list.h", but rather "make -j16", and those other processors are > doing useful things, like say, building actual C code. > > So counting CPU time is the interesting thing, because every cycle you > save there gets used for other work. And "make -j1" just brings > wall-clock and CPU time together. There's been a lot of goalpost moving in the discussion around this series, but if you look at the numbers for the v1 I proposed at <patch-7.8-0c6f9b80d3b-20211020T183533Z-avarab@gmail.com> it also had a significant drop in "user" time. The only thing that was slower in my tests in either wallclock or user time was the wallclock time on the building from scratch case (the user time was lower). Maybe those measurements were bad or whatever, but that's the context for the above. Now, since then I've submitted a v2 with just the "make the shellscript faster" parts of this, including with some of your suggestions, making the shellscript version faster. FWIW there's a rather trivial addition to my version that makes the "make all the things" faster again in wallclock/user time. We don't need to re-parse the command-list.txt at all again to emit the headers if command-list.txt doesn't change, which is the common case. So we can just cache the whole header portion in another *.gen file and "cat" it. Anyway, I've been running these changes on top of other Makefile changes I've got locally where we re-build almost nothing as HEAD changes. No FORCE targets shellscripting (but the same via native GNU make features), and no *.sh/*.perl script re-generation on HEAD changes (the former being on-list). So with that the common case really is that everything hangs on the command-list.h generation. Since we have maybe 1-5 *.c files modified, then we need to link everything, and the linking is hanging on help.o, which needs command-list.h. ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN 2021-10-20 23:14 ` Ævar Arnfjörð Bjarmason 2021-10-20 23:46 ` Jeff King @ 2021-10-21 5:39 ` Eric Sunshine 1 sibling, 0 replies; 87+ messages in thread From: Eric Sunshine @ 2021-10-21 5:39 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Taylor Blau, Jeff King, Git List, Junio C Hamano, Johannes Sixt, Øystein Walle On Wed, Oct 20, 2021 at 7:34 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > On Wed, Oct 20 2021, Taylor Blau wrote: > > But I don't think we even need to, since we could just rely on Awk. That > > has all the benefits you described while still avoiding the circular > > dependency on libgit.a. But the killer feature is that we don't have to > > rely on two implementations, the lesser-used of which is likely to > > bitrot over time. > > Per Eric's Sunshine's upthread comments an awk and Perl implementation > were both considered before[1]. > 1. https://lore.kernel.org/git/CAPig+cSzKoOzU-zPOZqfNpPYBFpcWqvDP3mwLvAn5WkiNW0UMw@mail.gmail.com/ Thanks for saving me the trouble of digging up that email reference. ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster 2021-10-20 18:39 ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason ` (8 preceding siblings ...) 2021-10-20 20:35 ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Jeff King @ 2021-10-22 19:36 ` Ævar Arnfjörð Bjarmason 2021-10-22 19:36 ` [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason ` (13 more replies) 9 siblings, 14 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason This version of this series drops the Makefile-powered version of the cmdlist in favor of making the shellscript much faster, mostly with suggestions from Jeff King. I still think that splitting out the generated data into files may be useful for unifying the Documentation/ and C code build processes, there's another custom parser for command-list.txt in Documentation/cmd-list.perl. But if and when I've got something for that I can dig that out of the v1, in the meantime the v1 of this should be mostly uncontroversial. The last tow patches make things a bit slower for me, but since they replace command invocations with pure-shell logic they presumably make things a bit less painful on e.g. Windows, and the 8th patch here already made things quite very fast already. Jeff King (1): generate-cmdlist.sh: do not shell out to "sed" Johannes Sixt (2): generate-cmdlist.sh: spawn fewer processes generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason (7): command-list.txt: sort with "LC_ALL=C sort" generate-cmdlist.sh: trivial whitespace change generate-cmdlist.sh: don't call get_categories() from category_list() generate-cmdlist.sh: run "grep | sort", not "sort | grep" generate-cmdlist.sh: stop sorting category lines generate-cmdlist.sh: replace "grep' invocation with a shell version generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell command-list.txt | 20 +++++++------- generate-cmdlist.sh | 66 ++++++++++++++++++++++++++------------------- 2 files changed, 48 insertions(+), 38 deletions(-) Range-diff against v1: 1: 96885282988 = 1: 96885282988 command-list.txt: sort with "LC_ALL=C sort" 2: 5e8fef90e42 = 2: 5e8fef90e42 generate-cmdlist.sh: trivial whitespace change 3: 6b4de6a6088 = 3: 6b4de6a6088 generate-cmdlist.sh: spawn fewer processes 4: 074685cf714 = 4: 074685cf714 generate-cmdlist.sh: don't call get_categories() from category_list() 5: f01c1fd8088 = 5: f01c1fd8088 generate-cmdlist.sh: run "grep | sort", not "sort | grep" 6: e0b11514b8d = 6: e0b11514b8d generate-cmdlist.sh: replace for loop by printf's auto-repeat feature 7: 0c6f9b80d3b < -: ----------- Makefile: stop having command-list.h depend on a wildcard 8: 23d4cc77b6c < -: ----------- Makefile: assert correct generate-cmdlist.sh output -: ----------- > 7: f2f37c2963b generate-cmdlist.sh: stop sorting category lines -: ----------- > 8: 83318d6c0da generate-cmdlist.sh: do not shell out to "sed" -: ----------- > 9: 7903dd1f8c2 generate-cmdlist.sh: replace "grep' invocation with a shell version -: ----------- > 10: e10a43756d1 generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell -- 2.33.1.1505.g075a284c562 ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" 2021-10-22 19:36 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 ` Ævar Arnfjörð Bjarmason 2021-10-25 18:29 ` Junio C Hamano 2021-10-22 19:36 ` [PATCH v2 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason ` (12 subsequent siblings) 13 siblings, 1 reply; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason We should keep these files sorted in the C locale, e.g. in the C locale the order is: git-check-mailmap git-check-ref-format git-checkout But under en_US.UTF-8 it's: git-check-mailmap git-checkout git-check-ref-format In a subsequent commit I'll change generate-cmdlist.sh to use C sort order, and without this change we'd be led to believe that that change caused a meaningful change in the output, so let's do this as a separate step, right now the generate-cmdlist.sh script just uses the order found in this file. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- command-list.txt | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/command-list.txt b/command-list.txt index a289f09ed6f..02fc7ddde68 100644 --- a/command-list.txt +++ b/command-list.txt @@ -60,9 +60,9 @@ git-cat-file plumbinginterrogators git-check-attr purehelpers git-check-ignore purehelpers git-check-mailmap purehelpers +git-check-ref-format purehelpers git-checkout mainporcelain git-checkout-index plumbingmanipulators -git-check-ref-format purehelpers git-cherry plumbinginterrogators complete git-cherry-pick mainporcelain git-citool mainporcelain @@ -111,7 +111,6 @@ git-index-pack plumbingmanipulators git-init mainporcelain init git-instaweb ancillaryinterrogators complete git-interpret-trailers purehelpers -gitk mainporcelain git-log mainporcelain info git-ls-files plumbinginterrogators git-ls-remote plumbinginterrogators @@ -124,11 +123,11 @@ git-merge-base plumbinginterrogators git-merge-file plumbingmanipulators git-merge-index plumbingmanipulators git-merge-one-file purehelpers -git-mergetool ancillarymanipulators complete git-merge-tree ancillaryinterrogators -git-multi-pack-index plumbingmanipulators +git-mergetool ancillarymanipulators complete git-mktag plumbingmanipulators git-mktree plumbingmanipulators +git-multi-pack-index plumbingmanipulators git-mv mainporcelain worktree git-name-rev plumbinginterrogators git-notes mainporcelain @@ -154,23 +153,23 @@ git-request-pull foreignscminterface complete git-rerere ancillaryinterrogators git-reset mainporcelain history git-restore mainporcelain worktree -git-revert mainporcelain git-rev-list plumbinginterrogators git-rev-parse plumbinginterrogators +git-revert mainporcelain git-rm mainporcelain worktree git-send-email foreignscminterface complete git-send-pack synchingrepositories +git-sh-i18n purehelpers +git-sh-setup purehelpers git-shell synchelpers git-shortlog mainporcelain git-show mainporcelain info git-show-branch ancillaryinterrogators complete git-show-index plumbinginterrogators git-show-ref plumbinginterrogators -git-sh-i18n purehelpers -git-sh-setup purehelpers git-sparse-checkout mainporcelain worktree -git-stash mainporcelain git-stage complete +git-stash mainporcelain git-status mainporcelain info git-stripspace purehelpers git-submodule mainporcelain @@ -189,10 +188,11 @@ git-var plumbinginterrogators git-verify-commit ancillaryinterrogators git-verify-pack plumbinginterrogators git-verify-tag ancillaryinterrogators -gitweb ancillaryinterrogators git-whatchanged ancillaryinterrogators complete git-worktree mainporcelain git-write-tree plumbingmanipulators +gitk mainporcelain +gitweb ancillaryinterrogators gitattributes guide gitcli guide gitcore-tutorial guide @@ -211,6 +211,6 @@ gitremote-helpers guide gitrepository-layout guide gitrevisions guide gitsubmodules guide -gittutorial-2 guide gittutorial guide +gittutorial-2 guide gitworkflows guide -- 2.33.1.1505.g075a284c562 ^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" 2021-10-22 19:36 ` [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason @ 2021-10-25 18:29 ` Junio C Hamano 2021-10-25 21:22 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 87+ messages in thread From: Junio C Hamano @ 2021-10-25 18:29 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > We should keep these files sorted in the C locale, e.g. in the C > locale the order is: > > git-check-mailmap > git-check-ref-format > git-checkout > > But under en_US.UTF-8 it's: > > git-check-mailmap > git-checkout > git-check-ref-format > > In a subsequent commit I'll change generate-cmdlist.sh to use C sort > order, and without this change we'd be led to believe that that change > caused a meaningful change in the output, so let's do this as a > separate step, right now the generate-cmdlist.sh script just uses the > order found in this file. Hmph, I do not mind sorting this file bytewise like this at all, but does the justification above still apply to this round? I had an impression that we lose the sorting altogether in the end... Also, I am not sure where that "led to believe" comes from---do we have a test that checks the output from generate-cmdlist somehow? ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" 2021-10-25 18:29 ` Junio C Hamano @ 2021-10-25 21:22 ` Ævar Arnfjörð Bjarmason 2021-10-25 21:26 ` Junio C Hamano 0 siblings, 1 reply; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-25 21:22 UTC (permalink / raw) To: Junio C Hamano Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau On Mon, Oct 25 2021, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> We should keep these files sorted in the C locale, e.g. in the C >> locale the order is: >> >> git-check-mailmap >> git-check-ref-format >> git-checkout >> >> But under en_US.UTF-8 it's: >> >> git-check-mailmap >> git-checkout >> git-check-ref-format >> >> In a subsequent commit I'll change generate-cmdlist.sh to use C sort >> order, and without this change we'd be led to believe that that change >> caused a meaningful change in the output, so let's do this as a >> separate step, right now the generate-cmdlist.sh script just uses the >> order found in this file. > > Hmph, I do not mind sorting this file bytewise like this at all, but > does the justification above still apply to this round? I had an > impression that we lose the sorting altogether in the end... > > Also, I am not sure where that "led to believe" comes from---do we > have a test that checks the output from generate-cmdlist somehow? We end up unsorting the categories we bitwise-OR together. I.e. the CAT_* on a line like this: { "git-whatchanged", N_("Show logs with difference each commit introduces"), 0 | CAT_ancillaryinterrogators | CAT_complete }, But we'll still sort the actual command list, and spew it out as-is in some places "git" and "help" output. So having it be unsorted or shuffled wouldn't be nice. ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" 2021-10-25 21:22 ` Ævar Arnfjörð Bjarmason @ 2021-10-25 21:26 ` Junio C Hamano 0 siblings, 0 replies; 87+ messages in thread From: Junio C Hamano @ 2021-10-25 21:26 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> Also, I am not sure where that "led to believe" comes from---do we >> have a test that checks the output from generate-cmdlist somehow? > > We end up unsorting the categories we bitwise-OR together. I.e. the > CAT_* on a line like this: > > { "git-whatchanged", N_("Show logs with difference each commit introduces"), 0 | CAT_ancillaryinterrogators | CAT_complete }, > > But we'll still sort the actual command list, and spew it out as-is in > some places "git" and "help" output. So having it be unsorted or > shuffled wouldn't be nice. The last paragraph would be a good replacement sentence for the part of the proposed log message that made me ask the question, I would think. Thanks. ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v2 02/10] generate-cmdlist.sh: trivial whitespace change 2021-10-22 19:36 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason 2021-10-22 19:36 ` [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 ` Ævar Arnfjörð Bjarmason 2021-10-22 19:36 ` [PATCH v2 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason ` (11 subsequent siblings) 13 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason This makes a subsequent diff smaller, and won't leave us with this syntax nit at the end. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- generate-cmdlist.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index 9dbbb08e70a..5114f46680a 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -10,7 +10,7 @@ command_list () { } get_categories () { - tr ' ' '\012'| + tr ' ' '\012' | grep -v '^$' | sort | uniq -- 2.33.1.1505.g075a284c562 ^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH v2 03/10] generate-cmdlist.sh: spawn fewer processes 2021-10-22 19:36 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason 2021-10-22 19:36 ` [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason 2021-10-22 19:36 ` [PATCH v2 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 ` Ævar Arnfjörð Bjarmason 2021-10-22 19:36 ` [PATCH v2 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason ` (10 subsequent siblings) 13 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason From: Johannes Sixt <j6t@kdbg.org> The function get_categories() is invoked in a loop over all commands. As it runs several processes, this takes an awful lot of time on Windows. To reduce the number of processes, move the process that filters empty lines to the other invoker of the function, where it is needed. The invocation of get_categories() in the loop does not need the empty line filtered away because the result is word-split by the shell, which eliminates the empty line automatically. Furthermore, use sort -u instead of sort | uniq to remove yet another process. [Ævar: on Linux this seems to speed things up a bit, although with hyperfine(1) the results are fuzzy enough to land within the confidence interval]: $ git show HEAD~:generate-cmdlist.sh >generate-cmdlist.sh.old $ hyperfine --warmup 1 -L s ,.old -p 'make clean' 'sh generate-cmdlist.sh{s} command-list.txt' Benchmark #1: sh generate-cmdlist.sh command-list.txt Time (mean ± σ): 371.3 ms ± 64.2 ms [User: 430.4 ms, System: 72.5 ms] Range (min … max): 320.5 ms … 517.7 ms 10 runs Benchmark #2: sh generate-cmdlist.sh.old command-list.txt Time (mean ± σ): 489.9 ms ± 185.4 ms [User: 724.7 ms, System: 141.3 ms] Range (min … max): 346.0 ms … 885.3 ms 10 runs Summary 'sh generate-cmdlist.sh command-list.txt' ran 1.32 ± 0.55 times faster than 'sh generate-cmdlist.sh.old command-list.txt' Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- generate-cmdlist.sh | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index 5114f46680a..27367915611 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -11,15 +11,14 @@ command_list () { get_categories () { tr ' ' '\012' | - grep -v '^$' | - sort | - uniq + LC_ALL=C sort -u } category_list () { command_list "$1" | cut -c 40- | - get_categories + get_categories | + grep -v '^$' } get_synopsis () { -- 2.33.1.1505.g075a284c562 ^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH v2 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() 2021-10-22 19:36 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason ` (2 preceding siblings ...) 2021-10-22 19:36 ` [PATCH v2 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 ` Ævar Arnfjörð Bjarmason 2021-10-22 19:36 ` [PATCH v2 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason ` (9 subsequent siblings) 13 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason This isn't for optimization as the get_categories() is a purely shell function, but rather for ease of readability, let's just inline these two lines. We'll be changing this code some more in subsequent commits to make this worth it. Rename the get_categories() function to get_category_line(), since that's what it's doing now. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- generate-cmdlist.sh | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index 27367915611..16043e38476 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -9,7 +9,7 @@ command_list () { eval "grep -ve '^#' $exclude_programs" <"$1" } -get_categories () { +get_category_line () { tr ' ' '\012' | LC_ALL=C sort -u } @@ -17,7 +17,8 @@ get_categories () { category_list () { command_list "$1" | cut -c 40- | - get_categories | + tr ' ' '\012' | + LC_ALL=C sort -u | grep -v '^$' } @@ -66,7 +67,7 @@ print_command_list () { while read cmd rest do printf " { \"$cmd\", $(get_synopsis $cmd), 0" - for cat in $(echo "$rest" | get_categories) + for cat in $(echo "$rest" | get_category_line) do printf " | CAT_$cat" done -- 2.33.1.1505.g075a284c562 ^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH v2 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" 2021-10-22 19:36 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason ` (3 preceding siblings ...) 2021-10-22 19:36 ` [PATCH v2 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 ` Ævar Arnfjörð Bjarmason 2021-10-22 19:36 ` [PATCH v2 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason ` (8 subsequent siblings) 13 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason This doesn't matter for performance, but let's not include the empty lines in our sorting. This makes the intent of the code clearer. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- generate-cmdlist.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index 16043e38476..e517c33710a 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -18,8 +18,8 @@ category_list () { command_list "$1" | cut -c 40- | tr ' ' '\012' | - LC_ALL=C sort -u | - grep -v '^$' + grep -v '^$' | + LC_ALL=C sort -u } get_synopsis () { -- 2.33.1.1505.g075a284c562 ^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH v2 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature 2021-10-22 19:36 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason ` (4 preceding siblings ...) 2021-10-22 19:36 ` [PATCH v2 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 ` Ævar Arnfjörð Bjarmason 2021-10-22 19:36 ` [PATCH v2 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason ` (7 subsequent siblings) 13 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason From: Johannes Sixt <j6t@kdbg.org> This is just a small code reduction. There is a small probability that the new code breaks when the category list is empty. But that would be noticed during the compile step. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- generate-cmdlist.sh | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index e517c33710a..a1ab2b1f077 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -67,10 +67,7 @@ print_command_list () { while read cmd rest do printf " { \"$cmd\", $(get_synopsis $cmd), 0" - for cat in $(echo "$rest" | get_category_line) - do - printf " | CAT_$cat" - done + printf " | CAT_%s" $(echo "$rest" | get_category_line) echo " }," done echo "};" -- 2.33.1.1505.g075a284c562 ^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH v2 07/10] generate-cmdlist.sh: stop sorting category lines 2021-10-22 19:36 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason ` (5 preceding siblings ...) 2021-10-22 19:36 ` [PATCH v2 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 ` Ævar Arnfjörð Bjarmason 2021-10-25 16:39 ` Jeff King 2021-10-22 19:36 ` [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason ` (6 subsequent siblings) 13 siblings, 1 reply; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason In a preceding commit we changed the print_command_list() loop to use printf's auto-repeat feature. Let's now get rid of get_category_line() entirely by not sorting the categories. This will change the output of the generated code from e.g.: - { "git-apply", N_("Apply a patch to files and/or to the index"), 0 | CAT_complete | CAT_plumbingmanipulators }, To: + { "git-apply", N_("Apply a patch to files and/or to the index"), 0 | CAT_plumbingmanipulators | CAT_complete }, I.e. the categories are no longer sorted, but as they're OR'd together it won't matter for the end result. This speeds up the generate-cmdlist.sh a bit. Comparing HEAD~ (old) and "master" to this code: 'sh generate-cmdlist.sh command-list.txt' ran 1.07 ± 0.33 times faster than 'sh generate-cmdlist.sh.old command-list.txt' 1.15 ± 0.36 times faster than 'sh generate-cmdlist.sh.master command-list.txt' Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- generate-cmdlist.sh | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index a1ab2b1f077..f50112c50f8 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -9,11 +9,6 @@ command_list () { eval "grep -ve '^#' $exclude_programs" <"$1" } -get_category_line () { - tr ' ' '\012' | - LC_ALL=C sort -u -} - category_list () { command_list "$1" | cut -c 40- | @@ -67,7 +62,7 @@ print_command_list () { while read cmd rest do printf " { \"$cmd\", $(get_synopsis $cmd), 0" - printf " | CAT_%s" $(echo "$rest" | get_category_line) + printf " | CAT_%s" $rest echo " }," done echo "};" -- 2.33.1.1505.g075a284c562 ^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH v2 07/10] generate-cmdlist.sh: stop sorting category lines 2021-10-22 19:36 ` [PATCH v2 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason @ 2021-10-25 16:39 ` Jeff King 0 siblings, 0 replies; 87+ messages in thread From: Jeff King @ 2021-10-25 16:39 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Junio C Hamano, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau On Fri, Oct 22, 2021 at 09:36:11PM +0200, Ævar Arnfjörð Bjarmason wrote: > In a preceding commit we changed the print_command_list() loop to use > printf's auto-repeat feature. Let's now get rid of get_category_line() > entirely by not sorting the categories. > > This will change the output of the generated code from e.g.: > > - { "git-apply", N_("Apply a patch to files and/or to the index"), 0 | CAT_complete | CAT_plumbingmanipulators }, > > To: > > + { "git-apply", N_("Apply a patch to files and/or to the index"), 0 | CAT_plumbingmanipulators | CAT_complete }, > > I.e. the categories are no longer sorted, but as they're OR'd together > it won't matter for the end result. Thanks for picking this up. The commit message here is well explained. > This speeds up the generate-cmdlist.sh a bit. Comparing HEAD~ (old) > and "master" to this code: > > 'sh generate-cmdlist.sh command-list.txt' ran > 1.07 ± 0.33 times faster than 'sh generate-cmdlist.sh.old command-list.txt' > 1.15 ± 0.36 times faster than 'sh generate-cmdlist.sh.master command-list.txt' Curious. I get much more dramatic results (as I'd expect, as we are cutting out 2 of 3 process spawns in the loop): 'sh generate-cmdlist.sh command-list.txt' ran 2.16 ± 0.17 times faster than 'sh generate-cmdlist.sh.old command-list.txt' 2.37 ± 0.28 times faster than 'sh generate-cmdlist.sh.master command-list.txt' Either way, I think it's a good idea (and it paves the way for the next patch, where we get the biggest speedup because we stop spawning any processes at all). -Peff ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed" 2021-10-22 19:36 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason ` (6 preceding siblings ...) 2021-10-22 19:36 ` [PATCH v2 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 ` Ævar Arnfjörð Bjarmason 2021-10-25 16:46 ` Jeff King 2021-10-22 19:36 ` [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason ` (5 subsequent siblings) 13 siblings, 1 reply; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason From: Jeff King <peff@peff.net> Replace the "sed" invocation in get_synopsis() with a pure-shell version. This speeds up generate-cmdlist.sh significantly. Compared to HEAD~ (old) and "master" we are, according to hyperfine(1): 'sh generate-cmdlist.sh command-list.txt' ran 12.69 ± 5.01 times faster than 'sh generate-cmdlist.sh.old command-list.txt' 18.34 ± 3.03 times faster than 'sh generate-cmdlist.sh.master command-list.txt' Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Jeff King <peff@peff.net> --- generate-cmdlist.sh | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index f50112c50f8..9b7d6aea629 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -17,16 +17,6 @@ category_list () { LC_ALL=C sort -u } -get_synopsis () { - sed -n ' - /^NAME/,/'"$1"'/H - ${ - x - s/.*'"$1"' - \(.*\)/N_("\1")/ - p - }' "Documentation/$1.txt" -} - define_categories () { echo echo "/* Command categories */" @@ -61,7 +51,18 @@ print_command_list () { command_list "$1" | while read cmd rest do - printf " { \"$cmd\", $(get_synopsis $cmd), 0" + synopsis= + while read line + do + case "$line" in + "$cmd - "*) + synopsis=${line#$cmd - } + break + ;; + esac + done <"Documentation/$cmd.txt" + + printf '\t{ "%s", N_("%s"), 0' "$cmd" "$synopsis" printf " | CAT_%s" $rest echo " }," done -- 2.33.1.1505.g075a284c562 ^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed" 2021-10-22 19:36 ` [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason @ 2021-10-25 16:46 ` Jeff King 2021-10-25 17:52 ` Jeff King 0 siblings, 1 reply; 87+ messages in thread From: Jeff King @ 2021-10-25 16:46 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Junio C Hamano, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau On Fri, Oct 22, 2021 at 09:36:12PM +0200, Ævar Arnfjörð Bjarmason wrote: > From: Jeff King <peff@peff.net> > > Replace the "sed" invocation in get_synopsis() with a pure-shell > version. This speeds up generate-cmdlist.sh significantly. Compared to > HEAD~ (old) and "master" we are, according to hyperfine(1): Unsurprisingly I'm in favor of this. ;) Curiously again, I get more dramatic results than you: > 'sh generate-cmdlist.sh command-list.txt' ran > 12.69 ± 5.01 times faster than 'sh generate-cmdlist.sh.old command-list.txt' > 18.34 ± 3.03 times faster than 'sh generate-cmdlist.sh.master command-list.txt' 'sh generate-cmdlist.sh command-list.txt' ran 22.44 ± 13.59 times faster than 'sh generate-cmdlist.sh.old command-list.txt' 57.35 ± 34.10 times faster than 'sh generate-cmdlist.sh.master command-list.txt' It's like spawning processes is somehow faster on your machine than mine. I wonder if it's a CPU governor thing. This is a laptop, and those numbers come from using "powersave". Doing "cpufreq-set -g performance", I get: 'sh generate-cmdlist.sh command-list.txt' ran 14.35 ± 0.23 times faster than 'sh generate-cmdlist.sh.old command-list.txt' 33.15 ± 0.50 times faster than 'sh generate-cmdlist.sh.master command-list.txt' which is closer. But most notably all versions are 3-5x faster than their "powersave" counterparts. I wonder if that has been driving some of the confusion in our timings in this thread. Either way, I think this is still a good direction to go. -Peff ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed" 2021-10-25 16:46 ` Jeff King @ 2021-10-25 17:52 ` Jeff King 0 siblings, 0 replies; 87+ messages in thread From: Jeff King @ 2021-10-25 17:52 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Junio C Hamano, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau On Mon, Oct 25, 2021 at 12:46:45PM -0400, Jeff King wrote: > It's like spawning processes is somehow faster on your machine than > mine. I wonder if it's a CPU governor thing. This is a laptop, and those > numbers come from using "powersave". Doing "cpufreq-set -g performance", > I get: > > 'sh generate-cmdlist.sh command-list.txt' ran > 14.35 ± 0.23 times faster than 'sh generate-cmdlist.sh.old command-list.txt' > 33.15 ± 0.50 times faster than 'sh generate-cmdlist.sh.master command-list.txt' > > which is closer. But most notably all versions are 3-5x faster than > their "powersave" counterparts. I wonder if that has been driving some > of the confusion in our timings in this thread. BTW, I should have mentioned these governors are used with the intel_pstate driver. So "powersave" here is not the default "use the lowest frequency" governor in linux, but the pstate-specific one which is more like "ondemand". Not that I expect anybody to look into or pontificate on governor issues, but I wanted to make sure I didn't confuse anyone. -Peff ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version 2021-10-22 19:36 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason ` (7 preceding siblings ...) 2021-10-22 19:36 ` [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 ` Ævar Arnfjörð Bjarmason 2021-10-23 22:19 ` Junio C Hamano 2021-10-23 22:26 ` Junio C Hamano 2021-10-22 19:36 ` [PATCH v2 10/10] generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell Ævar Arnfjörð Bjarmason ` (4 subsequent siblings) 13 siblings, 2 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason Replace the "grep" we run to exclude certain programs from the generated output with a pure-shell loop that strips out the comments, and sees if the "cmd" we're reading is on a list of excluded programs. This uses a trick similar to test_have_prereq() in test-lib-functions.sh. On my *nix system this makes things quite a bit slower compared to HEAD~, but since the generate-cmdlist.sh is already quite fast, and this likely helps systems where command invocations are more expensive (i.e. Windows) let's use this anyway. 'sh generate-cmdlist.sh.old command-list.txt' ran 1.56 ± 0.11 times faster than 'sh generate-cmdlist.sh command-list.txt' 18.00 ± 0.19 times faster than 'sh generate-cmdlist.sh.master command-list.txt' Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- generate-cmdlist.sh | 27 +++++++++++++++++++++------ 1 file changed, 21 insertions(+), 6 deletions(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index 9b7d6aea629..2b184bbc65f 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -6,12 +6,27 @@ die () { } command_list () { - eval "grep -ve '^#' $exclude_programs" <"$1" + while read cmd rest + do + case "$cmd" in + "#"*) + continue; + ;; + *) + case "$exclude_programs" in + *":$cmd:"*) + ;; + *) + echo "$cmd $rest" + ;; + esac + esac + done } category_list () { - command_list "$1" | - cut -c 40- | + command_list <"$1" | + cut -d' ' -f2- | tr ' ' '\012' | grep -v '^$' | LC_ALL=C sort -u @@ -48,7 +63,7 @@ define_category_names () { print_command_list () { echo "static struct cmdname_help command_list[] = {" - command_list "$1" | + command_list <"$1" | while read cmd rest do synopsis= @@ -69,11 +84,11 @@ print_command_list () { echo "};" } -exclude_programs= +exclude_programs=: while test "--exclude-program" = "$1" do shift - exclude_programs="$exclude_programs -e \"^$1 \"" + exclude_programs="$exclude_programs$1:" shift done -- 2.33.1.1505.g075a284c562 ^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version 2021-10-22 19:36 ` [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason @ 2021-10-23 22:19 ` Junio C Hamano 2021-10-23 22:26 ` Junio C Hamano 1 sibling, 0 replies; 87+ messages in thread From: Junio C Hamano @ 2021-10-23 22:19 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Replace the "grep" we run to exclude certain programs from the > generated output with a pure-shell loop that strips out the comments, > and sees if the "cmd" we're reading is on a list of excluded > programs. This uses a trick similar to test_have_prereq() in > test-lib-functions.sh. > > On my *nix system this makes things quite a bit slower compared to > HEAD~, but since the generate-cmdlist.sh is already quite fast, and > this likely helps systems where command invocations are more > expensive (i.e. Windows) let's use this anyway. > > 'sh generate-cmdlist.sh.old command-list.txt' ran > 1.56 ± 0.11 times faster than 'sh generate-cmdlist.sh command-list.txt' > 18.00 ± 0.19 times faster than 'sh generate-cmdlist.sh.master command-list.txt' > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > --- > generate-cmdlist.sh | 27 +++++++++++++++++++++------ > 1 file changed, 21 insertions(+), 6 deletions(-) > > diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh > index 9b7d6aea629..2b184bbc65f 100755 > --- a/generate-cmdlist.sh > +++ b/generate-cmdlist.sh > @@ -6,12 +6,27 @@ die () { > } > > command_list () { > - eval "grep -ve '^#' $exclude_programs" <"$1" > + while read cmd rest > + do > + case "$cmd" in > + "#"*) > + continue; > + ;; > + *) > + case "$exclude_programs" in > + *":$cmd:"*) > + ;; Funny indentation. > + *) > + echo "$cmd $rest" > + ;; > + esac > + esac > + done > } > category_list () { > - command_list "$1" | > + command_list <"$1" | This change is unnecessary if you did while read cmd rest do ... done <"$1" to keep the external interface to the command_list helper unchanged. > - cut -c 40- | > + cut -d' ' -f2- | Is this just a subjective preference or a logical consequence of how the output from command_list looks like got somehow changed? > @@ -48,7 +63,7 @@ define_category_names () { > print_command_list () { > echo "static struct cmdname_help command_list[] = {" > > - command_list "$1" | > + command_list <"$1" | Ditto. > -exclude_programs= > +exclude_programs=: > while test "--exclude-program" = "$1" > do > shift > - exclude_programs="$exclude_programs -e \"^$1 \"" > + exclude_programs="$exclude_programs$1:" > shift > done ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version 2021-10-22 19:36 ` [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason 2021-10-23 22:19 ` Junio C Hamano @ 2021-10-23 22:26 ` Junio C Hamano 1 sibling, 0 replies; 87+ messages in thread From: Junio C Hamano @ 2021-10-23 22:26 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > + "#"*) > + continue; > + ;; Stray ';' after continue. ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v2 10/10] generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell 2021-10-22 19:36 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason ` (8 preceding siblings ...) 2021-10-22 19:36 ` [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 ` Ævar Arnfjörð Bjarmason 2021-10-23 22:26 ` Junio C Hamano 2021-10-22 21:20 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Taylor Blau ` (3 subsequent siblings) 13 siblings, 1 reply; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-10-22 19:36 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason Extend the pure-shell parsing of command-list.txt by using having command_list() take an argument indicating whether we're interested in the "$cmd" part of the line, or just the "$rest". That takes care of the "cut -d", and printf's auto-repeat feature can replace the "tr". We don't need the "grep -v" either, as we're not emitting any empty lines here (the command-list.txt doesn't have any). This doesn't make things any faster or slower in my tests, but as with the preceding commit let's do it just to get rid of command invocations, it'll probably help on e.g. Windows. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- generate-cmdlist.sh | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index 2b184bbc65f..394443c66df 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -17,7 +17,12 @@ command_list () { *":$cmd:"*) ;; *) - echo "$cmd $rest" + if test -n "$1" + then + printf "%s\n" $rest + else + echo "$cmd $rest" + fi ;; esac esac @@ -25,10 +30,7 @@ command_list () { } category_list () { - command_list <"$1" | - cut -d' ' -f2- | - tr ' ' '\012' | - grep -v '^$' | + command_list --no-cat <"$1" | LC_ALL=C sort -u } -- 2.33.1.1505.g075a284c562 ^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH v2 10/10] generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell 2021-10-22 19:36 ` [PATCH v2 10/10] generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell Ævar Arnfjörð Bjarmason @ 2021-10-23 22:26 ` Junio C Hamano 0 siblings, 0 replies; 87+ messages in thread From: Junio C Hamano @ 2021-10-23 22:26 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Extend the pure-shell parsing of command-list.txt by using having using having??? > command_list() take an argument indicating whether we're interested in > the "$cmd" part of the line, or just the "$rest". OK, --no-cat stands for --no-category? Even if (or especially if, perhaps) you do not bother to parse the option in the command_list helper, it would help the readers if it is spelled out. I somehow thought if this option has anything to do with "/bin/cat". > That takes care of the "cut -d", and printf's auto-repeat feature can > replace the "tr". We don't need the "grep -v" either, as we're not > emitting any empty lines here (the command-list.txt doesn't have any). It may make sense to ensure that the case arm won't feed an empty line that made cmd an empty by tightening the condition. case "$cmd" in "#"*) continue ;; - *) + ?*) case "$exclude_programs" in *:"$cmd":*) ;; If anything, that would serve as a clear documentation that we are safe even when the input has an empty line. ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster 2021-10-22 19:36 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason ` (9 preceding siblings ...) 2021-10-22 19:36 ` [PATCH v2 10/10] generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell Ævar Arnfjörð Bjarmason @ 2021-10-22 21:20 ` Taylor Blau 2021-10-23 22:34 ` Junio C Hamano ` (2 subsequent siblings) 13 siblings, 0 replies; 87+ messages in thread From: Taylor Blau @ 2021-10-22 21:20 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine On Fri, Oct 22, 2021 at 09:36:04PM +0200, Ævar Arnfjörð Bjarmason wrote: > This version of this series drops the Makefile-powered version of the > cmdlist in favor of making the shellscript much faster, mostly with > suggestions from Jeff King. I'd be happy with the version that you suggest here, since at least the net-effect is that generate-commandlist.sh is faster than before without much additional complexity. That said, I did find the structure of these patches somewhat confusing. There is a lot of refactoring of get_categories() and related functions, and those patches were a little tricky to read for me. I wonder how much could be cleaned up by placing "generate-cmdlist.sh: stop sorting category lines" earlier in the series, getting rid of the caller. It's too bad that the penultimate patch slowed things down a bit, but I think that things are so fast now that much more discussion in this area is really just splitting hairs. It would be interesting to hear from somebody on Windows whether or not the speed-up there was worth it (otherwise dropping that patch might make sense). Anyway, I think we've all spent a lot of time discussing a rather straightforward set of patches ;-). So this version looks good to me. Thanks, Taylor ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster 2021-10-22 19:36 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason ` (10 preceding siblings ...) 2021-10-22 21:20 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Taylor Blau @ 2021-10-23 22:34 ` Junio C Hamano 2021-10-25 16:57 ` Jeff King 2021-11-05 14:07 ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason 13 siblings, 0 replies; 87+ messages in thread From: Junio C Hamano @ 2021-10-23 22:34 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Subject: Re: [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Stale topic as there is no change to the Makefile. > This version of this series drops the Makefile-powered version of the > cmdlist in favor of making the shellscript much faster, mostly with > suggestions from Jeff King. OK. I looked at the whole thing and it looked almost done, modulo just a little breakages here and there whose corrections should be fairly obvious. Thanks. ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster 2021-10-22 19:36 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason ` (11 preceding siblings ...) 2021-10-23 22:34 ` Junio C Hamano @ 2021-10-25 16:57 ` Jeff King 2021-11-05 14:07 ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason 13 siblings, 0 replies; 87+ messages in thread From: Jeff King @ 2021-10-25 16:57 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Junio C Hamano, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau On Fri, Oct 22, 2021 at 09:36:04PM +0200, Ævar Arnfjörð Bjarmason wrote: > This version of this series drops the Makefile-powered version of the > cmdlist in favor of making the shellscript much faster, mostly with > suggestions from Jeff King. > > I still think that splitting out the generated data into files may be > useful for unifying the Documentation/ and C code build processes, > there's another custom parser for command-list.txt in > Documentation/cmd-list.perl. > > But if and when I've got something for that I can dig that out of the > v1, in the meantime the v1 of this should be mostly uncontroversial. Thanks, up through patch 8 this all looks good to me. > The last tow patches make things a bit slower for me, but since they > replace command invocations with pure-shell logic they presumably make > things a bit less painful on e.g. Windows, and the 8th patch here > already made things quite very fast already. These ones I could take or leave. They probably do help a little on Windows, but I'm much more concerned about O(nr_of_commands) process invocations than I am in reducing the base number of invocations (because one gives a 169x speedup over the other). And in patch 9 in particular, we're trading a grep one-liner for a much-longer shell loop. And I don't think this is hypocritical with respect to patch 8; there we are replacing ugly sed with ugly shell, and the speed benefit is clear and large. -Peff ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster 2021-10-22 19:36 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason ` (12 preceding siblings ...) 2021-10-25 16:57 ` Jeff King @ 2021-11-05 14:07 ` Ævar Arnfjörð Bjarmason 2021-11-05 14:07 ` [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason ` (9 more replies) 13 siblings, 10 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:07 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason This series makes the rather slow generate-cmdlist.sh shellscript run in ~20ms on my box instead of ~400ms. This helps a lot with "make" runtime, e.g. during interactive rebase. This hopefully addresses all the feedback on v2. I kept the penultimate patch as explained in an updated commit message it helped quite a bit in the CI environment, and presumably on various other setup. There's also a new ~1.3x speedup of "generate-cmdlist.sh: don't parse command-list.txt thrice" new at the end here, but on some faster boxes that's against a relative slowdown in 9/10 compared to Jeff's 8/10[1]. I opted to keep 9/10 to give slower systems where process spawning isn't as fast the benefit of the doubt. We spend less user time in 10/10 than in 8/10, but more system time. 1. This is HEAD~n at the tip of the series. So .2 = 8/10, .1 = 9/10 '' = 10/10 $ hyperfine --warmup 20 -L v .2,.1, 'sh generate-cmdlist.sh{v} command-list.txt' Benchmark #1: sh generate-cmdlist.sh.2 command-list.txt Time (mean ± σ): 19.5 ms ± 0.2 ms [User: 16.7 ms, System: 8.1 ms] Range (min … max): 18.9 ms … 20.5 ms 151 runs Benchmark #2: sh generate-cmdlist.sh.1 command-list.txt Time (mean ± σ): 30.1 ms ± 0.3 ms [User: 24.8 ms, System: 17.1 ms] Range (min … max): 29.3 ms … 31.3 ms 97 runs Benchmark #3: sh generate-cmdlist.sh command-list.txt Time (mean ± σ): 22.8 ms ± 0.3 ms [User: 15.2 ms, System: 10.1 ms] Range (min … max): 22.5 ms … 23.7 ms 125 runs Summary 'sh generate-cmdlist.sh.2 command-list.txt' ran 1.17 ± 0.02 times faster than 'sh generate-cmdlist.sh command-list.txt' 1.54 ± 0.02 times faster than 'sh generate-cmdlist.sh.1 command-list.txt' Jeff King (1): generate-cmdlist.sh: do not shell out to "sed" Johannes Sixt (2): generate-cmdlist.sh: spawn fewer processes generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason (7): command-list.txt: sort with "LC_ALL=C sort" generate-cmdlist.sh: trivial whitespace change generate-cmdlist.sh: don't call get_categories() from category_list() generate-cmdlist.sh: run "grep | sort", not "sort | grep" generate-cmdlist.sh: stop sorting category lines generate-cmdlist.sh: replace "grep' invocation with a shell version generate-cmdlist.sh: don't parse command-list.txt thrice command-list.txt | 22 ++++++------- generate-cmdlist.sh | 78 ++++++++++++++++++++++++++------------------- 2 files changed, 56 insertions(+), 44 deletions(-) Range-diff against v2: 1: 96885282988 ! 1: c385e84c04c command-list.txt: sort with "LC_ALL=C sort" @@ Commit message separate step, right now the generate-cmdlist.sh script just uses the order found in this file. + Note that this refers to the sort order of the lines in + command-list.txt, a subsequent commit will also change how we treat + the sort order of the "category" fields, but that's unrelated to this + change. + Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> ## command-list.txt ## @@ command-list.txt: git-request-pull foreignscminterface git-show-ref plumbinginterrogators -git-sh-i18n purehelpers -git-sh-setup purehelpers - git-sparse-checkout mainporcelain worktree + git-sparse-checkout mainporcelain -git-stash mainporcelain git-stage complete +git-stash mainporcelain @@ command-list.txt: git-var plumbinginterrogators git-whatchanged ancillaryinterrogators complete git-worktree mainporcelain git-write-tree plumbingmanipulators +@@ command-list.txt: gitfaq guide + gitglossary guide + githooks guide + gitignore guide +gitk mainporcelain -+gitweb ancillaryinterrogators - gitattributes guide - gitcli guide - gitcore-tutorial guide + gitmailmap guide + gitmodules guide + gitnamespaces guide @@ command-list.txt: gitremote-helpers guide gitrepository-layout guide gitrevisions guide @@ command-list.txt: gitremote-helpers guide -gittutorial-2 guide gittutorial guide +gittutorial-2 guide ++gitweb ancillaryinterrogators gitworkflows guide 2: 5e8fef90e42 ! 2: b4b4c3aa135 generate-cmdlist.sh: trivial whitespace change @@ Metadata ## Commit message ## generate-cmdlist.sh: trivial whitespace change - This makes a subsequent diff smaller, and won't leave us with this - syntax nit at the end. + Add " " before a "|" at the end of a line in generate-cmdlist.sh for + consistency with other code in the file. Some of the surrounding code + will be modified in subsequent commits. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> 3: 6b4de6a6088 = 3: 737cca59d99 generate-cmdlist.sh: spawn fewer processes 4: 074685cf714 = 4: 6ad17ab56c2 generate-cmdlist.sh: don't call get_categories() from category_list() 5: f01c1fd8088 = 5: d7be565b567 generate-cmdlist.sh: run "grep | sort", not "sort | grep" 6: e0b11514b8d = 6: 646363db11f generate-cmdlist.sh: replace for loop by printf's auto-repeat feature 7: f2f37c2963b = 7: d8cc7c246b8 generate-cmdlist.sh: stop sorting category lines 8: 83318d6c0da = 8: aeeecc575fb generate-cmdlist.sh: do not shell out to "sed" 9: 7903dd1f8c2 ! 9: e2702bcc1d0 generate-cmdlist.sh: replace "grep' invocation with a shell version @@ Commit message test-lib-functions.sh. On my *nix system this makes things quite a bit slower compared to - HEAD~, but since the generate-cmdlist.sh is already quite fast, and - this likely helps systems where command invocations are more - expensive (i.e. Windows) let's use this anyway. - + HEAD~: + o 'sh generate-cmdlist.sh.old command-list.txt' ran 1.56 ± 0.11 times faster than 'sh generate-cmdlist.sh command-list.txt' 18.00 ± 0.19 times faster than 'sh generate-cmdlist.sh.master command-list.txt' - Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> + But when I tried running generate-cmdlist.sh 100 times in CI I found + that it helped across the board even on OSX & Linux. I tried testing + it in CI with this ad-hoc few-liner: + + for i in $(seq -w 0 11 | sort -nr) + do + git show HEAD~$i:generate-cmdlist.sh >generate-cmdlist-HEAD$i.sh && + git add generate-cmdlist* && + cp t/t0000-generate-cmdlist.sh t/t00$i-generate-cmdlist.sh || : && + perl -pi -e "s/HEAD0/HEAD$i/g" t/t00$i-generate-cmdlist.sh && + git add t/t00*.sh + done && git commit -m"generated it" + + Here HEAD~02 and the t0002* file refers to this change, and HEAD~03 + and t0003* file to the preceding commit, the relevant results were: + + linux-gcc: + + [12:05:33] t0002-generate-cmdlist.sh .. ok 14 ms ( 0.00 usr 0.00 sys + 3.64 cusr 3.09 csys = 6.73 CPU) + [12:05:30] t0003-generate-cmdlist.sh .. ok 32 ms ( 0.00 usr 0.00 sys + 2.66 cusr 1.81 csys = 4.47 CPU) + + osx-gcc: + + [11:58:04] t0002-generate-cmdlist.sh .. ok 80081 ms ( 0.02 usr 0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU) + [11:58:16] t0003-generate-cmdlist.sh .. ok 92127 ms ( 0.02 usr 0.01 sys + 22.54 cusr 14.27 csys = 36.84 CPU) + + vs-test: + + [12:03:14] t0002-generate-cmdlist.sh .. ok 30 s ( 0.02 usr 0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU) + [12:03:20] t0003-generate-cmdlist.sh .. ok 32 s ( 0.00 usr 0.02 sys + 13.25 cusr 26.10 csys = 39.37 CPU) + + I.e. even on *nix running 100 of these in a loop was up to ~2x faster + in absolute runtime, I suspect it's due factors that are exacerbated + in the CI, e.g. much slower process startup due to some platform + limits, or a slower FS. + + The "cut -d" change here is because we're not emitting the + 40-character aligned output anymore, i.e. we'll get the output from + command_list() now, not an as-is line from command-list.txt. + + This also makes the parsing more reliable, as we could tweak the + whitespace alignment without breaking this parser. Let's reword a + now-inaccurate comment in "command-list.txt" describing that previous + alignment limitation. We'll still need the "### command-list [...]" + line due to the "Documentation/cmd-list.perl" logic added in + 11c6659d85d (command-list: prepare machinery for upcoming "common + groups" section, 2015-05-21). + + There was a proposed change subsequent to this one[3] which continued + moving more logic into the "command_list() function, i.e. replaced the + "cut | tr | grep" chain in "category_list()" with an argument to + "command_list()". + + That change might have had a bit of an effect, but not as much as the + preceding commit, so I decided to drop it. The relevant performance + numbers from it were: + + linux-gcc: + + [12:05:33] t0001-generate-cmdlist.sh .. ok 13 ms ( 0.00 usr 0.00 sys + 3.33 cusr 2.78 csys = 6.11 CPU) + [12:05:33] t0002-generate-cmdlist.sh .. ok 14 ms ( 0.00 usr 0.00 sys + 3.64 cusr 3.09 csys = 6.73 CPU) + + osx-gcc: + + [11:58:03] t0001-generate-cmdlist.sh .. ok 78416 ms ( 0.02 usr 0.01 sys + 11.78 cusr 6.22 csys = 18.03 CPU) + [11:58:04] t0002-generate-cmdlist.sh .. ok 80081 ms ( 0.02 usr 0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU) + + vs-test: + + [12:03:20] t0001-generate-cmdlist.sh .. ok 34 s ( 0.00 usr 0.03 sys + 12.42 cusr 19.55 csys = 32.00 CPU) + [12:03:14] t0002-generate-cmdlist.sh .. ok 30 s ( 0.02 usr 0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU) + + As above HEAD~2 and t0002* are testing the code in this commit (and + the line is the same), but HEAD~1 and t0001* are testing that dropped + change in [3]. + + 1. https://lore.kernel.org/git/cover-v2-00.10-00000000000-20211022T193027Z-avarab@gmail.com/ + 2. https://lore.kernel.org/git/patch-v2-08.10-83318d6c0da-20211022T193027Z-avarab@gmail.com/ + 3. https://lore.kernel.org/git/patch-v2-10.10-e10a43756d1-20211022T193027Z-avarab@gmail.com/ + + ## command-list.txt ## +@@ + # specified here, which can only have "guide" attribute and nothing + # else. + # +-### command list (do not change this line, also do not change alignment) ++### command list (do not change this line) + # command name category [category] [category] + git-add mainporcelain worktree + git-am mainporcelain ## generate-cmdlist.sh ## @@ generate-cmdlist.sh: die () { @@ generate-cmdlist.sh: die () { + while read cmd rest + do + case "$cmd" in -+ "#"*) -+ continue; ++ "#"* | '') ++ # Ignore comments and allow empty lines ++ continue + ;; + *) + case "$exclude_programs" in -+ *":$cmd:"*) ++ *":$cmd:"*) + ;; + *) + echo "$cmd $rest" + ;; + esac + esac -+ done ++ done <"$1" } category_list () { -- command_list "$1" | + command_list "$1" | - cut -c 40- | -+ command_list <"$1" | + cut -d' ' -f2- | tr ' ' '\012' | grep -v '^$' | LC_ALL=C sort -u -@@ generate-cmdlist.sh: define_category_names () { - print_command_list () { - echo "static struct cmdname_help command_list[] = {" - -- command_list "$1" | -+ command_list <"$1" | - while read cmd rest - do - synopsis= @@ generate-cmdlist.sh: print_command_list () { echo "};" } 10: e10a43756d1 < -: ----------- generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell -: ----------- > 10: 100084070fd generate-cmdlist.sh: don't parse command-list.txt thrice -- 2.34.0.rc1.721.ga0c1db665bc ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" 2021-11-05 14:07 ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason @ 2021-11-05 14:07 ` Ævar Arnfjörð Bjarmason 2021-11-05 22:45 ` Junio C Hamano 2021-11-05 14:08 ` [PATCH v3 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason ` (8 subsequent siblings) 9 siblings, 1 reply; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:07 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason We should keep these files sorted in the C locale, e.g. in the C locale the order is: git-check-mailmap git-check-ref-format git-checkout But under en_US.UTF-8 it's: git-check-mailmap git-checkout git-check-ref-format In a subsequent commit I'll change generate-cmdlist.sh to use C sort order, and without this change we'd be led to believe that that change caused a meaningful change in the output, so let's do this as a separate step, right now the generate-cmdlist.sh script just uses the order found in this file. Note that this refers to the sort order of the lines in command-list.txt, a subsequent commit will also change how we treat the sort order of the "category" fields, but that's unrelated to this change. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- command-list.txt | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/command-list.txt b/command-list.txt index eb9cee8dee9..04cde20c3da 100644 --- a/command-list.txt +++ b/command-list.txt @@ -60,9 +60,9 @@ git-cat-file plumbinginterrogators git-check-attr purehelpers git-check-ignore purehelpers git-check-mailmap purehelpers +git-check-ref-format purehelpers git-checkout mainporcelain git-checkout-index plumbingmanipulators -git-check-ref-format purehelpers git-cherry plumbinginterrogators complete git-cherry-pick mainporcelain git-citool mainporcelain @@ -111,7 +111,6 @@ git-index-pack plumbingmanipulators git-init mainporcelain init git-instaweb ancillaryinterrogators complete git-interpret-trailers purehelpers -gitk mainporcelain git-log mainporcelain info git-ls-files plumbinginterrogators git-ls-remote plumbinginterrogators @@ -124,11 +123,11 @@ git-merge-base plumbinginterrogators git-merge-file plumbingmanipulators git-merge-index plumbingmanipulators git-merge-one-file purehelpers -git-mergetool ancillarymanipulators complete git-merge-tree ancillaryinterrogators -git-multi-pack-index plumbingmanipulators +git-mergetool ancillarymanipulators complete git-mktag plumbingmanipulators git-mktree plumbingmanipulators +git-multi-pack-index plumbingmanipulators git-mv mainporcelain worktree git-name-rev plumbinginterrogators git-notes mainporcelain @@ -154,23 +153,23 @@ git-request-pull foreignscminterface complete git-rerere ancillaryinterrogators git-reset mainporcelain history git-restore mainporcelain worktree -git-revert mainporcelain git-rev-list plumbinginterrogators git-rev-parse plumbinginterrogators +git-revert mainporcelain git-rm mainporcelain worktree git-send-email foreignscminterface complete git-send-pack synchingrepositories +git-sh-i18n purehelpers +git-sh-setup purehelpers git-shell synchelpers git-shortlog mainporcelain git-show mainporcelain info git-show-branch ancillaryinterrogators complete git-show-index plumbinginterrogators git-show-ref plumbinginterrogators -git-sh-i18n purehelpers -git-sh-setup purehelpers git-sparse-checkout mainporcelain -git-stash mainporcelain git-stage complete +git-stash mainporcelain git-status mainporcelain info git-stripspace purehelpers git-submodule mainporcelain @@ -189,7 +188,6 @@ git-var plumbinginterrogators git-verify-commit ancillaryinterrogators git-verify-pack plumbinginterrogators git-verify-tag ancillaryinterrogators -gitweb ancillaryinterrogators git-whatchanged ancillaryinterrogators complete git-worktree mainporcelain git-write-tree plumbingmanipulators @@ -204,6 +202,7 @@ gitfaq guide gitglossary guide githooks guide gitignore guide +gitk mainporcelain gitmailmap guide gitmodules guide gitnamespaces guide @@ -211,6 +210,7 @@ gitremote-helpers guide gitrepository-layout guide gitrevisions guide gitsubmodules guide -gittutorial-2 guide gittutorial guide +gittutorial-2 guide +gitweb ancillaryinterrogators gitworkflows guide -- 2.34.0.rc1.721.ga0c1db665bc ^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" 2021-11-05 14:07 ` [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason @ 2021-11-05 22:45 ` Junio C Hamano 2021-11-06 4:26 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 87+ messages in thread From: Junio C Hamano @ 2021-11-05 22:45 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > In a subsequent commit I'll change generate-cmdlist.sh to use C sort > order, and without this change we'd be led to believe that that change > caused a meaningful change in the output, What I found misleading in this sentence (which hasn't changed after I pointed it out in the previous iterations) is that command-list.txt is an input file, and if the sort order used in the script that reads this to produce some other file as its output changes, nobody will be "led to believe" anything. Not unless/until which output file to look at and compare between revisions. Is this talking about the order of entries in command-list.h file? Also, if the script sorts the input, whether it is in C locale or other locale, it would not matter how the input originally is sorted, as the input does not have duplicated entries to make sort stability matter, no? ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" 2021-11-05 22:45 ` Junio C Hamano @ 2021-11-06 4:26 ` Ævar Arnfjörð Bjarmason 2021-11-08 19:18 ` Junio C Hamano 0 siblings, 1 reply; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-11-06 4:26 UTC (permalink / raw) To: Junio C Hamano Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau On Fri, Nov 05 2021, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> In a subsequent commit I'll change generate-cmdlist.sh to use C sort >> order, and without this change we'd be led to believe that that change >> caused a meaningful change in the output, > > What I found misleading in this sentence (which hasn't changed after > I pointed it out in the previous iterations) is that[...] I tried to clarify what you raised in the previous iteration with the new paragraph after the one you're quoting. I.e.: Note that this refers to the sort order of the lines in command-list.txt, a subsequent commit will also change how we treat the sort order of the "category" fields, but that's unrelated to this change. > command-list.txt is an input file, and if the sort order used in the > script that reads this to produce some other file as its output > changes, nobody will be "led to believe" anything. Not unless/until > which output file to look at and compare between revisions. I'd read your comment on the previous iteration as you being unclear on whether we were changing the sort order of lines in the file, or the category fields found within those lines. We don't sort the lines, and never have, just the fields, and within this series we stop doing that sorting (as it ends up in an OR'd bitfield, so it doesn't matter). > Is this talking about the order of entries in command-list.h file? > > Also, if the script sorts the input, whether it is in C locale or > other locale, it would not matter how the input originally is > sorted, as the input does not have duplicated entries to make sort > stability matter, no? This change is just to make this consistent for human editors. I think we re-sort this wherever we display this in git, whether that's via help.c or the completion powered by git.c. ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" 2021-11-06 4:26 ` Ævar Arnfjörð Bjarmason @ 2021-11-08 19:18 ` Junio C Hamano 0 siblings, 0 replies; 87+ messages in thread From: Junio C Hamano @ 2021-11-08 19:18 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > On Fri, Nov 05 2021, Junio C Hamano wrote: > >> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> >>> In a subsequent commit I'll change generate-cmdlist.sh to use C sort >>> order, and without this change we'd be led to believe that that change >>> caused a meaningful change in the output, >> >> What I found misleading in this sentence (which hasn't changed after >> I pointed it out in the previous iterations) is that[...] > > I tried to clarify what you raised in the previous iteration with the > new paragraph after the one you're quoting. I.e.: > > Note that this refers to the sort order of the lines in > command-list.txt, a subsequent commit will also change how we treat > the sort order of the "category" fields, but that's unrelated to this > change. Yeah, but my question was not about the order of category tokens on each line. My question was the order of these lines in the file. The new pargraph is answering a question that wasn't asked. >> Is this talking about the order of entries in command-list.h file? >> >> Also, if the script sorts the input, whether it is in C locale or >> other locale, it would not matter how the input originally is >> sorted, as the input does not have duplicated entries to make sort >> stability matter, no? > > This change is just to make this consistent for human editors. I think > we re-sort this wherever we display this in git, whether that's via > help.c or the completion powered by git.c. So >> order, and without this change we'd be led to believe that that change >> caused a meaningful change in the output, is something irrelevant to explain what this change is, no? ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v3 02/10] generate-cmdlist.sh: trivial whitespace change 2021-11-05 14:07 ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason 2021-11-05 14:07 ` [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 ` Ævar Arnfjörð Bjarmason 2021-11-05 14:08 ` [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason ` (7 subsequent siblings) 9 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason Add " " before a "|" at the end of a line in generate-cmdlist.sh for consistency with other code in the file. Some of the surrounding code will be modified in subsequent commits. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- generate-cmdlist.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index 9dbbb08e70a..5114f46680a 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -10,7 +10,7 @@ command_list () { } get_categories () { - tr ' ' '\012'| + tr ' ' '\012' | grep -v '^$' | sort | uniq -- 2.34.0.rc1.721.ga0c1db665bc ^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes 2021-11-05 14:07 ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason 2021-11-05 14:07 ` [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason 2021-11-05 14:08 ` [PATCH v3 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 ` Ævar Arnfjörð Bjarmason 2021-11-05 22:47 ` Junio C Hamano 2021-11-05 14:08 ` [PATCH v3 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason ` (6 subsequent siblings) 9 siblings, 1 reply; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason From: Johannes Sixt <j6t@kdbg.org> The function get_categories() is invoked in a loop over all commands. As it runs several processes, this takes an awful lot of time on Windows. To reduce the number of processes, move the process that filters empty lines to the other invoker of the function, where it is needed. The invocation of get_categories() in the loop does not need the empty line filtered away because the result is word-split by the shell, which eliminates the empty line automatically. Furthermore, use sort -u instead of sort | uniq to remove yet another process. [Ævar: on Linux this seems to speed things up a bit, although with hyperfine(1) the results are fuzzy enough to land within the confidence interval]: $ git show HEAD~:generate-cmdlist.sh >generate-cmdlist.sh.old $ hyperfine --warmup 1 -L s ,.old -p 'make clean' 'sh generate-cmdlist.sh{s} command-list.txt' Benchmark #1: sh generate-cmdlist.sh command-list.txt Time (mean ± σ): 371.3 ms ± 64.2 ms [User: 430.4 ms, System: 72.5 ms] Range (min … max): 320.5 ms … 517.7 ms 10 runs Benchmark #2: sh generate-cmdlist.sh.old command-list.txt Time (mean ± σ): 489.9 ms ± 185.4 ms [User: 724.7 ms, System: 141.3 ms] Range (min … max): 346.0 ms … 885.3 ms 10 runs Summary 'sh generate-cmdlist.sh command-list.txt' ran 1.32 ± 0.55 times faster than 'sh generate-cmdlist.sh.old command-list.txt' Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- generate-cmdlist.sh | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index 5114f46680a..27367915611 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -11,15 +11,14 @@ command_list () { get_categories () { tr ' ' '\012' | - grep -v '^$' | - sort | - uniq + LC_ALL=C sort -u } category_list () { command_list "$1" | cut -c 40- | - get_categories + get_categories | + grep -v '^$' } get_synopsis () { -- 2.34.0.rc1.721.ga0c1db665bc ^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes 2021-11-05 14:08 ` [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason @ 2021-11-05 22:47 ` Junio C Hamano 2021-11-06 4:23 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 87+ messages in thread From: Junio C Hamano @ 2021-11-05 22:47 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh > index 5114f46680a..27367915611 100755 > --- a/generate-cmdlist.sh > +++ b/generate-cmdlist.sh > @@ -11,15 +11,14 @@ command_list () { > > get_categories () { > tr ' ' '\012' | > - grep -v '^$' | > - sort | > - uniq > + LC_ALL=C sort -u > } > > category_list () { > command_list "$1" | > cut -c 40- | > - get_categories > + get_categories | > + grep -v '^$' > } It is funny that this changes "grep then sort" into "sort then grep", which will be "corrected" in two steps down. The series seems a bit over-engineered and broken down too much, at least to me, but let's not waste any more time on it by an extra reroll. ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes 2021-11-05 22:47 ` Junio C Hamano @ 2021-11-06 4:23 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-11-06 4:23 UTC (permalink / raw) To: Junio C Hamano Cc: git, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau On Fri, Nov 05 2021, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh >> index 5114f46680a..27367915611 100755 >> --- a/generate-cmdlist.sh >> +++ b/generate-cmdlist.sh >> @@ -11,15 +11,14 @@ command_list () { >> >> get_categories () { >> tr ' ' '\012' | >> - grep -v '^$' | >> - sort | >> - uniq >> + LC_ALL=C sort -u >> } >> >> category_list () { >> command_list "$1" | >> cut -c 40- | >> - get_categories >> + get_categories | >> + grep -v '^$' >> } > > It is funny that this changes "grep then sort" into "sort then > grep", which will be "corrected" in two steps down. The series > seems a bit over-engineered and broken down too much, at least to > me, but let's not waste any more time on it by an extra reroll. Yes, it's a bit of back and forth, but I didn't want to outright drop Johannes's patches which I'd integrated here, and thought it would be helpful to others to distill the history of various optimization steps (starting with Johannes's work here) into the permanent commit history. ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v3 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() 2021-11-05 14:07 ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason ` (2 preceding siblings ...) 2021-11-05 14:08 ` [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 ` Ævar Arnfjörð Bjarmason 2021-11-05 14:08 ` [PATCH v3 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason ` (5 subsequent siblings) 9 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason This isn't for optimization as the get_categories() is a purely shell function, but rather for ease of readability, let's just inline these two lines. We'll be changing this code some more in subsequent commits to make this worth it. Rename the get_categories() function to get_category_line(), since that's what it's doing now. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- generate-cmdlist.sh | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index 27367915611..16043e38476 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -9,7 +9,7 @@ command_list () { eval "grep -ve '^#' $exclude_programs" <"$1" } -get_categories () { +get_category_line () { tr ' ' '\012' | LC_ALL=C sort -u } @@ -17,7 +17,8 @@ get_categories () { category_list () { command_list "$1" | cut -c 40- | - get_categories | + tr ' ' '\012' | + LC_ALL=C sort -u | grep -v '^$' } @@ -66,7 +67,7 @@ print_command_list () { while read cmd rest do printf " { \"$cmd\", $(get_synopsis $cmd), 0" - for cat in $(echo "$rest" | get_categories) + for cat in $(echo "$rest" | get_category_line) do printf " | CAT_$cat" done -- 2.34.0.rc1.721.ga0c1db665bc ^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH v3 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" 2021-11-05 14:07 ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason ` (3 preceding siblings ...) 2021-11-05 14:08 ` [PATCH v3 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 ` Ævar Arnfjörð Bjarmason 2021-11-05 14:08 ` [PATCH v3 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason ` (4 subsequent siblings) 9 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason This doesn't matter for performance, but let's not include the empty lines in our sorting. This makes the intent of the code clearer. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- generate-cmdlist.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index 16043e38476..e517c33710a 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -18,8 +18,8 @@ category_list () { command_list "$1" | cut -c 40- | tr ' ' '\012' | - LC_ALL=C sort -u | - grep -v '^$' + grep -v '^$' | + LC_ALL=C sort -u } get_synopsis () { -- 2.34.0.rc1.721.ga0c1db665bc ^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH v3 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature 2021-11-05 14:07 ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason ` (4 preceding siblings ...) 2021-11-05 14:08 ` [PATCH v3 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 ` Ævar Arnfjörð Bjarmason 2021-11-05 14:08 ` [PATCH v3 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason ` (3 subsequent siblings) 9 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason From: Johannes Sixt <j6t@kdbg.org> This is just a small code reduction. There is a small probability that the new code breaks when the category list is empty. But that would be noticed during the compile step. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- generate-cmdlist.sh | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index e517c33710a..a1ab2b1f077 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -67,10 +67,7 @@ print_command_list () { while read cmd rest do printf " { \"$cmd\", $(get_synopsis $cmd), 0" - for cat in $(echo "$rest" | get_category_line) - do - printf " | CAT_$cat" - done + printf " | CAT_%s" $(echo "$rest" | get_category_line) echo " }," done echo "};" -- 2.34.0.rc1.721.ga0c1db665bc ^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH v3 07/10] generate-cmdlist.sh: stop sorting category lines 2021-11-05 14:07 ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason ` (5 preceding siblings ...) 2021-11-05 14:08 ` [PATCH v3 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 ` Ævar Arnfjörð Bjarmason 2021-11-05 14:08 ` [PATCH v3 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason ` (2 subsequent siblings) 9 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason In a preceding commit we changed the print_command_list() loop to use printf's auto-repeat feature. Let's now get rid of get_category_line() entirely by not sorting the categories. This will change the output of the generated code from e.g.: - { "git-apply", N_("Apply a patch to files and/or to the index"), 0 | CAT_complete | CAT_plumbingmanipulators }, To: + { "git-apply", N_("Apply a patch to files and/or to the index"), 0 | CAT_plumbingmanipulators | CAT_complete }, I.e. the categories are no longer sorted, but as they're OR'd together it won't matter for the end result. This speeds up the generate-cmdlist.sh a bit. Comparing HEAD~ (old) and "master" to this code: 'sh generate-cmdlist.sh command-list.txt' ran 1.07 ± 0.33 times faster than 'sh generate-cmdlist.sh.old command-list.txt' 1.15 ± 0.36 times faster than 'sh generate-cmdlist.sh.master command-list.txt' Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- generate-cmdlist.sh | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index a1ab2b1f077..f50112c50f8 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -9,11 +9,6 @@ command_list () { eval "grep -ve '^#' $exclude_programs" <"$1" } -get_category_line () { - tr ' ' '\012' | - LC_ALL=C sort -u -} - category_list () { command_list "$1" | cut -c 40- | @@ -67,7 +62,7 @@ print_command_list () { while read cmd rest do printf " { \"$cmd\", $(get_synopsis $cmd), 0" - printf " | CAT_%s" $(echo "$rest" | get_category_line) + printf " | CAT_%s" $rest echo " }," done echo "};" -- 2.34.0.rc1.721.ga0c1db665bc ^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH v3 08/10] generate-cmdlist.sh: do not shell out to "sed" 2021-11-05 14:07 ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason ` (6 preceding siblings ...) 2021-11-05 14:08 ` [PATCH v3 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 ` Ævar Arnfjörð Bjarmason 2021-11-05 14:08 ` [PATCH v3 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason 2021-11-05 14:08 ` [PATCH v3 10/10] generate-cmdlist.sh: don't parse command-list.txt thrice Ævar Arnfjörð Bjarmason 9 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason From: Jeff King <peff@peff.net> Replace the "sed" invocation in get_synopsis() with a pure-shell version. This speeds up generate-cmdlist.sh significantly. Compared to HEAD~ (old) and "master" we are, according to hyperfine(1): 'sh generate-cmdlist.sh command-list.txt' ran 12.69 ± 5.01 times faster than 'sh generate-cmdlist.sh.old command-list.txt' 18.34 ± 3.03 times faster than 'sh generate-cmdlist.sh.master command-list.txt' Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Jeff King <peff@peff.net> --- generate-cmdlist.sh | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index f50112c50f8..9b7d6aea629 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -17,16 +17,6 @@ category_list () { LC_ALL=C sort -u } -get_synopsis () { - sed -n ' - /^NAME/,/'"$1"'/H - ${ - x - s/.*'"$1"' - \(.*\)/N_("\1")/ - p - }' "Documentation/$1.txt" -} - define_categories () { echo echo "/* Command categories */" @@ -61,7 +51,18 @@ print_command_list () { command_list "$1" | while read cmd rest do - printf " { \"$cmd\", $(get_synopsis $cmd), 0" + synopsis= + while read line + do + case "$line" in + "$cmd - "*) + synopsis=${line#$cmd - } + break + ;; + esac + done <"Documentation/$cmd.txt" + + printf '\t{ "%s", N_("%s"), 0' "$cmd" "$synopsis" printf " | CAT_%s" $rest echo " }," done -- 2.34.0.rc1.721.ga0c1db665bc ^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH v3 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version 2021-11-05 14:07 ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason ` (7 preceding siblings ...) 2021-11-05 14:08 ` [PATCH v3 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 ` Ævar Arnfjörð Bjarmason 2021-11-05 14:08 ` [PATCH v3 10/10] generate-cmdlist.sh: don't parse command-list.txt thrice Ævar Arnfjörð Bjarmason 9 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason Replace the "grep" we run to exclude certain programs from the generated output with a pure-shell loop that strips out the comments, and sees if the "cmd" we're reading is on a list of excluded programs. This uses a trick similar to test_have_prereq() in test-lib-functions.sh. On my *nix system this makes things quite a bit slower compared to HEAD~: o 'sh generate-cmdlist.sh.old command-list.txt' ran 1.56 ± 0.11 times faster than 'sh generate-cmdlist.sh command-list.txt' 18.00 ± 0.19 times faster than 'sh generate-cmdlist.sh.master command-list.txt' But when I tried running generate-cmdlist.sh 100 times in CI I found that it helped across the board even on OSX & Linux. I tried testing it in CI with this ad-hoc few-liner: for i in $(seq -w 0 11 | sort -nr) do git show HEAD~$i:generate-cmdlist.sh >generate-cmdlist-HEAD$i.sh && git add generate-cmdlist* && cp t/t0000-generate-cmdlist.sh t/t00$i-generate-cmdlist.sh || : && perl -pi -e "s/HEAD0/HEAD$i/g" t/t00$i-generate-cmdlist.sh && git add t/t00*.sh done && git commit -m"generated it" Here HEAD~02 and the t0002* file refers to this change, and HEAD~03 and t0003* file to the preceding commit, the relevant results were: linux-gcc: [12:05:33] t0002-generate-cmdlist.sh .. ok 14 ms ( 0.00 usr 0.00 sys + 3.64 cusr 3.09 csys = 6.73 CPU) [12:05:30] t0003-generate-cmdlist.sh .. ok 32 ms ( 0.00 usr 0.00 sys + 2.66 cusr 1.81 csys = 4.47 CPU) osx-gcc: [11:58:04] t0002-generate-cmdlist.sh .. ok 80081 ms ( 0.02 usr 0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU) [11:58:16] t0003-generate-cmdlist.sh .. ok 92127 ms ( 0.02 usr 0.01 sys + 22.54 cusr 14.27 csys = 36.84 CPU) vs-test: [12:03:14] t0002-generate-cmdlist.sh .. ok 30 s ( 0.02 usr 0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU) [12:03:20] t0003-generate-cmdlist.sh .. ok 32 s ( 0.00 usr 0.02 sys + 13.25 cusr 26.10 csys = 39.37 CPU) I.e. even on *nix running 100 of these in a loop was up to ~2x faster in absolute runtime, I suspect it's due factors that are exacerbated in the CI, e.g. much slower process startup due to some platform limits, or a slower FS. The "cut -d" change here is because we're not emitting the 40-character aligned output anymore, i.e. we'll get the output from command_list() now, not an as-is line from command-list.txt. This also makes the parsing more reliable, as we could tweak the whitespace alignment without breaking this parser. Let's reword a now-inaccurate comment in "command-list.txt" describing that previous alignment limitation. We'll still need the "### command-list [...]" line due to the "Documentation/cmd-list.perl" logic added in 11c6659d85d (command-list: prepare machinery for upcoming "common groups" section, 2015-05-21). There was a proposed change subsequent to this one[3] which continued moving more logic into the "command_list() function, i.e. replaced the "cut | tr | grep" chain in "category_list()" with an argument to "command_list()". That change might have had a bit of an effect, but not as much as the preceding commit, so I decided to drop it. The relevant performance numbers from it were: linux-gcc: [12:05:33] t0001-generate-cmdlist.sh .. ok 13 ms ( 0.00 usr 0.00 sys + 3.33 cusr 2.78 csys = 6.11 CPU) [12:05:33] t0002-generate-cmdlist.sh .. ok 14 ms ( 0.00 usr 0.00 sys + 3.64 cusr 3.09 csys = 6.73 CPU) osx-gcc: [11:58:03] t0001-generate-cmdlist.sh .. ok 78416 ms ( 0.02 usr 0.01 sys + 11.78 cusr 6.22 csys = 18.03 CPU) [11:58:04] t0002-generate-cmdlist.sh .. ok 80081 ms ( 0.02 usr 0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU) vs-test: [12:03:20] t0001-generate-cmdlist.sh .. ok 34 s ( 0.00 usr 0.03 sys + 12.42 cusr 19.55 csys = 32.00 CPU) [12:03:14] t0002-generate-cmdlist.sh .. ok 30 s ( 0.02 usr 0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU) As above HEAD~2 and t0002* are testing the code in this commit (and the line is the same), but HEAD~1 and t0001* are testing that dropped change in [3]. 1. https://lore.kernel.org/git/cover-v2-00.10-00000000000-20211022T193027Z-avarab@gmail.com/ 2. https://lore.kernel.org/git/patch-v2-08.10-83318d6c0da-20211022T193027Z-avarab@gmail.com/ 3. https://lore.kernel.org/git/patch-v2-10.10-e10a43756d1-20211022T193027Z-avarab@gmail.com/ --- command-list.txt | 2 +- generate-cmdlist.sh | 24 ++++++++++++++++++++---- 2 files changed, 21 insertions(+), 5 deletions(-) diff --git a/command-list.txt b/command-list.txt index 04cde20c3da..675c28f0bd0 100644 --- a/command-list.txt +++ b/command-list.txt @@ -43,7 +43,7 @@ # specified here, which can only have "guide" attribute and nothing # else. # -### command list (do not change this line, also do not change alignment) +### command list (do not change this line) # command name category [category] [category] git-add mainporcelain worktree git-am mainporcelain diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index 9b7d6aea629..cfe0454d1de 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -6,12 +6,28 @@ die () { } command_list () { - eval "grep -ve '^#' $exclude_programs" <"$1" + while read cmd rest + do + case "$cmd" in + "#"* | '') + # Ignore comments and allow empty lines + continue + ;; + *) + case "$exclude_programs" in + *":$cmd:"*) + ;; + *) + echo "$cmd $rest" + ;; + esac + esac + done <"$1" } category_list () { command_list "$1" | - cut -c 40- | + cut -d' ' -f2- | tr ' ' '\012' | grep -v '^$' | LC_ALL=C sort -u @@ -69,11 +85,11 @@ print_command_list () { echo "};" } -exclude_programs= +exclude_programs=: while test "--exclude-program" = "$1" do shift - exclude_programs="$exclude_programs -e \"^$1 \"" + exclude_programs="$exclude_programs$1:" shift done -- 2.34.0.rc1.721.ga0c1db665bc ^ permalink raw reply related [flat|nested] 87+ messages in thread
* [PATCH v3 10/10] generate-cmdlist.sh: don't parse command-list.txt thrice 2021-11-05 14:07 ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason ` (8 preceding siblings ...) 2021-11-05 14:08 ` [PATCH v3 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 ` Ævar Arnfjörð Bjarmason 9 siblings, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-11-05 14:08 UTC (permalink / raw) To: git Cc: Junio C Hamano, Jeff King, Johannes Sixt, Øystein Walle, Eric Sunshine, Taylor Blau, Ævar Arnfjörð Bjarmason Change the "define_categories()" and "define_category_names()" functions to take the already-parsed output of "category_list()" as an argument, which brings our number of passes over "command-list.txt" from three to two. Then have "category_list()" itself take the output of "command_list()" as an argument, bringing the number of times we parse the file to one. Compared to the pre-image this speeds us up quite a bit: $ git show HEAD~:generate-cmdlist.sh >generate-cmdlist.sh.old $ hyperfine --warmup 10 -L v ,.old 'sh generate-cmdlist.sh{v} command-list.txt' Benchmark #1: sh generate-cmdlist.sh command-list.txt Time (mean ± σ): 22.9 ms ± 0.3 ms [User: 15.8 ms, System: 9.6 ms] Range (min … max): 22.5 ms … 24.0 ms 125 runs Benchmark #2: sh generate-cmdlist.sh.old command-list.txt Time (mean ± σ): 30.1 ms ± 0.4 ms [User: 24.4 ms, System: 17.5 ms] Range (min … max): 29.5 ms … 32.3 ms 96 runs Summary 'sh generate-cmdlist.sh command-list.txt' ran 1.32 ± 0.02 times faster than 'sh generate-cmdlist.sh.old command-list.txt' Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- generate-cmdlist.sh | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh index cfe0454d1de..205541e0f7f 100755 --- a/generate-cmdlist.sh +++ b/generate-cmdlist.sh @@ -26,7 +26,7 @@ command_list () { } category_list () { - command_list "$1" | + echo "$1" | cut -d' ' -f2- | tr ' ' '\012' | grep -v '^$' | @@ -37,7 +37,7 @@ define_categories () { echo echo "/* Command categories */" bit=0 - category_list "$1" | + echo "$1" | while read cat do echo "#define CAT_$cat (1UL << $bit)" @@ -51,7 +51,7 @@ define_category_names () { echo "/* Category names */" echo "static const char *category_names[] = {" bit=0 - category_list "$1" | + echo "$1" | while read cat do echo " \"$cat\", /* (1UL << $bit) */" @@ -64,7 +64,7 @@ define_category_names () { print_command_list () { echo "static struct cmdname_help command_list[] = {" - command_list "$1" | + echo "$1" | while read cmd rest do synopsis= @@ -93,6 +93,9 @@ do shift done +commands="$(command_list "$1")" +categories="$(category_list "$commands")" + echo "/* Automatically generated by generate-cmdlist.sh */ struct cmdname_help { const char *name; @@ -100,8 +103,8 @@ struct cmdname_help { uint32_t category; }; " -define_categories "$1" +define_categories "$categories" echo -define_category_names "$1" +define_category_names "$categories" echo -print_command_list "$1" +print_command_list "$commands" -- 2.34.0.rc1.721.ga0c1db665bc ^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: Why the Makefile is so eager to re-build & re-link 2021-06-24 15:16 ` Jeff King 2021-06-24 15:28 ` Ævar Arnfjörð Bjarmason 2021-06-24 21:30 ` Johannes Sixt @ 2021-06-25 21:17 ` Felipe Contreras 2021-06-29 5:04 ` Eric Sunshine 3 siblings, 0 replies; 87+ messages in thread From: Felipe Contreras @ 2021-06-25 21:17 UTC (permalink / raw) To: Jeff King, Ævar Arnfjörð Bjarmason Cc: Git List, Junio C Hamano, Taylor Blau, Emily Shaffer, Eric Sunshine, Johannes Schindelin Jeff King wrote: > On Thu, Jun 24, 2021 at 03:16:48PM +0200, Ævar Arnfjörð Bjarmason wrote: > > > This is probably all stuff that's been on list-before / known by > > some/all people in the CC list, but in case not: I looked a bit into why > > we'e so frequently re-linking and re compiling things these days, > > slowing down e.g. "git rebase --exec='make ...'". > > > > These are all fixable issues, I haven't worked on them, just some notes > > in case anyone has better ideas: > > From a quick skim I didn't see anything wrong in your analysis or > suggestions. I do kind of wonder if we are hitting a point of > diminishing returns here. "make -j16" on my system takes ~175ms for a > noop, and ~650ms if I have to regenerate version.h (it's something like > 2s total of CPU, but I have 8 cores). > > I know I've probably got a nicer machine than many other folks. But at > some point correctness and avoiding complexity in the Makefile become a > win over shaving off a second from compile times. You'd probably find > lower hanging fruit in the test suite which could shave off tens of > seconds. :) That's not a good enough reason to not try to improve something. In fact, it's a known fallacy called the fallacy of relative privation [1]. Also, your definition of "correctness" is not necessarily shared by everyone. I for one don't see what I'm gaining by running tests with GIT_VERSION = 2.32.0.97.g949e814b27, and then 2.32.0.98.gcfb60a24d6. What's wrong with GIT_VERSION = 2.33-dev? Sure, I understand that some people might want to have a precise version, but when I'm doing development I don't. Perhaps set a static version when DEVELOPER=1? [1] https://rationalwiki.org/wiki/Not_as_bad_as -- Felipe Contreras ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: Why the Makefile is so eager to re-build & re-link 2021-06-24 15:16 ` Jeff King ` (2 preceding siblings ...) 2021-06-25 21:17 ` Why the Makefile is so eager to re-build & re-link Felipe Contreras @ 2021-06-29 5:04 ` Eric Sunshine 3 siblings, 0 replies; 87+ messages in thread From: Eric Sunshine @ 2021-06-29 5:04 UTC (permalink / raw) To: Jeff King Cc: Ævar Arnfjörð Bjarmason, Git List, Junio C Hamano, Taylor Blau, Emily Shaffer, Johannes Schindelin On Thu, Jun 24, 2021 at 11:16 AM Jeff King <peff@peff.net> wrote: > On Thu, Jun 24, 2021 at 03:16:48PM +0200, Ævar Arnfjörð Bjarmason wrote: > > I think the best solution here is to make the generate-*.sh > > shellscripts faster (just one takes ~300ms of nested shellscripting, > > just to grep out the first few lines of every git-*.txt, in e.g. Perl > > or a smarter awk script this would be <5ms). > > Yeah, I think Eric mentioned he had looked into doing this in perl, but > we weren't entirely happy with the dependency. [...] For what it's worth, the original `generate-cmdlist` was a shell script which I rewrote[1] in `awk` to extend the functionality, but Junio felt uncomfortable[2] about making `awk` a build dependency, so I rewrote[3] it again in `perl`. However, the `perl` version didn't last long since we got a report[4] that Git would no longer build in the FreeBSD ports tree, so I rewrote[5] it a final time in shell, thus coming full circle (but with extended functionality). [1]: https://lore.kernel.org/git/1431976697-26288-4-git-send-email-sebastien.guimmara@gmail.com/ [2]: https://lore.kernel.org/git/xmqqr3qda7kx.fsf@gitster.dls.corp.google.com/ [3]: https://lore.kernel.org/git/1432149781-24596-4-git-send-email-sebastien.guimmara@gmail.com/ [4]: https://lore.kernel.org/git/loom.20150814T171757-901@post.gmane.org/ [5]: https://lore.kernel.org/git/1440365469-9928-1-git-send-email-sunshine@sunshineco.com/ ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: Why the Makefile is so eager to re-build & re-link 2021-06-24 13:16 Why the Makefile is so eager to re-build & re-link Ævar Arnfjörð Bjarmason 2021-06-24 15:16 ` Jeff King @ 2021-06-24 23:35 ` Øystein Walle 2021-06-24 23:39 ` Øystein Walle 2021-06-25 0:11 ` Ævar Arnfjörð Bjarmason 2021-07-02 11:58 ` [PATCH] Documentation/Makefile: don't re-build on 'git version' changes Ævar Arnfjörð Bjarmason 2 siblings, 2 replies; 87+ messages in thread From: Øystein Walle @ 2021-06-24 23:35 UTC (permalink / raw) To: avarab; +Cc: Johannes.Schindelin, emilyshaffer, git, gitster, me, peff, sunshine Hi, Ævar > {command,config}-list.h (and in-flight, my hook-list.h): Every time > you touch a Documentation/git-*.txt we need to re-generate these, and > since their mtime changes we re-compile and re-link all the way up to > libgit and our other tools. > > I think the best solution here is to make the generate-*.sh > shellscripts faster (just one takes ~300ms of nested shellscripting, > just to grep out the first few lines of every git-*.txt, in e.g. Perl > or a smarter awk script this would be <5ms). > > Then we make those FORCE, but most of the time the config or command > summary (or list of hooks) doesn't change, so we don't need to > replace the file. One possible technique to fix this is to generate to a temporary file, and copy the temporary file over the real file if it's actually different. I see the Makefile already does create a temporary file, so how about something like this: diff --git a/Makefile b/Makefile index 93664d6714..9b2f081a2a 100644 --- a/Makefile +++ b/Makefile @@ -2216,7 +2216,8 @@ command-list.h: generate-cmdlist.sh command-list.txt command-list.h: $(wildcard Documentation/git*.txt) $(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh \ $(patsubst %,--exclude-program %,$(EXCLUDED_PROGRAMS)) \ - command-list.txt >$@+ && mv $@+ $@ + command-list.txt >$@+ && \ + if ! cmp -s $@ $@+; then mv $@+ $@; fi SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\ $(localedir_SQ):$(NO_CURL):$(USE_GETTEXT_SCHEME):$(SANE_TOOL_PATH_SQ):\ This seems to work fine from my basic testing. I think it can even be written more terse as `&& ! cmp ... &&` but that looks a bit weird to me. In any case it looks like it can easily be added the other relevant places in the Makefile too. Øsse ^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: Why the Makefile is so eager to re-build & re-link 2021-06-24 23:35 ` Øystein Walle @ 2021-06-24 23:39 ` Øystein Walle 2021-06-25 0:11 ` Ævar Arnfjörð Bjarmason 1 sibling, 0 replies; 87+ messages in thread From: Øystein Walle @ 2021-06-24 23:39 UTC (permalink / raw) To: oystwa Cc: Johannes.Schindelin, avarab, emilyshaffer, git, gitster, me, peff, sunshine Oops, I missed that you had already discussed this idea. Sorry about about that. Øsse ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: Why the Makefile is so eager to re-build & re-link 2021-06-24 23:35 ` Øystein Walle 2021-06-24 23:39 ` Øystein Walle @ 2021-06-25 0:11 ` Ævar Arnfjörð Bjarmason 1 sibling, 0 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-06-25 0:11 UTC (permalink / raw) To: Øystein Walle Cc: Johannes.Schindelin, emilyshaffer, git, gitster, me, peff, sunshine On Fri, Jun 25 2021, Øystein Walle wrote: > Hi, Ævar > >> {command,config}-list.h (and in-flight, my hook-list.h): Every time >> you touch a Documentation/git-*.txt we need to re-generate these, and >> since their mtime changes we re-compile and re-link all the way up to >> libgit and our other tools. >> >> I think the best solution here is to make the generate-*.sh >> shellscripts faster (just one takes ~300ms of nested shellscripting, >> just to grep out the first few lines of every git-*.txt, in e.g. Perl >> or a smarter awk script this would be <5ms). >> >> Then we make those FORCE, but most of the time the config or command >> summary (or list of hooks) doesn't change, so we don't need to >> replace the file. > > One possible technique to fix this is to generate to a temporary file, and copy > the temporary file over the real file if it's actually different. I see the > Makefile already does create a temporary file, so how about something like > this: > > diff --git a/Makefile b/Makefile > index 93664d6714..9b2f081a2a 100644 > --- a/Makefile > +++ b/Makefile > @@ -2216,7 +2216,8 @@ command-list.h: generate-cmdlist.sh command-list.txt > command-list.h: $(wildcard Documentation/git*.txt) > $(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh \ > $(patsubst %,--exclude-program %,$(EXCLUDED_PROGRAMS)) \ > - command-list.txt >$@+ && mv $@+ $@ > + command-list.txt >$@+ && \ > + if ! cmp -s $@ $@+; then mv $@+ $@; fi > > SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\ > $(localedir_SQ):$(NO_CURL):$(USE_GETTEXT_SCHEME):$(SANE_TOOL_PATH_SQ):\ > > > This seems to work fine from my basic testing. I think it can even be written > more terse as `&& ! cmp ... &&` but that looks a bit weird to me. In any case > it looks like it can easily be added the other relevant places in the Makefile > too. I see your downthread "oops", no worries. Just to elaborate slightly on the comment I had in [1] (and as you've doubtless discovered) then yes, this works the first time: $ touch Documentation/git-bisect.txt $ make -k -j $(nproc) git GEN command-list.h Without your change we'd have also had to recompile "git", but the problem is that if you run and re-run that we'll keep making command-list.h again and again, which is "make" getting stuck on, because: $ make -d -k -j $(nproc) git 2>&1 | grep newer.*command-list.h Prerequisite 'Documentation/git-bisect.txt' is newer than target 'command-list.h'. Hence the "lowest level" comment in [1], i.e. you can only use this "cmp" trick for things that don't themselves have dependencies. Which at the end of the day is why make is both wonderful, and why it sucks. It sometimes gets hard to force your "circular" dependency graph into the "square peg" of files and their mtimes, but it's also great for simplicity that everyone can't write their own custom "this is what it means for my dependencies to have changed" function (which in practice would break way more in the hands of most programmers, including yours truly, than files & their mtimes). E.g. in this case we don't *actually* depend on Documentation/git*.txt for the purposes of re-making things all the way up the graph, we depend on the tiny bit of data we extract from those files, basically just the NAME section at the very beginning. The only way to represent that information in a way that "make" understands would be to add another intermediate step where we extract exactly that information out into intermediate files with a script, and then in turn depend on those files. Which actually, might not be such a bad idea as a solution to this particular problem. I.e. just to have individual generated versions of git-bisect.txt.just-what-command-list.h-needs. We do something vaguely similar with Documentation/build-docdep.perl, which is another glorious hack of trying to get make's dependency graph to behave as we'd like. As a practical matter it wouldn't get you very far except for doc-only changes, since you'd still have the version.c problem, which also changes on every (code) commit. 1. http://lore.kernel.org/git/87y2azyzer.fsf@evledraar.gmail.com ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH] Documentation/Makefile: don't re-build on 'git version' changes 2021-06-24 13:16 Why the Makefile is so eager to re-build & re-link Ævar Arnfjörð Bjarmason 2021-06-24 15:16 ` Jeff King 2021-06-24 23:35 ` Øystein Walle @ 2021-07-02 11:58 ` Ævar Arnfjörð Bjarmason 2021-07-02 15:53 ` Junio C Hamano 2021-07-03 1:05 ` Felipe Contreras 2 siblings, 2 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-07-02 11:58 UTC (permalink / raw) To: git Cc: Junio C Hamano, Martin Ågren, Jeff King, Felipe Contreras, Taylor Blau, Emily Shaffer, Eric Sunshine, Øystein Walle, Ævar Arnfjörð Bjarmason Since 9a71722b4df (Doc: auto-detect changed build flags, 2019-03-17) we've been eagerly re-building the documentation whenever the output of "git version" (via the GIT-VERSION file) changed. This was never the intention, and was a regression on what we intended in 7b8a74f39cb (Documentation: Replace @@GIT_VERSION@@ in documentation, 2007-03-25). So let's add an ASCIIDOC_MANVERSION variable that we exclude from ASCIIDOC_COMMON. The change in 9a71722b4df was only intending to catch cases where we e.g. switched between asciidoc and asciidoctor, not to undo the logic in 7b8a74f39cb and force a re-build every time our HEAD changed in the repository. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- As a follow-up to https://lore.kernel.org/git/874kdn1j6i.fsf@evledraar.gmail.com/ I cut "make man" out of my "rebase -x" invocations, I could swear it didn't used to take so long. Turns out it didn't, and that its eagerness is a recent-ish regression. This is what we used to do before v2.22.0, so I'm not too worried about the edge case discussed in the comment here. I think an improvement on this might be to e.g. force all the flags with a "make dist" or one of the install targets. In practice I don't think there's many/any people who build releases that matter to anyone out of the checkout they've been using for their own development. Documentation/Makefile | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/Documentation/Makefile b/Documentation/Makefile index f5605b7767f..6b3f0bb6c8b 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -137,11 +137,12 @@ ASCIIDOC_HTML = xhtml11 ASCIIDOC_DOCBOOK = docbook ASCIIDOC_CONF = -f asciidoc.conf ASCIIDOC_COMMON = $(ASCIIDOC) $(ASCIIDOC_EXTRA) $(ASCIIDOC_CONF) \ - -amanversion=$(GIT_VERSION) \ -amanmanual='Git Manual' -amansource='Git' +ASCIIDOC_MANVERSION = -amanversion=$(GIT_VERSION) +ASCIIDOC_ALL = $(ASCIIDOC_COMMON) $(ASCIIDOC_MANVERSION) ASCIIDOC_DEPS = asciidoc.conf GIT-ASCIIDOCFLAGS -TXT_TO_HTML = $(ASCIIDOC_COMMON) -b $(ASCIIDOC_HTML) -TXT_TO_XML = $(ASCIIDOC_COMMON) -b $(ASCIIDOC_DOCBOOK) +TXT_TO_HTML = $(ASCIIDOC_ALL) -b $(ASCIIDOC_HTML) +TXT_TO_XML = $(ASCIIDOC_ALL) -b $(ASCIIDOC_DOCBOOK) MANPAGE_XSL = manpage-normal.xsl XMLTO = xmlto XMLTO_EXTRA = @@ -333,6 +334,16 @@ mergetools-list.made: ../git-mergetool--lib.sh $(wildcard ../mergetools/*) show_tool_names can_merge "* " || :' >mergetools-merge.txt && \ date >$@ +# We use $(ASCIIDOC_COMMON) here, and not $(ASCIIDOC_ALL). We don't +# want to include $(ASCIIDOC_MANVERSION) and have the documentation +# re-built every time HEAD changes. +# +# This is a trade-off requiring a "clean" build of the documentation +# for release purposes, in the future we might include the version if +# there's a cheaper way to re-insert the "Source" version during +# re-builds. If we detect that that's the only thing we changed we +# could insert it with a cheap search/replacement against the existing +# files. TRACK_ASCIIDOCFLAGS = $(subst ','\'',$(ASCIIDOC_COMMON):$(ASCIIDOC_HTML):$(ASCIIDOC_DOCBOOK)) GIT-ASCIIDOCFLAGS: FORCE -- 2.32.0.634.g284ac724283 ^ permalink raw reply related [flat|nested] 87+ messages in thread
* Re: [PATCH] Documentation/Makefile: don't re-build on 'git version' changes 2021-07-02 11:58 ` [PATCH] Documentation/Makefile: don't re-build on 'git version' changes Ævar Arnfjörð Bjarmason @ 2021-07-02 15:53 ` Junio C Hamano 2021-07-03 11:58 ` Ævar Arnfjörð Bjarmason 2021-07-03 1:05 ` Felipe Contreras 1 sibling, 1 reply; 87+ messages in thread From: Junio C Hamano @ 2021-07-02 15:53 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Martin Ågren, Jeff King, Felipe Contreras, Taylor Blau, Emily Shaffer, Eric Sunshine, Øystein Walle Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Since 9a71722b4df (Doc: auto-detect changed build flags, 2019-03-17) > we've been eagerly re-building the documentation whenever the output > of "git version" (via the GIT-VERSION file) changed. This was never > the intention, and was a regression on what we intended in > 7b8a74f39cb (Documentation: Replace @@GIT_VERSION@@ in documentation, > 2007-03-25). I am not sure. Even if there were no changes in say 'Documentation/git-cat-file.txt' and the sources it depends on between 'master' and 'next', after doing this: $ git checkout next $ make prefix=$HOME/git-next/ install install-doc $ git checkout master $ make prefix=$HOME/git-master/ install install-doc $ $HOME/git-master/bin/git help cat-file | tail -n 1 I should see that the documentation should say it is from the 'master' branch in its footer, no? In other words, I think 7b8a74f39cb's reasoning (not the implementation), especially the last sentence of its log message, is flawed, where it said: Documentation: Replace @@GIT_VERSION@@ in documentation Include GIT-VERSION-FILE and replace @@GIT_VERSION@@ in the HTML and XML asciidoc output. The documentation doesn't depend on GIT-VERSION-FILE so it will not be automatically rebuild if nothing else changed. Thanks. ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH] Documentation/Makefile: don't re-build on 'git version' changes 2021-07-02 15:53 ` Junio C Hamano @ 2021-07-03 11:58 ` Ævar Arnfjörð Bjarmason 2021-07-05 19:48 ` Junio C Hamano 0 siblings, 1 reply; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-07-03 11:58 UTC (permalink / raw) To: Junio C Hamano Cc: git, Martin Ågren, Jeff King, Felipe Contreras, Taylor Blau, Emily Shaffer, Eric Sunshine, Øystein Walle On Fri, Jul 02 2021, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> Since 9a71722b4df (Doc: auto-detect changed build flags, 2019-03-17) >> we've been eagerly re-building the documentation whenever the output >> of "git version" (via the GIT-VERSION file) changed. This was never >> the intention, and was a regression on what we intended in >> 7b8a74f39cb (Documentation: Replace @@GIT_VERSION@@ in documentation, >> 2007-03-25). > > I am not sure. Even if there were no changes in say > 'Documentation/git-cat-file.txt' and the sources it depends on > between 'master' and 'next', after doing this: > > $ git checkout next > $ make prefix=$HOME/git-next/ install install-doc > $ git checkout master > $ make prefix=$HOME/git-master/ install install-doc > $ $HOME/git-master/bin/git help cat-file | tail -n 1 > > I should see that the documentation should say it is from the > 'master' branch in its footer, no? Yes in theory, in practice it's very annoying to have the very slow documentation build be re-built so aggressively. Since it wasn't a practical issue anyone worried about before 2019 I think it's worth reverting it. > In other words, I think 7b8a74f39cb's reasoning (not the > implementation), especially the last sentence of its log message, is > flawed, where it said: > > Documentation: Replace @@GIT_VERSION@@ in documentation > > Include GIT-VERSION-FILE and replace @@GIT_VERSION@@ in > the HTML and XML asciidoc output. The documentation > doesn't depend on GIT-VERSION-FILE so it will not be > automatically rebuild if nothing else changed. Arguably it's a feature. The point of the version in the documentation is to make it clear what version we're discussing. If I build something on the master SHA-1 and advance to next, and none of the documentation dependencies change, it's most useful to refer to the oldest last version we can cover. I think nobody's doing such a "chained" build when building the docs for a "real" release, and having mixed versions might be confusing, but for the "local build" case from a development checkout it's arguably more useful. ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH] Documentation/Makefile: don't re-build on 'git version' changes 2021-07-03 11:58 ` Ævar Arnfjörð Bjarmason @ 2021-07-05 19:48 ` Junio C Hamano 0 siblings, 0 replies; 87+ messages in thread From: Junio C Hamano @ 2021-07-05 19:48 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Martin Ågren, Jeff King, Felipe Contreras, Taylor Blau, Emily Shaffer, Eric Sunshine, Øystein Walle Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > I think nobody's doing such a "chained" build when building the docs for > a "real" release, and having mixed versions might be confusing, but for > the "local build" case from a development checkout it's arguably more > useful. Well, when I prepare a cascade of maintenance releases for CVE, what do you think is done? Yes, I do have "make distclean" in the loop that iterates over the maint-$version branches, just to be extra careful, but I shouldn't have to. ^ permalink raw reply [flat|nested] 87+ messages in thread
* RE: [PATCH] Documentation/Makefile: don't re-build on 'git version' changes 2021-07-02 11:58 ` [PATCH] Documentation/Makefile: don't re-build on 'git version' changes Ævar Arnfjörð Bjarmason 2021-07-02 15:53 ` Junio C Hamano @ 2021-07-03 1:05 ` Felipe Contreras 2021-07-03 12:03 ` Ævar Arnfjörð Bjarmason 1 sibling, 1 reply; 87+ messages in thread From: Felipe Contreras @ 2021-07-03 1:05 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason, git Cc: Junio C Hamano, Martin Ågren, Jeff King, Felipe Contreras, Taylor Blau, Emily Shaffer, Eric Sunshine, Øystein Walle, Ævar Arnfjörð Bjarmason Ævar Arnfjörð Bjarmason wrote: > Since 9a71722b4df (Doc: auto-detect changed build flags, 2019-03-17) > we've been eagerly re-building the documentation whenever the output > of "git version" (via the GIT-VERSION file) changed. This was never > the intention, and was a regression on what we intended in > 7b8a74f39cb (Documentation: Replace @@GIT_VERSION@@ in documentation, > 2007-03-25). > > So let's add an ASCIIDOC_MANVERSION variable that we exclude from > ASCIIDOC_COMMON. The change in 9a71722b4df was only intending to catch > cases where we e.g. switched between asciidoc and asciidoctor, not to > undo the logic in 7b8a74f39cb and force a re-build every time our HEAD > changed in the repository. Once again, why do we care that the version is 2.32.0.98.gcfb60a24d6 and not 2.32.0.97.g949e814b27? Not just in the documentation, but everywhere. Maybe we can add a GIT_RELEASE variable that unlike GIT_VERSION it doesn't contain the precise commit. For example GIT_RELEASE = 2.33-dev. -- Felipe Contreras ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH] Documentation/Makefile: don't re-build on 'git version' changes 2021-07-03 1:05 ` Felipe Contreras @ 2021-07-03 12:03 ` Ævar Arnfjörð Bjarmason 2021-07-03 18:56 ` Felipe Contreras 2021-07-05 19:38 ` Junio C Hamano 0 siblings, 2 replies; 87+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-07-03 12:03 UTC (permalink / raw) To: Felipe Contreras Cc: git, Junio C Hamano, Martin Ågren, Jeff King, Taylor Blau, Emily Shaffer, Eric Sunshine, Øystein Walle On Fri, Jul 02 2021, Felipe Contreras wrote: > Ævar Arnfjörð Bjarmason wrote: >> Since 9a71722b4df (Doc: auto-detect changed build flags, 2019-03-17) >> we've been eagerly re-building the documentation whenever the output >> of "git version" (via the GIT-VERSION file) changed. This was never >> the intention, and was a regression on what we intended in >> 7b8a74f39cb (Documentation: Replace @@GIT_VERSION@@ in documentation, >> 2007-03-25). >> >> So let's add an ASCIIDOC_MANVERSION variable that we exclude from >> ASCIIDOC_COMMON. The change in 9a71722b4df was only intending to catch >> cases where we e.g. switched between asciidoc and asciidoctor, not to >> undo the logic in 7b8a74f39cb and force a re-build every time our HEAD >> changed in the repository. > > Once again, why do we care that the version is 2.32.0.98.gcfb60a24d6 and > not 2.32.0.97.g949e814b27? > > Not just in the documentation, but everywhere. It's useful e.g. on my Debian system to see that the "next" Debian packaged is 2.31.0.291.g576ba9dcdaf in docs & "git version", arguably less so for documentation. But yeah, I think we should at least have some opt-out for sticking the exact version everywhere, given the mostly unless re-building it does during development. > Maybe we can add a GIT_RELEASE variable that unlike GIT_VERSION it > doesn't contain the precise commit. For example GIT_RELEASE = 2.33-dev. I just added this to my pre-make script: grep -q ^/version .git/info/exclude || echo /version >>.git/info/exclude echo $(grep -o -P '(?<=^DEF_VER=v).*' GIT-VERSION-GEN)-dev >version It makes use of GIT-VERSION-GEN picking up a tarball "version" file. So far it seems like Junio isn't interested in the patch, and in any case it doesn't fix the more general over-rebuilding due to version changes noted upthread when it comes to *.o and libgit. Doing this fixes both for me. Then when I build an installed version I don't use that trick. ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH] Documentation/Makefile: don't re-build on 'git version' changes 2021-07-03 12:03 ` Ævar Arnfjörð Bjarmason @ 2021-07-03 18:56 ` Felipe Contreras 2021-07-05 19:38 ` Junio C Hamano 1 sibling, 0 replies; 87+ messages in thread From: Felipe Contreras @ 2021-07-03 18:56 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason, Felipe Contreras Cc: git, Junio C Hamano, Martin Ågren, Jeff King, Taylor Blau, Emily Shaffer, Eric Sunshine, Øystein Walle Ævar Arnfjörð Bjarmason wrote: > > On Fri, Jul 02 2021, Felipe Contreras wrote: > > > Ævar Arnfjörð Bjarmason wrote: > >> Since 9a71722b4df (Doc: auto-detect changed build flags, 2019-03-17) > >> we've been eagerly re-building the documentation whenever the output > >> of "git version" (via the GIT-VERSION file) changed. This was never > >> the intention, and was a regression on what we intended in > >> 7b8a74f39cb (Documentation: Replace @@GIT_VERSION@@ in documentation, > >> 2007-03-25). > >> > >> So let's add an ASCIIDOC_MANVERSION variable that we exclude from > >> ASCIIDOC_COMMON. The change in 9a71722b4df was only intending to catch > >> cases where we e.g. switched between asciidoc and asciidoctor, not to > >> undo the logic in 7b8a74f39cb and force a re-build every time our HEAD > >> changed in the repository. > > > > Once again, why do we care that the version is 2.32.0.98.gcfb60a24d6 and > > not 2.32.0.97.g949e814b27? > > > > Not just in the documentation, but everywhere. > > It's useful e.g. on my Debian system to see that the "next" Debian > packaged is 2.31.0.291.g576ba9dcdaf in docs & "git version", arguably > less so for documentation. Obviously packagers would use real versions I meant this only for developers (e.g. DEVELOPER=1). And BTW, where is that Debian package coming from? I thought distributions didn't package git pre-releases. > > Maybe we can add a GIT_RELEASE variable that unlike GIT_VERSION it > > doesn't contain the precise commit. For example GIT_RELEASE = 2.33-dev. > > I just added this to my pre-make script: > > grep -q ^/version .git/info/exclude || echo /version >>.git/info/exclude > echo $(grep -o -P '(?<=^DEF_VER=v).*' GIT-VERSION-GEN)-dev >version > > It makes use of GIT-VERSION-GEN picking up a tarball "version" file. This would also do the trick (you need DEVELOPER on the environment though). --- a/GIT-VERSION-GEN +++ b/GIT-VERSION-GEN @@ -6,9 +6,10 @@ DEF_VER=v2.32.0 LF=' ' -# First see if there is a version file (included in release tarballs), -# then try git-describe, then default. -if test -f version +if test -n "$DEVELOPER" +then + VN="$DEF_VER"-dev +elif test -f version then VN=$(cat version) || VN="$DEF_VER" elif test -d ${GIT_DIR:-.git} -o -f .git && -- Felipe Contreras ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH] Documentation/Makefile: don't re-build on 'git version' changes 2021-07-03 12:03 ` Ævar Arnfjörð Bjarmason 2021-07-03 18:56 ` Felipe Contreras @ 2021-07-05 19:38 ` Junio C Hamano 2021-07-06 22:25 ` Felipe Contreras 1 sibling, 1 reply; 87+ messages in thread From: Junio C Hamano @ 2021-07-05 19:38 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Felipe Contreras, git, Martin Ågren, Jeff King, Taylor Blau, Emily Shaffer, Eric Sunshine, Øystein Walle Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > It's useful e.g. on my Debian system to see that the "next" Debian > packaged is 2.31.0.291.g576ba9dcdaf in docs & "git version", arguably > less so for documentation. If it is arguable, perhaps make an argument that is more convincing? What I dislike the most is that in the sample scenario where master and next has the same documentation material to build "git-cat-file.1", the installed result would be different depending on the order of building the documentation, with the change being discussed, i.e. $ git checkout master && make prefix=$HOME/git-master install-doc $ git checkout next && make prefix=$HOME/git-next install-doc would make "~/git-next/bin/git help cat-file" to claim the documentation is from the "master" version. Which is not all that bad, given that there wasn't anything that changed the documentation between 'master' and 'next'. But if you swap the installation order, "~/git-master/bin/git help cat-file" would say that the documentation is from a version much newer than 'master', which is not quite acceptable. I am OK with the approach you hinted to have an option to _hide_ the version string in the generated documentation (hence they lose their dependency on GIT-VERSION-FILE), while keeping the dependency of version.o on GIT-VERSION-FILE, so that something goes wrong in a built binary, the developer can still ask "git version" to identify where the binary came from. Thanks. ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH] Documentation/Makefile: don't re-build on 'git version' changes 2021-07-05 19:38 ` Junio C Hamano @ 2021-07-06 22:25 ` Felipe Contreras 0 siblings, 0 replies; 87+ messages in thread From: Felipe Contreras @ 2021-07-06 22:25 UTC (permalink / raw) To: Junio C Hamano, Ævar Arnfjörð Bjarmason Cc: Felipe Contreras, git, Martin Ågren, Jeff King, Taylor Blau, Emily Shaffer, Eric Sunshine, Øystein Walle Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > I am OK with the approach you hinted to have an option to _hide_ the > version string in the generated documentation (hence they lose their > dependency on GIT-VERSION-FILE), while keeping the dependency of > version.o on GIT-VERSION-FILE, so that something goes wrong in a > built binary, the developer can still ask "git version" to identify > where the binary came from. But it would be even better if the build system knew what was the intent of the build. There's a difference when building a revision just to run `make test` on it, and building it to do `make install`. In the former there's no need for a real version in version.o, only in the latter. Maybe RELASE=1. Not a release from the point of view of a maintainer, but a release in the sense that the binaries will escape our in-tree build. -- Felipe Contreras ^ permalink raw reply [flat|nested] 87+ messages in thread
end of thread, other threads:[~2021-11-08 19:18 UTC | newest] Thread overview: 87+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-06-24 13:16 Why the Makefile is so eager to re-build & re-link Ævar Arnfjörð Bjarmason 2021-06-24 15:16 ` Jeff King 2021-06-24 15:28 ` Ævar Arnfjörð Bjarmason 2021-06-24 21:30 ` Johannes Sixt 2021-06-25 8:34 ` Ævar Arnfjörð Bjarmason 2021-06-25 9:01 ` Ævar Arnfjörð Bjarmason 2021-06-29 2:13 ` Jeff King 2021-10-20 18:39 ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason 2021-10-20 18:39 ` [PATCH 1/8] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason 2021-10-20 18:39 ` [PATCH 2/8] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason 2021-10-20 18:39 ` [PATCH 3/8] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason 2021-10-20 18:39 ` [PATCH 4/8] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason 2021-10-20 18:39 ` [PATCH 5/8] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason 2021-10-20 18:39 ` [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason 2021-10-21 14:42 ` Jeff King 2021-10-21 16:25 ` Jeff King 2021-10-20 18:39 ` [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard Ævar Arnfjörð Bjarmason 2021-10-21 14:45 ` Jeff King 2021-10-21 18:24 ` Junio C Hamano 2021-10-21 22:46 ` Øystein Walle 2021-10-20 18:39 ` [PATCH 8/8] Makefile: assert correct generate-cmdlist.sh output Ævar Arnfjörð Bjarmason 2021-10-20 20:35 ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Jeff King 2021-10-20 21:31 ` Taylor Blau 2021-10-20 23:14 ` Ævar Arnfjörð Bjarmason 2021-10-20 23:46 ` Jeff King 2021-10-21 0:48 ` Ævar Arnfjörð Bjarmason 2021-10-21 2:20 ` Taylor Blau 2021-10-22 12:37 ` Ævar Arnfjörð Bjarmason 2021-10-21 14:34 ` Jeff King 2021-10-21 22:34 ` Junio C Hamano 2021-10-22 10:51 ` Ævar Arnfjörð Bjarmason 2021-10-22 18:31 ` Jeff King 2021-10-22 20:50 ` Ævar Arnfjörð Bjarmason 2021-10-21 5:39 ` Eric Sunshine 2021-10-22 19:36 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason 2021-10-22 19:36 ` [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason 2021-10-25 18:29 ` Junio C Hamano 2021-10-25 21:22 ` Ævar Arnfjörð Bjarmason 2021-10-25 21:26 ` Junio C Hamano 2021-10-22 19:36 ` [PATCH v2 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason 2021-10-22 19:36 ` [PATCH v2 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason 2021-10-22 19:36 ` [PATCH v2 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason 2021-10-22 19:36 ` [PATCH v2 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason 2021-10-22 19:36 ` [PATCH v2 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason 2021-10-22 19:36 ` [PATCH v2 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason 2021-10-25 16:39 ` Jeff King 2021-10-22 19:36 ` [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason 2021-10-25 16:46 ` Jeff King 2021-10-25 17:52 ` Jeff King 2021-10-22 19:36 ` [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason 2021-10-23 22:19 ` Junio C Hamano 2021-10-23 22:26 ` Junio C Hamano 2021-10-22 19:36 ` [PATCH v2 10/10] generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell Ævar Arnfjörð Bjarmason 2021-10-23 22:26 ` Junio C Hamano 2021-10-22 21:20 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Taylor Blau 2021-10-23 22:34 ` Junio C Hamano 2021-10-25 16:57 ` Jeff King 2021-11-05 14:07 ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason 2021-11-05 14:07 ` [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason 2021-11-05 22:45 ` Junio C Hamano 2021-11-06 4:26 ` Ævar Arnfjörð Bjarmason 2021-11-08 19:18 ` Junio C Hamano 2021-11-05 14:08 ` [PATCH v3 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason 2021-11-05 14:08 ` [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason 2021-11-05 22:47 ` Junio C Hamano 2021-11-06 4:23 ` Ævar Arnfjörð Bjarmason 2021-11-05 14:08 ` [PATCH v3 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason 2021-11-05 14:08 ` [PATCH v3 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason 2021-11-05 14:08 ` [PATCH v3 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason 2021-11-05 14:08 ` [PATCH v3 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason 2021-11-05 14:08 ` [PATCH v3 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason 2021-11-05 14:08 ` [PATCH v3 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason 2021-11-05 14:08 ` [PATCH v3 10/10] generate-cmdlist.sh: don't parse command-list.txt thrice Ævar Arnfjörð Bjarmason 2021-06-25 21:17 ` Why the Makefile is so eager to re-build & re-link Felipe Contreras 2021-06-29 5:04 ` Eric Sunshine 2021-06-24 23:35 ` Øystein Walle 2021-06-24 23:39 ` Øystein Walle 2021-06-25 0:11 ` Ævar Arnfjörð Bjarmason 2021-07-02 11:58 ` [PATCH] Documentation/Makefile: don't re-build on 'git version' changes Ævar Arnfjörð Bjarmason 2021-07-02 15:53 ` Junio C Hamano 2021-07-03 11:58 ` Ævar Arnfjörð Bjarmason 2021-07-05 19:48 ` Junio C Hamano 2021-07-03 1:05 ` Felipe Contreras 2021-07-03 12:03 ` Ævar Arnfjörð Bjarmason 2021-07-03 18:56 ` Felipe Contreras 2021-07-05 19:38 ` Junio C Hamano 2021-07-06 22:25 ` Felipe Contreras
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).