From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Johannes Sixt <j6t@kdbg.org>, Git List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>, Taylor Blau <me@ttaylorr.com>,
	Emily Shaffer <emilyshaffer@google.com>,
	Eric Sunshine <sunshine@sunshineco.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: Why the Makefile is so eager to re-build & re-link
Date: Mon, 28 Jun 2021 22:13:10 -0400
Message-ID: <YNqBtrXzUlJiuc7y@coredump.intra.peff.net>
In-Reply-To: <87r1gqxqxn.fsf@evledraar.gmail.com>

On Fri, Jun 25, 2021 at 10:34:20AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Interesting, but I think rather than micro-optimizing the O(n) loop it
> makes more sense to turn it into a series of O(1) in -j parallel,
> i.e. actually use the make dependency graph for this as I suggested in:
> https://lore.kernel.org/git/87wnqiyejg.fsf@evledraar.gmail.com/

I have mixed feelings on that. I do like the general notion of breaking
apart tasks and feeding the dependencies to "make", because that lets it
do a better job of parallelizing or avoiding already-done work. But
there's a cost to running any job, so eventually you get to a unit of
work that's so small the overhead dominates.

For instance, starting from a built Git but dirtying all doc files with
"touch Documentation/*.txt", running "time make -j16" yields:

  real	0m1.749s
  user	0m2.963s
  sys	0m1.146s

With your patch to break it apart into many jobs, the same operation
gives:

  real	0m0.762s
  user	0m3.054s
  sys	0m0.600s

So that took fewer wall-clock seconds, but we actually spent slightly
more user CPU (3.054s versus 2.963s).
On a system with fewer cores, it would probably be a loss in both
places.

Now maybe that's a good tradeoff, especially because the common case
(aside from a build-from-scratch, which will spend loads more time
actually compiling anyway) is that only a handful of files would be
updated.

But if we can just make the actual operation faster, then even O(n)
repeated work might be a win in all cases, because it's avoiding the
overhead of extra jobs.

I dunno. I think there's a formula here that depends on the overhead of
a job versus the time to handle a single file in the script, coupled
with the expected number of changed files for any run. I'm not sure of
the exact values of those numbers in this case, but am mostly pointing
out that it's a tradeoff and not always a pure win. :)
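
The tradeoff described above can be put into a rough cost model. This is
only a sketch: the function names are hypothetical, the numbers in the
usage lines are made-up placeholders in milliseconds (not measurements
from this thread), and it assumes every job has the same fixed overhead
and that the monolithic script always re-processes every file.

```shell
#!/bin/sh
# Rough cost model; all inputs are placeholder integers in milliseconds.

# One job that re-processes every file.
# $1 = per-job overhead, $2 = per-file time, $3 = total number of files
monolithic_cost () {
	echo $(( $1 + $2 * $3 ))
}

# Total CPU when each changed file gets its own job.
# $1 = per-job overhead, $2 = per-file time, $3 = number of changed files
split_cpu_cost () {
	echo $(( ($1 + $2) * $3 ))
}

# Wall-clock time for the split version with a fixed number of job slots.
# $1 = per-job overhead, $2 = per-file time, $3 = changed files, $4 = slots
split_wall_cost () {
	rounds=$(( ($3 + $4 - 1) / $4 ))	# ceil(changed / slots)
	echo $(( ($1 + $2) * rounds ))
}

# Placeholder scenario: 150 files, 5ms job overhead, 1ms per file, -j16.
echo "monolithic:         $(monolithic_cost 5 1 150)ms"
echo "split, all changed: $(split_cpu_cost 5 1 150)ms CPU," \
	"$(split_wall_cost 5 1 150 16)ms wall"
echo "split, 3 changed:   $(split_cpu_cost 5 1 3)ms CPU"
```

With these placeholder inputs the split version spends more total CPU
than the monolithic one when everything is dirty, but much less when
only a few files changed. Where the crossover falls depends entirely on
the real overhead and per-file numbers.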

> Something like the hacky throwaway patch that follows. Now when you
> touch a file in Documentation/git-*.txt you re-make just that file
> chain, which gets assembled into the command-list.h:

I know you said this was throwaway, but in case you do pursue it
further, my first run hit:

  $ time make
  GIT_VERSION = 2.32.0.94.gaa5e6f14dd
      * new prefix flags
      GEN build/Documentation
      GEN build/Documentation/git-add.txt.cmdlist.in
  /bin/sh: 1: cannot create build/Documentation/git-add.txt.cmdlist.in: Directory nonexistent
  /bin/sh: 5: cannot create build/Documentation/git-add.txt.cmdlist.in: Directory nonexistent
      GEN build/Documentation/git-am.txt.cmdlist.in
  /bin/sh: 1: cannot create build/Documentation/git-am.txt.cmdlist.in: Directory nonexistent
  /bin/sh: 5: cannot create build/Documentation/git-am.txt.cmdlist.in: Directory nonexistent

So I'd guess there's some race with creating the build/Documentation
directory (a subsequent run worked fine).
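
If the failure really is a job redirecting into build/Documentation
before that directory exists, one way to sidestep it is for each job to
create its own output directory before writing. A minimal sketch,
assuming a hypothetical write_output helper that is not part of the
patch:

```shell
#!/bin/sh
# Hypothetical helper: copy stdin to the file named by $1, creating the
# parent directory first. "mkdir -p" succeeds if the directory already
# exists, so parallel jobs can all call it safely.
write_output () {
	out=$1
	mkdir -p "$(dirname "$out")" &&
	cat >"$out"
}
```

The other usual approach in GNU make is to list the directory as an
order-only prerequisite of each generated file, so that make creates it
before any of those jobs start.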

-Peff

