git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: git@vger.kernel.org
Cc: "René Scharfe" <l.s.r@web.de>,
	"Derrick Stolee" <stolee@gmail.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>
Subject: [PoC] coccinelle: make Coccinelle-related make targets more fine-grained
Date: Thu,  2 Aug 2018 13:55:22 +0200	[thread overview]
Message-ID: <20180802115522.16107-1-szeder.dev@gmail.com> (raw)
In-Reply-To: <20180723135100.24288-1-szeder.dev@gmail.com>


So, I have this PoC patch below on top of this patch series for some
months now; actually this series was all fallout from working on this
patch.  It makes 'make -j<N> coccicheck' much faster (speedup of over
99% :) in some scenarios that, in my experience, occur rather
frequently when working on Coccinelle semantic patches (as opposed to
occasionally running 'make coccicheck' to see whether there are any
regressions).

However, it's not ready for inclusion as it is, because,
unfortunately, the improvements are not without some disadvantages, as
explained in the second half of the commit message below.

Anyway, I'm sending this out, because I don't see how I could make it
any better, but other contributors working on semantic patches might
still find it useful to keep this in their own fork, even in its
current form.  And perhaps someone will come up with a brilliant idea
to address those disadvantages...

Also available at:

  https://github.com/szeder/git make-coccicheck-finegrained


  -- >8 --

Subject: [PATCH] [PoC] coccinelle: make Coccinelle-related make targets more fine-grained

When running 'make coccicheck', each semantic patch is applied to all
source files in a separate shell for loop.  This approach has the
following disadvantages:

  1. Even if you only modified a single C source file, that shell loop
     will still iterate over all source files and apply the semantic
     patches to all of them, wasting a lot of time.

  2. If you apply only a single semantic patch (either implicitly,
     after you modified only one of them (and the results of the
     previous 'make coccicheck' are still there), or explicitly, by
     running e.g. 'make contrib/coccinelle/array.cocci.patch'), then
     you won't be able to exploit multiple CPU cores to speed up the
     operation, because that shell loop will iterate over all source
     files sequentially.

  3. 'make coccicheck' can only use as many parallel jobs as the
     number of semantic patches, so even if you have more available
     CPU cores than that, you can't exploit them all to speed up this
     operation.

  4. During 'make coccicheck' there is only a single

       SPATCH contrib/coccinelle/<whatever>.cocci

     line (or as many lines as the number of parallel jobs) of output
     when starting to apply each semantic patch, and then comes a long
     silence without any indication of progress, because applying some
     of our semantic patches to all source files can take over a
     minute.  This can trick developers new to semantic patches into
     thinking that something went wrong, hung, ended up in an endless
     loop, etc.  It certainly confused my the first time.

Let's add a bit of Makefile metaprogramming to generate finer-grained
make targets applying one semantic patch to only a single source file,
and specify these as dependencies of the targets applying one semantic
patch to all source files.  This way that shell loop can be avoided,
semantic patches will only be applied to changed source files, and the
same semantic patch can be applied in parallel to multiple source
files.  The only remaining sequential part is aggregating the
suggested transformations from the individual targets into a single
patch file, which is comparatively cheap (especially since ideally
there aren't any suggestions).

This change brings spectacular speedup in the scenario described in
point (1) above.  When the results of a previous 'make coccicheck' are
there, the time needed to run

  touch apply.c ; time make -j4 coccicheck

went from 3m42s to 1.848s, which is just over 99% speedup, yay!, and
'apply.c' is the second longest source file in our codebase.  In the
scenario in point (2), running

  touch contrib/coccinelle/array.cocci ; time make -j4 coccicheck

went from 56.364s to 35.883s, which is ~36% speedup.

All the above timings are best-of-five on a machine with 2 physical
and 2 logical cores.  I don't have the hardware to bring any numbers
for point (3).  The time needed to run 'make -j4 coccicheck' in a
clean state didn't change, it's ~3m42s both with and without this
patch.

Unfortunately, this new approach has some disadvantages compared to
the current situation:

  - [RFC]
    With this patch 'make coccicheck's output will look like this
    ('make' apparently doesn't iterate over semantic patches in
    lexicographical order):

      SPATCH commit.cocci              abspath.c
      SPATCH commit.cocci              advice.c
      <... lots of lines ...>
      SPATCH array.cocci               http-walker.c
      SPATCH array.cocci               remote-curl.c

    which means that the number of lines in the output grows from
    "Nr-of-semantic-patches" to "Nr-of-semantic-patches *
    Nr-of-source-files".

    Now, while this certainly addresses point (4) above, it can be
    considered too much, and I'm not sure that (4) is that big an
    issue to justify this much output.

    OTOH, I couldn't yet figure out a way to print a single 'SPATCH
    contrib/coccinelle/<whatever>.cocci' line at the beginning of
    applying that semantic patch without triggering unnecessary work,
    effectively killing most of the runtime benefits.  Or to somehow
    iterate over source files in the outer loop and over semantic
    patches in the inner loop, so we could have one output line per
    file.

  - [RFC]
    The overhead of applying a semantic patch to all source files
    became larger.  'make coccicheck' currently runs only one shell
    process and creates two output files for each semantic patch.
    With this patch it will run approx.  "Nr-of-semantic-patches *
    Nr-of-source-files" shell processes and create twice as many
    output files.

    This overhead can be quantified by measuring the runtime of:

      make -j4 SPATCH=true coccicheck

    and this patch increases it from 0.142s to 1.479s.  While this is
    indeed an order of magnitude increase, it's still negligible
    compared to the "real" runtime of 3m42s.

    So the increased overhead doesn't seem to matter on Linux, but I
    would expect that it's worse on OSX and Windows; though I'm not
    sure whether Coccinelle (with all its OCaml dependencies) works on
    those platforms in the first place.

  - [RFC]
    This approach uses $(eval), which we haven't used in any of our
    Makefiles yet.  I wonder whether it's portable enough.  And it's
    ugly and complicated.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 Makefile                      | 52 ++++++++++++++++++++++++-----------
 contrib/coccinelle/.gitignore |  3 +-
 2 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/Makefile b/Makefile
index d616c04125..f516dd93d1 100644
--- a/Makefile
+++ b/Makefile
@@ -2674,25 +2674,44 @@ COCCI_SOURCES = $(filter-out sha1collisiondetection/%,$(C_SOURCES))
 else
 COCCI_SOURCES = $(filter-out sha1dc/%,$(C_SOURCES))
 endif
+COCCI_SEM_PATCHES = $(wildcard contrib/coccinelle/*.cocci)
 
-%.cocci.patch: %.cocci $(COCCI_SOURCES)
-	@echo '    ' SPATCH $<; \
-	ret=0; \
-	for f in $(COCCI_SOURCES); do \
-		$(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS) || \
-			{ ret=$$?; break; }; \
-	done >$@+ 2>$@.log; \
-	if test $$ret != 0; \
+define cocci_template
+$(cocci_sem_patch)_dirs := $$(addprefix $(cocci_sem_patch).patches/,$$(sort $$(dir $$(COCCI_SOURCES))))
+
+$$($(cocci_sem_patch)_dirs):
+	@mkdir -p $$@
+
+# e.g. 'contrib/coccinelle/strbuf.cocci.patches/builtin/commit.c.patch'
+# Applies one semantic patch to _one_ source file.
+$(cocci_sem_patch).patches/%.patch: % $(cocci_sem_patch)
+	@printf '     SPATCH %-25s %s\n' $$(notdir $(cocci_sem_patch)) $$<; \
+	if ! $$(SPATCH) --sp-file $(cocci_sem_patch) $$< $$(SPATCH_FLAGS) >$$@ 2>$$@.log; \
 	then \
-		cat $@.log; \
+		rm -f $$@; \
+		cat $$@.log; \
 		exit 1; \
-	fi; \
-	mv $@+ $@; \
-	if test -s $@; \
-	then \
-		echo '    ' SPATCH result: $@; \
 	fi
-coccicheck: $(addsuffix .patch,$(wildcard contrib/coccinelle/*.cocci))
+
+# e.g. 'contrib/coccinelle/strbuf.cocci.patch'
+# Applies one semantic patch to _all_ source files.
+$(cocci_sem_patch).patch: $(cocci_sem_patch) $$($(cocci_sem_patch)_dirs) $$(patsubst %,$(cocci_sem_patch).patches/%.patch,$(COCCI_SOURCES))
+	@>$$@+; \
+	for f in $$(filter %.patch,$$^); do \
+		if test -s $$$$f; \
+		then \
+			cat $$$$f >>$$@+; \
+		fi \
+	done; \
+	mv $$@+ $$@; \
+	if test -s $$@; then \
+		echo '    ' SPATCH result: $$@; \
+	fi
+endef
+
+$(foreach cocci_sem_patch,$(COCCI_SEM_PATCHES),$(eval $(cocci_template)))
+
+coccicheck: $(addsuffix .patch,$(COCCI_SEM_PATCHES))
 
 .PHONY: coccicheck
 
@@ -2907,7 +2926,8 @@ profile-clean:
 	$(RM) $(addsuffix *.gcno,$(addprefix $(PROFILE_DIR)/, $(object_dirs)))
 
 cocciclean:
-	$(RM) contrib/coccinelle/*.cocci.patch*
+	$(RM) contrib/coccinelle/*.cocci.patch
+	$(RM) -r contrib/coccinelle/*.cocci.patches/
 
 clean: profile-clean coverage-clean cocciclean
 	$(RM) *.res
diff --git a/contrib/coccinelle/.gitignore b/contrib/coccinelle/.gitignore
index d3f29646dc..7ae6ffa983 100644
--- a/contrib/coccinelle/.gitignore
+++ b/contrib/coccinelle/.gitignore
@@ -1 +1,2 @@
-*.patch*
+*.cocci.patch
+*.cocci.patches/
-- 
2.18.0.408.g42635c01bc


  parent reply	other threads:[~2018-08-02 11:55 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-23 13:50 [PATCH 0/5] Misc Coccinelle-related improvements SZEDER Gábor
2018-07-23 13:50 ` [PATCH 1/5] coccinelle: mark the 'coccicheck' make target as .PHONY SZEDER Gábor
2018-07-23 14:36   ` Derrick Stolee
2018-07-23 19:37   ` Junio C Hamano
2018-07-23 13:50 ` [PATCH 2/5] coccinelle: use $(addsuffix) in 'coccicheck' make target SZEDER Gábor
2018-07-23 19:37   ` Junio C Hamano
2018-07-23 13:50 ` [PATCH 3/5] coccinelle: exclude sha1dc source files from static analysis SZEDER Gábor
2018-07-23 18:27   ` Eric Sunshine
2018-07-23 18:43     ` SZEDER Gábor
2018-07-23 18:57       ` Eric Sunshine
2018-07-23 13:50 ` [PATCH 4/5] coccinelle: put sane filenames into output patches SZEDER Gábor
2018-07-23 14:34   ` Derrick Stolee
2018-07-23 13:51 ` [PATCH 5/5] coccinelle: extract dedicated make target to clean Coccinelle's results SZEDER Gábor
2018-07-23 14:38 ` [PATCH 0/5] Misc Coccinelle-related improvements Derrick Stolee
2018-07-23 15:29 ` Duy Nguyen
2018-07-23 16:30 ` René Scharfe
2018-07-23 19:40 ` Junio C Hamano
2018-08-02 11:55 ` SZEDER Gábor [this message]
2018-08-02 13:24   ` [PoC] coccinelle: make Coccinelle-related make targets more fine-grained René Scharfe
2018-08-02 18:01   ` Jeff King
2018-08-02 18:31     ` Jeff King
2018-08-03  6:21       ` Jonathan Nieder
2018-08-03 13:08         ` Jeff King
2018-08-05 23:02           ` Jonathan Nieder
2018-08-02 19:46     ` Eric Sunshine
2018-08-02 21:29       ` Jeff King
2018-08-02 21:45     ` Ævar Arnfjörð Bjarmason
2018-08-03  6:22       ` Julia Lawall
2018-08-03  6:44         ` Jonathan Nieder
2018-08-03  6:52           ` Julia Lawall
2018-08-03  6:25       ` Julia Lawall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180802115522.16107-1-szeder.dev@gmail.com \
    --to=szeder.dev@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=l.s.r@web.de \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).