git@vger.kernel.org mailing list mirror (one of many)
From: intelfx@intelfx.name
To: git@vger.kernel.org
Subject: Building with PGO: concurrency and test data
Date: Sun, 21 Apr 2024 02:52:48 +0200
Message-ID: <65f32df3f49341bf192b606914d44cc937f7971a.camel@intelfx.name>

Hi!

I'm trying to build Git with PGO (for a private distribution) and I
have two questions about the specifics of the profiling process.


1. The INSTALL doc says that the profiling pass has to run the test
suite using a single CPU, and the Makefile `profile` target also
encodes this rule:

> As a caveat: a profile-optimized build takes a *lot* longer since the
> git tree must be built twice, and in order for the profiling
> measurements to work properly, ccache must be disabled and the test
> suite has to be run using only a single CPU. <...>
( https://github.com/git/git/blob/master/INSTALL#L54-L59 )

> profile:: profile-clean
> 	$(MAKE) PROFILE=GEN all
> 	$(MAKE) PROFILE=GEN -j1 test
> 	@if test -n "$$GIT_PERF_REPO" || test -d .git; then \
> 		$(MAKE) PROFILE=GEN -j1 perf; \
> <...>
( https://github.com/git/git/blob/master/Makefile#L2350-L2352 )

However, some cursory searching suggests that GCC is able to handle
concurrent runs of an instrumented program:

> > It is unclear to me if one can safely run multiple processes
> > concurrently.
> > is there any risk of corruption or overwriting of the various
> > "gcda" files if different processes attempt to write on them?
>
> The gcda files are accessed by proper locks, so you should be sa[f]e.
( https://gcc-help.gcc.gnu.narkive.com/0NItmccw/is-it-safe-to-generate-profiles-from-multiple-concurrent-processes#post1 )

As far as I understand, the collected profiling data does not include
timing information or any performance counters. What am I missing? Why
can't the test suite be run in parallel during the profiling pass?
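A toy model of the reasoning above, for illustration only: each forked process stands in for one test script, counts how often each of a few hypothetical "branches" is taken, and then adds its counts into a shared file while holding an exclusive fcntl lock, which is roughly what the quoted reply describes for .gcda files. The file layout, `merge_profile()` and the workload are all made up; this is not libgcov's code or the .gcda format. It only shows that purely additive counters merged under a lock come out the same however the runs interleave; it says nothing about whether other kinds of profile data, or the test suite itself, depend on running serially.

```python
import fcntl
import os
import struct
import tempfile

NCOUNTERS = 4                # hypothetical number of instrumented branches
FMT = f"<{NCOUNTERS}q"       # toy file format: NCOUNTERS little-endian int64s
SIZE = struct.calcsize(FMT)


def merge_profile(path: str, counts: list[int]) -> None:
    """Add one run's counters into the shared toy profile under an exclusive lock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)   # blocks while another process holds it
        data = os.pread(fd, SIZE, 0)     # empty on the very first merge
        totals = list(struct.unpack(FMT, data)) if data else [0] * NCOUNTERS
        merged = [t + c for t, c in zip(totals, counts)]
        os.pwrite(fd, struct.pack(FMT, *merged), 0)
    finally:
        os.close(fd)                     # closing the fd also drops the lock


def run_workload(worker_id: int) -> list[int]:
    """Stand-in for one test script: count how often each 'branch' is taken."""
    counts = [0] * NCOUNTERS
    for i in range(1000 * (worker_id + 1)):
        counts[i % NCOUNTERS] += 1
    return counts


def profile_concurrently(path: str, nprocs: int) -> list[int]:
    """Fork nprocs workers that run and merge concurrently; return the final totals."""
    if os.path.exists(path):
        os.unlink(path)                  # start from an empty profile
    pids = []
    for worker_id in range(nprocs):
        pid = os.fork()
        if pid == 0:                     # child: run, merge, exit immediately
            status = 1
            try:
                merge_profile(path, run_workload(worker_id))
                status = 0
            finally:
                os._exit(status)         # skip the parent's cleanup/atexit code
        pids.append(pid)
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            raise RuntimeError(f"worker {pid} failed")
    with open(path, "rb") as f:
        return list(struct.unpack(FMT, f.read()))


def profile_serially(nprocs: int) -> list[int]:
    """Same workloads, one after another, summed in memory."""
    totals = [0] * NCOUNTERS
    for worker_id in range(nprocs):
        totals = [t + c for t, c in zip(totals, run_workload(worker_id))]
    return totals


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "toy.profile")
    concurrent = profile_concurrently(path, nprocs=8)
    print(concurrent, concurrent == profile_serially(8))
```

With 8 workers, each toy counter ends up at 9000 (250 * (1 + 2 + ... + 8)) in both the concurrent and the serial run, because addition does not care about ordering.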


2. The performance test suite (t/perf/) uses up to two Git repositories
("normal" and "large") as test data to run git commands against. Does
the internal organization of these repositories matter? That is, does
it matter whether they are real-world repositories that have seen
actual use, with overlapping packs, cruft, loose objects, many refs,
etc., or can I simply use fresh clones of git.git and linux.git
without losing profile quality?

Thanks,

-- 
Ivan Shapovalov / intelfx /


Thread overview: 3+ messages
2024-04-21  0:52 intelfx [this message]
2024-04-21 15:45 ` Building with PGO: concurrency and test data Mike Castle
2024-04-23 22:42 ` Jeff King
