git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Calvin Wan <calvinwan@google.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Parallelism defaults and config options
Date: Tue, 25 Oct 2022 20:47:45 +0200
Message-ID: <221025.86k04nd89c.gmgdl@evledraar.gmail.com>
In-Reply-To: <CAFySSZBnuszT0iYdFThRzktBuMaCTfGCTz5nbhK6sbrt=QL+5w@mail.gmail.com>


On Tue, Oct 25 2022, Calvin Wan wrote:
>> > The safe option is to default to 1 process for many of these config
>> > options, but we trade off in improving the experience for the average
>> > user that is unaware of these options. If we're already defaulting to
>> > online_cpus() for grep.threads and selecting 5 for http.maxRequests,
>> > then why not do the same for other options? My suggestion would be
>> > defaulting IO dominant operations to min(4, online_cpus()) since that
>> > seems like the standard number of lanes for people using SSDs. I would
>> > also default operations that have a mix of both to
>> > min(8, online_cpus()).
>>
>> I haven't thought/tested what the defaults *should* be, but I think it's
>> a fair assumption that the current defaults were probably picked on the
>> basis of a few ad-hoc tests on some person's laptop :)
>>
>> I.e. the 48 core case you mention etc. is likely to be untested & wasn't
>> thought of at the time.
>
> Even with 8 threads, git grep runs very slightly slower than with 1 thread
> for me. Unless we have something along the lines of "git setup-parallelism",
> any default we pick will have different outcomes for different users, but I
> think we can at least make a better guess than what we currently have.

For example, running this in git.git on an 8-core box:

	hyperfine -L f E,P -L n 0,1,2,3,4,5,6,7,8,9,10 'git -P -c grep.threads={n} grep -{f} a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'

gives:
	
	Summary
	  'git -P -c grep.threads=4 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa' ran
	    1.01 ± 0.11 times faster than 'git -P -c grep.threads=3 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.03 ± 0.16 times faster than 'git -P -c grep.threads=5 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.14 ± 0.14 times faster than 'git -P -c grep.threads=6 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.22 ± 0.12 times faster than 'git -P -c grep.threads=7 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.29 ± 0.13 times faster than 'git -P -c grep.threads=0 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.29 ± 0.13 times faster than 'git -P -c grep.threads=8 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.31 ± 0.13 times faster than 'git -P -c grep.threads=2 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.33 ± 0.13 times faster than 'git -P -c grep.threads=9 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.40 ± 0.15 times faster than 'git -P -c grep.threads=10 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.49 ± 0.15 times faster than 'git -P -c grep.threads=6 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.50 ± 0.15 times faster than 'git -P -c grep.threads=7 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.57 ± 0.16 times faster than 'git -P -c grep.threads=0 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.59 ± 0.17 times faster than 'git -P -c grep.threads=8 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.59 ± 0.19 times faster than 'git -P -c grep.threads=5 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.64 ± 0.17 times faster than 'git -P -c grep.threads=9 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.66 ± 0.20 times faster than 'git -P -c grep.threads=4 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.73 ± 0.18 times faster than 'git -P -c grep.threads=10 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.94 ± 0.28 times faster than 'git -P -c grep.threads=3 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    2.15 ± 0.21 times faster than 'git -P -c grep.threads=1 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    2.59 ± 0.25 times faster than 'git -P -c grep.threads=2 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    4.70 ± 0.46 times faster than 'git -P -c grep.threads=1 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'

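I.e. on this box 4 threads is the fastest setting for -P and 6 for -E,
the default (grep.threads=0, i.e. one thread per online CPU, so 8 here)
is about 1.3x slower than the best -P run, and a single thread is the
slowest setting for both -P and -E. YMMV.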

I think there's been some past discussion about having a probing command
figure this out for you; longer term, that would be neat.

>> I think *structurally* the best approach is something like having a
>> family of config variables like:
>>
>>         core.jobs: [(false | 1)|(0 | true) | [2..Inf] ]
>>         core.jobs.IOBound: [(false | 1)|(0 | true) | [2..Inf]]
>>         core.jobs.CPUBound: [(false | 1)|(0 | true) | [2..Inf]]
>>
>> Note that it's "0 or true" and "1 or false", not a mistake, i.e. that
>> matches our current defaults. You'd set it to "true" to get the "yes, I
>> want it parallel" setting.
>>
>> We'd have these take priority from each other, so "grep.threads" would
>> override "core.jobs.IOBound", which in turn would override "core.jobs".
>>
>> The common case would be that you wouldn't set either "core.jobs" or
>> "grep.threads", so we'd default to "core.jobs.IOBound", which we'd set
>> to some sensible default.
>
> While I like this concept very much, my worry is that some commands
> might not fall nicely into IOBound or CPUBound. If they're a mix of the two
> or bound by possibly something else (like network for fetch.parallel? not
> sure about this one haven't looked too into it), then what bucket would we
> put them under?

Just have them use the top-level config if there's no "intermediate".

Or maybe everything should go straight to the top-level default.

Or maybe there shouldn't be a "top-level" at all, and nobody wants to
configure these N at a time. I don't know.

I was just trying to offer you a way out of the problem of wanting
different defaults for certain variables: we could "hardcode" them per
variable, or, if we find commonalities, have them fall back to a shared
parent config variable.
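
A minimal sketch of such a fallback chain in POSIX shell, assuming the
hypothetical "core.jobs" and "core.jobs.IOBound" variables from the
proposal quoted above (neither exists in Git today) and its semantics of
"1 or false" meaning serial and "0 or true" meaning one job per CPU. All
function names are made up for illustration, "getconf
_NPROCESSORS_ONLN" is an assumed stand-in for Git's C online_cpus(), and
the built-in default uses the min(4, online_cpus()) idea from earlier in
the thread:

```shell
#!/bin/sh
# Hypothetical sketch: none of the core.jobs* variables exist in Git, and
# all function names here are made up for illustration.

# Online CPU count; set ONLINE_CPUS to override (e.g. for testing).
# getconf _NPROCESSORS_ONLN is assumed as a stand-in for online_cpus().
online_cpus () {
	echo "${ONLINE_CPUS:-$(getconf _NPROCESSORS_ONLN)}"
}

# Map one config value to a job count using the proposed semantics:
# "false" or 1 -> 1 (serial), "true" or 0 -> online_cpus(), N >= 2 -> N.
normalize_jobs () {
	case "$1" in
	false|1)
		echo 1 ;;
	true|0)
		online_cpus ;;
	''|*[!0-9]*)
		echo "invalid job count: '$1'" >&2
		return 1 ;;
	*)
		echo "$1" ;;
	esac
}

# min(cap, online_cpus()), the kind of built-in default suggested upthread.
capped_default () {
	cpus=$(online_cpus)
	if test "$cpus" -gt "$1"
	then
		echo "$1"
	else
		echo "$cpus"
	fi
}

# Walk the chain from most to least specific and use the first value
# that is set; an empty argument means "unset".
#   $1: command-specific value (e.g. grep.threads)
#   $2: class value (e.g. core.jobs.IOBound)
#   $3: core.jobs
#   $4: built-in default, already a job count
resolve_jobs () {
	for v in "$1" "$2" "$3"
	do
		if test -n "$v"
		then
			normalize_jobs "$v"
			return
		fi
	done
	echo "$4"
}
```

With real configuration, the first three arguments could be filled from
"git config --get grep.threads", "git config --get core.jobs.IOBound" and
"git config --get core.jobs" (each prints nothing for an unset key), and
the fourth from "capped_default 4".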

Thread overview: 4+ messages
2022-10-24 23:08 Parallelism defaults and config options Calvin Wan
2022-10-25  9:48 ` Ævar Arnfjörð Bjarmason
2022-10-25 18:01   ` Calvin Wan
2022-10-25 18:47     ` Ævar Arnfjörð Bjarmason [this message]
