* Parallelism defaults and config options
@ 2022-10-24 23:08 Calvin Wan
  2022-10-25  9:48 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 4+ messages in thread
From: Calvin Wan @ 2022-10-24 23:08 UTC (permalink / raw)
  To: Git Mailing List

While trying to figure out how I was going to set the default
parallelism in cw/submodule-status-in-parallel, I noticed some
discrepancies between how all the parallelism config options are set in
git. I wanted to discuss what we can do now to make them more consistent
and also what the standard should be for the future. Here is a list of
parallelism config options in git (let me know if I missed any) and how
they're set:

grep.threads: if unset or set to 0, git uses number of logical cores.
index.threads: if unset or set to 0/true, git uses number of logical
  cores. If set to 1/false, multithreading is disabled
pack.threads: if unset or set to 0, git uses number of logical cores.
  (documentation doesn't mention what default is)
checkout.workers: if unset, defaults to 1. If set to < 1, git uses
  number of logical cores
fetch.parallel: if unset, defaults to 1. If set to 0, git uses number of
  logical cores (documentation says reasonable default)
http.maxRequests: if unset, defaults to 5. If set to < 1, git uses the
  default 5.
submodule.fetchJobs: if unset, defaults to 1. If set to 0, git uses
  number of logical cores (documentation says reasonable default)
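
To make the divergence concrete, this is roughly what the list above
amounts to if it were funneled through a single function (only a
paraphrase of the behaviors above, not the actual code in any of these
commands, and assuming git's usual helpers like online_cpus()):

	/*
	 * Paraphrase of the list above, NOT the real code paths; "unset"
	 * is modeled as a separate flag purely for illustration.
	 */
	static int jobs_from_config(const char *key, int unset, int value)
	{
		if (!strcmp(key, "grep.threads") || !strcmp(key, "pack.threads"))
			return (unset || !value) ? online_cpus() : value;
		if (!strcmp(key, "checkout.workers"))
			return unset ? 1 : (value < 1 ? online_cpus() : value);
		if (!strcmp(key, "fetch.parallel") ||
		    !strcmp(key, "submodule.fetchJobs"))
			return unset ? 1 : (!value ? online_cpus() : value);
		if (!strcmp(key, "http.maxRequests"))
			return (unset || value < 1) ? 5 : value;
		/*
		 * index.threads additionally accepts booleans (true acts
		 * like 0, false like 1), and negative values are handled
		 * differently again depending on the option.
		 */
		return value;
	}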

The first inconsistency is the difference in language used to describe
when each option is set to "online_cpus()". Some are explicit while
others omit it or use language such as "reasonable default". Being
explicit for all of the options is probably the easiest documentation
fix.

The next inconsistency is for values < 1. Most options use online_cpus()
when set to 0 except index.threads which is a special case of its own.
Some options error out when set to a negative number while
checkout.workers falls back to online_cpus() and http.maxRequests falls
back to 5. I don't think we can fix this retroactively unless we decide
that all config options will be set to online_cpus() if the value is
negative. Should that be the case going forward or should 0 be the only
special cased value for now? I can see an argument for allowing other
negative values to be configured in the future for different defaulting
options.

The final inconsistency is how values are defaulted if unset. Some
default to online_cpus() while others default to 1 (http.maxRequests is
5). I want to call out grep.threads specifically here -- on my machine
with 48 cores, the default is actually SLOWER than using 1 thread. This
is because the grep operation is heavily IO bound, so creating too many
threads adds overhead every time the read head changes. Interestingly,
this option runs optimally at 4 since that's how many PCIe lanes my SSD
uses. While it makes sense to default processor heavy operations to
online_cpus(), does it make sense to do the same for IO heavy
operations? (I wasn't able to find an equivalent of online_cpus() for
drive reading capabilities.) And what about operations that have a fair
mix of each?

The safe option is to default to 1 process for many of these config
options, but we trade off in improving the experience for the average
user that is unaware of these options. If we're already defaulting to
online_cpus() for grep.threads and selecting 5 for http.maxRequests,
then why not do the same for other options? My suggestion would be
defaulting IO dominant operations to min(4, online_cpus()) since that
seems like the standard number of lanes for people using SSDs. I would
also default operations that have a mix of both to
min(8, online_cpus()).
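
In code, that suggestion would amount to something like the following
(nothing like this exists today; the profile names and the helper are
made up purely for illustration, online_cpus() is git's existing
helper):

	enum parallel_profile {
		PROFILE_CPU_BOUND,
		PROFILE_IO_BOUND,
		PROFILE_MIXED,
	};

	/* Sketch of the suggested defaults; everything here is a placeholder. */
	static int default_jobs(enum parallel_profile profile)
	{
		int cpus = online_cpus();

		switch (profile) {
		case PROFILE_IO_BOUND:
			return cpus < 4 ? cpus : 4;	/* min(4, online_cpus()) */
		case PROFILE_MIXED:
			return cpus < 8 ? cpus : 8;	/* min(8, online_cpus()) */
		case PROFILE_CPU_BOUND:
		default:
			return cpus;
		}
	}
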
-- 
Calvin Wan


* Re: Parallelism defaults and config options
  2022-10-24 23:08 Parallelism defaults and config options Calvin Wan
@ 2022-10-25  9:48 ` Ævar Arnfjörð Bjarmason
  2022-10-25 18:01   ` Calvin Wan
  0 siblings, 1 reply; 4+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-25  9:48 UTC (permalink / raw)
  To: Calvin Wan; +Cc: Git Mailing List


On Mon, Oct 24 2022, Calvin Wan wrote:

> While trying to figure out how I was going to set the default
> parallelism in cw/submodule-status-in-parallel, I noticed some
> discrepancies between how all the parallelism config options are set in
> git. I wanted to discuss what we can do now to make them more consistent
> and also what the standard should be for the future. Here is a list of
> parallelism config options in git (let me know if I missed any) and how
> they're set:

Yeah, I've noticed this before & found it odd, thanks for trying to make
it sensible.

> grep.threads: if unset or set to 0, git uses number of logical cores.
> index.threads: if unset or set to 0/true, git uses number of logical
>   cores. If set to 1/false, multithreading is disabled
> pack.threads: if unset or set to 0, git uses number of logical cores.
>   (documentation doesn't mention what default is)
> checkout.workers: if unset, defaults to 1. If set to < 1, git uses
>   number of logical cores
> fetch.parallel: if unset, defaults to 1. If set to 0, git uses number of
>   logical cores (documentation says reasonable default)
> http.maxRequests: if unset, defaults to 5. If set to < 1, git uses the
>   default 5.
> submodule.fetchJobs: if unset, defaults to 1. If set to 0, git uses
>   number of logical cores (documentation says reasonable default)

Aside from anything we come up with, I think it's fine to change the
defaults, i.e. to make some of these parallel by default where they
aren't now.

> The first inconsistency is the difference in language used to describe
> when each option is set to "online_cpus()". Some are explicit while
> others omit it or use language such as "reasonable default". Being
> explicit for all of the options is probably the easiest documentation
> fix.

Ideally I think a documentation fix would be to just have these point at
the same text, e.g. a "parent" option could explain all the details, and
they could each point to that.

> The next inconsistency is for values < 1. Most options use online_cpus()
> when set to 0 except index.threads which is a special case of its own.
> Some options error out when set to a negative number while
> checkout.workers falls back to online_cpus() and http.maxRequests falls
> back to 5. I don't think we can fix this retroactively unless we decide
> that all config options will be set to online_cpus() if the value is
> negative. Should that be the case going forward or should 0 be the only
> special cased value for now? I can see an argument for allowing other
> negative values to be configured in the future for different defaulting
> options.

We also have other options, like warning() on negative values if we
think it's worth making these consistent while slightly annoying
existing users.
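
E.g. something like this for the negative case (just a sketch; the
message, the fallback and the helper name are placeholders, warning()
and online_cpus() are the existing helpers):

	static int clamp_jobs_value(const char *key, int value, int fallback)
	{
		if (value < 0) {
			warning(_("negative value for '%s' (%d); using %d instead"),
				key, value, fallback);
			return fallback;
		}
		if (!value)
			return online_cpus();	/* 0 = "as parallel as we can" */
		return value;
	}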

I think it's safe to say that by far the majority of git users aren't
tweaking these at all, so we have a lot of leeway for sensible changes.

> The final inconsistency is how values are defaulted if unset. Some
> default to online_cpus() while others default to 1 (http.maxRequests is
> 5). I want to call out grep.threads specifically here -- on my machine
> with 48 cores, the default is actually SLOWER than using 1 thread. This
> is because the grep operation is heavily IO bound, so creating too many
> threads adds overhead every time the read head changes. Interestingly,
> this option runs optimally at 4 since that's how many PCIe lanes my SSD
> uses. While it makes sense to default processor heavy operations to
> online_cpus(), does it make sense to do the same for IO heavy
> operations? (I wasn't able to find an equivalent of online_cpus() for
> drive reading capabilities.) And what about operations that have a fair
> mix of each?

First, http.maxRequests is different from any of the rest, it'll impact
3rd party servers. So we should treat it differently. I'd think unless
we have solid evidence for something else we should just leave that one
be, and certainly not have it be some function of online_cpus().

> The safe option is to default to 1 process for many of these config
> options, but we trade off in improving the experience for the average
> user that is unaware of these options. If we're already defaulting to
> online_cpus() for grep.threads and selecting 5 for http.maxRequests,
> then why not do the same for other options? My suggestion would be
> defaulting IO dominant operations to min(4, online_cpus()) since that
> seems like the standard number of lanes for people using SSDs. I would
> also default operations that have a mix of both to
> min(8, online_cpus()).

I haven't thought/tested what the defaults *should* be, but I think it's
a fair assumption that the current defaults were probably picked on the
basis of a few ad-hoc tests on some person's laptop :)

I.e. the 48 core case you mention etc. is likely to be untested & wasn't
thought of at the time.

I think *structurally* the best approach is something like having a
family of config variables like:
	
	core.jobs: [(false | 1)|(0 | true) | [2..Inf] ]
	core.jobs.IOBound: [(false | 1)|(0 | true) | [2..Inf]]
	core.jobs.CPUBound: [(false | 1)|(0 | true) | [2..Inf]]

Note that it's "0 or true" and "1 or false", not a mistake, i.e. that
matches our current defaults. You'd set it to "true" to get the "yes, I
want it parallel" setting.

We'd have these take priority from each other, so "grep.threads" would
override "core.jobs.IOBound", which in turn would override "core.jobs".

The common case would be that you wouldn't set either "core.jobs" or
"grep.threads", so we'd default to "core.jobs.IOBound", which we'd set
to some sensible default.
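
In code the lookup might end up looking something like this (entirely
untested, the core.jobs* names are just the strawman above, and the
special 0/true/false handling is left out; git_config_get_int() is the
existing config API):

	/*
	 * Fallback chain: per-command knob, then the per-profile knob,
	 * then core.jobs, then whatever hardcoded default we pick.
	 */
	static int resolve_jobs(const char *command_key, const char *profile_key,
				int hard_default)
	{
		int value;

		if (command_key && !git_config_get_int(command_key, &value))
			return value;		/* e.g. "grep.threads" */
		if (profile_key && !git_config_get_int(profile_key, &value))
			return value;		/* e.g. "core.jobs.IOBound" */
		if (!git_config_get_int("core.jobs", &value))
			return value;
		return hard_default;
	}

So "git grep" would end up calling something like
resolve_jobs("grep.threads", "core.jobs.IOBound", <IO-bound default>).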

All of that's off-the-cuff, but hopefully has some worthwhile
feedback. In particular I think leaving our docs in a state where we
don't have to repeat ourselves (sometimes inaccurately) would be nice,
whether it's via includes or because we point to a "parent" variable.

* Re: Parallelism defaults and config options
  2022-10-25  9:48 ` Ævar Arnfjörð Bjarmason
@ 2022-10-25 18:01   ` Calvin Wan
  2022-10-25 18:47     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 4+ messages in thread
From: Calvin Wan @ 2022-10-25 18:01 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

> > The next inconsistency is for values < 1. Most options use online_cpus()
> > when set to 0 except index.threads which is a special case of its own.
> > Some options error out when set to a negative number while
> > checkout.workers falls back to online_cpus() and http.maxRequests falls
> > back to 5. I don't think we can fix this retroactively unless we decide
> > that all config options will be set to online_cpus() if the value is
> > negative. Should that be the case going forward or should 0 be the only
> > special cased value for now? I can see an argument for allowing other
> > negative values to be configured in the future for different defaulting
> > options.
>
> We also have other options, like warning() on negative values if we
> think it's worth making these consistent while slightly annoying
> existing users.
>
> I think it's safe to say that by far the majority of git users aren't
> tweaking these at all, so we have a lot of leeway for sensible changes.

I like the warning() idea -- it doesn't break people's git and it sets the
right precedent for what to expect from similar parallelism config options.

> > The final inconsistency is how values are defaulted if unset. Some
> > default to online_cpus() while others default to 1 (http.maxRequests is
> > 5). I want to call out grep.threads specifically here -- on my machine
> > with 48 cores, the default is actually SLOWER than using 1 thread. This
> > is because the grep operation is heavily IO bound, so creating too many
> > threads adds overhead every time the read head changes. Interestingly,
> > this option runs optimally at 4 since that's how many PCIe lanes my SSD
> > uses. While it makes sense to default processor heavy operations to
> > online_cpus(), does it make sense to do the same for IO heavy
> > operations? (I wasn't able to find an equivalent of online_cpus() for
> > drive reading capabilities.) And what about operations that have a fair
> > mix of each?
>
> First, http.maxRequests is different from any of the rest, it'll impact
> 3rd party servers. So we should treat it differently. I'd think unless
> we have solid evidence for something else we should just leave that one
> be, and certainly not have it be some function of online_cpus().

I wasn't planning on touching that one either -- mostly brought it up as an
example of an arbitrarily picked default.

> > The safe option is to default to 1 process for many of these config
> > options, but we trade off in improving the experience for the average
> > user that is unaware of these options. If we're already defaulting to
> > online_cpus() for grep.threads and selecting 5 for http.maxRequests,
> > then why not do the same for other options? My suggestion would be
> > defaulting IO dominant operations to min(4, online_cpus()) since that
> > seems like the standard number of lanes for people using SSDs. I would
> > also default operations that have a mix of both to
> > min(8, online_cpus()).
>
> I haven't thought/tested what the defaults *should* be, but I think it's
> a fair assumption that the current defaults were probably picked on the
> basis of a few ad-hoc tests on some person's laptop :)
>
> I.e. the 48 core case you mention etc. is likely to be untested & wasn't
> thought of at the time.

Even with 8 threads, git grep runs very slightly slower than with 1 thread
for me. Unless we have something along the lines of "git setup-parallelism",
any default we pick will have different outcomes for different users, but I
think we can at least make a better guess than what we currently have.

> I think *structurally* the best approach is something like having a
> family of config variables like:
>
>         core.jobs: [(false | 1)|(0 | true) | [2..Inf] ]
>         core.jobs.IOBound: [(false | 1)|(0 | true) | [2..Inf]]
>         core.jobs.CPUBound: [(false | 1)|(0 | true) | [2..Inf]]
>
> Note that it's "0 or true" and "1 or false", not a mistake, i.e. that
> matches our current defaults. You'd set it to "true" to get the "yes, I
> want it parallel" setting.
>
> We'd have these take priority from each other, so "grep.threads" would
> override "core.jobs.IOBound", which in turn would override "core.jobs".
>
> The common case would be that you wouldn't set either "core.jobs" or
> "grep.threads", so we'd default to "core.jobs.IOBound", which we'd set
> to some sensible default.

While I like this concept very much, my worry is that some commands
might not fall neatly into IOBound or CPUBound. If they're a mix of the
two, or bound by something else entirely (network for fetch.parallel,
maybe? I'm not sure, I haven't looked into it much), then which bucket
would we put them under?


* Re: Parallelism defaults and config options
  2022-10-25 18:01   ` Calvin Wan
@ 2022-10-25 18:47     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 4+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-25 18:47 UTC (permalink / raw)
  To: Calvin Wan; +Cc: Git Mailing List


On Tue, Oct 25 2022, Calvin Wan wrote:
>> > The safe option is to default to 1 process for many of these config
>> > options, but we trade off in improving the experience for the average
>> > user that is unaware of these options. If we're already defaulting to
>> > online_cpus() for grep.threads and selecting 5 for http.maxRequests,
>> > then why not do the same for other options? My suggestion would be
>> > defaulting IO dominant operations to min(4, online_cpus()) since that
>> > seems like the standard number of lanes for people using SSDs. I would
>> > also default operations that have a mix of both to
>> > min(8, online_cpus()).
>>
>> I haven't thought/tested what the defaults *should* be, but I think it's
>> a fair assumption that the current defaults were probably picked on the
>> basis of a few ad-hoc tests on some person's laptop :)
>>
>> I.e. the 48 core case you mention etc. is likely to be untested & wasn't
>> thought of at the time.
>
> Even with 8 threads, git grep runs very slightly slower than with 1 thread
> for me. Unless we have something along the lines of "git setup-parallelism",
> any default we pick will have different outcomes for different users, but I
> think we can at least make a better guess than what we currently have.

For me e.g.:

	hyperfine -L f E,P -L n 0,1,2,3,4,5,6,7,8,9,10 'git -P -c grep.threads={n} grep -{f} a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'

Gets this in git.git; this is on an 8-core box:
	
	Summary
	  'git -P -c grep.threads=4 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa' ran
	    1.01 ± 0.11 times faster than 'git -P -c grep.threads=3 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.03 ± 0.16 times faster than 'git -P -c grep.threads=5 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.14 ± 0.14 times faster than 'git -P -c grep.threads=6 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.22 ± 0.12 times faster than 'git -P -c grep.threads=7 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.29 ± 0.13 times faster than 'git -P -c grep.threads=0 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.29 ± 0.13 times faster than 'git -P -c grep.threads=8 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.31 ± 0.13 times faster than 'git -P -c grep.threads=2 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.33 ± 0.13 times faster than 'git -P -c grep.threads=9 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.40 ± 0.15 times faster than 'git -P -c grep.threads=10 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.49 ± 0.15 times faster than 'git -P -c grep.threads=6 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.50 ± 0.15 times faster than 'git -P -c grep.threads=7 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.57 ± 0.16 times faster than 'git -P -c grep.threads=0 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.59 ± 0.17 times faster than 'git -P -c grep.threads=8 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.59 ± 0.19 times faster than 'git -P -c grep.threads=5 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.64 ± 0.17 times faster than 'git -P -c grep.threads=9 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.66 ± 0.20 times faster than 'git -P -c grep.threads=4 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.73 ± 0.18 times faster than 'git -P -c grep.threads=10 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    1.94 ± 0.28 times faster than 'git -P -c grep.threads=3 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    2.15 ± 0.21 times faster than 'git -P -c grep.threads=1 grep -P a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    2.59 ± 0.25 times faster than 'git -P -c grep.threads=2 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'
	    4.70 ± 0.46 times faster than 'git -P -c grep.threads=1 grep -E a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa'

YMMV.

I think there's been some past discussion about having a probing command
figure this out for you; longer term that would be neat.

>> I think *structurally* the best approach is something like having a
>> family of config variables like:
>>
>>         core.jobs: [(false | 1)|(0 | true) | [2..Inf] ]
>>         core.jobs.IOBound: [(false | 1)|(0 | true) | [2..Inf]]
>>         core.jobs.CPUBound: [(false | 1)|(0 | true) | [2..Inf]]
>>
>> Note that it's "0 or true" and "1 or false", not a mistake, i.e. that
>> matches our current defaults. You'd set it to "true" to get the "yes, I
>> want it parallel" setting.
>>
>> We'd have these take priority from each other, so "grep.threads" would
>> override "core.jobs.IOBound", which in turn would override "core.jobs".
>>
>> The common case would be that you wouldn't set either "core.jobs" or
>> "grep.threads", so we'd default to "core.jobs.IOBound", which we'd set
>> to some sensible default.
>
> While I like this concept very much, my worry is that some commands
> might not fall nicely into IOBound or CPUBound. If they're a mix of the two
> or bound by possibly something else (like network for fetch.parallel? not
> sure about this one haven't looked too into it), then what bucket would we
> put them under?

Just have them use the top-level config if there's no "intermediate".

Or maybe everything should go straight to the top-level default.

Or maybe there shouldn't be a "top-level" at all, and nobody wants to
configure these N at a time, I don't know.

I was just trying to offer you a way out of the problem of wanting
different defaults for certain variables: we could "hardcode" them
across the different variables, or, if we find commonalities, we could
have them fall back to another parent config variable.
