git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/3] index-pack threading defaults
@ 2020-08-21 17:51 Jeff King
  2020-08-21 17:53 ` [PATCH 1/3] p5302: disable thread-count parameter tests by default Jeff King
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Jeff King @ 2020-08-21 17:51 UTC (permalink / raw)
  To: git

I decided to revisit index-pack's hard-coded default of 3 threads. And
it turns out that higher numbers perform much better. :)

That value was determined experimentally in 2012. I'm not sure of the
exact reason it's different now (modern processors are better at
parallelism, or modern git is better at parallelism, or the original
experiment was just a fluke). But regardless, I can get on the order of
a two-to-one speedup by bumping the thread count. See the final patch
for timings and more specific discussion.

The first two patches are just perf-test improvements (actually, the
slowness to value ratio of p5302 from the first patch is why I was
looking at this at all).

  [1/3]: p5302: disable thread-count parameter tests by default
  [2/3]: p5302: count up to online-cpus for thread tests
  [3/3]: index-pack: adjust default threading cap

 builtin/index-pack.c       | 19 ++++++++++++---
 t/perf/README              |  9 ++++++++
 t/perf/p5302-pack-index.sh | 47 +++++++++++++++++++-------------------
 t/perf/perf-lib.sh         |  2 ++
 4 files changed, 51 insertions(+), 26 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/3] p5302: disable thread-count parameter tests by default
  2020-08-21 17:51 [PATCH 0/3] index-pack threading defaults Jeff King
@ 2020-08-21 17:53 ` Jeff King
  2020-08-21 17:54 ` [PATCH 2/3] p5302: count up to online-cpus for thread tests Jeff King
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2020-08-21 17:53 UTC (permalink / raw)
  To: git

The primary function of the perf suite is to detect regressions (or
improvements) between versions of Git. The only numbers we show a direct
comparison for are timings between the same test run on two different
versions.

However, it can sometimes be used to collect other information.  For
instance, p5302 runs the same index-pack operation with different thread
counts. The output doesn't directly compare these, but anybody
interested in working on index-pack can manually compare the results.

For a normal regression run of the full perf-suite, though, this incurs
a significant cost to generate numbers nobody will actually look at;
about 25% of the total time of the test suite is spent in p5302. And the
low-thread-count runs are the most expensive part of it, since they're
(unsurprisingly) not using as many threads.

Let's skip these tests by default, but make it possible for people
working on index-pack to still run them by setting an environment
variable. Rather than make this specific to p5302, let's introduce a
generic mechanism. This makes it possible to run the full suite with
every possible test if somebody really wants to burn some CPU.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/perf/README              |  9 +++++++++
 t/perf/p5302-pack-index.sh | 10 +++++-----
 t/perf/perf-lib.sh         |  2 ++
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/t/perf/README b/t/perf/README
index c7b70e2d28..bd649afa97 100644
--- a/t/perf/README
+++ b/t/perf/README
@@ -84,6 +84,15 @@ You can set the following variables (also in your config.mak):
 	probably be about linux.git size for optimal results.
 	Both default to the git.git you are running from.
 
+    GIT_PERF_EXTRA
+	Boolean to enable additional tests. Most test scripts are
+	written to detect regressions between two versions of Git, and
+	the output will compare timings for individual tests between
+	those versions. Some scripts have additional tests which are not
+	run by default, that show patterns within a single version of
+	Git (e.g., performance of index-pack as the number of threads
+	changes). These can be enabled with GIT_PERF_EXTRA.
+
 You can also pass the options taken by ordinary git tests; the most
 useful one is:
 
diff --git a/t/perf/p5302-pack-index.sh b/t/perf/p5302-pack-index.sh
index a9b3e112d9..23011ab739 100755
--- a/t/perf/p5302-pack-index.sh
+++ b/t/perf/p5302-pack-index.sh
@@ -13,31 +13,31 @@ test_expect_success 'repack' '
 	export PACK
 '
 
-test_perf 'index-pack 0 threads' '
+test_perf PERF_EXTRA 'index-pack 0 threads' '
 	rm -rf repo.git &&
 	git init --bare repo.git &&
 	GIT_DIR=repo.git git index-pack --threads=1 --stdin < $PACK
 '
 
-test_perf 'index-pack 1 thread ' '
+test_perf PERF_EXTRA 'index-pack 1 thread ' '
 	rm -rf repo.git &&
 	git init --bare repo.git &&
 	GIT_DIR=repo.git GIT_FORCE_THREADS=1 git index-pack --threads=1 --stdin < $PACK
 '
 
-test_perf 'index-pack 2 threads' '
+test_perf PERF_EXTRA 'index-pack 2 threads' '
 	rm -rf repo.git &&
 	git init --bare repo.git &&
 	GIT_DIR=repo.git git index-pack --threads=2 --stdin < $PACK
 '
 
-test_perf 'index-pack 4 threads' '
+test_perf PERF_EXTRA 'index-pack 4 threads' '
 	rm -rf repo.git &&
 	git init --bare repo.git &&
 	GIT_DIR=repo.git git index-pack --threads=4 --stdin < $PACK
 '
 
-test_perf 'index-pack 8 threads' '
+test_perf PERF_EXTRA 'index-pack 8 threads' '
 	rm -rf repo.git &&
 	git init --bare repo.git &&
 	GIT_DIR=repo.git git index-pack --threads=8 --stdin < $PACK
diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
index 13e389367a..821581a885 100644
--- a/t/perf/perf-lib.sh
+++ b/t/perf/perf-lib.sh
@@ -245,3 +245,5 @@ test_at_end_hook_ () {
 test_export () {
 	export "$@"
 }
+
+test_lazy_prereq PERF_EXTRA 'test_bool_env GIT_PERF_EXTRA false'
-- 
2.28.0.694.g07780f7063


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/3] p5302: count up to online-cpus for thread tests
  2020-08-21 17:51 [PATCH 0/3] index-pack threading defaults Jeff King
  2020-08-21 17:53 ` [PATCH 1/3] p5302: disable thread-count parameter tests by default Jeff King
@ 2020-08-21 17:54 ` Jeff King
  2020-08-21 17:58   ` Jeff King
  2020-08-21 17:58 ` [PATCH 3/3] index-pack: adjust default threading cap Jeff King
  2020-08-21 18:44 ` [PATCH 0/3] index-pack threading defaults Jeff King
  3 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2020-08-21 17:54 UTC (permalink / raw)
  To: git

When PERF_EXTRA is enabled, p5302 checks the performance of index-pack
with various numbers of threads. This can be useful for deciding what
the default should be (which is currently capped at 3 threads based on
the results of this script).

However, we only go up to 8 threads, and modern machines may have more.
Let's get the number of CPUs from test-tool, and test various numbers of
threads between one and that maximum.

Note that the current tests aren't all identical, as we have to set
GIT_FORCE_THREADS for the --threads=1 test (which measures the overhead
of starting a single worker thread versus the "0" case of using the main
thread). To keep the loop simple, we'll keep the "0" case out of it, and
set GIT_FORCE_THREADS=1 for all of the other cases (it's a noop for all
but the "1" case, since numbers higher than 1 would always need
threads).

Note also that we could skip running "test-tool" if PERF_EXTRA isn't
set. However, there's some small value in knowing the number of threads,
so that we can mark each test as skipped in the output.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/perf/p5302-pack-index.sh | 47 +++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/t/perf/p5302-pack-index.sh b/t/perf/p5302-pack-index.sh
index 23011ab739..228593d9ad 100755
--- a/t/perf/p5302-pack-index.sh
+++ b/t/perf/p5302-pack-index.sh
@@ -13,35 +13,36 @@ test_expect_success 'repack' '
 	export PACK
 '
 
-test_perf PERF_EXTRA 'index-pack 0 threads' '
-	rm -rf repo.git &&
-	git init --bare repo.git &&
-	GIT_DIR=repo.git git index-pack --threads=1 --stdin < $PACK
+# Rather than counting up and doubling each time, count down from the endpoint,
+# halving each time. That ensures that our final test uses as many threads as
+# CPUs, even if it isn't a power of 2.
+test_expect_success 'set up thread-counting tests' '
+	t=$(test-tool online-cpus) &&
+	threads= &&
+	while test $t -gt 0
+	do
+		threads="$t $threads"
+		t=$((t / 2))
+	done
 '
 
-test_perf PERF_EXTRA 'index-pack 1 thread ' '
-	rm -rf repo.git &&
-	git init --bare repo.git &&
-	GIT_DIR=repo.git GIT_FORCE_THREADS=1 git index-pack --threads=1 --stdin < $PACK
-'
-
-test_perf PERF_EXTRA 'index-pack 2 threads' '
-	rm -rf repo.git &&
-	git init --bare repo.git &&
-	GIT_DIR=repo.git git index-pack --threads=2 --stdin < $PACK
-'
-
-test_perf PERF_EXTRA 'index-pack 4 threads' '
+test_perf PERF_EXTRA 'index-pack 0 threads' '
 	rm -rf repo.git &&
 	git init --bare repo.git &&
-	GIT_DIR=repo.git git index-pack --threads=4 --stdin < $PACK
+	GIT_DIR=repo.git git index-pack --threads=1 --stdin < $PACK
 '
 
-test_perf PERF_EXTRA 'index-pack 8 threads' '
-	rm -rf repo.git &&
-	git init --bare repo.git &&
-	GIT_DIR=repo.git git index-pack --threads=8 --stdin < $PACK
-'
+for t in $threads
+do
+	THREADS=$t
+	export THREADS
+	test_perf PERF_EXTRA "index-pack $t threads" '
+		rm -rf repo.git &&
+		git init --bare repo.git &&
+		GIT_DIR=repo.git GIT_FORCE_THREADS=1 \
+		git index-pack --threads=$THREADS --stdin <$PACK
+	'
+done
 
 test_perf 'index-pack default number of threads' '
 	rm -rf repo.git &&
-- 
2.28.0.694.g07780f7063


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/3] index-pack: adjust default threading cap
  2020-08-21 17:51 [PATCH 0/3] index-pack threading defaults Jeff King
  2020-08-21 17:53 ` [PATCH 1/3] p5302: disable thread-count parameter tests by default Jeff King
  2020-08-21 17:54 ` [PATCH 2/3] p5302: count up to online-cpus for thread tests Jeff King
@ 2020-08-21 17:58 ` Jeff King
  2020-08-21 18:08   ` Eric Sunshine
  2020-08-22  1:16   ` brian m. carlson
  2020-08-21 18:44 ` [PATCH 0/3] index-pack threading defaults Jeff King
  3 siblings, 2 replies; 13+ messages in thread
From: Jeff King @ 2020-08-21 17:58 UTC (permalink / raw)
  To: git

Commit b8a2486f15 (index-pack: support multithreaded delta resolving,
2012-05-06) describes an experiment that shows that setting the number
of threads for index-pack higher than 3 does not help.

I repeated that experiment using a more modern version of Git and a more
modern CPU and got different results.

Here are timings for p5302 against linux.git run on my laptop, a Core
i9-9880H with 8 cores plus hyperthreading (so online-cpus returns 16):

  5302.3: index-pack 0 threads                   256.28(253.41+2.79)
  5302.4: index-pack 1 threads                   257.03(254.03+2.91)
  5302.5: index-pack 2 threads                   149.39(268.34+3.06)
  5302.6: index-pack 4 threads                   94.96(294.10+3.23)
  5302.7: index-pack 8 threads                   68.12(339.26+3.89)
  5302.8: index-pack 16 threads                  70.90(655.03+7.21)
  5302.9: index-pack default number of threads   116.91(290.05+3.21)

You can see that wall-clock times continue to improve dramatically up to
the number of cores, but bumping beyond that (into hyperthreading
territory) does not help (and in fact hurts a little).

Here's the same experiment on a machine with dual Xeon 6230's, totaling
40 cores (80 with hyperthreading):

  5302.3: index-pack 0 threads                    310.04(302.73+6.90)
  5302.4: index-pack 1 threads                    310.55(302.68+7.40)
  5302.5: index-pack 2 threads                    178.17(304.89+8.20)
  5302.6: index-pack 5 threads                    99.53(315.54+9.56)
  5302.7: index-pack 10 threads                   72.80(327.37+12.79)
  5302.8: index-pack 20 threads                   60.68(357.74+21.66)
  5302.9: index-pack 40 threads                   58.07(454.44+67.96)
  5302.10: index-pack 80 threads                  59.81(720.45+334.52)
  5302.11: index-pack default number of threads   134.18(309.32+7.98)

The results are similar; things stop improving at 40 threads. Curiously,
going from 20 to 40 really doesn't help much, either (and increases CPU
time considerably). So that may represent an actual barrier to
parallelism, where we lose out due to context-switching and loss of
cache locality, but don't reap the wall-clock benefits due to contention
of our coarse-grained locks.

So what's a good default value? It's clear that the current cap of 3 is
too low; our default values are 42% and 57% slower than the best times
on each machine. The results on the 40-core machine imply that 20
threads is an actual barrier regardless of the number of cores, so we'll
take that as a maximum. We get the best results on these machines at
half of the online-cpus value. That's presumably a result of the
hyperthreading. That's common on multi-core Intel processors, but not
necessarily elsewhere. But if we take it as an assumption, we can
perform optimally on hyperthreaded machines and still do much better
than the status quo on other machines, as long as we never half below
the current value of 3.

So that's what this patch does.

Signed-off-by: Jeff King <peff@peff.net>
---
Somebody with an AMD machine with a bunch of cores may want to revisit
this. I don't think it makes anything worse, but they may want to not do
the halving. I don't know if there's an easy way for us to determine the
difference via online_cpus(), though.

 builtin/index-pack.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index f865666db9..9721bf1ffe 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1798,9 +1798,22 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 
 	if (HAVE_THREADS && !nr_threads) {
 		nr_threads = online_cpus();
-		/* An experiment showed that more threads does not mean faster */
-		if (nr_threads > 3)
-			nr_threads = 3;
+		/*
+		 * Experiments show that going above 20 threads doesn't help,
+		 * no matter how many cores you have. Below that, we tend to
+		 * max at half the number of online_cpus(), presumably because
+		 * half of those are hyperthreads rather than full cores. We'll
+		 * never reduce the level below "3", though, to match a
+		 * historical value that nobody complained about.
+		 */
+		if (nr_threads < 4)
+			; /* too few cores to consider capping */
+		else if (nr_threads < 6)
+			nr_threads = 3; /* historic cap */
+		else if (nr_threads < 40)
+			nr_threads /= 2;
+		else
+			nr_threads = 20; /* hard cap */
 	}
 
 	curr_pack = open_pack_file(pack_name);
-- 
2.28.0.694.g07780f7063

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] p5302: count up to online-cpus for thread tests
  2020-08-21 17:54 ` [PATCH 2/3] p5302: count up to online-cpus for thread tests Jeff King
@ 2020-08-21 17:58   ` Jeff King
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2020-08-21 17:58 UTC (permalink / raw)
  To: git

On Fri, Aug 21, 2020 at 01:54:51PM -0400, Jeff King wrote:

>  t/perf/p5302-pack-index.sh | 47 +++++++++++++++++++-------------------
>  1 file changed, 24 insertions(+), 23 deletions(-)

I meant to generate this diff with --patience, which is much more
readable:

diff --git a/t/perf/p5302-pack-index.sh b/t/perf/p5302-pack-index.sh
index 23011ab739..228593d9ad 100755
--- a/t/perf/p5302-pack-index.sh
+++ b/t/perf/p5302-pack-index.sh
@@ -13,35 +13,36 @@ test_expect_success 'repack' '
 	export PACK
 '
 
+# Rather than counting up and doubling each time, count down from the endpoint,
+# halving each time. That ensures that our final test uses as many threads as
+# CPUs, even if it isn't a power of 2.
+test_expect_success 'set up thread-counting tests' '
+	t=$(test-tool online-cpus) &&
+	threads= &&
+	while test $t -gt 0
+	do
+		threads="$t $threads"
+		t=$((t / 2))
+	done
+'
+
 test_perf PERF_EXTRA 'index-pack 0 threads' '
 	rm -rf repo.git &&
 	git init --bare repo.git &&
 	GIT_DIR=repo.git git index-pack --threads=1 --stdin < $PACK
 '
 
-test_perf PERF_EXTRA 'index-pack 1 thread ' '
-	rm -rf repo.git &&
-	git init --bare repo.git &&
-	GIT_DIR=repo.git GIT_FORCE_THREADS=1 git index-pack --threads=1 --stdin < $PACK
-'
-
-test_perf PERF_EXTRA 'index-pack 2 threads' '
-	rm -rf repo.git &&
-	git init --bare repo.git &&
-	GIT_DIR=repo.git git index-pack --threads=2 --stdin < $PACK
-'
-
-test_perf PERF_EXTRA 'index-pack 4 threads' '
-	rm -rf repo.git &&
-	git init --bare repo.git &&
-	GIT_DIR=repo.git git index-pack --threads=4 --stdin < $PACK
-'
-
-test_perf PERF_EXTRA 'index-pack 8 threads' '
-	rm -rf repo.git &&
-	git init --bare repo.git &&
-	GIT_DIR=repo.git git index-pack --threads=8 --stdin < $PACK
-'
+for t in $threads
+do
+	THREADS=$t
+	export THREADS
+	test_perf PERF_EXTRA "index-pack $t threads" '
+		rm -rf repo.git &&
+		git init --bare repo.git &&
+		GIT_DIR=repo.git GIT_FORCE_THREADS=1 \
+		git index-pack --threads=$THREADS --stdin <$PACK
+	'
+done
 
 test_perf 'index-pack default number of threads' '
 	rm -rf repo.git &&
-- 
2.28.0.694.g07780f7063


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] index-pack: adjust default threading cap
  2020-08-21 17:58 ` [PATCH 3/3] index-pack: adjust default threading cap Jeff King
@ 2020-08-21 18:08   ` Eric Sunshine
  2020-08-21 18:41     ` Jeff King
  2020-08-22  1:16   ` brian m. carlson
  1 sibling, 1 reply; 13+ messages in thread
From: Eric Sunshine @ 2020-08-21 18:08 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On Fri, Aug 21, 2020 at 1:58 PM Jeff King <peff@peff.net> wrote:
> So what's a good default value? It's clear that the current cap of 3 is
> too low; our default values are 42% and 57% slower than the best times
> on each machine. The results on the 40-core machine imply that 20
> threads is an actual barrier regardless of the number of cores, so we'll
> take that as a maximum. We get the best results on these machines at
> half of the online-cpus value. That's presumably a result of the
> hyperthreading. That's common on multi-core Intel processors, but not
> necessarily elsewhere. But if we take it as an assumption, we can
> perform optimally on hyperthreaded machines and still do much better
> than the status quo on other machines, as long as we never half below
> the current value of 3.

I'm not familiar with the index-pack machinery, so this response may
be silly, but the first question which came to my mind was whether or
not SSD vs. spinning-platter disk impacts these results, and which of
the two you were using for the tests (which I don't think was
mentioned in any of the commit messages). So, basically, I'm wondering
about the implication of this change for those of us still stuck with
old spinning-platter disks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] index-pack: adjust default threading cap
  2020-08-21 18:08   ` Eric Sunshine
@ 2020-08-21 18:41     ` Jeff King
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2020-08-21 18:41 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Fri, Aug 21, 2020 at 02:08:55PM -0400, Eric Sunshine wrote:

> On Fri, Aug 21, 2020 at 1:58 PM Jeff King <peff@peff.net> wrote:
> > So what's a good default value? It's clear that the current cap of 3 is
> > too low; our default values are 42% and 57% slower than the best times
> > on each machine. The results on the 40-core machine imply that 20
> > threads is an actual barrier regardless of the number of cores, so we'll
> > take that as a maximum. We get the best results on these machines at
> > half of the online-cpus value. That's presumably a result of the
> > hyperthreading. That's common on multi-core Intel processors, but not
> > necessarily elsewhere. But if we take it as an assumption, we can
> > perform optimally on hyperthreaded machines and still do much better
> > than the status quo on other machines, as long as we never half below
> > the current value of 3.
> 
> I'm not familiar with the index-pack machinery, so this response may
> be silly, but the first question which came to my mind was whether or
> not SSD vs. spinning-platter disk impacts these results, and which of
> the two you were using for the tests (which I don't think was
> mentioned in any of the commit messages). So, basically, I'm wondering
> about the implication of this change for those of us still stuck with
> old spinning-platter disks.

They were both SSD machines, but it wouldn't matter for these tests
because they easily fit the whole pack into memory anyway.

But in the general case, I don't think disk performance would be
relevant. Delta resolution is very CPU-bound, because it's
de-compressing data and then computing its SHA-1. So linux.git, for
instance, is looking at ~1.3GB on disk that expands to 87.5GB of bytes
to run through SHA-1.

And it would be pretty unlikely to hit the disk anyway, as the thing we
primarily index is incoming packs which we've literally just written. So
I'd expect them to be in cache.

Of course, if you can get different numbers from p5302, I'd be curious
to hear them. :)

A more plausible downside might be that memory usage would increase as
we operate on multiple deltas at once. But pack-objects is already much
more hungry here, as it runs online_cpus() delta-compression threads
simultaneously, each of which may have up to window_size entries in
memory at once.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] index-pack threading defaults
  2020-08-21 17:51 [PATCH 0/3] index-pack threading defaults Jeff King
                   ` (2 preceding siblings ...)
  2020-08-21 17:58 ` [PATCH 3/3] index-pack: adjust default threading cap Jeff King
@ 2020-08-21 18:44 ` Jeff King
  2020-08-21 18:59   ` Junio C Hamano
  3 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2020-08-21 18:44 UTC (permalink / raw)
  To: git

On Fri, Aug 21, 2020 at 01:51:53PM -0400, Jeff King wrote:

> That value was determined experimentally in 2012. I'm not sure of the
> exact reason it's different now (modern processors are better at
> parallelism, or modern git is better at parallelism, or the original
> experiment was just a fluke). But regardless, I can get on the order of
> a two-to-one speedup by bumping the thread count. See the final patch
> for timings and more specific discussion.

After writing a response elsewhere in the thread, it occurred to me that
a good candidate for explaining this may be that our modern sha1dc
implementation is way slower than what we were using in 2012 (which
would have been either block-sha1, or the even-faster openssl
implementation). And since a good chunk of index-pack's time is going to
computing sha1 hashes on the resulting objects, that means that since
2012, we're spending relatively more time in the hash computation (which
parallelizes per-object) and less time in the other parts that happen
under a lock.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] index-pack threading defaults
  2020-08-21 18:44 ` [PATCH 0/3] index-pack threading defaults Jeff King
@ 2020-08-21 18:59   ` Junio C Hamano
  2020-08-21 19:14     ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2020-08-21 18:59 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Fri, Aug 21, 2020 at 01:51:53PM -0400, Jeff King wrote:
>
>> That value was determined experimentally in 2012. I'm not sure of the
>> exact reason it's different now (modern processors are better at
>> parallelism, or modern git is better at parallelism, or the original
>> experiment was just a fluke). But regardless, I can get on the order of
>> a two-to-one speedup by bumping the thread count. See the final patch
>> for timings and more specific discussion.
>
> After writing a response elsewhere in the thread, it occurred to me that
> a good candidate for explaining this may be that our modern sha1dc
> implementation is way slower than what we were using in 2012 (which
> would have been either block-sha1, or the even-faster openssl
> implementation). And since a good chunk of index-pack's time is going to
> computing sha1 hashes on the resulting objects, that means that since
> 2012, we're spending relatively more time in the hash computation (which
> parallelizes per-object) and less time in the other parts that happen
> under a lock.

Believable conjecture that is.  You could benchmark again with
block-sha1 on today's hardware, but because the performance profile
with sha1dc is what matters in the real world anyway...

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] index-pack threading defaults
  2020-08-21 18:59   ` Junio C Hamano
@ 2020-08-21 19:14     ` Jeff King
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2020-08-21 19:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Aug 21, 2020 at 11:59:58AM -0700, Junio C Hamano wrote:

> > After writing a response elsewhere in the thread, it occurred to me that
> > a good candidate for explaining this may be that our modern sha1dc
> > implementation is way slower than what we were using in 2012 (which
> > would have been either block-sha1, or the even-faster openssl
> > implementation). And since a good chunk of index-pack's time is going to
> > computing sha1 hashes on the resulting objects, that means that since
> > 2012, we're spending relatively more time in the hash computation (which
> > parallelizes per-object) and less time in the other parts that happen
> > under a lock.
> 
> Believable conjecture that is.  You could benchmark again with
> block-sha1 on today's hardware, but because the performance profile
> with sha1dc is what matters in the real world anyway...

Yeah, I agree on the "real world" part, but I'm the curious sort, so
here are numbers compiled against openssl (which is generally even
faster than block-sha1, and would thus emphasize the results of our
hypothesis):

  5302.3: index-pack 0 threads                   108.78(106.39+2.31)
  5302.4: index-pack 1 threads                   110.65(108.08+2.49)
  5302.5: index-pack 2 threads                   67.57(110.83+2.75) 
  5302.6: index-pack 4 threads                   48.18(123.82+3.02) 
  5302.7: index-pack 8 threads                   39.07(153.45+4.13) 
  5302.8: index-pack 16 threads                  38.38(265.78+7.71) 
  5302.9: index-pack default number of threads   54.64(117.35+2.73)

So it's actually pretty similar. Things continue getting faster as we go
past 3 threads. Though our total improvement is less (29% better with 8
threads compared to 3, versus 42% better when using sha1dc). So I think
it's _part_ of the reason, but not all of it.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] index-pack: adjust default threading cap
  2020-08-21 17:58 ` [PATCH 3/3] index-pack: adjust default threading cap Jeff King
  2020-08-21 18:08   ` Eric Sunshine
@ 2020-08-22  1:16   ` brian m. carlson
  2020-08-24 17:37     ` Jeff King
  1 sibling, 1 reply; 13+ messages in thread
From: brian m. carlson @ 2020-08-22  1:16 UTC (permalink / raw)
  To: Jeff King; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1674 bytes --]

On 2020-08-21 at 17:58:00, Jeff King wrote:
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index f865666db9..9721bf1ffe 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -1798,9 +1798,22 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
>  
>  	if (HAVE_THREADS && !nr_threads) {
>  		nr_threads = online_cpus();
> -		/* An experiment showed that more threads does not mean faster */
> -		if (nr_threads > 3)
> -			nr_threads = 3;
> +		/*
> +		 * Experiments show that going above 20 threads doesn't help,
> +		 * no matter how many cores you have. Below that, we tend to
> +		 * max at half the number of online_cpus(), presumably because
> +		 * half of those are hyperthreads rather than full cores. We'll
> +		 * never reduce the level below "3", though, to match a
> +		 * historical value that nobody complained about.
> +		 */
> +		if (nr_threads < 4)
> +			; /* too few cores to consider capping */
> +		else if (nr_threads < 6)
> +			nr_threads = 3; /* historic cap */
> +		else if (nr_threads < 40)
> +			nr_threads /= 2;

I was going to ask if we could make the halving conditional on x86_64,
but it turns out POWER and UltraSPARC also have SMT, so that doesn't
make sense.  I expect that most users who have more than 6 "cores" are
going to be on one of those systems or possibly ARM64, and since the
performance penalty of using half as many cores isn't that significant,
I'm not sure it's worth worrying about further.  This will be an
improvement regardless.

Which is just a long way of saying, this patch seems fine to me.
-- 
brian m. carlson: Houston, Texas, US

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] index-pack: adjust default threading cap
  2020-08-22  1:16   ` brian m. carlson
@ 2020-08-24 17:37     ` Jeff King
  2020-08-24 17:55       ` Eric Sunshine
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2020-08-24 17:37 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

On Sat, Aug 22, 2020 at 01:16:07AM +0000, brian m. carlson wrote:

> > +		if (nr_threads < 4)
> > +			; /* too few cores to consider capping */
> > +		else if (nr_threads < 6)
> > +			nr_threads = 3; /* historic cap */
> > +		else if (nr_threads < 40)
> > +			nr_threads /= 2;
> 
> I was going to ask if we could make the halving conditional on x86_64,
> but it turns out POWER and UltraSPARC also have SMT, so that doesn't
> make sense.  I expect that most users who have more than 6 "cores" are
> going to be on one of those systems or possibly ARM64, and since the
> performance penalty of using half as many cores isn't that significant,
> I'm not sure it's worth worrying about further.  This will be an
> improvement regardless.
> 
> Which is just a long way of saying, this patch seems fine to me.

OK, good. :) I agree there may be room for more improvement on those
systems. But lacking access to any, my goal was to make things better on
systems I _could_ test on, and not make things worse on other systems.
So I'd be very happy if people on other platforms (especially non-intel
ones) wanted to run:

  cd t/perf
  GIT_PERF_EXTRA=1 \
  GIT_PERF_LARGE_REPO=/path/to/clone/of/linux.git \
  ./p5302-pack-index.sh

and report the results.

I do have a slightly-old AMD machine with 4 cores (an A8-7600). Here's
what it says:

  5302.3: index-pack 0 threads                   447.67(436.62+6.57)
  5302.4: index-pack 1 threads                   450.80(441.26+7.20)
  5302.5: index-pack 2 threads                   265.62(459.56+7.30)
  5302.6: index-pack 4 threads                   177.06(477.56+8.22)
  5302.7: index-pack default number of threads   202.60(473.15+7.61)

So it does get better with 4 threads (but we continue to cap it at 3).
I wonder whether we should just do:

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 9721bf1ffe..d7453d0c09 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1809,7 +1809,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 		if (nr_threads < 4)
 			; /* too few cores to consider capping */
 		else if (nr_threads < 6)
-			nr_threads = 3; /* historic cap */
+			nr_threads = nr_threads;
 		else if (nr_threads < 40)
 			nr_threads /= 2;
 		else

That does probably make things slightly worse for a 6-core hyperthreaded
Intel machine. And it doesn't help an actual 8-core AMD machine at all.

-Peff

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] index-pack: adjust default threading cap
  2020-08-24 17:37     ` Jeff King
@ 2020-08-24 17:55       ` Eric Sunshine
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Sunshine @ 2020-08-24 17:55 UTC (permalink / raw)
  To: Jeff King; +Cc: brian m. carlson, git

On Mon, Aug 24, 2020 at 01:37:35PM -0400, Jeff King wrote:
> So I'd be very happy if people on other platforms (especially non-intel
> ones) wanted to run:
> 
>   cd t/perf
>   GIT_PERF_EXTRA=1 \
>   GIT_PERF_LARGE_REPO=/path/to/clone/of/linux.git \
>   ./p5302-pack-index.sh
> 
> and report the results.
> 
> I do have a slightly-old AMD machine with 4 cores (an A8-7600). Here's
> what it says:
> 
>   5302.3: index-pack 0 threads                   447.67(436.62+6.57)
>   5302.4: index-pack 1 threads                   450.80(441.26+7.20)
>   5302.5: index-pack 2 threads                   265.62(459.56+7.30)
>   5302.6: index-pack 4 threads                   177.06(477.56+8.22)
>   5302.7: index-pack default number of threads   202.60(473.15+7.61)

I tested on my 9.5 year old 4-core iMac with 2.5 GHz Intel Core i5,
spinning platter hard disk, and 12 GB memory (recently upgraded from 4
GB). I used git.git rather than linux.git because I didn't want to
wait several days for the results (plus this is my primary machine
which I'm actively using). Results were similar to what you saw:

  5302.3: index-pack 0 threads                   14.85(14.10+0.66)
  5302.4: index-pack 1 threads                   14.65(13.83+0.63)
  5302.5: index-pack 2 threads                   9.11(14.40+0.80) 
  5302.6: index-pack 4 threads                   6.89(17.03+1.32) 
  5302.7: index-pack default number of threads   7.72(16.15+1.15) 

Thanks for providing exact instructions for running the test; I had
never done it before, thus didn't know how.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-08-24 17:56 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-21 17:51 [PATCH 0/3] index-pack threading defaults Jeff King
2020-08-21 17:53 ` [PATCH 1/3] p5302: disable thread-count parameter tests by default Jeff King
2020-08-21 17:54 ` [PATCH 2/3] p5302: count up to online-cpus for thread tests Jeff King
2020-08-21 17:58   ` Jeff King
2020-08-21 17:58 ` [PATCH 3/3] index-pack: adjust default threading cap Jeff King
2020-08-21 18:08   ` Eric Sunshine
2020-08-21 18:41     ` Jeff King
2020-08-22  1:16   ` brian m. carlson
2020-08-24 17:37     ` Jeff King
2020-08-24 17:55       ` Eric Sunshine
2020-08-21 18:44 ` [PATCH 0/3] index-pack threading defaults Jeff King
2020-08-21 18:59   ` Junio C Hamano
2020-08-21 19:14     ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).