From: Jeff Hostetler <git@jeffhostetler.com>
To: Matheus Tavares <matheus.bernardino@usp.br>, git@vger.kernel.org
Cc: gerardu@amazon.com
Subject: Re: RFC: auto-enabling parallel-checkout on NFS
Date: Mon, 16 Nov 2020 10:19:09 -0500
Message-ID: <9c999e38-34db-84bb-3a91-ae2a62b964b5@jeffhostetler.com>
In-Reply-To: <20201115194359.67901-1-matheus.bernardino@usp.br>

On 11/15/20 2:43 PM, Matheus Tavares wrote:
> Hi, everyone
>
> I've been testing the parallel checkout code on some different machines,
> to benchmark its performance against the sequential version. As
> discussed in [1], the biggest speedups, on Linux, are usually seen on
> SSDs and NFS volumes. (I haven't got the chance to benchmark on
> Windows or OSX yet.)
>
> Regarding NFS, I got some 2~3.4x speedups even when the NFS client and
> server were both running on single-core machines. Here are some runtimes
> for a linux-v5.8 clone (means of 5 cold-cache executions):
>
> workers  nfs 3.0              nfs 4.0              nfs 4.1
>      1:  183.708 s ± 3.290 s  205.851 s ± 0.844 s  217.317 s ± 3.047 s
>      2:  130.510 s ± 3.917 s  139.124 s ± 0.772 s  142.963 s ± 0.765 s
>      4:   89.611 s ± 1.032 s  102.701 s ± 1.665 s   94.728 s ± 1.014 s
>      8:   68.097 s ± 0.820 s  104.914 s ± 1.239 s   69.359 s ± 0.619 s
>     16:   63.999 s ± 0.820 s  104.808 s ± 2.279 s   64.843 s ± 0.587 s
>     32:   62.316 s ± 2.095 s  102.105 s ± 1.537 s   64.122 s ± 0.374 s
>     64:   63.699 s ± 0.841 s  103.103 s ± 1.319 s   63.532 s ± 0.734 s
>
> The parallel version was also faster for some smaller checkouts. For
> example, the following numbers come from a bat-v0.16.0 clone
> (251 files, ~3MB):
>
> workers  nfs 3.0            nfs 4.0            nfs 4.1
>      1:  0.853 s ± 0.080 s  0.814 s ± 0.020 s  0.876 s ± 0.065 s
>      2:  0.671 s ± 0.020 s  0.702 s ± 0.030 s  0.705 s ± 0.030 s
>      4:  0.530 s ± 0.024 s  0.595 s ± 0.020 s  0.570 s ± 0.030 s
>      8:  0.470 s ± 0.033 s  0.609 s ± 0.025 s  0.510 s ± 0.031 s
>     16:  0.469 s ± 0.037 s  0.616 s ± 0.022 s  0.513 s ± 0.030 s
>     32:  0.487 s ± 0.030 s  0.639 s ± 0.018 s  0.527 s ± 0.028 s
>     64:  0.520 s ± 0.022 s  0.680 s ± 0.028 s  0.562 s ± 0.026 s
>
> While looking at these numbers with Geert (in CC), he had the idea that
> we could try to detect when the checkout path is within an NFS mount,
> and auto-enable parallelism for this case. This way, users on NFS would
> get the best performance by default. And it seems that using ~16 workers
> would produce good results regardless of the NFS version that they might
> be running.
>
> The major downside is that detecting the file system type is quite
> platform-dependent, so there is no simple and portable solution. (Also,
> I'm not sure if the optimal number of workers would be the same on
> different OSes). But we decided to give it a try, so this is a
> rough prototype that would work for Linux:
> https://github.com/matheustavares/git/commit/2e2c787e2a1742fed8c35dba185b7cd208603de9
>
> Any thoughts on this idea? Or alternative suggestions?
>
> Thanks,
> Matheus
>
> [1]: https://lore.kernel.org/git/815137685ac3e41444201316f537db9797dcacd2.1604521276.git.matheus.bernardino@usp.br/
>
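
For context, a minimal sketch of how NFS detection can be done on Linux
with statfs(2). This is not the code from the prototype commit linked
above; the function names path_is_on_nfs and suggested_checkout_workers
are hypothetical, and the value 16 only echoes the "~16 workers" figure
quoted above. NFS_SUPER_MAGIC (0x6969) is the constant from
linux/magic.h.

```c
#include <sys/vfs.h>	/* statfs(2); Linux-specific */

#ifndef NFS_SUPER_MAGIC
#define NFS_SUPER_MAGIC 0x6969	/* same value as in linux/magic.h */
#endif

/*
 * Hypothetical helper: returns 1 if `path` lives on an NFS mount,
 * 0 if it does not, and -1 if statfs() fails (e.g. missing path).
 */
int path_is_on_nfs(const char *path)
{
	struct statfs sb;

	if (statfs(path, &sb) < 0)
		return -1;
	return sb.f_type == NFS_SUPER_MAGIC;
}

/*
 * Hypothetical policy: suggest 16 workers on NFS and fall back to
 * sequential checkout (1 worker) otherwise, including on error.
 */
int suggested_checkout_workers(const char *path)
{
	return path_is_on_nfs(path) == 1 ? 16 : 1;
}
```

Falling back to one worker when detection fails keeps the existing
sequential behavior on platforms or paths where statfs() gives no
answer.
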
I can't really speak to NFS performance, but I have to wonder if there's
not something else affecting the results -- the 4- and/or 8-worker
results are better than the 16+ results in some columns. And we get
diminishing returns after ~16.

I'm wondering whether these test runs were I/O-bound or CPU-bound, and
whether VM was a problem. I'm also wondering if setting thread affinity
would help here.
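
On the affinity idea, a rough Linux-only sketch of what pinning could
look like, using sched_setaffinity(2). The helper name pin_self_to_cpu
is hypothetical, and this is only an illustration of the mechanism, not
a proposal for how the checkout workers should be scheduled.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* needed for cpu_set_t and sched_setaffinity() */
#endif
#include <sched.h>

/*
 * Hypothetical helper: restrict the calling process to a single CPU.
 * Returns 0 on success and -1 on error (invalid index, or a CPU the
 * process is not allowed to run on).
 */
int pin_self_to_cpu(int cpu)
{
	cpu_set_t set;

	if (cpu < 0 || cpu >= CPU_SETSIZE)
		return -1;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	/* pid 0 means "the calling thread/process" */
	return sched_setaffinity(0, sizeof(set), &set);
}
```

A worker would call pin_self_to_cpu(n) once at startup; whether that
changes the numbers above is an open question.
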
Jeff