* RFC: auto-enabling parallel-checkout on NFS
From: Matheus Tavares
Date: 2020-11-15 19:43 UTC
To: git; +Cc: gerardu

Hi, everyone

I've been testing the parallel checkout code on some different machines,
to benchmark its performance against the sequential version. As
discussed in [1], the biggest speedups, on Linux, are usually seen on
SSDs and NFS volumes. (I haven't gotten the chance to benchmark on
Windows or macOS yet.)

Regarding NFS, I got speedups of roughly 2x to 3.4x even when the NFS
client and server were both running on single-core machines. Here are
some runtimes for a linux-v5.8 clone (means of 5 cold-cache executions,
by number of workers):

          nfs 3.0                nfs 4.0                nfs 4.1
 1:  183.708 s ± 3.290 s    205.851 s ± 0.844 s    217.317 s ± 3.047 s
 2:  130.510 s ± 3.917 s    139.124 s ± 0.772 s    142.963 s ± 0.765 s
 4:   89.611 s ± 1.032 s    102.701 s ± 1.665 s     94.728 s ± 1.014 s
 8:   68.097 s ± 0.820 s    104.914 s ± 1.239 s     69.359 s ± 0.619 s
16:   63.999 s ± 0.820 s    104.808 s ± 2.279 s     64.843 s ± 0.587 s
32:   62.316 s ± 2.095 s    102.105 s ± 1.537 s     64.122 s ± 0.374 s
64:   63.699 s ± 0.841 s    103.103 s ± 1.319 s     63.532 s ± 0.734 s

The parallel version was also faster for some smaller checkouts. For
example, the following numbers come from a bat-v0.16.0 clone
(251 files, ~3MB):

         nfs 3.0              nfs 4.0              nfs 4.1
 1:  0.853 s ± 0.080 s    0.814 s ± 0.020 s    0.876 s ± 0.065 s
 2:  0.671 s ± 0.020 s    0.702 s ± 0.030 s    0.705 s ± 0.030 s
 4:  0.530 s ± 0.024 s    0.595 s ± 0.020 s    0.570 s ± 0.030 s
 8:  0.470 s ± 0.033 s    0.609 s ± 0.025 s    0.510 s ± 0.031 s
16:  0.469 s ± 0.037 s    0.616 s ± 0.022 s    0.513 s ± 0.030 s
32:  0.487 s ± 0.030 s    0.639 s ± 0.018 s    0.527 s ± 0.028 s
64:  0.520 s ± 0.022 s    0.680 s ± 0.028 s    0.562 s ± 0.026 s

While looking at these numbers with Geert (in CC), he had the idea that
we could try to detect when the checkout path is within an NFS mount and
auto-enable parallelism for that case. This way, users on NFS would get
the best performance by default. And it seems that using ~16 workers
would produce good results regardless of the NFS version they might be
running.

The major downside is that detecting the file system type is quite
platform-dependent, so there is no simple and portable solution. (Also,
I'm not sure if the optimal number of workers would be the same on
different OSes.) But we decided to give it a try, so this is a rough
prototype that would work for Linux:
https://github.com/matheustavares/git/commit/2e2c787e2a1742fed8c35dba185b7cd208603de9

Any thoughts on this idea? Or alternative suggestions?

Thanks,
Matheus

[1]: https://lore.kernel.org/git/815137685ac3e41444201316f537db9797dcacd2.1604521276.git.matheus.bernardino@usp.br/
* Re: RFC: auto-enabling parallel-checkout on NFS
From: Jeff Hostetler
Date: 2020-11-16 15:19 UTC
To: Matheus Tavares, git; +Cc: gerardu

On 11/15/20 2:43 PM, Matheus Tavares wrote:
> Regarding NFS, I got speedups of roughly 2x to 3.4x even when the NFS
> client and server were both running on single-core machines.
> [...benchmark tables snipped...]
> While looking at these numbers with Geert (in CC), he had the idea that
> we could try to detect when the checkout path is within an NFS mount and
> auto-enable parallelism for that case.

I can't really speak to NFS performance, but I have to wonder if there's
not something else affecting the results -- the 4- and/or 8-worker
results are better than the 16+ results in some columns. And we get
diminishing returns after ~16.

I'm wondering if, during these test runs, you were IO- vs. CPU-bound
and if VM (virtual memory) pressure was a problem.

I'm wondering if setting thread affinity would help here.

Jeff
* Re: RFC: auto-enabling parallel-checkout on NFS
From: Matheus Tavares
Date: 2020-11-19 4:01 UTC
To: git; +Cc: gerardu, git, matheus.bernardino

Hi, Jeff

On Mon, Nov 16, 2020 at 12:19 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
>
> I can't really speak to NFS performance, but I have to wonder if there's
> not something else affecting the results -- the 4- and/or 8-worker
> results are better than the 16+ results in some columns. And we get
> diminishing returns after ~16.

Yeah, that's a good point. I'm not sure yet what's causing the
diminishing returns, but Geert and I are investigating. Maybe we are
hitting some limit for parallelism in this scenario.

> I'm wondering if, during these test runs, you were IO- vs. CPU-bound
> and if VM (virtual memory) pressure was a problem.

I would say we are more IO-bound during these tests. While a sequential
linux-v5.8 checkout usually uses 100% of one core on my laptop's SSD,
in this setup it only used 5% to 10%. And even with 64 workers (on a
single core), CPU usage stays around 60% most of the time.

About memory, the peak PSS was around 1.75GB, with 64 workers, and the
machine has 10GB of RAM. But are there other numbers that I should keep
an eye on while running the tests?

> I'm wondering if setting thread affinity would help here.

Hmm, I only had one core online during the benchmark, so I think thread
affinity wouldn't impact the runtime.

Thanks,
Matheus
* Re: RFC: auto-enabling parallel-checkout on NFS
From: Jeff Hostetler
Date: 2020-11-19 14:04 UTC
To: Matheus Tavares; +Cc: gerardu, git

On 11/18/20 11:01 PM, Matheus Tavares wrote:
> Yeah, that's a good point. I'm not sure yet what's causing the
> diminishing returns, but Geert and I are investigating. Maybe we are
> hitting some limit for parallelism in this scenario.

I seem to recall, back when I was working on this problem, that the
unzip of each blob was a major pain point. Combine this with long delta
chains, and each worker would need multiple rounds of read/mmap, unzip,
and de-delta before it had the complete blob and could then smudge and
write it.

This makes me wonder if repacking the repo with shorter delta chains
affects the checkout times, and improves the perf when there are more
workers. I'm not saying that this is a solution, but rather an
experiment to see if it changes anything and maybe adjust our focus.

> I would say we are more IO-bound during these tests. While a sequential
> linux-v5.8 checkout usually uses 100% of one core on my laptop's SSD,
> in this setup it only used 5% to 10%. And even with 64 workers (on a
> single core), CPU usage stays around 60% most of the time.
>
> Hmm, I only had one core online during the benchmark, so I think thread
> affinity wouldn't impact the runtime.

I wasn't really thinking about the 64-workers-on-1-core case. I was more
thinking about the 64-workers-on-64-cores case, and wondering if workers
were being randomly bounced from core to core and we were thrashing.

Jeff
* Re: RFC: auto-enabling parallel-checkout on NFS
From: Ævar Arnfjörð Bjarmason
Date: 2020-11-20 12:10 UTC
To: Jeff Hostetler; +Cc: Matheus Tavares, gerardu, git

On Thu, Nov 19 2020, Jeff Hostetler wrote:
> This makes me wonder if repacking the repo with shorter delta chains
> affects the checkout times, and improves the perf when there are more
> workers. I'm not saying that this is a solution, but rather an
> experiment to see if it changes anything and maybe adjust our focus.

I've had partial success with "git gc --keep-largest-pack" /
gc.bigPackThreshold=N, where N is at least the size you get from a
fresh "git clone", when on NFS.

It has the effect of essentially implementing a version of what you're
suggesting, but in an arguably better way. Your initial clone will have
whatever depth of chains you have, but all new objects pulled down will
go into new packs/objects that won't share chains with that old big
pack.

So your repository will be bigger overall, but your old and new packs
will eventually come to mostly reflect cold/hot object storage. What
you need from a pack is then more likely to already have been fetched
into the FS cache, and over an NFS mount those requests may have been
pre-fetched/fetched already. You can also more effectively warm the
local OS cache by cat-ing to /dev/null the pack files that aren't the
big pack, on login or whatever.
* Re: RFC: auto-enabling parallel-checkout on NFS
From: Geert Jansen
Date: 2020-11-23 23:18 UTC
To: Jeff Hostetler; +Cc: Matheus Tavares, git

Hi Jeff,

On Thu, Nov 19, 2020 at 09:04:34AM -0500, Jeff Hostetler wrote:
> I seem to recall, back when I was working on this problem, that the
> unzip of each blob was a major pain point. Combine this with long delta
> chains, and each worker would need multiple rounds of read/mmap, unzip,
> and de-delta before it had the complete blob and could then smudge and
> write it.

I think that there are two cases here:

1) (CPU-bound case) On local machines with multiple cores and SSD
disks, checkout is CPU-bound, and parallel checkout works because the
unzipping can now run on multiple CPUs in parallel. Shorter chains
would use less CPU time, and we'd see a similar benefit on both
parallel and sequential checkout.

2) (IO-bound case) On networked file systems, file system IO is pretty
much always the bottleneck for git and similar applications that use
small files. On NFS, calling open() is always a round trip, and so is
close() (in the absence of delegations and O_CREAT). The latency of
these calls depends on the NFS server and network distance, but 1ms is
a reasonable order of magnitude when thinking about this. Because this
1ms is a lot more than the typical CPU time to process a single blob,
checkout will be IO-bound. Parallel checkout works by allowing the
application to maintain an IO depth > 1 for these workloads, which
amortizes the network latency over multiple requests.

Regarding the diminishing returns: I did some initial analysis of
Matheus' code, and I'm not sure yet. I see the code achieving a high IO
depth in our server logs, which would indicate that the diminishing
returns are caused by file system contention. This would have to be
some kind of general contention, since it happens on both NFS and EFS.
I will do a deeper investigation and report what I find.

Best regards,
Geert
* Re: RFC: auto-enabling parallel-checkout on NFS
From: Ævar Arnfjörð Bjarmason
Date: 2020-11-19 9:01 UTC
To: Matheus Tavares; +Cc: git, gerardu, Derrick Stolee

> The major downside is that detecting the file system type is quite
> platform-dependent, so there is no simple and portable solution. (Also,
> I'm not sure if the optimal number of workers would be the same on
> different OSes.) But we decided to give it a try, so this is a rough
> prototype that would work for Linux:
> https://github.com/matheustavares/git/commit/2e2c787e2a1742fed8c35dba185b7cd208603de9

I'm not intrinsically opposed to hardcoding some "nr_threads = is_nfs()
? x : y" as a stopgap.

I do think we should be thinking about a sustainable way of doing this
sort of thing; this method of testing once and hardcoding something
isn't a good approach. It doesn't anticipate all sorts of different
setups -- e.g. in this case NFS is not a FS but a protocol, and there
are probably going to be some implementations where parallelism is much
worse due to a quirk of the implementation.

I think integrating an optimization run with the relatively new
git-maintenance is a better way forward. You'd configure e.g.:

    maintenance.performanceTests.enabled=true
    maintenance.performanceTests.writeConfig=true

Which would run e.g.:

    git config --type bool core.untrackedCache \
        $(git update-index --test-untracked-cache && echo true || echo false)
    git config checkout.workers \
        $(git maintenance--helper auto-discover-config checkout.workers)

Such an implementation can be really basic at first, or even just punt
on the test and use your current "is it NFS?" check. But I think we
should be moving to some helper that does the actual test locally when
asked/configured by the user, so we're not making a bunch of guesses in
advance about the size/shape of the repository, OS/NFS/FS, etc.
* Re: RFC: auto-enabling parallel-checkout on NFS
From: Jeff Hostetler
Date: 2020-11-19 14:11 UTC
To: Ævar Arnfjörð Bjarmason, Matheus Tavares; +Cc: git, gerardu, Derrick Stolee

On 11/19/20 4:01 AM, Ævar Arnfjörð Bjarmason wrote:
> I think integrating an optimization run with the relatively new
> git-maintenance is a better way forward.
> [...]
> But I think we should be moving to some helper that does the actual
> test locally when asked/configured by the user, so we're not making a
> bunch of guesses in advance about the size/shape of the repository,
> OS/NFS/FS, etc.

I kinda like this idea. It would give us a chance to let maintenance
periodically probe the repo/system and improve some of these tuning
knobs.

Jeff
* Re: RFC: auto-enabling parallel-checkout on NFS
From: Geert Jansen
Date: 2020-11-23 23:37 UTC
To: Ævar Arnfjörð Bjarmason; +Cc: Matheus Tavares, git, Derrick Stolee

Hi Ævar,

On Thu, Nov 19, 2020 at 10:01:07AM +0100, Ævar Arnfjörð Bjarmason wrote:
> I think integrating an optimization run with the relatively new
> git-maintenance is a better way forward.
> [...]
> But I think we should be moving to some helper that does the actual
> test locally when asked/configured by the user, so we're not making a
> bunch of guesses in advance about the size/shape of the repository,
> OS/NFS/FS, etc.

I like this idea as something that will give the best configuration for
a given repository. I also know, from working with customers for a long
time, that most users will use the default settings for almost any
application, and that default configurations therefore matter a lot.
The ideal experience would be, in my view, that a clone or checkout
would automatically benefit from parallel checkout, even if this is the
first checkout into a new repository.

Maybe both ideas could be combined? We could have some reasonable
heuristic, based on file system type (and maybe number of CPUs), that
gives most of the benefits of parallel checkout while still being a
reasonable compromise that works across different NFS servers and file
systems. Power users who want more aggressive tuning could run the
maintenance command that measures file system performance and comes up
with an optimal value for checkout.workers.

Regards,
Geert
* Re: RFC: auto-enabling parallel-checkout on NFS
From: Ævar Arnfjörð Bjarmason
Date: 2020-11-24 12:58 UTC
To: Geert Jansen; +Cc: Matheus Tavares, git, Derrick Stolee

On Tue, Nov 24 2020, Geert Jansen wrote:
> Maybe both ideas could be combined? We could have some reasonable
> heuristic, based on file system type (and maybe number of CPUs), that
> gives most of the benefits of parallel checkout while still being a
> reasonable compromise that works across different NFS servers and file
> systems. Power users who want more aggressive tuning could run the
> maintenance command that measures file system performance and comes up
> with an optimal value for checkout.workers.

Yeah, I'm not opposed to it in the least. I just think, as a practical
matter, it may become a non-issue if we had something like
maintenance.performanceTests.*.

Because we eventually run a "gc/maintenance", and there we detach from
the terminal, so we can run something like a find_optimal_nr_threads()
without keeping the user waiting. If the only reason we had a
find_if_nfs_and_nr_cores_to_guess_nr_threads() was because the more
general find_optimal_nr_threads() took a tad too long when run
interactively, then changing where/how it's run would make the
find_if_nfs_and_nr_cores_to_guess_nr_threads() codepath unnecessary.

The "on clone" case is something we have in general with other speedups
& sane defaults, e.g. in making the commit-graph. I haven't kept up
with the latest state of that, but there was work/discussion on
generating it there too in a smart way: you clone, and we either make
it right away or fork to the background and generate it, so in practice
the user cloning a big repo has sane performance right away or really
soon after.

But yeah, fully agreed that we should ship sane defaults when possible.
I do think (and I may be wrong here) that with performance options in
particular it's more acceptable to not try as hard by default. A lot of
them don't matter except for some really large repos, and users working
with those tend to be part of some established ecosystem, where e.g.
you clone chromium.git and a script sets up your custom config for you.

But maybe there are common cases I'm not aware of where that assumption
doesn't hold, e.g. (I'm just guessing here) people cloning arbitrary
repos on some VM on Amazon that happens to use NFS, and then wondering
why things are slow.
Thread overview: 10+ messages

2020-11-15 19:43  RFC: auto-enabling parallel-checkout on NFS -- Matheus Tavares
2020-11-16 15:19  ` Jeff Hostetler
2020-11-19  4:01    ` Matheus Tavares
2020-11-19 14:04      ` Jeff Hostetler
2020-11-20 12:10        ` Ævar Arnfjörð Bjarmason
2020-11-23 23:18        ` Geert Jansen
2020-11-19  9:01  ` Ævar Arnfjörð Bjarmason
2020-11-19 14:11    ` Jeff Hostetler
2020-11-23 23:37    ` Geert Jansen
2020-11-24 12:58      ` Ævar Arnfjörð Bjarmason