git@vger.kernel.org list mirror (unofficial, one of many)
* RFC: auto-enabling parallel-checkout on NFS
@ 2020-11-15 19:43 Matheus Tavares
  2020-11-16 15:19 ` Jeff Hostetler
  2020-11-19  9:01 ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 10+ messages in thread
From: Matheus Tavares @ 2020-11-15 19:43 UTC (permalink / raw)
  To: git; +Cc: gerardu

Hi, everyone

I've been testing the parallel checkout code on some different machines,
to benchmark its performance against the sequential version. As
discussed in [1], the biggest speedups, on Linux, are usually seen on
SSDs and NFS volumes. (I haven't got the chance to benchmark on
Windows or OSX yet.)

Regarding NFS, I got some 2~3.4x speedups even when the NFS client and
server were both running on single-core machines. Here are some runtimes
for a linux-v5.8 clone, by checkout.workers count (means of 5
cold-cache executions):

    nfs 3.0              nfs 4.0              nfs 4.1
1:  183.708 s ± 3.290 s  205.851 s ± 0.844 s  217.317 s ± 3.047 s
2:  130.510 s ± 3.917 s  139.124 s ± 0.772 s  142.963 s ± 0.765 s
4:   89.611 s ± 1.032 s  102.701 s ± 1.665 s   94.728 s ± 1.014 s
8:   68.097 s ± 0.820 s  104.914 s ± 1.239 s   69.359 s ± 0.619 s
16:  63.999 s ± 0.820 s  104.808 s ± 2.279 s   64.843 s ± 0.587 s
32:  62.316 s ± 2.095 s  102.105 s ± 1.537 s   64.122 s ± 0.374 s
64:  63.699 s ± 0.841 s  103.103 s ± 1.319 s   63.532 s ± 0.734 s

The parallel version was also faster for some smaller checkouts. For
example, the following numbers come from a bat-v0.16.0 clone
(251 files, ~3MB):

     nfs 3.0             nfs 4.0             nfs 4.1
1:   0.853 s ± 0.080 s   0.814 s ± 0.020 s   0.876 s ± 0.065 s
2:   0.671 s ± 0.020 s   0.702 s ± 0.030 s   0.705 s ± 0.030 s
4:   0.530 s ± 0.024 s   0.595 s ± 0.020 s   0.570 s ± 0.030 s
8:   0.470 s ± 0.033 s   0.609 s ± 0.025 s   0.510 s ± 0.031 s
16:  0.469 s ± 0.037 s   0.616 s ± 0.022 s   0.513 s ± 0.030 s
32:  0.487 s ± 0.030 s   0.639 s ± 0.018 s   0.527 s ± 0.028 s
64:  0.520 s ± 0.022 s   0.680 s ± 0.028 s   0.562 s ± 0.026 s

While looking at these numbers with Geert (in CC), he had the idea that
we could try to detect when the checkout path is within an NFS mount,
and auto-enable parallelism for this case. This way, users on NFS would
get the best performance by default. And it seems that using ~16 workers
would produce good results regardless of the NFS version that they might
be running.

The major downside is that detecting the file system type is quite
platform-dependent, so there is no simple and portable solution. (Also,
I'm not sure if the optimal number of workers would be the same on
different OSes). But we decided to give it a try, so this is a
rough prototype that would work for Linux:
https://github.com/matheustavares/git/commit/2e2c787e2a1742fed8c35dba185b7cd208603de9

Any thoughts on this idea? Or alternative suggestions?

Thanks,
Matheus

[1]: https://lore.kernel.org/git/815137685ac3e41444201316f537db9797dcacd2.1604521276.git.matheus.bernardino@usp.br/


* Re: RFC: auto-enabling parallel-checkout on NFS
  2020-11-15 19:43 RFC: auto-enabling parallel-checkout on NFS Matheus Tavares
@ 2020-11-16 15:19 ` Jeff Hostetler
  2020-11-19  4:01   ` Matheus Tavares
  2020-11-19  9:01 ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 10+ messages in thread
From: Jeff Hostetler @ 2020-11-16 15:19 UTC (permalink / raw)
  To: Matheus Tavares, git; +Cc: gerardu



On 11/15/20 2:43 PM, Matheus Tavares wrote:
> Hi, everyone
> 
> I've been testing the parallel checkout code on some different machines,
> to benchmark its performance against the sequential version. As
> discussed in [1], the biggest speedups, on Linux, are usually seen on
> SSDs and NFS volumes. (I haven't got the chance to benchmark on
> Windows or OSX yet.)
> 
> Regarding NFS, I got some 2~3.4x speedups even when the NFS client and
> server were both running on single-core machines. Here are some runtimes
> for a linux-v5.8 clone, by checkout.workers count (means of 5
> cold-cache executions):
> 
>      nfs 3.0              nfs 4.0              nfs 4.1
> 1:  183.708 s ± 3.290 s  205.851 s ± 0.844 s  217.317 s ± 3.047 s
> 2:  130.510 s ± 3.917 s  139.124 s ± 0.772 s  142.963 s ± 0.765 s
> 4:   89.611 s ± 1.032 s  102.701 s ± 1.665 s   94.728 s ± 1.014 s
> 8:   68.097 s ± 0.820 s  104.914 s ± 1.239 s   69.359 s ± 0.619 s
> 16:  63.999 s ± 0.820 s  104.808 s ± 2.279 s   64.843 s ± 0.587 s
> 32:  62.316 s ± 2.095 s  102.105 s ± 1.537 s   64.122 s ± 0.374 s
> 64:  63.699 s ± 0.841 s  103.103 s ± 1.319 s   63.532 s ± 0.734 s
> 
> The parallel version was also faster for some smaller checkouts. For
> example, the following numbers come from a bat-v0.16.0 clone
> (251 files, ~3MB):
> 
>       nfs 3.0             nfs 4.0             nfs 4.1
> 1:   0.853 s ± 0.080 s   0.814 s ± 0.020 s   0.876 s ± 0.065 s
> 2:   0.671 s ± 0.020 s   0.702 s ± 0.030 s   0.705 s ± 0.030 s
> 4:   0.530 s ± 0.024 s   0.595 s ± 0.020 s   0.570 s ± 0.030 s
> 8:   0.470 s ± 0.033 s   0.609 s ± 0.025 s   0.510 s ± 0.031 s
> 16:  0.469 s ± 0.037 s   0.616 s ± 0.022 s   0.513 s ± 0.030 s
> 32:  0.487 s ± 0.030 s   0.639 s ± 0.018 s   0.527 s ± 0.028 s
> 64:  0.520 s ± 0.022 s   0.680 s ± 0.028 s   0.562 s ± 0.026 s
> 
> While looking at these numbers with Geert (in CC), he had the idea that
> we could try to detect when the checkout path is within an NFS mount,
> and auto-enable parallelism for this case. This way, users on NFS would
> get the best performance by default. And it seems that using ~16 workers
> would produce good results regardless of the NFS version that they might
> be running.
> 
> The major downside is that detecting the file system type is quite
> platform-dependent, so there is no simple and portable solution. (Also,
> I'm not sure if the optimal number of workers would be the same on
> different OSes). But we decided to give it a try, so this is a
> rough prototype that would work for Linux:
> https://github.com/matheustavares/git/commit/2e2c787e2a1742fed8c35dba185b7cd208603de9
> 
> Any thoughts on this idea? Or alternative suggestions?
> 
> Thanks,
> Matheus
> 
> [1]: https://lore.kernel.org/git/815137685ac3e41444201316f537db9797dcacd2.1604521276.git.matheus.bernardino@usp.br/
> 

I can't really speak to NFS performance, but I have to wonder if there's
not something else affecting the results -- 4 and/or 8 core results are
better than 16+ results in some columns.  And we get diminishing returns
after ~16.

I'm wondering if during these test runs, you were IO vs CPU bound and if
VM was a problem.  I'm wondering if setting thread affinity would help
here.

Jeff



* Re: RFC: auto-enabling parallel-checkout on NFS
  2020-11-16 15:19 ` Jeff Hostetler
@ 2020-11-19  4:01   ` Matheus Tavares
  2020-11-19 14:04     ` Jeff Hostetler
  0 siblings, 1 reply; 10+ messages in thread
From: Matheus Tavares @ 2020-11-19  4:01 UTC (permalink / raw)
  To: git; +Cc: gerardu, git, matheus.bernardino

Hi, Jeff

On Mon, Nov 16, 2020 at 12:19 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
>
> I can't really speak to NFS performance, but I have to wonder if there's
> not something else affecting the results -- 4 and/or 8 core results are
> better than 16+ results in some columns.  And we get diminishing returns
> after ~16.

Yeah, that's a good point. I'm not sure yet what's causing the
diminishing returns, but Geert and I are investigating. Maybe we are
hitting some limit for parallelism in this scenario.

> I'm wondering if during these test runs, you were IO vs CPU bound and if
> VM was a problem.

I would say we are more IO bound during these tests. While a sequential
linux-v5.8 checkout usually uses 100% of one core on my laptop's SSD,
in this setup, it only used 5% to 10%. And even with 64 workers (on a
single core), CPU usage stays around 60% most of the time.

About memory, the peak PSS was around 1.75GB, with 64 workers, and the
machine has 10GB of RAM. But are there other numbers that I should keep
an eye on while running the test?

> I'm wondering if setting thread affinity would help here.

Hmm, I only had one core online during the benchmark, so I think thread
affinity wouldn't impact the runtime.

Thanks,
Matheus


* Re: RFC: auto-enabling parallel-checkout on NFS
  2020-11-15 19:43 RFC: auto-enabling parallel-checkout on NFS Matheus Tavares
  2020-11-16 15:19 ` Jeff Hostetler
@ 2020-11-19  9:01 ` Ævar Arnfjörð Bjarmason
  2020-11-19 14:11   ` Jeff Hostetler
  2020-11-23 23:37   ` Geert Jansen
  1 sibling, 2 replies; 10+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2020-11-19  9:01 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, gerardu, Derrick Stolee


> The major downside is that detecting the file system type is quite
> platform-dependent, so there is no simple and portable solution. (Also,
> I'm not sure if the optimal number of workers would be the same on
> different OSes). But we decided to give it a try, so this is a
> rough prototype that would work for Linux:
> https://github.com/matheustavares/git/commit/2e2c787e2a1742fed8c35dba185b7cd208603de9

I'm not intrinsically opposed to hardcoding some "nr_threads = is_nfs()
? x : y" as a stopgap.

I do think we should be thinking about a sustainable way of doing this
sort of thing, this method of testing once and hardcoding something
isn't a good approach.

It doesn't anticipate all sorts of different setups, e.g. in this case
NFS is not a FS, but a protocol, there's probably going to be some
implementations where parallel is much worse due to a quirk of the
implementation.

I think integrating an optimization run with the relatively new
git-maintenance is a better way forward.

You'd configure e.g.:

    maintenance.performanceTests.enabled=true
    maintenance.performanceTests.writeConfig=true

Which would run e.g.:

    git config --type bool core.untrackedCache $(git update-index --test-untracked-cache && echo true || echo false)
    git config checkout.workers $(git maintenance--helper auto-discover-config checkout.workers)

Such an implementation can be really basic at first, or even just punt
on the test and use your current "is it NFS?" check.

But I think we should be moving to some helper that does the actual test
locally when asked/configured by the user, so we're not making a bunch
of guesses in advance about the size/shape of the repository, OS/nfs/fs
etc.


* Re: RFC: auto-enabling parallel-checkout on NFS
  2020-11-19  4:01   ` Matheus Tavares
@ 2020-11-19 14:04     ` Jeff Hostetler
  2020-11-20 12:10       ` Ævar Arnfjörð Bjarmason
  2020-11-23 23:18       ` Geert Jansen
  0 siblings, 2 replies; 10+ messages in thread
From: Jeff Hostetler @ 2020-11-19 14:04 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: gerardu, git



On 11/18/20 11:01 PM, Matheus Tavares wrote:
> Hi, Jeff
> 
> On Mon, Nov 16, 2020 at 12:19 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
>>
>> I can't really speak to NFS performance, but I have to wonder if there's
>> not something else affecting the results -- 4 and/or 8 core results are
>> better than 16+ results in some columns.  And we get diminishing returns
>> after ~16.
> 
> Yeah, that's a good point. I'm not sure yet what's causing the
> diminishing returns, but Geert and I are investigating. Maybe we are
> hitting some limit for parallelism in this scenario.

I seem to recall back when I was working on this problem that
the unzip of each blob was a major pain point.  Combine this with
long delta-chains, and each worker would need multiple rounds of
read/mmap, unzip, and de-delta before it had the complete blob
and could then smudge and write.

This makes me wonder if repacking the repo with shorter delta-chains
affects the checkout times.  And improves the perf when there are
more workers.  I'm not saying that this is a solution, but rather
an experiment to see if it changes anything and maybe adjust our
focus.

> 
>> I'm wondering if during these test runs, you were IO vs CPU bound and if
>> VM was a problem.
> 
> I would say we are more IO bound during these tests. While a sequential
> linux-v5.8 checkout usually uses 100% of one core on my laptop's SSD,
> in this setup, it only used 5% to 10%. And even with 64 workers (on a
> single core), CPU usage stays around 60% most of the time.
> 
> About memory, the peak PSS was around 1.75GB, with 64 workers, and the
> machine has 10GB of RAM. But are there other numbers that I should keep
> an eye on while running the test?
> 
>> I'm wondering if setting thread affinity would help here.
> 
> Hmm, I only had one core online during the benchmark, so I think thread
> affinity wouldn't impact the runtime.

I wasn't really thinking about the 64 workers on 1 core case.  I was
more thinking about the 64 workers on 64 cores case and wondering
if workers were being randomly bounced from core to core and we were
thrashing.

> 
> Thanks,
> Matheus
> 


* Re: RFC: auto-enabling parallel-checkout on NFS
  2020-11-19  9:01 ` Ævar Arnfjörð Bjarmason
@ 2020-11-19 14:11   ` Jeff Hostetler
  2020-11-23 23:37   ` Geert Jansen
  1 sibling, 0 replies; 10+ messages in thread
From: Jeff Hostetler @ 2020-11-19 14:11 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Matheus Tavares
  Cc: git, gerardu, Derrick Stolee



On 11/19/20 4:01 AM, Ævar Arnfjörð Bjarmason wrote:
> 
>> The major downside is that detecting the file system type is quite
>> platform-dependent, so there is no simple and portable solution. (Also,
>> I'm not sure if the optimal number of workers would be the same on
>> different OSes). But we decided to give it a try, so this is a
>> rough prototype that would work for Linux:
>> https://github.com/matheustavares/git/commit/2e2c787e2a1742fed8c35dba185b7cd208603de9
> 
> I'm not intrinsically opposed to hardcoding some "nr_threads = is_nfs()
> ? x : y" as a stopgap.
> 
> I do think we should be thinking about a sustainable way of doing this
> sort of thing, this method of testing once and hardcoding something
> isn't a good approach.
> 
> It doesn't anticipate all sorts of different setups, e.g. in this case
> NFS is not a FS, but a protocol, there's probably going to be some
> implementations where parallel is much worse due to a quirk of the
> implementation.
> 
> I think integrating an optimization run with the relatively new
> git-maintenance is a better way forward.
> 
> You'd configure e.g.:
> 
>      maintenance.performanceTests.enabled=true
>      maintenance.performanceTests.writeConfig=true
> 
> Which would run e.g.:
> 
>      git config --type bool core.untrackedCache $(git update-index --test-untracked-cache && echo true || echo false)
>      git config checkout.workers $(git maintenance--helper auto-discover-config checkout.workers)
> 
> Such an implementation can be really basic at first, or even just punt
> on the test and use your current "is it NFS?" check.
> 
> But I think we should be moving to some helper that does the actual test
> locally when asked/configured by the user, so we're not making a bunch
> of guesses in advance about the size/shape of the repository, OS/nfs/fs
> etc.
> 

I kinda like this idea.  It would give us a chance to let maintenance
periodically probe the repo/system and improve some of these tuning
knobs.

Jeff


* Re: RFC: auto-enabling parallel-checkout on NFS
  2020-11-19 14:04     ` Jeff Hostetler
@ 2020-11-20 12:10       ` Ævar Arnfjörð Bjarmason
  2020-11-23 23:18       ` Geert Jansen
  1 sibling, 0 replies; 10+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2020-11-20 12:10 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: Matheus Tavares, gerardu, git


On Thu, Nov 19 2020, Jeff Hostetler wrote:

> On 11/18/20 11:01 PM, Matheus Tavares wrote:
>> Hi, Jeff
>>
>> On Mon, Nov 16, 2020 at 12:19 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
>>>
>>> I can't really speak to NFS performance, but I have to wonder if there's
>>> not something else affecting the results -- 4 and/or 8 core results are
>>> better than 16+ results in some columns.  And we get diminishing returns
>>> after ~16.
>>
>> Yeah, that's a good point. I'm not sure yet what's causing the
>> diminishing returns, but Geert and I are investigating. Maybe we are
>> hitting some limit for parallelism in this scenario.
>
> I seem to recall back when I was working on this problem that
> the unzip of each blob was a major pain point.  Combine this with
> long delta-chains, and each worker would need multiple rounds of
> read/mmap, unzip, and de-delta before it had the complete blob
> and could then smudge and write.
>
> This makes me wonder if repacking the repo with shorter delta-chains
> affects the checkout times.  And improves the perf when there are
> more workers.  I'm not saying that this is a solution, but rather
> an experiment to see if it changes anything and maybe adjust our
> focus.

I've had partial success with "git gc --keep-largest-pack" /
gc.bigPackThreshold=N, where N is at least the size you get from a fresh
"git clone" when on NFS.

It has the effect of essentially implementing a version of what you're
suggesting, but in an arguably better way. Your initial clone will have
whatever depth of chains you have, but all new objects pulled down will
go into new packs/objects that won't share chains with that old big
pack.

So your repository will be bigger overall, but your old and new packs
will eventually come to mostly reflect cold and hot object storage. So
what you need from a pack is more likely to already have been fetched
into the FS cache, and over an NFS mount those requests may have been
prefetched already. You can also more effectively warm the local OS
cache by cat-ing the pack-files that aren't the big initial pack to
/dev/null on login, or whatever.
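In config-file form, what I mean is roughly this (the value is
illustrative, not a recommendation):

```ini
[gc]
	# Leave packs at or above this size alone during gc, so new
	# objects accumulate in smaller, hotter packs. Use at least
	# the size of your fresh-clone pack.
	bigPackThreshold = 2g
```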

>>
>>> I'm wondering if during these test runs, you were IO vs CPU bound and if
>>> VM was a problem.
>>
>> I would say we are more IO bound during these tests. While a sequential
>> linux-v5.8 checkout usually uses 100% of one core on my laptop's SSD,
>> in this setup, it only used 5% to 10%. And even with 64 workers (on a
>> single core), CPU usage stays around 60% most of the time.
>>
>> About memory, the peak PSS was around 1.75GB, with 64 workers, and the
>> machine has 10GB of RAM. But are there other numbers that I should keep
>> an eye on while running the test?
>>
>>> I'm wondering if setting thread affinity would help here.
>>
>> Hmm, I only had one core online during the benchmark, so I think thread
>> affinity wouldn't impact the runtime.
>
> I wasn't really thinking about the 64 workers on 1 core case.  I was
> more thinking about the 64 workers on 64 cores case and wondering
> if workers were being randomly bounced from core to core and we were
> thrashing.


* Re: RFC: auto-enabling parallel-checkout on NFS
  2020-11-19 14:04     ` Jeff Hostetler
  2020-11-20 12:10       ` Ævar Arnfjörð Bjarmason
@ 2020-11-23 23:18       ` Geert Jansen
  1 sibling, 0 replies; 10+ messages in thread
From: Geert Jansen @ 2020-11-23 23:18 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: Matheus Tavares, git

Hi Jeff,

On Thu, Nov 19, 2020 at 09:04:34AM -0500, Jeff Hostetler wrote:

> On 11/18/20 11:01 PM, Matheus Tavares wrote:
> >
> >On Mon, Nov 16, 2020 at 12:19 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
> >>
> >>I can't really speak to NFS performance, but I have to wonder if there's
> >>not something else affecting the results -- 4 and/or 8 core results are
> >>better than 16+ results in some columns.  And we get diminishing returns
> >>after ~16.
> >
> >Yeah, that's a good point. I'm not sure yet what's causing the
> >diminishing returns, but Geert and I are investigating. Maybe we are
> >hitting some limit for parallelism in this scenario.
> 
> I seem to recall back when I was working on this problem that
> the unzip of each blob was a major pain point.  Combine this with
> long delta-chains, and each worker would need multiple rounds of
> read/mmap, unzip, and de-delta before it had the complete blob
> and could then smudge and write.

I think that there are two cases here:

1) (CPU bound case) On local machines with multiple cores and SSD disks,
   checkout is CPU bound and the parallel checkout works because the unzipping
   can now run on multiple CPUs in parallel. Shorter chains would use less CPU
   time and we'd see a similar benefit on both parallel and sequential checkout.

2) (IO bound case) On networked file systems, file system IO is pretty much
   always the bottleneck for git and similar applications that use small files.
   On NFS calling open() is always a round trip, and so is close() (in the
   absence of delegations and O_CREAT). The latency of these calls depends on
   the NFS server and network distance, but 1ms is a reasonable order of
   magnitude when thinking about this. Because this 1ms is a lot more than the
   typical CPU time to process a single blob, checkout will be IO bound.
   Parallel checkout works by allowing the application to maintain an IO
   depth greater than 1 for these workloads, which amortizes the network
   latency over multiple requests.

Regarding the diminishing returns: I did some initial analysis of Matheus'
code and I'm not sure yet. I see the code achieving a high IO depth in our
server logs, which would indicate that the diminishing returns are caused by
file system contention. This would have to be some kind of general contention
since it happens both on NFS and EFS. I will do a deeper investigation on this
and will report what I find.

Best regards,
Geert


* Re: RFC: auto-enabling parallel-checkout on NFS
  2020-11-19  9:01 ` Ævar Arnfjörð Bjarmason
  2020-11-19 14:11   ` Jeff Hostetler
@ 2020-11-23 23:37   ` Geert Jansen
  2020-11-24 12:58     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 10+ messages in thread
From: Geert Jansen @ 2020-11-23 23:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Matheus Tavares, git, Derrick Stolee

Hi Ævar,

On Thu, Nov 19, 2020 at 10:01:07AM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> > The major downside is that detecting the file system type is quite
> > platform-dependent, so there is no simple and portable solution. (Also,
> > I'm not sure if the optimal number of workers would be the same on
> > different OSes). But we decided to give it a try, so this is a
> > rough prototype that would work for Linux:
> > https://github.com/matheustavares/git/commit/2e2c787e2a1742fed8c35dba185b7cd208603de9
> 
> I'm not intrinsically opposed to hardcoding some "nr_threads = is_nfs()
> ? x : y" as a stopgap.
> 
> I do think we should be thinking about a sustainable way of doing this
> sort of thing, this method of testing once and hardcoding something
> isn't a good approach.
> 
> It doesn't anticipate all sorts of different setups, e.g. in this case
> NFS is not a FS, but a protocol, there's probably going to be some
> implementations where parallel is much worse due to a quirk of the
> implementation.
> 
> I think integrating an optimization run with the relatively new
> git-maintenance is a better way forward.
> 
> You'd configure e.g.:
> 
>     maintenance.performanceTests.enabled=true
>     maintenance.performanceTests.writeConfig=true
> 
> Which would run e.g.:
> 
>     git config --type bool core.untrackedCache $(git update-index --test-untracked-cache && echo true || echo false)
>     git config checkout.workers $(git maintenance--helper auto-discover-config checkout.workers)
> 
> Such an implementation can be really basic at first, or even just punt
> on the test and use your current "is it NFS?" check.
> 
> But I think we should be moving to some helper that does the actual test
> locally when asked/configured by the user, so we're not making a bunch
> of guesses in advance about the size/shape of the repository, OS/nfs/fs
> etc.

I like this idea as something that will give the best configuration for a given
repository. I also know from working with customers for a long time that most
users will use the default settings for almost any application, and that
default configurations therefore matter a lot.

The ideal experience would be, in my view, that a clone or checkout would
automatically benefit from parallel checkout, even if this is the first
checkout into a new repository.

Maybe both ideas could be combined? We could have some reasonable heuristic
based on file system type (and maybe number of CPUs) that gives most of the
benefits of parallel checkout, while still being a reasonable compromise
that works across different NFS servers and file systems. Power users that want
more aggressive tuning could run the maintenance command that measures file
system performance and comes up with an optimal value for checkout.workers.

Regards,
Geert


* Re: RFC: auto-enabling parallel-checkout on NFS
  2020-11-23 23:37   ` Geert Jansen
@ 2020-11-24 12:58     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 10+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2020-11-24 12:58 UTC (permalink / raw)
  To: Geert Jansen; +Cc: Matheus Tavares, git, Derrick Stolee


On Tue, Nov 24 2020, Geert Jansen wrote:

> Hi Ævar,
>
> On Thu, Nov 19, 2020 at 10:01:07AM +0100, Ævar Arnfjörð Bjarmason wrote:
>> 
>> > The major downside is that detecting the file system type is quite
>> > platform-dependent, so there is no simple and portable solution. (Also,
>> > I'm not sure if the optimal number of workers would be the same on
>> > different OSes). But we decided to give it a try, so this is a
>> > rough prototype that would work for Linux:
>> > https://github.com/matheustavares/git/commit/2e2c787e2a1742fed8c35dba185b7cd208603de9
>> 
>> I'm not intrinsically opposed to hardcoding some "nr_threads = is_nfs()
>> ? x : y" as a stopgap.
>> 
>> I do think we should be thinking about a sustainable way of doing this
>> sort of thing, this method of testing once and hardcoding something
>> isn't a good approach.
>> 
>> It doesn't anticipate all sorts of different setups, e.g. in this case
>> NFS is not a FS, but a protocol, there's probably going to be some
>> implementations where parallel is much worse due to a quirk of the
>> implementation.
>> 
>> I think integrating an optimization run with the relatively new
>> git-maintenance is a better way forward.
>> 
>> You'd configure e.g.:
>> 
>>     maintenance.performanceTests.enabled=true
>>     maintenance.performanceTests.writeConfig=true
>> 
>> Which would run e.g.:
>> 
>>     git config --type bool core.untrackedCache $(git update-index --test-untracked-cache && echo true || echo false)
>>     git config checkout.workers $(git maintenance--helper auto-discover-config checkout.workers)
>> 
>> Such an implementation can be really basic at first, or even just punt
>> on the test and use your current "is it NFS?" check.
>> 
>> But I think we should be moving to some helper that does the actual test
>> locally when asked/configured by the user, so we're not making a bunch
>> of guesses in advance about the size/shape of the repository, OS/nfs/fs
>> etc.
>
> I like this idea as something that will give the best configuration for a given
> repository. I also know from working with customers for a long time that most
> users will use the default settings for almost any application, and that
> default configurations therefore matter a lot.
>
> The ideal experience would be, in my view, that a clone or checkout would
> automatically benefit from parallel checkout, even if this is the first
> checkout into a new repository.
>
> Maybe both ideas could be combined? We could have some reasonable heuristic
> based on file system type (and maybe number of CPUs) that gives most of the
> benefits of parallel checkout, while still being a reasonable compromise
> that works across different NFS servers and file systems. Power users that want
> more aggressive tuning could run the maintenance command that measures file
> system performance and comes up with an optimal value for checkout.workers.

Yeah, I'm not opposed to it in the least. I just think as a practical
matter it may become a non-issue if we had something like
maintenance.performanceTests.*

Because we eventually run a "gc/maintenance", and there we detach from
the terminal, so we can run something like a find_optimal_nr_threads()
without keeping the user waiting.

If the only reason we had a
find_if_nfs_and_nr_cores_to_guess_nr_threads() was because the more
general find_optimal_nr_threads() took a tad too long when run
interactively then changing where/how it's run would make the
find_if_nfs_and_nr_cores_to_guess_nr_threads() codepath unnecessary.

The "on clone" case is something we have in general with other speedups
& sane defaults. E.g. in making the commit-graph.

I haven't kept up with the latest state of that, but there was
work/discussions on generating that there too in a smart way. E.g. you
clone & we either make it or fork to the background and generate it. So
in practice the user cloning a big repo has sane performance right away
or really soon.

But yeah, fully agreed on that we should ship sane defaults when
possible.

I do think (and I may be wrong here) that in particular with performance
options it's more acceptable to not try as hard by default. A lot of
them don't matter except for some really large repos, and users working
with those tend to be part of some established ecosystem, where e.g. you
clone chromium.git and a script could set up your custom config for you.

But maybe there's common cases I'm not aware of where that assumption
doesn't hold, e.g. (I'm just guessing here) people cloning arbitrary
repos on some VM on Amazon that happens to use NFS, and then wondering
why things are slow.

