From: "Ævar Arnfjörð Bjarmason" <email@example.com>
To: Geert Jansen <firstname.lastname@example.org>
Cc: Matheus Tavares <email@example.com>, firstname.lastname@example.org, Derrick Stolee <email@example.com>
Subject: Re: RFC: auto-enabling parallel-checkout on NFS
Date: Tue, 24 Nov 2020 13:58:31 +0100
Message-ID: <firstname.lastname@example.org>
In-Reply-To: <20201123233735.GB28189@dev-dsk-gerardu-1d-54592b62.us-east-1.amazon.com>

On Tue, Nov 24 2020, Geert Jansen wrote:

> Hi Ævar,
>
> On Thu, Nov 19, 2020 at 10:01:07AM +0100, Ævar Arnfjörð Bjarmason wrote:
>>
>> > The major downside is that detecting the file system type is quite
>> > platform-dependent, so there is no simple and portable solution. (Also,
>> > I'm not sure if the optimal number of workers would be the same on
>> > different OSes). But we decided to give it a try, so this is a
>> > rough prototype that would work for Linux:
>> > https://github.com/matheustavares/git/commit/2e2c787e2a1742fed8c35dba185b7cd208603de9
>>
>> I'm not intrinsically opposed to hardcoding some "nr_threads = is_nfs()
>> ? x : y" as a stopgap.
>>
>> I do think we should be thinking about a sustainable way of doing this
>> sort of thing, this method of testing once and hardcoding something
>> isn't a good approach.
>>
>> It doesn't anticipate all sorts of different setups, e.g. in this case
>> NFS is not a FS, but a protocol, there's probably going to be some
>> implementations where parallel is much worse due to a quirk of the
>> implementation.
>>
>> I think integrating an optimization run with the relatively new
>> git-maintenance is a better way forward.
>>
>> You'd configure e.g.:
>>
>>     maintenance.performanceTests.enabled=true
>>     maintenance.performanceTests.writeConfig=true
>>
>> Which would run e.g.:
>>
>>     git config --type bool core.untrackedCache $(git update-index --test-untracked-cache && echo true || echo false)
>>     git config checkout.workers $(git maintenance--helper auto-discover-config checkout.workers)
>>
>> Such an implementation can be really basic at first, or even just punt
>> on the test and use your current "is it NFS?" check.
>>
>> But I think we should be moving to some helper that does the actual test
>> locally when asked/configured by the user, so we're not making a bunch
>> of guesses in advance about the size/shape of the repository, OS/nfs/fs
>> etc.
>
> I like this idea as something that will give the best configuration for a given
> repository. I also know from working with customers for a long time that most
> users will use the default settings for almost any application, and that
> default configurations therefore matter a lot.
>
> The ideal experience would be, in my view, that a clone or checkout would
> automatically benefit from parallel checkout, even if this is the first
> checkout into a new repository.
>
> Maybe both ideas could be combined? We could have some reasonable heuristic
> based on file system type (and maybe number of CPUs) that gives most of the
> benefits of parallel checkout, while still being a reasonable compromise that
> works across different NFS servers and file systems. Power users that want
> more aggressive tuning could run the maintenance command that measures file
> system performance and comes up with an optimal value for checkout.workers.

Yeah, I'm not opposed to it in the least.
I just think as a practical matter it may become a non-issue if we had
something like maintenance.performanceTests.*

Because we eventually run a "gc/maintenance", and there we detach from
the terminal, so we can run something like a find_optimal_nr_threads()
without keeping the user waiting.

If the only reason we had a
find_if_nfs_and_nr_cores_to_guess_nr_threads() was because the more
general find_optimal_nr_threads() took a tad too long when run
interactively, then changing where/how it's run would make the
find_if_nfs_and_nr_cores_to_guess_nr_threads() codepath unnecessary.

The "on clone" case is something we have in general with other speedups
& sane defaults. E.g. in making the commit-graph. I haven't kept up with
the latest state of that, but there was work/discussion on generating
that there too in a smart way. E.g. you clone & we either make it or
fork to the background and generate it. So in practice the user cloning
a big repo has sane performance right away or really soon.

But yeah, fully agreed that we should ship sane defaults when possible.
I do think (and I may be wrong here) that in particular with performance
options it's more acceptable to not try as hard by default. A lot of
them don't matter except for some really large repos, and users working
with those tend to be part of some established ecosystem, where e.g. you
clone chromium.git and a script could set up your custom config for you.

But maybe there are common cases I'm not aware of where that assumption
doesn't hold, e.g. (I'm just guessing here) people cloning arbitrary
repos on some VM on Amazon that happens to use NFS, and then wondering
why things are slow.
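
To make the difference between the two codepaths concrete, here is a
rough sketch, in Python rather than git's C and purely for illustration.
Nothing in it exists in git: guess_nr_threads() stands in for the
"is it NFS? how many cores?" guess, find_optimal_nr_threads() for the
measured variant, and time_parallel_writes() is a made-up stand-in
workload (writing small files), not a real checkout. The constants in
the heuristic are placeholders, not measured values.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def guess_nr_threads(fs_type: str, nr_cores: int) -> int:
    """Cheap guess from filesystem type and core count (hypothetical).

    The numbers below are illustrative assumptions only.
    """
    if fs_type.lower().startswith("nfs"):
        # Assume NFS is latency-bound, so oversubscribe the cores a bit.
        return max(2, min(2 * nr_cores, 64))
    # Assume local filesystems do fine with sequential checkout.
    return 1


def time_parallel_writes(directory: str, nr_files: int, nr_threads: int) -> float:
    """Time writing nr_files small files under directory with nr_threads."""
    payload = b"x" * 4096

    def write_one(path: str) -> None:
        with open(path, "wb") as f:
            f.write(payload)

    # The scratch directory lives on the filesystem being measured and is
    # removed again when the block exits.
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        paths = [os.path.join(scratch, f"f{i}") for i in range(nr_files)]
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=nr_threads) as pool:
            list(pool.map(write_one, paths))
        return time.perf_counter() - start


def find_optimal_nr_threads(
    measure: Callable[[int], float], candidates: Iterable[int]
) -> int:
    """Return the candidate worker count with the smallest measured time.

    On a tie the earlier candidate wins.
    """
    best_n, best_time = None, float("inf")
    for n in candidates:
        elapsed = measure(n)
        if elapsed < best_time:
            best_n, best_time = n, elapsed
    if best_n is None:
        raise ValueError("no candidates given")
    return best_n
```

A background maintenance run could then do something along the lines of
find_optimal_nr_threads(lambda n: time_parallel_writes(".", 200, n),
[1, 2, 4, 8, 16]) and write the result to checkout.workers, whereas the
guess only needs the filesystem type and os.cpu_count(). A single timing
pass like this is noisy, so a real helper would presumably want to
repeat the measurement.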