git@vger.kernel.org mailing list mirror (one of many)
From: Geert Jansen <gerardu@amazon.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Matheus Tavares <matheus.bernardino@usp.br>,
	<git@vger.kernel.org>, Derrick Stolee <stolee@gmail.com>
Subject: Re: RFC: auto-enabling parallel-checkout on NFS
Date: Mon, 23 Nov 2020 23:37:35 +0000
Message-ID: <20201123233735.GB28189@dev-dsk-gerardu-1d-54592b62.us-east-1.amazon.com>
In-Reply-To: <87y2ixpvos.fsf@evledraar.gmail.com>

Hi Ævar,

On Thu, Nov 19, 2020 at 10:01:07AM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> > The major downside is that detecting the file system type is quite
> > platform-dependent, so there is no simple and portable solution. (Also,
> > I'm not sure if the optimal number of workers would be the same on
> > different OSes). But we decided to give it a try, so this is a
> > rough prototype that would work for Linux:
> > https://github.com/matheustavares/git/commit/2e2c787e2a1742fed8c35dba185b7cd208603de9
> 
> I'm not intrinsically opposed to hardcoding some "nr_threads = is_nfs()
> ? x : y" as a stopgap.
> 
> I do think we should be thinking about a sustainable way of doing this
> sort of thing, this method of testing once and hardcoding something
> isn't a good approach.
> 
> It doesn't anticipate all sorts of different setups, e.g. in this case
> NFS is not a FS, but a protocol, there's probably going to be some
> implementations where parallel is much worse due to a quirk of the
> implementation.
> 
> I think integrating an optimization run with the relatively new
> git-maintenance is a better way forward.
> 
> You'd configure e.g.:
> 
>     maintenance.performanceTests.enabled=true
>     maintenance.performanceTests.writeConfig=true
> 
> Which would run e.g.:
> 
>     git config --type bool core.untrackedCache $(git update-index --test-untracked-cache && echo true || echo false)
>     git config checkout.workers $(git maintenance--helper auto-discover-config checkout.workers)
> 
> Such an implementation can be really basic at first, or even just punt
> on the test and use your current "is it NFS?" check.
> 
> But I think we should be moving to some helper that does the actual test
> locally when asked/configured by the user, so we're not making a bunch
> of guesses in advance about the size/shape of the repository, OS/nfs/fs
> etc.
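For reference, the Linux-only "is it NFS?" check mentioned above can be approximated with statfs(2) by comparing the reported f_type against the NFS magic number. This is a minimal sketch under that assumption, not the code from the linked prototype commit; is_nfs() is a hypothetical helper name borrowed from the quoted stopgap, and other platforms would need a different mechanism.

```c
#include <sys/vfs.h>   /* statfs(2); Linux-specific */

/* Value listed for NFS in the statfs(2) man page and <linux/magic.h>. */
#ifndef NFS_SUPER_MAGIC
#define NFS_SUPER_MAGIC 0x6969
#endif

/*
 * Hypothetical helper (Linux-only sketch):
 *   returns 1 if `path` is on an NFS mount,
 *           0 if it is on some other file system,
 *          -1 if statfs() fails (caller should treat this as "unknown").
 */
static int is_nfs(const char *path)
{
	struct statfs fs;

	if (statfs(path, &fs) < 0)
		return -1;
	return fs.f_type == NFS_SUPER_MAGIC;
}
```

A caller would run this on the worktree root, e.g. is_nfs("."), and fall back to the normal default whenever it returns -1.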

I like this idea as a way to get the best configuration for a given
repository. However, from working with customers for a long time, I also know
that most users stick with the default settings of almost any application, so
default configurations matter a lot.

The ideal experience would be, in my view, that a clone or checkout would
automatically benefit from parallel checkout, even if this is the first
checkout into a new repository.

Maybe both ideas could be combined? We could have a reasonable heuristic
based on file system type (and maybe the number of CPUs) that gives most of the
benefits of parallel checkout, while still being a reasonable compromise that
works across different NFS servers and file systems. Power users who want
more aggressive tuning could run the maintenance command that measures file
system performance and comes up with an optimal value for checkout.workers.
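A minimal sketch of how such a combined default might be computed, assuming a hypothetical default_checkout_workers() that is consulted only when checkout.workers is not set. Local file systems keep sequential checkout (1 worker), and NFS gets one worker per online CPU up to a cap. The "one per CPU" rule and the cap of 32 are illustrative placeholders rather than measured values, and encoding "unset" as 0 is specific to this sketch; a value written by the proposed maintenance run (or by the user) simply takes precedence.

```c
#include <unistd.h>   /* sysconf() */

/* Illustrative ceiling only; a real value would come from benchmarks. */
#define NFS_WORKERS_CAP 32

/* Number of online CPUs, never less than 1. */
static long online_cpus(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);
	return n > 0 ? n : 1;
}

/*
 * Hypothetical default for checkout.workers.
 *   configured: value from config (user- or maintenance-tuned); 0 = unset
 *   on_nfs:     1 if the worktree is on NFS (e.g. from an is_nfs() check)
 *   ncpus:      CPU count; values below 1 are treated as 1
 */
static int default_checkout_workers(int configured, int on_nfs, long ncpus)
{
	if (configured > 0)
		return configured;   /* an explicit or tuned value always wins */
	if (!on_nfs)
		return 1;            /* local file system: stay sequential */
	if (ncpus < 1)
		ncpus = 1;
	return ncpus > NFS_WORKERS_CAP ? NFS_WORKERS_CAP : (int)ncpus;
}
```

For example, default_checkout_workers(0, 1, online_cpus()) would pick the NFS default on the current machine, while any positive configured value is returned unchanged.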

Regards,
Geert

Thread overview: 10+ messages
2020-11-15 19:43 RFC: auto-enabling parallel-checkout on NFS Matheus Tavares
2020-11-16 15:19 ` Jeff Hostetler
2020-11-19  4:01   ` Matheus Tavares
2020-11-19 14:04     ` Jeff Hostetler
2020-11-20 12:10       ` Ævar Arnfjörð Bjarmason
2020-11-23 23:18       ` Geert Jansen
2020-11-19  9:01 ` Ævar Arnfjörð Bjarmason
2020-11-19 14:11   ` Jeff Hostetler
2020-11-23 23:37   ` Geert Jansen [this message]
2020-11-24 12:58     ` Ævar Arnfjörð Bjarmason
