From: Geert Jansen <email@example.com>
To: Jeff Hostetler <firstname.lastname@example.org>
Cc: Matheus Tavares <email@example.com>, <firstname.lastname@example.org>
Subject: Re: RFC: auto-enabling parallel-checkout on NFS
Date: Mon, 23 Nov 2020 23:18:18 +0000
Message-ID: <20201123231817.GA28189@dev-dsk-gerardu-1d-54592b62.us-east-1.amazon.com>
In-Reply-To: <email@example.com>

Hi Jeff,

On Thu, Nov 19, 2020 at 09:04:34AM -0500, Jeff Hostetler wrote:
> On 11/18/20 11:01 PM, Matheus Tavares wrote:
> >
> >On Mon, Nov 16, 2020 at 12:19 PM Jeff Hostetler <firstname.lastname@example.org> wrote:
> >>
> >>I can't really speak to NFS performance, but I have to wonder if there's
> >>not something else affecting the results -- 4 and/or 8 core results are
> >>better than 16+ results in some columns. And we get diminishing returns
> >>after ~16.
> >
> >Yeah, that's a good point. I'm not sure yet what's causing the
> >diminishing returns, but Geert and I are investigating. Maybe we are
> >hitting some limit for parallelism in this scenario.
>
> I seem to recall back when I was working on this problem that
> the unzip of each blob was a major pain point. Combine this
> long delta-chains and each worker would need multiple rounds of
> read/memmap, unzip, and de-delta before it had the complete blob
> and could then smudge and write.

I think that there are two cases here:

1) (CPU bound case) On local machines with multiple cores and SSDs,
checkout is CPU bound, and parallel checkout helps because the
unzipping can now run on multiple CPUs in parallel. Shorter delta
chains would use less CPU time, and we'd see a similar benefit on both
parallel and sequential checkout.

2) (IO bound case) On networked file systems, file system IO is pretty
much always the bottleneck for git and similar applications that use
small files. On NFS, calling open() is always a round trip, and so is
close() (in the absence of delegations and O_CREAT).
The latency of these calls depends on the NFS server and network
distance, but 1ms is a reasonable order of magnitude when thinking
about this. Because this 1ms is a lot more than the typical CPU time
needed to process a single blob, checkout will be IO bound. Parallel
checkout works by allowing the application to maintain an IO depth > 1
for these workloads, which amortizes the network latency over multiple
requests.

Regarding the diminishing returns: I did some initial analysis of
Matheus' code and I'm not sure yet what causes them. I see the code
achieving a high IO depth in our server logs, which would indicate that
the diminishing returns are caused by file system contention. This
would have to be some kind of general contention, since it happens on
both NFS and EFS. I will do a deeper investigation on this and will
report what I find.

Best regards,
Geert
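
A back-of-the-envelope sketch of the IO bound case described in the
message, to make the "amortizes the network latency" point concrete. It
is an idealized model and not a measurement: the function name, the
file count, and the assumption of exactly two round trips per file
(open() and close()) are illustrative, and it deliberately ignores CPU
time, data transfer, and the contention the message suspects is behind
the diminishing returns.

```python
import math


def estimated_checkout_seconds(num_files, io_depth=1,
                               round_trip_s=0.001, round_trips_per_file=2):
    """Estimate wall-clock time for an IO bound checkout (idealized).

    Assumptions (hypothetical, for illustration only):
      - each file costs `round_trips_per_file` NFS round trips
        (open() and close(), as in the message),
      - each round trip takes `round_trip_s` (the 1ms order of magnitude),
      - `io_depth` requests can be in flight at once with no contention.
    """
    if io_depth < 1:
        raise ValueError("io_depth must be at least 1")
    total_round_trips = num_files * round_trips_per_file
    # With io_depth requests in flight, one latency period completes
    # up to io_depth round trips, so latency is paid once per "wave".
    waves = math.ceil(total_round_trips / io_depth)
    return waves * round_trip_s


if __name__ == "__main__":
    for depth in (1, 4, 16, 64):
        seconds = estimated_checkout_seconds(100_000, io_depth=depth)
        print(f"io_depth={depth:>2}: ~{seconds:.1f}s")
```

In this model the estimate keeps shrinking in proportion to the IO
depth. Benchmarks that flatten out after ~16 workers therefore point at
something the model leaves out, which is consistent with the contention
hypothesis above.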