git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Emily Shaffer <emilyshaffer@google.com>
To: Matheus Tavares <matheus.bernardino@usp.br>
Cc: git@vger.kernel.org, gitster@pobox.com, git@jeffhostetler.com,
	chriscool@tuxfamily.org, peff@peff.net, newren@gmail.com,
	jrnieder@gmail.com, martin.agren@gmail.com
Subject: Re: [PATCH v4 11/19] parallel-checkout: make it truly parallel
Date: Wed, 16 Dec 2020 14:31:23 -0800	[thread overview]
Message-ID: <20201216223123.GH3783238@google.com> (raw)
In-Reply-To: <815137685ac3e41444201316f537db9797dcacd2.1604521276.git.matheus.bernardino@usp.br>

On Wed, Nov 04, 2020 at 05:33:10PM -0300, Matheus Tavares wrote:
> 
> Use multiple worker processes to distribute the queued entries and call
> write_checkout_item() in parallel for them. The items are distributed
> uniformly in contiguous chunks. This minimizes the chances of two
> workers writing to the same directory simultaneously, which could
> affect performance due to lock contention in the kernel. Work stealing
> (or any other format of re-distribution) is not implemented yet.
> 
> The parallel version was benchmarked during three operations in the
> linux repo, with cold cache: cloning v5.8, checking out v5.8 from
> v2.6.15 (checkout I) and checking out v5.8 from v5.7 (checkout II). The
> four tables below show the mean run times and standard deviations for
> 5 runs in: a local file system with SSD, a local file system with HDD, a
> Linux NFS server, and Amazon EFS. The numbers of workers were chosen
> based on what produces the best result for each case.
> 
> Local SSD:
> 
>             Clone                  Checkout I             Checkout II
> Sequential  8.171 s ± 0.206 s      8.735 s ± 0.230 s      4.166 s ± 0.246 s
> 10 workers  3.277 s ± 0.138 s      3.774 s ± 0.188 s      2.561 s ± 0.120 s
> Speedup     2.49 ± 0.12            2.31 ± 0.13            1.63 ± 0.12
> 
> Local HDD:
> 
>             Clone                  Checkout I             Checkout II
> Sequential  35.157 s ± 0.205 s     48.835 s ± 0.407 s     47.302 s ± 1.435 s
> 8 workers   35.538 s ± 0.325 s     49.353 s ± 0.826 s     48.919 s ± 0.416 s
> Speedup     0.99 ± 0.01            0.99 ± 0.02            0.97 ± 0.03
> 
> Linux NFS server (v4.1, on EBS, single availability zone):
> 
>             Clone                  Checkout I             Checkout II
> Sequential  216.070 s ± 3.611 s    211.169 s ± 3.147 s    57.446 s ± 1.301 s
> 32 workers  67.997 s ± 0.740 s     66.563 s ± 0.457 s     23.708 s ± 0.622 s
> Speedup     3.18 ± 0.06            3.17 ± 0.05            2.42 ± 0.08
> 
> EFS (v4.1, replicated over multiple availability zones):
> 
>             Clone                  Checkout I             Checkout II
> Sequential  1249.329 s ± 13.857 s  1438.979 s ± 78.792 s  543.919 s ± 18.745 s
> 64 workers  225.864 s ± 12.433 s   316.345 s ± 1.887 s    183.648 s ± 10.095 s
> Speedup     5.53 ± 0.31            4.55 ± 0.25            2.96 ± 0.19
> 
> The above benchmarks show that parallel checkout is most effective on
> repositories located on an SSD or over a distributed file system. For
> local file systems on spinning disks, and/or older machines, the
> parallelism does not always bring a good performance. In fact, it can
> even increase the run time. For this reason, the sequential code is
> still the default. Two settings are added to optionally enable and
> configure the new parallel version as desired.
> 
> Local SSD tests were executed in an i7-7700HQ (4 cores with
> hyper-threading) running Manjaro Linux. Local HDD tests were executed in
> an i7-2600 (also 4 cores with hyper-threading), HDD Seagate Barracuda
> 7200 rpm SATA 3.0, running Debian 9.13. NFS and EFS tests were
> executed in an Amazon EC2 c5n.large instance, with 2 vCPUs. The Linux
> NFS server was running on a m6g.large instance with 1 TB, EBS GP2
> volume. Before each timing, the linux repository was removed (or checked
> out back), and `sync && sysctl vm.drop_caches=3` was executed.
> 
> Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>

Only having done a quick skim, is there a reason that you are doing the
workqueue handling from scratch rather than using
run-command.h:run_processes_parallel()? The implementation you use and
the implementation in run-command.c seem really similar to me.

 - Emily

  reply	other threads:[~2020-12-16 22:33 UTC|newest]

Thread overview: 154+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-10 21:33 [RFC PATCH 00/21] [RFC] Parallel checkout Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 01/21] convert: make convert_attrs() and convert structs public Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 02/21] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 03/21] convert: add get_stream_filter_ca() variant Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 04/21] convert: add conv_attrs classification Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 05/21] entry: extract a header file for entry.c functions Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 06/21] entry: make fstat_output() and read_blob_entry() public Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 07/21] entry: extract cache_entry update from write_entry() Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 08/21] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 09/21] entry: add checkout_entry_ca() which takes preloaded conv_attrs Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 10/21] unpack-trees: add basic support for parallel checkout Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 11/21] parallel-checkout: make it truly parallel Matheus Tavares
2020-08-19 21:34   ` Jeff Hostetler
2020-08-20  1:33     ` Matheus Tavares Bernardino
2020-08-20 14:39       ` Jeff Hostetler
2020-08-10 21:33 ` [RFC PATCH 12/21] parallel-checkout: add configuration options Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 13/21] parallel-checkout: support progress displaying Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 14/21] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 15/21] builtin/checkout.c: complete parallel checkout support Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 16/21] checkout-index: add " Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 17/21] parallel-checkout: avoid stat() calls in workers Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 18/21] entry: use is_dir_sep() when checking leading dirs Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 19/21] symlinks: make has_dirs_only_path() track FL_NOENT Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 20/21] parallel-checkout: create leading dirs in workers Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 21/21] parallel-checkout: skip checking the working tree on clone Matheus Tavares
2020-08-12 16:57 ` [RFC PATCH 00/21] [RFC] Parallel checkout Jeff Hostetler
2020-09-22 22:49 ` [PATCH v2 00/19] Parallel Checkout (part I) Matheus Tavares
2020-09-22 22:49   ` [PATCH v2 01/19] convert: make convert_attrs() and convert structs public Matheus Tavares
2020-09-22 22:49   ` [PATCH v2 02/19] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
2020-09-22 22:49   ` [PATCH v2 03/19] convert: add get_stream_filter_ca() variant Matheus Tavares
2020-09-22 22:49   ` [PATCH v2 04/19] convert: add conv_attrs classification Matheus Tavares
2020-09-22 22:49   ` [PATCH v2 05/19] entry: extract a header file for entry.c functions Matheus Tavares
2020-09-22 22:49   ` [PATCH v2 06/19] entry: make fstat_output() and read_blob_entry() public Matheus Tavares
2020-09-22 22:49   ` [PATCH v2 07/19] entry: extract cache_entry update from write_entry() Matheus Tavares
2020-09-22 22:49   ` [PATCH v2 08/19] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
2020-10-01 15:53     ` Jeff Hostetler
2020-10-01 15:59       ` Jeff Hostetler
2020-09-22 22:49   ` [PATCH v2 09/19] entry: add checkout_entry_ca() which takes preloaded conv_attrs Matheus Tavares
2020-09-22 22:49   ` [PATCH v2 10/19] unpack-trees: add basic support for parallel checkout Matheus Tavares
2020-10-05  6:17     ` [PATCH] parallel-checkout: drop unused checkout state parameter Jeff King
2020-10-05 13:13       ` Matheus Tavares Bernardino
2020-10-05 13:45         ` Jeff King
2020-09-22 22:49   ` [PATCH v2 11/19] parallel-checkout: make it truly parallel Matheus Tavares
2020-09-29 19:52     ` Martin Ågren
2020-09-30 14:02       ` Matheus Tavares Bernardino
2020-09-22 22:49   ` [PATCH v2 12/19] parallel-checkout: support progress displaying Matheus Tavares
2020-09-22 22:49   ` [PATCH v2 13/19] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
2020-09-22 22:49   ` [PATCH v2 14/19] builtin/checkout.c: complete parallel checkout support Matheus Tavares
2020-09-22 22:49   ` [PATCH v2 15/19] checkout-index: add " Matheus Tavares
2020-09-22 22:49   ` [PATCH v2 16/19] parallel-checkout: add tests for basic operations Matheus Tavares
2020-10-20  1:35     ` Jonathan Nieder
2020-10-20  2:55       ` Taylor Blau
2020-10-20 13:18         ` Matheus Tavares Bernardino
2020-10-20 19:09           ` Junio C Hamano
2020-10-20  3:18       ` Matheus Tavares Bernardino
2020-10-20  4:16         ` Jonathan Nieder
2020-10-20 19:14         ` Junio C Hamano
2020-09-22 22:49   ` [PATCH v2 17/19] parallel-checkout: add tests related to clone collisions Matheus Tavares
2020-09-22 22:49   ` [PATCH v2 18/19] parallel-checkout: add tests related to .gitattributes Matheus Tavares
2020-09-22 22:49   ` [PATCH v2 19/19] ci: run test round with parallel-checkout enabled Matheus Tavares
2020-10-29  2:14   ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
2020-10-29  2:14     ` [PATCH v3 01/19] convert: make convert_attrs() and convert structs public Matheus Tavares
2020-10-29 23:40       ` Junio C Hamano
2020-10-30 17:01         ` Matheus Tavares Bernardino
2020-10-30 17:38           ` Junio C Hamano
2020-10-29  2:14     ` [PATCH v3 02/19] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
2020-10-29 23:48       ` Junio C Hamano
2020-10-29  2:14     ` [PATCH v3 03/19] convert: add get_stream_filter_ca() variant Matheus Tavares
2020-10-29 23:51       ` Junio C Hamano
2020-10-29  2:14     ` [PATCH v3 04/19] convert: add conv_attrs classification Matheus Tavares
2020-10-29 23:53       ` Junio C Hamano
2020-10-29  2:14     ` [PATCH v3 05/19] entry: extract a header file for entry.c functions Matheus Tavares
2020-10-30 21:36       ` Junio C Hamano
2020-10-29  2:14     ` [PATCH v3 06/19] entry: make fstat_output() and read_blob_entry() public Matheus Tavares
2020-10-29  2:14     ` [PATCH v3 07/19] entry: extract cache_entry update from write_entry() Matheus Tavares
2020-10-29  2:14     ` [PATCH v3 08/19] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
2020-10-30 21:58       ` Junio C Hamano
2020-10-29  2:14     ` [PATCH v3 09/19] entry: add checkout_entry_ca() which takes preloaded conv_attrs Matheus Tavares
2020-10-30 22:02       ` Junio C Hamano
2020-10-29  2:14     ` [PATCH v3 10/19] unpack-trees: add basic support for parallel checkout Matheus Tavares
2020-11-02 19:35       ` Junio C Hamano
2020-11-03  3:48         ` Matheus Tavares Bernardino
2020-10-29  2:14     ` [PATCH v3 11/19] parallel-checkout: make it truly parallel Matheus Tavares
2020-10-29  2:14     ` [PATCH v3 12/19] parallel-checkout: support progress displaying Matheus Tavares
2020-10-29  2:14     ` [PATCH v3 13/19] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
2020-10-29  2:14     ` [PATCH v3 14/19] builtin/checkout.c: complete parallel checkout support Matheus Tavares
2020-10-29  2:14     ` [PATCH v3 15/19] checkout-index: add " Matheus Tavares
2020-10-29  2:14     ` [PATCH v3 16/19] parallel-checkout: add tests for basic operations Matheus Tavares
2020-10-29  2:14     ` [PATCH v3 17/19] parallel-checkout: add tests related to clone collisions Matheus Tavares
2020-10-29  2:14     ` [PATCH v3 18/19] parallel-checkout: add tests related to .gitattributes Matheus Tavares
2020-10-29  2:14     ` [PATCH v3 19/19] ci: run test round with parallel-checkout enabled Matheus Tavares
2020-10-29 19:48     ` [PATCH v3 00/19] Parallel Checkout (part I) Junio C Hamano
2020-10-30 15:58     ` Jeff Hostetler
2020-11-04 20:32     ` [PATCH v4 " Matheus Tavares
2020-11-04 20:33       ` [PATCH v4 01/19] convert: make convert_attrs() and convert structs public Matheus Tavares
2020-12-05 10:40         ` Christian Couder
2020-12-05 21:53           ` Matheus Tavares Bernardino
2020-11-04 20:33       ` [PATCH v4 02/19] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
2020-12-05 11:10         ` Christian Couder
2020-12-05 22:20           ` Matheus Tavares Bernardino
2020-11-04 20:33       ` [PATCH v4 03/19] convert: add get_stream_filter_ca() variant Matheus Tavares
2020-12-05 11:45         ` Christian Couder
2020-11-04 20:33       ` [PATCH v4 04/19] convert: add conv_attrs classification Matheus Tavares
2020-12-05 12:07         ` Christian Couder
2020-12-05 22:08           ` Matheus Tavares Bernardino
2020-11-04 20:33       ` [PATCH v4 05/19] entry: extract a header file for entry.c functions Matheus Tavares
2020-12-06  8:31         ` Christian Couder
2020-11-04 20:33       ` [PATCH v4 06/19] entry: make fstat_output() and read_blob_entry() public Matheus Tavares
2020-11-04 20:33       ` [PATCH v4 07/19] entry: extract cache_entry update from write_entry() Matheus Tavares
2020-12-06  8:53         ` Christian Couder
2020-11-04 20:33       ` [PATCH v4 08/19] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
2020-12-06  9:35         ` Christian Couder
2020-12-07 13:52           ` Matheus Tavares Bernardino
2020-11-04 20:33       ` [PATCH v4 09/19] entry: add checkout_entry_ca() which takes preloaded conv_attrs Matheus Tavares
2020-12-06 10:02         ` Christian Couder
2020-12-07 16:47           ` Matheus Tavares Bernardino
2020-11-04 20:33       ` [PATCH v4 10/19] unpack-trees: add basic support for parallel checkout Matheus Tavares
2020-12-06 11:36         ` Christian Couder
2020-12-07 19:06           ` Matheus Tavares Bernardino
2020-11-04 20:33       ` [PATCH v4 11/19] parallel-checkout: make it truly parallel Matheus Tavares
2020-12-16 22:31         ` Emily Shaffer [this message]
2020-12-17 15:00           ` Matheus Tavares Bernardino
2020-11-04 20:33       ` [PATCH v4 12/19] parallel-checkout: support progress displaying Matheus Tavares
2020-11-04 20:33       ` [PATCH v4 13/19] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
2020-11-04 20:33       ` [PATCH v4 14/19] builtin/checkout.c: complete parallel checkout support Matheus Tavares
2020-11-04 20:33       ` [PATCH v4 15/19] checkout-index: add " Matheus Tavares
2020-11-04 20:33       ` [PATCH v4 16/19] parallel-checkout: add tests for basic operations Matheus Tavares
2020-11-04 20:33       ` [PATCH v4 17/19] parallel-checkout: add tests related to clone collisions Matheus Tavares
2020-11-04 20:33       ` [PATCH v4 18/19] parallel-checkout: add tests related to .gitattributes Matheus Tavares
2020-11-04 20:33       ` [PATCH v4 19/19] ci: run test round with parallel-checkout enabled Matheus Tavares
2020-12-16 14:50       ` [PATCH v5 0/9] Parallel Checkout (part I) Matheus Tavares
2020-12-16 14:50         ` [PATCH v5 1/9] convert: make convert_attrs() and convert structs public Matheus Tavares
2020-12-16 14:50         ` [PATCH v5 2/9] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
2020-12-16 14:50         ` [PATCH v5 3/9] convert: add get_stream_filter_ca() variant Matheus Tavares
2020-12-16 14:50         ` [PATCH v5 4/9] convert: add classification for conv_attrs struct Matheus Tavares
2020-12-16 14:50         ` [PATCH v5 5/9] entry: extract a header file for entry.c functions Matheus Tavares
2020-12-16 14:50         ` [PATCH v5 6/9] entry: make fstat_output() and read_blob_entry() public Matheus Tavares
2020-12-16 14:50         ` [PATCH v5 7/9] entry: extract update_ce_after_write() from write_entry() Matheus Tavares
2020-12-16 14:50         ` [PATCH v5 8/9] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
2020-12-16 14:50         ` [PATCH v5 9/9] entry: add checkout_entry_ca() taking preloaded conv_attrs Matheus Tavares
2020-12-16 15:27         ` [PATCH v5 0/9] Parallel Checkout (part I) Christian Couder
2020-12-17  1:11         ` Junio C Hamano
2021-03-23 14:19         ` [PATCH v6 0/9] Parallel Checkout (part 1) Matheus Tavares
2021-03-23 14:19           ` [PATCH v6 1/9] convert: make convert_attrs() and convert structs public Matheus Tavares
2021-03-23 14:19           ` [PATCH v6 2/9] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
2021-03-23 14:19           ` [PATCH v6 3/9] convert: add get_stream_filter_ca() variant Matheus Tavares
2021-03-23 14:19           ` [PATCH v6 4/9] convert: add classification for conv_attrs struct Matheus Tavares
2021-03-23 14:19           ` [PATCH v6 5/9] entry: extract a header file for entry.c functions Matheus Tavares
2021-03-23 14:19           ` [PATCH v6 6/9] entry: make fstat_output() and read_blob_entry() public Matheus Tavares
2021-03-23 14:19           ` [PATCH v6 7/9] entry: extract update_ce_after_write() from write_entry() Matheus Tavares
2021-03-23 14:19           ` [PATCH v6 8/9] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
2021-03-23 14:19           ` [PATCH v6 9/9] entry: add checkout_entry_ca() taking preloaded conv_attrs Matheus Tavares
2021-03-23 17:34           ` [PATCH v6 0/9] Parallel Checkout (part 1) Junio C Hamano
2020-10-01 16:42 ` [RFC PATCH 00/21] [RFC] Parallel checkout Jeff Hostetler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201216223123.GH3783238@google.com \
    --to=emilyshaffer@google.com \
    --cc=chriscool@tuxfamily.org \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=martin.agren@gmail.com \
    --cc=matheus.bernardino@usp.br \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).