From: Jeff Hostetler <git@jeffhostetler.com>
To: Matheus Tavares Bernardino <matheus.bernardino@usp.br>
Cc: git <git@vger.kernel.org>, "Derrick Stolee" <stolee@gmail.com>,
jeffhost@microsoft.com,
"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
"Paul Tan" <pyokagan@gmail.com>,
"Denton Liu" <liu.denton@gmail.com>,
"Remi Lespinet" <remi.lespinet@ensimag.grenoble-inp.fr>,
"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [RFC PATCH 11/21] parallel-checkout: make it truly parallel
Date: Thu, 20 Aug 2020 10:39:49 -0400 [thread overview]
Message-ID: <5327b0e3-fdaa-11e1-f7da-832eb988555e@jeffhostetler.com> (raw)
In-Reply-To: <CAHd-oW5QjLPmvRL0YojL=NiO-ciY7rNZkQuvCtnF-1vmR4ykKQ@mail.gmail.com>
On 8/19/20 9:33 PM, Matheus Tavares Bernardino wrote:
> On Wed, Aug 19, 2020 at 6:34 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
>>
>> On 8/10/20 5:33 PM, Matheus Tavares wrote:
>>>
>>> +static struct child_process *setup_workers(struct checkout *state, int num_workers)
>>> +{
>>> + struct child_process *workers;
>>> + int i, workers_with_one_extra_item;
>>> + size_t base_batch_size, next_to_assign = 0;
>>> +
>>> + base_batch_size = parallel_checkout->nr / num_workers;
>>> + workers_with_one_extra_item = parallel_checkout->nr % num_workers;
>>> + ALLOC_ARRAY(workers, num_workers);
>>> +
>>> + for (i = 0; i < num_workers; ++i) {
>>> + struct child_process *cp = &workers[i];
>>> + size_t batch_size = base_batch_size;
>>> +
>>> + child_process_init(cp);
>>> + cp->git_cmd = 1;
>>> + cp->in = -1;
>>> + cp->out = -1;
>>> + strvec_push(&cp->args, "checkout--helper");
>>> + if (state->base_dir_len)
>>> + strvec_pushf(&cp->args, "--prefix=%s", state->base_dir);
>>> + if (start_command(cp))
>>> + die(_("failed to spawn checkout worker"));
>>
>> We should consider splitting this loop into one to start the helpers
>> and another loop to later send them their assignments. This would
>> better hide the process startup costs.
>>
>> When comparing this version with my pc-p4-core branch on Windows,
>> I was seeing a delay of 0.8 seconds between each helper process
>> getting started. And on my version a delay of 0.2 between them.
>>
>> I was testing with a huge repo and the batch size was ~200k, so it
>> foreground process was stuck in send_batch() for a while before it
>> could start the next helper process.
>>
>> It still takes the same amount of time to send each batch, but
>> the 2nd thru nth helpers can be starting while we are sending the
>> batch to the 1st helper. (This might just be a Windows issue because
>> of how slow process creation is on Windows....)
>
> Thanks for the explanation. I will split the loop in v2.
>
>> We could maybe also save a little time splitting the batches
>> across the helpers, but that's a refinement for later...
>>
>>> +
>>> + /* distribute the extra work evenly */
>>> + if (i < workers_with_one_extra_item)
>>> + batch_size++;
>>> +
>>> + send_batch(cp->in, next_to_assign, batch_size);
>>> + next_to_assign += batch_size;
>>> }
>>>
>>> + return workers;
>>> +}
>>> +
>>> +static void finish_workers(struct child_process *workers, int num_workers)
>>> +{
>>> + int i;
>>> + for (i = 0; i < num_workers; ++i) {
>>> + struct child_process *w = &workers[i];
>>> + if (w->in >= 0)
>>> + close(w->in);
>>> + if (w->out >= 0)
>>> + close(w->out);
>>
>> You might also consider splitting this loop too. The net-net here
>> is that the foreground process closes the handle to the child and
>> waits for the child to exit -- which it will because it get EOF on
>> its stdin.
>>
>> But the foreground process is stuck in a wait() for it to do so.
>>
>> You could make finish_workers() just call close() on all the child
>> handles and then have an atexit() handler to actually wait() and
>> reap them. This would let the children exit asynchronously (while
>> the caller here in the foreground process is updating the index
>> on disk, for example).
>
> Makes sense, thanks. And I think we could achieve this by setting both
> `clean_on_exit` and `wait_after_clean` on the child_process struct,
> right? (BTW, I have just noticed that we probably want these flags set
> for another reason too: we wouldn't want the workers to keep checking
> out files if the main process was killed.)
>
> Maybe the downside of using the atexit() handler, instead of calling
> finish_command(), would be that we cannot free the `workers` array
> when cleaning up parallel-checkout, right? Also, we wouldn't be able
> to report an error if the worker ends with an error code (but at this
> point it would have already sent all the results to the foreground
> process, anyway).
>
> Do you think that we can mitigate the wait() cost by just splitting
> the loop in two, like we are going to do in setup_workers()?
Yeah, I was thinking about that after I hit send, that it'd be simpler
to do a close loop and then a wait loop and not bother with the atexit
complexity and yet get us most of the gains.
>
>>
>>> + if (finish_command(w))
>>> + die(_("checkout worker finished with error"));
>>> + }
>>> + free(workers);
>>> +}
next prev parent reply other threads:[~2020-08-20 14:40 UTC|newest]
Thread overview: 154+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-10 21:33 [RFC PATCH 00/21] [RFC] Parallel checkout Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 01/21] convert: make convert_attrs() and convert structs public Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 02/21] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 03/21] convert: add get_stream_filter_ca() variant Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 04/21] convert: add conv_attrs classification Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 05/21] entry: extract a header file for entry.c functions Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 06/21] entry: make fstat_output() and read_blob_entry() public Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 07/21] entry: extract cache_entry update from write_entry() Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 08/21] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 09/21] entry: add checkout_entry_ca() which takes preloaded conv_attrs Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 10/21] unpack-trees: add basic support for parallel checkout Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 11/21] parallel-checkout: make it truly parallel Matheus Tavares
2020-08-19 21:34 ` Jeff Hostetler
2020-08-20 1:33 ` Matheus Tavares Bernardino
2020-08-20 14:39 ` Jeff Hostetler [this message]
2020-08-10 21:33 ` [RFC PATCH 12/21] parallel-checkout: add configuration options Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 13/21] parallel-checkout: support progress displaying Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 14/21] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 15/21] builtin/checkout.c: complete parallel checkout support Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 16/21] checkout-index: add " Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 17/21] parallel-checkout: avoid stat() calls in workers Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 18/21] entry: use is_dir_sep() when checking leading dirs Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 19/21] symlinks: make has_dirs_only_path() track FL_NOENT Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 20/21] parallel-checkout: create leading dirs in workers Matheus Tavares
2020-08-10 21:33 ` [RFC PATCH 21/21] parallel-checkout: skip checking the working tree on clone Matheus Tavares
2020-08-12 16:57 ` [RFC PATCH 00/21] [RFC] Parallel checkout Jeff Hostetler
2020-09-22 22:49 ` [PATCH v2 00/19] Parallel Checkout (part I) Matheus Tavares
2020-09-22 22:49 ` [PATCH v2 01/19] convert: make convert_attrs() and convert structs public Matheus Tavares
2020-09-22 22:49 ` [PATCH v2 02/19] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
2020-09-22 22:49 ` [PATCH v2 03/19] convert: add get_stream_filter_ca() variant Matheus Tavares
2020-09-22 22:49 ` [PATCH v2 04/19] convert: add conv_attrs classification Matheus Tavares
2020-09-22 22:49 ` [PATCH v2 05/19] entry: extract a header file for entry.c functions Matheus Tavares
2020-09-22 22:49 ` [PATCH v2 06/19] entry: make fstat_output() and read_blob_entry() public Matheus Tavares
2020-09-22 22:49 ` [PATCH v2 07/19] entry: extract cache_entry update from write_entry() Matheus Tavares
2020-09-22 22:49 ` [PATCH v2 08/19] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
2020-10-01 15:53 ` Jeff Hostetler
2020-10-01 15:59 ` Jeff Hostetler
2020-09-22 22:49 ` [PATCH v2 09/19] entry: add checkout_entry_ca() which takes preloaded conv_attrs Matheus Tavares
2020-09-22 22:49 ` [PATCH v2 10/19] unpack-trees: add basic support for parallel checkout Matheus Tavares
2020-10-05 6:17 ` [PATCH] parallel-checkout: drop unused checkout state parameter Jeff King
2020-10-05 13:13 ` Matheus Tavares Bernardino
2020-10-05 13:45 ` Jeff King
2020-09-22 22:49 ` [PATCH v2 11/19] parallel-checkout: make it truly parallel Matheus Tavares
2020-09-29 19:52 ` Martin Ågren
2020-09-30 14:02 ` Matheus Tavares Bernardino
2020-09-22 22:49 ` [PATCH v2 12/19] parallel-checkout: support progress displaying Matheus Tavares
2020-09-22 22:49 ` [PATCH v2 13/19] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
2020-09-22 22:49 ` [PATCH v2 14/19] builtin/checkout.c: complete parallel checkout support Matheus Tavares
2020-09-22 22:49 ` [PATCH v2 15/19] checkout-index: add " Matheus Tavares
2020-09-22 22:49 ` [PATCH v2 16/19] parallel-checkout: add tests for basic operations Matheus Tavares
2020-10-20 1:35 ` Jonathan Nieder
2020-10-20 2:55 ` Taylor Blau
2020-10-20 13:18 ` Matheus Tavares Bernardino
2020-10-20 19:09 ` Junio C Hamano
2020-10-20 3:18 ` Matheus Tavares Bernardino
2020-10-20 4:16 ` Jonathan Nieder
2020-10-20 19:14 ` Junio C Hamano
2020-09-22 22:49 ` [PATCH v2 17/19] parallel-checkout: add tests related to clone collisions Matheus Tavares
2020-09-22 22:49 ` [PATCH v2 18/19] parallel-checkout: add tests related to .gitattributes Matheus Tavares
2020-09-22 22:49 ` [PATCH v2 19/19] ci: run test round with parallel-checkout enabled Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 00/19] Parallel Checkout (part I) Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 01/19] convert: make convert_attrs() and convert structs public Matheus Tavares
2020-10-29 23:40 ` Junio C Hamano
2020-10-30 17:01 ` Matheus Tavares Bernardino
2020-10-30 17:38 ` Junio C Hamano
2020-10-29 2:14 ` [PATCH v3 02/19] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
2020-10-29 23:48 ` Junio C Hamano
2020-10-29 2:14 ` [PATCH v3 03/19] convert: add get_stream_filter_ca() variant Matheus Tavares
2020-10-29 23:51 ` Junio C Hamano
2020-10-29 2:14 ` [PATCH v3 04/19] convert: add conv_attrs classification Matheus Tavares
2020-10-29 23:53 ` Junio C Hamano
2020-10-29 2:14 ` [PATCH v3 05/19] entry: extract a header file for entry.c functions Matheus Tavares
2020-10-30 21:36 ` Junio C Hamano
2020-10-29 2:14 ` [PATCH v3 06/19] entry: make fstat_output() and read_blob_entry() public Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 07/19] entry: extract cache_entry update from write_entry() Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 08/19] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
2020-10-30 21:58 ` Junio C Hamano
2020-10-29 2:14 ` [PATCH v3 09/19] entry: add checkout_entry_ca() which takes preloaded conv_attrs Matheus Tavares
2020-10-30 22:02 ` Junio C Hamano
2020-10-29 2:14 ` [PATCH v3 10/19] unpack-trees: add basic support for parallel checkout Matheus Tavares
2020-11-02 19:35 ` Junio C Hamano
2020-11-03 3:48 ` Matheus Tavares Bernardino
2020-10-29 2:14 ` [PATCH v3 11/19] parallel-checkout: make it truly parallel Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 12/19] parallel-checkout: support progress displaying Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 13/19] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 14/19] builtin/checkout.c: complete parallel checkout support Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 15/19] checkout-index: add " Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 16/19] parallel-checkout: add tests for basic operations Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 17/19] parallel-checkout: add tests related to clone collisions Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 18/19] parallel-checkout: add tests related to .gitattributes Matheus Tavares
2020-10-29 2:14 ` [PATCH v3 19/19] ci: run test round with parallel-checkout enabled Matheus Tavares
2020-10-29 19:48 ` [PATCH v3 00/19] Parallel Checkout (part I) Junio C Hamano
2020-10-30 15:58 ` Jeff Hostetler
2020-11-04 20:32 ` [PATCH v4 " Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 01/19] convert: make convert_attrs() and convert structs public Matheus Tavares
2020-12-05 10:40 ` Christian Couder
2020-12-05 21:53 ` Matheus Tavares Bernardino
2020-11-04 20:33 ` [PATCH v4 02/19] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
2020-12-05 11:10 ` Christian Couder
2020-12-05 22:20 ` Matheus Tavares Bernardino
2020-11-04 20:33 ` [PATCH v4 03/19] convert: add get_stream_filter_ca() variant Matheus Tavares
2020-12-05 11:45 ` Christian Couder
2020-11-04 20:33 ` [PATCH v4 04/19] convert: add conv_attrs classification Matheus Tavares
2020-12-05 12:07 ` Christian Couder
2020-12-05 22:08 ` Matheus Tavares Bernardino
2020-11-04 20:33 ` [PATCH v4 05/19] entry: extract a header file for entry.c functions Matheus Tavares
2020-12-06 8:31 ` Christian Couder
2020-11-04 20:33 ` [PATCH v4 06/19] entry: make fstat_output() and read_blob_entry() public Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 07/19] entry: extract cache_entry update from write_entry() Matheus Tavares
2020-12-06 8:53 ` Christian Couder
2020-11-04 20:33 ` [PATCH v4 08/19] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
2020-12-06 9:35 ` Christian Couder
2020-12-07 13:52 ` Matheus Tavares Bernardino
2020-11-04 20:33 ` [PATCH v4 09/19] entry: add checkout_entry_ca() which takes preloaded conv_attrs Matheus Tavares
2020-12-06 10:02 ` Christian Couder
2020-12-07 16:47 ` Matheus Tavares Bernardino
2020-11-04 20:33 ` [PATCH v4 10/19] unpack-trees: add basic support for parallel checkout Matheus Tavares
2020-12-06 11:36 ` Christian Couder
2020-12-07 19:06 ` Matheus Tavares Bernardino
2020-11-04 20:33 ` [PATCH v4 11/19] parallel-checkout: make it truly parallel Matheus Tavares
2020-12-16 22:31 ` Emily Shaffer
2020-12-17 15:00 ` Matheus Tavares Bernardino
2020-11-04 20:33 ` [PATCH v4 12/19] parallel-checkout: support progress displaying Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 13/19] make_transient_cache_entry(): optionally alloc from mem_pool Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 14/19] builtin/checkout.c: complete parallel checkout support Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 15/19] checkout-index: add " Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 16/19] parallel-checkout: add tests for basic operations Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 17/19] parallel-checkout: add tests related to clone collisions Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 18/19] parallel-checkout: add tests related to .gitattributes Matheus Tavares
2020-11-04 20:33 ` [PATCH v4 19/19] ci: run test round with parallel-checkout enabled Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 0/9] Parallel Checkout (part I) Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 1/9] convert: make convert_attrs() and convert structs public Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 2/9] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 3/9] convert: add get_stream_filter_ca() variant Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 4/9] convert: add classification for conv_attrs struct Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 5/9] entry: extract a header file for entry.c functions Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 6/9] entry: make fstat_output() and read_blob_entry() public Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 7/9] entry: extract update_ce_after_write() from write_entry() Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 8/9] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
2020-12-16 14:50 ` [PATCH v5 9/9] entry: add checkout_entry_ca() taking preloaded conv_attrs Matheus Tavares
2020-12-16 15:27 ` [PATCH v5 0/9] Parallel Checkout (part I) Christian Couder
2020-12-17 1:11 ` Junio C Hamano
2021-03-23 14:19 ` [PATCH v6 0/9] Parallel Checkout (part 1) Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 1/9] convert: make convert_attrs() and convert structs public Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 2/9] convert: add [async_]convert_to_working_tree_ca() variants Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 3/9] convert: add get_stream_filter_ca() variant Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 4/9] convert: add classification for conv_attrs struct Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 5/9] entry: extract a header file for entry.c functions Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 6/9] entry: make fstat_output() and read_blob_entry() public Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 7/9] entry: extract update_ce_after_write() from write_entry() Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 8/9] entry: move conv_attrs lookup up to checkout_entry() Matheus Tavares
2021-03-23 14:19 ` [PATCH v6 9/9] entry: add checkout_entry_ca() taking preloaded conv_attrs Matheus Tavares
2021-03-23 17:34 ` [PATCH v6 0/9] Parallel Checkout (part 1) Junio C Hamano
2020-10-01 16:42 ` [RFC PATCH 00/21] [RFC] Parallel checkout Jeff Hostetler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5327b0e3-fdaa-11e1-f7da-832eb988555e@jeffhostetler.com \
--to=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jeffhost@microsoft.com \
--cc=liu.denton@gmail.com \
--cc=matheus.bernardino@usp.br \
--cc=pclouds@gmail.com \
--cc=pyokagan@gmail.com \
--cc=remi.lespinet@ensimag.grenoble-inp.fr \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).