From: Christian Couder <christian.couder@gmail.com>
To: Matheus Tavares <matheus.bernardino@usp.br>
Cc: git <git@vger.kernel.org>, "Junio C Hamano" <gitster@pobox.com>,
"Jeff Hostetler" <git@jeffhostetler.com>,
"Christian Couder" <chriscool@tuxfamily.org>,
"Jeff King" <peff@peff.net>, "Elijah Newren" <newren@gmail.com>,
"Jonathan Nieder" <jrnieder@gmail.com>,
"Martin Ågren" <martin.agren@gmail.com>
Subject: Re: [PATCH v4 10/19] unpack-trees: add basic support for parallel checkout
Date: Sun, 6 Dec 2020 12:36:00 +0100
Message-ID: <CAP8UFD2+6DYpqM5d1-OCMr4hnGksng4R3e23hcQErT4-ff77=Q@mail.gmail.com>
In-Reply-To: <bc8447cd9c106055a715305ab506adc6abae7713.1604521275.git.matheus.bernardino@usp.br>
On Wed, Nov 4, 2020 at 9:34 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> This new interface allows us to enqueue some of the entries being
> checked out to later call write_entry() for them in parallel. For now,
> the parallel checkout machinery is enabled by default and there is no
> user configuration, but run_parallel_checkout() just writes the queued
> entries in sequence (without spawning additional workers). The next
> patch will actually implement the parallelism and, later, we will make
> it configurable.
I would think that it might be more logical to first add a
configuration that does nothing, then add writing the queued entries
in sequence without parallelism, and then add actual parallelism.
> When there are path collisions among the entries being written (which
> can happen e.g. with case-sensitive files in case-insensitive file
> systems), the parallel checkout code detects the problem and marks the
> item with PC_ITEM_COLLIDED.
Is this needed in this step that only writes the queued entries in
sequence without parallelism, or could this be added later, before the
step that adds actual parallelism?
> Later, these items are sequentially fed to
> checkout_entry() again. This is similar to the way the sequential code
> deals with collisions, overwriting the previously checked out entries
> with the subsequent ones. The only difference is that, when we start
> writing the entries in parallel, we won't be able to determine which of
> the colliding entries will survive on disk (for the sequential
> algorithm, it is always the last one).
So I guess that PC_ITEM_COLLIDED will then be used to decide which
entries will not be checked out in parallel?
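To make the question concrete, here is a minimal sketch of how colliding
items could be marked so that they are left out of the parallel phase and
written sequentially afterwards. It is not the code from the patch:
`struct item`, `ITEM_PENDING`, `ITEM_COLLIDED` and `mark_collisions()` are
hypothetical names, and a case-insensitive comparison with strcasecmp()
stands in for whatever the file system actually does.

```c
#include <stddef.h>
#include <strings.h> /* strcasecmp(), POSIX */

/* Hypothetical status values, loosely modeled on PC_ITEM_COLLIDED. */
enum item_status { ITEM_PENDING = 0, ITEM_COLLIDED };

/* Hypothetical queue item: just a path and a status. */
struct item {
	const char *path;
	enum item_status status;
};

/*
 * Mark every item whose path equals, ignoring case, the path of an
 * earlier item. Marked items would be skipped by the parallel phase
 * and checked out one by one afterwards.
 * Returns the number of items marked.
 */
static size_t mark_collisions(struct item *items, size_t nr)
{
	size_t i, j, marked = 0;

	for (i = 1; i < nr; i++) {
		for (j = 0; j < i; j++) {
			if (!strcasecmp(items[i].path, items[j].path)) {
				items[i].status = ITEM_COLLIDED;
				marked++;
				break;
			}
		}
	}
	return marked;
}
```

The quadratic scan is only there to keep the sketch short; it says nothing
about how the patch itself finds collisions.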
> I also experimented with the idea of not overwriting colliding entries,
> and it seemed to work well in my simple tests.
There are a number of co-authors of this patch, so it's not very clear
who "I" is. Maybe:
"The idea of not overwriting colliding entries seemed to work well in
simple tests, however ..."
> However, because just one
> entry of each colliding group would be actually written, the others
> would have null lstat() fields on the index. This might not be a problem
> by itself, but it could cause performance penalties for subsequent
> commands that need to refresh the index: when the st_size value cached
> is 0, read-cache.c:ie_modified() will go to the filesystem to see if the
> contents match. As mentioned in the function:
>
> * Immediately after read-tree or update-index --cacheinfo,
> * the length field is zero, as we have never even read the
> * lstat(2) information once, and we cannot trust DATA_CHANGED
> * returned by ie_match_stat() which in turn was returned by
> * ce_match_stat_basic() to signal that the filesize of the
> * blob changed. We have to actually go to the filesystem to
> * see if the contents match, and if so, should answer "unchanged".
>
> So, if we have N entries in a colliding group and we decide to write and
> lstat() only one of them, every subsequent git-status will have to read,
> convert, and hash the written file N - 1 times, to check that the N - 1
> unwritten entries are dirty. By checking out all colliding entries (like
> the sequential code does), we only pay the overhead once.
Ok.
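For readers following along, the cost argument can be sketched as a small
model. This is a deliberate simplification and not the logic of
ie_modified(): `struct cached_stat`, `needs_content_check()` and
`count_content_checks()` are hypothetical names, and the only rule modeled
is "a cached size of 0 cannot be trusted, so the contents must be read".

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the stat data cached in the index. */
struct cached_stat {
	unsigned long size; /* 0: lstat() data was never recorded */
};

/*
 * Simplified model of the rule quoted above: a cached size of 0 cannot
 * be trusted, so the file contents have to be read and compared.
 */
static int needs_content_check(const struct cached_stat *st)
{
	return st->size == 0;
}

/* Count how many entries of a colliding group force a content check. */
static size_t count_content_checks(const struct cached_stat *group, size_t nr)
{
	size_t i, checks = 0;

	for (i = 0; i < nr; i++)
		if (needs_content_check(&group[i]))
			checks++;
	return checks;
}
```

With a group of N entries where only one has stat data, the model reports
N - 1 content checks per refresh; when every entry has stat data recorded,
it reports none.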
> 5 files changed, 410 insertions(+), 3 deletions(-)
It looks like a lot of new code in one patch/commit, which is why it
might be interesting to split it.
> @@ -7,6 +7,7 @@
> #include "progress.h"
> #include "fsmonitor.h"
> #include "entry.h"
> +#include "parallel-checkout.h"
>
> static void create_directories(const char *path, int path_len,
> const struct checkout *state)
> @@ -426,8 +427,17 @@ static void mark_colliding_entries(const struct checkout *state,
>      for (i = 0; i < state->istate->cache_nr; i++) {
>          struct cache_entry *dup = state->istate->cache[i];
>
> -        if (dup == ce)
> -            break;
> +        if (dup == ce) {
> +            /*
> +             * Parallel checkout creates the files in no particular
> +             * order. So the other side of the collision may appear
> +             * after the given cache_entry in the array.
> +             */
Is it really the case right now that the code creates files in no
particular order or will that be the case later when actual
parallelism is implemented?
> +            if (parallel_checkout_status() == PC_RUNNING)
> +                continue;
> +            else
> +                break;
> +        }
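The effect of the hunk above can be sketched in isolation. This is only an
illustration under assumptions, not the code of mark_colliding_entries():
`find_colliding()` and its `scan_past_self` flag are hypothetical, and
paths are compared with strcasecmp() as a stand-in for a case-insensitive
file system.

```c
#include <stddef.h>
#include <strings.h> /* strcasecmp(), POSIX */

/*
 * Return the index of the first entry other than `self` whose path
 * matches paths[self] ignoring case, or -1 if there is none.
 *
 * scan_past_self == 0: stop at `self`, as the sequential code does,
 *   because the other side of the collision was written earlier.
 * scan_past_self != 0: skip `self` and keep scanning, because with no
 *   guaranteed write order the other side may come later in the array.
 */
static int find_colliding(const char **paths, size_t nr, size_t self,
			  int scan_past_self)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (i == self) {
			if (scan_past_self)
				continue;
			else
				break;
		}
		if (!strcasecmp(paths[i], paths[self]))
			return (int)i;
	}
	return -1;
}
```

For the array { "a", "FOO", "foo" } and self == 1, the sequential variant
finds nothing, while the variant that scans past `self` finds index 2.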
> +struct parallel_checkout_item {
> +    /* pointer to a istate->cache[] entry. Not owned by us. */
> +    struct cache_entry *ce;
> +    struct conv_attrs ca;
> +    struct stat st;
> +    enum pc_item_status status;
> +};
"item" does not seem very clear to me. If there is only one
parallel_checkout_item for each cache_entry, then it might be better to
use "parallel_checkout_entry" instead of "parallel_checkout_item".
> +enum pc_status {
> +    PC_UNINITIALIZED = 0,
> +    PC_ACCEPTING_ENTRIES,
> +    PC_RUNNING,
> +};
> +
> +enum pc_status parallel_checkout_status(void);
> +void init_parallel_checkout(void);
Maybe a comment saying what the above function does would be helpful.
If I had to guess, I would write something like:
/*
* Put parallel checkout into the PC_ACCEPTING_ENTRIES state.
* Should be used only when in the PC_UNINITIALIZED state.
*/
> +/*
> + * Return -1 if parallel checkout is currently not enabled
Is it "enabled" or "initialized" or "configured" here? Does it refer
to `enum pc_status` or a config option or something else? Looking at
the code, it is testing if the status is PC_ACCEPTING_ENTRIES, so
perhaps: s/not enabled/not accepting entries/
> or if the entry is
> + * not eligible for parallel checkout. Otherwise, enqueue the entry for later
> + * write and return 0.
> + */
> +int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
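The state handling discussed above can be sketched as follows. This is a
guess at the intended behavior, not the implementation from the patch:
`sketch_init()`, `sketch_enqueue()` and `sketch_run()` are hypothetical
names, the queue is reduced to a counter, and returning -1 on a repeated
init is an assumption made only for the sketch.

```c
#include <stddef.h>

enum pc_status {
	PC_UNINITIALIZED = 0,
	PC_ACCEPTING_ENTRIES,
	PC_RUNNING,
};

static enum pc_status status = PC_UNINITIALIZED;
static size_t nr_queued; /* stand-in for the real queue of items */

/*
 * Put the machinery into the PC_ACCEPTING_ENTRIES state.
 * Only valid from PC_UNINITIALIZED; returns -1 otherwise (assumption).
 */
static int sketch_init(void)
{
	if (status != PC_UNINITIALIZED)
		return -1;
	status = PC_ACCEPTING_ENTRIES;
	nr_queued = 0;
	return 0;
}

/*
 * Return -1 if we are not accepting entries or if the entry is not
 * eligible. Otherwise, queue the entry for a later write and return 0.
 */
static int sketch_enqueue(int eligible)
{
	if (status != PC_ACCEPTING_ENTRIES || !eligible)
		return -1;
	nr_queued++;
	return 0;
}

/* "Write" the queued entries in sequence and reset the state. */
static size_t sketch_run(void)
{
	size_t written = nr_queued;

	status = PC_RUNNING;
	/* A real implementation would call write_entry() per item here. */
	nr_queued = 0;
	status = PC_UNINITIALIZED;
	return written;
}
```

In this reading, "not enabled" in the comment really means "not in the
PC_ACCEPTING_ENTRIES state", which is why the rewording suggested above
seems clearer.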