From: Glen Choo <chooglen@google.com>
To: Calvin Wan <calvinwan@google.com>, git@vger.kernel.org
Cc: Calvin Wan <calvinwan@google.com>,
avarab@gmail.com, newren@gmail.com, jonathantanmy@google.com,
phillip.wood123@gmail.com
Subject: Re: [PATCH v9 6/6] diff-lib: parallelize run_diff_files for submodules
Date: Thu, 16 Mar 2023 19:51:58 -0700
Message-ID: <kl6lh6ukum29.fsf@chooglen-macbookpro.roam.corp.google.com>
In-Reply-To: <kl6ljzzguqss.fsf@chooglen-macbookpro.roam.corp.google.com>
Glen Choo <chooglen@google.com> writes:
> It would be good if we could avoid mixing unrelated information sources
> in "struct submodule_status_util", since a) this makes it very tightly
> coupled to run_diff_files() and b) it causes us to repeat ourselves in
> the same function (.changed = changed, record_file_diff()).
>
> The only reason why the code looks this way right now is that
> match_stat_with_submodule() sets defer_submodule_status based on whether
> or not we should ignore the submodule, and this eventually tells
> get_submodule_status() what submodules it needs to care about. But,
> deciding whether to spawn a subprocess for which submodule is exactly
> what the .get_next_task member is for.
>
>> diff --git a/submodule.c b/submodule.c
>> index 426074cebb..6f6e150a3f 100644
>> --- a/submodule.c
>> +++ b/submodule.c
>> @@ -1981,6 +1994,121 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked)
>> return dirty_submodule;
>> }
>>
>> +static struct status_task *
>> +get_status_task_from_index(struct submodule_parallel_status *sps,
>> + struct strbuf *err)
>> +{
>> + for (; sps->index_count < sps->submodule_names->nr; sps->index_count++) {
>> + struct submodule_status_util *util = sps->submodule_names->items[sps->index_count].util;
>> + struct status_task *task;
>> +
>> + if (!verify_submodule_git_directory(util->path))
>> + continue;
>
> So right here, we could use the "check if this submodule should be
> ignored" logic from match_stat_with_submodule() to decide whether or not
> to spawn the subprocess. IOW, I am advocating for
> get_submodules_status() to be a parallel version of
> match_stat_with_submodule() (not a parallel version of
> is_submodule_modified() that shuttles extra information).
It turns out to be quite difficult to implement a parallel
match_stat_with_submodule():
a) we can't remove it because it still has another caller
b) its internals are quite hard to refactor: one conditional arm depends
on "changed", which is set by calling ie_match_stat(), which in turn
requires the "struct stat" to have already been lstat()-ed...
So even though this series adds a lot, it is just about as minimally
invasive as possible.
I suspect that there are some possible cleanups down the line, e.g.
is_submodule_modified() is rightfully only called by diff-lib.c, so I
think it should be a static function there. And once we move that, we
can make our parallel function static as well, and then we don't have to
worry about tight coupling to run_diff_files(). To keep the range-diff
manageable, that can be left for a future cleanup though.
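For reference, the idea from the quoted review (letting the "get next
task" callback decide which submodules get a subprocess) could look
roughly like the sketch below. This is only an illustration under
assumptions: every name in it (sketch_submodule, sketch_state,
sketch_get_next_task, sketch_count_tasks) is hypothetical, the structs
are simplified stand-ins rather than git's real ones, and the "ignored"
flag is assumed to be known up front, which is exactly the part that is
hard to get out of match_stat_with_submodule() today.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a submodule entry; not git's real struct. */
struct sketch_submodule {
	const char *path;
	int ignored;     /* 1 if this submodule should be ignored */
	int has_git_dir; /* 1 if a usable git directory was found */
};

/* Hypothetical iterator state, loosely modelled on sps->index_count. */
struct sketch_state {
	const struct sketch_submodule *items;
	size_t nr;
	size_t index_count;
};

/*
 * Return the next submodule that needs a status subprocess, or NULL
 * when none are left. The "should we ignore it?" decision is made
 * here, so the caller never has to pre-filter the list.
 */
static const struct sketch_submodule *
sketch_get_next_task(struct sketch_state *st)
{
	assert(st && (st->items || !st->nr));
	for (; st->index_count < st->nr; st->index_count++) {
		const struct sketch_submodule *sub = &st->items[st->index_count];

		if (sub->ignored || !sub->has_git_dir)
			continue;
		st->index_count++; /* resume after this entry next time */
		return sub;
	}
	return NULL;
}

/* Count how many tasks the callback would hand out in total. */
static size_t sketch_count_tasks(struct sketch_state *st)
{
	size_t n = 0;

	while (sketch_get_next_task(st))
		n++;
	return n;
}
```

With four entries of which one is ignored and one has no git directory,
the callback hands out only the remaining two, in order, and then
returns NULL. The filtering lives entirely in the callback, which is
what would let the caller stop shuttling "changed" and similar state
through the util struct.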