* [PATCH v5 0/6] submodule: parallelize diff
       [not found] <https://lore.kernel.org/git/20221108184200.2813458-1-calvinwan@google.com/>
@ 2023-01-04 21:54 ` Calvin Wan
  2023-01-05 23:23   ` Calvin Wan
                     ` (7 more replies)
  2023-01-04 21:54 ` [PATCH v5 1/6] run-command: add duplicate_output_fn to run_processes_parallel_opts Calvin Wan
                   ` (5 subsequent siblings)
  6 siblings, 8 replies; 86+ messages in thread
From: Calvin Wan @ 2023-01-04 21:54 UTC (permalink / raw)
  To: git
  Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123, chooglen,
      newren, jonathantanmy

Original cover letter for context:
https://lore.kernel.org/git/20221011232604.839941-1-calvinwan@google.com/

Thank you again everyone for the numerous reviews! For this reroll, I
incorporated most of the feedback given, fixed a bug I found, and made
some stylistic refactors. I also added a new patch at the end that swaps
the serial implementation in is_submodule_modified for the new parallel
one. While I had patch 6 originally smushed with the previous one,
the diff came out not very reviewer friendly so it has been separated
out.

Changes since v4

(Patch 1)
The code in run-command.c that calls duplicate_output_fn has been
cleaned up and no longer passes a separate strbuf for the output. It
instead passes an offset that represents the starting point in the
original strbuf.

(Patch 5)
Moved status parsing from status_duplicate_output to status_finish. In
pp_buffer_stderr::run-command.c, output is gathered by strbuf_read_once
which reads 8192 bytes at once so a longer status message would error
out during status parsing since part of it would be cut off. Therefore,
status parsing must happen at the end of the process rather than in
duplicate_output_fn (and has subsequently been moved).

(Patch 6)
New patch swapping serial implementation in is_submodule_modified for
the new parallel one.

Calvin Wan (6):
  run-command: add duplicate_output_fn to run_processes_parallel_opts
  submodule: strbuf variable rename
  submodule: move status parsing into function
  diff-lib: refactor match_stat_with_submodule
  diff-lib: parallelize run_diff_files for submodules
  submodule: call parallel code from serial status

 Documentation/config/submodule.txt |  12 ++
 diff-lib.c                         | 104 ++++++++++--
 run-command.c                      |  16 +-
 run-command.h                      |  27 ++++
 submodule.c                        | 250 ++++++++++++++++++++++-------
 submodule.h                        |   9 ++
 t/helper/test-run-command.c        |  21 +++
 t/t0061-run-command.sh             |  39 +++++
 t/t4027-diff-submodule.sh          |  19 +++
 t/t7506-status-submodule.sh        |  19 +++
 10 files changed, 441 insertions(+), 75 deletions(-)

--
2.39.0.314.g84b9a713c41-goog

^ permalink raw reply [flat|nested] 86+ messages in thread
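The Patch 1 and Patch 5 notes in the v5 cover letter can be illustrated with a small standalone sketch. It is not git's code: `struct buf`, `chunk_fn`, `feed_in_chunks`, `count_torn_chunks` and `count_lines` are hypothetical stand-ins for the strbuf, the offset-taking callback, and the bounded read described above. It assumes only what the cover letter states: output arrives in bounded reads, the callback is handed the whole buffer plus the offset where the new bytes begin, and a chunk may end in the middle of a line, so line-oriented parsing is only safe once all output has been collected.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a growable, NUL-terminated buffer. */
struct buf {
	char *data;
	size_t len;
	size_t alloc;
};

/*
 * Hypothetical per-chunk callback: it receives the whole buffer and the
 * offset at which the newly appended bytes start.
 */
typedef void (*chunk_fn)(const struct buf *out, size_t offset, void *cb_data);

static void buf_append(struct buf *b, const char *src, size_t n)
{
	if (b->len + n + 1 > b->alloc) {
		size_t alloc = (b->len + n + 1) * 2;
		char *p = realloc(b->data, alloc);
		if (!p)
			abort();
		b->data = p;
		b->alloc = alloc;
	}
	memcpy(b->data + b->len, src, n);
	b->len += n;
	b->data[b->len] = '\0';
}

/* Deliver `input` in pieces of at most `chunk` bytes, like a bounded read. */
static void feed_in_chunks(struct buf *b, const char *input, size_t chunk,
			   chunk_fn fn, void *cb_data)
{
	size_t total = strlen(input), pos = 0;

	assert(chunk > 0); /* a zero-sized chunk would never make progress */
	while (pos < total) {
		size_t n = total - pos < chunk ? total - pos : chunk;

		buf_append(b, input + pos, n);
		if (fn)
			fn(b, b->len - n, cb_data);
		pos += n;
	}
}

/* Counts chunks that end mid-line; parsing such a chunk alone would fail. */
static void count_torn_chunks(const struct buf *out, size_t offset,
			      void *cb_data)
{
	size_t *torn = cb_data;

	if (out->len > offset && out->data[out->len - 1] != '\n')
		(*torn)++;
}

/* Deferred parse: count complete lines once all output is collected. */
static size_t count_lines(const struct buf *b)
{
	size_t i, n = 0;

	for (i = 0; i < b->len; i++)
		if (b->data[i] == '\n')
			n++;
	return n;
}
```

Feeding a two-line input through `feed_in_chunks` with a deliberately tiny chunk size makes the per-chunk callback see partial lines, while `count_lines` on the finished buffer still sees every line intact. The tiny chunk size is only for illustration; the cover letter's figure is 8192 bytes.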
* Re: [PATCH v5 0/6] submodule: parallelize diff
  2023-01-04 21:54 ` [PATCH v5 0/6] submodule: parallelize diff Calvin Wan
@ 2023-01-05 23:23   ` Calvin Wan
  2023-01-17 19:30   ` [PATCH v6 " Calvin Wan
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 86+ messages in thread
From: Calvin Wan @ 2023-01-05 23:23 UTC (permalink / raw)
  To: git; +Cc: emilyshaffer, avarab, phillip.wood123, chooglen, newren, jonathantanmy

Apologies for the broken link to the previous versions. Looks like I
had some encoding issues with copy/paste. Here are the previous versions

v4: https://lore.kernel.org/git/20221108184200.2813458-1-calvinwan@google.com/
v3: https://lore.kernel.org/git/20221020232532.1128326-1-calvinwan@google.com/
v2: https://lore.kernel.org/git/20221011232604.839941-1-calvinwan@google.com/
v1: https://lore.kernel.org/git/20220922232947.631309-1-calvinwan@google.com/

On Wed, Jan 4, 2023 at 1:54 PM Calvin Wan <calvinwan@google.com> wrote:
>
> Original cover letter for context:
> https://lore.kernel.org/git/20221011232604.839941-1-calvinwan@google.com/
>
> Thank you again everyone for the numerous reviews! For this reroll, I
> incorporated most of the feedback given, fixed a bug I found, and made
> some stylistic refactors. I also added a new patch at the end that swaps
> the serial implementation in is_submodule_modified for the new parallel
> one. While I had patch 6 originally smushed with the previous one,
> the diff came out not very reviewer friendly so it has been separated
> out.
>
> Changes since v4
>
> (Patch 1)
> The code in run-command.c that calls duplicate_output_fn has been
> cleaned up and no longer passes a separate strbuf for the output. It
> instead passes an offset that represents the starting point in the
> original strbuf.
>
> (Patch 5)
> Moved status parsing from status_duplicate_output to status_finish. In
> pp_buffer_stderr::run-command.c, output is gathered by strbuf_read_once
> which reads 8192 bytes at once so a longer status message would error
> out during status parsing since part of it would be cut off. Therefore,
> status parsing must happen at the end of the process rather than in
> duplicate_output_fn (and has subsequently been moved).
>
> (Patch 6)
> New patch swapping serial implementation in is_submodule_modified for
> the new parallel one.
>
> Calvin Wan (6):
>   run-command: add duplicate_output_fn to run_processes_parallel_opts
>   submodule: strbuf variable rename
>   submodule: move status parsing into function
>   diff-lib: refactor match_stat_with_submodule
>   diff-lib: parallelize run_diff_files for submodules
>   submodule: call parallel code from serial status
>
>  Documentation/config/submodule.txt |  12 ++
>  diff-lib.c                         | 104 ++++++++++--
>  run-command.c                      |  16 +-
>  run-command.h                      |  27 ++++
>  submodule.c                        | 250 ++++++++++++++++++++++-------
>  submodule.h                        |   9 ++
>  t/helper/test-run-command.c        |  21 +++
>  t/t0061-run-command.sh             |  39 +++++
>  t/t4027-diff-submodule.sh          |  19 +++
>  t/t7506-status-submodule.sh        |  19 +++
>  10 files changed, 441 insertions(+), 75 deletions(-)
>
> --
> 2.39.0.314.g84b9a713c41-goog
>

^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v6 0/6] submodule: parallelize diff
  2023-01-04 21:54 ` [PATCH v5 0/6] submodule: parallelize diff Calvin Wan
  2023-01-05 23:23   ` Calvin Wan
@ 2023-01-17 19:30   ` Calvin Wan
  2023-02-07 18:16     ` [PATCH v7 0/7] " Calvin Wan
                       ` (7 more replies)
  2023-01-17 19:30   ` [PATCH v6 1/6] run-command: add duplicate_output_fn to run_processes_parallel_opts Calvin Wan
                     ` (5 subsequent siblings)
  7 siblings, 8 replies; 86+ messages in thread
From: Calvin Wan @ 2023-01-17 19:30 UTC (permalink / raw)
  To: git
  Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123, chooglen,
      newren, jonathantanmy

Original cover letter for context:
https://lore.kernel.org/git/20221011232604.839941-1-calvinwan@google.com/

(Quick reroll to fix leaks from v5)

Thank you again everyone for the numerous reviews! For this reroll, I
incorporated most of the feedback given, fixed a bug I found, and made
some stylistic refactors. I also added a new patch at the end that swaps
the serial implementation in is_submodule_modified for the new parallel
one. While I had patch 6 originally smushed with the previous one,
the diff came out not very reviewer friendly so it has been separated
out.

Changes since v4

(Patch 1)
The code in run-command.c that calls duplicate_output_fn has been
cleaned up and no longer passes a separate strbuf for the output. It
instead passes an offset that represents the starting point in the
original strbuf.

(Patch 5)
Moved status parsing from status_duplicate_output to status_finish. In
pp_buffer_stderr::run-command.c, output is gathered by strbuf_read_once
which reads 8192 bytes at once so a longer status message would error
out during status parsing since part of it would be cut off. Therefore,
status parsing must happen at the end of the process rather than in
duplicate_output_fn (and has subsequently been moved).

(Patch 6)
New patch swapping serial implementation in is_submodule_modified for
the new parallel one.

Calvin Wan (6):
  run-command: add duplicate_output_fn to run_processes_parallel_opts
  submodule: strbuf variable rename
  submodule: move status parsing into function
  diff-lib: refactor match_stat_with_submodule
  diff-lib: parallelize run_diff_files for submodules
  submodule: call parallel code from serial status

 Documentation/config/submodule.txt |  12 ++
 diff-lib.c                         | 104 ++++++++++--
 run-command.c                      |  16 +-
 run-command.h                      |  27 +++
 submodule.c                        | 254 ++++++++++++++++++++++-------
 submodule.h                        |   9 +
 t/helper/test-run-command.c        |  21 +++
 t/t0061-run-command.sh             |  39 +++++
 t/t4027-diff-submodule.sh          |  19 +++
 t/t7506-status-submodule.sh        |  19 +++
 10 files changed, 445 insertions(+), 75 deletions(-)

--
2.39.0.314.g84b9a713c41-goog

^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v7 0/7] submodule: parallelize diff
  2023-01-17 19:30 ` [PATCH v6 " Calvin Wan
@ 2023-02-07 18:16   ` Calvin Wan
  2023-02-08 0:55     ` Ævar Arnfjörð Bjarmason
                       ` (7 more replies)
  2023-02-07 18:17   ` [PATCH v7 1/7] run-command: add duplicate_output_fn to run_processes_parallel_opts Calvin Wan
                     ` (6 subsequent siblings)
  7 siblings, 8 replies; 86+ messages in thread
From: Calvin Wan @ 2023-02-07 18:16 UTC (permalink / raw)
  To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy

Original cover letter for context:
https://lore.kernel.org/git/20221011232604.839941-1-calvinwan@google.com/

Changes since v6

Added patches 4 and 5 to refactor out more functionality so that it is
clear what changes my final patch makes. Since the large majority of
the functionality between the serial and parallel implementation is now
shared, I no longer remove the serial implementation.

Added additional tests to verify setting parallelism doesn't alter
output

Calvin Wan (7):
  run-command: add duplicate_output_fn to run_processes_parallel_opts
  submodule: strbuf variable rename
  submodule: move status parsing into function
  submodule: refactor is_submodule_modified()
  diff-lib: refactor out diff_change logic
  diff-lib: refactor match_stat_with_submodule
  diff-lib: parallelize run_diff_files for submodules

 Documentation/config/submodule.txt |  12 ++
 diff-lib.c                         | 133 +++++++++++---
 run-command.c                      |  16 +-
 run-command.h                      |  27 +++
 submodule.c                        | 274 ++++++++++++++++++++++++-----
 submodule.h                        |   9 +
 t/helper/test-run-command.c        |  21 +++
 t/t0061-run-command.sh             |  39 ++++
 t/t4027-diff-submodule.sh          |  31 ++++
 t/t7506-status-submodule.sh        |  25 +++
 10 files changed, 508 insertions(+), 79 deletions(-)

--
2.39.1.519.gcb327c4b5f-goog

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v7 0/7] submodule: parallelize diff
  2023-02-07 18:16 ` [PATCH v7 0/7] " Calvin Wan
@ 2023-02-08 0:55   ` Ævar Arnfjörð Bjarmason
  2023-02-09 0:02   ` [PATCH v8 0/6] " Calvin Wan
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 86+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-02-08 0:55 UTC (permalink / raw)
  To: Calvin Wan; +Cc: git, chooglen, newren, jonathantanmy

On Tue, Feb 07 2023, Calvin Wan wrote:

> Original cover letter for context:
> https://lore.kernel.org/git/20221011232604.839941-1-calvinwan@google.com/

I went over this, noticed some issues, some nits, but definitely some
things worth an eventual re-roll.

> Changes since v6

I would very much appreciate for future iterations if you can start
including a range-diff to the previous version.

> Added patches 4 and 5 to refactor out more functionality so that it is
> clear what changes my final patch makes. Since the large majority of
> the functionality between the serial and parallel implementation is now
> shared, I no longer remove the serial implementation.
>
> Added additional tests to verify setting parallelism doesn't alter
> output

I could have, but didn't manually apply both v6 and v7 and produce a
range-diff, having it in the CL would really help to track the changes
across re-rolls.

^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v8 0/6] submodule: parallelize diff
  2023-02-07 18:16 ` [PATCH v7 0/7] " Calvin Wan
  2023-02-08 0:55   ` Ævar Arnfjörð Bjarmason
@ 2023-02-09 0:02   ` Calvin Wan
  2023-02-09 1:42     ` Ævar Arnfjörð Bjarmason
                       ` (3 more replies)
  2023-02-09 0:02   ` [PATCH v8 1/6] run-command: add duplicate_output_fn to run_processes_parallel_opts Calvin Wan
                     ` (5 subsequent siblings)
  7 siblings, 4 replies; 86+ messages in thread
From: Calvin Wan @ 2023-02-09 0:02 UTC (permalink / raw)
  To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy, phillip.wood123

Original cover letter for context:
https://lore.kernel.org/git/20221011232604.839941-1-calvinwan@google.com/

This reroll contains stylistic changes suggested by Avar and Phillip,
and includes a range-diff below.

Calvin Wan (6):
  run-command: add duplicate_output_fn to run_processes_parallel_opts
  submodule: strbuf variable rename
  submodule: move status parsing into function
  submodule: refactor is_submodule_modified()
  diff-lib: refactor out diff_change logic
  diff-lib: parallelize run_diff_files for submodules

 Documentation/config/submodule.txt |  12 ++
 diff-lib.c                         | 133 +++++++++++----
 run-command.c                      |  16 +-
 run-command.h                      |  25 +++
 submodule.c                        | 266 ++++++++++++++++++++++++-----
 submodule.h                        |   9 +
 t/helper/test-run-command.c        |  20 +++
 t/t0061-run-command.sh             |  39 +++++
 t/t4027-diff-submodule.sh          |  31 ++++
 t/t7506-status-submodule.sh        |  25 +++
 10 files changed, 497 insertions(+), 79 deletions(-)

Range-diff against v7:
1: 311b1abfbe ! 
1: 5d51250c67 run-command: add duplicate_output_fn to run_processes_parallel_opts @@ run-command.c: static void pp_init(struct parallel_processes *pp, if (!opts->get_next_task) BUG("you need to specify a get_next_task function"); -+ if (opts->duplicate_output && opts->ungroup) -+ BUG("duplicate_output and ungroup are incompatible with each other"); ++ if (opts->ungroup) { ++ if (opts->duplicate_output) ++ BUG("duplicate_output and ungroup are incompatible with each other"); ++ } + CALLOC_ARRAY(pp->children, n); if (!opts->ungroup) @@ run-command.c: static void pp_buffer_stderr(struct parallel_processes *pp, + } else if (n < 0) { if (errno != EAGAIN) die_errno("read"); -+ } else { -+ if (opts->duplicate_output) -+ opts->duplicate_output(&pp->children[i].err, -+ strlen(pp->children[i].err.buf) - n, -+ opts->data, -+ pp->children[i].data); ++ } else if (opts->duplicate_output) { ++ opts->duplicate_output(&pp->children[i].err, ++ pp->children[i].err.len - n, ++ opts->data, pp->children[i].data); + } } } @@ run-command.h: typedef int (*start_failure_fn)(struct strbuf *out, + * + * This function is incompatible with "ungroup" + */ -+typedef void (*duplicate_output_fn)(struct strbuf *out, -+ size_t offset, -+ void *pp_cb, -+ void *pp_task_cb); ++typedef void (*duplicate_output_fn)(struct strbuf *out, size_t offset, ++ void *pp_cb, void *pp_task_cb); + /** * This callback is called on every child process that finished processing. @@ run-command.h: struct run_process_parallel_opts start_failure_fn start_failure; + /** -+ * duplicate_output: See duplicate_output_fn() above. This should be -+ * NULL unless process specific output is needed ++ * duplicate_output: See duplicate_output_fn() above. Unless you need ++ * to capture output from child processes, leave this as NULL. 
+ */ + duplicate_output_fn duplicate_output; + @@ t/helper/test-run-command.c: static int no_job(struct child_process *cp, + void *pp_task_cb UNUSED) +{ + struct string_list list = STRING_LIST_INIT_DUP; ++ struct string_list_item *item; + + string_list_split(&list, out->buf + offset, '\n', -1); -+ for (size_t i = 0; i < list.nr; i++) { -+ if (strlen(list.items[i].string) > 0) -+ fprintf(stderr, "duplicate_output: %s\n", list.items[i].string); -+ } ++ for_each_string_list_item(item, &list) ++ fprintf(stderr, "duplicate_output: %s\n", item->string); + string_list_clear(&list, 0); +} + @@ t/t0061-run-command.sh: test_expect_success 'run_command runs in parallel with m + test_must_be_empty out && + test 4 = $(grep -c "duplicate_output: Hello" err) && + test 4 = $(grep -c "duplicate_output: World" err) && -+ sed "/duplicate_output/d" err > err1 && ++ sed "/duplicate_output/d" err >err1 && + test_cmp expect err1 +' + @@ t/t0061-run-command.sh: test_expect_success 'run_command runs in parallel with a + test_must_be_empty out && + test 4 = $(grep -c "duplicate_output: Hello" err) && + test 4 = $(grep -c "duplicate_output: World" err) && -+ sed "/duplicate_output/d" err > err1 && ++ sed "/duplicate_output/d" err >err1 && + test_cmp expect err1 +' + @@ t/t0061-run-command.sh: test_expect_success 'run_command runs in parallel with m + test_must_be_empty out && + test 4 = $(grep -c "duplicate_output: Hello" err) && + test 4 = $(grep -c "duplicate_output: World" err) && -+ sed "/duplicate_output/d" err > err1 && ++ sed "/duplicate_output/d" err >err1 && + test_cmp expect err1 +' + 2: d00a18dd84 = 2: 6ded5b6788 submodule: strbuf variable rename 3: dcda518922 = 3: 0c71cea8cd submodule: move status parsing into function 4: c6fc5ba13b ! 
4: 5c8cc93f9f submodule: refactor is_submodule_modified() @@ submodule.c: static int config_update_recurse_submodules = RECURSE_SUBMODULES_OF static int initialized_fetch_ref_tips; static struct oid_array ref_tips_before_fetch; static struct oid_array ref_tips_after_fetch; -+static const char *status_porcelain_start_error = -+ N_("could not run 'git status --porcelain=2' in submodule %s"); -+static const char *status_porcelain_fail_error = -+ N_("'git status --porcelain=2' failed in submodule %s"); ++#define STATUS_PORCELAIN_START_ERROR \ ++ N_("could not run 'git status --porcelain=2' in submodule %s") ++#define STATUS_PORCELAIN_FAIL_ERROR \ ++ N_("'git status --porcelain=2' failed in submodule %s") /* * Check if the .gitmodules file is unmerged. Parsing of the .gitmodules file @@ submodule.c: unsigned is_submodule_modified(const char *path, int ignore_untrack + prepare_status_porcelain(&cp, path, ignore_untracked); if (start_command(&cp)) - die(_("Could not run 'git status --porcelain=2' in submodule %s"), path); -+ die(_(status_porcelain_start_error), path); ++ die(_(STATUS_PORCELAIN_START_ERROR), path); fp = xfdopen(cp.out, "r"); while (strbuf_getwholeline(&buf, fp, '\n') != EOF) { @@ submodule.c: unsigned is_submodule_modified(const char *path, int ignore_untrack if (finish_command(&cp) && !ignore_cp_exit_code) - die(_("'git status --porcelain=2' failed in submodule %s"), path); -+ die(_(status_porcelain_fail_error), path); ++ die(_(STATUS_PORCELAIN_FAIL_ERROR), path); strbuf_release(&buf); return dirty_submodule; 5: 1ea8eae9c9 = 5: 6c2b62abc8 diff-lib: refactor out diff_change logic 6: 0d35fcc38d < -: ---------- diff-lib: refactor match_stat_with_submodule 7: fd1eec974d ! 
6: bb25dadbe5 diff-lib: parallelize run_diff_files for submodules @@ diff-lib.c: static int check_removed(const struct index_state *istate, const str + unsigned *ignore_untracked) { int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option); - struct diff_flags orig_flags; +- if (S_ISGITLINK(ce->ce_mode)) { +- struct diff_flags orig_flags = diffopt->flags; +- if (!diffopt->flags.override_submodule_config) +- set_diffopt_flags_from_submodule_config(diffopt, ce->name); +- if (diffopt->flags.ignore_submodules) +- changed = 0; +- else if (!diffopt->flags.ignore_dirty_submodules && +- (!changed || diffopt->flags.dirty_submodules)) ++ struct diff_flags orig_flags; + int defer = 0; - - if (!S_ISGITLINK(ce->ce_mode)) -- return changed; ++ ++ if (!S_ISGITLINK(ce->ce_mode)) + goto ret; - - orig_flags = diffopt->flags; - if (!diffopt->flags.override_submodule_config) -@@ diff-lib.c: static int match_stat_with_submodule(struct diff_options *diffopt, - goto cleanup; - } - if (!diffopt->flags.ignore_dirty_submodules && -- (!changed || diffopt->flags.dirty_submodules)) -- *dirty_submodule = is_submodule_modified(ce->name, ++ ++ orig_flags = diffopt->flags; ++ if (!diffopt->flags.override_submodule_config) ++ set_diffopt_flags_from_submodule_config(diffopt, ce->name); ++ if (diffopt->flags.ignore_submodules) { ++ changed = 0; ++ goto cleanup; ++ } ++ if (!diffopt->flags.ignore_dirty_submodules && + (!changed || diffopt->flags.dirty_submodules)) { + if (defer_submodule_status && *defer_submodule_status) { + defer = 1; + *ignore_untracked = diffopt->flags.ignore_untracked_in_submodules; + } else { -+ *dirty_submodule = is_submodule_modified(ce->name, - diffopt->flags.ignore_untracked_in_submodules); + *dirty_submodule = is_submodule_modified(ce->name, +- diffopt->flags.ignore_untracked_in_submodules); +- diffopt->flags = orig_flags; ++ diffopt->flags.ignore_untracked_in_submodules); + } -+ } - cleanup: - diffopt->flags = orig_flags; + } ++cleanup: ++ diffopt->flags = 
orig_flags; +ret: + if (defer_submodule_status) + *defer_submodule_status = defer; @@ diff-lib.c: int run_diff_files(struct rev_info *revs, unsigned int option) changed, istate, ce)) continue; } -+ if (submodules.nr > 0) { -+ int parallel_jobs; -+ if (git_config_get_int("submodule.diffjobs", &parallel_jobs)) ++ if (submodules.nr) { ++ unsigned long parallel_jobs; ++ struct string_list_item *item; ++ ++ if (git_config_get_ulong("submodule.diffjobs", &parallel_jobs)) + parallel_jobs = 1; + else if (!parallel_jobs) + parallel_jobs = online_cpus(); -+ else if (parallel_jobs < 0) -+ die(_("submodule.diffjobs cannot be negative")); + + if (get_submodules_status(&submodules, parallel_jobs)) + die(_("submodule status failed")); -+ for (size_t i = 0; i < submodules.nr; i++) { -+ struct submodule_status_util *util = submodules.items[i].util; ++ for_each_string_list_item(item, &submodules) { ++ struct submodule_status_util *util = item->util; + + if (diff_change_helper(&revs->diffopt, util->newmode, + util->dirty_submodule, util->changed, @@ submodule.c: int submodule_touches_in_range(struct repository *r, + int result; + + struct string_list *submodule_names; -+ -+ /* Pending statuses by OIDs */ -+ struct status_task **oid_status_tasks; -+ int oid_status_tasks_nr, oid_status_tasks_alloc; +}; + struct submodule_parallel_fetch { @@ submodule.c: unsigned is_submodule_modified(const char *path, int ignore_untrack + struct status_task *task = task_cb; + + sps->result = 1; -+ strbuf_addf(err, -+ _(status_porcelain_start_error), -+ task->path); ++ strbuf_addf(err, _(STATUS_PORCELAIN_START_ERROR), task->path); + return 0; +} + @@ submodule.c: unsigned is_submodule_modified(const char *path, int ignore_untrack + + if (retvalue) { + sps->result = 1; -+ strbuf_addf(err, -+ _(status_porcelain_fail_error), -+ task->path); ++ strbuf_addf(err, _(STATUS_PORCELAIN_FAIL_ERROR), task->path); + } + + parse_status_porcelain_strbuf(&task->out, -- 2.39.1.519.gcb327c4b5f-goog ^ permalink raw 
reply 
[flat|nested] 86+ messages in thread
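The way the v8 range-diff settles the number of jobs from submodule.diffjobs can be summarized in a standalone sketch. This is not git's code: `resolve_diff_jobs` is a hypothetical helper, `is_set` stands in for a successful config lookup (the diff treats a non-zero return from git_config_get_ulong() as "not configured"), and `ncpu` stands in for the value of online_cpus(). The branch order follows the range-diff: an unset key means one job, an explicit 0 means one job per CPU, and any other value is used as given. Because the value is now read as an unsigned long, the earlier "cannot be negative" check has no counterpart here.

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the branch structure in the range-diff.
 *   is_set: non-zero when submodule.diffjobs was found in the config
 *   value:  the configured number (ignored when is_set is zero)
 *   ncpu:   stand-in for online_cpus()
 */
static unsigned long resolve_diff_jobs(int is_set, unsigned long value,
				       unsigned long ncpu)
{
	assert(ncpu > 0); /* a machine always has at least one CPU */

	if (!is_set)
		return 1;    /* key absent: keep the serial behaviour */
	if (value == 0)
		return ncpu; /* 0 means "use every available CPU" */
	return value;        /* explicit job count */
}
```

For example, on a hypothetical four-CPU machine an unset key yields 1 job, a value of 0 yields 4, and a value of 3 yields 3.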
* Re: [PATCH v8 0/6] submodule: parallelize diff
  2023-02-09 0:02 ` [PATCH v8 0/6] " Calvin Wan
@ 2023-02-09 1:42   ` Ævar Arnfjörð Bjarmason
  2023-02-09 19:50   ` Junio C Hamano
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 86+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-02-09 1:42 UTC (permalink / raw)
  To: Calvin Wan; +Cc: git, chooglen, newren, jonathantanmy, phillip.wood123

On Thu, Feb 09 2023, Calvin Wan wrote:

> 6: 0d35fcc38d < -: ---------- diff-lib: refactor match_stat_with_submodule
> 7: fd1eec974d ! 6: bb25dadbe5 diff-lib: parallelize run_diff_files for submodules
> @@ diff-lib.c: static int check_removed(const struct index_state *istate, const str
> + unsigned *ignore_untracked)
> {
> int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option);
> - struct diff_flags orig_flags;
> +- if (S_ISGITLINK(ce->ce_mode)) {
> +- struct diff_flags orig_flags = diffopt->flags;
> +- if (!diffopt->flags.override_submodule_config)
> +- set_diffopt_flags_from_submodule_config(diffopt, ce->name);
> +- if (diffopt->flags.ignore_submodules)
> +- changed = 0;
> +- else if (!diffopt->flags.ignore_dirty_submodules &&
> +- (!changed || diffopt->flags.dirty_submodules))
> ++ struct diff_flags orig_flags;
> + int defer = 0;
> -
> - if (!S_ISGITLINK(ce->ce_mode))
> -- return changed;
> ++
> ++ if (!S_ISGITLINK(ce->ce_mode))
> + goto ret;
> -
> - orig_flags = diffopt->flags;
> - if (!diffopt->flags.override_submodule_config)
> -@@ diff-lib.c: static int match_stat_with_submodule(struct diff_options *diffopt,
> - goto cleanup;
> - }
> - if (!diffopt->flags.ignore_dirty_submodules &&
> -- (!changed || diffopt->flags.dirty_submodules))
> -- *dirty_submodule = is_submodule_modified(ce->name,
> ++
> ++ orig_flags = diffopt->flags;
> ++ if (!diffopt->flags.override_submodule_config)
> ++ set_diffopt_flags_from_submodule_config(diffopt, ce->name);
> ++ if (diffopt->flags.ignore_submodules) {
> ++ changed = 0;
> ++ goto cleanup;
> ++ }
> ++ if (!diffopt->flags.ignore_dirty_submodules &&
> + (!changed || diffopt->flags.dirty_submodules)) {
> + if (defer_submodule_status && *defer_submodule_status) {
> + defer = 1;
> + *ignore_untracked = diffopt->flags.ignore_untracked_in_submodules;
> + } else {
> -+ *dirty_submodule = is_submodule_modified(ce->name,
> - diffopt->flags.ignore_untracked_in_submodules);
> + *dirty_submodule = is_submodule_modified(ce->name,
> +- diffopt->flags.ignore_untracked_in_submodules);
> +- diffopt->flags = orig_flags;
> ++ diffopt->flags.ignore_untracked_in_submodules);
> + }
> -+ }
> - cleanup:
> - diffopt->flags = orig_flags;
> + }
> ++cleanup:
> ++ diffopt->flags = orig_flags;
> +ret:
> + if (defer_submodule_status)
> + *defer_submodule_status = defer;
> @@ diff-lib.c: int run_diff_files(struct rev_info *revs, unsigned int option)
> changed, istate, ce))

I think you dropped the 7/8 per my suggestion in [1]. I think this 6/6
is actually worse than the v6.

I.e. it seems you dropped the previous refactoring commit by squashing
the refactoring+functional change together.

What I was pointing out in [1] was that you don't need the refactoring,
and that both the change itself and the end-state is much easier to
look at and reason about as a result.

I.e. I think the diff in your 6/6 should just be what's after "it
becomes" in [1] (maybe with some pre-refactoring, e.g. we could add the
braces first or whatever).

But in case you strongly prefer the current end-state I think having
your previous refactoring prep would be better, because it would at
least split off some of the refactoring & functional change.

I haven't looked as deeply at this v8 as v7 for the rest, but from
skimming the range-diff it all looked good otherwise.

1. https://lore.kernel.org/git/230208.861qn01s4g.gmgdl@evledraar.gmail.com/

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v8 0/6] submodule: parallelize diff
  2023-02-09 0:02 ` [PATCH v8 0/6] " Calvin Wan
  2023-02-09 1:42   ` Ævar Arnfjörð Bjarmason
@ 2023-02-09 19:50   ` Junio C Hamano
  2023-02-09 21:52     ` Calvin Wan
  2023-02-09 20:50   ` Phillip Wood
  2023-03-02 21:52   ` [PATCH v9 " Calvin Wan
  3 siblings, 1 reply; 86+ messages in thread
From: Junio C Hamano @ 2023-02-09 19:50 UTC (permalink / raw)
  To: Calvin Wan; +Cc: git, avarab, chooglen, newren, jonathantanmy, phillip.wood123

Calvin Wan <calvinwan@google.com> writes:

> Original cover letter for context:
> https://lore.kernel.org/git/20221011232604.839941-1-calvinwan@google.com/

Thanks. I'll try to take a look at this today.

By the way, how are you driving send-email when sending a
multi-patch series with a cover letter? It seems that all
messages in this series including its cover are marked as if they
are replies to the cover letter of the previous round, which is a
bit harder to follow than making only [v8 0/6] as a reply to [v7 0/X]
and all [v8 n/6] (n > 0) to be replies to [v8 0/6].

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v8 0/6] submodule: parallelize diff
  2023-02-09 19:50 ` Junio C Hamano
@ 2023-02-09 21:52   ` Calvin Wan
  2023-02-09 22:25     ` Junio C Hamano
  2023-02-10 13:24     ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 86+ messages in thread
From: Calvin Wan @ 2023-02-09 21:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, avarab, chooglen, newren, jonathantanmy, phillip.wood123

> By the way, how are you driving send-email when sending a
> multi-patch series with a cover letter? It seems that all
> messages in this series including its cover are marked as if they
> are replies to the cover letter of the previous round, which is a
> bit harder to follow than making only [v8 0/6] as a reply to [v7 0/X]
> and all [v8 n/6] (n > 0) to be replies to [v8 0/6].

I'll do that from now on -- didn't realize that makes it harder to follow

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v8 0/6] submodule: parallelize diff
  2023-02-09 21:52 ` Calvin Wan
@ 2023-02-09 22:25   ` Junio C Hamano
  2023-02-10 13:24   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 86+ messages in thread
From: Junio C Hamano @ 2023-02-09 22:25 UTC (permalink / raw)
  To: Calvin Wan; +Cc: git, avarab, chooglen, newren, jonathantanmy, phillip.wood123

Calvin Wan <calvinwan@google.com> writes:

>> By the way, how are you driving send-email when sending a
>> multi-patch series with a cover letter? It seems that all
>> messages in this series including its cover are marked as if they
>> are replies to the cover letter of the previous round, which is a
>> bit harder to follow than making only [v8 0/6] as a reply to [v7 0/X]
>> and all [v8 n/6] (n > 0) to be replies to [v8 0/6].
>
> I'll do that from now on -- didn't realize that makes it harder to follow

"a bit harder" may even have been an exaggeration. It is just being
different from how other topics by many other people are formatted.

Thanks.

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v8 0/6] submodule: parallelize diff
  2023-02-09 21:52 ` Calvin Wan
  2023-02-09 22:25   ` Junio C Hamano
@ 2023-02-10 13:24   ` Ævar Arnfjörð Bjarmason
  2023-02-10 17:42     ` Junio C Hamano
  1 sibling, 1 reply; 86+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-02-10 13:24 UTC (permalink / raw)
  To: Calvin Wan
  Cc: Junio C Hamano, git, chooglen, newren, jonathantanmy, phillip.wood123

On Thu, Feb 09 2023, Calvin Wan wrote:

>> By the way, how are you driving send-email when sending a
>> multi-patch series with a cover letter? It seems that all
>> messages in this series including its cover are marked as if they
>> are replies to the cover letter of the previous round, which is a
>> bit harder to follow than making only [v8 0/6] as a reply to [v7 0/X]
>> and all [v8 n/6] (n > 0) to be replies to [v8 0/6].
>
> I'll do that from now on -- didn't realize that makes it harder to follow

Welcome to the club :)

This came up before when I'd been sending mails like this for years,
without realizing the difference:
https://lore.kernel.org/git/nycvar.QRO.7.76.6.2103191540330.57@tvgsbejvaqbjf.bet/
& https://lore.kernel.org/git/xmqqr1k9k2w7.fsf@gitster.g/

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v8 0/6] submodule: parallelize diff
  2023-02-10 13:24 ` Ævar Arnfjörð Bjarmason
@ 2023-02-10 17:42   ` Junio C Hamano
  0 siblings, 0 replies; 86+ messages in thread
From: Junio C Hamano @ 2023-02-10 17:42 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Calvin Wan, git, chooglen, newren, jonathantanmy, phillip.wood123

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Welcome to the club :)
>
> This came up before when I'd been sending mails like this for years,
> without realizing the difference:
> https://lore.kernel.org/git/nycvar.QRO.7.76.6.2103191540330.57@tvgsbejvaqbjf.bet/
> & https://lore.kernel.org/git/xmqqr1k9k2w7.fsf@gitster.g/

The organization makes it easier to identify the cover letter,
mechanically from the thread structure without relying on the
subject line [*], and that is one of the things that the procedure
to prepare the "What's cooking" report needs to do.

    Side note: When the "What's cooking" report is updated, it
    knows individual commits on a topic, and the message ID of the
    patch for each of these commits (they are recorded in
    refs/notes/amlog). But the message ID of the cover letter is
    not recorded anywhere because it does not become any commit, so
    it looks at these messages to find references and/or
    in-reply-to. A flat "everything including cover is reply to
    previous cover" organization would not help to find the cover
    of this iteration at all.

In Documentation/SubmittingPatches, we give an unnecessary either-or
recommendation. We should clearly spell it out instead, perhaps
something like this.

 Documentation/SubmittingPatches | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git c/Documentation/SubmittingPatches w/Documentation/SubmittingPatches
index 927f7329a5..af7f2a4045 100644
--- c/Documentation/SubmittingPatches
+++ w/Documentation/SubmittingPatches
@@ -346,8 +346,9 @@ your code. For this reason, each patch should be submitted
 
 Multiple related patches should be grouped into their own e-mail
 thread to help readers find all parts of the series. To that end,
-send them as replies to either an additional "cover letter" message
-(see below), the first patch, or the respective preceding patch.
+send them as replies to an additional "cover letter" message
+(see below), which should be a reply to the "cover letter" of
+the previous iteration.
 
 If your log message (including your name on the `Signed-off-by` trailer)
 is not writable in ASCII, make sure that

^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v8 0/6] submodule: parallelize diff 2023-02-09 0:02 ` [PATCH v8 0/6] " Calvin Wan 2023-02-09 1:42 ` Ævar Arnfjörð Bjarmason 2023-02-09 19:50 ` Junio C Hamano @ 2023-02-09 20:50 ` Phillip Wood 2023-03-02 21:52 ` [PATCH v9 " Calvin Wan 3 siblings, 0 replies; 86+ messages in thread From: Phillip Wood @ 2023-02-09 20:50 UTC (permalink / raw) To: Calvin Wan, git; +Cc: avarab, chooglen, newren, jonathantanmy, phillip.wood123 Hi Calvin On 09/02/2023 00:02, Calvin Wan wrote: > Original cover letter for context: > https://lore.kernel.org/git/20221011232604.839941-1-calvinwan@google.com/ > > This reroll contains stylistic changes suggested by Avar and Phillip, > and includes a range-diff below. > > Calvin Wan (6): > run-command: add duplicate_output_fn to run_processes_parallel_opts > submodule: strbuf variable rename > submodule: move status parsing into function > submodule: refactor is_submodule_modified() > diff-lib: refactor out diff_change logic > diff-lib: parallelize run_diff_files for submodules > > Documentation/config/submodule.txt | 12 ++ > diff-lib.c | 133 +++++++++++---- > run-command.c | 16 +- > run-command.h | 25 +++ > submodule.c | 266 ++++++++++++++++++++++++----- > submodule.h | 9 + > t/helper/test-run-command.c | 20 +++ > t/t0061-run-command.sh | 39 +++++ > t/t4027-diff-submodule.sh | 31 ++++ > t/t7506-status-submodule.sh | 25 +++ > 10 files changed, 497 insertions(+), 79 deletions(-) > > Range-diff against v7: > 6: 0d35fcc38d < -: ---------- diff-lib: refactor match_stat_with_submodule > 7: fd1eec974d ! 
6: bb25dadbe5 diff-lib: parallelize run_diff_files for submodules > @@ diff-lib.c: static int check_removed(const struct index_state *istate, const str > + unsigned *ignore_untracked) > { > int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option); > - struct diff_flags orig_flags; > +- if (S_ISGITLINK(ce->ce_mode)) { > +- struct diff_flags orig_flags = diffopt->flags; > +- if (!diffopt->flags.override_submodule_config) > +- set_diffopt_flags_from_submodule_config(diffopt, ce->name); > +- if (diffopt->flags.ignore_submodules) > +- changed = 0; > +- else if (!diffopt->flags.ignore_dirty_submodules && > +- (!changed || diffopt->flags.dirty_submodules)) > ++ struct diff_flags orig_flags; > + int defer = 0; > - > - if (!S_ISGITLINK(ce->ce_mode)) > -- return changed; > ++ > ++ if (!S_ISGITLINK(ce->ce_mode)) > + goto ret; > - > - orig_flags = diffopt->flags; > - if (!diffopt->flags.override_submodule_config) > -@@ diff-lib.c: static int match_stat_with_submodule(struct diff_options *diffopt, > - goto cleanup; > - } > - if (!diffopt->flags.ignore_dirty_submodules && > -- (!changed || diffopt->flags.dirty_submodules)) > -- *dirty_submodule = is_submodule_modified(ce->name, > ++ > ++ orig_flags = diffopt->flags; > ++ if (!diffopt->flags.override_submodule_config) > ++ set_diffopt_flags_from_submodule_config(diffopt, ce->name); > ++ if (diffopt->flags.ignore_submodules) { > ++ changed = 0; > ++ goto cleanup; > ++ } > ++ if (!diffopt->flags.ignore_dirty_submodules && > + (!changed || diffopt->flags.dirty_submodules)) { > + if (defer_submodule_status && *defer_submodule_status) { > + defer = 1; > + *ignore_untracked = diffopt->flags.ignore_untracked_in_submodules; > + } else { > -+ *dirty_submodule = is_submodule_modified(ce->name, > - diffopt->flags.ignore_untracked_in_submodules); > + *dirty_submodule = is_submodule_modified(ce->name, > +- diffopt->flags.ignore_untracked_in_submodules); > +- diffopt->flags = orig_flags; > ++ 
diffopt->flags.ignore_untracked_in_submodules); > + } > -+ } > - cleanup: > - diffopt->flags = orig_flags; > + } > ++cleanup: > ++ diffopt->flags = orig_flags; > +ret: > + if (defer_submodule_status) The idea behind the suggestion to drop the previous patch from the last version was to stop refactoring the if block and get away from having these labels. Can't you just add the "if (defer_submodule_status && ...)" into the existing code? Best Wishes Phillip > + *defer_submodule_status = defer; > @@ diff-lib.c: int run_diff_files(struct rev_info *revs, unsigned int option) > changed, istate, ce)) > continue; > } > -+ if (submodules.nr > 0) { > -+ int parallel_jobs; > -+ if (git_config_get_int("submodule.diffjobs", &parallel_jobs)) > ++ if (submodules.nr) { > ++ unsigned long parallel_jobs; > ++ struct string_list_item *item; > ++ > ++ if (git_config_get_ulong("submodule.diffjobs", &parallel_jobs)) > + parallel_jobs = 1; > + else if (!parallel_jobs) > + parallel_jobs = online_cpus(); > -+ else if (parallel_jobs < 0) > -+ die(_("submodule.diffjobs cannot be negative")); > + > + if (get_submodules_status(&submodules, parallel_jobs)) > + die(_("submodule status failed")); > -+ for (size_t i = 0; i < submodules.nr; i++) { > -+ struct submodule_status_util *util = submodules.items[i].util; > ++ for_each_string_list_item(item, &submodules) { > ++ struct submodule_status_util *util = item->util; > + > + if (diff_change_helper(&revs->diffopt, util->newmode, > + util->dirty_submodule, util->changed, > @@ submodule.c: int submodule_touches_in_range(struct repository *r, > + int result; > + > + struct string_list *submodule_names; > -+ > -+ /* Pending statuses by OIDs */ > -+ struct status_task **oid_status_tasks; > -+ int oid_status_tasks_nr, oid_status_tasks_alloc; > +}; > + > struct submodule_parallel_fetch { > @@ submodule.c: unsigned is_submodule_modified(const char *path, int ignore_untrack > + struct status_task *task = task_cb; > + > + sps->result = 1; > -+ strbuf_addf(err, > -+ 
_(status_porcelain_start_error), > -+ task->path); > ++ strbuf_addf(err, _(STATUS_PORCELAIN_START_ERROR), task->path); > + return 0; > +} > + > @@ submodule.c: unsigned is_submodule_modified(const char *path, int ignore_untrack > + > + if (retvalue) { > + sps->result = 1; > -+ strbuf_addf(err, > -+ _(status_porcelain_fail_error), > -+ task->path); > ++ strbuf_addf(err, _(STATUS_PORCELAIN_FAIL_ERROR), task->path); > + } > + > + parse_status_porcelain_strbuf(&task->out, ^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v9 0/6] submodule: parallelize diff 2023-02-09 0:02 ` [PATCH v8 0/6] " Calvin Wan ` (2 preceding siblings ...) 2023-02-09 20:50 ` Phillip Wood @ 2023-03-02 21:52 ` Calvin Wan 2023-03-02 22:02 ` [PATCH v9 1/6] run-command: add on_stderr_output_fn to run_processes_parallel_opts Calvin Wan ` (5 more replies) 3 siblings, 6 replies; 86+ messages in thread From: Calvin Wan @ 2023-03-02 21:52 UTC (permalink / raw) To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy, phillip.wood123 Original cover letter for context: https://lore.kernel.org/git/20221011232604.839941-1-calvinwan@google.com/ I appreciate all the reviewers that have stuck through this entire series! Hoping this can be the final reroll as I believe I've addressed all feedback and personally am happy with the state of the patches. Changes from v8 - renamed duplicate_output_fn to on_stderr_output_fn - renamed diff_change_helper() to record_file_diff() and added comments - reworded commit message for patch 5 - removed the refactoring of match_stat_with_submodule() - inlined parse_status_porcelain_strbuf() - fixed stylistic nits and cleaned up unnecessary variables and logic Calvin Wan (6): run-command: add on_stderr_output_fn to run_processes_parallel_opts submodule: rename strbuf variable submodule: move status parsing into function submodule: refactor is_submodule_modified() diff-lib: refactor out diff_change logic diff-lib: parallelize run_diff_files for submodules Documentation/config/submodule.txt | 12 ++ diff-lib.c | 123 +++++++++++--- run-command.c | 16 +- run-command.h | 25 +++ submodule.c | 254 +++++++++++++++++++++++------ submodule.h | 9 + t/helper/test-run-command.c | 20 +++ t/t0061-run-command.sh | 39 +++++ t/t4027-diff-submodule.sh | 31 ++++ t/t7506-status-submodule.sh | 25 +++ 10 files changed, 478 insertions(+), 76 deletions(-) Range-diff against v8: 1: 5d51250c67 ! 
1: 49749ae3a5 run-command: add duplicate_output_fn to run_processes_parallel_opts @@ Metadata Author: Calvin Wan <calvinwan@google.com> ## Commit message ## - run-command: add duplicate_output_fn to run_processes_parallel_opts + run-command: add on_stderr_output_fn to run_processes_parallel_opts ## run-command.c ## @@ run-command.c: static void pp_init(struct parallel_processes *pp, @@ run-command.c: static void pp_init(struct parallel_processes *pp, BUG("you need to specify a get_next_task function"); + if (opts->ungroup) { -+ if (opts->duplicate_output) -+ BUG("duplicate_output and ungroup are incompatible with each other"); ++ if (opts->on_stderr_output) ++ BUG("on_stderr_output and ungroup are incompatible with each other"); + } + CALLOC_ARRAY(pp->children, n); @@ run-command.c: static void pp_buffer_stderr(struct parallel_processes *pp, + } else if (n < 0) { if (errno != EAGAIN) die_errno("read"); -+ } else if (opts->duplicate_output) { -+ opts->duplicate_output(&pp->children[i].err, ++ } else if (opts->on_stderr_output) { ++ opts->on_stderr_output(&pp->children[i].err, + pp->children[i].err.len - n, + opts->data, pp->children[i].data); + } @@ run-command.h: typedef int (*start_failure_fn)(struct strbuf *out, +/** + * This callback is called whenever output from a child process is buffered -+ * ++ * + * See run_processes_parallel() below for a discussion of the "struct + * strbuf *out" parameter. -+ * ++ * + * The offset refers to the number of bytes originally in "out" before + * the output from the child process was buffered. 
Therefore, the buffer + * range, "out + buf" to the end of "out", would contain the buffer of @@ run-command.h: typedef int (*start_failure_fn)(struct strbuf *out, + * + * This function is incompatible with "ungroup" + */ -+typedef void (*duplicate_output_fn)(struct strbuf *out, size_t offset, ++typedef void (*on_stderr_output_fn)(struct strbuf *out, size_t offset, + void *pp_cb, void *pp_task_cb); + /** @@ run-command.h: struct run_process_parallel_opts start_failure_fn start_failure; + /** -+ * duplicate_output: See duplicate_output_fn() above. Unless you need ++ * on_stderr_output: See on_stderr_output_fn() above. Unless you need + * to capture output from child processes, leave this as NULL. + */ -+ duplicate_output_fn duplicate_output; ++ on_stderr_output_fn on_stderr_output; + /** * task_finished: See task_finished_fn() above. This can be @@ t/helper/test-run-command.c: static int no_job(struct child_process *cp, return 0; } -+static void duplicate_output(struct strbuf *out, ++static void on_stderr_output(struct strbuf *out, + size_t offset, + void *pp_cb UNUSED, + void *pp_task_cb UNUSED) @@ t/helper/test-run-command.c: static int no_job(struct child_process *cp, + + string_list_split(&list, out->buf + offset, '\n', -1); + for_each_string_list_item(item, &list) -+ fprintf(stderr, "duplicate_output: %s\n", item->string); ++ fprintf(stderr, "on_stderr_output: %s\n", item->string); + string_list_clear(&list, 0); +} + @@ t/helper/test-run-command.c: int cmd__run_command(int argc, const char **argv) opts.ungroup = 1; } -+ if (!strcmp(argv[1], "--duplicate-output")) { ++ if (!strcmp(argv[1], "--on-stderr-output")) { + argv += 1; + argc -= 1; -+ opts.duplicate_output = duplicate_output; ++ opts.on_stderr_output = on_stderr_output; + } + jobs = atoi(argv[2]); @@ t/t0061-run-command.sh: test_expect_success 'run_command runs in parallel with m test_cmp expect actual ' -+test_expect_success 'run_command runs in parallel with more jobs available than tasks 
--duplicate-output' ' -+ test-tool run-command --duplicate-output run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && ++test_expect_success 'run_command runs in parallel with more jobs available than tasks --on-stderr-output' ' ++ test-tool run-command --on-stderr-output run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && -+ test 4 = $(grep -c "duplicate_output: Hello" err) && -+ test 4 = $(grep -c "duplicate_output: World" err) && -+ sed "/duplicate_output/d" err >err1 && ++ test 4 = $(grep -c "on_stderr_output: Hello" err) && ++ test 4 = $(grep -c "on_stderr_output: World" err) && ++ sed "/on_stderr_output/d" err >err1 && + test_cmp expect err1 +' + @@ t/t0061-run-command.sh: test_expect_success 'run_command runs in parallel with a test_cmp expect actual ' -+test_expect_success 'run_command runs in parallel with as many jobs as tasks --duplicate-output' ' -+ test-tool run-command --duplicate-output run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && ++test_expect_success 'run_command runs in parallel with as many jobs as tasks --on-stderr-output' ' ++ test-tool run-command --on-stderr-output run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && -+ test 4 = $(grep -c "duplicate_output: Hello" err) && -+ test 4 = $(grep -c "duplicate_output: World" err) && -+ sed "/duplicate_output/d" err >err1 && ++ test 4 = $(grep -c "on_stderr_output: Hello" err) && ++ test 4 = $(grep -c "on_stderr_output: World" err) && ++ sed "/on_stderr_output/d" err >err1 && + test_cmp expect err1 +' + @@ t/t0061-run-command.sh: test_expect_success 'run_command runs in parallel with m test_cmp expect actual ' -+test_expect_success 'run_command runs in parallel with more tasks than jobs available --duplicate-output' ' -+ test-tool run-command --duplicate-output run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && 
++test_expect_success 'run_command runs in parallel with more tasks than jobs available --on-stderr-output' ' ++ test-tool run-command --on-stderr-output run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && -+ test 4 = $(grep -c "duplicate_output: Hello" err) && -+ test 4 = $(grep -c "duplicate_output: World" err) && -+ sed "/duplicate_output/d" err >err1 && ++ test 4 = $(grep -c "on_stderr_output: Hello" err) && ++ test 4 = $(grep -c "on_stderr_output: World" err) && ++ sed "/on_stderr_output/d" err >err1 && + test_cmp expect err1 +' + @@ t/t0061-run-command.sh: test_expect_success 'run_command is asked to abort grace test_cmp expect actual ' -+test_expect_success 'run_command is asked to abort gracefully --duplicate-output' ' -+ test-tool run-command --duplicate-output run-command-abort 3 false >out 2>err && ++test_expect_success 'run_command is asked to abort gracefully --on-stderr-output' ' ++ test-tool run-command --on-stderr-output run-command-abort 3 false >out 2>err && + test_must_be_empty out && + test_cmp expect err +' @@ t/t0061-run-command.sh: test_expect_success 'run_command outputs ' ' test_cmp expect actual ' -+test_expect_success 'run_command outputs --duplicate-output' ' -+ test-tool run-command --duplicate-output run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && ++test_expect_success 'run_command outputs --on-stderr-output' ' ++ test-tool run-command --on-stderr-output run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test_cmp expect err +' 2: 6ded5b6788 ! 
2: 6c62e670f9 submodule: strbuf variable rename @@ Metadata Author: Calvin Wan <calvinwan@google.com> ## Commit message ## - submodule: strbuf variable rename + submodule: rename strbuf variable ## submodule.c ## @@ submodule.c: unsigned is_submodule_modified(const char *path, int ignore_untracked) 3: 0c71cea8cd = 3: 24e02f2a24 submodule: move status parsing into function 4: 5c8cc93f9f = 4: 86c1f734a0 submodule: refactor is_submodule_modified() 5: 6c2b62abc8 ! 5: 811a1fee55 diff-lib: refactor out diff_change logic @@ diff-lib.c: static int match_stat_with_submodule(struct diff_options *diffopt, return changed; } -+static int diff_change_helper(struct diff_options *options, -+ unsigned newmode, unsigned dirty_submodule, -+ int changed, struct index_state *istate, -+ struct cache_entry *ce) ++/** ++ * Records diff_change if there is a change in the entry from run_diff_files. ++ * If there is no change, then the cache entry is marked CE_UPTODATE and ++ * CE_FSMONITOR_VALID. If there is no change and the find_copies_harder flag ++ * is not set, then the function returns early. ++ */ ++static void record_file_diff(struct diff_options *options, unsigned newmode, ++ unsigned dirty_submodule, int changed, ++ struct index_state *istate, ++ struct cache_entry *ce) +{ + unsigned int oldmode; + const struct object_id *old_oid, *new_oid; @@ diff-lib.c: static int match_stat_with_submodule(struct diff_options *diffopt, + ce_mark_uptodate(ce); + mark_fsmonitor_valid(istate, ce); + if (!options->flags.find_copies_harder) -+ return 1; ++ return; + } + oldmode = ce->ce_mode; + old_oid = &ce->oid; + new_oid = changed ? 
null_oid() : &ce->oid; -+ diff_change(options, oldmode, newmode, -+ old_oid, new_oid, -+ !is_null_oid(old_oid), -+ !is_null_oid(new_oid), -+ ce->name, 0, dirty_submodule); -+ return 0; ++ diff_change(options, oldmode, newmode, old_oid, new_oid, ++ !is_null_oid(old_oid), !is_null_oid(new_oid), ++ ce->name, 0, dirty_submodule); +} + int run_diff_files(struct rev_info *revs, unsigned int option) @@ diff-lib.c: int run_diff_files(struct rev_info *revs, unsigned int option) - !is_null_oid(new_oid), - ce->name, 0, dirty_submodule); - -+ if (diff_change_helper(&revs->diffopt, newmode, dirty_submodule, -+ changed, istate, ce)) -+ continue; ++ record_file_diff(&revs->diffopt, newmode, dirty_submodule, ++ changed, istate, ce); } diffcore_std(&revs->diffopt); diff_flush(&revs->diffopt); 6: bb25dadbe5 ! 6: 17010fc179 diff-lib: parallelize run_diff_files for submodules @@ diff-lib.c: static int check_removed(const struct index_state *istate, const str + unsigned *ignore_untracked) { int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option); -- if (S_ISGITLINK(ce->ce_mode)) { -- struct diff_flags orig_flags = diffopt->flags; -- if (!diffopt->flags.override_submodule_config) -- set_diffopt_flags_from_submodule_config(diffopt, ce->name); ++ int defer = 0; ++ + if (S_ISGITLINK(ce->ce_mode)) { + struct diff_flags orig_flags = diffopt->flags; + if (!diffopt->flags.override_submodule_config) + set_diffopt_flags_from_submodule_config(diffopt, ce->name); - if (diffopt->flags.ignore_submodules) -- changed = 0; ++ if (diffopt->flags.ignore_submodules) { + changed = 0; - else if (!diffopt->flags.ignore_dirty_submodules && - (!changed || diffopt->flags.dirty_submodules)) -+ struct diff_flags orig_flags; -+ int defer = 0; -+ -+ if (!S_ISGITLINK(ce->ce_mode)) -+ goto ret; -+ -+ orig_flags = diffopt->flags; -+ if (!diffopt->flags.override_submodule_config) -+ set_diffopt_flags_from_submodule_config(diffopt, ce->name); -+ if (diffopt->flags.ignore_submodules) { -+ changed = 0; -+ 
goto cleanup; -+ } -+ if (!diffopt->flags.ignore_dirty_submodules && -+ (!changed || diffopt->flags.dirty_submodules)) { -+ if (defer_submodule_status && *defer_submodule_status) { -+ defer = 1; -+ *ignore_untracked = diffopt->flags.ignore_untracked_in_submodules; -+ } else { - *dirty_submodule = is_submodule_modified(ce->name, +- *dirty_submodule = is_submodule_modified(ce->name, - diffopt->flags.ignore_untracked_in_submodules); -- diffopt->flags = orig_flags; ++ } else if (!diffopt->flags.ignore_dirty_submodules && ++ (!changed || diffopt->flags.dirty_submodules)) { ++ if (defer_submodule_status && *defer_submodule_status) { ++ defer = 1; ++ *ignore_untracked = diffopt->flags.ignore_untracked_in_submodules; ++ } else { ++ *dirty_submodule = is_submodule_modified(ce->name, + diffopt->flags.ignore_untracked_in_submodules); ++ } + } + diffopt->flags = orig_flags; } -+cleanup: -+ diffopt->flags = orig_flags; -+ret: ++ + if (defer_submodule_status) + *defer_submodule_status = defer; return changed; @@ diff-lib.c: int run_diff_files(struct rev_info *revs, unsigned int option) diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/"); +@@ diff-lib.c: int run_diff_files(struct rev_info *revs, unsigned int option) + unsigned int newmode; + struct cache_entry *ce = istate->cache[i]; + int changed; +- unsigned dirty_submodule = 0; ++ int defer_submodule_status = 1; + + if (diff_can_quit_early(&revs->diffopt)) + break; @@ diff-lib.c: int run_diff_files(struct rev_info *revs, unsigned int option) newmode = ce->ce_mode; } else { struct stat st; + unsigned ignore_untracked = 0; -+ int defer_submodule_status = 1; changed = check_removed(istate, ce, &st); if (changed) { @@ diff-lib.c: int run_diff_files(struct rev_info *revs, unsigned int option) changed = match_stat_with_submodule(&revs->diffopt, ce, &st, - ce_option, &dirty_submodule); -+ ce_option, &dirty_submodule, ++ ce_option, NULL, + &defer_submodule_status, + &ignore_untracked); newmode = ce_mode_from_stat(ce, st.st_mode); @@ 
diff-lib.c: int run_diff_files(struct rev_info *revs, unsigned int option) + } } - if (diff_change_helper(&revs->diffopt, newmode, dirty_submodule, - changed, istate, ce)) - continue; - } +- record_file_diff(&revs->diffopt, newmode, dirty_submodule, +- changed, istate, ce); ++ if (!defer_submodule_status) ++ record_file_diff(&revs->diffopt, newmode, 0, ++ changed,istate, ce); ++ } + if (submodules.nr) { + unsigned long parallel_jobs; + struct string_list_item *item; @@ diff-lib.c: int run_diff_files(struct rev_info *revs, unsigned int option) + for_each_string_list_item(item, &submodules) { + struct submodule_status_util *util = item->util; + -+ if (diff_change_helper(&revs->diffopt, util->newmode, -+ util->dirty_submodule, util->changed, -+ istate, util->ce)) -+ continue; ++ record_file_diff(&revs->diffopt, util->newmode, ++ util->dirty_submodule, util->changed, ++ istate, util->ce); + } -+ } + } + string_list_clear(&submodules, 1); diffcore_std(&revs->diffopt); diff_flush(&revs->diffopt); @@ submodule.c: struct fetch_task { /** * When a submodule is not defined in .gitmodules, we cannot access it * via the regular submodule-config. 
Create a fake submodule, which we can -@@ submodule.c: static int parse_status_porcelain(char *str, size_t len, - return 0; - } - -+static void parse_status_porcelain_strbuf(struct strbuf *buf, -+ unsigned *dirty_submodule, -+ int ignore_untracked) -+{ -+ struct string_list list = STRING_LIST_INIT_DUP; -+ struct string_list_item *item; -+ -+ string_list_split(&list, buf->buf, '\n', -1); -+ -+ for_each_string_list_item(item, &list) { -+ if (parse_status_porcelain(item->string, -+ strlen(item->string), -+ dirty_submodule, -+ ignore_untracked)) -+ break; -+ } -+ string_list_clear(&list, 0); -+} -+ - unsigned is_submodule_modified(const char *path, int ignore_untracked) - { - struct child_process cp = CHILD_PROCESS_INIT; @@ submodule.c: unsigned is_submodule_modified(const char *path, int ignore_untracked) return dirty_submodule; } @@ submodule.c: unsigned is_submodule_modified(const char *path, int ignore_untrack + return 0; +} + -+static void status_duplicate_output(struct strbuf *out, ++static void status_on_stderr_output(struct strbuf *out, + size_t offset, + void *cb, void *task_cb) +{ @@ submodule.c: unsigned is_submodule_modified(const char *path, int ignore_untrack + struct string_list_item *it = + string_list_lookup(sps->submodule_names, task->path); + struct submodule_status_util *util = it->util; ++ struct string_list list = STRING_LIST_INIT_DUP; ++ struct string_list_item *item; + + if (retvalue) { + sps->result = 1; + strbuf_addf(err, _(STATUS_PORCELAIN_FAIL_ERROR), task->path); + } + -+ parse_status_porcelain_strbuf(&task->out, -+ &util->dirty_submodule, -+ util->ignore_untracked); -+ ++ string_list_split(&list, task->out.buf, '\n', -1); ++ for_each_string_list_item(item, &list) { ++ if (parse_status_porcelain(item->string, ++ strlen(item->string), ++ &util->dirty_submodule, ++ util->ignore_untracked)) ++ break; ++ } ++ string_list_clear(&list, 0); + strbuf_release(&task->out); + free(task); + @@ submodule.c: unsigned is_submodule_modified(const char 
*path, int ignore_untrack + + .get_next_task = get_next_submodule_status, + .start_failure = status_start_failure, -+ .duplicate_output = status_duplicate_output, ++ .on_stderr_output = status_on_stderr_output, + .task_finished = status_finish, + .data = &sps, + }; -- 2.40.0.rc0.216.gc4246ad0f0-goog ^ permalink raw reply [flat|nested] 86+ messages in thread
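Editorial sketch (not part of the series): the range-diff above shows how run_diff_files() picks the number of parallel jobs from submodule.diffjobs. When the setting is absent the code stays at one job, a value of 0 means "use online_cpus()", and any other value is taken as given. The helper below restates that selection in isolation. The name resolve_diff_jobs and its parameters are hypothetical; "is_set" and "cpus" stand in for the result of git_config_get_ulong() and online_cpus() so that the snippet compiles without Git's headers.

```c
#include <stddef.h>

/*
 * Hypothetical helper mirroring the job-count selection in the
 * range-diff for run_diff_files():
 *   - setting absent   -> 1 job (serial behaviour)
 *   - setting is 0     -> one job per online CPU
 *   - any other value  -> that many jobs
 * "is_set" models "git_config_get_ulong() found the key" and "cpus"
 * models the return value of online_cpus(); both are assumptions made
 * for illustration only.
 */
static unsigned long resolve_diff_jobs(int is_set, unsigned long configured,
				       unsigned long cpus)
{
	if (!is_set)
		return 1;
	if (!configured)
		return cpus;
	return configured;
}
```

Because v9 reads the value as an unsigned long, the separate "cannot be negative" check from v8 is no longer present in the range-diff; the sketch follows the v9 shape.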
* [PATCH v9 1/6] run-command: add on_stderr_output_fn to run_processes_parallel_opts 2023-03-02 21:52 ` [PATCH v9 " Calvin Wan @ 2023-03-02 22:02 ` Calvin Wan 2023-03-02 22:02 ` [PATCH v9 2/6] submodule: rename strbuf variable Calvin Wan ` (4 subsequent siblings) 5 siblings, 0 replies; 86+ messages in thread From: Calvin Wan @ 2023-03-02 22:02 UTC (permalink / raw) To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy, phillip.wood123 Add on_stderr_output_fn as an optionally set function in run_process_parallel_opts. If set, output from each child process is copied and passed to the callback function whenever output from the child process is buffered to allow for separate parsing. Fix two items in pp_buffer_stderr: * strbuf_read_once returns a ssize_t but the variable it is set to is an int so fix that. * Add missing brackets to "else if" statement The ungroup/on_stderr_output incompatibility check is nested to prepare for future incompatible modes with ungroup. Signed-off-by: Calvin Wan <calvinwan@google.com> --- run-command.c | 16 ++++++++++++--- run-command.h | 25 ++++++++++++++++++++++++ t/helper/test-run-command.c | 20 +++++++++++++++++++ t/t0061-run-command.sh | 39 +++++++++++++++++++++++++++++++++++++ 4 files changed, 97 insertions(+), 3 deletions(-) diff --git a/run-command.c b/run-command.c index 756f1839aa..7eed4e98c2 100644 --- a/run-command.c +++ b/run-command.c @@ -1526,6 +1526,11 @@ static void pp_init(struct parallel_processes *pp, if (!opts->get_next_task) BUG("you need to specify a get_next_task function"); + if (opts->ungroup) { + if (opts->on_stderr_output) + BUG("on_stderr_output and ungroup are incompatible with each other"); + } + CALLOC_ARRAY(pp->children, n); if (!opts->ungroup) CALLOC_ARRAY(pp->pfd, n); @@ -1645,14 +1650,19 @@ static void pp_buffer_stderr(struct parallel_processes *pp, for (size_t i = 0; i < opts->processes; i++) { if (pp->children[i].state == GIT_CP_WORKING && pp->pfd[i].revents & (POLLIN | POLLHUP)) { - int n 
= strbuf_read_once(&pp->children[i].err, - pp->children[i].process.err, 0); + ssize_t n = strbuf_read_once(&pp->children[i].err, + pp->children[i].process.err, 0); if (n == 0) { close(pp->children[i].process.err); pp->children[i].state = GIT_CP_WAIT_CLEANUP; - } else if (n < 0) + } else if (n < 0) { if (errno != EAGAIN) die_errno("read"); + } else if (opts->on_stderr_output) { + opts->on_stderr_output(&pp->children[i].err, + pp->children[i].err.len - n, + opts->data, pp->children[i].data); + } } } } diff --git a/run-command.h b/run-command.h index 072db56a4d..8f08e41fae 100644 --- a/run-command.h +++ b/run-command.h @@ -408,6 +408,25 @@ typedef int (*start_failure_fn)(struct strbuf *out, void *pp_cb, void *pp_task_cb); +/** + * This callback is called whenever output from a child process is buffered + * + * See run_processes_parallel() below for a discussion of the "struct + * strbuf *out" parameter. + * + * The offset refers to the number of bytes originally in "out" before + * the output from the child process was buffered. Therefore, the buffer + * range, "out + buf" to the end of "out", would contain the buffer of + * the child process output. + * + * pp_cb is the callback cookie as passed into run_processes_parallel, + * pp_task_cb is the callback cookie as passed into get_next_task_fn. + * + * This function is incompatible with "ungroup" + */ +typedef void (*on_stderr_output_fn)(struct strbuf *out, size_t offset, + void *pp_cb, void *pp_task_cb); + /** * This callback is called on every child process that finished processing. * @@ -461,6 +480,12 @@ struct run_process_parallel_opts */ start_failure_fn start_failure; + /** + * on_stderr_output: See on_stderr_output_fn() above. Unless you need + * to capture output from child processes, leave this as NULL. + */ + on_stderr_output_fn on_stderr_output; + /** * task_finished: See task_finished_fn() above. This can be * NULL to omit any special handling. 
diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c index 3ecb830f4a..a2fac6f762 100644 --- a/t/helper/test-run-command.c +++ b/t/helper/test-run-command.c @@ -52,6 +52,20 @@ static int no_job(struct child_process *cp, return 0; } +static void on_stderr_output(struct strbuf *out, + size_t offset, + void *pp_cb UNUSED, + void *pp_task_cb UNUSED) +{ + struct string_list list = STRING_LIST_INIT_DUP; + struct string_list_item *item; + + string_list_split(&list, out->buf + offset, '\n', -1); + for_each_string_list_item(item, &list) + fprintf(stderr, "on_stderr_output: %s\n", item->string); + string_list_clear(&list, 0); +} + static int task_finished(int result, struct strbuf *err, void *pp_cb, @@ -439,6 +453,12 @@ int cmd__run_command(int argc, const char **argv) opts.ungroup = 1; } + if (!strcmp(argv[1], "--on-stderr-output")) { + argv += 1; + argc -= 1; + opts.on_stderr_output = on_stderr_output; + } + jobs = atoi(argv[2]); strvec_clear(&proc.args); strvec_pushv(&proc.args, (const char **)argv + 3); diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh index e2411f6a9b..883d871dfb 100755 --- a/t/t0061-run-command.sh +++ b/t/t0061-run-command.sh @@ -135,6 +135,15 @@ test_expect_success 'run_command runs in parallel with more jobs available than test_cmp expect actual ' +test_expect_success 'run_command runs in parallel with more jobs available than tasks --on-stderr-output' ' + test-tool run-command --on-stderr-output run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test 4 = $(grep -c "on_stderr_output: Hello" err) && + test 4 = $(grep -c "on_stderr_output: World" err) && + sed "/on_stderr_output/d" err >err1 && + test_cmp expect err1 +' + test_expect_success 'run_command runs ungrouped in parallel with more jobs available than tasks' ' test-tool run-command --ungroup run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && test_line_count = 8 out && @@ -147,6 
+156,15 @@ test_expect_success 'run_command runs in parallel with as many jobs as tasks' ' test_cmp expect actual ' +test_expect_success 'run_command runs in parallel with as many jobs as tasks --on-stderr-output' ' + test-tool run-command --on-stderr-output run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test 4 = $(grep -c "on_stderr_output: Hello" err) && + test 4 = $(grep -c "on_stderr_output: World" err) && + sed "/on_stderr_output/d" err >err1 && + test_cmp expect err1 +' + test_expect_success 'run_command runs ungrouped in parallel with as many jobs as tasks' ' test-tool run-command --ungroup run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && test_line_count = 8 out && @@ -159,6 +177,15 @@ test_expect_success 'run_command runs in parallel with more tasks than jobs avai test_cmp expect actual ' +test_expect_success 'run_command runs in parallel with more tasks than jobs available --on-stderr-output' ' + test-tool run-command --on-stderr-output run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test 4 = $(grep -c "on_stderr_output: Hello" err) && + test 4 = $(grep -c "on_stderr_output: World" err) && + sed "/on_stderr_output/d" err >err1 && + test_cmp expect err1 +' + test_expect_success 'run_command runs ungrouped in parallel with more tasks than jobs available' ' test-tool run-command --ungroup run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && test_line_count = 8 out && @@ -180,6 +207,12 @@ test_expect_success 'run_command is asked to abort gracefully' ' test_cmp expect actual ' +test_expect_success 'run_command is asked to abort gracefully --on-stderr-output' ' + test-tool run-command --on-stderr-output run-command-abort 3 false >out 2>err && + test_must_be_empty out && + test_cmp expect err +' + test_expect_success 'run_command is asked to abort gracefully (ungroup)' ' test-tool run-command 
--ungroup run-command-abort 3 false >out 2>err && test_must_be_empty out && @@ -196,6 +229,12 @@ test_expect_success 'run_command outputs ' ' test_cmp expect actual ' +test_expect_success 'run_command outputs --on-stderr-output' ' + test-tool run-command --on-stderr-output run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test_cmp expect err +' + test_expect_success 'run_command outputs (ungroup) ' ' test-tool run-command --ungroup run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && test_must_be_empty out && -- 2.40.0.rc0.216.gc4246ad0f0-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
* [PATCH v9 2/6] submodule: rename strbuf variable 2023-03-02 21:52 ` [PATCH v9 " Calvin Wan 2023-03-02 22:02 ` [PATCH v9 1/6] run-command: add on_stderr_output_fn to run_processes_parallel_opts Calvin Wan @ 2023-03-02 22:02 ` Calvin Wan 2023-03-03 0:25 ` Junio C Hamano 2023-03-02 22:02 ` [PATCH v9 3/6] submodule: move status parsing into function Calvin Wan ` (3 subsequent siblings) 5 siblings, 1 reply; 86+ messages in thread From: Calvin Wan @ 2023-03-02 22:02 UTC (permalink / raw) To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy, phillip.wood123 A preparatory change for a future patch that moves the status parsing logic to a separate function. Signed-off-by: Calvin Wan <calvinwan@google.com> --- submodule.c | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/submodule.c b/submodule.c index fae24ef34a..faf37c1101 100644 --- a/submodule.c +++ b/submodule.c @@ -1906,25 +1906,28 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) fp = xfdopen(cp.out, "r"); while (strbuf_getwholeline(&buf, fp, '\n') != EOF) { + char *str = buf.buf; + const size_t len = buf.len; + /* regular untracked files */ - if (buf.buf[0] == '?') + if (str[0] == '?') dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - if (buf.buf[0] == 'u' || - buf.buf[0] == '1' || - buf.buf[0] == '2') { + if (str[0] == 'u' || + str[0] == '1' || + str[0] == '2') { /* T = line type, XY = status, SSSS = submodule state */ - if (buf.len < strlen("T XY SSSS")) + if (len < strlen("T XY SSSS")) BUG("invalid status --porcelain=2 line %s", - buf.buf); + str); - if (buf.buf[5] == 'S' && buf.buf[8] == 'U') + if (str[5] == 'S' && str[8] == 'U') /* nested untracked file */ dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - if (buf.buf[0] == 'u' || - buf.buf[0] == '2' || - memcmp(buf.buf + 5, "S..U", 4)) + if (str[0] == 'u' || + str[0] == '2' || + memcmp(str + 5, "S..U", 4)) /* other change */ dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; } -- 
2.40.0.rc0.216.gc4246ad0f0-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v9 2/6] submodule: rename strbuf variable 2023-03-02 22:02 ` [PATCH v9 2/6] submodule: rename strbuf variable Calvin Wan @ 2023-03-03 0:25 ` Junio C Hamano 2023-03-06 17:37 ` Calvin Wan 0 siblings, 1 reply; 86+ messages in thread From: Junio C Hamano @ 2023-03-03 0:25 UTC (permalink / raw) To: Calvin Wan; +Cc: git, avarab, chooglen, newren, jonathantanmy, phillip.wood123 Calvin Wan <calvinwan@google.com> writes: > A preparatory change for a future patch that moves the status parsing > logic to a separate function. > > Signed-off-by: Calvin Wan <calvinwan@google.com> > --- > submodule.c | 23 +++++++++++++---------- > 1 file changed, 13 insertions(+), 10 deletions(-) > Subject: Re: [PATCH v9 2/6] submodule: rename strbuf variable What strbuf variable renamed to what? I have a feeling that squashing this and 3/6 into a single patch, and passing buf.buf and buf.len to the new helper function without introducing intermediate variables in the caller, would make the resulting code easier to follow. In any case, nice factoring out of a useful helper function. 
> diff --git a/submodule.c b/submodule.c > index fae24ef34a..faf37c1101 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -1906,25 +1906,28 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) > > fp = xfdopen(cp.out, "r"); > while (strbuf_getwholeline(&buf, fp, '\n') != EOF) { > + char *str = buf.buf; > + const size_t len = buf.len; > + > /* regular untracked files */ > - if (buf.buf[0] == '?') > + if (str[0] == '?') > dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; > > - if (buf.buf[0] == 'u' || > - buf.buf[0] == '1' || > - buf.buf[0] == '2') { > + if (str[0] == 'u' || > + str[0] == '1' || > + str[0] == '2') { > /* T = line type, XY = status, SSSS = submodule state */ > - if (buf.len < strlen("T XY SSSS")) > + if (len < strlen("T XY SSSS")) > BUG("invalid status --porcelain=2 line %s", > - buf.buf); > + str); > > - if (buf.buf[5] == 'S' && buf.buf[8] == 'U') > + if (str[5] == 'S' && str[8] == 'U') > /* nested untracked file */ > dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; > > - if (buf.buf[0] == 'u' || > - buf.buf[0] == '2' || > - memcmp(buf.buf + 5, "S..U", 4)) > + if (str[0] == 'u' || > + str[0] == '2' || > + memcmp(str + 5, "S..U", 4)) > /* other change */ > dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; > } ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v9 2/6] submodule: rename strbuf variable 2023-03-03 0:25 ` Junio C Hamano @ 2023-03-06 17:37 ` Calvin Wan 2023-03-06 18:30 ` Junio C Hamano 0 siblings, 1 reply; 86+ messages in thread From: Calvin Wan @ 2023-03-06 17:37 UTC (permalink / raw) To: Junio C Hamano Cc: git, avarab, chooglen, newren, jonathantanmy, phillip.wood123 On Thu, Mar 2, 2023 at 4:25 PM Junio C Hamano <gitster@pobox.com> wrote: > > Calvin Wan <calvinwan@google.com> writes: > > > A preparatory change for a future patch that moves the status parsing > > logic to a separate function. > > > > Signed-off-by: Calvin Wan <calvinwan@google.com> > > --- > > submodule.c | 23 +++++++++++++---------- > > 1 file changed, 13 insertions(+), 10 deletions(-) > > > Subject: Re: [PATCH v9 2/6] submodule: rename strbuf variable > > What strbuf variable renamed to what? > > I have a feeling that squashing this and 3/6 into a single patch, > and passing buf.buf and buf.len to the new helper function without > introducing intermediate variables in the caller, would make the > resulting code easier to follow. > > In any case, nice factoring out of a useful helper function. > A much earlier version squashed those changes together, but it was recommended to split those changes up; I think I am indifferent either way since the refactoring is clear to me whether it is split up or not. https://lore.kernel.org/git/221012.868rllo545.gmgdl@evledraar.gmail.com/ ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v9 2/6] submodule: rename strbuf variable 2023-03-06 17:37 ` Calvin Wan @ 2023-03-06 18:30 ` Junio C Hamano 2023-03-06 19:00 ` Calvin Wan 0 siblings, 1 reply; 86+ messages in thread From: Junio C Hamano @ 2023-03-06 18:30 UTC (permalink / raw) To: Calvin Wan; +Cc: git, avarab, chooglen, newren, jonathantanmy, phillip.wood123 Calvin Wan <calvinwan@google.com> writes: > On Thu, Mar 2, 2023 at 4:25 PM Junio C Hamano <gitster@pobox.com> wrote: >> >> Calvin Wan <calvinwan@google.com> writes: >> >> > A preparatory change for a future patch that moves the status parsing >> > logic to a separate function. >> > >> > Signed-off-by: Calvin Wan <calvinwan@google.com> >> > --- >> > submodule.c | 23 +++++++++++++---------- >> > 1 file changed, 13 insertions(+), 10 deletions(-) >> >> > Subject: Re: [PATCH v9 2/6] submodule: rename strbuf variable >> >> What strbuf variable renamed to what? >> >> I have a feeling that squashing this and 3/6 into a single patch, >> and passing buf.buf and buf.len to the new helper function without >> introducing intermediate variables in the caller, would make the >> resulting code easier to follow. >> >> In any case, nice factoring out of a useful helper function. >> > > A much earlier version squashed those changes together, but it was > recommended to split those changes up; I think I am indifferent either way > since the refactoring is clear to me whether it is split up or not. > https://lore.kernel.org/git/221012.868rllo545.gmgdl@evledraar.gmail.com/ I am indifferent, too, but with or without them squashed into a single patch, "rename strbuf" would not be how you would describe the value of this refactoring, which is to make the interface not depend on strbuf. Some callers may have separate <ptr,len> pair that is not in strbuf, and with the current interface they are forced to wrap the pair in a throw-away strbuf which is not nice. 
And with them squashed together into a single patch, it becomes a lot clearer what the point of these two steps combined is. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v9 2/6] submodule: rename strbuf variable 2023-03-06 18:30 ` Junio C Hamano @ 2023-03-06 19:00 ` Calvin Wan 0 siblings, 0 replies; 86+ messages in thread From: Calvin Wan @ 2023-03-06 19:00 UTC (permalink / raw) To: Junio C Hamano Cc: git, avarab, chooglen, newren, jonathantanmy, phillip.wood123 On Mon, Mar 6, 2023 at 10:30 AM Junio C Hamano <gitster@pobox.com> wrote: > > Calvin Wan <calvinwan@google.com> writes: > > > On Thu, Mar 2, 2023 at 4:25 PM Junio C Hamano <gitster@pobox.com> wrote: > >> > >> Calvin Wan <calvinwan@google.com> writes: > >> > >> > A preparatory change for a future patch that moves the status parsing > >> > logic to a separate function. > >> > > >> > Signed-off-by: Calvin Wan <calvinwan@google.com> > >> > --- > >> > submodule.c | 23 +++++++++++++---------- > >> > 1 file changed, 13 insertions(+), 10 deletions(-) > >> > >> > Subject: Re: [PATCH v9 2/6] submodule: rename strbuf variable > >> > >> What strbuf variable renamed to what? > >> > >> I have a feeling that squashing this and 3/6 into a single patch, > >> and passing buf.buf and buf.len to the new helper function without > >> introducing intermediate variables in the caller, would make the > >> resulting code easier to follow. > >> > >> In any case, nice factoring out of a useful helper function. > >> > > > > A much earlier version squashed those changes together, but it was > > recommended to split those changes up; I think I am indifferent either way > > since the refactoring is clear to me whether it is split up or not. > > https://lore.kernel.org/git/221012.868rllo545.gmgdl@evledraar.gmail.com/ > > I am indifferent, too, but with or without them squashed into a > single patch, "rename strbuf" would not be how you would describe > the value of this refactoring, which is to make the interface not > depend on strbuf. 
Some callers may have separate <ptr,len> pair > that is not in strbuf, and with the current interface they are > forced to wrap the pair in a throw-away strbuf which is not nice. I see what you mean here; will reword the commit message, thanks! ^ permalink raw reply [flat|nested] 86+ messages in thread
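The point about a <ptr,len> interface can be illustrated with a minimal sketch in plain C. This is not code from the series and does not use git's strbuf API; for_each_line(), line_fn and count_untracked() are hypothetical names chosen for the example. It assumes the caller holds one large buffer of accumulated output and wants to hand each line to a parser without first copying it into a throw-away buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical callback type: receives one line as a <ptr,len> pair.
 * The line is NOT NUL-terminated and does not include the newline.
 */
typedef void (*line_fn)(const char *line, size_t len, void *data);

/*
 * Walk a buffer line by line without copying. Because the callback
 * takes <ptr,len>, it works on a slice of any buffer, whatever
 * container that buffer happens to live in.
 */
static void for_each_line(const char *buf, size_t len, line_fn fn, void *data)
{
	const char *end = buf + len;

	while (buf < end) {
		const char *nl = memchr(buf, '\n', (size_t)(end - buf));
		size_t line_len = nl ? (size_t)(nl - buf) : (size_t)(end - buf);

		fn(buf, line_len, data);
		/* Skip the line, plus the newline if there was one. */
		buf += line_len + (nl ? 1 : 0);
	}
}

/* Example callback: count lines whose first character is '?'. */
static void count_untracked(const char *line, size_t len, void *data)
{
	if (len && line[0] == '?')
		(*(int *)data)++;
}
```

A caller that already has a pointer and a length (for instance a slice of a larger output buffer) can pass them straight through; a caller that holds a string buffer object passes its data pointer and length fields instead.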
* [PATCH v9 3/6] submodule: move status parsing into function 2023-03-02 21:52 ` [PATCH v9 " Calvin Wan 2023-03-02 22:02 ` [PATCH v9 1/6] run-command: add on_stderr_output_fn to run_processes_parallel_opts Calvin Wan 2023-03-02 22:02 ` [PATCH v9 2/6] submodule: rename strbuf variable Calvin Wan @ 2023-03-02 22:02 ` Calvin Wan 2023-03-17 20:42 ` Glen Choo 2023-03-02 22:02 ` [PATCH v9 4/6] submodule: refactor is_submodule_modified() Calvin Wan ` (2 subsequent siblings) 5 siblings, 1 reply; 86+ messages in thread From: Calvin Wan @ 2023-03-02 22:02 UTC (permalink / raw) To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy, phillip.wood123 A future patch requires the ability to parse the output of git status --porcelain=2. Move parsing code from is_submodule_modified to parse_status_porcelain. Signed-off-by: Calvin Wan <calvinwan@google.com> --- submodule.c | 74 ++++++++++++++++++++++++++++++----------------------- 1 file changed, 42 insertions(+), 32 deletions(-) diff --git a/submodule.c b/submodule.c index faf37c1101..768d4b4cd7 100644 --- a/submodule.c +++ b/submodule.c @@ -1870,6 +1870,45 @@ int fetch_submodules(struct repository *r, return spf.result; } +static int parse_status_porcelain(char *str, size_t len, + unsigned *dirty_submodule, + int ignore_untracked) +{ + /* regular untracked files */ + if (str[0] == '?') + *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; + + if (str[0] == 'u' || + str[0] == '1' || + str[0] == '2') { + /* T = line type, XY = status, SSSS = submodule state */ + if (len < strlen("T XY SSSS")) + BUG("invalid status --porcelain=2 line %s", + str); + + if (str[5] == 'S' && str[8] == 'U') + /* nested untracked file */ + *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; + + if (str[0] == 'u' || + str[0] == '2' || + memcmp(str + 5, "S..U", 4)) + /* other change */ + *dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; + } + + if ((*dirty_submodule & DIRTY_SUBMODULE_MODIFIED) && + ((*dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) || + 
ignore_untracked)) { + /* + * We're not interested in any further information from + * the child any more, neither output nor its exit code. + */ + return 1; + } + return 0; +} + unsigned is_submodule_modified(const char *path, int ignore_untracked) { struct child_process cp = CHILD_PROCESS_INIT; @@ -1909,39 +1948,10 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) char *str = buf.buf; const size_t len = buf.len; - /* regular untracked files */ - if (str[0] == '?') - dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - - if (str[0] == 'u' || - str[0] == '1' || - str[0] == '2') { - /* T = line type, XY = status, SSSS = submodule state */ - if (len < strlen("T XY SSSS")) - BUG("invalid status --porcelain=2 line %s", - str); - - if (str[5] == 'S' && str[8] == 'U') - /* nested untracked file */ - dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - - if (str[0] == 'u' || - str[0] == '2' || - memcmp(str + 5, "S..U", 4)) - /* other change */ - dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; - } - - if ((dirty_submodule & DIRTY_SUBMODULE_MODIFIED) && - ((dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) || - ignore_untracked)) { - /* - * We're not interested in any further information from - * the child any more, neither output nor its exit code. - */ - ignore_cp_exit_code = 1; + ignore_cp_exit_code = parse_status_porcelain(str, len, &dirty_submodule, + ignore_untracked); + if (ignore_cp_exit_code) break; - } } fclose(fp); -- 2.40.0.rc0.216.gc4246ad0f0-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
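As a reading aid for the parsing rules moved in this patch, here is a standalone sketch that mirrors the same character checks outside of git. It is an illustration only: classify_porcelain_v2_line() and the SKETCH_* flags are hypothetical names, and unlike the patch it returns -1 on a short line instead of calling BUG(). It relies on the "git status --porcelain=2" layout in which byte 0 is the line type, bytes 2-3 are the XY status, and bytes 5-8 are the four-character submodule state ("N..." for a non-submodule, "S" followed by commit/modified/untracked markers for a submodule).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flags standing in for DIRTY_SUBMODULE_* in this sketch. */
#define SKETCH_UNTRACKED (1u << 0)
#define SKETCH_MODIFIED  (1u << 1)

/*
 * Classify one porcelain v2 line given as <ptr,len>.
 * Returns 0 on success, -1 if a changed-entry line is too short.
 */
static int classify_porcelain_v2_line(const char *str, size_t len,
				      unsigned *flags)
{
	if (!len)
		return 0;

	/* "? <path>": a regular untracked file */
	if (str[0] == '?')
		*flags |= SKETCH_UNTRACKED;

	if (str[0] == 'u' || str[0] == '1' || str[0] == '2') {
		/* T = line type, XY = status, SSSS = submodule state */
		if (len < strlen("T XY SSSS"))
			return -1;

		/* nested submodule that contains untracked files */
		if (str[5] == 'S' && str[8] == 'U')
			*flags |= SKETCH_UNTRACKED;

		/*
		 * Unmerged and renamed entries always count as modified;
		 * an ordinary entry counts unless its only change is
		 * untracked content in a nested submodule ("S..U").
		 */
		if (str[0] == 'u' || str[0] == '2' ||
		    memcmp(str + 5, "S..U", 4))
			*flags |= SKETCH_MODIFIED;
	}
	return 0;
}
```

Under these assumptions, a line beginning "1 .M S..U" sets only the untracked flag, "1 .M N..." sets only the modified flag, and "1 .M S.MU" sets both, which is the condition under which the caller in the patch stops reading further output.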
* Re: [PATCH v9 3/6] submodule: move status parsing into function 2023-03-02 22:02 ` [PATCH v9 3/6] submodule: move status parsing into function Calvin Wan @ 2023-03-17 20:42 ` Glen Choo 0 siblings, 0 replies; 86+ messages in thread From: Glen Choo @ 2023-03-17 20:42 UTC (permalink / raw) To: Calvin Wan, git Cc: Calvin Wan, avarab, newren, jonathantanmy, phillip.wood123 Calvin Wan <calvinwan@google.com> writes: > A future patch requires the ability to parse the output of git > status --porcelain=2. Move parsing code from is_submodule_modified to > parse_status_porcelain. If my mental model is correct [1], i.e. that we are implementing a parallel version of is_submodule_modified(), I think we should be more explicit in this patch and the next, e.g.: In a later patch, we will implement a parallel version of is_submodule_modified(). Refactor its "git status --porcelain=2" parsing code so that we can reuse it in both the parallel and non-parallel versions. If so, then this is pretty much doing the same thing as the next patch, so if the --color-moved diff isn't too bad, I think we can squash them, which will make the commit message easier to write too: In a later patch, we will implement a parallel version of is_submodule_modified(). Refactor its setup and parsing code so that we can reuse it in both the parallel and non-parallel versions. - Setting up the subprocess is moved to prepare_status_porcelain() - XYZ is moved to verify_submodule_git_directory() - ABC is moved to parse_foobarbaz() Just an idea. I don't think squashing is necessarily better, but being explicit that we want a parallel version of is_submodule_modified() will make this easier to follow. [1] https://lore.kernel.org/git/kl6ljzzguqss.fsf@chooglen-macbookpro.roam.corp.google.com ^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v9 4/6] submodule: refactor is_submodule_modified() 2023-03-02 21:52 ` [PATCH v9 " Calvin Wan ` (2 preceding siblings ...) 2023-03-02 22:02 ` [PATCH v9 3/6] submodule: move status parsing into function Calvin Wan @ 2023-03-02 22:02 ` Calvin Wan 2023-03-02 22:02 ` [PATCH v9 5/6] diff-lib: refactor out diff_change logic Calvin Wan 2023-03-02 22:02 ` [PATCH v9 6/6] diff-lib: parallelize run_diff_files for submodules Calvin Wan 5 siblings, 0 replies; 86+ messages in thread From: Calvin Wan @ 2023-03-02 22:02 UTC (permalink / raw) To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy, phillip.wood123 Refactor out submodule status logic and error messages that will be used in a future patch. Signed-off-by: Calvin Wan <calvinwan@google.com> --- submodule.c | 65 ++++++++++++++++++++++++++++++++++------------------- 1 file changed, 42 insertions(+), 23 deletions(-) diff --git a/submodule.c b/submodule.c index 768d4b4cd7..426074cebb 100644 --- a/submodule.c +++ b/submodule.c @@ -28,6 +28,10 @@ static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF; static int initialized_fetch_ref_tips; static struct oid_array ref_tips_before_fetch; static struct oid_array ref_tips_after_fetch; +#define STATUS_PORCELAIN_START_ERROR \ + N_("could not run 'git status --porcelain=2' in submodule %s") +#define STATUS_PORCELAIN_FAIL_ERROR \ + N_("'git status --porcelain=2' failed in submodule %s") /* * Check if the .gitmodules file is unmerged. 
Parsing of the .gitmodules file @@ -1870,6 +1874,40 @@ int fetch_submodules(struct repository *r, return spf.result; } +static int verify_submodule_git_directory(const char *path) +{ + const char *git_dir; + struct strbuf buf = STRBUF_INIT; + + strbuf_addf(&buf, "%s/.git", path); + git_dir = read_gitfile(buf.buf); + if (!git_dir) + git_dir = buf.buf; + if (!is_git_directory(git_dir)) { + if (is_directory(git_dir)) + die(_("'%s' not recognized as a git repository"), git_dir); + strbuf_release(&buf); + /* The submodule is not checked out, so it is not modified */ + return 0; + } + strbuf_release(&buf); + return 1; +} + +static void prepare_status_porcelain(struct child_process *cp, + const char *path, int ignore_untracked) +{ + strvec_pushl(&cp->args, "status", "--porcelain=2", NULL); + if (ignore_untracked) + strvec_push(&cp->args, "-uno"); + + prepare_submodule_repo_env(&cp->env); + cp->git_cmd = 1; + cp->no_stdin = 1; + cp->out = -1; + cp->dir = path; +} + static int parse_status_porcelain(char *str, size_t len, unsigned *dirty_submodule, int ignore_untracked) @@ -1915,33 +1953,14 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) struct strbuf buf = STRBUF_INIT; FILE *fp; unsigned dirty_submodule = 0; - const char *git_dir; int ignore_cp_exit_code = 0; - strbuf_addf(&buf, "%s/.git", path); - git_dir = read_gitfile(buf.buf); - if (!git_dir) - git_dir = buf.buf; - if (!is_git_directory(git_dir)) { - if (is_directory(git_dir)) - die(_("'%s' not recognized as a git repository"), git_dir); - strbuf_release(&buf); - /* The submodule is not checked out, so it is not modified */ + if (!verify_submodule_git_directory(path)) return 0; - } - strbuf_reset(&buf); - - strvec_pushl(&cp.args, "status", "--porcelain=2", NULL); - if (ignore_untracked) - strvec_push(&cp.args, "-uno"); - prepare_submodule_repo_env(&cp.env); - cp.git_cmd = 1; - cp.no_stdin = 1; - cp.out = -1; - cp.dir = path; + prepare_status_porcelain(&cp, path, ignore_untracked); if 
(start_command(&cp)) - die(_("Could not run 'git status --porcelain=2' in submodule %s"), path); + die(_(STATUS_PORCELAIN_START_ERROR), path); fp = xfdopen(cp.out, "r"); while (strbuf_getwholeline(&buf, fp, '\n') != EOF) { @@ -1956,7 +1975,7 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) fclose(fp); if (finish_command(&cp) && !ignore_cp_exit_code) - die(_("'git status --porcelain=2' failed in submodule %s"), path); + die(_(STATUS_PORCELAIN_FAIL_ERROR), path); strbuf_release(&buf); return dirty_submodule; -- 2.40.0.rc0.216.gc4246ad0f0-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
* [PATCH v9 5/6] diff-lib: refactor out diff_change logic 2023-03-02 21:52 ` [PATCH v9 " Calvin Wan ` (3 preceding siblings ...) 2023-03-02 22:02 ` [PATCH v9 4/6] submodule: refactor is_submodule_modified() Calvin Wan @ 2023-03-02 22:02 ` Calvin Wan 2023-03-02 22:02 ` [PATCH v9 6/6] diff-lib: parallelize run_diff_files for submodules Calvin Wan 5 siblings, 0 replies; 86+ messages in thread From: Calvin Wan @ 2023-03-02 22:02 UTC (permalink / raw) To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy, phillip.wood123 In run_diff_files, there is logic that records the diff and updates relevant bits at the end of each entry iteration. Refactor out that logic into a helper function so a future patch can call it. Signed-off-by: Calvin Wan <calvinwan@google.com> --- diff-lib.c | 48 +++++++++++++++++++++++++++++++----------------- 1 file changed, 31 insertions(+), 17 deletions(-) diff --git a/diff-lib.c b/diff-lib.c index dec040c366..744ae98a69 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -88,6 +88,34 @@ static int match_stat_with_submodule(struct diff_options *diffopt, return changed; } +/** + * Records diff_change if there is a change in the entry from run_diff_files. + * If there is no change, then the cache entry is marked CE_UPTODATE and + * CE_FSMONITOR_VALID. If there is no change and the find_copies_harder flag + * is not set, then the function returns early. + */ +static void record_file_diff(struct diff_options *options, unsigned newmode, + unsigned dirty_submodule, int changed, + struct index_state *istate, + struct cache_entry *ce) +{ + unsigned int oldmode; + const struct object_id *old_oid, *new_oid; + + if (!changed && !dirty_submodule) { + ce_mark_uptodate(ce); + mark_fsmonitor_valid(istate, ce); + if (!options->flags.find_copies_harder) + return; + } + oldmode = ce->ce_mode; + old_oid = &ce->oid; + new_oid = changed ? 
null_oid() : &ce->oid; + diff_change(options, oldmode, newmode, old_oid, new_oid, + !is_null_oid(old_oid), !is_null_oid(new_oid), + ce->name, 0, dirty_submodule); +} + int run_diff_files(struct rev_info *revs, unsigned int option) { int entries, i; @@ -105,11 +133,10 @@ int run_diff_files(struct rev_info *revs, unsigned int option) diff_unmerged_stage = 2; entries = istate->cache_nr; for (i = 0; i < entries; i++) { - unsigned int oldmode, newmode; + unsigned int newmode; struct cache_entry *ce = istate->cache[i]; int changed; unsigned dirty_submodule = 0; - const struct object_id *old_oid, *new_oid; if (diff_can_quit_early(&revs->diffopt)) break; @@ -245,21 +272,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option) newmode = ce_mode_from_stat(ce, st.st_mode); } - if (!changed && !dirty_submodule) { - ce_mark_uptodate(ce); - mark_fsmonitor_valid(istate, ce); - if (!revs->diffopt.flags.find_copies_harder) - continue; - } - oldmode = ce->ce_mode; - old_oid = &ce->oid; - new_oid = changed ? null_oid() : &ce->oid; - diff_change(&revs->diffopt, oldmode, newmode, - old_oid, new_oid, - !is_null_oid(old_oid), - !is_null_oid(new_oid), - ce->name, 0, dirty_submodule); - + record_file_diff(&revs->diffopt, newmode, dirty_submodule, + changed, istate, ce); } diffcore_std(&revs->diffopt); diff_flush(&revs->diffopt); -- 2.40.0.rc0.216.gc4246ad0f0-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
* [PATCH v9 6/6] diff-lib: parallelize run_diff_files for submodules 2023-03-02 21:52 ` [PATCH v9 " Calvin Wan ` (4 preceding siblings ...) 2023-03-02 22:02 ` [PATCH v9 5/6] diff-lib: refactor out diff_change logic Calvin Wan @ 2023-03-02 22:02 ` Calvin Wan 2023-03-07 8:41 ` Ævar Arnfjörð Bjarmason ` (2 more replies) 5 siblings, 3 replies; 86+ messages in thread From: Calvin Wan @ 2023-03-02 22:02 UTC (permalink / raw) To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy, phillip.wood123 During the iteration of the index entries in run_diff_files, whenever a submodule is found and needs its status checked, a subprocess is spawned for it. Instead of spawning the subprocess immediately and waiting for its completion to continue, hold onto all submodules and relevant information in a list. Then use that list to create tasks for run_processes_parallel. Subprocess output is passed to status_on_stderr_output which stores it to be parsed on completion of the subprocess. Add config option submodule.diffJobs to set the maximum number of parallel jobs. The option defaults to 1 if unset. If set to 0, the number of jobs is set to online_cpus(). Since run_diff_files is called from many different commands, I chose to grab the config option in the function rather than adding variables to every git command and then figuring out how to pass them all in. Signed-off-by: Calvin Wan <calvinwan@google.com> --- Documentation/config/submodule.txt | 12 +++ diff-lib.c | 81 +++++++++++++++--- submodule.c | 128 +++++++++++++++++++++++++++++ submodule.h | 9 ++ t/t4027-diff-submodule.sh | 31 +++++++ t/t7506-status-submodule.sh | 25 ++++++ 6 files changed, 274 insertions(+), 12 deletions(-) diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt index 6490527b45..3209eb8117 100644 --- a/Documentation/config/submodule.txt +++ b/Documentation/config/submodule.txt @@ -93,6 +93,18 @@ submodule.fetchJobs:: in parallel. 
A value of 0 will give some reasonable default. If unset, it defaults to 1. +submodule.diffJobs:: + Specifies how many submodules are diffed at the same time. A + positive integer allows up to that number of submodules diffed + in parallel. A value of 0 will give some reasonable default. + If unset, it defaults to 1. The diff operation is used by many + other git commands such as add, merge, diff, status, stash and + more. Note that the expensive part of the diff operation is + reading the index from cache or memory. Therefore multiple jobs + may be detrimental to performance if your hardware does not + support parallel reads or if the number of jobs greatly exceeds + the amount of supported reads. + submodule.alternateLocation:: Specifies how the submodules obtain alternates when submodules are cloned. Possible values are `no`, `superproject`. diff --git a/diff-lib.c b/diff-lib.c index 744ae98a69..7fe6ced950 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -14,6 +14,7 @@ #include "dir.h" #include "fsmonitor.h" #include "commit-reach.h" +#include "config.h" /* * diff-files @@ -65,26 +66,41 @@ static int check_removed(const struct index_state *istate, const struct cache_en * Return 1 when changes are detected, 0 otherwise. If the DIRTY_SUBMODULES * option is set, the caller does not only want to know if a submodule is * modified at all but wants to know all the conditions that are met (new - * commits, untracked content and/or modified content). + * commits, untracked content and/or modified content). If + * defer_submodule_status bit is set, dirty_submodule will be left to the + * caller to set. defer_submodule_status can also be set to 0 in this + * function if there is no need to check if the submodule is modified. 
*/ static int match_stat_with_submodule(struct diff_options *diffopt, const struct cache_entry *ce, struct stat *st, unsigned ce_option, - unsigned *dirty_submodule) + unsigned *dirty_submodule, int *defer_submodule_status, + unsigned *ignore_untracked) { int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option); + int defer = 0; + if (S_ISGITLINK(ce->ce_mode)) { struct diff_flags orig_flags = diffopt->flags; if (!diffopt->flags.override_submodule_config) set_diffopt_flags_from_submodule_config(diffopt, ce->name); - if (diffopt->flags.ignore_submodules) + if (diffopt->flags.ignore_submodules) { changed = 0; - else if (!diffopt->flags.ignore_dirty_submodules && - (!changed || diffopt->flags.dirty_submodules)) - *dirty_submodule = is_submodule_modified(ce->name, - diffopt->flags.ignore_untracked_in_submodules); + } else if (!diffopt->flags.ignore_dirty_submodules && + (!changed || diffopt->flags.dirty_submodules)) { + if (defer_submodule_status && *defer_submodule_status) { + defer = 1; + *ignore_untracked = diffopt->flags.ignore_untracked_in_submodules; + } else { + *dirty_submodule = is_submodule_modified(ce->name, + diffopt->flags.ignore_untracked_in_submodules); + } + } diffopt->flags = orig_flags; } + + if (defer_submodule_status) + *defer_submodule_status = defer; return changed; } @@ -124,6 +140,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option) ? 
CE_MATCH_RACY_IS_DIRTY : 0); uint64_t start = getnanotime(); struct index_state *istate = revs->diffopt.repo->index; + struct string_list submodules = STRING_LIST_INIT_NODUP; diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/"); @@ -136,7 +153,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option) unsigned int newmode; struct cache_entry *ce = istate->cache[i]; int changed; - unsigned dirty_submodule = 0; + int defer_submodule_status = 1; if (diff_can_quit_early(&revs->diffopt)) break; @@ -247,6 +264,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option) newmode = ce->ce_mode; } else { struct stat st; + unsigned ignore_untracked = 0; changed = check_removed(istate, ce, &st); if (changed) { @@ -268,13 +286,52 @@ int run_diff_files(struct rev_info *revs, unsigned int option) } changed = match_stat_with_submodule(&revs->diffopt, ce, &st, - ce_option, &dirty_submodule); + ce_option, NULL, + &defer_submodule_status, + &ignore_untracked); newmode = ce_mode_from_stat(ce, st.st_mode); + if (defer_submodule_status) { + struct submodule_status_util tmp = { + .changed = changed, + .dirty_submodule = 0, + .ignore_untracked = ignore_untracked, + .newmode = newmode, + .ce = ce, + .path = ce->name, + }; + struct string_list_item *item; + + item = string_list_append(&submodules, ce->name); + item->util = xmalloc(sizeof(tmp)); + memcpy(item->util, &tmp, sizeof(tmp)); + continue; + } } - record_file_diff(&revs->diffopt, newmode, dirty_submodule, - changed, istate, ce); + if (!defer_submodule_status) + record_file_diff(&revs->diffopt, newmode, 0, + changed,istate, ce); + } + if (submodules.nr) { + unsigned long parallel_jobs; + struct string_list_item *item; + + if (git_config_get_ulong("submodule.diffjobs", &parallel_jobs)) + parallel_jobs = 1; + else if (!parallel_jobs) + parallel_jobs = online_cpus(); + + if (get_submodules_status(&submodules, parallel_jobs)) + die(_("submodule status failed")); + for_each_string_list_item(item, &submodules) { + struct 
submodule_status_util *util = item->util; + + record_file_diff(&revs->diffopt, util->newmode, + util->dirty_submodule, util->changed, + istate, util->ce); + } } + string_list_clear(&submodules, 1); diffcore_std(&revs->diffopt); diff_flush(&revs->diffopt); trace_performance_since(start, "diff-files"); @@ -322,7 +379,7 @@ static int get_stat_data(const struct index_state *istate, return -1; } changed = match_stat_with_submodule(diffopt, ce, &st, - 0, dirty_submodule); + 0, dirty_submodule, NULL, NULL); if (changed) { mode = ce_mode_from_stat(ce, st.st_mode); oid = null_oid(); diff --git a/submodule.c b/submodule.c index 426074cebb..6f6e150a3f 100644 --- a/submodule.c +++ b/submodule.c @@ -1373,6 +1373,13 @@ int submodule_touches_in_range(struct repository *r, return ret; } +struct submodule_parallel_status { + size_t index_count; + int result; + + struct string_list *submodule_names; +}; + struct submodule_parallel_fetch { /* * The index of the last index entry processed by @@ -1455,6 +1462,12 @@ struct fetch_task { struct oid_array *commits; /* Ensure these commits are fetched */ }; +struct status_task { + const char *path; + struct strbuf out; + int ignore_untracked; +}; + /** * When a submodule is not defined in .gitmodules, we cannot access it * via the regular submodule-config. 
Create a fake submodule, which we can @@ -1981,6 +1994,121 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) return dirty_submodule; } +static struct status_task * +get_status_task_from_index(struct submodule_parallel_status *sps, + struct strbuf *err) +{ + for (; sps->index_count < sps->submodule_names->nr; sps->index_count++) { + struct submodule_status_util *util = sps->submodule_names->items[sps->index_count].util; + struct status_task *task; + + if (!verify_submodule_git_directory(util->path)) + continue; + + task = xmalloc(sizeof(*task)); + task->path = util->path; + task->ignore_untracked = util->ignore_untracked; + strbuf_init(&task->out, 0); + sps->index_count++; + return task; + } + return NULL; +} + +static int get_next_submodule_status(struct child_process *cp, + struct strbuf *err, void *data, + void **task_cb) +{ + struct submodule_parallel_status *sps = data; + struct status_task *task = get_status_task_from_index(sps, err); + + if (!task) + return 0; + + child_process_init(cp); + prepare_submodule_repo_env_in_gitdir(&cp->env); + prepare_status_porcelain(cp, task->path, task->ignore_untracked); + *task_cb = task; + return 1; +} + +static int status_start_failure(struct strbuf *err, + void *cb, void *task_cb) +{ + struct submodule_parallel_status *sps = cb; + struct status_task *task = task_cb; + + sps->result = 1; + strbuf_addf(err, _(STATUS_PORCELAIN_START_ERROR), task->path); + return 0; +} + +static void status_on_stderr_output(struct strbuf *out, + size_t offset, + void *cb, void *task_cb) +{ + struct status_task *task = task_cb; + + strbuf_add(&task->out, out->buf + offset, out->len - offset); + strbuf_setlen(out, offset); +} + +static int status_finish(int retvalue, struct strbuf *err, + void *cb, void *task_cb) +{ + struct submodule_parallel_status *sps = cb; + struct status_task *task = task_cb; + struct string_list_item *it = + string_list_lookup(sps->submodule_names, task->path); + struct submodule_status_util *util 
= it->util; + struct string_list list = STRING_LIST_INIT_DUP; + struct string_list_item *item; + + if (retvalue) { + sps->result = 1; + strbuf_addf(err, _(STATUS_PORCELAIN_FAIL_ERROR), task->path); + } + + string_list_split(&list, task->out.buf, '\n', -1); + for_each_string_list_item(item, &list) { + if (parse_status_porcelain(item->string, + strlen(item->string), + &util->dirty_submodule, + util->ignore_untracked)) + break; + } + string_list_clear(&list, 0); + strbuf_release(&task->out); + free(task); + + return 0; +} + +int get_submodules_status(struct string_list *submodules, + int max_parallel_jobs) +{ + struct submodule_parallel_status sps = { + .submodule_names = submodules, + }; + const struct run_process_parallel_opts opts = { + .tr2_category = "submodule", + .tr2_label = "parallel/status", + + .processes = max_parallel_jobs, + + .get_next_task = get_next_submodule_status, + .start_failure = status_start_failure, + .on_stderr_output = status_on_stderr_output, + .task_finished = status_finish, + .data = &sps, + }; + + string_list_sort(sps.submodule_names); + run_processes_parallel(&opts); + + return sps.result; +} + int submodule_uses_gitfile(const char *path) { struct child_process cp = CHILD_PROCESS_INIT; diff --git a/submodule.h b/submodule.h index b52a4ff1e7..08d278a414 100644 --- a/submodule.h +++ b/submodule.h @@ -41,6 +41,13 @@ struct submodule_update_strategy { .type = SM_UPDATE_UNSPECIFIED, \ } +struct submodule_status_util { + int changed, ignore_untracked; + unsigned dirty_submodule, newmode; + struct cache_entry *ce; + const char *path; +}; + int is_gitmodules_unmerged(struct index_state *istate); int is_writing_gitmodules_ok(void); int is_staging_gitmodules_ok(struct index_state *istate); @@ -94,6 +101,8 @@ int fetch_submodules(struct repository *r, int command_line_option, int default_option, int quiet, int max_parallel_jobs); +int get_submodules_status(struct string_list *submodules, + int max_parallel_jobs); unsigned 
is_submodule_modified(const char *path, int ignore_untracked); int submodule_uses_gitfile(const char *path); diff --git a/t/t4027-diff-submodule.sh b/t/t4027-diff-submodule.sh index 40164ae07d..1c747cc325 100755 --- a/t/t4027-diff-submodule.sh +++ b/t/t4027-diff-submodule.sh @@ -34,6 +34,25 @@ test_expect_success setup ' subtip=$3 subprev=$2 ' +test_expect_success 'diff in superproject with submodules respects parallel settings' ' + test_when_finished "rm -f trace.out" && + ( + GIT_TRACE=$(pwd)/trace.out git diff && + grep "1 tasks" trace.out && + >trace.out && + + git config submodule.diffJobs 8 && + GIT_TRACE=$(pwd)/trace.out git diff && + grep "8 tasks" trace.out && + >trace.out && + + GIT_TRACE=$(pwd)/trace.out git -c submodule.diffJobs=0 diff && + grep "preparing to run up to [0-9]* tasks" trace.out && + ! grep "up to 0 tasks" trace.out && + >trace.out + ) +' + test_expect_success 'git diff --raw HEAD' ' hexsz=$(test_oid hexsz) && git diff --raw --abbrev=$hexsz HEAD >actual && @@ -70,6 +89,18 @@ test_expect_success 'git diff HEAD with dirty submodule (work tree)' ' test_cmp expect.body actual.body ' +test_expect_success 'git diff HEAD with dirty submodule (work tree, parallel)' ' + ( + cd sub && + git reset --hard && + echo >>world + ) && + git -c submodule.diffJobs=8 diff HEAD >actual && + sed -e "1,/^@@/d" actual >actual.body && + expect_from_to >expect.body $subtip $subprev-dirty && + test_cmp expect.body actual.body +' + test_expect_success 'git diff HEAD with dirty submodule (index)' ' ( cd sub && diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh index d050091345..7da64e4c4c 100755 --- a/t/t7506-status-submodule.sh +++ b/t/t7506-status-submodule.sh @@ -412,4 +412,29 @@ test_expect_success 'status with added file in nested submodule (short)' ' EOF ' +test_expect_success 'status in superproject with submodules respects parallel settings' ' + test_when_finished "rm -f trace.out" && + ( + GIT_TRACE=$(pwd)/trace.out git status && + grep "1 
tasks" trace.out && + >trace.out && + + git config submodule.diffJobs 8 && + GIT_TRACE=$(pwd)/trace.out git status && + grep "8 tasks" trace.out && + >trace.out && + + GIT_TRACE=$(pwd)/trace.out git -c submodule.diffJobs=0 status && + grep "preparing to run up to [0-9]* tasks" trace.out && + ! grep "up to 0 tasks" trace.out && + >trace.out + ) +' + +test_expect_success 'status in superproject with submodules (parallel)' ' + git -C super status --porcelain >output && + git -C super -c submodule.diffJobs=8 status --porcelain >output_parallel && + diff output output_parallel +' + test_done -- 2.40.0.rc0.216.gc4246ad0f0-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
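The status_on_stderr_output/status_finish pair in the patch above collects the child's output as it arrives and only parses it once the child has exited, since a bounded read can end in the middle of a status line. A minimal sketch of that idea follows, assuming a hypothetical `struct outbuf` in place of git's strbuf and made-up status lines; it is not the patch's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, standing in for git's strbuf. */
struct outbuf {
	char *buf;
	size_t len;
};

/* Append one chunk exactly as delivered; a chunk may end mid-line. */
static void outbuf_add(struct outbuf *ob, const char *chunk, size_t n)
{
	char *grown = realloc(ob->buf, ob->len + n + 1);

	assert(grown); /* sketch only: real code would report the failure */
	ob->buf = grown;
	memcpy(ob->buf + ob->len, chunk, n);
	ob->len += n;
	ob->buf[ob->len] = '\0';
}

/*
 * Return the start of the n-th (0-based) newline-terminated line and
 * store its length (without the newline) in *len, or NULL if no such
 * complete line has been collected yet.
 */
static const char *outbuf_line(const struct outbuf *ob, size_t n, size_t *len)
{
	const char *p, *end;

	if (!ob->buf)
		return NULL;
	p = ob->buf;
	end = ob->buf + ob->len;
	while (p < end) {
		const char *nl = memchr(p, '\n', (size_t)(end - p));

		if (!nl)
			return NULL; /* trailing partial line */
		if (!n--) {
			*len = (size_t)(nl - p);
			return p;
		}
		p = nl + 1;
	}
	return NULL;
}
```

Feeding the two chunks `"1 .M foo\n? ba"` and `"r\n"` through outbuf_add() shows the point: after the first chunk the second line is not yet available, and after the second it comes back whole as `? bar`.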
* Re: [PATCH v9 6/6] diff-lib: parallelize run_diff_files for submodules 2023-03-02 22:02 ` [PATCH v9 6/6] diff-lib: parallelize run_diff_files for submodules Calvin Wan @ 2023-03-07 8:41 ` Ævar Arnfjörð Bjarmason 2023-03-07 10:21 ` Ævar Arnfjörð Bjarmason 2023-03-17 1:09 ` Glen Choo 2 siblings, 0 replies; 86+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2023-03-07 8:41 UTC (permalink / raw) To: Calvin Wan; +Cc: git, chooglen, newren, jonathantanmy, phillip.wood123 On Thu, Mar 02 2023, Calvin Wan wrote: Some of this is stuff I probably should have noted in earlier rounds, sorry, but then again the diff-churn in those made it harder to review, now that that's mostly out of the way (yay!) .... > +submodule.diffJobs:: > + Specifies how many submodules are diffed at the same time. A > + positive integer allows up to that number of submodules diffed > + in parallel. A value of 0 will give some reasonable default. > + If unset, it defaults to 1. The diff operation is used by many Nit: Maybe start a new paragraph as of "The diff..."? > + other git commands such as add, merge, diff, status, stash and > + more. Note that the expensive part of the diff operation is Nit: Maybe change 'add', 'merge' etc. to linkgit:git-add[1], or quote them? > + reading the index from cache or memory. Therefore multiple jobs With how much we conflate "the cache" and "index" saying "the index from cache" might be especially confusing. I think we can just skip " from cache or memory" here. > static int match_stat_with_submodule(struct diff_options *diffopt, > const struct cache_entry *ce, > struct stat *st, unsigned ce_option, > - unsigned *dirty_submodule) > + unsigned *dirty_submodule, int *defer_submodule_status, Nit: The other one is an "unsigned", shouldn't "defer_submodule_status" also be (more on this below). 
> + unsigned *ignore_untracked) > { > int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option); > + int defer = 0; > + > if (S_ISGITLINK(ce->ce_mode)) { > struct diff_flags orig_flags = diffopt->flags; > if (!diffopt->flags.override_submodule_config) > set_diffopt_flags_from_submodule_config(diffopt, ce->name); The meaty functional change here looks *much* better, thanks! I.e. this is pretty much what I suggested in https://lore.kernel.org/git/230208.861qn01s4g.gmgdl@evledraar.gmail.com/ > - if (diffopt->flags.ignore_submodules) > + if (diffopt->flags.ignore_submodules) { Not worth a re-roll in itself, but FWIW I think this change would be marginally easier to follow with *a* preceding refactoring change, but per the above & https://lore.kernel.org/git/230209.867cwrzk1l.gmgdl@evledraar.gmail.com/ I just didn't think v7's 6/7 (https://lore.kernel.org/git/20230207181706.363453-7-calvinwan@google.com/) was what we needed there. I.e. in this case a leading change that would add these braces would make this a bit easier to read... > changed = 0; > - else if (!diffopt->flags.ignore_dirty_submodules && ...ditto this line, which would stay the same. > - (!changed || diffopt->flags.dirty_submodules)) > - *dirty_submodule = is_submodule_modified(ce->name, > - diffopt->flags.ignore_untracked_in_submodules); Here you are incorrectly changing the indentation of this away from our usual coding style, which... > + } else if (!diffopt->flags.ignore_dirty_submodules && > + (!changed || diffopt->flags.dirty_submodules)) { > + if (defer_submodule_status && *defer_submodule_status) { Hrm, if I remove that "&& *defer_submodule_status" all of our tests pass; the only two callers of this function are one where this is NULL, and one where it's non-NULL but pre-initialized to 1, and the caller will check if it's then flipped to 0.
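One way to read the observation above is that the pointer only ever needs two meanings: NULL for "do the check now", non-NULL for "tell me if you deferred it". A minimal sketch under that reading, with hypothetical names (`inspect`, `check_now`) that do not appear in the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the expensive per-submodule check. */
static unsigned check_now(void)
{
	return 1;
}

/*
 * `defer` is either NULL ("caller cannot defer, do the work now") or
 * points at a flag the caller initialised to 0 ("set this to 1 if the
 * work was deferred"). The callee never reads the flag's old value.
 */
static void inspect(int needs_check, unsigned *dirty, int *defer)
{
	if (!needs_check)
		return;
	if (defer) {
		*defer = 1;
	} else {
		assert(dirty); /* immediate mode needs a place for the result */
		*dirty = check_now();
	}
}
```

With this shape a caller that starts from `defer = 0` can test the flag afterwards without the callee having to reset it on every path.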
> + defer = 1; > + *ignore_untracked = diffopt->flags.ignore_untracked_in_submodules; > + } else { > + *dirty_submodule = is_submodule_modified(ce->name, > + diffopt->flags.ignore_untracked_in_submodules); ...needlessly inflates the diff here, at least under -w and move detection, as we correctly detect the "*dirty_submodule" line as the same, but the "diffopt->flags" line also has a re-indentation change unrelated to adding the "else" scope. > + } > + } > diffopt->flags = orig_flags; > } > + > + if (defer_submodule_status) > + *defer_submodule_status = defer; Having read this whole thing to the end again I think this on top would be much simpler (if I'm right about it being functionally equivalent), and would address some of the above: diff --git a/diff-lib.c b/diff-lib.c index 7fe6ced9501..d5c823f512a 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -78,7 +78,6 @@ static int match_stat_with_submodule(struct diff_options *diffopt, unsigned *ignore_untracked) { int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option); - int defer = 0; if (S_ISGITLINK(ce->ce_mode)) { struct diff_flags orig_flags = diffopt->flags; @@ -88,8 +87,8 @@ static int match_stat_with_submodule(struct diff_options *diffopt, changed = 0; } else if (!diffopt->flags.ignore_dirty_submodules && (!changed || diffopt->flags.dirty_submodules)) { - if (defer_submodule_status && *defer_submodule_status) { - defer = 1; + if (defer_submodule_status) { + *defer_submodule_status = 1; *ignore_untracked = diffopt->flags.ignore_untracked_in_submodules; } else { *dirty_submodule = is_submodule_modified(ce->name, @@ -99,8 +98,6 @@ static int match_stat_with_submodule(struct diff_options *diffopt, diffopt->flags = orig_flags; } - if (defer_submodule_status) - *defer_submodule_status = defer; return changed; } @@ -153,7 +150,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option) unsigned int newmode; struct cache_entry *ce = istate->cache[i]; int changed; - int defer_submodule_status = 1; + int 
defer_submodule_status = 0; if (diff_can_quit_early(&revs->diffopt)) break; We could also just leave it, but I for one found it a bit hard to follow that this interface seems to be a tri-state (NULL, set to 0, set to 1), but really it's dual-state, i.e. NULL or a "tell me to defer this" bit. > return changed; > } > > @@ -124,6 +140,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option) > ? CE_MATCH_RACY_IS_DIRTY : 0); > uint64_t start = getnanotime(); > struct index_state *istate = revs->diffopt.repo->index; > + struct string_list submodules = STRING_LIST_INIT_NODUP; > > diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/"); > > @@ -136,7 +153,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option) > unsigned int newmode; > struct cache_entry *ce = istate->cache[i]; > int changed; > - unsigned dirty_submodule = 0; > + int defer_submodule_status = 1; Hrm, having suggested the diff above I just noticed this now, I ended up inverting this, but found the "defer_submodule_status" name a bit odd, can't we just keep "unsigned dirty_submodule"? (that would also address the change from "unsigned" to "int" noted above, which is seemingly unnecessary). But maybe I'm missing a subtlety here, and we should have "deferred status" as opposed to "dirty submodule", but in any case the new one looks like it doesn't need negative values. > + } > + if (submodules.nr) { > + unsigned long parallel_jobs; > + struct string_list_item *item; > + > + if (git_config_get_ulong("submodule.diffjobs", &parallel_jobs)) > + parallel_jobs = 1; > + else if (!parallel_jobs) > + parallel_jobs = online_cpus(); Given that online_cpus() returns int the "unsigned long" is slightly odd here, but it's because git_config_get_ulong() exists, but we have no git_config_get_uint(), so this is OK (but could be cleaned up as some #leftoverbits).
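The job-count rule in the quoted hunk (unset means 1, 0 means "use the CPU count", anything else is taken as given) can be written as a small pure function. This is only a sketch: `resolve_jobs` and its parameters are hypothetical, and the clamp to INT_MAX is an assumption added here because the value ends up in an `int` parameter, not something the patch does.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: `is_set` says whether the option was configured,
 * `configured` is its value, `ncpus` is the detected CPU count.
 */
static int resolve_jobs(int is_set, unsigned long configured, int ncpus)
{
	assert(ncpus > 0);
	if (!is_set)
		return 1;		/* unset: stay serial */
	if (!configured)
		return ncpus;		/* 0: pick a default from the machine */
	if (configured > INT_MAX)
		return INT_MAX;		/* assumption: clamp before narrowing */
	return (int)configured;
}
```

Keeping the narrowing in one place like this would make the unsigned long versus int mismatch explicit rather than leaving it to an implicit conversion at the call site.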
> + if (get_submodules_status(&submodules, parallel_jobs)) > + die(_("submodule status failed")); Here we're adding get_submodules_status(), and returning the actual error code from "status", but then ignoring it here, and returning 128 for any non-zero. I think this would be better as either: code = get_submodules_status(...); die_message(...) exit(code); Or to just have the function itself return !!status, i.e. a "ok" or "not ok". Admittedly a nit, but I have spent quite a bit of time chasing down various exit-code losses in the submodule code, and it would be nice if we just carry the code up, or more explicitly ignore it, but don't add code that seems to care about it, but really doesn't. I also changed this "die" to a "BUG" and our tests passed, so we have no tests for when "status" failed, will such a thing even happen in practice? > + for_each_string_list_item(item, &submodules) { > + struct submodule_status_util *util = item->util; > + > + record_file_diff(&revs->diffopt, util->newmode, > + util->dirty_submodule, util->changed, > + istate, util->ce); > + } > } > + string_list_clear(&submodules, 1); > diffcore_std(&revs->diffopt); > diff_flush(&revs->diffopt); > trace_performance_since(start, "diff-files"); > @@ -322,7 +379,7 @@ static int get_stat_data(const struct index_state *istate, > return -1; > } > changed = match_stat_with_submodule(diffopt, ce, &st, > - 0, dirty_submodule); > + 0, dirty_submodule, NULL, NULL); > if (changed) { > mode = ce_mode_from_stat(ce, st.st_mode); > oid = null_oid(); > diff --git a/submodule.c b/submodule.c > index 426074cebb..6f6e150a3f 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -1373,6 +1373,13 @@ int submodule_touches_in_range(struct repository *r, > return ret; > } > > +struct submodule_parallel_status { > + size_t index_count; > + int result; > + > + struct string_list *submodule_names; > +}; Hrm, actually reading a bit more I think part of my comments above are incorrect, i.e. 
this "result" seems like an exit code, but really in the guts of the API we're ignoring the actual code we get, and just setting this to 1. Per the above I think it might be OK to ignore the exit code (or not), but I really wish we did this more explicitly, e.g. if you want to ignore it call this something like "failed", not "result", and make it an "unsigned int failed:1" to firmly indicate that it's a boolean at the API level. > +struct status_task { > + const char *path; I think we should call this "ce_path", but more on that below. > + struct strbuf out; > + int ignore_untracked; Continued type mismatch commentary: Elsewhere in this diff this is "unsigned", and this compiles for me if I make it "unsigned int ignore_untracked:1", so let's set it to such a flag instead? > +static int status_finish(int retvalue, struct strbuf *err, > + void *cb, void *task_cb) > +{ > + struct submodule_parallel_status *sps = cb; > + struct status_task *task = task_cb; > + struct string_list_item *it = > + string_list_lookup(sps->submodule_names, task->path); > + struct submodule_status_util *util = it->util; > + struct string_list list = STRING_LIST_INIT_DUP; > + struct string_list_item *item; > + > + if (retvalue) { > + sps->result = 1; > + strbuf_addf(err, _(STATUS_PORCELAIN_FAIL_ERROR), task->path); > + } > + > + string_list_split(&list, task->out.buf, '\n', -1); I think I noted in some earlier round that taking a string and splitting it by \n was a bit wasteful in the test code, but this uses the same pattern. Maybe it's not a performance concern here either, but won't we potentially have to parse some very large statuses here? Aside from that, I haven't tried or reviewed this bit in detail, but this seems to be making things harder than they need to be. Why are we buffering up all of the output into "out" here, only to split it by "\n" later on, and then consider each line as a status line? 
Shouldn't we be allocating this string_list to begin with, and append to it in the "status_on_stderr_output" callback instead? > + for_each_string_list_item(item, &list) { > + if (parse_status_porcelain(item->string, > + strlen(item->string), > + &util->dirty_submodule, > + util->ignore_untracked)) OK, this seemingly buggy bit of error handling seems to actually be OK on further review, because we'll BUG() out in the function if it fails, so the non-zero return here just means "we're done here". > + break; > + } Style: drop the braces here, as this is just a for/if/body with a single body line. > +int get_submodules_status(struct string_list *submodules, > + int max_parallel_jobs) > +{ > + struct submodule_parallel_status sps = { > + .submodule_names = submodules, > + }; > + const struct run_process_parallel_opts opts = { > + .tr2_category = "submodule", > + .tr2_label = "parallel/status", > + > + .processes = max_parallel_jobs, > + > + .get_next_task = get_next_submodule_status, > + .start_failure = status_start_failure, > + .on_stderr_output = status_on_stderr_output, > + .task_finished = status_finish, > + .data = &sps, > + }; > + > + string_list_sort(sps.submodule_names); > + run_processes_parallel(&opts); > + > + return sps.result; All OK, except as noted above the "result" here is just "did we fail?". > +} > + > int submodule_uses_gitfile(const char *path) > { > struct child_process cp = CHILD_PROCESS_INIT; > diff --git a/submodule.h b/submodule.h > index b52a4ff1e7..08d278a414 100644 > --- a/submodule.h > +++ b/submodule.h > @@ -41,6 +41,13 @@ struct submodule_update_strategy { > .type = SM_UPDATE_UNSPECIFIED, \ > } > > +struct submodule_status_util { > + int changed, ignore_untracked; > + unsigned dirty_submodule, newmode; > + struct cache_entry *ce; > + const char *path; Re "ce_path" above: What's the point of adding a "path" here if we already have "ce"? You just seem to assign "path" to "ce->name" always. 
I tried this fix-up on top & it worked: diff --git a/diff-lib.c b/diff-lib.c index d5c823f512a..39d8179f0ed 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -294,7 +294,6 @@ int run_diff_files(struct rev_info *revs, unsigned int option) .ignore_untracked = ignore_untracked, .newmode = newmode, .ce = ce, - .path = ce->name, }; struct string_list_item *item; diff --git a/submodule.c b/submodule.c index 3eba00f1533..c220d85815a 100644 --- a/submodule.c +++ b/submodule.c @@ -2002,11 +2002,11 @@ get_status_task_from_index(struct submodule_parallel_status *sps, struct submodule_status_util *util = sps->submodule_names->items[sps->index_count].util; struct status_task *task; - if (!verify_submodule_git_directory(util->path)) + if (!verify_submodule_git_directory(util->ce->name)) continue; task = xmalloc(sizeof(*task)); - task->path = util->path; + task->path = util->ce->name; task->ignore_untracked = util->ignore_untracked; strbuf_init(&task->out, 0); sps->index_count++; diff --git a/submodule.h b/submodule.h index 3b6abca05cd..3427c495573 100644 --- a/submodule.h +++ b/submodule.h @@ -45,7 +45,6 @@ struct submodule_status_util { int changed, ignore_untracked; unsigned dirty_submodule, newmode; struct cache_entry *ce; - const char *path; }; int is_gitmodules_unmerged(struct index_state *istate); I'd be all for actually narrowing the scope of data we get in general, i.e. do we need all of the "ce" members? I didn't check, but doing this just seems like needless duplication. > @@ -94,6 +101,8 @@ int fetch_submodules(struct repository *r, > int command_line_option, > int default_option, > int quiet, int max_parallel_jobs); > +int get_submodules_status(struct string_list *submodules, > + int max_parallel_jobs); It would be nice to get some API docs for the new function, re its "result" behavior etc. 
noted above > unsigned is_submodule_modified(const char *path, int ignore_untracked); > int submodule_uses_gitfile(const char *path); > > diff --git a/t/t4027-diff-submodule.sh b/t/t4027-diff-submodule.sh > index 40164ae07d..1c747cc325 100755 > --- a/t/t4027-diff-submodule.sh > +++ b/t/t4027-diff-submodule.sh > @@ -34,6 +34,25 @@ test_expect_success setup ' > subtip=$3 subprev=$2 > ' > > +test_expect_success 'diff in superproject with submodules respects parallel settings' ' > + test_when_finished "rm -f trace.out" && > + ( > + GIT_TRACE=$(pwd)/trace.out git diff && > + grep "1 tasks" trace.out && > + >trace.out && > + > + git config submodule.diffJobs 8 && > + GIT_TRACE=$(pwd)/trace.out git diff && > + grep "8 tasks" trace.out && > + >trace.out && > + > + GIT_TRACE=$(pwd)/trace.out git -c submodule.diffJobs=0 diff && > + grep "preparing to run up to [0-9]* tasks" trace.out && > + ! grep "up to 0 tasks" trace.out && > + >trace.out > + ) > +' > + > test_expect_success 'git diff --raw HEAD' ' > hexsz=$(test_oid hexsz) && > git diff --raw --abbrev=$hexsz HEAD >actual && > @@ -70,6 +89,18 @@ test_expect_success 'git diff HEAD with dirty submodule (work tree)' ' > test_cmp expect.body actual.body > ' > > +test_expect_success 'git diff HEAD with dirty submodule (work tree, parallel)' ' > + ( > + cd sub && > + git reset --hard && > + echo >>world > + ) && > + git -c submodule.diffJobs=8 diff HEAD >actual && > + sed -e "1,/^@@/d" actual >actual.body && > + expect_from_to >expect.body $subtip $subprev-dirty && > + test_cmp expect.body actual.body > +' > + > test_expect_success 'git diff HEAD with dirty submodule (index)' ' > ( > cd sub && > diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh > index d050091345..7da64e4c4c 100755 > --- a/t/t7506-status-submodule.sh > +++ b/t/t7506-status-submodule.sh > @@ -412,4 +412,29 @@ test_expect_success 'status with added file in nested submodule (short)' ' > EOF > ' > > +test_expect_success 'status in superproject 
with submodules respects parallel settings' ' > + test_when_finished "rm -f trace.out" && > + ( > + GIT_TRACE=$(pwd)/trace.out git status && > + grep "1 tasks" trace.out && > + >trace.out && > + > + git config submodule.diffJobs 8 && > + GIT_TRACE=$(pwd)/trace.out git status && > + grep "8 tasks" trace.out && > + >trace.out && > + > + GIT_TRACE=$(pwd)/trace.out git -c submodule.diffJobs=0 status && > + grep "preparing to run up to [0-9]* tasks" trace.out && > + ! grep "up to 0 tasks" trace.out && > + >trace.out > + ) > +' > + > +test_expect_success 'status in superproject with submodules (parallel)' ' > + git -C super status --porcelain >output && > + git -C super -c submodule.diffJobs=8 status --porcelain >output_parallel && > + diff output output_parallel Shouldn't this be a "test_cmp" instead of "diff", and use "actual" and "expect" instead of "output" and "output_parallel"? I'd also rename the test to something like "output with submodule.diffJobs=N equals submodule.diffJobs=1". Except is that even correct? Don't we need to set submodule.diffJobs=1 explicitly so it doesn't default to online_cpus() here? Maybe I missed an earlier config setup... ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v9 6/6] diff-lib: parallelize run_diff_files for submodules
  2023-03-02 22:02 ` [PATCH v9 6/6] diff-lib: parallelize run_diff_files for submodules Calvin Wan
  2023-03-07  8:41   ` Ævar Arnfjörð Bjarmason
@ 2023-03-07 10:21   ` Ævar Arnfjörð Bjarmason
  2023-03-07 17:55     ` Junio C Hamano
  2023-03-17  1:09   ` Glen Choo
  2 siblings, 1 reply; 86+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-03-07 10:21 UTC (permalink / raw)
  To: Calvin Wan; +Cc: git, chooglen, newren, jonathantanmy, phillip.wood123

On Thu, Mar 02 2023, Calvin Wan wrote:

> +		if (git_config_get_ulong("submodule.diffjobs", &parallel_jobs))
> +			parallel_jobs = 1;

Something I missed when eyeballing this in my just-sent review, here we
have a "revs->repo" already, so let's not fall back on "the_repository",
but use it. I think you want this as a fix-up:

diff --git a/diff-lib.c b/diff-lib.c
index 925d64ff58c..ec8a0f98085 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -312,7 +312,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 		unsigned long parallel_jobs;
 		struct string_list_item *item;
 
-		if (git_config_get_ulong("submodule.diffjobs", &parallel_jobs))
+		if (repo_config_get_ulong(revs->repo, "submodule.diffjobs",
+					  &parallel_jobs))
 			parallel_jobs = 1;
 		else if (!parallel_jobs)
 			parallel_jobs = online_cpus();

^ permalink raw reply	[flat|nested] 86+ messages in thread
* Re: [PATCH v9 6/6] diff-lib: parallelize run_diff_files for submodules
  2023-03-07 10:21   ` Ævar Arnfjörð Bjarmason
@ 2023-03-07 17:55     ` Junio C Hamano
  0 siblings, 0 replies; 86+ messages in thread
From: Junio C Hamano @ 2023-03-07 17:55 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Calvin Wan, git, chooglen, newren, jonathantanmy, phillip.wood123

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Thu, Mar 02 2023, Calvin Wan wrote:
>
>> +		if (git_config_get_ulong("submodule.diffjobs", &parallel_jobs))
>> +			parallel_jobs = 1;
>
> Something I missed when eyeballing this in my just-sent review, here we
> have a "revs->repo" already, so let's not fall back on "the_repository",
> but use it. I think you want this as a fix-up:
>
> diff --git a/diff-lib.c b/diff-lib.c
> index 925d64ff58c..ec8a0f98085 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -312,7 +312,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>  		unsigned long parallel_jobs;
>  		struct string_list_item *item;
>  
> -		if (git_config_get_ulong("submodule.diffjobs", &parallel_jobs))
> +		if (repo_config_get_ulong(revs->repo, "submodule.diffjobs",
> +					  &parallel_jobs))
>  			parallel_jobs = 1;
>  		else if (!parallel_jobs)
>  			parallel_jobs = online_cpus();

Good eyes. Thanks for a careful review.

^ permalink raw reply	[flat|nested] 86+ messages in thread
* Re: [PATCH v9 6/6] diff-lib: parallelize run_diff_files for submodules 2023-03-02 22:02 ` [PATCH v9 6/6] diff-lib: parallelize run_diff_files for submodules Calvin Wan 2023-03-07 8:41 ` Ævar Arnfjörð Bjarmason 2023-03-07 10:21 ` Ævar Arnfjörð Bjarmason @ 2023-03-17 1:09 ` Glen Choo 2023-03-17 2:51 ` Glen Choo 2 siblings, 1 reply; 86+ messages in thread From: Glen Choo @ 2023-03-17 1:09 UTC (permalink / raw) To: Calvin Wan, git Cc: Calvin Wan, avarab, newren, jonathantanmy, phillip.wood123 I haven't verified if the code in this version is correct or not, as I found it a bit difficult to follow through the churn. After reading this series again, I've established a better mental model of the code, and I think there are some renames and documentation changes we can make to make this clearer. Unfortunately, I think the biggest clarification would be _yet_ another refactor, and I'm not sure if we actually want to bear so much churn. I might do this refactor locally to see if it really is _much_ cleaner or not. If anyone has thoughts on the refactor, do chime in. Calvin Wan <calvinwan@google.com> writes: > diff --git a/diff-lib.c b/diff-lib.c > index 744ae98a69..7fe6ced950 100644 > --- a/diff-lib.c > +++ b/diff-lib.c > @@ -65,26 +66,41 @@ static int check_removed(const struct index_state *istate, const struct cache_en > * Return 1 when changes are detected, 0 otherwise. If the DIRTY_SUBMODULES > * option is set, the caller does not only want to know if a submodule is > * modified at all but wants to know all the conditions that are met (new > - * commits, untracked content and/or modified content). > + * commits, untracked content and/or modified content). If > + * defer_submodule_status bit is set, dirty_submodule will be left to the > + * caller to set. defer_submodule_status can also be set to 0 in this > + * function if there is no need to check if the submodule is modified. 
> */ > static int match_stat_with_submodule(struct diff_options *diffopt, > const struct cache_entry *ce, > struct stat *st, unsigned ce_option, > - unsigned *dirty_submodule) > + unsigned *dirty_submodule, int *defer_submodule_status, > + unsigned *ignore_untracked) > { > int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option); > + int defer = 0; > + > if (S_ISGITLINK(ce->ce_mode)) { > struct diff_flags orig_flags = diffopt->flags; > if (!diffopt->flags.override_submodule_config) > set_diffopt_flags_from_submodule_config(diffopt, ce->name); > - if (diffopt->flags.ignore_submodules) > + if (diffopt->flags.ignore_submodules) { > changed = 0; > - else if (!diffopt->flags.ignore_dirty_submodules && > - (!changed || diffopt->flags.dirty_submodules)) > - *dirty_submodule = is_submodule_modified(ce->name, > - diffopt->flags.ignore_untracked_in_submodules); > + } else if (!diffopt->flags.ignore_dirty_submodules && > + (!changed || diffopt->flags.dirty_submodules)) { > + if (defer_submodule_status && *defer_submodule_status) { > + defer = 1; > + *ignore_untracked = diffopt->flags.ignore_untracked_in_submodules; > + } else { > + *dirty_submodule = is_submodule_modified(ce->name, > + diffopt->flags.ignore_untracked_in_submodules); > + } > + } > diffopt->flags = orig_flags; > } > + > + if (defer_submodule_status) > + *defer_submodule_status = defer; The crux of this patch is that we are replacing some serial operation with a parallel operation. The replacement happens here, where we are replacing is_submodule_modified() by 'deferring' it. So to verify if the parallel implementation is correct, we should compare the "setup" and "finish" steps in is_submodule_modified() and get_submodules_status(). Eyeballing it, it looks correct, especially because we made sure to refactor out the shared logic in previous patches. To reflect this, I think it would be clearer to rename get_submodules_status() to something similar (e.g. 
are_submodules_modified_parallel()), with an explicit comment saying that it is meant to be a parallel implementation of is_submodule_modified(). Except, I told a little white lie in the previous paragraph, because get_submodules_status() isn't _just_ a parallel implementation of is_submodule_modified()... > @@ -268,13 +286,52 @@ int run_diff_files(struct rev_info *revs, unsigned int option) > } > > changed = match_stat_with_submodule(&revs->diffopt, ce, &st, > - ce_option, &dirty_submodule); > + ce_option, NULL, > + &defer_submodule_status, > + &ignore_untracked); > newmode = ce_mode_from_stat(ce, st.st_mode); > + if (defer_submodule_status) { > + struct submodule_status_util tmp = { > + .changed = changed, > + .dirty_submodule = 0, > + .ignore_untracked = ignore_untracked, > + .newmode = newmode, > + .ce = ce, > + .path = ce->name, > + }; > + struct string_list_item *item; > + > + item = string_list_append(&submodules, ce->name); > + item->util = xmalloc(sizeof(tmp)); > + memcpy(item->util, &tmp, sizeof(tmp)); > + continue; > + } because get_submodules_status() doesn't just contain the results of the parallel processes, it is _also_ shuttling "changed" and "ignore_untracked" from match_stat_with_submodule(), as well as .newmode, .ce and .path from run_diff_files() (basically everything except .dirty_submodule)... 
> } > > - record_file_diff(&revs->diffopt, newmode, dirty_submodule, > - changed, istate, ce); > + if (!defer_submodule_status) > + record_file_diff(&revs->diffopt, newmode, 0, > + changed,istate, ce); > + } > + if (submodules.nr) { > + unsigned long parallel_jobs; > + struct string_list_item *item; > + > + if (git_config_get_ulong("submodule.diffjobs", &parallel_jobs)) > + parallel_jobs = 1; > + else if (!parallel_jobs) > + parallel_jobs = online_cpus(); > + > + if (get_submodules_status(&submodules, parallel_jobs)) > + die(_("submodule status failed")); > + for_each_string_list_item(item, &submodules) { > + struct submodule_status_util *util = item->util; > + > + record_file_diff(&revs->diffopt, util->newmode, > + util->dirty_submodule, util->changed, > + istate, util->ce); > + } so that we can pass all of this back into record_file_diff(). The only member that is changed by the parallel process is .dirty_submodule, which is exactly what we would expect from a parallel version of is_submodule_modified(). If we don't want to do a bigger refactor, I think we should also add comments to members of "struct submodule_status_util" to document where they come from and what they are used for. The rest of the comments are refactor-related. It would be good if we could avoid mixing unrelated information sources in "struct submodule_status_util", since a) this makes it very tightly coupled to run_diff_files() and b) it causes us to repeat ourselves in the same function (.changed = changed, record_file_diff()). The only reason why the code looks this way right now is that match_stat_with_submodule() sets defer_submodule_status based on whether or not we should ignore the submodule, and this eventually tells get_submodules_status() which submodules it needs to care about. But, deciding whether to spawn a subprocess for which submodule is exactly what the .get_next_task member is for. 
> diff --git a/submodule.c b/submodule.c > index 426074cebb..6f6e150a3f 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -1981,6 +1994,121 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) > return dirty_submodule; > } > > +static struct status_task * > +get_status_task_from_index(struct submodule_parallel_status *sps, > + struct strbuf *err) > +{ > + for (; sps->index_count < sps->submodule_names->nr; sps->index_count++) { > + struct submodule_status_util *util = sps->submodule_names->items[sps->index_count].util; > + struct status_task *task; > + > + if (!verify_submodule_git_directory(util->path)) > + continue; So right here, we could use the "check if this submodule should be ignored" logic from match_stat_with_submodule() to decide whether or not to spawn the subprocess. IOW, I am advocating for get_submodules_status() to be a parallel version of match_stat_with_submodule() (not a parallel version of is_submodule_modified() that shuttles extra information). Another sign that this refactor is a good idea is that it lets us simplify _existing_ submodule logic in run_diff_files(). Prior to this patch, we have: unsigned dirty_submodule = 0; ... changed = match_stat_with_submodule(&revs->diffopt, ce, &st, ce_option, NULL, &defer_submodule_status, &ignore_untracked); // If submodule was deferred, shuttle a bunch of information // If not, call record_file_diff() but the body of match_stat_with_submodule() is just ie_match_stat() + some additional submodule logic. Post refactor, this would look something like: struct string_list submodules; ... // For any submodule, just append it to a list and let the // parallel thing take care of it. if (S_ISGITLINK(ce->ce_mode)) { // Probably pass .newmode and .ce to the util too... string_list_append(&submodules, ce->name); } else { changed = ie_match_stat(foo, bar, baz); record_file_diff(); } ... 
if (submodules.nr) { parallel_match_stat_with_submodule_wip_name(&submodules); for_each_string_list_item(item, &submodules) { record_file_diff(&item); } } Which I think is easier to follow, since we won't need defer_submodule_status any more, and we don't shuttle information from match_stat_with_submodule(). Though I'm a bit unhappy that it's still pretty coupled to run_diff_files() (it still has to shuttle .newmode, .ce). Also, I don't think this refactor lets us avoid the refactors we did in the previous patches. > + > + task = xmalloc(sizeof(*task)); > + task->path = util->path; > + task->ignore_untracked = util->ignore_untracked; > + strbuf_init(&task->out, 0); > + sps->index_count++; > + return task; > + } > + return NULL; > +} > + > +static int get_next_submodule_status(struct child_process *cp, > + struct strbuf *err, void *data, > + void **task_cb) > +{ > + struct submodule_parallel_status *sps = data; > + struct status_task *task = get_status_task_from_index(sps, err); As an aside, I think we can inline get_status_task_from_index(). I suspect this pattern was copied from get_next_submodule(), which gets fetch tasks from two different places (hence _from_index and _from_changed), but here I don't think we will ever get status tasks from more than one place. ^ permalink raw reply [flat|nested] 86+ messages in thread
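For readers following the refactor discussion above, here is a minimal sketch of the "queue submodules now, record them after a batch step" shape being proposed. It is not code from the series: `struct entry`, `struct record_log`, `record()`, `toy_status()` and `diff_entries()` are all hypothetical stand-ins, and the batch step runs serially here where the real code would hand the queue to the parallel machinery.

```c
#include <stddef.h>

#define LOG_CAP 16

/*
 * Hypothetical stand-ins for illustration only; none of these are
 * git's real types or functions.
 */
struct entry {
	const char *name;
	int is_submodule; /* stands in for S_ISGITLINK(ce->ce_mode) */
};

struct record_log {
	size_t nr;
	const char *names[LOG_CAP];
	unsigned dirty[LOG_CAP];
};

/* Stand-in for record_file_diff(): remember what was recorded, in order. */
void record(struct record_log *rl, const struct entry *e, unsigned dirty)
{
	if (rl->nr < LOG_CAP) {
		rl->names[rl->nr] = e->name;
		rl->dirty[rl->nr] = dirty;
		rl->nr++;
	}
}

/* Toy status check (assumption): names starting with 'd' count as dirty. */
unsigned toy_status(const char *name)
{
	return name[0] == 'd';
}

/*
 * First pass: record regular entries immediately and queue submodules.
 * Second pass: compute the status of every queued submodule (the step
 * that real code would run in parallel), then record them.
 * Returns the number of deferred submodules.
 */
size_t diff_entries(const struct entry *entries, size_t nr,
		    const struct entry **queue, size_t queue_cap,
		    struct record_log *rl)
{
	size_t queued = 0, i;

	for (i = 0; i < nr; i++) {
		const struct entry *e = &entries[i];

		if (!e->is_submodule)
			record(rl, e, 0);
		else if (queued < queue_cap)
			queue[queued++] = e;
		else
			record(rl, e, toy_status(e->name)); /* serial fallback */
	}

	for (i = 0; i < queued; i++)
		record(rl, queue[i], toy_status(queue[i]->name));

	return queued;
}
```

One consequence worth noting in this sketch: deferred submodules are recorded after all regular entries, so the recording order differs from a purely serial walk.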
* Re: [PATCH v9 6/6] diff-lib: parallelize run_diff_files for submodules 2023-03-17 1:09 ` Glen Choo @ 2023-03-17 2:51 ` Glen Choo 0 siblings, 0 replies; 86+ messages in thread From: Glen Choo @ 2023-03-17 2:51 UTC (permalink / raw) To: Calvin Wan, git Cc: Calvin Wan, avarab, newren, jonathantanmy, phillip.wood123 Glen Choo <chooglen@google.com> writes: > It would be good if we could avoid mixing unrelated information sources > in "struct submodule_status_util", since a) this makes it very tightly > coupled to run_diff_files() and b) it causes us to repeat ourselves in > the same function (.changed = changed, record_file_diff()). > > The only reason why the code looks this way right now is that > match_stat_with_submodule() sets defer_submodule_status based on whether > or not we should ignore the submodule, and this eventually tells > get_submodules_status() which submodules it needs to care about. But, > deciding whether to spawn a subprocess for which submodule is exactly > what the .get_next_task member is for. > >> diff --git a/submodule.c b/submodule.c >> index 426074cebb..6f6e150a3f 100644 >> --- a/submodule.c >> +++ b/submodule.c >> @@ -1981,6 +1994,121 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) >> return dirty_submodule; >> } >> >> +static struct status_task * >> +get_status_task_from_index(struct submodule_parallel_status *sps, >> + struct strbuf *err) >> +{ >> + for (; sps->index_count < sps->submodule_names->nr; sps->index_count++) { >> + struct submodule_status_util *util = sps->submodule_names->items[sps->index_count].util; >> + struct status_task *task; >> + >> + if (!verify_submodule_git_directory(util->path)) >> + continue; > > So right here, we could use the "check if this submodule should be > ignored" logic from match_stat_with_submodule() to decide whether or not > to spawn the subprocess. 
IOW, I am advocating for > get_submodules_status() to be a parallel version of > match_stat_with_submodule() (not a parallel version of > is_submodule_modified() that shuttles extra information). It turns out to be quite difficult to implement a parallel match_stat_with_submodule(): a) we can't remove it because it still has another caller, and b) its internals are quite hard to refactor: one conditional arm depends on "changed", which is set by calling ie_match_stat(), which in turn requires the "struct stat" to have already been lstat()-ed... So even though this series adds a lot, it is just about as minimally invasive as possible. I suspect that there are some possible cleanups down the line, e.g. is_submodule_modified() is rightfully only called by diff-lib.c, so I think it should be a static function there. And once we move that, we can make our parallel function static, and then we don't have to worry about tight coupling to run_diff_files(). To keep the range-diff manageable, that can be left for a future cleanup though. ^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v8 1/6] run-command: add duplicate_output_fn to run_processes_parallel_opts 2023-02-07 18:16 ` [PATCH v7 0/7] " Calvin Wan 2023-02-08 0:55 ` Ævar Arnfjörð Bjarmason 2023-02-09 0:02 ` [PATCH v8 0/6] " Calvin Wan @ 2023-02-09 0:02 ` Calvin Wan 2023-02-13 6:34 ` Glen Choo 2023-02-09 0:02 ` [PATCH v8 2/6] submodule: strbuf variable rename Calvin Wan ` (4 subsequent siblings) 7 siblings, 1 reply; 86+ messages in thread From: Calvin Wan @ 2023-02-09 0:02 UTC (permalink / raw) To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy, phillip.wood123 Add duplicate_output_fn as an optionally set function in run_process_parallel_opts. If set, output from each child process is copied and passed to the callback function whenever output from the child process is buffered to allow for separate parsing. Fix two items in pp_buffer_stderr: * strbuf_read_once returns a ssize_t but the variable it is set to is an int so fix that. * Add missing brackets to "else if" statement The ungroup/duplicate_output incompatibility check is nested to prepare for future modes that are incompatible with ungroup. 
Signed-off-by: Calvin Wan <calvinwan@google.com> --- run-command.c | 16 ++++++++++++--- run-command.h | 25 ++++++++++++++++++++++++ t/helper/test-run-command.c | 20 +++++++++++++++++++ t/t0061-run-command.sh | 39 +++++++++++++++++++++++++++++++++++++ 4 files changed, 97 insertions(+), 3 deletions(-) diff --git a/run-command.c b/run-command.c index 756f1839aa..50f741f2ab 100644 --- a/run-command.c +++ b/run-command.c @@ -1526,6 +1526,11 @@ static void pp_init(struct parallel_processes *pp, if (!opts->get_next_task) BUG("you need to specify a get_next_task function"); + if (opts->ungroup) { + if (opts->duplicate_output) + BUG("duplicate_output and ungroup are incompatible with each other"); + } + CALLOC_ARRAY(pp->children, n); if (!opts->ungroup) CALLOC_ARRAY(pp->pfd, n); @@ -1645,14 +1650,19 @@ static void pp_buffer_stderr(struct parallel_processes *pp, for (size_t i = 0; i < opts->processes; i++) { if (pp->children[i].state == GIT_CP_WORKING && pp->pfd[i].revents & (POLLIN | POLLHUP)) { - int n = strbuf_read_once(&pp->children[i].err, - pp->children[i].process.err, 0); + ssize_t n = strbuf_read_once(&pp->children[i].err, + pp->children[i].process.err, 0); if (n == 0) { close(pp->children[i].process.err); pp->children[i].state = GIT_CP_WAIT_CLEANUP; - } else if (n < 0) + } else if (n < 0) { if (errno != EAGAIN) die_errno("read"); + } else if (opts->duplicate_output) { + opts->duplicate_output(&pp->children[i].err, + pp->children[i].err.len - n, + opts->data, pp->children[i].data); + } } } } diff --git a/run-command.h b/run-command.h index 072db56a4d..0c16d7f251 100644 --- a/run-command.h +++ b/run-command.h @@ -408,6 +408,25 @@ typedef int (*start_failure_fn)(struct strbuf *out, void *pp_cb, void *pp_task_cb); +/** + * This callback is called whenever output from a child process is buffered + * + * See run_processes_parallel() below for a discussion of the "struct + * strbuf *out" parameter. 
+ * + * The offset refers to the number of bytes originally in "out" before + * the output from the child process was buffered. Therefore, the buffer + * range, "out + buf" to the end of "out", would contain the buffer of + * the child process output. + * + * pp_cb is the callback cookie as passed into run_processes_parallel, + * pp_task_cb is the callback cookie as passed into get_next_task_fn. + * + * This function is incompatible with "ungroup" + */ +typedef void (*duplicate_output_fn)(struct strbuf *out, size_t offset, + void *pp_cb, void *pp_task_cb); + /** * This callback is called on every child process that finished processing. * @@ -461,6 +480,12 @@ struct run_process_parallel_opts */ start_failure_fn start_failure; + /** + * duplicate_output: See duplicate_output_fn() above. Unless you need + * to capture output from child processes, leave this as NULL. + */ + duplicate_output_fn duplicate_output; + /** * task_finished: See task_finished_fn() above. This can be * NULL to omit any special handling. 
diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c index 3ecb830f4a..4596ba68a8 100644 --- a/t/helper/test-run-command.c +++ b/t/helper/test-run-command.c @@ -52,6 +52,20 @@ static int no_job(struct child_process *cp, return 0; } +static void duplicate_output(struct strbuf *out, + size_t offset, + void *pp_cb UNUSED, + void *pp_task_cb UNUSED) +{ + struct string_list list = STRING_LIST_INIT_DUP; + struct string_list_item *item; + + string_list_split(&list, out->buf + offset, '\n', -1); + for_each_string_list_item(item, &list) + fprintf(stderr, "duplicate_output: %s\n", item->string); + string_list_clear(&list, 0); +} + static int task_finished(int result, struct strbuf *err, void *pp_cb, @@ -439,6 +453,12 @@ int cmd__run_command(int argc, const char **argv) opts.ungroup = 1; } + if (!strcmp(argv[1], "--duplicate-output")) { + argv += 1; + argc -= 1; + opts.duplicate_output = duplicate_output; + } + jobs = atoi(argv[2]); strvec_clear(&proc.args); strvec_pushv(&proc.args, (const char **)argv + 3); diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh index e2411f6a9b..31f1db96fc 100755 --- a/t/t0061-run-command.sh +++ b/t/t0061-run-command.sh @@ -135,6 +135,15 @@ test_expect_success 'run_command runs in parallel with more jobs available than test_cmp expect actual ' +test_expect_success 'run_command runs in parallel with more jobs available than tasks --duplicate-output' ' + test-tool run-command --duplicate-output run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test 4 = $(grep -c "duplicate_output: Hello" err) && + test 4 = $(grep -c "duplicate_output: World" err) && + sed "/duplicate_output/d" err >err1 && + test_cmp expect err1 +' + test_expect_success 'run_command runs ungrouped in parallel with more jobs available than tasks' ' test-tool run-command --ungroup run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && test_line_count = 8 out && @@ -147,6 
+156,15 @@ test_expect_success 'run_command runs in parallel with as many jobs as tasks' ' test_cmp expect actual ' +test_expect_success 'run_command runs in parallel with as many jobs as tasks --duplicate-output' ' + test-tool run-command --duplicate-output run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test 4 = $(grep -c "duplicate_output: Hello" err) && + test 4 = $(grep -c "duplicate_output: World" err) && + sed "/duplicate_output/d" err >err1 && + test_cmp expect err1 +' + test_expect_success 'run_command runs ungrouped in parallel with as many jobs as tasks' ' test-tool run-command --ungroup run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && test_line_count = 8 out && @@ -159,6 +177,15 @@ test_expect_success 'run_command runs in parallel with more tasks than jobs avai test_cmp expect actual ' +test_expect_success 'run_command runs in parallel with more tasks than jobs available --duplicate-output' ' + test-tool run-command --duplicate-output run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test 4 = $(grep -c "duplicate_output: Hello" err) && + test 4 = $(grep -c "duplicate_output: World" err) && + sed "/duplicate_output/d" err >err1 && + test_cmp expect err1 +' + test_expect_success 'run_command runs ungrouped in parallel with more tasks than jobs available' ' test-tool run-command --ungroup run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && test_line_count = 8 out && @@ -180,6 +207,12 @@ test_expect_success 'run_command is asked to abort gracefully' ' test_cmp expect actual ' +test_expect_success 'run_command is asked to abort gracefully --duplicate-output' ' + test-tool run-command --duplicate-output run-command-abort 3 false >out 2>err && + test_must_be_empty out && + test_cmp expect err +' + test_expect_success 'run_command is asked to abort gracefully (ungroup)' ' test-tool run-command 
--ungroup run-command-abort 3 false >out 2>err && test_must_be_empty out && @@ -196,6 +229,12 @@ test_expect_success 'run_command outputs ' ' test_cmp expect actual ' +test_expect_success 'run_command outputs --duplicate-output' ' + test-tool run-command --duplicate-output run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test_cmp expect err +' + test_expect_success 'run_command outputs (ungroup) ' ' test-tool run-command --ungroup run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && test_must_be_empty out && -- 2.39.1.519.gcb327c4b5f-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
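To make the "offset" parameter of the new callback concrete, here is a small self-contained sketch. It does not use git's strbuf or run-command API: `struct outbuf`, `append_and_notify()` and `count_new_lines()` are hypothetical names invented for illustration. The idea it shows is the one described in the doc comment: the callback receives the whole buffer plus the length it had before the latest read, so the newly buffered bytes are the range from `data + offset` to the end.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size buffer; git's real code uses "struct strbuf". */
struct outbuf {
	char data[256];
	size_t len;
};

/* Same shape as the callback in the patch, minus git-specific types. */
typedef void (*on_output_fn)(const struct outbuf *out, size_t offset,
			     void *cookie);

/*
 * Append one chunk (as a single read from the child might deliver it)
 * and then tell the callback where the new bytes start.
 * Returns 0 on success, -1 if the chunk does not fit.
 */
int append_and_notify(struct outbuf *out, const char *chunk, size_t n,
		      on_output_fn cb, void *cookie)
{
	size_t offset = out->len;

	if (n > sizeof(out->data) - out->len)
		return -1;
	memcpy(out->data + out->len, chunk, n);
	out->len += n;
	if (cb && n)
		cb(out, offset, cookie);
	return 0;
}

/* Example callback: count newlines in the newly appended bytes only. */
void count_new_lines(const struct outbuf *out, size_t offset, void *cookie)
{
	size_t *lines = cookie;
	size_t i;

	for (i = offset; i < out->len; i++)
		if (out->data[i] == '\n')
			(*lines)++;
}
```

Note that a chunk can end in the middle of a line, so a callback that parses line-oriented output cannot assume each invocation sees whole lines. That is the same concern the cover letter raises about parsing status output before the child has finished.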
* Re: [PATCH v8 1/6] run-command: add duplicate_output_fn to run_processes_parallel_opts 2023-02-09 0:02 ` [PATCH v8 1/6] run-command: add duplicate_output_fn to run_processes_parallel_opts Calvin Wan @ 2023-02-13 6:34 ` Glen Choo 2023-02-13 17:52 ` Junio C Hamano 0 siblings, 1 reply; 86+ messages in thread From: Glen Choo @ 2023-02-13 6:34 UTC (permalink / raw) To: Calvin Wan, git Cc: Calvin Wan, avarab, newren, jonathantanmy, phillip.wood123 Calvin Wan <calvinwan@google.com> writes: > @@ -1645,14 +1650,19 @@ static void pp_buffer_stderr(struct parallel_processes *pp, > for (size_t i = 0; i < opts->processes; i++) { > if (pp->children[i].state == GIT_CP_WORKING && > pp->pfd[i].revents & (POLLIN | POLLHUP)) { > - int n = strbuf_read_once(&pp->children[i].err, > - pp->children[i].process.err, 0); > + ssize_t n = strbuf_read_once(&pp->children[i].err, > + pp->children[i].process.err, 0); > if (n == 0) { > close(pp->children[i].process.err); > pp->children[i].state = GIT_CP_WAIT_CLEANUP; > - } else if (n < 0) > + } else if (n < 0) { > if (errno != EAGAIN) > die_errno("read"); > + } else if (opts->duplicate_output) { > + opts->duplicate_output(&pp->children[i].err, > + pp->children[i].err.len - n, > + opts->data, pp->children[i].data); > + } > } > } > } What do we think of the name "duplicate_output"? IMO it made sense in earlier versions when we were copying the output to a separate buffer (I believe it was renamed in response to [1]), but now that we're just calling a callback on the main buffer, it seems misleading. Maybe "output_buffered" would be better? Sidenote: One convention from JS that I like is to name such event listeners as "on_<event_name>", e.g. "on_output_buffered". This makes naming a lot easier sometimes because you don't have to worry about having your event listener being mistaken for something else. It wouldn't be idiomatic for Git today, but I wonder what others think about adopting this. 
[1] https://lore.kernel.org/git/xmqq4jvxpw46.fsf@gitster.g/ > +/** > + * This callback is called whenever output from a child process is buffered > + * > + * See run_processes_parallel() below for a discussion of the "struct > + * strbuf *out" parameter. > + * > + * The offset refers to the number of bytes originally in "out" before > + * the output from the child process was buffered. Therefore, the buffer > + * range, "out + buf" to the end of "out", would contain the buffer of > + * the child process output. Looks like there's extra whitespace on the 'blank' lines. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v8 1/6] run-command: add duplicate_output_fn to run_processes_parallel_opts 2023-02-13 6:34 ` Glen Choo @ 2023-02-13 17:52 ` Junio C Hamano 2023-02-13 18:26 ` Calvin Wan 0 siblings, 1 reply; 86+ messages in thread From: Junio C Hamano @ 2023-02-13 17:52 UTC (permalink / raw) To: Glen Choo; +Cc: Calvin Wan, git, avarab, newren, jonathantanmy, phillip.wood123 Glen Choo <chooglen@google.com> writes: > What do we think of the name "duplicate_output"? IMO it made sense in > earlier versions when we were copying the output to a separate buffer (I > believe it was renamed in response to [1]), but now that we're just > calling a callback on the main buffer, it seems misleading. Maybe > "output_buffered" would be better? Yeah, we do not even know what the callback does to the data we are giving it. The only thing we know is that we have output from the child, and in addition to the usual buffering we do ourselves, we are allowing the callback to peek into the buffered data in advance. If the callback does consume it *and* remove the buffered data it consumed right away, then as you say, "duplicate" becomes a word that totally misses the point. There is no duplication, as the callback consumed it and we no longer have our own copy, either. If the callback consumes it but leaves the buffered data as-is, and we would show that once the child finishes anyway, you can say that we are feeding a duplicate of buffered data to the callback. The mechanism could be used merely to count how much output we have accumulated so far to update the progress-bar, for example, and the output may be given after the process is done. But note that we are not doing an "output" of "buffered" data in such a case. 
To me, both "duplicate_output" and "output_buffered" sound like they are names that are quite specific to the expected use case the person who proposed the names had in mind, yet it is a bit hard to guess exactly what the expected use cases they had in mind were, because the names are not quite specific enough. > Sidenote: One convention from JS that I like is to name such event > listeners as "on_<event_name>", e.g. "on_output_buffered". Thanks for bringing this up. I agree that "Upon X happening, do this" is a very good convention to follow. I think the callback is made whenever the child emits to the standard error stream, so "on_error_output" (if we are worried that "error" has too strong a "something bad happened" connotation, then perhaps "on_stderr_output" may dampen it) perhaps? ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v8 1/6] run-command: add duplicate_output_fn to run_processes_parallel_opts 2023-02-13 17:52 ` Junio C Hamano @ 2023-02-13 18:26 ` Calvin Wan 0 siblings, 0 replies; 86+ messages in thread From: Calvin Wan @ 2023-02-13 18:26 UTC (permalink / raw) To: Junio C Hamano Cc: Glen Choo, git, avarab, newren, jonathantanmy, phillip.wood123 > > Sidenote: One convention from JS that I like is to name such event > > listeners as "on_<event_name>", e.g. "on_output_buffered". > > Thanks for bringing this up. I agree that "Upon X happening, do > this" is a very good convention to follow. I think the callback is > made whenever the child emits to the standard error stream, so > "on_error_output" (if we are worried that "error" has too strong a > "something bad happened" connotation, then perhaps "on_stderr_output" > may dampen it) perhaps? "on_stderr_output" sounds much better than "duplicate_output". I spent a lot of time trying to come up with a better name, but couldn't find anything that conveyed what the expected use case of this function was. Thanks, I'll rename it on my next reroll. ^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v8 2/6] submodule: strbuf variable rename 2023-02-07 18:16 ` [PATCH v7 0/7] " Calvin Wan ` (2 preceding siblings ...) 2023-02-09 0:02 ` [PATCH v8 1/6] run-command: add duplicate_output_fn to run_processes_parallel_opts Calvin Wan @ 2023-02-09 0:02 ` Calvin Wan 2023-02-13 8:37 ` Glen Choo 2023-02-09 0:02 ` [PATCH v8 3/6] submodule: move status parsing into function Calvin Wan ` (3 subsequent siblings) 7 siblings, 1 reply; 86+ messages in thread From: Calvin Wan @ 2023-02-09 0:02 UTC (permalink / raw) To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy, phillip.wood123 A preparatory change for a future patch that moves the status parsing logic to a separate function. Signed-off-by: Calvin Wan <calvinwan@google.com> --- submodule.c | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/submodule.c b/submodule.c index fae24ef34a..faf37c1101 100644 --- a/submodule.c +++ b/submodule.c @@ -1906,25 +1906,28 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) fp = xfdopen(cp.out, "r"); while (strbuf_getwholeline(&buf, fp, '\n') != EOF) { + char *str = buf.buf; + const size_t len = buf.len; + /* regular untracked files */ - if (buf.buf[0] == '?') + if (str[0] == '?') dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - if (buf.buf[0] == 'u' || - buf.buf[0] == '1' || - buf.buf[0] == '2') { + if (str[0] == 'u' || + str[0] == '1' || + str[0] == '2') { /* T = line type, XY = status, SSSS = submodule state */ - if (buf.len < strlen("T XY SSSS")) + if (len < strlen("T XY SSSS")) BUG("invalid status --porcelain=2 line %s", - buf.buf); + str); - if (buf.buf[5] == 'S' && buf.buf[8] == 'U') + if (str[5] == 'S' && str[8] == 'U') /* nested untracked file */ dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - if (buf.buf[0] == 'u' || - buf.buf[0] == '2' || - memcmp(buf.buf + 5, "S..U", 4)) + if (str[0] == 'u' || + str[0] == '2' || + memcmp(str + 5, "S..U", 4)) /* other change */ dirty_submodule |= 
DIRTY_SUBMODULE_MODIFIED; } -- 2.39.1.519.gcb327c4b5f-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v8 2/6] submodule: strbuf variable rename 2023-02-09 0:02 ` [PATCH v8 2/6] submodule: strbuf variable rename Calvin Wan @ 2023-02-13 8:37 ` Glen Choo 0 siblings, 0 replies; 86+ messages in thread From: Glen Choo @ 2023-02-13 8:37 UTC (permalink / raw) To: Calvin Wan, git Cc: Calvin Wan, avarab, newren, jonathantanmy, phillip.wood123 Calvin Wan <calvinwan@google.com> writes: > Subject: [PATCH v8 2/6] submodule: strbuf variable rename This should probably be "submodule: rename strbuf variable". ^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v8 3/6] submodule: move status parsing into function 2023-02-07 18:16 ` [PATCH v7 0/7] " Calvin Wan ` (3 preceding siblings ...) 2023-02-09 0:02 ` [PATCH v8 2/6] submodule: strbuf variable rename Calvin Wan @ 2023-02-09 0:02 ` Calvin Wan 2023-02-09 0:02 ` [PATCH v8 4/6] submodule: refactor is_submodule_modified() Calvin Wan ` (2 subsequent siblings) 7 siblings, 0 replies; 86+ messages in thread From: Calvin Wan @ 2023-02-09 0:02 UTC (permalink / raw) To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy, phillip.wood123 A future patch requires the ability to parse the output of git status --porcelain=2. Move parsing code from is_submodule_modified to parse_status_porcelain. Signed-off-by: Calvin Wan <calvinwan@google.com> --- submodule.c | 74 ++++++++++++++++++++++++++++++----------------------- 1 file changed, 42 insertions(+), 32 deletions(-) diff --git a/submodule.c b/submodule.c index faf37c1101..768d4b4cd7 100644 --- a/submodule.c +++ b/submodule.c @@ -1870,6 +1870,45 @@ int fetch_submodules(struct repository *r, return spf.result; } +static int parse_status_porcelain(char *str, size_t len, + unsigned *dirty_submodule, + int ignore_untracked) +{ + /* regular untracked files */ + if (str[0] == '?') + *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; + + if (str[0] == 'u' || + str[0] == '1' || + str[0] == '2') { + /* T = line type, XY = status, SSSS = submodule state */ + if (len < strlen("T XY SSSS")) + BUG("invalid status --porcelain=2 line %s", + str); + + if (str[5] == 'S' && str[8] == 'U') + /* nested untracked file */ + *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; + + if (str[0] == 'u' || + str[0] == '2' || + memcmp(str + 5, "S..U", 4)) + /* other change */ + *dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; + } + + if ((*dirty_submodule & DIRTY_SUBMODULE_MODIFIED) && + ((*dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) || + ignore_untracked)) { + /* + * We're not interested in any further information from + * the child any more, neither 
output nor its exit code. + */ + return 1; + } + return 0; +} + unsigned is_submodule_modified(const char *path, int ignore_untracked) { struct child_process cp = CHILD_PROCESS_INIT; @@ -1909,39 +1948,10 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) char *str = buf.buf; const size_t len = buf.len; - /* regular untracked files */ - if (str[0] == '?') - dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - - if (str[0] == 'u' || - str[0] == '1' || - str[0] == '2') { - /* T = line type, XY = status, SSSS = submodule state */ - if (len < strlen("T XY SSSS")) - BUG("invalid status --porcelain=2 line %s", - str); - - if (str[5] == 'S' && str[8] == 'U') - /* nested untracked file */ - dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - - if (str[0] == 'u' || - str[0] == '2' || - memcmp(str + 5, "S..U", 4)) - /* other change */ - dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; - } - - if ((dirty_submodule & DIRTY_SUBMODULE_MODIFIED) && - ((dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) || - ignore_untracked)) { - /* - * We're not interested in any further information from - * the child any more, neither output nor its exit code. - */ - ignore_cp_exit_code = 1; + ignore_cp_exit_code = parse_status_porcelain(str, len, &dirty_submodule, + ignore_untracked); + if (ignore_cp_exit_code) break; - } } fclose(fp); -- 2.39.1.519.gcb327c4b5f-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
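As a standalone illustration of the per-line classification that this patch moves into parse_status_porcelain(), here is a sketch that can be compiled outside git. The checks mirror the ones in the diff, but several things are assumptions made for the example: the numeric values of the two DIRTY_SUBMODULE_* flags, the names `parse_line()` and `status_from_buffer()`, and the choice to skip a too-short line where the patch calls BUG(). The second function walks an already complete output buffer line by line, which is roughly what parsing after the child has finished would look like.

```c
#include <stddef.h>
#include <string.h>

/* Flag values are assumptions for this sketch. */
#define DIRTY_SUBMODULE_UNTRACKED 1
#define DIRTY_SUBMODULE_MODIFIED  2

/*
 * Classify one "git status --porcelain=2" line, mirroring the checks in
 * parse_status_porcelain(). Returns 1 once nothing further can be
 * learned. Unlike the patch, a too-short line is skipped instead of
 * hitting BUG().
 */
int parse_line(const char *str, size_t len, unsigned *dirty,
	       int ignore_untracked)
{
	if (!len)
		return 0;

	/* regular untracked files */
	if (str[0] == '?')
		*dirty |= DIRTY_SUBMODULE_UNTRACKED;

	if (str[0] == 'u' || str[0] == '1' || str[0] == '2') {
		/* T = line type, XY = status, SSSS = submodule state */
		if (len < strlen("T XY SSSS"))
			return 0;

		/* nested untracked file */
		if (str[5] == 'S' && str[8] == 'U')
			*dirty |= DIRTY_SUBMODULE_UNTRACKED;

		/* other change */
		if (str[0] == 'u' || str[0] == '2' ||
		    memcmp(str + 5, "S..U", 4))
			*dirty |= DIRTY_SUBMODULE_MODIFIED;
	}

	return (*dirty & DIRTY_SUBMODULE_MODIFIED) &&
	       ((*dirty & DIRTY_SUBMODULE_UNTRACKED) || ignore_untracked);
}

/* Walk a complete output buffer one line at a time. */
unsigned status_from_buffer(const char *buf, size_t len, int ignore_untracked)
{
	unsigned dirty = 0;
	const char *p = buf, *end = buf + len;

	while (p < end) {
		const char *nl = memchr(p, '\n', (size_t)(end - p));
		size_t line_len = nl ? (size_t)(nl - p) : (size_t)(end - p);

		if (parse_line(p, line_len, &dirty, ignore_untracked))
			break;
		if (!nl)
			break;
		p = nl + 1;
	}
	return dirty;
}
```

The sample lines used to exercise this are made up for the example; only the leading type character and the four-character submodule state field matter to the checks.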
* [PATCH v8 4/6] submodule: refactor is_submodule_modified() 2023-02-07 18:16 ` [PATCH v7 0/7] " Calvin Wan ` (4 preceding siblings ...) 2023-02-09 0:02 ` [PATCH v8 3/6] submodule: move status parsing into function Calvin Wan @ 2023-02-09 0:02 ` Calvin Wan 2023-02-13 7:06 ` Glen Choo 2023-02-09 0:02 ` [PATCH v8 5/6] diff-lib: refactor out diff_change logic Calvin Wan 2023-02-09 0:02 ` [PATCH v8 6/6] diff-lib: parallelize run_diff_files for submodules Calvin Wan 7 siblings, 1 reply; 86+ messages in thread From: Calvin Wan @ 2023-02-09 0:02 UTC (permalink / raw) To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy, phillip.wood123 Refactor out submodule status logic and error messages that will be used in a future patch. Signed-off-by: Calvin Wan <calvinwan@google.com> --- submodule.c | 65 ++++++++++++++++++++++++++++++++++------------------- 1 file changed, 42 insertions(+), 23 deletions(-) diff --git a/submodule.c b/submodule.c index 768d4b4cd7..426074cebb 100644 --- a/submodule.c +++ b/submodule.c @@ -28,6 +28,10 @@ static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF; static int initialized_fetch_ref_tips; static struct oid_array ref_tips_before_fetch; static struct oid_array ref_tips_after_fetch; +#define STATUS_PORCELAIN_START_ERROR \ + N_("could not run 'git status --porcelain=2' in submodule %s") +#define STATUS_PORCELAIN_FAIL_ERROR \ + N_("'git status --porcelain=2' failed in submodule %s") /* * Check if the .gitmodules file is unmerged. 
Parsing of the .gitmodules file @@ -1870,6 +1874,40 @@ int fetch_submodules(struct repository *r, return spf.result; } +static int verify_submodule_git_directory(const char *path) +{ + const char *git_dir; + struct strbuf buf = STRBUF_INIT; + + strbuf_addf(&buf, "%s/.git", path); + git_dir = read_gitfile(buf.buf); + if (!git_dir) + git_dir = buf.buf; + if (!is_git_directory(git_dir)) { + if (is_directory(git_dir)) + die(_("'%s' not recognized as a git repository"), git_dir); + strbuf_release(&buf); + /* The submodule is not checked out, so it is not modified */ + return 0; + } + strbuf_release(&buf); + return 1; +} + +static void prepare_status_porcelain(struct child_process *cp, + const char *path, int ignore_untracked) +{ + strvec_pushl(&cp->args, "status", "--porcelain=2", NULL); + if (ignore_untracked) + strvec_push(&cp->args, "-uno"); + + prepare_submodule_repo_env(&cp->env); + cp->git_cmd = 1; + cp->no_stdin = 1; + cp->out = -1; + cp->dir = path; +} + static int parse_status_porcelain(char *str, size_t len, unsigned *dirty_submodule, int ignore_untracked) @@ -1915,33 +1953,14 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) struct strbuf buf = STRBUF_INIT; FILE *fp; unsigned dirty_submodule = 0; - const char *git_dir; int ignore_cp_exit_code = 0; - strbuf_addf(&buf, "%s/.git", path); - git_dir = read_gitfile(buf.buf); - if (!git_dir) - git_dir = buf.buf; - if (!is_git_directory(git_dir)) { - if (is_directory(git_dir)) - die(_("'%s' not recognized as a git repository"), git_dir); - strbuf_release(&buf); - /* The submodule is not checked out, so it is not modified */ + if (!verify_submodule_git_directory(path)) return 0; - } - strbuf_reset(&buf); - - strvec_pushl(&cp.args, "status", "--porcelain=2", NULL); - if (ignore_untracked) - strvec_push(&cp.args, "-uno"); - prepare_submodule_repo_env(&cp.env); - cp.git_cmd = 1; - cp.no_stdin = 1; - cp.out = -1; - cp.dir = path; + prepare_status_porcelain(&cp, path, ignore_untracked); if 
(start_command(&cp)) - die(_("Could not run 'git status --porcelain=2' in submodule %s"), path); + die(_(STATUS_PORCELAIN_START_ERROR), path); fp = xfdopen(cp.out, "r"); while (strbuf_getwholeline(&buf, fp, '\n') != EOF) { @@ -1956,7 +1975,7 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) fclose(fp); if (finish_command(&cp) && !ignore_cp_exit_code) - die(_("'git status --porcelain=2' failed in submodule %s"), path); + die(_(STATUS_PORCELAIN_FAIL_ERROR), path); strbuf_release(&buf); return dirty_submodule; -- 2.39.1.519.gcb327c4b5f-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
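Editorial sketch: the prepare_status_porcelain() helper extracted above only decides which arguments the child "git status" gets. Below is a minimal standalone illustration of that decision, assuming a plain argv array in place of git's strvec. build_status_args() is a hypothetical name and is not part of the series, and "git" is spelled out as argv[0] here because the sketch has no equivalent of cp->git_cmd.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for prepare_status_porcelain(): fill argv with
 * the status command line and return the number of arguments written.
 * The caller must provide room for at least 5 entries.
 */
static size_t build_status_args(const char **argv, int ignore_untracked)
{
	size_t n = 0;

	argv[n++] = "git";
	argv[n++] = "status";
	argv[n++] = "--porcelain=2";
	if (ignore_untracked)
		argv[n++] = "-uno"; /* same flag the patch pushes */
	argv[n] = NULL; /* terminate like a real argv */
	return n;
}
```

Because both the serial path and the later parallel path would go through one such helper, the two cannot drift apart in the flags they pass.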
* Re: [PATCH v8 4/6] submodule: refactor is_submodule_modified() 2023-02-09 0:02 ` [PATCH v8 4/6] submodule: refactor is_submodule_modified() Calvin Wan @ 2023-02-13 7:06 ` Glen Choo 0 siblings, 0 replies; 86+ messages in thread From: Glen Choo @ 2023-02-13 7:06 UTC (permalink / raw) To: Calvin Wan, git Cc: Calvin Wan, avarab, newren, jonathantanmy, phillip.wood123 Calvin Wan <calvinwan@google.com> writes: > Refactor out submodule status logic and error messages that will be > used in a future patch. This improves the readability of the last patch by quite a lot. Thanks for taking the suggestion :) (This patch was actually introduced in the previous round, but I missed that, sorry.) ^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v8 5/6] diff-lib: refactor out diff_change logic 2023-02-07 18:16 ` [PATCH v7 0/7] " Calvin Wan ` (5 preceding siblings ...) 2023-02-09 0:02 ` [PATCH v8 4/6] submodule: refactor is_submodule_modified() Calvin Wan @ 2023-02-09 0:02 ` Calvin Wan 2023-02-09 1:48 ` Ævar Arnfjörð Bjarmason 2023-02-13 8:42 ` Glen Choo 2023-02-09 0:02 ` [PATCH v8 6/6] diff-lib: parallelize run_diff_files for submodules Calvin Wan 7 siblings, 2 replies; 86+ messages in thread From: Calvin Wan @ 2023-02-09 0:02 UTC (permalink / raw) To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy, phillip.wood123 Refactor out logic that sets up the diff_change call into a helper function for a future patch. Signed-off-by: Calvin Wan <calvinwan@google.com> --- diff-lib.c | 46 +++++++++++++++++++++++++++++----------------- 1 file changed, 29 insertions(+), 17 deletions(-) diff --git a/diff-lib.c b/diff-lib.c index dec040c366..7101cfda3f 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -88,6 +88,31 @@ static int match_stat_with_submodule(struct diff_options *diffopt, return changed; } +static int diff_change_helper(struct diff_options *options, + unsigned newmode, unsigned dirty_submodule, + int changed, struct index_state *istate, + struct cache_entry *ce) +{ + unsigned int oldmode; + const struct object_id *old_oid, *new_oid; + + if (!changed && !dirty_submodule) { + ce_mark_uptodate(ce); + mark_fsmonitor_valid(istate, ce); + if (!options->flags.find_copies_harder) + return 1; + } + oldmode = ce->ce_mode; + old_oid = &ce->oid; + new_oid = changed ? 
null_oid() : &ce->oid; + diff_change(options, oldmode, newmode, + old_oid, new_oid, + !is_null_oid(old_oid), + !is_null_oid(new_oid), + ce->name, 0, dirty_submodule); + return 0; +} + int run_diff_files(struct rev_info *revs, unsigned int option) { int entries, i; @@ -105,11 +130,10 @@ int run_diff_files(struct rev_info *revs, unsigned int option) diff_unmerged_stage = 2; entries = istate->cache_nr; for (i = 0; i < entries; i++) { - unsigned int oldmode, newmode; + unsigned int newmode; struct cache_entry *ce = istate->cache[i]; int changed; unsigned dirty_submodule = 0; - const struct object_id *old_oid, *new_oid; if (diff_can_quit_early(&revs->diffopt)) break; @@ -245,21 +269,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option) newmode = ce_mode_from_stat(ce, st.st_mode); } - if (!changed && !dirty_submodule) { - ce_mark_uptodate(ce); - mark_fsmonitor_valid(istate, ce); - if (!revs->diffopt.flags.find_copies_harder) - continue; - } - oldmode = ce->ce_mode; - old_oid = &ce->oid; - new_oid = changed ? null_oid() : &ce->oid; - diff_change(&revs->diffopt, oldmode, newmode, - old_oid, new_oid, - !is_null_oid(old_oid), - !is_null_oid(new_oid), - ce->name, 0, dirty_submodule); - + if (diff_change_helper(&revs->diffopt, newmode, dirty_submodule, + changed, istate, ce)) + continue; } diffcore_std(&revs->diffopt); diff_flush(&revs->diffopt); -- 2.39.1.519.gcb327c4b5f-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v8 5/6] diff-lib: refactor out diff_change logic 2023-02-09 0:02 ` [PATCH v8 5/6] diff-lib: refactor out diff_change logic Calvin Wan @ 2023-02-09 1:48 ` Ævar Arnfjörð Bjarmason 2023-02-13 8:42 ` Glen Choo 1 sibling, 0 replies; 86+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2023-02-09 1:48 UTC (permalink / raw) To: Calvin Wan; +Cc: git, chooglen, newren, jonathantanmy, phillip.wood123 On Thu, Feb 09 2023, Calvin Wan wrote: > + diff_change(options, oldmode, newmode, > + old_oid, new_oid, > + !is_null_oid(old_oid), > + !is_null_oid(new_oid), > + ce->name, 0, dirty_submodule); Nit: This has odd not-our-usual-style indentation (to align with the "("). I didn't spot it before, but I vaguely recall seeing something like this in another one of your patches, but maybe I misrecall. In case not maybe some editor settings need tweaking? I haven't looked carefully at the rest to see if the same issue occurs in other code here. > - if (!changed && !dirty_submodule) { > - ce_mark_uptodate(ce); > - mark_fsmonitor_valid(istate, ce); > - if (!revs->diffopt.flags.find_copies_harder) > - continue; > - } > - oldmode = ce->ce_mode; > - old_oid = &ce->oid; > - new_oid = changed ? null_oid() : &ce->oid; > - diff_change(&revs->diffopt, oldmode, newmode, > - old_oid, new_oid, > - !is_null_oid(old_oid), > - !is_null_oid(new_oid), > - ce->name, 0, dirty_submodule); So in this case it's not new code, but code moving, note the four spaces after the sequence of tabs that aren't in your version. So perhaps your editor on re-indentation is configured not to just strip off the leading \t to re-indent (which is all that's needed here) but strips all whitespace, then re-indents after its own mind? ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v8 5/6] diff-lib: refactor out diff_change logic 2023-02-09 0:02 ` [PATCH v8 5/6] diff-lib: refactor out diff_change logic Calvin Wan 2023-02-09 1:48 ` Ævar Arnfjörð Bjarmason @ 2023-02-13 8:42 ` Glen Choo 2023-02-13 18:29 ` Calvin Wan 1 sibling, 1 reply; 86+ messages in thread From: Glen Choo @ 2023-02-13 8:42 UTC (permalink / raw) To: Calvin Wan, git Cc: Calvin Wan, avarab, newren, jonathantanmy, phillip.wood123 Calvin Wan <calvinwan@google.com> writes: > Refactor out logic that sets up the diff_change call into a helper > function for a future patch. This seems underspecified; there are two diff_change calls in diff-lib, and the call in show_modified() is not changed in this patch. > +static int diff_change_helper(struct diff_options *options, > + unsigned newmode, unsigned dirty_submodule, > + int changed, struct index_state *istate, > + struct cache_entry *ce) The function name is very generic, and it's not clear: - What this does besides calling "diff_change()". - When I should call this instead of "diff_change()". - What the return value means. Both of these should be documented in a comment, and I also suggest renaming the function. > @@ -245,21 +269,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option) > newmode = ce_mode_from_stat(ce, st.st_mode); > } > > - if (!changed && !dirty_submodule) { > - ce_mark_uptodate(ce); > - mark_fsmonitor_valid(istate, ce); > - if (!revs->diffopt.flags.find_copies_harder) > - continue; > - } > - oldmode = ce->ce_mode; > - old_oid = &ce->oid; > - new_oid = changed ? 
null_oid() : &ce->oid; > - diff_change(&revs->diffopt, oldmode, newmode, > - old_oid, new_oid, > - !is_null_oid(old_oid), > - !is_null_oid(new_oid), > - ce->name, 0, dirty_submodule); > - > + if (diff_change_helper(&revs->diffopt, newmode, dirty_submodule, > + changed, istate, ce)) > + continue; > } If I'm reading the indentation correctly, the "continue" comes right before the end of the for-loop block, so it's a no-op and should be removed. ^ permalink raw reply [flat|nested] 86+ messages in thread
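Editorial sketch: the review above asks for a comment that states what the helper does beyond calling diff_change() and what its return value means. One possible shape of such a contract is shown below with stand-in types. struct entry and queue_change_unless_clean() are hypothetical names chosen for illustration; the real helper operates on struct cache_entry and struct diff_options.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the parts of cache_entry the helper touches. */
struct entry {
	bool uptodate;	/* set where the patch calls ce_mark_uptodate() */
	int queued;	/* counts where the patch calls diff_change() */
};

/*
 * Queue a change for 'e' unless the entry is clean.
 *
 * A clean entry (neither 'changed' nor 'dirty_submodule') is marked
 * up to date. It is then skipped unless 'find_copies_harder' is set,
 * in which case it is queued anyway.
 *
 * Returns 1 if the entry was skipped, 0 if a change was queued.
 */
static int queue_change_unless_clean(struct entry *e, int changed,
				     unsigned dirty_submodule,
				     int find_copies_harder)
{
	if (!changed && !dirty_submodule) {
		e->uptodate = true;
		if (!find_copies_harder)
			return 1;
	}
	e->queued++;
	return 0;
}
```

Written this way, the return value answers the caller's only question: whether anything was queued for this entry.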
* Re: [PATCH v8 5/6] diff-lib: refactor out diff_change logic 2023-02-13 8:42 ` Glen Choo @ 2023-02-13 18:29 ` Calvin Wan 2023-02-14 4:03 ` Glen Choo 0 siblings, 1 reply; 86+ messages in thread From: Calvin Wan @ 2023-02-13 18:29 UTC (permalink / raw) To: Glen Choo; +Cc: git, avarab, newren, jonathantanmy, phillip.wood123 On Mon, Feb 13, 2023 at 12:42 AM Glen Choo <chooglen@google.com> wrote: > > Calvin Wan <calvinwan@google.com> writes: > > > Refactor out logic that sets up the diff_change call into a helper > > function for a future patch. > > This seems underspecified; there are two diff_change calls in diff-lib, > and the call in show_modified() is not changed in this patch. > > > +static int diff_change_helper(struct diff_options *options, > > + unsigned newmode, unsigned dirty_submodule, > > + int changed, struct index_state *istate, > > + struct cache_entry *ce) > > The function name is very generic, and it's not clear: > > - What this does besides calling "diff_change()". > - When I should call this instead of "diff_change()". > - What the return value means. > > Both of these should be documented in a comment, and I also suggest > renaming the function. ack. > > @@ -245,21 +269,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option) > > newmode = ce_mode_from_stat(ce, st.st_mode); > > } > > > > - if (!changed && !dirty_submodule) { > > - ce_mark_uptodate(ce); > > - mark_fsmonitor_valid(istate, ce); > > - if (!revs->diffopt.flags.find_copies_harder) > > - continue; > > - } > > - oldmode = ce->ce_mode; > > - old_oid = &ce->oid; > > - new_oid = changed ? 
null_oid() : &ce->oid; > > - diff_change(&revs->diffopt, oldmode, newmode, > > - old_oid, new_oid, > > - !is_null_oid(old_oid), > > - !is_null_oid(new_oid), > > - ce->name, 0, dirty_submodule); > > - > > + if (diff_change_helper(&revs->diffopt, newmode, dirty_submodule, > > + changed, istate, ce)) > > + continue; > > } > > If I'm reading the indentation correctly, the "continue" comes right > before the end of the for-loop block, so it's a no-op and should be > removed. It is a no-op, but I left it in as future-proofing in case more code is added after that block later. I'm not sure whether that line of reasoning is enough to leave it in though. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v8 5/6] diff-lib: refactor out diff_change logic 2023-02-13 18:29 ` Calvin Wan @ 2023-02-14 4:03 ` Glen Choo 0 siblings, 0 replies; 86+ messages in thread From: Glen Choo @ 2023-02-14 4:03 UTC (permalink / raw) To: Calvin Wan; +Cc: git, avarab, newren, jonathantanmy, phillip.wood123 Calvin Wan <calvinwan@google.com> writes: >> > @@ -245,21 +269,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option) >> > newmode = ce_mode_from_stat(ce, st.st_mode); >> > } >> > >> > - if (!changed && !dirty_submodule) { >> > - ce_mark_uptodate(ce); >> > - mark_fsmonitor_valid(istate, ce); >> > - if (!revs->diffopt.flags.find_copies_harder) >> > - continue; >> > - } >> > - oldmode = ce->ce_mode; >> > - old_oid = &ce->oid; >> > - new_oid = changed ? null_oid() : &ce->oid; >> > - diff_change(&revs->diffopt, oldmode, newmode, >> > - old_oid, new_oid, >> > - !is_null_oid(old_oid), >> > - !is_null_oid(new_oid), >> > - ce->name, 0, dirty_submodule); >> > - >> > + if (diff_change_helper(&revs->diffopt, newmode, dirty_submodule, >> > + changed, istate, ce)) >> > + continue; >> > } >> >> If I'm reading the indentation correctly, the "continue" comes right >> before the end of the for-loop block, so it's a no-op and should be >> removed. > > It is a no-op, but I left it in as future-proofing in case more code is > added after that block later. I'm not sure whether that line of > reasoning is enough to leave it in though. I don't think it is. If we haven't thought of the reason why we would need to skip code, that seems like YAGNI to me. As a matter of personal taste, I wouldn't leave a trailing "continue" in an earlier patch even if I were going to change it in a later patch, because it looks too much like an unintentional mistake. ^ permalink raw reply [flat|nested] 86+ messages in thread
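Editorial sketch: the point about the trailing "continue" can be checked in isolation. The two functions below are hypothetical and unrelated to diff-lib.c; they differ only in a "continue" placed as the last statement of the loop body, and they should compute the same result for any input.

```c
/* Sum the non-negative values, with a redundant trailing continue. */
static int sum_with_trailing_continue(const int *v, int n)
{
	int sum = 0;

	for (int i = 0; i < n; i++) {
		if (v[i] < 0)
			continue; /* meaningful: skips the addition below */
		sum += v[i];
		if (v[i] == 0)
			continue; /* no-op: nothing follows in the body */
	}
	return sum;
}

/* Same loop without the trailing continue. */
static int sum_without_trailing_continue(const int *v, int n)
{
	int sum = 0;

	for (int i = 0; i < n; i++) {
		if (v[i] < 0)
			continue;
		sum += v[i];
	}
	return sum;
}
```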
* [PATCH v8 6/6] diff-lib: parallelize run_diff_files for submodules 2023-02-07 18:16 ` [PATCH v7 0/7] " Calvin Wan ` (6 preceding siblings ...) 2023-02-09 0:02 ` [PATCH v8 5/6] diff-lib: refactor out diff_change logic Calvin Wan @ 2023-02-09 0:02 ` Calvin Wan 2023-02-13 8:36 ` Glen Choo 7 siblings, 1 reply; 86+ messages in thread From: Calvin Wan @ 2023-02-09 0:02 UTC (permalink / raw) To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy, phillip.wood123 During the iteration of the index entries in run_diff_files, whenever a submodule is found and needs its status checked, a subprocess is spawned for it. Instead of spawning the subprocess immediately and waiting for its completion to continue, hold onto all submodules and relevant information in a list. Then use that list to create tasks for run_processes_parallel. Subprocess output is duplicated and passed to status_pipe_output which stores it to be parsed on completion of the subprocess. Add config option submodule.diffJobs to set the maximum number of parallel jobs. The option defaults to 1 if unset. If set to 0, the number of jobs is set to online_cpus(). Since run_diff_files is called from many different commands, I chose to grab the config option in the function rather than adding variables to every git command and then figuring out how to pass them all in. Signed-off-by: Calvin Wan <calvinwan@google.com> --- Documentation/config/submodule.txt | 12 +++ diff-lib.c | 91 ++++++++++++++++--- submodule.c | 140 +++++++++++++++++++++++++++++ submodule.h | 9 ++ t/t4027-diff-submodule.sh | 31 +++++++ t/t7506-status-submodule.sh | 25 ++++++ 6 files changed, 294 insertions(+), 14 deletions(-) diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt index 6490527b45..3209eb8117 100644 --- a/Documentation/config/submodule.txt +++ b/Documentation/config/submodule.txt @@ -93,6 +93,18 @@ submodule.fetchJobs:: in parallel. A value of 0 will give some reasonable default. 
If unset, it defaults to 1. +submodule.diffJobs:: + Specifies how many submodules are diffed at the same time. A + positive integer allows up to that number of submodules diffed + in parallel. A value of 0 will give some reasonable default. + If unset, it defaults to 1. The diff operation is used by many + other git commands such as add, merge, diff, status, stash and + more. Note that the expensive part of the diff operation is + reading the index from cache or memory. Therefore multiple jobs + may be detrimental to performance if your hardware does not + support parallel reads or if the number of jobs greatly exceeds + the amount of supported reads. + submodule.alternateLocation:: Specifies how the submodules obtain alternates when submodules are cloned. Possible values are `no`, `superproject`. diff --git a/diff-lib.c b/diff-lib.c index 7101cfda3f..2dde575ec6 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -14,6 +14,7 @@ #include "dir.h" #include "fsmonitor.h" #include "commit-reach.h" +#include "config.h" /* * diff-files @@ -65,26 +66,46 @@ static int check_removed(const struct index_state *istate, const struct cache_en * Return 1 when changes are detected, 0 otherwise. If the DIRTY_SUBMODULES * option is set, the caller does not only want to know if a submodule is * modified at all but wants to know all the conditions that are met (new - * commits, untracked content and/or modified content). + * commits, untracked content and/or modified content). If + * defer_submodule_status bit is set, dirty_submodule will be left to the + * caller to set. defer_submodule_status can also be set to 0 in this + * function if there is no need to check if the submodule is modified. 
*/ static int match_stat_with_submodule(struct diff_options *diffopt, const struct cache_entry *ce, struct stat *st, unsigned ce_option, - unsigned *dirty_submodule) + unsigned *dirty_submodule, int *defer_submodule_status, + unsigned *ignore_untracked) { int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option); - if (S_ISGITLINK(ce->ce_mode)) { - struct diff_flags orig_flags = diffopt->flags; - if (!diffopt->flags.override_submodule_config) - set_diffopt_flags_from_submodule_config(diffopt, ce->name); - if (diffopt->flags.ignore_submodules) - changed = 0; - else if (!diffopt->flags.ignore_dirty_submodules && - (!changed || diffopt->flags.dirty_submodules)) + struct diff_flags orig_flags; + int defer = 0; + + if (!S_ISGITLINK(ce->ce_mode)) + goto ret; + + orig_flags = diffopt->flags; + if (!diffopt->flags.override_submodule_config) + set_diffopt_flags_from_submodule_config(diffopt, ce->name); + if (diffopt->flags.ignore_submodules) { + changed = 0; + goto cleanup; + } + if (!diffopt->flags.ignore_dirty_submodules && + (!changed || diffopt->flags.dirty_submodules)) { + if (defer_submodule_status && *defer_submodule_status) { + defer = 1; + *ignore_untracked = diffopt->flags.ignore_untracked_in_submodules; + } else { *dirty_submodule = is_submodule_modified(ce->name, - diffopt->flags.ignore_untracked_in_submodules); - diffopt->flags = orig_flags; + diffopt->flags.ignore_untracked_in_submodules); + } } +cleanup: + diffopt->flags = orig_flags; +ret: + if (defer_submodule_status) + *defer_submodule_status = defer; return changed; } @@ -121,6 +142,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option) ? 
CE_MATCH_RACY_IS_DIRTY : 0); uint64_t start = getnanotime(); struct index_state *istate = revs->diffopt.repo->index; + struct string_list submodules = STRING_LIST_INIT_NODUP; diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/"); @@ -244,6 +266,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option) newmode = ce->ce_mode; } else { struct stat st; + unsigned ignore_untracked = 0; + int defer_submodule_status = 1; changed = check_removed(istate, ce, &st); if (changed) { @@ -265,14 +289,53 @@ int run_diff_files(struct rev_info *revs, unsigned int option) } changed = match_stat_with_submodule(&revs->diffopt, ce, &st, - ce_option, &dirty_submodule); + ce_option, &dirty_submodule, + &defer_submodule_status, + &ignore_untracked); newmode = ce_mode_from_stat(ce, st.st_mode); + if (defer_submodule_status) { + struct submodule_status_util tmp = { + .changed = changed, + .dirty_submodule = 0, + .ignore_untracked = ignore_untracked, + .newmode = newmode, + .ce = ce, + .path = ce->name, + }; + struct string_list_item *item; + + item = string_list_append(&submodules, ce->name); + item->util = xmalloc(sizeof(tmp)); + memcpy(item->util, &tmp, sizeof(tmp)); + continue; + } } if (diff_change_helper(&revs->diffopt, newmode, dirty_submodule, changed, istate, ce)) continue; } + if (submodules.nr) { + unsigned long parallel_jobs; + struct string_list_item *item; + + if (git_config_get_ulong("submodule.diffjobs", &parallel_jobs)) + parallel_jobs = 1; + else if (!parallel_jobs) + parallel_jobs = online_cpus(); + + if (get_submodules_status(&submodules, parallel_jobs)) + die(_("submodule status failed")); + for_each_string_list_item(item, &submodules) { + struct submodule_status_util *util = item->util; + + if (diff_change_helper(&revs->diffopt, util->newmode, + util->dirty_submodule, util->changed, + istate, util->ce)) + continue; + } + } + string_list_clear(&submodules, 1); diffcore_std(&revs->diffopt); diff_flush(&revs->diffopt); trace_performance_since(start, "diff-files"); @@
-320,7 +383,7 @@ static int get_stat_data(const struct index_state *istate, return -1; } changed = match_stat_with_submodule(diffopt, ce, &st, - 0, dirty_submodule); + 0, dirty_submodule, NULL, NULL); if (changed) { mode = ce_mode_from_stat(ce, st.st_mode); oid = null_oid(); diff --git a/submodule.c b/submodule.c index 426074cebb..e175fb8d3f 100644 --- a/submodule.c +++ b/submodule.c @@ -1373,6 +1373,13 @@ int submodule_touches_in_range(struct repository *r, return ret; } +struct submodule_parallel_status { + size_t index_count; + int result; + + struct string_list *submodule_names; +}; + struct submodule_parallel_fetch { /* * The index of the last index entry processed by @@ -1455,6 +1462,12 @@ struct fetch_task { struct oid_array *commits; /* Ensure these commits are fetched */ }; +struct status_task { + const char *path; + struct strbuf out; + int ignore_untracked; +}; + /** * When a submodule is not defined in .gitmodules, we cannot access it * via the regular submodule-config. Create a fake submodule, which we can @@ -1947,6 +1960,25 @@ static int parse_status_porcelain(char *str, size_t len, return 0; } +static void parse_status_porcelain_strbuf(struct strbuf *buf, + unsigned *dirty_submodule, + int ignore_untracked) +{ + struct string_list list = STRING_LIST_INIT_DUP; + struct string_list_item *item; + + string_list_split(&list, buf->buf, '\n', -1); + + for_each_string_list_item(item, &list) { + if (parse_status_porcelain(item->string, + strlen(item->string), + dirty_submodule, + ignore_untracked)) + break; + } + string_list_clear(&list, 0); +} + unsigned is_submodule_modified(const char *path, int ignore_untracked) { struct child_process cp = CHILD_PROCESS_INIT; @@ -1981,6 +2013,114 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) return dirty_submodule; } +static struct status_task * +get_status_task_from_index(struct submodule_parallel_status *sps, + struct strbuf *err) +{ + for (; sps->index_count < sps->submodule_names->nr; 
sps->index_count++) { + struct submodule_status_util *util = sps->submodule_names->items[sps->index_count].util; + struct status_task *task; + + if (!verify_submodule_git_directory(util->path)) + continue; + + task = xmalloc(sizeof(*task)); + task->path = util->path; + task->ignore_untracked = util->ignore_untracked; + strbuf_init(&task->out, 0); + sps->index_count++; + return task; + } + return NULL; +} + +static int get_next_submodule_status(struct child_process *cp, + struct strbuf *err, void *data, + void **task_cb) +{ + struct submodule_parallel_status *sps = data; + struct status_task *task = get_status_task_from_index(sps, err); + + if (!task) + return 0; + + child_process_init(cp); + prepare_submodule_repo_env_in_gitdir(&cp->env); + prepare_status_porcelain(cp, task->path, task->ignore_untracked); + *task_cb = task; + return 1; +} + +static int status_start_failure(struct strbuf *err, + void *cb, void *task_cb) +{ + struct submodule_parallel_status *sps = cb; + struct status_task *task = task_cb; + + sps->result = 1; + strbuf_addf(err, _(STATUS_PORCELAIN_START_ERROR), task->path); + return 0; +} + +static void status_duplicate_output(struct strbuf *out, + size_t offset, + void *cb, void *task_cb) +{ + struct status_task *task = task_cb; + + strbuf_add(&task->out, out->buf + offset, out->len - offset); + strbuf_setlen(out, offset); +} + +static int status_finish(int retvalue, struct strbuf *err, + void *cb, void *task_cb) +{ + struct submodule_parallel_status *sps = cb; + struct status_task *task = task_cb; + struct string_list_item *it = + string_list_lookup(sps->submodule_names, task->path); + struct submodule_status_util *util = it->util; + + if (retvalue) { + sps->result = 1; + strbuf_addf(err, _(STATUS_PORCELAIN_FAIL_ERROR), task->path); + } + + parse_status_porcelain_strbuf(&task->out, + &util->dirty_submodule, + util->ignore_untracked); + + strbuf_release(&task->out); + free(task); + + return 0; +} + +int get_submodules_status(struct string_list 
*submodules, + int max_parallel_jobs) +{ + struct submodule_parallel_status sps = { + .submodule_names = submodules, + }; + const struct run_process_parallel_opts opts = { + .tr2_category = "submodule", + .tr2_label = "parallel/status", + + .processes = max_parallel_jobs, + + .get_next_task = get_next_submodule_status, + .start_failure = status_start_failure, + .duplicate_output = status_duplicate_output, + .task_finished = status_finish, + .data = &sps, + }; + + string_list_sort(sps.submodule_names); + run_processes_parallel(&opts); + + return sps.result; +} + int submodule_uses_gitfile(const char *path) { struct child_process cp = CHILD_PROCESS_INIT; diff --git a/submodule.h b/submodule.h index b52a4ff1e7..08d278a414 100644 --- a/submodule.h +++ b/submodule.h @@ -41,6 +41,13 @@ struct submodule_update_strategy { .type = SM_UPDATE_UNSPECIFIED, \ } +struct submodule_status_util { + int changed, ignore_untracked; + unsigned dirty_submodule, newmode; + struct cache_entry *ce; + const char *path; +}; + int is_gitmodules_unmerged(struct index_state *istate); int is_writing_gitmodules_ok(void); int is_staging_gitmodules_ok(struct index_state *istate); @@ -94,6 +101,8 @@ int fetch_submodules(struct repository *r, int command_line_option, int default_option, int quiet, int max_parallel_jobs); +int get_submodules_status(struct string_list *submodules, + int max_parallel_jobs); unsigned is_submodule_modified(const char *path, int ignore_untracked); int submodule_uses_gitfile(const char *path); diff --git a/t/t4027-diff-submodule.sh b/t/t4027-diff-submodule.sh index 40164ae07d..1c747cc325 100755 --- a/t/t4027-diff-submodule.sh +++ b/t/t4027-diff-submodule.sh @@ -34,6 +34,25 @@ test_expect_success setup ' subtip=$3 subprev=$2 ' +test_expect_success 'diff in superproject with submodules respects parallel settings' ' + test_when_finished "rm -f trace.out" && + ( + GIT_TRACE=$(pwd)/trace.out git diff && + grep "1 tasks" trace.out && + >trace.out && + + git config 
submodule.diffJobs 8 && + GIT_TRACE=$(pwd)/trace.out git diff && + grep "8 tasks" trace.out && + >trace.out && + + GIT_TRACE=$(pwd)/trace.out git -c submodule.diffJobs=0 diff && + grep "preparing to run up to [0-9]* tasks" trace.out && + ! grep "up to 0 tasks" trace.out && + >trace.out + ) +' + test_expect_success 'git diff --raw HEAD' ' hexsz=$(test_oid hexsz) && git diff --raw --abbrev=$hexsz HEAD >actual && @@ -70,6 +89,18 @@ test_expect_success 'git diff HEAD with dirty submodule (work tree)' ' test_cmp expect.body actual.body ' +test_expect_success 'git diff HEAD with dirty submodule (work tree, parallel)' ' + ( + cd sub && + git reset --hard && + echo >>world + ) && + git -c submodule.diffJobs=8 diff HEAD >actual && + sed -e "1,/^@@/d" actual >actual.body && + expect_from_to >expect.body $subtip $subprev-dirty && + test_cmp expect.body actual.body +' + test_expect_success 'git diff HEAD with dirty submodule (index)' ' ( cd sub && diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh index d050091345..7da64e4c4c 100755 --- a/t/t7506-status-submodule.sh +++ b/t/t7506-status-submodule.sh @@ -412,4 +412,29 @@ test_expect_success 'status with added file in nested submodule (short)' ' EOF ' +test_expect_success 'status in superproject with submodules respects parallel settings' ' + test_when_finished "rm -f trace.out" && + ( + GIT_TRACE=$(pwd)/trace.out git status && + grep "1 tasks" trace.out && + >trace.out && + + git config submodule.diffJobs 8 && + GIT_TRACE=$(pwd)/trace.out git status && + grep "8 tasks" trace.out && + >trace.out && + + GIT_TRACE=$(pwd)/trace.out git -c submodule.diffJobs=0 status && + grep "preparing to run up to [0-9]* tasks" trace.out && + ! 
grep "up to 0 tasks" trace.out && + >trace.out + ) +' + +test_expect_success 'status in superproject with submodules (parallel)' ' + git -C super status --porcelain >output && + git -C super -c submodule.diffJobs=8 status --porcelain >output_parallel && + diff output output_parallel +' + test_done -- 2.39.1.519.gcb327c4b5f-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
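Editorial sketch: the commit message and the documentation hunk describe three cases for submodule.diffJobs: unset means 1 job, 0 means online_cpus(), and any other value is used as given. The rule can be written as a small pure function. resolve_diff_jobs() is a hypothetical name, and "ncpus" is passed in as a parameter here only so the sketch stays self-contained instead of calling online_cpus().

```c
/*
 * Hypothetical helper mirroring the job-count logic in the patch:
 *   - option not set        -> 1 job (serial behaviour)
 *   - option set to 0       -> one job per online CPU
 *   - option set to N > 0   -> N jobs
 * 'config_is_set' corresponds to the config lookup succeeding.
 */
static unsigned long resolve_diff_jobs(int config_is_set,
				       unsigned long configured,
				       unsigned long ncpus)
{
	if (!config_is_set)
		return 1;
	if (configured == 0)
		return ncpus;
	return configured;
}
```

For example, resolve_diff_jobs(1, 8, 4) yields 8, which matches the "8 tasks" line the new tests grep for after setting submodule.diffJobs to 8.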
* Re: [PATCH v8 6/6] diff-lib: parallelize run_diff_files for submodules 2023-02-09 0:02 ` [PATCH v8 6/6] diff-lib: parallelize run_diff_files for submodules Calvin Wan @ 2023-02-13 8:36 ` Glen Choo 0 siblings, 0 replies; 86+ messages in thread From: Glen Choo @ 2023-02-13 8:36 UTC (permalink / raw) To: Calvin Wan, git Cc: Calvin Wan, avarab, newren, jonathantanmy, phillip.wood123 Calvin Wan <calvinwan@google.com> writes: > @@ -244,6 +266,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option) > newmode = ce->ce_mode; > } else { > struct stat st; > + unsigned ignore_untracked = 0; > + int defer_submodule_status = 1; > > changed = check_removed(istate, ce, &st); > if (changed) { Previously [1] it wasn't entirely clear whether we intended to always parallelize submodule diffing, but now it seems that we always try to parallelize. In essence, this means that we don't have a serial implementation any more, but maybe that's okay. [1] https://lore.kernel.org/git/kl6lilgtveoe.fsf@chooglen-macbookpro.roam.corp.google.com/ > @@ -265,14 +289,53 @@ int run_diff_files(struct rev_info *revs, unsigned int option) > } > > changed = match_stat_with_submodule(&revs->diffopt, ce, &st, > - ce_option, &dirty_submodule); > + ce_option, &dirty_submodule, > + &defer_submodule_status, > + &ignore_untracked); Here we get the 'changed' bit of the submodule. Because we always defer, we never call is_submodule_modified() inside match_stat_with_submodule() and as such, we never set "dirty_submodule" here. If so, could we remove the variable altogether? 
> newmode = ce_mode_from_stat(ce, st.st_mode); > + if (defer_submodule_status) { > + struct submodule_status_util tmp = { > + .changed = changed, > + .dirty_submodule = 0, > + .ignore_untracked = ignore_untracked, > + .newmode = newmode, > + .ce = ce, > + .path = ce->name, > + }; > + struct string_list_item *item; > + > + item = string_list_append(&submodules, ce->name); > + item->util = xmalloc(sizeof(tmp)); > + memcpy(item->util, &tmp, sizeof(tmp)); > + continue; > + } > } > > if (diff_change_helper(&revs->diffopt, newmode, dirty_submodule, > changed, istate, ce)) I'm surprised to see that we still call "diff_change_helper()" even though we've 'deferred' the submodule diff, especially since "changed" is set and "dirty_submodule" is unset. Even if this is safe, I think we shouldn't do this because... > + if (submodules.nr) { > + unsigned long parallel_jobs; > + struct string_list_item *item; > + > + if (git_config_get_ulong("submodule.diffjobs", &parallel_jobs)) > + parallel_jobs = 1; > + else if (!parallel_jobs) > + parallel_jobs = online_cpus(); > + > + if (get_submodules_status(&submodules, parallel_jobs)) > + die(_("submodule status failed")); > + for_each_string_list_item(item, &submodules) { > + struct submodule_status_util *util = item->util; > + > + if (diff_change_helper(&revs->diffopt, util->newmode, > + util->dirty_submodule, util->changed, > + istate, util->ce)) Here we call "diff_change_helper()" again on the deferred submodule, but now with the "dirty_submodule" value we expected. At best this is wasteful, but at worst this is possibly wrong. For good measure, I applied this patch to see if we needed either "dirty_submodule" or the second "diff_change_helper()" call; our test suite still passes after I remove both of them.
diff --git a/diff-lib.c b/diff-lib.c index 2dde575ec6..21adcc7fd6 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -156,6 +156,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option) struct cache_entry *ce = istate->cache[i]; int changed; unsigned dirty_submodule = 0; + int defer_submodule_status = 1; if (diff_can_quit_early(&revs->diffopt)) break; @@ -267,7 +268,6 @@ int run_diff_files(struct rev_info *revs, unsigned int option) } else { struct stat st; unsigned ignore_untracked = 0; - int defer_submodule_status = 1; changed = check_removed(istate, ce, &st); if (changed) { @@ -311,9 +311,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option) } } - if (diff_change_helper(&revs->diffopt, newmode, dirty_submodule, - changed, istate, ce)) - continue; + if (!defer_submodule_status) + diff_change_helper(&revs->diffopt, newmode, 0, + changed, istate, ce); } if (submodules.nr) { unsigned long parallel_jobs; > +static void parse_status_porcelain_strbuf(struct strbuf *buf, > + unsigned *dirty_submodule, > + int ignore_untracked) > +{ > + struct string_list list = STRING_LIST_INIT_DUP; > + struct string_list_item *item; > + > + string_list_split(&list, buf->buf, '\n', -1); > + > + for_each_string_list_item(item, &list) { > + if (parse_status_porcelain(item->string, > + strlen(item->string), > + dirty_submodule, > + ignore_untracked)) > + break; > + } > + string_list_clear(&list, 0); > +} Given that this function only has one caller, is quite simple, and isn't actually a strbuf version of "parse_status_porcelain()" (it's actually a multiline version that also happens to accept a strbuf), I think this might be better inlined. 
> +test_expect_success 'status in superproject with submodules (parallel)' '
> +	git -C super status --porcelain >output &&
> +	git -C super -c submodule.diffJobs=8 status --porcelain >output_parallel &&
> +	diff output output_parallel
> +'
> +
>  test_done

When I first suggested this test, I thought that we would sometimes
defer submodule status and sometimes not, so this would be a good way to
check between both modes. Now this is less useful, since this is only
checking that parallelism > 1 doesn't affect the output, but it's still
a useful reasonableness check IMO.

Thanks.
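The job-count selection in the hunk quoted in the review above can be restated as a small standalone function. This is only a sketch: pick_diff_jobs() and its parameters are hypothetical names, "found" models the assumption that git_config_get_ulong() returns non-zero when the key is unset, and "ncpus" stands in for online_cpus().

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the patch. "found" says whether the
 * config key was set, "configured" is its value, and "ncpus" stands in
 * for the result of online_cpus().
 */
static unsigned long pick_diff_jobs(int found, unsigned long configured,
				    unsigned long ncpus)
{
	assert(ncpus > 0);
	if (!found)
		return 1;	/* key unset: stay serial */
	if (!configured)
		return ncpus;	/* 0 means "one job per core" */
	return configured;	/* explicit value wins */
}
```

Under these assumptions, the test quoted above (submodule.diffJobs=8) exercises the last branch, while a plain "git status" with no config takes the serial one.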
* [PATCH v7 1/7] run-command: add duplicate_output_fn to run_processes_parallel_opts 2023-01-17 19:30 ` [PATCH v6 " Calvin Wan 2023-02-07 18:16 ` [PATCH v7 0/7] " Calvin Wan @ 2023-02-07 18:17 ` Calvin Wan 2023-02-07 22:16 ` Ævar Arnfjörð Bjarmason 2023-02-08 14:19 ` Phillip Wood 2023-02-07 18:17 ` [PATCH v7 2/7] submodule: strbuf variable rename Calvin Wan ` (5 subsequent siblings) 7 siblings, 2 replies; 86+ messages in thread From: Calvin Wan @ 2023-02-07 18:17 UTC (permalink / raw) To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy Add duplicate_output_fn as an optionally set function in run_process_parallel_opts. If set, output from each child process is copied and passed to the callback function whenever output from the child process is buffered to allow for separate parsing. Signed-off-by: Calvin Wan <calvinwan@google.com> --- run-command.c | 16 ++++++++++++--- run-command.h | 27 +++++++++++++++++++++++++ t/helper/test-run-command.c | 21 ++++++++++++++++++++ t/t0061-run-command.sh | 39 +++++++++++++++++++++++++++++++++++++ 4 files changed, 100 insertions(+), 3 deletions(-) diff --git a/run-command.c b/run-command.c index 756f1839aa..cad88befe0 100644 --- a/run-command.c +++ b/run-command.c @@ -1526,6 +1526,9 @@ static void pp_init(struct parallel_processes *pp, if (!opts->get_next_task) BUG("you need to specify a get_next_task function"); + if (opts->duplicate_output && opts->ungroup) + BUG("duplicate_output and ungroup are incompatible with each other"); + CALLOC_ARRAY(pp->children, n); if (!opts->ungroup) CALLOC_ARRAY(pp->pfd, n); @@ -1645,14 +1648,21 @@ static void pp_buffer_stderr(struct parallel_processes *pp, for (size_t i = 0; i < opts->processes; i++) { if (pp->children[i].state == GIT_CP_WORKING && pp->pfd[i].revents & (POLLIN | POLLHUP)) { - int n = strbuf_read_once(&pp->children[i].err, - pp->children[i].process.err, 0); + ssize_t n = strbuf_read_once(&pp->children[i].err, + pp->children[i].process.err, 0); if (n == 0) { 
close(pp->children[i].process.err); pp->children[i].state = GIT_CP_WAIT_CLEANUP; - } else if (n < 0) + } else if (n < 0) { if (errno != EAGAIN) die_errno("read"); + } else { + if (opts->duplicate_output) + opts->duplicate_output(&pp->children[i].err, + strlen(pp->children[i].err.buf) - n, + opts->data, + pp->children[i].data); + } } } } diff --git a/run-command.h b/run-command.h index 072db56a4d..6dcf999f6c 100644 --- a/run-command.h +++ b/run-command.h @@ -408,6 +408,27 @@ typedef int (*start_failure_fn)(struct strbuf *out, void *pp_cb, void *pp_task_cb); +/** + * This callback is called whenever output from a child process is buffered + * + * See run_processes_parallel() below for a discussion of the "struct + * strbuf *out" parameter. + * + * The offset refers to the number of bytes originally in "out" before + * the output from the child process was buffered. Therefore, the buffer + * range, "out + buf" to the end of "out", would contain the buffer of + * the child process output. + * + * pp_cb is the callback cookie as passed into run_processes_parallel, + * pp_task_cb is the callback cookie as passed into get_next_task_fn. + * + * This function is incompatible with "ungroup" + */ +typedef void (*duplicate_output_fn)(struct strbuf *out, + size_t offset, + void *pp_cb, + void *pp_task_cb); + /** * This callback is called on every child process that finished processing. * @@ -461,6 +482,12 @@ struct run_process_parallel_opts */ start_failure_fn start_failure; + /** + * duplicate_output: See duplicate_output_fn() above. This should be + * NULL unless process specific output is needed + */ + duplicate_output_fn duplicate_output; + /** * task_finished: See task_finished_fn() above. This can be * NULL to omit any special handling. 
diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c index 3ecb830f4a..ffd3cd0045 100644 --- a/t/helper/test-run-command.c +++ b/t/helper/test-run-command.c @@ -52,6 +52,21 @@ static int no_job(struct child_process *cp, return 0; } +static void duplicate_output(struct strbuf *out, + size_t offset, + void *pp_cb UNUSED, + void *pp_task_cb UNUSED) +{ + struct string_list list = STRING_LIST_INIT_DUP; + + string_list_split(&list, out->buf + offset, '\n', -1); + for (size_t i = 0; i < list.nr; i++) { + if (strlen(list.items[i].string) > 0) + fprintf(stderr, "duplicate_output: %s\n", list.items[i].string); + } + string_list_clear(&list, 0); +} + static int task_finished(int result, struct strbuf *err, void *pp_cb, @@ -439,6 +454,12 @@ int cmd__run_command(int argc, const char **argv) opts.ungroup = 1; } + if (!strcmp(argv[1], "--duplicate-output")) { + argv += 1; + argc -= 1; + opts.duplicate_output = duplicate_output; + } + jobs = atoi(argv[2]); strvec_clear(&proc.args); strvec_pushv(&proc.args, (const char **)argv + 3); diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh index e2411f6a9b..879e536638 100755 --- a/t/t0061-run-command.sh +++ b/t/t0061-run-command.sh @@ -135,6 +135,15 @@ test_expect_success 'run_command runs in parallel with more jobs available than test_cmp expect actual ' +test_expect_success 'run_command runs in parallel with more jobs available than tasks --duplicate-output' ' + test-tool run-command --duplicate-output run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test 4 = $(grep -c "duplicate_output: Hello" err) && + test 4 = $(grep -c "duplicate_output: World" err) && + sed "/duplicate_output/d" err > err1 && + test_cmp expect err1 +' + test_expect_success 'run_command runs ungrouped in parallel with more jobs available than tasks' ' test-tool run-command --ungroup run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && test_line_count = 8 out 
&& @@ -147,6 +156,15 @@ test_expect_success 'run_command runs in parallel with as many jobs as tasks' ' test_cmp expect actual ' +test_expect_success 'run_command runs in parallel with as many jobs as tasks --duplicate-output' ' + test-tool run-command --duplicate-output run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test 4 = $(grep -c "duplicate_output: Hello" err) && + test 4 = $(grep -c "duplicate_output: World" err) && + sed "/duplicate_output/d" err > err1 && + test_cmp expect err1 +' + test_expect_success 'run_command runs ungrouped in parallel with as many jobs as tasks' ' test-tool run-command --ungroup run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && test_line_count = 8 out && @@ -159,6 +177,15 @@ test_expect_success 'run_command runs in parallel with more tasks than jobs avai test_cmp expect actual ' +test_expect_success 'run_command runs in parallel with more tasks than jobs available --duplicate-output' ' + test-tool run-command --duplicate-output run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test 4 = $(grep -c "duplicate_output: Hello" err) && + test 4 = $(grep -c "duplicate_output: World" err) && + sed "/duplicate_output/d" err > err1 && + test_cmp expect err1 +' + test_expect_success 'run_command runs ungrouped in parallel with more tasks than jobs available' ' test-tool run-command --ungroup run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && test_line_count = 8 out && @@ -180,6 +207,12 @@ test_expect_success 'run_command is asked to abort gracefully' ' test_cmp expect actual ' +test_expect_success 'run_command is asked to abort gracefully --duplicate-output' ' + test-tool run-command --duplicate-output run-command-abort 3 false >out 2>err && + test_must_be_empty out && + test_cmp expect err +' + test_expect_success 'run_command is asked to abort gracefully (ungroup)' ' 
test-tool run-command --ungroup run-command-abort 3 false >out 2>err && test_must_be_empty out && @@ -196,6 +229,12 @@ test_expect_success 'run_command outputs ' ' test_cmp expect actual ' +test_expect_success 'run_command outputs --duplicate-output' ' + test-tool run-command --duplicate-output run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test_cmp expect err +' + test_expect_success 'run_command outputs (ungroup) ' ' test-tool run-command --ungroup run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && test_must_be_empty out && -- 2.39.1.519.gcb327c4b5f-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v7 1/7] run-command: add duplicate_output_fn to run_processes_parallel_opts
From: Ævar Arnfjörð Bjarmason @ 2023-02-07 22:16 UTC
To: Calvin Wan; +Cc: git, chooglen, newren, jonathantanmy

On Tue, Feb 07 2023, Calvin Wan wrote:

> diff --git a/run-command.c b/run-command.c
> index 756f1839aa..cad88befe0 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -1526,6 +1526,9 @@ static void pp_init(struct parallel_processes *pp,
>  	if (!opts->get_next_task)
>  		BUG("you need to specify a get_next_task function");
>
> +	if (opts->duplicate_output && opts->ungroup)
> +		BUG("duplicate_output and ungroup are incompatible with each other");
> +
>  	CALLOC_ARRAY(pp->children, n);
>  	if (!opts->ungroup)
>  		CALLOC_ARRAY(pp->pfd, n);

A trivial request, not worth a re-roll in itself: The "prep" topic[1] I
have for Emily's eventual config-based hooks doesn't need to add new
run-command.c modes that are incompatible with ungroup, but that happens
in the next stage of that saga.

When I merge your topic here with that, the end result here is:

	if (opts->ungroup) {
		if (opts->feed_pipe)
			BUG(".ungroup=1 is incompatible with .feed_pipe != NULL");
		if (opts->consume_sideband)
			BUG(".ungroup=1 is incompatible with .consume_sideband != NULL");
	}

	if (!opts->get_next_task)
		BUG("you need to specify a get_next_task function");

	if (opts->duplicate_output && opts->ungroup)
		BUG("duplicate_output and ungroup are incompatible with each other");

So, whether we do the incompatibility check before or after
"get_next_task" is arbitrary. If I had to pick, I think doing it after
as you're doing here probably makes more sense.
But would you mind if this addition of yours were instead:

	if (opts->ungroup) {
		if (opts->duplicate_output)
			BUG("duplicate_output and ungroup are incompatible with each other");
	}

Like I said, a trivial request. But it will save us the eventual
refactoring of that into nested checks as we add more of these options.

To the extent that we need to mention the seemingly odd-looking pattern
we could just say that we're future-proofing this for future
incompatible modes.

1. https://lore.kernel.org/git/cover-0.5-00000000000-20230123T170550Z-avarab@gmail.com/#t

> @@ -1645,14 +1648,21 @@ static void pp_buffer_stderr(struct parallel_processes *pp,
>  	for (size_t i = 0; i < opts->processes; i++) {
>  		if (pp->children[i].state == GIT_CP_WORKING &&
>  		    pp->pfd[i].revents & (POLLIN | POLLHUP)) {
> -			int n = strbuf_read_once(&pp->children[i].err,
> -						 pp->children[i].process.err, 0);
> +			ssize_t n = strbuf_read_once(&pp->children[i].err,
> +						     pp->children[i].process.err, 0);

This s/int/ssize_t/ change is a good one, but not mentioned in the
commit message. Maybe worth splitting out?

If I revert that back to "int" on top of this entire topic our tests
still pass, so while it's a good change it seems entirely unrelated to
the "duplicate_output" subject of this patch.

>  			if (n == 0) {
>  				close(pp->children[i].process.err);
>  				pp->children[i].state = GIT_CP_WAIT_CLEANUP;
> -			} else if (n < 0)
> +			} else if (n < 0) {

Here you're adding braces, which is an otherwise good change (but maybe
worth splitting up, I haven't read the rest of this topic to see if
there's even more style changes).

In this case we should/could have done this change with the pre-image,
before "duplicate_output".
>  				if (errno != EAGAIN)
>  					die_errno("read");
> +			} else {
> +				if (opts->duplicate_output)

I've read ahead and this topic adds nothing new to this "else" block, so
why the extra indentation instead of:

	} else if (opts->duplicate_output) {
		[...];

> +					opts->duplicate_output(&pp->children[i].err,
> +						strlen(pp->children[i].err.buf) - n,

Uh, why are we getting the length of strbuf with strlen()? Am I missing
something obvious here, or should this be:

	pp->children[i].err.len - n

?

> +						opts->data,
> +						pp->children[i].data);

Especially with how otherwise painful the wrapping is here (well, not
very, but we can easily save a \t-indent here).

> +			}
>  		}
>  	}
>  }
> diff --git a/run-command.h b/run-command.h
> index 072db56a4d..6dcf999f6c 100644
> --- a/run-command.h
> +++ b/run-command.h
> @@ -408,6 +408,27 @@ typedef int (*start_failure_fn)(struct strbuf *out,
>  				void *pp_cb,
>  				void *pp_task_cb);
>
> +/**
> + * This callback is called whenever output from a child process is buffered
> + *
> + * See run_processes_parallel() below for a discussion of the "struct
> + * strbuf *out" parameter.
> + *
> + * The offset refers to the number of bytes originally in "out" before
> + * the output from the child process was buffered. Therefore, the buffer
> + * range, "out + buf" to the end of "out", would contain the buffer of
> + * the child process output.
> + *
> + * pp_cb is the callback cookie as passed into run_processes_parallel,
> + * pp_task_cb is the callback cookie as passed into get_next_task_fn.
> + *
> + * This function is incompatible with "ungroup"
> + */
> +typedef void (*duplicate_output_fn)(struct strbuf *out,
> +				    size_t offset,
> +				    void *pp_cb,
> +				    void *pp_task_cb);

There's some over-wrapping here, I see some existing code does it, but
for new code we could follow our usual style, which would put this on
two lines.

> +
>  /**
>   * This callback is called on every child process that finished processing.
>   *
> @@ -461,6 +482,12 @@ struct run_process_parallel_opts
>  	 */
>  	start_failure_fn start_failure;
>
> +	/**
> +	 * duplicate_output: See duplicate_output_fn() above. This should be
> +	 * NULL unless process specific output is needed
> +	 */

Here we mostly refer to the previous docs, but the "unless process
specific output is needed" is very confusing. Without seeing the name or
having read the above I'd think this were some "do_not_pipe_to_dev_null"
feature.

Shouldn't we say "Unless you need to capture the output... leave this at
NULL" or something?

> +static void duplicate_output(struct strbuf *out,
> +			size_t offset,
> +			void *pp_cb UNUSED,
> +			void *pp_task_cb UNUSED)
> +{
> +	struct string_list list = STRING_LIST_INIT_DUP;
> +
> +	string_list_split(&list, out->buf + offset, '\n', -1);
> +	for (size_t i = 0; i < list.nr; i++) {
> +		if (strlen(list.items[i].string) > 0)

First, you can use for_each_string_list_item() here to make this look
much nicer/simpler.

Second, don't use strlen(s) > 0, just use strlen(s).

Third, you can get rid of the {} braces for the "for" here.

But just getting rid of that strlen() check and printing makes all your
tests pass.

And why is this thing that wants to prove to us that we're capturing the
output wanting to strip successive newlines?

Using a struct string_list for this is also pretty wasteful, we could
just make this a while-loop that printed this string when it sees "\n".

But it's just test code, so we don't care, I think it's fine for it to
be wasteful, I just don't see why it's doing what it's doing, and what
it's going out of its way to do isn't tested for here.
> +test_expect_success 'run_command runs in parallel with more jobs available than tasks --duplicate-output' '
> +	test-tool run-command --duplicate-output run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
> +	test_must_be_empty out &&
> +	test 4 = $(grep -c "duplicate_output: Hello" err) &&
> +	test 4 = $(grep -c "duplicate_output: World" err) &&
> +	sed "/duplicate_output/d" err > err1 &&

Style: ">f" not "> f".
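The strlen() question raised in the review above can be illustrated outside of git. The sketch below uses a hypothetical struct mini_buf as a stand-in for struct strbuf (only a data array and a tracked length; the names are not git's). It shows one reason the tracked length is the safer source for the offset: if the appended bytes ever contain a NUL, strlen() stops early and "strlen - n" no longer points at the start of the new data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for git's struct strbuf; only what the demo needs. */
struct mini_buf {
	char data[64];
	size_t len;	/* number of valid bytes, tracked on every append */
};

/* Append n bytes and return the offset at which the new bytes start. */
static size_t mini_buf_append(struct mini_buf *b, const char *src, size_t n)
{
	size_t offset = b->len;

	assert(b->len + n < sizeof(b->data));
	memcpy(b->data + b->len, src, n);
	b->len += n;
	b->data[b->len] = '\0';	/* keep it usable as a C string */
	return offset;
}
```

After appending the five bytes "ab\0cd" and then "ef", the tracked length minus the last read size gives offset 5, whereas strlen() on the buffer reports 2. The tracked length is also available in constant time, while strlen() rescans the buffer on every read.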
* Re: [PATCH v7 1/7] run-command: add duplicate_output_fn to run_processes_parallel_opts
From: Calvin Wan @ 2023-02-08 22:50 UTC
To: Ævar Arnfjörð Bjarmason
Cc: git, chooglen, newren, jonathantanmy

> But would you mind if this addition of yours were instead:
>
> 	if (opts->ungroup) {
> 		if (opts->duplicate_output)
> 			BUG("duplicate_output and ungroup are incompatible with each other");
> 	}

I don't see why not -- will change.

> > @@ -1645,14 +1648,21 @@ static void pp_buffer_stderr(struct parallel_processes *pp,
> >  	for (size_t i = 0; i < opts->processes; i++) {
> >  		if (pp->children[i].state == GIT_CP_WORKING &&
> >  		    pp->pfd[i].revents & (POLLIN | POLLHUP)) {
> > -			int n = strbuf_read_once(&pp->children[i].err,
> > -						 pp->children[i].process.err, 0);
> > +			ssize_t n = strbuf_read_once(&pp->children[i].err,
> > +						     pp->children[i].process.err, 0);
>
> This s/int/ssize_t/ change is a good one, but not mentioned in the
> commit message. Maybe worth splitting out?

I'll call this and the style change out in the commit message instead
of splitting it out.

> And why is this thing that wants to prove to us that we're capturing the
> output wanting to strip successive newlines?

I added it as a sanity check originally, but you're right that this is
unnecessary.

Thanks for your comments on the other stylistic nits. I've gone ahead
and fixed them all.
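The nested-check layout agreed on above can be sketched in isolation. Everything here is illustrative: struct demo_opts is a trimmed-down, hypothetical stand-in for the real options struct, the flags are plain ints rather than callback pointers, and the function returns a message instead of calling BUG() so that it can be exercised without aborting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down option struct; the real one has more fields. */
struct demo_opts {
	int ungroup;
	int duplicate_output;	/* non-zero if a callback is set */
	int feed_pipe;		/* modeled on the merged snippet quoted earlier */
};

/* Returns NULL if the options are compatible, else a short description. */
static const char *demo_check_opts(const struct demo_opts *o)
{
	if (o->ungroup) {
		/* every "incompatible with ungroup" rule lives in one block */
		if (o->duplicate_output)
			return "duplicate_output and ungroup are incompatible";
		if (o->feed_pipe)
			return "feed_pipe and ungroup are incompatible";
	}
	return NULL;
}
```

The point of the layout is that a later option which also conflicts with "ungroup" only adds one inner check, rather than another top-level condition repeating "&& opts->ungroup".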
* Re: [PATCH v7 1/7] run-command: add duplicate_output_fn to run_processes_parallel_opts 2023-02-07 18:17 ` [PATCH v7 1/7] run-command: add duplicate_output_fn to run_processes_parallel_opts Calvin Wan 2023-02-07 22:16 ` Ævar Arnfjörð Bjarmason @ 2023-02-08 14:19 ` Phillip Wood 2023-02-08 22:54 ` Calvin Wan 1 sibling, 1 reply; 86+ messages in thread From: Phillip Wood @ 2023-02-08 14:19 UTC (permalink / raw) To: Calvin Wan, git; +Cc: avarab, chooglen, newren, jonathantanmy Hi Calvin On 07/02/2023 18:17, Calvin Wan wrote: > Add duplicate_output_fn as an optionally set function in > run_process_parallel_opts. If set, output from each child process is > copied and passed to the callback function whenever output from the > child process is buffered to allow for separate parsing. > > Signed-off-by: Calvin Wan <calvinwan@google.com> > --- > run-command.c | 16 ++++++++++++--- > run-command.h | 27 +++++++++++++++++++++++++ > t/helper/test-run-command.c | 21 ++++++++++++++++++++ > t/t0061-run-command.sh | 39 +++++++++++++++++++++++++++++++++++++ > 4 files changed, 100 insertions(+), 3 deletions(-) > > diff --git a/run-command.c b/run-command.c > index 756f1839aa..cad88befe0 100644 > --- a/run-command.c > +++ b/run-command.c > @@ -1526,6 +1526,9 @@ static void pp_init(struct parallel_processes *pp, > if (!opts->get_next_task) > BUG("you need to specify a get_next_task function"); > > + if (opts->duplicate_output && opts->ungroup) > + BUG("duplicate_output and ungroup are incompatible with each other"); > + > CALLOC_ARRAY(pp->children, n); > if (!opts->ungroup) > CALLOC_ARRAY(pp->pfd, n); > @@ -1645,14 +1648,21 @@ static void pp_buffer_stderr(struct parallel_processes *pp, > for (size_t i = 0; i < opts->processes; i++) { > if (pp->children[i].state == GIT_CP_WORKING && > pp->pfd[i].revents & (POLLIN | POLLHUP)) { > - int n = strbuf_read_once(&pp->children[i].err, > - pp->children[i].process.err, 0); > + ssize_t n = strbuf_read_once(&pp->children[i].err, > + 
pp->children[i].process.err, 0); > if (n == 0) { > close(pp->children[i].process.err); > pp->children[i].state = GIT_CP_WAIT_CLEANUP; > - } else if (n < 0) > + } else if (n < 0) { > if (errno != EAGAIN) > die_errno("read"); > + } else { > + if (opts->duplicate_output) > + opts->duplicate_output(&pp->children[i].err, > + strlen(pp->children[i].err.buf) - n, Looking at how this is used in patch 7 I think it would be better to pass a const char*, length pair rather than a struct strbuf*, offset pair. i.e. opts->duplicate_output(pp->children[i].err.buf + pp->children[i].err.len - n, n, ...) That would make it clear that we do not expect duplicate_output() to alter the buffer and would avoid the duplicate_output() having to add the offset to the start of the buffer to find the new data. Best Wishes Phillip > + opts->data, > + pp->children[i].data); > + } > } > } > } > diff --git a/run-command.h b/run-command.h > index 072db56a4d..6dcf999f6c 100644 > --- a/run-command.h > +++ b/run-command.h > @@ -408,6 +408,27 @@ typedef int (*start_failure_fn)(struct strbuf *out, > void *pp_cb, > void *pp_task_cb); > > +/** > + * This callback is called whenever output from a child process is buffered > + * > + * See run_processes_parallel() below for a discussion of the "struct > + * strbuf *out" parameter. > + * > + * The offset refers to the number of bytes originally in "out" before > + * the output from the child process was buffered. Therefore, the buffer > + * range, "out + buf" to the end of "out", would contain the buffer of > + * the child process output. > + * > + * pp_cb is the callback cookie as passed into run_processes_parallel, > + * pp_task_cb is the callback cookie as passed into get_next_task_fn. > + * > + * This function is incompatible with "ungroup" > + */ > +typedef void (*duplicate_output_fn)(struct strbuf *out, > + size_t offset, > + void *pp_cb, > + void *pp_task_cb); > + > /** > * This callback is called on every child process that finished processing. 
> * > @@ -461,6 +482,12 @@ struct run_process_parallel_opts > */ > start_failure_fn start_failure; > > + /** > + * duplicate_output: See duplicate_output_fn() above. This should be > + * NULL unless process specific output is needed > + */ > + duplicate_output_fn duplicate_output; > + > /** > * task_finished: See task_finished_fn() above. This can be > * NULL to omit any special handling. > diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c > index 3ecb830f4a..ffd3cd0045 100644 > --- a/t/helper/test-run-command.c > +++ b/t/helper/test-run-command.c > @@ -52,6 +52,21 @@ static int no_job(struct child_process *cp, > return 0; > } > > +static void duplicate_output(struct strbuf *out, > + size_t offset, > + void *pp_cb UNUSED, > + void *pp_task_cb UNUSED) > +{ > + struct string_list list = STRING_LIST_INIT_DUP; > + > + string_list_split(&list, out->buf + offset, '\n', -1); > + for (size_t i = 0; i < list.nr; i++) { > + if (strlen(list.items[i].string) > 0) > + fprintf(stderr, "duplicate_output: %s\n", list.items[i].string); > + } > + string_list_clear(&list, 0); > +} > + > static int task_finished(int result, > struct strbuf *err, > void *pp_cb, > @@ -439,6 +454,12 @@ int cmd__run_command(int argc, const char **argv) > opts.ungroup = 1; > } > > + if (!strcmp(argv[1], "--duplicate-output")) { > + argv += 1; > + argc -= 1; > + opts.duplicate_output = duplicate_output; > + } > + > jobs = atoi(argv[2]); > strvec_clear(&proc.args); > strvec_pushv(&proc.args, (const char **)argv + 3); > diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh > index e2411f6a9b..879e536638 100755 > --- a/t/t0061-run-command.sh > +++ b/t/t0061-run-command.sh > @@ -135,6 +135,15 @@ test_expect_success 'run_command runs in parallel with more jobs available than > test_cmp expect actual > ' > > +test_expect_success 'run_command runs in parallel with more jobs available than tasks --duplicate-output' ' > + test-tool run-command --duplicate-output run-command-parallel 5 sh 
-c "printf \"%s\n%s\n\" Hello World" >out 2>err && > + test_must_be_empty out && > + test 4 = $(grep -c "duplicate_output: Hello" err) && > + test 4 = $(grep -c "duplicate_output: World" err) && > + sed "/duplicate_output/d" err > err1 && > + test_cmp expect err1 > +' > + > test_expect_success 'run_command runs ungrouped in parallel with more jobs available than tasks' ' > test-tool run-command --ungroup run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && > test_line_count = 8 out && > @@ -147,6 +156,15 @@ test_expect_success 'run_command runs in parallel with as many jobs as tasks' ' > test_cmp expect actual > ' > > +test_expect_success 'run_command runs in parallel with as many jobs as tasks --duplicate-output' ' > + test-tool run-command --duplicate-output run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && > + test_must_be_empty out && > + test 4 = $(grep -c "duplicate_output: Hello" err) && > + test 4 = $(grep -c "duplicate_output: World" err) && > + sed "/duplicate_output/d" err > err1 && > + test_cmp expect err1 > +' > + > test_expect_success 'run_command runs ungrouped in parallel with as many jobs as tasks' ' > test-tool run-command --ungroup run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && > test_line_count = 8 out && > @@ -159,6 +177,15 @@ test_expect_success 'run_command runs in parallel with more tasks than jobs avai > test_cmp expect actual > ' > > +test_expect_success 'run_command runs in parallel with more tasks than jobs available --duplicate-output' ' > + test-tool run-command --duplicate-output run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && > + test_must_be_empty out && > + test 4 = $(grep -c "duplicate_output: Hello" err) && > + test 4 = $(grep -c "duplicate_output: World" err) && > + sed "/duplicate_output/d" err > err1 && > + test_cmp expect err1 > +' > + > test_expect_success 'run_command runs ungrouped in parallel with more tasks than 
jobs available' ' > test-tool run-command --ungroup run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && > test_line_count = 8 out && > @@ -180,6 +207,12 @@ test_expect_success 'run_command is asked to abort gracefully' ' > test_cmp expect actual > ' > > +test_expect_success 'run_command is asked to abort gracefully --duplicate-output' ' > + test-tool run-command --duplicate-output run-command-abort 3 false >out 2>err && > + test_must_be_empty out && > + test_cmp expect err > +' > + > test_expect_success 'run_command is asked to abort gracefully (ungroup)' ' > test-tool run-command --ungroup run-command-abort 3 false >out 2>err && > test_must_be_empty out && > @@ -196,6 +229,12 @@ test_expect_success 'run_command outputs ' ' > test_cmp expect actual > ' > > +test_expect_success 'run_command outputs --duplicate-output' ' > + test-tool run-command --duplicate-output run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && > + test_must_be_empty out && > + test_cmp expect err > +' > + > test_expect_success 'run_command outputs (ungroup) ' ' > test-tool run-command --ungroup run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && > test_must_be_empty out && ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v7 1/7] run-command: add duplicate_output_fn to run_processes_parallel_opts
From: Calvin Wan @ 2023-02-08 22:54 UTC
To: phillip.wood; +Cc: git, avarab, chooglen, newren, jonathantanmy

> > +			} else {
> > +				if (opts->duplicate_output)
> > +					opts->duplicate_output(&pp->children[i].err,
> > +						strlen(pp->children[i].err.buf) - n,
>
> Looking at how this is used in patch 7 I think it would be better to
> pass a const char*, length pair rather than a struct strbuf*, offset pair.
> i.e.
> 	opts->duplicate_output(pp->children[i].err.buf +
> 			       pp->children[i].err.len - n, n, ...)
>
> That would make it clear that we do not expect duplicate_output() to
> alter the buffer and would avoid the duplicate_output() having to add
> the offset to the start of the buffer to find the new data.

I don't think that would work since
pp->children[i].err.buf + pp->children[i].err.len - n
wouldn't end up as a const char* unless I'm missing something?
* Re: [PATCH v7 1/7] run-command: add duplicate_output_fn to run_processes_parallel_opts
From: Phillip Wood @ 2023-02-09 20:37 UTC
To: Calvin Wan; +Cc: git, avarab, chooglen, newren, jonathantanmy

Hi Calvin

On 08/02/2023 22:54, Calvin Wan wrote:
>>> +			} else {
>>> +				if (opts->duplicate_output)
>>> +					opts->duplicate_output(&pp->children[i].err,
>>> +						strlen(pp->children[i].err.buf) - n,
>>
>> Looking at how this is used in patch 7 I think it would be better to
>> pass a const char*, length pair rather than a struct strbuf*, offset pair.
>> i.e.
>> 	opts->duplicate_output(pp->children[i].err.buf +
>> 			       pp->children[i].err.len - n, n, ...)
>>
>> That would make it clear that we do not expect duplicate_output() to
>> alter the buffer and would avoid the duplicate_output() having to add
>> the offset to the start of the buffer to find the new data.
>
> I don't think that would work since
> pp->children[i].err.buf + pp->children[i].err.len - n
> wouldn't end up as a const char* unless I'm missing something?

You can still pass it to a function that takes a const char* though and
change the type of the callback to

typedef void (*duplicate_output_fn)(const char *out, size_t offset,
				    void *pp_cb, void *pp_task_cb);

Best Wishes

Phillip
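The point made in the reply above, that a plain "char *" expression can be handed to a parameter declared "const char *", can be checked with a small standalone sketch. The names chunk_fn, notify_new_bytes() and record_chunk() are hypothetical, and the second callback parameter is treated as a length (the "const char*, length pair" wording from earlier in the thread), which is an assumption about the intent rather than the final API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: it sees only the newly read bytes. */
typedef void (*chunk_fn)(const char *out, size_t len, void *cb_data);

/* Hypothetical caller: buf holds "total" bytes, the last n of them new. */
static void notify_new_bytes(char *buf, size_t total, size_t n,
			     chunk_fn fn, void *cb_data)
{
	assert(n <= total);
	/* char * converts implicitly to const char * at the call site */
	fn(buf + total - n, n, cb_data);
}

/* Example callback state: a copy of the last chunk seen. */
struct seen_chunk {
	char text[16];
	size_t len;
};

/* Example callback: copies the chunk; it cannot write through "out". */
static void record_chunk(const char *out, size_t len, void *cb_data)
{
	struct seen_chunk *s = cb_data;

	assert(len < sizeof(s->text));
	memcpy(s->text, out, len);
	s->text[len] = '\0';
	s->len = len;
}
```

With a buffer holding "old:new" and n = 3, the callback receives exactly "new". Because the parameter is const-qualified, an accidental write to the shared buffer inside the callback would be rejected at compile time.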
* [PATCH v7 2/7] submodule: strbuf variable rename
From: Calvin Wan @ 2023-02-07 18:17 UTC
To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy

A preparatory change for a future patch that moves the status parsing
logic to a separate function.

Signed-off-by: Calvin Wan <calvinwan@google.com>
---
 submodule.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/submodule.c b/submodule.c
index fae24ef34a..faf37c1101 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1906,25 +1906,28 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked)
 
 	fp = xfdopen(cp.out, "r");
 	while (strbuf_getwholeline(&buf, fp, '\n') != EOF) {
+		char *str = buf.buf;
+		const size_t len = buf.len;
+
 		/* regular untracked files */
-		if (buf.buf[0] == '?')
+		if (str[0] == '?')
 			dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
 
-		if (buf.buf[0] == 'u' ||
-		    buf.buf[0] == '1' ||
-		    buf.buf[0] == '2') {
+		if (str[0] == 'u' ||
+		    str[0] == '1' ||
+		    str[0] == '2') {
 			/* T = line type, XY = status, SSSS = submodule state */
-			if (buf.len < strlen("T XY SSSS"))
+			if (len < strlen("T XY SSSS"))
 				BUG("invalid status --porcelain=2 line %s",
-				    buf.buf);
+				    str);
 
-			if (buf.buf[5] == 'S' && buf.buf[8] == 'U')
+			if (str[5] == 'S' && str[8] == 'U')
 				/* nested untracked file */
 				dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
 
-			if (buf.buf[0] == 'u' ||
-			    buf.buf[0] == '2' ||
-			    memcmp(buf.buf + 5, "S..U", 4))
+			if (str[0] == 'u' ||
+			    str[0] == '2' ||
+			    memcmp(str + 5, "S..U", 4))
 				/* other change */
 				dirty_submodule |= DIRTY_SUBMODULE_MODIFIED;
 		}
-- 
2.39.1.519.gcb327c4b5f-goog
* Re: [PATCH v7 2/7] submodule: strbuf variable rename 2023-02-07 18:17 ` [PATCH v7 2/7] submodule: strbuf variable rename Calvin Wan @ 2023-02-07 22:47 ` Ævar Arnfjörð Bjarmason 2023-02-08 22:59 ` Calvin Wan 0 siblings, 1 reply; 86+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2023-02-07 22:47 UTC (permalink / raw) To: Calvin Wan; +Cc: git, chooglen, newren, jonathantanmy On Tue, Feb 07 2023, Calvin Wan wrote: > A preparatory change for a future patch that moves the status parsing > logic to a separate function. Ah, I think I suggested splitting this up in some previous round, and coming back to this, this + the next patch look very nice with the move detection, thanks! > fp = xfdopen(cp.out, "r"); > while (strbuf_getwholeline(&buf, fp, '\n') != EOF) { > + char *str = buf.buf; > + const size_t len = buf.len; > + > /* regular untracked files */ > - if (buf.buf[0] == '?') > + if (str[0] == '?') > dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; I'll only add that we could also do this on top: diff --git a/submodule.c b/submodule.c index c7c6bfb2e26..eeb940d96a0 100644 --- a/submodule.c +++ b/submodule.c @@ -1875,7 +1875,7 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) struct child_process cp = CHILD_PROCESS_INIT; struct strbuf buf = STRBUF_INIT; FILE *fp; - unsigned dirty_submodule = 0; + unsigned dirty_submodule0 = 0; const char *git_dir; int ignore_cp_exit_code = 0; @@ -1908,10 +1908,11 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) while (strbuf_getwholeline(&buf, fp, '\n') != EOF) { char *str = buf.buf; const size_t len = buf.len; + unsigned *dirty_submodule = &dirty_submodule0; /* regular untracked files */ if (str[0] == '?') - dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; + *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; if (str[0] == 'u' || str[0] == '1' || @@ -1923,17 +1924,17 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) if (str[5] == 'S' && str[8] == 'U') /* nested untracked
file */ - dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; + *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; if (str[0] == 'u' || str[0] == '2' || memcmp(str + 5, "S..U", 4)) /* other change */ - dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; + *dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; } - if ((dirty_submodule & DIRTY_SUBMODULE_MODIFIED) && - ((dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) || + if ((*dirty_submodule & DIRTY_SUBMODULE_MODIFIED) && + ((*dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) || ignore_untracked)) { /* * We're not interested in any further information from @@ -1949,7 +1950,7 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) die(_("'git status --porcelain=2' failed in submodule %s"), path); strbuf_release(&buf); - return dirty_submodule; + return dirty_submodule0; } int submodule_uses_gitfile(const char *path) Which, if we're massaging this for a subsequent smaller diff we can do to make only the comment adjustment part of this be a non-moved line. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v7 2/7] submodule: strbuf variable rename 2023-02-07 22:47 ` Ævar Arnfjörð Bjarmason @ 2023-02-08 22:59 ` Calvin Wan 0 siblings, 0 replies; 86+ messages in thread From: Calvin Wan @ 2023-02-08 22:59 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, chooglen, newren, jonathantanmy > I'll only add that we could also do this on top: > > diff --git a/submodule.c b/submodule.c > index c7c6bfb2e26..eeb940d96a0 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -1875,7 +1875,7 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) > struct child_process cp = CHILD_PROCESS_INIT; > struct strbuf buf = STRBUF_INIT; > FILE *fp; > - unsigned dirty_submodule = 0; > + unsigned dirty_submodule0 = 0; > const char *git_dir; > int ignore_cp_exit_code = 0; > > @@ -1908,10 +1908,11 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) > while (strbuf_getwholeline(&buf, fp, '\n') != EOF) { > char *str = buf.buf; > const size_t len = buf.len; > + unsigned *dirty_submodule = &dirty_submodule0; > > /* regular untracked files */ > if (str[0] == '?') > - dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; > + *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; > > if (str[0] == 'u' || > str[0] == '1' || > @@ -1923,17 +1924,17 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) > > if (str[5] == 'S' && str[8] == 'U') > /* nested untracked file */ > - dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; > + *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; > > if (str[0] == 'u' || > str[0] == '2' || > memcmp(str + 5, "S..U", 4)) > /* other change */ > - dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; > + *dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; > } > > - if ((dirty_submodule & DIRTY_SUBMODULE_MODIFIED) && > - ((dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) || > + if ((*dirty_submodule & DIRTY_SUBMODULE_MODIFIED) && > + ((*dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) || > ignore_untracked)) { > /* > * We're not interested in any 
further information from > @@ -1949,7 +1950,7 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) > die(_("'git status --porcelain=2' failed in submodule %s"), path); > > strbuf_release(&buf); > - return dirty_submodule; > + return dirty_submodule0; > } > > int submodule_uses_gitfile(const char *path) > > Which, if we're massaging this for a subsequent smaller diff we can do > to make only the comment adjustment part of this be a non-moved line. Ah that's a neat little trick -- I'll save this one for the next time I do a refactor like this :) ^ permalink raw reply [flat|nested] 86+ messages in thread
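The pointer-alias trick discussed in the two messages above can be sketched in isolation. This is a simplified illustration, not the submodule.c code: `scan_inline`, `scan_line`, `scan_with_helper` and the `SKETCH_*` bit values are hypothetical, and the per-line logic is reduced to a single branch.

```c
#include <stddef.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define SKETCH_UNTRACKED 1u
#define SKETCH_MODIFIED  2u

/*
 * Step 1: keep the logic inline, but make the loop body go through a
 * pointer alias of the local, so it already reads "*dirty |= ...".
 */
static unsigned scan_inline(const char *const *lines, size_t nr)
{
	unsigned dirty0 = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		const char *str = lines[i];
		unsigned *dirty = &dirty0;

		if (str[0] == '?')
			*dirty |= SKETCH_UNTRACKED;
		else
			*dirty |= SKETCH_MODIFIED;
	}
	return dirty0;
}

/*
 * Step 2: the body moves into a helper without any textual change,
 * because it was already written against the names "str" and "dirty".
 */
static void scan_line(const char *str, unsigned *dirty)
{
	if (str[0] == '?')
		*dirty |= SKETCH_UNTRACKED;
	else
		*dirty |= SKETCH_MODIFIED;
}

static unsigned scan_with_helper(const char *const *lines, size_t nr)
{
	unsigned dirty = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		scan_line(lines[i], &dirty);
	return dirty;
}
```

Because the moved lines are byte-for-byte identical between the two steps, a move-aware diff view can show the second step as a pure move.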
* [PATCH v7 3/7] submodule: move status parsing into function 2023-01-17 19:30 ` [PATCH v6 " Calvin Wan ` (2 preceding siblings ...) 2023-02-07 18:17 ` [PATCH v7 2/7] submodule: strbuf variable rename Calvin Wan @ 2023-02-07 18:17 ` Calvin Wan 2023-02-07 18:17 ` [PATCH v7 4/7] submodule: refactor is_submodule_modified() Calvin Wan ` (3 subsequent siblings) 7 siblings, 0 replies; 86+ messages in thread From: Calvin Wan @ 2023-02-07 18:17 UTC (permalink / raw) To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy A future patch requires the ability to parse the output of git status --porcelain=2. Move parsing code from is_submodule_modified to parse_status_porcelain. Signed-off-by: Calvin Wan <calvinwan@google.com> --- submodule.c | 74 ++++++++++++++++++++++++++++++----------------------- 1 file changed, 42 insertions(+), 32 deletions(-) diff --git a/submodule.c b/submodule.c index faf37c1101..768d4b4cd7 100644 --- a/submodule.c +++ b/submodule.c @@ -1870,6 +1870,45 @@ int fetch_submodules(struct repository *r, return spf.result; } +static int parse_status_porcelain(char *str, size_t len, + unsigned *dirty_submodule, + int ignore_untracked) +{ + /* regular untracked files */ + if (str[0] == '?') + *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; + + if (str[0] == 'u' || + str[0] == '1' || + str[0] == '2') { + /* T = line type, XY = status, SSSS = submodule state */ + if (len < strlen("T XY SSSS")) + BUG("invalid status --porcelain=2 line %s", + str); + + if (str[5] == 'S' && str[8] == 'U') + /* nested untracked file */ + *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; + + if (str[0] == 'u' || + str[0] == '2' || + memcmp(str + 5, "S..U", 4)) + /* other change */ + *dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; + } + + if ((*dirty_submodule & DIRTY_SUBMODULE_MODIFIED) && + ((*dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) || + ignore_untracked)) { + /* + * We're not interested in any further information from + * the child any more, neither output nor its exit 
code. + */ + return 1; + } + return 0; +} + unsigned is_submodule_modified(const char *path, int ignore_untracked) { struct child_process cp = CHILD_PROCESS_INIT; @@ -1909,39 +1948,10 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) char *str = buf.buf; const size_t len = buf.len; - /* regular untracked files */ - if (str[0] == '?') - dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - - if (str[0] == 'u' || - str[0] == '1' || - str[0] == '2') { - /* T = line type, XY = status, SSSS = submodule state */ - if (len < strlen("T XY SSSS")) - BUG("invalid status --porcelain=2 line %s", - str); - - if (str[5] == 'S' && str[8] == 'U') - /* nested untracked file */ - dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - - if (str[0] == 'u' || - str[0] == '2' || - memcmp(str + 5, "S..U", 4)) - /* other change */ - dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; - } - - if ((dirty_submodule & DIRTY_SUBMODULE_MODIFIED) && - ((dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) || - ignore_untracked)) { - /* - * We're not interested in any further information from - * the child any more, neither output nor its exit code. - */ - ignore_cp_exit_code = 1; + ignore_cp_exit_code = parse_status_porcelain(str, len, &dirty_submodule, + ignore_untracked); + if (ignore_cp_exit_code) break; - } } fclose(fp); -- 2.39.1.519.gcb327c4b5f-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
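To make the line checks in the patch above concrete, here is a standalone sketch of the same logic. It is an illustration under assumptions, not a copy of git's build: `sketch_parse_status_line` and the `SKETCH_*` values are hypothetical names, a `-1` return stands in for the `BUG()` call, and the sample lines used below carry made-up placeholder fields.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define SKETCH_UNTRACKED 1u
#define SKETCH_MODIFIED  2u

/*
 * Inspect one line of `git status --porcelain=2` output and update *dirty.
 * Returns 1 when nothing further can be learned from later lines,
 * 0 to keep reading, and -1 for a line too short to parse.
 */
static int sketch_parse_status_line(const char *str, size_t len,
				    unsigned *dirty, int ignore_untracked)
{
	/* "? <path>": a regular untracked file */
	if (str[0] == '?')
		*dirty |= SKETCH_UNTRACKED;

	if (str[0] == 'u' || str[0] == '1' || str[0] == '2') {
		/* T = line type, XY = status, SSSS = submodule state */
		if (len < strlen("T XY SSSS"))
			return -1;

		/* str[5..8] is the submodule field, e.g. "S..U" or "N..." */
		if (str[5] == 'S' && str[8] == 'U')
			*dirty |= SKETCH_UNTRACKED; /* nested untracked file */

		if (str[0] == 'u' || str[0] == '2' ||
		    memcmp(str + 5, "S..U", 4))
			*dirty |= SKETCH_MODIFIED; /* any other change */
	}

	if ((*dirty & SKETCH_MODIFIED) &&
	    ((*dirty & SKETCH_UNTRACKED) || ignore_untracked))
		return 1;
	return 0;
}
```

For example, a line starting with `1 .M S..U` sets only the untracked bit, while `1 .M N...` sets the modified bit; once both bits are set (or the modified bit is set and untracked files are ignored), the function reports that reading can stop.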
* [PATCH v7 4/7] submodule: refactor is_submodule_modified() 2023-01-17 19:30 ` [PATCH v6 " Calvin Wan ` (3 preceding siblings ...) 2023-02-07 18:17 ` [PATCH v7 3/7] submodule: move status parsing into function Calvin Wan @ 2023-02-07 18:17 ` Calvin Wan 2023-02-07 22:59 ` Ævar Arnfjörð Bjarmason 2023-02-07 18:17 ` [PATCH v7 5/7] diff-lib: refactor out diff_change logic Calvin Wan ` (2 subsequent siblings) 7 siblings, 1 reply; 86+ messages in thread From: Calvin Wan @ 2023-02-07 18:17 UTC (permalink / raw) To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy Refactor out submodule status logic and error messages that will be used in a future patch. Signed-off-by: Calvin Wan <calvinwan@google.com> --- submodule.c | 65 ++++++++++++++++++++++++++++++++++------------------- 1 file changed, 42 insertions(+), 23 deletions(-) diff --git a/submodule.c b/submodule.c index 768d4b4cd7..d88aa2c573 100644 --- a/submodule.c +++ b/submodule.c @@ -28,6 +28,10 @@ static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF; static int initialized_fetch_ref_tips; static struct oid_array ref_tips_before_fetch; static struct oid_array ref_tips_after_fetch; +static const char *status_porcelain_start_error = + N_("could not run 'git status --porcelain=2' in submodule %s"); +static const char *status_porcelain_fail_error = + N_("'git status --porcelain=2' failed in submodule %s"); /* * Check if the .gitmodules file is unmerged. 
Parsing of the .gitmodules file @@ -1870,6 +1874,40 @@ int fetch_submodules(struct repository *r, return spf.result; } +static int verify_submodule_git_directory(const char *path) +{ + const char *git_dir; + struct strbuf buf = STRBUF_INIT; + + strbuf_addf(&buf, "%s/.git", path); + git_dir = read_gitfile(buf.buf); + if (!git_dir) + git_dir = buf.buf; + if (!is_git_directory(git_dir)) { + if (is_directory(git_dir)) + die(_("'%s' not recognized as a git repository"), git_dir); + strbuf_release(&buf); + /* The submodule is not checked out, so it is not modified */ + return 0; + } + strbuf_release(&buf); + return 1; +} + +static void prepare_status_porcelain(struct child_process *cp, + const char *path, int ignore_untracked) +{ + strvec_pushl(&cp->args, "status", "--porcelain=2", NULL); + if (ignore_untracked) + strvec_push(&cp->args, "-uno"); + + prepare_submodule_repo_env(&cp->env); + cp->git_cmd = 1; + cp->no_stdin = 1; + cp->out = -1; + cp->dir = path; +} + static int parse_status_porcelain(char *str, size_t len, unsigned *dirty_submodule, int ignore_untracked) @@ -1915,33 +1953,14 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) struct strbuf buf = STRBUF_INIT; FILE *fp; unsigned dirty_submodule = 0; - const char *git_dir; int ignore_cp_exit_code = 0; - strbuf_addf(&buf, "%s/.git", path); - git_dir = read_gitfile(buf.buf); - if (!git_dir) - git_dir = buf.buf; - if (!is_git_directory(git_dir)) { - if (is_directory(git_dir)) - die(_("'%s' not recognized as a git repository"), git_dir); - strbuf_release(&buf); - /* The submodule is not checked out, so it is not modified */ + if (!verify_submodule_git_directory(path)) return 0; - } - strbuf_reset(&buf); - - strvec_pushl(&cp.args, "status", "--porcelain=2", NULL); - if (ignore_untracked) - strvec_push(&cp.args, "-uno"); - prepare_submodule_repo_env(&cp.env); - cp.git_cmd = 1; - cp.no_stdin = 1; - cp.out = -1; - cp.dir = path; + prepare_status_porcelain(&cp, path, ignore_untracked); if 
(start_command(&cp)) - die(_("Could not run 'git status --porcelain=2' in submodule %s"), path); + die(_(status_porcelain_start_error), path); fp = xfdopen(cp.out, "r"); while (strbuf_getwholeline(&buf, fp, '\n') != EOF) { @@ -1956,7 +1975,7 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) fclose(fp); if (finish_command(&cp) && !ignore_cp_exit_code) - die(_("'git status --porcelain=2' failed in submodule %s"), path); + die(_(status_porcelain_fail_error), path); strbuf_release(&buf); return dirty_submodule; -- 2.39.1.519.gcb327c4b5f-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v7 4/7] submodule: refactor is_submodule_modified() 2023-02-07 18:17 ` [PATCH v7 4/7] submodule: refactor is_submodule_modified() Calvin Wan @ 2023-02-07 22:59 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 86+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2023-02-07 22:59 UTC (permalink / raw) To: Calvin Wan; +Cc: git, chooglen, newren, jonathantanmy On Tue, Feb 07 2023, Calvin Wan wrote: > diff --git a/submodule.c b/submodule.c > index 768d4b4cd7..d88aa2c573 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -28,6 +28,10 @@ static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF; > static int initialized_fetch_ref_tips; > static struct oid_array ref_tips_before_fetch; > static struct oid_array ref_tips_after_fetch; > +static const char *status_porcelain_start_error = > + N_("could not run 'git status --porcelain=2' in submodule %s"); > +static const char *status_porcelain_fail_error = > + N_("'git status --porcelain=2' failed in submodule %s"); Let's instead do: #define STATUS_PORCELAIN_START_ERROR \ N_("could not run 'git status --porcelain=2' in submodule %s") #define STATUS_PORCELAIN_FAIL_ERROR \ N_("'git status --porcelain=2' failed in submodule %s") Because a thing you're not discussing in the commit message is that the disadvantage of doing this sort of thing is that we lose the checking that -Wformat gives us (try to add an extra "%s" to these in your version, then the macro version, with gcc and/or clang). Personally I'd prefer just copy/pasting over losing that, but using a macro instead of a variable allows us to have our cake and eat it too. ^ permalink raw reply [flat|nested] 86+ messages in thread
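The variable-versus-macro point in the message above can be sketched outside of git. The names `start_error_var`, `START_ERROR_MACRO`, `format_with_var` and `format_with_macro` are hypothetical, and plain `snprintf` stands in for git's `die()` and translation macros, which are not reproduced here.

```c
#include <stdio.h>

/*
 * Variable form: at the call site the compiler only sees a pointer, so it
 * generally cannot match the format against the arguments.
 */
static const char *start_error_var =
	"could not run 'git status --porcelain=2' in submodule %s";

/*
 * Macro form: the preprocessor pastes the string literal into each call,
 * so format checking can still compare "%s" against the argument list.
 */
#define START_ERROR_MACRO \
	"could not run 'git status --porcelain=2' in submodule %s"

static int format_with_var(char *out, size_t size, const char *path)
{
	return snprintf(out, size, start_error_var, path);
}

static int format_with_macro(char *out, size_t size, const char *path)
{
	return snprintf(out, size, START_ERROR_MACRO, path);
}
```

Both functions produce the same text at run time. The difference shows up only at compile time: adding a stray second `%s` to the macro is the kind of mistake a format warning can catch, whereas the same mistake in the variable is likely to go unnoticed.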
* [PATCH v7 5/7] diff-lib: refactor out diff_change logic 2023-01-17 19:30 ` [PATCH v6 " Calvin Wan ` (4 preceding siblings ...) 2023-02-07 18:17 ` [PATCH v7 4/7] submodule: refactor is_submodule_modified() Calvin Wan @ 2023-02-07 18:17 ` Calvin Wan 2023-02-08 14:28 ` Phillip Wood 2023-02-07 18:17 ` [PATCH v7 6/7] diff-lib: refactor match_stat_with_submodule Calvin Wan 2023-02-07 18:17 ` [PATCH v7 7/7] diff-lib: parallelize run_diff_files for submodules Calvin Wan 7 siblings, 1 reply; 86+ messages in thread From: Calvin Wan @ 2023-02-07 18:17 UTC (permalink / raw) To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy Refactor out logic that sets up the diff_change call into a helper function for a future patch. Signed-off-by: Calvin Wan <calvinwan@google.com> --- diff-lib.c | 46 +++++++++++++++++++++++++++++----------------- 1 file changed, 29 insertions(+), 17 deletions(-) diff --git a/diff-lib.c b/diff-lib.c index dec040c366..7101cfda3f 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -88,6 +88,31 @@ static int match_stat_with_submodule(struct diff_options *diffopt, return changed; } +static int diff_change_helper(struct diff_options *options, + unsigned newmode, unsigned dirty_submodule, + int changed, struct index_state *istate, + struct cache_entry *ce) +{ + unsigned int oldmode; + const struct object_id *old_oid, *new_oid; + + if (!changed && !dirty_submodule) { + ce_mark_uptodate(ce); + mark_fsmonitor_valid(istate, ce); + if (!options->flags.find_copies_harder) + return 1; + } + oldmode = ce->ce_mode; + old_oid = &ce->oid; + new_oid = changed ? 
null_oid() : &ce->oid; + diff_change(options, oldmode, newmode, + old_oid, new_oid, + !is_null_oid(old_oid), + !is_null_oid(new_oid), + ce->name, 0, dirty_submodule); + return 0; +} + int run_diff_files(struct rev_info *revs, unsigned int option) { int entries, i; @@ -105,11 +130,10 @@ int run_diff_files(struct rev_info *revs, unsigned int option) diff_unmerged_stage = 2; entries = istate->cache_nr; for (i = 0; i < entries; i++) { - unsigned int oldmode, newmode; + unsigned int newmode; struct cache_entry *ce = istate->cache[i]; int changed; unsigned dirty_submodule = 0; - const struct object_id *old_oid, *new_oid; if (diff_can_quit_early(&revs->diffopt)) break; @@ -245,21 +269,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option) newmode = ce_mode_from_stat(ce, st.st_mode); } - if (!changed && !dirty_submodule) { - ce_mark_uptodate(ce); - mark_fsmonitor_valid(istate, ce); - if (!revs->diffopt.flags.find_copies_harder) - continue; - } - oldmode = ce->ce_mode; - old_oid = &ce->oid; - new_oid = changed ? null_oid() : &ce->oid; - diff_change(&revs->diffopt, oldmode, newmode, - old_oid, new_oid, - !is_null_oid(old_oid), - !is_null_oid(new_oid), - ce->name, 0, dirty_submodule); - + if (diff_change_helper(&revs->diffopt, newmode, dirty_submodule, + changed, istate, ce)) + continue; } diffcore_std(&revs->diffopt); diff_flush(&revs->diffopt); -- 2.39.1.519.gcb327c4b5f-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v7 5/7] diff-lib: refactor out diff_change logic 2023-02-07 18:17 ` [PATCH v7 5/7] diff-lib: refactor out diff_change logic Calvin Wan @ 2023-02-08 14:28 ` Phillip Wood 2023-02-08 23:12 ` Calvin Wan 0 siblings, 1 reply; 86+ messages in thread From: Phillip Wood @ 2023-02-08 14:28 UTC (permalink / raw) To: Calvin Wan, git; +Cc: avarab, chooglen, newren, jonathantanmy Hi Calvin On 07/02/2023 18:17, Calvin Wan wrote: > Refactor out logic that sets up the diff_change call into a helper > function for a future patch. > > Signed-off-by: Calvin Wan <calvinwan@google.com> > --- > diff-lib.c | 46 +++++++++++++++++++++++++++++----------------- > 1 file changed, 29 insertions(+), 17 deletions(-) > > diff --git a/diff-lib.c b/diff-lib.c > index dec040c366..7101cfda3f 100644 > --- a/diff-lib.c > +++ b/diff-lib.c > @@ -88,6 +88,31 @@ static int match_stat_with_submodule(struct diff_options *diffopt, > return changed; > } > > +static int diff_change_helper(struct diff_options *options, > + unsigned newmode, unsigned dirty_submodule, > + int changed, I worry that having three integer parameters next to each other makes it very easy to mix them up without getting any errors from the compiler because the types are all compatible. Could the last two be combined into a flags argument? A similar issue occurs in match_stat_with_submodule() in patch 7 Best Wishes Phillip struct index_state *istate, > + struct cache_entry *ce) > +{ > + unsigned int oldmode; > + const struct object_id *old_oid, *new_oid; > + > + if (!changed && !dirty_submodule) { > + ce_mark_uptodate(ce); > + mark_fsmonitor_valid(istate, ce); > + if (!options->flags.find_copies_harder) > + return 1; > + } > + oldmode = ce->ce_mode; > + old_oid = &ce->oid; > + new_oid = changed ?
null_oid() : &ce->oid; > + diff_change(options, oldmode, newmode, > + old_oid, new_oid, > + !is_null_oid(old_oid), > + !is_null_oid(new_oid), > + ce->name, 0, dirty_submodule); > + return 0; > +} > + > int run_diff_files(struct rev_info *revs, unsigned int option) > { > int entries, i; > @@ -105,11 +130,10 @@ int run_diff_files(struct rev_info *revs, unsigned int option) > diff_unmerged_stage = 2; > entries = istate->cache_nr; > for (i = 0; i < entries; i++) { > - unsigned int oldmode, newmode; > + unsigned int newmode; > struct cache_entry *ce = istate->cache[i]; > int changed; > unsigned dirty_submodule = 0; > - const struct object_id *old_oid, *new_oid; > > if (diff_can_quit_early(&revs->diffopt)) > break; > @@ -245,21 +269,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option) > newmode = ce_mode_from_stat(ce, st.st_mode); > } > > - if (!changed && !dirty_submodule) { > - ce_mark_uptodate(ce); > - mark_fsmonitor_valid(istate, ce); > - if (!revs->diffopt.flags.find_copies_harder) > - continue; > - } > - oldmode = ce->ce_mode; > - old_oid = &ce->oid; > - new_oid = changed ? null_oid() : &ce->oid; > - diff_change(&revs->diffopt, oldmode, newmode, > - old_oid, new_oid, > - !is_null_oid(old_oid), > - !is_null_oid(new_oid), > - ce->name, 0, dirty_submodule); > - > + if (diff_change_helper(&revs->diffopt, newmode, dirty_submodule, > + changed, istate, ce)) > + continue; > } > diffcore_std(&revs->diffopt); > diff_flush(&revs->diffopt); ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v7 5/7] diff-lib: refactor out diff_change logic 2023-02-08 14:28 ` Phillip Wood @ 2023-02-08 23:12 ` Calvin Wan 2023-02-09 20:53 ` Phillip Wood 0 siblings, 1 reply; 86+ messages in thread From: Calvin Wan @ 2023-02-08 23:12 UTC (permalink / raw) To: phillip.wood; +Cc: git, avarab, chooglen, newren, jonathantanmy > I worry that having three integer parameters next to each other makes it > very easy to mix them up without getting any errors from the compiler > because the types are all compatible. Could the last two be combined > into a flags argument? A similar issue occurs in > match_stat_with_submodule() in patch 7 I'm not sure how much more I want to engineer a static helper function that is only being called in one other place. I also don't understand what you mean by combining the last two parameters into a flags argument. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v7 5/7] diff-lib: refactor out diff_change logic 2023-02-08 23:12 ` Calvin Wan @ 2023-02-09 20:53 ` Phillip Wood 0 siblings, 0 replies; 86+ messages in thread From: Phillip Wood @ 2023-02-09 20:53 UTC (permalink / raw) To: Calvin Wan; +Cc: git, avarab, chooglen, newren, jonathantanmy Hi Calvin On 08/02/2023 23:12, Calvin Wan wrote: >> I worry that having three integer parameters next to each other makes it >> very easy to mix them up without getting any errors from the compiler >> because the types are all compatible. Could the last two be combined >> into a flags argument? A similar issue occurs in >> match_stat_with_submodule() in patch 7 > > I'm not sure how much more I want to engineer a static helper function > that is only being called in one other place. I also don't understand what > you mean by combining the last two parameters into a flags argument. Are `dirty_submodule` and `changed` booleans? If so then you can have a single bit-flags argument made up of #define SUBMODULE_DIRTY 1 #define SUBMODULE_CHANGED 2 Best Wishes Phillip ^ permalink raw reply [flat|nested] 86+ messages in thread
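A small sketch of the single flags argument proposed in the message above. The two `#define` names come from that message; `make_change_flags` and `is_up_to_date` are hypothetical helpers added for illustration. The sketch assumes both inputs are plain booleans, as the question in the message presupposes; in the patches earlier in this thread `dirty_submodule` is itself a bitmask of `DIRTY_SUBMODULE_*` values, so collapsing it to one bit would discard information.

```c
/* Flag names taken from the suggestion; written here as shifted bits. */
#define SUBMODULE_DIRTY   (1u << 0)
#define SUBMODULE_CHANGED (1u << 1)

/* Hypothetical helper: fold two booleans into one flags word. */
static unsigned make_change_flags(int dirty, int changed)
{
	unsigned flags = 0;

	if (dirty)
		flags |= SUBMODULE_DIRTY;
	if (changed)
		flags |= SUBMODULE_CHANGED;
	return flags;
}

/*
 * Hypothetical helper mirroring the "!changed && !dirty_submodule" test:
 * true only when neither bit is set.
 */
static int is_up_to_date(unsigned flags)
{
	return !(flags & (SUBMODULE_DIRTY | SUBMODULE_CHANGED));
}
```

A caller then passes one `unsigned flags` value instead of two adjacent integer parameters, so the two cannot be swapped by accident.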
* [PATCH v7 6/7] diff-lib: refactor match_stat_with_submodule 2023-01-17 19:30 ` [PATCH v6 " Calvin Wan ` (5 preceding siblings ...) 2023-02-07 18:17 ` [PATCH v7 5/7] diff-lib: refactor out diff_change logic Calvin Wan @ 2023-02-07 18:17 ` Calvin Wan 2023-02-08 8:18 ` Ævar Arnfjörð Bjarmason 2023-02-08 14:22 ` Phillip Wood 2023-02-07 18:17 ` [PATCH v7 7/7] diff-lib: parallelize run_diff_files for submodules Calvin Wan 7 siblings, 2 replies; 86+ messages in thread From: Calvin Wan @ 2023-02-07 18:17 UTC (permalink / raw) To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy Flatten out the if statements in match_stat_with_submodule so the logic is more readable and easier for future patches to add to. orig_flags didn't need to be set if the cache entry wasn't a GITLINK so defer setting it. Signed-off-by: Calvin Wan <calvinwan@google.com> --- diff-lib.c | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/diff-lib.c b/diff-lib.c index 7101cfda3f..e18c886a80 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -73,18 +73,24 @@ static int match_stat_with_submodule(struct diff_options *diffopt, unsigned *dirty_submodule) { int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option); - if (S_ISGITLINK(ce->ce_mode)) { - struct diff_flags orig_flags = diffopt->flags; - if (!diffopt->flags.override_submodule_config) - set_diffopt_flags_from_submodule_config(diffopt, ce->name); - if (diffopt->flags.ignore_submodules) - changed = 0; - else if (!diffopt->flags.ignore_dirty_submodules && - (!changed || diffopt->flags.dirty_submodules)) - *dirty_submodule = is_submodule_modified(ce->name, - diffopt->flags.ignore_untracked_in_submodules); - diffopt->flags = orig_flags; + struct diff_flags orig_flags; + + if (!S_ISGITLINK(ce->ce_mode)) + return changed; + + orig_flags = diffopt->flags; + if (!diffopt->flags.override_submodule_config) + set_diffopt_flags_from_submodule_config(diffopt, ce->name); + if 
(diffopt->flags.ignore_submodules) { + changed = 0; + goto cleanup; } + if (!diffopt->flags.ignore_dirty_submodules && + (!changed || diffopt->flags.dirty_submodules)) + *dirty_submodule = is_submodule_modified(ce->name, + diffopt->flags.ignore_untracked_in_submodules); +cleanup: + diffopt->flags = orig_flags; return changed; } -- 2.39.1.519.gcb327c4b5f-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v7 6/7] diff-lib: refactor match_stat_with_submodule 2023-02-07 18:17 ` [PATCH v7 6/7] diff-lib: refactor match_stat_with_submodule Calvin Wan @ 2023-02-08 8:18 ` Ævar Arnfjörð Bjarmason 2023-02-08 17:07 ` Phillip Wood 2023-02-08 14:22 ` Phillip Wood 1 sibling, 1 reply; 86+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2023-02-08 8:18 UTC (permalink / raw) To: Calvin Wan; +Cc: git, chooglen, newren, jonathantanmy On Tue, Feb 07 2023, Calvin Wan wrote: > diff --git a/diff-lib.c b/diff-lib.c > index 7101cfda3f..e18c886a80 100644 > --- a/diff-lib.c > +++ b/diff-lib.c > @@ -73,18 +73,24 @@ static int match_stat_with_submodule(struct diff_options *diffopt, > unsigned *dirty_submodule) > { > int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option); > - if (S_ISGITLINK(ce->ce_mode)) { > - struct diff_flags orig_flags = diffopt->flags; > - if (!diffopt->flags.override_submodule_config) > - set_diffopt_flags_from_submodule_config(diffopt, ce->name); > - if (diffopt->flags.ignore_submodules) > - changed = 0; > - else if (!diffopt->flags.ignore_dirty_submodules && > - (!changed || diffopt->flags.dirty_submodules)) > - *dirty_submodule = is_submodule_modified(ce->name, > - diffopt->flags.ignore_untracked_in_submodules); > - diffopt->flags = orig_flags; > + struct diff_flags orig_flags; > + > + if (!S_ISGITLINK(ce->ce_mode)) > + return changed; > + > + orig_flags = diffopt->flags; > + if (!diffopt->flags.override_submodule_config) > + set_diffopt_flags_from_submodule_config(diffopt, ce->name); > + if (diffopt->flags.ignore_submodules) { > + changed = 0; > + goto cleanup; > } > + if (!diffopt->flags.ignore_dirty_submodules && > + (!changed || diffopt->flags.dirty_submodules)) > + *dirty_submodule = is_submodule_modified(ce->name, > + diffopt->flags.ignore_untracked_in_submodules); > +cleanup: > + diffopt->flags = orig_flags; > return changed; > } Parallel to reviewing your topic I started wondering if we couldn't get rid of this "orig_flags" 
flip-flopping, i.e. can't we just set the specific flags we want in output parameters. Anyway, having looked at this closely I think this patch should be dropped entirely. I don't understand how this refactoring is meant to make the end result easier to read, reason about, or how it helps the subsequent patch. In addition to the above diff in 7/7 you do (and that's the change this is meant to help): static int match_stat_with_submodule(struct diff_options *diffopt, const struct cache_entry *ce, struct stat *st, unsigned ce_option, - unsigned *dirty_submodule) + unsigned *dirty_submodule, int *defer_submodule_status, + unsigned *ignore_untracked) { int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option); struct diff_flags orig_flags; + int defer = 0; if (!S_ISGITLINK(ce->ce_mode)) - return changed; + goto ret; orig_flags = diffopt->flags; if (!diffopt->flags.override_submodule_config) @@ -86,11 +92,20 @@ static int match_stat_with_submodule(struct diff_options *diffopt, goto cleanup; } if (!diffopt->flags.ignore_dirty_submodules && - (!changed || diffopt->flags.dirty_submodules)) - *dirty_submodule = is_submodule_modified(ce->name, + (!changed || diffopt->flags.dirty_submodules)) { + if (defer_submodule_status && *defer_submodule_status) { + defer = 1; + *ignore_untracked = diffopt->flags.ignore_untracked_in_submodules; + } else { + *dirty_submodule = is_submodule_modified(ce->name, diffopt->flags.ignore_untracked_in_submodules); + } + } cleanup: diffopt->flags = orig_flags; +ret: + if (defer_submodule_status) + *defer_submodule_status = defer; return changed; } But if I rebase out this 6/7 patch and solve the conflict for 7/7 it becomes: @@ -65,14 +66,20 @@ static int check_removed(const struct index_state *istate, const struct cache_en * Return 1 when changes are detected, 0 otherwise. 
If the DIRTY_SUBMODULES * option is set, the caller does not only want to know if a submodule is * modified at all but wants to know all the conditions that are met (new - * commits, untracked content and/or modified content). + * commits, untracked content and/or modified content). If + * defer_submodule_status bit is set, dirty_submodule will be left to the + * caller to set. defer_submodule_status can also be set to 0 in this + * function if there is no need to check if the submodule is modified. */ static int match_stat_with_submodule(struct diff_options *diffopt, const struct cache_entry *ce, struct stat *st, unsigned ce_option, - unsigned *dirty_submodule) + unsigned *dirty_submodule, int *defer_submodule_status, + unsigned *ignore_untracked) { int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option); + int defer = 0; + if (S_ISGITLINK(ce->ce_mode)) { struct diff_flags orig_flags = diffopt->flags; if (!diffopt->flags.override_submodule_config) @@ -80,11 +87,20 @@ static int match_stat_with_submodule(struct diff_options *diffopt, if (diffopt->flags.ignore_submodules) changed = 0; else if (!diffopt->flags.ignore_dirty_submodules && - (!changed || diffopt->flags.dirty_submodules)) - *dirty_submodule = is_submodule_modified(ce->name, - diffopt->flags.ignore_untracked_in_submodules); + (!changed || diffopt->flags.dirty_submodules)) { + if (defer_submodule_status && *defer_submodule_status) { + defer = 1; + *ignore_untracked = diffopt->flags.ignore_untracked_in_submodules; + } else { + *dirty_submodule = is_submodule_modified(ce->name, + diffopt->flags.ignore_untracked_in_submodules); + } + } diffopt->flags = orig_flags; } + + if (defer_submodule_status) + *defer_submodule_status = defer; return changed; } I can see how there's some room for *a* refactoring to reduce the subsequent diff, but not by much. But this commit didn't help at all.
This whole "goto ret", and "goto cleanup" is just working around the fact that you pulled "orig_flags" out of the "if" scope. Normally the de-indentation would be worth it, but here it's not. The control flow becomes more complex to reason about as a result. ^ permalink raw reply [flat|nested] 86+ messages in thread
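The control-flow trade-off described in the message above can be seen in a reduced form. This sketch is an assumption-laden illustration, not diff-lib.c: `struct sketch_opts`, `apply_override`, `match_nested` and `match_flat` are hypothetical, and the option handling is cut down to two flags.

```c
/* Hypothetical options; config_ignore models a per-submodule override. */
struct sketch_opts {
	int ignore;
	int check_dirty;
	int config_ignore;
};

/* Hypothetical stand-in for temporarily applying submodule config. */
static void apply_override(struct sketch_opts *o)
{
	o->ignore = o->config_ignore;
}

/* Nested form: the saved copy lives only inside the "if" that needs it. */
static int match_nested(struct sketch_opts *o, int is_link, int changed,
			int *dirty)
{
	if (is_link) {
		struct sketch_opts saved = *o;

		apply_override(o);
		if (o->ignore)
			changed = 0;
		else if (o->check_dirty)
			*dirty = 1;
		*o = saved;
	}
	return changed;
}

/*
 * Flattened form: one less indentation level, but the saved copy is now
 * function-scoped and every exit after the save must reach "cleanup".
 */
static int match_flat(struct sketch_opts *o, int is_link, int changed,
		      int *dirty)
{
	struct sketch_opts saved;

	if (!is_link)
		return changed;

	saved = *o;
	apply_override(o);
	if (o->ignore) {
		changed = 0;
		goto cleanup;
	}
	if (o->check_dirty)
		*dirty = 1;
cleanup:
	*o = saved;
	return changed;
}
```

The two functions behave identically; the difference is that the flattened one trades an indentation level for a label and a jump, which is the cost being weighed in the review.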
* Re: [PATCH v7 6/7] diff-lib: refactor match_stat_with_submodule 2023-02-08 8:18 ` Ævar Arnfjörð Bjarmason @ 2023-02-08 17:07 ` Phillip Wood 2023-02-08 23:13 ` Calvin Wan 0 siblings, 1 reply; 86+ messages in thread From: Phillip Wood @ 2023-02-08 17:07 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason, Calvin Wan Cc: git, chooglen, newren, jonathantanmy On 08/02/2023 08:18, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Feb 07 2023, Calvin Wan wrote: > Anyway, having looked at this closely I think this patch should be > dropped entirely. I don't understand how this refactoring is meant to > make the end result easier to read, reason about, or how it helps the > subsequent patch. That's my feeling too c.f. <19f91fea-a2a9-7dc6-d940-cc10f384fe76@dunelm.org.uk>. This patch has improved since that comment on v4 but I still think we'd be better off without it. Best Wishes Phillip > In addition to the above diff in 7/7 you do (and that's the change this > is meant to help): > > static int match_stat_with_submodule(struct diff_options *diffopt, > const struct cache_entry *ce, > struct stat *st, unsigned ce_option, > - unsigned *dirty_submodule) > + unsigned *dirty_submodule, int *defer_submodule_status, > + unsigned *ignore_untracked) > { > int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option); > struct diff_flags orig_flags; > + int defer = 0; > > if (!S_ISGITLINK(ce->ce_mode)) > - return changed; > + goto ret; > > orig_flags = diffopt->flags; > if (!diffopt->flags.override_submodule_config) > @@ -86,11 +92,20 @@ static int match_stat_with_submodule(struct diff_options *diffopt, > goto cleanup; > } > if (!diffopt->flags.ignore_dirty_submodules && > - (!changed || diffopt->flags.dirty_submodules)) > - *dirty_submodule = is_submodule_modified(ce->name, > + (!changed || diffopt->flags.dirty_submodules)) { > + if (defer_submodule_status && *defer_submodule_status) { > + defer = 1; > + *ignore_untracked = diffopt->flags.ignore_untracked_in_submodules; > + } else { 
> + *dirty_submodule = is_submodule_modified(ce->name, > diffopt->flags.ignore_untracked_in_submodules); > + } > + } > cleanup: > diffopt->flags = orig_flags; > +ret: > + if (defer_submodule_status) > + *defer_submodule_status = defer; > return changed; > } > > But if I rebase out this 6/7 patch and solve the conflict for 7/7 it > becomes: > > @@ -65,14 +66,20 @@ static int check_removed(const struct index_state *istate, const struct cache_en > * Return 1 when changes are detected, 0 otherwise. If the DIRTY_SUBMODULES > * option is set, the caller does not only want to know if a submodule is > * modified at all but wants to know all the conditions that are met (new > - * commits, untracked content and/or modified content). > + * commits, untracked content and/or modified content). If > + * defer_submodule_status bit is set, dirty_submodule will be left to the > + * caller to set. defer_submodule_status can also be set to 0 in this > + * function if there is no need to check if the submodule is modified. 
> */ > static int match_stat_with_submodule(struct diff_options *diffopt, > const struct cache_entry *ce, > struct stat *st, unsigned ce_option, > - unsigned *dirty_submodule) > + unsigned *dirty_submodule, int *defer_submodule_status, > + unsigned *ignore_untracked) > { > int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option); > + int defer = 0; > + > if (S_ISGITLINK(ce->ce_mode)) { > struct diff_flags orig_flags = diffopt->flags; > if (!diffopt->flags.override_submodule_config) > @@ -80,11 +87,20 @@ static int match_stat_with_submodule(struct diff_options *diffopt, > if (diffopt->flags.ignore_submodules) > changed = 0; > else if (!diffopt->flags.ignore_dirty_submodules && > - (!changed || diffopt->flags.dirty_submodules)) > - *dirty_submodule = is_submodule_modified(ce->name, > - diffopt->flags.ignore_untracked_in_submodules); > + (!changed || diffopt->flags.dirty_submodules)) { > + if (defer_submodule_status && *defer_submodule_status) { > + defer = 1; > + *ignore_untracked = diffopt->flags.ignore_untracked_in_submodules; > + } else { > + *dirty_submodule = is_submodule_modified(ce->name, > + diffopt->flags.ignore_untracked_in_submodules); > + } > + } > diffopt->flags = orig_flags; > } > + > + if (defer_submodule_status) > + *defer_submodule_status = defer; > return changed; > } > > > I can see how there's some room for *a* refactoring to reduce the > subsequent diff, but not by much. > > But this commit didn't help at all. This whole "goto ret", and "goto > cleanup" is just working around the fact that you pulled "orig_flags" > out of the "if" scope. Normally the de-indentation would be worth it, > but here it's not. The control flow becomes more complex to reason about > as a result. > ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v7 6/7] diff-lib: refactor match_stat_with_submodule 2023-02-08 17:07 ` Phillip Wood @ 2023-02-08 23:13 ` Calvin Wan 0 siblings, 0 replies; 86+ messages in thread From: Calvin Wan @ 2023-02-08 23:13 UTC (permalink / raw) To: phillip.wood Cc: Ævar Arnfjörð Bjarmason, git, chooglen, newren, jonathantanmy I agree that this patch should be dropped. Thanks for catching this. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v7 6/7] diff-lib: refactor match_stat_with_submodule 2023-02-07 18:17 ` [PATCH v7 6/7] diff-lib: refactor match_stat_with_submodule Calvin Wan 2023-02-08 8:18 ` Ævar Arnfjörð Bjarmason @ 2023-02-08 14:22 ` Phillip Wood 1 sibling, 0 replies; 86+ messages in thread From: Phillip Wood @ 2023-02-08 14:22 UTC (permalink / raw) To: Calvin Wan, git; +Cc: avarab, chooglen, newren, jonathantanmy Hi Calvin On 07/02/2023 18:17, Calvin Wan wrote: > Flatten out the if statements in match_stat_with_submodule so the > logic is more readable and easier for future patches to add to. > orig_flags didn't need to be set if the cache entry wasn't a > GITLINK so defer setting it. > > Signed-off-by: Calvin Wan <calvinwan@google.com> > --- > diff-lib.c | 28 +++++++++++++++++----------- > 1 file changed, 17 insertions(+), 11 deletions(-) > > diff --git a/diff-lib.c b/diff-lib.c > index 7101cfda3f..e18c886a80 100644 > --- a/diff-lib.c > +++ b/diff-lib.c > @@ -73,18 +73,24 @@ static int match_stat_with_submodule(struct diff_options *diffopt, > unsigned *dirty_submodule) > { > int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option); > - if (S_ISGITLINK(ce->ce_mode)) { > - struct diff_flags orig_flags = diffopt->flags; > - if (!diffopt->flags.override_submodule_config) > - set_diffopt_flags_from_submodule_config(diffopt, ce->name); > - if (diffopt->flags.ignore_submodules) > - changed = 0; > - else if (!diffopt->flags.ignore_dirty_submodules && > - (!changed || diffopt->flags.dirty_submodules)) > - *dirty_submodule = is_submodule_modified(ce->name, > - diffopt->flags.ignore_untracked_in_submodules); > - diffopt->flags = orig_flags; > + struct diff_flags orig_flags; > + > + if (!S_ISGITLINK(ce->ce_mode)) > + return changed; > + > + orig_flags = diffopt->flags; > + if (!diffopt->flags.override_submodule_config) > + set_diffopt_flags_from_submodule_config(diffopt, ce->name); > + if (diffopt->flags.ignore_submodules) { > + changed = 0; > + goto cleanup; Looking ahead to 
patch 7 there are no new uses of the "cleanup" label so I think it would be simpler to leave the code as it was, rather than changing the "else if" below to "if" and adding the goto here. Best Wishes Phillip > } > + if (!diffopt->flags.ignore_dirty_submodules && > + (!changed || diffopt->flags.dirty_submodules)) > + *dirty_submodule = is_submodule_modified(ce->name, > + diffopt->flags.ignore_untracked_in_submodules); > +cleanup: > + diffopt->flags = orig_flags; > return changed; > } > ^ permalink raw reply [flat|nested] 86+ messages in thread
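For readers following the control-flow argument in this subthread, here is a minimal sketch of the two shapes under discussion, written against simplified, hypothetical stand-ins (struct flags, check_dirty, match_nested and match_flat are not from the patch). It leaves out the "!changed || dirty_submodules" condition and the config lookup that actually modifies the flags, so the save/restore is only there to show where it sits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the diff flags discussed above. */
struct flags {
	bool ignore_submodules;
	bool ignore_dirty;
};

/* Stub for the expensive status check; always reports "modified". */
static unsigned check_dirty(void)
{
	return 2;
}

/* Nested form: the saved flags live only inside the "if" scope. */
static int match_nested(struct flags *f, bool is_gitlink, int changed,
			unsigned *dirty)
{
	assert(f && dirty);
	if (is_gitlink) {
		struct flags orig = *f;

		if (f->ignore_submodules)
			changed = 0;
		else if (!f->ignore_dirty)
			*dirty = check_dirty();
		*f = orig;
	}
	return changed;
}

/* Flattened form: same behavior, but needs an early return and a label. */
static int match_flat(struct flags *f, bool is_gitlink, int changed,
		      unsigned *dirty)
{
	struct flags orig;

	assert(f && dirty);
	if (!is_gitlink)
		return changed;

	orig = *f;
	if (f->ignore_submodules) {
		changed = 0;
		goto cleanup;
	}
	if (!f->ignore_dirty)
		*dirty = check_dirty();
cleanup:
	*f = orig;
	return changed;
}
```

Both functions return the same value and set "dirty" the same way for every input. The difference is purely in how many exits and labels a reader has to track, which is the point the reviewers make.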
* [PATCH v7 7/7] diff-lib: parallelize run_diff_files for submodules 2023-01-17 19:30 ` [PATCH v6 " Calvin Wan ` (6 preceding siblings ...) 2023-02-07 18:17 ` [PATCH v7 6/7] diff-lib: refactor match_stat_with_submodule Calvin Wan @ 2023-02-07 18:17 ` Calvin Wan 2023-02-07 23:06 ` Ævar Arnfjörð Bjarmason 7 siblings, 1 reply; 86+ messages in thread From: Calvin Wan @ 2023-02-07 18:17 UTC (permalink / raw) To: git; +Cc: Calvin Wan, avarab, chooglen, newren, jonathantanmy During the iteration of the index entries in run_diff_files, whenever a submodule is found and needs its status checked, a subprocess is spawned for it. Instead of spawning the subprocess immediately and waiting for its completion to continue, hold onto all submodules and relevant information in a list. Then use that list to create tasks for run_processes_parallel. Subprocess output is duplicated and passed to status_pipe_output which stores it to be parsed on completion of the subprocess. Add config option submodule.diffJobs to set the maximum number of parallel jobs. The option defaults to 1 if unset. If set to 0, the number of jobs is set to online_cpus(). Since run_diff_files is called from many different commands, I chose to grab the config option in the function rather than adding variables to every git command and then figuring out how to pass them all in. Signed-off-by: Calvin Wan <calvinwan@google.com> --- Documentation/config/submodule.txt | 12 +++ diff-lib.c | 71 ++++++++++++-- submodule.c | 148 +++++++++++++++++++++++++++++ submodule.h | 9 ++ t/t4027-diff-submodule.sh | 31 ++++++ t/t7506-status-submodule.sh | 25 +++++ 6 files changed, 289 insertions(+), 7 deletions(-) diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt index 6490527b45..3209eb8117 100644 --- a/Documentation/config/submodule.txt +++ b/Documentation/config/submodule.txt @@ -93,6 +93,18 @@ submodule.fetchJobs:: in parallel. A value of 0 will give some reasonable default. 
If unset, it defaults to 1. +submodule.diffJobs:: + Specifies how many submodules are diffed at the same time. A + positive integer allows up to that number of submodules diffed + in parallel. A value of 0 will give some reasonable default. + If unset, it defaults to 1. The diff operation is used by many + other git commands such as add, merge, diff, status, stash and + more. Note that the expensive part of the diff operation is + reading the index from cache or memory. Therefore multiple jobs + may be detrimental to performance if your hardware does not + support parallel reads or if the number of jobs greatly exceeds + the amount of supported reads. + submodule.alternateLocation:: Specifies how the submodules obtain alternates when submodules are cloned. Possible values are `no`, `superproject`. diff --git a/diff-lib.c b/diff-lib.c index e18c886a80..f91cd73ae7 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -14,6 +14,7 @@ #include "dir.h" #include "fsmonitor.h" #include "commit-reach.h" +#include "config.h" /* * diff-files @@ -65,18 +66,23 @@ static int check_removed(const struct index_state *istate, const struct cache_en * Return 1 when changes are detected, 0 otherwise. If the DIRTY_SUBMODULES * option is set, the caller does not only want to know if a submodule is * modified at all but wants to know all the conditions that are met (new - * commits, untracked content and/or modified content). + * commits, untracked content and/or modified content). If + * defer_submodule_status bit is set, dirty_submodule will be left to the + * caller to set. defer_submodule_status can also be set to 0 in this + * function if there is no need to check if the submodule is modified. 
*/ static int match_stat_with_submodule(struct diff_options *diffopt, const struct cache_entry *ce, struct stat *st, unsigned ce_option, - unsigned *dirty_submodule) + unsigned *dirty_submodule, int *defer_submodule_status, + unsigned *ignore_untracked) { int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option); struct diff_flags orig_flags; + int defer = 0; if (!S_ISGITLINK(ce->ce_mode)) - return changed; + goto ret; orig_flags = diffopt->flags; if (!diffopt->flags.override_submodule_config) @@ -86,11 +92,20 @@ static int match_stat_with_submodule(struct diff_options *diffopt, goto cleanup; } if (!diffopt->flags.ignore_dirty_submodules && - (!changed || diffopt->flags.dirty_submodules)) - *dirty_submodule = is_submodule_modified(ce->name, + (!changed || diffopt->flags.dirty_submodules)) { + if (defer_submodule_status && *defer_submodule_status) { + defer = 1; + *ignore_untracked = diffopt->flags.ignore_untracked_in_submodules; + } else { + *dirty_submodule = is_submodule_modified(ce->name, diffopt->flags.ignore_untracked_in_submodules); + } + } cleanup: diffopt->flags = orig_flags; +ret: + if (defer_submodule_status) + *defer_submodule_status = defer; return changed; } @@ -127,6 +142,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option) ? 
CE_MATCH_RACY_IS_DIRTY : 0); uint64_t start = getnanotime(); struct index_state *istate = revs->diffopt.repo->index; + struct string_list submodules = STRING_LIST_INIT_NODUP; diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/"); @@ -250,6 +266,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option) newmode = ce->ce_mode; } else { struct stat st; + unsigned ignore_untracked = 0; + int defer_submodule_status = 1; changed = check_removed(istate, ce, &st); if (changed) { @@ -271,14 +289,53 @@ int run_diff_files(struct rev_info *revs, unsigned int option) } changed = match_stat_with_submodule(&revs->diffopt, ce, &st, - ce_option, &dirty_submodule); + ce_option, &dirty_submodule, + &defer_submodule_status, + &ignore_untracked); newmode = ce_mode_from_stat(ce, st.st_mode); + if (defer_submodule_status) { + struct submodule_status_util tmp = { + .changed = changed, + .dirty_submodule = 0, + .ignore_untracked = ignore_untracked, + .newmode = newmode, + .ce = ce, + .path = ce->name, + }; + struct string_list_item *item; + + item = string_list_append(&submodules, ce->name); + item->util = xmalloc(sizeof(tmp)); + memcpy(item->util, &tmp, sizeof(tmp)); + continue; + } } if (diff_change_helper(&revs->diffopt, newmode, dirty_submodule, changed, istate, ce)) continue; } + if (submodules.nr > 0) { + int parallel_jobs; + if (git_config_get_int("submodule.diffjobs", &parallel_jobs)) + parallel_jobs = 1; + else if (!parallel_jobs) + parallel_jobs = online_cpus(); + else if (parallel_jobs < 0) + die(_("submodule.diffjobs cannot be negative")); + + if (get_submodules_status(&submodules, parallel_jobs)) + die(_("submodule status failed")); + for (size_t i = 0; i < submodules.nr; i++) { + struct submodule_status_util *util = submodules.items[i].util; + + if (diff_change_helper(&revs->diffopt, util->newmode, + util->dirty_submodule, util->changed, + istate, util->ce)) + continue; + } + } + string_list_clear(&submodules, 1); diffcore_std(&revs->diffopt); diff_flush(&revs->diffopt); 
trace_performance_since(start, "diff-files"); @@ -326,7 +383,7 @@ static int get_stat_data(const struct index_state *istate, return -1; } changed = match_stat_with_submodule(diffopt, ce, &st, - 0, dirty_submodule); + 0, dirty_submodule, NULL, NULL); if (changed) { mode = ce_mode_from_stat(ce, st.st_mode); oid = null_oid(); diff --git a/submodule.c b/submodule.c index d88aa2c573..3e1811691a 100644 --- a/submodule.c +++ b/submodule.c @@ -1373,6 +1373,17 @@ int submodule_touches_in_range(struct repository *r, return ret; } +struct submodule_parallel_status { + size_t index_count; + int result; + + struct string_list *submodule_names; + + /* Pending statuses by OIDs */ + struct status_task **oid_status_tasks; + int oid_status_tasks_nr, oid_status_tasks_alloc; +}; + struct submodule_parallel_fetch { /* * The index of the last index entry processed by @@ -1455,6 +1466,12 @@ struct fetch_task { struct oid_array *commits; /* Ensure these commits are fetched */ }; +struct status_task { + const char *path; + struct strbuf out; + int ignore_untracked; +}; + /** * When a submodule is not defined in .gitmodules, we cannot access it * via the regular submodule-config. 
Create a fake submodule, which we can @@ -1947,6 +1964,25 @@ static int parse_status_porcelain(char *str, size_t len, return 0; } +static void parse_status_porcelain_strbuf(struct strbuf *buf, + unsigned *dirty_submodule, + int ignore_untracked) +{ + struct string_list list = STRING_LIST_INIT_DUP; + struct string_list_item *item; + + string_list_split(&list, buf->buf, '\n', -1); + + for_each_string_list_item(item, &list) { + if (parse_status_porcelain(item->string, + strlen(item->string), + dirty_submodule, + ignore_untracked)) + break; + } + string_list_clear(&list, 0); +} + unsigned is_submodule_modified(const char *path, int ignore_untracked) { struct child_process cp = CHILD_PROCESS_INIT; @@ -1981,6 +2017,118 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) return dirty_submodule; } +static struct status_task * +get_status_task_from_index(struct submodule_parallel_status *sps, + struct strbuf *err) +{ + for (; sps->index_count < sps->submodule_names->nr; sps->index_count++) { + struct submodule_status_util *util = sps->submodule_names->items[sps->index_count].util; + struct status_task *task; + + if (!verify_submodule_git_directory(util->path)) + continue; + + task = xmalloc(sizeof(*task)); + task->path = util->path; + task->ignore_untracked = util->ignore_untracked; + strbuf_init(&task->out, 0); + sps->index_count++; + return task; + } + return NULL; +} + +static int get_next_submodule_status(struct child_process *cp, + struct strbuf *err, void *data, + void **task_cb) +{ + struct submodule_parallel_status *sps = data; + struct status_task *task = get_status_task_from_index(sps, err); + + if (!task) + return 0; + + child_process_init(cp); + prepare_submodule_repo_env_in_gitdir(&cp->env); + prepare_status_porcelain(cp, task->path, task->ignore_untracked); + *task_cb = task; + return 1; +} + +static int status_start_failure(struct strbuf *err, + void *cb, void *task_cb) +{ + struct submodule_parallel_status *sps = cb; + struct 
status_task *task = task_cb; + + sps->result = 1; + strbuf_addf(err, + _(status_porcelain_start_error), + task->path); + return 0; +} + +static void status_duplicate_output(struct strbuf *out, + size_t offset, + void *cb, void *task_cb) +{ + struct status_task *task = task_cb; + + strbuf_add(&task->out, out->buf + offset, out->len - offset); + strbuf_setlen(out, offset); +} + +static int status_finish(int retvalue, struct strbuf *err, + void *cb, void *task_cb) +{ + struct submodule_parallel_status *sps = cb; + struct status_task *task = task_cb; + struct string_list_item *it = + string_list_lookup(sps->submodule_names, task->path); + struct submodule_status_util *util = it->util; + + if (retvalue) { + sps->result = 1; + strbuf_addf(err, + _(status_porcelain_fail_error), + task->path); + } + + parse_status_porcelain_strbuf(&task->out, + &util->dirty_submodule, + util->ignore_untracked); + + strbuf_release(&task->out); + free(task); + + return 0; +} + +int get_submodules_status(struct string_list *submodules, + int max_parallel_jobs) +{ + struct submodule_parallel_status sps = { + .submodule_names = submodules, + }; + const struct run_process_parallel_opts opts = { + .tr2_category = "submodule", + .tr2_label = "parallel/status", + + .processes = max_parallel_jobs, + + .get_next_task = get_next_submodule_status, + .start_failure = status_start_failure, + .duplicate_output = status_duplicate_output, + .task_finished = status_finish, + .data = &sps, + }; + + string_list_sort(sps.submodule_names); + run_processes_parallel(&opts); + + return sps.result; +} + int submodule_uses_gitfile(const char *path) { struct child_process cp = CHILD_PROCESS_INIT; diff --git a/submodule.h b/submodule.h index b52a4ff1e7..08d278a414 100644 --- a/submodule.h +++ b/submodule.h @@ -41,6 +41,13 @@ struct submodule_update_strategy { .type = SM_UPDATE_UNSPECIFIED, \ } +struct submodule_status_util { + int changed, ignore_untracked; + unsigned dirty_submodule, newmode; + struct cache_entry *ce; 
+ const char *path; +}; + int is_gitmodules_unmerged(struct index_state *istate); int is_writing_gitmodules_ok(void); int is_staging_gitmodules_ok(struct index_state *istate); @@ -94,6 +101,8 @@ int fetch_submodules(struct repository *r, int command_line_option, int default_option, int quiet, int max_parallel_jobs); +int get_submodules_status(struct string_list *submodules, + int max_parallel_jobs); unsigned is_submodule_modified(const char *path, int ignore_untracked); int submodule_uses_gitfile(const char *path); diff --git a/t/t4027-diff-submodule.sh b/t/t4027-diff-submodule.sh index 40164ae07d..1c747cc325 100755 --- a/t/t4027-diff-submodule.sh +++ b/t/t4027-diff-submodule.sh @@ -34,6 +34,25 @@ test_expect_success setup ' subtip=$3 subprev=$2 ' +test_expect_success 'diff in superproject with submodules respects parallel settings' ' + test_when_finished "rm -f trace.out" && + ( + GIT_TRACE=$(pwd)/trace.out git diff && + grep "1 tasks" trace.out && + >trace.out && + + git config submodule.diffJobs 8 && + GIT_TRACE=$(pwd)/trace.out git diff && + grep "8 tasks" trace.out && + >trace.out && + + GIT_TRACE=$(pwd)/trace.out git -c submodule.diffJobs=0 diff && + grep "preparing to run up to [0-9]* tasks" trace.out && + ! 
grep "up to 0 tasks" trace.out && + >trace.out + ) +' + test_expect_success 'git diff --raw HEAD' ' hexsz=$(test_oid hexsz) && git diff --raw --abbrev=$hexsz HEAD >actual && @@ -70,6 +89,18 @@ test_expect_success 'git diff HEAD with dirty submodule (work tree)' ' test_cmp expect.body actual.body ' +test_expect_success 'git diff HEAD with dirty submodule (work tree, parallel)' ' + ( + cd sub && + git reset --hard && + echo >>world + ) && + git -c submodule.diffJobs=8 diff HEAD >actual && + sed -e "1,/^@@/d" actual >actual.body && + expect_from_to >expect.body $subtip $subprev-dirty && + test_cmp expect.body actual.body +' + test_expect_success 'git diff HEAD with dirty submodule (index)' ' ( cd sub && diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh index d050091345..7da64e4c4c 100755 --- a/t/t7506-status-submodule.sh +++ b/t/t7506-status-submodule.sh @@ -412,4 +412,29 @@ test_expect_success 'status with added file in nested submodule (short)' ' EOF ' +test_expect_success 'status in superproject with submodules respects parallel settings' ' + test_when_finished "rm -f trace.out" && + ( + GIT_TRACE=$(pwd)/trace.out git status && + grep "1 tasks" trace.out && + >trace.out && + + git config submodule.diffJobs 8 && + GIT_TRACE=$(pwd)/trace.out git status && + grep "8 tasks" trace.out && + >trace.out && + + GIT_TRACE=$(pwd)/trace.out git -c submodule.diffJobs=0 status && + grep "preparing to run up to [0-9]* tasks" trace.out && + ! grep "up to 0 tasks" trace.out && + >trace.out + ) +' + +test_expect_success 'status in superproject with submodules (parallel)' ' + git -C super status --porcelain >output && + git -C super -c submodule.diffJobs=8 status --porcelain >output_parallel && + diff output output_parallel +' + test_done -- 2.39.1.519.gcb327c4b5f-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
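The submodule.diffJobs semantics described in the commit message (unset means 1, 0 means the number of online CPUs, negative is an error) can be sketched in isolation as below. This is an illustration only: resolve_diff_jobs is a hypothetical helper, it takes the CPU count as a parameter instead of calling online_cpus(), and it returns -1 where the patch calls die().

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the patch: resolve the effective job
 * count from a submodule.diffJobs-style setting.
 *
 *   is_set == 0  -> the option is unset, fall back to 1 job
 *   value  == 0  -> use the CPU count supplied by the caller
 *   value  <  0  -> invalid, reported as -1
 */
static int resolve_diff_jobs(int is_set, int value, int ncpus)
{
	assert(ncpus > 0);
	if (!is_set)
		return 1;
	if (value < 0)
		return -1;
	if (value == 0)
		return ncpus;
	return value;
}
```

Keeping the resolution in one small function makes the three cases easy to test without spawning any subprocesses.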
* Re: [PATCH v7 7/7] diff-lib: parallelize run_diff_files for submodules 2023-02-07 18:17 ` [PATCH v7 7/7] diff-lib: parallelize run_diff_files for submodules Calvin Wan @ 2023-02-07 23:06 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 86+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2023-02-07 23:06 UTC (permalink / raw) To: Calvin Wan; +Cc: git, chooglen, newren, jonathantanmy On Tue, Feb 07 2023, Calvin Wan wrote: > [...] > + sps->result = 1; > + strbuf_addf(err, > + _(status_porcelain_start_error), > + task->path); > + return 0; > [...] > + if (retvalue) { > + sps->result = 1; > + strbuf_addf(err, > + _(status_porcelain_fail_error), > + task->path); > [...] This is nitpicky, but what's with the short lines and over-wrapping? If you change these two to (just using my macro version on top, but it's the same with yours): strbuf_addf(err, _(STATUS_PORCELAIN_START_ERROR), task->path); And: strbuf_addf(err, _(STATUS_PORCELAIN_FAIL_ERROR), task->path); Both of these are under our usual line limit at their respective indentation (the latter at 77, rule of thumb is to wrap at 79-80). > + if (submodules.nr > 0) { Don't compare unsigned to >0, just use "submodules.nr". > + int parallel_jobs; nit: add extra \n, or maybe just call this "int v", as it's clear from the scope what it's about... > + if (git_config_get_int("submodule.diffjobs", &parallel_jobs)) > + parallel_jobs = 1; > + else if (!parallel_jobs) > + parallel_jobs = online_cpus(); > + else if (parallel_jobs < 0) > + die(_("submodule.diffjobs cannot be negative")); Can't you use the "ulong" instead of "int" and have it handle this "is negative?" error check for you? > + > + if (get_submodules_status(&submodules, parallel_jobs)) > + die(_("submodule status failed")); > + for (size_t i = 0; i < submodules.nr; i++) { Another case that can use for_each_string_list_item(). 
> +struct submodule_parallel_status { > + size_t index_count; > + int result; > + > + struct string_list *submodule_names; > + > + /* Pending statuses by OIDs */ > + struct status_task **oid_status_tasks; > + int oid_status_tasks_nr, oid_status_tasks_alloc; For new structs, let's use size_t, not "int" for alloc/nr. Also, as this is 7/7 and we're not adding another such pattern for the foreseeable future, can we just call these "size_t nr", "size_t alloc" and "tasks"? And having said all that, it turns out this is just dead code that can be removed? Blindly copied from submodule_parallel_fetch? ^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v6 1/6] run-command: add duplicate_output_fn to run_processes_parallel_opts 2023-01-04 21:54 ` [PATCH v5 0/6] submodule: parallelize diff Calvin Wan 2023-01-05 23:23 ` Calvin Wan 2023-01-17 19:30 ` [PATCH v6 " Calvin Wan @ 2023-01-17 19:30 ` Calvin Wan 2023-01-17 19:30 ` [PATCH v6 2/6] submodule: strbuf variable rename Calvin Wan ` (4 subsequent siblings) 7 siblings, 0 replies; 86+ messages in thread From: Calvin Wan @ 2023-01-17 19:30 UTC (permalink / raw) To: git Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123, chooglen, newren, jonathantanmy Add duplicate_output_fn as an optionally set function in run_process_parallel_opts. If set, output from each child process is copied and passed to the callback function whenever output from the child process is buffered to allow for separate parsing. Signed-off-by: Calvin Wan <calvinwan@google.com> --- run-command.c | 16 ++++++++++++--- run-command.h | 27 +++++++++++++++++++++++++ t/helper/test-run-command.c | 21 ++++++++++++++++++++ t/t0061-run-command.sh | 39 +++++++++++++++++++++++++++++++++++++ 4 files changed, 100 insertions(+), 3 deletions(-) diff --git a/run-command.c b/run-command.c index 756f1839aa..cad88befe0 100644 --- a/run-command.c +++ b/run-command.c @@ -1526,6 +1526,9 @@ static void pp_init(struct parallel_processes *pp, if (!opts->get_next_task) BUG("you need to specify a get_next_task function"); + if (opts->duplicate_output && opts->ungroup) + BUG("duplicate_output and ungroup are incompatible with each other"); + CALLOC_ARRAY(pp->children, n); if (!opts->ungroup) CALLOC_ARRAY(pp->pfd, n); @@ -1645,14 +1648,21 @@ static void pp_buffer_stderr(struct parallel_processes *pp, for (size_t i = 0; i < opts->processes; i++) { if (pp->children[i].state == GIT_CP_WORKING && pp->pfd[i].revents & (POLLIN | POLLHUP)) { - int n = strbuf_read_once(&pp->children[i].err, - pp->children[i].process.err, 0); + ssize_t n = strbuf_read_once(&pp->children[i].err, + pp->children[i].process.err, 0); if (n == 0) { 
close(pp->children[i].process.err); pp->children[i].state = GIT_CP_WAIT_CLEANUP; - } else if (n < 0) + } else if (n < 0) { if (errno != EAGAIN) die_errno("read"); + } else { + if (opts->duplicate_output) + opts->duplicate_output(&pp->children[i].err, + strlen(pp->children[i].err.buf) - n, + opts->data, + pp->children[i].data); + } } } } diff --git a/run-command.h b/run-command.h index 072db56a4d..6dcf999f6c 100644 --- a/run-command.h +++ b/run-command.h @@ -408,6 +408,27 @@ typedef int (*start_failure_fn)(struct strbuf *out, void *pp_cb, void *pp_task_cb); +/** + * This callback is called whenever output from a child process is buffered + * + * See run_processes_parallel() below for a discussion of the "struct + * strbuf *out" parameter. + * + * The offset refers to the number of bytes originally in "out" before + * the output from the child process was buffered. Therefore, the buffer + * range, "out + buf" to the end of "out", would contain the buffer of + * the child process output. + * + * pp_cb is the callback cookie as passed into run_processes_parallel, + * pp_task_cb is the callback cookie as passed into get_next_task_fn. + * + * This function is incompatible with "ungroup" + */ +typedef void (*duplicate_output_fn)(struct strbuf *out, + size_t offset, + void *pp_cb, + void *pp_task_cb); + /** * This callback is called on every child process that finished processing. * @@ -461,6 +482,12 @@ struct run_process_parallel_opts */ start_failure_fn start_failure; + /** + * duplicate_output: See duplicate_output_fn() above. This should be + * NULL unless process specific output is needed + */ + duplicate_output_fn duplicate_output; + /** * task_finished: See task_finished_fn() above. This can be * NULL to omit any special handling. 
diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c index 3ecb830f4a..ffd3cd0045 100644 --- a/t/helper/test-run-command.c +++ b/t/helper/test-run-command.c @@ -52,6 +52,21 @@ static int no_job(struct child_process *cp, return 0; } +static void duplicate_output(struct strbuf *out, + size_t offset, + void *pp_cb UNUSED, + void *pp_task_cb UNUSED) +{ + struct string_list list = STRING_LIST_INIT_DUP; + + string_list_split(&list, out->buf + offset, '\n', -1); + for (size_t i = 0; i < list.nr; i++) { + if (strlen(list.items[i].string) > 0) + fprintf(stderr, "duplicate_output: %s\n", list.items[i].string); + } + string_list_clear(&list, 0); +} + static int task_finished(int result, struct strbuf *err, void *pp_cb, @@ -439,6 +454,12 @@ int cmd__run_command(int argc, const char **argv) opts.ungroup = 1; } + if (!strcmp(argv[1], "--duplicate-output")) { + argv += 1; + argc -= 1; + opts.duplicate_output = duplicate_output; + } + jobs = atoi(argv[2]); strvec_clear(&proc.args); strvec_pushv(&proc.args, (const char **)argv + 3); diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh index e2411f6a9b..879e536638 100755 --- a/t/t0061-run-command.sh +++ b/t/t0061-run-command.sh @@ -135,6 +135,15 @@ test_expect_success 'run_command runs in parallel with more jobs available than test_cmp expect actual ' +test_expect_success 'run_command runs in parallel with more jobs available than tasks --duplicate-output' ' + test-tool run-command --duplicate-output run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test 4 = $(grep -c "duplicate_output: Hello" err) && + test 4 = $(grep -c "duplicate_output: World" err) && + sed "/duplicate_output/d" err > err1 && + test_cmp expect err1 +' + test_expect_success 'run_command runs ungrouped in parallel with more jobs available than tasks' ' test-tool run-command --ungroup run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && test_line_count = 8 out 
&& @@ -147,6 +156,15 @@ test_expect_success 'run_command runs in parallel with as many jobs as tasks' ' test_cmp expect actual ' +test_expect_success 'run_command runs in parallel with as many jobs as tasks --duplicate-output' ' + test-tool run-command --duplicate-output run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test 4 = $(grep -c "duplicate_output: Hello" err) && + test 4 = $(grep -c "duplicate_output: World" err) && + sed "/duplicate_output/d" err > err1 && + test_cmp expect err1 +' + test_expect_success 'run_command runs ungrouped in parallel with as many jobs as tasks' ' test-tool run-command --ungroup run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && test_line_count = 8 out && @@ -159,6 +177,15 @@ test_expect_success 'run_command runs in parallel with more tasks than jobs avai test_cmp expect actual ' +test_expect_success 'run_command runs in parallel with more tasks than jobs available --duplicate-output' ' + test-tool run-command --duplicate-output run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test 4 = $(grep -c "duplicate_output: Hello" err) && + test 4 = $(grep -c "duplicate_output: World" err) && + sed "/duplicate_output/d" err > err1 && + test_cmp expect err1 +' + test_expect_success 'run_command runs ungrouped in parallel with more tasks than jobs available' ' test-tool run-command --ungroup run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && test_line_count = 8 out && @@ -180,6 +207,12 @@ test_expect_success 'run_command is asked to abort gracefully' ' test_cmp expect actual ' +test_expect_success 'run_command is asked to abort gracefully --duplicate-output' ' + test-tool run-command --duplicate-output run-command-abort 3 false >out 2>err && + test_must_be_empty out && + test_cmp expect err +' + test_expect_success 'run_command is asked to abort gracefully (ungroup)' ' 
test-tool run-command --ungroup run-command-abort 3 false >out 2>err && test_must_be_empty out && @@ -196,6 +229,12 @@ test_expect_success 'run_command outputs ' ' test_cmp expect actual ' +test_expect_success 'run_command outputs --duplicate-output' ' + test-tool run-command --duplicate-output run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test_cmp expect err +' + test_expect_success 'run_command outputs (ungroup) ' ' test-tool run-command --ungroup run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && test_must_be_empty out && -- 2.39.0.314.g84b9a713c41-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
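As a rough illustration of the offset convention used by duplicate_output_fn, the sketch below appends chunks to a shared buffer and hands only the newly read bytes to a per-task copy. Everything here is a stand-in: struct buf, buf_add, copy_new_output and simulate_read are hypothetical names, and the buffer is a bare-bones substitute for git's strbuf. Unlike status_duplicate_output in v7, it does not truncate the shared buffer afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal growable buffer standing in for git's strbuf. */
struct buf {
	char *data;
	size_t len;
};

/* Append n bytes to b, keeping the contents NUL-terminated. */
static void buf_add(struct buf *b, const char *src, size_t n)
{
	char *p = realloc(b->data, b->len + n + 1);

	if (!p)
		abort();
	memcpy(p + b->len, src, n);
	b->data = p;
	b->len += n;
	b->data[b->len] = '\0';
}

/*
 * Callback in the shape of duplicate_output_fn: "offset" is the length
 * "out" had before the latest read, so the new bytes run from
 * out->data + offset up to out->len. They are copied to a per-task buffer.
 */
static void copy_new_output(const struct buf *out, size_t offset,
			    struct buf *task_out)
{
	assert(offset <= out->len);
	buf_add(task_out, out->data + offset, out->len - offset);
}

/* Simulate one read: append a chunk to "out", then run the callback. */
static void simulate_read(struct buf *out, const char *chunk,
			  struct buf *task_out)
{
	size_t before = out->len;

	buf_add(out, chunk, strlen(chunk));
	copy_new_output(out, before, task_out);
}
```

Because a single line may arrive split across two reads, the per-task buffer is only safe to parse once the child has finished, which matches the cover letter's reason for moving status parsing to the finish callback.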
* [PATCH v6 2/6] submodule: strbuf variable rename 2023-01-04 21:54 ` [PATCH v5 0/6] submodule: parallelize diff Calvin Wan ` (2 preceding siblings ...) 2023-01-17 19:30 ` [PATCH v6 1/6] run-command: add duplicate_output_fn to run_processes_parallel_opts Calvin Wan @ 2023-01-17 19:30 ` Calvin Wan 2023-01-17 19:30 ` [PATCH v6 3/6] submodule: move status parsing into function Calvin Wan ` (3 subsequent siblings) 7 siblings, 0 replies; 86+ messages in thread From: Calvin Wan @ 2023-01-17 19:30 UTC (permalink / raw) To: git Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123, chooglen, newren, jonathantanmy A preparatory change for a future patch that moves the status parsing logic to a separate function. Signed-off-by: Calvin Wan <calvinwan@google.com> --- submodule.c | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/submodule.c b/submodule.c index fae24ef34a..faf37c1101 100644 --- a/submodule.c +++ b/submodule.c @@ -1906,25 +1906,28 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) fp = xfdopen(cp.out, "r"); while (strbuf_getwholeline(&buf, fp, '\n') != EOF) { + char *str = buf.buf; + const size_t len = buf.len; + /* regular untracked files */ - if (buf.buf[0] == '?') + if (str[0] == '?') dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - if (buf.buf[0] == 'u' || - buf.buf[0] == '1' || - buf.buf[0] == '2') { + if (str[0] == 'u' || + str[0] == '1' || + str[0] == '2') { /* T = line type, XY = status, SSSS = submodule state */ - if (buf.len < strlen("T XY SSSS")) + if (len < strlen("T XY SSSS")) BUG("invalid status --porcelain=2 line %s", - buf.buf); + str); - if (buf.buf[5] == 'S' && buf.buf[8] == 'U') + if (str[5] == 'S' && str[8] == 'U') /* nested untracked file */ dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - if (buf.buf[0] == 'u' || - buf.buf[0] == '2' || - memcmp(buf.buf + 5, "S..U", 4)) + if (str[0] == 'u' || + str[0] == '2' || + memcmp(str + 5, "S..U", 4)) /* other change */ dirty_submodule |= 
DIRTY_SUBMODULE_MODIFIED; } -- 2.39.0.314.g84b9a713c41-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
* [PATCH v6 3/6] submodule: move status parsing into function 2023-01-04 21:54 ` [PATCH v5 0/6] submodule: parallelize diff Calvin Wan ` (3 preceding siblings ...) 2023-01-17 19:30 ` [PATCH v6 2/6] submodule: strbuf variable rename Calvin Wan @ 2023-01-17 19:30 ` Calvin Wan 2023-01-17 19:30 ` [PATCH v6 4/6] diff-lib: refactor match_stat_with_submodule Calvin Wan ` (2 subsequent siblings) 7 siblings, 0 replies; 86+ messages in thread From: Calvin Wan @ 2023-01-17 19:30 UTC (permalink / raw) To: git Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123, chooglen, newren, jonathantanmy A future patch requires the ability to parse the output of git status --porcelain=2. Move parsing code from is_submodule_modified to parse_status_porcelain. Signed-off-by: Calvin Wan <calvinwan@google.com> --- submodule.c | 74 ++++++++++++++++++++++++++++++----------------------- 1 file changed, 42 insertions(+), 32 deletions(-) diff --git a/submodule.c b/submodule.c index faf37c1101..768d4b4cd7 100644 --- a/submodule.c +++ b/submodule.c @@ -1870,6 +1870,45 @@ int fetch_submodules(struct repository *r, return spf.result; } +static int parse_status_porcelain(char *str, size_t len, + unsigned *dirty_submodule, + int ignore_untracked) +{ + /* regular untracked files */ + if (str[0] == '?') + *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; + + if (str[0] == 'u' || + str[0] == '1' || + str[0] == '2') { + /* T = line type, XY = status, SSSS = submodule state */ + if (len < strlen("T XY SSSS")) + BUG("invalid status --porcelain=2 line %s", + str); + + if (str[5] == 'S' && str[8] == 'U') + /* nested untracked file */ + *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; + + if (str[0] == 'u' || + str[0] == '2' || + memcmp(str + 5, "S..U", 4)) + /* other change */ + *dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; + } + + if ((*dirty_submodule & DIRTY_SUBMODULE_MODIFIED) && + ((*dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) || + ignore_untracked)) { + /* + * We're not interested in any further information 
from + * the child any more, neither output nor its exit code. + */ + return 1; + } + return 0; +} + unsigned is_submodule_modified(const char *path, int ignore_untracked) { struct child_process cp = CHILD_PROCESS_INIT; @@ -1909,39 +1948,10 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) char *str = buf.buf; const size_t len = buf.len; - /* regular untracked files */ - if (str[0] == '?') - dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - - if (str[0] == 'u' || - str[0] == '1' || - str[0] == '2') { - /* T = line type, XY = status, SSSS = submodule state */ - if (len < strlen("T XY SSSS")) - BUG("invalid status --porcelain=2 line %s", - str); - - if (str[5] == 'S' && str[8] == 'U') - /* nested untracked file */ - dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - - if (str[0] == 'u' || - str[0] == '2' || - memcmp(str + 5, "S..U", 4)) - /* other change */ - dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; - } - - if ((dirty_submodule & DIRTY_SUBMODULE_MODIFIED) && - ((dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) || - ignore_untracked)) { - /* - * We're not interested in any further information from - * the child any more, neither output nor its exit code. - */ - ignore_cp_exit_code = 1; + ignore_cp_exit_code = parse_status_porcelain(str, len, &dirty_submodule, + ignore_untracked); + if (ignore_cp_exit_code) break; - } } fclose(fp); -- 2.39.0.314.g84b9a713c41-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
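For readers following along without a git checkout, the per-line decisions made by parse_status_porcelain can be sketched as a standalone function. This is only an illustration under stated assumptions: the SKETCH_* flag values and the function name are local stand-ins, not git's definitions, and a too-short line returns -1 here where the patch calls BUG().

```c
#include <stddef.h>
#include <string.h>

/* Local stand-ins for git's dirty flags; the values are assumed. */
#define SKETCH_DIRTY_UNTRACKED 1u
#define SKETCH_DIRTY_MODIFIED  2u

/*
 * Classify one "git status --porcelain=2" line.
 * Returns 1 when later lines can no longer change the result,
 * 0 when parsing should continue, and -1 when the line is too
 * short to carry the "T XY SSSS" prefix (the patch uses BUG()).
 */
static int sketch_parse_status_line(const char *str, size_t len,
				    unsigned *dirty, int ignore_untracked)
{
	/* regular untracked file in the submodule's worktree */
	if (len && str[0] == '?')
		*dirty |= SKETCH_DIRTY_UNTRACKED;

	if (len && (str[0] == 'u' || str[0] == '1' || str[0] == '2')) {
		/* T = line type, XY = status, SSSS = submodule state */
		if (len < strlen("T XY SSSS"))
			return -1;

		/* nested submodule that contains untracked files */
		if (str[5] == 'S' && str[8] == 'U')
			*dirty |= SKETCH_DIRTY_UNTRACKED;

		/* anything other than "only nested untracked content" */
		if (str[0] == 'u' || str[0] == '2' ||
		    memcmp(str + 5, "S..U", 4))
			*dirty |= SKETCH_DIRTY_MODIFIED;
	}

	/* stop early once every flag the caller cares about is known */
	if ((*dirty & SKETCH_DIRTY_MODIFIED) &&
	    ((*dirty & SKETCH_DIRTY_UNTRACKED) || ignore_untracked))
		return 1;
	return 0;
}
```

A caller would feed it one line at a time and stop reading as soon as it returns nonzero, which is how the loop in is_submodule_modified uses the real function.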
* [PATCH v6 4/6] diff-lib: refactor match_stat_with_submodule 2023-01-04 21:54 ` [PATCH v5 0/6] submodule: parallelize diff Calvin Wan ` (4 preceding siblings ...) 2023-01-17 19:30 ` [PATCH v6 3/6] submodule: move status parsing into function Calvin Wan @ 2023-01-17 19:30 ` Calvin Wan 2023-01-17 19:30 ` [PATCH v6 5/6] diff-lib: parallelize run_diff_files for submodules Calvin Wan 2023-01-17 19:30 ` [PATCH v6 6/6] submodule: call parallel code from serial status Calvin Wan 7 siblings, 0 replies; 86+ messages in thread From: Calvin Wan @ 2023-01-17 19:30 UTC (permalink / raw) To: git Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123, chooglen, newren, jonathantanmy Flatten out the if statements in match_stat_with_submodule so the logic is more readable and easier for future patches to add to. orig_flags didn't need to be set if the cache entry wasn't a GITLINK so defer setting it. Signed-off-by: Calvin Wan <calvinwan@google.com> --- diff-lib.c | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/diff-lib.c b/diff-lib.c index dec040c366..64583fded0 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -73,18 +73,24 @@ static int match_stat_with_submodule(struct diff_options *diffopt, unsigned *dirty_submodule) { int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option); - if (S_ISGITLINK(ce->ce_mode)) { - struct diff_flags orig_flags = diffopt->flags; - if (!diffopt->flags.override_submodule_config) - set_diffopt_flags_from_submodule_config(diffopt, ce->name); - if (diffopt->flags.ignore_submodules) - changed = 0; - else if (!diffopt->flags.ignore_dirty_submodules && - (!changed || diffopt->flags.dirty_submodules)) - *dirty_submodule = is_submodule_modified(ce->name, - diffopt->flags.ignore_untracked_in_submodules); - diffopt->flags = orig_flags; + struct diff_flags orig_flags; + + if (!S_ISGITLINK(ce->ce_mode)) + return changed; + + orig_flags = diffopt->flags; + if (!diffopt->flags.override_submodule_config) + 
set_diffopt_flags_from_submodule_config(diffopt, ce->name); + if (diffopt->flags.ignore_submodules) { + changed = 0; + goto cleanup; } + if (!diffopt->flags.ignore_dirty_submodules && + (!changed || diffopt->flags.dirty_submodules)) + *dirty_submodule = is_submodule_modified(ce->name, + diffopt->flags.ignore_untracked_in_submodules); +cleanup: + diffopt->flags = orig_flags; return changed; } -- 2.39.0.314.g84b9a713c41-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
* [PATCH v6 5/6] diff-lib: parallelize run_diff_files for submodules 2023-01-04 21:54 ` [PATCH v5 0/6] submodule: parallelize diff Calvin Wan ` (5 preceding siblings ...) 2023-01-17 19:30 ` [PATCH v6 4/6] diff-lib: refactor match_stat_with_submodule Calvin Wan @ 2023-01-17 19:30 ` Calvin Wan 2023-01-26 9:09 ` Glen Choo 2023-01-26 9:16 ` Glen Choo 2023-01-17 19:30 ` [PATCH v6 6/6] submodule: call parallel code from serial status Calvin Wan 7 siblings, 2 replies; 86+ messages in thread From: Calvin Wan @ 2023-01-17 19:30 UTC (permalink / raw) To: git Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123, chooglen, newren, jonathantanmy During the iteration of the index entries in run_diff_files, whenever a submodule is found and needs its status checked, a subprocess is spawned for it. Instead of spawning the subprocess immediately and waiting for its completion to continue, hold onto all submodules and relevant information in a list. Then use that list to create tasks for run_processes_parallel. Subprocess output is duplicated and passed to status_pipe_output which stores it to be parsed on completion of the subprocess. Add config option submodule.diffJobs to set the maximum number of parallel jobs. The option defaults to 1 if unset. If set to 0, the number of jobs is set to online_cpus(). Since run_diff_files is called from many different commands, I chose to grab the config option in the function rather than adding variables to every git command and then figuring out how to pass them all in. 
Signed-off-by: Calvin Wan <calvinwan@google.com> --- Documentation/config/submodule.txt | 12 ++ diff-lib.c | 84 ++++++++++++-- submodule.c | 169 +++++++++++++++++++++++++++++ submodule.h | 9 ++ t/t4027-diff-submodule.sh | 19 ++++ t/t7506-status-submodule.sh | 19 ++++ 6 files changed, 305 insertions(+), 7 deletions(-) diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt index 6490527b45..3209eb8117 100644 --- a/Documentation/config/submodule.txt +++ b/Documentation/config/submodule.txt @@ -93,6 +93,18 @@ submodule.fetchJobs:: in parallel. A value of 0 will give some reasonable default. If unset, it defaults to 1. +submodule.diffJobs:: + Specifies how many submodules are diffed at the same time. A + positive integer allows up to that number of submodules diffed + in parallel. A value of 0 will give some reasonable default. + If unset, it defaults to 1. The diff operation is used by many + other git commands such as add, merge, diff, status, stash and + more. Note that the expensive part of the diff operation is + reading the index from cache or memory. Therefore multiple jobs + may be detrimental to performance if your hardware does not + support parallel reads or if the number of jobs greatly exceeds + the amount of supported reads. + submodule.alternateLocation:: Specifies how the submodules obtain alternates when submodules are cloned. Possible values are `no`, `superproject`. diff --git a/diff-lib.c b/diff-lib.c index 64583fded0..f51ea07f36 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -14,6 +14,7 @@ #include "dir.h" #include "fsmonitor.h" #include "commit-reach.h" +#include "config.h" /* * diff-files @@ -65,18 +66,23 @@ static int check_removed(const struct index_state *istate, const struct cache_en * Return 1 when changes are detected, 0 otherwise. 
If the DIRTY_SUBMODULES * option is set, the caller does not only want to know if a submodule is * modified at all but wants to know all the conditions that are met (new - * commits, untracked content and/or modified content). + * commits, untracked content and/or modified content). If + * defer_submodule_status bit is set, dirty_submodule will be left to the + * caller to set. defer_submodule_status can also be set to 0 in this + * function if there is no need to check if the submodule is modified. */ static int match_stat_with_submodule(struct diff_options *diffopt, const struct cache_entry *ce, struct stat *st, unsigned ce_option, - unsigned *dirty_submodule) + unsigned *dirty_submodule, int *defer_submodule_status, + unsigned *ignore_untracked) { int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option); struct diff_flags orig_flags; + int defer = 0; if (!S_ISGITLINK(ce->ce_mode)) - return changed; + goto ret; orig_flags = diffopt->flags; if (!diffopt->flags.override_submodule_config) @@ -86,11 +92,20 @@ static int match_stat_with_submodule(struct diff_options *diffopt, goto cleanup; } if (!diffopt->flags.ignore_dirty_submodules && - (!changed || diffopt->flags.dirty_submodules)) - *dirty_submodule = is_submodule_modified(ce->name, + (!changed || diffopt->flags.dirty_submodules)) { + if (defer_submodule_status && *defer_submodule_status) { + defer = 1; + *ignore_untracked = diffopt->flags.ignore_untracked_in_submodules; + } else { + *dirty_submodule = is_submodule_modified(ce->name, diffopt->flags.ignore_untracked_in_submodules); + } + } cleanup: diffopt->flags = orig_flags; +ret: + if (defer_submodule_status) + *defer_submodule_status = defer; return changed; } @@ -102,6 +117,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option) ? 
CE_MATCH_RACY_IS_DIRTY : 0); uint64_t start = getnanotime(); struct index_state *istate = revs->diffopt.repo->index; + struct string_list submodules = STRING_LIST_INIT_NODUP; diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/"); @@ -226,6 +242,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option) newmode = ce->ce_mode; } else { struct stat st; + unsigned ignore_untracked = 0; + int defer_submodule_status = !!revs->repo; changed = check_removed(istate, ce, &st); if (changed) { @@ -247,8 +265,26 @@ int run_diff_files(struct rev_info *revs, unsigned int option) } changed = match_stat_with_submodule(&revs->diffopt, ce, &st, - ce_option, &dirty_submodule); + ce_option, &dirty_submodule, + &defer_submodule_status, + &ignore_untracked); newmode = ce_mode_from_stat(ce, st.st_mode); + if (defer_submodule_status) { + struct submodule_status_util tmp = { + .changed = changed, + .dirty_submodule = 0, + .ignore_untracked = ignore_untracked, + .newmode = newmode, + .ce = ce, + .path = ce->name, + }; + struct string_list_item *item; + + item = string_list_append(&submodules, ce->name); + item->util = xmalloc(sizeof(tmp)); + memcpy(item->util, &tmp, sizeof(tmp)); + continue; + } } if (!changed && !dirty_submodule) { @@ -267,6 +303,40 @@ int run_diff_files(struct rev_info *revs, unsigned int option) ce->name, 0, dirty_submodule); } + if (submodules.nr > 0) { + int parallel_jobs; + if (git_config_get_int("submodule.diffjobs", &parallel_jobs)) + parallel_jobs = 1; + else if (!parallel_jobs) + parallel_jobs = online_cpus(); + else if (parallel_jobs < 0) + die(_("submodule.diffjobs cannot be negative")); + + if (get_submodules_status(&submodules, parallel_jobs)) + die(_("submodule status failed")); + for (size_t i = 0; i < submodules.nr; i++) { + struct submodule_status_util *util = submodules.items[i].util; + struct cache_entry *ce = util->ce; + unsigned int oldmode; + const struct object_id *old_oid, *new_oid; + + if (!util->changed && !util->dirty_submodule) { + 
ce_mark_uptodate(ce); + mark_fsmonitor_valid(istate, ce); + if (!revs->diffopt.flags.find_copies_harder) + continue; + } + oldmode = ce->ce_mode; + old_oid = &ce->oid; + new_oid = util->changed ? null_oid() : &ce->oid; + diff_change(&revs->diffopt, oldmode, util->newmode, + old_oid, new_oid, + !is_null_oid(old_oid), + !is_null_oid(new_oid), + ce->name, 0, util->dirty_submodule); + } + } + string_list_clear(&submodules, 1); diffcore_std(&revs->diffopt); diff_flush(&revs->diffopt); trace_performance_since(start, "diff-files"); @@ -314,7 +384,7 @@ static int get_stat_data(const struct index_state *istate, return -1; } changed = match_stat_with_submodule(diffopt, ce, &st, - 0, dirty_submodule); + 0, dirty_submodule, NULL, NULL); if (changed) { mode = ce_mode_from_stat(ce, st.st_mode); oid = null_oid(); diff --git a/submodule.c b/submodule.c index 768d4b4cd7..da95ea1f5e 100644 --- a/submodule.c +++ b/submodule.c @@ -1369,6 +1369,17 @@ int submodule_touches_in_range(struct repository *r, return ret; } +struct submodule_parallel_status { + size_t index_count; + int result; + + struct string_list *submodule_names; + + /* Pending statuses by OIDs */ + struct status_task **oid_status_tasks; + int oid_status_tasks_nr, oid_status_tasks_alloc; +}; + struct submodule_parallel_fetch { /* * The index of the last index entry processed by @@ -1451,6 +1462,12 @@ struct fetch_task { struct oid_array *commits; /* Ensure these commits are fetched */ }; +struct status_task { + const char *path; + struct strbuf out; + int ignore_untracked; +}; + /** * When a submodule is not defined in .gitmodules, we cannot access it * via the regular submodule-config. 
Create a fake submodule, which we can @@ -1909,6 +1926,25 @@ static int parse_status_porcelain(char *str, size_t len, return 0; } +static void parse_status_porcelain_strbuf(struct strbuf *buf, + unsigned *dirty_submodule, + int ignore_untracked) +{ + struct string_list list = STRING_LIST_INIT_DUP; + struct string_list_item *item; + + string_list_split(&list, buf->buf, '\n', -1); + + for_each_string_list_item(item, &list) { + if (parse_status_porcelain(item->string, + strlen(item->string), + dirty_submodule, + ignore_untracked)) + break; + } + string_list_clear(&list, 0); +} + unsigned is_submodule_modified(const char *path, int ignore_untracked) { struct child_process cp = CHILD_PROCESS_INIT; @@ -1962,6 +1998,139 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) return dirty_submodule; } +static struct status_task * +get_status_task_from_index(struct submodule_parallel_status *sps, + struct strbuf *err) +{ + for (; sps->index_count < sps->submodule_names->nr; sps->index_count++) { + struct submodule_status_util *util = sps->submodule_names->items[sps->index_count].util; + struct status_task *task; + struct strbuf buf = STRBUF_INIT; + const char *git_dir; + + strbuf_addf(&buf, "%s/.git", util->path); + git_dir = read_gitfile(buf.buf); + if (!git_dir) + git_dir = buf.buf; + if (!is_git_directory(git_dir)) { + if (is_directory(git_dir)) + die(_("'%s' not recognized as a git repository"), git_dir); + strbuf_release(&buf); + /* The submodule is not checked out, so it is not modified */ + util->dirty_submodule = 0; + continue; + } + strbuf_release(&buf); + + task = xmalloc(sizeof(*task)); + task->path = util->path; + task->ignore_untracked = util->ignore_untracked; + strbuf_init(&task->out, 0); + sps->index_count++; + return task; + } + return NULL; +} + +static int get_next_submodule_status(struct child_process *cp, + struct strbuf *err, void *data, + void **task_cb) +{ + struct submodule_parallel_status *sps = data; + struct status_task *task = 
get_status_task_from_index(sps, err); + + if (!task) + return 0; + + child_process_init(cp); + prepare_submodule_repo_env_in_gitdir(&cp->env); + + strvec_init(&cp->args); + strvec_pushl(&cp->args, "status", "--porcelain=2", NULL); + if (task->ignore_untracked) + strvec_push(&cp->args, "-uno"); + + prepare_submodule_repo_env(&cp->env); + cp->git_cmd = 1; + cp->dir = task->path; + *task_cb = task; + return 1; +} + +static int status_start_failure(struct strbuf *err, + void *cb, void *task_cb) +{ + struct submodule_parallel_status *sps = cb; + struct status_task *task = task_cb; + + sps->result = 1; + strbuf_addf(err, + _("could not run 'git status --porcelain=2' in submodule %s"), + task->path); + return 0; +} + +static void status_duplicate_output(struct strbuf *out, + size_t offset, + void *cb, void *task_cb) +{ + struct status_task *task = task_cb; + + strbuf_add(&task->out, out->buf + offset, out->len - offset); + strbuf_setlen(out, offset); +} + +static int status_finish(int retvalue, struct strbuf *err, + void *cb, void *task_cb) +{ + struct submodule_parallel_status *sps = cb; + struct status_task *task = task_cb; + struct string_list_item *it = + string_list_lookup(sps->submodule_names, task->path); + struct submodule_status_util *util = it->util; + + if (retvalue) { + sps->result = 1; + strbuf_addf(err, + _("'git status --porcelain=2' failed in submodule %s"), + task->path); + } + + parse_status_porcelain_strbuf(&task->out, + &util->dirty_submodule, + util->ignore_untracked); + + strbuf_release(&task->out); + free(task); + + return 0; +} + +int get_submodules_status(struct string_list *submodules, + int max_parallel_jobs) +{ + struct submodule_parallel_status sps = { + .submodule_names = submodules, + }; + const struct run_process_parallel_opts opts = { + .tr2_category = "submodule", + .tr2_label = "parallel/status", + + .processes = max_parallel_jobs, + + .get_next_task = get_next_submodule_status, + .start_failure = status_start_failure, + 
.duplicate_output = status_duplicate_output, + .task_finished = status_finish, + .data = &sps, + }; + + string_list_sort(sps.submodule_names); + run_processes_parallel(&opts); + + return sps.result; +} + int submodule_uses_gitfile(const char *path) { struct child_process cp = CHILD_PROCESS_INIT; diff --git a/submodule.h b/submodule.h index b52a4ff1e7..08d278a414 100644 --- a/submodule.h +++ b/submodule.h @@ -41,6 +41,13 @@ struct submodule_update_strategy { .type = SM_UPDATE_UNSPECIFIED, \ } +struct submodule_status_util { + int changed, ignore_untracked; + unsigned dirty_submodule, newmode; + struct cache_entry *ce; + const char *path; +}; + int is_gitmodules_unmerged(struct index_state *istate); int is_writing_gitmodules_ok(void); int is_staging_gitmodules_ok(struct index_state *istate); @@ -94,6 +101,8 @@ int fetch_submodules(struct repository *r, int command_line_option, int default_option, int quiet, int max_parallel_jobs); +int get_submodules_status(struct string_list *submodules, + int max_parallel_jobs); unsigned is_submodule_modified(const char *path, int ignore_untracked); int submodule_uses_gitfile(const char *path); diff --git a/t/t4027-diff-submodule.sh b/t/t4027-diff-submodule.sh index 40164ae07d..e08ee315a7 100755 --- a/t/t4027-diff-submodule.sh +++ b/t/t4027-diff-submodule.sh @@ -34,6 +34,25 @@ test_expect_success setup ' subtip=$3 subprev=$2 ' +test_expect_success 'diff in superproject with submodules respects parallel settings' ' + test_when_finished "rm -f trace.out" && + ( + GIT_TRACE=$(pwd)/trace.out git diff && + grep "1 tasks" trace.out && + >trace.out && + + git config submodule.diffJobs 8 && + GIT_TRACE=$(pwd)/trace.out git diff && + grep "8 tasks" trace.out && + >trace.out && + + GIT_TRACE=$(pwd)/trace.out git -c submodule.diffJobs=0 diff && + grep "preparing to run up to [0-9]* tasks" trace.out && + ! 
grep "up to 0 tasks" trace.out && + >trace.out + ) +' + test_expect_success 'git diff --raw HEAD' ' hexsz=$(test_oid hexsz) && git diff --raw --abbrev=$hexsz HEAD >actual && diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh index d050091345..52a82b703f 100755 --- a/t/t7506-status-submodule.sh +++ b/t/t7506-status-submodule.sh @@ -412,4 +412,23 @@ test_expect_success 'status with added file in nested submodule (short)' ' EOF ' +test_expect_success 'status in superproject with submodules respects parallel settings' ' + test_when_finished "rm -f trace.out" && + ( + GIT_TRACE=$(pwd)/trace.out git status && + grep "1 tasks" trace.out && + >trace.out && + + git config submodule.diffJobs 8 && + GIT_TRACE=$(pwd)/trace.out git status && + grep "8 tasks" trace.out && + >trace.out && + + GIT_TRACE=$(pwd)/trace.out git -c submodule.diffJobs=0 status && + grep "preparing to run up to [0-9]* tasks" trace.out && + ! grep "up to 0 tasks" trace.out && + >trace.out + ) +' + test_done -- 2.39.0.314.g84b9a713c41-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
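The reason status_duplicate_output only appends and status_finish does the parsing is that a status line may be split across two reads from the child. A rough standalone sketch of that accumulate-then-parse pattern follows; struct sketch_buf and every sketch_* name are hypothetical stand-ins for git's strbuf and callbacks, not the patch's code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer standing in for git's struct strbuf. */
struct sketch_buf {
	char *data;
	size_t len;
};

/* Append one chunk as it arrives from the child; returns 0 on success. */
static int sketch_buf_add(struct sketch_buf *b, const char *chunk, size_t n)
{
	char *p = realloc(b->data, b->len + n + 1);

	if (!p)
		return -1;
	memcpy(p + b->len, chunk, n);
	b->data = p;
	b->len += n;
	b->data[b->len] = '\0';
	return 0;
}

/*
 * Called once the child has exited, so every line is complete.
 * Invokes fn for each non-empty line and returns how many were seen.
 */
static size_t sketch_for_each_line(const struct sketch_buf *b,
				   void (*fn)(const char *line, size_t len,
					      void *cb),
				   void *cb)
{
	size_t count = 0;
	const char *p, *end;

	if (!b->data)
		return 0;
	p = b->data;
	end = b->data + b->len;
	while (p < end) {
		const char *nl = memchr(p, '\n', (size_t)(end - p));
		size_t n = nl ? (size_t)(nl - p) : (size_t)(end - p);

		if (n) {
			fn(p, n, cb);
			count++;
		}
		p += n + (nl ? 1 : 0);
	}
	return count;
}

/* Example callback: count ordinary changed entries (lines starting with '1'). */
static void sketch_count_changed(const char *line, size_t len, void *cb)
{
	if (len && line[0] == '1')
		(*(size_t *)cb)++;
}
```

Parsing inside the per-chunk callback instead would see a fragment such as "1 .M" on its own and trip the length check, which is the failure the cover letter describes for long status output.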
* Re: [PATCH v6 5/6] diff-lib: parallelize run_diff_files for submodules 2023-01-17 19:30 ` [PATCH v6 5/6] diff-lib: parallelize run_diff_files for submodules Calvin Wan @ 2023-01-26 9:09 ` Glen Choo 2023-01-26 9:16 ` Glen Choo 1 sibling, 0 replies; 86+ messages in thread From: Glen Choo @ 2023-01-26 9:09 UTC (permalink / raw) To: Calvin Wan, git Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123, newren, jonathantanmy As Jonathan mentioned in [1], I think we should refactor functions out from the serial implementation in a preparatory patch, then use those functions to implement the parallel version in this patch. In its current form, there is a fair amount of duplicated code, which makes it tricky to review because of the additional overhead of checking what the duplicated code does and whether we've copied it correctly. For cleanliness, I'll only point out the duplicated code in this email; I'll comment on other things I spotted in another one. [1] https://lore.kernel.org/git/20221128210125.2751300-1-jonathantanmy@google.com/ Calvin Wan <calvinwan@google.com> writes: > + for (size_t i = 0; i < submodules.nr; i++) { > + struct submodule_status_util *util = submodules.items[i].util; > + struct cache_entry *ce = util->ce; > + unsigned int oldmode; > + const struct object_id *old_oid, *new_oid; > + > + if (!util->changed && !util->dirty_submodule) { > + ce_mark_uptodate(ce); > + mark_fsmonitor_valid(istate, ce); > + if (!revs->diffopt.flags.find_copies_harder) > + continue; > + } > + oldmode = ce->ce_mode; > + old_oid = &ce->oid; > + new_oid = util->changed ? null_oid() : &ce->oid; > + diff_change(&revs->diffopt, oldmode, util->newmode, > + old_oid, new_oid, > + !is_null_oid(old_oid), > + !is_null_oid(new_oid), > + ce->name, 0, util->dirty_submodule); > + } > + } The lines from "if (!util->changed && !util->dirty_submodule)" onwards are copied from earlier in run_diff_files(). This might be refactored into something like diff_submodule_change(). 
> +static struct status_task * > +get_status_task_from_index(struct submodule_parallel_status *sps, > + struct strbuf *err) > +{ > + for (; sps->index_count < sps->submodule_names->nr; sps->index_count++) { > + struct submodule_status_util *util = sps->submodule_names->items[sps->index_count].util; > + struct status_task *task; > + struct strbuf buf = STRBUF_INIT; > + const char *git_dir; > + > + strbuf_addf(&buf, "%s/.git", util->path); > + git_dir = read_gitfile(buf.buf); This... > +static int get_next_submodule_status(struct child_process *cp, > + struct strbuf *err, void *data, > + void **task_cb) > +{ > + struct submodule_parallel_status *sps = data; > + struct status_task *task = get_status_task_from_index(sps, err); > + > + if (!task) > + return 0; > + > + child_process_init(cp); > + prepare_submodule_repo_env_in_gitdir(&cp->env); > + > + strvec_init(&cp->args); > + strvec_pushl(&cp->args, "status", "--porcelain=2", NULL); > + if (task->ignore_untracked) > + strvec_push(&cp->args, "-uno"); > + > + prepare_submodule_repo_env(&cp->env); > + cp->git_cmd = 1; this... > +static int status_start_failure(struct strbuf *err, > + void *cb, void *task_cb) > +{ > + struct submodule_parallel_status *sps = cb; > + struct status_task *task = task_cb; > + > + sps->result = 1; > + strbuf_addf(err, > + _("could not run 'git status --porcelain=2' in submodule %s"), > + task->path); > + return 0; > +} this... > +static int status_finish(int retvalue, struct strbuf *err, > + void *cb, void *task_cb) > +{ > + struct submodule_parallel_status *sps = cb; > + struct status_task *task = task_cb; > + struct string_list_item *it = > + string_list_lookup(sps->submodule_names, task->path); > + struct submodule_status_util *util = it->util; > + > + if (retvalue) { > + sps->result = 1; > + strbuf_addf(err, > + _("'git status --porcelain=2' failed in submodule %s"), > + task->path); > + } and this are all copied from different parts of is_submodule_modified(). 
To refactor them out, I think we could combine the first two into "setup_submodule_status()". The last one could be moved into "process_submodule_status_result()" or perhaps we could find a way to combine it into parse_status_porcelain(). ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v6 5/6] diff-lib: parallelize run_diff_files for submodules 2023-01-17 19:30 ` [PATCH v6 5/6] diff-lib: parallelize run_diff_files for submodules Calvin Wan 2023-01-26 9:09 ` Glen Choo @ 2023-01-26 9:16 ` Glen Choo 2023-01-26 18:52 ` Calvin Wan 1 sibling, 1 reply; 86+ messages in thread From: Glen Choo @ 2023-01-26 9:16 UTC (permalink / raw) To: Calvin Wan, git Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123, newren, jonathantanmy Calvin Wan <calvinwan@google.com> writes: > @@ -226,6 +242,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option) > newmode = ce->ce_mode; > } else { > struct stat st; > + unsigned ignore_untracked = 0; > + int defer_submodule_status = !!revs->repo; What is the reasoning behind this condition? I would expect revs->repo to always be set, and we would always end up deferring. > newmode = ce_mode_from_stat(ce, st.st_mode); > + if (defer_submodule_status) { > + struct submodule_status_util tmp = { > + .changed = changed, > + .dirty_submodule = 0, > + .ignore_untracked = ignore_untracked, > + .newmode = newmode, > + .ce = ce, > + .path = ce->name, > + }; > + struct string_list_item *item; > + > + item = string_list_append(&submodules, ce->name); > + item->util = xmalloc(sizeof(tmp)); > + memcpy(item->util, &tmp, sizeof(tmp)); (Not a C expert) Since we don't return the string list, I wonder if we can avoid the memcpy() by using &tmp like so: struct string_list_item *item; item = string_list_append(&submodules, ce->name); item->util = &tmp; And then when we call string_list_clear(), we wouldn't need to free the util since we exit the stack frame. 
> +test_expect_success 'diff in superproject with submodules respects parallel settings' ' > + test_when_finished "rm -f trace.out" && > + ( > + GIT_TRACE=$(pwd)/trace.out git diff && > + grep "1 tasks" trace.out && > + >trace.out && > + > + git config submodule.diffJobs 8 && > + GIT_TRACE=$(pwd)/trace.out git diff && > + grep "8 tasks" trace.out && > + >trace.out && > + > + GIT_TRACE=$(pwd)/trace.out git -c submodule.diffJobs=0 diff && > + grep "preparing to run up to [0-9]* tasks" trace.out && > + ! grep "up to 0 tasks" trace.out && > + >trace.out > + ) > +' > + Could we get tests to check that the output of git diff isn't changed by setting parallelism? This might not be feasible for submodule.diffJobs > 1 due to raciness, but it would be good to see for submodule.diffJobs = 1 at least. > test_expect_success 'git diff --raw HEAD' ' > hexsz=$(test_oid hexsz) && > git diff --raw --abbrev=$hexsz HEAD >actual && > diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh > index d050091345..52a82b703f 100755 > --- a/t/t7506-status-submodule.sh > +++ b/t/t7506-status-submodule.sh > @@ -412,4 +412,23 @@ test_expect_success 'status with added file in nested submodule (short)' ' > EOF > ' > > +test_expect_success 'status in superproject with submodules respects parallel settings' ' > + test_when_finished "rm -f trace.out" && > + ( > + GIT_TRACE=$(pwd)/trace.out git status && > + grep "1 tasks" trace.out && > + >trace.out && > + > + git config submodule.diffJobs 8 && > + GIT_TRACE=$(pwd)/trace.out git status && > + grep "8 tasks" trace.out && > + >trace.out && > + > + GIT_TRACE=$(pwd)/trace.out git -c submodule.diffJobs=0 status && > + grep "preparing to run up to [0-9]* tasks" trace.out && > + ! grep "up to 0 tasks" trace.out && > + >trace.out > + ) > +' > + Ditto for "status". ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v6 5/6] diff-lib: parallelize run_diff_files for submodules 2023-01-26 9:16 ` Glen Choo @ 2023-01-26 18:52 ` Calvin Wan 0 siblings, 0 replies; 86+ messages in thread From: Calvin Wan @ 2023-01-26 18:52 UTC (permalink / raw) To: Glen Choo Cc: git, emilyshaffer, avarab, phillip.wood123, newren, jonathantanmy On Thu, Jan 26, 2023 at 1:16 AM Glen Choo <chooglen@google.com> wrote: > > > Calvin Wan <calvinwan@google.com> writes: > > > @@ -226,6 +242,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option) > > newmode = ce->ce_mode; > > } else { > > struct stat st; > > + unsigned ignore_untracked = 0; > > + int defer_submodule_status = !!revs->repo; > > What is the reasoning behind this condition? I would expect revs->repo > to always be set, and we would always end up deferring. Ah looks like a vestigial sanity check. You're correct that we would always be deferring anyways. > > > newmode = ce_mode_from_stat(ce, st.st_mode); > > + if (defer_submodule_status) { > > + struct submodule_status_util tmp = { > > + .changed = changed, > > + .dirty_submodule = 0, > > + .ignore_untracked = ignore_untracked, > > + .newmode = newmode, > > + .ce = ce, > > + .path = ce->name, > > + }; > > + struct string_list_item *item; > > + > > + item = string_list_append(&submodules, ce->name); > > + item->util = xmalloc(sizeof(tmp)); > > + memcpy(item->util, &tmp, sizeof(tmp)); > > (Not a C expert) Since we don't return the string list, I wonder if we > can avoid the memcpy() by using &tmp like so: > > struct string_list_item *item; > item = string_list_append(&submodules, ce->name); > item->util = &tmp; > > And then when we call string_list_clear(), we wouldn't need to free the > util since we exit the stack frame. Unfortunately this doesn't work because tmp is deallocated off the stack after changing scope. 
> > +test_expect_success 'diff in superproject with submodules respects parallel settings' ' > > + test_when_finished "rm -f trace.out" && > > + ( > > + GIT_TRACE=$(pwd)/trace.out git diff && > > + grep "1 tasks" trace.out && > > + >trace.out && > > + > > + git config submodule.diffJobs 8 && > > + GIT_TRACE=$(pwd)/trace.out git diff && > > + grep "8 tasks" trace.out && > > + >trace.out && > > + > > + GIT_TRACE=$(pwd)/trace.out git -c submodule.diffJobs=0 diff && > > + grep "preparing to run up to [0-9]* tasks" trace.out && > > + ! grep "up to 0 tasks" trace.out && > > + >trace.out > > + ) > > +' > > + > > Could we get tests to check that the output of git diff isn't changed by > setting parallelism? This might not be feasible for submodule.diffJobs > > 1 due to raciness, but it would be good to see for submodule.diffJobs = > 1 at least. ack. > > > test_expect_success 'git diff --raw HEAD' ' > > hexsz=$(test_oid hexsz) && > > git diff --raw --abbrev=$hexsz HEAD >actual && > > diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh > > index d050091345..52a82b703f 100755 > > --- a/t/t7506-status-submodule.sh > > +++ b/t/t7506-status-submodule.sh > > @@ -412,4 +412,23 @@ test_expect_success 'status with added file in nested submodule (short)' ' > > EOF > > ' > > > > +test_expect_success 'status in superproject with submodules respects parallel settings' ' > > + test_when_finished "rm -f trace.out" && > > + ( > > + GIT_TRACE=$(pwd)/trace.out git status && > > + grep "1 tasks" trace.out && > > + >trace.out && > > + > > + git config submodule.diffJobs 8 && > > + GIT_TRACE=$(pwd)/trace.out git status && > > + grep "8 tasks" trace.out && > > + >trace.out && > > + > > + GIT_TRACE=$(pwd)/trace.out git -c submodule.diffJobs=0 status && > > + grep "preparing to run up to [0-9]* tasks" trace.out && > > + ! grep "up to 0 tasks" trace.out && > > + >trace.out > > + ) > > +' > > + > > Ditto for "status". ack. ^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v6 6/6] submodule: call parallel code from serial status 2023-01-04 21:54 ` [PATCH v5 0/6] submodule: parallelize diff Calvin Wan ` (6 preceding siblings ...) 2023-01-17 19:30 ` [PATCH v6 5/6] diff-lib: parallelize run_diff_files for submodules Calvin Wan @ 2023-01-17 19:30 ` Calvin Wan 2023-01-26 8:09 ` Glen Choo 7 siblings, 1 reply; 86+ messages in thread From: Calvin Wan @ 2023-01-17 19:30 UTC (permalink / raw) To: git Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123, chooglen, newren, jonathantanmy Remove the serial implementation of status inside of is_submodule_modified since the parallel implementation of status with one job accomplishes the same task. Combine parse_status_porcelain and parse_status_porcelain_strbuf since the only other caller of parse_status_porcelain was in is_submodule_modified Signed-off-by: Calvin Wan <calvinwan@google.com> --- submodule.c | 146 ++++++++++++++++++---------------------------------- 1 file changed, 51 insertions(+), 95 deletions(-) diff --git a/submodule.c b/submodule.c index da95ea1f5e..2009748d9f 100644 --- a/submodule.c +++ b/submodule.c @@ -1887,46 +1887,7 @@ int fetch_submodules(struct repository *r, return spf.result; } -static int parse_status_porcelain(char *str, size_t len, - unsigned *dirty_submodule, - int ignore_untracked) -{ - /* regular untracked files */ - if (str[0] == '?') - *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - - if (str[0] == 'u' || - str[0] == '1' || - str[0] == '2') { - /* T = line type, XY = status, SSSS = submodule state */ - if (len < strlen("T XY SSSS")) - BUG("invalid status --porcelain=2 line %s", - str); - - if (str[5] == 'S' && str[8] == 'U') - /* nested untracked file */ - *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - - if (str[0] == 'u' || - str[0] == '2' || - memcmp(str + 5, "S..U", 4)) - /* other change */ - *dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; - } - - if ((*dirty_submodule & DIRTY_SUBMODULE_MODIFIED) && - ((*dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) 
|| - ignore_untracked)) { - /* - * We're not interested in any further information from - * the child any more, neither output nor its exit code. - */ - return 1; - } - return 0; -} - -static void parse_status_porcelain_strbuf(struct strbuf *buf, +static void parse_status_porcelain(struct strbuf *buf, unsigned *dirty_submodule, int ignore_untracked) { @@ -1936,65 +1897,60 @@ static void parse_status_porcelain_strbuf(struct strbuf *buf, string_list_split(&list, buf->buf, '\n', -1); for_each_string_list_item(item, &list) { - if (parse_status_porcelain(item->string, - strlen(item->string), - dirty_submodule, - ignore_untracked)) + char *str = item->string; + /* regular untracked files */ + if (str[0] == '?') + *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; + + if (str[0] == 'u' || + str[0] == '1' || + str[0] == '2') { + /* T = line type, XY = status, SSSS = submodule state */ + if (strlen(str) < strlen("T XY SSSS")) + BUG("invalid status --porcelain=2 line %s", + str); + + if (str[5] == 'S' && str[8] == 'U') + /* nested untracked file */ + *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; + + if (str[0] == 'u' || + str[0] == '2' || + memcmp(str + 5, "S..U", 4)) + /* other change */ + *dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; + } + + if ((*dirty_submodule & DIRTY_SUBMODULE_MODIFIED) && + ((*dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) || + ignore_untracked)) { + /* + * We're not interested in any further information from + * the child any more, neither output nor its exit code. 
+ */ break; + } } string_list_clear(&list, 0); } unsigned is_submodule_modified(const char *path, int ignore_untracked) { - struct child_process cp = CHILD_PROCESS_INIT; - struct strbuf buf = STRBUF_INIT; - FILE *fp; - unsigned dirty_submodule = 0; - const char *git_dir; - int ignore_cp_exit_code = 0; - - strbuf_addf(&buf, "%s/.git", path); - git_dir = read_gitfile(buf.buf); - if (!git_dir) - git_dir = buf.buf; - if (!is_git_directory(git_dir)) { - if (is_directory(git_dir)) - die(_("'%s' not recognized as a git repository"), git_dir); - strbuf_release(&buf); - /* The submodule is not checked out, so it is not modified */ - return 0; - } - strbuf_reset(&buf); - - strvec_pushl(&cp.args, "status", "--porcelain=2", NULL); - if (ignore_untracked) - strvec_push(&cp.args, "-uno"); - - prepare_submodule_repo_env(&cp.env); - cp.git_cmd = 1; - cp.no_stdin = 1; - cp.out = -1; - cp.dir = path; - if (start_command(&cp)) - die(_("Could not run 'git status --porcelain=2' in submodule %s"), path); - - fp = xfdopen(cp.out, "r"); - while (strbuf_getwholeline(&buf, fp, '\n') != EOF) { - char *str = buf.buf; - const size_t len = buf.len; - - ignore_cp_exit_code = parse_status_porcelain(str, len, &dirty_submodule, - ignore_untracked); - if (ignore_cp_exit_code) - break; - } - fclose(fp); - - if (finish_command(&cp) && !ignore_cp_exit_code) - die(_("'git status --porcelain=2' failed in submodule %s"), path); - - strbuf_release(&buf); + struct submodule_status_util util = { + .dirty_submodule = 0, + .ignore_untracked = ignore_untracked, + .path = path, + }; + struct string_list sub = STRING_LIST_INIT_NODUP; + struct string_list_item *item; + int dirty_submodule; + + item = string_list_append(&sub, path); + item->util = &util; + if (get_submodules_status(&sub, 1)) + die(_("submodule status failed")); + dirty_submodule = util.dirty_submodule; + string_list_clear(&sub, 0); return dirty_submodule; } @@ -2096,9 +2052,9 @@ static int status_finish(int retvalue, struct strbuf *err, 
task->path); } - parse_status_porcelain_strbuf(&task->out, - &util->dirty_submodule, - util->ignore_untracked); + parse_status_porcelain(&task->out, + &util->dirty_submodule, + util->ignore_untracked); strbuf_release(&task->out); free(task); -- 2.39.0.314.g84b9a713c41-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v6 6/6] submodule: call parallel code from serial status 2023-01-17 19:30 ` [PATCH v6 6/6] submodule: call parallel code from serial status Calvin Wan @ 2023-01-26 8:09 ` Glen Choo 2023-01-26 8:45 ` Glen Choo 0 siblings, 1 reply; 86+ messages in thread From: Glen Choo @ 2023-01-26 8:09 UTC (permalink / raw) To: Calvin Wan, git Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123, newren, jonathantanmy Calvin Wan <calvinwan@google.com> writes: > Remove the serial implementation of status inside of > is_submodule_modified since the parallel implementation of status with > one job accomplishes the same task. > > Combine parse_status_porcelain and parse_status_porcelain_strbuf since > the only other caller of parse_status_porcelain was in > is_submodule_modified I see that this is in direct response to Jonathan's earlier comment [1] that we should have only one implementation. Thanks, this is helpful. Definitely a step in the right direction. That said, I don't think this patch's position in the series makes sense. I would have expected a patch like this to come before 5/6. I.e. this series duplicates code in 5/6 and deletes it in 6/6 so that we only have one implementation for both serial and parallel submodule status. Instead, I would have expected we would refactor out the serial implementation, then use the refactored code for the parallel implementation. Not having duplicated code in 5/6 would shrink the line count a lot and make it easier to review. [1] https://lore.kernel.org/git/20221128210125.2751300-1-jonathantanmy@google.com/ ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v6 6/6] submodule: call parallel code from serial status 2023-01-26 8:09 ` Glen Choo @ 2023-01-26 8:45 ` Glen Choo 0 siblings, 0 replies; 86+ messages in thread From: Glen Choo @ 2023-01-26 8:45 UTC (permalink / raw) To: Calvin Wan, git Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123, newren, jonathantanmy Glen Choo <chooglen@google.com> writes: > Calvin Wan <calvinwan@google.com> writes: > >> Remove the serial implementation of status inside of >> is_submodule_modified since the parallel implementation of status with >> one job accomplishes the same task. > > I see that this is in direct response to Jonathan's earlier comment [1] > that we should have only one implementation. Thanks, this is helpful. > Definitely a step in the right direction. > > That said, I don't think this patch's position in the series makes > sense. I would have expected a patch like this to come before 5/6. I.e. > this series duplicates code in 5/6 and deletes it in 6/6 so that we only > have one implementation for both serial and parallel submodule status. > > Instead, I would have expected we would refactor out the serial > implementation, then use the refactored code for the parallel > implementation. Not having duplicated code in 5/6 would shrink the line > count a lot and make it easier to review. > > [1] https://lore.kernel.org/git/20221128210125.2751300-1-jonathantanmy@google.com/ Ah, I realize I completely misunderstood this patch. I thought that this was deleting code that was duplicated between the serial and parallel implementations in 5/6 such that both ended up sharing just one copy of the code. Instead, this patch deletes the serial implementation altogether and replaces it with the parallel one. As such, this patch can't come earlier than 5/6, because we need the parallel implementation to exist before we can use it. For reviewability of 5/6, I'd still strongly prefer that we refactor out functions (I'll leave more specific comments on that patch). 
We could still consider replacing the serial implementation with "parallel with a single job", though I suspect that it will be unnecessary if we do the refactoring well. I'm also not sure how idiomatic it is to call run_processes_parallel() with a hardcoded value of 1, but I don't feel too strongly about that. ^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v5 1/6] run-command: add duplicate_output_fn to run_processes_parallel_opts [not found] <https://lore.kernel.org/git/20221108184200.2813458-1-calvinwan@google.com/> 2023-01-04 21:54 ` [PATCH v5 0/6] submodule: parallelize diff Calvin Wan @ 2023-01-04 21:54 ` Calvin Wan 2023-01-04 21:54 ` [PATCH v5 2/6] submodule: strbuf variable rename Calvin Wan ` (4 subsequent siblings) 6 siblings, 0 replies; 86+ messages in thread From: Calvin Wan @ 2023-01-04 21:54 UTC (permalink / raw) To: git Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123, chooglen, newren, jonathantanmy Add duplicate_output_fn as an optionally set function in run_process_parallel_opts. If set, output from each child process is copied and passed to the callback function whenever output from the child process is buffered to allow for separate parsing. Signed-off-by: Calvin Wan <calvinwan@google.com> --- run-command.c | 16 ++++++++++++--- run-command.h | 27 +++++++++++++++++++++++++ t/helper/test-run-command.c | 21 ++++++++++++++++++++ t/t0061-run-command.sh | 39 +++++++++++++++++++++++++++++++++++++ 4 files changed, 100 insertions(+), 3 deletions(-) diff --git a/run-command.c b/run-command.c index 756f1839aa..cad88befe0 100644 --- a/run-command.c +++ b/run-command.c @@ -1526,6 +1526,9 @@ static void pp_init(struct parallel_processes *pp, if (!opts->get_next_task) BUG("you need to specify a get_next_task function"); + if (opts->duplicate_output && opts->ungroup) + BUG("duplicate_output and ungroup are incompatible with each other"); + CALLOC_ARRAY(pp->children, n); if (!opts->ungroup) CALLOC_ARRAY(pp->pfd, n); @@ -1645,14 +1648,21 @@ static void pp_buffer_stderr(struct parallel_processes *pp, for (size_t i = 0; i < opts->processes; i++) { if (pp->children[i].state == GIT_CP_WORKING && pp->pfd[i].revents & (POLLIN | POLLHUP)) { - int n = strbuf_read_once(&pp->children[i].err, - pp->children[i].process.err, 0); + ssize_t n = strbuf_read_once(&pp->children[i].err, + pp->children[i].process.err, 0); 
if (n == 0) { close(pp->children[i].process.err); pp->children[i].state = GIT_CP_WAIT_CLEANUP; - } else if (n < 0) + } else if (n < 0) { if (errno != EAGAIN) die_errno("read"); + } else { + if (opts->duplicate_output) + opts->duplicate_output(&pp->children[i].err, + strlen(pp->children[i].err.buf) - n, + opts->data, + pp->children[i].data); + } } } } diff --git a/run-command.h b/run-command.h index 072db56a4d..6dcf999f6c 100644 --- a/run-command.h +++ b/run-command.h @@ -408,6 +408,27 @@ typedef int (*start_failure_fn)(struct strbuf *out, void *pp_cb, void *pp_task_cb); +/** + * This callback is called whenever output from a child process is buffered + * + * See run_processes_parallel() below for a discussion of the "struct + * strbuf *out" parameter. + * + * The offset refers to the number of bytes originally in "out" before + * the output from the child process was buffered. Therefore, the buffer + * range, "out->buf + offset" to the end of "out", would contain the buffer of + * the child process output. + * + * pp_cb is the callback cookie as passed into run_processes_parallel, + * pp_task_cb is the callback cookie as passed into get_next_task_fn. + * + * This function is incompatible with "ungroup" + */ +typedef void (*duplicate_output_fn)(struct strbuf *out, + size_t offset, + void *pp_cb, + void *pp_task_cb); + /** * This callback is called on every child process that finished processing. * @@ -461,6 +482,12 @@ struct run_process_parallel_opts */ start_failure_fn start_failure; + /** + * duplicate_output: See duplicate_output_fn() above. This should be + * NULL unless process specific output is needed + */ + duplicate_output_fn duplicate_output; + /** * task_finished: See task_finished_fn() above. This can be * NULL to omit any special handling. 
diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c index 3ecb830f4a..ffd3cd0045 100644 --- a/t/helper/test-run-command.c +++ b/t/helper/test-run-command.c @@ -52,6 +52,21 @@ static int no_job(struct child_process *cp, return 0; } +static void duplicate_output(struct strbuf *out, + size_t offset, + void *pp_cb UNUSED, + void *pp_task_cb UNUSED) +{ + struct string_list list = STRING_LIST_INIT_DUP; + + string_list_split(&list, out->buf + offset, '\n', -1); + for (size_t i = 0; i < list.nr; i++) { + if (strlen(list.items[i].string) > 0) + fprintf(stderr, "duplicate_output: %s\n", list.items[i].string); + } + string_list_clear(&list, 0); +} + static int task_finished(int result, struct strbuf *err, void *pp_cb, @@ -439,6 +454,12 @@ int cmd__run_command(int argc, const char **argv) opts.ungroup = 1; } + if (!strcmp(argv[1], "--duplicate-output")) { + argv += 1; + argc -= 1; + opts.duplicate_output = duplicate_output; + } + jobs = atoi(argv[2]); strvec_clear(&proc.args); strvec_pushv(&proc.args, (const char **)argv + 3); diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh index e2411f6a9b..879e536638 100755 --- a/t/t0061-run-command.sh +++ b/t/t0061-run-command.sh @@ -135,6 +135,15 @@ test_expect_success 'run_command runs in parallel with more jobs available than test_cmp expect actual ' +test_expect_success 'run_command runs in parallel with more jobs available than tasks --duplicate-output' ' + test-tool run-command --duplicate-output run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test 4 = $(grep -c "duplicate_output: Hello" err) && + test 4 = $(grep -c "duplicate_output: World" err) && + sed "/duplicate_output/d" err > err1 && + test_cmp expect err1 +' + test_expect_success 'run_command runs ungrouped in parallel with more jobs available than tasks' ' test-tool run-command --ungroup run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && test_line_count = 8 out 
&& @@ -147,6 +156,15 @@ test_expect_success 'run_command runs in parallel with as many jobs as tasks' ' test_cmp expect actual ' +test_expect_success 'run_command runs in parallel with as many jobs as tasks --duplicate-output' ' + test-tool run-command --duplicate-output run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test 4 = $(grep -c "duplicate_output: Hello" err) && + test 4 = $(grep -c "duplicate_output: World" err) && + sed "/duplicate_output/d" err > err1 && + test_cmp expect err1 +' + test_expect_success 'run_command runs ungrouped in parallel with as many jobs as tasks' ' test-tool run-command --ungroup run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && test_line_count = 8 out && @@ -159,6 +177,15 @@ test_expect_success 'run_command runs in parallel with more tasks than jobs avai test_cmp expect actual ' +test_expect_success 'run_command runs in parallel with more tasks than jobs available --duplicate-output' ' + test-tool run-command --duplicate-output run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test 4 = $(grep -c "duplicate_output: Hello" err) && + test 4 = $(grep -c "duplicate_output: World" err) && + sed "/duplicate_output/d" err > err1 && + test_cmp expect err1 +' + test_expect_success 'run_command runs ungrouped in parallel with more tasks than jobs available' ' test-tool run-command --ungroup run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && test_line_count = 8 out && @@ -180,6 +207,12 @@ test_expect_success 'run_command is asked to abort gracefully' ' test_cmp expect actual ' +test_expect_success 'run_command is asked to abort gracefully --duplicate-output' ' + test-tool run-command --duplicate-output run-command-abort 3 false >out 2>err && + test_must_be_empty out && + test_cmp expect err +' + test_expect_success 'run_command is asked to abort gracefully (ungroup)' ' 
test-tool run-command --ungroup run-command-abort 3 false >out 2>err && test_must_be_empty out && @@ -196,6 +229,12 @@ test_expect_success 'run_command outputs ' ' test_cmp expect actual ' +test_expect_success 'run_command outputs --duplicate-output' ' + test-tool run-command --duplicate-output run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && + test_must_be_empty out && + test_cmp expect err +' + test_expect_success 'run_command outputs (ungroup) ' ' test-tool run-command --ungroup run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err && test_must_be_empty out && -- 2.39.0.314.g84b9a713c41-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
* [PATCH v5 2/6] submodule: strbuf variable rename [not found] <https://lore.kernel.org/git/20221108184200.2813458-1-calvinwan@google.com/> 2023-01-04 21:54 ` [PATCH v5 0/6] submodule: parallelize diff Calvin Wan 2023-01-04 21:54 ` [PATCH v5 1/6] run-command: add duplicate_output_fn to run_processes_parallel_opts Calvin Wan @ 2023-01-04 21:54 ` Calvin Wan 2023-01-04 21:54 ` [PATCH v5 3/6] submodule: move status parsing into function Calvin Wan ` (4 subsequent siblings) 6 siblings, 0 replies; 86+ messages in thread From: Calvin Wan @ 2023-01-04 21:54 UTC (permalink / raw) To: git Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123, chooglen, newren, jonathantanmy A preparatory change for a future patch that moves the status parsing logic to a separate function. Signed-off-by: Calvin Wan <calvinwan@google.com> --- submodule.c | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/submodule.c b/submodule.c index fae24ef34a..faf37c1101 100644 --- a/submodule.c +++ b/submodule.c @@ -1906,25 +1906,28 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) fp = xfdopen(cp.out, "r"); while (strbuf_getwholeline(&buf, fp, '\n') != EOF) { + char *str = buf.buf; + const size_t len = buf.len; + /* regular untracked files */ - if (buf.buf[0] == '?') + if (str[0] == '?') dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - if (buf.buf[0] == 'u' || - buf.buf[0] == '1' || - buf.buf[0] == '2') { + if (str[0] == 'u' || + str[0] == '1' || + str[0] == '2') { /* T = line type, XY = status, SSSS = submodule state */ - if (buf.len < strlen("T XY SSSS")) + if (len < strlen("T XY SSSS")) BUG("invalid status --porcelain=2 line %s", - buf.buf); + str); - if (buf.buf[5] == 'S' && buf.buf[8] == 'U') + if (str[5] == 'S' && str[8] == 'U') /* nested untracked file */ dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - if (buf.buf[0] == 'u' || - buf.buf[0] == '2' || - memcmp(buf.buf + 5, "S..U", 4)) + if (str[0] == 'u' || + str[0] == '2' || + memcmp(str 
+ 5, "S..U", 4)) /* other change */ dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; } -- 2.39.0.314.g84b9a713c41-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
* [PATCH v5 3/6] submodule: move status parsing into function [not found] <https://lore.kernel.org/git/20221108184200.2813458-1-calvinwan@google.com/> ` (2 preceding siblings ...) 2023-01-04 21:54 ` [PATCH v5 2/6] submodule: strbuf variable rename Calvin Wan @ 2023-01-04 21:54 ` Calvin Wan 2023-01-04 21:54 ` [PATCH v5 4/6] diff-lib: refactor match_stat_with_submodule Calvin Wan ` (2 subsequent siblings) 6 siblings, 0 replies; 86+ messages in thread From: Calvin Wan @ 2023-01-04 21:54 UTC (permalink / raw) To: git Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123, chooglen, newren, jonathantanmy A future patch requires the ability to parse the output of git status --porcelain=2. Move parsing code from is_submodule_modified to parse_status_porcelain. Signed-off-by: Calvin Wan <calvinwan@google.com> --- submodule.c | 74 ++++++++++++++++++++++++++++++----------------------- 1 file changed, 42 insertions(+), 32 deletions(-) diff --git a/submodule.c b/submodule.c index faf37c1101..768d4b4cd7 100644 --- a/submodule.c +++ b/submodule.c @@ -1870,6 +1870,45 @@ int fetch_submodules(struct repository *r, return spf.result; } +static int parse_status_porcelain(char *str, size_t len, + unsigned *dirty_submodule, + int ignore_untracked) +{ + /* regular untracked files */ + if (str[0] == '?') + *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; + + if (str[0] == 'u' || + str[0] == '1' || + str[0] == '2') { + /* T = line type, XY = status, SSSS = submodule state */ + if (len < strlen("T XY SSSS")) + BUG("invalid status --porcelain=2 line %s", + str); + + if (str[5] == 'S' && str[8] == 'U') + /* nested untracked file */ + *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; + + if (str[0] == 'u' || + str[0] == '2' || + memcmp(str + 5, "S..U", 4)) + /* other change */ + *dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; + } + + if ((*dirty_submodule & DIRTY_SUBMODULE_MODIFIED) && + ((*dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) || + ignore_untracked)) { + /* + * We're not interested in any 
further information from + * the child any more, neither output nor its exit code. + */ + return 1; + } + return 0; +} + unsigned is_submodule_modified(const char *path, int ignore_untracked) { struct child_process cp = CHILD_PROCESS_INIT; @@ -1909,39 +1948,10 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) char *str = buf.buf; const size_t len = buf.len; - /* regular untracked files */ - if (str[0] == '?') - dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - - if (str[0] == 'u' || - str[0] == '1' || - str[0] == '2') { - /* T = line type, XY = status, SSSS = submodule state */ - if (len < strlen("T XY SSSS")) - BUG("invalid status --porcelain=2 line %s", - str); - - if (str[5] == 'S' && str[8] == 'U') - /* nested untracked file */ - dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - - if (str[0] == 'u' || - str[0] == '2' || - memcmp(str + 5, "S..U", 4)) - /* other change */ - dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; - } - - if ((dirty_submodule & DIRTY_SUBMODULE_MODIFIED) && - ((dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) || - ignore_untracked)) { - /* - * We're not interested in any further information from - * the child any more, neither output nor its exit code. - */ - ignore_cp_exit_code = 1; + ignore_cp_exit_code = parse_status_porcelain(str, len, &dirty_submodule, + ignore_untracked); + if (ignore_cp_exit_code) break; - } } fclose(fp); -- 2.39.0.314.g84b9a713c41-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
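For readers who want to try the classification logic outside of git, the following is a standalone sketch of what parse_status_porcelain does for a single line. It is an illustration only: `classify_line()` and the `SKETCH_DIRTY_*` constants are hypothetical names with assumed values, and the sketch returns -1 where the patch calls BUG().

```c
#include <string.h>

/* Local stand-ins for git's DIRTY_SUBMODULE_* flags (values assumed). */
#define SKETCH_DIRTY_UNTRACKED 1u
#define SKETCH_DIRTY_MODIFIED  2u

/*
 * Classify one "git status --porcelain=2" line, following the logic of
 * parse_status_porcelain in the patch. Returns -1 for a tracked-entry
 * line shorter than "T XY SSSS", 1 when no further line can change the
 * result, and 0 otherwise.
 */
static int classify_line(const char *str, unsigned *dirty, int ignore_untracked)
{
	/* regular untracked files */
	if (str[0] == '?')
		*dirty |= SKETCH_DIRTY_UNTRACKED;

	if (str[0] == 'u' || str[0] == '1' || str[0] == '2') {
		/* T = line type, XY = status, SSSS = submodule state */
		if (strlen(str) < strlen("T XY SSSS"))
			return -1;

		/* nested untracked file */
		if (str[5] == 'S' && str[8] == 'U')
			*dirty |= SKETCH_DIRTY_UNTRACKED;

		/* anything other than exactly "S..U" counts as a change */
		if (str[0] == 'u' || str[0] == '2' ||
		    memcmp(str + 5, "S..U", 4))
			*dirty |= SKETCH_DIRTY_MODIFIED;
	}

	if ((*dirty & SKETCH_DIRTY_MODIFIED) &&
	    ((*dirty & SKETCH_DIRTY_UNTRACKED) || ignore_untracked))
		return 1;
	return 0;
}
```

A caller would feed it one line at a time with a shared `dirty` accumulator and stop reading as soon as it returns 1, which corresponds to the early exit in is_submodule_modified.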
* [PATCH v5 4/6] diff-lib: refactor match_stat_with_submodule [not found] <https://lore.kernel.org/git/20221108184200.2813458-1-calvinwan@google.com/> ` (3 preceding siblings ...) 2023-01-04 21:54 ` [PATCH v5 3/6] submodule: move status parsing into function Calvin Wan @ 2023-01-04 21:54 ` Calvin Wan 2023-01-04 21:54 ` [PATCH v5 5/6] diff-lib: parallelize run_diff_files for submodules Calvin Wan 2023-01-04 21:54 ` [PATCH v5 6/6] submodule: call parallel code from serial status Calvin Wan 6 siblings, 0 replies; 86+ messages in thread From: Calvin Wan @ 2023-01-04 21:54 UTC (permalink / raw) To: git Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123, chooglen, newren, jonathantanmy Flatten out the if statements in match_stat_with_submodule so the logic is more readable and easier for future patches to add to. orig_flags didn't need to be set if the cache entry wasn't a GITLINK so defer setting it. Signed-off-by: Calvin Wan <calvinwan@google.com> --- diff-lib.c | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/diff-lib.c b/diff-lib.c index dec040c366..64583fded0 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -73,18 +73,24 @@ static int match_stat_with_submodule(struct diff_options *diffopt, unsigned *dirty_submodule) { int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option); - if (S_ISGITLINK(ce->ce_mode)) { - struct diff_flags orig_flags = diffopt->flags; - if (!diffopt->flags.override_submodule_config) - set_diffopt_flags_from_submodule_config(diffopt, ce->name); - if (diffopt->flags.ignore_submodules) - changed = 0; - else if (!diffopt->flags.ignore_dirty_submodules && - (!changed || diffopt->flags.dirty_submodules)) - *dirty_submodule = is_submodule_modified(ce->name, - diffopt->flags.ignore_untracked_in_submodules); - diffopt->flags = orig_flags; + struct diff_flags orig_flags; + + if (!S_ISGITLINK(ce->ce_mode)) + return changed; + + orig_flags = diffopt->flags; + if 
(!diffopt->flags.override_submodule_config) + set_diffopt_flags_from_submodule_config(diffopt, ce->name); + if (diffopt->flags.ignore_submodules) { + changed = 0; + goto cleanup; } + if (!diffopt->flags.ignore_dirty_submodules && + (!changed || diffopt->flags.dirty_submodules)) + *dirty_submodule = is_submodule_modified(ce->name, + diffopt->flags.ignore_untracked_in_submodules); +cleanup: + diffopt->flags = orig_flags; return changed; } -- 2.39.0.314.g84b9a713c41-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
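The shape of this refactor (early return for the non-submodule case, a single cleanup label that restores the saved flags) can be sketched on its own. This is a simplified illustration, not diff-lib.c: the `sketch_*` types, `apply_entry_config()` and `match_entry()` are hypothetical, the dirty check is reduced to a placeholder, and the real condition also considers flags.dirty_submodules.

```c
/* Hypothetical, trimmed-down stand-ins for diff_flags and diff_options. */
struct sketch_flags {
	unsigned ignore_submodules : 1;
	unsigned ignore_dirty_submodules : 1;
};

struct sketch_opts {
	struct sketch_flags flags;
};

/* Stand-in for set_diffopt_flags_from_submodule_config(). */
static void apply_entry_config(struct sketch_opts *opts, int entry_ignored)
{
	opts->flags.ignore_submodules = entry_ignored ? 1 : 0;
}

/*
 * Flattened control flow: return early for the common non-submodule
 * case, and route every submodule path through one label that restores
 * the caller's flags.
 */
static int match_entry(struct sketch_opts *opts, int is_gitlink,
		       int entry_ignored, int changed, unsigned *dirty)
{
	struct sketch_flags orig_flags;

	if (!is_gitlink)
		return changed;	/* flags untouched, nothing to restore */

	orig_flags = opts->flags;
	apply_entry_config(opts, entry_ignored);
	if (opts->flags.ignore_submodules) {
		changed = 0;
		goto cleanup;
	}
	if (!opts->flags.ignore_dirty_submodules && !changed)
		*dirty = 1;	/* placeholder for is_submodule_modified() */
cleanup:
	opts->flags = orig_flags;
	return changed;
}
```

Saving orig_flags only after the early return is the same deferral the commit message describes: the copy is needed only on paths that may modify the flags.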
* [PATCH v5 5/6] diff-lib: parallelize run_diff_files for submodules [not found] <https://lore.kernel.org/git/20221108184200.2813458-1-calvinwan@google.com/> ` (4 preceding siblings ...) 2023-01-04 21:54 ` [PATCH v5 4/6] diff-lib: refactor match_stat_with_submodule Calvin Wan @ 2023-01-04 21:54 ` Calvin Wan 2023-01-04 21:54 ` [PATCH v5 6/6] submodule: call parallel code from serial status Calvin Wan 6 siblings, 0 replies; 86+ messages in thread From: Calvin Wan @ 2023-01-04 21:54 UTC (permalink / raw) To: git Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123, chooglen, newren, jonathantanmy During the iteration of the index entries in run_diff_files, whenever a submodule is found and needs its status checked, a subprocess is spawned for it. Instead of spawning the subprocess immediately and waiting for its completion to continue, hold onto all submodules and relevant information in a list. Then use that list to create tasks for run_processes_parallel. Subprocess output is duplicated and passed to status_pipe_output which stores it to be parsed on completion of the subprocess. Add config option submodule.diffJobs to set the maximum number of parallel jobs. The option defaults to 1 if unset. If set to 0, the number of jobs is set to online_cpus(). Since run_diff_files is called from many different commands, I chose to grab the config option in the function rather than adding variables to every git command and then figuring out how to pass them all in. 
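The job-count rules described above (unset means 1, 0 means online_cpus(), negative is an error) can be written as a small pure function. This is a sketch under stated assumptions, not the code in the patch: `resolve_diff_jobs()` is a hypothetical helper, `is_set` models git_config_get_int() finding a value, `ncpus` stands in for the result of online_cpus(), and a return of -1 marks the case where the patch calls die().

```c
/*
 * Resolve the effective job count from an optional configured value.
 * Returns 0 on success and stores the result in *jobs, or -1 if the
 * configured value is negative.
 */
static int resolve_diff_jobs(int is_set, int configured, int ncpus, int *jobs)
{
	if (!is_set)
		*jobs = 1;		/* unset: keep the serial behaviour */
	else if (configured == 0)
		*jobs = ncpus;		/* 0: one job per online CPU */
	else if (configured < 0)
		return -1;		/* negative values are rejected */
	else
		*jobs = configured;
	return 0;
}
```

Keeping the decision in one place like this is one way to make the three cases easy to test without spawning any subprocesses.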
Signed-off-by: Calvin Wan <calvinwan@google.com> --- Documentation/config/submodule.txt | 12 +++ diff-lib.c | 84 +++++++++++++-- submodule.c | 168 +++++++++++++++++++++++++++++ submodule.h | 9 ++ t/t4027-diff-submodule.sh | 19 ++++ t/t7506-status-submodule.sh | 19 ++++ 6 files changed, 304 insertions(+), 7 deletions(-) diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt index 6490527b45..3209eb8117 100644 --- a/Documentation/config/submodule.txt +++ b/Documentation/config/submodule.txt @@ -93,6 +93,18 @@ submodule.fetchJobs:: in parallel. A value of 0 will give some reasonable default. If unset, it defaults to 1. +submodule.diffJobs:: + Specifies how many submodules are diffed at the same time. A + positive integer allows up to that number of submodules diffed + in parallel. A value of 0 will give some reasonable default. + If unset, it defaults to 1. The diff operation is used by many + other git commands such as add, merge, diff, status, stash and + more. Note that the expensive part of the diff operation is + reading the index from cache or memory. Therefore multiple jobs + may be detrimental to performance if your hardware does not + support parallel reads or if the number of jobs greatly exceeds + the amount of supported reads. + submodule.alternateLocation:: Specifies how the submodules obtain alternates when submodules are cloned. Possible values are `no`, `superproject`. diff --git a/diff-lib.c b/diff-lib.c index 64583fded0..f51ea07f36 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -14,6 +14,7 @@ #include "dir.h" #include "fsmonitor.h" #include "commit-reach.h" +#include "config.h" /* * diff-files @@ -65,18 +66,23 @@ static int check_removed(const struct index_state *istate, const struct cache_en * Return 1 when changes are detected, 0 otherwise. 
If the DIRTY_SUBMODULES * option is set, the caller does not only want to know if a submodule is * modified at all but wants to know all the conditions that are met (new - * commits, untracked content and/or modified content). + * commits, untracked content and/or modified content). If + * defer_submodule_status bit is set, dirty_submodule will be left to the + * caller to set. defer_submodule_status can also be set to 0 in this + * function if there is no need to check if the submodule is modified. */ static int match_stat_with_submodule(struct diff_options *diffopt, const struct cache_entry *ce, struct stat *st, unsigned ce_option, - unsigned *dirty_submodule) + unsigned *dirty_submodule, int *defer_submodule_status, + unsigned *ignore_untracked) { int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option); struct diff_flags orig_flags; + int defer = 0; if (!S_ISGITLINK(ce->ce_mode)) - return changed; + goto ret; orig_flags = diffopt->flags; if (!diffopt->flags.override_submodule_config) @@ -86,11 +92,20 @@ static int match_stat_with_submodule(struct diff_options *diffopt, goto cleanup; } if (!diffopt->flags.ignore_dirty_submodules && - (!changed || diffopt->flags.dirty_submodules)) - *dirty_submodule = is_submodule_modified(ce->name, + (!changed || diffopt->flags.dirty_submodules)) { + if (defer_submodule_status && *defer_submodule_status) { + defer = 1; + *ignore_untracked = diffopt->flags.ignore_untracked_in_submodules; + } else { + *dirty_submodule = is_submodule_modified(ce->name, diffopt->flags.ignore_untracked_in_submodules); + } + } cleanup: diffopt->flags = orig_flags; +ret: + if (defer_submodule_status) + *defer_submodule_status = defer; return changed; } @@ -102,6 +117,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option) ? 
CE_MATCH_RACY_IS_DIRTY : 0); uint64_t start = getnanotime(); struct index_state *istate = revs->diffopt.repo->index; + struct string_list submodules = STRING_LIST_INIT_NODUP; diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/"); @@ -226,6 +242,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option) newmode = ce->ce_mode; } else { struct stat st; + unsigned ignore_untracked = 0; + int defer_submodule_status = !!revs->repo; changed = check_removed(istate, ce, &st); if (changed) { @@ -247,8 +265,26 @@ int run_diff_files(struct rev_info *revs, unsigned int option) } changed = match_stat_with_submodule(&revs->diffopt, ce, &st, - ce_option, &dirty_submodule); + ce_option, &dirty_submodule, + &defer_submodule_status, + &ignore_untracked); newmode = ce_mode_from_stat(ce, st.st_mode); + if (defer_submodule_status) { + struct submodule_status_util tmp = { + .changed = changed, + .dirty_submodule = 0, + .ignore_untracked = ignore_untracked, + .newmode = newmode, + .ce = ce, + .path = ce->name, + }; + struct string_list_item *item; + + item = string_list_append(&submodules, ce->name); + item->util = xmalloc(sizeof(tmp)); + memcpy(item->util, &tmp, sizeof(tmp)); + continue; + } } if (!changed && !dirty_submodule) { @@ -267,6 +303,40 @@ int run_diff_files(struct rev_info *revs, unsigned int option) ce->name, 0, dirty_submodule); } + if (submodules.nr > 0) { + int parallel_jobs; + if (git_config_get_int("submodule.diffjobs", &parallel_jobs)) + parallel_jobs = 1; + else if (!parallel_jobs) + parallel_jobs = online_cpus(); + else if (parallel_jobs < 0) + die(_("submodule.diffjobs cannot be negative")); + + if (get_submodules_status(&submodules, parallel_jobs)) + die(_("submodule status failed")); + for (size_t i = 0; i < submodules.nr; i++) { + struct submodule_status_util *util = submodules.items[i].util; + struct cache_entry *ce = util->ce; + unsigned int oldmode; + const struct object_id *old_oid, *new_oid; + + if (!util->changed && !util->dirty_submodule) { +
ce_mark_uptodate(ce); + mark_fsmonitor_valid(istate, ce); + if (!revs->diffopt.flags.find_copies_harder) + continue; + } + oldmode = ce->ce_mode; + old_oid = &ce->oid; + new_oid = util->changed ? null_oid() : &ce->oid; + diff_change(&revs->diffopt, oldmode, util->newmode, + old_oid, new_oid, + !is_null_oid(old_oid), + !is_null_oid(new_oid), + ce->name, 0, util->dirty_submodule); + } + } + string_list_clear(&submodules, 1); diffcore_std(&revs->diffopt); diff_flush(&revs->diffopt); trace_performance_since(start, "diff-files"); @@ -314,7 +384,7 @@ static int get_stat_data(const struct index_state *istate, return -1; } changed = match_stat_with_submodule(diffopt, ce, &st, - 0, dirty_submodule); + 0, dirty_submodule, NULL, NULL); if (changed) { mode = ce_mode_from_stat(ce, st.st_mode); oid = null_oid(); diff --git a/submodule.c b/submodule.c index 768d4b4cd7..a0ca646d9b 100644 --- a/submodule.c +++ b/submodule.c @@ -1369,6 +1369,17 @@ int submodule_touches_in_range(struct repository *r, return ret; } +struct submodule_parallel_status { + size_t index_count; + int result; + + struct string_list *submodule_names; + + /* Pending statuses by OIDs */ + struct status_task **oid_status_tasks; + int oid_status_tasks_nr, oid_status_tasks_alloc; +}; + struct submodule_parallel_fetch { /* * The index of the last index entry processed by @@ -1451,6 +1462,12 @@ struct fetch_task { struct oid_array *commits; /* Ensure these commits are fetched */ }; +struct status_task { + const char *path; + struct strbuf out; + int ignore_untracked; +}; + /** * When a submodule is not defined in .gitmodules, we cannot access it * via the regular submodule-config. 
Create a fake submodule, which we can @@ -1909,6 +1926,25 @@ static int parse_status_porcelain(char *str, size_t len, return 0; } +static void parse_status_porcelain_strbuf(struct strbuf *buf, + unsigned *dirty_submodule, + int ignore_untracked) +{ + struct string_list list = STRING_LIST_INIT_DUP; + struct string_list_item *item; + + string_list_split(&list, buf->buf, '\n', -1); + + for_each_string_list_item(item, &list) { + if (parse_status_porcelain(item->string, + strlen(item->string), + dirty_submodule, + ignore_untracked)) + break; + } + string_list_clear(&list, 0); +} + unsigned is_submodule_modified(const char *path, int ignore_untracked) { struct child_process cp = CHILD_PROCESS_INIT; @@ -1962,6 +1998,138 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked) return dirty_submodule; } +static struct status_task * +get_status_task_from_index(struct submodule_parallel_status *sps, + struct strbuf *err) +{ + for (; sps->index_count < sps->submodule_names->nr; sps->index_count++) { + struct submodule_status_util *util = sps->submodule_names->items[sps->index_count].util; + struct status_task *task; + struct strbuf buf = STRBUF_INIT; + const char *git_dir; + + strbuf_addf(&buf, "%s/.git", util->path); + git_dir = read_gitfile(buf.buf); + if (!git_dir) + git_dir = buf.buf; + if (!is_git_directory(git_dir)) { + if (is_directory(git_dir)) + die(_("'%s' not recognized as a git repository"), git_dir); + strbuf_release(&buf); + /* The submodule is not checked out, so it is not modified */ + util->dirty_submodule = 0; + continue; + } + strbuf_release(&buf); + + task = xmalloc(sizeof(*task)); + task->path = util->path; + task->ignore_untracked = util->ignore_untracked; + strbuf_init(&task->out, 0); + sps->index_count++; + return task; + } + return NULL; +} + +static int get_next_submodule_status(struct child_process *cp, + struct strbuf *err, void *data, + void **task_cb) +{ + struct submodule_parallel_status *sps = data; + struct status_task *task = 
get_status_task_from_index(sps, err); + + if (!task) + return 0; + + child_process_init(cp); + prepare_submodule_repo_env_in_gitdir(&cp->env); + + strvec_init(&cp->args); + strvec_pushl(&cp->args, "status", "--porcelain=2", NULL); + if (task->ignore_untracked) + strvec_push(&cp->args, "-uno"); + + prepare_submodule_repo_env(&cp->env); + cp->git_cmd = 1; + cp->dir = task->path; + *task_cb = task; + return 1; +} + +static int status_start_failure(struct strbuf *err, + void *cb, void *task_cb) +{ + struct submodule_parallel_status *sps = cb; + struct status_task *task = task_cb; + + sps->result = 1; + strbuf_addf(err, + _("could not run 'git status --porcelain=2' in submodule %s"), + task->path); + return 0; +} + +static void status_duplicate_output(struct strbuf *out, + size_t offset, + void *cb, void *task_cb) +{ + struct status_task *task = task_cb; + + strbuf_add(&task->out, out->buf + offset, out->len - offset); + strbuf_setlen(out, offset); +} + +static int status_finish(int retvalue, struct strbuf *err, + void *cb, void *task_cb) +{ + struct submodule_parallel_status *sps = cb; + struct status_task *task = task_cb; + struct string_list_item *it = + string_list_lookup(sps->submodule_names, task->path); + struct submodule_status_util *util = it->util; + + if (retvalue) { + sps->result = 1; + strbuf_addf(err, + _("'git status --porcelain=2' failed in submodule %s"), + task->path); + } + + parse_status_porcelain_strbuf(&task->out, + &util->dirty_submodule, + util->ignore_untracked); + + free(task); + + return 0; +} + +int get_submodules_status(struct string_list *submodules, + int max_parallel_jobs) +{ + struct submodule_parallel_status sps = { + .submodule_names = submodules, + }; + const struct run_process_parallel_opts opts = { + .tr2_category = "submodule", + .tr2_label = "parallel/status", + + .processes = max_parallel_jobs, + + .get_next_task = get_next_submodule_status, + .start_failure = status_start_failure, + .duplicate_output = status_duplicate_output, + 
.task_finished = status_finish, + .data = &sps, + }; + + string_list_sort(sps.submodule_names); + run_processes_parallel(&opts); + + return sps.result; +} + int submodule_uses_gitfile(const char *path) { struct child_process cp = CHILD_PROCESS_INIT; diff --git a/submodule.h b/submodule.h index b52a4ff1e7..08d278a414 100644 --- a/submodule.h +++ b/submodule.h @@ -41,6 +41,13 @@ struct submodule_update_strategy { .type = SM_UPDATE_UNSPECIFIED, \ } +struct submodule_status_util { + int changed, ignore_untracked; + unsigned dirty_submodule, newmode; + struct cache_entry *ce; + const char *path; +}; + int is_gitmodules_unmerged(struct index_state *istate); int is_writing_gitmodules_ok(void); int is_staging_gitmodules_ok(struct index_state *istate); @@ -94,6 +101,8 @@ int fetch_submodules(struct repository *r, int command_line_option, int default_option, int quiet, int max_parallel_jobs); +int get_submodules_status(struct string_list *submodules, + int max_parallel_jobs); unsigned is_submodule_modified(const char *path, int ignore_untracked); int submodule_uses_gitfile(const char *path); diff --git a/t/t4027-diff-submodule.sh b/t/t4027-diff-submodule.sh index 40164ae07d..e08ee315a7 100755 --- a/t/t4027-diff-submodule.sh +++ b/t/t4027-diff-submodule.sh @@ -34,6 +34,25 @@ test_expect_success setup ' subtip=$3 subprev=$2 ' +test_expect_success 'diff in superproject with submodules respects parallel settings' ' + test_when_finished "rm -f trace.out" && + ( + GIT_TRACE=$(pwd)/trace.out git diff && + grep "1 tasks" trace.out && + >trace.out && + + git config submodule.diffJobs 8 && + GIT_TRACE=$(pwd)/trace.out git diff && + grep "8 tasks" trace.out && + >trace.out && + + GIT_TRACE=$(pwd)/trace.out git -c submodule.diffJobs=0 diff && + grep "preparing to run up to [0-9]* tasks" trace.out && + ! 
grep "up to 0 tasks" trace.out && + >trace.out + ) +' + test_expect_success 'git diff --raw HEAD' ' hexsz=$(test_oid hexsz) && git diff --raw --abbrev=$hexsz HEAD >actual && diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh index d050091345..52a82b703f 100755 --- a/t/t7506-status-submodule.sh +++ b/t/t7506-status-submodule.sh @@ -412,4 +412,23 @@ test_expect_success 'status with added file in nested submodule (short)' ' EOF ' +test_expect_success 'status in superproject with submodules respects parallel settings' ' + test_when_finished "rm -f trace.out" && + ( + GIT_TRACE=$(pwd)/trace.out git status && + grep "1 tasks" trace.out && + >trace.out && + + git config submodule.diffJobs 8 && + GIT_TRACE=$(pwd)/trace.out git status && + grep "8 tasks" trace.out && + >trace.out && + + GIT_TRACE=$(pwd)/trace.out git -c submodule.diffJobs=0 status && + grep "preparing to run up to [0-9]* tasks" trace.out && + ! grep "up to 0 tasks" trace.out && + >trace.out + ) +' + test_done -- 2.39.0.314.g84b9a713c41-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
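For readers skimming the diff above, the way run_diff_files() interprets submodule.diffJobs can be sketched in isolation. This is only an illustration, not code from the patch: resolve_diff_jobs() and its parameters are hypothetical names, ncpus stands in for the result of online_cpus(), and the -1 return marks the case where the patch calls die().

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch) mirroring how the patch
 * turns the submodule.diffJobs setting into a job count.
 *
 *   is_set - nonzero when the config key is present
 *   value  - the configured integer (ignored when !is_set)
 *   ncpus  - stand-in for what online_cpus() would report
 *
 * Returns the number of parallel jobs, or -1 where the patch would
 * die() with "submodule.diffjobs cannot be negative".
 */
static int resolve_diff_jobs(int is_set, int value, int ncpus)
{
	if (!is_set)
		return 1;	/* unset: stay serial */
	if (value == 0)
		return ncpus;	/* 0: "some reasonable default" */
	if (value < 0)
		return -1;	/* negative values are rejected */
	return value;		/* positive: use as given */
}
```

With these assumptions, an unset key yields 1 job, which is what the first "1 tasks" grep in the new tests relies on, and a value of 8 yields 8 regardless of the CPU count.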
* [PATCH v5 6/6] submodule: call parallel code from serial status [not found] <https://lore.kernel.org/git/20221108184200.2813458-1-calvinwan@google.com/> ` (5 preceding siblings ...) 2023-01-04 21:54 ` [PATCH v5 5/6] diff-lib: parallelize run_diff_files for submodules Calvin Wan @ 2023-01-04 21:54 ` Calvin Wan 6 siblings, 0 replies; 86+ messages in thread From: Calvin Wan @ 2023-01-04 21:54 UTC (permalink / raw) To: git Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123, chooglen, newren, jonathantanmy Remove the serial implementation of status inside of is_submodule_modified since the parallel implementation of status with one job accomplishes the same task. Combine parse_status_porcelain and parse_status_porcelain_strbuf since the only other caller of parse_status_porcelain was in is_submodule_modified. Signed-off-by: Calvin Wan <calvinwan@google.com> --- submodule.c | 143 ++++++++++++++++++---------------------------------- 1 file changed, 48 insertions(+), 95 deletions(-) diff --git a/submodule.c b/submodule.c index a0ca646d9b..042e26137f 100644 --- a/submodule.c +++ b/submodule.c @@ -1887,46 +1887,7 @@ int fetch_submodules(struct repository *r, return spf.result; } -static int parse_status_porcelain(char *str, size_t len, - unsigned *dirty_submodule, - int ignore_untracked) -{ - /* regular untracked files */ - if (str[0] == '?') - *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - - if (str[0] == 'u' || - str[0] == '1' || - str[0] == '2') { - /* T = line type, XY = status, SSSS = submodule state */ - if (len < strlen("T XY SSSS")) - BUG("invalid status --porcelain=2 line %s", - str); - - if (str[5] == 'S' && str[8] == 'U') - /* nested untracked file */ - *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; - - if (str[0] == 'u' || - str[0] == '2' || - memcmp(str + 5, "S..U", 4)) - /* other change */ - *dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; - } - - if ((*dirty_submodule & DIRTY_SUBMODULE_MODIFIED) && - ((*dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) || -
ignore_untracked)) { - /* - * We're not interested in any further information from - * the child any more, neither output nor its exit code. - */ - return 1; - } - return 0; -} - -static void parse_status_porcelain_strbuf(struct strbuf *buf, +static void parse_status_porcelain(struct strbuf *buf, unsigned *dirty_submodule, int ignore_untracked) { @@ -1936,66 +1897,58 @@ static void parse_status_porcelain_strbuf(struct strbuf *buf, string_list_split(&list, buf->buf, '\n', -1); for_each_string_list_item(item, &list) { - if (parse_status_porcelain(item->string, - strlen(item->string), - dirty_submodule, - ignore_untracked)) + char *str = item->string; + /* regular untracked files */ + if (str[0] == '?') + *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; + + if (str[0] == 'u' || + str[0] == '1' || + str[0] == '2') { + /* T = line type, XY = status, SSSS = submodule state */ + if (strlen(str) < strlen("T XY SSSS")) + BUG("invalid status --porcelain=2 line %s", + str); + + if (str[5] == 'S' && str[8] == 'U') + /* nested untracked file */ + *dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED; + + if (str[0] == 'u' || + str[0] == '2' || + memcmp(str + 5, "S..U", 4)) + /* other change */ + *dirty_submodule |= DIRTY_SUBMODULE_MODIFIED; + } + + if ((*dirty_submodule & DIRTY_SUBMODULE_MODIFIED) && + ((*dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) || + ignore_untracked)) { + /* + * We're not interested in any further information from + * the child any more, neither output nor its exit code. 
+ */ break; + } } string_list_clear(&list, 0); } unsigned is_submodule_modified(const char *path, int ignore_untracked) { - struct child_process cp = CHILD_PROCESS_INIT; - struct strbuf buf = STRBUF_INIT; - FILE *fp; - unsigned dirty_submodule = 0; - const char *git_dir; - int ignore_cp_exit_code = 0; - - strbuf_addf(&buf, "%s/.git", path); - git_dir = read_gitfile(buf.buf); - if (!git_dir) - git_dir = buf.buf; - if (!is_git_directory(git_dir)) { - if (is_directory(git_dir)) - die(_("'%s' not recognized as a git repository"), git_dir); - strbuf_release(&buf); - /* The submodule is not checked out, so it is not modified */ - return 0; - } - strbuf_reset(&buf); - - strvec_pushl(&cp.args, "status", "--porcelain=2", NULL); - if (ignore_untracked) - strvec_push(&cp.args, "-uno"); - - prepare_submodule_repo_env(&cp.env); - cp.git_cmd = 1; - cp.no_stdin = 1; - cp.out = -1; - cp.dir = path; - if (start_command(&cp)) - die(_("Could not run 'git status --porcelain=2' in submodule %s"), path); - - fp = xfdopen(cp.out, "r"); - while (strbuf_getwholeline(&buf, fp, '\n') != EOF) { - char *str = buf.buf; - const size_t len = buf.len; - - ignore_cp_exit_code = parse_status_porcelain(str, len, &dirty_submodule, - ignore_untracked); - if (ignore_cp_exit_code) - break; - } - fclose(fp); - - if (finish_command(&cp) && !ignore_cp_exit_code) - die(_("'git status --porcelain=2' failed in submodule %s"), path); + struct submodule_status_util util = { + .dirty_submodule = 0, + .ignore_untracked = ignore_untracked, + .path = path, + }; + struct string_list sub = STRING_LIST_INIT_NODUP; + struct string_list_item *item; - strbuf_release(&buf); - return dirty_submodule; + item = string_list_append(&sub, path); + item->util = &util; + if (get_submodules_status(&sub, 1)) + die(_("submodule status failed")); + return util.dirty_submodule; } static struct status_task * @@ -2096,9 +2049,9 @@ static int status_finish(int retvalue, struct strbuf *err, task->path); } - 
parse_status_porcelain_strbuf(&task->out, - &util->dirty_submodule, - util->ignore_untracked); + parse_status_porcelain(&task->out, + &util->dirty_submodule, + util->ignore_untracked); free(task); -- 2.39.0.314.g84b9a713c41-goog ^ permalink raw reply related [flat|nested] 86+ messages in thread
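The combined parse_status_porcelain() above is easier to follow outside of diff form. The following is a rough standalone sketch of the same classification, under a few stated assumptions: classify_status_line() and classify_status_output() are hypothetical names, the two flag values are assumed here for illustration (git defines DIRTY_SUBMODULE_UNTRACKED and DIRTY_SUBMODULE_MODIFIED in its own headers), a plain C string replaces the strbuf and string_list, and a malformed line returns -1 where the patch calls BUG().

```c
#include <string.h>

/* Assumed values for illustration; git defines these in its own headers. */
#define DIRTY_SUBMODULE_UNTRACKED 1
#define DIRTY_SUBMODULE_MODIFIED  2

/*
 * Classify one "git status --porcelain=2" line (hypothetical helper).
 * Returns 1 once no later line can change the result, 0 to keep
 * going, and -1 for a line too short to hold "T XY SSSS".
 */
static int classify_status_line(const char *str, unsigned *dirty,
				int ignore_untracked)
{
	/* regular untracked files */
	if (str[0] == '?')
		*dirty |= DIRTY_SUBMODULE_UNTRACKED;

	if (str[0] == 'u' || str[0] == '1' || str[0] == '2') {
		/* T = line type, XY = status, SSSS = submodule state */
		if (strlen(str) < strlen("T XY SSSS"))
			return -1;
		/* nested untracked file */
		if (str[5] == 'S' && str[8] == 'U')
			*dirty |= DIRTY_SUBMODULE_UNTRACKED;
		/* anything other than "only untracked content in a nested submodule" */
		if (str[0] == 'u' || str[0] == '2' ||
		    memcmp(str + 5, "S..U", 4))
			*dirty |= DIRTY_SUBMODULE_MODIFIED;
	}

	return (*dirty & DIRTY_SUBMODULE_MODIFIED) &&
	       ((*dirty & DIRTY_SUBMODULE_UNTRACKED) || ignore_untracked);
}

/* Walk the complete, newline-separated output and accumulate flags. */
static unsigned classify_status_output(const char *out, int ignore_untracked)
{
	unsigned dirty = 0;

	while (*out) {
		size_t n = strcspn(out, "\n");
		char line[512];
		size_t copy = n < sizeof(line) - 1 ? n : sizeof(line) - 1;

		memcpy(line, out, copy);
		line[copy] = '\0';
		/* stop on early exit (1) or on a malformed line (-1) */
		if (copy && classify_status_line(line, &dirty, ignore_untracked))
			break;
		out += n;
		if (*out == '\n')
			out++;
	}
	return dirty;
}
```

The sketch takes the whole output at once, in the same spirit as status_finish() parsing only after the child has exited: a line split across two reads would otherwise be misclassified or rejected as too short.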