git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/4] submodule: parallelize status
@ 2022-09-22 23:29 Calvin Wan
  2022-09-22 23:29 ` [PATCH 1/4] run-command: add pipe_output to run_processes_parallel Calvin Wan
                   ` (4 more replies)
  0 siblings, 5 replies; 31+ messages in thread
From: Calvin Wan @ 2022-09-22 23:29 UTC (permalink / raw)
  To: git; +Cc: emilyshaffer, Calvin Wan

When running 'git status' in a superproject, git spawns a subprocess in
series to run status for every submodule. For projects with many large
submodules, parallelizing status subprocesses can significantly speed
up the runtime for both cold and warm caches.

Here are some timing tests from running status on the Android Open
Source Project (AOSP). My machine has an SSD and 48 cores. 
  Warm Cache:
    git 2.37
Time (mean ± σ):     17.685 s ±  2.040 s    [User: 5.041 s, System: 22.799 s]
Range (min … max):   16.168 s … 22.804 s    10 runs

    this patch (status.parallelSubmodules=1)
Time (mean ± σ):     13.102 s ±  0.500 s    [User: 4.894 s, System: 19.533 s]
Range (min … max):   12.841 s … 14.447 s    10 runs

    this patch (status.parallelSubmodules=5)
Time (mean ± σ):      3.994 s ±  0.152 s    [User: 4.998 s, System: 20.805 s]
Range (min … max):    3.744 s …  4.163 s    10 runs

    this patch (status.parallelSubmodules=10)
Time (mean ± σ):      3.445 s ±  0.085 s    [User: 5.151 s, System: 20.208 s]
Range (min … max):    3.319 s …  3.586 s    10 runs

    this patch (status.parallelSubmodules=20)
Time (mean ± σ):      3.626 s ±  0.109 s    [User: 5.087 s, System: 20.366 s]
Range (min … max):    3.438 s …  3.763 s    10 runs

We can see that there are diminishing returns and even slightly worse
performance after a certain number of max processes, but optimally
there is a speed up factor of around 5.

  Cold Cache:
    git 2.37
      mean of 3 runs: 6m32s
    this patch (status.parallelSubmodules=1)
      mean of 3 runs: 5m34s
    this patch (status.parallelSubmodules=5)
      mean of 3 runs: 2m23s
    this patch (status.parallelSubmodules=10)
      mean of 3 runs: 2m45s
    this patch (status.parallelSubmodules=20)
      mean of 3 runs: 3m23s

We can witness the same phenomenon as above and optimally there is a
speed up factor of around 2.7.

Patch 1 adds output piping to run_processes_parallel so the output
from each submodule can be parsed. Patches 2 and 3 move preexisting
functionality into separate functions and refactor code to prepare
for patch 4 to implement parallelization.

Future work: The reason why status is much slower on a cold cache vs
warm cache is mainly due to refreshing the index. It is worth
investigating whether this is entirely necessary.

Calvin Wan (4):
  run-command: add pipe_output to run_processes_parallel
  submodule: move status parsing into function
  diff-lib: refactor functions
  diff-lib: parallelize run_diff_files for submodules

 Documentation/config/status.txt |   6 +
 add-interactive.c               |   2 +-
 builtin/add.c                   |   4 +-
 builtin/commit.c                |   6 +
 builtin/diff-files.c            |   2 +-
 builtin/diff.c                  |   2 +-
 builtin/merge.c                 |   2 +-
 builtin/stash.c                 |   2 +-
 builtin/submodule--helper.c     |   4 +-
 diff-lib.c                      | 120 +++++++++++++-----
 diff.h                          |   2 +-
 run-command.c                   |   6 +-
 run-command.h                   |   9 ++
 submodule.c                     | 213 +++++++++++++++++++++++++++-----
 submodule.h                     |   9 ++
 t/helper/test-run-command.c     |  31 ++++-
 t/t0061-run-command.sh          |  26 ++++
 wt-status.c                     |   6 +-
 wt-status.h                     |   1 +
 19 files changed, 372 insertions(+), 81 deletions(-)


base-commit: 5502f77b6944eda8e26813d8f542cffe7d110aea
-- 
2.37.3.998.g577e59143f-goog


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 1/4] run-command: add pipe_output to run_processes_parallel
  2022-09-22 23:29 [PATCH 0/4] submodule: parallelize status Calvin Wan
@ 2022-09-22 23:29 ` Calvin Wan
  2022-09-23  7:52   ` Ævar Arnfjörð Bjarmason
  2022-09-23 18:58   ` Junio C Hamano
  2022-09-22 23:29 ` [PATCH 2/4] submodule: move status parsing into function Calvin Wan
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 31+ messages in thread
From: Calvin Wan @ 2022-09-22 23:29 UTC (permalink / raw)
  To: git; +Cc: emilyshaffer, Calvin Wan

run_processes_parallel periodically collects output from its child
processes, prints it, and then resets the buffers for each child.
Add run_processes_parallel_pipe_output variable so output can be
collected and fed to task_finished. When set, the function referenced
by task_finished should parse the output of each child process.

Signed-off-by: Calvin Wan <calvinwan@google.com>
---
 run-command.c               |  6 +++++-
 run-command.h               |  9 +++++++++
 t/helper/test-run-command.c | 31 ++++++++++++++++++++++++++++---
 t/t0061-run-command.sh      | 26 ++++++++++++++++++++++++++
 4 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/run-command.c b/run-command.c
index 14f17830f5..893bc1d294 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1474,6 +1474,7 @@ enum child_state {
 };
 
 int run_processes_parallel_ungroup;
+int run_processes_parallel_pipe_output;
 struct parallel_processes {
 	void *data;
 
@@ -1770,10 +1771,12 @@ int run_processes_parallel(int n,
 	int output_timeout = 100;
 	int spawn_cap = 4;
 	int ungroup = run_processes_parallel_ungroup;
+	int pipe_output = run_processes_parallel_pipe_output;
 	struct parallel_processes pp;
 
 	/* unset for the next API user */
 	run_processes_parallel_ungroup = 0;
+	run_processes_parallel_pipe_output = 0;
 
 	pp_init(&pp, n, get_next_task, start_failure, task_finished, pp_cb,
 		ungroup);
@@ -1800,7 +1803,8 @@ int run_processes_parallel(int n,
 				pp.children[i].state = GIT_CP_WAIT_CLEANUP;
 		} else {
 			pp_buffer_stderr(&pp, output_timeout);
-			pp_output(&pp);
+			if (!pipe_output)
+				pp_output(&pp);
 		}
 		code = pp_collect_finished(&pp);
 		if (code) {
diff --git a/run-command.h b/run-command.h
index 0e85e5846a..a5b1d63f49 100644
--- a/run-command.h
+++ b/run-command.h
@@ -483,8 +483,17 @@ typedef int (*task_finished_fn)(int result,
  * "run_processes_parallel_ungroup" to "1" before invoking
  * run_processes_parallel(), it will be set back to "0" as soon as the
  * API reads that setting.
+ * 
+ * If the "pipe_output" option is specified, the output will be piped
+ * to task_finished_fn in the "struct strbuf *out" variable. The output
+ * will still be printed unless the callback resets the strbuf. The
+ * "pipe_output" option can be enabled by setting the global
+ * "run_processes_parallel_pipe_output" to "1" before invoking
+ * run_processes_parallel(), it will be set back to "0" as soon as the
+ * API reads that setting.
  */
 extern int run_processes_parallel_ungroup;
+extern int run_processes_parallel_pipe_output;
 int run_processes_parallel(int n,
 			   get_next_task_fn,
 			   start_failure_fn,
diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c
index c9283b47af..030e533c6b 100644
--- a/t/helper/test-run-command.c
+++ b/t/helper/test-run-command.c
@@ -20,6 +20,7 @@
 #include "wildmatch.h"
 #include "gettext.h"
 
+static int pipe_output = 0;
 static int number_callbacks;
 static int parallel_next(struct child_process *cp,
 			 struct strbuf *err,
@@ -52,15 +53,32 @@ static int no_job(struct child_process *cp,
 	return 0;
 }
 
+static int task_finished_pipe_output(int result,
+			 struct strbuf *err,
+			 void *pp_cb,
+			 void *pp_task_cb)
+{
+	if (err && pipe_output) {
+		fprintf(stderr, "%s", err->buf);
+		strbuf_reset(err);
+	}
+	return 0;
+}
+
 static int task_finished(int result,
 			 struct strbuf *err,
 			 void *pp_cb,
 			 void *pp_task_cb)
 {
-	if (err)
+	if (err) {
 		strbuf_addstr(err, "asking for a quick stop\n");
-	else
+		if (pipe_output) {
+			fprintf(stderr, "%s", err->buf);
+			strbuf_reset(err);
+		}
+	} else {
 		fprintf(stderr, "asking for a quick stop\n");
+	}
 	return 1;
 }
 
@@ -423,13 +441,20 @@ int cmd__run_command(int argc, const char **argv)
 		run_processes_parallel_ungroup = 1;
 	}
 
+	if (!strcmp(argv[1], "--pipe-output")) {
+		argv += 1;
+		argc -= 1;
+		run_processes_parallel_pipe_output = 1;
+		pipe_output = 1;
+	}
+
 	jobs = atoi(argv[2]);
 	strvec_clear(&proc.args);
 	strvec_pushv(&proc.args, (const char **)argv + 3);
 
 	if (!strcmp(argv[1], "run-command-parallel"))
 		exit(run_processes_parallel(jobs, parallel_next,
-					    NULL, NULL, &proc));
+					    NULL, task_finished_pipe_output, &proc));
 
 	if (!strcmp(argv[1], "run-command-abort"))
 		exit(run_processes_parallel(jobs, parallel_next,
diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
index 7b5423eebd..97ca942a74 100755
--- a/t/t0061-run-command.sh
+++ b/t/t0061-run-command.sh
@@ -140,6 +140,11 @@ test_expect_success 'run_command runs ungrouped in parallel with more jobs avail
 	test_line_count = 4 err
 '
 
+test_expect_success 'run_command runs pipe_output in parallel with more jobs available than tasks' '
+	test-tool run-command --pipe-output run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'run_command runs in parallel with as many jobs as tasks' '
 	test-tool run-command run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
 	test_cmp expect actual
@@ -151,6 +156,11 @@ test_expect_success 'run_command runs ungrouped in parallel with as many jobs as
 	test_line_count = 4 err
 '
 
+test_expect_success 'run_command runs pipe_output in parallel with as many jobs as tasks' '
+	test-tool run-command --pipe-output run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'run_command runs in parallel with more tasks than jobs available' '
 	test-tool run-command run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
 	test_cmp expect actual
@@ -162,6 +172,12 @@ test_expect_success 'run_command runs ungrouped in parallel with more tasks than
 	test_line_count = 4 err
 '
 
+test_expect_success 'run_command runs pipe_output in parallel with more tasks than jobs available' '
+	test-tool run-command --pipe-output run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
+
 cat >expect <<-EOF
 preloaded output of a child
 asking for a quick stop
@@ -182,6 +198,11 @@ test_expect_success 'run_command is asked to abort gracefully (ungroup)' '
 	test_line_count = 6 err
 '
 
+test_expect_success 'run_command is asked to abort gracefully (pipe_output)' '
+	test-tool run-command --pipe-output run-command-abort 3 false 2>actual &&
+	test_cmp expect actual
+'
+
 cat >expect <<-EOF
 no further jobs available
 EOF
@@ -197,6 +218,11 @@ test_expect_success 'run_command outputs (ungroup) ' '
 	test_cmp expect err
 '
 
+test_expect_success 'run_command outputs (pipe_output) ' '
+	test-tool run-command --pipe-output run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
 test_trace () {
 	expect="$1"
 	shift
-- 
@@ -140,6 +140,11 @@ test_expect_success 'run_command runs ungrouped in parallel with more jobs avail
 	test_line_count = 4 err
 '
 
+test_expect_success 'run_command runs pipe_output in parallel with more jobs available than tasks' '
+	test-tool run-command --pipe-output run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'run_command runs in parallel with as many jobs as tasks' '
 	test-tool run-command run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
 	test_cmp expect actual
@@ -151,6 +156,11 @@ test_expect_success 'run_command runs ungrouped in parallel with as many jobs as
 	test_line_count = 4 err
 '
 
+test_expect_success 'run_command runs pipe_output in parallel with as many jobs as tasks' '
+	test-tool run-command --pipe-output run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'run_command runs in parallel with more tasks than jobs available' '
 	test-tool run-command run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
 	test_cmp expect actual
@@ -162,6 +172,12 @@ test_expect_success 'run_command runs ungrouped in parallel with more tasks than
 	test_line_count = 4 err
 '
 
+test_expect_success 'run_command runs pipe_output in parallel with more tasks than jobs available' '
+	test-tool run-command --pipe-output run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
+
 cat >expect <<-EOF
 preloaded output of a child
 asking for a quick stop
@@ -182,6 +198,11 @@ test_expect_success 'run_command is asked to abort gracefully (ungroup)' '
 	test_line_count = 6 err
 '
 
+test_expect_success 'run_command is asked to abort gracefully (pipe_output)' '
+	test-tool run-command --pipe-output run-command-abort 3 false 2>actual &&
+	test_cmp expect actual
+'
+
 cat >expect <<-EOF
 no further jobs available
 EOF
@@ -197,6 +218,11 @@ test_expect_success 'run_command outputs (ungroup) ' '
 	test_cmp expect err
 '
 
+test_expect_success 'run_command outputs (pipe_output) ' '
+	test-tool run-command --pipe-output run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
 test_trace () {
 	expect="$1"
 	shift
-- 
2.37.3.998.g577e59143f-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 2/4] submodule: move status parsing into function
  2022-09-22 23:29 [PATCH 0/4] submodule: parallelize status Calvin Wan
  2022-09-22 23:29 ` [PATCH 1/4] run-command: add pipe_output to run_processes_parallel Calvin Wan
@ 2022-09-22 23:29 ` Calvin Wan
  2022-09-22 23:29 ` [PATCH 3/4] diff-lib: refactor functions Calvin Wan
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 31+ messages in thread
From: Calvin Wan @ 2022-09-22 23:29 UTC (permalink / raw)
  To: git; +Cc: emilyshaffer, Calvin Wan

A future patch requires the ability to parse the output of git
status --porcelain=2. Move parsing code from is_submodule_modified to
parse_status_porcelain.

Signed-off-by: Calvin Wan <calvinwan@google.com>
---
 submodule.c | 71 +++++++++++++++++++++++++++++------------------------
 1 file changed, 39 insertions(+), 32 deletions(-)

diff --git a/submodule.c b/submodule.c
index 3fa5db3ecd..91213ba83c 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1753,6 +1753,43 @@ static int commit_missing_in_sub(const struct object_id *oid, void *data)
 	return type != OBJ_COMMIT;
 }
 
+static int parse_status_porcelain(char *buf, unsigned *dirty_submodule, int ignore_untracked)
+{
+	/* regular untracked files */
+	if (buf[0] == '?')
+		*dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
+
+	if (buf[0] == 'u' ||
+		buf[0] == '1' ||
+		buf[0] == '2') {
+		/* T = line type, XY = status, SSSS = submodule state */
+		if (strlen(buf) < strlen("T XY SSSS"))
+			BUG("invalid status --porcelain=2 line %s",
+				buf);
+
+		if (buf[5] == 'S' && buf[8] == 'U')
+			/* nested untracked file */
+			*dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
+
+		if (buf[0] == 'u' ||
+			buf[0] == '2' ||
+			memcmp(buf + 5, "S..U", 4))
+			/* other change */
+			*dirty_submodule |= DIRTY_SUBMODULE_MODIFIED;
+	}
+
+	if ((*dirty_submodule & DIRTY_SUBMODULE_MODIFIED) &&
+		((*dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) ||
+		ignore_untracked)) {
+		/*
+		* We're not interested in any further information from
+		* the child any more, neither output nor its exit code.
+		*/
+		return 1;
+	}
+	return 0;
+}
+
 static int fetch_finish(int retvalue, struct strbuf *err,
 			void *cb, void *task_cb)
 {
@@ -1893,39 +1930,9 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked)
 
 	fp = xfdopen(cp.out, "r");
 	while (strbuf_getwholeline(&buf, fp, '\n') != EOF) {
-		/* regular untracked files */
-		if (buf.buf[0] == '?')
-			dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
-
-		if (buf.buf[0] == 'u' ||
-		    buf.buf[0] == '1' ||
-		    buf.buf[0] == '2') {
-			/* T = line type, XY = status, SSSS = submodule state */
-			if (buf.len < strlen("T XY SSSS"))
-				BUG("invalid status --porcelain=2 line %s",
-				    buf.buf);
-
-			if (buf.buf[5] == 'S' && buf.buf[8] == 'U')
-				/* nested untracked file */
-				dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
-
-			if (buf.buf[0] == 'u' ||
-			    buf.buf[0] == '2' ||
-			    memcmp(buf.buf + 5, "S..U", 4))
-				/* other change */
-				dirty_submodule |= DIRTY_SUBMODULE_MODIFIED;
-		}
-
-		if ((dirty_submodule & DIRTY_SUBMODULE_MODIFIED) &&
-		    ((dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) ||
-		     ignore_untracked)) {
-			/*
-			 * We're not interested in any further information from
-			 * the child any more, neither output nor its exit code.
-			 */
-			ignore_cp_exit_code = 1;
+		ignore_cp_exit_code = parse_status_porcelain(buf.buf, &dirty_submodule, ignore_untracked);
+		if (ignore_cp_exit_code)
 			break;
-		}
 	}
 	fclose(fp);
 
-- 
@@ -1753,6 +1753,43 @@ static int commit_missing_in_sub(const struct object_id *oid, void *data)
 	return type != OBJ_COMMIT;
 }
 
+static int parse_status_porcelain(char *buf, unsigned *dirty_submodule, int ignore_untracked)
+{
+	/* regular untracked files */
+	if (buf[0] == '?')
+		*dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
+
+	if (buf[0] == 'u' ||
+		buf[0] == '1' ||
+		buf[0] == '2') {
+		/* T = line type, XY = status, SSSS = submodule state */
+		if (strlen(buf) < strlen("T XY SSSS"))
+			BUG("invalid status --porcelain=2 line %s",
+				buf);
+
+		if (buf[5] == 'S' && buf[8] == 'U')
+			/* nested untracked file */
+			*dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
+
+		if (buf[0] == 'u' ||
+			buf[0] == '2' ||
+			memcmp(buf + 5, "S..U", 4))
+			/* other change */
+			*dirty_submodule |= DIRTY_SUBMODULE_MODIFIED;
+	}
+
+	if ((*dirty_submodule & DIRTY_SUBMODULE_MODIFIED) &&
+		((*dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) ||
+		ignore_untracked)) {
+		/*
+		* We're not interested in any further information from
+		* the child any more, neither output nor its exit code.
+		*/
+		return 1;
+	}
+	return 0;
+}
+
 static int fetch_finish(int retvalue, struct strbuf *err,
 			void *cb, void *task_cb)
 {
@@ -1893,39 +1930,9 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked)
 
 	fp = xfdopen(cp.out, "r");
 	while (strbuf_getwholeline(&buf, fp, '\n') != EOF) {
-		/* regular untracked files */
-		if (buf.buf[0] == '?')
-			dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
-
-		if (buf.buf[0] == 'u' ||
-		    buf.buf[0] == '1' ||
-		    buf.buf[0] == '2') {
-			/* T = line type, XY = status, SSSS = submodule state */
-			if (buf.len < strlen("T XY SSSS"))
-				BUG("invalid status --porcelain=2 line %s",
-				    buf.buf);
-
-			if (buf.buf[5] == 'S' && buf.buf[8] == 'U')
-				/* nested untracked file */
-				dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
-
-			if (buf.buf[0] == 'u' ||
-			    buf.buf[0] == '2' ||
-			    memcmp(buf.buf + 5, "S..U", 4))
-				/* other change */
-				dirty_submodule |= DIRTY_SUBMODULE_MODIFIED;
-		}
-
-		if ((dirty_submodule & DIRTY_SUBMODULE_MODIFIED) &&
-		    ((dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) ||
-		     ignore_untracked)) {
-			/*
-			 * We're not interested in any further information from
-			 * the child any more, neither output nor its exit code.
-			 */
-			ignore_cp_exit_code = 1;
+		ignore_cp_exit_code = parse_status_porcelain(buf.buf, &dirty_submodule, ignore_untracked);
+		if (ignore_cp_exit_code)
 			break;
-		}
 	}
 	fclose(fp);
 
-- 
2.37.3.998.g577e59143f-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 3/4] diff-lib: refactor functions
  2022-09-22 23:29 [PATCH 0/4] submodule: parallelize status Calvin Wan
  2022-09-22 23:29 ` [PATCH 1/4] run-command: add pipe_output to run_processes_parallel Calvin Wan
  2022-09-22 23:29 ` [PATCH 2/4] submodule: move status parsing into function Calvin Wan
@ 2022-09-22 23:29 ` Calvin Wan
  2022-09-23 20:36   ` Junio C Hamano
  2022-09-22 23:29 ` [PATCH 4/4] diff-lib: parallelize run_diff_files for submodules Calvin Wan
  2022-09-23 22:56 ` [PATCH 0/4] submodule: parallelize status Junio C Hamano
  4 siblings, 1 reply; 31+ messages in thread
From: Calvin Wan @ 2022-09-22 23:29 UTC (permalink / raw)
  To: git; +Cc: emilyshaffer, Calvin Wan

Flatten out the if statements in match_stat_with_submodule so the
logic is more readable and easier for future patches to add to.

Move code that updates relevant variables from the end of
run_diff_files to finish_run_diff_files. A future patch will utilize
said function.

Signed-off-by: Calvin Wan <calvinwan@google.com>
---
 diff-lib.c | 71 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 42 insertions(+), 29 deletions(-)

diff --git a/diff-lib.c b/diff-lib.c
index 7eb66a417a..2e148b79e6 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -73,21 +73,50 @@ static int match_stat_with_submodule(struct diff_options *diffopt,
 				     unsigned *dirty_submodule)
 {
 	int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option);
-	if (S_ISGITLINK(ce->ce_mode)) {
-		struct diff_flags orig_flags = diffopt->flags;
-		if (!diffopt->flags.override_submodule_config)
-			set_diffopt_flags_from_submodule_config(diffopt, ce->name);
-		if (diffopt->flags.ignore_submodules)
-			changed = 0;
-		else if (!diffopt->flags.ignore_dirty_submodules &&
-			 (!changed || diffopt->flags.dirty_submodules))
-			*dirty_submodule = is_submodule_modified(ce->name,
-								 diffopt->flags.ignore_untracked_in_submodules);
-		diffopt->flags = orig_flags;
+	struct diff_flags orig_flags = diffopt->flags;
+	if (!S_ISGITLINK(ce->ce_mode))
+		goto cleanup;
+	if (!diffopt->flags.override_submodule_config)
+		set_diffopt_flags_from_submodule_config(diffopt, ce->name);
+	if (diffopt->flags.ignore_submodules) {
+		changed = 0;
+		goto cleanup;
 	}
+	if (!diffopt->flags.ignore_dirty_submodules &&
+		(!changed || diffopt->flags.dirty_submodules))
+			*dirty_submodule = is_submodule_modified(ce->name,
+							diffopt->flags.ignore_untracked_in_submodules);
+cleanup:
+	diffopt->flags = orig_flags;
 	return changed;
 }
 
+static void finish_run_diff_files(struct rev_info *revs,
+						  struct cache_entry *ce,
+						  struct index_state *istate,
+						  int changed, int dirty_submodule,
+						  unsigned int newmode)
+{
+	unsigned int oldmode;
+	const struct object_id *old_oid, *new_oid;
+
+	if (!changed && !dirty_submodule) {
+			ce_mark_uptodate(ce);
+			if (!S_ISGITLINK(ce->ce_mode))
+				mark_fsmonitor_valid(istate, ce);
+			if (!revs->diffopt.flags.find_copies_harder)
+				return;
+		}
+		oldmode = ce->ce_mode;
+		old_oid = &ce->oid;
+		new_oid = changed ? null_oid() : &ce->oid;
+		diff_change(&revs->diffopt, oldmode, newmode,
+			    old_oid, new_oid,
+			    !is_null_oid(old_oid),
+			    !is_null_oid(new_oid),
+			    ce->name, 0, dirty_submodule);
+}
+
 int run_diff_files(struct rev_info *revs, unsigned int option)
 {
 	int entries, i;
@@ -105,11 +134,10 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 		diff_unmerged_stage = 2;
 	entries = istate->cache_nr;
 	for (i = 0; i < entries; i++) {
-		unsigned int oldmode, newmode;
+		unsigned int newmode;
 		struct cache_entry *ce = istate->cache[i];
 		int changed;
 		unsigned dirty_submodule = 0;
-		const struct object_id *old_oid, *new_oid;
 
 		if (diff_can_quit_early(&revs->diffopt))
 			break;
@@ -244,22 +272,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 							    ce_option, &dirty_submodule);
 			newmode = ce_mode_from_stat(ce, st.st_mode);
 		}
-
-		if (!changed && !dirty_submodule) {
-			ce_mark_uptodate(ce);
-			mark_fsmonitor_valid(istate, ce);
-			if (!revs->diffopt.flags.find_copies_harder)
-				continue;
-		}
-		oldmode = ce->ce_mode;
-		old_oid = &ce->oid;
-		new_oid = changed ? null_oid() : &ce->oid;
-		diff_change(&revs->diffopt, oldmode, newmode,
-			    old_oid, new_oid,
-			    !is_null_oid(old_oid),
-			    !is_null_oid(new_oid),
-			    ce->name, 0, dirty_submodule);
-
+		finish_run_diff_files(revs, ce, istate, changed, dirty_submodule, newmode);
 	}
 	diffcore_std(&revs->diffopt);
 	diff_flush(&revs->diffopt);
-- 
@@ -73,21 +73,50 @@ static int match_stat_with_submodule(struct diff_options *diffopt,
 				     unsigned *dirty_submodule)
 {
 	int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option);
-	if (S_ISGITLINK(ce->ce_mode)) {
-		struct diff_flags orig_flags = diffopt->flags;
-		if (!diffopt->flags.override_submodule_config)
-			set_diffopt_flags_from_submodule_config(diffopt, ce->name);
-		if (diffopt->flags.ignore_submodules)
-			changed = 0;
-		else if (!diffopt->flags.ignore_dirty_submodules &&
-			 (!changed || diffopt->flags.dirty_submodules))
-			*dirty_submodule = is_submodule_modified(ce->name,
-								 diffopt->flags.ignore_untracked_in_submodules);
-		diffopt->flags = orig_flags;
+	struct diff_flags orig_flags = diffopt->flags;
+	if (!S_ISGITLINK(ce->ce_mode))
+		goto cleanup;
+	if (!diffopt->flags.override_submodule_config)
+		set_diffopt_flags_from_submodule_config(diffopt, ce->name);
+	if (diffopt->flags.ignore_submodules) {
+		changed = 0;
+		goto cleanup;
 	}
+	if (!diffopt->flags.ignore_dirty_submodules &&
+		(!changed || diffopt->flags.dirty_submodules))
+			*dirty_submodule = is_submodule_modified(ce->name,
+							diffopt->flags.ignore_untracked_in_submodules);
+cleanup:
+	diffopt->flags = orig_flags;
 	return changed;
 }
 
+static void finish_run_diff_files(struct rev_info *revs,
+						  struct cache_entry *ce,
+						  struct index_state *istate,
+						  int changed, int dirty_submodule,
+						  unsigned int newmode)
+{
+	unsigned int oldmode;
+	const struct object_id *old_oid, *new_oid;
+
+	if (!changed && !dirty_submodule) {
+			ce_mark_uptodate(ce);
+			if (!S_ISGITLINK(ce->ce_mode))
+				mark_fsmonitor_valid(istate, ce);
+			if (!revs->diffopt.flags.find_copies_harder)
+				return;
+		}
+		oldmode = ce->ce_mode;
+		old_oid = &ce->oid;
+		new_oid = changed ? null_oid() : &ce->oid;
+		diff_change(&revs->diffopt, oldmode, newmode,
+			    old_oid, new_oid,
+			    !is_null_oid(old_oid),
+			    !is_null_oid(new_oid),
+			    ce->name, 0, dirty_submodule);
+}
+
 int run_diff_files(struct rev_info *revs, unsigned int option)
 {
 	int entries, i;
@@ -105,11 +134,10 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 		diff_unmerged_stage = 2;
 	entries = istate->cache_nr;
 	for (i = 0; i < entries; i++) {
-		unsigned int oldmode, newmode;
+		unsigned int newmode;
 		struct cache_entry *ce = istate->cache[i];
 		int changed;
 		unsigned dirty_submodule = 0;
-		const struct object_id *old_oid, *new_oid;
 
 		if (diff_can_quit_early(&revs->diffopt))
 			break;
@@ -244,22 +272,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 							    ce_option, &dirty_submodule);
 			newmode = ce_mode_from_stat(ce, st.st_mode);
 		}
-
-		if (!changed && !dirty_submodule) {
-			ce_mark_uptodate(ce);
-			mark_fsmonitor_valid(istate, ce);
-			if (!revs->diffopt.flags.find_copies_harder)
-				continue;
-		}
-		oldmode = ce->ce_mode;
-		old_oid = &ce->oid;
-		new_oid = changed ? null_oid() : &ce->oid;
-		diff_change(&revs->diffopt, oldmode, newmode,
-			    old_oid, new_oid,
-			    !is_null_oid(old_oid),
-			    !is_null_oid(new_oid),
-			    ce->name, 0, dirty_submodule);
-
+		finish_run_diff_files(revs, ce, istate, changed, dirty_submodule, newmode);
 	}
 	diffcore_std(&revs->diffopt);
 	diff_flush(&revs->diffopt);
-- 
2.37.3.998.g577e59143f-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 4/4] diff-lib: parallelize run_diff_files for submodules
  2022-09-22 23:29 [PATCH 0/4] submodule: parallelize status Calvin Wan
                   ` (2 preceding siblings ...)
  2022-09-22 23:29 ` [PATCH 3/4] diff-lib: refactor functions Calvin Wan
@ 2022-09-22 23:29 ` Calvin Wan
  2022-09-23  8:06   ` Ævar Arnfjörð Bjarmason
                     ` (3 more replies)
  2022-09-23 22:56 ` [PATCH 0/4] submodule: parallelize status Junio C Hamano
  4 siblings, 4 replies; 31+ messages in thread
From: Calvin Wan @ 2022-09-22 23:29 UTC (permalink / raw)
  To: git; +Cc: emilyshaffer, Calvin Wan

During the iteration of the index entries in run_diff_files, whenever
a submodule is found and needs its status checked, a subprocess is
spawned for it. Instead of spawning the subprocess immediately and
waiting for its completion to continue, hold onto all submodules and
relevant information in a list. Then use that list to create tasks for
run_processes_parallel. Finished subprocesses pipe their output to
status_finish which parses it and sets the relevant variables.

Add config option status.parallelSubmodules to set the maximum number
of parallel jobs. 

Signed-off-by: Calvin Wan <calvinwan@google.com>
---
 Documentation/config/status.txt |   6 ++
 add-interactive.c               |   2 +-
 builtin/add.c                   |   4 +-
 builtin/commit.c                |   6 ++
 builtin/diff-files.c            |   2 +-
 builtin/diff.c                  |   2 +-
 builtin/merge.c                 |   2 +-
 builtin/stash.c                 |   2 +-
 builtin/submodule--helper.c     |   4 +-
 diff-lib.c                      |  55 +++++++++++--
 diff.h                          |   2 +-
 submodule.c                     | 142 ++++++++++++++++++++++++++++++++
 submodule.h                     |   9 ++
 wt-status.c                     |   6 +-
 wt-status.h                     |   1 +
 15 files changed, 226 insertions(+), 19 deletions(-)

diff --git a/Documentation/config/status.txt b/Documentation/config/status.txt
index 0fc704ab80..92c5511fec 100644
--- a/Documentation/config/status.txt
+++ b/Documentation/config/status.txt
@@ -75,3 +75,9 @@ status.submoduleSummary::
 	the --ignore-submodules=dirty command-line option or the 'git
 	submodule summary' command, which shows a similar output but does
 	not honor these settings.
+
+status.parallelSubmodules::
+	When linkgit:git-status[1] is run in a superproject with
+	submodules, a status subprocess is spawned for every submodule.
+	This option specifies the number of submodule status subprocesses
+	to run in parallel. If unset, it defaults to 1.
diff --git a/add-interactive.c b/add-interactive.c
index 22fcd3412c..3e38aa833d 100644
--- a/add-interactive.c
+++ b/add-interactive.c
@@ -565,7 +565,7 @@ static int get_modified_files(struct repository *r,
 			run_diff_index(&rev, 1);
 		else {
 			rev.diffopt.flags.ignore_dirty_submodules = 1;
-			run_diff_files(&rev, 0);
+			run_diff_files(&rev, 0, -1);
 		}
 
 		release_revisions(&rev);
diff --git a/builtin/add.c b/builtin/add.c
index f84372964c..094d59c1b4 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -148,7 +148,7 @@ int add_files_to_cache(const char *prefix,
 	 * may not have their own transaction active.
 	 */
 	begin_odb_transaction();
-	run_diff_files(&rev, DIFF_RACY_IS_MODIFIED);
+	run_diff_files(&rev, DIFF_RACY_IS_MODIFIED, -1);
 	end_odb_transaction();
 
 	release_revisions(&rev);
@@ -325,7 +325,7 @@ static int edit_patch(int argc, const char **argv, const char *prefix)
 	out = xopen(file, O_CREAT | O_WRONLY | O_TRUNC, 0666);
 	rev.diffopt.file = xfdopen(out, "w");
 	rev.diffopt.close_file = 1;
-	if (run_diff_files(&rev, 0))
+	if (run_diff_files(&rev, 0, -1))
 		die(_("Could not write patch"));
 
 	if (launch_editor(file, NULL, NULL))
diff --git a/builtin/commit.c b/builtin/commit.c
index fcf9c85947..c5147a7952 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1468,6 +1468,12 @@ static int git_status_config(const char *k, const char *v, void *cb)
 		s->detect_rename = git_config_rename(k, v);
 		return 0;
 	}
+	if (!strcmp(k, "status.parallelsubmodules")) {
+		s->parallel_jobs_submodules = git_config_int(k, v);
+		if (s->parallel_jobs_submodules < 0)
+			die(_("status.parallelsubmodules cannot be negative"));
+		return 0;
+	}
 	return git_diff_ui_config(k, v, NULL);
 }
 
diff --git a/builtin/diff-files.c b/builtin/diff-files.c
index 92cf6e1e92..eb1b576440 100644
--- a/builtin/diff-files.c
+++ b/builtin/diff-files.c
@@ -80,7 +80,7 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix)
 		result = -1;
 		goto cleanup;
 	}
-	result = run_diff_files(&rev, options);
+	result = run_diff_files(&rev, options, -1);
 	result = diff_result_code(&rev.diffopt, result);
 cleanup:
 	release_revisions(&rev);
diff --git a/builtin/diff.c b/builtin/diff.c
index 54bb3de964..c3c532517b 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -275,7 +275,7 @@ static int builtin_diff_files(struct rev_info *revs, int argc, const char **argv
 		perror("read_cache_preload");
 		return -1;
 	}
-	return run_diff_files(revs, options);
+	return run_diff_files(revs, options, -1);
 }
 
 struct symdiff {
diff --git a/builtin/merge.c b/builtin/merge.c
index f7c92c0e64..ff373fa880 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -1001,7 +1001,7 @@ static int evaluate_result(void)
 		DIFF_FORMAT_CALLBACK;
 	rev.diffopt.format_callback = count_diff_files;
 	rev.diffopt.format_callback_data = &cnt;
-	run_diff_files(&rev, 0);
+	run_diff_files(&rev, 0, -1);
 
 	/*
 	 * Check how many unmerged entries are
diff --git a/builtin/stash.c b/builtin/stash.c
index 30fa101460..097ea55214 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1075,7 +1075,7 @@ static int check_changes_tracked_files(const struct pathspec *ps)
 		goto done;
 	}
 
-	result = run_diff_files(&rev, 0);
+	result = run_diff_files(&rev, 0, -1);
 	if (diff_result_code(&rev.diffopt, result)) {
 		ret = 1;
 		goto done;
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index fac52ade5e..b50dbf5e36 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -679,7 +679,7 @@ static void status_submodule(const char *path, const struct object_id *ce_oid,
 	diff_files_args.nr = setup_revisions(diff_files_args.nr,
 					     diff_files_args.v,
 					     &rev, NULL);
-	diff_files_result = run_diff_files(&rev, 0);
+	diff_files_result = run_diff_files(&rev, 0, -1);
 
 	if (!diff_result_code(&rev.diffopt, diff_files_result)) {
 		print_status(flags, ' ', path, ce_oid,
@@ -1143,7 +1143,7 @@ static int compute_summary_module_list(struct object_id *head_oid,
 	if (diff_cmd == DIFF_INDEX)
 		run_diff_index(&rev, info->cached);
 	else
-		run_diff_files(&rev, 0);
+		run_diff_files(&rev, 0, -1);
 	prepare_submodule_summary(info, &list);
 cleanup:
 	strvec_clear(&diff_args);
diff --git a/diff-lib.c b/diff-lib.c
index 2e148b79e6..ec745755fc 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -65,14 +65,19 @@ static int check_removed(const struct index_state *istate, const struct cache_en
  * Return 1 when changes are detected, 0 otherwise. If the DIRTY_SUBMODULES
  * option is set, the caller does not only want to know if a submodule is
  * modified at all but wants to know all the conditions that are met (new
- * commits, untracked content and/or modified content).
+ * commits, untracked content and/or modified content). If
+ * defer_submodule_status bit is set, dirty_submodule will be left to the
+ * caller to set. defer_submodule_status can also be set to 0 in this
+ * function if there is no need to check if the submodule is modified.
  */
 static int match_stat_with_submodule(struct diff_options *diffopt,
 				     const struct cache_entry *ce,
 				     struct stat *st, unsigned ce_option,
-				     unsigned *dirty_submodule)
+				     unsigned *dirty_submodule, int *defer_submodule_status,
+					 int *ignore_untracked_in_submodules)
 {
 	int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option);
+	int defer = 0;
 	struct diff_flags orig_flags = diffopt->flags;
 	if (!S_ISGITLINK(ce->ce_mode))
 		goto cleanup;
@@ -83,11 +88,20 @@ static int match_stat_with_submodule(struct diff_options *diffopt,
 		goto cleanup;
 	}
 	if (!diffopt->flags.ignore_dirty_submodules &&
-		(!changed || diffopt->flags.dirty_submodules))
+		(!changed || diffopt->flags.dirty_submodules)) {
+		if (defer_submodule_status && *defer_submodule_status) {
+			defer = 1;
+			*ignore_untracked_in_submodules =
+							diffopt->flags.ignore_untracked_in_submodules;
+		} else {
 			*dirty_submodule = is_submodule_modified(ce->name,
 							diffopt->flags.ignore_untracked_in_submodules);
+		}
+	}
 cleanup:
 	diffopt->flags = orig_flags;
+	if (defer_submodule_status)
+		*defer_submodule_status = defer;
 	return changed;
 }
 
@@ -117,7 +131,7 @@ static void finish_run_diff_files(struct rev_info *revs,
 			    ce->name, 0, dirty_submodule);
 }
 
-int run_diff_files(struct rev_info *revs, unsigned int option)
+int run_diff_files(struct rev_info *revs, unsigned int option, int parallel_jobs)
 {
 	int entries, i;
 	int diff_unmerged_stage = revs->max_count;
@@ -125,6 +139,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 			      ? CE_MATCH_RACY_IS_DIRTY : 0);
 	uint64_t start = getnanotime();
 	struct index_state *istate = revs->diffopt.repo->index;
+	struct string_list submodules = STRING_LIST_INIT_NODUP;
 
 	diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/");
 
@@ -138,6 +153,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 		struct cache_entry *ce = istate->cache[i];
 		int changed;
 		unsigned dirty_submodule = 0;
+		int defer_submodule_status = revs && revs->repo &&
+							!strcmp(revs->repo->gitdir, ".git");
+		int ignore_untracked_in_submodules;
 
 		if (diff_can_quit_early(&revs->diffopt))
 			break;
@@ -269,11 +287,36 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 			}
 
 			changed = match_stat_with_submodule(&revs->diffopt, ce, &st,
-							    ce_option, &dirty_submodule);
+							ce_option, &dirty_submodule,
+							&defer_submodule_status,
+							&ignore_untracked_in_submodules);
 			newmode = ce_mode_from_stat(ce, st.st_mode);
+			if (defer_submodule_status) {
+				struct string_list_item *item =
+								string_list_append(&submodules, ce->name);
+				struct submodule_status_util *util = xmalloc(sizeof(*util));
+				util->changed = changed;
+				util->dirty_submodule = 0;
+				util->ignore_untracked = ignore_untracked_in_submodules;
+				util->newmode = newmode;
+				util->ce = ce;
+				item->util = util;
+				continue;
+			}
 		}
 		finish_run_diff_files(revs, ce, istate, changed, dirty_submodule, newmode);
 	}
+	if (submodules.nr > 0) {
+		if (get_submodules_status(revs->repo, &submodules,
+						parallel_jobs > 0 ? parallel_jobs : 1))
+			BUG("Submodule status failed");
+		for (int i = 0; i < submodules.nr; i++) {
+			struct submodule_status_util *util = submodules.items[i].util;
+
+			finish_run_diff_files(revs, util->ce, NULL, util->changed,
+							util->dirty_submodule, util->newmode);
+		}
+	}
 	diffcore_std(&revs->diffopt);
 	diff_flush(&revs->diffopt);
 	trace_performance_since(start, "diff-files");
@@ -321,7 +364,7 @@ static int get_stat_data(const struct index_state *istate,
 			return -1;
 		}
 		changed = match_stat_with_submodule(diffopt, ce, &st,
-						    0, dirty_submodule);
+						    0, dirty_submodule, NULL, NULL);
 		if (changed) {
 			mode = ce_mode_from_stat(ce, st.st_mode);
 			oid = null_oid();
diff --git a/diff.h b/diff.h
index 8ae18e5ab1..5a6a615381 100644
--- a/diff.h
+++ b/diff.h
@@ -627,7 +627,7 @@ void diff_get_merge_base(const struct rev_info *revs, struct object_id *mb);
 #define DIFF_SILENT_ON_REMOVED 01
 /* report racily-clean paths as modified */
 #define DIFF_RACY_IS_MODIFIED 02
-int run_diff_files(struct rev_info *revs, unsigned int option);
+int run_diff_files(struct rev_info *revs, unsigned int option, int parallel_jobs);
 
 #define DIFF_INDEX_CACHED 01
 #define DIFF_INDEX_MERGE_BASE 02
diff --git a/submodule.c b/submodule.c
index 91213ba83c..15729bb327 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1362,6 +1362,20 @@ int submodule_touches_in_range(struct repository *r,
 	return ret;
 }
 
+struct submodule_parallel_status {
+	int index_count;
+	int result;
+
+	struct string_list *submodule_names;
+	struct repository *r;
+
+	/* Pending statuses by OIDs */
+	struct status_task **oid_status_tasks;
+	int oid_status_tasks_nr, oid_status_tasks_alloc;
+};
+
+#define SPS_INIT { 0 }
+
 struct submodule_parallel_fetch {
 	/*
 	 * The index of the last index entry processed by
@@ -1444,6 +1458,12 @@ struct fetch_task {
 	struct oid_array *commits; /* Ensure these commits are fetched */
 };
 
+struct status_task {
+	struct repository *repo;
+	const char *path;
+	int ignore_untracked;
+};
+
 /**
  * When a submodule is not defined in .gitmodules, we cannot access it
  * via the regular submodule-config. Create a fake submodule, which we can
@@ -1547,6 +1567,41 @@ static struct fetch_task *fetch_task_create(struct submodule_parallel_fetch *spf
 	return NULL;
 }
 
+static struct status_task *
+get_status_task_from_index(struct submodule_parallel_status *sps,
+			  struct strbuf *err)
+{
+	for (; sps->index_count < sps->submodule_names->nr; sps->index_count++) {
+		struct submodule_status_util *util = sps->submodule_names->items[sps->index_count].util;
+		const struct cache_entry *ce = util->ce;
+		struct status_task *task;
+		struct strbuf buf = STRBUF_INIT;
+		const char *git_dir;
+
+		strbuf_addf(&buf, "%s/.git", ce->name);
+		git_dir = read_gitfile(buf.buf);
+		if (!git_dir)
+			git_dir = buf.buf;
+		if (!is_git_directory(git_dir)) {
+			if (is_directory(git_dir))
+				die(_("'%s' not recognized as a git repository"), git_dir);
+			strbuf_release(&buf);
+			/* The submodule is not checked out, so it is not modified */
+			util->dirty_submodule = 0;
+			continue;
+		}
+		strbuf_release(&buf);
+
+		task = xmalloc(sizeof(*task));
+		memset(task, 0, sizeof(*task));
+		task->path = ce->name;
+		task->ignore_untracked = util->ignore_untracked;
+		sps->index_count++;
+		return task;
+	}
+	return NULL;
+}
+
 static struct fetch_task *
 get_fetch_task_from_index(struct submodule_parallel_fetch *spf,
 			  struct strbuf *err)
@@ -1744,6 +1799,16 @@ static int fetch_start_failure(struct strbuf *err,
 	return 0;
 }
 
+static int status_start_failure(struct strbuf *err,
+			       void *cb, void *task_cb)
+{
+	struct submodule_parallel_status *sps = cb;
+
+	sps->result = 1;
+	return 0;
+}
+
+
 static int commit_missing_in_sub(const struct object_id *oid, void *data)
 {
 	struct repository *subrepo = data;
@@ -1790,6 +1855,30 @@ static int parse_status_porcelain(char *buf, unsigned *dirty_submodule, int igno
 	return 0;
 }
 
+static int status_finish(int retvalue, struct strbuf *err,
+			void *cb, void *task_cb)
+{
+	struct submodule_parallel_status *sps = cb;
+	struct status_task *task = task_cb;
+	struct string_list list = STRING_LIST_INIT_DUP;
+	struct string_list_item *it = string_list_lookup(sps->submodule_names,
+											task->path);
+	struct submodule_status_util *util = it->util;
+
+	int i;
+
+	string_list_split(&list, err->buf, '\n', -1);
+
+	for (i = 0; i < list.nr; i++) {
+		if (parse_status_porcelain(list.items[i].string,
+							&util->dirty_submodule, util->ignore_untracked))
+			break;
+	}
+	strbuf_reset(err);
+
+	return 0;
+}
+
 static int fetch_finish(int retvalue, struct strbuf *err,
 			void *cb, void *task_cb)
 {
@@ -1943,6 +2032,59 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked)
 	return dirty_submodule;
 }
 
+static int get_next_submodule_status(struct child_process *cp,
+						struct strbuf *err, void *data, void **task_cb)
+{
+	struct submodule_parallel_status *sps = data;
+	struct status_task *task = get_status_task_from_index(sps, err);
+
+	int ignore_untracked;
+
+	if (!task) {
+		return 0;
+	}
+
+	ignore_untracked = task->ignore_untracked;
+
+	child_process_init(cp);
+	prepare_submodule_repo_env_in_gitdir(&cp->env);
+
+	strvec_init(&cp->args);
+	strvec_pushl(&cp->args, "status", "--porcelain=2", NULL);
+	if (ignore_untracked)
+		strvec_push(&cp->args, "-uno");
+
+	prepare_submodule_repo_env(&cp->env);
+	cp->git_cmd = 1;
+	cp->no_stdin = 1;
+	cp->dir = task->path;
+	*task_cb = task;
+	return 1;
+}
+
+int get_submodules_status(struct repository *r,
+			struct string_list *submodules,
+		    int max_parallel_jobs)
+{
+	struct submodule_parallel_status sps = SPS_INIT;
+
+	sps.r = r;
+
+	if (repo_read_index(r) < 0)
+		die(_("index file corrupt"));
+
+	sps.submodule_names = submodules;
+	string_list_sort(sps.submodule_names);
+	run_processes_parallel_pipe_output = 1;
+	run_processes_parallel_tr2(max_parallel_jobs,
+				get_next_submodule_status,
+				status_start_failure,
+				status_finish,
+				&sps,
+				"submodule", "parallel/status");
+	return sps.result;
+}
+
 int submodule_uses_gitfile(const char *path)
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
diff --git a/submodule.h b/submodule.h
index bfaa9da186..18a42c64ce 100644
--- a/submodule.h
+++ b/submodule.h
@@ -41,6 +41,12 @@ struct submodule_update_strategy {
 	.type = SM_UPDATE_UNSPECIFIED, \
 }
 
+struct submodule_status_util {
+	int changed, ignore_untracked;
+	unsigned dirty_submodule, newmode;
+	struct cache_entry *ce;
+};
+
 int is_gitmodules_unmerged(struct index_state *istate);
 int is_writing_gitmodules_ok(void);
 int is_staging_gitmodules_ok(struct index_state *istate);
@@ -94,6 +100,9 @@ int fetch_submodules(struct repository *r,
 		     int command_line_option,
 		     int default_option,
 		     int quiet, int max_parallel_jobs);
+int get_submodules_status(struct repository *r,
+			 struct string_list *submodules,
+		     int max_parallel_jobs);
 unsigned is_submodule_modified(const char *path, int ignore_untracked);
 int submodule_uses_gitfile(const char *path);
 
diff --git a/wt-status.c b/wt-status.c
index 867e3e417e..9864484f81 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -615,7 +615,7 @@ static void wt_status_collect_changes_worktree(struct wt_status *s)
 	rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit;
 	rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score;
 	copy_pathspec(&rev.prune_data, &s->pathspec);
-	run_diff_files(&rev, 0);
+	run_diff_files(&rev, 0, s->parallel_jobs_submodules);
 	release_revisions(&rev);
 }
 
@@ -1149,7 +1149,7 @@ static void wt_longstatus_print_verbose(struct wt_status *s)
 		setup_work_tree();
 		rev.diffopt.a_prefix = "i/";
 		rev.diffopt.b_prefix = "w/";
-		run_diff_files(&rev, 0);
+		run_diff_files(&rev, 0, -1);
 	}
 	release_revisions(&rev);
 }
@@ -2544,7 +2544,7 @@ int has_unstaged_changes(struct repository *r, int ignore_submodules)
 	}
 	rev_info.diffopt.flags.quick = 1;
 	diff_setup_done(&rev_info.diffopt);
-	result = run_diff_files(&rev_info, 0);
+	result = run_diff_files(&rev_info, 0, -1);
 	result = diff_result_code(&rev_info.diffopt, result);
 	release_revisions(&rev_info);
 	return result;
diff --git a/wt-status.h b/wt-status.h
index ab9cc9d8f0..2ea2317715 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -131,6 +131,7 @@ struct wt_status {
 	enum wt_status_format status_format;
 	struct wt_status_state state;
 	struct object_id oid_commit; /* when not Initial */
+	int parallel_jobs_submodules;
 
 	/* These are computed during processing of the individual sections */
 	int committable;
-- 
@@ -131,6 +131,7 @@ struct wt_status {
 	enum wt_status_format status_format;
 	struct wt_status_state state;
 	struct object_id oid_commit; /* when not Initial */
+	int parallel_jobs_submodules;
 
 	/* These are computed during processing of the individual sections */
 	int committable;
-- 
2.37.3.998.g577e59143f-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/4] run-command: add pipe_output to run_processes_parallel
  2022-09-22 23:29 ` [PATCH 1/4] run-command: add pipe_output to run_processes_parallel Calvin Wan
@ 2022-09-23  7:52   ` Ævar Arnfjörð Bjarmason
  2022-09-26 16:59     ` Calvin Wan
  2022-09-23 18:58   ` Junio C Hamano
  1 sibling, 1 reply; 31+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-09-23  7:52 UTC (permalink / raw)
  To: Calvin Wan; +Cc: git, emilyshaffer


On Thu, Sep 22 2022, Calvin Wan wrote:

> diff --git a/run-command.c b/run-command.c
> index 14f17830f5..893bc1d294 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -1474,6 +1474,7 @@ enum child_state {
>  };
>  
>  int run_processes_parallel_ungroup;
> +int run_processes_parallel_pipe_output;
>  struct parallel_processes {
>  	void *data;
>  
> @@ -1770,10 +1771,12 @@ int run_processes_parallel(int n,
>  	int output_timeout = 100;
>  	int spawn_cap = 4;
>  	int ungroup = run_processes_parallel_ungroup;
> +	int pipe_output = run_processes_parallel_pipe_output;
>  	struct parallel_processes pp;
>  
>  	/* unset for the next API user */
>  	run_processes_parallel_ungroup = 0;
> +	run_processes_parallel_pipe_output = 0;
>  
>  	pp_init(&pp, n, get_next_task, start_failure, task_finished, pp_cb,
>  		ungroup);

I guess we could live with this, but this passing a function argument as
a global variable interface is something that came out of a topic to fix
a release regression:
https://lore.kernel.org/git/cover-0.6-00000000000-20220421T122108Z-avarab@gmail.com/

An earlier version of that series simply changed the API to pass an
"opts" struct instead:
https://lore.kernel.org/git/patch-v2-2.8-5f0a6e9925f-20220518T195858Z-avarab@gmail.com/

I really should have submitted those post-release cleanup patches
already, and I'm not sure whether the right thing at this point is to
take this & do the cleanup for "ungroup" *and* this new argument later.

But maybe you're interested in cherry-picking & adjusting the relevant
part of that series for this one? I.e. we're not in some post-release
regression hurry, so rather than extending the use of this odd interface
we could (and maybe should) just fix how we're doing it first.

On the implementation:

> + * If the "pipe_output" option is specified, the output will be piped
> + * to task_finished_fn in the "struct strbuf *out" variable. The output
> + * will still be printed unless the callback resets the strbuf. The
> + * "pipe_output" option can be enabled by setting the global
> + * "run_processes_parallel_pipe_output" to "1" before invoking
> + * run_processes_parallel(), it will be set back to "0" as soon as the
> + * API reads that setting.

...okey, but...

> +static int task_finished_pipe_output(int result,
> +			 struct strbuf *err,
> +			 void *pp_cb,
> +			 void *pp_task_cb)
> +{
> +	if (err && pipe_output) {
> +		fprintf(stderr, "%s", err->buf);
> +		strbuf_reset(err);

...my memory's hazy, and I haven't re-logged in any detail, but is it
really the API interface here that the "output" callback function is
responsible for resetting the strbuf that the API gives to it?

That seems backwards to me, and e.g. a look at "start_failure" shows
that we strbuf_reset() the "err".

What's the point of doing it in the API consumer? If it doesn't do it
we'll presumably keep accumulating output. Is there a use-case for that?

Or perhaps it's not needed & this is really just misleading boilerplate?

> @@ -140,6 +140,11 @@ test_expect_success 'run_command runs ungrouped in parallel with more jobs avail
>  	test_line_count = 4 err
>  '
>  
> +test_expect_success 'run_command runs pipe_output in parallel with more jobs available than tasks' '
> +	test-tool run-command --pipe-output run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
> +	test_cmp expect actual
> +'
> +

Like the global argument, the copy/pasting for "ungroup" was mostly a
matter of expediency.

But at least in that case we have a different assertion (test_cmp
v.s. test_line_count).

But here this test case seems to be exactly the same as for the
"vanilla" version.

So can't we make this some:

	for opt in '' '--pipe-output'
	do
		test_expect_success ...
	done

?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/4] diff-lib: parallelize run_diff_files for submodules
  2022-09-22 23:29 ` [PATCH 4/4] diff-lib: parallelize run_diff_files for submodules Calvin Wan
@ 2022-09-23  8:06   ` Ævar Arnfjörð Bjarmason
  2022-09-24 20:17     ` Junio C Hamano
  2022-09-26 17:50     ` Calvin Wan
  2022-09-23 21:44   ` Junio C Hamano
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 31+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-09-23  8:06 UTC (permalink / raw)
  To: Calvin Wan; +Cc: git, emilyshaffer


On Thu, Sep 22 2022, Calvin Wan wrote:

> +status.parallelSubmodules::
> +	When linkgit:git-status[1] is run in a superproject with
> +	submodules, a status subprocess is spawned for every submodule.
> +	This option specifies the number of submodule status subprocesses
> +	to run in parallel. If unset, it defaults to 1.

Why do we default to 1, instead of e.g. grep.threads defaulting to the
"cores available"?

> +struct submodule_status_util {
> +	int changed, ignore_untracked;

nit: we tend not to group by type, which also makes adding API docs
later easier.

Except that we tend to do that if all the things are really boolean flags, which these are, so maybe:

	unsigned int changed:1,
        	     ignore_untracked:1;

?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/4] run-command: add pipe_output to run_processes_parallel
  2022-09-22 23:29 ` [PATCH 1/4] run-command: add pipe_output to run_processes_parallel Calvin Wan
  2022-09-23  7:52   ` Ævar Arnfjörð Bjarmason
@ 2022-09-23 18:58   ` Junio C Hamano
  2022-09-26 17:31     ` Calvin Wan
  1 sibling, 1 reply; 31+ messages in thread
From: Junio C Hamano @ 2022-09-23 18:58 UTC (permalink / raw)
  To: Calvin Wan; +Cc: git, emilyshaffer

Calvin Wan <calvinwan@google.com> writes:

> run_processes_parallel periodically collects output from its child
> processes, prints it, and then resets the buffers for each child.
> Add run_processes_parallel_pipe_output variable so output can be
> collected and fed to task_finished. When set, the function referenced
> by task_finished should parse the output of each child process.

I am puzzled.

 * Why are we configuring an API behaviour via a global variable in
   21st century?

 * The name "task_finished" is mentioned, but it is unclear what it
   is.  Is it one of the parameters to run_process_parallel()?

 * Is the effect of the new feature that task_finished callback is
   called with the output, in addition to the normal output?  I am
   not sure why it is called "pipe".  The task_finished callback may
   be free to fork a child and send the received output from the
   task to that child over the pipe, but that is what a client code
   could do and is inappropriate to base the name of the mechanism,
   isn't it?

> @@ -1770,10 +1771,12 @@ int run_processes_parallel(int n,
>  	int output_timeout = 100;
>  	int spawn_cap = 4;
>  	int ungroup = run_processes_parallel_ungroup;
> +	int pipe_output = run_processes_parallel_pipe_output;
>  	struct parallel_processes pp;
>  
>  	/* unset for the next API user */
>  	run_processes_parallel_ungroup = 0;
> +	run_processes_parallel_pipe_output = 0;
>  
>  	pp_init(&pp, n, get_next_task, start_failure, task_finished, pp_cb,
>  		ungroup);
> @@ -1800,7 +1803,8 @@ int run_processes_parallel(int n,
>  				pp.children[i].state = GIT_CP_WAIT_CLEANUP;
>  		} else {
>  			pp_buffer_stderr(&pp, output_timeout);
> -			pp_output(&pp);
> +			if (!pipe_output)
> +				pp_output(&pp);

So, we do not send the output from the child to the regular output
channel when pipe_output is in effect.  OK.

>  		}
>  		code = pp_collect_finished(&pp);
>  		if (code) {

And no other code changes?  This is quite different from what I
expected from reading the proposed log message.

Am I correct to say that under this new mode, we no longer flush any
output while the child task is running (due to the change in the
above hunk to omit calls to pp_output() during the run) and instead
keep accumulating in the strbuf, until the child task finishes, at
which time pp_collect_finished() will call task_finished callback.

Even though the callback usually consumes the last piece of the
output since the last pp_output() call made during the normal
execution of the run_processes_parallel() loop, because we omitted
these calls, we have full output from the child task accumulated in
the children[].err strbuf.  We may still not output .err for real,
as we may not be the output_owner, in which case we may only append
to .buffered_output member.

I am puzzled simply because, if the above summary is correct, I do
not see how a word "pipe" have a chance to come into the picture.

I can sort of see that in this mode, we would end up buffering the
entire output from each child task into one strbuf each, and can
avoid stalling the child tasks waiting for their turn to see their
output pipes drained.  But is this a reasonable thing to do?  How do
we control the memory consumption to avoid having to spool unbounded
amount of output from child tasks in core, or do we have a good
reason to believe that we do not have to bother?

Thanks.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/4] diff-lib: refactor functions
  2022-09-22 23:29 ` [PATCH 3/4] diff-lib: refactor functions Calvin Wan
@ 2022-09-23 20:36   ` Junio C Hamano
  2022-09-26 17:35     ` Calvin Wan
  0 siblings, 1 reply; 31+ messages in thread
From: Junio C Hamano @ 2022-09-23 20:36 UTC (permalink / raw)
  To: Calvin Wan; +Cc: git, emilyshaffer

Calvin Wan <calvinwan@google.com> writes:

> Flatten out the if statements in match_stat_with_submodule so the
> logic is more readable and easier for future patches to add to.
>
> Move code that updates relevant variables from the end of
> run_diff_files to finish_run_diff_files. A future patch will utilize
> said function.
>
> Signed-off-by: Calvin Wan <calvinwan@google.com>
> ---
>  diff-lib.c | 71 ++++++++++++++++++++++++++++++++----------------------
>  1 file changed, 42 insertions(+), 29 deletions(-)
>
> diff --git a/diff-lib.c b/diff-lib.c
> index 7eb66a417a..2e148b79e6 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -73,21 +73,50 @@ static int match_stat_with_submodule(struct diff_options *diffopt,
>  				     unsigned *dirty_submodule)
>  {
>  	int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option);
> -	if (S_ISGITLINK(ce->ce_mode)) {
> -		struct diff_flags orig_flags = diffopt->flags;
> -		if (!diffopt->flags.override_submodule_config)
> -			set_diffopt_flags_from_submodule_config(diffopt, ce->name);
> -		if (diffopt->flags.ignore_submodules)
> -			changed = 0;
> -		else if (!diffopt->flags.ignore_dirty_submodules &&
> -			 (!changed || diffopt->flags.dirty_submodules))
> -			*dirty_submodule = is_submodule_modified(ce->name,
> -								 diffopt->flags.ignore_untracked_in_submodules);
> -		diffopt->flags = orig_flags;
> +	struct diff_flags orig_flags = diffopt->flags;
> +	if (!S_ISGITLINK(ce->ce_mode))
> +		goto cleanup;
> +	if (!diffopt->flags.override_submodule_config)
> +		set_diffopt_flags_from_submodule_config(diffopt, ce->name);
> +	if (diffopt->flags.ignore_submodules) {
> +		changed = 0;
> +		goto cleanup;
>  	}
> +	if (!diffopt->flags.ignore_dirty_submodules &&
> +		(!changed || diffopt->flags.dirty_submodules))
> +			*dirty_submodule = is_submodule_modified(ce->name,
> +							diffopt->flags.ignore_untracked_in_submodules);
> +cleanup:
> +	diffopt->flags = orig_flags;
>  	return changed;
>  }

Unlike the original, this always makes two needless structure
assignments for anything that is not a submodule.  

Can we fix that regression before moving forward?

Even when ce_mode is a gitlink, if .ignore_submodules bit is set,
the two structure assignments between diffopt->flags and orig_flags
become necessary, so the original was already doing extra copies but
we do not have to make it worse.

> +static void finish_run_diff_files(struct rev_info *revs,
> +						  struct cache_entry *ce,
> +						  struct index_state *istate,
> +						  int changed, int dirty_submodule,
> +						  unsigned int newmode)
> +{
> +	unsigned int oldmode;
> +	const struct object_id *old_oid, *new_oid;
> +
> +	if (!changed && !dirty_submodule) {
> +			ce_mark_uptodate(ce);
> +			if (!S_ISGITLINK(ce->ce_mode))
> +				mark_fsmonitor_valid(istate, ce);
> +			if (!revs->diffopt.flags.find_copies_harder)
> +				return;
> +		}
> +		oldmode = ce->ce_mode;
> +		old_oid = &ce->oid;
> +		new_oid = changed ? null_oid() : &ce->oid;
> +		diff_change(&revs->diffopt, oldmode, newmode,
> +			    old_oid, new_oid,
> +			    !is_null_oid(old_oid),
> +			    !is_null_oid(new_oid),
> +			    ce->name, 0, dirty_submodule);
> +}

Strange indentation.  It is unclear why this bottom 1/3 of the loop
body of run_diff_files() need to be a separate helper function,
while the top 2/3 does not.  The resulting loop (below) becomes very
hard to follow because the reader cannot tell when diff_change() is
called and when it is not.

Overall, I see this change detrimental to diff-lib API at this step
in the series.  Later steps may show something more rewarding than
the downsides we see here, hopefully.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/4] diff-lib: parallelize run_diff_files for submodules
  2022-09-22 23:29 ` [PATCH 4/4] diff-lib: parallelize run_diff_files for submodules Calvin Wan
  2022-09-23  8:06   ` Ævar Arnfjörð Bjarmason
@ 2022-09-23 21:44   ` Junio C Hamano
  2022-09-26 19:12     ` Calvin Wan
  2022-09-25 13:59   ` Phillip Wood
  2022-09-27 18:40   ` Emily Shaffer
  3 siblings, 1 reply; 31+ messages in thread
From: Junio C Hamano @ 2022-09-23 21:44 UTC (permalink / raw)
  To: Calvin Wan; +Cc: git, emilyshaffer

Calvin Wan <calvinwan@google.com> writes:

> During the iteration of the index entries in run_diff_files, whenever
> a submodule is found and needs its status checked, a subprocess is
> spawned for it. Instead of spawning the subprocess immediately and
> waiting for its completion to continue, hold onto all submodules and
> relevant information in a list. Then use that list to create tasks for
> run_processes_parallel. Finished subprocesses pipe their output to
> status_finish which parses it and sets the relevant variables.

Excellent---I like the idea very much.

You make it sound as if this is only for "git status", but shouldn't
it benefit the usual "git diff" the same way when you have
submodules that can be dirty?

> Add config option status.parallelSubmodules to set the maximum number
> of parallel jobs. 

Configuration is fine, but I am having a hard time justifying the
addition of an extra parameter to run_diff_files().  It might be
more palatable to give a new bit (default off) in the rev structure
that tells it that it is OK to go into the "defer_submodule_status"
mode, if we absolutely want to change the behaviour of
run_diff_files() only for a single caller or something, but I doubt
it should even be needed.

I cannot think of a sane user interface that says "when
run_diff_files() is invoked as an implementation detail of 'status',
use this value, and when it is running for another command 'foo',
use this other value".  How would a user decide to pick a different
value for different commands?

Letting a single configuration variable to decide the degree of
parallelism inside run_diff_files() would be sufficient, and we
shouldn't have to touch all the existing calling sites of
run_diff_files() all over the place.  If you absolutely want to do
this, I'd rather see the new member for the configuration variable
not in wt_status but in rev_info.

> +status.parallelSubmodules::
> +	When linkgit:git-status[1] is run in a superproject with
> +	submodules, a status subprocess is spawned for every submodule.
> +	This option specifies the number of submodule status subprocesses
> +	to run in parallel. If unset, it defaults to 1.

As I said, I am not sure <cmd>.parallelSubmodules per command makes
much sense.  "If unset, it defaults to" sounds a bit redundant (if
it is explicitly set, the default value should not matter).

> @@ -83,11 +88,20 @@ static int match_stat_with_submodule(struct diff_options *diffopt,
>  		goto cleanup;
>  	}
>  	if (!diffopt->flags.ignore_dirty_submodules &&
> -		(!changed || diffopt->flags.dirty_submodules))
> +		(!changed || diffopt->flags.dirty_submodules)) {
> +		if (defer_submodule_status && *defer_submodule_status) {
> +			defer = 1;
> +			*ignore_untracked_in_submodules =
> +							diffopt->flags.ignore_untracked_in_submodules;
> +		} else {
>  			*dirty_submodule = is_submodule_modified(ce->name,
>  							diffopt->flags.ignore_untracked_in_submodules);
> +		}
> +	}

OK, so the caller can look at *defer

>  cleanup:
>  	diffopt->flags = orig_flags;
> +	if (defer_submodule_status)
> +		*defer_submodule_status = defer;
>  	return changed;
>  }
>  
> @@ -117,7 +131,7 @@ static void finish_run_diff_files(struct rev_info *revs,
>  			    ce->name, 0, dirty_submodule);
>  }
>  
> -int run_diff_files(struct rev_info *revs, unsigned int option)
> +int run_diff_files(struct rev_info *revs, unsigned int option, int parallel_jobs)
>  {
>  	int entries, i;
>  	int diff_unmerged_stage = revs->max_count;
> @@ -125,6 +139,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>  			      ? CE_MATCH_RACY_IS_DIRTY : 0);
>  	uint64_t start = getnanotime();
>  	struct index_state *istate = revs->diffopt.repo->index;
> +	struct string_list submodules = STRING_LIST_INIT_NODUP;
>  
>  	diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/");
>  
> @@ -138,6 +153,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>  		struct cache_entry *ce = istate->cache[i];
>  		int changed;
>  		unsigned dirty_submodule = 0;
> +		int defer_submodule_status = revs && revs->repo &&
> +							!strcmp(revs->repo->gitdir, ".git");

What is this overly deeply indented comparison with ".git" doing?
What are we checking and why?  Is that something we need to do for
each and every active_cache[] entry, or isn't it constant over the
loop?

> +		int ignore_untracked_in_submodules;
>  
>  		if (diff_can_quit_early(&revs->diffopt))
>  			break;
> @@ -269,11 +287,36 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>  			}
>  
>  			changed = match_stat_with_submodule(&revs->diffopt, ce, &st,
> +							ce_option, &dirty_submodule,
> +							&defer_submodule_status,
> +							&ignore_untracked_in_submodules);
>  			newmode = ce_mode_from_stat(ce, st.st_mode);
> +			if (defer_submodule_status) {
> +				struct string_list_item *item =
> +								string_list_append(&submodules, ce->name);
> +				struct submodule_status_util *util = xmalloc(sizeof(*util));
> +				util->changed = changed;
> +				util->dirty_submodule = 0;
> +				util->ignore_untracked = ignore_untracked_in_submodules;
> +				util->newmode = newmode;
> +				util->ce = ce;
> +				item->util = util;
> +				continue;

This makes me wonder if defer_submodule_status should be a string
list that receives the punted submodules---iow, don't these details
belong to match_stat_with_submodule() rather than its caller here?

Even better may be to start a new child task for the submodule here,
letting it work while the parent process moves on to the next entry
in the active_cache[].  I do not know if you thought about doing it
that way.

I dunno.

> +			}
>  		}
>  		finish_run_diff_files(revs, ce, istate, changed, dirty_submodule, newmode);
>  	}
> +	if (submodules.nr > 0) {
> +		if (get_submodules_status(revs->repo, &submodules,
> +						parallel_jobs > 0 ? parallel_jobs : 1))
> +			BUG("Submodule status failed");
> +		for (int i = 0; i < submodules.nr; i++) {

We still do not allow "for (type var = init; ...)" if I am not
mistaken.  Check the coding guidelines.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/4] submodule: parallelize status
  2022-09-22 23:29 [PATCH 0/4] submodule: parallelize status Calvin Wan
                   ` (3 preceding siblings ...)
  2022-09-22 23:29 ` [PATCH 4/4] diff-lib: parallelize run_diff_files for submodules Calvin Wan
@ 2022-09-23 22:56 ` Junio C Hamano
  2022-09-26 16:33   ` Calvin Wan
  4 siblings, 1 reply; 31+ messages in thread
From: Junio C Hamano @ 2022-09-23 22:56 UTC (permalink / raw)
  To: Calvin Wan; +Cc: git, emilyshaffer

Calvin Wan <calvinwan@google.com> writes:

> Future work: The reason why status is much slower on a cold cache vs
> warm cache is mainly due to refreshing the index. It is worth
> investigating whether this is entirely necessary.

I suspect that the CI breakage

  https://github.com/git/git/actions/runs/3115673002/jobs/5052804463

is due to this topic.  

In "SANTIZE=leak GIT_TEST_PASSING_SANITIZE_LEAK=true make test" run
locally, I see t7507 failing quite badly.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/4] diff-lib: parallelize run_diff_files for submodules
  2022-09-23  8:06   ` Ævar Arnfjörð Bjarmason
@ 2022-09-24 20:17     ` Junio C Hamano
  2022-09-26 17:50     ` Calvin Wan
  1 sibling, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2022-09-24 20:17 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Calvin Wan, git, emilyshaffer

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Thu, Sep 22 2022, Calvin Wan wrote:
>
>> +status.parallelSubmodules::
>> +	When linkgit:git-status[1] is run in a superproject with
>> +	submodules, a status subprocess is spawned for every submodule.
>> +	This option specifies the number of submodule status subprocesses
>> +	to run in parallel. If unset, it defaults to 1.
>
> Why do we default to 1, instead of e.g. grep.threads defaulting to the
> "cores available"?

I would imagine we would want to be able to say:

 - I do not trust the parallel mode yet, just use the single process
   method that we have always been using.

 - I do not know how many cores I have, just use a reasonable
   default parallelism.

 - I want to use N processes because I know better than auto-scaling
   based on num_cpus.

And the value of 1 would be a reasonable way to express the first
one.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/4] diff-lib: parallelize run_diff_files for submodules
  2022-09-22 23:29 ` [PATCH 4/4] diff-lib: parallelize run_diff_files for submodules Calvin Wan
  2022-09-23  8:06   ` Ævar Arnfjörð Bjarmason
  2022-09-23 21:44   ` Junio C Hamano
@ 2022-09-25 13:59   ` Phillip Wood
  2022-09-26 17:11     ` Junio C Hamano
  2022-09-26 19:22     ` Calvin Wan
  2022-09-27 18:40   ` Emily Shaffer
  3 siblings, 2 replies; 31+ messages in thread
From: Phillip Wood @ 2022-09-25 13:59 UTC (permalink / raw)
  To: Calvin Wan, git; +Cc: emilyshaffer, Ævar Arnfjörð Bjarmason

Hi Calvin

On 23/09/2022 00:29, Calvin Wan wrote:
> During the iteration of the index entries in run_diff_files, whenever
> a submodule is found and needs its status checked, a subprocess is
> spawned for it. Instead of spawning the subprocess immediately and
> waiting for its completion to continue, hold onto all submodules and
> relevant information in a list. Then use that list to create tasks for
> run_processes_parallel. Finished subprocesses pipe their output to
> status_finish which parses it and sets the relevant variables.
> 
> Add config option status.parallelSubmodules to set the maximum number
> of parallel jobs.

I suspect in the future we may want to parallelize other commands for 
submodules in which case a more general name such as submodules.threads 
might be a better choice. The speed up in the cover letter is 
impressive, could this be safely enabled by default?

> index fcf9c85947..c5147a7952 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -1468,6 +1468,12 @@ static int git_status_config(const char *k, const char *v, void *cb)
>   		s->detect_rename = git_config_rename(k, v);
>   		return 0;
>   	}
> +	if (!strcmp(k, "status.parallelsubmodules")) {
> +		s->parallel_jobs_submodules = git_config_int(k, v);
> +		if (s->parallel_jobs_submodules < 0)
> +			die(_("status.parallelsubmodules cannot be negative"));

What does a value of zero mean?

> diff --git a/diff-lib.c b/diff-lib.c
> index 2e148b79e6..ec745755fc 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c

> -int run_diff_files(struct rev_info *revs, unsigned int option)
> +int run_diff_files(struct rev_info *revs, unsigned int option, int parallel_jobs)

Another possibility would be to add a member to struct diff_opts, rather 
than changing the function signature here, I'm wondering what the trade 
offs of the two approaches are. Also seeing all the callers from other 
commands being changed made me wonder if they would benefit from 
parallelizing submodules as well. There aren't any tests - could we use 
GIT_TRACE2 to check that we are running the submodule diffs in parallel?

Best Wishes

Phillip

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/4] submodule: parallelize status
  2022-09-23 22:56 ` [PATCH 0/4] submodule: parallelize status Junio C Hamano
@ 2022-09-26 16:33   ` Calvin Wan
  0 siblings, 0 replies; 31+ messages in thread
From: Calvin Wan @ 2022-09-26 16:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, emilyshaffer

I'm getting the following error running that command locally:
Bailout called.  Further testing stopped:
GIT_TEST_PASSING_SANITIZE_LEAK=true has no effect except when
compiled with SANITIZE=leak

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/4] run-command: add pipe_output to run_processes_parallel
  2022-09-23  7:52   ` Ævar Arnfjörð Bjarmason
@ 2022-09-26 16:59     ` Calvin Wan
  2022-09-27 10:52       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 31+ messages in thread
From: Calvin Wan @ 2022-09-26 16:59 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, emilyshaffer

> An earlier version of that series simply changed the API to pass an
> "opts" struct instead:
> https://lore.kernel.org/git/patch-v2-2.8-5f0a6e9925f-20220518T195858Z-avarab@gmail.com/
>
> I really should have submitted those post-release cleanup patches
> already, and I'm not sure whether the right thing at this point is to
> take this & do the cleanup for "ungroup" *and* this new argument later.
>
> But maybe you're interested in cherry-picking & adjusting the relevant
> part of that series for this one? I.e. we're not in some post-release
> regression hurry, so rather than extending the use of this odd interface
> we could (and maybe should) just fix how we're doing it first.

I'll go ahead and give this a try. I was also a little bit surprised that
"ungroup" was set this way, but didn't realize it was for a quick fix.

>
> On the implementation:
>
> > + * If the "pipe_output" option is specified, the output will be piped
> > + * to task_finished_fn in the "struct strbuf *out" variable. The output
> > + * will still be printed unless the callback resets the strbuf. The
> > + * "pipe_output" option can be enabled by setting the global
> > + * "run_processes_parallel_pipe_output" to "1" before invoking
> > + * run_processes_parallel(), it will be set back to "0" as soon as the
> > + * API reads that setting.
>
> ...okey, but...
>
> > +static int task_finished_pipe_output(int result,
> > +                      struct strbuf *err,
> > +                      void *pp_cb,
> > +                      void *pp_task_cb)
> > +{
> > +     if (err && pipe_output) {
> > +             fprintf(stderr, "%s", err->buf);
> > +             strbuf_reset(err);
>
> ...my memory's hazy, and I haven't re-logged in any detail, but is it
> really the API interface here that the "output" callback function is
> responsible for resetting the strbuf that the API gives to it?
>
> That seems backwards to me, and e.g. a look at "start_failure" shows
> that we strbuf_reset() the "err".
>
> What's the point of doing it in the API consumer? If it doesn't do it
> we'll presumably keep accumulating output. Is there a use-case for that?
>
> Or perhaps it's not needed & this is really just misleading boilerplate?

Ultimately it is not needed -- I added it as an example to showcase that
the output is correctly being piped to "task_finished_pipe_output". The
reset is necessary in this case to prevent the output from being printed
twice. I'm not sure how exactly else I would go about testing "pipe_output".

>
> > @@ -140,6 +140,11 @@ test_expect_success 'run_command runs ungrouped in parallel with more jobs avail
> >       test_line_count = 4 err
> >  '
> >
> > +test_expect_success 'run_command runs pipe_output in parallel with more jobs available than tasks' '
> > +     test-tool run-command --pipe-output run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
> > +     test_cmp expect actual
> > +'
> > +
>
> Like the global argument, the copy/pasting for "ungroup" was mostly a
> matter of expediency.
>
> But at least in that case we have a different assertion (test_cmp
> v.s. test_line_count).
>
> But here this test case seems to be exactly the same as for the
> "vanilla" version.
>
> So can't we make this some:
>
>         for opt in '' '--pipe-output'
>         do
>                 test_expect_success ...
>         done
>
> ?

Yes we can -- but I may need to rethink how instead I should be testing
this option?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/4] diff-lib: parallelize run_diff_files for submodules
  2022-09-25 13:59   ` Phillip Wood
@ 2022-09-26 17:11     ` Junio C Hamano
  2022-09-26 19:22     ` Calvin Wan
  1 sibling, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2022-09-26 17:11 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Calvin Wan, git, emilyshaffer, Ævar Arnfjörð Bjarmason

Phillip Wood <phillip.wood123@gmail.com> writes:

>> +	if (!strcmp(k, "status.parallelsubmodules")) {
>> +		s->parallel_jobs_submodules = git_config_int(k, v);
>> +		if (s->parallel_jobs_submodules < 0)
>> +			die(_("status.parallelsubmodules cannot be negative"));
>
> What does a value of zero mean?

Good question.  I don't remember what the code in the patch I read
actually does, but I would imagine we would want to be able to say:

 - I do not trust the parallel mode yet, just use the single process
   method that we have always been using.

 - I do not know how many cores I have, just use a reasonable
   default parallelism.

 - I want to use N processes because I know better than auto-scaling
   based on num_cpus.

And the value of 1 would be a reasonable way to express the first
one, and 0 would be a reasonable thing to do for the second one.

>> -int run_diff_files(struct rev_info *revs, unsigned int option)
>> +int run_diff_files(struct rev_info *revs, unsigned int option, int parallel_jobs)
>
> Another possibility would be to add a member to struct diff_opts,

Yes, absolutely.  Somewhere that is reachable from rev_info
structure would be more appropriate.

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/4] run-command: add pipe_output to run_processes_parallel
  2022-09-23 18:58   ` Junio C Hamano
@ 2022-09-26 17:31     ` Calvin Wan
  2022-09-27  4:45       ` Junio C Hamano
  2022-09-27  9:05       ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 31+ messages in thread
From: Calvin Wan @ 2022-09-26 17:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, emilyshaffer

>  * Why are we configuring an API behaviour via a global variable in
>    21st century?

I was mimicking how "ungroup" worked, but now that Avar mentions
that pattern was for a quick regression fix, I can fix it to pass it in as a
parameter.

>  * The name "task_finished" is mentioned, but it is unclear what it
>    is.  Is it one of the parameters to run_process_parallel()?

It is one of the callback functions passed in as a parameter to
run_process_paraller(). I'll go ahead and clarify that.

>  * Is the effect of the new feature that task_finished callback is
>    called with the output, in addition to the normal output?  I am
>    not sure why it is called "pipe".  The task_finished callback may
>    be free to fork a child and send the received output from the
>    task to that child over the pipe, but that is what a client code
>    could do and is inappropriate to base the name of the mechanism,
>    isn't it?

The output in task_finished callback, before pipe_output, either
contains part of the output or the entire output of the child process,
since the output is periodically collected into stderr and then reset.
The intention of output I believe is for the caller to be able to add
anything they would like to the end (this can be seen with functions
like fetch_finished() in builtin/fetch.c). My intention with pipe_output
is to guarantee that output contains the entire output of the child
process so task_finished can utilize it.

>
> > @@ -1770,10 +1771,12 @@ int run_processes_parallel(int n,
> >       int output_timeout = 100;
> >       int spawn_cap = 4;
> >       int ungroup = run_processes_parallel_ungroup;
> > +     int pipe_output = run_processes_parallel_pipe_output;
> >       struct parallel_processes pp;
> >
> >       /* unset for the next API user */
> >       run_processes_parallel_ungroup = 0;
> > +     run_processes_parallel_pipe_output = 0;
> >
> >       pp_init(&pp, n, get_next_task, start_failure, task_finished, pp_cb,
> >               ungroup);
> > @@ -1800,7 +1803,8 @@ int run_processes_parallel(int n,
> >                               pp.children[i].state = GIT_CP_WAIT_CLEANUP;
> >               } else {
> >                       pp_buffer_stderr(&pp, output_timeout);
> > -                     pp_output(&pp);
> > +                     if (!pipe_output)
> > +                             pp_output(&pp);
>
> So, we do not send the output from the child to the regular output
> channel when pipe_output is in effect.  OK.
>
> >               }
> >               code = pp_collect_finished(&pp);
> >               if (code) {
>
> And no other code changes?  This is quite different from what I
> expected from reading the proposed log message.
>
> Am I correct to say that under this new mode, we no longer flush any
> output while the child task is running (due to the change in the
> above hunk to omit calls to pp_output() during the run) and instead
> keep accumulating in the strbuf, until the child task finishes, at
> which time pp_collect_finished() will call task_finished callback.
>
> Even though the callback usually consumes the last piece of the
> output since the last pp_output() call made during the normal
> execution of the run_processes_parallel() loop, because we omitted
> these calls, we have full output from the child task accumulated in
> the children[].err strbuf.  We may still not output .err for real,
> as we may not be the output_owner, in which case we may only append
> to .buffered_output member.
>
> I am puzzled simply because, if the above summary is correct, I do
> not see how a word "pipe" have a chance to come into the picture.

Ah I see what you mean here -- your summary is correct. Something
like "buffer_output" would make much more sense.

> I can sort of see that in this mode, we would end up buffering the
> entire output from each child task into one strbuf each, and can
> avoid stalling the child tasks waiting for their turn to see their
> output pipes drained.  But is this a reasonable thing to do?  How do
> we control the memory consumption to avoid having to spool unbounded
> amount of output from child tasks in core, or do we have a good
> reason to believe that we do not have to bother?

You are correct that storing unbounded output doesn't seem like a good
idea. One idea I have is to parse output during the periodic collection rather
than waiting till the end. The other idea I had was to add another
"git status --porcelain" option that would only output the necessary
pieces of information so we wouldn't have to bother with worrying about
unbounded output.

Any other thoughts as to how I can workaround this?

Thanks!

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/4] diff-lib: refactor functions
  2022-09-23 20:36   ` Junio C Hamano
@ 2022-09-26 17:35     ` Calvin Wan
  0 siblings, 0 replies; 31+ messages in thread
From: Calvin Wan @ 2022-09-26 17:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, emilyshaffer

> Unlike the original, this always makes two needless structure
> assignments for anything that is not a submodule.
>
> Can we fix that regression before moving forward?
>
> Even when ce_mode is a gitlink, if .ignore_submodules bit is set,
> the two structure assignments between diffopt->flags and orig_flags
> become necessary, so the original was already doing extra copies but
> we do not have to make it worse.

ack

> Strange indentation.  It is unclear why this bottom 1/3 of the loop
> body of run_diff_files() need to be a separate helper function,
> while the top 2/3 does not.  The resulting loop (below) becomes very
> hard to follow because the reader cannot tell when diff_change() is
> called and when it is not.
>
> Overall, I see this change detrimental to diff-lib API at this step
> in the series.  Later steps may show something more rewarding than
> the downsides we see here, hopefully.

I'll see if I can come up with a way to rewrite this so it is less confusing.
Alternatively, I could remove this refactor.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/4] diff-lib: parallelize run_diff_files for submodules
  2022-09-23  8:06   ` Ævar Arnfjörð Bjarmason
  2022-09-24 20:17     ` Junio C Hamano
@ 2022-09-26 17:50     ` Calvin Wan
  1 sibling, 0 replies; 31+ messages in thread
From: Calvin Wan @ 2022-09-26 17:50 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, emilyshaffer

> > +status.parallelSubmodules::
> > +     When linkgit:git-status[1] is run in a superproject with
> > +     submodules, a status subprocess is spawned for every submodule.
> > +     This option specifies the number of submodule status subprocesses
> > +     to run in parallel. If unset, it defaults to 1.
>
> Why do we default to 1, instead of e.g. grep.threads defaulting to the
> "cores available"?

In my cover letter, I noted that with too many processes, status starts to
slow down (but is still better than the baseline). This is because the
expensive part of status is IO, specifically reading objects from the index.
Much of the speedup of this patch comes from taking advantage of the
ability to do parallel reads on an SSD, rather than splitting up the work
of status. However, this doesn't work with an HDD, so status may
actually be slower than baseline with multiple processes since there is
now scheduling/switching overhead.

>
> > +struct submodule_status_util {
> > +     int changed, ignore_untracked;
>
> nit: we tend not to group by type, which also makes adding API docs
> later easier.
>
> Except that we tend to do that if all the things are really boolean flags, which these are, so maybe:
>
>         unsigned int changed:1,
>                      ignore_untracked:1;
>
> ?

ack.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/4] diff-lib: parallelize run_diff_files for submodules
  2022-09-23 21:44   ` Junio C Hamano
@ 2022-09-26 19:12     ` Calvin Wan
  0 siblings, 0 replies; 31+ messages in thread
From: Calvin Wan @ 2022-09-26 19:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, emilyshaffer

> You make it sound as if this is only for "git status", but shouldn't
> it benefit the usual "git diff" the same way when you have
> submodules that can be dirty?

It should also! I'll reword the commit message.

> > Add config option status.parallelSubmodules to set the maximum number
> > of parallel jobs.
>
> Configuration is fine, but I am having a hard time justifying the
> addition of an extra parameter to run_diff_files().  It might be
> more palatable to give a new bit (default off) in the rev structure
> that tells it that it is OK to go into the "defer_submodule_status"
> mode, if we absolutely want to change the behaviour of
> run_diff_files() only for a single caller or something, but I doubt
> it should even be needed.
>
> I cannot think of a sane user interface that says "when
> run_diff_files() is invoked as an implementation detail of 'status',
> use this value, and when it is running for another command 'foo',
> use this other value".  How would a user decide to pick a different
> value for different commands?
>
> Letting a single configuration variable to decide the degree of
> parallelism inside run_diff_files() would be sufficient, and we
> shouldn't have to touch all the existing calling sites of
> run_diff_files() all over the place.  If you absolutely want to do
> this, I'd rather see the new member for the configuration variable
> not in wt_status but in rev_info.

I agree the configuration variable should be in rev_info and not
be specific to status since other callers can take advantage of it.

>
> > +status.parallelSubmodules::
> > +     When linkgit:git-status[1] is run in a superproject with
> > +     submodules, a status subprocess is spawned for every submodule.
> > +     This option specifies the number of submodule status subprocesses
> > +     to run in parallel. If unset, it defaults to 1.
>
> As I said, I am not sure <cmd>.parallelSubmodules per command makes
> much sense.  "If unset, it defaults to" sounds a bit redundant (if
> it is explicitly set, the default value should not matter).

ack.

> > @@ -138,6 +153,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
> >               struct cache_entry *ce = istate->cache[i];
> >               int changed;
> >               unsigned dirty_submodule = 0;
> > +             int defer_submodule_status = revs && revs->repo &&
> > +                                                     !strcmp(revs->repo->gitdir, ".git");
>
> What is this overly deeply indented comparison with ".git" doing?
> What are we checking and why?  Is that something we need to do for
> each and every active_cache[] entry, or isn't it constant over the
> loop?

It is checking to see whether we are in the superproject or not, since
it doesn't make sense to parallelize status in a submodule subprocess.
It doesn't need to be in the loop though -- will move out.

> > +             int ignore_untracked_in_submodules;
> >
> >               if (diff_can_quit_early(&revs->diffopt))
> >                       break;
> > @@ -269,11 +287,36 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
> >                       }
> >
> >                       changed = match_stat_with_submodule(&revs->diffopt, ce, &st,
> > +                                                     ce_option, &dirty_submodule,
> > +                                                     &defer_submodule_status,
> > +                                                     &ignore_untracked_in_submodules);
> >                       newmode = ce_mode_from_stat(ce, st.st_mode);
> > +                     if (defer_submodule_status) {
> > +                             struct string_list_item *item =
> > +                                                             string_list_append(&submodules, ce->name);
> > +                             struct submodule_status_util *util = xmalloc(sizeof(*util));
> > +                             util->changed = changed;
> > +                             util->dirty_submodule = 0;
> > +                             util->ignore_untracked = ignore_untracked_in_submodules;
> > +                             util->newmode = newmode;
> > +                             util->ce = ce;
> > +                             item->util = util;
> > +                             continue;
>
> This makes me wonder if defer_submodule_status should be a string
> list that receives the punted submodules---iow, don't these details
> belong to match_stat_with_submodule() rather than its caller here?

That makes sense. I can definitely clean up this section and
match_stat_with_submodules() more.

>
> Even better may be to start a new child task for the submodule here,
> letting it work while the parent process moves on to the next entry
> in the active_cache[].  I do not know if you thought about doing it
> that way.

I think the implementation complexity of doing it this way would be
very high -- basically reimplementing run_processes_parallel(), but
within this loop. And the benefit of doing so I think is not worth it -- at
that point, I might as well parallelize the entire function.

> > +             for (int i = 0; i < submodules.nr; i++) {
>
> We still do not allow "for (type var = init; ...)" if I am not
> mistaken.  Check the coding guidelines.

ack.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/4] diff-lib: parallelize run_diff_files for submodules
  2022-09-25 13:59   ` Phillip Wood
  2022-09-26 17:11     ` Junio C Hamano
@ 2022-09-26 19:22     ` Calvin Wan
  1 sibling, 0 replies; 31+ messages in thread
From: Calvin Wan @ 2022-09-26 19:22 UTC (permalink / raw)
  To: phillip.wood; +Cc: git, emilyshaffer, Ævar Arnfjörð Bjarmason

> I suspect in the future we may want to parallelize other commands for
> submodules in which case a more general name such as submodules.threads
> might be a better choice. The speed up in the cover letter is
> impressive, could this be safely enabled by default?

Unfortunately not. To reiterate the answer I gave to Avar:
In my cover letter, I noted that with too many processes, status starts to
slow down (but is still better than the baseline). This is because the
expensive part of status is IO, specifically reading objects from the index.
Much of the speedup of this patch comes from taking advantage of the
ability to do parallel reads on an SSD, rather than splitting up the work
of status. However, this doesn't work with an HDD, so status may
actually be slower than baseline with multiple processes since there is
now scheduling/switching overhead.

>
> > index fcf9c85947..c5147a7952 100644
> > --- a/builtin/commit.c
> > +++ b/builtin/commit.c
> > @@ -1468,6 +1468,12 @@ static int git_status_config(const char *k, const char *v, void *cb)
> >               s->detect_rename = git_config_rename(k, v);
> >               return 0;
> >       }
> > +     if (!strcmp(k, "status.parallelsubmodules")) {
> > +             s->parallel_jobs_submodules = git_config_int(k, v);
> > +             if (s->parallel_jobs_submodules < 0)
> > +                     die(_("status.parallelsubmodules cannot be negative"));
>
> What does a value of zero mean?

Looking at my code I set it to 1 if it's zero, but I should update the logic to
something more reasonable as Junio suggested.

>
> > diff --git a/diff-lib.c b/diff-lib.c
> > index 2e148b79e6..ec745755fc 100644
> > --- a/diff-lib.c
> > +++ b/diff-lib.c
>
> > -int run_diff_files(struct rev_info *revs, unsigned int option)
> > +int run_diff_files(struct rev_info *revs, unsigned int option, int parallel_jobs)
>
> Another possibility would be to add a member to struct diff_opts, rather
> than changing the function signature here, I'm wondering what the trade
> offs of the two approaches are. Also seeing all the callers from other
> commands being changed made me wonder if they would benefit from
> parallelizing submodules as well. There aren't any tests - could we use
> GIT_TRACE2 to check that we are running the submodule diffs in parallel?

Adding the option to rev_info seems like the best way forward. I neglected to
write tests for this patch since the parallelism comes from
run_processes_parallel() which already has tests for that. But maybe it is a
good idea to also add a test with GIT_TRACE2 for a sanity check.

Thanks!

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/4] run-command: add pipe_output to run_processes_parallel
  2022-09-26 17:31     ` Calvin Wan
@ 2022-09-27  4:45       ` Junio C Hamano
  2022-09-27 18:10         ` Calvin Wan
  2022-09-27  9:05       ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 31+ messages in thread
From: Junio C Hamano @ 2022-09-27  4:45 UTC (permalink / raw)
  To: Calvin Wan; +Cc: git, emilyshaffer

Calvin Wan <calvinwan@google.com> writes:

> You are correct that storing unbounded output doesn't seem like a good
> idea. One idea I have is to parse output during the periodic collection rather
> than waiting till the end. The other idea I had was to add another
> "git status --porcelain" option that would only output the necessary
> pieces of information so we wouldn't have to bother with worrying about
> unbounded output.
>
> Any other thoughts as to how I can workaround this?

I wonder if you can arrange not to let them make unbounded progress?

In order to run diff-files with path A B C D E ... where B and D are
submodules and others are not submodules, you do not have to run and
finish comparison for B and D before you can do the comparison for
other paths, in order to preserve the proper output order.  You can
start child task for B and D and arrange so that they will run for
any other submodules, and then you

 - run comparison for A.  The child task for B and D may be running
   and starting to talk back to you, in which case their write may
   get stuck waiting for you to read from them, but that is OK, as
   you will read from them shortly.

 - wait for the child task for B.  This is done by reading from the
   pipe connected to it and waiting for its death synchronously.
   The child task for D is still running and may be making progress,
   but you are not obligated to read its output to the end.  You can
   postpone reading to conserve memory and it will fill the pipe and
   stall automatically.  Then accept the result for B.

 - run comparison for C.

 - wait for the child task for D.  You may have some data read
   already while dealing with B, but you may still have some reading
   to do at this point.  Let it finish synchronously.

 - run comparison for E.

etc., perhaps?


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/4] run-command: add pipe_output to run_processes_parallel
  2022-09-26 17:31     ` Calvin Wan
  2022-09-27  4:45       ` Junio C Hamano
@ 2022-09-27  9:05       ` Ævar Arnfjörð Bjarmason
  2022-09-27 17:55         ` Calvin Wan
  1 sibling, 1 reply; 31+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-09-27  9:05 UTC (permalink / raw)
  To: Calvin Wan; +Cc: Junio C Hamano, git, emilyshaffer


On Mon, Sep 26 2022, Calvin Wan wrote:

>>  * Why are we configuring an API behaviour via a global variable in
>>    21st century?
>
> I was mimicking how "ungroup" worked, but now that Avar mentions
> that pattern was for a quick regression fix, I can fix it to pass it in as a
> parameter.

I don't know if you started this work or not, but I had a WIP rebase of
those on-list patches lying around in my tree, which I'm rebasing on
"master" currently.

Some of it's a bit tricky with the in-tree activity since then, I'll
send them in when they're ready, maybe you already did your own rebasing
of them, or maybe it helps & you'd like to base a re-submission on top
of them.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/4] run-command: add pipe_output to run_processes_parallel
  2022-09-26 16:59     ` Calvin Wan
@ 2022-09-27 10:52       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 31+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-09-27 10:52 UTC (permalink / raw)
  To: Calvin Wan; +Cc: git, emilyshaffer


On Mon, Sep 26 2022, Calvin Wan wrote:

>> On the implementation:
>>
>> > + * If the "pipe_output" option is specified, the output will be piped
>> > + * to task_finished_fn in the "struct strbuf *out" variable. The output
>> > + * will still be printed unless the callback resets the strbuf. The
>> > + * "pipe_output" option can be enabled by setting the global
>> > + * "run_processes_parallel_pipe_output" to "1" before invoking
>> > + * run_processes_parallel(), it will be set back to "0" as soon as the
>> > + * API reads that setting.
>>
>> ...okey, but...
>>
>> > +static int task_finished_pipe_output(int result,
>> > +                      struct strbuf *err,
>> > +                      void *pp_cb,
>> > +                      void *pp_task_cb)
>> > +{
>> > +     if (err && pipe_output) {
>> > +             fprintf(stderr, "%s", err->buf);
>> > +             strbuf_reset(err);
>>
>> ...my memory's hazy, and I haven't re-logged in any detail, but is it
>> really the API interface here that the "output" callback function is
>> responsible for resetting the strbuf that the API gives to it?
>>
>> That seems backwards to me, and e.g. a look at "start_failure" shows
>> that we strbuf_reset() the "err".
>>
>> What's the point of doing it in the API consumer? If it doesn't do it
>> we'll presumably keep accumulating output. Is there a use-case for that?
>>
>> Or perhaps it's not needed & this is really just misleading boilerplate?
>
> Ultimately it is not needed -- I added it as an example to showcase that
> the output is correctly being piped to "task_finished_pipe_output". The
> reset is necessary in this case to prevent the output from being printed
> twice. I'm not sure how exactly else I would go about testing "pipe_output".

If that's the intent then having that reset there seems to me to be
doing the exact opposite of what you want.

If the API is broken and passing the output along twice without clearing
it in-between the two calls your strbuf_reset() would be sweeping that
issue under the rug, that API brokenness would be "repaired" by your
test.

Whereas if you remove the strbuf_reset() it should behave as it does
now, and if it doesn't the API itself is broken, i.e. after calling the
callback it should be resetting the buffer.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/4] run-command: add pipe_output to run_processes_parallel
  2022-09-27  9:05       ` Ævar Arnfjörð Bjarmason
@ 2022-09-27 17:55         ` Calvin Wan
  2022-09-27 19:34           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 31+ messages in thread
From: Calvin Wan @ 2022-09-27 17:55 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Junio C Hamano, git, emilyshaffer

> I don't know if you started this work or not, but I had a WIP rebase of
> those on-list patches lying around in my tree, which I'm rebasing on
> "master" currently.
>
> Some of it's a bit tricky with the in-tree activity since then, I'll
> send them in when they're ready, maybe you already did your own rebasing
> of them, or maybe it helps & you'd like to base a re-submission on top
> of them.

I have not started on this part yet. Do you have an estimate as to when
you're planning on submitting your rebase? I'm also considering not using
my pipe_output option and going a different route since there is the
issue of dealing with potentially unbounded output.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/4] run-command: add pipe_output to run_processes_parallel
  2022-09-27  4:45       ` Junio C Hamano
@ 2022-09-27 18:10         ` Calvin Wan
  2022-09-27 21:40           ` Junio C Hamano
  0 siblings, 1 reply; 31+ messages in thread
From: Calvin Wan @ 2022-09-27 18:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, emilyshaffer

> I wonder if you can arrange not to let them make unbounded progress?
>
> In order to run diff-files with path A B C D E ... where B and D are
> submodules and others are not submodules, you do not have to run and
> finish comparison for B and D before you can do the comparison for
> other paths, in order to preserve the proper output order.  You can
> start child task for B and D and arrange so that they will run for
> any other submodules, and then you

There is no need to preserve proper output order, as the output is
sorted at the end.

>  - run comparison for A.  The child task for B and D may be running
>    and starting to talk back to you, in which case their write may
>    get stuck waiting for you to read from them, but that is OK, as
>    you will read from them shortly.
>
>  - wait for the child task for B.  This is done by reading from the
>    pipe connected to it and waiting for its death synchronously.
>    The child task for D is still running and may be making progress,
>    but you are not obligated to read its output to the end.  You can
>    postpone reading to conserve memory and it will fill the pipe and
>    stall automatically.  Then accept the result for B.
>
>  - run comparison for C.
>
>  - wait for the child task for D.  You may have some data read
>    already while dealing with B, but you may still have some reading
>    to do at this point.  Let it finish synchronously.
>
>  - run comparison for E.
>
> etc., perhaps?

I understand the idea you're suggesting and I think it would work,
but I'm worried about the overhead this would create. I would rather
implement a separate "git status --porcelain" output for just this
submodule case so 1. we would not have to worry about unbounded
output and 2. both the output parsing and the command could be
optimized. In parse_status_porcelain, the function returns early if
the submodule is found to have untracked and modified changes.
This early termination can happen on the command side, rather
than the parsing side.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/4] diff-lib: parallelize run_diff_files for submodules
  2022-09-22 23:29 ` [PATCH 4/4] diff-lib: parallelize run_diff_files for submodules Calvin Wan
                     ` (2 preceding siblings ...)
  2022-09-25 13:59   ` Phillip Wood
@ 2022-09-27 18:40   ` Emily Shaffer
  3 siblings, 0 replies; 31+ messages in thread
From: Emily Shaffer @ 2022-09-27 18:40 UTC (permalink / raw)
  To: Calvin Wan; +Cc: git

On Thu, Sep 22, 2022 at 11:29:47PM +0000, Calvin Wan wrote:
> diff --git a/builtin/commit.c b/builtin/commit.c
> index fcf9c85947..c5147a7952 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -1468,6 +1468,12 @@ static int git_status_config(const char *k, const char *v, void *cb)
>  		s->detect_rename = git_config_rename(k, v);
>  		return 0;
>  	}
> +	if (!strcmp(k, "status.parallelsubmodules")) {
> +		s->parallel_jobs_submodules = git_config_int(k, v);
> +		if (s->parallel_jobs_submodules < 0)
> +			die(_("status.parallelsubmodules cannot be negative"));

Seems odd to have this check when all through the rest of the code
you're happily setting parallel_jobs=-1.

Although I don't think I disagree with the check ;) rather, I disagree
with the semantics used elsewhere for this magic -1 value.

> +		return 0;
> +	}
>  	return git_diff_ui_config(k, v, NULL);
>  }
>  
> diff --git a/diff-lib.c b/diff-lib.c
> index 2e148b79e6..ec745755fc 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -65,14 +65,19 @@ static int check_removed(const struct index_state *istate, const struct cache_en
>   * Return 1 when changes are detected, 0 otherwise. If the DIRTY_SUBMODULES
>   * option is set, the caller does not only want to know if a submodule is
>   * modified at all but wants to know all the conditions that are met (new
> - * commits, untracked content and/or modified content).
> + * commits, untracked content and/or modified content). If
> + * defer_submodule_status bit is set, dirty_submodule will be left to the
> + * caller to set. defer_submodule_status can also be set to 0 in this
> + * function if there is no need to check if the submodule is modified.
>   */

Should the defer optimization be moved to its own pack? I think it is
somewhat orthogonal to deciding whether to parallelize or not, yes?

>  static int match_stat_with_submodule(struct diff_options *diffopt,
>  				     const struct cache_entry *ce,
>  				     struct stat *st, unsigned ce_option,
> -				     unsigned *dirty_submodule)
> +				     unsigned *dirty_submodule, int *defer_submodule_status,
> +					 int *ignore_untracked_in_submodules)
>  {
>  	int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option);
> +	int defer = 0;
>  	struct diff_flags orig_flags = diffopt->flags;
>  	if (!S_ISGITLINK(ce->ce_mode))
>  		goto cleanup;
> @@ -83,11 +88,20 @@ static int match_stat_with_submodule(struct diff_options *diffopt,
>  		goto cleanup;
>  	}
>  	if (!diffopt->flags.ignore_dirty_submodules &&
> -		(!changed || diffopt->flags.dirty_submodules))
> +		(!changed || diffopt->flags.dirty_submodules)) {
> +		if (defer_submodule_status && *defer_submodule_status) {
> +			defer = 1;
> +			*ignore_untracked_in_submodules =
> +							diffopt->flags.ignore_untracked_in_submodules;
> +		} else {
>  			*dirty_submodule = is_submodule_modified(ce->name,
>  							diffopt->flags.ignore_untracked_in_submodules);
> +		}
> +	}
>  cleanup:
>  	diffopt->flags = orig_flags;
> +	if (defer_submodule_status)
> +		*defer_submodule_status = defer;
>  	return changed;
>  }
>  
> @@ -117,7 +131,7 @@ static void finish_run_diff_files(struct rev_info *revs,
>  			    ce->name, 0, dirty_submodule);
>  }
>  
> -int run_diff_files(struct rev_info *revs, unsigned int option)
> +int run_diff_files(struct rev_info *revs, unsigned int option, int parallel_jobs)

The meaning of 'parallel_jobs' isn't documented anywhere in this patch
that I can see, but all over we're setting it to some magic -1 value,
right? What's that about?

>  {
>  	int entries, i;
>  	int diff_unmerged_stage = revs->max_count;
> @@ -125,6 +139,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>  			      ? CE_MATCH_RACY_IS_DIRTY : 0);
>  	uint64_t start = getnanotime();
>  	struct index_state *istate = revs->diffopt.repo->index;
> +	struct string_list submodules = STRING_LIST_INIT_NODUP;
>  
>  	diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/");
>  
> @@ -138,6 +153,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>  		struct cache_entry *ce = istate->cache[i];
>  		int changed;
>  		unsigned dirty_submodule = 0;
> +		int defer_submodule_status = revs && revs->repo &&
> +							!strcmp(revs->repo->gitdir, ".git");
> +		int ignore_untracked_in_submodules;

Does this actually compile with -Wall? I thought it is no good to do an
inline function call in the middle of the allocation block?

>  
>  		if (diff_can_quit_early(&revs->diffopt))
>  			break;
> @@ -269,11 +287,36 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>  			}
>  
>  			changed = match_stat_with_submodule(&revs->diffopt, ce, &st,
> -							    ce_option, &dirty_submodule);
> +							ce_option, &dirty_submodule,
> +							&defer_submodule_status,
> +							&ignore_untracked_in_submodules);
>  			newmode = ce_mode_from_stat(ce, st.st_mode);
> +			if (defer_submodule_status) {
> +				struct string_list_item *item =
> +								string_list_append(&submodules, ce->name);
> +				struct submodule_status_util *util = xmalloc(sizeof(*util));
> +				util->changed = changed;
> +				util->dirty_submodule = 0;
> +				util->ignore_untracked = ignore_untracked_in_submodules;
> +				util->newmode = newmode;
> +				util->ce = ce;
> +				item->util = util;
> +				continue;
> +			}
>  		}
>  		finish_run_diff_files(revs, ce, istate, changed, dirty_submodule, newmode);
>  	}
> +	if (submodules.nr > 0) {
> +		if (get_submodules_status(revs->repo, &submodules,
> +						parallel_jobs > 0 ? parallel_jobs : 1))

We're not doing a lookup in case of parallel_jobs<0, right? So why even
bother having this special meaning for -1? It's undocumented in the
header for run_diff_files anyway. In fact, I'd expect parallel_jobs=-1
to mean "as many jobs as possible" - but here you're using it just to
mean 1.

Why not let parallel_jobs be unsigned, do special casing for 0 if you
want to use that to mean "recommended # of jobs", and otherwise take it
at face value? Am I missing something in my skim over the patch?

> +			BUG("Submodule status failed");
> +		for (int i = 0; i < submodules.nr; i++) {
> +			struct submodule_status_util *util = submodules.items[i].util;
> +
> +			finish_run_diff_files(revs, util->ce, NULL, util->changed,
> +							util->dirty_submodule, util->newmode);
> +		}
> +	}
>  	diffcore_std(&revs->diffopt);
>  	diff_flush(&revs->diffopt);
>  	trace_performance_since(start, "diff-files");
> @@ -321,7 +364,7 @@ static int get_stat_data(const struct index_state *istate,
>  			return -1;
>  		}
>  		changed = match_stat_with_submodule(diffopt, ce, &st,
> -						    0, dirty_submodule);
> +						    0, dirty_submodule, NULL, NULL);
>  		if (changed) {
>  			mode = ce_mode_from_stat(ce, st.st_mode);
>  			oid = null_oid();
> diff --git a/diff.h b/diff.h
> index 8ae18e5ab1..5a6a615381 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -627,7 +627,7 @@ void diff_get_merge_base(const struct rev_info *revs, struct object_id *mb);
>  #define DIFF_SILENT_ON_REMOVED 01
>  /* report racily-clean paths as modified */
>  #define DIFF_RACY_IS_MODIFIED 02
> -int run_diff_files(struct rev_info *revs, unsigned int option);
> +int run_diff_files(struct rev_info *revs, unsigned int option, int parallel_jobs);

See above; probably documenting parallel_jobs should happen here.

>  
>  #define DIFF_INDEX_CACHED 01
>  #define DIFF_INDEX_MERGE_BASE 02
> diff --git a/submodule.c b/submodule.c
> index 91213ba83c..15729bb327 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1362,6 +1362,20 @@ int submodule_touches_in_range(struct repository *r,
>  	return ret;
>  }
>  

I actually might like to see the conversion to using
run_processes_parallel happen in an earlier patch, with jobs=1. Might
make the review load easier.

That said, in previous series which took that approach, because it
worked correctly with jobs=1 we missed a bug in the writing of the new
*_task_finished() callback.... Maybe that doesn't apply here as the
jobs=N patch would follow on in the same series, so you'd test it right
away?

> +struct submodule_parallel_status {
> +	int index_count;
> +	int result;
> +
> +	struct string_list *submodule_names;
> +	struct repository *r;
> +
> +	/* Pending statuses by OIDs */
> +	struct status_task **oid_status_tasks;
> +	int oid_status_tasks_nr, oid_status_tasks_alloc;
> +};
> +
> +#define SPS_INIT { 0 }
> +
>  struct submodule_parallel_fetch {
>  	/*
>  	 * The index of the last index entry processed by
> @@ -1444,6 +1458,12 @@ struct fetch_task {
>  	struct oid_array *commits; /* Ensure these commits are fetched */
>  };
>  
> +struct status_task {
> +	struct repository *repo;
> +	const char *path;
> +	int ignore_untracked;
> +};
> +
>  /**
>   * When a submodule is not defined in .gitmodules, we cannot access it
>   * via the regular submodule-config. Create a fake submodule, which we can
> @@ -1547,6 +1567,41 @@ static struct fetch_task *fetch_task_create(struct submodule_parallel_fetch *spf
>  	return NULL;
>  }
>  
> +static struct status_task *
> +get_status_task_from_index(struct submodule_parallel_status *sps,
> +			  struct strbuf *err)
The spacing of these long function signatures feels odd; I'd much rather
see something like:

static struct status_task* get_status_task_from_index(
  (args)

Plus, as this get_status_task_from_index() exists only to be called by
get_next_submodule_status(), I'd find it easier to read if this function
was closer to that one. But that's a pretty tiny nit ;)

> +{
> +	for (; sps->index_count < sps->submodule_names->nr; sps->index_count++) {
> +		struct submodule_status_util *util = sps->submodule_names->items[sps->index_count].util;
> +		const struct cache_entry *ce = util->ce;
> +		struct status_task *task;
> +		struct strbuf buf = STRBUF_INIT;
> +		const char *git_dir;
> +
> +		strbuf_addf(&buf, "%s/.git", ce->name);
> +		git_dir = read_gitfile(buf.buf);
> +		if (!git_dir)
> +			git_dir = buf.buf;
> +		if (!is_git_directory(git_dir)) {
> +			if (is_directory(git_dir))
> +				die(_("'%s' not recognized as a git repository"), git_dir);
> +			strbuf_release(&buf);
> +			/* The submodule is not checked out, so it is not modified */
> +			util->dirty_submodule = 0;
> +			continue;
> +		}
> +		strbuf_release(&buf);
> +
> +		task = xmalloc(sizeof(*task));
> +		memset(task, 0, sizeof(*task));
> +		task->path = ce->name;
> +		task->ignore_untracked = util->ignore_untracked;
> +		sps->index_count++;
> +		return task;
> +	}
> +	return NULL;
> +}
> +
>  static struct fetch_task *
>  get_fetch_task_from_index(struct submodule_parallel_fetch *spf,
>  			  struct strbuf *err)
> @@ -1744,6 +1799,16 @@ static int fetch_start_failure(struct strbuf *err,
>  	return 0;
>  }
>  
> +static int status_start_failure(struct strbuf *err,
> +			       void *cb, void *task_cb)
> +{
> +	struct submodule_parallel_status *sps = cb;
> +
> +	sps->result = 1;
> +	return 0;
> +}
> +
> +
>  static int commit_missing_in_sub(const struct object_id *oid, void *data)
>  {
>  	struct repository *subrepo = data;
> @@ -1790,6 +1855,30 @@ static int parse_status_porcelain(char *buf, unsigned *dirty_submodule, int igno
>  	return 0;
>  }
>  
> +static int status_finish(int retvalue, struct strbuf *err,
> +			void *cb, void *task_cb)
> +{
> +	struct submodule_parallel_status *sps = cb;
> +	struct status_task *task = task_cb;
> +	struct string_list list = STRING_LIST_INIT_DUP;
> +	struct string_list_item *it = string_list_lookup(sps->submodule_names,
> +											task->path);
> +	struct submodule_status_util *util = it->util;
> +
> +	int i;
> +
> +	string_list_split(&list, err->buf, '\n', -1);
> +
> +	for (i = 0; i < list.nr; i++) {
> +		if (parse_status_porcelain(list.items[i].string,
> +							&util->dirty_submodule, util->ignore_untracked))
> +			break;
> +	}
> +	strbuf_reset(err);
> +
> +	return 0;
> +}
> +
>  static int fetch_finish(int retvalue, struct strbuf *err,
>  			void *cb, void *task_cb)
>  {
> @@ -1943,6 +2032,59 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked)
>  	return dirty_submodule;
>  }
>  
> +static int get_next_submodule_status(struct child_process *cp,
> +						struct strbuf *err, void *data, void **task_cb)
> +{
> +	struct submodule_parallel_status *sps = data;
> +	struct status_task *task = get_status_task_from_index(sps, err);
> +
> +	int ignore_untracked;
> +
> +	if (!task) {
> +		return 0;
> +	}
> +
> +	ignore_untracked = task->ignore_untracked;
> +
> +	child_process_init(cp);
> +	prepare_submodule_repo_env_in_gitdir(&cp->env);
> +
> +	strvec_init(&cp->args);
> +	strvec_pushl(&cp->args, "status", "--porcelain=2", NULL);
> +	if (ignore_untracked)
> +		strvec_push(&cp->args, "-uno");
> +
> +	prepare_submodule_repo_env(&cp->env);
> +	cp->git_cmd = 1;
> +	cp->no_stdin = 1;
> +	cp->dir = task->path;
> +	*task_cb = task;
> +	return 1;
> +}
> +
> +int get_submodules_status(struct repository *r,
> +			struct string_list *submodules,
> +		    int max_parallel_jobs)
> +{
> +	struct submodule_parallel_status sps = SPS_INIT;
> +
> +	sps.r = r;
> +
> +	if (repo_read_index(r) < 0)
> +		die(_("index file corrupt"));
> +
> +	sps.submodule_names = submodules;
> +	string_list_sort(sps.submodule_names);
> +	run_processes_parallel_pipe_output = 1;
> +	run_processes_parallel_tr2(max_parallel_jobs,
> +				get_next_submodule_status,
> +				status_start_failure,
> +				status_finish,
> +				&sps,
> +				"submodule", "parallel/status");
> +	return sps.result;
> +}
> +
>  int submodule_uses_gitfile(const char *path)
>  {
>  	struct child_process cp = CHILD_PROCESS_INIT;
> diff --git a/submodule.h b/submodule.h
> index bfaa9da186..18a42c64ce 100644
> --- a/submodule.h
> +++ b/submodule.h
> @@ -41,6 +41,12 @@ struct submodule_update_strategy {
>  	.type = SM_UPDATE_UNSPECIFIED, \
>  }
>  
> +struct submodule_status_util {
> +	int changed, ignore_untracked;
> +	unsigned dirty_submodule, newmode;
> +	struct cache_entry *ce;
> +};
> +
>  int is_gitmodules_unmerged(struct index_state *istate);
>  int is_writing_gitmodules_ok(void);
>  int is_staging_gitmodules_ok(struct index_state *istate);
> @@ -94,6 +100,9 @@ int fetch_submodules(struct repository *r,
>  		     int command_line_option,
>  		     int default_option,
>  		     int quiet, int max_parallel_jobs);
> +int get_submodules_status(struct repository *r,
> +			 struct string_list *submodules,
> +		     int max_parallel_jobs);
>  unsigned is_submodule_modified(const char *path, int ignore_untracked);
>  int submodule_uses_gitfile(const char *path);
>  
> diff --git a/wt-status.c b/wt-status.c
> index 867e3e417e..9864484f81 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -615,7 +615,7 @@ static void wt_status_collect_changes_worktree(struct wt_status *s)
>  	rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit;
>  	rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score;
>  	copy_pathspec(&rev.prune_data, &s->pathspec);
> -	run_diff_files(&rev, 0);
> +	run_diff_files(&rev, 0, s->parallel_jobs_submodules);
>  	release_revisions(&rev);
>  }
>  
> @@ -1149,7 +1149,7 @@ static void wt_longstatus_print_verbose(struct wt_status *s)
>  		setup_work_tree();
>  		rev.diffopt.a_prefix = "i/";
>  		rev.diffopt.b_prefix = "w/";
> -		run_diff_files(&rev, 0);
> +		run_diff_files(&rev, 0, -1);
>  	}
>  	release_revisions(&rev);
>  }
> @@ -2544,7 +2544,7 @@ int has_unstaged_changes(struct repository *r, int ignore_submodules)
>  	}
>  	rev_info.diffopt.flags.quick = 1;
>  	diff_setup_done(&rev_info.diffopt);
> -	result = run_diff_files(&rev_info, 0);
> +	result = run_diff_files(&rev_info, 0, -1);
>  	result = diff_result_code(&rev_info.diffopt, result);
>  	release_revisions(&rev_info);
>  	return result;
> diff --git a/wt-status.h b/wt-status.h
> index ab9cc9d8f0..2ea2317715 100644
> --- a/wt-status.h
> +++ b/wt-status.h
> @@ -131,6 +131,7 @@ struct wt_status {
>  	enum wt_status_format status_format;
>  	struct wt_status_state state;
>  	struct object_id oid_commit; /* when not Initial */
> +	int parallel_jobs_submodules;
>  
>  	/* These are computed during processing of the individual sections */
>  	int committable;
> -- 
> 2.37.3.998.g577e59143f-goog
> 

I feel like we're missing tests exercising status.parallelSubmodules=N?

 - Emily

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/4] run-command: add pipe_output to run_processes_parallel
  2022-09-27 17:55         ` Calvin Wan
@ 2022-09-27 19:34           ` Ævar Arnfjörð Bjarmason
  2022-09-27 20:45             ` Calvin Wan
  0 siblings, 1 reply; 31+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-09-27 19:34 UTC (permalink / raw)
  To: Calvin Wan; +Cc: Junio C Hamano, git, emilyshaffer


On Tue, Sep 27 2022, Calvin Wan wrote:

>> I don't know if you started this work or not, but I had a WIP rebase of
>> those on-list patches lying around in my tree, which I'm rebasing on
>> "master" currently.
>>
>> Some of it's a bit tricky with the in-tree activity since then, I'll
>> send them in when they're ready, maybe you already did your own rebasing
>> of them, or maybe it helps & you'd like to base a re-submission on top
>> of them.
>
> I have not started on this part yet. Do you have an estimate as to when
> you're planning on submitting your rebase?

I hacked this up today, it passes CI at least:

	https://github.com/git/git/compare/master...avar:avar/hook-run-process-parallel-tty-regression-2-argument-passing

> I'm also considering not using
> my pipe_output option and going a different route since there is the
> issue of dealing with potentially unbounded output.

Okey, if it's not blocking a re-submission of yours then I'll definitely
wait until after the RC to submit the above, at least, but if you'd like
it earlier...


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/4] run-command: add pipe_output to run_processes_parallel
  2022-09-27 19:34           ` Ævar Arnfjörð Bjarmason
@ 2022-09-27 20:45             ` Calvin Wan
  2022-09-28  5:40               ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 31+ messages in thread
From: Calvin Wan @ 2022-09-27 20:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Junio C Hamano, git, emilyshaffer

> Okey, if it's not blocking a re-submission of yours then I'll definitely
> wait until after the RC to submit the above, at least, but if you'd like
> it earlier...

I just tested a solution where I add pipe_output_fn to
parallel_processes instead of having it as a variable. Not only does
it work but it also solves my unbounded output problem, so this is
no longer blocked resubmission.

Thanks!

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/4] run-command: add pipe_output to run_processes_parallel
  2022-09-27 18:10         ` Calvin Wan
@ 2022-09-27 21:40           ` Junio C Hamano
  0 siblings, 0 replies; 31+ messages in thread
From: Junio C Hamano @ 2022-09-27 21:40 UTC (permalink / raw)
  To: Calvin Wan; +Cc: git, emilyshaffer

Calvin Wan <calvinwan@google.com> writes:

> ... I would rather
> implement a separate "git status --porcelain" output for just this
> submodule case so 1. we would not have to worry about unbounded
> output and 2. both the output parsing and the command could be
> optimized.

Sounds good.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/4] run-command: add pipe_output to run_processes_parallel
  2022-09-27 20:45             ` Calvin Wan
@ 2022-09-28  5:40               ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 31+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-09-28  5:40 UTC (permalink / raw)
  To: Calvin Wan; +Cc: Junio C Hamano, git, emilyshaffer


On Tue, Sep 27 2022, Calvin Wan wrote:

>> Okey, if it's not blocking a re-submission of yours then I'll definitely
>> wait until after the RC to submit the above, at least, but if you'd like
>> it earlier...
>
> I just tested a solution where I add pipe_output_fn to
> parallel_processes instead of having it as a variable. Not only does
> it work but it also solves my unbounded output problem, so this is
> no longer blocked resubmission.

You mean the internal "struct parallel_processes"? How do you get the
parameter there, presumably by passing it to
run_processes_parallel{,_tr2}() as a new parameter?

The reason for why the "ungroup" wasn't added as a parameter at the time
was to avoid the churn of changing every single caller of the API.

But it should really be a "parameter", and doing it via a struct means
adding such parameters doesn't need to change every single caller.

Then we have outstanding WIP patches for the hook.[ch] API which needed
to add two other parameters...

So I think first ripping off the band-aid of making it painless to
extend the interface is the right thing to do, unless I've missed some
way of doing it that you've just discovered...





^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2022-09-28  5:45 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-22 23:29 [PATCH 0/4] submodule: parallelize status Calvin Wan
2022-09-22 23:29 ` [PATCH 1/4] run-command: add pipe_output to run_processes_parallel Calvin Wan
2022-09-23  7:52   ` Ævar Arnfjörð Bjarmason
2022-09-26 16:59     ` Calvin Wan
2022-09-27 10:52       ` Ævar Arnfjörð Bjarmason
2022-09-23 18:58   ` Junio C Hamano
2022-09-26 17:31     ` Calvin Wan
2022-09-27  4:45       ` Junio C Hamano
2022-09-27 18:10         ` Calvin Wan
2022-09-27 21:40           ` Junio C Hamano
2022-09-27  9:05       ` Ævar Arnfjörð Bjarmason
2022-09-27 17:55         ` Calvin Wan
2022-09-27 19:34           ` Ævar Arnfjörð Bjarmason
2022-09-27 20:45             ` Calvin Wan
2022-09-28  5:40               ` Ævar Arnfjörð Bjarmason
2022-09-22 23:29 ` [PATCH 2/4] submodule: move status parsing into function Calvin Wan
2022-09-22 23:29 ` [PATCH 3/4] diff-lib: refactor functions Calvin Wan
2022-09-23 20:36   ` Junio C Hamano
2022-09-26 17:35     ` Calvin Wan
2022-09-22 23:29 ` [PATCH 4/4] diff-lib: parallelize run_diff_files for submodules Calvin Wan
2022-09-23  8:06   ` Ævar Arnfjörð Bjarmason
2022-09-24 20:17     ` Junio C Hamano
2022-09-26 17:50     ` Calvin Wan
2022-09-23 21:44   ` Junio C Hamano
2022-09-26 19:12     ` Calvin Wan
2022-09-25 13:59   ` Phillip Wood
2022-09-26 17:11     ` Junio C Hamano
2022-09-26 19:22     ` Calvin Wan
2022-09-27 18:40   ` Emily Shaffer
2022-09-23 22:56 ` [PATCH 0/4] submodule: parallelize status Junio C Hamano
2022-09-26 16:33   ` Calvin Wan

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).