[PATCH v2 0/4] submodule: parallelize diff

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH v2 0/4] submodule: parallelize diff
       [not found] <https://lore.kernel.org/git/20220922232947.631309-1-calvinwan@google.com/>
@ 2022-10-11 23:26 ` Calvin Wan
  2022-10-12  5:52   ` Junio C Hamano
  2022-10-11 23:26 ` [PATCH v2 1/4] run-command: add pipe_output_fn to run_processes_parallel_opts Calvin Wan
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Calvin Wan @ 2022-10-11 23:26 UTC (permalink / raw)
  To: git; +Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123

Changes since v1

This series is now rebased on top of Avar's run-command refactor
series[1], which allows new functions to be easily added to
run_parallel_processes without having to change other callers.

The config option has been renamed to submodule.diffJobs. This name
accurately reflects what functionality is affected. While other APIs
parse for config options from the initial command, this config option is
parsed for in the diff-lib.c library. This is because there are so many
entry points into run_diff_files resulting in signicant code changes
required to pass in this config option.

I also wanted to pose another question to list regarding defaults for
parallel processes. For jobs that clearly scale with the number of
processes (aka jobs that are mostly processor bound), it is obvious that
setting the default number of processes to the number of available cores
is the most optimal option. However, this changes when the job is mostly
I/O bound or has a combination of I/O and processing. Looking at my use
case for `status` on a cold cache (see below), we notice that increasing
the number of parallel processes speeds up status, but after a certain
number, it actually starts slowing down. This is because `status` spends
most of its time reading from the index. While SSDs are able to do a
certain amount of reads in parallel, this limit can be significantly
smaller than the number of cores and going past that limit causes the
read head to constantly swap, slowing down `status`. With an HDD,
parallel reads aren't even possible and while I haven't tested my patch
on an HDD, I have a suspicion that adding multiple processes would
probably make `status` slower than the baseline. How should the default
be set then? In my case, the safe option would be to default it to 1,
but then many users wouldn't be able to discover this optimization.
There are also multiple places in the git documentation where we say
that setting a config option for a parallel process to 0 will result in
a "reasonable amount" of processes, which generally entails the number
of available cores. Can this be problematic for other parallel processes
that spend a significant time in I/O? Should my option even have the
option to set it to 0 given the pitalls?

Original cover letter:

When running 'git status' in a superproject, git spawns a subprocess in
series to run status for every submodule. For projects with many large
submodules, parallelizing status subprocesses can significantly speed
up the runtime for both cold and warm caches. While my initial
intention was to speedup status, it turns out that much of the runtime
spent on status is in diff-lib.c, resulting in speedups for many
different other commands that utilize this library.

Here are some timing tests from running status on the Android Open
Source Project (AOSP). My machine has an SSD and 48 cores. 
  Warm Cache:
    git 2.37
Time (mean ± σ):     17.685 s ±  2.040 s    [User: 5.041 s, System: 22.799 s]
Range (min … max):   16.168 s … 22.804 s    10 runs

    this patch (status.parallelSubmodules=1)
Time (mean ± σ):     13.102 s ±  0.500 s    [User: 4.894 s, System: 19.533 s]
Range (min … max):   12.841 s … 14.447 s    10 runs

    this patch (status.parallelSubmodules=5)
Time (mean ± σ):      3.994 s ±  0.152 s    [User: 4.998 s, System: 20.805 s]
Range (min … max):    3.744 s …  4.163 s    10 runs

    this patch (status.parallelSubmodules=10)
Time (mean ± σ):      3.445 s ±  0.085 s    [User: 5.151 s, System: 20.208 s]
Range (min … max):    3.319 s …  3.586 s    10 runs

    this patch (status.parallelSubmodules=20)
Time (mean ± σ):      3.626 s ±  0.109 s    [User: 5.087 s, System: 20.366 s]
Range (min … max):    3.438 s …  3.763 s    10 runs

We can see that there are diminishing returns and even slightly worse
performance after a certain number of max processes, but optimally
there is a speed up factor of around 5.

  Cold Cache:
    git 2.37
      mean of 3 runs: 6m32s
    this patch (status.parallelSubmodules=1)
      mean of 3 runs: 5m34s
    this patch (status.parallelSubmodules=5)
      mean of 3 runs: 2m23s
    this patch (status.parallelSubmodules=10)
      mean of 3 runs: 2m45s
    this patch (status.parallelSubmodules=20)
      mean of 3 runs: 3m23s

We can witness the same phenomenon as above and optimally there is a
speed up factor of around 2.7.

Patch 1 adds output piping to run_processes_parallel_opt so the output
from each submodule can be parsed. Patches 2 and 3 move preexisting
functionality into separate functions and refactor code to prepare
for patch 4 to implement parallelization.

Future work: The reason why status is much slower on a cold cache vs
warm cache is mainly due to refreshing the index. It is worth
investigating whether this is entirely necessary.

[1] https://lore.kernel.org/git/cover-00.15-00000000000-20220930T111343Z-avarab@gmail.com/

Calvin Wan (4):
  run-command: add pipe_output_fn to run_processes_parallel_opts
  submodule: move status parsing into function
  diff-lib: refactor match_stat_with_submodule
  diff-lib: parallelize run_diff_files for submodules

 Documentation/config/submodule.txt |  12 ++
 diff-lib.c                         | 108 ++++++++++++--
 run-command.c                      |  12 +-
 run-command.h                      |  22 +++
 submodule.c                        | 230 +++++++++++++++++++++++++----
 submodule.h                        |   9 ++
 t/helper/test-run-command.c        |  18 +++
 t/t0061-run-command.sh             |  65 ++++----
 t/t4027-diff-submodule.sh          |  19 +++
 t/t7506-status-submodule.sh        |  19 +++
 10 files changed, 441 insertions(+), 73 deletions(-)

-- 
2.38.0.rc1.362.ged0d419d3c-goog

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/4] run-command: add pipe_output_fn to run_processes_parallel_opts
       [not found] <https://lore.kernel.org/git/20220922232947.631309-1-calvinwan@google.com/>
  2022-10-11 23:26 ` [PATCH v2 0/4] submodule: parallelize diff Calvin Wan
@ 2022-10-11 23:26 ` Calvin Wan
  2022-10-12  7:58   ` Ævar Arnfjörð Bjarmason
  2022-10-11 23:26 ` [PATCH v2 2/4] submodule: move status parsing into function Calvin Wan
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Calvin Wan @ 2022-10-11 23:26 UTC (permalink / raw)
  To: git; +Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123

Add pipe_output_fn as an optionally set function in
run_process_parallel_opts. If set, output from each child process is
piped to the callback function to allow for separate parsing.

Signed-off-by: Calvin Wan <calvinwan@google.com>
---
 run-command.c               | 12 +++++--
 run-command.h               | 22 +++++++++++++
 t/helper/test-run-command.c | 18 ++++++++++
 t/t0061-run-command.sh      | 65 +++++++++++++++++++++++--------------
 4 files changed, 90 insertions(+), 27 deletions(-)

diff --git a/run-command.c b/run-command.c
index da02631933..c6090e4cb8 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1689,12 +1689,16 @@ static void pp_buffer_stderr(struct parallel_processes *pp, int output_timeout)
 	}
 }
 
-static void pp_output(struct parallel_processes *pp)
+static void pp_output(struct parallel_processes *pp,
+			const struct run_process_parallel_opts *opts)
 {
 	int i = pp->output_owner;
 
 	if (pp->children[i].state == GIT_CP_WORKING &&
 	    pp->children[i].err.len) {
+		if (opts->pipe_output)
+			opts->pipe_output(&pp->children[i].err, pp->data,
+						pp->children[i].data);
 		strbuf_write(&pp->children[i].err, stderr);
 		strbuf_reset(&pp->children[i].err);
 	}
@@ -1716,6 +1720,10 @@ static int pp_collect_finished(struct parallel_processes *pp,
 
 		code = finish_command(&pp->children[i].process);
 
+		if (opts->pipe_output)
+			opts->pipe_output(&pp->children[i].err, pp->data,
+						pp->children[i].data);
+
 		if (opts->task_finished)
 			code = opts->task_finished(code, opts->ungroup ? NULL :
 						   &pp->children[i].err, pp->data,
@@ -1803,7 +1811,7 @@ void run_processes_parallel(const struct run_process_parallel_opts *opts)
 				pp.children[i].state = GIT_CP_WAIT_CLEANUP;
 		} else {
 			pp_buffer_stderr(&pp, output_timeout);
-			pp_output(&pp);
+			pp_output(&pp, opts);
 		}
 		code = pp_collect_finished(&pp, opts);
 		if (code) {
diff --git a/run-command.h b/run-command.h
index 075bd9b9de..cb51c56ea6 100644
--- a/run-command.h
+++ b/run-command.h
@@ -440,6 +440,22 @@ typedef int (*start_failure_fn)(struct strbuf *out,
 				void *pp_cb,
 				void *pp_task_cb);
 
+/**
+ * This callback is periodically called while child processes are
+ * running and also when a child process finishes.
+ *
+ * "struct strbuf *out" contains the output collected from pp_task_cb
+ * since the last call of this function.
+ *
+ * pp_cb is the callback cookie as passed into run_processes_parallel,
+ * pp_task_cb is the callback cookie as passed into get_next_task_fn.
+ *
+ * This function is incompatible with "ungroup"
+ */
+typedef void (*pipe_output_fn)(struct strbuf *out,
+				void *pp_cb,
+				void *pp_task_cb);
+
 /**
  * This callback is called on every child process that finished processing.
  *
@@ -493,6 +509,12 @@ struct run_process_parallel_opts
 	 */
 	start_failure_fn start_failure;
 
+	/**
+	 * pipe_output: See pipe_output_fn() above. This should be
+	 * NULL unless process specific output is needed
+	 */
+	pipe_output_fn pipe_output;
+
 	/**
 	 * task_finished: See task_finished_fn() above. This can be
 	 * NULL to omit any special handling.
diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c
index 46bac2bb70..d3c3df7960 100644
--- a/t/helper/test-run-command.c
+++ b/t/helper/test-run-command.c
@@ -52,6 +52,18 @@ static int no_job(struct child_process *cp,
 	return 0;
 }
 
+static void pipe_output(struct strbuf *out,
+				void *pp_cb,
+				void *pp_task_cb)
+{
+	fprintf(stderr, "%s", out->buf);
+	/*
+	 * Resetting output to show that piped output would print the
+	 * same as other tests without the pipe_output() function set
+	 */
+	strbuf_reset(out);
+}
+
 static int task_finished(int result,
 			 struct strbuf *err,
 			 void *pp_cb,
@@ -439,6 +451,12 @@ int cmd__run_command(int argc, const char **argv)
 		opts.ungroup = 1;
 	}
 
+	if (!strcmp(argv[1], "--pipe-output")) {
+		argv += 1;
+		argc -= 1;
+		opts.pipe_output = pipe_output;
+	}
+
 	jobs = atoi(argv[2]);
 	strvec_clear(&proc.args);
 	strvec_pushv(&proc.args, (const char **)argv + 3);
diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
index 19af082750..feabb3717b 100755
--- a/t/t0061-run-command.sh
+++ b/t/t0061-run-command.sh
@@ -129,11 +129,14 @@ Hello
 World
 EOF
 
-test_expect_success 'run_command runs in parallel with more jobs available than tasks' '
-	test-tool run-command run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
-	test_must_be_empty out &&
-	test_cmp expect err
-'
+for opt in '' '--pipe-output'
+do
+	test_expect_success "run_command runs in parallel with more jobs available than tasks $opt" '
+		test-tool run-command run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
+		test_must_be_empty out &&
+		test_cmp expect err
+	'
+done
 
 test_expect_success 'run_command runs ungrouped in parallel with more jobs available than tasks' '
 	test-tool run-command --ungroup run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
@@ -141,11 +144,14 @@ test_expect_success 'run_command runs ungrouped in parallel with more jobs avail
 	test_line_count = 4 err
 '
 
-test_expect_success 'run_command runs in parallel with as many jobs as tasks' '
-	test-tool run-command run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
-	test_must_be_empty out &&
-	test_cmp expect err
-'
+for opt in '' '--pipe-output'
+do
+	test_expect_success "run_command runs in parallel with as many jobs as tasks $opt" '
+		test-tool run-command run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
+		test_must_be_empty out &&
+		test_cmp expect err
+	'
+done
 
 test_expect_success 'run_command runs ungrouped in parallel with as many jobs as tasks' '
 	test-tool run-command --ungroup run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
@@ -153,11 +159,14 @@ test_expect_success 'run_command runs ungrouped in parallel with as many jobs as
 	test_line_count = 4 err
 '
 
-test_expect_success 'run_command runs in parallel with more tasks than jobs available' '
-	test-tool run-command run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
-	test_must_be_empty out &&
-	test_cmp expect err
-'
+for opt in '' '--pipe-output'
+do
+	test_expect_success "run_command runs in parallel with more tasks than jobs available $opt" '
+		test-tool run-command run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
+		test_must_be_empty out &&
+		test_cmp expect err
+	'
+done
 
 test_expect_success 'run_command runs ungrouped in parallel with more tasks than jobs available' '
 	test-tool run-command --ungroup run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
@@ -174,11 +183,14 @@ preloaded output of a child
 asking for a quick stop
 EOF
 
-test_expect_success 'run_command is asked to abort gracefully' '
-	test-tool run-command run-command-abort 3 false >out 2>err &&
-	test_must_be_empty out &&
-	test_cmp expect err
-'
+for opt in '' '--pipe-output'
+do
+	test_expect_success "run_command is asked to abort gracefully $opt" '
+		test-tool run-command run-command-abort 3 false >out 2>err &&
+		test_must_be_empty out &&
+		test_cmp expect err
+	'
+done
 
 test_expect_success 'run_command is asked to abort gracefully (ungroup)' '
 	test-tool run-command --ungroup run-command-abort 3 false >out 2>err &&
@@ -190,11 +202,14 @@ cat >expect <<-EOF
 no further jobs available
 EOF
 
-test_expect_success 'run_command outputs ' '
-	test-tool run-command run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
-	test_must_be_empty out &&
-	test_cmp expect err
-'
+for opt in '' '--pipe-output'
+do
+	test_expect_success "run_command outputs $opt" '
+		test-tool run-command run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
+		test_must_be_empty out &&
+		test_cmp expect err
+	'
+done
 
 test_expect_success 'run_command outputs (ungroup) ' '
 	test-tool run-command --ungroup run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/4] submodule: move status parsing into function
       [not found] <https://lore.kernel.org/git/20220922232947.631309-1-calvinwan@google.com/>
  2022-10-11 23:26 ` [PATCH v2 0/4] submodule: parallelize diff Calvin Wan
  2022-10-11 23:26 ` [PATCH v2 1/4] run-command: add pipe_output_fn to run_processes_parallel_opts Calvin Wan
@ 2022-10-11 23:26 ` Calvin Wan
  2022-10-12  7:41   ` Ævar Arnfjörð Bjarmason
  2022-10-12  8:27   ` Ævar Arnfjörð Bjarmason
  2022-10-11 23:26 ` [PATCH v2 3/4] diff-lib: refactor match_stat_with_submodule Calvin Wan
  2022-10-11 23:26 ` [PATCH v2 4/4] diff-lib: parallelize run_diff_files for submodules Calvin Wan
  4 siblings, 2 replies; 11+ messages in thread
From: Calvin Wan @ 2022-10-11 23:26 UTC (permalink / raw)
  To: git; +Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123

A future patch requires the ability to parse the output of git
status --porcelain=2. Move parsing code from is_submodule_modified to
parse_status_porcelain.

Signed-off-by: Calvin Wan <calvinwan@google.com>
---
 submodule.c | 71 +++++++++++++++++++++++++++++------------------------
 1 file changed, 39 insertions(+), 32 deletions(-)

diff --git a/submodule.c b/submodule.c
index 72b295b87b..a3410ed8f0 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1864,6 +1864,43 @@ int fetch_submodules(struct repository *r,
 	return spf.result;
 }
 
+static int parse_status_porcelain(char *buf, unsigned *dirty_submodule, int ignore_untracked)
+{
+	/* regular untracked files */
+	if (buf[0] == '?')
+		*dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
+
+	if (buf[0] == 'u' ||
+		buf[0] == '1' ||
+		buf[0] == '2') {
+		/* T = line type, XY = status, SSSS = submodule state */
+		if (strlen(buf) < strlen("T XY SSSS"))
+			BUG("invalid status --porcelain=2 line %s",
+				buf);
+
+		if (buf[5] == 'S' && buf[8] == 'U')
+			/* nested untracked file */
+			*dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
+
+		if (buf[0] == 'u' ||
+			buf[0] == '2' ||
+			memcmp(buf + 5, "S..U", 4))
+			/* other change */
+			*dirty_submodule |= DIRTY_SUBMODULE_MODIFIED;
+	}
+
+	if ((*dirty_submodule & DIRTY_SUBMODULE_MODIFIED) &&
+		((*dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) ||
+		ignore_untracked)) {
+		/*
+		* We're not interested in any further information from
+		* the child any more, neither output nor its exit code.
+		*/
+		return 1;
+	}
+	return 0;
+}
+
 unsigned is_submodule_modified(const char *path, int ignore_untracked)
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
@@ -1900,39 +1937,9 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked)
 
 	fp = xfdopen(cp.out, "r");
 	while (strbuf_getwholeline(&buf, fp, '\n') != EOF) {
-		/* regular untracked files */
-		if (buf.buf[0] == '?')
-			dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
-
-		if (buf.buf[0] == 'u' ||
-		    buf.buf[0] == '1' ||
-		    buf.buf[0] == '2') {
-			/* T = line type, XY = status, SSSS = submodule state */
-			if (buf.len < strlen("T XY SSSS"))
-				BUG("invalid status --porcelain=2 line %s",
-				    buf.buf);
-
-			if (buf.buf[5] == 'S' && buf.buf[8] == 'U')
-				/* nested untracked file */
-				dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
-
-			if (buf.buf[0] == 'u' ||
-			    buf.buf[0] == '2' ||
-			    memcmp(buf.buf + 5, "S..U", 4))
-				/* other change */
-				dirty_submodule |= DIRTY_SUBMODULE_MODIFIED;
-		}
-
-		if ((dirty_submodule & DIRTY_SUBMODULE_MODIFIED) &&
-		    ((dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) ||
-		     ignore_untracked)) {
-			/*
-			 * We're not interested in any further information from
-			 * the child any more, neither output nor its exit code.
-			 */
-			ignore_cp_exit_code = 1;
+		ignore_cp_exit_code = parse_status_porcelain(buf.buf, &dirty_submodule, ignore_untracked);
+		if (ignore_cp_exit_code)
 			break;
-		}
 	}
 	fclose(fp);
 
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 3/4] diff-lib: refactor match_stat_with_submodule
       [not found] <https://lore.kernel.org/git/20220922232947.631309-1-calvinwan@google.com/>
                   ` (2 preceding siblings ...)
  2022-10-11 23:26 ` [PATCH v2 2/4] submodule: move status parsing into function Calvin Wan
@ 2022-10-11 23:26 ` Calvin Wan
  2022-10-11 23:26 ` [PATCH v2 4/4] diff-lib: parallelize run_diff_files for submodules Calvin Wan
  4 siblings, 0 replies; 11+ messages in thread
From: Calvin Wan @ 2022-10-11 23:26 UTC (permalink / raw)
  To: git; +Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123

Flatten out the if statements in match_stat_with_submodule so the
logic is more readable and easier for future patches to add to.
orig_flags didn't need to be set if the cache entry wasn't a
GITLINK so defer setting it.

Signed-off-by: Calvin Wan <calvinwan@google.com>
---
 diff-lib.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/diff-lib.c b/diff-lib.c
index 2edea41a23..e249322141 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -73,18 +73,25 @@ static int match_stat_with_submodule(struct diff_options *diffopt,
 				     unsigned *dirty_submodule)
 {
 	int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option);
-	if (S_ISGITLINK(ce->ce_mode)) {
-		struct diff_flags orig_flags = diffopt->flags;
-		if (!diffopt->flags.override_submodule_config)
-			set_diffopt_flags_from_submodule_config(diffopt, ce->name);
-		if (diffopt->flags.ignore_submodules)
-			changed = 0;
-		else if (!diffopt->flags.ignore_dirty_submodules &&
+	struct diff_flags orig_flags;
+
+	if (!S_ISGITLINK(ce->ce_mode))
+		goto ret;
+
+	orig_flags = diffopt->flags;
+	if (!diffopt->flags.override_submodule_config)
+		set_diffopt_flags_from_submodule_config(diffopt, ce->name);
+	if (diffopt->flags.ignore_submodules) {
+		changed = 0;
+		goto cleanup;
+	}
+	if (!diffopt->flags.ignore_dirty_submodules &&
 			 (!changed || diffopt->flags.dirty_submodules))
-			*dirty_submodule = is_submodule_modified(ce->name,
-								 diffopt->flags.ignore_untracked_in_submodules);
+		*dirty_submodule = is_submodule_modified(ce->name,
+					diffopt->flags.ignore_untracked_in_submodules);
+cleanup:
 		diffopt->flags = orig_flags;
-	}
+ret:
 	return changed;
 }
 
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 4/4] diff-lib: parallelize run_diff_files for submodules
       [not found] <https://lore.kernel.org/git/20220922232947.631309-1-calvinwan@google.com/>
                   ` (3 preceding siblings ...)
  2022-10-11 23:26 ` [PATCH v2 3/4] diff-lib: refactor match_stat_with_submodule Calvin Wan
@ 2022-10-11 23:26 ` Calvin Wan
  2022-10-12  8:31   ` Ævar Arnfjörð Bjarmason
  4 siblings, 1 reply; 11+ messages in thread
From: Calvin Wan @ 2022-10-11 23:26 UTC (permalink / raw)
  To: git; +Cc: Calvin Wan, emilyshaffer, avarab, phillip.wood123

During the iteration of the index entries in run_diff_files, whenever
a submodule is found and needs its status checked, a subprocess is
spawned for it. Instead of spawning the subprocess immediately and
waiting for its completion to continue, hold onto all submodules and
relevant information in a list. Then use that list to create tasks for
run_processes_parallel. Subprocess output is piped to
status_pipe_output which parses it. 

Add config option submodule.diffJobs to set the maximum number
of parallel jobs. The option defaults to 1 if unset. If set to 0, the
number of jobs is set to online_cpus(). I added a TODO here, regarding
defaults -- please see the cover letter for the discussion. 

Since run_diff_files is called from many different commands, I chose
to grab the config option in the function rather than adding variables
to every git command and then figuring out how to pass them all in.

Signed-off-by: Calvin Wan <calvinwan@google.com>
---
 Documentation/config/submodule.txt |  12 +++
 diff-lib.c                         |  89 ++++++++++++++--
 submodule.c                        | 159 +++++++++++++++++++++++++++++
 submodule.h                        |   9 ++
 t/t4027-diff-submodule.sh          |  19 ++++
 t/t7506-status-submodule.sh        |  19 ++++
 6 files changed, 299 insertions(+), 8 deletions(-)

diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt
index 6490527b45..3209eb8117 100644
--- a/Documentation/config/submodule.txt
+++ b/Documentation/config/submodule.txt
@@ -93,6 +93,18 @@ submodule.fetchJobs::
 	in parallel. A value of 0 will give some reasonable default.
 	If unset, it defaults to 1.
 
+submodule.diffJobs::
+	Specifies how many submodules are diffed at the same time. A
+	positive integer allows up to that number of submodules diffed
+	in parallel. A value of 0 will give some reasonable default.
+	If unset, it defaults to 1. The diff operation is used by many
+	other git commands such as add, merge, diff, status, stash and
+	more. Note that the expensive part of the diff operation is
+	reading the index from cache or memory. Therefore multiple jobs
+	may be detrimental to performance if your hardware does not
+	support parallel reads or if the number of jobs greatly exceeds
+	the amount of supported reads.
+
 submodule.alternateLocation::
 	Specifies how the submodules obtain alternates when submodules are
 	cloned. Possible values are `no`, `superproject`.
diff --git a/diff-lib.c b/diff-lib.c
index e249322141..44bb5558b3 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -14,6 +14,7 @@
 #include "dir.h"
 #include "fsmonitor.h"
 #include "commit-reach.h"
+#include "config.h"
 
 /*
  * diff-files
@@ -65,15 +66,20 @@ static int check_removed(const struct index_state *istate, const struct cache_en
  * Return 1 when changes are detected, 0 otherwise. If the DIRTY_SUBMODULES
  * option is set, the caller does not only want to know if a submodule is
  * modified at all but wants to know all the conditions that are met (new
- * commits, untracked content and/or modified content).
+ * commits, untracked content and/or modified content). If
+ * defer_submodule_status bit is set, dirty_submodule will be left to the
+ * caller to set. defer_submodule_status can also be set to 0 in this
+ * function if there is no need to check if the submodule is modified.
  */
 static int match_stat_with_submodule(struct diff_options *diffopt,
 				     const struct cache_entry *ce,
 				     struct stat *st, unsigned ce_option,
-				     unsigned *dirty_submodule)
+				     unsigned *dirty_submodule, int *defer_submodule_status,
+					 int *ignore_untracked_in_submodules)
 {
 	int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option);
 	struct diff_flags orig_flags;
+	int defer = 0;
 
 	if (!S_ISGITLINK(ce->ce_mode))
 		goto ret;
@@ -86,12 +92,21 @@ static int match_stat_with_submodule(struct diff_options *diffopt,
 		goto cleanup;
 	}
 	if (!diffopt->flags.ignore_dirty_submodules &&
-			 (!changed || diffopt->flags.dirty_submodules))
-		*dirty_submodule = is_submodule_modified(ce->name,
-					diffopt->flags.ignore_untracked_in_submodules);
+		(!changed || diffopt->flags.dirty_submodules)) {
+		if (defer_submodule_status && *defer_submodule_status) {
+			defer = 1;
+			*ignore_untracked_in_submodules =
+							diffopt->flags.ignore_untracked_in_submodules;
+		} else {
+			*dirty_submodule = is_submodule_modified(ce->name,
+							diffopt->flags.ignore_untracked_in_submodules);
+		}
+	}
 cleanup:
-		diffopt->flags = orig_flags;
+	diffopt->flags = orig_flags;
 ret:
+	if (defer_submodule_status)
+		*defer_submodule_status = defer;
 	return changed;
 }
 
@@ -103,6 +118,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 			      ? CE_MATCH_RACY_IS_DIRTY : 0);
 	uint64_t start = getnanotime();
 	struct index_state *istate = revs->diffopt.repo->index;
+	struct string_list submodules = STRING_LIST_INIT_NODUP;
 
 	diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/");
 
@@ -227,6 +243,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 			newmode = ce->ce_mode;
 		} else {
 			struct stat st;
+			int ignore_untracked_in_submodules = 0;
+			int defer_submodule_status = !!revs->repo;
 
 			changed = check_removed(istate, ce, &st);
 			if (changed) {
@@ -248,8 +266,22 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 			}
 
 			changed = match_stat_with_submodule(&revs->diffopt, ce, &st,
-							    ce_option, &dirty_submodule);
+								ce_option, &dirty_submodule,
+								&defer_submodule_status,
+								&ignore_untracked_in_submodules);
 			newmode = ce_mode_from_stat(ce, st.st_mode);
+			if (defer_submodule_status) {
+				struct string_list_item *item =
+								string_list_append(&submodules, ce->name);
+				struct submodule_status_util *util = xmalloc(sizeof(*util));
+				util->changed = changed;
+				util->dirty_submodule = 0;
+				util->ignore_untracked = ignore_untracked_in_submodules;
+				util->newmode = newmode;
+				util->ce = ce;
+				item->util = util;
+				continue;
+			}
 		}
 
 		if (!changed && !dirty_submodule) {
@@ -268,6 +300,47 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 			    ce->name, 0, dirty_submodule);
 
 	}
+	if (submodules.nr > 0) {
+		int i;
+		int parallel_jobs = 1;
+		git_config_get_int("submodule.diffjobs", &parallel_jobs);
+		if (parallel_jobs < 0) {
+			die(_("submodule.diffjobs cannot be negative"));
+		}
+		else if (!parallel_jobs) {
+			/*
+			 * TODO: Decide what a reasonable default for parallel_jobs
+			 * is. Currently mimics what other parallel config options
+			 * default to.
+			 */
+			parallel_jobs = online_cpus();
+		}
+
+		if (get_submodules_status(revs->repo, &submodules, parallel_jobs))
+			BUG("Submodule status failed");
+		for (i = 0; i < submodules.nr; i++) {
+			struct submodule_status_util *util = submodules.items[i].util;
+			struct cache_entry *ce = util->ce;
+			unsigned int oldmode;
+			const struct object_id *old_oid, *new_oid;
+
+			if (!util->changed && !util->dirty_submodule) {
+				ce_mark_uptodate(ce);
+				mark_fsmonitor_valid(istate, ce);
+				if (!revs->diffopt.flags.find_copies_harder)
+					continue;
+			}
+			oldmode = ce->ce_mode;
+			old_oid = &ce->oid;
+			new_oid = util->changed ? null_oid() : &ce->oid;
+			diff_change(&revs->diffopt, oldmode, util->newmode,
+					old_oid, new_oid,
+					!is_null_oid(old_oid),
+					!is_null_oid(new_oid),
+					ce->name, 0, util->dirty_submodule);
+		}
+	}
+	string_list_clear(&submodules, 1);
 	diffcore_std(&revs->diffopt);
 	diff_flush(&revs->diffopt);
 	trace_performance_since(start, "diff-files");
@@ -315,7 +388,7 @@ static int get_stat_data(const struct index_state *istate,
 			return -1;
 		}
 		changed = match_stat_with_submodule(diffopt, ce, &st,
-						    0, dirty_submodule);
+						    0, dirty_submodule, NULL, NULL);
 		if (changed) {
 			mode = ce_mode_from_stat(ce, st.st_mode);
 			oid = null_oid();
diff --git a/submodule.c b/submodule.c
index a3410ed8f0..72051fda81 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1363,6 +1363,20 @@ int submodule_touches_in_range(struct repository *r,
 	return ret;
 }
 
+struct submodule_parallel_status {
+	int index_count;
+	int result;
+
+	struct string_list *submodule_names;
+	struct repository *r;
+
+	/* Pending statuses by OIDs */
+	struct status_task **oid_status_tasks;
+	int oid_status_tasks_nr, oid_status_tasks_alloc;
+};
+
+#define SPS_INIT { 0 }
+
 struct submodule_parallel_fetch {
 	/*
 	 * The index of the last index entry processed by
@@ -1445,6 +1459,13 @@ struct fetch_task {
 	struct oid_array *commits; /* Ensure these commits are fetched */
 };
 
+struct status_task {
+	struct repository *repo;
+	const char *path;
+	unsigned dirty_submodule;
+	int ignore_untracked;
+};
+
 /**
  * When a submodule is not defined in .gitmodules, we cannot access it
  * via the regular submodule-config. Create a fake submodule, which we can
@@ -1950,6 +1971,144 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked)
 	return dirty_submodule;
 }
 
+static struct status_task *
+get_status_task_from_index(struct submodule_parallel_status *sps,
+			  struct strbuf *err)
+{
+	for (; sps->index_count < sps->submodule_names->nr; sps->index_count++) {
+		struct submodule_status_util *util = sps->submodule_names->items[sps->index_count].util;
+		const struct cache_entry *ce = util->ce;
+		struct status_task *task;
+		struct strbuf buf = STRBUF_INIT;
+		const char *git_dir;
+
+		strbuf_addf(&buf, "%s/.git", ce->name);
+		git_dir = read_gitfile(buf.buf);
+		if (!git_dir)
+			git_dir = buf.buf;
+		if (!is_git_directory(git_dir)) {
+			if (is_directory(git_dir))
+				die(_("'%s' not recognized as a git repository"), git_dir);
+			strbuf_release(&buf);
+			/* The submodule is not checked out, so it is not modified */
+			util->dirty_submodule = 0;
+			continue;
+		}
+		strbuf_release(&buf);
+
+		task = xmalloc(sizeof(*task));
+		memset(task, 0, sizeof(*task));
+		task->path = ce->name;
+		task->dirty_submodule = util->dirty_submodule;
+		task->ignore_untracked = util->ignore_untracked;
+		sps->index_count++;
+		return task;
+	}
+	return NULL;
+}
+
+
+static int get_next_submodule_status(struct child_process *cp,
+						struct strbuf *err, void *data, void **task_cb)
+{
+	struct submodule_parallel_status *sps = data;
+	struct status_task *task = get_status_task_from_index(sps, err);
+	int ignore_untracked;
+
+	if (!task) {
+		return 0;
+	}
+
+	ignore_untracked = task->ignore_untracked;
+
+	child_process_init(cp);
+	prepare_submodule_repo_env_in_gitdir(&cp->env);
+
+	strvec_init(&cp->args);
+	strvec_pushl(&cp->args, "status", "--porcelain=2", NULL);
+	if (ignore_untracked)
+		strvec_push(&cp->args, "-uno");
+
+	prepare_submodule_repo_env(&cp->env);
+	cp->git_cmd = 1;
+	cp->no_stdin = 1;
+	cp->dir = task->path;
+	*task_cb = task;
+	return 1;
+}
+
+static int status_start_failure(struct strbuf *err,
+			       void *cb, void *task_cb)
+{
+	struct submodule_parallel_status *sps = cb;
+
+	sps->result = 1;
+	return 0;
+}
+
+static void status_pipe_output(struct strbuf *out,
+			void *cb, void *task_cb)
+{
+	struct status_task *task = task_cb;
+	struct string_list list = STRING_LIST_INIT_DUP;
+	int i;
+
+	string_list_split(&list, out->buf, '\n', -1);
+
+	for (i = 0; i < list.nr; i++) {
+		if (parse_status_porcelain(list.items[i].string,
+							&task->dirty_submodule, task->ignore_untracked))
+			break;
+	}
+	string_list_clear(&list, 0);
+	strbuf_reset(out);
+}
+
+static int status_finish(int retvalue, struct strbuf *err,
+			void *cb, void *task_cb)
+{
+	struct submodule_parallel_status *sps = cb;
+	struct status_task *task = task_cb;
+	struct string_list_item *it =
+		string_list_lookup(sps->submodule_names, task->path);
+	struct submodule_status_util *util = it->util;
+
+	util->dirty_submodule = task->dirty_submodule;
+	free(task);
+
+	return 0;
+}
+
+int get_submodules_status(struct repository *r,
+			struct string_list *submodules,
+			int max_parallel_jobs)
+{
+	struct submodule_parallel_status sps = SPS_INIT;
+	const struct run_process_parallel_opts opts = {
+		.tr2_category = "submodule",
+		.tr2_label = "parallel/status",
+
+		.jobs = max_parallel_jobs,
+
+		.get_next_task = get_next_submodule_status,
+		.start_failure = status_start_failure,
+		.pipe_output = status_pipe_output,
+		.task_finished = status_finish,
+		.data = &sps,
+	};
+
+	sps.r = r;
+
+	if (repo_read_index(r) < 0)
+		die(_("index file corrupt"));
+
+	sps.submodule_names = submodules;
+	string_list_sort(sps.submodule_names);
+	run_processes_parallel(&opts);
+
+	return sps.result;
+}
+
 int submodule_uses_gitfile(const char *path)
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
diff --git a/submodule.h b/submodule.h
index 6a9fec6de1..c829d37b9f 100644
--- a/submodule.h
+++ b/submodule.h
@@ -41,6 +41,12 @@ struct submodule_update_strategy {
 	.type = SM_UPDATE_UNSPECIFIED, \
 }
 
+struct submodule_status_util {
+	int changed, ignore_untracked;
+	unsigned dirty_submodule, newmode;
+	struct cache_entry *ce;
+};
+
 int is_gitmodules_unmerged(struct index_state *istate);
 int is_writing_gitmodules_ok(void);
 int is_staging_gitmodules_ok(struct index_state *istate);
@@ -94,6 +100,9 @@ int fetch_submodules(struct repository *r,
 		     int command_line_option,
 		     int default_option,
 		     int quiet, int max_parallel_jobs);
+int get_submodules_status(struct repository *r,
+			 struct string_list *submodules,
+			 int max_parallel_jobs);
 unsigned is_submodule_modified(const char *path, int ignore_untracked);
 int submodule_uses_gitfile(const char *path);
 
diff --git a/t/t4027-diff-submodule.sh b/t/t4027-diff-submodule.sh
index 40164ae07d..e08ee315a7 100755
--- a/t/t4027-diff-submodule.sh
+++ b/t/t4027-diff-submodule.sh
@@ -34,6 +34,25 @@ test_expect_success setup '
 	subtip=$3 subprev=$2
 '
 
+test_expect_success 'diff in superproject with submodules respects parallel settings' '
+	test_when_finished "rm -f trace.out" &&
+	(
+		GIT_TRACE=$(pwd)/trace.out git diff &&
+		grep "1 tasks" trace.out &&
+		>trace.out &&
+
+		git config submodule.diffJobs 8 &&
+		GIT_TRACE=$(pwd)/trace.out git diff &&
+		grep "8 tasks" trace.out &&
+		>trace.out &&
+
+		GIT_TRACE=$(pwd)/trace.out git -c submodule.diffJobs=0 diff &&
+		grep "preparing to run up to [0-9]* tasks" trace.out &&
+		! grep "up to 0 tasks" trace.out &&
+		>trace.out
+	)
+'
+
 test_expect_success 'git diff --raw HEAD' '
 	hexsz=$(test_oid hexsz) &&
 	git diff --raw --abbrev=$hexsz HEAD >actual &&
diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh
index f5426a8e58..dfdfb58bfc 100755
--- a/t/t7506-status-submodule.sh
+++ b/t/t7506-status-submodule.sh
@@ -410,4 +410,23 @@ test_expect_success 'status with added file in nested submodule (short)' '
 	EOF
 '
 
+test_expect_success 'status in superproject with submodules respects parallel settings' '
+	test_when_finished "rm -f trace.out" &&
+	(
+		GIT_TRACE=$(pwd)/trace.out git status &&
+		grep "1 tasks" trace.out &&
+		>trace.out &&
+
+		git config submodule.diffJobs 8 &&
+		GIT_TRACE=$(pwd)/trace.out git status &&
+		grep "8 tasks" trace.out &&
+		>trace.out &&
+
+		GIT_TRACE=$(pwd)/trace.out git -c submodule.diffJobs=0 status &&
+		grep "preparing to run up to [0-9]* tasks" trace.out &&
+		! grep "up to 0 tasks" trace.out &&
+		>trace.out
+	)
+'
+
 test_done
-- 
2.38.0.rc1.362.ged0d419d3c-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/4] submodule: parallelize diff
  2022-10-11 23:26 ` [PATCH v2 0/4] submodule: parallelize diff Calvin Wan
@ 2022-10-12  5:52   ` Junio C Hamano
  2022-10-14  0:39     ` Calvin Wan
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2022-10-12  5:52 UTC (permalink / raw)
  To: Calvin Wan; +Cc: git, emilyshaffer, avarab, phillip.wood123

Calvin Wan <calvinwan@google.com> writes:

> I also wanted to pose another question to list regarding defaults for
> parallel processes. For jobs that clearly scale with the number of
> processes (aka jobs that are mostly processor bound), it is obvious that
> setting the default number of processes to the number of available cores
> is the most optimal option. However, this changes when the job is mostly
> I/O bound or has a combination of I/O and processing. Looking at my use
> case for `status` on a cold cache (see below), we notice that increasing
> the number of parallel processes speeds up status, but after a certain
> number, it actually starts slowing down.

I do not offhand recall how the default parallelism is computed
there, but if I am correct to suspect that "git grep" has a similar
scaling pattern, i.e. the threads all need to compete for I/O to
read from the filesystem to find needles from the haystack, perhaps
it would give us a precedent to model the behaviour of this part of
the code, too, hopefully?


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/4] submodule: move status parsing into function
  2022-10-11 23:26 ` [PATCH v2 2/4] submodule: move status parsing into function Calvin Wan
@ 2022-10-12  7:41   ` Ævar Arnfjörð Bjarmason
  2022-10-12  8:27   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 11+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-12  7:41 UTC (permalink / raw)
  To: Calvin Wan; +Cc: git, emilyshaffer, phillip.wood123


On Tue, Oct 11 2022, Calvin Wan wrote:

> A future patch requires the ability to parse the output of git
> status --porcelain=2. Move parsing code from is_submodule_modified to
> parse_status_porcelain.
>
> Signed-off-by: Calvin Wan <calvinwan@google.com>
> ---
>  submodule.c | 71 +++++++++++++++++++++++++++++------------------------
>  1 file changed, 39 insertions(+), 32 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index 72b295b87b..a3410ed8f0 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1864,6 +1864,43 @@ int fetch_submodules(struct repository *r,
>  	return spf.result;
>  }
>  
> +static int parse_status_porcelain(char *buf, unsigned *dirty_submodule, int ignore_untracked)
> +{
> +	/* regular untracked files */
> +	if (buf[0] == '?')
> +		*dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
> +
> +	if (buf[0] == 'u' ||
> +		buf[0] == '1' ||
> +		buf[0] == '2') {
> +		/* T = line type, XY = status, SSSS = submodule state */
> +		if (strlen(buf) < strlen("T XY SSSS"))
> +			BUG("invalid status --porcelain=2 line %s",
> +				buf);
> +
> +		if (buf[5] == 'S' && buf[8] == 'U')
> +			/* nested untracked file */
> +			*dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
> +
> +		if (buf[0] == 'u' ||
> +			buf[0] == '2' ||
> +			memcmp(buf + 5, "S..U", 4))
> +			/* other change */
> +			*dirty_submodule |= DIRTY_SUBMODULE_MODIFIED;
> +	}
> +
> +	if ((*dirty_submodule & DIRTY_SUBMODULE_MODIFIED) &&
> +		((*dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) ||
> +		ignore_untracked)) {
> +		/*
> +		* We're not interested in any further information from
> +		* the child any more, neither output nor its exit code.
> +		*/
> +		return 1;
> +	}
> +	return 0;
> +}
> +
>  unsigned is_submodule_modified(const char *path, int ignore_untracked)
>  {
>  	struct child_process cp = CHILD_PROCESS_INIT;
> @@ -1900,39 +1937,9 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked)
>  
>  	fp = xfdopen(cp.out, "r");
>  	while (strbuf_getwholeline(&buf, fp, '\n') != EOF) {
> -		/* regular untracked files */
> -		if (buf.buf[0] == '?')
> -			dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
> -
> -		if (buf.buf[0] == 'u' ||
> -		    buf.buf[0] == '1' ||
> -		    buf.buf[0] == '2') {
> -			/* T = line type, XY = status, SSSS = submodule state */
> -			if (buf.len < strlen("T XY SSSS"))
> -				BUG("invalid status --porcelain=2 line %s",
> -				    buf.buf);
> -
> -			if (buf.buf[5] == 'S' && buf.buf[8] == 'U')
> -				/* nested untracked file */
> -				dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
> -
> -			if (buf.buf[0] == 'u' ||
> -			    buf.buf[0] == '2' ||
> -			    memcmp(buf.buf + 5, "S..U", 4))
> -				/* other change */
> -				dirty_submodule |= DIRTY_SUBMODULE_MODIFIED;
> -		}
> -
> -		if ((dirty_submodule & DIRTY_SUBMODULE_MODIFIED) &&
> -		    ((dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) ||
> -		     ignore_untracked)) {
> -			/*
> -			 * We're not interested in any further information from
> -			 * the child any more, neither output nor its exit code.
> -			 */
> -			ignore_cp_exit_code = 1;
> +		ignore_cp_exit_code = parse_status_porcelain(buf.buf, &dirty_submodule, ignore_untracked);
> +		if (ignore_cp_exit_code)
>  			break;
> -		}
>  	}
>  	fclose(fp);

This isn't just a code move, but a rewrite of much of the code to accept
a "char *buf" rather than a "struct strbuf buf".

I can see in a later commit that you'd like to change this to accept a
.string from a string_list.


Without changing anything else, if you lead with a commit that does (in
the initial loop):

	char *str = buf.buf;
        const size_t len = buf.len;

And then make it use "buf" and "len" instead you could follow with the
move commit, which at that ponit would benefit from the rename detection
more more than it does now.

Also: If we have a strbuf in this caller let's pass a "len", and not
here make it need to do a strlen() on every line when we've computed it
already, the other caller could just pass another strlen([...].string).


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/4] run-command: add pipe_output_fn to run_processes_parallel_opts
  2022-10-11 23:26 ` [PATCH v2 1/4] run-command: add pipe_output_fn to run_processes_parallel_opts Calvin Wan
@ 2022-10-12  7:58   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 11+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-12  7:58 UTC (permalink / raw)
  To: Calvin Wan; +Cc: git, emilyshaffer, phillip.wood123


On Tue, Oct 11 2022, Calvin Wan wrote:

(re-arranging the diff a bit for discussion)

> diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c
> index 46bac2bb70..d3c3df7960 100644
> --- a/t/helper/test-run-command.c
> +++ b/t/helper/test-run-command.c
> @@ -52,6 +52,18 @@ static int no_job(struct child_process *cp,
>  	return 0;
>  }
>  
> +static void pipe_output(struct strbuf *out,
> +				void *pp_cb,
> +				void *pp_task_cb)
> +{
> +	fprintf(stderr, "%s", out->buf);
> +	/*
> +	 * Resetting output to show that piped output would print the
> +	 * same as other tests without the pipe_output() function set
> +	 */
> +	strbuf_reset(out);
> +}
> +

This is still unaddressed from the last round:
https://lore.kernel.org/git/220927.86edvxytla.gmgdl@evledraar.gmail.com/

Or well, it changed a bit, but the "reset" is still there.

Whatever you're aiming for here, this is apparently doing nothing,
because if I comment out the strbuf_reset() then all your tests pass,
which...

>  	int i = pp->output_owner;
>  
>  	if (pp->children[i].state == GIT_CP_WORKING &&
>  	    pp->children[i].err.len) {
> +		if (opts->pipe_output)
> +			opts->pipe_output(&pp->children[i].err, pp->data,
> +						pp->children[i].data);
>  		strbuf_write(&pp->children[i].err, stderr);
>  		strbuf_reset(&pp->children[i].err);
>  	}

...we can see from here...

> @@ -1716,6 +1720,10 @@ static int pp_collect_finished(struct parallel_processes *pp,
>  
>  		code = finish_command(&pp->children[i].process);
>  
> +		if (opts->pipe_output)
> +			opts->pipe_output(&pp->children[i].err, pp->data,
> +						pp->children[i].data);
> +
>  		if (opts->task_finished)
>  			code = opts->task_finished(code, opts->ungroup ? NULL :
>  						   &pp->children[i].err, pp->data,

...and in particular here, where the "err" is subsequently passed to
"task_finished" *would* actually have an impact, but (again,
re-arranging he diff a bit) ...

> @@ -439,6 +451,12 @@ int cmd__run_command(int argc, const char **argv)
>  		opts.ungroup = 1;
>  	}
>  
> +	if (!strcmp(argv[1], "--pipe-output")) {
> +		argv += 1;
> +		argc -= 1;
> +		opts.pipe_output = pipe_output;
> +	}
> +
>  	jobs = atoi(argv[2]);
>  	strvec_clear(&proc.args);
>  	strvec_pushv(&proc.args, (const char **)argv + 3);

...here to test that we'd need our tests to combine "pipe_output" with
"task_finished".

Which we should do in either case, e.g. if you define both and append to
"err" in what order do we call them, and does "err" accumulate? (I think
it should), in any case...

> +	/**
> +	 * pipe_output: See pipe_output_fn() above. This should be
> +	 * NULL unless process specific output is needed
> +	 */
> +	pipe_output_fn pipe_output;


...since we're on the topic we really should document what the behavior
is. My understanding of this API has been that the "err" is not "const"
because the callback is supposed to *append errors*, and there's an
implicit contract to do nothing else than that. But looking at your new
API docs:

> + * This callback is periodically called while child processes are
> + * running and also when a child process finishes.

Whatever you think this should be doing isn't documented precisely,
i.e. is it called before/after some other callbacks, should the user
rely on that at all, and...

> + * "struct strbuf *out" contains the output collected from pp_task_cb
> + * since the last call of this function.

...this just seems wrong. We don't collect outptu "from pp_task_cb", do
you mean to say something, well, I was about to put words in your mouth,
but I honestly don't know what you expect, so ...

*I'd* this to be documented as something like (which every other
 callback does):

	See run_processes_parallel() below for a discussion of the "struct
	strbuf *out" parameter.

Now *those* docs could be improved a bit, but how we handle "err/out"
really needs to be documented holistically, as it's shared betwene
callbacks.

> + *
> + * pp_cb is the callback cookie as passed into run_processes_parallel,
> + * pp_task_cb is the callback cookie as passed into get_next_task_fn.
> + *
> + * This function is incompatible with "ungroup"

I'll re-roll the base series, which has a good place to add a BUG() if
incompatible options are combined, which would be good to add, i.e.:

	if (opts->pipe_output && opts->ungroup)
		BUG(...);

> +for opt in '' '--pipe-output'
> +do
> +	test_expect_success "run_command runs in parallel with more jobs available than tasks $opt" '
> +		test-tool run-command run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
> +		test_must_be_empty out &&
> +		test_cmp expect err
> +	'
> +done

I take an earlier comment back, you do have tests that would change if
that strbuf_reset() was removed, you're just not running them. You
forgot to put an "$opt" into the "test-tool" invocations:
	
	diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
	index feabb3717b0..19a674480cd 100755
	--- a/t/t0061-run-command.sh
	+++ b/t/t0061-run-command.sh
	@@ -162,7 +162,7 @@ test_expect_success 'run_command runs ungrouped in parallel with as many jobs as
	 for opt in '' '--pipe-output'
	 do
	 	test_expect_success "run_command runs in parallel with more tasks than jobs available $opt" '
	-		test-tool run-command run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
	+		test-tool run-command $opt run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
	 		test_must_be_empty out &&
	 		test_cmp expect err
	 	'
	@@ -186,7 +186,7 @@ EOF
	 for opt in '' '--pipe-output'
	 do
	 	test_expect_success "run_command is asked to abort gracefully $opt" '
	-		test-tool run-command run-command-abort 3 false >out 2>err &&
	+		test-tool run-command $opt run-command-abort 3 false >out 2>err &&
	 		test_must_be_empty out &&
	 		test_cmp expect err
	 	'
	@@ -205,7 +205,7 @@ EOF
	 for opt in '' '--pipe-output'
	 do
	 	test_expect_success "run_command outputs $opt" '
	-		test-tool run-command run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
	+		test-tool run-command $opt run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
	 		test_must_be_empty out &&
	 		test_cmp expect err
	 	'
	

With that all except 1 of your tests pass, now if I remove
strbuf_reset() and tweak the test so we can see what output comes from
what, then:
	
	diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c
	index ce246dfa4bc..690467e75c0 100644
	--- a/t/helper/test-run-command.c
	+++ b/t/helper/test-run-command.c
	@@ -56,12 +56,11 @@ static void pipe_output(struct strbuf *out,
	 				void *pp_cb,
	 				void *pp_task_cb)
	 {
	-	fprintf(stderr, "%s", out->buf);
	-	/*
	-	 * Resetting output to show that piped output would print the
	-	 * same as other tests without the pipe_output() function set
	-	 */
	-	strbuf_reset(out);
	+	struct strbuf sb = STRBUF_INIT;
	+
	+	strbuf_add_lines(&sb, "pipe_output: ", out->buf, out->len);
	+	fputs(sb.buf, stderr);
	+	strbuf_release(&sb);
	 }
	 
	 static int task_finished(int result,

The first test we fail (but with the reset we don't fail this one) is:
	
	+ diff -u expect err
	--- expect      2022-10-12 08:22:58.032264828 +0000
	+++ err 2022-10-12 08:22:58.068264515 +0000
	@@ -1,12 +1,24 @@
	+pipe_output: preloaded output of a child
	+pipe_output: Hello
	+pipe_output: World
	 preloaded output of a child
	 Hello
	 World
	+pipe_output: preloaded output of a child
	+pipe_output: Hello
	+pipe_output: World
	+pipe_output: preloaded output of a child
	+pipe_output: Hello
	+pipe_output: World
	 preloaded output of a child
	 Hello
	 World
	 preloaded output of a child
	 Hello
	 World
	+pipe_output: preloaded output of a child
	+pipe_output: Hello
	+pipe_output: World
	 preloaded output of a child
	 Hello
	 World

So, the apparent reason for the strbuf_reset() is that at some point in
the past you passed $out, but wanted to "reset" it so that you could
re-use the existing test_cmp.

But that's just wrong, because we'd be sweeping under the rug exactly
what we want to be testing here. Instead of hiding this you should be
test_cmp-ing the full thing.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/4] submodule: move status parsing into function
  2022-10-11 23:26 ` [PATCH v2 2/4] submodule: move status parsing into function Calvin Wan
  2022-10-12  7:41   ` Ævar Arnfjörð Bjarmason
@ 2022-10-12  8:27   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 11+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-12  8:27 UTC (permalink / raw)
  To: Calvin Wan; +Cc: git, emilyshaffer, phillip.wood123

On Tue, Oct 11 2022, Calvin Wan wrote:

Just commenting on one case, but this is in other parts of the series
(e.g. your additions in submodule.c):

> +			BUG("invalid status --porcelain=2 line %s",
> +				buf);
> [...]
> -				BUG("invalid status --porcelain=2 line %s",
> -				    buf.buf);

Your editor's indentation settings need fixing. Arguments should align
with the opening "(", here you replaced 4 spaces with a "\t".

A "\t" is == 8 spaces for the purposes of our identation, if you need 7
spaces to align with the "7" you insert 7 spaces, if it's 8 a "\t", then
for one more a "\t" and one space etc.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 4/4] diff-lib: parallelize run_diff_files for submodules
  2022-10-11 23:26 ` [PATCH v2 4/4] diff-lib: parallelize run_diff_files for submodules Calvin Wan
@ 2022-10-12  8:31   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 11+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-12  8:31 UTC (permalink / raw)
  To: Calvin Wan; +Cc: git, emilyshaffer, phillip.wood123


On Tue, Oct 11 2022, Calvin Wan wrote:

Mostly style comments, but also others.:

> +				     unsigned *dirty_submodule, int *defer_submodule_status,
> +					 int *ignore_untracked_in_submodules)

If you can think of a (much) shorter name for this new paremeter, then...

> +		if (defer_submodule_status && *defer_submodule_status) {
> +			defer = 1;
> +			*ignore_untracked_in_submodules =
> +							diffopt->flags.ignore_untracked_in_submodules;
> +		} else {
> +			*dirty_submodule = is_submodule_modified(ce->name,
> +							diffopt->flags.ignore_untracked_in_submodules);

...code like this becomes a lot less verbose and doesn't need this much
indentation...

> +			if (defer_submodule_status) {
> +				struct string_list_item *item =
> +								string_list_append(&submodules, ce->name);

And e.g. here splittng up the two:

	struct string_list_item *item;

	item = ...

Makes for easier reading IMO.

> +				struct submodule_status_util *util = xmalloc(sizeof(*util));
> +				util->changed = changed;
> +				util->dirty_submodule = 0;
> +				util->ignore_untracked = ignore_untracked_in_submodules;
> +				util->newmode = newmode;
> +				util->ce = ce;

Maybe easier to read as:

	struct submodule_status_util tmp = {
		.changed = ...,
		.dirty_submodule =...,
	};

Then a single memcpy() to copy that data over to the malloc'd region.

> +				item->util = util;

And you can also do this idiomatically as:

	string_list_append(...)->util = util;

> +				continue;

> +		int parallel_jobs = 1;
> +		git_config_get_int("submodule.diffjobs", &parallel_jobs);
> +		if (parallel_jobs < 0) {
> +			die(_("submodule.diffjobs cannot be negative"));

Maybe we want something that uses git_parse_unsigned()?

> +		}

This should be cuddled with the "else if"

> +		else if (!parallel_jobs) {
> +			/*
> +			 * TODO: Decide what a reasonable default for parallel_jobs
> +			 * is. Currently mimics what other parallel config options
> +			 * default to.
> +			 */

It's OK to just drop this comment IMO.


> +			parallel_jobs = online_cpus();
> +		}

And as these are both one statement you can drop the {}'s altogether.

I think this would probably be more idiomatic as (untested):

	if (git_config_get_int(..., &v) || !v)
		jobs = online_cpus();
	else if (v < 0) /* or some API checks it for us? */
		die(...);
	else
		jobs = v;

I.e. we'd be explicitly using the "does the key exist" return value.

> +
> +		if (get_submodules_status(revs->repo, &submodules, parallel_jobs))
> +			BUG("Submodule status failed");

s/Sub/sub/, lower-case for bug(), die() etc.

> +		for (i = 0; i < submodules.nr; i++) {

You're iterating a string_list, so that "i" earlier should be size_t,
but better yet I think you can use for_each_string_list_item() here

> +struct submodule_parallel_status {
> +	int index_count;

Here you have an int...
> +	int result;
> +
> +	struct string_list *submodule_names;

..that'll be tracking the "nr" of this, which is size_t, let's have them
match.

Also, can we remove the "*" there and just have submodule.c populate
this struct directly, maybe not worth making it public, just a
thought...

..I didn't read further than this, ran out of time..

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/4] submodule: parallelize diff
  2022-10-12  5:52   ` Junio C Hamano
@ 2022-10-14  0:39     ` Calvin Wan
  0 siblings, 0 replies; 11+ messages in thread
From: Calvin Wan @ 2022-10-14  0:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, emilyshaffer, avarab, phillip.wood123

> Calvin Wan <calvinwan@google.com> writes:
>
> > I also wanted to pose another question to list regarding defaults for
> > parallel processes. For jobs that clearly scale with the number of
> > processes (aka jobs that are mostly processor bound), it is obvious that
> > setting the default number of processes to the number of available cores
> > is the most optimal option. However, this changes when the job is mostly
> > I/O bound or has a combination of I/O and processing. Looking at my use
> > case for `status` on a cold cache (see below), we notice that increasing
> > the number of parallel processes speeds up status, but after a certain
> > number, it actually starts slowing down.
>
> I do not offhand recall how the default parallelism is computed
> there, but if I am correct to suspect that "git grep" has a similar
> scaling pattern, i.e. the threads all need to compete for I/O to
> read from the filesystem to find needles from the haystack, perhaps
> it would give us a precedent to model the behaviour of this part of
> the code, too, hopefully?

Setting grep.threads=0 does default it to the number of available cores
(at least the documentation is clear about this). I tested "git grep" on
my machine and found that it started slowing down after 4 threads --
this is most likely because my NVMe SSD uses 4 PCIe lanes aka it can at
most do 4 reads in parallel. AFAIK, there is no way to tell how many
reads a disk can do in parallel. This coupled with the fact that other
commands have varying levels of IO requirements makes it impossible to
set a "reasonable" amount of threads.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-10-14  0:39 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <https://lore.kernel.org/git/20220922232947.631309-1-calvinwan@google.com/>
2022-10-11 23:26 ` [PATCH v2 0/4] submodule: parallelize diff Calvin Wan
2022-10-12  5:52   ` Junio C Hamano
2022-10-14  0:39     ` Calvin Wan
2022-10-11 23:26 ` [PATCH v2 1/4] run-command: add pipe_output_fn to run_processes_parallel_opts Calvin Wan
2022-10-12  7:58   ` Ævar Arnfjörð Bjarmason
2022-10-11 23:26 ` [PATCH v2 2/4] submodule: move status parsing into function Calvin Wan
2022-10-12  7:41   ` Ævar Arnfjörð Bjarmason
2022-10-12  8:27   ` Ævar Arnfjörð Bjarmason
2022-10-11 23:26 ` [PATCH v2 3/4] diff-lib: refactor match_stat_with_submodule Calvin Wan
2022-10-11 23:26 ` [PATCH v2 4/4] diff-lib: parallelize run_diff_files for submodules Calvin Wan
2022-10-12  8:31   ` Ævar Arnfjörð Bjarmason

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).