git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/8] Fixes for the parallel processing engine and git submodule update
@ 2015-10-20 22:43 Stefan Beller
  2015-10-20 22:43 ` [PATCH 1/8] run-command: Fix early shutdown Stefan Beller
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Stefan Beller @ 2015-10-20 22:43 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, Stefan Beller

Patches 1-6 replace the last 6 patches of sb/submodule-parallel-fetch
(Patch 1,2 changed code, 3,4 stayed as is, 5 has more commit message, 
Patch 6 is the same again)

Patches 7,8 are new in the series .
Patch 7 keeps the update strategy in the cached submodue structs around,
Patch 8 rewrites some small part of the git submodule update script in C
by having another larger helper function in builtin/submodule--helper.c
which takes care of the cloning new submodules without having all the
intermediate steps as in previous versions of this series.

The patch 8 is just a rewrite/translation without enabling the parallel
processing though. This will be done in a later patch once we have
bike shedded enough how to name the user facing option for that.
(I guess the CLI option would be --jobs again, but I'd rather hint at
the config option)

This supersedes 
[RFC PATCHv1 00/12] git submodule update in C with parallel cloning

Any feedback welcome!
Thanks,
Stefan

Stefan Beller (8):
  run-command: Fix early shutdown
  run-command: Call get_next_task with a clean child process.
  run-command: Initialize the shutdown flag
  test-run-command: Test for gracefully aborting
  test-run-command: Increase test coverage
  run-command: Fix missing output from late callbacks
  submodule config: Keep update strategy around
  git submodule update: Have a dedicated helper for cloning

 builtin/submodule--helper.c | 222 ++++++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh            |  45 +++------
 run-command.c               |  27 +++++-
 submodule-config.c          |  11 +++
 submodule-config.h          |   1 +
 t/t0061-run-command.sh      |  37 +++++++-
 t/t7400-submodule-basic.sh  |   4 +-
 test-run-command.c          |  37 +++++++-
 8 files changed, 340 insertions(+), 44 deletions(-)

-- 
2.5.0.275.gbfc1651.dirty

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/8] run-command: Fix early shutdown
  2015-10-20 22:43 [PATCH 0/8] Fixes for the parallel processing engine and git submodule update Stefan Beller
@ 2015-10-20 22:43 ` Stefan Beller
  2015-10-20 22:43 ` [PATCH 2/8] run-command: Call get_next_task with a clean child process Stefan Beller
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Stefan Beller @ 2015-10-20 22:43 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, Stefan Beller

The return value of `pp_collect_finished` indicates if we want to shutdown
the parallel processing early. Have just one return path to
return any accumulated results.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 run-command.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/run-command.c b/run-command.c
index ef3da27..b9363da 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1077,7 +1077,7 @@ static int pp_collect_finished(struct parallel_processes *pp)
 	while (pp->nr_processes > 0) {
 		pid = waitpid(-1, &wait_status, WNOHANG);
 		if (pid == 0)
-			return 0;
+			break;
 
 		if (pid < 0)
 			die_errno("wait");
-- 
2.5.0.275.gbfc1651.dirty

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/8] run-command: Call get_next_task with a clean child process.
  2015-10-20 22:43 [PATCH 0/8] Fixes for the parallel processing engine and git submodule update Stefan Beller
  2015-10-20 22:43 ` [PATCH 1/8] run-command: Fix early shutdown Stefan Beller
@ 2015-10-20 22:43 ` Stefan Beller
  2015-10-20 23:05   ` Junio C Hamano
  2015-10-20 23:05   ` Junio C Hamano
  2015-10-20 22:43 ` [PATCH 3/8] run-command: Initialize the shutdown flag Stefan Beller
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 17+ messages in thread
From: Stefan Beller @ 2015-10-20 22:43 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, Stefan Beller

If the `get_next_task` did not explicitly called child_process_init
and only filled in some fields, there may have been some stale data
in the child process. This is hard to debug and also adds a review
burden for each new user of that API. To improve the situation, we
pass only cleanly initialized child structs to the get_next_task.

As an invariant you can now assume any child not in use is
cleaned up and ready for its next reuse.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 run-command.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/run-command.c b/run-command.c
index b9363da..a5ef874 100644
--- a/run-command.c
+++ b/run-command.c
@@ -13,6 +13,12 @@ void child_process_init(struct child_process *child)
 	argv_array_init(&child->env_array);
 }
 
+void child_process_deinit(struct child_process *child)
+{
+	argv_array_clear(&child->args);
+	argv_array_clear(&child->env_array);
+}
+
 struct child_to_clean {
 	pid_t pid;
 	struct child_to_clean *next;
@@ -962,6 +968,7 @@ static struct parallel_processes *pp_init(int n,
 
 	for (i = 0; i < n; i++) {
 		strbuf_init(&pp->children[i].err, 0);
+		child_process_init(&pp->children[i].process);
 		pp->pfd[i].events = POLLIN;
 		pp->pfd[i].fd = -1;
 	}
@@ -973,8 +980,10 @@ static void pp_cleanup(struct parallel_processes *pp)
 {
 	int i;
 
-	for (i = 0; i < pp->max_processes; i++)
+	for (i = 0; i < pp->max_processes; i++) {
 		strbuf_release(&pp->children[i].err);
+		child_process_deinit(&pp->children[i].process);
+	}
 
 	free(pp->children);
 	free(pp->pfd);
@@ -1134,6 +1143,8 @@ static int pp_collect_finished(struct parallel_processes *pp)
 		pp->nr_processes--;
 		pp->children[i].in_use = 0;
 		pp->pfd[i].fd = -1;
+		child_process_deinit(&pp->children[i].process);
+		child_process_init(&pp->children[i].process);
 
 		if (i != pp->output_owner) {
 			strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
-- 
2.5.0.275.gbfc1651.dirty

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/8] run-command: Initialize the shutdown flag
  2015-10-20 22:43 [PATCH 0/8] Fixes for the parallel processing engine and git submodule update Stefan Beller
  2015-10-20 22:43 ` [PATCH 1/8] run-command: Fix early shutdown Stefan Beller
  2015-10-20 22:43 ` [PATCH 2/8] run-command: Call get_next_task with a clean child process Stefan Beller
@ 2015-10-20 22:43 ` Stefan Beller
  2015-10-20 22:43 ` [PATCH 4/8] test-run-command: Test for gracefully aborting Stefan Beller
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Stefan Beller @ 2015-10-20 22:43 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, Stefan Beller

It did work out without initializing the flag so far, but to make it
future proof, we want to explicitly initialize the flag.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 run-command.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/run-command.c b/run-command.c
index a5ef874..d354c01 100644
--- a/run-command.c
+++ b/run-command.c
@@ -962,6 +962,7 @@ static struct parallel_processes *pp_init(int n,
 
 	pp->nr_processes = 0;
 	pp->output_owner = 0;
+	pp->shutdown = 0;
 	pp->children = xcalloc(n, sizeof(*pp->children));
 	pp->pfd = xcalloc(n, sizeof(*pp->pfd));
 	strbuf_init(&pp->buffered_output, 0);
-- 
2.5.0.275.gbfc1651.dirty

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/8] test-run-command: Test for gracefully aborting
  2015-10-20 22:43 [PATCH 0/8] Fixes for the parallel processing engine and git submodule update Stefan Beller
                   ` (2 preceding siblings ...)
  2015-10-20 22:43 ` [PATCH 3/8] run-command: Initialize the shutdown flag Stefan Beller
@ 2015-10-20 22:43 ` Stefan Beller
  2015-10-20 22:43 ` [PATCH 5/8] test-run-command: Increase test coverage Stefan Beller
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Stefan Beller @ 2015-10-20 22:43 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, Stefan Beller

We reuse the get_next_task callback which would stop after invoking the
test program 4 times. However as we have only 3 parallel processes started
(We pass in 3 as max parallel processes and 3 is smaller than the spawn
cap in run-command, so we will start the 3 processes in the first run
deterministically), then the abort logic from the new task_finished
callback kicks in and prevents the parallel processing engine to start
any more processes.

As the task_finished is unconditional, we will see its output 3 times, too.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 t/t0061-run-command.sh | 14 ++++++++++++++
 test-run-command.c     | 14 ++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
index 49aa3db..0af77cd 100755
--- a/t/t0061-run-command.sh
+++ b/t/t0061-run-command.sh
@@ -67,4 +67,18 @@ test_expect_success 'run_command runs in parallel' '
 	test_cmp expect actual
 '
 
+cat >expect <<-EOF
+preloaded output of a child
+asking for a quick stop
+preloaded output of a child
+asking for a quick stop
+preloaded output of a child
+asking for a quick stop
+EOF
+
+test_expect_success 'run_command is asked to abort gracefully' '
+	test-run-command run-command-abort-3 false 2>actual &&
+	test_cmp expect actual
+'
+
 test_done
diff --git a/test-run-command.c b/test-run-command.c
index 699d9e9..4b59482 100644
--- a/test-run-command.c
+++ b/test-run-command.c
@@ -34,6 +34,16 @@ static int parallel_next(void** task_cb,
 	return 1;
 }
 
+static int task_finished(int result,
+			 struct child_process *cp,
+			 struct strbuf *err,
+			 void *pp_cb,
+			 void *pp_task_cb)
+{
+	strbuf_addf(err, "asking for a quick stop\n");
+	return 1;
+}
+
 int main(int argc, char **argv)
 {
 	struct child_process proc = CHILD_PROCESS_INIT;
@@ -55,6 +65,10 @@ int main(int argc, char **argv)
 		exit(run_processes_parallel(4, parallel_next,
 					    NULL, NULL, &proc));
 
+	if (!strcmp(argv[1], "run-command-abort-3"))
+		exit(run_processes_parallel(3, parallel_next,
+					    NULL, task_finished, &proc));
+
 	fprintf(stderr, "check usage\n");
 	return 1;
 }
-- 
2.5.0.275.gbfc1651.dirty

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/8] test-run-command: Increase test coverage
  2015-10-20 22:43 [PATCH 0/8] Fixes for the parallel processing engine and git submodule update Stefan Beller
                   ` (3 preceding siblings ...)
  2015-10-20 22:43 ` [PATCH 4/8] test-run-command: Test for gracefully aborting Stefan Beller
@ 2015-10-20 22:43 ` Stefan Beller
  2015-10-20 22:43 ` [PATCH 6/8] run-command: Fix missing output from late callbacks Stefan Beller
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Stefan Beller @ 2015-10-20 22:43 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, Stefan Beller

Currently we have exact 4 jobs to be run with at most 4 parallel
processes. This sounds as if we're testing only one special case,
but in reality we want to have any number of tasks be processed
successfully, so test:
* more tasks than max jobs (implying slots get reused)
* equal number of jobs and max jobs
* less tasks than max jobs (implying there are unused slots)

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 t/t0061-run-command.sh | 16 +++++++++++++---
 test-run-command.c     | 14 +++++++++-----
 2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
index 0af77cd..f27ada7 100755
--- a/t/t0061-run-command.sh
+++ b/t/t0061-run-command.sh
@@ -62,8 +62,18 @@ Hello
 World
 EOF
 
-test_expect_success 'run_command runs in parallel' '
-	test-run-command run-command-parallel-4 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+test_expect_success 'run_command runs in parallel with more jobs available than tasks' '
+	test-run-command run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'run_command runs in parallel with as many jobs as tasks' '
+	test-run-command run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'run_command runs in parallel with more tasks than jobs available' '
+	test-run-command run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
 	test_cmp expect actual
 '
 
@@ -77,7 +87,7 @@ asking for a quick stop
 EOF
 
 test_expect_success 'run_command is asked to abort gracefully' '
-	test-run-command run-command-abort-3 false 2>actual &&
+	test-run-command run-command-abort 3 false 2>actual &&
 	test_cmp expect actual
 '
 
diff --git a/test-run-command.c b/test-run-command.c
index 4b59482..c8770c2 100644
--- a/test-run-command.c
+++ b/test-run-command.c
@@ -47,10 +47,11 @@ static int task_finished(int result,
 int main(int argc, char **argv)
 {
 	struct child_process proc = CHILD_PROCESS_INIT;
+	int jobs;
 
 	if (argc < 3)
 		return 1;
-	proc.argv = (const char **)argv+2;
+	proc.argv = (const char **)argv + 2;
 
 	if (!strcmp(argv[1], "start-command-ENOENT")) {
 		if (start_command(&proc) < 0 && errno == ENOENT)
@@ -61,12 +62,15 @@ int main(int argc, char **argv)
 	if (!strcmp(argv[1], "run-command"))
 		exit(run_command(&proc));
 
-	if (!strcmp(argv[1], "run-command-parallel-4"))
-		exit(run_processes_parallel(4, parallel_next,
+	jobs = atoi(argv[2]);
+	proc.argv = (const char **)argv + 3;
+
+	if (!strcmp(argv[1], "run-command-parallel"))
+		exit(run_processes_parallel(jobs, parallel_next,
 					    NULL, NULL, &proc));
 
-	if (!strcmp(argv[1], "run-command-abort-3"))
-		exit(run_processes_parallel(3, parallel_next,
+	if (!strcmp(argv[1], "run-command-abort"))
+		exit(run_processes_parallel(jobs, parallel_next,
 					    NULL, task_finished, &proc));
 
 	fprintf(stderr, "check usage\n");
-- 
2.5.0.275.gbfc1651.dirty

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 6/8] run-command: Fix missing output from late callbacks
  2015-10-20 22:43 [PATCH 0/8] Fixes for the parallel processing engine and git submodule update Stefan Beller
                   ` (4 preceding siblings ...)
  2015-10-20 22:43 ` [PATCH 5/8] test-run-command: Increase test coverage Stefan Beller
@ 2015-10-20 22:43 ` Stefan Beller
  2015-10-20 22:43 ` [PATCH 7/8] submodule config: Keep update strategy around Stefan Beller
  2015-10-20 22:43 ` [PATCH 8/8] git submodule update: Have a dedicated helper for cloning Stefan Beller
  7 siblings, 0 replies; 17+ messages in thread
From: Stefan Beller @ 2015-10-20 22:43 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, Stefan Beller

The callbacks in the parallel processing API were given the contract, that
they are not allowed to print anything to stdout/err, but the API will take
care of their output needs instead.

In case a child process is started, the callback can first add its messages
to the buffer and then the child process output is buffered just in the
same buffer, and so the output will be taken care of eventually once the
child process is done.

When no child process is run, we also need to fulfill our promise to
output the buffer eventually. So when no child process is started, we need
to amend the contents of the buffer passed to the child to our buffer for
finished processes. We cannot output directly at that point in time as
another process may be in the middle of its output.

The buffer for finished child processes then also needs to be flushed in
the cleanup phase as it may contain data.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 run-command.c          | 11 ++++++++++-
 t/t0061-run-command.sh |  9 +++++++++
 test-run-command.c     | 13 +++++++++++++
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/run-command.c b/run-command.c
index d354c01..fa73dae 100644
--- a/run-command.c
+++ b/run-command.c
@@ -988,6 +988,12 @@ static void pp_cleanup(struct parallel_processes *pp)
 
 	free(pp->children);
 	free(pp->pfd);
+
+	/*
+	 * When get_next_task added messages to the buffer in its last
+	 * iteration, the buffered output is non empty.
+	 */
+	fputs(pp->buffered_output.buf, stderr);
 	strbuf_release(&pp->buffered_output);
 
 	sigchain_pop_common();
@@ -1023,8 +1029,11 @@ static int pp_start_one(struct parallel_processes *pp)
 	if (!pp->get_next_task(&pp->children[i].data,
 			       &pp->children[i].process,
 			       &pp->children[i].err,
-			       pp->data))
+			       pp->data)) {
+		strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
+		strbuf_reset(&pp->children[i].err);
 		return 1;
+	}
 
 	if (start_command(&pp->children[i].process)) {
 		int code = pp->start_failure(&pp->children[i].process,
diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
index f27ada7..12228b4 100755
--- a/t/t0061-run-command.sh
+++ b/t/t0061-run-command.sh
@@ -91,4 +91,13 @@ test_expect_success 'run_command is asked to abort gracefully' '
 	test_cmp expect actual
 '
 
+cat >expect <<-EOF
+no further jobs available
+EOF
+
+test_expect_success 'run_command outputs ' '
+	test-run-command run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
 test_done
diff --git a/test-run-command.c b/test-run-command.c
index c8770c2..13e5d44 100644
--- a/test-run-command.c
+++ b/test-run-command.c
@@ -34,6 +34,15 @@ static int parallel_next(void** task_cb,
 	return 1;
 }
 
+static int no_job(void** task_cb,
+		  struct child_process *cp,
+		  struct strbuf *err,
+		  void *cb)
+{
+	strbuf_addf(err, "no further jobs available\n");
+	return 0;
+}
+
 static int task_finished(int result,
 			 struct child_process *cp,
 			 struct strbuf *err,
@@ -73,6 +82,10 @@ int main(int argc, char **argv)
 		exit(run_processes_parallel(jobs, parallel_next,
 					    NULL, task_finished, &proc));
 
+	if (!strcmp(argv[1], "run-command-no-jobs"))
+		exit(run_processes_parallel(jobs, no_job,
+					    NULL, task_finished, &proc));
+
 	fprintf(stderr, "check usage\n");
 	return 1;
 }
-- 
2.5.0.275.gbfc1651.dirty

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 7/8] submodule config: Keep update strategy around
  2015-10-20 22:43 [PATCH 0/8] Fixes for the parallel processing engine and git submodule update Stefan Beller
                   ` (5 preceding siblings ...)
  2015-10-20 22:43 ` [PATCH 6/8] run-command: Fix missing output from late callbacks Stefan Beller
@ 2015-10-20 22:43 ` Stefan Beller
  2015-10-20 22:43 ` [PATCH 8/8] git submodule update: Have a dedicated helper for cloning Stefan Beller
  7 siblings, 0 replies; 17+ messages in thread
From: Stefan Beller @ 2015-10-20 22:43 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, Stefan Beller

We need the submodule update strategies in a later patch.

Signed-off-by: Stefan Beller <sbeller@google.com>
---

This may conflict with origin/sb/submodule-config-parse, but only on a
syntactical level (this adds an else if {...} just after the refactoredd code).
There is no clash of functionality or semantics.

 submodule-config.c | 11 +++++++++++
 submodule-config.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/submodule-config.c b/submodule-config.c
index 393de53..175bcbb 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -194,6 +194,7 @@ static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
 
 	submodule->path = NULL;
 	submodule->url = NULL;
+	submodule->update = NULL;
 	submodule->fetch_recurse = RECURSE_SUBMODULES_NONE;
 	submodule->ignore = NULL;
 
@@ -326,6 +327,16 @@ static int parse_config(const char *var, const char *value, void *data)
 		free((void *) submodule->url);
 		strbuf_addstr(&url, value);
 		submodule->url = strbuf_detach(&url, NULL);
+	} else if (!strcmp(item.buf, "update")) {
+		if (!value)
+			ret = config_error_nonbool(var);
+		else if (!me->overwrite && submodule->update != NULL)
+			warn_multiple_config(me->commit_sha1, submodule->name,
+					     "update");
+		else {
+			free((void *)submodule->update);
+			submodule->update = xstrdup(value);
+		}
 	}
 
 release_return:
diff --git a/submodule-config.h b/submodule-config.h
index 9061e4e..f9e2a29 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -14,6 +14,7 @@ struct submodule {
 	const char *url;
 	int fetch_recurse;
 	const char *ignore;
+	const char *update;
 	/* the sha1 blob id of the responsible .gitmodules file */
 	unsigned char gitmodules_sha1[20];
 };
-- 
2.5.0.275.gbfc1651.dirty

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 8/8] git submodule update: Have a dedicated helper for cloning
  2015-10-20 22:43 [PATCH 0/8] Fixes for the parallel processing engine and git submodule update Stefan Beller
                   ` (6 preceding siblings ...)
  2015-10-20 22:43 ` [PATCH 7/8] submodule config: Keep update strategy around Stefan Beller
@ 2015-10-20 22:43 ` Stefan Beller
  2015-10-21 20:47   ` Junio C Hamano
  7 siblings, 1 reply; 17+ messages in thread
From: Stefan Beller @ 2015-10-20 22:43 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, Stefan Beller

This introduces a new helper function in git submodule--helper
which takes care of cloning all submodules, which we want to
parallelize eventually.

Some tests (such as empty URL, update_mode==none) are required in the
helper to make the decision for cloning. These checks have been moved
into the C function as well. (No need to repeat them in the shell
script)

As we can only access the stderr channel from within the parallel
processing engine, so we need to reroute the error message for
specified but initialized submodules to stderr. As it is an error
message, this should have gone to stderr in the first place, so a
bug fix along the way.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/submodule--helper.c | 222 ++++++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh            |  45 +++------
 t/t7400-submodule-basic.sh  |   4 +-
 3 files changed, 235 insertions(+), 36 deletions(-)
 
 Review is best done starting at the end and scrolling up, as that's how the
 code flows in submodule--helper.c.

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index f4c3eff..6d4815a 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -255,6 +255,227 @@ static int module_clone(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
+static int git_submodule_config(const char *var, const char *value, void *cb)
+{
+	return parse_submodule_config_option(var, value);
+}
+
+struct submodule_update_clone {
+	int count;
+	int quiet;
+	int print_unmatched;
+	char *reference;
+	char *depth;
+	char *update;
+	const char *recursive_prefix;
+	const char *prefix;
+	struct module_list list;
+	struct string_list projectlines;
+	struct pathspec pathspec;
+};
+#define SUBMODULE_UPDATE_CLONE_INIT {0, 0, 0, NULL, NULL, NULL, NULL, NULL, MODULE_LIST_INIT, STRING_LIST_INIT_DUP}
+
+static void fill_clone_command(struct child_process *cp, int quiet,
+			       const char *prefix, const char *path,
+			       const char *name, const char *url,
+			       const char *reference, const char *depth)
+{
+	cp->git_cmd = 1;
+	cp->no_stdin = 1;
+	cp->stdout_to_stderr = 1;
+	cp->err = -1;
+	argv_array_push(&cp->args, "submodule--helper");
+	argv_array_push(&cp->args, "clone");
+	if (quiet)
+		argv_array_push(&cp->args, "--quiet");
+
+	if (prefix) {
+		argv_array_push(&cp->args, "--prefix");
+		argv_array_push(&cp->args, prefix);
+	}
+	argv_array_push(&cp->args, "--path");
+	argv_array_push(&cp->args, path);
+
+	argv_array_push(&cp->args, "--name");
+	argv_array_push(&cp->args, name);
+
+	argv_array_push(&cp->args, "--url");
+	argv_array_push(&cp->args, url);
+	if (reference)
+		argv_array_push(&cp->args, reference);
+	if (depth)
+		argv_array_push(&cp->args, depth);
+}
+
+static int get_next_task(void **pp_task_cb,
+			 struct child_process *cp,
+			 struct strbuf *err,
+			 void *pp_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	for (; pp->count < pp->list.nr; pp->count++) {
+		const struct submodule *sub = NULL;
+		const char *displaypath = NULL;
+		const struct cache_entry *ce = pp->list.entries[pp->count];
+		struct strbuf sb = STRBUF_INIT;
+		const char *update_module = NULL;
+		const char *url = NULL;
+		int just_cloned = 0;
+
+		if (ce_stage(ce)) {
+			if (pp->recursive_prefix)
+				strbuf_addf(err, "Skipping unmerged submodule %s/%s\n",
+					pp->recursive_prefix, ce->name);
+			else
+				strbuf_addf(err, "Skipping unmerged submodule %s\n",
+					ce->name);
+			continue;
+		}
+
+		sub = submodule_from_path(null_sha1, ce->name);
+		if (pp->recursive_prefix)
+			displaypath = relative_path(pp->recursive_prefix, ce->name, &sb);
+		else
+			displaypath = ce->name;
+
+		if (pp->update)
+			update_module = pp->update;
+		if (!update_module)
+			update_module = sub->update;
+		if (!update_module)
+			update_module = "checkout";
+		if (!strcmp(update_module, "none")) {
+			strbuf_addf(err, "Skipping submodule '%s'\n", displaypath);
+			continue;
+		}
+
+		/*
+		 * Looking up the url in .git/config.
+		 * We cannot fall back to .gitmodules as we only want to process
+		 * configured submodules. This renders the submodule lookup API
+		 * useless, as it cannot lookup without fallback.
+		 */
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "submodule.%s.url", sub->name);
+		git_config_get_string_const(sb.buf, &url);
+		if (!url) {
+			/*
+			 * Only mention uninitialized submodules when its
+			 * path have been specified
+			 */
+			if (pp->pathspec.nr)
+				strbuf_addf(err, _("Submodule path '%s' not initialized\n"
+					"Maybe you want to use 'update --init'?"), displaypath);
+			continue;
+		}
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%s/.git", ce->name);
+		just_cloned = !file_exists(sb.buf);
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%06o %s %d %d\t%s\n", ce->ce_mode,
+				sha1_to_hex(ce->sha1), ce_stage(ce),
+				just_cloned, ce->name);
+		string_list_append(&pp->projectlines, sb.buf);
+
+		if (just_cloned) {
+			fill_clone_command(cp, pp->quiet, pp->prefix, ce->name,
+					   sub->name, url, pp->reference, pp->depth);
+			pp->count++;
+			return 1;
+		}
+	}
+	return 0;
+}
+
+static int start_failure(struct child_process *cp,
+			 struct strbuf *err,
+			 void *pp_cb,
+			 void *pp_task_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	strbuf_addf(err, "error when starting a child process");
+	pp->print_unmatched = 1;
+
+	return 1;
+}
+
+static int task_finished(int result,
+			 struct child_process *cp,
+			 struct strbuf *err,
+			 void *pp_cb,
+			 void *pp_task_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	if (!result) {
+		return 0;
+	} else {
+		strbuf_addf(err, "error in one child process");
+		pp->print_unmatched = 1;
+		return 1;
+	}
+}
+
+static int update_clone(int argc, const char **argv, const char *prefix)
+{
+	struct string_list_item *item;
+	struct submodule_update_clone pp = SUBMODULE_UPDATE_CLONE_INIT;
+
+	struct option module_list_options[] = {
+		OPT_STRING(0, "prefix", &prefix,
+			   N_("path"),
+			   N_("path into the working tree")),
+		OPT_STRING(0, "recursive_prefix", &pp.recursive_prefix,
+			   N_("path"),
+			   N_("path into the working tree, across nested "
+			      "submodule boundaries")),
+		OPT_STRING(0, "update", &pp.update,
+			   N_("string"),
+			   N_("update command for submodules")),
+		OPT_STRING(0, "reference", &pp.reference, "<repository>",
+			N_("Use the local reference repository "
+			   "instead of a full clone")),
+		OPT_STRING(0, "depth", &pp.depth, "<depth>",
+			N_("Create a shallow clone truncated to the "
+			   "specified number of revisions")),
+		OPT__QUIET(&pp.quiet, N_("do't print cloning progress")),
+		OPT_END()
+	};
+
+	const char *const git_submodule_helper_usage[] = {
+		N_("git submodule--helper list [--prefix=<path>] [<path>...]"),
+		NULL
+	};
+	pp.prefix = prefix;
+
+	argc = parse_options(argc, argv, prefix, module_list_options,
+			     git_submodule_helper_usage, 0);
+
+	if (module_list_compute(argc, argv, prefix, &pp.pathspec, &pp.list) < 0) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	gitmodules_config();
+	/* Overlay the parsed .gitmodules file with .git/config */
+	git_config(git_submodule_config, NULL);
+	run_processes_parallel(1, get_next_task, start_failure, task_finished, &pp);
+
+	if (pp.print_unmatched) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	for_each_string_list_item(item, &pp.projectlines) {
+		utf8_fprintf(stdout, "%s", item->string);
+	}
+	return 0;
+}
+
 struct cmd_struct {
 	const char *cmd;
 	int (*fn)(int, const char **, const char *);
@@ -264,6 +485,7 @@ static struct cmd_struct commands[] = {
 	{"list", module_list},
 	{"name", module_name},
 	{"clone", module_clone},
+	{"update-clone", update_clone}
 };
 
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
diff --git a/git-submodule.sh b/git-submodule.sh
index 8b0eb9a..ea883b9 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -655,17 +655,18 @@ cmd_update()
 		cmd_init "--" "$@" || return
 	fi
 
-	cloned_modules=
-	git submodule--helper list --prefix "$wt_prefix" "$@" | {
+	git submodule--helper update-clone ${GIT_QUIET:+--quiet} \
+		${wt_prefix:+--prefix "$wt_prefix"} \
+		${prefix:+--recursive_prefix "$prefix"} \
+		${update:+--update "$update"} \
+		${reference:+--reference "$reference"} \
+		${depth:+--depth "$depth"} \
+		"$@" | {
 	err=
-	while read mode sha1 stage sm_path
+	while read mode sha1 stage just_cloned sm_path
 	do
 		die_if_unmatched "$mode"
-		if test "$stage" = U
-		then
-			echo >&2 "Skipping unmerged submodule $prefix$sm_path"
-			continue
-		fi
+
 		name=$(git submodule--helper name "$sm_path") || exit
 		url=$(git config submodule."$name".url)
 		branch=$(get_submodule_config "$name" branch master)
@@ -682,27 +683,10 @@ cmd_update()
 
 		displaypath=$(relative_path "$prefix$sm_path")
 
-		if test "$update_module" = "none"
-		then
-			echo "Skipping submodule '$displaypath'"
-			continue
-		fi
-
-		if test -z "$url"
-		then
-			# Only mention uninitialized submodules when its
-			# path have been specified
-			test "$#" != "0" &&
-			say "$(eval_gettext "Submodule path '\$displaypath' not initialized
-Maybe you want to use 'update --init'?")"
-			continue
-		fi
-
-		if ! test -d "$sm_path"/.git && ! test -f "$sm_path"/.git
+		if test $just_cloned -eq 1
 		then
-			git submodule--helper clone ${GIT_QUIET:+--quiet} --prefix "$prefix" --path "$sm_path" --name "$name" --url "$url" "$reference" "$depth" || exit
-			cloned_modules="$cloned_modules;$name"
 			subsha1=
+			update_module=checkout
 		else
 			subsha1=$(clear_local_git_env; cd "$sm_path" &&
 				git rev-parse --verify HEAD) ||
@@ -742,13 +726,6 @@ Maybe you want to use 'update --init'?")"
 				die "$(eval_gettext "Unable to fetch in submodule path '\$displaypath'")"
 			fi
 
-			# Is this something we just cloned?
-			case ";$cloned_modules;" in
-			*";$name;"*)
-				# then there is no local change to integrate
-				update_module=checkout ;;
-			esac
-
 			must_die_on_failure=
 			case "$update_module" in
 			checkout)
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index 540771c..5991e3c 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -462,7 +462,7 @@ test_expect_success 'update --init' '
 	git config --remove-section submodule.example &&
 	test_must_fail git config submodule.example.url &&
 
-	git submodule update init > update.out &&
+	git submodule update init 2> update.out &&
 	cat update.out &&
 	test_i18ngrep "not initialized" update.out &&
 	test_must_fail git rev-parse --resolve-git-dir init/.git &&
@@ -480,7 +480,7 @@ test_expect_success 'update --init from subdirectory' '
 	mkdir -p sub &&
 	(
 		cd sub &&
-		git submodule update ../init >update.out &&
+		git submodule update ../init 2>update.out &&
 		cat update.out &&
 		test_i18ngrep "not initialized" update.out &&
 		test_must_fail git rev-parse --resolve-git-dir ../init/.git &&
-- 
2.5.0.275.gbfc1651.dirty

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/8] run-command: Call get_next_task with a clean child process.
  2015-10-20 22:43 ` [PATCH 2/8] run-command: Call get_next_task with a clean child process Stefan Beller
@ 2015-10-20 23:05   ` Junio C Hamano
  2015-10-20 23:05   ` Junio C Hamano
  1 sibling, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2015-10-20 23:05 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, ramsay, jacob.keller, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Stefan Beller <sbeller@google.com> writes:

> If the `get_next_task` did not explicitly called child_process_init
> and only filled in some fields, there may have been some stale data
> in the child process. This is hard to debug and also adds a review
> burden for each new user of that API. To improve the situation, we
> pass only cleanly initialized child structs to the get_next_task.
>
> As an invariant you can now assume any child not in use is
> cleaned up and ready for its next reuse.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  run-command.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/run-command.c b/run-command.c
> index b9363da..a5ef874 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -13,6 +13,12 @@ void child_process_init(struct child_process *child)
>  	argv_array_init(&child->env_array);
>  }
>  
> +void child_process_deinit(struct child_process *child)
> +{
> +	argv_array_clear(&child->args);
> +	argv_array_clear(&child->env_array);
> +}
> +

Is this necessary (and is it necessary to make it global)?
I thought that finish_command() already clears them....

    ... goes and looks ...

Ahh, of course, pp_*() functions do use start_command() but do not
use finish_command(), which sort of breaks symmetry, but that cannot
be helped.  Because we want to wait for any of the multiple tasks
running, we cannot call finish_command() that explicitly says "I
want to wait for this one to finish".

And that is why you already have two calls to array-clear inside
collect_finished(), just after calling task_finished().

And of course we already have these array-clear calls in
finish_command().

So I agree that deinit helper should exist, but

 * it should be file-scope static;

 * it should be called by finish_command(); and

 * if you are calling it from collect_finished(), then existing
   calls to array-clear should go.

Other than that, this looks good.

Thansk.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/8] run-command: Call get_next_task with a clean child process.
  2015-10-20 22:43 ` [PATCH 2/8] run-command: Call get_next_task with a clean child process Stefan Beller
  2015-10-20 23:05   ` Junio C Hamano
@ 2015-10-20 23:05   ` Junio C Hamano
  2015-10-21 20:30     ` Junio C Hamano
  1 sibling, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2015-10-20 23:05 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, ramsay, jacob.keller, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Stefan Beller <sbeller@google.com> writes:

> If the `get_next_task` did not explicitly called child_process_init
> and only filled in some fields, there may have been some stale data
> in the child process. This is hard to debug and also adds a review
> burden for each new user of that API. To improve the situation, we
> pass only cleanly initialized child structs to the get_next_task.
>
> As an invariant you can now assume any child not in use is
> cleaned up and ready for its next reuse.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  run-command.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/run-command.c b/run-command.c
> index b9363da..a5ef874 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -13,6 +13,12 @@ void child_process_init(struct child_process *child)
>  	argv_array_init(&child->env_array);
>  }
>  
> +void child_process_deinit(struct child_process *child)
> +{
> +	argv_array_clear(&child->args);
> +	argv_array_clear(&child->env_array);
> +}
> +

Is this necessary (and is it necessary to make it global)?
I thought that finish_command() already clears them....

    ... goes and looks ...

Ahh, of course, pp_*() functions do use start_command() but do not
use finish_command(), which sort of breaks symmetry, but that cannot
be helped.  Because we want to wait for any of the multiple tasks
running, we cannot call finish_command() that explicitly says "I
want to wait for this one to finish".

And that is why you already have two calls to array-clear inside
collect_finished(), just after calling task_finished().

And of course we already have these array-clear calls in
finish_command().

So I agree that deinit helper should exist, but

 * it should be file-scope static;

 * it should be called by finish_command(); and

 * if you are calling it from collect_finished(), then existing
   calls to array-clear should go.

Other than that, this looks good.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/8] run-command: Call get_next_task with a clean child process.
  2015-10-20 23:05   ` Junio C Hamano
@ 2015-10-21 20:30     ` Junio C Hamano
  2015-10-21 21:07       ` Stefan Beller
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2015-10-21 20:30 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, ramsay, jacob.keller, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Junio C Hamano <gitster@pobox.com> writes:

> And of course we already have these array-clear calls in
> finish_command().
>
> So I agree that deinit helper should exist, but
>
>  * it should be file-scope static;
>
>  * it should be called by finish_command(); and
>
>  * if you are calling it from collect_finished(), then existing
>    calls to array-clear should go.
>
> Other than that, this looks good.

I'll queue this instead (the above squashed in and description
corrected).

-- >8 --
From: Stefan Beller <sbeller@google.com>
Date: Tue, 20 Oct 2015 15:43:44 -0700
Subject: [PATCH] run-command: clear leftover state from child_process structure

pp_start_one() function finds an unused child_process structure
passes it to start_command(), but the structure may have already
been used earlier.  finish_command() has code to clear leftover
states in the structure so that it can be reused, but the parallel
execution machinery does not (and cannot) use it, and instead has
its own pp_collect_finished().  This function, after culling a
process that has just finished, forgot to clear the child_process
structure before marking it ready for reuse.

Introduce child_process_deinit() helper function that clears two
instances of argv_array (one for arg, the other for env) in the
structure, make the existing codepaths that clear them call the
helper instead (which in turn will make sure that we will not leak
resources later even when we add new fields to the structure), and
also add a call to it in pp_collect_finished() before the function
marks the structure read for reuse.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 run-command.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/run-command.c b/run-command.c
index b9363da..684ccee 100644
--- a/run-command.c
+++ b/run-command.c
@@ -13,6 +13,12 @@ void child_process_init(struct child_process *child)
 	argv_array_init(&child->env_array);
 }
 
+static void child_process_deinit(struct child_process *child)
+{
+	argv_array_clear(&child->args);
+	argv_array_clear(&child->env_array);
+}
+
 struct child_to_clean {
 	pid_t pid;
 	struct child_to_clean *next;
@@ -338,8 +344,7 @@ int start_command(struct child_process *cmd)
 fail_pipe:
 			error("cannot create %s pipe for %s: %s",
 				str, cmd->argv[0], strerror(failed_errno));
-			argv_array_clear(&cmd->args);
-			argv_array_clear(&cmd->env_array);
+			child_process_deinit(cmd);
 			errno = failed_errno;
 			return -1;
 		}
@@ -525,8 +530,7 @@ fail_pipe:
 			close_pair(fderr);
 		else if (cmd->err)
 			close(cmd->err);
-		argv_array_clear(&cmd->args);
-		argv_array_clear(&cmd->env_array);
+		child_process_deinit(cmd);
 		errno = failed_errno;
 		return -1;
 	}
@@ -552,8 +556,7 @@ fail_pipe:
 int finish_command(struct child_process *cmd)
 {
 	int ret = wait_or_whine(cmd->pid, cmd->argv[0]);
-	argv_array_clear(&cmd->args);
-	argv_array_clear(&cmd->env_array);
+	child_process_deinit(cmd);
 	return ret;
 }
 
@@ -962,6 +965,7 @@ static struct parallel_processes *pp_init(int n,
 
 	for (i = 0; i < n; i++) {
 		strbuf_init(&pp->children[i].err, 0);
+		child_process_init(&pp->children[i].process);
 		pp->pfd[i].events = POLLIN;
 		pp->pfd[i].fd = -1;
 	}
@@ -973,8 +977,10 @@ static void pp_cleanup(struct parallel_processes *pp)
 {
 	int i;
 
-	for (i = 0; i < pp->max_processes; i++)
+	for (i = 0; i < pp->max_processes; i++) {
 		strbuf_release(&pp->children[i].err);
+		child_process_deinit(&pp->children[i].process);
+	}
 
 	free(pp->children);
 	free(pp->pfd);
@@ -1128,12 +1134,11 @@ static int pp_collect_finished(struct parallel_processes *pp)
 				      &pp->children[i].data))
 			result = 1;
 
-		argv_array_clear(&pp->children[i].process.args);
-		argv_array_clear(&pp->children[i].process.env_array);
-
 		pp->nr_processes--;
 		pp->children[i].in_use = 0;
 		pp->pfd[i].fd = -1;
+		child_process_deinit(&pp->children[i].process);
+		child_process_init(&pp->children[i].process);
 
 		if (i != pp->output_owner) {
 			strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
-- 
2.6.2-394-gc0a4d5b

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 8/8] git submodule update: Have a dedicated helper for cloning
  2015-10-20 22:43 ` [PATCH 8/8] git submodule update: Have a dedicated helper for cloning Stefan Beller
@ 2015-10-21 20:47   ` Junio C Hamano
  2015-10-21 21:06     ` Stefan Beller
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2015-10-21 20:47 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, ramsay, jacob.keller, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Stefan Beller <sbeller@google.com> writes:

> This introduces a new helper function in git submodule--helper
> which takes care of cloning all submodules, which we want to
> parallelize eventually.
>
> Some tests (such as empty URL, update_mode==none) are required in the
> helper to make the decision for cloning. These checks have been moved
> into the C function as well. (No need to repeat them in the shell
> script)
>
> As we can only access the stderr channel from within the parallel
> processing engine, so we need to reroute the error message for
> specified but initialized submodules to stderr. As it is an error
> message, this should have gone to stderr in the first place, so a
> bug fix along the way.

The last paragraph is hard to parse; perhaps it is slightly
ungrammatical.

It would be a really good idea to split the small bit to redirect
the output that should have gone to the standard error to where it
should as a preparatory step before showing this patch.

I sense that this one is still a WIP/RFC, so I'll only skim it in
this round (but I may come back and read it again later with finer
toothed comb).

> +static int get_next_task(void **pp_task_cb,
> +			 struct child_process *cp,
> +			 struct strbuf *err,
> +			 void *pp_cb)

Will you have only one caller of the parallel run-command API in
this file, or will you be adding more to allow various different
operations run in parallel as more things are rewritten?  I am
guessing that it would be the latter, but if that is the case,
perhaps the function wants to be named a bit more specificly for
this first user, no?  Same for start_failure and task_finished.

> diff --git a/git-submodule.sh b/git-submodule.sh
> index 8b0eb9a..ea883b9 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -655,17 +655,18 @@ cmd_update()
>  		cmd_init "--" "$@" || return
>  	fi
>  
> -	cloned_modules=
> -	git submodule--helper list --prefix "$wt_prefix" "$@" | {
> +	git submodule--helper update-clone ${GIT_QUIET:+--quiet} \
> +		${wt_prefix:+--prefix "$wt_prefix"} \
> +		${prefix:+--recursive_prefix "$prefix"} \
> +		${update:+--update "$update"} \
> +		${reference:+--reference "$reference"} \
> +		${depth:+--depth "$depth"} \
> +		"$@" | {
>  	err=
> -	while read mode sha1 stage sm_path
> +	while read mode sha1 stage just_cloned sm_path
>  	do

I wonder if you really want this to be upstream of a pipe.  When the
downstream loop needs to abort, what happens to the remainder of the
"clone" part of the processing that is still ongoing in the upstream
of the pipe?  I would imagine that the "update-clone" network
accessing phase is the more human-time consuming part, so I suspect
that it would be much better to let the cloning part go and finish
first (during which time the human-user can spend time for other
things, like getting cup of coffee or filling expense reports) and
before moving to the loop that can stop and ask the human-user for
help.

The fix for the above could be trivial (do not pipe, just take the
output to a temporary file, and then feed the "while read" loop from
that temporary file), and I suspect it would make a big difference
for usability.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 8/8] git submodule update: Have a dedicated helper for cloning
  2015-10-21 20:47   ` Junio C Hamano
@ 2015-10-21 21:06     ` Stefan Beller
  2015-10-21 21:23       ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Beller @ 2015-10-21 21:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git@vger.kernel.org, Ramsay Jones, Jacob Keller, Jeff King,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann, Eric Sunshine

On Wed, Oct 21, 2015 at 1:47 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> This introduces a new helper function in git submodule--helper
>> which takes care of cloning all submodules, which we want to
>> parallelize eventually.
>>
>> Some tests (such as empty URL, update_mode==none) are required in the
>> helper to make the decision for cloning. These checks have been moved
>> into the C function as well. (No need to repeat them in the shell
>> script)
>>
>> As we can only access the stderr channel from within the parallel
>> processing engine, so we need to reroute the error message for
>> specified but initialized submodules to stderr. As it is an error
>> message, this should have gone to stderr in the first place, so a
>> bug fix along the way.
>
> The last paragraph is hard to parse; perhaps it is slightly
> ungrammatical.

I seem to have started a habit starting my sentences with "so..."
even in spoken English. If left out, this may be easier to read:

    As we can only access the stderr channel from within the parallel
    processing engine, we need to reroute the error message for
    "specified but initialized submodules" to stderr. As it is an error
    message, this should have gone to stderr in the first place.
    It's a bug fix along the way.

>
> It would be a really good idea to split the small bit to redirect
> the output that should have gone to the standard error to where it
> should as a preparatory step before showing this patch.

ok.

>
> I sense that this one is still a WIP/RFC, so I'll only skim it in
> this round (but I may come back and read it again later with finer
> toothed comb).
>
>> +static int get_next_task(void **pp_task_cb,
>> +                      struct child_process *cp,
>> +                      struct strbuf *err,
>> +                      void *pp_cb)
>
> Will you have only one caller of the parallel run-command API in
> this file, or will you be adding more to allow various different
> operations run in parallel as more things are rewritten?  I am
> guessing that it would be the latter, but if that is the case,
> perhaps the function wants to be named a bit more specificly for
> this first user, no?  Same for start_failure and task_finished.

Ok, will rename.
Although I am not sure if I need to rewrite more in C for "git submodule".

I only rewrite git submodule update because git clone --recurse is just
blindly calling out to git submodule update.  So instead of parallelizing
"submodule update" I could have put a parallel submodule clone into
the clone command. (That looks strangely appealing now, because it
may be even faster as there is no downstream pipe with sequential
checkouts, so you could have one parallel pool with chained clone
and checkout commands).

>
>> diff --git a/git-submodule.sh b/git-submodule.sh
>> index 8b0eb9a..ea883b9 100755
>> --- a/git-submodule.sh
>> +++ b/git-submodule.sh
>> @@ -655,17 +655,18 @@ cmd_update()
>>               cmd_init "--" "$@" || return
>>       fi
>>
>> -     cloned_modules=
>> -     git submodule--helper list --prefix "$wt_prefix" "$@" | {
>> +     git submodule--helper update-clone ${GIT_QUIET:+--quiet} \
>> +             ${wt_prefix:+--prefix "$wt_prefix"} \
>> +             ${prefix:+--recursive_prefix "$prefix"} \
>> +             ${update:+--update "$update"} \
>> +             ${reference:+--reference "$reference"} \
>> +             ${depth:+--depth "$depth"} \
>> +             "$@" | {
>>       err=
>> -     while read mode sha1 stage sm_path
>> +     while read mode sha1 stage just_cloned sm_path
>>       do
>
> I wonder if you really want this to be upstream of a pipe.  When the
> downstream loop needs to abort, what happens to the remainder of the
> "clone" part of the processing that is still ongoing in the upstream
> of the pipe?  I would imagine that the "update-clone" network
> accessing phase is the more human-time consuming part, so I suspect
> that it would be much better to let the cloning part go and finish
> first (during which time the human-user can spend time for other
> things, like getting cup of coffee or filling expense reports) and
> before moving to the loop that can stop and ask the human-user for
> help.
>
> The fix for the above could be trivial (do not pipe, just take the
> output to a temporary file, and then feed the "while read" loop from
> that temporary file), and I suspect it would make a big difference
> for usability.

I'd like to counter your argument with quoting code from update_clone
method:

     run_processes_parallel(1, get_next_task, start_failure,
task_finished, &pp);

     if (pp.print_unmatched) {
         printf("#unmatched\n");
         return 1;
     }

     for_each_string_list_item(item, &pp.projectlines) {
         utf8_fprintf(stdout, "%s", item->string);
     }

So we do already all the cloning first, and then once we did all of that
we just put out all accumulated lines of text. (It was harder to come up with
a sufficient file name than just storing it in memory. I don't think
memory is an
issue here, only a few bytes per submodule. So even 1000 submodules would
consume maybe 100kB)

Having a file though would allow us to continue after human interaction fixed
one problem.




>
> Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/8] run-command: Call get_next_task with a clean child process.
  2015-10-21 20:30     ` Junio C Hamano
@ 2015-10-21 21:07       ` Stefan Beller
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Beller @ 2015-10-21 21:07 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git@vger.kernel.org, Ramsay Jones, Jacob Keller, Jeff King,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann, Eric Sunshine

On Wed, Oct 21, 2015 at 1:30 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> And of course we already have these array-clear calls in
>> finish_command().
>>
>> So I agree that deinit helper should exist, but
>>
>>  * it should be file-scope static;
>>
>>  * it should be called by finish_command(); and
>>
>>  * if you are calling it from collect_finished(), then existing
>>    calls to array-clear should go.
>>
>> Other than that, this looks good.
>
> I'll queue this instead (the above squashed in and description
> corrected).
>

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 8/8] git submodule update: Have a dedicated helper for cloning
  2015-10-21 21:06     ` Stefan Beller
@ 2015-10-21 21:23       ` Junio C Hamano
  2015-10-21 22:14         ` Stefan Beller
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2015-10-21 21:23 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Ramsay Jones, Jacob Keller, Jeff King,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann, Eric Sunshine

Stefan Beller <sbeller@google.com> writes:

> I'd like to counter your argument with quoting code from update_clone
> method:
> 
>      run_processes_parallel(1, get_next_task, start_failure,
> task_finished, &pp);
>
>      if (pp.print_unmatched) {
>          printf("#unmatched\n");
>          return 1;
>      }
>
>      for_each_string_list_item(item, &pp.projectlines) {
>          utf8_fprintf(stdout, "%s", item->string);
>      }
>
> So we do already all the cloning first, and then once we did all of that
> we just put out all accumulated lines of text. (It was harder to come up with
> a sufficient file name than just storing it in memory. I don't think
> memory is an
> issue here, only a few bytes per submodule. So even 1000 submodules would
> consume maybe 100kB)

That does not sound like a counter-argument; two bad design choices
compensating each other's shortcomings, perhaps ;-)

> Having a file though would allow us to continue after human
> interaction fixed one problem.

Yes.  That does sound like a better design.

This obviously depends on the impact to the other part of what
cmd_update() does, but your earlier idea to investigate the
feasibility and usefulness of updating "clone --recurse-submodules"
does sound like a good thing to do, too.  That's an excellent point.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 8/8] git submodule update: Have a dedicated helper for cloning
  2015-10-21 21:23       ` Junio C Hamano
@ 2015-10-21 22:14         ` Stefan Beller
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Beller @ 2015-10-21 22:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git@vger.kernel.org, Ramsay Jones, Jacob Keller, Jeff King,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann, Eric Sunshine

On Wed, Oct 21, 2015 at 2:23 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> I'd like to counter your argument with quoting code from update_clone
>> method:
>>
>>      run_processes_parallel(1, get_next_task, start_failure,
>> task_finished, &pp);
>>
>>      if (pp.print_unmatched) {
>>          printf("#unmatched\n");
>>          return 1;
>>      }
>>
>>      for_each_string_list_item(item, &pp.projectlines) {
>>          utf8_fprintf(stdout, "%s", item->string);
>>      }
>>
>> So we do already all the cloning first, and then once we did all of that
>> we just put out all accumulated lines of text. (It was harder to come up with
>> a sufficient file name than just storing it in memory. I don't think
>> memory is an
>> issue here, only a few bytes per submodule. So even 1000 submodules would
>> consume maybe 100kB)
>
> That does not sound like a counter-argument; two bad design choices
> compensating each other's shortcomings, perhaps ;-)

I was phrasing it worse than I meant to. I should have pointed out the
positive aspect of having first all clones done and then the
local post processing in the downstream pipe afterwards.

>
>> Having a file though would allow us to continue after human
>> interaction fixed one problem.
>
> Yes.  That does sound like a better design.

I don't think the proposed patches make it worse than it already is.
Before we have the "submodule--helper list" which tells downpipe to
do all the things. Now we just take out the cloning and make it an
upfront action, eventually faster due to parallelism.

Also I think we should not promote "git submodule" and specially
its update sub-command to be the best command available. Ideally
we want to rather implement implicit submodule handling in the
other commands such as clone, pull, fetch, checkout, merge, rebase.
and by that I mean better defaults than just "don't touch the submodules,
as that's the safest thing to do now".

>
> This obviously depends on the impact to the other part of what
> cmd_update() does, but your earlier idea to investigate the
> feasibility and usefulness of updating "clone --recurse-submodules"
> does sound like a good thing to do, too.  That's an excellent point.

I investigated and I think it's a bad idea now :)
Because of the --recursive switch we would need to do more than just

    submodules_init()
    run_parallel(.. clone_and_checkout...);

but each cloned submodule would need to be inspected for recursive
submodules again and then we would need to add that to the list of
submodules to process.

I estimate this is about as much work as improving "git submodule update"
to do uncontroversial checkouts in the first parallel phase.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-10-21 22:15 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-20 22:43 [PATCH 0/8] Fixes for the parallel processing engine and git submodule update Stefan Beller
2015-10-20 22:43 ` [PATCH 1/8] run-command: Fix early shutdown Stefan Beller
2015-10-20 22:43 ` [PATCH 2/8] run-command: Call get_next_task with a clean child process Stefan Beller
2015-10-20 23:05   ` Junio C Hamano
2015-10-20 23:05   ` Junio C Hamano
2015-10-21 20:30     ` Junio C Hamano
2015-10-21 21:07       ` Stefan Beller
2015-10-20 22:43 ` [PATCH 3/8] run-command: Initialize the shutdown flag Stefan Beller
2015-10-20 22:43 ` [PATCH 4/8] test-run-command: Test for gracefully aborting Stefan Beller
2015-10-20 22:43 ` [PATCH 5/8] test-run-command: Increase test coverage Stefan Beller
2015-10-20 22:43 ` [PATCH 6/8] run-command: Fix missing output from late callbacks Stefan Beller
2015-10-20 22:43 ` [PATCH 7/8] submodule config: Keep update strategy around Stefan Beller
2015-10-20 22:43 ` [PATCH 8/8] git submodule update: Have a dedicated helper for cloning Stefan Beller
2015-10-21 20:47   ` Junio C Hamano
2015-10-21 21:06     ` Stefan Beller
2015-10-21 21:23       ` Junio C Hamano
2015-10-21 22:14         ` Stefan Beller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).