git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/9] Expose the submodule parallelism to the user
@ 2015-10-27 18:15 Stefan Beller
  2015-10-27 18:15 ` [PATCH 1/9] submodule-config: "goto" removal in parse_config() Stefan Beller
                   ` (9 more replies)
  0 siblings, 10 replies; 48+ messages in thread
From: Stefan Beller @ 2015-10-27 18:15 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

Where does it apply?
---
This applies on 376d400f4c (run-command: fix missing output from late callbacks,
which is the latest commit in origin/sb/submodule-parallel-fetch which was
merged to origin/next)
The first patch is a duplicate of origin/sb/submodule-config-parse, so
it may make sense to drop the first patch and apply this series on top of a
merge of 376d400f4c and origin/sb/submodule-config-parse.

I realize sending refactorings in the area you'd be likely to touch as 
a separate patch (series) is not necessarily a good idea as it leads to
situations like this.

What does it do?
---
This series should finish the on going efforts of parallelizing
submodule network traffic. The patches contain tests for clone,
fetch and submodule update to use the actual parallelism both via
command line as well as a configured option. I decided to go with
"submodule.jobs" for all three for now.

Detailed breakdown of the patches
---

Patch 1 is a duplicate of origin/sb/submodule-config-parse and may make
merging with that easier.

Patch 2 adds the update strategy to the struct submodule, which is required in
patch 4.

Patch 3 adds rudimentary tracing output to the parallel processing commands.

Patch 4 rewrites parts of "git submodule update" in C, such that the cloning
is done from within the parallel processing engine. 

Patch 5 however exposes the possible parallelism of patch 4 to the user.
(doc + tests)

Patch 6 adds the parallel feature to clone, which just invokes "submodule update"
internally.

Patch 7 is a small refactoring preparing patch 8 to smoothly parse submodules.jobs.

Patch 9 teaches fetch to respect the desired parallelism both from command line
as well as the config option.

Thanks,
Stefan

Stefan Beller (9):
  submodule-config: "goto" removal in parse_config()
  submodule config: keep update strategy around
  run_processes_parallel: Add output to tracing messages
  git submodule update: have a dedicated helper for cloning
  submodule update: expose parallelism to the user
  clone: allow an explicit argument for parallel submodule clones
  submodule config: remove name_and_item_from_var
  submodule-config: parse_config
  fetching submodules: Respect `submodule.jobs` config option

 Documentation/config.txt        |   7 ++
 Documentation/git-clone.txt     |   5 +-
 Documentation/git-submodule.txt |   6 +-
 builtin/clone.c                 |  26 ++++-
 builtin/fetch.c                 |   2 +-
 builtin/submodule--helper.c     | 243 ++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh                |  54 ++++-----
 run-command.c                   |   4 +
 submodule-config.c              | 166 ++++++++++++++-------------
 submodule-config.h              |   3 +
 submodule.c                     |   5 +
 t/t5526-fetch-submodules.sh     |  14 +++
 t/t7400-submodule-basic.sh      |   4 +-
 t/t7406-submodule-update.sh     |  27 +++++
 14 files changed, 444 insertions(+), 122 deletions(-)

-- 
2.5.0.283.g1a79c94.dirty

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 1/9] submodule-config: "goto" removal in parse_config()
  2015-10-27 18:15 [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
@ 2015-10-27 18:15 ` Stefan Beller
  2015-10-27 21:26   ` Jonathan Nieder
  2015-10-27 18:15 ` [PATCH 2/9] submodule config: keep update strategy around Stefan Beller
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 48+ messages in thread
From: Stefan Beller @ 2015-10-27 18:15 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

Many components in if/else if/... cascade jumped to a shared
clean-up with "goto release_return", but we can restructure the
function a bit and make them disappear, which reduces the line count
as well.  Also reformat overlong lines and poorly indented ones
while at it.

The order of rules to verify the value for "ignore" used to be to
complain on multiple values first and then complain to boolean, but
swap the order to match how the values for "path" and "url" are
verified.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule-config.c | 74 +++++++++++++++++++++---------------------------------
 1 file changed, 29 insertions(+), 45 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 393de53..afe0ea8 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -257,78 +257,62 @@ static int parse_config(const char *var, const char *value, void *data)
 	if (!name_and_item_from_var(var, &name, &item))
 		return 0;
 
-	submodule = lookup_or_create_by_name(me->cache, me->gitmodules_sha1,
-			name.buf);
+	submodule = lookup_or_create_by_name(me->cache,
+					     me->gitmodules_sha1,
+					     name.buf);
 
 	if (!strcmp(item.buf, "path")) {
-		struct strbuf path = STRBUF_INIT;
-		if (!value) {
+		if (!value)
 			ret = config_error_nonbool(var);
-			goto release_return;
-		}
-		if (!me->overwrite && submodule->path != NULL) {
+		else if (!me->overwrite && submodule->path != NULL)
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"path");
-			goto release_return;
+		else {
+			if (submodule->path)
+				cache_remove_path(me->cache, submodule);
+			free((void *) submodule->path);
+			submodule->path = xstrdup(value);
+			cache_put_path(me->cache, submodule);
 		}
-
-		if (submodule->path)
-			cache_remove_path(me->cache, submodule);
-		free((void *) submodule->path);
-		strbuf_addstr(&path, value);
-		submodule->path = strbuf_detach(&path, NULL);
-		cache_put_path(me->cache, submodule);
 	} else if (!strcmp(item.buf, "fetchrecursesubmodules")) {
 		/* when parsing worktree configurations we can die early */
 		int die_on_error = is_null_sha1(me->gitmodules_sha1);
 		if (!me->overwrite &&
-		    submodule->fetch_recurse != RECURSE_SUBMODULES_NONE) {
+		    submodule->fetch_recurse != RECURSE_SUBMODULES_NONE)
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"fetchrecursesubmodules");
-			goto release_return;
-		}
-
-		submodule->fetch_recurse = parse_fetch_recurse(var, value,
+		else
+			submodule->fetch_recurse = parse_fetch_recurse(
+								var, value,
 								die_on_error);
 	} else if (!strcmp(item.buf, "ignore")) {
-		struct strbuf ignore = STRBUF_INIT;
-		if (!me->overwrite && submodule->ignore != NULL) {
+		if (!value)
+			ret = config_error_nonbool(var);
+		else if (!me->overwrite && submodule->ignore != NULL)
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"ignore");
-			goto release_return;
-		}
-		if (!value) {
-			ret = config_error_nonbool(var);
-			goto release_return;
-		}
-		if (strcmp(value, "untracked") && strcmp(value, "dirty") &&
-		    strcmp(value, "all") && strcmp(value, "none")) {
+		else if (strcmp(value, "untracked") &&
+			 strcmp(value, "dirty") &&
+			 strcmp(value, "all") &&
+			 strcmp(value, "none"))
 			warning("Invalid parameter '%s' for config option "
 					"'submodule.%s.ignore'", value, var);
-			goto release_return;
+		else {
+			free((void *) submodule->ignore);
+			submodule->ignore = xstrdup(value);
 		}
-
-		free((void *) submodule->ignore);
-		strbuf_addstr(&ignore, value);
-		submodule->ignore = strbuf_detach(&ignore, NULL);
 	} else if (!strcmp(item.buf, "url")) {
-		struct strbuf url = STRBUF_INIT;
 		if (!value) {
 			ret = config_error_nonbool(var);
-			goto release_return;
-		}
-		if (!me->overwrite && submodule->url != NULL) {
+		} else if (!me->overwrite && submodule->url != NULL) {
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"url");
-			goto release_return;
+		} else {
+			free((void *) submodule->url);
+			submodule->url = xstrdup(value);
 		}
-
-		free((void *) submodule->url);
-		strbuf_addstr(&url, value);
-		submodule->url = strbuf_detach(&url, NULL);
 	}
 
-release_return:
 	strbuf_release(&name);
 	strbuf_release(&item);
 
-- 
2.5.0.283.g1a79c94.dirty

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 2/9] submodule config: keep update strategy around
  2015-10-27 18:15 [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
  2015-10-27 18:15 ` [PATCH 1/9] submodule-config: "goto" removal in parse_config() Stefan Beller
@ 2015-10-27 18:15 ` Stefan Beller
  2015-10-27 18:15 ` [PATCH 3/9] run_processes_parallel: Add output to tracing messages Stefan Beller
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 48+ messages in thread
From: Stefan Beller @ 2015-10-27 18:15 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

We need the submodule update strategies in a later patch.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 submodule-config.c | 11 +++++++++++
 submodule-config.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/submodule-config.c b/submodule-config.c
index afe0ea8..8b8c7d1 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -194,6 +194,7 @@ static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
 
 	submodule->path = NULL;
 	submodule->url = NULL;
+	submodule->update = NULL;
 	submodule->fetch_recurse = RECURSE_SUBMODULES_NONE;
 	submodule->ignore = NULL;
 
@@ -311,6 +312,16 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->url);
 			submodule->url = xstrdup(value);
 		}
+	} else if (!strcmp(item.buf, "update")) {
+		if (!value)
+			ret = config_error_nonbool(var);
+		else if (!me->overwrite && submodule->update != NULL)
+			warn_multiple_config(me->commit_sha1, submodule->name,
+					     "update");
+		else {
+			free((void *)submodule->update);
+			submodule->update = xstrdup(value);
+		}
 	}
 
 	strbuf_release(&name);
diff --git a/submodule-config.h b/submodule-config.h
index 9061e4e..f9e2a29 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -14,6 +14,7 @@ struct submodule {
 	const char *url;
 	int fetch_recurse;
 	const char *ignore;
+	const char *update;
 	/* the sha1 blob id of the responsible .gitmodules file */
 	unsigned char gitmodules_sha1[20];
 };
-- 
2.5.0.283.g1a79c94.dirty

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 3/9] run_processes_parallel: Add output to tracing messages
  2015-10-27 18:15 [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
  2015-10-27 18:15 ` [PATCH 1/9] submodule-config: "goto" removal in parse_config() Stefan Beller
  2015-10-27 18:15 ` [PATCH 2/9] submodule config: keep update strategy around Stefan Beller
@ 2015-10-27 18:15 ` Stefan Beller
  2015-10-27 18:15 ` [PATCH 4/9] git submodule update: have a dedicated helper for cloning Stefan Beller
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 48+ messages in thread
From: Stefan Beller @ 2015-10-27 18:15 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

This commit serves 2 purposes. First this may help the user who
tries to diagnose intermixed process calls. Second this may be used
in a later patch for testing. As the output of a command should not
change visibly except for going faster, grepping for the trace output
seems like a viable testing strategy.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 run-command.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/run-command.c b/run-command.c
index 1fbd286..9ac2df5 100644
--- a/run-command.c
+++ b/run-command.c
@@ -949,6 +949,9 @@ static struct parallel_processes *pp_init(int n,
 		n = online_cpus();
 
 	pp->max_processes = n;
+
+	trace_printf("run_processes_parallel: preparing to run up to %d children in parallel", n);
+
 	pp->data = data;
 	if (!get_next_task)
 		die("BUG: you need to specify a get_next_task function");
@@ -978,6 +981,7 @@ static void pp_cleanup(struct parallel_processes *pp)
 {
 	int i;
 
+	trace_printf("run_processes_parallel: parallel processing done");
 	for (i = 0; i < pp->max_processes; i++) {
 		strbuf_release(&pp->children[i].err);
 		child_process_deinit(&pp->children[i].process);
-- 
2.5.0.283.g1a79c94.dirty

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 4/9] git submodule update: have a dedicated helper for cloning
  2015-10-27 18:15 [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
                   ` (2 preceding siblings ...)
  2015-10-27 18:15 ` [PATCH 3/9] run_processes_parallel: Add output to tracing messages Stefan Beller
@ 2015-10-27 18:15 ` Stefan Beller
  2015-10-27 18:15 ` [PATCH 5/9] submodule update: expose parallelism to the user Stefan Beller
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 48+ messages in thread
From: Stefan Beller @ 2015-10-27 18:15 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

This introduces a new helper function in git submodule--helper
which takes care of cloning all submodules, which we want to
parallelize eventually.

Some tests (such as empty URL, update_mode=none) are required in the
helper to make the decision for cloning. These checks have been
moved into the C function as well (no need to repeat them in the
shell script).

As we can only access the stderr channel from within the parallel
processing engine, we need to reroute the error message for
specified but initialized submodules to stderr. As it is an error
message, this should have gone to stderr in the first place, so it
is a bug fix along the way.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/submodule--helper.c | 234 ++++++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh            |  45 +++------
 t/t7400-submodule-basic.sh  |   4 +-
 3 files changed, 247 insertions(+), 36 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index f4c3eff..1ec1b85 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -255,6 +255,239 @@ static int module_clone(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
+static int git_submodule_config(const char *var, const char *value, void *cb)
+{
+	return parse_submodule_config_option(var, value);
+}
+
+struct submodule_update_clone {
+	int count;
+	int quiet;
+	int print_unmatched;
+	char *reference;
+	char *depth;
+	char *update;
+	const char *recursive_prefix;
+	const char *prefix;
+	struct module_list list;
+	struct string_list projectlines;
+	struct pathspec pathspec;
+};
+#define SUBMODULE_UPDATE_CLONE_INIT {0, 0, 0, NULL, NULL, NULL, NULL, NULL, MODULE_LIST_INIT, STRING_LIST_INIT_DUP}
+
+static void fill_clone_command(struct child_process *cp, int quiet,
+			       const char *prefix, const char *path,
+			       const char *name, const char *url,
+			       const char *reference, const char *depth)
+{
+	cp->git_cmd = 1;
+	cp->no_stdin = 1;
+	cp->stdout_to_stderr = 1;
+	cp->err = -1;
+	argv_array_push(&cp->args, "submodule--helper");
+	argv_array_push(&cp->args, "clone");
+	if (quiet)
+		argv_array_push(&cp->args, "--quiet");
+
+	if (prefix) {
+		argv_array_push(&cp->args, "--prefix");
+		argv_array_push(&cp->args, prefix);
+	}
+	argv_array_push(&cp->args, "--path");
+	argv_array_push(&cp->args, path);
+
+	argv_array_push(&cp->args, "--name");
+	argv_array_push(&cp->args, name);
+
+	argv_array_push(&cp->args, "--url");
+	argv_array_push(&cp->args, url);
+	if (reference)
+		argv_array_push(&cp->args, reference);
+	if (depth)
+		argv_array_push(&cp->args, depth);
+}
+
+static int update_clone_get_next_task(void **pp_task_cb,
+				      struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	for (; pp->count < pp->list.nr; pp->count++) {
+		const struct submodule *sub = NULL;
+		const char *displaypath = NULL;
+		const struct cache_entry *ce = pp->list.entries[pp->count];
+		struct strbuf sb = STRBUF_INIT;
+		const char *update_module = NULL;
+		char *url = NULL;
+		int just_cloned = 0;
+
+		if (ce_stage(ce)) {
+			if (pp->recursive_prefix)
+				strbuf_addf(err, "Skipping unmerged submodule %s/%s\n",
+					pp->recursive_prefix, ce->name);
+			else
+				strbuf_addf(err, "Skipping unmerged submodule %s\n",
+					ce->name);
+			continue;
+		}
+
+		sub = submodule_from_path(null_sha1, ce->name);
+		if (!sub) {
+			strbuf_addf(err, "BUG: internal error managing submodules. "
+				    "The cache could not locate '%s'", ce->name);
+			pp->print_unmatched = 1;
+			return 0;
+		}
+
+		if (pp->recursive_prefix)
+			displaypath = relative_path(pp->recursive_prefix, ce->name, &sb);
+		else
+			displaypath = ce->name;
+
+		if (pp->update)
+			update_module = pp->update;
+		if (!update_module)
+			update_module = sub->update;
+		if (!update_module)
+			update_module = "checkout";
+		if (!strcmp(update_module, "none")) {
+			strbuf_addf(err, "Skipping submodule '%s'\n", displaypath);
+			continue;
+		}
+
+		/*
+		 * Looking up the url in .git/config.
+		 * We cannot fall back to .gitmodules as we only want to process
+		 * configured submodules. This renders the submodule lookup API
+		 * useless, as it cannot lookup without fallback.
+		 */
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "submodule.%s.url", sub->name);
+		git_config_get_string(sb.buf, &url);
+		if (!url) {
+			/*
+			 * Only mention uninitialized submodules when its
+			 * path have been specified
+			 */
+			if (pp->pathspec.nr)
+				strbuf_addf(err, _("Submodule path '%s' not initialized\n"
+					"Maybe you want to use 'update --init'?"), displaypath);
+			continue;
+		}
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%s/.git", ce->name);
+		just_cloned = !file_exists(sb.buf);
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%06o %s %d %d\t%s\n", ce->ce_mode,
+				sha1_to_hex(ce->sha1), ce_stage(ce),
+				just_cloned, ce->name);
+		string_list_append(&pp->projectlines, sb.buf);
+
+		if (just_cloned) {
+			fill_clone_command(cp, pp->quiet, pp->prefix, ce->name,
+					   sub->name, url, pp->reference, pp->depth);
+			pp->count++;
+			free(url);
+			return 1;
+		} else
+			free(url);
+	}
+	return 0;
+}
+
+static int update_clone_start_failure(struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb,
+				      void *pp_task_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	strbuf_addf(err, "error when starting a child process");
+	pp->print_unmatched = 1;
+
+	return 1;
+}
+
+static int update_clone_task_finished(int result,
+				      struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb,
+				      void *pp_task_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	if (!result) {
+		return 0;
+	} else {
+		strbuf_addf(err, "error in one child process");
+		pp->print_unmatched = 1;
+		return 1;
+	}
+}
+
+static int update_clone(int argc, const char **argv, const char *prefix)
+{
+	struct string_list_item *item;
+	struct submodule_update_clone pp = SUBMODULE_UPDATE_CLONE_INIT;
+
+	struct option module_list_options[] = {
+		OPT_STRING(0, "prefix", &prefix,
+			   N_("path"),
+			   N_("path into the working tree")),
+		OPT_STRING(0, "recursive_prefix", &pp.recursive_prefix,
+			   N_("path"),
+			   N_("path into the working tree, across nested "
+			      "submodule boundaries")),
+		OPT_STRING(0, "update", &pp.update,
+			   N_("string"),
+			   N_("update command for submodules")),
+		OPT_STRING(0, "reference", &pp.reference, "<repository>",
+			   N_("Use the local reference repository "
+			      "instead of a full clone")),
+		OPT_STRING(0, "depth", &pp.depth, "<depth>",
+			   N_("Create a shallow clone truncated to the "
+			      "specified number of revisions")),
+		OPT__QUIET(&pp.quiet, N_("do't print cloning progress")),
+		OPT_END()
+	};
+
+	const char *const git_submodule_helper_usage[] = {
+		N_("git submodule--helper list [--prefix=<path>] [<path>...]"),
+		NULL
+	};
+	pp.prefix = prefix;
+
+	argc = parse_options(argc, argv, prefix, module_list_options,
+			     git_submodule_helper_usage, 0);
+
+	if (module_list_compute(argc, argv, prefix, &pp.pathspec, &pp.list) < 0) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	gitmodules_config();
+	/* Overlay the parsed .gitmodules file with .git/config */
+	git_config(git_submodule_config, NULL);
+	run_processes_parallel(1, update_clone_get_next_task,
+				  update_clone_start_failure,
+				  update_clone_task_finished,
+				  &pp);
+
+	if (pp.print_unmatched) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	for_each_string_list_item(item, &pp.projectlines) {
+		utf8_fprintf(stdout, "%s", item->string);
+	}
+	return 0;
+}
+
 struct cmd_struct {
 	const char *cmd;
 	int (*fn)(int, const char **, const char *);
@@ -264,6 +497,7 @@ static struct cmd_struct commands[] = {
 	{"list", module_list},
 	{"name", module_name},
 	{"clone", module_clone},
+	{"update-clone", update_clone}
 };
 
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
diff --git a/git-submodule.sh b/git-submodule.sh
index 8b0eb9a..ea883b9 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -655,17 +655,18 @@ cmd_update()
 		cmd_init "--" "$@" || return
 	fi
 
-	cloned_modules=
-	git submodule--helper list --prefix "$wt_prefix" "$@" | {
+	git submodule--helper update-clone ${GIT_QUIET:+--quiet} \
+		${wt_prefix:+--prefix "$wt_prefix"} \
+		${prefix:+--recursive_prefix "$prefix"} \
+		${update:+--update "$update"} \
+		${reference:+--reference "$reference"} \
+		${depth:+--depth "$depth"} \
+		"$@" | {
 	err=
-	while read mode sha1 stage sm_path
+	while read mode sha1 stage just_cloned sm_path
 	do
 		die_if_unmatched "$mode"
-		if test "$stage" = U
-		then
-			echo >&2 "Skipping unmerged submodule $prefix$sm_path"
-			continue
-		fi
+
 		name=$(git submodule--helper name "$sm_path") || exit
 		url=$(git config submodule."$name".url)
 		branch=$(get_submodule_config "$name" branch master)
@@ -682,27 +683,10 @@ cmd_update()
 
 		displaypath=$(relative_path "$prefix$sm_path")
 
-		if test "$update_module" = "none"
-		then
-			echo "Skipping submodule '$displaypath'"
-			continue
-		fi
-
-		if test -z "$url"
-		then
-			# Only mention uninitialized submodules when its
-			# path have been specified
-			test "$#" != "0" &&
-			say "$(eval_gettext "Submodule path '\$displaypath' not initialized
-Maybe you want to use 'update --init'?")"
-			continue
-		fi
-
-		if ! test -d "$sm_path"/.git && ! test -f "$sm_path"/.git
+		if test $just_cloned -eq 1
 		then
-			git submodule--helper clone ${GIT_QUIET:+--quiet} --prefix "$prefix" --path "$sm_path" --name "$name" --url "$url" "$reference" "$depth" || exit
-			cloned_modules="$cloned_modules;$name"
 			subsha1=
+			update_module=checkout
 		else
 			subsha1=$(clear_local_git_env; cd "$sm_path" &&
 				git rev-parse --verify HEAD) ||
@@ -742,13 +726,6 @@ Maybe you want to use 'update --init'?")"
 				die "$(eval_gettext "Unable to fetch in submodule path '\$displaypath'")"
 			fi
 
-			# Is this something we just cloned?
-			case ";$cloned_modules;" in
-			*";$name;"*)
-				# then there is no local change to integrate
-				update_module=checkout ;;
-			esac
-
 			must_die_on_failure=
 			case "$update_module" in
 			checkout)
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index 540771c..5991e3c 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -462,7 +462,7 @@ test_expect_success 'update --init' '
 	git config --remove-section submodule.example &&
 	test_must_fail git config submodule.example.url &&
 
-	git submodule update init > update.out &&
+	git submodule update init 2> update.out &&
 	cat update.out &&
 	test_i18ngrep "not initialized" update.out &&
 	test_must_fail git rev-parse --resolve-git-dir init/.git &&
@@ -480,7 +480,7 @@ test_expect_success 'update --init from subdirectory' '
 	mkdir -p sub &&
 	(
 		cd sub &&
-		git submodule update ../init >update.out &&
+		git submodule update ../init 2>update.out &&
 		cat update.out &&
 		test_i18ngrep "not initialized" update.out &&
 		test_must_fail git rev-parse --resolve-git-dir ../init/.git &&
-- 
2.5.0.283.g1a79c94.dirty

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 5/9] submodule update: expose parallelism to the user
  2015-10-27 18:15 [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
                   ` (3 preceding siblings ...)
  2015-10-27 18:15 ` [PATCH 4/9] git submodule update: have a dedicated helper for cloning Stefan Beller
@ 2015-10-27 18:15 ` Stefan Beller
  2015-10-27 20:59   ` Junio C Hamano
  2015-10-27 18:15 ` [PATCH 6/9] clone: allow an explicit argument for parallel submodule clones Stefan Beller
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 48+ messages in thread
From: Stefan Beller @ 2015-10-27 18:15 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

Expose possible parallelism either via the "--jobs" CLI parameter or
the "submodule.jobs" setting.

By having the variable initialized to -1, we make sure 0 can be passed
into the parallel processing machine, which will then pick as many parallel
workers as there are CPUs.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/git-submodule.txt |  6 +++++-
 builtin/submodule--helper.c     | 17 +++++++++++++----
 git-submodule.sh                |  9 +++++++++
 t/t7406-submodule-update.sh     | 12 ++++++++++++
 4 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index f17687e..f5429fa 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -16,7 +16,7 @@ SYNOPSIS
 'git submodule' [--quiet] deinit [-f|--force] [--] <path>...
 'git submodule' [--quiet] update [--init] [--remote] [-N|--no-fetch]
 	      [-f|--force] [--rebase|--merge] [--reference <repository>]
-	      [--depth <depth>] [--recursive] [--] [<path>...]
+	      [--depth <depth>] [--recursive] [--jobs <n>] [--] [<path>...]
 'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) <n>]
 	      [commit] [--] [<path>...]
 'git submodule' [--quiet] foreach [--recursive] <command>
@@ -374,6 +374,10 @@ for linkgit:git-clone[1]'s `--reference` and `--shared` options carefully.
 	clone with a history truncated to the specified number of revisions.
 	See linkgit:git-clone[1]
 
+-j::
+--jobs::
+	This option is only valid for the update command.
+	Clone new submodules in parallel with as many jobs.
 
 <path>...::
 	Paths to submodule(s). When specified this will restrict the command
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 1ec1b85..c3d438a 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -431,6 +431,7 @@ static int update_clone_task_finished(int result,
 
 static int update_clone(int argc, const char **argv, const char *prefix)
 {
+	int max_jobs = -1;
 	struct string_list_item *item;
 	struct submodule_update_clone pp = SUBMODULE_UPDATE_CLONE_INIT;
 
@@ -451,6 +452,8 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 		OPT_STRING(0, "depth", &pp.depth, "<depth>",
 			   N_("Create a shallow clone truncated to the "
 			      "specified number of revisions")),
+		OPT_INTEGER('j', "jobs", &max_jobs,
+			    N_("parallel jobs")),
 		OPT__QUIET(&pp.quiet, N_("do't print cloning progress")),
 		OPT_END()
 	};
@@ -472,10 +475,16 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 	gitmodules_config();
 	/* Overlay the parsed .gitmodules file with .git/config */
 	git_config(git_submodule_config, NULL);
-	run_processes_parallel(1, update_clone_get_next_task,
-				  update_clone_start_failure,
-				  update_clone_task_finished,
-				  &pp);
+
+	if (max_jobs == -1)
+		if (git_config_get_int("submodule.jobs", &max_jobs))
+			max_jobs = 1;
+
+	run_processes_parallel(max_jobs,
+			       update_clone_get_next_task,
+			       update_clone_start_failure,
+			       update_clone_task_finished,
+			       &pp);
 
 	if (pp.print_unmatched) {
 		printf("#unmatched\n");
diff --git a/git-submodule.sh b/git-submodule.sh
index ea883b9..c2dfb16 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -636,6 +636,14 @@ cmd_update()
 		--depth=*)
 			depth=$1
 			;;
+		-j|--jobs)
+			case "$2" in '') usage ;; esac
+			jobs="--jobs=$2"
+			shift
+			;;
+		--jobs=*)
+			jobs=$1
+			;;
 		--)
 			shift
 			break
@@ -661,6 +669,7 @@ cmd_update()
 		${update:+--update "$update"} \
 		${reference:+--reference "$reference"} \
 		${depth:+--depth "$depth"} \
+		${jobs:+$jobs} \
 		"$@" | {
 	err=
 	while read mode sha1 stage just_cloned sm_path
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index dda3929..05ea66f 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -774,4 +774,16 @@ test_expect_success 'submodule update --recursive drops module name before recur
 	 test_i18ngrep "Submodule path .deeper/submodule/subsubmodule.: checked out" actual
 	)
 '
+
+test_expect_success 'submodule update can be run in parallel' '
+	(cd super2 &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update --jobs 7 &&
+	 grep "7 children" trace.out &&
+	 git config submodule.jobs 8 &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update &&
+	 grep "8 children" trace.out &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update --jobs 9 &&
+	 grep "9 children" trace.out
+	)
+'
 test_done
-- 
2.5.0.283.g1a79c94.dirty

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 6/9] clone: allow an explicit argument for parallel submodule clones
  2015-10-27 18:15 [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
                   ` (4 preceding siblings ...)
  2015-10-27 18:15 ` [PATCH 5/9] submodule update: expose parallelism to the user Stefan Beller
@ 2015-10-27 18:15 ` Stefan Beller
  2015-10-27 20:57   ` Junio C Hamano
  2015-10-27 18:15 ` [PATCH 7/9] submodule config: remove name_and_item_from_var Stefan Beller
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 48+ messages in thread
From: Stefan Beller @ 2015-10-27 18:15 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

Just pass it along to "git submodule update", which may pick reasonable
defaults if you don't specify an explicit number.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/git-clone.txt |  5 ++++-
 builtin/clone.c             | 26 ++++++++++++++++++++------
 t/t7406-submodule-update.sh | 15 +++++++++++++++
 3 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index f1f2a3f..affa52e 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -14,7 +14,7 @@ SYNOPSIS
 	  [-o <name>] [-b <name>] [-u <upload-pack>] [--reference <repository>]
 	  [--dissociate] [--separate-git-dir <git dir>]
 	  [--depth <depth>] [--[no-]single-branch]
-	  [--recursive | --recurse-submodules] [--] <repository>
+	  [--recursive | --recurse-submodules] [--jobs <n>] [--] <repository>
 	  [<directory>]
 
 DESCRIPTION
@@ -216,6 +216,9 @@ objects from the source repository into a pack in the cloned repository.
 	The result is Git repository can be separated from working
 	tree.
 
+-j::
+--jobs::
+	The number of submodules fetched at the same time.
 
 <repository>::
 	The (possibly remote) repository to clone from.  See the
diff --git a/builtin/clone.c b/builtin/clone.c
index 5864ad1..b8b1d4c 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -50,6 +50,7 @@ static int option_progress = -1;
 static struct string_list option_config;
 static struct string_list option_reference;
 static int option_dissociate;
+static int max_jobs = -1;
 
 static struct option builtin_clone_options[] = {
 	OPT__VERBOSITY(&option_verbosity),
@@ -72,6 +73,8 @@ static struct option builtin_clone_options[] = {
 		    N_("initialize submodules in the clone")),
 	OPT_BOOL(0, "recurse-submodules", &option_recursive,
 		    N_("initialize submodules in the clone")),
+	OPT_INTEGER('j', "jobs", &max_jobs,
+		    N_("number of submodules cloned in parallel")),
 	OPT_STRING(0, "template", &option_template, N_("template-directory"),
 		   N_("directory from which templates will be used")),
 	OPT_STRING_LIST(0, "reference", &option_reference, N_("repo"),
@@ -95,10 +98,6 @@ static struct option builtin_clone_options[] = {
 	OPT_END()
 };
 
-static const char *argv_submodule[] = {
-	"submodule", "update", "--init", "--recursive", NULL
-};
-
 static const char *get_repo_path_1(struct strbuf *path, int *is_bundle)
 {
 	static char *suffix[] = { "/.git", "", ".git/.git", ".git" };
@@ -674,8 +673,23 @@ static int checkout(void)
 	err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
 			   sha1_to_hex(sha1), "1", NULL);
 
-	if (!err && option_recursive)
-		err = run_command_v_opt(argv_submodule, RUN_GIT_CMD);
+	if (!err && option_recursive) {
+		struct argv_array args = ARGV_ARRAY_INIT;
+		argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
+
+		if (max_jobs == -1)
+			if (git_config_get_int("submodule.jobs", &max_jobs))
+				max_jobs = 1;
+		if (max_jobs != 1) {
+			struct strbuf sb = STRBUF_INIT;
+			strbuf_addf(&sb, "--jobs=%d", max_jobs);
+			argv_array_push(&args, sb.buf);
+			strbuf_release(&sb);
+		}
+
+		err = run_command_v_opt(args.argv, RUN_GIT_CMD);
+		argv_array_clear(&args);
+	}
 
 	return err;
 }
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 05ea66f..ade0524 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -786,4 +786,19 @@ test_expect_success 'submodule update can be run in parallel' '
 	 grep "9 children" trace.out
 	)
 '
+
+test_expect_success 'git clone passes the parallel jobs config on to submodules' '
+	test_when_finished "rm -rf super4" &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules --jobs 7 . super4 &&
+	grep "7 children" trace.out &&
+	rm -rf super4 &&
+	git config --global submodule.jobs 8 &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules . super4 &&
+	grep "8 children" trace.out &&
+	rm -rf super4 &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules --jobs 9 . super4 &&
+	grep "9 children" trace.out &&
+	rm -rf super4
+'
+
 test_done
-- 
2.5.0.283.g1a79c94.dirty

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 7/9] submodule config: remove name_and_item_from_var
  2015-10-27 18:15 [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
                   ` (5 preceding siblings ...)
  2015-10-27 18:15 ` [PATCH 6/9] clone: allow an explicit argument for parallel submodule clones Stefan Beller
@ 2015-10-27 18:15 ` Stefan Beller
  2015-10-27 18:15 ` [PATCH 8/9] submodule-config: parse_config Stefan Beller
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 48+ messages in thread
From: Stefan Beller @ 2015-10-27 18:15 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

By inlining `name_and_item_from_var` it is easy to add later options
which are not required to have a submodule name.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule-config.c | 46 +++++++++++++++++-----------------------------
 1 file changed, 17 insertions(+), 29 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 8b8c7d1..4d0563c 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -161,22 +161,6 @@ static struct submodule *cache_lookup_name(struct submodule_cache *cache,
 	return NULL;
 }
 
-static int name_and_item_from_var(const char *var, struct strbuf *name,
-				  struct strbuf *item)
-{
-	const char *subsection, *key;
-	int subsection_len, parse;
-	parse = parse_config_key(var, "submodule", &subsection,
-			&subsection_len, &key);
-	if (parse < 0 || !subsection)
-		return 0;
-
-	strbuf_add(name, subsection, subsection_len);
-	strbuf_addstr(item, key);
-
-	return 1;
-}
-
 static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
 		const unsigned char *gitmodules_sha1, const char *name)
 {
@@ -251,18 +235,25 @@ static int parse_config(const char *var, const char *value, void *data)
 {
 	struct parse_config_parameter *me = data;
 	struct submodule *submodule;
-	struct strbuf name = STRBUF_INIT, item = STRBUF_INIT;
-	int ret = 0;
+	int subsection_len, ret = 0;
+	const char *subsection, *key;
+	char *name;
 
-	/* this also ensures that we only parse submodule entries */
-	if (!name_and_item_from_var(var, &name, &item))
+	if (parse_config_key(var, "submodule", &subsection,
+			     &subsection_len, &key) < 0)
 		return 0;
 
+	if (!subsection_len)
+		return 0;
+
+	/* subsection is not null terminated */
+	name = xmemdupz(subsection, subsection_len);
 	submodule = lookup_or_create_by_name(me->cache,
 					     me->gitmodules_sha1,
-					     name.buf);
+					     name);
+	free(name);
 
-	if (!strcmp(item.buf, "path")) {
+	if (!strcmp(key, "path")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->path != NULL)
@@ -275,7 +266,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			submodule->path = xstrdup(value);
 			cache_put_path(me->cache, submodule);
 		}
-	} else if (!strcmp(item.buf, "fetchrecursesubmodules")) {
+	} else if (!strcmp(key, "fetchrecursesubmodules")) {
 		/* when parsing worktree configurations we can die early */
 		int die_on_error = is_null_sha1(me->gitmodules_sha1);
 		if (!me->overwrite &&
@@ -286,7 +277,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			submodule->fetch_recurse = parse_fetch_recurse(
 								var, value,
 								die_on_error);
-	} else if (!strcmp(item.buf, "ignore")) {
+	} else if (!strcmp(key, "ignore")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->ignore != NULL)
@@ -302,7 +293,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->ignore);
 			submodule->ignore = xstrdup(value);
 		}
-	} else if (!strcmp(item.buf, "url")) {
+	} else if (!strcmp(key, "url")) {
 		if (!value) {
 			ret = config_error_nonbool(var);
 		} else if (!me->overwrite && submodule->url != NULL) {
@@ -312,7 +303,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->url);
 			submodule->url = xstrdup(value);
 		}
-	} else if (!strcmp(item.buf, "update")) {
+	} else if (!strcmp(key, "update")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->update != NULL)
@@ -324,9 +315,6 @@ static int parse_config(const char *var, const char *value, void *data)
 		}
 	}
 
-	strbuf_release(&name);
-	strbuf_release(&item);
-
 	return ret;
 }
 
-- 
2.5.0.283.g1a79c94.dirty

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 8/9] submodule-config: parse_config
  2015-10-27 18:15 [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
                   ` (6 preceding siblings ...)
  2015-10-27 18:15 ` [PATCH 7/9] submodule config: remove name_and_item_from_var Stefan Beller
@ 2015-10-27 18:15 ` Stefan Beller
  2015-10-27 18:15 ` [PATCH 9/9] fetching submodules: Respect `submodule.jobs` config option Stefan Beller
  2015-10-27 19:12 ` [PATCH 0/9] Expose the submodule parallelism to the user Junio C Hamano
  9 siblings, 0 replies; 48+ messages in thread
From: Stefan Beller @ 2015-10-27 18:15 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

This rewrites parse_config to distinguish between configs specific to
one submodule and configs which apply generically to all submodules.
We do not have generic submodule configs yet, but the next patch will
introduce "submodule.jobs".

Signed-off-by: Stefan Beller <sbeller@google.com>

# Conflicts:
#	submodule-config.c

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule-config.c | 58 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 4d0563c..1cea404 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -231,27 +231,23 @@ struct parse_config_parameter {
 	int overwrite;
 };
 
-static int parse_config(const char *var, const char *value, void *data)
+static int parse_generic_submodule_config(const char *var,
+					  const char *key,
+					  const char *value)
 {
-	struct parse_config_parameter *me = data;
-	struct submodule *submodule;
-	int subsection_len, ret = 0;
-	const char *subsection, *key;
-	char *name;
-
-	if (parse_config_key(var, "submodule", &subsection,
-			     &subsection_len, &key) < 0)
-		return 0;
-
-	if (!subsection_len)
-		return 0;
+	return 0;
+}
 
-	/* subsection is not null terminated */
-	name = xmemdupz(subsection, subsection_len);
-	submodule = lookup_or_create_by_name(me->cache,
-					     me->gitmodules_sha1,
-					     name);
-	free(name);
+static int parse_specific_submodule_config(struct parse_config_parameter *me,
+					   const char *name,
+					   const char *key,
+					   const char *value,
+					   const char *var)
+{
+	int ret = 0;
+	struct submodule *submodule = lookup_or_create_by_name(me->cache,
+							       me->gitmodules_sha1,
+							       name);
 
 	if (!strcmp(key, "path")) {
 		if (!value)
@@ -318,6 +314,30 @@ static int parse_config(const char *var, const char *value, void *data)
 	return ret;
 }
 
+static int parse_config(const char *var, const char *value, void *data)
+{
+	struct parse_config_parameter *me = data;
+
+	int subsection_len;
+	const char *subsection, *key;
+	char *name;
+
+	if (parse_config_key(var, "submodule", &subsection,
+			     &subsection_len, &key) < 0)
+		return 0;
+
+	if (!subsection_len)
+		return parse_generic_submodule_config(var, key, value);
+	else {
+		int ret;
+		/* subsection is not null terminated */
+		name = xmemdupz(subsection, subsection_len);
+		ret = parse_specific_submodule_config(me, name, key, value, var);
+		free(name);
+		return ret;
+	}
+}
+
 static int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
 				      unsigned char *gitmodules_sha1)
 {
-- 
2.5.0.283.g1a79c94.dirty

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 9/9] fetching submodules: Respect `submodule.jobs` config option
  2015-10-27 18:15 [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
                   ` (7 preceding siblings ...)
  2015-10-27 18:15 ` [PATCH 8/9] submodule-config: parse_config Stefan Beller
@ 2015-10-27 18:15 ` Stefan Beller
  2015-10-27 21:00   ` Junio C Hamano
  2015-10-27 19:12 ` [PATCH 0/9] Expose the submodule parallelism to the user Junio C Hamano
  9 siblings, 1 reply; 48+ messages in thread
From: Stefan Beller @ 2015-10-27 18:15 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

This allows to configure fetching and updating in parallel
without having the command line option.

This moved the responsibility to determine how many parallel processes
to start from builtin/fetch to submodule.c as we need a way to communicate
"The user did not specify the number of parallel processes in the command
line options" in the builtin fetch. The submodule code takes care of
the precedence (CLI > config > default)

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/config.txt    |  7 +++++++
 builtin/fetch.c             |  2 +-
 submodule-config.c          |  9 +++++++++
 submodule-config.h          |  2 ++
 submodule.c                 |  5 +++++
 t/t5526-fetch-submodules.sh | 14 ++++++++++++++
 6 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 315f271..0b733d7 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2575,6 +2575,13 @@ submodule.<name>.ignore::
 	"--ignore-submodules" option. The 'git submodule' commands are not
 	affected by this setting.
 
+submodule::jobs
+	This is used to determine how many submodules can be operated on in
+	parallel. Specifying a positive integer allows up to that number
+	of submodules being fetched in parallel. Specifying 0 the number
+	of cpus will be taken as the maximum number. Currently this is
+	used in fetch and clone operations only.
+
 tag.sort::
 	This variable controls the sort ordering of tags when displayed by
 	linkgit:git-tag[1]. Without the "--sort=<value>" option provided, the
diff --git a/builtin/fetch.c b/builtin/fetch.c
index f28eac6..b1399dc 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -37,7 +37,7 @@ static int prune = -1; /* unspecified */
 static int all, append, dry_run, force, keep, multiple, update_head_ok, verbosity;
 static int progress = -1, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow;
-static int max_children = 1;
+static int max_children = -1;
 static const char *depth;
 static const char *upload_pack;
 static struct strbuf default_rla = STRBUF_INIT;
diff --git a/submodule-config.c b/submodule-config.c
index 1cea404..07bdcdf 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -32,6 +32,7 @@ enum lookup_type {
 
 static struct submodule_cache cache;
 static int is_cache_init;
+static int parallel_jobs = -1;
 
 static int config_path_cmp(const struct submodule_entry *a,
 			   const struct submodule_entry *b,
@@ -235,6 +236,9 @@ static int parse_generic_submodule_config(const char *var,
 					  const char *key,
 					  const char *value)
 {
+	if (!strcmp(key, "jobs")) {
+		parallel_jobs = strtol(value, NULL, 10);
+	}
 	return 0;
 }
 
@@ -483,3 +487,8 @@ void submodule_free(void)
 	cache_free(&cache);
 	is_cache_init = 0;
 }
+
+int config_parallel_submodules(void)
+{
+	return parallel_jobs;
+}
diff --git a/submodule-config.h b/submodule-config.h
index f9e2a29..d9bbf9a 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -27,4 +27,6 @@ const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
 		const char *path);
 void submodule_free(void);
 
+int config_parallel_submodules(void);
+
 #endif /* SUBMODULE_CONFIG_H */
diff --git a/submodule.c b/submodule.c
index c21b265..4822605 100644
--- a/submodule.c
+++ b/submodule.c
@@ -759,6 +759,11 @@ int fetch_populated_submodules(const struct argv_array *options,
 	argv_array_push(&spf.args, "--recurse-submodules-default");
 	/* default value, "--submodule-prefix" and its value are added later */
 
+	if (max_parallel_jobs < 0)
+		max_parallel_jobs = config_parallel_submodules();
+	if (max_parallel_jobs < 0)
+		max_parallel_jobs = 1;
+
 	calculate_changed_submodule_paths();
 	run_processes_parallel(max_parallel_jobs,
 			       get_next_submodule,
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 1b4ce69..5c3579c 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -470,4 +470,18 @@ test_expect_success "don't fetch submodule when newly recorded commits are alrea
 	test_i18ncmp expect.err actual.err
 '
 
+test_expect_success 'fetching submodules respects parallel settings' '
+	git config fetch.recurseSubmodules true &&
+	(
+		cd downstream &&
+		GIT_TRACE=$(pwd)/trace.out git fetch --jobs 7 &&
+		grep "7 children" trace.out &&
+		git config submodule.jobs 8 &&
+		GIT_TRACE=$(pwd)/trace.out git fetch &&
+		grep "8 children" trace.out &&
+		GIT_TRACE=$(pwd)/trace.out git fetch --jobs 9 &&
+		grep "9 children" trace.out
+	)
+'
+
 test_done
-- 
2.5.0.283.g1a79c94.dirty

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/9] Expose the submodule parallelism to the user
  2015-10-27 18:15 [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
                   ` (8 preceding siblings ...)
  2015-10-27 18:15 ` [PATCH 9/9] fetching submodules: Respect `submodule.jobs` config option Stefan Beller
@ 2015-10-27 19:12 ` Junio C Hamano
  2015-10-28 23:21   ` [PATCHv2 0/8] " Stefan Beller
  9 siblings, 1 reply; 48+ messages in thread
From: Junio C Hamano @ 2015-10-27 19:12 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, jacob.keller, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Stefan Beller <sbeller@google.com> writes:

> Where does it apply?
> ---
> This applies on 376d400f4c (run-command: fix missing output from late callbacks,
> which is the latest commit in origin/sb/submodule-parallel-fetch which was
> merged to origin/next)

Thanks for a detailed description.  I'd do this:

    $ git checkout -b sb/submodule-parallel-update 8b70042
    $ git merge sb/submodule-parallel-fetch~4 ;# 376d400f4c

apply 2-9 there (the fork point is the merge of config-parse topic
to 'master'), and drop the four patches near the top of the other
branch.

> I realize sending refactorings in the area you'd be likely to touch as 
> a separate patch (series) is not necessarily a good idea as it leads to
> situations like this.

Don't worry too much about it.  When you tackle a large area with a
lot of existing code, these things are bound to happen.

> What does it do?
> ---
> This series should finish the on going efforts of parallelizing
> submodule network traffic. The patches contain tests for clone,
> fetch and submodule update to use the actual parallelism both via
> command line as well as a configured option.

;-)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 6/9] clone: allow an explicit argument for parallel submodule clones
  2015-10-27 18:15 ` [PATCH 6/9] clone: allow an explicit argument for parallel submodule clones Stefan Beller
@ 2015-10-27 20:57   ` Junio C Hamano
  2015-10-28 20:50     ` Stefan Beller
  0 siblings, 1 reply; 48+ messages in thread
From: Junio C Hamano @ 2015-10-27 20:57 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, jacob.keller, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Stefan Beller <sbeller@google.com> writes:

> Just pass it along to "git submodule update", which may pick reasonable
> defaults if you don't specify an explicit number.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  Documentation/git-clone.txt |  5 ++++-
>  builtin/clone.c             | 26 ++++++++++++++++++++------
>  t/t7406-submodule-update.sh | 15 +++++++++++++++
>  3 files changed, 39 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
> index f1f2a3f..affa52e 100644
> --- a/Documentation/git-clone.txt
> +++ b/Documentation/git-clone.txt
> @@ -14,7 +14,7 @@ SYNOPSIS
>  	  [-o <name>] [-b <name>] [-u <upload-pack>] [--reference <repository>]
>  	  [--dissociate] [--separate-git-dir <git dir>]
>  	  [--depth <depth>] [--[no-]single-branch]
> -	  [--recursive | --recurse-submodules] [--] <repository>
> +	  [--recursive | --recurse-submodules] [--jobs <n>] [--] <repository>
>  	  [<directory>]
>  
>  DESCRIPTION
> @@ -216,6 +216,9 @@ objects from the source repository into a pack in the cloned repository.
>  	The result is Git repository can be separated from working
>  	tree.
>  
> +-j::
> +--jobs::

Judging from the way how "--depth <depth>" and other options with
parameter are described, I think this should be:

          -j <n>::
          --jobs <n>::

> +	The number of submodules fetched at the same time.

Do we want to say "Defaults to submodule.jobs" somewhere?

>  
>  <repository>::
>  	The (possibly remote) repository to clone from.  See the
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 5864ad1..b8b1d4c 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -50,6 +50,7 @@ static int option_progress = -1;
>  static struct string_list option_config;
>  static struct string_list option_reference;
>  static int option_dissociate;
> +static int max_jobs = -1;
>  
>  static struct option builtin_clone_options[] = {
>  	OPT__VERBOSITY(&option_verbosity),
> @@ -72,6 +73,8 @@ static struct option builtin_clone_options[] = {
>  		    N_("initialize submodules in the clone")),
>  	OPT_BOOL(0, "recurse-submodules", &option_recursive,
>  		    N_("initialize submodules in the clone")),
> +	OPT_INTEGER('j', "jobs", &max_jobs,
> +		    N_("number of submodules cloned in parallel")),
>  	OPT_STRING(0, "template", &option_template, N_("template-directory"),
>  		   N_("directory from which templates will be used")),
>  	OPT_STRING_LIST(0, "reference", &option_reference, N_("repo"),
> @@ -95,10 +98,6 @@ static struct option builtin_clone_options[] = {
>  	OPT_END()
>  };
>  
> -static const char *argv_submodule[] = {
> -	"submodule", "update", "--init", "--recursive", NULL
> -};
> -
>  static const char *get_repo_path_1(struct strbuf *path, int *is_bundle)
>  {
>  	static char *suffix[] = { "/.git", "", ".git/.git", ".git" };
> @@ -674,8 +673,23 @@ static int checkout(void)
>  	err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
>  			   sha1_to_hex(sha1), "1", NULL);
>  
> -	if (!err && option_recursive)
> -		err = run_command_v_opt(argv_submodule, RUN_GIT_CMD);
> +	if (!err && option_recursive) {
> +		struct argv_array args = ARGV_ARRAY_INIT;
> +		argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
> +
> +		if (max_jobs == -1)
> +			if (git_config_get_int("submodule.jobs", &max_jobs))
> +				max_jobs = 1;

This is somewhat an irregular way to handle a configuration
variable.  Usually we instead do:

	* initialize a variable to "unspecified" (e.g. -1);
        * let git_config() callback to overwrite the variable;
        * let parse_options() to overwrite the variable.

so that you can just use the variable at the use site like this
function, knowing that the variable is already set with the correct
precedence order.

Besides, if you really cared what the value of submodule.jobs is,
shouldn't you be calling config_parallel_submodules()?  I'd also
think that you do not want to read that variable here in the first
place (see below)...

> +		if (max_jobs != 1) {
> +			struct strbuf sb = STRBUF_INIT;
> +			strbuf_addf(&sb, "--jobs=%d", max_jobs);
> +			argv_array_push(&args, sb.buf);
> +			strbuf_release(&sb);
> +		}

I am tempted to suggest that you should not pay attention to
"submodule.jobs" in this command at all and just pass through
"--jobs=$max_jobs" that was specified from the command line, as the
spawned "submodule update --init --recursive" would handle
"submodule.jobs" itself.

Once you start allowing "clone.jobs" as a more specific version of
"submodule.jobs", then reading max_jobs first from "clone.jobs" and
then from the command line starts to make sense.  When neither is
specified, you would spawn "submodule update --init --recursive"
without any explicit "-j N" and let it honor its more generic
"submodule.jobs" setting; otherwise, you would run it with "-j N" to
override that more generic "submodule.jobs" setting with either the
value the command line -j given to "clone" or specified by a more
specific "clone.jobs".

> +		err = run_command_v_opt(args.argv, RUN_GIT_CMD);
> +		argv_array_clear(&args);
> +	}

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/9] submodule update: expose parallelism to the user
  2015-10-27 18:15 ` [PATCH 5/9] submodule update: expose parallelism to the user Stefan Beller
@ 2015-10-27 20:59   ` Junio C Hamano
  2015-10-28 21:40     ` Stefan Beller
  0 siblings, 1 reply; 48+ messages in thread
From: Junio C Hamano @ 2015-10-27 20:59 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, jacob.keller, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Stefan Beller <sbeller@google.com> writes:

> @@ -374,6 +374,10 @@ for linkgit:git-clone[1]'s `--reference` and `--shared` options carefully.
>  	clone with a history truncated to the specified number of revisions.
>  	See linkgit:git-clone[1]
>  
> +-j::
> +--jobs::

This probably should be 

          -j <n>::
          --jobs <n>::

(see comments on [6/9]).  I know the option description in this file
is sloppy and does not say "--name <name>" etc., as it should (but
it does say "--reference <repository>"), and fixing them may not be
within the scope of this series, but we do not need to add more to
the existing problems.

> +	This option is only valid for the update command.
> +	Clone new submodules in parallel with as many jobs.

And when 0 starts to meaning something special, we would need to
describe that here (and/or submodule.jobs entry in config.txt).
As I already said, I do not think "0 means num_cpus" is a useful
default, and I would prefer if we reserved 0 to mean something more
useful we would figure out later.

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 9/9] fetching submodules: Respect `submodule.jobs` config option
  2015-10-27 18:15 ` [PATCH 9/9] fetching submodules: Respect `submodule.jobs` config option Stefan Beller
@ 2015-10-27 21:00   ` Junio C Hamano
  0 siblings, 0 replies; 48+ messages in thread
From: Junio C Hamano @ 2015-10-27 21:00 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, jacob.keller, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Stefan Beller <sbeller@google.com> writes:

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 315f271..0b733d7 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2575,6 +2575,13 @@ submodule.<name>.ignore::
>  	"--ignore-submodules" option. The 'git submodule' commands are not
>  	affected by this setting.
>  
> +submodule::jobs

Did you mean this?

    submodule.jobs::

> +	This is used to determine how many submodules can be operated on in
> +	parallel. Specifying a positive integer allows up to that number
> +	of submodules being fetched in parallel. Specifying 0 the number
> +	of cpus will be taken as the maximum number. Currently this is
> +	used in fetch and clone operations only.
> +

You probably do not want to say "Currently this is" (you may still
want "only", though).  Whoever teaches other codepaths to pay
attention to the variable would update this as long as the
documentation stays current.

By the way, I doubt that "0 means num-CPUs" is a useful default for
parallelism that is used to help anything that is not CPU bound;
"clone", "submodule update", etc. are dominantly network bound, and
then disk I/O bound (especially if you are cloning from local disk).
I'd rather see "-j 0" to error out as "reserved for future use",
until we figure out what the useful default is, and then "-j 0" can
start using that default that is more useful than num_cpu.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] submodule-config: "goto" removal in parse_config()
  2015-10-27 18:15 ` [PATCH 1/9] submodule-config: "goto" removal in parse_config() Stefan Beller
@ 2015-10-27 21:26   ` Jonathan Nieder
  2015-10-27 21:39     ` Junio C Hamano
  0 siblings, 1 reply; 48+ messages in thread
From: Jonathan Nieder @ 2015-10-27 21:26 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, jacob.keller, peff, gitster, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Hi,

Stefan Beller wrote:

> Subject: submodule-config: "goto" removal in parse_config()
>
> Many components in if/else if/... cascade jumped to a shared
> clean-up with "goto release_return", but we can restructure the
> function a bit and make them disappear,

Not having read the patch yet, the above makes me suspect this is
going to make the code worse.  A 'goto' for exception handling can
be a clean way to ensure everything allocated gets released, and
restructuring to avoid that can end up making the code more error
prone and harder to read.

In other words, the "goto" removal should be a side effect and not
the motivation.

>                                         which reduces the line count
> as well.  Also reformat overlong lines and poorly indented ones
> while at it.

These sound like good things.  Hopefully this will make the code
structure easier to understand, too.

> The order of rules to verify the value for "ignore" used to be to
> complain on multiple values first and then complain to boolean, but
> swap the order to match how the values for "path" and "url" are
> verified.

I don't understand this.  Hopefully the patch will make it clearer.

> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  submodule-config.c | 74 +++++++++++++++++++++---------------------------------
>  1 file changed, 29 insertions(+), 45 deletions(-)

What patch does this apply against?  A similar patch appears to
already be part of "master".

[...]
> --- a/submodule-config.c
> +++ b/submodule-config.c
> @@ -257,78 +257,62 @@ static int parse_config(const char *var, const char *value, void *data)
>  	if (!name_and_item_from_var(var, &name, &item))
>  		return 0;
>  
> -	submodule = lookup_or_create_by_name(me->cache, me->gitmodules_sha1,
> -			name.buf);
> +	submodule = lookup_or_create_by_name(me->cache,
> +					     me->gitmodules_sha1,
> +					     name.buf);

Ok.

>  	if (!strcmp(item.buf, "path")) {
> -		struct strbuf path = STRBUF_INIT;
> -		if (!value) {
> +		if (!value)
>  			ret = config_error_nonbool(var);
> -			goto release_return;
> -		}

In the preimage, I can see at this line already that nothing more is going to
happen in this case.  In the postimage, I need to scroll down to find that
everything else is "else"s.

More generally, the patch seems to be about changing from a code structure
of

	if (condition) {
		handle it;
		goto done;
	}
	if (other condition) {
		handle it;
		goto done;
	}
	handle misc;
	goto done;

to

	if (condition) {
		handle it;
	} else if (other condition) {
		handle it;
	} else {
		handle misc;
	}

In this example the postimage is concise and simple enough that it's
probably worth it, but it is not obvious in the general case that this
is always a good thing to do.

Now that I see the patch is already merged, I don't think it needs
tweaks.  Just a little concerned about the possibility of people
judging from the commit message and emulating the pattern in the rest
of git.

Thanks and hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] submodule-config: "goto" removal in parse_config()
  2015-10-27 21:26   ` Jonathan Nieder
@ 2015-10-27 21:39     ` Junio C Hamano
  0 siblings, 0 replies; 48+ messages in thread
From: Junio C Hamano @ 2015-10-27 21:39 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Stefan Beller, git, jacob.keller, peff, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Jonathan Nieder <jrnieder@gmail.com> writes:

> Not having read the patch yet, the above makes me suspect this is
> going to make the code worse.  A 'goto' for exception handling can
> be a clean way to ensure everything allocated gets released, and
> restructuring to avoid that can end up making the code more error
> prone and harder to read.
>
> In other words, the "goto" removal should be a side effect and not
> the motivation.

Yes, I shared the same general feeling (cf. $gmane/279405).

> More generally, the patch seems to be about changing from a code structure
> of
>
> 	if (condition) {
> 		handle it;
> 		goto done;
> 	}
> 	if (other condition) {
> 		handle it;
> 		goto done;
> 	}
> 	handle misc;
> 	goto done;
>
> to
>
> 	if (condition) {
> 		handle it;
> 	} else if (other condition) {
> 		handle it;
> 	} else {
> 		handle misc;
> 	}
>
> In this example the postimage is concise and simple enough that it's
> probably worth it, but it is not obvious in the general case that this
> is always a good thing to do.

Generally, a large piece of code is _easier_ to read with forward
"goto"s that jump to the shared clean-up code, as they serve as
visual cues that tell the reader "you can stop reading here and
ignore the remainder of this if/else if/... cascade".

> Now that I see the patch is already merged, I don't think it needs
> tweaks.  Just a little concerned about the possibility of people
> judging from the commit message and emulating the pattern in the rest
> of git.

Yes, we shouldn't let people blindly imitate this change.  I merged
it primarily because I wanted the change get out of my hair, as
other changes in flight started conflicting with it.

This kind of change can be good one only in a narrowly defined case
(like this one) but I agree that in general, as you said at the
beginning, it is an easy way to make the resulting code less
maintainable and harder to read.

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 6/9] clone: allow an explicit argument for parallel submodule clones
  2015-10-27 20:57   ` Junio C Hamano
@ 2015-10-28 20:50     ` Stefan Beller
  0 siblings, 0 replies; 48+ messages in thread
From: Stefan Beller @ 2015-10-28 20:50 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git@vger.kernel.org, Jacob Keller, Jeff King, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Eric Sunshine

On Tue, Oct 27, 2015 at 1:57 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> +     The number of submodules fetched at the same time.
>
> Do we want to say "Defaults to submodule.jobs" somewhere?

Yes. :)

> I am tempted to suggest that you should not pay attention to
> "submodule.jobs" in this command at all and just pass through
> "--jobs=$max_jobs" that was specified from the command line, as the
> spawned "submodule update --init --recursive" would handle
> "submodule.jobs" itself.

makes sense.

>
> Once you start allowing "clone.jobs" as a more specific version of
> "submodule.jobs", then reading max_jobs first from "clone.jobs" and
> then from the command line starts to make sense.  When neither is
> specified, you would spawn "submodule update --init --recursive"
> without any explicit "-j N" and let it honor its more generic
> "submodule.jobs" setting; otherwise, you would run it with "-j N" to
> override that more generic "submodule.jobs" setting with either the
> value the command line -j given to "clone" or specified by a more
> specific "clone.jobs".

I see. Though I do not plan adding clone.jobs in the near future.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/9] submodule update: expose parallelism to the user
  2015-10-27 20:59   ` Junio C Hamano
@ 2015-10-28 21:40     ` Stefan Beller
  2015-10-28 22:20       ` Junio C Hamano
  0 siblings, 1 reply; 48+ messages in thread
From: Stefan Beller @ 2015-10-28 21:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git@vger.kernel.org, Jacob Keller, Jeff King, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Eric Sunshine

On Tue, Oct 27, 2015 at 1:59 PM, Junio C Hamano <gitster@pobox.com> wrote:
> And when 0 starts to meaning something special, we would need to
> describe that here (and/or submodule.jobs entry in config.txt).
> As I already said, I do not think "0 means num_cpus" is a useful
> default, and I would prefer if we reserved 0 to mean something more
> useful we would figure out later.

Ok I'll add that, too.

I am just debating with myself where the best place is.
In run-command.c in pp_init we have:

    if (n < 1)
        n = online_cpus();
    pp->max_processes = n;

we would need to change only that one place to insert an

    die("We haven't found the right default yet for 0");

However I think for most loads online_cpus makes sense as that
is ususally the bottleneck for local operations (if being excessive
memory may become an issue, but unlikely IMHO).
So instead I think it makes more sense to add it in the fetch/clone/update
to come up with a treatment for 0.

Maybe we want to make the explicit decision for the default value
for any user of the parallel processing, such that this code above
is misguided as it leads to bad defaults if reviewers are inattentive.

So having spelled out that, we may just want to bark in the pp_init
for having a number n < 1.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/9] submodule update: expose parallelism to the user
  2015-10-28 21:40     ` Stefan Beller
@ 2015-10-28 22:20       ` Junio C Hamano
  0 siblings, 0 replies; 48+ messages in thread
From: Junio C Hamano @ 2015-10-28 22:20 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Jacob Keller, Jeff King, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Eric Sunshine

Stefan Beller <sbeller@google.com> writes:

> On Tue, Oct 27, 2015 at 1:59 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> And when 0 starts to meaning something special, we would need to
>> describe that here (and/or submodule.jobs entry in config.txt).
>> As I already said, I do not think "0 means num_cpus" is a useful
>> default, and I would prefer if we reserved 0 to mean something more
>> useful we would figure out later.
>
> Ok I'll add that, too.

Sorry, but I take it back.  We just can document that (1) "-j 0"
will give you some default, (2) we do not promise that the default
will be optimal for you from day one, (3) we reserve the right to
"improve" it over time, and (4) we promise that we won't make it an
insanely wrong value.  And let's keep "0 currently means num_cpu",
which may or may not be optimal but it cannot be an "insanely wrong"
value.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCHv2 0/8] Expose the submodule parallelism to the user
  2015-10-27 19:12 ` [PATCH 0/9] Expose the submodule parallelism to the user Junio C Hamano
@ 2015-10-28 23:21   ` Stefan Beller
  2015-10-28 23:21     ` [PATCHv2 1/8] run_processes_parallel: Add output to tracing messages Stefan Beller
                       ` (9 more replies)
  0 siblings, 10 replies; 48+ messages in thread
From: Stefan Beller @ 2015-10-28 23:21 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

This replaces origin/sb/submodule-parallel-update
(anchoring at 74367d8938, Merge branch 'sb/submodule-parallel-fetch'
into sb/submodule-parallel-update)

What does it do?
---
This series should finish the on going efforts of parallelizing
submodule network traffic. The patches contain tests for clone,
fetch and submodule update to use the actual parallelism both via
command line as well as a configured option. I decided to go with
"submodule.jobs" for all three for now.

What is new in v2?
---
* The patches got reordered slightly
* Documentation was adapted

Interdiff below

Stefan Beller (8):
  run_processes_parallel: Add output to tracing messages
  submodule config: keep update strategy around
  submodule config: remove name_and_item_from_var
  submodule-config: parse_config
  fetching submodules: Respect `submodule.jobs` config option
  git submodule update: have a dedicated helper for cloning
  submodule update: expose parallelism to the user
  clone: allow an explicit argument for parallel submodule clones

 Documentation/config.txt        |   7 ++
 Documentation/git-clone.txt     |   6 +-
 Documentation/git-submodule.txt |   7 +-
 builtin/clone.c                 |  23 +++-
 builtin/fetch.c                 |   2 +-
 builtin/submodule--helper.c     | 244 ++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh                |  54 ++++-----
 run-command.c                   |   4 +
 submodule-config.c              |  98 ++++++++++------
 submodule-config.h              |   3 +
 submodule.c                     |   5 +
 t/t5526-fetch-submodules.sh     |  14 +++
 t/t7400-submodule-basic.sh      |   4 +-
 t/t7406-submodule-update.sh     |  27 +++++
 14 files changed, 418 insertions(+), 80 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0de0138..785721a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2643,12 +2643,12 @@ submodule.<name>.ignore::
 	"--ignore-submodules" option. The 'git submodule' commands are not
 	affected by this setting.
 
-submodule::jobs
+submodule.jobs::
 	This is used to determine how many submodules can be operated on in
 	parallel. Specifying a positive integer allows up to that number
-	of submodules being fetched in parallel. Specifying 0 the number
-	of cpus will be taken as the maximum number. Currently this is
-	used in fetch and clone operations only.
+	of submodules being fetched in parallel. This is used in fetch
+	and clone operations only. A value of 0 will give some reasonable
+	default. The defaults may change with different versions of Git.
 
 tag.sort::
 	This variable controls the sort ordering of tags when displayed by
diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index affa52e..01bd6b7 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -216,9 +216,10 @@ objects from the source repository into a pack in the cloned repository.
 	The result is Git repository can be separated from working
 	tree.
 
--j::
---jobs::
+-j <n>::
+--jobs <n>::
 	The number of submodules fetched at the same time.
+	Defaults to the `submodule.jobs` option.
 
 <repository>::
 	The (possibly remote) repository to clone from.  See the
diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index f5429fa..c70fafd 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -374,10 +374,11 @@ for linkgit:git-clone[1]'s `--reference` and `--shared` options carefully.
 	clone with a history truncated to the specified number of revisions.
 	See linkgit:git-clone[1]
 
--j::
---jobs::
+-j <n>::
+--jobs <n>::
 	This option is only valid for the update command.
 	Clone new submodules in parallel with as many jobs.
+	Defaults to the `submodule.jobs` option.
 
 <path>...::
 	Paths to submodule(s). When specified this will restrict the command
diff --git a/builtin/clone.c b/builtin/clone.c
index 5ac2d89..22b9924 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -727,10 +727,7 @@ static int checkout(void)
 		struct argv_array args = ARGV_ARRAY_INIT;
 		argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
 
-		if (max_jobs == -1)
-			if (git_config_get_int("submodule.jobs", &max_jobs))
-				max_jobs = 1;
-		if (max_jobs != 1) {
+		if (max_jobs != -1) {
 			struct strbuf sb = STRBUF_INIT;
 			strbuf_addf(&sb, "--jobs=%d", max_jobs);
 			argv_array_push(&args, sb.buf);
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index c3d438a..67dba1c 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -476,9 +476,10 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 	/* Overlay the parsed .gitmodules file with .git/config */
 	git_config(git_submodule_config, NULL);
 
-	if (max_jobs == -1)
-		if (git_config_get_int("submodule.jobs", &max_jobs))
-			max_jobs = 1;
+	if (max_jobs < 0)
+		max_jobs = config_parallel_submodules();
+	if (max_jobs < 0)
+		max_jobs = 1;
 
 	run_processes_parallel(max_jobs,
 			       update_clone_get_next_task,

-- 
2.5.0.281.g4ed9cdb

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCHv2 1/8] run_processes_parallel: Add output to tracing messages
  2015-10-28 23:21   ` [PATCHv2 0/8] " Stefan Beller
@ 2015-10-28 23:21     ` Stefan Beller
  2015-10-30  1:10       ` Eric Sunshine
  2015-10-28 23:21     ` [PATCHv2 2/8] submodule config: keep update strategy around Stefan Beller
                       ` (8 subsequent siblings)
  9 siblings, 1 reply; 48+ messages in thread
From: Stefan Beller @ 2015-10-28 23:21 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

This commit serves 2 purposes. First this may help the user who
tries to diagnose intermixed process calls. Second this may be used
in a later patch for testing. As the output of a command should not
change visibly except for going faster, grepping for the trace output
seems like a viable testing strategy.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 run-command.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/run-command.c b/run-command.c
index 82cc238..49dec74 100644
--- a/run-command.c
+++ b/run-command.c
@@ -959,6 +959,9 @@ static struct parallel_processes *pp_init(int n,
 		n = online_cpus();
 
 	pp->max_processes = n;
+
+	trace_printf("run_processes_parallel: preparing to run up to %d children in parallel", n);
+
 	pp->data = data;
 	if (!get_next_task)
 		die("BUG: you need to specify a get_next_task function");
@@ -988,6 +991,7 @@ static void pp_cleanup(struct parallel_processes *pp)
 {
 	int i;
 
+	trace_printf("run_processes_parallel: parallel processing done");
 	for (i = 0; i < pp->max_processes; i++) {
 		strbuf_release(&pp->children[i].err);
 		child_process_deinit(&pp->children[i].process);
-- 
2.5.0.281.g4ed9cdb

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCHv2 2/8] submodule config: keep update strategy around
  2015-10-28 23:21   ` [PATCHv2 0/8] " Stefan Beller
  2015-10-28 23:21     ` [PATCHv2 1/8] run_processes_parallel: Add output to tracing messages Stefan Beller
@ 2015-10-28 23:21     ` Stefan Beller
  2015-10-30  1:14       ` Eric Sunshine
  2015-10-28 23:21     ` [PATCHv2 3/8] submodule config: remove name_and_item_from_var Stefan Beller
                       ` (7 subsequent siblings)
  9 siblings, 1 reply; 48+ messages in thread
From: Stefan Beller @ 2015-10-28 23:21 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

We need the submodule update strategies in a later patch.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 submodule-config.c | 11 +++++++++++
 submodule-config.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/submodule-config.c b/submodule-config.c
index afe0ea8..8b8c7d1 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -194,6 +194,7 @@ static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
 
 	submodule->path = NULL;
 	submodule->url = NULL;
+	submodule->update = NULL;
 	submodule->fetch_recurse = RECURSE_SUBMODULES_NONE;
 	submodule->ignore = NULL;
 
@@ -311,6 +312,16 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->url);
 			submodule->url = xstrdup(value);
 		}
+	} else if (!strcmp(item.buf, "update")) {
+		if (!value)
+			ret = config_error_nonbool(var);
+		else if (!me->overwrite && submodule->update != NULL)
+			warn_multiple_config(me->commit_sha1, submodule->name,
+					     "update");
+		else {
+			free((void *)submodule->update);
+			submodule->update = xstrdup(value);
+		}
 	}
 
 	strbuf_release(&name);
diff --git a/submodule-config.h b/submodule-config.h
index 9061e4e..f9e2a29 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -14,6 +14,7 @@ struct submodule {
 	const char *url;
 	int fetch_recurse;
 	const char *ignore;
+	const char *update;
 	/* the sha1 blob id of the responsible .gitmodules file */
 	unsigned char gitmodules_sha1[20];
 };
-- 
2.5.0.281.g4ed9cdb

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCHv2 3/8] submodule config: remove name_and_item_from_var
  2015-10-28 23:21   ` [PATCHv2 0/8] " Stefan Beller
  2015-10-28 23:21     ` [PATCHv2 1/8] run_processes_parallel: Add output to tracing messages Stefan Beller
  2015-10-28 23:21     ` [PATCHv2 2/8] submodule config: keep update strategy around Stefan Beller
@ 2015-10-28 23:21     ` Stefan Beller
  2015-10-30  1:23       ` Eric Sunshine
  2015-10-28 23:21     ` [PATCHv2 4/8] submodule-config: parse_config Stefan Beller
                       ` (6 subsequent siblings)
  9 siblings, 1 reply; 48+ messages in thread
From: Stefan Beller @ 2015-10-28 23:21 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

By inlining `name_and_item_from_var` it is easy to add later options
which are not required to have a submodule name.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule-config.c | 46 +++++++++++++++++-----------------------------
 1 file changed, 17 insertions(+), 29 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 8b8c7d1..4d0563c 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -161,22 +161,6 @@ static struct submodule *cache_lookup_name(struct submodule_cache *cache,
 	return NULL;
 }
 
-static int name_and_item_from_var(const char *var, struct strbuf *name,
-				  struct strbuf *item)
-{
-	const char *subsection, *key;
-	int subsection_len, parse;
-	parse = parse_config_key(var, "submodule", &subsection,
-			&subsection_len, &key);
-	if (parse < 0 || !subsection)
-		return 0;
-
-	strbuf_add(name, subsection, subsection_len);
-	strbuf_addstr(item, key);
-
-	return 1;
-}
-
 static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
 		const unsigned char *gitmodules_sha1, const char *name)
 {
@@ -251,18 +235,25 @@ static int parse_config(const char *var, const char *value, void *data)
 {
 	struct parse_config_parameter *me = data;
 	struct submodule *submodule;
-	struct strbuf name = STRBUF_INIT, item = STRBUF_INIT;
-	int ret = 0;
+	int subsection_len, ret = 0;
+	const char *subsection, *key;
+	char *name;
 
-	/* this also ensures that we only parse submodule entries */
-	if (!name_and_item_from_var(var, &name, &item))
+	if (parse_config_key(var, "submodule", &subsection,
+			     &subsection_len, &key) < 0)
 		return 0;
 
+	if (!subsection_len)
+		return 0;
+
+	/* subsection is not null terminated */
+	name = xmemdupz(subsection, subsection_len);
 	submodule = lookup_or_create_by_name(me->cache,
 					     me->gitmodules_sha1,
-					     name.buf);
+					     name);
+	free(name);
 
-	if (!strcmp(item.buf, "path")) {
+	if (!strcmp(key, "path")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->path != NULL)
@@ -275,7 +266,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			submodule->path = xstrdup(value);
 			cache_put_path(me->cache, submodule);
 		}
-	} else if (!strcmp(item.buf, "fetchrecursesubmodules")) {
+	} else if (!strcmp(key, "fetchrecursesubmodules")) {
 		/* when parsing worktree configurations we can die early */
 		int die_on_error = is_null_sha1(me->gitmodules_sha1);
 		if (!me->overwrite &&
@@ -286,7 +277,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			submodule->fetch_recurse = parse_fetch_recurse(
 								var, value,
 								die_on_error);
-	} else if (!strcmp(item.buf, "ignore")) {
+	} else if (!strcmp(key, "ignore")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->ignore != NULL)
@@ -302,7 +293,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->ignore);
 			submodule->ignore = xstrdup(value);
 		}
-	} else if (!strcmp(item.buf, "url")) {
+	} else if (!strcmp(key, "url")) {
 		if (!value) {
 			ret = config_error_nonbool(var);
 		} else if (!me->overwrite && submodule->url != NULL) {
@@ -312,7 +303,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->url);
 			submodule->url = xstrdup(value);
 		}
-	} else if (!strcmp(item.buf, "update")) {
+	} else if (!strcmp(key, "update")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->update != NULL)
@@ -324,9 +315,6 @@ static int parse_config(const char *var, const char *value, void *data)
 		}
 	}
 
-	strbuf_release(&name);
-	strbuf_release(&item);
-
 	return ret;
 }
 
-- 
2.5.0.281.g4ed9cdb

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCHv2 4/8] submodule-config: parse_config
  2015-10-28 23:21   ` [PATCHv2 0/8] " Stefan Beller
                       ` (2 preceding siblings ...)
  2015-10-28 23:21     ` [PATCHv2 3/8] submodule config: remove name_and_item_from_var Stefan Beller
@ 2015-10-28 23:21     ` Stefan Beller
  2015-10-30  1:53       ` Eric Sunshine
  2015-10-28 23:21     ` [PATCHv2 5/8] fetching submodules: Respect `submodule.jobs` config option Stefan Beller
                       ` (5 subsequent siblings)
  9 siblings, 1 reply; 48+ messages in thread
From: Stefan Beller @ 2015-10-28 23:21 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

This rewrites parse_config to distinguish between configs specific to
one submodule and configs which apply generically to all submodules.
We do not have generic submodule configs yet, but the next patch will
introduce "submodule.jobs".

Signed-off-by: Stefan Beller <sbeller@google.com>

# Conflicts:
#	submodule-config.c

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule-config.c | 58 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 4d0563c..1cea404 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -231,27 +231,23 @@ struct parse_config_parameter {
 	int overwrite;
 };
 
-static int parse_config(const char *var, const char *value, void *data)
+static int parse_generic_submodule_config(const char *var,
+					  const char *key,
+					  const char *value)
 {
-	struct parse_config_parameter *me = data;
-	struct submodule *submodule;
-	int subsection_len, ret = 0;
-	const char *subsection, *key;
-	char *name;
-
-	if (parse_config_key(var, "submodule", &subsection,
-			     &subsection_len, &key) < 0)
-		return 0;
-
-	if (!subsection_len)
-		return 0;
+	return 0;
+}
 
-	/* subsection is not null terminated */
-	name = xmemdupz(subsection, subsection_len);
-	submodule = lookup_or_create_by_name(me->cache,
-					     me->gitmodules_sha1,
-					     name);
-	free(name);
+static int parse_specific_submodule_config(struct parse_config_parameter *me,
+					   const char *name,
+					   const char *key,
+					   const char *value,
+					   const char *var)
+{
+	int ret = 0;
+	struct submodule *submodule = lookup_or_create_by_name(me->cache,
+							       me->gitmodules_sha1,
+							       name);
 
 	if (!strcmp(key, "path")) {
 		if (!value)
@@ -318,6 +314,30 @@ static int parse_config(const char *var, const char *value, void *data)
 	return ret;
 }
 
+static int parse_config(const char *var, const char *value, void *data)
+{
+	struct parse_config_parameter *me = data;
+
+	int subsection_len;
+	const char *subsection, *key;
+	char *name;
+
+	if (parse_config_key(var, "submodule", &subsection,
+			     &subsection_len, &key) < 0)
+		return 0;
+
+	if (!subsection_len)
+		return parse_generic_submodule_config(var, key, value);
+	else {
+		int ret;
+		/* subsection is not null terminated */
+		name = xmemdupz(subsection, subsection_len);
+		ret = parse_specific_submodule_config(me, name, key, value, var);
+		free(name);
+		return ret;
+	}
+}
+
 static int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
 				      unsigned char *gitmodules_sha1)
 {
-- 
2.5.0.281.g4ed9cdb

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCHv2 5/8] fetching submodules: Respect `submodule.jobs` config option
  2015-10-28 23:21   ` [PATCHv2 0/8] " Stefan Beller
                       ` (3 preceding siblings ...)
  2015-10-28 23:21     ` [PATCHv2 4/8] submodule-config: parse_config Stefan Beller
@ 2015-10-28 23:21     ` Stefan Beller
  2015-10-30  2:17       ` Eric Sunshine
  2015-10-28 23:21     ` [PATCHv2 6/8] git submodule update: have a dedicated helper for cloning Stefan Beller
                       ` (4 subsequent siblings)
  9 siblings, 1 reply; 48+ messages in thread
From: Stefan Beller @ 2015-10-28 23:21 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

This allows to configure fetching and updating in parallel
without having the command line option.

This moved the responsibility to determine how many parallel processes
to start from builtin/fetch to submodule.c as we need a way to communicate
"The user did not specify the number of parallel processes in the command
line options" in the builtin fetch. The submodule code takes care of
the precedence (CLI > config > default)

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/config.txt    |  7 +++++++
 builtin/fetch.c             |  2 +-
 submodule-config.c          |  9 +++++++++
 submodule-config.h          |  2 ++
 submodule.c                 |  5 +++++
 t/t5526-fetch-submodules.sh | 14 ++++++++++++++
 6 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 391a0c3..785721a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2643,6 +2643,13 @@ submodule.<name>.ignore::
 	"--ignore-submodules" option. The 'git submodule' commands are not
 	affected by this setting.
 
+submodule.jobs::
+	This is used to determine how many submodules can be operated on in
+	parallel. Specifying a positive integer allows up to that number
+	of submodules being fetched in parallel. This is used in fetch
+	and clone operations only. A value of 0 will give some reasonable
+	default. The defaults may change with different versions of Git.
+
 tag.sort::
 	This variable controls the sort ordering of tags when displayed by
 	linkgit:git-tag[1]. Without the "--sort=<value>" option provided, the
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 9cc1c9d..60e6797 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -37,7 +37,7 @@ static int prune = -1; /* unspecified */
 static int all, append, dry_run, force, keep, multiple, update_head_ok, verbosity;
 static int progress = -1, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow;
-static int max_children = 1;
+static int max_children = -1;
 static const char *depth;
 static const char *upload_pack;
 static struct strbuf default_rla = STRBUF_INIT;
diff --git a/submodule-config.c b/submodule-config.c
index 1cea404..07bdcdf 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -32,6 +32,7 @@ enum lookup_type {
 
 static struct submodule_cache cache;
 static int is_cache_init;
+static int parallel_jobs = -1;
 
 static int config_path_cmp(const struct submodule_entry *a,
 			   const struct submodule_entry *b,
@@ -235,6 +236,9 @@ static int parse_generic_submodule_config(const char *var,
 					  const char *key,
 					  const char *value)
 {
+	if (!strcmp(key, "jobs")) {
+		parallel_jobs = strtol(value, NULL, 10);
+	}
 	return 0;
 }
 
@@ -483,3 +487,8 @@ void submodule_free(void)
 	cache_free(&cache);
 	is_cache_init = 0;
 }
+
+int config_parallel_submodules(void)
+{
+	return parallel_jobs;
+}
diff --git a/submodule-config.h b/submodule-config.h
index f9e2a29..d9bbf9a 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -27,4 +27,6 @@ const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
 		const char *path);
 void submodule_free(void);
 
+int config_parallel_submodules(void);
+
 #endif /* SUBMODULE_CONFIG_H */
diff --git a/submodule.c b/submodule.c
index 0257ea3..188ba02 100644
--- a/submodule.c
+++ b/submodule.c
@@ -752,6 +752,11 @@ int fetch_populated_submodules(const struct argv_array *options,
 	argv_array_push(&spf.args, "--recurse-submodules-default");
 	/* default value, "--submodule-prefix" and its value are added later */
 
+	if (max_parallel_jobs < 0)
+		max_parallel_jobs = config_parallel_submodules();
+	if (max_parallel_jobs < 0)
+		max_parallel_jobs = 1;
+
 	calculate_changed_submodule_paths();
 	run_processes_parallel(max_parallel_jobs,
 			       get_next_submodule,
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 1b4ce69..5c3579c 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -470,4 +470,18 @@ test_expect_success "don't fetch submodule when newly recorded commits are alrea
 	test_i18ncmp expect.err actual.err
 '
 
+test_expect_success 'fetching submodules respects parallel settings' '
+	git config fetch.recurseSubmodules true &&
+	(
+		cd downstream &&
+		GIT_TRACE=$(pwd)/trace.out git fetch --jobs 7 &&
+		grep "7 children" trace.out &&
+		git config submodule.jobs 8 &&
+		GIT_TRACE=$(pwd)/trace.out git fetch &&
+		grep "8 children" trace.out &&
+		GIT_TRACE=$(pwd)/trace.out git fetch --jobs 9 &&
+		grep "9 children" trace.out
+	)
+'
+
 test_done
-- 
2.5.0.281.g4ed9cdb

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCHv2 6/8] git submodule update: have a dedicated helper for cloning
  2015-10-28 23:21   ` [PATCHv2 0/8] " Stefan Beller
                       ` (4 preceding siblings ...)
  2015-10-28 23:21     ` [PATCHv2 5/8] fetching submodules: Respect `submodule.jobs` config option Stefan Beller
@ 2015-10-28 23:21     ` Stefan Beller
  2015-10-29 22:34       ` Junio C Hamano
  2015-10-28 23:21     ` [PATCHv2 7/8] submodule update: expose parallelism to the user Stefan Beller
                       ` (3 subsequent siblings)
  9 siblings, 1 reply; 48+ messages in thread
From: Stefan Beller @ 2015-10-28 23:21 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

This introduces a new helper function in git submodule--helper
which takes care of cloning all submodules, which we want to
parallelize eventually.

Some tests (such as empty URL, update_mode=none) are required in the
helper to make the decision for cloning. These checks have been
moved into the C function as well (no need to repeat them in the
shell script).

As we can only access the stderr channel from within the parallel
processing engine, we need to reroute the error message for
specified but initialized submodules to stderr. As it is an error
message, this should have gone to stderr in the first place, so it
is a bug fix along the way.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/submodule--helper.c | 234 ++++++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh            |  45 +++------
 t/t7400-submodule-basic.sh  |   4 +-
 3 files changed, 247 insertions(+), 36 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index f4c3eff..1ec1b85 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -255,6 +255,239 @@ static int module_clone(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
+static int git_submodule_config(const char *var, const char *value, void *cb)
+{
+	return parse_submodule_config_option(var, value);
+}
+
+struct submodule_update_clone {
+	int count;
+	int quiet;
+	int print_unmatched;
+	char *reference;
+	char *depth;
+	char *update;
+	const char *recursive_prefix;
+	const char *prefix;
+	struct module_list list;
+	struct string_list projectlines;
+	struct pathspec pathspec;
+};
+#define SUBMODULE_UPDATE_CLONE_INIT {0, 0, 0, NULL, NULL, NULL, NULL, NULL, MODULE_LIST_INIT, STRING_LIST_INIT_DUP}
+
+static void fill_clone_command(struct child_process *cp, int quiet,
+			       const char *prefix, const char *path,
+			       const char *name, const char *url,
+			       const char *reference, const char *depth)
+{
+	cp->git_cmd = 1;
+	cp->no_stdin = 1;
+	cp->stdout_to_stderr = 1;
+	cp->err = -1;
+	argv_array_push(&cp->args, "submodule--helper");
+	argv_array_push(&cp->args, "clone");
+	if (quiet)
+		argv_array_push(&cp->args, "--quiet");
+
+	if (prefix) {
+		argv_array_push(&cp->args, "--prefix");
+		argv_array_push(&cp->args, prefix);
+	}
+	argv_array_push(&cp->args, "--path");
+	argv_array_push(&cp->args, path);
+
+	argv_array_push(&cp->args, "--name");
+	argv_array_push(&cp->args, name);
+
+	argv_array_push(&cp->args, "--url");
+	argv_array_push(&cp->args, url);
+	if (reference)
+		argv_array_push(&cp->args, reference);
+	if (depth)
+		argv_array_push(&cp->args, depth);
+}
+
+static int update_clone_get_next_task(void **pp_task_cb,
+				      struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	for (; pp->count < pp->list.nr; pp->count++) {
+		const struct submodule *sub = NULL;
+		const char *displaypath = NULL;
+		const struct cache_entry *ce = pp->list.entries[pp->count];
+		struct strbuf sb = STRBUF_INIT;
+		const char *update_module = NULL;
+		char *url = NULL;
+		int just_cloned = 0;
+
+		if (ce_stage(ce)) {
+			if (pp->recursive_prefix)
+				strbuf_addf(err, "Skipping unmerged submodule %s/%s\n",
+					pp->recursive_prefix, ce->name);
+			else
+				strbuf_addf(err, "Skipping unmerged submodule %s\n",
+					ce->name);
+			continue;
+		}
+
+		sub = submodule_from_path(null_sha1, ce->name);
+		if (!sub) {
+			strbuf_addf(err, "BUG: internal error managing submodules. "
+				    "The cache could not locate '%s'", ce->name);
+			pp->print_unmatched = 1;
+			return 0;
+		}
+
+		if (pp->recursive_prefix)
+			displaypath = relative_path(pp->recursive_prefix, ce->name, &sb);
+		else
+			displaypath = ce->name;
+
+		if (pp->update)
+			update_module = pp->update;
+		if (!update_module)
+			update_module = sub->update;
+		if (!update_module)
+			update_module = "checkout";
+		if (!strcmp(update_module, "none")) {
+			strbuf_addf(err, "Skipping submodule '%s'\n", displaypath);
+			continue;
+		}
+
+		/*
+		 * Looking up the url in .git/config.
+		 * We cannot fall back to .gitmodules as we only want to process
+		 * configured submodules. This renders the submodule lookup API
+		 * useless, as it cannot lookup without fallback.
+		 */
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "submodule.%s.url", sub->name);
+		git_config_get_string(sb.buf, &url);
+		if (!url) {
+			/*
+			 * Only mention uninitialized submodules when its
+			 * path have been specified
+			 */
+			if (pp->pathspec.nr)
+				strbuf_addf(err, _("Submodule path '%s' not initialized\n"
+					"Maybe you want to use 'update --init'?"), displaypath);
+			continue;
+		}
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%s/.git", ce->name);
+		just_cloned = !file_exists(sb.buf);
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%06o %s %d %d\t%s\n", ce->ce_mode,
+				sha1_to_hex(ce->sha1), ce_stage(ce),
+				just_cloned, ce->name);
+		string_list_append(&pp->projectlines, sb.buf);
+
+		if (just_cloned) {
+			fill_clone_command(cp, pp->quiet, pp->prefix, ce->name,
+					   sub->name, url, pp->reference, pp->depth);
+			pp->count++;
+			free(url);
+			return 1;
+		} else
+			free(url);
+	}
+	return 0;
+}
+
+static int update_clone_start_failure(struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb,
+				      void *pp_task_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	strbuf_addf(err, "error when starting a child process");
+	pp->print_unmatched = 1;
+
+	return 1;
+}
+
+static int update_clone_task_finished(int result,
+				      struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb,
+				      void *pp_task_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	if (!result) {
+		return 0;
+	} else {
+		strbuf_addf(err, "error in one child process");
+		pp->print_unmatched = 1;
+		return 1;
+	}
+}
+
+static int update_clone(int argc, const char **argv, const char *prefix)
+{
+	struct string_list_item *item;
+	struct submodule_update_clone pp = SUBMODULE_UPDATE_CLONE_INIT;
+
+	struct option module_list_options[] = {
+		OPT_STRING(0, "prefix", &prefix,
+			   N_("path"),
+			   N_("path into the working tree")),
+		OPT_STRING(0, "recursive_prefix", &pp.recursive_prefix,
+			   N_("path"),
+			   N_("path into the working tree, across nested "
+			      "submodule boundaries")),
+		OPT_STRING(0, "update", &pp.update,
+			   N_("string"),
+			   N_("update command for submodules")),
+		OPT_STRING(0, "reference", &pp.reference, "<repository>",
+			   N_("Use the local reference repository "
+			      "instead of a full clone")),
+		OPT_STRING(0, "depth", &pp.depth, "<depth>",
+			   N_("Create a shallow clone truncated to the "
+			      "specified number of revisions")),
+		OPT__QUIET(&pp.quiet, N_("do't print cloning progress")),
+		OPT_END()
+	};
+
+	const char *const git_submodule_helper_usage[] = {
+		N_("git submodule--helper list [--prefix=<path>] [<path>...]"),
+		NULL
+	};
+	pp.prefix = prefix;
+
+	argc = parse_options(argc, argv, prefix, module_list_options,
+			     git_submodule_helper_usage, 0);
+
+	if (module_list_compute(argc, argv, prefix, &pp.pathspec, &pp.list) < 0) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	gitmodules_config();
+	/* Overlay the parsed .gitmodules file with .git/config */
+	git_config(git_submodule_config, NULL);
+	run_processes_parallel(1, update_clone_get_next_task,
+				  update_clone_start_failure,
+				  update_clone_task_finished,
+				  &pp);
+
+	if (pp.print_unmatched) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	for_each_string_list_item(item, &pp.projectlines) {
+		utf8_fprintf(stdout, "%s", item->string);
+	}
+	return 0;
+}
+
 struct cmd_struct {
 	const char *cmd;
 	int (*fn)(int, const char **, const char *);
@@ -264,6 +497,7 @@ static struct cmd_struct commands[] = {
 	{"list", module_list},
 	{"name", module_name},
 	{"clone", module_clone},
+	{"update-clone", update_clone}
 };
 
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
diff --git a/git-submodule.sh b/git-submodule.sh
index 9bc5c5f..9f554fb 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -664,17 +664,18 @@ cmd_update()
 		cmd_init "--" "$@" || return
 	fi
 
-	cloned_modules=
-	git submodule--helper list --prefix "$wt_prefix" "$@" | {
+	git submodule--helper update-clone ${GIT_QUIET:+--quiet} \
+		${wt_prefix:+--prefix "$wt_prefix"} \
+		${prefix:+--recursive_prefix "$prefix"} \
+		${update:+--update "$update"} \
+		${reference:+--reference "$reference"} \
+		${depth:+--depth "$depth"} \
+		"$@" | {
 	err=
-	while read mode sha1 stage sm_path
+	while read mode sha1 stage just_cloned sm_path
 	do
 		die_if_unmatched "$mode"
-		if test "$stage" = U
-		then
-			echo >&2 "Skipping unmerged submodule $prefix$sm_path"
-			continue
-		fi
+
 		name=$(git submodule--helper name "$sm_path") || exit
 		url=$(git config submodule."$name".url)
 		branch=$(get_submodule_config "$name" branch master)
@@ -691,27 +692,10 @@ cmd_update()
 
 		displaypath=$(relative_path "$prefix$sm_path")
 
-		if test "$update_module" = "none"
-		then
-			echo "Skipping submodule '$displaypath'"
-			continue
-		fi
-
-		if test -z "$url"
-		then
-			# Only mention uninitialized submodules when its
-			# path have been specified
-			test "$#" != "0" &&
-			say "$(eval_gettext "Submodule path '\$displaypath' not initialized
-Maybe you want to use 'update --init'?")"
-			continue
-		fi
-
-		if ! test -d "$sm_path"/.git && ! test -f "$sm_path"/.git
+		if test $just_cloned -eq 1
 		then
-			git submodule--helper clone ${GIT_QUIET:+--quiet} --prefix "$prefix" --path "$sm_path" --name "$name" --url "$url" "$reference" "$depth" || exit
-			cloned_modules="$cloned_modules;$name"
 			subsha1=
+			update_module=checkout
 		else
 			subsha1=$(clear_local_git_env; cd "$sm_path" &&
 				git rev-parse --verify HEAD) ||
@@ -751,13 +735,6 @@ Maybe you want to use 'update --init'?")"
 				die "$(eval_gettext "Unable to fetch in submodule path '\$displaypath'")"
 			fi
 
-			# Is this something we just cloned?
-			case ";$cloned_modules;" in
-			*";$name;"*)
-				# then there is no local change to integrate
-				update_module=checkout ;;
-			esac
-
 			must_die_on_failure=
 			case "$update_module" in
 			checkout)
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index 540771c..5991e3c 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -462,7 +462,7 @@ test_expect_success 'update --init' '
 	git config --remove-section submodule.example &&
 	test_must_fail git config submodule.example.url &&
 
-	git submodule update init > update.out &&
+	git submodule update init 2> update.out &&
 	cat update.out &&
 	test_i18ngrep "not initialized" update.out &&
 	test_must_fail git rev-parse --resolve-git-dir init/.git &&
@@ -480,7 +480,7 @@ test_expect_success 'update --init from subdirectory' '
 	mkdir -p sub &&
 	(
 		cd sub &&
-		git submodule update ../init >update.out &&
+		git submodule update ../init 2>update.out &&
 		cat update.out &&
 		test_i18ngrep "not initialized" update.out &&
 		test_must_fail git rev-parse --resolve-git-dir ../init/.git &&
-- 
2.5.0.281.g4ed9cdb

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCHv2 7/8] submodule update: expose parallelism to the user
  2015-10-28 23:21   ` [PATCHv2 0/8] " Stefan Beller
                       ` (5 preceding siblings ...)
  2015-10-28 23:21     ` [PATCHv2 6/8] git submodule update: have a dedicated helper for cloning Stefan Beller
@ 2015-10-28 23:21     ` Stefan Beller
  2015-10-28 23:21     ` [PATCHv2 8/8] clone: allow an explicit argument for parallel submodule clones Stefan Beller
                       ` (2 subsequent siblings)
  9 siblings, 0 replies; 48+ messages in thread
From: Stefan Beller @ 2015-10-28 23:21 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

Expose possible parallelism either via the "--jobs" CLI parameter or
the "submodule.jobs" setting.

By having the variable initialized to -1, we make sure 0 can be passed
into the parallel processing machine, which will then pick as many parallel
workers as there are CPUs.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/git-submodule.txt |  7 ++++++-
 builtin/submodule--helper.c     | 18 ++++++++++++++----
 git-submodule.sh                |  9 +++++++++
 t/t7406-submodule-update.sh     | 12 ++++++++++++
 4 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index f17687e..c70fafd 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -16,7 +16,7 @@ SYNOPSIS
 'git submodule' [--quiet] deinit [-f|--force] [--] <path>...
 'git submodule' [--quiet] update [--init] [--remote] [-N|--no-fetch]
 	      [-f|--force] [--rebase|--merge] [--reference <repository>]
-	      [--depth <depth>] [--recursive] [--] [<path>...]
+	      [--depth <depth>] [--recursive] [--jobs <n>] [--] [<path>...]
 'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) <n>]
 	      [commit] [--] [<path>...]
 'git submodule' [--quiet] foreach [--recursive] <command>
@@ -374,6 +374,11 @@ for linkgit:git-clone[1]'s `--reference` and `--shared` options carefully.
 	clone with a history truncated to the specified number of revisions.
 	See linkgit:git-clone[1]
 
+-j <n>::
+--jobs <n>::
+	This option is only valid for the update command.
+	Clone new submodules in parallel with as many jobs.
+	Defaults to the `submodule.jobs` option.
 
 <path>...::
 	Paths to submodule(s). When specified this will restrict the command
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 1ec1b85..67dba1c 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -431,6 +431,7 @@ static int update_clone_task_finished(int result,
 
 static int update_clone(int argc, const char **argv, const char *prefix)
 {
+	int max_jobs = -1;
 	struct string_list_item *item;
 	struct submodule_update_clone pp = SUBMODULE_UPDATE_CLONE_INIT;
 
@@ -451,6 +452,8 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 		OPT_STRING(0, "depth", &pp.depth, "<depth>",
 			   N_("Create a shallow clone truncated to the "
 			      "specified number of revisions")),
+		OPT_INTEGER('j', "jobs", &max_jobs,
+			    N_("parallel jobs")),
 		OPT__QUIET(&pp.quiet, N_("do't print cloning progress")),
 		OPT_END()
 	};
@@ -472,10 +475,17 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 	gitmodules_config();
 	/* Overlay the parsed .gitmodules file with .git/config */
 	git_config(git_submodule_config, NULL);
-	run_processes_parallel(1, update_clone_get_next_task,
-				  update_clone_start_failure,
-				  update_clone_task_finished,
-				  &pp);
+
+	if (max_jobs < 0)
+		max_jobs = config_parallel_submodules();
+	if (max_jobs < 0)
+		max_jobs = 1;
+
+	run_processes_parallel(max_jobs,
+			       update_clone_get_next_task,
+			       update_clone_start_failure,
+			       update_clone_task_finished,
+			       &pp);
 
 	if (pp.print_unmatched) {
 		printf("#unmatched\n");
diff --git a/git-submodule.sh b/git-submodule.sh
index 9f554fb..10c5af9 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -645,6 +645,14 @@ cmd_update()
 		--depth=*)
 			depth=$1
 			;;
+		-j|--jobs)
+			case "$2" in '') usage ;; esac
+			jobs="--jobs=$2"
+			shift
+			;;
+		--jobs=*)
+			jobs=$1
+			;;
 		--)
 			shift
 			break
@@ -670,6 +678,7 @@ cmd_update()
 		${update:+--update "$update"} \
 		${reference:+--reference "$reference"} \
 		${depth:+--depth "$depth"} \
+		${jobs:+$jobs} \
 		"$@" | {
 	err=
 	while read mode sha1 stage just_cloned sm_path
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index dda3929..05ea66f 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -774,4 +774,16 @@ test_expect_success 'submodule update --recursive drops module name before recur
 	 test_i18ngrep "Submodule path .deeper/submodule/subsubmodule.: checked out" actual
 	)
 '
+
+test_expect_success 'submodule update can be run in parallel' '
+	(cd super2 &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update --jobs 7 &&
+	 grep "7 children" trace.out &&
+	 git config submodule.jobs 8 &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update &&
+	 grep "8 children" trace.out &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update --jobs 9 &&
+	 grep "9 children" trace.out
+	)
+'
 test_done
-- 
2.5.0.281.g4ed9cdb

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCHv2 8/8] clone: allow an explicit argument for parallel submodule clones
  2015-10-28 23:21   ` [PATCHv2 0/8] " Stefan Beller
                       ` (6 preceding siblings ...)
  2015-10-28 23:21     ` [PATCHv2 7/8] submodule update: expose parallelism to the user Stefan Beller
@ 2015-10-28 23:21     ` Stefan Beller
  2015-11-01  8:58       ` Eric Sunshine
  2015-10-29 13:19     ` [PATCHv2 0/8] Expose the submodule parallelism to the user Ramsay Jones
  2015-10-29 20:12     ` Junio C Hamano
  9 siblings, 1 reply; 48+ messages in thread
From: Stefan Beller @ 2015-10-28 23:21 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

Just pass it along to "git submodule update", which may pick reasonable
defaults if you don't specify an explicit number.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/git-clone.txt |  6 +++++-
 builtin/clone.c             | 23 +++++++++++++++++------
 t/t7406-submodule-update.sh | 15 +++++++++++++++
 3 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index f1f2a3f..01bd6b7 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -14,7 +14,7 @@ SYNOPSIS
 	  [-o <name>] [-b <name>] [-u <upload-pack>] [--reference <repository>]
 	  [--dissociate] [--separate-git-dir <git dir>]
 	  [--depth <depth>] [--[no-]single-branch]
-	  [--recursive | --recurse-submodules] [--] <repository>
+	  [--recursive | --recurse-submodules] [--jobs <n>] [--] <repository>
 	  [<directory>]
 
 DESCRIPTION
@@ -216,6 +216,10 @@ objects from the source repository into a pack in the cloned repository.
 	The result is Git repository can be separated from working
 	tree.
 
+-j <n>::
+--jobs <n>::
+	The number of submodules fetched at the same time.
+	Defaults to the `submodule.jobs` option.
 
 <repository>::
 	The (possibly remote) repository to clone from.  See the
diff --git a/builtin/clone.c b/builtin/clone.c
index 9eaecd9..22b9924 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -50,6 +50,7 @@ static int option_progress = -1;
 static struct string_list option_config;
 static struct string_list option_reference;
 static int option_dissociate;
+static int max_jobs = -1;
 
 static struct option builtin_clone_options[] = {
 	OPT__VERBOSITY(&option_verbosity),
@@ -72,6 +73,8 @@ static struct option builtin_clone_options[] = {
 		    N_("initialize submodules in the clone")),
 	OPT_BOOL(0, "recurse-submodules", &option_recursive,
 		    N_("initialize submodules in the clone")),
+	OPT_INTEGER('j', "jobs", &max_jobs,
+		    N_("number of submodules cloned in parallel")),
 	OPT_STRING(0, "template", &option_template, N_("template-directory"),
 		   N_("directory from which templates will be used")),
 	OPT_STRING_LIST(0, "reference", &option_reference, N_("repo"),
@@ -95,10 +98,6 @@ static struct option builtin_clone_options[] = {
 	OPT_END()
 };
 
-static const char *argv_submodule[] = {
-	"submodule", "update", "--init", "--recursive", NULL
-};
-
 static const char *get_repo_path_1(struct strbuf *path, int *is_bundle)
 {
 	static char *suffix[] = { "/.git", "", ".git/.git", ".git" };
@@ -724,8 +723,20 @@ static int checkout(void)
 	err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
 			   sha1_to_hex(sha1), "1", NULL);
 
-	if (!err && option_recursive)
-		err = run_command_v_opt(argv_submodule, RUN_GIT_CMD);
+	if (!err && option_recursive) {
+		struct argv_array args = ARGV_ARRAY_INIT;
+		argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
+
+		if (max_jobs != -1) {
+			struct strbuf sb = STRBUF_INIT;
+			strbuf_addf(&sb, "--jobs=%d", max_jobs);
+			argv_array_push(&args, sb.buf);
+			strbuf_release(&sb);
+		}
+
+		err = run_command_v_opt(args.argv, RUN_GIT_CMD);
+		argv_array_clear(&args);
+	}
 
 	return err;
 }
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 05ea66f..ade0524 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -786,4 +786,19 @@ test_expect_success 'submodule update can be run in parallel' '
 	 grep "9 children" trace.out
 	)
 '
+
+test_expect_success 'git clone passes the parallel jobs config on to submodules' '
+	test_when_finished "rm -rf super4" &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules --jobs 7 . super4 &&
+	grep "7 children" trace.out &&
+	rm -rf super4 &&
+	git config --global submodule.jobs 8 &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules . super4 &&
+	grep "8 children" trace.out &&
+	rm -rf super4 &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules --jobs 9 . super4 &&
+	grep "9 children" trace.out &&
+	rm -rf super4
+'
+
 test_done
-- 
2.5.0.281.g4ed9cdb

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
  2015-10-28 23:21   ` [PATCHv2 0/8] " Stefan Beller
                       ` (7 preceding siblings ...)
  2015-10-28 23:21     ` [PATCHv2 8/8] clone: allow an explicit argument for parallel submodule clones Stefan Beller
@ 2015-10-29 13:19     ` Ramsay Jones
  2015-10-29 15:51       ` Stefan Beller
  2015-10-29 20:12     ` Junio C Hamano
  9 siblings, 1 reply; 48+ messages in thread
From: Ramsay Jones @ 2015-10-29 13:19 UTC (permalink / raw)
  To: Stefan Beller, git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine



On 28/10/15 23:21, Stefan Beller wrote:
> This replaces origin/sb/submodule-parallel-update
> (anchoring at 74367d8938, Merge branch 'sb/submodule-parallel-fetch'
> into sb/submodule-parallel-update)
> 
> What does it do?
> ---
> This series should finish the on going efforts of parallelizing
> submodule network traffic. The patches contain tests for clone,
> fetch and submodule update to use the actual parallelism both via
> command line as well as a configured option. I decided to go with
> "submodule.jobs" for all three for now.
> 
> What is new in v2?
> ---
> * The patches got reordered slightly
> * Documentation was adapted
> 
> Interdiff below
> 
> Stefan Beller (8):
>   run_processes_parallel: Add output to tracing messages
>   submodule config: keep update strategy around
>   submodule config: remove name_and_item_from_var
>   submodule-config: parse_config
>   fetching submodules: Respect `submodule.jobs` config option
>   git submodule update: have a dedicated helper for cloning
>   submodule update: expose parallelism to the user
>   clone: allow an explicit argument for parallel submodule clones
> 
>  Documentation/config.txt        |   7 ++
>  Documentation/git-clone.txt     |   6 +-
>  Documentation/git-submodule.txt |   7 +-
>  builtin/clone.c                 |  23 +++-
>  builtin/fetch.c                 |   2 +-
>  builtin/submodule--helper.c     | 244 ++++++++++++++++++++++++++++++++++++++++
>  git-submodule.sh                |  54 ++++-----
>  run-command.c                   |   4 +
>  submodule-config.c              |  98 ++++++++++------
>  submodule-config.h              |   3 +
>  submodule.c                     |   5 +
>  t/t5526-fetch-submodules.sh     |  14 +++
>  t/t7400-submodule-basic.sh      |   4 +-
>  t/t7406-submodule-update.sh     |  27 +++++
>  14 files changed, 418 insertions(+), 80 deletions(-)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 0de0138..785721a 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2643,12 +2643,12 @@ submodule.<name>.ignore::
>  	"--ignore-submodules" option. The 'git submodule' commands are not
>  	affected by this setting.
>  
> -submodule::jobs
> +submodule.jobs::
>  	This is used to determine how many submodules can be operated on in
>  	parallel. Specifying a positive integer allows up to that number
> -	of submodules being fetched in parallel. Specifying 0 the number
> -	of cpus will be taken as the maximum number. Currently this is
> -	used in fetch and clone operations only.
> +	of submodules being fetched in parallel. This is used in fetch
> +	and clone operations only. A value of 0 will give some reasonable
> +	default. The defaults may change with different versions of Git.
>  
>  tag.sort::
>  	This variable controls the sort ordering of tags when displayed by
> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
> index affa52e..01bd6b7 100644
> --- a/Documentation/git-clone.txt
> +++ b/Documentation/git-clone.txt
> @@ -216,9 +216,10 @@ objects from the source repository into a pack in the cloned repository.
>  	The result is Git repository can be separated from working
>  	tree.
>  
> --j::
> ---jobs::
> +-j <n>::
> +--jobs <n>::
>  	The number of submodules fetched at the same time.
> +	Defaults to the `submodule.jobs` option.

Hmm, is there a way to _not_ fetch in parallel (override the
config) from the command line for a given command?

ATB,
Ramsay Jones

>  
>  <repository>::
>  	The (possibly remote) repository to clone from.  See the
> diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
> index f5429fa..c70fafd 100644
> --- a/Documentation/git-submodule.txt
> +++ b/Documentation/git-submodule.txt
> @@ -374,10 +374,11 @@ for linkgit:git-clone[1]'s `--reference` and `--shared` options carefully.
>  	clone with a history truncated to the specified number of revisions.
>  	See linkgit:git-clone[1]
>  
> --j::
> ---jobs::
> +-j <n>::
> +--jobs <n>::
>  	This option is only valid for the update command.
>  	Clone new submodules in parallel with as many jobs.
> +	Defaults to the `submodule.jobs` option.
>  
>  <path>...::
>  	Paths to submodule(s). When specified this will restrict the command
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 5ac2d89..22b9924 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -727,10 +727,7 @@ static int checkout(void)
>  		struct argv_array args = ARGV_ARRAY_INIT;
>  		argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
>  
> -		if (max_jobs == -1)
> -			if (git_config_get_int("submodule.jobs", &max_jobs))
> -				max_jobs = 1;
> -		if (max_jobs != 1) {
> +		if (max_jobs != -1) {
>  			struct strbuf sb = STRBUF_INIT;
>  			strbuf_addf(&sb, "--jobs=%d", max_jobs);
>  			argv_array_push(&args, sb.buf);
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index c3d438a..67dba1c 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -476,9 +476,10 @@ static int update_clone(int argc, const char **argv, const char *prefix)
>  	/* Overlay the parsed .gitmodules file with .git/config */
>  	git_config(git_submodule_config, NULL);
>  
> -	if (max_jobs == -1)
> -		if (git_config_get_int("submodule.jobs", &max_jobs))
> -			max_jobs = 1;
> +	if (max_jobs < 0)
> +		max_jobs = config_parallel_submodules();
> +	if (max_jobs < 0)
> +		max_jobs = 1;
>  
>  	run_processes_parallel(max_jobs,
>  			       update_clone_get_next_task,
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
  2015-10-29 13:19     ` [PATCHv2 0/8] Expose the submodule parallelism to the user Ramsay Jones
@ 2015-10-29 15:51       ` Stefan Beller
  2015-10-29 17:23         ` Junio C Hamano
  2015-10-29 23:50         ` Ramsay Jones
  0 siblings, 2 replies; 48+ messages in thread
From: Stefan Beller @ 2015-10-29 15:51 UTC (permalink / raw)
  To: Ramsay Jones
  Cc: git@vger.kernel.org, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann, Eric Sunshine

On Thu, Oct 29, 2015 at 6:19 AM, Ramsay Jones
<ramsay@ramsayjones.plus.com> wrote:

> Hmm, is there a way to _not_ fetch in parallel (override the
> config) from the command line for a given command?
>
> ATB,
> Ramsay Jones

git config submodule.jobs 42
git <foo> --jobs 1 # should run just one task, despite having 42 configured

It does use the parallel processing machinery though, but with a maximum of
one subcommand being spawned. Is that what you're asking?

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
  2015-10-29 15:51       ` Stefan Beller
@ 2015-10-29 17:23         ` Junio C Hamano
  2015-10-29 17:30           ` Stefan Beller
  2015-10-29 23:50         ` Ramsay Jones
  1 sibling, 1 reply; 48+ messages in thread
From: Junio C Hamano @ 2015-10-29 17:23 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Ramsay Jones, git@vger.kernel.org, Jacob Keller, Jeff King,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann, Eric Sunshine

Stefan Beller <sbeller@google.com> writes:

> On Thu, Oct 29, 2015 at 6:19 AM, Ramsay Jones
> <ramsay@ramsayjones.plus.com> wrote:
>
>> Hmm, is there a way to _not_ fetch in parallel (override the
>> config) from the command line for a given command?
>>
>> ATB,
>> Ramsay Jones
>
> git config submodule.jobs 42
> git <foo> --jobs 1 # should run just one task, despite having 42 configured
>
> It does use the parallel processing machinery though, but with a maximum of
> one subcommand being spawned. Is that what you're asking?

With this patch, do we still keep a separate machinery that bypasses
the parallel thing altogether in the first place?

I was hoping that the underlying parallel machinery is polished
enough that using it with max=1 parallelism would be equivalent to
serial execution.  At least, that was my understanding of our goal,
and back when we reviewed the previous "fetch --recurse-sub" series,
my impression was we were already there.

And in that ideal endgame world, your "Give '-j1' from the command
line" would be perfectly an acceptable answer ;-).

Thanks.
 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
  2015-10-29 17:23         ` Junio C Hamano
@ 2015-10-29 17:30           ` Stefan Beller
  0 siblings, 0 replies; 48+ messages in thread
From: Stefan Beller @ 2015-10-29 17:30 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ramsay Jones, git@vger.kernel.org, Jacob Keller, Jeff King,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann, Eric Sunshine

On Thu, Oct 29, 2015 at 10:23 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> On Thu, Oct 29, 2015 at 6:19 AM, Ramsay Jones
>> <ramsay@ramsayjones.plus.com> wrote:
>>
>>> Hmm, is there a way to _not_ fetch in parallel (override the
>>> config) from the command line for a given command?
>>>
>>> ATB,
>>> Ramsay Jones
>>
>> git config submodule.jobs 42
>> git <foo> --jobs 1 # should run just one task, despite having 42 configured
>>
>> It does use the parallel processing machinery though, but with a maximum of
>> one subcommand being spawned. Is that what you're asking?
>
> With this patch, do we still keep a separate machinery that bypasses
> the parallel thing altogether in the first place?

No.

>
> I was hoping that the underlying parallel machinery is polished
> enough that using it with max=1 parallelism would be equivalent to
> serial execution.

There is no special code path for jobs=1.

It should be pretty close, just with the overhead of the parallel engine
spawning it one after the other and being an intermediate for output piping.
The one subcommand would still output via a pipe to the parallel engine,
which then outputs it immediately.

> At least, that was my understanding of our goal,
> and back when we reviewed the previous "fetch --recurse-sub" series,
> my impression was we were already there.
>
> And in that ideal endgame world, your "Give '-j1' from the command
> line" would be perfectly an acceptable answer ;-).

ok. :)

>
> Thanks.
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
  2015-10-28 23:21   ` [PATCHv2 0/8] " Stefan Beller
                       ` (8 preceding siblings ...)
  2015-10-29 13:19     ` [PATCHv2 0/8] Expose the submodule parallelism to the user Ramsay Jones
@ 2015-10-29 20:12     ` Junio C Hamano
  9 siblings, 0 replies; 48+ messages in thread
From: Junio C Hamano @ 2015-10-29 20:12 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, jacob.keller, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Stefan Beller <sbeller@google.com> writes:

> This replaces origin/sb/submodule-parallel-update
> (anchoring at 74367d8938, Merge branch 'sb/submodule-parallel-fetch'
> into sb/submodule-parallel-update)
>
> What does it do?
> ---
> This series should finish the on going efforts of parallelizing
> submodule network traffic. The patches contain tests for clone,
> fetch and submodule update to use the actual parallelism both via
> command line as well as a configured option. I decided to go with
> "submodule.jobs" for all three for now.
>
> What is new in v2?
> ---
> * The patches got reordered slightly
> * Documentation was adapted

A couple of things I noticed (other than "many issues pointed out in
v1 have been updated") are:

 - The way 7/8 and 8/8 checks for uninitialized max_jobs are
   inconsistently written.  The way 7/8 does, i.e. (max_jobs < 0),
   looks more conventional.

 - "Defaults to the `submodule.jobs` option" should say
   "configuration variable" instead.

I haven't formed an opinion on 6/8 yet.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCHv2 6/8] git submodule update: have a dedicated helper for cloning
  2015-10-28 23:21     ` [PATCHv2 6/8] git submodule update: have a dedicated helper for cloning Stefan Beller
@ 2015-10-29 22:34       ` Junio C Hamano
  0 siblings, 0 replies; 48+ messages in thread
From: Junio C Hamano @ 2015-10-29 22:34 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, jacob.keller, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Stefan Beller <sbeller@google.com> writes:

> +struct submodule_update_clone {
> +	int count;
> +	int quiet;
> +	int print_unmatched;
> +	char *reference;
> +	char *depth;
> +	char *update;
> +	const char *recursive_prefix;
> +	const char *prefix;
> +	struct module_list list;
> +	struct string_list projectlines;
> +	struct pathspec pathspec;
> +};

These fields should be split into at least two classes, the ones
that are primarily the "configuration", and the others that are
"states".  I am guessing 'quiet' is what the caller prepares and
tells the pp callbacks that they must work with reduced verbosity,
and 'print_unmatched' is also in the same boat.  From the above
structure definition, nobody can guess what 'count' represents.  Is
that the number of modules you have in the top-level superproject?
Is that the number of modules updated so far?  Some other number?

We can guess "list" is probably the list of modules to be cloned or
updated, but we have no idea what "projectlines" mean and what it
will be used for.  The only word with 'project' we would use in the
context of discussing submodules is the "top level superproject",
but then that will not need a "list", so that is not it.  Perhaps
this refers to a list of projects bound to our tree as submodules,
and perhaps each such submodule gives some kind of "lines", but it
is totally unclear what kind of lines they use.

> +static void fill_clone_command(struct child_process *cp, int quiet,
> +			       const char *prefix, const char *path,
> +			       const char *name, const char *url,
> +			       const char *reference, const char *depth)
> +{
> +	cp->git_cmd = 1;
> +	cp->no_stdin = 1;
> +	cp->stdout_to_stderr = 1;
> +	cp->err = -1;
> +	argv_array_push(&cp->args, "submodule--helper");
> +	argv_array_push(&cp->args, "clone");
> +	if (quiet)
> +		argv_array_push(&cp->args, "--quiet");
> +
> +	if (prefix) {
> +		argv_array_push(&cp->args, "--prefix");
> +		argv_array_push(&cp->args, prefix);
> +	}
> +	argv_array_push(&cp->args, "--path");
> +	argv_array_push(&cp->args, path);

The pattern makes readers wish if there were a way to make these
pair of pushes easier to read.  The best I can come up with is

    argv_array_pushl(&cp->args, "--path", path, NULL);

While that would be already a vast improvement, when we know there
are many "I want to push two", it makes me wonder if I am entitled
to find the repeated ", NULL" irritating.

    argv_array_push2(&cp->args, "--path", path);

on the hand feels slightly too specific.  I dunno.

> +static int update_clone_get_next_task(void **pp_task_cb,
> +				      struct child_process *cp,
> +				      struct strbuf *err,
> +				      void *pp_cb)
> +{
> +	struct submodule_update_clone *pp = pp_cb;
> +
> +	for (; pp->count < pp->list.nr; pp->count++) {
> +		const struct submodule *sub = NULL;
> +		const char *displaypath = NULL;
> +		const struct cache_entry *ce = pp->list.entries[pp->count];
> +		struct strbuf sb = STRBUF_INIT;
> +		const char *update_module = NULL;
> +		char *url = NULL;
> +		int just_cloned = 0;
> +
> +		if (ce_stage(ce)) {
> +			if (pp->recursive_prefix)
> +				strbuf_addf(err, "Skipping unmerged submodule %s/%s\n",
> +					pp->recursive_prefix, ce->name);
> +			else
> +				strbuf_addf(err, "Skipping unmerged submodule %s\n",
> +					ce->name);
> +			continue;
> +		}
> +
> +		sub = submodule_from_path(null_sha1, ce->name);
> +		if (!sub) {
> +			strbuf_addf(err, "BUG: internal error managing submodules. "
> +				    "The cache could not locate '%s'", ce->name);
> +			pp->print_unmatched = 1;
> +			return 0;

This feels a bit inconsistent.  When the pp->count'th submodule is
set not to update (i.e. "none" below), you let this loop to ignore
that submodule and continue on to process pp->count+1'th one without
returning to the caller.  Is there a reason why this case should be
processed differently?  If the rest of the code treats this
condition as a "grave error" that tells the caller to never call
get-next again (i.e. the "emergency abort" condition), that sort of
makes sense, but I cannot offhand see if that is being done in this
patch.

> +		}
> +
> +		if (pp->recursive_prefix)
> +			displaypath = relative_path(pp->recursive_prefix, ce->name, &sb);
> +		else
> +			displaypath = ce->name;
> +
> +		if (pp->update)
> +			update_module = pp->update;
> +		if (!update_module)
> +			update_module = sub->update;
> +		if (!update_module)
> +			update_module = "checkout";
> +		if (!strcmp(update_module, "none")) {
> +			strbuf_addf(err, "Skipping submodule '%s'\n", displaypath);
> +			continue;
> +		}
> +
> +		/*
> +		 * Looking up the url in .git/config.
> +		 * We cannot fall back to .gitmodules as we only want to process

s/cannot/must not/, right?

> +		 * configured submodules. This renders the submodule lookup API
> +		 * useless, as it cannot lookup without fallback.
> +		 */

I doubt the value of the last sentence, especially the "useless"
part.

Either "We do not want to read .gitmodules and that is why we do not
use submodule config API, period" (which does not make it "useless",
it is just not meant to be used here at all), or "We do not want to
read .gitmodules in this codepath, and submodule config API cannot
be used here before we teach it an option to only check the config
without falling back" (which does not make it "useless", it is just
that you haven't made it ready to be used here yet).

> +		strbuf_reset(&sb);
> +		strbuf_addf(&sb, "submodule.%s.url", sub->name);
> +		git_config_get_string(sb.buf, &url);
> +		if (!url) {
> +			/*
> +			 * Only mention uninitialized submodules when its
> +			 * path have been specified
> +			 */
> +			if (pp->pathspec.nr)
> +				strbuf_addf(err, _("Submodule path '%s' not initialized\n"
> +					"Maybe you want to use 'update --init'?"), displaypath);
> +			continue;
> +		}
> +
> +		strbuf_reset(&sb);
> +		strbuf_addf(&sb, "%s/.git", ce->name);
> +		just_cloned = !file_exists(sb.buf);

That name was misleading and had me scratch my head for a while.
This module is in the "needs cloning" state, and you haven't even
started cloning it yet.

> +		strbuf_reset(&sb);
> +		strbuf_addf(&sb, "%06o %s %d %d\t%s\n", ce->ce_mode,
> +				sha1_to_hex(ce->sha1), ce_stage(ce),
> +				just_cloned, ce->name);
> +		string_list_append(&pp->projectlines, sb.buf);
> +
> +		if (just_cloned) {
> +			fill_clone_command(cp, pp->quiet, pp->prefix, ce->name,
> +					   sub->name, url, pp->reference, pp->depth);
> +			pp->count++;
> +			free(url);
> +			return 1;
> +		} else
> +			free(url);
> +	}
> +	return 0;
> +}

That's it for today.  I'll take a look at the remainder another day.

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
  2015-10-29 15:51       ` Stefan Beller
  2015-10-29 17:23         ` Junio C Hamano
@ 2015-10-29 23:50         ` Ramsay Jones
  2015-11-03 19:41           ` Stefan Beller
  1 sibling, 1 reply; 48+ messages in thread
From: Ramsay Jones @ 2015-10-29 23:50 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann, Eric Sunshine



On 29/10/15 15:51, Stefan Beller wrote:
> On Thu, Oct 29, 2015 at 6:19 AM, Ramsay Jones
> <ramsay@ramsayjones.plus.com> wrote:
> 
>> Hmm, is there a way to _not_ fetch in parallel (override the
>> config) from the command line for a given command?
>>
>> ATB,
>> Ramsay Jones
> 
> git config submodule.jobs 42
> git <foo> --jobs 1 # should run just one task, despite having 42 configured

Heh, yes ... I didn't pose the question quite right ...
> 
> It does use the parallel processing machinery though, but with a maximum of
> one subcommand being spawned. Is that what you're asking?

... but, despite that, you correctly inferred what I was really
asking about! :)

I was just wondering what overhead the parallel processing machinery
adds to the original 'non-parallel' code path (for the j=1 case).
I suspect the answer is 'not much', but that's just a guess.
Have you measured it? What happens if there is only a single
submodule to fetch?

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCHv2 1/8] run_processes_parallel: Add output to tracing messages
  2015-10-28 23:21     ` [PATCHv2 1/8] run_processes_parallel: Add output to tracing messages Stefan Beller
@ 2015-10-30  1:10       ` Eric Sunshine
  2015-10-30 17:32         ` Stefan Beller
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Sunshine @ 2015-10-30  1:10 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Wed, Oct 28, 2015 at 7:21 PM, Stefan Beller <sbeller@google.com> wrote:
> run_processes_parallel: Add output to tracing messages

This doesn't really say much. I guess you mean that the intention is
to delimit a section in which output from various tasks may be
intermixed. Perhaps:

    run_processes_parallel: delimit intermixed task output

or something.

> This commit serves 2 purposes. First this may help the user who
> tries to diagnose intermixed process calls. Second this may be used
> in a later patch for testing. As the output of a command should not
> change visibly except for going faster, grepping for the trace output
> seems like a viable testing strategy.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
> diff --git a/run-command.c b/run-command.c
> index 82cc238..49dec74 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -959,6 +959,9 @@ static struct parallel_processes *pp_init(int n,
>                 n = online_cpus();
>
>         pp->max_processes = n;
> +
> +       trace_printf("run_processes_parallel: preparing to run up to %d children in parallel", n);

s/children/tasks/ maybe?

Minor: Perhaps drop "in parallel" since the parallelism is already
implied by the "run_processes_parallel" prefix.

> +
>         pp->data = data;
>         if (!get_next_task)
>                 die("BUG: you need to specify a get_next_task function");
> @@ -988,6 +991,7 @@ static void pp_cleanup(struct parallel_processes *pp)
>  {
>         int i;
>
> +       trace_printf("run_processes_parallel: parallel processing done");

Minor: Likewise, perhaps just "done" rather than "parallel processing
done" since the "run_processes_parallel" prefix already implies
parallelism.

>         for (i = 0; i < pp->max_processes; i++) {
>                 strbuf_release(&pp->children[i].err);
>                 child_process_deinit(&pp->children[i].process);
> --
> 2.5.0.281.g4ed9cdb

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCHv2 2/8] submodule config: keep update strategy around
  2015-10-28 23:21     ` [PATCHv2 2/8] submodule config: keep update strategy around Stefan Beller
@ 2015-10-30  1:14       ` Eric Sunshine
  2015-10-30 17:38         ` Stefan Beller
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Sunshine @ 2015-10-30  1:14 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Wed, Oct 28, 2015 at 7:21 PM, Stefan Beller <sbeller@google.com> wrote:
> We need the submodule update strategies in a later patch.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> diff --git a/submodule-config.c b/submodule-config.c
> index afe0ea8..8b8c7d1 100644
> --- a/submodule-config.c
> +++ b/submodule-config.c
> @@ -311,6 +312,16 @@ static int parse_config(const char *var, const char *value, void *data)
>                         free((void *) submodule->url);
>                         submodule->url = xstrdup(value);
>                 }
> +       } else if (!strcmp(item.buf, "update")) {
> +               if (!value)
> +                       ret = config_error_nonbool(var);
> +               else if (!me->overwrite && submodule->update != NULL)

Although "foo != NULL" is unusual in this code-base, it is used
elsewhere in this file, including just outside the context seen above.
Okay.

> +                       warn_multiple_config(me->commit_sha1, submodule->name,
> +                                            "update");
> +               else {
> +                       free((void *)submodule->update);

Minor: Every other 'free((void *) foo)' in this file has a space after
"(void *)", one of which can be seen in the context just above.

> +                       submodule->update = xstrdup(value);
> +               }
>         }
>
>         strbuf_release(&name);

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCHv2 3/8] submodule config: remove name_and_item_from_var
  2015-10-28 23:21     ` [PATCHv2 3/8] submodule config: remove name_and_item_from_var Stefan Beller
@ 2015-10-30  1:23       ` Eric Sunshine
  2015-10-30 18:37         ` Stefan Beller
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Sunshine @ 2015-10-30  1:23 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Wed, Oct 28, 2015 at 7:21 PM, Stefan Beller <sbeller@google.com> wrote:
> submodule config: remove name_and_item_from_var
>
> By inlining `name_and_item_from_var` it is easy to add later options
> which are not required to have a submodule name.

I guess you're trying to say that name_and_item_from_var() didn't
provide a proper abstraction, thus wasn't as useful as expected.
Perhaps that commit message could make this shortcoming clearer.

> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
> diff --git a/submodule-config.c b/submodule-config.c
> index 8b8c7d1..4d0563c 100644
> --- a/submodule-config.c
> +++ b/submodule-config.c
> @@ -251,18 +235,25 @@ static int parse_config(const char *var, const char *value, void *data)
>  {
>         struct parse_config_parameter *me = data;
>         struct submodule *submodule;
> -       struct strbuf name = STRBUF_INIT, item = STRBUF_INIT;
> -       int ret = 0;
> +       int subsection_len, ret = 0;
> +       const char *subsection, *key;
> +       char *name;
>
> -       /* this also ensures that we only parse submodule entries */
> -       if (!name_and_item_from_var(var, &name, &item))
> +       if (parse_config_key(var, "submodule", &subsection,
> +                            &subsection_len, &key) < 0)
>                 return 0;
>
> +       if (!subsection_len)
> +               return 0;

Alternately:

    if (parse_config_key(var, "submodule", &subsection,
            &subsection_len, &key) < 0 || !subsection_len)
        return 0;

> +
> +       /* subsection is not null terminated */
> +       name = xmemdupz(subsection, subsection_len);
>         submodule = lookup_or_create_by_name(me->cache,
>                                              me->gitmodules_sha1,
> -                                            name.buf);
> +                                            name);
> +       free(name);

Since this is all private to submodule-config.c, I wonder if it would
be cleaner to change lookup_or_create_by_name() to accept a
name_length argument?

> -       if (!strcmp(item.buf, "path")) {
> +       if (!strcmp(key, "path")) {
>                 if (!value)
>                         ret = config_error_nonbool(var);
>                 else if (!me->overwrite && submodule->path != NULL)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCHv2 4/8] submodule-config: parse_config
  2015-10-28 23:21     ` [PATCHv2 4/8] submodule-config: parse_config Stefan Beller
@ 2015-10-30  1:53       ` Eric Sunshine
  2015-10-30 19:29         ` Stefan Beller
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Sunshine @ 2015-10-30  1:53 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Wed, Oct 28, 2015 at 7:21 PM, Stefan Beller <sbeller@google.com> wrote:
> submodule-config: parse_config

Um, what?

> This rewrites parse_config to distinguish between configs specific to
> one submodule and configs which apply generically to all submodules.
> We do not have generic submodule configs yet, but the next patch will
> introduce "submodule.jobs".
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
>
> # Conflicts:
> #       submodule-config.c

Interesting.

> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
> diff --git a/submodule-config.c b/submodule-config.c
> index 4d0563c..1cea404 100644
> --- a/submodule-config.c
> +++ b/submodule-config.c
> @@ -231,27 +231,23 @@ struct parse_config_parameter {
>         int overwrite;
>  };
>
> -static int parse_config(const char *var, const char *value, void *data)
> +static int parse_generic_submodule_config(const char *var,
> +                                         const char *key,
> +                                         const char *value)
>  {
> -       struct parse_config_parameter *me = data;
> -       struct submodule *submodule;
> -       int subsection_len, ret = 0;
> -       const char *subsection, *key;
> -       char *name;
> -
> -       if (parse_config_key(var, "submodule", &subsection,
> -                            &subsection_len, &key) < 0)
> -               return 0;
> -
> -       if (!subsection_len)
> -               return 0;
> +       return 0;
> +}
>
> -       /* subsection is not null terminated */
> -       name = xmemdupz(subsection, subsection_len);
> -       submodule = lookup_or_create_by_name(me->cache,
> -                                            me->gitmodules_sha1,
> -                                            name);
> -       free(name);
> +static int parse_specific_submodule_config(struct parse_config_parameter *me,
> +                                          const char *name,
> +                                          const char *key,
> +                                          const char *value,
> +                                          const char *var)

Minor: Are these 'key', 'value', 'var' arguments analogous to the
like-named arguments of parse_generic_submodule_config()? If so, why
is the order of arguments different?

> +{
> +       int ret = 0;
> +       struct submodule *submodule = lookup_or_create_by_name(me->cache,
> +                                                              me->gitmodules_sha1,
> +                                                              name);
>
>         if (!strcmp(key, "path")) {
>                 if (!value)
> @@ -318,6 +314,30 @@ static int parse_config(const char *var, const char *value, void *data)
>         return ret;
>  }
>
> +static int parse_config(const char *var, const char *value, void *data)
> +{
> +       struct parse_config_parameter *me = data;
> +
> +       int subsection_len;
> +       const char *subsection, *key;
> +       char *name;
> +
> +       if (parse_config_key(var, "submodule", &subsection,
> +                            &subsection_len, &key) < 0)
> +               return 0;
> +
> +       if (!subsection_len)
> +               return parse_generic_submodule_config(var, key, value);
> +       else {
> +               int ret;
> +               /* subsection is not null terminated */
> +               name = xmemdupz(subsection, subsection_len);
> +               ret = parse_specific_submodule_config(me, name, key, value, var);
> +               free(name);
> +               return ret;
> +       }
> +}

Minor: You could drop the 'else' and outdent its body, thus losing one
indentation level.

    if (!subsection_len)
        return parse_generic_submodule_config(...);

    int ret;
    ...
    return ret;

This might give you a less noisy diff and would be a bit more
consistent with the early part of the function where you don't bother
giving the if (parse_config_key(...)) an 'else' body.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCHv2 5/8] fetching submodules: Respect `submodule.jobs` config option
  2015-10-28 23:21     ` [PATCHv2 5/8] fetching submodules: Respect `submodule.jobs` config option Stefan Beller
@ 2015-10-30  2:17       ` Eric Sunshine
  0 siblings, 0 replies; 48+ messages in thread
From: Eric Sunshine @ 2015-10-30  2:17 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Wed, Oct 28, 2015 at 7:21 PM, Stefan Beller <sbeller@google.com> wrote:
> This allows to configure fetching and updating in parallel
> without having the command line option.
>
> This moved the responsibility to determine how many parallel processes
> to start from builtin/fetch to submodule.c as we need a way to communicate
> "The user did not specify the number of parallel processes in the command
> line options" in the builtin fetch. The submodule code takes care of
> the precedence (CLI > config > default)
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 391a0c3..785721a 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2643,6 +2643,13 @@ submodule.<name>.ignore::
>         "--ignore-submodules" option. The 'git submodule' commands are not
>         affected by this setting.
>
> +submodule.jobs::
> +       This is used to determine how many submodules can be operated on in
> +       parallel. Specifying a positive integer allows up to that number
> +       of submodules being fetched in parallel. This is used in fetch
> +       and clone operations only. A value of 0 will give some reasonable
> +       default. The defaults may change with different versions of Git.

I'm not sure that "default" is the correct word here. When you talk
about a "default", you're normally explaining what happens when the
configuration is not provided. (In fact, the default number of jobs is
1, which you may want to document here).

>  tag.sort::
>         This variable controls the sort ordering of tags when displayed by
>         linkgit:git-tag[1]. Without the "--sort=<value>" option provided, the
> diff --git a/submodule-config.c b/submodule-config.c
> index 1cea404..07bdcdf 100644
> --- a/submodule-config.c
> +++ b/submodule-config.c
> @@ -32,6 +32,7 @@ enum lookup_type {
>
>  static struct submodule_cache cache;
>  static int is_cache_init;
> +static int parallel_jobs = -1;
>
>  static int config_path_cmp(const struct submodule_entry *a,
>                            const struct submodule_entry *b,
> @@ -235,6 +236,9 @@ static int parse_generic_submodule_config(const char *var,
>                                           const char *key,
>                                           const char *value)
>  {
> +       if (!strcmp(key, "jobs")) {
> +               parallel_jobs = strtol(value, NULL, 10);
> +       }

Style: unnecessary braces

Why does this allow a negative value? The documentation doesn't
mention anything about it.

>         return 0;
>  }
>
> diff --git a/submodule.c b/submodule.c
> index 0257ea3..188ba02 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -752,6 +752,11 @@ int fetch_populated_submodules(const struct argv_array *options,
>         argv_array_push(&spf.args, "--recurse-submodules-default");
>         /* default value, "--submodule-prefix" and its value are added later */
>
> +       if (max_parallel_jobs < 0)
> +               max_parallel_jobs = config_parallel_submodules();
> +       if (max_parallel_jobs < 0)
> +               max_parallel_jobs = 1;

run_process_parallel() itself specially handles max_parallel_jobs==0,
so you don't need to consider it here. Okay.

> +
>         calculate_changed_submodule_paths();
>         run_processes_parallel(max_parallel_jobs,
>                                get_next_submodule,
> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index 1b4ce69..5c3579c 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -470,4 +470,18 @@ test_expect_success "don't fetch submodule when newly recorded commits are alrea
>         test_i18ncmp expect.err actual.err
>  '
>
> +test_expect_success 'fetching submodules respects parallel settings' '
> +       git config fetch.recurseSubmodules true &&
> +       (
> +               cd downstream &&
> +               GIT_TRACE=$(pwd)/trace.out git fetch --jobs 7 &&
> +               grep "7 children" trace.out &&
> +               git config submodule.jobs 8 &&
> +               GIT_TRACE=$(pwd)/trace.out git fetch &&
> +               grep "8 children" trace.out &&
> +               GIT_TRACE=$(pwd)/trace.out git fetch --jobs 9 &&
> +               grep "9 children" trace.out
> +       )
> +'

Not specifically related to this test, but maybe add tests to check
cases when --jobs is not specified, and --jobs=1?

> +
>  test_done
> --
> 2.5.0.281.g4ed9cdb
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCHv2 1/8] run_processes_parallel: Add output to tracing messages
  2015-10-30  1:10       ` Eric Sunshine
@ 2015-10-30 17:32         ` Stefan Beller
  0 siblings, 0 replies; 48+ messages in thread
From: Stefan Beller @ 2015-10-30 17:32 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Thu, Oct 29, 2015 at 6:10 PM, Eric Sunshine <ericsunshine@gmail.com> wrote:
> On Wed, Oct 28, 2015 at 7:21 PM, Stefan Beller <sbeller@google.com> wrote:
>> run_processes_parallel: Add output to tracing messages
>
> This doesn't really say much. I guess you mean that the intention is
> to delimit a section in which output from various tasks may be
> intermixed.

My original intention is to have it there for testing in later patches,
so I am not so much interested in the delimiting but the raw number
%d here.

>     run_processes_parallel: delimit intermixed task output

Sounds good to me, better than my subject.

> s/children/tasks/ maybe?
>
> Minor: Perhaps drop "in parallel" since the parallelism is already
> implied by the "run_processes_parallel" prefix.

done

>> +       trace_printf("run_processes_parallel: parallel processing done");
>
> Minor: Likewise, perhaps just "done" rather than "parallel processing
> done" since the "run_processes_parallel" prefix already implies
> parallelism.

done

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCHv2 2/8] submodule config: keep update strategy around
  2015-10-30  1:14       ` Eric Sunshine
@ 2015-10-30 17:38         ` Stefan Beller
  2015-10-30 18:16           ` Eric Sunshine
  0 siblings, 1 reply; 48+ messages in thread
From: Stefan Beller @ 2015-10-30 17:38 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Thu, Oct 29, 2015 at 6:14 PM, Eric Sunshine <ericsunshine@gmail.com> wrote:
>> +               else if (!me->overwrite && submodule->update != NULL)
>
> Although "foo != NULL" is unusual in this code-base, it is used
> elsewhere in this file, including just outside the context seen above.
> Okay.

ok, I'll clean that up as we go.

>> +                       free((void *)submodule->update);
>
> Minor: Every other 'free((void *) foo)' in this file has a space after
> "(void *)", one of which can be seen in the context just above.

done

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCHv2 2/8] submodule config: keep update strategy around
  2015-10-30 17:38         ` Stefan Beller
@ 2015-10-30 18:16           ` Eric Sunshine
  2015-10-30 18:25             ` Stefan Beller
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Sunshine @ 2015-10-30 18:16 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Fri, Oct 30, 2015 at 1:38 PM, Stefan Beller <sbeller@google.com> wrote:
> On Thu, Oct 29, 2015 at 6:14 PM, Eric Sunshine <ericsunshine@gmail.com> wrote:
>>> +               else if (!me->overwrite && submodule->update != NULL)
>>
>> Although "foo != NULL" is unusual in this code-base, it is used
>> elsewhere in this file, including just outside the context seen above.
>> Okay.
>
> ok, I'll clean that up as we go.

Oh, I wasn't suggesting that you clean this up (though you may if you
want). I was merely commenting (for the sake of others reviewing this
patch) that, while not the norm for the project, this instance is
consistent with surrounding code.

>>> +                       free((void *)submodule->update);
>>
>> Minor: Every other 'free((void *) foo)' in this file has a space after
>> "(void *)", one of which can be seen in the context just above.
>
> done

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCHv2 2/8] submodule config: keep update strategy around
  2015-10-30 18:16           ` Eric Sunshine
@ 2015-10-30 18:25             ` Stefan Beller
  0 siblings, 0 replies; 48+ messages in thread
From: Stefan Beller @ 2015-10-30 18:25 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Fri, Oct 30, 2015 at 11:16 AM, Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Fri, Oct 30, 2015 at 1:38 PM, Stefan Beller <sbeller@google.com> wrote:
>> On Thu, Oct 29, 2015 at 6:14 PM, Eric Sunshine <ericsunshine@gmail.com> wrote:
>>>> +               else if (!me->overwrite && submodule->update != NULL)
>>>
>>> Although "foo != NULL" is unusual in this code-base, it is used
>>> elsewhere in this file, including just outside the context seen above.
>>> Okay.
>>
>> ok, I'll clean that up as we go.
>
> Oh, I wasn't suggesting that you clean this up (though you may if you
> want). I was merely commenting (for the sake of others reviewing this
> patch) that, while not the norm for the project, this instance is
> consistent with surrounding code.

I only did a separate patch on top cleaning up 4 occurrences in that file.
We use != NULL quite often throughout the code base, specially in
conditions with side effects like:

    while ((char *c = string++) != NULL) {
        ...

where I think that makes even sense. But there are a minor number of
cases where we have no side effects

    $ grep -rI "!= NULL" |grep -v "((" |grep -v "))" |wc -l
    135



>
>>>> +                       free((void *)submodule->update);
>>>
>>> Minor: Every other 'free((void *) foo)' in this file has a space after
>>> "(void *)", one of which can be seen in the context just above.
>>
>> done

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCHv2 3/8] submodule config: remove name_and_item_from_var
  2015-10-30  1:23       ` Eric Sunshine
@ 2015-10-30 18:37         ` Stefan Beller
  0 siblings, 0 replies; 48+ messages in thread
From: Stefan Beller @ 2015-10-30 18:37 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Thu, Oct 29, 2015 at 6:23 PM, Eric Sunshine <ericsunshine@gmail.com> wrote:
> On Wed, Oct 28, 2015 at 7:21 PM, Stefan Beller <sbeller@google.com> wrote:
>> submodule config: remove name_and_item_from_var
>>
>> By inlining `name_and_item_from_var` it is easy to add later options
>> which are not required to have a submodule name.
>
> I guess you're trying to say that name_and_item_from_var() didn't
> provide a proper abstraction, thus wasn't as useful as expected.
> Perhaps that commit message could make this shortcoming clearer.
>

ok

>
>     if (parse_config_key(var, "submodule", &subsection,
>             &subsection_len, &key) < 0 || !subsection_len)
>         return 0;

done

>>         submodule = lookup_or_create_by_name(me->cache,
>>                                              me->gitmodules_sha1,
>> -                                            name.buf);
>> +                                            name);
>> +       free(name);
>
> Since this is all private to submodule-config.c, I wonder if it would
> be cleaner to change lookup_or_create_by_name() to accept a
> name_length argument?
>

That looks amazingly clean. :)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCHv2 4/8] submodule-config: parse_config
  2015-10-30  1:53       ` Eric Sunshine
@ 2015-10-30 19:29         ` Stefan Beller
  0 siblings, 0 replies; 48+ messages in thread
From: Stefan Beller @ 2015-10-30 19:29 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Thu, Oct 29, 2015 at 6:53 PM, Eric Sunshine <ericsunshine@gmail.com> wrote:
> On Wed, Oct 28, 2015 at 7:21 PM, Stefan Beller <sbeller@google.com> wrote:
>> submodule-config: parse_config
>
> Um, what?

submodule-config: Introduce parse_generic_submodule_config

>
>> This rewrites parse_config to distinguish between configs specific to
>> one submodule and configs which apply generically to all submodules.
>> We do not have generic submodule configs yet, but the next patch will
>> introduce "submodule.jobs".
>>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>>
>> # Conflicts:
>> #       submodule-config.c
>
> Interesting.

fixed

>
> Minor: Are these 'key', 'value', 'var' arguments analogous to the
> like-named arguments of parse_generic_submodule_config()? If so, why
> is the order of arguments different?

Reordered. I thought how they made most sense individually, but consistency
across functions is better.

>
>> +{
>> +       int ret = 0;
>> +       struct submodule *submodule = lookup_or_create_by_name(me->cache,
>> +                                                              me->gitmodules_sha1,
>> +                                                              name);
>>
>>         if (!strcmp(key, "path")) {
>>                 if (!value)
>> @@ -318,6 +314,30 @@ static int parse_config(const char *var, const char *value, void *data)
>>         return ret;
>>  }
>>
>> +static int parse_config(const char *var, const char *value, void *data)
>> +{
>> +       struct parse_config_parameter *me = data;
>> +
>> +       int subsection_len;
>> +       const char *subsection, *key;
>> +       char *name;
>> +
>> +       if (parse_config_key(var, "submodule", &subsection,
>> +                            &subsection_len, &key) < 0)
>> +               return 0;
>> +
>> +       if (!subsection_len)
>> +               return parse_generic_submodule_config(var, key, value);
>> +       else {
>> +               int ret;
>> +               /* subsection is not null terminated */
>> +               name = xmemdupz(subsection, subsection_len);
>> +               ret = parse_specific_submodule_config(me, name, key, value, var);
>> +               free(name);
>> +               return ret;
>> +       }
>> +}
>
> Minor: You could drop the 'else' and outdent its body, thus losing one
> indentation level.

By passing on the subsection, subsection_len, we only have one statement there

     if (!subsection_len)
         return parse_generic_submodule_config(key, var, value, me);
     else
         return parse_specific_submodule_config(subsection,
               subsection_len, key,
                  var, value, me);

will do without dedenting I guess.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCHv2 8/8] clone: allow an explicit argument for parallel submodule clones
  2015-10-28 23:21     ` [PATCHv2 8/8] clone: allow an explicit argument for parallel submodule clones Stefan Beller
@ 2015-11-01  8:58       ` Eric Sunshine
  0 siblings, 0 replies; 48+ messages in thread
From: Eric Sunshine @ 2015-11-01  8:58 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Wed, Oct 28, 2015 at 7:21 PM, Stefan Beller <sbeller@google.com> wrote:
> Just pass it along to "git submodule update", which may pick reasonable
> defaults if you don't specify an explicit number.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
> @@ -724,8 +723,20 @@ static int checkout(void)
>         err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
>                            sha1_to_hex(sha1), "1", NULL);
>
> -       if (!err && option_recursive)
> -               err = run_command_v_opt(argv_submodule, RUN_GIT_CMD);
> +       if (!err && option_recursive) {
> +               struct argv_array args = ARGV_ARRAY_INIT;
> +               argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
> +
> +               if (max_jobs != -1) {
> +                       struct strbuf sb = STRBUF_INIT;
> +                       strbuf_addf(&sb, "--jobs=%d", max_jobs);
> +                       argv_array_push(&args, sb.buf);
> +                       strbuf_release(&sb);

The above four lines can be collapsed to:

    argv_array_pushf(&args, "--jobs=%d", max_jobs);

> +               }
> +
> +               err = run_command_v_opt(args.argv, RUN_GIT_CMD);
> +               argv_array_clear(&args);
> +       }
>
>         return err;
>  }

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
  2015-10-29 23:50         ` Ramsay Jones
@ 2015-11-03 19:41           ` Stefan Beller
  0 siblings, 0 replies; 48+ messages in thread
From: Stefan Beller @ 2015-11-03 19:41 UTC (permalink / raw)
  To: Ramsay Jones
  Cc: git@vger.kernel.org, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann, Eric Sunshine

On Thu, Oct 29, 2015 at 4:50 PM, Ramsay Jones
<ramsay@ramsayjones.plus.com> wrote:
>
>
> On 29/10/15 15:51, Stefan Beller wrote:
>> On Thu, Oct 29, 2015 at 6:19 AM, Ramsay Jones
>> <ramsay@ramsayjones.plus.com> wrote:
>>
>>> Hmm, is there a way to _not_ fetch in parallel (override the
>>> config) from the command line for a given command?
>>>
>>> ATB,
>>> Ramsay Jones
>>
>> git config submodule.jobs 42
>> git <foo> --jobs 1 # should run just one task, despite having 42 configured
>
> Heh, yes ... I didn't pose the question quite right ...
>>
>> It does use the parallel processing machinery though, but with a maximum of
>> one subcommand being spawned. Is that what you're asking?
>
> ... but, despite that, you correctly inferred what I was really
> asking about! :)
>
> I was just wondering what overhead the parallel processing machinery
> adds to the original 'non-parallel' code path (for the j=1 case).
> I suspect the answer is 'not much', but that's just a guess.
> Have you measured it?

Totally unscientific:
 * Make a copy of my current gerrit repository and time the fetch.
 * That repo contains 5 submodules, one needs fetching

time git fetch --recurse-submodules=yes --jobs=1 # this series
real 0m7.150s
user 0m3.459s
sys 0m1.126s

time git fetch --recurse-submodules=yes # origin/master
real 0m7.667s
user 0m3.439s
sys 0m1.190s

Now let's test a few more times repeatedly to avoid cold caches or
network hiccups, (also there is nothing to fetch, so it's more like doing
6 ls-remotes in a row, one for gerrit and 5 submodules)

this series, best out of 5:
real 0m3.971s
user 0m2.447s
sys 0m0.452s

this series, worst out of 5:
real 0m4.229s
user 0m2.506s
sys 0m0.413s

origin/master, best out of 5:
real 0m3.968s
user 0m2.516s
sys 0m0.380s

origin/master, worst out of 5:
real 0m4.217s
user 0m2.472s
sys 0m0.408s

The ratio of real time taken longer is < 1 % in
both the best and worst case.

If you really care about 1 % of performance, you'd want to fetch in
parallel anyway?


> What happens if there is only a single
> submodule to fetch?

Ok let's see. I created https://github.com/stefanbeller/test-sub-1
to play around with it. However
time git fetch --recurse-submodules=yes
or
time git fetch --recurse-submodules=yes --jobs 100
seems to be lost in the noise.

So I am not sure what the question is w.r.t. having just one
submodule.


>
> ATB,
> Ramsay Jones
>
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2015-11-03 19:41 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-27 18:15 [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
2015-10-27 18:15 ` [PATCH 1/9] submodule-config: "goto" removal in parse_config() Stefan Beller
2015-10-27 21:26   ` Jonathan Nieder
2015-10-27 21:39     ` Junio C Hamano
2015-10-27 18:15 ` [PATCH 2/9] submodule config: keep update strategy around Stefan Beller
2015-10-27 18:15 ` [PATCH 3/9] run_processes_parallel: Add output to tracing messages Stefan Beller
2015-10-27 18:15 ` [PATCH 4/9] git submodule update: have a dedicated helper for cloning Stefan Beller
2015-10-27 18:15 ` [PATCH 5/9] submodule update: expose parallelism to the user Stefan Beller
2015-10-27 20:59   ` Junio C Hamano
2015-10-28 21:40     ` Stefan Beller
2015-10-28 22:20       ` Junio C Hamano
2015-10-27 18:15 ` [PATCH 6/9] clone: allow an explicit argument for parallel submodule clones Stefan Beller
2015-10-27 20:57   ` Junio C Hamano
2015-10-28 20:50     ` Stefan Beller
2015-10-27 18:15 ` [PATCH 7/9] submodule config: remove name_and_item_from_var Stefan Beller
2015-10-27 18:15 ` [PATCH 8/9] submodule-config: parse_config Stefan Beller
2015-10-27 18:15 ` [PATCH 9/9] fetching submodules: Respect `submodule.jobs` config option Stefan Beller
2015-10-27 21:00   ` Junio C Hamano
2015-10-27 19:12 ` [PATCH 0/9] Expose the submodule parallelism to the user Junio C Hamano
2015-10-28 23:21   ` [PATCHv2 0/8] " Stefan Beller
2015-10-28 23:21     ` [PATCHv2 1/8] run_processes_parallel: Add output to tracing messages Stefan Beller
2015-10-30  1:10       ` Eric Sunshine
2015-10-30 17:32         ` Stefan Beller
2015-10-28 23:21     ` [PATCHv2 2/8] submodule config: keep update strategy around Stefan Beller
2015-10-30  1:14       ` Eric Sunshine
2015-10-30 17:38         ` Stefan Beller
2015-10-30 18:16           ` Eric Sunshine
2015-10-30 18:25             ` Stefan Beller
2015-10-28 23:21     ` [PATCHv2 3/8] submodule config: remove name_and_item_from_var Stefan Beller
2015-10-30  1:23       ` Eric Sunshine
2015-10-30 18:37         ` Stefan Beller
2015-10-28 23:21     ` [PATCHv2 4/8] submodule-config: parse_config Stefan Beller
2015-10-30  1:53       ` Eric Sunshine
2015-10-30 19:29         ` Stefan Beller
2015-10-28 23:21     ` [PATCHv2 5/8] fetching submodules: Respect `submodule.jobs` config option Stefan Beller
2015-10-30  2:17       ` Eric Sunshine
2015-10-28 23:21     ` [PATCHv2 6/8] git submodule update: have a dedicated helper for cloning Stefan Beller
2015-10-29 22:34       ` Junio C Hamano
2015-10-28 23:21     ` [PATCHv2 7/8] submodule update: expose parallelism to the user Stefan Beller
2015-10-28 23:21     ` [PATCHv2 8/8] clone: allow an explicit argument for parallel submodule clones Stefan Beller
2015-11-01  8:58       ` Eric Sunshine
2015-10-29 13:19     ` [PATCHv2 0/8] Expose the submodule parallelism to the user Ramsay Jones
2015-10-29 15:51       ` Stefan Beller
2015-10-29 17:23         ` Junio C Hamano
2015-10-29 17:30           ` Stefan Beller
2015-10-29 23:50         ` Ramsay Jones
2015-11-03 19:41           ` Stefan Beller
2015-10-29 20:12     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).