git@vger.kernel.org mailing list mirror (one of many)
* Re: [PATCH] Add fetch.recurseSubmoduleParallelism config option
  @ 2015-10-12 23:50  5%     ` Junio C Hamano
  2015-10-16 17:04  2%       ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2015-10-12 23:50 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org, Heiko Voigt, Jens Lehmann

Stefan Beller <sbeller@google.com> writes:

> There is core.preloadIndex to enable parallel index preload, but
> that is a boolean and does not give the user fine control. I'd assume
> we want to give the user fine control here.

I'd approach this as "fetching multiple submodules at a time", if I
were deciding its name.

> ... We could also make it a
> submodule specific thing (submodule.jobs), but that would collide
> with submodule.<name>.<foo> maybe?

I do not think so.  You can have

	area.attr1
	area.attr3
        area."userthing1".attr1
        area."userthing2".attr1

and the parser can differentiate them just fine.
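To illustrate why the two forms do not clash: config callbacks see flattened keys such as `area.attr1` or `area.userthing1.attr1` (the quotes exist only in the file syntax). Below is a minimal sketch, assuming the usual rule that the section ends at the first dot and the variable name starts after the last dot, so whatever lies in between (dots included) is the subsection. `split_config_key()` is a hypothetical helper written for illustration; it is not git's API, although git's own `parse_config_key()` in config.c plays a similar role.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not git's API): split a flattened config key
 * such as "area.attr1" or "submodule.lib.foo.url" into section,
 * optional subsection and variable name.
 * Returns 0 on success, -1 if the key contains no dot at all.
 */
static int split_config_key(const char *key,
			    char *section, size_t section_sz,
			    char *subsection, size_t subsection_sz,
			    char *name, size_t name_sz)
{
	const char *first = strchr(key, '.');   /* end of the section */
	const char *last = strrchr(key, '.');   /* start of the name */

	if (!first)
		return -1;

	snprintf(section, section_sz, "%.*s", (int)(first - key), key);
	if (first == last)
		subsection[0] = '\0';           /* two-level key: area.attr */
	else                                    /* three-level: area.thing.attr */
		snprintf(subsection, subsection_sz, "%.*s",
			 (int)(last - first - 1), first + 1);
	snprintf(name, name_sz, "%s", last + 1);
	return 0;
}
```

With this rule `area.attr1` yields an empty subsection, `area.userthing1.attr1` yields `userthing1`, and `submodule.lib.foo.url` yields `lib.foo`, so `submodule.fetchParallel` and `submodule.<name>.url` are never confused.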

So if you want

    [submodule]
        fetchParallel = 16
	updateParallel = 4

I do not think that would interfere with any

    [submodule "name"]
    	var = val

You could even allow an attribute that is fundamentally per
"userthing" (e.g. the branch, the remote, the submodule) to be defined
with area."userthing".attr, and make area.attr the fallback value
for an unspecified area."userthing9".attr (I think the http.*.*
hierarchy takes that approach). However, I do not think the parallelism
of fetching is something that should be specified per submodule.
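The fallback scheme described above (area."userthing".attr first, then area.attr) can be sketched as follows. The in-memory `config_entry` table and both lookup functions are hypothetical stand-ins for parsed configuration, not git's config API; note that the message argues against using this pattern for fetch parallelism itself.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, in-memory stand-in for parsed config. */
struct config_entry {
	const char *key;
	const char *value;
};

/* Return the value of key, or NULL; the last matching entry wins. */
static const char *config_get(const struct config_entry *cfg, size_t n,
			      const char *key)
{
	const char *found = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (!strcmp(cfg[i].key, key))
			found = cfg[i].value;
	return found;
}

/*
 * Look up "<area>.<thing>.<attr>" first and fall back to
 * "<area>.<attr>" when the per-thing value is not set.
 */
static const char *config_get_with_fallback(const struct config_entry *cfg,
					     size_t n, const char *area,
					     const char *thing, const char *attr)
{
	char key[256];
	const char *v;

	snprintf(key, sizeof(key), "%s.%s.%s", area, thing, attr);
	v = config_get(cfg, n, key);
	if (v)
		return v;
	snprintf(key, sizeof(key), "%s.%s", area, attr);
	return config_get(cfg, n, key);
}
```

With entries `area.attr1 = fallback` and `area.userthing1.attr1 = specific`, asking for `userthing1` returns "specific", while asking for `userthing9` falls back to "fallback".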

>> The parallel_process API could learn a new "verbose" feature that it
>> by itself shows some messages like
>>
>>     "processing the 'frotz' job with N tasks"
>>     "M tasks finished (N still running)"
>
> I know what to fill in for M and N, but 'frotz' is a bit unclear to me.

The caller would pass the label to pp_init(); in this codepath
perhaps it will say 'submodule fetch' or something.
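A sketch of the proposed "verbose" messages, assuming the caller supplies the label (for example 'submodule fetch') at initialization and that N in the first message is the number of tasks allowed to run at once. `struct pp_progress` and its functions are hypothetical; they are not part of the parallel_process API.

```c
#include <stdio.h>

/* Hypothetical progress state for a labelled batch of tasks. */
struct pp_progress {
	const char *label;   /* e.g. "submodule fetch", chosen by the caller */
	int max_tasks;       /* N: tasks allowed to run at once */
	int finished;        /* M: tasks that have completed */
	int running;         /* tasks currently in flight */
};

static void pp_progress_init(struct pp_progress *p, const char *label,
			     int max_tasks)
{
	p->label = label;
	p->max_tasks = max_tasks;
	p->finished = 0;
	p->running = 0;
}

static void pp_progress_task_started(struct pp_progress *p)
{
	p->running++;
}

static void pp_progress_task_done(struct pp_progress *p)
{
	p->running--;
	p->finished++;
}

/* Format "processing the '<label>' job with N tasks" into buf. */
static int pp_progress_start_msg(const struct pp_progress *p,
				 char *buf, size_t len)
{
	return snprintf(buf, len, "processing the '%s' job with %d tasks",
			p->label, p->max_tasks);
}

/* Format "M tasks finished (N still running)" into buf. */
static int pp_progress_status_msg(const struct pp_progress *p,
				  char *buf, size_t len)
{
	return snprintf(buf, len, "%d tasks finished (%d still running)",
			p->finished, p->running);
}
```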


* [PATCHv2] submodule-config: Shorten logic in parse_config
  @ 2015-10-13  0:02 24% ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-13  0:02 UTC (permalink / raw)
  To: gitster; +Cc: git, jens.lehmann, Stefan Beller, Eric Sunshine, Heiko Voigt

This makes the parsing more concise by removing the forward goto to
release_return and by giving the parsing of the {ignore, url, path}
options a common structure (each now also uses xstrdup() instead of a
temporary strbuf). Unifying the structure introduces subtle changes in
the error cases: for 'ignore', a variable given without a value is now
reported before a duplicate setting is noticed.
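The if/else-if chain that replaces the gotos can be sketched in isolation. `set_string_option()` is a hypothetical helper mirroring the shape of the 'url' branch after the patch: a missing value is reported first, then a duplicate, and only otherwise is the value stored. It uses plain malloc() where git itself would use xstrdup().

```c
#include <stdlib.h>
#include <string.h>

/* Copy a NUL-terminated string; returns NULL on allocation failure. */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/*
 * Hypothetical helper. Returns -1 for a missing value (cf.
 * config_error_nonbool()), 1 for an ignored duplicate (cf.
 * warn_multiple_config()), -2 if out of memory, 0 if stored.
 */
static int set_string_option(char **slot, const char *value, int overwrite)
{
	if (!value)
		return -1;          /* key present without "= value" */
	else if (!overwrite && *slot)
		return 1;           /* already set: keep the old value */
	else {
		char *copy = dup_string(value);
		if (!copy)
			return -2;
		free(*slot);        /* replace any previous value */
		*slot = copy;
		return 0;
	}
}
```

Because the missing-value test comes first, a duplicate key without a value is reported as a missing value, which is the behaviour change described above.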

CC: Eric Sunshine <sunshine@sunshineco.com>
CC: Heiko Voigt <hvoigt@hvoigt.net>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule-config.c | 69 ++++++++++++++++++++----------------------------------
 1 file changed, 26 insertions(+), 43 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 393de53..96f1a0b 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -261,74 +261,57 @@ static int parse_config(const char *var, const char *value, void *data)
 			name.buf);
 
 	if (!strcmp(item.buf, "path")) {
-		struct strbuf path = STRBUF_INIT;
-		if (!value) {
+		if (!value)
 			ret = config_error_nonbool(var);
-			goto release_return;
-		}
-		if (!me->overwrite && submodule->path != NULL) {
+		else if (!me->overwrite && submodule->path != NULL)
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"path");
-			goto release_return;
+		else {
+			if (submodule->path)
+				cache_remove_path(me->cache, submodule);
+			free((void *) submodule->path);
+			submodule->path = xstrdup(value);
+			cache_put_path(me->cache, submodule);
 		}
-
-		if (submodule->path)
-			cache_remove_path(me->cache, submodule);
-		free((void *) submodule->path);
-		strbuf_addstr(&path, value);
-		submodule->path = strbuf_detach(&path, NULL);
-		cache_put_path(me->cache, submodule);
 	} else if (!strcmp(item.buf, "fetchrecursesubmodules")) {
 		/* when parsing worktree configurations we can die early */
 		int die_on_error = is_null_sha1(me->gitmodules_sha1);
 		if (!me->overwrite &&
-		    submodule->fetch_recurse != RECURSE_SUBMODULES_NONE) {
+		    submodule->fetch_recurse != RECURSE_SUBMODULES_NONE)
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"fetchrecursesubmodules");
-			goto release_return;
-		}
-
-		submodule->fetch_recurse = parse_fetch_recurse(var, value,
+		else
+			submodule->fetch_recurse = parse_fetch_recurse(
+								var, value,
 								die_on_error);
 	} else if (!strcmp(item.buf, "ignore")) {
-		struct strbuf ignore = STRBUF_INIT;
-		if (!me->overwrite && submodule->ignore != NULL) {
+		if (!value)
+			ret = config_error_nonbool(var);
+		else if (!me->overwrite && submodule->ignore != NULL)
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"ignore");
-			goto release_return;
-		}
-		if (!value) {
-			ret = config_error_nonbool(var);
-			goto release_return;
-		}
-		if (strcmp(value, "untracked") && strcmp(value, "dirty") &&
-		    strcmp(value, "all") && strcmp(value, "none")) {
+		else if (strcmp(value, "untracked") &&
+			 strcmp(value, "dirty") &&
+			 strcmp(value, "all") &&
+			 strcmp(value, "none"))
 			warning("Invalid parameter '%s' for config option "
 					"'submodule.%s.ignore'", value, var);
-			goto release_return;
+		else {
+			free((void *) submodule->ignore);
+			submodule->ignore = xstrdup(value);
 		}
-
-		free((void *) submodule->ignore);
-		strbuf_addstr(&ignore, value);
-		submodule->ignore = strbuf_detach(&ignore, NULL);
 	} else if (!strcmp(item.buf, "url")) {
-		struct strbuf url = STRBUF_INIT;
 		if (!value) {
 			ret = config_error_nonbool(var);
-			goto release_return;
-		}
-		if (!me->overwrite && submodule->url != NULL) {
+		} else if (!me->overwrite && submodule->url != NULL) {
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"url");
-			goto release_return;
+		} else {
+			free((void *) submodule->url);
+			submodule->url = xstrdup(value);
 		}
-
-		free((void *) submodule->url);
-		strbuf_addstr(&url, value);
-		submodule->url = strbuf_detach(&url, NULL);
 	}
 
-release_return:
 	strbuf_release(&name);
 	strbuf_release(&item);
 
-- 
2.5.0.267.g8d6e698.dirty


* [RFC PATCHv1 00/12] git submodule update in C with parallel cloning
@ 2015-10-16  1:52 10% Stefan Beller
  2015-10-16  1:52 20% ` [PATCH 01/12] git submodule update: Announce skipping submodules on stderr Stefan Beller
                   ` (11 more replies)
  0 siblings, 12 replies; 200+ results
From: Stefan Beller @ 2015-10-16  1:52 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Eventually we want to handle projects with lots of submodules well, such as
Android (which would have O(1000) submodules).

The very first thing a user does is clone a project, so we want to impress
with speed there as well. However, git clone is lazy and just calls
`git submodule update --init --recursive`, so we need to make that fast.

This series rewrites parts of git submodule update in C. The second-to-last
patch separates cloning from the other actions (checkout/rebase/merge etc.)
by doing all the cloning first and then the rest.

The last patch (which is broken in this first version of the series) then
moves the cloning of the submodules into the get_next_task callback
of the parallel process API.
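The callback-driven shape of such an API can be sketched as a sequential, hypothetical stand-in: `run_tasks()` and the callback typedefs below are invented for illustration. The driver owns the loop and keeps asking a get_next_task-style callback for work until it returns 0, while a finish callback sees each result and may abort. The real `run_processes_parallel()` called in the last patch would keep several child processes running at once instead of running tasks one by one.

```c
/* Hypothetical callback types, modelled loosely on the series. */
typedef int (*get_next_task_fn)(void *cb, int *task);
typedef int (*run_task_fn)(int task);
typedef int (*task_finished_fn)(void *cb, int task, int result);

/* Run tasks until get_next_task says "done"; -1 if a callback aborts. */
static int run_tasks(get_next_task_fn next, run_task_fn run,
		     task_finished_fn finished, void *cb)
{
	int task;

	while (next(cb, &task)) {
		int result = run(task);
		if (finished(cb, task, result))
			return -1;
	}
	return 0;
}

/* Example producer: hands out tasks 0..limit-1. */
struct counter {
	int next, limit, succeeded;
};

static int count_next(void *cb, int *task)
{
	struct counter *c = cb;

	if (c->next >= c->limit)
		return 0;
	*task = c->next++;
	return 1;
}

/* Pretend work: even-numbered tasks succeed (0), odd ones fail (1). */
static int even_tasks_succeed(int task)
{
	return task % 2;
}

/* Count successes and keep going after failures. */
static int count_finished(void *cb, int task, int result)
{
	struct counter *c = cb;

	(void)task;
	if (!result)
		c->succeeded++;
	return 0;
}

/* Alternative policy: stop at the first failure. */
static int stop_on_failure(void *cb, int task, int result)
{
	(void)cb;
	(void)task;
	return result != 0;
}
```

Passing `count_finished` runs all five tasks and counts three successes; passing `stop_on_failure` stops after task 1.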

That said, the first few patches introduce some churn in Git's behavior and
tests, so maybe focus your review there?

Thanks for any advice,
Stefan

Stefan Beller (12):
  git submodule update: Announce skipping submodules on stderr
  git submodule update: Announce uninitialized modules on stderr
  git submodule update: Move branch calculation to where it's needed
  git submodule update: Announce outcome of submodule operation to
    stderr
  git submodule update: Use its own list implementation.
  git submodule update: Handle unmerged submodules in C
  submodule config: keep update strategy around
  git submodule update: check for "none" in C
  git submodule update: Check url in C
  git submodule update: Clone projects from within C
  submodule--helper: Do not emit submodules to process directly.
  WIP/broken Clone all outstanding submodules in parallel

 builtin/submodule--helper.c | 221 ++++++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh            |  38 +++-----
 submodule-config.c          |  11 +++
 submodule-config.h          |   1 +
 t/t7400-submodule-basic.sh  |  12 +--
 t/t7406-submodule-update.sh |  12 +--
 6 files changed, 256 insertions(+), 39 deletions(-)

-- 
2.5.0.277.gfdc362b.dirty


* [PATCH 02/12] git submodule update: Announce uninitialized modules on stderr
  2015-10-16  1:52 10% [RFC PATCHv1 00/12] git submodule update in C with parallel cloning Stefan Beller
  2015-10-16  1:52 20% ` [PATCH 01/12] git submodule update: Announce skipping submodules on stderr Stefan Beller
@ 2015-10-16  1:52 26% ` Stefan Beller
  2015-10-16 20:54  4%   ` Junio C Hamano
  2015-10-16  1:52 17% ` [PATCH 03/12] git submodule update: Move branch calculation to where it's needed Stefan Beller
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-16  1:52 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 git-submodule.sh           |  2 +-
 t/t7400-submodule-basic.sh | 12 ++++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 578ec48..eea27f8 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -693,7 +693,7 @@ cmd_update()
 			# Only mention uninitialized submodules when its
 			# path have been specified
 			test "$#" != "0" &&
-			say "$(eval_gettext "Submodule path '\$displaypath' not initialized
+			say >&2 "$(eval_gettext "Submodule path '\$displaypath' not initialized
 Maybe you want to use 'update --init'?")"
 			continue
 		fi
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index 540771c..32a89b8 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -462,9 +462,9 @@ test_expect_success 'update --init' '
 	git config --remove-section submodule.example &&
 	test_must_fail git config submodule.example.url &&
 
-	git submodule update init > update.out &&
-	cat update.out &&
-	test_i18ngrep "not initialized" update.out &&
+	git submodule update init 2> update.err &&
+	cat update.err &&
+	test_i18ngrep "not initialized" update.err &&
 	test_must_fail git rev-parse --resolve-git-dir init/.git &&
 
 	git submodule update --init init &&
@@ -480,9 +480,9 @@ test_expect_success 'update --init from subdirectory' '
 	mkdir -p sub &&
 	(
 		cd sub &&
-		git submodule update ../init >update.out &&
-		cat update.out &&
-		test_i18ngrep "not initialized" update.out &&
+		git submodule update ../init 2>update.err &&
+		cat update.err &&
+		test_i18ngrep "not initialized" update.err &&
 		test_must_fail git rev-parse --resolve-git-dir ../init/.git &&
 
 		git submodule update --init ../init
-- 
2.5.0.277.gfdc362b.dirty


* [PATCH 01/12] git submodule update: Announce skipping submodules on stderr
  2015-10-16  1:52 10% [RFC PATCHv1 00/12] git submodule update in C with parallel cloning Stefan Beller
@ 2015-10-16  1:52 20% ` Stefan Beller
  2015-10-16 20:37  5%   ` Junio C Hamano
  2015-10-16  1:52 26% ` [PATCH 02/12] git submodule update: Announce uninitialized modules " Stefan Beller
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-16  1:52 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 git-submodule.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 8b0eb9a..578ec48 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -684,7 +684,7 @@ cmd_update()
 
 		if test "$update_module" = "none"
 		then
-			echo "Skipping submodule '$displaypath'"
+			echo >&2 "Skipping submodule '$displaypath'"
 			continue
 		fi
 
-- 
2.5.0.277.gfdc362b.dirty


* [PATCH 03/12] git submodule update: Move branch calculation to where it's needed
  2015-10-16  1:52 10% [RFC PATCHv1 00/12] git submodule update in C with parallel cloning Stefan Beller
  2015-10-16  1:52 20% ` [PATCH 01/12] git submodule update: Announce skipping submodules on stderr Stefan Beller
  2015-10-16  1:52 26% ` [PATCH 02/12] git submodule update: Announce uninitialized modules " Stefan Beller
@ 2015-10-16  1:52 17% ` Stefan Beller
  2015-10-16 20:54  4%   ` Junio C Hamano
  2015-10-16  1:52 32% ` [PATCH 04/12] git submodule update: Announce outcome of submodule operation to stderr Stefan Beller
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-16  1:52 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

The branch variable is used only once, so calculate it only where it is needed.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 git-submodule.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index eea27f8..56a0524 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -668,7 +668,6 @@ cmd_update()
 		fi
 		name=$(git submodule--helper name "$sm_path") || exit
 		url=$(git config submodule."$name".url)
-		branch=$(get_submodule_config "$name" branch master)
 		if ! test -z "$update"
 		then
 			update_module=$update
@@ -718,6 +717,7 @@ Maybe you want to use 'update --init'?")"
 				die "$(eval_gettext "Unable to fetch in submodule path '\$sm_path'")"
 			fi
 			remote_name=$(clear_local_git_env; cd "$sm_path" && get_default_remote)
+			branch=$(get_submodule_config "$name" branch master)
 			sha1=$(clear_local_git_env; cd "$sm_path" &&
 				git rev-parse --verify "${remote_name}/${branch}") ||
 			die "$(eval_gettext "Unable to find current ${remote_name}/${branch} revision in submodule path '\$sm_path'")"
-- 
2.5.0.277.gfdc362b.dirty


* [PATCH 09/12] git submodule update: Check url in C
  2015-10-16  1:52 10% [RFC PATCHv1 00/12] git submodule update in C with parallel cloning Stefan Beller
                   ` (7 preceding siblings ...)
  2015-10-16  1:52 23% ` [PATCH 08/12] git submodule update: check for "none" in C Stefan Beller
@ 2015-10-16  1:52 23% ` Stefan Beller
  2015-10-16  1:52 20% ` [PATCH 10/12] git submodule update: Clone projects from within C Stefan Beller
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-16  1:52 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/submodule--helper.c | 21 +++++++++++++++++++++
 git-submodule.sh            | 10 ----------
 2 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 73954ac..7a2fd4e 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -300,6 +300,7 @@ static int module_list_or_clone(int argc, const char **argv, const char *prefix)
 		const struct cache_entry *ce = list.entries[i];
 		struct strbuf sb = STRBUF_INIT;
 		const char *update_module = NULL;
+		const char *url = NULL;
 
 		char *env_prefix = getenv("prefix");
 		if (ce_stage(ce)) {
@@ -329,6 +330,26 @@ static int module_list_or_clone(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 
+		/*
+		 * Looking up the url in .git/config.
+		 * We cannot fall back to .gitmodules as we only want to process
+		 * configured submodules. This renders the submodule lookup API
+		 * useless, as it cannot lookup without fallback.
+		 */
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "submodule.%s.url", sub->name);
+		git_config_get_string_const(sb.buf, &url);
+		if (!url) {
+			/*
+			 * Only mention uninitialized submodules when its
+			 * path have been specified
+			 */
+			if (pathspec.nr)
+				fprintf(stderr, _("Submodule path '%s' not initialized\n"
+					"Maybe you want to use 'update --init'?"), displaypath);
+			continue;
+		}
+
 		printf("%06o %s %d\t", ce->ce_mode, sha1_to_hex(ce->sha1), ce_stage(ce));
 		utf8_fprintf(stdout, "%s\n", ce->name);
 	}
diff --git a/git-submodule.sh b/git-submodule.sh
index 227fed6..80f41b2 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -677,16 +677,6 @@ cmd_update()
 
 		displaypath=$(relative_path "$prefix$sm_path")
 
-		if test -z "$url"
-		then
-			# Only mention uninitialized submodules when its
-			# path have been specified
-			test "$#" != "0" &&
-			say >&2 "$(eval_gettext "Submodule path '\$displaypath' not initialized
-Maybe you want to use 'update --init'?")"
-			continue
-		fi
-
 		if ! test -d "$sm_path"/.git && ! test -f "$sm_path"/.git
 		then
 			git submodule--helper clone ${GIT_QUIET:+--quiet} --prefix "$prefix" --path "$sm_path" --name "$name" --url "$url" "$reference" "$depth" || exit
-- 
2.5.0.277.gfdc362b.dirty


* [PATCH 11/12] submodule--helper: Do not emit submodules to process directly.
  2015-10-16  1:52 10% [RFC PATCHv1 00/12] git submodule update in C with parallel cloning Stefan Beller
                   ` (9 preceding siblings ...)
  2015-10-16  1:52 20% ` [PATCH 10/12] git submodule update: Clone projects from within C Stefan Beller
@ 2015-10-16  1:52 13% ` Stefan Beller
  2015-10-16  1:52 18% ` [PATCH 12/12] WIP/broken Clone all outstanding submodules in parallel Stefan Beller
  11 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-16  1:52 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Collect the output lines in a list and print them only after the loop
has finished. This will allow us to refactor the loop to use the
parallel process API.
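Deferring the output can be sketched without git's string_list. The `line_buffer` type below is a hypothetical stand-in for a `STRING_LIST_INIT_DUP` list: lines are copied when appended and written out in one go at the end, so once the clones run as parallel tasks the lines the shell script reads are emitted only after all the work is done, in the order they were queued.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable list of owned strings. */
struct line_buffer {
	char **items;
	size_t nr, alloc;
};

/* Append a private copy of line; returns -1 on allocation failure. */
static int line_buffer_append(struct line_buffer *lb, const char *line)
{
	size_t len = strlen(line) + 1;
	char *copy;

	if (lb->nr == lb->alloc) {
		size_t alloc = lb->alloc ? 2 * lb->alloc : 8;
		char **items = realloc(lb->items, alloc * sizeof(*items));
		if (!items)
			return -1;
		lb->items = items;
		lb->alloc = alloc;
	}
	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, line, len);
	lb->items[lb->nr++] = copy;
	return 0;
}

/* Write all queued lines in order, then free and reset the buffer. */
static void line_buffer_flush(struct line_buffer *lb, FILE *out)
{
	size_t i;

	for (i = 0; i < lb->nr; i++) {
		fputs(lb->items[i], out);
		free(lb->items[i]);
	}
	free(lb->items);
	lb->items = NULL;
	lb->nr = lb->alloc = 0;
}
```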

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/submodule--helper.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index d1684cf..fa8c008 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -297,6 +297,8 @@ static int module_list_or_clone(int argc, const char **argv, const char *prefix)
 	char *update = NULL;
 	struct pathspec pathspec;
 	struct module_list list = MODULE_LIST_INIT;
+	struct string_list projectlines = STRING_LIST_INIT_DUP;
+	struct string_list_item *item;
 
 	struct option module_list_options[] = {
 		OPT_STRING(0, "prefix", &prefix,
@@ -403,9 +405,15 @@ static int module_list_or_clone(int argc, const char **argv, const char *prefix)
 				return 1;
 			}
 		}
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%06o %s %d %d\t%s\n", ce->ce_mode,
+				sha1_to_hex(ce->sha1), ce_stage(ce),
+				just_cloned, ce->name);
+		string_list_append(&projectlines, sb.buf);
+	}
 
-		printf("%06o %s %d %d\t", ce->ce_mode, sha1_to_hex(ce->sha1), ce_stage(ce), just_cloned);
-		utf8_fprintf(stdout, "%s\n", ce->name);
+	for_each_string_list_item(item, &projectlines) {
+		utf8_fprintf(stdout, "%s", item->string);
 	}
 	return 0;
 }
-- 
2.5.0.277.gfdc362b.dirty


* [PATCH 10/12] git submodule update: Clone projects from within C
  2015-10-16  1:52 10% [RFC PATCHv1 00/12] git submodule update in C with parallel cloning Stefan Beller
                   ` (8 preceding siblings ...)
  2015-10-16  1:52 23% ` [PATCH 09/12] git submodule update: Check url " Stefan Beller
@ 2015-10-16  1:52 20% ` Stefan Beller
  2015-10-16  1:52 13% ` [PATCH 11/12] submodule--helper: Do not emit submodules to process directly Stefan Beller
  2015-10-16  1:52 18% ` [PATCH 12/12] WIP/broken Clone all outstanding submodules in parallel Stefan Beller
  11 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-16  1:52 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/submodule--helper.c | 56 ++++++++++++++++++++++++++++++++++++++++++++-
 git-submodule.sh            | 12 ++++++----
 2 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 7a2fd4e..d1684cf 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -260,9 +260,40 @@ static int git_submodule_config(const char *var, const char *value, void *cb)
 	return parse_submodule_config_option(var, value);
 }
 
+static void fill_clone_command(struct child_process *cp, int quiet,
+			       const char *prefix, const char *path,
+			       const char *name, const char *url,
+			       const char *reference, const char *depth)
+{
+	cp->git_cmd = 1;
+	argv_array_push(&cp->args, "submodule--helper");
+	argv_array_push(&cp->args, "clone");
+	if (quiet)
+		argv_array_push(&cp->args, "--quiet");
+
+	if (prefix) {
+		argv_array_push(&cp->args, "--prefix");
+		argv_array_push(&cp->args, prefix);
+	}
+	argv_array_push(&cp->args, "--path");
+	argv_array_push(&cp->args, path);
+
+	argv_array_push(&cp->args, "--name");
+	argv_array_push(&cp->args, name);
+
+	argv_array_push(&cp->args, "--url");
+	argv_array_push(&cp->args, url);
+	if (reference)
+		argv_array_push(&cp->args, reference);
+	if (depth)
+		argv_array_push(&cp->args, depth);
+}
+
 static int module_list_or_clone(int argc, const char **argv, const char *prefix)
 {
 	int i;
+	int quiet;
+	char *reference = NULL, *depth = NULL;
 	char *update = NULL;
 	struct pathspec pathspec;
 	struct module_list list = MODULE_LIST_INIT;
@@ -274,6 +305,13 @@ static int module_list_or_clone(int argc, const char **argv, const char *prefix)
 		OPT_STRING(0, "update", &update,
 			   N_("string"),
 			   N_("update command for submodules")),
+		OPT_STRING(0, "reference", &reference, "<repository>",
+			N_("Use the local reference repository "
+			   "instead of a full clone")),
+		OPT_STRING(0, "depth", &depth, "<depth>",
+			N_("Create a shallow clone truncated to the "
+			   "specified number of revisions")),
+		OPT__QUIET(&quiet, N_("do't print cloning progress")),
 		OPT_END()
 	};
 
@@ -301,6 +339,7 @@ static int module_list_or_clone(int argc, const char **argv, const char *prefix)
 		struct strbuf sb = STRBUF_INIT;
 		const char *update_module = NULL;
 		const char *url = NULL;
+		int just_cloned = 0;
 
 		char *env_prefix = getenv("prefix");
 		if (ce_stage(ce)) {
@@ -350,7 +389,22 @@ static int module_list_or_clone(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 
-		printf("%06o %s %d\t", ce->ce_mode, sha1_to_hex(ce->sha1), ce_stage(ce));
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%s/.git", ce->name);
+		just_cloned = !file_exists(sb.buf);
+
+		if (just_cloned) {
+			struct child_process cp = CHILD_PROCESS_INIT;
+			fill_clone_command(&cp, quiet, prefix, ce->name,
+					   sub->name, url, reference, depth);
+
+			if (run_command(&cp)) {
+				printf("#unmatched\n");
+				return 1;
+			}
+		}
+
+		printf("%06o %s %d %d\t", ce->ce_mode, sha1_to_hex(ce->sha1), ce_stage(ce), just_cloned);
 		utf8_fprintf(stdout, "%s\n", ce->name);
 	}
 	return 0;
diff --git a/git-submodule.sh b/git-submodule.sh
index 80f41b2..28f1757 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -656,9 +656,14 @@ cmd_update()
 	fi
 
 	cloned_modules=
-	git submodule--helper list-or-clone --prefix "$wt_prefix" ${update:+--update "$update"} "$@" | {
+	git submodule--helper list-or-clone ${GIT_QUIET:+--quiet} \
+		--prefix "$wt_prefix" \
+		${update:+--update "$update"} \
+		${reference:+--reference "$reference"} \
+		${depth:+--depth "$depth"} \
+		"$@" | {
 	err=
-	while read mode sha1 stage sm_path
+	while read mode sha1 stage just_cloned sm_path
 	do
 		die_if_unmatched "$mode"
 
@@ -677,9 +682,8 @@ cmd_update()
 
 		displaypath=$(relative_path "$prefix$sm_path")
 
-		if ! test -d "$sm_path"/.git && ! test -f "$sm_path"/.git
+		if test "$just_cloned" = 1
 		then
-			git submodule--helper clone ${GIT_QUIET:+--quiet} --prefix "$prefix" --path "$sm_path" --name "$name" --url "$url" "$reference" "$depth" || exit
 			cloned_modules="$cloned_modules;$name"
 			subsha1=
 		else
-- 
2.5.0.277.gfdc362b.dirty


* [PATCH 12/12] WIP/broken Clone all outstanding submodules in parallel
  2015-10-16  1:52 10% [RFC PATCHv1 00/12] git submodule update in C with parallel cloning Stefan Beller
                   ` (10 preceding siblings ...)
  2015-10-16  1:52 13% ` [PATCH 11/12] submodule--helper: Do not emit submodules to process directly Stefan Beller
@ 2015-10-16  1:52 18% ` Stefan Beller
  11 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-16  1:52 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Our tests scream at this patch; it is only meant to show what I plan to do:
essentially, moving the body of the loop into the get_next_task callback
of run_processes_parallel.
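Turning a `for` loop into a resumable callback means keeping the loop index in the callback data and advancing it before any early return. As posted, the `return 1` inside `for (; pp->count < pp->list.nr; pp->count++)` appears to bypass both `pp->count++` and the `string_list_append()` call, so the next call would re-examine the same entry; that may be part of why the tests fail. The sketch below is hypothetical: `struct clone_iter` replaces the real cache entries with a plain "does <path>/.git exist" flag per submodule.

```c
/* Hypothetical iterator state, in the spirit of struct submodule_list_or_clone. */
struct clone_iter {
	const int *has_git_dir;   /* per submodule: does <path>/.git exist? */
	int nr;                   /* number of submodules */
	int count;                /* cursor, like pp->count in the patch */
	int *queued;              /* caller-provided, room for nr indices */
	int nr_queued;            /* entries whose output line was queued */
};

/* Returns 1 and sets *task when a clone must be started, 0 when done. */
static int next_clone_task(struct clone_iter *it, int *task)
{
	while (it->count < it->nr) {
		int i = it->count++;   /* advance before any early return */
		int just_cloned = !it->has_git_dir[i];

		/* queue the output line for every entry, cloned or not */
		it->queued[it->nr_queued++] = i;
		if (just_cloned) {
			*task = i;
			return 1;
		}
	}
	return 0;
}
```

With flags `{1, 0, 1, 0}` the callback hands out tasks 1 and 3, then reports that it is done, and all four entries have been queued for output exactly once.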

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/submodule--helper.c | 181 +++++++++++++++++++++++++++++---------------
 1 file changed, 119 insertions(+), 62 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index fa8c008..c66aa53 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -289,65 +289,40 @@ static void fill_clone_command(struct child_process *cp, int quiet,
 		argv_array_push(&cp->args, depth);
 }
 
-static int module_list_or_clone(int argc, const char **argv, const char *prefix)
-{
-	int i;
-	int quiet;
-	char *reference = NULL, *depth = NULL;
-	char *update = NULL;
+struct submodule_list_or_clone {
 	struct pathspec pathspec;
-	struct module_list list = MODULE_LIST_INIT;
-	struct string_list projectlines = STRING_LIST_INIT_DUP;
-	struct string_list_item *item;
-
-	struct option module_list_options[] = {
-		OPT_STRING(0, "prefix", &prefix,
-			   N_("path"),
-			   N_("alternative anchor for relative paths")),
-		OPT_STRING(0, "update", &update,
-			   N_("string"),
-			   N_("update command for submodules")),
-		OPT_STRING(0, "reference", &reference, "<repository>",
-			N_("Use the local reference repository "
-			   "instead of a full clone")),
-		OPT_STRING(0, "depth", &depth, "<depth>",
-			N_("Create a shallow clone truncated to the "
-			   "specified number of revisions")),
-		OPT__QUIET(&quiet, N_("do't print cloning progress")),
-		OPT_END()
-	};
-
-	const char *const git_submodule_helper_usage[] = {
-		N_("git submodule--helper list [--prefix=<path>] [<path>...]"),
-		NULL
-	};
-
-	argc = parse_options(argc, argv, prefix, module_list_options,
-			     git_submodule_helper_usage, 0);
-
-	if (module_list_compute(argc, argv, prefix, &pathspec, &list) < 0) {
-		printf("#unmatched\n");
-		return 1;
-	}
+	struct module_list list;
+	struct string_list projectlines;
+	int count;
+	int quiet;
+	char *reference;
+	char *depth;
+	char *update;
+	char *env_prefix;
+	const char *prefix;
+	int print_unmatched;
+};
 
-	gitmodules_config();
-	/* Overlay the parsed .gitmodules file with .git/config */
-	git_config(git_submodule_config, NULL);
+static int get_next_task(void **pp_task_cb,
+			 struct child_process *cp,
+			 struct strbuf *err,
+			 void *pp_cb)
+{
+	struct submodule_list_or_clone *pp = pp_cb;
 
-	for (i = 0; i < list.nr; i++) {
+	for (; pp->count < pp->list.nr; pp->count++) {
 		const struct submodule *sub = NULL;
 		const char *displaypath = NULL;
-		const struct cache_entry *ce = list.entries[i];
+		const struct cache_entry *ce = pp->list.entries[pp->count];
 		struct strbuf sb = STRBUF_INIT;
 		const char *update_module = NULL;
 		const char *url = NULL;
 		int just_cloned = 0;
 
-		char *env_prefix = getenv("prefix");
 		if (ce_stage(ce)) {
-			if (env_prefix)
+			if (pp->env_prefix)
 				fprintf(stderr, "Skipping unmerged submodule %s/%s\n",
-					env_prefix, ce->name);
+					pp->env_prefix, ce->name);
 			else
 				fprintf(stderr, "Skipping unmerged submodule %s\n",
 					ce->name);
@@ -355,13 +330,13 @@ static int module_list_or_clone(int argc, const char **argv, const char *prefix)
 		}
 
 		sub = submodule_from_path(null_sha1, ce->name);
-		if (env_prefix)
-			displaypath = relative_path(env_prefix, ce->name, &sb);
+		if (pp->env_prefix)
+			displaypath = relative_path(pp->env_prefix, ce->name, &sb);
 		else
 			displaypath = ce->name;
 
-		if (update)
-			update_module = update;
+		if (pp->update)
+			update_module = pp->update;
 		if (!update_module)
 			update_module = sub->update;
 		if (!update_module)
@@ -385,7 +360,7 @@ static int module_list_or_clone(int argc, const char **argv, const char *prefix)
 			 * Only mention uninitialized submodules when its
 			 * path have been specified
 			 */
-			if (pathspec.nr)
+			if (pp->pathspec.nr)
 				fprintf(stderr, _("Submodule path '%s' not initialized\n"
 					"Maybe you want to use 'update --init'?"), displaypath);
 			continue;
@@ -396,23 +371,105 @@ static int module_list_or_clone(int argc, const char **argv, const char *prefix)
 		just_cloned = !file_exists(sb.buf);
 
 		if (just_cloned) {
-			struct child_process cp = CHILD_PROCESS_INIT;
-			fill_clone_command(&cp, quiet, prefix, ce->name,
-					   sub->name, url, reference, depth);
-
-			if (run_command(&cp)) {
-				printf("#unmatched\n");
-				return 1;
-			}
+			fill_clone_command(cp, pp->quiet, pp->prefix, ce->name,
+					   sub->name, url, pp->reference, pp->depth);
+			return 1;
 		}
 		strbuf_reset(&sb);
 		strbuf_addf(&sb, "%06o %s %d %d\t%s\n", ce->ce_mode,
 				sha1_to_hex(ce->sha1), ce_stage(ce),
 				just_cloned, ce->name);
-		string_list_append(&projectlines, sb.buf);
+		string_list_append(&pp->projectlines, sb.buf);
+	}
+	return 0;
+}
+
+static int start_failure(struct child_process *cp,
+			 struct strbuf *err,
+			 void *pp_cb,
+			 void *pp_task_cb)
+{
+	struct submodule_list_or_clone *pp = pp_cb;
+
+	pp->print_unmatched = 1;
+
+	return 1;
+}
+
+static int task_finished(int result,
+			 struct child_process *cp,
+			 struct strbuf *err,
+			 void *pp_cb,
+			 void *pp_task_cb)
+{
+	struct submodule_list_or_clone *pp = pp_cb;
+
+	if (!result)
+		return 0;
+	else {
+		pp->print_unmatched = 1;
+		return 1;
+	}
+}
+
+static int module_list_or_clone(int argc, const char **argv, const char *prefix)
+{
+	struct submodule_list_or_clone pp;
+	struct string_list_item *item;
+
+	struct option module_list_options[] = {
+		OPT_STRING(0, "prefix", &pp.prefix,
+			   N_("path"),
+			   N_("alternative anchor for relative paths")),
+		OPT_STRING(0, "update", &pp.update,
+			   N_("string"),
+			   N_("update command for submodules")),
+		OPT_STRING(0, "reference", &pp.reference, "<repository>",
+			N_("Use the local reference repository "
+			   "instead of a full clone")),
+		OPT_STRING(0, "depth", &pp.depth, "<depth>",
+			N_("Create a shallow clone truncated to the "
+			   "specified number of revisions")),
+		OPT__QUIET(&pp.quiet, N_("do't print cloning progress")),
+		OPT_END()
+	};
+
+	const char *const git_submodule_helper_usage[] = {
+		N_("git submodule--helper list [--prefix=<path>] [<path>...]"),
+		NULL
+	};
+
+	pp.prefix = NULL;
+	pp.list.entries = NULL;
+	pp.list.alloc = 0;
+	pp.list.nr = 0;
+	string_list_init(&pp.projectlines, 1);
+	pp.count = 0;
+	pp.reference = NULL;
+	pp.depth = NULL;
+	pp.update = NULL;
+	pp.env_prefix = getenv("prefix");
+	pp.print_unmatched = 0;
+
+	argc = parse_options(argc, argv, prefix, module_list_options,
+			     git_submodule_helper_usage, 0);
+
+	if (module_list_compute(argc, argv, pp.prefix, &pp.pathspec, &pp.list) < 0) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	gitmodules_config();
+	/* Overlay the parsed .gitmodules file with .git/config */
+	git_config(git_submodule_config, NULL);
+
+	run_processes_parallel(1, get_next_task, start_failure, task_finished, &pp);
+	if (pp.print_unmatched) {
+		printf("#unmatched\n");
+		return 1;
 	}
 
-	for_each_string_list_item(item, &projectlines) {
+	for_each_string_list_item(item, &pp.projectlines) {
 		utf8_fprintf(stdout, "%s", item->string);
 	}
 	return 0;
-- 
2.5.0.277.gfdc362b.dirty


* [PATCH 04/12] git submodule update: Announce outcome of submodule operation to stderr
  2015-10-16  1:52 10% [RFC PATCHv1 00/12] git submodule update in C with parallel cloning Stefan Beller
                   ` (2 preceding siblings ...)
  2015-10-16  1:52 17% ` [PATCH 03/12] git submodule update: Move branch calculation to where it's needed Stefan Beller
@ 2015-10-16  1:52 32% ` Stefan Beller
  2015-10-16  1:52 18% ` [PATCH 05/12] git submodule update: Use its own list implementation Stefan Beller
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-16  1:52 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 git-submodule.sh            |  2 +-
 t/t7406-submodule-update.sh | 12 ++++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 56a0524..bb8b2c7 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -780,7 +780,7 @@ Maybe you want to use 'update --init'?")"
 
 			if (clear_local_git_env; cd "$sm_path" && $command "$sha1")
 			then
-				say "$say_msg"
+				say >&2 "$say_msg"
 			elif test -n "$must_die_on_failure"
 			then
 				die_with_status 2 "$die_msg"
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index dda3929..f65b81c 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -111,8 +111,8 @@ test_expect_success 'submodule update does not fetch already present commits' '
 	(cd super &&
 	  git submodule update > ../actual 2> ../actual.err
 	) &&
-	test_i18ncmp expected actual &&
-	! test -s actual.err
+	test_i18ncmp expected actual.err &&
+	! test -s actual
 '
 
 test_expect_success 'submodule update should fail due to local changes' '
@@ -702,8 +702,8 @@ test_expect_success 'submodule update places git-dir in superprojects git-dir re
 	rm -rf super_update_r2 &&
 	git clone super_update_r super_update_r2 &&
 	(cd super_update_r2 &&
-	 git submodule update --init --recursive >actual &&
-	 test_i18ngrep "Submodule path .submodule/subsubmodule.: checked out" actual &&
+	 git submodule update --init --recursive 2>actual.err &&
+	 test_i18ngrep "Submodule path .submodule/subsubmodule.: checked out" actual.err &&
 	 (cd submodule/subsubmodule &&
 	  git log > ../../expected
 	 ) &&
@@ -770,8 +770,8 @@ test_expect_success 'submodule update --recursive drops module name before recur
 	 (cd deeper/submodule/subsubmodule &&
 	  git checkout HEAD^
 	 ) &&
-	 git submodule update --recursive deeper/submodule >actual &&
-	 test_i18ngrep "Submodule path .deeper/submodule/subsubmodule.: checked out" actual
+	 git submodule update --recursive deeper/submodule 2>actual.err &&
+	 test_i18ngrep "Submodule path .deeper/submodule/subsubmodule.: checked out" actual.err
 	)
 '
 test_done
-- 
2.5.0.277.gfdc362b.dirty


* [PATCH 08/12] git submodule update: check for "none" in C
  2015-10-16  1:52 10% [RFC PATCHv1 00/12] git submodule update in C with parallel cloning Stefan Beller
                   ` (6 preceding siblings ...)
  2015-10-16  1:52 27% ` [PATCH 07/12] submodule config: keep update strategy around Stefan Beller
@ 2015-10-16  1:52 23% ` Stefan Beller
  2015-10-16  1:52 23% ` [PATCH 09/12] git submodule update: Check url " Stefan Beller
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-16  1:52 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/submodule--helper.c | 38 ++++++++++++++++++++++++++++++++++++--
 git-submodule.sh            |  8 +-------
 2 files changed, 37 insertions(+), 9 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index f81f37a..73954ac 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -255,9 +255,15 @@ static int module_clone(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
+static int git_submodule_config(const char *var, const char *value, void *cb)
+{
+	return parse_submodule_config_option(var, value);
+}
+
 static int module_list_or_clone(int argc, const char **argv, const char *prefix)
 {
 	int i;
+	char *update = NULL;
 	struct pathspec pathspec;
 	struct module_list list = MODULE_LIST_INIT;
 
@@ -265,6 +271,9 @@ static int module_list_or_clone(int argc, const char **argv, const char *prefix)
 		OPT_STRING(0, "prefix", &prefix,
 			   N_("path"),
 			   N_("alternative anchor for relative paths")),
+		OPT_STRING(0, "update", &update,
+			   N_("string"),
+			   N_("update command for submodules")),
 		OPT_END()
 	};
 
@@ -281,20 +290,45 @@ static int module_list_or_clone(int argc, const char **argv, const char *prefix)
 		return 1;
 	}
 
+	gitmodules_config();
+	/* Overlay the parsed .gitmodules file with .git/config */
+	git_config(git_submodule_config, NULL);
+
 	for (i = 0; i < list.nr; i++) {
+		const struct submodule *sub = NULL;
+		const char *displaypath = NULL;
 		const struct cache_entry *ce = list.entries[i];
+		struct strbuf sb = STRBUF_INIT;
+		const char *update_module = NULL;
 
 		char *env_prefix = getenv("prefix");
 		if (ce_stage(ce)) {
 			if (env_prefix)
-				fprintf(stderr, "Skipping unmerged submodule %s/%s",
+				fprintf(stderr, "Skipping unmerged submodule %s/%s\n",
 					env_prefix, ce->name);
 			else
-				fprintf(stderr, "Skipping unmerged submodule %s",
+				fprintf(stderr, "Skipping unmerged submodule %s\n",
 					ce->name);
 			continue;
 		}
 
+		sub = submodule_from_path(null_sha1, ce->name);
+		if (env_prefix)
+			displaypath = relative_path(env_prefix, ce->name, &sb);
+		else
+			displaypath = ce->name;
+
+		if (update)
+			update_module = update;
+		if (!update_module && sub)
+			update_module = sub->update;
+		if (!update_module)
+			update_module = "checkout";
+		if (!strcmp(update_module, "none")) {
+			fprintf(stderr, "Skipping submodule '%s'\n", displaypath);
+			continue;
+		}
+
 		printf("%06o %s %d\t", ce->ce_mode, sha1_to_hex(ce->sha1), ce_stage(ce));
 		utf8_fprintf(stdout, "%s\n", ce->name);
 	}
diff --git a/git-submodule.sh b/git-submodule.sh
index 0754ecd..227fed6 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -656,7 +656,7 @@ cmd_update()
 	fi
 
 	cloned_modules=
-	git submodule--helper list-or-clone --prefix "$wt_prefix" "$@" | {
+	git submodule--helper list-or-clone --prefix "$wt_prefix" ${update:+--update "$update"} "$@" | {
 	err=
 	while read mode sha1 stage sm_path
 	do
@@ -677,12 +677,6 @@ cmd_update()
 
 		displaypath=$(relative_path "$prefix$sm_path")
 
-		if test "$update_module" = "none"
-		then
-			echo >&2 "Skipping submodule '$displaypath'"
-			continue
-		fi
-
 		if test -z "$url"
 		then
 			# Only mention uninitialized submodules when its
-- 
2.5.0.277.gfdc362b.dirty


* [PATCH 07/12] submodule config: keep update strategy around
  2015-10-16  1:52 10% [RFC PATCHv1 00/12] git submodule update in C with parallel cloning Stefan Beller
                   ` (5 preceding siblings ...)
  2015-10-16  1:52 24% ` [PATCH 06/12] git submodule update: Handle unmerged submodules in C Stefan Beller
@ 2015-10-16  1:52 27% ` Stefan Beller
  2015-10-16  1:52 23% ` [PATCH 08/12] git submodule update: check for "none" in C Stefan Beller
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-16  1:52 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

We need the submodule update strategies in a later patch.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule-config.c | 11 +++++++++++
 submodule-config.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/submodule-config.c b/submodule-config.c
index 393de53..175bcbb 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -194,6 +194,7 @@ static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
 
 	submodule->path = NULL;
 	submodule->url = NULL;
+	submodule->update = NULL;
 	submodule->fetch_recurse = RECURSE_SUBMODULES_NONE;
 	submodule->ignore = NULL;
 
@@ -326,6 +327,16 @@ static int parse_config(const char *var, const char *value, void *data)
 		free((void *) submodule->url);
 		strbuf_addstr(&url, value);
 		submodule->url = strbuf_detach(&url, NULL);
+	} else if (!strcmp(item.buf, "update")) {
+		if (!value)
+			ret = config_error_nonbool(var);
+		else if (!me->overwrite && submodule->update != NULL)
+			warn_multiple_config(me->commit_sha1, submodule->name,
+					     "update");
+		else {
+			free((void *)submodule->update);
+			submodule->update = xstrdup(value);
+		}
 	}
 
 release_return:
diff --git a/submodule-config.h b/submodule-config.h
index 9061e4e..f9e2a29 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -14,6 +14,7 @@ struct submodule {
 	const char *url;
 	int fetch_recurse;
 	const char *ignore;
+	const char *update;
 	/* the sha1 blob id of the responsible .gitmodules file */
 	unsigned char gitmodules_sha1[20];
 };
-- 
2.5.0.277.gfdc362b.dirty


* [PATCH 05/12] git submodule update: Use its own list implementation.
  2015-10-16  1:52 10% [RFC PATCHv1 00/12] git submodule update in C with parallel cloning Stefan Beller
                   ` (3 preceding siblings ...)
  2015-10-16  1:52 32% ` [PATCH 04/12] git submodule update: Announce outcome of submodule operation to stderr Stefan Beller
@ 2015-10-16  1:52 18% ` Stefan Beller
  2015-10-16 21:02  6%   ` Junio C Hamano
  2015-10-16  1:52 24% ` [PATCH 06/12] git submodule update: Handle unmerged submodules in C Stefan Beller
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-16  1:52 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Discussion showed that we cannot parallelize the whole loop below
`git submodule--helper list` in `git submodule update`, because some
changes should be done only one at a time, such as messing up a submodule
and leaving it up to the user to clean up the conflicted rebase or merge.

The submodules which need to be cloned, however, are not expected to create
problems which require attention by the user one at a time, so we want to
parallelize that first.

To do so we will start with a literal copy of `git submodule--helper list`
and port over features gradually.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/submodule--helper.c | 40 ++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh            |  2 +-
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index f4c3eff..47dc9cb 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -255,6 +255,45 @@ static int module_clone(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
+static int module_list_or_clone(int argc, const char **argv, const char *prefix)
+{
+	int i;
+	struct pathspec pathspec;
+	struct module_list list = MODULE_LIST_INIT;
+
+	struct option module_list_options[] = {
+		OPT_STRING(0, "prefix", &prefix,
+			   N_("path"),
+			   N_("alternative anchor for relative paths")),
+		OPT_END()
+	};
+
+	const char *const git_submodule_helper_usage[] = {
+		N_("git submodule--helper list [--prefix=<path>] [<path>...]"),
+		NULL
+	};
+
+	argc = parse_options(argc, argv, prefix, module_list_options,
+			     git_submodule_helper_usage, 0);
+
+	if (module_list_compute(argc, argv, prefix, &pathspec, &list) < 0) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	for (i = 0; i < list.nr; i++) {
+		const struct cache_entry *ce = list.entries[i];
+
+		if (ce_stage(ce))
+			printf("%06o %s U\t", ce->ce_mode, sha1_to_hex(null_sha1));
+		else
+			printf("%06o %s %d\t", ce->ce_mode, sha1_to_hex(ce->sha1), ce_stage(ce));
+
+		utf8_fprintf(stdout, "%s\n", ce->name);
+	}
+	return 0;
+}
+
 struct cmd_struct {
 	const char *cmd;
 	int (*fn)(int, const char **, const char *);
@@ -264,6 +303,7 @@ static struct cmd_struct commands[] = {
 	{"list", module_list},
 	{"name", module_name},
 	{"clone", module_clone},
+	{"list-or-clone", module_list_or_clone}
 };
 
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
diff --git a/git-submodule.sh b/git-submodule.sh
index bb8b2c7..d2d80e2 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -656,7 +656,7 @@ cmd_update()
 	fi
 
 	cloned_modules=
-	git submodule--helper list --prefix "$wt_prefix" "$@" | {
+	git submodule--helper list-or-clone --prefix "$wt_prefix" "$@" | {
 	err=
 	while read mode sha1 stage sm_path
 	do
-- 
2.5.0.277.gfdc362b.dirty


* [PATCH 06/12] git submodule update: Handle unmerged submodules in C
  2015-10-16  1:52 10% [RFC PATCHv1 00/12] git submodule update in C with parallel cloning Stefan Beller
                   ` (4 preceding siblings ...)
  2015-10-16  1:52 18% ` [PATCH 05/12] git submodule update: Use its own list implementation Stefan Beller
@ 2015-10-16  1:52 24% ` Stefan Beller
  2015-10-20 21:11  8%   ` Junio C Hamano
  2015-10-16  1:52 27% ` [PATCH 07/12] submodule config: keep update strategy around Stefan Beller
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-16  1:52 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/submodule--helper.c | 15 +++++++++++----
 git-submodule.sh            |  6 +-----
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 47dc9cb..f81f37a 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -284,11 +284,18 @@ static int module_list_or_clone(int argc, const char **argv, const char *prefix)
 	for (i = 0; i < list.nr; i++) {
 		const struct cache_entry *ce = list.entries[i];
 
-		if (ce_stage(ce))
-			printf("%06o %s U\t", ce->ce_mode, sha1_to_hex(null_sha1));
-		else
-			printf("%06o %s %d\t", ce->ce_mode, sha1_to_hex(ce->sha1), ce_stage(ce));
+		char *env_prefix = getenv("prefix");
+		if (ce_stage(ce)) {
+			if (env_prefix)
+				fprintf(stderr, "Skipping unmerged submodule %s/%s",
+					env_prefix, ce->name);
+			else
+				fprintf(stderr, "Skipping unmerged submodule %s",
+					ce->name);
+			continue;
+		}
 
+		printf("%06o %s %d\t", ce->ce_mode, sha1_to_hex(ce->sha1), ce_stage(ce));
 		utf8_fprintf(stdout, "%s\n", ce->name);
 	}
 	return 0;
diff --git a/git-submodule.sh b/git-submodule.sh
index d2d80e2..0754ecd 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -661,11 +661,7 @@ cmd_update()
 	while read mode sha1 stage sm_path
 	do
 		die_if_unmatched "$mode"
-		if test "$stage" = U
-		then
-			echo >&2 "Skipping unmerged submodule $prefix$sm_path"
-			continue
-		fi
+
 		name=$(git submodule--helper name "$sm_path") || exit
 		url=$(git config submodule."$name".url)
 		if ! test -z "$update"
-- 
2.5.0.277.gfdc362b.dirty


* Re: [PATCH] Add fetch.recurseSubmoduleParallelism config option
  2015-10-12 23:50  5%     ` Junio C Hamano
@ 2015-10-16 17:04  2%       ` Stefan Beller
  2015-10-16 17:26  5%         ` Junio C Hamano
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-16 17:04 UTC (permalink / raw)
  To: Junio C Hamano, Jonathan Nieder
  Cc: git@vger.kernel.org, Heiko Voigt, Jens Lehmann

On Mon, Oct 12, 2015 at 4:50 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> There is core.preloadIndex to enable parallel index preload, but
>> that is boolean and not giving fine control to the user. We want to give
>> fine control to the user here I'd assume.
>
> I'd approach this as "fetching multiple submodules at a time", if I
> were deciding its name.
>

so maybe
    fetch.recurseSubmoduleJobs
    fetch.submoduleJobs
    fetch.jobs
    fetch.connectionsToUse

Eventually we also want git fetch --all to run in parallel. With either of
the latter two names we could reuse the option there too, so there would be
no need to support different options for fetch --all and
fetch --recurse-submodules.


> So if you want
>
>     [submodule]
>         fetchParallel = 16
>         updateParallel = 4

So you would have different settings here for only slightly different things?
Then the series I sent out yesterday evening would use updateParallel
for parallel cloning instead?


* Re: [PATCH] Add fetch.recurseSubmoduleParallelism config option
  2015-10-16 17:04  2%       ` Stefan Beller
@ 2015-10-16 17:26  5%         ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2015-10-16 17:26 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jonathan Nieder, git@vger.kernel.org, Heiko Voigt, Jens Lehmann

Stefan Beller <sbeller@google.com> writes:

> so maybe
>     fetch.recurseSubmoduleJobs
>     fetch.submoduleJobs
>     fetch.jobs
>     fetch.connectionsToUse

"git remote update" is another example that may want to run multiple
independent 'git fetch' in parallel.  I think "When the operation I
ask involves fetching from multiple places, I want N instances of
them to be executed", regardless of the kind of operation ("remote
update" or "submodule update", etc.), would match the end-user's
point of view the best, if you want to give them "set this single
thing to apply to all of them" fallback default.

If you want to give them a finer-grained control, you would need to
differentiate what kind of fetch would use N tasks (as opposed to
other kind of fetch that uses M tasks) and the name would need to
have "submodule" in it for that differentiation.

>> So if you want
>>
>>     [submodule]
>>         fetchParallel = 16
>>         updateParallel = 4
>
> So you would have different settings here for only slightly different things?

I was just showing you that it is _possible_ if you want to give
finer control.  For example, you can define:

 * 'submodule.parallel', if defined gives the values for the
   following more specific ones if they aren't given.

 * 'submodule.fetchParallel' specifies how many tasks are run in
   'fetch --recurse-submodules'.

 * 'submodule.updateParallel' specifies how many tasks are run in
   'submodule update'.

so that those who want finer controls can, and those who don't can
set a single one to apply to all.

If you want to start with a globally single setting, that is
perfectly fine.
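A minimal C sketch of the fallback described above, assuming the proposed keys have already been parsed into integers. The key names (submodule.parallel, submodule.fetchParallel, submodule.updateParallel) are only the proposal from this thread, and struct parallel_config, effective_jobs(), fetch_jobs() and update_jobs() are hypothetical, not Git code. The specific setting wins, then the general one, then a sequential default of 1 (that default is an assumption).

```c
/*
 * Hypothetical holder for the proposed settings; a value <= 0 means
 * "not configured". None of these are existing Git config variables.
 */
struct parallel_config {
	int parallel;        /* submodule.parallel (general fallback) */
	int fetch_parallel;  /* submodule.fetchParallel */
	int update_parallel; /* submodule.updateParallel */
};

/* Specific value if set, else the general fallback, else sequential. */
static int effective_jobs(int specific, int general)
{
	if (specific > 0)
		return specific;
	if (general > 0)
		return general;
	return 1; /* assumption: run one task at a time by default */
}

/* Number of tasks for 'fetch --recurse-submodules'. */
int fetch_jobs(const struct parallel_config *cfg)
{
	return effective_jobs(cfg->fetch_parallel, cfg->parallel);
}

/* Number of tasks for 'submodule update'. */
int update_jobs(const struct parallel_config *cfg)
{
	return effective_jobs(cfg->update_parallel, cfg->parallel);
}
```

For example, with only submodule.parallel = 8 both helpers return 8; adding submodule.fetchParallel = 16 changes only the fetch count.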


* Re: [PATCH 01/12] git submodule update: Announce skipping submodules on stderr
  2015-10-16  1:52 20% ` [PATCH 01/12] git submodule update: Announce skipping submodules on stderr Stefan Beller
@ 2015-10-16 20:37  5%   ` Junio C Hamano
  2015-10-16 20:47  5%     ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2015-10-16 20:37 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  git-submodule.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/git-submodule.sh b/git-submodule.sh
> index 8b0eb9a..578ec48 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -684,7 +684,7 @@ cmd_update()
>  
>  		if test "$update_module" = "none"
>  		then
> -			echo "Skipping submodule '$displaypath'"
> +			echo >&2 "Skipping submodule '$displaypath'"
>  			continue
>  		fi

Makes sense, but see 02/12.


* Re: [PATCH 01/12] git submodule update: Announce skipping submodules on stderr
  2015-10-16 20:37  5%   ` Junio C Hamano
@ 2015-10-16 20:47  5%     ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-16 20:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org

On Fri, Oct 16, 2015 at 1:37 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>>  git-submodule.sh | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/git-submodule.sh b/git-submodule.sh
>> index 8b0eb9a..578ec48 100755
>> --- a/git-submodule.sh
>> +++ b/git-submodule.sh
>> @@ -684,7 +684,7 @@ cmd_update()
>>
>>               if test "$update_module" = "none"
>>               then
>> -                     echo "Skipping submodule '$displaypath'"
>> +                     echo >&2 "Skipping submodule '$displaypath'"
>>                       continue
>>               fi
>
> Makes sense, but see 02/12.

You mean the patch 02/12 itself? I can't see a reply there.

I split them on purpose. This one uses echo as opposed to say and has
no tests that would fail.

So this patch documents that no tests break; I can just change it.
2/12 tells another story: we codified the behavior in tests and rely on it, so
we need to carefully decide whether that's a breaking change.


* Re: [PATCH 02/12] git submodule update: Announce uninitialized modules on stderr
  2015-10-16  1:52 26% ` [PATCH 02/12] git submodule update: Announce uninitialized modules " Stefan Beller
@ 2015-10-16 20:54  4%   ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2015-10-16 20:54 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  git-submodule.sh           |  2 +-
>  t/t7400-submodule-basic.sh | 12 ++++++------
>  2 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/git-submodule.sh b/git-submodule.sh
> index 578ec48..eea27f8 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -693,7 +693,7 @@ cmd_update()
>  			# Only mention uninitialized submodules when its
>  			# path have been specified
>  			test "$#" != "0" &&
> -			say "$(eval_gettext "Submodule path '\$displaypath' not initialized
> +			say >&2 "$(eval_gettext "Submodule path '\$displaypath' not initialized
>  Maybe you want to use 'update --init'?")"
>  			continue
>  		fi

There are quite a few other calls to "say" in this script, and you
are changing only this one to emit it to the standard error output.

My quick eyeballing of the script tells me that most of them, other
than the ones that are used in cmd_status to report the information
that the user asked to be shown on the standard output, are of "Now
I am doing this" kind of output, which I feel are in the same category
as this one that shouldn't be on the standard output.

Another thing (which relates to the one in 01/12) is that not all
output from this command comes from "say".

Perhaps the first thing to do before doing 01/12 is to sift these
messages into types and have them consistently use helpers designed
for different purposes, e.g.

 - a progress, like this one, the one in 01/12, and many other uses
   of "say"; which may want to become e.g. "say_progress".

 - an error or a warning, like "Could not remove working tree", "not
   initialized, maybe you want to do 'init' first?"; which may want
   to become something else e.g. "say_warning".

 - the real output from the program, e.g. output from cmd_status,
   would use yet another, e.g. "printf '%s\n'".

instead of converting each message that you happened to have noticed.

Note that "say" is squelched under GIT_QUIET (i.e. --quiet).  The
former two helpers we would want to make quiet (or for errors we may
not---I don't know offhand).  I do not think of any valid reason why
we want to squelch the output from cmd_status under --quiet; it is
not like the while loop downstream of the "list |" pipe reports
some status via its exit code.
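The three-way split above can be sketched in C as one routing decision per message. The enum, stream_for() and emit() are hypothetical names, not existing Git helpers, and keeping warnings visible under --quiet is an assumption, since the message above leaves that question open.

```c
#include <stdio.h>

/* Hypothetical message categories mirroring the proposed shell helpers. */
enum msg_kind {
	MSG_PROGRESS, /* "now I am doing this" (say_progress) */
	MSG_WARNING,  /* errors and warnings (say_warning) */
	MSG_OUTPUT    /* the real output, e.g. from cmd_status */
};

/*
 * Pick the stream for a message, or NULL if it should be squelched.
 * Assumption: warnings stay visible even under --quiet.
 */
FILE *stream_for(enum msg_kind kind, int quiet)
{
	switch (kind) {
	case MSG_PROGRESS:
		return quiet ? NULL : stderr;
	case MSG_WARNING:
		return stderr;
	case MSG_OUTPUT:
	default:
		return stdout; /* never squelched by --quiet */
	}
}

/* Print msg on the stream chosen for its kind, if any. */
void emit(enum msg_kind kind, int quiet, const char *msg)
{
	FILE *fp = stream_for(kind, quiet);

	if (fp)
		fprintf(fp, "%s\n", msg);
}
```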


* Re: [PATCH 03/12] git submodule update: Move branch calculation to where it's needed
  2015-10-16  1:52 17% ` [PATCH 03/12] git submodule update: Move branch calculation to where it's needed Stefan Beller
@ 2015-10-16 20:54  4%   ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2015-10-16 20:54 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> The branch variable is used only once so calculate it only when needed.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---

Makes sense.

>  git-submodule.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/git-submodule.sh b/git-submodule.sh
> index eea27f8..56a0524 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -668,7 +668,6 @@ cmd_update()
>  		fi
>  		name=$(git submodule--helper name "$sm_path") || exit
>  		url=$(git config submodule."$name".url)
> -		branch=$(get_submodule_config "$name" branch master)
>  		if ! test -z "$update"
>  		then
>  			update_module=$update
> @@ -718,6 +717,7 @@ Maybe you want to use 'update --init'?")"
>  				die "$(eval_gettext "Unable to fetch in submodule path '\$sm_path'")"
>  			fi
>  			remote_name=$(clear_local_git_env; cd "$sm_path" && get_default_remote)
> +			branch=$(get_submodule_config "$name" branch master)
>  			sha1=$(clear_local_git_env; cd "$sm_path" &&
>  				git rev-parse --verify "${remote_name}/${branch}") ||
>  			die "$(eval_gettext "Unable to find current ${remote_name}/${branch} revision in submodule path '\$sm_path'")"


* Re: [PATCH 05/12] git submodule update: Use its own list implementation.
  2015-10-16  1:52 18% ` [PATCH 05/12] git submodule update: Use its own list implementation Stefan Beller
@ 2015-10-16 21:02  6%   ` Junio C Hamano
  2015-10-16 21:08  4%     ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2015-10-16 21:02 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> Discussion showed that we cannot parallelize the whole loop below
> `git submodule--helper list` in `git submodule update`, because some
> changes should be done only one at a time, such as messing up a submodule
> and leaving it up to the user to clean up the conflicted rebase or merge.
>
> The submodules which need to be cloned, however, are not expected to create
> problems which require attention by the user one at a time, so we want to
> parallelize that first.
>
> To do so we will start with a literal copy of `git submodule--helper list`
> and port over features gradually.

I am not sure what you mean by this.

Surely, the current implementation of "update" does the fetching and
updating as a single unit of task and iterate over these tasks, and
we would rather want to instead have one iteration of submodules to
do the fetching part (without doing other things that can fail and
have to get attention of the end user), followed by another
iteration that does the "other things", in order to get closer to
the end goal of doing the fetch in parallel and then doing the
remainder one-module-at-a-time sequentially.

I would imagine that the logical first step towards the end goal, if
I understood you correctly, would be to split that single large loop
that does a fetch and other things for a single module in each
iteration into two, one that iterates and fetches all, followed by a
new one that does the checkout/merge.

What I do not understand is why that requires a different kind of
enumerator (unless this is a kind of premature optimization, knowing
that the set of modules iterated by these two loops are slightly
different or something).
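A rough C sketch of the two-loop structure described above: the first loop only does work that never needs the user (the part that could later run in parallel), and the second loop stays sequential and stops at the first submodule that needs attention. struct sm, clone_one(), update_one() and update_all() are hypothetical stand-ins; the real work would be git clone and the checkout/merge/rebase steps in git-submodule.sh.

```c
#include <stdio.h>

/* Hypothetical per-submodule state; not Git's real data structures. */
struct sm {
	const char *path;
	int cloned;     /* set by phase 1 */
	int needs_user; /* simulates a conflicted merge/rebase in phase 2 */
};

/* Phase 1 task: never needs user attention, so it could run in parallel. */
static int clone_one(struct sm *s)
{
	s->cloned = 1; /* stand-in for running "git clone" */
	return 0;
}

/* Phase 2 task: may stop and hand control back to the user. */
static int update_one(const struct sm *s)
{
	return s->needs_user ? -1 : 0;
}

/*
 * Returns the number of submodules updated before the first stop,
 * or -1 if a clone failed.
 */
int update_all(struct sm *list, int nr)
{
	int i;

	/* Phase 1: independent work; this loop is the part to parallelize. */
	for (i = 0; i < nr; i++)
		if (!list[i].cloned && clone_one(&list[i]))
			return -1;

	/* Phase 2: strictly sequential; stop at the first module needing help. */
	for (i = 0; i < nr; i++) {
		if (update_one(&list[i])) {
			fprintf(stderr, "Stopping at '%s'\n", list[i].path);
			return i;
		}
	}
	return nr;
}
```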

>  int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
> diff --git a/git-submodule.sh b/git-submodule.sh
> index bb8b2c7..d2d80e2 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -656,7 +656,7 @@ cmd_update()
>  	fi
>  
>  	cloned_modules=
> -	git submodule--helper list --prefix "$wt_prefix" "$@" | {
> +	git submodule--helper list-or-clone --prefix "$wt_prefix" "$@" | {
>  	err=
>  	while read mode sha1 stage sm_path
>  	do


* Re: [PATCH 05/12] git submodule update: Use its own list implementation.
  2015-10-16 21:02  6%   ` Junio C Hamano
@ 2015-10-16 21:08  4%     ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-16 21:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org

On Fri, Oct 16, 2015 at 2:02 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> Discussion showed that we cannot parallelize the whole loop below
>> `git submodule--helper list` in `git submodule update`, because some
>> changes should be done only one at a time, such as messing up a submodule
>> and leaving it up to the user to clean up the conflicted rebase or merge.
>>
>> The submodules which need to be cloned, however, are not expected to create
>> problems which require attention by the user one at a time, so we want to
>> parallelize that first.
>>
>> To do so we will start with a literal copy of `git submodule--helper list`
>> and port over features gradually.
>
> I am not sure what you mean by this.
>
> Surely, the current implementation of "update" does the fetching and
> updating as a single unit of task and iterate over these tasks, and
> we would rather want to instead have one iteration of submodules to
> do the fetching part (without doing other things that can fail and
> have to get attention of the end user), followed by another
> iteration that does the "other things", in order to get closer to
> the end goal of doing the fetch in parallel and then doing the
> remainder one-module-at-a-time sequentially.

I differentiated a bit more and moved only the clone parts.
Fetching should not be a problem either; I initially assumed
it would be.

>
> I would imagine that the logical first step towards the end goal, if
> I understood you correctly, would be to split that single large loop
> that does a fetch and other things for a single module in each
> iteration into two, one that iterates and fetches all, followed by a
> new one that does the checkout/merge.

That was also one of the patch series I wrote (not sent to list)
1) split up into 2 phases
2) rewrite first phase in C
3) parallelize the first phase.

This series merges 1 and 2, so you don't have to review
the same functionality two times.

>
> What I do not understand is why that requires a different kind of
> enumerator (unless this is a kind of premature optimization, knowing
> that the set of modules iterated by these two loops are slightly
> different or something).

It is just moving all code before the clone step into the C part, so
we can call the clone in C.

>
>>  int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
>> diff --git a/git-submodule.sh b/git-submodule.sh
>> index bb8b2c7..d2d80e2 100755
>> --- a/git-submodule.sh
>> +++ b/git-submodule.sh
>> @@ -656,7 +656,7 @@ cmd_update()
>>       fi
>>
>>       cloned_modules=
>> -     git submodule--helper list --prefix "$wt_prefix" "$@" | {
>> +     git submodule--helper list-or-clone --prefix "$wt_prefix" "$@" | {
>>       err=
>>       while read mode sha1 stage sm_path
>>       do


* [PATCH 0/5] Fixes for the parallel processing
@ 2015-10-19 18:24  7% Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-19 18:24 UTC (permalink / raw)
  To: gitster; +Cc: git, Stefan Beller

I noticed a problem with the graceful abort in the parallel processing,
which is fixed in patch 1.

Patch 2 makes the API more maintainable/usable (the caller may forget to
call child_process_init and only fill in the fields which the callback is
interested in).
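The idea behind patch 2 can be illustrated with a small C sketch: the runner resets the child process slot before every get_next_task-style callback, so a callback that fills in only some fields cannot inherit values left over from the previous task. struct child_sketch, run_tasks(), demo_next() and struct demo are hypothetical simplifications, not the actual run-command API.

```c
#include <string.h>

/* Hypothetical, cut-down stand-in for Git's struct child_process. */
struct child_sketch {
	const char *dir;
	int no_stdin;
	int git_cmd;
};

static void child_sketch_init(struct child_sketch *cp)
{
	memset(cp, 0, sizeof(*cp));
}

/* Callback fills in only the fields it cares about; 0 means "no more work". */
typedef int (*next_task_fn)(struct child_sketch *cp, void *cb);

/* Reuse one slot for several tasks, resetting it before every callback. */
int run_tasks(struct child_sketch *slot, next_task_fn next, void *cb)
{
	int started = 0;

	for (;;) {
		child_sketch_init(slot);
		if (!next(slot, cb))
			break;
		started++; /* a real runner would start the process here */
	}
	return started;
}

/* Example callback: task 1 sets only dir, task 2 sets only git_cmd. */
struct demo {
	int calls;
	int leaked; /* set if a field from the previous task is still present */
};

static int demo_next(struct child_sketch *cp, void *cb)
{
	struct demo *d = cb;

	if (cp->dir)
		d->leaked = 1;
	if (d->calls == 2)
		return 0;
	if (d->calls++ == 0)
		cp->dir = "sub1";
	else
		cp->git_cmd = 1;
	return 1;
}
```

Without the child_sketch_init() call at the top of the loop, the second task would still see dir set to "sub1".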

Patch 3 is another fixup. It actually initializes the shutdown flag properly.
(Apparently it happened to have the right value so far.)

Patches 4 and 5 add more tests, both for the graceful abort case and for
the standard use case.

This applies on top of sb/submodule-parallel-fetch which is already in next.
Junio, do you want me to reroll that series with these patches squashed in
appropriately, or just put them on top of that series?

Thanks,
Stefan

Stefan Beller (5):
  run-command: Fix early shutdown
  run-command: Call get_next_task with a clean child process.
  run-command: Initialize the shutdown flag
  test-run-command: test for gracefully aborting
  test-run-command: Increase test coverage

 run-command.c          |  5 ++++-
 t/t0061-run-command.sh | 28 ++++++++++++++++++++++++++--
 test-run-command.c     | 22 ++++++++++++++++++++--
 3 files changed, 50 insertions(+), 5 deletions(-)

-- 
2.5.0.285.g8fe9b61.dirty


* Re: [RFC] URL rewrite in .gitmodules
  @ 2015-10-19 22:07  5% ` Stefan Beller
  2015-10-25 14:43  5%   ` Lars Schneider
    1 sibling, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-19 22:07 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Git Users

On Mon, Oct 19, 2015 at 12:28 PM, Lars Schneider
<larsxschneider@gmail.com> wrote:
> Hi,
>
> I have a closed source Git repo which references an Open Source Git repo as Submodule. The Open Source Git repo references yet another Open Source repo as submodule. In order to avoid failing builds due to external services I mirrored the Open Source repos in my company network. That works great with the first level of Submodules. Unfortunately it does not work with the second level because the first level still references the "outside of company" repos. I know I can rewrite Git URLs with the git config "url.<base>.insteadOf" option. However, git configs are client specific.

I feel like this is working as intended. You only want to change your
one client (say, the buildbot) to not go to the open source site, while
developers may well want to fetch from external sources ("Hey, shiny
new code!" ;)


> I would prefer a solution that works without setup on any client. I also know that I could update the .gitmodules file in the Open Source repo on the first level. I also would prefer not to do this as I want to use the very same hashes as defined by the "upstream" Open Source repos.

You could carry a patch on top of the tip of the first submodule
re-pointing the nested submodule. This requires good workflows
available to deal with submodules though. (Fetch and merge or rebase,
git submodule update should be able to do that?)

>
> Is there yet another way in Git to change URLs of Submodules in the way I want it?
>
> If not, what do you think about a patch that adds a "url" section similar to the one in git config to a .gitmodules file?
>

So we have different kinds of git configs: within one repository, and in
the home directory (global for your user on that machine).
Maybe you would want to have one "global" config on a network share,
such that every box in your company
reads that "company-wide" global config and acts upon it?

> Example:
> ----------
> [submodule "git"]
>         path = git
>         url=git://github.com/larsxschneider/git.git
>
> [url "mycompany.com"]
>         insteadOf = outside.com

Wouldn't that be better put into say a global git config instead of
repeating it for every submodule?
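For reference, the client-side mechanism Lars mentions works by prefix substitution. With `url.<base>.insteadOf = <prefix>`, any URL starting with `<prefix>` is rewritten to start with `<base>` instead. When several insteadOf values match, the git-config documentation says the longest match wins. The standalone sketch below models only that matching rule. `struct url_rewrite` and `rewrite_url` are hypothetical names, and the sketch says nothing about where such rules would be read from (repository, global, or a shared config).

```c
#include <stdio.h>
#include <string.h>

/* One hypothetical "[url "<base>"] insteadOf = <prefix>" entry. */
struct url_rewrite {
	const char *base;       /* replacement, e.g. "git://mirror.example.com/" */
	const char *instead_of; /* prefix to match, e.g. "git://github.com/" */
};

/*
 * Rewrite 'url' into 'out' using the longest matching insteadOf prefix.
 * Returns 1 if a rewrite happened, 0 if 'url' was copied unchanged.
 * The output is truncated by snprintf if it does not fit in 'outlen'.
 */
static int rewrite_url(const char *url, const struct url_rewrite *rules,
		       size_t nr, char *out, size_t outlen)
{
	const struct url_rewrite *best = NULL;
	size_t best_len = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		size_t len = strlen(rules[i].instead_of);

		if (len > best_len && !strncmp(url, rules[i].instead_of, len)) {
			best = &rules[i];
			best_len = len;
		}
	}
	if (!best) {
		snprintf(out, outlen, "%s", url);
		return 0;
	}
	snprintf(out, outlen, "%s%s", best->base, url + best_len);
	return 1;
}
```

For example, suppose there are rules for both `git://github.com/` and the more specific `git://github.com/larsxschneider/`. A URL under the latter is rewritten using the base of the more specific rule.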

In case of the nested submodule, wouldn't you need to carry the last lines
as an extra patch anyway
if this was done in the .gitmodules files? Or do you expect this to be
applied recursively (i.e. nested
submodules all the way down also substitute outside.com)?


> ----------
>

Am I missing your point?

> Thanks,
> Lars


* Re: [PATCH 06/12] git submodule update: Handle unmerged submodules in C
  2015-10-16  1:52 24% ` [PATCH 06/12] git submodule update: Handle unmerged submodules in C Stefan Beller
@ 2015-10-20 21:11  8%   ` Junio C Hamano
  2015-10-20 21:21  5%     ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2015-10-20 21:11 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  builtin/submodule--helper.c | 15 +++++++++++----
>  git-submodule.sh            |  6 +-----
>  2 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index 47dc9cb..f81f37a 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -284,11 +284,18 @@ static int module_list_or_clone(int argc, const char **argv, const char *prefix)
>  	for (i = 0; i < list.nr; i++) {
>  		const struct cache_entry *ce = list.entries[i];
>  
> -		if (ce_stage(ce))
> -			printf("%06o %s U\t", ce->ce_mode, sha1_to_hex(null_sha1));
> -		else
> -			printf("%06o %s %d\t", ce->ce_mode, sha1_to_hex(ce->sha1), ce_stage(ce));
> +		char *env_prefix = getenv("prefix");

This somehow makes me feel dirty.  Do we really export such an
environment variable that is named overly generically to communicate
with our own helpers?

I can see why you need to be able to prefix leading paths (i.e. you
would need to prefix path to the enclosing submodule to a path to
obtain the "global view" from the very top-level superproject while
recursing into nested submodules), but still...

> +		if (ce_stage(ce)) {
> +			if (env_prefix)
> +				fprintf(stderr, "Skipping unmerged submodule %s/%s",
> +					env_prefix, ce->name);
> +			else
> +				fprintf(stderr, "Skipping unmerged submodule %s",
> +					ce->name);
> +			continue;
> +		}
>  
> +		printf("%06o %s %d\t", ce->ce_mode, sha1_to_hex(ce->sha1), ce_stage(ce));
>  		utf8_fprintf(stdout, "%s\n", ce->name);
>  	}
>  	return 0;
> diff --git a/git-submodule.sh b/git-submodule.sh
> index d2d80e2..0754ecd 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -661,11 +661,7 @@ cmd_update()
>  	while read mode sha1 stage sm_path
>  	do
>  		die_if_unmatched "$mode"
> -		if test "$stage" = U
> -		then
> -			echo >&2 "Skipping unmerged submodule $prefix$sm_path"
> -			continue
> -		fi
> +
>  		name=$(git submodule--helper name "$sm_path") || exit
>  		url=$(git config submodule."$name".url)
>  		if ! test -z "$update"


* Re: [PATCH 06/12] git submodule update: Handle unmerged submodules in C
  2015-10-20 21:11  8%   ` Junio C Hamano
@ 2015-10-20 21:21  5%     ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-20 21:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org

On Tue, Oct 20, 2015 at 2:11 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>>  builtin/submodule--helper.c | 15 +++++++++++----
>>  git-submodule.sh            |  6 +-----
>>  2 files changed, 12 insertions(+), 9 deletions(-)
>>
>> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
>> index 47dc9cb..f81f37a 100644
>> --- a/builtin/submodule--helper.c
>> +++ b/builtin/submodule--helper.c
>> @@ -284,11 +284,18 @@ static int module_list_or_clone(int argc, const char **argv, const char *prefix)
>>       for (i = 0; i < list.nr; i++) {
>>               const struct cache_entry *ce = list.entries[i];
>>
>> -             if (ce_stage(ce))
>> -                     printf("%06o %s U\t", ce->ce_mode, sha1_to_hex(null_sha1));
>> -             else
>> -                     printf("%06o %s %d\t", ce->ce_mode, sha1_to_hex(ce->sha1), ce_stage(ce));
>> +             char *env_prefix = getenv("prefix");
>

[Just checked the date; it's the old series. I am about to send out a new
series which collapses some of these patches, sits on top of the fixes series
and of course fixes this issue ;) ]

> This somehow makes me feel dirty.  Do we really export such an
> environment variable that is named overly generically to communicate
> with our own helpers?

I agree that this is bad. It was the fastest way.
I should have taken the slower road. I think I'll replace this with
another argument.
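Passing the enclosing submodule's path explicitly, instead of reading a generically named environment variable, could look roughly like this standalone sketch. `display_path` is a hypothetical helper. It also avoids doubling the slash. The shell side prints `$prefix$sm_path` with nothing in between, which suggests its `prefix` already ends in `/`.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the path shown to the user for submodule 'name' while recursing:
 * the enclosing submodule's path, passed in as an argument (hypothetically
 * from a command-line option rather than getenv("prefix")), followed by
 * 'name'. A trailing '/' on the prefix is not doubled.
 */
static const char *display_path(const char *recursive_prefix, const char *name,
				char *buf, size_t len)
{
	size_t plen;

	if (!recursive_prefix || !*recursive_prefix)
		return name;
	plen = strlen(recursive_prefix);
	snprintf(buf, len, "%s%s%s", recursive_prefix,
		 recursive_prefix[plen - 1] == '/' ? "" : "/", name);
	return buf;
}
```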

>
> I can see why you need to be able to prefix leading paths (i.e. you
> would need to prefix path to the enclosing submodule to a path to
> obtain the "global view" from the very top-level superproject while
> recursing into nested submodules), but still...
>
>> +             if (ce_stage(ce)) {
>> +                     if (env_prefix)
>> +                             fprintf(stderr, "Skipping unmerged submodule %s/%s",
>> +                                     env_prefix, ce->name);
>> +                     else
>> +                             fprintf(stderr, "Skipping unmerged submodule %s",
>> +                                     ce->name);
>> +                     continue;
>> +             }
>>
>> +             printf("%06o %s %d\t", ce->ce_mode, sha1_to_hex(ce->sha1), ce_stage(ce));
>>               utf8_fprintf(stdout, "%s\n", ce->name);
>>       }
>>       return 0;
>> diff --git a/git-submodule.sh b/git-submodule.sh
>> index d2d80e2..0754ecd 100755
>> --- a/git-submodule.sh
>> +++ b/git-submodule.sh
>> @@ -661,11 +661,7 @@ cmd_update()
>>       while read mode sha1 stage sm_path
>>       do
>>               die_if_unmatched "$mode"
>> -             if test "$stage" = U
>> -             then
>> -                     echo >&2 "Skipping unmerged submodule $prefix$sm_path"
>> -                     continue
>> -             fi
>> +
>>               name=$(git submodule--helper name "$sm_path") || exit
>>               url=$(git config submodule."$name".url)
>>               if ! test -z "$update"


* [PATCH 8/8] git submodule update: Have a dedicated helper for cloning
  2015-10-20 22:43  9% [PATCH 0/8] Fixes for the parallel processing engine and git submodule update Stefan Beller
  2015-10-20 22:43 26% ` [PATCH 7/8] submodule config: Keep update strategy around Stefan Beller
@ 2015-10-20 22:43 21% ` Stefan Beller
  2015-10-21 20:47  4%   ` Junio C Hamano
  1 sibling, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-20 22:43 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, Stefan Beller

This introduces a new helper function in git submodule--helper
which takes care of cloning all submodules, which we want to
parallelize eventually.

Some tests (such as empty URL, update_mode==none) are required in the
helper to make the decision for cloning. These checks have been moved
into the C function as well. (No need to repeat them in the shell
script)

As we can only access the stderr channel from within the parallel
processing engine, so we need to reroute the error message for
specified but initialized submodules to stderr. As it is an error
message, this should have gone to stderr in the first place, so a
bug fix along the way.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/submodule--helper.c | 222 ++++++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh            |  45 +++------
 t/t7400-submodule-basic.sh  |   4 +-
 3 files changed, 235 insertions(+), 36 deletions(-)
 
 Review is best done starting at the end and scrolling up, as that's how the
 code flows in submodule--helper.c.

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index f4c3eff..6d4815a 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -255,6 +255,227 @@ static int module_clone(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
+static int git_submodule_config(const char *var, const char *value, void *cb)
+{
+	return parse_submodule_config_option(var, value);
+}
+
+struct submodule_update_clone {
+	int count;
+	int quiet;
+	int print_unmatched;
+	char *reference;
+	char *depth;
+	char *update;
+	const char *recursive_prefix;
+	const char *prefix;
+	struct module_list list;
+	struct string_list projectlines;
+	struct pathspec pathspec;
+};
+#define SUBMODULE_UPDATE_CLONE_INIT {0, 0, 0, NULL, NULL, NULL, NULL, NULL, MODULE_LIST_INIT, STRING_LIST_INIT_DUP}
+
+static void fill_clone_command(struct child_process *cp, int quiet,
+			       const char *prefix, const char *path,
+			       const char *name, const char *url,
+			       const char *reference, const char *depth)
+{
+	cp->git_cmd = 1;
+	cp->no_stdin = 1;
+	cp->stdout_to_stderr = 1;
+	cp->err = -1;
+	argv_array_push(&cp->args, "submodule--helper");
+	argv_array_push(&cp->args, "clone");
+	if (quiet)
+		argv_array_push(&cp->args, "--quiet");
+
+	if (prefix) {
+		argv_array_push(&cp->args, "--prefix");
+		argv_array_push(&cp->args, prefix);
+	}
+	argv_array_push(&cp->args, "--path");
+	argv_array_push(&cp->args, path);
+
+	argv_array_push(&cp->args, "--name");
+	argv_array_push(&cp->args, name);
+
+	argv_array_push(&cp->args, "--url");
+	argv_array_push(&cp->args, url);
+	if (reference)
+		argv_array_push(&cp->args, reference);
+	if (depth)
+		argv_array_push(&cp->args, depth);
+}
+
+static int get_next_task(void **pp_task_cb,
+			 struct child_process *cp,
+			 struct strbuf *err,
+			 void *pp_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	for (; pp->count < pp->list.nr; pp->count++) {
+		const struct submodule *sub = NULL;
+		const char *displaypath = NULL;
+		const struct cache_entry *ce = pp->list.entries[pp->count];
+		struct strbuf sb = STRBUF_INIT;
+		const char *update_module = NULL;
+		const char *url = NULL;
+		int just_cloned = 0;
+
+		if (ce_stage(ce)) {
+			if (pp->recursive_prefix)
+				strbuf_addf(err, "Skipping unmerged submodule %s/%s\n",
+					pp->recursive_prefix, ce->name);
+			else
+				strbuf_addf(err, "Skipping unmerged submodule %s\n",
+					ce->name);
+			continue;
+		}
+
+		sub = submodule_from_path(null_sha1, ce->name);
+		if (pp->recursive_prefix)
+			displaypath = relative_path(pp->recursive_prefix, ce->name, &sb);
+		else
+			displaypath = ce->name;
+
+		if (pp->update)
+			update_module = pp->update;
+		if (!update_module)
+			update_module = sub->update;
+		if (!update_module)
+			update_module = "checkout";
+		if (!strcmp(update_module, "none")) {
+			strbuf_addf(err, "Skipping submodule '%s'\n", displaypath);
+			continue;
+		}
+
+		/*
+		 * Looking up the url in .git/config.
+		 * We cannot fall back to .gitmodules as we only want to process
+		 * configured submodules. This renders the submodule lookup API
+		 * useless, as it cannot lookup without fallback.
+		 */
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "submodule.%s.url", sub->name);
+		git_config_get_string_const(sb.buf, &url);
+		if (!url) {
+			/*
+			 * Only mention uninitialized submodules when its
+			 * path have been specified
+			 */
+			if (pp->pathspec.nr)
+				strbuf_addf(err, _("Submodule path '%s' not initialized\n"
+					"Maybe you want to use 'update --init'?"), displaypath);
+			continue;
+		}
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%s/.git", ce->name);
+		just_cloned = !file_exists(sb.buf);
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%06o %s %d %d\t%s\n", ce->ce_mode,
+				sha1_to_hex(ce->sha1), ce_stage(ce),
+				just_cloned, ce->name);
+		string_list_append(&pp->projectlines, sb.buf);
+
+		if (just_cloned) {
+			fill_clone_command(cp, pp->quiet, pp->prefix, ce->name,
+					   sub->name, url, pp->reference, pp->depth);
+			pp->count++;
+			return 1;
+		}
+	}
+	return 0;
+}
+
+static int start_failure(struct child_process *cp,
+			 struct strbuf *err,
+			 void *pp_cb,
+			 void *pp_task_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	strbuf_addf(err, "error when starting a child process");
+	pp->print_unmatched = 1;
+
+	return 1;
+}
+
+static int task_finished(int result,
+			 struct child_process *cp,
+			 struct strbuf *err,
+			 void *pp_cb,
+			 void *pp_task_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	if (!result) {
+		return 0;
+	} else {
+		strbuf_addf(err, "error in one child process");
+		pp->print_unmatched = 1;
+		return 1;
+	}
+}
+
+static int update_clone(int argc, const char **argv, const char *prefix)
+{
+	struct string_list_item *item;
+	struct submodule_update_clone pp = SUBMODULE_UPDATE_CLONE_INIT;
+
+	struct option module_list_options[] = {
+		OPT_STRING(0, "prefix", &prefix,
+			   N_("path"),
+			   N_("path into the working tree")),
+		OPT_STRING(0, "recursive_prefix", &pp.recursive_prefix,
+			   N_("path"),
+			   N_("path into the working tree, across nested "
+			      "submodule boundaries")),
+		OPT_STRING(0, "update", &pp.update,
+			   N_("string"),
+			   N_("update command for submodules")),
+		OPT_STRING(0, "reference", &pp.reference, "<repository>",
+			N_("Use the local reference repository "
+			   "instead of a full clone")),
+		OPT_STRING(0, "depth", &pp.depth, "<depth>",
+			N_("Create a shallow clone truncated to the "
+			   "specified number of revisions")),
+		OPT__QUIET(&pp.quiet, N_("don't print cloning progress")),
+		OPT_END()
+	};
+
+	const char *const git_submodule_helper_usage[] = {
+		N_("git submodule--helper update-clone [--prefix=<path>] [<path>...]"),
+		NULL
+	};
+	pp.prefix = prefix;
+
+	argc = parse_options(argc, argv, prefix, module_list_options,
+			     git_submodule_helper_usage, 0);
+
+	if (module_list_compute(argc, argv, prefix, &pp.pathspec, &pp.list) < 0) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	gitmodules_config();
+	/* Overlay the parsed .gitmodules file with .git/config */
+	git_config(git_submodule_config, NULL);
+	run_processes_parallel(1, get_next_task, start_failure, task_finished, &pp);
+
+	if (pp.print_unmatched) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	for_each_string_list_item(item, &pp.projectlines) {
+		utf8_fprintf(stdout, "%s", item->string);
+	}
+	return 0;
+}
+
 struct cmd_struct {
 	const char *cmd;
 	int (*fn)(int, const char **, const char *);
@@ -264,6 +485,7 @@ static struct cmd_struct commands[] = {
 	{"list", module_list},
 	{"name", module_name},
 	{"clone", module_clone},
+	{"update-clone", update_clone}
 };
 
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
diff --git a/git-submodule.sh b/git-submodule.sh
index 8b0eb9a..ea883b9 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -655,17 +655,18 @@ cmd_update()
 		cmd_init "--" "$@" || return
 	fi
 
-	cloned_modules=
-	git submodule--helper list --prefix "$wt_prefix" "$@" | {
+	git submodule--helper update-clone ${GIT_QUIET:+--quiet} \
+		${wt_prefix:+--prefix "$wt_prefix"} \
+		${prefix:+--recursive_prefix "$prefix"} \
+		${update:+--update "$update"} \
+		${reference:+--reference "$reference"} \
+		${depth:+--depth "$depth"} \
+		"$@" | {
 	err=
-	while read mode sha1 stage sm_path
+	while read mode sha1 stage just_cloned sm_path
 	do
 		die_if_unmatched "$mode"
-		if test "$stage" = U
-		then
-			echo >&2 "Skipping unmerged submodule $prefix$sm_path"
-			continue
-		fi
+
 		name=$(git submodule--helper name "$sm_path") || exit
 		url=$(git config submodule."$name".url)
 		branch=$(get_submodule_config "$name" branch master)
@@ -682,27 +683,10 @@ cmd_update()
 
 		displaypath=$(relative_path "$prefix$sm_path")
 
-		if test "$update_module" = "none"
-		then
-			echo "Skipping submodule '$displaypath'"
-			continue
-		fi
-
-		if test -z "$url"
-		then
-			# Only mention uninitialized submodules when its
-			# path have been specified
-			test "$#" != "0" &&
-			say "$(eval_gettext "Submodule path '\$displaypath' not initialized
-Maybe you want to use 'update --init'?")"
-			continue
-		fi
-
-		if ! test -d "$sm_path"/.git && ! test -f "$sm_path"/.git
+		if test $just_cloned -eq 1
 		then
-			git submodule--helper clone ${GIT_QUIET:+--quiet} --prefix "$prefix" --path "$sm_path" --name "$name" --url "$url" "$reference" "$depth" || exit
-			cloned_modules="$cloned_modules;$name"
 			subsha1=
+			update_module=checkout
 		else
 			subsha1=$(clear_local_git_env; cd "$sm_path" &&
 				git rev-parse --verify HEAD) ||
@@ -742,13 +726,6 @@ Maybe you want to use 'update --init'?")"
 				die "$(eval_gettext "Unable to fetch in submodule path '\$displaypath'")"
 			fi
 
-			# Is this something we just cloned?
-			case ";$cloned_modules;" in
-			*";$name;"*)
-				# then there is no local change to integrate
-				update_module=checkout ;;
-			esac
-
 			must_die_on_failure=
 			case "$update_module" in
 			checkout)
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index 540771c..5991e3c 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -462,7 +462,7 @@ test_expect_success 'update --init' '
 	git config --remove-section submodule.example &&
 	test_must_fail git config submodule.example.url &&
 
-	git submodule update init > update.out &&
+	git submodule update init 2> update.out &&
 	cat update.out &&
 	test_i18ngrep "not initialized" update.out &&
 	test_must_fail git rev-parse --resolve-git-dir init/.git &&
@@ -480,7 +480,7 @@ test_expect_success 'update --init from subdirectory' '
 	mkdir -p sub &&
 	(
 		cd sub &&
-		git submodule update ../init >update.out &&
+		git submodule update ../init 2>update.out &&
 		cat update.out &&
 		test_i18ngrep "not initialized" update.out &&
 		test_must_fail git rev-parse --resolve-git-dir ../init/.git &&
-- 
2.5.0.275.gbfc1651.dirty
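The callback contract this patch relies on can be modeled in isolation. The callback prepares one task and returns 1, returns 0 when nothing is left, and keeps the iteration position in the shared callback data. The following is a simplified standalone sketch rather than the patch's code. `struct entry`, `struct clone_state` and `next_clone` are hypothetical, and the `submodule.<name>.url` check is left out.

```c
#include <string.h>

/* Hypothetical, simplified view of one index entry. */
struct entry {
	const char *name;
	int stage;          /* nonzero: unmerged */
	const char *update; /* configured update mode, may be NULL */
	int cloned;         /* does <name>/.git already exist? */
};

/* Hypothetical shared callback data, like struct submodule_update_clone. */
struct clone_state {
	const struct entry *entries;
	int nr;
	int count;                  /* cursor: next entry to look at */
	const char *cmdline_update; /* --update, may be NULL */
};

/*
 * Same loop shape as the patch's get_next_task(): skip entries that need
 * no clone, and return 1 (setting *out) for the next one that does.
 */
static int next_clone(struct clone_state *st, const char **out)
{
	for (; st->count < st->nr; st->count++) {
		const struct entry *e = &st->entries[st->count];
		const char *mode;

		if (e->stage)
			continue;	/* patch: "Skipping unmerged submodule" */
		mode = st->cmdline_update ? st->cmdline_update : e->update;
		if (!mode)
			mode = "checkout";
		if (!strcmp(mode, "none"))
			continue;	/* patch: "Skipping submodule" */
		if (e->cloned)
			continue;	/* patch still reports it, but starts no clone */
		*out = e->name;
		st->count++;	/* the for-increment does not run on return */
		return 1;
	}
	return 0;
}
```

Note the explicit cursor increment before `return 1`. Returning from inside the `for` loop skips its increment expression, so without it the next call would hand out the same entry again. The patch does the same with `pp->count++`.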


* [PATCH 7/8] submodule config: Keep update strategy around
  2015-10-20 22:43  9% [PATCH 0/8] Fixes for the parallel processing engine and git submodule update Stefan Beller
@ 2015-10-20 22:43 26% ` Stefan Beller
  2015-10-20 22:43 21% ` [PATCH 8/8] git submodule update: Have a dedicated helper for cloning Stefan Beller
  1 sibling, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-20 22:43 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, Stefan Beller

We need the submodule update strategies in a later patch.

Signed-off-by: Stefan Beller <sbeller@google.com>
---

This may conflict with origin/sb/submodule-config-parse, but only on a
syntactical level (this adds an else if {...} just after the refactored code).
There is no clash of functionality or semantics.

 submodule-config.c | 11 +++++++++++
 submodule-config.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/submodule-config.c b/submodule-config.c
index 393de53..175bcbb 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -194,6 +194,7 @@ static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
 
 	submodule->path = NULL;
 	submodule->url = NULL;
+	submodule->update = NULL;
 	submodule->fetch_recurse = RECURSE_SUBMODULES_NONE;
 	submodule->ignore = NULL;
 
@@ -326,6 +327,16 @@ static int parse_config(const char *var, const char *value, void *data)
 		free((void *) submodule->url);
 		strbuf_addstr(&url, value);
 		submodule->url = strbuf_detach(&url, NULL);
+	} else if (!strcmp(item.buf, "update")) {
+		if (!value)
+			ret = config_error_nonbool(var);
+		else if (!me->overwrite && submodule->update != NULL)
+			warn_multiple_config(me->commit_sha1, submodule->name,
+					     "update");
+		else {
+			free((void *)submodule->update);
+			submodule->update = xstrdup(value);
+		}
 	}
 
 release_return:
diff --git a/submodule-config.h b/submodule-config.h
index 9061e4e..f9e2a29 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -14,6 +14,7 @@ struct submodule {
 	const char *url;
 	int fetch_recurse;
 	const char *ignore;
+	const char *update;
 	/* the sha1 blob id of the responsible .gitmodules file */
 	unsigned char gitmodules_sha1[20];
 };
-- 
2.5.0.275.gbfc1651.dirty
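The two pieces of update-strategy logic in this series can be sketched on their own. Patch 7 stores `submodule.<name>.update`: a missing value is an error, and a repeated value is only taken when `overwrite` is set (otherwise the patch warns). Patch 8 resolves the effective mode as `--update` first, then the configured value, then `checkout`. This is a standalone sketch. `set_update`, `effective_update` and the result codes are hypothetical names, and plain `malloc` stands in for git's `xstrdup`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum set_result { SET_OK, SET_ERROR_NONBOOL, SET_IGNORED_DUPLICATE };

/*
 * Store 'value' in *slot following the rules of patch 7's "update" branch:
 * no value is an error, a second value is ignored (the patch warns) unless
 * 'overwrite' is set, otherwise the old string is replaced by a copy.
 */
static enum set_result set_update(char **slot, const char *value, int overwrite)
{
	char *copy;

	if (!value)
		return SET_ERROR_NONBOOL;
	if (!overwrite && *slot)
		return SET_IGNORED_DUPLICATE;
	copy = malloc(strlen(value) + 1);
	if (!copy)
		abort();	/* git's xstrdup() dies on allocation failure too */
	strcpy(copy, value);
	free(*slot);
	*slot = copy;
	return SET_OK;
}

/* Patch 8's precedence: --update, then the configured value, then "checkout". */
static const char *effective_update(const char *cmdline, const char *configured)
{
	if (cmdline)
		return cmdline;
	if (configured)
		return configured;
	return "checkout";
}
```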


* [PATCH 0/8] Fixes for the parallel processing engine and git submodule update
@ 2015-10-20 22:43  9% Stefan Beller
  2015-10-20 22:43 26% ` [PATCH 7/8] submodule config: Keep update strategy around Stefan Beller
  2015-10-20 22:43 21% ` [PATCH 8/8] git submodule update: Have a dedicated helper for cloning Stefan Beller
  0 siblings, 2 replies; 200+ results
From: Stefan Beller @ 2015-10-20 22:43 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, Stefan Beller

Patches 1-6 replace the last 6 patches of sb/submodule-parallel-fetch
(patches 1 and 2 changed code, 3 and 4 stayed as is, 5 has a longer
commit message, and 6 is the same again).

Patches 7 and 8 are new in this series.
Patch 7 keeps the update strategy around in the cached submodule structs.
Patch 8 rewrites a small part of the git submodule update script in C
by adding another, larger helper function to builtin/submodule--helper.c
which takes care of cloning new submodules, without all the
intermediate steps of previous versions of this series.

Patch 8 is just a rewrite/translation, though; it does not enable parallel
processing yet. That will be done in a later patch, once we have
bikeshedded enough about how to name the user-facing option for it.
(I guess the CLI option would be --jobs again, but I'd rather hint at
the config option)
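Whatever the option ends up being called, the precedence would plausibly be the usual one. An explicit command-line value wins over configuration, which wins over a default of 1. That default is today's sequential behavior: patch 8 passes a hard-coded 1 to `run_processes_parallel()`. Below is a minimal sketch, assuming a `--jobs`-style integer and a config value given as a string. `resolve_jobs` is hypothetical and no config key name is implied.

```c
#include <limits.h>
#include <stdlib.h>

/*
 * Decide how many clones to run at once: an explicit command-line value
 * (> 0) wins, then a valid positive integer from config, then 1.
 * In this sketch a command-line value of 0 or less means "not given".
 */
static int resolve_jobs(int cmdline_jobs, const char *config_value)
{
	if (cmdline_jobs > 0)
		return cmdline_jobs;
	if (config_value) {
		char *end;
		long v = strtol(config_value, &end, 10);

		/* accept only a whole, positive number that fits in an int */
		if (end != config_value && !*end && v > 0 && v <= INT_MAX)
			return (int)v;
	}
	return 1;
}
```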

This supersedes 
[RFC PATCHv1 00/12] git submodule update in C with parallel cloning

Any feedback welcome!
Thanks,
Stefan

Stefan Beller (8):
  run-command: Fix early shutdown
  run-command: Call get_next_task with a clean child process.
  run-command: Initialize the shutdown flag
  test-run-command: Test for gracefully aborting
  test-run-command: Increase test coverage
  run-command: Fix missing output from late callbacks
  submodule config: Keep update strategy around
  git submodule update: Have a dedicated helper for cloning

 builtin/submodule--helper.c | 222 ++++++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh            |  45 +++------
 run-command.c               |  27 +++++-
 submodule-config.c          |  11 +++
 submodule-config.h          |   1 +
 t/t0061-run-command.sh      |  37 +++++++-
 t/t7400-submodule-basic.sh  |   4 +-
 test-run-command.c          |  37 +++++++-
 8 files changed, 340 insertions(+), 44 deletions(-)

-- 
2.5.0.275.gbfc1651.dirty


* Re: [PATCH 8/8] git submodule update: Have a dedicated helper for cloning
  2015-10-20 22:43 21% ` [PATCH 8/8] git submodule update: Have a dedicated helper for cloning Stefan Beller
@ 2015-10-21 20:47  4%   ` Junio C Hamano
  2015-10-21 21:06  7%     ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2015-10-21 20:47 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, ramsay, jacob.keller, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Stefan Beller <sbeller@google.com> writes:

> This introduces a new helper function in git submodule--helper
> which takes care of cloning all submodules, which we want to
> parallelize eventually.
>
> Some tests (such as empty URL, update_mode==none) are required in the
> helper to make the decision for cloning. These checks have been moved
> into the C function as well. (No need to repeat them in the shell
> script)
>
> As we can only access the stderr channel from within the parallel
> processing engine, so we need to reroute the error message for
> specified but initialized submodules to stderr. As it is an error
> message, this should have gone to stderr in the first place, so a
> bug fix along the way.

The last paragraph is hard to parse; perhaps it is slightly
ungrammatical.

It would be a really good idea to split the small bit to redirect
the output that should have gone to the standard error to where it
should as a preparatory step before showing this patch.

I sense that this one is still a WIP/RFC, so I'll only skim it in
this round (but I may come back and read it again later with finer
toothed comb).

> +static int get_next_task(void **pp_task_cb,
> +			 struct child_process *cp,
> +			 struct strbuf *err,
> +			 void *pp_cb)

Will you have only one caller of the parallel run-command API in
this file, or will you be adding more to allow various different
operations run in parallel as more things are rewritten?  I am
guessing that it would be the latter, but if that is the case,
perhaps the function wants to be named a bit more specifically for
this first user, no?  Same for start_failure and task_finished.

> diff --git a/git-submodule.sh b/git-submodule.sh
> index 8b0eb9a..ea883b9 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -655,17 +655,18 @@ cmd_update()
>  		cmd_init "--" "$@" || return
>  	fi
>  
> -	cloned_modules=
> -	git submodule--helper list --prefix "$wt_prefix" "$@" | {
> +	git submodule--helper update-clone ${GIT_QUIET:+--quiet} \
> +		${wt_prefix:+--prefix "$wt_prefix"} \
> +		${prefix:+--recursive_prefix "$prefix"} \
> +		${update:+--update "$update"} \
> +		${reference:+--reference "$reference"} \
> +		${depth:+--depth "$depth"} \
> +		"$@" | {
>  	err=
> -	while read mode sha1 stage sm_path
> +	while read mode sha1 stage just_cloned sm_path
>  	do

I wonder if you really want this to be upstream of a pipe.  When the
downstream loop needs to abort, what happens to the remainder of the
"clone" part of the processing that is still ongoing in the upstream
of the pipe?  I would imagine that the "update-clone" network
accessing phase is the more human-time consuming part, so I suspect
that it would be much better to let the cloning part go and finish
first (during which time the human-user can spend time for other
things, like getting a cup of coffee or filling in expense reports),
before moving to the loop that can stop and ask the human-user for
help.

The fix for the above could be trivial (do not pipe, just take the
output to a temporary file, and then feed the "while read" loop from
that temporary file), and I suspect it would make a big difference
for usability.
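In C terms, the suggested restructuring could look like the standalone sketch below. In git-submodule.sh itself it would simply redirect the `update-clone` output into a temporary file, then run the `while read` loop with its input taken from that file. `produce_all` and `consume_until` are hypothetical names standing in for the clone phase and the checkout/merge loop.

```c
#include <stdio.h>
#include <string.h>

/*
 * Phase 1 (the slow, network-bound part): write every record up front
 * into an anonymous temporary file instead of a pipe.
 * Returns 0 on success, nonzero on a write/flush error.
 */
static int produce_all(FILE *tmp, const char *const *names, int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		if (fprintf(tmp, "%s\n", names[i]) < 0)
			return -1;
	return fflush(tmp);
}

/*
 * Phase 2 (may stop early, e.g. to ask the user for help): read the
 * records back. Stopping here cannot cut phase 1 short, because phase 1
 * has already finished. Returns the number of records processed.
 */
static int consume_until(FILE *tmp, const char *stop_at)
{
	char line[256];
	int processed = 0;

	rewind(tmp);
	while (fgets(line, sizeof(line), tmp)) {
		line[strcspn(line, "\n")] = '\0';
		if (stop_at && !strcmp(line, stop_at))
			break;
		processed++;
	}
	return processed;
}
```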

Thanks.


* Re: [PATCH 8/8] git submodule update: Have a dedicated helper for cloning
  2015-10-21 20:47  4%   ` Junio C Hamano
@ 2015-10-21 21:06  7%     ` Stefan Beller
  2015-10-21 21:23  6%       ` Junio C Hamano
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-21 21:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git@vger.kernel.org, Ramsay Jones, Jacob Keller, Jeff King,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann, Eric Sunshine

On Wed, Oct 21, 2015 at 1:47 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> This introduces a new helper function in git submodule--helper
>> which takes care of cloning all submodules, which we want to
>> parallelize eventually.
>>
>> Some tests (such as empty URL, update_mode==none) are required in the
>> helper to make the decision for cloning. These checks have been moved
>> into the C function as well. (No need to repeat them in the shell
>> script)
>>
>> As we can only access the stderr channel from within the parallel
>> processing engine, so we need to reroute the error message for
>> specified but initialized submodules to stderr. As it is an error
>> message, this should have gone to stderr in the first place, so a
>> bug fix along the way.
>
> The last paragraph is hard to parse; perhaps it is slightly
> ungrammatical.

I seem to have picked up a habit of starting my sentences with "so..."
even in spoken English. Without it, this may be easier to read:

    As we can only access the stderr channel from within the parallel
    processing engine, we need to reroute the error message for
    "specified but uninitialized submodules" to stderr. As it is an error
    message, this should have gone to stderr in the first place.
    It's a bug fix along the way.

>
> It would be a really good idea to split the small bit to redirect
> the output that should have gone to the standard error to where it
> should as a preparatory step before showing this patch.

ok.

>
> I sense that this one is still a WIP/RFC, so I'll only skim it in
> this round (but I may come back and read it again later with finer
> toothed comb).
>
>> +static int get_next_task(void **pp_task_cb,
>> +                      struct child_process *cp,
>> +                      struct strbuf *err,
>> +                      void *pp_cb)
>
> Will you have only one caller of the parallel run-command API in
> this file, or will you be adding more to allow various different
> operations run in parallel as more things are rewritten?  I am
> guessing that it would be the latter, but if that is the case,
> perhaps the function wants to be named a bit more specifically for
> this first user, no?  Same for start_failure and task_finished.

Ok, will rename.
Although I am not sure if I need to rewrite more in C for "git submodule".

I only rewrote "git submodule update" because "git clone --recurse" just
blindly calls out to "git submodule update". So instead of parallelizing
"submodule update", I could have put a parallel submodule clone into
the clone command. (That looks strangely appealing now, because it
may be even faster: there is no downstream pipe with sequential
checkouts, so you could have one parallel pool with chained clone
and checkout commands.)

>
>> diff --git a/git-submodule.sh b/git-submodule.sh
>> index 8b0eb9a..ea883b9 100755
>> --- a/git-submodule.sh
>> +++ b/git-submodule.sh
>> @@ -655,17 +655,18 @@ cmd_update()
>>               cmd_init "--" "$@" || return
>>       fi
>>
>> -     cloned_modules=
>> -     git submodule--helper list --prefix "$wt_prefix" "$@" | {
>> +     git submodule--helper update-clone ${GIT_QUIET:+--quiet} \
>> +             ${wt_prefix:+--prefix "$wt_prefix"} \
>> +             ${prefix:+--recursive_prefix "$prefix"} \
>> +             ${update:+--update "$update"} \
>> +             ${reference:+--reference "$reference"} \
>> +             ${depth:+--depth "$depth"} \
>> +             "$@" | {
>>       err=
>> -     while read mode sha1 stage sm_path
>> +     while read mode sha1 stage just_cloned sm_path
>>       do
>
> I wonder if you really want this to be upstream of a pipe.  When the
> downstream loop needs to abort, what happens to the remainder of the
> "clone" part of the processing that is still ongoing in the upstream
> of the pipe?  I would imagine that the "update-clone" network
> accessing phase is the more human-time consuming part, so I suspect
> that it would be much better to let the cloning part go and finish
> first (during which time the human-user can spend time on other
> things, like getting a cup of coffee or filling out expense reports)
> before moving to the loop that can stop and ask the human-user for
> help.
>
> The fix for the above could be trivial (do not pipe, just take the
> output to a temporary file, and then feed the "while read" loop from
> that temporary file), and I suspect it would make a big difference
> for usability.

I'd like to counter your argument by quoting code from the update_clone
method:

     run_processes_parallel(1, get_next_task, start_failure,
                            task_finished, &pp);

     if (pp.print_unmatched) {
         printf("#unmatched\n");
         return 1;
     }

     for_each_string_list_item(item, &pp.projectlines) {
         utf8_fprintf(stdout, "%s", item->string);
     }

So we already do all the cloning first, and only once all of that is done
do we put out the accumulated lines of text. (It was harder to come up with
a suitable file name than to just store the lines in memory. I don't think
memory is an issue here: it is only on the order of a hundred bytes per
submodule, so even 1000 submodules would consume maybe 100kB.)

Having a file, though, would allow us to continue after human interaction
has fixed one problem.
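A minimal C sketch of the file-based design, assuming the accumulated project lines are spooled to a file and a resumed run skips the entries it already handled. The helper names spool_lines() and remaining_after() are hypothetical, not part of submodule--helper. tmpfile() only keeps the sketch self-contained; resuming after human intervention would need a named file that survives between invocations, and where to put it is left open.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write every accumulated line to fp and rewind
 * it, so that a later phase can read the list back.
 * Returns 0 on success, -1 on a write error.
 */
static int spool_lines(FILE *fp, const char *const *lines, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (fputs(lines[i], fp) == EOF)
			return -1;
	if (fflush(fp))
		return -1;
	rewind(fp);
	return 0;
}

/*
 * Hypothetical helper: re-read the spooled list, skipping the first
 * `done` entries that an earlier (interrupted) run already handled.
 * Returns how many entries are still left to process.
 * Assumes every line is shorter than the buffer and ends in '\n'.
 */
static size_t remaining_after(FILE *fp, size_t done)
{
	char buf[4096];
	size_t seen = 0, left = 0;

	rewind(fp);
	while (fgets(buf, sizeof(buf), fp)) {
		if (seen++ < done)
			continue;
		left++;	/* a real caller would update this submodule here */
	}
	return left;
}
```

Because the list lives in a file rather than a pipe, a second invocation could pick up at the first entry that still needs attention instead of starting over.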




>
> Thanks.

^ permalink raw reply	[relevance 7%]

* Re: [PATCH 8/8] git submodule update: Have a dedicated helper for cloning
  2015-10-21 21:06  7%     ` Stefan Beller
@ 2015-10-21 21:23  6%       ` Junio C Hamano
  2015-10-21 22:14  7%         ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2015-10-21 21:23 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Ramsay Jones, Jacob Keller, Jeff King,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann, Eric Sunshine

Stefan Beller <sbeller@google.com> writes:

> I'd like to counter your argument with quoting code from update_clone
> method:
> 
>      run_processes_parallel(1, get_next_task, start_failure,
>                             task_finished, &pp);
>
>      if (pp.print_unmatched) {
>          printf("#unmatched\n");
>          return 1;
>      }
>
>      for_each_string_list_item(item, &pp.projectlines) {
>          utf8_fprintf(stdout, "%s", item->string);
>      }
>
> So we do already all the cloning first, and then once we did all of that
> we just put out all accumulated lines of text. (It was harder to come up with
> a sufficient file name than just storing it in memory. I don't think
> memory is an issue here, only a few bytes per submodule. So even 1000
> submodules would consume maybe 100kB)

That does not sound like a counter-argument; two bad design choices
compensating each other's shortcomings, perhaps ;-)

> Having a file though would allow us to continue after human
> interaction fixed one problem.

Yes.  That does sound like a better design.

This obviously depends on the impact to the other part of what
cmd_update() does, but your earlier idea to investigate the
feasibility and usefulness of updating "clone --recurse-submodules"
does sound like a good thing to do, too.  That's an excellent point.

^ permalink raw reply	[relevance 6%]

* Re: [PATCH 8/8] git submodule update: Have a dedicated helper for cloning
  2015-10-21 21:23  6%       ` Junio C Hamano
@ 2015-10-21 22:14  7%         ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-21 22:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git@vger.kernel.org, Ramsay Jones, Jacob Keller, Jeff King,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann, Eric Sunshine

On Wed, Oct 21, 2015 at 2:23 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> I'd like to counter your argument with quoting code from update_clone
>> method:
>>
>>      run_processes_parallel(1, get_next_task, start_failure,
>>                             task_finished, &pp);
>>
>>      if (pp.print_unmatched) {
>>          printf("#unmatched\n");
>>          return 1;
>>      }
>>
>>      for_each_string_list_item(item, &pp.projectlines) {
>>          utf8_fprintf(stdout, "%s", item->string);
>>      }
>>
>> So we do already all the cloning first, and then once we did all of that
>> we just put out all accumulated lines of text. (It was harder to come up with
>> a sufficient file name than just storing it in memory. I don't think
>> memory is an issue here, only a few bytes per submodule. So even 1000
>> submodules would consume maybe 100kB)
>
> That does not sound like a counter-argument; two bad design choices
> compensating each other's shortcomings, perhaps ;-)

I phrased it worse than I meant to. I should have pointed out the
positive aspect of having all the clones done first, and the local
post-processing in the downstream pipe afterwards.

>
>> Having a file though would allow us to continue after human
>> interaction fixed one problem.
>
> Yes.  That does sound like a better design.

I don't think the proposed patches make it worse than it already is.
Before, we had "submodule--helper list", which told the downstream loop
to do everything. Now we just take the cloning out and make it an
upfront action, possibly faster due to parallelism.

Also, I think we should not promote "git submodule", and especially
its update subcommand, as the best command available. Ideally
we would rather implement implicit submodule handling in the
other commands such as clone, pull, fetch, checkout, merge, and rebase,
and by that I mean better defaults than just "don't touch the submodules,
as that's the safest thing to do now".

>
> This obviously depends on the impact to the other part of what
> cmd_update() does, but your earlier idea to investigate the
> feasibility and usefulness of updating "clone --recurse-submodules"
> does sound like a good thing to do, too.  That's an excellent point.

I investigated and I think it's a bad idea now :)
Because of the --recursive switch we would need to do more than just

    submodules_init()
    run_parallel(.. clone_and_checkout...);

but each cloned submodule would need to be inspected for recursive
submodules again and then we would need to add that to the list of
submodules to process.
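The extra work described above can be sketched as a work list that grows while it is being processed. Handling one entry ("clone and check out") may reveal nested submodules, which are then appended to the same list. This is a sequential, hypothetical C sketch. The made-up struct fake_module stands in for reading the .gitmodules of each freshly checked-out submodule. With a parallel pool, entries appended by a finished task would have to be handed out by later calls to the task callback.

```c
#include <string.h>

#define MAX_QUEUE 64

/* Hypothetical description of a submodule and the ones nested in it. */
struct fake_module {
	const char *path;
	const char *children[4];	/* NULL-terminated */
};

static const struct fake_module *find_module(const struct fake_module *all,
					     size_t n, const char *path)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!strcmp(all[i].path, path))
			return &all[i];
	return NULL;
}

/*
 * Process a work list in which handling one entry can discover more
 * entries (its nested submodules), which are appended to the same list.
 * Writes the paths in processing order to `order` (room for MAX_QUEUE
 * entries) and returns how many entries were processed.
 */
static size_t process_recursively(const struct fake_module *all, size_t n,
				  const char **top, size_t ntop,
				  const char **order)
{
	const char *queue[MAX_QUEUE];
	size_t head = 0, tail = 0, i;

	for (i = 0; i < ntop && tail < MAX_QUEUE; i++)
		queue[tail++] = top[i];

	while (head < tail) {
		const char *path = queue[head];
		const struct fake_module *m = find_module(all, n, path);

		order[head++] = path;	/* "clone and check out" this path */
		if (!m)
			continue;
		/* Nested submodules are only known after the checkout. */
		for (i = 0; i < 4 && m->children[i] && tail < MAX_QUEUE; i++)
			queue[tail++] = m->children[i];
	}
	return head;
}
```

The nested entries can only be discovered after their parent has been checked out. That is why a plain "init, then run everything in parallel" plan does not cover --recursive.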

I estimate this is about as much work as improving "git submodule update"
to do uncontroversial checkouts in the first parallel phase.

^ permalink raw reply	[relevance 7%]

* Re: Make "git checkout" automatically update submodules?
  @ 2015-10-23 17:20  7% ` Stefan Beller
  2015-10-23 19:11  6%   ` Junio C Hamano
  2015-10-23 22:51  4%   ` Kannan Goundan
  0 siblings, 2 replies; 200+ results
From: Stefan Beller @ 2015-10-23 17:20 UTC (permalink / raw)
  To: Kannan Goundan; +Cc: git@vger.kernel.org, Jonathan Nieder, Jens Lehmann

On Thu, Oct 15, 2015 at 3:50 PM, Kannan Goundan <kannan@cakoose.com> wrote:
> Git submodules seem to be a great fit for one of our repos.  The biggest
> pain point is that it's too easy to forget to update submodules.
>
> 1. I often forget since most repos don't need it.
> 2. Infrequent users of our repo almost never know to update submodules and
> end up coming to us with strange build errors.
> 3. Existing scripts that work with Git repos are usually not built to handle
> submodules.
>
> In the common case of the submodule content having no local edits, it would
> be nice if "git checkout" automatically updated submodules [1].  If there
> are local edits, it could error out (maybe override with
> "--ignore-modified-submodules" or something).
>
> I'm not a Git expert, though.  Is there a reason something like this isn't
> already implemented?  Maybe there's an existing write-up or mailing list
> thread I can read to get some background information?
>
> Thanks!
>
> [1] Our post-checkout procedure is:
>
>     git submodule sync
>     git submodule update --init
>     git submodule foreach --recursive \
>       'git submodule sync ; git submodule update --init'
>
> (Not sure if this is correct.  Different articles/blogs suggest a slightly
> different set of commands.)
>

Check out [1]. There are lots of good patches there, but they are hard to find.
(Including, but not limited to, a recursive git checkout enhancement!)

That said I've recently started working on submodules, too.
I am trying to push my work upstream as fast as possible
as that works best for us.

[1] https://github.com/jlehmann/git-submod-enhancements/wiki

^ permalink raw reply	[relevance 7%]

* [PATCH 1/3] git submodule update: have a dedicated helper for cloning
  2015-10-23 18:44 10% [PATCH 0/3] expose parallelism for submodule {update, clone} Stefan Beller
@ 2015-10-23 18:44 21% ` Stefan Beller
  2015-10-23 18:44 20% ` [PATCH 2/3] submodule update: Expose parallelism to the user Stefan Beller
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-23 18:44 UTC (permalink / raw)
  To: git, gitster; +Cc: jrnieder, Jens.Lehmann, Stefan Beller

This introduces a new helper function in git submodule--helper
which takes care of cloning all submodules, which we want to
parallelize eventually.

Some tests (such as empty URL, update_mode=none) are required in the
helper to make the decision for cloning. These checks have been
moved into the C function as well (no need to repeat them in the
shell script).

As we can only access the stderr channel from within the parallel
processing engine, we need to reroute the error message for
specified but uninitialized submodules to stderr. As it is an error
message, this should have gone to stderr in the first place, so it
is a bug fix along the way.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/submodule--helper.c | 225 ++++++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh            |  45 +++------
 t/t7400-submodule-basic.sh  |   4 +-
 3 files changed, 238 insertions(+), 36 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index f4c3eff..e6bce76 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -255,6 +255,230 @@ static int module_clone(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
+static int git_submodule_config(const char *var, const char *value, void *cb)
+{
+	return parse_submodule_config_option(var, value);
+}
+
+struct submodule_update_clone {
+	int count;
+	int quiet;
+	int print_unmatched;
+	char *reference;
+	char *depth;
+	char *update;
+	const char *recursive_prefix;
+	const char *prefix;
+	struct module_list list;
+	struct string_list projectlines;
+	struct pathspec pathspec;
+};
+#define SUBMODULE_UPDATE_CLONE_INIT {0, 0, 0, NULL, NULL, NULL, NULL, NULL, MODULE_LIST_INIT, STRING_LIST_INIT_DUP}
+
+static void fill_clone_command(struct child_process *cp, int quiet,
+			       const char *prefix, const char *path,
+			       const char *name, const char *url,
+			       const char *reference, const char *depth)
+{
+	cp->git_cmd = 1;
+	cp->no_stdin = 1;
+	cp->stdout_to_stderr = 1;
+	cp->err = -1;
+	argv_array_push(&cp->args, "submodule--helper");
+	argv_array_push(&cp->args, "clone");
+	if (quiet)
+		argv_array_push(&cp->args, "--quiet");
+
+	if (prefix) {
+		argv_array_push(&cp->args, "--prefix");
+		argv_array_push(&cp->args, prefix);
+	}
+	argv_array_push(&cp->args, "--path");
+	argv_array_push(&cp->args, path);
+
+	argv_array_push(&cp->args, "--name");
+	argv_array_push(&cp->args, name);
+
+	argv_array_push(&cp->args, "--url");
+	argv_array_push(&cp->args, url);
+	if (reference)
+		argv_array_push(&cp->args, reference);
+	if (depth)
+		argv_array_push(&cp->args, depth);
+}
+
+static int update_clone_get_next_task(void **pp_task_cb,
+				      struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	for (; pp->count < pp->list.nr; pp->count++) {
+		const struct submodule *sub = NULL;
+		const char *displaypath = NULL;
+		const struct cache_entry *ce = pp->list.entries[pp->count];
+		struct strbuf sb = STRBUF_INIT;
+		const char *update_module = NULL;
+		const char *url = NULL;
+		int just_cloned = 0;
+
+		if (ce_stage(ce)) {
+			if (pp->recursive_prefix)
+				strbuf_addf(err, "Skipping unmerged submodule %s/%s\n",
+					pp->recursive_prefix, ce->name);
+			else
+				strbuf_addf(err, "Skipping unmerged submodule %s\n",
+					ce->name);
+			continue;
+		}
+
+		sub = submodule_from_path(null_sha1, ce->name);
+		if (pp->recursive_prefix)
+			displaypath = relative_path(pp->recursive_prefix, ce->name, &sb);
+		else
+			displaypath = ce->name;
+
+		if (pp->update)
+			update_module = pp->update;
+		if (!update_module)
+			update_module = sub->update;
+		if (!update_module)
+			update_module = "checkout";
+		if (!strcmp(update_module, "none")) {
+			strbuf_addf(err, "Skipping submodule '%s'\n", displaypath);
+			continue;
+		}
+
+		/*
+		 * Looking up the url in .git/config.
+		 * We cannot fall back to .gitmodules as we only want to process
+		 * configured submodules. This renders the submodule lookup API
+		 * useless, as it cannot lookup without fallback.
+		 */
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "submodule.%s.url", sub->name);
+		git_config_get_string_const(sb.buf, &url);
+		if (!url) {
+			/*
+			 * Only mention uninitialized submodules when its
+			 * path have been specified
+			 */
+			if (pp->pathspec.nr)
+				strbuf_addf(err, _("Submodule path '%s' not initialized\n"
+					"Maybe you want to use 'update --init'?"), displaypath);
+			continue;
+		}
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%s/.git", ce->name);
+		just_cloned = !file_exists(sb.buf);
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%06o %s %d %d\t%s\n", ce->ce_mode,
+				sha1_to_hex(ce->sha1), ce_stage(ce),
+				just_cloned, ce->name);
+		string_list_append(&pp->projectlines, sb.buf);
+
+		if (just_cloned) {
+			fill_clone_command(cp, pp->quiet, pp->prefix, ce->name,
+					   sub->name, url, pp->reference, pp->depth);
+			pp->count++;
+			return 1;
+		}
+	}
+	return 0;
+}
+
+static int update_clone_start_failure(struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb,
+				      void *pp_task_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	strbuf_addf(err, "error when starting a child process");
+	pp->print_unmatched = 1;
+
+	return 1;
+}
+
+static int update_clone_task_finished(int result,
+				      struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb,
+				      void *pp_task_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	if (!result) {
+		return 0;
+	} else {
+		strbuf_addf(err, "error in one child process");
+		pp->print_unmatched = 1;
+		return 1;
+	}
+}
+
+static int update_clone(int argc, const char **argv, const char *prefix)
+{
+	struct string_list_item *item;
+	struct submodule_update_clone pp = SUBMODULE_UPDATE_CLONE_INIT;
+
+	struct option module_list_options[] = {
+		OPT_STRING(0, "prefix", &prefix,
+			   N_("path"),
+			   N_("path into the working tree")),
+		OPT_STRING(0, "recursive_prefix", &pp.recursive_prefix,
+			   N_("path"),
+			   N_("path into the working tree, across nested "
+			      "submodule boundaries")),
+		OPT_STRING(0, "update", &pp.update,
+			   N_("string"),
+			   N_("update command for submodules")),
+		OPT_STRING(0, "reference", &pp.reference, "<repository>",
+			   N_("Use the local reference repository "
+			      "instead of a full clone")),
+		OPT_STRING(0, "depth", &pp.depth, "<depth>",
+			   N_("Create a shallow clone truncated to the "
+			      "specified number of revisions")),
+		OPT__QUIET(&pp.quiet, N_("do't print cloning progress")),
+		OPT_END()
+	};
+
+	const char *const git_submodule_helper_usage[] = {
+		N_("git submodule--helper list [--prefix=<path>] [<path>...]"),
+		NULL
+	};
+	pp.prefix = prefix;
+
+	argc = parse_options(argc, argv, prefix, module_list_options,
+			     git_submodule_helper_usage, 0);
+
+	if (module_list_compute(argc, argv, prefix, &pp.pathspec, &pp.list) < 0) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	gitmodules_config();
+	/* Overlay the parsed .gitmodules file with .git/config */
+	git_config(git_submodule_config, NULL);
+	run_processes_parallel(1, update_clone_get_next_task,
+				  update_clone_start_failure,
+				  update_clone_task_finished,
+				  &pp);
+
+	if (pp.print_unmatched) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	for_each_string_list_item(item, &pp.projectlines) {
+		utf8_fprintf(stdout, "%s", item->string);
+	}
+	return 0;
+}
+
 struct cmd_struct {
 	const char *cmd;
 	int (*fn)(int, const char **, const char *);
@@ -264,6 +488,7 @@ static struct cmd_struct commands[] = {
 	{"list", module_list},
 	{"name", module_name},
 	{"clone", module_clone},
+	{"update-clone", update_clone}
 };
 
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
diff --git a/git-submodule.sh b/git-submodule.sh
index 8b0eb9a..ea883b9 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -655,17 +655,18 @@ cmd_update()
 		cmd_init "--" "$@" || return
 	fi
 
-	cloned_modules=
-	git submodule--helper list --prefix "$wt_prefix" "$@" | {
+	git submodule--helper update-clone ${GIT_QUIET:+--quiet} \
+		${wt_prefix:+--prefix "$wt_prefix"} \
+		${prefix:+--recursive_prefix "$prefix"} \
+		${update:+--update "$update"} \
+		${reference:+--reference "$reference"} \
+		${depth:+--depth "$depth"} \
+		"$@" | {
 	err=
-	while read mode sha1 stage sm_path
+	while read mode sha1 stage just_cloned sm_path
 	do
 		die_if_unmatched "$mode"
-		if test "$stage" = U
-		then
-			echo >&2 "Skipping unmerged submodule $prefix$sm_path"
-			continue
-		fi
+
 		name=$(git submodule--helper name "$sm_path") || exit
 		url=$(git config submodule."$name".url)
 		branch=$(get_submodule_config "$name" branch master)
@@ -682,27 +683,10 @@ cmd_update()
 
 		displaypath=$(relative_path "$prefix$sm_path")
 
-		if test "$update_module" = "none"
-		then
-			echo "Skipping submodule '$displaypath'"
-			continue
-		fi
-
-		if test -z "$url"
-		then
-			# Only mention uninitialized submodules when its
-			# path have been specified
-			test "$#" != "0" &&
-			say "$(eval_gettext "Submodule path '\$displaypath' not initialized
-Maybe you want to use 'update --init'?")"
-			continue
-		fi
-
-		if ! test -d "$sm_path"/.git && ! test -f "$sm_path"/.git
+		if test $just_cloned -eq 1
 		then
-			git submodule--helper clone ${GIT_QUIET:+--quiet} --prefix "$prefix" --path "$sm_path" --name "$name" --url "$url" "$reference" "$depth" || exit
-			cloned_modules="$cloned_modules;$name"
 			subsha1=
+			update_module=checkout
 		else
 			subsha1=$(clear_local_git_env; cd "$sm_path" &&
 				git rev-parse --verify HEAD) ||
@@ -742,13 +726,6 @@ Maybe you want to use 'update --init'?")"
 				die "$(eval_gettext "Unable to fetch in submodule path '\$displaypath'")"
 			fi
 
-			# Is this something we just cloned?
-			case ";$cloned_modules;" in
-			*";$name;"*)
-				# then there is no local change to integrate
-				update_module=checkout ;;
-			esac
-
 			must_die_on_failure=
 			case "$update_module" in
 			checkout)
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index 540771c..5991e3c 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -462,7 +462,7 @@ test_expect_success 'update --init' '
 	git config --remove-section submodule.example &&
 	test_must_fail git config submodule.example.url &&
 
-	git submodule update init > update.out &&
+	git submodule update init 2> update.out &&
 	cat update.out &&
 	test_i18ngrep "not initialized" update.out &&
 	test_must_fail git rev-parse --resolve-git-dir init/.git &&
@@ -480,7 +480,7 @@ test_expect_success 'update --init from subdirectory' '
 	mkdir -p sub &&
 	(
 		cd sub &&
-		git submodule update ../init >update.out &&
+		git submodule update ../init 2>update.out &&
 		cat update.out &&
 		test_i18ngrep "not initialized" update.out &&
 		test_must_fail git rev-parse --resolve-git-dir ../init/.git &&
-- 
2.6.2.280.g74301d6
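update-clone now prints one line per submodule in the format "%06o %s %d %d\t%s\n" (mode, sha1, stage, just_cloned, path). Any consumer has to split lines in that format. In git-submodule.sh that consumer is the `while read mode sha1 stage just_cloned sm_path` loop. The following is a hypothetical C counterpart. parse_project_line() and struct project_line are illustrative names, not Git API. The sketch assumes paths fit in 255 bytes and contain no newline.

```c
#include <stdio.h>
#include <string.h>

/* Fields of one line as printed by update-clone (illustrative struct). */
struct project_line {
	unsigned int mode;
	char sha1[41];
	int stage;
	int just_cloned;
	char path[256];
};

/*
 * Parse "%06o %s %d %d\t%s\n".  Returns 0 on success and -1 for a line
 * that does not match, such as the "#unmatched" marker.
 */
static int parse_project_line(const char *line, struct project_line *out)
{
	int consumed = 0;
	const char *path;
	size_t len;

	if (sscanf(line, "%o %40s %d %d%n", &out->mode, out->sha1,
		   &out->stage, &out->just_cloned, &consumed) != 4)
		return -1;
	if (line[consumed] != '\t')
		return -1;
	/* Everything after the tab, up to the newline, is the path. */
	path = line + consumed + 1;
	len = strcspn(path, "\n");
	if (!len || len >= sizeof(out->path))
		return -1;
	memcpy(out->path, path, len);
	out->path[len] = '\0';
	return 0;
}
```

Taking everything after the tab as the path keeps paths with embedded spaces intact. This mirrors how `read` assigns the rest of the line to its last variable.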

^ permalink raw reply related	[relevance 21%]

* [PATCH 2/3] submodule update: Expose parallelism to the user
  2015-10-23 18:44 10% [PATCH 0/3] expose parallelism for submodule {update, clone} Stefan Beller
  2015-10-23 18:44 21% ` [PATCH 1/3] git submodule update: have a dedicated helper for cloning Stefan Beller
@ 2015-10-23 18:44 20% ` Stefan Beller
  2015-10-23 18:44 19% ` [PATCH 3/3] clone: Allow an explicit argument for parallel submodule clones Stefan Beller
  2015-10-23 19:25  7% ` [PATCH 0/3] expose parallelism for submodule {update, clone} Junio C Hamano
  3 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-23 18:44 UTC (permalink / raw)
  To: git, gitster; +Cc: jrnieder, Jens.Lehmann, Stefan Beller

Expose possible parallelism either via the "--jobs" CLI parameter or
the "submodule.jobs" setting.

By initializing the variable to -1, we make sure that 0 can be passed
into the parallel processing machinery, which will then pick as many
parallel workers as there are CPUs.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/git-submodule.txt |  6 +++++-
 builtin/submodule--helper.c     | 17 +++++++++++++----
 git-submodule.sh                |  9 +++++++++
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index f17687e..f5429fa 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -16,7 +16,7 @@ SYNOPSIS
 'git submodule' [--quiet] deinit [-f|--force] [--] <path>...
 'git submodule' [--quiet] update [--init] [--remote] [-N|--no-fetch]
 	      [-f|--force] [--rebase|--merge] [--reference <repository>]
-	      [--depth <depth>] [--recursive] [--] [<path>...]
+	      [--depth <depth>] [--recursive] [--jobs <n>] [--] [<path>...]
 'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) <n>]
 	      [commit] [--] [<path>...]
 'git submodule' [--quiet] foreach [--recursive] <command>
@@ -374,6 +374,10 @@ for linkgit:git-clone[1]'s `--reference` and `--shared` options carefully.
 	clone with a history truncated to the specified number of revisions.
 	See linkgit:git-clone[1]
 
+-j::
+--jobs::
+	This option is only valid for the update command.
+	Clone new submodules in parallel with the given number of jobs.
 
 <path>...::
 	Paths to submodule(s). When specified this will restrict the command
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index e6bce76..4888e84 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -422,6 +422,7 @@ static int update_clone_task_finished(int result,
 
 static int update_clone(int argc, const char **argv, const char *prefix)
 {
+	int max_jobs = -1;
 	struct string_list_item *item;
 	struct submodule_update_clone pp = SUBMODULE_UPDATE_CLONE_INIT;
 
@@ -442,6 +443,8 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 		OPT_STRING(0, "depth", &pp.depth, "<depth>",
 			   N_("Create a shallow clone truncated to the "
 			      "specified number of revisions")),
+		OPT_INTEGER('j', "jobs", &max_jobs,
+			    N_("parallel jobs")),
 		OPT__QUIET(&pp.quiet, N_("do't print cloning progress")),
 		OPT_END()
 	};
@@ -463,10 +466,16 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 	gitmodules_config();
 	/* Overlay the parsed .gitmodules file with .git/config */
 	git_config(git_submodule_config, NULL);
-	run_processes_parallel(1, update_clone_get_next_task,
-				  update_clone_start_failure,
-				  update_clone_task_finished,
-				  &pp);
+
+	if (max_jobs == -1)
+		if (git_config_get_int("submodule.jobs", &max_jobs))
+			max_jobs = 1;
+
+	run_processes_parallel(max_jobs,
+			       update_clone_get_next_task,
+			       update_clone_start_failure,
+			       update_clone_task_finished,
+			       &pp);
 
 	if (pp.print_unmatched) {
 		printf("#unmatched\n");
diff --git a/git-submodule.sh b/git-submodule.sh
index ea883b9..c2dfb16 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -636,6 +636,14 @@ cmd_update()
 		--depth=*)
 			depth=$1
 			;;
+		-j|--jobs)
+			case "$2" in '') usage ;; esac
+			jobs="--jobs=$2"
+			shift
+			;;
+		--jobs=*)
+			jobs=$1
+			;;
 		--)
 			shift
 			break
@@ -661,6 +669,7 @@ cmd_update()
 		${update:+--update "$update"} \
 		${reference:+--reference "$reference"} \
 		${depth:+--depth "$depth"} \
+		${jobs:+$jobs} \
 		"$@" | {
 	err=
 	while read mode sha1 stage just_cloned sm_path
-- 
2.6.2.280.g74301d6
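The lookup order in this patch can be condensed into a standalone C sketch:

* An explicit --jobs wins.
* Otherwise submodule.jobs is used.
* Otherwise the value is 1.
* A value of 0 means one worker per CPU, as the commit message states.

resolve_jobs() is a hypothetical name. A NULL config_jobs stands in for git_config_get_int() reporting the key as unset. sysconf(_SC_NPROCESSORS_ONLN) is a widely available extension and stands in for however the parallel machinery actually counts CPUs.

```c
#include <unistd.h>

/*
 * Hypothetical resolution of the number of parallel clone jobs:
 * cli_jobs == -1 means "--jobs was not given"; config_jobs is NULL
 * when submodule.jobs is unset.
 */
static int resolve_jobs(int cli_jobs, const int *config_jobs)
{
	int n = cli_jobs;

	if (n == -1)
		n = config_jobs ? *config_jobs : 1;
	if (n < 1) {
		/* 0 (or less) means "one worker per online CPU". */
		long cpus = sysconf(_SC_NPROCESSORS_ONLN);
		n = cpus > 0 ? (int)cpus : 1;
	}
	return n;
}
```

One consequence of the -1 sentinel is that an explicit `--jobs -1` is indistinguishable from not passing the option at all.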

^ permalink raw reply related	[relevance 20%]

* [PATCH 3/3] clone: Allow an explicit argument for parallel submodule clones
  2015-10-23 18:44 10% [PATCH 0/3] expose parallelism for submodule {update, clone} Stefan Beller
  2015-10-23 18:44 21% ` [PATCH 1/3] git submodule update: have a dedicated helper for cloning Stefan Beller
  2015-10-23 18:44 20% ` [PATCH 2/3] submodule update: Expose parallelism to the user Stefan Beller
@ 2015-10-23 18:44 19% ` Stefan Beller
  2015-10-28 21:03  4%   ` Sebastian Schuberth
  2015-10-23 19:25  7% ` [PATCH 0/3] expose parallelism for submodule {update, clone} Junio C Hamano
  3 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-23 18:44 UTC (permalink / raw)
  To: git, gitster; +Cc: jrnieder, Jens.Lehmann, Stefan Beller

Just pass it along to "git submodule update", which may pick reasonable
defaults if you don't specify an explicit number.

TODO: Add a test for this.
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/git-clone.txt |  5 ++++-
 builtin/clone.c             | 23 +++++++++++++++++------
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index f1f2a3f..affa52e 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -14,7 +14,7 @@ SYNOPSIS
 	  [-o <name>] [-b <name>] [-u <upload-pack>] [--reference <repository>]
 	  [--dissociate] [--separate-git-dir <git dir>]
 	  [--depth <depth>] [--[no-]single-branch]
-	  [--recursive | --recurse-submodules] [--] <repository>
+	  [--recursive | --recurse-submodules] [--jobs <n>] [--] <repository>
 	  [<directory>]
 
 DESCRIPTION
@@ -216,6 +216,9 @@ objects from the source repository into a pack in the cloned repository.
 	The result is Git repository can be separated from working
 	tree.
 
+-j::
+--jobs::
+	The number of submodules fetched at the same time.
 
 <repository>::
 	The (possibly remote) repository to clone from.  See the
diff --git a/builtin/clone.c b/builtin/clone.c
index 5864ad1..59ec984 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -50,6 +50,7 @@ static int option_progress = -1;
 static struct string_list option_config;
 static struct string_list option_reference;
 static int option_dissociate;
+static int max_children;
 
 static struct option builtin_clone_options[] = {
 	OPT__VERBOSITY(&option_verbosity),
@@ -72,6 +73,8 @@ static struct option builtin_clone_options[] = {
 		    N_("initialize submodules in the clone")),
 	OPT_BOOL(0, "recurse-submodules", &option_recursive,
 		    N_("initialize submodules in the clone")),
+	OPT_INTEGER('j', "jobs", &max_children,
+		    N_("number of submodules cloned in parallel")),
 	OPT_STRING(0, "template", &option_template, N_("template-directory"),
 		   N_("directory from which templates will be used")),
 	OPT_STRING_LIST(0, "reference", &option_reference, N_("repo"),
@@ -95,10 +98,6 @@ static struct option builtin_clone_options[] = {
 	OPT_END()
 };
 
-static const char *argv_submodule[] = {
-	"submodule", "update", "--init", "--recursive", NULL
-};
-
 static const char *get_repo_path_1(struct strbuf *path, int *is_bundle)
 {
 	static char *suffix[] = { "/.git", "", ".git/.git", ".git" };
@@ -674,8 +673,20 @@ static int checkout(void)
 	err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
 			   sha1_to_hex(sha1), "1", NULL);
 
-	if (!err && option_recursive)
-		err = run_command_v_opt(argv_submodule, RUN_GIT_CMD);
+	if (!err && option_recursive) {
+		struct argv_array args = ARGV_ARRAY_INIT;
+		argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
+
+		if (max_children) {
+			struct strbuf sb = STRBUF_INIT;
+			strbuf_addf(&sb, "--jobs=%d", max_children);
+			argv_array_push(&args, sb.buf);
+			strbuf_release(&sb);
+		}
+
+		err = run_command_v_opt(args.argv, RUN_GIT_CMD);
+		argv_array_clear(&args);
+	}
 
 	return err;
 }
-- 
2.6.2.280.g74301d6
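The argument handling that clone gains here can be sketched without Git's argv_array so that it compiles on its own. The vector holds the fixed "submodule update --init --recursive" arguments, plus "--jobs=<n>" only when -j was given a nonzero value. build_submodule_argv() is a hypothetical helper. The formatted option lives in a caller-supplied buffer, which must outlive the vector. In the patch, argv_array_push() copies the string, which is why the strbuf can be released right away.

```c
#include <stdio.h>

/*
 * Hypothetical helper: fill argv (room for 6 entries, NULL-terminated)
 * with the arguments clone passes to "git submodule update".
 * jobs_buf holds the formatted "--jobs=<n>" and must outlive argv.
 * Returns the number of arguments.
 */
static int build_submodule_argv(const char *argv[6], int max_children,
				char *jobs_buf, size_t bufsz)
{
	int argc = 0;

	argv[argc++] = "submodule";
	argv[argc++] = "update";
	argv[argc++] = "--init";
	argv[argc++] = "--recursive";
	if (max_children) {
		snprintf(jobs_buf, bufsz, "--jobs=%d", max_children);
		argv[argc++] = jobs_buf;
	}
	argv[argc] = NULL;
	return argc;
}
```

max_children starts at 0 and is only forwarded when nonzero, so `git clone -j0` behaves like omitting the option. The choice then falls to submodule.jobs or the default in "git submodule update".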

^ permalink raw reply related	[relevance 19%]

* [PATCH 0/3] expose parallelism for submodule {update, clone}
@ 2015-10-23 18:44 10% Stefan Beller
  2015-10-23 18:44 21% ` [PATCH 1/3] git submodule update: have a dedicated helper for cloning Stefan Beller
                   ` (3 more replies)
  0 siblings, 4 replies; 200+ results
From: Stefan Beller @ 2015-10-23 18:44 UTC (permalink / raw)
  To: git, gitster; +Cc: jrnieder, Jens.Lehmann, Stefan Beller

This goes on top of origin/sb/submodule-parallel-fetch^
The first patch replaces the last patch of origin/sb/submodule-parallel-fetch
using clearer names for the callback functions.

Patches 2 and 3 introduce CLI options for {submodule update, clone} that instruct Git
to clone submodules in parallel.

Additionally, `git submodule update` respects the config option "submodule.jobs".

I also want to make "git fetch --recurse-submodules" and "git clone --recursive"
respect the same "submodule.jobs" config option, but that code change would collide
with origin/sb/submodule-config-parse, so I will put the patches on top of that.

Stefan Beller (3):
  git submodule update: have a dedicated helper for cloning
  submodule update: Expose parallelism to the user
  clone: Allow an explicit argument for parallel submodule clones

 Documentation/git-clone.txt     |   5 +-
 Documentation/git-submodule.txt |   6 +-
 builtin/clone.c                 |  23 ++--
 builtin/submodule--helper.c     | 234 ++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh                |  54 ++++------
 t/t7400-submodule-basic.sh      |   4 +-
 6 files changed, 282 insertions(+), 44 deletions(-)

-- 
2.6.2.280.g74301d6

* Re: Make "git checkout" automatically update submodules?
  2015-10-23 17:20  7% ` Stefan Beller
@ 2015-10-23 19:11  6%   ` Junio C Hamano
  2015-10-23 22:51  4%   ` Kannan Goundan
  1 sibling, 0 replies; 200+ results
From: Junio C Hamano @ 2015-10-23 19:11 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Kannan Goundan, git@vger.kernel.org, Jonathan Nieder,
	Jens Lehmann

Stefan Beller <sbeller@google.com> writes:

> Checkout [1]. There are lots of good patches, but hard to find.
> (Including, but not limited to a recursive git checkout enhancement!)
> ...
> [1] https://github.com/jlehmann/git-submod-enhancements/wiki

Yes, Jens is not just one of the people who have been working harder
on, and thinking longer about, submodules than anybody else; he has
also demonstrated over time that he has good taste and a balanced view
on the design of the subsystem, and his technical judgment is one we
can trust.

Not all the changes listed on the page are necessarily good as-is
(e.g. some may help only a subset of users while hurting others,
like the "recursively check-out everything unconditionally" that
triggered this thread), but the page is a good collection that reminds
anybody who designs a coherent whole of the issues that need to be
taken into account.

Thanks for a pointer to an excellent starting page.

* Re: [PATCH 0/3] expose parallelism for submodule {update, clone}
  2015-10-23 18:44 10% [PATCH 0/3] expose parallelism for submodule {update, clone} Stefan Beller
                   ` (2 preceding siblings ...)
  2015-10-23 18:44 19% ` [PATCH 3/3] clone: Allow an explicit argument for parallel submodule clones Stefan Beller
@ 2015-10-23 19:25  7% ` Junio C Hamano
  2015-10-23 19:33  7%   ` Stefan Beller
  3 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2015-10-23 19:25 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jrnieder, Jens.Lehmann

Stefan Beller <sbeller@google.com> writes:

>   submodule update: Expose parallelism to the user
>   clone: Allow an explicit argument for parallel submodule clones

downcase Expose and Allow, perhaps?



I was looking at the previous one and I am getting the feeling that
everything up to "run-command: fix missing output from late callbacks"
is ready for 'next'.  Am I being too optimistic and missing something
that may make you want to do another reroll?

37bc721 submodule.c: write "Fetching submodule <foo>" to stderr
0904370 xread: poll on non blocking fds
fd6ed7c xread_nonblock: add functionality to read from fds without blocking
e7ba957 strbuf: add strbuf_read_once to read without blocking
8fc3f2e sigchain: add command to pop all common signals
f57c806 run-command: add an asynchronous parallel child processor
4733d9e fetch_populated_submodules: use new parallel job processing
dca8113 submodules: allow parallel fetching, add tests and documentation
79f3857 run-command: fix early shutdown
1c53754 run-command: clear leftover state from child_process structure
63ce47e run-command: initialize the shutdown flag
c3a5d11 test-run-command: test for gracefully aborting
74cc04d test-run-command: increase test coverage
376d400 run-command: fix missing output from late callbacks

* Re: [PATCH 0/3] expose parallelism for submodule {update, clone}
  2015-10-23 19:25  7% ` [PATCH 0/3] expose parallelism for submodule {update, clone} Junio C Hamano
@ 2015-10-23 19:33  7%   ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-23 19:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org, Jonathan Nieder, Jens Lehmann

On Fri, Oct 23, 2015 at 12:25 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>>   submodule update: Expose parallelism to the user
>>   clone: Allow an explicit argument for parallel submodule clones
>
> downcase Expose and Allow, perhaps?

Will do, thanks!


>
>
>
> I was looking at the previous one and I am getting the feeling that
> everything up to "run-command: fix missing output from late callbacks"
> is ready for 'next'.  Am I being too optimistic and missing something
> that may make you want to do another reroll?

I would even argue for "submodule config: keep update strategy around"
to be ready for next. ;) But as that is quite unrelated to the previous patches
and only needed for the last patch, we can omit that safely too.

All the fixes up to "run-command: fix missing output from late callbacks"
sound good to me for next.

I have run into a problem cloning big repositories, though, and I haven't
found its cause yet. So the whole parallel processing machinery may need
another bug fix later on.

>
> 37bc721 submodule.c: write "Fetching submodule <foo>" to stderr
> 0904370 xread: poll on non blocking fds
> fd6ed7c xread_nonblock: add functionality to read from fds without blocking
> e7ba957 strbuf: add strbuf_read_once to read without blocking
> 8fc3f2e sigchain: add command to pop all common signals
> f57c806 run-command: add an asynchronous parallel child processor
> 4733d9e fetch_populated_submodules: use new parallel job processing
> dca8113 submodules: allow parallel fetching, add tests and documentation
> 79f3857 run-command: fix early shutdown
> 1c53754 run-command: clear leftover state from child_process structure
> 63ce47e run-command: initialize the shutdown flag
> c3a5d11 test-run-command: test for gracefully aborting
> 74cc04d test-run-command: increase test coverage
> 376d400 run-command: fix missing output from late callbacks

* Re: Make "git checkout" automatically update submodules?
  2015-10-23 17:20  7% ` Stefan Beller
  2015-10-23 19:11  6%   ` Junio C Hamano
@ 2015-10-23 22:51  4%   ` Kannan Goundan
  1 sibling, 0 replies; 200+ results
From: Kannan Goundan @ 2015-10-23 22:51 UTC (permalink / raw)
  To: git

Stefan Beller <sbeller <at> google.com> writes:

> [1] https://github.com/jlehmann/git-submod-enhancements/wiki

Oh wow, Christmas came early!  I'll give this code a try.

* Re: [RFC] URL rewrite in .gitmodules
  2015-10-19 22:07  5% ` Stefan Beller
@ 2015-10-25 14:43  5%   ` Lars Schneider
  0 siblings, 0 replies; 200+ results
From: Lars Schneider @ 2015-10-25 14:43 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git Users


On 20 Oct 2015, at 00:07, Stefan Beller <sbeller@google.com> wrote:

> On Mon, Oct 19, 2015 at 12:28 PM, Lars Schneider
> <larsxschneider@gmail.com> wrote:
>> Hi,
>> 
>> I have a closed source Git repo which references an Open Source Git repo as Submodule. The Open Source Git repo references yet another Open Source repo as submodule. In order to avoid failing builds due to external services I mirrored the Open Source repos in my company network. That works great with the first level of Submodules. Unfortunately it does not work with the second level because the first level still references the "outside of company" repos. I know I can rewrite Git URLs with the git config "url.<base>.insteadOf" option. However, git configs are client specific.
> 
> I feel like this is working as intended. You only want to improve your
> one client (say the buildbot) to not go to the open source site, while
> the developer may want to fetch from external sources ("Hey shiny
> new code!";)
Well, that's a good argument. However, our developers usually have no write access to these repos. If they want to push a commit, they need to fork the open source repo and change the submodule URL in the parent repo. I fear that this kind of process might overwhelm or trouble them (changing Git submodule URLs has a few pitfalls). As a result they might be less inclined to make a contribution or, even worse, they might copy the code into the parent repo, not use submodules, and make no contribution at all.


> 
>> I would prefer a solution that works without setup on any client. I also know that I could update the .gitmodules file in the Open Source repo on the first level. I also would prefer not to do this as I want to use the very same hashes as defined by the "upstream" Open Source repos.
> 
> You could carry a patch on top of the tip of the first submodule
> re-pointing the nested submodule. This requires good workflows
> available to deal with submodules though. (Fetch and merge or rebase,
> git submodule update should be able to do that?)
True. However, we have many Git beginners and I fear that this workflow would overwhelm them.


>> 
>> Is there yet another way in Git to change URLs of Submodules in the way I want it?
>> 
>> If not, what do you think about a patch that adds a "url" section similar to the one in git config to a .gitmodules file?
>> 
> 
> So we have different kinds of git configs. within one repository, in
> the home directory (global to the one machine),
> maybe you would want to have one "global" config on a network share,
> such that every box in your company
> reads that "company-wide" global config and acts upon that?
That could actually work. The only downside I see is that the devs need to intentionally update their "company" git config. We have more than 4,000 engineers, and therefore I want to establish processes that are as easy and fault-tolerant as possible.

> 
>> Example:
>> ----------
>> [submodule "git"]
>>        path = git
>>        url=git://github.com/larsxschneider/git.git
>> 
>> [url "mycompany.com"]
>>        insteadOf = outside.com
> 
> Wouldn't that be better put into say a global git config instead of
> repeating it for every submodule?
See answer above. The git config setup could be an obstacle.

> 
> In case of the nested submodule you would need to carry the last lines
> as an extra patch anyway
> if this was done in the .gitmodules files? Or do you expect this to be
> applied recursively (i.e. nested
> submodules all the way down also substitute outside.com)
Yes, my intention was to apply these recursively.
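The rewrite Lars proposes reuses the documented `url.<base>.insteadOf` rule. A URL that starts with the `insteadOf` value gets that prefix replaced by `<base>`, and when several rules match, the longest match wins. The sketch below covers only that matching step. `struct url_rewrite` and `rewrite_url` are hypothetical. Reading such rules from `.gitmodules` and applying them at every nesting level is the part the proposal would have to add.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One url.<base>.insteadOf pair (hypothetical representation). */
struct url_rewrite {
	const char *base;       /* replacement prefix, e.g. "mycompany.com" */
	const char *instead_of; /* prefix to match, e.g. "outside.com" */
};

/*
 * Rewrite url into out using the longest matching instead_of prefix.
 * Copies url unchanged when no rule matches.
 * Returns 0 on success, -1 if out is too small.
 */
static int rewrite_url(const struct url_rewrite *rules, size_t nr,
		       const char *url, char *out, size_t out_len)
{
	const struct url_rewrite *best = NULL;
	size_t best_len = 0, i;
	int n;

	assert(out && out_len > 0);
	for (i = 0; i < nr; i++) {
		size_t len = strlen(rules[i].instead_of);
		/* keep the longest prefix that matches the start of url */
		if (len > best_len && !strncmp(url, rules[i].instead_of, len)) {
			best = &rules[i];
			best_len = len;
		}
	}

	if (best)
		n = snprintf(out, out_len, "%s%s", best->base, url + best_len);
	else
		n = snprintf(out, out_len, "%s", url);
	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Consider two rules, one for `https://github.com/` and one for `https://github.com/git/`, both pointing at hypothetical mirror URLs. The URL `https://github.com/git/git.git` is rewritten using the longer, more specific rule.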


> Am I missing your point?
I don't think so :-)

Thanks,
Lars

* Re: Why are submodules not automatically handled by default or at least configurable to do so?
  @ 2015-10-26 16:28  7%   ` Stefan Beller
  2015-10-26 19:53  4%     ` Junio C Hamano
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-26 16:28 UTC (permalink / raw)
  To: Chris Packham; +Cc: John Smith, GIT, Jens Lehmann

On Sun, Oct 25, 2015 at 5:56 PM, Chris Packham <judge.packham@gmail.com> wrote:
> On Mon, Oct 26, 2015 at 12:10 PM, John Smith <johsmi9933@inbox.com> wrote:
>> I found that I use submodules much, much more often in my git projects than I used externals
>> in Subversion and the reason is that git encourages/forces to organize large projects into
>> smaller repositories, one reason for this being that subversion allows to check out parts of
>> a repository while git does not.
>>
>> But when I clone a git repository with subprojects, I (and everyone else) have to remember to
>> add the --recursive option. When switching between branches with different versions/commits of the
>> submodules everyone has to remember to update the submodules. When updating a submodule
>> everyone has to remember to recurse there too.
>
> The config option fetch.recurseSubmodules exists. It's not quite the
> same as what git clone --recurse-submodules does but it's a start.
>
>>
>> Basically, everything with submodules has to be done manually every time and there seems
>> to be no way to change that default.
>>
>> Why is that? Basically all the time I use submodules I would want automatic handling of
>> submodules to happen and I cannot  remember having had a single situation where I would
>> not have wanted it to happen. So  why does git default to doing nothing?

IIUC at the time submodules were invented, a lot of code still needed
to be written: each command needed new code to deal with submodules.
As there were not enough people or time to do it properly, "do nothing"
was the safest behavior that could be added quickly.

>
> It's hard to pick a default that suits every workflow that submodules
> support. Also with submodules there is a chicken-and-egg scenario.
> While you can put things in ~/.gitconfig most of what you'd want to
> configure when using submodules would be in super/.git/config but that
> doesn't exist until you've cloned super.git.
>
>> Why does it not provide a way to enable automatic
>> pulling/updating of submodules e.g. when cloning or switching branches?
>
> I believe Jens and Stefan (Cc'd) have been doing some great work in
> this area. Jens even posted his todo list a few days ago
> (https://github.com/jlehmann/git-submod-enhancements/wiki).

Yeah, I would also point to Jens' wiki today.

All I have done so far is rewrite parts of the submodule code in C
(git submodule update specifically), while the code/patches you find in
Jens' copy of Git already include lots of useful stuff such as
`git checkout --recurse-submodules` (IIRC you don't need to type
--recurse-submodules if you configured that to be the default).

>
>> When would people routinely check out a branch and want to stay with the submodules as
>> they have been checked out for the old branch?

As said above, it was a sane choice which could be implemented fast, IIUC.

I mean, what would happen if you had commits made in the submodule, or
just a dirty working tree?

>>
>> I honestly do not understand it.
>>
>> John
>>

Stefan

* Re: [RFC] URL rewrite in .gitmodules
  @ 2015-10-26 16:34  2%     ` Stefan Beller
  2015-10-26 16:52  2%       ` Jens Lehmann
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-26 16:34 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Junio C Hamano, Git Users

On Sun, Oct 25, 2015 at 8:12 AM, Lars Schneider
<larsxschneider@gmail.com> wrote:
>
> On 20 Oct 2015, at 19:33, Junio C Hamano <gitster@pobox.com> wrote:
>
>> Lars Schneider <larsxschneider@gmail.com> writes:
>>
>>> If not, what do you think about a patch that adds a "url" section
>>> similar to the one in git config to a .gitmodules file?
>>>
>>> Example:
>>> ----------
>>> [submodule "git"]
>>>      path = git
>>>        url=git://github.com/larsxschneider/git.git
>>>
>>> [url "mycompany.com"]
>>>        insteadOf = outside.com
>>> ----------
>>
>> It is unclear to me if you are adding the last two (or three,
>> counting the blank before) lines to your company's private fork of
>> the opensource project, but if that is the case, then that would
>> defeat your earlier desire:
>>
>>> ... I also would prefer not to do this as I want to use the
>>> very same hashes as defined by the "upstream" ...
>>
>> wouldn't it?
> The last three lines are added to my company's closed source Git repo. In this example the company repo references git://github.com/larsxschneider/git.git as submodule. This submodule in turn references another submodule with a URL "outside.com". This is the URL I want to rewrite. Do you think this could be useful to others as well?
>
>
>> I do not think this topic is specific to use of submodules.  If you
>> want to encourage your engineers to fetch from nearby mirrors you
>> maintain, you would want a forest of url.mine.insteadof=theirs for
>> the external repositories that matter to you specified by
>> everybody's $HOME/.gitconfig, and one way to do so would be to have
>> them use the configuration inclusion.  An item in your engineer
>> orientation material could tell them to add
>>
>>       [include]
>>               path = /usr/local/etc/git/mycompany.urlrewrite
>>
>> when they set up their "[user] name/email" in there.
>>
>> And you can update /usr/local/etc/git/mycompany.urlrewrite as
>> needed.
> Oh nice, I didn't know about "include". However, as mentioned to Stefan in this thread, I fear that our engineers will miss that. I would prefer a solution that does not need any additional setup. Therefore the suggestion to add rewrites in the .gitmodules file.

How do you distribute new copies of Git to your engineers?
Maybe you could ship them a version which has the "include" line
already builtin as default? So your distributed copy of Git
would not just check the default places for configs, but also
some compiled-in /net/share/mycompany.gitconfig
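git reads the system config file first, whose path is fixed at build time. It then reads the user's global file, and finally the repository's `.git/config`. For single-valued settings, later files take precedence. Stefan's suggestion amounts to adding one more compiled-in path to that list. The sketch below is hedged: `COMPANY_GITCONFIG` and `config_search_order` are hypothetical, and it simplifies the global file to a single path.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical build-time setting: a company-wide file on a network
 * share, read right after the system config. Override it with
 * -DCOMPANY_GITCONFIG=... when compiling.
 */
#ifndef COMPANY_GITCONFIG
#define COMPANY_GITCONFIG "/net/share/mycompany.gitconfig"
#endif

/*
 * Fill paths with config files in the order they would be read. Files
 * read later override earlier ones, so per-user and per-repository
 * settings still win over the company defaults. NULL entries for the
 * global or repository file are skipped. Returns the count.
 */
static size_t config_search_order(const char **paths, size_t len,
				  const char *system_cfg,
				  const char *global_cfg,
				  const char *repo_cfg)
{
	size_t n = 0;

	assert(len >= 4);
	paths[n++] = system_cfg;
	paths[n++] = COMPANY_GITCONFIG;
	if (global_cfg)
		paths[n++] = global_cfg;
	if (repo_cfg)
		paths[n++] = repo_cfg;
	return n;
}
```

The company file is placed right after the system file, so individual users and repositories can still override whatever it sets.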

>
> Thanks,
> Lars
>

* Re: [RFC] URL rewrite in .gitmodules
  2015-10-26 16:34  2%     ` Stefan Beller
@ 2015-10-26 16:52  2%       ` Jens Lehmann
  2015-11-15 13:16  0%         ` Lars Schneider
  0 siblings, 1 reply; 200+ results
From: Jens Lehmann @ 2015-10-26 16:52 UTC (permalink / raw)
  To: Stefan Beller, Lars Schneider; +Cc: Junio C Hamano, Git Users

Am 26.10.2015 um 17:34 schrieb Stefan Beller:
> On Sun, Oct 25, 2015 at 8:12 AM, Lars Schneider <larsxschneider@gmail.com> wrote:
>> On 20 Oct 2015, at 19:33, Junio C Hamano <gitster@pobox.com> wrote:
>>> I do not think this topic is specific to use of submodules.  If you
>>> want to encourage your engineers to fetch from nearby mirrors you
>>> maintain, you would want a forest of url.mine.insteadof=theirs for
>>> the external repositories that matter to you specified by
>>> everybody's $HOME/.gitconfig, and one way to do so would be to have
>>> them use the configuration inclusion.  An item in your engineer
>>> orientation material could tell them to add
>>>
>>>        [include]
>>>                path = /usr/local/etc/git/mycompany.urlrewrite
>>>
>>> when they set up their "[user] name/email" in there.
>>>
>>> And you can update /usr/local/etc/git/mycompany.urlrewrite as
>>> needed.
>> Oh nice, I didn't know about "include". However, as mentioned to Stefan in this thread, I fear that our engineers will miss that. I would prefer a solution that does not need any additional setup. Therefore the suggestion to add rewrites in the .gitmodules file.
>
> How do you distribute new copies of Git to your engineers?
> Maybe you could ship them a version which has the "include" line
> already builtin as default? So your distributed copy of Git
> would not just check the default places for configs, but also
> some compiled-in /net/share/mycompany.gitconfig

Which is just what we do at $DAYJOB, that way you can easily
distribute all kinds of settings, customizations and hooks
company-wide.

* Re: Why are submodules not automatically handled by default or at least configurable to do so?
  2015-10-26 16:28  7%   ` Stefan Beller
@ 2015-10-26 19:53  4%     ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2015-10-26 19:53 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Chris Packham, John Smith, GIT, Jens Lehmann

Stefan Beller <sbeller@google.com> writes:

> IIUC at the time submodules were invented, there was need for lots of
> code to be written.
> Each command needed new code to deal with submodules. As there was not
> enough people/time
> to do it properly, the "do nothing" was the safest action which could
> be added fast.

That is quite different from how I remember it.  Soon after Linus and I
added the Gitlink in early April 2007, an early subproject/gitlink
(thought) experiment was started with help from folks like Steven
Grimm, Jan Hudec, Petr Baudis, Alex Riesen etc.  The first principle
of the design throughout that era was "we admit that we do not know
all the use cases, so let's start small and solid and make sure that
the small-and-solid thing can later be enhanced as people discover
how they want to work" (you can see me expressing that sentiment
in $gmane/48287, for example).

So it wasn't "not enough people to do it properly" at all.  It was
"we admit that we do not know what is proper, so we defer to actual
users to define what is proper for them".

* [PATCH 2/9] submodule config: keep update strategy around
  2015-10-27 18:15  9% [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
  2015-10-27 18:15 24% ` [PATCH 1/9] submodule-config: "goto" removal in parse_config() Stefan Beller
@ 2015-10-27 18:15 26% ` Stefan Beller
  2015-10-27 18:15 21% ` [PATCH 4/9] git submodule update: have a dedicated helper for cloning Stefan Beller
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-27 18:15 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

We need the submodule update strategies in a later patch.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 submodule-config.c | 11 +++++++++++
 submodule-config.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/submodule-config.c b/submodule-config.c
index afe0ea8..8b8c7d1 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -194,6 +194,7 @@ static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
 
 	submodule->path = NULL;
 	submodule->url = NULL;
+	submodule->update = NULL;
 	submodule->fetch_recurse = RECURSE_SUBMODULES_NONE;
 	submodule->ignore = NULL;
 
@@ -311,6 +312,16 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->url);
 			submodule->url = xstrdup(value);
 		}
+	} else if (!strcmp(item.buf, "update")) {
+		if (!value)
+			ret = config_error_nonbool(var);
+		else if (!me->overwrite && submodule->update != NULL)
+			warn_multiple_config(me->commit_sha1, submodule->name,
+					     "update");
+		else {
+			free((void *)submodule->update);
+			submodule->update = xstrdup(value);
+		}
 	}
 
 	strbuf_release(&name);
diff --git a/submodule-config.h b/submodule-config.h
index 9061e4e..f9e2a29 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -14,6 +14,7 @@ struct submodule {
 	const char *url;
 	int fetch_recurse;
 	const char *ignore;
+	const char *update;
 	/* the sha1 blob id of the responsible .gitmodules file */
 	unsigned char gitmodules_sha1[20];
 };
-- 
2.5.0.283.g1a79c94.dirty
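The new `update` branch in `parse_config()` follows the same pattern as the `url` branch before it. The sketch below is a self-contained reduction of that rule. The name `set_string_option` is hypothetical, and its plain `fprintf` messages stand in for `config_error_nonbool()` and `warn_multiple_config()`. A key without a value is an error. A repeated key keeps the first value unless overwriting is allowed. Otherwise the old string is freed and replaced with a copy. This is also why the patch initializes `submodule->update` to NULL: the "already set" test depends on it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch of the "set a string option once" rule.
 * Returns 0 on success (including the "keep the first value" warning
 * case) and -1 on a missing value or allocation failure.
 */
static int set_string_option(char **slot, const char *value, int overwrite,
			     const char *name)
{
	char *copy;

	assert(slot && name);
	if (!value) {
		fprintf(stderr, "error: missing value for '%s'\n", name);
		return -1;
	}
	if (!overwrite && *slot) {
		fprintf(stderr, "warning: multiple values for '%s', keeping the first\n",
			name);
		return 0;
	}
	/* copy before freeing, so a failed allocation leaves *slot intact */
	copy = malloc(strlen(value) + 1);
	if (!copy)
		return -1;
	strcpy(copy, value);
	free(*slot);
	*slot = copy;
	return 0;
}
```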

* [PATCH 4/9] git submodule update: have a dedicated helper for cloning
  2015-10-27 18:15  9% [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
  2015-10-27 18:15 24% ` [PATCH 1/9] submodule-config: "goto" removal in parse_config() Stefan Beller
  2015-10-27 18:15 26% ` [PATCH 2/9] submodule config: keep update strategy around Stefan Beller
@ 2015-10-27 18:15 21% ` Stefan Beller
  2015-10-27 18:15 23% ` [PATCH 5/9] submodule update: expose parallelism to the user Stefan Beller
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-27 18:15 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

This introduces a new helper function in git submodule--helper
which takes care of cloning all submodules, which we want to
parallelize eventually.

Some tests (such as empty URL, update_mode=none) are required in the
helper to make the decision for cloning. These checks have been
moved into the C function as well (no need to repeat them in the
shell script).

As we can only access the stderr channel from within the parallel
processing engine, we need to reroute the error message for
specified but uninitialized submodules to stderr. As it is an error
message, this should have gone to stderr in the first place, so it
is a bug fix along the way.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/submodule--helper.c | 234 ++++++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh            |  45 +++------
 t/t7400-submodule-basic.sh  |   4 +-
 3 files changed, 247 insertions(+), 36 deletions(-)
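After this patch, the helper and `git-submodule.sh` communicate through one line per submodule. The script reads each line with `read mode sha1 stage just_cloned sm_path`. The self-contained sketch below models the decision made for each entry in `update_clone_get_next_task()`. The update mode comes from `--update` first, then from `submodule.<name>.update`, and defaults to `checkout`. A mode of `none` skips the submodule. The function names are hypothetical, and the sketch leaves out the unmerged-entry and missing-URL checks.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Precedence used by the helper: command line, then config, then "checkout". */
static const char *resolve_update_mode(const char *cmdline, const char *configured)
{
	if (cmdline)
		return cmdline;
	if (configured)
		return configured;
	return "checkout";
}

/*
 * Format the "mode sha1 stage just_cloned<TAB>path" line that the shell
 * script parses. Returns 1 if the submodule is skipped (no line is
 * written), 0 on success, and -1 if the buffer is too small.
 */
static int format_project_line(char *out, size_t len, unsigned int mode,
			       const char *sha1_hex, int stage, int just_cloned,
			       const char *path, const char *update_mode)
{
	int n;

	assert(out && sha1_hex && path && update_mode);
	if (!strcmp(update_mode, "none"))
		return 1;
	n = snprintf(out, len, "%06o %s %d %d\t%s\n",
		     mode, sha1_hex, stage, just_cloned, path);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

On the shell side, `just_cloned` equal to 1 forces `update_module=checkout`, because a fresh clone has no local changes to integrate.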

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index f4c3eff..1ec1b85 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -255,6 +255,239 @@ static int module_clone(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
+static int git_submodule_config(const char *var, const char *value, void *cb)
+{
+	return parse_submodule_config_option(var, value);
+}
+
+struct submodule_update_clone {
+	int count;
+	int quiet;
+	int print_unmatched;
+	char *reference;
+	char *depth;
+	char *update;
+	const char *recursive_prefix;
+	const char *prefix;
+	struct module_list list;
+	struct string_list projectlines;
+	struct pathspec pathspec;
+};
+#define SUBMODULE_UPDATE_CLONE_INIT {0, 0, 0, NULL, NULL, NULL, NULL, NULL, MODULE_LIST_INIT, STRING_LIST_INIT_DUP}
+
+static void fill_clone_command(struct child_process *cp, int quiet,
+			       const char *prefix, const char *path,
+			       const char *name, const char *url,
+			       const char *reference, const char *depth)
+{
+	cp->git_cmd = 1;
+	cp->no_stdin = 1;
+	cp->stdout_to_stderr = 1;
+	cp->err = -1;
+	argv_array_push(&cp->args, "submodule--helper");
+	argv_array_push(&cp->args, "clone");
+	if (quiet)
+		argv_array_push(&cp->args, "--quiet");
+
+	if (prefix) {
+		argv_array_push(&cp->args, "--prefix");
+		argv_array_push(&cp->args, prefix);
+	}
+	argv_array_push(&cp->args, "--path");
+	argv_array_push(&cp->args, path);
+
+	argv_array_push(&cp->args, "--name");
+	argv_array_push(&cp->args, name);
+
+	argv_array_push(&cp->args, "--url");
+	argv_array_push(&cp->args, url);
+	if (reference)
+		argv_array_push(&cp->args, reference);
+	if (depth)
+		argv_array_push(&cp->args, depth);
+}
+
+static int update_clone_get_next_task(void **pp_task_cb,
+				      struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	for (; pp->count < pp->list.nr; pp->count++) {
+		const struct submodule *sub = NULL;
+		const char *displaypath = NULL;
+		const struct cache_entry *ce = pp->list.entries[pp->count];
+		struct strbuf sb = STRBUF_INIT;
+		const char *update_module = NULL;
+		char *url = NULL;
+		int just_cloned = 0;
+
+		if (ce_stage(ce)) {
+			if (pp->recursive_prefix)
+				strbuf_addf(err, "Skipping unmerged submodule %s/%s\n",
+					pp->recursive_prefix, ce->name);
+			else
+				strbuf_addf(err, "Skipping unmerged submodule %s\n",
+					ce->name);
+			continue;
+		}
+
+		sub = submodule_from_path(null_sha1, ce->name);
+		if (!sub) {
+			strbuf_addf(err, "BUG: internal error managing submodules. "
+				    "The cache could not locate '%s'", ce->name);
+			pp->print_unmatched = 1;
+			return 0;
+		}
+
+		if (pp->recursive_prefix)
+			displaypath = relative_path(pp->recursive_prefix, ce->name, &sb);
+		else
+			displaypath = ce->name;
+
+		if (pp->update)
+			update_module = pp->update;
+		if (!update_module)
+			update_module = sub->update;
+		if (!update_module)
+			update_module = "checkout";
+		if (!strcmp(update_module, "none")) {
+			strbuf_addf(err, "Skipping submodule '%s'\n", displaypath);
+			continue;
+		}
+
+		/*
+		 * Looking up the url in .git/config.
+		 * We cannot fall back to .gitmodules as we only want to process
+		 * configured submodules. This renders the submodule lookup API
+		 * useless, as it cannot lookup without fallback.
+		 */
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "submodule.%s.url", sub->name);
+		git_config_get_string(sb.buf, &url);
+		if (!url) {
+			/*
+			 * Only mention uninitialized submodules when their
+			 * paths have been specified
+			 */
+			if (pp->pathspec.nr)
+				strbuf_addf(err, _("Submodule path '%s' not initialized\n"
+					"Maybe you want to use 'update --init'?"), displaypath);
+			continue;
+		}
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%s/.git", ce->name);
+		just_cloned = !file_exists(sb.buf);
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%06o %s %d %d\t%s\n", ce->ce_mode,
+				sha1_to_hex(ce->sha1), ce_stage(ce),
+				just_cloned, ce->name);
+		string_list_append(&pp->projectlines, sb.buf);
+
+		if (just_cloned) {
+			fill_clone_command(cp, pp->quiet, pp->prefix, ce->name,
+					   sub->name, url, pp->reference, pp->depth);
+			pp->count++;
+			free(url);
+			return 1;
+		} else
+			free(url);
+	}
+	return 0;
+}
+
+static int update_clone_start_failure(struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb,
+				      void *pp_task_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	strbuf_addf(err, "error when starting a child process");
+	pp->print_unmatched = 1;
+
+	return 1;
+}
+
+static int update_clone_task_finished(int result,
+				      struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb,
+				      void *pp_task_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	if (!result) {
+		return 0;
+	} else {
+		strbuf_addf(err, "error in one child process");
+		pp->print_unmatched = 1;
+		return 1;
+	}
+}
+
+static int update_clone(int argc, const char **argv, const char *prefix)
+{
+	struct string_list_item *item;
+	struct submodule_update_clone pp = SUBMODULE_UPDATE_CLONE_INIT;
+
+	struct option module_list_options[] = {
+		OPT_STRING(0, "prefix", &prefix,
+			   N_("path"),
+			   N_("path into the working tree")),
+		OPT_STRING(0, "recursive_prefix", &pp.recursive_prefix,
+			   N_("path"),
+			   N_("path into the working tree, across nested "
+			      "submodule boundaries")),
+		OPT_STRING(0, "update", &pp.update,
+			   N_("string"),
+			   N_("update command for submodules")),
+		OPT_STRING(0, "reference", &pp.reference, "<repository>",
+			   N_("Use the local reference repository "
+			      "instead of a full clone")),
+		OPT_STRING(0, "depth", &pp.depth, "<depth>",
+			   N_("Create a shallow clone truncated to the "
+			      "specified number of revisions")),
+		OPT__QUIET(&pp.quiet, N_("don't print cloning progress")),
+		OPT_END()
+	};
+
+	const char *const git_submodule_helper_usage[] = {
+		N_("git submodule--helper update-clone [--prefix=<path>] [<path>...]"),
+		NULL
+	};
+	pp.prefix = prefix;
+
+	argc = parse_options(argc, argv, prefix, module_list_options,
+			     git_submodule_helper_usage, 0);
+
+	if (module_list_compute(argc, argv, prefix, &pp.pathspec, &pp.list) < 0) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	gitmodules_config();
+	/* Overlay the parsed .gitmodules file with .git/config */
+	git_config(git_submodule_config, NULL);
+	run_processes_parallel(1, update_clone_get_next_task,
+				  update_clone_start_failure,
+				  update_clone_task_finished,
+				  &pp);
+
+	if (pp.print_unmatched) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	for_each_string_list_item(item, &pp.projectlines) {
+		utf8_fprintf(stdout, "%s", item->string);
+	}
+	return 0;
+}
+
 struct cmd_struct {
 	const char *cmd;
 	int (*fn)(int, const char **, const char *);
@@ -264,6 +497,7 @@ static struct cmd_struct commands[] = {
 	{"list", module_list},
 	{"name", module_name},
 	{"clone", module_clone},
+	{"update-clone", update_clone}
 };
 
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
diff --git a/git-submodule.sh b/git-submodule.sh
index 8b0eb9a..ea883b9 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -655,17 +655,18 @@ cmd_update()
 		cmd_init "--" "$@" || return
 	fi
 
-	cloned_modules=
-	git submodule--helper list --prefix "$wt_prefix" "$@" | {
+	git submodule--helper update-clone ${GIT_QUIET:+--quiet} \
+		${wt_prefix:+--prefix "$wt_prefix"} \
+		${prefix:+--recursive_prefix "$prefix"} \
+		${update:+--update "$update"} \
+		${reference:+--reference "$reference"} \
+		${depth:+--depth "$depth"} \
+		"$@" | {
 	err=
-	while read mode sha1 stage sm_path
+	while read mode sha1 stage just_cloned sm_path
 	do
 		die_if_unmatched "$mode"
-		if test "$stage" = U
-		then
-			echo >&2 "Skipping unmerged submodule $prefix$sm_path"
-			continue
-		fi
+
 		name=$(git submodule--helper name "$sm_path") || exit
 		url=$(git config submodule."$name".url)
 		branch=$(get_submodule_config "$name" branch master)
@@ -682,27 +683,10 @@ cmd_update()
 
 		displaypath=$(relative_path "$prefix$sm_path")
 
-		if test "$update_module" = "none"
-		then
-			echo "Skipping submodule '$displaypath'"
-			continue
-		fi
-
-		if test -z "$url"
-		then
-			# Only mention uninitialized submodules when its
-			# path have been specified
-			test "$#" != "0" &&
-			say "$(eval_gettext "Submodule path '\$displaypath' not initialized
-Maybe you want to use 'update --init'?")"
-			continue
-		fi
-
-		if ! test -d "$sm_path"/.git && ! test -f "$sm_path"/.git
+		if test $just_cloned -eq 1
 		then
-			git submodule--helper clone ${GIT_QUIET:+--quiet} --prefix "$prefix" --path "$sm_path" --name "$name" --url "$url" "$reference" "$depth" || exit
-			cloned_modules="$cloned_modules;$name"
 			subsha1=
+			update_module=checkout
 		else
 			subsha1=$(clear_local_git_env; cd "$sm_path" &&
 				git rev-parse --verify HEAD) ||
@@ -742,13 +726,6 @@ Maybe you want to use 'update --init'?")"
 				die "$(eval_gettext "Unable to fetch in submodule path '\$displaypath'")"
 			fi
 
-			# Is this something we just cloned?
-			case ";$cloned_modules;" in
-			*";$name;"*)
-				# then there is no local change to integrate
-				update_module=checkout ;;
-			esac
-
 			must_die_on_failure=
 			case "$update_module" in
 			checkout)
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index 540771c..5991e3c 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -462,7 +462,7 @@ test_expect_success 'update --init' '
 	git config --remove-section submodule.example &&
 	test_must_fail git config submodule.example.url &&
 
-	git submodule update init > update.out &&
+	git submodule update init 2> update.out &&
 	cat update.out &&
 	test_i18ngrep "not initialized" update.out &&
 	test_must_fail git rev-parse --resolve-git-dir init/.git &&
@@ -480,7 +480,7 @@ test_expect_success 'update --init from subdirectory' '
 	mkdir -p sub &&
 	(
 		cd sub &&
-		git submodule update ../init >update.out &&
+		git submodule update ../init 2>update.out &&
 		cat update.out &&
 		test_i18ngrep "not initialized" update.out &&
 		test_must_fail git rev-parse --resolve-git-dir ../init/.git &&
-- 
2.5.0.283.g1a79c94.dirty

* [PATCH 6/9] clone: allow an explicit argument for parallel submodule clones
  2015-10-27 18:15  9% [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
                   ` (3 preceding siblings ...)
  2015-10-27 18:15 23% ` [PATCH 5/9] submodule update: expose parallelism to the user Stefan Beller
@ 2015-10-27 18:15 24% ` Stefan Beller
  2015-10-27 20:57  7%   ` Junio C Hamano
  2015-10-27 18:15 23% ` [PATCH 7/9] submodule config: remove name_and_item_from_var Stefan Beller
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-27 18:15 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

Just pass it along to "git submodule update", which may pick reasonable
defaults if you don't specify an explicit number.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/git-clone.txt |  5 ++++-
 builtin/clone.c             | 26 ++++++++++++++++++++------
 t/t7406-submodule-update.sh | 15 +++++++++++++++
 3 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index f1f2a3f..affa52e 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -14,7 +14,7 @@ SYNOPSIS
 	  [-o <name>] [-b <name>] [-u <upload-pack>] [--reference <repository>]
 	  [--dissociate] [--separate-git-dir <git dir>]
 	  [--depth <depth>] [--[no-]single-branch]
-	  [--recursive | --recurse-submodules] [--] <repository>
+	  [--recursive | --recurse-submodules] [--jobs <n>] [--] <repository>
 	  [<directory>]
 
 DESCRIPTION
@@ -216,6 +216,9 @@ objects from the source repository into a pack in the cloned repository.
 	The result is Git repository can be separated from working
 	tree.
 
+-j::
+--jobs::
+	The number of submodules fetched at the same time.
 
 <repository>::
 	The (possibly remote) repository to clone from.  See the
diff --git a/builtin/clone.c b/builtin/clone.c
index 5864ad1..b8b1d4c 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -50,6 +50,7 @@ static int option_progress = -1;
 static struct string_list option_config;
 static struct string_list option_reference;
 static int option_dissociate;
+static int max_jobs = -1;
 
 static struct option builtin_clone_options[] = {
 	OPT__VERBOSITY(&option_verbosity),
@@ -72,6 +73,8 @@ static struct option builtin_clone_options[] = {
 		    N_("initialize submodules in the clone")),
 	OPT_BOOL(0, "recurse-submodules", &option_recursive,
 		    N_("initialize submodules in the clone")),
+	OPT_INTEGER('j', "jobs", &max_jobs,
+		    N_("number of submodules cloned in parallel")),
 	OPT_STRING(0, "template", &option_template, N_("template-directory"),
 		   N_("directory from which templates will be used")),
 	OPT_STRING_LIST(0, "reference", &option_reference, N_("repo"),
@@ -95,10 +98,6 @@ static struct option builtin_clone_options[] = {
 	OPT_END()
 };
 
-static const char *argv_submodule[] = {
-	"submodule", "update", "--init", "--recursive", NULL
-};
-
 static const char *get_repo_path_1(struct strbuf *path, int *is_bundle)
 {
 	static char *suffix[] = { "/.git", "", ".git/.git", ".git" };
@@ -674,8 +673,23 @@ static int checkout(void)
 	err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
 			   sha1_to_hex(sha1), "1", NULL);
 
-	if (!err && option_recursive)
-		err = run_command_v_opt(argv_submodule, RUN_GIT_CMD);
+	if (!err && option_recursive) {
+		struct argv_array args = ARGV_ARRAY_INIT;
+		argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
+
+		if (max_jobs == -1)
+			if (git_config_get_int("submodule.jobs", &max_jobs))
+				max_jobs = 1;
+		if (max_jobs != 1) {
+			struct strbuf sb = STRBUF_INIT;
+			strbuf_addf(&sb, "--jobs=%d", max_jobs);
+			argv_array_push(&args, sb.buf);
+			strbuf_release(&sb);
+		}
+
+		err = run_command_v_opt(args.argv, RUN_GIT_CMD);
+		argv_array_clear(&args);
+	}
 
 	return err;
 }
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 05ea66f..ade0524 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -786,4 +786,19 @@ test_expect_success 'submodule update can be run in parallel' '
 	 grep "9 children" trace.out
 	)
 '
+
+test_expect_success 'git clone passes the parallel jobs config on to submodules' '
+	test_when_finished "rm -rf super4" &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules --jobs 7 . super4 &&
+	grep "7 children" trace.out &&
+	rm -rf super4 &&
+	git config --global submodule.jobs 8 &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules . super4 &&
+	grep "8 children" trace.out &&
+	rm -rf super4 &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules --jobs 9 . super4 &&
+	grep "9 children" trace.out &&
+	rm -rf super4
+'
+
 test_done
-- 
2.5.0.283.g1a79c94.dirty

* [PATCH 7/9] submodule config: remove name_and_item_from_var
  2015-10-27 18:15  9% [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
                   ` (4 preceding siblings ...)
  2015-10-27 18:15 24% ` [PATCH 6/9] clone: allow an explicit argument for parallel submodule clones Stefan Beller
@ 2015-10-27 18:15 23% ` Stefan Beller
  2015-10-27 18:15 21% ` [PATCH 8/9] submodule-config: parse_config Stefan Beller
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-27 18:15 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

Inlining `name_and_item_from_var` makes it easy to later add options
that do not require a submodule name.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule-config.c | 46 +++++++++++++++++-----------------------------
 1 file changed, 17 insertions(+), 29 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 8b8c7d1..4d0563c 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -161,22 +161,6 @@ static struct submodule *cache_lookup_name(struct submodule_cache *cache,
 	return NULL;
 }
 
-static int name_and_item_from_var(const char *var, struct strbuf *name,
-				  struct strbuf *item)
-{
-	const char *subsection, *key;
-	int subsection_len, parse;
-	parse = parse_config_key(var, "submodule", &subsection,
-			&subsection_len, &key);
-	if (parse < 0 || !subsection)
-		return 0;
-
-	strbuf_add(name, subsection, subsection_len);
-	strbuf_addstr(item, key);
-
-	return 1;
-}
-
 static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
 		const unsigned char *gitmodules_sha1, const char *name)
 {
@@ -251,18 +235,25 @@ static int parse_config(const char *var, const char *value, void *data)
 {
 	struct parse_config_parameter *me = data;
 	struct submodule *submodule;
-	struct strbuf name = STRBUF_INIT, item = STRBUF_INIT;
-	int ret = 0;
+	int subsection_len, ret = 0;
+	const char *subsection, *key;
+	char *name;
 
-	/* this also ensures that we only parse submodule entries */
-	if (!name_and_item_from_var(var, &name, &item))
+	if (parse_config_key(var, "submodule", &subsection,
+			     &subsection_len, &key) < 0)
 		return 0;
 
+	if (!subsection_len)
+		return 0;
+
+	/* subsection is not null terminated */
+	name = xmemdupz(subsection, subsection_len);
 	submodule = lookup_or_create_by_name(me->cache,
 					     me->gitmodules_sha1,
-					     name.buf);
+					     name);
+	free(name);
 
-	if (!strcmp(item.buf, "path")) {
+	if (!strcmp(key, "path")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->path != NULL)
@@ -275,7 +266,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			submodule->path = xstrdup(value);
 			cache_put_path(me->cache, submodule);
 		}
-	} else if (!strcmp(item.buf, "fetchrecursesubmodules")) {
+	} else if (!strcmp(key, "fetchrecursesubmodules")) {
 		/* when parsing worktree configurations we can die early */
 		int die_on_error = is_null_sha1(me->gitmodules_sha1);
 		if (!me->overwrite &&
@@ -286,7 +277,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			submodule->fetch_recurse = parse_fetch_recurse(
 								var, value,
 								die_on_error);
-	} else if (!strcmp(item.buf, "ignore")) {
+	} else if (!strcmp(key, "ignore")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->ignore != NULL)
@@ -302,7 +293,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->ignore);
 			submodule->ignore = xstrdup(value);
 		}
-	} else if (!strcmp(item.buf, "url")) {
+	} else if (!strcmp(key, "url")) {
 		if (!value) {
 			ret = config_error_nonbool(var);
 		} else if (!me->overwrite && submodule->url != NULL) {
@@ -312,7 +303,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->url);
 			submodule->url = xstrdup(value);
 		}
-	} else if (!strcmp(item.buf, "update")) {
+	} else if (!strcmp(key, "update")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->update != NULL)
@@ -324,9 +315,6 @@ static int parse_config(const char *var, const char *value, void *data)
 		}
 	}
 
-	strbuf_release(&name);
-	strbuf_release(&item);
-
 	return ret;
 }
 
-- 
2.5.0.283.g1a79c94.dirty

* [PATCH 5/9] submodule update: expose parallelism to the user
  2015-10-27 18:15  9% [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
                   ` (2 preceding siblings ...)
  2015-10-27 18:15 21% ` [PATCH 4/9] git submodule update: have a dedicated helper for cloning Stefan Beller
@ 2015-10-27 18:15 23% ` Stefan Beller
  2015-10-27 20:59  6%   ` Junio C Hamano
  2015-10-27 18:15 24% ` [PATCH 6/9] clone: allow an explicit argument for parallel submodule clones Stefan Beller
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-27 18:15 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

Expose the possible parallelism either via the "--jobs" command line
option or via the "submodule.jobs" setting.

Initializing the variable to -1 ensures that 0 can still be passed
into the parallel processing machinery, which then picks as many
parallel workers as there are CPUs.
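
To make the role of the -1 sentinel concrete, here is a minimal C
sketch. The helper name workers_for() is hypothetical and not part of
this series; in the patch the expansion of 0 to the CPU count happens
inside run_processes_parallel(), whereas the sketch does it locally
with sysconf(), assuming a platform that provides _SC_NPROCESSORS_ONLN.

```c
#include <unistd.h>

/*
 * Hypothetical helper: 0 is a real request meaning "one worker per
 * online CPU", so it must survive until the worker count is chosen.
 * The caller is assumed to have already replaced -1 ("unset") with a
 * concrete value. _SC_NPROCESSORS_ONLN is a common sysconf() extension
 * (Linux, macOS, BSDs), not strict POSIX.
 */
static int workers_for(int requested)
{
	if (requested == 0) {
		long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
		/* fall back to a single worker if the query fails */
		return ncpus > 0 ? (int)ncpus : 1;
	}
	return requested;
}
```

Because 0 carries meaning here, only a negative value can safely stand
for "not specified".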

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/git-submodule.txt |  6 +++++-
 builtin/submodule--helper.c     | 17 +++++++++++++----
 git-submodule.sh                |  9 +++++++++
 t/t7406-submodule-update.sh     | 12 ++++++++++++
 4 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index f17687e..f5429fa 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -16,7 +16,7 @@ SYNOPSIS
 'git submodule' [--quiet] deinit [-f|--force] [--] <path>...
 'git submodule' [--quiet] update [--init] [--remote] [-N|--no-fetch]
 	      [-f|--force] [--rebase|--merge] [--reference <repository>]
-	      [--depth <depth>] [--recursive] [--] [<path>...]
+	      [--depth <depth>] [--recursive] [--jobs <n>] [--] [<path>...]
 'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) <n>]
 	      [commit] [--] [<path>...]
 'git submodule' [--quiet] foreach [--recursive] <command>
@@ -374,6 +374,10 @@ for linkgit:git-clone[1]'s `--reference` and `--shared` options carefully.
 	clone with a history truncated to the specified number of revisions.
 	See linkgit:git-clone[1]
 
+-j::
+--jobs::
+	This option is only valid for the update command.
+	Clone new submodules in parallel with up to <n> jobs.
 
 <path>...::
 	Paths to submodule(s). When specified this will restrict the command
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 1ec1b85..c3d438a 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -431,6 +431,7 @@ static int update_clone_task_finished(int result,
 
 static int update_clone(int argc, const char **argv, const char *prefix)
 {
+	int max_jobs = -1;
 	struct string_list_item *item;
 	struct submodule_update_clone pp = SUBMODULE_UPDATE_CLONE_INIT;
 
@@ -451,6 +452,8 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 		OPT_STRING(0, "depth", &pp.depth, "<depth>",
 			   N_("Create a shallow clone truncated to the "
 			      "specified number of revisions")),
+		OPT_INTEGER('j', "jobs", &max_jobs,
+			    N_("parallel jobs")),
 		OPT__QUIET(&pp.quiet, N_("do't print cloning progress")),
 		OPT_END()
 	};
@@ -472,10 +475,16 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 	gitmodules_config();
 	/* Overlay the parsed .gitmodules file with .git/config */
 	git_config(git_submodule_config, NULL);
-	run_processes_parallel(1, update_clone_get_next_task,
-				  update_clone_start_failure,
-				  update_clone_task_finished,
-				  &pp);
+
+	if (max_jobs == -1)
+		if (git_config_get_int("submodule.jobs", &max_jobs))
+			max_jobs = 1;
+
+	run_processes_parallel(max_jobs,
+			       update_clone_get_next_task,
+			       update_clone_start_failure,
+			       update_clone_task_finished,
+			       &pp);
 
 	if (pp.print_unmatched) {
 		printf("#unmatched\n");
diff --git a/git-submodule.sh b/git-submodule.sh
index ea883b9..c2dfb16 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -636,6 +636,14 @@ cmd_update()
 		--depth=*)
 			depth=$1
 			;;
+		-j|--jobs)
+			case "$2" in '') usage ;; esac
+			jobs="--jobs=$2"
+			shift
+			;;
+		--jobs=*)
+			jobs=$1
+			;;
 		--)
 			shift
 			break
@@ -661,6 +669,7 @@ cmd_update()
 		${update:+--update "$update"} \
 		${reference:+--reference "$reference"} \
 		${depth:+--depth "$depth"} \
+		${jobs:+$jobs} \
 		"$@" | {
 	err=
 	while read mode sha1 stage just_cloned sm_path
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index dda3929..05ea66f 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -774,4 +774,16 @@ test_expect_success 'submodule update --recursive drops module name before recur
 	 test_i18ngrep "Submodule path .deeper/submodule/subsubmodule.: checked out" actual
 	)
 '
+
+test_expect_success 'submodule update can be run in parallel' '
+	(cd super2 &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update --jobs 7 &&
+	 grep "7 children" trace.out &&
+	 git config submodule.jobs 8 &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update &&
+	 grep "8 children" trace.out &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update --jobs 9 &&
+	 grep "9 children" trace.out
+	)
+'
 test_done
-- 
2.5.0.283.g1a79c94.dirty

* [PATCH 9/9] fetching submodules: Respect `submodule.jobs` config option
  2015-10-27 18:15  9% [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
                   ` (6 preceding siblings ...)
  2015-10-27 18:15 21% ` [PATCH 8/9] submodule-config: parse_config Stefan Beller
@ 2015-10-27 18:15 24% ` Stefan Beller
  2015-10-27 21:00  7%   ` Junio C Hamano
  2015-10-27 19:12  7% ` [PATCH 0/9] Expose the submodule parallelism to the user Junio C Hamano
  8 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-27 18:15 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

This allows configuring parallel fetching and updating without having
to pass the command line option.

This moves the responsibility for determining how many parallel
processes to start from builtin/fetch to submodule.c, as we need a way
to communicate "the user did not specify the number of parallel
processes on the command line" from the builtin fetch. The submodule
code takes care of the precedence (CLI > config > default).
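
The precedence described above can be sketched as follows. This is an
illustration only: resolve_jobs() is a hypothetical name, and it
assumes, as the patch does, that a negative value means "this source
did not set anything".

```c
/*
 * Hypothetical sketch of the precedence
 * command line > configuration > built-in default.
 * A negative value means "this source said nothing"; 0 is passed
 * through untouched so the parallel-processing code can expand it
 * to the number of CPUs.
 */
static int resolve_jobs(int cli_jobs, int config_jobs)
{
	int jobs = cli_jobs;

	if (jobs < 0)        /* no --jobs on the command line */
		jobs = config_jobs;
	if (jobs < 0)        /* no submodule.jobs configured either */
		jobs = 1;    /* default: one submodule at a time */
	return jobs;
}
```

For example, resolve_jobs(-1, 8) yields 8 because the config applies
when --jobs is absent, while resolve_jobs(7, 8) yields 7.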

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/config.txt    |  7 +++++++
 builtin/fetch.c             |  2 +-
 submodule-config.c          |  9 +++++++++
 submodule-config.h          |  2 ++
 submodule.c                 |  5 +++++
 t/t5526-fetch-submodules.sh | 14 ++++++++++++++
 6 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 315f271..0b733d7 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2575,6 +2575,13 @@ submodule.<name>.ignore::
 	"--ignore-submodules" option. The 'git submodule' commands are not
 	affected by this setting.
 
+submodule.jobs::
+	This is used to determine how many submodules can be operated on in
+	parallel. Specifying a positive integer allows up to that number
+	of submodules to be fetched in parallel. If 0 is specified, the
+	number of CPUs is used as the maximum. Currently this is
+	used in fetch and clone operations only.
+
 tag.sort::
 	This variable controls the sort ordering of tags when displayed by
 	linkgit:git-tag[1]. Without the "--sort=<value>" option provided, the
diff --git a/builtin/fetch.c b/builtin/fetch.c
index f28eac6..b1399dc 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -37,7 +37,7 @@ static int prune = -1; /* unspecified */
 static int all, append, dry_run, force, keep, multiple, update_head_ok, verbosity;
 static int progress = -1, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow;
-static int max_children = 1;
+static int max_children = -1;
 static const char *depth;
 static const char *upload_pack;
 static struct strbuf default_rla = STRBUF_INIT;
diff --git a/submodule-config.c b/submodule-config.c
index 1cea404..07bdcdf 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -32,6 +32,7 @@ enum lookup_type {
 
 static struct submodule_cache cache;
 static int is_cache_init;
+static int parallel_jobs = -1;
 
 static int config_path_cmp(const struct submodule_entry *a,
 			   const struct submodule_entry *b,
@@ -235,6 +236,9 @@ static int parse_generic_submodule_config(const char *var,
 					  const char *key,
 					  const char *value)
 {
+	if (!strcmp(key, "jobs")) {
+		parallel_jobs = strtol(value, NULL, 10);
+	}
 	return 0;
 }
 
@@ -483,3 +487,8 @@ void submodule_free(void)
 	cache_free(&cache);
 	is_cache_init = 0;
 }
+
+int config_parallel_submodules(void)
+{
+	return parallel_jobs;
+}
diff --git a/submodule-config.h b/submodule-config.h
index f9e2a29..d9bbf9a 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -27,4 +27,6 @@ const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
 		const char *path);
 void submodule_free(void);
 
+int config_parallel_submodules(void);
+
 #endif /* SUBMODULE_CONFIG_H */
diff --git a/submodule.c b/submodule.c
index c21b265..4822605 100644
--- a/submodule.c
+++ b/submodule.c
@@ -759,6 +759,11 @@ int fetch_populated_submodules(const struct argv_array *options,
 	argv_array_push(&spf.args, "--recurse-submodules-default");
 	/* default value, "--submodule-prefix" and its value are added later */
 
+	if (max_parallel_jobs < 0)
+		max_parallel_jobs = config_parallel_submodules();
+	if (max_parallel_jobs < 0)
+		max_parallel_jobs = 1;
+
 	calculate_changed_submodule_paths();
 	run_processes_parallel(max_parallel_jobs,
 			       get_next_submodule,
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 1b4ce69..5c3579c 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -470,4 +470,18 @@ test_expect_success "don't fetch submodule when newly recorded commits are alrea
 	test_i18ncmp expect.err actual.err
 '
 
+test_expect_success 'fetching submodules respects parallel settings' '
+	git config fetch.recurseSubmodules true &&
+	(
+		cd downstream &&
+		GIT_TRACE=$(pwd)/trace.out git fetch --jobs 7 &&
+		grep "7 children" trace.out &&
+		git config submodule.jobs 8 &&
+		GIT_TRACE=$(pwd)/trace.out git fetch &&
+		grep "8 children" trace.out &&
+		GIT_TRACE=$(pwd)/trace.out git fetch --jobs 9 &&
+		grep "9 children" trace.out
+	)
+'
+
 test_done
-- 
2.5.0.283.g1a79c94.dirty

* [PATCH 8/9] submodule-config: parse_config
  2015-10-27 18:15  9% [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
                   ` (5 preceding siblings ...)
  2015-10-27 18:15 23% ` [PATCH 7/9] submodule config: remove name_and_item_from_var Stefan Beller
@ 2015-10-27 18:15 21% ` Stefan Beller
  2015-10-27 18:15 24% ` [PATCH 9/9] fetching submodules: Respect `submodule.jobs` config option Stefan Beller
  2015-10-27 19:12  7% ` [PATCH 0/9] Expose the submodule parallelism to the user Junio C Hamano
  8 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-27 18:15 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

This rewrites parse_config to distinguish between configs specific to
one submodule and configs which apply generically to all submodules.
We do not have generic submodule configs yet, but the next patch will
introduce "submodule.jobs".
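
The split between generic and per-submodule settings hinges on whether
the variable name has a subsection. A minimal C sketch, assuming a
simplified splitter only loosely modelled on what parse_config_key()
reports; split_config_var() and is_generic_submodule_var() are
hypothetical names, and case folding of section and key is ignored.

```c
#include <string.h>

/*
 * Hypothetical splitter: for "submodule.<name>.<key>" it reports the
 * subsection (NOT NUL-terminated) and the key; for "submodule.<key>"
 * it reports an empty subsection. Returns -1 if VAR is not in SECTION.
 */
static int split_config_var(const char *var, const char *section,
			    const char **subsection, size_t *subsection_len,
			    const char **key)
{
	size_t len = strlen(section);
	const char *dot;

	if (strncmp(var, section, len) || var[len] != '.')
		return -1;
	var += len + 1;                /* skip "section." */

	dot = strrchr(var, '.');       /* the key follows the last dot */
	if (!dot || dot == var) {      /* "section.key": generic setting */
		*subsection = NULL;
		*subsection_len = 0;
		*key = dot ? dot + 1 : var;
	} else {                       /* "section.name.key": per-submodule */
		*subsection = var;
		*subsection_len = (size_t)(dot - var);
		*key = dot + 1;
	}
	return 0;
}

/* 1 for generic ("submodule.jobs"), 0 for per-submodule, -1 otherwise */
static int is_generic_submodule_var(const char *var)
{
	const char *subsection, *key;
	size_t subsection_len;

	if (split_config_var(var, "submodule", &subsection,
			     &subsection_len, &key) < 0)
		return -1;
	return subsection_len == 0;
}
```

Subsection names may themselves contain dots, which is why the key is
taken from the last dot rather than the second one.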

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule-config.c | 58 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 4d0563c..1cea404 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -231,27 +231,23 @@ struct parse_config_parameter {
 	int overwrite;
 };
 
-static int parse_config(const char *var, const char *value, void *data)
+static int parse_generic_submodule_config(const char *var,
+					  const char *key,
+					  const char *value)
 {
-	struct parse_config_parameter *me = data;
-	struct submodule *submodule;
-	int subsection_len, ret = 0;
-	const char *subsection, *key;
-	char *name;
-
-	if (parse_config_key(var, "submodule", &subsection,
-			     &subsection_len, &key) < 0)
-		return 0;
-
-	if (!subsection_len)
-		return 0;
+	return 0;
+}
 
-	/* subsection is not null terminated */
-	name = xmemdupz(subsection, subsection_len);
-	submodule = lookup_or_create_by_name(me->cache,
-					     me->gitmodules_sha1,
-					     name);
-	free(name);
+static int parse_specific_submodule_config(struct parse_config_parameter *me,
+					   const char *name,
+					   const char *key,
+					   const char *value,
+					   const char *var)
+{
+	int ret = 0;
+	struct submodule *submodule = lookup_or_create_by_name(me->cache,
+							       me->gitmodules_sha1,
+							       name);
 
 	if (!strcmp(key, "path")) {
 		if (!value)
@@ -318,6 +314,30 @@ static int parse_config(const char *var, const char *value, void *data)
 	return ret;
 }
 
+static int parse_config(const char *var, const char *value, void *data)
+{
+	struct parse_config_parameter *me = data;
+
+	int subsection_len;
+	const char *subsection, *key;
+	char *name;
+
+	if (parse_config_key(var, "submodule", &subsection,
+			     &subsection_len, &key) < 0)
+		return 0;
+
+	if (!subsection_len)
+		return parse_generic_submodule_config(var, key, value);
+	else {
+		int ret;
+		/* subsection is not null terminated */
+		name = xmemdupz(subsection, subsection_len);
+		ret = parse_specific_submodule_config(me, name, key, value, var);
+		free(name);
+		return ret;
+	}
+}
+
 static int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
 				      unsigned char *gitmodules_sha1)
 {
-- 
2.5.0.283.g1a79c94.dirty

* [PATCH 1/9] submodule-config: "goto" removal in parse_config()
  2015-10-27 18:15  9% [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
@ 2015-10-27 18:15 24% ` Stefan Beller
  2015-10-27 21:26  4%   ` Jonathan Nieder
  2015-10-27 18:15 26% ` [PATCH 2/9] submodule config: keep update strategy around Stefan Beller
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-27 18:15 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

Many arms of the if/else if/... cascade jumped to a shared
clean-up with "goto release_return", but we can restructure the
function a bit to make them disappear, which reduces the line count
as well.  Also reformat overlong and poorly indented lines while at
it.

The checks on the value for "ignore" used to complain about multiple
values first and only then about a missing (boolean) value; swap the
order to match how the values for "path" and "url" are verified.
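
The goto-free shape can be illustrated with a small, self-contained C
sketch. set_string_option() is a hypothetical helper, not a function
from submodule-config.c; it only mirrors the order of checks described
above (missing value first, then "already set", then store).

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: every check is one arm of an if/else-if
 * cascade, so control always reaches the single return at the end
 * instead of jumping to a shared clean-up label.
 * Returns -1 for a missing value (cf. config_error_nonbool()),
 * 1 when an existing value is kept (cf. warn_multiple_config()),
 * and 0 when the new value was stored.
 */
static int set_string_option(char **slot, const char *value, int overwrite)
{
	int ret = 0;

	if (!value) {
		ret = -1;              /* "key" given without "= value" */
	} else if (!overwrite && *slot) {
		ret = 1;               /* keep the first value seen */
	} else {
		size_t len = strlen(value) + 1;
		char *copy = malloc(len);

		if (!copy) {
			ret = -1;
		} else {
			memcpy(copy, value, len);
			free(*slot);   /* release any previous value */
			*slot = copy;
		}
	}
	return ret;
}
```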

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule-config.c | 74 +++++++++++++++++++++---------------------------------
 1 file changed, 29 insertions(+), 45 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 393de53..afe0ea8 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -257,78 +257,62 @@ static int parse_config(const char *var, const char *value, void *data)
 	if (!name_and_item_from_var(var, &name, &item))
 		return 0;
 
-	submodule = lookup_or_create_by_name(me->cache, me->gitmodules_sha1,
-			name.buf);
+	submodule = lookup_or_create_by_name(me->cache,
+					     me->gitmodules_sha1,
+					     name.buf);
 
 	if (!strcmp(item.buf, "path")) {
-		struct strbuf path = STRBUF_INIT;
-		if (!value) {
+		if (!value)
 			ret = config_error_nonbool(var);
-			goto release_return;
-		}
-		if (!me->overwrite && submodule->path != NULL) {
+		else if (!me->overwrite && submodule->path != NULL)
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"path");
-			goto release_return;
+		else {
+			if (submodule->path)
+				cache_remove_path(me->cache, submodule);
+			free((void *) submodule->path);
+			submodule->path = xstrdup(value);
+			cache_put_path(me->cache, submodule);
 		}
-
-		if (submodule->path)
-			cache_remove_path(me->cache, submodule);
-		free((void *) submodule->path);
-		strbuf_addstr(&path, value);
-		submodule->path = strbuf_detach(&path, NULL);
-		cache_put_path(me->cache, submodule);
 	} else if (!strcmp(item.buf, "fetchrecursesubmodules")) {
 		/* when parsing worktree configurations we can die early */
 		int die_on_error = is_null_sha1(me->gitmodules_sha1);
 		if (!me->overwrite &&
-		    submodule->fetch_recurse != RECURSE_SUBMODULES_NONE) {
+		    submodule->fetch_recurse != RECURSE_SUBMODULES_NONE)
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"fetchrecursesubmodules");
-			goto release_return;
-		}
-
-		submodule->fetch_recurse = parse_fetch_recurse(var, value,
+		else
+			submodule->fetch_recurse = parse_fetch_recurse(
+								var, value,
 								die_on_error);
 	} else if (!strcmp(item.buf, "ignore")) {
-		struct strbuf ignore = STRBUF_INIT;
-		if (!me->overwrite && submodule->ignore != NULL) {
+		if (!value)
+			ret = config_error_nonbool(var);
+		else if (!me->overwrite && submodule->ignore != NULL)
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"ignore");
-			goto release_return;
-		}
-		if (!value) {
-			ret = config_error_nonbool(var);
-			goto release_return;
-		}
-		if (strcmp(value, "untracked") && strcmp(value, "dirty") &&
-		    strcmp(value, "all") && strcmp(value, "none")) {
+		else if (strcmp(value, "untracked") &&
+			 strcmp(value, "dirty") &&
+			 strcmp(value, "all") &&
+			 strcmp(value, "none"))
 			warning("Invalid parameter '%s' for config option "
 					"'submodule.%s.ignore'", value, var);
-			goto release_return;
+		else {
+			free((void *) submodule->ignore);
+			submodule->ignore = xstrdup(value);
 		}
-
-		free((void *) submodule->ignore);
-		strbuf_addstr(&ignore, value);
-		submodule->ignore = strbuf_detach(&ignore, NULL);
 	} else if (!strcmp(item.buf, "url")) {
-		struct strbuf url = STRBUF_INIT;
 		if (!value) {
 			ret = config_error_nonbool(var);
-			goto release_return;
-		}
-		if (!me->overwrite && submodule->url != NULL) {
+		} else if (!me->overwrite && submodule->url != NULL) {
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"url");
-			goto release_return;
+		} else {
+			free((void *) submodule->url);
+			submodule->url = xstrdup(value);
 		}
-
-		free((void *) submodule->url);
-		strbuf_addstr(&url, value);
-		submodule->url = strbuf_detach(&url, NULL);
 	}
 
-release_return:
 	strbuf_release(&name);
 	strbuf_release(&item);
 
-- 
2.5.0.283.g1a79c94.dirty

* [PATCH 0/9] Expose the submodule parallelism to the user
@ 2015-10-27 18:15  9% Stefan Beller
  2015-10-27 18:15 24% ` [PATCH 1/9] submodule-config: "goto" removal in parse_config() Stefan Beller
                   ` (8 more replies)
  0 siblings, 9 replies; 200+ results
From: Stefan Beller @ 2015-10-27 18:15 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

Where does it apply?
---
This applies on top of 376d400f4c (run-command: fix missing output from
late callbacks), which is the latest commit in
origin/sb/submodule-parallel-fetch and has been merged to origin/next.
The first patch is a duplicate of origin/sb/submodule-config-parse, so
it may make sense to drop the first patch and apply this series on top
of a merge of 376d400f4c and origin/sb/submodule-config-parse.

I realize that sending refactorings in an area you are likely to touch
as a separate patch (series) is not necessarily a good idea, as it
leads to situations like this.

What does it do?
---
This series should finish the ongoing efforts of parallelizing
submodule network traffic. The patches contain tests for clone,
fetch and submodule update to use the actual parallelism both via
command line as well as a configured option. I decided to go with
"submodule.jobs" for all three for now.

Detailed breakdown of the patches
---

Patch 1 is a duplicate of origin/sb/submodule-config-parse and may make
merging with that easier.

Patch 2 adds the update strategy to the struct submodule, which is required in
patch 4.

Patch 3 adds rudimentary tracing output to the parallel processing commands.

Patch 4 rewrites parts of "git submodule update" in C, such that the cloning
is done from within the parallel processing engine. 

Patch 5 however exposes the possible parallelism of patch 4 to the user.
(doc + tests)

Patch 6 adds the parallel feature to clone, which just invokes "submodule update"
internally.

Patch 7 is a small refactoring preparing patch 8 to smoothly parse submodule.jobs.

Patch 9 teaches fetch to respect the desired parallelism both from command line
as well as the config option.

Thanks,
Stefan

Stefan Beller (9):
  submodule-config: "goto" removal in parse_config()
  submodule config: keep update strategy around
  run_processes_parallel: Add output to tracing messages
  git submodule update: have a dedicated helper for cloning
  submodule update: expose parallelism to the user
  clone: allow an explicit argument for parallel submodule clones
  submodule config: remove name_and_item_from_var
  submodule-config: parse_config
  fetching submodules: Respect `submodule.jobs` config option

 Documentation/config.txt        |   7 ++
 Documentation/git-clone.txt     |   5 +-
 Documentation/git-submodule.txt |   6 +-
 builtin/clone.c                 |  26 ++++-
 builtin/fetch.c                 |   2 +-
 builtin/submodule--helper.c     | 243 ++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh                |  54 ++++-----
 run-command.c                   |   4 +
 submodule-config.c              | 166 ++++++++++++++-------------
 submodule-config.h              |   3 +
 submodule.c                     |   5 +
 t/t5526-fetch-submodules.sh     |  14 +++
 t/t7400-submodule-basic.sh      |   4 +-
 t/t7406-submodule-update.sh     |  27 +++++
 14 files changed, 444 insertions(+), 122 deletions(-)

-- 
2.5.0.283.g1a79c94.dirty


* Re: [PATCH 0/9] Expose the submodule parallelism to the user
  2015-10-27 18:15  9% [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
                   ` (7 preceding siblings ...)
  2015-10-27 18:15 24% ` [PATCH 9/9] fetching submodules: Respect `submodule.jobs` config option Stefan Beller
@ 2015-10-27 19:12  7% ` Junio C Hamano
  2015-10-28 23:21 25%   ` [PATCHv2 0/8] " Stefan Beller
  8 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2015-10-27 19:12 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, jacob.keller, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Stefan Beller <sbeller@google.com> writes:

> Where does it apply?
> ---
> This applies on 376d400f4c (run-command: fix missing output from late callbacks,
> which is the latest commit in origin/sb/submodule-parallel-fetch which was
> merged to origin/next)

Thanks for a detailed description.  I'd do this:

    $ git checkout -b sb/submodule-parallel-update 8b70042
    $ git merge sb/submodule-parallel-fetch~4 ;# 376d400f4c

apply 2-9 there (the fork point is the merge of config-parse topic
to 'master'), and drop the four patches near the top of the other
branch.

> I realize sending refactorings in the area you'd be likely to touch as 
> a separate patch (series) is not necessarily a good idea as it leads to
> situations like this.

Don't worry too much about it.  When you tackle a large area with a
lot of existing code, these things are bound to happen.

> What does it do?
> ---
> This series should finish the ongoing efforts of parallelizing
> submodule network traffic. The patches contain tests for clone,
> fetch and submodule update to use the actual parallelism both via
> command line as well as a configured option.

;-)


* Re: [PATCH 6/9] clone: allow an explicit argument for parallel submodule clones
  2015-10-27 18:15 24% ` [PATCH 6/9] clone: allow an explicit argument for parallel submodule clones Stefan Beller
@ 2015-10-27 20:57  7%   ` Junio C Hamano
  2015-10-28 20:50  4%     ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2015-10-27 20:57 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, jacob.keller, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Stefan Beller <sbeller@google.com> writes:

> Just pass it along to "git submodule update", which may pick reasonable
> defaults if you don't specify an explicit number.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  Documentation/git-clone.txt |  5 ++++-
>  builtin/clone.c             | 26 ++++++++++++++++++++------
>  t/t7406-submodule-update.sh | 15 +++++++++++++++
>  3 files changed, 39 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
> index f1f2a3f..affa52e 100644
> --- a/Documentation/git-clone.txt
> +++ b/Documentation/git-clone.txt
> @@ -14,7 +14,7 @@ SYNOPSIS
>  	  [-o <name>] [-b <name>] [-u <upload-pack>] [--reference <repository>]
>  	  [--dissociate] [--separate-git-dir <git dir>]
>  	  [--depth <depth>] [--[no-]single-branch]
> -	  [--recursive | --recurse-submodules] [--] <repository>
> +	  [--recursive | --recurse-submodules] [--jobs <n>] [--] <repository>
>  	  [<directory>]
>  
>  DESCRIPTION
> @@ -216,6 +216,9 @@ objects from the source repository into a pack in the cloned repository.
>  	The result is Git repository can be separated from working
>  	tree.
>  
> +-j::
> +--jobs::

Judging from the way how "--depth <depth>" and other options with
parameter are described, I think this should be:

          -j <n>::
          --jobs <n>::

> +	The number of submodules fetched at the same time.

Do we want to say "Defaults to submodule.jobs" somewhere?

>  
>  <repository>::
>  	The (possibly remote) repository to clone from.  See the
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 5864ad1..b8b1d4c 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -50,6 +50,7 @@ static int option_progress = -1;
>  static struct string_list option_config;
>  static struct string_list option_reference;
>  static int option_dissociate;
> +static int max_jobs = -1;
>  
>  static struct option builtin_clone_options[] = {
>  	OPT__VERBOSITY(&option_verbosity),
> @@ -72,6 +73,8 @@ static struct option builtin_clone_options[] = {
>  		    N_("initialize submodules in the clone")),
>  	OPT_BOOL(0, "recurse-submodules", &option_recursive,
>  		    N_("initialize submodules in the clone")),
> +	OPT_INTEGER('j', "jobs", &max_jobs,
> +		    N_("number of submodules cloned in parallel")),
>  	OPT_STRING(0, "template", &option_template, N_("template-directory"),
>  		   N_("directory from which templates will be used")),
>  	OPT_STRING_LIST(0, "reference", &option_reference, N_("repo"),
> @@ -95,10 +98,6 @@ static struct option builtin_clone_options[] = {
>  	OPT_END()
>  };
>  
> -static const char *argv_submodule[] = {
> -	"submodule", "update", "--init", "--recursive", NULL
> -};
> -
>  static const char *get_repo_path_1(struct strbuf *path, int *is_bundle)
>  {
>  	static char *suffix[] = { "/.git", "", ".git/.git", ".git" };
> @@ -674,8 +673,23 @@ static int checkout(void)
>  	err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
>  			   sha1_to_hex(sha1), "1", NULL);
>  
> -	if (!err && option_recursive)
> -		err = run_command_v_opt(argv_submodule, RUN_GIT_CMD);
> +	if (!err && option_recursive) {
> +		struct argv_array args = ARGV_ARRAY_INIT;
> +		argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
> +
> +		if (max_jobs == -1)
> +			if (git_config_get_int("submodule.jobs", &max_jobs))
> +				max_jobs = 1;

This is somewhat an irregular way to handle a configuration
variable.  Usually we instead do:

	* initialize a variable to "unspecified" (e.g. -1);
        * let git_config() callback to overwrite the variable;
        * let parse_options() to overwrite the variable.

so that you can just use the variable at the use site like this
function, knowing that the variable is already set with the correct
precedence order.
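
A minimal, self-contained sketch of that precedence order, in plain C
instead of Git's git_config()/parse_options() machinery. The helpers
read_config_jobs() and parse_cmdline_jobs() and the JOBS_UNSPECIFIED
constant are hypothetical stand-ins; the point is only that each step
overwrites the variable when it finds a value, so the last writer (the
command line) wins.

```c
#include <stdlib.h>
#include <string.h>

#define JOBS_UNSPECIFIED (-1)

/* Hypothetical stand-in for one git_config() callback invocation:
 * it only touches *jobs when the "submodule.jobs" key is present. */
static void read_config_jobs(const char *key, const char *value, int *jobs)
{
	if (key && value && !strcmp(key, "submodule.jobs"))
		*jobs = atoi(value);
}

/* Hypothetical stand-in for parse_options(): "--jobs=N" overwrites
 * whatever the configuration set. */
static void parse_cmdline_jobs(int argc, const char **argv, int *jobs)
{
	for (int i = 0; i < argc; i++)
		if (!strncmp(argv[i], "--jobs=", 7))
			*jobs = atoi(argv[i] + 7);
}

int resolve_jobs(const char *cfg_key, const char *cfg_value,
		 int argc, const char **argv)
{
	int jobs = JOBS_UNSPECIFIED;                 /* 1. "unspecified" */
	read_config_jobs(cfg_key, cfg_value, &jobs); /* 2. configuration */
	parse_cmdline_jobs(argc, argv, &jobs);       /* 3. command line  */
	return jobs;
}
```

With submodule.jobs=4 in the config and --jobs=8 on the command line,
resolve_jobs() returns 8; with neither, it stays at -1 so the caller can
apply its own fallback at the use site.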

Besides, if you really cared what the value of submodule.jobs is,
shouldn't you be calling config_parallel_submodules()?  I'd also
think that you do not want to read that variable here in the first
place (see below)...

> +		if (max_jobs != 1) {
> +			struct strbuf sb = STRBUF_INIT;
> +			strbuf_addf(&sb, "--jobs=%d", max_jobs);
> +			argv_array_push(&args, sb.buf);
> +			strbuf_release(&sb);
> +		}

I am tempted to suggest that you should not pay attention to
"submodule.jobs" in this command at all and just pass through
"--jobs=$max_jobs" that was specified from the command line, as the
spawned "submodule update --init --recursive" would handle
"submodule.jobs" itself.

Once you start allowing "clone.jobs" as a more specific version of
"submodule.jobs", then reading max_jobs first from "clone.jobs" and
then from the command line starts to make sense.  When neither is
specified, you would spawn "submodule update --init --recursive"
without any explicit "-j N" and let it honor its more generic
"submodule.jobs" setting; otherwise, you would run it with "-j N" to
override that more generic "submodule.jobs" setting with either the
value the command line -j given to "clone" or specified by a more
specific "clone.jobs".
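
The spawning logic described above could look roughly like this
hypothetical build_update_argv() helper, written in standalone C rather
than with Git's argv_array API. It assumes -1 means neither clone.jobs
nor -j was given; in that case no --jobs is passed and the spawned
"submodule update" is left to honor submodule.jobs on its own.

```c
#include <stddef.h>
#include <stdio.h>

#define JOBS_UNSPECIFIED (-1)

/*
 * Fill argv_out (room for at least 6 entries) with the arguments for
 * "git submodule update --init --recursive [--jobs=N]".  jobs_buf holds
 * the formatted "--jobs=N" string, so it must outlive argv_out.
 * Returns the number of arguments, not counting the NULL terminator.
 */
size_t build_update_argv(int max_jobs, char *jobs_buf, size_t bufsz,
			 const char **argv_out)
{
	size_t n = 0;

	argv_out[n++] = "submodule";
	argv_out[n++] = "update";
	argv_out[n++] = "--init";
	argv_out[n++] = "--recursive";
	if (max_jobs != JOBS_UNSPECIFIED) {
		/* explicit -j (or a more specific clone.jobs) overrides submodule.jobs */
		snprintf(jobs_buf, bufsz, "--jobs=%d", max_jobs);
		argv_out[n++] = jobs_buf;
	}
	/* otherwise the spawned command reads submodule.jobs itself */
	argv_out[n] = NULL;
	return n;
}
```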

> +		err = run_command_v_opt(args.argv, RUN_GIT_CMD);
> +		argv_array_clear(&args);
> +	}

Thanks.


* Re: [PATCH 5/9] submodule update: expose parallelism to the user
  2015-10-27 18:15 23% ` [PATCH 5/9] submodule update: expose parallelism to the user Stefan Beller
@ 2015-10-27 20:59  6%   ` Junio C Hamano
  2015-10-28 21:40  4%     ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2015-10-27 20:59 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, jacob.keller, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Stefan Beller <sbeller@google.com> writes:

> @@ -374,6 +374,10 @@ for linkgit:git-clone[1]'s `--reference` and `--shared` options carefully.
>  	clone with a history truncated to the specified number of revisions.
>  	See linkgit:git-clone[1]
>  
> +-j::
> +--jobs::

This probably should be 

          -j <n>::
          --jobs <n>::

(see comments on [6/9]).  I know the option description in this file
is sloppy and does not say "--name <name>" etc., as it should (but
it does say "--reference <repository>"), and fixing them may not be
within the scope of this series, but we do not need to add more to
the existing problems.

> +	This option is only valid for the update command.
> +	Clone new submodules in parallel with as many jobs.

And when 0 starts to mean something special, we would need to
describe that here (and/or submodule.jobs entry in config.txt).
As I already said, I do not think "0 means num_cpus" is a useful
default, and I would prefer if we reserved 0 to mean something more
useful we would figure out later.

Thanks.


* Re: [PATCH 9/9] fetching submodules: Respect `submodule.jobs` config option
  2015-10-27 18:15 24% ` [PATCH 9/9] fetching submodules: Respect `submodule.jobs` config option Stefan Beller
@ 2015-10-27 21:00  7%   ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2015-10-27 21:00 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, jacob.keller, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Stefan Beller <sbeller@google.com> writes:

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 315f271..0b733d7 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2575,6 +2575,13 @@ submodule.<name>.ignore::
>  	"--ignore-submodules" option. The 'git submodule' commands are not
>  	affected by this setting.
>  
> +submodule::jobs

Did you mean this?

    submodule.jobs::

> +	This is used to determine how many submodules can be operated on in
> +	parallel. Specifying a positive integer allows up to that number
> +	of submodules being fetched in parallel. Specifying 0 the number
> +	of cpus will be taken as the maximum number. Currently this is
> +	used in fetch and clone operations only.
> +

You probably do not want to say "Currently this is" (you may still
want "only", though).  Whoever teaches other codepaths to pay
attention to the variable would update this as long as the
documentation stays current.

By the way, I doubt that "0 means num-CPUs" is a useful default for
parallelism that is used to help anything that is not CPU bound;
"clone", "submodule update", etc. are dominantly network bound, and
then disk I/O bound (especially if you are cloning from local disk).
I'd rather see "-j 0" to error out as "reserved for future use",
until we figure out what the useful default is, and then "-j 0" can
start using that default that is more useful than num_cpu.
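
Under this proposal, a -j parser would reject 0 outright. A hypothetical
sketch of such a check follows; validate_jobs() is not Git code, and it
returns -1 where Git would call die().

```c
#include <stdio.h>

/*
 * Hypothetical validation of a -j/--jobs value under the
 * "0 is reserved for future use" proposal.  Returns 0 and stores
 * the value in *out on success; returns -1 otherwise.
 */
int validate_jobs(int requested, int *out)
{
	if (requested == 0) {
		fprintf(stderr, "-j 0 is reserved for future use\n");
		return -1;
	}
	if (requested < 0) {
		fprintf(stderr, "invalid number of jobs: %d\n", requested);
		return -1;
	}
	*out = requested;
	return 0;
}
```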


* Re: [PATCH 1/9] submodule-config: "goto" removal in parse_config()
  2015-10-27 18:15 24% ` [PATCH 1/9] submodule-config: "goto" removal in parse_config() Stefan Beller
@ 2015-10-27 21:26  4%   ` Jonathan Nieder
  0 siblings, 0 replies; 200+ results
From: Jonathan Nieder @ 2015-10-27 21:26 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, jacob.keller, peff, gitster, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Hi,

Stefan Beller wrote:

> Subject: submodule-config: "goto" removal in parse_config()
>
> Many components in if/else if/... cascade jumped to a shared
> clean-up with "goto release_return", but we can restructure the
> function a bit and make them disappear,

Not having read the patch yet, the above makes me suspect this is
going to make the code worse.  A 'goto' for exception handling can
be a clean way to ensure everything allocated gets released, and
restructuring to avoid that can end up making the code more error
prone and harder to read.

In other words, the "goto" removal should be a side effect and not
the motivation.

>                                         which reduces the line count
> as well.  Also reformat overlong lines and poorly indented ones
> while at it.

These sound like good things.  Hopefully this will make the code
structure easier to understand, too.

> The order of rules to verify the value for "ignore" used to be to
> complain on multiple values first and then complain to boolean, but
> swap the order to match how the values for "path" and "url" are
> verified.

I don't understand this.  Hopefully the patch will make it clearer.

> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  submodule-config.c | 74 +++++++++++++++++++++---------------------------------
>  1 file changed, 29 insertions(+), 45 deletions(-)

What patch does this apply against?  A similar patch appears to
already be part of "master".

[...]
> --- a/submodule-config.c
> +++ b/submodule-config.c
> @@ -257,78 +257,62 @@ static int parse_config(const char *var, const char *value, void *data)
>  	if (!name_and_item_from_var(var, &name, &item))
>  		return 0;
>  
> -	submodule = lookup_or_create_by_name(me->cache, me->gitmodules_sha1,
> -			name.buf);
> +	submodule = lookup_or_create_by_name(me->cache,
> +					     me->gitmodules_sha1,
> +					     name.buf);

Ok.

>  	if (!strcmp(item.buf, "path")) {
> -		struct strbuf path = STRBUF_INIT;
> -		if (!value) {
> +		if (!value)
>  			ret = config_error_nonbool(var);
> -			goto release_return;
> -		}

In the preimage, I can see at this line already that nothing more is going to
happen in this case.  In the postimage, I need to scroll down to find that
everything else is "else"s.

More generally, the patch seems to be about changing from a code structure
of

	if (condition) {
		handle it;
		goto done;
	}
	if (other condition) {
		handle it;
		goto done;
	}
	handle misc;
	goto done;

to

	if (condition) {
		handle it;
	} else if (other condition) {
		handle it;
	} else {
		handle misc;
	}

In this example the postimage is concise and simple enough that it's
probably worth it, but it is not obvious in the general case that this
is always a good thing to do.
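
To make the trade-off concrete, here are two hypothetical C functions,
set_with_goto() and set_with_else(), that mirror the shape of the
url/ignore branches in parse_config() without being Git code. Both always
release a scratch allocation, which stands in for the name/item strbufs,
and both behave identically; which one reads better is the judgment call
discussed above.

```c
#include <stdlib.h>
#include <string.h>

/* Portable strdup() replacement (strdup is not in C99). */
static char *dup_str(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);
	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Returns -1 for a NULL value, 1 if *dst is already set and overwrite
 * is off, 0 after replacing *dst with a copy of value. */
int set_with_goto(char **dst, const char *value, int overwrite)
{
	int ret = 0;
	char *scratch = malloc(64); /* must be released on every path */

	if (!value) {
		ret = -1;
		goto release_return;
	}
	if (!overwrite && *dst) {
		ret = 1;
		goto release_return;
	}
	free(*dst);
	*dst = dup_str(value);

release_return:
	free(scratch);
	return ret;
}

/* Same contract, written as an if/else-if cascade. */
int set_with_else(char **dst, const char *value, int overwrite)
{
	int ret = 0;
	char *scratch = malloc(64);

	if (!value) {
		ret = -1;
	} else if (!overwrite && *dst) {
		ret = 1;
	} else {
		free(*dst);
		*dst = dup_str(value);
	}

	free(scratch);
	return ret;
}
```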

Now that I see the patch is already merged, I don't think it needs
tweaks.  Just a little concerned about the possibility of people
judging from the commit message and emulating the pattern in the rest
of git.

Thanks and hope that helps,
Jonathan


* Re: What's the ".git/gitdir" file?
  @ 2015-10-27 22:22  5% ` Stefan Beller
  2015-10-27 22:42  2%   ` Randall S. Becker
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-27 22:22 UTC (permalink / raw)
  To: Kyle Meyer; +Cc: git@vger.kernel.org

On Tue, Oct 27, 2015 at 3:04 PM, Kyle Meyer <kyle@kyleam.com> wrote:
> Hello,
>
> When a ".git" file points to another repo, a ".git/gitdir" file is
> created in that repo.
>
> For example, running
>
>     $ mkdir repo-a repo-b
>     $ cd repo-a
>     $ git init
>     $ cd ../repo-b
>     $ echo "gitdir: ../repo-a/.git" > .git
>     $ git status
>
> results in a file "repo-a/.git/gitdir" that contains
>
>     $ cat repo-a/.git/gitdir
>     .git
>
> I don't see this file mentioned in the gitrepository-layout manpage,
> and my searches haven't turned up any information on it.  What's the
> purpose of ".git/gitdir"?  Are there cases where it will contain
> something other than ".git"?
>
> Thanks.

It's designed for submodules to work IIUC.

Back in the day each git submodule had its own .git directory
keeping its local objects.

Nowadays the repository of submodule <name> is kept in the superproject's
.git/modules/<name> directory.

If you are in the submodule, however, you need to know where the repository is,
so we have a file pointing at the ../<up to the superproject's root>/.git/modules/<name>
directory.

If not using submodules, I'd expect that file not to be there.
If you have a file .git/gitdir which points to plain .git, this is technically
correct, indicating where to find the repository (containing objects, etc.).
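
The ".git" file in Kyle's example is a single line of the form
"gitdir: <path>". Below is a hypothetical parse_gitfile() sketch of
extracting that path in standalone C; it is not Git's own gitfile reader,
and it leaves out resolving a relative path against the directory that
contains the ".git" file.

```c
#include <stddef.h>
#include <string.h>

/*
 * Extract the path from the contents of a ".git" file of the form
 * "gitdir: <path>\n".  Returns 0 and copies the path (without the
 * line terminator) into out on success; returns -1 if the prefix is
 * missing, the path is empty, or it does not fit in outsz bytes.
 */
int parse_gitfile(const char *content, char *out, size_t outsz)
{
	static const char prefix[] = "gitdir: ";
	size_t len;

	if (strncmp(content, prefix, sizeof(prefix) - 1))
		return -1;
	content += sizeof(prefix) - 1;

	len = strcspn(content, "\r\n"); /* stop at end of the first line */
	if (len == 0 || len >= outsz)
		return -1;
	memcpy(out, content, len);
	out[len] = '\0';
	return 0;
}
```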

>
> --
> Kyle
> git version 2.6.1


* RE: What's the ".git/gitdir" file?
  2015-10-27 22:22  5% ` Stefan Beller
@ 2015-10-27 22:42  2%   ` Randall S. Becker
  2015-10-27 22:54  4%     ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Randall S. Becker @ 2015-10-27 22:42 UTC (permalink / raw)
  To: 'Stefan Beller', 'Kyle Meyer'; +Cc: git

-----Original Message-----
On Tue, October-27-15 6:23 PM, Stefan Beller wrote:
>On Tue, Oct 27, 2015 at 3:04 PM, Kyle Meyer <kyle@kyleam.com> wrote:
>> When a ".git" file points to another repo, a ".git/gitdir" file is 
>> created in that repo.
>>
>> For example, running
>>
>>     $ mkdir repo-a repo-b
>>     $ cd repo-a
>>     $ git init
>>     $ cd ../repo-b
>>     $ echo "gitdir: ../repo-a/.git" > .git
>>     $ git status
>>
>> results in a file "repo-a/.git/gitdir" that contains
>>
>>     $ cat repo-a/.git/gitdir
>>     .git
>>
>> I don't see this file mentioned in the gitrepository-layout manpage, 
>> and my searches haven't turned up any information on it.  What's the 
>> purpose of ".git/gitdir"?  Are there cases where it will contain 
>> something other than ".git"?
>
>It's designed for submodules to work IIUC.
>
>Back in the day each git submodule had its own .git directory keeping its local objects.

>Nowadays the repository of submodule <name> is kept in the superprojects .git/modules/<name> directory.

Slightly OT: Is there any way of avoiding having that file in the first place? I'm hoping to have a git repository in a normal file system (Posix) and a working area in a rather less-than-normal one where dots in file names are bad (actually a dot is a separator).

Cheers,
Randall


* Re: What's the ".git/gitdir" file?
  2015-10-27 22:42  2%   ` Randall S. Becker
@ 2015-10-27 22:54  4%     ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-27 22:54 UTC (permalink / raw)
  To: Randall S. Becker; +Cc: Kyle Meyer, git@vger.kernel.org

On Tue, Oct 27, 2015 at 3:42 PM, Randall S. Becker
<rsbecker@nexbridge.com> wrote:
> Slightly OT: Is there any way of avoiding having that file in the first place? I'm hoping to have a git repository in a normal file system (Posix) and a working area in a rather less-than-normal one where dots in file names are bad (actually a dot is a separator).

As said before, I would not expect a file .git/gitdir to be there if
not using submodules.
For your OT question, I'd presume you'd have environment variables set up
    export GIT_DIR=path_with_no_dots_and_git_repo_in_it  # you mentioned that is in your POSIX FS
    export GIT_WORK_TREE=/some.place.with.dot.separators
and you'd be good to go.


>
> Cheers,
> Randall
>


* Re: [PATCH 6/9] clone: allow an explicit argument for parallel submodule clones
  2015-10-27 20:57  7%   ` Junio C Hamano
@ 2015-10-28 20:50  4%     ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-28 20:50 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git@vger.kernel.org, Jacob Keller, Jeff King, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Eric Sunshine

On Tue, Oct 27, 2015 at 1:57 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> +     The number of submodules fetched at the same time.
>
> Do we want to say "Defaults to submodule.jobs" somewhere?

Yes. :)

> I am tempted to suggest that you should not pay attention to
> "submodule.jobs" in this command at all and just pass through
> "--jobs=$max_jobs" that was specified from the command line, as the
> spawned "submodule update --init --recursive" would handle
> "submodule.jobs" itself.

makes sense.

>
> Once you start allowing "clone.jobs" as a more specific version of
> "submodule.jobs", then reading max_jobs first from "clone.jobs" and
> then from the command line starts to make sense.  When neither is
> specified, you would spawn "submodule update --init --recursive"
> without any explicit "-j N" and let it honor its more generic
> "submodule.jobs" setting; otherwise, you would run it with "-j N" to
> override that more generic "submodule.jobs" setting with either the
> value the command line -j given to "clone" or specified by a more
> specific "clone.jobs".

I see. Though I do not plan adding clone.jobs in the near future.


* Re: [PATCH 3/3] clone: Allow an explicit argument for parallel submodule clones
  2015-10-23 18:44 19% ` [PATCH 3/3] clone: Allow an explicit argument for parallel submodule clones Stefan Beller
@ 2015-10-28 21:03  4%   ` Sebastian Schuberth
  0 siblings, 0 replies; 200+ results
From: Sebastian Schuberth @ 2015-10-28 21:03 UTC (permalink / raw)
  To: git; +Cc: jrnieder, Jens.Lehmann

On 23.10.2015 20:44, Stefan Beller wrote:

> [...] which may pick reasonable
> defaults if you don't specify an explicit number.

IMO the above should also be mentioned in the docs:

> +-j::
> +--jobs::
> +	The number of submodules fetched at the same time.

Otherwise, from reading the docs, my immediate question would be "What's 
the default for n if not specified?"

-- 
Sebastian Schuberth


* Re: [PATCH 5/9] submodule update: expose parallelism to the user
  2015-10-27 20:59  6%   ` Junio C Hamano
@ 2015-10-28 21:40  4%     ` Stefan Beller
  2015-10-28 22:20  4%       ` Junio C Hamano
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-28 21:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git@vger.kernel.org, Jacob Keller, Jeff King, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Eric Sunshine

On Tue, Oct 27, 2015 at 1:59 PM, Junio C Hamano <gitster@pobox.com> wrote:
> And when 0 starts to mean something special, we would need to
> describe that here (and/or submodule.jobs entry in config.txt).
> As I already said, I do not think "0 means num_cpus" is a useful
> default, and I would prefer if we reserved 0 to mean something more
> useful we would figure out later.

Ok I'll add that, too.

I am just debating with myself where the best place is.
In run-command.c in pp_init we have:

    if (n < 1)
        n = online_cpus();
    pp->max_processes = n;

we would need to change only that one place to insert an

    die("We haven't found the right default yet for 0");

However I think for most loads online_cpus makes sense, as that
is usually the bottleneck for local operations (if excessive, memory
may become an issue, but that is unlikely IMHO).
So instead I think it makes more sense to add it in the fetch/clone/update
to come up with a treatment for 0.

Maybe we want to make the explicit decision for the default value
for any user of the parallel processing, such that this code above
is misguided as it leads to bad defaults if reviewers are inattentive.

So having spelled out that, we may just want to bark in the pp_init
for having a number n < 1.


* Re: [PATCH 5/9] submodule update: expose parallelism to the user
  2015-10-28 21:40  4%     ` Stefan Beller
@ 2015-10-28 22:20  4%       ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2015-10-28 22:20 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Jacob Keller, Jeff King, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Eric Sunshine

Stefan Beller <sbeller@google.com> writes:

> On Tue, Oct 27, 2015 at 1:59 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> And when 0 starts to mean something special, we would need to
>> describe that here (and/or submodule.jobs entry in config.txt).
>> As I already said, I do not think "0 means num_cpus" is a useful
>> default, and I would prefer if we reserved 0 to mean something more
>> useful we would figure out later.
>
> Ok I'll add that, too.

Sorry, but I take it back.  We just can document that (1) "-j 0"
will give you some default, (2) we do not promise that the default
will be optimal for you from day one, (3) we reserve the right to
"improve" it over time, and (4) we promise that we won't make it an
insanely wrong value.  And let's keep "0 currently means num_cpu",
which may or may not be optimal but it cannot be an "insanely wrong"
value.
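
The policy settled on here (0 means "some default", currently the number
of CPUs) can be sketched as below. effective_jobs() is a hypothetical
helper; the CPU count is passed in by the caller as a stand-in for Git's
online_cpus(), and the fallback of 1 job for a negative (unspecified)
value is an assumption borrowed from the v2 update_clone() change later
in this thread.

```c
/*
 * Resolve a job count: a positive value is used as-is, 0 asks for
 * "some reasonable default" (here the caller-supplied CPU count,
 * never less than 1), and a negative value means nothing was
 * configured, so fall back to a single job.
 */
int effective_jobs(int requested, int ncpus)
{
	if (requested > 0)
		return requested;
	if (requested == 0)
		return ncpus > 0 ? ncpus : 1;
	return 1;
}
```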


* [PATCHv2 0/8] Expose the submodule parallelism to the user
  2015-10-27 19:12  7% ` [PATCH 0/9] Expose the submodule parallelism to the user Junio C Hamano
@ 2015-10-28 23:21 25%   ` Stefan Beller
  2015-10-28 23:21 26%     ` [PATCHv2 2/8] submodule config: keep update strategy around Stefan Beller
                       ` (8 more replies)
  0 siblings, 9 replies; 200+ results
From: Stefan Beller @ 2015-10-28 23:21 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

This replaces origin/sb/submodule-parallel-update
(anchoring at 74367d8938, Merge branch 'sb/submodule-parallel-fetch'
into sb/submodule-parallel-update)

What does it do?
---
This series should finish the ongoing efforts of parallelizing
submodule network traffic. The patches contain tests for clone,
fetch and submodule update to use the actual parallelism both via
command line as well as a configured option. I decided to go with
"submodule.jobs" for all three for now.

What is new in v2?
---
* The patches got reordered slightly
* Documentation was adapted

Interdiff below

Stefan Beller (8):
  run_processes_parallel: Add output to tracing messages
  submodule config: keep update strategy around
  submodule config: remove name_and_item_from_var
  submodule-config: parse_config
  fetching submodules: Respect `submodule.jobs` config option
  git submodule update: have a dedicated helper for cloning
  submodule update: expose parallelism to the user
  clone: allow an explicit argument for parallel submodule clones

 Documentation/config.txt        |   7 ++
 Documentation/git-clone.txt     |   6 +-
 Documentation/git-submodule.txt |   7 +-
 builtin/clone.c                 |  23 +++-
 builtin/fetch.c                 |   2 +-
 builtin/submodule--helper.c     | 244 ++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh                |  54 ++++-----
 run-command.c                   |   4 +
 submodule-config.c              |  98 ++++++++++------
 submodule-config.h              |   3 +
 submodule.c                     |   5 +
 t/t5526-fetch-submodules.sh     |  14 +++
 t/t7400-submodule-basic.sh      |   4 +-
 t/t7406-submodule-update.sh     |  27 +++++
 14 files changed, 418 insertions(+), 80 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0de0138..785721a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2643,12 +2643,12 @@ submodule.<name>.ignore::
 	"--ignore-submodules" option. The 'git submodule' commands are not
 	affected by this setting.
 
-submodule::jobs
+submodule.jobs::
 	This is used to determine how many submodules can be operated on in
 	parallel. Specifying a positive integer allows up to that number
-	of submodules being fetched in parallel. Specifying 0 the number
-	of cpus will be taken as the maximum number. Currently this is
-	used in fetch and clone operations only.
+	of submodules being fetched in parallel. This is used in fetch
+	and clone operations only. A value of 0 will give some reasonable
+	default. The defaults may change with different versions of Git.
 
 tag.sort::
 	This variable controls the sort ordering of tags when displayed by
diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index affa52e..01bd6b7 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -216,9 +216,10 @@ objects from the source repository into a pack in the cloned repository.
 	The result is Git repository can be separated from working
 	tree.
 
--j::
---jobs::
+-j <n>::
+--jobs <n>::
 	The number of submodules fetched at the same time.
+	Defaults to the `submodule.jobs` option.
 
 <repository>::
 	The (possibly remote) repository to clone from.  See the
diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index f5429fa..c70fafd 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -374,10 +374,11 @@ for linkgit:git-clone[1]'s `--reference` and `--shared` options carefully.
 	clone with a history truncated to the specified number of revisions.
 	See linkgit:git-clone[1]
 
--j::
---jobs::
+-j <n>::
+--jobs <n>::
 	This option is only valid for the update command.
 	Clone new submodules in parallel with as many jobs.
+	Defaults to the `submodule.jobs` option.
 
 <path>...::
 	Paths to submodule(s). When specified this will restrict the command
diff --git a/builtin/clone.c b/builtin/clone.c
index 5ac2d89..22b9924 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -727,10 +727,7 @@ static int checkout(void)
 		struct argv_array args = ARGV_ARRAY_INIT;
 		argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
 
-		if (max_jobs == -1)
-			if (git_config_get_int("submodule.jobs", &max_jobs))
-				max_jobs = 1;
-		if (max_jobs != 1) {
+		if (max_jobs != -1) {
 			struct strbuf sb = STRBUF_INIT;
 			strbuf_addf(&sb, "--jobs=%d", max_jobs);
 			argv_array_push(&args, sb.buf);
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index c3d438a..67dba1c 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -476,9 +476,10 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 	/* Overlay the parsed .gitmodules file with .git/config */
 	git_config(git_submodule_config, NULL);
 
-	if (max_jobs == -1)
-		if (git_config_get_int("submodule.jobs", &max_jobs))
-			max_jobs = 1;
+	if (max_jobs < 0)
+		max_jobs = config_parallel_submodules();
+	if (max_jobs < 0)
+		max_jobs = 1;
 
 	run_processes_parallel(max_jobs,
 			       update_clone_get_next_task,

-- 
2.5.0.281.g4ed9cdb

* [PATCHv2 3/8] submodule config: remove name_and_item_from_var
  2015-10-28 23:21 25%   ` [PATCHv2 0/8] " Stefan Beller
  2015-10-28 23:21 26%     ` [PATCHv2 2/8] submodule config: keep update strategy around Stefan Beller
@ 2015-10-28 23:21 23%     ` Stefan Beller
  2015-10-30  1:23  7%       ` Eric Sunshine
  2015-10-28 23:21 21%     ` [PATCHv2 4/8] submodule-config: parse_config Stefan Beller
                       ` (6 subsequent siblings)
  8 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-28 23:21 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

Inlining `name_and_item_from_var` makes it easy to later add options
that do not require a submodule name.
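For illustration, here is a minimal sketch of the split that the inlined code relies on. It assumes a simplified stand-in for parse_config_key(): the name `split_config_key` is hypothetical and only approximates Git's helper. It shows why `subsection` has to be copied with xmemdupz() (it points into `var` and is not NUL-terminated), and how a key such as `submodule.jobs` comes back with no subsection at all.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for parse_config_key(): split "section.sub.key"
 * into its parts. The subsection is the text between the first and the
 * last dot; it points into var and is NOT NUL-terminated.
 * Returns 0 if var belongs to section, -1 otherwise.
 */
static int split_config_key(const char *var, const char *section,
                            const char **subsection, int *subsection_len,
                            const char **key)
{
    size_t section_len = strlen(section);
    const char *dot;

    /* The section must match exactly and be followed by a dot. */
    if (strncmp(var, section, section_len) || var[section_len] != '.')
        return -1;

    /* The key is everything after the last dot. */
    dot = strrchr(var, '.');
    *key = dot + 1;

    if (dot == var + section_len) {
        /* "section.key": a generic key without a subsection */
        *subsection = NULL;
        *subsection_len = 0;
    } else {
        /* "section.sub.key": the subsection ends at the last dot */
        *subsection = var + section_len + 1;
        *subsection_len = (int)(dot - *subsection);
    }
    return 0;
}
```

With this split, `submodule.lib.path` yields subsection `lib` (length 3) and key `path`, while `submodule.jobs` yields no subsection, which is exactly the case the old helper rejected.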

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule-config.c | 46 +++++++++++++++++-----------------------------
 1 file changed, 17 insertions(+), 29 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 8b8c7d1..4d0563c 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -161,22 +161,6 @@ static struct submodule *cache_lookup_name(struct submodule_cache *cache,
 	return NULL;
 }
 
-static int name_and_item_from_var(const char *var, struct strbuf *name,
-				  struct strbuf *item)
-{
-	const char *subsection, *key;
-	int subsection_len, parse;
-	parse = parse_config_key(var, "submodule", &subsection,
-			&subsection_len, &key);
-	if (parse < 0 || !subsection)
-		return 0;
-
-	strbuf_add(name, subsection, subsection_len);
-	strbuf_addstr(item, key);
-
-	return 1;
-}
-
 static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
 		const unsigned char *gitmodules_sha1, const char *name)
 {
@@ -251,18 +235,25 @@ static int parse_config(const char *var, const char *value, void *data)
 {
 	struct parse_config_parameter *me = data;
 	struct submodule *submodule;
-	struct strbuf name = STRBUF_INIT, item = STRBUF_INIT;
-	int ret = 0;
+	int subsection_len, ret = 0;
+	const char *subsection, *key;
+	char *name;
 
-	/* this also ensures that we only parse submodule entries */
-	if (!name_and_item_from_var(var, &name, &item))
+	if (parse_config_key(var, "submodule", &subsection,
+			     &subsection_len, &key) < 0)
 		return 0;
 
+	if (!subsection_len)
+		return 0;
+
+	/* subsection is not null terminated */
+	name = xmemdupz(subsection, subsection_len);
 	submodule = lookup_or_create_by_name(me->cache,
 					     me->gitmodules_sha1,
-					     name.buf);
+					     name);
+	free(name);
 
-	if (!strcmp(item.buf, "path")) {
+	if (!strcmp(key, "path")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->path != NULL)
@@ -275,7 +266,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			submodule->path = xstrdup(value);
 			cache_put_path(me->cache, submodule);
 		}
-	} else if (!strcmp(item.buf, "fetchrecursesubmodules")) {
+	} else if (!strcmp(key, "fetchrecursesubmodules")) {
 		/* when parsing worktree configurations we can die early */
 		int die_on_error = is_null_sha1(me->gitmodules_sha1);
 		if (!me->overwrite &&
@@ -286,7 +277,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			submodule->fetch_recurse = parse_fetch_recurse(
 								var, value,
 								die_on_error);
-	} else if (!strcmp(item.buf, "ignore")) {
+	} else if (!strcmp(key, "ignore")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->ignore != NULL)
@@ -302,7 +293,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->ignore);
 			submodule->ignore = xstrdup(value);
 		}
-	} else if (!strcmp(item.buf, "url")) {
+	} else if (!strcmp(key, "url")) {
 		if (!value) {
 			ret = config_error_nonbool(var);
 		} else if (!me->overwrite && submodule->url != NULL) {
@@ -312,7 +303,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->url);
 			submodule->url = xstrdup(value);
 		}
-	} else if (!strcmp(item.buf, "update")) {
+	} else if (!strcmp(key, "update")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->update != NULL)
@@ -324,9 +315,6 @@ static int parse_config(const char *var, const char *value, void *data)
 		}
 	}
 
-	strbuf_release(&name);
-	strbuf_release(&item);
-
 	return ret;
 }
 
-- 
2.5.0.281.g4ed9cdb

* [PATCHv2 5/8] fetching submodules: Respect `submodule.jobs` config option
  2015-10-28 23:21 25%   ` [PATCHv2 0/8] " Stefan Beller
                       ` (2 preceding siblings ...)
  2015-10-28 23:21 21%     ` [PATCHv2 4/8] submodule-config: parse_config Stefan Beller
@ 2015-10-28 23:21 24%     ` Stefan Beller
  2015-10-30  2:17  5%       ` Eric Sunshine
  2015-10-28 23:21 21%     ` [PATCHv2 6/8] git submodule update: have a dedicated helper for cloning Stefan Beller
                       ` (4 subsequent siblings)
  8 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-28 23:21 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

This allows configuring parallel fetching and updating without having
to pass the command line option.

This moves the responsibility for determining how many parallel processes
to start from builtin/fetch to submodule.c, as we need a way to communicate
"The user did not specify the number of parallel processes in the command
line options" in builtin/fetch. The submodule code takes care of
the precedence (CLI > config > default).
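The precedence rule can be sketched as follows. This is a hypothetical, self-contained illustration rather than the code in submodule.c: `resolve_max_jobs` and `parse_jobs_config` are made-up names. The NULL check is an addition made here: a bare `jobs` key written without `= value` reaches a config callback with a NULL value, which `strtol(value, NULL, 10)` as written in the patch would dereference.

```c
#include <stdlib.h>

/*
 * Hypothetical parser for the submodule.jobs value.
 * Returns -1 ("not specified") for a missing or malformed value.
 */
static int parse_jobs_config(const char *value)
{
    char *end;
    long n;

    /* A bare "jobs" key (no "= value") arrives as NULL. */
    if (!value)
        return -1;
    n = strtol(value, &end, 10);
    /* Reject empty strings, trailing garbage and negative numbers. */
    if (end == value || *end != '\0' || n < 0)
        return -1;
    return (int)n;
}

/*
 * CLI > config > default: a negative value means "not specified",
 * so each later source only fills in what the earlier one left unset.
 */
static int resolve_max_jobs(int cli_jobs, const char *config_value)
{
    int jobs = cli_jobs;            /* -1 if --jobs was not given */

    if (jobs < 0)
        jobs = parse_jobs_config(config_value);
    if (jobs < 0)
        jobs = 1;                   /* serial, as before the series */
    return jobs;
}
```

Using -1 rather than 0 as the "unset" marker keeps 0 available as an explicit request for the default parallelism.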

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/config.txt    |  7 +++++++
 builtin/fetch.c             |  2 +-
 submodule-config.c          |  9 +++++++++
 submodule-config.h          |  2 ++
 submodule.c                 |  5 +++++
 t/t5526-fetch-submodules.sh | 14 ++++++++++++++
 6 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 391a0c3..785721a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2643,6 +2643,13 @@ submodule.<name>.ignore::
 	"--ignore-submodules" option. The 'git submodule' commands are not
 	affected by this setting.
 
+submodule.jobs::
+	This is used to determine how many submodules can be operated on in
+	parallel. Specifying a positive integer allows up to that number
+	of submodules to be fetched in parallel. This is used in fetch
+	and clone operations only. A value of 0 will give some reasonable
+	default. The defaults may change with different versions of Git.
+
 tag.sort::
 	This variable controls the sort ordering of tags when displayed by
 	linkgit:git-tag[1]. Without the "--sort=<value>" option provided, the
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 9cc1c9d..60e6797 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -37,7 +37,7 @@ static int prune = -1; /* unspecified */
 static int all, append, dry_run, force, keep, multiple, update_head_ok, verbosity;
 static int progress = -1, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow;
-static int max_children = 1;
+static int max_children = -1;
 static const char *depth;
 static const char *upload_pack;
 static struct strbuf default_rla = STRBUF_INIT;
diff --git a/submodule-config.c b/submodule-config.c
index 1cea404..07bdcdf 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -32,6 +32,7 @@ enum lookup_type {
 
 static struct submodule_cache cache;
 static int is_cache_init;
+static int parallel_jobs = -1;
 
 static int config_path_cmp(const struct submodule_entry *a,
 			   const struct submodule_entry *b,
@@ -235,6 +236,9 @@ static int parse_generic_submodule_config(const char *var,
 					  const char *key,
 					  const char *value)
 {
+	if (!strcmp(key, "jobs")) {
+		parallel_jobs = strtol(value, NULL, 10);
+	}
 	return 0;
 }
 
@@ -483,3 +487,8 @@ void submodule_free(void)
 	cache_free(&cache);
 	is_cache_init = 0;
 }
+
+int config_parallel_submodules(void)
+{
+	return parallel_jobs;
+}
diff --git a/submodule-config.h b/submodule-config.h
index f9e2a29..d9bbf9a 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -27,4 +27,6 @@ const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
 		const char *path);
 void submodule_free(void);
 
+int config_parallel_submodules(void);
+
 #endif /* SUBMODULE_CONFIG_H */
diff --git a/submodule.c b/submodule.c
index 0257ea3..188ba02 100644
--- a/submodule.c
+++ b/submodule.c
@@ -752,6 +752,11 @@ int fetch_populated_submodules(const struct argv_array *options,
 	argv_array_push(&spf.args, "--recurse-submodules-default");
 	/* default value, "--submodule-prefix" and its value are added later */
 
+	if (max_parallel_jobs < 0)
+		max_parallel_jobs = config_parallel_submodules();
+	if (max_parallel_jobs < 0)
+		max_parallel_jobs = 1;
+
 	calculate_changed_submodule_paths();
 	run_processes_parallel(max_parallel_jobs,
 			       get_next_submodule,
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 1b4ce69..5c3579c 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -470,4 +470,18 @@ test_expect_success "don't fetch submodule when newly recorded commits are alrea
 	test_i18ncmp expect.err actual.err
 '
 
+test_expect_success 'fetching submodules respects parallel settings' '
+	git config fetch.recurseSubmodules true &&
+	(
+		cd downstream &&
+		GIT_TRACE=$(pwd)/trace.out git fetch --jobs 7 &&
+		grep "7 children" trace.out &&
+		git config submodule.jobs 8 &&
+		GIT_TRACE=$(pwd)/trace.out git fetch &&
+		grep "8 children" trace.out &&
+		GIT_TRACE=$(pwd)/trace.out git fetch --jobs 9 &&
+		grep "9 children" trace.out
+	)
+'
+
 test_done
-- 
2.5.0.281.g4ed9cdb

* [PATCHv2 8/8] clone: allow an explicit argument for parallel submodule clones
  2015-10-28 23:21 25%   ` [PATCHv2 0/8] " Stefan Beller
                       ` (5 preceding siblings ...)
  2015-10-28 23:21 23%     ` [PATCHv2 7/8] submodule update: expose parallelism to the user Stefan Beller
@ 2015-10-28 23:21 24%     ` Stefan Beller
  2015-11-01  8:58  4%       ` Eric Sunshine
  2015-10-29 13:19  4%     ` [PATCHv2 0/8] Expose the submodule parallelism to the user Ramsay Jones
  2015-10-29 20:12  6%     ` Junio C Hamano
  8 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-28 23:21 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

Just pass it along to "git submodule update", which may pick reasonable
defaults if you don't specify an explicit number.
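A simplified sketch of what gets passed along, using plain arrays instead of Git's argv_array; `build_update_args` and `MAX_ARGS` are hypothetical names. Following the patch, `--jobs` is only forwarded when the user gave one (`max_jobs != -1`), so otherwise `git submodule update` is left to consult `submodule.jobs` on its own.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ARGS 8   /* room for the fixed args, --jobs and the NULL */

/*
 * Fill argv (at least MAX_ARGS slots) with the command clone runs
 * after checkout. jobs_buf holds the formatted "--jobs=<n>" string
 * and must outlive argv. Returns the number of arguments.
 */
static int build_update_args(const char **argv, char *jobs_buf,
                             size_t jobs_buf_len, int max_jobs)
{
    int argc = 0;

    argv[argc++] = "submodule";
    argv[argc++] = "update";
    argv[argc++] = "--init";
    argv[argc++] = "--recursive";
    if (max_jobs != -1) {
        /* Only forward an explicit choice; -1 means "unspecified". */
        snprintf(jobs_buf, jobs_buf_len, "--jobs=%d", max_jobs);
        argv[argc++] = jobs_buf;
    }
    argv[argc] = NULL;          /* execv-style terminator */
    return argc;
}
```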

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/git-clone.txt |  6 +++++-
 builtin/clone.c             | 23 +++++++++++++++++------
 t/t7406-submodule-update.sh | 15 +++++++++++++++
 3 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index f1f2a3f..01bd6b7 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -14,7 +14,7 @@ SYNOPSIS
 	  [-o <name>] [-b <name>] [-u <upload-pack>] [--reference <repository>]
 	  [--dissociate] [--separate-git-dir <git dir>]
 	  [--depth <depth>] [--[no-]single-branch]
-	  [--recursive | --recurse-submodules] [--] <repository>
+	  [--recursive | --recurse-submodules] [--jobs <n>] [--] <repository>
 	  [<directory>]
 
 DESCRIPTION
@@ -216,6 +216,10 @@ objects from the source repository into a pack in the cloned repository.
 	The result is Git repository can be separated from working
 	tree.
 
+-j <n>::
+--jobs <n>::
+	The number of submodules fetched at the same time.
+	Defaults to the `submodule.jobs` option.
 
 <repository>::
 	The (possibly remote) repository to clone from.  See the
diff --git a/builtin/clone.c b/builtin/clone.c
index 9eaecd9..22b9924 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -50,6 +50,7 @@ static int option_progress = -1;
 static struct string_list option_config;
 static struct string_list option_reference;
 static int option_dissociate;
+static int max_jobs = -1;
 
 static struct option builtin_clone_options[] = {
 	OPT__VERBOSITY(&option_verbosity),
@@ -72,6 +73,8 @@ static struct option builtin_clone_options[] = {
 		    N_("initialize submodules in the clone")),
 	OPT_BOOL(0, "recurse-submodules", &option_recursive,
 		    N_("initialize submodules in the clone")),
+	OPT_INTEGER('j', "jobs", &max_jobs,
+		    N_("number of submodules cloned in parallel")),
 	OPT_STRING(0, "template", &option_template, N_("template-directory"),
 		   N_("directory from which templates will be used")),
 	OPT_STRING_LIST(0, "reference", &option_reference, N_("repo"),
@@ -95,10 +98,6 @@ static struct option builtin_clone_options[] = {
 	OPT_END()
 };
 
-static const char *argv_submodule[] = {
-	"submodule", "update", "--init", "--recursive", NULL
-};
-
 static const char *get_repo_path_1(struct strbuf *path, int *is_bundle)
 {
 	static char *suffix[] = { "/.git", "", ".git/.git", ".git" };
@@ -724,8 +723,20 @@ static int checkout(void)
 	err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
 			   sha1_to_hex(sha1), "1", NULL);
 
-	if (!err && option_recursive)
-		err = run_command_v_opt(argv_submodule, RUN_GIT_CMD);
+	if (!err && option_recursive) {
+		struct argv_array args = ARGV_ARRAY_INIT;
+		argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
+
+		if (max_jobs != -1) {
+			struct strbuf sb = STRBUF_INIT;
+			strbuf_addf(&sb, "--jobs=%d", max_jobs);
+			argv_array_push(&args, sb.buf);
+			strbuf_release(&sb);
+		}
+
+		err = run_command_v_opt(args.argv, RUN_GIT_CMD);
+		argv_array_clear(&args);
+	}
 
 	return err;
 }
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 05ea66f..ade0524 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -786,4 +786,19 @@ test_expect_success 'submodule update can be run in parallel' '
 	 grep "9 children" trace.out
 	)
 '
+
+test_expect_success 'git clone passes the parallel jobs config on to submodules' '
+	test_when_finished "rm -rf super4" &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules --jobs 7 . super4 &&
+	grep "7 children" trace.out &&
+	rm -rf super4 &&
+	git config --global submodule.jobs 8 &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules . super4 &&
+	grep "8 children" trace.out &&
+	rm -rf super4 &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules --jobs 9 . super4 &&
+	grep "9 children" trace.out &&
+	rm -rf super4
+'
+
 test_done
-- 
2.5.0.281.g4ed9cdb

* [PATCHv2 7/8] submodule update: expose parallelism to the user
  2015-10-28 23:21 25%   ` [PATCHv2 0/8] " Stefan Beller
                       ` (4 preceding siblings ...)
  2015-10-28 23:21 21%     ` [PATCHv2 6/8] git submodule update: have a dedicated helper for cloning Stefan Beller
@ 2015-10-28 23:21 23%     ` Stefan Beller
  2015-10-28 23:21 24%     ` [PATCHv2 8/8] clone: allow an explicit argument for parallel submodule clones Stefan Beller
                       ` (2 subsequent siblings)
  8 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-28 23:21 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

Expose possible parallelism either via the "--jobs" CLI parameter or
the "submodule.jobs" setting.

By initializing the variable to -1, we make sure 0 can be passed
into the parallel processing machinery, which will then pick as many
parallel workers as there are CPUs.
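A hedged sketch of how a 0 that reaches the parallel machinery could be turned into a worker count. `effective_workers` is a hypothetical name, and `sysconf(_SC_NPROCESSORS_ONLN)` stands in for Git's own CPU-count helper; that constant is a widely available extension (Linux, macOS, the BSDs) rather than strict POSIX.

```c
#include <unistd.h>

/*
 * Hypothetical mapping from a requested job count to the number of
 * workers: -1 is resolved earlier by the caller (config, then 1), so
 * anything below 1 seen here means "one worker per online CPU".
 */
static int effective_workers(int requested)
{
    if (requested < 1) {
        long cpus = sysconf(_SC_NPROCESSORS_ONLN);
        /* sysconf() returns -1 if the value is unavailable. */
        return cpus > 0 ? (int)cpus : 1;
    }
    return requested;
}
```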

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/git-submodule.txt |  7 ++++++-
 builtin/submodule--helper.c     | 18 ++++++++++++++----
 git-submodule.sh                |  9 +++++++++
 t/t7406-submodule-update.sh     | 12 ++++++++++++
 4 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index f17687e..c70fafd 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -16,7 +16,7 @@ SYNOPSIS
 'git submodule' [--quiet] deinit [-f|--force] [--] <path>...
 'git submodule' [--quiet] update [--init] [--remote] [-N|--no-fetch]
 	      [-f|--force] [--rebase|--merge] [--reference <repository>]
-	      [--depth <depth>] [--recursive] [--] [<path>...]
+	      [--depth <depth>] [--recursive] [--jobs <n>] [--] [<path>...]
 'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) <n>]
 	      [commit] [--] [<path>...]
 'git submodule' [--quiet] foreach [--recursive] <command>
@@ -374,6 +374,11 @@ for linkgit:git-clone[1]'s `--reference` and `--shared` options carefully.
 	clone with a history truncated to the specified number of revisions.
 	See linkgit:git-clone[1]
 
+-j <n>::
+--jobs <n>::
+	This option is only valid for the update command.
+	Clone new submodules in parallel with as many jobs.
+	Defaults to the `submodule.jobs` option.
 
 <path>...::
 	Paths to submodule(s). When specified this will restrict the command
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 1ec1b85..67dba1c 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -431,6 +431,7 @@ static int update_clone_task_finished(int result,
 
 static int update_clone(int argc, const char **argv, const char *prefix)
 {
+	int max_jobs = -1;
 	struct string_list_item *item;
 	struct submodule_update_clone pp = SUBMODULE_UPDATE_CLONE_INIT;
 
@@ -451,6 +452,8 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 		OPT_STRING(0, "depth", &pp.depth, "<depth>",
 			   N_("Create a shallow clone truncated to the "
 			      "specified number of revisions")),
+		OPT_INTEGER('j', "jobs", &max_jobs,
+			    N_("parallel jobs")),
 		OPT__QUIET(&pp.quiet, N_("do't print cloning progress")),
 		OPT_END()
 	};
@@ -472,10 +475,17 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 	gitmodules_config();
 	/* Overlay the parsed .gitmodules file with .git/config */
 	git_config(git_submodule_config, NULL);
-	run_processes_parallel(1, update_clone_get_next_task,
-				  update_clone_start_failure,
-				  update_clone_task_finished,
-				  &pp);
+
+	if (max_jobs < 0)
+		max_jobs = config_parallel_submodules();
+	if (max_jobs < 0)
+		max_jobs = 1;
+
+	run_processes_parallel(max_jobs,
+			       update_clone_get_next_task,
+			       update_clone_start_failure,
+			       update_clone_task_finished,
+			       &pp);
 
 	if (pp.print_unmatched) {
 		printf("#unmatched\n");
diff --git a/git-submodule.sh b/git-submodule.sh
index 9f554fb..10c5af9 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -645,6 +645,14 @@ cmd_update()
 		--depth=*)
 			depth=$1
 			;;
+		-j|--jobs)
+			case "$2" in '') usage ;; esac
+			jobs="--jobs=$2"
+			shift
+			;;
+		--jobs=*)
+			jobs=$1
+			;;
 		--)
 			shift
 			break
@@ -670,6 +678,7 @@ cmd_update()
 		${update:+--update "$update"} \
 		${reference:+--reference "$reference"} \
 		${depth:+--depth "$depth"} \
+		${jobs:+$jobs} \
 		"$@" | {
 	err=
 	while read mode sha1 stage just_cloned sm_path
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index dda3929..05ea66f 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -774,4 +774,16 @@ test_expect_success 'submodule update --recursive drops module name before recur
 	 test_i18ngrep "Submodule path .deeper/submodule/subsubmodule.: checked out" actual
 	)
 '
+
+test_expect_success 'submodule update can be run in parallel' '
+	(cd super2 &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update --jobs 7 &&
+	 grep "7 children" trace.out &&
+	 git config submodule.jobs 8 &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update &&
+	 grep "8 children" trace.out &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update --jobs 9 &&
+	 grep "9 children" trace.out
+	)
+'
 test_done
-- 
2.5.0.281.g4ed9cdb

* [PATCHv2 4/8] submodule-config: parse_config
  2015-10-28 23:21 25%   ` [PATCHv2 0/8] " Stefan Beller
  2015-10-28 23:21 26%     ` [PATCHv2 2/8] submodule config: keep update strategy around Stefan Beller
  2015-10-28 23:21 23%     ` [PATCHv2 3/8] submodule config: remove name_and_item_from_var Stefan Beller
@ 2015-10-28 23:21 21%     ` Stefan Beller
  2015-10-30  1:53  4%       ` Eric Sunshine
  2015-10-28 23:21 24%     ` [PATCHv2 5/8] fetching submodules: Respect `submodule.jobs` config option Stefan Beller
                       ` (5 subsequent siblings)
  8 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-28 23:21 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

This rewrites parse_config to distinguish between configs specific to
one submodule and configs which apply generically to all submodules.
We do not have generic submodule configs yet, but the next patch will
introduce "submodule.jobs".
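The dispatch rule can be illustrated with a tiny classifier. This is a hypothetical sketch, not code from the series: `classify_submodule_var` and the enum names are made up. A variable with only a key after `submodule.` is generic; one that also carries a `<name>` subsection is specific to that submodule. This is the same distinction that keeps `submodule.jobs` from colliding with `submodule.<name>.<var>`.

```c
#include <string.h>

enum submodule_config_kind {
    SM_CONFIG_OTHER,     /* not in the "submodule" section at all */
    SM_CONFIG_GENERIC,   /* submodule.<key>, e.g. submodule.jobs */
    SM_CONFIG_SPECIFIC   /* submodule.<name>.<key>, e.g. submodule.lib.url */
};

static enum submodule_config_kind classify_submodule_var(const char *var)
{
    const char *prefix = "submodule.";
    size_t len = strlen(prefix);

    if (strncmp(var, prefix, len))
        return SM_CONFIG_OTHER;
    /* Another dot after the section means a <name> subsection exists. */
    return strrchr(var + len, '.') ? SM_CONFIG_SPECIFIC : SM_CONFIG_GENERIC;
}
```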

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule-config.c | 58 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 4d0563c..1cea404 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -231,27 +231,23 @@ struct parse_config_parameter {
 	int overwrite;
 };
 
-static int parse_config(const char *var, const char *value, void *data)
+static int parse_generic_submodule_config(const char *var,
+					  const char *key,
+					  const char *value)
 {
-	struct parse_config_parameter *me = data;
-	struct submodule *submodule;
-	int subsection_len, ret = 0;
-	const char *subsection, *key;
-	char *name;
-
-	if (parse_config_key(var, "submodule", &subsection,
-			     &subsection_len, &key) < 0)
-		return 0;
-
-	if (!subsection_len)
-		return 0;
+	return 0;
+}
 
-	/* subsection is not null terminated */
-	name = xmemdupz(subsection, subsection_len);
-	submodule = lookup_or_create_by_name(me->cache,
-					     me->gitmodules_sha1,
-					     name);
-	free(name);
+static int parse_specific_submodule_config(struct parse_config_parameter *me,
+					   const char *name,
+					   const char *key,
+					   const char *value,
+					   const char *var)
+{
+	int ret = 0;
+	struct submodule *submodule = lookup_or_create_by_name(me->cache,
+							       me->gitmodules_sha1,
+							       name);
 
 	if (!strcmp(key, "path")) {
 		if (!value)
@@ -318,6 +314,30 @@ static int parse_config(const char *var, const char *value, void *data)
 	return ret;
 }
 
+static int parse_config(const char *var, const char *value, void *data)
+{
+	struct parse_config_parameter *me = data;
+
+	int subsection_len;
+	const char *subsection, *key;
+	char *name;
+
+	if (parse_config_key(var, "submodule", &subsection,
+			     &subsection_len, &key) < 0)
+		return 0;
+
+	if (!subsection_len)
+		return parse_generic_submodule_config(var, key, value);
+	else {
+		int ret;
+		/* subsection is not null terminated */
+		name = xmemdupz(subsection, subsection_len);
+		ret = parse_specific_submodule_config(me, name, key, value, var);
+		free(name);
+		return ret;
+	}
+}
+
 static int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
 				      unsigned char *gitmodules_sha1)
 {
-- 
2.5.0.281.g4ed9cdb

* [PATCHv2 6/8] git submodule update: have a dedicated helper for cloning
  2015-10-28 23:21 25%   ` [PATCHv2 0/8] " Stefan Beller
                       ` (3 preceding siblings ...)
  2015-10-28 23:21 24%     ` [PATCHv2 5/8] fetching submodules: Respect `submodule.jobs` config option Stefan Beller
@ 2015-10-28 23:21 21%     ` Stefan Beller
  2015-10-29 22:34  6%       ` Junio C Hamano
  2015-10-28 23:21 23%     ` [PATCHv2 7/8] submodule update: expose parallelism to the user Stefan Beller
                       ` (3 subsequent siblings)
  8 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-28 23:21 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

This introduces a new helper function in git submodule--helper
that takes care of cloning all submodules, which we eventually
want to parallelize.

Some checks (such as an empty URL or update_mode=none) are needed in
the helper to decide whether to clone. These checks have been
moved into the C function as well (no need to repeat them in the
shell script).

As we can only access the stderr channel from within the parallel
processing engine, we need to reroute the error message for
specified but uninitialized submodules to stderr. As it is an error
message, it should have gone to stderr in the first place, so this
is a bug fix along the way.
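The shape of the new get_next_task callback can be modelled without Git's internals. In the hypothetical sketch below (`struct clone_queue` and `next_clone_task` are made-up names), the cursor lives in the shared state so each call resumes where the previous one stopped, entries that need no clone are skipped, and the function returns 1 with a task or 0 once the list is exhausted, mirroring how update_clone_get_next_task() advances `pp->count`.

```c
#include <stddef.h>

struct clone_queue {
    const char **paths;
    const int *needs_clone;   /* stand-in for "no <path>/.git yet" */
    size_t nr;
    size_t count;             /* resumable cursor, like pp->count */
};

/* Hand out the next path that needs cloning; 0 means no more work. */
static int next_clone_task(struct clone_queue *q, const char **out_path)
{
    for (; q->count < q->nr; q->count++) {
        if (!q->needs_clone[q->count])
            continue;         /* already cloned: nothing to start */
        *out_path = q->paths[q->count];
        q->count++;           /* do not hand the same entry out twice */
        return 1;
    }
    return 0;
}
```

Incrementing the cursor before returning matters: the for-loop increment does not run on the early return, so without it the same submodule would be handed out again on the next call.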

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/submodule--helper.c | 234 ++++++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh            |  45 +++------
 t/t7400-submodule-basic.sh  |   4 +-
 3 files changed, 247 insertions(+), 36 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index f4c3eff..1ec1b85 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -255,6 +255,239 @@ static int module_clone(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
+static int git_submodule_config(const char *var, const char *value, void *cb)
+{
+	return parse_submodule_config_option(var, value);
+}
+
+struct submodule_update_clone {
+	int count;
+	int quiet;
+	int print_unmatched;
+	char *reference;
+	char *depth;
+	char *update;
+	const char *recursive_prefix;
+	const char *prefix;
+	struct module_list list;
+	struct string_list projectlines;
+	struct pathspec pathspec;
+};
+#define SUBMODULE_UPDATE_CLONE_INIT {0, 0, 0, NULL, NULL, NULL, NULL, NULL, MODULE_LIST_INIT, STRING_LIST_INIT_DUP}
+
+static void fill_clone_command(struct child_process *cp, int quiet,
+			       const char *prefix, const char *path,
+			       const char *name, const char *url,
+			       const char *reference, const char *depth)
+{
+	cp->git_cmd = 1;
+	cp->no_stdin = 1;
+	cp->stdout_to_stderr = 1;
+	cp->err = -1;
+	argv_array_push(&cp->args, "submodule--helper");
+	argv_array_push(&cp->args, "clone");
+	if (quiet)
+		argv_array_push(&cp->args, "--quiet");
+
+	if (prefix) {
+		argv_array_push(&cp->args, "--prefix");
+		argv_array_push(&cp->args, prefix);
+	}
+	argv_array_push(&cp->args, "--path");
+	argv_array_push(&cp->args, path);
+
+	argv_array_push(&cp->args, "--name");
+	argv_array_push(&cp->args, name);
+
+	argv_array_push(&cp->args, "--url");
+	argv_array_push(&cp->args, url);
+	if (reference)
+		argv_array_push(&cp->args, reference);
+	if (depth)
+		argv_array_push(&cp->args, depth);
+}
+
+static int update_clone_get_next_task(void **pp_task_cb,
+				      struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	for (; pp->count < pp->list.nr; pp->count++) {
+		const struct submodule *sub = NULL;
+		const char *displaypath = NULL;
+		const struct cache_entry *ce = pp->list.entries[pp->count];
+		struct strbuf sb = STRBUF_INIT;
+		const char *update_module = NULL;
+		char *url = NULL;
+		int just_cloned = 0;
+
+		if (ce_stage(ce)) {
+			if (pp->recursive_prefix)
+				strbuf_addf(err, "Skipping unmerged submodule %s/%s\n",
+					pp->recursive_prefix, ce->name);
+			else
+				strbuf_addf(err, "Skipping unmerged submodule %s\n",
+					ce->name);
+			continue;
+		}
+
+		sub = submodule_from_path(null_sha1, ce->name);
+		if (!sub) {
+			strbuf_addf(err, "BUG: internal error managing submodules. "
+				    "The cache could not locate '%s'", ce->name);
+			pp->print_unmatched = 1;
+			return 0;
+		}
+
+		if (pp->recursive_prefix)
+			displaypath = relative_path(pp->recursive_prefix, ce->name, &sb);
+		else
+			displaypath = ce->name;
+
+		if (pp->update)
+			update_module = pp->update;
+		if (!update_module)
+			update_module = sub->update;
+		if (!update_module)
+			update_module = "checkout";
+		if (!strcmp(update_module, "none")) {
+			strbuf_addf(err, "Skipping submodule '%s'\n", displaypath);
+			continue;
+		}
+
+		/*
+		 * Looking up the url in .git/config.
+		 * We cannot fall back to .gitmodules as we only want to process
+		 * configured submodules. This renders the submodule lookup API
+		 * useless, as it cannot lookup without fallback.
+		 */
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "submodule.%s.url", sub->name);
+		git_config_get_string(sb.buf, &url);
+		if (!url) {
+			/*
+			 * Only mention uninitialized submodules when their
+			 * path has been specified
+			 */
+			if (pp->pathspec.nr)
+				strbuf_addf(err, _("Submodule path '%s' not initialized\n"
+					"Maybe you want to use 'update --init'?"), displaypath);
+			continue;
+		}
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%s/.git", ce->name);
+		just_cloned = !file_exists(sb.buf);
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%06o %s %d %d\t%s\n", ce->ce_mode,
+				sha1_to_hex(ce->sha1), ce_stage(ce),
+				just_cloned, ce->name);
+		string_list_append(&pp->projectlines, sb.buf);
+
+		if (just_cloned) {
+			fill_clone_command(cp, pp->quiet, pp->prefix, ce->name,
+					   sub->name, url, pp->reference, pp->depth);
+			pp->count++;
+			free(url);
+			return 1;
+		} else
+			free(url);
+	}
+	return 0;
+}
+
+static int update_clone_start_failure(struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb,
+				      void *pp_task_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	strbuf_addf(err, "error when starting a child process");
+	pp->print_unmatched = 1;
+
+	return 1;
+}
+
+static int update_clone_task_finished(int result,
+				      struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb,
+				      void *pp_task_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	if (!result) {
+		return 0;
+	} else {
+		strbuf_addf(err, "error in one child process");
+		pp->print_unmatched = 1;
+		return 1;
+	}
+}
+
+static int update_clone(int argc, const char **argv, const char *prefix)
+{
+	struct string_list_item *item;
+	struct submodule_update_clone pp = SUBMODULE_UPDATE_CLONE_INIT;
+
+	struct option module_list_options[] = {
+		OPT_STRING(0, "prefix", &prefix,
+			   N_("path"),
+			   N_("path into the working tree")),
+		OPT_STRING(0, "recursive_prefix", &pp.recursive_prefix,
+			   N_("path"),
+			   N_("path into the working tree, across nested "
+			      "submodule boundaries")),
+		OPT_STRING(0, "update", &pp.update,
+			   N_("string"),
+			   N_("update command for submodules")),
+		OPT_STRING(0, "reference", &pp.reference, "<repository>",
+			   N_("Use the local reference repository "
+			      "instead of a full clone")),
+		OPT_STRING(0, "depth", &pp.depth, "<depth>",
+			   N_("Create a shallow clone truncated to the "
+			      "specified number of revisions")),
+		OPT__QUIET(&pp.quiet, N_("do't print cloning progress")),
+		OPT_END()
+	};
+
+	const char *const git_submodule_helper_usage[] = {
+		N_("git submodule--helper list [--prefix=<path>] [<path>...]"),
+		NULL
+	};
+	pp.prefix = prefix;
+
+	argc = parse_options(argc, argv, prefix, module_list_options,
+			     git_submodule_helper_usage, 0);
+
+	if (module_list_compute(argc, argv, prefix, &pp.pathspec, &pp.list) < 0) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	gitmodules_config();
+	/* Overlay the parsed .gitmodules file with .git/config */
+	git_config(git_submodule_config, NULL);
+	run_processes_parallel(1, update_clone_get_next_task,
+				  update_clone_start_failure,
+				  update_clone_task_finished,
+				  &pp);
+
+	if (pp.print_unmatched) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	for_each_string_list_item(item, &pp.projectlines) {
+		utf8_fprintf(stdout, "%s", item->string);
+	}
+	return 0;
+}
+
 struct cmd_struct {
 	const char *cmd;
 	int (*fn)(int, const char **, const char *);
@@ -264,6 +497,7 @@ static struct cmd_struct commands[] = {
 	{"list", module_list},
 	{"name", module_name},
 	{"clone", module_clone},
+	{"update-clone", update_clone}
 };
 
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
diff --git a/git-submodule.sh b/git-submodule.sh
index 9bc5c5f..9f554fb 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -664,17 +664,18 @@ cmd_update()
 		cmd_init "--" "$@" || return
 	fi
 
-	cloned_modules=
-	git submodule--helper list --prefix "$wt_prefix" "$@" | {
+	git submodule--helper update-clone ${GIT_QUIET:+--quiet} \
+		${wt_prefix:+--prefix "$wt_prefix"} \
+		${prefix:+--recursive_prefix "$prefix"} \
+		${update:+--update "$update"} \
+		${reference:+--reference "$reference"} \
+		${depth:+--depth "$depth"} \
+		"$@" | {
 	err=
-	while read mode sha1 stage sm_path
+	while read mode sha1 stage just_cloned sm_path
 	do
 		die_if_unmatched "$mode"
-		if test "$stage" = U
-		then
-			echo >&2 "Skipping unmerged submodule $prefix$sm_path"
-			continue
-		fi
+
 		name=$(git submodule--helper name "$sm_path") || exit
 		url=$(git config submodule."$name".url)
 		branch=$(get_submodule_config "$name" branch master)
@@ -691,27 +692,10 @@ cmd_update()
 
 		displaypath=$(relative_path "$prefix$sm_path")
 
-		if test "$update_module" = "none"
-		then
-			echo "Skipping submodule '$displaypath'"
-			continue
-		fi
-
-		if test -z "$url"
-		then
-			# Only mention uninitialized submodules when its
-			# path have been specified
-			test "$#" != "0" &&
-			say "$(eval_gettext "Submodule path '\$displaypath' not initialized
-Maybe you want to use 'update --init'?")"
-			continue
-		fi
-
-		if ! test -d "$sm_path"/.git && ! test -f "$sm_path"/.git
+		if test $just_cloned -eq 1
 		then
-			git submodule--helper clone ${GIT_QUIET:+--quiet} --prefix "$prefix" --path "$sm_path" --name "$name" --url "$url" "$reference" "$depth" || exit
-			cloned_modules="$cloned_modules;$name"
 			subsha1=
+			update_module=checkout
 		else
 			subsha1=$(clear_local_git_env; cd "$sm_path" &&
 				git rev-parse --verify HEAD) ||
@@ -751,13 +735,6 @@ Maybe you want to use 'update --init'?")"
 				die "$(eval_gettext "Unable to fetch in submodule path '\$displaypath'")"
 			fi
 
-			# Is this something we just cloned?
-			case ";$cloned_modules;" in
-			*";$name;"*)
-				# then there is no local change to integrate
-				update_module=checkout ;;
-			esac
-
 			must_die_on_failure=
 			case "$update_module" in
 			checkout)
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index 540771c..5991e3c 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -462,7 +462,7 @@ test_expect_success 'update --init' '
 	git config --remove-section submodule.example &&
 	test_must_fail git config submodule.example.url &&
 
-	git submodule update init > update.out &&
+	git submodule update init 2> update.out &&
 	cat update.out &&
 	test_i18ngrep "not initialized" update.out &&
 	test_must_fail git rev-parse --resolve-git-dir init/.git &&
@@ -480,7 +480,7 @@ test_expect_success 'update --init from subdirectory' '
 	mkdir -p sub &&
 	(
 		cd sub &&
-		git submodule update ../init >update.out &&
+		git submodule update ../init 2>update.out &&
 		cat update.out &&
 		test_i18ngrep "not initialized" update.out &&
 		test_must_fail git rev-parse --resolve-git-dir ../init/.git &&
-- 
2.5.0.281.g4ed9cdb
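
With this patch the shell no longer decides what to clone. It only reads what `update-clone` prints: one line per submodule in the format `%06o %s %d %d\t%s` (mode, object name, stage, just_cloned, path), or a single `#unmatched` line on failure. The following is a minimal sketch of consuming that line format from C. The struct and function names are hypothetical and not part of Git.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one line of `update-clone` output. */
struct update_clone_line {
	unsigned int mode;  /* e.g. 0160000 for a gitlink */
	char sha1[41];      /* 40 hex digits plus NUL */
	int stage;          /* index stage, 0 for merged entries */
	int just_cloned;    /* 1 if the helper cloned this submodule */
	char path[256];     /* submodule path, may contain spaces */
};

/*
 * Parse "<mode> <sha1> <stage> <just_cloned>\t<path>\n".
 * Returns 0 on success, 1 for the "#unmatched" marker, -1 if malformed.
 */
static int parse_update_clone_line(const char *line,
				   struct update_clone_line *out)
{
	const char *tab;
	size_t len;

	if (!strncmp(line, "#unmatched", strlen("#unmatched")))
		return 1;
	if (sscanf(line, "%o %40s %d %d", &out->mode, out->sha1,
		   &out->stage, &out->just_cloned) != 4)
		return -1;
	tab = strchr(line, '\t');
	if (!tab)
		return -1;
	tab++;
	len = strcspn(tab, "\n");   /* path runs to end of line */
	if (len >= sizeof(out->path))
		return -1;
	memcpy(out->path, tab, len);
	out->path[len] = '\0';
	return 0;
}
```

The path is taken verbatim from after the tab rather than with a `%s` conversion. This mirrors how the last variable of the shell's `read mode sha1 stage just_cloned sm_path` receives the rest of the line.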

* [PATCHv2 2/8] submodule config: keep update strategy around
  2015-10-28 23:21 25%   ` [PATCHv2 0/8] " Stefan Beller
@ 2015-10-28 23:21 26%     ` Stefan Beller
  2015-10-30  1:14  4%       ` Eric Sunshine
  2015-10-28 23:21 23%     ` [PATCHv2 3/8] submodule config: remove name_and_item_from_var Stefan Beller
                       ` (7 subsequent siblings)
  8 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-28 23:21 UTC (permalink / raw)
  To: git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, Stefan Beller

We need the submodule update strategies in a later patch.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 submodule-config.c | 11 +++++++++++
 submodule-config.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/submodule-config.c b/submodule-config.c
index afe0ea8..8b8c7d1 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -194,6 +194,7 @@ static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
 
 	submodule->path = NULL;
 	submodule->url = NULL;
+	submodule->update = NULL;
 	submodule->fetch_recurse = RECURSE_SUBMODULES_NONE;
 	submodule->ignore = NULL;
 
@@ -311,6 +312,16 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->url);
 			submodule->url = xstrdup(value);
 		}
+	} else if (!strcmp(item.buf, "update")) {
+		if (!value)
+			ret = config_error_nonbool(var);
+		else if (!me->overwrite && submodule->update != NULL)
+			warn_multiple_config(me->commit_sha1, submodule->name,
+					     "update");
+		else {
+			free((void *)submodule->update);
+			submodule->update = xstrdup(value);
+		}
 	}
 
 	strbuf_release(&name);
diff --git a/submodule-config.h b/submodule-config.h
index 9061e4e..f9e2a29 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -14,6 +14,7 @@ struct submodule {
 	const char *url;
 	int fetch_recurse;
 	const char *ignore;
+	const char *update;
 	/* the sha1 blob id of the responsible .gitmodules file */
 	unsigned char gitmodules_sha1[20];
 };
-- 
2.5.0.281.g4ed9cdb
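
The new `update` branch follows the same three-way pattern as the neighbouring `url` handling:

- A key without a value is an error.
- A repeated key only triggers a warning unless overwriting is allowed.
- Otherwise the stored copy is replaced.

The following standalone sketch shows that pattern. It assumes a hypothetical `set_string_option()` helper and substitutes `strdup()` and `fprintf()` for Git's `xstrdup()`, `config_error_nonbool()` and `warn_multiple_config()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Store a copy of 'value' in '*dest'.
 * Returns -1 if the key had no value (a bare "update" line) or the copy
 * failed, 0 otherwise. If '*dest' is already set and 'overwrite' is 0,
 * the first value is kept and only a warning is printed.
 */
static int set_string_option(const char **dest, const char *value,
			     int overwrite, const char *key)
{
	char *copy;

	if (!value) {
		fprintf(stderr, "error: missing value for '%s'\n", key);
		return -1;
	}
	if (!overwrite && *dest) {
		fprintf(stderr, "warning: multiple values for '%s'\n", key);
		return 0;
	}
	copy = strdup(value);
	if (!copy)
		return -1;
	free((void *) *dest);   /* drop the previous value, if any */
	*dest = copy;
	return 0;
}
```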

* Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
  2015-10-28 23:21 25%   ` [PATCHv2 0/8] " Stefan Beller
                       ` (6 preceding siblings ...)
  2015-10-28 23:21 24%     ` [PATCHv2 8/8] clone: allow an explicit argument for parallel submodule clones Stefan Beller
@ 2015-10-29 13:19  4%     ` Ramsay Jones
  2015-10-29 15:51  7%       ` Stefan Beller
  2015-10-29 20:12  6%     ` Junio C Hamano
  8 siblings, 1 reply; 200+ results
From: Ramsay Jones @ 2015-10-29 13:19 UTC (permalink / raw)
  To: Stefan Beller, git
  Cc: jacob.keller, peff, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine



On 28/10/15 23:21, Stefan Beller wrote:
> This replaces origin/sb/submodule-parallel-update
> (anchoring at 74367d8938, Merge branch 'sb/submodule-parallel-fetch'
> into sb/submodule-parallel-update)
> 
> What does it do?
> ---
> This series should finish the ongoing efforts of parallelizing
> submodule network traffic. The patches contain tests for clone,
> fetch and submodule update to use the actual parallelism both via
> command line as well as a configured option. I decided to go with
> "submodule.jobs" for all three for now.
> 
> What is new in v2?
> ---
> * The patches got reordered slightly
> * Documentation was adapted
> 
> Interdiff below
> 
> Stefan Beller (8):
>   run_processes_parallel: Add output to tracing messages
>   submodule config: keep update strategy around
>   submodule config: remove name_and_item_from_var
>   submodule-config: parse_config
>   fetching submodules: Respect `submodule.jobs` config option
>   git submodule update: have a dedicated helper for cloning
>   submodule update: expose parallelism to the user
>   clone: allow an explicit argument for parallel submodule clones
> 
>  Documentation/config.txt        |   7 ++
>  Documentation/git-clone.txt     |   6 +-
>  Documentation/git-submodule.txt |   7 +-
>  builtin/clone.c                 |  23 +++-
>  builtin/fetch.c                 |   2 +-
>  builtin/submodule--helper.c     | 244 ++++++++++++++++++++++++++++++++++++++++
>  git-submodule.sh                |  54 ++++-----
>  run-command.c                   |   4 +
>  submodule-config.c              |  98 ++++++++++------
>  submodule-config.h              |   3 +
>  submodule.c                     |   5 +
>  t/t5526-fetch-submodules.sh     |  14 +++
>  t/t7400-submodule-basic.sh      |   4 +-
>  t/t7406-submodule-update.sh     |  27 +++++
>  14 files changed, 418 insertions(+), 80 deletions(-)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 0de0138..785721a 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2643,12 +2643,12 @@ submodule.<name>.ignore::
>  	"--ignore-submodules" option. The 'git submodule' commands are not
>  	affected by this setting.
>  
> -submodule::jobs
> +submodule.jobs::
>  	This is used to determine how many submodules can be operated on in
>  	parallel. Specifying a positive integer allows up to that number
> -	of submodules being fetched in parallel. Specifying 0 the number
> -	of cpus will be taken as the maximum number. Currently this is
> -	used in fetch and clone operations only.
> +	of submodules being fetched in parallel. This is used in fetch
> +	and clone operations only. A value of 0 will give some reasonable
> +	default. The defaults may change with different versions of Git.
>  
>  tag.sort::
>  	This variable controls the sort ordering of tags when displayed by
> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
> index affa52e..01bd6b7 100644
> --- a/Documentation/git-clone.txt
> +++ b/Documentation/git-clone.txt
> @@ -216,9 +216,10 @@ objects from the source repository into a pack in the cloned repository.
>  	The result is Git repository can be separated from working
>  	tree.
>  
> --j::
> ---jobs::
> +-j <n>::
> +--jobs <n>::
>  	The number of submodules fetched at the same time.
> +	Defaults to the `submodule.jobs` option.

Hmm, is there a way to _not_ fetch in parallel (override the
config) from the command line for a given command?

ATB,
Ramsay Jones

>  
>  <repository>::
>  	The (possibly remote) repository to clone from.  See the
> diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
> index f5429fa..c70fafd 100644
> --- a/Documentation/git-submodule.txt
> +++ b/Documentation/git-submodule.txt
> @@ -374,10 +374,11 @@ for linkgit:git-clone[1]'s `--reference` and `--shared` options carefully.
>  	clone with a history truncated to the specified number of revisions.
>  	See linkgit:git-clone[1]
>  
> --j::
> ---jobs::
> +-j <n>::
> +--jobs <n>::
>  	This option is only valid for the update command.
>  	Clone new submodules in parallel with as many jobs.
> +	Defaults to the `submodule.jobs` option.
>  
>  <path>...::
>  	Paths to submodule(s). When specified this will restrict the command
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 5ac2d89..22b9924 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -727,10 +727,7 @@ static int checkout(void)
>  		struct argv_array args = ARGV_ARRAY_INIT;
>  		argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
>  
> -		if (max_jobs == -1)
> -			if (git_config_get_int("submodule.jobs", &max_jobs))
> -				max_jobs = 1;
> -		if (max_jobs != 1) {
> +		if (max_jobs != -1) {
>  			struct strbuf sb = STRBUF_INIT;
>  			strbuf_addf(&sb, "--jobs=%d", max_jobs);
>  			argv_array_push(&args, sb.buf);
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index c3d438a..67dba1c 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -476,9 +476,10 @@ static int update_clone(int argc, const char **argv, const char *prefix)
>  	/* Overlay the parsed .gitmodules file with .git/config */
>  	git_config(git_submodule_config, NULL);
>  
> -	if (max_jobs == -1)
> -		if (git_config_get_int("submodule.jobs", &max_jobs))
> -			max_jobs = 1;
> +	if (max_jobs < 0)
> +		max_jobs = config_parallel_submodules();
> +	if (max_jobs < 0)
> +		max_jobs = 1;
>  
>  	run_processes_parallel(max_jobs,
>  			       update_clone_get_next_task,
> 

* Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
  2015-10-29 13:19  4%     ` [PATCHv2 0/8] Expose the submodule parallelism to the user Ramsay Jones
@ 2015-10-29 15:51  7%       ` Stefan Beller
  2015-10-29 17:23  4%         ` Junio C Hamano
  2015-10-29 23:50  6%         ` Ramsay Jones
  0 siblings, 2 replies; 200+ results
From: Stefan Beller @ 2015-10-29 15:51 UTC (permalink / raw)
  To: Ramsay Jones
  Cc: git@vger.kernel.org, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann, Eric Sunshine

On Thu, Oct 29, 2015 at 6:19 AM, Ramsay Jones
<ramsay@ramsayjones.plus.com> wrote:

> Hmm, is there a way to _not_ fetch in parallel (override the
> config) from the command line for a given command?
>
> ATB,
> Ramsay Jones

git config submodule.jobs 42
git <foo> --jobs 1 # should run just one task, despite having 42 configured

It does still use the parallel processing machinery, but with a maximum of
one subcommand being spawned. Is that what you're asking?

Thanks,
Stefan

* Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
  2015-10-29 15:51  7%       ` Stefan Beller
@ 2015-10-29 17:23  4%         ` Junio C Hamano
  2015-10-29 17:30  4%           ` Stefan Beller
  2015-10-29 23:50  6%         ` Ramsay Jones
  1 sibling, 1 reply; 200+ results
From: Junio C Hamano @ 2015-10-29 17:23 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Ramsay Jones, git@vger.kernel.org, Jacob Keller, Jeff King,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann, Eric Sunshine

Stefan Beller <sbeller@google.com> writes:

> On Thu, Oct 29, 2015 at 6:19 AM, Ramsay Jones
> <ramsay@ramsayjones.plus.com> wrote:
>
>> Hmm, is there a way to _not_ fetch in parallel (override the
>> config) from the command line for a given command?
>>
>> ATB,
>> Ramsay Jones
>
> git config submodule.jobs 42
> git <foo> --jobs 1 # should run just one task, despite having 42 configured
>
> It does still use the parallel processing machinery, but with a maximum of
> one subcommand being spawned. Is that what you're asking?

With this patch, do we still keep a separate machinery that bypasses
the parallel thing altogether in the first place?

I was hoping that the underlying parallel machinery is polished
enough that using it with max=1 parallelism would be equivalent to
serial execution.  At least, that was my understanding of our goal,
and back when we reviewed the previous "fetch --recurse-sub" series,
my impression was we were already there.
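
The equivalence hoped for here can be illustrated with a toy model. The sketch below is a deterministic, single-threaded simulation, not Git's `run_processes_parallel()`. It starts tasks while fewer than `max` are in flight and always retires the oldest one, logging `S<n>` for a start and `F<n>` for a finish. With `max == 1` the log strictly alternates between starting and finishing, which is serial execution.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TASKS 16

/*
 * Toy model: run 'ntasks' tasks with at most 'max' in flight, always
 * finishing the oldest running task first. Events are appended to 'log'
 * as "S<n> " (start) and "F<n> " (finish).
 */
static void simulate(int ntasks, int max, char *log, size_t logsz)
{
	int running[MAX_TASKS]; /* FIFO of started task ids */
	int head = 0, tail = 0; /* running[head..tail) are in flight */
	int next = 0;           /* next task to start */

	assert(max >= 1 && ntasks <= MAX_TASKS);
	log[0] = '\0';
	while (next < ntasks || head < tail) {
		/* fill free slots, as repeated get_next_task calls would */
		while (next < ntasks && tail - head < max) {
			running[tail++] = next;
			snprintf(log + strlen(log), logsz - strlen(log), "S%d ", next);
			next++;
		}
		/* retire the oldest running task */
		snprintf(log + strlen(log), logsz - strlen(log), "F%d ",
			 running[head++]);
	}
}
```

This models only the order of starts and finishes. It says nothing about the cost of routing a child's output through a pipe.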

And in that ideal endgame world, your "Give '-j1' from the command
line" would be a perfectly acceptable answer ;-).

Thanks.
 

* Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
  2015-10-29 17:23  4%         ` Junio C Hamano
@ 2015-10-29 17:30  4%           ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-29 17:30 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ramsay Jones, git@vger.kernel.org, Jacob Keller, Jeff King,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann, Eric Sunshine

On Thu, Oct 29, 2015 at 10:23 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> On Thu, Oct 29, 2015 at 6:19 AM, Ramsay Jones
>> <ramsay@ramsayjones.plus.com> wrote:
>>
>>> Hmm, is there a way to _not_ fetch in parallel (override the
>>> config) from the command line for a given command?
>>>
>>> ATB,
>>> Ramsay Jones
>>
>> git config submodule.jobs 42
>> git <foo> --jobs 1 # should run just one task, despite having 42 configured
>>
>> It does still use the parallel processing machinery, but with a maximum of
>> one subcommand being spawned. Is that what you're asking?
>
> With this patch, do we still keep a separate machinery that bypasses
> the parallel thing altogether in the first place?

No.

>
> I was hoping that the underlying parallel machinery is polished
> enough that using it with max=1 parallelism would be equivalent to
> serial execution.

There is no special code path for jobs=1.

It should be pretty close, just with the overhead of the parallel engine
spawning the subcommands one after the other and acting as an intermediary
for output piping. The single subcommand still writes its output to a pipe
read by the parallel engine, which then passes it on immediately.

> At least, that was my understanding of our goal,
> and back when we reviewed the previous "fetch --recurse-sub" series,
> my impression was we were already there.
>
> And in that ideal endgame world, your "Give '-j1' from the command
> line" would be a perfectly acceptable answer ;-).

ok. :)

>
> Thanks.
>

* Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
  2015-10-28 23:21 25%   ` [PATCHv2 0/8] " Stefan Beller
                       ` (7 preceding siblings ...)
  2015-10-29 13:19  4%     ` [PATCHv2 0/8] Expose the submodule parallelism to the user Ramsay Jones
@ 2015-10-29 20:12  6%     ` Junio C Hamano
  8 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2015-10-29 20:12 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, jacob.keller, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Stefan Beller <sbeller@google.com> writes:

> This replaces origin/sb/submodule-parallel-update
> (anchoring at 74367d8938, Merge branch 'sb/submodule-parallel-fetch'
> into sb/submodule-parallel-update)
>
> What does it do?
> ---
> This series should finish the ongoing efforts of parallelizing
> submodule network traffic. The patches contain tests for clone,
> fetch and submodule update to use the actual parallelism both via
> command line as well as a configured option. I decided to go with
> "submodule.jobs" for all three for now.
>
> What is new in v2?
> ---
> * The patches got reordered slightly
> * Documentation was adapted

A couple of things I noticed (other than "many issues pointed out in
v1 have been updated") are:

 - The checks for an uninitialized max_jobs in 7/8 and 8/8 are
   written inconsistently.  The way 7/8 does it, i.e. (max_jobs < 0),
   looks more conventional.

 - "Defaults to the `submodule.jobs` option" should say
   "configuration variable" instead.

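Here is a minimal sketch of one consistently written resolution order. It assumes that any negative value means "not given". The command line wins, then the `submodule.jobs` configuration variable, then a fallback of 1. The function name is hypothetical. Passing 0 through unchanged is an assumption based on the documentation change that says 0 gives "some reasonable default".

```c
#include <assert.h>

/* Negative means "not specified" at that level (hypothetical helper). */
static int resolve_max_jobs(int cmdline_jobs, int config_jobs)
{
	int jobs = cmdline_jobs;

	if (jobs < 0)
		jobs = config_jobs;   /* e.g. the value of submodule.jobs */
	if (jobs < 0)
		jobs = 1;             /* nothing given anywhere: run serially */
	/* 0 is passed through; the parallel engine may map it to a default */
	return jobs;
}
```

`resolve_max_jobs(1, 42)` corresponds to Ramsay's case of overriding a configured value with `-j1` on the command line.
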
I haven't formed an opinion on 6/8 yet.

* Re: [PATCHv2 6/8] git submodule update: have a dedicated helper for cloning
  2015-10-28 23:21 21%     ` [PATCHv2 6/8] git submodule update: have a dedicated helper for cloning Stefan Beller
@ 2015-10-29 22:34  6%       ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2015-10-29 22:34 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, jacob.keller, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Stefan Beller <sbeller@google.com> writes:

> +struct submodule_update_clone {
> +	int count;
> +	int quiet;
> +	int print_unmatched;
> +	char *reference;
> +	char *depth;
> +	char *update;
> +	const char *recursive_prefix;
> +	const char *prefix;
> +	struct module_list list;
> +	struct string_list projectlines;
> +	struct pathspec pathspec;
> +};

These fields should be split into at least two classes, the ones
that are primarily the "configuration", and the others that are
"states".  I am guessing 'quiet' is what the caller prepares and
tells the pp callbacks that they must work with reduced verbosity,
and 'print_unmatched' is also in the same boat.  From the above
structure definition, nobody can guess what 'count' represents.  Is
that the number of modules you have in the top-level superproject?
Is that the number of modules updated so far?  Some other number?

We can guess "list" is probably the list of modules to be cloned or
updated, but we have no idea what "projectlines" mean and what it
will be used for.  The only word with 'project' we would use in the
context of discussing submodules is the "top level superproject",
but then that will not need a "list", so that is not it.  Perhaps
this refers to a list of projects bound to our tree as submodules,
and perhaps each such submodule gives some kind of "lines", but it
is totally unclear what kind of lines they use.
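
One possible shape for such a split is sketched below with hypothetical names and simplified types: a plain array of paths stands in for Git's `module_list`, `string_list` and `pathspec`. It keeps what the caller sets up apart from what the callbacks modify, and names the counter after what it counts. `print_unmatched` lands on the state side because the callbacks in this patch set it when a child fails.

```c
#include <assert.h>
#include <stddef.h>

/* Set up once by the caller; read-only for the callbacks. */
struct update_clone_options {
	int quiet;                    /* pass --quiet to each clone */
	const char *reference;        /* --reference repository, or NULL */
	const char *depth;            /* --depth value, or NULL */
	const char *update;           /* --update override, or NULL */
	const char *prefix;
	const char *recursive_prefix;
};

/* Modified by the callbacks while tasks run. */
struct update_clone_state {
	size_t next_module;           /* index of the next module to examine */
	size_t nr_modules;            /* number of entries in module_paths */
	const char **module_paths;    /* stand-in for the module list */
	int print_unmatched;          /* set by callbacks when a child fails */
};

struct update_clone {
	struct update_clone_options opts;
	struct update_clone_state state;
};

/* Return the next path to examine, or NULL when the list is exhausted. */
static const char *next_module_path(struct update_clone *uc)
{
	if (uc->state.next_module >= uc->state.nr_modules)
		return NULL;
	return uc->state.module_paths[uc->state.next_module++];
}
```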

> +static void fill_clone_command(struct child_process *cp, int quiet,
> +			       const char *prefix, const char *path,
> +			       const char *name, const char *url,
> +			       const char *reference, const char *depth)
> +{
> +	cp->git_cmd = 1;
> +	cp->no_stdin = 1;
> +	cp->stdout_to_stderr = 1;
> +	cp->err = -1;
> +	argv_array_push(&cp->args, "submodule--helper");
> +	argv_array_push(&cp->args, "clone");
> +	if (quiet)
> +		argv_array_push(&cp->args, "--quiet");
> +
> +	if (prefix) {
> +		argv_array_push(&cp->args, "--prefix");
> +		argv_array_push(&cp->args, prefix);
> +	}
> +	argv_array_push(&cp->args, "--path");
> +	argv_array_push(&cp->args, path);

The pattern makes readers wish there were a way to make these
pairs of pushes easier to read.  The best I can come up with is

    argv_array_pushl(&cp->args, "--path", path, NULL);

While that would already be a vast improvement, when we know there
are many "I want to push two" cases, it makes me wonder if I am
entitled to find the repeated ", NULL" irritating.

    argv_array_push2(&cp->args, "--path", path);

on the other hand feels slightly too specific.  I dunno.

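For comparison, here is a self-contained sketch of the three spellings under discussion. `struct argv_list` and its functions are hypothetical stand-ins rather than Git's `argv_array` API. A fixed-capacity array keeps the example short.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

#define ARGV_MAX 32

/* Hypothetical fixed-capacity argument list, always NULL-terminated. */
struct argv_list {
	const char *argv[ARGV_MAX + 1];
	size_t argc;
};

static void argv_push(struct argv_list *a, const char *arg)
{
	assert(a->argc < ARGV_MAX);
	a->argv[a->argc++] = arg;
	a->argv[a->argc] = NULL;
}

/* Push every argument up to the terminating NULL. */
static void argv_pushl(struct argv_list *a, ...)
{
	va_list ap;
	const char *arg;

	va_start(ap, a);
	while ((arg = va_arg(ap, const char *)))
		argv_push(a, arg);
	va_end(ap);
}

/* Push an option together with its value, e.g. "--path" and a path. */
static void argv_push2(struct argv_list *a, const char *opt, const char *val)
{
	argv_push(a, opt);
	argv_push(a, val);
}
```

At the call site, the three forms read as follows:

- `argv_push(&a, "--path"); argv_push(&a, path);`
- `argv_pushl(&a, "--path", path, NULL);`
- `argv_push2(&a, "--path", path);`
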
> +static int update_clone_get_next_task(void **pp_task_cb,
> +				      struct child_process *cp,
> +				      struct strbuf *err,
> +				      void *pp_cb)
> +{
> +	struct submodule_update_clone *pp = pp_cb;
> +
> +	for (; pp->count < pp->list.nr; pp->count++) {
> +		const struct submodule *sub = NULL;
> +		const char *displaypath = NULL;
> +		const struct cache_entry *ce = pp->list.entries[pp->count];
> +		struct strbuf sb = STRBUF_INIT;
> +		const char *update_module = NULL;
> +		char *url = NULL;
> +		int just_cloned = 0;
> +
> +		if (ce_stage(ce)) {
> +			if (pp->recursive_prefix)
> +				strbuf_addf(err, "Skipping unmerged submodule %s/%s\n",
> +					pp->recursive_prefix, ce->name);
> +			else
> +				strbuf_addf(err, "Skipping unmerged submodule %s\n",
> +					ce->name);
> +			continue;
> +		}
> +
> +		sub = submodule_from_path(null_sha1, ce->name);
> +		if (!sub) {
> +			strbuf_addf(err, "BUG: internal error managing submodules. "
> +				    "The cache could not locate '%s'", ce->name);
> +			pp->print_unmatched = 1;
> +			return 0;

This feels a bit inconsistent.  When the pp->count'th submodule is
set not to update (i.e. "none" below), you let this loop ignore
that submodule and continue on to process the pp->count+1'th one without
returning to the caller.  Is there a reason why this case should be
processed differently?  If the rest of the code treats this
condition as a "grave error" that tells the caller to never call
get-next again (i.e. the "emergency abort" condition), that sort of
makes sense, but I cannot offhand see if that is being done in this
patch.
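
The following is a hypothetical sketch of the distinction drawn above, not the patch's code. A per-module condition such as `update = none` is reported and skipped inside the loop. An internal inconsistency instead sets a sticky abort flag and returns 0, so later calls keep returning 0 and the caller can check the flag afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum module_status { MODULE_OK, MODULE_SKIP, MODULE_BROKEN };

struct task_source {
	const enum module_status *modules;
	size_t nr;
	size_t next;
	int aborted;   /* set once on an internal error, never cleared */
};

/*
 * Return 1 and store the index of the next module to work on, or 0 when
 * there is nothing more to do (list exhausted or aborted).
 */
static int get_next_task(struct task_source *src, size_t *out)
{
	if (src->aborted)
		return 0;
	for (; src->next < src->nr; src->next++) {
		switch (src->modules[src->next]) {
		case MODULE_SKIP:
			/* ordinary per-module condition: report and move on */
			fprintf(stderr, "Skipping module %zu\n", src->next);
			continue;
		case MODULE_BROKEN:
			/* internal error: stop handing out work for good */
			src->aborted = 1;
			return 0;
		case MODULE_OK:
			*out = src->next++;
			return 1;
		}
	}
	return 0;
}
```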

> +		}
> +
> +		if (pp->recursive_prefix)
> +			displaypath = relative_path(pp->recursive_prefix, ce->name, &sb);
> +		else
> +			displaypath = ce->name;
> +
> +		if (pp->update)
> +			update_module = pp->update;
> +		if (!update_module)
> +			update_module = sub->update;
> +		if (!update_module)
> +			update_module = "checkout";
> +		if (!strcmp(update_module, "none")) {
> +			strbuf_addf(err, "Skipping submodule '%s'\n", displaypath);
> +			continue;
> +		}
> +
> +		/*
> +		 * Looking up the url in .git/config.
> +		 * We cannot fall back to .gitmodules as we only want to process

s/cannot/must not/, right?

> +		 * configured submodules. This renders the submodule lookup API
> +		 * useless, as it cannot lookup without fallback.
> +		 */

I doubt the value of the last sentence, especially the "useless"
part.

Either "We do not want to read .gitmodules and that is why we do not
use submodule config API, period" (which does not make it "useless",
it is just not meant to be used here at all), or "We do not want to
read .gitmodules in this codepath, and submodule config API cannot
be used here before we teach it an option to only check the config
without falling back" (which does not make it "useless", it is just
that you haven't made it ready to be used here yet).
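
The second alternative is an API that can be told not to fall back to `.gitmodules`. It might look like the following. Everything here is hypothetical: two small in-memory tables stand in for `.git/config` and `.gitmodules`, and `lookup_submodule_url()` is not an existing Git function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct kv { const char *name; const char *url; };

/* Return the URL recorded for 'name' in 'tbl', or NULL if absent. */
static const char *find_url(const struct kv *tbl, size_t nr, const char *name)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(tbl[i].name, name))
			return tbl[i].url;
	return NULL;
}

/*
 * Consult .git/config first and .gitmodules only if allowed. The update
 * code path would pass allow_fallback = 0, so that a submodule without
 * a URL in .git/config counts as uninitialized.
 */
static const char *lookup_submodule_url(const struct kv *config, size_t config_nr,
					const struct kv *gitmodules, size_t gitmodules_nr,
					const char *name, int allow_fallback)
{
	const char *url = find_url(config, config_nr, name);

	if (!url && allow_fallback)
		url = find_url(gitmodules, gitmodules_nr, name);
	return url;
}
```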

> +		strbuf_reset(&sb);
> +		strbuf_addf(&sb, "submodule.%s.url", sub->name);
> +		git_config_get_string(sb.buf, &url);
> +		if (!url) {
> +			/*
> +			 * Only mention uninitialized submodules when its
> +			 * path have been specified
> +			 */
> +			if (pp->pathspec.nr)
> +				strbuf_addf(err, _("Submodule path '%s' not initialized\n"
> +					"Maybe you want to use 'update --init'?"), displaypath);
> +			continue;
> +		}
> +
> +		strbuf_reset(&sb);
> +		strbuf_addf(&sb, "%s/.git", ce->name);
> +		just_cloned = !file_exists(sb.buf);

That name was misleading and had me scratch my head for a while.
This module is in the "needs cloning" state, and you haven't even
started cloning it yet.
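
The check only asks whether `<path>/.git` exists at all, as a directory or as a gitfile. That is roughly what the shell test `! test -d "$sm_path"/.git && ! test -f "$sm_path"/.git`, which this series removes from git-submodule.sh, did. Below is a sketch with a name that describes the state rather than an action. The helper is hypothetical and uses a POSIX `lstat()` existence check.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Return 1 if 'path' has no .git directory or gitfile yet (it still
 * needs cloning), 0 if one exists, -1 if the path is too long.
 */
static int needs_cloning(const char *path)
{
	char buf[4096];
	struct stat st;

	if (snprintf(buf, sizeof(buf), "%s/.git", path) >= (int) sizeof(buf))
		return -1;
	return lstat(buf, &st) ? 1 : 0;
}
```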

> +		strbuf_reset(&sb);
> +		strbuf_addf(&sb, "%06o %s %d %d\t%s\n", ce->ce_mode,
> +				sha1_to_hex(ce->sha1), ce_stage(ce),
> +				just_cloned, ce->name);
> +		string_list_append(&pp->projectlines, sb.buf);
> +
> +		if (just_cloned) {
> +			fill_clone_command(cp, pp->quiet, pp->prefix, ce->name,
> +					   sub->name, url, pp->reference, pp->depth);
> +			pp->count++;
> +			free(url);
> +			return 1;
> +		} else
> +			free(url);
> +	}
> +	return 0;
> +}

That's it for today.  I'll take a look at the remainder another day.

Thanks.

* Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
  2015-10-29 15:51  7%       ` Stefan Beller
  2015-10-29 17:23  4%         ` Junio C Hamano
@ 2015-10-29 23:50  6%         ` Ramsay Jones
  2015-11-03 19:41  7%           ` Stefan Beller
  1 sibling, 1 reply; 200+ results
From: Ramsay Jones @ 2015-10-29 23:50 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann, Eric Sunshine



On 29/10/15 15:51, Stefan Beller wrote:
> On Thu, Oct 29, 2015 at 6:19 AM, Ramsay Jones
> <ramsay@ramsayjones.plus.com> wrote:
> 
>> Hmm, is there a way to _not_ fetch in parallel (override the
>> config) from the command line for a given command?
>>
>> ATB,
>> Ramsay Jones
> 
> git config submodule.jobs 42
> git <foo> --jobs 1 # should run just one task, despite having 42 configured

Heh, yes ... I didn't pose the question quite right ...
> 
> It does still use the parallel processing machinery, but with a maximum of
> one subcommand being spawned. Is that what you're asking?

... but, despite that, you correctly inferred what I was really
asking about! :)

I was just wondering what overhead the parallel processing machinery
adds to the original 'non-parallel' code path (for the j=1 case).
I suspect the answer is 'not much', but that's just a guess.
Have you measured it? What happens if there is only a single
submodule to fetch?

ATB,
Ramsay Jones

* Re: [PATCHv2 2/8] submodule config: keep update strategy around
  2015-10-28 23:21 26%     ` [PATCHv2 2/8] submodule config: keep update strategy around Stefan Beller
@ 2015-10-30  1:14  4%       ` Eric Sunshine
  2015-10-30 17:38  4%         ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Eric Sunshine @ 2015-10-30  1:14 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Wed, Oct 28, 2015 at 7:21 PM, Stefan Beller <sbeller@google.com> wrote:
> We need the submodule update strategies in a later patch.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> diff --git a/submodule-config.c b/submodule-config.c
> index afe0ea8..8b8c7d1 100644
> --- a/submodule-config.c
> +++ b/submodule-config.c
> @@ -311,6 +312,16 @@ static int parse_config(const char *var, const char *value, void *data)
>                         free((void *) submodule->url);
>                         submodule->url = xstrdup(value);
>                 }
> +       } else if (!strcmp(item.buf, "update")) {
> +               if (!value)
> +                       ret = config_error_nonbool(var);
> +               else if (!me->overwrite && submodule->update != NULL)

Although "foo != NULL" is unusual in this code-base, it is used
elsewhere in this file, including just outside the context seen above.
Okay.

> +                       warn_multiple_config(me->commit_sha1, submodule->name,
> +                                            "update");
> +               else {
> +                       free((void *)submodule->update);

Minor: Every other 'free((void *) foo)' in this file has a space after
"(void *)", one of which can be seen in the context just above.

> +                       submodule->update = xstrdup(value);
> +               }
>         }
>
>         strbuf_release(&name);

* Re: [PATCHv2 3/8] submodule config: remove name_and_item_from_var
  2015-10-28 23:21 23%     ` [PATCHv2 3/8] submodule config: remove name_and_item_from_var Stefan Beller
@ 2015-10-30  1:23  7%       ` Eric Sunshine
  2015-10-30 18:37  4%         ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Eric Sunshine @ 2015-10-30  1:23 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Wed, Oct 28, 2015 at 7:21 PM, Stefan Beller <sbeller@google.com> wrote:
> submodule config: remove name_and_item_from_var
>
> By inlining `name_and_item_from_var` it is easy to add later options
> which are not required to have a submodule name.

I guess you're trying to say that name_and_item_from_var() didn't
provide a proper abstraction, thus wasn't as useful as expected.
Perhaps the commit message could make this shortcoming clearer.

> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
> diff --git a/submodule-config.c b/submodule-config.c
> index 8b8c7d1..4d0563c 100644
> --- a/submodule-config.c
> +++ b/submodule-config.c
> @@ -251,18 +235,25 @@ static int parse_config(const char *var, const char *value, void *data)
>  {
>         struct parse_config_parameter *me = data;
>         struct submodule *submodule;
> -       struct strbuf name = STRBUF_INIT, item = STRBUF_INIT;
> -       int ret = 0;
> +       int subsection_len, ret = 0;
> +       const char *subsection, *key;
> +       char *name;
>
> -       /* this also ensures that we only parse submodule entries */
> -       if (!name_and_item_from_var(var, &name, &item))
> +       if (parse_config_key(var, "submodule", &subsection,
> +                            &subsection_len, &key) < 0)
>                 return 0;
>
> +       if (!subsection_len)
> +               return 0;

Alternately:

    if (parse_config_key(var, "submodule", &subsection,
            &subsection_len, &key) < 0 || !subsection_len)
        return 0;

> +
> +       /* subsection is not null terminated */
> +       name = xmemdupz(subsection, subsection_len);
>         submodule = lookup_or_create_by_name(me->cache,
>                                              me->gitmodules_sha1,
> -                                            name.buf);
> +                                            name);
> +       free(name);

Since this is all private to submodule-config.c, I wonder if it would
be cleaner to change lookup_or_create_by_name() to accept a
name_length argument?
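A sketch of what a length-aware lookup could look like, so that the caller can pass the non-NUL-terminated subsection directly and skip the xmemdupz()/free() pair. The struct layout, the fixed-size array and the name `lookup_or_create_by_name_len()` are all hypothetical; git's real cache is built on hashmaps and its `struct submodule` has many more fields.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_MAX 16

/* Hypothetical minimal entry; only the name matters for this sketch. */
struct sub_entry {
	char *name;
};

struct sub_cache {
	struct sub_entry entries[CACHE_MAX];
	int nr;
};

/*
 * Look up (or create) an entry by a name that is given as pointer plus
 * length. Only a newly created entry copies the name.
 */
static struct sub_entry *lookup_or_create_by_name_len(struct sub_cache *cache,
						      const char *name,
						      size_t name_len)
{
	struct sub_entry *e;
	int i;

	for (i = 0; i < cache->nr; i++) {
		const char *n = cache->entries[i].name;

		/* same first name_len bytes, and the stored name ends there */
		if (!strncmp(n, name, name_len) && n[name_len] == '\0')
			return &cache->entries[i];
	}

	if (cache->nr == CACHE_MAX)
		return NULL; /* fixed-size sketch only */

	e = &cache->entries[cache->nr];
	e->name = malloc(name_len + 1);
	if (!e->name)
		return NULL;
	memcpy(e->name, name, name_len);
	e->name[name_len] = '\0';
	cache->nr++;
	return e;
}
```

For "foo.path" the caller would pass the subsection pointer with length 3, and a later lookup of the same name returns the existing entry instead of allocating again.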

> -       if (!strcmp(item.buf, "path")) {
> +       if (!strcmp(key, "path")) {
>                 if (!value)
>                         ret = config_error_nonbool(var);
>                 else if (!me->overwrite && submodule->path != NULL)

^ permalink raw reply	[relevance 7%]

* Re: [PATCHv2 4/8] submodule-config: parse_config
  2015-10-28 23:21 21%     ` [PATCHv2 4/8] submodule-config: parse_config Stefan Beller
@ 2015-10-30  1:53  4%       ` Eric Sunshine
  2015-10-30 19:29  7%         ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Eric Sunshine @ 2015-10-30  1:53 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Wed, Oct 28, 2015 at 7:21 PM, Stefan Beller <sbeller@google.com> wrote:
> submodule-config: parse_config

Um, what?

> This rewrites parse_config to distinguish between configs specific to
> one submodule and configs which apply generically to all submodules.
> We do not have generic submodule configs yet, but the next patch will
> introduce "submodule.jobs".
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
>
> # Conflicts:
> #       submodule-config.c

Interesting.

> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
> diff --git a/submodule-config.c b/submodule-config.c
> index 4d0563c..1cea404 100644
> --- a/submodule-config.c
> +++ b/submodule-config.c
> @@ -231,27 +231,23 @@ struct parse_config_parameter {
>         int overwrite;
>  };
>
> -static int parse_config(const char *var, const char *value, void *data)
> +static int parse_generic_submodule_config(const char *var,
> +                                         const char *key,
> +                                         const char *value)
>  {
> -       struct parse_config_parameter *me = data;
> -       struct submodule *submodule;
> -       int subsection_len, ret = 0;
> -       const char *subsection, *key;
> -       char *name;
> -
> -       if (parse_config_key(var, "submodule", &subsection,
> -                            &subsection_len, &key) < 0)
> -               return 0;
> -
> -       if (!subsection_len)
> -               return 0;
> +       return 0;
> +}
>
> -       /* subsection is not null terminated */
> -       name = xmemdupz(subsection, subsection_len);
> -       submodule = lookup_or_create_by_name(me->cache,
> -                                            me->gitmodules_sha1,
> -                                            name);
> -       free(name);
> +static int parse_specific_submodule_config(struct parse_config_parameter *me,
> +                                          const char *name,
> +                                          const char *key,
> +                                          const char *value,
> +                                          const char *var)

Minor: Are these 'key', 'value', 'var' arguments analogous to the
like-named arguments of parse_generic_submodule_config()? If so, why
is the order of arguments different?

> +{
> +       int ret = 0;
> +       struct submodule *submodule = lookup_or_create_by_name(me->cache,
> +                                                              me->gitmodules_sha1,
> +                                                              name);
>
>         if (!strcmp(key, "path")) {
>                 if (!value)
> @@ -318,6 +314,30 @@ static int parse_config(const char *var, const char *value, void *data)
>         return ret;
>  }
>
> +static int parse_config(const char *var, const char *value, void *data)
> +{
> +       struct parse_config_parameter *me = data;
> +
> +       int subsection_len;
> +       const char *subsection, *key;
> +       char *name;
> +
> +       if (parse_config_key(var, "submodule", &subsection,
> +                            &subsection_len, &key) < 0)
> +               return 0;
> +
> +       if (!subsection_len)
> +               return parse_generic_submodule_config(var, key, value);
> +       else {
> +               int ret;
> +               /* subsection is not null terminated */
> +               name = xmemdupz(subsection, subsection_len);
> +               ret = parse_specific_submodule_config(me, name, key, value, var);
> +               free(name);
> +               return ret;
> +       }
> +}

Minor: You could drop the 'else' and outdent its body, thus losing one
indentation level.

    if (!subsection_len)
        return parse_generic_submodule_config(...);

    int ret;
    ...
    return ret;

This might give you a less noisy diff and would be a bit more
consistent with the early part of the function where you don't bother
giving the if (parse_config_key(...)) an 'else' body.
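A self-contained sketch of the suggested shape: handle the generic case with an early return, then fall through to the specific case without an `else`. The handler names are hypothetical placeholders that only record which path was taken. The sketch also uses the same (var, ..., key, value) argument order for both handlers, as asked above, and keeps declarations at the top of the block.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Records which (hypothetical) handler ran, for demonstration only. */
static char last_call[64];

static int handle_generic(const char *var, const char *key, const char *value)
{
	(void)var;
	(void)value;
	snprintf(last_call, sizeof(last_call), "generic:%s", key);
	return 0;
}

static int handle_specific(const char *var, const char *name,
			   const char *key, const char *value)
{
	(void)var;
	(void)value;
	snprintf(last_call, sizeof(last_call), "specific:%s:%s", name, key);
	return 0;
}

static int dispatch(const char *var, const char *subsection,
		    int subsection_len, const char *key, const char *value)
{
	char *name;
	int ret;

	if (!subsection_len)
		return handle_generic(var, key, value);

	/* subsection is not NUL-terminated; make a private copy */
	name = malloc((size_t)subsection_len + 1);
	if (!name)
		return -1;
	memcpy(name, subsection, (size_t)subsection_len);
	name[subsection_len] = '\0';
	ret = handle_specific(var, name, key, value);
	free(name);
	return ret;
}
```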

^ permalink raw reply	[relevance 4%]

* Re: [PATCHv2 5/8] fetching submodules: Respect `submodule.jobs` config option
  2015-10-28 23:21 24%     ` [PATCHv2 5/8] fetching submodules: Respect `submodule.jobs` config option Stefan Beller
@ 2015-10-30  2:17  5%       ` Eric Sunshine
  0 siblings, 0 replies; 200+ results
From: Eric Sunshine @ 2015-10-30  2:17 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Wed, Oct 28, 2015 at 7:21 PM, Stefan Beller <sbeller@google.com> wrote:
> This allows to configure fetching and updating in parallel
> without having the command line option.
>
> This moved the responsibility to determine how many parallel processes
> to start from builtin/fetch to submodule.c as we need a way to communicate
> "The user did not specify the number of parallel processes in the command
> line options" in the builtin fetch. The submodule code takes care of
> the precedence (CLI > config > default)
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 391a0c3..785721a 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2643,6 +2643,13 @@ submodule.<name>.ignore::
>         "--ignore-submodules" option. The 'git submodule' commands are not
>         affected by this setting.
>
> +submodule.jobs::
> +       This is used to determine how many submodules can be operated on in
> +       parallel. Specifying a positive integer allows up to that number
> +       of submodules being fetched in parallel. This is used in fetch
> +       and clone operations only. A value of 0 will give some reasonable
> +       default. The defaults may change with different versions of Git.

I'm not sure that "default" is the correct word here. When you talk
about a "default", you're normally explaining what happens when the
configuration is not provided. (In fact, the default number of jobs is
1, which you may want to document here.)

>  tag.sort::
>         This variable controls the sort ordering of tags when displayed by
>         linkgit:git-tag[1]. Without the "--sort=<value>" option provided, the
> diff --git a/submodule-config.c b/submodule-config.c
> index 1cea404..07bdcdf 100644
> --- a/submodule-config.c
> +++ b/submodule-config.c
> @@ -32,6 +32,7 @@ enum lookup_type {
>
>  static struct submodule_cache cache;
>  static int is_cache_init;
> +static int parallel_jobs = -1;
>
>  static int config_path_cmp(const struct submodule_entry *a,
>                            const struct submodule_entry *b,
> @@ -235,6 +236,9 @@ static int parse_generic_submodule_config(const char *var,
>                                           const char *key,
>                                           const char *value)
>  {
> +       if (!strcmp(key, "jobs")) {
> +               parallel_jobs = strtol(value, NULL, 10);
> +       }

Style: unnecessary braces

Why does this allow a negative value? The documentation doesn't
mention anything about it.
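To make the question concrete, here is a sketch of stricter parsing that rejects a missing value (a bare boolean-style entry passes NULL as `value`, which the quoted strtol() call would dereference), trailing garbage, overflow and negative counts. The helper name and its 0/-1 convention are hypothetical; inside git, git_config_int() would likely be the more natural building block.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical strict parser for a "submodule.jobs"-style value.
 * Returns 0 and stores the count in *out on success, -1 on error.
 */
static int parse_jobs_value(const char *value, int *out)
{
	char *end;
	long n;

	if (!value || !*value)
		return -1;	/* no value given */
	errno = 0;
	n = strtol(value, &end, 10);
	if (errno || *end || n < 0 || n > INT_MAX)
		return -1;	/* overflow, junk, or negative */
	*out = (int)n;
	return 0;
}
```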

>         return 0;
>  }
>
> diff --git a/submodule.c b/submodule.c
> index 0257ea3..188ba02 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -752,6 +752,11 @@ int fetch_populated_submodules(const struct argv_array *options,
>         argv_array_push(&spf.args, "--recurse-submodules-default");
>         /* default value, "--submodule-prefix" and its value are added later */
>
> +       if (max_parallel_jobs < 0)
> +               max_parallel_jobs = config_parallel_submodules();
> +       if (max_parallel_jobs < 0)
> +               max_parallel_jobs = 1;

run_processes_parallel() itself specially handles max_parallel_jobs==0,
so you don't need to consider it here. Okay.
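The precedence implemented by the hunk above (CLI > config > default, with -1 meaning "not specified") can be summarized as a tiny pure function. The name `resolve_parallel_jobs()` is hypothetical. A value of 0 is passed through unchanged because, as noted, run_processes_parallel() treats it specially.

```c
#include <assert.h>

/*
 * Hypothetical summary of the precedence in fetch_populated_submodules():
 * an explicit --jobs wins, then the configured value, then 1.
 * -1 means "not specified".
 */
static int resolve_parallel_jobs(int cli_jobs, int config_jobs)
{
	if (cli_jobs >= 0)
		return cli_jobs;
	if (config_jobs >= 0)
		return config_jobs;
	return 1; /* default: one submodule at a time */
}
```

This matches the expectations of the new test: `--jobs 7` gives 7, `submodule.jobs=8` alone gives 8, and `--jobs 9` overrides a configured 8.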

> +
>         calculate_changed_submodule_paths();
>         run_processes_parallel(max_parallel_jobs,
>                                get_next_submodule,
> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index 1b4ce69..5c3579c 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -470,4 +470,18 @@ test_expect_success "don't fetch submodule when newly recorded commits are alrea
>         test_i18ncmp expect.err actual.err
>  '
>
> +test_expect_success 'fetching submodules respects parallel settings' '
> +       git config fetch.recurseSubmodules true &&
> +       (
> +               cd downstream &&
> +               GIT_TRACE=$(pwd)/trace.out git fetch --jobs 7 &&
> +               grep "7 children" trace.out &&
> +               git config submodule.jobs 8 &&
> +               GIT_TRACE=$(pwd)/trace.out git fetch &&
> +               grep "8 children" trace.out &&
> +               GIT_TRACE=$(pwd)/trace.out git fetch --jobs 9 &&
> +               grep "9 children" trace.out
> +       )
> +'

Not specifically related to this test, but maybe add tests to check
cases when --jobs is not specified, and --jobs=1?

> +
>  test_done
> --
> 2.5.0.281.g4ed9cdb
>

^ permalink raw reply	[relevance 5%]

* Re: [PATCHv2 2/8] submodule config: keep update strategy around
  2015-10-30  1:14  4%       ` Eric Sunshine
@ 2015-10-30 17:38  4%         ` Stefan Beller
  2015-10-30 18:16  4%           ` Eric Sunshine
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-10-30 17:38 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Thu, Oct 29, 2015 at 6:14 PM, Eric Sunshine <ericsunshine@gmail.com> wrote:
>> +               else if (!me->overwrite && submodule->update != NULL)
>
> Although "foo != NULL" is unusual in this code-base, it is used
> elsewhere in this file, including just outside the context seen above.
> Okay.

ok, I'll clean that up as we go.

>> +                       free((void *)submodule->update);
>
> Minor: Every other 'free((void *) foo)' in this file has a space after
> "(void *)", one of which can be seen in the context just above.

done

^ permalink raw reply	[relevance 4%]

* Re: [PATCHv2 2/8] submodule config: keep update strategy around
  2015-10-30 17:38  4%         ` Stefan Beller
@ 2015-10-30 18:16  4%           ` Eric Sunshine
  2015-10-30 18:25  4%             ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Eric Sunshine @ 2015-10-30 18:16 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Fri, Oct 30, 2015 at 1:38 PM, Stefan Beller <sbeller@google.com> wrote:
> On Thu, Oct 29, 2015 at 6:14 PM, Eric Sunshine <ericsunshine@gmail.com> wrote:
>>> +               else if (!me->overwrite && submodule->update != NULL)
>>
>> Although "foo != NULL" is unusual in this code-base, it is used
>> elsewhere in this file, including just outside the context seen above.
>> Okay.
>
> ok, I'll clean that up as we go.

Oh, I wasn't suggesting that you clean this up (though you may if you
want). I was merely commenting (for the sake of others reviewing this
patch) that, while not the norm for the project, this instance is
consistent with surrounding code.

>>> +                       free((void *)submodule->update);
>>
>> Minor: Every other 'free((void *) foo)' in this file has a space after
>> "(void *)", one of which can be seen in the context just above.
>
> done

^ permalink raw reply	[relevance 4%]

* Re: [PATCHv2 2/8] submodule config: keep update strategy around
  2015-10-30 18:16  4%           ` Eric Sunshine
@ 2015-10-30 18:25  4%             ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-30 18:25 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Fri, Oct 30, 2015 at 11:16 AM, Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Fri, Oct 30, 2015 at 1:38 PM, Stefan Beller <sbeller@google.com> wrote:
>> On Thu, Oct 29, 2015 at 6:14 PM, Eric Sunshine <ericsunshine@gmail.com> wrote:
>>>> +               else if (!me->overwrite && submodule->update != NULL)
>>>
>>> Although "foo != NULL" is unusual in this code-base, it is used
>>> elsewhere in this file, including just outside the context seen above.
>>> Okay.
>>
>> ok, I'll clean that up as we go.
>
> Oh, I wasn't suggesting that you clean this up (though you may if you
> want). I was merely commenting (for the sake of others reviewing this
> patch) that, while not the norm for the project, this instance is
> consistent with surrounding code.

I only did a separate patch on top, cleaning up 4 occurrences in that file.
We use != NULL quite often throughout the code base, especially in
conditions with side effects like:

    while ((c = strchr(string, '/')) != NULL) {
        ...

where I think it even makes sense. But there are a small number of
cases without side effects:

    $ grep -rI "!= NULL" |grep -v "((" |grep -v "))" |wc -l
    135
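A runnable illustration of the side-effect case described above, where the assignment happens inside the loop condition and the explicit `!= NULL` makes the intent obvious. `count_components()` is a hypothetical helper for illustration, not git code.

```c
#include <assert.h>
#include <string.h>

/* Count the components of a slash-separated path. */
static int count_components(const char *path)
{
	const char *slash;
	int n = 1;

	/* assignment inside the condition; "!= NULL" keeps it readable */
	while ((slash = strchr(path, '/')) != NULL) {
		n++;
		path = slash + 1;
	}
	return n;
}
```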



>
>>>> +                       free((void *)submodule->update);
>>>
>>> Minor: Every other 'free((void *) foo)' in this file has a space after
>>> "(void *)", one of which can be seen in the context just above.
>>
>> done

^ permalink raw reply	[relevance 4%]

* Re: [PATCHv2 3/8] submodule config: remove name_and_item_from_var
  2015-10-30  1:23  7%       ` Eric Sunshine
@ 2015-10-30 18:37  4%         ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-30 18:37 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Thu, Oct 29, 2015 at 6:23 PM, Eric Sunshine <ericsunshine@gmail.com> wrote:
> On Wed, Oct 28, 2015 at 7:21 PM, Stefan Beller <sbeller@google.com> wrote:
>> submodule config: remove name_and_item_from_var
>>
>> By inlining `name_and_item_from_var` it is easy to add later options
>> which are not required to have a submodule name.
>
> I guess you're trying to say that name_and_item_from_var() didn't
> provide a proper abstraction, thus wasn't as useful as expected.
> Perhaps that commit message could make this shortcoming clearer.
>

ok

>
>     if (parse_config_key(var, "submodule", &subsection,
>             &subsection_len, &key) < 0 || !subsection_len)
>         return 0;

done

>>         submodule = lookup_or_create_by_name(me->cache,
>>                                              me->gitmodules_sha1,
>> -                                            name.buf);
>> +                                            name);
>> +       free(name);
>
> Since this is all private to submodule-config.c, I wonder if it would
> be cleaner to change lookup_or_create_by_name() to accept a
> name_length argument?
>

That looks amazingly clean. :)

^ permalink raw reply	[relevance 4%]

* Re: [PATCHv2 4/8] submodule-config: parse_config
  2015-10-30  1:53  4%       ` Eric Sunshine
@ 2015-10-30 19:29  7%         ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-10-30 19:29 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Thu, Oct 29, 2015 at 6:53 PM, Eric Sunshine <ericsunshine@gmail.com> wrote:
> On Wed, Oct 28, 2015 at 7:21 PM, Stefan Beller <sbeller@google.com> wrote:
>> submodule-config: parse_config
>
> Um, what?

submodule-config: Introduce parse_generic_submodule_config

>
>> This rewrites parse_config to distinguish between configs specific to
>> one submodule and configs which apply generically to all submodules.
>> We do not have generic submodule configs yet, but the next patch will
>> introduce "submodule.jobs".
>>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>>
>> # Conflicts:
>> #       submodule-config.c
>
> Interesting.

fixed

>
> Minor: Are these 'key', 'value', 'var' arguments analogous to the
> like-named arguments of parse_generic_submodule_config()? If so, why
> is the order of arguments different?

Reordered. I had ordered them by what made the most sense for each function
individually, but consistency across functions is better.

>
>> +{
>> +       int ret = 0;
>> +       struct submodule *submodule = lookup_or_create_by_name(me->cache,
>> +                                                              me->gitmodules_sha1,
>> +                                                              name);
>>
>>         if (!strcmp(key, "path")) {
>>                 if (!value)
>> @@ -318,6 +314,30 @@ static int parse_config(const char *var, const char *value, void *data)
>>         return ret;
>>  }
>>
>> +static int parse_config(const char *var, const char *value, void *data)
>> +{
>> +       struct parse_config_parameter *me = data;
>> +
>> +       int subsection_len;
>> +       const char *subsection, *key;
>> +       char *name;
>> +
>> +       if (parse_config_key(var, "submodule", &subsection,
>> +                            &subsection_len, &key) < 0)
>> +               return 0;
>> +
>> +       if (!subsection_len)
>> +               return parse_generic_submodule_config(var, key, value);
>> +       else {
>> +               int ret;
>> +               /* subsection is not null terminated */
>> +               name = xmemdupz(subsection, subsection_len);
>> +               ret = parse_specific_submodule_config(me, name, key, value, var);
>> +               free(name);
>> +               return ret;
>> +       }
>> +}
>
> Minor: You could drop the 'else' and outdent its body, thus losing one
> indentation level.

By passing on subsection and subsection_len, each branch is only one statement:

     if (!subsection_len)
         return parse_generic_submodule_config(key, var, value, me);
     else
         return parse_specific_submodule_config(subsection,
               subsection_len, key,
                  var, value, me);

I'll do it that way, without dedenting, I guess.

^ permalink raw reply	[relevance 7%]

* Re: [PATCHv2 8/8] clone: allow an explicit argument for parallel submodule clones
  2015-10-28 23:21 24%     ` [PATCHv2 8/8] clone: allow an explicit argument for parallel submodule clones Stefan Beller
@ 2015-11-01  8:58  4%       ` Eric Sunshine
  0 siblings, 0 replies; 200+ results
From: Eric Sunshine @ 2015-11-01  8:58 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git List, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann

On Wed, Oct 28, 2015 at 7:21 PM, Stefan Beller <sbeller@google.com> wrote:
> Just pass it along to "git submodule update", which may pick reasonable
> defaults if you don't specify an explicit number.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
> @@ -724,8 +723,20 @@ static int checkout(void)
>         err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
>                            sha1_to_hex(sha1), "1", NULL);
>
> -       if (!err && option_recursive)
> -               err = run_command_v_opt(argv_submodule, RUN_GIT_CMD);
> +       if (!err && option_recursive) {
> +               struct argv_array args = ARGV_ARRAY_INIT;
> +               argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
> +
> +               if (max_jobs != -1) {
> +                       struct strbuf sb = STRBUF_INIT;
> +                       strbuf_addf(&sb, "--jobs=%d", max_jobs);
> +                       argv_array_push(&args, sb.buf);
> +                       strbuf_release(&sb);

The above four lines can be collapsed to:

    argv_array_pushf(&args, "--jobs=%d", max_jobs);

> +               }
> +
> +               err = run_command_v_opt(args.argv, RUN_GIT_CMD);
> +               argv_array_clear(&args);
> +       }
>
>         return err;
>  }
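Why a printf-style push makes the temporary strbuf unnecessary: the helper formats directly into a freshly allocated argument. The sketch below uses a hypothetical minimal argument list (`struct arg_list`, `arg_list_pushf()`) rather than git's argv_array, only to show the mechanism; in git itself the call is simply argv_array_pushf(&args, "--jobs=%d", max_jobs).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size, NULL-terminated argument list. */
struct arg_list {
	char *argv[16];
	int argc;
};

static int arg_list_pushf(struct arg_list *a, const char *fmt, ...)
{
	va_list ap;
	int len;
	char *buf;

	if (a->argc >= 15) /* keep room for the terminating NULL */
		return -1;

	/* first pass: measure the formatted length */
	va_start(ap, fmt);
	len = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);
	if (len < 0)
		return -1;

	/* second pass: format into an exactly sized buffer */
	buf = malloc((size_t)len + 1);
	if (!buf)
		return -1;
	va_start(ap, fmt);
	vsnprintf(buf, (size_t)len + 1, fmt, ap);
	va_end(ap);

	a->argv[a->argc++] = buf;
	a->argv[a->argc] = NULL;
	return 0;
}

static void arg_list_clear(struct arg_list *a)
{
	int i;

	for (i = 0; i < a->argc; i++)
		free(a->argv[i]);
	a->argc = 0;
	a->argv[0] = NULL;
}
```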

^ permalink raw reply	[relevance 4%]

* Re: git.git as of tonight
  @ 2015-11-02 23:06  2%   ` Stefan Beller
    0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-02 23:06 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, git@vger.kernel.org, Johannes Schindelin

On Mon, Nov 2, 2015 at 1:15 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> Am 02.11.2015 um 03:58 schrieb Junio C Hamano:
>> * sb/submodule-parallel-fetch (2015-10-21) 14 commits
>>    (merged to 'next' on 2015-10-23 at 8f04bbd)
>>   + run-command: fix missing output from late callbacks
>>   + test-run-command: increase test coverage
>>   + test-run-command: test for gracefully aborting
>>   + run-command: initialize the shutdown flag
>>   + run-command: clear leftover state from child_process structure
>>   + run-command: fix early shutdown
>>    (merged to 'next' on 2015-10-15 at df63590)
>>   + submodules: allow parallel fetching, add tests and documentation
>>   + fetch_populated_submodules: use new parallel job processing
>>   + run-command: add an asynchronous parallel child processor
>>   + sigchain: add command to pop all common signals
>>   + strbuf: add strbuf_read_once to read without blocking
>>   + xread_nonblock: add functionality to read from fds without blocking
>>   + xread: poll on non blocking fds
>>   + submodule.c: write "Fetching submodule <foo>" to stderr
>>   (this branch is used by rs/daemon-leak-fix and sb/submodule-parallel-update.)
>>
>>   Add a framework to spawn a group of processes in parallel, and use
>>   it to run "git fetch --recurse-submodules" in parallel.
>>
>>   Will merge to 'master'.
>
> Please don't, yet. This series does not build on Windows:
>
> run-command.c: In function 'set_nonblocking':
> run-command.c:1011: error: 'F_GETFL' undeclared (first use in this function)
> run-command.c:1011: error: (Each undeclared identifier is reported only once
> run-command.c:1011: error: for each function it appears in.)
> run-command.c:1015: error: 'F_SETFL' undeclared (first use in this function)
> run-command.c:1015: error: 'O_NONBLOCK' undeclared (first use in this function)
> make: *** [run-command.o] Error 1

Going by a quick search (http://stackoverflow.com/a/22756664),
I'd hope we only need to modify the set_nonblocking function using #ifdefs?
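A sketch of an #ifdef-guarded set_nonblocking(), assuming `_WIN32` is a suitable guard and that on Windows the caller can cope with a failure return (for example by falling back to a single job). On POSIX systems it uses fcntl() with F_GETFL/F_SETFL and O_NONBLOCK, exactly the identifiers the Windows build reports as undeclared. Whether a no-op is acceptable on Windows is the open question in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#ifndef _WIN32
#include <fcntl.h>
#endif

/* Returns 0 on success, -1 if the fd could not be made non-blocking. */
static int set_nonblocking(int fd)
{
#ifndef _WIN32
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
#else
	(void)fd;
	return -1; /* assumption: caller falls back to blocking I/O */
#endif
}
```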

>
> I have to investigate whether we can have some sort of Posixy
> non-blocking IO on Windows or whether we have to opt-out from this
> parallel-process facility. Any help from Windows experts would be
> appreciated.
>
> -- Hannes

^ permalink raw reply	[relevance 2%]

* Re: git.git as of tonight
  @ 2015-11-03 18:18  2%         ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-03 18:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, git@vger.kernel.org, Johannes Schindelin

On Tue, Nov 3, 2015 at 9:05 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Johannes Sixt <j6t@kdbg.org> writes:
>
>> My findings so far are negative. The only short-term and mid-term
>> solution I see so far is to opt-out from the framework during
>> build-time.

So I started reading up on that [1].
As far as I understand, we don't need to mark a file descriptor
as non-blocking; instead we could use ReadFileEx [2] with
a flag set for "overlapped" operation.

That said, we could make set_nonblocking a no-op and
provide another implementation of strbuf_read_once,
depending on whether NO_PTHREADS is set.
Or maybe not strbuf_read_once, but rather the underlying
xread_nonblock?
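To illustrate the contract such a per-platform replacement would have to keep, here is a POSIX-only sketch of a "read whatever is available right now" helper, roughly the role xread_nonblock() plays in the series. The name `read_available()` and its return convention are hypothetical: it retries on EINTR, maps "no data yet" (EAGAIN/EWOULDBLOCK) to 0, and reports EOF separately through `*eof`. An overlapped-I/O Windows version would need a different body behind the same interface.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Read up to len bytes from fd (assumed already non-blocking).
 * Returns bytes read, 0 if nothing is available yet or at EOF
 * (then *eof is set), or -1 on a real error.
 */
static ssize_t read_available(int fd, char *buf, size_t len, int *eof)
{
	ssize_t n;

	*eof = 0;
	for (;;) {
		n = read(fd, buf, len);
		if (n > 0)
			return n;
		if (n == 0) {
			*eof = 1;
			return 0;
		}
		if (errno == EINTR)
			continue;
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return 0; /* no data yet, try again later */
		return -1;
	}
}
```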



[1] http://tinyclouds.org/iocp-links.html
[2] https://msdn.microsoft.com/en-us/library/aa365468(v=VS.85).aspx

>
> Now, from where I sit, it seems that the way forward would be
>
>  1. Make this an optional feature so that platforms can compile it
>     out, if it is not already done.  My preference, even if we go
>     that route, would be to see if we can find a way to preserve the
>     overall code structure (e.g. instead of spawning multiple
>     workers, which is why the code needs NONBLOCK to avoid getting
>     stuck on reading from one while others are working, perhaps we
>     can spawn only one and not do a nonblock read?).

Yeah, that would be my understanding as well. If we don't come up with
a good solution for parallelism on Windows now, we'd at least need to
make the jobs=1 case work as well as it did before.
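A sketch of the jobs=1 fallback Junio describes: run the tasks strictly one after another and read each child's output with ordinary blocking reads, so no O_NONBLOCK is needed. popen() stands in for git's run-command API here. The function name, the fixed-size output buffer and the truncation behaviour are all assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Run cmds[0..n-1] sequentially, appending their stdout to out
 * (truncated to out_size - 1 bytes). Returns the number of failures.
 */
static int run_sequentially(const char *const *cmds, int n,
			    char *out, size_t out_size)
{
	int i, failed = 0;
	size_t used = 0;

	assert(out_size > 0);
	out[0] = '\0';
	for (i = 0; i < n; i++) {
		char buf[256];
		size_t got;
		FILE *child = popen(cmds[i], "r");

		if (!child) {
			failed++;
			continue;
		}
		/* Blocking reads are fine: only one child exists at a time. */
		while ((got = fread(buf, 1, sizeof(buf), child)) > 0) {
			size_t room = out_size - 1 - used;

			if (got > room)
				got = room; /* truncate, but keep draining */
			memcpy(out + used, buf, got);
			used += got;
			out[used] = '\0';
		}
		if (pclose(child) != 0)
			failed++;
	}
	return failed;
}
```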

>
>  2. After that is done, the feature could graduate to 'master'.  As
>     this is a bigger framework change than others, however, we do
>     not necessarily want to rush it.  On the other hand, because
>     this only affects submodules, which means it has fewer users and
>     testers that would give us feedback while it is on 'next', we
>     may want to push it to 'master' sooner to give it a larger
>     exposure.  I dunno, and I do not want to decide this myself the
>     week before I'll go offline for a few weeks (i.e. today).

Yeah, I guess letting this cook until well done has its benefits.

>
>  3. Then we would enlist help from folks who are more familiar with
>     Windows platform (like you) to see how the "run parallel workers
>     and collect from them" can be (re)done with a nice level of
>     abstraction.  I am hoping that we can continue the tradition of
>     the evolution of run-command.c API (I am specifically impressed
>     by what you did for "async" that allows the callers not to worry
>     about threads and processes) around this area.  That is
>     obviously a mid- to longer term goal.

I just wonder if we can skip steps 1) and 2) by discussing now
how to change the framework to work well without POSIX file
descriptors.

>
> Thanks for working together well, you two.

^ permalink raw reply	[relevance 2%]

* Re: [PATCHv2 0/8] Expose the submodule parallelism to the user
  2015-10-29 23:50  6%         ` Ramsay Jones
@ 2015-11-03 19:41  7%           ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-03 19:41 UTC (permalink / raw)
  To: Ramsay Jones
  Cc: git@vger.kernel.org, Jacob Keller, Jeff King, Junio C Hamano,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann, Eric Sunshine

On Thu, Oct 29, 2015 at 4:50 PM, Ramsay Jones
<ramsay@ramsayjones.plus.com> wrote:
>
>
> On 29/10/15 15:51, Stefan Beller wrote:
>> On Thu, Oct 29, 2015 at 6:19 AM, Ramsay Jones
>> <ramsay@ramsayjones.plus.com> wrote:
>>
>>> Hmm, is there a way to _not_ fetch in parallel (override the
>>> config) from the command line for a given command?
>>>
>>> ATB,
>>> Ramsay Jones
>>
>> git config submodule.jobs 42
>> git <foo> --jobs 1 # should run just one task, despite having 42 configured
>
> Heh, yes ... I didn't pose the question quite right ...
>>
>> It does use the parallel processing machinery though, but with a maximum of
>> one subcommand being spawned. Is that what you're asking?
>
> ... but, despite that, you correctly inferred what I was really
> asking about! :)
>
> I was just wondering what overhead the parallel processing machinery
> adds to the original 'non-parallel' code path (for the j=1 case).
> I suspect the answer is 'not much', but that's just a guess.
> Have you measured it?

Totally unscientific:
 * Make a copy of my current gerrit repository and time the fetch.
 * That repo contains 5 submodules, one needs fetching

time git fetch --recurse-submodules=yes --jobs=1 # this series
real 0m7.150s
user 0m3.459s
sys 0m1.126s

time git fetch --recurse-submodules=yes # origin/master
real 0m7.667s
user 0m3.439s
sys 0m1.190s

Now let's run it a few more times to rule out cold caches and network
hiccups (also, there is nothing to fetch, so it's more like doing
6 ls-remotes in a row: one for gerrit and one for each of the 5 submodules):

this series, best out of 5:
real 0m3.971s
user 0m2.447s
sys 0m0.452s

this series, worst out of 5:
real 0m4.229s
user 0m2.506s
sys 0m0.413s

origin/master, best out of 5:
real 0m3.968s
user 0m2.516s
sys 0m0.380s

origin/master, worst out of 5:
real 0m4.217s
user 0m2.472s
sys 0m0.408s

The difference in real time is < 1 % in
both the best and the worst case.

If you really care about 1 % of performance, you'd want to fetch in
parallel anyway?
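The "< 1 %" claim can be checked directly from the "real" timings above. This small helper simply computes the relative difference in percent; a positive result means this series was slower than origin/master.

```c
#include <assert.h>

/* Relative wall-clock difference of the series vs. master, in percent. */
static double overhead_percent(double series_s, double master_s)
{
	return (series_s - master_s) / master_s * 100.0;
}
```

Plugging in the numbers gives about 0.08 % for the best runs (3.971 s vs. 3.968 s) and about 0.28 % for the worst runs (4.229 s vs. 4.217 s). The single cold run comes out roughly 6.7 % in favour of the series (7.150 s vs. 7.667 s), which is most likely noise.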


> What happens if there is only a single
> submodule to fetch?

OK, let's see. I created https://github.com/stefanbeller/test-sub-1
to play around with it. However, the difference between

    time git fetch --recurse-submodules=yes
and
    time git fetch --recurse-submodules=yes --jobs 100

seems to be lost in the noise.

So I am not sure what the question is w.r.t. having just one
submodule.


>
> ATB,
> Ramsay Jones
>
>

^ permalink raw reply	[relevance 7%]

* [PATCHv3 00/11]  Expose the submodule parallelism to the user
@ 2015-11-04  0:37  9% Stefan Beller
                     ` (9 more replies)
  0 siblings, 10 replies; 200+ results
From: Stefan Beller @ 2015-11-04  0:37 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, j6t,
	Stefan Beller

Where does it apply?
---
This series applies on top of d075d2604c0f92045caa8d5bb6ab86cf4921a4ae (Merge
branch 'rs/daemon-plug-child-leak' into sb/submodule-parallel-update) and replaces
the previous patches in sb/submodule-parallel-update.

What does it do?
---
This series should finish the ongoing effort of parallelizing
submodule network traffic. The patches contain tests for clone,
fetch and submodule update that exercise the parallelism both via the
command line and via a configured option. I decided to go with
"submodule.jobs" for all three for now.

What's new in v3?
---

 * 3 new patches (make it compile on Windows, better warnings in a POSIX environment
   for setting fds to non-blocking, drop check against NULL)
 * addressed Eric's reviews for readability. :)
 * addressed Junio's comments for the new clone helper function

Stefan Beller (11):
  run_processes_parallel: delimit intermixed task output
  run-command: report failure for degraded output just once
  run-command: omit setting file descriptors to non blocking in Windows
  submodule-config: keep update strategy around
  submodule-config: drop check against NULL
  submodule-config: remove name_and_item_from_var
  submodule-config: introduce parse_generic_submodule_config
  fetching submodules: respect `submodule.jobs` config option
  git submodule update: have a dedicated helper for cloning
  submodule update: expose parallelism to the user
  clone: allow an explicit argument for parallel submodule clones

 Documentation/config.txt        |   7 ++
 Documentation/git-clone.txt     |   6 +-
 Documentation/git-submodule.txt |   7 +-
 builtin/clone.c                 |  19 +++-
 builtin/fetch.c                 |   2 +-
 builtin/submodule--helper.c     | 239 ++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh                |  54 ++++-----
 run-command.c                   |  26 ++++-
 submodule-config.c              | 109 +++++++++++-------
 submodule-config.h              |   3 +
 submodule.c                     |   5 +
 t/t5526-fetch-submodules.sh     |  14 +++
 t/t7400-submodule-basic.sh      |   4 +-
 t/t7406-submodule-update.sh     |  27 +++++
 14 files changed, 433 insertions(+), 89 deletions(-)

-- 
2.6.1.247.ge8f2a41.dirty


* [PATCHv3 08/11] fetching submodules: respect `submodule.jobs` config option
  2015-11-04  0:37  9% [PATCHv3 00/11] Expose the submodule parallelism to the user Stefan Beller
                   ` (4 preceding siblings ...)
  2015-11-04  0:37 23% ` [PATCHv3 07/11] submodule-config: introduce parse_generic_submodule_config Stefan Beller
@ 2015-11-04  0:37 24% ` Stefan Beller
  2015-11-10 22:21  7%   ` Jens Lehmann
  2015-11-04  0:37 21% ` [PATCHv3 09/11] git submodule update: have a dedicated helper for cloning Stefan Beller
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-04  0:37 UTC
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, j6t,
	Stefan Beller

This allows configuring parallel fetching and updating without
passing the command line option.

This moves the responsibility for determining how many parallel
processes to start from builtin/fetch to submodule.c, as we need a way
to communicate "the user did not specify the number of parallel
processes on the command line" from builtin/fetch. The submodule code
takes care of the precedence (CLI > config > default).

Signed-off-by: Stefan Beller <sbeller@google.com>
---
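The precedence described above (command line > config > default) and the validation of `submodule.jobs` can be sketched in isolation. This is a minimal sketch under stated assumptions, not the code in this patch: `parse_jobs_value()` and `resolve_max_jobs()` are hypothetical names, and -1 stands for "not specified", as `max_children` does in builtin/fetch.c after this change. Unlike the patch, which calls `strtol(value, NULL, 10)` directly, the sketch also rejects a missing value (Git hands a key written without "=" to the callback as NULL, which is why the per-submodule branches call `config_error_nonbool()`) and trailing garbage.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a "submodule.jobs" value strictly.
 * Returns 0 and stores the number in *out on success, -1 if the value
 * is missing (bare "jobs" key), not a plain number, or negative.
 */
static int parse_jobs_value(const char *value, int *out)
{
	char *end;
	long n;

	assert(out != NULL);
	if (!value || !*value)
		return -1;	/* "[submodule] jobs" without "= N" */
	errno = 0;
	n = strtol(value, &end, 10);
	if (errno || *end || n < 0 || n > INT_MAX)
		return -1;	/* overflow, trailing junk, or negative */
	*out = (int)n;
	return 0;
}

/*
 * Precedence: command line > config > default of 1.
 * A negative value means "not specified" at that level; 0 is passed
 * through, leaving the choice of worker count to the parallel machinery.
 */
static int resolve_max_jobs(int cli_jobs, int config_jobs)
{
	if (cli_jobs >= 0)
		return cli_jobs;
	if (config_jobs >= 0)
		return config_jobs;
	return 1;
}
```

For example, `git fetch --jobs 9` with `submodule.jobs = 8` resolves to 9, while a plain `git fetch` resolves to 8, matching what the new t5526 test expects.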
 Documentation/config.txt    |  7 +++++++
 builtin/fetch.c             |  2 +-
 submodule-config.c          | 15 +++++++++++++++
 submodule-config.h          |  2 ++
 submodule.c                 |  5 +++++
 t/t5526-fetch-submodules.sh | 14 ++++++++++++++
 6 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 391a0c3..70e1b88 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2643,6 +2643,13 @@ submodule.<name>.ignore::
 	"--ignore-submodules" option. The 'git submodule' commands are not
 	affected by this setting.
 
+submodule.jobs::
+	This is used to determine how many submodules can be operated on in
+	parallel. Specifying a positive integer allows up to that number
+	of submodules being fetched in parallel. This is used in fetch
+	and clone operations only. A value of 0 will give some reasonable
+	configuration. It defaults to 1.
+
 tag.sort::
 	This variable controls the sort ordering of tags when displayed by
 	linkgit:git-tag[1]. Without the "--sort=<value>" option provided, the
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 9cc1c9d..60e6797 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -37,7 +37,7 @@ static int prune = -1; /* unspecified */
 static int all, append, dry_run, force, keep, multiple, update_head_ok, verbosity;
 static int progress = -1, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow;
-static int max_children = 1;
+static int max_children = -1;
 static const char *depth;
 static const char *upload_pack;
 static struct strbuf default_rla = STRBUF_INIT;
diff --git a/submodule-config.c b/submodule-config.c
index 29e21b2..475551a 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -32,6 +32,7 @@ enum lookup_type {
 
 static struct submodule_cache cache;
 static int is_cache_init;
+static int parallel_jobs = -1;
 
 static int config_path_cmp(const struct submodule_entry *a,
 			   const struct submodule_entry *b,
@@ -239,6 +240,15 @@ static int parse_generic_submodule_config(const char *key,
 					  const char *value,
 					  struct parse_config_parameter *me)
 {
+	if (!strcmp(key, "jobs")) {
+		parallel_jobs = strtol(value, NULL, 10);
+		if (parallel_jobs < 0) {
+			warning("submodule.jobs not allowed to be negative.");
+			parallel_jobs = 1;
+			return 1;
+		}
+	}
+
 	return 0;
 }
 
@@ -482,3 +492,8 @@ void submodule_free(void)
 	cache_free(&cache);
 	is_cache_init = 0;
 }
+
+int config_parallel_submodules(void)
+{
+	return parallel_jobs;
+}
diff --git a/submodule-config.h b/submodule-config.h
index f9e2a29..d9bbf9a 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -27,4 +27,6 @@ const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
 		const char *path);
 void submodule_free(void);
 
+int config_parallel_submodules(void);
+
 #endif /* SUBMODULE_CONFIG_H */
diff --git a/submodule.c b/submodule.c
index 0257ea3..188ba02 100644
--- a/submodule.c
+++ b/submodule.c
@@ -752,6 +752,11 @@ int fetch_populated_submodules(const struct argv_array *options,
 	argv_array_push(&spf.args, "--recurse-submodules-default");
 	/* default value, "--submodule-prefix" and its value are added later */
 
+	if (max_parallel_jobs < 0)
+		max_parallel_jobs = config_parallel_submodules();
+	if (max_parallel_jobs < 0)
+		max_parallel_jobs = 1;
+
 	calculate_changed_submodule_paths();
 	run_processes_parallel(max_parallel_jobs,
 			       get_next_submodule,
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 1b4ce69..5c3579c 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -470,4 +470,18 @@ test_expect_success "don't fetch submodule when newly recorded commits are alrea
 	test_i18ncmp expect.err actual.err
 '
 
+test_expect_success 'fetching submodules respects parallel settings' '
+	git config fetch.recurseSubmodules true &&
+	(
+		cd downstream &&
+		GIT_TRACE=$(pwd)/trace.out git fetch --jobs 7 &&
+		grep "7 children" trace.out &&
+		git config submodule.jobs 8 &&
+		GIT_TRACE=$(pwd)/trace.out git fetch &&
+		grep "8 children" trace.out &&
+		GIT_TRACE=$(pwd)/trace.out git fetch --jobs 9 &&
+		grep "9 children" trace.out
+	)
+'
+
 test_done
-- 
2.6.1.247.ge8f2a41.dirty


* [PATCHv3 07/11] submodule-config: introduce parse_generic_submodule_config
  2015-11-04  0:37  9% [PATCHv3 00/11] Expose the submodule parallelism to the user Stefan Beller
                   ` (3 preceding siblings ...)
  2015-11-04  0:37 24% ` [PATCHv3 06/11] submodule-config: remove name_and_item_from_var Stefan Beller
@ 2015-11-04  0:37 23% ` Stefan Beller
  2015-11-04  0:37 24% ` [PATCHv3 08/11] fetching submodules: respect `submodule.jobs` config option Stefan Beller
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-04  0:37 UTC
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, j6t,
	Stefan Beller

This rewrites parse_config to distinguish between configs specific to
one submodule and configs which apply generically to all submodules.
We do not have generic submodule configs yet, but the next patch will
introduce "submodule.jobs".

Signed-off-by: Stefan Beller <sbeller@google.com>
---
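The dispatch in the new parse_config() hinges on whether a "submodule.*" variable has a subsection. A minimal sketch of that split, assuming the usual Git rule that the section ends at the first dot and the key starts after the last dot, so a submodule name may itself contain dots. `classify_submodule_key()` and the enum are hypothetical stand-ins for the `parse_config_key()` call plus the `!subsection_len` test; they are not Git API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Which handler a "submodule.*" variable would be routed to. */
enum submodule_key_kind {
	NOT_SUBMODULE_KEY,
	GENERIC_KEY,	/* e.g. submodule.jobs */
	SPECIFIC_KEY	/* e.g. submodule.<name>.url */
};

/*
 * Hypothetical stand-in for parse_config_key(var, "submodule", ...):
 * whatever lies between the first and the last dot is the subsection.
 * Config callbacks see lowercased section and key names, so a
 * case-sensitive comparison is enough here.
 */
static enum submodule_key_kind classify_submodule_key(const char *var,
		const char **subsection, size_t *subsection_len,
		const char **key)
{
	static const char section[] = "submodule.";
	const size_t section_len = sizeof(section) - 1;
	const char *last_dot;

	assert(var && subsection && subsection_len && key);
	if (strncmp(var, section, section_len))
		return NOT_SUBMODULE_KEY;
	last_dot = strrchr(var, '.');	/* never NULL: var has a dot */
	*key = last_dot + 1;
	if (last_dot == var + section_len - 1) {
		/* no subsection: "submodule.<key>" */
		*subsection = NULL;
		*subsection_len = 0;
		return GENERIC_KEY;
	}
	*subsection = var + section_len;
	*subsection_len = (size_t)(last_dot - *subsection);
	return SPECIFIC_KEY;
}
```

With this split, `submodule.jobs` goes to the generic handler, while `submodule.lib/foo.url` yields the subsection "lib/foo" and the key "url" for the specific handler. This also shows why `[submodule] jobs` does not collide with `[submodule "name"]` sections.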
 submodule-config.c | 41 ++++++++++++++++++++++++++++++++---------
 1 file changed, 32 insertions(+), 9 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index b826841..29e21b2 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -234,17 +234,22 @@ struct parse_config_parameter {
 	int overwrite;
 };
 
-static int parse_config(const char *var, const char *value, void *data)
+static int parse_generic_submodule_config(const char *key,
+					  const char *var,
+					  const char *value,
+					  struct parse_config_parameter *me)
 {
-	struct parse_config_parameter *me = data;
-	struct submodule *submodule;
-	int subsection_len, ret = 0;
-	const char *subsection, *key;
-
-	if (parse_config_key(var, "submodule", &subsection,
-			     &subsection_len, &key) < 0 || !subsection_len)
-		return 0;
+	return 0;
+}
 
+static int parse_specific_submodule_config(const char *subsection, int subsection_len,
+					   const char *key,
+					   const char *var,
+					   const char *value,
+					   struct parse_config_parameter *me)
+{
+	int ret = 0;
+	struct submodule *submodule;
 	submodule = lookup_or_create_by_name(me->cache,
 					     me->gitmodules_sha1,
 					     subsection, subsection_len);
@@ -314,6 +319,24 @@ static int parse_config(const char *var, const char *value, void *data)
 	return ret;
 }
 
+static int parse_config(const char *var, const char *value, void *data)
+{
+	struct parse_config_parameter *me = data;
+	int subsection_len;
+	const char *subsection, *key;
+
+	if (parse_config_key(var, "submodule", &subsection,
+			     &subsection_len, &key) < 0)
+		return 0;
+
+	if (!subsection_len)
+		return parse_generic_submodule_config(key, var, value, me);
+	else
+		return parse_specific_submodule_config(subsection,
+						       subsection_len, key,
+						       var, value, me);
+}
+
 static int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
 				      unsigned char *gitmodules_sha1)
 {
-- 
2.6.1.247.ge8f2a41.dirty


* [PATCHv3 06/11] submodule-config: remove name_and_item_from_var
  2015-11-04  0:37  9% [PATCHv3 00/11] Expose the submodule parallelism to the user Stefan Beller
                   ` (2 preceding siblings ...)
  2015-11-04  0:37 24% ` [PATCHv3 05/11] submodule-config: drop check against NULL Stefan Beller
@ 2015-11-04  0:37 24% ` Stefan Beller
  2015-11-04  0:37 23% ` [PATCHv3 07/11] submodule-config: introduce parse_generic_submodule_config Stefan Beller
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-04  0:37 UTC
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, j6t,
	Stefan Beller

`name_and_item_from_var` does not provide the abstraction that a
later patch in this series needs here.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
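After this patch, lookup_or_create_by_name() receives the name as a pointer/length pair straight from parse_config_key(), makes a temporary NUL-terminated copy with xmemdupz(), and releases it through a single `out:` label on both the cache-hit and the create paths. A minimal sketch of that pattern, assuming a hypothetical fixed-size cache of plain strings in place of Git's hashmap-backed submodule cache; `memdupz()` approximates xmemdupz(), which dies rather than returning NULL on allocation failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Approximation of Git's xmemdupz(): copy len bytes and NUL-terminate. */
static char *memdupz(const char *ptr, size_t len)
{
	char *p = malloc(len + 1);
	if (!p)
		abort();	/* xmemdupz() dies on allocation failure */
	memcpy(p, ptr, len);
	p[len] = '\0';
	return p;
}

/* Hypothetical, fixed-size cache of submodule names. */
struct name_cache {
	char *names[16];
	int nr;
};

static const char *lookup_or_create(struct name_cache *c,
				    const char *name_ptr, size_t name_len)
{
	const char *found = NULL;
	char *name = memdupz(name_ptr, name_len);	/* temporary copy */
	int i;

	for (i = 0; i < c->nr; i++)
		if (!strcmp(c->names[i], name)) {
			found = c->names[i];
			goto out;	/* cache hit: skip creation */
		}

	assert(c->nr < 16);
	c->names[c->nr] = memdupz(name, name_len);	/* cache owns its copy */
	found = c->names[c->nr++];
out:
	free(name);	/* released on every path */
	return found;
}
```

Passing `var + 10` with length 3 for `"submodule.lib.url"` creates an entry for "lib"; a second lookup of the same name returns the existing entry without growing the cache.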
 submodule-config.c | 48 ++++++++++++++++--------------------------------
 1 file changed, 16 insertions(+), 32 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 6d01941..b826841 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -161,31 +161,17 @@ static struct submodule *cache_lookup_name(struct submodule_cache *cache,
 	return NULL;
 }
 
-static int name_and_item_from_var(const char *var, struct strbuf *name,
-				  struct strbuf *item)
-{
-	const char *subsection, *key;
-	int subsection_len, parse;
-	parse = parse_config_key(var, "submodule", &subsection,
-			&subsection_len, &key);
-	if (parse < 0 || !subsection)
-		return 0;
-
-	strbuf_add(name, subsection, subsection_len);
-	strbuf_addstr(item, key);
-
-	return 1;
-}
-
 static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
-		const unsigned char *gitmodules_sha1, const char *name)
+						  const unsigned char *gitmodules_sha1,
+						  const char *name_ptr, int name_len)
 {
 	struct submodule *submodule;
 	struct strbuf name_buf = STRBUF_INIT;
+	char *name = xmemdupz(name_ptr, name_len);
 
 	submodule = cache_lookup_name(cache, gitmodules_sha1, name);
 	if (submodule)
-		return submodule;
+		goto out;
 
 	submodule = xmalloc(sizeof(*submodule));
 
@@ -201,7 +187,8 @@ static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
 	hashcpy(submodule->gitmodules_sha1, gitmodules_sha1);
 
 	cache_add(cache, submodule);
-
+out:
+	free(name);
 	return submodule;
 }
 
@@ -251,18 +238,18 @@ static int parse_config(const char *var, const char *value, void *data)
 {
 	struct parse_config_parameter *me = data;
 	struct submodule *submodule;
-	struct strbuf name = STRBUF_INIT, item = STRBUF_INIT;
-	int ret = 0;
+	int subsection_len, ret = 0;
+	const char *subsection, *key;
 
-	/* this also ensures that we only parse submodule entries */
-	if (!name_and_item_from_var(var, &name, &item))
+	if (parse_config_key(var, "submodule", &subsection,
+			     &subsection_len, &key) < 0 || !subsection_len)
 		return 0;
 
 	submodule = lookup_or_create_by_name(me->cache,
 					     me->gitmodules_sha1,
-					     name.buf);
+					     subsection, subsection_len);
 
-	if (!strcmp(item.buf, "path")) {
+	if (!strcmp(key, "path")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->path)
@@ -275,7 +262,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			submodule->path = xstrdup(value);
 			cache_put_path(me->cache, submodule);
 		}
-	} else if (!strcmp(item.buf, "fetchrecursesubmodules")) {
+	} else if (!strcmp(key, "fetchrecursesubmodules")) {
 		/* when parsing worktree configurations we can die early */
 		int die_on_error = is_null_sha1(me->gitmodules_sha1);
 		if (!me->overwrite &&
@@ -286,7 +273,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			submodule->fetch_recurse = parse_fetch_recurse(
 								var, value,
 								die_on_error);
-	} else if (!strcmp(item.buf, "ignore")) {
+	} else if (!strcmp(key, "ignore")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->ignore)
@@ -302,7 +289,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->ignore);
 			submodule->ignore = xstrdup(value);
 		}
-	} else if (!strcmp(item.buf, "url")) {
+	} else if (!strcmp(key, "url")) {
 		if (!value) {
 			ret = config_error_nonbool(var);
 		} else if (!me->overwrite && submodule->url) {
@@ -312,7 +299,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->url);
 			submodule->url = xstrdup(value);
 		}
-	} else if (!strcmp(item.buf, "update")) {
+	} else if (!strcmp(key, "update")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->update)
@@ -324,9 +311,6 @@ static int parse_config(const char *var, const char *value, void *data)
 		}
 	}
 
-	strbuf_release(&name);
-	strbuf_release(&item);
-
 	return ret;
 }
 
-- 
2.6.1.247.ge8f2a41.dirty


* [PATCHv3 10/11] submodule update: expose parallelism to the user
  2015-11-04  0:37  9% [PATCHv3 00/11] Expose the submodule parallelism to the user Stefan Beller
                   ` (6 preceding siblings ...)
  2015-11-04  0:37 21% ` [PATCHv3 09/11] git submodule update: have a dedicated helper for cloning Stefan Beller
@ 2015-11-04  0:37 23% ` Stefan Beller
  2015-11-04  0:37 24% ` [PATCHv3 11/11] clone: allow an explicit argument for parallel submodule clones Stefan Beller
  2015-11-04 17:54  6% ` [PATCHv3 00/11] Expose the submodule parallelism to the user Junio C Hamano
  9 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-04  0:37 UTC
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, j6t,
	Stefan Beller

Expose possible parallelism either via the "--jobs" CLI parameter or
the "submodule.jobs" setting.

By initializing the variable to -1, we make sure that 0 can still be
passed into the parallel processing machinery, which then picks as
many parallel workers as there are CPUs.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
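The commit message above distinguishes -1 ("not specified", resolved to config or default by the caller) from 0 ("as many workers as there are CPUs"). A minimal sketch of the second step, assuming the mapping happens inside the parallel machinery. Git has its own online_cpus() helper. `sysconf(_SC_NPROCESSORS_ONLN)` here is a widely available but non-POSIX stand-in, guarded so that the sketch falls back to 1 where the name is missing. `effective_workers()` is a hypothetical name.

```c
#include <assert.h>
#include <unistd.h>

/* Stand-in for Git's online_cpus(); never returns less than 1. */
static int online_cpus_sketch(void)
{
#ifdef _SC_NPROCESSORS_ONLN
	long n = sysconf(_SC_NPROCESSORS_ONLN);
	return n > 0 ? (int)n : 1;
#else
	return 1;
#endif
}

/*
 * max_jobs has already been resolved from --jobs, submodule.jobs or the
 * default of 1, so it is never negative here; 0 means "one worker per CPU".
 */
static int effective_workers(int max_jobs)
{
	assert(max_jobs >= 0);
	return max_jobs == 0 ? online_cpus_sketch() : max_jobs;
}
```

So `git submodule update --jobs 0` would run one clone per online CPU, while an explicit positive value is used as-is.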
 Documentation/git-submodule.txt |  7 ++++++-
 builtin/submodule--helper.c     | 18 ++++++++++++++----
 git-submodule.sh                |  9 +++++++++
 t/t7406-submodule-update.sh     | 12 ++++++++++++
 4 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index f17687e..c70fafd 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -16,7 +16,7 @@ SYNOPSIS
 'git submodule' [--quiet] deinit [-f|--force] [--] <path>...
 'git submodule' [--quiet] update [--init] [--remote] [-N|--no-fetch]
 	      [-f|--force] [--rebase|--merge] [--reference <repository>]
-	      [--depth <depth>] [--recursive] [--] [<path>...]
+	      [--depth <depth>] [--recursive] [--jobs <n>] [--] [<path>...]
 'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) <n>]
 	      [commit] [--] [<path>...]
 'git submodule' [--quiet] foreach [--recursive] <command>
@@ -374,6 +374,11 @@ for linkgit:git-clone[1]'s `--reference` and `--shared` options carefully.
 	clone with a history truncated to the specified number of revisions.
 	See linkgit:git-clone[1]
 
+-j <n>::
+--jobs <n>::
+	This option is only valid for the update command.
+	Clone new submodules in parallel with as many jobs.
+	Defaults to the `submodule.jobs` option.
 
 <path>...::
 	Paths to submodule(s). When specified this will restrict the command
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 95b45a2..662d329 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -426,6 +426,7 @@ static int update_clone_task_finished(int result,
 
 static int update_clone(int argc, const char **argv, const char *prefix)
 {
+	int max_jobs = -1;
 	struct string_list_item *item;
 	struct submodule_update_clone pp = SUBMODULE_UPDATE_CLONE_INIT;
 
@@ -446,6 +447,8 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 		OPT_STRING(0, "depth", &pp.depth, "<depth>",
 			   N_("Create a shallow clone truncated to the "
 			      "specified number of revisions")),
+		OPT_INTEGER('j', "jobs", &max_jobs,
+			    N_("parallel jobs")),
 		OPT__QUIET(&pp.quiet, N_("do't print cloning progress")),
 		OPT_END()
 	};
@@ -467,10 +470,17 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 	gitmodules_config();
 	/* Overlay the parsed .gitmodules file with .git/config */
 	git_config(git_submodule_config, NULL);
-	run_processes_parallel(1, update_clone_get_next_task,
-				  update_clone_start_failure,
-				  update_clone_task_finished,
-				  &pp);
+
+	if (max_jobs < 0)
+		max_jobs = config_parallel_submodules();
+	if (max_jobs < 0)
+		max_jobs = 1;
+
+	run_processes_parallel(max_jobs,
+			       update_clone_get_next_task,
+			       update_clone_start_failure,
+			       update_clone_task_finished,
+			       &pp);
 
 	if (pp.print_unmatched) {
 		printf("#unmatched\n");
diff --git a/git-submodule.sh b/git-submodule.sh
index 9f554fb..10c5af9 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -645,6 +645,14 @@ cmd_update()
 		--depth=*)
 			depth=$1
 			;;
+		-j|--jobs)
+			case "$2" in '') usage ;; esac
+			jobs="--jobs=$2"
+			shift
+			;;
+		--jobs=*)
+			jobs=$1
+			;;
 		--)
 			shift
 			break
@@ -670,6 +678,7 @@ cmd_update()
 		${update:+--update "$update"} \
 		${reference:+--reference "$reference"} \
 		${depth:+--depth "$depth"} \
+		${jobs:+$jobs} \
 		"$@" | {
 	err=
 	while read mode sha1 stage just_cloned sm_path
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index dda3929..05ea66f 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -774,4 +774,16 @@ test_expect_success 'submodule update --recursive drops module name before recur
 	 test_i18ngrep "Submodule path .deeper/submodule/subsubmodule.: checked out" actual
 	)
 '
+
+test_expect_success 'submodule update can be run in parallel' '
+	(cd super2 &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update --jobs 7 &&
+	 grep "7 children" trace.out &&
+	 git config submodule.jobs 8 &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update &&
+	 grep "8 children" trace.out &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update --jobs 9 &&
+	 grep "9 children" trace.out
+	)
+'
 test_done
-- 
2.6.1.247.ge8f2a41.dirty


* [PATCHv3 11/11] clone: allow an explicit argument for parallel submodule clones
  2015-11-04  0:37  9% [PATCHv3 00/11] Expose the submodule parallelism to the user Stefan Beller
                   ` (7 preceding siblings ...)
  2015-11-04  0:37 23% ` [PATCHv3 10/11] submodule update: expose parallelism to the user Stefan Beller
@ 2015-11-04  0:37 24% ` Stefan Beller
  2015-11-04 17:54  6% ` [PATCHv3 00/11] Expose the submodule parallelism to the user Junio C Hamano
  9 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-04  0:37 UTC
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, j6t,
	Stefan Beller

Just pass the value of --jobs along to "git submodule update", which
may pick reasonable defaults if you don't specify an explicit number.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
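The key point of this patch is that clone forwards `--jobs` only when the user actually gave one (`max_jobs != -1`), so that otherwise the child "git submodule update" resolves `submodule.jobs` or its default itself. A minimal sketch of that conditional argument building, assuming a plain fixed-size `argv` array and caller-owned buffer instead of Git's argv_array. `build_submodule_update_argv()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Fill argv with "submodule update --init --recursive [--jobs=N]".
 * jobs_buf must outlive argv because argv[4] may point into it.
 * Returns the number of arguments; argv[argc] is set to NULL, as
 * run_command_v_opt() expects a NULL-terminated array.
 */
static int build_submodule_update_argv(const char **argv, size_t argv_size,
				       char *jobs_buf, size_t jobs_buf_size,
				       int max_jobs)
{
	static const char *base[] = {
		"submodule", "update", "--init", "--recursive"
	};
	int argc = 0;
	size_t i;

	assert(argv_size >= 6);	/* 4 base args + --jobs + NULL */
	for (i = 0; i < sizeof(base) / sizeof(*base); i++)
		argv[argc++] = base[i];
	if (max_jobs != -1) {	/* -1: user did not pass --jobs */
		snprintf(jobs_buf, jobs_buf_size, "--jobs=%d", max_jobs);
		argv[argc++] = jobs_buf;
	}
	argv[argc] = NULL;
	return argc;
}
```

Without `--jobs`, clone runs exactly the command the removed `argv_submodule` array used to hold. With `--jobs 7`, it appends `--jobs=7`.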
 Documentation/git-clone.txt |  6 +++++-
 builtin/clone.c             | 19 +++++++++++++------
 t/t7406-submodule-update.sh | 15 +++++++++++++++
 3 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index f1f2a3f..01bd6b7 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -14,7 +14,7 @@ SYNOPSIS
 	  [-o <name>] [-b <name>] [-u <upload-pack>] [--reference <repository>]
 	  [--dissociate] [--separate-git-dir <git dir>]
 	  [--depth <depth>] [--[no-]single-branch]
-	  [--recursive | --recurse-submodules] [--] <repository>
+	  [--recursive | --recurse-submodules] [--jobs <n>] [--] <repository>
 	  [<directory>]
 
 DESCRIPTION
@@ -216,6 +216,10 @@ objects from the source repository into a pack in the cloned repository.
 	The result is Git repository can be separated from working
 	tree.
 
+-j <n>::
+--jobs <n>::
+	The number of submodules fetched at the same time.
+	Defaults to the `submodule.jobs` option.
 
 <repository>::
 	The (possibly remote) repository to clone from.  See the
diff --git a/builtin/clone.c b/builtin/clone.c
index 9eaecd9..ce578d2 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -50,6 +50,7 @@ static int option_progress = -1;
 static struct string_list option_config;
 static struct string_list option_reference;
 static int option_dissociate;
+static int max_jobs = -1;
 
 static struct option builtin_clone_options[] = {
 	OPT__VERBOSITY(&option_verbosity),
@@ -72,6 +73,8 @@ static struct option builtin_clone_options[] = {
 		    N_("initialize submodules in the clone")),
 	OPT_BOOL(0, "recurse-submodules", &option_recursive,
 		    N_("initialize submodules in the clone")),
+	OPT_INTEGER('j', "jobs", &max_jobs,
+		    N_("number of submodules cloned in parallel")),
 	OPT_STRING(0, "template", &option_template, N_("template-directory"),
 		   N_("directory from which templates will be used")),
 	OPT_STRING_LIST(0, "reference", &option_reference, N_("repo"),
@@ -95,10 +98,6 @@ static struct option builtin_clone_options[] = {
 	OPT_END()
 };
 
-static const char *argv_submodule[] = {
-	"submodule", "update", "--init", "--recursive", NULL
-};
-
 static const char *get_repo_path_1(struct strbuf *path, int *is_bundle)
 {
 	static char *suffix[] = { "/.git", "", ".git/.git", ".git" };
@@ -724,8 +723,16 @@ static int checkout(void)
 	err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
 			   sha1_to_hex(sha1), "1", NULL);
 
-	if (!err && option_recursive)
-		err = run_command_v_opt(argv_submodule, RUN_GIT_CMD);
+	if (!err && option_recursive) {
+		struct argv_array args = ARGV_ARRAY_INIT;
+		argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
+
+		if (max_jobs != -1)
+			argv_array_pushf(&args, "--jobs=%d", max_jobs);
+
+		err = run_command_v_opt(args.argv, RUN_GIT_CMD);
+		argv_array_clear(&args);
+	}
 
 	return err;
 }
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 05ea66f..ade0524 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -786,4 +786,19 @@ test_expect_success 'submodule update can be run in parallel' '
 	 grep "9 children" trace.out
 	)
 '
+
+test_expect_success 'git clone passes the parallel jobs config on to submodules' '
+	test_when_finished "rm -rf super4" &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules --jobs 7 . super4 &&
+	grep "7 children" trace.out &&
+	rm -rf super4 &&
+	git config --global submodule.jobs 8 &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules . super4 &&
+	grep "8 children" trace.out &&
+	rm -rf super4 &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules --jobs 9 . super4 &&
+	grep "9 children" trace.out &&
+	rm -rf super4
+'
+
 test_done
-- 
2.6.1.247.ge8f2a41.dirty


* [PATCHv3 09/11] git submodule update: have a dedicated helper for cloning
  2015-11-04  0:37  9% [PATCHv3 00/11] Expose the submodule parallelism to the user Stefan Beller
                   ` (5 preceding siblings ...)
  2015-11-04  0:37 24% ` [PATCHv3 08/11] fetching submodules: respect `submodule.jobs` config option Stefan Beller
@ 2015-11-04  0:37 21% ` Stefan Beller
  2015-11-04  0:37 23% ` [PATCHv3 10/11] submodule update: expose parallelism to the user Stefan Beller
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-04  0:37 UTC
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, j6t,
	Stefan Beller

This introduces a new helper function in git submodule--helper
which takes care of cloning all submodules, a step we want to
parallelize eventually.

Some checks (such as an empty URL or update_mode=none) are needed in
the helper to decide whether to clone. These checks have been moved
into the C function as well (no need to repeat them in the shell
script).

As we can only access the stderr channel from within the parallel
processing engine, we need to reroute the error message for
specified but uninitialized submodules to stderr. As it is an error
message, it should have gone to stderr in the first place, so this
is a bug fix along the way.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
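The new update-clone helper talks to git-submodule.sh through one text line per submodule on stdout, printed with the format `"%06o %s %d %d\t%s\n"` and read back by `while read mode sha1 stage just_cloned sm_path`. A minimal round-trip sketch of that line format, assuming a 40-hex-digit SHA-1 and a path without newlines. `struct project_line`, `format_project_line()` and `parse_project_line()` are hypothetical names, and the parser only approximates the shell's `read`, which puts the rest of the line, spaces included, into the last variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record mirroring one line printed by "update-clone". */
struct project_line {
	unsigned int mode;	/* e.g. 0160000 for a gitlink */
	char sha1_hex[41];
	int stage;
	int just_cloned;
	char path[256];
};

/* Producer side: same format string as update_clone_get_next_task(). */
static int format_project_line(char *buf, size_t size,
			       const struct project_line *pl)
{
	return snprintf(buf, size, "%06o %s %d %d\t%s\n", pl->mode,
			pl->sha1_hex, pl->stage, pl->just_cloned, pl->path);
}

/* Consumer side: split the fields the way the shell loop reads them. */
static int parse_project_line(const char *line, struct project_line *pl)
{
	int n = sscanf(line, "%o %40s %d %d\t%255[^\n]", &pl->mode,
		       pl->sha1_hex, &pl->stage, &pl->just_cloned, pl->path);
	return n == 5 ? 0 : -1;
}
```

The extra `just_cloned` column is what lets the shell side know which submodules the helper cloned, now that the cloning no longer happens inside the shell loop.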
 builtin/submodule--helper.c | 229 ++++++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh            |  45 +++------
 t/t7400-submodule-basic.sh  |   4 +-
 3 files changed, 242 insertions(+), 36 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index f4c3eff..95b45a2 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -255,6 +255,234 @@ static int module_clone(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
+static int git_submodule_config(const char *var, const char *value, void *cb)
+{
+	return parse_submodule_config_option(var, value);
+}
+
+struct submodule_update_clone {
+	/* states */
+	int count;
+	int print_unmatched;
+	/* configuration */
+	int quiet;
+	const char *reference;
+	const char *depth;
+	const char *update;
+	const char *recursive_prefix;
+	const char *prefix;
+	struct module_list list;
+	struct string_list projectlines;
+	struct pathspec pathspec;
+};
+#define SUBMODULE_UPDATE_CLONE_INIT {0, 0, 0, NULL, NULL, NULL, NULL, NULL, MODULE_LIST_INIT, STRING_LIST_INIT_DUP}
+
+static void fill_clone_command(struct child_process *cp, int quiet,
+			       const char *prefix, const char *path,
+			       const char *name, const char *url,
+			       const char *reference, const char *depth)
+{
+	cp->git_cmd = 1;
+	cp->no_stdin = 1;
+	cp->stdout_to_stderr = 1;
+	cp->err = -1;
+	argv_array_push(&cp->args, "submodule--helper");
+	argv_array_push(&cp->args, "clone");
+	if (quiet)
+		argv_array_push(&cp->args, "--quiet");
+
+	if (prefix)
+		argv_array_pushl(&cp->args, "--prefix", prefix, NULL);
+
+	argv_array_pushl(&cp->args, "--path", path, NULL);
+	argv_array_pushl(&cp->args, "--name", name, NULL);
+	argv_array_pushl(&cp->args, "--url", url, NULL);
+	if (reference)
+		argv_array_push(&cp->args, reference);
+	if (depth)
+		argv_array_push(&cp->args, depth);
+}
+
+static int update_clone_get_next_task(void **pp_task_cb,
+				      struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	for (; pp->count < pp->list.nr; pp->count++) {
+		const struct submodule *sub = NULL;
+		const char *displaypath = NULL;
+		const struct cache_entry *ce = pp->list.entries[pp->count];
+		struct strbuf sb = STRBUF_INIT;
+		const char *update_module = NULL;
+		char *url = NULL;
+		int needs_cloning = 0;
+
+		if (ce_stage(ce)) {
+			if (pp->recursive_prefix)
+				strbuf_addf(err, "Skipping unmerged submodule %s/%s\n",
+					pp->recursive_prefix, ce->name);
+			else
+				strbuf_addf(err, "Skipping unmerged submodule %s\n",
+					ce->name);
+			continue;
+		}
+
+		sub = submodule_from_path(null_sha1, ce->name);
+		if (!sub) {
+			strbuf_addf(err, "BUG: internal error managing submodules. "
+				    "The cache could not locate '%s'", ce->name);
+			pp->print_unmatched = 1;
+			continue;
+		}
+
+		if (pp->recursive_prefix)
+			displaypath = relative_path(pp->recursive_prefix, ce->name, &sb);
+		else
+			displaypath = ce->name;
+
+		if (pp->update)
+			update_module = pp->update;
+		if (!update_module)
+			update_module = sub->update;
+		if (!update_module)
+			update_module = "checkout";
+		if (!strcmp(update_module, "none")) {
+			strbuf_addf(err, "Skipping submodule '%s'\n", displaypath);
+			continue;
+		}
+
+		/*
+		 * Looking up the url in .git/config.
+		 * We must not fall back to .gitmodules as we only want to process
+		 * configured submodules.
+		 */
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "submodule.%s.url", sub->name);
+		git_config_get_string(sb.buf, &url);
+		if (!url) {
+			/*
+			 * Only mention uninitialized submodules when its
+			 * path have been specified
+			 */
+			if (pp->pathspec.nr)
+				strbuf_addf(err, _("Submodule path '%s' not initialized\n"
+					"Maybe you want to use 'update --init'?"), displaypath);
+			continue;
+		}
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%s/.git", ce->name);
+		needs_cloning = !file_exists(sb.buf);
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%06o %s %d %d\t%s\n", ce->ce_mode,
+				sha1_to_hex(ce->sha1), ce_stage(ce),
+				needs_cloning, ce->name);
+		string_list_append(&pp->projectlines, sb.buf);
+
+		if (needs_cloning) {
+			fill_clone_command(cp, pp->quiet, pp->prefix, ce->name,
+					   sub->name, url, pp->reference, pp->depth);
+			pp->count++;
+			free(url);
+			return 1;
+		} else
+			free(url);
+	}
+	return 0;
+}
+
+static int update_clone_start_failure(struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb,
+				      void *pp_task_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	strbuf_addf(err, "error when starting a child process");
+	pp->print_unmatched = 1;
+
+	return 1;
+}
+
+static int update_clone_task_finished(int result,
+				      struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb,
+				      void *pp_task_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	if (!result) {
+		return 0;
+	} else {
+		strbuf_addf(err, "error in one child process");
+		pp->print_unmatched = 1;
+		return 1;
+	}
+}
+
+static int update_clone(int argc, const char **argv, const char *prefix)
+{
+	struct string_list_item *item;
+	struct submodule_update_clone pp = SUBMODULE_UPDATE_CLONE_INIT;
+
+	struct option module_list_options[] = {
+		OPT_STRING(0, "prefix", &prefix,
+			   N_("path"),
+			   N_("path into the working tree")),
+		OPT_STRING(0, "recursive_prefix", &pp.recursive_prefix,
+			   N_("path"),
+			   N_("path into the working tree, across nested "
+			      "submodule boundaries")),
+		OPT_STRING(0, "update", &pp.update,
+			   N_("string"),
+			   N_("update command for submodules")),
+		OPT_STRING(0, "reference", &pp.reference, "<repository>",
+			   N_("Use the local reference repository "
+			      "instead of a full clone")),
+		OPT_STRING(0, "depth", &pp.depth, "<depth>",
+			   N_("Create a shallow clone truncated to the "
+			      "specified number of revisions")),
+		OPT__QUIET(&pp.quiet, N_("do't print cloning progress")),
+		OPT_END()
+	};
+
+	const char *const git_submodule_helper_usage[] = {
+		N_("git submodule--helper list [--prefix=<path>] [<path>...]"),
+		NULL
+	};
+	pp.prefix = prefix;
+
+	argc = parse_options(argc, argv, prefix, module_list_options,
+			     git_submodule_helper_usage, 0);
+
+	if (module_list_compute(argc, argv, prefix, &pp.pathspec, &pp.list) < 0) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	gitmodules_config();
+	/* Overlay the parsed .gitmodules file with .git/config */
+	git_config(git_submodule_config, NULL);
+	run_processes_parallel(1, update_clone_get_next_task,
+				  update_clone_start_failure,
+				  update_clone_task_finished,
+				  &pp);
+
+	if (pp.print_unmatched) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	for_each_string_list_item(item, &pp.projectlines) {
+		utf8_fprintf(stdout, "%s", item->string);
+	}
+	return 0;
+}
+
 struct cmd_struct {
 	const char *cmd;
 	int (*fn)(int, const char **, const char *);
@@ -264,6 +492,7 @@ static struct cmd_struct commands[] = {
 	{"list", module_list},
 	{"name", module_name},
 	{"clone", module_clone},
+	{"update-clone", update_clone}
 };
 
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
diff --git a/git-submodule.sh b/git-submodule.sh
index 9bc5c5f..9f554fb 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -664,17 +664,18 @@ cmd_update()
 		cmd_init "--" "$@" || return
 	fi
 
-	cloned_modules=
-	git submodule--helper list --prefix "$wt_prefix" "$@" | {
+	git submodule--helper update-clone ${GIT_QUIET:+--quiet} \
+		${wt_prefix:+--prefix "$wt_prefix"} \
+		${prefix:+--recursive_prefix "$prefix"} \
+		${update:+--update "$update"} \
+		${reference:+--reference "$reference"} \
+		${depth:+--depth "$depth"} \
+		"$@" | {
 	err=
-	while read mode sha1 stage sm_path
+	while read mode sha1 stage just_cloned sm_path
 	do
 		die_if_unmatched "$mode"
-		if test "$stage" = U
-		then
-			echo >&2 "Skipping unmerged submodule $prefix$sm_path"
-			continue
-		fi
+
 		name=$(git submodule--helper name "$sm_path") || exit
 		url=$(git config submodule."$name".url)
 		branch=$(get_submodule_config "$name" branch master)
@@ -691,27 +692,10 @@ cmd_update()
 
 		displaypath=$(relative_path "$prefix$sm_path")
 
-		if test "$update_module" = "none"
-		then
-			echo "Skipping submodule '$displaypath'"
-			continue
-		fi
-
-		if test -z "$url"
-		then
-			# Only mention uninitialized submodules when its
-			# path have been specified
-			test "$#" != "0" &&
-			say "$(eval_gettext "Submodule path '\$displaypath' not initialized
-Maybe you want to use 'update --init'?")"
-			continue
-		fi
-
-		if ! test -d "$sm_path"/.git && ! test -f "$sm_path"/.git
+		if test $just_cloned -eq 1
 		then
-			git submodule--helper clone ${GIT_QUIET:+--quiet} --prefix "$prefix" --path "$sm_path" --name "$name" --url "$url" "$reference" "$depth" || exit
-			cloned_modules="$cloned_modules;$name"
 			subsha1=
+			update_module=checkout
 		else
 			subsha1=$(clear_local_git_env; cd "$sm_path" &&
 				git rev-parse --verify HEAD) ||
@@ -751,13 +735,6 @@ Maybe you want to use 'update --init'?")"
 				die "$(eval_gettext "Unable to fetch in submodule path '\$displaypath'")"
 			fi
 
-			# Is this something we just cloned?
-			case ";$cloned_modules;" in
-			*";$name;"*)
-				# then there is no local change to integrate
-				update_module=checkout ;;
-			esac
-
 			must_die_on_failure=
 			case "$update_module" in
 			checkout)
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index 540771c..5991e3c 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -462,7 +462,7 @@ test_expect_success 'update --init' '
 	git config --remove-section submodule.example &&
 	test_must_fail git config submodule.example.url &&
 
-	git submodule update init > update.out &&
+	git submodule update init 2> update.out &&
 	cat update.out &&
 	test_i18ngrep "not initialized" update.out &&
 	test_must_fail git rev-parse --resolve-git-dir init/.git &&
@@ -480,7 +480,7 @@ test_expect_success 'update --init from subdirectory' '
 	mkdir -p sub &&
 	(
 		cd sub &&
-		git submodule update ../init >update.out &&
+		git submodule update ../init 2>update.out &&
 		cat update.out &&
 		test_i18ngrep "not initialized" update.out &&
 		test_must_fail git rev-parse --resolve-git-dir ../init/.git &&
-- 
2.6.1.247.ge8f2a41.dirty
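
The update-clone helper above passes three callbacks (get_next_task, start_failure, task_finished) and a shared `pp` pointer to run_processes_parallel(), with the job count fixed at 1. The following is a minimal, self-contained sketch of that callback style, not Git's run-command implementation: every type and function name in it is hypothetical, and "running" a task just looks up a simulated exit status. In this sketch, a nonzero return from the "finished" callback stops further tasks from being handed out and records the failure in shared state, much as print_unmatched does in the patch.

```c
#include <assert.h>

/*
 * Hypothetical, simplified model of a callback-driven task runner with
 * jobs == 1. Not Git's run-command API; names are illustrative only.
 */
struct task_state {
	int next;            /* index of the next task to hand out */
	int nr_tasks;        /* total number of tasks */
	const int *results;  /* simulated exit status of each task */
	int print_unmatched; /* set by the "finished" callback on failure */
};

typedef int (*get_next_task_fn)(void *cb, int *task);
typedef int (*run_task_fn)(int task, void *cb);
typedef int (*task_finished_fn)(int result, void *cb);

/* Hands out task indices; returns 0 once no work is left. */
static int demo_get_next_task(void *cb, int *task)
{
	struct task_state *st = cb;

	if (st->next >= st->nr_tasks)
		return 0;
	*task = st->next++;
	return 1;
}

/* "Runs" a task by looking up its simulated exit status. */
static int demo_run_task(int task, void *cb)
{
	struct task_state *st = cb;

	assert(task >= 0 && task < st->nr_tasks);
	return st->results[task];
}

/* Modeled on update_clone_task_finished(): flag the failure, ask to stop. */
static int demo_task_finished(int result, void *cb)
{
	struct task_state *st = cb;

	if (!result)
		return 0;
	st->print_unmatched = 1;
	return 1;
}

/* Runs tasks one at a time; returns how many tasks were started. */
static int run_tasks_sequentially(get_next_task_fn next, run_task_fn run,
				  task_finished_fn finished, void *cb)
{
	int task, started = 0;

	while (next(cb, &task)) {
		int result = run(task, cb);

		started++;
		if (finished(result, cb))
			break; /* nonzero: hand out no further tasks */
	}
	return started;
}
```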


* [PATCHv3 05/11] submodule-config: drop check against NULL
  2015-11-04  0:37  9% [PATCHv3 00/11] Expose the submodule parallelism to the user Stefan Beller
    2015-11-04  0:37 26% ` [PATCHv3 04/11] submodule-config: keep update strategy around Stefan Beller
@ 2015-11-04  0:37 24% ` Stefan Beller
  2015-11-04  0:37 24% ` [PATCHv3 06/11] submodule-config: remove name_and_item_from_var Stefan Beller
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-04  0:37 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, j6t,
	Stefan Beller

Adhere to Git's common coding style and do not check explicitly
for NULL throughout the file. There are still other occurrences in the
code base, but those are usually inside conditions with side effects.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule-config.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 4239b0e..6d01941 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -265,7 +265,7 @@ static int parse_config(const char *var, const char *value, void *data)
 	if (!strcmp(item.buf, "path")) {
 		if (!value)
 			ret = config_error_nonbool(var);
-		else if (!me->overwrite && submodule->path != NULL)
+		else if (!me->overwrite && submodule->path)
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"path");
 		else {
@@ -289,7 +289,7 @@ static int parse_config(const char *var, const char *value, void *data)
 	} else if (!strcmp(item.buf, "ignore")) {
 		if (!value)
 			ret = config_error_nonbool(var);
-		else if (!me->overwrite && submodule->ignore != NULL)
+		else if (!me->overwrite && submodule->ignore)
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"ignore");
 		else if (strcmp(value, "untracked") &&
@@ -305,7 +305,7 @@ static int parse_config(const char *var, const char *value, void *data)
 	} else if (!strcmp(item.buf, "url")) {
 		if (!value) {
 			ret = config_error_nonbool(var);
-		} else if (!me->overwrite && submodule->url != NULL) {
+		} else if (!me->overwrite && submodule->url) {
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"url");
 		} else {
@@ -315,7 +315,7 @@ static int parse_config(const char *var, const char *value, void *data)
 	} else if (!strcmp(item.buf, "update")) {
 		if (!value)
 			ret = config_error_nonbool(var);
-		else if (!me->overwrite && submodule->update != NULL)
+		else if (!me->overwrite && submodule->update)
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					     "update");
 		else {
-- 
2.6.1.247.ge8f2a41.dirty
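
The style rule this commit applies can be shown in a few lines: in C, a pointer used as a condition is true exactly when it is not a null pointer, so `if (p)` and `if (p != NULL)` behave identically. The second function is an assumed example of a "condition with side effects" (an assignment inside the test), where an explicit comparison is sometimes kept so the `=` does not look like a mistyped `==`. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Preferred Git style: test the pointer directly instead of "!= NULL". */
static int has_value(const char *value)
{
	if (value)
		return 1;
	return 0;
}

/*
 * A condition with a side effect (assignment inside the test). Spelling
 * out "!= NULL" here makes it obvious the "=" is intentional.
 */
static const char *after_colon(const char *s)
{
	const char *colon;

	assert(s);
	if ((colon = strchr(s, ':')) != NULL)
		return colon + 1;
	return NULL;
}
```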


* [PATCHv3 04/11] submodule-config: keep update strategy around
  2015-11-04  0:37  9% [PATCHv3 00/11] Expose the submodule parallelism to the user Stefan Beller
  @ 2015-11-04  0:37 26% ` Stefan Beller
  2015-11-04  0:37 24% ` [PATCHv3 05/11] submodule-config: drop check against NULL Stefan Beller
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-04  0:37 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, j6t,
	Stefan Beller

We need the submodule update strategies in a later patch.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 submodule-config.c | 11 +++++++++++
 submodule-config.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/submodule-config.c b/submodule-config.c
index afe0ea8..4239b0e 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -194,6 +194,7 @@ static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
 
 	submodule->path = NULL;
 	submodule->url = NULL;
+	submodule->update = NULL;
 	submodule->fetch_recurse = RECURSE_SUBMODULES_NONE;
 	submodule->ignore = NULL;
 
@@ -311,6 +312,16 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->url);
 			submodule->url = xstrdup(value);
 		}
+	} else if (!strcmp(item.buf, "update")) {
+		if (!value)
+			ret = config_error_nonbool(var);
+		else if (!me->overwrite && submodule->update != NULL)
+			warn_multiple_config(me->commit_sha1, submodule->name,
+					     "update");
+		else {
+			free((void *) submodule->update);
+			submodule->update = xstrdup(value);
+		}
 	}
 
 	strbuf_release(&name);
diff --git a/submodule-config.h b/submodule-config.h
index 9061e4e..f9e2a29 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -14,6 +14,7 @@ struct submodule {
 	const char *url;
 	int fetch_recurse;
 	const char *ignore;
+	const char *update;
 	/* the sha1 blob id of the responsible .gitmodules file */
 	unsigned char gitmodules_sha1[20];
 };
-- 
2.6.1.247.ge8f2a41.dirty
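
The new "update" branch in parse_config() follows the same pattern as "path", "url" and "ignore": reject a missing value, keep an already-set value unless overwriting is allowed, and otherwise replace it. Below is a minimal standalone sketch of that pattern. It assumes a hypothetical `struct setting` and uses return codes plus plain malloc() in place of Git's config_error_nonbool(), warn_multiple_config() and xstrdup().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for one string-valued setting, e.g. "update". */
struct setting {
	char *value;
};

/*
 * Returns -1 if no value was given (cf. config_error_nonbool()), 1 if an
 * existing value was kept because overwrite is off (cf. warn_multiple_config()),
 * and 0 if the new value was stored.
 */
static int set_string_setting(struct setting *s, const char *value,
			      int overwrite)
{
	size_t len;
	char *copy;

	assert(s);
	if (!value)
		return -1;
	if (!overwrite && s->value)
		return 1;
	len = strlen(value) + 1;
	copy = malloc(len);
	if (!copy)
		abort(); /* Git's xstrdup() likewise dies on allocation failure */
	memcpy(copy, value, len);
	free(s->value);
	s->value = copy;
	return 0;
}
```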


* Re: [PATCHv3 00/11]  Expose the submodule parallelism to the user
  2015-11-04  0:37  9% [PATCHv3 00/11] Expose the submodule parallelism to the user Stefan Beller
                   ` (8 preceding siblings ...)
  2015-11-04  0:37 24% ` [PATCHv3 11/11] clone: allow an explicit argument for parallel submodule clones Stefan Beller
@ 2015-11-04 17:54  6% ` Junio C Hamano
  2015-11-04 18:08  7%   ` Stefan Beller
  9 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2015-11-04 17:54 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, ramsay, jacob.keller, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, j6t

Stefan Beller <sbeller@google.com> writes:

> Where does it apply?
> ---
> This series applies on top of d075d2604c0f92045caa8d5bb6ab86cf4921a4ae (Merge
> branch 'rs/daemon-plug-child-leak' into sb/submodule-parallel-update) and replaces
> the previous patches in sb/submodule-parallel-update
>
> What does it do?
> ---
> This series should finish the ongoing efforts of parallelizing
> submodule network traffic. The patches contain tests for clone,
> fetch and submodule update to use the actual parallelism both via
> command line as well as a configured option. I decided to go with
> "submodule.jobs" for all three for now.

The order of patches and where the series builds makes me suspect
that I have been expecting too much from the "parallel-fetch" topic.

I've been hoping that it would be useful for the project as a whole
to polish the other topic and make it available to wider audience
sooner by itself (both from "end users get improved Git early"
aspect and from "the core machinery to be reused in follow-up
improvements are made closer to perfection sooner" perspective).  So
I've been expecting the "Let's fix it on Windows" change to come directly
on top of sb/submodule-parallel-fetch to make that topic usable
before everything else.  Other patches in this series may require
the child_process_cleanup() change, so they may be applied on top of
the merge between sb/submodule-parallel-fetch (updated for Windows)
and rs/daemon-plug-child-leak topic.

That does not seem to be what's happening here (note: I am not
complaining; I am just trying to make sure expectation matches
reality).  Am I reading you correctly?

I think sb/submodule-parallel-fetch + sb/submodule-parallel-update
as a single topic would need more time to mature to be in a tagged
release than we have in the remainder of this cycle.  It is likely
that the former topic has a chance to get rebased after 2.7 happens.
And that would allow us to (1) use the child_process_cleanup() from
the get-go instead of _deinit and to (2) get the machinery right both
for UNIX and Windows from the get-go, which would make the result
easier to understand.  As this is one of the more important areas,
it matters to keep the resulting code and the rationale behind it
understandable by reading "log --reverse -p".

Thanks.


* Re: [PATCHv3 00/11] Expose the submodule parallelism to the user
  2015-11-04 17:54  6% ` [PATCHv3 00/11] Expose the submodule parallelism to the user Junio C Hamano
@ 2015-11-04 18:08  7%   ` Stefan Beller
  2015-11-04 18:17  4%     ` Junio C Hamano
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-04 18:08 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git@vger.kernel.org, Ramsay Jones, Jacob Keller, Jeff King,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann, Eric Sunshine,
	Johannes Sixt

On Wed, Nov 4, 2015 at 9:54 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> Where does it apply?
>> ---
>> This series applies on top of d075d2604c0f92045caa8d5bb6ab86cf4921a4ae (Merge
>> branch 'rs/daemon-plug-child-leak' into sb/submodule-parallel-update) and replaces
>> the previous patches in sb/submodule-parallel-update
>>
>> What does it do?
>> ---
>> This series should finish the ongoing efforts of parallelizing
>> submodule network traffic. The patches contain tests for clone,
>> fetch and submodule update to use the actual parallelism both via
>> command line as well as a configured option. I decided to go with
>> "submodule.jobs" for all three for now.
>
> The order of patches and where the series builds makes me suspect
> that I have been expecting too much from the "parallel-fetch" topic.
>
> I've been hoping that it would be useful for the project as a whole
> to polish the other topic and make it available to wider audience
> sooner by itself (both from "end users get improved Git early"
> aspect and from "the core machinery to be reused in follow-up
> improvements are made closer to perfection sooner" perspective).  So
> I've been expecting that "Let's fix it on Windows" change directly
> on top of sb/submodule-parallel-fetch to make that topic usable
> before everything else.

I can resend the patches on top of sb/submodule-parallel-fetch
(though looking at sb/submodule-parallel-fetch..d075d2604c0f920
[Merge branch 'rs/daemon-plug-child-leak' into sb/submodule-parallel-update]
I don't expect conflicts, so it would be a verbatim resend)


> Other patches in this series may require
> the child_process_cleanup() change, so they may be applied on top of
> the merge between sb/submodule-parallel-fetch (updated for Windows)
> and rs/daemon-plug-child-leak topic.

I assumed the rs/daemon-plug-child-leak topic is not a feature but a cleanup,
which is why I would have expected a sb/submodule-parallel-fetch-for-windows
branch, perhaps pointing at the third patch of the series, on top of
rs/daemon-plug-child-leak.

>
> That does not seem to be what's happening here (note: I am not
> complaining; I am just trying to make sure expectation matches
> reality).  Am I reading you correctly?

I really wanted to send out just one series, my bad.
The ordering made sense to me (first the run-command related fixes,
then the new features in later patches).

>
> I think sb/submodule-parallel-fetch + sb/submodule-parallel-update
> as a single topic would need more time to mature to be in a tagged
> release than we have in the remainder of this cycle.

I agree on that.

>  It is likely
> that the former topic has a chance to get rebased after 2.7 happens.
> And that would allow us to (1) use the child_process_cleanup() from
> get-go instead of _deinit and to (2) get the machinery right both
> for UNIX and Windows from get-go.  Which would make the result
> easier to understand.  As this is one of the more important areas,
> it matters to keep the resulting code and the rationale behind it
> understandable by reading "log --reverse -p".

So you are saying that reading the Windows cleanup patch
before the s/deinit/clear/ Patch by Rene makes it way easier to understand?
Is that why you would prefer a different history (merging an updated
sb/submodule-parallel-fetch again into rs/daemon-plug-child-leak, or even into
sb/submodule-parallel-update)?

Thanks,
Stefan


* Re: [PATCHv3 02/11] run-command: report failure for degraded output just once
  @ 2015-11-04 18:14  4%   ` Junio C Hamano
  2015-11-04 20:14  2%     ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2015-11-04 18:14 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, ramsay, jacob.keller, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, j6t

Stefan Beller <sbeller@google.com> writes:

> The warning message is cluttering the output itself,
> so just report it once.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  run-command.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/run-command.c b/run-command.c
> index 7c00c21..3ae563f 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -1012,13 +1012,21 @@ static void pp_cleanup(struct parallel_processes *pp)
>  
>  static void set_nonblocking(int fd)
>  {
> +	static int reported_degrade = 0;
>  	int flags = fcntl(fd, F_GETFL);
> -	if (flags < 0)
> -		warning("Could not get file status flags, "
> -			"output will be degraded");
> -	else if (fcntl(fd, F_SETFL, flags | O_NONBLOCK))
> -		warning("Could not set file status flags, "
> -			"output will be degraded");
> +	if (flags < 0) {
> +		if (!reported_degrade) {
> +			warning("Could not get file status flags, "
> +				"output will be degraded");
> +			reported_degrade = 1;
> +		}
> +	} else if (fcntl(fd, F_SETFL, flags | O_NONBLOCK)) {
> +		if (!reported_degrade) {
> +			warning("Could not set file status flags, "
> +				"output will be degraded");
> +			reported_degrade = 1;
> +		}
> +	}
>  }

Imagine that we are running two things A and B at the same time.  We
ask poll(2) and it says both A and B have some data ready to be
read, and we try to read from A.  strbuf_read_once() would try to
read up to 8K, relying on the fact that you earlier set the IO to be
nonblock.  It will get stuck reading from A without allowing output
from B to drain.  B's write may get stuck because we are not reading
from it, and would cause B to stop making progress.

What if the other sides of the connection from A and B are talking
with each other, and B's non-progress caused the processing for A on
the other side of the connection to block, causing it not to produce
more output to allow us to make progress reading from A (so that
eventually we can give B a chance to drain its output)?  Imagine A
and B are pushes to the same remote, B may be pushing a change to a
submodule while A may be pushing a matching change to its
superproject, and the server may be trying to make sure that the
submodule update completes and updates the ref before making the
superproject's tree that binds that updated submodule's commit
available, for example?  Can we make any progress from that point?

I am not convinced that the failure to set nonblock IO is merely
"output will be degraded".  It feels more like a fatal error if we
are driving more than one task at the same time.
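
The scenario above hinges on what read() does on a pipe that has no data. A small POSIX-only sketch (it says nothing about Windows, where the missing O_NONBLOCK was the problem) shows the difference: after the same F_GETFL/F_SETFL sequence that set_nonblocking() uses, reading an empty pipe fails immediately with EAGAIN instead of waiting. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Same F_GETFL/F_SETFL sequence as set_nonblocking(), returning -1 on error. */
static int make_nonblocking(int fd)
{
	int flags;

	assert(fd >= 0);
	flags = fcntl(fd, F_GETFL);
	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Returns 1 if reading an empty non-blocking pipe fails with EAGAIN
 * (or EWOULDBLOCK) instead of waiting, 0 otherwise, -1 on setup error.
 * Without O_NONBLOCK the same read() would block until the writer sends
 * data or closes its end.
 */
static int empty_pipe_read_would_block(void)
{
	int fds[2], would_block;
	char buf[8];
	ssize_t n;

	if (pipe(fds) < 0)
		return -1;
	if (make_nonblocking(fds[0]) < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	n = read(fds[0], buf, sizeof(buf));
	would_block = n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
	close(fds[0]);
	close(fds[1]);
	return would_block;
}
```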


* Re: [PATCHv3 00/11] Expose the submodule parallelism to the user
  2015-11-04 18:08  7%   ` Stefan Beller
@ 2015-11-04 18:17  4%     ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2015-11-04 18:17 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Ramsay Jones, Jacob Keller, Jeff King,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann, Eric Sunshine,
	Johannes Sixt

Stefan Beller <sbeller@google.com> writes:

> So you are saying that reading the Windows cleanup patch
> before the s/deinit/clear/ Patch by Rene makes it way easier to understand?

No.

The run-parallel API added in parallel-fetch that needs to be fixed
up (because the topic is in 'next', my bad merging prematurely) with
a separate "oops that was not friendly to Windows" is the primary
concern I have for those who later want to learn how it was designed
by going through "log --reverse -p".


* Re: [PATCHv3 02/11] run-command: report failure for degraded output just once
  2015-11-04 18:14  4%   ` Junio C Hamano
@ 2015-11-04 20:14  2%     ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-04 20:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git@vger.kernel.org, Ramsay Jones, Jacob Keller, Jeff King,
	Jonathan Nieder, Johannes Schindelin, Jens Lehmann, Eric Sunshine,
	Johannes Sixt

On Wed, Nov 4, 2015 at 10:14 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> The warning message is cluttering the output itself,
>> so just report it once.
>>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>>  run-command.c | 20 ++++++++++++++------
>>  1 file changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/run-command.c b/run-command.c
>> index 7c00c21..3ae563f 100644
>> --- a/run-command.c
>> +++ b/run-command.c
>> @@ -1012,13 +1012,21 @@ static void pp_cleanup(struct parallel_processes *pp)
>>
>>  static void set_nonblocking(int fd)
>>  {
>> +     static int reported_degrade = 0;
>>       int flags = fcntl(fd, F_GETFL);
>> -     if (flags < 0)
>> -             warning("Could not get file status flags, "
>> -                     "output will be degraded");
>> -     else if (fcntl(fd, F_SETFL, flags | O_NONBLOCK))
>> -             warning("Could not set file status flags, "
>> -                     "output will be degraded");
>> +     if (flags < 0) {
>> +             if (!reported_degrade) {
>> +                     warning("Could not get file status flags, "
>> +                             "output will be degraded");
>> +                     reported_degrade = 1;
>> +             }
>> +     } else if (fcntl(fd, F_SETFL, flags | O_NONBLOCK)) {
>> +             if (!reported_degrade) {
>> +                     warning("Could not set file status flags, "
>> +                             "output will be degraded");
>> +                     reported_degrade = 1;
>> +             }
>> +     }
>>  }
>
> Imagine that we are running two things A and B at the same time.  We
> ask poll(2) and it says both A and B have some data ready to be
> read, and we try to read from A.  strbuf_read_once() would try to
> read up to 8K, relying on the fact that you earlier set the IO to be
> nonblock.  It will get stuck reading from A without allowing output
> from B to drain.  B's write may get stuck because we are not reading
> from it, and would cause B to stop making progress.
>
> What if the other sides of the connection from A and B are talking
> with each other,

I am not sure we ever want to allow this. How would that work with
jobs==1? How do we guarantee that A and B are running at the same time?
A later version of the parallel processing may have other ramp-up
mechanisms, such as "first run only one process until it has output at least
250 bytes", which would also produce such a lock, so a time-based ramp-up
may be better. But my general concern is: how many guarantees are we selling
here? Maybe the documentation needs to state explicitly that the processes
cannot talk to each other, or at least that they should assume stdout/stderr
may block.

> and B's non-progress caused the processing for A on
> the other side of the connection to block, causing it not to produce
> more output to allow us to make progress reading from A (so that
> eventually we can give B a chance to drain its output)?  Imagine A
> and B are pushes to the same remote, B may be pushing a change to a
> submodule while A may be pushing a matching change to its
> superproject, and the server may be trying to make sure that the
> submodule update completes and updates the ref before making the
> superproject's tree that binds that updated submodule's commit
> available, for example?  Can we make any progress from that point?
>
> I am not convinced that the failure to set nonblock IO is merely
> "output will be degraded".  It feels more like a fatal error if we
> are driving more than one task at the same time.
>

Another approach would be to test whether we can set the fd to non-blocking
and, if that is not possible, not buffer the output but redirect the
subcommand's stderr directly to the stderr of the calling process:

    if (set_nonblocking(pp->children[i].process.err) < 0) {
        pp->children[i].process.err = 2;
        degraded_parallelism = 1;
    }

and once we observe the degraded_parallelism flag, we can only
schedule a maximum of one job at a time, having direct output?
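
That fallback can be sketched as follows. The sketch assumes a set_nonblocking() variant that returns -1 instead of warning (the function under discussion returns void) and a hypothetical `struct scheduler`. In degraded mode the child writes to fd 2 directly and at most one job is scheduled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>

/* Variant of set_nonblocking() that reports failure instead of warning. */
static int try_set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical scheduler state; not part of Git. */
struct scheduler {
	int max_jobs;
	int degraded_parallelism;
};

/*
 * Returns the fd the child should write its stderr to. If the pipe cannot
 * be made non-blocking, fall back to fd 2 (unbuffered, straight to our own
 * stderr) and limit the scheduler to one job at a time.
 */
static int choose_child_stderr(struct scheduler *s, int pipe_fd)
{
	assert(s);
	if (try_set_nonblocking(pipe_fd) < 0) {
		s->degraded_parallelism = 1;
		s->max_jobs = 1;
		return 2;
	}
	return pipe_fd;
}
```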


* [PATCH 0/2] Missing O_NONBLOCK under Windows (was: git.git as of tonight)
  @ 2015-11-04 22:43  7% ` Stefan Beller
  2015-11-04 22:43  5%   ` [PATCH 1/2] run-parallel: rename set_nonblocking to set_nonblocking_or_die Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-04 22:43 UTC (permalink / raw)
  To: sbeller; +Cc: tboegi, j6t, gitster, git, johannes.schindelin

The first patch is a general fixup as per discussion,
the second patch will make Git compile on Windows again (hopefully; not tested).

The number of #ifdefs seems acceptable to me, opinions on that?

This has been developed on top of d075d2604c0f9 (Merge branch 'rs/daemon-plug-child-leak'
into sb/submodule-parallel-update), but should also apply on top of origin/sb/submodule-parallel-fetch

Thanks,
Stefan

Stefan Beller (2):
  run-parallel: rename set_nonblocking to set_nonblocking_or_die
  run-parallel: Run sequential if nonblocking I/O is unavailable

 run-command.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

-- 
2.6.1.247.ge8f2a41.dirty


* [PATCH 1/2] run-parallel: rename set_nonblocking to set_nonblocking_or_die
  2015-11-04 22:43  7% ` [PATCH 0/2] Missing " Stefan Beller
@ 2015-11-04 22:43  5%   ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-04 22:43 UTC (permalink / raw)
  To: sbeller; +Cc: tboegi, j6t, gitster, git, johannes.schindelin

The discussion showed that a warning is not enough: we need to die if the
output cannot be made non-blocking, as we could otherwise lock up the child
processes.

Junio wrote:
> Imagine that we are running two things A and B at the same time.  We
> ask poll(2) and it says both A and B have some data ready to be
> read, and we try to read from A.  strbuf_read_once() would try to
> read up to 8K, relying on the fact that you earlier set the IO to be
> nonblock.  It will get stuck reading from A without allowing output
> from B to drain.  B's write may get stuck because we are not reading
> from it, and would cause B to stop making progress.

> What if the other sides of the connection from A and B are talking
> with each other, and B's non-progress caused the processing for A on
> the other side of the connection to block, causing it not to produce
> more output to allow us to make progress reading from A (so that
> eventually we can give B a chance to drain its output)?  Imagine A
> and B are pushes to the same remote, B may be pushing a change to a
> submodule while A may be pushing a matching change to its
> superproject, and the server may be trying to make sure that the
> submodule update completes and updates the ref before making the
> superproject's tree that binds that updated submodule's commit
> available, for example?  Can we make any progress from that point?

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 run-command.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/run-command.c b/run-command.c
index 0a3c24e..86fbe50 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1006,15 +1006,13 @@ static void pp_cleanup(struct parallel_processes *pp)
 	sigchain_pop_common();
 }
 
-static void set_nonblocking(int fd)
+static void set_nonblocking_or_die(int fd)
 {
 	int flags = fcntl(fd, F_GETFL);
 	if (flags < 0)
-		warning("Could not get file status flags, "
-			"output will be degraded");
+		die("Could not get file status flags");
 	else if (fcntl(fd, F_SETFL, flags | O_NONBLOCK))
-		warning("Could not set file status flags, "
-			"output will be degraded");
+		die("Could not set file status flags");
 }
 
 /* returns
@@ -1052,7 +1050,7 @@ static int pp_start_one(struct parallel_processes *pp)
 		return code ? -1 : 1;
 	}
 
-	set_nonblocking(pp->children[i].process.err);
+	set_nonblocking_or_die(pp->children[i].process.err);
 
 	pp->nr_processes++;
 	pp->children[i].in_use = 1;
-- 
2.6.1.247.ge8f2a41.dirty


* [PATCH 0/2] Remove non-blocking fds from run-command.
@ 2015-11-05 18:17  7% Stefan Beller
    0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-05 18:17 UTC (permalink / raw)
  To: git
  Cc: peff, gitster, johannes.schindelin, Jens.Lehmann, ericsunshine,
	tboegi, j6t, Stefan Beller

As far as I understand, all participants in the discussion (Torsten, Jeff,
Junio and I) are convinced we don't need the non-blocking feature, so remove it.

I developed it on top of d075d2604c0 (Merge branch 'rs/daemon-plug-child-leak' into sb/submodule-parallel-update)
but AFAICT it also applies to sb/submodule-parallel-fetch.

This will fix compilation on Windows without any platform-specific hacks.

Thanks,
Stefan

Stefan Beller (2):
  run-command: Remove set_nonblocking
  strbuf: Correct documentation for strbuf_read_once

 run-command.c | 13 -------------
 strbuf.h      |  3 +--
 2 files changed, 1 insertion(+), 15 deletions(-)

-- 
2.6.1.247.ge8f2a41.dirty


* Re: [PATCH 1/2] run-command: Remove set_nonblocking
  @ 2015-11-05 18:45  4%   ` Junio C Hamano
  2015-11-05 19:22  2%     ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2015-11-05 18:45 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, peff, johannes.schindelin, Jens.Lehmann, ericsunshine,
	tboegi, j6t

Stefan Beller <sbeller@google.com> writes:

> strbuf_read_once can also operate on blocking file descriptors if we are
> sure they are ready. The poll (2) command however makes sure this is the
> case.
>
> Reading the manual for poll (2), there may be spurious returns indicating
> readiness but that is for network sockets only. Pipes should be unaffected.

Given the presence of "for example" in that bug section, I wouldn't
say "only" or "should be unaffected".

> By having this patch, we rely on the correctness of poll to return
> only pipes ready to read.

We rely on two things.  One is for poll to return only pipes that are 
non-empty.  The other is for read from a non-empty pipe not to block.
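
The two assumptions above can be made concrete with a POSIX sketch: the pipe read ends stay in blocking mode, poll() is asked which ones are readable, and each readable one is read once. If poll() ever reported an empty pipe as readable, the read() below would block, which is exactly the risk being discussed. The function name is hypothetical and this is not Git's strbuf_read_once().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Asks poll() which of two (blocking) pipe read ends are readable, then
 * calls read() once on each readable one, appending into buf. Returns the
 * number of bytes read, or -1 on error. Relies on poll() only reporting
 * pipes with data or EOF, and on read() returning the available bytes
 * without waiting for the full requested length.
 */
static ssize_t read_ready_once(int fd_a, int fd_b, char *buf, size_t len)
{
	struct pollfd pfd[2] = {
		{ .fd = fd_a, .events = POLLIN },
		{ .fd = fd_b, .events = POLLIN },
	};
	size_t total = 0;
	int i;

	assert(buf && len > 0);
	if (poll(pfd, 2, 0) < 0) /* timeout 0: just query, never wait */
		return -1;
	for (i = 0; i < 2 && total < len; i++) {
		ssize_t n;

		if (!(pfd[i].revents & (POLLIN | POLLHUP)))
			continue; /* not ready: reading here could block */
		n = read(pfd[i].fd, buf + total, len - total);
		if (n < 0)
			return -1;
		total += (size_t)n;
	}
	return (ssize_t)total;
}
```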

>
> This fixes compilation in Windows.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---

Thanks.  Let's apply these fixes on sb/submodule-parallel-fetch,
merge the result to 'next' and have people play with it.

>  run-command.c | 13 -------------
>  1 file changed, 13 deletions(-)
>
> diff --git a/run-command.c b/run-command.c
> index 0a3c24e..51d078c 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -1006,17 +1006,6 @@ static void pp_cleanup(struct parallel_processes *pp)
>  	sigchain_pop_common();
>  }
>  
> -static void set_nonblocking(int fd)
> -{
> -	int flags = fcntl(fd, F_GETFL);
> -	if (flags < 0)
> -		warning("Could not get file status flags, "
> -			"output will be degraded");
> -	else if (fcntl(fd, F_SETFL, flags | O_NONBLOCK))
> -		warning("Could not set file status flags, "
> -			"output will be degraded");
> -}
> -
>  /* returns
>   *  0 if a new task was started.
>   *  1 if no new jobs was started (get_next_task ran out of work, non critical
> @@ -1052,8 +1041,6 @@ static int pp_start_one(struct parallel_processes *pp)
>  		return code ? -1 : 1;
>  	}
>  
> -	set_nonblocking(pp->children[i].process.err);
> -
>  	pp->nr_processes++;
>  	pp->children[i].in_use = 1;
>  	pp->pfd[i].fd = pp->children[i].process.err;


* Re: [PATCH 1/2] run-command: Remove set_nonblocking
  2015-11-05 18:45  4%   ` Junio C Hamano
@ 2015-11-05 19:22  2%     ` Stefan Beller
  2015-11-05 19:37  3%       ` Junio C Hamano
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-05 19:22 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git@vger.kernel.org, Jeff King, Johannes Schindelin, Jens Lehmann,
	Eric Sunshine, Torsten Bögershausen, Johannes Sixt

On Thu, Nov 5, 2015 at 10:45 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> strbuf_read_once can also operate on blocking file descriptors if we are
>> sure they are ready. The poll (2) command however makes sure this is the
>> case.
>>
>> Reading the manual for poll (2), there may be spurious returns indicating
>> readiness but that is for network sockets only. Pipes should be unaffected.
>
> Given the presence of "for example" in that bug section, I wouldn't
> say "only" or "should be unaffected".

Reading the documentation, we agree that we should expect
no spurious returns, don't we?

>
>> By having this patch, we rely on the correctness of poll to return
>> only pipes ready to read.
>
> We rely on two things.  One is for poll to return only pipes that are
> non-empty.  The other is for read from a non-empty pipe not to block.

That's what I meant by 'pipe being ready'.

>
>>
>> This fixes compilation in Windows.
>>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>
> Thanks.  Let's apply these fixes on sb/submodule-parallel-fetch,
> merge the result to 'next' and have people play with it.

Maybe the commit message was weakly crafted. Do you want me to resend?

>
>>  run-command.c | 13 -------------
>>  1 file changed, 13 deletions(-)
>>
>> diff --git a/run-command.c b/run-command.c
>> index 0a3c24e..51d078c 100644
>> --- a/run-command.c
>> +++ b/run-command.c
>> @@ -1006,17 +1006,6 @@ static void pp_cleanup(struct parallel_processes *pp)
>>       sigchain_pop_common();
>>  }
>>
>> -static void set_nonblocking(int fd)
>> -{
>> -     int flags = fcntl(fd, F_GETFL);
>> -     if (flags < 0)
>> -             warning("Could not get file status flags, "
>> -                     "output will be degraded");
>> -     else if (fcntl(fd, F_SETFL, flags | O_NONBLOCK))
>> -             warning("Could not set file status flags, "
>> -                     "output will be degraded");
>> -}
>> -
>>  /* returns
>>   *  0 if a new task was started.
>>   *  1 if no new jobs was started (get_next_task ran out of work, non critical
>> @@ -1052,8 +1041,6 @@ static int pp_start_one(struct parallel_processes *pp)
>>               return code ? -1 : 1;
>>       }
>>
>> -     set_nonblocking(pp->children[i].process.err);
>> -
>>       pp->nr_processes++;
>>       pp->children[i].in_use = 1;
>>       pp->pfd[i].fd = pp->children[i].process.err;


* Re: [PATCH 1/2] run-command: Remove set_nonblocking
  2015-11-05 19:22  2%     ` Stefan Beller
@ 2015-11-05 19:37  3%       ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2015-11-05 19:37 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Jeff King, Johannes Schindelin, Jens Lehmann,
	Eric Sunshine, Torsten Bögershausen, Johannes Sixt

Stefan Beller <sbeller@google.com> writes:

> On Thu, Nov 5, 2015 at 10:45 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> Stefan Beller <sbeller@google.com> writes:
>>
>>> strbuf_read_once can also operate on blocking file descriptors if we are
>>> sure they are ready. The poll (2) command however makes sure this is the
>>> case.
>>>
>>> Reading the manual for poll (2), there may be spurious returns indicating
>>> readiness but that is for network sockets only. Pipes should be unaffected.
>>
>> Given the presence of "for example" in that bug section, I wouldn't
>> say "only" or "should be unaffected".
>
> Reading the documentation we are in agreement, that we expect
> no spurious returns, no?

Given the presence of "for example" in that bug section, I wouldn't
say "only" or "should be unaffected".  I cannot say "we expect no
spurious returns".

>> Thanks.  Let's apply these fixes on sb/submodule-parallel-fetch,
>> merge the result to 'next' and have people play with it.
>
> Maybe the commit message was weakly crafted. Do you want me to resend?

I somehow feel that it is prudent to let this cook just above 'next'
for a few days (not just for the log message but to verify the
strategy and wait for others to come up with even better ideas), but
then I'll be offline starting next week, so I expect that merging
the final version to 'next' will be done by our interim maintainer,
which means we still have time to polish ;-)

Here is what I queued for now.

-- >8 --
From: Stefan Beller <sbeller@google.com>
Date: Thu, 5 Nov 2015 10:17:18 -0800
Subject: [PATCH] run-command: remove set_nonblocking()

strbuf_read_once can also operate on blocking file descriptors if we
are sure they are ready.  And the poll(2) we call before calling
this ensures that this is the case.

Reading the manual for poll(2), there may be spurious returns
indicating readiness but that is for network sockets only and pipes
should be unaffected.

With this change, we rely on

 - poll(2) returns only non-empty pipes; and
 - read(2) on a non-empty pipe does not block.

This should fix compilation on Windows.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 run-command.c | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/run-command.c b/run-command.c
index 1fbd286..07424e9 100644
--- a/run-command.c
+++ b/run-command.c
@@ -996,17 +996,6 @@ static void pp_cleanup(struct parallel_processes *pp)
 	sigchain_pop_common();
 }
 
-static void set_nonblocking(int fd)
-{
-	int flags = fcntl(fd, F_GETFL);
-	if (flags < 0)
-		warning("Could not get file status flags, "
-			"output will be degraded");
-	else if (fcntl(fd, F_SETFL, flags | O_NONBLOCK))
-		warning("Could not set file status flags, "
-			"output will be degraded");
-}
-
 /* returns
  *  0 if a new task was started.
  *  1 if no new jobs was started (get_next_task ran out of work, non critical
@@ -1042,8 +1031,6 @@ static int pp_start_one(struct parallel_processes *pp)
 		return code ? -1 : 1;
 	}
 
-	set_nonblocking(pp->children[i].process.err);
-
 	pp->nr_processes++;
 	pp->children[i].in_use = 1;
 	pp->pfd[i].fd = pp->children[i].process.err;
-- 
2.6.2-539-g1c5cd50
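The two assumptions listed in the commit message (poll(2) reports only non-empty or closed pipes, and read(2) on such a pipe does not block) can be illustrated with a minimal standalone sketch. It uses plain POSIX calls rather than git's strbuf_read_once(), and read_once_when_ready() is a hypothetical helper name. The descriptor is left in blocking mode, so no fcntl(2)/O_NONBLOCK is involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until `fd` is readable or hung up, then do a
 * single read(2).  The descriptor stays blocking; we rely on poll(2)
 * reporting only pipes that have data or are closed, and on read(2) from
 * such a pipe not blocking.
 *
 * Returns the number of bytes read, 0 at EOF (all writers closed), or -1
 * on error; on timeout it returns -1 with errno set to EAGAIN.
 */
static ssize_t read_once_when_ready(int fd, char *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd;
	ssize_t n;
	int ret;

	assert(buf && len > 0);
	pfd.fd = fd;
	pfd.events = POLLIN;	/* POLLHUP is always reported, no need to ask */
	pfd.revents = 0;

	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -1;
	if (ret == 0) {
		errno = EAGAIN;	/* timed out: nothing to read yet */
		return -1;
	}

	/* Readable or hung up: this read is expected not to block. */
	do {
		n = read(fd, buf, len);
	} while (n < 0 && errno == EINTR);
	return n;
}
```

If poll(2) could ever report a pipe as readable spuriously (the concern raised about the "for example" in the BUGS section), the read(2) above would block; that is precisely the assumption being accepted here.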


* [PATCH] run-command: detect finished children by closed pipe rather than waitpid
@ 2015-11-06 23:48  7% Stefan Beller
  2015-11-07  9:01  2% ` Johannes Sixt
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-06 23:48 UTC (permalink / raw)
  To: git, j6t
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, Stefan Beller

Detect whether a child has finished by checking whether its stderr pipe
was closed, instead of checking its state with waitpid.
As waitpid does not fully work on Windows, this approach allows for
better cross-platform operation (and it's less code, too).

Previously we did not close the read pipe of finished children; now
we do.

The old way missed some messages on an early abort: we just killed the
children and did not bother to look at what was left over. With this
approach we send a signal to the children and wait for them to close the
pipe, so we get all the messages (including possible "killed by signal 15"
messages).

To keep the test suite passing as before, we now allow for a truly
graceful abort. If the user wishes to abort parallel execution, they
either provide the signal used to kill all children, or the children
are left to run until they finish normally.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Hi,
 
 this applies on top of origin/sb/submodule-parallel-fetch,
 making Windows folks possibly even more happy. It makes the code easier
 to read and has fewer races when cleaning up a terminated child.
 
 It follows the idea of Johannes' patch: instead of encoding information in .err,
 I removed the in_use flag and added a state, which currently has 3 values.
 
 Thanks,
 Stefan

 Johannes wrote:
 > First let me say that I find it very questionable that the callbacks
 > receive a struct child_process.
 
 I tried to get rid of the child_process struct in the callbacks, but that's
 not as easy as one may think. The submodule fetch code looks like this:
 
 get_next_submodule(..) {
        ...
        child_process_init(cp);
        cp->dir = strbuf_detach(&submodule_path, NULL);
        cp->env = local_repo_env;
        cp->git_cmd = 1;
        if (!spf->quiet)
                strbuf_addf(err, "Fetching submodule %s%s\n",
                            spf->prefix, ce->name);
        argv_array_init(&cp->args);
        argv_array_pushv(&cp->args, spf->args.argv);
        argv_array_push(&cp->args, default_argv);
        argv_array_push(&cp->args, "--submodule-prefix");
        argv_array_push(&cp->args, submodule_prefix.buf);
 }
 
 So we need access to the args, git_cmd, the environment variables
 and the directory to run the command in (so quite a lot of the
 internals already).
 

 run-command.c      | 140 +++++++++++++++++++++++------------------------------
 run-command.h      |  12 +++--
 submodule.c        |   3 --
 test-run-command.c |   3 --
 4 files changed, 69 insertions(+), 89 deletions(-)

diff --git a/run-command.c b/run-command.c
index 1fbd286..cf17baf 100644
--- a/run-command.c
+++ b/run-command.c
@@ -858,6 +858,12 @@ int capture_command(struct child_process *cmd, struct strbuf *buf, size_t hint)
 	return finish_command(cmd);
 }
 
+enum child_state {
+	FREE,
+	WORKING,
+	WAIT_CLEANUP,
+};
+
 static struct parallel_processes {
 	void *data;
 
@@ -869,7 +875,7 @@ static struct parallel_processes {
 	task_finished_fn task_finished;
 
 	struct {
-		unsigned in_use : 1;
+		enum child_state state;
 		struct child_process process;
 		struct strbuf err;
 		void *data;
@@ -923,7 +929,7 @@ static void kill_children(struct parallel_processes *pp, int signo)
 	int i, n = pp->max_processes;
 
 	for (i = 0; i < n; i++)
-		if (pp->children[i].in_use)
+		if (pp->children[i].state == WORKING)
 			kill(pp->children[i].process.pid, signo);
 }
 
@@ -967,7 +973,7 @@ static struct parallel_processes *pp_init(int n,
 	for (i = 0; i < n; i++) {
 		strbuf_init(&pp->children[i].err, 0);
 		child_process_init(&pp->children[i].process);
-		pp->pfd[i].events = POLLIN;
+		pp->pfd[i].events = POLLIN | POLLHUP;
 		pp->pfd[i].fd = -1;
 	}
 	sigchain_push_common(handle_children_on_signal);
@@ -1011,41 +1017,48 @@ static void set_nonblocking(int fd)
  *  0 if a new task was started.
  *  1 if no new jobs was started (get_next_task ran out of work, non critical
  *    problem with starting a new command)
- * -1 no new job was started, user wishes to shutdown early.
+ * <0 no new job was started, user wishes to shutdown early. Use negative code
+ *    to signal the children.
  */
 static int pp_start_one(struct parallel_processes *pp)
 {
-	int i;
+	int i, code;
 
 	for (i = 0; i < pp->max_processes; i++)
-		if (!pp->children[i].in_use)
+		if (pp->children[i].state == FREE)
 			break;
 	if (i == pp->max_processes)
 		die("BUG: bookkeeping is hard");
 
-	if (!pp->get_next_task(&pp->children[i].data,
-			       &pp->children[i].process,
-			       &pp->children[i].err,
-			       pp->data)) {
+	code = pp->get_next_task(&pp->children[i].data,
+				 &pp->children[i].process,
+				 &pp->children[i].err,
+				 pp->data);
+	if (!code) {
 		strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
 		strbuf_reset(&pp->children[i].err);
 		return 1;
 	}
+	pp->children[i].process.err = -1;
+	pp->children[i].process.stdout_to_stderr = 1;
+	pp->children[i].process.no_stdin = 1;
 
 	if (start_command(&pp->children[i].process)) {
-		int code = pp->start_failure(&pp->children[i].process,
-					     &pp->children[i].err,
-					     pp->data,
-					     &pp->children[i].data);
+		code = pp->start_failure(&pp->children[i].process,
+					 &pp->children[i].err,
+					 pp->data,
+					 &pp->children[i].data);
 		strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
 		strbuf_reset(&pp->children[i].err);
-		return code ? -1 : 1;
+		if (code)
+			pp->shutdown = 1;
+		return code;
 	}
 
 	set_nonblocking(pp->children[i].process.err);
 
 	pp->nr_processes++;
-	pp->children[i].in_use = 1;
+	pp->children[i].state = WORKING;
 	pp->pfd[i].fd = pp->children[i].process.err;
 	return 0;
 }
@@ -1063,19 +1076,24 @@ static void pp_buffer_stderr(struct parallel_processes *pp, int output_timeout)
 
 	/* Buffer output from all pipes. */
 	for (i = 0; i < pp->max_processes; i++) {
-		if (pp->children[i].in_use &&
-		    pp->pfd[i].revents & POLLIN)
-			if (strbuf_read_once(&pp->children[i].err,
-					     pp->children[i].process.err, 0) < 0)
+		if (pp->children[i].state == WORKING &&
+		    pp->pfd[i].revents & (POLLIN | POLLHUP)) {
+			int n = strbuf_read_once(&pp->children[i].err,
+						 pp->children[i].process.err, 0);
+			if (n == 0) {
+				close(pp->children[i].process.err);
+				pp->children[i].state = WAIT_CLEANUP;
+			} else if (n < 0)
 				if (errno != EAGAIN)
 					die_errno("read");
+		}
 	}
 }
 
 static void pp_output(struct parallel_processes *pp)
 {
 	int i = pp->output_owner;
-	if (pp->children[i].in_use &&
+	if (pp->children[i].state == WORKING &&
 	    pp->children[i].err.len) {
 		fputs(pp->children[i].err.buf, stderr);
 		strbuf_reset(&pp->children[i].err);
@@ -1084,68 +1102,30 @@ static void pp_output(struct parallel_processes *pp)
 
 static int pp_collect_finished(struct parallel_processes *pp)
 {
-	int i = 0;
-	pid_t pid;
-	int wait_status, code;
+	int i, code;
 	int n = pp->max_processes;
 	int result = 0;
 
 	while (pp->nr_processes > 0) {
-		pid = waitpid(-1, &wait_status, WNOHANG);
-		if (pid == 0)
-			break;
-
-		if (pid < 0)
-			die_errno("wait");
-
 		for (i = 0; i < pp->max_processes; i++)
-			if (pp->children[i].in_use &&
-			    pid == pp->children[i].process.pid)
+			if (pp->children[i].state == WAIT_CLEANUP)
 				break;
 		if (i == pp->max_processes)
-			die("BUG: found a child process we were not aware of");
-
-		if (strbuf_read(&pp->children[i].err,
-				pp->children[i].process.err, 0) < 0)
-			die_errno("strbuf_read");
-
-		if (WIFSIGNALED(wait_status)) {
-			code = WTERMSIG(wait_status);
-			if (!pp->shutdown &&
-			    code != SIGINT && code != SIGQUIT)
-				strbuf_addf(&pp->children[i].err,
-					    "%s died of signal %d",
-					    pp->children[i].process.argv[0],
-					    code);
-			/*
-			 * This return value is chosen so that code & 0xff
-			 * mimics the exit code that a POSIX shell would report for
-			 * a program that died from this signal.
-			 */
-			code += 128;
-		} else if (WIFEXITED(wait_status)) {
-			code = WEXITSTATUS(wait_status);
-			/*
-			 * Convert special exit code when execvp failed.
-			 */
-			if (code == 127) {
-				code = -1;
-				errno = ENOENT;
-			}
-		} else {
-			strbuf_addf(&pp->children[i].err,
-				    "waitpid is confused (%s)",
-				    pp->children[i].process.argv[0]);
-			code = -1;
-		}
+			break;
+
+		code = finish_command(&pp->children[i].process);
 
-		if (pp->task_finished(code, &pp->children[i].process,
-				      &pp->children[i].err, pp->data,
-				      &pp->children[i].data))
-			result = 1;
+		code = pp->task_finished(code, &pp->children[i].process,
+					 &pp->children[i].err, pp->data,
+					 &pp->children[i].data);
+
+		if (code)
+			result = code;
+		if (code < 0)
+			break;
 
 		pp->nr_processes--;
-		pp->children[i].in_use = 0;
+		pp->children[i].state = FREE;
 		pp->pfd[i].fd = -1;
 		child_process_deinit(&pp->children[i].process);
 		child_process_init(&pp->children[i].process);
@@ -1170,7 +1150,7 @@ static int pp_collect_finished(struct parallel_processes *pp)
 			 * running process time.
 			 */
 			for (i = 0; i < n; i++)
-				if (pp->children[(pp->output_owner + i) % n].in_use)
+				if (pp->children[(pp->output_owner + i) % n].state == WORKING)
 					break;
 			pp->output_owner = (pp->output_owner + i) % n;
 		}
@@ -1184,7 +1164,7 @@ int run_processes_parallel(int n,
 			   task_finished_fn task_finished,
 			   void *pp_cb)
 {
-	int i;
+	int i, code;
 	int output_timeout = 100;
 	int spawn_cap = 4;
 	struct parallel_processes *pp;
@@ -1195,12 +1175,12 @@ int run_processes_parallel(int n,
 		    i < spawn_cap && !pp->shutdown &&
 		    pp->nr_processes < pp->max_processes;
 		    i++) {
-			int code = pp_start_one(pp);
+			code = pp_start_one(pp);
 			if (!code)
 				continue;
 			if (code < 0) {
 				pp->shutdown = 1;
-				kill_children(pp, SIGTERM);
+				kill_children(pp, -code);
 			}
 			break;
 		}
@@ -1208,9 +1188,11 @@ int run_processes_parallel(int n,
 			break;
 		pp_buffer_stderr(pp, output_timeout);
 		pp_output(pp);
-		if (pp_collect_finished(pp)) {
-			kill_children(pp, SIGTERM);
+		code = pp_collect_finished(pp);
+		if (code) {
 			pp->shutdown = 1;
+			if (code < 0)
+				kill_children(pp, -code);
 		}
 	}
 
diff --git a/run-command.h b/run-command.h
index c24aa54..414cc81 100644
--- a/run-command.h
+++ b/run-command.h
@@ -134,6 +134,8 @@ int finish_async(struct async *async);
  *
  * Return 1 if the next child is ready to run.
  * Return 0 if there are currently no more tasks to be processed.
+ * To send a signal to other child processes for abortion,
+ * return negative signal code.
  */
 typedef int (*get_next_task_fn)(void **pp_task_cb,
 				struct child_process *cp,
@@ -151,8 +153,9 @@ typedef int (*get_next_task_fn)(void **pp_task_cb,
  * pp_cb is the callback cookie as passed into run_processes_parallel,
  * pp_task_cb is the callback cookie as passed into get_next_task_fn.
  *
- * Return 0 to continue the parallel processing. To abort gracefully,
- * return non zero.
+ * Return 0 to continue the parallel processing. To abort return non zero.
+ * To send a signal to other child processes for abortion, return
+ * negative signal code.
  */
 typedef int (*start_failure_fn)(struct child_process *cp,
 				struct strbuf *err,
@@ -169,8 +172,9 @@ typedef int (*start_failure_fn)(struct child_process *cp,
  * pp_cb is the callback cookie as passed into run_processes_parallel,
  * pp_task_cb is the callback cookie as passed into get_next_task_fn.
  *
- * Return 0 to continue the parallel processing. To abort gracefully,
- * return non zero.
+ * Return 0 to continue the parallel processing.  To abort return non zero.
+ * To send a signal to other child processes for abortion, return
+ * negative signal code.
  */
 typedef int (*task_finished_fn)(int result,
 				struct child_process *cp,
diff --git a/submodule.c b/submodule.c
index c21b265..281bccd 100644
--- a/submodule.c
+++ b/submodule.c
@@ -689,9 +689,6 @@ static int get_next_submodule(void **task_cb, struct child_process *cp,
 			cp->dir = strbuf_detach(&submodule_path, NULL);
 			cp->env = local_repo_env;
 			cp->git_cmd = 1;
-			cp->no_stdin = 1;
-			cp->stdout_to_stderr = 1;
-			cp->err = -1;
 			if (!spf->quiet)
 				strbuf_addf(err, "Fetching submodule %s%s\n",
 					    spf->prefix, ce->name);
diff --git a/test-run-command.c b/test-run-command.c
index 13e5d44..b1f04d1 100644
--- a/test-run-command.c
+++ b/test-run-command.c
@@ -26,9 +26,6 @@ static int parallel_next(void** task_cb,
 		return 0;
 
 	argv_array_pushv(&cp->args, d->argv);
-	cp->stdout_to_stderr = 1;
-	cp->no_stdin = 1;
-	cp->err = -1;
 	strbuf_addf(err, "preloaded output of a child\n");
 	number_callbacks++;
 	return 1;
-- 
2.6.1.247.ge8f2a41.dirty
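The closed-pipe approach can be sketched outside git with plain POSIX fork/pipe/poll. This is a minimal illustration under stated assumptions, not the patch's code: struct child_slot and the slot_* helpers are hypothetical names, and only the three state names come from the patch. A slot moves FREE -> WORKING when started, WORKING -> WAIT_CLEANUP when read(2) returns 0 (every writer closed the pipe), and back to FREE once reaped. A closed pipe does not strictly prove the child has exited, so reaping still waits for the process, much as the patch calls finish_command().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Same three states as the patch's enum child_state. */
enum child_state { FREE, WORKING, WAIT_CLEANUP };

struct child_slot {		/* hypothetical */
	enum child_state state;
	pid_t pid;
	int err;		/* read end of the combined stdout/stderr pipe */
	char buf[4096];		/* captured output, truncated if longer */
	size_t len;
};

/* Start `sh -c cmd` with stdout and stderr both going into one pipe. */
static int slot_start(struct child_slot *s, const char *cmd)
{
	int fds[2];

	assert(s->state == FREE);
	if (pipe(fds) < 0)
		return -1;
	s->pid = fork();
	if (s->pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (s->pid == 0) {
		/* child: roughly stdout_to_stderr = 1, err = -1 */
		close(fds[0]);
		dup2(fds[1], 2);
		dup2(fds[1], 1);
		if (fds[1] > 2)
			close(fds[1]);
		execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
		_exit(127);
	}
	close(fds[1]);		/* parent keeps only the read end */
	s->err = fds[0];
	s->len = 0;
	s->buf[0] = '\0';
	s->state = WORKING;
	return 0;
}

/* One poll round: buffer available output; EOF means "done writing". */
static void slot_poll(struct child_slot *s, int timeout_ms)
{
	struct pollfd pfd = { .fd = s->err, .events = POLLIN | POLLHUP };
	char tmp[512];
	ssize_t n;

	if (s->state != WORKING || poll(&pfd, 1, timeout_ms) <= 0)
		return;
	if (!(pfd.revents & (POLLIN | POLLHUP)))
		return;
	n = read(s->err, tmp, sizeof(tmp));
	if (n == 0) {
		close(s->err);	/* stop polling this slot, reap it later */
		s->err = -1;
		s->state = WAIT_CLEANUP;
	} else if (n > 0) {
		size_t room = sizeof(s->buf) - 1 - s->len;
		size_t take = (size_t)n < room ? (size_t)n : room;
		memcpy(s->buf + s->len, tmp, take);
		s->len += take;
		s->buf[s->len] = '\0';
	}
	/* n < 0 (e.g. EINTR): simply retry on the next round */
}

/* Reap a child whose pipe was closed; returns its exit code or -1. */
static int slot_finish(struct child_slot *s)
{
	int status;
	pid_t r;

	assert(s->state == WAIT_CLEANUP);
	do {
		r = waitpid(s->pid, &status, 0);
	} while (r < 0 && errno == EINTR);
	s->state = FREE;
	if (r < 0)
		return -1;
	if (WIFEXITED(status))
		return WEXITSTATUS(status);
	if (WIFSIGNALED(status))
		return 128 + WTERMSIG(status);	/* as a POSIX shell reports it */
	return -1;
}
```

A caller would loop `while (s.state == WORKING) slot_poll(&s, 100);` and then call slot_finish(); with several slots, one poll(2) over all WORKING descriptors would replace the per-slot call.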


* Re: [PATCH] run-command: detect finished children by closed pipe rather than waitpid
  2015-11-06 23:48  7% [PATCH] run-command: detect finished children by closed pipe rather than waitpid Stefan Beller
@ 2015-11-07  9:01  2% ` Johannes Sixt
  0 siblings, 0 replies; 200+ results
From: Johannes Sixt @ 2015-11-07  9:01 UTC (permalink / raw)
  To: Stefan Beller, git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine

On 07.11.2015 at 00:48, Stefan Beller wrote:
> Detect if a child stopped working by checking if their stderr pipe
> was closed instead of checking their state with waitpid.
> As waitpid is not fully working in Windows, this is an approach which
> allows for better cross platform operation. (It's less code, too)
>
> Previously we did not close the read pipe of finished children, which we
> do now.
>
> The old way missed some messages on an early abort. We just killed the
> children and did not bother to look what was left over. With this approach
> we'd send a signal to the children and wait for them to close the pipe to
> have all the messages (including possible "killed by signal 15" messages).
>
> To have the test suite passing as before, we allow for real graceful
> abortion now. In case the user wishes to abort parallel execution
> the user needs to provide either the signal used to kill all children
> or the children are let run until they finish normally.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>   Hi,
>
>   this applies on top of origin/sb/submodule-parallel-fetch,
>   making Windows folks possibly even more happy. It makes the code easier
>   to read and has less races on cleaning up a terminated child.
>
>   It follows the idea of Johannes patch, instead of encoding information in .err
>   I removed the in_use flag and added a state, currently having 3 states.
>
>   Thanks,
>   Stefan
>
>   Johannes wrote:
>   > First let me say that I find it very questionable that the callbacks
>   > receive a struct child_process.
>
>   I tried to get rid of the child_process struct in the callbacks, but that's
>   not as easy as one may think.

Fair enough. I see you removed .err, .no_stdin and .stdout_to_stderr 
from the callback. Good.

>   		pp->nr_processes--;
> -		pp->children[i].in_use = 0;
> +		pp->children[i].state = FREE;
>   		pp->pfd[i].fd = -1;
>   		child_process_deinit(&pp->children[i].process);

This cleanup is implied by finish_command and can be removed.

>   		child_process_init(&pp->children[i].process);

> @@ -1195,12 +1175,12 @@ int run_processes_parallel(int n,
>   		    i < spawn_cap && !pp->shutdown &&
>   		    pp->nr_processes < pp->max_processes;
>   		    i++) {
> -			int code = pp_start_one(pp);
> +			code = pp_start_one(pp);
>   			if (!code)
>   				continue;
>   			if (code < 0) {
>   				pp->shutdown = 1;
> -				kill_children(pp, SIGTERM);
> +				kill_children(pp, -code);

I'll see what this means for our kill emulation on Windows. Currently, 
we handle only SIGTERM.

>   			}
>   			break;
>   		}

Thank you very much!

-- Hannes
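The return-value convention the patch documents for start_failure_fn and task_finished_fn (0 continues, non-zero aborts, and a negative value -SIG also asks for SIG to be sent to the remaining children) can be summarized in a small decoder. This is an illustrative sketch; struct abort_action and interpret_callback_code() are hypothetical names, not part of run-command.h. Note that get_next_task_fn uses 0 differently (no more tasks), so the decoder does not apply to it as-is.

```c
#include <assert.h>
#include <signal.h>

/* What the scheduler should do after a callback returns `code`. */
struct abort_action {		/* hypothetical */
	int shutdown;		/* stop starting new tasks */
	int signo;		/* signal to send to running children; 0 = none */
};

/*
 * Decode a start_failure/task_finished return value:
 *   0      -> keep going
 *   > 0    -> stop starting new tasks, let running children finish
 *   -SIG   -> stop, and also send SIG to the running children
 */
static struct abort_action interpret_callback_code(int code)
{
	struct abort_action a = { 0, 0 };

	if (code != 0)
		a.shutdown = 1;
	if (code < 0)
		a.signo = -code;	/* e.g. a callback returned -SIGTERM */
	return a;
}
```

Given that the Windows kill emulation currently handles only SIGTERM, callbacks that want to signal their siblings would presumably return -SIGTERM to stay portable.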


* Re: [PATCH v4] Add git-grep threads param
  @ 2015-11-09 18:40  4%                 ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-09 18:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Victor Leschuk, Jeff King, Junio C Hamano, Victor Leschuk,
	git@vger.kernel.org, john@keeping.me.uk

On Mon, Nov 9, 2015 at 9:55 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So stop with the "online_cpus()" stuff. And don't base your benchmarks
> purely on the CPU-bound case. Because the CPU-bound case is the case
> that is already generally so good that few people will care all *that*
> deeply.
>
> Many of the things git does are not for "best-case" behavior, but to
> avoid bad "worst-case" situations. Look at things like the index
> preloading (also threaded). The big win there is - again - when the
> stat() calls may need IO. Sure, it can help for CPU use too, but
> especially on Linux, cached "stat()" calls are really quite cheap. The
> big upside is, again, in situations like git repositories over NFS.
>
> In the CPU-intensive case, the threading might make things go from a
> couple of seconds to half a second. Big deal. You're not getting up to
> get a coffee in either case.

Chiming in here, as I have another series in flight that adds parallelism
(processing submodules in parallel, including fetching, cloning and checking out).

online_cpus() seems to be one of the easiest ballpark estimates for
the power of a system.

So what I would have liked to use would be some kind of

  parallel_expect_bottleneck(enum kinds);

with kinds being one of (FS, NETWORK, CPU, MEMORY?),
to get an estimate of a 'good' number to use.

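A sketch of what such a helper might look like follows. Both parallel_expect_bottleneck() and enum bottleneck are hypothetical (no such API exists in git), and the multipliers are arbitrary placeholders rather than measured values. The only reasoning encoded is the point made above: CPU-bound work gains little from more jobs than cores, while I/O-bound work (filesystem, network) spends most of its time waiting and can usefully oversubscribe. online_cpus_estimate() is a rough stand-in for git's online_cpus() based on sysconf(3).

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical: the kinds of bottleneck suggested in the mail. */
enum bottleneck { BOTTLENECK_CPU, BOTTLENECK_FS, BOTTLENECK_NETWORK, BOTTLENECK_MEMORY };

/* Rough stand-in for online_cpus(); falls back to 1. */
static int online_cpus_estimate(void)
{
#ifdef _SC_NPROCESSORS_ONLN
	long n = sysconf(_SC_NPROCESSORS_ONLN);
	if (n > 0)
		return (int)n;
#endif
	return 1;
}

/*
 * Hypothetical: suggest a job count for work limited by `kind`, given
 * `ncpus` online CPUs.  The factors below are placeholders only.
 */
static int parallel_expect_bottleneck(enum bottleneck kind, int ncpus)
{
	if (ncpus < 1)
		ncpus = 1;
	switch (kind) {
	case BOTTLENECK_CPU:
		return ncpus;	/* more jobs than cores buys little */
	case BOTTLENECK_FS:
	case BOTTLENECK_NETWORK:
		/* jobs mostly wait on I/O, so oversubscribe */
		return ncpus * 4 < 8 ? 8 : ncpus * 4;
	case BOTTLENECK_MEMORY:
		return ncpus / 2 > 0 ? ncpus / 2 : 1;
	}
	return 1;
}
```

A caller might use `parallel_expect_bottleneck(BOTTLENECK_NETWORK, online_cpus_estimate())` as the default for a parallel fetch, still letting a config option or command-line flag override it.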

* Re: Allow git alias to override existing Git commands
  @ 2015-11-10 18:12  4% ` Stefan Beller
    2015-11-10 21:57  5%   ` Jens Lehmann
  0 siblings, 2 replies; 200+ results
From: Stefan Beller @ 2015-11-10 18:12 UTC (permalink / raw)
  To: Jeremy Morton; +Cc: git@vger.kernel.org

On Tue, Nov 10, 2015 at 8:31 AM, Jeremy Morton <admin@game-point.net> wrote:
> It's recently come to my attention that the "git alias" config functionality
> ignores all aliases that would override existing Git commands.  This seems
> like a bad idea to me.

This ensures that the plumbing commands always work as expected.
As scripts *should* only use plumbing commands, the scripts should
work with high probability despite all the crazy user configuration/aliases.
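The effect described here can be modeled with a toy lookup: an existing command is resolved first, so an alias with the same name is never consulted, and a script calling plumbing such as `git rev-parse` always gets the real command. The command table, struct alias and resolve_command() below are simplified, hypothetical stand-ins, not git's dispatch code (which, for instance, also looks for external git-<cmd> programs before aliases).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A tiny, made-up subset of git's commands. */
static const char *const builtins[] = { "clone", "fetch", "rev-parse", "update-ref", NULL };

struct alias {
	const char *name;	/* e.g. the "co" in alias.co */
	const char *expansion;	/* e.g. "checkout" */
};

static int is_builtin(const char *cmd)
{
	for (size_t i = 0; builtins[i]; i++)
		if (!strcmp(builtins[i], cmd))
			return 1;
	return 0;
}

/*
 * Simplified lookup order: an existing command always wins, so an alias
 * sharing its name is ignored.  Returns what would be run, or NULL.
 */
static const char *resolve_command(const char *cmd,
				   const struct alias *aliases, size_t n)
{
	if (is_builtin(cmd))
		return cmd;
	for (size_t i = 0; i < n; i++)
		if (!strcmp(aliases[i].name, cmd))
			return aliases[i].expansion;
	return NULL;
}
```

With an alias table containing `clone = clone --recursive`, `resolve_command("clone", ...)` still returns plain "clone", which is the behavior being questioned in this thread.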

>
> For example, I wanted to setup "git clone" to automatically act as "git
> clone --recursive".  Sure I could do it in the shell, but it's more of a
> pain - any tutorial I set up about doing it would have to worry about what
> shell the user was using - and if you're going to make that argument, why
> have "git alias" at all?  It can all be done from the shell.

I think the git way for your example would be to configure git to include that
option by default, something like

    git config --global submodules.recursiveClone yes

though I was skimming through the man page of git config and did not find
that option there. I guess it's missing.


>
> Obviously I could also use a different alias that wasn't an existing Git
> command for this behaviour, but that would rather defeat the point: I want
> "git clone" to have different functionality.  If I remembered to use a
> different Git command, I might as well remember to type "git clone
> --recursive".  Also, if a future Git command were introduced with the same
> name as my alias, my alias's functionality would suddenly be ignored, giving
> unexpected behaviour.
>
> The reasoning behind this that it's "to avoid confusion and troubles with
> script usage" seems to be at odds with the general Git mentality that the
> user is given lots of power, and if they screw it up it's basically just
> user error.

For scripting the plumbing commands are recommended. The plumbing commands
usually cannot be configured to do crazy stuff.

> For example, Git doesn't *have* to allow you to rebase.  It's a
> potentially dangerous operation, so why is it allowed?  It might "cause
> confusion and troubles".

Git doesn't try to hide its complexity from users. And if a user had
to hack their way around to get rebasing working again, that might also
"cause confusion and troubles".

>
> On the other hand, by disallowing the overriding of existing Git commands
> through aliases you are preventing a lot of useful functionality that those
> aliases might be used for.
>
> So I think you should either allow Git aliases to override existing Git
> commands by default, or at least provide a config option that allows the
> user to say that this should happen.
>
> --
> Best regards,
> Jeremy Morton (Jez)


* Re: Allow git alias to override existing Git commands
  @ 2015-11-10 20:22  4%     ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-10 20:22 UTC (permalink / raw)
  To: Jeremy Morton; +Cc: git@vger.kernel.org

On Tue, Nov 10, 2015 at 12:04 PM, Jeremy Morton <admin@game-point.net> wrote:
> On 10/11/2015 18:12, Stefan Beller wrote:
>>
>> On Tue, Nov 10, 2015 at 8:31 AM, Jeremy Morton<admin@game-point.net>
>> wrote:
>>>
>>> It's recently come to my attention that the "git alias" config
>>> functionality
>>> ignores all aliases that would override existing Git commands.  This
>>> seems
>>> like a bad idea to me.
>>
>>
>> This ensures that the plumbing commands always work as expected.
>> As scripts *should* only use plumbing commands, the scripts should
>> work with high probability despite all the crazy user
>> configuration/aliases.
>>
>
> I just disagree with this.  If a user chooses to override their Git
> commands, it's their problem.  Why should Git care about this?

Because some Git commands (e.g. git submodule) are still implemented as
scripts, and those would break if the user aliased plumbing commands. That
would be unexpected, so it should be avoided. Maybe we could allow aliasing
porcelain commands, but that is extra effort that nobody has looked into yet.

> It should
> provide the user with the option to do this, and if the user ruins scripts
> because of their aliases, it is not Git's problem.  What you are doing is
> taking away power from users to use git aliases to their full potential.

Yeah, no user asked for that power I guess, you're the first. :)

Going by your initial email, I think rather than overriding 'clone'
with 'clone --recursive', you'd want a globally configured
option to recurse by default when invoking 'clone'.
That sounds saner to me at least.

Stefan


* Re: Allow git alias to override existing Git commands
  2015-11-10 18:12  4% ` Stefan Beller
  @ 2015-11-10 21:57  5%   ` Jens Lehmann
  2015-11-10 22:49  5%     ` Stefan Beller
  1 sibling, 1 reply; 200+ results
From: Jens Lehmann @ 2015-11-10 21:57 UTC (permalink / raw)
  To: Stefan Beller, Jeremy Morton
  Cc: git@vger.kernel.org, Heiko Voigt, Junio C Hamano, Jonathan Nieder

On 10.11.2015 at 19:12, Stefan Beller wrote:
> On Tue, Nov 10, 2015 at 8:31 AM, Jeremy Morton <admin@game-point.net> wrote:
>> It's recently come to my attention that the "git alias" config functionality
>> ignores all aliases that would override existing Git commands.  This seems
>> like a bad idea to me.
>
> This ensures that the plumbing commands always work as expected.
> As scripts *should* only use plumbing commands, the scripts should
> work with high probability despite all the crazy user configuration/aliases.

Exactly.

>> For example, I wanted to setup "git clone" to automatically act as "git
>> clone --recursive".  Sure I could do it in the shell, but it's more of a
>> pain - any tutorial I set up about doing it would have to worry about what
>> shell the user was using - and if you're going to make that argument, why
>> have "git alias" at all?  It can all be done from the shell.
>
> I think the git way for your example would be to configure git to include that
> option by default, something like
>
>      git config --global submodules.recursiveClone yes
>
> though I was skimming through the man page of git config and did not find
> that option there. I guess it's missing.

We thought about adding such a config option, but I believe that would
fall a bit short. If I want to have recursive clone I also want to init
all those submodules appearing in later fetches too (otherwise the end
result would depend on whether you cloned before or after a submodule
was added upstream, which is confusing). Extra points for populating
the submodule in my work tree when switching to a commit containing
the new submodule.

So what about a "submodule.autoupdate" config option? If set to true,
all submodules not marked "update=none" would automatically be fetched
and inited by fetch (and thus clone too) and then checked out (with my
recursive update changes) in every work tree manipulating command
(again including clone).

Users who only want the submodules to be present in the work tree but
not automagically updated could set "submodule.autoupdate=clone" to
avoid the extra cost of updating the work tree every time they switch
between commits. Now that Heiko's config-from-commit changes are in
master, someone could easily add that to fetch and clone as the first
step. We could also teach clone to make "submodule.autoupdate=true"
imply --recursive and execute the "git submodule" command to update
the work tree as a first step until the recursive checkout patches
are ready.

Does that make sense?

* Re: [PATCHv3 08/11] fetching submodules: respect `submodule.jobs` config option
  2015-11-04  0:37 24% ` [PATCHv3 08/11] fetching submodules: respect `submodule.jobs` config option Stefan Beller
@ 2015-11-10 22:21  7%   ` Jens Lehmann
  2015-11-10 22:29  8%     ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Jens Lehmann @ 2015-11-10 22:21 UTC
  To: Stefan Beller, git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, ericsunshine, j6t

Am 04.11.2015 um 01:37 schrieb Stefan Beller:
> This allows to configure fetching and updating in parallel
> without having the command line option.
>
> This moved the responsibility to determine how many parallel processes
> to start from builtin/fetch to submodule.c as we need a way to communicate
> "The user did not specify the number of parallel processes in the command
> line options" in the builtin fetch. The submodule code takes care of
> the precedence (CLI > config > default)
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>   Documentation/config.txt    |  7 +++++++
>   builtin/fetch.c             |  2 +-
>   submodule-config.c          | 15 +++++++++++++++
>   submodule-config.h          |  2 ++
>   submodule.c                 |  5 +++++
>   t/t5526-fetch-submodules.sh | 14 ++++++++++++++
>   6 files changed, 44 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 391a0c3..70e1b88 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2643,6 +2643,13 @@ submodule.<name>.ignore::
>   	"--ignore-submodules" option. The 'git submodule' commands are not
>   	affected by this setting.
>
> +submodule.jobs::
> +	This is used to determine how many submodules can be operated on in
> +	parallel. Specifying a positive integer allows up to that number
> +	of submodules being fetched in parallel. This is used in fetch
> +	and clone operations only. A value of 0 will give some reasonable
> +	configuration. It defaults to 1.
> +

Just curious (and sorry if this has already been discussed and I missed
it, but the volume of your output is too much for my current git time
budget ;-): While this config is for fetching only, do I recall correctly
that you have plans to do submodule work tree updates in parallel too?
If so, would it make sense to have different settings for fetching and
updating?

>   tag.sort::
>   	This variable controls the sort ordering of tags when displayed by
>   	linkgit:git-tag[1]. Without the "--sort=<value>" option provided, the
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 9cc1c9d..60e6797 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -37,7 +37,7 @@ static int prune = -1; /* unspecified */
>   static int all, append, dry_run, force, keep, multiple, update_head_ok, verbosity;
>   static int progress = -1, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
>   static int tags = TAGS_DEFAULT, unshallow, update_shallow;
> -static int max_children = 1;
> +static int max_children = -1;
>   static const char *depth;
>   static const char *upload_pack;
>   static struct strbuf default_rla = STRBUF_INIT;
> diff --git a/submodule-config.c b/submodule-config.c
> index 29e21b2..475551a 100644
> --- a/submodule-config.c
> +++ b/submodule-config.c
> @@ -32,6 +32,7 @@ enum lookup_type {
>
>   static struct submodule_cache cache;
>   static int is_cache_init;
> +static int parallel_jobs = -1;
>
>   static int config_path_cmp(const struct submodule_entry *a,
>   			   const struct submodule_entry *b,
> @@ -239,6 +240,15 @@ static int parse_generic_submodule_config(const char *key,
>   					  const char *value,
>   					  struct parse_config_parameter *me)
>   {
> +	if (!strcmp(key, "jobs")) {
> +		parallel_jobs = strtol(value, NULL, 10);
> +		if (parallel_jobs < 0) {
> +			warning("submodule.jobs not allowed to be negative.");
> +			parallel_jobs = 1;
> +			return 1;
> +		}
> +	}
> +
>   	return 0;
>   }
>
> @@ -482,3 +492,8 @@ void submodule_free(void)
>   	cache_free(&cache);
>   	is_cache_init = 0;
>   }
> +
> +int config_parallel_submodules(void)
> +{
> +	return parallel_jobs;
> +}
> diff --git a/submodule-config.h b/submodule-config.h
> index f9e2a29..d9bbf9a 100644
> --- a/submodule-config.h
> +++ b/submodule-config.h
> @@ -27,4 +27,6 @@ const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
>   		const char *path);
>   void submodule_free(void);
>
> +int config_parallel_submodules(void);
> +
>   #endif /* SUBMODULE_CONFIG_H */
> diff --git a/submodule.c b/submodule.c
> index 0257ea3..188ba02 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -752,6 +752,11 @@ int fetch_populated_submodules(const struct argv_array *options,
>   	argv_array_push(&spf.args, "--recurse-submodules-default");
>   	/* default value, "--submodule-prefix" and its value are added later */
>
> +	if (max_parallel_jobs < 0)
> +		max_parallel_jobs = config_parallel_submodules();
> +	if (max_parallel_jobs < 0)
> +		max_parallel_jobs = 1;
> +
>   	calculate_changed_submodule_paths();
>   	run_processes_parallel(max_parallel_jobs,
>   			       get_next_submodule,
> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index 1b4ce69..5c3579c 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -470,4 +470,18 @@ test_expect_success "don't fetch submodule when newly recorded commits are alrea
>   	test_i18ncmp expect.err actual.err
>   '
>
> +test_expect_success 'fetching submodules respects parallel settings' '
> +	git config fetch.recurseSubmodules true &&
> +	(
> +		cd downstream &&
> +		GIT_TRACE=$(pwd)/trace.out git fetch --jobs 7 &&
> +		grep "7 children" trace.out &&
> +		git config submodule.jobs 8 &&
> +		GIT_TRACE=$(pwd)/trace.out git fetch &&
> +		grep "8 children" trace.out &&
> +		GIT_TRACE=$(pwd)/trace.out git fetch --jobs 9 &&
> +		grep "9 children" trace.out
> +	)
> +'
> +
>   test_done
>
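
The precedence described in the commit message above (command line > config > default) can be sketched in standalone C. This is only a sketch under stated assumptions: `parse_jobs_value()` and `resolve_jobs()` are hypothetical names, not git functions, and mapping 0 to the number of online CPUs via `sysconf(_SC_NPROCESSORS_ONLN)` is an assumption about what "some reasonable configuration" means. The parser also shows one way to reject a missing, non-numeric or negative value. In git, a bare `jobs` key written without `= value` reaches config callbacks as NULL, and the `strtol(value, NULL, 10)` call in the patch does not guard against that.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: parse a submodule.jobs value.
 * Returns 0 and stores the number in *out on success, or -1 if the
 * value is missing (NULL), empty, not a plain decimal number, or negative.
 */
static int parse_jobs_value(const char *value, int *out)
{
	char *end;
	long n;

	if (!value || !*value)
		return -1;
	errno = 0;
	n = strtol(value, &end, 10);
	if (errno || *end || n < 0 || n > INT_MAX)
		return -1;
	*out = (int)n;
	return 0;
}

/*
 * Resolve the effective job count; -1 means "not specified".
 * Precedence: command line > config > default of 1.
 * 0 is mapped to the number of online CPUs (an assumption).
 */
static int resolve_jobs(int cli_jobs, int config_jobs)
{
	int n = cli_jobs;

	if (n < 0)
		n = config_jobs;
	if (n < 0)
		n = 1;
	if (n == 0) {
		long cpus = sysconf(_SC_NPROCESSORS_ONLN);
		n = cpus > 0 ? (int)cpus : 1;
	}
	return n;
}
```

The cases mirror the t5526 test in the patch: `--jobs 7` gives 7, `submodule.jobs=8` alone gives 8, and `--jobs 9` overrides a configured 8.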

* Re: [PATCHv3 08/11] fetching submodules: respect `submodule.jobs` config option
  2015-11-10 22:21  7%   ` Jens Lehmann
@ 2015-11-10 22:29  8%     ` Stefan Beller
  2015-11-11 19:55  7%       ` Jens Lehmann
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-10 22:29 UTC
  To: Jens Lehmann
  Cc: git@vger.kernel.org, Ramsay Jones, Jacob Keller, Jeff King,
	Junio C Hamano, Jonathan Nieder, Johannes Schindelin,
	Eric Sunshine, Johannes Sixt

On Tue, Nov 10, 2015 at 2:21 PM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>> +submodule.jobs::
>> +       This is used to determine how many submodules can be operated on in
>> +       parallel. Specifying a positive integer allows up to that number
>> +       of submodules being fetched in parallel. This is used in fetch
>> +       and clone operations only. A value of 0 will give some reasonable
>> +       configuration. It defaults to 1.
>> +
>
>
> Just curious (and sorry if this has already been discussed and I missed
> it, but the volume of your output is too much for my current git time
> budget ;-): While this config is for fetching only, do I recall correctly
> that you have plans to do submodule work tree updates in parallel too?
> If so, would it make sense to have different settings for fetching and
> updating?

TL;DR: checkout is serial; only network-related operations will use
submodule.jobs.

In the next series (origin/sb/submodule-parallel-update) this is reused for
fetches and clones, so only for the network operations. The checkout (like all
local operations) is still done serially, so you don't run into problems from
several processes working on the same thing at the same time.
(Checkouts may be parallelized, but I haven't done that yet and will postpone
it until this has settled a bit more.)

* Re: Allow git alias to override existing Git commands
  2015-11-10 21:57  5%   ` Jens Lehmann
@ 2015-11-10 22:49  5%     ` Stefan Beller
  2015-11-11 19:44  5%       ` Jens Lehmann
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-10 22:49 UTC
  To: Jens Lehmann
  Cc: Jeremy Morton, git@vger.kernel.org, Heiko Voigt, Junio C Hamano,
	Jonathan Nieder

On Tue, Nov 10, 2015 at 1:57 PM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
> Am 10.11.2015 um 19:12 schrieb Stefan Beller:
>>
>> On Tue, Nov 10, 2015 at 8:31 AM, Jeremy Morton <admin@game-point.net> wrote:
>>>
>>> It's recently come to my attention that the "git alias" config functionality
>>> ignores all aliases that would override existing Git commands.  This seems
>>> like a bad idea to me.
>>
>>
>> This ensures that the plumbing commands always work as expected.
>> As scripts *should* only use plumbing commands, the scripts should
>> work with high probability despite all the crazy user configuration/aliases.
>
>
> Exactly.
>
>>> For example, I wanted to setup "git clone" to automatically act as "git
>>> clone --recursive".  Sure I could do it in the shell, but it's more of a
>>> pain - any tutorial I set up about doing it would have to worry about what
>>> shell the user was using - and if you're going to make that argument, why
>>> have "git alias" at all?  It can all be done from the shell.
>>
>>
>> I think the git way for your example would be to configure git to include
>> that option by default, something like
>>
>>      git config --global submodules.recursiveClone yes
>>
>> though I was skimming through the man page of git config and did not find
>> that option there. I guess it's missing.
>
>
> We thought about adding such a config option, but I believe that would
> fall a bit short. If I want to have recursive clone I also want to init
> all those submodules appearing in later fetches too (otherwise the end
> result would depend on whether you cloned before or after a submodule
> was added upstream, which is confusing). Extra points for populating
> the submodule in my work tree when switching to a commit containing
> the new submodule.
>
> So what about a "submodule.autoupdate" config option? If set to true,
> all submodules not marked "update=none" would automatically be fetched
> and inited by fetch (and thus clone too) and then checked out (with my
> recursive update changes) in every work tree manipulating command
> (again including clone).
>
> Users who only want the submodules to be present in the work tree but
> not automagically updated could set "submodule.autoupdate=clone" to
> avoid the extra cost of updating the work tree every time they switch
> between commits. Now that Heiko's config-from-commit changes are in
> master, someone could easily add that to fetch and clone as the first
> step. We could also teach clone to make "submodule.autoupdate=true"
> imply --recursive and execute the "git submodule" command to update
> the work tree as a first step until the recursive checkout patches
> are ready.
>
> Does that make sense?

I guess.

So the repo tool has the concept of groups. I plan to add that to git
eventually, too, e.g. with a comma-separated list like:

    git clone --submodule-groups=default,x86builds,new-phone-codename

With such a new option, I would also implicitly set

    submodule.autoupdate=all

which then enables --recurse-submodules on all supported commands.

By introducing such a new submodule groups option, we don't need to tell
the users about all the new submodule options, but they can still take
advantage of them, I'd assume.

Does that make sense, too?
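
Matching the comma-separated group list proposed above could be sketched in standalone C as follows. The `--submodule-groups` option does not exist in git; `list_contains()` and `submodule_selected()` are hypothetical names, and representing a submodule's own group membership as a second comma-separated string is an assumption made only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the comma-separated `list` contains an entry exactly equal
 * to the first `item_len` bytes of `item` (case-sensitive), 0 otherwise.
 */
static int list_contains(const char *list, const char *item, size_t item_len)
{
	const char *p = list;

	while (p) {
		const char *comma = strchr(p, ',');
		size_t len = comma ? (size_t)(comma - p) : strlen(p);

		if (len == item_len && !strncmp(p, item, len))
			return 1;
		p = comma ? comma + 1 : NULL;  /* advance to the next entry */
	}
	return 0;
}

/*
 * Hypothetical selection rule: a submodule is selected if any group it
 * belongs to (`member_of`, comma-separated) was requested (`requested`,
 * e.g. the value of the proposed --submodule-groups option).
 */
static int submodule_selected(const char *requested, const char *member_of)
{
	const char *p = member_of;

	while (p) {
		const char *comma = strchr(p, ',');
		size_t len = comma ? (size_t)(comma - p) : strlen(p);

		if (len && list_contains(requested, p, len))
			return 1;
		p = comma ? comma + 1 : NULL;
	}
	return 0;
}
```

Exact, case-sensitive matching keeps a group such as "x86" from accidentally selecting "x86builds".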

* Re: What's cooking in git.git (Nov 2015, #02; Fri, 6)
  @ 2015-11-11 18:59  5% ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-11 18:59 UTC
  To: Jeff King, git@vger.kernel.org

On Fri, Nov 6, 2015 at 3:41 PM, Junio C Hamano <gitster@pobox.com> wrote:
> I'll be offline for a few weeks, and Jeff King graciously agreed to
> help shepherd the project forward in the meantime as an interim
> maintainer.  Please be gentle.
>

Jeff,
gently asking where I can find our interim maintainer's tree. :)

> * sb/submodule-parallel-update (2015-11-05) 10 commits
>  - clone: allow an explicit argument for parallel submodule clones
>  - submodule update: expose parallelism to the user
>  - git submodule update: have a dedicated helper for cloning
>  - fetching submodules: respect `submodule.jobs` config option
>  - submodule config: update parse_config()
>  - submodule config: remove name_and_item_from_var
>  - submodule config: keep update strategy around
>  - run_processes_parallel: add output to tracing messages
>  - Merge branch 'rs/daemon-plug-child-leak' into sb/submodule-parallel-update
>  - Merge branch 'sb/submodule-parallel-fetch' into sb/submodule-parallel-update
>  (this branch uses sb/submodule-parallel-fetch.)
>
>  Builds on top of the "fetch --recurse-submodules" work to introduce
>  parallel downloading into multiple submodules for "submodule update".
>
>  Waiting for sb/submodule-parallel-fetch to stabilize.
>
>  It would be the cleanest to rebuild sb/submodule-parallel-fetch on
>  top of 2.7.0 once it ships and then build this directly on top;
>  that way, we do not have to have merges in this topic that
>  distracting (besides, some part of the other topic can be updated
>  in-place instead of this follow-up topic tweaking them as past
>  mistakes and inflexibilities).

Ok, I can do that. I am stalling on sb/submodule-parallel-update
until we all agree that sb/submodule-parallel-fetch is solid.

>
> * sb/submodule-parallel-fetch (2015-11-05) 16 commits
>  - strbuf: update documentation for strbuf_read_once()
>  - run-command: remove set_nonblocking()
>   (merged to 'next' on 2015-10-23 at 8f04bbd)
>  + run-command: fix missing output from late callbacks
>  + test-run-command: increase test coverage
>  + test-run-command: test for gracefully aborting
>  + run-command: initialize the shutdown flag
>  + run-command: clear leftover state from child_process structure
>  + run-command: fix early shutdown
>   (merged to 'next' on 2015-10-15 at df63590)
>  + submodules: allow parallel fetching, add tests and documentation
>  + fetch_populated_submodules: use new parallel job processing
>  + run-command: add an asynchronous parallel child processor
>  + sigchain: add command to pop all common signals
>  + strbuf: add strbuf_read_once to read without blocking
>  + xread_nonblock: add functionality to read from fds without blocking
>  + xread: poll on non blocking fds
>  + submodule.c: write "Fetching submodule <foo>" to stderr
>  (this branch is used by sb/submodule-parallel-update.)
>
>  Add a framework to spawn a group of processes in parallel, and use
>  it to run "git fetch --recurse-submodules" in parallel.
>
>  Still being worked on, but it seems that we are seeing light at the
>  end of the tunnel.
>  ($gmane/280937)
>

($gmane/280937) is represented by
  - strbuf: update documentation for strbuf_read_once()
  - run-command: remove set_nonblocking()

So IMHO we're as solid as required for sb/submodule-parallel-update.

I am not sure whether the rebuild on top of 2.7.0 should be a completely
new series that doesn't even mention O_NONBLOCK (squashing or reordering
some patches), or whether we want to keep the history around, so that it
is easier to follow the development in the future if some bugs show up.

* Re: [RFC] Clone repositories recursive with depth 1
  @ 2015-11-11 19:19  5% ` Stefan Beller
  2015-11-11 20:09  7%   ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-11 19:19 UTC
  To: Lars Schneider; +Cc: Git List

On Wed, Nov 11, 2015 at 6:09 AM, Lars Schneider
<larsxschneider@gmail.com> wrote:
> Hi,
>
> I have a clean build machine and I want to clone my source code to this machine while transferring only the minimal necessary amount of data. Therefore I use this command:
>
> git clone --recursive --depth 1 --single-branch <url>

That *should* work, actually.
However, looking at the code, it does not.

citing from builtin/clone.c:

    static struct option builtin_clone_options[] = {
        ...
        OPT_BOOL(0, "recursive", &option_recursive,
           N_("initialize submodules in the clone")),
        OPT_BOOL(0, "recurse-submodules", &option_recursive,
          N_("initialize submodules in the clone")),
        ...
    };
    ...
    static const char *argv_submodule[] = {
        "submodule", "update", "--init", "--recursive", NULL
    };

    if (!err && option_recursive)
        err = run_command_v_opt(argv_submodule, RUN_GIT_CMD);

So the --depth argument is not passed on, although "git submodule update"
definitely supports --depth.

In an upcoming series (next version of origin/sb/submodule-parallel-update),
this will change slightly, so that it will be even easier to add the depth
argument there, as we construct the argument list in code instead of
hard-coding argv_submodule.

This may require some discussion about whether you expect --depth to be
applied recursively. (What if you only want the top level to be shallow?
What if you want only the submodules to be shallow? What is the user
expectation here?)
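
Constructing the argument list in code, as described above, could look like the following standalone sketch. `build_submodule_argv()` and `MAX_SUBMODULE_ARGS` are hypothetical; git itself would more likely use its `argv_array` helpers. The sketch simply forwards a `--depth` given to clone, which sidesteps rather than answers the open question of what users expect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Room for the fixed arguments, an optional "--depth <n>", and NULL. */
#define MAX_SUBMODULE_ARGS 8

/*
 * Hypothetical: fill `argv` (at least MAX_SUBMODULE_ARGS entries) with the
 * arguments for "git submodule update", forwarding clone's --depth when
 * one was given (`depth` is NULL otherwise). Returns the argument count;
 * the list is NULL-terminated like the argv_submodule[] array it replaces.
 */
static int build_submodule_argv(const char **argv, const char *depth)
{
	int argc = 0;

	argv[argc++] = "submodule";
	argv[argc++] = "update";
	argv[argc++] = "--init";
	argv[argc++] = "--recursive";
	if (depth) {
		argv[argc++] = "--depth";
		argv[argc++] = depth;
	}
	argv[argc] = NULL;
	return argc;
}
```

The resulting array could then be handed to `run_command_v_opt(argv, RUN_GIT_CMD)` the same way the fixed array is in the quoted clone code.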

>
> Apparently this does not clone the submodules with "--depth 1" (using Git 2.4.9). As a workaround I tried:
>
> git clone --depth 1 --single-branch <url>
> cd <repo-name>
> git submodule update --init --recursive --depth 1
>
> However, this does not work either as I get:
> fatal: reference is not a tree: <correct sha1 of the submodule referenced by the main project>
> Unable to checkout <correct sha1 of the submodule referenced by the main project> in submodule path <submodule path>

That seems like another bug to me.

I just tried to clone a project and populate the submodules later, and
it works as expected without these error messages.
(I am running some kind of xxx.dirty development version, most likely
origin/sb/submodule-parallel-update; I'll check some other versions, too.)

* Re: Allow git alias to override existing Git commands
  2015-11-10 22:49  5%     ` Stefan Beller
@ 2015-11-11 19:44  5%       ` Jens Lehmann
  0 siblings, 0 replies; 200+ results
From: Jens Lehmann @ 2015-11-11 19:44 UTC
  To: Stefan Beller
  Cc: Jeremy Morton, git@vger.kernel.org, Heiko Voigt, Junio C Hamano,
	Jonathan Nieder

Am 10.11.2015 um 23:49 schrieb Stefan Beller:
> On Tue, Nov 10, 2015 at 1:57 PM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>> Am 10.11.2015 um 19:12 schrieb Stefan Beller:
>>> On Tue, Nov 10, 2015 at 8:31 AM, Jeremy Morton <admin@game-point.net>
>>>> For example, I wanted to setup "git clone" to automatically act as "git
>>>> clone --recursive".  Sure I could do it in the shell, but it's more of a
>>>> pain - any tutorial I set up about doing it would have to worry about what
>>>> shell the user was using - and if you're going to make that argument, why
>>>> have "git alias" at all?  It can all be done from the shell.
>>>
>>>
>>> I think the git way for your example would be to configure git to include
>>> that option by default, something like
>>>
>>>       git config --global submodules.recursiveClone yes
>>>
>>> though I was skimming through the man page of git config and did not find
>>> that option there. I guess it's missing.
>>
>>
>> We thought about adding such a config option, but I believe that would
>> fall a bit short. If I want to have recursive clone I also want to init
>> all those submodules appearing in later fetches too (otherwise the end
>> result would depend on whether you cloned before or after a submodule
>> was added upstream, which is confusing). Extra points for populating
>> the submodule in my work tree when switching to a commit containing
>> the new submodule.
>>
>> So what about a "submodule.autoupdate" config option? If set to true,
>> all submodules not marked "update=none" would automatically be fetched
>> and inited by fetch (and thus clone too) and then checked out (with my
>> recursive update changes) in every work tree manipulating command
>> (again including clone).
>>
>> Users who only want the submodules to be present in the work tree but
>> not automagically updated could set "submodule.autoupdate=clone" to
>> avoid the extra cost of updating the work tree every time they switch
>> between commits. Now that Heiko's config-from-commit changes are in
>> master, someone could easily add that to fetch and clone as the first
>> step. We could also teach clone to make "submodule.autoupdate=true"
>> imply --recursive and execute the "git submodule" command to update
>> the work tree as a first step until the recursive checkout patches
>> are ready.
>>
>> Does that make sense?
>
> I guess.
>
> So the repo tool has the concept of groups. I plan to add that to git
> eventually, too, e.g. with a comma-separated list like:
>
>      git clone --submodule-groups=default,x86builds,new-phone-codename
>
> With such a new option, I would also implicitly set
>
>      submodule.autoupdate=all
>
> which then enables --recurse-submodules on all supported commands.

And then only submodules contained in these groups would be cloned,
automatically initialized (including those being added to a group by
upstream in the future) and their work trees updated every time the
superproject commit changes? And all submodules that aren't part of
any of these groups would be skipped and neither downloaded nor
updated? Sounds good.

But I'd rather use

     submodule.autoupdate=groups

for that use case. I expect "all" to really mean all submodules,
not only those contained in the selected groups.
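
The `submodule.autoupdate` values discussed in this thread ("true", "clone" and "groups") could be modeled roughly as below. This is a hypothetical sketch of a proposal, not an existing git option. The enum, the function names, treating a bare `autoupdate` key as "true", and honoring `update=none` in every mode are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical values for the proposed submodule.autoupdate option. */
enum autoupdate_mode {
	AUTOUPDATE_INVALID = -1,
	AUTOUPDATE_OFF,    /* "false": current behavior */
	AUTOUPDATE_ON,     /* "true": fetch, init and check out every submodule */
	AUTOUPDATE_CLONE,  /* "clone": populate, but don't update on checkout */
	AUTOUPDATE_GROUPS  /* "groups": like "true", but only selected groups */
};

static enum autoupdate_mode parse_autoupdate(const char *value)
{
	static const struct {
		const char *name;
		enum autoupdate_mode mode;
	} map[] = {
		{ "false", AUTOUPDATE_OFF },
		{ "true", AUTOUPDATE_ON },
		{ "clone", AUTOUPDATE_CLONE },
		{ "groups", AUTOUPDATE_GROUPS },
	};
	size_t i;

	if (!value) /* bare "autoupdate" key: treated as "true" (assumption) */
		return AUTOUPDATE_ON;
	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (!strcmp(value, map[i].name))
			return map[i].mode;
	return AUTOUPDATE_INVALID;
}

/* Should fetch/clone initialize and populate this submodule? */
static int populate_on_fetch(enum autoupdate_mode mode, int in_group, int update_none)
{
	if (update_none) /* submodules marked update=none are always skipped */
		return 0;
	switch (mode) {
	case AUTOUPDATE_ON:
	case AUTOUPDATE_CLONE:
		return 1;
	case AUTOUPDATE_GROUPS:
		return in_group;
	default:
		return 0;
	}
}

/* Should switching superproject commits also update its work tree? */
static int update_on_checkout(enum autoupdate_mode mode, int in_group, int update_none)
{
	if (update_none || mode == AUTOUPDATE_CLONE)
		return 0;
	return populate_on_fetch(mode, in_group, update_none);
}
```

Splitting "populate" from "update on checkout" is what lets "clone" avoid the extra work tree updates while still producing a fully populated clone.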

> By introducing such a new submodule groups option, we don't need to tell
> the users about all the new submodule options, but they can still take
> advantage of them, I'd assume.
>
> Does that make sense, too?

Yup.

* Re: git clone --recursive should run git submodule update with flag --remote
  @ 2015-11-11 19:48  7% ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-11 19:48 UTC
  To: Stanislav; +Cc: git@vger.kernel.org

On Wed, Nov 11, 2015 at 4:46 AM, Stanislav <s.seletskiy@gmail.com> wrote:
> Consider two repositories, A and B.
>
> Repo A is embedded into B by using submodule:
>
>   git submodule add -b master <url-to-A> sub-a
>
> So, submodule sub-a is set to track master branch of the repo A.
>
> Running git submodule update --remote inside repo B will automatically
> update and checkout submodule sub-a to the latest master commit (as expected).
>
> However, when using git clone --recursive <url-to-B>, repo B will be cloned
> with submodule A checked out at the commit which was recorded by the git
> submodule add command, not the master commit.
>
> Expected behaviour is to automatically update checkout commit pointed by
> branch, that was specified by -b flag in the git submodule add invocation.

To achieve what you want to do, you can first clone B and then
do a `git submodule update` using the --remote option yourself.
That is cumbersome however, I agree.

>
> Reason for this behaviour is that line:
>
>   https://github.com/git/git/blob/master/builtin/clone.c#L99
>
> I guess, it should be changed to include --remote flag.

I guess we could tweak the behavior to include the --remote flag
when the branch is recorded in the .gitmodules file.

Just wondering what should happen if there are both a sha1
and a branch recorded?

* Re: [PATCHv3 08/11] fetching submodules: respect `submodule.jobs` config option
  2015-11-10 22:29  8%     ` Stefan Beller
@ 2015-11-11 19:55  7%       ` Jens Lehmann
  2015-11-11 23:34  8%         ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Jens Lehmann @ 2015-11-11 19:55 UTC
  To: Stefan Beller
  Cc: git@vger.kernel.org, Ramsay Jones, Jacob Keller, Jeff King,
	Junio C Hamano, Jonathan Nieder, Johannes Schindelin,
	Eric Sunshine, Johannes Sixt

Am 10.11.2015 um 23:29 schrieb Stefan Beller:
> On Tue, Nov 10, 2015 at 2:21 PM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>>> +submodule.jobs::
>>> +       This is used to determine how many submodules can be operated on in
>>> +       parallel. Specifying a positive integer allows up to that number
>>> +       of submodules being fetched in parallel. This is used in fetch
>>> +       and clone operations only. A value of 0 will give some reasonable
>>> +       configuration. It defaults to 1.
>>> +
>>
>>
>> Just curious (and sorry if this has already been discussed and I missed
>> it, but the volume of your output is too much for my current git time
>> budget ;-): While this config is for fetching only, do I recall correctly
>> that you have plans to do submodule work tree updates in parallel too?
>> If so, would it make sense to have different settings for fetching and
>> updating?
>
> TL;DR: checkout is serial; only network-related operations will use
> submodule.jobs.

My point being: isn't "jobs" a bit too generic for a config option that
is only relevant for network-related stuff? Maybe "submodule.fetchJobs"
or similar would be better, as you are already thinking about adding
other parallelisms with different constraints later?

> In the next series (origin/sb/submodule-parallel-update) this is reused for
> fetches and clones, so only for the network operations. The checkout (like all
> local operations) is still done serially, so you don't run into problems from
> several processes working on the same thing at the same time.
> (Checkouts may be parallelized, but I haven't done that yet and will postpone
> it until this has settled a bit more.)

Makes sense.

* Re: [RFC] Clone repositories recursive with depth 1
  2015-11-11 19:19  5% ` Stefan Beller
@ 2015-11-11 20:09  7%   ` Stefan Beller
  2015-11-12  9:39  2%     ` Lars Schneider
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-11 20:09 UTC
  To: Lars Schneider; +Cc: Git List

On Wed, Nov 11, 2015 at 11:19 AM, Stefan Beller <sbeller@google.com> wrote:
> On Wed, Nov 11, 2015 at 6:09 AM, Lars Schneider
> <larsxschneider@gmail.com> wrote:
>> Hi,
>>
>> I have a clean build machine and I want to clone my source code to this machine while transferring only the minimal necessary amount of data. Therefore I use this command:
>>
>> git clone --recursive --depth 1 --single-branch <url>
>
> That *should* work, actually.
> However, looking at the code, it does not.
>
> citing from builtin/clone.c:
>
>     static struct option builtin_clone_options[] = {
>         ...
>         OPT_BOOL(0, "recursive", &option_recursive,
>            N_("initialize submodules in the clone")),
>         OPT_BOOL(0, "recurse-submodules", &option_recursive,
>           N_("initialize submodules in the clone")),
>         ...
>     };
>     ...
>     static const char *argv_submodule[] = {
>         "submodule", "update", "--init", "--recursive", NULL
>     };
>
>     if (!err && option_recursive)
>         err = run_command_v_opt(argv_submodule, RUN_GIT_CMD);
>
> So the --depth argument is not passed on, although "git submodule update"
> definitely supports --depth.
>
> In an upcoming series (next version of origin/sb/submodule-parallel-update),
> this will change slightly, so that it will be even easier to add the depth
> argument there, as we construct the argument list in code instead of
> hard-coding argv_submodule.
>
> This may require some discussion about whether you expect --depth to be
> applied recursively. (What if you only want the top level to be shallow?
> What if you want only the submodules to be shallow? What is the user
> expectation here?)
>
>>
>> Apparently this does not clone the submodules with "--depth 1" (using Git 2.4.9). As a workaround I tried:
>>
>> git clone --depth 1 --single-branch <url>
>> cd <repo-name>
>> git submodule update --init --recursive --depth 1
>>

The workaround works with the origin/master version for me.

Note the other email thread, which suggests including --remote in the
call to git submodule update when a branch option is present in the
.gitmodules file.

* [PATCH v2] run-command: detect finished children by closed pipe rather than waitpid
@ 2015-11-11 20:39  7% Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-11 20:39 UTC
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, j6t,
	Stefan Beller

Detect whether a child has finished by checking if its stderr pipe
was closed, instead of checking its state with waitpid.
As waitpid does not fully work on Windows, this approach allows for
better cross-platform operation. (It's less code, too.)

Previously we did not close the read end of the pipe of finished
children; now we do.

The old way missed some messages on an early abort: we just killed the
children and did not bother to look at what was left over. With this
approach we send a signal to the children and wait for them to close the
pipe, so that we get all the messages (including possible "killed by
signal 15" messages).

To keep the test suite passing as before, we now allow for a truly
graceful abort. If the user wishes to abort parallel execution, they
need to either provide the signal used to kill all children, or let the
children run until they finish normally.
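
The mechanism described above can be illustrated with a standalone POSIX sketch. The parent treats EOF on a child's stderr pipe as the signal that the child has finished, and reaps the child with waitpid() only afterwards. This is not the code from run-command.c: `run_and_collect()` is a hypothetical name, the sketch runs a single child, and it relies on fork(), so it does not run on Windows, the platform that motivated the change.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run one child that writes `msg` to a pipe standing in for its stderr,
 * collect that output into `buf`, and treat EOF on the pipe as "child
 * finished". Only then is the child reaped with waitpid(). Returns the
 * number of bytes collected (not NUL-terminated) or -1 on error; the
 * exit status is stored in *status.
 */
static ssize_t run_and_collect(const char *msg, char *buf, size_t size, int *status)
{
	int fds[2];
	size_t used = 0;
	pid_t pid;

	if (pipe(fds) < 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		/* child: write the message, then exit, which closes fds[1] */
		close(fds[0]);
		_exit(write(fds[1], msg, strlen(msg)) < 0 ? 1 : 0);
	}
	/* parent: drop the write end, or EOF would never be seen */
	close(fds[1]);

	/* drain until EOF (or until buf is full) */
	while (used < size) {
		struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
		ssize_t n;

		if (poll(&pfd, 1, -1) < 0) {
			if (errno == EINTR)
				continue;
			break;
		}
		n = read(fds[0], buf + used, size - used);
		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)  /* 0 = EOF: every writer closed, the child is done */
			break;
		used += (size_t)n;
	}
	close(fds[0]);
	if (waitpid(pid, status, 0) < 0)
		return -1;
	return (ssize_t)used;
}
```

poll() reports POLLHUP in `revents` even when it is not requested in `events`, but read() returning 0 is what signals completion in this sketch. For a graceful abort in the same style, one would send `kill(pid, SIGTERM)` and keep draining the pipe until read() returns 0, so any final messages from the child are still collected.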

Signed-off-by: Stefan Beller <sbeller@google.com>
---

Hi Torsten, Johannes, Jeff,

changes since v1:

* prefixed the child enum states with GIT_CP_.
* child_process_deinit(&pp->children[i].process); after the finish cmd is gone.
* I did not address signals other than SIGTERM; this may still be an
  issue (although no caller passes them in, so I guess it's OK for now).
  
This applies on top of origin/sb/submodule-parallel-fetch
(6f963a895a97d720, strbuf: update documentation for strbuf_read_once())

Thanks,
Stefan

 run-command.c      | 141 +++++++++++++++++++++++------------------------------
 run-command.h      |  12 +++--
 submodule.c        |   3 --
 test-run-command.c |   3 --
 4 files changed, 69 insertions(+), 90 deletions(-)

diff --git a/run-command.c b/run-command.c
index 07424e9..db4d916 100644
--- a/run-command.c
+++ b/run-command.c
@@ -858,6 +858,12 @@ int capture_command(struct child_process *cmd, struct strbuf *buf, size_t hint)
 	return finish_command(cmd);
 }
 
+enum child_state {
+	GIT_CP_FREE,
+	GIT_CP_WORKING,
+	GIT_CP_WAIT_CLEANUP,
+};
+
 static struct parallel_processes {
 	void *data;
 
@@ -869,7 +875,7 @@ static struct parallel_processes {
 	task_finished_fn task_finished;
 
 	struct {
-		unsigned in_use : 1;
+		enum child_state state;
 		struct child_process process;
 		struct strbuf err;
 		void *data;
@@ -923,7 +929,7 @@ static void kill_children(struct parallel_processes *pp, int signo)
 	int i, n = pp->max_processes;
 
 	for (i = 0; i < n; i++)
-		if (pp->children[i].in_use)
+		if (pp->children[i].state == GIT_CP_WORKING)
 			kill(pp->children[i].process.pid, signo);
 }
 
@@ -967,7 +973,7 @@ static struct parallel_processes *pp_init(int n,
 	for (i = 0; i < n; i++) {
 		strbuf_init(&pp->children[i].err, 0);
 		child_process_init(&pp->children[i].process);
-		pp->pfd[i].events = POLLIN;
+		pp->pfd[i].events = POLLIN | POLLHUP;
 		pp->pfd[i].fd = -1;
 	}
 	sigchain_push_common(handle_children_on_signal);
@@ -1000,39 +1006,46 @@ static void pp_cleanup(struct parallel_processes *pp)
  *  0 if a new task was started.
  *  1 if no new jobs was started (get_next_task ran out of work, non critical
  *    problem with starting a new command)
- * -1 no new job was started, user wishes to shutdown early.
+ * <0 no new job was started, user wishes to shutdown early. Use negative code
+ *    to signal the children.
  */
 static int pp_start_one(struct parallel_processes *pp)
 {
-	int i;
+	int i, code;
 
 	for (i = 0; i < pp->max_processes; i++)
-		if (!pp->children[i].in_use)
+		if (pp->children[i].state == GIT_CP_FREE)
 			break;
 	if (i == pp->max_processes)
 		die("BUG: bookkeeping is hard");
 
-	if (!pp->get_next_task(&pp->children[i].data,
-			       &pp->children[i].process,
-			       &pp->children[i].err,
-			       pp->data)) {
+	code = pp->get_next_task(&pp->children[i].data,
+				 &pp->children[i].process,
+				 &pp->children[i].err,
+				 pp->data);
+	if (!code) {
 		strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
 		strbuf_reset(&pp->children[i].err);
 		return 1;
 	}
+	pp->children[i].process.err = -1;
+	pp->children[i].process.stdout_to_stderr = 1;
+	pp->children[i].process.no_stdin = 1;
 
 	if (start_command(&pp->children[i].process)) {
-		int code = pp->start_failure(&pp->children[i].process,
-					     &pp->children[i].err,
-					     pp->data,
-					     &pp->children[i].data);
+		code = pp->start_failure(&pp->children[i].process,
+					 &pp->children[i].err,
+					 pp->data,
+					 &pp->children[i].data);
 		strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
 		strbuf_reset(&pp->children[i].err);
-		return code ? -1 : 1;
+		if (code)
+			pp->shutdown = 1;
+		return code;
 	}
 
 	pp->nr_processes++;
-	pp->children[i].in_use = 1;
+	pp->children[i].state = GIT_CP_WORKING;
 	pp->pfd[i].fd = pp->children[i].process.err;
 	return 0;
 }
@@ -1050,19 +1063,24 @@ static void pp_buffer_stderr(struct parallel_processes *pp, int output_timeout)
 
 	/* Buffer output from all pipes. */
 	for (i = 0; i < pp->max_processes; i++) {
-		if (pp->children[i].in_use &&
-		    pp->pfd[i].revents & POLLIN)
-			if (strbuf_read_once(&pp->children[i].err,
-					     pp->children[i].process.err, 0) < 0)
+		if (pp->children[i].state == GIT_CP_WORKING &&
+		    pp->pfd[i].revents & (POLLIN | POLLHUP)) {
+			int n = strbuf_read_once(&pp->children[i].err,
+						 pp->children[i].process.err, 0);
+			if (n == 0) {
+				close(pp->children[i].process.err);
+				pp->children[i].state = GIT_CP_WAIT_CLEANUP;
+			} else if (n < 0)
 				if (errno != EAGAIN)
 					die_errno("read");
+		}
 	}
 }
 
 static void pp_output(struct parallel_processes *pp)
 {
 	int i = pp->output_owner;
-	if (pp->children[i].in_use &&
+	if (pp->children[i].state == GIT_CP_WORKING &&
 	    pp->children[i].err.len) {
 		fputs(pp->children[i].err.buf, stderr);
 		strbuf_reset(&pp->children[i].err);
@@ -1071,70 +1089,31 @@ static void pp_output(struct parallel_processes *pp)
 
 static int pp_collect_finished(struct parallel_processes *pp)
 {
-	int i = 0;
-	pid_t pid;
-	int wait_status, code;
+	int i, code;
 	int n = pp->max_processes;
 	int result = 0;
 
 	while (pp->nr_processes > 0) {
-		pid = waitpid(-1, &wait_status, WNOHANG);
-		if (pid == 0)
-			break;
-
-		if (pid < 0)
-			die_errno("wait");
-
 		for (i = 0; i < pp->max_processes; i++)
-			if (pp->children[i].in_use &&
-			    pid == pp->children[i].process.pid)
+			if (pp->children[i].state == GIT_CP_WAIT_CLEANUP)
 				break;
 		if (i == pp->max_processes)
-			die("BUG: found a child process we were not aware of");
-
-		if (strbuf_read(&pp->children[i].err,
-				pp->children[i].process.err, 0) < 0)
-			die_errno("strbuf_read");
-
-		if (WIFSIGNALED(wait_status)) {
-			code = WTERMSIG(wait_status);
-			if (!pp->shutdown &&
-			    code != SIGINT && code != SIGQUIT)
-				strbuf_addf(&pp->children[i].err,
-					    "%s died of signal %d",
-					    pp->children[i].process.argv[0],
-					    code);
-			/*
-			 * This return value is chosen so that code & 0xff
-			 * mimics the exit code that a POSIX shell would report for
-			 * a program that died from this signal.
-			 */
-			code += 128;
-		} else if (WIFEXITED(wait_status)) {
-			code = WEXITSTATUS(wait_status);
-			/*
-			 * Convert special exit code when execvp failed.
-			 */
-			if (code == 127) {
-				code = -1;
-				errno = ENOENT;
-			}
-		} else {
-			strbuf_addf(&pp->children[i].err,
-				    "waitpid is confused (%s)",
-				    pp->children[i].process.argv[0]);
-			code = -1;
-		}
+			break;
+
+		code = finish_command(&pp->children[i].process);
+
+		code = pp->task_finished(code, &pp->children[i].process,
+					 &pp->children[i].err, pp->data,
+					 &pp->children[i].data);
 
-		if (pp->task_finished(code, &pp->children[i].process,
-				      &pp->children[i].err, pp->data,
-				      &pp->children[i].data))
-			result = 1;
+		if (code)
+			result = code;
+		if (code < 0)
+			break;
 
 		pp->nr_processes--;
-		pp->children[i].in_use = 0;
+		pp->children[i].state = GIT_CP_FREE;
 		pp->pfd[i].fd = -1;
-		child_process_deinit(&pp->children[i].process);
 		child_process_init(&pp->children[i].process);
 
 		if (i != pp->output_owner) {
@@ -1157,7 +1136,7 @@ static int pp_collect_finished(struct parallel_processes *pp)
 			 * running process time.
 			 */
 			for (i = 0; i < n; i++)
-				if (pp->children[(pp->output_owner + i) % n].in_use)
+				if (pp->children[(pp->output_owner + i) % n].state == GIT_CP_WORKING)
 					break;
 			pp->output_owner = (pp->output_owner + i) % n;
 		}
@@ -1171,7 +1150,7 @@ int run_processes_parallel(int n,
 			   task_finished_fn task_finished,
 			   void *pp_cb)
 {
-	int i;
+	int i, code;
 	int output_timeout = 100;
 	int spawn_cap = 4;
 	struct parallel_processes *pp;
@@ -1182,12 +1161,12 @@ int run_processes_parallel(int n,
 		    i < spawn_cap && !pp->shutdown &&
 		    pp->nr_processes < pp->max_processes;
 		    i++) {
-			int code = pp_start_one(pp);
+			code = pp_start_one(pp);
 			if (!code)
 				continue;
 			if (code < 0) {
 				pp->shutdown = 1;
-				kill_children(pp, SIGTERM);
+				kill_children(pp, -code);
 			}
 			break;
 		}
@@ -1195,9 +1174,11 @@ int run_processes_parallel(int n,
 			break;
 		pp_buffer_stderr(pp, output_timeout);
 		pp_output(pp);
-		if (pp_collect_finished(pp)) {
-			kill_children(pp, SIGTERM);
+		code = pp_collect_finished(pp);
+		if (code) {
 			pp->shutdown = 1;
+			if (code < 0)
+				kill_children(pp, -code);
 		}
 	}
 
diff --git a/run-command.h b/run-command.h
index c24aa54..414cc81 100644
--- a/run-command.h
+++ b/run-command.h
@@ -134,6 +134,8 @@ int finish_async(struct async *async);
  *
  * Return 1 if the next child is ready to run.
  * Return 0 if there are currently no more tasks to be processed.
+ * To send a signal to other child processes for abortion,
+ * return negative signal code.
  */
 typedef int (*get_next_task_fn)(void **pp_task_cb,
 				struct child_process *cp,
@@ -151,8 +153,9 @@ typedef int (*get_next_task_fn)(void **pp_task_cb,
  * pp_cb is the callback cookie as passed into run_processes_parallel,
  * pp_task_cb is the callback cookie as passed into get_next_task_fn.
  *
- * Return 0 to continue the parallel processing. To abort gracefully,
- * return non zero.
+ * Return 0 to continue the parallel processing. To abort return non zero.
+ * To send a signal to other child processes for abortion, return
+ * negative signal code.
  */
 typedef int (*start_failure_fn)(struct child_process *cp,
 				struct strbuf *err,
@@ -169,8 +172,9 @@ typedef int (*start_failure_fn)(struct child_process *cp,
  * pp_cb is the callback cookie as passed into run_processes_parallel,
  * pp_task_cb is the callback cookie as passed into get_next_task_fn.
  *
- * Return 0 to continue the parallel processing. To abort gracefully,
- * return non zero.
+ * Return 0 to continue the parallel processing.  To abort return non zero.
+ * To send a signal to other child processes for abortion, return
+ * negative signal code.
  */
 typedef int (*task_finished_fn)(int result,
 				struct child_process *cp,
diff --git a/submodule.c b/submodule.c
index c21b265..281bccd 100644
--- a/submodule.c
+++ b/submodule.c
@@ -689,9 +689,6 @@ static int get_next_submodule(void **task_cb, struct child_process *cp,
 			cp->dir = strbuf_detach(&submodule_path, NULL);
 			cp->env = local_repo_env;
 			cp->git_cmd = 1;
-			cp->no_stdin = 1;
-			cp->stdout_to_stderr = 1;
-			cp->err = -1;
 			if (!spf->quiet)
 				strbuf_addf(err, "Fetching submodule %s%s\n",
 					    spf->prefix, ce->name);
diff --git a/test-run-command.c b/test-run-command.c
index 13e5d44..b1f04d1 100644
--- a/test-run-command.c
+++ b/test-run-command.c
@@ -26,9 +26,6 @@ static int parallel_next(void** task_cb,
 		return 0;
 
 	argv_array_pushv(&cp->args, d->argv);
-	cp->stdout_to_stderr = 1;
-	cp->no_stdin = 1;
-	cp->err = -1;
 	strbuf_addf(err, "preloaded output of a child\n");
 	number_callbacks++;
 	return 1;
-- 
2.6.3.368.gf34be46
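
For readers following the bookkeeping change: the patch replaces the single in_use bit with a three-state lifecycle per slot. A slot is GIT_CP_FREE until pp_start_one() starts a command in it. It is GIT_CP_WORKING while pp_buffer_stderr() reads its stderr pipe, and GIT_CP_WAIT_CLEANUP once strbuf_read_once() reports EOF. After that, pp_collect_finished() calls finish_command() and frees the slot. Callbacks may also return a negative signal number to ask for the remaining children to be killed with that signal. The standalone C sketch below models only that bookkeeping. The slot_start(), slot_read(), slot_collect() and abort_signal() helpers are hypothetical names, no processes are spawned, and this is not the code in run-command.c.

```c
#include <assert.h>
#include <signal.h>

/* Same three states as the patch's enum child_state. */
enum child_state {
	GIT_CP_FREE,
	GIT_CP_WORKING,
	GIT_CP_WAIT_CLEANUP,
};

/* Hypothetical: a free slot was picked and the command started. */
static int slot_start(enum child_state *s)
{
	if (*s != GIT_CP_FREE)
		return -1;
	*s = GIT_CP_WORKING;
	return 0;
}

/*
 * Hypothetical: n bytes were read from the child's stderr pipe.
 * n == 0 means EOF, so the slot now waits to be reaped.
 */
static void slot_read(enum child_state *s, int n)
{
	if (*s == GIT_CP_WORKING && n == 0)
		*s = GIT_CP_WAIT_CLEANUP;
}

/* Hypothetical: the child was reaped; the slot can be reused. */
static int slot_collect(enum child_state *s)
{
	if (*s != GIT_CP_WAIT_CLEANUP)
		return -1;
	*s = GIT_CP_FREE;
	return 0;
}

/*
 * Callback return convention described in the patch: 0 continues,
 * a positive value stops starting new tasks, and a negative value
 * also asks for running children to be killed with signal -code.
 * Returns the signal to send, or 0 for none.
 */
static int abort_signal(int code)
{
	return code < 0 ? -code : 0;
}
```

The point of the extra state is that a child is only reaped after its pipe has been drained to EOF. That is what lets the patch drop the waitpid(-1, ...) loop in favour of finish_command().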

* Re: [PATCHv3 08/11] fetching submodules: respect `submodule.jobs` config option
  2015-11-11 19:55  7%       ` Jens Lehmann
@ 2015-11-11 23:34  8%         ` Stefan Beller
  2015-11-13 20:47  8%           ` Jens Lehmann
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-11 23:34 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: git@vger.kernel.org, Ramsay Jones, Jacob Keller, Jeff King,
	Junio C Hamano, Jonathan Nieder, Johannes Schindelin,
	Eric Sunshine, Johannes Sixt

On Wed, Nov 11, 2015 at 11:55 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>>
>> TL;DR: checkout is serial, network-related stuff only will be using
>> submodule.jobs
>
>
> My point being: isn't "jobs" a bit too generic for a config option that
> is only relevant for network-related stuff? Maybe "submodule.fetchJobs"
> or similar would be better, as you are already thinking about adding
> other parallelisms with different constraints later?

Actually I don't think that far ahead.

(I assume the network to be the bottleneck for clone/fetch operations.)
All I want is a network that is saturated all the time. The native git
protocol doesn't provide that: TCP startup takes time until full bandwidth
is reached, and there are local operations on both client and server. So I
added the parallel machinery to 'smear' the network traffic of different
submodules along the timeline. That gives a better approximation of an
always fully saturated link for the whole operation. In the long term, we
may want to reuse an http/ssh session for a different submodule, possibly
interleaving the different submodules on the wire to make it even faster,
though that may not help much.

So we're back at bikeshedding about the name. submodule.fetchJobs
sounds like it only applies to fetching; do you think it's sufficient for
clone as well?

Once upon a time, Junio used 'submodule.fetchParallel' or 'submodule.paralle'
in a discussion[1] to distinguish the local and networked operations.
[1] Discussing "[PATCH] Add fetch.recurseSubmoduleParallelism config option"

How about submodules.parallelNetwork for the networking part and
submodules.parallelLocal for the local part? (I don't expect to implement
parallelLocal in the next few weeks.)

* Re: [RFC] Clone repositories recursive with depth 1
  2015-11-11 20:09  7%   ` Stefan Beller
@ 2015-11-12  9:39  2%     ` Lars Schneider
  2015-11-12 23:47  5%       ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Lars Schneider @ 2015-11-12  9:39 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git List


On 11 Nov 2015, at 21:09, Stefan Beller <sbeller@google.com> wrote:

> On Wed, Nov 11, 2015 at 11:19 AM, Stefan Beller <sbeller@google.com> wrote:
>> On Wed, Nov 11, 2015 at 6:09 AM, Lars Schneider
>> <larsxschneider@gmail.com> wrote:
>>> Hi,
>>> 
>>> I have a clean build machine and I want to clone my source code to this machine while transferring only the minimal necessary amount of data. Therefore I use this command:
>>> 
>>> git clone --recursive --depth 1 --single-branch <url>
>> 
>> That *should* work, actually.
>> However looking at the code it does not.
>> 
>> citing from builtin/clone.c:
>> 
>>    static struct option builtin_clone_options[] = {
>>        ...
>>        OPT_BOOL(0, "recursive", &option_recursive,
>>           N_("initialize submodules in the clone")),
>>        OPT_BOOL(0, "recurse-submodules", &option_recursive,
>>          N_("initialize submodules in the clone")),
>>        ...
>>    };
>>    ...
>>    static const char *argv_submodule[] = {
>>        "submodule", "update", "--init", "--recursive", NULL
>>    };
>> 
>>    if (!err && option_recursive)
>>        err = run_command_v_opt(argv_submodule, RUN_GIT_CMD);
>> 
>> So the --depth argument is not passed on, although "git submodule update"
>> definitely supports --depth.
>> 
>> In an upcoming series (next version of origin/sb/submodule-parallel-update),
>> this will change slightly, so that it will be even easier to add the
>> depth argument there, as we construct the argument list in code instead
>> of hard coding argv_submodule.
>> 
>> This may require some discussion whether you expect --depth to be recursed.
>> (What if you only want a top level shallow thing?, What if you want to have only
>> submodules shallow? What is the user expectation here?)
>> 
>>> 
>>> Apparently this does not clone the submodules with "--depth 1" (using Git 2.4.9). As a workaround I tried:
>>> 
>>> git clone --depth 1 --single-branch <url>
>>> cd <repo-name>
>>> git submodule update --init --recursive --depth 1
>>> 
> 
> The workaround works with the origin/master version for me.
> 
> Notice the other email thread, which suggests to include --remote into the
> call to  git submodule update depending on a branch config option being
> present in the .gitmodules file.

Can you check "[PATCH v2] add test to demonstrate that shallow recursive clones fail"? This demonstrates the failure that I see. I also tried the "--remote" flag but this does not work either (see test case).
Can you confirm this behavior?

Cheers,
Lars

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  @ 2015-11-12 23:34  4% ` Stefan Beller
  2015-11-15 12:43  2%   ` Lars Schneider
    1 sibling, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-12 23:34 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git@vger.kernel.org

On Thu, Nov 12, 2015 at 1:37 AM,  <larsxschneider@gmail.com> wrote:
> From: Lars Schneider <larsxschneider@gmail.com>
>
> "git clone --recursive --depth 1 --single-branch <url>" clones the
> submodules successfully. However, it does not obey "--depth 1" for
> submodule cloning.
>
> The following workaround only works if the used submodule pointer
> is on the default branch. Otherwise "git submodule update" fails with
> "fatal: reference is not a tree:" and "Unable to checkout".
> git clone --depth 1 --single-branch <url>
> cd <repo-name>
> git submodule update --init --recursive --depth 1
>
> The workaround does not fail using the "--remote" flag. However, in that
> case the wrong commit is checked out.
>
> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
> ---

Thanks for writing these tests. :)

> +test_expect_failure shallow-clone-recursive-workaround '
> +       URL="file://$(pwd | sed "s/[[:space:]]/%20/g")/repo" &&
> +       echo $URL &&
> +       git clone --depth 1 --single-branch $URL clone-recursive-workaround &&
> +       (
> +               cd "clone-recursive-workaround" &&
> +               git log --oneline >lines &&
> +               test_line_count = 1 lines &&
> +               git submodule update --init --recursive --depth 1

Should we prefix the "git submodule update" lines with test_must_fail here?

> +       )
> +'
> +
> +test_expect_failure shallow-clone-recursive-with-remote-workaround '
> +       URL="file://$(pwd | sed "s/[[:space:]]/%20/g")/repo" &&
> +       echo $URL &&
> +       git clone --depth 1 --single-branch $URL clone-recursive-remote-workaround &&
> +       (
> +               cd "clone-recursive-remote-workaround" &&
> +               git log --oneline >lines &&
> +               test_line_count = 1 lines &&
> +               git submodule update --init --remote --recursive --depth 1 &&
> +               git status submodule >status &&
> +               test_must_fail grep "modified:" status

Use ! here instead of test_must_fail.

IIUC we use test_must_fail for git commands (to test that git returns a
non-zero exit code instead of segfaulting). On the other hand, we trust
grep not to segfault, so just negating its exit status is enough here.

> +       )
> +'
> +
> +test_done
> --
> 2.5.1
>

* Re: [RFC] Clone repositories recursive with depth 1
  2015-11-12  9:39  2%     ` Lars Schneider
@ 2015-11-12 23:47  5%       ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-12 23:47 UTC (permalink / raw)
  To: Lars Schneider, Stanislav; +Cc: Git List

+cc Stanislav, who came up with the other thread for passing --remote
to git submodule

On Thu, Nov 12, 2015 at 1:39 AM, Lars Schneider
<larsxschneider@gmail.com> wrote:
>> Notice the other email thread, which suggests to include --remote into the
>> call to  git submodule update depending on a branch config option being
>> present in the .gitmodules file.
>
> Can you check "[PATCH v2] add test to demonstrate that shallow recursive clones fail"? This demonstrates the failure that I see. I also tried the "--remote" flag but this does not work either (see test case).
> Can you confirm this behavior?
>
> Cheers,
> Lars

I can confirm it breaks as expected here.

I may have confused you here by pointing to the --remote option.

(git clone is a bit stupid when it comes to submodule handling.)
All it does currently is this:

    if the --recurse-submodules or --recursive option is given:
        run: "git submodule update --init --recursive"

No attention is paid to any other option such as --depth.
That's all I wanted to point out there.

Ideally we want to add:

    If there is a branch configured in the .gitmodules file,
    we would want to add the --remote option.

    If we have been given other options such as --depth or --reference,
    we want to pass them along to the called submodule helper.

So I was looking at the internal code structure. I think one of the next
series I am going to send will change the code so that the conditions
outlined above are easier to incorporate, because the invocation will no
longer be hard-coded into an array ["git", "submodule", "update", "--init",
"--recursive"]. I want to add yet another dynamic option to the submodule
helper invocation anyway (I want to add --jobs <n> there).
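
Here is a minimal sketch of what building the invocation in code, rather than from a fixed argv_submodule array, could look like. It assumes a hypothetical struct sub_opts carrying the values clone would forward. --depth, --reference and --remote are existing "git submodule update" options, while --jobs is only the addition proposed above. This is plain C with a fixed-size array, not git's argv_array API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ARGS 16

/* Hypothetical bundle of options clone would forward to the helper. */
struct sub_opts {
	const char *depth;     /* value of --depth, or NULL */
	const char *reference; /* value of --reference, or NULL */
	int branch_configured; /* .gitmodules names a branch */
	int jobs;              /* proposed --jobs <n>; 0 means not given */
};

/*
 * Fill argv with "submodule update --init --recursive" plus any
 * forwarded options and return argc. argv must hold MAX_ARGS entries;
 * jobs_buf must outlive argv because argv may point into it.
 */
static int build_submodule_argv(const char **argv, const struct sub_opts *o,
				char *jobs_buf, size_t jobs_len)
{
	int argc = 0;

	argv[argc++] = "submodule";
	argv[argc++] = "update";
	argv[argc++] = "--init";
	argv[argc++] = "--recursive";
	if (o->branch_configured)
		argv[argc++] = "--remote";
	if (o->depth) {
		argv[argc++] = "--depth";
		argv[argc++] = o->depth;
	}
	if (o->reference) {
		argv[argc++] = "--reference";
		argv[argc++] = o->reference;
	}
	if (o->jobs > 0) {
		snprintf(jobs_buf, jobs_len, "%d", o->jobs);
		argv[argc++] = "--jobs";
		argv[argc++] = jobs_buf;
	}
	argv[argc] = NULL; /* argv vectors are NULL-terminated */
	return argc;
}
```

As the follow-up message in this thread explains, forwarding --depth this way is not enough by itself: the commit recorded in the superproject may lie outside the fetched depth.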

Cheers,
Stefan

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  @ 2015-11-13 18:41  5%   ` Stefan Beller
  2015-11-13 23:16  7%     ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-13 18:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Lars Schneider, git@vger.kernel.org

On Thu, Nov 12, 2015 at 9:35 PM, Jeff King <peff@peff.net> wrote:
>
> Hrm. Do we want to make these workarounds work correctly? Or is the
> final solution going to be that the first command you gave simply works,
> and no workarounds are needed.  If the latter, I wonder if we want to be
> adding tests for the workarounds in the first place.

I think we want to make the final solution just work. I dug into that and it is
harder than expected. I may even call it a bug. The bug doesn't occur often, as
it is only triggered by things like rewriting history (forced pushes) or a
short depth argument.

So if you invoke "git clone --recursive", it will internally just delegate
the submodule handling to "git submodule update --init --recursive". As the
submodule doesn't exist yet, that command will delegate the cloning to
"git submodule--helper clone", which will then call git clone for the
actual cloning.

However, in this whole chain of commands we never pass around the actual
sha1 we need. The strategy is to clone first and then check out the sha1
that the superproject wants to see. The desired sha1 was hopefully
included in the clone, so we can check it out.

But the sha1 may not be present if we have a very short depth argument, or if
we rewrote history. In case of a short depth argument, consider the
following history:

... <- A <- B

A is the sha1 recorded in the superproject, whereas B is the HEAD of the
remote you're cloning from. With depth=1, the most naive approach would be
to pass the depth argument down the command chain. But then we would clone
only B, without any history behind it, and upon checkout we could not find A.
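
A toy model makes the depth problem concrete. It assumes a purely linear history stored newest-first, with commit names as plain strings rather than object IDs. in_shallow_clone() is a hypothetical helper, not a git function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * history[0] is the remote's HEAD and history[i + 1] is the parent of
 * history[i] (a linear history, as in "... <- A <- B").
 * A clone with --depth=depth receives only the first `depth` commits.
 */
static int in_shallow_clone(const char *const *history, size_t len,
			    size_t depth, const char *wanted)
{
	size_t i;

	for (i = 0; i < len && i < depth; i++)
		if (!strcmp(history[i], wanted))
			return 1;
	return 0;
}
```

With depth 1 only B arrives, so a later checkout of A fails. This matches the "reference is not a tree" error reported earlier in this thread.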

In case of the rewritten history, consider:

... <- C <- B
         \
          A

where A is the sha1 recorded in the superproject, but it is on a different
branch (or is even just a dangling commit that used to be on master).
B is the master branch. If we pass --depth on when cloning the submodule,
--single-branch is implied by --depth, so we would not clone A. If A is a
dangling commit, we wouldn't clone it even without the depth argument.

So I propose:
 * similar to fetch, we enable clone to obtain a specific sha1 from the remote.
 * we explicitly pass the submodule sha1 as recorded in the superproject
   to the submodule fetch/clone when we follow the exact sha1. In case of
   --remote, or a branch field present in the superproject's .gitmodules
   file, we can just pass the branch name.
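
The second bullet amounts to a small decision per submodule. The sketch assumes a hypothetical struct submodule_info holding the recorded gitlink and an optional branch from .gitmodules. The fallback branch name for --remote without a configured branch is left to the caller rather than guessed here. The sha1 strings in the usage are placeholders.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-submodule data available to "submodule update". */
struct submodule_info {
	const char *recorded_sha1; /* gitlink recorded in the superproject */
	const char *branch;        /* "branch" from .gitmodules, or NULL */
};

/*
 * Decide what to ask the submodule's remote for. A configured branch
 * wins. With --remote but no configured branch, use the caller's
 * fallback. Otherwise request the exact sha1 the superproject records.
 */
static const char *what_to_fetch(const struct submodule_info *sub,
				 int remote, const char *default_branch)
{
	if (sub->branch)
		return sub->branch;
	if (remote)
		return default_branch;
	return sub->recorded_sha1;
}
```

Asking the remote for an exact sha1 only works if the server is willing to serve it. The rest of this thread turns to that question.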

Thanks,
Stefan

* Re: [PATCHv3 08/11] fetching submodules: respect `submodule.jobs` config option
  2015-11-11 23:34  8%         ` Stefan Beller
@ 2015-11-13 20:47  8%           ` Jens Lehmann
  2015-11-13 21:29  8%             ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Jens Lehmann @ 2015-11-13 20:47 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Ramsay Jones, Jacob Keller, Jeff King,
	Junio C Hamano, Jonathan Nieder, Johannes Schindelin,
	Eric Sunshine, Johannes Sixt

Am 12.11.2015 um 00:34 schrieb Stefan Beller:
> On Wed, Nov 11, 2015 at 11:55 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>>>
>>> TL;DR: checkout is serial, network-related stuff only will be using
>>> submodule.jobs
>>
>>
>> My point being: isn't "jobs" a bit too generic for a config option that
>> is only relevant for network-related stuff? Maybe "submodule.fetchJobs"
>> or similar would be better, as you are already thinking about adding
>> other parallelisms with different constraints later?
>
> Actually I don't think that far ahead.

Maybe I've been bitten once too often by too generic names that became
a problem later on ... ;-)

> (I assume network to be the bottleneck for clone/fetch operations)
> All I want is a saturated network all the time, and as the native git protocol
> doesn't provide that (tcp startup takes time until full bandwidth is reached,
> local operations both on client and server) I added the parallel stuff
> to 'smear' different submodule network traffics along the timeline,
> such that we have a better approximation of an always fully saturated link
> for the whole operation. So in the long term future, we maybe want to
> reuse an http/ssh session for a different submodule, possibly interleaving
> the different submodules on the wire to make it even faster. Though that
> may not be helping much.
>
> So we're back at bike shedding about the name. submodule.fetchJobs
> sounds like it only applies to fetching, do you think it's sufficient for clone
> as well?

Hmm, to me fetching is a part of cloning, so I don't have a problem with
that. And documenting it accordingly should make it clear to everyone.

> Once upon a time, Junio used  'submodule.fetchParallel' or  'submodule.paralle'
> in a discussion[1] for the distinction of the local and networked things.
> [1] Discussing "[PATCH] Add fetch.recurseSubmoduleParallelism config option"
>
> How about submodules.parallelNetwork for the networking part and
> submodules.parallelLocal for the local part? (I don't implement parallelLocal in
> the next few weeks I'd estimate).

If 'submodules.parallelNetwork' will be used for submodule push too as
soon as that learns parallel operation, I'm ok with that. But if we don't
have good reason to believe the number of jobs for fetch can simply be
reused for push, methinks we should have one config option containing the
term "fetch" now and another that contains "push" when we need it later,
just to be on the safe side. Otherwise it might be hard to explain to
users why 'submodules.parallelNetwork' is only used for fetch and clone
and why they have to set 'submodules.parallelPush' for pushing ...

So either 'submodule.fetchParallel' or 'submodule.fetchJobs' is fine for
me, and 'submodules.parallelNetwork' is ok too as long as we have reason
to believe this value can be used for push later too.

* Re: [PATCHv3 08/11] fetching submodules: respect `submodule.jobs` config option
  2015-11-13 20:47  8%           ` Jens Lehmann
@ 2015-11-13 21:29  8%             ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-13 21:29 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: git@vger.kernel.org, Ramsay Jones, Jacob Keller, Jeff King,
	Junio C Hamano, Jonathan Nieder, Johannes Schindelin,
	Eric Sunshine, Johannes Sixt

On Fri, Nov 13, 2015 at 12:47 PM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
> Am 12.11.2015 um 00:34 schrieb Stefan Beller:
>>
>> On Wed, Nov 11, 2015 at 11:55 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>>>>
>>>> TL;DR: checkout is serial, network-related stuff only will be using
>>>> submodule.jobs
>>>
>>> My point being: isn't "jobs" a bit too generic for a config option that
>>> is only relevant for network-related stuff? Maybe "submodule.fetchJobs"
>>> or similar would be better, as you are already thinking about adding
>>> other parallelisms with different constraints later?
>>
>> Actually I don't think that far ahead.
>
> Maybe I've been bitten once too often by too generic names that became
> a problem later on ... ;-)
>
>> (I assume network to be the bottleneck for clone/fetch operations)
>> All I want is a saturated network all the time, and as the native git protocol
>> doesn't provide that (tcp startup takes time until full bandwidth is reached,
>> local operations both on client and server) I added the parallel stuff
>> to 'smear' different submodule network traffics along the timeline,
>> such that we have a better approximation of an always fully saturated link
>> for the whole operation. So in the long term future, we maybe want to
>> reuse an http/ssh session for a different submodule, possibly interleaving
>> the different submodules on the wire to make it even faster. Though that
>> may not be helping much.
>>
>> So we're back at bike shedding about the name. submodule.fetchJobs
>> sounds like it only applies to fetching, do you think it's sufficient for clone
>> as well?
>
> Hmm, to me fetching is a part of cloning, so I don't have a problem with
> that. And documenting it accordingly should make it clear to everyone.
>
>> Once upon a time, Junio used 'submodule.fetchParallel' or 'submodule.paralle'
>> in a discussion[1] for the distinction of the local and networked things.
>> [1] Discussing "[PATCH] Add fetch.recurseSubmoduleParallelism config option"
>>
>> How about submodules.parallelNetwork for the networking part and
>> submodules.parallelLocal for the local part? (I don't implement parallelLocal in
>> the next few weeks I'd estimate).
>
> If 'submodules.parallelNetwork' will be used for submodule push too as
> soon as that learns parallel operation, I'm ok with that. But if we don't
> have good reason to believe the number of jobs for fetch can simply be
> reused for push, me thinks we should have one config option containing the
> term "fetch" now and another that contains "push" when we need it later,
> just to be on the safe side. Otherwise it might be hard to explain to
> users why 'submodules.parallelNetwork' is only used for fetch and clone
> and why they have to set 'submodules.parallelPush' for pushing ...
>
> So either 'submodule.fetchParallel' or 'submodule.fetchJobs' is fine for
> me, and 'submodules.parallelNetwork' is ok too as long as we have reason
> to believe this value can be used for push later too.

Ok, got it. So fetchJobs is fine with me.
Mind the difference in the first part: submodule[s], singular vs. plural.
I thought of the singular submodule.* as the prefix for settings of an
individual submodule. Settings applying to all submodules would take the
plural submodules.* prefix.
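
On the singular/plural question, the config syntax itself keeps the two kinds of keys apart. Git splits a key at its first and last dots, and anything in between is the subsection. For submodule.* keys that is the submodule name, which may itself contain dots. The sketch below uses a hypothetical classify_key() helper to show that submodule.fetchJobs and submodule.<name>.url cannot be confused. It compares section names case-sensitively for simplicity, whereas Git treats section names case-insensitively.

```c
#include <assert.h>
#include <string.h>

/*
 * Classify a config key relative to `section`. The section ends at the
 * first dot, the variable starts after the last dot, and anything in
 * between is the subsection (here: a submodule name).
 * Returns 0 for a section-wide key such as "submodule.fetchJobs",
 * 1 for a per-subsection key such as "submodule.lib.foo.url",
 * and -1 if the key does not belong to `section`.
 */
static int classify_key(const char *key, const char *section)
{
	const char *first = strchr(key, '.');
	const char *last = strrchr(key, '.');
	size_t len = strlen(section);

	if (!first || (size_t)(first - key) != len ||
	    strncmp(key, section, len))
		return -1;
	return first == last ? 0 : 1;
}
```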

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-13 18:41  5%   ` Stefan Beller
@ 2015-11-13 23:16  7%     ` Stefan Beller
  2015-11-13 23:38  2%       ` Jeff King
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-13 23:16 UTC (permalink / raw)
  To: Jeff King; +Cc: Lars Schneider, git@vger.kernel.org, Junio C Hamano, Duy Nguyen

On Fri, Nov 13, 2015 at 10:41 AM, Stefan Beller <sbeller@google.com> wrote:
> On Thu, Nov 12, 2015 at 9:35 PM, Jeff King <peff@peff.net> wrote:
>>
>> Hrm. Do we want to make these workarounds work correctly? Or is the
>> final solution going to be that the first command you gave simply works,
>> and no workarounds are needed.  If the latter, I wonder if we want to be
>> adding tests for the workarounds in the first place.
>
> I think we want to make the final solution just work. I dug into that and it is
> harder than expected. I may even call it a bug. The bug doesn't occur often as
> it is only triggered by things like rewriting history (forced pushes) or a
> short depth argument.
>
> So if you invoke "git clone --recursive", it will internally just delegate
> the submodule handling to "git submodule update --init --recursive", which
> then (as the submodule doesn't exist yet) will delegate the cloning to
> "git submodule--helper clone", which will then call git clone for the
> actual cloning.
>
> However in this whole chain of commands we never pass around the actual sha1
> we need. The strategy is to clone first and then checkout the sha1, which the
> superproject wants to see. The desired sha1 was hopefully included in the
> cloning, so we can check it out.
>
> But the sha1 may not be present if we have a very short depth argument, or if we
> rewrote history. In case of a short depth argument, consider the
> following history:
>
> ... <- A <- B
>
> A is the recorded sha1 in the superproject, whereas B is the HEAD in the
> remote you're cloning from. If cloning with depth=1, the most naive way
> would have been to pass on the depth argument down the command chain,
> but then we would end up cloning B with no further depth, and upon checkout
> we cannot find A.
>
> In case of the rewritten history, consider:
>
> ... <- C <- B
>          \
>           A
>
> whereas A is the recorded sha1 in the superproject, but on a different branch
> (or even just a dangling commit, but used to be on master).
> B is the master branch. In case we pass on --depth to cloning the submodule,
> --single-branch is implied by --depth, so we would not clone A. In case of
> A being a dangling commit, we wouldn't even clone it without the depth argument.
>
> So I propose:
>  * similar to fetch, we enable clone to obtain a specific sha1 from remote.
>  * we explicitly pass the submodule sha1 as recorded in the superproject
>    to the submodule fetch/clone in case we follow the exact sha1. In case of
>    --remote or the branch field present in the superproject's .gitmodules file,
>    we can just pass the branch name.
>
> Thanks,
> Stefan

+cc Junio, Duy

So cloning from an arbitrary SHA1 is not a new thing I just came up with,
but has been discussed before[1].

Junio wrote on Oct 09, 2014:
> This is so non-standard a thing to do that I doubt it is worth
> supporting with "git clone".  "git clone --branch", which is about
> "I want to follow that particular branch", would not mesh well with
> "I want to see the history that leads to this exact commit", either.
> You would not know which branch(es) is that exact commit is on in
> the first place.

I disagree with this. This is the *exact* thing you actually want to do when
dealing with submodules. When fetching/cloning a submodule, you want
to obtain the exact sha1 instead of a branch (which happens to be supported
too, but is not the original use case for submodules).

> The "uploadpack.allowtipsha1inwant" is a wrong configuration to tie
> this into.  The intent of the configuration is to allow *ONLY*
> commits at the tip of the (possibly hidden) refs to be asked for.
> Those who want to hide some refs using "uploadpack.hiderefs" may
> want to enable "allowtipsha1inwant" to allow the tips of the hidden
> refs while still disallowing a request to fetch any random reachable
> commit not at the tip.

If the server contains at least one superproject/submodule, there is a legit
use case for fetching an exact sha1 which isn't the tip of a branch, but may
be in any branch or even in no branch at all. So I wonder how we can
add a non-hacky solution that allows fetching specific sha1s, as we
still need to differentiate between obliterated (force-pushed,
"go away") sha1s and legit submodule pointer sha1s.

As we don't want to look anything up in a superproject (we don't even know
which superproject we'd need to look into), we can either go for a
more liberal sha1 fetching policy, or somehow modify the submodule
repository to mark the sha1s which are fine to hand out to
"submodule update" fetches.

Proposal:
    Could we have a refs/submodules/tracking ref which tracks all the
    sha1s a superproject ever pointed to? That ref would need to be
    maintained by the superproject. If there is a forced push to the
    superproject, it would also need to obliterate some of the history
    in that special ref.
Problems with this proposal:
    How do we handle multiple superprojects,
    or superprojects with different branches?
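
The proposed tracking ref is essentially a server-side set of submodule commits that may be fetched. The following is a hypothetical sketch of that bookkeeping. `struct tracked_set`, `tracked_add()`, `tracked_remove()` and `submodule_want_allowed()` are illustrative names, not existing git APIs, and hex sha1 strings are kept in a fixed-size array.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_TRACKED 64

/* Hypothetical model of refs/submodules/tracking: the submodule commits
 * that superproject commits pointed to. Stores pointers only, so the
 * caller must keep the strings alive. */
struct tracked_set {
	const char *oid[MAX_TRACKED];
	int nr;
};

static bool tracked_has(const struct tracked_set *s, const char *oid)
{
	for (int i = 0; i < s->nr; i++)
		if (!strcmp(s->oid[i], oid))
			return true;
	return false;
}

/* On push: a superproject commit records a gitlink to `oid`. */
static void tracked_add(struct tracked_set *s, const char *oid)
{
	if (tracked_has(s, oid))
		return;
	assert(s->nr < MAX_TRACKED);
	s->oid[s->nr++] = oid;
}

/* On forced push: the history that recorded `oid` was obliterated.
 * Swaps in the last entry, so order is not preserved. */
static void tracked_remove(struct tracked_set *s, const char *oid)
{
	for (int i = 0; i < s->nr; i++)
		if (!strcmp(s->oid[i], oid)) {
			s->oid[i] = s->oid[--s->nr];
			return;
		}
}

/* On fetch: decide whether a "submodule update" want is acceptable. */
static bool submodule_want_allowed(const struct tracked_set *s,
				   const char *oid)
{
	return tracked_has(s, oid);
}
```

`tracked_remove()` drops an entry outright. If two superproject commits, or two superprojects, recorded the same submodule sha1, a real implementation would need reference counting. That is exactly the open problem listed above.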

[1] http://git.661346.n2.nabble.com/Can-I-fetch-an-arbitrary-commit-by-sha1-td7619396.html


* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-13 23:16  7%     ` Stefan Beller
@ 2015-11-13 23:38  2%       ` Jeff King
  2015-11-13 23:41  2%         ` Jeff King
  0 siblings, 1 reply; 200+ results
From: Jeff King @ 2015-11-13 23:38 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Lars Schneider, git@vger.kernel.org, Junio C Hamano, Duy Nguyen

On Fri, Nov 13, 2015 at 03:16:01PM -0800, Stefan Beller wrote:

> Junio wrote on Oct 09, 2014:
> > This is so non-standard a thing to do that I doubt it is worth
> > supporting with "git clone".  "git clone --branch", which is about
> > "I want to follow that particular branch", would not mesh well with
> > "I want to see the history that leads to this exact commit", either.
> > You would not know which branch(es) is that exact commit is on in
> > the first place.
> 
> I disagree with this. This is the *exact* thing you actually want to do when
> dealing with submodules. When fetching/cloning for a submodule, you want
> to obtain the exact sha1, instead of a branch (which happens to be supported
> too, but is not the original use case with submodules.)

I think this is already implemented in 68ee628 (upload-pack: optionally
allow fetching reachable sha1, 2015-05-21), isn't it?

> If the server contains at least one superproject/submodule, there is a legit
> use case for fetching an exact sha1, which isn't a tip of a branch, but may
> be in any branch or even in no branch at all.

The patch above doesn't handle "no branch at all", but I'm not sure if
we want to (it violates git's usual access model; moreover, a git
repository does not necessarily have all ancestors of an unreachable
object, though these days it usually does).

-Peff
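
Here is a toy model of the server's decision for a single "want". It makes concrete the difference between the tip-only rule Junio describes and the reachability rule from 68ee628. The enum values and `want_allowed()` are hypothetical and only mimic the intent of `uploadpack.allowTipSHA1InWant` and `uploadpack.allowReachableSHA1InWant`. As a simplifying assumption, the sketch walks from every ref, hidden or not, and ignores how real upload-pack performs the reachability check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the policies discussed in this thread. */
enum want_policy {
	WANT_ADVERTISED_ONLY,	/* only tips of advertised refs */
	WANT_ANY_TIP,		/* also tips of hidden refs */
	WANT_REACHABLE,		/* anything reachable from a ref */
};

struct toy_ref {
	int tip;	/* index of the commit the ref points to */
	bool hidden;	/* e.g. hidden via uploadpack.hideRefs */
};

/* parent[i] is the parent index of commit i, or -1 for a root. */
static bool want_allowed(enum want_policy policy, const int *parent,
			 const struct toy_ref *refs, size_t nrefs, int want)
{
	assert(parent && refs);
	for (size_t r = 0; r < nrefs; r++) {
		if (refs[r].tip == want &&
		    (!refs[r].hidden || policy != WANT_ADVERTISED_ONLY))
			return true;
		if (policy == WANT_REACHABLE)
			for (int i = refs[r].tip; i >= 0; i = parent[i])
				if (i == want)
					return true;
	}
	return false;	/* e.g. a commit on no ref at all */
}
```

A commit on no ref at all, the "no branch at all" case, is rejected under every policy.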


* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-13 23:38  2%       ` Jeff King
@ 2015-11-13 23:41  2%         ` Jeff King
  2015-11-14  0:10  5%           ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Jeff King @ 2015-11-13 23:41 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Lars Schneider, git@vger.kernel.org, Junio C Hamano, Duy Nguyen

On Fri, Nov 13, 2015 at 06:38:07PM -0500, Jeff King wrote:

> On Fri, Nov 13, 2015 at 03:16:01PM -0800, Stefan Beller wrote:
> 
> > Junio wrote on Oct 09, 2014:
> > > This is so non-standard a thing to do that I doubt it is worth
> > > supporting with "git clone".  "git clone --branch", which is about
> > > "I want to follow that particular branch", would not mesh well with
> > > "I want to see the history that leads to this exact commit", either.
> > > You would not know which branch(es) is that exact commit is on in
> > > the first place.
> > 
> > I disagree with this. This is the *exact* thing you actually want to do when
> > dealing with submodules. When fetching/cloning for a submodule, you want
> > to obtain the exact sha1, instead of a branch (which happens to be supported
> > too, but is not the original use case with submodules.)
> 
> I think this is already implemented in 68ee628 (upload-pack: optionally
> allow fetching reachable sha1, 2015-05-21), isn't it?

Note that this just implements the server side. I think to use this with
submodules right now, you'd have to manually "git init && git fetch" in
the submodule. It might make sense to teach clone to handle this, to
avoid the submodule code duplicating what the clone code does.

-Peff
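
For reference, the manual workaround amounts to three commands per submodule. The sketch below only assembles their argument vectors and does not execute anything. `struct step` and `fetch_exact_commit_steps()` are hypothetical. It assumes the server permits the requested sha1 in a want, and it ignores where submodule git directories actually live.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one command to run, as a NULL-terminated argv. */
struct step {
	const char *argv[8];
};

/* Fills `steps` (room for at least 3) with the commands that obtain
 * exactly `sha1` into a fresh repository at `path`; returns the count. */
static size_t fetch_exact_commit_steps(struct step *steps, const char *path,
				       const char *url, const char *sha1)
{
	assert(path && url && sha1);
	steps[0] = (struct step){ { "git", "init", path, NULL } };
	/* Only works if the server accepts this sha1 in a "want". */
	steps[1] = (struct step){ { "git", "-C", path, "fetch", url, sha1, NULL } };
	steps[2] = (struct step){ { "git", "-C", path, "checkout", "--detach",
				    sha1, NULL } };
	return 3;
}
```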


* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-13 23:41  2%         ` Jeff King
@ 2015-11-14  0:10  5%           ` Stefan Beller
  2015-11-16 18:59  5%             ` Jens Lehmann
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-14  0:10 UTC (permalink / raw)
  To: Jeff King; +Cc: Lars Schneider, git@vger.kernel.org, Junio C Hamano, Duy Nguyen

On Fri, Nov 13, 2015 at 3:41 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Nov 13, 2015 at 06:38:07PM -0500, Jeff King wrote:
>
>> On Fri, Nov 13, 2015 at 03:16:01PM -0800, Stefan Beller wrote:
>>
>> > Junio wrote on Oct 09, 2014:
>> > > This is so non-standard a thing to do that I doubt it is worth
>> > > supporting with "git clone".  "git clone --branch", which is about
>> > > "I want to follow that particular branch", would not mesh well with
>> > > "I want to see the history that leads to this exact commit", either.
>> > > You would not know which branch(es) is that exact commit is on in
>> > > the first place.
>> >
>> > I disagree with this. This is the *exact* thing you actually want to do when
>> > dealing with submodules. When fetching/cloning for a submodule, you want
>> > to obtain the exact sha1, instead of a branch (which happens to be supported
>> > too, but is not the original use case with submodules.)
>>
>> I think this is already implemented in 68ee628 (upload-pack: optionally
>> allow fetching reachable sha1, 2015-05-21), isn't it?
>
> Note that this just implements the server side. I think to use this with
> submodules right now, you'd have to manually "git init && git fetch" in
> the submodule. It might make sense to teach clone to handle this, to
> avoid the submodule code duplicating what the clone code does.

Yes, I want to add it to clone, as that is a prerequisite for making
"git clone --recursive --depth 1" work as you'd expect (such that
the submodule can be cloned and checked out, instead of rewriting that
as init & fetch).

Thanks for pointing out that we already have some kind of server support.

I wonder if we should add an additional way to make fetching only some
sha1s possible ("I don't want users to fetch any sha1, but only those
that superprojects point{ed} to"; even if you force-push a superproject,
you want to only allow fetching the sha1s which exist in the current
superproject's branch).

Maybe our emails crossed, but in the other mail I pointed out we could use
some sort of hidden ref (refs/superprojects/*) for that, marking any
sha1 that is allowed to be fetched in the superproject/submodule context.

So whenever you push to a superproject (a project that has a gitlink),
we would need to check server-side whether that submodule is hosted by us,
and mark the correct sha1s in the submodule. Then you can disallow fetching
most sha1s but still have a correctly working submodule update mechanism.

Thanks,
Stefan


>
> -Peff


* [PATCHv4 3/9] submodule-config: drop check against NULL
  2015-11-14  1:06 10% [PATCHv4 0/9] Expose submodule parallelism to the user Stefan Beller
  2015-11-14  1:06 26% ` [PATCHv4 2/9] submodule-config: keep update strategy around Stefan Beller
@ 2015-11-14  1:06 24% ` Stefan Beller
  2015-11-14  1:06 24% ` [PATCHv4 4/9] submodule-config: remove name_and_item_from_var Stefan Beller
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-14  1:06 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, j6t,
	Stefan Beller

Adhere to the common coding style of Git and do not check explicitly
for NULL throughout the file. There are still other occurrences in the
code base, but those are usually inside conditions with side effects.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule-config.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 4239b0e..6d01941 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -265,7 +265,7 @@ static int parse_config(const char *var, const char *value, void *data)
 	if (!strcmp(item.buf, "path")) {
 		if (!value)
 			ret = config_error_nonbool(var);
-		else if (!me->overwrite && submodule->path != NULL)
+		else if (!me->overwrite && submodule->path)
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"path");
 		else {
@@ -289,7 +289,7 @@ static int parse_config(const char *var, const char *value, void *data)
 	} else if (!strcmp(item.buf, "ignore")) {
 		if (!value)
 			ret = config_error_nonbool(var);
-		else if (!me->overwrite && submodule->ignore != NULL)
+		else if (!me->overwrite && submodule->ignore)
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"ignore");
 		else if (strcmp(value, "untracked") &&
@@ -305,7 +305,7 @@ static int parse_config(const char *var, const char *value, void *data)
 	} else if (!strcmp(item.buf, "url")) {
 		if (!value) {
 			ret = config_error_nonbool(var);
-		} else if (!me->overwrite && submodule->url != NULL) {
+		} else if (!me->overwrite && submodule->url) {
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"url");
 		} else {
@@ -315,7 +315,7 @@ static int parse_config(const char *var, const char *value, void *data)
 	} else if (!strcmp(item.buf, "update")) {
 		if (!value)
 			ret = config_error_nonbool(var);
-		else if (!me->overwrite && submodule->update != NULL)
+		else if (!me->overwrite && submodule->update)
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					     "update");
 		else {
-- 
2.6.3.369.gea52ac0


* [PATCHv4 5/9] submodule-config: introduce parse_generic_submodule_config
  2015-11-14  1:06 10% [PATCHv4 0/9] Expose submodule parallelism to the user Stefan Beller
                   ` (2 preceding siblings ...)
  2015-11-14  1:06 24% ` [PATCHv4 4/9] submodule-config: remove name_and_item_from_var Stefan Beller
@ 2015-11-14  1:06 23% ` Stefan Beller
  2015-11-14  1:06 24% ` [PATCHv4 6/9] fetching submodules: respect `submodule.jobs` config option Stefan Beller
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-14  1:06 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, j6t,
	Stefan Beller

This rewrites parse_config to distinguish between configs specific to
one submodule and configs which apply generically to all submodules.
We do not have generic submodule configs yet, but the next patch will
introduce "submodule.jobs".

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule-config.c | 41 ++++++++++++++++++++++++++++++++---------
 1 file changed, 32 insertions(+), 9 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index b826841..29e21b2 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -234,17 +234,22 @@ struct parse_config_parameter {
 	int overwrite;
 };
 
-static int parse_config(const char *var, const char *value, void *data)
+static int parse_generic_submodule_config(const char *key,
+					  const char *var,
+					  const char *value,
+					  struct parse_config_parameter *me)
 {
-	struct parse_config_parameter *me = data;
-	struct submodule *submodule;
-	int subsection_len, ret = 0;
-	const char *subsection, *key;
-
-	if (parse_config_key(var, "submodule", &subsection,
-			     &subsection_len, &key) < 0 || !subsection_len)
-		return 0;
+	return 0;
+}
 
+static int parse_specific_submodule_config(const char *subsection, int subsection_len,
+					   const char *key,
+					   const char *var,
+					   const char *value,
+					   struct parse_config_parameter *me)
+{
+	int ret = 0;
+	struct submodule *submodule;
 	submodule = lookup_or_create_by_name(me->cache,
 					     me->gitmodules_sha1,
 					     subsection, subsection_len);
@@ -314,6 +319,24 @@ static int parse_config(const char *var, const char *value, void *data)
 	return ret;
 }
 
+static int parse_config(const char *var, const char *value, void *data)
+{
+	struct parse_config_parameter *me = data;
+	int subsection_len;
+	const char *subsection, *key;
+
+	if (parse_config_key(var, "submodule", &subsection,
+			     &subsection_len, &key) < 0)
+		return 0;
+
+	if (!subsection_len)
+		return parse_generic_submodule_config(key, var, value, me);
+	else
+		return parse_specific_submodule_config(subsection,
+						       subsection_len, key,
+						       var, value, me);
+}
+
 static int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
 				      unsigned char *gitmodules_sha1)
 {
-- 
2.6.3.369.gea52ac0
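
The dispatch in parse_config() relies on how a config variable splits into a section, an optional subsection and a key. The key is everything after the last dot, and the subsection is whatever lies between the first and the last dot. Below is a simplified re-implementation for illustration only; `toy_parse_config_key()` and `classify()` are hypothetical, and the real parse_config_key() lives in git's config code. It shows why `submodule.fetchjobs` and `submodule.<name>.path` cannot be confused.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified split of "section[.subsection].key". Returns 0 if `var`
 * belongs to `section`, -1 otherwise. Git hands callbacks the section
 * and key already lowercased; the subsection keeps its case. */
static int toy_parse_config_key(const char *var, const char *section,
				const char **subsection, int *subsection_len,
				const char **key)
{
	size_t len = strlen(section);
	const char *dot;

	assert(subsection && subsection_len && key);
	if (strncmp(var, section, len) || var[len] != '.')
		return -1;
	var += len + 1;			/* skip "section." */
	dot = strrchr(var, '.');	/* subsections may contain dots */
	if (!dot) {			/* "section.key" */
		*subsection = NULL;
		*subsection_len = 0;
		*key = var;
	} else {			/* "section.subsection.key" */
		*subsection = var;
		*subsection_len = (int)(dot - var);
		*key = dot + 1;
	}
	return 0;
}

enum scope { NOT_SUBMODULE, GENERIC, SPECIFIC };

/* Mirrors the patch's parse_config(): no subsection means a setting
 * for all submodules, otherwise it is specific to one submodule. */
static enum scope classify(const char *var)
{
	const char *sub, *key;
	int sub_len;

	if (toy_parse_config_key(var, "submodule", &sub, &sub_len, &key) < 0)
		return NOT_SUBMODULE;
	return sub_len ? SPECIFIC : GENERIC;
}
```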


* [PATCHv4 9/9] clone: allow an explicit argument for parallel submodule clones
  2015-11-14  1:06 10% [PATCHv4 0/9] Expose submodule parallelism to the user Stefan Beller
                   ` (6 preceding siblings ...)
  2015-11-14  1:07 23% ` [PATCHv4 8/9] submodule update: expose parallelism to the user Stefan Beller
@ 2015-11-14  1:07 24% ` Stefan Beller
  2015-11-20 12:02  4% ` [PATCHv4 0/9] Expose submodule parallelism to the user Jeff King
  8 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-14  1:07 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, j6t,
	Stefan Beller

Just pass it along to "git submodule update", which may pick reasonable
defaults if you don't specify an explicit number.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/git-clone.txt |  6 +++++-
 builtin/clone.c             | 19 +++++++++++++------
 t/t7406-submodule-update.sh | 15 +++++++++++++++
 3 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index f1f2a3f..59d8c67 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -14,7 +14,7 @@ SYNOPSIS
 	  [-o <name>] [-b <name>] [-u <upload-pack>] [--reference <repository>]
 	  [--dissociate] [--separate-git-dir <git dir>]
 	  [--depth <depth>] [--[no-]single-branch]
-	  [--recursive | --recurse-submodules] [--] <repository>
+	  [--recursive | --recurse-submodules] [--jobs <n>] [--] <repository>
 	  [<directory>]
 
 DESCRIPTION
@@ -216,6 +216,10 @@ objects from the source repository into a pack in the cloned repository.
 	The result is Git repository can be separated from working
 	tree.
 
+-j <n>::
+--jobs <n>::
+	The number of submodules fetched at the same time.
+	Defaults to the `submodule.fetchJobs` option.
 
 <repository>::
 	The (possibly remote) repository to clone from.  See the
diff --git a/builtin/clone.c b/builtin/clone.c
index 9eaecd9..ce578d2 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -50,6 +50,7 @@ static int option_progress = -1;
 static struct string_list option_config;
 static struct string_list option_reference;
 static int option_dissociate;
+static int max_jobs = -1;
 
 static struct option builtin_clone_options[] = {
 	OPT__VERBOSITY(&option_verbosity),
@@ -72,6 +73,8 @@ static struct option builtin_clone_options[] = {
 		    N_("initialize submodules in the clone")),
 	OPT_BOOL(0, "recurse-submodules", &option_recursive,
 		    N_("initialize submodules in the clone")),
+	OPT_INTEGER('j', "jobs", &max_jobs,
+		    N_("number of submodules cloned in parallel")),
 	OPT_STRING(0, "template", &option_template, N_("template-directory"),
 		   N_("directory from which templates will be used")),
 	OPT_STRING_LIST(0, "reference", &option_reference, N_("repo"),
@@ -95,10 +98,6 @@ static struct option builtin_clone_options[] = {
 	OPT_END()
 };
 
-static const char *argv_submodule[] = {
-	"submodule", "update", "--init", "--recursive", NULL
-};
-
 static const char *get_repo_path_1(struct strbuf *path, int *is_bundle)
 {
 	static char *suffix[] = { "/.git", "", ".git/.git", ".git" };
@@ -724,8 +723,16 @@ static int checkout(void)
 	err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
 			   sha1_to_hex(sha1), "1", NULL);
 
-	if (!err && option_recursive)
-		err = run_command_v_opt(argv_submodule, RUN_GIT_CMD);
+	if (!err && option_recursive) {
+		struct argv_array args = ARGV_ARRAY_INIT;
+		argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
+
+		if (max_jobs != -1)
+			argv_array_pushf(&args, "--jobs=%d", max_jobs);
+
+		err = run_command_v_opt(args.argv, RUN_GIT_CMD);
+		argv_array_clear(&args);
+	}
 
 	return err;
 }
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 7fd5142..090891e 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -786,4 +786,19 @@ test_expect_success 'submodule update can be run in parallel' '
 	 grep "9 tasks" trace.out
 	)
 '
+
+test_expect_success 'git clone passes the parallel jobs config on to submodules' '
+	test_when_finished "rm -rf super4" &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules --jobs 7 . super4 &&
+	grep "7 tasks" trace.out &&
+	rm -rf super4 &&
+	git config --global submodule.fetchJobs 8 &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules . super4 &&
+	grep "8 tasks" trace.out &&
+	rm -rf super4 &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules --jobs 9 . super4 &&
+	grep "9 tasks" trace.out &&
+	rm -rf super4
+'
+
 test_done
-- 
2.6.3.369.gea52ac0
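
In checkout(), the -1 sentinel means that no --jobs was given on the clone command line. In that case the flag is not forwarded, and `git submodule update` applies its own precedence. Here is a sketch of the same argument list built with a plain array instead of git's internal argv_array API. `build_submodule_update_args()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Builds the argument list that checkout() passes to "git submodule
 * update". `args` needs room for 6 entries and `jobs_buf` must outlive
 * `args`. Returns the number of arguments (excluding the NULL). */
static size_t build_submodule_update_args(const char **args, int max_jobs,
					  char *jobs_buf, size_t jobs_buf_len)
{
	size_t n = 0;

	args[n++] = "submodule";
	args[n++] = "update";
	args[n++] = "--init";
	args[n++] = "--recursive";
	/* -1: --jobs was not given to clone, so forward nothing and let
	 * "git submodule update" fall back to config or its default. */
	if (max_jobs != -1) {
		int len = snprintf(jobs_buf, jobs_buf_len, "--jobs=%d", max_jobs);
		assert(len > 0 && (size_t)len < jobs_buf_len);
		args[n++] = jobs_buf;
	}
	args[n] = NULL;
	return n;
}
```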


* [PATCHv4 8/9] submodule update: expose parallelism to the user
  2015-11-14  1:06 10% [PATCHv4 0/9] Expose submodule parallelism to the user Stefan Beller
                   ` (5 preceding siblings ...)
  2015-11-14  1:07 21% ` [PATCHv4 7/9] git submodule update: have a dedicated helper for cloning Stefan Beller
@ 2015-11-14  1:07 23% ` Stefan Beller
  2015-11-14  1:07 24% ` [PATCHv4 9/9] clone: allow an explicit argument for parallel submodule clones Stefan Beller
  2015-11-20 12:02  4% ` [PATCHv4 0/9] Expose submodule parallelism to the user Jeff King
  8 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-14  1:07 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, j6t,
	Stefan Beller

Expose possible parallelism either via the "--jobs" CLI parameter or
the "submodule.fetchJobs" setting.

By having the variable initialized to -1, we make sure 0 can be passed
into the parallel processing machinery, which will then pick as many
parallel workers as there are CPUs.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/git-submodule.txt |  7 ++++++-
 builtin/submodule--helper.c     | 18 ++++++++++++++----
 git-submodule.sh                |  9 +++++++++
 t/t7406-submodule-update.sh     | 12 ++++++++++++
 4 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index f17687e..a87ff72 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -16,7 +16,7 @@ SYNOPSIS
 'git submodule' [--quiet] deinit [-f|--force] [--] <path>...
 'git submodule' [--quiet] update [--init] [--remote] [-N|--no-fetch]
 	      [-f|--force] [--rebase|--merge] [--reference <repository>]
-	      [--depth <depth>] [--recursive] [--] [<path>...]
+	      [--depth <depth>] [--recursive] [--jobs <n>] [--] [<path>...]
 'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) <n>]
 	      [commit] [--] [<path>...]
 'git submodule' [--quiet] foreach [--recursive] <command>
@@ -374,6 +374,11 @@ for linkgit:git-clone[1]'s `--reference` and `--shared` options carefully.
 	clone with a history truncated to the specified number of revisions.
 	See linkgit:git-clone[1]
 
+-j <n>::
+--jobs <n>::
+	This option is only valid for the update command.
+	Clone new submodules in parallel with as many jobs.
+	Defaults to the `submodule.fetchJobs` option.
 
 <path>...::
 	Paths to submodule(s). When specified this will restrict the command
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 95b45a2..662d329 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -426,6 +426,7 @@ static int update_clone_task_finished(int result,
 
 static int update_clone(int argc, const char **argv, const char *prefix)
 {
+	int max_jobs = -1;
 	struct string_list_item *item;
 	struct submodule_update_clone pp = SUBMODULE_UPDATE_CLONE_INIT;
 
@@ -446,6 +447,8 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 		OPT_STRING(0, "depth", &pp.depth, "<depth>",
 			   N_("Create a shallow clone truncated to the "
 			      "specified number of revisions")),
+		OPT_INTEGER('j', "jobs", &max_jobs,
+			    N_("parallel jobs")),
 		OPT__QUIET(&pp.quiet, N_("do't print cloning progress")),
 		OPT_END()
 	};
@@ -467,10 +470,17 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 	gitmodules_config();
 	/* Overlay the parsed .gitmodules file with .git/config */
 	git_config(git_submodule_config, NULL);
-	run_processes_parallel(1, update_clone_get_next_task,
-				  update_clone_start_failure,
-				  update_clone_task_finished,
-				  &pp);
+
+	if (max_jobs < 0)
+		max_jobs = config_parallel_submodules();
+	if (max_jobs < 0)
+		max_jobs = 1;
+
+	run_processes_parallel(max_jobs,
+			       update_clone_get_next_task,
+			       update_clone_start_failure,
+			       update_clone_task_finished,
+			       &pp);
 
 	if (pp.print_unmatched) {
 		printf("#unmatched\n");
diff --git a/git-submodule.sh b/git-submodule.sh
index 9f554fb..10c5af9 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -645,6 +645,14 @@ cmd_update()
 		--depth=*)
 			depth=$1
 			;;
+		-j|--jobs)
+			case "$2" in '') usage ;; esac
+			jobs="--jobs=$2"
+			shift
+			;;
+		--jobs=*)
+			jobs=$1
+			;;
 		--)
 			shift
 			break
@@ -670,6 +678,7 @@ cmd_update()
 		${update:+--update "$update"} \
 		${reference:+--reference "$reference"} \
 		${depth:+--depth "$depth"} \
+		${jobs:+$jobs} \
 		"$@" | {
 	err=
 	while read mode sha1 stage just_cloned sm_path
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index dda3929..7fd5142 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -774,4 +774,16 @@ test_expect_success 'submodule update --recursive drops module name before recur
 	 test_i18ngrep "Submodule path .deeper/submodule/subsubmodule.: checked out" actual
 	)
 '
+
+test_expect_success 'submodule update can be run in parallel' '
+	(cd super2 &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update --jobs 7 &&
+	 grep "7 tasks" trace.out &&
+	 git config submodule.fetchJobs 8 &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update &&
+	 grep "8 tasks" trace.out &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update --jobs 9 &&
+	 grep "9 tasks" trace.out
+	)
+'
 test_done
-- 
2.6.3.369.gea52ac0
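
update_clone() resolves the job count in a fixed order: the command line first, then `submodule.fetchJobs`, then 1. The commit message also states that a value of 0 makes the parallel machinery use one worker per CPU. The following sketch models both steps. `resolve_jobs()`, `effective_workers()` and `toy_online_cpus()` are hypothetical. Counting CPUs via `sysconf(_SC_NPROCESSORS_ONLN)` is an assumption standing in for git's own CPU detection; it is a common extension on Linux, macOS and the BSDs rather than strict POSIX.

```c
#include <assert.h>
#include <unistd.h>

/* Assumed stand-in for git's CPU detection: fall back to 1 if the
 * count cannot be determined. */
static int toy_online_cpus(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);
	return n > 0 ? (int)n : 1;
}

/* Precedence from the patch: command line, then config, then 1.
 * A negative value means "unset" at each level. */
static int resolve_jobs(int cli_jobs, int config_jobs)
{
	int jobs = cli_jobs;

	if (jobs < 0)
		jobs = config_jobs;
	if (jobs < 0)
		jobs = 1;
	return jobs;
}

/* What the parallel machinery does with the request: anything below 1
 * (in particular 0) means one worker per CPU. */
static int effective_workers(int requested)
{
	return requested < 1 ? toy_online_cpus() : requested;
}
```

Because the "unset" marker is -1 rather than 0, an explicit `--jobs 0` survives resolve_jobs() unchanged and only turns into a CPU count at the last step.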


* [PATCHv4 6/9] fetching submodules: respect `submodule.jobs` config option
  2015-11-14  1:06 10% [PATCHv4 0/9] Expose submodule parallelism to the user Stefan Beller
                   ` (3 preceding siblings ...)
  2015-11-14  1:06 23% ` [PATCHv4 5/9] submodule-config: introduce parse_generic_submodule_config Stefan Beller
@ 2015-11-14  1:06 24% ` Stefan Beller
  2015-11-14  1:07 21% ` [PATCHv4 7/9] git submodule update: have a dedicated helper for cloning Stefan Beller
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-14  1:06 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, j6t,
	Stefan Beller

This allows configuring fetching and updating in parallel
without having to pass the command line option.

This moves the responsibility for determining how many parallel processes
to start from builtin/fetch to submodule.c, as we need a way to communicate
"the user did not specify the number of parallel processes on the command
line" in builtin/fetch. The submodule code takes care of
the precedence (CLI > config > default).

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/config.txt    |  7 +++++++
 builtin/fetch.c             |  2 +-
 submodule-config.c          | 15 +++++++++++++++
 submodule-config.h          |  2 ++
 submodule.c                 |  5 +++++
 t/t5526-fetch-submodules.sh | 14 ++++++++++++++
 6 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 391a0c3..9e7c14c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2643,6 +2643,13 @@ submodule.<name>.ignore::
 	"--ignore-submodules" option. The 'git submodule' commands are not
 	affected by this setting.
 
+submodule.fetchJobs::
+	This is used to determine how many submodules will be
+	fetched/cloned at the same time. Specifying a positive integer
+	allows up to that number of submodules being fetched in parallel.
+	This is used in fetch and clone operations only. A value of 0 will
+	give some reasonable configuration. It defaults to 1.
+
 tag.sort::
 	This variable controls the sort ordering of tags when displayed by
 	linkgit:git-tag[1]. Without the "--sort=<value>" option provided, the
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 9cc1c9d..60e6797 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -37,7 +37,7 @@ static int prune = -1; /* unspecified */
 static int all, append, dry_run, force, keep, multiple, update_head_ok, verbosity;
 static int progress = -1, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow;
-static int max_children = 1;
+static int max_children = -1;
 static const char *depth;
 static const char *upload_pack;
 static struct strbuf default_rla = STRBUF_INIT;
diff --git a/submodule-config.c b/submodule-config.c
index 29e21b2..a32259e 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -32,6 +32,7 @@ enum lookup_type {
 
 static struct submodule_cache cache;
 static int is_cache_init;
+static int parallel_jobs = -1;
 
 static int config_path_cmp(const struct submodule_entry *a,
 			   const struct submodule_entry *b,
@@ -239,6 +240,15 @@ static int parse_generic_submodule_config(const char *key,
 					  const char *value,
 					  struct parse_config_parameter *me)
 {
+	if (!strcmp(key, "fetchjobs")) {
+		parallel_jobs = strtol(value, NULL, 10);
+		if (parallel_jobs < 0) {
+			warning("submodule.fetchJobs not allowed to be negative.");
+			parallel_jobs = 1;
+			return 1;
+		}
+	}
+
 	return 0;
 }
 
@@ -482,3 +492,8 @@ void submodule_free(void)
 	cache_free(&cache);
 	is_cache_init = 0;
 }
+
+int config_parallel_submodules(void)
+{
+	return parallel_jobs;
+}
diff --git a/submodule-config.h b/submodule-config.h
index f9e2a29..d9bbf9a 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -27,4 +27,6 @@ const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
 		const char *path);
 void submodule_free(void);
 
+int config_parallel_submodules(void);
+
 #endif /* SUBMODULE_CONFIG_H */
diff --git a/submodule.c b/submodule.c
index c6350eb..e73f850 100644
--- a/submodule.c
+++ b/submodule.c
@@ -749,6 +749,11 @@ int fetch_populated_submodules(const struct argv_array *options,
 	argv_array_push(&spf.args, "--recurse-submodules-default");
 	/* default value, "--submodule-prefix" and its value are added later */
 
+	if (max_parallel_jobs < 0)
+		max_parallel_jobs = config_parallel_submodules();
+	if (max_parallel_jobs < 0)
+		max_parallel_jobs = 1;
+
 	calculate_changed_submodule_paths();
 	run_processes_parallel(max_parallel_jobs,
 			       get_next_submodule,
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 1b4ce69..6671994 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -470,4 +470,18 @@ test_expect_success "don't fetch submodule when newly recorded commits are alrea
 	test_i18ncmp expect.err actual.err
 '
 
+test_expect_success 'fetching submodules respects parallel settings' '
+	git config fetch.recurseSubmodules true &&
+	(
+		cd downstream &&
+		GIT_TRACE=$(pwd)/trace.out git fetch --jobs 7 &&
+		grep "7 tasks" trace.out &&
+		git config submodule.fetchJobs 8 &&
+		GIT_TRACE=$(pwd)/trace.out git fetch &&
+		grep "8 tasks" trace.out &&
+		GIT_TRACE=$(pwd)/trace.out git fetch --jobs 9 &&
+		grep "9 tasks" trace.out
+	)
+'
+
 test_done
-- 
2.6.3.369.gea52ac0
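
In parse_generic_submodule_config() above, `value` goes straight into strtol(). The config_error_nonbool() checks in the per-submodule parsing code show that `value` is NULL when a variable is written without `=`. A stricter parse could look like the following sketch. `parse_fetch_jobs()` is a hypothetical helper, not part of the patch, and it additionally rejects trailing garbage and out-of-range numbers.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stricter parser for submodule.fetchJobs. On success it
 * stores the job count and returns true. It returns false, leaving *out
 * untouched, for a missing value, trailing garbage, overflow, or a
 * negative number. */
static bool parse_fetch_jobs(const char *value, int *out)
{
	char *end;
	long n;

	assert(out);
	if (!value || !*value)	/* "[submodule] fetchJobs" without "=" */
		return false;
	errno = 0;
	n = strtol(value, &end, 10);
	if (errno || *end || n < 0 || n > INT_MAX)
		return false;
	*out = (int)n;
	return true;
}
```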


* [PATCHv4 7/9] git submodule update: have a dedicated helper for cloning
  2015-11-14  1:06 10% [PATCHv4 0/9] Expose submodule parallelism to the user Stefan Beller
                   ` (4 preceding siblings ...)
  2015-11-14  1:06 24% ` [PATCHv4 6/9] fetching submodules: respect `submodule.jobs` config option Stefan Beller
@ 2015-11-14  1:07 21% ` Stefan Beller
  2015-11-14  1:07 23% ` [PATCHv4 8/9] submodule update: expose parallelism to the user Stefan Beller
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-14  1:07 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, j6t,
	Stefan Beller

This introduces a new helper function in git submodule--helper
which takes care of cloning all submodules, which we want to
parallelize eventually.

Some tests (such as empty URL, update_mode=none) are required in the
helper to make the decision for cloning. These checks have been
moved into the C function as well (no need to repeat them in the
shell script).

As we can only access the stderr channel from within the parallel
processing engine, we need to reroute the error message for
submodules that were specified but are not initialized to stderr. As it
is an error message, it should have gone to stderr in the first place,
so this is a bug fix along the way.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/submodule--helper.c | 229 ++++++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh            |  45 +++------
 t/t7400-submodule-basic.sh  |   4 +-
 3 files changed, 242 insertions(+), 36 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index f4c3eff..95b45a2 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -255,6 +255,234 @@ static int module_clone(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
+static int git_submodule_config(const char *var, const char *value, void *cb)
+{
+	return parse_submodule_config_option(var, value);
+}
+
+struct submodule_update_clone {
+	/* states */
+	int count;
+	int print_unmatched;
+	/* configuration */
+	int quiet;
+	const char *reference;
+	const char *depth;
+	const char *update;
+	const char *recursive_prefix;
+	const char *prefix;
+	struct module_list list;
+	struct string_list projectlines;
+	struct pathspec pathspec;
+};
+#define SUBMODULE_UPDATE_CLONE_INIT {0, 0, 0, NULL, NULL, NULL, NULL, NULL, MODULE_LIST_INIT, STRING_LIST_INIT_DUP}
+
+static void fill_clone_command(struct child_process *cp, int quiet,
+			       const char *prefix, const char *path,
+			       const char *name, const char *url,
+			       const char *reference, const char *depth)
+{
+	cp->git_cmd = 1;
+	cp->no_stdin = 1;
+	cp->stdout_to_stderr = 1;
+	cp->err = -1;
+	argv_array_push(&cp->args, "submodule--helper");
+	argv_array_push(&cp->args, "clone");
+	if (quiet)
+		argv_array_push(&cp->args, "--quiet");
+
+	if (prefix)
+		argv_array_pushl(&cp->args, "--prefix", prefix, NULL);
+
+	argv_array_pushl(&cp->args, "--path", path, NULL);
+	argv_array_pushl(&cp->args, "--name", name, NULL);
+	argv_array_pushl(&cp->args, "--url", url, NULL);
+	if (reference)
+		argv_array_push(&cp->args, reference);
+	if (depth)
+		argv_array_push(&cp->args, depth);
+}
+
+static int update_clone_get_next_task(void **pp_task_cb,
+				      struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	for (; pp->count < pp->list.nr; pp->count++) {
+		const struct submodule *sub = NULL;
+		const char *displaypath = NULL;
+		const struct cache_entry *ce = pp->list.entries[pp->count];
+		struct strbuf sb = STRBUF_INIT;
+		const char *update_module = NULL;
+		char *url = NULL;
+		int needs_cloning = 0;
+
+		if (ce_stage(ce)) {
+			if (pp->recursive_prefix)
+				strbuf_addf(err, "Skipping unmerged submodule %s/%s\n",
+					pp->recursive_prefix, ce->name);
+			else
+				strbuf_addf(err, "Skipping unmerged submodule %s\n",
+					ce->name);
+			continue;
+		}
+
+		sub = submodule_from_path(null_sha1, ce->name);
+		if (!sub) {
+			strbuf_addf(err, "BUG: internal error managing submodules. "
+				    "The cache could not locate '%s'", ce->name);
+			pp->print_unmatched = 1;
+			continue;
+		}
+
+		if (pp->recursive_prefix)
+			displaypath = relative_path(pp->recursive_prefix, ce->name, &sb);
+		else
+			displaypath = ce->name;
+
+		if (pp->update)
+			update_module = pp->update;
+		if (!update_module)
+			update_module = sub->update;
+		if (!update_module)
+			update_module = "checkout";
+		if (!strcmp(update_module, "none")) {
+			strbuf_addf(err, "Skipping submodule '%s'\n", displaypath);
+			continue;
+		}
+
+		/*
+		 * Looking up the url in .git/config.
+		 * We must not fall back to .gitmodules as we only want to process
+		 * configured submodules.
+		 */
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "submodule.%s.url", sub->name);
+		git_config_get_string(sb.buf, &url);
+		if (!url) {
+			/*
+			 * Only mention uninitialized submodules when their
+			 * path has been specified
+			 */
+			if (pp->pathspec.nr)
+				strbuf_addf(err, _("Submodule path '%s' not initialized\n"
+					"Maybe you want to use 'update --init'?"), displaypath);
+			continue;
+		}
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%s/.git", ce->name);
+		needs_cloning = !file_exists(sb.buf);
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%06o %s %d %d\t%s\n", ce->ce_mode,
+				sha1_to_hex(ce->sha1), ce_stage(ce),
+				needs_cloning, ce->name);
+		string_list_append(&pp->projectlines, sb.buf);
+
+		if (needs_cloning) {
+			fill_clone_command(cp, pp->quiet, pp->prefix, ce->name,
+					   sub->name, url, pp->reference, pp->depth);
+			pp->count++;
+			free(url);
+			return 1;
+		} else
+			free(url);
+	}
+	return 0;
+}
+
+static int update_clone_start_failure(struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb,
+				      void *pp_task_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	strbuf_addf(err, "error when starting a child process");
+	pp->print_unmatched = 1;
+
+	return 1;
+}
+
+static int update_clone_task_finished(int result,
+				      struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb,
+				      void *pp_task_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	if (!result) {
+		return 0;
+	} else {
+		strbuf_addf(err, "error in one child process");
+		pp->print_unmatched = 1;
+		return 1;
+	}
+}
+
+static int update_clone(int argc, const char **argv, const char *prefix)
+{
+	struct string_list_item *item;
+	struct submodule_update_clone pp = SUBMODULE_UPDATE_CLONE_INIT;
+
+	struct option module_list_options[] = {
+		OPT_STRING(0, "prefix", &prefix,
+			   N_("path"),
+			   N_("path into the working tree")),
+		OPT_STRING(0, "recursive_prefix", &pp.recursive_prefix,
+			   N_("path"),
+			   N_("path into the working tree, across nested "
+			      "submodule boundaries")),
+		OPT_STRING(0, "update", &pp.update,
+			   N_("string"),
+			   N_("update command for submodules")),
+		OPT_STRING(0, "reference", &pp.reference, "<repository>",
+			   N_("Use the local reference repository "
+			      "instead of a full clone")),
+		OPT_STRING(0, "depth", &pp.depth, "<depth>",
+			   N_("Create a shallow clone truncated to the "
+			      "specified number of revisions")),
+		OPT__QUIET(&pp.quiet, N_("don't print cloning progress")),
+		OPT_END()
+	};
+
+	const char *const git_submodule_helper_usage[] = {
+		N_("git submodule--helper update-clone [--prefix=<path>] [<path>...]"),
+		NULL
+	};
+	pp.prefix = prefix;
+
+	argc = parse_options(argc, argv, prefix, module_list_options,
+			     git_submodule_helper_usage, 0);
+
+	if (module_list_compute(argc, argv, prefix, &pp.pathspec, &pp.list) < 0) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	gitmodules_config();
+	/* Overlay the parsed .gitmodules file with .git/config */
+	git_config(git_submodule_config, NULL);
+	run_processes_parallel(1, update_clone_get_next_task,
+				  update_clone_start_failure,
+				  update_clone_task_finished,
+				  &pp);
+
+	if (pp.print_unmatched) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	for_each_string_list_item(item, &pp.projectlines) {
+		utf8_fprintf(stdout, "%s", item->string);
+	}
+	return 0;
+}
+
 struct cmd_struct {
 	const char *cmd;
 	int (*fn)(int, const char **, const char *);
@@ -264,6 +492,7 @@ static struct cmd_struct commands[] = {
 	{"list", module_list},
 	{"name", module_name},
 	{"clone", module_clone},
+	{"update-clone", update_clone}
 };
 
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
diff --git a/git-submodule.sh b/git-submodule.sh
index 9bc5c5f..9f554fb 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -664,17 +664,18 @@ cmd_update()
 		cmd_init "--" "$@" || return
 	fi
 
-	cloned_modules=
-	git submodule--helper list --prefix "$wt_prefix" "$@" | {
+	git submodule--helper update-clone ${GIT_QUIET:+--quiet} \
+		${wt_prefix:+--prefix "$wt_prefix"} \
+		${prefix:+--recursive_prefix "$prefix"} \
+		${update:+--update "$update"} \
+		${reference:+--reference "$reference"} \
+		${depth:+--depth "$depth"} \
+		"$@" | {
 	err=
-	while read mode sha1 stage sm_path
+	while read mode sha1 stage just_cloned sm_path
 	do
 		die_if_unmatched "$mode"
-		if test "$stage" = U
-		then
-			echo >&2 "Skipping unmerged submodule $prefix$sm_path"
-			continue
-		fi
+
 		name=$(git submodule--helper name "$sm_path") || exit
 		url=$(git config submodule."$name".url)
 		branch=$(get_submodule_config "$name" branch master)
@@ -691,27 +692,10 @@ cmd_update()
 
 		displaypath=$(relative_path "$prefix$sm_path")
 
-		if test "$update_module" = "none"
-		then
-			echo "Skipping submodule '$displaypath'"
-			continue
-		fi
-
-		if test -z "$url"
-		then
-			# Only mention uninitialized submodules when its
-			# path have been specified
-			test "$#" != "0" &&
-			say "$(eval_gettext "Submodule path '\$displaypath' not initialized
-Maybe you want to use 'update --init'?")"
-			continue
-		fi
-
-		if ! test -d "$sm_path"/.git && ! test -f "$sm_path"/.git
+		if test $just_cloned -eq 1
 		then
-			git submodule--helper clone ${GIT_QUIET:+--quiet} --prefix "$prefix" --path "$sm_path" --name "$name" --url "$url" "$reference" "$depth" || exit
-			cloned_modules="$cloned_modules;$name"
 			subsha1=
+			update_module=checkout
 		else
 			subsha1=$(clear_local_git_env; cd "$sm_path" &&
 				git rev-parse --verify HEAD) ||
@@ -751,13 +735,6 @@ Maybe you want to use 'update --init'?")"
 				die "$(eval_gettext "Unable to fetch in submodule path '\$displaypath'")"
 			fi
 
-			# Is this something we just cloned?
-			case ";$cloned_modules;" in
-			*";$name;"*)
-				# then there is no local change to integrate
-				update_module=checkout ;;
-			esac
-
 			must_die_on_failure=
 			case "$update_module" in
 			checkout)
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index 540771c..5991e3c 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -462,7 +462,7 @@ test_expect_success 'update --init' '
 	git config --remove-section submodule.example &&
 	test_must_fail git config submodule.example.url &&
 
-	git submodule update init > update.out &&
+	git submodule update init 2> update.out &&
 	cat update.out &&
 	test_i18ngrep "not initialized" update.out &&
 	test_must_fail git rev-parse --resolve-git-dir init/.git &&
@@ -480,7 +480,7 @@ test_expect_success 'update --init from subdirectory' '
 	mkdir -p sub &&
 	(
 		cd sub &&
-		git submodule update ../init >update.out &&
+		git submodule update ../init 2>update.out &&
 		cat update.out &&
 		test_i18ngrep "not initialized" update.out &&
 		test_must_fail git rev-parse --resolve-git-dir ../init/.git &&
-- 
2.6.3.369.gea52ac0

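With this patch, the helper emits one line per submodule in the format `%06o %s %d %d\t%s` (mode, sha1, stage, just_cloned, path). git-submodule.sh reads these lines with `while read mode sha1 stage just_cloned sm_path`. The following C sketch restates the per-entry checks from update_clone_get_next_task() together with that line format. The enum, `decide()` and `format_project_line()` are hypothetical names chosen for illustration. The sketch also works on plain strings rather than git's cache_entry and submodule structures.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical outcomes for one index entry. */
enum clone_decision {
	SKIP_UNMERGED,      /* "Skipping unmerged submodule ..." */
	SKIP_UPDATE_NONE,   /* update mode is "none" */
	SKIP_UNINITIALIZED, /* no submodule.<name>.url in .git/config */
	UPDATE_EXISTING,    /* <path>/.git exists, handled by the shell loop */
	NEEDS_CLONE         /* a "submodule--helper clone" child is started */
};

/*
 * stage:          index stage of the gitlink (non-zero means unmerged)
 * cmdline_update: --update argument, or NULL
 * config_update:  submodule.<name>.update, or NULL
 * url:            submodule.<name>.url from .git/config, or NULL
 * has_gitdir:     non-zero if <path>/.git exists
 */
static enum clone_decision decide(int stage, const char *cmdline_update,
				  const char *config_update, const char *url,
				  int has_gitdir)
{
	/* Same fallback chain as the patch: --update, config, "checkout". */
	const char *mode = cmdline_update ? cmdline_update :
			   config_update ? config_update : "checkout";

	if (stage)
		return SKIP_UNMERGED;
	if (!strcmp(mode, "none"))
		return SKIP_UPDATE_NONE;
	if (!url)
		return SKIP_UNINITIALIZED;
	return has_gitdir ? UPDATE_EXISTING : NEEDS_CLONE;
}

/* Format one output line as read by the shell loop in git-submodule.sh. */
static int format_project_line(char *buf, size_t len, unsigned int mode,
			       const char *hex, int stage, int just_cloned,
			       const char *path)
{
	return snprintf(buf, len, "%06o %s %d %d\t%s\n",
			mode, hex, stage, just_cloned, path);
}
```

Because unmerged entries are skipped before any line is printed, the stage column is always 0 in practice. The shell side therefore only needs `just_cloned` to decide whether to force `update_module=checkout`.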

* [PATCHv4 4/9] submodule-config: remove name_and_item_from_var
  2015-11-14  1:06 10% [PATCHv4 0/9] Expose submodule parallelism to the user Stefan Beller
  2015-11-14  1:06 26% ` [PATCHv4 2/9] submodule-config: keep update strategy around Stefan Beller
  2015-11-14  1:06 24% ` [PATCHv4 3/9] submodule-config: drop check against NULL Stefan Beller
@ 2015-11-14  1:06 24% ` Stefan Beller
  2015-11-14  1:06 23% ` [PATCHv4 5/9] submodule-config: introduce parse_generic_submodule_config Stefan Beller
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-14  1:06 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, j6t,
	Stefan Beller

`name_and_item_from_var` does not provide the abstraction we need
in a later patch.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule-config.c | 48 ++++++++++++++++--------------------------------
 1 file changed, 16 insertions(+), 32 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 6d01941..b826841 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -161,31 +161,17 @@ static struct submodule *cache_lookup_name(struct submodule_cache *cache,
 	return NULL;
 }
 
-static int name_and_item_from_var(const char *var, struct strbuf *name,
-				  struct strbuf *item)
-{
-	const char *subsection, *key;
-	int subsection_len, parse;
-	parse = parse_config_key(var, "submodule", &subsection,
-			&subsection_len, &key);
-	if (parse < 0 || !subsection)
-		return 0;
-
-	strbuf_add(name, subsection, subsection_len);
-	strbuf_addstr(item, key);
-
-	return 1;
-}
-
 static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
-		const unsigned char *gitmodules_sha1, const char *name)
+						  const unsigned char *gitmodules_sha1,
+						  const char *name_ptr, int name_len)
 {
 	struct submodule *submodule;
 	struct strbuf name_buf = STRBUF_INIT;
+	char *name = xmemdupz(name_ptr, name_len);
 
 	submodule = cache_lookup_name(cache, gitmodules_sha1, name);
 	if (submodule)
-		return submodule;
+		goto out;
 
 	submodule = xmalloc(sizeof(*submodule));
 
@@ -201,7 +187,8 @@ static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
 	hashcpy(submodule->gitmodules_sha1, gitmodules_sha1);
 
 	cache_add(cache, submodule);
-
+out:
+	free(name);
 	return submodule;
 }
 
@@ -251,18 +238,18 @@ static int parse_config(const char *var, const char *value, void *data)
 {
 	struct parse_config_parameter *me = data;
 	struct submodule *submodule;
-	struct strbuf name = STRBUF_INIT, item = STRBUF_INIT;
-	int ret = 0;
+	int subsection_len, ret = 0;
+	const char *subsection, *key;
 
-	/* this also ensures that we only parse submodule entries */
-	if (!name_and_item_from_var(var, &name, &item))
+	if (parse_config_key(var, "submodule", &subsection,
+			     &subsection_len, &key) < 0 || !subsection_len)
 		return 0;
 
 	submodule = lookup_or_create_by_name(me->cache,
 					     me->gitmodules_sha1,
-					     name.buf);
+					     subsection, subsection_len);
 
-	if (!strcmp(item.buf, "path")) {
+	if (!strcmp(key, "path")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->path)
@@ -275,7 +262,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			submodule->path = xstrdup(value);
 			cache_put_path(me->cache, submodule);
 		}
-	} else if (!strcmp(item.buf, "fetchrecursesubmodules")) {
+	} else if (!strcmp(key, "fetchrecursesubmodules")) {
 		/* when parsing worktree configurations we can die early */
 		int die_on_error = is_null_sha1(me->gitmodules_sha1);
 		if (!me->overwrite &&
@@ -286,7 +273,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			submodule->fetch_recurse = parse_fetch_recurse(
 								var, value,
 								die_on_error);
-	} else if (!strcmp(item.buf, "ignore")) {
+	} else if (!strcmp(key, "ignore")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->ignore)
@@ -302,7 +289,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->ignore);
 			submodule->ignore = xstrdup(value);
 		}
-	} else if (!strcmp(item.buf, "url")) {
+	} else if (!strcmp(key, "url")) {
 		if (!value) {
 			ret = config_error_nonbool(var);
 		} else if (!me->overwrite && submodule->url) {
@@ -312,7 +299,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->url);
 			submodule->url = xstrdup(value);
 		}
-	} else if (!strcmp(item.buf, "update")) {
+	} else if (!strcmp(key, "update")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->update)
@@ -324,9 +311,6 @@ static int parse_config(const char *var, const char *value, void *data)
 		}
 	}
 
-	strbuf_release(&name);
-	strbuf_release(&item);
-
 	return ret;
 }
 
-- 
2.6.3.369.gea52ac0

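With name_and_item_from_var gone, parse_config() calls parse_config_key() directly. It ignores variables without a subsection via `!subsection_len`. That check is also what keeps a two-level variable such as `submodule.fetchJobs` apart from `submodule.<name>.url`, as discussed earlier in this thread. The sketch below is a simplified, hypothetical stand-in for that splitting. `split_config_var` is not git's API, and unlike git it does no case normalization. The subsection is everything between the dot after the section name and the last dot.

```c
#include <assert.h>
#include <string.h>

/*
 * Split "section.subsection.key" (or "section.key").
 * Returns 0 and fills the out-parameters if var starts with "<section>.",
 * -1 otherwise. For a two-level name such as "submodule.fetchJobs",
 * *subsection is NULL and *subsection_len is 0.
 */
static int split_config_var(const char *var, const char *section,
			    const char **subsection, int *subsection_len,
			    const char **key)
{
	size_t slen = strlen(section);
	const char *dot;

	if (strncmp(var, section, slen) || var[slen] != '.')
		return -1;
	var += slen + 1;         /* now "sub.key" or "key" */
	dot = strrchr(var, '.'); /* the key itself never contains a dot */
	if (dot) {
		*subsection = var;
		*subsection_len = (int)(dot - var);
		*key = dot + 1;
	} else {
		*subsection = NULL;
		*subsection_len = 0;
		*key = var;
	}
	return 0;
}
```

Because the last dot is used, a subsection such as `a.b` in `submodule.a.b.url` stays intact, and the name is handed over as a pointer/length pair. This is why lookup_or_create_by_name() now takes `name_ptr, name_len` and copies it with xmemdupz().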

* [PATCHv4 0/9] Expose submodule parallelism to the user
@ 2015-11-14  1:06 10% Stefan Beller
  2015-11-14  1:06 26% ` [PATCHv4 2/9] submodule-config: keep update strategy around Stefan Beller
                   ` (8 more replies)
  0 siblings, 9 replies; 200+ results
From: Stefan Beller @ 2015-11-14  1:06 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, j6t,
	Stefan Beller

This replaces sb/submodule-parallel-update.
It applies on top of d075d2604c0 (Merge branch
'rs/daemon-plug-child-leak' into sb/submodule-parallel-update),
with submodule-parallel-fetch additionally merged; that branch has
"run-command: detect finished children by closed pipe rather than
waitpid" applied on top of it.
Alternatively, pull from github/stefanbeller/git submodule-parallel-update.

* This lets you configure submodule.fetchJobs instead of the previous submodule.jobs
* No weird NONBLOCK workarounds any more, as that is now handled by submodule-parallel-fetch
  (or the patch on top of it)


Stefan Beller (9):
  run_processes_parallel: delimit intermixed task output
  submodule-config: keep update strategy around
  submodule-config: drop check against NULL
  submodule-config: remove name_and_item_from_var
  submodule-config: introduce parse_generic_submodule_config
  fetching submodules: respect `submodule.jobs` config option
  git submodule update: have a dedicated helper for cloning
  submodule update: expose parallelism to the user
  clone: allow an explicit argument for parallel submodule clones

 Documentation/config.txt        |   7 ++
 Documentation/git-clone.txt     |   6 +-
 Documentation/git-submodule.txt |   7 +-
 builtin/clone.c                 |  19 +++-
 builtin/fetch.c                 |   2 +-
 builtin/submodule--helper.c     | 239 ++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh                |  54 ++++-----
 run-command.c                   |   4 +
 submodule-config.c              | 109 +++++++++++-------
 submodule-config.h              |   3 +
 submodule.c                     |   5 +
 t/t5526-fetch-submodules.sh     |  14 +++
 t/t7400-submodule-basic.sh      |   4 +-
 t/t7406-submodule-update.sh     |  27 +++++
 14 files changed, 417 insertions(+), 83 deletions(-)

-- 
2.6.3.369.gea52ac0


* [PATCHv4 2/9] submodule-config: keep update strategy around
  2015-11-14  1:06 10% [PATCHv4 0/9] Expose submodule parallelism to the user Stefan Beller
@ 2015-11-14  1:06 26% ` Stefan Beller
  2015-11-14  1:06 24% ` [PATCHv4 3/9] submodule-config: drop check against NULL Stefan Beller
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-14  1:06 UTC (permalink / raw)
  To: git
  Cc: ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine, j6t,
	Stefan Beller

We need the submodule update strategies in a later patch.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 submodule-config.c | 11 +++++++++++
 submodule-config.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/submodule-config.c b/submodule-config.c
index afe0ea8..4239b0e 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -194,6 +194,7 @@ static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
 
 	submodule->path = NULL;
 	submodule->url = NULL;
+	submodule->update = NULL;
 	submodule->fetch_recurse = RECURSE_SUBMODULES_NONE;
 	submodule->ignore = NULL;
 
@@ -311,6 +312,16 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->url);
 			submodule->url = xstrdup(value);
 		}
+	} else if (!strcmp(item.buf, "update")) {
+		if (!value)
+			ret = config_error_nonbool(var);
+		else if (!me->overwrite && submodule->update != NULL)
+			warn_multiple_config(me->commit_sha1, submodule->name,
+					     "update");
+		else {
+			free((void *) submodule->update);
+			submodule->update = xstrdup(value);
+		}
 	}
 
 	strbuf_release(&name);
diff --git a/submodule-config.h b/submodule-config.h
index 9061e4e..f9e2a29 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -14,6 +14,7 @@ struct submodule {
 	const char *url;
 	int fetch_recurse;
 	const char *ignore;
+	const char *update;
 	/* the sha1 blob id of the responsible .gitmodules file */
 	unsigned char gitmodules_sha1[20];
 };
-- 
2.6.3.369.gea52ac0

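The new update option follows the same "first value wins unless overwriting" rule as the existing path, ignore and url options. Without `overwrite`, a second definition only triggers warn_multiple_config(). With it, the old string is freed and replaced. My understanding is that `overwrite` is set when values from .git/config are overlaid on .gitmodules, but treat that as an assumption. Below is a hedged sketch of the pattern on a plain `char *` slot. `set_string_option()` is a hypothetical name, and the warning text is simplified.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Store value in *slot following the parse_config() pattern:
 * returns -1 for a valueless entry (git reports config_error_nonbool),
 * 1 if a duplicate was ignored because overwrite is off,
 * 0 if the value was stored.
 */
static int set_string_option(char **slot, const char *value, int overwrite)
{
	size_t len;
	char *copy;

	if (!value)
		return -1;
	if (!overwrite && *slot) {
		fprintf(stderr, "warning: multiple values for the same option\n");
		return 1;
	}
	len = strlen(value) + 1;
	copy = malloc(len);
	if (!copy)
		abort();
	memcpy(copy, value, len);
	free(*slot); /* the patch does free((void *) submodule->update) */
	*slot = copy;
	return 0;
}
```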

* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-12 23:34  4% ` Stefan Beller
@ 2015-11-15 12:43  2%   ` Lars Schneider
  0 siblings, 0 replies; 200+ results
From: Lars Schneider @ 2015-11-15 12:43 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org


On 13 Nov 2015, at 00:34, Stefan Beller <sbeller@google.com> wrote:

> On Thu, Nov 12, 2015 at 1:37 AM,  <larsxschneider@gmail.com> wrote:
>> From: Lars Schneider <larsxschneider@gmail.com>
>> 
>> "git clone --recursive --depth 1 --single-branch <url>" clones the
>> submodules successfully. However, it does not obey "--depth 1" for
>> submodule cloning.
>> 
>> The following workaround only works if the submodule pointer in use
>> is on the default branch. Otherwise "git submodule update" fails with
>> "fatal: reference is not a tree:" and "Unable to checkout".
>> git clone --depth 1 --single-branch <url>
>> cd <repo-name>
>> git submodule update --init --recursive --depth 1
>> 
>> The workaround does not fail using the "--remote" flag. However, in that
>> case the wrong commit is checked out.
>> 
>> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
>> ---
> 
> Thanks for writing these tests. :)
Thanks for looking into the issue :)


> 
>> +test_expect_failure shallow-clone-recursive-workaround '
>> +       URL="file://$(pwd | sed "s/[[:space:]]/%20/g")/repo" &&
>> +       echo $URL &&
>> +       git clone --depth 1 --single-branch $URL clone-recursive-workaround &&
>> +       (
>> +               cd "clone-recursive-workaround" &&
>> +               git log --oneline >lines &&
>> +               test_line_count = 1 lines &&
>> +               git submodule update --init --recursive --depth 1
> 
> Should we prepend the lines with git submodule update with test_must_fail here?
Wouldn't the test fail then? The test is expected to fail (see "test_expect_failure"). Am I missing something?


> 
>> +       )
>> +'
>> +
>> +test_expect_failure shallow-clone-recursive-with-remote-workaround '
>> +       URL="file://$(pwd | sed "s/[[:space:]]/%20/g")/repo" &&
>> +       echo $URL &&
>> +       git clone --depth 1 --single-branch $URL clone-recursive-remote-workaround &&
>> +       (
>> +               cd "clone-recursive-remote-workaround" &&
>> +               git log --oneline >lines &&
>> +               test_line_count = 1 lines &&
>> +               git submodule update --init --remote --recursive --depth 1 &&
>> +               git status submodule >status &&
>> +               test_must_fail grep "modified:" status
> 
> Use ! here instead of test_must_fail.
> 
> IIUC we use test_must_fail for git commands (to test that git
> returns a non-zero exit code instead of segfaulting).
> On the other hand, we trust grep not to segfault, so just negating
> its exit status is enough here.

OK! I will fix that in the next series!

Thanks,
Lars


* Re: [RFC] URL rewrite in .gitmodules
  2015-10-26 16:52  2%       ` Jens Lehmann
@ 2015-11-15 13:16  0%         ` Lars Schneider
  0 siblings, 0 replies; 200+ results
From: Lars Schneider @ 2015-11-15 13:16 UTC (permalink / raw)
  To: Jens Lehmann; +Cc: Stefan Beller, Junio C Hamano, Git Users


On 26 Oct 2015, at 17:52, Jens Lehmann <Jens.Lehmann@web.de> wrote:

> Am 26.10.2015 um 17:34 schrieb Stefan Beller:
>> On Sun, Oct 25, 2015 at 8:12 AM, Lars Schneider <larsxschneider@gmail.com> wrote:
>>> On 20 Oct 2015, at 19:33, Junio C Hamano <gitster@pobox.com> wrote:
>>>> I do not think this topic is specific to use of submodules.  If you
>>>> want to encourage your engineers to fetch from nearby mirrors you
>>>> maintain, you would want a forest of url.mine.insteadof=theirs for
>>>> the external repositories that matter to you specified by
>>>> everybody's $HOME/.gitconfig, and one way to do so would be to have
>>>> them use the configuration inclusion.  An item in your engineer
>>>> orientation material could tell them to add
>>>> 
>>>>       [include]
>>>>               path = /usr/local/etc/git/mycompany.urlrewrite
>>>> 
>>>> when they set up their "[user] name/email" in there.
>>>> 
>>>> And you can update /usr/local/etc/git/mycompany.urlrewrite as
>>>> needed.
>>> Oh nice, I didn't know about "include". However, as mentioned to Stefan in this thread, I fear that our engineers will miss that. I would prefer a solution that does not need any additional setup. Therefore the suggestion to add rewrites in the .gitmodules file.
>> 
>> How do you distribute new copies of Git to your engineers?
>> Maybe you could ship them a version which has the "include" line
>> already builtin as default? So your distributed copy of Git
>> would not just check the default places for configs, but also
>> some compiled-in /net/share/mycompany.gitconfig
> 
> Which is just what we do at $DAYJOB, that way you can easily
> distribute all kinds of settings, customizations and hooks
> company-wide.

That's a very good idea. I will try to establish this practice, too.

Thanks,
Lars

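When several url.<base>.insteadOf prefixes match a URL, git uses the longest match. This matters once a shared include file like /usr/local/etc/git/mycompany.urlrewrite accumulates both broad and specific rules. Below is a small C sketch of that rewrite rule. `struct rewrite` and `rewrite_url()` are hypothetical, and this is not git's own implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One url.<base>.insteadOf=<instead_of> entry. */
struct rewrite {
	const char *base;
	const char *instead_of;
};

/*
 * Rewrite url using the longest matching insteadOf prefix.
 * Writes the result into out; returns 1 if a rule applied,
 * 0 if url was copied unchanged.
 */
static int rewrite_url(const struct rewrite *rules, size_t nr,
		       const char *url, char *out, size_t outlen)
{
	const struct rewrite *best = NULL;
	size_t best_len = 0, i;

	for (i = 0; i < nr; i++) {
		size_t len = strlen(rules[i].instead_of);

		if (len > best_len && !strncmp(url, rules[i].instead_of, len)) {
			best = &rules[i];
			best_len = len;
		}
	}
	if (!best) {
		snprintf(out, outlen, "%s", url);
		return 0;
	}
	snprintf(out, outlen, "%s%s", best->base, url + best_len);
	return 1;
}
```

For example, take two made-up rules: `https://mirror.example.com/special/` as insteadOf `https://github.com/org/`, and `https://mirror.example.com/` as insteadOf `https://github.com/`. With both configured, `https://github.com/org/repo.git` is rewritten by the longer, more specific rule.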

* Re: [PATCH] push: add recurseSubmodules config option
  @ 2015-11-16 18:15  2% ` Stefan Beller
  2015-11-16 18:31  2%   ` Mike Crowe
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-16 18:15 UTC (permalink / raw)
  To: Mike Crowe; +Cc: git@vger.kernel.org

On Mon, Nov 16, 2015 at 5:24 AM, Mike Crowe <mac@mcrowe.com> wrote:
> The --recurse-submodules command line parameter has existed for some
> time but it has no config file equivalent.
>
> Following the style of the corresponding parameter for git fetch, let's
> invent push.recurseSubmodules to provide a default for this
> parameter. This also requires the addition of --recurse-submodules=no to
> allow the configuration to be overridden on the command line when
> required.
>
> The most straightforward way to implement this appears to be to make
> push use code in submodule-config in a similar way to fetch.
>
> Signed-off-by: Mike Crowe <mac@mcrowe.com>
> ---

The code itself looks good to me, one nit in the tests though.

> @@ -79,6 +87,119 @@ test_expect_success 'push succeeds after commit was pushed to remote' '
>         )
>  '
>
> +test_expect_success 'push succeeds if submodule commit not on remote but using on-demand on command line' '
> +       (
> +               cd work/gar/bage &&
> +               >recurse-on-demand-on-command-line &&
> +               git add recurse-on-demand-on-command-line &&
> +               git commit -m "Recurse on-demand on command line junk"
> +       ) &&
> +       (
> +               cd work &&
> +               git add gar/bage &&
> +               git commit -m "Recurse on-demand on command line for gar/bage" &&
> +               git push --recurse-submodules=on-demand ../pub.git master &&
> +               # Check that the supermodule commit got there
> +               git fetch ../pub.git &&
> +               git diff --quiet FETCH_HEAD master

Missing && chain here.

> +               # Check that the submodule commit got there too
> +               cd gar/bage &&
> +               git diff --quiet origin/master master
> +       )
> +'
> +


* Re: [PATCH] push: add recurseSubmodules config option
  2015-11-16 18:15  2% ` Stefan Beller
@ 2015-11-16 18:31  2%   ` Mike Crowe
  2015-11-16 19:05  0%     ` Jens Lehmann
  0 siblings, 1 reply; 200+ results
From: Mike Crowe @ 2015-11-16 18:31 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org

On Monday 16 November 2015 at 10:15:24 -0800, Stefan Beller wrote:
> The code itself looks good to me, one nit in the tests though.
> 
> > @@ -79,6 +87,119 @@ test_expect_success 'push succeeds after commit was pushed to remote' '
> >         )
> >  '
> >
> > +test_expect_success 'push succeeds if submodule commit not on remote but using on-demand on command line' '
> > +       (
> > +               cd work/gar/bage &&
> > +               >recurse-on-demand-on-command-line &&
> > +               git add recurse-on-demand-on-command-line &&
> > +               git commit -m "Recurse on-demand on command line junk"
> > +       ) &&
> > +       (
> > +               cd work &&
> > +               git add gar/bage &&
> > +               git commit -m "Recurse on-demand on command line for gar/bage" &&
> > +               git push --recurse-submodules=on-demand ../pub.git master &&
> > +               # Check that the supermodule commit got there
> > +               git fetch ../pub.git &&
> > +               git diff --quiet FETCH_HEAD master
> 
> Missing && chain here.

Oh, well spotted! I'll provide an updated version.

Thanks.

Mike.


* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-14  0:10  5%           ` Stefan Beller
@ 2015-11-16 18:59  5%             ` Jens Lehmann
  2015-11-16 19:25  5%               ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Jens Lehmann @ 2015-11-16 18:59 UTC (permalink / raw)
  To: Stefan Beller, Jeff King
  Cc: Lars Schneider, git@vger.kernel.org, Junio C Hamano, Duy Nguyen

Am 14.11.2015 um 01:10 schrieb Stefan Beller:
> On Fri, Nov 13, 2015 at 3:41 PM, Jeff King <peff@peff.net> wrote:
>> On Fri, Nov 13, 2015 at 06:38:07PM -0500, Jeff King wrote:
>>
>>> On Fri, Nov 13, 2015 at 03:16:01PM -0800, Stefan Beller wrote:
>>>
>>>> Junio wrote on Oct 09, 2014:
>>>>> This is so non-standard a thing to do that I doubt it is worth
>>>>> supporting with "git clone".  "git clone --branch", which is about
>>>>> "I want to follow that particular branch", would not mesh well with
>>>>> "I want to see the history that leads to this exact commit", either.
>>>>> You would not know which branch(es) that exact commit is on
>>>>> the first place.
>>>>
>>>> I disagree with this. This is the *exact* thing you actually want to do when
>>>> dealing with submodules. When fetching/cloning for a submodule, you want
>>>> to obtain the exact sha1, instead of a branch (which happens to be supported
>>>> too, but is not the original use case with submodules.)

Yes, being able to fetch certain sha1s makes lots of sense for submodules
(this has been discussed some time ago at a GitTogether). But - apart from
the extra network load - it's rather helpful to get all the submodule
branches too (though that could be limited to the branches the sha1 is on).

>>> I think this is already implemented in 68ee628 (upload-pack: optionally
>>> allow fetching reachable sha1, 2015-05-21), isn't it?
>>
>> Note that this just implements the server side. I think to use this with
>> submodules right now, you'd have to manually "git init && git fetch" in
>> the submodule. It might make sense to teach clone to handle this, to
>> avoid the submodule code duplicating what the clone code does.
>
> Yes, I want to add it to clone, as that is a prerequisite for making
> git clone --recursive --depth 1 work as you'd expect (such that
> the submodule can be cloned and checked out instead of rewriting that
> to be init and fetch).

Cool, that should help recursive fetch too.

> Thanks for pointing out that we already have some kind of server support.
>
> I wonder if we should add an additional way to make fetching only some
> sha1s possible. ("I don't want users to fetch any sha1, but only those
> that superprojects point{ed} to"; even if you force-push a superproject,
> you want to only allow fetching the sha1s which exist in the current
> superproject's branch.)

Methinks the restrictions for sha1-fetching could come from the branches
these sha1s are found on in the upstream submodule: if the client is allowed
to fetch a branch, it should be able to fetch any sha1 on that branch.

> Maybe our emails crossed, but in the other mail I pointed out we could use
> some sort of hidden refs (refs/superprojects/*) for that, which would
> mark the sha1s that are allowed to be fetched in the
> superproject/submodule context.
>
> So whenever you push to a superproject (a project that has a gitlink),
> we would need to check serverside if that submodule is at us and mark the
> correct sha1s in the submodule. Then you can disallow fetching most of the sha1s
> but still could have a correctly working submodule update mechanism.

And what happens if the submodule isn't at us? Involving the serverside of
a superproject in submodule fetching sounds wrong to me. Me thinks that
the upstream of the submodule should always control if a sha1 is allowed
to be fetched. Or did I understand you wrong?


* Re: [PATCH] push: add recurseSubmodules config option
  2015-11-16 18:31  2%   ` Mike Crowe
@ 2015-11-16 19:05  0%     ` Jens Lehmann
  0 siblings, 0 replies; 200+ results
From: Jens Lehmann @ 2015-11-16 19:05 UTC (permalink / raw)
  To: Mike Crowe, Stefan Beller; +Cc: git@vger.kernel.org, Heiko Voigt

On 16.11.2015 at 19:31, Mike Crowe wrote:
> On Monday 16 November 2015 at 10:15:24 -0800, Stefan Beller wrote:
>> The code itself looks good to me, one nit in the tests though.
>>
>>> @@ -79,6 +87,119 @@ test_expect_success 'push succeeds after commit was pushed to remote' '
>>>          )
>>>   '
>>>
>>> +test_expect_success 'push succeeds if submodule commit not on remote but using on-demand on command line' '
>>> +       (
>>> +               cd work/gar/bage &&
>>> +               >recurse-on-demand-on-command-line &&
>>> +               git add recurse-on-demand-on-command-line &&
>>> +               git commit -m "Recurse on-demand on command line junk"
>>> +       ) &&
>>> +       (
>>> +               cd work &&
>>> +               git add gar/bage &&
>>> +               git commit -m "Recurse on-demand on command line for gar/bage" &&
>>> +               git push --recurse-submodules=on-demand ../pub.git master &&
>>> +               # Check that the supermodule commit got there
>>> +               git fetch ../pub.git &&
>>> +               git diff --quiet FETCH_HEAD master
>>
>> Missing && chain here.
>
> Oh, well spotted! I'll provide an updated version.

Looks good to me too!

Cool, another issue from my Wiki that's being worked on!


* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-16 18:59  5%             ` Jens Lehmann
@ 2015-11-16 19:25  5%               ` Stefan Beller
  2015-11-16 21:42  5%                 ` Jens Lehmann
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-16 19:25 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: Jeff King, Lars Schneider, git@vger.kernel.org, Junio C Hamano,
	Duy Nguyen

On Mon, Nov 16, 2015 at 10:59 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
> On 14.11.2015 at 01:10, Stefan Beller wrote:
>>
>> On Fri, Nov 13, 2015 at 3:41 PM, Jeff King <peff@peff.net> wrote:
>>>
>>> On Fri, Nov 13, 2015 at 06:38:07PM -0500, Jeff King wrote:
>>>
>>>> On Fri, Nov 13, 2015 at 03:16:01PM -0800, Stefan Beller wrote:
>>>>
>>>>> Junio wrote on Oct 09, 2014:
>>>>>>
>>>>>> This is so non-standard a thing to do that I doubt it is worth
>>>>>> supporting with "git clone".  "git clone --branch", which is about
>>>>>> "I want to follow that particular branch", would not mesh well with
>>>>>> "I want to see the history that leads to this exact commit", either.
>>>>>> You would not know which branch(es) that exact commit is on in
>>>>>> the first place.
>>>>>
>>>>> I disagree with this. This is the *exact* thing you actually want to do
>>>>> when dealing with submodules. When fetching/cloning for a submodule, you
>>>>> want to obtain the exact sha1, instead of a branch (which happens to be
>>>>> supported too, but is not the original use case with submodules.)
>
> Yes, being able to fetch certain sha1s makes lots of sense for submodules
> (this has been discussed some time ago at a GitTogether). But - apart from
> the extra network load - it's rather helpful to get all the submodule
> branches too (though that could be limited to the branches the sha1 is on).

Ok, I did not attend that GitTogether ;)

>
>>>> I think this is already implemented in 68ee628 (upload-pack: optionally
>>>> allow fetching reachable sha1, 2015-05-21), isn't it?
>>>
>>> Note that this just implements the server side. I think to use this with
>>> submodules right now, you'd have to manually "git init && git fetch" in
>>> the submodule. It might make sense to teach clone to handle this, to
>>> avoid the submodule code duplicating what the clone code does.
>>
>> Yes I want to add it to clone, as that is a prerequisite for making
>> git clone --recursive --depth 1 work as you'd expect (such that
>> the submodule can be cloned&checkout instead of rewriting that to be
>> init&fetch).
>
> Cool, that should help recursive fetch too.
>
>> Thanks for pointing out that we already have some kind of server support.
>>
>> I wonder if we should add an additional way to make fetching only some
>> sha1s possible. ("I don't want users to fetch any sha1, but only those
>> where superprojects point{ed} to", even if you force push a superproject,
>> you want to only allow fetching all sha1s which exist in the current
>> superproject's branch.)
>
> Me thinks the restrictions for sha1-fetching could come from the branches
> these sha1s are found in the upstream submodule: if the client is allowed
> to fetch a branch, it should be able to fetch any sha1 on that branch.

I'd agree with that. The server side, even with uploadpack.allowTipSHA1InWant
set, is not sufficient, though.

To fetch an arbitrary sha1, you would need to check whether that sha1 is part
of the history of any advertised branch and only then allow the fetch on the
server side. That sounds like real work for the server, which we may want to
avoid by having smarter data structures there.

Instead of having to search all branches for the requested sha1, we could have
some sort of data structure that makes this not an O(n) operation (n being
the number of objects in the repo).
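To make that cost concrete, here is a minimal sketch of the naive check, assuming a toy in-memory commit graph (the `struct commit_node` type, the integer commit ids and `is_reachable_from_any_tip()` are hypothetical illustrations, not Git's revision-walking API). It walks backwards from every advertised branch tip until it meets the requested commit; in the worst case every commit is visited.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_COMMITS 64
#define MAX_PARENTS 2

/* Hypothetical toy commit graph: commits are identified by small integers. */
struct commit_node {
	int nr_parents;
	int parents[MAX_PARENTS];
};

/*
 * Return true if 'wanted' is one of the 'nr_tips' advertised branch tips
 * or an ancestor of one of them. Iterative depth-first walk; every commit
 * is visited at most once, so the worst case is O(number of commits).
 */
static bool is_reachable_from_any_tip(const struct commit_node *graph,
				      int nr_commits, const int *tips,
				      int nr_tips, int wanted)
{
	bool seen[MAX_COMMITS];
	int stack[MAX_COMMITS];
	int top = 0;

	assert(nr_commits <= MAX_COMMITS);
	memset(seen, 0, sizeof(seen));

	/* Start from every advertised branch tip. */
	for (int i = 0; i < nr_tips; i++) {
		if (!seen[tips[i]]) {
			seen[tips[i]] = true;
			stack[top++] = tips[i];
		}
	}

	while (top > 0) {
		int c = stack[--top];

		if (c == wanted)
			return true;
		for (int p = 0; p < graph[c].nr_parents; p++) {
			int parent = graph[c].parents[p];

			if (!seen[parent]) {
				seen[parent] = true;
				stack[top++] = parent;
			}
		}
	}
	return false;
}
```

Because `seen` is marked when a commit is pushed, each commit enters the stack at most once, so the stack never needs more than `nr_commits` slots.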

Maybe I overestimate the work which needs to be done, because the server has
bitmaps nowadays.

Maybe a lazy reverse-pointer graph can be established on the server side.
I guess that once we add the feature to fetch arbitrary sha1s reachable from
any branch, people using submodules will make use of it (such as with
git fetch --recurse --depth 1, or via a new `git fetch --recursive
--up-to-submodule-tip-only`).

So once the server is asked for a certain sha1, it will do the reachability
check, which takes some effort, but then store the result in the form:
"If ${current tip sha} of ${branch} is reachable, so is requested $sha1."

So when the next fetch request for $sha1 arrives, the server only needs to
check that ${current tip sha} is still part of $branch, which is expected to
be a shorter revwalk from the tip. (Because it is nearer to the tip, a bitmap
could just tell you, or at least shorten the walk even more.)
If ${branch} has changed, the next evaluation for $sha1 can update the cache,
so that the reverse lookup is cheap in expectation.

I assume this will mostly be used with submodules, so only a few sha1s need
this caching.

>
>> Maybe our emails crossed, but in the other mail I pointed out we could use
>> some sort of hidden ref (refs/superprojects/*) for that, which are allowed
>> to mark any sort of sha1, which are allowed in the superproject/submodule
>> context to be fetched.
>>
>> So whenever you push to a superproject (a project that has a gitlink),
>> we would need to check serverside if that submodule is at us and mark the
>> correct sha1s in the submodule. Then you can disallow fetching most of the
>> sha1s but still could have a correctly working submodule update mechanism.
>
> And what happens if the submodule isn't at us? Involving the serverside of
> a superproject in submodule fetching sounds wrong to me. Me thinks that
> the upstream of the submodule should always control if a sha1 is allowed
> to be fetched. Or did I understand you wrong?

Yes and no.
The server-side submodule repository should be responsible for the ultimate
decision on whether you are allowed to fetch that sha1. But maybe when pushing
the superproject, we could store a hint in the submodule that this sha1 is
legit. Although I may be misguided in my thinking here, as the superproject
should have no influence on the submodule.


* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-16 19:25  5%               ` Stefan Beller
@ 2015-11-16 21:42  5%                 ` Jens Lehmann
  2015-11-16 22:56  5%                   ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Jens Lehmann @ 2015-11-16 21:42 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jeff King, Lars Schneider, git@vger.kernel.org, Junio C Hamano,
	Duy Nguyen

On 16.11.2015 at 20:25, Stefan Beller wrote:
> On Mon, Nov 16, 2015 at 10:59 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>> On 14.11.2015 at 01:10, Stefan Beller wrote:
>>> Thanks for pointing out that we already have some kind of server support.
>>>
>>> I wonder if we should add an additional way to make fetching only some
>>> sha1s possible. ("I don't want users to fetch any sha1, but only those
>>> where superprojects point{ed} to", even if you force push a superproject,
>>> you want to only allow fetching all sha1s which exist in the current
>>> superproject's branch.)
>>
>> Me thinks the restrictions for sha1-fetching could come from the branches
>> these sha1s are found in the upstream submodule: if the client is allowed
>> to fetch a branch, it should be able to fetch any sha1 on that branch.
>
> I'd agree on that. The server side even with uploadpack.allowTipSHA1InWant
> set, is not sufficient though.
>
> To fetch an arbitrary sha1, you would need to check if that sha1 is part
> of the history of any advertised branch and then allow fetching serverside,
> which sounds like some work for the server, which we may want to avoid
> by having smarter data structures there.
>
> Instead of having to search all branches for the requested sha1, we could have
> some sort of data structure to make it not an O(n) operation (n being
> all objects in the repo).
>
> Maybe I overestimate the work which needs to be done, because the server has
> bitmaps nowadays.
>
> Maybe a lazy reverse-pointer graph can be established on the serverside.
> So I guess when we add the feature to fetch arbitrary sha1s, reachable from
> any branch, people using submodules will make use of the feature. (such as with
> git fetch --recurse --depth 1 or via a new `git fetch --recursive
> --up-to-submodule-tip-only`)
>
> So once the server is asked for a certain sha1, it will do the
> reachability check, which takes some effort, but then stores the result
> in the form:
> "If ${current tip sha} of ${branch} is reachable, so is requested $sha1."
>
> So when the next fetch request for $sha1 arrives, the server only needs to
> check for ${current tip sha} to be part of $branch, which is expected to be
> a shorter revwalk from the tip. (Because it is nearer to the tip, a bitmap could
> just tell you or at least shorten the walk even more)
> If the ${branch} has changed, the next evaluation for $sha1 can update
> the cache, such that the reverse lookup is not expensive on expectation.

Makes sense, although I do not know enough about the server side to tell if
it would need such an optimization or will cope with the load just fine.

But even if we'd enable such a feature without having to set an extra config
option, a submodule fetch asking for certain sha1s would have to fall back
to a simple "fetch all" like we do now when the server doesn't support that
for backwards compatibility. But maybe that's just obvious.

> I assume this will mostly be used with submodules, so only a few sha1s need
> this caching.

I won't bet on that, some of the submodules at $DAYJOB are rather busy and
see almost the same traffic as their superprojects ;-)

>>> Maybe our emails crossed, but in the other mail I pointed out we could use
>>> some sort of hidden ref (refs/superprojects/*) for that, which are allowed
>>> to mark any sort of sha1, which are allowed in the superproject/submodule
>>> context to be fetched.
>>>
>>> So whenever you push to a superproject (a project that has a gitlink),
>>> we would need to check serverside if that submodule is at us and mark the
>>> correct sha1s in the submodule. Then you can disallow fetching most of the
>>> sha1s but still could have a correctly working submodule update mechanism.
>>
>> And what happens if the submodule isn't at us? Involving the serverside of
>> a superproject in submodule fetching sounds wrong to me. Me thinks that
>> the upstream of the submodule should always control if a sha1 is allowed
>> to be fetched. Or did I understand you wrong?
>
> Yes and no.
> The serverside submodule repository should be responsible for the ultimate
> decision if you are allowed to fetch that sha1. But maybe on pushing the
> superproject, we can store a hint in the submodule, that this sha1 is legit.
> Although I may be missguided in my thinking here as the superproject
> should have no influence on the submodule.

Submodules should never be aware of their superproject. But a superproject
does know its submodules, so I don't think the influence you describe here
is a problem per se. It's just looking like a corner case to me, as in a
lot of scenarios submodules do not live on the same server. And even if
they do, a superproject has no canonical way of finding its submodules'
repos (except for submodules that use relative URLs). So I'd rather like
to see a generic solution first, before we think about adding an optimized
version for certain setups later ;-)

The only real itch I have with the "superproject declaring submodule sha1s
fetchable on the server" approach is that it smells like a security problem.
The access rights of a superproject often differ from those of the
submodules it contains, and this feels like a privilege escalation waiting
to happen.


* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-16 21:42  5%                 ` Jens Lehmann
@ 2015-11-16 22:56  5%                   ` Stefan Beller
  2015-11-17 19:46  5%                     ` Jens Lehmann
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-16 22:56 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: Jeff King, Lars Schneider, git@vger.kernel.org, Junio C Hamano,
	Duy Nguyen

On Mon, Nov 16, 2015 at 1:42 PM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
> On 16.11.2015 at 20:25, Stefan Beller wrote:
>>
>> On Mon, Nov 16, 2015 at 10:59 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>>>
>>> On 14.11.2015 at 01:10, Stefan Beller wrote:
>>>>
>>>> Thanks for pointing out that we already have some kind of server support.
>>>>
>>>> I wonder if we should add an additional way to make fetching only some
>>>> sha1s possible. ("I don't want users to fetch any sha1, but only those
>>>> where superprojects point{ed} to", even if you force push a superproject,
>>>> you want to only allow fetching all sha1s which exist in the current
>>>> superproject's branch.)
>>>
>>> Me thinks the restrictions for sha1-fetching could come from the branches
>>> these sha1s are found in the upstream submodule: if the client is allowed
>>> to fetch a branch, it should be able to fetch any sha1 on that branch.
>>
>> I'd agree on that. The server side even with uploadpack.allowTipSHA1InWant
>> set, is not sufficient though.
>>
>> To fetch an arbitrary sha1, you would need to check if that sha1 is part
>> of the history of any advertised branch and then allow fetching serverside,
>> which sounds like some work for the server, which we may want to avoid
>> by having smarter data structures there.
>>
>> Instead of having to search all branches for the requested sha1, we could
>> have some sort of data structure to make it not an O(n) operation (n being
>> all objects in the repo).
>>
>> Maybe I overestimate the work which needs to be done, because the server
>> has bitmaps nowadays.
>>
>> Maybe a lazy reverse-pointer graph can be established on the serverside.
>> So I guess when we add the feature to fetch arbitrary sha1s, reachable
>> from any branch, people using submodules will make use of the feature.
>> (such as with git fetch --recurse --depth 1 or via a new `git fetch
>> --recursive --up-to-submodule-tip-only`)
>>
>> So once the server is asked for a certain sha1, it will do the
>> reachability check, which takes some effort, but then stores the result
>> in the form:
>> "If ${current tip sha} of ${branch} is reachable, so is requested $sha1."
>>
>> So when the next fetch request for $sha1 arrives, the server only needs to
>> check for ${current tip sha} to be part of $branch, which is expected to
>> be a shorter revwalk from the tip. (Because it is nearer to the tip, a
>> bitmap could just tell you or at least shorten the walk even more)
>> If the ${branch} has changed, the next evaluation for $sha1 can update
>> the cache, such that the reverse lookup is not expensive on expectation.
>
> Makes sense, although I do not know enough about the server side to tell if
> it would need such an optimization or will cope with the load just fine.
>
> But even if we'd enable such a feature without having to set an extra config
> option, a submodule fetch asking for certain sha1s would have to fall back
> to a simple "fetch all" like we do now when the server doesn't support that
> for backwards compatibility. But maybe that's just obvious.

It's not obvious to me.  Say you run the command:

    git clone --recursive --depth=1 ...

Currently the depth argument for the submodules is ignored, because it
doesn't work out conceptually. This is because recursive fetches fetch the
branch tips and not the submodule-specified sha1.

If we want to make it work, we would need to think about what we want
to achieve here. The depth argument is usually used to reduce the transmit
time and bandwidth required.

So if the server tells us, by its cryptic message, that it does not allow
fetching arbitrary sha1s:

    $ git fetch origin 6f963a895a97d720c909fcf4eb0544a272ef7c49:refs/heads/copy
    error: no such remote ref 6f963a895a97d720c909fcf4eb0544a272ef7c49

we have two choices: either error out with

    die(_("Server doesn't support cloning of arbitrary sha1s"))

or we could pretend we know how to fix it by cloning regularly with the
whole history attached and then presenting a truncated history by making
the clone shallow afterwards. But that would defeat the whole point of the
depth argument in the first place: the time and bandwidth would already
have been spent. So instead I'd rather have the user make the choice.
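A sketch of what letting the user choose might look like. The policy enum, `plan_submodule_fetch()` and the warning text are hypothetical and do not correspond to an existing Git option; the only decision encoded is the one described above: if the server cannot serve the recorded sha1, either stop or knowingly fall back to a full fetch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-selectable policy; not an existing Git option. */
enum sha1_fetch_fallback {
	FALLBACK_ERROR,     /* stop with an error */
	FALLBACK_FETCH_ALL  /* fetch full history, then make it shallow */
};

enum fetch_plan {
	PLAN_FETCH_SHA1,    /* ask the server for exactly the recorded sha1 */
	PLAN_FETCH_ALL,     /* fetch branch tips with their whole history */
	PLAN_ABORT
};

static enum fetch_plan plan_submodule_fetch(bool server_allows_sha1,
					    enum sha1_fetch_fallback policy)
{
	assert(policy == FALLBACK_ERROR || policy == FALLBACK_FETCH_ALL);

	if (server_allows_sha1)
		return PLAN_FETCH_SHA1;

	if (policy == FALLBACK_FETCH_ALL) {
		/* Works, but spends the time and bandwidth --depth meant to save. */
		fprintf(stderr, "warning: server cannot serve arbitrary sha1s, "
			"fetching full history\n");
		return PLAN_FETCH_ALL;
	}

	fprintf(stderr, "fatal: Server doesn't support cloning of arbitrary sha1s\n");
	return PLAN_ABORT;
}
```

Making the fallback opt-in keeps the default honest: a user who asked for --depth never silently pays for a full clone.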

>
>> I assume this will mostly be used with submodules, so only a few sha1s
>> need this caching.
>
> I won't bet on that, some of the submodules at $DAYJOB are rather busy and
> see almost the same traffic as their superprojects ;-)

But do you update the superproject with each submodule commit?
(We plan to eventually update the superproject in Gerrit with each
submodule commit, so yeah, that point is nuts.)

>
> Submodules should never be aware of their superproject. But a superproject
> does know its submodules, so I don't think the influence you describe here
> is a problem per se. It's just looking like a corner case to me, as in a
> lot of scenarios submodules do not live on the same server. And even if
> they do, a superproject has no canonical way of finding their submodule's
> repos (except for submodules that use relative URLs). So I'd rather like
> to see a generic solution first, before we think about adding an optimized
> version for certain setups later ;-)

ok. :)

>
> The only real itch I have with the "superproject declaring submodule sha1s
> fetchable on the server" approach is that it smells like a security problem.
> The access rights of superprojects are often different from those of the
> submodules it contains and this feels like a privilege escalation waiting
> to happen.

Yeah, I agree. It was one of my first ideas, and probably not the best
idea on this topic. Right now I am quite taken with the lazy reverse
caching, in case that check should ever become a problem on the server side.


* [PATCHv2] push: add recurseSubmodules config option
@ 2015-11-17 11:05 18% Mike Crowe
  0 siblings, 0 replies; 200+ results
From: Mike Crowe @ 2015-11-17 11:05 UTC (permalink / raw)
  To: git; +Cc: Mike Crowe, Stefan Beller, Eric Sunshine

The --recurse-submodules command line parameter has existed for some
time but it has no config file equivalent.

Following the style of the corresponding parameter for git fetch, let's
invent push.recurseSubmodules to provide a default for this
parameter. This also requires the addition of --recurse-submodules=no to
allow the configuration to be overridden on the command line when
required.

The most straightforward way to implement this appears to be to make
push use code in submodule-config in a similar way to fetch.

Signed-off-by: Mike Crowe <mac@mcrowe.com>

---
 Documentation/config.txt       |  14 ++++
 Documentation/git-push.txt     |  24 ++++---
 builtin/push.c                 |  39 +++++++----
 submodule-config.c             |  29 ++++++++
 submodule-config.h             |   1 +
 submodule.h                    |   1 +
 t/t5531-deep-submodule-push.sh | 152 ++++++++++++++++++++++++++++++++++++++++-
 7 files changed, 234 insertions(+), 26 deletions(-)

Changes from v1:

 * Incorporate feedback from Eric Sunshine:

 ** push.recurseSubmodules config option now supports 'no' value.

 ** --no-recurse-submodules is now a synonym for
    --recurse-submodules=no.

 ** use "git -c" rather than "git config" in tests to avoid leaving
    config options set if a test fails.

 * Fix several && chain failures in tests noticed by Stefan Beller.

 * Minor tweaks to documentation

 * Fix minor naming issues in tests
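The precedence rule this patch implements (an explicit --recurse-submodules or --no-recurse-submodules on the command line wins over push.recurseSubmodules) can be sketched in isolation. The enum and helpers below are simplified, hypothetical stand-ins for the RECURSE_SUBMODULES_* values and parse_push_recurse_submodules_arg(); where the patch calls die() on a bad value, this sketch returns RECURSE_ERROR instead so the result can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for the RECURSE_SUBMODULES_* values. */
enum recurse_mode {
	RECURSE_DEFAULT,    /* sentinel: not specified (push ignores submodules) */
	RECURSE_OFF,
	RECURSE_CHECK,
	RECURSE_ON_DEMAND,
	RECURSE_ERROR
};

/*
 * Map a value to a mode. Simplified: the patch uses git_config_maybe_bool(),
 * which accepts more boolean spellings than the three checked here.
 */
static enum recurse_mode parse_recurse(const char *arg)
{
	assert(arg != NULL);
	if (!strcmp(arg, "check"))
		return RECURSE_CHECK;
	if (!strcmp(arg, "on-demand"))
		return RECURSE_ON_DEMAND;
	if (!strcmp(arg, "no") || !strcmp(arg, "false") || !strcmp(arg, "off"))
		return RECURSE_OFF;
	/* Anything else, including "yes"/"true" (no plain "on" for push). */
	return RECURSE_ERROR;
}

/*
 * config_value and cmdline_value are NULL when not given;
 * --no-recurse-submodules is modelled as cmdline_value "no".
 */
static enum recurse_mode effective_recurse(const char *config_value,
					   const char *cmdline_value)
{
	enum recurse_mode mode = RECURSE_DEFAULT;
	enum recurse_mode from_cmdline = RECURSE_DEFAULT;

	if (config_value)
		mode = parse_recurse(config_value);
	if (cmdline_value)
		from_cmdline = parse_recurse(cmdline_value);

	/* Only an explicit command-line value overrides the configuration. */
	if (from_cmdline != RECURSE_DEFAULT)
		mode = from_cmdline;
	return mode;
}
```

Keeping the command-line value in its own variable, initialized to the DEFAULT sentinel, lets the code tell an explicit --no-recurse-submodules (OFF) apart from the option not being given at all; in the patch it also keeps the "can only be used once" check in option_parse_recurse_submodules() from tripping over a value that came from the configuration.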

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 391a0c3..5a9f2ee 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2226,6 +2226,20 @@ push.gpgSign::
 	override a value from a lower-priority config file. An explicit
 	command-line flag always overrides this config option.
 
+push.recurseSubmodules::
+	Make sure all submodule commits used by the revisions to be pushed
+	are available on a remote-tracking branch. If the value is 'check'
+	then Git will verify that all submodule commits that changed in the
+	revisions to be pushed are available on at least one remote of the
+	submodule. If any commits are missing, the push will be aborted and
+	exit with non-zero status. If the value is 'on-demand' then all
+	submodules that changed in the revisions to be pushed will be
+	pushed. If on-demand was not able to push all necessary revisions
+	it will also be aborted and exit with non-zero status. If the value
+	is 'no' then the default behavior of ignoring submodules when pushing
+	is retained. You may override this configuration at time of push by
+	specifying '--recurse-submodules=check|on-demand|no'.
+
 rebase.stat::
 	Whether to show a diffstat of what changed upstream since the last
 	rebase. False by default.
diff --git a/Documentation/git-push.txt b/Documentation/git-push.txt
index 85a4d7d..4c775bc 100644
--- a/Documentation/git-push.txt
+++ b/Documentation/git-push.txt
@@ -257,16 +257,20 @@ origin +master` to force a push to the `master` branch). See the
 	is specified. This flag forces progress status even if the
 	standard error stream is not directed to a terminal.
 
---recurse-submodules=check|on-demand::
-	Make sure all submodule commits used by the revisions to be
-	pushed are available on a remote-tracking branch. If 'check' is
-	used Git will verify that all submodule commits that changed in
-	the revisions to be pushed are available on at least one remote
-	of the submodule. If any commits are missing the push will be
-	aborted and exit with non-zero status. If 'on-demand' is used
-	all submodules that changed in the revisions to be pushed will
-	be pushed. If on-demand was not able to push all necessary
-	revisions it will also be aborted and exit with non-zero status.
+--no-recurse-submodules::
+--recurse-submodules=check|on-demand|no::
+	May be used to make sure all submodule commits used by the
+	revisions to be pushed are available on a remote-tracking branch.
+	If 'check' is used Git will verify that all submodule commits that
+	changed in the revisions to be pushed are available on at least one
+	remote of the submodule. If any commits are missing the push will
+	be aborted and exit with non-zero status. If 'on-demand' is used
+	all submodules that changed in the revisions to be pushed will be
+	pushed. If on-demand was not able to push all necessary revisions
+	it will also be aborted and exit with non-zero status. A value of
+	'no' or using '--no-recurse-submodules' can be used to override the
+	push.recurseSubmodules configuration variable when no submodule
+	recursion is required.
 
 --[no-]verify::
 	Toggle the pre-push hook (see linkgit:githooks[5]).  The
diff --git a/builtin/push.c b/builtin/push.c
index 3bda430..f9b59b4 100644
--- a/builtin/push.c
+++ b/builtin/push.c
@@ -9,6 +9,7 @@
 #include "transport.h"
 #include "parse-options.h"
 #include "submodule.h"
+#include "submodule-config.h"
 #include "send-pack.h"
 
 static const char * const push_usage[] = {
@@ -20,7 +21,7 @@ static int thin = 1;
 static int deleterefs;
 static const char *receivepack;
 static int verbosity;
-static int progress = -1;
+static int progress = -1, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 
 static struct push_cas_option cas;
 
@@ -452,22 +453,17 @@ static int do_push(const char *repo, int flags)
 static int option_parse_recurse_submodules(const struct option *opt,
 				   const char *arg, int unset)
 {
-	int *flags = opt->value;
+	int *recurse_submodules = opt->value;
 
-	if (*flags & (TRANSPORT_RECURSE_SUBMODULES_CHECK |
-		      TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND))
+	if (*recurse_submodules != RECURSE_SUBMODULES_DEFAULT)
 		die("%s can only be used once.", opt->long_name);
 
-	if (arg) {
-		if (!strcmp(arg, "check"))
-			*flags |= TRANSPORT_RECURSE_SUBMODULES_CHECK;
-		else if (!strcmp(arg, "on-demand"))
-			*flags |= TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND;
-		else
-			die("bad %s argument: %s", opt->long_name, arg);
-	} else
-		die("option %s needs an argument (check|on-demand)",
-				opt->long_name);
+	if (unset)
+		*recurse_submodules = RECURSE_SUBMODULES_OFF;
+	else if (arg)
+		*recurse_submodules = parse_push_recurse_submodules_arg(opt->long_name, arg);
+	else
+		die("%s missing parameter", opt->long_name);
 
 	return 0;
 }
@@ -522,6 +518,10 @@ static int git_push_config(const char *k, const char *v, void *cb)
 					return error("Invalid value for '%s'", k);
 			}
 		}
+	} else if (!strcmp(k, "push.recursesubmodules")) {
+		const char *value;
+		if (!git_config_get_value("push.recursesubmodules", &value))
+			recurse_submodules = parse_push_recurse_submodules_arg(k, value);
 	}
 
 	return git_default_config(k, v, NULL);
@@ -532,6 +532,7 @@ int cmd_push(int argc, const char **argv, const char *prefix)
 	int flags = 0;
 	int tags = 0;
 	int push_cert = -1;
+	int recurse_submodules_from_cmdline = RECURSE_SUBMODULES_DEFAULT;
 	int rc;
 	const char *repo = NULL;	/* default repository */
 	struct option options[] = {
@@ -549,7 +550,7 @@ int cmd_push(int argc, const char **argv, const char *prefix)
 		  0, CAS_OPT_NAME, &cas, N_("refname>:<expect"),
 		  N_("require old value of ref to be at this value"),
 		  PARSE_OPT_OPTARG, parseopt_push_cas_option },
-		{ OPTION_CALLBACK, 0, "recurse-submodules", &flags, "check|on-demand",
+		{ OPTION_CALLBACK, 0, "recurse-submodules", &recurse_submodules_from_cmdline, N_("check|on-demand|no"),
 			N_("control recursive pushing of submodules"),
 			PARSE_OPT_OPTARG, option_parse_recurse_submodules },
 		OPT_BOOL( 0 , "thin", &thin, N_("use thin pack")),
@@ -580,6 +581,14 @@ int cmd_push(int argc, const char **argv, const char *prefix)
 	if (deleterefs && argc < 2)
 		die(_("--delete doesn't make sense without any refs"));
 
+	if (recurse_submodules_from_cmdline != RECURSE_SUBMODULES_DEFAULT)
+		recurse_submodules = recurse_submodules_from_cmdline;
+
+	if (recurse_submodules == RECURSE_SUBMODULES_CHECK)
+		flags |= TRANSPORT_RECURSE_SUBMODULES_CHECK;
+	else if (recurse_submodules == RECURSE_SUBMODULES_ON_DEMAND)
+		flags |= TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND;
+
 	if (tags)
 		add_refspec("refs/tags/*");
 
diff --git a/submodule-config.c b/submodule-config.c
index afe0ea8..fe8ceab 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -228,6 +228,35 @@ int parse_fetch_recurse_submodules_arg(const char *opt, const char *arg)
 	return parse_fetch_recurse(opt, arg, 1);
 }
 
+static int parse_push_recurse(const char *opt, const char *arg,
+			       int die_on_error)
+{
+	switch (git_config_maybe_bool(opt, arg)) {
+	case 1:
+		/* There's no simple "on" value when pushing */
+		if (die_on_error)
+			die("bad %s argument: %s", opt, arg);
+		else
+			return RECURSE_SUBMODULES_ERROR;
+	case 0:
+		return RECURSE_SUBMODULES_OFF;
+	default:
+		if (!strcmp(arg, "on-demand"))
+			return RECURSE_SUBMODULES_ON_DEMAND;
+		else if (!strcmp(arg, "check"))
+			return RECURSE_SUBMODULES_CHECK;
+		else if (die_on_error)
+			die("bad %s argument: %s", opt, arg);
+		else
+			return RECURSE_SUBMODULES_ERROR;
+	}
+}
+
+int parse_push_recurse_submodules_arg(const char *opt, const char *arg)
+{
+	return parse_push_recurse(opt, arg, 1);
+}
+
 static void warn_multiple_config(const unsigned char *commit_sha1,
 				 const char *name, const char *option)
 {
diff --git a/submodule-config.h b/submodule-config.h
index 9061e4e..9bfa65a 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -19,6 +19,7 @@ struct submodule {
 };
 
 int parse_fetch_recurse_submodules_arg(const char *opt, const char *arg);
+int parse_push_recurse_submodules_arg(const char *opt, const char *arg);
 int parse_submodule_config_option(const char *var, const char *value);
 const struct submodule *submodule_from_name(const unsigned char *commit_sha1,
 		const char *name);
diff --git a/submodule.h b/submodule.h
index 5507c3d..ddff512 100644
--- a/submodule.h
+++ b/submodule.h
@@ -5,6 +5,7 @@ struct diff_options;
 struct argv_array;
 
 enum {
+	RECURSE_SUBMODULES_CHECK = -4,
 	RECURSE_SUBMODULES_ERROR = -3,
 	RECURSE_SUBMODULES_NONE = -2,
 	RECURSE_SUBMODULES_ON_DEMAND = -1,
diff --git a/t/t5531-deep-submodule-push.sh b/t/t5531-deep-submodule-push.sh
index 6507487..9fda7b0 100755
--- a/t/t5531-deep-submodule-push.sh
+++ b/t/t5531-deep-submodule-push.sh
@@ -64,7 +64,12 @@ test_expect_success 'push fails if submodule commit not on remote' '
 		cd work &&
 		git add gar/bage &&
 		git commit -m "Third commit for gar/bage" &&
-		test_must_fail git push --recurse-submodules=check ../pub.git master
+		# the push should fail with --recurse-submodules=check
+		# on the command line...
+		test_must_fail git push --recurse-submodules=check ../pub.git master &&
+
+		# ...or if specified in the configuration.
+		test_must_fail git -c push.recurseSubmodules=check push ../pub.git master
 	)
 '
 
@@ -79,6 +84,151 @@ test_expect_success 'push succeeds after commit was pushed to remote' '
 	)
 '
 
+test_expect_success 'push succeeds if submodule commit not on remote but using on-demand on command line' '
+	(
+		cd work/gar/bage &&
+		>recurse-on-demand-on-command-line &&
+		git add recurse-on-demand-on-command-line &&
+		git commit -m "Recurse on-demand on command line junk"
+	) &&
+	(
+		cd work &&
+		git add gar/bage &&
+		git commit -m "Recurse on-demand on command line for gar/bage" &&
+		git push --recurse-submodules=on-demand ../pub.git master &&
+		# Check that the supermodule commit got there
+		git fetch ../pub.git &&
+		git diff --quiet FETCH_HEAD master &&
+		# Check that the submodule commit got there too
+		cd gar/bage &&
+		git diff --quiet origin/master master
+	)
+'
+
+test_expect_success 'push succeeds if submodule commit not on remote but using on-demand from config' '
+	(
+		cd work/gar/bage &&
+		>recurse-on-demand-from-config &&
+		git add recurse-on-demand-from-config &&
+		git commit -m "Recurse on-demand from config junk"
+	) &&
+	(
+		cd work &&
+		git add gar/bage &&
+		git commit -m "Recurse on-demand from config for gar/bage" &&
+		git -c push.recurseSubmodules=on-demand push ../pub.git master &&
+		# Check that the supermodule commit got there
+		git fetch ../pub.git &&
+		git diff --quiet FETCH_HEAD master &&
+		# Check that the submodule commit got there too
+		cd gar/bage &&
+		git diff --quiet origin/master master
+	)
+'
+
+test_expect_success 'push fails if submodule commit not on remote using check from cmdline overriding config' '
+	(
+		cd work/gar/bage &&
+		>recurse-check-on-command-line-overriding-config &&
+		git add recurse-check-on-command-line-overriding-config &&
+		git commit -m "Recurse on command-line overriding config junk"
+	) &&
+	(
+		cd work &&
+		git add gar/bage &&
+		git commit -m "Recurse on command-line overriding config for gar/bage" &&
+		test_must_fail git -c push.recurseSubmodules=on-demand push --recurse-submodules=check ../pub.git master &&
+		# Check that the supermodule commit did not get there
+		git fetch ../pub.git &&
+		git diff --quiet FETCH_HEAD master^ &&
+		# Check that the submodule commit did not get there
+		cd gar/bage &&
+		git diff --quiet origin/master master^
+	)
+'
+
+test_expect_success 'push succeeds if submodule commit not on remote using on-demand from cmdline overriding config' '
+	(
+		cd work/gar/bage &&
+		>recurse-on-demand-on-command-line-overriding-config &&
+		git add recurse-on-demand-on-command-line-overriding-config &&
+		git commit -m "Recurse on-demand on command-line overriding config junk"
+	) &&
+	(
+		cd work &&
+		git add gar/bage &&
+		git commit -m "Recurse on-demand on command-line overriding config for gar/bage" &&
+		git -c push.recurseSubmodules=check push --recurse-submodules=on-demand ../pub.git master &&
+		# Check that the supermodule commit got there
+		git fetch ../pub.git &&
+		git diff --quiet FETCH_HEAD master &&
+		# Check that the submodule commit got there
+		cd gar/bage &&
+		git diff --quiet origin/master master
+	)
+'
+
+test_expect_success 'push succeeds if submodule commit disabling recursion from cmdline overriding config' '
+	(
+		cd work/gar/bage &&
+		>recurse-disable-on-command-line-overriding-config &&
+		git add recurse-disable-on-command-line-overriding-config &&
+		git commit -m "Recurse disable on command-line overriding config junk"
+	) &&
+	(
+		cd work &&
+		git add gar/bage &&
+		git commit -m "Recurse disable on command-line overriding config for gar/bage" &&
+		git -c push.recurseSubmodules=check push --recurse-submodules=no ../pub.git master &&
+		# Check that the supermodule commit got there
+		git fetch ../pub.git &&
+		git diff --quiet FETCH_HEAD master &&
+		# But that the submodule commit did not
+		( cd gar/bage && git diff --quiet origin/master master^ ) &&
+		# Now push it to avoid confusing future tests
+		git push --recurse-submodules=on-demand ../pub.git master
+	)
+'
+
+test_expect_success 'push succeeds if submodule commit disabling recursion from cmdline (alternative form) overriding config' '
+	(
+		cd work/gar/bage &&
+		>recurse-disable-on-command-line-alt-overriding-config &&
+		git add recurse-disable-on-command-line-alt-overriding-config &&
+		git commit -m "Recurse disable on command-line alternative overriding config junk"
+	) &&
+	(
+		cd work &&
+		git add gar/bage &&
+		git commit -m "Recurse disable on command-line alternative overriding config for gar/bage" &&
+		git -c push.recurseSubmodules=check push --no-recurse-submodules ../pub.git master &&
+		# Check that the supermodule commit got there
+		git fetch ../pub.git &&
+		git diff --quiet FETCH_HEAD master &&
+		# But that the submodule commit did not
+		( cd gar/bage && git diff --quiet origin/master master^ ) &&
+		# Now push it to avoid confusing future tests
+		git push --recurse-submodules=on-demand ../pub.git master
+	)
+'
+
+test_expect_success 'push fails if recurse submodules option passed as yes' '
+	(
+		cd work/gar/bage &&
+		>recurse-push-fails-if-recurse-submodules-passed-as-yes &&
+		git add recurse-push-fails-if-recurse-submodules-passed-as-yes &&
+		git commit -m "Recurse push fails if recurse submodules option passed as yes"
+	) &&
+	(
+		cd work &&
+		git add gar/bage &&
+		git commit -m "Recurse push fails if recurse submodules option passed as yes for gar/bage" &&
+		test_must_fail git push --recurse-submodules=yes ../pub.git master &&
+		test_must_fail git -c push.recurseSubmodules=yes push ../pub.git master &&
+		git push --recurse-submodules=on-demand ../pub.git master
+	)
+'
+
 test_expect_success 'push fails when commit on multiple branches if one branch has no remote' '
 	(
 		cd work/gar/bage &&
-- 
2.1.4
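The tests above pin down a precedence rule. A `--recurse-submodules=<value>` (or `--no-recurse-submodules`) given on the command line overrides `push.recurseSubmodules` from the configuration. `yes` is rejected in both places, because push has no plain "on" mode.

Below is a minimal, hypothetical C sketch of that rule. The enum and function names are invented for illustration; this is not the `parse_push_recurse_submodules_arg()` implementation from the patch. Real option parsing likely accepts more boolean spellings than the two handled here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mode values, for illustration only. */
enum push_recurse_mode {
	PUSH_RECURSE_UNSET,     /* nothing given at this level */
	PUSH_RECURSE_NO,
	PUSH_RECURSE_CHECK,
	PUSH_RECURSE_ON_DEMAND,
	PUSH_RECURSE_ERROR      /* e.g. "yes": no plain "on" exists for push */
};

/* Map one textual value (config or command line) to a mode. */
static enum push_recurse_mode parse_push_recurse(const char *arg)
{
	if (!arg)
		return PUSH_RECURSE_UNSET;
	if (!strcmp(arg, "check"))
		return PUSH_RECURSE_CHECK;
	if (!strcmp(arg, "on-demand"))
		return PUSH_RECURSE_ON_DEMAND;
	if (!strcmp(arg, "no") || !strcmp(arg, "false"))
		return PUSH_RECURSE_NO;
	return PUSH_RECURSE_ERROR; /* "yes", "true", typos, ... */
}

/* The command line wins over push.recurseSubmodules whenever it is given. */
static enum push_recurse_mode effective_mode(const char *config_value,
					     const char *cmdline_value)
{
	enum push_recurse_mode cmd = parse_push_recurse(cmdline_value);

	if (cmd != PUSH_RECURSE_UNSET)
		return cmd;
	return parse_push_recurse(config_value);
}
```

For example, `effective_mode("on-demand", "check")` yields `PUSH_RECURSE_CHECK`, which mirrors the "check from cmdline overriding config" test.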


* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-16 22:56  5%                   ` Stefan Beller
@ 2015-11-17 19:46  5%                     ` Jens Lehmann
  2015-11-17 20:04  5%                       ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Jens Lehmann @ 2015-11-17 19:46 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jeff King, Lars Schneider, git@vger.kernel.org, Junio C Hamano,
	Duy Nguyen

On 16.11.2015 at 23:56, Stefan Beller wrote:
> On Mon, Nov 16, 2015 at 1:42 PM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>> On 16.11.2015 at 20:25, Stefan Beller wrote:
>>>
>>> On Mon, Nov 16, 2015 at 10:59 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>>>>
>>>> On 14.11.2015 at 01:10, Stefan Beller wrote:
>>>>>
>>>>> Thanks for pointing out that we already have some kind of server
>>>>> support.
>>>>>
>>>>> I wonder if we should add an additional way to make fetching only some
>>>>> sha1s possible. ("I don't want users to fetch any sha1, but only those
>>>>> where superprojects point{ed} to", even if you force push a
>>>>> superproject, you want to only allow fetching the sha1s which exist in
>>>>> the current superproject's branch.)
>>>>
>>>> Methinks the restrictions for sha1-fetching could come from the branches
>>>> these sha1s are found in the upstream submodule: if the client is allowed
>>>> to fetch a branch, it should be able to fetch any sha1 on that branch.
>>>
>>> I'd agree on that. The server side, even with uploadpack.allowTipSHA1InWant
>>> set, is not sufficient though.
>>>
>>> To fetch an arbitrary sha1, you would need to check if that sha1 is part
>>> of the history of any advertised branch and then allow fetching
>>> serverside, which sounds like some work for the server, which we may
>>> want to avoid by having smarter data structures there.
>>>
>>> Instead of having to search all branches for the requested sha1, we could
>>> have some sort of data structure to make it not an O(n) operation (n
>>> being all objects in the repo).
>>>
>>> Maybe I overestimate the work which needs to be done, because the server
>>> has bitmaps nowadays.
>>>
>>> Maybe a lazy reverse-pointer graph can be established on the serverside.
>>> So I guess when we add the feature to fetch arbitrary sha1s, reachable
>>> from any branch, people using submodules will make use of the feature
>>> (such as with git fetch --recurse --depth 1 or via a new
>>> `git fetch --recursive --up-to-submodule-tip-only`).
>>>
>>> So once the server is asked for a certain sha1, it will do the
>>> reachability check, which takes some effort, but then stores the result
>>> in the form:
>>> "If ${current tip sha} of ${branch} is reachable, so is requested $sha1."
>>>
>>> So when the next fetch request for $sha1 arrives, the server only needs
>>> to check for ${current tip sha} to be part of $branch, which is expected
>>> to be a shorter revwalk from the tip. (Because it is nearer to the tip,
>>> a bitmap could just tell you or at least shorten the walk even more.)
>>> If the ${branch} has changed, the next evaluation for $sha1 can update
>>> the cache, such that the reverse lookup is not expensive on expectation.
>>
>>
>> Makes sense, although I do not know enough about the server side to tell if
>> it would need such an optimization or will cope with the load just fine.
>>
>> But even if we'd enable such a feature without having to set an extra config
>> option, a submodule fetch asking for certain sha1s would have to fall back
>> to a simple "fetch all" like we do now when the server doesn't support that
>> for backwards compatibility. But maybe that's just obvious.
>
> It's not obvious to me.  Say you run the command:
>
>      git clone --recursive --depth=1 ...
>
> Currently the depth argument for the submodules is ignored, because it
> doesn't work out conceptually. This is because recursive fetches fetch the
> branch tips and not the submodule-specified sha1.
>
> If we want to make it work, we would need to think about what we want
> to achieve here. Depth is usually used to reduce the transmit
> time/bandwidth required.
>
> So if the server tells us it's not allowing fetching of arbitrary
> sha1s by its cryptic message:
>
>      $ git fetch origin 6f963a895a97d720c909fcf4eb0544a272ef7c49:refs/heads/copy
>      error: no such remote ref 6f963a895a97d720c909fcf4eb0544a272ef7c49
>
> we have two choices, either error out with
>
>      die(_("Server doesn't support cloning of arbitrary sha1s"))
>
> or we could pretend as if we know how to fix it by cloning regularly
> with the whole history attached and then present a tightened history
> by shallowing after cloning. But that would defeat the whole point of
> the depth argument in the first place; the time and bandwidth would
> have been wasted. So instead I'd rather have the user make the choice.

But for quite some time you'll have older servers out there that
don't support fetching a single sha1 or aren't configured to do so.
Wouldn't it be better to give the user an appropriate warning and
fall back to cloning everything for those submodules while using the
optimized new method for all others and the superproject? Otherwise
you won't be able to limit the depth if only a single submodule
server doesn't support fetching a single sha1.
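Stefan's quoted idea of caching reachability results ("if ${current tip sha} of ${branch} is reachable, so is the requested $sha1") can be illustrated on a toy history. The sketch below makes three assumptions: the history is linear, every commit has at most one parent, and commit ids are small integers. `struct reach_cache` and `allow_fetch()` are hypothetical names, not server code from git.

On a cache hit, the walk only has to go from the current branch tip down to the previously verified tip, instead of all the way down to the requested commit. If the branch was force-pushed and the old tip vanished, the sketch falls back to a full walk.

```c
#include <assert.h>

#define MAX_COMMITS 64

/* Toy history: parent[c] is the single parent of commit c, -1 for a root. */
static int parent[MAX_COMMITS];
static int walk_steps; /* parent hops taken, to compare the cost of lookups */

static void make_linear_history(int n)
{
	for (int c = 0; c < n && c < MAX_COMMITS; c++)
		parent[c] = c - 1;
}

/* Walk from `tip` towards the root; is `target` on the way? */
static int reachable(int tip, int target)
{
	for (int c = tip; c >= 0; c = parent[c]) {
		if (c == target)
			return 1;
		walk_steps++;
	}
	return 0;
}

/* "If <tip> of the branch is reachable, so is the requested <sha>." */
struct reach_cache {
	int valid;
	int sha; /* commit a client asked for */
	int tip; /* branch tip from which sha was last proven reachable */
};

/* May a client fetch `sha` while the branch points at `branch_tip`? */
static int allow_fetch(struct reach_cache *cache, int branch_tip, int sha)
{
	if (cache->valid && cache->sha == sha &&
	    reachable(branch_tip, cache->tip)) {
		/* Old tip still in history, so sha is too: refresh the cache. */
		cache->tip = branch_tip;
		return 1;
	}
	/* Cache miss, or the branch was rewound: do the full walk. */
	if (reachable(branch_tip, sha)) {
		cache->valid = 1;
		cache->sha = sha;
		cache->tip = branch_tip;
		return 1;
	}
	return 0;
}
```

With history `0 <- 1 <- ... <- 9`, the first check of commit 2 from tip 5 costs three hops. After the branch advances to 7, the cached check only walks 7 -> 5, which costs two hops.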


* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-17 19:46  5%                     ` Jens Lehmann
@ 2015-11-17 20:04  5%                       ` Stefan Beller
  2015-11-17 20:39  4%                         ` Jens Lehmann
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-17 20:04 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: Jeff King, Lars Schneider, git@vger.kernel.org, Junio C Hamano,
	Duy Nguyen

On Tue, Nov 17, 2015 at 11:46 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>
> But for quite some time you'll have older servers out there that
> don't support fetching a single sha1 or aren't configured to do so.

Only when talking about the open source side. If you have all the
submodules/superprojects on your company's mirror, you can control
the git installations there.

> Wouldn't it be better to give the user an appropriate warning and
> fall back to cloning everything for those submodules while using the
> optimized new method for all others and the superproject? Otherwise
> you won't be able to limit the depth if only a single submodule
> server doesn't support fetching a single sha1.
>

I think warnings are fine, but no fallbacks. The warning could look like:

    Server for submodule <foo> doesn't support fetching by sha1.
    Fetch again without depth argument.

and keep going with the other submodules. This would allow the user
to make an informed decision if they want to have the fallback solution
(which requires more bandwidth and disk space)
On the other hand, that's what people do today, so it's not that bad either,
so I guess falling back would work too.


* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-17 20:04  5%                       ` Stefan Beller
@ 2015-11-17 20:39  4%                         ` Jens Lehmann
  2015-11-17 20:49  2%                           ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Jens Lehmann @ 2015-11-17 20:39 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jeff King, Lars Schneider, git@vger.kernel.org, Junio C Hamano,
	Duy Nguyen

On 17.11.2015 at 21:04, Stefan Beller wrote:
> On Tue, Nov 17, 2015 at 11:46 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>>
>> But for quite some time you'll have older servers out there that
>> don't support fetching a single sha1 or aren't configured to do so.
>
> Only when talking about the open source side. If you have all the
> submodules/superprojects on your company's mirror, you can control
> the git installations there.

Sure. But that doesn't mean we should make life harder for the open
source side, no? We'll have to support both for quite some time.

>> Wouldn't it be better to give the user an appropriate warning and
>> fall back to cloning everything for those submodules while using the
>> optimized new method for all others and the superproject? Otherwise
>> you won't be able to limit the depth if only a single submodule
>> server doesn't support fetching a single sha1.
>>
>
> I think warnings are fine, but no fallbacks. The warning could look like:
>
>      Server for submodule <foo> doesn't support fetching by sha1.
>      Fetch again without depth argument.
>
> and keep going with the other submodules. This would allow the user
> to make an informed decision if they want to have the fallback solution
> (which requires more bandwidth and disk space)

No, this is a regression. This worked before but now some submodules
are missing from the clone. And if that happens inside a Jenkins
script I doubt that Jenkins can make an informed decision, that job
will simply fail.

> On the other hand, that's what people do today, so it's not that bad either,
> so I guess falling back would work too.

Not that bad? I don't see any other sane way. Don't break formerly
working use cases without a very good reason. Fall back to what we
did before (even if it is suboptimal) and only then use the new
optimized (and admittedly better) feature when it is available.
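The two positions in this exchange can be made concrete with a hypothetical C sketch of the per-submodule decision during a recursive shallow clone:

- With `fallback` set (Jens's proposal), a submodule whose server cannot serve a single sha1 is cloned in full after a warning.
- Without it (the warn-and-skip variant), that submodule is missing from the result.

The capability flag and all names below are assumptions for illustration, not git's actual clone code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical outcome for one submodule of a recursive shallow clone. */
enum sub_result { SUB_SHALLOW, SUB_FULL, SUB_SKIPPED };

struct submodule_info {
	const char *name;
	int server_allows_sha1_fetch; /* assumed capability of its server */
};

static enum sub_result clone_one(const struct submodule_info *sub, int fallback)
{
	if (sub->server_allows_sha1_fetch)
		return SUB_SHALLOW; /* fetch just the recorded sha1 with --depth */
	fprintf(stderr,
		"warning: server for submodule '%s' doesn't support fetching by sha1\n",
		sub->name);
	return fallback ? SUB_FULL : SUB_SKIPPED;
}

/* Returns how many submodules ended up populated (shallow or full). */
static int clone_all(const struct submodule_info *subs, int n, int fallback)
{
	int populated = 0;

	for (int i = 0; i < n; i++)
		if (clone_one(&subs[i], fallback) != SUB_SKIPPED)
			populated++;
	return populated;
}
```

With the fallback, every submodule is present and only the one behind an old server is unshallow. This is the "don't break formerly working use cases" property asked for above. Without the fallback, an unattended job such as a Jenkins script ends up with a checkout that is missing a submodule.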


* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-17 20:39  4%                         ` Jens Lehmann
@ 2015-11-17 20:49  2%                           ` Stefan Beller
  2015-11-17 21:00  5%                             ` Jens Lehmann
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-17 20:49 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: Jeff King, Lars Schneider, git@vger.kernel.org, Junio C Hamano,
	Duy Nguyen

On Tue, Nov 17, 2015 at 12:39 PM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
> On 17.11.2015 at 21:04, Stefan Beller wrote:
>>
>> On Tue, Nov 17, 2015 at 11:46 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>>>
>>> But for quite some time you'll have older servers out there that
>>> don't support fetching a single sha1 or aren't configured to do so.
>>
>> Only when talking about the open source side. If you have all the
>> submodules/superprojects on your company's mirror, you can control
>> the git installations there.
>
> Sure. But that doesn't mean we should make life harder for the open
> source side, no? We'll have to support both for quite some time.
>
>>> Wouldn't it be better to give the user an appropriate warning and
>>> fall back to cloning everything for those submodules while using the
>>> optimized new method for all others and the superproject? Otherwise
>>> you won't be able to limit the depth if only a single submodule
>>> server doesn't support fetching a single sha1.
>>>
>>
>> I think warnings are fine, but no fallbacks. The warning could look like:
>>
>>      Server for submodule <foo> doesn't support fetching by sha1.
>>      Fetch again without depth argument.
>>
>> and keep going with the other submodules. This would allow the user
>> to make an informed decision if they want to have the fallback solution
>> (which requires more bandwidth and disk space)
>
> No, this is a regression. This worked before but now some submodules
> are missing from the clone. And if that happens inside a Jenkins
> script I doubt that Jenkins can make an informed decision, that job
> will simply fail.
>
>> On the other hand, that's what people do today, so it's not that bad
>> either, so I guess falling back would work too.
>
> Not that bad? I don't see any other sane way. Don't break formerly
> working use cases without a very good reason. Fall back to what we
> did before (even if it is suboptimal) and only then use the new
> optimized (and admittedly better) feature when it is available.

I assumed we'd have yet another flag to activate the new behavior,
but if you want to roll out that new feature as a default, I agree on
needing the fallback.


* Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
  2015-11-17 20:49  2%                           ` Stefan Beller
@ 2015-11-17 21:00  5%                             ` Jens Lehmann
  0 siblings, 0 replies; 200+ results
From: Jens Lehmann @ 2015-11-17 21:00 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jeff King, Lars Schneider, git@vger.kernel.org, Junio C Hamano,
	Duy Nguyen

On 17.11.2015 at 21:49, Stefan Beller wrote:
> I assumed we'd have yet another flag to activate the new behavior,
> but if you want to roll out that new feature as a default, I agree on
> needing the fallback.

Ah, I was under the impression that users are surprised by --depth
not propagating into the submodules, so I considered this a bugfix
which doesn't need extra configuration (and the subject line seems
to support my impression ;-).

And if some users really need the old behavior, they can remove the
--recurse-submodules from the clone and issue a submodule update
afterwards (Unless they speak up now and tell us why having --depth
only apply to the superproject is a feature ... ;-).


* Re: [PATCHv4 0/9] Expose submodule parallelism to the user
  2015-11-14  1:06 10% [PATCHv4 0/9] Expose submodule parallelism to the user Stefan Beller
                   ` (7 preceding siblings ...)
  2015-11-14  1:07 24% ` [PATCHv4 9/9] clone: allow an explicit argument for parallel submodule clones Stefan Beller
@ 2015-11-20 12:02  4% ` Jeff King
  8 siblings, 0 replies; 200+ results
From: Jeff King @ 2015-11-20 12:02 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, ramsay, jacob.keller, gitster, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine, j6t

On Fri, Nov 13, 2015 at 05:06:53PM -0800, Stefan Beller wrote:

> This replaces sb/submodule-parallel-update.
> It applies on top of d075d2604c0 (Merge branch
> 'rs/daemon-plug-child-leak' into sb/submodule-parallel-update,
> with additionally having merged submodule-parallel-fetch,
> which has applied "run-command: detect finished
> children by closed pipe rather than waitpid" on top of it.

Thanks for being detailed here. It makes a poor, inexperienced
maintainer's life much easier. :)

-Peff


* [PATCHv2] run-command: detect finished children by closed pipe rather than waitpid
@ 2015-11-20 21:08  7% Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-20 21:08 UTC (permalink / raw)
  To: peff, git; +Cc: jrnieder, johannes.schindelin, j6t, tboegi, Stefan Beller

Detect whether a child has finished by checking whether its stderr pipe
was closed, instead of checking its state with waitpid.
As waitpid is not fully working on Windows, this approach allows for
better cross-platform operation. (It's less code, too.)

Previously we did not close the read pipe of finished children; now
we do.

The old way missed some messages on an early abort: we just killed the
children and did not bother to look at what was left over. With this
approach we send a signal to the children and wait for them to close the
pipe, so we get all the messages (including possible "killed by signal 15"
messages).

To keep the test suite passing as before, we now allow for a truly
graceful abort. If the user wishes to abort parallel execution, they
either provide the signal used to kill all children, or the children
are left to run until they finish normally.

Signed-off-by: Stefan Beller <sbeller@google.com>
---

This applies on top of peff/sb/submodule-parallel-fetch.
It is a resend without modification from Nov. 11th.

 run-command.c      | 141 +++++++++++++++++++++++------------------------------
 run-command.h      |  12 +++--
 submodule.c        |   3 --
 test-run-command.c |   3 --
 4 files changed, 69 insertions(+), 90 deletions(-)

diff --git a/run-command.c b/run-command.c
index 07424e9..db4d916 100644
--- a/run-command.c
+++ b/run-command.c
@@ -858,6 +858,12 @@ int capture_command(struct child_process *cmd, struct strbuf *buf, size_t hint)
 	return finish_command(cmd);
 }
 
+enum child_state {
+	GIT_CP_FREE,
+	GIT_CP_WORKING,
+	GIT_CP_WAIT_CLEANUP,
+};
+
 static struct parallel_processes {
 	void *data;
 
@@ -869,7 +875,7 @@ static struct parallel_processes {
 	task_finished_fn task_finished;
 
 	struct {
-		unsigned in_use : 1;
+		enum child_state state;
 		struct child_process process;
 		struct strbuf err;
 		void *data;
@@ -923,7 +929,7 @@ static void kill_children(struct parallel_processes *pp, int signo)
 	int i, n = pp->max_processes;
 
 	for (i = 0; i < n; i++)
-		if (pp->children[i].in_use)
+		if (pp->children[i].state == GIT_CP_WORKING)
 			kill(pp->children[i].process.pid, signo);
 }
 
@@ -967,7 +973,7 @@ static struct parallel_processes *pp_init(int n,
 	for (i = 0; i < n; i++) {
 		strbuf_init(&pp->children[i].err, 0);
 		child_process_init(&pp->children[i].process);
-		pp->pfd[i].events = POLLIN;
+		pp->pfd[i].events = POLLIN | POLLHUP;
 		pp->pfd[i].fd = -1;
 	}
 	sigchain_push_common(handle_children_on_signal);
@@ -1000,39 +1006,46 @@ static void pp_cleanup(struct parallel_processes *pp)
  *  0 if a new task was started.
  *  1 if no new jobs was started (get_next_task ran out of work, non critical
  *    problem with starting a new command)
- * -1 no new job was started, user wishes to shutdown early.
+ * <0 no new job was started, user wishes to shutdown early. Use negative code
+ *    to signal the children.
  */
 static int pp_start_one(struct parallel_processes *pp)
 {
-	int i;
+	int i, code;
 
 	for (i = 0; i < pp->max_processes; i++)
-		if (!pp->children[i].in_use)
+		if (pp->children[i].state == GIT_CP_FREE)
 			break;
 	if (i == pp->max_processes)
 		die("BUG: bookkeeping is hard");
 
-	if (!pp->get_next_task(&pp->children[i].data,
-			       &pp->children[i].process,
-			       &pp->children[i].err,
-			       pp->data)) {
+	code = pp->get_next_task(&pp->children[i].data,
+				 &pp->children[i].process,
+				 &pp->children[i].err,
+				 pp->data);
+	if (!code) {
 		strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
 		strbuf_reset(&pp->children[i].err);
 		return 1;
 	}
+	pp->children[i].process.err = -1;
+	pp->children[i].process.stdout_to_stderr = 1;
+	pp->children[i].process.no_stdin = 1;
 
 	if (start_command(&pp->children[i].process)) {
-		int code = pp->start_failure(&pp->children[i].process,
-					     &pp->children[i].err,
-					     pp->data,
-					     &pp->children[i].data);
+		code = pp->start_failure(&pp->children[i].process,
+					 &pp->children[i].err,
+					 pp->data,
+					 &pp->children[i].data);
 		strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
 		strbuf_reset(&pp->children[i].err);
-		return code ? -1 : 1;
+		if (code)
+			pp->shutdown = 1;
+		return code;
 	}
 
 	pp->nr_processes++;
-	pp->children[i].in_use = 1;
+	pp->children[i].state = GIT_CP_WORKING;
 	pp->pfd[i].fd = pp->children[i].process.err;
 	return 0;
 }
@@ -1050,19 +1063,24 @@ static void pp_buffer_stderr(struct parallel_processes *pp, int output_timeout)
 
 	/* Buffer output from all pipes. */
 	for (i = 0; i < pp->max_processes; i++) {
-		if (pp->children[i].in_use &&
-		    pp->pfd[i].revents & POLLIN)
-			if (strbuf_read_once(&pp->children[i].err,
-					     pp->children[i].process.err, 0) < 0)
+		if (pp->children[i].state == GIT_CP_WORKING &&
+		    pp->pfd[i].revents & (POLLIN | POLLHUP)) {
+			int n = strbuf_read_once(&pp->children[i].err,
+						 pp->children[i].process.err, 0);
+			if (n == 0) {
+				close(pp->children[i].process.err);
+				pp->children[i].state = GIT_CP_WAIT_CLEANUP;
+			} else if (n < 0)
 				if (errno != EAGAIN)
 					die_errno("read");
+		}
 	}
 }
 
 static void pp_output(struct parallel_processes *pp)
 {
 	int i = pp->output_owner;
-	if (pp->children[i].in_use &&
+	if (pp->children[i].state == GIT_CP_WORKING &&
 	    pp->children[i].err.len) {
 		fputs(pp->children[i].err.buf, stderr);
 		strbuf_reset(&pp->children[i].err);
@@ -1071,70 +1089,31 @@ static void pp_output(struct parallel_processes *pp)
 
 static int pp_collect_finished(struct parallel_processes *pp)
 {
-	int i = 0;
-	pid_t pid;
-	int wait_status, code;
+	int i, code;
 	int n = pp->max_processes;
 	int result = 0;
 
 	while (pp->nr_processes > 0) {
-		pid = waitpid(-1, &wait_status, WNOHANG);
-		if (pid == 0)
-			break;
-
-		if (pid < 0)
-			die_errno("wait");
-
 		for (i = 0; i < pp->max_processes; i++)
-			if (pp->children[i].in_use &&
-			    pid == pp->children[i].process.pid)
+			if (pp->children[i].state == GIT_CP_WAIT_CLEANUP)
 				break;
 		if (i == pp->max_processes)
-			die("BUG: found a child process we were not aware of");
-
-		if (strbuf_read(&pp->children[i].err,
-				pp->children[i].process.err, 0) < 0)
-			die_errno("strbuf_read");
-
-		if (WIFSIGNALED(wait_status)) {
-			code = WTERMSIG(wait_status);
-			if (!pp->shutdown &&
-			    code != SIGINT && code != SIGQUIT)
-				strbuf_addf(&pp->children[i].err,
-					    "%s died of signal %d",
-					    pp->children[i].process.argv[0],
-					    code);
-			/*
-			 * This return value is chosen so that code & 0xff
-			 * mimics the exit code that a POSIX shell would report for
-			 * a program that died from this signal.
-			 */
-			code += 128;
-		} else if (WIFEXITED(wait_status)) {
-			code = WEXITSTATUS(wait_status);
-			/*
-			 * Convert special exit code when execvp failed.
-			 */
-			if (code == 127) {
-				code = -1;
-				errno = ENOENT;
-			}
-		} else {
-			strbuf_addf(&pp->children[i].err,
-				    "waitpid is confused (%s)",
-				    pp->children[i].process.argv[0]);
-			code = -1;
-		}
+			break;
+
+		code = finish_command(&pp->children[i].process);
+
+		code = pp->task_finished(code, &pp->children[i].process,
+					 &pp->children[i].err, pp->data,
+					 &pp->children[i].data);
 
-		if (pp->task_finished(code, &pp->children[i].process,
-				      &pp->children[i].err, pp->data,
-				      &pp->children[i].data))
-			result = 1;
+		if (code)
+			result = code;
+		if (code < 0)
+			break;
 
 		pp->nr_processes--;
-		pp->children[i].in_use = 0;
+		pp->children[i].state = GIT_CP_FREE;
 		pp->pfd[i].fd = -1;
-		child_process_deinit(&pp->children[i].process);
 		child_process_init(&pp->children[i].process);
 
 		if (i != pp->output_owner) {
@@ -1157,7 +1136,7 @@ static int pp_collect_finished(struct parallel_processes *pp)
 			 * running process time.
 			 */
 			for (i = 0; i < n; i++)
-				if (pp->children[(pp->output_owner + i) % n].in_use)
+				if (pp->children[(pp->output_owner + i) % n].state == GIT_CP_WORKING)
 					break;
 			pp->output_owner = (pp->output_owner + i) % n;
 		}
@@ -1171,7 +1150,7 @@ int run_processes_parallel(int n,
 			   task_finished_fn task_finished,
 			   void *pp_cb)
 {
-	int i;
+	int i, code;
 	int output_timeout = 100;
 	int spawn_cap = 4;
 	struct parallel_processes *pp;
@@ -1182,12 +1161,12 @@ int run_processes_parallel(int n,
 		    i < spawn_cap && !pp->shutdown &&
 		    pp->nr_processes < pp->max_processes;
 		    i++) {
-			int code = pp_start_one(pp);
+			code = pp_start_one(pp);
 			if (!code)
 				continue;
 			if (code < 0) {
 				pp->shutdown = 1;
-				kill_children(pp, SIGTERM);
+				kill_children(pp, -code);
 			}
 			break;
 		}
@@ -1195,9 +1174,11 @@ int run_processes_parallel(int n,
 			break;
 		pp_buffer_stderr(pp, output_timeout);
 		pp_output(pp);
-		if (pp_collect_finished(pp)) {
-			kill_children(pp, SIGTERM);
+		code = pp_collect_finished(pp);
+		if (code) {
 			pp->shutdown = 1;
+			if (code < 0)
+				kill_children(pp, -code);
 		}
 	}
 
diff --git a/run-command.h b/run-command.h
index c24aa54..414cc81 100644
--- a/run-command.h
+++ b/run-command.h
@@ -134,6 +134,8 @@ int finish_async(struct async *async);
  *
  * Return 1 if the next child is ready to run.
  * Return 0 if there are currently no more tasks to be processed.
+ * To send a signal to other child processes for abortion,
+ * return negative signal code.
  */
 typedef int (*get_next_task_fn)(void **pp_task_cb,
 				struct child_process *cp,
@@ -151,8 +153,9 @@ typedef int (*get_next_task_fn)(void **pp_task_cb,
  * pp_cb is the callback cookie as passed into run_processes_parallel,
  * pp_task_cb is the callback cookie as passed into get_next_task_fn.
  *
- * Return 0 to continue the parallel processing. To abort gracefully,
- * return non zero.
+ * Return 0 to continue the parallel processing. To abort return non zero.
+ * To send a signal to other child processes for abortion, return
+ * negative signal code.
  */
 typedef int (*start_failure_fn)(struct child_process *cp,
 				struct strbuf *err,
@@ -169,8 +172,9 @@ typedef int (*start_failure_fn)(struct child_process *cp,
  * pp_cb is the callback cookie as passed into run_processes_parallel,
  * pp_task_cb is the callback cookie as passed into get_next_task_fn.
  *
- * Return 0 to continue the parallel processing. To abort gracefully,
- * return non zero.
+ * Return 0 to continue the parallel processing.  To abort return non zero.
+ * To send a signal to other child processes for abortion, return
+ * negative signal code.
  */
 typedef int (*task_finished_fn)(int result,
 				struct child_process *cp,
diff --git a/submodule.c b/submodule.c
index c21b265..281bccd 100644
--- a/submodule.c
+++ b/submodule.c
@@ -689,9 +689,6 @@ static int get_next_submodule(void **task_cb, struct child_process *cp,
 			cp->dir = strbuf_detach(&submodule_path, NULL);
 			cp->env = local_repo_env;
 			cp->git_cmd = 1;
-			cp->no_stdin = 1;
-			cp->stdout_to_stderr = 1;
-			cp->err = -1;
 			if (!spf->quiet)
 				strbuf_addf(err, "Fetching submodule %s%s\n",
 					    spf->prefix, ce->name);
diff --git a/test-run-command.c b/test-run-command.c
index 13e5d44..b1f04d1 100644
--- a/test-run-command.c
+++ b/test-run-command.c
@@ -26,9 +26,6 @@ static int parallel_next(void** task_cb,
 		return 0;
 
 	argv_array_pushv(&cp->args, d->argv);
-	cp->stdout_to_stderr = 1;
-	cp->no_stdin = 1;
-	cp->err = -1;
 	strbuf_addf(err, "preloaded output of a child\n");
 	number_callbacks++;
 	return 1;
-- 
2.6.1.256.g2277835.dirty
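The core idea of this patch can be shown outside git's run-command machinery: treat EOF on a child's stderr pipe as the "child finished" event, and only then reap that one child. The following is a minimal POSIX sketch, not git code. `spawn_child()`, `drain_until_eof()` and `finish_child()` are hypothetical helpers standing in for `start_command()`, `pp_buffer_stderr()` and `finish_command()`. As in the patch, it assumes a child does not close its stderr long before exiting; otherwise the final `waitpid()` blocks until it does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Start a child that writes `msg` to a pipe standing in for its stderr
 * and then exits with `status`. The read end is returned in *err_fd.
 */
static pid_t spawn_child(const char *msg, int status, int *err_fd)
{
	int fds[2];
	pid_t pid;

	if (pipe(fds) < 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		close(fds[0]); /* child keeps only the write end */
		if (write(fds[1], msg, strlen(msg)) < 0)
			_exit(127);
		_exit(status); /* exiting closes the write end */
	}
	close(fds[1]); /* parent keeps only the read end */
	*err_fd = fds[0];
	return pid;
}

/*
 * Buffer output until read() reports EOF, then close the read end.
 * EOF is the "child finished" signal; no waitpid(-1, ...) is needed.
 * Returns the bytes kept in buf (NUL-terminated; size must be >= 1) or -1.
 */
static int drain_until_eof(int fd, char *buf, size_t size)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLHUP };
	size_t len = 0;

	for (;;) {
		char chunk[256];
		ssize_t n;

		if (poll(&pfd, 1, 100) < 0) {
			if (errno == EINTR)
				continue;
			close(fd);
			return -1;
		}
		if (pfd.revents & POLLNVAL)
			return -1;
		if (!(pfd.revents & (POLLIN | POLLHUP | POLLERR)))
			continue; /* timed out, nothing to read yet */
		n = read(fd, chunk, sizeof(chunk));
		if (n == 0)
			break; /* EOF: the child closed its end of the pipe */
		if (n < 0) {
			if (errno == EINTR || errno == EAGAIN)
				continue;
			close(fd);
			return -1;
		}
		/* keep what fits in buf, drop the rest */
		size_t room = size - 1 - len;
		size_t take = (size_t)n < room ? (size_t)n : room;
		memcpy(buf + len, chunk, take);
		len += take;
	}
	buf[len] = '\0';
	close(fd);
	return (int)len;
}

/* Reap exactly this child, like finish_command(), and decode its status. */
static int finish_child(pid_t pid)
{
	int status;

	while (waitpid(pid, &status, 0) < 0)
		if (errno != EINTR)
			return -1;
	if (WIFEXITED(status))
		return WEXITSTATUS(status);
	if (WIFSIGNALED(status))
		return 128 + WTERMSIG(status); /* shell-style exit code */
	return -1;
}
```

A caller would spawn the child, call `drain_until_eof()` until the pipe closes, and only then call `finish_child()` on that specific pid. This is the ordering the patch moves `pp_collect_finished()` to.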


* [PATCHv3] run-command: detect finished children by closed pipe rather than waitpid
@ 2015-11-23 21:43  7% Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-23 21:43 UTC (permalink / raw)
  To: peff; +Cc: git, jrnieder, johannes.schindelin, j6t, tboegi, Stefan Beller

Detect if a child stopped working by checking if their stderr pipe
was closed instead of checking their state with waitpid. This resembles
the way we work with child processes in the non-parallel case.

Better consistency between the different ways of running child
processes is the main advantage, but there are others:
* Previously we leaked the open read pipe of finished children.
* waitpid(-1, ...) is not implemented on Windows; this approach
  allows for better cross-platform operation.
* Fewer lines of code.

The old way missed some messages on an early abort: we just killed the
children and did not bother to look at what was left over. With this
approach we send a signal to the children and wait for them to close
the pipe, so that we collect all the messages (including possible
"killed by signal 15" messages).
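
A minimal standalone sketch of the closed-pipe idea, using plain POSIX calls instead of Git's run-command API. `start_child`, `run_all` and the `CHILD_*` state names are hypothetical, chosen only to echo the patch's GIT_CP_FREE / GIT_CP_WORKING / GIT_CP_WAIT_CLEANUP states. A child counts as finished once read() on its stderr pipe returns 0, and only then is it reaped with waitpid() on its own pid:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 4

/* Mirrors the three states introduced by the patch. */
enum child_state { CHILD_FREE, CHILD_WORKING, CHILD_WAIT_CLEANUP };

struct child {
	enum child_state state;
	pid_t pid;
	int err_fd; /* read end of the child's stderr pipe */
};

/* Start "sh -c script" with its stderr redirected into a new pipe. */
static int start_child(struct child *c, const char *script)
{
	int fds[2];

	if (pipe(fds) < 0)
		return -1;
	if ((c->pid = fork()) < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (c->pid == 0) {
		close(fds[0]);
		dup2(fds[1], 2);
		close(fds[1]);
		execl("/bin/sh", "sh", "-c", script, (char *)NULL);
		_exit(127);
	}
	close(fds[1]);
	c->err_fd = fds[0];
	c->state = CHILD_WORKING;
	return 0;
}

/*
 * Run up to MAX_CHILDREN scripts concurrently; return how many exited with
 * status 0, or -1 on a poll/read error. A child counts as finished once
 * read() on its stderr pipe returns 0 (EOF); only then is it reaped, with
 * waitpid() on its own pid instead of waitpid(-1, ...).
 */
static int run_all(const char **scripts, int n)
{
	struct child kids[MAX_CHILDREN];
	struct pollfd pfd[MAX_CHILDREN];
	char buf[256]; /* stderr is read and discarded in this sketch */
	int i, status, running = 0, ok = 0;

	assert(n >= 0 && n <= MAX_CHILDREN);
	for (i = 0; i < n; i++) {
		kids[i].state = CHILD_FREE;
		pfd[i].fd = -1; /* poll() ignores negative fds */
		pfd[i].events = POLLIN | POLLHUP;
		if (start_child(&kids[i], scripts[i]) == 0) {
			pfd[i].fd = kids[i].err_fd;
			running++;
		}
	}
	while (running > 0) {
		if (poll(pfd, (nfds_t)n, 100) < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		for (i = 0; i < n; i++) {
			ssize_t r;

			if (kids[i].state != CHILD_WORKING ||
			    !(pfd[i].revents & (POLLIN | POLLHUP)))
				continue;
			r = read(kids[i].err_fd, buf, sizeof(buf));
			if (r == 0) { /* EOF: child closed its end of the pipe */
				close(kids[i].err_fd);
				pfd[i].fd = -1;
				kids[i].state = CHILD_WAIT_CLEANUP;
			} else if (r < 0 && errno != EINTR && errno != EAGAIN) {
				return -1;
			}
		}
		for (i = 0; i < n; i++) {
			if (kids[i].state != CHILD_WAIT_CLEANUP)
				continue;
			if (waitpid(kids[i].pid, &status, 0) == kids[i].pid &&
			    WIFEXITED(status) && WEXITSTATUS(status) == 0)
				ok++;
			kids[i].state = CHILD_FREE;
			running--;
		}
	}
	return ok;
}
```

As in the patch, EOF on the pipe is the trigger; a child that closes stderr before it exits would simply make the final waitpid() block until it does exit.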

To keep the test suite passing as before, we now allow for truly
graceful abortion. If the user wishes to abort parallel execution, the
user either provides the signal used to kill all children, or the
children are allowed to run until they finish normally.
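
The return-value convention for the callbacks (0 to continue, a positive value to stop starting new tasks, a negated signal number to additionally signal the running children) can be modelled in a few lines. This is a hypothetical sketch of the decision only; `struct pp_decision`, `interpret_callback_code` and `abort_hard_on_failure` are not part of Git:

```c
#include <signal.h>

/* What the scheduler should do after a callback returns (hypothetical). */
struct pp_decision {
	int stop_spawning;  /* start no further tasks */
	int signal_to_send; /* 0: let running children finish normally */
};

/*
 * Interpret a callback return value under the convention above:
 *   0  -> keep going
 *   >0 -> stop starting tasks, let running children finish
 *   <0 -> stop, and send signal number -code to running children
 */
static struct pp_decision interpret_callback_code(int code)
{
	struct pp_decision d = { 0, 0 };

	if (code)
		d.stop_spawning = 1;
	if (code < 0)
		d.signal_to_send = -code;
	return d;
}

/* Example task_finished-style policy: kill the rest on the first failure. */
static int abort_hard_on_failure(int result)
{
	return result ? -SIGTERM : 0;
}
```

A callback that only wants to wind down gracefully would return 1 instead of `-SIGTERM`.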

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 
 * applies on top of submodule-parallel-fetch
 * updated the commit message to address Jonathan's concerns, using the
   words of Johannes.
 * did not rename code to anything else, as I do not see the need for it.
 
 run-command.c      | 141 +++++++++++++++++++++++------------------------------
 run-command.h      |  12 +++--
 submodule.c        |   3 --
 test-run-command.c |   3 --
 4 files changed, 69 insertions(+), 90 deletions(-)
 
 

diff --git a/run-command.c b/run-command.c
index 07424e9..db4d916 100644
--- a/run-command.c
+++ b/run-command.c
@@ -858,6 +858,12 @@ int capture_command(struct child_process *cmd, struct strbuf *buf, size_t hint)
 	return finish_command(cmd);
 }
 
+enum child_state {
+	GIT_CP_FREE,
+	GIT_CP_WORKING,
+	GIT_CP_WAIT_CLEANUP,
+};
+
 static struct parallel_processes {
 	void *data;
 
@@ -869,7 +875,7 @@ static struct parallel_processes {
 	task_finished_fn task_finished;
 
 	struct {
-		unsigned in_use : 1;
+		enum child_state state;
 		struct child_process process;
 		struct strbuf err;
 		void *data;
@@ -923,7 +929,7 @@ static void kill_children(struct parallel_processes *pp, int signo)
 	int i, n = pp->max_processes;
 
 	for (i = 0; i < n; i++)
-		if (pp->children[i].in_use)
+		if (pp->children[i].state == GIT_CP_WORKING)
 			kill(pp->children[i].process.pid, signo);
 }
 
@@ -967,7 +973,7 @@ static struct parallel_processes *pp_init(int n,
 	for (i = 0; i < n; i++) {
 		strbuf_init(&pp->children[i].err, 0);
 		child_process_init(&pp->children[i].process);
-		pp->pfd[i].events = POLLIN;
+		pp->pfd[i].events = POLLIN | POLLHUP;
 		pp->pfd[i].fd = -1;
 	}
 	sigchain_push_common(handle_children_on_signal);
@@ -1000,39 +1006,46 @@ static void pp_cleanup(struct parallel_processes *pp)
  *  0 if a new task was started.
  *  1 if no new jobs was started (get_next_task ran out of work, non critical
  *    problem with starting a new command)
- * -1 no new job was started, user wishes to shutdown early.
+ * <0 no new job was started, user wishes to shutdown early. Use negative code
+ *    to signal the children.
  */
 static int pp_start_one(struct parallel_processes *pp)
 {
-	int i;
+	int i, code;
 
 	for (i = 0; i < pp->max_processes; i++)
-		if (!pp->children[i].in_use)
+		if (pp->children[i].state == GIT_CP_FREE)
 			break;
 	if (i == pp->max_processes)
 		die("BUG: bookkeeping is hard");
 
-	if (!pp->get_next_task(&pp->children[i].data,
-			       &pp->children[i].process,
-			       &pp->children[i].err,
-			       pp->data)) {
+	code = pp->get_next_task(&pp->children[i].data,
+				 &pp->children[i].process,
+				 &pp->children[i].err,
+				 pp->data);
+	if (!code) {
 		strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
 		strbuf_reset(&pp->children[i].err);
 		return 1;
 	}
+	pp->children[i].process.err = -1;
+	pp->children[i].process.stdout_to_stderr = 1;
+	pp->children[i].process.no_stdin = 1;
 
 	if (start_command(&pp->children[i].process)) {
-		int code = pp->start_failure(&pp->children[i].process,
-					     &pp->children[i].err,
-					     pp->data,
-					     &pp->children[i].data);
+		code = pp->start_failure(&pp->children[i].process,
+					 &pp->children[i].err,
+					 pp->data,
+					 &pp->children[i].data);
 		strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
 		strbuf_reset(&pp->children[i].err);
-		return code ? -1 : 1;
+		if (code)
+			pp->shutdown = 1;
+		return code;
 	}
 
 	pp->nr_processes++;
-	pp->children[i].in_use = 1;
+	pp->children[i].state = GIT_CP_WORKING;
 	pp->pfd[i].fd = pp->children[i].process.err;
 	return 0;
 }
@@ -1050,19 +1063,24 @@ static void pp_buffer_stderr(struct parallel_processes *pp, int output_timeout)
 
 	/* Buffer output from all pipes. */
 	for (i = 0; i < pp->max_processes; i++) {
-		if (pp->children[i].in_use &&
-		    pp->pfd[i].revents & POLLIN)
-			if (strbuf_read_once(&pp->children[i].err,
-					     pp->children[i].process.err, 0) < 0)
+		if (pp->children[i].state == GIT_CP_WORKING &&
+		    pp->pfd[i].revents & (POLLIN | POLLHUP)) {
+			int n = strbuf_read_once(&pp->children[i].err,
+						 pp->children[i].process.err, 0);
+			if (n == 0) {
+				close(pp->children[i].process.err);
+				pp->children[i].state = GIT_CP_WAIT_CLEANUP;
+			} else if (n < 0)
 				if (errno != EAGAIN)
 					die_errno("read");
+		}
 	}
 }
 
 static void pp_output(struct parallel_processes *pp)
 {
 	int i = pp->output_owner;
-	if (pp->children[i].in_use &&
+	if (pp->children[i].state == GIT_CP_WORKING &&
 	    pp->children[i].err.len) {
 		fputs(pp->children[i].err.buf, stderr);
 		strbuf_reset(&pp->children[i].err);
@@ -1071,70 +1089,31 @@ static void pp_output(struct parallel_processes *pp)
 
 static int pp_collect_finished(struct parallel_processes *pp)
 {
-	int i = 0;
-	pid_t pid;
-	int wait_status, code;
+	int i, code;
 	int n = pp->max_processes;
 	int result = 0;
 
 	while (pp->nr_processes > 0) {
-		pid = waitpid(-1, &wait_status, WNOHANG);
-		if (pid == 0)
-			break;
-
-		if (pid < 0)
-			die_errno("wait");
-
 		for (i = 0; i < pp->max_processes; i++)
-			if (pp->children[i].in_use &&
-			    pid == pp->children[i].process.pid)
+			if (pp->children[i].state == GIT_CP_WAIT_CLEANUP)
 				break;
 		if (i == pp->max_processes)
-			die("BUG: found a child process we were not aware of");
-
-		if (strbuf_read(&pp->children[i].err,
-				pp->children[i].process.err, 0) < 0)
-			die_errno("strbuf_read");
-
-		if (WIFSIGNALED(wait_status)) {
-			code = WTERMSIG(wait_status);
-			if (!pp->shutdown &&
-			    code != SIGINT && code != SIGQUIT)
-				strbuf_addf(&pp->children[i].err,
-					    "%s died of signal %d",
-					    pp->children[i].process.argv[0],
-					    code);
-			/*
-			 * This return value is chosen so that code & 0xff
-			 * mimics the exit code that a POSIX shell would report for
-			 * a program that died from this signal.
-			 */
-			code += 128;
-		} else if (WIFEXITED(wait_status)) {
-			code = WEXITSTATUS(wait_status);
-			/*
-			 * Convert special exit code when execvp failed.
-			 */
-			if (code == 127) {
-				code = -1;
-				errno = ENOENT;
-			}
-		} else {
-			strbuf_addf(&pp->children[i].err,
-				    "waitpid is confused (%s)",
-				    pp->children[i].process.argv[0]);
-			code = -1;
-		}
+			break;
+
+		code = finish_command(&pp->children[i].process);
+
+		code = pp->task_finished(code, &pp->children[i].process,
+					 &pp->children[i].err, pp->data,
+					 &pp->children[i].data);
 
-		if (pp->task_finished(code, &pp->children[i].process,
-				      &pp->children[i].err, pp->data,
-				      &pp->children[i].data))
-			result = 1;
+		if (code)
+			result = code;
+		if (code < 0)
+			break;
 
 		pp->nr_processes--;
-		pp->children[i].in_use = 0;
+		pp->children[i].state = GIT_CP_FREE;
 		pp->pfd[i].fd = -1;
-		child_process_deinit(&pp->children[i].process);
 		child_process_init(&pp->children[i].process);
 
 		if (i != pp->output_owner) {
@@ -1157,7 +1136,7 @@ static int pp_collect_finished(struct parallel_processes *pp)
 			 * running process time.
 			 */
 			for (i = 0; i < n; i++)
-				if (pp->children[(pp->output_owner + i) % n].in_use)
+				if (pp->children[(pp->output_owner + i) % n].state == GIT_CP_WORKING)
 					break;
 			pp->output_owner = (pp->output_owner + i) % n;
 		}
@@ -1171,7 +1150,7 @@ int run_processes_parallel(int n,
 			   task_finished_fn task_finished,
 			   void *pp_cb)
 {
-	int i;
+	int i, code;
 	int output_timeout = 100;
 	int spawn_cap = 4;
 	struct parallel_processes *pp;
@@ -1182,12 +1161,12 @@ int run_processes_parallel(int n,
 		    i < spawn_cap && !pp->shutdown &&
 		    pp->nr_processes < pp->max_processes;
 		    i++) {
-			int code = pp_start_one(pp);
+			code = pp_start_one(pp);
 			if (!code)
 				continue;
 			if (code < 0) {
 				pp->shutdown = 1;
-				kill_children(pp, SIGTERM);
+				kill_children(pp, -code);
 			}
 			break;
 		}
@@ -1195,9 +1174,11 @@ int run_processes_parallel(int n,
 			break;
 		pp_buffer_stderr(pp, output_timeout);
 		pp_output(pp);
-		if (pp_collect_finished(pp)) {
-			kill_children(pp, SIGTERM);
+		code = pp_collect_finished(pp);
+		if (code) {
 			pp->shutdown = 1;
+			if (code < 0)
+				kill_children(pp, -code);
 		}
 	}
 
diff --git a/run-command.h b/run-command.h
index c24aa54..414cc81 100644
--- a/run-command.h
+++ b/run-command.h
@@ -134,6 +134,8 @@ int finish_async(struct async *async);
  *
  * Return 1 if the next child is ready to run.
  * Return 0 if there are currently no more tasks to be processed.
+ * To send a signal to other child processes for abortion,
+ * return negative signal code.
  */
 typedef int (*get_next_task_fn)(void **pp_task_cb,
 				struct child_process *cp,
@@ -151,8 +153,9 @@ typedef int (*get_next_task_fn)(void **pp_task_cb,
  * pp_cb is the callback cookie as passed into run_processes_parallel,
  * pp_task_cb is the callback cookie as passed into get_next_task_fn.
  *
- * Return 0 to continue the parallel processing. To abort gracefully,
- * return non zero.
+ * Return 0 to continue the parallel processing. To abort return non zero.
+ * To send a signal to other child processes for abortion, return
+ * negative signal code.
  */
 typedef int (*start_failure_fn)(struct child_process *cp,
 				struct strbuf *err,
@@ -169,8 +172,9 @@ typedef int (*start_failure_fn)(struct child_process *cp,
  * pp_cb is the callback cookie as passed into run_processes_parallel,
  * pp_task_cb is the callback cookie as passed into get_next_task_fn.
  *
- * Return 0 to continue the parallel processing. To abort gracefully,
- * return non zero.
+ * Return 0 to continue the parallel processing.  To abort return non zero.
+ * To send a signal to other child processes for abortion, return
+ * negative signal code.
  */
 typedef int (*task_finished_fn)(int result,
 				struct child_process *cp,
diff --git a/submodule.c b/submodule.c
index c21b265..281bccd 100644
--- a/submodule.c
+++ b/submodule.c
@@ -689,9 +689,6 @@ static int get_next_submodule(void **task_cb, struct child_process *cp,
 			cp->dir = strbuf_detach(&submodule_path, NULL);
 			cp->env = local_repo_env;
 			cp->git_cmd = 1;
-			cp->no_stdin = 1;
-			cp->stdout_to_stderr = 1;
-			cp->err = -1;
 			if (!spf->quiet)
 				strbuf_addf(err, "Fetching submodule %s%s\n",
 					    spf->prefix, ce->name);
diff --git a/test-run-command.c b/test-run-command.c
index 13e5d44..b1f04d1 100644
--- a/test-run-command.c
+++ b/test-run-command.c
@@ -26,9 +26,6 @@ static int parallel_next(void** task_cb,
 		return 0;
 
 	argv_array_pushv(&cp->args, d->argv);
-	cp->stdout_to_stderr = 1;
-	cp->no_stdin = 1;
-	cp->err = -1;
 	strbuf_addf(err, "preloaded output of a child\n");
 	number_callbacks++;
 	return 1;
-- 
2.6.1.258.gaca1825.dirty


* [PATCHv5 0/9] Expose submodule parallelism to the user
@ 2015-11-25  1:14 12% Stefan Beller
  2015-11-25  1:14 26% ` [PATCHv5 2/9] submodule-config: keep update strategy around Stefan Beller
                   ` (7 more replies)
  0 siblings, 8 replies; 200+ results
From: Stefan Beller @ 2015-11-25  1:14 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

This is also available at github/stefanbeller/git submodule-parallel-update
It applies on top of a merge of sb/submodule-parallel-fetch
(including the patch "run-command: detect finished children by closed pipe rather than waitpid"
sent yesterday) with
"Merge branch 'rs/daemon-plug-child-leak' into sb/submodule-parallel-update"
(d075d2604c0f9204)

The only change since v4 is in braces; the interdiff is below.

My main motivation for resending this patch series is to have it available to
reviewers of the series I am going to send next, which builds on top of this
series.

Thanks,
Stefan

Stefan Beller (9):
  run_processes_parallel: delimit intermixed task output
  submodule-config: keep update strategy around
  submodule-config: drop check against NULL
  submodule-config: remove name_and_item_from_var
  submodule-config: introduce parse_generic_submodule_config
  fetching submodules: respect `submodule.fetchJobs` config option
  git submodule update: have a dedicated helper for cloning
  submodule update: expose parallelism to the user
  clone: allow an explicit argument for parallel submodule clones

 Documentation/config.txt        |   7 ++
 Documentation/git-clone.txt     |   6 +-
 Documentation/git-submodule.txt |   7 +-
 builtin/clone.c                 |  19 +++-
 builtin/fetch.c                 |   2 +-
 builtin/submodule--helper.c     | 239 ++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh                |  54 ++++-----
 run-command.c                   |   4 +
 submodule-config.c              | 109 +++++++++++-------
 submodule-config.h              |   3 +
 submodule.c                     |   5 +
 t/t5526-fetch-submodules.sh     |  14 +++
 t/t7400-submodule-basic.sh      |   4 +-
 t/t7406-submodule-update.sh     |  27 +++++
 14 files changed, 417 insertions(+), 83 deletions(-)
 
Interdiff to series v4: (ignoring the merge base as that changed, too)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 662d329..254824a 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -487,9 +487,9 @@ static int update_clone(int argc, const char **argv, const char *prefix)
                return 1;
        }
 
-       for_each_string_list_item(item, &pp.projectlines) {
+       for_each_string_list_item(item, &pp.projectlines)
                utf8_fprintf(stdout, "%s", item->string);
-       }
+
        return 0;
 }
 
-- 
2.6.1.261.g0d9c4c1


* [PATCHv5 2/9] submodule-config: keep update strategy around
  2015-11-25  1:14 12% [PATCHv5 0/9] Expose submodule parallelism to the user Stefan Beller
@ 2015-11-25  1:14 26% ` Stefan Beller
  2015-11-25  1:14 25% ` [PATCHv5 3/9] submodule-config: drop check against NULL Stefan Beller
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-25  1:14 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller, Junio C Hamano

We need the submodule update strategies in a later patch.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 submodule-config.c | 11 +++++++++++
 submodule-config.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/submodule-config.c b/submodule-config.c
index afe0ea8..4239b0e 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -194,6 +194,7 @@ static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
 
 	submodule->path = NULL;
 	submodule->url = NULL;
+	submodule->update = NULL;
 	submodule->fetch_recurse = RECURSE_SUBMODULES_NONE;
 	submodule->ignore = NULL;
 
@@ -311,6 +312,16 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->url);
 			submodule->url = xstrdup(value);
 		}
+	} else if (!strcmp(item.buf, "update")) {
+		if (!value)
+			ret = config_error_nonbool(var);
+		else if (!me->overwrite && submodule->update != NULL)
+			warn_multiple_config(me->commit_sha1, submodule->name,
+					     "update");
+		else {
+			free((void *) submodule->update);
+			submodule->update = xstrdup(value);
+		}
 	}
 
 	strbuf_release(&name);
diff --git a/submodule-config.h b/submodule-config.h
index 9061e4e..f9e2a29 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -14,6 +14,7 @@ struct submodule {
 	const char *url;
 	int fetch_recurse;
 	const char *ignore;
+	const char *update;
 	/* the sha1 blob id of the responsible .gitmodules file */
 	unsigned char gitmodules_sha1[20];
 };
-- 
2.6.1.261.g0d9c4c1


* [PATCHv5 5/9] submodule-config: introduce parse_generic_submodule_config
  2015-11-25  1:14 12% [PATCHv5 0/9] Expose submodule parallelism to the user Stefan Beller
                   ` (2 preceding siblings ...)
  2015-11-25  1:14 25% ` [PATCHv5 4/9] submodule-config: remove name_and_item_from_var Stefan Beller
@ 2015-11-25  1:14 23% ` Stefan Beller
  2015-11-25  1:14 25% ` [PATCHv5 6/9] fetching submodules: respect `submodule.fetchJobs` config option Stefan Beller
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-25  1:14 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

This rewrites parse_config to distinguish between configs specific to
one submodule and configs that apply generically to all submodules.
We do not have generic submodule configs yet, but the next patch will
introduce "submodule.fetchJobs".
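
The split relies on how a dotted config variable name is divided: the section ends at the first dot, the key starts after the last dot, and whatever lies in between (possibly containing dots itself) is the subsection. So `submodule.fetchjobs` has an empty subsection, while `submodule.foo.path` does not. A standalone sketch of that classification follows; `split_config_key` and `classify` are hypothetical, modelled on how the patch uses `parse_config_key()` rather than copied from Git:

```c
#include <string.h>

/*
 * Split "section[.subsection].key". Returns 0 if var belongs to `section`,
 * -1 otherwise. The subsection is everything between the first and the
 * last dot; it is NULL with length 0 when absent.
 */
static int split_config_key(const char *var, const char *section,
			    const char **subsection, int *subsection_len,
			    const char **key)
{
	size_t seclen = strlen(section);
	const char *dot;

	if (strncmp(var, section, seclen) || var[seclen] != '.')
		return -1;
	dot = strrchr(var, '.');
	*key = dot + 1;
	if (dot == var + seclen) {
		/* e.g. "submodule.fetchjobs": no subsection */
		*subsection = NULL;
		*subsection_len = 0;
	} else {
		*subsection = var + seclen + 1;
		*subsection_len = (int)(dot - *subsection);
	}
	return 0;
}

/* Mirrors parse_config(): 0 = not ours, 1 = generic, 2 = per-submodule. */
static int classify(const char *var)
{
	const char *sub, *key;
	int len;

	if (split_config_key(var, "submodule", &sub, &len, &key) < 0)
		return 0;
	return len ? 2 : 1;
}
```

This is why a generic `submodule.fetchJobs` and per-submodule `submodule.<name>.<key>` entries can live side by side without ambiguity.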

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule-config.c | 41 ++++++++++++++++++++++++++++++++---------
 1 file changed, 32 insertions(+), 9 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index b826841..29e21b2 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -234,17 +234,22 @@ struct parse_config_parameter {
 	int overwrite;
 };
 
-static int parse_config(const char *var, const char *value, void *data)
+static int parse_generic_submodule_config(const char *key,
+					  const char *var,
+					  const char *value,
+					  struct parse_config_parameter *me)
 {
-	struct parse_config_parameter *me = data;
-	struct submodule *submodule;
-	int subsection_len, ret = 0;
-	const char *subsection, *key;
-
-	if (parse_config_key(var, "submodule", &subsection,
-			     &subsection_len, &key) < 0 || !subsection_len)
-		return 0;
+	return 0;
+}
 
+static int parse_specific_submodule_config(const char *subsection, int subsection_len,
+					   const char *key,
+					   const char *var,
+					   const char *value,
+					   struct parse_config_parameter *me)
+{
+	int ret = 0;
+	struct submodule *submodule;
 	submodule = lookup_or_create_by_name(me->cache,
 					     me->gitmodules_sha1,
 					     subsection, subsection_len);
@@ -314,6 +319,24 @@ static int parse_config(const char *var, const char *value, void *data)
 	return ret;
 }
 
+static int parse_config(const char *var, const char *value, void *data)
+{
+	struct parse_config_parameter *me = data;
+	int subsection_len;
+	const char *subsection, *key;
+
+	if (parse_config_key(var, "submodule", &subsection,
+			     &subsection_len, &key) < 0)
+		return 0;
+
+	if (!subsection_len)
+		return parse_generic_submodule_config(key, var, value, me);
+	else
+		return parse_specific_submodule_config(subsection,
+						       subsection_len, key,
+						       var, value, me);
+}
+
 static int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
 				      unsigned char *gitmodules_sha1)
 {
-- 
2.6.1.261.g0d9c4c1


* [PATCHv5 4/9] submodule-config: remove name_and_item_from_var
  2015-11-25  1:14 12% [PATCHv5 0/9] Expose submodule parallelism to the user Stefan Beller
  2015-11-25  1:14 26% ` [PATCHv5 2/9] submodule-config: keep update strategy around Stefan Beller
  2015-11-25  1:14 25% ` [PATCHv5 3/9] submodule-config: drop check against NULL Stefan Beller
@ 2015-11-25  1:14 25% ` Stefan Beller
  2015-11-25  1:14 23% ` [PATCHv5 5/9] submodule-config: introduce parse_generic_submodule_config Stefan Beller
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-25  1:14 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

`name_and_item_from_var` does not provide the abstraction that a later
patch in this series needs.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule-config.c | 48 ++++++++++++++++--------------------------------
 1 file changed, 16 insertions(+), 32 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 6d01941..b826841 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -161,31 +161,17 @@ static struct submodule *cache_lookup_name(struct submodule_cache *cache,
 	return NULL;
 }
 
-static int name_and_item_from_var(const char *var, struct strbuf *name,
-				  struct strbuf *item)
-{
-	const char *subsection, *key;
-	int subsection_len, parse;
-	parse = parse_config_key(var, "submodule", &subsection,
-			&subsection_len, &key);
-	if (parse < 0 || !subsection)
-		return 0;
-
-	strbuf_add(name, subsection, subsection_len);
-	strbuf_addstr(item, key);
-
-	return 1;
-}
-
 static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
-		const unsigned char *gitmodules_sha1, const char *name)
+						  const unsigned char *gitmodules_sha1,
+						  const char *name_ptr, int name_len)
 {
 	struct submodule *submodule;
 	struct strbuf name_buf = STRBUF_INIT;
+	char *name = xmemdupz(name_ptr, name_len);
 
 	submodule = cache_lookup_name(cache, gitmodules_sha1, name);
 	if (submodule)
-		return submodule;
+		goto out;
 
 	submodule = xmalloc(sizeof(*submodule));
 
@@ -201,7 +187,8 @@ static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
 	hashcpy(submodule->gitmodules_sha1, gitmodules_sha1);
 
 	cache_add(cache, submodule);
-
+out:
+	free(name);
 	return submodule;
 }
 
@@ -251,18 +238,18 @@ static int parse_config(const char *var, const char *value, void *data)
 {
 	struct parse_config_parameter *me = data;
 	struct submodule *submodule;
-	struct strbuf name = STRBUF_INIT, item = STRBUF_INIT;
-	int ret = 0;
+	int subsection_len, ret = 0;
+	const char *subsection, *key;
 
-	/* this also ensures that we only parse submodule entries */
-	if (!name_and_item_from_var(var, &name, &item))
+	if (parse_config_key(var, "submodule", &subsection,
+			     &subsection_len, &key) < 0 || !subsection_len)
 		return 0;
 
 	submodule = lookup_or_create_by_name(me->cache,
 					     me->gitmodules_sha1,
-					     name.buf);
+					     subsection, subsection_len);
 
-	if (!strcmp(item.buf, "path")) {
+	if (!strcmp(key, "path")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->path)
@@ -275,7 +262,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			submodule->path = xstrdup(value);
 			cache_put_path(me->cache, submodule);
 		}
-	} else if (!strcmp(item.buf, "fetchrecursesubmodules")) {
+	} else if (!strcmp(key, "fetchrecursesubmodules")) {
 		/* when parsing worktree configurations we can die early */
 		int die_on_error = is_null_sha1(me->gitmodules_sha1);
 		if (!me->overwrite &&
@@ -286,7 +273,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			submodule->fetch_recurse = parse_fetch_recurse(
 								var, value,
 								die_on_error);
-	} else if (!strcmp(item.buf, "ignore")) {
+	} else if (!strcmp(key, "ignore")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->ignore)
@@ -302,7 +289,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->ignore);
 			submodule->ignore = xstrdup(value);
 		}
-	} else if (!strcmp(item.buf, "url")) {
+	} else if (!strcmp(key, "url")) {
 		if (!value) {
 			ret = config_error_nonbool(var);
 		} else if (!me->overwrite && submodule->url) {
@@ -312,7 +299,7 @@ static int parse_config(const char *var, const char *value, void *data)
 			free((void *) submodule->url);
 			submodule->url = xstrdup(value);
 		}
-	} else if (!strcmp(item.buf, "update")) {
+	} else if (!strcmp(key, "update")) {
 		if (!value)
 			ret = config_error_nonbool(var);
 		else if (!me->overwrite && submodule->update)
@@ -324,9 +311,6 @@ static int parse_config(const char *var, const char *value, void *data)
 		}
 	}
 
-	strbuf_release(&name);
-	strbuf_release(&item);
-
 	return ret;
 }
 
-- 
2.6.1.261.g0d9c4c1


* [PATCHv5 6/9] fetching submodules: respect `submodule.fetchJobs` config option
  2015-11-25  1:14 12% [PATCHv5 0/9] Expose submodule parallelism to the user Stefan Beller
                   ` (3 preceding siblings ...)
  2015-11-25  1:14 23% ` [PATCHv5 5/9] submodule-config: introduce parse_generic_submodule_config Stefan Beller
@ 2015-11-25  1:14 25% ` Stefan Beller
  2015-11-25  1:14 21% ` [PATCHv5 7/9] git submodule update: have a dedicated helper for cloning Stefan Beller
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-25  1:14 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

This allows configuring parallel fetching and updating without
passing the command line option.

This moves the responsibility for determining how many parallel
processes to start from builtin/fetch to submodule.c, as builtin/fetch
needs a way to communicate "the user did not specify the number of
parallel processes on the command line". The submodule code takes care
of the precedence (CLI > config > default).
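
The precedence can be sketched with -1 as the "not specified" sentinel, which is what builtin/fetch.c now uses for max_children. `parse_jobs_value`, `resolve_jobs` and `JOBS_UNSPECIFIED` are hypothetical names, and the strict parsing (rejecting a missing, non-numeric or negative value) is a choice made for this sketch rather than the patch's exact behaviour. What a value of 0 means is left to the caller:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define JOBS_UNSPECIFIED (-1)

/*
 * Parse a submodule.fetchJobs-style value. Returns 0 and stores the
 * number on success; returns -1 (leaving *out untouched) on a missing,
 * non-numeric, out-of-range or negative value.
 */
static int parse_jobs_value(const char *value, int *out)
{
	char *end;
	long v;

	if (!value || !*value)
		return -1;
	errno = 0;
	v = strtol(value, &end, 10);
	if (errno || *end || v < 0 || v > INT_MAX)
		return -1;
	*out = (int)v;
	return 0;
}

/* Command line beats config, config beats the built-in default of 1. */
static int resolve_jobs(int cli_jobs, int config_jobs)
{
	if (cli_jobs >= 0)
		return cli_jobs;
	if (config_jobs >= 0)
		return config_jobs;
	return 1;
}
```

The three calls in the new t5526 test (`--jobs 7`, config 8, then config 8 with `--jobs 9`) correspond to `resolve_jobs(7, -1)`, `resolve_jobs(-1, 8)` and `resolve_jobs(9, 8)`.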

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/config.txt    |  7 +++++++
 builtin/fetch.c             |  2 +-
 submodule-config.c          | 15 +++++++++++++++
 submodule-config.h          |  2 ++
 submodule.c                 |  5 +++++
 t/t5526-fetch-submodules.sh | 14 ++++++++++++++
 6 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 391a0c3..9e7c14c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2643,6 +2643,13 @@ submodule.<name>.ignore::
 	"--ignore-submodules" option. The 'git submodule' commands are not
 	affected by this setting.
 
+submodule.fetchJobs::
+	This is used to determine how many submodules will be
+	fetched/cloned at the same time. Specifying a positive integer
+	allows up to that number of submodules being fetched in parallel.
+	This is used in fetch and clone operations only. A value of 0 will
+	give some reasonable configuration. It defaults to 1.
+
 tag.sort::
 	This variable controls the sort ordering of tags when displayed by
 	linkgit:git-tag[1]. Without the "--sort=<value>" option provided, the
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 9cc1c9d..60e6797 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -37,7 +37,7 @@ static int prune = -1; /* unspecified */
 static int all, append, dry_run, force, keep, multiple, update_head_ok, verbosity;
 static int progress = -1, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow;
-static int max_children = 1;
+static int max_children = -1;
 static const char *depth;
 static const char *upload_pack;
 static struct strbuf default_rla = STRBUF_INIT;
diff --git a/submodule-config.c b/submodule-config.c
index 29e21b2..a32259e 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -32,6 +32,7 @@ enum lookup_type {
 
 static struct submodule_cache cache;
 static int is_cache_init;
+static int parallel_jobs = -1;
 
 static int config_path_cmp(const struct submodule_entry *a,
 			   const struct submodule_entry *b,
@@ -239,6 +240,15 @@ static int parse_generic_submodule_config(const char *key,
 					  const char *value,
 					  struct parse_config_parameter *me)
 {
+	if (!strcmp(key, "fetchjobs")) {
+		parallel_jobs = strtol(value, NULL, 10);
+		if (parallel_jobs < 0) {
+			warning("submodule.fetchJobs not allowed to be negative.");
+			parallel_jobs = 1;
+			return 1;
+		}
+	}
+
 	return 0;
 }
 
@@ -482,3 +492,8 @@ void submodule_free(void)
 	cache_free(&cache);
 	is_cache_init = 0;
 }
+
+int config_parallel_submodules(void)
+{
+	return parallel_jobs;
+}
diff --git a/submodule-config.h b/submodule-config.h
index f9e2a29..d9bbf9a 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -27,4 +27,6 @@ const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
 		const char *path);
 void submodule_free(void);
 
+int config_parallel_submodules(void);
+
 #endif /* SUBMODULE_CONFIG_H */
diff --git a/submodule.c b/submodule.c
index c6350eb..e73f850 100644
--- a/submodule.c
+++ b/submodule.c
@@ -749,6 +749,11 @@ int fetch_populated_submodules(const struct argv_array *options,
 	argv_array_push(&spf.args, "--recurse-submodules-default");
 	/* default value, "--submodule-prefix" and its value are added later */
 
+	if (max_parallel_jobs < 0)
+		max_parallel_jobs = config_parallel_submodules();
+	if (max_parallel_jobs < 0)
+		max_parallel_jobs = 1;
+
 	calculate_changed_submodule_paths();
 	run_processes_parallel(max_parallel_jobs,
 			       get_next_submodule,
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 1b4ce69..6671994 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -470,4 +470,18 @@ test_expect_success "don't fetch submodule when newly recorded commits are alrea
 	test_i18ncmp expect.err actual.err
 '
 
+test_expect_success 'fetching submodules respects parallel settings' '
+	git config fetch.recurseSubmodules true &&
+	(
+		cd downstream &&
+		GIT_TRACE=$(pwd)/trace.out git fetch --jobs 7 &&
+		grep "7 tasks" trace.out &&
+		git config submodule.fetchJobs 8 &&
+		GIT_TRACE=$(pwd)/trace.out git fetch &&
+		grep "8 tasks" trace.out &&
+		GIT_TRACE=$(pwd)/trace.out git fetch --jobs 9 &&
+		grep "9 tasks" trace.out
+	)
+'
+
 test_done
-- 
2.6.1.261.g0d9c4c1


* [PATCHv5 9/9] clone: allow an explicit argument for parallel submodule clones
  2015-11-25  1:14 12% [PATCHv5 0/9] Expose submodule parallelism to the user Stefan Beller
                   ` (6 preceding siblings ...)
  2015-11-25  1:14 23% ` [PATCHv5 8/9] submodule update: expose parallelism to the user Stefan Beller
@ 2015-11-25  1:14 25% ` Stefan Beller
  7 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-25  1:14 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Just pass it along to "git submodule update", which may pick reasonable
defaults if you don't specify an explicit number.
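
A sketch of the argument handling, using a fixed-size array instead of Git's argv_array (the function name and buffer sizes are hypothetical). `--jobs=<n>` is added only when the user passed -j/--jobs, so that `git submodule update` otherwise falls back to its own default:

```c
#include <stdio.h>

#define JOBS_BUF_SIZE 32

/*
 * Fill argv with "submodule update --init --recursive [--jobs=<n>]" and
 * a terminating NULL; argv needs room for 6 entries. max_jobs == -1 means
 * "not given on the command line". Returns the number of arguments.
 */
static int build_submodule_update_argv(const char *argv[6],
				       char jobs_buf[JOBS_BUF_SIZE],
				       int max_jobs)
{
	int n = 0;

	argv[n++] = "submodule";
	argv[n++] = "update";
	argv[n++] = "--init";
	argv[n++] = "--recursive";
	if (max_jobs != -1) {
		snprintf(jobs_buf, JOBS_BUF_SIZE, "--jobs=%d", max_jobs);
		argv[n++] = jobs_buf;
	}
	argv[n] = NULL;
	return n;
}
```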

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/git-clone.txt |  6 +++++-
 builtin/clone.c             | 19 +++++++++++++------
 t/t7406-submodule-update.sh | 15 +++++++++++++++
 3 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index f1f2a3f..59d8c67 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -14,7 +14,7 @@ SYNOPSIS
 	  [-o <name>] [-b <name>] [-u <upload-pack>] [--reference <repository>]
 	  [--dissociate] [--separate-git-dir <git dir>]
 	  [--depth <depth>] [--[no-]single-branch]
-	  [--recursive | --recurse-submodules] [--] <repository>
+	  [--recursive | --recurse-submodules] [--jobs <n>] [--] <repository>
 	  [<directory>]
 
 DESCRIPTION
@@ -216,6 +216,10 @@ objects from the source repository into a pack in the cloned repository.
 	The result is Git repository can be separated from working
 	tree.
 
+-j <n>::
+--jobs <n>::
+	The number of submodules fetched at the same time.
+	Defaults to the `submodule.fetchJobs` option.
 
 <repository>::
 	The (possibly remote) repository to clone from.  See the
diff --git a/builtin/clone.c b/builtin/clone.c
index 9eaecd9..ce578d2 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -50,6 +50,7 @@ static int option_progress = -1;
 static struct string_list option_config;
 static struct string_list option_reference;
 static int option_dissociate;
+static int max_jobs = -1;
 
 static struct option builtin_clone_options[] = {
 	OPT__VERBOSITY(&option_verbosity),
@@ -72,6 +73,8 @@ static struct option builtin_clone_options[] = {
 		    N_("initialize submodules in the clone")),
 	OPT_BOOL(0, "recurse-submodules", &option_recursive,
 		    N_("initialize submodules in the clone")),
+	OPT_INTEGER('j', "jobs", &max_jobs,
+		    N_("number of submodules cloned in parallel")),
 	OPT_STRING(0, "template", &option_template, N_("template-directory"),
 		   N_("directory from which templates will be used")),
 	OPT_STRING_LIST(0, "reference", &option_reference, N_("repo"),
@@ -95,10 +98,6 @@ static struct option builtin_clone_options[] = {
 	OPT_END()
 };
 
-static const char *argv_submodule[] = {
-	"submodule", "update", "--init", "--recursive", NULL
-};
-
 static const char *get_repo_path_1(struct strbuf *path, int *is_bundle)
 {
 	static char *suffix[] = { "/.git", "", ".git/.git", ".git" };
@@ -724,8 +723,16 @@ static int checkout(void)
 	err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
 			   sha1_to_hex(sha1), "1", NULL);
 
-	if (!err && option_recursive)
-		err = run_command_v_opt(argv_submodule, RUN_GIT_CMD);
+	if (!err && option_recursive) {
+		struct argv_array args = ARGV_ARRAY_INIT;
+		argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
+
+		if (max_jobs != -1)
+			argv_array_pushf(&args, "--jobs=%d", max_jobs);
+
+		err = run_command_v_opt(args.argv, RUN_GIT_CMD);
+		argv_array_clear(&args);
+	}
 
 	return err;
 }
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 7fd5142..090891e 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -786,4 +786,19 @@ test_expect_success 'submodule update can be run in parallel' '
 	 grep "9 tasks" trace.out
 	)
 '
+
+test_expect_success 'git clone passes the parallel jobs config on to submodules' '
+	test_when_finished "rm -rf super4" &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules --jobs 7 . super4 &&
+	grep "7 tasks" trace.out &&
+	rm -rf super4 &&
+	git config --global submodule.fetchJobs 8 &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules . super4 &&
+	grep "8 tasks" trace.out &&
+	rm -rf super4 &&
+	GIT_TRACE=$(pwd)/trace.out git clone --recurse-submodules --jobs 9 . super4 &&
+	grep "9 tasks" trace.out &&
+	rm -rf super4
+'
+
 test_done
-- 
2.6.1.261.g0d9c4c1

* [PATCHv5 8/9] submodule update: expose parallelism to the user
  2015-11-25  1:14 12% [PATCHv5 0/9] Expose submodule parallelism to the user Stefan Beller
                   ` (5 preceding siblings ...)
  2015-11-25  1:14 21% ` [PATCHv5 7/9] git submodule update: have a dedicated helper for cloning Stefan Beller
@ 2015-11-25  1:14 23% ` Stefan Beller
  2015-11-25  1:14 25% ` [PATCHv5 9/9] clone: allow an explicit argument for parallel submodule clones Stefan Beller
  7 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-25  1:14 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Expose possible parallelism either via the "--jobs" CLI parameter or
the "submodule.fetchJobs" setting. If neither is given, a single job
is used.

By initializing the variable to -1, we make sure 0 can be passed
into the parallel processing machinery, which will then pick as many parallel
workers as there are CPUs.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/git-submodule.txt |  7 ++++++-
 builtin/submodule--helper.c     | 18 ++++++++++++++----
 git-submodule.sh                |  9 +++++++++
 t/t7406-submodule-update.sh     | 12 ++++++++++++
 4 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index f17687e..a87ff72 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -16,7 +16,7 @@ SYNOPSIS
 'git submodule' [--quiet] deinit [-f|--force] [--] <path>...
 'git submodule' [--quiet] update [--init] [--remote] [-N|--no-fetch]
 	      [-f|--force] [--rebase|--merge] [--reference <repository>]
-	      [--depth <depth>] [--recursive] [--] [<path>...]
+	      [--depth <depth>] [--recursive] [--jobs <n>] [--] [<path>...]
 'git submodule' [--quiet] summary [--cached|--files] [(-n|--summary-limit) <n>]
 	      [commit] [--] [<path>...]
 'git submodule' [--quiet] foreach [--recursive] <command>
@@ -374,6 +374,11 @@ for linkgit:git-clone[1]'s `--reference` and `--shared` options carefully.
 	clone with a history truncated to the specified number of revisions.
 	See linkgit:git-clone[1]
 
+-j <n>::
+--jobs <n>::
+	This option is only valid for the update command.
+	Clone new submodules in parallel with as many jobs.
+	Defaults to the `submodule.fetchJobs` option.
 
 <path>...::
 	Paths to submodule(s). When specified this will restrict the command
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 27363fa..254824a 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -426,6 +426,7 @@ static int update_clone_task_finished(int result,
 
 static int update_clone(int argc, const char **argv, const char *prefix)
 {
+	int max_jobs = -1;
 	struct string_list_item *item;
 	struct submodule_update_clone pp = SUBMODULE_UPDATE_CLONE_INIT;
 
@@ -446,6 +447,8 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 		OPT_STRING(0, "depth", &pp.depth, "<depth>",
 			   N_("Create a shallow clone truncated to the "
 			      "specified number of revisions")),
+		OPT_INTEGER('j', "jobs", &max_jobs,
+			    N_("parallel jobs")),
 		OPT__QUIET(&pp.quiet, N_("do't print cloning progress")),
 		OPT_END()
 	};
@@ -467,10 +470,17 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 	gitmodules_config();
 	/* Overlay the parsed .gitmodules file with .git/config */
 	git_config(git_submodule_config, NULL);
-	run_processes_parallel(1, update_clone_get_next_task,
-				  update_clone_start_failure,
-				  update_clone_task_finished,
-				  &pp);
+
+	if (max_jobs < 0)
+		max_jobs = config_parallel_submodules();
+	if (max_jobs < 0)
+		max_jobs = 1;
+
+	run_processes_parallel(max_jobs,
+			       update_clone_get_next_task,
+			       update_clone_start_failure,
+			       update_clone_task_finished,
+			       &pp);
 
 	if (pp.print_unmatched) {
 		printf("#unmatched\n");
diff --git a/git-submodule.sh b/git-submodule.sh
index 9f554fb..10c5af9 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -645,6 +645,14 @@ cmd_update()
 		--depth=*)
 			depth=$1
 			;;
+		-j|--jobs)
+			case "$2" in '') usage ;; esac
+			jobs="--jobs=$2"
+			shift
+			;;
+		--jobs=*)
+			jobs=$1
+			;;
 		--)
 			shift
 			break
@@ -670,6 +678,7 @@ cmd_update()
 		${update:+--update "$update"} \
 		${reference:+--reference "$reference"} \
 		${depth:+--depth "$depth"} \
+		${jobs:+$jobs} \
 		"$@" | {
 	err=
 	while read mode sha1 stage just_cloned sm_path
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index dda3929..7fd5142 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -774,4 +774,16 @@ test_expect_success 'submodule update --recursive drops module name before recur
 	 test_i18ngrep "Submodule path .deeper/submodule/subsubmodule.: checked out" actual
 	)
 '
+
+test_expect_success 'submodule update can be run in parallel' '
+	(cd super2 &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update --jobs 7 &&
+	 grep "7 tasks" trace.out &&
+	 git config submodule.fetchJobs 8 &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update &&
+	 grep "8 tasks" trace.out &&
+	 GIT_TRACE=$(pwd)/trace.out git submodule update --jobs 9 &&
+	 grep "9 tasks" trace.out
+	)
+'
 test_done
-- 
2.6.1.261.g0d9c4c1

* [PATCHv5 7/9] git submodule update: have a dedicated helper for cloning
  2015-11-25  1:14 12% [PATCHv5 0/9] Expose submodule parallelism to the user Stefan Beller
                   ` (4 preceding siblings ...)
  2015-11-25  1:14 25% ` [PATCHv5 6/9] fetching submodules: respect `submodule.fetchJobs` config option Stefan Beller
@ 2015-11-25  1:14 21% ` Stefan Beller
  2015-11-25  1:14 23% ` [PATCHv5 8/9] submodule update: expose parallelism to the user Stefan Beller
  2015-11-25  1:14 25% ` [PATCHv5 9/9] clone: allow an explicit argument for parallel submodule clones Stefan Beller
  7 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-25  1:14 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller, Junio C Hamano

This introduces a new helper function in git submodule--helper
which takes care of cloning all submodules, something we eventually
want to parallelize.

Some checks (such as an empty URL or update_mode=none) are required
in the helper to decide whether to clone. These checks have been
moved into the C function as well (no need to repeat them in the
shell script).

As we can only access the stderr channel from within the parallel
processing engine, we need to reroute the error message for
specified but uninitialized submodules to stderr. As it is an error
message, it should have gone to stderr in the first place, so this
is a bug fix along the way.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/submodule--helper.c | 229 ++++++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh            |  45 +++------
 t/t7400-submodule-basic.sh  |   4 +-
 3 files changed, 242 insertions(+), 36 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index f4c3eff..27363fa 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -255,6 +255,234 @@ static int module_clone(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
+static int git_submodule_config(const char *var, const char *value, void *cb)
+{
+	return parse_submodule_config_option(var, value);
+}
+
+struct submodule_update_clone {
+	/* states */
+	int count;
+	int print_unmatched;
+	/* configuration */
+	int quiet;
+	const char *reference;
+	const char *depth;
+	const char *update;
+	const char *recursive_prefix;
+	const char *prefix;
+	struct module_list list;
+	struct string_list projectlines;
+	struct pathspec pathspec;
+};
+#define SUBMODULE_UPDATE_CLONE_INIT {0, 0, 0, NULL, NULL, NULL, NULL, NULL, MODULE_LIST_INIT, STRING_LIST_INIT_DUP}
+
+static void fill_clone_command(struct child_process *cp, int quiet,
+			       const char *prefix, const char *path,
+			       const char *name, const char *url,
+			       const char *reference, const char *depth)
+{
+	cp->git_cmd = 1;
+	cp->no_stdin = 1;
+	cp->stdout_to_stderr = 1;
+	cp->err = -1;
+	argv_array_push(&cp->args, "submodule--helper");
+	argv_array_push(&cp->args, "clone");
+	if (quiet)
+		argv_array_push(&cp->args, "--quiet");
+
+	if (prefix)
+		argv_array_pushl(&cp->args, "--prefix", prefix, NULL);
+
+	argv_array_pushl(&cp->args, "--path", path, NULL);
+	argv_array_pushl(&cp->args, "--name", name, NULL);
+	argv_array_pushl(&cp->args, "--url", url, NULL);
+	if (reference)
+		argv_array_push(&cp->args, reference);
+	if (depth)
+		argv_array_push(&cp->args, depth);
+}
+
+static int update_clone_get_next_task(void **pp_task_cb,
+				      struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	for (; pp->count < pp->list.nr; pp->count++) {
+		const struct submodule *sub = NULL;
+		const char *displaypath = NULL;
+		const struct cache_entry *ce = pp->list.entries[pp->count];
+		struct strbuf sb = STRBUF_INIT;
+		const char *update_module = NULL;
+		char *url = NULL;
+		int needs_cloning = 0;
+
+		if (ce_stage(ce)) {
+			if (pp->recursive_prefix)
+				strbuf_addf(err, "Skipping unmerged submodule %s/%s\n",
+					pp->recursive_prefix, ce->name);
+			else
+				strbuf_addf(err, "Skipping unmerged submodule %s\n",
+					ce->name);
+			continue;
+		}
+
+		sub = submodule_from_path(null_sha1, ce->name);
+		if (!sub) {
+			strbuf_addf(err, "BUG: internal error managing submodules. "
+				    "The cache could not locate '%s'", ce->name);
+			pp->print_unmatched = 1;
+			continue;
+		}
+
+		if (pp->recursive_prefix)
+			displaypath = relative_path(pp->recursive_prefix, ce->name, &sb);
+		else
+			displaypath = ce->name;
+
+		if (pp->update)
+			update_module = pp->update;
+		if (!update_module)
+			update_module = sub->update;
+		if (!update_module)
+			update_module = "checkout";
+		if (!strcmp(update_module, "none")) {
+			strbuf_addf(err, "Skipping submodule '%s'\n", displaypath);
+			continue;
+		}
+
+		/*
+		 * Looking up the url in .git/config.
+		 * We must not fall back to .gitmodules as we only want to process
+		 * configured submodules.
+		 */
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "submodule.%s.url", sub->name);
+		git_config_get_string(sb.buf, &url);
+		if (!url) {
+			/*
+			 * Only mention uninitialized submodules when its
+			 * path have been specified
+			 */
+			if (pp->pathspec.nr)
+				strbuf_addf(err, _("Submodule path '%s' not initialized\n"
+					"Maybe you want to use 'update --init'?"), displaypath);
+			continue;
+		}
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%s/.git", ce->name);
+		needs_cloning = !file_exists(sb.buf);
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%06o %s %d %d\t%s\n", ce->ce_mode,
+				sha1_to_hex(ce->sha1), ce_stage(ce),
+				needs_cloning, ce->name);
+		string_list_append(&pp->projectlines, sb.buf);
+
+		if (needs_cloning) {
+			fill_clone_command(cp, pp->quiet, pp->prefix, ce->name,
+					   sub->name, url, pp->reference, pp->depth);
+			pp->count++;
+			free(url);
+			return 1;
+		} else
+			free(url);
+	}
+	return 0;
+}
+
+static int update_clone_start_failure(struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb,
+				      void *pp_task_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	strbuf_addf(err, "error when starting a child process");
+	pp->print_unmatched = 1;
+
+	return 1;
+}
+
+static int update_clone_task_finished(int result,
+				      struct child_process *cp,
+				      struct strbuf *err,
+				      void *pp_cb,
+				      void *pp_task_cb)
+{
+	struct submodule_update_clone *pp = pp_cb;
+
+	if (!result) {
+		return 0;
+	} else {
+		strbuf_addf(err, "error in one child process");
+		pp->print_unmatched = 1;
+		return 1;
+	}
+}
+
+static int update_clone(int argc, const char **argv, const char *prefix)
+{
+	struct string_list_item *item;
+	struct submodule_update_clone pp = SUBMODULE_UPDATE_CLONE_INIT;
+
+	struct option module_list_options[] = {
+		OPT_STRING(0, "prefix", &prefix,
+			   N_("path"),
+			   N_("path into the working tree")),
+		OPT_STRING(0, "recursive_prefix", &pp.recursive_prefix,
+			   N_("path"),
+			   N_("path into the working tree, across nested "
+			      "submodule boundaries")),
+		OPT_STRING(0, "update", &pp.update,
+			   N_("string"),
+			   N_("update command for submodules")),
+		OPT_STRING(0, "reference", &pp.reference, "<repository>",
+			   N_("Use the local reference repository "
+			      "instead of a full clone")),
+		OPT_STRING(0, "depth", &pp.depth, "<depth>",
+			   N_("Create a shallow clone truncated to the "
+			      "specified number of revisions")),
+		OPT__QUIET(&pp.quiet, N_("do't print cloning progress")),
+		OPT_END()
+	};
+
+	const char *const git_submodule_helper_usage[] = {
+		N_("git submodule--helper list [--prefix=<path>] [<path>...]"),
+		NULL
+	};
+	pp.prefix = prefix;
+
+	argc = parse_options(argc, argv, prefix, module_list_options,
+			     git_submodule_helper_usage, 0);
+
+	if (module_list_compute(argc, argv, prefix, &pp.pathspec, &pp.list) < 0) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	gitmodules_config();
+	/* Overlay the parsed .gitmodules file with .git/config */
+	git_config(git_submodule_config, NULL);
+	run_processes_parallel(1, update_clone_get_next_task,
+				  update_clone_start_failure,
+				  update_clone_task_finished,
+				  &pp);
+
+	if (pp.print_unmatched) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	for_each_string_list_item(item, &pp.projectlines)
+		utf8_fprintf(stdout, "%s", item->string);
+
+	return 0;
+}
+
 struct cmd_struct {
 	const char *cmd;
 	int (*fn)(int, const char **, const char *);
@@ -264,6 +492,7 @@ static struct cmd_struct commands[] = {
 	{"list", module_list},
 	{"name", module_name},
 	{"clone", module_clone},
+	{"update-clone", update_clone}
 };
 
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
diff --git a/git-submodule.sh b/git-submodule.sh
index 9bc5c5f..9f554fb 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -664,17 +664,18 @@ cmd_update()
 		cmd_init "--" "$@" || return
 	fi
 
-	cloned_modules=
-	git submodule--helper list --prefix "$wt_prefix" "$@" | {
+	git submodule--helper update-clone ${GIT_QUIET:+--quiet} \
+		${wt_prefix:+--prefix "$wt_prefix"} \
+		${prefix:+--recursive_prefix "$prefix"} \
+		${update:+--update "$update"} \
+		${reference:+--reference "$reference"} \
+		${depth:+--depth "$depth"} \
+		"$@" | {
 	err=
-	while read mode sha1 stage sm_path
+	while read mode sha1 stage just_cloned sm_path
 	do
 		die_if_unmatched "$mode"
-		if test "$stage" = U
-		then
-			echo >&2 "Skipping unmerged submodule $prefix$sm_path"
-			continue
-		fi
+
 		name=$(git submodule--helper name "$sm_path") || exit
 		url=$(git config submodule."$name".url)
 		branch=$(get_submodule_config "$name" branch master)
@@ -691,27 +692,10 @@ cmd_update()
 
 		displaypath=$(relative_path "$prefix$sm_path")
 
-		if test "$update_module" = "none"
-		then
-			echo "Skipping submodule '$displaypath'"
-			continue
-		fi
-
-		if test -z "$url"
-		then
-			# Only mention uninitialized submodules when its
-			# path have been specified
-			test "$#" != "0" &&
-			say "$(eval_gettext "Submodule path '\$displaypath' not initialized
-Maybe you want to use 'update --init'?")"
-			continue
-		fi
-
-		if ! test -d "$sm_path"/.git && ! test -f "$sm_path"/.git
+		if test $just_cloned -eq 1
 		then
-			git submodule--helper clone ${GIT_QUIET:+--quiet} --prefix "$prefix" --path "$sm_path" --name "$name" --url "$url" "$reference" "$depth" || exit
-			cloned_modules="$cloned_modules;$name"
 			subsha1=
+			update_module=checkout
 		else
 			subsha1=$(clear_local_git_env; cd "$sm_path" &&
 				git rev-parse --verify HEAD) ||
@@ -751,13 +735,6 @@ Maybe you want to use 'update --init'?")"
 				die "$(eval_gettext "Unable to fetch in submodule path '\$displaypath'")"
 			fi
 
-			# Is this something we just cloned?
-			case ";$cloned_modules;" in
-			*";$name;"*)
-				# then there is no local change to integrate
-				update_module=checkout ;;
-			esac
-
 			must_die_on_failure=
 			case "$update_module" in
 			checkout)
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index 540771c..5991e3c 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -462,7 +462,7 @@ test_expect_success 'update --init' '
 	git config --remove-section submodule.example &&
 	test_must_fail git config submodule.example.url &&
 
-	git submodule update init > update.out &&
+	git submodule update init 2> update.out &&
 	cat update.out &&
 	test_i18ngrep "not initialized" update.out &&
 	test_must_fail git rev-parse --resolve-git-dir init/.git &&
@@ -480,7 +480,7 @@ test_expect_success 'update --init from subdirectory' '
 	mkdir -p sub &&
 	(
 		cd sub &&
-		git submodule update ../init >update.out &&
+		git submodule update ../init 2>update.out &&
 		cat update.out &&
 		test_i18ngrep "not initialized" update.out &&
 		test_must_fail git rev-parse --resolve-git-dir ../init/.git &&
-- 
2.6.1.261.g0d9c4c1

* [PATCHv5 3/9] submodule-config: drop check against NULL
  2015-11-25  1:14 12% [PATCHv5 0/9] Expose submodule parallelism to the user Stefan Beller
  2015-11-25  1:14 26% ` [PATCHv5 2/9] submodule-config: keep update strategy around Stefan Beller
@ 2015-11-25  1:14 25% ` Stefan Beller
  2015-11-25  1:14 25% ` [PATCHv5 4/9] submodule-config: remove name_and_item_from_var Stefan Beller
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-25  1:14 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

Adhere to the common coding style of Git and do not check explicitly
for NULL throughout the file. There are still other occurrences in the
code base, but those are usually inside conditions with side effects.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule-config.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 4239b0e..6d01941 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -265,7 +265,7 @@ static int parse_config(const char *var, const char *value, void *data)
 	if (!strcmp(item.buf, "path")) {
 		if (!value)
 			ret = config_error_nonbool(var);
-		else if (!me->overwrite && submodule->path != NULL)
+		else if (!me->overwrite && submodule->path)
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"path");
 		else {
@@ -289,7 +289,7 @@ static int parse_config(const char *var, const char *value, void *data)
 	} else if (!strcmp(item.buf, "ignore")) {
 		if (!value)
 			ret = config_error_nonbool(var);
-		else if (!me->overwrite && submodule->ignore != NULL)
+		else if (!me->overwrite && submodule->ignore)
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"ignore");
 		else if (strcmp(value, "untracked") &&
@@ -305,7 +305,7 @@ static int parse_config(const char *var, const char *value, void *data)
 	} else if (!strcmp(item.buf, "url")) {
 		if (!value) {
 			ret = config_error_nonbool(var);
-		} else if (!me->overwrite && submodule->url != NULL) {
+		} else if (!me->overwrite && submodule->url) {
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					"url");
 		} else {
@@ -315,7 +315,7 @@ static int parse_config(const char *var, const char *value, void *data)
 	} else if (!strcmp(item.buf, "update")) {
 		if (!value)
 			ret = config_error_nonbool(var);
-		else if (!me->overwrite && submodule->update != NULL)
+		else if (!me->overwrite && submodule->update)
 			warn_multiple_config(me->commit_sha1, submodule->name,
 					     "update");
 		else {
-- 
2.6.1.261.g0d9c4c1

* [RFC PATCH 0/5] Submodule Groups
@ 2015-11-25  1:32  9% Stefan Beller
  2015-11-25  1:32 27% ` [PATCH 1/5] submodule-config: keep submodule groups around Stefan Beller
                   ` (6 more replies)
  0 siblings, 7 replies; 200+ results
From: Stefan Beller @ 2015-11-25  1:32 UTC (permalink / raw)
  To: git
  Cc: peff, gitster, jrnieder, johannes.schindelin, Jens.Lehmann,
	ericsunshine, j6t, hvoigt, Stefan Beller

This is also available at https://github.com/stefanbeller/git/tree/submodule-groups
It applies on top of the submodule-parallel-patch series I sent a few minutes ago.

Consider a really large software project in Git with each component
in a submodule (such as an operating system: Android, Debian, Fedora;
not a toy OS such as https://github.com/gittup/gittup, as that doesn't
quite demonstrate the scale of the problem).

If you have lots of submodules, you probably don't need all of them at
once; instead they form functional units. Some submodules are absolutely
required, some are optional and only for very specific purposes.

This patch series adds meaning to a "groups" field in the .gitmodules file.

So you could have a .gitmodules file such as:

[submodule "gcc"]
        path = gcc
        url = git://...
        groups = default,devel
[submodule "linux"]
        path = linux
        url = git://...
        groups = default
[submodule "nethack"]
        path = nethack
        url = git://...
        groups = optional,games

With this series you can work on an arbitrary subgroup of these submodules
using commands such as:

    git clone --group default --group devel git://...
    # will clone the superproject and recursively
    # checkout any submodule being in at least one of the groups.

    git submodule add --group default --group devel git://... ..
    # will add a submodule, recording both
    # groups in its entry in .gitmodules
    
    # as support for clone we want to have:
    git config submodule.groups default
    git submodule init --groups
    # will init all submodules from the default group
    
    # as support for clone we want to have:
    git config submodule.groups default
    git submodule update --groups
    # will update all submodules from the default group

Any feedback is welcome, especially at the design level!
(Do we want to store the groups in the .gitmodules file? Do we want to
have the groups configured in .git/config as "submodule.groups"? Is there
any other way to make this future-proof and extend the groups syntax?)

Thanks,
Stefan

Stefan Beller (5):
  submodule-config: keep submodule groups around
  git submodule add can add a submodule with groups
  git submodule init to pass on groups
  submodule--helper: module_list and update-clone have --groups option
  builtin/clone: support submodule groups

 Documentation/git-clone.txt     |  11 ++++
 Documentation/git-submodule.txt |   8 ++-
 builtin/clone.c                 |  33 ++++++++++-
 builtin/submodule--helper.c     |  68 ++++++++++++++++++++++-
 git-submodule.sh                |  20 ++++++-
 submodule-config.c              |  14 +++++
 submodule-config.h              |   2 +
 t/t7400-submodule-basic.sh      | 118 ++++++++++++++++++++++++++++++++++++++++
 t/t7406-submodule-update.sh     |  32 +++++++++++
 9 files changed, 299 insertions(+), 7 deletions(-)

-- 
2.6.1.261.g0d9c4c1

* [PATCH 1/5] submodule-config: keep submodule groups around
  2015-11-25  1:32  9% [RFC PATCH 0/5] Submodule Groups Stefan Beller
@ 2015-11-25  1:32 27% ` Stefan Beller
  2015-11-25  1:32 29% ` [PATCH 2/5] git submodule add can add a submodule with groups Stefan Beller
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-25  1:32 UTC (permalink / raw)
  To: git
  Cc: peff, gitster, jrnieder, johannes.schindelin, Jens.Lehmann,
	ericsunshine, j6t, hvoigt, Stefan Beller

We need to query the groups in a later patch.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 submodule-config.c | 14 ++++++++++++++
 submodule-config.h |  2 ++
 2 files changed, 16 insertions(+)

diff --git a/submodule-config.c b/submodule-config.c
index a32259e..f44ce20 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -60,6 +60,7 @@ static void free_one_config(struct submodule_entry *entry)
 {
 	free((void *) entry->config->path);
 	free((void *) entry->config->name);
+	free((void *) entry->config->groups);
 	free(entry->config);
 }
 
@@ -182,6 +183,8 @@ static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
 	submodule->path = NULL;
 	submodule->url = NULL;
 	submodule->update = NULL;
+	submodule->groups = xmalloc(sizeof(*submodule->groups));
+	string_list_init(submodule->groups, 1);
 	submodule->fetch_recurse = RECURSE_SUBMODULES_NONE;
 	submodule->ignore = NULL;
 
@@ -324,6 +327,17 @@ static int parse_specific_submodule_config(const char *subsection, int subsectio
 			free((void *) submodule->update);
 			submodule->update = xstrdup(value);
 		}
+	} else if (!strcmp(key, "groups")) {
+		if (!value)
+			ret = config_error_nonbool(var);
+		else if (!me->overwrite && submodule->groups)
+			warn_multiple_config(me->commit_sha1, submodule->name,
+					     "groups");
+		else {
+			string_list_clear(submodule->groups, 0);
+			string_list_split(submodule->groups, value, ',', -1);
+			string_list_sort(submodule->groups);
+		}
 	}
 
 	return ret;
diff --git a/submodule-config.h b/submodule-config.h
index d9bbf9a..7fc21e1 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -3,6 +3,7 @@
 
 #include "hashmap.h"
 #include "strbuf.h"
+#include "string-list.h"
 
 /*
  * Submodule entry containing the information about a certain submodule
@@ -17,6 +18,7 @@ struct submodule {
 	const char *update;
 	/* the sha1 blob id of the responsible .gitmodules file */
 	unsigned char gitmodules_sha1[20];
+	struct string_list *groups;
 };
 
 int parse_fetch_recurse_submodules_arg(const char *opt, const char *arg);
-- 
2.6.1.261.g0d9c4c1

* [PATCH 2/5] git submodule add can add a submodule with groups
  2015-11-25  1:32  9% [RFC PATCH 0/5] Submodule Groups Stefan Beller
  2015-11-25  1:32 27% ` [PATCH 1/5] submodule-config: keep submodule groups around Stefan Beller
@ 2015-11-25  1:32 29% ` Stefan Beller
  2015-11-25  1:32 27% ` [PATCH 3/5] git submodule init to pass on groups Stefan Beller
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-25  1:32 UTC (permalink / raw)
  To: git
  Cc: peff, gitster, jrnieder, johannes.schindelin, Jens.Lehmann,
	ericsunshine, j6t, hvoigt, Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/git-submodule.txt |  8 +++++++-
 git-submodule.sh                |  9 +++++++++
 t/t7400-submodule-basic.sh      | 28 ++++++++++++++++++++++++++++
 3 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index a87ff72..b434d8d 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -9,7 +9,7 @@ git-submodule - Initialize, update or inspect submodules
 SYNOPSIS
 --------
 [verse]
-'git submodule' [--quiet] add [-b <branch>] [-f|--force] [--name <name>]
+'git submodule' [--quiet] add [-b <branch>] [-f|--force] [-g <group>][--name <name>]
 	      [--reference <repository>] [--depth <depth>] [--] <repository> [<path>]
 'git submodule' [--quiet] status [--cached] [--recursive] [--] [<path>...]
 'git submodule' [--quiet] init [--] [<path>...]
@@ -59,6 +59,9 @@ instead of treating the other project as a submodule. Directories
 that come from both projects can be cloned and checked out as a whole
 if you choose to go that route.
 
+If you manage a large set of submodules, but do not require all of them
+to be checked out, you should look into the submodule groups feature.
+
 COMMANDS
 --------
 add::
@@ -101,6 +104,9 @@ is the superproject and submodule repositories will be kept
 together in the same relative location, and only the
 superproject's URL needs to be provided: git-submodule will correctly
 locate the submodule using the relative URL in .gitmodules.
++
+If at least one group argument was given, all groups are recorded in the
+.gitmodules file in the groups field.
 
 status::
 	Show the status of the submodules. This will print the SHA-1 of the
diff --git a/git-submodule.sh b/git-submodule.sh
index 10c5af9..bbdcf78 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -203,6 +203,7 @@ cmd_add()
 {
 	# parse $args after "submodule ... add".
 	reference_path=
+	submodule_groups=
 	while test $# -ne 0
 	do
 		case "$1" in
@@ -238,6 +239,10 @@ cmd_add()
 		--depth=*)
 			depth=$1
 			;;
+		-g|--group)
+			submodule_groups=${submodule_groups:+${submodule_groups},}"$2"
+			shift
+			;;
 		--)
 			shift
 			break
@@ -365,6 +370,10 @@ Use -f if you really want to add it." >&2
 
 	git config -f .gitmodules submodule."$sm_name".path "$sm_path" &&
 	git config -f .gitmodules submodule."$sm_name".url "$repo" &&
+	if test -n "$submodule_groups"
+	then
+		git config -f .gitmodules submodule."$sm_name".groups "${submodule_groups}"
+	fi &&
 	if test -n "$branch"
 	then
 		git config -f .gitmodules submodule."$sm_name".branch "$branch"
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index 5991e3c..a422df3 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -986,6 +986,7 @@ test_expect_success 'submodule with UTF-8 name' '
 '
 
 test_expect_success 'submodule add clone shallow submodule' '
+	test_when_finished "rm -rf super" &&
 	mkdir super &&
 	pwd=$(pwd) &&
 	(
@@ -999,5 +1000,32 @@ test_expect_success 'submodule add clone shallow submodule' '
 	)
 '
 
+test_expect_success 'submodule add records a group' '
+	test_when_finished "rm -rf super" &&
+	mkdir super &&
+	pwd=$(pwd) &&
+	(
+		cd super &&
+		git init &&
+		git submodule add --group groupA file://"$pwd"/example2 submodule &&
+		git config -f .gitmodules submodule."submodule".groups >actual &&
+		echo groupA >expected &&
+		test_cmp expected actual
+	)
+'
+
+test_expect_success 'submodule add records groups' '
+	test_when_finished "rm -rf super" &&
+	mkdir super &&
+	pwd=$(pwd) &&
+	(
+		cd super &&
+		git init &&
+		git submodule add --group groupA -g groupB file://"$pwd"/example2 submodule &&
+		git config -f .gitmodules submodule."submodule".groups >actual &&
+		echo groupA,groupB >expected &&
+		test_cmp expected actual
+	)
+'
 
 test_done
-- 
2.6.1.261.g0d9c4c1


* [PATCH 5/5] builtin/clone: support submodule groups
  2015-11-25  1:32  9% [RFC PATCH 0/5] Submodule Groups Stefan Beller
                   ` (3 preceding siblings ...)
  2015-11-25  1:32 13% ` [PATCH 4/5] submodule--helper: module_list and update-clone have --groups option Stefan Beller
@ 2015-11-25  1:32 22% ` Stefan Beller
  2015-11-25 17:52  8%   ` Jens Lehmann
  2015-11-25 17:35  7% ` [RFC PATCH 0/5] Submodule Groups Jens Lehmann
  2015-11-25 17:50  7% ` Jens Lehmann
  6 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-25  1:32 UTC
  To: git
  Cc: peff, gitster, jrnieder, johannes.schindelin, Jens.Lehmann,
	ericsunshine, j6t, hvoigt, Stefan Beller

This passes each group to the `submodule update` invocation and
additionally configures the groups to be automatically updated.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/git-clone.txt | 11 ++++++++
 builtin/clone.c             | 33 ++++++++++++++++++++--
 git-submodule.sh            |  5 ++++
 t/t7400-submodule-basic.sh  | 69 +++++++++++++++++++++++++++++++++++++++++++++
 t/t7406-submodule-update.sh | 32 +++++++++++++++++++++
 5 files changed, 147 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index 59d8c67..fbf68ab 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -209,6 +209,17 @@ objects from the source repository into a pack in the cloned repository.
 	repository does not have a worktree/checkout (i.e. if any of
 	`--no-checkout`/`-n`, `--bare`, or `--mirror` is given)
 
+--group::
+	After the clone is created, all submodules which are part of the
+	group are cloned. This option can be given multiple times to specify
+	different groups. This option will imply automatic submodule
+	updates for the groups by setting `submodule.update=groups`.
+	The group selection will be passed on recursively, i.e. if a submodule
+	is cloned because of group membership, its submodules will
+	be cloned according to group membership, too. If a submodule is
+	not cloned however, its submodules are not evaluated for group
+	membership.
+
 --separate-git-dir=<git dir>::
 	Instead of placing the cloned repository where it is supposed
 	to be, place the cloned repository at the specified directory,
diff --git a/builtin/clone.c b/builtin/clone.c
index ce578d2..17e9f54 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -51,6 +51,7 @@ static struct string_list option_config;
 static struct string_list option_reference;
 static int option_dissociate;
 static int max_jobs = -1;
+static struct string_list submodule_groups;
 
 static struct option builtin_clone_options[] = {
 	OPT__VERBOSITY(&option_verbosity),
@@ -95,6 +96,8 @@ static struct option builtin_clone_options[] = {
 		   N_("separate git dir from working tree")),
 	OPT_STRING_LIST('c', "config", &option_config, N_("key=value"),
 			N_("set config inside the new repository")),
+	OPT_STRING_LIST('g', "group", &submodule_groups, N_("group"),
+			N_("clone specific submodule groups")),
 	OPT_END()
 };
 
@@ -723,9 +726,18 @@ static int checkout(void)
 	err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
 			   sha1_to_hex(sha1), "1", NULL);
 
-	if (!err && option_recursive) {
+	if (err)
+		goto out;
+
+	if (option_recursive || submodule_groups.nr > 0) {
 		struct argv_array args = ARGV_ARRAY_INIT;
-		argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
+		argv_array_pushl(&args, "submodule", "update", "--init", NULL);
+
+		if (option_recursive)
+			argv_array_pushf(&args, "--recursive");
+
+		if (submodule_groups.nr > 0)
+			argv_array_pushf(&args, "--groups");
 
 		if (max_jobs != -1)
 			argv_array_pushf(&args, "--jobs=%d", max_jobs);
@@ -733,7 +745,7 @@ static int checkout(void)
 		err = run_command_v_opt(args.argv, RUN_GIT_CMD);
 		argv_array_clear(&args);
 	}
-
+out:
 	return err;
 }
 
@@ -864,6 +876,21 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		option_no_checkout = 1;
 	}
 
+	if (option_recursive && submodule_groups.nr > 0)
+		die(_("submodule groups and recursive flag are incompatible"));
+	if (submodule_groups.nr > 0) {
+		int first_item = 1;
+		struct string_list_item *item;
+		struct strbuf sb = STRBUF_INIT;
+		strbuf_addstr(&sb, "submodule.groups=");
+		for_each_string_list_item(item, &submodule_groups) {
+			strbuf_addf(&sb, "%s%s", first_item ? "" : ",", item->string);
+			first_item = 0;
+		}
+		if (submodule_groups.nr > 0)
+			string_list_append(&option_config, strbuf_detach(&sb, 0));
+	}
+
 	if (!option_origin)
 		option_origin = "origin";
 
diff --git a/git-submodule.sh b/git-submodule.sh
index 4092a48..e3d1667 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -611,6 +611,7 @@ cmd_deinit()
 #
 cmd_update()
 {
+	groups=
 	# parse $args after "submodule ... update".
 	while test $# -ne 0
 	do
@@ -650,6 +651,9 @@ cmd_update()
 		--checkout)
 			update="checkout"
 			;;
+		--groups)
+			groups=1
+			;;
 		--depth)
 			case "$2" in '') usage ;; esac
 			depth="--depth=$2"
@@ -691,6 +695,7 @@ cmd_update()
 		${update:+--update "$update"} \
 		${reference:+--reference "$reference"} \
 		${depth:+--depth "$depth"} \
+		${groups:+--groups} \
 		${jobs:+$jobs} \
 		"$@" | {
 	err=
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index caed4be..e8654d7 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -1049,4 +1049,73 @@ test_expect_success 'submodule init --group works' '
 	)
 '
 
+cat <<EOF > expected
+submodule
+-submodule1
+EOF
+
+test_expect_success 'submodule update --groups works' '
+	test_when_finished "rm -rf super super_clone" &&
+	mkdir super &&
+	pwd=$(pwd) &&
+	(
+		cd super &&
+		git init &&
+		git submodule add --group groupA file://"$pwd"/example2 submodule &&
+		git submodule add file://"$pwd"/example2 submodule1 &&
+		git commit -a -m "create repository with 2 submodules, one is in a group"
+	) &&
+	git clone super super_clone &&
+	(
+		cd super_clone &&
+		git config submodule.groups groupA &&
+		git submodule init  &&
+		git submodule update --groups &&
+		git submodule status |cut -c1,42-52 | tr -d " " >../actual
+	) &&
+	test_cmp actual expected
+'
+
+test_expect_success 'submodule update --init --groups works' '
+	test_when_finished "rm -rf super super_clone" &&
+	mkdir super &&
+	pwd=$(pwd) &&
+	(
+		cd super &&
+		git init &&
+		git submodule add --group groupA file://"$pwd"/example2 submodule &&
+		git submodule add file://"$pwd"/example2 submodule1 &&
+		git commit -a -m "create repository with 2 submodules, one is in a group"
+	) &&
+	git clone super super_clone &&
+	(
+		cd super_clone &&
+		git config submodule.groups groupA &&
+		git submodule update --init --groups &&
+		git submodule status |cut -c1,42-52 | tr -d " " >../actual
+	) &&
+	test_cmp actual expected
+'
+
+test_expect_success 'clone --group works' '
+	test_when_finished "rm -rf super super_clone" &&
+	mkdir super &&
+	pwd=$(pwd) &&
+	(
+		cd super &&
+		git init &&
+		git submodule add --group groupA file://"$pwd"/example2 submodule &&
+		git submodule add file://"$pwd"/example2 submodule1 &&
+		git commit -a -m "create repository with 2 submodules, one is in a group"
+	) &&
+	git clone --group groupA super super_clone &&
+	(
+		cd super_clone &&
+		test_pause
+		git submodule status |cut -c1,42-52 | tr -d " " >../actual
+	) &&
+	test_cmp actual expected
+'
+
+
 test_done
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 090891e..7e59846 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -801,4 +801,36 @@ test_expect_success 'git clone passes the parallel jobs config on to submodules'
 	rm -rf super4
 '
 
+cat >expect <<-EOF &&
+-deeper/submodule
+-merging
+-moved/sub module
+-none
+-rebasing
+-submodule
+-submodule1
+EOF
+
+# none, merging rebasing, submodule1, submodule
+test_expect_success 'git clone works with submodule groups.' '
+	test_when_finished "rm -rf super5" &&
+	(
+		cd super &&
+		git config -f .gitmodules  submodule.submodule.groups default &&
+		git config -f .gitmodules  submodule.submodule1.groups "default,testing" &&
+		git config -f .gitmodules  submodule.none.groups testing &&
+		git commit -a -m "assigning groups to submodules"
+	) &&
+	git clone --group default --group testing super super5 &&
+	(
+		cd super5 &&
+		git submodule status |cut -c1,43- >../actual
+	) &&
+	test_cmp actual expect
+'
+
+test_expect_success 'git submodule update --groups' '
+	true
+'
+
 test_done
-- 
2.6.1.261.g0d9c4c1
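The tests above compare `git submodule status` output after trimming it with `cut -c1,42-52 | tr -d " "`. That works because each status line starts with a one-character state marker, followed by a 40-character SHA-1, a space and the path (plus an optional " (<describe>)" suffix for initialized submodules). A standalone C sketch of a similar reduction follows; `summarize_status_line` is a hypothetical helper, not part of git, and the sketch assumes SHA-1 object names and paths that never contain " (". Unlike the `cut`/`tr` pipeline, it keeps spaces inside paths such as "moved/sub module" and does not truncate at column 52.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not part of git: reduce one line of
 * "git submodule status" output to "<state><path>".
 *
 * Assumed layout (SHA-1 object names):
 *   column 1      state marker: '-', '+', 'U' or ' '
 *   columns 2-41  40-hex commit id
 *   column 42     a single space
 *   columns 43-   path, optionally followed by " (<describe>)"
 *
 * Returns 0 on success, or -1 if the line does not match the layout
 * or the result does not fit into 'out'.
 */
static int summarize_status_line(const char *line, char *out, size_t outsz)
{
	const size_t path_start = 42; /* 0-based index of column 43 */
	const char *path, *end;
	size_t pathlen, n = 0;

	if (strlen(line) <= path_start || line[path_start - 1] != ' ')
		return -1;

	path = line + path_start;
	/* Stop at an optional " (describe)" suffix, else at end of line. */
	end = strstr(path, " (");
	if (!end)
		end = path + strcspn(path, "\n");
	pathlen = (size_t)(end - path);

	/* Keep the state marker unless it is a blank, like `tr -d " "`. */
	if (line[0] != ' ')
		n = 1;
	if (n + pathlen + 1 > outsz)
		return -1;
	if (n)
		out[0] = line[0];
	memcpy(out + n, path, pathlen);
	out[n + pathlen] = '\0';
	return 0;
}
```

For an uninitialized submodule the result keeps the leading "-" (e.g. "-submodule1"), while an up-to-date one yields just its path, matching the shape of the `expected` files used in the tests.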


* [PATCH 3/5] git submodule init to pass on groups
  2015-11-25  1:32  9% [RFC PATCH 0/5] Submodule Groups Stefan Beller
  2015-11-25  1:32 27% ` [PATCH 1/5] submodule-config: keep submodule groups around Stefan Beller
  2015-11-25  1:32 29% ` [PATCH 2/5] git submodule add can add a submodule with groups Stefan Beller
@ 2015-11-25  1:32 27% ` Stefan Beller
  2015-11-25  1:32 13% ` [PATCH 4/5] submodule--helper: module_list and update-clone have --groups option Stefan Beller
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-25  1:32 UTC
  To: git
  Cc: peff, gitster, jrnieder, johannes.schindelin, Jens.Lehmann,
	ericsunshine, j6t, hvoigt, Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 git-submodule.sh           |  6 +++++-
 t/t7400-submodule-basic.sh | 21 +++++++++++++++++++++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index bbdcf78..4092a48 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -455,6 +455,7 @@ cmd_foreach()
 #
 cmd_init()
 {
+	submodule_groups=
 	# parse $args after "submodule ... init".
 	while test $# -ne 0
 	do
@@ -462,6 +463,9 @@ cmd_init()
 		-q|--quiet)
 			GIT_QUIET=1
 			;;
+		-g|--groups)
+			submodule_groups=1
+			;;
 		--)
 			shift
 			break
@@ -476,7 +480,7 @@ cmd_init()
 		shift
 	done
 
-	git submodule--helper list --prefix "$wt_prefix" "$@" |
+	git submodule--helper list ${submodule_groups:+--groups} --prefix "$wt_prefix" "$@" |
 	while read mode sha1 stage sm_path
 	do
 		die_if_unmatched "$mode"
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index a422df3..caed4be 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -1028,4 +1028,25 @@ test_expect_success 'submodule add records groups' '
 	)
 '
 
+test_expect_success 'submodule init --group works' '
+	test_when_finished "rm -rf super super_clone" &&
+	mkdir super &&
+	pwd=$(pwd) &&
+	(
+		cd super &&
+		git init &&
+		git submodule add --group groupA file://"$pwd"/example2 submodule &&
+		git submodule add file://"$pwd"/example2 submodule1 &&
+		git commit -a -m "create repository with 2 submodules, one is in a group"
+	) &&
+	git clone super super_clone &&
+	(
+		cd super_clone &&
+		git config submodule.groups groupA &&
+		git submodule init --groups &&
+		git config submodule.submodule.url &&
+		test_must_fail git config submodule.submodule1.url
+	)
+'
+
 test_done
-- 
2.6.1.261.g0d9c4c1


* [PATCH 4/5] submodule--helper: module_list and update-clone have --groups option
  2015-11-25  1:32  9% [RFC PATCH 0/5] Submodule Groups Stefan Beller
                   ` (2 preceding siblings ...)
  2015-11-25  1:32 27% ` [PATCH 3/5] git submodule init to pass on groups Stefan Beller
@ 2015-11-25  1:32 13% ` Stefan Beller
  2015-11-25  1:32 22% ` [PATCH 5/5] builtin/clone: support submodule groups Stefan Beller
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-25  1:32 UTC
  To: git
  Cc: peff, gitster, jrnieder, johannes.schindelin, Jens.Lehmann,
	ericsunshine, j6t, hvoigt, Stefan Beller

This will be useful in a later patch.
When the --groups option is passed, only submodules in the configured
groups are considered instead of all submodules.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/submodule--helper.c | 68 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 66 insertions(+), 2 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 254824a..6a208ac 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -67,16 +67,33 @@ static int module_list_compute(int argc, const char **argv,
 	return result;
 }
 
+static int load_submodule_groups(struct string_list **groups)
+{
+	const char *g = NULL;
+	if (git_config_get_string_const("submodule.groups", &g) < 0)
+		return -1;
+	if (!g)
+		return 1;
+	*groups = xmalloc(sizeof(**groups));
+	string_list_init(*groups, 1);
+	string_list_split(*groups, g, ',', -1);
+	string_list_sort(*groups);
+	return 0;
+}
+
 static int module_list(int argc, const char **argv, const char *prefix)
 {
-	int i;
+	int i, groups = 0;
 	struct pathspec pathspec;
 	struct module_list list = MODULE_LIST_INIT;
+	struct string_list *submodule_groups;
 
 	struct option module_list_options[] = {
 		OPT_STRING(0, "prefix", &prefix,
 			   N_("path"),
 			   N_("alternative anchor for relative paths")),
+		OPT_BOOL(0, "groups", &groups,
+			 N_("Only initialize configured submodule groups")),
 		OPT_END()
 	};
 
@@ -93,9 +110,33 @@ static int module_list(int argc, const char **argv, const char *prefix)
 		return 1;
 	}
 
+	if (groups) {
+		gitmodules_config();
+		if (load_submodule_groups(&submodule_groups))
+			die(_("No groups configured?"));
+	}
 	for (i = 0; i < list.nr; i++) {
 		const struct cache_entry *ce = list.entries[i];
 
+		if (groups) {
+			int found = 0;
+			struct string_list_item *item;
+			const struct submodule *sub = submodule_from_path(null_sha1, ce->name);
+			if (!sub)
+				die("BUG: Could not find submodule %s in cache, "
+				    "despite having found it earlier", ce->name);
+			else {
+				for_each_string_list_item(item, sub->groups) {
+					if (string_list_lookup(submodule_groups, item->string)) {
+						found = 1;
+						break;
+					}
+				}
+			if (!found)
+				continue;
+			}
+		}
+
 		if (ce_stage(ce))
 			printf("%06o %s U\t", ce->ce_mode, sha1_to_hex(null_sha1));
 		else
@@ -262,6 +303,7 @@ static int git_submodule_config(const char *var, const char *value, void *cb)
 
 struct submodule_update_clone {
 	/* states */
+	struct string_list *submodule_groups;
 	int count;
 	int print_unmatched;
 	/* configuration */
@@ -275,7 +317,7 @@ struct submodule_update_clone {
 	struct string_list projectlines;
 	struct pathspec pathspec;
 };
-#define SUBMODULE_UPDATE_CLONE_INIT {0, 0, 0, NULL, NULL, NULL, NULL, NULL, MODULE_LIST_INIT, STRING_LIST_INIT_DUP}
+#define SUBMODULE_UPDATE_CLONE_INIT {NULL, 0, 0, 0, NULL, NULL, NULL, NULL, NULL, MODULE_LIST_INIT, STRING_LIST_INIT_DUP}
 
 static void fill_clone_command(struct child_process *cp, int quiet,
 			       const char *prefix, const char *path,
@@ -318,6 +360,7 @@ static int update_clone_get_next_task(void **pp_task_cb,
 		const char *update_module = NULL;
 		char *url = NULL;
 		int needs_cloning = 0;
+		int in_submodule_groups = 0;
 
 		if (ce_stage(ce)) {
 			if (pp->recursive_prefix)
@@ -372,6 +415,20 @@ static int update_clone_get_next_task(void **pp_task_cb,
 			continue;
 		}
 
+		if (pp->submodule_groups) {
+			struct string_list_item *item;
+			for_each_string_list_item(item, sub->groups) {
+				if (string_list_lookup(
+				    pp->submodule_groups, item->string)) {
+					in_submodule_groups = 1;
+					break;
+				}
+			}
+		}
+
+		if (pp->submodule_groups && !in_submodule_groups)
+			continue;
+
 		strbuf_reset(&sb);
 		strbuf_addf(&sb, "%s/.git", ce->name);
 		needs_cloning = !file_exists(sb.buf);
@@ -427,6 +484,7 @@ static int update_clone_task_finished(int result,
 static int update_clone(int argc, const char **argv, const char *prefix)
 {
 	int max_jobs = -1;
+	int submodule_groups = 0;
 	struct string_list_item *item;
 	struct submodule_update_clone pp = SUBMODULE_UPDATE_CLONE_INIT;
 
@@ -449,6 +507,8 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 			      "specified number of revisions")),
 		OPT_INTEGER('j', "jobs", &max_jobs,
 			    N_("parallel jobs")),
+		OPT_BOOL(0, "groups", &submodule_groups,
+			 N_("operate only on configured groups")),
 		OPT__QUIET(&pp.quiet, N_("do't print cloning progress")),
 		OPT_END()
 	};
@@ -467,6 +527,9 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 		return 1;
 	}
 
+	if (submodule_groups)
+		load_submodule_groups(&pp.submodule_groups);
+
 	gitmodules_config();
 	/* Overlay the parsed .gitmodules file with .git/config */
 	git_config(git_submodule_config, NULL);
@@ -490,6 +553,7 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 	for_each_string_list_item(item, &pp.projectlines)
 		utf8_fprintf(stdout, "%s", item->string);
 
+	free(pp.submodule_groups);
 	return 0;
 }
 
-- 
2.6.1.261.g0d9c4c1
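load_submodule_groups() splits the configured value on commas and sorts it, because string_list_lookup() performs a binary search that requires a sorted list. The loops in module_list() and update_clone_get_next_task() then select a submodule if any one of its groups is configured. The standalone C sketch below mirrors that logic with qsort()/bsearch() from the C library instead of git's string_list API; `struct group_list`, `parse_groups` and `submodule_selected` are hypothetical names, and the fixed capacity is an assumption of the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_GROUPS 32 /* arbitrary limit for this sketch */

/* Hypothetical type: a sorted list of group names pointing into a buffer. */
struct group_list {
	const char *items[MAX_GROUPS];
	size_t nr;
};

static int cmp_str(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Split a comma-separated value such as "default,devel" into 'list' and
 * sort it. 'buf' is modified in place and must outlive 'list'. Unlike
 * git's string_list_split(), empty items (as in "a,,b") are skipped.
 * Returns -1 if there are more than MAX_GROUPS items.
 */
static int parse_groups(char *buf, struct group_list *list)
{
	char *p = buf;

	list->nr = 0;
	for (;;) {
		char *comma = strchr(p, ',');

		if (comma)
			*comma = '\0';
		if (*p) {
			if (list->nr == MAX_GROUPS)
				return -1;
			list->items[list->nr++] = p;
		}
		if (!comma)
			break;
		p = comma + 1;
	}
	qsort(list->items, list->nr, sizeof(list->items[0]), cmp_str);
	return 0;
}

/* bsearch() needs the sorted order, just as string_list_lookup() does. */
static int has_group(const struct group_list *list, const char *name)
{
	return bsearch(&name, list->items, list->nr,
		       sizeof(list->items[0]), cmp_str) != NULL;
}

/* Selected if at least one of the submodule's groups is configured. */
static int submodule_selected(const struct group_list *configured,
			      const struct group_list *sub_groups)
{
	size_t i;

	for (i = 0; i < sub_groups->nr; i++)
		if (has_group(configured, sub_groups->items[i]))
			return 1;
	return 0;
}
```

With submodule.groups set to "devel,default", a submodule in "optional,games" is skipped and one in "default" is selected.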


* Re: [RFC PATCH 0/5] Submodule Groups
  2015-11-25  1:32  9% [RFC PATCH 0/5] Submodule Groups Stefan Beller
                   ` (5 preceding siblings ...)
  2015-11-25 17:35  7% ` [RFC PATCH 0/5] Submodule Groups Jens Lehmann
@ 2015-11-25 17:50  7% ` Jens Lehmann
  6 siblings, 0 replies; 200+ results
From: Jens Lehmann @ 2015-11-25 17:50 UTC
  To: Stefan Beller, git
  Cc: peff, gitster, jrnieder, johannes.schindelin, ericsunshine, j6t,
	hvoigt

Am 25.11.2015 um 02:32 schrieb Stefan Beller:
> This is also available at https://github.com/stefanbeller/git/tree/submodule-groups
> It applies on top of the submodule-parallel-patch series I sent a few minutes ago.
>
> Consider having a really large software project in Git with each component
> in a submodule (such as an operating system, Android, Debian, Fedora,
> no toy OS such as https://github.com/gittup/gittup as that doesn't quite
> demonstrate the scale of the problem).
>
> If you have lots of submodules, you probably don't need all of them at once,
> but you have functional units. Some submodules are absolutely required,
> some are optional and only for very specific purposes.
>
> This patch series adds meaning to a "groups" field in the .gitmodules file.
>
> So you could have a .gitmodules file such as:
>
> [submodule "gcc"]
>          path = gcc
>          url = git://...
>          groups = default,devel
> [submodule "linux"]
>          path = linux
>          url = git://...
>          groups = default
> [submodule "nethack"]
>          path = nethack
>          url = git://...
>          groups = optional,games

Yup. Do you want the user to select only a single group or do you
plan to support selecting multiple groups at the same time too?

> and with this series you can work on an arbitrary subset of these submodules
> using these commands:
>
>      git clone --group default --group devel git://...
>      # will clone the superproject and recursively
>      # checkout any submodule being in at least one of the groups.

Does this automatically configure the given group in .git/config, so
that all future submodule related commands know about this choice?
Methinks that would make sense ...

>      git submodule add --group default --group devel git://... ..
>      # will add a submodule, adding 2 submodule
>      # groups to its entry in .gitmodules

Maybe '--groups default,devel' is easier to grok? Dunno.

>      # as support for clone we want to have:
>      git config submodule.groups default
>      git submodule init --groups

Hmm, I doubt it makes much sense to add the --group option to "git
submodule init". I'd rather init all submodules and do the group
handling only in the "git submodule update" command. That way
upstream can change grouping later without the user having to
fiddle with her configuration to make that work.

>      # will init all submodules from the default group
>
>      # as support for clone we want to have:
>      git config submodule.groups default
>      git submodule update --groups
>
>      # will update all submodules from the default group
>
> Any feedback is welcome, especially on the design level!
> (Do we want to have it stored in the .gitmodules file? Do we want to have
> the groups configured in .git/config as "submodule.groups", any other way
> to make it future proof and extend the groups syntax?)

Not sure what exactly you mean by "it" here ;-)

Talking about what groups a submodule belongs to, an entry in the
.gitmodules file makes the most sense to me. That way upstream can
change submodule grouping or add new submodules with group assignments
from commit to commit, and "git submodule update" will do the right
thing for the superproject commit checked out.

And I believe that the choice of which group(s) the user is interested in
should be recorded in .git/config, as that is his personal setting
that shouldn't be influenced by upstream changes.


* Re: [PATCH 5/5] builtin/clone: support submodule groups
  2015-11-25  1:32 22% ` [PATCH 5/5] builtin/clone: support submodule groups Stefan Beller
@ 2015-11-25 17:52  8%   ` Jens Lehmann
  2015-11-25 18:08  7%     ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Jens Lehmann @ 2015-11-25 17:52 UTC
  To: Stefan Beller, git
  Cc: peff, gitster, jrnieder, johannes.schindelin, ericsunshine, j6t,
	hvoigt

Am 25.11.2015 um 02:32 schrieb Stefan Beller:
> This passes each group to the `submodule update` invocation and
> additionally configures the groups to be automatically updated.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>   Documentation/git-clone.txt | 11 ++++++++
>   builtin/clone.c             | 33 ++++++++++++++++++++--
>   git-submodule.sh            |  5 ++++
>   t/t7400-submodule-basic.sh  | 69 +++++++++++++++++++++++++++++++++++++++++++++
>   t/t7406-submodule-update.sh | 32 +++++++++++++++++++++
>   5 files changed, 147 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
> index 59d8c67..fbf68ab 100644
> --- a/Documentation/git-clone.txt
> +++ b/Documentation/git-clone.txt
> @@ -209,6 +209,17 @@ objects from the source repository into a pack in the cloned repository.
>   	repository does not have a worktree/checkout (i.e. if any of
>   	`--no-checkout`/`-n`, `--bare`, or `--mirror` is given)
>
> +--group::
> +	After the clone is created, all submodules which are part of the
> +	group are cloned. This option can be given multiple times to specify
> +	different groups.

Ah, that answers my question in my response to the cover letter ;-)

> This option will imply automatic submodule
> +	updates for the groups by setting `submodule.update=groups`.

Please don't. The per-submodule update setting configures how a
submodule has to be updated; adding a global one with a completely
different meaning (which submodules should be updated?) is confusing.
Why not "submodule.groups=<groups>"?
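For reference, the cmd_clone() hunk quoted below already writes a value of that shape: it joins all --group arguments with commas into one `submodule.groups=...` entry of option_config, so only the documentation disagrees with the code. A minimal standalone C sketch of that joining step, using snprintf() instead of git's strbuf (`build_groups_config` is a hypothetical name):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "submodule.groups=<g1>,<g2>,..." from the
 * values of repeated --group options, the way the cmd_clone() hunk does
 * with a strbuf before appending the result to option_config.
 * Returns the length written, or -1 if 'out' is too small.
 */
static int build_groups_config(const char *const *groups, size_t nr,
			       char *out, size_t outsz)
{
	size_t i;
	int len = snprintf(out, outsz, "submodule.groups=");

	if (len < 0 || (size_t)len >= outsz)
		return -1;
	for (i = 0; i < nr; i++) {
		/* A comma goes before every value except the first one. */
		int n = snprintf(out + len, outsz - (size_t)len, "%s%s",
				 i ? "," : "", groups[i]);

		if (n < 0 || (size_t)n >= outsz - (size_t)len)
			return -1;
		len += n;
	}
	return len;
}
```

Because the values are only concatenated, `--group default --group devel` and `--group default,devel` would produce the same config value under this scheme; whether the second spelling should be officially supported is the question Jens raised in his reply to the cover letter.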

> +	The group selection will be passed on recursively, i.e. if a submodule
> +	is cloned because of group membership, its submodules will
> +	be cloned according to group membership, too. If a submodule is
> +	not cloned however, its submodules are not evaluated for group
> +	membership.

What do you mean by the last sentence? Did the clone fail? Then you
cannot update the submodule anyway ...
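Read literally, the quoted rule (nested submodules are evaluated only inside submodules that were themselves cloned because of group membership) amounts to a pruned tree walk. A hypothetical in-memory model in C; `struct sm_node`, `in_any_group` and `count_cloned` are illustrative names, not git code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model of a superproject's submodule tree. */
struct sm_node {
	const char *path;
	const char *const *groups;      /* NULL-terminated list */
	const struct sm_node *children; /* array of nr_children nodes */
	size_t nr_children;
};

/* True if any of 'groups' appears in the NULL-terminated 'selected'. */
static int in_any_group(const char *const *groups,
			const char *const *selected)
{
	for (; groups && *groups; groups++) {
		const char *const *s;

		for (s = selected; *s; s++)
			if (!strcmp(*groups, *s))
				return 1;
	}
	return 0;
}

/*
 * Count the submodules that would be cloned: a submodule is cloned if it
 * is in a selected group, and only the children of cloned submodules are
 * evaluated in turn.
 */
static size_t count_cloned(const struct sm_node *nodes, size_t nr,
			   const char *const *selected)
{
	size_t i, count = 0;

	for (i = 0; i < nr; i++) {
		if (!in_any_group(nodes[i].groups, selected))
			continue; /* not cloned: its children are never looked at */
		count += 1 + count_cloned(nodes[i].children,
					  nodes[i].nr_children, selected);
	}
	return count;
}
```

For example, if `gcc` (group default) contains `lib/a` (default) and `lib/b` (games), and `nethack` (games) contains `deep` (default), then selecting only "default" counts two submodules, gcc and lib/a; `deep` is never examined because its parent is not cloned.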

>   --separate-git-dir=<git dir>::
>   	Instead of placing the cloned repository where it is supposed
>   	to be, place the cloned repository at the specified directory,
> diff --git a/builtin/clone.c b/builtin/clone.c
> index ce578d2..17e9f54 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -51,6 +51,7 @@ static struct string_list option_config;
>   static struct string_list option_reference;
>   static int option_dissociate;
>   static int max_jobs = -1;
> +static struct string_list submodule_groups;
>
>   static struct option builtin_clone_options[] = {
>   	OPT__VERBOSITY(&option_verbosity),
> @@ -95,6 +96,8 @@ static struct option builtin_clone_options[] = {
>   		   N_("separate git dir from working tree")),
>   	OPT_STRING_LIST('c', "config", &option_config, N_("key=value"),
>   			N_("set config inside the new repository")),
> +	OPT_STRING_LIST('g', "group", &submodule_groups, N_("group"),
> +			N_("clone specific submodule groups")),
>   	OPT_END()
>   };
>
> @@ -723,9 +726,18 @@ static int checkout(void)
>   	err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
>   			   sha1_to_hex(sha1), "1", NULL);
>
> -	if (!err && option_recursive) {
> +	if (err)
> +		goto out;
> +
> +	if (option_recursive || submodule_groups.nr > 0) {
>   		struct argv_array args = ARGV_ARRAY_INIT;
> -		argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
> +		argv_array_pushl(&args, "submodule", "update", "--init", NULL);
> +
> +		if (option_recursive)
> +			argv_array_pushf(&args, "--recursive");
> +
> +		if (submodule_groups.nr > 0)
> +			argv_array_pushf(&args, "--groups");
>
>   		if (max_jobs != -1)
>   			argv_array_pushf(&args, "--jobs=%d", max_jobs);
> @@ -733,7 +745,7 @@ static int checkout(void)
>   		err = run_command_v_opt(args.argv, RUN_GIT_CMD);
>   		argv_array_clear(&args);
>   	}
> -
> +out:
>   	return err;
>   }
>
> @@ -864,6 +876,21 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>   		option_no_checkout = 1;
>   	}
>
> +	if (option_recursive && submodule_groups.nr > 0)
> +		die(_("submodule groups and recursive flag are incompatible"));

Methinks this contradicts your description of the --group option
in the man page. I don't see why such a restriction would make
sense; what incompatibility are you trying to avoid here? Maybe
we need another submodule-specific setting to tell update what
groups to use inside that submodule?
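Looking only at how checkout() assembles the follow-up `git submodule update` call, the two options do not collide: each flag is simply appended to the same argument list. A standalone C sketch of that assembly, with a fixed-size array standing in for git's argv_array (`struct args`, `push` and `build_update_args` are hypothetical names). It does not model how the update then treats groups inside nested submodules, which is the open question here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ARGS 8

/* Hypothetical stand-in for argv_array: a NULL-terminated argument list. */
struct args {
	const char *v[MAX_ARGS + 1];
	size_t nr;
	char jobs[32]; /* storage for the formatted --jobs=<n> argument */
};

static void push(struct args *a, const char *arg)
{
	assert(a->nr < MAX_ARGS);
	a->v[a->nr++] = arg;
	a->v[a->nr] = NULL;
}

/* Mirror the argument building in checkout() from the quoted hunk. */
static void build_update_args(struct args *a, int recursive,
			      size_t nr_groups, int max_jobs)
{
	a->nr = 0;
	a->v[0] = NULL;
	push(a, "submodule");
	push(a, "update");
	push(a, "--init");
	if (recursive)
		push(a, "--recursive");
	if (nr_groups > 0)
		push(a, "--groups"); /* groups are read from submodule.groups */
	if (max_jobs != -1) {
		snprintf(a->jobs, sizeof(a->jobs), "--jobs=%d", max_jobs);
		push(a, a->jobs);
	}
}
```

With both flags given, the resulting call would be `submodule update --init --recursive --groups --jobs=<n>`; any incompatibility would therefore have to come from the semantics of the update itself rather than from building the command line.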

> +	if (submodule_groups.nr > 0) {
> +		int first_item = 1;
> +		struct string_list_item *item;
> +		struct strbuf sb = STRBUF_INIT;
> +		strbuf_addstr(&sb, "submodule.groups=");
> +		for_each_string_list_item(item, &submodule_groups) {
> +			strbuf_addf(&sb, "%s%s", first_item ? "" : ",", item->string);
> +			first_item = 0;
> +		}
> +		if (submodule_groups.nr > 0)
> +			string_list_append(&option_config, strbuf_detach(&sb, 0));
> +	}
> +
>   	if (!option_origin)
>   		option_origin = "origin";
>
> diff --git a/git-submodule.sh b/git-submodule.sh
> index 4092a48..e3d1667 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -611,6 +611,7 @@ cmd_deinit()
>   #
>   cmd_update()
>   {
> +	groups=
>   	# parse $args after "submodule ... update".
>   	while test $# -ne 0
>   	do
> @@ -650,6 +651,9 @@ cmd_update()
>   		--checkout)
>   			update="checkout"
>   			;;
> +		--groups)
> +			groups=1
> +			;;
>   		--depth)
>   			case "$2" in '') usage ;; esac
>   			depth="--depth=$2"
> @@ -691,6 +695,7 @@ cmd_update()
>   		${update:+--update "$update"} \
>   		${reference:+--reference "$reference"} \
>   		${depth:+--depth "$depth"} \
> +		${groups:+--groups} \
>   		${jobs:+$jobs} \
>   		"$@" | {
>   	err=
> diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
> index caed4be..e8654d7 100755
> --- a/t/t7400-submodule-basic.sh
> +++ b/t/t7400-submodule-basic.sh
> @@ -1049,4 +1049,73 @@ test_expect_success 'submodule init --group works' '
>   	)
>   '
>
> +cat <<EOF > expected
> +submodule
> +-submodule1
> +EOF
> +
> +test_expect_success 'submodule update --groups works' '
> +	test_when_finished "rm -rf super super_clone" &&
> +	mkdir super &&
> +	pwd=$(pwd) &&
> +	(
> +		cd super &&
> +		git init &&
> +		git submodule add --group groupA file://"$pwd"/example2 submodule &&
> +		git submodule add file://"$pwd"/example2 submodule1 &&
> +		git commit -a -m "create repository with 2 submodules, one is in a group"
> +	) &&
> +	git clone super super_clone &&
> +	(
> +		cd super_clone &&
> +		git config submodule.groups groupA &&
> +		git submodule init  &&
> +		git submodule update --groups &&
> +		git submodule status |cut -c1,42-52 | tr -d " " >../actual
> +	) &&
> +	test_cmp actual expected
> +'
> +
> +test_expect_success 'submodule update --init --groups works' '
> +	test_when_finished "rm -rf super super_clone" &&
> +	mkdir super &&
> +	pwd=$(pwd) &&
> +	(
> +		cd super &&
> +		git init &&
> +		git submodule add --group groupA file://"$pwd"/example2 submodule &&
> +		git submodule add file://"$pwd"/example2 submodule1 &&
> +		git commit -a -m "create repository with 2 submodules, one is in a group"
> +	) &&
> +	git clone super super_clone &&
> +	(
> +		cd super_clone &&
> +		git config submodule.groups groupA &&
> +		git submodule update --init --groups &&
> +		git submodule status |cut -c1,42-52 | tr -d " " >../actual
> +	) &&
> +	test_cmp actual expected
> +'
> +
> +test_expect_success 'clone --group works' '
> +	test_when_finished "rm -rf super super_clone" &&
> +	mkdir super &&
> +	pwd=$(pwd) &&
> +	(
> +		cd super &&
> +		git init &&
> +		git submodule add --group groupA file://"$pwd"/example2 submodule &&
> +		git submodule add file://"$pwd"/example2 submodule1 &&
> +		git commit -a -m "create repository with 2 submodules, one is in a group"
> +	) &&
> +	git clone --group groupA super super_clone &&
> +	(
> +		cd super_clone &&
> +		test_pause
> +		git submodule status |cut -c1,42-52 | tr -d " " >../actual
> +	) &&
> +	test_cmp actual expected
> +'
> +
> +
>   test_done
> diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
> index 090891e..7e59846 100755
> --- a/t/t7406-submodule-update.sh
> +++ b/t/t7406-submodule-update.sh
> @@ -801,4 +801,36 @@ test_expect_success 'git clone passes the parallel jobs config on to submodules'
>   	rm -rf super4
>   '
>
> +cat >expect <<-EOF &&
> +-deeper/submodule
> +-merging
> +-moved/sub module
> +-none
> +-rebasing
> +-submodule
> +-submodule1
> +EOF
> +
> +# none, merging rebasing, submodule1, submodule
> +test_expect_success 'git clone works with submodule groups.' '
> +	test_when_finished "rm -rf super5" &&
> +	(
> +		cd super &&
> +		git config -f .gitmodules  submodule.submodule.groups default &&
> +		git config -f .gitmodules  submodule.submodule1.groups "default,testing" &&
> +		git config -f .gitmodules  submodule.none.groups testing &&
> +		git commit -a -m "assigning groups to submodules"
> +	) &&
> +	git clone --group default --group testing super super5 &&
> +	(
> +		cd super5 &&
> +		git submodule status |cut -c1,43- >../actual
> +	) &&
> +	test_cmp actual expect
> +'
> +
> +test_expect_success 'git submodule update --groups' '
> +	true
> +'
> +
>   test_done
>


* Re: [RFC PATCH 0/5] Submodule Groups
  2015-11-25  1:32  9% [RFC PATCH 0/5] Submodule Groups Stefan Beller
                   ` (4 preceding siblings ...)
  2015-11-25  1:32 22% ` [PATCH 5/5] builtin/clone: support submodule groups Stefan Beller
@ 2015-11-25 17:35  7% ` Jens Lehmann
  2015-11-25 18:00  7%   ` Stefan Beller
  2015-11-25 17:50  7% ` Jens Lehmann
  6 siblings, 1 reply; 200+ results
From: Jens Lehmann @ 2015-11-25 17:35 UTC
  To: Stefan Beller, git
  Cc: peff, gitster, jrnieder, johannes.schindelin, ericsunshine, j6t,
	hvoigt

Am 25.11.2015 um 02:32 schrieb Stefan Beller:
> This is also available at https://github.com/stefanbeller/git/tree/submodule-groups
> It applies on top of the submodule-parallel-patch series I sent a few minutes ago.
>
> Consider a really large software project in Git with each component
> in a submodule (such as an operating system: Android, Debian, Fedora;
> not a toy OS such as https://github.com/gittup/gittup, as that doesn't quite
> demonstrate the scale of the problem).
>
> If you have lots of submodules, you probably don't need all of them at once,
> but you have functional units. Some submodules are absolutely required,
> some are optional and only for very specific purposes.
>
> This patch series adds meaning to a "groups" field in the .gitmodules file.
>
> So you could have a .gitmodules file such as:
>
> [submodule "gcc"]
>          path = gcc
>          url = git://...
>          groups = default,devel
> [submodule "linux"]
>          path = linux
>          url = git://...
>          groups = default
> [submodule "nethack"]
>          path = nethack
>          url = git://...
>          groups = optional,games

Yup. Do you want the user to select only a single group or do you
plan to support selecting multiple groups at the same time too?

> and with this series you can work on an arbitrary subset of these submodules
> using these commands:
>
>      git clone --group default --group devel git://...
>      # will clone the superproject and recursively
>      # checkout any submodule being in at least one of the groups.

Does this automatically configure the given group in .git/config, so
that all future submodule related commands know about this choice?
Me thinks that would make sense ...

>      git submodule add --group default --group devel git://... ..
>      # will add a submodule, adding 2 submodule
>      # groups to its entry in .gitmodules

Maybe '--groups default,devel' is easier to grok? Dunno.

>      # as support for clone we want to have:
>      git config submodule.groups default
>      git submodule init --groups

Hmm, I doubt it makes much sense to add the --group option to "git
submodule init". I'd rather init all submodules and do the group
handling only in the "git submodule update" command. That way
upstream can change grouping later without the user having to
fiddle with her configuration to make that work.

>      # will init all submodules from the default group
>
>      # as support for clone we want to have:
>      git config submodule.groups default
>      git submodule update --groups
>
>      # will update all submodules from the default group
>
> Any feedback is welcome, especially on the design level!
> (Do we want to have it stored in the .gitmodules file? Do we want to have
> the groups configured in .git/config as "submodule.groups", any other way
> to make it future proof and extend the groups syntax?)

Not sure what exactly you mean by "it" here ;-)

Talking about what groups a submodule belongs to, an entry in the
.gitmodules file makes the most sense to me. That way upstream can
change submodule grouping or add new submodules with group assignments
from commit to commit, and "git submodule update" will do the right
thing for the superproject commit checked out.

And I believe that the choice of which group(s) the user is interested in
should be recorded in .git/config, as that is his personal setting
that shouldn't be influenced by upstream changes.

* Re: [RFC PATCH 0/5] Submodule Groups
  2015-11-25 17:35  7% ` [RFC PATCH 0/5] Submodule Groups Jens Lehmann
@ 2015-11-25 18:00  7%   ` Stefan Beller
  2015-11-25 19:18  6%     ` Jens Lehmann
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-25 18:00 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: git@vger.kernel.org, Jeff King, Junio C Hamano, Jonathan Nieder,
	Johannes Schindelin, Eric Sunshine, Heiko Voigt

--cc Johannes Sixt

On Wed, Nov 25, 2015 at 9:35 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>> [submodule "gcc"]
>>          path = gcc
>>          url = git://...
>>          groups = default,devel
>> [submodule "linux"]
>>          path = linux
>>          url = git://...
>>          groups = default
>> [submodule "nethack"]
>>          path = nethack
>>          url = git://...
>>          groups = optional,games
>
>
> Yup. Do you want the user to select only a single group or do you
> plan to support selecting multiple groups at the same time too?

Yes, you should be able to select multiple groups, such as
default+devel or alternatively default+games.

The logical OR is supported in this patch series (all submodules which are
in at least one of the specified groups, i.e. A OR B OR C ...)
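
A minimal C sketch of that union rule, for illustration only: it is not code from this series, the helper names are hypothetical, and whitespace around the commas in a groups value is not trimmed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the comma-separated list `groups` (the value of a
 * submodule.<name>.groups entry such as "default,devel") contains the
 * group name `want`, 0 otherwise. Whitespace is not trimmed.
 */
static int list_has_group(const char *groups, const char *want)
{
	size_t want_len;

	assert(groups && want);
	want_len = strlen(want);
	while (*groups) {
		size_t len = strcspn(groups, ",");   /* length of current token */

		if (len == want_len && !strncmp(groups, want, len))
			return 1;
		groups += len;
		if (*groups == ',')
			groups++;                     /* skip the separator */
	}
	return 0;
}

/*
 * Union semantics: the submodule is selected if it belongs to at least
 * one of the requested groups (A OR B OR C ...).
 */
static int submodule_selected(const char *groups,
			      const char *const *wanted, size_t nwanted)
{
	size_t i;

	for (i = 0; i < nwanted; i++)
		if (list_has_group(groups, wanted[i]))
			return 1;
	return 0;
}
```

With the .gitmodules example above and the groups default and devel requested, gcc ("default,devel") and linux ("default") are selected, while nethack ("optional,games") is not. Matching is on whole tokens, so asking for "dev" does not match "devel".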

>
>> and with this series you can work on an arbitrary subset of these submodules
>> using these commands:
>>
>>      git clone --group default --group devel git://...
>>      # will clone the superproject and recursively
>>      # checkout any submodule being in at least one of the groups.
>
>
> Does this automatically configure the given group in .git/config, so
> that all future submodule related commands know about this choice?
> Me thinks that would make sense ...

It does. Internally it does

    git config submodule.groups A,B
    git submodule update --init --groups

and submodule update checks whether the submodule.groups
value is set and, if so, operates only on the submodules in those groups.
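
The clone step above turns repeated --group options into a single comma-separated value such as "A,B". A small sketch of that joining step, assuming the option values arrive as an array of strings (the helper name is made up for illustration):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Join the values of repeated --group options into one comma-separated
 * string, e.g. {"A", "B"} -> "A,B", suitable for storing as the
 * submodule.groups value. Returns a malloc'ed string the caller must
 * free, or NULL if allocation fails.
 */
static char *join_groups(const char *const *groups, size_t n)
{
	size_t i, total = 1;   /* room for the terminating NUL */
	char *out, *p;

	assert(groups || n == 0);
	for (i = 0; i < n; i++)
		total += strlen(groups[i]) + 1;   /* name plus a separator */
	out = malloc(total);
	if (!out)
		return NULL;
	p = out;
	for (i = 0; i < n; i++) {
		size_t len = strlen(groups[i]);

		if (i)
			*p++ = ',';
		memcpy(p, groups[i], len);
		p += len;
	}
	*p = '\0';
	return out;
}
```

Keeping a single comma-separated value lets a later "git submodule update" read one config key instead of a multi-valued one.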

>
>>      git submodule add --group default --group devel git://... ..
>>      # will add a submodule, adding 2 submodule
>>      # groups to its entry in .gitmodules
>
>
> Maybe '--groups default,devel' is easier to grok? Dunno.

I guess that makes sense.

>
>>      # as support for clone we want to have:
>>      git config submodule.groups default
>>      git submodule init --groups
>
>
> Hmm, I doubt it makes much sense to add the --group option to "git
> submodule init". I'd rather init all submodules and do the group
> handling only in the "git submodule update" command. That way
> upstream can change grouping later without the user having to
> fiddle with her configuration to make that work.

Well if upstream changes grouping later, you could just run

    git submodule update --init --groups

and get what you want?

>
>>      # will init all submodules from the default group
>>
>>      # as support for clone we want to have:
>>      git config submodule.groups default
>>      git submodule update --groups
>>
>>      # will update all submodules from the default group
>>
>> Any feedback is welcome, especially on the design level!
>> (Do we want to have it stored in the .gitmodules file? Do we want to have
>> the groups configured in .git/config as "submodule.groups", any other way
>> to make it future proof and extend the groups syntax?)
>
>
> Not sure what exactly you mean by "it" here ;-)
>
> Talking about what groups a submodule belongs to, an entry in the
> .gitmodules file makes the most sense to me. That way upstream can
> change submodule grouping or add new submodules with group assignments
> from commit to commit, and "git submodule update" will do the right
> thing for the superproject commit checked out.
>
> And I believe that the choice of which group(s) the user is interested in
> should be recorded in .git/config, as that is his personal setting
> that shouldn't be influenced by upstream changes.

Right. I once discussed this with Jonathan Nieder, who dreamed of a more
logical approach to the groups/sets of submodules: more like set theory,
i.e. a more complicated grammar such as "get all submodules which are
in either A or B or (D AND E), but which are never in F".
So I'd imagine the groups are more like bit tags, and you can describe
the pattern you want.

I guess we want something more powerful eventually, so I asked this
open-ended question there.
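
Treating groups as bit tags, the "A or B or (D AND E), but never F" example can be sketched as a list of AND-clauses plus an exclusion mask. The bit assignments and type names below are hypothetical; nothing like this exists in the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit per group name, assigned when .gitmodules is parsed. */
enum {
	GROUP_A = 1 << 0,
	GROUP_B = 1 << 1,
	GROUP_D = 1 << 2,
	GROUP_E = 1 << 3,
	GROUP_F = 1 << 4
};

struct group_pattern {
	const uint32_t *clauses; /* each clause: bits that must ALL be set */
	size_t nclauses;         /* the clauses are ORed together */
	uint32_t exclude;        /* bits that must NOT be set at all */
};

/* Return 1 if a submodule tagged with `tags` matches the pattern. */
static int pattern_matches(const struct group_pattern *pat, uint32_t tags)
{
	size_t i;

	assert(pat != NULL);
	if (tags & pat->exclude)
		return 0;                        /* "never in F" wins */
	for (i = 0; i < pat->nclauses; i++)
		if ((tags & pat->clauses[i]) == pat->clauses[i])
			return 1;                /* some AND-clause fully matched */
	return 0;
}
```

Plain union selection, as implemented in this series, is the special case where every clause has a single bit and the exclusion mask is zero.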

* Re: [PATCH 5/5] builtin/clone: support submodule groups
  2015-11-25 17:52  8%   ` Jens Lehmann
@ 2015-11-25 18:08  7%     ` Stefan Beller
  2015-11-25 19:50  7%       ` Jens Lehmann
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-25 18:08 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: git@vger.kernel.org, Jeff King, Junio C Hamano, Jonathan Nieder,
	Johannes Schindelin, Eric Sunshine, Johannes Sixt, Heiko Voigt

On Wed, Nov 25, 2015 at 9:52 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>> +--group::
>> +       After the clone is created, all submodules which are part of the
>> +       group are cloned. This option can be given multiple times to specify
>> +       different groups.
>
>
> Ah, that answers my question in my response to the cover letter ;-)
>
>> This option will imply automatic submodule
>> +       updates for the groups by setting `submodule.update=groups`.
>
>
> Please don't. The per-submodule update setting configures how a
> submodule has to be updated, adding a global one with a completely
> different meaning (what submodules should be updated?) is confusing.
> Why not "submodule.groups=<groups>"?

The documentation is out of date :/ as I was churning through lots of ideas,
so we do have a config submodule.groups=<groups> by now, but the
documentation is wrong.

>
>> +       The group selection will be passed on recursively, i.e. if a submodule
>> +       is cloned because of group membership, its submodules will
>> +       be cloned according to group membership, too. If a submodule is
>> +       not cloned however, its submodules are not evaluated for group
>> +       membership.
>
>
> What do you mean by the last sentence? Did the clone fail? Then you
> cannot update the submodule anyway ...

Consider nested submodules:

    A: superproject containing
        B: which contains
            C.

If you clone A with group <C-but-not-B>, you won't get C: we don't clone B,
so we never traverse the submodules of B. Maybe it's obvious?
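
A toy model of that traversal rule, with hypothetical types: `selected` stands for the outcome of the group check at each level, and children are only visited when their parent is cloned, because the parent's .gitmodules only exists once the parent has been cloned.

```c
#include <assert.h>
#include <stddef.h>

struct sub {
	const char *name;
	int selected;          /* result of the group check at this level */
	struct sub *children;  /* submodules listed in this one's .gitmodules */
	size_t nchildren;
};

/* Count how many submodules would be cloned, recursing only into
 * submodules that are themselves cloned. */
static size_t count_cloned(const struct sub *s, size_t n)
{
	size_t i, total = 0;

	assert(s || n == 0);
	for (i = 0; i < n; i++) {
		if (!s[i].selected)
			continue;  /* not cloned: its .gitmodules is never read */
		total += 1 + count_cloned(s[i].children, s[i].nchildren);
	}
	return total;
}
```

For A containing B containing C, marking only C as selected clones nothing below A; once B is selected too, both B and C are cloned.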

>> @@ -864,6 +876,21 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>>                 option_no_checkout = 1;
>>         }
>>
>> +       if (option_recursive && submodule_groups.nr > 0)
>> +               die(_("submodule groups and recursive flag are incompatible"));
>
>
> Me thinks this contradicts your description of the --group option
> in the man page. I don't see why such a restriction would make
> sense, what incompatibility are you trying to avoid here? Maybe
> we need another submodule-specific setting to tell update what
> groups to use inside that submodule?

So you want something like
    "In the top level respect the groups, but recursively get all of them"?

My thinking is that groups imply recursion, whereas --recursive implies
"all groups", so "git clone --group <half-the-submodules> --recursive"
does not make much sense to me, as it begs the question: what does --recursive
mean? Probably recurse into all submodules which are implied by the group
<half-the-submodules>. And then get all the nested submodules. But in case
you use the grouping feature, you could just mark the nested submodules with
groups, too?

* Re: Git super slow on Windows 7
  @ 2015-11-25 18:47  5%   ` Stefan Beller
  2015-11-25 20:23  2%     ` Duy Nguyen
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-25 18:47 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Johannes Schindelin, GIT Mailing-list, stephan.arens

On Wed, Nov 25, 2015 at 10:42 AM, Lars Schneider
<larsxschneider@gmail.com> wrote:
> After some investigation I figured that ~50 Submodules are the culprit.
> Does anyone have an idea how to speed up Git on Windows while keeping 50 Submodules?
>
> Thanks,
> Lars
>
>

Use the latest version of Git ;)

Check out the series merged at 65e1449
(2015-10-05, Merge branch 'sb/submodule-helper')

    The infrastructure to rewrite "git submodule" in C is being built
    incrementally.  Let's polish these early parts well enough and make
    them graduate to 'next' and 'master', so that the more involved
    follow-up can start cooking on a solid ground.

    * sb/submodule-helper:
      submodule: rewrite `module_clone` shell function in C
      submodule: rewrite `module_name` shell function in C
      submodule: rewrite `module_list` shell function in C

More specifically the commits in there:

submodule: rewrite `module_name` shell function in C

    This implements the helper `name` in C instead of shell,
    yielding a nice performance boost.

    Before this patch, I measured a time (best out of three):

      $ time ./t7400-submodule-basic.sh  >/dev/null
        real 0m11.066s
        user 0m3.348s
        sys 0m8.534s

    With this patch applied I measured (also best out of three)

      $ time ./t7400-submodule-basic.sh  >/dev/null
        real 0m10.063s
        user 0m3.044s
        sys 0m7.487s
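
For scale, the relative reductions in the two measurements above work out as follows (whole-script timings of t7400, best of three runs each):

```latex
\begin{aligned}
\text{real:}\quad \frac{11.066 - 10.063}{11.066} &= \frac{1.003}{11.066} \approx 0.091 \;(\approx 9\%)\\
\text{user:}\quad \frac{3.348 - 3.044}{3.348} &= \frac{0.304}{3.348} \approx 0.091 \;(\approx 9\%)\\
\text{sys:}\quad \frac{8.534 - 7.487}{8.534} &= \frac{1.047}{8.534} \approx 0.123 \;(\approx 12\%)
\end{aligned}
```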

* Re: [RFC PATCH 0/5] Submodule Groups
  2015-11-25 18:00  7%   ` Stefan Beller
@ 2015-11-25 19:18  6%     ` Jens Lehmann
  0 siblings, 0 replies; 200+ results
From: Jens Lehmann @ 2015-11-25 19:18 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Jeff King, Junio C Hamano, Jonathan Nieder,
	Johannes Schindelin, Eric Sunshine, Heiko Voigt

(Sorry for the resend of my last mail, but I received bounce messages
from my email provider)

Am 25.11.2015 um 19:00 schrieb Stefan Beller:
> --cc Johannes Sixt
>
> On Wed, Nov 25, 2015 at 9:35 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>>> [submodule "gcc"]
>>>           path = gcc
>>>           url = git://...
>>>           groups = default,devel
>>> [submodule "linux"]
>>>           path = linux
>>>           url = git://...
>>>           groups = default
>>> [submodule "nethack"]
>>>           path = nethack
>>>           url = git://...
>>>           groups = optional,games
>>
>>
>> Yup. Do you want the user to select only a single group or do you
>> plan to support selecting multiple groups at the same time too?
>
> Yes, you should be able to select multiple groups, such as
> default+devel or alternatively default+games.
>
> The logical OR is supported in this patch series (all submodules which are
> in at least one of the specified groups, i.e. A OR B OR C ...)

Good, this is more flexible than restricting that to just a
single group.

>>> and with this series you can work on an arbitrary subset of these submodules
>>> using these commands:
>>>
>>>       git clone --group default --group devel git://...
>>>       # will clone the superproject and recursively
>>>       # checkout any submodule being in at least one of the groups.
>>
>>
>> Does this automatically configure the given group in .git/config, so
>> that all future submodule related commands know about this choice?
>> Me thinks that would make sense ...
>
> It does. Internally it does
>
>      git config submodule.groups A,B
>      git submodule update --init --groups
>
> and submodule update checks whether the submodule.groups
> value is set and, if so, operates only on the submodules in those groups.

Makes sense (except for the "--groups" argument, see below ;-).

>>
>>>       # as support for clone we want to have:
>>>       git config submodule.groups default
>>>       git submodule init --groups
>>
>>
>> Hmm, I doubt it makes much sense to add the --group option to "git
>> submodule init". I'd rather init all submodules and do the group
>> handling only in the "git submodule update" command. That way
>> upstream can change grouping later without the user having to
>> fiddle with her configuration to make that work.
>
> Well if upstream changes grouping later, you could just run
>
>      git submodule update --init --groups
>
> and get what you want?

And make life harder than necessary for our users without having
a reason for that? Except for the URL, copying submodule settings
on init is wrong, as it sets in stone what happened to be in the
.gitmodules file when you ran init and doesn't allow upstream to
easily change defaults later. We still do that with the update
setting for historical reasons, but I avoided making the same
mistake with all the options I added later. You can override
these settings if you want or need to, but that shouldn't be
necessary by default to make life easier for our users.

>>>       # will init all submodules from the default group
>>>
>>>       # as support for clone we want to have:
>>>       git config submodule.groups default
>>>       git submodule update --groups
>>>
>>>       # will update all submodules from the default group
>>>
>>> Any feedback is welcome, especially on the design level!
>>> (Do we want to have it stored in the .gitmodules file? Do we want to have
>>> the groups configured in .git/config as "submodule.groups", any other way
>>> to make it future proof and extend the groups syntax?)
>>
>>
>> Not sure what exactly you mean by "it" here ;-)
>>
>> Talking about what groups a submodule belongs to, an entry in the
>> .gitmodules file makes the most sense to me. That way upstream can
>> change submodule grouping or add new submodules with group assignments
>> from commit to commit, and "git submodule update" will do the right
>> thing for the superproject commit checked out.
>>
>> And I believe that the choice of which group(s) the user is interested in
>> should be recorded in .git/config, as that is his personal setting
>> that shouldn't be influenced by upstream changes.
>
> Right. I once discussed this with Jonathan Nieder, who dreamed of a more
> logical approach to the groups/sets of submodules: more like set theory,
> i.e. a more complicated grammar such as "get all submodules which are
> in either A or B or (D AND E), but which are never in F".
> So I'd imagine the groups are more like bit tags, and you can describe
> the pattern you want.

Ok, we can start with union and add intersection later when needed.

> I guess we want something more powerful eventually, so I asked this
> open-ended question there.

And I don't think we need to implement everything right now, but we
should have thought things through as far as we can currently see,
to avoid running into problems later on ;-)

* Re: [PATCH 5/5] builtin/clone: support submodule groups
  2015-11-25 18:08  7%     ` Stefan Beller
@ 2015-11-25 19:50  7%       ` Jens Lehmann
  2015-11-25 20:03  7%         ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Jens Lehmann @ 2015-11-25 19:50 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Jeff King, Junio C Hamano, Jonathan Nieder,
	Johannes Schindelin, Eric Sunshine, Johannes Sixt, Heiko Voigt

Am 25.11.2015 um 19:08 schrieb Stefan Beller:
> On Wed, Nov 25, 2015 at 9:52 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>>> +--group::
>>> +       After the clone is created, all submodules which are part of the
>>> +       group are cloned. This option can be given multiple times to specify
>>> +       different groups.
>>
>>
>> Ah, that answers my question in my response to the cover letter ;-)
>>
>>> This option will imply automatic submodule
>>> +       updates for the groups by setting `submodule.update=groups`.
>>
>>
>> Please don't. The per-submodule update setting configures how a
>> submodule has to be updated, adding a global one with a completely
>> different meaning (what submodules should be updated?) is confusing.
>> Why not "submodule.groups=<groups>"?
>
> The documentation is out of date :/ as I was churning through lots of ideas,
> so we do have a config submodule.groups=<groups> by now, but the
> documentation is wrong.

Thanks for explaining, I did not look at the code very closely so
far so I missed that.

>>
>>> +       The group selection will be passed on recursively, i.e. if a submodule
>>> +       is cloned because of group membership, its submodules will
>>> +       be cloned according to group membership, too. If a submodule is
>>> +       not cloned however, its submodules are not evaluated for group
>>> +       membership.
>>
>>
>> What do you mean by the last sentence? Did the clone fail? Then you
>> cannot update the submodule anyway ...
>
> Consider nested submodules:
>
>      A: superproject containing
>          B: which contains
>              C.
>
> If you clone A with group <C-but-not-B>, you won't get C: we don't clone B,
> so we never traverse the submodules of B. Maybe it's obvious?

Maybe yes. Everything about submodule C is configured in B's
.gitmodules file, not in A's. So you cannot find submodule C
in A's .gitmodules (and it thus cannot be in one of A's submodule
groups either). And if cloning B fails, you have no .gitmodules
file to get the URL of C to clone it from in the first place. So
I think the concept 'group <C-but-not-B>' doesn't make any sense
when C is a submodule of B.

>>> @@ -864,6 +876,21 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>>>                  option_no_checkout = 1;
>>>          }
>>>
>>> +       if (option_recursive && submodule_groups.nr > 0)
>>> +               die(_("submodule groups and recursive flag are incompatible"));
>>
>>
>> Me thinks this contradicts your description of the --group option
>> in the man page. I don't see why such a restriction would make
>> sense, what incompatibility are you trying to avoid here? Maybe
>> we need another submodule-specific setting to tell update what
>> groups to use inside that submodule?
>
> So you want something like
>      "In the top level respect the groups, but recursively get all of them"?

Nope, only those that are chosen by the groups.

> My thinking is that groups imply recursion, whereas --recursive implies
> "all groups", so "git clone --group <half-the-submodules> --recursive"
> does not make much sense to me, as it begs the question: what does --recursive
> mean?

Groups are only about what submodules to update and have nothing to
do with recursion. It might make sense to imply recursion, but that's
just because that should have been the default for submodules from day
one. Recursion and groups are orthogonal, the first is about what to
do inside the submodules (carry on or not?) and the latter is about
what to do in the superproject (shall I update this submodule?).

> Probably recurse into all submodules which are implied by the group
> <half-the-submodules>.

Yep. We also do not recurse into those submodules having set their
update setting to "none", so we do not do that for submodules not
in any chosen group either.

> And then get all the nested submodules. But in case
> you use the grouping feature, you could just mark the nested submodules with
> groups, too?

Not in the top superproject. In a submodule you can specify new groups
for its sub-submodules, but these will in most cases be different from
those of the superproject.

Imagine I have this really cool Metaproject which contains the Android
superproject as a submodule. Those two will define different groups,
and when recursing into the android submodule I need to choose from the
Android specific groups. So my Metaproject's .gitmodules could look like
this:

[submodule "android"]
         path = android
         url = git://...
         groups = default,mobile
         subgroups = devel

"groups" tells git what superproject groups the android submodule
belongs to, and "subgroups" tells git what android submodules are
to be checked out when running recursively into it. If you do not
configure "subgroups", the whole android submodule is updated when
one of the groups "default" or "mobile" is chosen in the superproject.
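
A sketch of how the proposed "groups"/"subgroups" pair could be consumed, assuming a single chosen group per level for brevity. The struct and function names are invented for illustration, and "subgroups" is only a proposal in this thread, not an existing .gitmodules key.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory view of one [submodule] entry. */
struct submodule_entry {
	const char *groups;    /* superproject groups, e.g. "default,mobile" */
	const char *subgroups; /* groups to use inside it, NULL if unset */
};

/* 1 if the comma-separated `list` contains the name `want`. */
static int list_has(const char *list, const char *want)
{
	size_t want_len;

	assert(list && want);
	want_len = strlen(want);
	while (*list) {
		size_t len = strcspn(list, ",");

		if (len == want_len && !strncmp(list, want, len))
			return 1;
		list += len;
		if (*list == ',')
			list++;
	}
	return 0;
}

/*
 * Decide whether the superproject's chosen group selects this submodule.
 * If it does, *inner receives the groups to apply when recursing into it;
 * NULL means "update all of its submodules".
 */
static int plan_update(const struct submodule_entry *e, const char *chosen,
		       const char **inner)
{
	if (!e->groups || !list_has(e->groups, chosen))
		return 0;      /* not selected: never recurse into it */
	*inner = e->subgroups;
	return 1;
}
```

Each repository only ever interprets its own .gitmodules: "groups" is matched against the groups chosen for this level, and "subgroups" becomes the chosen groups one level down.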

* Re: [PATCH 5/5] builtin/clone: support submodule groups
  2015-11-25 19:50  7%       ` Jens Lehmann
@ 2015-11-25 20:03  7%         ` Stefan Beller
  2015-11-25 22:30  7%           ` Jens Lehmann
  0 siblings, 1 reply; 200+ results
From: Stefan Beller @ 2015-11-25 20:03 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: git@vger.kernel.org, Jeff King, Junio C Hamano, Jonathan Nieder,
	Johannes Schindelin, Eric Sunshine, Heiko Voigt

On Wed, Nov 25, 2015 at 11:50 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>
>> My thinking is that groups imply recursion, whereas --recursive implies
>> "all groups", so "git clone --group <half-the-submodules> --recursive"
>> does not make much sense to me, as it begs the question: what does --recursive
>> mean?
>
>
> Groups are only about what submodules to update and have nothing to
> do with recursion. It might make sense to imply recursion, but that's
> just because that should have been the default for submodules from day
> one. Recursion and groups are orthogonal, the first is about what to
> do inside the submodules (carry on or not?) and the latter is about
> what to do in the superproject (shall I update this submodule?).

I see. So we would not want to mutually exclude recurse and groups,
but rather have groups imply --recurse, while you are still allowed to give
--no-recurse if you explicitly do not want to recurse into the subsubmodules.
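
One way to express "groups imply recursion unless the user explicitly opts out" is a tri-state option. The -1/0/1 encoding and the function name are assumptions of this sketch, not the series' code:

```c
#include <assert.h>
#include <stddef.h>

/*
 * recurse_opt: -1 = no recursion flag given, 0 = explicitly disabled,
 * 1 = explicitly enabled. ngroups: number of --group options given.
 */
static int effective_recursion(int recurse_opt, size_t ngroups)
{
	assert(recurse_opt >= -1 && recurse_opt <= 1);
	if (recurse_opt != -1)
		return recurse_opt;  /* an explicit flag always wins */
	return ngroups > 0;          /* otherwise, groups imply recursion */
}
```

The design point is that the explicit flag only overrides the default; it never conflicts with --group, so no "incompatible options" error is needed.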

>
>> Probably recurse into all submodules which are implied by the group
>> <half-the-submodules>.
>
>
> Yep. We also do not recurse into those submodules having set their
> update setting to "none", so we do not do that for submodules not
> in any chosen group either.
>
>> And then get all the nested submodules. But in case
>> you use the grouping feature, you could just mark the nested submodules with
>> groups, too?
>
>
> Not in the top superproject. In a submodule you can specify new groups
> for its sub-submodules, but these will in most cases be different from
> those of the superproject.
>
> Imagine I have this really cool Metaproject which contains the Android
> superproject as a submodule. Those two will define different groups,
> and when recursing into the android submodule I need to choose from the
> Android specific groups. So my Metaproject's .gitmodules could look like
> this:
>
> [submodule "android"]
>         path = android
>         url = git://...
>         groups = default,mobile
>         subgroups = devel
>
> "groups" tells git what superproject groups the android submodule
> belongs to, and "subgroups" tells git what android submodules are
> to be checked out when running recursively into it. If you do not
> configure "subgroups", the whole android submodule is updated when
> one of the groups "default" or "mobile" is chosen in the superproject.

I like the concept of subgroups, as it allows some control over
subsubmodules you may want to aggregate from a third party via the
middleman submodule.

I'd prefer to delay that feature, though, by not giving it a high priority.
Also, would you go with subsubgroups, too? When does the recursion
end? In case we have more than the union of groups, but also prohibitive
terms available, could subgroups clash with the submodule's groups spec?

* Re: Git super slow on Windows 7
  2015-11-25 18:47  5%   ` Stefan Beller
@ 2015-11-25 20:23  2%     ` Duy Nguyen
  2015-11-25 20:42  5%       ` Stefan Beller
  0 siblings, 1 reply; 200+ results
From: Duy Nguyen @ 2015-11-25 20:23 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Lars Schneider, Johannes Schindelin, GIT Mailing-list,
	stephan.arens

On Wed, Nov 25, 2015 at 7:47 PM, Stefan Beller <sbeller@google.com> wrote:
> On Wed, Nov 25, 2015 at 10:42 AM, Lars Schneider
> <larsxschneider@gmail.com> wrote:
>> After some investigation I figured that ~50 Submodules are the culprit.
>> Does anyone have an idea how to speed up Git on Windows while keeping 50 Submodules?
>>
>> Thanks,
>> Lars
>>
>>
>
> Use the latest version of Git ;)

Does it do parallel refresh yet? I think it would help.  I only looked
at "git log --merges origin/pu" and nothing caught my eye.
-- 
Duy

* Re: Git super slow on Windows 7
  2015-11-25 20:23  2%     ` Duy Nguyen
@ 2015-11-25 20:42  5%       ` Stefan Beller
  0 siblings, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-25 20:42 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Lars Schneider, Johannes Schindelin, GIT Mailing-list,
	stephan.arens

On Wed, Nov 25, 2015 at 12:23 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Wed, Nov 25, 2015 at 7:47 PM, Stefan Beller <sbeller@google.com> wrote:
>> On Wed, Nov 25, 2015 at 10:42 AM, Lars Schneider
>> <larsxschneider@gmail.com> wrote:
>>> After some investigation I figured that ~50 Submodules are the culprit.
>>> Does anyone have an idea how to speed up Git on Windows while keeping 50 Submodules?
>>>
>>> Thanks,
>>> Lars
>>>
>>>
>>
>> Use the latest version of Git ;)
>
> Does it do parallel refresh yet? I think it would help.  I only looked
> at "git log --merges origin/pu" and nothing caught my eye.

No. The patch series I hinted at only does a partial shell->C conversion,
which is my best guess for improving git status here.

I punted on parallel local operations inside "git submodule update" for now,
too, because when things go wrong there, you need a human to resolve the
merge conflict, and as a user you only want to deal with one merge conflict
at a time instead of being left with a ton of unresolved issues (according
to the git log of older patches in the submodule area).

git status should not require human interaction if things go bad within
submodules, so we may want to speed that up by parallelizing the submodule
part. The status command gathers all information about submodules by
calling "git submodule summary" and does some light post-processing on
the output. "git submodule summary", however, is written completely in
shell (200 lines, so I estimate 400 lines of C).

I will benchmark that later today and check whether it's worth it for us to
rewrite that in C for our case (we plan to have lots more submodules, but
we're a Linux shop).

* Re: [PATCH 5/5] builtin/clone: support submodule groups
  2015-11-25 20:03  7%         ` Stefan Beller
@ 2015-11-25 22:30  7%           ` Jens Lehmann
  2015-11-25 22:51  7%             ` Stefan Beller
  2015-11-26  0:31 19%             ` [PATCHv2] " Stefan Beller
  0 siblings, 2 replies; 200+ results
From: Jens Lehmann @ 2015-11-25 22:30 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Jeff King, Junio C Hamano, Jonathan Nieder,
	Johannes Schindelin, Eric Sunshine, Heiko Voigt

Am 25.11.2015 um 21:03 schrieb Stefan Beller:
> On Wed, Nov 25, 2015 at 11:50 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>>
>>> My thinking is that groups imply recursion, whereas --recursive implies
>>> "all groups", so "git clone --group <half-the-submodules> --recursive"
>>> does not make much sense to me, as it begs the question: what does --recursive
>>> mean?
>>
>>
>> Groups are only about what submodules to update and have nothing to
>> do with recursion. It might make sense to imply recursion, but that's
>> just because that should have been the default for submodules from day
>> one. Recursion and groups are orthogonal, the first is about what to
>> do inside the submodules (carry on or not?) and the latter is about
>> what to do in the superproject (shall I update this submodule?).
>
> I see. So we would not want to mutually exclude recurse and groups,
> but rather have groups imply --recurse, while you are still allowed to give
> --no-recurse if you explicitly do not want to recurse into the subsubmodules.

Exactly.

>>> And then get all the nested submodules. But in case
>>> you use the grouping feature, you could just mark the nested submodules with
>>> groups, too?
>>
>>
>> Not in the top superproject. In a submodule you can specify new groups
>> for its sub-submodules, but these will in most cases be different from
>> those of the superproject.
>>
>> Imagine I have this really cool Metaproject which contains the Android
>> superproject as a submodule. Those two will define different groups,
>> and when recursing into the android submodule I need to choose from the
>> Android specific groups. So my Metaproject's .gitmodules could look like
>> this:
>>
>> [submodule "android"]
>>          path = android
>>          url = git://...
>>          groups = default,mobile
>>          subgroups = devel
>>
>> "groups" tells git what superproject groups the android submodule
>> belongs to, and "subgroups" tells git what android submodules are
>> to be checked out when running recursively into it. If you do not
>> configure "subgroups", the whole android submodule is updated when
>> one of the groups "default" or "mobile" is chosen in the superproject.
>
> I like the concept of subgroups as it allows to have some control over
> subsubmodules you may want to aggregate from a third party via the
> middleman submodule.

That's the point (though maybe someone might come up with a better
name than "subgroups" ;-). And each repo configures its own submodule
groups.

> I'd prefer to delay that feature, though, by not giving it a high priority.

No problem, we can start with "check out all subsubmodules" for now.
But I suspect we'll need subgroups sooner rather than later.

> Also would you go with subsubgroups, too? When does the recursion
> end?

Subsubgroups do not make sense in the superproject, as it can only
configure its direct submodules. I think you are talking about the
groups of the subsubmodules, and these have to be chosen inside the
first-level submodules via the subgroups of their submodules (which
are the second-level submodules of the superproject). Still with
me? ;-) So the recursion can go on to any depth as soon as we
implement the subgroup configuration.

> In case we have not only the union of groups but also prohibitive
> terms available, could subgroups clash with the submodule's groups spec?

Not that I'm aware of. Groups decide which submodules to update and
only for those submodules subgroups tell git what group to use inside
that submodule. And so on.
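
To make the rule above concrete, here is a minimal C sketch, under the assumption that .gitmodules has already been parsed into memory: each repository consults only its own entries, an entry is taken if it is in one of the selected groups, and the groups used one level down come from that entry's "subgroups" (NULL meaning "take all of its submodules"). The types `struct entry` and `struct repo` and the functions `in_list()`, `matches()` and `walk()` are hypothetical illustrations, not git's submodule-config API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct repo;

/* Hypothetical in-memory form of one [submodule "path"] entry. */
struct entry {
	const char *path;
	const char *groups;      /* groups in the parent, comma-separated */
	const char *subgroups;   /* groups to choose inside it; NULL = all */
	const struct repo *repo; /* the submodule's own submodules, or NULL */
};

struct repo {
	size_t nr;
	const struct entry *entries;
};

/* True if the comma-separated 'list' contains the 'len'-byte 'item'. */
static bool in_list(const char *list, const char *item, size_t len)
{
	while (list && *list) {
		const char *end = strchr(list, ',');
		size_t n = end ? (size_t)(end - list) : strlen(list);

		if (n == len && !strncmp(list, item, len))
			return true;
		list = end ? end + 1 : NULL;
	}
	return false;
}

/* Is 'e' in any of the comma-separated 'selected' groups? NULL = no filter. */
static bool matches(const struct entry *e, const char *selected)
{
	if (!selected)
		return true;
	while (*selected) {
		const char *end = strchr(selected, ',');
		size_t n = end ? (size_t)(end - selected) : strlen(selected);

		if (in_list(e->groups, selected, n))
			return true;
		if (!end)
			break;
		selected = end + 1;
	}
	return false;
}

/*
 * Count the submodules checked out below 'r'. Each level consults only
 * its own entries; the groups used one level down come from "subgroups".
 */
static size_t walk(const struct repo *r, const char *selected)
{
	size_t count = 0;

	assert(r);
	for (size_t i = 0; i < r->nr; i++) {
		const struct entry *e = &r->entries[i];

		if (!matches(e, selected))
			continue;
		count++;
		if (e->repo)
			count += walk(e->repo, e->subgroups);
	}
	return count;
}
```

With the Metaproject example (android in groups "default,mobile" with subgroups "devel"), `walk(&meta, "mobile")` counts android plus only those android submodules that are in "devel"; the recursion depth is bounded only by the tree itself.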


* Re: [PATCH 5/5] builtin/clone: support submodule groups
  2015-11-25 22:30  7%           ` Jens Lehmann
@ 2015-11-25 22:51  7%             ` Stefan Beller
  2015-11-26  0:31 19%             ` [PATCHv2] " Stefan Beller
  1 sibling, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-25 22:51 UTC (permalink / raw)
  To: Jens Lehmann
  Cc: git@vger.kernel.org, Jeff King, Junio C Hamano, Jonathan Nieder,
	Johannes Schindelin, Eric Sunshine, Heiko Voigt

On Wed, Nov 25, 2015 at 2:30 PM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>>
>>
>> I like the concept of subgroups as it allows to have some control over
>> subsubmodules you may want to aggregate from a third party via the
>> middleman submodule.
>
>
> That's the point (though maybe someone might come up with a better
> name than "subgroups" ;-). And each repo configures its own submodule
> groups.
>
>> I'd prefer to delay that feature, though, by not giving it a high priority.
>
>
> No problem, we can start with "check out all subsubmodules" for now.
> But I suspect we'll need subgroups sooner rather than later.

Oh!
I thought we'd recursively propagate the groups, so that subsubmodules
are checked for membership in either the groups or the subgroups, and
the subgroups are just a way to extend the union of groups.
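
For comparison, a minimal sketch of the union model described just above, again with hypothetical types (`struct entry`, `struct repo`) and a hypothetical `walk_union()`: inside a selected submodule, entries are matched against the inherited groups plus that entry's subgroups, so a group chosen at the top can reach any depth. The fixed 256-byte buffer is a simplification of this sketch; longer group lists are silently truncated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct repo;

/* Hypothetical in-memory form of one [submodule "path"] entry. */
struct entry {
	const char *path;
	const char *groups;      /* groups it belongs to, comma-separated */
	const char *subgroups;   /* extra groups added inside it, or NULL */
	const struct repo *repo; /* its own submodules, or NULL */
};

struct repo {
	size_t nr;
	const struct entry *entries;
};

/* True if the comma-separated 'list' contains the 'len'-byte 'item'. */
static bool in_list(const char *list, const char *item, size_t len)
{
	while (list && *list) {
		const char *end = strchr(list, ',');
		size_t n = end ? (size_t)(end - list) : strlen(list);

		if (n == len && !strncmp(list, item, len))
			return true;
		list = end ? end + 1 : NULL;
	}
	return false;
}

/* Is 'e' in at least one of the comma-separated 'selected' groups? */
static bool matches(const struct entry *e, const char *selected)
{
	while (*selected) {
		const char *end = strchr(selected, ',');
		size_t n = end ? (size_t)(end - selected) : strlen(selected);

		if (in_list(e->groups, selected, n))
			return true;
		if (!end)
			break;
		selected = end + 1;
	}
	return false;
}

/*
 * Count the submodules checked out below 'r'. Inside a selected
 * submodule, the inherited groups are extended by its subgroups.
 */
static size_t walk_union(const struct repo *r, const char *selected)
{
	size_t count = 0;

	assert(r && selected);
	for (size_t i = 0; i < r->nr; i++) {
		const struct entry *e = &r->entries[i];
		char inner[256]; /* longer group lists get truncated */

		if (!matches(e, selected))
			continue;
		count++;
		if (!e->repo)
			continue;
		snprintf(inner, sizeof(inner), "%s%s%s", selected,
			 e->subgroups ? "," : "",
			 e->subgroups ? e->subgroups : "");
		count += walk_union(e->repo, inner);
	}
	return count;
}
```

Applied to the "operating systems" tree in this message, giving android the subgroups "nexus" would let "nexus" select nexus-family inside linux-build-configs without any subsubgroups setting, assuming android's own .gitmodules puts linux-build-configs in "nexus".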

>
>> Also would you go with subsubgroups, too? When does the recursion
>> end?
>
>
> Subsubgroups do not make sense in the superproject, as it can only
> configure its direct submodules.

> I think you are talking about the
> groups of the subsubmodules, and these have to be chosen inside the
> first-level submodules via the subgroups of their submodules (which
> are the second-level submodules of the superproject). Still with
> me? ;-)

I believe so.

> So the recursion can go on to any depth as soon as we
> implement the subgroup configuration.

So let's say you have your meta collection repository,
which looks like this:

operating systems:
    ubuntu
        linux
        nonfree-game
        ...
    gentoo
        ...
    fedora
        ...
    android
        linux
        linux-build-configs
            vendor-phones
            nexus-family

In the "operating systems" repo I have the submodules
ubuntu and android marked via a group: "work-related".

Now I want to check out linux and linux-build-configs in the android
repository, and within linux-build-configs, only nexus-family.

One way would be for the "operating systems" repo to
have subsubgroups specifying the groups in the
submodules of linux-build-configs (3rd level of submodules).
You seem to oppose that.

The other way to do that would be to fork the android
repo and put in the right subgroups to select the right submodules
in linux-build-configs. So forking and fixing the groups config
would be the way to deviate from upstream.

Personally, I would find it easier to have the whole spec in the one
superproject repository, as then I don't need to update the
forks. (The .gitmodules file in my fork would run into conflicts,
so it's not easy to maintain long term.)

Is there yet another way to handle such a case of deeply nested
submodules properly?

>
>> In case we have not only the union of groups but also prohibitive
>> terms available, could subgroups clash with the submodule's groups spec?
>
>
> Not that I'm aware of. Groups decide which submodules to update and
> only for those submodules subgroups tell git what group to use inside
> that submodule. And so on.


* [PATCHv2] builtin/clone: support submodule groups
  2015-11-25 22:30  7%           ` Jens Lehmann
  2015-11-25 22:51  7%             ` Stefan Beller
@ 2015-11-26  0:31 19%             ` Stefan Beller
  1 sibling, 0 replies; 200+ results
From: Stefan Beller @ 2015-11-26  0:31 UTC (permalink / raw)
  To: git, Jens.Lehmann
  Cc: peff, gitster, jrnieder, johannes.schindelin, ericsunshine,
	hvoigt, Stefan Beller

This records the given groups in the new clone's `submodule.groups`
config and passes `--groups` to the `submodule update` invocation, so
that this and later submodule updates are restricted to those groups.

Signed-off-by: Stefan Beller <sbeller@google.com>
---

This is a reroll of the patch "[PATCH 5/5] builtin/clone: support submodule groups",
as that is where Jens and I discussed it.

* reworded the documentation to match what the patch actually does
* --recursive is now implied by --group and can be turned off with --no-recursive.
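
A standalone sketch of the option handling described in these notes and in the clone.c hunks below, for readers who want to check the combinations: `--recursive` is tri-state (-1 = not given), defaults to on only when groups were requested, and the resolved flags decide which arguments checkout() passes to `git submodule update`. `resolve_recursive()` and `format_update_cmd()` are hypothetical helpers written for illustration; the patch itself uses option_recursive and an argv_array, and this sketch leaves out `--jobs`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * option_recursive is -1 ("not given"), 0 (--no-recursive) or
 * 1 (--recursive), mirroring the tri-state used in the patch.
 */
static int resolve_recursive(int option_recursive, size_t nr_groups)
{
	assert(option_recursive >= -1 && option_recursive <= 1);
	if (option_recursive != -1)
		return option_recursive; /* an explicit flag always wins */
	return nr_groups > 0;            /* groups imply recursion; else old default 0 */
}

/*
 * Render the "git submodule update" invocation that checkout() would
 * run into 'buf'; it stays empty when no update runs. Returns its length.
 */
static size_t format_update_cmd(char *buf, size_t size, int recursive,
				size_t nr_groups)
{
	assert(buf && size > 0);
	buf[0] = '\0';
	if (recursive || nr_groups > 0)
		snprintf(buf, size, "submodule update --init%s%s",
			 recursive ? " --recursive" : "",
			 nr_groups > 0 ? " --groups" : "");
	return strlen(buf);
}
```

So `git clone --group foo` would end up running "submodule update --init --recursive --groups", while `git clone --group foo --no-recursive` drops only the `--recursive`.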

Thanks for the fast feedback,
Stefan

Interdiff to previous version [PATCH 5/5] builtin/clone: support submodule groups
        --- a/Documentation/git-clone.txt
        +++ b/Documentation/git-clone.txt
        @@ -211,14 +211,16 @@ objects from the source repository into a pack in the cloned repository.
         
         --group::
                After the clone is created, all submodules which are part of the
        -       group are cloned. This option can be given multiple times to specify
        -       different groups. This option will imply automatic submodule
        -       updates for the groups by setting `submodule.update=groups`.
        -       The group selection will be passed on recursively, i.e. if a submodule
        -       is cloned because of group membership, its submodules will
        -       be cloned according to group membership, too. If a submodule is
        -       not cloned however, its submodules are not evaluated for group
        -       membership.
        +       given groups are cloned. To specify multiple groups, you can either
        +       give the group argument multiple times or comma separate the groups.
        +       This option will be recorded in the `submodule.groups` config,
        +       which will affect the behavior of other submodule related commands,
        +       such as `git submodule update`.
        +       This option implies recursive submodule checkout. If you don't
        +       want to recurse into nested submodules, you need to specify
        +       `--no-recursive`. The group selection will be passed on recursively,
        +       i.e. if a submodule is cloned because of group membership, its
        +       submodules will be cloned according to group membership, too.
         
         --separate-git-dir=<git dir>::
                Instead of placing the cloned repository where it is supposed
        diff --git a/builtin/clone.c b/builtin/clone.c
        index 17e9f54..377c031 100644
        --- a/builtin/clone.c
        +++ b/builtin/clone.c
        @@ -39,7 +39,7 @@ static const char * const builtin_clone_usage[] = {
         };
         
         static int option_no_checkout, option_bare, option_mirror, option_single_branch = -1;
        -static int option_local = -1, option_no_hardlinks, option_shared, option_recursive;
        +static int option_local = -1, option_no_hardlinks, option_shared, option_recursive = -1;
         static char *option_template, *option_depth;
         static char *option_origin = NULL;
         static char *option_branch = NULL;
        @@ -875,9 +875,12 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
                                die(_("--bare and --separate-git-dir are incompatible."));
                        option_no_checkout = 1;
                }
        -
        -       if (option_recursive && submodule_groups.nr > 0)
        -               die(_("submodule groups and recursive flag are incompatible"));
        +       if (option_recursive == -1) {
        +               if (submodule_groups.nr > 0)
        +                       option_recursive = 1; /* submodule groups implies recursive */
        +               else
        +                       option_recursive = 0; /* preserve historical default */
        +       }
                if (submodule_groups.nr > 0) {
                        int first_item = 1;
                        struct string_list_item *item;

Here comes the actual patch:

 Documentation/git-clone.txt | 13 +++++++++
 builtin/clone.c             | 38 ++++++++++++++++++++++---
 git-submodule.sh            |  5 ++++
 t/t7400-submodule-basic.sh  | 69 +++++++++++++++++++++++++++++++++++++++++++++
 t/t7406-submodule-update.sh | 32 +++++++++++++++++++++
 5 files changed, 153 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index 59d8c67..2539fea 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -209,6 +209,19 @@ objects from the source repository into a pack in the cloned repository.
 	repository does not have a worktree/checkout (i.e. if any of
 	`--no-checkout`/`-n`, `--bare`, or `--mirror` is given)
 
+--group::
+	After the clone is created, all submodules which are part of the
+	given groups are cloned. To specify multiple groups, you can either
+	give the group argument multiple times or comma separate the groups.
+	This option will be recorded in the `submodule.groups` config,
+	which will affect the behavior of other submodule related commands,
+	such as `git submodule update`.
+	This option implies recursive submodule checkout. If you don't
+	want to recurse into nested submodules, you need to specify
+	`--no-recursive`. The group selection will be passed on recursively,
+	i.e. if a submodule is cloned because of group membership, its
+	submodules will be cloned according to group membership, too.
+
 --separate-git-dir=<git dir>::
 	Instead of placing the cloned repository where it is supposed
 	to be, place the cloned repository at the specified directory,
diff --git a/builtin/clone.c b/builtin/clone.c
index ce578d2..377c031 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -39,7 +39,7 @@ static const char * const builtin_clone_usage[] = {
 };
 
 static int option_no_checkout, option_bare, option_mirror, option_single_branch = -1;
-static int option_local = -1, option_no_hardlinks, option_shared, option_recursive;
+static int option_local = -1, option_no_hardlinks, option_shared, option_recursive = -1;
 static char *option_template, *option_depth;
 static char *option_origin = NULL;
 static char *option_branch = NULL;
@@ -51,6 +51,7 @@ static struct string_list option_config;
 static struct string_list option_reference;
 static int option_dissociate;
 static int max_jobs = -1;
+static struct string_list submodule_groups;
 
 static struct option builtin_clone_options[] = {
 	OPT__VERBOSITY(&option_verbosity),
@@ -95,6 +96,8 @@ static struct option builtin_clone_options[] = {
 		   N_("separate git dir from working tree")),
 	OPT_STRING_LIST('c', "config", &option_config, N_("key=value"),
 			N_("set config inside the new repository")),
+	OPT_STRING_LIST('g', "group", &submodule_groups, N_("group"),
+			N_("clone specific submodule groups")),
 	OPT_END()
 };
 
@@ -723,9 +726,18 @@ static int checkout(void)
 	err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
 			   sha1_to_hex(sha1), "1", NULL);
 
-	if (!err && option_recursive) {
+	if (err)
+		goto out;
+
+	if (option_recursive || submodule_groups.nr > 0) {
 		struct argv_array args = ARGV_ARRAY_INIT;
-		argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
+		argv_array_pushl(&args, "submodule", "update", "--init", NULL);
+
+		if (option_recursive)
+			argv_array_push(&args, "--recursive");
+
+		if (submodule_groups.nr > 0)
+			argv_array_push(&args, "--groups");
 
 		if (max_jobs != -1)
 			argv_array_pushf(&args, "--jobs=%d", max_jobs);
@@ -733,7 +745,7 @@ static int checkout(void)
 		err = run_command_v_opt(args.argv, RUN_GIT_CMD);
 		argv_array_clear(&args);
 	}
-
+out:
 	return err;
 }
 
@@ -863,6 +875,24 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 			die(_("--bare and --separate-git-dir are incompatible."));
 		option_no_checkout = 1;
 	}
+	if (option_recursive == -1) {
+		if (submodule_groups.nr > 0)
+			option_recursive = 1; /* submodule groups implies recursive */
+		else
+			option_recursive = 0; /* preserve historical default */
+	}
+	if (submodule_groups.nr > 0) {
+		int first_item = 1;
+		struct string_list_item *item;
+		struct strbuf sb = STRBUF_INIT;
+		strbuf_addstr(&sb, "submodule.groups=");
+		for_each_string_list_item(item, &submodule_groups) {
+			strbuf_addf(&sb, "%s%s", first_item ? "" : ",", item->string);
+			first_item = 0;
+		}
+		if (submodule_groups.nr > 0)
+			string_list_append(&option_config, strbuf_detach(&sb, NULL));
+	}
 
 	if (!option_origin)
 		option_origin = "origin";
diff --git a/git-submodule.sh b/git-submodule.sh
index 4092a48..e3d1667 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -611,6 +611,7 @@ cmd_deinit()
 #
 cmd_update()
 {
+	groups=
 	# parse $args after "submodule ... update".
 	while test $# -ne 0
 	do
@@ -650,6 +651,9 @@ cmd_update()
 		--checkout)
 			update="checkout"
 			;;
+		--groups)
+			groups=1
+			;;
 		--depth)
 			case "$2" in '') usage ;; esac
 			depth="--depth=$2"
@@ -691,6 +695,7 @@ cmd_update()
 		${update:+--update "$update"} \
 		${reference:+--reference "$reference"} \
 		${depth:+--depth "$depth"} \
+		${groups:+--groups} \
 		${jobs:+$jobs} \
 		"$@" | {
 	err=
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index caed4be..e8654d7 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -1049,4 +1049,73 @@ test_expect_success 'submodule init --group works' '
 	)
 '
 
+cat <<EOF > expected
+submodule
+-submodule1
+EOF
+
+test_expect_success 'submodule update --groups works' '
+	test_when_finished "rm -rf super super_clone" &&
+	mkdir super &&
+	pwd=$(pwd) &&
+	(
+		cd super &&
+		git init &&
+		git submodule add --group groupA file://"$pwd"/example2 submodule &&
+		git submodule add file://"$pwd"/example2 submodule1 &&
+		git commit -a -m "create repository with 2 submodules, one is in a group"
+	) &&
+	git clone super super_clone &&
+	(
+		cd super_clone &&
+		git config submodule.groups groupA &&
+		git submodule init  &&
+		git submodule update --groups &&
+		git submodule status |cut -c1,42-52 | tr -d " " >../actual
+	) &&
+	test_cmp actual expected
+'
+
+test_expect_success 'submodule update --init --groups works' '
+	test_when_finished "rm -rf super super_clone" &&
+	mkdir super &&
+	pwd=$(pwd) &&
+	(
+		cd super &&
+		git init &&
+		git submodule add --group groupA file://"$pwd"/example2 submodule &&
+		git submodule add file://"$pwd"/example2 submodule1 &&
+		git commit -a -m "create repository with 2 submodules, one is in a group"
+	) &&
+	git clone super super_clone &&
+	(
+		cd super_clone &&
+		git config submodule.groups groupA &&
+		git submodule update --init --groups &&
+		git submodule status |cut -c1,42-52 | tr -d " " >../actual
+	) &&
+	test_cmp actual expected
+'
+
+test_expect_success 'clone --group works' '
+	test_when_finished "rm -rf super super_clone" &&
+	mkdir super &&
+	pwd=$(pwd) &&
+	(
+		cd super &&
+		git init &&
+		git submodule add --group groupA file://"$pwd"/example2 submodule &&
+		git submodule add file://"$pwd"/example2 submodule1 &&
+		git commit -a -m "create repository with 2 submodules, one is in a group"
+	) &&
+	git clone --group groupA super super_clone &&
+	(
+		cd super_clone &&
+		git submodule status |
+		cut -c1,42-52 | tr -d " " >../actual
+	) &&
+	test_cmp actual expected
+'
+
+
 test_done
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 090891e..7e59846 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -801,4 +801,36 @@ test_expect_success 'git clone passes the parallel jobs config on to submodules'
 	rm -rf super4
 '
 
+cat >expect <<-EOF
+-deeper/submodule
+-merging
+-moved/sub module
+-none
+-rebasing
+-submodule
+-submodule1
+EOF
+
+# none, merging rebasing, submodule1, submodule
+test_expect_success 'git clone works with submodule groups.' '
+	test_when_finished "rm -rf super5" &&
+	(
+		cd super &&
+		git config -f .gitmodules  submodule.submodule.groups default &&
+		git config -f .gitmodules  submodule.submodule1.groups "default,testing" &&
+		git config -f .gitmodules  submodule.none.groups testing &&
+		git commit -a -m "assigning groups to submodules"
+	) &&
+	git clone --group default --group testing super super5 &&
+	(
+		cd super5 &&
+		git submodule status |cut -c1,43- >../actual
+	) &&
+	test_cmp actual expect
+'
+
+test_expect_success 'git submodule update --groups' '
+	true
+'
+
 test_done
-- 
2.6.1.261.g0d9c4c1
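
The cmd_clone() hunk above joins every `--group` value into a single `submodule.groups=` entry that is appended to option_config, with a comma only between items. A minimal sketch of that joining step, using a fixed buffer instead of git's strbuf API; `format_groups_config()` is a hypothetical name, not a git function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "submodule.groups=<g1>,<g2>,..." into 'buf'.
 * Returns 0 on success, -1 if 'buf' is too small.
 */
static int format_groups_config(char *buf, size_t size,
				const char *const *groups, size_t nr)
{
	static const char prefix[] = "submodule.groups=";
	size_t len = strlen(prefix);

	assert(buf && groups);
	if (len >= size)
		return -1;
	memcpy(buf, prefix, len + 1); /* copies the terminating NUL too */

	for (size_t i = 0; i < nr; i++) {
		/* A comma goes between items only, like first_item in the patch. */
		int n = snprintf(buf + len, size - len, "%s%s",
				 i ? "," : "", groups[i]);

		if (n < 0 || (size_t)n >= size - len)
			return -1; /* output was truncated */
		len += (size_t)n;
	}
	return 0;
}
```

With the two `--group` options used in the t7406 test above, the result is `submodule.groups=default,testing`.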


2015-10-12 19:24     [PATCH] submodule-config: Untangle logic in parse_config Junio C Hamano
2015-10-13  0:02 24% ` [PATCHv2] submodule-config: Shorten " Stefan Beller
2015-10-12 22:52     [PATCH] Add fetch.recurseSubmoduleParallelism config option Stefan Beller
2015-10-12 23:14     ` Junio C Hamano
2015-10-12 23:31       ` Stefan Beller
2015-10-12 23:50  5%     ` Junio C Hamano
2015-10-16 17:04  2%       ` Stefan Beller
2015-10-16 17:26  5%         ` Junio C Hamano
2015-10-15 22:50     Make "git checkout" automatically update submodules? Kannan Goundan
2015-10-23 17:20  7% ` Stefan Beller
2015-10-23 19:11  6%   ` Junio C Hamano
2015-10-23 22:51  4%   ` Kannan Goundan
2015-10-16  1:52 10% [RFC PATCHv1 00/12] git submodule update in C with parallel cloning Stefan Beller
2015-10-16  1:52 20% ` [PATCH 01/12] git submodule update: Announce skipping submodules on stderr Stefan Beller
2015-10-16 20:37  5%   ` Junio C Hamano
2015-10-16 20:47  5%     ` Stefan Beller
2015-10-16  1:52 26% ` [PATCH 02/12] git submodule update: Announce uninitialized modules " Stefan Beller
2015-10-16 20:54  4%   ` Junio C Hamano
2015-10-16  1:52 17% ` [PATCH 03/12] git submodule update: Move branch calculation to where it's needed Stefan Beller
2015-10-16 20:54  4%   ` Junio C Hamano
2015-10-16  1:52 32% ` [PATCH 04/12] git submodule update: Announce outcome of submodule operation to stderr Stefan Beller
2015-10-16  1:52 18% ` [PATCH 05/12] git submodule update: Use its own list implementation Stefan Beller
2015-10-16 21:02  6%   ` Junio C Hamano
2015-10-16 21:08  4%     ` Stefan Beller
2015-10-16  1:52 24% ` [PATCH 06/12] git submodule update: Handle unmerged submodules in C Stefan Beller
2015-10-20 21:11  8%   ` Junio C Hamano
2015-10-20 21:21  5%     ` Stefan Beller
2015-10-16  1:52 27% ` [PATCH 07/12] submodule config: keep update strategy around Stefan Beller
2015-10-16  1:52 23% ` [PATCH 08/12] git submodule update: check for "none" in C Stefan Beller
2015-10-16  1:52 23% ` [PATCH 09/12] git submodule update: Check url " Stefan Beller
2015-10-16  1:52 20% ` [PATCH 10/12] git submodule update: Clone projects from within C Stefan Beller
2015-10-16  1:52 13% ` [PATCH 11/12] submodule--helper: Do not emit submodules to process directly Stefan Beller
2015-10-16  1:52 18% ` [PATCH 12/12] WIP/broken Clone all outstanding submodules in parallel Stefan Beller
2015-10-19 18:24  7% [PATCH 0/5] Fixes for the parallel processing Stefan Beller
2015-10-19 19:28     [RFC] URL rewrite in .gitmodules Lars Schneider
2015-10-19 22:07  5% ` Stefan Beller
2015-10-25 14:43  5%   ` Lars Schneider
2015-10-20 17:33     ` Junio C Hamano
2015-10-25 15:12       ` Lars Schneider
2015-10-26 16:34  2%     ` Stefan Beller
2015-10-26 16:52  2%       ` Jens Lehmann
2015-11-15 13:16  0%         ` Lars Schneider
2015-10-20 22:43  9% [PATCH 0/8] Fixes for the parallel processing engine and git submodule update Stefan Beller
2015-10-20 22:43 26% ` [PATCH 7/8] submodule config: Keep update strategy around Stefan Beller
2015-10-20 22:43 21% ` [PATCH 8/8] git submodule update: Have a dedicated helper for cloning Stefan Beller
2015-10-21 20:47  4%   ` Junio C Hamano
2015-10-21 21:06  7%     ` Stefan Beller
2015-10-21 21:23  6%       ` Junio C Hamano
2015-10-21 22:14  7%         ` Stefan Beller
2015-10-23 18:44 10% [PATCH 0/3] expose parallelism for submodule {update, clone} Stefan Beller
2015-10-23 18:44 21% ` [PATCH 1/3] git submodule update: have a dedicated helper for cloning Stefan Beller
2015-10-23 18:44 20% ` [PATCH 2/3] submodule update: Expose parallelism to the user Stefan Beller
2015-10-23 18:44 19% ` [PATCH 3/3] clone: Allow an explicit argument for parallel submodule clones Stefan Beller
2015-10-28 21:03  4%   ` Sebastian Schuberth
2015-10-23 19:25  7% ` [PATCH 0/3] expose parallelism for submodule {update, clone} Junio C Hamano
2015-10-23 19:33  7%   ` Stefan Beller
2015-10-25 23:10     Why are submodules not automatically handled by default or at least configurable to do so? John Smith
2015-10-26  0:56     ` Chris Packham
2015-10-26 16:28  7%   ` Stefan Beller
2015-10-26 19:53  4%     ` Junio C Hamano
2015-10-27 18:15  9% [PATCH 0/9] Expose the submodule parallelism to the user Stefan Beller
2015-10-27 18:15 24% ` [PATCH 1/9] submodule-config: "goto" removal in parse_config() Stefan Beller
2015-10-27 21:26  4%   ` Jonathan Nieder
2015-10-27 18:15 26% ` [PATCH 2/9] submodule config: keep update strategy around Stefan Beller
2015-10-27 18:15 21% ` [PATCH 4/9] git submodule update: have a dedicated helper for cloning Stefan Beller
2015-10-27 18:15 23% ` [PATCH 5/9] submodule update: expose parallelism to the user Stefan Beller
2015-10-27 20:59  6%   ` Junio C Hamano
2015-10-28 21:40  4%     ` Stefan Beller
2015-10-28 22:20  4%       ` Junio C Hamano
2015-10-27 18:15 24% ` [PATCH 6/9] clone: allow an explicit argument for parallel submodule clones Stefan Beller
2015-10-27 20:57  7%   ` Junio C Hamano
2015-10-28 20:50  4%     ` Stefan Beller
2015-10-27 18:15 23% ` [PATCH 7/9] submodule config: remove name_and_item_from_var Stefan Beller
2015-10-27 18:15 21% ` [PATCH 8/9] submodule-config: parse_config Stefan Beller
2015-10-27 18:15 24% ` [PATCH 9/9] fetching submodules: Respect `submodule.jobs` config option Stefan Beller
2015-10-27 21:00  7%   ` Junio C Hamano
2015-10-27 19:12  7% ` [PATCH 0/9] Expose the submodule parallelism to the user Junio C Hamano
2015-10-28 23:21 25%   ` [PATCHv2 0/8] " Stefan Beller
2015-10-28 23:21 26%     ` [PATCHv2 2/8] submodule config: keep update strategy around Stefan Beller
2015-10-30  1:14  4%       ` Eric Sunshine
2015-10-30 17:38  4%         ` Stefan Beller
2015-10-30 18:16  4%           ` Eric Sunshine
2015-10-30 18:25  4%             ` Stefan Beller
2015-10-28 23:21 23%     ` [PATCHv2 3/8] submodule config: remove name_and_item_from_var Stefan Beller
2015-10-30  1:23  7%       ` Eric Sunshine
2015-10-30 18:37  4%         ` Stefan Beller
2015-10-28 23:21 21%     ` [PATCHv2 4/8] submodule-config: parse_config Stefan Beller
2015-10-30  1:53  4%       ` Eric Sunshine
2015-10-30 19:29  7%         ` Stefan Beller
2015-10-28 23:21 24%     ` [PATCHv2 5/8] fetching submodules: Respect `submodule.jobs` config option Stefan Beller
2015-10-30  2:17  5%       ` Eric Sunshine
2015-10-28 23:21 21%     ` [PATCHv2 6/8] git submodule update: have a dedicated helper for cloning Stefan Beller
2015-10-29 22:34  6%       ` Junio C Hamano
2015-10-28 23:21 23%     ` [PATCHv2 7/8] submodule update: expose parallelism to the user Stefan Beller
2015-10-28 23:21 24%     ` [PATCHv2 8/8] clone: allow an explicit argument for parallel submodule clones Stefan Beller
2015-11-01  8:58  4%       ` Eric Sunshine
2015-10-29 13:19  4%     ` [PATCHv2 0/8] Expose the submodule parallelism to the user Ramsay Jones
2015-10-29 15:51  7%       ` Stefan Beller
2015-10-29 17:23  4%         ` Junio C Hamano
2015-10-29 17:30  4%           ` Stefan Beller
2015-10-29 23:50  6%         ` Ramsay Jones
2015-11-03 19:41  7%           ` Stefan Beller
2015-10-29 20:12  6%     ` Junio C Hamano
2015-10-27 21:22     [PATCH v4] Add git-grep threads param Victor Leschuk
2015-11-03 17:22     ` Junio C Hamano
2015-11-04  6:40       ` Jeff King
2015-11-09 11:36         ` Victor Leschuk
2015-11-09 15:55           ` Jeff King
2015-11-09 16:34             ` Victor Leschuk
2015-11-09 16:53               ` Jeff King
2015-11-09 17:28                 ` Victor Leschuk
2015-11-09 17:55                   ` Linus Torvalds
2015-11-09 18:40  4%                 ` Stefan Beller
2015-10-27 22:04     What's the ".git/gitdir" file? Kyle Meyer
2015-10-27 22:22  5% ` Stefan Beller
2015-10-27 22:42  2%   ` Randall S. Becker
2015-10-27 22:54  4%     ` Stefan Beller
2015-11-02  2:58     git.git as of tonight Junio C Hamano
2015-11-02 21:15     ` Johannes Sixt
2015-11-02 23:06  2%   ` Stefan Beller
2015-11-03  6:34         ` Johannes Sixt
2015-11-03 17:05           ` Junio C Hamano
2015-11-03 18:18  2%         ` Stefan Beller
2015-11-04  0:37  9% [PATCHv3 00/11] Expose the submodule parallelism to the user Stefan Beller
2015-11-04  0:37     ` [PATCHv3 02/11] run-command: report failure for degraded output just once Stefan Beller
2015-11-04 18:14  4%   ` Junio C Hamano
2015-11-04 20:14  2%     ` Stefan Beller
2015-11-04  0:37 26% ` [PATCHv3 04/11] submodule-config: keep update strategy around Stefan Beller
2015-11-04  0:37 24% ` [PATCHv3 05/11] submodule-config: drop check against NULL Stefan Beller
2015-11-04  0:37 24% ` [PATCHv3 06/11] submodule-config: remove name_and_item_from_var Stefan Beller
2015-11-04  0:37 23% ` [PATCHv3 07/11] submodule-config: introduce parse_generic_submodule_config Stefan Beller
2015-11-04  0:37 24% ` [PATCHv3 08/11] fetching submodules: respect `submodule.jobs` config option Stefan Beller
2015-11-10 22:21  7%   ` Jens Lehmann
2015-11-10 22:29  8%     ` Stefan Beller
2015-11-11 19:55  7%       ` Jens Lehmann
2015-11-11 23:34  8%         ` Stefan Beller
2015-11-13 20:47  8%           ` Jens Lehmann
2015-11-13 21:29  8%             ` Stefan Beller
2015-11-04  0:37 21% ` [PATCHv3 09/11] git submodule update: have a dedicated helper for cloning Stefan Beller
2015-11-04  0:37 23% ` [PATCHv3 10/11] submodule update: expose parallelism to the user Stefan Beller
2015-11-04  0:37 24% ` [PATCHv3 11/11] clone: allow an explicit argument for parallel submodule clones Stefan Beller
2015-11-04 17:54  6% ` [PATCHv3 00/11] Expose the submodule parallelism to the user Junio C Hamano
2015-11-04 18:08  7%   ` Stefan Beller
2015-11-04 18:17  4%     ` Junio C Hamano
2015-11-04 19:59     O_NONBLOCK under Windows (was: git.git as of tonight) Torsten Bögershausen
2015-11-04 22:43  7% ` [PATCH 0/2] Missing " Stefan Beller
2015-11-04 22:43  5%   ` [PATCH 1/2] run-parallel: rename set_nonblocking to set_nonblocking_or_die Stefan Beller
2015-11-05 18:17  7% [PATCH 0/2] Remove non-blocking fds from run-command Stefan Beller
2015-11-05 18:17     ` [PATCH 1/2] run-command: Remove set_nonblocking Stefan Beller
2015-11-05 18:45  4%   ` Junio C Hamano
2015-11-05 19:22  2%     ` Stefan Beller
2015-11-05 19:37  3%       ` Junio C Hamano
2015-11-06 23:41     What's cooking in git.git (Nov 2015, #02; Fri, 6) Junio C Hamano
2015-11-11 18:59  5% ` Stefan Beller
2015-11-06 23:48  7% [PATCH] run-command: detect finished children by closed pipe rather than waitpid Stefan Beller
2015-11-07  9:01  2% ` Johannes Sixt
2015-11-10 16:31     Allow git alias to override existing Git commands Jeremy Morton
2015-11-10 18:12  4% ` Stefan Beller
2015-11-10 20:04       ` Jeremy Morton
2015-11-10 20:22  4%     ` Stefan Beller
2015-11-10 21:57  5%   ` Jens Lehmann
2015-11-10 22:49  5%     ` Stefan Beller
2015-11-11 19:44  5%       ` Jens Lehmann
2015-11-11 12:46     git clone --recursive should run git submodule update with flag --remote Stanislav
2015-11-11 19:48  7% ` Stefan Beller
2015-11-11 14:09     [RFC] Clone repositories recursive with depth 1 Lars Schneider
2015-11-11 19:19  5% ` Stefan Beller
2015-11-11 20:09  7%   ` Stefan Beller
2015-11-12  9:39  2%     ` Lars Schneider
2015-11-12 23:47  5%       ` Stefan Beller
2015-11-11 20:39  7% [PATCH v2] run-command: detect finished children by closed pipe rather than waitpid Stefan Beller
2015-11-12  9:37     [PATCH v2] add test to demonstrate that shallow recursive clones fail larsxschneider
2015-11-12 23:34  4% ` Stefan Beller
2015-11-15 12:43  2%   ` Lars Schneider
2015-11-13  5:35     ` Jeff King
2015-11-13 18:41  5%   ` Stefan Beller
2015-11-13 23:16  7%     ` Stefan Beller
2015-11-13 23:38  2%       ` Jeff King
2015-11-13 23:41  2%         ` Jeff King
2015-11-14  0:10  5%           ` Stefan Beller
2015-11-16 18:59  5%             ` Jens Lehmann
2015-11-16 19:25  5%               ` Stefan Beller
2015-11-16 21:42  5%                 ` Jens Lehmann
2015-11-16 22:56  5%                   ` Stefan Beller
2015-11-17 19:46  5%                     ` Jens Lehmann
2015-11-17 20:04  5%                       ` Stefan Beller
2015-11-17 20:39  4%                         ` Jens Lehmann
2015-11-17 20:49  2%                           ` Stefan Beller
2015-11-17 21:00  5%                             ` Jens Lehmann
2015-11-14  1:06 10% [PATCHv4 0/9] Expose submodule parallelism to the user Stefan Beller
2015-11-14  1:06 26% ` [PATCHv4 2/9] submodule-config: keep update strategy around Stefan Beller
2015-11-14  1:06 24% ` [PATCHv4 3/9] submodule-config: drop check against NULL Stefan Beller
2015-11-14  1:06 24% ` [PATCHv4 4/9] submodule-config: remove name_and_item_from_var Stefan Beller
2015-11-14  1:06 23% ` [PATCHv4 5/9] submodule-config: introduce parse_generic_submodule_config Stefan Beller
2015-11-14  1:06 24% ` [PATCHv4 6/9] fetching submodules: respect `submodule.jobs` config option Stefan Beller
2015-11-14  1:07 21% ` [PATCHv4 7/9] git submodule update: have a dedicated helper for cloning Stefan Beller
2015-11-14  1:07 23% ` [PATCHv4 8/9] submodule update: expose parallelism to the user Stefan Beller
2015-11-14  1:07 24% ` [PATCHv4 9/9] clone: allow an explicit argument for parallel submodule clones Stefan Beller
2015-11-20 12:02  4% ` [PATCHv4 0/9] Expose submodule parallelism to the user Jeff King
2015-11-16 13:24     [PATCH] push: add recurseSubmodules config option Mike Crowe
2015-11-16 18:15  2% ` Stefan Beller
2015-11-16 18:31  2%   ` Mike Crowe
2015-11-16 19:05  0%     ` Jens Lehmann
2015-11-17 11:05 18% [PATCHv2] " Mike Crowe
2015-11-20 21:08  7% [PATCHv2] run-command: detect finished children by closed pipe rather than waitpid Stefan Beller
2015-11-23 21:43  7% [PATCHv3] " Stefan Beller
2015-11-25  1:14 12% [PATCHv5 0/9] Expose submodule parallelism to the user Stefan Beller
2015-11-25  1:14 26% ` [PATCHv5 2/9] submodule-config: keep update strategy around Stefan Beller
2015-11-25  1:14 25% ` [PATCHv5 3/9] submodule-config: drop check against NULL Stefan Beller
2015-11-25  1:14 25% ` [PATCHv5 4/9] submodule-config: remove name_and_item_from_var Stefan Beller
2015-11-25  1:14 23% ` [PATCHv5 5/9] submodule-config: introduce parse_generic_submodule_config Stefan Beller
2015-11-25  1:14 25% ` [PATCHv5 6/9] fetching submodules: respect `submodule.fetchJobs` config option Stefan Beller
2015-11-25  1:14 21% ` [PATCHv5 7/9] git submodule update: have a dedicated helper for cloning Stefan Beller
2015-11-25  1:14 23% ` [PATCHv5 8/9] submodule update: expose parallelism to the user Stefan Beller
2015-11-25  1:14 25% ` [PATCHv5 9/9] clone: allow an explicit argument for parallel submodule clones Stefan Beller
2015-11-25  1:32  9% [RFC PATCH 0/5] Submodule Groups Stefan Beller
2015-11-25  1:32 27% ` [PATCH 1/5] submodule-config: keep submodule groups around Stefan Beller
2015-11-25  1:32 29% ` [PATCH 2/5] git submodule add can add a submodule with groups Stefan Beller
2015-11-25  1:32 27% ` [PATCH 3/5] git submodule init to pass on groups Stefan Beller
2015-11-25  1:32 13% ` [PATCH 4/5] submodule--helper: module_list and update-clone have --groups option Stefan Beller
2015-11-25  1:32 22% ` [PATCH 5/5] builtin/clone: support submodule groups Stefan Beller
2015-11-25 17:52  8%   ` Jens Lehmann
2015-11-25 18:08  7%     ` Stefan Beller
2015-11-25 19:50  7%       ` Jens Lehmann
2015-11-25 20:03  7%         ` Stefan Beller
2015-11-25 22:30  7%           ` Jens Lehmann
2015-11-25 22:51  7%             ` Stefan Beller
2015-11-26  0:31 19%             ` [PATCHv2] " Stefan Beller
2015-11-25 17:35  7% ` [RFC PATCH 0/5] Submodule Groups Jens Lehmann
2015-11-25 18:00  7%   ` Stefan Beller
2015-11-25 19:18  6%     ` Jens Lehmann
2015-11-25 17:50  7% ` Jens Lehmann
2015-11-25 12:35     Git super slow on Windows 7 Lars Schneider
2015-11-25 18:42     ` Lars Schneider
2015-11-25 18:47  5%   ` Stefan Beller
2015-11-25 20:23  2%     ` Duy Nguyen
2015-11-25 20:42  5%       ` Stefan Beller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox; see the mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).