git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 0/9] Progress with git submodule
@ 2015-08-28  1:14 Stefan Beller
  2015-08-28  1:14 ` [PATCH 1/9] submodule: implement `module_list` as a builtin helper Stefan Beller
                   ` (9 more replies)
  0 siblings, 10 replies; 33+ messages in thread
From: Stefan Beller @ 2015-08-28  1:14 UTC (permalink / raw)
  To: git; +Cc: peff, jrnieder, gitster, johannes.schindelin, Stefan Beller

This series replaces origin/sb/submodule-helper and is based on
5a1ba6b48a62bf55f9c8305d9850c3a8d22365c5 (Merge 'hv/submodule-config' to
'sb/submodule-helper', which includes jk/git-path and hv/submodule-config).

What changed?
* The help text of the submodule--helper was adapted to our standard,
  and overly long lines and outdated docs were removed. Thanks, Dscho!
  
* The run-command API was extended to be able to sync the output from
  various commands running at the same time.
  (In the previous series this was a one-shot solution cooked up only
   for `git submodule foreach_parallel`.)
   
* `git fetch --recurse-submodules` will fetch the submodules in parallel!

Any feedback welcome,
especially on patches 4..7.

Thanks,
Stefan

Stefan Beller (9):
  submodule: implement `module_list` as a builtin helper
  submodule: implement `module_name` as a builtin helper
  submodule: implement `module_clone` as a builtin helper
  thread-utils: add a threaded task queue
  run-command: add synced output
  submodule: helper to run foreach in parallel
  fetch: fetch submodules in parallel
  index-pack: Use the new worker pool
  pack-objects: Use new worker pool

 .gitignore                      |   1 +
 Documentation/fetch-options.txt |   7 +
 Makefile                        |   1 +
 builtin.h                       |   1 +
 builtin/fetch.c                 |   6 +-
 builtin/index-pack.c            |  23 +--
 builtin/pack-objects.c          | 175 ++++++-----------
 builtin/pull.c                  |   6 +
 builtin/submodule--helper.c     | 417 ++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh                | 177 +++--------------
 git.c                           |   1 +
 run-command.c                   |  99 ++++++++--
 run-command.h                   |  21 ++
 submodule.c                     | 100 ++++++++--
 submodule.h                     |   2 +-
 t/t7407-submodule-foreach.sh    |  11 ++
 thread-utils.c                  | 192 ++++++++++++++++++
 thread-utils.h                  |  35 ++++
 18 files changed, 960 insertions(+), 315 deletions(-)
 create mode 100644 builtin/submodule--helper.c

-- 
2.5.0.264.g5e52b0d


* [PATCH 1/9] submodule: implement `module_list` as a builtin helper
  2015-08-28  1:14 [PATCH 0/9] Progress with git submodule Stefan Beller
@ 2015-08-28  1:14 ` Stefan Beller
  2015-08-28  1:14 ` [PATCH 2/9] submodule: implement `module_name` " Stefan Beller
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Stefan Beller @ 2015-08-28  1:14 UTC (permalink / raw)
  To: git; +Cc: peff, jrnieder, gitster, johannes.schindelin, Stefan Beller

Most of the submodule operations work on a set of submodules.
Calculating and using this set is usually done via:

       module_list "$@" | {
           while read mode sha1 stage sm_path
           do
                # the actual operation
           done
       }
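
Each line produced by `module_list` (and read by the loop above) has the
gitlink format of `git ls-files --stage`, i.e.

       <mode> <sha1> <stage><TAB><path>

where unmerged entries are reported only once, with stage "U" and a null
sha1.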

Currently the function `module_list` is implemented in
git-submodule.sh as a shell function wrapping a Perl script.
The rewrite is in C, so that it is faster and can later be
easily adapted when other functions are rewritten in C.

git-submodule.sh, like the builtin commands, will navigate
to the top-level directory of the repository and keep the
subdirectory in a variable. As the helper is called from
within the git-submodule.sh script, we have already navigated
to the root level, but the path arguments are still relative
to the subdirectory we were in when git-submodule.sh was called.
That's why there is a `--prefix` option pointing to an alternative
path to which relative path arguments are anchored.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 .gitignore                  |   1 +
 Makefile                    |   1 +
 builtin.h                   |   1 +
 builtin/submodule--helper.c | 114 ++++++++++++++++++++++++++++++++++++++++++++
 git-submodule.sh            |  54 +++------------------
 git.c                       |   1 +
 6 files changed, 124 insertions(+), 48 deletions(-)
 create mode 100644 builtin/submodule--helper.c

diff --git a/.gitignore b/.gitignore
index 4fd81ba..1c2f832 100644
--- a/.gitignore
+++ b/.gitignore
@@ -155,6 +155,7 @@
 /git-status
 /git-stripspace
 /git-submodule
+/git-submodule--helper
 /git-svn
 /git-symbolic-ref
 /git-tag
diff --git a/Makefile b/Makefile
index 24b636d..d434e73 100644
--- a/Makefile
+++ b/Makefile
@@ -901,6 +901,7 @@ BUILTIN_OBJS += builtin/shortlog.o
 BUILTIN_OBJS += builtin/show-branch.o
 BUILTIN_OBJS += builtin/show-ref.o
 BUILTIN_OBJS += builtin/stripspace.o
+BUILTIN_OBJS += builtin/submodule--helper.o
 BUILTIN_OBJS += builtin/symbolic-ref.o
 BUILTIN_OBJS += builtin/tag.o
 BUILTIN_OBJS += builtin/unpack-file.o
diff --git a/builtin.h b/builtin.h
index 839483d..924e6c4 100644
--- a/builtin.h
+++ b/builtin.h
@@ -119,6 +119,7 @@ extern int cmd_show(int argc, const char **argv, const char *prefix);
 extern int cmd_show_branch(int argc, const char **argv, const char *prefix);
 extern int cmd_status(int argc, const char **argv, const char *prefix);
 extern int cmd_stripspace(int argc, const char **argv, const char *prefix);
+extern int cmd_submodule__helper(int argc, const char **argv, const char *prefix);
 extern int cmd_symbolic_ref(int argc, const char **argv, const char *prefix);
 extern int cmd_tag(int argc, const char **argv, const char *prefix);
 extern int cmd_tar_tree(int argc, const char **argv, const char *prefix);
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
new file mode 100644
index 0000000..beaab7d
--- /dev/null
+++ b/builtin/submodule--helper.c
@@ -0,0 +1,114 @@
+#include "builtin.h"
+#include "cache.h"
+#include "parse-options.h"
+#include "quote.h"
+#include "pathspec.h"
+#include "dir.h"
+#include "utf8.h"
+
+static const struct cache_entry **ce_entries;
+static int ce_alloc, ce_used;
+static const char *alternative_path;
+
+static int module_list_compute(int argc, const char **argv,
+				const char *prefix,
+				struct pathspec *pathspec)
+{
+	int i;
+	char *max_prefix, *ps_matched = NULL;
+	int max_prefix_len;
+	parse_pathspec(pathspec, 0,
+		       PATHSPEC_PREFER_FULL |
+		       PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
+		       prefix, argv);
+
+	/* Find common prefix for all pathspec's */
+	max_prefix = common_prefix(pathspec);
+	max_prefix_len = max_prefix ? strlen(max_prefix) : 0;
+
+	if (pathspec->nr)
+		ps_matched = xcalloc(pathspec->nr, 1);
+
+	if (read_cache() < 0)
+		die("index file corrupt");
+
+	for (i = 0; i < active_nr; i++) {
+		const struct cache_entry *ce = active_cache[i];
+
+		if (!match_pathspec(pathspec, ce->name, ce_namelen(ce),
+				    max_prefix_len, ps_matched,
+				    S_ISGITLINK(ce->ce_mode) | S_ISDIR(ce->ce_mode)))
+			continue;
+
+		if (S_ISGITLINK(ce->ce_mode)) {
+			ALLOC_GROW(ce_entries, ce_used + 1, ce_alloc);
+			ce_entries[ce_used++] = ce;
+		}
+
+		while (i + 1 < active_nr && !strcmp(ce->name, active_cache[i + 1]->name))
+			/*
+			 * Skip entries with the same name in different stages
+			 * to make sure an entry is returned only once.
+			 */
+			i++;
+	}
+	free(max_prefix);
+
+	if (ps_matched && report_path_error(ps_matched, pathspec, prefix))
+		return -1;
+
+	return 0;
+}
+
+static int module_list(int argc, const char **argv, const char *prefix)
+{
+	int i;
+	static struct pathspec pathspec;
+
+	struct option module_list_options[] = {
+		OPT_STRING(0, "prefix", &alternative_path,
+			   N_("path"),
+			   N_("alternative anchor for relative paths")),
+		OPT_END()
+	};
+
+	static const char * const git_submodule_helper_usage[] = {
+		N_("git submodule--helper module_list [--prefix=<path>] [<path>...]"),
+		NULL
+	};
+
+	argc = parse_options(argc, argv, prefix, module_list_options,
+			     git_submodule_helper_usage, 0);
+
+	if (module_list_compute(argc, argv, alternative_path
+					    ? alternative_path
+					    : prefix, &pathspec) < 0) {
+		printf("#unmatched\n");
+		return 1;
+	}
+
+	for (i = 0; i < ce_used; i++) {
+		const struct cache_entry *ce = ce_entries[i];
+
+		if (ce_stage(ce)) {
+			printf("%06o %s U\t", ce->ce_mode, sha1_to_hex(null_sha1));
+		} else {
+			printf("%06o %s %d\t", ce->ce_mode, sha1_to_hex(ce->sha1), ce_stage(ce));
+		}
+
+		utf8_fprintf(stdout, "%s\n", ce->name);
+	}
+	return 0;
+}
+
+int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
+{
+	if (argc < 2)
+		goto usage;
+
+	if (!strcmp(argv[1], "module_list"))
+		return module_list(argc - 1, argv + 1, prefix);
+
+usage:
+	usage("git submodule--helper module_list\n");
+}
diff --git a/git-submodule.sh b/git-submodule.sh
index 36797c3..af9ecef 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -145,48 +145,6 @@ relative_path ()
 	echo "$result$target"
 }
 
-#
-# Get submodule info for registered submodules
-# $@ = path to limit submodule list
-#
-module_list()
-{
-	eval "set $(git rev-parse --sq --prefix "$wt_prefix" -- "$@")"
-	(
-		git ls-files -z --error-unmatch --stage -- "$@" ||
-		echo "unmatched pathspec exists"
-	) |
-	@@PERL@@ -e '
-	my %unmerged = ();
-	my ($null_sha1) = ("0" x 40);
-	my @out = ();
-	my $unmatched = 0;
-	$/ = "\0";
-	while (<STDIN>) {
-		if (/^unmatched pathspec/) {
-			$unmatched = 1;
-			next;
-		}
-		chomp;
-		my ($mode, $sha1, $stage, $path) =
-			/^([0-7]+) ([0-9a-f]{40}) ([0-3])\t(.*)$/;
-		next unless $mode eq "160000";
-		if ($stage ne "0") {
-			if (!$unmerged{$path}++) {
-				push @out, "$mode $null_sha1 U\t$path\n";
-			}
-			next;
-		}
-		push @out, "$_\n";
-	}
-	if ($unmatched) {
-		print "#unmatched\n";
-	} else {
-		print for (@out);
-	}
-	'
-}
-
 die_if_unmatched ()
 {
 	if test "$1" = "#unmatched"
@@ -532,7 +490,7 @@ cmd_foreach()
 	# command in the subshell (and a recursive call to this function)
 	exec 3<&0
 
-	module_list |
+	git submodule--helper module_list --prefix "$wt_prefix" |
 	while read mode sha1 stage sm_path
 	do
 		die_if_unmatched "$mode"
@@ -592,7 +550,7 @@ cmd_init()
 		shift
 	done
 
-	module_list "$@" |
+	git submodule--helper module_list --prefix "$wt_prefix" "$@" |
 	while read mode sha1 stage sm_path
 	do
 		die_if_unmatched "$mode"
@@ -674,7 +632,7 @@ cmd_deinit()
 		die "$(eval_gettext "Use '.' if you really want to deinitialize all submodules")"
 	fi
 
-	module_list "$@" |
+	git submodule--helper module_list --prefix "$wt_prefix" "$@" |
 	while read mode sha1 stage sm_path
 	do
 		die_if_unmatched "$mode"
@@ -790,7 +748,7 @@ cmd_update()
 	fi
 
 	cloned_modules=
-	module_list "$@" | {
+	git submodule--helper module_list --prefix "$wt_prefix" "$@" | {
 	err=
 	while read mode sha1 stage sm_path
 	do
@@ -1222,7 +1180,7 @@ cmd_status()
 		shift
 	done
 
-	module_list "$@" |
+	git submodule--helper module_list --prefix "$wt_prefix" "$@" |
 	while read mode sha1 stage sm_path
 	do
 		die_if_unmatched "$mode"
@@ -1299,7 +1257,7 @@ cmd_sync()
 		esac
 	done
 	cd_to_toplevel
-	module_list "$@" |
+	git submodule--helper module_list --prefix "$wt_prefix" "$@" |
 	while read mode sha1 stage sm_path
 	do
 		die_if_unmatched "$mode"
diff --git a/git.c b/git.c
index 55c327c..deecba0 100644
--- a/git.c
+++ b/git.c
@@ -469,6 +469,7 @@ static struct cmd_struct commands[] = {
 	{ "stage", cmd_add, RUN_SETUP | NEED_WORK_TREE },
 	{ "status", cmd_status, RUN_SETUP | NEED_WORK_TREE },
 	{ "stripspace", cmd_stripspace },
+	{ "submodule--helper", cmd_submodule__helper, RUN_SETUP },
 	{ "symbolic-ref", cmd_symbolic_ref, RUN_SETUP },
 	{ "tag", cmd_tag, RUN_SETUP },
 	{ "unpack-file", cmd_unpack_file, RUN_SETUP },
-- 
2.5.0.264.g5e52b0d


* [PATCH 2/9] submodule: implement `module_name` as a builtin helper
  2015-08-28  1:14 [PATCH 0/9] Progress with git submodule Stefan Beller
  2015-08-28  1:14 ` [PATCH 1/9] submodule: implement `module_list` as a builtin helper Stefan Beller
@ 2015-08-28  1:14 ` Stefan Beller
  2015-08-28  1:14 ` [PATCH 3/9] submodule: implement `module_clone` " Stefan Beller
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Stefan Beller @ 2015-08-28  1:14 UTC (permalink / raw)
  To: git; +Cc: peff, jrnieder, gitster, johannes.schindelin, Stefan Beller

This implements the helper `module_name` in C instead of shell,
yielding a nice performance boost.

Before this patch, I measured a time (best out of three):

  $ time ./t7400-submodule-basic.sh  >/dev/null
    real	0m11.066s
    user	0m3.348s
    sys	0m8.534s

With this patch applied, I measured (also best out of three):

  $ time ./t7400-submodule-basic.sh  >/dev/null
    real	0m10.063s
    user	0m3.044s
    sys	0m7.487s

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/submodule--helper.c | 28 +++++++++++++++++++++++++++-
 git-submodule.sh            | 32 +++++++-------------------------
 2 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index beaab7d..c8f7e0c 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -5,6 +5,9 @@
 #include "pathspec.h"
 #include "dir.h"
 #include "utf8.h"
+#include "submodule.h"
+#include "submodule-config.h"
+#include "string-list.h"
 
 static const struct cache_entry **ce_entries;
 static int ce_alloc, ce_used;
@@ -101,6 +104,26 @@ static int module_list(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
+static int module_name(int argc, const char **argv, const char *prefix)
+{
+	const char *name;
+	const struct submodule *sub;
+
+	if (argc != 1)
+		usage("git submodule--helper module_name <path>\n");
+
+	gitmodules_config();
+	sub = submodule_from_path(null_sha1, argv[0]);
+
+	if (!sub)
+		die("No submodule mapping found in .gitmodules for path '%s'", argv[0]);
+
+	name = sub->name;
+	printf("%s\n", name);
+
+	return 0;
+}
+
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
 {
 	if (argc < 2)
@@ -109,6 +132,9 @@ int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
 	if (!strcmp(argv[1], "module_list"))
 		return module_list(argc - 1, argv + 1, prefix);
 
+	if (!strcmp(argv[1], "module_name"))
+		return module_name(argc - 2, argv + 2, prefix);
+
 usage:
-	usage("git submodule--helper module_list\n");
+	usage("git submodule--helper [module_list | module_name]\n");
 }
diff --git a/git-submodule.sh b/git-submodule.sh
index af9ecef..e6ff38d 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -178,24 +178,6 @@ get_submodule_config () {
 	printf '%s' "${value:-$default}"
 }
 
-
-#
-# Map submodule path to submodule name
-#
-# $1 = path
-#
-module_name()
-{
-	# Do we have "submodule.<something>.path = $1" defined in .gitmodules file?
-	sm_path="$1"
-	re=$(printf '%s\n' "$1" | sed -e 's/[].[^$\\*]/\\&/g')
-	name=$( git config -f .gitmodules --get-regexp '^submodule\..*\.path$' |
-		sed -n -e 's|^submodule\.\(.*\)\.path '"$re"'$|\1|p' )
-	test -z "$name" &&
-	die "$(eval_gettext "No submodule mapping found in .gitmodules for path '\$sm_path'")"
-	printf '%s\n' "$name"
-}
-
 #
 # Clone a submodule
 #
@@ -498,7 +480,7 @@ cmd_foreach()
 		then
 			displaypath=$(relative_path "$sm_path")
 			say "$(eval_gettext "Entering '\$prefix\$displaypath'")"
-			name=$(module_name "$sm_path")
+			name=$(git submodule--helper module_name "$sm_path")
 			(
 				prefix="$prefix$sm_path/"
 				clear_local_git_env
@@ -554,7 +536,7 @@ cmd_init()
 	while read mode sha1 stage sm_path
 	do
 		die_if_unmatched "$mode"
-		name=$(module_name "$sm_path") || exit
+		name=$(git submodule--helper module_name "$sm_path") || exit
 
 		displaypath=$(relative_path "$sm_path")
 
@@ -636,7 +618,7 @@ cmd_deinit()
 	while read mode sha1 stage sm_path
 	do
 		die_if_unmatched "$mode"
-		name=$(module_name "$sm_path") || exit
+		name=$(git submodule--helper module_name "$sm_path") || exit
 
 		displaypath=$(relative_path "$sm_path")
 
@@ -758,7 +740,7 @@ cmd_update()
 			echo >&2 "Skipping unmerged submodule $prefix$sm_path"
 			continue
 		fi
-		name=$(module_name "$sm_path") || exit
+		name=$(git submodule--helper module_name "$sm_path") || exit
 		url=$(git config submodule."$name".url)
 		branch=$(get_submodule_config "$name" branch master)
 		if ! test -z "$update"
@@ -1022,7 +1004,7 @@ cmd_summary() {
 			# Respect the ignore setting for --for-status.
 			if test -n "$for_status"
 			then
-				name=$(module_name "$sm_path")
+				name=$(git submodule--helper module_name "$sm_path")
 				ignore_config=$(get_submodule_config "$name" ignore none)
 				test $status != A && test $ignore_config = all && continue
 			fi
@@ -1184,7 +1166,7 @@ cmd_status()
 	while read mode sha1 stage sm_path
 	do
 		die_if_unmatched "$mode"
-		name=$(module_name "$sm_path") || exit
+		name=$(git submodule--helper module_name "$sm_path") || exit
 		url=$(git config submodule."$name".url)
 		displaypath=$(relative_path "$prefix$sm_path")
 		if test "$stage" = U
@@ -1261,7 +1243,7 @@ cmd_sync()
 	while read mode sha1 stage sm_path
 	do
 		die_if_unmatched "$mode"
-		name=$(module_name "$sm_path")
+		name=$(git submodule--helper module_name "$sm_path")
 		url=$(git config -f .gitmodules --get submodule."$name".url)
 
 		# Possibly a url relative to parent
-- 
2.5.0.264.g5e52b0d


* [PATCH 3/9] submodule: implement `module_clone` as a builtin helper
  2015-08-28  1:14 [PATCH 0/9] Progress with git submodule Stefan Beller
  2015-08-28  1:14 ` [PATCH 1/9] submodule: implement `module_list` as a builtin helper Stefan Beller
  2015-08-28  1:14 ` [PATCH 2/9] submodule: implement `module_name` " Stefan Beller
@ 2015-08-28  1:14 ` Stefan Beller
  2015-08-31 18:53   ` Junio C Hamano
  2015-08-28  1:14 ` [PATCH 4/9] thread-utils: add a threaded task queue Stefan Beller
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Stefan Beller @ 2015-08-28  1:14 UTC (permalink / raw)
  To: git; +Cc: peff, jrnieder, gitster, johannes.schindelin, Stefan Beller

`module_clone` is part of the update command,
which I want to convert to C next.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/submodule--helper.c | 148 +++++++++++++++++++++++++++++++++++++++++++-
 git-submodule.sh            |  80 +-----------------------
 2 files changed, 150 insertions(+), 78 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index c8f7e0c..d29499c 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -8,6 +8,7 @@
 #include "submodule.h"
 #include "submodule-config.h"
 #include "string-list.h"
+#include "run-command.h"
 
 static const struct cache_entry **ce_entries;
 static int ce_alloc, ce_used;
@@ -124,6 +125,147 @@ static int module_name(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
+static int clone_submodule(const char *path, const char *gitdir, const char *url,
+			   const char *depth, const char *reference, int quiet)
+{
+	struct child_process cp;
+	child_process_init(&cp);
+
+	argv_array_push(&cp.args, "clone");
+	argv_array_push(&cp.args, "--no-checkout");
+	if (quiet)
+		argv_array_push(&cp.args, "--quiet");
+	if (depth && strcmp(depth, "")) {
+		argv_array_push(&cp.args, "--depth");
+		argv_array_push(&cp.args, depth);
+	}
+	if (reference && strcmp(reference, "")) {
+		argv_array_push(&cp.args, "--reference");
+		argv_array_push(&cp.args, reference);
+	}
+	if (gitdir) {
+		argv_array_push(&cp.args, "--separate-git-dir");
+		argv_array_push(&cp.args, gitdir);
+	}
+	argv_array_push(&cp.args, url);
+	argv_array_push(&cp.args, path);
+
+	cp.git_cmd = 1;
+	cp.env = local_repo_env;
+
+	cp.no_stdin = 1;
+	cp.no_stdout = 1;
+	cp.no_stderr = 1;
+
+	return run_command(&cp);
+}
+
+static int module_clone(int argc, const char **argv, const char *prefix)
+{
+	const char *path = NULL, *name = NULL, *url = NULL;
+	const char *reference = NULL, *depth = NULL;
+	int quiet = 0;
+	FILE *submodule_dot_git;
+	const char *sm_gitdir, *p;
+	struct strbuf rel_path = STRBUF_INIT;
+	struct strbuf sb = STRBUF_INIT;
+
+	struct option module_update_options[] = {
+		OPT_STRING(0, "prefix", &alternative_path,
+			   N_("path"),
+			   N_("alternative anchor for relative paths")),
+		OPT_STRING(0, "path", &path,
+			   N_("path"),
+			   N_("where the new submodule will be cloned to")),
+		OPT_STRING(0, "name", &name,
+			   N_("string"),
+			   N_("name of the new submodule")),
+		OPT_STRING(0, "url", &url,
+			   N_("string"),
+			   N_("url where to clone the submodule from")),
+		OPT_STRING(0, "reference", &reference,
+			   N_("string"),
+			   N_("reference repository")),
+		OPT_STRING(0, "depth", &depth,
+			   N_("string"),
+			   N_("depth for shallow clones")),
+		OPT_END()
+	};
+
+	static const char * const git_submodule_helper_usage[] = {
+		N_("git submodule--helper update [--prefix=<path>] [--quiet] [--remote] [-N|--no-fetch]"
+		   "[-f|--force] [--rebase|--merge] [--reference <repository>]"
+		   "[--depth <depth>] [--recursive] [--] [<path>...]"),
+		NULL
+	};
+
+	argc = parse_options(argc, argv, prefix, module_update_options,
+			     git_submodule_helper_usage, 0);
+
+	if (getenv("GIT_QUIET"))
+		quiet = 1;
+
+	strbuf_addf(&sb, "%s/modules/%s", get_git_dir(), name);
+	sm_gitdir = strbuf_detach(&sb, NULL);
+
+	if (!file_exists(sm_gitdir)) {
+		safe_create_leading_directories_const(sm_gitdir);
+		if (clone_submodule(path, sm_gitdir, url, depth, reference, quiet))
+			die(N_("Clone of '%s' into submodule path '%s' failed"),
+			    url, path);
+	} else {
+		safe_create_leading_directories_const(path);
+		unlink(sm_gitdir);
+	}
+
+	/* Write a .git file in the submodule to redirect to the superproject. */
+	if (alternative_path && !strcmp(alternative_path, "")) {
+		p = relative_path(path, alternative_path, &sb);
+		strbuf_reset(&sb);
+	} else
+		p = path;
+
+	if (safe_create_leading_directories_const(p) < 0)
+		die("Could not create directory '%s'", p);
+
+	strbuf_addf(&sb, "%s/.git", p);
+
+	if (safe_create_leading_directories_const(sb.buf) < 0)
+		die(_("could not create leading directories of '%s'"), sb.buf);
+	submodule_dot_git = fopen(sb.buf, "w");
+	if (!submodule_dot_git)
+		die ("Cannot open file '%s': %s", sb.buf, strerror(errno));
+
+	fprintf(submodule_dot_git, "gitdir: %s\n",
+		relative_path(sm_gitdir, path, &rel_path));
+	if (fclose(submodule_dot_git))
+		die("Could not close file %s", sb.buf);
+	strbuf_reset(&sb);
+
+	/* Redirect the worktree of the submodule in the superprojects config */
+	if (!is_absolute_path(sm_gitdir)) {
+		char *s = (char*)sm_gitdir;
+		if (strbuf_getcwd(&sb))
+			die_errno("unable to get current working directory");
+		strbuf_addf(&sb, "/%s", sm_gitdir);
+		sm_gitdir = strbuf_detach(&sb, NULL);
+		free(s);
+	}
+
+	if (strbuf_getcwd(&sb))
+		die_errno("unable to get current working directory");
+	strbuf_addf(&sb, "/%s", path);
+
+	p = git_pathdup_submodule(path, "config");
+	if (!p)
+		die("Could not get submodule directory for '%s'", path);
+	git_config_set_in_file(p, "core.worktree",
+			       relative_path(sb.buf, sm_gitdir, &rel_path));
+	strbuf_release(&sb);
+	free((char*)sm_gitdir);
+	return 0;
+}
+
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
 {
 	if (argc < 2)
@@ -135,6 +277,10 @@ int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
 	if (!strcmp(argv[1], "module_name"))
 		return module_name(argc - 2, argv + 2, prefix);
 
+	if (!strcmp(argv[1], "module_clone"))
+		return module_clone(argc - 1, argv + 1, prefix);
+
 usage:
-	usage("git submodule--helper [module_list | module_name]\n");
+	usage("git submodule--helper [module_list | module_name | "
+	      "module_clone]\n");
 }
diff --git a/git-submodule.sh b/git-submodule.sh
index e6ff38d..fb5155e 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -178,80 +178,6 @@ get_submodule_config () {
 	printf '%s' "${value:-$default}"
 }
 
-#
-# Clone a submodule
-#
-# $1 = submodule path
-# $2 = submodule name
-# $3 = URL to clone
-# $4 = reference repository to reuse (empty for independent)
-# $5 = depth argument for shallow clones (empty for deep)
-#
-# Prior to calling, cmd_update checks that a possibly existing
-# path is not a git repository.
-# Likewise, cmd_add checks that path does not exist at all,
-# since it is the location of a new submodule.
-#
-module_clone()
-{
-	sm_path=$1
-	name=$2
-	url=$3
-	reference="$4"
-	depth="$5"
-	quiet=
-	if test -n "$GIT_QUIET"
-	then
-		quiet=-q
-	fi
-
-	gitdir=
-	gitdir_base=
-	base_name=$(dirname "$name")
-
-	gitdir=$(git rev-parse --git-dir)
-	gitdir_base="$gitdir/modules/$base_name"
-	gitdir="$gitdir/modules/$name"
-
-	if test -d "$gitdir"
-	then
-		mkdir -p "$sm_path"
-		rm -f "$gitdir/index"
-	else
-		mkdir -p "$gitdir_base"
-		(
-			clear_local_git_env
-			git clone $quiet ${depth:+"$depth"} -n ${reference:+"$reference"} \
-				--separate-git-dir "$gitdir" "$url" "$sm_path"
-		) ||
-		die "$(eval_gettext "Clone of '\$url' into submodule path '\$sm_path' failed")"
-	fi
-
-	# We already are at the root of the work tree but cd_to_toplevel will
-	# resolve any symlinks that might be present in $PWD
-	a=$(cd_to_toplevel && cd "$gitdir" && pwd)/
-	b=$(cd_to_toplevel && cd "$sm_path" && pwd)/
-	# Remove all common leading directories after a sanity check
-	if test "${a#$b}" != "$a" || test "${b#$a}" != "$b"; then
-		die "$(eval_gettext "Gitdir '\$a' is part of the submodule path '\$b' or vice versa")"
-	fi
-	while test "${a%%/*}" = "${b%%/*}"
-	do
-		a=${a#*/}
-		b=${b#*/}
-	done
-	# Now chop off the trailing '/'s that were added in the beginning
-	a=${a%/}
-	b=${b%/}
-
-	# Turn each leading "*/" component into "../"
-	rel=$(printf '%s\n' "$b" | sed -e 's|[^/][^/]*|..|g')
-	printf '%s\n' "gitdir: $rel/$a" >"$sm_path/.git"
-
-	rel=$(printf '%s\n' "$a" | sed -e 's|[^/][^/]*|..|g')
-	(clear_local_git_env; cd "$sm_path" && GIT_WORK_TREE=. git config core.worktree "$rel/$b")
-}
-
 isnumber()
 {
 	n=$(($1 + 0)) 2>/dev/null && test "$n" = "$1"
@@ -301,7 +227,7 @@ cmd_add()
 			shift
 			;;
 		--depth=*)
-			depth=$1
+			depth="$1"
 			;;
 		--)
 			shift
@@ -412,7 +338,7 @@ Use -f if you really want to add it." >&2
 				echo "$(eval_gettext "Reactivating local git directory for submodule '\$sm_name'.")"
 			fi
 		fi
-		module_clone "$sm_path" "$sm_name" "$realrepo" "$reference" "$depth" || exit
+		git submodule--helper module_clone --prefix "$wt_prefix" --path "$sm_path" --name "$sm_name" --url "$realrepo" "$reference" "$depth" || exit
 		(
 			clear_local_git_env
 			cd "$sm_path" &&
@@ -774,7 +700,7 @@ Maybe you want to use 'update --init'?")"
 
 		if ! test -d "$sm_path"/.git && ! test -f "$sm_path"/.git
 		then
-			module_clone "$sm_path" "$name" "$url" "$reference" "$depth" || exit
+			git submodule--helper module_clone --prefix "$prefix" --path "$sm_path" --name "$name" --url "$url" "$reference" "$depth" || exit
 			cloned_modules="$cloned_modules;$name"
 			subsha1=
 		else
-- 
2.5.0.264.g5e52b0d


* [PATCH 4/9] thread-utils: add a threaded task queue
  2015-08-28  1:14 [PATCH 0/9] Progress with git submodule Stefan Beller
                   ` (2 preceding siblings ...)
  2015-08-28  1:14 ` [PATCH 3/9] submodule: implement `module_clone` " Stefan Beller
@ 2015-08-28  1:14 ` Stefan Beller
  2015-08-28  1:14 ` [PATCH 5/9] run-command: add synced output Stefan Beller
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Stefan Beller @ 2015-08-28  1:14 UTC (permalink / raw)
  To: git; +Cc: peff, jrnieder, gitster, johannes.schindelin, Stefan Beller

This adds functionality to do work in a parallel, threaded
fashion, while the boilerplate code for setting up threads
and tearing them down, as well as for queuing up tasks, is
hidden behind the new API.
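
To illustrate, here is a minimal sketch of how a caller is expected to
use the new API; the task function, its string payload and the messages
are made up for this example:

    /* assumes "cache.h" and "thread-utils.h" are included */
    static int process_one(struct task_queue *tq, void *task)
    {
            const char *label = task;       /* example payload */
            fprintf(stderr, "working on %s\n", label);
            return 0;       /* non-zero asks the queue to wind down early */
    }

    static void example(void)
    {
            /* 0 means: use one thread per available core */
            struct task_queue *tq = create_task_queue(0);

            add_task(tq, process_one, (void *)"task A");
            add_task(tq, process_one, (void *)"task B");

            /* wait for all tasks; non-zero if any task returned non-zero */
            if (finish_task_queue(tq, NULL))
                    error("not all tasks completed successfully");
    }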

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 run-command.c  |  39 +++++++-----
 run-command.h  |   3 +
 thread-utils.c | 192 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 thread-utils.h |  35 +++++++++++
 4 files changed, 253 insertions(+), 16 deletions(-)

diff --git a/run-command.c b/run-command.c
index 28e1d55..3d37f8c 100644
--- a/run-command.c
+++ b/run-command.c
@@ -610,10 +610,12 @@ static NORETURN void die_async(const char *err, va_list params)
 
 	if (!pthread_equal(main_thread, pthread_self())) {
 		struct async *async = pthread_getspecific(async_key);
-		if (async->proc_in >= 0)
-			close(async->proc_in);
-		if (async->proc_out >= 0)
-			close(async->proc_out);
+		if (async) {
+			if (async->proc_in >= 0)
+				close(async->proc_in);
+			if (async->proc_out >= 0)
+				close(async->proc_out);
+		}
 		pthread_exit((void *)128);
 	}
 
@@ -668,6 +670,22 @@ int git_atexit(void (*handler)(void))
 
 #endif
 
+void setup_main_thread(void)
+{
+	if (!main_thread_set) {
+		/*
+		 * We assume that the first time that start_async is called
+		 * it is from the main thread.
+		 */
+		main_thread_set = 1;
+		main_thread = pthread_self();
+		pthread_key_create(&async_key, NULL);
+		pthread_key_create(&async_die_counter, NULL);
+		set_die_routine(die_async);
+		set_die_is_recursing_routine(async_die_is_recursing);
+	}
+}
+
 int start_async(struct async *async)
 {
 	int need_in, need_out;
@@ -740,18 +758,7 @@ int start_async(struct async *async)
 	else if (async->out)
 		close(async->out);
 #else
-	if (!main_thread_set) {
-		/*
-		 * We assume that the first time that start_async is called
-		 * it is from the main thread.
-		 */
-		main_thread_set = 1;
-		main_thread = pthread_self();
-		pthread_key_create(&async_key, NULL);
-		pthread_key_create(&async_die_counter, NULL);
-		set_die_routine(die_async);
-		set_die_is_recursing_routine(async_die_is_recursing);
-	}
+	setup_main_thread();
 
 	if (proc_in >= 0)
 		set_cloexec(proc_in);
diff --git a/run-command.h b/run-command.h
index 5b4425a..176a5b2 100644
--- a/run-command.h
+++ b/run-command.h
@@ -119,4 +119,7 @@ struct async {
 int start_async(struct async *async);
 int finish_async(struct async *async);
 
+/* die gracefully from within threads */
+void setup_main_thread(void);
+
 #endif
diff --git a/thread-utils.c b/thread-utils.c
index a2135e0..30ccd79 100644
--- a/thread-utils.c
+++ b/thread-utils.c
@@ -1,5 +1,7 @@
 #include "cache.h"
 #include "thread-utils.h"
+#include "run-command.h"
+#include "git-compat-util.h"
 
 #if defined(hpux) || defined(__hpux) || defined(_hpux)
 #  include <sys/pstat.h>
@@ -75,3 +77,193 @@ int init_recursive_mutex(pthread_mutex_t *m)
 	}
 	return ret;
 }
+
+#ifndef NO_PTHREADS
+struct job_list {
+	int (*fct)(struct task_queue *tq, void *task);
+	void *task;
+	struct job_list *next;
+};
+
+struct task_queue {
+	pthread_mutex_t mutex;
+	pthread_cond_t cond_non_empty;
+
+	int queued_tasks;
+	struct job_list *first;
+	struct job_list *last;
+
+	pthread_t *threads;
+	unsigned max_threads;
+	unsigned max_tasks;
+
+	void (*finish_function)(struct task_queue *tq);
+	int early_return;
+};
+
+static void next_task(struct task_queue *tq,
+		      int (**fct)(struct task_queue *tq, void *task),
+		      void **task,
+		      int *early_return)
+{
+	struct job_list *job = NULL;
+
+	pthread_mutex_lock(&tq->mutex);
+	while (tq->queued_tasks == 0)
+		pthread_cond_wait(&tq->cond_non_empty, &tq->mutex);
+
+	tq->early_return |= *early_return;
+
+	if (!tq->early_return) {
+		job = tq->first;
+		tq->first = job->next;
+		if (!tq->first)
+			tq->last = NULL;
+		tq->queued_tasks--;
+	}
+
+	pthread_mutex_unlock(&tq->mutex);
+
+	if (job) {
+		*fct = job->fct;
+		*task = job->task;
+	} else {
+		*fct = NULL;
+		*task = NULL;
+	}
+
+	free(job);
+}
+
+static void *dispatcher(void *args)
+{
+	void *task;
+	int (*fct)(struct task_queue *tq, void *task);
+	int early_return = 0;
+	struct task_queue *tq = args;
+
+	next_task(tq, &fct, &task, &early_return);
+	while (fct && !early_return) {
+		early_return = fct(tq, task);
+		next_task(tq, &fct, &task, &early_return);
+	}
+
+	if (tq->finish_function)
+		tq->finish_function(tq);
+
+	pthread_exit(0);
+}
+
+struct task_queue *create_task_queue(unsigned max_threads)
+{
+	struct task_queue *tq = xmalloc(sizeof(*tq));
+
+	int i, ret;
+	if (!max_threads)
+		tq->max_threads = online_cpus();
+	else
+		tq->max_threads = max_threads;
+
+	pthread_mutex_init(&tq->mutex, NULL);
+	pthread_cond_init(&tq->cond_non_empty, NULL);
+
+	tq->threads = xmalloc(tq->max_threads * sizeof(pthread_t));
+
+	tq->queued_tasks = 0;
+	tq->first = NULL;
+	tq->last = NULL;
+
+	setup_main_thread();
+
+	for (i = 0; i < tq->max_threads; i++) {
+		ret = pthread_create(&tq->threads[i], 0, &dispatcher, tq);
+		if (ret)
+			die("unable to create thread: %s", strerror(ret));
+	}
+
+	tq->early_return = 0;
+
+	return tq;
+}
+
+void add_task(struct task_queue *tq,
+	      int (*fct)(struct task_queue *tq, void *task),
+	      void *task)
+{
+	struct job_list *job_list;
+
+	job_list = xmalloc(sizeof(*job_list));
+	job_list->task = task;
+	job_list->fct = fct;
+	job_list->next = NULL;
+
+	pthread_mutex_lock(&tq->mutex);
+
+	if (!tq->last) {
+		tq->last = job_list;
+		tq->first = tq->last;
+	} else {
+		tq->last->next = job_list;
+		tq->last = tq->last->next;
+	}
+	tq->queued_tasks++;
+
+	pthread_mutex_unlock(&tq->mutex);
+	pthread_cond_signal(&tq->cond_non_empty);
+}
+
+int finish_task_queue(struct task_queue *tq, void (*fct)(struct task_queue *tq))
+{
+	int ret;
+	int i;
+
+	tq->finish_function = fct;
+
+	for (i = 0; i < tq->max_threads; i++)
+		add_task(tq, NULL, NULL);
+
+	for (i = 0; i < tq->max_threads; i++)
+		pthread_join(tq->threads[i], 0);
+
+	pthread_mutex_destroy(&tq->mutex);
+	pthread_cond_destroy(&tq->cond_non_empty);
+
+	if (tq->first)
+		die("BUG: internal error with queuing jobs for threads");
+
+	free(tq->threads);
+	ret = tq->early_return;
+
+	free(tq);
+	return ret;
+}
+#else /* NO_PTHREADS */
+
+struct task_queue {
+	int early_return;
+};
+
+struct task_queue *create_task_queue(unsigned max_threads)
+{
+	struct task_queue *tq = xmalloc(sizeof(*tq));
+
+	tq->early_return = 0;
+}
+
+void add_task(struct task_queue *tq,
+	      int (*fct)(struct task_queue *tq, void *task),
+	      void *task)
+{
+	if (tq->early_return)
+		return;
+
+	tq->early_return |= fct(tq, task);
+}
+
+int finish_task_queue(struct task_queue *tq, void (*fct)(struct task_queue *tq))
+{
+	int ret = tq->early_return;
+	free(tq);
+	return ret;
+}
+#endif
diff --git a/thread-utils.h b/thread-utils.h
index d9a769d..f41cfb1 100644
--- a/thread-utils.h
+++ b/thread-utils.h
@@ -12,4 +12,39 @@ extern int init_recursive_mutex(pthread_mutex_t*);
 #define online_cpus() 1
 
 #endif
+
+/*
+ * Creates a struct `task_queue`, which holds a list of tasks. Up to
+ * `max_threads` threads are active, processing the enqueued tasks
+ * in first-in-first-out order.
+ *
+ * If `max_threads` is zero the number of cores available will be used.
+ *
+ * Currently this only works in environments with pthreads; in other
+ * environments, each task will be processed sequentially in `add_task`.
+ */
+struct task_queue *create_task_queue(unsigned max_threads);
+
+/*
+ * The function and data are put into the task queue.
+ *
+ * The function `fct` must not be NULL, as that's used internally
+ * in `finish_task_queue` to signal shutdown. If the return code
+ * of `fct` is non-zero, no further tasks will be started and
+ * the currently running tasks will be flushed out.
+ */
+void add_task(struct task_queue *tq,
+	      int (*fct)(struct task_queue *tq, void *task),
+	      void *task);
+
+/*
+ * Waits for all tasks to be done and frees the object. The return code
+ * is zero if all enqueued tasks were processed.
+ *
+ * The function `fct` is called once in each thread after the last task
+ * for that thread was processed. If no thread local cleanup needs to be
+ * performed, pass NULL.
+ */
+int finish_task_queue(struct task_queue *tq, void (*fct)(struct task_queue *tq));
+
 #endif /* THREAD_COMPAT_H */
-- 
2.5.0.264.g5e52b0d


* [PATCH 5/9] run-command: add synced output
  2015-08-28  1:14 [PATCH 0/9] Progress with git submodule Stefan Beller
                   ` (3 preceding siblings ...)
  2015-08-28  1:14 ` [PATCH 4/9] thread-utils: add a threaded task queue Stefan Beller
@ 2015-08-28  1:14 ` Stefan Beller
  2015-08-28  1:14 ` [PATCH 6/9] submodule: helper to run foreach in parallel Stefan Beller
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Stefan Beller @ 2015-08-28  1:14 UTC (permalink / raw)
  To: git; +Cc: peff, jrnieder, gitster, johannes.schindelin, Stefan Beller

In the last patch we added an easy way to get a thread pool. Now if we
want to run external commands from threads in that thread pool, the output
of the different threads will get mixed up.

To solve this problem we protect the output with a mutex so it cannot
become garbled. Each thread tries to acquire the mutex and, if it is lucky
enough to get it, pipes its output directly to standard output. A thread
that does not hold the mutex buffers its output instead.

Example:
Let's assume we have 5 tasks A,B,C,D,E and each task takes a different
amount of time (say `git fetch` for different submodules), then the output
of tasks in sequential order might look like this:

 time -->
 output: |---A---|   |-B-|   |----C-----------|   |-D-|   |-E-|

When we schedule these tasks into two threads, a schedule
and sample output over time may look like this:

thread 1: |---A---|   |-D-|   |-E-|

thread 2: |-B-|   |----C-----------|

output:   |---A---|B|------C-------|ED

So A will be perceived as if it ran normally in
the single-threaded version of foreach. As B has finished
by the time the mutex becomes available, its whole buffer
will just be dumped to standard output, which the user
will perceive as just a 'very fast' run of B.
Once that is done, C takes the mutex and flushes its piled-up
buffer to standard output. In case it is a
git command, we have a nice progress display, which will just
look as if the first half of C happened really quickly.

Notice how E and D come out in a different order than in the
sequential run, as the new parallel foreach doesn't care about
the order.

So this way of handling output is really good for human consumption,
as it only changes the timing, not the actual output.

For machine consumption the output needs to be prepared in
the tasks, by either having a prefix per line or per block
to indicate whose task's output is displayed.
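
For illustration, a rough sketch of how a worker thread could drive a
command through the two new fields; the shared mutex, the helper name
and the `git fetch` invocation are made up here, only the use of
`sync_mutex` and `sync_buf` follows this patch:

    static pthread_mutex_t output_mutex = PTHREAD_MUTEX_INITIALIZER;

    static int fetch_one(const char *dir)
    {
            struct strbuf out = STRBUF_INIT;
            struct child_process cp = CHILD_PROCESS_INIT;
            int code;

            argv_array_push(&cp.args, "fetch");
            cp.git_cmd = 1;
            cp.no_stdin = 1;
            cp.dir = dir;
            cp.stdout_to_stderr = 1;        /* required when sync_mutex is set */
            cp.err = -1;                    /* run_command() reads and buffers this pipe */
            cp.sync_mutex = &output_mutex;  /* shared by all worker threads */
            cp.sync_buf = &out;             /* holds output while the mutex is taken */

            code = run_command(&cp);
            strbuf_release(&out);
            return code;
    }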

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 run-command.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 run-command.h | 18 ++++++++++++++++++
 2 files changed, 78 insertions(+)

diff --git a/run-command.c b/run-command.c
index 3d37f8c..bac4367 100644
--- a/run-command.c
+++ b/run-command.c
@@ -556,17 +556,77 @@ int finish_command(struct child_process *cmd)
 }
 
 int run_command(struct child_process *cmd)
+#ifdef NO_PTHREADS
 {
 	int code;
 
 	if (cmd->out < 0 || cmd->err < 0)
 		die("BUG: run_command with a pipe can cause deadlock");
 
+	if (cmd->sync_buf)
+		xwrite(cmd->out, cmd->sync_buf->buf, cmd->sync_buf->len);
+
 	code = start_command(cmd);
 	if (code)
 		return code;
 	return finish_command(cmd);
 }
+#else
+{
+	int code, lock_acquired = 0;
+
+	if (!cmd->sync_mutex) {
+		if (cmd->out < 0 || cmd->err < 0)
+			die("BUG: run_command with a pipe can cause deadlock");
+
+		if (cmd->sync_buf)
+			xwrite(cmd->out, cmd->sync_buf->buf, cmd->sync_buf->len);
+	} else {
+		if (cmd->out < 0)
+			die("BUG: run_command with a pipe can cause deadlock");
+
+		if (!cmd->stdout_to_stderr)
+			die("BUG: run_command with sync_mutex not supported "
+			    "without stdout_to_stderr set");
+
+		if (!cmd->sync_buf)
+			die("BUG: Must pass a buffer when specifying "
+			    "to sync output");
+	}
+
+	code = start_command(cmd);
+	if (code)
+		return code;
+
+	if (cmd->sync_mutex) {
+		while (1) {
+			char buf[1024];
+			ssize_t len = xread(cmd->err, buf, sizeof(buf));
+			if (len < 0)
+				die("Read from command failed");
+			else if (len == 0)
+				break;
+			else
+				strbuf_add(cmd->sync_buf, buf, len);
+
+			if (!lock_acquired
+			    && !pthread_mutex_trylock(cmd->sync_mutex))
+				lock_acquired = 1;
+			if (lock_acquired) {
+				fputs(cmd->sync_buf->buf, stderr);
+				strbuf_reset(cmd->sync_buf);
+			}
+		}
+		if (!lock_acquired)
+			pthread_mutex_lock(cmd->sync_mutex);
+
+		fputs(cmd->sync_buf->buf, stderr);
+		pthread_mutex_unlock(cmd->sync_mutex);
+	}
+
+	return finish_command(cmd);
+}
+#endif
 
 int run_command_v_opt(const char **argv, int opt)
 {
diff --git a/run-command.h b/run-command.h
index 176a5b2..0df83c9 100644
--- a/run-command.h
+++ b/run-command.h
@@ -43,6 +43,24 @@ struct child_process {
 	unsigned stdout_to_stderr:1;
 	unsigned use_shell:1;
 	unsigned clean_on_exit:1;
+#ifndef NO_PTHREADS
+	/*
+	 * In a threaded environment running multiple commands from different
+	 * threads would lead to garbled output as the output of different
+	 * commands would mix.
+	 * If this mutex is not NULL, the output of `err` is only done when
+	 * holding the mutex. If the mutex cannot be acquired, the output
+	 * will be buffered until the mutex can be acquired.
+	 */
+	pthread_mutex_t *sync_mutex;
+#endif
+	/*
+	 * The sync_buf will be used to buffer the output while the mutex
+	 * is not acquired. It can also contain data before being passed into
+	 * run_command, which will be output together with the output of
+	 * the child.
+	 */
+	struct strbuf *sync_buf;
 };
 
 #define CHILD_PROCESS_INIT { NULL, ARGV_ARRAY_INIT, ARGV_ARRAY_INIT }
-- 
2.5.0.264.g5e52b0d


* [PATCH 6/9] submodule: helper to run foreach in parallel
  2015-08-28  1:14 [PATCH 0/9] Progress with git submodule Stefan Beller
                   ` (4 preceding siblings ...)
  2015-08-28  1:14 ` [PATCH 5/9] run-command: add synced output Stefan Beller
@ 2015-08-28  1:14 ` Stefan Beller
  2015-08-28 17:08   ` Stefan Beller
  2015-08-28  1:14 ` [PATCH 7/9] fetch: fetch submodules " Stefan Beller
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Stefan Beller @ 2015-08-28  1:14 UTC (permalink / raw)
  To: git; +Cc: peff, jrnieder, gitster, johannes.schindelin, Stefan Beller

Similar to `git submodule foreach`, the new command `git submodule
foreach_parallel` will run a command on each submodule.

The commands are run in parallel, up to the number of cores by default,
or you can specify '-j 4' to run with just 4 threads, for example.

One major difference from `git submodule foreach` is the handling of input
and output of the commands. Because of the parallel nature of the execution,
it is not trivial to schedule the std{in,out,err} channels for the submodule
a command is run in. So in this patch there is no support for stdin,
stdout is piped to stderr, and stderr makes use of the synchronized
output feature of run_command.
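
Concretely, run_cmd_submodule() in the patch below sets up each task's
child process roughly like this (condensed excerpt):

    cp->no_stdin = 1;               /* no support for stdin */
    cp->stdout_to_stderr = 1;       /* stdout is piped to stderr */
    cp->err = -1;
    cp->sync_mutex = args->sync;    /* one mutex shared by all tasks */
    cp->sync_buf = &out;            /* "Entering <path>" plus buffered output */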

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/submodule--helper.c  | 133 ++++++++++++++++++++++++++++++++++++++++++-
 git-submodule.sh             |  11 +++-
 t/t7407-submodule-foreach.sh |  11 ++++
 3 files changed, 153 insertions(+), 2 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index d29499c..18b67f0 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -8,6 +8,7 @@
 #include "submodule.h"
 #include "submodule-config.h"
 #include "string-list.h"
+#include "thread-utils.h"
 #include "run-command.h"
 
 static const struct cache_entry **ce_entries;
@@ -266,6 +267,127 @@ static int module_clone(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
+#ifndef NO_PTHREADS
+struct submodule_args {
+	const char *name;
+	const char *path;
+	const char *sha1;
+	const char *toplevel;
+	const char *prefix;
+	const char **cmd;
+	pthread_mutex_t *sync;
+};
+
+int run_cmd_submodule(struct task_queue *aq, void *task)
+{
+	int i;
+	struct submodule_args *args = task;
+	struct strbuf out = STRBUF_INIT;
+	struct strbuf sb = STRBUF_INIT;
+	struct child_process *cp = xmalloc(sizeof(*cp));
+
+	strbuf_addf(&out, N_("Entering %s\n"), relative_path(args->path, args->prefix, &sb));
+
+	child_process_init(cp);
+	argv_array_pushv(&cp->args, args->cmd);
+
+	argv_array_pushf(&cp->env_array, "name=%s", args->name);
+	argv_array_pushf(&cp->env_array, "path=%s", args->path);
+	argv_array_pushf(&cp->env_array, "sha1=%s", args->sha1);
+	argv_array_pushf(&cp->env_array, "toplevel=%s", args->toplevel);
+
+	for (i = 0; local_repo_env[i]; i++)
+		argv_array_push(&cp->env_array, local_repo_env[i]);
+
+	cp->no_stdin = 1;
+	cp->out = 0;
+	cp->err = -1;
+	cp->dir = args->path;
+	cp->stdout_to_stderr = 1;
+	cp->use_shell = 1;
+	cp->sync_mutex = args->sync;
+	cp->sync_buf = &out;
+
+	return run_command(cp);
+}
+
+int module_foreach_parallel(int argc, const char **argv, const char *prefix)
+{
+	int i, recursive = 0, number_threads = 0, quiet = 0;
+	static struct pathspec pathspec;
+	struct strbuf sb = STRBUF_INIT;
+	struct task_queue *aq;
+	char **cmd;
+	const char **nullargv = {NULL};
+	pthread_mutex_t mutex;
+
+	struct option module_update_options[] = {
+		OPT_STRING(0, "prefix", &alternative_path,
+			   N_("path"),
+			   N_("alternative anchor for relative paths")),
+		OPT_STRING(0, "cmd", &cmd,
+			   N_("string"),
+			   N_("command to run")),
+		OPT_BOOL('r', "recursive", &recursive,
+			 N_("Recurse into nested submodules")),
+		OPT_INTEGER('j', "jobs", &number_threads,
+			    N_("Number of threads to run in parallel")),
+		OPT__QUIET(&quiet, N_("Suppress output")),
+		OPT_END()
+	};
+
+	static const char * const git_submodule_helper_usage[] = {
+		N_("git submodule--helper foreach [--prefix=<path>] [<path>...]"),
+		NULL
+	};
+
+	argc = parse_options(argc, argv, prefix, module_update_options,
+			     git_submodule_helper_usage, 0);
+
+	if (module_list_compute(0, nullargv, NULL, &pathspec) < 0)
+		return 1;
+
+	gitmodules_config();
+
+	pthread_mutex_init(&mutex, NULL);
+	aq = create_task_queue(number_threads);
+
+	for (i = 0; i < ce_used; i++) {
+		const struct submodule *sub;
+		const struct cache_entry *ce = ce_entries[i];
+		struct submodule_args *args = malloc(sizeof(*args));
+
+		if (ce_stage(ce))
+			args->sha1 = xstrdup(sha1_to_hex(null_sha1));
+		else
+			args->sha1 = xstrdup(sha1_to_hex(ce->sha1));
+
+		strbuf_reset(&sb);
+		strbuf_addf(&sb, "%s/.git", ce->name);
+		if (!file_exists(sb.buf)) {
+			free(args);
+			continue;
+		}
+
+		args->path = ce->name;
+		sub = submodule_from_path(null_sha1, args->path);
+		if (!sub)
+			die("No submodule mapping found in .gitmodules for path '%s'", args->path);
+
+		args->name = sub->name;
+		args->toplevel = xgetcwd();
+		args->cmd = argv;
+		args->sync = &mutex;
+		args->prefix = alternative_path;
+		add_task(aq, run_cmd_submodule, args);
+	}
+
+	finish_task_queue(aq, NULL);
+	pthread_mutex_destroy(&mutex);
+	return 0;
+}
+#endif /* NO_PTHREADS */
+
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
 {
 	if (argc < 2)
@@ -280,7 +402,16 @@ int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
 	if (!strcmp(argv[1], "module_clone"))
 		return module_clone(argc - 1, argv + 1, prefix);
 
+#ifndef NO_PTHREADS
+	if (!strcmp(argv[1], "foreach_parallel"))
+		return module_foreach_parallel(argc - 1, argv + 1, prefix);
+#endif
+
 usage:
 	usage("git submodule--helper [module_list | module_name | "
-	      "module_clone]\n");
+	      "module_clone"
+#ifndef NO_PTHREADS
+	      " | foreach_parallel"
+#endif
+	      "]\n");
 }
diff --git a/git-submodule.sh b/git-submodule.sh
index fb5155e..f06488a 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -431,6 +431,15 @@ cmd_foreach()
 }
 
 #
+# Execute an arbitrary command sequence in each checked out
+# submodule in parallel.
+#
+cmd_foreach_parallel()
+{
+	git submodule--helper foreach_parallel --prefix "$wt_prefix" "$@"
+}
+
+#
 # Register submodules in .git/config
 #
 # $@ = requested paths (default to all)
@@ -1225,7 +1234,7 @@ cmd_sync()
 while test $# != 0 && test -z "$command"
 do
 	case "$1" in
-	add | foreach | init | deinit | update | status | summary | sync)
+	add | foreach | foreach_parallel | init | deinit | update | status | summary | sync)
 		command=$1
 		;;
 	-q|--quiet)
diff --git a/t/t7407-submodule-foreach.sh b/t/t7407-submodule-foreach.sh
index 7ca10b8..16f6138 100755
--- a/t/t7407-submodule-foreach.sh
+++ b/t/t7407-submodule-foreach.sh
@@ -195,6 +195,17 @@ test_expect_success 'test "foreach --quiet --recursive"' '
 	test_cmp expect actual
 '
 
+test_expect_success 'test "foreach_parallel --quiet"' '
+	(
+		cd clone2 &&
+		git submodule foreach_parallel -q -- "echo \$name-\$path" > ../actual
+	) &&
+	grep nested1-nested1 actual &&
+	grep foo1-sub1 actual &&
+	grep foo2-sub2 actual &&
+	grep foo3-sub3 actual
+'
+
 test_expect_success 'use "update --recursive" to checkout all submodules' '
 	git clone super clone3 &&
 	(
-- 
2.5.0.264.g5e52b0d


* [PATCH 7/9] fetch: fetch submodules in parallel
  2015-08-28  1:14 [PATCH 0/9] Progress with git submodule Stefan Beller
                   ` (5 preceding siblings ...)
  2015-08-28  1:14 ` [PATCH 6/9] submodule: helper to run foreach in parallel Stefan Beller
@ 2015-08-28  1:14 ` Stefan Beller
  2015-08-28 17:00   ` Stefan Beller
  2015-08-31 18:56   ` Junio C Hamano
  2015-08-28  1:14 ` [PATCH 8/9] index-pack: Use the new worker pool Stefan Beller
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 33+ messages in thread
From: Stefan Beller @ 2015-08-28  1:14 UTC (permalink / raw)
  To: git; +Cc: peff, jrnieder, gitster, johannes.schindelin, Stefan Beller

This makes use of the new task queue and the syncing feature of
run-command to fetch a number of submodules at the same time.

The output will look as if the fetches had been run sequentially,
just faster.
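
Condensed, the per-submodule part of fetch_populated_submodules() below
becomes the following (the final wait on the queue is only assumed here
for completeness):

    tq = create_task_queue(max_parallel_jobs);
    for (i = 0; i < active_nr; i++) {
            /* ... skip index entries that are not populated submodules ... */
            spf = submodule_parallel_fetch_create();
            spf->result = &result;
            spf->cp.dir = strbuf_detach(&submodule_path, NULL);
            argv_array_pushv(&spf->argv, argv.argv);
            set_output_mutex(spf);
            add_task(tq, run_command_and_cleanup, spf);
    }
    finish_task_queue(tq, NULL);    /* assumed: wait for all fetches */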

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/fetch-options.txt |   7 +++
 builtin/fetch.c                 |   6 ++-
 builtin/pull.c                  |   6 +++
 submodule.c                     | 100 +++++++++++++++++++++++++++++++++-------
 submodule.h                     |   2 +-
 5 files changed, 102 insertions(+), 19 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 45583d8..e2a59c3 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -100,6 +100,13 @@ ifndef::git-pull[]
 	reference to a commit that isn't already in the local submodule
 	clone.
 
+-j::
+--jobs=<n>::
+	Number of threads to be used for fetching submodules. Each thread
+	will fetch from different submodules, such that fetching many
+	submodules will be faster. By default the number of CPUs will
+	be used.
+
 --no-recurse-submodules::
 	Disable recursive fetching of submodules (this has the same effect as
 	using the '--recurse-submodules=no' option).
diff --git a/builtin/fetch.c b/builtin/fetch.c
index ee1f1a9..636707e 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -37,6 +37,7 @@ static int prune = -1; /* unspecified */
 static int all, append, dry_run, force, keep, multiple, update_head_ok, verbosity;
 static int progress = -1, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow;
+static int max_threads;
 static const char *depth;
 static const char *upload_pack;
 static struct strbuf default_rla = STRBUF_INIT;
@@ -99,6 +100,8 @@ static struct option builtin_fetch_options[] = {
 		    N_("fetch all tags and associated objects"), TAGS_SET),
 	OPT_SET_INT('n', NULL, &tags,
 		    N_("do not fetch all tags (--no-tags)"), TAGS_UNSET),
+	OPT_INTEGER('j', "jobs", &max_threads,
+		    N_("number of threads used for fetching")),
 	OPT_BOOL('p', "prune", &prune,
 		 N_("prune remote-tracking branches no longer on remote")),
 	{ OPTION_CALLBACK, 0, "recurse-submodules", NULL, N_("on-demand"),
@@ -1217,7 +1220,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		result = fetch_populated_submodules(&options,
 						    submodule_prefix,
 						    recurse_submodules,
-						    verbosity < 0);
+						    verbosity < 0,
+						    max_threads);
 		argv_array_clear(&options);
 	}
 
diff --git a/builtin/pull.c b/builtin/pull.c
index 722a83c..fbbda67 100644
--- a/builtin/pull.c
+++ b/builtin/pull.c
@@ -94,6 +94,7 @@ static int opt_force;
 static char *opt_tags;
 static char *opt_prune;
 static char *opt_recurse_submodules;
+static char *max_threads;
 static int opt_dry_run;
 static char *opt_keep;
 static char *opt_depth;
@@ -177,6 +178,9 @@ static struct option pull_options[] = {
 		N_("on-demand"),
 		N_("control recursive fetching of submodules"),
 		PARSE_OPT_OPTARG),
+	OPT_PASSTHRU('j', "jobs", &max_threads, N_("n"),
+		N_("number of threads used for fetching submodules"),
+		PARSE_OPT_OPTARG),
 	OPT_BOOL(0, "dry-run", &opt_dry_run,
 		N_("dry run")),
 	OPT_PASSTHRU('k', "keep", &opt_keep, NULL,
@@ -524,6 +528,8 @@ static int run_fetch(const char *repo, const char **refspecs)
 		argv_array_push(&args, opt_prune);
 	if (opt_recurse_submodules)
 		argv_array_push(&args, opt_recurse_submodules);
+	if (max_threads)
+		argv_array_push(&args, max_threads);
 	if (opt_dry_run)
 		argv_array_push(&args, "--dry-run");
 	if (opt_keep)
diff --git a/submodule.c b/submodule.c
index 9fcc86f..50266a8 100644
--- a/submodule.c
+++ b/submodule.c
@@ -12,6 +12,7 @@
 #include "sha1-array.h"
 #include "argv-array.h"
 #include "blob.h"
+#include "thread-utils.h"
 
 static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
 static struct string_list changed_submodule_paths;
@@ -615,13 +616,79 @@ static void calculate_changed_submodule_paths(void)
 	initialized_fetch_ref_tips = 0;
 }
 
+struct submodule_parallel_fetch {
+	struct child_process cp;
+	struct argv_array argv;
+	int *result;
+};
+
+#ifndef NO_PTHREADS
+static pthread_mutex_t output_mutex = PTHREAD_MUTEX_INITIALIZER;
+
+static void set_output_mutex(struct submodule_parallel_fetch *spf)
+{
+	spf->cp.sync_mutex = &output_mutex;
+}
+
+#define lock_output_mutex() pthread_mutex_lock(&output_mutex)
+
+#define unlock_output_mutex() pthread_mutex_unlock(&output_mutex)
+
+static void destroy_output_mutex()
+{
+	pthread_mutex_destroy(&output_mutex);
+}
+
+#else
+#define set_output_mutex()
+#define destroy_output_mutex()
+#define lock_output_mutex()
+#define unlock_output_mutex()
+#endif
+
+static struct submodule_parallel_fetch *submodule_parallel_fetch_create()
+{
+	struct submodule_parallel_fetch *spf = xmalloc(sizeof(*spf));
+	child_process_init(&spf->cp);
+	spf->cp.env = local_repo_env;
+	spf->cp.git_cmd = 1;
+	spf->cp.no_stdin = 1;
+	spf->cp.stdout_to_stderr = 1;
+	spf->cp.sync_buf = xmalloc(sizeof(spf->cp.sync_buf));
+	strbuf_init(spf->cp.sync_buf, 0);
+
+	argv_array_init(&spf->argv);
+	return spf;
+}
+
+static int run_command_and_cleanup(struct task_queue *tq, void *arg)
+{
+	int code;
+	struct submodule_parallel_fetch *spf = arg;
+
+	spf->cp.argv = spf->argv.argv;
+
+	code = run_command(&spf->cp);
+	if (code) {
+		lock_output_mutex();
+		*spf->result = code;
+		unlock_output_mutex();
+	}
+
+	argv_array_clear(&spf->argv);
+	free((char*)spf->cp.dir);
+	free(spf);
+	return 0;
+}
+
 int fetch_populated_submodules(const struct argv_array *options,
 			       const char *prefix, int command_line_option,
-			       int quiet)
+			       int quiet, int max_parallel_jobs)
 {
 	int i, result = 0;
-	struct child_process cp = CHILD_PROCESS_INIT;
+	struct task_queue *tq;
 	struct argv_array argv = ARGV_ARRAY_INIT;
+	struct submodule_parallel_fetch *spf;
 	const char *work_tree = get_git_work_tree();
 	if (!work_tree)
 		goto out;
@@ -635,12 +702,9 @@ int fetch_populated_submodules(const struct argv_array *options,
 	argv_array_push(&argv, "--recurse-submodules-default");
 	/* default value, "--submodule-prefix" and its value are added later */
 
-	cp.env = local_repo_env;
-	cp.git_cmd = 1;
-	cp.no_stdin = 1;
-
 	calculate_changed_submodule_paths();
 
+	tq = create_task_queue(max_parallel_jobs);
 	for (i = 0; i < active_nr; i++) {
 		struct strbuf submodule_path = STRBUF_INIT;
 		struct strbuf submodule_git_dir = STRBUF_INIT;
@@ -693,24 +757,26 @@ int fetch_populated_submodules(const struct argv_array *options,
 		if (!git_dir)
 			git_dir = submodule_git_dir.buf;
 		if (is_directory(git_dir)) {
+			spf = submodule_parallel_fetch_create();
+			spf->result = &result;
+			spf->cp.dir = strbuf_detach(&submodule_path, NULL);
 			if (!quiet)
-				printf("Fetching submodule %s%s\n", prefix, ce->name);
-			cp.dir = submodule_path.buf;
-			argv_array_push(&argv, default_argv);
-			argv_array_push(&argv, "--submodule-prefix");
-			argv_array_push(&argv, submodule_prefix.buf);
-			cp.argv = argv.argv;
-			if (run_command(&cp))
-				result = 1;
-			argv_array_pop(&argv);
-			argv_array_pop(&argv);
-			argv_array_pop(&argv);
+				strbuf_addf(spf->cp.sync_buf,
+					    "Fetching submodule %s%s",
+					    prefix, ce->name);
+			argv_array_pushv(&spf->argv, argv.argv);
+			argv_array_push(&spf->argv, default_argv);
+			argv_array_push(&spf->argv, "--submodule-prefix");
+			argv_array_push(&spf->argv, submodule_prefix.buf);
+			set_output_mutex(spf);
+			add_task(tq, run_command_and_cleanup, spf);
 		}
 		strbuf_release(&submodule_path);
 		strbuf_release(&submodule_git_dir);
 		strbuf_release(&submodule_prefix);
 	}
 	argv_array_clear(&argv);
+	destroy_output_mutex();
 out:
 	string_list_clear(&changed_submodule_paths, 1);
 	return result;
diff --git a/submodule.h b/submodule.h
index 5507c3d..cbc0003 100644
--- a/submodule.h
+++ b/submodule.h
@@ -31,7 +31,7 @@ void set_config_fetch_recurse_submodules(int value);
 void check_for_new_submodule_commits(unsigned char new_sha1[20]);
 int fetch_populated_submodules(const struct argv_array *options,
 			       const char *prefix, int command_line_option,
-			       int quiet);
+			       int quiet, int max_parallel_jobs);
 unsigned is_submodule_modified(const char *path, int ignore_untracked);
 int submodule_uses_gitfile(const char *path);
 int ok_to_remove_submodule(const char *path);
-- 
2.5.0.264.g5e52b0d

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 8/9] index-pack: Use the new worker pool
  2015-08-28  1:14 [PATCH 0/9] Progress with git submodule Stefan Beller
                   ` (6 preceding siblings ...)
  2015-08-28  1:14 ` [PATCH 7/9] fetch: fetch submodules " Stefan Beller
@ 2015-08-28  1:14 ` Stefan Beller
  2015-08-28  1:14 ` [PATCH 9/9] pack-objects: Use " Stefan Beller
  2015-08-28 10:09 ` [PATCH 0/9] Progress with git submodule Johannes Schindelin
  9 siblings, 0 replies; 33+ messages in thread
From: Stefan Beller @ 2015-08-28  1:14 UTC (permalink / raw)
  To: git; +Cc: peff, jrnieder, gitster, johannes.schindelin, Stefan Beller

This demonstrates how the new threading API may be used.
There is no change in the workflow, just using the new
threading API instead of keeping track of the pthreads
ourselves.
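
For illustration, a minimal sketch of how a caller is meant to drive the
task queue (assuming this series is applied; do_one_item() and
process_all() are made-up placeholder names, the rest is the API added in
patch 4/9 and used in the diff below):

#include "cache.h"
#include "thread-utils.h"

static int do_one_item(struct task_queue *tq, void *data)
{
	/* placeholder: process one unit of work, return 0 on success */
	return 0;
}

static void process_all(void **items, int nr, int nr_threads)
{
	int i;
	struct task_queue *tq = create_task_queue(nr_threads);

	for (i = 0; i < nr; i++)
		add_task(tq, do_one_item, items[i]);

	if (finish_task_queue(tq, NULL))
		die("Not all threads have finished");
}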

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/index-pack.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 3f10840..187b281 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1075,7 +1075,7 @@ static void resolve_base(struct object_entry *obj)
 }
 
 #ifndef NO_PTHREADS
-static void *threaded_second_pass(void *data)
+static int threaded_second_pass(struct task_queue *tq, void *data)
 {
 	set_thread_data(data);
 	for (;;) {
@@ -1096,7 +1096,7 @@ static void *threaded_second_pass(void *data)
 
 		resolve_base(&objects[i]);
 	}
-	return NULL;
+	return 0;
 }
 #endif
 
@@ -1195,18 +1195,19 @@ static void resolve_deltas(void)
 					  nr_ref_deltas + nr_ofs_deltas);
 
 #ifndef NO_PTHREADS
-	nr_dispatched = 0;
+
 	if (nr_threads > 1 || getenv("GIT_FORCE_THREADS")) {
+		struct task_queue *tq;
+		nr_dispatched = 0;
 		init_thread();
-		for (i = 0; i < nr_threads; i++) {
-			int ret = pthread_create(&thread_data[i].thread, NULL,
-						 threaded_second_pass, thread_data + i);
-			if (ret)
-				die(_("unable to create thread: %s"),
-				    strerror(ret));
-		}
+
+		tq = create_task_queue(nr_threads);
 		for (i = 0; i < nr_threads; i++)
-			pthread_join(thread_data[i].thread, NULL);
+			add_task(tq, threaded_second_pass, thread_data + i);
+
+		if (finish_task_queue(tq, NULL))
+			die("Not all threads have finished");
+
 		cleanup_thread();
 		return;
 	}
-- 
2.5.0.264.g5e52b0d

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 9/9] pack-objects: Use new worker pool
  2015-08-28  1:14 [PATCH 0/9] Progress with git submodule Stefan Beller
                   ` (7 preceding siblings ...)
  2015-08-28  1:14 ` [PATCH 8/9] index-pack: Use the new worker pool Stefan Beller
@ 2015-08-28  1:14 ` Stefan Beller
  2015-08-28 10:09 ` [PATCH 0/9] Progress with git submodule Johannes Schindelin
  9 siblings, 0 replies; 33+ messages in thread
From: Stefan Beller @ 2015-08-28  1:14 UTC (permalink / raw)
  To: git; +Cc: peff, jrnieder, gitster, johannes.schindelin, Stefan Beller

Before, we had <n> threads doing the delta finding work, and the main thread
was load balancing between them, i.e. moving work from a thread with a large
amount left over to an idle thread whenever such a situation arose.

This moves the load balancing into the threads themselves. As soon as one
thread is done working, it will look at its peer threads and pick up
half the work load from the thread with the largest pending load.

With the load balancing done by the threads themselves, the locking and
communication model becomes simpler, such that we don't need as many
mutexes any more.

It also demonstrates that the new thread pool can easily be applied in
different situations.
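
To illustrate just the stealing rule, a stripped-down sketch (plain counters
instead of object lists, a generic threshold, and without the locking and
the "path" hash boundary handling of the real code below):

#include <stdio.h>

struct worker { unsigned remaining; };

/* an idle worker steals half of the pending load of its busiest peer */
static unsigned steal_from_busiest(struct worker *w, int n, unsigned threshold)
{
	struct worker *victim = NULL;
	unsigned stolen = 0;
	int i;

	for (i = 0; i < n; i++)
		if (w[i].remaining > threshold &&
		    (!victim || victim->remaining < w[i].remaining))
			victim = &w[i];
	if (victim) {
		stolen = victim->remaining / 2;
		victim->remaining -= stolen;
	}
	return stolen;
}

int main(void)
{
	struct worker w[3] = { { 0 }, { 120 }, { 40 } };

	/* the idle worker picks up 60 items from the one with 120 left */
	printf("stolen: %u\n", steal_from_busiest(w, 3, 20));
	return 0;
}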

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 builtin/pack-objects.c | 175 ++++++++++++++++---------------------------------
 1 file changed, 57 insertions(+), 118 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 62cc16d..f46d2df 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -17,6 +17,7 @@
 #include "pack-objects.h"
 #include "progress.h"
 #include "refs.h"
+#include "run-command.h"
 #include "streaming.h"
 #include "thread-utils.h"
 #include "pack-bitmap.h"
@@ -1887,26 +1888,12 @@ static void try_to_free_from_threads(size_t size)
 
 static try_to_free_t old_try_to_free_routine;
 
-/*
- * The main thread waits on the condition that (at least) one of the workers
- * has stopped working (which is indicated in the .working member of
- * struct thread_params).
- * When a work thread has completed its work, it sets .working to 0 and
- * signals the main thread and waits on the condition that .data_ready
- * becomes 1.
- */
-
 struct thread_params {
-	pthread_t thread;
 	struct object_entry **list;
 	unsigned list_size;
 	unsigned remaining;
 	int window;
 	int depth;
-	int working;
-	int data_ready;
-	pthread_mutex_t mutex;
-	pthread_cond_t cond;
 	unsigned *processed;
 };
 
@@ -1933,7 +1920,52 @@ static void cleanup_threaded_search(void)
 	pthread_mutex_destroy(&progress_mutex);
 }
 
-static void *threaded_find_deltas(void *arg)
+static struct thread_params *p;
+
+static void threaded_split_largest_workload(struct thread_params *target)
+{
+	int i;
+
+	struct object_entry **list;
+	struct thread_params *victim = NULL;
+	unsigned sub_size = 0;
+
+	/* Find a victim */
+	progress_lock();
+	for (i = 0; i < delta_search_threads; i++)
+		if (p[i].remaining > 2*window &&
+		    (!victim || victim->remaining < p[i].remaining))
+			victim = &p[i];
+
+	if (victim) {
+		sub_size = victim->remaining / 2;
+		list = victim->list + victim->list_size - sub_size;
+		while (sub_size && list[0]->hash &&
+		       list[0]->hash == list[-1]->hash) {
+			list++;
+			sub_size--;
+		}
+		if (!sub_size) {
+			/*
+			 * It is possible for some "paths" to have
+			 * so many objects that no hash boundary
+			 * might be found.  Let's just steal the
+			 * exact half in that case.
+			 */
+			sub_size = victim->remaining / 2;
+			list -= sub_size;
+		}
+		victim->list_size -= sub_size;
+		victim->remaining -= sub_size;
+
+		target->list = list;
+		target->list_size = sub_size;
+		target->remaining = sub_size;
+	}
+	progress_unlock();
+}
+
+static int threaded_find_deltas(struct task_queue *tq, void *arg)
 {
 	struct thread_params *me = arg;
 
@@ -1941,34 +1973,17 @@ static void *threaded_find_deltas(void *arg)
 		find_deltas(me->list, &me->remaining,
 			    me->window, me->depth, me->processed);
 
-		progress_lock();
-		me->working = 0;
-		pthread_cond_signal(&progress_cond);
-		progress_unlock();
-
-		/*
-		 * We must not set ->data_ready before we wait on the
-		 * condition because the main thread may have set it to 1
-		 * before we get here. In order to be sure that new
-		 * work is available if we see 1 in ->data_ready, it
-		 * was initialized to 0 before this thread was spawned
-		 * and we reset it to 0 right away.
-		 */
-		pthread_mutex_lock(&me->mutex);
-		while (!me->data_ready)
-			pthread_cond_wait(&me->cond, &me->mutex);
-		me->data_ready = 0;
-		pthread_mutex_unlock(&me->mutex);
+		threaded_split_largest_workload(me);
 	}
-	/* leave ->working 1 so that this doesn't get more work assigned */
-	return NULL;
+
+	return 0;
 }
 
 static void ll_find_deltas(struct object_entry **list, unsigned list_size,
 			   int window, int depth, unsigned *processed)
 {
-	struct thread_params *p;
-	int i, ret, active_threads = 0;
+	struct task_queue *tq;
+	int i;
 
 	init_threaded_search();
 
@@ -1980,8 +1995,11 @@ static void ll_find_deltas(struct object_entry **list, unsigned list_size,
 	if (progress > pack_to_stdout)
 		fprintf(stderr, "Delta compression using up to %d threads.\n",
 				delta_search_threads);
+
 	p = xcalloc(delta_search_threads, sizeof(*p));
 
+	tq = create_task_queue(delta_search_threads);
+
 	/* Partition the work amongst work threads. */
 	for (i = 0; i < delta_search_threads; i++) {
 		unsigned sub_size = list_size / (delta_search_threads - i);
@@ -1993,8 +2011,6 @@ static void ll_find_deltas(struct object_entry **list, unsigned list_size,
 		p[i].window = window;
 		p[i].depth = depth;
 		p[i].processed = processed;
-		p[i].working = 1;
-		p[i].data_ready = 0;
 
 		/* try to split chunks on "path" boundaries */
 		while (sub_size && sub_size < list_size &&
@@ -2008,87 +2024,10 @@ static void ll_find_deltas(struct object_entry **list, unsigned list_size,
 
 		list += sub_size;
 		list_size -= sub_size;
+		add_task(tq, threaded_find_deltas, &p[i]);
 	}
 
-	/* Start work threads. */
-	for (i = 0; i < delta_search_threads; i++) {
-		if (!p[i].list_size)
-			continue;
-		pthread_mutex_init(&p[i].mutex, NULL);
-		pthread_cond_init(&p[i].cond, NULL);
-		ret = pthread_create(&p[i].thread, NULL,
-				     threaded_find_deltas, &p[i]);
-		if (ret)
-			die("unable to create thread: %s", strerror(ret));
-		active_threads++;
-	}
-
-	/*
-	 * Now let's wait for work completion.  Each time a thread is done
-	 * with its work, we steal half of the remaining work from the
-	 * thread with the largest number of unprocessed objects and give
-	 * it to that newly idle thread.  This ensure good load balancing
-	 * until the remaining object list segments are simply too short
-	 * to be worth splitting anymore.
-	 */
-	while (active_threads) {
-		struct thread_params *target = NULL;
-		struct thread_params *victim = NULL;
-		unsigned sub_size = 0;
-
-		progress_lock();
-		for (;;) {
-			for (i = 0; !target && i < delta_search_threads; i++)
-				if (!p[i].working)
-					target = &p[i];
-			if (target)
-				break;
-			pthread_cond_wait(&progress_cond, &progress_mutex);
-		}
-
-		for (i = 0; i < delta_search_threads; i++)
-			if (p[i].remaining > 2*window &&
-			    (!victim || victim->remaining < p[i].remaining))
-				victim = &p[i];
-		if (victim) {
-			sub_size = victim->remaining / 2;
-			list = victim->list + victim->list_size - sub_size;
-			while (sub_size && list[0]->hash &&
-			       list[0]->hash == list[-1]->hash) {
-				list++;
-				sub_size--;
-			}
-			if (!sub_size) {
-				/*
-				 * It is possible for some "paths" to have
-				 * so many objects that no hash boundary
-				 * might be found.  Let's just steal the
-				 * exact half in that case.
-				 */
-				sub_size = victim->remaining / 2;
-				list -= sub_size;
-			}
-			target->list = list;
-			victim->list_size -= sub_size;
-			victim->remaining -= sub_size;
-		}
-		target->list_size = sub_size;
-		target->remaining = sub_size;
-		target->working = 1;
-		progress_unlock();
-
-		pthread_mutex_lock(&target->mutex);
-		target->data_ready = 1;
-		pthread_cond_signal(&target->cond);
-		pthread_mutex_unlock(&target->mutex);
-
-		if (!sub_size) {
-			pthread_join(target->thread, NULL);
-			pthread_cond_destroy(&target->cond);
-			pthread_mutex_destroy(&target->mutex);
-			active_threads--;
-		}
-	}
+	finish_task_queue(tq, NULL);
 	cleanup_threaded_search();
 	free(p);
 }
-- 
2.5.0.264.g5e52b0d

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/9] Progress with git submodule
  2015-08-28  1:14 [PATCH 0/9] Progress with git submodule Stefan Beller
                   ` (8 preceding siblings ...)
  2015-08-28  1:14 ` [PATCH 9/9] pack-objects: Use " Stefan Beller
@ 2015-08-28 10:09 ` Johannes Schindelin
  2015-08-28 16:35   ` Stefan Beller
  9 siblings, 1 reply; 33+ messages in thread
From: Johannes Schindelin @ 2015-08-28 10:09 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, peff, jrnieder, gitster

Hi Stefan,

On 2015-08-28 03:14, Stefan Beller wrote:

> Stefan Beller (9):
>   submodule: implement `module_list` as a builtin helper
>   submodule: implement `module_name` as a builtin helper
>   submodule: implement `module_clone` as a builtin helper

Another thing that just hit me: is there any specific reason why the underscore? I think we prefer dashes for subcommands (think: cherry-pick) and maybe we want to do the same for subsubcommands...

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/9] Progress with git submodule
  2015-08-28 10:09 ` [PATCH 0/9] Progress with git submodule Johannes Schindelin
@ 2015-08-28 16:35   ` Stefan Beller
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Beller @ 2015-08-28 16:35 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git@vger.kernel.org, Jeff King, Jonathan Nieder, Junio C Hamano

On Fri, Aug 28, 2015 at 3:09 AM, Johannes Schindelin
<johannes.schindelin@gmx.de> wrote:
> Hi Stefan,
>
> On 2015-08-28 03:14, Stefan Beller wrote:
>
>> Stefan Beller (9):
>>   submodule: implement `module_list` as a builtin helper
>>   submodule: implement `module_name` as a builtin helper
>>   submodule: implement `module_clone` as a builtin helper
>
> Another thing that just hit me: is there any specific reason why the underscore? I think we prefer dashes for subcommands (think: cherry-pick) and maybe we want to do the same for subsubcommands...
>
> Ciao,
> Dscho
>

Junio wrote on Aug 3rd:
>>>     $ git submodule--helper module_list
>>
>> Why would you use an underscore in here as opposed to a dash?
>
> Simply because the diff would be easier to read; the callers used to
> call module_list shell function, now they call the subcommand with the
> same name of submodule--helper.

I mean, no user is expected to use the submodule--helper functions directly,
and we want to get rid of them eventually anyway (once the whole
git-submodule.sh is rewritten in C, we can just call the C functions instead
of calling out to the helpers).

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] fetch: fetch submodules in parallel
  2015-08-28  1:14 ` [PATCH 7/9] fetch: fetch submodules " Stefan Beller
@ 2015-08-28 17:00   ` Stefan Beller
  2015-08-28 17:01     ` Jonathan Nieder
  2015-08-31 18:56   ` Junio C Hamano
  1 sibling, 1 reply; 33+ messages in thread
From: Stefan Beller @ 2015-08-28 17:00 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: Jeff King, Jonathan Nieder, Junio C Hamano, Johannes Schindelin,
	Stefan Beller

On Thu, Aug 27, 2015 at 6:14 PM, Stefan Beller <sbeller@google.com> wrote:
> This makes use of the new task queue and the syncing feature of
> run-command to fetch a number of submodules at the same time.
>
> The output will look like it would have been run sequential,
> but faster.

And it breaks the tests in t5526-fetch-submodules.sh, as the output goes
to stderr only, instead of putting "Fetching submodule <submodule-path>"
on stdout. :(

I guess combining stdout and stderr is not such a good strategy after all.
Advantages:
* The order is preserved if everything is in one stream.
Disadvantages:
* It's a change, which may not be expected.
* It's harder to make it machine parsable, as that probably
   relied on having 2 channels.

So now I either need to come up with a way to buffer 2 channels at the same
time, or we need to redefine the parallel submodule fetch output semantics a bit.

>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  Documentation/fetch-options.txt |   7 +++
>  builtin/fetch.c                 |   6 ++-
>  builtin/pull.c                  |   6 +++
>  submodule.c                     | 100 +++++++++++++++++++++++++++++++++-------
>  submodule.h                     |   2 +-
>  5 files changed, 102 insertions(+), 19 deletions(-)
>
> diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
> index 45583d8..e2a59c3 100644
> --- a/Documentation/fetch-options.txt
> +++ b/Documentation/fetch-options.txt
> @@ -100,6 +100,13 @@ ifndef::git-pull[]
>         reference to a commit that isn't already in the local submodule
>         clone.
>
> +-j::
> +--jobs=<n>::
> +       Number of threads to be used for fetching submodules. Each thread
> +       will fetch from different submodules, such that fetching many
> +       submodules will be faster. By default the number of cpus will
> +       be used .
> +
>  --no-recurse-submodules::
>         Disable recursive fetching of submodules (this has the same effect as
>         using the '--recurse-submodules=no' option).
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index ee1f1a9..636707e 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -37,6 +37,7 @@ static int prune = -1; /* unspecified */
>  static int all, append, dry_run, force, keep, multiple, update_head_ok, verbosity;
>  static int progress = -1, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
>  static int tags = TAGS_DEFAULT, unshallow, update_shallow;
> +static int max_threads;
>  static const char *depth;
>  static const char *upload_pack;
>  static struct strbuf default_rla = STRBUF_INIT;
> @@ -99,6 +100,8 @@ static struct option builtin_fetch_options[] = {
>                     N_("fetch all tags and associated objects"), TAGS_SET),
>         OPT_SET_INT('n', NULL, &tags,
>                     N_("do not fetch all tags (--no-tags)"), TAGS_UNSET),
> +       OPT_INTEGER('j', "jobs", &max_threads,
> +                   N_("number of threads used for fetching")),
>         OPT_BOOL('p', "prune", &prune,
>                  N_("prune remote-tracking branches no longer on remote")),
>         { OPTION_CALLBACK, 0, "recurse-submodules", NULL, N_("on-demand"),
> @@ -1217,7 +1220,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
>                 result = fetch_populated_submodules(&options,
>                                                     submodule_prefix,
>                                                     recurse_submodules,
> -                                                   verbosity < 0);
> +                                                   verbosity < 0,
> +                                                   max_threads);
>                 argv_array_clear(&options);
>         }
>
> diff --git a/builtin/pull.c b/builtin/pull.c
> index 722a83c..fbbda67 100644
> --- a/builtin/pull.c
> +++ b/builtin/pull.c
> @@ -94,6 +94,7 @@ static int opt_force;
>  static char *opt_tags;
>  static char *opt_prune;
>  static char *opt_recurse_submodules;
> +static char *max_threads;
>  static int opt_dry_run;
>  static char *opt_keep;
>  static char *opt_depth;
> @@ -177,6 +178,9 @@ static struct option pull_options[] = {
>                 N_("on-demand"),
>                 N_("control recursive fetching of submodules"),
>                 PARSE_OPT_OPTARG),
> +       OPT_PASSTHRU('j', "jobs", &max_threads, N_("n"),
> +               N_("number of threads used for fetching submodules"),
> +               PARSE_OPT_OPTARG),
>         OPT_BOOL(0, "dry-run", &opt_dry_run,
>                 N_("dry run")),
>         OPT_PASSTHRU('k', "keep", &opt_keep, NULL,
> @@ -524,6 +528,8 @@ static int run_fetch(const char *repo, const char **refspecs)
>                 argv_array_push(&args, opt_prune);
>         if (opt_recurse_submodules)
>                 argv_array_push(&args, opt_recurse_submodules);
> +       if (max_threads)
> +               argv_array_push(&args, max_threads);
>         if (opt_dry_run)
>                 argv_array_push(&args, "--dry-run");
>         if (opt_keep)
> diff --git a/submodule.c b/submodule.c
> index 9fcc86f..50266a8 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -12,6 +12,7 @@
>  #include "sha1-array.h"
>  #include "argv-array.h"
>  #include "blob.h"
> +#include "thread-utils.h"
>
>  static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
>  static struct string_list changed_submodule_paths;
> @@ -615,13 +616,79 @@ static void calculate_changed_submodule_paths(void)
>         initialized_fetch_ref_tips = 0;
>  }
>
> +struct submodule_parallel_fetch {
> +       struct child_process cp;
> +       struct argv_array argv;
> +       int *result;
> +};
> +
> +#ifndef NO_PTHREADS
> +static pthread_mutex_t output_mutex = PTHREAD_MUTEX_INITIALIZER;
> +
> +static void set_output_mutex(struct submodule_parallel_fetch *spf)
> +{
> +       spf->cp.sync_mutex = &output_mutex;
> +}
> +
> +#define lock_output_mutex() pthread_mutex_lock(&output_mutex)
> +
> +#define unlock_output_mutex() pthread_mutex_unlock(&output_mutex)
> +
> +static void destroy_output_mutex()
> +{
> +       pthread_mutex_destroy(&output_mutex);
> +}
> +
> +#else
> +#define set_output_mutex()
> +#define destroy_output_mutex()
> +#define lock_output_mutex()
> +#define unlock_output_mutex()
> +#endif
> +
> +static struct submodule_parallel_fetch *submodule_parallel_fetch_create()
> +{
> +       struct submodule_parallel_fetch *spf = xmalloc(sizeof(*spf));
> +       child_process_init(&spf->cp);
> +       spf->cp.env = local_repo_env;
> +       spf->cp.git_cmd = 1;
> +       spf->cp.no_stdin = 1;
> +       spf->cp.stdout_to_stderr = 1;
> +       spf->cp.sync_buf = xmalloc(sizeof(spf->cp.sync_buf));
> +       strbuf_init(spf->cp.sync_buf, 0);
> +
> +       argv_array_init(&spf->argv);
> +       return spf;
> +}
> +
> +static int run_command_and_cleanup(struct task_queue *tq, void *arg)
> +{
> +       int code;
> +       struct submodule_parallel_fetch *spf = arg;
> +
> +       spf->cp.argv = spf->argv.argv;
> +
> +       code = run_command(&spf->cp);
> +       if (code) {
> +               lock_output_mutex();
> +               *spf->result = code;
> +               unlock_output_mutex();
> +       }
> +
> +       argv_array_clear(&spf->argv);
> +       free((char*)spf->cp.dir);
> +       free(spf);
> +       return 0;
> +}
> +
>  int fetch_populated_submodules(const struct argv_array *options,
>                                const char *prefix, int command_line_option,
> -                              int quiet)
> +                              int quiet, int max_parallel_jobs)
>  {
>         int i, result = 0;
> -       struct child_process cp = CHILD_PROCESS_INIT;
> +       struct task_queue *tq;
>         struct argv_array argv = ARGV_ARRAY_INIT;
> +       struct submodule_parallel_fetch *spf;
>         const char *work_tree = get_git_work_tree();
>         if (!work_tree)
>                 goto out;
> @@ -635,12 +702,9 @@ int fetch_populated_submodules(const struct argv_array *options,
>         argv_array_push(&argv, "--recurse-submodules-default");
>         /* default value, "--submodule-prefix" and its value are added later */
>
> -       cp.env = local_repo_env;
> -       cp.git_cmd = 1;
> -       cp.no_stdin = 1;
> -
>         calculate_changed_submodule_paths();
>
> +       tq = create_task_queue(max_parallel_jobs);
>         for (i = 0; i < active_nr; i++) {
>                 struct strbuf submodule_path = STRBUF_INIT;
>                 struct strbuf submodule_git_dir = STRBUF_INIT;
> @@ -693,24 +757,26 @@ int fetch_populated_submodules(const struct argv_array *options,
>                 if (!git_dir)
>                         git_dir = submodule_git_dir.buf;
>                 if (is_directory(git_dir)) {
> +                       spf = submodule_parallel_fetch_create();
> +                       spf->result = &result;
> +                       spf->cp.dir = strbuf_detach(&submodule_path, NULL);
>                         if (!quiet)
> -                               printf("Fetching submodule %s%s\n", prefix, ce->name);
> -                       cp.dir = submodule_path.buf;
> -                       argv_array_push(&argv, default_argv);
> -                       argv_array_push(&argv, "--submodule-prefix");
> -                       argv_array_push(&argv, submodule_prefix.buf);
> -                       cp.argv = argv.argv;
> -                       if (run_command(&cp))
> -                               result = 1;
> -                       argv_array_pop(&argv);
> -                       argv_array_pop(&argv);
> -                       argv_array_pop(&argv);
> +                               strbuf_addf(spf->cp.sync_buf,
> +                                           "Fetching submodule %s%s",
> +                                           prefix, ce->name);
> +                       argv_array_pushv(&spf->argv, argv.argv);
> +                       argv_array_push(&spf->argv, default_argv);
> +                       argv_array_push(&spf->argv, "--submodule-prefix");
> +                       argv_array_push(&spf->argv, submodule_prefix.buf);
> +                       set_output_mutex(spf);
> +                       add_task(tq, run_command_and_cleanup, spf);
>                 }
>                 strbuf_release(&submodule_path);
>                 strbuf_release(&submodule_git_dir);
>                 strbuf_release(&submodule_prefix);
>         }
>         argv_array_clear(&argv);
> +       destroy_output_mutex();
>  out:
>         string_list_clear(&changed_submodule_paths, 1);
>         return result;
> diff --git a/submodule.h b/submodule.h
> index 5507c3d..cbc0003 100644
> --- a/submodule.h
> +++ b/submodule.h
> @@ -31,7 +31,7 @@ void set_config_fetch_recurse_submodules(int value);
>  void check_for_new_submodule_commits(unsigned char new_sha1[20]);
>  int fetch_populated_submodules(const struct argv_array *options,
>                                const char *prefix, int command_line_option,
> -                              int quiet);
> +                              int quiet, int max_parallel_jobs);
>  unsigned is_submodule_modified(const char *path, int ignore_untracked);
>  int submodule_uses_gitfile(const char *path);
>  int ok_to_remove_submodule(const char *path);
> --
> 2.5.0.264.g5e52b0d
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] fetch: fetch submodules in parallel
  2015-08-28 17:00   ` Stefan Beller
@ 2015-08-28 17:01     ` Jonathan Nieder
  2015-08-28 17:12       ` Junio C Hamano
  0 siblings, 1 reply; 33+ messages in thread
From: Jonathan Nieder @ 2015-08-28 17:01 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Jeff King, Junio C Hamano,
	Johannes Schindelin

Stefan Beller wrote:
> On Thu, Aug 27, 2015 at 6:14 PM, Stefan Beller <sbeller@google.com> wrote:

>> This makes use of the new task queue and the syncing feature of
>> run-command to fetch a number of submodules at the same time.
>>
>> The output will look like it would have been run sequential,
>> but faster.
>
> And it breaks the tests t5526-fetch-submodules.sh as the output is done
> on stderr only, instead of putting "Fetching submodule <submodule-path>
> to stdout. :(
>
> I guess combining stdout and stderr is not a good strategy after all now.

IMHO the "Fetching submodule <submodule-path>" output always should have
gone to stderr.  It is not output that scripts would be relying on ---
it is just progress output.

So a preliminary patch doing that (and updating tests) would make sense
to me.

Thoughts?
Jonathan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6/9] submodule: helper to run foreach in parallel
  2015-08-28  1:14 ` [PATCH 6/9] submodule: helper to run foreach in parallel Stefan Beller
@ 2015-08-28 17:08   ` Stefan Beller
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Beller @ 2015-08-28 17:08 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: Jeff King, Jonathan Nieder, Junio C Hamano, Johannes Schindelin,
	Stefan Beller

On Thu, Aug 27, 2015 at 6:14 PM, Stefan Beller <sbeller@google.com> wrote:
> Similar to `git submodule foreach` the new command `git submodule
> foreach_parallel` will run a command on each submodule.
>
> The commands are run in parallel up to the number of cores by default,
> or you can specify '-j 4' to just run with 4 threads for example.
>
> One major difference to `git submodule foreach` is the handling of input
> and output to the commands. Because of the parallel nature of the execution
> it is not trivial how to schedule the std{in,out,err} channel for submodule
> the command is run in. So in this patch there is no support for stdin.
> stdout will be piped to stderr. stderr will make use of the synchronized
> output feature of run_command.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  builtin/submodule--helper.c  | 133 ++++++++++++++++++++++++++++++++++++++++++-
>  git-submodule.sh             |  11 +++-
>  t/t7407-submodule-foreach.sh |  11 ++++
>  3 files changed, 153 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index d29499c..18b67f0 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -8,6 +8,7 @@
>  #include "submodule.h"
>  #include "submodule-config.h"
>  #include "string-list.h"
> +#include "thread-utils.h"
>  #include "run-command.h"
>
>  static const struct cache_entry **ce_entries;
> @@ -266,6 +267,127 @@ static int module_clone(int argc, const char **argv, const char *prefix)
>         return 0;
>  }
>
> +#ifndef NO_PTHREADS
> +struct submodule_args {
> +       const char *name;
> +       const char *path;
> +       const char *sha1;
> +       const char *toplevel;
> +       const char *prefix;
> +       const char **cmd;
> +       pthread_mutex_t *sync;
> +};
> +
> +int run_cmd_submodule(struct task_queue *aq, void *task)
> +{
> +       int i;
> +       struct submodule_args *args = task;
> +       struct strbuf out = STRBUF_INIT;
> +       struct strbuf sb = STRBUF_INIT;
> +       struct child_process *cp = xmalloc(sizeof(*cp));
> +
> +       strbuf_addf(&out, N_("Entering %s\n"), relative_path(args->path, args->prefix, &sb));
> +
> +       child_process_init(cp);
> +       argv_array_pushv(&cp->args, args->cmd);
> +
> +       argv_array_pushf(&cp->env_array, "name=%s", args->name);
> +       argv_array_pushf(&cp->env_array, "path=%s", args->path);
> +       argv_array_pushf(&cp->env_array, "sha1=%s", args->sha1);
> +       argv_array_pushf(&cp->env_array, "toplevel=%s", args->toplevel);
> +
> +       for (i = 0; local_repo_env[i]; i++)
> +               argv_array_push(&cp->env_array, local_repo_env[i]);
> +
> +       cp->no_stdin = 1;
> +       cp->out = 0;
> +       cp->err = -1;
> +       cp->dir = args->path;
> +       cp->stdout_to_stderr = 1;
> +       cp->use_shell = 1;
> +       cp->sync_mutex = args->sync;
> +       cp->sync_buf = &out;
> +
> +       return run_command(cp);
> +}
> +
> +int module_foreach_parallel(int argc, const char **argv, const char *prefix)
> +{
> +       int i, recursive = 0, number_threads = 0, quiet = 0;
> +       static struct pathspec pathspec;
> +       struct strbuf sb = STRBUF_INIT;
> +       struct task_queue *aq;
> +       char **cmd;
> +       const char **nullargv = {NULL};
> +       pthread_mutex_t mutex;
> +
> +       struct option module_update_options[] = {
> +               OPT_STRING(0, "prefix", &alternative_path,
> +                          N_("path"),
> +                          N_("alternative anchor for relative paths")),
> +               OPT_STRING(0, "cmd", &cmd,
> +                          N_("string"),
> +                          N_("command to run")),
> +               OPT_BOOL('r', "--recursive", &recursive,
> +                        N_("Recurse into nexted submodules")),
> +               OPT_INTEGER('j', "jobs", &number_threads,
> +                           N_("Recurse into nexted submodules")),
> +               OPT__QUIET(&quiet, N_("Suppress output")),
> +               OPT_END()
> +       };
> +
> +       static const char * const git_submodule_helper_usage[] = {
> +               N_("git submodule--helper foreach [--prefix=<path>] [<path>...]"),
> +               NULL
> +       };
> +
> +       argc = parse_options(argc, argv, prefix, module_update_options,
> +                            git_submodule_helper_usage, 0);
> +
> +       if (module_list_compute(0, nullargv, NULL, &pathspec) < 0)
> +               return 1;
> +
> +       gitmodules_config();
> +
> +       pthread_mutex_init(&mutex, NULL);
> +       aq = create_task_queue(number_threads);
> +
> +       for (i = 0; i < ce_used; i++) {
> +               const struct submodule *sub;
> +               const struct cache_entry *ce = ce_entries[i];
> +               struct submodule_args *args = malloc(sizeof(*args));
> +
> +               if (ce_stage(ce))
> +                       args->sha1 = xstrdup(sha1_to_hex(null_sha1));
> +               else
> +                       args->sha1 = xstrdup(sha1_to_hex(ce->sha1));
> +
> +               strbuf_reset(&sb);
> +               strbuf_addf(&sb, "%s/.git", ce->name);
> +               if (!file_exists(sb.buf)) {
> +                       free(args);
> +                       continue;
> +               }
> +
> +               args->path = ce->name;
> +               sub = submodule_from_path(null_sha1, args->path);
> +               if (!sub)
> +                       die("No submodule mapping found in .gitmodules for path '%s'", args->path);
> +
> +               args->name = sub->name;
> +               args->toplevel = xgetcwd();
> +               args->cmd = argv;
> +               args->sync = &mutex;
> +               args->prefix = alternative_path;
> +               add_task(aq, run_cmd_submodule, args);
> +       }
> +
> +       finish_task_queue(aq, NULL);
> +       pthread_mutex_destroy(&mutex);
> +       return 0;
> +}
> +#endif /* NO_PTHREADS */
> +
>  int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
>  {
>         if (argc < 2)
> @@ -280,7 +402,16 @@ int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
>         if (!strcmp(argv[1], "module_clone"))
>                 return module_clone(argc - 1, argv + 1, prefix);
>
> +#ifndef NO_PTHREADS
> +       if (!strcmp(argv[1], "foreach_parallel"))
> +               return module_foreach_parallel(argc - 1, argv + 1, prefix);
> +#endif
> +
>  usage:
>         usage("git submodule--helper [module_list | module_name | "
> -             "module_clone]\n");
> +             "module_clone"
> +#ifndef NO_PTHREADS
> +             " | foreach_parallel"
> +#endif
> +             "]\n");
>  }
> diff --git a/git-submodule.sh b/git-submodule.sh
> index fb5155e..f06488a 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -431,6 +431,15 @@ cmd_foreach()
>  }
>
>  #
> +# Execute an arbitrary command sequence in each checked out
> +# submodule in parallel.
> +#
> +cmd_foreach_parallel()
> +{
> +       git submodule--helper foreach_parallel --prefix "$wt_prefix" $@
> +}
> +
> +#
>  # Register submodules in .git/config
>  #
>  # $@ = requested paths (default to all)
> @@ -1225,7 +1234,7 @@ cmd_sync()
>  while test $# != 0 && test -z "$command"
>  do
>         case "$1" in
> -       add | foreach | init | deinit | update | status | summary | sync)
> +       add | foreach | foreach_parallel | init | deinit | update | status | summary | sync)
>                 command=$1
>                 ;;
>         -q|--quiet)
> diff --git a/t/t7407-submodule-foreach.sh b/t/t7407-submodule-foreach.sh
> index 7ca10b8..16f6138 100755
> --- a/t/t7407-submodule-foreach.sh
> +++ b/t/t7407-submodule-foreach.sh
> @@ -195,6 +195,17 @@ test_expect_success 'test "foreach --quiet --recursive"' '
>         test_cmp expect actual
>  '
>
> +test_expect_success 'test "foreach_parallel --quiet"' '
> +       (
> +               cd clone2 &&
> +               git submodule foreach_parallel -q -- "echo \$name-\$path" > ../actual
> +       ) &&
> +       grep nested1-nested1 actual &&
> +       grep foo1-sub1 actual &&
> +       grep foo2-sub2 actual &&
> +       grep foo3-sub3 actual
> +'
> +

This test fails because the variables don't seem to be resolved: the file
'actual' literally reads "$name-$path" instead of the substituted values.

Running `git submodule--helper foreach_parallel "echo \$name-\$path"`
directly works as expected, so I think there is an issue in how
git-submodule.sh wraps the call.

>  test_expect_success 'use "update --recursive" to checkout all submodules' '
>         git clone super clone3 &&
>         (
> --
> 2.5.0.264.g5e52b0d
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] fetch: fetch submodules in parallel
  2015-08-28 17:01     ` Jonathan Nieder
@ 2015-08-28 17:12       ` Junio C Hamano
  2015-08-28 17:45         ` Stefan Beller
  2015-08-28 18:20         ` Jonathan Nieder
  0 siblings, 2 replies; 33+ messages in thread
From: Junio C Hamano @ 2015-08-28 17:12 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Stefan Beller, git@vger.kernel.org, Jeff King,
	Johannes Schindelin

Jonathan Nieder <jrnieder@gmail.com> writes:

> Stefan Beller wrote:
>> On Thu, Aug 27, 2015 at 6:14 PM, Stefan Beller <sbeller@google.com> wrote:
>
>>> This makes use of the new task queue and the syncing feature of
>>> run-command to fetch a number of submodules at the same time.
>>>
>>> The output will look like it would have been run sequential,
>>> but faster.
>>
>> And it breaks the tests t5526-fetch-submodules.sh as the output is done
>> on stderr only, instead of putting "Fetching submodule <submodule-path>
>> to stdout. :(
>>
>> I guess combining stdout and stderr is not a good strategy after all now.
>
> IMHO the "Fetching submodule <submodule-path>" output always should have
> gone to stderr.  It is not output that scripts would be relying on ---
> it is just progress output.
>
> So a preliminary patch doing that (and updating tests) would make sense
> to me.

Sounds good.

I personally do not think the "we still do all the output from a
single process while blocking output from others" buffering
implemented in this n-th round (by the way, please use [PATCH v$n
N/M]) is worth doing, though.  It does not make the output machine
parseable, because the reader does not get any signal in what order
output of these subprocesses arrive anyway.  The payload does not
even have "here is the beginning of output from the process that
handled the submodule X" to delimit them.

My preference is still (1) leave standard error output all connected
to the same fd without multiplexing, and (2) line buffer standard
output so that the output is at least readable as text, similar to
a log of an IRC channel where everybody is talking at the same time.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] fetch: fetch submodules in parallel
  2015-08-28 17:12       ` Junio C Hamano
@ 2015-08-28 17:45         ` Stefan Beller
  2015-08-28 18:20         ` Jonathan Nieder
  1 sibling, 0 replies; 33+ messages in thread
From: Stefan Beller @ 2015-08-28 17:45 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Nieder, git@vger.kernel.org, Jeff King,
	Johannes Schindelin

On Fri, Aug 28, 2015 at 10:12 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:
>
>> Stefan Beller wrote:
>>> On Thu, Aug 27, 2015 at 6:14 PM, Stefan Beller <sbeller@google.com> wrote:
>>
>>>> This makes use of the new task queue and the syncing feature of
>>>> run-command to fetch a number of submodules at the same time.
>>>>
>>>> The output will look like it would have been run sequential,
>>>> but faster.
>>>
>>> And it breaks the tests t5526-fetch-submodules.sh as the output is done
>>> on stderr only, instead of putting "Fetching submodule <submodule-path>
>>> to stdout. :(
>>>
>>> I guess combining stdout and stderr is not a good strategy after all now.
>>
>> IMHO the "Fetching submodule <submodule-path>" output always should have
>> gone to stderr.  It is not output that scripts would be relying on ---
>> it is just progress output.
>>
>> So a preliminary patch doing that (and updating tests) would make sense
>> to me.
>
> Sounds good.
>
> I personally do not think the "we still do all the output from a
> single process while blocking output from others" buffering
> implemented in this n-th round (by the way, please use [PATCH v$n
> N/M]) is worth doing, though.  It does not make the output machine
> parseable, because the reader does not get any signal in what order
> output of these subprocesses arrive anyway.

> The payload does not
> even have "here is the beginning of output from the process that
> handled the submodule X" to delimit them.

Oh it does, but it is handled not by the buffering code but by the application
code: for git-fetch we would have "Fetching submodule <path>",
while for git submodule foreach we would have "Entering <submodule path>".

>
> My preference is still (1) leave standard error output all connected
> to the same fd without multiplexing, and (2) line buffer standard
> output so that the output is at least readable as a text, in a
> similar way a log of an irc channel where everybody is talking at
> the same time.

In the case of fetch, passing the quiet flag then becomes mandatory,
because the per-repo progress is written to stderr, which also
overwrites previous output to show the nice counters.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] fetch: fetch submodules in parallel
  2015-08-28 17:12       ` Junio C Hamano
  2015-08-28 17:45         ` Stefan Beller
@ 2015-08-28 18:20         ` Jonathan Nieder
  2015-08-28 18:27           ` Junio C Hamano
  1 sibling, 1 reply; 33+ messages in thread
From: Jonathan Nieder @ 2015-08-28 18:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, git@vger.kernel.org, Jeff King,
	Johannes Schindelin

Junio C Hamano wrote:

> My preference is still (1) leave standard error output all connected
> to the same fd without multiplexing, and (2) line buffer standard
> output so that the output is at least readable as a text, in a
> similar way a log of an irc channel where everybody is talking at
> the same time.

There is something nice about the immediacy of seeing output from all
the subprocesses at the same time in that model.

But for commands that show progress like "git clone", "git checkout",
and "git fetch", it does not work well at all.  They provide output
that updates itself by putting a carriage return at the end of each
chunk of output, like this:

 remote: Finding sources:  11% (18/155)           \r
 remote: Finding sources:  12% (19/155)           \r

With multiple commands producing such output, they will overwrite each
other's lines, producing a mixture that is confusing and unuseful.

Even with --no-progress, there is a similar sense of confusion in the
intermixed output.  Error messages are hard to find.  This is a
common complaint about the current "repo sync -j" implementation.

Ideally what I as a user want to see is something like what "prove"
writes, showing progress on the multiple tasks that are taking place
at once:

 ===(     103;1  0/?  8/?  3/?  11/?  6/?  16/?  1/?  1/? )==============

That would require more sophisticated inter-process communication than
seems necessary for the first version of parallel "git submodule
update".  For the first version that people use in the wild, showing
output from one of the tasks at a time, simulating a sped-up
sequential implementation, seems useful to me.

In the degenerate case where only one task is running, it reduces to
the current output.  The output is familiar and easy to use.  I quite
like the approach.

My two cents,
Jonathan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] fetch: fetch submodules in parallel
  2015-08-28 18:20         ` Jonathan Nieder
@ 2015-08-28 18:27           ` Junio C Hamano
  2015-08-28 18:35             ` Jeff King
                               ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Junio C Hamano @ 2015-08-28 18:27 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Stefan Beller, git@vger.kernel.org, Jeff King,
	Johannes Schindelin

Jonathan Nieder <jrnieder@gmail.com> writes:

> Junio C Hamano wrote:
>
>> My preference is still (1) leave standard error output all connected
>> to the same fd without multiplexing, and (2) line buffer standard
>> output so that the output is at least readable as a text, in a
>> similar way a log of an irc channel where everybody is talking at
>> the same time.
>
> There is something nice about the immediacy of seeing output from all
> the subprocesses at the same time in that model.
>
> But for commands that show progress like "git clone", "git checkout",
> and "git fetch", it does not work well at all.  They provide output
> that updates itself by putting a carriage return at the end of each
> chunk of output, like this:
>
>  remote: Finding sources:  11% (18/155)           \r
>  remote: Finding sources:  12% (19/155)           \r
>
> With multiple commands producing such output, they will overwrite each
> other's lines, producing a mixture that is confusing and unuseful.

That example also illustrates why it is not useful to buffer all
of these lines and show them at once.

> Ideally what I as a user want to see is something like what "prove"
> writes, showing progress on the multiple tasks that are taking place
> at once:
>
>  ===(     103;1  0/?  8/?  3/?  11/?  6/?  16/?  1/?  1/? )==============

Tell me how that "buffer all and show them once" helps us to get
near that ideal.

> That would require more sophisticated inter-process communication than
> seems necessary for the first version of parallel "git submodule
> update".

Exactly.  Why waste memory to buffer and stall the entire output
from other processes in the interim solution, then?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] fetch: fetch submodules in parallel
  2015-08-28 18:27           ` Junio C Hamano
@ 2015-08-28 18:35             ` Jeff King
  2015-08-28 18:41               ` Junio C Hamano
                                 ` (2 more replies)
  2015-08-28 18:36             ` Stefan Beller
  2015-08-28 18:42             ` Jonathan Nieder
  2 siblings, 3 replies; 33+ messages in thread
From: Jeff King @ 2015-08-28 18:35 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Nieder, Stefan Beller, git@vger.kernel.org,
	Johannes Schindelin

On Fri, Aug 28, 2015 at 11:27:04AM -0700, Junio C Hamano wrote:

> > But for commands that show progress like "git clone", "git checkout",
> > and "git fetch", it does not work well at all.  They provide output
> > that updates itself by putting a carriage return at the end of each
> > chunk of output, like this:
> >
> >  remote: Finding sources:  11% (18/155)           \r
> >  remote: Finding sources:  12% (19/155)           \r
> >
> > With multiple commands producing such output, they will overwrite each
> > other's lines, producing a mixture that is confusing and unuseful.
> 
> That example also illustrates why it is not a useful to buffer all
> of these lines and showing them once.

I think Jonathan's point is that you could pick _one_ active child to
show without buffering, while simultaneously buffering everybody else's
output. When that finishes, pick a new active child, show its buffer,
and then start showing its output in realtime. And so on.

So to an observer, it would look like a serial operation, but subsequent
operations after the first would magically go much faster (because
they'd been working and buffering in the background).

And that doesn't require any additional IPC magic (though I am not sure
how we get progress in the first place if the child stderr is a
pipe...).
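
Roughly like this, as a toy sketch (not from this series: poll(2) and
fork/exec here stand in for run-command, and the three shell commands
stand in for the per-submodule fetches):

#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

#define NCHILD 3

struct child {
	pid_t pid;
	int fd;          /* read end of the child's combined output */
	char buf[65536]; /* output collected while not in the foreground */
	size_t len;
	int done;
};

int main(void)
{
	static const char *cmds[NCHILD] = {
		"echo fetching a; sleep 1; echo done a",
		"echo fetching b; echo done b",
		"echo fetching c; sleep 2; echo done c",
	};
	struct child c[NCHILD];
	int i, j, foreground = 0, alive = NCHILD;

	for (i = 0; i < NCHILD; i++) {	/* one pipe per child for stdout+stderr */
		int fds[2];
		if (pipe(fds) < 0)
			exit(1);
		c[i].pid = fork();
		if (!c[i].pid) {
			dup2(fds[1], 1);
			dup2(fds[1], 2);
			close(fds[0]);
			execl("/bin/sh", "sh", "-c", cmds[i], (char *)NULL);
			_exit(127);
		}
		close(fds[1]);
		c[i].fd = fds[0];
		c[i].len = 0;
		c[i].done = 0;
	}

	while (alive) {
		struct pollfd pfd[NCHILD];
		for (i = 0; i < NCHILD; i++) {
			pfd[i].fd = c[i].done ? -1 : c[i].fd;
			pfd[i].events = POLLIN;
		}
		if (poll(pfd, NCHILD, -1) < 0)
			break;
		for (i = 0; i < NCHILD; i++) {
			char tmp[4096];
			ssize_t n;
			if (c[i].done || !(pfd[i].revents & (POLLIN | POLLHUP)))
				continue;
			n = read(c[i].fd, tmp, sizeof(tmp));
			if (n > 0) {
				if (i == foreground)
					fwrite(tmp, 1, n, stderr); /* live */
				else if (c[i].len + n <= sizeof(c[i].buf)) {
					/* a real version would grow this buffer */
					memcpy(c[i].buf + c[i].len, tmp, n);
					c[i].len += n;
				}
				continue;
			}
			close(c[i].fd);	/* EOF: this child is finished */
			waitpid(c[i].pid, NULL, 0);
			c[i].done = 1;
			alive--;
			if (i != foreground)
				continue;
			/*
			 * The foreground child is done: show what the others
			 * buffered so far, in order, and promote the first
			 * child that is still running.
			 */
			for (j = 0; j < NCHILD; j++) {
				fwrite(c[j].buf, 1, c[j].len, stderr);
				c[j].len = 0;
				if (!c[j].done) {
					foreground = j;
					break;
				}
			}
		}
	}
	return 0;
}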

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] fetch: fetch submodules in parallel
  2015-08-28 18:27           ` Junio C Hamano
  2015-08-28 18:35             ` Jeff King
@ 2015-08-28 18:36             ` Stefan Beller
  2015-08-28 18:42             ` Jonathan Nieder
  2 siblings, 0 replies; 33+ messages in thread
From: Stefan Beller @ 2015-08-28 18:36 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Nieder, git@vger.kernel.org, Jeff King,
	Johannes Schindelin

On Fri, Aug 28, 2015 at 11:27 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:
>
>> Junio C Hamano wrote:
>>
>>> My preference is still (1) leave standard error output all connected
>>> to the same fd without multiplexing, and (2) line buffer standard
>>> output so that the output is at least readable as a text, in a
>>> similar way a log of an irc channel where everybody is talking at
>>> the same time.
>>
>> There is something nice about the immediacy of seeing output from all
>> the subprocesses at the same time in that model.
>>
>> But for commands that show progress like "git clone", "git checkout",
>> and "git fetch", it does not work well at all.  They provide output
>> that updates itself by putting a carriage return at the end of each
>> chunk of output, like this:
>>
>>  remote: Finding sources:  11% (18/155)           \r
>>  remote: Finding sources:  12% (19/155)           \r
>>
>> With multiple commands producing such output, they will overwrite each
>> other's lines, producing a mixture that is confusing and unuseful.
>
> That example also illustrates why it is not a useful to buffer all
> of these lines and showing them once.
>
>> Ideally what I as a user want to see is something like what "prove"
>> writes, showing progress on the multiple tasks that are taking place
>> at once:
>>
>>  ===(     103;1  0/?  8/?  3/?  11/?  6/?  16/?  1/?  1/? )==============
>
> Tell me how that "buffer all and show them once" helps us to get
> near that ideal.

It doesn't, but it looks better than the IRC-like "everybody speaks at
the same time" solution, IMHO.

At the very least, the user may not even be interested in the details,
similar to what prove provides. The client can estimate its own total
progress by weighting the progress from all tasks.

>
>> That would require more sophisticated inter-process communication than
>> seems necessary for the first version of parallel "git submodule
>> update".
>
> Exactly.  Why waste memory to buffer and stall the entire output
> from other processes in the interim solution, then?
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] fetch: fetch submodules in parallel
  2015-08-28 18:35             ` Jeff King
@ 2015-08-28 18:41               ` Junio C Hamano
  2015-08-28 18:41               ` Stefan Beller
  2015-08-28 18:44               ` Jonathan Nieder
  2 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2015-08-28 18:41 UTC (permalink / raw)
  To: Jeff King
  Cc: Jonathan Nieder, Stefan Beller, git@vger.kernel.org,
	Johannes Schindelin

Jeff King <peff@peff.net> writes:

> I think Jonathan's point is that you could pick _one_ active child to
> show without buffering, while simultaneously buffering everybody else's
> output. When that finishes, pick a new active child, show its buffer,
> and then start showing its output in realtime. And so on.
>
> So to an observer, it would look like a serial operation, but subsequent
> operations after the first would magically go much faster (because
> they'd been working and buffering in the background).

OK, I can buy that explanation.

> And that doesn't require any additional IPC magic (though I am not sure
> how we get progress in the first place if the child stderr is a
> pipe...).

That is a good point ;-)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] fetch: fetch submodules in parallel
  2015-08-28 18:35             ` Jeff King
  2015-08-28 18:41               ` Junio C Hamano
@ 2015-08-28 18:41               ` Stefan Beller
  2015-08-28 18:44                 ` Jeff King
  2015-08-28 18:44               ` Jonathan Nieder
  2 siblings, 1 reply; 33+ messages in thread
From: Stefan Beller @ 2015-08-28 18:41 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Jonathan Nieder, git@vger.kernel.org,
	Johannes Schindelin

On Fri, Aug 28, 2015 at 11:35 AM, Jeff King <peff@peff.net> wrote:
> On Fri, Aug 28, 2015 at 11:27:04AM -0700, Junio C Hamano wrote:
>
>> > But for commands that show progress like "git clone", "git checkout",
>> > and "git fetch", it does not work well at all.  They provide output
>> > that updates itself by putting a carriage return at the end of each
>> > chunk of output, like this:
>> >
>> >  remote: Finding sources:  11% (18/155)           \r
>> >  remote: Finding sources:  12% (19/155)           \r
>> >
>> > With multiple commands producing such output, they will overwrite each
>> > other's lines, producing a mixture that is confusing and unuseful.
>>
>> That example also illustrates why it is not a useful to buffer all
>> of these lines and showing them once.
>
> I think Jonathan's point is that you could pick _one_ active child to
> show without buffering, while simultaneously buffering everybody else's
> output. When that finishes, pick a new active child, show its buffer,
> and then start showing its output in realtime. And so on.

Or better yet, pick the child with the most progress (i.e. flush all finished
children and then pick the next active child); that would best approximate
the actual progress in the output, as it would reduce the amount of hidden
buffered progress.

>
> So to an observer, it would look like a serial operation, but subsequent
> operations after the first would magically go much faster (because
> they'd been working and buffering in the background).
>
> And that doesn't require any additional IPC magic (though I am not sure
> how we get progress in the first place if the child stderr is a
> pipe...).

By moving the contents from the pipe to a strbuf buffer, which we can grow
indefinitely (way beyond the pipe buffer limits; the output of a git fetch
should be small enough for that).



>
> -Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] fetch: fetch submodules in parallel
  2015-08-28 18:27           ` Junio C Hamano
  2015-08-28 18:35             ` Jeff King
  2015-08-28 18:36             ` Stefan Beller
@ 2015-08-28 18:42             ` Jonathan Nieder
  2 siblings, 0 replies; 33+ messages in thread
From: Jonathan Nieder @ 2015-08-28 18:42 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, git@vger.kernel.org, Jeff King,
	Johannes Schindelin

Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:

>>  remote: Finding sources:  11% (18/155)           \r
>>  remote: Finding sources:  12% (19/155)           \r
>>
>> With multiple commands producing such output, they will overwrite each
>> other's lines, producing a mixture that is confusing and unuseful.
>
> That example also illustrates why it is not a useful to buffer all
> of these lines and showing them once.

I don't completely follow.  Are you referring to the wasted memory to
store the line that is going to be written and rewritten or some other
aspect?

Today (without parallelism), if I clone a repository with multiple
submodules and walk away from the terminal, then when I get back I see

	Cloning into 'plugins/cookbook-plugin'...
	remote: Counting objects: 36, done
	remote: Finding sources: 100% (36/36)
	remote: Total 1192 (delta 196), reused 1192 (delta 196)
	Receiving objects: 100% (1192/1192), 239.46 KiB | 0 bytes/s, done.
	Resolving deltas: 100% (196/196), done.
	Checking connectivity... done.
	Submodule path 'plugins/cookbook-plugin': checked out 'b9d3ca8a65030071e28be19296ba867ab424fbbf'
	Cloning into 'plugins/download-commands'...
	remote: Counting objects: 37, done
	remote: Finding sources: 100% (37/37)
	remote: Total 448 (delta 46), reused 448 (delta 46)
	Receiving objects: 100% (448/448), 96.13 KiB | 0 bytes/s, done.
	Resolving deltas: 100% (46/46), done.
	Checking connectivity... done.
	Submodule path 'plugins/download-commands': checked out '99e61fb06a4505a9558c23a56213cb32ceaa9cca'
	...

The output for each submodule is in one chunk and I can understand what
happened in each.

By contrast, with intermixed output the speed of the output tells you
something about whether the process is stalled while you stare at the
screen and wait for it to finish (good), but the result on the screen is
unintelligible once the process finishes (bad).

Showing interactive output for one task still provides the real-time
feedback about whether the operation is completely stuck and needs to be
cancelled.  Even purely from the point of view of real-time progress, it
is easier to make sense of than the intermixed output.

What problem is the intermixed output meant to solve?

In other words, what is your objection?  Perhaps there is a way to both
satisfy that objection and keep the output clumped per task, but it is
hard to know without knowing what the objection is.

Hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] fetch: fetch submodules in parallel
  2015-08-28 18:41               ` Stefan Beller
@ 2015-08-28 18:44                 ` Jeff King
  2015-08-28 18:50                   ` Jonathan Nieder
  2015-08-28 18:59                   ` Stefan Beller
  0 siblings, 2 replies; 33+ messages in thread
From: Jeff King @ 2015-08-28 18:44 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Junio C Hamano, Jonathan Nieder, git@vger.kernel.org,
	Johannes Schindelin

On Fri, Aug 28, 2015 at 11:41:17AM -0700, Stefan Beller wrote:

> > So to an observer, it would look like a serial operation, but subsequent
> > operations after the first would magically go much faster (because
> > they'd been working and buffering in the background).
> >
> > And that doesn't require any additional IPC magic (though I am not sure
> > how we get progress in the first place if the child stderr is a
> > pipe...).
> 
> We can move the contents from the pipe into a strbuf, which we can grow
> indefinitely (way beyond the pipe buffer limits; the output of a git fetch
> should be small enough for that).

Right, clearly we can't rely on pipe buffers to be large enough here
(though we _may_ want to rely on tempfiles if we aren't sure that the
stdout is bounded in a reasonable way).

But what I meant was: the child will only show progress if stderr is a
tty, but here it is not.

I wonder if we need to set GIT_STDERR_IS_TTY=1 in the parent process,
and then respect it in the children (this is similar to what
GIT_PAGER_IN_USE does for stdout).
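
On the child side that might boil down to something very small, roughly
(sketch only; GIT_STDERR_IS_TTY does not exist yet, it is just the
proposal above):

  /* sketch; GIT_STDERR_IS_TTY is only a proposed variable at this point */
  static int stderr_wants_progress(void)
  {
          /* fall back to the real isatty() check when the variable is unset */
          return git_env_bool("GIT_STDERR_IS_TTY", isatty(2));
  }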

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] fetch: fetch submodules in parallel
  2015-08-28 18:35             ` Jeff King
  2015-08-28 18:41               ` Junio C Hamano
  2015-08-28 18:41               ` Stefan Beller
@ 2015-08-28 18:44               ` Jonathan Nieder
  2 siblings, 0 replies; 33+ messages in thread
From: Jonathan Nieder @ 2015-08-28 18:44 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Stefan Beller, git@vger.kernel.org,
	Johannes Schindelin

Jeff King wrote:

> I think Jonathan's point is that you could pick _one_ active child to
> show without buffering, while simultaneously buffering everybody else's
> output.

Yep.  Thanks for interpreting.

[...]
> So to an observer, it would look like a serial operation, but subsequent
> operations after the first would magically go much faster (because
> they'd been working and buffering in the background).

Yes.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] fetch: fetch submodules in parallel
  2015-08-28 18:44                 ` Jeff King
@ 2015-08-28 18:50                   ` Jonathan Nieder
  2015-08-28 18:53                     ` Jeff King
  2015-08-28 18:59                   ` Stefan Beller
  1 sibling, 1 reply; 33+ messages in thread
From: Jonathan Nieder @ 2015-08-28 18:50 UTC (permalink / raw)
  To: Jeff King
  Cc: Stefan Beller, Junio C Hamano, git@vger.kernel.org,
	Johannes Schindelin

Jeff King wrote:

> Right, clearly we can't rely on pipe buffers to be large enough here
> (though we _may_ want to rely on tempfiles if we aren't sure that the
> stdout is bounded in a reasonable way).
>
> But what I meant was: the child will only show progress if stderr is a
> tty, but here it is not.

For clone / fetch, we can pass --progress explicitly.

For some reason 'git checkout' doesn't support a --progress option.  I
suppose it should. ;-)
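
For the parallel-fetch case that could be as little as adding the flag
when the child's command line is assembled, roughly like this (a sketch,
not the code in this series; the function name is made up):

  /* Rough sketch: force progress even though the child's stderr is a pipe. */
  static void prepare_submodule_fetch(struct child_process *cp, const char *path)
  {
          child_process_init(cp);
          cp->git_cmd = 1;
          cp->dir = path;
          cp->err = -1;   /* capture stderr through a pipe */
          argv_array_pushl(&cp->args, "fetch", "--progress", NULL);
  }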

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] fetch: fetch submodules in parallel
  2015-08-28 18:50                   ` Jonathan Nieder
@ 2015-08-28 18:53                     ` Jeff King
  2015-08-28 19:02                       ` Stefan Beller
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2015-08-28 18:53 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Stefan Beller, Junio C Hamano, git@vger.kernel.org,
	Johannes Schindelin

On Fri, Aug 28, 2015 at 11:50:50AM -0700, Jonathan Nieder wrote:

> > But what I meant was: the child will only show progress if stderr is a
> > tty, but here it is not.
> 
> For clone / fetch, we can pass --progress explicitly.
> 
> For some reason 'git checkout' doesn't support a --progress option.  I
> suppose it should. ;-)

Yeah, that will work for those tools, but I thought you could pass
arbitrary shell commands.  It would be nice if git sub-commands run
through those just magically worked, even though we don't have an
opportunity to change their command-line parameters.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] fetch: fetch submodules in parallel
  2015-08-28 18:44                 ` Jeff King
  2015-08-28 18:50                   ` Jonathan Nieder
@ 2015-08-28 18:59                   ` Stefan Beller
  1 sibling, 0 replies; 33+ messages in thread
From: Stefan Beller @ 2015-08-28 18:59 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Jonathan Nieder, git@vger.kernel.org,
	Johannes Schindelin

On Fri, Aug 28, 2015 at 11:44 AM, Jeff King <peff@peff.net> wrote:
> On Fri, Aug 28, 2015 at 11:41:17AM -0700, Stefan Beller wrote:
>
>> > So to an observer, it would look like a serial operation, but subsequent
>> > operations after the first would magically go much faster (because
>> > they'd been working and buffering in the background).
>> >
>> > And that doesn't require any additional IPC magic (though I am not sure
>> > how we get progress in the first place if the child stderr is a
>> > pipe...).
>>
>> We can move the contents from the pipe into a strbuf, which we can grow
>> indefinitely (way beyond the pipe buffer limits; the output of a git fetch
>> should be small enough for that).
>
> Right, clearly we can't rely on pipe buffers to be large enough here
> (though we _may_ want to rely on tempfiles if we aren't sure that the
> stdout is bounded in a reasonable way).
>
> But what I meant was: the child will only show progress if stderr is a
> tty, but here it is not.

Oh, I forgot about that.

>
> I wonder if we need to set GIT_STDERR_IS_TTY=1 in the parent process,
> and then respect it in the children (this is similar to what
> GIT_PAGER_IN_USE does for stdout).

The use of GIT_PAGER_IN_USE looks straightforward to me. I'll try to add
GIT_STDERR_IS_TTY then.
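
On the parent side that should be a small change, roughly (sketch; the
variable name is still only the proposal above, and this assumes the
env_array of the child_process is used to pass it down):

  /* Sketch: tell the child that its "real" stderr would have been a tty. */
  static void export_stderr_tty(struct child_process *cp)
  {
          if (isatty(2))
                  argv_array_push(&cp->env_array, "GIT_STDERR_IS_TTY=1");
  }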


>
> -Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] fetch: fetch submodules in parallel
  2015-08-28 18:53                     ` Jeff King
@ 2015-08-28 19:02                       ` Stefan Beller
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Beller @ 2015-08-28 19:02 UTC (permalink / raw)
  To: Jeff King
  Cc: Jonathan Nieder, Junio C Hamano, git@vger.kernel.org,
	Johannes Schindelin

On Fri, Aug 28, 2015 at 11:53 AM, Jeff King <peff@peff.net> wrote:
> On Fri, Aug 28, 2015 at 11:50:50AM -0700, Jonathan Nieder wrote:
>
>> > But what I meant was: the child will only show progress if stderr is a
>> > tty, but here it is not.
>>
>> For clone / fetch, we can pass --progress explicitly.
>>
>> For some reason 'git checkout' doesn't support a --progress option.  I
>> suppose it should. ;-)
>
> Yeah, that will work for those tools, but I thought you could pass
> arbitrary shell commands.  It would be nice if git sub-commands run
> through those just magically worked, even though we don't have an
> opportunity to change their command-line parameters.

Technically speaking this discussion is about the fetch patch only, and
we'd want to have it on the previous patch. ;)
But as both of them use the same code, the synced-output feature of
run-command from patch 5/9, we want git commands to just work anyway.

>
> -Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/9] submodule: implement `module_clone` as a builtin helper
  2015-08-28  1:14 ` [PATCH 3/9] submodule: implement `module_clone` " Stefan Beller
@ 2015-08-31 18:53   ` Junio C Hamano
  0 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2015-08-31 18:53 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, peff, jrnieder, johannes.schindelin

Stefan Beller <sbeller@google.com> writes:

> +		char *s = (char*)sm_gitdir;

	char *s = (char *)sm_gitdir;

Many other places have the same issue, also in 7/9.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] fetch: fetch submodules in parallel
  2015-08-28  1:14 ` [PATCH 7/9] fetch: fetch submodules " Stefan Beller
  2015-08-28 17:00   ` Stefan Beller
@ 2015-08-31 18:56   ` Junio C Hamano
  2015-08-31 19:05     ` Jeff King
  1 sibling, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2015-08-31 18:56 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, peff, jrnieder, johannes.schindelin

Stefan Beller <sbeller@google.com> writes:

> +static void destroy_output_mutex()

static void destroy_output_mutex(void)

> +{
> +	pthread_mutex_destroy(&output_mutex);
> +}
> +
> +#else
> +#define set_output_mutex()
> +#define destroy_output_mutex()
> +#define lock_output_mutex()
> +#define unlock_output_mutex()
> +#endif
> +
> +static struct submodule_parallel_fetch *submodule_parallel_fetch_create()

static struct submodule_parallel_fetch *submodule_parallel_fetch_create(void)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] fetch: fetch submodules in parallel
  2015-08-31 18:56   ` Junio C Hamano
@ 2015-08-31 19:05     ` Jeff King
  0 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2015-08-31 19:05 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, git

On Mon, Aug 31, 2015 at 11:56:04AM -0700, Junio C Hamano wrote:

> Stefan Beller <sbeller@google.com> writes:
> 
> > +static void destroy_output_mutex()
> 
> static void destroy_output_mutex(void)

Yep. Stefan, you may want to beef up the warning flags in your
config.mak. For reference, I use:

  CFLAGS += -Wall -Werror
  CFLAGS += -Wdeclaration-after-statement
  CFLAGS += -Wpointer-arith
  CFLAGS += -Wstrict-prototypes
  CFLAGS += -Wvla
  CFLAGS += -Wold-style-declaration
  CFLAGS += -Wold-style-definition

though note that you will need to relax some of those if compiling older
versions of git. My complete config.mak is at:

  https://github.com/peff/git/blob/meta/config/config.mak

which handles this semi-automatically (it's wildly undocumented, but I'd
be happy to explain any of it if anybody is interested).

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2015-08-31 19:05 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-28  1:14 [PATCH 0/9] Progress with git submodule Stefan Beller
2015-08-28  1:14 ` [PATCH 1/9] submodule: implement `module_list` as a builtin helper Stefan Beller
2015-08-28  1:14 ` [PATCH 2/9] submodule: implement `module_name` " Stefan Beller
2015-08-28  1:14 ` [PATCH 3/9] submodule: implement `module_clone` " Stefan Beller
2015-08-31 18:53   ` Junio C Hamano
2015-08-28  1:14 ` [PATCH 4/9] thread-utils: add a threaded task queue Stefan Beller
2015-08-28  1:14 ` [PATCH 5/9] run-command: add synced output Stefan Beller
2015-08-28  1:14 ` [PATCH 6/9] submodule: helper to run foreach in parallel Stefan Beller
2015-08-28 17:08   ` Stefan Beller
2015-08-28  1:14 ` [PATCH 7/9] fetch: fetch submodules " Stefan Beller
2015-08-28 17:00   ` Stefan Beller
2015-08-28 17:01     ` Jonathan Nieder
2015-08-28 17:12       ` Junio C Hamano
2015-08-28 17:45         ` Stefan Beller
2015-08-28 18:20         ` Jonathan Nieder
2015-08-28 18:27           ` Junio C Hamano
2015-08-28 18:35             ` Jeff King
2015-08-28 18:41               ` Junio C Hamano
2015-08-28 18:41               ` Stefan Beller
2015-08-28 18:44                 ` Jeff King
2015-08-28 18:50                   ` Jonathan Nieder
2015-08-28 18:53                     ` Jeff King
2015-08-28 19:02                       ` Stefan Beller
2015-08-28 18:59                   ` Stefan Beller
2015-08-28 18:44               ` Jonathan Nieder
2015-08-28 18:36             ` Stefan Beller
2015-08-28 18:42             ` Jonathan Nieder
2015-08-31 18:56   ` Junio C Hamano
2015-08-31 19:05     ` Jeff King
2015-08-28  1:14 ` [PATCH 8/9] index-pack: Use the new worker pool Stefan Beller
2015-08-28  1:14 ` [PATCH 9/9] pack-objects: Use " Stefan Beller
2015-08-28 10:09 ` [PATCH 0/9] Progress with git submodule Johannes Schindelin
2015-08-28 16:35   ` Stefan Beller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).