git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/1] fetch --multiple: respect --jobs=
@ 2019-10-01 11:53 Johannes Schindelin via GitGitGadget
  2019-10-01 11:53 ` [PATCH 1/1] fetch: let --jobs=<n> parallelize --multiple, too Johannes Schindelin via GitGitGadget
  2019-10-05 18:46 ` [PATCH v2 0/1] fetch --multiple: respect --jobs= Johannes Schindelin via GitGitGadget
  0 siblings, 2 replies; 10+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-10-01 11:53 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Junio C Hamano

I saw with sadness that pd/fetch-jobs went nowhere, and read in the most
recent What's Cooking mail that it was even dropped.

This is my attempt to resurrect the idea (although without the overhead of
trying to support a first-class UI to control submodule and multiple-remote
fetches independently, of which I was a rather outspoken opponent).

To make things a bit safer, this patch uses the --end-of-options marker, and
is therefore based on top of jk/eoo.

Johannes Schindelin (1):
  fetch: let --jobs=<n> parallelize --multiple, too

 Documentation/config/fetch.txt  |  10 +++
 Documentation/fetch-options.txt |  13 ++--
 builtin/fetch.c                 | 124 +++++++++++++++++++++++++++-----
 t/t5514-fetch-multiple.sh       |  11 +++
 4 files changed, 137 insertions(+), 21 deletions(-)


base-commit: 67feca3b1c45a51b204253039139b46cc07e145f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-369%2Fdscho%2Ffetch-jobs-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-369/dscho/fetch-jobs-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/369
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/1] fetch: let --jobs=<n> parallelize --multiple, too
  2019-10-01 11:53 [PATCH 0/1] fetch --multiple: respect --jobs= Johannes Schindelin via GitGitGadget
@ 2019-10-01 11:53 ` Johannes Schindelin via GitGitGadget
  2019-10-04  4:33   ` Junio C Hamano
  2019-10-05 18:46 ` [PATCH v2 0/1] fetch --multiple: respect --jobs= Johannes Schindelin via GitGitGadget
  1 sibling, 1 reply; 10+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-10-01 11:53 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

So far, `--jobs=<n>` only parallelizes submodule fetches/clones, not
`--multiple` fetches, which is unintuitive, given that the option's name
does not say anything about submodules in particular.

Let's change that. With this patch, also fetches from multiple remotes
are parallelized.

For backwards-compatibility (and to prepare for a use case where
submodule and multiple-remote fetches may need different parallelization
limits), the config setting `submodule.fetchJobs` still only controls
the submodule part of `git fetch`, while the newly-introduced setting
`fetch.parallel` controls both (but can be overridden for submodules
with `submodule.fetchJobs`).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/config/fetch.txt  |  10 +++
 Documentation/fetch-options.txt |  13 ++--
 builtin/fetch.c                 | 124 +++++++++++++++++++++++++++-----
 t/t5514-fetch-multiple.sh       |  11 +++
 4 files changed, 137 insertions(+), 21 deletions(-)

diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
index ba890b5884..b200634065 100644
--- a/Documentation/config/fetch.txt
+++ b/Documentation/config/fetch.txt
@@ -68,3 +68,13 @@ fetch.showForcedUpdates::
 	Set to false to enable `--no-show-forced-updates` in
 	linkgit:git-fetch[1] and linkgit:git-pull[1] commands.
 	Defaults to true.
+
+fetch.parallel::
+	Specifies the maximal number of fetch operations to be run in parallel
+	at a time (submodules, or remotes when the `--multiple` option of
+	linkgit:git-fetch[1] is in effect).
++
+A value of 0 will give some reasonable default. If unset, it defaults to 1.
++
+For submodules, this setting can be overridden using the `submodule.fetchJobs`
+config setting.
diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 3c9b4f9e09..8f269d3baa 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -160,10 +160,15 @@ ifndef::git-pull[]
 
 -j::
 --jobs=<n>::
-	Number of parallel children to be used for fetching submodules.
-	Each will fetch from different submodules, such that fetching many
-	submodules will be faster. By default submodules will be fetched
-	one at a time.
+	Number of parallel children to be used for all forms of fetching.
++
+If the `--multiple` option was specified, the different remotes will be fetched
+in parallel. If multiple submodules are fetched, they will be fetched in
+parallel. To control them independently, use the config settings
+`fetch.parallel` and `submodule.fetchJobs` (see linkgit:git-config[1]).
++
+Typically, parallel recursive and multi-remote fetches will be faster. By
+default fetches are performed sequentially, not in parallel.
 
 --no-recurse-submodules::
 	Disable recursive fetching of submodules (this has the same effect as
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 53ce99d2bb..e2d374724d 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -54,7 +54,8 @@ static int all, append, dry_run, force, keep, multiple, update_head_ok, verbosit
 static int progress = -1;
 static int enable_auto_gc = 1;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow, deepen;
-static int max_children = 1;
+static int max_jobs = -1, submodule_fetch_jobs_config = -1;
+static int fetch_parallel_config = 1;
 static enum transport_family family;
 static const char *depth;
 static const char *deepen_since;
@@ -96,13 +97,20 @@ static int git_fetch_config(const char *k, const char *v, void *cb)
 	}
 
 	if (!strcmp(k, "submodule.fetchjobs")) {
-		max_children = parse_submodule_fetchjobs(k, v);
+		submodule_fetch_jobs_config = parse_submodule_fetchjobs(k, v);
 		return 0;
 	} else if (!strcmp(k, "fetch.recursesubmodules")) {
 		recurse_submodules = parse_fetch_recurse_submodules_arg(k, v);
 		return 0;
 	}
 
+	if (!strcmp(k, "fetch.parallel")) {
+		fetch_parallel_config = git_config_int(k, v);
+		if (fetch_parallel_config < 0)
+			die(_("fetch.parallel cannot be negative"));
+		return 0;
+	}
+
 	return git_default_config(k, v, cb);
 }
 
@@ -134,7 +142,7 @@ static struct option builtin_fetch_options[] = {
 		    N_("fetch all tags and associated objects"), TAGS_SET),
 	OPT_SET_INT('n', NULL, &tags,
 		    N_("do not fetch all tags (--no-tags)"), TAGS_UNSET),
-	OPT_INTEGER('j', "jobs", &max_children,
+	OPT_INTEGER('j', "jobs", &max_jobs,
 		    N_("number of submodules fetched in parallel")),
 	OPT_BOOL('p', "prune", &prune,
 		 N_("prune remote-tracking branches no longer on remote")),
@@ -1456,7 +1464,62 @@ static void add_options_to_argv(struct argv_array *argv)
 
 }
 
-static int fetch_multiple(struct string_list *list)
+/* Fetch multiple remotes in parallel */
+
+struct parallel_fetch_state {
+	const char **argv;
+	struct string_list *remotes;
+	int next, result;
+};
+
+static int fetch_next_remote(struct child_process *cp, struct strbuf *out,
+			     void *cb, void **task_cb)
+{
+	struct parallel_fetch_state *state = cb;
+	char *remote;
+
+	if (state->next < 0 || state->next >= state->remotes->nr)
+		return 0;
+
+	remote = state->remotes->items[state->next++].string;
+	*task_cb = remote;
+
+	argv_array_pushv(&cp->args, state->argv);
+	argv_array_push(&cp->args, remote);
+	cp->git_cmd = 1;
+
+	if (verbosity >= 0)
+		printf(_("Fetching %s\n"), remote);
+
+	return 1;
+}
+
+static int fetch_failed_to_start(struct strbuf *out, void *cb, void *task_cb)
+{
+	struct parallel_fetch_state *state = cb;
+	const char *remote = task_cb;
+
+	state->result = error(_("Could not fetch %s"), remote);
+
+	return 0;
+}
+
+static int fetch_finished(int result, struct strbuf *out,
+			  void *cb, void *task_cb)
+{
+	struct parallel_fetch_state *state = cb;
+	const char *remote = task_cb;
+
+	if (result) {
+		strbuf_addf(out, _("could not fetch '%s' (exit code: %d)\n"),
+			    remote, result);
+		state->result = -1;
+	}
+
+	return 0;
+}
+
+static int fetch_multiple(struct string_list *list, int max_children)
 {
 	int i, result = 0;
 	struct argv_array argv = ARGV_ARRAY_INIT;
@@ -1470,20 +1533,34 @@ static int fetch_multiple(struct string_list *list)
 	argv_array_pushl(&argv, "fetch", "--append", "--no-auto-gc", NULL);
 	add_options_to_argv(&argv);
 
-	for (i = 0; i < list->nr; i++) {
-		const char *name = list->items[i].string;
-		argv_array_push(&argv, name);
-		if (verbosity >= 0)
-			printf(_("Fetching %s\n"), name);
-		if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
-			error(_("Could not fetch %s"), name);
-			result = 1;
+	if (max_children != 1 && list->nr != 1) {
+		struct parallel_fetch_state state = { argv.argv, list, 0, 0 };
+
+		argv_array_push(&argv, "--end-of-options");
+		result = run_processes_parallel_tr2(max_children,
+						    &fetch_next_remote,
+						    &fetch_failed_to_start,
+						    &fetch_finished,
+						    &state,
+						    "fetch", "parallel/fetch");
+
+		if (!result)
+			result = state.result;
+	} else
+		for (i = 0; i < list->nr; i++) {
+			const char *name = list->items[i].string;
+			argv_array_push(&argv, name);
+			if (verbosity >= 0)
+				printf(_("Fetching %s\n"), name);
+			if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
+				error(_("Could not fetch %s"), name);
+				result = 1;
+			}
+			argv_array_pop(&argv);
 		}
-		argv_array_pop(&argv);
-	}
 
 	argv_array_clear(&argv);
-	return result;
+	return !!result;
 }
 
 /*
@@ -1626,7 +1703,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	for (i = 1; i < argc; i++)
 		strbuf_addf(&default_rla, " %s", argv[i]);
 
-	fetch_config_from_gitmodules(&max_children, &recurse_submodules);
+	fetch_config_from_gitmodules(&submodule_fetch_jobs_config,
+				     &recurse_submodules);
 	git_config(git_fetch_config, NULL);
 
 	argc = parse_options(argc, argv, prefix,
@@ -1692,15 +1770,27 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 			fetch_one_setup_partial(remote);
 		result = fetch_one(remote, argc, argv, prune_tags_ok);
 	} else {
+		int max_children = max_jobs;
+
 		if (filter_options.choice)
 			die(_("--filter can only be used with the remote "
 			      "configured in extensions.partialclone"));
+
+		if (max_children < 0)
+			max_children = fetch_parallel_config;
+
 		/* TODO should this also die if we have a previous partial-clone? */
-		result = fetch_multiple(&list);
+		result = fetch_multiple(&list, max_children);
 	}
 
 	if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
 		struct argv_array options = ARGV_ARRAY_INIT;
+		int max_children = max_jobs;
+
+		if (max_children < 0)
+			max_children = submodule_fetch_jobs_config;
+		if (max_children < 0)
+			max_children = fetch_parallel_config;
 
 		add_options_to_argv(&options);
 		result = fetch_populated_submodules(the_repository,
diff --git a/t/t5514-fetch-multiple.sh b/t/t5514-fetch-multiple.sh
index 5426d4b5ab..cce829b989 100755
--- a/t/t5514-fetch-multiple.sh
+++ b/t/t5514-fetch-multiple.sh
@@ -183,4 +183,15 @@ test_expect_success 'git fetch --all --tags' '
 	test_cmp expect test8/output
 '
 
+test_expect_success 'parallel' '
+	git remote add one ./bogus1 &&
+	git remote add two ./bogus2 &&
+
+	test_must_fail env GIT_TRACE="$PWD/trace" \
+		git fetch --jobs=2 --multiple one two 2>err &&
+	grep "2 tasks" trace &&
+	grep "one.*128" err &&
+	grep "two.*128" err
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] fetch: let --jobs=<n> parallelize --multiple, too
  2019-10-01 11:53 ` [PATCH 1/1] fetch: let --jobs=<n> parallelize --multiple, too Johannes Schindelin via GitGitGadget
@ 2019-10-04  4:33   ` Junio C Hamano
  2019-10-04 22:03     ` Johannes Schindelin
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2019-10-04  4:33 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> +test_expect_success 'parallel' '
> +	git remote add one ./bogus1 &&
> +	git remote add two ./bogus2 &&
> +
> +	test_must_fail env GIT_TRACE="$PWD/trace" \
> +		git fetch --jobs=2 --multiple one two 2>err &&
> +	grep "2 tasks" trace &&

I think this one expects to match this in run-command.c:

	trace_printf("run_processes_parallel: preparing to run up to %d tasks", n);

> +	grep "one.*128" err &&
> +	grep "two.*128" err

and these expect to match this in fetch.c

		strbuf_addf(out, _("could not fetch '%s' (exit code: %d)\n"),

It would have been nice to fellow contributors, if the grep patterns
were written a bit more tightly.  It would allow people who debug
test failure to more easily identify which message the patterns are
trying to catch.

In any case, the latter two needs to be guarded against
gettext-poison, I would think.  Without addressing the vagueness of
the pattern, at least the following needs to be squashed to help the
CI.

Thanks.

---
 t/t5514-fetch-multiple.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t5514-fetch-multiple.sh b/t/t5514-fetch-multiple.sh
index cce829b989..33f5220a53 100755
--- a/t/t5514-fetch-multiple.sh
+++ b/t/t5514-fetch-multiple.sh
@@ -190,8 +190,8 @@ test_expect_success 'parallel' '
 	test_must_fail env GIT_TRACE="$PWD/trace" \
 		git fetch --jobs=2 --multiple one two 2>err &&
 	grep "2 tasks" trace &&
-	grep "one.*128" err &&
-	grep "two.*128" err
+	test_i18ngrep "one.*128" err &&
+	test_i18ngrep "two.*128" err
 '
 
 test_done
-- 
2.23.0-686-g3bf927a9c0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] fetch: let --jobs=<n> parallelize --multiple, too
  2019-10-04  4:33   ` Junio C Hamano
@ 2019-10-04 22:03     ` Johannes Schindelin
  0 siblings, 0 replies; 10+ messages in thread
From: Johannes Schindelin @ 2019-10-04 22:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin via GitGitGadget, git

Hi Junio,

On Fri, 4 Oct 2019, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > +test_expect_success 'parallel' '
> > +	git remote add one ./bogus1 &&
> > +	git remote add two ./bogus2 &&
> > +
> > +	test_must_fail env GIT_TRACE="$PWD/trace" \
> > +		git fetch --jobs=2 --multiple one two 2>err &&
> > +	grep "2 tasks" trace &&
>
> I think this one expects to match this in run-command.c:
>
> 	trace_printf("run_processes_parallel: preparing to run up to %d tasks", n);
>
> > +	grep "one.*128" err &&
> > +	grep "two.*128" err
>
> and these expect to match this in fetch.c
>
> 		strbuf_addf(out, _("could not fetch '%s' (exit code: %d)\n"),
>
> It would have been nice to fellow contributors, if the grep patterns
> were written a bit more tightly.  It would allow people who debug
> test failure to more easily identify which message the patterns are
> trying to catch.

This is a two-edged sword: when those messages change (for whatever
reason), the regression test will fail, too, but it actually wants to
test the parallel fetch, not the trace message of
`run_processes_parallel`.

So I tried to prevent such an unactionable regression test failure. But
I see your reasoning, and I now thought about it and consider those
error messages to be rather stable.

Will fix.

> In any case, the latter two needs to be guarded against
> gettext-poison, I would think.  Without addressing the vagueness of
> the pattern, at least the following needs to be squashed to help the
> CI.

Indeed. I missed this because the GitGitGadget PR build was all green.
My guess is that I messed up the definition of that PR build (it is
_not_ what's in `azure-pipelines.yml` because that would not work
correctly when PRs target older commits). I _think_ I fixed it by
setting `export GIT_TEST_GETTEXT_POISON=true` explicitly (previously I
only set the `jobname`, expecting `ci/run-build-and-tests.sh` to pick up
on that.

Thanks,
Dscho

>
> Thanks.
>
> ---
>  t/t5514-fetch-multiple.sh | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/t/t5514-fetch-multiple.sh b/t/t5514-fetch-multiple.sh
> index cce829b989..33f5220a53 100755
> --- a/t/t5514-fetch-multiple.sh
> +++ b/t/t5514-fetch-multiple.sh
> @@ -190,8 +190,8 @@ test_expect_success 'parallel' '
>  	test_must_fail env GIT_TRACE="$PWD/trace" \
>  		git fetch --jobs=2 --multiple one two 2>err &&
>  	grep "2 tasks" trace &&
> -	grep "one.*128" err &&
> -	grep "two.*128" err
> +	test_i18ngrep "one.*128" err &&
> +	test_i18ngrep "two.*128" err
>  '
>
>  test_done
> --
> 2.23.0-686-g3bf927a9c0
>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 0/1] fetch --multiple: respect --jobs=
  2019-10-01 11:53 [PATCH 0/1] fetch --multiple: respect --jobs= Johannes Schindelin via GitGitGadget
  2019-10-01 11:53 ` [PATCH 1/1] fetch: let --jobs=<n> parallelize --multiple, too Johannes Schindelin via GitGitGadget
@ 2019-10-05 18:46 ` Johannes Schindelin via GitGitGadget
  2019-10-05 18:46   ` [PATCH v2 1/1] fetch: let --jobs=<n> parallelize --multiple, too Johannes Schindelin via GitGitGadget
  1 sibling, 1 reply; 10+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-10-05 18:46 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Junio C Hamano

I saw with sadness that pd/fetch-jobs went nowhere, and read in the most
recent What's Cooking mail that it was even dropped.

This is my attempt to resurrect the idea (although without the overhead of
trying to support a first-class UI to control submodule and multiple-remote
fetches independently, of which I was a rather outspoken opponent).

To make things a bit safer, this patch uses the --end-of-options marker, and
is therefore based on top of jk/eoo.

Changes since v1:

 * The regression test now passes even under GETTEXT_POISON.
 * The needles used in the regression test are now more indicative of the
   code producing them.

Johannes Schindelin (1):
  fetch: let --jobs=<n> parallelize --multiple, too

 Documentation/config/fetch.txt  |  10 +++
 Documentation/fetch-options.txt |  13 ++--
 builtin/fetch.c                 | 124 +++++++++++++++++++++++++++-----
 t/t5514-fetch-multiple.sh       |  11 +++
 4 files changed, 137 insertions(+), 21 deletions(-)


base-commit: 67feca3b1c45a51b204253039139b46cc07e145f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-369%2Fdscho%2Ffetch-jobs-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-369/dscho/fetch-jobs-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/369

Range-diff vs v1:

 1:  818936f1e0 ! 1:  93a155a000 fetch: let --jobs=<n> parallelize --multiple, too
     @@ -267,9 +267,9 @@
      +
      +	test_must_fail env GIT_TRACE="$PWD/trace" \
      +		git fetch --jobs=2 --multiple one two 2>err &&
     -+	grep "2 tasks" trace &&
     -+	grep "one.*128" err &&
     -+	grep "two.*128" err
     ++	grep "preparing to run up to 2 tasks" trace &&
     ++	test_i18ngrep "could not fetch .one.*128" err &&
     ++	test_i18ngrep "could not fetch .two.*128" err
      +'
      +
       test_done

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/1] fetch: let --jobs=<n> parallelize --multiple, too
  2019-10-05 18:46 ` [PATCH v2 0/1] fetch --multiple: respect --jobs= Johannes Schindelin via GitGitGadget
@ 2019-10-05 18:46   ` Johannes Schindelin via GitGitGadget
       [not found]     ` <xmqqftk67r6j.fsf@gitster-ct.c.googlers.com>
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-10-05 18:46 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Junio C Hamano, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

So far, `--jobs=<n>` only parallelizes submodule fetches/clones, not
`--multiple` fetches, which is unintuitive, given that the option's name
does not say anything about submodules in particular.

Let's change that. With this patch, also fetches from multiple remotes
are parallelized.

For backwards-compatibility (and to prepare for a use case where
submodule and multiple-remote fetches may need different parallelization
limits), the config setting `submodule.fetchJobs` still only controls
the submodule part of `git fetch`, while the newly-introduced setting
`fetch.parallel` controls both (but can be overridden for submodules
with `submodule.fetchJobs`).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Documentation/config/fetch.txt  |  10 +++
 Documentation/fetch-options.txt |  13 ++--
 builtin/fetch.c                 | 124 +++++++++++++++++++++++++++-----
 t/t5514-fetch-multiple.sh       |  11 +++
 4 files changed, 137 insertions(+), 21 deletions(-)

diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
index ba890b5884..b200634065 100644
--- a/Documentation/config/fetch.txt
+++ b/Documentation/config/fetch.txt
@@ -68,3 +68,13 @@ fetch.showForcedUpdates::
 	Set to false to enable `--no-show-forced-updates` in
 	linkgit:git-fetch[1] and linkgit:git-pull[1] commands.
 	Defaults to true.
+
+fetch.parallel::
+	Specifies the maximal number of fetch operations to be run in parallel
+	at a time (submodules, or remotes when the `--multiple` option of
+	linkgit:git-fetch[1] is in effect).
++
+A value of 0 will give some reasonable default. If unset, it defaults to 1.
++
+For submodules, this setting can be overridden using the `submodule.fetchJobs`
+config setting.
diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 3c9b4f9e09..8f269d3baa 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -160,10 +160,15 @@ ifndef::git-pull[]
 
 -j::
 --jobs=<n>::
-	Number of parallel children to be used for fetching submodules.
-	Each will fetch from different submodules, such that fetching many
-	submodules will be faster. By default submodules will be fetched
-	one at a time.
+	Number of parallel children to be used for all forms of fetching.
++
+If the `--multiple` option was specified, the different remotes will be fetched
+in parallel. If multiple submodules are fetched, they will be fetched in
+parallel. To control them independently, use the config settings
+`fetch.parallel` and `submodule.fetchJobs` (see linkgit:git-config[1]).
++
+Typically, parallel recursive and multi-remote fetches will be faster. By
+default fetches are performed sequentially, not in parallel.
 
 --no-recurse-submodules::
 	Disable recursive fetching of submodules (this has the same effect as
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 53ce99d2bb..e2d374724d 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -54,7 +54,8 @@ static int all, append, dry_run, force, keep, multiple, update_head_ok, verbosit
 static int progress = -1;
 static int enable_auto_gc = 1;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow, deepen;
-static int max_children = 1;
+static int max_jobs = -1, submodule_fetch_jobs_config = -1;
+static int fetch_parallel_config = 1;
 static enum transport_family family;
 static const char *depth;
 static const char *deepen_since;
@@ -96,13 +97,20 @@ static int git_fetch_config(const char *k, const char *v, void *cb)
 	}
 
 	if (!strcmp(k, "submodule.fetchjobs")) {
-		max_children = parse_submodule_fetchjobs(k, v);
+		submodule_fetch_jobs_config = parse_submodule_fetchjobs(k, v);
 		return 0;
 	} else if (!strcmp(k, "fetch.recursesubmodules")) {
 		recurse_submodules = parse_fetch_recurse_submodules_arg(k, v);
 		return 0;
 	}
 
+	if (!strcmp(k, "fetch.parallel")) {
+		fetch_parallel_config = git_config_int(k, v);
+		if (fetch_parallel_config < 0)
+			die(_("fetch.parallel cannot be negative"));
+		return 0;
+	}
+
 	return git_default_config(k, v, cb);
 }
 
@@ -134,7 +142,7 @@ static struct option builtin_fetch_options[] = {
 		    N_("fetch all tags and associated objects"), TAGS_SET),
 	OPT_SET_INT('n', NULL, &tags,
 		    N_("do not fetch all tags (--no-tags)"), TAGS_UNSET),
-	OPT_INTEGER('j', "jobs", &max_children,
+	OPT_INTEGER('j', "jobs", &max_jobs,
 		    N_("number of submodules fetched in parallel")),
 	OPT_BOOL('p', "prune", &prune,
 		 N_("prune remote-tracking branches no longer on remote")),
@@ -1456,7 +1464,62 @@ static void add_options_to_argv(struct argv_array *argv)
 
 }
 
-static int fetch_multiple(struct string_list *list)
+/* Fetch multiple remotes in parallel */
+
+struct parallel_fetch_state {
+	const char **argv;
+	struct string_list *remotes;
+	int next, result;
+};
+
+static int fetch_next_remote(struct child_process *cp, struct strbuf *out,
+			     void *cb, void **task_cb)
+{
+	struct parallel_fetch_state *state = cb;
+	char *remote;
+
+	if (state->next < 0 || state->next >= state->remotes->nr)
+		return 0;
+
+	remote = state->remotes->items[state->next++].string;
+	*task_cb = remote;
+
+	argv_array_pushv(&cp->args, state->argv);
+	argv_array_push(&cp->args, remote);
+	cp->git_cmd = 1;
+
+	if (verbosity >= 0)
+		printf(_("Fetching %s\n"), remote);
+
+	return 1;
+}
+
+static int fetch_failed_to_start(struct strbuf *out, void *cb, void *task_cb)
+{
+	struct parallel_fetch_state *state = cb;
+	const char *remote = task_cb;
+
+	state->result = error(_("Could not fetch %s"), remote);
+
+	return 0;
+}
+
+static int fetch_finished(int result, struct strbuf *out,
+			  void *cb, void *task_cb)
+{
+	struct parallel_fetch_state *state = cb;
+	const char *remote = task_cb;
+
+	if (result) {
+		strbuf_addf(out, _("could not fetch '%s' (exit code: %d)\n"),
+			    remote, result);
+		state->result = -1;
+	}
+
+	return 0;
+}
+
+static int fetch_multiple(struct string_list *list, int max_children)
 {
 	int i, result = 0;
 	struct argv_array argv = ARGV_ARRAY_INIT;
@@ -1470,20 +1533,34 @@ static int fetch_multiple(struct string_list *list)
 	argv_array_pushl(&argv, "fetch", "--append", "--no-auto-gc", NULL);
 	add_options_to_argv(&argv);
 
-	for (i = 0; i < list->nr; i++) {
-		const char *name = list->items[i].string;
-		argv_array_push(&argv, name);
-		if (verbosity >= 0)
-			printf(_("Fetching %s\n"), name);
-		if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
-			error(_("Could not fetch %s"), name);
-			result = 1;
+	if (max_children != 1 && list->nr != 1) {
+		struct parallel_fetch_state state = { argv.argv, list, 0, 0 };
+
+		argv_array_push(&argv, "--end-of-options");
+		result = run_processes_parallel_tr2(max_children,
+						    &fetch_next_remote,
+						    &fetch_failed_to_start,
+						    &fetch_finished,
+						    &state,
+						    "fetch", "parallel/fetch");
+
+		if (!result)
+			result = state.result;
+	} else
+		for (i = 0; i < list->nr; i++) {
+			const char *name = list->items[i].string;
+			argv_array_push(&argv, name);
+			if (verbosity >= 0)
+				printf(_("Fetching %s\n"), name);
+			if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
+				error(_("Could not fetch %s"), name);
+				result = 1;
+			}
+			argv_array_pop(&argv);
 		}
-		argv_array_pop(&argv);
-	}
 
 	argv_array_clear(&argv);
-	return result;
+	return !!result;
 }
 
 /*
@@ -1626,7 +1703,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	for (i = 1; i < argc; i++)
 		strbuf_addf(&default_rla, " %s", argv[i]);
 
-	fetch_config_from_gitmodules(&max_children, &recurse_submodules);
+	fetch_config_from_gitmodules(&submodule_fetch_jobs_config,
+				     &recurse_submodules);
 	git_config(git_fetch_config, NULL);
 
 	argc = parse_options(argc, argv, prefix,
@@ -1692,15 +1770,27 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 			fetch_one_setup_partial(remote);
 		result = fetch_one(remote, argc, argv, prune_tags_ok);
 	} else {
+		int max_children = max_jobs;
+
 		if (filter_options.choice)
 			die(_("--filter can only be used with the remote "
 			      "configured in extensions.partialclone"));
+
+		if (max_children < 0)
+			max_children = fetch_parallel_config;
+
 		/* TODO should this also die if we have a previous partial-clone? */
-		result = fetch_multiple(&list);
+		result = fetch_multiple(&list, max_children);
 	}
 
 	if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
 		struct argv_array options = ARGV_ARRAY_INIT;
+		int max_children = max_jobs;
+
+		if (max_children < 0)
+			max_children = submodule_fetch_jobs_config;
+		if (max_children < 0)
+			max_children = fetch_parallel_config;
 
 		add_options_to_argv(&options);
 		result = fetch_populated_submodules(the_repository,
diff --git a/t/t5514-fetch-multiple.sh b/t/t5514-fetch-multiple.sh
index 5426d4b5ab..de8e2f1531 100755
--- a/t/t5514-fetch-multiple.sh
+++ b/t/t5514-fetch-multiple.sh
@@ -183,4 +183,15 @@ test_expect_success 'git fetch --all --tags' '
 	test_cmp expect test8/output
 '
 
+test_expect_success 'parallel' '
+	git remote add one ./bogus1 &&
+	git remote add two ./bogus2 &&
+
+	test_must_fail env GIT_TRACE="$PWD/trace" \
+		git fetch --jobs=2 --multiple one two 2>err &&
+	grep "preparing to run up to 2 tasks" trace &&
+	test_i18ngrep "could not fetch .one.*128" err &&
+	test_i18ngrep "could not fetch .two.*128" err
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/1] fetch: let --jobs=<n> parallelize --multiple, too
       [not found]     ` <xmqqftk67r6j.fsf@gitster-ct.c.googlers.com>
@ 2019-10-06  9:53       ` Johannes Schindelin
  2019-10-07  1:17         ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Schindelin @ 2019-10-06  9:53 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Randall S. Becker

Hi Junio,

On Sun, 6 Oct 2019, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > +test_expect_success 'parallel' '
> > +	git remote add one ./bogus1 &&
> > +	git remote add two ./bogus2 &&
> > +
> > +	test_must_fail env GIT_TRACE="$PWD/trace" \
> > +		git fetch --jobs=2 --multiple one two 2>err &&
> > +	grep "preparing to run up to 2 tasks" trace &&
> > +	test_i18ngrep "could not fetch .one.*128" err &&
> > +	test_i18ngrep "could not fetch .two.*128" err
> > +'
> > +
> >  test_done
>
> Thanks.  I think it is much better to prepare these tests like this
> patch does to be broken when phrasing changes---that would give
> feedback and confidence to the person who is changing the message
> and/or the logic to emit the message.
>
> Where does the constant 128 come from, by the way?  If it is from errno.h
> then we will soon hear breakage report from NonStop folks, I predict
> ;-)

It comes from `die()`:

-- snip --
static NORETURN void die_builtin(const char *err, va_list params)
{
        /*
	 * We call this trace2 function first and expect it to va_copy 'params'
	 * before using it (because an 'ap' can only be walked once).
	 */
	trace2_cmd_error_va(err, params);

	vreportf("fatal: ", err, params);

	exit(128);
}
-- snap --

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/1] fetch: let --jobs=<n> parallelize --multiple, too
  2019-10-06  9:53       ` Johannes Schindelin
@ 2019-10-07  1:17         ` Junio C Hamano
  2019-10-07 10:14           ` Johannes Schindelin
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2019-10-07  1:17 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Johannes Schindelin via GitGitGadget, git, Randall S. Becker

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi Junio,
> ...
>> Thanks.  I think it is much better to prepare these tests like this
>> patch does to be broken when phrasing changes---that would give
>> feedback and confidence to the person who is changing the message
>> and/or the logic to emit the message.
>>
>> Where does the constant 128 come from, by the way?  If it is from errno.h
>> then we will soon hear breakage report from NonStop folks, I predict
>> ;-)
>
> It comes from `die()`:
> ...
> 	exit(128);

OK, so hopefully we wouldn't see any platform specific variations.
Good.

Thanks for digging.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/1] fetch: let --jobs=<n> parallelize --multiple, too
  2019-10-07  1:17         ` Junio C Hamano
@ 2019-10-07 10:14           ` Johannes Schindelin
  2019-10-07 10:17             ` Johannes Schindelin
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Schindelin @ 2019-10-07 10:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Randall S. Becker

Hi Junio,

On Mon, 7 Oct 2019, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Hi Junio,
> > ...
> >> Thanks.  I think it is much better to prepare these tests like this
> >> patch does to be broken when phrasing changes---that would give
> >> feedback and confidence to the person who is changing the message
> >> and/or the logic to emit the message.
> >>
> >> Where does the constant 128 come from, by the way?  If it is from errno.h
> >> then we will soon hear breakage report from NonStop folks, I predict
> >> ;-)
> >
> > It comes from `die()`:
> > ...
> > 	exit(128);
>
> OK, so hopefully we wouldn't see any platform specific variations.

I am certain of it, as the matched `128` is not printed implicitly, it
is printed by these two lines that I added in this patch (as part of
`fetch_finished()`):

+               strbuf_addf(out, _("could not fetch '%s' (exit code: %d)\n"),
+                           remote, result);

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/1] fetch: let --jobs=<n> parallelize --multiple, too
  2019-10-07 10:14           ` Johannes Schindelin
@ 2019-10-07 10:17             ` Johannes Schindelin
  0 siblings, 0 replies; 10+ messages in thread
From: Johannes Schindelin @ 2019-10-07 10:17 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Randall S. Becker

Hi Junio,

On Mon, 7 Oct 2019, Johannes Schindelin wrote:

> On Mon, 7 Oct 2019, Junio C Hamano wrote:
>
> > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> >
> > > Hi Junio,
> > > ...
> > >> Thanks.  I think it is much better to prepare these tests like this
> > >> patch does to be broken when phrasing changes---that would give
> > >> feedback and confidence to the person who is changing the message
> > >> and/or the logic to emit the message.
> > >>
> > >> Where does the constant 128 come from, by the way?  If it is from errno.h
> > >> then we will soon hear breakage report from NonStop folks, I predict
> > >> ;-)
> > >
> > > It comes from `die()`:
> > > ...
> > > 	exit(128);
> >
> > OK, so hopefully we wouldn't see any platform specific variations.
>
> I am certain of it, as the matched `128` is not printed implicitly, it
> is printed by these two lines that I added in this patch (as part of
> `fetch_finished()`):
>
> +               strbuf_addf(out, _("could not fetch '%s' (exit code: %d)\n"),
> +                           remote, result);

Oh, and I forgot to say: the exit code is expected to be `128` because
the added regression test case uses a non-existing path to fetch from,
essentially saying "No, `git fetch`, I expect you to `die()`".

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-10-07 10:17 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-01 11:53 [PATCH 0/1] fetch --multiple: respect --jobs= Johannes Schindelin via GitGitGadget
2019-10-01 11:53 ` [PATCH 1/1] fetch: let --jobs=<n> parallelize --multiple, too Johannes Schindelin via GitGitGadget
2019-10-04  4:33   ` Junio C Hamano
2019-10-04 22:03     ` Johannes Schindelin
2019-10-05 18:46 ` [PATCH v2 0/1] fetch --multiple: respect --jobs= Johannes Schindelin via GitGitGadget
2019-10-05 18:46   ` [PATCH v2 1/1] fetch: let --jobs=<n> parallelize --multiple, too Johannes Schindelin via GitGitGadget
     [not found]     ` <xmqqftk67r6j.fsf@gitster-ct.c.googlers.com>
2019-10-06  9:53       ` Johannes Schindelin
2019-10-07  1:17         ` Junio C Hamano
2019-10-07 10:14           ` Johannes Schindelin
2019-10-07 10:17             ` Johannes Schindelin

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).