git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Stefan Beller <sbeller@google.com>
Cc: peff@peff.net, git@vger.kernel.org, jrnieder@gmail.com
Subject: Re: [PATCH 3/5] submodule: helper to run foreach in parallel
Date: Tue, 25 Aug 2015 14:09:12 -0700	[thread overview]
Message-ID: <xmqq7fojxeh3.fsf@gitster.dls.corp.google.com> (raw)
In-Reply-To: <1440523706-23041-4-git-send-email-sbeller@google.com> (Stefan Beller's message of "Tue, 25 Aug 2015 10:28:24 -0700")

Stefan Beller <sbeller@google.com> writes:

> This runs a command on each submodule in parallel and should eventually
> replace `git submodule foreach`.
>
> There is a new option -j/--jobs (inspired by make) to specify the number
> of parallel threads.
>
> The jobs=1 case needs to be special cases to exactly replicate the current
> default behavior of `git submodule foreach` such as working stdin input.

"git submodule foreach-parallel -j1" feels like a roundabout way to
say "git submodule foreach"; "I want you to go parallel.  Oh, not
really, I want you to do one thing at a time".

I do not think these two have to be equivalent---users who need the
traditional one-by-one "foreach" behaviour (including the standard
input and other aspects that are unreasonable to expect
"foreach-parallel -jN" to replicate) can and should choose to use
"foreach", not "foreach-parallel -j1".

In any case, I do not think I saw any special casing of -j1 case that
attempts to leave the standard input operational.  Did I miss
something, or is the above describing what is left to do?

> For more than one job there is no input possible and the output of both
> stdout/stderr of the command are put into the stderr in an ordered fashion,
> i.e. the tasks to not intermingle their output in races.

To rephrase what I said earlier, "for parallel version, the above
things happen, even with numthreads==1", is perfectly fine.

> +	cp->no_stdin = 1;
> +	cp->out = 0;
> +	cp->err = -1;
> +	cp->dir = args->path;
> +	cp->stdout_to_stderr = 1;

So the standard error and the standard output are mixed to a single
pipe ...

> +	cp->use_shell = 1;
> +
> +	if (start_command(cp)) {
> +		die("Could not start command");
> +		for (i = 0; cp->args.argv; i++)
> +			fprintf(stderr, "%s\n", cp->args.argv[i]);
> +	}
> +
> +	while (1) {
> +		ssize_t len = xread(cp->err, buf, sizeof(buf));
> +		if (len < 0)
> +			die("Read from child failed");
> +		else if (len == 0)
> +			break;
> +		else {
> +			strbuf_add(&out, buf, len);
> +		}

... and the whole thing is accumulated in core???

> +	}
> +	if (finish_command(cp))
> +		die("command died with error");
> +
> +	sem_wait(args->mutex);
> +	fputs(out.buf, stderr);
> +	sem_post(args->mutex);

... and emitted to standard error?

I would have expected that the standard error would be left alone
(i.e. letting warnings from multiple jobs to be mixed together
simply because everybody writes to the same file descriptor), while
the standard output would be line-buffered, perhaps captured by the
above loop and then emitted under mutex, or something.

I think I said this earlier, but latency to the first output counts
as an activity feedback mechanism.

> +	if (module_list_compute(0, nullargv, NULL, &pathspec) < 0)
> +		return 1;
> +
> +	gitmodules_config();
> +
> +	aq = create_task_queue(number_threads);
> +
> +	for (i = 0; i < ce_used; i++) {
> +		const struct submodule *sub;
> +		const struct cache_entry *ce = ce_entries[i];
> +		struct submodule_args *args = malloc(sizeof(*args));
> +
> +		if (ce_stage(ce))
> +			args->sha1 = xstrdup(sha1_to_hex(null_sha1));
> +		else
> +			args->sha1 = xstrdup(sha1_to_hex(ce->sha1));
> +
> +		strbuf_reset(&sb);
> +		strbuf_addf(&sb, "%s/.git", ce->name);
> +		if (!file_exists(sb.buf))
> +			continue;
> +
> +		args->path = ce->name;
> +		sub = submodule_from_path(null_sha1, args->path);
> +		if (!sub)
> +			die("No submodule mapping found in .gitmodules for path '%s'", args->path);
> +
> +		args->name = sub->name;
> +		args->toplevel = xstrdup(xgetcwd());
> +		args->cmd = argv;
> +		args->mutex = mutex;
> +		args->prefix = alternative_path;
> +		add_task(aq, run_cmd_submodule, args);
> +	}
> +
> +	finish_task_queue(aq, NULL);

This is very nice.  Declare a task queue with N workers, throw bunch
of task to it and then wait for all of them to complete.  Things
can't be simpler than that ;-).  One thing that other callers of the
mechanism might want may be to plug and unplug the task queue, but
that can wait until the need arises.

  reply	other threads:[~2015-08-25 21:09 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-25 17:28 [RFC PATCH 0/5] Demonstrate new parallel threading API Stefan Beller
2015-08-25 17:28 ` [PATCH 1/5] FIXUP submodule: implement `module_clone` as a builtin helper Stefan Beller
2015-08-25 17:28 ` [PATCH 2/5] thread-utils: add a threaded task queue Stefan Beller
2015-08-25 17:28 ` [PATCH 3/5] submodule: helper to run foreach in parallel Stefan Beller
2015-08-25 21:09   ` Junio C Hamano [this message]
2015-08-25 21:42     ` Stefan Beller
2015-08-25 22:23       ` Junio C Hamano
2015-08-25 22:44         ` Junio C Hamano
2015-08-26 17:06   ` Jeff King
2015-08-26 17:21     ` Stefan Beller
2015-08-25 17:28 ` [PATCH 4/5] index-pack: Use the new worker pool Stefan Beller
2015-08-25 19:03   ` Jeff King
2015-08-25 19:23     ` Stefan Beller
2015-08-25 20:41     ` Junio C Hamano
2015-08-25 20:59       ` Stefan Beller
2015-08-25 21:12         ` Junio C Hamano
2015-08-25 22:39           ` Stefan Beller
2015-08-25 22:50             ` Junio C Hamano
2015-08-25 17:28 ` [PATCH 5/5] pack-objects: Use " Stefan Beller
  -- strict thread matches above, loose matches on Subject: below --
2015-08-27  0:52 [RFC PATCH 0/5] Progressing with `git submodule foreach_parallel` Stefan Beller
2015-08-27  0:52 ` [PATCH 3/5] submodule: helper to run foreach in parallel Stefan Beller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq7fojxeh3.fsf@gitster.dls.corp.google.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).