git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: git@vger.kernel.org
Cc: peff@peff.net, jrnieder@gmail.com, gitster@pobox.com,
	johannes.schindelin@gmx.de, Stefan Beller <sbeller@google.com>
Subject: [PATCH 5/9] run-command: add synced output
Date: Thu, 27 Aug 2015 18:14:51 -0700	[thread overview]
Message-ID: <1440724495-708-6-git-send-email-sbeller@google.com> (raw)
In-Reply-To: <1440724495-708-1-git-send-email-sbeller@google.com>

In the last patch we added an easy way to get a thread pool. Now if we
want to run external commands from threads in the thread pool, the output
will mix up between the threads.

To solve this problem we protect the output via a mutex from becoming
garbled. Each thread will try to acquire and directly pipe their output
to the standard output if they are lucky to get the mutex. If they do not
have the mutex each thread will buffer their output.

Example:
Let's assume we have 5 tasks A,B,C,D,E and each task takes a different
amount of time (say `git fetch` for different submodules), then the output
of tasks in sequential order might look like this:

 time -->
 output: |---A---|   |-B-|   |----C-----------|   |-D-|   |-E-|

When we schedule these tasks into two threads, a schedule
and sample output over time may look like this:

thread 1: |---A---|   |-D-|   |-E-|

thread 2: |-B-|   |----C-----------|

output:   |---A---|B|------C-------|ED

So A will be perceived as it would run normally in
the single threaded version of foreach. As B has finished
by the time the mutex becomes available, the whole buffer
will just be dumped into the standard output. This will be
perceived by the user as just a 'very fast' operation of B.
Once that is done, C takes the mutex, and flushes the piled
up buffer to standard output. In case the it is a
git command, we have a nice progress display, which will just
look like the first half of C happend really quickly.

Notice how E and D are put out in a different order than the
original as the new parallel foreach doesn't care about the
order.

So this way of output is really good for human consumption,
as it only changes the timing, not the actual output.

For machine consumption the output needs to be prepared in
the tasks, by either having a prefix per line or per block
to indicate whose tasks output is displayed.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 run-command.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 run-command.h | 18 ++++++++++++++++++
 2 files changed, 78 insertions(+)

diff --git a/run-command.c b/run-command.c
index 3d37f8c..bac4367 100644
--- a/run-command.c
+++ b/run-command.c
@@ -556,17 +556,77 @@ int finish_command(struct child_process *cmd)
 }
 
 int run_command(struct child_process *cmd)
+#ifdef NO_PTHREADS
 {
 	int code;
 
 	if (cmd->out < 0 || cmd->err < 0)
 		die("BUG: run_command with a pipe can cause deadlock");
 
+	if (cmd->sync_buf)
+		xwrite(cmd->out, cmd->sync_buf->buf, cmd->sync_buf->len);
+
 	code = start_command(cmd);
 	if (code)
 		return code;
 	return finish_command(cmd);
 }
+#else
+{
+	int code, lock_acquired;
+
+	if (!cmd->sync_mutex) {
+		if (cmd->out < 0 || cmd->err < 0)
+			die("BUG: run_command with a pipe can cause deadlock");
+
+		if (cmd->sync_buf)
+			xwrite(cmd->out, cmd->sync_buf->buf, cmd->sync_buf->len);
+	} else {
+		if (cmd->out < 0)
+			die("BUG: run_command with a pipe can cause deadlock");
+
+		if (!cmd->stdout_to_stderr)
+			die("BUG: run_command with sync_mutex not supported "
+			    "without stdout_to_stderr set");
+
+		if (!cmd->sync_buf)
+			die("BUG: Must pass a buffer when specifying "
+			    "to sync output");
+	}
+
+	code = start_command(cmd);
+	if (code)
+		return code;
+
+	if (cmd->sync_mutex) {
+		while (1) {
+			char buf[1024];
+			ssize_t len = xread(cmd->err, buf, sizeof(buf));
+			if (len < 0)
+				die("Read from command failed");
+			else if (len == 0)
+				break;
+			else
+				strbuf_add(cmd->sync_buf, buf, len);
+
+			if (!lock_acquired
+			    && !pthread_mutex_trylock(cmd->sync_mutex))
+				lock_acquired = 1;
+			if (lock_acquired) {
+				fputs(cmd->sync_buf->buf, stderr);
+				strbuf_reset(cmd->sync_buf);
+			}
+		}
+		if (!lock_acquired)
+			pthread_mutex_lock(cmd->sync_mutex);
+
+		fputs(cmd->sync_buf->buf, stderr);
+		pthread_mutex_unlock(cmd->sync_mutex);
+	}
+
+	return finish_command(cmd);
+}
+#endif
 
 int run_command_v_opt(const char **argv, int opt)
 {
diff --git a/run-command.h b/run-command.h
index 176a5b2..0df83c9 100644
--- a/run-command.h
+++ b/run-command.h
@@ -43,6 +43,24 @@ struct child_process {
 	unsigned stdout_to_stderr:1;
 	unsigned use_shell:1;
 	unsigned clean_on_exit:1;
+#ifndef NO_PTHREAD
+	/*
+	 * In a threaded environment running multiple commands from different
+	 * threads would lead to garbled output as the output of different
+	 * commands would mix.
+	 * If this mutex is not NULL, the output of `err` is only done when
+	 * holding the mutex. If the mutex cannot be acquired, the output
+	 * will be buffered until the mutex can be acquired.
+	 */
+	pthread_mutex_t *sync_mutex;
+#endif
+	/*
+	 * The sync_buf will be used to buffer the output while the mutex
+	 * is not acquired. It can also contain data before being passed into
+	 * run_command, which will be output together with the output of
+	 * the child.
+	 */
+	struct strbuf *sync_buf;
 };
 
 #define CHILD_PROCESS_INIT { NULL, ARGV_ARRAY_INIT, ARGV_ARRAY_INIT }
-- 
2.5.0.264.g5e52b0d

  parent reply	other threads:[~2015-08-28  1:15 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-28  1:14 [PATCH 0/9] Progress with git submodule Stefan Beller
2015-08-28  1:14 ` [PATCH 1/9] submodule: implement `module_list` as a builtin helper Stefan Beller
2015-08-28  1:14 ` [PATCH 2/9] submodule: implement `module_name` " Stefan Beller
2015-08-28  1:14 ` [PATCH 3/9] submodule: implement `module_clone` " Stefan Beller
2015-08-31 18:53   ` Junio C Hamano
2015-08-28  1:14 ` [PATCH 4/9] thread-utils: add a threaded task queue Stefan Beller
2015-08-28  1:14 ` Stefan Beller [this message]
2015-08-28  1:14 ` [PATCH 6/9] submodule: helper to run foreach in parallel Stefan Beller
2015-08-28 17:08   ` Stefan Beller
2015-08-28  1:14 ` [PATCH 7/9] fetch: fetch submodules " Stefan Beller
2015-08-28 17:00   ` Stefan Beller
2015-08-28 17:01     ` Jonathan Nieder
2015-08-28 17:12       ` Junio C Hamano
2015-08-28 17:45         ` Stefan Beller
2015-08-28 18:20         ` Jonathan Nieder
2015-08-28 18:27           ` Junio C Hamano
2015-08-28 18:35             ` Jeff King
2015-08-28 18:41               ` Junio C Hamano
2015-08-28 18:41               ` Stefan Beller
2015-08-28 18:44                 ` Jeff King
2015-08-28 18:50                   ` Jonathan Nieder
2015-08-28 18:53                     ` Jeff King
2015-08-28 19:02                       ` Stefan Beller
2015-08-28 18:59                   ` Stefan Beller
2015-08-28 18:44               ` Jonathan Nieder
2015-08-28 18:36             ` Stefan Beller
2015-08-28 18:42             ` Jonathan Nieder
2015-08-31 18:56   ` Junio C Hamano
2015-08-31 19:05     ` Jeff King
2015-08-28  1:14 ` [PATCH 8/9] index-pack: Use the new worker pool Stefan Beller
2015-08-28  1:14 ` [PATCH 9/9] pack-objects: Use " Stefan Beller
2015-08-28 10:09 ` [PATCH 0/9] Progress with git submodule Johannes Schindelin
2015-08-28 16:35   ` Stefan Beller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1440724495-708-6-git-send-email-sbeller@google.com \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).