git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 5/8] sigchain: add command to pop all common signals
  2015-09-28 23:13 [PATCH 0/8] fetch submodules in parallel Stefan Beller
@ 2015-09-28 23:14 ` Stefan Beller
  2015-09-30  5:23   ` Junio C Hamano
  0 siblings, 1 reply; 32+ messages in thread
From: Stefan Beller @ 2015-09-28 23:14 UTC (permalink / raw)
  To: git
  Cc: Stefan Beller, ramsay, jacob.keller, peff, gitster, jrnieder,
	johannes.schindelin, Jens.Lehmann, ericsunshine

The new method removes all common signal handlers that were installed
by sigchain_push.
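
For illustration, a minimal sketch of how a caller might pair the two
calls (the handler name cleanup_on_signal and the work in between are
hypothetical):

	static void cleanup_on_signal(int signo)
	{
		/* release temporary state here, then chain on */
		sigchain_pop(signo);
		raise(signo);
	}

	...
	sigchain_push_common(cleanup_on_signal);
	/* do the work that needs the handlers installed */
	sigchain_pop_common();	/* no signal arrived: remove them all again */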

CC: Jeff King <peff@peff.net>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 sigchain.c | 9 +++++++++
 sigchain.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/sigchain.c b/sigchain.c
index faa375d..9262307 100644
--- a/sigchain.c
+++ b/sigchain.c
@@ -50,3 +50,12 @@ void sigchain_push_common(sigchain_fun f)
 	sigchain_push(SIGQUIT, f);
 	sigchain_push(SIGPIPE, f);
 }
+
+void sigchain_pop_common(void)
+{
+	sigchain_pop(SIGINT);
+	sigchain_pop(SIGHUP);
+	sigchain_pop(SIGTERM);
+	sigchain_pop(SIGQUIT);
+	sigchain_pop(SIGPIPE);
+}
diff --git a/sigchain.h b/sigchain.h
index 618083b..138b20f 100644
--- a/sigchain.h
+++ b/sigchain.h
@@ -7,5 +7,6 @@ int sigchain_push(int sig, sigchain_fun f);
 int sigchain_pop(int sig);
 
 void sigchain_push_common(sigchain_fun f);
+void sigchain_pop_common(void);
 
 #endif /* SIGCHAIN_H */
-- 
2.5.0.273.g6fa2560.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/8] sigchain: add command to pop all common signals
  2015-09-28 23:14 ` [PATCH 5/8] sigchain: add command to pop all common signals Stefan Beller
@ 2015-09-30  5:23   ` Junio C Hamano
  0 siblings, 0 replies; 32+ messages in thread
From: Junio C Hamano @ 2015-09-30  5:23 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, ramsay, jacob.keller, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Stefan Beller <sbeller@google.com> writes:

> The new method removes all common signal handlers that were installed
> by sigchain_push.
>
> CC: Jeff King <peff@peff.net>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  sigchain.c | 9 +++++++++
>  sigchain.h | 1 +
>  2 files changed, 10 insertions(+)

Sounds like a good idea, as you need to clean them all up if you did
push_common() and ended up not receiving any signal.

This is merely an aesthetic point, but I somehow thought that ordering
them in the reverse of how they are listed in push_common() might make
more sense, though.

Thanks.

>
> diff --git a/sigchain.c b/sigchain.c
> index faa375d..9262307 100644
> --- a/sigchain.c
> +++ b/sigchain.c
> @@ -50,3 +50,12 @@ void sigchain_push_common(sigchain_fun f)
>  	sigchain_push(SIGQUIT, f);
>  	sigchain_push(SIGPIPE, f);
>  }
> +
> +void sigchain_pop_common(void)
> +{
> +	sigchain_pop(SIGINT);
> +	sigchain_pop(SIGHUP);
> +	sigchain_pop(SIGTERM);
> +	sigchain_pop(SIGQUIT);
> +	sigchain_pop(SIGPIPE);
> +}
> diff --git a/sigchain.h b/sigchain.h
> index 618083b..138b20f 100644
> --- a/sigchain.h
> +++ b/sigchain.h
> @@ -7,5 +7,6 @@ int sigchain_push(int sig, sigchain_fun f);
>  int sigchain_pop(int sig);
>  
>  void sigchain_push_common(sigchain_fun f);
> +void sigchain_pop_common(void);
>  
>  #endif /* SIGCHAIN_H */

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 0/8] Rerolling sb/submodule-parallel-fetch for the time after 2.7
@ 2015-12-14 19:37 Stefan Beller
  2015-12-14 19:37 ` [PATCH 1/8] submodule.c: write "Fetching submodule <foo>" to stderr Stefan Beller
                   ` (8 more replies)
  0 siblings, 9 replies; 32+ messages in thread
From: Stefan Beller @ 2015-12-14 19:37 UTC (permalink / raw)
  To: sbeller, git
  Cc: peff, gitster, jrnieder, johannes.schindelin, Jens.Lehmann,
	ericsunshine, j6t

I am sending out a new version for replacing sb/submodule-parallel-fetch for
the time after the 2.7 release.

The contents are:
 * all patches as in the branch sb/submodule-parallel-fetch,
 * including the fixups as suggested by Hannes,
 * a patch that writes a message to the debug log for better testing and
   debugging purposes (cherry-picked from the series which is supposed to
   build on top of this).

The patches themselves were rebased so that there are no fixup commits
any more; instead we get things right the first time.

The commit message of "run-command: add an asynchronous parallel child
processor" has been updated slightly to mention that we don't want to use
waitpid(-1), but rather rely on the assumption that a child's stderr lives
as long as the child itself.

Thanks,
Stefan


Jonathan Nieder (1):
  submodule.c: write "Fetching submodule <foo>" to stderr

Stefan Beller (7):
  xread: poll on non blocking fds
  xread_nonblock: add functionality to read from fds without blocking
  strbuf: add strbuf_read_once to read without blocking
  sigchain: add command to pop all common signals
  run-command: add an asynchronous parallel child processor
  fetch_populated_submodules: use new parallel job processing
  submodules: allow parallel fetching, add tests and documentation

 Documentation/fetch-options.txt |   7 +
 builtin/fetch.c                 |   6 +-
 builtin/pull.c                  |   6 +
 git-compat-util.h               |   1 +
 run-command.c                   | 335 ++++++++++++++++++++++++++++++++++++++++
 run-command.h                   |  80 ++++++++++
 sigchain.c                      |   9 ++
 sigchain.h                      |   1 +
 strbuf.c                        |  11 ++
 strbuf.h                        |   8 +
 submodule.c                     | 141 +++++++++++------
 submodule.h                     |   2 +-
 t/t0061-run-command.sh          |  53 +++++++
 t/t5526-fetch-submodules.sh     |  71 ++++++---
 test-run-command.c              |  55 ++++++-
 wrapper.c                       |  35 ++++-
 16 files changed, 747 insertions(+), 74 deletions(-)

-- 
2.6.4.443.ge094245.dirty

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/8] submodule.c: write "Fetching submodule <foo>" to stderr
  2015-12-14 19:37 [PATCH 0/8] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
@ 2015-12-14 19:37 ` Stefan Beller
  2015-12-14 19:37 ` [PATCH 2/8] xread: poll on non blocking fds Stefan Beller
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Stefan Beller @ 2015-12-14 19:37 UTC (permalink / raw)
  To: sbeller, git
  Cc: Jonathan Nieder, peff, gitster, johannes.schindelin, Jens.Lehmann,
	ericsunshine, j6t

From: Jonathan Nieder <jrnieder@gmail.com>

The "Pushing submodule <foo>" progress output correctly goes to
stderr, but "Fetching submodule <foo>" is going to stdout by
mistake.  Fix it to write to stderr.

Noticed while trying to implement a parallel submodule fetch.  When
this particular output line went to a different file descriptor, it
was buffered separately, resulting in wrongly interleaved output if
we copied it to the terminal naively.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 submodule.c                 |  2 +-
 t/t5526-fetch-submodules.sh | 51 +++++++++++++++++++++++----------------------
 2 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/submodule.c b/submodule.c
index 14e7624..8386477 100644
--- a/submodule.c
+++ b/submodule.c
@@ -689,7 +689,7 @@ int fetch_populated_submodules(const struct argv_array *options,
 			git_dir = submodule_git_dir.buf;
 		if (is_directory(git_dir)) {
 			if (!quiet)
-				printf("Fetching submodule %s%s\n", prefix, ce->name);
+				fprintf(stderr, "Fetching submodule %s%s\n", prefix, ce->name);
 			cp.dir = submodule_path.buf;
 			argv_array_push(&argv, default_argv);
 			argv_array_push(&argv, "--submodule-prefix");
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index a4532b0..17759b14 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -16,7 +16,8 @@ add_upstream_commit() {
 		git add subfile &&
 		git commit -m new subfile &&
 		head2=$(git rev-parse --short HEAD) &&
-		echo "From $pwd/submodule" > ../expect.err &&
+		echo "Fetching submodule submodule" > ../expect.err &&
+		echo "From $pwd/submodule" >> ../expect.err &&
 		echo "   $head1..$head2  master     -> origin/master" >> ../expect.err
 	) &&
 	(
@@ -27,6 +28,7 @@ add_upstream_commit() {
 		git add deepsubfile &&
 		git commit -m new deepsubfile &&
 		head2=$(git rev-parse --short HEAD) &&
+		echo "Fetching submodule submodule/subdir/deepsubmodule" >> ../expect.err
 		echo "From $pwd/deepsubmodule" >> ../expect.err &&
 		echo "   $head1..$head2  master     -> origin/master" >> ../expect.err
 	)
@@ -56,9 +58,7 @@ test_expect_success setup '
 	(
 		cd downstream &&
 		git submodule update --init --recursive
-	) &&
-	echo "Fetching submodule submodule" > expect.out &&
-	echo "Fetching submodule submodule/subdir/deepsubmodule" >> expect.out
+	)
 '
 
 test_expect_success "fetch --recurse-submodules recurses into submodules" '
@@ -67,7 +67,7 @@ test_expect_success "fetch --recurse-submodules recurses into submodules" '
 		cd downstream &&
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
-	test_i18ncmp expect.out actual.out &&
+	test_must_be_empty actual.out &&
 	test_i18ncmp expect.err actual.err
 '
 
@@ -96,7 +96,7 @@ test_expect_success "using fetchRecurseSubmodules=true in .gitmodules recurses i
 		git config -f .gitmodules submodule.submodule.fetchRecurseSubmodules true &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
-	test_i18ncmp expect.out actual.out &&
+	test_must_be_empty actual.out &&
 	test_i18ncmp expect.err actual.err
 '
 
@@ -127,7 +127,7 @@ test_expect_success "--recurse-submodules overrides fetchRecurseSubmodules setti
 		git config --unset -f .gitmodules submodule.submodule.fetchRecurseSubmodules &&
 		git config --unset submodule.submodule.fetchRecurseSubmodules
 	) &&
-	test_i18ncmp expect.out actual.out &&
+	test_must_be_empty actual.out &&
 	test_i18ncmp expect.err actual.err
 '
 
@@ -146,7 +146,7 @@ test_expect_success "--dry-run propagates to submodules" '
 		cd downstream &&
 		git fetch --recurse-submodules --dry-run >../actual.out 2>../actual.err
 	) &&
-	test_i18ncmp expect.out actual.out &&
+	test_must_be_empty actual.out &&
 	test_i18ncmp expect.err actual.err
 '
 
@@ -155,7 +155,7 @@ test_expect_success "Without --dry-run propagates to submodules" '
 		cd downstream &&
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
-	test_i18ncmp expect.out actual.out &&
+	test_must_be_empty actual.out &&
 	test_i18ncmp expect.err actual.err
 '
 
@@ -166,7 +166,7 @@ test_expect_success "recurseSubmodules=true propagates into submodules" '
 		git config fetch.recurseSubmodules true
 		git fetch >../actual.out 2>../actual.err
 	) &&
-	test_i18ncmp expect.out actual.out &&
+	test_must_be_empty actual.out &&
 	test_i18ncmp expect.err actual.err
 '
 
@@ -180,7 +180,7 @@ test_expect_success "--recurse-submodules overrides config in submodule" '
 		) &&
 		git fetch --recurse-submodules >../actual.out 2>../actual.err
 	) &&
-	test_i18ncmp expect.out actual.out &&
+	test_must_be_empty actual.out &&
 	test_i18ncmp expect.err actual.err
 '
 
@@ -214,16 +214,15 @@ test_expect_success "Recursion stops when no new submodule commits are fetched"
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	echo "Fetching submodule submodule" > expect.out.sub &&
 	echo "From $pwd/." > expect.err.sub &&
 	echo "   $head1..$head2  master     -> origin/master" >>expect.err.sub &&
-	head -2 expect.err >> expect.err.sub &&
+	head -3 expect.err >> expect.err.sub &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_i18ncmp expect.err.sub actual.err &&
-	test_i18ncmp expect.out.sub actual.out
+	test_must_be_empty actual.out
 '
 
 test_expect_success "Recursion doesn't happen when new superproject commits don't change any submodules" '
@@ -269,7 +268,7 @@ test_expect_success "Recursion picks up config in submodule" '
 		)
 	) &&
 	test_i18ncmp expect.err.sub actual.err &&
-	test_i18ncmp expect.out actual.out
+	test_must_be_empty actual.out
 '
 
 test_expect_success "Recursion picks up all submodules when necessary" '
@@ -285,7 +284,8 @@ test_expect_success "Recursion picks up all submodules when necessary" '
 		git add subdir/deepsubmodule &&
 		git commit -m "new deepsubmodule"
 		head2=$(git rev-parse --short HEAD) &&
-		echo "From $pwd/submodule" > ../expect.err.sub &&
+		echo "Fetching submodule submodule" > ../expect.err.sub &&
+		echo "From $pwd/submodule" >> ../expect.err.sub &&
 		echo "   $head1..$head2  master     -> origin/master" >> ../expect.err.sub
 	) &&
 	head1=$(git rev-parse --short HEAD) &&
@@ -295,13 +295,13 @@ test_expect_success "Recursion picks up all submodules when necessary" '
 	echo "From $pwd/." > expect.err.2 &&
 	echo "   $head1..$head2  master     -> origin/master" >> expect.err.2 &&
 	cat expect.err.sub >> expect.err.2 &&
-	tail -2 expect.err >> expect.err.2 &&
+	tail -3 expect.err >> expect.err.2 &&
 	(
 		cd downstream &&
 		git fetch >../actual.out 2>../actual.err
 	) &&
 	test_i18ncmp expect.err.2 actual.err &&
-	test_i18ncmp expect.out actual.out
+	test_must_be_empty actual.out
 '
 
 test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no new commits are fetched in the superproject (and ignores config)" '
@@ -317,7 +317,8 @@ test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
 		git add subdir/deepsubmodule &&
 		git commit -m "new deepsubmodule" &&
 		head2=$(git rev-parse --short HEAD) &&
-		echo "From $pwd/submodule" > ../expect.err.sub &&
+		echo Fetching submodule submodule > ../expect.err.sub &&
+		echo "From $pwd/submodule" >> ../expect.err.sub &&
 		echo "   $head1..$head2  master     -> origin/master" >> ../expect.err.sub
 	) &&
 	(
@@ -335,7 +336,7 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 	git add submodule &&
 	git commit -m "new submodule" &&
 	head2=$(git rev-parse --short HEAD) &&
-	tail -2 expect.err > expect.err.deepsub &&
+	tail -3 expect.err > expect.err.deepsub &&
 	echo "From $pwd/." > expect.err &&
 	echo "   $head1..$head2  master     -> origin/master" >>expect.err &&
 	cat expect.err.sub >> expect.err &&
@@ -354,7 +355,7 @@ test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
 			git config --unset -f .gitmodules submodule.subdir/deepsubmodule.fetchRecursive
 		)
 	) &&
-	test_i18ncmp expect.out actual.out &&
+	test_must_be_empty actual.out &&
 	test_i18ncmp expect.err actual.err
 '
 
@@ -388,7 +389,7 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 	head2=$(git rev-parse --short HEAD) &&
 	echo "From $pwd/." > expect.err.2 &&
 	echo "   $head1..$head2  master     -> origin/master" >>expect.err.2 &&
-	head -2 expect.err >> expect.err.2 &&
+	head -3 expect.err >> expect.err.2 &&
 	(
 		cd downstream &&
 		git config fetch.recurseSubmodules on-demand &&
@@ -399,7 +400,7 @@ test_expect_success "'fetch.recurseSubmodules=on-demand' overrides global config
 		cd downstream &&
 		git config --unset fetch.recurseSubmodules
 	) &&
-	test_i18ncmp expect.out.sub actual.out &&
+	test_must_be_empty actual.out &&
 	test_i18ncmp expect.err.2 actual.err
 '
 
@@ -416,7 +417,7 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 	head2=$(git rev-parse --short HEAD) &&
 	echo "From $pwd/." > expect.err.2 &&
 	echo "   $head1..$head2  master     -> origin/master" >>expect.err.2 &&
-	head -2 expect.err >> expect.err.2 &&
+	head -3 expect.err >> expect.err.2 &&
 	(
 		cd downstream &&
 		git config submodule.submodule.fetchRecurseSubmodules on-demand &&
@@ -427,7 +428,7 @@ test_expect_success "'submodule.<sub>.fetchRecurseSubmodules=on-demand' override
 		cd downstream &&
 		git config --unset submodule.submodule.fetchRecurseSubmodules
 	) &&
-	test_i18ncmp expect.out.sub actual.out &&
+	test_must_be_empty actual.out &&
 	test_i18ncmp expect.err.2 actual.err
 '
 
-- 
2.6.4.443.ge094245.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/8] xread: poll on non blocking fds
  2015-12-14 19:37 [PATCH 0/8] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
  2015-12-14 19:37 ` [PATCH 1/8] submodule.c: write "Fetching submodule <foo>" to stderr Stefan Beller
@ 2015-12-14 19:37 ` Stefan Beller
  2015-12-14 22:58   ` Eric Sunshine
  2015-12-14 19:37 ` [PATCH 3/8] xread_nonblock: add functionality to read from fds without blocking Stefan Beller
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Stefan Beller @ 2015-12-14 19:37 UTC (permalink / raw)
  To: sbeller, git
  Cc: peff, gitster, jrnieder, johannes.schindelin, Jens.Lehmann,
	ericsunshine, j6t

From the man page:
EAGAIN The file descriptor fd refers to a file other than a socket
       and has been marked nonblocking (O_NONBLOCK), and the read
       would block.

EAGAIN or EWOULDBLOCK
       The file descriptor fd refers to a socket and has been marked
       nonblocking (O_NONBLOCK), and the read would block.  POSIX.1-2001
       allows either error to be returned for this case, and does not
       require these constants to have the same value, so a portable
       application should check for both possibilities.

If we get EAGAIN or EWOULDBLOCK, the fd must have O_NONBLOCK set.
As the intent of xread is to read as much as possible, either until the
fd reaches EOF or an actual error occurs, we can ease the load on the
feeder of the fd by not spinning the whole time, but rather waiting for
it politely instead of busy waiting.

We should not care whether the call to poll failed, as we are in an
infinite loop and can only get out through a correct read().
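
As an illustration, a retry loop in a caller that hands xread() a
descriptor switched to non-blocking mode (the fd setup and buffer below
are hypothetical) no longer burns CPU, because xread() now parks in
poll() when read() reports EAGAIN:

	char buf[8192];
	ssize_t nr;
	int flags = fcntl(fd, F_GETFL);

	fcntl(fd, F_SETFL, flags | O_NONBLOCK);
	while ((nr = xread(fd, buf, sizeof(buf))) != 0) {
		if (nr < 0 && errno == EAGAIN)
			continue;	/* readiness was awaited inside xread() */
		if (nr < 0)
			die_errno("read error");
		/* consume nr bytes of buf ... */
	}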

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 wrapper.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/wrapper.c b/wrapper.c
index 6fcaa4d..4f720fe 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -236,8 +236,17 @@ ssize_t xread(int fd, void *buf, size_t len)
 	    len = MAX_IO_SIZE;
 	while (1) {
 		nr = read(fd, buf, len);
-		if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
-			continue;
+		if (nr < 0) {
+			if (errno == EINTR)
+				continue;
+			if (errno == EAGAIN || errno == EWOULDBLOCK) {
+				struct pollfd pfd;
+				pfd.events = POLLIN;
+				pfd.fd = fd;
+				/* We deliberately ignore the return value */
+				poll(&pfd, 1, -1);
+			}
+		}
 		return nr;
 	}
 }
-- 
2.6.4.443.ge094245.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/8] xread_nonblock: add functionality to read from fds without blocking
  2015-12-14 19:37 [PATCH 0/8] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
  2015-12-14 19:37 ` [PATCH 1/8] submodule.c: write "Fetching submodule <foo>" to stderr Stefan Beller
  2015-12-14 19:37 ` [PATCH 2/8] xread: poll on non blocking fds Stefan Beller
@ 2015-12-14 19:37 ` Stefan Beller
  2015-12-14 20:59   ` Junio C Hamano
  2015-12-14 23:03   ` Eric Sunshine
  2015-12-14 19:37 ` [PATCH 4/8] strbuf: add strbuf_read_once to read " Stefan Beller
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 32+ messages in thread
From: Stefan Beller @ 2015-12-14 19:37 UTC (permalink / raw)
  To: sbeller, git
  Cc: peff, gitster, jrnieder, johannes.schindelin, Jens.Lehmann,
	ericsunshine, j6t

Provide a wrapper around read(), similar to xread(), that restarts on
EINTR but not on EAGAIN (or EWOULDBLOCK). This enables the caller to
handle polling itself, possibly polling multiple sockets or performing
some other action.
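
A sketch of the intended calling pattern, with the caller owning the
poll loop (the two descriptors fd_a and fd_b are hypothetical):

	struct pollfd pfd[2];
	char buf[8192];
	ssize_t n;
	int i;

	pfd[0].fd = fd_a; pfd[0].events = POLLIN;
	pfd[1].fd = fd_b; pfd[1].events = POLLIN;

	if (poll(pfd, 2, -1) < 0) {
		if (errno != EINTR)
			die_errno("poll");
	} else {
		for (i = 0; i < 2; i++) {
			if (!(pfd[i].revents & (POLLIN | POLLHUP)))
				continue;
			n = xread_nonblock(pfd[i].fd, buf, sizeof(buf));
			if (n < 0 && errno == EAGAIN)
				continue;	/* not ready after all; poll again later */
			/* n == 0 is EOF, n > 0 is data to consume */
		}
	}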

Helped-by: Jacob Keller <jacob.keller@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 git-compat-util.h |  1 +
 wrapper.c         | 22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/git-compat-util.h b/git-compat-util.h
index 8e39867..87456a3 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -723,6 +723,7 @@ extern void *xmmap(void *start, size_t length, int prot, int flags, int fd, off_
 extern void *xmmap_gently(void *start, size_t length, int prot, int flags, int fd, off_t offset);
 extern int xopen(const char *path, int flags, ...);
 extern ssize_t xread(int fd, void *buf, size_t len);
+extern ssize_t xread_nonblock(int fd, void *buf, size_t len);
 extern ssize_t xwrite(int fd, const void *buf, size_t len);
 extern ssize_t xpread(int fd, void *buf, size_t len, off_t offset);
 extern int xdup(int fd);
diff --git a/wrapper.c b/wrapper.c
index 4f720fe..f71237c 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -252,6 +252,28 @@ ssize_t xread(int fd, void *buf, size_t len)
 }
 
 /*
+ * xread_nonblock() is the same a read(), but it automatically restarts read()
+ * interrupted operations (EINTR). xread_nonblock() DOES NOT GUARANTEE that
+ * "len" bytes is read. EWOULDBLOCK is turned into EAGAIN.
+ */
+ssize_t xread_nonblock(int fd, void *buf, size_t len)
+{
+	ssize_t nr;
+	if (len > MAX_IO_SIZE)
+		len = MAX_IO_SIZE;
+	while (1) {
+		nr = read(fd, buf, len);
+		if (nr < 0) {
+			if (errno == EINTR)
+				continue;
+			if (errno == EWOULDBLOCK)
+				errno = EAGAIN;
+		}
+		return nr;
+	}
+}
+
+/*
  * xwrite() is the same a write(), but it automatically restarts write()
  * operations with a recoverable error (EAGAIN and EINTR). xwrite() DOES NOT
  * GUARANTEE that "len" bytes is written even if the operation is successful.
-- 
2.6.4.443.ge094245.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/8] strbuf: add strbuf_read_once to read without blocking
  2015-12-14 19:37 [PATCH 0/8] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
                   ` (2 preceding siblings ...)
  2015-12-14 19:37 ` [PATCH 3/8] xread_nonblock: add functionality to read from fds without blocking Stefan Beller
@ 2015-12-14 19:37 ` Stefan Beller
  2015-12-14 23:16   ` Eric Sunshine
  2015-12-14 19:37 ` [PATCH 5/8] sigchain: add command to pop all common signals Stefan Beller
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Stefan Beller @ 2015-12-14 19:37 UTC (permalink / raw)
  To: sbeller, git
  Cc: peff, gitster, jrnieder, johannes.schindelin, Jens.Lehmann,
	ericsunshine, j6t

The new call will read from a file descriptor into a strbuf once. The
underlying call xread_nonblock is meant to execute without blocking if
the file descriptor is set to O_NONBLOCK. It is a bug to call
strbuf_read_once on a file descriptor which would block.
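
A sketch of the intended use, mirroring how the parallel child processor
later drains its pipes (fd is assumed to already be non-blocking and
known to be readable):

	struct strbuf out = STRBUF_INIT;
	ssize_t n;

	n = strbuf_read_once(&out, fd, 0);
	if (n < 0 && errno != EAGAIN)
		die_errno("read");
	if (n == 0)
		close(fd);	/* EOF: the writing end was closed */
	/* n > 0: n new bytes were appended to out */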

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 strbuf.c | 11 +++++++++++
 strbuf.h |  8 ++++++++
 2 files changed, 19 insertions(+)

diff --git a/strbuf.c b/strbuf.c
index d76f0ae..b552a13 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -384,6 +384,17 @@ ssize_t strbuf_read(struct strbuf *sb, int fd, size_t hint)
 	return sb->len - oldlen;
 }
 
+ssize_t strbuf_read_once(struct strbuf *sb, int fd, size_t hint)
+{
+	ssize_t cnt;
+
+	strbuf_grow(sb, hint ? hint : 8192);
+	cnt = xread_nonblock(fd, sb->buf + sb->len, sb->alloc - sb->len - 1);
+	if (cnt > 0)
+		strbuf_setlen(sb, sb->len + cnt);
+	return cnt;
+}
+
 #define STRBUF_MAXLINK (2*PATH_MAX)
 
 int strbuf_readlink(struct strbuf *sb, const char *path, size_t hint)
diff --git a/strbuf.h b/strbuf.h
index 7123fca..c3e5980 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -367,6 +367,14 @@ extern size_t strbuf_fread(struct strbuf *, size_t, FILE *);
 extern ssize_t strbuf_read(struct strbuf *, int fd, size_t hint);
 
 /**
+ * Returns the number of new bytes appended to the sb.
+ * Negative return value signals there was an error returned from
+ * underlying read(2), in which case the caller should check errno.
+ * e.g. errno == EAGAIN when the read may have blocked.
+ */
+extern ssize_t strbuf_read_once(struct strbuf *, int fd, size_t hint);
+
+/**
  * Read the contents of a file, specified by its path. The third argument
  * can be used to give a hint about the file size, to avoid reallocs.
  */
-- 
2.6.4.443.ge094245.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 5/8] sigchain: add command to pop all common signals
  2015-12-14 19:37 [PATCH 0/8] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
                   ` (3 preceding siblings ...)
  2015-12-14 19:37 ` [PATCH 4/8] strbuf: add strbuf_read_once to read " Stefan Beller
@ 2015-12-14 19:37 ` Stefan Beller
  2015-12-14 19:37 ` [PATCH 6/8] run-command: add an asynchronous parallel child processor Stefan Beller
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Stefan Beller @ 2015-12-14 19:37 UTC (permalink / raw)
  To: sbeller, git
  Cc: peff, gitster, jrnieder, johannes.schindelin, Jens.Lehmann,
	ericsunshine, j6t

The new method removes all common signal handlers that were installed
by sigchain_push.

CC: Jeff King <peff@peff.net>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 sigchain.c | 9 +++++++++
 sigchain.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/sigchain.c b/sigchain.c
index faa375d..2ac43bb 100644
--- a/sigchain.c
+++ b/sigchain.c
@@ -50,3 +50,12 @@ void sigchain_push_common(sigchain_fun f)
 	sigchain_push(SIGQUIT, f);
 	sigchain_push(SIGPIPE, f);
 }
+
+void sigchain_pop_common(void)
+{
+	sigchain_pop(SIGPIPE);
+	sigchain_pop(SIGQUIT);
+	sigchain_pop(SIGTERM);
+	sigchain_pop(SIGHUP);
+	sigchain_pop(SIGINT);
+}
diff --git a/sigchain.h b/sigchain.h
index 618083b..138b20f 100644
--- a/sigchain.h
+++ b/sigchain.h
@@ -7,5 +7,6 @@ int sigchain_push(int sig, sigchain_fun f);
 int sigchain_pop(int sig);
 
 void sigchain_push_common(sigchain_fun f);
+void sigchain_pop_common(void);
 
 #endif /* SIGCHAIN_H */
-- 
2.6.4.443.ge094245.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 6/8] run-command: add an asynchronous parallel child processor
  2015-12-14 19:37 [PATCH 0/8] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
                   ` (4 preceding siblings ...)
  2015-12-14 19:37 ` [PATCH 5/8] sigchain: add command to pop all common signals Stefan Beller
@ 2015-12-14 19:37 ` Stefan Beller
  2015-12-14 20:39   ` Johannes Sixt
  2015-12-14 19:37 ` [PATCH 7/8] fetch_populated_submodules: use new parallel job processing Stefan Beller
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Stefan Beller @ 2015-12-14 19:37 UTC (permalink / raw)
  To: sbeller, git
  Cc: peff, gitster, jrnieder, johannes.schindelin, Jens.Lehmann,
	ericsunshine, j6t

This allows us to run external commands in parallel with ordered output
on stderr.

If we run external commands in parallel, we cannot pipe their output directly
to our stdout/err as it would get mixed up. So each process's output will
flow through a pipe, which we buffer. One subprocess can be piped directly
to our stdout/err for low-latency feedback to the user.

Example:
Let's assume we have 5 submodules A,B,C,D,E and each fetch takes a
different amount of time as the different submodules vary in size, then
the output of fetches in sequential order might look like this:

 time -->
 output: |---A---| |-B-| |-------C-------| |-D-| |-E-|

When we schedule these submodules into maximal two parallel processes,
a schedule and sample output over time may look like this:

process 1: |---A---| |-D-| |-E-|

process 2: |-B-| |-------C-------|

output:    |---A---|B|---C-------|DE

So A will be perceived just as it would be in the single-child version.
As B has finished by the time A is done, we can dump its whole progress
buffer to stderr, such that it looks like it finished in no time. Once
that is done, C is determined to be the visible child and its progress
will be reported in real time.

So this way of output is really good for human consumption, as it only
changes the timing, not the actual output.

For machine consumption the output needs to be prepared by the tasks
themselves, e.g. by having a prefix per line or per block to indicate
which task's output is being displayed, because the output order may not
follow the original sequential ordering:

 |----A----| |--B--| |-C-|

will be scheduled to be all parallel:

process 1: |----A----|
process 2: |--B--|
process 3: |-C-|
output:    |----A----|CB

This happens because C finished before B did, so it will be queued for
output before B.

The detection of when a child has finished executing is two fold. First
we check regularly, interleaved with other actions such as checking other
children for their liveliness or starting new children, whether the stderr
pipe still exists. Once a child has closed its stderr stream, we assume it
will stop very soon, so we can use the `finish_command` code borrowed from
the single external process execution interface.

By maintaining the strong assumption that stderr stays open until the
very end of a child process, we can avoid other hassle, such as an
implementation using `waitpid(-1)`, which is not implemented on Windows.
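
A sketch of how a caller drives the new API (the task callback and the
`echo hello` command line below are hypothetical; test-run-command.c
further down shows the real test driver):

	static int next_task(struct child_process *cp, struct strbuf *err,
			     void *pp_cb, void **pp_task_cb)
	{
		int *remaining = pp_cb;

		if (!*remaining)
			return 0;	/* no more tasks */
		(*remaining)--;

		argv_array_push(&cp->args, "echo");
		argv_array_push(&cp->args, "hello");
		strbuf_addstr(err, "starting a task\n");	/* preloaded output */
		return 1;
	}

	...
		int remaining = 10;
		run_processes_parallel(4, next_task, NULL, NULL, &remaining);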

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 run-command.c          | 335 +++++++++++++++++++++++++++++++++++++++++++++++++
 run-command.h          |  80 ++++++++++++
 t/t0061-run-command.sh |  53 ++++++++
 test-run-command.c     |  55 +++++++-
 4 files changed, 522 insertions(+), 1 deletion(-)

diff --git a/run-command.c b/run-command.c
index 13fa452..51fd72c 100644
--- a/run-command.c
+++ b/run-command.c
@@ -3,6 +3,8 @@
 #include "exec_cmd.h"
 #include "sigchain.h"
 #include "argv-array.h"
+#include "thread-utils.h"
+#include "strbuf.h"
 
 void child_process_init(struct child_process *child)
 {
@@ -865,3 +867,336 @@ int capture_command(struct child_process *cmd, struct strbuf *buf, size_t hint)
 	close(cmd->out);
 	return finish_command(cmd);
 }
+
+enum child_state {
+	GIT_CP_FREE,
+	GIT_CP_WORKING,
+	GIT_CP_WAIT_CLEANUP,
+};
+
+struct parallel_processes {
+	void *data;
+
+	int max_processes;
+	int nr_processes;
+
+	get_next_task_fn get_next_task;
+	start_failure_fn start_failure;
+	task_finished_fn task_finished;
+
+	struct {
+		enum child_state state;
+		struct child_process process;
+		struct strbuf err;
+		void *data;
+	} *children;
+	/*
+	 * The struct pollfd is logically part of *children,
+	 * but the system call expects it as its own array.
+	 */
+	struct pollfd *pfd;
+
+	unsigned shutdown : 1;
+
+	int output_owner;
+	struct strbuf buffered_output; /* of finished children */
+};
+
+static int default_start_failure(struct child_process *cp,
+				 struct strbuf *err,
+				 void *pp_cb,
+				 void *pp_task_cb)
+{
+	int i;
+
+	strbuf_addstr(err, "Starting a child failed:");
+	for (i = 0; cp->argv[i]; i++)
+		strbuf_addf(err, " %s", cp->argv[i]);
+
+	return 0;
+}
+
+static int default_task_finished(int result,
+				 struct child_process *cp,
+				 struct strbuf *err,
+				 void *pp_cb,
+				 void *pp_task_cb)
+{
+	int i;
+
+	if (!result)
+		return 0;
+
+	strbuf_addf(err, "A child failed with return code %d:", result);
+	for (i = 0; cp->argv[i]; i++)
+		strbuf_addf(err, " %s", cp->argv[i]);
+
+	return 0;
+}
+
+static void kill_children(struct parallel_processes *pp, int signo)
+{
+	int i, n = pp->max_processes;
+
+	for (i = 0; i < n; i++)
+		if (pp->children[i].state == GIT_CP_WORKING)
+			kill(pp->children[i].process.pid, signo);
+}
+
+static struct parallel_processes *pp_for_signal;
+
+static void handle_children_on_signal(int signo)
+{
+	kill_children(pp_for_signal, signo);
+	sigchain_pop(signo);
+	raise(signo);
+}
+
+static void pp_init(struct parallel_processes *pp,
+		    int n,
+		    get_next_task_fn get_next_task,
+		    start_failure_fn start_failure,
+		    task_finished_fn task_finished,
+		    void *data)
+{
+	int i;
+
+	if (n < 1)
+		n = online_cpus();
+
+	pp->max_processes = n;
+
+	trace_printf("run_processes_parallel: preparing to run up to %d tasks", n);
+
+	pp->data = data;
+	if (!get_next_task)
+		die("BUG: you need to specify a get_next_task function");
+	pp->get_next_task = get_next_task;
+
+	pp->start_failure = start_failure ? start_failure : default_start_failure;
+	pp->task_finished = task_finished ? task_finished : default_task_finished;
+
+	pp->nr_processes = 0;
+	pp->output_owner = 0;
+	pp->shutdown = 0;
+	pp->children = xcalloc(n, sizeof(*pp->children));
+	pp->pfd = xcalloc(n, sizeof(*pp->pfd));
+	strbuf_init(&pp->buffered_output, 0);
+
+	for (i = 0; i < n; i++) {
+		strbuf_init(&pp->children[i].err, 0);
+		child_process_init(&pp->children[i].process);
+		pp->pfd[i].events = POLLIN | POLLHUP;
+		pp->pfd[i].fd = -1;
+	}
+
+	pp_for_signal = pp;
+	sigchain_push_common(handle_children_on_signal);
+}
+
+static void pp_cleanup(struct parallel_processes *pp)
+{
+	int i;
+
+	trace_printf("run_processes_parallel: done");
+	for (i = 0; i < pp->max_processes; i++) {
+		strbuf_release(&pp->children[i].err);
+		child_process_clear(&pp->children[i].process);
+	}
+
+	free(pp->children);
+	free(pp->pfd);
+
+	/*
+	 * When get_next_task added messages to the buffer in its last
+	 * iteration, the buffered output is non empty.
+	 */
+	fputs(pp->buffered_output.buf, stderr);
+	strbuf_release(&pp->buffered_output);
+
+	sigchain_pop_common();
+}
+
+/* returns
+ *  0 if a new task was started.
+ *  1 if no new jobs was started (get_next_task ran out of work, non critical
+ *    problem with starting a new command)
+ * <0 no new job was started, user wishes to shutdown early. Use negative code
+ *    to signal the children.
+ */
+static int pp_start_one(struct parallel_processes *pp)
+{
+	int i, code;
+
+	for (i = 0; i < pp->max_processes; i++)
+		if (pp->children[i].state == GIT_CP_FREE)
+			break;
+	if (i == pp->max_processes)
+		die("BUG: bookkeeping is hard");
+
+	code = pp->get_next_task(&pp->children[i].process,
+				 &pp->children[i].err,
+				 pp->data,
+				 &pp->children[i].data);
+	if (!code) {
+		strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
+		strbuf_reset(&pp->children[i].err);
+		return 1;
+	}
+	pp->children[i].process.err = -1;
+	pp->children[i].process.stdout_to_stderr = 1;
+	pp->children[i].process.no_stdin = 1;
+
+	if (start_command(&pp->children[i].process)) {
+		code = pp->start_failure(&pp->children[i].process,
+					 &pp->children[i].err,
+					 pp->data,
+					 &pp->children[i].data);
+		strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
+		strbuf_reset(&pp->children[i].err);
+		if (code)
+			pp->shutdown = 1;
+		return code;
+	}
+
+	pp->nr_processes++;
+	pp->children[i].state = GIT_CP_WORKING;
+	pp->pfd[i].fd = pp->children[i].process.err;
+	return 0;
+}
+
+static void pp_buffer_stderr(struct parallel_processes *pp, int output_timeout)
+{
+	int i;
+
+	while ((i = poll(pp->pfd, pp->max_processes, output_timeout)) < 0) {
+		if (errno == EINTR)
+			continue;
+		pp_cleanup(pp);
+		die_errno("poll");
+	}
+
+	/* Buffer output from all pipes. */
+	for (i = 0; i < pp->max_processes; i++) {
+		if (pp->children[i].state == GIT_CP_WORKING &&
+		    pp->pfd[i].revents & (POLLIN | POLLHUP)) {
+			int n = strbuf_read_once(&pp->children[i].err,
+						 pp->children[i].process.err, 0);
+			if (n == 0) {
+				close(pp->children[i].process.err);
+				pp->children[i].state = GIT_CP_WAIT_CLEANUP;
+			} else if (n < 0)
+				if (errno != EAGAIN)
+					die_errno("read");
+		}
+	}
+}
+
+static void pp_output(struct parallel_processes *pp)
+{
+	int i = pp->output_owner;
+	if (pp->children[i].state == GIT_CP_WORKING &&
+	    pp->children[i].err.len) {
+		fputs(pp->children[i].err.buf, stderr);
+		strbuf_reset(&pp->children[i].err);
+	}
+}
+
+static int pp_collect_finished(struct parallel_processes *pp)
+{
+	int i, code;
+	int n = pp->max_processes;
+	int result = 0;
+
+	while (pp->nr_processes > 0) {
+		for (i = 0; i < pp->max_processes; i++)
+			if (pp->children[i].state == GIT_CP_WAIT_CLEANUP)
+				break;
+		if (i == pp->max_processes)
+			break;
+
+		code = finish_command(&pp->children[i].process);
+
+		code = pp->task_finished(code, &pp->children[i].process,
+					 &pp->children[i].err, pp->data,
+					 &pp->children[i].data);
+
+		if (code)
+			result = code;
+		if (code < 0)
+			break;
+
+		pp->nr_processes--;
+		pp->children[i].state = GIT_CP_FREE;
+		pp->pfd[i].fd = -1;
+		child_process_init(&pp->children[i].process);
+
+		if (i != pp->output_owner) {
+			strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
+			strbuf_reset(&pp->children[i].err);
+		} else {
+			fputs(pp->children[i].err.buf, stderr);
+			strbuf_reset(&pp->children[i].err);
+
+			/* Output all other finished child processes */
+			fputs(pp->buffered_output.buf, stderr);
+			strbuf_reset(&pp->buffered_output);
+
+			/*
+			 * Pick next process to output live.
+			 * NEEDSWORK:
+			 * For now we pick it randomly by doing a round
+			 * robin. Later we may want to pick the one with
+			 * the most output or the longest or shortest
+			 * running process time.
+			 */
+			for (i = 0; i < n; i++)
+				if (pp->children[(pp->output_owner + i) % n].state == GIT_CP_WORKING)
+					break;
+			pp->output_owner = (pp->output_owner + i) % n;
+		}
+	}
+	return result;
+}
+
+int run_processes_parallel(int n,
+			   get_next_task_fn get_next_task,
+			   start_failure_fn start_failure,
+			   task_finished_fn task_finished,
+			   void *pp_cb)
+{
+	int i, code;
+	int output_timeout = 100;
+	int spawn_cap = 4;
+	struct parallel_processes pp;
+
+	pp_init(&pp, n, get_next_task, start_failure, task_finished, pp_cb);
+	while (1) {
+		for (i = 0;
+		    i < spawn_cap && !pp.shutdown &&
+		    pp.nr_processes < pp.max_processes;
+		    i++) {
+			code = pp_start_one(&pp);
+			if (!code)
+				continue;
+			if (code < 0) {
+				pp.shutdown = 1;
+				kill_children(&pp, -code);
+			}
+			break;
+		}
+		if (!pp.nr_processes)
+			break;
+		pp_buffer_stderr(&pp, output_timeout);
+		pp_output(&pp);
+		code = pp_collect_finished(&pp);
+		if (code) {
+			pp.shutdown = 1;
+			if (code < 0)
+				kill_children(&pp, -code);
+		}
+	}
+
+	pp_cleanup(&pp);
+	return 0;
+}
diff --git a/run-command.h b/run-command.h
index 12bb26c..d5a57f9 100644
--- a/run-command.h
+++ b/run-command.h
@@ -122,4 +122,84 @@ int start_async(struct async *async);
 int finish_async(struct async *async);
 int in_async(void);
 
+/**
+ * This callback should initialize the child process and preload the
+ * error channel if desired. The preloading of is useful if you want to
+ * have a message printed directly before the output of the child process.
+ * pp_cb is the callback cookie as passed to run_processes_parallel.
+ * You can store a child process specific callback cookie in pp_task_cb.
+ *
+ * Even after returning 0 to indicate that there are no more processes,
+ * this function will be called again until there are no more running
+ * child processes.
+ *
+ * Return 1 if the next child is ready to run.
+ * Return 0 if there are currently no more tasks to be processed.
+ * To send a signal to other child processes for abortion,
+ * return the negative signal number.
+ */
+typedef int (*get_next_task_fn)(struct child_process *cp,
+				struct strbuf *err,
+				void *pp_cb,
+				void **pp_task_cb);
+
+/**
+ * This callback is called whenever there are problems starting
+ * a new process.
+ *
+ * You must not write to stdout or stderr in this function. Add your
+ * message to the strbuf err instead, which will be printed without
+ * messing up the output of the other parallel processes.
+ *
+ * pp_cb is the callback cookie as passed into run_processes_parallel,
+ * pp_task_cb is the callback cookie as passed into get_next_task_fn.
+ *
+ * Return 0 to continue the parallel processing. To abort return non zero.
+ * To send a signal to other child processes for abortion, return
+ * the negative signal number.
+ */
+typedef int (*start_failure_fn)(struct child_process *cp,
+				struct strbuf *err,
+				void *pp_cb,
+				void *pp_task_cb);
+
+/**
+ * This callback is called on every child process that finished processing.
+ *
+ * You must not write to stdout or stderr in this function. Add your
+ * message to the strbuf err instead, which will be printed without
+ * messing up the output of the other parallel processes.
+ *
+ * pp_cb is the callback cookie as passed into run_processes_parallel,
+ * pp_task_cb is the callback cookie as passed into get_next_task_fn.
+ *
+ * Return 0 to continue the parallel processing.  To abort return non zero.
+ * To send a signal to other child processes for abortion, return
+ * the negative signal number.
+ */
+typedef int (*task_finished_fn)(int result,
+				struct child_process *cp,
+				struct strbuf *err,
+				void *pp_cb,
+				void *pp_task_cb);
+
+/**
+ * Runs up to n processes at the same time. Whenever a process can be
+ * started, the callback get_next_task_fn is called to obtain the data
+ * required to start another child process.
+ *
+ * The children started via this function run in parallel. Their output
+ * (both stdout and stderr) is routed to stderr in a manner that output
+ * from different tasks does not interleave.
+ *
+ * If start_failure_fn or task_finished_fn are NULL, default handlers
+ * will be used. The default handlers will print an error message on
+ * error without issuing an emergency stop.
+ */
+int run_processes_parallel(int n,
+			   get_next_task_fn,
+			   start_failure_fn,
+			   task_finished_fn,
+			   void *pp_cb);
+
 #endif
diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
index 9acf628..12228b4 100755
--- a/t/t0061-run-command.sh
+++ b/t/t0061-run-command.sh
@@ -47,4 +47,57 @@ test_expect_success POSIXPERM,SANITY 'unreadable directory in PATH' '
 	test_cmp expect actual
 '
 
+cat >expect <<-EOF
+preloaded output of a child
+Hello
+World
+preloaded output of a child
+Hello
+World
+preloaded output of a child
+Hello
+World
+preloaded output of a child
+Hello
+World
+EOF
+
+test_expect_success 'run_command runs in parallel with more jobs available than tasks' '
+	test-run-command run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'run_command runs in parallel with as many jobs as tasks' '
+	test-run-command run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'run_command runs in parallel with more tasks than jobs available' '
+	test-run-command run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
+cat >expect <<-EOF
+preloaded output of a child
+asking for a quick stop
+preloaded output of a child
+asking for a quick stop
+preloaded output of a child
+asking for a quick stop
+EOF
+
+test_expect_success 'run_command is asked to abort gracefully' '
+	test-run-command run-command-abort 3 false 2>actual &&
+	test_cmp expect actual
+'
+
+cat >expect <<-EOF
+no further jobs available
+EOF
+
+test_expect_success 'run_command outputs ' '
+	test-run-command run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
+	test_cmp expect actual
+'
+
 test_done
diff --git a/test-run-command.c b/test-run-command.c
index 89c7de2..f964507 100644
--- a/test-run-command.c
+++ b/test-run-command.c
@@ -10,16 +10,54 @@
 
 #include "git-compat-util.h"
 #include "run-command.h"
+#include "argv-array.h"
+#include "strbuf.h"
 #include <string.h>
 #include <errno.h>
 
+static int number_callbacks;
+static int parallel_next(struct child_process *cp,
+			 struct strbuf *err,
+			 void *cb,
+			 void** task_cb)
+{
+	struct child_process *d = cb;
+	if (number_callbacks >= 4)
+		return 0;
+
+	argv_array_pushv(&cp->args, d->argv);
+	strbuf_addf(err, "preloaded output of a child\n");
+	number_callbacks++;
+	return 1;
+}
+
+static int no_job(struct child_process *cp,
+		  struct strbuf *err,
+		  void *cb,
+		  void** task_cb)
+{
+	strbuf_addf(err, "no further jobs available\n");
+	return 0;
+}
+
+static int task_finished(int result,
+			 struct child_process *cp,
+			 struct strbuf *err,
+			 void *pp_cb,
+			 void *pp_task_cb)
+{
+	strbuf_addf(err, "asking for a quick stop\n");
+	return 1;
+}
+
 int main(int argc, char **argv)
 {
 	struct child_process proc = CHILD_PROCESS_INIT;
+	int jobs;
 
 	if (argc < 3)
 		return 1;
-	proc.argv = (const char **)argv+2;
+	proc.argv = (const char **)argv + 2;
 
 	if (!strcmp(argv[1], "start-command-ENOENT")) {
 		if (start_command(&proc) < 0 && errno == ENOENT)
@@ -30,6 +68,21 @@ int main(int argc, char **argv)
 	if (!strcmp(argv[1], "run-command"))
 		exit(run_command(&proc));
 
+	jobs = atoi(argv[2]);
+	proc.argv = (const char **)argv + 3;
+
+	if (!strcmp(argv[1], "run-command-parallel"))
+		exit(run_processes_parallel(jobs, parallel_next,
+					    NULL, NULL, &proc));
+
+	if (!strcmp(argv[1], "run-command-abort"))
+		exit(run_processes_parallel(jobs, parallel_next,
+					    NULL, task_finished, &proc));
+
+	if (!strcmp(argv[1], "run-command-no-jobs"))
+		exit(run_processes_parallel(jobs, no_job,
+					    NULL, task_finished, &proc));
+
 	fprintf(stderr, "check usage\n");
 	return 1;
 }
-- 
2.6.4.443.ge094245.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 7/8] fetch_populated_submodules: use new parallel job processing
  2015-12-14 19:37 [PATCH 0/8] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
                   ` (5 preceding siblings ...)
  2015-12-14 19:37 ` [PATCH 6/8] run-command: add an asynchronous parallel child processor Stefan Beller
@ 2015-12-14 19:37 ` Stefan Beller
  2015-12-14 19:37 ` [PATCH 8/8] submodules: allow parallel fetching, add tests and documentation Stefan Beller
  2015-12-14 20:40 ` [PATCH 0/8] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Johannes Sixt
  8 siblings, 0 replies; 32+ messages in thread
From: Stefan Beller @ 2015-12-14 19:37 UTC (permalink / raw)
  To: sbeller, git
  Cc: peff, gitster, jrnieder, johannes.schindelin, Jens.Lehmann,
	ericsunshine, j6t

In a later patch we enable parallel processing of submodules; this patch
only adds the machinery for it. So this change should not alter any
user-facing behavior.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 submodule.c | 142 +++++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 98 insertions(+), 44 deletions(-)

diff --git a/submodule.c b/submodule.c
index 8386477..6a2d786 100644
--- a/submodule.c
+++ b/submodule.c
@@ -12,6 +12,7 @@
 #include "sha1-array.h"
 #include "argv-array.h"
 #include "blob.h"
+#include "thread-utils.h"
 
 static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
 static struct string_list changed_submodule_paths;
@@ -610,37 +611,28 @@ static void calculate_changed_submodule_paths(void)
 	initialized_fetch_ref_tips = 0;
 }
 
-int fetch_populated_submodules(const struct argv_array *options,
-			       const char *prefix, int command_line_option,
-			       int quiet)
+struct submodule_parallel_fetch {
+	int count;
+	struct argv_array args;
+	const char *work_tree;
+	const char *prefix;
+	int command_line_option;
+	int quiet;
+	int result;
+};
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0}
+
+static int get_next_submodule(struct child_process *cp,
+			      struct strbuf *err, void *data, void **task_cb)
 {
-	int i, result = 0;
-	struct child_process cp = CHILD_PROCESS_INIT;
-	struct argv_array argv = ARGV_ARRAY_INIT;
-	const char *work_tree = get_git_work_tree();
-	if (!work_tree)
-		goto out;
-
-	if (read_cache() < 0)
-		die("index file corrupt");
-
-	argv_array_push(&argv, "fetch");
-	for (i = 0; i < options->argc; i++)
-		argv_array_push(&argv, options->argv[i]);
-	argv_array_push(&argv, "--recurse-submodules-default");
-	/* default value, "--submodule-prefix" and its value are added later */
-
-	cp.env = local_repo_env;
-	cp.git_cmd = 1;
-	cp.no_stdin = 1;
-
-	calculate_changed_submodule_paths();
+	int ret = 0;
+	struct submodule_parallel_fetch *spf = data;
 
-	for (i = 0; i < active_nr; i++) {
+	for ( ; spf->count < active_nr; spf->count++) {
 		struct strbuf submodule_path = STRBUF_INIT;
 		struct strbuf submodule_git_dir = STRBUF_INIT;
 		struct strbuf submodule_prefix = STRBUF_INIT;
-		const struct cache_entry *ce = active_cache[i];
+		const struct cache_entry *ce = active_cache[spf->count];
 		const char *git_dir, *default_argv;
 		const struct submodule *submodule;
 
@@ -652,7 +644,7 @@ int fetch_populated_submodules(const struct argv_array *options,
 			submodule = submodule_from_name(null_sha1, ce->name);
 
 		default_argv = "yes";
-		if (command_line_option == RECURSE_SUBMODULES_DEFAULT) {
+		if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
 			if (submodule &&
 			    submodule->fetch_recurse !=
 						RECURSE_SUBMODULES_NONE) {
@@ -675,40 +667,102 @@ int fetch_populated_submodules(const struct argv_array *options,
 					default_argv = "on-demand";
 				}
 			}
-		} else if (command_line_option == RECURSE_SUBMODULES_ON_DEMAND) {
+		} else if (spf->command_line_option == RECURSE_SUBMODULES_ON_DEMAND) {
 			if (!unsorted_string_list_lookup(&changed_submodule_paths, ce->name))
 				continue;
 			default_argv = "on-demand";
 		}
 
-		strbuf_addf(&submodule_path, "%s/%s", work_tree, ce->name);
+		strbuf_addf(&submodule_path, "%s/%s", spf->work_tree, ce->name);
 		strbuf_addf(&submodule_git_dir, "%s/.git", submodule_path.buf);
-		strbuf_addf(&submodule_prefix, "%s%s/", prefix, ce->name);
+		strbuf_addf(&submodule_prefix, "%s%s/", spf->prefix, ce->name);
 		git_dir = read_gitfile(submodule_git_dir.buf);
 		if (!git_dir)
 			git_dir = submodule_git_dir.buf;
 		if (is_directory(git_dir)) {
-			if (!quiet)
-				fprintf(stderr, "Fetching submodule %s%s\n", prefix, ce->name);
-			cp.dir = submodule_path.buf;
-			argv_array_push(&argv, default_argv);
-			argv_array_push(&argv, "--submodule-prefix");
-			argv_array_push(&argv, submodule_prefix.buf);
-			cp.argv = argv.argv;
-			if (run_command(&cp))
-				result = 1;
-			argv_array_pop(&argv);
-			argv_array_pop(&argv);
-			argv_array_pop(&argv);
+			child_process_init(cp);
+			cp->dir = strbuf_detach(&submodule_path, NULL);
+			cp->env = local_repo_env;
+			cp->git_cmd = 1;
+			if (!spf->quiet)
+				strbuf_addf(err, "Fetching submodule %s%s\n",
+					    spf->prefix, ce->name);
+			argv_array_init(&cp->args);
+			argv_array_pushv(&cp->args, spf->args.argv);
+			argv_array_push(&cp->args, default_argv);
+			argv_array_push(&cp->args, "--submodule-prefix");
+			argv_array_push(&cp->args, submodule_prefix.buf);
+			ret = 1;
 		}
 		strbuf_release(&submodule_path);
 		strbuf_release(&submodule_git_dir);
 		strbuf_release(&submodule_prefix);
+		if (ret) {
+			spf->count++;
+			return 1;
+		}
 	}
-	argv_array_clear(&argv);
+	return 0;
+}
+
+static int fetch_start_failure(struct child_process *cp,
+			       struct strbuf *err,
+			       void *cb, void *task_cb)
+{
+	struct submodule_parallel_fetch *spf = cb;
+
+	spf->result = 1;
+
+	return 0;
+}
+
+static int fetch_finish(int retvalue, struct child_process *cp,
+			struct strbuf *err, void *cb, void *task_cb)
+{
+	struct submodule_parallel_fetch *spf = cb;
+
+	if (retvalue)
+		spf->result = 1;
+
+	return 0;
+}
+
+int fetch_populated_submodules(const struct argv_array *options,
+			       const char *prefix, int command_line_option,
+			       int quiet)
+{
+	int i;
+	int max_parallel_jobs = 1;
+	struct submodule_parallel_fetch spf = SPF_INIT;
+
+	spf.work_tree = get_git_work_tree();
+	spf.command_line_option = command_line_option;
+	spf.quiet = quiet;
+	spf.prefix = prefix;
+
+	if (!spf.work_tree)
+		goto out;
+
+	if (read_cache() < 0)
+		die("index file corrupt");
+
+	argv_array_push(&spf.args, "fetch");
+	for (i = 0; i < options->argc; i++)
+		argv_array_push(&spf.args, options->argv[i]);
+	argv_array_push(&spf.args, "--recurse-submodules-default");
+	/* default value, "--submodule-prefix" and its value are added later */
+
+	calculate_changed_submodule_paths();
+	run_processes_parallel(max_parallel_jobs,
+			       get_next_submodule,
+			       fetch_start_failure,
+			       fetch_finish,
+			       &spf);
+
+	argv_array_clear(&spf.args);
 out:
 	string_list_clear(&changed_submodule_paths, 1);
-	return result;
+	return spf.result;
 }
 
 unsigned is_submodule_modified(const char *path, int ignore_untracked)
-- 
2.6.4.443.ge094245.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 8/8] submodules: allow parallel fetching, add tests and documentation
  2015-12-14 19:37 [PATCH 0/8] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
                   ` (6 preceding siblings ...)
  2015-12-14 19:37 ` [PATCH 7/8] fetch_populated_submodules: use new parallel job processing Stefan Beller
@ 2015-12-14 19:37 ` Stefan Beller
  2015-12-14 20:40 ` [PATCH 0/8] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Johannes Sixt
  8 siblings, 0 replies; 32+ messages in thread
From: Stefan Beller @ 2015-12-14 19:37 UTC (permalink / raw)
  To: sbeller, git
  Cc: peff, gitster, jrnieder, johannes.schindelin, Jens.Lehmann,
	ericsunshine, j6t

This enables the work of the previous patches.
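
A minimal usage sketch of the new option (the repository is assumed to
have populated submodules):

	# fetch the superproject and up to four submodules at the same time
	git fetch --recurse-submodules --jobs=4

	# git pull accepts the same option and passes it on to the fetch
	git pull --recurse-submodules -j4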

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/fetch-options.txt |  7 +++++++
 builtin/fetch.c                 |  6 +++++-
 builtin/pull.c                  |  6 ++++++
 submodule.c                     |  3 +--
 submodule.h                     |  2 +-
 t/t5526-fetch-submodules.sh     | 20 ++++++++++++++++++++
 6 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
index 45583d8..6b109f6 100644
--- a/Documentation/fetch-options.txt
+++ b/Documentation/fetch-options.txt
@@ -100,6 +100,13 @@ ifndef::git-pull[]
 	reference to a commit that isn't already in the local submodule
 	clone.
 
+-j::
+--jobs=<n>::
+	Number of parallel children to be used for fetching submodules.
+	Each will fetch from different submodules, such that fetching many
+	submodules will be faster. By default submodules will be fetched
+	one at a time.
+
 --no-recurse-submodules::
 	Disable recursive fetching of submodules (this has the same effect as
 	using the '--recurse-submodules=no' option).
diff --git a/builtin/fetch.c b/builtin/fetch.c
index c85f347..586840d 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -37,6 +37,7 @@ static int prune = -1; /* unspecified */
 static int all, append, dry_run, force, keep, multiple, update_head_ok, verbosity;
 static int progress = -1, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int tags = TAGS_DEFAULT, unshallow, update_shallow;
+static int max_children = 1;
 static const char *depth;
 static const char *upload_pack;
 static struct strbuf default_rla = STRBUF_INIT;
@@ -99,6 +100,8 @@ static struct option builtin_fetch_options[] = {
 		    N_("fetch all tags and associated objects"), TAGS_SET),
 	OPT_SET_INT('n', NULL, &tags,
 		    N_("do not fetch all tags (--no-tags)"), TAGS_UNSET),
+	OPT_INTEGER('j', "jobs", &max_children,
+		    N_("number of submodules fetched in parallel")),
 	OPT_BOOL('p', "prune", &prune,
 		 N_("prune remote-tracking branches no longer on remote")),
 	{ OPTION_CALLBACK, 0, "recurse-submodules", NULL, N_("on-demand"),
@@ -1213,7 +1216,8 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 		result = fetch_populated_submodules(&options,
 						    submodule_prefix,
 						    recurse_submodules,
-						    verbosity < 0);
+						    verbosity < 0,
+						    max_children);
 		argv_array_clear(&options);
 	}
 
diff --git a/builtin/pull.c b/builtin/pull.c
index 5145fc6..9e3c738 100644
--- a/builtin/pull.c
+++ b/builtin/pull.c
@@ -95,6 +95,7 @@ static int opt_force;
 static char *opt_tags;
 static char *opt_prune;
 static char *opt_recurse_submodules;
+static char *max_children;
 static int opt_dry_run;
 static char *opt_keep;
 static char *opt_depth;
@@ -178,6 +179,9 @@ static struct option pull_options[] = {
 		N_("on-demand"),
 		N_("control recursive fetching of submodules"),
 		PARSE_OPT_OPTARG),
+	OPT_PASSTHRU('j', "jobs", &max_children, N_("n"),
+		N_("number of submodules pulled in parallel"),
+		PARSE_OPT_OPTARG),
 	OPT_BOOL(0, "dry-run", &opt_dry_run,
 		N_("dry run")),
 	OPT_PASSTHRU('k', "keep", &opt_keep, NULL,
@@ -525,6 +529,8 @@ static int run_fetch(const char *repo, const char **refspecs)
 		argv_array_push(&args, opt_prune);
 	if (opt_recurse_submodules)
 		argv_array_push(&args, opt_recurse_submodules);
+	if (max_children)
+		argv_array_push(&args, max_children);
 	if (opt_dry_run)
 		argv_array_push(&args, "--dry-run");
 	if (opt_keep)
diff --git a/submodule.c b/submodule.c
index 6a2d786..0b48734 100644
--- a/submodule.c
+++ b/submodule.c
@@ -729,10 +729,9 @@ static int fetch_finish(int retvalue, struct child_process *cp,
 
 int fetch_populated_submodules(const struct argv_array *options,
 			       const char *prefix, int command_line_option,
-			       int quiet)
+			       int quiet, int max_parallel_jobs)
 {
 	int i;
-	int max_parallel_jobs = 1;
 	struct submodule_parallel_fetch spf = SPF_INIT;
 
 	spf.work_tree = get_git_work_tree();
diff --git a/submodule.h b/submodule.h
index 5507c3d..cbc0003 100644
--- a/submodule.h
+++ b/submodule.h
@@ -31,7 +31,7 @@ void set_config_fetch_recurse_submodules(int value);
 void check_for_new_submodule_commits(unsigned char new_sha1[20]);
 int fetch_populated_submodules(const struct argv_array *options,
 			       const char *prefix, int command_line_option,
-			       int quiet);
+			       int quiet, int max_parallel_jobs);
 unsigned is_submodule_modified(const char *path, int ignore_untracked);
 int submodule_uses_gitfile(const char *path);
 int ok_to_remove_submodule(const char *path);
diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index 17759b14..1241146 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -71,6 +71,17 @@ test_expect_success "fetch --recurse-submodules recurses into submodules" '
 	test_i18ncmp expect.err actual.err
 '
 
+test_expect_success "fetch --recurse-submodules -j2 has the same output behaviour" '
+	add_upstream_commit &&
+	(
+		cd downstream &&
+		GIT_TRACE=$(pwd)/../trace.out git fetch --recurse-submodules -j2 2>../actual.err
+	) &&
+	test_must_be_empty actual.out &&
+	test_i18ncmp expect.err actual.err &&
+	grep "2 tasks" trace.out
+'
+
 test_expect_success "fetch alone only fetches superproject" '
 	add_upstream_commit &&
 	(
@@ -140,6 +151,15 @@ test_expect_success "--quiet propagates to submodules" '
 	! test -s actual.err
 '
 
+test_expect_success "--quiet propagates to parallel submodules" '
+	(
+		cd downstream &&
+		git fetch --recurse-submodules -j 2 --quiet  >../actual.out 2>../actual.err
+	) &&
+	! test -s actual.out &&
+	! test -s actual.err
+'
+
 test_expect_success "--dry-run propagates to submodules" '
 	add_upstream_commit &&
 	(
-- 
2.6.4.443.ge094245.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/8] run-command: add an asynchronous parallel child processor
  2015-12-14 19:37 ` [PATCH 6/8] run-command: add an asynchronous parallel child processor Stefan Beller
@ 2015-12-14 20:39   ` Johannes Sixt
  2015-12-14 21:40     ` Stefan Beller
  0 siblings, 1 reply; 32+ messages in thread
From: Johannes Sixt @ 2015-12-14 20:39 UTC (permalink / raw)
  To: Stefan Beller, git
  Cc: peff, gitster, jrnieder, johannes.schindelin, Jens.Lehmann,
	ericsunshine

Am 14.12.2015 um 20:37 schrieb Stefan Beller:
> This allows running external commands in parallel with ordered output
> on stderr.
>
> If we run external commands in parallel we cannot pipe the output directly
> to our stdout/err as it would get mixed up. So each process's output will
> flow through a pipe, which we buffer. One subprocess can be directly
> piped to our stdout/err for low-latency feedback to the user.
>
> Example:
> Let's assume we have 5 submodules A,B,C,D,E and each fetch takes a
> different amount of time as the different submodules vary in size, then
> the output of fetches in sequential order might look like this:
>
>   time -->
>   output: |---A---| |-B-| |-------C-------| |-D-| |-E-|
>
> When we schedule these submodules into maximal two parallel processes,
> a schedule and sample output over time may look like this:
>
> process 1: |---A---| |-D-| |-E-|
>
> process 2: |-B-| |-------C-------|
>
> output:    |---A---|B|---C-------|DE
>
> So A will be perceived as it would run normally in the single child
> version. As B has finished by the time A is done, we can dump its whole
> progress buffer on stderr, such that it looks like it finished in no
> time. Once that is done, C is determined to be the visible child and
> its progress will be reported in real time.
>
> So this way of output is really good for human consumption, as it only
> changes the timing, not the actual output.
>
> For machine consumption the output needs to be prepared in the tasks,
> by either having a prefix per line or per block to indicate whose tasks
> output is displayed, because the output order may not follow the
> original sequential ordering:
>
>   |----A----| |--B--| |-C-|
>
> will be scheduled to be all parallel:
>
> process 1: |----A----|
> process 2: |--B--|
> process 3: |-C-|
> output:    |----A----|CB
>
> This happens because C finished before B did, so it will be queued for
> output before B.
>
> The detection when a child has finished executing is done the same way as
> two fold. First we check regularly if the stderr pipe still exists in an
> interleaved manner with other actions such as checking other children
> for their liveliness or starting new children. Once a child closed their
> stderr stream, we assume it is stopping very soon, such that we can use
> the `finish_command` code borrowed from the single external process
> execution interface.

I can't quite parse the first sentence in this paragraph. Perhaps 
something like this:

To detect when a child has finished executing, we check interleaved
with other actions (such as checking the liveliness of children or
starting new processes) whether the stderr pipe still exists. Once a
child closed its stderr stream, we assume it is terminating very soon,
and use finish_command() from the single external process execution
interface to collect the exit status.

>
> By maintaining the strong assumption of stderr being open until the
> very end of a child process, we can avoid other hassle such as an
> implementation using `waitpid(-1)`, which is not implemented in Windows.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
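
To make the scheme concrete, here is a schematic sketch (self-contained,
with made-up types; it is not the code from run-command.c, which uses
strbufs and finish_command()):

/*
 * One child at a time is the "output owner" whose stderr is forwarded
 * immediately; all others are buffered and dumped wholesale once they
 * become the owner.  EOF on a child's stderr pipe is taken as "this
 * child is about to exit".
 */
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

struct child_output {
	int fd;			/* read end of the child's stderr pipe, -1 when done */
	char buf[1 << 16];	/* fixed size only to keep the sketch short */
	size_t len;
};

static void drain_children(struct child_output *ch, int nr)
{
	int owner = 0;		/* sketch assumes nr <= 64 */

	while (owner < nr) {
		struct pollfd pfd[64];
		int i;

		for (i = 0; i < nr; i++) {
			pfd[i].fd = ch[i].fd;	/* poll() ignores negative fds */
			pfd[i].events = POLLIN;
		}
		if (poll(pfd, nr, -1) < 0)
			continue;		/* e.g. EINTR: just retry */

		for (i = 0; i < nr; i++) {
			char tmp[8192];
			ssize_t n;

			if (ch[i].fd < 0 || !(pfd[i].revents & (POLLIN | POLLHUP)))
				continue;
			n = read(ch[i].fd, tmp, sizeof(tmp));
			if (n > 0) {
				if (i == owner)
					fwrite(tmp, 1, n, stderr);	/* live feedback */
				else if (ch[i].len + n <= sizeof(ch[i].buf)) {
					memcpy(ch[i].buf + ch[i].len, tmp, n);
					ch[i].len += n;	/* hold back for later */
				}
			} else if (n == 0) {
				close(ch[i].fd);	/* stderr closed, child is exiting */
				ch[i].fd = -1;
			}
			/* n < 0 (e.g. EINTR) is simply retried on the next round */
		}

		/*
		 * Hand the output over: every already-finished child gets
		 * its buffer dumped in order, and the first still-running
		 * child becomes the new owner after its backlog is flushed.
		 */
		while (owner < nr && ch[owner].fd < 0) {
			owner++;
			if (owner < nr) {
				fwrite(ch[owner].buf, 1, ch[owner].len, stderr);
				ch[owner].len = 0;
			}
		}
	}
}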

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/8] Rerolling sb/submodule-parallel-fetch for the time after 2.7
  2015-12-14 19:37 [PATCH 0/8] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
                   ` (7 preceding siblings ...)
  2015-12-14 19:37 ` [PATCH 8/8] submodules: allow parallel fetching, add tests and documentation Stefan Beller
@ 2015-12-14 20:40 ` Johannes Sixt
  2015-12-14 21:00   ` Junio C Hamano
  8 siblings, 1 reply; 32+ messages in thread
From: Johannes Sixt @ 2015-12-14 20:40 UTC (permalink / raw)
  To: Stefan Beller, git
  Cc: peff, gitster, jrnieder, johannes.schindelin, Jens.Lehmann,
	ericsunshine

Am 14.12.2015 um 20:37 schrieb Stefan Beller:
> I am sending out a new version for replacing sb/submodule-parallel-fetch for
> the time after the 2.7 release.
>
> The contents are
>   * all patches as in the branch sb/submodule-parallel-fetch
>   * including the fixups as suggested by Hannes,
>   * write a message to the debug log for better testing and debugging purposes
>    (a patch cherry picked from the series which is supposed to build on top of this)
>
> The patches themselves were rebased such that there are no fixup commits
> any more, but we get things right the first time.
>
> The commit message of "run-command: add an asynchronous parallel child processor"
> has slightly been updated to mention the fact that we don't want to use waitpid(-1)
> but rather use the assumption of child's stderr living as long as the child itself.

Thanks! I rebased a version of sb/submodule-parallel-fetch that includes 
my suggested improvements, and the result is identical to this series 
except for the trace output mentioned in the last bullet point.

With or without addressing my note about the commit message in 6/8:

Acked-by: Johannes Sixt <j6t@kdbg.org>

> Thanks,
> Stefan
>
>
> Jonathan Nieder (1):
>    submodule.c: write "Fetching submodule <foo>" to stderr
>
> Stefan Beller (7):
>    xread: poll on non blocking fds
>    xread_nonblock: add functionality to read from fds without blocking
>    strbuf: add strbuf_read_once to read without blocking
>    sigchain: add command to pop all common signals
>    run-command: add an asynchronous parallel child processor
>    fetch_populated_submodules: use new parallel job processing
>    submodules: allow parallel fetching, add tests and documentation
>
>   Documentation/fetch-options.txt |   7 +
>   builtin/fetch.c                 |   6 +-
>   builtin/pull.c                  |   6 +
>   git-compat-util.h               |   1 +
>   run-command.c                   | 335 ++++++++++++++++++++++++++++++++++++++++
>   run-command.h                   |  80 ++++++++++
>   sigchain.c                      |   9 ++
>   sigchain.h                      |   1 +
>   strbuf.c                        |  11 ++
>   strbuf.h                        |   8 +
>   submodule.c                     | 141 +++++++++++------
>   submodule.h                     |   2 +-
>   t/t0061-run-command.sh          |  53 +++++++
>   t/t5526-fetch-submodules.sh     |  71 ++++++---
>   test-run-command.c              |  55 ++++++-
>   wrapper.c                       |  35 ++++-
>   16 files changed, 747 insertions(+), 74 deletions(-)
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] xread_nonblock: add functionality to read from fds without blocking
  2015-12-14 19:37 ` [PATCH 3/8] xread_nonblock: add functionality to read from fds without blocking Stefan Beller
@ 2015-12-14 20:59   ` Junio C Hamano
  2015-12-14 23:03   ` Eric Sunshine
  1 sibling, 0 replies; 32+ messages in thread
From: Junio C Hamano @ 2015-12-14 20:59 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, peff, jrnieder, johannes.schindelin, Jens.Lehmann,
	ericsunshine, j6t

Stefan Beller <sbeller@google.com> writes:

> Provide a wrapper to read(), similar to xread(), that restarts on
> EINTR but not EAGAIN (or EWOULDBLOCK). This enables the caller to
> handle polling itself, possibly polling multiple sockets or performing
> some other action.

Do you still need this (and use of this in 4/8)?  The description of
a4433fd4 (run-command: remove set_nonblocking(), 2015-11-05) from
the previous iteration justifies the removal of set_nonblocking()
this way:

    run-command: remove set_nonblocking()
    
    strbuf_read_once can also operate on blocking file descriptors if we
    are sure they are ready.  And the poll(2) we call before calling
    this ensures that this is the case.

Updated run-command in this reroll lacks set_nonblocking(), and does
have the poll(2) before it calls strbuf_read_once().  Which means
that xread_nonblock() introduced here and used by [4/8] would read from a
file descriptor that is not marked as nonblock, so the read(2) in there
would never error-return with EWOULDBLOCK, and it would be identical to
xread() updated by [2/8] in this reroll.

So it appears to me that you can lose this and call xread() in [4/8]?

Ahh, there is a difference if the file descriptor the caller feeds
strbuf_read_once() happens to be marked as nonblock.  read_once()
wants to return without doing the poll() in such a case.  Even
though this series would not introduce any use of a nonblocking file
descriptor, as a general API function, [4/8] must be prepared for
such a future caller, hence [3/8] is needed.
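
To illustrate, a rough sketch of such a future caller (hypothetical
code, not from this series): it has marked its descriptors O_NONBLOCK
and multiplexes them itself, so it wants EAGAIN handed back instead of
having the wrapper sleep in its own poll():

#include "git-compat-util.h"

static void service_two_pipes(int fd_a, int fd_b)
{
	struct pollfd pfd[2];
	char buf[8192];

	pfd[0].fd = fd_a;
	pfd[1].fd = fd_b;
	pfd[0].events = pfd[1].events = POLLIN;

	for (;;) {
		int i;

		if (poll(pfd, 2, -1) < 0) {
			if (errno == EINTR)
				continue;
			die_errno("poll");
		}
		for (i = 0; i < 2; i++) {
			ssize_t n;

			if (!(pfd[i].revents & (POLLIN | POLLHUP)))
				continue;
			n = xread_nonblock(pfd[i].fd, buf, sizeof(buf));
			if (n < 0 && errno == EAGAIN)
				continue;	/* spurious wakeup; poll again */
			if (n < 0)
				die_errno("read");
			if (n == 0)
				return;		/* EOF on one side; enough for the sketch */
			/* ... consume the n bytes read from pfd[i].fd ... */
		}
	}
}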

OK, thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/8] Rerolling sb/submodule-parallel-fetch for the time after 2.7
  2015-12-14 20:40 ` [PATCH 0/8] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Johannes Sixt
@ 2015-12-14 21:00   ` Junio C Hamano
  0 siblings, 0 replies; 32+ messages in thread
From: Junio C Hamano @ 2015-12-14 21:00 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Stefan Beller, git, peff, jrnieder, johannes.schindelin,
	Jens.Lehmann, ericsunshine

Johannes Sixt <j6t@kdbg.org> writes:

> Am 14.12.2015 um 20:37 schrieb Stefan Beller:
>> I am sending out a new version for replacing sb/submodule-parallel-fetch for
>> the time after the 2.7 release.
>>
>> The contents are
>>   * all patches as in the branch sb/submodule-parallel-fetch
>>   * including the fixups as suggested by Hannes,
>>   * write a message to the debug log for better testing and debugging purposes
>>    (a patch cherry picked from the series which is supposed to build on top of this)
>>
>> The patches themselves were rebased such that there are no fixup commits
>> any more, but we get things right the first time.
>>
>> The commit message of "run-command: add an asynchronous parallel child processor"
>> has slightly been updated to mention the fact that we don't want to use waitpid(-1)
>> but rather use the assumption of child's stderr living as long as the child itself.
>
> Thanks! I rebased a version of sb/submodule-parallel-fetch that
> includes my suggested improvements, and the result is identical to
> this series except for the trace output mentioned in the last bullet
> point.
>
> With or without addressing my note about the commit message in 6/8:
>
> Acked-by: Johannes Sixt <j6t@kdbg.org>

Thanks, both.  This round does look very clean to build on top.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/8] run-command: add an asynchronous parallel child processor
  2015-12-14 20:39   ` Johannes Sixt
@ 2015-12-14 21:40     ` Stefan Beller
  0 siblings, 0 replies; 32+ messages in thread
From: Stefan Beller @ 2015-12-14 21:40 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: git@vger.kernel.org, Jeff King, Junio C Hamano, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Eric Sunshine

On Mon, Dec 14, 2015 at 12:39 PM, Johannes Sixt <j6t@kdbg.org> wrote:
>
> I can't quite parse the first sentence in this paragraph. Perhaps something
> like this:
>
> To detect when a child has finished executing, we check interleaved
> with other actions (such as checking the liveliness of children or
> starting new processes) whether the stderr pipe still exists. Once a
> child closed its stderr stream, we assume it is terminating very soon,
> and use finish_command() from the single external process execution
> interface to collect the exit status.

Sounds much better than my words. If a resend is necessary, I'll have your
reworded version of the paragraph instead.

>
>
>>
>> By maintaining the strong assumption of stderr being open until the
>> very end of a child process, we can avoid other hassle such as an
>> implementation using `waitpid(-1)`, which is not implemented in Windows.
>>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/8] xread: poll on non blocking fds
  2015-12-14 19:37 ` [PATCH 2/8] xread: poll on non blocking fds Stefan Beller
@ 2015-12-14 22:58   ` Eric Sunshine
  2015-12-14 23:07     ` Stefan Beller
  2015-12-14 23:11     ` Junio C Hamano
  0 siblings, 2 replies; 32+ messages in thread
From: Eric Sunshine @ 2015-12-14 22:58 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git List, Jeff King, Junio C Hamano, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Johannes Sixt

On Mon, Dec 14, 2015 at 2:37 PM, Stefan Beller <sbeller@google.com> wrote:
> From the man page:
> EAGAIN The file descriptor fd refers to a file other than a socket
>        and has been marked nonblocking (O_NONBLOCK), and the read
>        would block.
>
> EAGAIN or EWOULDBLOCK
>        The file descriptor fd refers to a socket and has been marked
>        nonblocking (O_NONBLOCK), and the read would block.  POSIX.1-2001
>        allows either error to be returned for this case, and does not
>        require these constants to have the same value, so a portable
>        application should check for both possibilities.
>
> If we get an EAGAIN or EWOULDBLOCK, the fd must have O_NONBLOCK set.
> As the intent of xread is to read as much as possible either until the
> fd is EOF or an actual error occurs, we can ease the feeder of the fd
> by not spinning the whole time, but rather wait for it politely by not
> busy waiting.
>
> We should not care if the call to poll failed, as we're in an infinite
> loop and can only get out with the correct read().
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> diff --git a/wrapper.c b/wrapper.c
> index 6fcaa4d..4f720fe 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -236,8 +236,17 @@ ssize_t xread(int fd, void *buf, size_t len)
>             len = MAX_IO_SIZE;
>         while (1) {
>                 nr = read(fd, buf, len);
> -               if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
> -                       continue;
> +               if (nr < 0) {
> +                       if (errno == EINTR)
> +                               continue;
> +                       if (errno == EAGAIN || errno == EWOULDBLOCK) {
> +                               struct pollfd pfd;
> +                               pfd.events = POLLIN;
> +                               pfd.fd = fd;
> +                               /* We deliberately ignore the return value */

This comment tells us what the code itself already says, but not why
the value is being ignored. The reader still has to consult the commit
message to learn that detail, which makes the value of the comment
questionable.

> +                               poll(&pfd, 1, -1);
> +                       }
> +               }
>                 return nr;
>         }
>  }
> --
> 2.6.4.443.ge094245.dirty
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] xread_nonblock: add functionality to read from fds without blocking
  2015-12-14 19:37 ` [PATCH 3/8] xread_nonblock: add functionality to read from fds without blocking Stefan Beller
  2015-12-14 20:59   ` Junio C Hamano
@ 2015-12-14 23:03   ` Eric Sunshine
  2015-12-14 23:05     ` Eric Sunshine
  2015-12-14 23:15     ` Junio C Hamano
  1 sibling, 2 replies; 32+ messages in thread
From: Eric Sunshine @ 2015-12-14 23:03 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git List, Jeff King, Junio C Hamano, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Johannes Sixt

On Mon, Dec 14, 2015 at 2:37 PM, Stefan Beller <sbeller@google.com> wrote:
> Provide a wrapper to read(), similar to xread(), that restarts on
> EINTR but not EAGAIN (or EWOULDBLOCK). This enables the caller to
> handle polling itself, possibly polling multiple sockets or performing
> some other action.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
> diff --git a/wrapper.c b/wrapper.c
> @@ -252,6 +252,28 @@ ssize_t xread(int fd, void *buf, size_t len)
> + * xread_nonblock() is the same a read(), but it automatically restarts read()
> + * interrupted operations (EINTR). xread_nonblock() DOES NOT GUARANTEE that
> + * "len" bytes is read. EWOULDBLOCK is turned into EAGAIN.

The last sentence is confusing. From the commit message, we learn that
this function doesn't care about EAGAIN or EWOULDBLOCK, yet the above
comment seems to imply that it does. What it really ought to be saying
is that "as a convenience, errno is transformed from EWOULDBLOCK to
EAGAIN so that the caller only has to check for EAGAIN".

(It also feels weird that this should be messing with errno at all,
but that's a different issue...)
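
Concretely, the convenience being documented is that a caller sitting
on a non-blocking descriptor only has to write something like (a
snippet, for illustration):

	n = xread_nonblock(fd, buf, sizeof(buf));
	if (n < 0 && errno == EAGAIN)
		continue;	/* nothing to read yet; go back to poll() */

instead of also spelling out the EWOULDBLOCK case.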

> + */
> +ssize_t xread_nonblock(int fd, void *buf, size_t len)
> +{
> +       ssize_t nr;
> +       if (len > MAX_IO_SIZE)
> +               len = MAX_IO_SIZE;
> +       while (1) {
> +               nr = read(fd, buf, len);
> +               if (nr < 0) {
> +                       if (errno == EINTR)
> +                               continue;
> +                       if (errno == EWOULDBLOCK)
> +                               errno = EAGAIN;
> +               }
> +               return nr;
> +       }
> +}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] xread_nonblock: add functionality to read from fds without blocking
  2015-12-14 23:03   ` Eric Sunshine
@ 2015-12-14 23:05     ` Eric Sunshine
  2015-12-14 23:15     ` Junio C Hamano
  1 sibling, 0 replies; 32+ messages in thread
From: Eric Sunshine @ 2015-12-14 23:05 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git List, Jeff King, Junio C Hamano, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Johannes Sixt

On Mon, Dec 14, 2015 at 6:03 PM, Eric Sunshine <ericsunshine@gmail.com> wrote:
> On Mon, Dec 14, 2015 at 2:37 PM, Stefan Beller <sbeller@google.com> wrote:
>> Provide a wrapper to read(), similar to xread(), that restarts on
>> EINTR but not EAGAIN (or EWOULDBLOCK). This enables the caller to
>> handle polling itself, possibly polling multiple sockets or performing
>> some other action.
>>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>> diff --git a/wrapper.c b/wrapper.c
>> @@ -252,6 +252,28 @@ ssize_t xread(int fd, void *buf, size_t len)
>> + * xread_nonblock() is the same a read(), but it automatically restarts read()
>> + * interrupted operations (EINTR). xread_nonblock() DOES NOT GUARANTEE that
>> + * "len" bytes is read. EWOULDBLOCK is turned into EAGAIN.
>
> The last sentence is confusing. From the commit message, we learn that
> this function doesn't care about EAGAIN or EWOULDBLOCK, yet the above
> comment seems to imply that it does. What it really ought to be saying
> is that "as a convenience, errno is transformed from EWOULDBLOCK to
> EAGAIN so that the caller only has to check for EAGAIN".

I forgot to mention that the mutation of EWOULDBLOCK into EAGAIN is
something that should be described in the header rather than here at
the source. In fact, the entire comment block above this function is
more suitable in the header file.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/8] xread: poll on non blocking fds
  2015-12-14 22:58   ` Eric Sunshine
@ 2015-12-14 23:07     ` Stefan Beller
  2015-12-14 23:11     ` Junio C Hamano
  1 sibling, 0 replies; 32+ messages in thread
From: Stefan Beller @ 2015-12-14 23:07 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Jeff King, Junio C Hamano, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Johannes Sixt

On Mon, Dec 14, 2015 at 2:58 PM, Eric Sunshine <ericsunshine@gmail.com> wrote:
>> +                               /* We deliberately ignore the return value */
>
> This comment tells us what the code itself already says, but not why
> the value is being ignored. The reader still has to consult the commit
> message to learn that detail, which makes the value of the comment
> questionable.

It at least tells you it's not because of sloppiness. ;)
I'll elaborate that comment a bit further in a resend though.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/8] xread: poll on non blocking fds
  2015-12-14 22:58   ` Eric Sunshine
  2015-12-14 23:07     ` Stefan Beller
@ 2015-12-14 23:11     ` Junio C Hamano
  2015-12-14 23:14       ` Stefan Beller
  1 sibling, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2015-12-14 23:11 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Stefan Beller, Git List, Jeff King, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Johannes Sixt

Eric Sunshine <ericsunshine@gmail.com> writes:

> This comment tells us what the code itself already says, but not why
> the value is being ignored. The reader still has to consult the commit
> message to learn that detail, which makes the value of the comment
> questionable.

Let's do this for now, then.

-- >8 --
From: Stefan Beller <sbeller@google.com>
Date: Mon, 14 Dec 2015 11:37:12 -0800
Subject: [PATCH] xread: poll on non blocking fds

The man page of read(2) says:

  EAGAIN The file descriptor fd refers to a file other than a socket
	 and has been marked nonblocking (O_NONBLOCK), and the read
	 would block.

  EAGAIN or EWOULDBLOCK
	 The file descriptor fd refers to a socket and has been marked
	 nonblocking (O_NONBLOCK), and the read would block.  POSIX.1-2001
	 allows either error to be returned for this case, and does not
	 require these constants to have the same value, so a portable
	 application should check for both possibilities.

If we get an EAGAIN or EWOULDBLOCK, the fd must have O_NONBLOCK set.
As the intent of xread is to read as much as possible either until the
fd is EOF or an actual error occurs, we can ease the feeder of the fd
by not spinning the whole time, but rather wait for it politely by not
busy waiting.

We should not care if the call to poll failed, as we're in an infinite
loop and can only get out with the correct read().

Signed-off-by: Stefan Beller <sbeller@google.com>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 wrapper.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/wrapper.c b/wrapper.c
index 6fcaa4d..1770efa 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -236,8 +236,24 @@ ssize_t xread(int fd, void *buf, size_t len)
 	    len = MAX_IO_SIZE;
 	while (1) {
 		nr = read(fd, buf, len);
-		if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
-			continue;
+		if (nr < 0) {
+			if (errno == EINTR)
+				continue;
+			if (errno == EAGAIN || errno == EWOULDBLOCK) {
+				struct pollfd pfd;
+				pfd.events = POLLIN;
+				pfd.fd = fd;
+				/*
+				 * it is OK if this poll() failed; we
+				 * want to leave this infinite loop
+				 * only when read() returns with
+				 * success, or an expected failure,
+				 * which would be checked by the next
+				 * call to read(2).
+				 */
+				poll(&pfd, 1, -1);
+			}
+		}
 		return nr;
 	}
 }
-- 
2.7.0-rc0-109-gb762328

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/8] xread: poll on non blocking fds
  2015-12-14 23:11     ` Junio C Hamano
@ 2015-12-14 23:14       ` Stefan Beller
  0 siblings, 0 replies; 32+ messages in thread
From: Stefan Beller @ 2015-12-14 23:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Eric Sunshine, Git List, Jeff King, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Johannes Sixt

On Mon, Dec 14, 2015 at 3:11 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Eric Sunshine <ericsunshine@gmail.com> writes:
>
>> This comment tells us what the code itself already says, but not why
>> the value is being ignored. The reader still has to consult the commit
>> message to learn that detail, which makes the value of the comment
>> questionable.
>
> Let's do this for now, then.

That looks good to me. I'll pick it up for the resend.

>
> -- >8 --
> From: Stefan Beller <sbeller@google.com>
> Date: Mon, 14 Dec 2015 11:37:12 -0800
> Subject: [PATCH] xread: poll on non blocking fds
>
> The man page of read(2) says:
>
>   EAGAIN The file descriptor fd refers to a file other than a socket
>          and has been marked nonblocking (O_NONBLOCK), and the read
>          would block.
>
>   EAGAIN or EWOULDBLOCK
>          The file descriptor fd refers to a socket and has been marked
>          nonblocking (O_NONBLOCK), and the read would block.  POSIX.1-2001
>          allows either error to be returned for this case, and does not
>          require these constants to have the same value, so a portable
>          application should check for both possibilities.
>
> If we get an EAGAIN or EWOULDBLOCK, the fd must have O_NONBLOCK set.
> As the intent of xread is to read as much as possible either until the
> fd is EOF or an actual error occurs, we can ease the feeder of the fd
> by not spinning the whole time, but rather wait for it politely by not
> busy waiting.
>
> We should not care if the call to poll failed, as we're in an infinite
> loop and can only get out with the correct read().
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> Acked-by: Johannes Sixt <j6t@kdbg.org>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  wrapper.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/wrapper.c b/wrapper.c
> index 6fcaa4d..1770efa 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -236,8 +236,24 @@ ssize_t xread(int fd, void *buf, size_t len)
>             len = MAX_IO_SIZE;
>         while (1) {
>                 nr = read(fd, buf, len);
> -               if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
> -                       continue;
> +               if (nr < 0) {
> +                       if (errno == EINTR)
> +                               continue;
> +                       if (errno == EAGAIN || errno == EWOULDBLOCK) {
> +                               struct pollfd pfd;
> +                               pfd.events = POLLIN;
> +                               pfd.fd = fd;
> +                               /*
> +                                * it is OK if this poll() failed; we
> +                                * want to leave this infinite loop
> +                                * only when read() returns with
> +                                * success, or an expected failure,
> +                                * which would be checked by the next
> +                                * call to read(2).
> +                                */
> +                               poll(&pfd, 1, -1);
> +                       }
> +               }
>                 return nr;
>         }
>  }
> --
> 2.7.0-rc0-109-gb762328
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] xread_nonblock: add functionality to read from fds without blocking
  2015-12-14 23:03   ` Eric Sunshine
  2015-12-14 23:05     ` Eric Sunshine
@ 2015-12-14 23:15     ` Junio C Hamano
  2015-12-14 23:57       ` Jeff King
  1 sibling, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2015-12-14 23:15 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Stefan Beller, Git List, Jeff King, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Johannes Sixt

Eric Sunshine <ericsunshine@gmail.com> writes:

> The last sentence is confusing. From the commit message, we learn that
> this function doesn't care about EAGAIN or EWOULDBLOCK, yet the above
> comment seems to imply that it does. What it really ought to be saying
> is that "as a convenience, errno is transformed from EWOULDBLOCK to
> EAGAIN so that the caller only has to check for EAGAIN".

Let's do this for now, then.

-- >8 --
From: Stefan Beller <sbeller@google.com>
Date: Mon, 14 Dec 2015 11:37:13 -0800
Subject: [PATCH] xread_nonblock: add functionality to read from fds without blocking

Provide a wrapper to read(), similar to xread(), that restarts on
EINTR but not EAGAIN (or EWOULDBLOCK). This enables the caller to
handle polling itself, possibly polling multiple sockets or performing
some other action.

Helped-by: Jacob Keller <jacob.keller@gmail.com>
Helped-by: Jeff King <peff@peff.net>,
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 git-compat-util.h |  1 +
 wrapper.c         | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/git-compat-util.h b/git-compat-util.h
index 8e39867..87456a3 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -723,6 +723,7 @@ extern void *xmmap(void *start, size_t length, int prot, int flags, int fd, off_
 extern void *xmmap_gently(void *start, size_t length, int prot, int flags, int fd, off_t offset);
 extern int xopen(const char *path, int flags, ...);
 extern ssize_t xread(int fd, void *buf, size_t len);
+extern ssize_t xread_nonblock(int fd, void *buf, size_t len);
 extern ssize_t xwrite(int fd, const void *buf, size_t len);
 extern ssize_t xpread(int fd, void *buf, size_t len, off_t offset);
 extern int xdup(int fd);
diff --git a/wrapper.c b/wrapper.c
index 1770efa..0b5b03d 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -259,6 +259,30 @@ ssize_t xread(int fd, void *buf, size_t len)
 }
 
 /*
+ * xread_nonblock() automatically restarts interrupted operations
+ * (EINTR). xread_nonblock() DOES NOT GUARANTEE that "len" bytes is
+ * read.  For convenience to callers that mark the file descriptor
+ * non-blocking, EWOULDBLOCK is turned into EAGAIN to allow them to
+ * check only for EAGAIN (POSIX.1 allows either to be returned).
+ */
+ssize_t xread_nonblock(int fd, void *buf, size_t len)
+{
+	ssize_t nr;
+	if (len > MAX_IO_SIZE)
+		len = MAX_IO_SIZE;
+	while (1) {
+		nr = read(fd, buf, len);
+		if (nr < 0) {
+			if (errno == EINTR)
+				continue;
+			if (errno == EWOULDBLOCK)
+				errno = EAGAIN;
+		}
+		return nr;
+	}
+}
+
+/*
  * xwrite() is the same a write(), but it automatically restarts write()
  * operations with a recoverable error (EAGAIN and EINTR). xwrite() DOES NOT
  * GUARANTEE that "len" bytes is written even if the operation is successful.
-- 
2.7.0-rc0-109-gb762328

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/8] strbuf: add strbuf_read_once to read without blocking
  2015-12-14 19:37 ` [PATCH 4/8] strbuf: add strbuf_read_once to read " Stefan Beller
@ 2015-12-14 23:16   ` Eric Sunshine
  2015-12-14 23:27     ` Stefan Beller
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Sunshine @ 2015-12-14 23:16 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git List, Jeff King, Junio C Hamano, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Johannes Sixt

On Mon, Dec 14, 2015 at 2:37 PM, Stefan Beller <sbeller@google.com> wrote:
> The new call will read from a file descriptor into a strbuf once. The
> underlying call xread_nonblock is meant to execute without blocking if
> the file descriptor is set to O_NONBLOCK. It is a bug to call
> strbuf_read_once on a file descriptor which would block.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> diff --git a/strbuf.h b/strbuf.h
> @@ -367,6 +367,14 @@ extern size_t strbuf_fread(struct strbuf *, size_t, FILE *);
>  extern ssize_t strbuf_read(struct strbuf *, int fd, size_t hint);
>
>  /**
> + * Returns the number of new bytes appended to the sb.
> + * Negative return value signals there was an error returned from
> + * underlying read(2), in which case the caller should check errno.
> + * e.g. errno == EAGAIN when the read may have blocked.
> + */
> +extern ssize_t strbuf_read_once(struct strbuf *, int fd, size_t hint);

strbuf_read_once() is a rather opaque name; without reading the
documentation, it's difficult to figure out what it means. I wonder if
strbuf_read_nonblock() or something would be clearer?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/8] strbuf: add strbuf_read_once to read without blocking
  2015-12-14 23:16   ` Eric Sunshine
@ 2015-12-14 23:27     ` Stefan Beller
  0 siblings, 0 replies; 32+ messages in thread
From: Stefan Beller @ 2015-12-14 23:27 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Jeff King, Junio C Hamano, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Johannes Sixt

On Mon, Dec 14, 2015 at 3:16 PM, Eric Sunshine <ericsunshine@gmail.com> wrote:
> On Mon, Dec 14, 2015 at 2:37 PM, Stefan Beller <sbeller@google.com> wrote:
>> The new call will read from a file descriptor into a strbuf once. The
>> underlying call xread_nonblock is meant to execute without blocking if
>> the file descriptor is set to O_NONBLOCK. It is a bug to call
>> strbuf_read_once on a file descriptor which would block.
>>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> Signed-off-by: Junio C Hamano <gitster@pobox.com>
>> ---
>> diff --git a/strbuf.h b/strbuf.h
>> @@ -367,6 +367,14 @@ extern size_t strbuf_fread(struct strbuf *, size_t, FILE *);
>>  extern ssize_t strbuf_read(struct strbuf *, int fd, size_t hint);
>>
>>  /**
>> + * Returns the number of new bytes appended to the sb.
>> + * Negative return value signals there was an error returned from
>> + * underlying read(2), in which case the caller should check errno.
>> + * e.g. errno == EAGAIN when the read may have blocked.
>> + */
>> +extern ssize_t strbuf_read_once(struct strbuf *, int fd, size_t hint);
>
> strbuf_read_once() is a rather opaque name; without reading the
> documentation, it's difficult to figure out what it means. I wonder if
> strbuf_read_nonblock() or something would be clearer?

Well, the underlying read call can block if the fd is not marked as
non-blocking, so I would not name it _nonblock.
I just realized this same argument makes the naming moot for the previous
patch, though (xread_nonblock may block if the fd is not marked non-blocking).

What we currently want is to read once and not try to grab as much as
possible, but just return quickly. We do not do the non-blocking setup, as we
deemed it unneeded because the preceding poll in a higher layer signals that
we have data there to read.

Junio suggested in the discussion of [4/8] that maybe we can just use xread
and strbuf_read in this series, so I am tempted to drop patches {3,4}/8 as
that makes sense to me.
For non-blocking stuff we can reintroduce those helper functions later, though.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] xread_nonblock: add functionality to read from fds without blocking
  2015-12-14 23:15     ` Junio C Hamano
@ 2015-12-14 23:57       ` Jeff King
  2015-12-15  0:09         ` Stefan Beller
  2015-12-15  1:40         ` Junio C Hamano
  0 siblings, 2 replies; 32+ messages in thread
From: Jeff King @ 2015-12-14 23:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Eric Sunshine, Stefan Beller, Git List, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Johannes Sixt

On Mon, Dec 14, 2015 at 03:15:29PM -0800, Junio C Hamano wrote:

> -- >8 --
> From: Stefan Beller <sbeller@google.com>
> Date: Mon, 14 Dec 2015 11:37:13 -0800
> Subject: [PATCH] xread_nonblock: add functionality to read from fds without blocking
> 
> Provide a wrapper to read(), similar to xread(), that restarts on
> EINTR but not EAGAIN (or EWOULDBLOCK). This enables the caller to
> handle polling itself, possibly polling multiple sockets or performing
> some other action.

This makes me wonder why we restart xread() on EAGAIN in the first
place.

On EINTR, sure; signals can come and we want to keep going. But if we do
not have non-blocking descriptors, it should never happen, right?

Are we trying to protect ourselves against somebody _else_ giving us a
non-blocking descriptor? In that case we'll quietly spin and waste CPU.
Which isn't great, but perhaps better than returning an error.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] xread_nonblock: add functionality to read from fds without blocking
  2015-12-14 23:57       ` Jeff King
@ 2015-12-15  0:09         ` Stefan Beller
  2015-12-15  0:16           ` Jeff King
  2015-12-15  1:40         ` Junio C Hamano
  1 sibling, 1 reply; 32+ messages in thread
From: Stefan Beller @ 2015-12-15  0:09 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Eric Sunshine, Git List, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Johannes Sixt

On Mon, Dec 14, 2015 at 3:57 PM, Jeff King <peff@peff.net> wrote:
> On Mon, Dec 14, 2015 at 03:15:29PM -0800, Junio C Hamano wrote:
>
>> -- >8 --
>> From: Stefan Beller <sbeller@google.com>
>> Date: Mon, 14 Dec 2015 11:37:13 -0800
>> Subject: [PATCH] xread_nonblock: add functionality to read from fds without blocking
>>
>> Provide a wrapper to read(), similar to xread(), that restarts on
>> EINTR but not EAGAIN (or EWOULDBLOCK). This enables the caller to
>> handle polling itself, possibly polling multiple sockets or performing
>> some other action.
>
> This makes me wonder why we restart xread() on EAGAIN in the first
> place.
>
> On EINTR, sure; signals can come and we want to keep going. But if we do
> not have non-blocking descriptors, it should never happen, right?

right.

>
> Are we trying to protect ourselves against somebody _else_ giving us a
> non-blocking descriptor? In that case we'll quietly spin and waste CPU.
> Which isn't great, but perhaps better than returning an error.

Yes.
This sounds like a good reasoning for 2/8 (add in the poll, so we are
more polite), though.

This patch is a prerequisite for 4/8, which explicitly doesn't want to loop
but to return quickly. Maybe we could even drop this patch and just use
`read` as-is in 4/8. Looking at it from a higher-level perspective, we don't
care whether strbuf_read_nonblocking returns after a signal without retrying.

>
> -Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] xread_nonblock: add functionality to read from fds without blocking
  2015-12-15  0:09         ` Stefan Beller
@ 2015-12-15  0:16           ` Jeff King
  2015-12-15  0:25             ` Stefan Beller
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2015-12-15  0:16 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Junio C Hamano, Eric Sunshine, Git List, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Johannes Sixt

On Mon, Dec 14, 2015 at 04:09:01PM -0800, Stefan Beller wrote:

> > Are we trying to protect ourselves against somebody _else_ giving us a
> > non-blocking descriptor? In that case we'll quietly spin and waste CPU.
> > Which isn't great, but perhaps better than returning an error.
> 
> Yes.
> This sounds like a good reasoning for 2/8 (add in the poll, so we are
> more polite), though.
> 
> This patch is a prerequisite for 4/8, which explicitly doesn't want to loop
> but to return quickly. Maybe we could even drop this patch and just use
> `read` as-is in 4/8. Looking at it from a higher-level perspective, we don't
> care whether strbuf_read_nonblocking returns after a signal without retrying.

I was actually thinking about simply teaching xread() not to worry about
EAGAIN, but that would probably be a regression in the "whoops, somebody
gave us a non-blocking stdin!" case.

But yeah, I think simply using xread() as-is in strbuf_read_once (or
whatever it ends up being called) is OK.  I think all of the
_intentionally_ non-blocking descriptors are gone in this iteration,
right? So the caller of strbuf_read_once expects to have to call poll()
or to block. And that's what xread() does.
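
For concreteness, strbuf_read_once() sitting directly on top of xread()
could then be as small as this (just a sketch of the idea, not a patch):

	/* in strbuf.c */
	ssize_t strbuf_read_once(struct strbuf *sb, int fd, size_t hint)
	{
		ssize_t cnt;

		strbuf_grow(sb, hint ? hint : 8192);
		cnt = xread(fd, sb->buf + sb->len, sb->alloc - sb->len - 1);
		if (cnt > 0)
			strbuf_setlen(sb, sb->len + cnt);
		return cnt;
	}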

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] xread_nonblock: add functionality to read from fds without blocking
  2015-12-15  0:16           ` Jeff King
@ 2015-12-15  0:25             ` Stefan Beller
  2015-12-15  1:44               ` Jeff King
  2015-12-15  6:12               ` Johannes Sixt
  0 siblings, 2 replies; 32+ messages in thread
From: Stefan Beller @ 2015-12-15  0:25 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Eric Sunshine, Git List, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Johannes Sixt

On Mon, Dec 14, 2015 at 4:16 PM, Jeff King <peff@peff.net> wrote:
> On Mon, Dec 14, 2015 at 04:09:01PM -0800, Stefan Beller wrote:
>
>> > Are we trying to protect ourselves against somebody _else_ giving us a
>> > non-blocking descriptor? In that case we'll quietly spin and waste CPU.
>> > Which isn't great, but perhaps better than returning an error.
>>
>> Yes.
>> This sounds like a good reasoning for 2/8 (add in the poll, so we are
>> more polite), though.
>>
>> This patch is a prerequisite for 4/8, which explicitly doesn't want to loop
>> but to return quickly. Maybe we could even drop this patch and just use
>> `read` as-is in 4/8. Looking at it from a higher-level perspective, we don't
>> care whether strbuf_read_nonblocking returns after a signal without retrying.
>
> I was actually thinking about simply teaching xread() not to worry about
> EAGAIN, but that would probably be a regression in the "whoops, somebody
> gave us a non-blocking stdin!" case.
>
> But yeah, I think simply using xread() as-is in strbuf_read_once (or
> whatever it ends up being called) is OK.

I was actually thinking about using {without-x}read, just the plain system call.
Do we have any issues with that for wrapping purposes for Windows?
There is no technical reason to prefer xread over read in strbuf_read_once as
* we are not non-blocking (so EAGAIN || EWOULDBLOCK doesn't apply)
* we don't care about EINTR and retrying upon that signal
* we would not care about MAX_IO_SIZE most likely (that's actually one
of the reasons I could technically think of to prefer xread)


> I think all of the
> _intentionally_ non-blocking descriptors are gone in this iteration,
> right?

I think we don't even have unintentional non-blocking fds here now, as we
create all the fds ourselves and never set the O_NONBLOCK flag.

> So the caller of strbuf_read_once expects to have to call poll()
> or to block. And that's what xread() does.

ok, I'll drop this patch and use xread there.

>
> -Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] xread_nonblock: add functionality to read from fds without blocking
  2015-12-14 23:57       ` Jeff King
  2015-12-15  0:09         ` Stefan Beller
@ 2015-12-15  1:40         ` Junio C Hamano
  1 sibling, 0 replies; 32+ messages in thread
From: Junio C Hamano @ 2015-12-15  1:40 UTC (permalink / raw)
  To: Jeff King
  Cc: Eric Sunshine, Stefan Beller, Git List, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Johannes Sixt

Jeff King <peff@peff.net> writes:

> Are we trying to protect ourselves against somebody _else_ giving us a
> non-blocking descriptor? In that case we'll quietly spin and waste CPU.
> Which isn't great, but perhaps better than returning an error.

I think I said it earlier in a message upthread.

> Ahh, there is a difference if the file descriptor the caller feeds
> strbuf_read_once() happens to be marked as nonblock.  read_once()
> wants to return without doing the poll() in such a case.  Even
> though this series would not introduce any use of a nonblocking file
> descriptor, as a general API function, [4/8] must be prepared for
> such a future caller, hence [3/8] is needed.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] xread_nonblock: add functionality to read from fds without blocking
  2015-12-15  0:25             ` Stefan Beller
@ 2015-12-15  1:44               ` Jeff King
  2015-12-15  6:12               ` Johannes Sixt
  1 sibling, 0 replies; 32+ messages in thread
From: Jeff King @ 2015-12-15  1:44 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Junio C Hamano, Eric Sunshine, Git List, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann, Johannes Sixt

On Mon, Dec 14, 2015 at 04:25:18PM -0800, Stefan Beller wrote:

> > But yeah, I think simply using xread() as-is in strbuf_read_once (or
> > whatever it ends up being called) is OK.
> 
> I was actually thinking about using {without-x}read, just the plain system call.
> Do we have any issues with that for wrapping purposes for Windows?
> There is no technical reason to prefer xread over read in strbuf_read_once as
> * we are not non-blocking (so EAGAIN || EWOULDBLOCK doesn't apply)
> * we don't care about EINTR and retrying upon that signal
> * we would not care about MAX_IO_SIZE most likely (that's actually one
> of the reasons I could technically think of to prefer xread)

I think you do still need to care about EINTR, or at least not barfing
if read() returns -1. If I understand correctly, you want to do
something like:

  while (1) {
	poll(some_fds);
	for (i = 0; i < nr_fds; i++) {
		if (some_fds[i].revents & POLLIN) {
			int r = strbuf_read_once(buf[i], some_fds[i]);
			/* ??? what do we do with r? */
		}
	}
  }

If we get EINTR from that read, it's OK for us to loop back to the
poll() and go again.

But if we get a true error in "r", we'd want to know, right? That means
we must distinguish between EINTR and "real" errors (like EIO or
something). We can do that here, but I think it's just as easy to do it
inside of strbuf_read_once (by calling xread() there). It's OK not to
jump back to the poll(), because we know the data that triggered the
POLLIN is still waiting for us to read it.

And we are fine with EAGAIN, too. We don't expect the sockets to be
non-blocking in the first place, but even if they were, we know we just
got POLLIN, so there should be data waiting.
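
Spelled out, the "??? what do we do with r?" part of the loop above
might then become something like this (sketch; mark_for_finish() is a
made-up helper standing in for "remember to call finish_command() on
this child"):

	if (some_fds[i].revents & POLLIN) {
		ssize_t r = strbuf_read_once(&buf[i], some_fds[i].fd, 0);

		if (r < 0 && errno == EAGAIN)
			; /* nothing there after all; just poll again */
		else if (r < 0)
			die_errno("read from child %d failed", i);
		else if (r == 0)
			mark_for_finish(i); /* stderr closed; child is exiting */
	}

And with xread() inside strbuf_read_once(), the EINTR case never even
surfaces here.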

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] xread_nonblock: add functionality to read from fds without blocking
  2015-12-15  0:25             ` Stefan Beller
  2015-12-15  1:44               ` Jeff King
@ 2015-12-15  6:12               ` Johannes Sixt
  1 sibling, 0 replies; 32+ messages in thread
From: Johannes Sixt @ 2015-12-15  6:12 UTC (permalink / raw)
  To: Stefan Beller, Jeff King
  Cc: Junio C Hamano, Eric Sunshine, Git List, Jonathan Nieder,
	Johannes Schindelin, Jens Lehmann

Am 15.12.2015 um 01:25 schrieb Stefan Beller:
> I was actually thinking about using {without-x}read, just the plain system call.
> Do we have any issues with that for wrapping purposes for Windows?

xread() limits the size being read to MAX_IO_SIZE, which is needed on 
some systems (I think that some Windows configurations did fall into 
this category).

-- Hannes

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2015-12-15  6:13 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-14 19:37 [PATCH 0/8] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Stefan Beller
2015-12-14 19:37 ` [PATCH 1/8] submodule.c: write "Fetching submodule <foo>" to stderr Stefan Beller
2015-12-14 19:37 ` [PATCH 2/8] xread: poll on non blocking fds Stefan Beller
2015-12-14 22:58   ` Eric Sunshine
2015-12-14 23:07     ` Stefan Beller
2015-12-14 23:11     ` Junio C Hamano
2015-12-14 23:14       ` Stefan Beller
2015-12-14 19:37 ` [PATCH 3/8] xread_nonblock: add functionality to read from fds without blocking Stefan Beller
2015-12-14 20:59   ` Junio C Hamano
2015-12-14 23:03   ` Eric Sunshine
2015-12-14 23:05     ` Eric Sunshine
2015-12-14 23:15     ` Junio C Hamano
2015-12-14 23:57       ` Jeff King
2015-12-15  0:09         ` Stefan Beller
2015-12-15  0:16           ` Jeff King
2015-12-15  0:25             ` Stefan Beller
2015-12-15  1:44               ` Jeff King
2015-12-15  6:12               ` Johannes Sixt
2015-12-15  1:40         ` Junio C Hamano
2015-12-14 19:37 ` [PATCH 4/8] strbuf: add strbuf_read_once to read " Stefan Beller
2015-12-14 23:16   ` Eric Sunshine
2015-12-14 23:27     ` Stefan Beller
2015-12-14 19:37 ` [PATCH 5/8] sigchain: add command to pop all common signals Stefan Beller
2015-12-14 19:37 ` [PATCH 6/8] run-command: add an asynchronous parallel child processor Stefan Beller
2015-12-14 20:39   ` Johannes Sixt
2015-12-14 21:40     ` Stefan Beller
2015-12-14 19:37 ` [PATCH 7/8] fetch_populated_submodules: use new parallel job processing Stefan Beller
2015-12-14 19:37 ` [PATCH 8/8] submodules: allow parallel fetching, add tests and documentation Stefan Beller
2015-12-14 20:40 ` [PATCH 0/8] Rerolling sb/submodule-parallel-fetch for the time after 2.7 Johannes Sixt
2015-12-14 21:00   ` Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2015-09-28 23:13 [PATCH 0/8] fetch submodules in parallel Stefan Beller
2015-09-28 23:14 ` [PATCH 5/8] sigchain: add command to pop all common signals Stefan Beller
2015-09-30  5:23   ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).