From: Junio C Hamano <gitster@pobox.com>
To: Stefan Beller <sbeller@google.com>
Cc: Jeff King <peff@peff.net>,
"git\@vger.kernel.org" <git@vger.kernel.org>,
Jonathan Nieder <jrnieder@gmail.com>
Subject: Re: [PATCH 3/5] submodule: helper to run foreach in parallel
Date: Tue, 25 Aug 2015 15:23:18 -0700
Message-ID: <xmqqy4gzvwh5.fsf@gitster.dls.corp.google.com>
In-Reply-To: <CAGZ79kb2N_5_tJv-GURL9_ESFs=pHp=L-Mujn3Df_+-T74_9Dg@mail.gmail.com> (Stefan Beller's message of "Tue, 25 Aug 2015 14:42:25 -0700")
Stefan Beller <sbeller@google.com> writes:
>>> +	while (1) {
>>> +		ssize_t len = xread(cp->err, buf, sizeof(buf));
>>> +		if (len < 0)
>>> +			die("Read from child failed");
>>> +		else if (len == 0)
>>> +			break;
>>> +		else {
>>> +			strbuf_add(&out, buf, len);
>>> +		}
>>
>> ... and the whole thing is accumulated in core???
>
> The pipes have a limit, so we need to empty them to prevent back-pressure?
Of course. But that does not lead to "we hold everything in core".
This side could choose to emit (under protection of args->mutex)
early, e.g. after reading a line, emit it to our standard output (or
our standard error).
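A rough sketch of what "emit early" could look like, to make the suggestion concrete. This is not code from the patch: relay_lines(), emit_locked() and output_lock are made-up names, a pthread mutex stands in for args->mutex (which is a semaphore in the patch), and plain read() with an EINTR retry stands in for xread(). The point is only that the reader needs to hold at most one buffer, not the child's whole output.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for args->mutex in the patch. */
static pthread_mutex_t output_lock = PTHREAD_MUTEX_INITIALIZER;

/* Write one chunk to `out` while holding the lock, so that chunks
 * coming from different jobs are never interleaved mid-line. */
static void emit_locked(FILE *out, const char *buf, size_t len)
{
	pthread_mutex_lock(&output_lock);
	fwrite(buf, 1, len, out);
	fflush(out);
	pthread_mutex_unlock(&output_lock);
}

/*
 * Drain `fd` and pass its contents to `out` one line at a time.
 * Returns the number of lines relayed (a final line without a
 * trailing newline counts as one), or -1 on a read error.
 */
static int relay_lines(int fd, FILE *out)
{
	char buf[4096];
	size_t used = 0;
	int lines = 0;

	assert(out != NULL);
	for (;;) {
		char *start = buf, *nl;
		ssize_t len = read(fd, buf + used, sizeof(buf) - used);

		if (len < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (len == 0)
			break;
		used += (size_t)len;

		/* Emit every complete line that has arrived so far. */
		while ((nl = memchr(start, '\n', used - (size_t)(start - buf)))) {
			emit_locked(out, start, (size_t)(nl - start) + 1);
			lines++;
			start = nl + 1;
		}

		/* Keep the incomplete tail for the next round. */
		used -= (size_t)(start - buf);
		memmove(buf, start, used);

		/* A line longer than the buffer is flushed as it is. */
		if (used == sizeof(buf)) {
			emit_locked(out, buf, used);
			used = 0;
		}
	}
	if (used) {
		emit_locked(out, buf, used);
		lines++;
	}
	return lines;
}
```

Each job would call relay_lines() on its child's pipe. The first line shows up as soon as the child produces it, and the pipe is still drained, so there is no back-pressure.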
> And because we want to have the output of one task at a time, we need to
> save it up until we can put out the whole output, no?
I do not necessarily agree, and I think I said that already:
http://thread.gmane.org/gmane.comp.version-control.git/276273/focus=276321
>>> +	}
>>> +	if (finish_command(cp))
>>> +		die("command died with error");
>>> +
>>> +	sem_wait(args->mutex);
>>> +	fputs(out.buf, stderr);
>>> +	sem_post(args->mutex);
>>
>> ... and emitted to standard error?
>>
>> I would have expected that the standard error would be left alone
>
> `git fetch` which may be a good candidate for such an operation
> provides progress on stderr, and we don't want to intermingle
> 2 different submodule fetch progress displays
> ("I need to work offline for a bit, so let me get all of the latest stuff,
> so I'll run `git submodule foreach -j 16 -- git fetch --all`" though ideally
> we want to have `git fetch --recurse-submodules -j16` instead )
>
>> (i.e. letting warnings from multiple jobs be mixed together
>> simply because everybody writes to the same file descriptor), while
>> the standard output would be line-buffered, perhaps captured by the
>> above loop and then emitted under mutex, or something.
>
>>
>> I think I said this earlier, but latency to the first output counts
>
> "to the first stderr"
> in this case?
I didn't mean "output==the standard output stream". As I said in
$gmane/276321, an early output, as an indication that we are doing
something, is important.
> Why would we want to unplug the task queue from somewhere else?
When you have a dispatcher more intelligent than a stupid FIFO, I
would imagine that you would want to be able to do this pattern,
especially when coming up with a task (not performing a task) takes
a non-trivial amount of work:
    prepare a task queue and have N threads waiting on it;
    plug the queue, i.e. tell the threads not to start picking
      tasks out of it yet;
    run a large enough loop to fill the queue to a reasonable size
      while keeping the threads waiting;
    unplug the queue. Now the threads can pick tasks from the
      queue, but they have many to choose from, and a dispatcher
      that can do better than a simple FIFO can take advantage of that;
    keep filling the queue with more tasks, if necessary;
    and finally, wait for everything to finish.
Without a "plug/unplug" interface, you _could_ do the above by doing
something stupid like
    prepare a task queue and have N threads waiting on it;
    loop to find a sufficient number of tasks, but do not put them
      in the task queue, as the FIFO will eat them one by one;
      instead hold onto them in a custom data structure that is
      outside the task queue system;
    run a tight and quick loop to move them to the task queue;
    keep finding more tasks and feed them to the task queue;
    and finally, wait for everything to finish.
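For illustration only, the plug/unplug variant could be sketched along these lines. None of this is the API from patch 2/5: struct task_queue, queue_plug(), queue_unplug(), queue_pick() and the "priority" field are all hypothetical, and "highest priority first" merely stands in for whatever a smarter dispatcher would want to do.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct task {
	int priority;		/* hypothetical: higher is picked first */
	struct task *next;
};

struct task_queue {
	pthread_mutex_t lock;
	pthread_cond_t wake;
	struct task *head;
	int plugged;		/* workers must not pick while this is set */
	int closed;		/* no more tasks will be added */
};

static void queue_init(struct task_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->wake, NULL);
	q->head = NULL;
	q->plugged = 0;
	q->closed = 0;
}

/* Set the plug (on != 0) or remove it; removing wakes the workers. */
static void queue_set_plug(struct task_queue *q, int on)
{
	pthread_mutex_lock(&q->lock);
	q->plugged = on;
	if (!on)
		pthread_cond_broadcast(&q->wake);
	pthread_mutex_unlock(&q->lock);
}

static void queue_plug(struct task_queue *q)   { queue_set_plug(q, 1); }
static void queue_unplug(struct task_queue *q) { queue_set_plug(q, 0); }

/* Declare that no more tasks are coming; this also removes the plug. */
static void queue_close(struct task_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->closed = 1;
	q->plugged = 0;
	pthread_cond_broadcast(&q->wake);
	pthread_mutex_unlock(&q->lock);
}

static void queue_add(struct task_queue *q, struct task *t)
{
	pthread_mutex_lock(&q->lock);
	t->next = q->head;
	q->head = t;
	pthread_cond_signal(&q->wake);
	pthread_mutex_unlock(&q->lock);
}

/*
 * Called by a worker. Blocks while the queue is plugged or empty,
 * then hands out the highest-priority task rather than the oldest
 * one. Returns NULL once the queue is closed and drained.
 */
static struct task *queue_pick(struct task_queue *q)
{
	struct task **pp, **best, *t = NULL;

	pthread_mutex_lock(&q->lock);
	while (q->plugged || (!q->head && !q->closed))
		pthread_cond_wait(&q->wake, &q->lock);
	if (q->head) {
		best = &q->head;
		for (pp = &q->head; *pp; pp = &(*pp)->next)
			if ((*pp)->priority > (*best)->priority)
				best = pp;
		t = *best;
		*best = t->next;	/* unlink the chosen task */
	}
	pthread_mutex_unlock(&q->lock);
	return t;
}
```

The producer would call queue_plug(), fill the queue, call queue_unplug(), keep adding if necessary, and finally call queue_close(), while each worker thread loops on queue_pick() until it gets NULL. Because nothing is picked while the plug is in, the dispatcher sees the whole batch when it makes its first choice.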