Re: [PATCH 03/10] run-command: add an asynchronous parallel child processor

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: Stefan Beller <sbeller@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	Jeff King <peff@peff.net>, Jonathan Nieder <jrnieder@gmail.com>,
	Johannes Schindelin <johannes.schindelin@gmail.com>,
	Jens Lehmann <Jens.Lehmann@web.de>,
	Vitali Lovich <vlovich@gmail.com>
Subject: Re: [PATCH 03/10] run-command: add an asynchronous parallel child processor
Date: Thu, 17 Sep 2015 16:19:35 -0700	[thread overview]
Message-ID: <CAGZ79kbgBOkiii6B3BFwB778DM=mNcKyOoS-KQGuLf=At7hG0g@mail.gmail.com> (raw)
In-Reply-To: <xmqqbnd0iuv6.fsf@gitster.mtv.corp.google.com>

On Thu, Sep 17, 2015 at 2:44 PM, Junio C Hamano <gitster@pobox.com> wrote:

> Hmm, you are relying on the fact that a valid pid can never be 0, so
> you can just use pp->children[i].child.pid to see if a "slot" is
> occupied without even using pp->slots[] (or pp->children[i].in_use).

We could either use the pid as accessed via children[i].process.pid
or we could rely on the children[i].process.err != -1 as start_process
will set it to an actual fd, and when it's done we reset it to -1.

I am unsure if this make it less readable though (all your other
suggestions improve readability a lot!)

>
>> +                     if (i == pp->max_number_processes)
>> +                             /*
>> +                              * waitpid returned another process id
>> +                              * which we are not waiting for.
>> +                              */
>> +                             return;
>> +             }
>> +             strbuf_read_noblock(&pp->err[i], pp->children[i].err, 0);
>
> This is to read leftover output?
>
> As discussed elsewhere, read_nonblock() will have to have "read
> some, not necessarily to the end" semantics to serve the caller in
> run_processes_parallel_buffer_stderr(), so you'd need a loop around
> it here to read until you see EOF.
>
> Or you may be able to just call strbuf_read() and the function may
> do the right thing to read things through to the EOF.  It depends on
> how you redo the patch [2/10].

strbuf_read sounds like the logical choice here (and the redo of 2/10
supports that).


>> +                      * NEEDSWORK:
>> +                      * For now we pick it randomly by doing a round
>> +                      * robin. Later we may want to pick the one with
>> +                      * the most output or the longest or shortest
>> +                      * running process time.
>> +                      */
>> +                     for (i = 0; i < n; i++)
>> +                             if (pp->slots[(pp->foreground_child + i) % n])
>> +                                     break;
>> +                     pp->foreground_child = (pp->foreground_child + i) % n;
>
> ... and then picks a new owner of the output channel.
>
> Up to this point it looks sensible.
>
>> +                     fputs(pp->err[pp->foreground_child].buf, stderr);
>> +                     strbuf_reset(&pp->err[pp->foreground_child]);
>
> I do not think these two lines need to be here, especially if you
> follow the above advice of separating buffering and draining.

They are outputting the buffer, if the next selected child is still running.
I mean it would output eventually anyway, but it would only output after
the next start of processes and poll() is done. Yeah maybe that's what we
want (starting new processes earlier is better than outputting earlier as we're
talking microseconds here).

>
>> +int run_processes_parallel(int n, void *data,
>> +                        get_next_task fn,
>> +                        handle_child_starting_failure fn_err,
>> +                        handle_child_return_value fn_exit)
>> +{
>> +     struct parallel_processes pp;
>> +     run_processes_parallel_init(&pp, n, data, fn, fn_err, fn_exit);
>> +
>> +     while (!pp.all_tasks_started || pp.nr_processes > 0) {
>
> The former is true as long as more_task() says there may be more.
> The latter is true as long as we have something already running.
>
> In either case, we should keep collecting and spawning as needed.
>
>> +             run_processes_parallel_start_new(&pp);
>
> But calling start_new() unconditionally feels sloppy.  It should at
> least be something like
>
>         if (pp.nr_processes < pp.max_processes &&
>             !pp.all_task_started)
>                 start_new_process()
>
> no?

That's the exact condition we have in start_new_process
so I don't want to repeat it here? We could move the while
loop outside of it though. That looks better, done.

next prev parent reply	other threads:[~2015-09-17 23:19 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-17  1:38 [PATCH 00/10] fetch submodules in parallel and a preview on parallel "submodule update" Stefan Beller
2015-09-17  1:38 ` [PATCH 01/10] strbuf: Add strbuf_read_noblock Stefan Beller
2015-09-17 16:13   ` Junio C Hamano
2015-09-17 16:30     ` Jeff King
2015-09-17 16:44       ` Junio C Hamano
2015-09-17 16:51         ` Stefan Beller
2015-09-17 16:57         ` Jeff King
2015-09-17 16:58       ` Junio C Hamano
2015-09-17 17:13         ` Jeff King
2015-09-17 17:26           ` Stefan Beller
2015-09-17 17:35             ` Jeff King
2015-09-17 17:45               ` Stefan Beller
2015-09-17 17:50                 ` Jeff King
2015-09-17 17:53                   ` Stefan Beller
2015-09-17 17:57               ` Junio C Hamano
2015-09-17 17:54           ` Junio C Hamano
2015-09-17 18:02             ` Jeff King
2015-09-17 17:20         ` Stefan Beller
2015-09-17  1:39 ` [PATCH 02/10] run-command: factor out return value computation Stefan Beller
2015-09-17 10:33   ` Jeff King
2015-09-17  1:39 ` [PATCH 03/10] run-command: add an asynchronous parallel child processor Stefan Beller
2015-09-17 21:44   ` Junio C Hamano
2015-09-17 23:19     ` Stefan Beller [this message]
2015-09-18  1:05       ` Junio C Hamano
2015-09-18 16:36         ` Stefan Beller
2015-09-17  1:39 ` [PATCH 04/10] fetch_populated_submodules: use new parallel job processing Stefan Beller
2015-09-17  1:39 ` [PATCH 05/10] submodules: Allow parallel fetching, add tests and documentation Stefan Beller
2015-09-17  1:39 ` [PATCH 06/10] git submodule update: Redirect any output to stderr Stefan Beller
2015-09-17 20:31   ` Eric Sunshine
2015-09-17 20:38     ` Stefan Beller
2015-09-17  1:39 ` [PATCH 07/10] git submodule update: pass --prefix only with a non empty prefix Stefan Beller
2015-09-17 20:33   ` Eric Sunshine
2015-09-17  1:39 ` [PATCH 08/10] git submodule update: cmd_update_recursive Stefan Beller
2015-09-17  1:39 ` [PATCH 09/10] " Stefan Beller
2015-09-17 20:37   ` Eric Sunshine
2015-09-17  1:39 ` [PATCH 10/10] git submodule update: cmd_update_fetch Stefan Beller
2015-09-17 17:06 ` [PATCH 00/10] fetch submodules in parallel and a preview on parallel "submodule update" Jacob Keller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGZ79kbgBOkiii6B3BFwB778DM=mNcKyOoS-KQGuLf=At7hG0g@mail.gmail.com' \
    --to=sbeller@google.com \
    --cc=Jens.Lehmann@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmail.com \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    --cc=vlovich@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).