Re: [PATCH 03/10] run-command: add an asynchronous parallel child processor

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: Stefan Beller <sbeller@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	Jeff King <peff@peff.net>, Jonathan Nieder <jrnieder@gmail.com>,
	Johannes Schindelin <johannes.schindelin@gmail.com>,
	Jens Lehmann <Jens.Lehmann@web.de>,
	Vitali Lovich <vlovich@gmail.com>
Subject: Re: [PATCH 03/10] run-command: add an asynchronous parallel child processor
Date: Fri, 18 Sep 2015 09:36:30 -0700	[thread overview]
Message-ID: <CAGZ79kZd8nZoj3Rk+ZeSfCxhH0vqp4Oaoh2+MYvFa98rzfaNxw@mail.gmail.com> (raw)
In-Reply-To: <xmqqk2ro7czy.fsf@gitster.mtv.corp.google.com>

On Thu, Sep 17, 2015 at 6:05 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> On Thu, Sep 17, 2015 at 2:44 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>>> Hmm, you are relying on the fact that a valid pid can never be 0, so
>>> you can just use pp->children[i].child.pid to see if a "slot" is
>>> occupied without even using pp->slots[] (or pp->children[i].in_use).
>>
>> We could either use the pid as accessed via children[i].process.pid
>> or we could rely on the children[i].process.err != -1 as start_process
>> will set it to an actual fd, and when it's done we reset it to -1.
>>
>> I am unsure if this make it less readable though (all your other
>> suggestions improve readability a lot!)
>
> Sorry, the above was not a suggestion but merely an observation with
> a bit of thinking aloud mixed in.  I should have marked it as such
> more clearly.

Ok, I see. no harm done, I did not take that as a suggestion.

>
>>> ... and then picks a new owner of the output channel.
>>>
>>> Up to this point it looks sensible.
>>>
>>>> +                     fputs(pp->err[pp->foreground_child].buf, stderr);
>>>> +                     strbuf_reset(&pp->err[pp->foreground_child]);
>>>
>>> I do not think these two lines need to be here, especially if you
>>> follow the above advice of separating buffering and draining.
>>
>> They are outputting the buffer, if the next selected child is still running.
>> I mean it would output eventually anyway, but it would only output after
>> the next start of processes and poll() is done. Yeah maybe that's what we
>> want (starting new processes earlier is better than outputting earlier as we're
>> talking microseconds here).
>
> I am not talking about microseconds but refraining from doing the
> same thing in multiple places.

ok

>
>>> But calling start_new() unconditionally feels sloppy.  It should at
>>> least be something like
>>>
>>>         if (pp.nr_processes < pp.max_processes &&
>>>             !pp.all_task_started)
>>>                 start_new_process()
>>>
>>> no?
>>
>> That's the exact condition we have in start_new_process
>> so I don't want to repeat it here?
>
> You are advocating to have a function whose definition is "this may
> or may not start a new process and the caller should not care", and
> then make the caller keep calling it, knowing/hoping that the caller
> does not care if we spawn a new thing or not.
>
> I somehow find it a highly questionable design taste to base the
> decision on "don't want to repeat it here".
>
> Stepping back and thinking about it, what is suggested is more
> explicit in the caller.  "If we know we need a new worker and we can
> have a new worker, then start it." is the logic in the caller in the
> suggested structure, and we would have a well defined helper whose
> sole task is to start a new worker to be called from the caller.
> The helper may want to check if the request to start a new one makes
> sense (e.g. if slots[] are all full, it may even want to return an
> error), but the checks are primarily for error checking (i.e. "we
> can have max N processes, so make sure we do not exceed that limit"),
> not a semantic one (i.e. the caller _could_ choose to spawn less
> than that max when there is a good reason to do so).

I would not put the decision to spawn fewer processes in the caller either,
My understanding is to have one high level function which outlines the algorithm
like:

    loop:
        spawn_children_as_necessary
        make_sure_pipes_don't_overflow
        offer_early_output
        take_care_of_finished_children

such that the main function reads more like a bullet point list explaining
how it roughly works. Each helper function should come up with its own strategy,
so spawn_children_as_necessary could be

    spawn_children_as_necessary:
        while less than n children and there are more tasks:
            spawn_one_child

for now. Later we can add more logic if necessary there. But I'd prefer to have
these decisions put into the helpers.

Having written this, I think I'll separate the function to drain the pipes and
the early output and separate the helper to start children into two:
one to make the decision and one to actually start just one child.



>
>

next prev parent reply	other threads:[~2015-09-18 16:36 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-17  1:38 [PATCH 00/10] fetch submodules in parallel and a preview on parallel "submodule update" Stefan Beller
2015-09-17  1:38 ` [PATCH 01/10] strbuf: Add strbuf_read_noblock Stefan Beller
2015-09-17 16:13   ` Junio C Hamano
2015-09-17 16:30     ` Jeff King
2015-09-17 16:44       ` Junio C Hamano
2015-09-17 16:51         ` Stefan Beller
2015-09-17 16:57         ` Jeff King
2015-09-17 16:58       ` Junio C Hamano
2015-09-17 17:13         ` Jeff King
2015-09-17 17:26           ` Stefan Beller
2015-09-17 17:35             ` Jeff King
2015-09-17 17:45               ` Stefan Beller
2015-09-17 17:50                 ` Jeff King
2015-09-17 17:53                   ` Stefan Beller
2015-09-17 17:57               ` Junio C Hamano
2015-09-17 17:54           ` Junio C Hamano
2015-09-17 18:02             ` Jeff King
2015-09-17 17:20         ` Stefan Beller
2015-09-17  1:39 ` [PATCH 02/10] run-command: factor out return value computation Stefan Beller
2015-09-17 10:33   ` Jeff King
2015-09-17  1:39 ` [PATCH 03/10] run-command: add an asynchronous parallel child processor Stefan Beller
2015-09-17 21:44   ` Junio C Hamano
2015-09-17 23:19     ` Stefan Beller
2015-09-18  1:05       ` Junio C Hamano
2015-09-18 16:36         ` Stefan Beller [this message]
2015-09-17  1:39 ` [PATCH 04/10] fetch_populated_submodules: use new parallel job processing Stefan Beller
2015-09-17  1:39 ` [PATCH 05/10] submodules: Allow parallel fetching, add tests and documentation Stefan Beller
2015-09-17  1:39 ` [PATCH 06/10] git submodule update: Redirect any output to stderr Stefan Beller
2015-09-17 20:31   ` Eric Sunshine
2015-09-17 20:38     ` Stefan Beller
2015-09-17  1:39 ` [PATCH 07/10] git submodule update: pass --prefix only with a non empty prefix Stefan Beller
2015-09-17 20:33   ` Eric Sunshine
2015-09-17  1:39 ` [PATCH 08/10] git submodule update: cmd_update_recursive Stefan Beller
2015-09-17  1:39 ` [PATCH 09/10] " Stefan Beller
2015-09-17 20:37   ` Eric Sunshine
2015-09-17  1:39 ` [PATCH 10/10] git submodule update: cmd_update_fetch Stefan Beller
2015-09-17 17:06 ` [PATCH 00/10] fetch submodules in parallel and a preview on parallel "submodule update" Jacob Keller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGZ79kZd8nZoj3Rk+ZeSfCxhH0vqp4Oaoh2+MYvFa98rzfaNxw@mail.gmail.com \
    --to=sbeller@google.com \
    --cc=Jens.Lehmann@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmail.com \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    --cc=vlovich@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).