From: Stefan Beller <sbeller@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
Jeff King <peff@peff.net>, Jonathan Nieder <jrnieder@gmail.com>,
Johannes Schindelin <johannes.schindelin@gmail.com>,
Jens Lehmann <Jens.Lehmann@web.de>,
Vitali Lovich <vlovich@gmail.com>
Subject: Re: [PATCH 03/10] run-command: add an asynchronous parallel child processor
Date: Fri, 18 Sep 2015 09:36:30 -0700 [thread overview]
Message-ID: <CAGZ79kZd8nZoj3Rk+ZeSfCxhH0vqp4Oaoh2+MYvFa98rzfaNxw@mail.gmail.com> (raw)
In-Reply-To: <xmqqk2ro7czy.fsf@gitster.mtv.corp.google.com>
On Thu, Sep 17, 2015 at 6:05 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> On Thu, Sep 17, 2015 at 2:44 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>>> Hmm, you are relying on the fact that a valid pid can never be 0, so
>>> you can just use pp->children[i].child.pid to see if a "slot" is
>>> occupied without even using pp->slots[] (or pp->children[i].in_use).
>>
>> We could either use the pid as accessed via children[i].process.pid
>> or we could rely on the children[i].process.err != -1 as start_process
>> will set it to an actual fd, and when it's done we reset it to -1.
>>
>> I am unsure if this makes it less readable though (all your other
>> suggestions improve readability a lot!)
>
> Sorry, the above was not a suggestion but merely an observation with
> a bit of thinking aloud mixed in. I should have marked it as such
> more clearly.
Ok, I see. No harm done; I did not take that as a suggestion.
>
>>> ... and then picks a new owner of the output channel.
>>>
>>> Up to this point it looks sensible.
>>>
>>>> + fputs(pp->err[pp->foreground_child].buf, stderr);
>>>> + strbuf_reset(&pp->err[pp->foreground_child]);
>>>
>>> I do not think these two lines need to be here, especially if you
>>> follow the above advice of separating buffering and draining.
>>
>> They output the buffer if the next selected child is still running.
>> It would be output eventually anyway, but only after the next batch of
>> processes is started and poll() is done. Maybe that's what we want
>> (starting new processes earlier is better than outputting earlier, as
>> we're talking microseconds here).
>
> I am not talking about microseconds but refraining from doing the
> same thing in multiple places.
ok
>
>>> But calling start_new() unconditionally feels sloppy. It should at
>>> least be something like
>>>
>>> if (pp.nr_processes < pp.max_processes &&
>>> !pp.all_task_started)
>>> start_new_process()
>>>
>>> no?
>>
>> That's the exact condition we have in start_new_process
>> so I don't want to repeat it here?
>
> You are advocating to have a function whose definition is "this may
> or may not start a new process and the caller should not care", and
> then make the caller keep calling it, knowing/hoping that the caller
> does not care if we spawn a new thing or not.
>
> I somehow find it a highly questionable design taste to base the
> decision on "don't want to repeat it here".
>
> Stepping back and thinking about it, what is suggested is more
> explicit in the caller. "If we know we need a new worker and we can
> have a new worker, then start it." is the logic in the caller in the
> suggested structure, and we would have a well defined helper whose
> sole task is to start a new worker to be called from the caller.
> The helper may want to check if the request to start a new one makes
> sense (e.g. if slots[] are all full, it may even want to return an
> error), but the checks are primarily for error checking (i.e. "we
> can have max N processes, so make sure we do not exceed that limit"),
> not a semantic one (i.e. the caller _could_ choose to spawn less
> than that max when there is a good reason to do so).
I would not put the decision to spawn fewer processes in the caller either.
My understanding is to have one high level function which outlines the
algorithm, like:

    loop:
        spawn_children_as_necessary
        make_sure_pipes_don't_overflow
        offer_early_output
        take_care_of_finished_children

such that the main function reads more like a bullet point list explaining
how it roughly works. Each helper function should come up with its own
strategy, so spawn_children_as_necessary could be

    spawn_children_as_necessary:
        while less than n children and there are more tasks:
            spawn_one_child

for now. Later we can add more logic there if necessary. But I'd prefer to
have these decisions put into the helpers.

Having written this, I think I'll separate the function that drains the
pipes from the early output, and split the helper that starts children into
two: one to make the decision and one to actually start just one child.
Thread overview: 37+ messages
2015-09-17 1:38 [PATCH 00/10] fetch submodules in parallel and a preview on parallel "submodule update" Stefan Beller
2015-09-17 1:38 ` [PATCH 01/10] strbuf: Add strbuf_read_noblock Stefan Beller
2015-09-17 16:13 ` Junio C Hamano
2015-09-17 16:30 ` Jeff King
2015-09-17 16:44 ` Junio C Hamano
2015-09-17 16:51 ` Stefan Beller
2015-09-17 16:57 ` Jeff King
2015-09-17 16:58 ` Junio C Hamano
2015-09-17 17:13 ` Jeff King
2015-09-17 17:26 ` Stefan Beller
2015-09-17 17:35 ` Jeff King
2015-09-17 17:45 ` Stefan Beller
2015-09-17 17:50 ` Jeff King
2015-09-17 17:53 ` Stefan Beller
2015-09-17 17:57 ` Junio C Hamano
2015-09-17 17:54 ` Junio C Hamano
2015-09-17 18:02 ` Jeff King
2015-09-17 17:20 ` Stefan Beller
2015-09-17 1:39 ` [PATCH 02/10] run-command: factor out return value computation Stefan Beller
2015-09-17 10:33 ` Jeff King
2015-09-17 1:39 ` [PATCH 03/10] run-command: add an asynchronous parallel child processor Stefan Beller
2015-09-17 21:44 ` Junio C Hamano
2015-09-17 23:19 ` Stefan Beller
2015-09-18 1:05 ` Junio C Hamano
2015-09-18 16:36 ` Stefan Beller [this message]
2015-09-17 1:39 ` [PATCH 04/10] fetch_populated_submodules: use new parallel job processing Stefan Beller
2015-09-17 1:39 ` [PATCH 05/10] submodules: Allow parallel fetching, add tests and documentation Stefan Beller
2015-09-17 1:39 ` [PATCH 06/10] git submodule update: Redirect any output to stderr Stefan Beller
2015-09-17 20:31 ` Eric Sunshine
2015-09-17 20:38 ` Stefan Beller
2015-09-17 1:39 ` [PATCH 07/10] git submodule update: pass --prefix only with a non empty prefix Stefan Beller
2015-09-17 20:33 ` Eric Sunshine
2015-09-17 1:39 ` [PATCH 08/10] git submodule update: cmd_update_recursive Stefan Beller
2015-09-17 1:39 ` [PATCH 09/10] " Stefan Beller
2015-09-17 20:37 ` Eric Sunshine
2015-09-17 1:39 ` [PATCH 10/10] git submodule update: cmd_update_fetch Stefan Beller
2015-09-17 17:06 ` [PATCH 00/10] fetch submodules in parallel and a preview on parallel "submodule update" Jacob Keller