git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Calvin Wan <calvinwan@google.com>
Cc: git@vger.kernel.org, emilyshaffer@google.com
Subject: Re: [PATCH 1/4] run-command: add pipe_output to run_processes_parallel
Date: Fri, 23 Sep 2022 11:58:23 -0700	[thread overview]
Message-ID: <xmqqy1u9uddc.fsf@gitster.g> (raw)
In-Reply-To: <20220922232947.631309-2-calvinwan@google.com> (Calvin Wan's message of "Thu, 22 Sep 2022 23:29:44 +0000")

Calvin Wan <calvinwan@google.com> writes:

> run_processes_parallel periodically collects output from its child
> processes, prints it, and then resets the buffers for each child.
> Add run_processes_parallel_pipe_output variable so output can be
> collected and fed to task_finished. When set, the function referenced
> by task_finished should parse the output of each child process.

I am puzzled.

 * Why are we configuring an API behaviour via a global variable in
   21st century?

 * The name "task_finished" is mentioned, but it is unclear what it
   is.  Is it one of the parameters to run_process_parallel()?

 * Is the effect of the new feature that task_finished callback is
   called with the output, in addition to the normal output?  I am
   not sure why it is called "pipe".  The task_finished callback may
   be free to fork a child and send the received output from the
   task to that child over the pipe, but that is what a client code
   could do and is inappropriate to base the name of the mechanism,
   isn't it?

> @@ -1770,10 +1771,12 @@ int run_processes_parallel(int n,
>  	int output_timeout = 100;
>  	int spawn_cap = 4;
>  	int ungroup = run_processes_parallel_ungroup;
> +	int pipe_output = run_processes_parallel_pipe_output;
>  	struct parallel_processes pp;
>  
>  	/* unset for the next API user */
>  	run_processes_parallel_ungroup = 0;
> +	run_processes_parallel_pipe_output = 0;
>  
>  	pp_init(&pp, n, get_next_task, start_failure, task_finished, pp_cb,
>  		ungroup);
> @@ -1800,7 +1803,8 @@ int run_processes_parallel(int n,
>  				pp.children[i].state = GIT_CP_WAIT_CLEANUP;
>  		} else {
>  			pp_buffer_stderr(&pp, output_timeout);
> -			pp_output(&pp);
> +			if (!pipe_output)
> +				pp_output(&pp);

So, we do not send the output from the child to the regular output
channel when pipe_output is in effect.  OK.

>  		}
>  		code = pp_collect_finished(&pp);
>  		if (code) {

And no other code changes?  This is quite different from what I
expected from reading the proposed log message.

Am I correct to say that under this new mode, we no longer flush any
output while the child task is running (due to the change in the
above hunk to omit calls to pp_output() during the run) and instead
keep accumulating in the strbuf, until the child task finishes, at
which time pp_collect_finished() will call task_finished callback.

Even though the callback usually consumes the last piece of the
output since the last pp_output() call made during the normal
execution of the run_processes_parallel() loop, because we omitted
these calls, we have full output from the child task accumulated in
the children[].err strbuf.  We may still not output .err for real,
as we may not be the output_owner, in which case we may only append
to .buffered_output member.

I am puzzled simply because, if the above summary is correct, I do
not see how a word "pipe" have a chance to come into the picture.

I can sort of see that in this mode, we would end up buffering the
entire output from each child task into one strbuf each, and can
avoid stalling the child tasks waiting for their turn to see their
output pipes drained.  But is this a reasonable thing to do?  How do
we control the memory consumption to avoid having to spool unbounded
amount of output from child tasks in core, or do we have a good
reason to believe that we do not have to bother?

Thanks.


  parent reply	other threads:[~2022-09-23 18:58 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-22 23:29 [PATCH 0/4] submodule: parallelize status Calvin Wan
2022-09-22 23:29 ` [PATCH 1/4] run-command: add pipe_output to run_processes_parallel Calvin Wan
2022-09-23  7:52   ` Ævar Arnfjörð Bjarmason
2022-09-26 16:59     ` Calvin Wan
2022-09-27 10:52       ` Ævar Arnfjörð Bjarmason
2022-09-23 18:58   ` Junio C Hamano [this message]
2022-09-26 17:31     ` Calvin Wan
2022-09-27  4:45       ` Junio C Hamano
2022-09-27 18:10         ` Calvin Wan
2022-09-27 21:40           ` Junio C Hamano
2022-09-27  9:05       ` Ævar Arnfjörð Bjarmason
2022-09-27 17:55         ` Calvin Wan
2022-09-27 19:34           ` Ævar Arnfjörð Bjarmason
2022-09-27 20:45             ` Calvin Wan
2022-09-28  5:40               ` Ævar Arnfjörð Bjarmason
2022-09-29 20:52                 ` Calvin Wan
2022-09-22 23:29 ` [PATCH 2/4] submodule: move status parsing into function Calvin Wan
2022-09-22 23:29 ` [PATCH 3/4] diff-lib: refactor functions Calvin Wan
2022-09-23 20:36   ` Junio C Hamano
2022-09-26 17:35     ` Calvin Wan
2022-09-22 23:29 ` [PATCH 4/4] diff-lib: parallelize run_diff_files for submodules Calvin Wan
2022-09-23  8:06   ` Ævar Arnfjörð Bjarmason
2022-09-24 20:17     ` Junio C Hamano
2022-09-26 17:50     ` Calvin Wan
2022-09-23 21:44   ` Junio C Hamano
2022-09-26 19:12     ` Calvin Wan
2022-09-25 13:59   ` Phillip Wood
2022-09-26 17:11     ` Junio C Hamano
2022-09-26 19:22     ` Calvin Wan
2022-09-27 18:40   ` Emily Shaffer
2022-09-23 22:56 ` [PATCH 0/4] submodule: parallelize status Junio C Hamano
2022-09-26 16:33   ` Calvin Wan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqy1u9uddc.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=calvinwan@google.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).