git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	Jonathan Nieder <jrnieder@gmail.com>,
	Heiko Voigt <hvoigt@hvoigt.net>,
	Jens Lehmann <jens.lehmann@web.de>
Subject: Re: [WIP/PATCH 3/3] submodule: helper to run foreach in parallel
Date: Fri, 21 Aug 2015 13:21:32 -0700	[thread overview]
Message-ID: <CAGZ79kYxYawAtATAoTmgG42w7E+2fEPfHek5bpbMgZ32wya5Zw@mail.gmail.com> (raw)
In-Reply-To: <xmqq7foo5tty.fsf@gitster.dls.corp.google.com>

On Fri, Aug 21, 2015 at 12:23 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> +struct submodule_args {
>> +     const char *name;
>> +     const char *path;
>> +     const char *sha1;
>> +     const char *toplevel;
>> +     const char *prefix;
>> +     const char **cmd;
>> +     struct submodule_output *out;
>> +     sem_t *mutex;
>> +};
>
> I do not see what submodule_output looks like in the patch, but I
> guess you are capturing _all_ output from a task and emitting
> everything at the end, after the task is done?

It was a leftover and will be removed. Sorry for wasting your time.
So jrnieder and me had a discussion on how to do get the output right.

And we had 2 designs:

1) The worker thread acquires a mutex to be allowed to write to
  stdout. This makes early output easy for each thread in case of
  error messages. It also requires less resources than 2).
  Also the expected code complexity is lower as the decisions
  are made locally.

2) Have a single handler which deals with all output of all tasks.
  This handler would `select` on a huge number of pipes from
  all the tasks, (so we would require a lot of pipes). And this central
  handler (which would be submodule_output in this case) would
  take care of having no intermingled racy output from tasks, with
  prefixes and all bells and whistles. This handler could also
  give a progress meter (300 out of 500 submodules updated already)
  for all threads, or when the last task is running switch to piping
  the output of that task to stdout in real time, so you'd get progress
  of the last task as you're already used to.

The outcome of the discussion was to split the worker pool/load
distribution from the output handling in any case as there is no need
to couple them. In my very first designs I had the output handling
as part of the asynchronous worker pool.

This RFC implements 1), that's why there is only the mutex.
I plan on enhancing this to let the last task output in real time (no buffering)
as well as another counter "task n/m completed" after each task
is done.

>
> I have to wonder if that is what people would expect and more
> importantly if that is the most useful.  I am sympathetic to the
> desire to avoid the output totally mixed up from different processes
> emitting things in an uncoordinated manner, and "slurp everything
> and then show everything in one go" is certainly _one_ way to do so,
> but early feedback also counts.  Besides, because the order in which
> tasks are dispatched and completed is unpredictable, you cannot
> expect a machine parseable output _without_ assuming some help from
> the command invoked by each task (e.g. by prefixing the task's output
> with some string that identifies which submodule the output is about).

I was very focused on the "submodule foreach" output, which (in case of
no -q given), displays a

    "Entering %s"

as the first line for each finished task. I suspect that is not enough to
make it a good machine parseable output. So I will prefix each
line with a running number of the task. Maybe like this:

  [001/500] Entering 'foo/bar'
  [001/500] Here goes the text for task foo/bar
  [001/500] Here goes another text for task foo/bar
  [002/500] Entering 'foo/baz'
  [002/500] Here goes the text for task foo/baz
  [002/500] Here goes another text for task foo/bar
  [003/500] Entering 'foo/bla'
  ...

This will make the output for each task distinguishable from
other tasks as well offering some sort of progress meter.
(The fewer submodules you have the less fine grained
the progress bar becomes)

Jonathan suggested to add a capability to the git protocol for a
machine readable progress meter in another channel. so we do not
need to parse the current output Counting/Compressing/Transfer/etc

>
> Once you assume that the command is _aware_ that it needs to help
> the foreach-parallel infrastructure so that the user can sift their
> collective outputs to make sense of them, wouldn't a line-buffered
> intermixing also acceptable, and wouldn't it be a much lower impact
> approach to solve the same problem?

      reply	other threads:[~2015-08-21 20:21 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-21  1:40 [PATCH 1/3] submodule: implement `module_clone` as a builtin helper Stefan Beller
2015-08-21  1:40 ` [RFC PATCH 2/3] run-commands: add an async queue processor Stefan Beller
2015-08-21 19:05   ` Junio C Hamano
2015-08-21 19:44     ` Jeff King
2015-08-21 19:48       ` Stefan Beller
2015-08-21 19:51         ` Jeff King
2015-08-21 20:12           ` Stefan Beller
2015-08-21 20:41       ` Junio C Hamano
2015-08-21 23:40       ` Stefan Beller
2015-08-24 21:22         ` Junio C Hamano
2015-08-21 19:45     ` Stefan Beller
2015-08-21 20:47       ` Junio C Hamano
2015-08-21 20:56         ` Stefan Beller
2015-08-21  1:40 ` [WIP/PATCH 3/3] submodule: helper to run foreach in parallel Stefan Beller
2015-08-21 19:23   ` Junio C Hamano
2015-08-21 20:21     ` Stefan Beller [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGZ79kYxYawAtATAoTmgG42w7E+2fEPfHek5bpbMgZ32wya5Zw@mail.gmail.com \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hvoigt@hvoigt.net \
    --cc=jens.lehmann@web.de \
    --cc=jrnieder@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).