git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Nieder <jrnieder@gmail.com>
To: Stefan Beller <sbeller@google.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	Jens Lehmann <Jens.Lehmann@web.de>, Jeff King <peff@peff.net>,
	Eric Sunshine <sunshine@sunshineco.com>
Subject: Re: [PATCHv14 5/7] git submodule update: have a dedicated helper for cloning
Date: Fri, 19 Feb 2016 16:32:24 -0800	[thread overview]
Message-ID: <20160220003224.GN28749@google.com> (raw)
In-Reply-To: <CAGZ79kZjD_hRBFEVpj-80NpaYR2qRvnLbBb+9kR4x7MKoRX+UQ@mail.gmail.com>

Stefan Beller wrote:
> On Fri, Feb 19, 2016 at 3:07 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:

[...]
>>> +             strbuf_addf(err, "Skipping submodule '%s'\n",
>>> +                         displaypath);
>>
>> Does the caller expect a newline at the end of err?
>>
>> In the refs code that uses an err strbuf, the convention is to
>> not end the message with a newline.  That way, a function like
>> die() can insert a newline and messages are guaranteed to be
>> newline-terminated even if someone is sloppy and does the wrong thing
>> when generating an error.
>
> Oh! So we need to add new lines ourselves here.

Now I'm confused.  The code in this patch is inconsistent.  I was
recommending consistently leaving out the \n.

It looks like pp_start_one takes the content of err without adding
a \n.  That's a bug in pp_start_one and pp_collect_finished IMHO.

For example, default_task_finished and default_start_failure don't
put a newline don't put a newline at the end of the message.  I
don't think that's a bug --- they're doing the right thing, but
pp_start_one and pp_collect_finished is just not handling it
correctly.

>>> +             if (pp->reference)
>>> +                     argv_array_push(&cp->args, pp->reference);
>>> +             if (pp->depth)
>>> +                     argv_array_push(&cp->args, pp->depth);
>>
>> What does 'cp' stand for mean?  Would a name like 'child' work?
>
> child process ? (any code which had a struct child_process referred to
> it as *cp)

Output from 'git grep --F -e "child_process *"' finds variables with
various names, corresponding to what kind of child it is.

[...]
>> Is this the 'list' subcommand?
>
> no. :(

No problem --- that's what review is for.

[...]
>>> +     if (pp.print_unmatched) {
>>> +             printf("#unmatched\n");
>>
>> I'm still confused about this.  I think a comment where
>> 'print_unmatched' is declared would probably clear it up.
>
> Probably this is a too literal translation from shell to C.
> By printing #unmatched the shell on the other end of the pipe
> of this helper program knows to just stop and error out.
>
> So this is telling the downstream program to stop.
>
> Maybe we can name the variable 'quickstop' ?
> 'abort_and_exit' ?

Interesting.

Would it work for the helper to exit at that point with nonzero status?

Lacking "set -o pipefail", it's a little messy to handle in the shell,
but it's possible:

	{
		git submodule--helper \
			--foo \
			--bar \
			--baz ||
		... handle error, e.g. by printing something
		that the other end of the pipe wants to see ...
	} |
	...

[...]
>> (just curious) why are these saved up and printed all at once instead
>> of being printed as it goes?
>
> To have a clean abort path, i.e. we need to be done with all the parallel stuff
> before we start on doing local on-disk stuff.

Interesting.

That's subtle.  Probably worth a comment so people know not to break
it.  (E.g.

	/*
	 * We saved the output and splat it all at once now.
	 * That means:
	 * - the listener does not have to interleave their (checkout)
	 *   work with our fetching.  The writes involved in a
	 *   checkout involve more straightforward sequential I/O.
	 * - the listener can avoid doing any work if fetching failed.
	 */

).

Thinking out loud: the other side could write their output to a
temporary file (e.g.  under .git) and save us the trouble of buffering
it.  But the list of submodules and statuses is not likely to be huge
--- buffering doesn't hurt much.

[...]
> In a next step, we can do the checkout in parallel if there is nothing to assume
> to lead to trouble. i.e. update strategy is checkout and the submodule is
> in a clean state at the tip of a branch.

Somebody tried parallelizing file output during checkout and found it
sped up the checkout on some systems by a small amount.  But then
later operations to read back the files were much slower.  It seems
that the output pattern affected the layout of the files on disk in a
bad way.

I'm not too afraid.  Just saying that the benefit of parallel checkout
would be something to measure.

A bigger worry is that checkout might be I/O bound and not benefit
much from parallelism.

> All problems which need to be solved by the user should be presented in
> a sequential way, i.e. present one problem and then full stop.
> That seems to be more encouraging as if we'd throw a "Here are a dozen
> broken submodule repositories which we expect the user fixes up".

It depends on the problem --- sometimes people want to see a list of
errors and fix them all (hence tools like "make -k").  I agree that
stop-on-first-error is a good default and that it's not worth worrying
about introducing --keep-going until someone asks for it.

Thanks for your thoughtfulness,
Jonathan

  reply	other threads:[~2016-02-20  0:32 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-19 18:17 [PATCHv14 0/7] Expose submodule parallelism to the user Stefan Beller
2016-02-19 18:17 ` [PATCHv14 1/7] submodule-config: keep update strategy around Stefan Beller
2016-02-19 18:17 ` [PATCHv14 2/7] submodule-config: drop check against NULL Stefan Beller
2016-02-19 18:17 ` [PATCHv14 3/7] fetching submodules: respect `submodule.fetchJobs` config option Stefan Beller
2016-02-19 18:17 ` [PATCHv14 4/7] submodule update: direct error message to stderr Stefan Beller
2016-02-19 18:17 ` [PATCHv14 5/7] git submodule update: have a dedicated helper for cloning Stefan Beller
2016-02-19 23:07   ` Jonathan Nieder
2016-02-19 23:49     ` Stefan Beller
2016-02-20  0:32       ` Jonathan Nieder [this message]
2016-02-19 18:17 ` [PATCHv14 6/7] submodule update: expose parallelism to the user Stefan Beller
2016-02-19 18:17 ` [PATCHv14 7/7] clone: allow an explicit argument for parallel submodule clones Stefan Beller
2016-02-19 21:04 ` [PATCHv14 0/7] Expose submodule parallelism to the user Junio C Hamano
2016-02-23 23:33   ` Junio C Hamano
2016-02-23 23:42     ` Stefan Beller
2016-02-23 23:44       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160220003224.GN28749@google.com \
    --to=jrnieder@gmail.com \
    --cc=Jens.Lehmann@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).