From: Jonathan Nieder <jrnieder@gmail.com>
To: Stefan Beller <sbeller@google.com>
Cc: Junio C Hamano <gitster@pobox.com>,
"git@vger.kernel.org" <git@vger.kernel.org>,
Jens Lehmann <Jens.Lehmann@web.de>, Jeff King <peff@peff.net>,
Eric Sunshine <sunshine@sunshineco.com>
Subject: Re: [PATCHv14 5/7] git submodule update: have a dedicated helper for cloning
Date: Fri, 19 Feb 2016 16:32:24 -0800 [thread overview]
Message-ID: <20160220003224.GN28749@google.com> (raw)
In-Reply-To: <CAGZ79kZjD_hRBFEVpj-80NpaYR2qRvnLbBb+9kR4x7MKoRX+UQ@mail.gmail.com>
Stefan Beller wrote:
> On Fri, Feb 19, 2016 at 3:07 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
[...]
>>> + strbuf_addf(err, "Skipping submodule '%s'\n",
>>> + displaypath);
>>
>> Does the caller expect a newline at the end of err?
>>
>> In the refs code that uses an err strbuf, the convention is to
>> not end the message with a newline. That way, a function like
>> die() can insert a newline and messages are guaranteed to be
>> newline-terminated even if someone is sloppy and does the wrong thing
>> when generating an error.
>
> Oh! So we need to add new lines ourselves here.
Now I'm confused. The code in this patch is inconsistent. I was
recommending consistently leaving out the \n.
It looks like pp_start_one takes the content of err without adding
a \n. That's a bug in pp_start_one and pp_collect_finished IMHO.
For example, default_task_finished and default_start_failure don't
put a newline don't put a newline at the end of the message. I
don't think that's a bug --- they're doing the right thing, but
pp_start_one and pp_collect_finished is just not handling it
correctly.
>>> + if (pp->reference)
>>> + argv_array_push(&cp->args, pp->reference);
>>> + if (pp->depth)
>>> + argv_array_push(&cp->args, pp->depth);
>>
>> What does 'cp' stand for mean? Would a name like 'child' work?
>
> child process ? (any code which had a struct child_process referred to
> it as *cp)
Output from 'git grep --F -e "child_process *"' finds variables with
various names, corresponding to what kind of child it is.
[...]
>> Is this the 'list' subcommand?
>
> no. :(
No problem --- that's what review is for.
[...]
>>> + if (pp.print_unmatched) {
>>> + printf("#unmatched\n");
>>
>> I'm still confused about this. I think a comment where
>> 'print_unmatched' is declared would probably clear it up.
>
> Probably this is a too literal translation from shell to C.
> By printing #unmatched the shell on the other end of the pipe
> of this helper program knows to just stop and error out.
>
> So this is telling the downstream program to stop.
>
> Maybe we can name the variable 'quickstop' ?
> 'abort_and_exit' ?
Interesting.
Would it work for the helper to exit at that point with nonzero status?
Lacking "set -o pipefail", it's a little messy to handle in the shell,
but it's possible:
{
git submodule--helper \
--foo \
--bar \
--baz ||
... handle error, e.g. by printing something
that the other end of the pipe wants to see ...
} |
...
[...]
>> (just curious) why are these saved up and printed all at once instead
>> of being printed as it goes?
>
> To have a clean abort path, i.e. we need to be done with all the parallel stuff
> before we start on doing local on-disk stuff.
Interesting.
That's subtle. Probably worth a comment so people know not to break
it. (E.g.
/*
* We saved the output and splat it all at once now.
* That means:
* - the listener does not have to interleave their (checkout)
* work with our fetching. The writes involved in a
* checkout involve more straightforward sequential I/O.
* - the listener can avoid doing any work if fetching failed.
*/
).
Thinking out loud: the other side could write their output to a
temporary file (e.g. under .git) and save us the trouble of buffering
it. But the list of submodules and statuses is not likely to be huge
--- buffering doesn't hurt much.
[...]
> In a next step, we can do the checkout in parallel if there is nothing to assume
> to lead to trouble. i.e. update strategy is checkout and the submodule is
> in a clean state at the tip of a branch.
Somebody tried parallelizing file output during checkout and found it
sped up the checkout on some systems by a small amount. But then
later operations to read back the files were much slower. It seems
that the output pattern affected the layout of the files on disk in a
bad way.
I'm not too afraid. Just saying that the benefit of parallel checkout
would be something to measure.
A bigger worry is that checkout might be I/O bound and not benefit
much from parallelism.
> All problems which need to be solved by the user should be presented in
> a sequential way, i.e. present one problem and then full stop.
> That seems to be more encouraging as if we'd throw a "Here are a dozen
> broken submodule repositories which we expect the user fixes up".
It depends on the problem --- sometimes people want to see a list of
errors and fix them all (hence tools like "make -k"). I agree that
stop-on-first-error is a good default and that it's not worth worrying
about introducing --keep-going until someone asks for it.
Thanks for your thoughtfulness,
Jonathan
next prev parent reply other threads:[~2016-02-20 0:32 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-02-19 18:17 [PATCHv14 0/7] Expose submodule parallelism to the user Stefan Beller
2016-02-19 18:17 ` [PATCHv14 1/7] submodule-config: keep update strategy around Stefan Beller
2016-02-19 18:17 ` [PATCHv14 2/7] submodule-config: drop check against NULL Stefan Beller
2016-02-19 18:17 ` [PATCHv14 3/7] fetching submodules: respect `submodule.fetchJobs` config option Stefan Beller
2016-02-19 18:17 ` [PATCHv14 4/7] submodule update: direct error message to stderr Stefan Beller
2016-02-19 18:17 ` [PATCHv14 5/7] git submodule update: have a dedicated helper for cloning Stefan Beller
2016-02-19 23:07 ` Jonathan Nieder
2016-02-19 23:49 ` Stefan Beller
2016-02-20 0:32 ` Jonathan Nieder [this message]
2016-02-19 18:17 ` [PATCHv14 6/7] submodule update: expose parallelism to the user Stefan Beller
2016-02-19 18:17 ` [PATCHv14 7/7] clone: allow an explicit argument for parallel submodule clones Stefan Beller
2016-02-19 21:04 ` [PATCHv14 0/7] Expose submodule parallelism to the user Junio C Hamano
2016-02-23 23:33 ` Junio C Hamano
2016-02-23 23:42 ` Stefan Beller
2016-02-23 23:44 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160220003224.GN28749@google.com \
--to=jrnieder@gmail.com \
--cc=Jens.Lehmann@web.de \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
--cc=sbeller@google.com \
--cc=sunshine@sunshineco.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).