git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	Heiko Voigt <hvoigt@hvoigt.net>,
	Jens Lehmann <jens.lehmann@web.de>
Subject: Re: [PATCH] Add fetch.recurseSubmoduleParallelism config option
Date: Mon, 12 Oct 2015 16:31:18 -0700	[thread overview]
Message-ID: <CAGZ79kZuZZivs8czV2P6uHWaU6ay1hG21k-_G9tgN5KbV6jW8w@mail.gmail.com> (raw)
In-Reply-To: <xmqqeggzbrx5.fsf@gitster.mtv.corp.google.com>

On Mon, Oct 12, 2015 at 4:14 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> This allows to configure fetching in parallel without having the annoying
>> command line option.
>
> s/annoying//;
>
> I think this is a sane thing to do, but the name of the variable may
> want to be bikeshedded a bit.

That's actually what I thought about most for this patch. I expected
bikeshedding
as well in the workflow (passing around -1 for each unset option).

We should not include (threads/processes) as that is implementation detail,
and should not be exposed to the user (Looking at pack.threads)

There are some options using max_* for configuring parallel stuff
such as http.maxRequests.

There is core.preloadIndex to enable parallel index preload, but
that is boolean and not giving fine control to the user. We want to give
fine control to the user here I'd assume.

Maybe also the more fundamental question needs to be answered,
if we want to stay in the "fetch." prefix. We could also make it a
submodule specifc thing (submodule.jobs), but that would collide
with submodule.<name>.<foo> maybe? (Originally I wanted to
postpone this patch until I have parallelized git submodule update,
so a "fetch." prefix may not be good, as we want these 2 operations
to use the same config option I'd guess)





>
>> This moved the responsibility to determine how many parallel processes
>> to start from builtin/fetch to submodule.c as we need a way to communicate
>> "The user did not specify the number of parallel processes in the command
>> line options" in the builtin fetch. The submodule code takes care of
>> the precedence (CLI > config > default)
>>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>>  Documentation/config.txt |  6 ++++++
>>  builtin/fetch.c          |  2 +-
>>  submodule.c              | 14 ++++++++++++++
>>  3 files changed, 21 insertions(+), 1 deletion(-)
>>
>>  I just monkey tested the code and it worked once! The problem with testing
>>  this parallelizing option is that the expected behavior doesn't change
>>  except for being faster. And I don't want to add timing tests to the test
>>  suite because they are unreliable.
>>
>>  Any idea how to test this properly?
>
> I agree that a test in t/ would catch bugs in the functionality.  If
> your parallel implementation is somehow broken in the future and
> stops functioning correctly, fetching all submodules with a single
> task and fetching them with N tasks will produce different results
> ;-).
>
> But it would not help you much in seeing if the parallelism is
> really taking place.  Adding t/perf/ tests to show how much benefit
> you are getting may be of more value.
>
> The parallel_process API could learn a new "verbose" feature that it
> by itself shows some messages like
>
>     "processing the 'frotz' job with N tasks"
>     "M tasks finished (N still running)"

I know what to fill in for M and N, 'frotz' is a bit unclear to me.
Would you imagine that to be passed in as a hardcoded string?
git fetch --recurse-submodules would pass in "Fetching submodules",
but <foo> wuld pass in actual "frotz", or would you assume that to be
computed from the task data somehow?

>
> in the output stream from strategic places.  For example, the first
> message will come at the end of pp_init(), and the second message
> will be appended at the end of buffered output of a task that has
> just been finished.  Once you have something like that, you could
> check for them in a test in t/.
>
> Just a thought.

I like that thought. :)

>
>>
>>  This applies on top of sb/submodule-parallel-fetch
>>
>>  Thanks,
>>  Stefan
>>
>>
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index 315f271..1172db0 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -1140,6 +1140,12 @@ fetch.recurseSubmodules::
>>       when its superproject retrieves a commit that updates the submodule's
>>       reference.
>>
>> +fetch.recurseSubmoduleParallelism
>> +     This is used to determine how many submodules can be fetched in
>> +     parallel. Specifying a positive integer allows up to that number
>> +     of submodules being fetched in parallel. Specifying 0 the number
>> +     of cpus will be taken as the maximum number.
>> +
>>  fetch.fsckObjects::
>>       If it is set to true, git-fetch-pack will check all fetched
>>       objects. It will abort in the case of a malformed object or a
>> diff --git a/builtin/fetch.c b/builtin/fetch.c
>> index f28eac6..b1399dc 100644
>> --- a/builtin/fetch.c
>> +++ b/builtin/fetch.c
>> @@ -37,7 +37,7 @@ static int prune = -1; /* unspecified */
>>  static int all, append, dry_run, force, keep, multiple, update_head_ok, verbosity;
>>  static int progress = -1, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
>>  static int tags = TAGS_DEFAULT, unshallow, update_shallow;
>> -static int max_children = 1;
>> +static int max_children = -1;
>>  static const char *depth;
>>  static const char *upload_pack;
>>  static struct strbuf default_rla = STRBUF_INIT;
>> diff --git a/submodule.c b/submodule.c
>> index c21b265..c85d3ef 100644
>> --- a/submodule.c
>> +++ b/submodule.c
>> @@ -15,6 +15,7 @@
>>  #include "thread-utils.h"
>>
>>  static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
>> +static int config_fetch_parallel_submodules = -1;
>>  static struct string_list changed_submodule_paths;
>>  static int initialized_fetch_ref_tips;
>>  static struct sha1_array ref_tips_before_fetch;
>> @@ -179,6 +180,14 @@ int submodule_config(const char *var, const char *value, void *cb)
>>       else if (!strcmp(var, "fetch.recursesubmodules")) {
>>               config_fetch_recurse_submodules = parse_fetch_recurse_submodules_arg(var, value);
>>               return 0;
>> +     } else if (!strcmp(var, "fetch.recursesubmoduleparallelism")) {
>> +             char *end;
>> +             int ret;
>> +             config_fetch_parallel_submodules = strtol(value, &end, 0);
>> +             ret = (*end == '\0');
>> +             if (!ret)
>> +                     warning("value for fetch.recurseSubmoduleParallelism not recognized");
>> +             return ret;
>>       }
>>       return 0;
>>  }
>> @@ -759,6 +768,11 @@ int fetch_populated_submodules(const struct argv_array *options,
>>       argv_array_push(&spf.args, "--recurse-submodules-default");
>>       /* default value, "--submodule-prefix" and its value are added later */
>>
>> +     if (max_parallel_jobs < 0)
>> +             max_parallel_jobs = config_fetch_parallel_submodules;
>> +     if (max_parallel_jobs < 0)
>> +             max_parallel_jobs = 1;
>> +
>>       calculate_changed_submodule_paths();
>>       run_processes_parallel(max_parallel_jobs,
>>                              get_next_submodule,

  reply	other threads:[~2015-10-12 23:31 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-12 22:52 [PATCH] Add fetch.recurseSubmoduleParallelism config option Stefan Beller
2015-10-12 23:14 ` Junio C Hamano
2015-10-12 23:31   ` Stefan Beller [this message]
2015-10-12 23:50     ` Junio C Hamano
2015-10-16 17:04       ` Stefan Beller
2015-10-16 17:26         ` Junio C Hamano
2015-10-13  7:32     ` Junio C Hamano
2015-10-13 16:03       ` Stefan Beller
2015-10-13 21:07         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGZ79kZuZZivs8czV2P6uHWaU6ay1hG21k-_G9tgN5KbV6jW8w@mail.gmail.com \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hvoigt@hvoigt.net \
    --cc=jens.lehmann@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).