git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Palmer Dabbelt <palmer@sifive.com>
Cc: git@vger.kernel.org, peff@peff.net, e@80x24.org,
	chriscool@tuxfamily.org, jonathantanmy@google.com, tboegi@web.de,
	bwilliams.eng@gmail.com, jeffhost@microsoft.com
Subject: Re: [PATCH] fetch: add "--parallel", which fetches all remotes in parallel
Date: Thu, 18 Jul 2019 10:45:04 -0700	[thread overview]
Message-ID: <xmqq5znzz17j.fsf@gitster-ct.c.googlers.com> (raw)
In-Reply-To: <20190717015903.4384-1-palmer@sifive.com> (Palmer Dabbelt's message of "Tue, 16 Jul 2019 18:59:03 -0700")

Palmer Dabbelt <palmer@sifive.com> writes:

> * I'm not sure if it's safe to access the .git database from multiple
>   processes at the same time.

It is supposed to, and also you are supposed to keep it that way ;-)

> * As I was writing the documentation I found "fetch --jobs".  It seems
>   like I should use that instead of adding a new argrument, but I wasn't
>   sure.

Why not?  What makes you feel it is a bad idea to follow that
pattern?

Ah, --jobs that is taken already is right now too tied to fetches
that happen in submodules, which arguably was a design mistake.

    -j::
    --jobs=<n>::
            Number of parallel children to be used for fetching
            submodules.  Each will fetch from different submodules,
            such that fetching many submodules will be faster. By
            default submodules will be fetched one at a time.

The simplest endgame would be to replace "submodule" with
"repository" in the above description, perhaps like

	Number of parallel jobs to be used for fetching from
	multiple repositories (both fetching with "--multiple" from
	multiple repositories, and also fetching updated contents
	for submodules).  By default, fetching from multiple
	repositories and submodules is done one at a time.

and nobody would have complained if the system were like so from the
beginning.  Existing users, however, may want extra flexibility, and
would complain loudly if we did the above, in which case, we may
have to

 - introduce --fetch-jobs=<n> for what you are adding;

 - introduce --submodule-fetch-jobs=<n> as a synonym for existing
   --jobs=<n> and deprecate the current use of --jobs=<n>;

 - eventually repurpose --jobs=<n> as a short-hand to give both
   --fetch-jobs and --submoduje-fetch-jobs at the same time.

> +static int parallel = 0;

Don't explicitly "= 0;" initialize file-scope static.  Instead let
BSS take care of it.

>  static int git_fetch_config(const char *k, const char *v, void *cb)
>  {
> @@ -178,6 +179,8 @@ static struct option builtin_fetch_options[] = {
>  			TRANSPORT_FAMILY_IPV6),
>  	OPT_STRING_LIST(0, "negotiation-tip", &negotiation_tip, N_("revision"),
>  			N_("report that we have only objects reachable from this object")),
> +	OPT_BOOL(0, "parallel", &parallel,
> +		 N_("fetch in parallel from each remote")),
>  	OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
>  	OPT_BOOL(0, "auto-gc", &enable_auto_gc,
>  		 N_("run 'gc --auto' after fetching")),
> @@ -1456,12 +1459,15 @@ static void add_options_to_argv(struct argv_array *argv)
>  
>  }
>  
> -static int fetch_multiple(struct string_list *list)
> +static int fetch_multiple(struct string_list *list, int i)
>  {
> -	int i, result = 0;

'i' is perfectly a good name for a local variable that is used for
loop control purposes, but makes a horrible name for a parameter.

Existing 'list' is not any better either---we know it is a list by
its type already, the name should say what the list is about, what
it represents.  But having a horribly named parameter already is not
a good reason to make the code even worse.

And as you said, recursion makes the code structure harder to follow
here.  Keeping an array of --jobs=<n> cmd structures, looping to
fill them by starting, doing wait() to reap any of the started ones
that first exits to refill the slot just opened, etc. would be easier
to see if done in a loop, I think.

  reply	other threads:[~2019-07-18 17:45 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-17  1:59 [PATCH] fetch: add "--parallel", which fetches all remotes in parallel Palmer Dabbelt
2019-07-18 17:45 ` Junio C Hamano [this message]
2019-07-19 13:23   ` Johannes Schindelin
2019-07-19 15:12     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq5znzz17j.fsf@gitster-ct.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=bwilliams.eng@gmail.com \
    --cc=chriscool@tuxfamily.org \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=jeffhost@microsoft.com \
    --cc=jonathantanmy@google.com \
    --cc=palmer@sifive.com \
    --cc=peff@peff.net \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).