git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>, Ram Rachum <ram@rachum.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: Make `git fetch --all` parallel?
Date: Tue, 11 Oct 2016 16:18:15 -0700	[thread overview]
Message-ID: <CAGZ79kaKOiy-HJboaujXXc66P6CLupteDw4JyPOGetREfz_q_Q@mail.gmail.com> (raw)
In-Reply-To: <20161011225942.tvqbbzxglvu7lldi@sigill.intra.peff.net>

On Tue, Oct 11, 2016 at 3:59 PM, Jeff King <peff@peff.net> wrote:
> On Tue, Oct 11, 2016 at 03:50:36PM -0700, Stefan Beller wrote:
>
>> I agree. Though even for implementing the "dumb" case of fetching
>> objects twice we'd have to take care of some racing issues, I would assume.
>>
>> Why did you put a "sleep 2" below?
>> * a slow start to better spread load locally? (keep the workstation responsive?)
>> * a slow start to have different fetches in a different phase of the
>> fetch protocol?
>> * avoiding some subtle race?
>>
>> At the very least we would need a similar thing as Jeff recently sent for the
>> push case with objects quarantined and then made available in one go?
>
> I don't think so. The object database is perfectly happy with multiple
> simultaneous writers, and nothing impacts the have/wants until actual
> refs are written. Quarantining objects before the refs are written is an
> orthogonal concept.

If a remote advertises its tips, we'd need to look these up (clientside) to
decide if we have them, and I do not think we'd do that via a reachability
check, but via direct lookup in the object data base? So I do not quite
understand, what we gain from the atomic ref writes in e.g. remote/origin/.


> I'm not altogether convinced that parallel fetch would be that much
> faster, though.

Ok, time to present data... Let's assume a degenerate case first:
"up-to-date with all remotes" because that is easy to reproduce.

I have 14 remotes currently:

$ time git fetch --all
real 0m18.016s
user 0m2.027s
sys 0m1.235s

$ time git config --get-regexp remote.*.url |awk '{print $2}' |xargs
-P 14 -I % git fetch %
real 0m5.168s
user 0m2.312s
sys 0m1.167s

A factor of >3, so I suspect there is improvement ;)

Well just as Ævar pointed out, there is some improvement.

>
> I usually just do a one-off fetch of their URL in such a case, exactly
> because I _don't_ want to end up with a bunch of remotes. You can also
> mark them with skipDefaultUpdate if you only care about them
> occasionally (so you can "git fetch sbeller" when you care about it, but
> it doesn't slow down your daily "git fetch").

And I assume you don't want the remotes because it takes time to fetch and not
because your disk space is expensive. ;)

>
> -Peff

  parent reply	other threads:[~2016-10-11 23:25 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-11 20:12 Make `git fetch --all` parallel? Ram Rachum
2016-10-11 20:53 ` Stefan Beller
2016-10-11 22:37   ` Junio C Hamano
2016-10-11 22:50     ` Stefan Beller
2016-10-11 22:58       ` Junio C Hamano
2016-10-11 22:58       ` Stefan Beller
2016-10-11 22:59       ` Jeff King
2016-10-11 23:16         ` Ævar Arnfjörð Bjarmason
2016-10-11 23:18         ` Stefan Beller [this message]
2016-10-12  1:34           ` Jeff King
2016-10-12  1:52             ` Jeff King
2016-10-12  6:47               ` Stefan Beller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGZ79kaKOiy-HJboaujXXc66P6CLupteDw4JyPOGetREfz_q_Q@mail.gmail.com \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=ram@rachum.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).