git@vger.kernel.org mailing list mirror (one of many)
From: Stefan Beller <sbeller@google.com>
To: Ram Rachum <ram@rachum.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: Make `git fetch --all` parallel?
Date: Tue, 11 Oct 2016 13:53:03 -0700
Message-ID: <CAGZ79kZmrYZqi4+bSkRykn+Upt7bEyZ0N8VhiQ-h8DhSMym-FA@mail.gmail.com>
In-Reply-To: <CANXboVZvfPkTQ10PWop+LgPFpc2bD3-u-e5ix0itGawiwCxOuQ@mail.gmail.com>

On Tue, Oct 11, 2016 at 1:12 PM, Ram Rachum <ram@rachum.com> wrote:
> Hi everyone!
>
> I have a repo that has a bunch of different remotes, and I noticed
> slowness when doing `git fetch --all`. Is it currently made
> sequentially? Do you think that maybe it could be done in parallel so
> it could be much faster?
>
> Thanks,
> Ram.

If you were to fetch from each remote in parallel, and assuming the
workload is unchanged, this would at best speed up the execution by a
factor equal to the number of remotes.
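
As a back-of-the-envelope check, here is a small Python sketch. It
assumes each remote's fetch takes a fixed time (the durations are
made-up numbers) and that parallel fetches never compete for bandwidth
or CPU. Under those assumptions the speedup is sum/max, which reaches
the number of remotes only when every remote takes equally long.

```python
from typing import Sequence


def sequential_time(durations: Sequence[float]) -> float:
    """Wall-clock time when remotes are fetched one after another."""
    return sum(durations)


def parallel_time(durations: Sequence[float]) -> float:
    """Wall-clock time if all fetches start together and never slow each other down."""
    return max(durations)


def ideal_speedup(durations: Sequence[float]) -> float:
    """Sequential time divided by parallel time; never more than len(durations).

    Assumes a non-empty list of positive durations.
    """
    return sequential_time(durations) / parallel_time(durations)


if __name__ == "__main__":
    # Hypothetical per-remote fetch times, in seconds.
    print(ideal_speedup([4.0, 4.0, 4.0]))  # 3.0: equal remotes, speedup == number of remotes
    print(ideal_speedup([1.0, 2.0, 9.0]))  # about 1.33: the slowest remote dominates
```

In practice the remotes share one network link, so the real gain would
likely be smaller still.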

This sounds pretty easy at first, but when you look into the details
it is not as easy any more:

What if two remotes have the same object (e.g. the same commit)?
Currently this is easy: the first remote we fetch from delivers that
object to you.

When fetching in parallel, we would want to download that object from
just one remote, preferably the remote with better network connectivity(?)
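
To make the duplicate-object problem concrete, here is a toy Python
model. It treats each remote as a plain set of object IDs and ignores
the commit graph entirely; the object names and the per-remote
`bandwidth` scores are hypothetical, and "pick the fastest remote" only
stands in for the "better network connectivity" idea above. It is not
how git decides anything today.

```python
from typing import Dict, Set

Objects = Set[str]


def sequential_fetch(local: Objects, remotes: Dict[str, Objects]) -> Dict[str, Objects]:
    """Fetch remotes in order; a later remote skips objects an earlier one delivered."""
    have = set(local)
    downloaded: Dict[str, Objects] = {}
    for name, objects in remotes.items():
        downloaded[name] = objects - have
        have |= downloaded[name]
    return downloaded


def naive_parallel_fetch(local: Objects, remotes: Dict[str, Objects]) -> Dict[str, Objects]:
    """All fetches start from the same local state, so shared objects arrive twice."""
    return {name: objects - local for name, objects in remotes.items()}


def planned_parallel_fetch(local: Objects, remotes: Dict[str, Objects],
                           bandwidth: Dict[str, float]) -> Dict[str, Objects]:
    """Download every missing object exactly once, from the fastest remote that has it."""
    plan: Dict[str, Objects] = {name: set() for name in remotes}
    missing = set().union(*remotes.values()) - local
    for oid in missing:
        holders = [name for name, objects in remotes.items() if oid in objects]
        plan[max(holders, key=lambda name: bandwidth[name])].add(oid)
    return plan


if __name__ == "__main__":
    local = {"a"}
    remotes = {"origin": {"a", "b", "c"}, "fork": {"a", "c", "d"}}
    bandwidth = {"origin": 1.0, "fork": 5.0}  # hypothetical connectivity scores
    print(sequential_fetch(local, remotes))      # origin: b, c; fork: d
    print(naive_parallel_fetch(local, remotes))  # "c" is downloaded from both remotes
    print(planned_parallel_fetch(local, remotes, bandwidth))  # origin: b; fork: c, d
```

The planned variant needs to know up front which remote has which
object, which is exactly the information that only emerges during
negotiation.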

So I do think it would be much faster, but I also think patches for this would
require some thought and a lot of refactoring of the fetch code.

The current fetch protocol is roughly:

remote: I have these refs:
8a36cd87b7c85a651ab388d403629865ffa3ba0d HEAD
10d26b0d1ef1ebfd09418ec61bdadc299ac988e2 refs/heads/ab/gitweb-abbrev-links
77947bbe24e0306d1ce5605c962c4a25f5aca22f refs/heads/ab/gitweb-link-html-escape
...

client: I want 8a36cd87b7c85a651ab388d403629865ffa3ba0d,
and I have 231ce93d2a0b0b4210c810e865eb5db7ba3032b2
and I have 02d0927973782f4b8b7317b499979fada1105be6
and I have 1172e16af07d6e15bca6398f0ded18a0ae7b9249

remote: I don't know about 231ce93d2a0b0b4210c810e865eb5db7ba3032b2,
nor 02d0927973782f4b8b7317b499979fada1105be6, but
I know about 1172e16af07d6e15bca6398f0ded18a0ae7b9249

... conversation continues ...

remote: OK, I figured out what you need; here is a packfile:
<binary stuff>
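
A toy simulation of this exchange, in Python. It models history as a
hypothetical dict mapping each commit to its parents and skips
everything the real protocol does (pkt-line framing, multiple
negotiation rounds, the ACK/NAK variants, trees and blobs). The remote
acknowledges the "haves" it knows about and sends whatever is reachable
from the "wants" but not from those common commits.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, List[str]]  # commit -> its parent commits


def reachable(graph: Graph, tips: Iterable[str]) -> Set[str]:
    """Every commit in `graph` reachable from `tips`, including the tips."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen or commit not in graph:
            continue
        seen.add(commit)
        stack.extend(graph[commit])
    return seen


def negotiate(remote: Graph, wants: List[str], haves: List[str]) -> Tuple[List[str], Set[str]]:
    """One-round toy negotiation: return (acknowledged haves, commits to pack)."""
    common = [h for h in haves if h in remote]  # "I know about ..." vs "I don't know about ..."
    pack = reachable(remote, wants) - reachable(remote, common)
    return common, pack


if __name__ == "__main__":
    # Hypothetical linear history on the remote: c1 <- c2 <- c3 <- c4 (HEAD).
    remote = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c3"]}
    # The client wants HEAD and has c2 plus a local-only commit x1.
    common, pack = negotiate(remote, wants=["c4"], haves=["x1", "c2"])
    print(common)        # ['c2']
    print(sorted(pack))  # ['c3', 'c4']
```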


During the negotiation phase, a client would have to be able to change its
mind: add more "haves" (in the case of parallel fetching these become
"will-have-soons") for objects that the remote had earlier concluded the
client did not have.
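
A sketch of why "will-have-soons" would help, again using a toy commit
graph in Python. "will-have-soon" is only the hypothetical extension
described above, not an existing protocol feature, and the commit and
remote names are made up. Adding the tip that an in-flight fetch from
another remote will deliver lets the second remote leave out the shared
history.

```python
from typing import Dict, Iterable, List, Set

Graph = Dict[str, List[str]]  # commit -> its parent commits


def reachable(graph: Graph, tips: Iterable[str]) -> Set[str]:
    """Every commit in `graph` reachable from `tips`, including the tips."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen or commit not in graph:
            continue
        seen.add(commit)
        stack.extend(graph[commit])
    return seen


def pack_for(remote: Graph, wants: List[str], haves: List[str]) -> Set[str]:
    """Commits the remote would send: reachable from wants, not from known haves."""
    common = [h for h in haves if h in remote]
    return reachable(remote, wants) - reachable(remote, common)


if __name__ == "__main__":
    # Hypothetical history: "fork" shares c1..c3 with "origin" and adds f1 on top of c3.
    fork = {"c1": [], "c2": ["c1"], "c3": ["c2"], "f1": ["c3"]}
    haves = ["c1"]           # what the client really has right now
    will_have_soon = ["c3"]  # tip an in-flight fetch from "origin" will deliver
    print(sorted(pack_for(fork, ["f1"], haves)))                   # ['c2', 'c3', 'f1']
    print(sorted(pack_for(fork, ["f1"], haves + will_have_soon)))  # ['f1']
```

The catch in this sketch is that the fetch from "fork" now depends on
the fetch from "origin" succeeding; if it fails, c2 and c3 are missing
and would have to be fetched again.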

For more details, see Documentation/technical/pack-protocol.txt in the
Git source tree.

Thread overview: 12+ messages
2016-10-11 20:12 Make `git fetch --all` parallel? Ram Rachum
2016-10-11 20:53 ` Stefan Beller [this message]
2016-10-11 22:37   ` Junio C Hamano
2016-10-11 22:50     ` Stefan Beller
2016-10-11 22:58       ` Junio C Hamano
2016-10-11 22:58       ` Stefan Beller
2016-10-11 22:59       ` Jeff King
2016-10-11 23:16         ` Ævar Arnfjörð Bjarmason
2016-10-11 23:18         ` Stefan Beller
2016-10-12  1:34           ` Jeff King
2016-10-12  1:52             ` Jeff King
2016-10-12  6:47               ` Stefan Beller
