From: "Shawn O. Pearce" <spearce@spearce.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: git@vger.kernel.org
Subject: Re: q: git-fetch a tad slow?
Date: Mon, 28 Jul 2008 22:50:14 -0700
Message-ID: <20080729055014.GE11947@spearce.org>
In-Reply-To: <20080728160138.GA12777@elte.hu>

Ingo Molnar <mingo@elte.hu> wrote:
> 
> Setup/background: distributed kernel testing cluster, [...]
> 
> Problem: i noticed that git-fetch is a tad slow:
> 
>   titan:~/tip> time git-fetch
>   real    0m2.372s
> 
> There are hundreds of branches, so i thought fetching a single branch 
> alone would improve things:
> 
>   titan:~/tip> time git-fetch origin master
>   real    0m0.942s
>
> But that's still slow - so i use a (lame) ad-hoc script instead:
> 
>   titan:~/tip> time tip-fetch
>   real    0m0.246s

OK, yes, when there are _many_ branches like that, limiting fetch
to only the branch(es) you need can make it go much faster.  Part
of the problem is we loop over the branches many times, and those
are O(N) loops (N=number of branches).  We could do better, but we
don't.

One reason your tip-fetch runs so much better is that it doesn't
have to enumerate the hundreds of advertised branches offered up by
the remote peer to find the one you want to fetch.  Your tip-fetch
is reading only that one ref file (.git/refs/heads/master) and
that's pretty much it.

In contrast git-upload-pack on the server side must open and read
_all_ ref files under .git/refs/ and send them to the client, who
then has to loop over them at least twice before it can decide if
a match exists.  That's a lot more data to shove down over SSH.
Granted it's only 42 bytes + refname per ref, but it's still more.
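
As a rough back-of-the-envelope sketch of that cost (Python, using
the 42 bytes + refname figure above; the function name and branch
names are made up for illustration, and any extra protocol framing
is ignored):

```python
def advert_bytes(refnames):
    """Estimate advertisement size: 42 bytes plus the refname, per ref."""
    return sum(42 + len(name) for name in refnames)


# Hypothetical repository with a few hundred topic branches.
branches = ["refs/heads/topic-%d" % i for i in range(300)]

one_ref = advert_bytes(["refs/heads/master"])
all_refs = advert_bytes(branches)
print(one_ref, all_refs)
```

The total grows linearly with the number of refs, so it is small per
ref but not free when every fetch pays for all of them.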

Those O(N) loops I referred to earlier can explain why for hundreds
of branches it gets ugly.  That turns into an O(N^2) matching
algorithm.  Not pretty.  A simple hash would solve a lot of that,
bringing the first time of 0m2.372s much closer to the second
time of 0m0.942s.

Neither of which can compete with your tip-fetch.
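
To illustrate the difference between the nested-loop matching and a
hash lookup, here is a minimal sketch in Python.  It is not git's
actual code; the function names and the (refname, sha) tuples are
assumptions made up for the example.

```python
def match_nested(wanted, advertised):
    """O(N*M): scan the whole advertised list for every wanted ref."""
    found = {}
    for name in wanted:
        for adv_name, sha in advertised:
            if adv_name == name:
                found[name] = sha
                break
    return found


def match_hashed(wanted, advertised):
    """O(N+M): build a dict once, then each lookup is O(1) on average."""
    index = dict(advertised)
    return {name: index[name] for name in wanted if name in index}


# Hypothetical advertisement: (refname, object id) pairs.
advertised = [("refs/heads/topic-%d" % i, "%040x" % i) for i in range(500)]
wanted = ["refs/heads/topic-499", "refs/heads/missing"]

assert match_nested(wanted, advertised) == match_hashed(wanted, advertised)
```

Both functions return the same matches; only the amount of work per
wanted ref differs.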

Have you tried using git-pack-refs to pack the branches on the
remote repository?

If you update all of the branches, run `git pack-refs --all --prune`,
and then allow the testing clients to start fetching, it may go much
quicker.  pack-refs moves all of the individual ref files into
the single .git/packed-refs file, reducing the number of files we
need to open and read to service a single fetch client.
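
A simplified sketch of why that helps, in Python: once the refs are
packed, one read of one file yields every ref.  This is only an
illustration, not git's reader; the function name and sample object
ids are made up, and it skips the "#" header line and the "^" peeled
lines that follow annotated tags.

```python
def parse_packed_refs(text):
    """Map refname -> object id from the contents of a packed-refs file."""
    refs = {}
    for line in text.splitlines():
        # Skip blank lines, the "# pack-refs with: ..." header,
        # and "^<sha>" peeled-tag lines.
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        sha, name = line.split(" ", 1)
        refs[name] = sha
    return refs


# Hypothetical file contents with made-up object ids.
sample = (
    "# pack-refs with: peeled\n"
    + "a" * 40 + " refs/heads/master\n"
    + "b" * 40 + " refs/tags/v1.0\n"
    + "^" + "c" * 40 + "\n"
)
refs = parse_packed_refs(sample)
print(sorted(refs))
```

With loose refs the server instead opens one small file per branch,
which is where the hundreds of open/read calls come from.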

I wonder if git-pack-refs + fetching only a single branch will get
you closer to the tip-fetch time.

Also, I wonder if you really need to fetch over SSH.  Doing a
fetch over git:// is much quicker, as there is no SSH session
setup overhead.

-- 
Shawn.
