git@vger.kernel.org mailing list mirror (one of many)
* q: git-fetch a tad slow?
@ 2008-07-28 16:01 Ingo Molnar
  2008-07-29  5:50 ` Shawn O. Pearce
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2008-07-28 16:01 UTC
  To: git


here's another possibly stupid question.

Setup/background: distributed kernel testing cluster, there's a central 
box with a git repo of the kernel, and lots of testboxes that track 
that repo over ssh transport. In each "iteration" a random kernel config 
is generated, built and booted, and the booted up kernel is checked. 
Performance of each iteration matters to total testing throughput, so i 
try to optimize the critical path.

Problem: i noticed that git-fetch is a tad slow:

  titan:~/tip> time git-fetch
 
  real    0m2.372s
  user    0m0.814s
  sys     0m0.951s

There are hundreds of branches, so i thought fetching a single branch 
alone would improve things:

  titan:~/tip> time git-fetch origin master

  real    0m0.942s
  user    0m0.285s
  sys     0m0.109s

But that's still slow - so i use a (lame) ad-hoc script instead:

  titan:~/tip> time tip-fetch

  real    0m0.246s
  user    0m0.024s
  sys     0m0.019s

... which ssh's to the repo to check tip/master by hand:

  HEAD=$(git-log -1 --pretty=format:"%H" HEAD)
  RHEAD=$(ssh server "cd tip; git-log master -1 --pretty=format:'%H'")
  [ "$RHEAD" != "$HEAD" ] && {
    [...]
  }

... which script is lame/expensive on multiple levels but still is much 
faster.
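
A possible variant of the check above, sketched with `git ls-remote`
instead of running git-log over ssh by hand. The function names are
hypothetical, and "origin" and "master" simply mirror the names used in
this thread. Note the assumption: the server still advertises every ref
and the filtering happens on the client, so this is not guaranteed to
beat the ssh one-liner.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of Git.

# Print the object name the remote advertises for one branch.
# ls-remote prints "<object-name><TAB><refname>", so keep field 1.
remote_tip() {
    git ls-remote "$1" "refs/heads/$2" | cut -f1
}

# Succeed when a non-empty remote tip differs from the local tip.
# $1 = local object name, $2 = remote object name
needs_fetch() {
    [ -n "$2" ] && [ "$1" != "$2" ]
}
```

It could be used along the lines of
`needs_fetch "$(git rev-parse HEAD)" "$(remote_tip origin master)" && echo "tip moved"`.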

I'm wondering, am i missing something obvious? It seems most of the 
overhead is local CPU overhead, so it's something in Git's domain and 
not the expense of the ssh protocol. (which expense should be about 200 
msecs)

	Ingo

* Re: q: git-fetch a tad slow?
  2008-07-28 16:01 q: git-fetch a tad slow? Ingo Molnar
@ 2008-07-29  5:50 ` Shawn O. Pearce
  2008-07-29  9:08   ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Shawn O. Pearce @ 2008-07-29  5:50 UTC
  To: Ingo Molnar; +Cc: git

Ingo Molnar <mingo@elte.hu> wrote:
> 
> Setup/background: distributed kernel testing cluster, [...]
> 
> Problem: i noticed that git-fetch is a tad slow:
> 
>   titan:~/tip> time git-fetch
>   real    0m2.372s
> 
> There are hundreds of branches, so i thought fetching a single branch 
> alone would improve things:
> 
>   titan:~/tip> time git-fetch origin master
>   real    0m0.942s
>
> But that's still slow - so i use a (lame) ad-hoc script instead:
> 
>   titan:~/tip> time tip-fetch
>   real    0m0.246s

OK, yes, when there are _many_ branches like that limiting fetch
to a narrow focus of only the branch(es) you must have can make it
go much faster.  Part of the problem is we loop over the branches
many times, and those are O(N) loops (N=number of branches).  We
could do better, but we don't.

One reason why your tip-fetch runs so much better is that it doesn't
have to enumerate the hundreds of advertised branches offered up by
the remote peer to find the one you want to fetch.  Your tip-fetch
is reading only that one ref file (.git/refs/heads/master) and
that's pretty much it.

In contrast git-upload-pack on the server side must open and read
_all_ ref files under .git/refs/ and send them to the client, who
then has to loop over them at least twice before it can decide if
a match exists.  That's a lot more data to shove down over SSH.
Granted it's only 42 bytes + refname per ref, but it's still more.

Those O(N) loops I referred to earlier can explain why for hundreds
of branches it gets ugly.  That turns into an O(N^2) matching
algorithm.  Not pretty.  A simple hash would solve a lot of that,
changing the first time from 0m2.372s to much closer to the second
time of 0m0.942s.

Neither of which can compete with your tip-fetch.
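
To illustrate the "simple hash" idea only (this is not Git's code, and
the function name and file layout are assumptions made for the example):
the wanted ref names are loaded into an awk associative array once, and
each advertised ref is then checked with a single lookup, so the work
grows with N + M rather than N * M.

```shell
#!/bin/sh
# Illustration only, not Git's implementation.
# Usage: match_refs <advertised-file> <wanted-file>
#   advertised-file: lines of "<object-name> <refname>"
#   wanted-file:     one refname per line
match_refs() {
    awk '
        # First file: remember every wanted ref name.
        FILENAME == ARGV[1] { want[$1] = 1; next }
        # Second file: one array lookup per advertised ref.
        ($2 in want)        { print $1, $2 }
    ' "$2" "$1"
}
```

Given an advertised list of hundreds of refs and a wanted list holding
only refs/heads/master, this prints the single matching line after one
pass over each file.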

Have you tried using git-pack-refs to pack the branches on the
remote repository?

If you update all of the branches, run `git pack-refs --all --prune`,
then allow the testing clients to start fetching it may go much
quicker.  The pack-refs moves all of the individual ref files into
the single .git/packed-refs file, reducing the number of files we
need to open and read to service a single fetch client.

I wonder if git-pack-refs + fetching only a single branch will get
you closer to the tip-fetch time.

Also, I wonder if you really need to fetch over SSH.  Doing a
fetch over git:// is much quicker, as there is no SSH session
setup overhead.

-- 
Shawn.

* Re: q: git-fetch a tad slow?
  2008-07-29  5:50 ` Shawn O. Pearce
@ 2008-07-29  9:08   ` Ingo Molnar
  2008-07-30  4:48     ` Shawn O. Pearce
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2008-07-29  9:08 UTC
  To: Shawn O. Pearce; +Cc: git


* Shawn O. Pearce <spearce@spearce.org> wrote:

> Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > Setup/background: distributed kernel testing cluster, [...]
> > 
> > Problem: i noticed that git-fetch is a tad slow:
> > 
> >   titan:~/tip> time git-fetch
> >   real    0m2.372s

> Also, I wonder if you really need to fetch over SSH.  Doing a fetch 
> over git:// is much quicker, as there is no SSH session setup 
> overhead.

note that titan is a very beefy box, almost 3 GHz Core2Duo:

   model name      : Intel(R) Core(TM)2 CPU         E6800  @ 2.93GHz
   stepping        : 5
   cpu MHz         : 2933.331

server is 3 GHz. So if we have a quadratic overhead on number of 
branches, that's going to be quite a PITA.

> I wonder if git-pack-refs + fetching only a single branch will get you 
> closer to the tip-fetch time.

should i pack on both repos? I dont explicitly pack anything, but on the 
server it goes into regular gc runs. (which will pack most stuff, 
right?)

	Ingo

* Re: q: git-fetch a tad slow?
  2008-07-29  9:08   ` Ingo Molnar
@ 2008-07-30  4:48     ` Shawn O. Pearce
  2008-07-30 19:06       ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Shawn O. Pearce @ 2008-07-30  4:48 UTC
  To: Ingo Molnar; +Cc: git

Ingo Molnar <mingo@elte.hu> wrote:
> * Shawn O. Pearce <spearce@spearce.org> wrote:
> > Ingo Molnar <mingo@elte.hu> wrote:
> > > 
> > > Setup/background: distributed kernel testing cluster, [...]
> > > 
> > > Problem: i noticed that git-fetch is a tad slow:
> > > 
> > >   titan:~/tip> time git-fetch
> > >   real    0m2.372s
>
> note that titan is a very beefy box, almost 3 GHz Core2Duo:

That isn't going to matter if you have a quadratic algorithm and a
large dataset.  Especially when the inner loops are doing multiple
system calls per item in a long list of items.  :-|   Linux is fast,
but it isn't magic pixie dust.  It cannot fix broken applications.
 
> [...] So if we have a quadratic overhead on number of 
> branches, that's going to be quite a PITA.

Right.

> > I wonder if git-pack-refs + fetching only a single branch will get you 
> > closer to the tip-fetch time.
> 
> should i pack on both repos? I dont explicitly pack anything, but on the 
> server it goes into regular gc runs. (which will pack most stuff, 
> right?)

git-gc automatically runs `git pack-refs --all --prune` like I
recommended, unless you disabled it with config gc.packrefs = false.
So it's probably already packed.

What does `find .git/refs -type f | wc -l` give for the repository
on the central server?  If it's more than a handful (~20) I would
suggest running git-gc before testing again.
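
The check above could be wrapped up roughly as follows. This is a
sketch: the helper names are hypothetical, the default threshold of 20
is just the "handful" mentioned above, and it calls `git pack-refs`
directly rather than a full git-gc.

```shell
#!/bin/sh
# Sketch only: helper names and the default threshold are assumptions.

# Print the number of loose ref files under <git-dir>/refs.
count_loose_refs() {
    find "$1/refs" -type f 2>/dev/null | wc -l | tr -d ' '
}

# Pack refs when more than <threshold> loose ref files exist.
# Usage: pack_refs_if_needed <git-dir> [threshold]
pack_refs_if_needed() {
    gitdir=$1
    threshold=${2:-20}
    loose=$(count_loose_refs "$gitdir")
    if [ "$loose" -gt "$threshold" ]; then
        git --git-dir="$gitdir" pack-refs --all --prune
    fi
}
```

On the central server this might be run as
`pack_refs_if_needed ~/tip/.git` after the branches have been updated
and before the test clients start fetching.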

But I'm really suspecting that this is just our quadratic matching
algorithm running up against a large number of branches, causing
it to suck.

jgit at least uses an O(N) algorithm here, but since it is written
in Java it's of course slow compared to C Git.  Takes a while to
get that JVM running.

I'll try to find some time to reproduce the issue and look at the
bottleneck here.  I'm two days into a new job so my git time has
been really quite short this week.  :-|

-- 
Shawn.

* Re: q: git-fetch a tad slow?
  2008-07-30  4:48     ` Shawn O. Pearce
@ 2008-07-30 19:06       ` Ingo Molnar
  2008-07-30 22:38         ` Shawn O. Pearce
  2008-07-31  4:45         ` Shawn O. Pearce
  0 siblings, 2 replies; 10+ messages in thread
From: Ingo Molnar @ 2008-07-30 19:06 UTC
  To: Shawn O. Pearce; +Cc: git


* Shawn O. Pearce <spearce@spearce.org> wrote:

> > should i pack on both repos? I dont explicitly pack anything, but on 
> > the server it goes into regular gc runs. (which will pack most 
> > stuff, right?)
> 
> git-gc automatically runs `git pack-refs --all --prune` like I 
> recommended, unless you disabled it with config gc.packrefs = false. 
> So it's probably already packed.
> 
> What does `find .git/refs -type f | wc -l` give for the repository on 
> the central server?  If it's more than a handful (~20) I would suggest 
> running git-gc before testing again.

ah, you are right, it gave 275, then git-gc brought it down to two:

  earth4:~/tip> find .git/refs -type f | wc -l
  275
  earth4:~/tip> git gc
  earth4:~/tip> find .git/refs -type f | wc -l
  2

i turned off auto-gc recently (two weeks ago) because it was 
auto-triggering _way_ too frequently. (like on every fifth merge i was 
doing or so)

alas, fetching still seems to be slow:

  titan:~/tip> time git-fetch origin

  real    0m5.112s
  user    0m0.972s
  sys     0m3.380s

(but the gc run has not finished yet on the central repo so this isnt 
fully valid.)

> But I'm really suspecting that this is just our quadratic matching 
> algorithm running up against a large number of branches, causing it to 
> suck.
> 
> jgit at least uses an O(N) algorithm here, but since it is written in 
> Java it's of course slow compared to C Git.  Takes a while to get that 
> JVM running.
> 
> I'll try to find some time to reproduce the issue and look at the 
> bottleneck here.  I'm two days into a new job so my git time has been 
> really quite short this week.  :-|

fetching the -tip repo:

   http://people.redhat.com/mingo/tip.git/README

and then running 'git remote update' will i think already show this 
problem for you too. People have been complaining about how slow the 
update is.

	Ingo

* Re: q: git-fetch a tad slow?
  2008-07-30 19:06       ` Ingo Molnar
@ 2008-07-30 22:38         ` Shawn O. Pearce
  2008-07-31  4:45         ` Shawn O. Pearce
  1 sibling, 0 replies; 10+ messages in thread
From: Shawn O. Pearce @ 2008-07-30 22:38 UTC
  To: Ingo Molnar; +Cc: git

Ingo Molnar <mingo@elte.hu> wrote:
> * Shawn O. Pearce <spearce@spearce.org> wrote:
> > 
> > What does `find .git/refs -type f | wc -l` give for the repository on 
> > the central server?  If it's more than a handful (~20) I would suggest 
> > running git-gc before testing again.
> 
> ah, you are right, it gave 275, then git-gc brought it down to two:
> 
>   earth4:~/tip> find .git/refs -type f | wc -l
>   275
>   earth4:~/tip> git gc
>   earth4:~/tip> find .git/refs -type f | wc -l
>   2
> 
> alas, fetching still seems to be slow:
> 
>   titan:~/tip> time git-fetch origin
> 
>   real    0m5.112s
>   user    0m0.972s
>   sys     0m3.380s

Yea, OK, there are definitely performance problems there.  And it
should be fast.  It's too common of a case (fetching small deltas).
 
> > I'll try to find some time to reproduce the issue and look at the 
> > bottleneck here.  I'm two days into a new job so my git time has been 
> > really quite short this week.  :-|
> 
> fetching the -tip repo:
> 
>    http://people.redhat.com/mingo/tip.git/README
> 
> and then running 'git remote update' will i think already show this 
> problem for you too. People have been complaining about how slow the 
> update is.

Thanks.  I'll try to poke at it this evening and see what I find.
git-fetch should be running faster than this.

-- 
Shawn.

* Re: q: git-fetch a tad slow?
  2008-07-30 19:06       ` Ingo Molnar
  2008-07-30 22:38         ` Shawn O. Pearce
@ 2008-07-31  4:45         ` Shawn O. Pearce
  2008-07-31 21:03           ` Ingo Molnar
  1 sibling, 1 reply; 10+ messages in thread
From: Shawn O. Pearce @ 2008-07-31  4:45 UTC
  To: Ingo Molnar; +Cc: git

Ingo Molnar <mingo@elte.hu> wrote:
> alas, fetching still seems to be slow:
> 
>   titan:~/tip> time git-fetch origin
> 
>   real    0m5.112s
>   user    0m0.972s
>   sys     0m3.380s

What version of git are you dealing with on the client side?

I only have a MacBook Pro (2.4 GHz Intel Core 2 Duo) and I'm getting
fetch times of ~472 ms over git:// to your -tip.git tree and ~128
ms for strictly local fetch.  If your SSH overhead is ~300 ms this
is only a ~700 ms real time for `git fetch origin`, not 5100 ms.

Is your git-fetch a shell script?  Or a compiled binary?  The port
into C made it go _much_ faster, even though it is still a naive
O(N^2) matching algorithm.  Yea, we still should fix that, but
I think an upgrade to 1.5.4 or later would make the client side
improve considerably.

-- 
Shawn.

* Re: q: git-fetch a tad slow?
  2008-07-31  4:45         ` Shawn O. Pearce
@ 2008-07-31 21:03           ` Ingo Molnar
  2008-07-31 21:11             ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2008-07-31 21:03 UTC
  To: Shawn O. Pearce; +Cc: git


* Shawn O. Pearce <spearce@spearce.org> wrote:

> Ingo Molnar <mingo@elte.hu> wrote:
> > alas, fetching still seems to be slow:
> > 
> >   titan:~/tip> time git-fetch origin
> > 
> >   real    0m5.112s
> >   user    0m0.972s
> >   sys     0m3.380s
> 
> What version of git are you dealing with on the client side?

the client side on titan has:

 titan:~> git version
 git version 1.5.2.2

oldish but not outrageously old, right?

 server side has:

 earth4:~> git version
 git version 1.5.6.1.108.g660379

> 
> fetch times of ~472 ms over git:// to your -tip.git tree and ~128 ms 
> for strictly local fetch.  If your SSH overhead is ~300 ms this is 
> only a ~700 ms real time for `git fetch origin`, not 5100 ms.
> 
> Is your git-fetch a shell script?  Or a compiled binary?  The port 
> into C made it go _much_ faster, even though it is still a naive 
> O(N^2) matching algorithm.  Yea, we still should fix that, but I think 
> an upgrade to 1.5.4 or later would make the client side improve 
> considerably.

ah, it is a shell script indeed! I'll upgrade to latest.

	Ingo

* Re: q: git-fetch a tad slow?
  2008-07-31 21:03           ` Ingo Molnar
@ 2008-07-31 21:11             ` Ingo Molnar
  2008-07-31 21:19               ` Shawn O. Pearce
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2008-07-31 21:11 UTC
  To: Shawn O. Pearce; +Cc: git


* Ingo Molnar <mingo@elte.hu> wrote:

> > for strictly local fetch.  If your SSH overhead is ~300 ms this is 
> > only a ~700 ms real time for `git fetch origin`, not 5100 ms.
> > 
> > Is your git-fetch a shell script?  Or a compiled binary?  The port 
> > into C made it go _much_ faster, even though it is still a naive 
> > O(N^2) matching algorithm.  Yea, we still should fix that, but I 
> > think an upgrade to 1.5.4 or later would make the client side 
> > improve considerably.
> 
> ah, it is a shell script indeed! I'll upgrade to latest.

on another box, with 1.5.4, i have:

 dione:~/tip> time git fetch origin

 real    0m0.481s
 user    0m0.136s
 sys     0m0.060s

 dione:~/tip> time ./tip-fetch
 b714d1a257cca93ba6422ca3276ac80a2cde2b59
 b714d1a257cca93ba6422ca3276ac80a2cde2b59

 real    0m0.273s
 user    0m0.012s
 sys     0m0.020s

that's a 2.66 GHz core2 quad, i.e. a pretty fast box too. As you can see 
most time spent in the tip-fetch case was waiting for the network. So 
there's about 200 msecs of extra CPU cost on the local side. On a CPU 
1-2 generations older that could be up to 1000 msecs or more.
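
For reference, the gaps between the two dione timings above can be
worked out as below. This is only arithmetic on the quoted numbers (the
function name is made up for the example): it separates the wall-clock
difference from the difference in local CPU time (user + sys).

```shell
#!/bin/sh
# Arithmetic on the timings quoted above, in seconds.
timing_gap() {
    awk 'BEGIN {
        wall = 0.481 - 0.273                      # real: git fetch vs tip-fetch
        cpu  = (0.136 + 0.060) - (0.012 + 0.020)  # user + sys: git fetch vs tip-fetch
        printf "wall gap: %.3f s, cpu gap: %.3f s\n", wall, cpu
    }'
}
```

Calling `timing_gap` prints a wall-clock gap of 0.208 s and a local CPU
gap of 0.164 s.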

In any case, performance has improved significantly with the C version! 
(I'll still use tip-fetch to squeeze out the last bit of performance, 
but it's quite comparable now.)

Sorry that i didnt notice that titan had 1.5.2 - i almost never notice 
it when i switch between stable git versions. (you guys are doing a 
really good job on compatibility)

	Ingo

* Re: q: git-fetch a tad slow?
  2008-07-31 21:11             ` Ingo Molnar
@ 2008-07-31 21:19               ` Shawn O. Pearce
  0 siblings, 0 replies; 10+ messages in thread
From: Shawn O. Pearce @ 2008-07-31 21:19 UTC
  To: Ingo Molnar; +Cc: git

Ingo Molnar <mingo@elte.hu> wrote:
> 
> on another box, with 1.5.4, i have:
> 
>  dione:~/tip> time git fetch origin
> 
>  real    0m0.481s
>  user    0m0.136s
>  sys     0m0.060s
> 
>  dione:~/tip> time ./tip-fetch
>  b714d1a257cca93ba6422ca3276ac80a2cde2b59
>  b714d1a257cca93ba6422ca3276ac80a2cde2b59
> 
>  real    0m0.273s
>  user    0m0.012s
>  sys     0m0.020s
> 
> that's a 2.66 GHz core2 quad, i.e. a pretty fast box too. As you can see 
> most time spent in the tip-fetch case was waiting for the network. So 
> there's about 200 msecs of extra CPU cost on the local side.

Yea.  My testing last night was suggesting about 1/2 of that 200
ms is on the client, and the other 100 ms is on the server side
of the connection.  That matches up somewhat with your test above,
where git-fetch used about 100 ms more user time on the client side
than your tip-fetch shell script.

I have no clue where the bottleneck is, I didn't get that far before
I realized you must have been running a shell script based git-fetch
to be seeing the performance you were.

Maybe in 1.6.1 or 1.6.2 we can try to squeeze fetch to run faster.
It's far too late for 1.6.0.

> Sorry that i didnt notice that titan had 1.5.2 - i almost never notice 
> it when i switch between stable git versions. (you guys are doing a 
> really good job on compatibility)

Yea, it's easy to not realize your git isn't giving you the latest
and greatest toys.  ;-)

-- 
Shawn.
