git@vger.kernel.org mailing list mirror (one of many)
From: Linus Torvalds <torvalds@osdl.org>
To: Keith Packard <keithp@keithp.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: git-fetch per-repository speed issues
Date: Mon, 3 Jul 2006 22:36:15 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0607032213030.12404@g5.osdl.org>
In-Reply-To: <1151989503.4723.126.camel@neko.keithp.com>



On Mon, 3 Jul 2006, Keith Packard wrote:
> 
> 5 Start:                             21:59:01.584648000
> 66 After args:                       21:59:01.605987000
> 248 fetch_main() start:              21:59:02.408559000
> 339 fetch_main() before fetch-pack:  21:59:03.293228000
> 387 fetch_main() done:               21:59:04.784388000
> 422 After tag following:             21:59:05.311439000
> 438 All done:                        21:59:05.315338000
> 
> fetch-pack itself took 0.421 seconds (measured with time(1)).
> 
> Looks like the bulk of the time here is caused by simple shell
> processing overhead, some of which scales with the number of heads and
> tags to track.

Ahh.. Do you have tons of tags at the other end?

Looking closer, I suspect a big part of it is that

	git-ls-remote $upload_pack --tags "$remote" |
	sed -ne 's|^\([0-9a-f]*\)[      ]\(refs/tags/.*\)^{}$|\1 \2|p' |
	while read sha1 name
	do
		..
	done

loop.
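
To make the filter in that pipeline concrete, here is a minimal sketch that feeds it made-up input. The sample refs and the helper names sample_refs and peeled_tags are hypothetical, and [[:space:]] is an assumed stand-in for the literal space-and-tab bracket expression in the quoted snippet; the body of the original loop is elided above and is not reproduced here.

```shell
#!/bin/sh
# Hypothetical ls-remote style output: "<sha1><TAB><refname>".
sample_refs() {
	printf '%s\t%s\n' \
		1111111111111111111111111111111111111111 refs/heads/master \
		2222222222222222222222222222222222222222 refs/tags/v1.0 \
		3333333333333333333333333333333333333333 'refs/tags/v1.0^{}'
}

# Same substitution as in the quoted loop: keep only the peeled
# "refs/tags/<name>^{}" lines and strip the "^{}" suffix.
# [[:space:]] is an assumption standing in for the space/tab bracket.
peeled_tags() {
	sed -ne 's|^\([0-9a-f]*\)[[:space:]]\(refs/tags/.*\)^{}$|\1 \2|p'
}

# One shell iteration per peeled tag, which is where the per-tag
# overhead comes from.
sample_refs | peeled_tags |
while read -r sha1 name
do
	echo "$name peels to $sha1"
done
```

Only the third sample line survives the filter, so the loop body runs once here; with many annotated tags at the other end it runs once per tag.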

With a lot of tags, the shell overhead there can indeed be pretty
disgusting. And I was wrong - I thought it would run that git-ls-remote
only if we noticed the first time around that we needed to, but we
actually run it every time we're fetching any new branches.

The sad part is that we really already got the list once, we just never
saved it away (i.e. "git-fetch-pack" actually _knows_ what the tags at the
other end are, and also knows which tags we already have, so if we made
git-fetch-pack just create that list and save it off, all the overhead
would just go away).

And yes, the shell script loops are really really simple, but some of them
are actually quadratic in the number of refs (O(local*remote)). If this
were a C program, we'd never even care, but with shell, the thing is slow
enough that having even a modest number of tags and refs is going to just
make it waste a lot of time in shell scripting.
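
As a rough illustration of the O(local*remote) shape, the sketch below finds remote tags that are missing locally in two ways: with nested shell loops, and with a single pass through awk. The ref lists and the function names missing_nested and missing_single_pass are hypothetical; this is not the code in git-fetch, just the same pattern in miniature.

```shell
#!/bin/sh
# Hypothetical sample ref lists; real ones would come from the two ends.
remote_tags='refs/tags/v1.0
refs/tags/v1.1
refs/tags/v1.2'
local_tags='refs/tags/v1.0
refs/tags/v1.2'

# Nested loops: every remote tag is compared against every local tag,
# so the work grows as O(local*remote).
missing_nested() {
	printf '%s\n' "$remote_tags" |
	while read -r remote
	do
		found=
		for have in $local_tags
		do
			test "$remote" = "$have" && found=t
		done
		if test -z "$found"
		then
			echo "$remote"
		fi
	done
}

# Single pass: load the local tags into an awk array, then print each
# remote tag that is not in it. "--" is an arbitrary separator line.
missing_single_pass() {
	{
		printf '%s\n' "$local_tags"
		echo "--"
		printf '%s\n' "$remote_tags"
	} |
	awk '
		$0 == "--" { seen_sep = 1; next }
		!seen_sep  { have[$0] = 1; next }
		!($0 in have)
	'
}

missing_nested
missing_single_pass
```

Both functions print refs/tags/v1.1 for this input. The nested version does its comparisons in the shell itself, while the second hands the whole comparison to one awk process.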

We already do a lot of the infrastructure for "git fetch" in C - the
remotes parsing etc. is all code that "git fetch" used to share with "git
push", but "git push" has been a builtin C program for a while now. I
suspect we should just do the same to "git fetch", which would make all
these issues just totally go away.

			Linus

