* git-fetch per-repository speed issues
@ 2006-07-03 18:02 Keith Packard
2006-07-03 23:14 ` Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 30+ messages in thread
From: Keith Packard @ 2006-07-03 18:02 UTC
To: Git Mailing List; +Cc: keithp
Ok, so maybe X.org is using git in an unexpected (or even wrong)
fashion. Our environment has split development across dozens of separate
repositories which match ABI interfaces. With CVS, we were able to keep
this all in one giant CVS repository with separate modules, but git
doesn't have that notion (which is mostly good). As such, we could use
cvsup or rsync to update the entire collection of modules.
With git, we'd prefer to use the git protocol instead of rsync for the
usual pack-related reasons, but that is limited to a single repository
at a time. And, it's painfully slow, even when the repository is up to
date:
$ cd lib/libXrandr
$ time git-fetch origin
...
real 0m17.035s
user 0m2.584s
sys 0m0.576s
This is a repository with 24 files and perhaps 50 revisions. Given
X.org's 307 git repositories, I'll clearly need to find a faster way
than running git-fetch on every one.
One thing I noticed was that the git+ssh syntax found in remotes files
doesn't do what I thought it did -- I assumed this would use 'git' for
fetch and 'ssh' for push, when in fact it just uses ssh for everything.
This slows down the connection process by several seconds.
--
keith.packard@intel.com
* Re: git-fetch per-repository speed issues
2006-07-03 18:02 git-fetch per-repository speed issues Keith Packard
@ 2006-07-03 23:14 ` Linus Torvalds
2006-07-04 0:21 ` Jeff King
[not found] ` <1151973438.4723.70.camel@neko.keithp.com>
2006-07-04 15:42 ` Jakub Narebski
2006-07-06 23:36 ` David Woodhouse
2 siblings, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2006-07-03 23:14 UTC
To: Keith Packard; +Cc: Git Mailing List
On Mon, 3 Jul 2006, Keith Packard wrote:
>
> With git, we'd prefer to use the git protocol instead of rsync for the
> usual pack-related reasons, but that is limited to a single repository
> at a time.
Well, you could use multiple branches in the same repository, even if they
are totally unrelated. That would allow you to fetch them all in one go.
One way to do that is to just name the branches hierarchically: have one
repo, but then call the branches something like
libXrandr/master
libXrandr/develop
Xorg/master
Xorg/develop
..
> And, it's painfully slow, even when the repository is up to
> date:
>
> $ cd lib/libXrandr
> $ time git-fetch origin
> ...
>
> real 0m17.035s
> user 0m2.584s
> sys 0m0.576s
That's _seriously_ wrong. If everything is up-to-date, a fetch should be
basically zero-cost. That's especially true with the anonymous git
protocol, which doesn't have any connection validation overhead (for the
ssh protocol, the cost is usually the ssh login).
But there may well be some bug there.
Look at this:
[torvalds@g5 git]$ time git fetch git://git.kernel.org/pub/scm/git/git.git
real 0m0.431s
user 0m0.036s
sys 0m0.024s
and that's over my DSL line, not some studly network thing.
Basically, a repo that is up-to-date should do a "git fetch" about as
quickly as it does a "git ls-remote". Which in turn really shouldn't be
doing much anything at all, apart from the connect itself:
[torvalds@g5 git]$ time git ls-remote master.kernel.org:/pub/scm/git/git.git > /dev/null
real 0m1.758s
user 0m0.188s
sys 0m0.024s
[torvalds@g5 git]$ time git ls-remote git://git.kernel.org/pub/scm/git/git.git > /dev/null
real 0m0.431s
user 0m0.056s
sys 0m0.016s
(note how the ssh connection is much slower - it actually ends up doing
all the ssh back-and-forth).
Can you try from different hosts? One problem may be the remote end
just trying to do reverse DNS lookups for xinetd or whatever?
Also, one thing to try is to just do
strace -Ttt git-peek-remote ...
which shows where the time is going (I selected "git-peek-remote", because
that's a simple program).
Linus
* Re: git-fetch per-repository speed issues
2006-07-03 23:14 ` Linus Torvalds
@ 2006-07-04 0:21 ` Jeff King
2006-07-04 1:22 ` Ryan Anderson
` (2 more replies)
[not found] ` <1151973438.4723.70.camel@neko.keithp.com>
1 sibling, 3 replies; 30+ messages in thread
From: Jeff King @ 2006-07-04 0:21 UTC
To: Linus Torvalds; +Cc: Keith Packard, Git Mailing List
On Mon, Jul 03, 2006 at 04:14:10PM -0700, Linus Torvalds wrote:
> Well, you could use multiple branches in the same repository, even if they
> are totally unrelated. That would allow you to fetch them all in one go.
One annoying thing about this is that you may want to have several of
the branches checked out at a time (i.e., you want the actual directory
structure of libXrandr/, Xorg/, etc). You could pull everything down
into one repo and point small pseudo-repos at it with alternates, but I
would think that would become a mess with pushes. You can do some magic
with read-tree --prefix, but again, I'm not sure how you'd make commits
on the correct branch. Is there an easier way to do this?
> Basically, a repo that is up-to-date should do a "git fetch" about as
> quickly as it does a "git ls-remote". Which in turn really shouldn't be
> doing much anything at all, apart from the connect itself:
Fetching by ssh actually makes two ssh connections (the second is to
grab tags).
-Peff
* Re: git-fetch per-repository speed issues
2006-07-04 0:21 ` Jeff King
@ 2006-07-04 1:22 ` Ryan Anderson
2006-07-04 1:44 ` Jeff King
2006-07-04 3:07 ` Linus Torvalds
2006-07-04 6:44 ` Jakub Narebski
2 siblings, 1 reply; 30+ messages in thread
From: Ryan Anderson @ 2006-07-04 1:22 UTC
To: Jeff King; +Cc: Linus Torvalds, Keith Packard, Git Mailing List
Jeff King wrote:
> On Mon, Jul 03, 2006 at 04:14:10PM -0700, Linus Torvalds wrote:
>
>
>> Well, you could use multiple branches in the same repository, even if they
>> are totally unrelated. That would allow you to fetch them all in one go.
>>
>
> One annoying thing about this is that you may want to have several of
> the branches checked out at a time (i.e., you want the actual directory
> structure of libXrandr/, Xorg/, etc). You could pull everything down
> into one repo and point small pseudo-repos at it with alternates, but I
> would think that would become a mess with pushes. You can do some magic
> with read-tree --prefix, but again, I'm not sure how you'd make commits
> on the correct branch. Is there an easier way to do this?
>
You can have multiple source trees, one per 'branch' (which is a bit of
a bad term here), and have completely unrelated things in the branches.
See, for an example, the main Git repo, which has the "man", "html", and
"todo" branches, logically distinct and (somewhat) unrelated to the main
branch tucked away in "master".
--
Ryan Anderson
sometimes Pug Majere
* Re: git-fetch per-repository speed issues
2006-07-04 1:22 ` Ryan Anderson
@ 2006-07-04 1:44 ` Jeff King
2006-07-04 1:55 ` Ryan Anderson
0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2006-07-04 1:44 UTC
To: Ryan Anderson; +Cc: Linus Torvalds, Keith Packard, Git Mailing List
On Mon, Jul 03, 2006 at 06:22:26PM -0700, Ryan Anderson wrote:
> You can have multiple source trees, one per 'branch' (which is a bit of
> a bad term here), and have completely unrelated things in the branches.
>
> See, for an example, the main Git repo, which has the "man", "html", and
> "todo" branches, logically distinct and (somewhat) unrelated to the main
> branch tucked away in "master".
Right, I know, but my complaint is that I can't then turn that into a
directory hierarchy of .../man, .../html, .../todo that are all checked
out at the same time (there are obviously ways of playing with it, say
by setting GIT_DIR and doing a checkout in those directories, but then I
can't use git in the normal way).
The best I can come up with is having man, html, and todo repos pointing
to the one (now local) repo which contains everything. But then pushing
is a two-step process.
-Peff
* Re: git-fetch per-repository speed issues
2006-07-04 1:44 ` Jeff King
@ 2006-07-04 1:55 ` Ryan Anderson
0 siblings, 0 replies; 30+ messages in thread
From: Ryan Anderson @ 2006-07-04 1:55 UTC
To: Jeff King; +Cc: Linus Torvalds, Keith Packard, Git Mailing List
Jeff King wrote:
> On Mon, Jul 03, 2006 at 06:22:26PM -0700, Ryan Anderson wrote:
>
>
>> You can have multiple source trees, one per 'branch' (which is a bit of
>> a bad term here), and have completely unrelated things in the branches.
>>
>> See, for an example, the main Git repo, which has the "man", "html", and
>> "todo" branches, logically distinct and (somewhat) unrelated to the main
>> branch tucked away in "master".
>>
>
> Right, I know, but my complaint is that I can't then turn that into a
> directory hierarchy of .../man, .../html, .../todo that are all checked
> out at the same time (there are obviously ways of playing with it, say
> by setting GIT_DIR and doing a checkout in those directories, but then I
> can't use git in the normal way).
>
> The best I can come up with is having man, html, and todo repos pointing
> to the one (now local) repo which contains everything. But then pushing
> is a two-step process.
>
>
Hrm, if I understand CVS at all, the old workflow was "cvsup a copy of
the repository, update a working tree against that", which is, I think,
actually even worse than the Git equivalent, since you can't reliably
even commit to that local clone of the CVS repository.
What am I missing?
You can still push directly upstream, I suppose, and just do 2-stage
pulls down.
--
Ryan Anderson
sometimes Pug Majere
* Re: git-fetch per-repository speed issues
2006-07-04 0:21 ` Jeff King
2006-07-04 1:22 ` Ryan Anderson
@ 2006-07-04 3:07 ` Linus Torvalds
2006-07-05 6:47 ` Jeff King
2006-07-04 6:44 ` Jakub Narebski
2 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2006-07-04 3:07 UTC
To: Jeff King; +Cc: Keith Packard, Git Mailing List
On Mon, 3 Jul 2006, Jeff King wrote:
>
> Fetching by ssh actually makes two ssh connections (the second is to
> grab tags).
True. Although that should happen only if there are any new tags.
Linus
* Re: git-fetch per-repository speed issues
[not found] ` <1151973438.4723.70.camel@neko.keithp.com>
@ 2006-07-04 3:21 ` Linus Torvalds
2006-07-04 3:30 ` Junio C Hamano
2006-07-04 4:02 ` Keith Packard
0 siblings, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2006-07-04 3:21 UTC
To: Keith Packard; +Cc: Git Mailing List, Junio C Hamano
On Mon, 3 Jul 2006, Keith Packard wrote:
> On Mon, 2006-07-03 at 16:14 -0700, Linus Torvalds wrote:
> >
> > Well, you could use multiple branches in the same repository, even if they
> > are totally unrelated. That would allow you to fetch them all in one go.
>
> I'd like to avoid this; the hope is that most people won't ever need to
> look at most repositories; it would be somewhat like having glibc in the
> same repo as the kernel...
Sure, understood. I'm just saying that if you want to fetch in one go,
it's one possibility.
However, your setup has something else seriously wrong.
> Yeah, I tried with the git protocol and it's a few seconds faster (about
> 14 seconds instead of 17). Ick.
That's -still- about 13 seconds too much.
> I think it might have something to do with the number of heads we're
> tracking.
It really shouldn't matter. You get all the heads in one go with a single
connection, so if 32 heads takes 32 times longer, there's something wrong.
> > Also, one thing to try is to just do
> >
> > strace -Ttt git-peek-remote ...
>
> That's plenty fast, 0.410 seconds, with nothing ugly in the strace.
Ok, a "git fetch" really shouldn't take any longer than a single
connection. However, the fact that you have 32 heads, and it takes pretty
close to _exactly_ 32 times 0.410 seconds (32*0.410s = 13.1s) makes me
suspect that "git fetch" is just broken and fetches one branch at a time.
Which would be just stupid.
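That multiplication is easy to check with a one-liner. This is only a sketch: the 32 heads and the 0.410 seconds per connection are the figures quoted above, and the helper name `scaled_cost` is made up for illustration.

```shell
# Hypothetical helper: multiply a per-connection cost (seconds) by a count.
scaled_cost() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f\n", n * t }'
}

scaled_cost 32 0.410   # prints 13.12
```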
But look as I might, I see only that one "git-fetch-pack" in git-fetch.sh
that should trigger. Once. Not 32 times. But your timings sure sound like
it's doing a _lot_ more than it should.
Junio, any ideas?
Keithp, can you try this trivial patch? It _should_ say something like
Fetching
refs/heads/master
refs/heads/...
refs/heads/...
...
refs/heads/... from git://..../...
and more importantly, it should say so only once.
And then it should leave a "fetch.trace" file in your working directory,
which should show where that _one_ thing spends its time.
Linus
----
diff --git a/git-fetch.sh b/git-fetch.sh
index 48818f8..4739202 100755
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -339,6 +339,8 @@ fetch_main () {
( : subshell because we muck with IFS
IFS=" $LF"
(
+ echo "Fetching $rref from $remote" >&2
+ strace -o fetch.trace -Ttt \
git-fetch-pack $exec $keep --thin "$remote" $rref || echo failed "$remote"
) |
while read sha1 remote_name
* Re: git-fetch per-repository speed issues
2006-07-04 3:21 ` Linus Torvalds
@ 2006-07-04 3:30 ` Junio C Hamano
2006-07-04 3:40 ` Linus Torvalds
2006-07-04 4:02 ` Keith Packard
1 sibling, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2006-07-04 3:30 UTC
To: Linus Torvalds; +Cc: git
Linus Torvalds <torvalds@osdl.org> writes:
> Ok, a "git fetch" really shouldn't take any longer than a single
> connection. However, the fact that you have 32 heads, and it takes pretty
> close to _exactly_ 32 times 0.410 seconds (32*0.410s = 13.1s) makes me
> suspect that "git fetch" is just broken and fetches one branch at a time.
>
> Which would be just stupid.
>
> But look as I might, I see only that one "git-fetch-pack" in git-fetch.sh
> that should trigger. Once. Not 32 times. But your timings sure sound like
> it's doing a _lot_ more than it should.
>
> Junio, any ideas?
Isn't that because the repository has 32 subprojects, totally
unrelated content-wise? If you have real stuff to pull from
there your pack generation needs to do 32 times as much work as
you would for a single head in that case.
If you are discussing "peek-remote runs, find out the 32 heads
are all up to date and no pack is generated" case, then you are
right. There is one single fetch-pack to grab the specified
heads, and after that, an optional single ls-remote and
fetch-pack runs only once to follow all new tags.
* Re: git-fetch per-repository speed issues
2006-07-04 3:30 ` Junio C Hamano
@ 2006-07-04 3:40 ` Linus Torvalds
2006-07-04 4:30 ` Keith Packard
0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2006-07-04 3:40 UTC
To: Junio C Hamano; +Cc: git
On Mon, 3 Jul 2006, Junio C Hamano wrote:
>
> Isn't that because the repository has 32 subprojects, totally
> unrelated content-wise? If you have real stuff to pull from
> there your pack generation needs to do 32 times as much work as
> you would for a single head in that case.
No, Keith said this was for the case where the fetching repository is
already totally up-to-date:
"And, it's painfully slow, even when the repository is up to date"
and gave a 17-second time.
Linus
* Re: git-fetch per-repository speed issues
2006-07-04 3:21 ` Linus Torvalds
2006-07-04 3:30 ` Junio C Hamano
@ 2006-07-04 4:02 ` Keith Packard
2006-07-04 4:19 ` Linus Torvalds
1 sibling, 1 reply; 30+ messages in thread
From: Keith Packard @ 2006-07-04 4:02 UTC
To: Linus Torvalds, Git Mailing List; +Cc: keithp
On Mon, 2006-07-03 at 20:21 -0700, Linus Torvalds wrote:
> Keithp, can you try this trivial patch? It _should_ say something like
Yeah, it says that only once. And, it runs the fetch-pack in about .5
seconds. And, now the whole process completes in 4.7 seconds; perhaps
the remote server is less loaded than earlier this afternoon? It's also
possible that I was running old git bits here, but I don't think so.
> And then it should leave a "fetch.trace" file in your working directory,
> which should show where that _one_ thing spends its time.
It looks boring to me and spent 0.55 from start to finish. I can send
along the whole trace if you have an acute desire to peer at it.
--
keith.packard@intel.com
* Re: git-fetch per-repository speed issues
2006-07-04 4:02 ` Keith Packard
@ 2006-07-04 4:19 ` Linus Torvalds
2006-07-04 5:05 ` Keith Packard
2006-07-04 5:29 ` Keith Packard
0 siblings, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2006-07-04 4:19 UTC
To: Keith Packard; +Cc: Git Mailing List
On Mon, 3 Jul 2006, Keith Packard wrote:
>
> Yeah, it says that only once. And, it runs the fetch-pack in about .5
> seconds. And, now the whole process completes in 4.7 seconds; perhaps
> the remote server is less loaded than earlier this afternoon?
Well, that's still strange. What takes 4.2 seconds then?
> > And then it should leave a "fetch.trace" file in your working directory,
> > which should show where that _one_ thing spends its time.
>
> It looks boring to me and spent 0.55 from start to finish. I can send
> along the whole trace if you have an acute desire to peer at it.
No, the 0.5 seconds is what I _expected_. There's something strange going
on in your git fetch that it takes any longer than that.
Can you instrument your "git-fetch.sh" script (just add random
(echo $LINENO ; date) >&2
lines all over) to see what is so expensive?
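Wrapping that marker in a small function makes it easier to sprinkle around. This is a sketch only: the name `stamp` is made up, and the `%N` (nanoseconds) format is a GNU date extension, so it assumes GNU date is installed.

```shell
# Hypothetical helper: print a label and a sub-second timestamp to stderr.
# %N is a GNU date extension; other date(1) implementations may not have it.
stamp() {
    echo "$1: $(date +%H:%M:%S.%N)" >&2
}

stamp "$LINENO Start"
sleep 1                  # stand-in for the work being measured
stamp "$LINENO After work"
```

Writing to stderr keeps the markers out of any pipeline the script feeds.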
That fetch-pack really should be the most expensive part by far (and half
a second sounds right), but it clearly isn't. At 4.7s, your fetch is still
taking about ten times longer than it _should_.
Linus
* Re: git-fetch per-repository speed issues
2006-07-04 3:40 ` Linus Torvalds
@ 2006-07-04 4:30 ` Keith Packard
2006-07-04 11:10 ` Andreas Ericsson
0 siblings, 1 reply; 30+ messages in thread
From: Keith Packard @ 2006-07-04 4:30 UTC
To: Linus Torvalds; +Cc: keithp, Junio C Hamano, git
On Mon, 2006-07-03 at 20:40 -0700, Linus Torvalds wrote:
> "And, it's painfully slow, even when the repository is up to date"
>
> and gave a 17-second time.
It's faster this evening, down to 8 seconds using ssh and 4 seconds
using git. I clearly need to force use of the git protocol. Anyone else
like the attached patch?
---
connect.c | 18 ++++++++++++++----
1 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/connect.c b/connect.c
index 9a87bd9..e74eddc 100644
--- a/connect.c
+++ b/connect.c
@@ -303,6 +303,7 @@ enum protocol {
PROTO_LOCAL = 1,
PROTO_SSH,
PROTO_GIT,
+ PROTO_GIT_SSH,
};
static enum protocol get_protocol(const char *name)
@@ -312,9 +313,9 @@ static enum protocol get_protocol(const
if (!strcmp(name, "git"))
return PROTO_GIT;
if (!strcmp(name, "git+ssh"))
- return PROTO_SSH;
+ return PROTO_GIT_SSH;
if (!strcmp(name, "ssh+git"))
- return PROTO_SSH;
+ return PROTO_GIT_SSH;
die("I don't handle protocol '%s'", name);
}
@@ -572,6 +573,14 @@ static void git_proxy_connect(int fd[2],
close(pipefd[1][0]);
}
+/* returns whether the specified command can be interpreted by the daemon */
+int git_is_daemon_command (const char *prog)
+{
+ if (!strcmp("git-upload-pack", prog))
+ return 1;
+ return 0;
+}
+
/*
* Yeah, yeah, fixme. Need to pass in the heads etc.
*/
@@ -641,7 +650,8 @@ int git_connect(int fd[2], char *url, co
*ptr = '\0';
}
- if (protocol == PROTO_GIT) {
+ if (protocol == PROTO_GIT ||
+ (protocol == PROTO_GIT_SSH && git_is_daemon_command (prog))) {
/* These underlying connection commands die() if they
* cannot connect.
*/
@@ -678,7 +688,7 @@ int git_connect(int fd[2], char *url, co
close(pipefd[0][1]);
close(pipefd[1][0]);
close(pipefd[1][1]);
- if (protocol == PROTO_SSH) {
+ if (protocol == PROTO_SSH || protocol == PROTO_GIT_SSH) {
const char *ssh, *ssh_basename;
ssh = getenv("GIT_SSH");
if (!ssh) ssh = "ssh";
--
1.4.1.g8fced-dirty
--
keith.packard@intel.com
* Re: git-fetch per-repository speed issues
2006-07-04 4:19 ` Linus Torvalds
@ 2006-07-04 5:05 ` Keith Packard
2006-07-04 5:36 ` Linus Torvalds
2006-07-04 5:29 ` Keith Packard
1 sibling, 1 reply; 30+ messages in thread
From: Keith Packard @ 2006-07-04 5:05 UTC
To: Linus Torvalds; +Cc: keithp, Git Mailing List
On Mon, 2006-07-03 at 21:19 -0700, Linus Torvalds wrote:
> Can you instrument your "git-fetch.sh" script (just add random
>
> (echo $LINENO ; date) >&2
>
> lines all over) to see what is so expensive?
5 Start: 21:59:01.584648000
66 After args: 21:59:01.605987000
248 fetch_main() start: 21:59:02.408559000
339 fetch_main() before fetch-pack: 21:59:03.293228000
387 fetch_main() done: 21:59:04.784388000
422 After tag following: 21:59:05.311439000
438 All done: 21:59:05.315338000
fetch-pack itself took 0.421 seconds (measured with time(1)).
Looks like the bulk of the time here is caused by simple shell
processing overhead, some of which scales with the number of heads and
tags to track.
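The per-phase gaps can be read off the timestamps above with a short awk filter. This is a sketch: the function name `phase_deltas` is made up, and it assumes each input line ends in an HH:MM:SS.fraction timestamp, as the lines above do.

```shell
# Hypothetical helper: for each line after the first, print the time elapsed
# since the previous marker, followed by the current marker's label.
phase_deltas() {
    awk '{
        split($NF, t, ":")                    # last field is HH:MM:SS.frac
        now = t[1] * 3600 + t[2] * 60 + t[3]
        label = $0
        sub(/ +[^ ]+$/, "", label)            # drop the trailing timestamp
        if (NR > 1) printf "%.3f s to reach: %s\n", now - prev, label
        prev = now
    }'
}

phase_deltas <<'EOF'
5 Start: 21:59:01.584648000
66 After args: 21:59:01.605987000
248 fetch_main() start: 21:59:02.408559000
339 fetch_main() before fetch-pack: 21:59:03.293228000
387 fetch_main() done: 21:59:04.784388000
422 After tag following: 21:59:05.311439000
438 All done: 21:59:05.315338000
EOF
```

The largest single gap is the 1.491 s between lines 339 and 387, of which fetch-pack itself accounts for only 0.421 s.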
--
keith.packard@intel.com
* Re: git-fetch per-repository speed issues
2006-07-04 4:19 ` Linus Torvalds
2006-07-04 5:05 ` Keith Packard
@ 2006-07-04 5:29 ` Keith Packard
2006-07-04 5:53 ` Linus Torvalds
1 sibling, 1 reply; 30+ messages in thread
From: Keith Packard @ 2006-07-04 5:29 UTC
To: Linus Torvalds; +Cc: keithp, Git Mailing List
On Mon, 2006-07-03 at 21:19 -0700, Linus Torvalds wrote:
> Well, that's still strange. What takes 4.2 seconds then?
$ strace -e trace=execve -f git-fetch 2>&1 | grep execve | sed -e 's/^.*execve("//' -e 's/".*$//' | sort | uniq -c | sort -n
1 /bin/rm
1 /home/keithp/bin/git
1 /home/keithp/bin/git-fetch
1 /home/keithp/bin/git-fetch-pack
1 /home/keithp/bin/git-ls-remote
1 /home/keithp/bin/git-peek-remote
1 /usr/bin/sort
3 /bin/sed
4 /home/keithp/bin/git-repo-config
30 /bin/mkdir
30 /home/keithp/bin/git-cat-file
30 /home/keithp/bin/git-check-ref-format
30 /home/keithp/bin/git-merge-base
30 /usr/bin/dirname
64 /home/keithp/bin/git-rev-parse
361 /usr/bin/expr
someone sure likes 'expr'...
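To see what each stage of that pipeline does, it can be run on canned input. This is a sketch: the three trace lines are made up to resemble `strace -f` output, and the function names `sample_trace` and `count_execs` are hypothetical.

```shell
# Made-up lines in the style of "strace -f -e trace=execve" output.
sample_trace() {
    cat <<'EOF'
[pid 101] execve("/usr/bin/expr", ["expr", "zfoo", ":", "zbar"], [/* 30 vars */]) = 0
[pid 102] execve("/bin/sed", ["sed", "-n", "p"], [/* 30 vars */]) = 0
[pid 103] execve("/usr/bin/expr", ["expr", "zfoo", ":", "zfoo"], [/* 30 vars */]) = 0
EOF
}

# Same filter chain as above: keep execve lines, cut each one down to the
# program path, then count how often every path occurs.
count_execs() {
    grep execve |
    sed -e 's/^.*execve("//' -e 's/".*$//' |
    sort | uniq -c | sort -n
}

sample_trace | count_execs
```

The first sed expression strips everything up to the opening quote of the path, and the second strips everything from the closing quote onward.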
--
keith.packard@intel.com
* Re: git-fetch per-repository speed issues
2006-07-04 5:05 ` Keith Packard
@ 2006-07-04 5:36 ` Linus Torvalds
2006-07-04 6:21 ` Junio C Hamano
0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2006-07-04 5:36 UTC
To: Keith Packard; +Cc: Git Mailing List
On Mon, 3 Jul 2006, Keith Packard wrote:
>
> 5 Start: 21:59:01.584648000
> 66 After args: 21:59:01.605987000
> 248 fetch_main() start: 21:59:02.408559000
> 339 fetch_main() before fetch-pack: 21:59:03.293228000
> 387 fetch_main() done: 21:59:04.784388000
> 422 After tag following: 21:59:05.311439000
> 438 All done: 21:59:05.315338000
>
> fetch-pack itself took 0.421 seconds (measured with time(1)).
>
> Looks like the bulk of the time here is caused by simple shell
> processing overhead, some of which scales with the number of heads and
> tags to track.
Ahh.. Do you have tons of tags at the other end?
Looking closer, I suspect a big part of it is that
git-ls-remote $upload_pack --tags "$remote" |
sed -ne 's|^\([0-9a-f]*\)[ ]\(refs/tags/.*\)^{}$|\1 \2|p' |
while read sha1 name
do
..
done
loop.
With a lot of tags, the shell overhead there can indeed be pretty
disgusting. And I was wrong - I thought it would do that git-ls-remote
only if the first time around we noticed that we would need to, but we do
actually do it all the time that we're fetching any new branches.
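For reference, this is roughly what that sed filter does to the tag list. It is a sketch on made-up data: the sha1 values and tag names are invented, the function names are hypothetical, and the pattern uses `[[:space:]]` and an escaped `\^` rather than the exact expression in git-fetch.sh.

```shell
# Made-up sample of "git ls-remote --tags" output: <sha1><TAB><refname>.
# An annotated tag shows up twice; the "^{}" line is its peeled entry.
sample_tags() {
    printf '%s\t%s\n' \
        1111111111111111111111111111111111111111 refs/tags/v1.0 \
        2222222222222222222222222222222222222222 'refs/tags/v1.0^{}' \
        3333333333333333333333333333333333333333 refs/tags/v1.1
}

# Keep only the peeled entries and strip the "^{}" suffix from the name.
peeled_tags() {
    sed -n 's|^\([0-9a-f]*\)[[:space:]]\(refs/tags/.*\)\^{}$|\1 \2|p'
}

sample_tags | peeled_tags
```

Every line that survives the filter then costs one iteration of the shell `while read` loop, which is where the per-tag overhead comes from.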
The sad part is that we really already got the list once, we just never
saved it away (ie "git-fetch-pack" actually _knows_ what the tags at the
other end are, and also knows which tags we already have, so if we made
git-fetch-pack just create that list and save it off, all the overhead
would just go away).
And yes, the shell script loops are really really simple, but some of them
are actually quadratic in the number of refs (O(local*remote)). If this
was a C program, we'd never even care, but with shell, the thing is slow
enough that having even a modest amount of tags and refs is going to just
make it waste a lot of time in shell scripting.
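To illustrate the shape of the problem (this is not the actual git-fetch.sh code, just a sketch with made-up function names), here is a nested-loop lookup next to a single-pass one that answers the same question, "which remote refs are missing locally":

```shell
# Quadratic: for every remote ref ($1), scan the whole local list ($2).
missing_quadratic() {
    for r in $1; do
        found=
        for l in $2; do
            [ "$r" = "$l" ] && found=1
        done
        [ -n "$found" ] || echo "$r"
    done
}

# Single pass: load the local refs into an awk array, then test each
# remote ref with one lookup. "---" separates the two lists.
missing_linear() {
    { printf '%s\n' $2; echo ---; printf '%s\n' $1; } |
    awk '$0 == "---" { seen_sep = 1; next }
         !seen_sep   { have[$0] = 1; next }
         !($0 in have)'
}

remote="refs/tags/a refs/tags/b refs/tags/c"
local_refs="refs/tags/a refs/tags/c"
missing_quadratic "$remote" "$local_refs"
missing_linear "$remote" "$local_refs"
```

Both calls print refs/tags/b; the second does it with a fixed number of processes no matter how many refs there are.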
We already do a lot of the infrastructure for "git fetch" in C - the
remotes parsing etc is all things that "git fetch" used to share with "git
push", but "git push" has been a builtin C program for a while now. I
suspect we should just do the same to "git fetch", which would make all
these issues just totally go away.
Linus
* Re: git-fetch per-repository speed issues
2006-07-04 5:29 ` Keith Packard
@ 2006-07-04 5:53 ` Linus Torvalds
0 siblings, 0 replies; 30+ messages in thread
From: Linus Torvalds @ 2006-07-04 5:53 UTC
To: Keith Packard; +Cc: Git Mailing List
On Mon, 3 Jul 2006, Keith Packard wrote:
>
> 361 /usr/bin/expr
>
> someone sure likes 'expr'...
Heh. That's a very Junio thing to do.
Junio seems to like
if expr "z$string" : "z<regexp>" >/dev/null
then
..
and I think he explained it as being the way old-fashioned users do it.
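Each of those `expr` calls is a separate fork and exec, which is why they add up. For simple prefix tests a `case` statement does the same job inside the shell. A sketch, with made-up function names and an illustrative refs/tags/ pattern:

```shell
# External process: one fork+exec of expr per test.
is_tag_expr() {
    expr "z$1" : 'zrefs/tags/' >/dev/null
}

# Shell builtin: pattern match with no new process.
is_tag_case() {
    case "$1" in
    refs/tags/*) return 0 ;;
    *) return 1 ;;
    esac
}

for ref in refs/tags/v1.0 refs/heads/master; do
    if is_tag_expr "$ref" && is_tag_case "$ref"; then
        echo "$ref is a tag"
    fi
done
```

Note that `case` only does glob matching, so the tests that really need a regexp would still need `expr` or something like it.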
Linus
* Re: git-fetch per-repository speed issues
2006-07-04 5:36 ` Linus Torvalds
@ 2006-07-04 6:21 ` Junio C Hamano
0 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2006-07-04 6:21 UTC
To: Linus Torvalds; +Cc: git
Linus Torvalds <torvalds@osdl.org> writes:
> Looking closer, I suspect a big part of it is that
>
> git-ls-remote $upload_pack --tags "$remote" |
> sed -ne 's|^\([0-9a-f]*\)[ ]\(refs/tags/.*\)^{}$|\1 \2|p' |
> while read sha1 name
> do
> ..
> done
>
> loop.
Yes indeed. Maybe we can do this loop in Perl. Doing the whole
thing in C is another option but it would be somewhat painful,
unless we can deprecate all transports but the git native protocols.
On the other hand, 5 seconds may not matter that much in practice.
* Re: git-fetch per-repository speed issues
2006-07-04 0:21 ` Jeff King
2006-07-04 1:22 ` Ryan Anderson
2006-07-04 3:07 ` Linus Torvalds
@ 2006-07-04 6:44 ` Jakub Narebski
2 siblings, 0 replies; 30+ messages in thread
From: Jakub Narebski @ 2006-07-04 6:44 UTC
To: git
Jeff King wrote:
> On Mon, Jul 03, 2006 at 04:14:10PM -0700, Linus Torvalds wrote:
>
>> Well, you could use multiple branches in the same repository, even if they
>> are totally unrelated. That would allow you to fetch them all in one go.
>
> One annoying thing about this is that you may want to have several of
> the branches checked out at a time (i.e., you want the actual directory
> structure of libXrandr/, Xorg/, etc). You could pull everything down
> into one repo and point small pseudo-repos at it with alternates, but I
> would think that would become a mess with pushes. You can do some magic
> with read-tree --prefix, but again, I'm not sure how you'd make commits
> on the correct branch. Is there an easier way to do this?
Write proper subprojects support for git, or pester someone to write it
(finally). See Subpro.txt in todo branch.
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: git-fetch per-repository speed issues
2006-07-04 4:30 ` Keith Packard
@ 2006-07-04 11:10 ` Andreas Ericsson
2006-07-04 11:18 ` Matthias Kestenholz
0 siblings, 1 reply; 30+ messages in thread
From: Andreas Ericsson @ 2006-07-04 11:10 UTC
To: Keith Packard; +Cc: Linus Torvalds, Junio C Hamano, git
Keith Packard wrote:
> On Mon, 2006-07-03 at 20:40 -0700, Linus Torvalds wrote:
>
>
>> "And, it's painfully slow, even when the repository is up to date"
>>
>>and gave a 17-second time.
>
>
> It's faster this evening, down to 8 seconds using ssh and 4 seconds
> using git. I clearly need to force use of the git protocol. Anyone else
> like the attached patch?
Since it changes the current meaning of ssh+git, I'm not exactly
thrilled. However, "git/ssh" or "ssh/git" would work fine for me. The
slash-separator could be used to say "fetch over this, push over that",
so we can end up with any valid protocol to use for fetches and another
one to push over.
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
* Re: git-fetch per-repository speed issues
2006-07-04 11:10 ` Andreas Ericsson
@ 2006-07-04 11:18 ` Matthias Kestenholz
2006-07-04 12:05 ` Andreas Ericsson
0 siblings, 1 reply; 30+ messages in thread
From: Matthias Kestenholz @ 2006-07-04 11:18 UTC
To: Andreas Ericsson; +Cc: git
* Andreas Ericsson (ae@op5.se) wrote:
> Keith Packard wrote:
> >On Mon, 2006-07-03 at 20:40 -0700, Linus Torvalds wrote:
> >
> >
> >> "And, it's painfully slow, even when the repository is up to date"
> >>
> >>and gave a 17-second time.
> >
> >
> >It's faster this evening, down to 8 seconds using ssh and 4 seconds
> >using git. I clearly need to force use of the git protocol. Anyone else
> >like the attached patch?
>
> Since it changes the current meaning of ssh+git, I'm not exactly
> thrilled. However, "git/ssh" or "ssh/git" would work fine for me. The
> slash-separator could be used to say "fetch over this, push over that",
> so we can end up with any valid protocol to use for fetches and another
> one to push over.
>
If we did such a thing, we would probably be better off
allowing different URLs for pushing and pulling, because the git and
ssh URLs will only be the same if the git repositories are located
in the root folder, and I suspect that's almost never the case.
Matthias
* Re: git-fetch per-repository speed issues
2006-07-04 11:18 ` Matthias Kestenholz
@ 2006-07-04 12:05 ` Andreas Ericsson
0 siblings, 0 replies; 30+ messages in thread
From: Andreas Ericsson @ 2006-07-04 12:05 UTC
To: Matthias Kestenholz; +Cc: git
Matthias Kestenholz wrote:
> * Andreas Ericsson (ae@op5.se) wrote:
>
>>Keith Packard wrote:
>>
>>>On Mon, 2006-07-03 at 20:40 -0700, Linus Torvalds wrote:
>>>
>>>
>>>
>>>> "And, it's painfully slow, even when the repository is up to date"
>>>>
>>>>and gave a 17-second time.
>>>
>>>
>>>It's faster this evening, down to 8 seconds using ssh and 4 seconds
>>>using git. I clearly need to force use of the git protocol. Anyone else
>>>like the attached patch?
>>
>>Since it changes the current meaning of ssh+git, I'm not exactly
>>thrilled. However, "git/ssh" or "ssh/git" would work fine for me. The
>>slash-separator could be used to say "fetch over this, push over that",
>>so we can end up with any valid protocol to use for fetches and another
>>one to push over.
>>
>
>
> If we would do such a thing, we would be probably better off
> allowing different URLs for pushing and pulling, because the git and
> ssh URLs will only be the same, if the git repositories are located
> in the root folder and I suspect that's almost never the case.
>
True. We use relative paths where I work, so for us either way would
work. Your way is better though.
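
A minimal sketch of the "different URLs for pushing and pulling" idea. It assumes a later git release that supports the remote.<name>.pushurl setting and "git remote set-url --push" (neither existed when this thread was written); the host name and both paths are hypothetical.

```shell
# Throwaway repository so the example does not touch real configuration.
repo=$(mktemp -d)
git init -q "$repo"

# Hypothetical host and paths: the anonymous git:// daemon exports the
# repository under a different path than the one seen over ssh.
git -C "$repo" remote add origin git://git.example.org/lib/libXrandr
git -C "$repo" remote set-url --push origin ssh://git.example.org/srv/git/lib/libXrandr

# Fetches use remote.origin.url, pushes use remote.origin.pushurl.
fetch_url=$(git -C "$repo" config --get remote.origin.url)
push_url=$(git -C "$repo" config --get remote.origin.pushurl)
echo "fetch: $fetch_url"
echo "push:  $push_url"
```

Keeping two full URLs instead of a "git/ssh" scheme pair means the two paths do not have to match, which is the objection raised above.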
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git-fetch per-repository speed issues
2006-07-03 18:02 git-fetch per-repository speed issues Keith Packard
2006-07-03 23:14 ` Linus Torvalds
@ 2006-07-04 15:42 ` Jakub Narebski
2006-07-04 16:30 ` Thomas Glanzmann
2006-07-04 17:45 ` Junio C Hamano
2006-07-06 23:36 ` David Woodhouse
2 siblings, 2 replies; 30+ messages in thread
From: Jakub Narebski @ 2006-07-04 15:42 UTC (permalink / raw
To: git
I wonder if the problem detected here is also responsible for the results
of Jeremy Blosser's benchmark comparing git with Mercurial
http://lists.ibiblio.org/pipermail/sm-discuss/2006-May/014586.html
where git wins for clone, status and log, but is slower for pull.
See summary at
http://git.or.cz/gitwiki/GitBenchmarks#head-85df1bb7f019c4c504e34cde43450ef69349882f
--
Jakub Narebski
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git-fetch per-repository speed issues
2006-07-04 15:42 ` Jakub Narebski
@ 2006-07-04 16:30 ` Thomas Glanzmann
2006-07-04 17:45 ` Junio C Hamano
1 sibling, 0 replies; 30+ messages in thread
From: Thomas Glanzmann @ 2006-07-04 16:30 UTC (permalink / raw
To: Jakub Narebski; +Cc: git
Hello,
> See summary at
> http://git.or.cz/gitwiki/GitBenchmarks#head-85df1bb7f019c4c504e34cde43450ef69349882f
Thank you for clarifying! I finally understand why Solaris folks prefer
hg over git: It is dog slow. - So it fits the general philosophy behind
Solaris.
Thomas
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git-fetch per-repository speed issues
2006-07-04 15:42 ` Jakub Narebski
2006-07-04 16:30 ` Thomas Glanzmann
@ 2006-07-04 17:45 ` Junio C Hamano
2006-07-04 19:22 ` Linus Torvalds
1 sibling, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2006-07-04 17:45 UTC (permalink / raw
To: git; +Cc: jnareb
Jakub Narebski <jnareb@gmail.com> writes:
> I wonder if the problem detected here is also responsible for the results
> of Jeremy Blosser's benchmark comparing git with Mercurial
> http://lists.ibiblio.org/pipermail/sm-discuss/2006-May/014586.html
> where git wins for clone, status and log, but is slower for pull.
I had an impression, though the report does not talk about this
specific detail, that the extra time we are paying is because
the "git pull" test is done without suppressing the final
diffstat phase.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git-fetch per-repository speed issues
2006-07-04 17:45 ` Junio C Hamano
@ 2006-07-04 19:22 ` Linus Torvalds
2006-07-04 21:05 ` Junio C Hamano
0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2006-07-04 19:22 UTC (permalink / raw
To: Junio C Hamano; +Cc: git, jnareb
On Tue, 4 Jul 2006, Junio C Hamano wrote:
>
> I had an impression, though the report does not talk about this
> specific detail, that the extra time we are paying is because
> the "git pull" test is done without suppressing the final
> diffstat phase.
I'm pretty sure that was the reason for the particular hg issue. Looking
at the "clone" times, the problem is almost certainly not the actual
pulling.
The diffstat generation is often the largest part of a git merge. It's
gotten cheaper since the hg benchmarks were done (I think they were done
back before the integrated diff generation, so they also have the overhead
of executing a lot of external GNU diff processes), but it's still not
"cheap".
But I have to say that the diffstat at least for me is absolutely
invaluable.
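
A rough sketch of how the diffstat cost can be separated from the rest of a pull. It assumes a git version where "git pull -n" (--no-stat) skips the summary, and it uses two throwaway local repositories with made-up names rather than the benchmark setup, so it shows the mechanism only and says nothing about timings.

```shell
work=$(mktemp -d)
# Fixed identity so commits work without any global git configuration.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.org
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.org

git init -q "$work/upstream"
git -C "$work/upstream" commit -q --allow-empty -m "initial"
git clone -q "$work/upstream" "$work/clone"

# New upstream work for the clone to pull.
echo "hello" > "$work/upstream/file.txt"
git -C "$work/upstream" add file.txt
git -C "$work/upstream" commit -q -m "add file"

# -n skips the diffstat normally printed at the end of the merge.
git -C "$work/clone" pull -n

# The same summary can still be produced on demand afterwards.
git -C "$work/clone" diff --stat ORIG_HEAD HEAD
```

Timing the pull with and without -n on a large merge would be one way to check how much of the benchmark difference comes from the diffstat.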
Linus
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git-fetch per-repository speed issues
2006-07-04 19:22 ` Linus Torvalds
@ 2006-07-04 21:05 ` Junio C Hamano
0 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2006-07-04 21:05 UTC (permalink / raw
To: Linus Torvalds; +Cc: git
Linus Torvalds <torvalds@osdl.org> writes:
> But I have to say that the diffstat at least for me is absolutely
> invaluable.
Oh, I absolutely agree with that, and anybody who suggests turning
it off by default needs a very good argument to convince me.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git-fetch per-repository speed issues
2006-07-04 3:07 ` Linus Torvalds
@ 2006-07-05 6:47 ` Jeff King
2006-07-05 16:40 ` Linus Torvalds
0 siblings, 1 reply; 30+ messages in thread
From: Jeff King @ 2006-07-05 6:47 UTC (permalink / raw
To: Linus Torvalds; +Cc: Git Mailing List
On Mon, Jul 03, 2006 at 08:07:49PM -0700, Linus Torvalds wrote:
> > Fetching by ssh actually makes two ssh connections (the second is to
> > grab tags).
> True. Although that should happen only if there are any new tags.
Either you're wrong or there's a bug in git-fetch.
I think you're missing the call to git-ls-remote --tags to get the list
of tags (which we will then auto-follow if necessary). So in that case,
there would actually be 3 ssh connections. If everything is up to date,
we still make 2 connections (one to check refs from remotes file, and
one to check remote tag list).
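
The two lookups described above can be sketched with git ls-remote. This runs against a throwaway local repository (the path and tag name are made up), so no ssh is involved here; over ssh each of the two invocations would be a separate connection, and a third would follow if objects actually had to be transferred.

```shell
work=$(mktemp -d)
# Fixed identity so the commit works without global git configuration.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.org
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.org

git init -q "$work/remote"
git -C "$work/remote" commit -q --allow-empty -m "initial"
git -C "$work/remote" tag v1.0

# Session 1: list the branch heads named in the remotes file.
heads=$(git ls-remote --heads "$work/remote")
# Session 2: list the remote tags so new ones can be auto-followed.
tags=$(git ls-remote --tags "$work/remote")

echo "$heads"
echo "$tags"
```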
-Peff
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git-fetch per-repository speed issues
2006-07-05 6:47 ` Jeff King
@ 2006-07-05 16:40 ` Linus Torvalds
0 siblings, 0 replies; 30+ messages in thread
From: Linus Torvalds @ 2006-07-05 16:40 UTC (permalink / raw
To: Jeff King; +Cc: Git Mailing List
On Wed, 5 Jul 2006, Jeff King wrote:
>
> Either you're wrong or there's a bug in git-fetch.
I was wrong - I forgot the git-ls-remote (which really should be
unnecessary, but the way git-fetch-pack works, we end up
re-connecting).
Linus
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git-fetch per-repository speed issues
2006-07-03 18:02 git-fetch per-repository speed issues Keith Packard
2006-07-03 23:14 ` Linus Torvalds
2006-07-04 15:42 ` Jakub Narebski
@ 2006-07-06 23:36 ` David Woodhouse
2 siblings, 0 replies; 30+ messages in thread
From: David Woodhouse @ 2006-07-06 23:36 UTC (permalink / raw
To: Keith Packard; +Cc: Git Mailing List
On Mon, 2006-07-03 at 11:02 -0700, Keith Packard wrote:
> just uses ssh for everything. This slows down the connection process
> by several seconds.
Only if you forgot to use the 'control socket' support, which lets you
make a _single_ authenticated connection and re-use it for multiple
sessions.
http://david.woodhou.se/openssh-control.html has a couple of
improvements, but the basics are usable in upstream openssh.
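
A minimal sketch of the control socket setup using the ControlMaster and ControlPath options of upstream OpenSSH. The host name and socket location are hypothetical, and the settings are written to a scratch file here; in practice they would go in ~/.ssh/config so that git's ssh invocations pick them up.

```shell
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
# Hypothetical host name; replace with the real git server.
Host git.example.org
    # Reuse an existing master connection, or become the master if none exists.
    ControlMaster auto
    # Socket path: %r = remote user, %h = host, %p = port.
    ControlPath ~/.ssh/control-%r@%h:%p
EOF
cat "$cfg"

# With these lines in ~/.ssh/config, one authenticated master connection
# can be opened in the background before fetching (not run in this sketch):
#   ssh -M -N -f git.example.org
# Later ssh sessions to the same host then reuse the socket instead of
# authenticating again.
```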
--
dwmw2
^ permalink raw reply [flat|nested] 30+ messages in thread