From: Eric Wong <normalperson@yhbt.net>
To: Mike Hommey <mh@glandium.org>
Cc: git@vger.kernel.org
Subject: Re: Significant performance waste in git-svn and friends
Date: Thu, 6 Sep 2007 00:04:08 -0700
Message-ID: <20070906070407.GA19624@soma>
In-Reply-To: <20070905184710.GA3632@glandium.org>
Mike Hommey <mh@glandium.org> wrote:
> Hi,
Hi Mike,
> Being a pervert abusing the way subversion doesn't deal with branches
> and tags, I'm actually not a user of git-svn or git-svnimport, because
> they just can't deal easily with my perversion. So I'm writing a script
> to do the conversion for me, and since I also like to learn new things
> when I'm coding, I'm writing it in ruby.
>
> Anyways, one of the things I'm trying to convert is my svk repository
> for debian packaging of xulrunner (so, a significant subset of the
> mozilla tree), which doesn't involve a lot of revisions (around 280,
> because I only imported releases or CVS snapshots), but involves a lot
> of files (roughly 20k).
>
> The first thing I noticed when twisting around the svk repo so that
> git-svn could somehow import it a while ago, is that running git-svn
> was in my case significantly slower than svnadmin dump | svnadmin load
> (more than 2 times slower).
>
> And now, with my own script, I got the same kind of "slowdown". So I
> investigated it, and it didn't take long to realize that replacing
> git-hash-object by a simple reimplementation in ruby was *way* faster.
> git-hash-object being more than probably what you do the most when you
> import a remote repository, it is not much of a surprise that forking
> thousands of times is a huge performance waste.
I haven't looked at the times in a while, but I suspect that exec(),
rather than fork() itself, is the (much bigger) culprit.

I usually import off remote repositories, so I notice network
latency way before I notice local performance problems with git-svn.
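
For reference, here is a minimal sketch of what an in-process
replacement for git-hash-object has to compute for a blob. It is
written in Python for illustration only (Mike's reimplementation was in
Ruby and is not shown in this thread). The function name git_blob_sha1
is hypothetical, and the sketch assumes a SHA-1 repository and only
computes the id, without writing the object to the object store.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Return the id git assigns to a blob with this content.

    git hashes the header "blob <size>\\0" followed by the raw content.
    Assumption: SHA-1 object format; nothing is written to disk.
    """
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known id.
    print(git_blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Hashing in-process like this avoids one fork() and exec() per file,
which is where the quoted timings suggest the time goes.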
> So, just for the record, I did a lame hack of git-svn to see what kind
> of speedup could happen in git-svn. You can find this lame hack as a
> patch below. I did some tests (with a 1.5.2.1 release) and here are the
> results, importing only the trunk (192 revisions), with no checkout, and
> redirecting stdout to /dev/null:
>
> original git-svn:
> real 25m1.871s
> user 8m51.593s
> sys 12m31.659s
>
> patched git-svn:
> real 14m45.870s
> user 7m31.928s
> sys 4m1.047s
That's awesome.
> - It might be worth testing if git-cat-file is called a lot. If so,
> implementing a simple git-cat-file equivalent that would work for
> unpacked objects could improve speed.
IIRC git-cat-file is called a lot. Every modified file needs the
original cat-ed to make use of the delta.
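
A rough sketch of the "simple git-cat-file equivalent" for unpacked
objects suggested above, again in Python for illustration. It assumes
the standard loose-object layout (objects/xx/yyyy..., zlib-compressed,
with a "<type> <size>\0" header). The name read_loose_object is
hypothetical, and a real caller would still need git-cat-file as a
fallback for packed objects.

```python
import os
import zlib


def read_loose_object(git_dir: str, oid: str):
    """Return (type, content) for an unpacked object, or None if absent.

    Assumption: git_dir is the path to the .git directory and oid is a
    full 40-character hex object id.
    """
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    try:
        with open(path, "rb") as f:
            raw = zlib.decompress(f.read())
    except FileNotFoundError:
        # Not stored loose (probably packed): caller must fall back.
        return None
    header, _, content = raw.partition(b"\0")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(content):
        raise ValueError("corrupt loose object %s" % oid)
    return obj_type.decode(), content
```

During an import most recently written objects are still loose, so a
lookup like this would likely hit often, but that is a guess rather
than something measured here.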
> The same things obviously apply to git-cvsimport and other scripts
> calling git-hash-object a lot.
Making git-svn use fast-import would be very nice, but I've got a
bunch of other git-svn things that I need to work on. Another option
would be allowing Git.pm to access more of the git internals...
However, how well/poorly would fast-import work for incremental
fetches throughout the day?
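
To make the fast-import idea concrete, here is a small sketch that
builds the kind of stream git-fast-import reads on stdin: one blob
followed by one commit that references it by mark. It is in Python for
illustration, the helper name fast_import_commit and all argument
values are hypothetical, and it does not answer the incremental-fetch
question. It only shows that a later run can continue an existing
branch by naming it in a "from" line.

```python
def fast_import_commit(ref, path, content, message, committer, when,
                       parent=None):
    """Build a fast-import stream adding one file in one commit.

    Assumptions: `when` is in fast-import's raw date format
    ("<seconds> <tz offset>") and `path` needs no quoting.
    """
    msg = message.encode()
    out = [
        b"blob\n",
        b"mark :1\n",
        b"data %d\n" % len(content), content, b"\n",
        b"commit %s\n" % ref.encode(),
        b"committer %s %s\n" % (committer.encode(), when.encode()),
        b"data %d\n" % len(msg), msg, b"\n",
    ]
    if parent:
        # Continue from an existing commit, e.g. "refs/heads/master^0".
        out.append(b"from %s\n" % parent.encode())
    out.append(b"M 100644 :1 %s\n" % path.encode())
    return b"".join(out)


if __name__ == "__main__":
    stream = fast_import_commit(
        "refs/heads/master", "README", b"hello\n", "import r1\n",
        "A U Thor <author@example.com>",
        "1189062248 -0700",  # arbitrary example timestamp
    )
    print(stream.decode(), end="")
```

The output would be piped into a single long-running git fast-import
process, so many revisions cost one exec() instead of one per object.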
--
Eric Wong