git@vger.kernel.org mailing list mirror (one of many)
From: Eric Wong <normalperson@yhbt.net>
To: Mike Hommey <mh@glandium.org>
Cc: git@vger.kernel.org
Subject: Re: Significant performance waste in git-svn and friends
Date: Thu, 6 Sep 2007 00:04:08 -0700
Message-ID: <20070906070407.GA19624@soma>
In-Reply-To: <20070905184710.GA3632@glandium.org>

Mike Hommey <mh@glandium.org> wrote:
> Hi,

Hi Mike,

> Being a pervert abusing the way subversion doesn't deal with branches
> and tags, I'm actually not a user of git-svn or git-svnimport, because
> they just can't deal easily with my perversion. So I'm writing a script
> to do the conversion for me, and since I also like to learn new things
> when I'm coding, I'm writing it in ruby.
> 
> Anyways, one of the things I'm trying to convert is my svk repository
> for debian packaging of xulrunner (so, a significant subset of the
> mozilla tree), which doesn't involve a lot of revisions (around 280,
> because I only imported releases or CVS snapshots), but involves a lot
> of files (roughly 20k).
> 
> The first thing I noticed when twisting around the svk repo so that
> git-svn could somehow import it a while ago, is that running git-svn
> was in my case significantly slower than svnadmin dump | svnadmin load
> (more than 2 times slower).
> 
> And now, with my own script, I got the same kind of "slowdown". So I
> investigated it, and it didn't take long to realize that replacing
> git-hash-object by a simple reimplementation in ruby was *way* faster.
> git-hash-object being more than probably what you do the most when you
> import a remote repository, it is not much of a surprise that forking
> thousands of times is a huge performance waste.

I haven't looked at the times in a while, but I suspect that exec()
is the (much bigger) culprit.

I usually import from remote repositories, so I notice network
latency way before I notice local performance problems with git-svn.
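
For reference, the kind of in-process replacement for git-hash-object
discussed above could look roughly like the following.  This is a minimal
sketch in Python, not the Ruby code or the git-svn patch from this thread.
It assumes a SHA-1 repository and loose (unpacked) storage, and the
function names git_blob_id and write_loose_blob are made up for
illustration.

```python
import hashlib
import os
import zlib


def git_blob_id(data: bytes) -> str:
    """Return the object id git assigns to a blob with this content."""
    # A blob is hashed as "blob <size>\0" followed by the raw content.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def write_loose_blob(objects_dir: str, data: bytes) -> str:
    """Store data as a loose blob under objects_dir and return its id.

    Simplification: git itself writes to a temporary file and renames it;
    this sketch writes the final path directly.
    """
    oid = git_blob_id(data)
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            # Loose objects are the zlib-compressed header plus content.
            f.write(zlib.compress(b"blob %d\0" % len(data) + data))
    return oid


if __name__ == "__main__":
    # The empty blob has a well-known id.
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Hashing in-process like this avoids one fork() and one exec() per file,
which is where the savings would be expected to come from.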

> So, just for the record, I did a lame hack of git-svn to see what kind
> of speedup could happen in git-svn. You can find this lame hack as a
> patch below. I did some tests (with a 1.5.2.1 release) and here are the
> results, importing only the trunk (192 revisions), with no checkout, and
> redirecting stdout to /dev/null:
> 
> original git-svn:
> real    25m1.871s
> user    8m51.593s
> sys     12m31.659s
> 
> patched git-svn:
> real    14m45.870s
> user    7m31.928s
> sys     4m1.047s

That's awesome.
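
To put numbers on that, here is a small Python sketch that converts the
time(1) figures quoted above into seconds and prints the ratio for each
row.  The helper name seconds is made up for illustration.

```python
import re


def seconds(t: str) -> float:
    """Convert a time(1) value such as '25m1.871s' into seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", t)
    return int(m.group(1)) * 60 + float(m.group(2))


# Figures copied from the quoted measurements.
original = {"real": "25m1.871s", "user": "8m51.593s", "sys": "12m31.659s"}
patched = {"real": "14m45.870s", "user": "7m31.928s", "sys": "4m1.047s"}

for key in original:
    ratio = seconds(original[key]) / seconds(patched[key])
    print("%s: %.2fx faster" % (key, ratio))
```

That works out to roughly 1.7x in wall-clock time, about 1.2x in user
time, and about 3.1x in sys time.  Most of the gain is in sys, which is
what one would expect if process creation is the main cost.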

> - It might be worth testing if git-cat-file is called a lot. If so,
>   implementing a simple git-cat-file equivalent that would work for
>   unpacked objects could improve speed.

IIRC git-cat-file is called a lot.  Every modified file needs its
original contents cat-ed so that the delta can be applied to them.
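
A git-cat-file equivalent that only handles unpacked objects, as suggested
in the quoted text, could be sketched in Python as below.  This is an
illustration under assumptions, not code from git-svn: the name
read_loose_object is hypothetical, and a caller would still have to fall
back to the real git-cat-file when the object lives in a pack.

```python
import os
import zlib


def read_loose_object(objects_dir: str, oid: str):
    """Return (type, content) for a loose object, or None if it is not loose."""
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    try:
        with open(path, "rb") as f:
            raw = zlib.decompress(f.read())
    except FileNotFoundError:
        # Possibly packed; the caller should fall back to git cat-file.
        return None
    # The decompressed data is "<type> <size>\0" followed by the content.
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.split(b" ")
    if int(size) != len(content):
        raise ValueError("corrupt loose object %s" % oid)
    return obj_type.decode("ascii"), content
```

Freshly imported objects are typically still loose during an import, so
the fallback path would be the exception until the repository is
repacked.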

> The same things obviously apply to git-cvsimport and other scripts
> calling git-hash-object a lot.

Making git-svn use fast-import would be very nice, but I've got a bunch
of other git-svn things that I need to work on.  Another option would be
allowing Git.pm to access more of the git internals...

However, how well/poorly would fast-import work for incremental
fetches throughout the day?
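
For illustration, here is roughly what feeding one commit to fast-import
involves, as a Python sketch that builds the input stream.  The function
name fast_import_commit and all of its parameters are hypothetical, and
this does not settle the incremental question above.  It only shows that
the stream format has a way to continue an existing branch ("from
<branch>^0").

```python
def fast_import_commit(branch, committer, when, message, files,
                       incremental=False):
    """Build a git fast-import stream for a single commit.

    files maps path -> bytes content.  when is a raw date such as
    "1189000000 +0000".  All names here are made up for illustration.
    """
    out = []
    marks = {}
    # Emit one blob per file and remember its mark.
    for n, (path, data) in enumerate(sorted(files.items()), start=1):
        marks[path] = n
        out.append(b"blob\nmark :%d\ndata %d\n" % (n, len(data)) + data + b"\n")
    msg = message.encode("utf-8")
    out.append(b"commit " + branch.encode() + b"\n")
    out.append(b"committer " + committer.encode() + b" " + when.encode() + b"\n")
    out.append(b"data %d\n" % len(msg) + msg + b"\n")
    if incremental:
        # Continue from the branch's current tip instead of starting fresh.
        out.append(b"from " + branch.encode() + b"^0\n")
    for path, n in sorted(marks.items()):
        out.append(b"M 100644 :%d " % n + path.encode() + b"\n")
    out.append(b"\n")
    return b"".join(out)
```

The resulting bytes would be written to the standard input of a single
long-running "git fast-import" process, so many revisions share one
fork/exec rather than paying for one per object.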

-- 
Eric Wong
