From: Jakub Narebski <jnareb@gmail.com>
To: Jan Holesovsky <kendy@suse.cz>, git@vger.kernel.org
Cc: dev@tools.openoffice.org
Subject: Re: Git benchmarks at OpenOffice.org wiki
Date: Thu, 3 May 2007 01:30:43 +0200
Message-ID: <200705030130.44018.jnareb@gmail.com>
In-Reply-To: <200705021624.25560.kendy@suse.cz>

Jan Holesovsky wrote:
> On Tuesday 01 May 2007 23:46, Jakub Narebski wrote:
> 
>> What I am concerned about is some of git benchmark results at Git page
>> on OpenOffice.org wiki:
>>   http://wiki.services.openoffice.org/wiki/Git#Comparison

>> The problem is with 'Size of checkout': to start working in repository
>> one needs 1.4G (sources) and 98M (third party) for CVS checkout (it is
>> 1.5G for sources for Subversion checkout). Ordinarily for distributed SCM
>> you would need size of repository + size of sources (working area),
>> which is 2.8G for sources and 688M for third party stuff [files you can
>> hack on + the history]. This makes some prefer to go centralized SCM
>> route, i.e. Subversion as replacement for CVS (+ CWS, ChildWorkSpace).
> 
> Considering the size OOo needs for build (>8G without languages),
> the ~1.4G overhead for history is very well bearable.  I am surprised about
> the 100M overhead for SVN as well - from my experience it is usually about
> the size of the project itself; but maybe they improved something in SVN
> in the meantime.

I think the supposition that SVN uses hardlinks for the pristine copy
of sources (HEAD version) is probable; then the overhead is 100M plus
the size of changed files. Of course this trick works only on
filesystems which support hardlinks, and assumes that either hardlinks
are COW-links (copy-on-write) or the editor behaves, i.e. replaces
a file on save instead of overwriting it in place.
 
>> What might help here is splitting repository into current (e.g. from
>> OOo 2.0) and historical part,
> 
> No, I don't want this ;-)

I forgot to add that it is possible to graft a historical repository
onto the current work repository, making the full history available.
For example, the Linux kernel has a historical repository converted
from BK, and there is a grafts file which connects the two
repositories.
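
A minimal sketch of such a graft, using throwaway repositories. It
assumes a git new enough to have "git replace --graft", which is used
here in place of the grafts file mentioned above; the repository names
"old" and "current" and the demo identity are made up for illustration.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Helper: run git with a throwaway identity (hypothetical values).
git_c() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# "Historical" repository with two commits.
git init -q old
( cd old
  echo v1 > file.txt && git add file.txt && git_c commit -q -m "old: first"
  echo v2 > file.txt && git_c commit -q -am "old: second" )

# "Current" repository starts over from the last historical state.
git init -q current
cd current
echo v2 > file.txt && git add file.txt && git_c commit -q -m "current: import"
echo v3 > file.txt && git_c commit -q -am "current: work"

before=$(git rev-list --count HEAD)

# Fetch the historical objects, then make the root commit of "current"
# appear to have the tip of "old" as its parent.
git fetch -q ../old HEAD
old_tip=$(git rev-parse FETCH_HEAD)
root=$(git rev-list --max-parents=0 HEAD)
git_c replace --graft "$root" "$old_tip"

after=$(git rev-list --count HEAD)
echo "commits reachable before graft: $before, after graft: $after"
```

The graft is local to the repository where it is created, so people
who do not need the old history never have to download it.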

>> and / or using shallow clone. 

git-clone(1):

--depth <depth>::
        Create a 'shallow' clone with a history truncated to the
        specified number of revs.  A shallow repository has
        number of limitations (you cannot clone or fetch from
        it, nor push from nor into it), but is adequate if you
        want to only look at near the tip of a large project
        with a long history, and would want to send in a fixes
        as patches.

Those limitations may be lifted in the future, so a shallow clone is
another way to reduce the disk space needed for a git checkout. But
certainly this is not for everybody.
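
As a rough illustration of --depth, the sketch below builds a small
throwaway repository and clones it shallowly. The five-commit history,
the directory names and the demo identity are assumptions made only for
this example.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Source repository with five commits (stands in for a long history).
git init -q src
for i in 1 2 3 4 5; do
  echo "rev $i" > src/file.txt
  git -C src add file.txt
  git -C src -c user.name=demo -c user.email=demo@example.com \
      commit -q -m "commit $i"
done

# --depth is ignored when cloning from a plain local path,
# hence the file:// URL.
git clone -q --depth 2 "file://$work/src" shallow

full=$(git -C src rev-list --count HEAD)
short=$(git -C shallow rev-list --count HEAD)
echo "commits in full repository: $full, in shallow clone: $short"
```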

>> Implementing  
>> partial checkouts, i.e. checking out only part of working area (...)

This feature is not implemented [yet?] as such, although you can do
a partial checkout using low-level commands. The problem with
implementing it is how to do a merge that touches a part which is not
checked out. That might not be a problem for OOo; but then partial
checkout might not be needed for OOo either. Sometimes submodules are
better, sometimes partial checkout is the only way: see below.
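
One possible way to do a partial checkout with low-level commands is
sketched below: fill the index from HEAD with git-read-tree, then write
out only the wanted paths with git-checkout-index. The layout (docs/
and code/) and the demo identity are hypothetical.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Small source repository with two top-level directories.
git init -q src
mkdir -p src/docs src/code
echo "manual" > src/docs/readme.txt
echo "int main(void) { return 0; }" > src/code/main.c
git -C src add .
git -C src -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial import"

# Clone without populating the working tree (-n, i.e. --no-checkout).
git clone -q -n src partial
cd partial

# Fill the index from HEAD, then write out only the docs/ entries.
git read-tree HEAD
git ls-files -z -- docs/ | git checkout-index -z --stdin

ls
```

Afterwards "git status" reports everything under code/ as deleted,
which is one face of the problem with merges described above.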

>> Splitting repository into submodules, and submodule 
>> support -- it depends on organization of OOo sources, would certainly
>> help for third party stuff repository.
> 
> We should better split the OOo sources; it's a process that already started
> [UNO runtime environment vs. OOo without URE], and I proposed some more
> changes already.

In my opinion each submodule should be able to compile and be tested
by itself. You can go the X.Org route of splitting the sources into
modules... or you can make use of the new submodules support (currently
at plumbing level, i.e. low-level commands), a.k.a. gitlinks.

The submodules support makes it possible to split sources into
independent modules (parts), which can be developed independently,
and which you can download (clone, fetch) or not, while making it
possible to bind it all together into one superproject.

See the (somewhat out of date) http://git.or.cz/gitwiki/SubprojectSupport
page on the git wiki.
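
A small sketch of what a gitlink looks like in a superproject. It
assumes a git version that ships the "git submodule" command, rather
than the plumbing-only support described above; the module name "ure",
the superproject name and the demo identity are invented for the
example.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Helper: run git with a throwaway identity (hypothetical values).
git_c() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# An independent module with its own history.
git init -q ure
echo "runtime" > ure/README
git -C ure add README
git_c -C ure commit -q -m "ure: initial"

# The superproject binds the module at a given commit.
git init -q super
cd super
# Newer git versions refuse local-path submodule clones unless
# protocol.file.allow permits them.
git_c -c protocol.file.allow=always submodule --quiet add "$work/ure" ure
git_c commit -q -m "super: bind ure as a submodule"

# The module shows up as a single tree entry of type "commit".
mode=$(git ls-tree HEAD ure | cut -d' ' -f1)
echo "ure is recorded with mode $mode"
git ls-tree HEAD
```

The superproject stores only the commit id of the module, so the
module's own objects can be downloaded or left out independently.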

>> What I'm really concerned about is branch switch and merging branches,
>> when one of the branches is an old one (e.g. unxsplash branch), which
>> takes 3min (!) according to the benchmark. 13-25sec for commit is also
>> bit long, but BRANCH SWITCHING which takes 3 MINUTES!? There is no
>> comparison benchmark for CVS or Subversion, though...

By the way, the time to switch branches should be proportional to the
number of changed files, which you can get with "git diff --summary
unxsplash HEAD". Or, to be more realistic, the benchmark could check
out some old version (some old tag), as branches which got merged in
are usually deleted (or never even got published). This happens, for
example, when bisecting some bug: Subversion doesn't have bisect,
does it?
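
A sketch of counting the files that differ between an old point in
history and HEAD, on a throwaway repository. The tag "old-release" is
a hypothetical stand-in for unxsplash. Note that --name-only is used
here rather than --summary, because --summary lists only creations,
deletions, renames and mode changes, so files whose content alone
changed would not be counted.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Helper: run git with a throwaway identity (hypothetical values).
git_c() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

git init -q repo
cd repo
for f in a b c d; do echo "old" > "$f.txt"; done
git add .
git_c commit -q -m "base"
git tag old-release

# Later work touches two of the four files.
echo "new" > a.txt
echo "new" > c.txt
git_c commit -q -am "later work"

# Files that a switch between the two versions has to rewrite.
changed=$(git diff --name-only old-release HEAD | wc -l | tr -d ' ')
echo "files differing between old-release and HEAD: $changed"
```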

I wonder if running "git pack-refs" would help this benchmark...

-- 
Jakub Narebski
Poland

