From: Greg Troxel <gdt@ir.bbn.com>
To: Joshua Redstone <joshua.redstone@fb.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: Git performance results on a large repository
Date: Sat, 04 Feb 2012 16:42:11 -0500
Message-ID: <rmivcnm2s3w.fsf@fnord.ir.bbn.com>
In-Reply-To: <CB5074CF.3AD7A%joshua.redstone@fb.com> (Joshua Redstone's message of "Fri, 3 Feb 2012 14:20:06 +0000")
Joshua Redstone <joshua.redstone@fb.com> writes:
> The test repo has 4 million commits, linear history and about 1.3 million
> files. The size of the .git directory is about 15GB, and has been
> repacked with 'git repack -a -d -f --max-pack-size=10g --depth=100
> --window=250'. This repack took about 2 days on a beefy machine (I.e.,
> lots of ram and flash). The size of the index file is 191 MB. I can share
> the script that generated it if people are interested - It basically picks
> 2-5 files, modifies a line or two and adds a few lines at the end
> consisting of random dictionary words, occasionally creates a new file,
> commits all the modifications and repeats.
I have a repository with about 500K files, a 3.3G checkout, a 1.5G .git,
and about 10K commits. (This is a real repository, not a test case.) So
far fewer commits than yours, but the size is not so far off.
> I timed a few common operations with both a warm OS file cache and a cold
> cache. i.e., I did a 'echo 3 | tee /proc/sys/vm/drop_caches' and then did
> the operation in question a few times (first timing is the cold timing,
> the next few are the warm timings). The following results are on a server
> with average hard drive (I.e., not flash) and > 10GB of ram.
>
> 'git status' : 39 minutes cold, and 24 seconds warm.
Both of these numbers surprise me. I'm using NetBSD, whose stat
implementation isn't as optimized as Linux's (you didn't say which OS
you're on, but I'm assuming Linux). On a years-old desktop, git status
takes about a minute semi-cold and 5s warm (once I raised the vnode
cache limit above 500K, vs the 350K default for a 2G ram machine).
So for the warm status, I wonder how big your vnode cache is and whether
you've exceeded it. I don't follow the cold time at all. Some sort of
profiling within git status would probably be illuminating.
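One rough way to separate filesystem cost from git's own work is to
time a bare lstat() pass over the work tree and compare it with the
git status timing. A minimal Python sketch, assuming the status cost is
dominated by one lstat() per file; the name time_lstat_walk is made up
for illustration, and this is not how git itself walks the tree:

```python
import os
import time


def time_lstat_walk(root):
    """Return (file_count, seconds) for one lstat() pass over root.

    Directories named .git are skipped so that only work-tree files
    are counted (an assumption about what is worth measuring here).
    """
    start = time.perf_counter()
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            os.lstat(os.path.join(dirpath, name))
            count += 1
    return count, time.perf_counter() - start
```

Calling time_lstat_walk(".") two or three times from the top of the
checkout gives a first (colder) and later (warmer) figure. If the warm
pass is much faster than warm git status, the extra time is being
spent somewhere other than stat.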
> 'git blame': 44 minutes cold, 11 minutes warm.
>
> 'git add' (appending a few chars to the end of a file and adding it): 7
> seconds cold and 5 seconds warm.
>
> 'git commit -m "foo bar3" --no-verify --untracked-files=no --quiet
> --no-status': 41 minutes cold, 20 seconds warm. I also hacked a version
> of git to remove the three or four places where 'git commit' stats every
> file in the repo, and this dropped the times to 30 minutes cold and 8
> seconds warm.
So without the stat, I wonder what it's doing that takes 30 minutes.
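To narrow down where the time goes, it helps to collect per-run wall
times in a repeatable way. A minimal Python sketch of the "first run
cold, later runs warm" procedure described above; the helper name
time_runs is hypothetical, and dropping the OS cache before the first
run is left to the caller since it needs root:

```python
import subprocess
import time


def time_runs(cmd, runs=3, cwd=None):
    """Run cmd `runs` times and return the wall time of each run.

    If the OS file cache was dropped beforehand, the first entry is
    the cold timing and the remaining entries are warm timings.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only elapsed time matters here.
        subprocess.run(cmd, cwd=cwd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        timings.append(time.perf_counter() - start)
    return timings
```

For example, time_runs(["git", "status"], cwd="/path/to/repo") returns
a list of three timings in seconds, where /path/to/repo is a
placeholder for the repository being measured.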
> One way to get there is to do some deep code modifications to git
> internals, to, for example, create some abstractions and interfaces that
> allow plugging in the specialized servers. Another way is to leave git
> internals as they are and develop a layer of wrapper scripts around all
> the git commands that do the necessary interfacing. The wrapper scripts
> seem perhaps easier in the short-term, but may lead to increasing
> divergence from how git behaves natively and also a layer of complexity.
Having hooks for a blame server cache, etc. sounds sensible. Having a
way to run blame over a limited range, sort of like with --since, and
then keep extending it (e.g. in emacs) to earlier times sounds useful.
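As a rough illustration of the "start recent, then extend backwards"
idea, here is a Python sketch that builds a series of git blame
invocations with progressively wider --since windows. The function
name and the default windows are assumptions for illustration; each
command recomputes blame from scratch, so this shows only the calling
pattern, not the caching or incremental update being proposed:

```python
def widening_blame_commands(path, windows=("2.weeks", "6.months", "2.years")):
    """Return git blame argument lists, narrowest time window first.

    A front end could run the first command for a fast initial
    answer, then replace the display as the wider ones finish.
    """
    return [["git", "blame", f"--since={window}", "--", path]
            for window in windows]
```

Each returned list can be passed to subprocess.run() from inside the
repository.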