git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Greg Troxel <gdt@ir.bbn.com>
To: Joshua Redstone <joshua.redstone@fb.com>
Cc: "git\@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: Git performance results on a large repository
Date: Sat, 04 Feb 2012 16:42:11 -0500	[thread overview]
Message-ID: <rmivcnm2s3w.fsf@fnord.ir.bbn.com> (raw)
In-Reply-To: <CB5074CF.3AD7A%joshua.redstone@fb.com> (Joshua Redstone's message of "Fri, 3 Feb 2012 14:20:06 +0000")

[-- Attachment #1: Type: text/plain, Size: 3075 bytes --]


Joshua Redstone <joshua.redstone@fb.com> writes:

> The test repo has 4 million commits, linear history and about 1.3 million
> files.  The size of the .git directory is about 15GB, and has been
> repacked with 'git repack -a -d -f --max-pack-size=10g --depth=100
> --window=250'.  This repack took about 2 days on a beefy machine (I.e.,
> lots of ram and flash).  The size of the index file is 191 MB. I can share
> the script that generated it if people are interested - It basically picks
> 2-5 files, modifies a line or two and adds a few lines at the end
> consisting of random dictionary words, occasionally creates a new file,
> commits all the modifications and repeats.

I have a repository with about 500K files, 3.3G checkout, 1.5G .git, and
about 10K commits.  (This is a real repository, not a test case.)  So
not as many commits by a lot, but the size seems not so far off.

> I timed a few common operations with both a warm OS file cache and a cold
> cache.  i.e., I did a 'echo 3 | tee /proc/sys/vm/drop_caches' and then did
> the operation in question a few times (first timing is the cold timing,
> the next few are the warm timings).  The following results are on a server
> with average hard drive (I.e., not flash)  and > 10GB of ram.
>
> 'git status' :   39 minutes cold, and 24 seconds warm.

Both of these numbers surprise me.  I'm using NetBSD, whose stat
implementation isn't as optimized as Linux (you didn't say, but
assuming).   On a years-old desktop, git status seems to be about a
minute semi-cold and 5s warm (once I set the vnode cache big over 500K,
vs 350K default for a 2G ram machine).

So on the warm status, I wonder how big your vnode cache is, and if
you've exceeded it, and I don't follow the cold time at all.  Probably
some sort of profiling within git status would be illuminating.

> 'git blame':   44 minutes cold, 11 minutes warm.
>
> 'git add' (appending a few chars to the end of a file and adding it):   7
> seconds cold and 5 seconds warm.
>
> 'git commit -m "foo bar3" --no-verify --untracked-files=no --quiet
> --no-status':  41 minutes cold, 20 seconds warm.  I also hacked a version
> of git to remove the three or four places where 'git commit' stats every
> file in the repo, and this dropped the times to 30 minutes cold and 8
> seconds warm.

So without the stat, I wonder what it's doing that takes 30 minutes.

> One way to get there is to do some deep code modifications to git
> internals, to, for example, create some abstractions and interfaces that
> allow plugging in the specialized servers.  Another way is to leave git
> internals as they are and develop a layer of wrapper scripts around all
> the git commands that do the necessary interfacing.  The wrapper scripts
> seem perhaps easier in the short-term, but may lead to increasing
> divergence from how git behaves natively and also a layer of complexity.

Having hooks for a blame server cache, etc. sounds sensible.  Having a
way to call blames sort of like with --since and then keep updating it
(eg. in emacs) to earlier times sounds useful.

[-- Attachment #2: Type: application/pgp-signature, Size: 194 bytes --]

  parent reply	other threads:[~2012-02-04 21:51 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-03 14:20 Git performance results on a large repository Joshua Redstone
2012-02-03 14:56 ` Ævar Arnfjörð Bjarmason
2012-02-03 17:00   ` Joshua Redstone
2012-02-03 22:40     ` Sam Vilain
2012-02-03 22:57       ` Sam Vilain
2012-02-07  1:19       ` Nguyen Thai Ngoc Duy
2012-02-03 23:05     ` Matt Graham
2012-02-04  1:25   ` Evgeny Sazhin
2012-02-03 23:35 ` Chris Lee
2012-02-04  0:01 ` Zeki Mokhtarzada
2012-02-04  5:07 ` Joey Hess
2012-02-04  6:53 ` Nguyen Thai Ngoc Duy
2012-02-04 18:05   ` Joshua Redstone
2012-02-05  3:47     ` Nguyen Thai Ngoc Duy
2012-02-06 15:40       ` Joey Hess
2012-02-07 13:43         ` Nguyen Thai Ngoc Duy
2012-02-09 21:06           ` Joshua Redstone
2012-02-10  7:12             ` Nguyen Thai Ngoc Duy
2012-02-10  9:39               ` Christian Couder
2012-02-10 12:24                 ` Nguyen Thai Ngoc Duy
2012-02-06  7:10     ` David Mohs
2012-02-06 16:23     ` Matt Graham
2012-02-06 20:50       ` Joshua Redstone
2012-02-06 21:07         ` Greg Troxel
2012-02-07  1:28         ` david
2012-02-06 21:17     ` Sam Vilain
2012-02-04 20:05   ` Joshua Redstone
2012-02-05 15:01   ` Tomas Carnecky
2012-02-05 15:17     ` Nguyen Thai Ngoc Duy
2012-02-04  8:57 ` slinky
2012-02-04 21:42 ` Greg Troxel [this message]
2012-02-05  4:30 ` david
2012-02-05 11:24   ` David Barr
2012-02-07  8:58 ` Emanuele Zattin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=rmivcnm2s3w.fsf@fnord.ir.bbn.com \
    --to=gdt@ir.bbn.com \
    --cc=git@vger.kernel.org \
    --cc=joshua.redstone@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).