git@vger.kernel.org mailing list mirror (one of many)
From: Ingo Molnar <mingo@elte.hu>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Florian Weimer <fw@deneb.enyo.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	git@vger.kernel.org
Subject: cache-cold repository performance
Date: Thu, 14 Apr 2005 12:50:18 +0200
Message-ID: <20050414105018.GA5408@elte.hu>
In-Reply-To: <20050414070422.GA3226@elte.hu>


* Ingo Molnar <mingo@elte.hu> wrote:

> i'd be surprised if it was twice as fast - cache-cold linear checkouts 
> are _seek_ limited, and it doesnt matter whether after a 1-2 msec 
> track-to-track disk seek the DMA engine spends another 30 microseconds 
> DMA-ing 60K uncompressed data instead of 30K compressed... (there are 
> other factors, but this is the main thing.)

i've benchmarked cache-cold compressed vs. uncompressed performance, to 
shed some more light on the performance differences between flat and 
compressed repositories.

i did a lot of testing, and i primarily concentrated on being able to 
_trust_ the benchmark results, not on generating some quick numbers. The 
major problem was that the timing of the reads associated with 'checking 
out a large tree' is very unstable, even on a completely isolated 
test system with very common (and predictable) IO hardware.

the content i tested was a vanilla 2.6.10 kernel tree, with 19042 files 
in it, taking 246 MB uncompressed, and 110 MB compressed (via gzip -9).  
Average file size is 13.2 KB uncompressed, 5.9 KB compressed.
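
The per-file averages follow directly from the totals quoted above. A minimal Python sketch of that arithmetic, assuming 1 MB = 1024 KB (the names below are illustrative and not from the original message):

```python
# Figures quoted in the message: file count and total tree sizes in MB.
FILES = 19042
FLAT_MB = 246
GZIP_MB = 110


def avg_kb(total_mb, files):
    """Average file size in KB, assuming 1 MB = 1024 KB."""
    return total_mb * 1024 / files


flat_avg = avg_kb(FLAT_MB, FILES)
gzip_avg = avg_kb(GZIP_MB, FILES)
ratio = GZIP_MB / FLAT_MB  # compressed size as a fraction of uncompressed

print(f"flat: {flat_avg:.1f} KB per file")
print(f"gzip: {gzip_avg:.1f} KB per file")
print(f"compressed/uncompressed: {ratio:.2f}")
```

This reproduces the 13.2 KB and 5.9 KB averages, and shows the compressed tree at a bit under half the size of the flat one.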

Firstly, the timings are very sensitive to the way the tree was created.  
To have a 'fair' on-disk layout the trees have to be created in an 
identical fashion: e.g. it is not valid to copy the uncompressed tree 
and run gzip over it - that will create a 'sparse' on-disk layout 
penalizing the compressed layout and making it roughly 40% slower than 
the uncompressed layout! I first created the two trees, then i "cp -a"-ed 
them over into a new directory one after the other, so that they end up 
at similar on-disk positions as well. I also created 2 more pairs of such 
trees to make sure disk layout is fair.

all timings were taken fresh after reboot, on a UP 1 GB RAM Athlon64 
3200+, using a large, top of the line IDE disk. The kernel was 
2.6.12-rc2, the filesystem was ext3 with enough free space to not be 
fragmented, both noatime and nodiratime were specified so that no write 
activity whatsoever occurs during the 'checkout'.

the operation timed was a simple:

        time find . -type f | xargs cat > /dev/null

done in the root of the given tree. This generates the very same 
read-only IO pattern for each test. I've run the tests 10 times (i.e.  
have done 10 fresh reboots), and after every reboot i permuted the 
order of trees tested - to make sure there is no interaction between 
trees. (there was no interaction)
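
The procedure above (read every regular file in a tree, and vary the order of trees from run to run) can be sketched in Python. This is only an illustration under stated assumptions, not the harness used for the numbers below: `read_tree` and `run_order` are hypothetical names, the interpreter adds overhead that `find | xargs cat` does not have, and nothing here makes the page cache cold (the message relied on a fresh reboot for that).

```python
import os
import random
import time


def read_tree(root):
    """Read every regular file under root; return (bytes_read, seconds).

    Symlinks are skipped, roughly mirroring `find . -type f`.
    """
    total = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                # Read in chunks and discard the data, like `cat > /dev/null`.
                while True:
                    chunk = f.read(1 << 16)
                    if not chunk:
                        break
                    total += len(chunk)
    return total, time.perf_counter() - start


def run_order(trees, run, seed=0):
    """Return a shuffled copy of trees, reproducible for a given (seed, run)."""
    order = list(trees)
    random.Random(seed * 1000 + run).shuffle(order)
    return order
```

A run after reboot number `n` would then loop over `run_order(["flat-2", "gzip-2", "flat-3", "gzip-3"], n)` and call `read_tree` on each directory, recording the elapsed seconds per tree.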

here are the raw numbers, elapsed real time in seconds:

 flat-1:  29.7 29.5 29.4 29.4 29.5 29.5 29.7 29.6 29.4 29.6 29.5 29.4:  29.5
 gzip-1:  41.2 40.9 40.7 40.7 40.5 41.7 41.0 40.3 40.6 40.8 40.8 40.9:  40.8

 flat-2:  28.0 28.2 27.7 27.9 27.8 27.9 27.7 27.9 27.9 28.1 27.9 28.0:  27.9
 gzip-2:  27.2 27.4 27.4 27.2 27.2 27.2 27.2 27.2 27.1 27.3 27.2 27.4:  27.2

 flat-3:  27.0 27.8 27.6 27.7 27.8 27.8 27.8 27.7 27.8 27.6 27.8 27.8:  27.6
 gzip-3:  25.8 26.8 26.6 26.5 26.5 26.5 26.6 26.4 26.5 26.7 26.6 26.7:  26.5

The final column is the average. (Standard deviation is below 0.1 sec, 
less than 0.3%.)

flat-1 is the original tree, created via tar. gzip-1 is a cp -a copy of 
it, per-file compressed afterwards. flat-2 is a cp -a copy of flat-1, 
gzip-2 is a cp -a copy of gzip-1. flat-3/gzip-3 are cp -a copies of 
flat-2/gzip-2.

note that gzip-1 is ~40% slower due to the 'sparse layout', so its 
results approximate a repository with 'bad' file layout. I'd not expect 
GIT repositories to have such a layout normally, so we can disregard it.

flat-2/3 and gzip-2/3 can be directly compared. Firstly, the results 
show that the on-disk layout cannot be constructed reliably - there's a 
1% systematic difference between flat-2 and flat-3, and a 3% systematic 
difference between gzip-2 and gzip-3 - both systematic errors are larger 
than the 0.3% standard deviation, so they are not measurement errors but 
real layout properties of these trees.

the most interesting result is that gzip-2 is 2.5% faster than flat-2, 
and gzip-3 is 4% faster than flat-3. These differences are close to the 
layout-related systematic error, but slightly above it, so i'd conclude 
that a compressed repository is 3% faster on this hardware.
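
The percentages in the last two paragraphs can be recomputed from the averages in the final column of the table. A small Python sketch of that arithmetic (the helper name `pct_faster` is illustrative only):

```python
# Averages taken from the final column of the timing table, in seconds.
AVG = {"flat-2": 27.9, "gzip-2": 27.2, "flat-3": 27.6, "gzip-3": 26.5}


def pct_faster(slow, fast):
    """How much faster `fast` is than `slow`, as a percentage of `slow`."""
    return (AVG[slow] - AVG[fast]) / AVG[slow] * 100


pairs = [
    ("flat-2", "gzip-2"),  # compressed vs. flat, second pair
    ("flat-3", "gzip-3"),  # compressed vs. flat, third pair
    ("flat-2", "flat-3"),  # layout effect between flat copies
    ("gzip-2", "gzip-3"),  # layout effect between compressed copies
]
for slow, fast in pairs:
    print(f"{fast} vs {slow}: {pct_faster(slow, fast):.1f}% faster")
```

This gives about 2.5% and 4.0% for the compressed-vs-flat comparisons, and about 1.1% and 2.6% for the layout-only differences, which is why the compression effect is described as only slightly above the layout-related error.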

(since these results were in line with my expectations i double-checked 
everything again and did another 10 reboot tests - same results.)

conclusion [*]: there's a negligible cache-cold performance hit from 
using an uncompressed repository, because cache-cold performance is 
dominated by number of seeks, which is almost identical in the two 
cases.
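
As a rough plausibility check of the seek-dominated picture, one can multiply the file count by the 1-2 msec track-to-track seek time quoted at the top of this message. The sketch below assumes exactly one seek per file, which is a simplification introduced here: it ignores directory and inode reads, readahead, and transfer time.

```python
FILES = 19042              # files in the tested 2.6.10 tree
SEEK_MS_RANGE = (1.0, 2.0) # track-to-track seek time quoted in the message


def seek_seconds(files, seek_ms, seeks_per_file=1.0):
    """Total seek time in seconds; seeks_per_file=1.0 is an assumption."""
    return files * seeks_per_file * seek_ms / 1000.0


low = seek_seconds(FILES, SEEK_MS_RANGE[0])
high = seek_seconds(FILES, SEEK_MS_RANGE[1])
print(f"seek-only estimate: {low:.1f} s to {high:.1f} s")
```

Under that assumption the estimate is roughly 19 to 38 seconds, and the measured 26.5 to 27.9 seconds for the fairly laid out trees falls inside that range, for both the flat and the compressed copies.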

	Ingo

[*] lots of conditionals apply: these weren't flat/compressed GIT 
repositories (although they were quite similar to them), nor was the GIT 
workload measured (although the one measured should be quite close to 
it).


