On Friday 22 April 2005 19:55, Chris Mason wrote:
> On Friday 22 April 2005 16:32, Chris Mason wrote:
> > If I pack every 64k (uncompressed), the checkout-tree time goes down to
> > 3m14s. That's a very big difference considering how stupid my code is
> > .git was only 20% smaller with 64k chunks. I should be able to do
> > better...I'll do one more run.
>
> This run also packed tree files together (everything produced by write-tree
> went into a packed file), but not the commits. I estimate I could save
> about another 168m by packing the tree files and commits into the same file
> with the blobs, but this wouldn't make any of the times below faster.
>
>                   git - original (28k commits)    packed
> FS size           2,675,408k                      1,723,820k
> read-tree         24.45s                          18.9s
> checkout-cache    4m30s                           3m5s
> patch time        2h30m                           1h55m
>

It was a rainy weekend, so I took a break from lawn care and hacked in some
simple changes to the packed file format. There's now a header listing the
sha1 for each subfile and the offset where to find it in the main file. Each
subfile is compressed individually so you don't have to decompress the whole
packed file to find one. Commits were added to the packed files as well.

Some results were about what I expected:

FS size         -- 1,614,376k
read-tree       -- 18s
checkout-cache  -- 2m35s (cold cache)
checkout-cache  -- 18s (hot cache)
patch time      -- 96m

Vanilla git needs 56s to check out with a hot cache. The hot cache numbers
weren't done before because I hadn't expected my patch to help at all. Even
though we both do things entirely from cache, vanilla git is much slower at
writing the checked out files back to the drive. I've made no optimizations
to that code, and the drive is only 30% full, so this seems to just be a bad
interaction with filesystem layout.

I also expected vanilla git to perform pretty well when there were no commits
in the tree. My test was to put a copy of 2.6.11 under git.

                                 vanilla    packed
update-cache (for all files)     2m1s       48s
checkout-cache (cold)            1m23s      28s
checkout-cache (hot)             12s        15s

The difference in hot cache checkout time is userland CPU time. It could be
avoided with smarter caching of the packed file header. Right now I'm
decompressing it over and over again for each checkout. Still, the
performance hit is pretty small because I try to limit the number of subfiles
that get packed together.

My current patch is attached for reference; it's against a git from late last
week. I wouldn't suggest using this for anything other than benchmarking, and
since I don't think I can get much better numbers easily, I'll stop playing
around with this for a while.

-chris
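
For readers who want a concrete picture of the layout described above (a header mapping each subfile's sha1 to its offset, with every subfile compressed on its own), here is a rough Python sketch. It is not the attached patch. The on-disk layout, the names build_pack and Pack, and the header_reads counter are all assumptions made for illustration, and the sha1 is taken over the raw bytes as a stand-in for however git actually names its objects. The sketch also caches the decompressed header, which is the kind of "smarter caching" mentioned as a possible fix for the hot cache CPU cost.

```python
import hashlib
import struct
import zlib

# Hypothetical index entry: 20-byte sha1, offset into the body, compressed length.
ENTRY = struct.Struct(">20sQQ")


def build_pack(blobs):
    """Pack a list of byte strings into one buffer (assumed layout, see above)."""
    entries = []
    body = b""
    for data in blobs:
        comp = zlib.compress(data)  # each subfile is compressed individually
        entries.append((hashlib.sha1(data).digest(), len(body), len(comp)))
        body += comp
    index = b"".join(ENTRY.pack(*entry) for entry in entries)
    header = zlib.compress(index)  # the header itself is stored compressed
    # 4-byte big-endian header length, then the header, then the subfiles.
    return struct.pack(">I", len(header)) + header + body


class Pack:
    """Reads subfiles by sha1, decompressing the header only once."""

    def __init__(self, raw):
        self.raw = raw
        self._index = None      # cached, decompressed header
        self.header_reads = 0   # how many times the header was decompressed

    def _load_index(self):
        if self._index is None:
            (hlen,) = struct.unpack_from(">I", self.raw, 0)
            index = zlib.decompress(self.raw[4:4 + hlen])
            self.header_reads += 1
            base = 4 + hlen  # subfile offsets are relative to the end of the header
            self._index = {
                sha: (base + off, clen)
                for sha, off, clen in ENTRY.iter_unpack(index)
            }
        return self._index

    def read(self, sha1):
        """Return one subfile without touching any of the others."""
        start, clen = self._load_index()[sha1]
        return zlib.decompress(self.raw[start:start + clen])
```

Something like Pack(build_pack([b"a", b"b"])).read(hashlib.sha1(b"a").digest()) pulls out a single subfile. Repeated reads reuse the cached index, so the header is decompressed once per Pack object rather than once per checkout.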