> The trace_performance functions require manual instrumentation of the code sections you want to measure

Ahh, a case of RTFM :)

> Could you post details about your test setup? Are you still using WebKit for your tests?

I'm on Win7 x64, Core i5 M560, WD 7200 laptop HDD, NTFS, no virus scanner, TrueCrypt, no defragger.

I've tried to be a bit smarter with the intent of my code, and this is what I came up with.

diff --git a/cache.h b/cache.h
index 4bf19e3..2e9fb1f 100644
--- a/cache.h
+++ b/cache.h
@@ -294,7 +294,7 @@ extern void free_name_hash(struct index_state *istate);
 #define active_cache_changed (the_index.cache_changed)
 #define active_cache_tree (the_index.cache_tree)
-#define read_cache() read_index(&the_index)
+#define read_cache() read_index_preload(&the_index, NULL)
 #define read_cache_from(path) read_index_from(&the_index, (path))
 #define read_cache_preload(pathspec) read_index_preload(&the_index, (pathspec))
 #define is_cache_unborn() is_index_unborn(&the_index)
diff --git a/read-cache.c b/read-cache.c
index c3d5e35..5fb2788 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1866,7 +1866,7 @@ int read_index_unmerged(struct index_state *istate)
 	int i;
 	int unmerged = 0;
-	read_index(istate);
+	read_index_preload(istate, NULL);
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
 		struct cache_entry *new_ce;

Interestingly, when I run on a cleanly checked-out blink repo my changes seem to make performance worse, but on a repo with ignored files in it they seem to help. So as a point of comparison I decided to run the numbers on a repo whose working tree contains ignored files, in this case msysgit/git after a 'make install'. When I get a few hours I'll try to build blink and re-run the numbers on a much larger repo.

This comparison is an average of 3 cold-cache runs of kb/fscache-v4 [a] vs. kb/fscache-v4 with my above changes applied [b], with preloadindex and fscache set to true.

Results:

git status -s
[a] 3.02s
[b] 2.92s

git reset --hard head
[a] 3.67s
[b] 3.09s

git add -u
[a] 2.89s
[b] 2.08s

I noticed something interesting. Preload index uses 20 threads to do the work. When I was keeping an eye on them in Task Manager, some threads finished quite quickly while others ran a lot longer. The way I understand the code at the moment, each thread gets an equal-sized chunk of the work. It's quite likely that even more performance could be obtained from preload if the work splitting were 'smarter'. My best idea so far would be to use something like a lock-free queue to queue up the work and let the threads pull work off the queue. That way all threads stay busy for longer. A candidate for the implementation would be the liblfds [1] queue. However, my issue with this library, and the reason I haven't tried to integrate it, is simply that the code expressly has no license.
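To make the idea concrete, here is a rough sketch of dynamic work splitting. It is not the preload-index code and it does not use liblfds: instead of a real lock-free queue it uses a single shared atomic counter from which each thread claims the next small batch of entries, which gives a similar "fast threads take more work" effect. All names here (work_pool, process_entry, run_pool, NUM_WORKERS, BATCH) are made up for illustration, and it assumes a compiler with C11 atomics and a pthreads implementation.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_WORKERS 4  /* hypothetical thread count for the sketch */
#define BATCH 16       /* entries claimed per grab; hypothetical tuning knob */

struct work_pool {
	atomic_size_t next;  /* index of the next unclaimed entry */
	size_t total;        /* number of entries to process */
	int *done;           /* stand-in for per-entry results */
};

/* Stand-in for the real per-entry work (e.g. stat-ing one index entry). */
static void process_entry(struct work_pool *pool, size_t i)
{
	pool->done[i] += 1;
}

static void *worker(void *arg)
{
	struct work_pool *pool = arg;

	for (;;) {
		/* Atomically claim the next batch; no two threads get the same range. */
		size_t start = atomic_fetch_add(&pool->next, BATCH);
		size_t end, i;

		if (start >= pool->total)
			break;  /* nothing left to claim */
		end = start + BATCH;
		if (end > pool->total)
			end = pool->total;
		for (i = start; i < end; i++)
			process_entry(pool, i);
	}
	return NULL;
}

/* Returns 0 on success, -1 if a thread could not be created. */
static int run_pool(int *done, size_t total)
{
	struct work_pool pool;
	pthread_t threads[NUM_WORKERS];
	int started, i, rc = 0;

	atomic_init(&pool.next, 0);
	pool.total = total;
	pool.done = done;

	for (started = 0; started < NUM_WORKERS; started++) {
		if (pthread_create(&threads[started], NULL, worker, &pool)) {
			rc = -1;
			break;
		}
	}
	/* Threads that did start still drain all remaining work before exiting. */
	for (i = 0; i < started; i++)
		pthread_join(threads[i], NULL);
	return rc;
}
```

A thread that lands on slow entries simply claims fewer batches, while the others keep pulling work until the counter passes the end. The batch size trades contention on the counter against how evenly the tail of the work is spread.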

[1] http://www.liblfds.org/