On Tuesday 08 September 2009, Johan Herland wrote: > I have some performance numbers that I will send in a separate email. Ok, here we go: Test scenario: Linux kernel repo with 157118 commits, 1 note per commit, with notes organized into various fanout schemes. Hardware is Intel Core 2 Quad with 4GB RAM. The tests were done on the following algorithms: - "before": This is the state of the notes code after applying patches 1-9. It uses the original notes-in-hash-map implementation, and does not grok any fanout scheme. - "16tree": This is the state of the notes code after applying patch 10. It uses the 16-tree data structure that parses the SHA-1 based fanout schemes. - "flexible": This is the state of the notes code after applying the entire patch series. This code parses a variety of date- and SHA1-based fanout schemes. Furthermore, the following notes tree structures were tested: - "no-notes": Testing without any notes at all. This is only present as a baseline, and to verify that the notes code does not negatively affect performance when not in use. - "no-fanout": All notes stored directly inside the root notes tree object. - "2_38": All notes stored in a SHA1-based 2/38 fanout scheme. - "2_2_36": All notes stored in a SHA1-based 2/2/36 fanout scheme. - "ym": Notes are organized within "yYYYYmMM"-named subtrees, where "YYYY" and "MM" are the year and month (respectively) from the annotated commit's commit date. - "ym_2_38": Same as above, but with a 2/38 SHA1-based fanout scheme within the "yYYYYmMM"-named subtrees. - "ymd": Notes are organized within "yYYYYmMMdDD"-named subtrees. - "ymd_2_38": Same as above, but with a 2/38 SHA1-based fanout scheme within the "yYYYYmMMdDD"-named subtrees. - "y_m": Notes are organized within two-level "yYYYY/mMM" subtrees. - "y_m_2_38": Same as above, but with a 2/38 SHA1-based fanout scheme within the "yYYYY/mMM"-named subtrees. - "y_m_d": Notes are organized within three-level "yYYYY/mMM/dDD" subtrees. - "y_m_d_2_38": Same as above, but with a 2/38 SHA1-based fanout scheme within the "yYYYY/mMM/dDD"-named subtrees. Here are the runtime numbers, the first column shows the runtime for 100 repetitions of "git log -n10" (which we assume to be a common use case), and the second column shows the runtime from a single run of "git log --all" (which is somewhat closer to a worst case). Algorithm / Notes tree git log -n10 (x100) git log --all ------------------------------------------------------------ before / no-notes 4.78s 63.90s before / no-fanout 56.85s 65.69s 16tree / no-notes 4.77s 64.18s 16tree / no-fanout 30.35s 65.39s 16tree / 2_38 5.57s 65.42s 16tree / 2_2_36 5.19s 65.76s flexible / no-notes 4.78s 63.91s flexible / no-fanout 30.34s 65.57s flexible / 2_38 5.57s 65.46s flexible / 2_2_36 5.18s 65.72s flexible / ym 5.13s 65.66s flexible / ym_2_38 5.08s 65.63s flexible / ymd 5.30s 65.45s flexible / ymd_2_38 5.29s 65.90s flexible / y_m 5.11s 65.72s flexible / y_m_2_38 5.08s 65.67s flexible / y_m_d 5.06s 65.50s flexible / y_m_d_2_38 5.07s 65.79s Finally, I have also looked at the memory consumption of the various algorithms and fanout schemes: The memory usage was measured by calculating the #bytes dynamically allocated for the notes data structure, and printing the current usage every time get_commit_notes() was called during a complete run of "git log --all". The results are attached as two gnuplot graphs, one with regular axes, and one with logarithmic axes. Have fun! :) ...Johan -- Johan Herland, www.herland.net