From: Jameson Miller <jamill@microsoft.com>
To: "git@vger.kernel.org" <git@vger.kernel.org>
Cc: "gitster@pobox.com" <gitster@pobox.com>,
"pclouds@gmail.com" <pclouds@gmail.com>,
"jonathantanmy@google.com" <jonathantanmy@google.com>,
Jameson Miller <jamill@microsoft.com>
Subject: [PATCH v1 0/5] Allocate cache entries from memory pool
Date: Tue, 17 Apr 2018 16:34:39 +0000
Message-ID: <20180417163400.3875-2-jamill@microsoft.com>
In-Reply-To: <20180417163400.3875-1-jamill@microsoft.com>
This patch series improves the performance of loading indexes by
reducing the number of malloc() calls. The time to load the index from
disk is partly dominated by the time spent in malloc(), which is
called once for each index entry. Instead, this series allocates a
block of memory upfront that is large enough to hold all of the cache
entries, and carves the individual entries out of that block itself.
This change builds on [1], which is a prerequisite for this change.
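To make the idea concrete, here is a minimal sketch of block allocation
in C. It is not the code from this series: the names toy_block,
toy_pool, toy_pool_init, toy_pool_alloc and toy_pool_discard are made
up for illustration, and the growth policy (prepend one more block when
the head block is full) is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names; not the mem-pool API touched by this series. */
struct toy_block {
	struct toy_block *next;
	char *next_free;	/* first unused byte in this block */
	char *end;		/* one past the last usable byte */
	uintmax_t space[];	/* payload, aligned for any scalar type */
};

struct toy_pool {
	struct toy_block *head;
	size_t block_size;	/* payload size of a regular block */
	size_t malloc_calls;	/* counts how often malloc() was needed */
};

static void toy_pool_init(struct toy_pool *pool, size_t block_size)
{
	assert(block_size > 0);
	pool->head = NULL;
	pool->block_size = block_size;
	pool->malloc_calls = 0;
}

/* Allocate one block with room for `payload` bytes and make it the head. */
static struct toy_block *toy_block_new(struct toy_pool *pool, size_t payload)
{
	struct toy_block *b = malloc(sizeof(*b) + payload);

	if (!b)
		return NULL;
	b->next_free = (char *)b->space;
	b->end = b->next_free + payload;
	b->next = pool->head;
	pool->head = b;
	pool->malloc_calls++;
	return b;
}

/* Hand out `len` bytes from the head block, growing the pool if needed. */
static void *toy_pool_alloc(struct toy_pool *pool, size_t len)
{
	struct toy_block *b = pool->head;
	void *ret;

	/* Round up so that every returned pointer stays aligned. */
	len = (len + sizeof(uintmax_t) - 1) & ~(sizeof(uintmax_t) - 1);

	if (!b || (size_t)(b->end - b->next_free) < len) {
		b = toy_block_new(pool, len > pool->block_size ?
					len : pool->block_size);
		if (!b)
			return NULL;
	}
	ret = b->next_free;
	b->next_free += len;
	return ret;
}

/* Free every block at once; individual allocations are never freed. */
static void toy_pool_discard(struct toy_pool *pool)
{
	while (pool->head) {
		struct toy_block *next = pool->head->next;

		free(pool->head);
		pool->head = next;
	}
}
```

With a block large enough for the whole index, loading N entries costs
one malloc() call instead of N, and discarding the index frees the
blocks rather than each entry.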
Git allocated a block of memory for the index cache entries until
[2].
This 5-part patch series is broken up as follows:
1/5, 2/5 - Move cache entry lifecycle methods behind an API
3/5 - Fill out memory pool API to include lifecycle and other
methods used in later patches
4/5 - Allocate cache entry structs from memory pool
5/5 - Add extra optional validation
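One plausible reason the lifecycle API has to come first: once some
entries are carved out of a block, callers can no longer call free() on
an entry directly. The sketch below illustrates that shape only. All
names (toy_entry, toy_index, toy_make_entry, toy_discard_entry, ...)
are hypothetical, the single fixed-size arena is a stand-in for a real
memory pool, and the assert() is just an example of a consistency
check; it is not claimed to match what patch 5/5 does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* All names below are hypothetical, chosen for this sketch only. */
struct toy_entry {
	unsigned int pool_allocated;	/* 1 if carved from the index arena */
	size_t namelen;
	char name[];			/* NUL-terminated path */
};

struct toy_index {
	char *arena;	/* one upfront allocation standing in for a pool */
	size_t used;
	size_t cap;
};

static void toy_index_init(struct toy_index *idx, size_t cap)
{
	idx->arena = malloc(cap);
	idx->used = 0;
	idx->cap = idx->arena ? cap : 0;
}

/*
 * Create an entry. Entries that belong to an index come from its arena
 * when there is room; everything else falls back to malloc().
 */
static struct toy_entry *toy_make_entry(struct toy_index *idx, const char *name)
{
	size_t namelen = strlen(name);
	size_t len = sizeof(struct toy_entry) + namelen + 1;
	struct toy_entry *ce;

	/* Keep arena offsets aligned for the next entry. */
	len = (len + sizeof(uintmax_t) - 1) & ~(sizeof(uintmax_t) - 1);

	if (idx && idx->cap - idx->used >= len) {
		ce = (struct toy_entry *)(idx->arena + idx->used);
		idx->used += len;
		ce->pool_allocated = 1;
	} else {
		ce = malloc(len);
		if (!ce)
			return NULL;
		ce->pool_allocated = 0;
	}
	ce->namelen = namelen;
	memcpy(ce->name, name, namelen + 1);
	return ce;
}

/* Callers use this instead of free(); only malloc()ed entries are freed. */
static void toy_discard_entry(struct toy_index *idx, struct toy_entry *ce)
{
	if (!ce)
		return;
	if (ce->pool_allocated) {
		/* Example check: a pool entry must live inside the arena. */
		assert(idx && (char *)ce >= idx->arena &&
		       (char *)ce < idx->arena + idx->used);
		return;	/* reclaimed when the whole index is discarded */
	}
	free(ce);
}

/* Release the arena, and with it every pool-allocated entry. */
static void toy_index_discard(struct toy_index *idx)
{
	free(idx->arena);
	idx->arena = NULL;
	idx->used = idx->cap = 0;
}
```

Because creation and disposal both go through one pair of functions,
the allocation strategy can change without touching every caller.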
Performance Benchmarks:
To evaluate the performance of this approach, the p0002-read-cache.sh
test was run with several combinations of allocators (glibc default,
tcmalloc, jemalloc), with and without block allocation, and across
several different index sizes (100K, 1M, 2M entries). The details on
how these repositories were constructed can be found in [3]. The
p0002-read-cache.sh test was run with the iteration count set to 1 and
$GIT_PERF_REPEAT_COUNT=10.
The tests were run with iteration count set to 1 because this best
approximates the real behavior. The read_cache/discard_cache test will
load / free the index N times, and the performance of this logic is
different between N = 1 and N > 1. As the production code does not
read / discard the index in a loop, a better approximation is when N =
1.
100K:
Test                                       baseline [4]       block_allocation
------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.03(0.01+0.01)    0.02(0.01+0.01) -33.3%
1M:
Test                                       baseline           block_allocation
------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.23(0.12+0.11)    0.17(0.07+0.09) -26.1%
2M:
Test                                       baseline           block_allocation
------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.45(0.26+0.19)    0.39(0.17+0.20) -13.3%
100K is not a large enough sample size to show the perf impact of this
change, but we can see a perf improvement with 1M and 2M entries.
For completeness, here are the p0002-read-cache results for git.git and
linux.git:
git.git:
Test                                          baseline [4]       block_allocation
---------------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times   0.30(0.26+0.03)    0.17(0.13+0.03) -43.3%
linux.git:
Test                                          baseline           block_allocation
---------------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times   7.05(6.01+0.84)    4.61(3.74+0.66) -34.6%
We also investigated the performance of just using different
allocators. We can see that there is not a consistent performance
gain.
100K:
Test                                       baseline [4]       tcmalloc                  jemalloc
------------------------------------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.03(0.01+0.01)    0.03(0.01+0.01) +0.0%     0.03(0.02+0.01) +0.0%
1M:
Test                                       baseline           tcmalloc                  jemalloc
------------------------------------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.23(0.12+0.11)    0.21(0.10+0.10) -8.7%     0.27(0.16+0.10) +17.4%
2M:
Test                                       baseline           tcmalloc                  jemalloc
------------------------------------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.45(0.26+0.19)    0.46(0.25+0.21) +2.2%     0.57(0.36+0.21) +26.7%
[1] https://public-inbox.org/git/20180321164152.204869-1-jamill@microsoft.com/
[2] debed2a629 (read-cache.c: allocate index entries individually - 2011-10-24)
[3] Constructing test repositories:
The test repositories were constructed with t/perf/repos/many-files.sh
with the following parameters:
100K: many-files.sh 4 10 9
1M: many-files.sh 5 10 9
2M: many-files.sh 6 8 7
[4] baseline commit: 8b026eda Revert "Merge branch 'en/rename-directory-detection'"
Jameson Miller (5):
  read-cache: teach refresh_cache_entry to take istate
  Add an API creating / discarding cache_entry structs
  mem-pool: fill out functionality
  Allocate cache entries from memory pools
  Add optional memory validations around cache_entry lifecyle
apply.c                |  26 +++---
blame.c                |   5 +-
builtin/checkout.c     |   8 +-
builtin/difftool.c     |   8 +-
builtin/reset.c        |   6 +-
builtin/update-index.c |  26 +++---
cache.h                |  40 ++++++++-
git.c                  |   3 +
mem-pool.c             | 136 ++++++++++++++++++++++++++++-
mem-pool.h             |  34 ++++++++
merge-recursive.c      |   4 +-
read-cache.c           | 229 +++++++++++++++++++++++++++++++++++++++----------
resolve-undo.c         |   6 +-
split-index.c          |  31 +++++--
tree.c                 |   4 +-
unpack-trees.c         |  27 +++---
16 files changed, 476 insertions(+), 117 deletions(-)
base-commit: cafaccae98f749ebf33495aec42ea25060de8682
--
2.14.3