git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v1 0/5] Allocate cache entries from memory pool
@ 2018-04-17 16:34 Jameson Miller
  2018-04-17 16:34 ` Jameson Miller
                   ` (11 more replies)
  0 siblings, 12 replies; 100+ messages in thread
From: Jameson Miller @ 2018-04-17 16:34 UTC (permalink / raw)
  To: git@vger.kernel.org
  Cc: gitster@pobox.com, pclouds@gmail.com, jonathantanmy@google.com,
	Jameson Miller

This patch series improves the performance of loading indexes by
reducing the number of malloc() calls. Loading the index from disk is
partly dominated by the time in malloc(), which is called for each
index entry. This patch series reduces the number of times malloc() is
called as part of loading the index, and instead allocates a block of
memory upfront that is large enough to hold all of the cache entries,
and chunks this memory itself. This change builds on [1], which is a
prerequisite for this change.

Git previously allocated block of memory for the index cache entries
until [2].

This 5 part patch series is broken up as follows:

  1/5, 2/5 - Move cache entry lifecycle methods behind an API

  3/5 - Fill out memory pool API to include lifecycle and other
      	methods used in later patches

  4/5 - Allocate cache entry structs from memory pool

  5/5 - Add extra optional validation

Performance Benchmarks:

To evaluate the performance of this approach, the p0002-read-cache.sh
test was run with several combinations of allocators (glibc default,
tcmalloc, jemalloc), with and without block allocation, and across
several different index sized (100K, 1M, 2M entries). The details on
how these repositories were constructed can be found in [3].The
p0002-read-cache.sh was run with the iteration count set to 1 and
$GIT_PERF_REPEAT_COUNT=10.

The tests were run with iteration count set to 1 because this best
approximates the real behavior. The read_cache/discard_cache test will
load / free the index N times, and the performance of this logic is
different between N = 1 and N > 1. As the production code does not
read / discard the index in a loop, a better approximation is when N =
1.

100K

Test                                       baseline [4]       block_allocation
 ------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.03(0.01+0.01)    0.02(0.01+0.01) -33.3%

1M:

Test                                       baseline           block_allocation
 ------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.23(0.12+0.11)    0.17(0.07+0.09) -26.1%

2M:

Test                                       baseline           block_allocation
 ------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.45(0.26+0.19)    0.39(0.17+0.20) -13.3%


100K is not a large enough sample size to show the perf impact of this
change, but we can see a perf improvement with 1M and 2M entries. For
completeness, here is the p0002-read-cache tests for git.git and
linux.git:

git.git:

Test                                          baseline          block_allocation
 ---------------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times   0.30(0.26+0.03)   0.17(0.13+0.03) -43.3%

linux.git:

Test                                          baseline          block_allocation
 ---------------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times   7.05(6.01+0.84)   4.61(3.74+0.66) -34.6% 


We also investigated the performance of just using different
allocators. We can see that there is not a consistent performance
gain.

100K

Test                                       baseline [4]      tcmalloc                  jemalloc
 ------------------------------------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.03(0.01+0.01)   0.03(0.01+0.01) +0.0%     0.03(0.02+0.01) +0.0% 

1M:

Test                                       baseline          tcmalloc                  jemalloc
 ------------------------------------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.23(0.12+0.11)   0.21(0.10+0.10) -8.7%     0.27(0.16+0.10) +17.4%

2M:

Test                                       baseline          tcmalloc                  jemalloc
 ------------------------------------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1 times   0.45(0.26+0.19)   0.46(0.25+0.21) +2.2%     0.57(0.36+0.21) +26.7%


[1] https://public-inbox.org/git/20180321164152.204869-1-jamill@microsoft.com/

[2] debed2a629 (read-cache.c: allocate index entries individually - 2011-10-24)

[3] Constructing test repositories:

The test repositories were constructed with t/perf/repos/many_files.sh with the following parameters:

100K:	 many-files.sh 4 10 9
1M:	 many-files.sh 5 10 9
2M:	 many-files.sh 6 8 7

[4] baseline commit: 8b026eda Revert "Merge branch 'en/rename-directory-detection'"

Jameson Miller (5):
  read-cache: teach refresh_cache_entry to take istate
  Add an API creating / discarding cache_entry structs
  mem-pool: fill out functionality
  Allocate cache entries from memory pools
  Add optional memory validations around cache_entry lifecyle

 apply.c                |  26 +++---
 blame.c                |   5 +-
 builtin/checkout.c     |   8 +-
 builtin/difftool.c     |   8 +-
 builtin/reset.c        |   6 +-
 builtin/update-index.c |  26 +++---
 cache.h                |  40 ++++++++-
 git.c                  |   3 +
 mem-pool.c             | 136 ++++++++++++++++++++++++++++-
 mem-pool.h             |  34 ++++++++
 merge-recursive.c      |   4 +-
 read-cache.c           | 229 +++++++++++++++++++++++++++++++++++++++----------
 resolve-undo.c         |   6 +-
 split-index.c          |  31 +++++--
 tree.c                 |   4 +-
 unpack-trees.c         |  27 +++---
 16 files changed, 476 insertions(+), 117 deletions(-)


base-commit: cafaccae98f749ebf33495aec42ea25060de8682
-- 
2.14.3



^ permalink raw reply	[flat|nested] 100+ messages in thread

end of thread, other threads:[~2018-07-22  9:23 UTC | newest]

Thread overview: 100+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-17 16:34 [PATCH v1 0/5] Allocate cache entries from memory pool Jameson Miller
2018-04-17 16:34 ` Jameson Miller
2018-04-17 16:34 ` [PATCH v1 1/5] read-cache: teach refresh_cache_entry to take istate Jameson Miller
2018-04-17 19:00   ` Ben Peart
2018-04-17 16:34 ` [PATCH v1 2/5] Add an API creating / discarding cache_entry structs Jameson Miller
2018-04-17 23:11   ` Ben Peart
2018-04-17 16:34 ` [PATCH v1 4/5] Allocate cache entries from memory pools Jameson Miller
2018-04-17 16:34 ` [PATCH v1 3/5] mem-pool: fill out functionality Jameson Miller
2018-04-20 23:21   ` Jonathan Tan
2018-04-23 17:27     ` Jameson Miller
2018-04-23 17:49       ` Jonathan Tan
2018-04-23 18:20         ` Jameson Miller
2018-04-17 16:34 ` [PATCH v1 5/5] Add optional memory validations around cache_entry lifecyle Jameson Miller
2018-04-17 18:39 ` [PATCH v1 0/5] Allocate cache entries from memory pool Ben Peart
2018-04-23 14:09   ` Jameson Miller
2018-04-18  4:49 ` Junio C Hamano
2018-04-20 17:49   ` Stefan Beller
2018-04-23 16:44     ` Jameson Miller
2018-04-23 17:18       ` Stefan Beller
2018-04-23 16:19   ` Jameson Miller
2018-04-20 23:34 ` Jonathan Tan
2018-04-23 17:14   ` Jameson Miller
2018-04-30 15:31 ` [PATCH v2 " Jameson Miller
2018-04-30 15:31   ` [PATCH v2 1/5] read-cache: teach refresh_cache_entry() to take istate Jameson Miller
2018-04-30 15:31   ` [PATCH v2 2/5] block alloc: add lifecycle APIs for cache_entry structs Jameson Miller
2018-04-30 15:31   ` [PATCH v2 3/5] mem-pool: fill out functionality Jameson Miller
2018-04-30 21:42     ` Stefan Beller
2018-05-01 15:43       ` Jameson Miller
2018-05-03 16:18     ` Duy Nguyen
2018-04-30 15:31   ` [PATCH v2 4/5] block alloc: allocate cache entries from mem_pool Jameson Miller
2018-04-30 15:31   ` [PATCH v2 5/5] block alloc: add validations around cache_entry lifecyle Jameson Miller
2018-05-03 16:28     ` Duy Nguyen
2018-05-03 16:35   ` [PATCH v2 0/5] Allocate cache entries from memory pool Duy Nguyen
2018-05-03 17:21     ` Stefan Beller
2018-05-03 19:17       ` Duy Nguyen
2018-05-03 20:58         ` Stefan Beller
2018-05-03 21:13           ` Jameson Miller
2018-05-03 22:18             ` [PATCH] alloc.c: replace alloc by mempool Stefan Beller
2018-05-04 16:33               ` Duy Nguyen
2018-05-08  0:37                 ` Junio C Hamano
2018-05-08  0:44                   ` Stefan Beller
2018-05-08  1:07                     ` Junio C Hamano
2018-05-23 14:47 ` [PATCH v3 0/7] allocate cache entries from memory pool Jameson Miller
2018-05-23 14:47   ` [PATCH v3 1/7] read-cache: teach refresh_cache_entry() to take istate Jameson Miller
2018-05-25 22:54     ` Stefan Beller
2018-05-23 14:47   ` [PATCH v3 2/7] block alloc: add lifecycle APIs for cache_entry structs Jameson Miller
2018-05-24  4:52     ` Junio C Hamano
2018-05-24 14:47       ` Jameson Miller
2018-05-23 14:47   ` [PATCH v3 3/7] mem-pool: only search head block for available space Jameson Miller
2018-05-23 14:47   ` [PATCH v3 4/7] mem-pool: add lifecycle management functions Jameson Miller
2018-05-23 14:47   ` [PATCH v3 5/7] mem-pool: fill out functionality Jameson Miller
2018-06-01 19:28     ` Stefan Beller
2018-05-23 14:47   ` [PATCH v3 6/7] block alloc: allocate cache entries from mem_pool Jameson Miller
2018-05-23 14:47   ` [PATCH v3 7/7] block alloc: add validations around cache_entry lifecyle Jameson Miller
2018-05-24  4:55   ` [PATCH v3 0/7] allocate cache entries from memory pool Junio C Hamano
2018-05-24 14:44     ` Jameson Miller
2018-05-25 22:53       ` Stefan Beller
2018-06-20 20:41         ` Jameson Miller
2018-05-25 22:41   ` Stefan Beller
2018-06-20 20:17 ` [PATCH v4 0/8] Allocate cache entries from mem_pool Jameson Miller
2018-06-20 20:17   ` [PATCH v4 1/8] read-cache: teach refresh_cache_entry() to take istate Jameson Miller
2018-06-20 20:17   ` [PATCH v4 2/8] block alloc: add lifecycle APIs for cache_entry structs Jameson Miller
2018-06-21 21:14     ` Stefan Beller
2018-06-28 14:07       ` Jameson Miller
2018-06-20 20:17   ` [PATCH v4 3/8] mem-pool: only search head block for available space Jameson Miller
2018-06-21 21:33     ` Stefan Beller
2018-06-28 14:12       ` Jameson Miller
2018-06-20 20:17   ` [PATCH v4 4/8] mem-pool: tweak math on mp_block allocation size Jameson Miller
2018-06-20 20:17   ` [PATCH v4 5/8] mem-pool: add lifecycle management functions Jameson Miller
2018-06-20 20:17   ` [PATCH v4 6/8] mem-pool: fill out functionality Jameson Miller
2018-06-20 20:17   ` [PATCH v4 7/8] block alloc: allocate cache entries from mem_pool Jameson Miller
2018-06-20 20:17   ` [PATCH v4 8/8] block alloc: add validations around cache_entry lifecyle Jameson Miller
2018-06-28 14:00   ` [PATCH v5 0/8] Allocate cache entries from mem_pool Jameson Miller
2018-06-28 14:00     ` [PATCH v5 1/8] read-cache: teach refresh_cache_entry() to take istate Jameson Miller
2018-06-28 14:00     ` [PATCH v5 2/8] read-cache: make_cache_entry should take object_id struct Jameson Miller
2018-06-28 17:14       ` Junio C Hamano
2018-06-28 22:27       ` SZEDER Gábor
2018-06-28 14:00     ` [PATCH v5 3/8] block alloc: add lifecycle APIs for cache_entry structs Jameson Miller
2018-06-28 18:43       ` Junio C Hamano
2018-06-28 22:28       ` SZEDER Gábor
2018-06-28 14:00     ` [PATCH v5 4/8] mem-pool: only search head block for available space Jameson Miller
2018-06-28 14:00     ` [PATCH v5 5/8] mem-pool: add life cycle management functions Jameson Miller
2018-06-28 17:15       ` Junio C Hamano
2018-06-28 14:00     ` [PATCH v5 6/8] mem-pool: fill out functionality Jameson Miller
2018-06-28 19:09       ` Junio C Hamano
2018-07-02 18:28         ` Jameson Miller
2018-06-28 14:00     ` [PATCH v5 7/8] block alloc: allocate cache entries from mem-pool Jameson Miller
2018-06-28 14:00     ` [PATCH v5 8/8] block alloc: add validations around cache_entry lifecyle Jameson Miller
2018-07-02 19:49     ` [PATCH v6 0/8] Allocate cache entries from mem_pool Jameson Miller
2018-07-02 19:49       ` [PATCH v6 1/8] read-cache: teach refresh_cache_entry to take istate Jameson Miller
2018-07-02 19:49       ` [PATCH v6 2/8] read-cache: teach make_cache_entry to take object_id Jameson Miller
2018-07-02 21:23         ` Stefan Beller
2018-07-05 15:20           ` Jameson Miller
2018-07-02 19:49       ` [PATCH v6 3/8] block alloc: add lifecycle APIs for cache_entry structs Jameson Miller
2018-07-22  9:23         ` Duy Nguyen
2018-07-02 19:49       ` [PATCH v6 4/8] mem-pool: only search head block for available space Jameson Miller
2018-07-02 19:49       ` [PATCH v6 5/8] mem-pool: add life cycle management functions Jameson Miller
2018-07-02 19:49       ` [PATCH v6 6/8] mem-pool: fill out functionality Jameson Miller
2018-07-02 19:49       ` [PATCH v6 7/8] block alloc: allocate cache entries from mem_pool Jameson Miller
2018-07-02 19:49       ` [PATCH v6 8/8] block alloc: add validations around cache_entry lifecyle Jameson Miller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).