From: Ben Peart <peartben@gmail.com> To: Stefan Beller <sbeller@google.com>, Ben Peart <Ben.Peart@microsoft.com> Cc: git <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com> Subject: Re: [PATCH v1] read-cache: speed up index load through parallelization Date: Thu, 23 Aug 2018 15:44:58 -0400 Message-ID: <12b29bbf-4565-fe0e-97d4-19da2b4ddf5e@gmail.com> (raw) In-Reply-To: <CAGZ79kbXfPPvcQ1rnUdiOqWs5wC2qccGCnf8DvCVnp8QV126MA@mail.gmail.com> On 8/23/2018 1:31 PM, Stefan Beller wrote: > On Thu, Aug 23, 2018 at 8:45 AM Ben Peart <Ben.Peart@microsoft.com> wrote: >> >> This patch helps address the CPU cost of loading the index by creating >> multiple threads to divide the work of loading and converting the cache >> entries across all available CPU cores. >> >> It accomplishes this by having the primary thread loop across the index file >> tracking the offset and (for V4 indexes) expanding the name. It creates a >> thread to process each block of entries as it comes to them. Once the >> threads are complete and the cache entries are loaded, the rest of the >> extensions can be loaded and processed normally on the primary thread. >> >> Performance impact: >> >> read cache .git/index times on a synthetic repo with: >> >> 100,000 entries >> FALSE TRUE Savings %Savings >> 0.014798767 0.009580433 0.005218333 35.26% >> >> 1,000,000 entries >> FALSE TRUE Savings %Savings >> 0.240896533 0.1751243 0.065772233 27.30% >> >> read cache .git/index times on an actual repo with: >> >> ~3M entries >> FALSE TRUE Savings %Savings >> 0.59898098 0.4513169 0.14766408 24.65% >> >> Signed-off-by: Ben Peart <Ben.Peart@microsoft.com> >> --- >> >> Notes: >> Base Ref: master >> Web-Diff: https://github.com/benpeart/git/commit/67a700419b >> Checkout: git fetch https://github.com/benpeart/git read-index-multithread-v1 && git checkout 67a700419b >> >> Documentation/config.txt | 8 ++ >> config.c | 13 +++ >> config.h | 1 + >> read-cache.c | 218 ++++++++++++++++++++++++++++++++++----- >> 4 files changed, 216 insertions(+), 24 deletions(-) >> >> diff --git a/Documentation/config.txt b/Documentation/config.txt >> index 1c42364988..3344685cc4 100644 >> --- a/Documentation/config.txt >> +++ b/Documentation/config.txt >> @@ -899,6 +899,14 @@ relatively high IO latencies. When enabled, Git will do the >> index comparison to the filesystem data in parallel, allowing >> overlapping IO's. Defaults to true. >> >> +core.fastIndex:: >> + Enable parallel index loading >> ++ >> +This can speed up operations like 'git diff' and 'git status' especially >> +when the index is very large. When enabled, Git will do the index >> +loading from the on disk format to the in-memory format in parallel. >> +Defaults to true. > > "fast" is a non-descriptive word as we try to be fast in any operation? > Maybe core.parallelIndexReading as that just describes what it > turns on/off, without second guessing its effects? > (Are there still computers with just a single CPU, where this would not > make it faster? ;-)) > How about core.parallelReadIndex? Slightly shorter and matches the function names better. > >> +int git_config_get_fast_index(void) >> +{ >> + int val; >> + >> + if (!git_config_get_maybe_bool("core.fastindex", &val)) >> + return val; >> + >> + if (getenv("GIT_FASTINDEX_TEST")) >> + return 1; > > We look at this env value just before calling this function, > can be write it to only look at the evn variable once? > Sure, I didn't like the fact that it was called twice but didn't get around to cleaning it up. >> +++ b/config.h >> @@ -250,6 +250,7 @@ extern int git_config_get_untracked_cache(void); >> extern int git_config_get_split_index(void); >> extern int git_config_get_max_percent_split_change(void); >> extern int git_config_get_fsmonitor(void); >> +extern int git_config_get_fast_index(void); > > Oh. nd/no-extern did not cover config.h > > >> >> +#ifndef min >> +#define min(a,b) (((a) < (b)) ? (a) : (b)) >> +#endif > > We do not have a minimum function in the tree, > except for xdiff/xmacros.h:29: XDL_MIN. > I wonder what the rationale is for not having a MIN() > definition, I think we discussed that on the list a couple > times but the rationale escaped me. > > If we introduce a min/max macro, can we put it somewhere > more prominent? (I would find it useful elsewhere) > I'll avoid that particular rabbit hole and just remove the min macro definition. ;-) >> +/* >> +* Mostly randomly chosen maximum thread counts: we >> +* cap the parallelism to online_cpus() threads, and we want >> +* to have at least 7500 cache entries per thread for it to >> +* be worth starting a thread. >> +*/ >> +#define THREAD_COST (7500) > > This reads very similar to preload-index.c THREAD_COST > >> + /* loop through index entries starting a thread for every thread_nr entries */ >> + consumed = thread = 0; >> + for (i = 0; ; i++) { >> + struct ondisk_cache_entry *ondisk; >> + const char *name; >> + unsigned int flags; >> + >> + /* we've reached the begining of a block of cache entries, kick off a thread to process them */ > > beginning > Thanks > Thanks, > Stefan >
next prev parent reply other threads:[~2018-08-23 19:45 UTC|newest] Thread overview: 199+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-08-23 15:41 Ben Peart 2018-08-23 17:31 ` Stefan Beller 2018-08-23 19:44 ` Ben Peart [this message] 2018-08-24 18:40 ` Duy Nguyen 2018-08-28 14:53 ` Ben Peart 2018-08-23 18:06 ` Junio C Hamano 2018-08-23 20:33 ` Ben Peart 2018-08-24 15:37 ` Duy Nguyen 2018-08-24 15:57 ` Duy Nguyen 2018-08-24 17:28 ` Ben Peart 2018-08-25 6:44 ` [PATCH] read-cache.c: optimize reading index format v4 Nguyễn Thái Ngọc Duy 2018-08-27 19:36 ` Junio C Hamano 2018-08-28 19:25 ` Duy Nguyen 2018-08-28 23:54 ` Ben Peart 2018-08-29 17:14 ` Junio C Hamano 2018-09-04 16:08 ` Duy Nguyen 2018-09-02 13:19 ` [PATCH v2 0/1] " Nguyễn Thái Ngọc Duy 2018-09-02 13:19 ` [PATCH v2 1/1] read-cache.c: " Nguyễn Thái Ngọc Duy 2018-09-04 18:58 ` Junio C Hamano 2018-09-04 19:31 ` Junio C Hamano 2018-08-24 18:20 ` [PATCH v1] read-cache: speed up index load through parallelization Duy Nguyen 2018-08-24 18:40 ` Ben Peart 2018-08-24 19:00 ` Duy Nguyen 2018-08-24 19:57 ` Ben Peart 2018-08-29 15:25 ` [PATCH v2 0/3] " Ben Peart 2018-08-29 15:25 ` [PATCH v2 1/3] " Ben Peart 2018-08-29 17:14 ` Junio C Hamano 2018-08-29 21:35 ` Ben Peart 2018-09-03 19:16 ` Duy Nguyen 2018-08-29 15:25 ` [PATCH v2 2/3] read-cache: load cache extensions on worker thread Ben Peart 2018-08-29 17:12 ` Junio C Hamano 2018-08-29 21:42 ` Ben Peart 2018-08-29 22:19 ` Junio C Hamano 2018-09-03 19:21 ` Duy Nguyen 2018-09-03 19:27 ` Duy Nguyen 2018-08-29 15:25 ` [PATCH v2 3/3] read-cache: micro-optimize expand_name_field() to speed up V4 index parsing Ben Peart 2018-09-06 21:03 ` [PATCH v3 0/4] read-cache: speed up index load through parallelization Ben Peart 2018-09-06 21:03 ` [PATCH v3 1/4] read-cache: optimize expand_name_field() to speed up V4 index parsing Ben Peart 2018-09-06 21:03 ` [PATCH v3 2/4] eoie: add End of Index Entry (EOIE) extension Ben Peart 2018-09-07 17:55 ` Junio C Hamano 2018-09-07 20:23 ` Ben Peart 2018-09-08 6:29 ` Martin Ågren 2018-09-08 14:03 ` Ben Peart 2018-09-08 17:08 ` Martin Ågren 2018-09-06 21:03 ` [PATCH v3 3/4] read-cache: load cache extensions on a worker thread Ben Peart 2018-09-07 21:10 ` Junio C Hamano 2018-09-08 14:56 ` Ben Peart 2018-09-06 21:03 ` [PATCH v3 4/4] read-cache: speed up index load through parallelization Ben Peart 2018-09-07 4:16 ` Torsten Bögershausen 2018-09-07 13:43 ` Ben Peart 2018-09-07 17:21 ` [PATCH v3 0/4] " Junio C Hamano 2018-09-07 18:31 ` Ben Peart 2018-09-08 13:18 ` Duy Nguyen 2018-09-11 23:26 ` [PATCH v4 0/5] " Ben Peart 2018-09-11 23:26 ` [PATCH v4 1/5] eoie: add End of Index Entry (EOIE) extension Ben Peart 2018-09-11 23:26 ` [PATCH v4 2/5] read-cache: load cache extensions on a worker thread Ben Peart 2018-09-11 23:26 ` [PATCH v4 3/5] read-cache: speed up index load through parallelization Ben Peart 2018-09-11 23:26 ` [PATCH v4 4/5] read-cache.c: optimize reading index format v4 Ben Peart 2018-09-11 23:26 ` [PATCH v4 5/5] read-cache: clean up casting and byte decoding Ben Peart 2018-09-12 14:34 ` [PATCH v4 0/5] read-cache: speed up index load through parallelization Ben Peart 2018-09-12 16:18 ` [PATCH v5 " Ben Peart 2018-09-12 16:18 ` [PATCH v5 1/5] eoie: add End of Index Entry (EOIE) extension Ben Peart 2018-09-13 22:44 ` Junio C Hamano 2018-09-15 10:02 ` Duy Nguyen 2018-09-17 14:54 ` Ben Peart 2018-09-17 16:05 ` Duy Nguyen 2018-09-17 17:31 ` Junio C Hamano 2018-09-17 17:38 ` Duy Nguyen 2018-09-17 19:08 ` Junio C Hamano 2018-09-12 16:18 ` [PATCH v5 2/5] read-cache: load cache extensions on a worker thread Ben Peart 2018-09-15 10:22 ` Duy Nguyen 2018-09-15 10:24 ` Duy Nguyen 2018-09-17 16:38 ` Ben Peart 2018-09-15 16:23 ` Duy Nguyen 2018-09-17 17:19 ` Junio C Hamano 2018-09-17 16:26 ` Ben Peart 2018-09-17 16:45 ` Duy Nguyen 2018-09-17 21:32 ` Junio C Hamano 2018-09-12 16:18 ` [PATCH v5 3/5] read-cache: load cache entries on worker threads Ben Peart 2018-09-15 10:31 ` Duy Nguyen 2018-09-17 17:25 ` Ben Peart 2018-09-15 11:07 ` Duy Nguyen 2018-09-15 11:09 ` Duy Nguyen 2018-09-17 18:52 ` Ben Peart 2018-09-15 11:29 ` Duy Nguyen 2018-09-12 16:18 ` [PATCH v5 4/5] read-cache.c: optimize reading index format v4 Ben Peart 2018-09-12 16:18 ` [PATCH v5 5/5] read-cache: clean up casting and byte decoding Ben Peart 2018-09-26 19:54 ` [PATCH v6 0/7] speed up index load through parallelization Ben Peart 2018-09-26 19:54 ` [PATCH v6 1/7] read-cache.c: optimize reading index format v4 Ben Peart 2018-09-26 19:54 ` [PATCH v6 2/7] read-cache: clean up casting and byte decoding Ben Peart 2018-09-26 19:54 ` [PATCH v6 3/7] eoie: add End of Index Entry (EOIE) extension Ben Peart 2018-09-28 0:19 ` SZEDER Gábor 2018-09-28 18:38 ` Ben Peart 2018-09-29 0:51 ` SZEDER Gábor 2018-09-29 5:45 ` Duy Nguyen 2018-09-29 18:24 ` Junio C Hamano 2018-09-26 19:54 ` [PATCH v6 4/7] config: add new index.threads config setting Ben Peart 2018-09-28 0:26 ` SZEDER Gábor 2018-09-28 13:39 ` Ben Peart 2018-09-28 17:07 ` Junio C Hamano 2018-09-28 19:41 ` Ben Peart 2018-09-28 20:30 ` Ramsay Jones 2018-09-28 22:15 ` Junio C Hamano 2018-10-01 13:17 ` Ben Peart 2018-10-01 15:06 ` SZEDER Gábor 2018-09-26 19:54 ` [PATCH v6 5/7] read-cache: load cache extensions on a worker thread Ben Peart 2018-09-26 19:54 ` [PATCH v6 6/7] ieot: add Index Entry Offset Table (IEOT) extension Ben Peart 2018-09-26 19:54 ` [PATCH v6 7/7] read-cache: load cache entries on worker threads Ben Peart 2018-09-26 22:06 ` [PATCH v6 0/7] speed up index load through parallelization Junio C Hamano 2018-09-27 17:13 ` Duy Nguyen 2018-10-01 13:45 ` [PATCH v7 " Ben Peart 2018-10-01 13:45 ` [PATCH v7 1/7] read-cache.c: optimize reading index format v4 Ben Peart 2018-10-01 13:45 ` [PATCH v7 2/7] read-cache: clean up casting and byte decoding Ben Peart 2018-10-01 15:10 ` Duy Nguyen 2018-10-01 13:45 ` [PATCH v7 3/7] eoie: add End of Index Entry (EOIE) extension Ben Peart 2018-10-01 15:17 ` SZEDER Gábor 2018-10-02 14:34 ` Ben Peart 2018-10-01 15:30 ` Duy Nguyen 2018-10-02 15:13 ` Ben Peart 2018-10-01 13:45 ` [PATCH v7 4/7] config: add new index.threads config setting Ben Peart 2018-10-01 13:45 ` [PATCH v7 5/7] read-cache: load cache extensions on a worker thread Ben Peart 2018-10-01 15:50 ` Duy Nguyen 2018-10-02 15:00 ` Ben Peart 2018-10-01 13:45 ` [PATCH v7 6/7] ieot: add Index Entry Offset Table (IEOT) extension Ben Peart 2018-10-01 16:27 ` Duy Nguyen 2018-10-02 16:34 ` Ben Peart 2018-10-02 17:02 ` Duy Nguyen 2018-10-01 13:45 ` [PATCH v7 7/7] read-cache: load cache entries on worker threads Ben Peart 2018-10-01 17:09 ` Duy Nguyen 2018-10-02 19:09 ` Ben Peart 2018-10-10 15:59 ` [PATCH v8 0/7] speed up index load through parallelization Ben Peart 2018-10-10 15:59 ` [PATCH v8 1/7] read-cache.c: optimize reading index format v4 Ben Peart 2018-10-10 15:59 ` [PATCH v8 2/7] read-cache: clean up casting and byte decoding Ben Peart 2018-10-10 15:59 ` [PATCH v8 3/7] eoie: add End of Index Entry (EOIE) extension Ben Peart 2018-10-10 15:59 ` [PATCH v8 4/7] config: add new index.threads config setting Ben Peart 2018-10-10 15:59 ` [PATCH v8 5/7] read-cache: load cache extensions on a worker thread Ben Peart 2018-10-10 15:59 ` [PATCH v8 6/7] ieot: add Index Entry Offset Table (IEOT) extension Ben Peart 2018-10-10 15:59 ` [PATCH v8 7/7] read-cache: load cache entries on worker threads Ben Peart 2018-10-19 16:11 ` Jeff King 2018-10-22 2:14 ` Junio C Hamano 2018-10-22 14:40 ` Ben Peart 2018-10-12 3:18 ` [PATCH v8 0/7] speed up index load through parallelization Junio C Hamano 2018-10-14 12:28 ` Duy Nguyen 2018-10-15 17:33 ` Ben Peart 2018-11-13 0:38 ` [PATCH 0/3] Avoid confusing messages from new index extensions (Re: [PATCH v8 0/7] speed up index load through parallelization) Jonathan Nieder 2018-11-13 0:39 ` [PATCH 1/3] eoie: default to not writing EOIE section Jonathan Nieder 2018-11-13 1:05 ` Junio C Hamano 2018-11-13 15:14 ` Ben Peart 2018-11-13 18:25 ` Jonathan Nieder 2018-11-14 1:36 ` Junio C Hamano 2018-11-15 0:19 ` Jonathan Nieder 2018-11-13 0:39 ` [PATCH 2/3] ieot: default to not writing IEOT section Jonathan Nieder 2018-11-13 0:58 ` Jonathan Tan 2018-11-13 1:09 ` Junio C Hamano 2018-11-13 1:12 ` Jonathan Nieder 2018-11-13 15:37 ` Duy Nguyen 2018-11-13 18:09 ` Jonathan Nieder 2018-11-13 15:22 ` Ben Peart 2018-11-13 18:18 ` Jonathan Nieder 2018-11-13 19:15 ` Ben Peart 2018-11-13 21:08 ` Jonathan Nieder 2018-11-14 18:09 ` Ben Peart 2018-11-15 0:05 ` Jonathan Nieder 2018-11-14 3:05 ` Junio C Hamano 2018-11-20 6:09 ` [PATCH v2 0/5] Avoid confusing messages from new index extensions Jonathan Nieder 2018-11-20 6:11 ` [PATCH 1/5] eoie: default to not writing EOIE section Jonathan Nieder 2018-11-20 13:06 ` Ben Peart 2018-11-20 13:21 ` SZEDER Gábor 2018-11-21 16:46 ` Jeff King 2018-11-22 0:47 ` Junio C Hamano 2018-11-20 15:01 ` Ben Peart 2018-11-20 6:12 ` [PATCH 2/5] ieot: default to not writing IEOT section Jonathan Nieder 2018-11-20 13:07 ` Ben Peart 2018-11-26 19:59 ` Stefan Beller 2018-11-26 21:47 ` Ben Peart 2018-11-26 22:02 ` Stefan Beller 2018-11-27 0:50 ` Junio C Hamano 2018-11-20 6:12 ` [PATCH 3/5] index: do not warn about unrecognized extensions Jonathan Nieder 2018-11-20 6:14 ` [PATCH 4/5] index: make index.threads=true enable ieot and eoie Jonathan Nieder 2018-11-20 13:24 ` Ben Peart 2018-11-20 6:15 ` [PATCH 5/5] index: offer advice for unknown index extensions Jonathan Nieder 2018-11-20 9:26 ` Ævar Arnfjörð Bjarmason 2018-11-20 13:30 ` Ben Peart 2018-11-21 0:22 ` Junio C Hamano 2018-11-21 0:39 ` Jonathan Nieder 2018-11-21 0:44 ` Jonathan Nieder 2018-11-21 5:01 ` Junio C Hamano 2018-11-21 5:04 ` Jonathan Nieder 2018-11-21 5:15 ` Junio C Hamano 2018-11-21 5:31 ` Junio C Hamano 2018-11-21 1:03 ` Jonathan Nieder 2018-11-21 4:23 ` Junio C Hamano 2018-11-21 4:57 ` Jonathan Nieder 2018-11-21 9:30 ` Ævar Arnfjörð Bjarmason 2018-11-13 0:40 ` [PATCH 3/3] index: do not warn about unrecognized extensions Jonathan Nieder 2018-11-13 1:10 ` Junio C Hamano 2018-11-13 15:25 ` Ben Peart 2018-11-14 3:24 ` Junio C Hamano 2018-11-14 18:19 ` Ben Peart
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style List information: http://vger.kernel.org/majordomo-info.html * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=12b29bbf-4565-fe0e-97d4-19da2b4ddf5e@gmail.com \ --to=peartben@gmail.com \ --cc=Ben.Peart@microsoft.com \ --cc=git@vger.kernel.org \ --cc=gitster@pobox.com \ --cc=sbeller@google.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
git@vger.kernel.org list mirror (unofficial, one of many) This inbox may be cloned and mirrored by anyone: git clone --mirror https://public-inbox.org/git git clone --mirror http://ou63pmih66umazou.onion/git git clone --mirror http://czquwvybam4bgbro.onion/git git clone --mirror http://hjrcffqmbrq6wope.onion/git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V1 git git/ https://public-inbox.org/git \ git@vger.kernel.org public-inbox-index git Example config snippet for mirrors. Newsgroups are available over NNTP: nntp://news.public-inbox.org/inbox.comp.version-control.git nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git nntp://news.gmane.io/gmane.comp.version-control.git note: .onion URLs require Tor: https://www.torproject.org/ code repositories for the project(s) associated with this inbox: https://80x24.org/mirrors/git.git AGPL code for this site: git clone https://public-inbox.org/public-inbox.git