From: Ben Peart <peartben@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, pclouds@gmail.com, Ben Peart <benpeart@microsoft.com>
Subject: [PATCH v6 0/7] speed up index load through parallelization
Date: Wed, 26 Sep 2018 15:54:35 -0400
Message-ID: <20180926195442.1380-1-benpeart@microsoft.com>
In-Reply-To: <20180823154053.20212-1-benpeart@microsoft.com>

Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/a0300882d4
Checkout: git fetch https://github.com/benpeart/git read-index-multithread-v6 && git checkout a0300882d4

This iteration brings back the Index Entry Offset Table (IEOT) extension,
which enables us to multi-thread the cache entry parsing without the
primary thread having to scan all the entries first. In cases where the
cache entry parsing is the most expensive part, this yields some additional
savings.

Using p0002-read-cache.sh to generate some performance numbers shows how
each of the various patches contributes to the overall performance win.

Test w/100,000 files    Baseline  Optimize V4     Extensions      Entries
----------------------------------------------------------------------------
0002.1: read_cache      22.36     18.74 -16.2%    18.64 -16.6%    12.63 -43.5%

Test w/1,000,000 files  Baseline  Optimize V4     Extensions      Entries
-----------------------------------------------------------------------------
0002.1: read_cache      304.40    270.70 -11.1%   195.50 -35.8%   204.82 -32.7%

Note that in the 1,000,000 files case, multi-threading the cache entry
parsing does not yield a performance win. This is because the cost of
parsing the index extensions in this repo far outweighs the cost of loading
the cache entries.

Name                                First    Last     Elapsed
load_index_extensions()             629.001  870.244  241.243
load_cache_entries_thread()         683.911  723.199   39.288
load_cache_entries_thread()         686.206  723.512   37.306
load_cache_entries_thread()         686.43   722.596   36.166
load_cache_entries_thread()         684.998  718.74    33.742
load_cache_entries_thread()         685.035  718.698   33.663
load_cache_entries_thread()         686.557  709.545   22.988
load_cache_entries_thread()         684.533  703.536   19.003
load_cache_entries_thread()         684.537  703.521   18.984
load_cache_entries_thread()         685.062  703.774   18.712
load_cache_entries_thread()         685.42   703.416   17.996
load_cache_entries_thread()         648.604  664.496   15.892
Total load_cache_entries_thread()                     293.74

The high cost of parsing the index extensions is driven by the cache tree
and the untracked cache extensions. As this is currently the longest pole,
any reduction in this time will reduce the overall index load time, so it
is worth further investigation in another patch series.

Name                                First    Last     Elapsed
| + git!read_index_extension        684.052  870.244  186.192
| + git!cache_tree_read             684.052  797.801  113.749
| + git!read_untracked_extension    797.801  870.244   72.443

One option would be to load each extension on a separate thread, but I
believe that is overkill for the vast majority of repos. Instead, some
optimization of the loading code for these two extensions is probably worth
looking into, as a quick examination shows that the bulk of the time for
both of them is spent in xcalloc().
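
To make the "longest pole" argument concrete, here is a small sketch that redoes the arithmetic on the first timing table above. It is only an illustration: the First/Last values are copied from that table, while the helper names (summed_elapsed, wall_clock_window) are made up for this example and are not part of git or of this series.

```python
# First/Last timestamps copied from the timing table above, in the table's
# own (unstated) time units. All names here are hypothetical helpers.
EXTENSIONS = (629.001, 870.244)  # load_index_extensions()

ENTRY_THREADS = [  # one (first, last) pair per load_cache_entries_thread()
    (683.911, 723.199),
    (686.206, 723.512),
    (686.43, 722.596),
    (684.998, 718.74),
    (685.035, 718.698),
    (686.557, 709.545),
    (684.533, 703.536),
    (684.537, 703.521),
    (685.062, 703.774),
    (685.42, 703.416),
    (648.604, 664.496),
]


def summed_elapsed(spans):
    """Time spent across all threads, added together."""
    return sum(last - first for first, last in spans)


def wall_clock_window(spans):
    """Time from the earliest start to the latest finish.

    Because the threads overlap, this is an upper bound on how long
    the whole group keeps the caller waiting.
    """
    earliest = min(first for first, _ in spans)
    latest = max(last for _, last in spans)
    return latest - earliest


if __name__ == "__main__":
    ext_elapsed = EXTENSIONS[1] - EXTENSIONS[0]
    ext_tail = EXTENSIONS[1] - max(last for _, last in ENTRY_THREADS)
    print(f"entry threads, summed:     {summed_elapsed(ENTRY_THREADS):.3f}")
    print(f"entry threads, wall clock: {wall_clock_window(ENTRY_THREADS):.3f}")
    print(f"extensions thread:         {ext_elapsed:.3f}")
    print(f"extensions still running after last entry thread: {ext_tail:.3f}")
```

The summed figure matches the "Total" row, but since the entry threads overlap, their wall-clock window is much shorter than the single extensions thread, which keeps running well after the last entry thread has finished. That is why adding more entry threads cannot shorten the load in this repo.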

### Patches

Ben Peart (6):
  read-cache: clean up casting and byte decoding
  eoie: add End of Index Entry (EOIE) extension
  config: add new index.threads config setting
  read-cache: load cache extensions on a worker thread
  ieot: add Index Entry Offset Table (IEOT) extension
  read-cache: load cache entries on worker threads

Nguyễn Thái Ngọc Duy (1):
  read-cache.c: optimize reading index format v4

 Documentation/config.txt                 |   7 +
 Documentation/technical/index-format.txt |  41 ++
 config.c                                 |  18 +
 config.h                                 |   1 +
 read-cache.c                             | 741 +++++++++++++++++++----
 t/README                                 |  10 +
 t/t1700-split-index.sh                   |   2 +
 7 files changed, 705 insertions(+), 115 deletions(-)

base-commit: fe8321ec057f9231c26c29b364721568e58040f7
-- 
2.18.0.windows.1