From: Ben Peart <benpeart@microsoft.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, pclouds@gmail.com, chriscool@tuxfamily.org,
Johannes.Schindelin@gmx.de, alexmv@dropbox.com, peff@peff.net,
Ben Peart <benpeart@microsoft.com>
Subject: [PATCH v1 0/4] Speed up index load through parallelization
Date: Thu, 9 Nov 2017 09:17:33 -0500
Message-ID: <20171109141737.47976-1-benpeart@microsoft.com>
This patch series helps address the CPU cost of loading the index. It adds
a table of contents extension to the index that allows us to multi-thread
the loading and conversion of cache entries from the on-disk format to the
in-memory format. This is particularly beneficial with large indexes and
with V4 indexes, which have a higher CPU cost due to the prefix encoding.
I wanted to get feedback on the concept because the way I'm adding the
table of contents information is a bit of a clever hack (see below): it
goes in an extension that can be read before the variable-length section
of cache entries and the other extensions. The same is true of the
resetting of the prefix encoding for V4 indexes. Both, however, are
entirely backwards compatible: older versions of git can still properly
read and use the index.
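To illustrate why resetting the prefix encoding matters, here is a minimal sketch of V4-style path decoding. It is not code from this series. In the V4 format each path is stored as a count of bytes to strip from the end of the previous path, followed by a NUL-terminated suffix. The sketch simplifies the strip count to a single byte (the real format uses a variable-width integer), and `decode_path` is a hypothetical name. My reading of "resetting" is that the first entry of each block is encoded against an empty previous path, so a thread can start decoding there without having seen any earlier entry.

```c
#include <stddef.h>
#include <string.h>

/*
 * Decode one prefix-compressed path into `prev`, which holds the
 * previously decoded path on entry and the new path on return.
 *
 * Simplification (assumption): the strip count is one byte; the real
 * V4 format stores it as a variable-width integer.
 *
 * Returns the number of input bytes consumed, or 0 on malformed input.
 */
static size_t decode_path(const unsigned char *in, size_t in_len,
			  char *prev, size_t prev_cap)
{
	size_t strip, prev_len, suffix_len;
	const unsigned char *nul;

	if (in_len < 2)
		return 0;
	strip = in[0];
	prev_len = strlen(prev);
	if (strip > prev_len)
		return 0; /* cannot strip more than the previous path has */

	nul = memchr(in + 1, '\0', in_len - 1);
	if (!nul)
		return 0;
	suffix_len = (size_t)(nul - (in + 1));
	if (prev_len - strip + suffix_len + 1 > prev_cap)
		return 0;

	/* Overwrite the stripped tail with the suffix, including its NUL. */
	memcpy(prev + prev_len - strip, in + 1, suffix_len + 1);
	return 1 + suffix_len + 1;
}
```

A decoder that starts in the middle of the entries would call this with `prev` set to the empty string. That only works if the writer encoded that entry with a strip count of zero, which is the property the reset is meant to provide.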
I'm not particularly fond of the names "fastindex" and "IEOT." I've
wondered if "indextoc" and "Index Table of Contents (ITOC)" would be
better names but I'm open to suggestions.
As there is overhead to spinning up a thread, there is logic to only
do the index loading in parallel when there are enough entries for it
to help (currently set at 7,500 per thread with a minimum of 2 threads).
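The threshold described above could be sketched roughly as follows. This is an illustration, not the code in read-cache.c: the function name, the macro names, and the cap at the number of online CPUs are all assumptions made for the example. Only the 7,500 entries per thread and the minimum of 2 threads come from the description.

```c
#include <stddef.h>

#define ENTRIES_PER_THREAD 7500 /* from the cover letter */
#define MIN_THREADS 2           /* from the cover letter */

/*
 * Return how many threads to use for loading `nr_entries` cache
 * entries, or 1 to load on the calling thread only.
 *
 * Assumption: the count is capped at `online_cpus` when that value is
 * positive; the cover letter does not say how the upper bound is set.
 */
static int pick_thread_count(size_t nr_entries, int online_cpus)
{
	size_t n = nr_entries / ENTRIES_PER_THREAD;

	if (n < MIN_THREADS)
		return 1; /* too few entries to pay for thread startup */
	if (online_cpus > 0 && n > (size_t)online_cpus)
		n = (size_t)online_cpus;
	return (int)n;
}
```

Under these assumptions an index with 10,000 entries is still loaded on one thread, while 15,000 entries is the smallest index that gets split across two.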
The impact of the change can be seen using t/helper/test-read-cache:

                                   fastindex
test               count   files   TRUE     FALSE    Savings
------------------------------------------------------------------------
test-read-cache    500     100K    6.39     8.33     23.36%
test-read-cache    100     1M      12.49    18.68    33.12%
The on-disk format looks like this:

Index header
Cache entry 1
Cache entry 2
.
.
.
Extension 1
Extension 2
.
.
Index Entry Offset Table Extension (must be written last!)
	IEOT signature bytes
	32-bit size
	32-bit version
	32-bit Cache Entry Offset 1
	32-bit Cache Entry count
	32-bit Cache Entry Offset 2
	32-bit Cache Entry count
	.
	.
	.
	32-bit version
	32-bit size
	IEOT signature bytes
SHA1
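Because the extension is written last and repeats its version, size, and signature as a trailer, a reader can find it by working backwards from the end of the file. The sketch below shows one way that lookup could work; it is not the code from this series. It makes several assumptions that the layout above does not state: the signature is the four ASCII bytes "IEOT", all integers are big-endian as elsewhere in the index format, the version is 1, the trailing checksum is 20 bytes, and the size field counts every byte after the leading size field up to and including the trailing signature. The names `ieot_block` and `read_ieot_from_end` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA1_LEN 20
#define IEOT_TRAILER_LEN 12 /* version + size + signature */

struct ieot_block {
	uint32_t offset; /* where a run of cache entries starts */
	uint32_t count;  /* how many entries are in that run */
};

static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Locate the offset table from the end of an in-memory index and copy
 * its (offset, count) pairs into `out`.
 * Returns the number of pairs, or -1 if no valid table is found.
 */
static int read_ieot_from_end(const unsigned char *index, size_t len,
			      struct ieot_block *out, int max_blocks)
{
	const unsigned char *trailer, *data;
	uint32_t size, version;
	int n, i;

	if (len < SHA1_LEN + IEOT_TRAILER_LEN)
		return -1;
	trailer = index + len - SHA1_LEN - IEOT_TRAILER_LEN;
	if (memcmp(trailer + 8, "IEOT", 4))
		return -1;
	version = get_be32(trailer);
	size = get_be32(trailer + 4);
	if (version != 1) /* assumed version number */
		return -1;

	/* Assumed: size = leading version + pairs + trailer. */
	if (size < 4 + IEOT_TRAILER_LEN || (size - 4 - IEOT_TRAILER_LEN) % 8)
		return -1;
	if ((size_t)size + 8 > len - SHA1_LEN)
		return -1;

	/* Walk back to the leading header and cross-check it. */
	data = index + len - SHA1_LEN - size;
	if (memcmp(data - 8, "IEOT", 4) || get_be32(data - 4) != size ||
	    get_be32(data) != version)
		return -1;

	n = (int)((size - 4 - IEOT_TRAILER_LEN) / 8);
	if (n > max_blocks)
		return -1;
	for (i = 0; i < n; i++) {
		out[i].offset = get_be32(data + 4 + 8 * i);
		out[i].count = get_be32(data + 8 + 8 * i);
	}
	return n;
}
```

With the pairs in hand, each worker thread could be given one offset and count and convert its run of entries independently. An older reader that scans forward would presumably see the same bytes as an ordinary unknown extension.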
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/1146d38932
Checkout: git fetch https://github.com/benpeart/git fastindex-v1 && git checkout 1146d38932
Ben Peart (4):
  fastindex: speed up index load through parallelization
  update-index: add fastindex support to update-index
  fastindex: add test tools and a test script
  fastindex: add documentation for the fastindex extension
Documentation/config.txt | 8 +
Documentation/git-update-index.txt | 11 +
Documentation/technical/index-format.txt | 26 +++
Makefile | 2 +
builtin/update-index.c | 22 ++
cache.h | 25 +++
config.c | 20 ++
config.h | 1 +
environment.c | 3 +
read-cache.c | 343 +++++++++++++++++++++++++++++--
t/helper/test-dump-fast-index.c | 68 ++++++
t/helper/test-fast-index.c | 84 ++++++++
t/t1800-fast-index.sh | 55 +++++
13 files changed, 647 insertions(+), 21 deletions(-)
create mode 100644 t/helper/test-dump-fast-index.c
create mode 100644 t/helper/test-fast-index.c
create mode 100644 t/t1800-fast-index.sh
base-commit: 7668cbc60578f99a4c048f8f8f38787930b8147b
--
2.15.0.windows.1