git@vger.kernel.org mailing list mirror (one of many)
From: Ben Peart <benpeart@microsoft.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, pclouds@gmail.com, chriscool@tuxfamily.org,
	Johannes.Schindelin@gmx.de, alexmv@dropbox.com, peff@peff.net,
	Ben Peart <benpeart@microsoft.com>
Subject: [PATCH v1 0/4] Speed up index load through parallelization
Date: Thu,  9 Nov 2017 09:17:33 -0500
Message-ID: <20171109141737.47976-1-benpeart@microsoft.com>

This patch series helps address the CPU cost of loading the index by
adding a table of contents extension to the index. The extension allows
us to multi-thread the loading and conversion of cache entries from the
on-disk format to the in-memory format.  This is particularly beneficial
with large indexes and with V4 indexes, which have a higher CPU cost due
to their prefix encoding.
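
To make the idea concrete, here is a minimal sketch of block-parallel
conversion using POSIX threads. It assumes a hypothetical fixed-size
on-disk record (struct ondisk_entry) instead of git's real variable-length
cache entries, and the names load_block and load_entries_parallel are
illustrative, not the functions in read-cache.c. Each (offset, count)
block from the table of contents is handed to its own thread, which
converts its entries into a preallocated array; a block's slot in that
array is the sum of the counts of the blocks before it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size on-disk record; real cache entries are variable length. */
struct ondisk_entry {
	unsigned char be_size[4];   /* file size, big-endian (network byte order) */
	char name[12];              /* NUL-padded path */
};

/* In-memory form produced by the conversion step. */
struct mem_entry {
	uint32_t size;
	char name[13];
};

/* One table-of-contents block, mirroring an (offset, count) pair. */
struct toc_block {
	size_t offset;   /* byte offset of the block's first entry */
	size_t count;    /* number of entries in the block */
};

struct load_job {
	const unsigned char *map;  /* start of the index file in memory */
	struct toc_block block;
	size_t first;              /* slot of the block's first entry in `out` */
	struct mem_entry *out;
};

/* Thread body: convert one block of on-disk entries to the in-memory format. */
static void *load_block(void *arg)
{
	struct load_job *job = arg;
	const unsigned char *p = job->map + job->block.offset;

	for (size_t i = 0; i < job->block.count; i++) {
		struct ondisk_entry od;
		struct mem_entry *me = &job->out[job->first + i];

		memcpy(&od, p + i * sizeof(od), sizeof(od));
		me->size = ((uint32_t)od.be_size[0] << 24) | ((uint32_t)od.be_size[1] << 16) |
			   ((uint32_t)od.be_size[2] << 8) | (uint32_t)od.be_size[3];
		memcpy(me->name, od.name, sizeof(od.name));
		me->name[sizeof(od.name)] = '\0';
	}
	return NULL;
}

/* Start one thread per block, then wait for all of them. Returns 0 on success. */
static int load_entries_parallel(const unsigned char *map, const struct toc_block *blocks,
				 size_t nr_blocks, struct mem_entry *out)
{
	pthread_t *threads;
	struct load_job *jobs;
	size_t started = 0, first = 0;
	int ret = 0;

	assert(nr_blocks > 0);
	threads = calloc(nr_blocks, sizeof(*threads));
	jobs = calloc(nr_blocks, sizeof(*jobs));
	if (!threads || !jobs) {
		free(threads);
		free(jobs);
		return -1;
	}
	for (size_t i = 0; i < nr_blocks; i++) {
		/* a block's slot in `out` is the sum of the counts before it */
		jobs[i] = (struct load_job){ map, blocks[i], first, out };
		first += blocks[i].count;
		if (pthread_create(&threads[i], NULL, load_block, &jobs[i])) {
			ret = -1;
			break;
		}
		started++;
	}
	for (size_t i = 0; i < started; i++)
		pthread_join(threads[i], NULL);
	free(threads);
	free(jobs);
	return ret;
}
```

Because each thread writes a disjoint range of out, no locking is
needed; the only synchronization in this sketch is the final
pthread_join.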

I'd like feedback on the concept because two parts of it are a bit of a
clever hack: adding the table of contents information via an extension
that can be read before the variable-length section of cache entries and
the other extensions (see below), and resetting the prefix encoding for
V4 indexes. Both, however, are entirely backward compatible with older
versions of git, which can still properly read and use the index.
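
Some background on the V4 reset. In index v4, each path is stored as a
variable-width integer N (the number of bytes to strip from the end of
the previous entry's path) followed by a NUL-terminated suffix, so
decoding normally has to start at the first entry. The sketch below
decodes such a run. decode_varint re-implements git's OFS_DELTA-style
varint; decode_v4_paths, PATH_CAP, and passing an empty previous path at
each block start are illustrative assumptions about how a reset lets a
thread begin at its own block. An entry written with N = 0 and the full
path is still a valid v4 entry, which is why older readers are
unaffected.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

#define PATH_CAP 64  /* hypothetical fixed path buffer size for the sketch */

/* Variable-width integer used by index v4 (the OFS_DELTA offset encoding). */
static uintmax_t decode_varint(const unsigned char **bufp)
{
	const unsigned char *buf = *bufp;
	unsigned char c = *buf++;
	uintmax_t val = c & 127;

	while (c & 128) {
		val += 1;
		if (!val || (val >> (sizeof(val) * CHAR_BIT - 7)))
			return 0; /* overflow */
		c = *buf++;
		val = (val << 7) + (c & 127);
	}
	*bufp = buf;
	return val;
}

/*
 * Decode `count` prefix-compressed paths starting at `p` into `out`.
 * `path` holds the previous entry's path on entry (empty at the start of
 * the index, or at a block boundary when the encoding is reset) and the
 * last decoded path on return. Returns the position after the last entry,
 * or NULL if an entry strips more bytes than the previous path has or the
 * result would not fit in PATH_CAP.
 */
static const unsigned char *decode_v4_paths(const unsigned char *p, size_t count,
					    char path[PATH_CAP], char out[][PATH_CAP])
{
	assert(path != NULL && (out != NULL || count == 0));
	for (size_t i = 0; i < count; i++) {
		size_t len = strlen(path);
		uintmax_t strip = decode_varint(&p);
		const char *suffix = (const char *)p;
		size_t slen = strlen(suffix);

		if (strip > len || len - strip + slen >= PATH_CAP)
			return NULL;
		/* drop `strip` bytes from the end, then append the suffix and its NUL */
		memcpy(path + (len - strip), suffix, slen + 1);
		memcpy(out[i], path, len - strip + slen + 1);
		p += slen + 1;
	}
	return p;
}
```

Decoding a block whose first entry has N > 0 from an empty previous path
fails (returns NULL), which is why each block in this scheme must start
from a reset.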

I'm not particularly fond of the names "fastindex" and "IEOT". I've
wondered whether "indextoc" and "Index Table of Contents (ITOC)" would be
better names, but I'm open to suggestions.

As there is overhead to spinning up a thread, the index is only loaded
in parallel when there are enough entries for it to help (currently
7,500 entries per thread, with a minimum of 2 threads).
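
One way to express that threshold, as a sketch: only the values 7,500
and 2 come from the description above. The names ENTRIES_PER_THREAD,
MIN_PARALLEL_THREADS and index_load_threads are hypothetical, capping the
result at the CPU count is an assumption, and "minimum of 2 threads" is
read here as "stay single-threaded unless at least two threads can each
get 7,500 entries".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; only the values 7,500 and 2 come from the cover letter. */
#define ENTRIES_PER_THREAD 7500
#define MIN_PARALLEL_THREADS 2

/*
 * Return how many threads to use for loading `nr_entries` cache entries,
 * capped at `nr_cpus`. A result of 1 means loading stays single-threaded.
 */
static unsigned index_load_threads(size_t nr_entries, unsigned nr_cpus)
{
	size_t threads = nr_entries / ENTRIES_PER_THREAD;  /* each gets >= 7,500 */

	assert(nr_cpus > 0);
	if (threads > nr_cpus)
		threads = nr_cpus;
	if (threads < MIN_PARALLEL_THREADS)
		return 1;
	return (unsigned)threads;
}
```

For example, 100K entries on an 8-CPU machine gives 8 threads (the entry
count alone would allow 13), while 10K entries stays on one thread.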

The impact of the change can be seen using t/helper/test-read-cache:

                                fastindex
test            count   files   TRUE    FALSE     Savings
------------------------------------------------------------------------
test-read-cache 500     100K    6.39    8.33      23.36%
test-read-cache 100     1M      12.49   18.68     33.12%

The on-disk format looks like this:

Index header
Cache entry 1
Cache entry 2
.
.
.
Extension 1
Extension 2
.
.
Index Entry Offset Table Extension (must be written last!)
  IEOT signature bytes
  32-bit size
  32-bit version
  32-bit Cache Entry Offset 1
  32-bit Cache Entry count
  32-bit Cache Entry Offset 2
  32-bit Cache Entry count
  .
  .
  .
  32-bit version
  32-bit size
  IEOT signature bytes
SHA1
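
Because the extension repeats its version, size and signature at the
very end, just before the trailing SHA1, a reader can locate it by
stepping backwards from the checksum instead of walking every
variable-length cache entry first. A sketch of that lookup follows. It
assumes a 4-byte "IEOT" signature, 32-bit fields in network byte order
as elsewhere in the index, a 20-byte SHA1, and that the size field
counts the bytes from the leading version through the last count; these
details and the names struct ieot_block and find_ieot are assumptions,
not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA1_LEN 20
#define IEOT_SIG "IEOT"   /* assumed 4-byte signature */

struct ieot_block {
	uint32_t offset;  /* byte offset of the block's first cache entry */
	uint32_t count;   /* number of cache entries in the block */
};

/* Read a 32-bit big-endian (network byte order) value. */
static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Locate the IEOT extension by reading backwards from the SHA1 trailer.
 * Returns the number of blocks stored in `out` (at most `max`), or -1 if
 * the index does not end with a well-formed IEOT extension.
 */
static int find_ieot(const unsigned char *idx, size_t len,
		     struct ieot_block *out, size_t max)
{
	const size_t tail = 4 + 4 + 4 + SHA1_LEN;  /* version, size, signature, SHA1 */
	const unsigned char *t, *payload, *head;
	uint32_t version, size;
	size_t nr;

	assert(idx != NULL && out != NULL);
	if (len < tail + 8)
		return -1;
	t = idx + len - tail;
	if (memcmp(t + 8, IEOT_SIG, 4))
		return -1;
	version = get_be32(t);
	size = get_be32(t + 4);

	/* payload = leading version + (offset, count) pairs */
	if (size < 4 || (size - 4) % 8 || size > len - tail - 8)
		return -1;
	payload = t - size;
	head = payload - 8;  /* leading signature and size */
	if (memcmp(head, IEOT_SIG, 4) || get_be32(head + 4) != size ||
	    get_be32(payload) != version)
		return -1;

	nr = (size - 4) / 8;
	if (nr > max)
		return -1;
	for (size_t i = 0; i < nr; i++) {
		out[i].offset = get_be32(payload + 4 + 8 * i);
		out[i].count = get_be32(payload + 8 + 8 * i);
	}
	return (int)nr;
}
```

Checking the leading copy against the trailing one guards against
mistaking the tail of some other extension for an IEOT trailer.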

Signed-off-by: Ben Peart <benpeart@microsoft.com>

Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/1146d38932
Checkout: git fetch https://github.com/benpeart/git fastindex-v1 && git checkout 1146d38932

Ben Peart (4):
  fastindex: speed up index load through parallelization
  update-index: add fastindex support to update-index
  fastindex: add test tools and a test script
  fastindex: add documentation for the fastindex extension

 Documentation/config.txt                 |   8 +
 Documentation/git-update-index.txt       |  11 +
 Documentation/technical/index-format.txt |  26 +++
 Makefile                                 |   2 +
 builtin/update-index.c                   |  22 ++
 cache.h                                  |  25 +++
 config.c                                 |  20 ++
 config.h                                 |   1 +
 environment.c                            |   3 +
 read-cache.c                             | 343 +++++++++++++++++++++++++++++--
 t/helper/test-dump-fast-index.c          |  68 ++++++
 t/helper/test-fast-index.c               |  84 ++++++++
 t/t1800-fast-index.sh                    |  55 +++++
 13 files changed, 647 insertions(+), 21 deletions(-)
 create mode 100644 t/helper/test-dump-fast-index.c
 create mode 100644 t/helper/test-fast-index.c
 create mode 100644 t/t1800-fast-index.sh


base-commit: 7668cbc60578f99a4c048f8f8f38787930b8147b
-- 
2.15.0.windows.1


