git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v1 0/4] Speed up index load through parallelization
@ 2017-11-09 14:17 Ben Peart
  2017-11-09 14:17 ` [PATCH v1 1/4] fastindex: speed " Ben Peart
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Ben Peart @ 2017-11-09 14:17 UTC (permalink / raw)
  To: git
  Cc: gitster, pclouds, chriscool, Johannes.Schindelin, alexmv, peff,
	Ben Peart

This patch will help address the CPU cost of loading the index by adding
a table of contents extension to the index that will allow us to
multi-thread the loading and conversion of cache entries from the on-disk
format, to the in-memory format.  This is particularly beneficial
with large indexes and V4 indexes which have more CPU cost due to the
prefix-encoding.

I wanted to get feedback on the concept as the way I'm adding the table
of contents information via an extension that can be read before the
variable length section of cache entries and other extensions is a bit
of a clever hack (see below) as is the resetting of the prefix encoding
for V4 indexes. Both, however, are entirely backwards compatible with
older versions of git which can still properly read and use the index.

I'm not particularly fond of the names "fastindex" and "IEOT." I've
wondered if "indextoc" and "Index Table of Contents (ITOC)" would be
better names but I'm open to suggestions.

As there is overhead to spinning up a thread, there is logic to only
do the index loading in parallel when there are enough entries for it
to help (currently set at 7,500 per thread with a minimum of 2 threads).

The impact of the change can be seen using t/helper/test-read-cache:

                                fastindex
test            count   files   TRUE    FALSE     Savings
------------------------------------------------------------------------
test-read-cache 500     100K    6.39	8.33      23.36%
test-read-cache 100     1M      12.49   18.68     33.12%

The on-disk format looks like this:

Index header
Cache entry 1
Cache entry 2
.
.
.
Extension 1
Extension 2
.
.
Index Entry Offset Table Extension (must be written last!)
IEOT signature bytes
32-bit size
32-bit version
32-bit Cache Entry Offset 1
32-bit Cache Entry count
32-bit Cache Entry Offset 2
32-bit Cache Entry count
.
.
.
32-bit version
32-bit size
IEOT signature bytes
SHA1

Signed-off-by: Ben Peart <benpeart@microsoft.com>

Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/1146d38932
Checkout: git fetch https://github.com/benpeart/git fastindex-v1 && git checkout 1146d38932

Ben Peart (4):
  fastindex: speed up index load through parallelization
  update-index: add fastindex support to update-index
  fastindex: add test tools and a test script
  fastindex: add documentation for the fastindex extension

 Documentation/config.txt                 |   8 +
 Documentation/git-update-index.txt       |  11 +
 Documentation/technical/index-format.txt |  26 +++
 Makefile                                 |   2 +
 builtin/update-index.c                   |  22 ++
 cache.h                                  |  25 +++
 config.c                                 |  20 ++
 config.h                                 |   1 +
 environment.c                            |   3 +
 read-cache.c                             | 343 +++++++++++++++++++++++++++++--
 t/helper/test-dump-fast-index.c          |  68 ++++++
 t/helper/test-fast-index.c               |  84 ++++++++
 t/t1800-fast-index.sh                    |  55 +++++
 13 files changed, 647 insertions(+), 21 deletions(-)
 create mode 100644 t/helper/test-dump-fast-index.c
 create mode 100644 t/helper/test-fast-index.c
 create mode 100644 t/t1800-fast-index.sh


base-commit: 7668cbc60578f99a4c048f8f8f38787930b8147b
-- 
2.15.0.windows.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-11-21  0:45 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-09 14:17 [PATCH v1 0/4] Speed up index load through parallelization Ben Peart
2017-11-09 14:17 ` [PATCH v1 1/4] fastindex: speed " Ben Peart
2017-11-10  4:46   ` Junio C Hamano
2017-11-13 16:42     ` Ben Peart
2017-11-14  1:10       ` Junio C Hamano
2017-11-14 14:31         ` Ben Peart
2017-11-14 15:04           ` Junio C Hamano
2017-11-14 15:40             ` Ben Peart
2017-11-15  1:12               ` Junio C Hamano
2017-11-15  4:16                 ` Ben Peart
2017-11-15  4:40                   ` Junio C Hamano
2017-11-20 14:01                     ` Ben Peart
2017-11-20 14:20                       ` Jeff King
2017-11-20 15:38                         ` Jeff King
2017-11-20 23:51                       ` Ramsay Jones
2017-11-21  0:45                         ` Ben Peart
2017-11-09 14:17 ` [PATCH v1 2/4] update-index: add fastindex support to update-index Ben Peart
2017-11-09 14:17 ` [PATCH v1 3/4] fastindex: add test tools and a test script Ben Peart
2017-11-09 14:17 ` [PATCH v1 4/4] fastindex: add documentation for the fastindex extension Ben Peart

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).