[PATCH v1 0/4] Speed up index load through parallelization

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH v1 0/4] Speed up index load through parallelization
@ 2017-11-09 14:17 Ben Peart
  2017-11-09 14:17 ` [PATCH v1 1/4] fastindex: speed " Ben Peart
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Ben Peart @ 2017-11-09 14:17 UTC (permalink / raw)
  To: git
  Cc: gitster, pclouds, chriscool, Johannes.Schindelin, alexmv, peff,
	Ben Peart

This patch will help address the CPU cost of loading the index by adding
a table of contents extension to the index that will allow us to
multi-thread the loading and conversion of cache entries from the on-disk
format, to the in-memory format.  This is particularly beneficial
with large indexes and V4 indexes which have more CPU cost due to the
prefix-encoding.

I wanted to get feedback on the concept as the way I'm adding the table
of contents information via an extension that can be read before the
variable length section of cache entries and other extensions is a bit
of a clever hack (see below) as is the resetting of the prefix encoding
for V4 indexes. Both, however, are entirely backwards compatible with
older versions of git which can still properly read and use the index.

I'm not particularly fond of the names "fastindex" and "IEOT." I've
wondered if "indextoc" and "Index Table of Contents (ITOC)" would be
better names but I'm open to suggestions.

As there is overhead to spinning up a thread, there is logic to only
do the index loading in parallel when there are enough entries for it
to help (currently set at 7,500 per thread with a minimum of 2 threads).

The impact of the change can be seen using t/helper/test-read-cache:

                                fastindex
test            count   files   TRUE    FALSE     Savings
------------------------------------------------------------------------
test-read-cache 500     100K    6.39	8.33      23.36%
test-read-cache 100     1M      12.49   18.68     33.12%

The on-disk format looks like this:

Index header
Cache entry 1
Cache entry 2
.
.
.
Extension 1
Extension 2
.
.
Index Entry Offset Table Extension (must be written last!)
IEOT signature bytes
32-bit size
32-bit version
32-bit Cache Entry Offset 1
32-bit Cache Entry count
32-bit Cache Entry Offset 2
32-bit Cache Entry count
.
.
.
32-bit version
32-bit size
IEOT signature bytes
SHA1

Signed-off-by: Ben Peart <benpeart@microsoft.com>

Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/1146d38932
Checkout: git fetch https://github.com/benpeart/git fastindex-v1 && git checkout 1146d38932

Ben Peart (4):
  fastindex: speed up index load through parallelization
  update-index: add fastindex support to update-index
  fastindex: add test tools and a test script
  fastindex: add documentation for the fastindex extension

 Documentation/config.txt                 |   8 +
 Documentation/git-update-index.txt       |  11 +
 Documentation/technical/index-format.txt |  26 +++
 Makefile                                 |   2 +
 builtin/update-index.c                   |  22 ++
 cache.h                                  |  25 +++
 config.c                                 |  20 ++
 config.h                                 |   1 +
 environment.c                            |   3 +
 read-cache.c                             | 343 +++++++++++++++++++++++++++++--
 t/helper/test-dump-fast-index.c          |  68 ++++++
 t/helper/test-fast-index.c               |  84 ++++++++
 t/t1800-fast-index.sh                    |  55 +++++
 13 files changed, 647 insertions(+), 21 deletions(-)
 create mode 100644 t/helper/test-dump-fast-index.c
 create mode 100644 t/helper/test-fast-index.c
 create mode 100644 t/t1800-fast-index.sh

base-commit: 7668cbc60578f99a4c048f8f8f38787930b8147b
-- 
2.15.0.windows.1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v1 1/4] fastindex: speed up index load through parallelization
  2017-11-09 14:17 [PATCH v1 0/4] Speed up index load through parallelization Ben Peart
@ 2017-11-09 14:17 ` Ben Peart
  2017-11-10  4:46   ` Junio C Hamano
  2017-11-09 14:17 ` [PATCH v1 2/4] update-index: add fastindex support to update-index Ben Peart
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Ben Peart @ 2017-11-09 14:17 UTC (permalink / raw)
  To: git
  Cc: gitster, pclouds, chriscool, Johannes.Schindelin, alexmv, peff,
	Ben Peart

This patch helps address the CPU cost of loading the index by adding
additional data to the index that will allow us to multi-thread the
loading and conversion of cache entries.

It accomplishes this by adding an (optional) index extension that is a
table of offsets to blocks of cache entries in the index file.  With
version 2, 3 or even 4 indexes, we can utilize the Index Entry Offset Table
(IEOT) to parallelize the loading and conversion of the cache entries
across all available CPU cores.

To make this work for V4 indexes, when writing the index, it periodically
"resets" the prefix-compression by encoding the current entry as if the
path name for the previous entry is completely different and saves the
offset of that entry in the IEOT.  Basically, with V4 indexes, it
generates offsets into blocks of prefix-compressed entries.

To enable reading the IEOT extension before reading all the variable
length cache entries and other extensions, the IEOT is written last,
right before the trailing SHA1.

The format of that extension has the signature bytes and size at the
beginning (like a normal extension) as well as at the end in reverse
order to enable parsing the extension by seeking back from the end of
the file.

During index load, read the index header then seek to the end of the
index, back up past the trailing SHA1 and look for the IEOT extension
signature bytes.  If they exist, read the 32-bit size and seek back to
the extension header and verify the leading header and size bits.  If
they all match, we can be assured we have a valid IEOT extension.

If the IEOT extension is available, create multiple threads to divide
the work of loading and converting the cache entries across all
available CPU cores.  Once the cache entries are loaded, the rest of the
extensions can be loaded and processed normally (skipping the IEOT entry
as it has already been processed).  If the IEOT extension is not
available then parsing the index will proceed as usual with a single thread.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 cache.h       |  25 +++++
 config.c      |  20 ++++
 config.h      |   1 +
 environment.c |   3 +
 read-cache.c  | 343 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 5 files changed, 371 insertions(+), 21 deletions(-)

diff --git a/cache.h b/cache.h
index d74f00d8db..ea0c4b4b97 100644
--- a/cache.h
+++ b/cache.h
@@ -794,6 +794,7 @@ extern char *git_replace_ref_base;
 
 extern int fsync_object_files;
 extern int core_preload_index;
+extern int core_fast_index;
 extern int core_apply_sparse_checkout;
 extern int precomposed_unicode;
 extern int protect_hfs;
@@ -1942,4 +1943,28 @@ void sleep_millisec(int millisec);
  */
 void safe_create_dir(const char *dir, int share);
 
+
+#ifndef NO_PTHREADS
+struct index_entry_offset
+{
+	/* starting byte offset into index file, count of index entries in this block */
+	int offset, nr;
+};
+
+struct index_entry_offset_table
+{
+	int nr;
+	struct index_entry_offset entries[0];
+};
+
+extern struct index_entry_offset_table *read_ieot_extension(void *mmap, size_t mmap_size);
+
+struct ondisk_cache_entry;
+extern struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
+	unsigned long *ent_size,
+	struct strbuf *previous_name,
+	int use_length);
+
+#endif
+
 #endif /* CACHE_H */
diff --git a/config.c b/config.c
index 903abf9533..44b08f57eb 100644
--- a/config.c
+++ b/config.c
@@ -1203,6 +1203,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.fastindex")) {
+		core_fast_index = git_config_bool(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.createobject")) {
 		if (!strcmp(value, "rename"))
 			object_creation_mode = OBJECT_CREATION_USES_RENAMES;
@@ -2156,6 +2161,21 @@ int git_config_get_max_percent_split_change(void)
 	return -1; /* default value */
 }
 
+int ignore_fast_index_config;
+int git_config_get_fast_index(void)
+{
+	int val;
+
+	/* Hack for test programs like test-ieot */
+	if (ignore_fast_index_config)
+		return core_fast_index;
+
+	if (!git_config_get_maybe_bool("core.fastindex", &val))
+		return val;
+
+	return -1; /* default value */
+}
+
 NORETURN
 void git_die_config_linenr(const char *key, const char *filename, int linenr)
 {
diff --git a/config.h b/config.h
index a49d264416..3f78b1d262 100644
--- a/config.h
+++ b/config.h
@@ -212,6 +212,7 @@ extern int git_config_get_pathname(const char *key, const char **dest);
 extern int git_config_get_untracked_cache(void);
 extern int git_config_get_split_index(void);
 extern int git_config_get_max_percent_split_change(void);
+extern int git_config_get_fast_index(void);
 
 /* This dies if the configured or default date is in the future */
 extern int git_config_get_expiry(const char *key, const char **output);
diff --git a/environment.c b/environment.c
index 8289c25b44..492ef9c741 100644
--- a/environment.c
+++ b/environment.c
@@ -87,6 +87,9 @@ int auto_comment_line_char;
 /* Parallel index stat data preload? */
 int core_preload_index = 1;
 
+/* Parallel index cache entry loading? */
+int core_fast_index;
+
 /*
  * This is a hack for test programs like test-dump-untracked-cache to
  * ensure that they do not modify the untracked cache when reading it.
diff --git a/read-cache.c b/read-cache.c
index f3d125c114..174be0da38 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -19,6 +19,7 @@
 #include "varint.h"
 #include "split-index.h"
 #include "utf8.h"
+#include "thread-utils.h"
 
 /* Mask for the name length in ce_flags in the on-disk index */
 
@@ -38,6 +39,7 @@
 #define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
 #define CACHE_EXT_LINK 0x6c696e6b	  /* "link" */
 #define CACHE_EXT_UNTRACKED 0x554E5452	  /* "UNTR" */
+#define CACHE_EXT_FASTINDEX 0x49454F54	  /* "IEOT" */
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
 #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
@@ -1551,6 +1553,9 @@ static int read_index_extension(struct index_state *istate,
 	case CACHE_EXT_UNTRACKED:
 		istate->untracked = read_untracked_extension(data, sz);
 		break;
+	case CACHE_EXT_FASTINDEX:
+		/* already handled in do_read_index() */
+		break;
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
 			return error("index uses %.4s extension, which we do not understand",
@@ -1604,10 +1609,12 @@ static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *on
  * number of bytes to be stripped from the end of the previous name,
  * and the bytes to append to the result, to come up with its name.
  */
-static unsigned long expand_name_field(struct strbuf *name, const char *cp_)
+static unsigned long expand_name_field(struct strbuf *name, const char *cp_, int use_length)
 {
 	const unsigned char *ep, *cp = (const unsigned char *)cp_;
 	size_t len = decode_varint(&cp);
+	if (!use_length)
+		len = name->len;
 
 	if (name->len < len)
 		die("malformed name field in the index");
@@ -1618,9 +1625,10 @@ static unsigned long expand_name_field(struct strbuf *name, const char *cp_)
 	return (const char *)ep + 1 - cp_;
 }
 
-static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
+struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
 					    unsigned long *ent_size,
-					    struct strbuf *previous_name)
+					    struct strbuf *previous_name,
+						int use_length)
 {
 	struct cache_entry *ce;
 	size_t len;
@@ -1654,7 +1662,7 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
 		*ent_size = ondisk_ce_size(ce);
 	} else {
 		unsigned long consumed;
-		consumed = expand_name_field(previous_name, name);
+		consumed = expand_name_field(previous_name, name, use_length);
 		ce = cache_entry_from_ondisk(ondisk, flags,
 					     previous_name->buf,
 					     previous_name->len);
@@ -1725,16 +1733,208 @@ static void post_read_index_from(struct index_state *istate)
 	tweak_split_index(istate);
 }
 
+static unsigned long load_cache_entries(struct index_state *istate, int offset, int nr, void *mmap, unsigned long start_offset)
+{
+	int i;
+	unsigned long src_offset = start_offset;
+	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+
+	if (istate->version == 4)
+		previous_name = &previous_name_buf;
+	else
+		previous_name = NULL;
+
+	for (i = offset; i < offset + nr; i++) {
+		struct ondisk_cache_entry *disk_ce;
+		struct cache_entry *ce;
+		unsigned long consumed;
+
+		disk_ce = (struct ondisk_cache_entry *)((char *)mmap + src_offset);
+		ce = create_from_disk(disk_ce, &consumed, previous_name, i == offset ? 0 : 1);
+		set_index_entry(istate, i, ce);
+
+		src_offset += consumed;
+	}
+	strbuf_release(&previous_name_buf);
+	return src_offset - start_offset;
+}
+
+#ifndef NO_PTHREADS
+
+/*
+ * Mostly randomly chosen cache entries per thread (it works on my machine):
+ * we want to have at least 7500 cache entries per thread for it to
+ * be worth starting a thread.
+ */
+#define THREAD_COST		(7500)
+#define IEOT_VERSION	(1)
+
+static int ce_write(git_SHA_CTX *context, int fd, void *data, unsigned int len);
+
+static int write_index_ext_header(git_SHA_CTX *context, int fd,
+	unsigned int ext, unsigned int sz);
+
+struct load_cache_entries_thread_data
+{
+	pthread_t pthread;
+	struct index_state *istate;
+	int offset;			/* starting index into the istate->cache array */
+	void *mmap;
+	unsigned long consumed;	/* return # of bytes in index file processed */
+	struct index_entry_offset_table *ieot;
+	int ieot_offset;	/* starting index into the ieot array */
+	int ieot_work;		/* count of ieot entries to process */
+};
+
+/*
+ * A thread proc to run the load_cache_entries() computation
+ * across multiple background threads.
+ */
+static void *load_cache_entries_thread(void *_data)
+{
+	struct load_cache_entries_thread_data *p = _data;
+	int i;
+
+	/* itterate across all ieot blocks assigned to this thread */
+	for (i = p->ieot_offset; i < p->ieot_offset + p->ieot_work; i++) {
+		p->consumed += load_cache_entries(p->istate, p->offset, p->ieot->entries[i].nr, p->mmap, p->ieot->entries[i].offset);
+		p->offset += p->ieot->entries[i].nr;
+	}
+	return NULL;
+}
+
+struct index_entry_offset_table *read_ieot_extension(void *mmap, size_t mmap_size)
+{
+	/*
+	 * The IEOT extension is guaranteed to be last so that it can be found
+	 * by scanning backwards from the EOF.  In addition to the regular 4-byte
+	 * extension name and 4-byte section length is network byte order, it
+	 * also stores the 4-byte extension name and section length in reverse order
+	 * at the end of the extension.
+	 *
+	 * IEOT
+	 * 4-byte length
+	 * 4-byte version
+	 * variable length extension data...
+	 * 4-byte version
+	 * 4-byte length
+	 * IEOT
+	 * <SHA1>
+	 *
+	 * If names, lengths and versions match, the extension is assumed to be valid.
+	 */
+	const char *index;
+	uint32_t extsize_leading, extsize_trailing, ext_version;
+	struct index_entry_offset_table *ieot;
+	int i, nr;
+
+	/* validate the trailing extension signature */
+	index = (const char *)mmap + mmap_size - GIT_SHA1_RAWSZ - 4;
+	if (CACHE_EXT(index) != CACHE_EXT_FASTINDEX)
+		return NULL;
+	index -= sizeof(uint32_t);
+
+	/*
+	 * Validate the offset we're going to look for the leading extension
+	 * signature is past the index header.
+	 */
+	extsize_trailing = get_be32(index);
+	if ((index - extsize_trailing) < ((const char *)mmap + 12))
+		return NULL;
+	index -= sizeof(uint32_t);
+
+	/* validate the trailing version is IEOT_VERSION */
+	ext_version = get_be32(index);
+	if (ext_version != IEOT_VERSION)
+		return NULL;
+	index -= (extsize_trailing - sizeof(uint32_t));
+
+	/* validate the leading extension signature */
+	if (CACHE_EXT(index) != CACHE_EXT_FASTINDEX)
+		return NULL;
+	index += sizeof(uint32_t);
+
+	/* validate the leading extension size */
+	extsize_leading = get_be32(index);
+	if (extsize_leading != extsize_trailing)
+		return NULL;
+	index += sizeof(uint32_t);
+
+	/* validate the leading version is IEOT_VERSION */
+	ext_version = get_be32(index);
+	if (ext_version != IEOT_VERSION)
+		return NULL;
+	index += sizeof(uint32_t);
+
+	/* extension size - leading/trailing version bytes - trailing size - trailing signature / bytes per entry */
+	nr = (extsize_leading - sizeof(uint32_t) - sizeof(uint32_t) - sizeof(uint32_t) - 4) / (sizeof(uint32_t) + sizeof(uint32_t));
+	assert(nr);
+	ieot = xmalloc(sizeof(struct index_entry_offset_table)
+		+ (nr * sizeof(struct index_entry_offset)));
+	ieot->nr = nr;
+	for (i = 0; i < nr; i++) {
+		ieot->entries[i].offset = get_be32(index);
+		index += sizeof(uint32_t);
+		ieot->entries[i].nr = get_be32(index);
+		index += sizeof(uint32_t);
+	}
+
+	return ieot;
+}
+
+static int write_ieot_extension(git_SHA_CTX *context, int fd, struct index_entry_offset_table *ieot)
+{
+	struct strbuf sb = STRBUF_INIT;
+	uint32_t buffer;
+	int i, err;
+
+	/* version */
+	put_be32(&buffer, IEOT_VERSION);
+	strbuf_add(&sb, &buffer, sizeof(uint32_t));
+
+	/* ieot */
+	for (i = 0; i < ieot->nr; i++) {
+
+		/* offset */
+		put_be32(&buffer, ieot->entries[i].offset);
+		strbuf_add(&sb, &buffer, sizeof(uint32_t));
+
+		/* count */
+		put_be32(&buffer, ieot->entries[i].nr);
+		strbuf_add(&sb, &buffer, sizeof(uint32_t));
+	}
+
+	/* version */
+	put_be32(&buffer, IEOT_VERSION);
+	strbuf_add(&sb, &buffer, sizeof(uint32_t));
+
+	/* size */
+	put_be32(&buffer, sb.len + sizeof(uint32_t) + 4);
+	strbuf_add(&sb, &buffer, sizeof(uint32_t));
+
+	/* signature */
+	put_be32(&buffer, CACHE_EXT_FASTINDEX);
+	strbuf_add(&sb, &buffer, 4);
+
+	/* leading signature and size + extension data */
+	err = write_index_ext_header(context, fd, CACHE_EXT_FASTINDEX, sb.len) < 0
+		|| ce_write(context, fd, sb.buf, sb.len) < 0;
+	strbuf_release(&sb);
+
+	return err;
+}
+
+#endif
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
-	int fd, i;
+	int fd;
 	struct stat st;
 	unsigned long src_offset;
 	struct cache_header *hdr;
 	void *mmap;
 	size_t mmap_size;
-	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
 
 	if (istate->initialized)
 		return istate->cache_nr;
@@ -1771,24 +1971,75 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache));
 	istate->initialized = 1;
 
-	if (istate->version == 4)
-		previous_name = &previous_name_buf;
-	else
-		previous_name = NULL;
-
 	src_offset = sizeof(*hdr);
-	for (i = 0; i < istate->cache_nr; i++) {
-		struct ondisk_cache_entry *disk_ce;
-		struct cache_entry *ce;
-		unsigned long consumed;
+#ifdef NO_PTHREADS
+	src_offset += load_cache_entries(istate, 0, istate->cache_nr, mmap, src_offset);
+#else
+	if (git_config_get_fast_index() != 1) {
+		src_offset += load_cache_entries(istate, 0, istate->cache_nr, mmap, src_offset);
+	} else {
+		struct index_entry_offset_table *ieot;
+		int threads, cpus = online_cpus();
 
-		disk_ce = (struct ondisk_cache_entry *)((char *)mmap + src_offset);
-		ce = create_from_disk(disk_ce, &consumed, previous_name);
-		set_index_entry(istate, i, ce);
+		threads = istate->cache_nr / THREAD_COST;
+		if (threads > cpus)
+			threads = cpus;
 
-		src_offset += consumed;
+		/* enable testing with fewer than default minimum of entries */
+		if (threads < 2 && getenv("GIT_FASTINDEX_TEST"))
+			threads = 2;
+
+		/*
+		 * Locate and read the fast index extension so that we can use it
+		 * to multi-thread the reading of the cache entries.
+		 */
+		ieot = read_ieot_extension(mmap, mmap_size);
+		if (threads < 2 || !ieot) {
+			src_offset += load_cache_entries(istate, 0, istate->cache_nr, mmap, src_offset);
+		} else {
+			int i, offset, ieot_work, ieot_offset;
+			struct load_cache_entries_thread_data *data;
+
+			/* ensure we have no more threads than we have blocks to process */
+			if (threads > ieot->nr)
+				threads = ieot->nr;
+			data = xcalloc(threads, sizeof(struct load_cache_entries_thread_data));
+
+			offset = ieot_offset = 0;
+			ieot_work = DIV_ROUND_UP(ieot->nr, threads);
+			for (i = 0; i < threads; i++) {
+				struct load_cache_entries_thread_data *p = &data[i];
+				int j;
+
+				if (ieot_offset + ieot_work > ieot->nr)
+					ieot_work = ieot->nr - ieot_offset;
+
+				p->istate = istate;
+				p->offset = offset;
+				p->mmap = mmap;
+				p->ieot = ieot;
+				p->ieot_offset = ieot_offset;
+				p->ieot_work = ieot_work;
+				if (pthread_create(&p->pthread, NULL, load_cache_entries_thread, p))
+					die("unable to create threaded load_cache_entries");
+
+				/* increment by the number of cache entries in the ieot block being processed */
+				for (j = 0; j < ieot_work; j++)
+					offset += ieot->entries[ieot_offset + j].nr;
+				ieot_offset += ieot_work;
+			}
+			for (i = 0; i < threads; i++) {
+				struct load_cache_entries_thread_data *p = data + i;
+				if (pthread_join(p->pthread, NULL))
+					die("unable to join threaded load_cache_entries");
+				src_offset += p->consumed;
+			}
+			free(data);
+		}
+		free(ieot);
 	}
-	strbuf_release(&previous_name_buf);
+#endif
+
 	istate->timestamp.sec = st.st_mtime;
 	istate->timestamp.nsec = ST_MTIME_NSEC(st);
 
@@ -2205,6 +2456,9 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	struct ondisk_cache_entry_extended ondisk;
 	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
 	int drop_cache_tree = 0;
+	int ieot_work = 1;
+	struct index_entry_offset_table *ieot = NULL;
+	int offset, nr;
 
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
@@ -2238,6 +2492,22 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0)
 		return -1;
 
+	if (!strip_extensions && core_fast_index) {
+		int ieot_blocks, cpus = online_cpus();
+
+		ieot_blocks = istate->cache_nr / THREAD_COST;
+		if (ieot_blocks < 1)
+			ieot_blocks = 1;
+		if (ieot_blocks > cpus)
+			ieot_blocks = cpus;
+		ieot = xcalloc(1, sizeof(struct index_entry_offset_table)
+			+ (ieot_blocks * sizeof(struct index_entry_offset)));
+		ieot->nr = 0;
+		ieot_work = DIV_ROUND_UP(entries, ieot_blocks);
+	}
+
+	offset = lseek(newfd, 0, SEEK_CUR) + write_buffer_len;
+	nr = 0;
 	previous_name = (hdr_version == 4) ? &previous_name_buf : NULL;
 
 	for (i = 0; i < entries; i++) {
@@ -2259,11 +2529,30 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 
 			drop_cache_tree = 1;
 		}
+		if (ieot && i && (i % ieot_work == 0)) {
+			ieot->entries[ieot->nr].nr = nr;
+			ieot->entries[ieot->nr].offset = offset;
+			ieot->nr++;
+			/*
+			 * If we have a V4 index, set the first byte to an invalid
+			 * character to ensure there is nothing common with the previous
+			 * entry
+			 */
+			if (previous_name)
+				previous_name->buf[0] = 0;
+			nr = 0;
+			offset = lseek(newfd, 0, SEEK_CUR) + write_buffer_len;
+		}
 		if (ce_write_entry(&c, newfd, ce, previous_name, (struct ondisk_cache_entry *)&ondisk) < 0)
 			err = -1;
-
 		if (err)
 			break;
+		nr++;
+	}
+	if (ieot && nr) {
+		ieot->entries[ieot->nr].nr = nr;
+		ieot->entries[ieot->nr].offset = offset;
+		ieot->nr++;
 	}
 	strbuf_release(&previous_name_buf);
 
@@ -2315,6 +2604,18 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 			return -1;
 	}
 
+	/*
+	 * CACHE_EXT_FASTINDEX must be written as the last entry before the SHA1
+	 * so that it can be found and processed before all the index entries are
+	 * read.
+	 */
+	if (!strip_extensions && ieot) {
+		err = write_ieot_extension(&c, newfd, ieot);
+		free(ieot);
+		if (err)
+			return -1;
+	}
+
 	if (ce_flush(&c, newfd, istate->sha1))
 		return -1;
 	if (close_tempfile_gently(tempfile)) {
-- 
2.15.0.windows.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v1 2/4] update-index: add fastindex support to update-index
  2017-11-09 14:17 [PATCH v1 0/4] Speed up index load through parallelization Ben Peart
  2017-11-09 14:17 ` [PATCH v1 1/4] fastindex: speed " Ben Peart
@ 2017-11-09 14:17 ` Ben Peart
  2017-11-09 14:17 ` [PATCH v1 3/4] fastindex: add test tools and a test script Ben Peart
  2017-11-09 14:17 ` [PATCH v1 4/4] fastindex: add documentation for the fastindex extension Ben Peart
  3 siblings, 0 replies; 19+ messages in thread
From: Ben Peart @ 2017-11-09 14:17 UTC (permalink / raw)
  To: git
  Cc: gitster, pclouds, chriscool, Johannes.Schindelin, alexmv, peff,
	Ben Peart

Add support in update-index to manually add/remove the fastindex
extension via --[no-]fastindex flags.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 builtin/update-index.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index fefbe60167..63b7f18807 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -917,6 +917,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 	struct refresh_params refresh_args = {0, &has_errors};
 	int lock_error = 0;
 	int split_index = -1;
+	int fast_index = -1;
 	struct lock_file lock_file = LOCK_INIT;
 	struct parse_opt_ctx_t ctx;
 	strbuf_getline_fn getline_fn;
@@ -1008,6 +1009,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 			    N_("test if the filesystem supports untracked cache"), UC_TEST),
 		OPT_SET_INT(0, "force-untracked-cache", &untracked_cache,
 			    N_("enable untracked cache without testing the filesystem"), UC_FORCE),
+		OPT_BOOL(0, "fastindex", &fast_index,
+			N_("enable or disable fast index parsing")),
 		OPT_END()
 	};
 
@@ -1146,6 +1149,25 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		die("BUG: bad untracked_cache value: %d", untracked_cache);
 	}
 
+	if (fast_index > 0) {
+		if (git_config_get_fast_index() == 0)
+			warning(_("core.fastindex is unset; "
+				"set it if you really want to "
+				"enable fastindex"));
+		core_fast_index = 1;
+		active_cache_changed |= SOMETHING_CHANGED;
+		report(_("fastindex enabled"));
+	}
+	else if (!fast_index) {
+		if (git_config_get_fast_index() == 1)
+			warning(_("core.fastindex is set; "
+				"remove it if you really want to "
+				"disable fastindex"));
+		core_fast_index = 0;
+		active_cache_changed |= SOMETHING_CHANGED;
+		report(_("fastindex disabled"));
+	}
+
 	if (active_cache_changed) {
 		if (newfd < 0) {
 			if (refresh_args.flags & REFRESH_QUIET)
-- 
2.15.0.windows.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v1 3/4] fastindex: add test tools and a test script
  2017-11-09 14:17 [PATCH v1 0/4] Speed up index load through parallelization Ben Peart
  2017-11-09 14:17 ` [PATCH v1 1/4] fastindex: speed " Ben Peart
  2017-11-09 14:17 ` [PATCH v1 2/4] update-index: add fastindex support to update-index Ben Peart
@ 2017-11-09 14:17 ` Ben Peart
  2017-11-09 14:17 ` [PATCH v1 4/4] fastindex: add documentation for the fastindex extension Ben Peart
  3 siblings, 0 replies; 19+ messages in thread
From: Ben Peart @ 2017-11-09 14:17 UTC (permalink / raw)
  To: git
  Cc: gitster, pclouds, chriscool, Johannes.Schindelin, alexmv, peff,
	Ben Peart

Add a test utility (test-dump-fast-index) that dumps the contents of the
IEOT to stdout.  This enables checking the existance of the extension
and the ability to parse and output its contents.

Add a test utility (test-fast-index) that loads the index using the
fastindex logic then loads it using the regular logic and compares the
results of the two.  Any differences are reported and an error is returned.

Add a test script (t1800-fast-index.sh) to:

Test the ability to add/remove the fastindex index extension via
update-index.

Verify that loading the index with and without the fastindex extension
return the same results with both V2 and V4 indexes.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 Makefile                        |  2 +
 t/helper/test-dump-fast-index.c | 68 +++++++++++++++++++++++++++++++++
 t/helper/test-fast-index.c      | 84 +++++++++++++++++++++++++++++++++++++++++
 t/t1800-fast-index.sh           | 55 +++++++++++++++++++++++++++
 4 files changed, 209 insertions(+)
 create mode 100644 t/helper/test-dump-fast-index.c
 create mode 100644 t/helper/test-fast-index.c
 create mode 100644 t/t1800-fast-index.sh

diff --git a/Makefile b/Makefile
index cd75985991..a7df82721e 100644
--- a/Makefile
+++ b/Makefile
@@ -647,9 +647,11 @@ TEST_PROGRAMS_NEED_X += test-config
 TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
+TEST_PROGRAMS_NEED_X += test-dump-fast-index
 TEST_PROGRAMS_NEED_X += test-dump-split-index
 TEST_PROGRAMS_NEED_X += test-dump-untracked-cache
 TEST_PROGRAMS_NEED_X += test-fake-ssh
+TEST_PROGRAMS_NEED_X += test-fast-index
 TEST_PROGRAMS_NEED_X += test-genrandom
 TEST_PROGRAMS_NEED_X += test-hashmap
 TEST_PROGRAMS_NEED_X += test-index-version
diff --git a/t/helper/test-dump-fast-index.c b/t/helper/test-dump-fast-index.c
new file mode 100644
index 0000000000..4e6d28d660
--- /dev/null
+++ b/t/helper/test-dump-fast-index.c
@@ -0,0 +1,68 @@
+#include "cache.h"
+
+int cmd_main(int ac, const char **av)
+{
+#ifndef NO_PTHREADS
+	const char *path;
+	int fd, i;
+	struct stat st;
+	void *mmap;
+	size_t mmap_size;
+	struct cache_header *hdr;
+	struct index_entry_offset_table *ieot;
+	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+	int err = 0;
+
+	setup_git_directory();
+	path = get_index_file();
+	fd = open(path, O_RDONLY);
+	if (fd < 0) {
+		die("%s: index file open failed", path);
+	}
+
+	if (fstat(fd, &st))
+		die("cannot stat the open index");
+
+	mmap_size = xsize_t(st.st_size);
+	if (mmap_size < sizeof(struct cache_header) + GIT_SHA1_RAWSZ)
+		die("index file smaller than expected");
+
+	mmap = xmmap(NULL, mmap_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	if (mmap == MAP_FAILED)
+		die("unable to map index file");
+	close(fd);
+
+	hdr = mmap;
+	if (ntohl(hdr->hdr_version) == 4)
+		previous_name = &previous_name_buf;
+	else
+		previous_name = NULL;
+
+	ieot = read_ieot_extension(mmap, mmap_size);
+	if (ieot) {
+		printf("IEOT with %d entries\n", ieot->nr);
+		printf("  Offset    Count Name\n");
+		printf("-------- -------- ------------------------\n");
+		for (i = 0; i < ieot->nr; i++) {
+			struct ondisk_cache_entry *disk_ce;
+			struct cache_entry *ce;
+			unsigned long consumed;
+
+			disk_ce = (struct ondisk_cache_entry *)((char *)mmap + ieot->entries[i].offset);
+			ce = create_from_disk(disk_ce, &consumed, previous_name, 0);
+			printf("%8d %8d %.*s\n", ieot->entries[i].offset, ieot->entries[i].nr, ce->ce_namelen, ce->name);
+			free(ce);
+		}
+	} else {
+		printf("missing or invalid extension");
+		err = 1;
+	}
+
+	free(ieot);
+	munmap(mmap, mmap_size);
+	return err;
+#else
+	die("ieot only supported with PTHREADS");
+	return -1;
+#endif
+}
diff --git a/t/helper/test-fast-index.c b/t/helper/test-fast-index.c
new file mode 100644
index 0000000000..fe63130ba0
--- /dev/null
+++ b/t/helper/test-fast-index.c
@@ -0,0 +1,84 @@
+#include "cache.h"
+
+int compare_ce(const struct cache_entry *ce1, const struct cache_entry *ce2)
+{
+	/*	struct hashmap_entry ent; */
+	/*	struct stat_data ce_stat_data; */
+
+	if (ce1->ce_mode != ce2->ce_mode) {
+		printf("ce_mode: %d:%d\n", ce1->ce_mode, ce2->ce_mode);
+		return 1;
+	}
+
+	if (ce1->ce_flags != ce2->ce_flags) {
+		printf("ce_flags: %d:%d\n", ce1->ce_flags, ce2->ce_flags);
+		return 1;
+	}
+
+	if (ce1->ce_namelen != ce2->ce_namelen) {
+		printf("namelen: %d:%d\n", ce1->ce_namelen, ce2->ce_namelen);
+		return 1;
+	}
+
+	if (ce1->index != ce2->index) {
+		printf("index: %d:%d\n", ce1->index, ce2->index);
+		return 1;
+	}
+
+	if (oidcmp(&ce1->oid, &ce2->oid)) {
+		printf("oid: %s:%s\n", oid_to_hex(&ce1->oid), oid_to_hex(&ce2->oid));
+		return 1;
+	}
+
+	if (strcmp(ce1->name, ce2->name)) {
+		printf("name: %s:%s\n", ce1->name, ce2->name);
+		return 1;
+	}
+
+
+	return 0;
+}
+
+extern int ignore_fast_index_config;
+
+int cmd_main(int ac, const char **av)
+{
+#ifndef NO_PTHREADS
+	static struct index_state index;
+	static struct index_state ieot;
+	int i, err = 0;
+
+	setup_git_directory();
+	ignore_fast_index_config = 1;
+	core_fast_index = 0;
+	read_index(&index);
+	core_fast_index = 1;
+	read_index(&ieot);
+
+	for (i = 0; i < index.cache_nr; i++) {
+		if (compare_ce(index.cache[i], ieot.cache[i])) {
+			struct cache_entry *ce;
+
+			ce = index.cache[i];
+			printf("%06o %s %d\t%s\n", ce->ce_mode,
+				oid_to_hex(&ce->oid), ce_stage(ce), ce->name);
+			ce = ieot.cache[i];
+			printf("%06o %s %d\t%s\n", ce->ce_mode,
+				oid_to_hex(&ce->oid), ce_stage(ce), ce->name);
+
+			printf("cache entry %d does not match", i);
+			err = 1;
+			break;
+		}
+	}
+
+	discard_index(&ieot);
+	discard_index(&index);
+	if (!err)
+		printf("Cache entires are the same\n");
+	return err;
+#else
+	die("ieot only supported with PTHREADS");
+	return -1;
+#endif
+}
diff --git a/t/t1800-fast-index.sh b/t/t1800-fast-index.sh
new file mode 100644
index 0000000000..b3460f1698
--- /dev/null
+++ b/t/t1800-fast-index.sh
@@ -0,0 +1,55 @@
+#!/bin/sh
+
+test_description='git fast index tests'
+
+. ./test-lib.sh
+
+GIT_FASTINDEX_TEST=1; export GIT_FASTINDEX_TEST
+
+test_expect_success 'setup' '
+	: >tracked &&
+	: >modified &&
+	mkdir dir1 &&
+	: >dir1/tracked &&
+	: >dir1/modified &&
+	mkdir dir2 &&
+	: >dir2/tracked &&
+	: >dir2/modified &&
+	git add . &&
+	git commit -m initial &&
+	cat >.gitignore <<-\EOF
+	.gitignore
+	expect*
+	actual*
+	EOF
+'
+
+test_expect_success 'fastindex extension is off by default' '
+	test_must_fail test-dump-fast-index >actual 2>&1 &&
+	grep "^missing or invalid extension" actual
+'
+
+test_expect_success 'update-index --fastindex" adds the fsmonitor extension' '
+	git update-index --fastindex &&
+	test-dump-fast-index >actual &&
+	grep "^IEOT with" actual
+'
+
+test_expect_success 'update-index --no-fastindex" removes the fastindex extension' '
+	git update-index --no-fastindex &&
+	test_must_fail test-dump-fast-index >actual &&
+	grep "^missing or invalid extension" actual
+'
+
+test_expect_success 'verify with and without fastindex returns same result' '
+	git update-index --fastindex &&
+	test-fast-index
+'
+
+test_expect_success 'test with V4 index' '
+	git config core.fastindex 1 &&
+	git update-index --index-version 4 &&
+	test-fast-index
+'
+
+test_done
-- 
2.15.0.windows.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v1 4/4] fastindex: add documentation for the fastindex extension
  2017-11-09 14:17 [PATCH v1 0/4] Speed up index load through parallelization Ben Peart
                   ` (2 preceding siblings ...)
  2017-11-09 14:17 ` [PATCH v1 3/4] fastindex: add test tools and a test script Ben Peart
@ 2017-11-09 14:17 ` Ben Peart
  3 siblings, 0 replies; 19+ messages in thread
From: Ben Peart @ 2017-11-09 14:17 UTC (permalink / raw)
  To: git
  Cc: gitster, pclouds, chriscool, Johannes.Schindelin, alexmv, peff,
	Ben Peart

This includes the core.fastindex setting, the update-index additions,
and the fastindex index extension.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 Documentation/config.txt                 |  8 ++++++++
 Documentation/git-update-index.txt       | 11 +++++++++++
 Documentation/technical/index-format.txt | 26 ++++++++++++++++++++++++++
 3 files changed, 45 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 5f0d62753d..269ff97c8c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -868,6 +868,14 @@ relatively high IO latencies.  When enabled, Git will do the
 index comparison to the filesystem data in parallel, allowing
 overlapping IO's.  Defaults to true.
 
+core.fastIndex::
+	Enable parallel index loading
++
+This can speed up operations like 'git diff' and 'git status' especially
+when the index is very large.  When enabled, Git will do the index
+loading from the on disk format to the in-memory format in parallel.
+Defaults to false.
+
 core.createObject::
 	You can set this to 'link', in which case a hardlink followed by
 	a delete of the source are used to make sure that object creation
diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
index 75c7dd9dea..7ffd285c94 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -19,6 +19,7 @@ SYNOPSIS
 	     [--ignore-submodules]
 	     [--[no-]split-index]
 	     [--[no-|test-|force-]untracked-cache]
+	     [--[no-]fastindex]
 	     [--really-refresh] [--unresolve] [--again | -g]
 	     [--info-only] [--index-info]
 	     [-z] [--stdin] [--index-version <n>]
@@ -201,6 +202,16 @@ will remove the intended effect of the option.
 	`--untracked-cache` used to imply `--test-untracked-cache` but
 	this option would enable the extension unconditionally.
 
+--fastindex::
+--no-fastindex::
+	Enable or disable the fast index feature. These options
+	take effect whatever the value of the `core.fastindex`
+	configuration variable (see linkgit:git-config[1]). But a warning
+	is emitted when the change goes against the configured value, as
+	the configured value will take effect next time the index is
+	read and this will remove the intended effect of the option.
+
+
 \--::
 	Do not interpret any more arguments as options.
 
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index ade0b0c445..e37b4cf874 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -295,3 +295,29 @@ The remaining data of each directory block is grouped by type:
     in the previous ewah bitmap.
 
   - One NUL.
+
+== Index Entry Offset Table
+
+  The Index Entry Offset Table (IEOT) is used to help address the CPU
+  cost of loading the index by enabling multi-threading the process of
+  converting cache entries from the on-disk format to the in-memory format.
+  Because it must be able to be loaded before the variable length cache
+  entries and other index extensions, this extension must be written last.
+  The signature for this extension is { 'I', 'E', 'O', 'T' }.
+
+  The extension consists of:
+
+  - 32-bit version (currently 1)
+
+  - A number of index offset entries each consisting of:
+
+    - 32-bit offset from the begining of the file to the first cache entry
+	  in this block of entries.
+
+    - 32-bit count of cache entries in this block
+
+  - 32-bit version (currently 1)
+
+  - 32-bit size of the extension
+
+  - 4-byte extension signature
-- 
2.15.0.windows.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization
  2017-11-09 14:17 ` [PATCH v1 1/4] fastindex: speed " Ben Peart
@ 2017-11-10  4:46   ` Junio C Hamano
  2017-11-13 16:42     ` Ben Peart
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2017-11-10  4:46 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, pclouds, chriscool, Johannes.Schindelin, alexmv, peff

Ben Peart <benpeart@microsoft.com> writes:

> To make this work for V4 indexes, when writing the index, it periodically
> "resets" the prefix-compression by encoding the current entry as if the
> path name for the previous entry is completely different and saves the
> offset of that entry in the IEOT.  Basically, with V4 indexes, it
> generates offsets into blocks of prefix-compressed entries.

You specifically mentioned about this change in your earlier
correspondence on this series, and it reminds me of Shawn's reftable
proposal that also is heavily depended on prefix compression with
occasional reset to allow scanning from an entry in the middle.  I
find it a totally sensible design choice.

> To enable reading the IEOT extension before reading all the variable
> length cache entries and other extensions, the IEOT is written last,
> right before the trailing SHA1.

OK, we have already closed the door for further enhancements to
place something other than the index extensions after this block by
mistakenly made it the rule that the end of the series of extended
sections must coincide with the beginning of the trailing checksum,
so it does not sound all that bad to insist on this particular one
to be the last, I guess.  But see below...

> The format of that extension has the signature bytes and size at the
> beginning (like a normal extension) as well as at the end in reverse
> order to enable parsing the extension by seeking back from the end of
> the file.

I think this is a reasonable workaround to allow a single extension
that needs to be read before the main index is loaded.  

But I'd suggest taking this approach one step further.  Instead,
what if we introduce a new extension EOIE ("end of index entries")
whose sole purpose is to sit at the end of the series of extensions
and point at the beginning of the index extension section of the
file, to tell you where to seek in order to skip the main index?

That way, you can

 - seek to the end of the index file;

 - go backward skiping the trailing file checksum;

 - now you might be at the end of the EOIE extension.  seek back
   necessary number of bytes and verify EOIE header, pick up the
   recorded file offset of the beginning of the extensions section.

 - The 4-byte sequence you found may happen to match EOIE but that
   is not enough to be sure that you indeed have such an extension.
   So the following must be done carefully, allowing the possibility
   that there wasn't any EOIE extension at the end.
   Seek back to that offset, and repeat these three steps to skip
   over all extensions:

   - read the (possible) 4-byte header
   - read the (possible) extsize (validate that this is a sane value)
   - skip that many bytes

   until you come back to the location you assumed that you found
   your EOIE header, to make sure you did not get fooled by bytes
   that happened to look like one.  Some "extsize" you picked up
   during that process may lead you beyond the end of the index
   file, which would be a sure sign that what you found at the end
   of the index file was *not* the EOIE extension but a part of
   some other extension who happened to have these four bytes at the
   right place.

which would be a lot more robust way to allow any extensions to be
read before the main body of the index.

And a lot more importantly, we will leave the door open to allow
more than one index extensions that we would prefer to read before
reading the main body if we do it this way, because we can easily
skip things over without spending cycles once we have a robust way
to find where the end of the main index is.  After all, the reason
you placed IEOT at the end is not because you wanted to have it at
the very end.  You only wanted to be able to find where it is
without having to parse the variable-length records of the main
index.  And there may be other index extensions that wants to be
readable without reading the main part just like IEOT, and we do not
want to close the door for them.

Hmm?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization
  2017-11-10  4:46   ` Junio C Hamano
@ 2017-11-13 16:42     ` Ben Peart
  2017-11-14  1:10       ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Peart @ 2017-11-13 16:42 UTC (permalink / raw)
  To: Junio C Hamano, Ben Peart
  Cc: git, pclouds, chriscool, Johannes.Schindelin, alexmv, peff

On 11/9/2017 11:46 PM, Junio C Hamano wrote:
> Ben Peart <benpeart@microsoft.com> writes:
> 
>> To make this work for V4 indexes, when writing the index, it periodically
>> "resets" the prefix-compression by encoding the current entry as if the
>> path name for the previous entry is completely different and saves the
>> offset of that entry in the IEOT.  Basically, with V4 indexes, it
>> generates offsets into blocks of prefix-compressed entries.
> 
> You specifically mentioned about this change in your earlier
> correspondence on this series, and it reminds me of Shawn's reftable
> proposal that also is heavily depended on prefix compression with
> occasional reset to allow scanning from an entry in the middle.  I
> find it a totally sensible design choice.
> 

Great! I didn't realize there was already precedent for this in other 
patches.  I guess good minds think alike... :)

>> To enable reading the IEOT extension before reading all the variable
>> length cache entries and other extensions, the IEOT is written last,
>> right before the trailing SHA1.
> 
> OK, we have already closed the door for further enhancements to
> place something other than the index extensions after this block by
> mistakenly made it the rule that the end of the series of extended
> sections must coincide with the beginning of the trailing checksum,
> so it does not sound all that bad to insist on this particular one
> to be the last, I guess.  But see below...
> 
>> The format of that extension has the signature bytes and size at the
>> beginning (like a normal extension) as well as at the end in reverse
>> order to enable parsing the extension by seeking back from the end of
>> the file.
> 
> I think this is a reasonable workaround to allow a single extension
> that needs to be read before the main index is loaded.
> 
> But I'd suggest taking this approach one step further.  Instead,
> what if we introduce a new extension EOIE ("end of index entries")
> whose sole purpose is to sit at the end of the series of extensions
> and point at the beginning of the index extension section of the
> file, to tell you where to seek in order to skip the main index?
> 
> That way, you can
> 
>   - seek to the end of the index file;
> 
>   - go backward skiping the trailing file checksum;
> 
>   - now you might be at the end of the EOIE extension.  seek back
>     necessary number of bytes and verify EOIE header, pick up the
>     recorded file offset of the beginning of the extensions section.
> 
>   - The 4-byte sequence you found may happen to match EOIE but that
>     is not enough to be sure that you indeed have such an extension.
>     So the following must be done carefully, allowing the possibility
>     that there wasn't any EOIE extension at the end.
>     Seek back to that offset, and repeat these three steps to skip
>     over all extensions:
> 
>     - read the (possible) 4-byte header
>     - read the (possible) extsize (validate that this is a sane value)
>     - skip that many bytes
> 
>     until you come back to the location you assumed that you found
>     your EOIE header, to make sure you did not get fooled by bytes
>     that happened to look like one.  Some "extsize" you picked up
>     during that process may lead you beyond the end of the index
>     file, which would be a sure sign that what you found at the end
>     of the index file was *not* the EOIE extension but a part of
>     some other extension who happened to have these four bytes at the
>     right place.
> 

I'm doing this careful examination and verification in the patch 
already.  Please look closely at read_ieot_extension() to make sure 
there isn't a test I should be doing but missed.

> which would be a lot more robust way to allow any extensions to be
> read before the main body of the index.
> 
> And a lot more importantly, we will leave the door open to allow
> more than one index extensions that we would prefer to read before
> reading the main body if we do it this way, because we can easily
> skip things over without spending cycles once we have a robust way
> to find where the end of the main index is.  After all, the reason
> you placed IEOT at the end is not because you wanted to have it at
> the very end.  You only wanted to be able to find where it is
> without having to parse the variable-length records of the main
> index.  And there may be other index extensions that wants to be
> readable without reading the main part just like IEOT, and we do not
> want to close the door for them.
> 

The proposed format for extensions (ie having both a header and a footer 
with name and size) in the current patch already enables having multiple 
extensions that can be parsed forward or backward.  Any extensions that 
would want to be parse-able in reverse would just all need to be written 
one after the other after right before the trailing SHA1 (and of course, 
include the proper footer).  The same logic that parses the IEOT 
extension could be extended to continue looking backwards through the 
file parsing extensions until it encounters an unknown signature, 
invalid size, etc.

That said, the IEOT is essentially a table of contents into the index (I 
hesitate to call it an index of the index as that could get confusing 
:)).  While in the current iteration of this patch it only contains 
offsets into blocks of cache entries that are independently parse-able, 
I had initially considered including an offset into the index where the 
extensions start just as you suggest above.

When I looked at the existing extensions to see if they could be sped up 
by loading them on parallel threads I found that some of them 
assume/require they are processed in a specific order so there was no 
real benefit to loading them in parallel threads.

I ended up removing the offset to the first extension because I've found 
that trying to speculatively design a feature for a potential future 
need often doesn't result in what you actually need once you get there. 
:)  Instead, I made sure there was a version number in the IEOT 
structure so if we actually needed a way to extend the table of contents 
at some point in the future (like to include an offset to the 
extensions), we could easily add it and increment the IEOT version.

I could add the offset back in but currently nothing would use it so 
we're just adding additional complexity to calculate and parse it with 
with no (current) benefit.

How about I rename this the "Index Table of Contents" (ITOC) to make 
it's potential more clear and if/when we need to enhance it with an 
offset to the extensions, we do so at that point in time?

> Hmm?
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization
  2017-11-13 16:42     ` Ben Peart
@ 2017-11-14  1:10       ` Junio C Hamano
  2017-11-14 14:31         ` Ben Peart
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2017-11-14  1:10 UTC (permalink / raw)
  To: Ben Peart
  Cc: Ben Peart, git, pclouds, chriscool, Johannes.Schindelin, alexmv,
	peff

Ben Peart <peartben@gmail.com> writes:

> The proposed format for extensions (ie having both a header and a
> footer with name and size) in the current patch already enables having
> multiple extensions that can be parsed forward or backward.  Any
> extensions that would want to be parse-able in reverse would just all
> need to be written one after the other after right before the trailing
> SHA1 (and of course, include the proper footer).

In other words, anything that wants to be scannable from the tail is
welcome to reimplement the same validation structure used by IEOT to
check the section specific sanity constraint, and this series is not
interested in introducing a more general infrastructure to make it
easy for code that want to find where each extension section in the
file begins without pasing the body of the index.

I find it somewhat unsatisfactory in that it is a fundamental change
to allow jumping to the start of an extension section from the tail
that can benefit any future codepath, and have expected a feature
neutral extension whose sole purpose is to do so [*1*].

That way, extension sections whose names are all-caps can stay to be
optional, even if they allow locating from the tail of the file.  If
you require them to implement the same validation struture as IEOT
to perform section specific sanity constraint and also require them
to be placed consecutively at the end, the reader MUST know about
all such extensions---otherwise they cannot scan backwards and find
ones that appear before an unknown but optional one.  If you keep an
extension section at the end whose sole purpose is to point at the
beginning of extension sections, the reader can then scan forward as
usual, skipping over unknown but optional ones, and reading your
IEOT can merely be an user (and the first user) of that more generic
feature that is futureproof, no?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization
  2017-11-14  1:10       ` Junio C Hamano
@ 2017-11-14 14:31         ` Ben Peart
  2017-11-14 15:04           ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Peart @ 2017-11-14 14:31 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ben Peart, git, pclouds, chriscool, Johannes.Schindelin, alexmv,
	peff

On 11/13/2017 8:10 PM, Junio C Hamano wrote:
> Ben Peart <peartben@gmail.com> writes:
> 
>> The proposed format for extensions (ie having both a header and a
>> footer with name and size) in the current patch already enables having
>> multiple extensions that can be parsed forward or backward.  Any
>> extensions that would want to be parse-able in reverse would just all
>> need to be written one after the other after right before the trailing
>> SHA1 (and of course, include the proper footer).
> 
> In other words, anything that wants to be scannable from the tail is
> welcome to reimplement the same validation structure used by IEOT to
> check the section specific sanity constraint, and this series is not
> interested in introducing a more general infrastructure to make it
> easy for code that want to find where each extension section in the
> file begins without pasing the body of the index.
> 
> I find it somewhat unsatisfactory in that it is a fundamental change
> to allow jumping to the start of an extension section from the tail
> that can benefit any future codepath, and have expected a feature
> neutral extension whose sole purpose is to do so [*1*].
> 
> That way, extension sections whose names are all-caps can stay to be
> optional, even if they allow locating from the tail of the file.  If
> you require them to implement the same validation struture as IEOT
> to perform section specific sanity constraint and also require them
> to be placed consecutively at the end, the reader MUST know about
> all such extensions---otherwise they cannot scan backwards and find
> ones that appear before an unknown but optional one.  If you keep an
> extension section at the end whose sole purpose is to point at the
> beginning of extension sections, the reader can then scan forward as
> usual, skipping over unknown but optional ones, and reading your
> IEOT can merely be an user (and the first user) of that more generic
> feature that is futureproof, no?
> 
> 

How about I add the logic to write out a special extension right before 
the SHA1 that contains an offset to the beginning of the extensions 
section.  I will also add the logic in do_read_index() to search for and 
load this special extension if it exists.

This will provide a common framework for any future extension to take 
advantage of if it wants to be loaded/processed before or in parallel 
with the cache entries or other extensions.

For all existing extensions that assume they are loaded _after_ the 
cache entries, in do_read_index() I'll add the logic to use the offset 
(if it exists) to adjust the src_offset and then load them normally.

Given the IEOT extension is just another list of offsets into the index 
to enable out of order processing, I'll add those offsets into the same 
extension so that it is a more generic "table of contents" for the 
entire index.  This enables us to have common/reusable way to have 
random access to _all_ sections in the index while maintaining backwards 
comparability with the existing index formats and code.

These additional offsets will initially only be used to parallelize the 
loading of cache entries and only if the user explicitly enables that 
option but I can think of other interesting uses for them in the future.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization
  2017-11-14 14:31         ` Ben Peart
@ 2017-11-14 15:04           ` Junio C Hamano
  2017-11-14 15:40             ` Ben Peart
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2017-11-14 15:04 UTC (permalink / raw)
  To: Ben Peart
  Cc: Ben Peart, git, pclouds, chriscool, Johannes.Schindelin, alexmv,
	peff

Ben Peart <peartben@gmail.com> writes:

> How about I add the logic to write out a special extension right
> before the SHA1 that contains an offset to the beginning of the
> extensions section.  I will also add the logic in do_read_index() to
> search for and load this special extension if it exists.
>
> This will provide a common framework for any future extension to take
> advantage of if it wants to be loaded/processed before or in parallel
> with the cache entries or other extensions.
>
> For all existing extensions that assume they are loaded _after_ the
> cache entries, in do_read_index() I'll add the logic to use the offset
> (if it exists) to adjust the src_offset and then load them normally.
>
> Given the IEOT extension is just another list of offsets into the
> index to enable out of order processing, I'll add those offsets into
> the same extension so that it is a more generic "table of contents"
> for the entire index.  This enables us to have common/reusable way to
> have random access to _all_ sections in the index while maintaining
> backwards comparability with the existing index formats and code.
>
> These additional offsets will initially only be used to parallelize
> the loading of cache entries and only if the user explicitly enables
> that option but I can think of other interesting uses for them in the
> future.

If we freeze the format of IEOT extension so that we can guarantee
that the very first version of Git that understands IEOT can always
find the beginning of extension section in an index file that was
written by future versions of Git, then I'm all for that plan, but
my impression was that you are planning to make incompatible change
in the future to IEOT, judging from the waythat IEOT records its own
version number in the section and the reader uses it to reject an
unknown one.

With that plan, what I suspect would happen is that a version of Git
that understands another optional extension section that wants to be
findable without scanning the main table and the then-current
version of IEOT would not be able to use an index file written by a
new version of Git that enhances the format of the IEOT extension
bumps its extension version.

And if that is the case I would have to say that I strongly suspect
that you would regret the design decision to mix it into IEOT.  That
is why I keep suggesting that the back pointer extension should be
on its own, minimizing what it does and minimizing the need to be
updated across versions of Git.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization
  2017-11-14 15:04           ` Junio C Hamano
@ 2017-11-14 15:40             ` Ben Peart
  2017-11-15  1:12               ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Peart @ 2017-11-14 15:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ben Peart, git, pclouds, chriscool, Johannes.Schindelin, alexmv,
	peff



On 11/14/2017 10:04 AM, Junio C Hamano wrote:
> Ben Peart <peartben@gmail.com> writes:
> 
>> How about I add the logic to write out a special extension right
>> before the SHA1 that contains an offset to the beginning of the
>> extensions section.  I will also add the logic in do_read_index() to
>> search for and load this special extension if it exists.
>>
>> This will provide a common framework for any future extension to take
>> advantage of if it wants to be loaded/processed before or in parallel
>> with the cache entries or other extensions.
>>
>> For all existing extensions that assume they are loaded _after_ the
>> cache entries, in do_read_index() I'll add the logic to use the offset
>> (if it exists) to adjust the src_offset and then load them normally.
>>
>> Given the IEOT extension is just another list of offsets into the
>> index to enable out of order processing, I'll add those offsets into
>> the same extension so that it is a more generic "table of contents"
>> for the entire index.  This enables us to have common/reusable way to
>> have random access to _all_ sections in the index while maintaining
>> backwards comparability with the existing index formats and code.
>>
>> These additional offsets will initially only be used to parallelize
>> the loading of cache entries and only if the user explicitly enables
>> that option but I can think of other interesting uses for them in the
>> future.
> 
> If we freeze the format of IEOT extension so that we can guarantee
> that the very first version of Git that understands IEOT can always
> find the beginning of extension section in an index file that was
> written by future versions of Git, then I'm all for that plan, but
> my impression was that you are planning to make incompatible change
> in the future to IEOT, judging from the way that IEOT records its own
> version number in the section and the reader uses it to reject an
> unknown one.

I have no thoughts or plans for changes in the future of IEOT (which I 
plan to rename ITOC).  At this point in time, I can't even imagine what 
else we'd want as the index only contains cache entries, extensions and 
the trailing SHA1 and with the TOC we have random access to each of 
those.  If nothing else, it gives another signature to verify to ensure 
we actually have a valid trailing extension. :)

I only added the version because I've learned my ability to predict the 
future is pretty bad and it leaves us that option _if_ something ever 
comes up that we need it _and_ are willing to live with the significant 
negative trade-offs you correctly point out.

> 
> With that plan, what I suspect would happen is that a version of Git
> that understands another optional extension section that wants to be
> findable without scanning the main table and the then-current
> version of IEOT would not be able to use an index file written by a
> new version of Git that enhances the format of the IEOT extension
> bumps its extension version.
> 
> And if that is the case I would have to say that I strongly suspect
> that you would regret the design decision to mix it into IEOT.  That
> is why I keep suggesting that the back pointer extension should be
> on its own, minimizing what it does and minimizing the need to be
> updated across versions of Git.
> 

I understand the risk but the list of offsets into the cache entries is 
pretty simple as well. I prefer the simplicity of a single TOC extension 
that gives us random access to the entire index rather than having to 
piece one together using multiple extensions.  That model has its own 
set of risks and tradeoffs.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization
  2017-11-14 15:40             ` Ben Peart
@ 2017-11-15  1:12               ` Junio C Hamano
  2017-11-15  4:16                 ` Ben Peart
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2017-11-15  1:12 UTC (permalink / raw)
  To: Ben Peart
  Cc: Ben Peart, git, pclouds, chriscool, Johannes.Schindelin, alexmv,
	peff

Ben Peart <peartben@gmail.com> writes:

> I have no thoughts or plans for changes in the future of IEOT (which I
> plan to rename ITOC).  At this point in time, I can't even imagine
> what else we'd want as the index only contains cache entries, ...

Yeah, but the thing is that this is open source, and the imagination
of the originator of the initial idea does not limit how the code
and data structure evolves.

Back when I added the index extensions to the system, I didn't have
perfect foresight, and I didn't have specific plans to add anything
beyond "TREE" to optimize "write-tree" and "diff-index --cached".

In hindsight, I got one thing right and one thing wrong.

Even though I didn't have any plan to add a mandatory extension, I
made the code to ignore optional ones and error out on mandatory
ones if an index extension section is not understood.  It turns out
that later (in fact much later---the "TREE" extension dates back to
April 2006, while "link" came in June 2014) we could add the
split-index mode without having to worry about older versions of Git
doing random wrong things when they see this new extension, thanks
to that design decision.  That went well.

On the other hand, I didn't think things through to realize that
some operations may want to peek only the extensions without ever
using the main table, that some other operations may want to read
some extensions first before reading the main table, or more
importantly, that these limitations would start mattering once Git
becomes popular enough and starts housing a project tree with very
many paths in the main table.  

I really wish somebody had brought it up as a potential issue during
the review---I would have defined the extended index format to have
the simple extension at the end that points back to the tail end of
the main table back then, and we wouldn't be having this back and
forth now.  But I was just too happy and excited that I have found a
room to squeeze extension sections into the index file format
without breaking existing implementations of Git (which checked the
trailer checksum matches to the result of hashing the whole thing,
and read the recorded number of entries from the main table, without
even noticing that there is a gap in between), and that excitement
blinded me.

> I understand the risk but the list of offsets into the cache entries
> is pretty simple as well. I prefer the simplicity of a single TOC
> extension that gives us random access to the entire index rather than
> having to piece one together using multiple extensions.  That model
> has its own set of risks and tradeoffs.

I thought that you are not using the "where does the series of
extensions begin" information in the first place, no?  That piece of
information is useful independent of the usefulness of "index into
the main table to list entries where the prefix-compression is
reset".  So if anything, I'd prefer the simplicity of introducing
that "where does the series of extensions begin" that does not say
anything else, and build on other things like ITOC as mere users of
the mechanism.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization
  2017-11-15  1:12               ` Junio C Hamano
@ 2017-11-15  4:16                 ` Ben Peart
  2017-11-15  4:40                   ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Peart @ 2017-11-15  4:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ben Peart, git, pclouds, chriscool, Johannes.Schindelin, alexmv,
	peff



On 11/14/2017 8:12 PM, Junio C Hamano wrote:
> Ben Peart <peartben@gmail.com> writes:
> 
>> I have no thoughts or plans for changes in the future of IEOT (which I
>> plan to rename ITOC).  At this point in time, I can't even imagine
>> what else we'd want as the index only contains cache entries, ...
> 
> Yeah, but the thing is that this is open source, and the imagination
> of the originator of the initial idea does not limit how the code
> and data structure evolves.
> 
> Back when I added the index extensions to the system, I didn't have
> perfect foresight, and I didn't have specific plans to add anything
> beyond "TREE" to optimize "write-tree" and "diff-index --cached".
> 
> In hindsight, I got one thing right and one thing wrong.
> 
> Even though I didn't have any plan to add a mandatory extension, I
> made the code to ignore optional ones and error out on mandatory
> ones if an index extension section is not understood.  It turns out
> that later (in fact much later---the "TREE" extension dates back to
> April 2006, while "link" came in June 2014) we could add the
> split-index mode without having to worry about older versions of Git
> doing random wrong things when they see this new extension, thanks
> to that design decision.  That went well.
> 
> On the other hand, I didn't think things through to realize that
> some operations may want to peek only the extensions without ever
> using the main table, that some other operations may want to read
> some extensions first before reading the main table, or more
> importantly, that these limitations would start mattering once Git
> becomes popular enough and starts housing a project tree with very
> many paths in the main table.
> 
> I really wish somebody had brought it up as a potential issue during
> the review---I would have defined the extended index format to have
> the simple extension at the end that points back to the tail end of
> the main table back then, and we wouldn't be having this back and
> forth now.  But I was just too happy and excited that I have found a
> room to squeeze extension sections into the index file format
> without breaking existing implementations of Git (which checked the
> trailer checksum matches to the result of hashing the whole thing,
> and read the recorded number of entries from the main table, without
> even noticing that there is a gap in between), and that excitement
> blinded me.
> 
>> I understand the risk but the list of offsets into the cache entries
>> is pretty simple as well. I prefer the simplicity of a single TOC
>> extension that gives us random access to the entire index rather than
>> having to piece one together using multiple extensions.  That model
>> has its own set of risks and tradeoffs.
> 
> I thought that you are not using the "where does the series of
> extensions begin" information in the first place, no?  That piece of
> information is useful independent of the usefulness of "index into
> the main table to list entries where the prefix-compression is
> reset".  So if anything, I'd prefer the simplicity of introducing
> that "where does the series of extensions begin" that does not say
> anything else, and build on other things like ITOC as mere users of
> the mechanism.
> 

OK.  I'll call this new extension "EOIE" ("end of index entries"). Other 
than the standard header/footer, it will only contain a 32 bit offset to 
the beginning of the extension entries.  I'll always write out that 
extension unless you would prefer it be behind a setting (if so, what do 
you want it called and what should the default be)?  I won't add support 
in update-index for this extension.

Since the goal was to find a way to load the IEOT extension before the 
cache entries, I'll also refactor the extension reading loop into a 
function that takes a function pointer and add a 
preread_index_extension() function that can be passed in when that loop 
is run before the cache entries are loaded.  When the loop is run again 
after the cache entries are loaded, it will pass in the existing 
read_index_extension() function.  Extensions can then choose which 
function they want to be loaded in.

The code to read/write/use the IEOT to parallelize the cache entry 
loading will stay behind a config setting that defaults to false (at 
least for now).  I'll stick with "core.fastindex" until someone can 
(please) propose a better name.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization
  2017-11-15  4:16                 ` Ben Peart
@ 2017-11-15  4:40                   ` Junio C Hamano
  2017-11-20 14:01                     ` Ben Peart
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2017-11-15  4:40 UTC (permalink / raw)
  To: Ben Peart
  Cc: Ben Peart, git, pclouds, chriscool, Johannes.Schindelin, alexmv,
	peff

Ben Peart <peartben@gmail.com> writes:

> OK.  I'll call this new extension "EOIE" ("end of index
> entries"). Other than the standard header/footer, it will only contain
> a 32 bit offset to the beginning of the extension entries.  I'll
> always write out that extension unless you would prefer it be behind a
> setting (if so, what do you want it called and what should the default
> be)?  I won't add support in update-index for this extension.

To make it robust, if I were doing this, I would at least add a checksum
of some sort.  

As each extension section consists of 4-byte extension type, 4-byte
size, followed by that many bytes of the "meat" of the section, what
I had in mind when I suggested this backpointer was something like:

    "EOIE" <32-bit size> <32-bit offset> <20-byte hash>

where the size of the extension section is obviously 24-byte to
cover the offset plus hash, and the hash is computed over extension
types and their sizes (but not their contents---this is not about
protecting against file corruption and not worth wasting the cycles
for hashing) for all the extension sections this index file has
(except for "EOIE" at the end, for obvious reasons).  E.g. if we
have "TREE" extension that is N-bytes long, "REUC" extension that is
M-bytes long, followed by "EOIE", then the hash would be

	SHA-1("TREE" + <binary representation of N> +
	      "REUC" + <binary representation of M>)

Then the reader would

 - Seek back 32-byte from the trailer to ensure it sees "EOIE"
   followed by a correct size (24?)

 - Jump to the offset and find 4-bytes that presumably is the type
   of the first extension, followed by its size.  

 - Feed these 8-bytes to the hash, skip that section based on its
   size (while making sure we won't run off the end of the file,
   which is a sign that we thought EOIE exists when there wasn't).
   Repeat this until we hit where we found "EOIE" (or we notice our
   mistake by overrunning it).

 - Check the hash to make sure we got it right.

> Since the goal was to find a way to load the IEOT extension before the
> cache entries, I'll also refactor the extension reading loop into a
> function that takes a function pointer and add a
> preread_index_extension() function that can be passed in when that
> loop is run before the cache entries are loaded.  When the loop is run
> again after the cache entries are loaded, it will pass in the existing
> read_index_extension() function.  Extensions can then choose which
> function they want to be loaded in.
>
> The code to read/write/use the IEOT to parallelize the cache entry
> loading will stay behind a config setting that defaults to false (at
> least for now).  I'll stick with "core.fastindex" until someone can
> (please) propose a better name.

Sounds good.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization
  2017-11-15  4:40                   ` Junio C Hamano
@ 2017-11-20 14:01                     ` Ben Peart
  2017-11-20 14:20                       ` Jeff King
  2017-11-20 23:51                       ` Ramsay Jones
  0 siblings, 2 replies; 19+ messages in thread
From: Ben Peart @ 2017-11-20 14:01 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ben Peart, git, pclouds, chriscool, Johannes.Schindelin, alexmv,
	peff

Further testing has revealed that switching from the regular heap to a 
refactored version of the mem_pool in fast-import.c produces similar 
gains as parallelizing do_index_load().  This appears to be a much 
simpler patch for similar gains so we will be pursuing that path.

Combining the two patches resulted in a further 25% reduction of 
do_index_load() but with index load times getting smaller, that only 
resulted in a 3% improvement in total git command time.  Given the 
greater complexity of this patch, we've decided to put it on hold for 
now.  We'll keep it in our back pocket in case we need it in the future.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization
  2017-11-20 14:01                     ` Ben Peart
@ 2017-11-20 14:20                       ` Jeff King
  2017-11-20 15:38                         ` Jeff King
  2017-11-20 23:51                       ` Ramsay Jones
  1 sibling, 1 reply; 19+ messages in thread
From: Jeff King @ 2017-11-20 14:20 UTC (permalink / raw)
  To: Ben Peart
  Cc: Junio C Hamano, Ben Peart, git, pclouds, chriscool,
	Johannes.Schindelin, alexmv

On Mon, Nov 20, 2017 at 09:01:45AM -0500, Ben Peart wrote:

> Further testing has revealed that switching from the regular heap to a
> refactored version of the mem_pool in fast-import.c produces similar gains
> as parallelizing do_index_load().  This appears to be a much simpler patch
> for similar gains so we will be pursuing that path.

That sounds like a pretty easy win for index entries, which tend to
stick around in big clumps.

Out of curiosity, have you tried experimenting with any high-performance
3rd-party allocator libraries? I've often wondered if we could get a
performance improvement from dropping in a new allocator, but was never
able to measure any real benefit over glibc's ptmalloc2. The situation
might be different on Windows, though (i.e., if the libc allocator isn't
that great).

Most of the high-performance allocators are focused on concurrency,
which usually isn't a big deal for git. But tcmalloc, at least, claims
to be about 6x faster than glibc.

The reason I ask is that we could possibly get the same wins without
writing a single line of code. And it could apply across the whole
code-base, not just the index code. I don't know how close a general
purpose allocator could come to a pooled implementation, though. You're
inherently making a tradeoff with a pool in not being able to free
individual entries.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization
  2017-11-20 14:20                       ` Jeff King
@ 2017-11-20 15:38                         ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2017-11-20 15:38 UTC (permalink / raw)
  To: Ben Peart
  Cc: Junio C Hamano, Ben Peart, git, pclouds, chriscool,
	Johannes.Schindelin, alexmv

On Mon, Nov 20, 2017 at 09:20:35AM -0500, Jeff King wrote:

> Out of curiosity, have you tried experimenting with any high-performance
> 3rd-party allocator libraries? I've often wondered if we could get a
> performance improvement from dropping in a new allocator, but was never
> able to measure any real benefit over glibc's ptmalloc2. The situation
> might be different on Windows, though (i.e., if the libc allocator isn't
> that great).

Just linking with tcmalloc, like:

diff --git a/Makefile b/Makefile
index ee9d5eb11e..4f299cd914 100644
--- a/Makefile
+++ b/Makefile
@@ -1018,7 +1018,7 @@ BUILTIN_OBJS += builtin/worktree.o
 BUILTIN_OBJS += builtin/write-tree.o
 
 GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
-EXTLIBS =
+EXTLIBS = -ltcmalloc
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
 

seems to consistently show about 30% speedup on p0002:

Test                                          HEAD^             HEAD                  
--------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times   0.19(0.18+0.01)   0.13(0.12+0.01) -31.6%

I don't think we really even need a patch for it. You should be able to
just build with:

  make EXT_LIBS=-ltcmalloc

I can't think of any real advantage to a Makefile knob except
discoverability.

-Peff

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization
  2017-11-20 14:01                     ` Ben Peart
  2017-11-20 14:20                       ` Jeff King
@ 2017-11-20 23:51                       ` Ramsay Jones
  2017-11-21  0:45                         ` Ben Peart
  1 sibling, 1 reply; 19+ messages in thread
From: Ramsay Jones @ 2017-11-20 23:51 UTC (permalink / raw)
  To: Ben Peart, Junio C Hamano
  Cc: Ben Peart, git, pclouds, chriscool, Johannes.Schindelin, alexmv,
	peff



On 20/11/17 14:01, Ben Peart wrote:
> Further testing has revealed that switching from the regular heap to a refactored version of the mem_pool in fast-import.c produces similar gains as parallelizing do_index_load().  This appears to be a much simpler patch for similar gains so we will be pursuing that path.
> 
> Combining the two patches resulted in a further 25% reduction of do_index_load() but with index load times getting smaller, that only resulted in a 3% improvement in total git command time.  Given the greater complexity of this patch, we've decided to put it on hold for now.  We'll keep it in our back pocket in case we need it in the future.

Since you are withdrawing the 'bp/fastindex' branch, I won't send the
patch to fix up the sparse warnings (unless you would like to see it
before you put it in your back pocket). ;-)

ATB,
Ramsay Jones



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/4] fastindex: speed up index load through parallelization
  2017-11-20 23:51                       ` Ramsay Jones
@ 2017-11-21  0:45                         ` Ben Peart
  0 siblings, 0 replies; 19+ messages in thread
From: Ben Peart @ 2017-11-21  0:45 UTC (permalink / raw)
  To: Ramsay Jones, Junio C Hamano
  Cc: Ben Peart, git, pclouds, chriscool, Johannes.Schindelin, alexmv,
	peff



On 11/20/2017 6:51 PM, Ramsay Jones wrote:
> 
> 
> On 20/11/17 14:01, Ben Peart wrote:
>> Further testing has revealed that switching from the regular heap to a refactored version of the mem_pool in fast-import.c produces similar gains as parallelizing do_index_load().  This appears to be a much simpler patch for similar gains so we will be pursuing that path.
>>
>> Combining the two patches resulted in a further 25% reduction of do_index_load() but with index load times getting smaller, that only resulted in a 3% improvement in total git command time.  Given the greater complexity of this patch, we've decided to put it on hold for now.  We'll keep it in our back pocket in case we need it in the future.
> 
> Since you are withdrawing the 'bp/fastindex' branch, I won't send the
> patch to fix up the sparse warnings (unless you would like to see it
> before you put it in your back pocket). ;-)
> 
> ATB,
> Ramsay Jones
> 
> 

I'm happy to take your patches as I am keeping a copy just in case we 
need to revisit it.

I'm actually interested to see how Peff's suggestion to use tcmalloc 
helps because nearly 90% of the remaining execution time was spent 
waiting for a lock in malloc due to all contention from the multi-threading.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-11-21  0:45 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-09 14:17 [PATCH v1 0/4] Speed up index load through parallelization Ben Peart
2017-11-09 14:17 ` [PATCH v1 1/4] fastindex: speed " Ben Peart
2017-11-10  4:46   ` Junio C Hamano
2017-11-13 16:42     ` Ben Peart
2017-11-14  1:10       ` Junio C Hamano
2017-11-14 14:31         ` Ben Peart
2017-11-14 15:04           ` Junio C Hamano
2017-11-14 15:40             ` Ben Peart
2017-11-15  1:12               ` Junio C Hamano
2017-11-15  4:16                 ` Ben Peart
2017-11-15  4:40                   ` Junio C Hamano
2017-11-20 14:01                     ` Ben Peart
2017-11-20 14:20                       ` Jeff King
2017-11-20 15:38                         ` Jeff King
2017-11-20 23:51                       ` Ramsay Jones
2017-11-21  0:45                         ` Ben Peart
2017-11-09 14:17 ` [PATCH v1 2/4] update-index: add fastindex support to update-index Ben Peart
2017-11-09 14:17 ` [PATCH v1 3/4] fastindex: add test tools and a test script Ben Peart
2017-11-09 14:17 ` [PATCH v1 4/4] fastindex: add documentation for the fastindex extension Ben Peart

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).