git@vger.kernel.org mailing list mirror (one of many)
* [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
@ 2018-07-18 20:45 Ben Peart
  2018-07-18 20:45 ` [PATCH v1 1/3] add unbounded Multi-Producer-Multi-Consumer queue Ben Peart
                   ` (4 more replies)
  0 siblings, 5 replies; 121+ messages in thread
From: Ben Peart @ 2018-07-18 20:45 UTC (permalink / raw)
  To: git@vger.kernel.org; +Cc: gitster@pobox.com, Ben Peart

When working directories get big, checkout times start to suffer.  Even with
GVFS virtualization (which limits git to only having to update those files
that have been changed locally) we're seeing P50 times for checkout of 31
seconds and the P80 time is 43 seconds.

Here is a checkout command with tracing turned on to demonstrate where the
time is spent.  Note, this is somewhat of a "best case" as I'm simply
checking out the current commit:

benpeart@gvfs-perf MINGW64 /f/os/src (official/rs_es_debug_dev)
$ /usr/src/git/git.exe checkout
12:31:50.419016 read-cache.c:2006       performance: 1.180966800 s: read cache .git/index
12:31:51.184636 name-hash.c:605         performance: 0.664575200 s: initialize name hash
12:31:51.200280 preload-index.c:111     performance: 0.019811600 s: preload index
12:31:51.294012 read-cache.c:1543       performance: 0.094515600 s: refresh index
12:32:29.731344 unpack-trees.c:1358     performance: 33.889840200 s: traverse_trees
12:32:37.512555 read-cache.c:2541       performance: 1.564438300 s: write index, changed mask = 28
12:32:44.918730 unpack-trees.c:1358     performance: 7.243155600 s: traverse_trees
12:32:44.965611 diff-lib.c:527          performance: 7.374729200 s: diff-index
Waiting for GVFS to parse index and update placeholder files...Succeeded
12:32:46.824986 trace.c:420             performance: 57.715656000 s: git command: 'C:\git-sdk-64\usr\src\git\git.exe' checkout

Clearly, most of the time (41 seconds) is spent in the traverse_trees() code,
so the question is: how can we significantly speed up that portion of the
command?

I investigated a few options with limited success:

ODB cache
=========
Since traverse_trees() hits the ODB for each tree object (of which there are
over 500K in this repo), I wrote and tested an in-memory ODB cache that
held all tree objects.  This achieved a > 50% hit ratio (largely because
we traverse the tree twice during checkout) but yielded only minimal
savings (1.3 seconds).
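
For reference, a minimal sketch of this kind of cache (the experimental
code itself isn't included in this series, so everything below is
hypothetical); it keys whole tree buffers by OID in a fixed-bucket hash
table, with no eviction and no locking, which a real version would need:

    #define TREE_CACHE_BUCKETS (1 << 20)

    struct tree_cache_entry {
        struct tree_cache_entry *next;
        unsigned char oid[20];
        void *buf;              /* tree contents, owned by the cache */
        unsigned long size;
    };

    static struct tree_cache_entry *tree_cache[TREE_CACHE_BUCKETS];

    static unsigned int tree_cache_bucket(const unsigned char *oid)
    {
        unsigned int h;
        memcpy(&h, oid, sizeof(h)); /* OIDs are uniform; a prefix suffices */
        return h % TREE_CACHE_BUCKETS;
    }

    static void *tree_cache_get(const unsigned char *oid, unsigned long *size)
    {
        struct tree_cache_entry *e = tree_cache[tree_cache_bucket(oid)];
        for (; e; e = e->next)
            if (!memcmp(e->oid, oid, 20)) {
                *size = e->size;
                return e->buf;
            }
        return NULL;
    }

    static void tree_cache_put(const unsigned char *oid, void *buf, unsigned long size)
    {
        unsigned int b = tree_cache_bucket(oid);
        struct tree_cache_entry *e = xmalloc(sizeof(*e));
        memcpy(e->oid, oid, 20);
        e->buf = buf;
        e->size = size;
        e->next = tree_cache[b];
        tree_cache[b] = e;
    }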

Tree Graph File
===============
I also considered storing the commit tree in an alternate structure that is
faster to load/parse (a la the commit-graph), but the cache results, along
with the negligible impact of running checkout back to back (thus ensuring
the objects were in my file system cache), made me believe this would not
result in much savings.  MIDX has already helped out here, given that we end
up with a lot of pack files of commits and trees.

Sparse tree traversal
=====================
We've sped up other parts of git by taking advantage of the existing
sparse-checkout/excludes logic to limit the files git has to consider to
those that have been modified by the user locally.  I haven't been able to
think of a way to take advantage of that in unpack_trees(), as when you are
merging n commits, a change/conflict can occur in any tree object, so they
must all be traversed.  If I'm missing something here and there _is_ a way
to entirely skip large parts of the tree, please let me know!  Please note
that we're already limiting the files that git needs to update in the
working directory via sparse-checkout/excludes, but the other/merge logic
still executes for the entire tree whether there are files to update or not.

Multi-threading unpack_trees()
==============================
The current model of unpack_trees() is that a single thread recursively
traverses each tree object as it comes across it.  One thought I had was to
multi-thread the traversal so that each tree object could be processed in
parallel.  To test this idea out, I wrote an unbounded
Multi-Producer-Multi-Consumer queue and then wrote a
traverse_trees_parallel() function that adds any new tree objects into
the queue where they can be processed by a pool of worker threads.  Each
thread wakes up when there is work in the queue, removes a tree object, and
processes it, adding any additional tree objects it finds back to the queue.
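
Condensed from traverse_trees_parallel_thread_proc() in patch 3 below, the
worker loop looks roughly like this (error handling omitted):

    static void *worker(void *data)
    {
        struct unpack_trees_options *o = data;
        struct traverse_info_parallel *info;

        /* mpmcq_pop() blocks until work arrives or the queue is cancelled */
        while ((info = (struct traverse_info_parallel *)mpmcq_pop(&o->queue))) {
            /* may push newly discovered tree objects back onto o->queue */
            info->ret = traverse_trees(info->n, info->t, &info->info);

            pthread_mutex_lock(&o->work_mutex);
            if (!--o->remaining_work)
                mpmcq_cancel(&o->queue); /* wake all workers so they exit */
            pthread_mutex_unlock(&o->work_mutex);
        }
        return NULL;
    }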

Multi-threading anything in git is fraught with challenges, as much of the
code base is not thread safe.  To make progress, I wrapped mutexes around
code paths that were not thread safe.  The end result is that I won't
initially get much parallelization (due to mutexes around all the expensive
work), but at least I can test out the idea and resolve any other issues with
switching from a serial to a parallel implementation.  If this works out, I
can update more of the code paths to be thread safe and/or move to more
fine-grained mutexes around those paths that are difficult to make thread safe.

Final thoughts
==============

The attached set of patches doesn't work!  For some commands they succeed, but
I'm including them only to make explicit what I'm currently investigating.
I'd be very interested in design feedback, but formatting/spelling/whitespace
errors are less useful at this early stage in the investigation.

When I brought up this idea with some other git contributors they mentioned
that multi threading unpack_trees() had been discussed a few years ago on
the list but that the idea was discarded.  They couldn't remember exactly
why it was discarded and none of us have been able to find the email threads
from that earlier discussion. As a result, I decided to write up this RFC
and see if the greater git community has ideas, suggestions, or more
background/history on whether this is a reasonable path to pursue or if
there are other/better ideas on how to speed up checkout, especially on large
repos.


Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/a022a91ceb
Checkout: git fetch https://github.com/benpeart/git unpacktrees-v1 && git checkout a022a91ceb

Ben Peart (3):
  add unbounded Multi-Producer-Multi-Consumer queue
  add performance tracing around traverse_trees() in unpack_trees()
  Add initial parallel version of unpack_trees()

 Makefile       |   1 +
 cache.h        |   1 +
 config.c       |   5 +
 environment.c  |   1 +
 mpmcqueue.c    |  47 ++++++++
 mpmcqueue.h    |  80 +++++++++++++
 unpack-trees.c | 314 ++++++++++++++++++++++++++++++++++++++++++++++++-
 unpack-trees.h |  30 +++++
 8 files changed, 478 insertions(+), 1 deletion(-)
 create mode 100644 mpmcqueue.c
 create mode 100644 mpmcqueue.h


base-commit: e3331758f12da22f4103eec7efe1b5304a9be5e9
-- 
2.17.0.gvfs.1.123.g449c066




* [PATCH v1 1/3] add unbounded Multi-Producer-Multi-Consumer queue
  2018-07-18 20:45 [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Ben Peart
@ 2018-07-18 20:45 ` Ben Peart
  2018-07-18 20:57   ` Stefan Beller
  2018-07-19 19:11   ` Junio C Hamano
  2018-07-18 20:45 ` [PATCH v1 2/3] add performance tracing around traverse_trees() in unpack_trees() Ben Peart
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 121+ messages in thread
From: Ben Peart @ 2018-07-18 20:45 UTC (permalink / raw)
  To: git@vger.kernel.org; +Cc: gitster@pobox.com, Ben Peart

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 Makefile    |  1 +
 mpmcqueue.c | 47 +++++++++++++++++++
 mpmcqueue.h | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 128 insertions(+)
 create mode 100644 mpmcqueue.c
 create mode 100644 mpmcqueue.h

diff --git a/Makefile b/Makefile
index 0cb6590f24..fdaabf0252 100644
--- a/Makefile
+++ b/Makefile
@@ -890,6 +890,7 @@ LIB_OBJS += merge.o
 LIB_OBJS += merge-blobs.o
 LIB_OBJS += merge-recursive.o
 LIB_OBJS += mergesort.o
+LIB_OBJS += mpmcqueue.o
 LIB_OBJS += name-hash.o
 LIB_OBJS += notes.o
 LIB_OBJS += notes-cache.o
diff --git a/mpmcqueue.c b/mpmcqueue.c
new file mode 100644
index 0000000000..22411af1b0
--- /dev/null
+++ b/mpmcqueue.c
@@ -0,0 +1,47 @@
+#include "mpmcqueue.h"
+
+void mpmcq_init(struct mpmcq *queue)
+{
+	queue->head = NULL;
+	queue->cancel = 0;
+	pthread_mutex_init(&queue->mutex, NULL);
+	pthread_cond_init(&queue->condition, NULL);
+}
+
+void mpmcq_destroy(struct mpmcq *queue)
+{
+	pthread_mutex_destroy(&queue->mutex);
+	pthread_cond_destroy(&queue->condition);
+}
+
+void mpmcq_push(struct mpmcq *queue, struct mpmcq_entry *entry)
+{
+	pthread_mutex_lock(&queue->mutex);
+	entry->next = queue->head;
+	queue->head = entry;
+	pthread_cond_signal(&queue->condition);
+	pthread_mutex_unlock(&queue->mutex);
+}
+
+struct mpmcq_entry *mpmcq_pop(struct mpmcq *queue)
+{
+	struct mpmcq_entry *entry = NULL;
+
+	pthread_mutex_lock(&queue->mutex);
+	while (!queue->head && !queue->cancel)
+		pthread_cond_wait(&queue->condition, &queue->mutex);
+	if (!queue->cancel) {
+		entry = queue->head;
+		queue->head = entry->next;
+	}
+	pthread_mutex_unlock(&queue->mutex);
+	return entry;
+}
+
+void mpmcq_cancel(struct mpmcq *queue)
+{
+	pthread_mutex_lock(&queue->mutex);
+	queue->cancel = 1;
+	pthread_cond_broadcast(&queue->condition);
+	pthread_mutex_unlock(&queue->mutex);
+}
diff --git a/mpmcqueue.h b/mpmcqueue.h
new file mode 100644
index 0000000000..7421e06aad
--- /dev/null
+++ b/mpmcqueue.h
@@ -0,0 +1,80 @@
+#ifndef MPMCQUEUE_H
+#define MPMCQUEUE_H
+
+#include "git-compat-util.h"
+#include <pthread.h>
+
+/*
+ * Generic implementation of an unbounded Multi-Producer-Multi-Consumer
+ * queue.
+ */
+
+/*
+ * struct mpmcq_entry is an opaque structure representing an entry in the
+ * queue.
+ */
+struct mpmcq_entry {
+	struct mpmcq_entry *next;
+};
+
+/*
+ * struct mpmcq is the concurrent queue structure. Members should not be
+ * modified directly.
+ */
+struct mpmcq {
+	struct mpmcq_entry *head;
+	pthread_mutex_t mutex;
+	pthread_cond_t condition;
+	int cancel;
+};
+
+/*
+ * Initializes a mpmcq_entry structure.
+ *
+ * `entry` points to the entry to initialize.
+ *
+ * The mpmcq_entry structure does not hold references to external resources,
+ * and it is safe to just discard it once you are done with it (i.e. if
+ * your structure was allocated with xmalloc(), you can just free() it,
+ * and if it is on stack, you can just let it go out of scope).
+ */
+static inline void mpmcq_entry_init(struct mpmcq_entry *entry)
+{
+	entry->next = NULL;
+}
+
+/*
+ * Initializes a mpmcq structure.
+ */
+extern void mpmcq_init(struct mpmcq *queue);
+
+/*
+ * Destroys a mpmcq structure.
+ */
+extern void mpmcq_destroy(struct mpmcq *queue);
+
+/*
+ * Pushes an entry on to the queue.
+ *
+ * `queue` is the mpmcq structure.
+ * `entry` is the entry to push.
+ */
+extern void mpmcq_push(struct mpmcq *queue, struct mpmcq_entry *entry);
+
+/*
+ * Pops an entry off the queue.
+ *
+ * `queue` is the mpmcq structure.
+ *
+ * Returns the next mpmcq_entry on success, NULL on cancel.
+ */
+extern struct mpmcq_entry *mpmcq_pop(struct mpmcq *queue);
+
+/*
+ * Cancels any pending pop requests.
+ *
+ * `queue` is the mpmcq structure.
+ */
+extern void mpmcq_cancel(struct mpmcq *queue);
+
+#endif
-- 
2.17.0.gvfs.1.123.g449c066
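
A sketch of the intended usage, inferred from how patch 3 consumes the
queue: the mpmcq_entry is embedded as the first member of the caller's
struct so the result of mpmcq_pop() can be cast back (process() below is
a hypothetical handler, not part of this series):

    struct work_item {
        struct mpmcq_entry entry;   /* must be the first member */
        int payload;
    };

    static void producer(struct mpmcq *q, int value)
    {
        struct work_item *item = xmalloc(sizeof(*item));

        mpmcq_entry_init(&item->entry);
        item->payload = value;
        mpmcq_push(q, &item->entry);
    }

    static void *consumer(void *arg)
    {
        struct mpmcq *q = arg;
        struct work_item *item;

        /* mpmcq_pop() returns NULL once mpmcq_cancel() has been called */
        while ((item = (struct work_item *)mpmcq_pop(q))) {
            process(item->payload);
            free(item);
        }
        return NULL;
    }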



* [PATCH v1 2/3] add performance tracing around traverse_trees() in unpack_trees()
  2018-07-18 20:45 [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Ben Peart
  2018-07-18 20:45 ` [PATCH v1 1/3] add unbounded Multi-Producer-Multi-Consumer queue Ben Peart
@ 2018-07-18 20:45 ` Ben Peart
  2018-07-18 20:45 ` [PATCH v1 3/3] Add initial parallel version of unpack_trees() Ben Peart
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 121+ messages in thread
From: Ben Peart @ 2018-07-18 20:45 UTC (permalink / raw)
  To: git@vger.kernel.org; +Cc: gitster@pobox.com, Ben Peart

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 unpack-trees.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index 3a85a02a77..1f58efc6bb 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1326,6 +1326,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	if (len) {
 		const char *prefix = o->prefix ? o->prefix : "";
 		struct traverse_info info;
+		uint64_t start;
 
 		setup_traverse_info(&info, prefix);
 		info.fn = unpack_callback;
@@ -1350,8 +1351,10 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 			}
 		}
 
+		start = getnanotime();
 		if (traverse_trees(len, t, &info) < 0)
 			goto return_failed;
+		trace_performance_since(start, "traverse_trees");
 	}
 
 	/* Any left-over entries in the index? */
-- 
2.17.0.gvfs.1.123.g449c066



* [PATCH v1 3/3] Add initial parallel version of unpack_trees()
  2018-07-18 20:45 [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Ben Peart
  2018-07-18 20:45 ` [PATCH v1 1/3] add unbounded Multi-Producer-Multi-Consumer queue Ben Peart
  2018-07-18 20:45 ` [PATCH v1 2/3] add performance tracing around traverse_trees() in unpack_trees() Ben Peart
@ 2018-07-18 20:45 ` Ben Peart
  2018-07-18 22:56   ` Junio C Hamano
  2018-07-18 21:02 ` [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Stefan Beller
  2018-07-18 21:34 ` Jeff King
  4 siblings, 1 reply; 121+ messages in thread
From: Ben Peart @ 2018-07-18 20:45 UTC (permalink / raw)
  To: git@vger.kernel.org; +Cc: gitster@pobox.com, Ben Peart

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 cache.h        |   1 +
 config.c       |   5 +
 environment.c  |   1 +
 unpack-trees.c | 313 ++++++++++++++++++++++++++++++++++++++++++++++++-
 unpack-trees.h |  30 +++++
 5 files changed, 348 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index d49092d94d..4bfa35c497 100644
--- a/cache.h
+++ b/cache.h
@@ -815,6 +815,7 @@ extern int fsync_object_files;
 extern int core_preload_index;
 extern int core_commit_graph;
 extern int core_apply_sparse_checkout;
+extern int core_parallel_unpack_trees;
 extern int precomposed_unicode;
 extern int protect_hfs;
 extern int protect_ntfs;
diff --git a/config.c b/config.c
index f4a208a166..34d5506588 100644
--- a/config.c
+++ b/config.c
@@ -1346,6 +1346,11 @@ static int git_default_core_config(const char *var, const char *value)
 					 var, value);
 	}
 
+	if (!strcmp(var, "core.parallelunpacktrees")) {
+		core_parallel_unpack_trees = git_config_bool(var, value);
+		return 0;
+	}
+
 	/* Add other config variables here and to Documentation/config.txt. */
 	return 0;
 }
diff --git a/environment.c b/environment.c
index 2a6de2330b..1eb0a05074 100644
--- a/environment.c
+++ b/environment.c
@@ -68,6 +68,7 @@ char *notes_ref_name;
 int grafts_replace_parents = 1;
 int core_commit_graph;
 int core_apply_sparse_checkout;
+int core_parallel_unpack_trees;
 int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 unsigned long pack_size_limit_cfg;
diff --git a/unpack-trees.c b/unpack-trees.c
index 1f58efc6bb..2333626efd 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -17,6 +17,7 @@
 #include "submodule-config.h"
 #include "fsmonitor.h"
 #include "fetch-object.h"
+#include "thread-utils.h"
 
 /*
  * Error messages expected by scripts out of plumbing commands such as
@@ -641,6 +642,98 @@ static inline int are_same_oid(struct name_entry *name_j, struct name_entry *nam
 	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
 }
 
+#ifndef NO_PTHREADS
+
+struct traverse_info_parallel {
+	struct mpmcq_entry entry;
+	struct tree_desc t[MAX_UNPACK_TREES];
+	void *buf[MAX_UNPACK_TREES];
+	struct traverse_info info;
+	int n;
+	int nr_buf;
+	int ret;
+};
+
+static int traverse_trees_parallel(int n, unsigned long dirmask,
+				   unsigned long df_conflicts,
+				   struct name_entry *names,
+				   struct traverse_info *info)
+{
+	int i;
+	struct name_entry *p;
+	struct unpack_trees_options *o = info->data;
+	struct traverse_info_parallel *newinfo;
+
+	p = names;
+	while (!p->mode)
+		p++;
+
+	newinfo = xmalloc(sizeof(struct traverse_info_parallel));
+	mpmcq_entry_init(&newinfo->entry);
+	newinfo->info = *info;
+	newinfo->info.prev = info;
+	newinfo->info.pathspec = info->pathspec;
+	newinfo->info.name = *p;
+	newinfo->info.pathlen += tree_entry_len(p) + 1;
+	newinfo->info.df_conflicts |= df_conflicts;
+	newinfo->nr_buf = 0;
+	newinfo->n = n;
+
+	/*
+	 * Fetch the tree from the ODB for each peer directory in the
+	 * n commits.
+	 *
+	 * For 2- and 3-way traversals, we try to avoid hitting the
+	 * ODB twice for the same OID.  This should yield a nice speed
+	 * up in checkouts and merges when the commits are similar.
+	 *
+	 * We don't bother doing the full O(n^2) search for larger n,
+	 * because wider traversals don't happen that often and we
+	 * avoid the search setup.
+	 *
+	 * When 2 peer OIDs are the same, we just copy the tree
+	 * descriptor data.  This implicitly borrows the buffer
+	 * data from the earlier cell.
+	 */
+	for (i = 0; i < n; i++, dirmask >>= 1) {
+		if (i > 0 && are_same_oid(&names[i], &names[i - 1]))
+			newinfo->t[i] = newinfo->t[i - 1];
+		else if (i > 1 && are_same_oid(&names[i], &names[i - 2]))
+			newinfo->t[i] = newinfo->t[i - 2];
+		else {
+			const struct object_id *oid = NULL;
+			if (dirmask & 1)
+				oid = names[i].oid;
+
+			/*
+			 * fill_tree_descriptor() will load the tree from the
+			 * ODB. Accessing the ODB is not thread safe so
+			 * serialize access using the odb_mutex.
+			 */
+			pthread_mutex_lock(&o->odb_mutex);
+			newinfo->buf[newinfo->nr_buf++] =
+				fill_tree_descriptor(newinfo->t + i, oid);
+			pthread_mutex_unlock(&o->odb_mutex);
+		}
+	}
+
+	/*
+	 * We can't play games with the cache bottom as we are processing
+	 * the tree objects in parallel.
+	 * newinfo->bottom = switch_cache_bottom(&newinfo->info);
+	 */
+
+	/* All I really need here is fetch_and_add() */
+	pthread_mutex_lock(&o->work_mutex);
+	o->remaining_work++;
+	pthread_mutex_unlock(&o->work_mutex);
+	mpmcq_push(&o->queue, &newinfo->entry);
+
+	return 0;
+}
+
+#endif
+
 static int traverse_trees_recursive(int n, unsigned long dirmask,
 				    unsigned long df_conflicts,
 				    struct name_entry *names,
@@ -995,6 +1088,108 @@ static void debug_unpack_callback(int n,
 		debug_name_entry(i, names + i);
 }
 
+static int unpack_callback_parallel(int n, unsigned long mask,
+				    unsigned long dirmask,
+				    struct name_entry *names,
+				    struct traverse_info *info)
+{
+	struct cache_entry *src[MAX_UNPACK_TREES + 1] = {
+		NULL,
+	};
+	struct unpack_trees_options *o = info->data;
+	const struct name_entry *p = names;
+
+	/* Find first entry with a real name (we could use "mask" too) */
+	while (!p->mode)
+		p++;
+
+	if (o->debug_unpack)
+		debug_unpack_callback(n, mask, dirmask, names, info);
+
+	/* Are we supposed to look at the index too? */
+	if (o->merge) {
+		while (1) {
+			int cmp;
+			struct cache_entry *ce;
+
+			if (o->diff_index_cached)
+				ce = next_cache_entry(o);
+			else
+				ce = find_cache_entry(info, p);
+
+			if (!ce)
+				break;
+			cmp = compare_entry(ce, info, p);
+			if (cmp < 0) {
+				int ret;
+
+				pthread_mutex_lock(&o->unpack_index_entry_mutex);
+				ret = unpack_index_entry(ce, o);
+				pthread_mutex_unlock(&o->unpack_index_entry_mutex);
+				if (ret < 0)
+					return unpack_failed(o, NULL);
+				continue;
+			}
+			if (!cmp) {
+				if (ce_stage(ce)) {
+					/*
+					 * If we skip unmerged index
+					 * entries, we'll skip this
+					 * entry *and* the tree
+					 * entries associated with it!
+					 */
+					if (o->skip_unmerged) {
+						add_same_unmerged(ce, o);
+						return mask;
+					}
+				}
+				src[0] = ce;
+			}
+			break;
+		}
+	}
+
+	pthread_mutex_lock(&o->unpack_nondirectories_mutex);
+	int ret = unpack_nondirectories(n, mask, dirmask, src, names, info);
+	pthread_mutex_unlock(&o->unpack_nondirectories_mutex);
+	if (ret < 0)
+		return -1;
+
+	if (o->merge && src[0]) {
+		if (ce_stage(src[0]))
+			mark_ce_used_same_name(src[0], o);
+		else
+			mark_ce_used(src[0], o);
+	}
+
+	/* Now handle any directories.. */
+	if (dirmask) {
+		/* special case: "diff-index --cached" looking at a tree */
+		if (o->diff_index_cached && n == 1 && dirmask == 1 &&
+		    S_ISDIR(names->mode)) {
+			int matches;
+			matches = cache_tree_matches_traversal(
+				o->src_index->cache_tree, names, info);
+			/*
+			 * Everything under the name matches; skip the
+			 * entire hierarchy.  diff_index_cached codepath
+			 * special cases D/F conflicts in such a way that
+			 * it does not do any look-ahead, so this is safe.
+			 */
+			if (matches) {
+				o->cache_bottom += matches;
+				return mask;
+			}
+		}
+
+		if (traverse_trees_parallel(n, dirmask, mask & ~dirmask, names, info) < 0)
+			return -1;
+		return mask;
+	}
+
+	return mask;
+}
+
 static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, struct name_entry *names, struct traverse_info *info)
 {
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
@@ -1263,6 +1458,116 @@ static void mark_new_skip_worktree(struct exclude_list *el,
 static int verify_absent(const struct cache_entry *,
 			 enum unpack_trees_error_types,
 			 struct unpack_trees_options *);
+
+#ifndef NO_PTHREADS
+static void *traverse_trees_parallel_thread_proc(void *_data)
+{
+	struct unpack_trees_options *o = _data;
+	struct traverse_info_parallel *info;
+	int i;
+
+	while (1) {
+		info = (struct traverse_info_parallel *)mpmcq_pop(&o->queue);
+		if (!info)
+			break;
+
+		info->ret = traverse_trees(info->n, info->t, &info->info);
+		/*
+		 * We can't play games with the cache bottom as we are processing
+		 * the tree objects in parallel.
+		 * restore_cache_bottom(&info->info, info->bottom);
+		 */
+
+		for (i = 0; i < info->nr_buf; i++)
+			free(info->buf[i]);
+		/*
+		 * TODO: Can't free "info" when thread is done because it can be used
+		 * as ->prev link in child info objects.  Ref count?  Free all at end?
+		free(info);
+		 */
+
+		/* All I really need here is fetch_and_add() */
+		pthread_mutex_lock(&o->work_mutex);
+		o->remaining_work--;
+		if (o->remaining_work == 0)
+			mpmcq_cancel(&o->queue);
+		pthread_mutex_unlock(&o->work_mutex);
+	}
+
+	return NULL;
+}
+
+static void init_parallel_traverse(struct unpack_trees_options *o,
+				   struct traverse_info *info)
+{
+	/*
+	 * TODO: Add logic to bypass parallel path when not needed.
+	 *			- not enough CPU cores to help
+	 *			- 'git status' is always fast - how to detect?
+	 *			- small trees (may be able to use index size as proxy, small index likely means small commit tree)
+	 */
+	if (core_parallel_unpack_trees) {
+		int t;
+
+		mpmcq_init(&o->queue);
+		o->remaining_work = 0;
+		pthread_mutex_init(&o->unpack_nondirectories_mutex, NULL);
+		pthread_mutex_init(&o->unpack_index_entry_mutex, NULL);
+		pthread_mutex_init(&o->odb_mutex, NULL);
+		pthread_mutex_init(&o->work_mutex, NULL);
+		o->nr_threads = online_cpus();
+		o->pthreads = xcalloc(o->nr_threads, sizeof(pthread_t));
+		info->fn = unpack_callback_parallel;
+
+		for (t = 0; t < o->nr_threads; t++) {
+			if (pthread_create(&o->pthreads[t], NULL,
+					   traverse_trees_parallel_thread_proc,
+					   o))
+				die("unable to create traverse_trees_parallel_thread");
+		}
+	}
+}
+
+static void wait_parallel_traverse(struct unpack_trees_options *o)
+{
+	/*
+	 * The first tree (root directory) is processed on the main thread.
+	 * This function is called after it has completed.  If there is no
+	 * remaining work, we know we are finished.
+	 */
+	if (core_parallel_unpack_trees) {
+		int t;
+
+		pthread_mutex_lock(&o->work_mutex);
+		if (o->remaining_work == 0)
+			mpmcq_cancel(&o->queue);
+		pthread_mutex_unlock(&o->work_mutex);
+
+		for (t = 0; t < o->nr_threads; t++) {
+			if (pthread_join(o->pthreads[t], NULL))
+				die("unable to join traverse_trees_parallel_thread");
+		}
+
+		free(o->pthreads);
+		pthread_mutex_destroy(&o->work_mutex);
+		pthread_mutex_destroy(&o->odb_mutex);
+		pthread_mutex_destroy(&o->unpack_index_entry_mutex);
+		pthread_mutex_destroy(&o->unpack_nondirectories_mutex);
+		mpmcq_destroy(&o->queue);
+	}
+}
+#else
+static void init_parallel_traverse(struct unpack_trees_options *o,
+				   struct traverse_info *info)
+{
+}
+
+static void wait_parallel_traverse(struct unpack_trees_options *o)
+{
+	return;
+}
+#endif
+
 /*
  * N-way merge "len" trees.  Returns 0 on success, -1 on failure to manipulate the
  * resulting index, -2 on failure to reflect the changes to the work tree.
@@ -1327,6 +1632,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		const char *prefix = o->prefix ? o->prefix : "";
 		struct traverse_info info;
 		uint64_t start;
+		int ret;
 
 		setup_traverse_info(&info, prefix);
 		info.fn = unpack_callback;
@@ -1352,9 +1658,12 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		}
 
 		start = getnanotime();
-		if (traverse_trees(len, t, &info) < 0)
-			goto return_failed;
+		init_parallel_traverse(o, &info);
+		ret = traverse_trees(len, t, &info);
+		wait_parallel_traverse(o);
 		trace_performance_since(start, "traverse_trees");
+		if (ret < 0)
+			goto return_failed;
 	}
 
 	/* Any left-over entries in the index? */
diff --git a/unpack-trees.h b/unpack-trees.h
index c2b434c606..b7140099fa 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -3,6 +3,11 @@
 
 #include "tree-walk.h"
 #include "argv-array.h"
+#ifndef NO_PTHREADS
+#include "git-compat-util.h"
+#include <pthread.h>
+#include "mpmcqueue.h"
+#endif
 
 #define MAX_UNPACK_TREES 8
 
@@ -80,6 +85,31 @@ struct unpack_trees_options {
 	struct index_state result;
 
 	struct exclude_list *el; /* for internal use */
+#ifndef NO_PTHREADS
+	/*
+	 * Speed up the tree traversal by adding all discovered tree objects
+	 * into a queue and have a pool of worker threads process them in
+	 * parallel.  Since there is no upper bound on the size of a tree and
+	 * each worker thread will be adding discovered tree objects to the
+	 * queue, we need an unbounded multi-producer-multi-consumer queue.
+	 */
+	struct mpmcq queue;
+
+	int nr_threads;
+	pthread_t *pthreads;
+
+	/* need a mutex as we don't have fetch_and_add() */
+	int remaining_work;
+	pthread_mutex_t work_mutex;
+
+	/* The ODB is not thread safe so we must serialize access to it */
+	pthread_mutex_t odb_mutex;
+
+	/* various functions that are not thread safe and must be serialized for now */
+	pthread_mutex_t unpack_index_entry_mutex;
+	pthread_mutex_t unpack_nondirectories_mutex;
+
+#endif
 };
 
 extern int unpack_trees(unsigned n, struct tree_desc *t,
-- 
2.17.0.gvfs.1.123.g449c066
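
The "All I really need here is fetch_and_add()" comments above could be
satisfied with compiler atomics where available; a sketch assuming the
GCC/Clang __atomic builtins (git also supports toolchains without them,
hence the mutex in the patch):

    static int remaining_work;

    static void work_added(void)
    {
        __atomic_fetch_add(&remaining_work, 1, __ATOMIC_SEQ_CST);
    }

    static void work_done(struct mpmcq *queue)
    {
        /* the last finisher cancels the queue so blocked workers exit */
        if (!__atomic_sub_fetch(&remaining_work, 1, __ATOMIC_SEQ_CST))
            mpmcq_cancel(queue);
    }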



* Re: [PATCH v1 1/3] add unbounded Multi-Producer-Multi-Consumer queue
  2018-07-18 20:45 ` [PATCH v1 1/3] add unbounded Multi-Producer-Multi-Consumer queue Ben Peart
@ 2018-07-18 20:57   ` Stefan Beller
  2018-07-19 19:11   ` Junio C Hamano
  1 sibling, 0 replies; 121+ messages in thread
From: Stefan Beller @ 2018-07-18 20:57 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, Junio C Hamano

On Wed, Jul 18, 2018 at 1:45 PM Ben Peart <Ben.Peart@microsoft.com> wrote:
>

Did you have any further considerations that are worth recording here?
(Memory, performance, CPU execution, threading all come to mind.)

> Signed-off-by: Ben Peart <benpeart@microsoft.com>

> +/*
> + * Initializes a mpmcq structure.
> + */

I'd find the name mpmcq a bit troubling if I were just stumbling upon it
in the code without the knowledge of this review (and its abbreviation),
maybe just 'threadsafe_queue'?

> +extern void mpmcq_init(struct mpmcq *queue);

We prefer no extern keyword these days
c.f. Documentation/CodingGuidelines:
 - Variables and functions local to a given source file should be marked
   with "static". Variables that are visible to other source files
   must be declared with "extern" in header files. However, function
   declarations should not use "extern", as that is already the default.


* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-18 20:45 [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Ben Peart
                   ` (2 preceding siblings ...)
  2018-07-18 20:45 ` [PATCH v1 3/3] Add initial parallel version of unpack_trees() Ben Peart
@ 2018-07-18 21:02 ` Stefan Beller
  2018-07-18 21:34 ` Jeff King
  4 siblings, 0 replies; 121+ messages in thread
From: Stefan Beller @ 2018-07-18 21:02 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, Junio C Hamano

> don�t

The encoding seems to be broken somehow (also on)
https://public-inbox.org/git/20180718204458.20936-1-benpeart@microsoft.com/


> When I brought up this idea with some other git contributors they mentioned
> that multi threading unpack_trees() had been discussed a few years ago on

https://public-inbox.org/git/CACsJy8A0KUyxK_2NAMh+da9yithZM5d68rhqEVZe3NcMxinAjA@mail.gmail.com/
https://public-inbox.org/git/20160415095139.GA3985@lanh/


> the list but that the idea was discarded.  They couldn't remember exactly
> why it was discarded and none of us have been able to find the email threads
> from that earlier discussion. As a result, I decided to write up this RFC
> and see if the greater git community has ideas, suggestions, or more
> background/history on whether this is a reasonable path to pursue or if
> there are other/better ideas on how to speed up checkout especially on large
> repos.

If you want more than a bare bones threaded queue, see
https://public-inbox.org/git/1440724495-708-5-git-send-email-sbeller@google.com/
for inspiration.

Stefan


* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-18 20:45 [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Ben Peart
                   ` (3 preceding siblings ...)
  2018-07-18 21:02 ` [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Stefan Beller
@ 2018-07-18 21:34 ` Jeff King
  2018-07-23 15:48   ` Ben Peart
  4 siblings, 1 reply; 121+ messages in thread
From: Jeff King @ 2018-07-18 21:34 UTC (permalink / raw)
  To: Ben Peart; +Cc: git@vger.kernel.org, gitster@pobox.com

On Wed, Jul 18, 2018 at 08:45:14PM +0000, Ben Peart wrote:

> When working directories get big, checkout times start to suffer.  Even with
> GVFS virtualization (which limits git to only having to update those files
> that have been changed locally) we're seeing P50 times for checkout of 31
> seconds and the P80 time is 43 seconds.

Funny aside: all of your apostrophes look like the unicode question
mark. Looking at raw bytes of your mail, they're actually u+fffd
(unicode "replacement character"). Your headers correctly claim to be
utf8. So presumably they got munged by whatever converted to unicode and
didn't have the original character in its translation table. I wonder if
this was send-email (so really perl's encode module), or if your smtp
server tried to do an on-the-fly conversion (I know many servers will
switch the content-transfer-encoding, but I haven't seen a charset
conversion before).

Anyway, on to the actual discussion:

> Here is a checkout command with tracing turned on to demonstrate where the
> time is spent.  Note, this is somewhat of a "best case" as I'm simply
> checking out the current commit:
> 
> benpeart@gvfs-perf MINGW64 /f/os/src (official/rs_es_debug_dev)
> $ /usr/src/git/git.exe checkout
> 12:31:50.419016 read-cache.c:2006       performance: 1.180966800 s: read cache .git/index
> 12:31:51.184636 name-hash.c:605         performance: 0.664575200 s: initialize name hash
> 12:31:51.200280 preload-index.c:111     performance: 0.019811600 s: preload index
> 12:31:51.294012 read-cache.c:1543       performance: 0.094515600 s: refresh index
> 12:32:29.731344 unpack-trees.c:1358     performance: 33.889840200 s: traverse_trees
> 12:32:37.512555 read-cache.c:2541       performance: 1.564438300 s: write index, changed mask = 28
> 12:32:44.918730 unpack-trees.c:1358     performance: 7.243155600 s: traverse_trees
> 12:32:44.965611 diff-lib.c:527          performance: 7.374729200 s: diff-index
> Waiting for GVFS to parse index and update placeholder files...Succeeded
> 12:32:46.824986 trace.c:420             performance: 57.715656000 s: git command: 'C:\git-sdk-64\usr\src\git\git.exe' checkout

What's the current state of the index before this checkout? I don't
recall offhand how aggressively we prune the tree walk based on the diff
between the index and the tree we're loading. If we're starting from
scratch, then obviously we do have to walk the whole thing. But in most
cases we should be able to avoid walking into sub-trees where the index
has a matching cache_tree record.

If we're not doing that, it seems like that's going to be the big
obvious win, because it reduces the number of trees we have to consider
in the first place.

> ODB cache
> =========
> Since traverse_trees() hits the ODB for each tree object (of which there are
> over 500K in this repo) I wrote and tested having an in-memory ODB cache
> that cached all tree objects.  This resulted in a > 50% hit ratio (largely
> due to the fact we traverse the tree twice during checkout) but resulted in
> only a minimal savings (1.3 seconds).

In my experience, one major cost of object access is decompression, both
delta and zlib. Trees in particular tend to delta very well across
versions. We have a cache to try to reuse intermediate delta results,
but the default size is probably woefully undersized for your repository
(I know from past tests it's undersized a bit even for the linux
kernel).

Try bumping core.deltaBaseCacheLimit to see if that has any impact. It's
96MB by default.

There may also be some possible work in making it more aggressive about
storing the intermediate results. I seem to recall from past
explorations that it doesn't keep everything, and I don't know if its
heuristics have ever been proven sane.

For zlib compression, I don't have numbers handy, but previous
experiments showed that trees don't actually benefit all that much from
zlib (presumably because they're mostly random-looking hashes). So one
option would be to try repacking _just_ the trees with
"pack.compression" set to 0, and see how the result behaves. I suspect
that will be pretty painful with your giant multi-pack repo.

It might be slightly easier if we had an option to set the compression
level on a per-type basis (both to experiment, and then of course if it
works to actually tune your repo).

The numbers above aren't specific enough to know how much time was spent
doing zlib stuff, though. And even with more specific probes, it's
generally still hard to tell the difference between what's specific to
the compression level, and what's a result of the fact that zlib is
essentially copying all the bytes from the filesystem into memory.
Still, my timings with zstd[1] showed something like 10-20% improvement
on object access, so we should be able to get something at least as good
by moving to no compression.

[1] https://public-inbox.org/git/20161023080552.lma2v6zxmyaiiqz5@sigill.intra.peff.net/

> Tree Graph File
> ===============
> I also considered storing the commit tree in an alternate structure that is
> faster to load/parse (ala the Commit graph) but the cache results along with
> the negligible impact of running checkout back to back (thus ensuring the
> objects were cached in my file system cache) made me believe this would not
> result in much savings. MIDX has already helped out here given we end up
> with a lot of pack files of commits and trees.

I don't think this will help. Tree objects are actually reasonably
compact on disk relative to the information you're getting out of them.
As opposed to commit objects, where you expand a kilobyte to get the 40
bytes of parent pointer.

So there's probably benefit from storing tree _relationships_ (like the
diff between a commit and its parent) if it lets you avoid opening the
tree, but for a checkout you really are walking the tree entries.
Possibly with a diff to your current state (as above), but that
relationship is much less predictable (your current state is arbitrary,
not necessarily the commit parent).

> Sparse tree traversal
> =====================
> We've sped up other parts of git by taking advantage of the existing
> sparse-checkout/excludes logic to limit what files git has to consider to
> those that have been modified by the user locally.  I haven't been able to
> think of a way to take advantage of that with unpack_trees() as when you are
> merging n commits, a change/conflict can occur in any tree object so they
> must all be traversed.  If I'm missing something here and there _is_ a way
> to entirely skip large parts of the tree, please let me know!  Please note
> that we're already limiting the files that git needs to update in the
> working directory via sparse-checkout/excludes but the other/merge logic
> still executes for the entire tree whether there are files to update or not.

For narrow/partial clones, it seems like we'd ultimately have to
consider a merge of unavailable trees to be a conflict anyway. I.e., it
seems reasonable to me that if we're sparse in path "subdir", then in a
merge, either:

  1. Neither side touched "subdir", and we can ignore it without
     descending. 

  2. One side touched "subdir", in which we can take the trivial
     merge. This would be the common case when you pull somebody else's
     branch that touched code you have marked as sparse.

  3. Both sides touched it, in which case we must report a conflict.
     This might happen if you pulled two topics which both touched the
     same code (even if you didn't). It might even resolve cleanly if we
     had the actual trees and blobs to look at, but that just means
     _you_ can't resolve it in your sparse clone.

I'd expect (1) and (2) to be the common cases. I won't be at all
surprised if unpack_trees() isn't particularly smart about that, though.
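
For illustration, those three cases reduce to a tiny decision function (a
sketch; the names are illustrative, not from any existing code):

    enum sparse_merge_result {
        SPARSE_SKIP,    /* (1) neither side touched it */
        SPARSE_TRIVIAL, /* (2) one side touched it: take that side */
        SPARSE_CONFLICT /* (3) both sides touched it */
    };

    static enum sparse_merge_result
    sparse_merge_decision(int ours_changed, int theirs_changed)
    {
        if (!ours_changed && !theirs_changed)
            return SPARSE_SKIP;
        if (ours_changed && theirs_changed)
            return SPARSE_CONFLICT;
        return SPARSE_TRIVIAL;
    }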

> Multi-threading unpack_trees()
> ==============================
> The current model of unpack_trees() is that a single thread recursively
> traverses each tree object as it comes across it.  One thought I had was to
> multi-thread the traversal so that each tree object could be processed in
> parallel.  To test this idea out, I wrote an unbounded
> Multi-Producer-Multi-Consumer queue and then wrote a
> traverse_trees_parallel() function that would add any new tree objects into
> the queue where they can be processed by a pool of worker threads.  Each
> thread will wake up when there is work in the queue, remove a tree object,
> process it adding any additional tree objects it finds.

I'm generally terrified of multi-threading anything in the core parts of
Git. There are so many latent bits of non-reentrant or racy code.

I think your queue suggestion may be the sanest approach, though,
because it keeps the responsibilities of the worker threads
pretty clear.

> When I brought up this idea with some other git contributors they mentioned
> that multi threading unpack_trees() had been discussed a few years ago on
> the list but that the idea was discarded.  They couldn't remember exactly
> why it was discarded and none of us have been able to find the email threads
> from that earlier discussion. As a result, I decided to write up this RFC
> and see if the greater git community has ideas, suggestions, or more
> background/history on whether this is a reasonable path to pursue or if
> there are other/better ideas on how to speed up checkout especially on large
> repos.

I don't remember any specific discussion, and didn't dig anything up
after a few minutes. But I'd be willing to bet that the primary reason
it would not be pursued is the general lack of thread safety in the
current codebase.

-Peff


* Re: [PATCH v1 3/3] Add initial parallel version of unpack_trees()
  2018-07-18 20:45 ` [PATCH v1 3/3] Add initial parallel version of unpack_trees() Ben Peart
@ 2018-07-18 22:56   ` Junio C Hamano
  0 siblings, 0 replies; 121+ messages in thread
From: Junio C Hamano @ 2018-07-18 22:56 UTC (permalink / raw)
  To: Ben Peart; +Cc: git@vger.kernel.org

Ben Peart <Ben.Peart@microsoft.com> writes:

> +	 * Fetch the tree from the ODB for each peer directory in the
> +	 * n commits.
> +	 *
> +	 * For 2- and 3-way traversals, we try to avoid hitting the
> +	 * ODB twice for the same OID.  This should yield a nice speed
> +	 * up in checkouts and merges when the commits are similar.
> +	 *
> +	 * We don't bother doing the full O(n^2) search for larger n,
> +	 * because wider traversals don't happen that often and we
> +	 * avoid the search setup.

It is sensible to optimize for common cases while leaving out the
complexity that is only needed to support rare cases.

> +	 * When 2 peer OIDs are the same, we just copy the tree
> +	 * descriptor data.  This implicitly borrows the buffer
> +	 * data from the earlier cell.

cell meaning...?


> +	for (i = 0; i < n; i++, dirmask >>= 1) {
> +		if (i > 0 && are_same_oid(&names[i], &names[i - 1]))
> +			newinfo->t[i] = newinfo->t[i - 1];
> +		else if (i > 1 && are_same_oid(&names[i], &names[i - 2]))
> +			newinfo->t[i] = newinfo->t[i - 2];
> +		else {
> +			const struct object_id *oid = NULL;
> +			if (dirmask & 1)
> +				oid = names[i].oid;
> +
> +			/*
> +			 * fill_tree_descriptor() will load the tree from the
> +			 * ODB. Accessing the ODB is not thread safe so
> +			 * serialize access using the odb_mutex.
> +			 */
> +			pthread_mutex_lock(&o->odb_mutex);
> +			newinfo->buf[newinfo->nr_buf++] =
> +				fill_tree_descriptor(newinfo->t + i, oid);
> +			pthread_mutex_unlock(&o->odb_mutex);
> +		}
> +	}
> +
> +	/*
> +	 * We can't play games with the cache bottom as we are processing
> +	 * the tree objects in parallel.
> +	 * newinfo->bottom = switch_cache_bottom(&newinfo->info);
> +	 */

Would the resulting code match corresponding entries from two/three
trees correctly with a tree with entries "foo." (blob), "foo/" (has
subtree), and "foo0" (blob) at the same time, without adjusting the
bottom?  I am worried because cache_bottom stuff is not about
optimization but is about correctness.

> +	/* All I really need here is fetch_and_add() */
> +	pthread_mutex_lock(&o->work_mutex);
> +	o->remaining_work++;
> +	pthread_mutex_unlock(&o->work_mutex);
> +	mpmcq_push(&o->queue, &newinfo->entry);

Nice.  I like the general idea.



* Re: [PATCH v1 1/3] add unbounded Multi-Producer-Multi-Consumer queue
  2018-07-18 20:45 ` [PATCH v1 1/3] add unbounded Multi-Producer-Multi-Consumer queue Ben Peart
  2018-07-18 20:57   ` Stefan Beller
@ 2018-07-19 19:11   ` Junio C Hamano
  1 sibling, 0 replies; 121+ messages in thread
From: Junio C Hamano @ 2018-07-19 19:11 UTC (permalink / raw)
  To: Ben Peart; +Cc: git@vger.kernel.org

Ben Peart <Ben.Peart@microsoft.com> writes:

> +/*
> + * struct mpmcq_entry is an opaque structure representing an entry in the
> + * queue.
> + */
> +struct mpmcq_entry {
> +	struct mpmcq_entry *next;
> +};
> +
> +/*
> + * struct mpmcq is the concurrent queue structure. Members should not be
> + * modified directly.
> + */
> +struct mpmcq {
> +	struct mpmcq_entry *head;
> +	pthread_mutex_t mutex;
> +	pthread_cond_t condition;
> +	int cancel;
> +};

This calls itself a queue, but a new element goes to the beginning
of a singly linked list, and the only way to take an element out is
from near the beginning of the linked list, so it looks more like a
LIFO stack to me.

I do not know how much it matters, as the name mpmcq is totally
opaque to readers so perhaps readers are not even aware of various
aspects of the service, e.g. how it works, what fairness it gives to
the calling code, etc.
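
For illustration, a FIFO variant needs only a tail pointer while keeping
the same interface; a sketch, assuming a new "tail" member in struct
mpmcq (untested):

    void mpmcq_push(struct mpmcq *queue, struct mpmcq_entry *entry)
    {
        pthread_mutex_lock(&queue->mutex);
        entry->next = NULL;
        if (queue->tail)
            queue->tail->next = entry;  /* append at the tail... */
        else
            queue->head = entry;        /* ...or start a new list */
        queue->tail = entry;
        pthread_cond_signal(&queue->condition);
        pthread_mutex_unlock(&queue->mutex);
    }

mpmcq_pop() would additionally need to clear "tail" whenever it removes
the last entry, i.e. when "head" becomes NULL.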



* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-18 21:34 ` Jeff King
@ 2018-07-23 15:48   ` Ben Peart
  2018-07-23 17:03     ` Duy Nguyen
  0 siblings, 1 reply; 121+ messages in thread
From: Ben Peart @ 2018-07-23 15:48 UTC (permalink / raw)
  To: Jeff King, Ben Peart; +Cc: git@vger.kernel.org, gitster@pobox.com



On 7/18/2018 5:34 PM, Jeff King wrote:
> On Wed, Jul 18, 2018 at 08:45:14PM +0000, Ben Peart wrote:
> 
>> When working directories get big, checkout times start to suffer.  Even with
>> GVFS virtualization (which limits git to only having to update those files
>> that have been changed locally) we�re seeing P50 times for checkout of 31
>> seconds and the P80 time is 43 seconds.
> 
> Funny aside: all of your apostrophes look like the unicode question
> mark. Looking at raw bytes of your mail, they're actually u+fffd
> (unicode "replacement character"). Your headers correctly claim to be
> utf8. So presumably they got munged by whatever converted to unicode and
> didn't have the original character in its translation table. I wonder if
> this was send-email (so really perl's encode module), or if your smtp
> server tried to do an on-the-fly conversion (I know many servers will
> switch the content-transfer-encoding, but I haven't seen a charset
> conversion before).
> 

This was my bad.  I wrote the email in Word so I could get spell 
checking and it has this 'feature' where it converts all straight quotes 
to "smart quotes."  I just forgot to search/replace them back to 
straight quotes before sending the mail.

> Anyway, on to the actual discussion:
> 
>> Here is a checkout command with tracing turned on to demonstrate where the
>> time is spent.  Note, this is somewhat of a "best case" as I'm simply
>> checking out the current commit:
>>
>> benpeart@gvfs-perf MINGW64 /f/os/src (official/rs_es_debug_dev)
>> $ /usr/src/git/git.exe checkout
>> 12:31:50.419016 read-cache.c:2006       performance: 1.180966800 s: read cache .git/index
>> 12:31:51.184636 name-hash.c:605         performance: 0.664575200 s: initialize name hash
>> 12:31:51.200280 preload-index.c:111     performance: 0.019811600 s: preload index
>> 12:31:51.294012 read-cache.c:1543       performance: 0.094515600 s: refresh index
>> 12:32:29.731344 unpack-trees.c:1358     performance: 33.889840200 s: traverse_trees
>> 12:32:37.512555 read-cache.c:2541       performance: 1.564438300 s: write index, changed mask = 28
>> 12:32:44.918730 unpack-trees.c:1358     performance: 7.243155600 s: traverse_trees
>> 12:32:44.965611 diff-lib.c:527          performance: 7.374729200 s: diff-index
>> Waiting for GVFS to parse index and update placeholder files...Succeeded
>> 12:32:46.824986 trace.c:420             performance: 57.715656000 s: git command: 'C:\git-sdk-64\usr\src\git\git.exe' checkout
> 
> What's the current state of the index before this checkout? 

This was after running "git checkout" multiple times so there was really 
nothing for git to do.

> I don't
> recall offhand how aggressively we prune the tree walk based on the diff
> between the index and the tree we're loading. If we're starting from
> scratch, then obviously we do have to walk the whole thing. But in most
> cases we should be able to avoid walking into sub-trees where the index
> has a matching cache_tree record.
> 
> If we're not doing that, it seems like that's going to be the big
> obvious win, because it reduces the number of trees we have to consider
> in the first place.
> 

I agree this could be a big win.  Especially in large trees, the
percentage of the tree that changes between two commits is often quite
small.  Skipping that work entirely is a much bigger win than doing all
of it, even in parallel.  Today, we aren't aggressive at all and do no
pruning.

This brings up a concern I have with this approach altogether. In an 
earlier patch series, I tried to optimize the "git checkout -b" code 
path to not update every file in the working directory but only to 
create the new branch and switch to it.  The feedback to that patch was 
that people rely on the current behavior of rewriting every file so the 
patch was rejected.  This earlier attempt/failure to optimize checkout 
makes me worried that _any_ effort to prune the tree will be rejected 
for the same reason.

I'd be interested in how we can prune the tree and only do the work
required without breaking the implied behavior of the current
implementation.  Would it be acceptable to have two code paths: 1) the old
one, for backward compatibility, that updates every file whether there are
changes or not, and 2) a new/optimized one that only does the minimum work
required?  Then we could put which code path executes by default behind a
new config setting that allows people to opt in to the new/faster behavior.

Any other ideas or suggestions that don't require coming up with new git 
commands (ie "git fast-checkout") and retraining existing git users?

>> ODB cache
>> =========
>> Since traverse_trees() hits the ODB for each tree object (of which there are
>> over 500K in this repo) I wrote and tested having an in-memory ODB cache
>> that cached all tree objects.  This resulted in a > 50% hit ratio (largely
>> due to the fact we traverse the tree twice during checkout) but resulted in
>> only a minimal savings (1.3 seconds).
> 
> In my experience, one major cost of object access is decompression, both
> delta and zlib. Trees in particular tend to delta very well across
> versions. We have a cache to try to reuse intermediate delta results,
> but the default size is probably woefully undersized for your repository
> (I know from past tests it's undersized a bit even for the linux
> kernel).
> 
> Try bumping core.deltaBaseCacheLimit to see if that has any impact. It's
> 96MB by default.
> 
> There may also be some possible work in making it more aggressive about
> storing the intermediate results. I seem to recall from past
> explorations that it doesn't keep everything, and I don't know if its
> heuristics have ever been proven sane.
> 
> For zlib compression, I don't have numbers handy, but previous
> experiments showed that trees don't actually benefit all that much from
> zlib (presumably because they're mostly random-looking hashes). So one
> option would be to try repacking _just_ the trees with
> "pack.compression" set to 0, and see how the result behaves. I suspect
> that will be pretty painful with your giant multi-pack repo.
> 
> It might be slightly easier if we had an option to set the compression
> level on a per-type basis (both to experiment, and then of course if it
> works to actually tune your repo).
> 
> The numbers above aren't specific enough to know how much time was spent
> doing zlib stuff, though. And even with more specific probes, it's
> generally still hard to tell the difference between what's specific to
> the compression level, and what's a result of the fact that zlib is
> essentially copying all the bytes from the filesystem into memory.
> Still, my timings with zstd[1] showed something like 10-20% improvement
> on object access, so we should be able to get something at least as good
> by moving to no compression.
> 
> [1] https://public-inbox.org/git/20161023080552.lma2v6zxmyaiiqz5@sigill.intra.peff.net/
> 

Thanks, these are good ideas to pursue.  I've added them to my list of 
things to look into but believe pruning the tree or traversing it in 
parallel has more performance saving potential so I'll be looking there 
first.

<snip>


>> Multi-threading unpack_trees()
>> ==============================
>> The current model of unpack_trees() is that a single thread recursively
>> traverses each tree object as it comes across it.  One thought I had was to
>> multi-thread the traversal so that each tree object could be processed in
>> parallel.  To test this idea out, I wrote an unbounded
>> Multi-Producer-Multi-Consumer queue and then wrote a
>> traverse_trees_parallel() function that would add any new tree objects into
>> the queue where they can be processed by a pool of worker threads.  Each
>> thread will wake up when there is work in the queue, remove a tree object,
>> process it adding any additional tree objects it finds.
> 
> I'm generally terrified of multi-threading anything in the core parts of
> Git. There are so many latent bits of non-reentrant or racy code.
> 
> I think your queue suggestion may be the sanest approach, though,
> because it keeps the responsibilities of the worker threads
> pretty clear.
> 

I agree the thought of multi-threading unpack_trees() is daunting!  It 
would be nice if the model of pruning the tree was sufficient to get 
reasonable performance with large repos.  I guess we'll see...

>> When I brought up this idea with some other git contributors they mentioned
>> that multi threading unpack_trees() had been discussed a few years ago on
>> the list but that the idea was discarded.  They couldn't remember exactly
>> why it was discarded and none of us have been able to find the email threads
>> from that earlier discussion. As a result, I decided to write up this RFC
>> and see if the greater git community has ideas, suggestions, or more
>> background/history on whether this is a reasonable path to pursue or if
>> there are other/better ideas on how to speed up checkout especially on large
>> repos.
> 
> I don't remember any specific discussion, and didn't dig anything up
> after a few minutes. But I'd be willing to bet that the primary reason
> it would not be pursued is the general lack of thread safety in the
> current codebase.
> 
> -Peff
> 


* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-23 15:48   ` Ben Peart
@ 2018-07-23 17:03     ` Duy Nguyen
  2018-07-23 20:51       ` Ben Peart
  2018-07-24  4:27       ` Jeff King
  0 siblings, 2 replies; 121+ messages in thread
From: Duy Nguyen @ 2018-07-23 17:03 UTC (permalink / raw)
  To: Ben Peart; +Cc: Jeff King, Ben Peart, Git Mailing List, Junio C Hamano

On Mon, Jul 23, 2018 at 5:50 PM Ben Peart <peartben@gmail.com> wrote:
> > Anyway, on to the actual discussion:
> >
> >> Here is a checkout command with tracing turned on to demonstrate where the
> >> time is spent.  Note, this is somewhat of a "best case" as I'm simply
> >> checking out the current commit:
> >>
> >> benpeart@gvfs-perf MINGW64 /f/os/src (official/rs_es_debug_dev)
> >> $ /usr/src/git/git.exe checkout
> >> 12:31:50.419016 read-cache.c:2006       performance: 1.180966800 s: read cache .git/index
> >> 12:31:51.184636 name-hash.c:605         performance: 0.664575200 s: initialize name hash
> >> 12:31:51.200280 preload-index.c:111     performance: 0.019811600 s: preload index
> >> 12:31:51.294012 read-cache.c:1543       performance: 0.094515600 s: refresh index
> >> 12:32:29.731344 unpack-trees.c:1358     performance: 33.889840200 s: traverse_trees
> >> 12:32:37.512555 read-cache.c:2541       performance: 1.564438300 s: write index, changed mask = 28
> >> 12:32:44.918730 unpack-trees.c:1358     performance: 7.243155600 s: traverse_trees
> >> 12:32:44.965611 diff-lib.c:527          performance: 7.374729200 s: diff-index
> >> Waiting for GVFS to parse index and update placeholder files...Succeeded
> >> 12:32:46.824986 trace.c:420             performance: 57.715656000 s: git command: 'C:\git-sdk-64\usr\src\git\git.exe' checkout
> >
> > What's the current state of the index before this checkout?
>
> This was after running "git checkout" multiple times so there was really
> nothing for git to do.

Hmm.. this means cache-tree is fully valid, unless you have changes in
index. We're quite aggressive in repairing cache-tree since aecf567cbf
(cache-tree: create/update cache-tree on checkout - 2014-07-05). If we
have very good cache-tree records and still spend 33s on
traverse_trees, maybe there's something else.

> >> ODB cache
> >> =========
> >> Since traverse_trees() hits the ODB for each tree object (of which there are
> >> over 500K in this repo) I wrote and tested having an in-memory ODB cache
> >> that cached all tree objects.  This resulted in a > 50% hit ratio (largely
> >> due to the fact we traverse the tree twice during checkout) but resulted in
> >> only a minimal savings (1.3 seconds).
> >
> > In my experience, one major cost of object access is decompression, both
> > delta and zlib. Trees in particular tend to delta very well across
> > versions. We have a cache to try to reuse intermediate delta results,
> > but the default size is probably woefully undersized for your repository
> > (I know from past tests it's undersized a bit even for the linux
> > kernel).
> >
> > Try bumping core.deltaBaseCacheLimit to see if that has any impact. It's
> > 96MB by default.
> >
> > There may also be some possible work in making it more aggressive about
> > storing the intermediate results. I seem to recall from past
> > explorations that it doesn't keep everything, and I don't know if its
> > heuristics have ever been proven sane.

Could we be a bit more flexible about cache size? Say if we know
there's 8 GB memory still available, we should be able to use like 1
GB at least (and that's done automatically without tinkering with
config).
-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-23 17:03     ` Duy Nguyen
@ 2018-07-23 20:51       ` Ben Peart
  2018-07-24  4:20         ` Jeff King
                           ` (2 more replies)
  2018-07-24  4:27       ` Jeff King
  1 sibling, 3 replies; 121+ messages in thread
From: Ben Peart @ 2018-07-23 20:51 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Jeff King, Ben Peart, Git Mailing List, Junio C Hamano



On 7/23/2018 1:03 PM, Duy Nguyen wrote:
> On Mon, Jul 23, 2018 at 5:50 PM Ben Peart <peartben@gmail.com> wrote:
>>> Anyway, on to the actual discussion:
>>>
>>>> Here is a checkout command with tracing turned on to demonstrate where the
> >>>> time is spent.  Note, this is somewhat of a "best case" as I'm simply
>>>> checking out the current commit:
>>>>
>>>> benpeart@gvfs-perf MINGW64 /f/os/src (official/rs_es_debug_dev)
>>>> $ /usr/src/git/git.exe checkout
>>>> 12:31:50.419016 read-cache.c:2006       performance: 1.180966800 s: read cache .git/index
>>>> 12:31:51.184636 name-hash.c:605         performance: 0.664575200 s: initialize name hash
>>>> 12:31:51.200280 preload-index.c:111     performance: 0.019811600 s: preload index
>>>> 12:31:51.294012 read-cache.c:1543       performance: 0.094515600 s: refresh index
>>>> 12:32:29.731344 unpack-trees.c:1358     performance: 33.889840200 s: traverse_trees
>>>> 12:32:37.512555 read-cache.c:2541       performance: 1.564438300 s: write index, changed mask = 28
>>>> 12:32:44.918730 unpack-trees.c:1358     performance: 7.243155600 s: traverse_trees
>>>> 12:32:44.965611 diff-lib.c:527          performance: 7.374729200 s: diff-index
>>>> Waiting for GVFS to parse index and update placeholder files...Succeeded
>>>> 12:32:46.824986 trace.c:420             performance: 57.715656000 s: git command: 'C:\git-sdk-64\usr\src\git\git.exe' checkout
>>>
>>> What's the current state of the index before this checkout?
>>
>> This was after running "git checkout" multiple times so there was really
>> nothing for git to do.
> 
> Hmm.. this means cache-tree is fully valid, unless you have changes in
> index. We're quite aggressive in repairing cache-tree since aecf567cbf
> (cache-tree: create/update cache-tree on checkout - 2014-07-05). If we
> have very good cache-tree records and still spend 33s on
> traverse_trees, maybe there's something else.
> 

I'm not at all familiar with the cache-tree and couldn't find any 
documentation on it other than index-format.txt which says "it helps 
speed up tree object generation for a new commit."  In this particular 
case, no new commit is being created so I don't know that the cache-tree 
would help.

After a quick look at the code, the only place I can find that tries to 
use cache_tree_matches_traversal() is in unpack_callback() and that only 
happens if n == 1 and in the "git checkout" case, n == 2. Am I missing 
something?

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-23 20:51       ` Ben Peart
@ 2018-07-24  4:20         ` Jeff King
  2018-07-24 15:33           ` Duy Nguyen
  2018-07-24  5:54         ` Junio C Hamano
  2018-07-24 15:13         ` Duy Nguyen
  2 siblings, 1 reply; 121+ messages in thread
From: Jeff King @ 2018-07-24  4:20 UTC (permalink / raw)
  To: Ben Peart; +Cc: Duy Nguyen, Ben Peart, Git Mailing List, Junio C Hamano

On Mon, Jul 23, 2018 at 04:51:38PM -0400, Ben Peart wrote:

> > Hmm.. this means cache-tree is fully valid, unless you have changes in
> > index. We're quite aggressive in repairing cache-tree since aecf567cbf
> > (cache-tree: create/update cache-tree on checkout - 2014-07-05). If we
> > have very good cache-tree records and still spend 33s on
> > traverse_trees, maybe there's something else.
> 
> I'm not at all familiar with the cache-tree and couldn't find any
> documentation on it other than index-format.txt which says "it helps speed
> up tree object generation for a new commit."  In this particular case, no
> new commit is being created so I don't know that the cache-tree would help.

It's basically an index extension that mirrors the tree structure within
the index, telling you the sha1 of the tree that _would_ be generated
from any particular path. So any time you're walking a tree alongside
the index, in theory you should be able to say "the cache-tree for this
subset of the index matches the tree" and skip over a bunch of entries.

At least that's my view of it. unpack_trees() has always been a
terrifying beast that I've avoided looking too closely at.
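
To make "skip over a bunch of entries" concrete, here is a minimal
sketch using the names from cache-tree.h (subtree_matches() itself is
made up for illustration):

#include "cache.h"
#include "cache-tree.h"

/*
 * Sketch: look up the cache-tree record for a subdirectory and compare
 * its oid against the corresponding subtree of the tree being walked.
 * entry_count < 0 means the record was invalidated by an index update;
 * on a match, entry_count says how many index entries can be skipped.
 */
static int subtree_matches(struct index_state *istate, const char *dir,
			   const struct object_id *subtree_oid)
{
	struct cache_tree *it = cache_tree_find(istate->cache_tree, dir);

	return it && it->entry_count >= 0 &&
	       !oidcmp(&it->oid, subtree_oid);
}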

> After a quick look at the code, the only place I can find that tries to use
> cache_tree_matches_traversal() is in unpack_callback() and that only happens
> if n == 1 and in the "git checkout" case, n == 2. Am I missing something?

Looks like it's trying to special-case "diff-index --cached". Which
kind-of makes sense. In the non-cached case, we're thinking not only
about the relationship between the index and the tree, but also whether
the on-disk files are up to date.

And that would be the same for checkout. We want to know not only
whether there are changes to make to the index, but also whether the
on-disk files need to be updated from the index.

But I assume in your case that we've just refreshed the index quickly
using fsmonitor. So I think in the long run what you want is:

  1. fsmonitor tells us which index entries are not clean

  2. based on the unclean list, we invalidate cache-tree entries for
     those paths

  3. if we have a valid cache-tree entry, we should be able to skip
     digging into that tree; if not, then we walk the index and tree as
     normal, adding/deleting index entries and updating (or complaining
     about) modified on-disk files
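
Step 2 might look roughly like the sketch below; CE_FSMONITOR_VALID
and cache_tree_invalidate_path() are existing names, the loop itself
is illustrative:

#include "cache.h"
#include "cache-tree.h"

/*
 * Sketch: drop cache-tree records covering any entry fsmonitor could
 * not vouch for. Invalidation bubbles up to all enclosing directories,
 * so any record still valid afterwards means "nothing dirty below".
 */
static void invalidate_unclean_paths(struct index_state *istate)
{
	int i;

	for (i = 0; i < istate->cache_nr; i++) {
		const struct cache_entry *ce = istate->cache[i];

		if (!(ce->ce_flags & CE_FSMONITOR_VALID))
			cache_tree_invalidate_path(istate, ce->name);
	}
}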

I think the "n" adds an extra layer of complexity. n==2 means we're
doing a "2-way" merge. Moving from tree X to tree Y, and dealing with
the index as we go. Naively I _think_ we'd be OK to just extend the rule
to "if both subtrees match each other _and_ match the valid cache-tree,
then we can skip".

Again, I'm a little out of my area of expertise here, but cargo-culting
like this:

diff --git a/sha1-file.c b/sha1-file.c
index de4839e634..c105af70ce 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1375,6 +1375,7 @@ static void *read_object(const unsigned char *sha1, enum object_type *type,
 
 	if (oid_object_info_extended(the_repository, &oid, &oi, 0) < 0)
 		return NULL;
+	trace_printf("reading %s %s", type_name(*type), sha1_to_hex(sha1));
 	return content;
 }
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 66741130ae..cfdad4133d 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1075,6 +1075,23 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 				o->cache_bottom += matches;
 				return mask;
 			}
+		} else if (n == 2 && S_ISDIR(names->mode) &&
+			   names[0].mode == names[1].mode &&
+			   !strcmp(names[0].path, names[1].path) &&
+			   !oidcmp(names[0].oid, names[1].oid)
+			   /* && somehow account for modified on-disk files */) {
+			int matches;
+
+			/*
+			 * we know that the two trees have the same oid, so we
+			 * only need to look at one of them
+			 */
+			matches = cache_tree_matches_traversal(o->src_index->cache_tree,
+							       names, info);
+			if (matches) {
+				o->cache_bottom += matches;
+				return mask;
+			}
 		}
 
 		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,

seems to avoid the tree reads when running "GIT_TRACE=1 git checkout".
It also totally empties the index. ;) So clearly we have to do a bit
more there. Probably rather than just bumping o->cache_bottom forward,
we'd need to actually move those entries into the new index. Or maybe
it's something else entirely (I did say cargo-culting, right?).

-Peff

^ permalink raw reply related	[flat|nested] 121+ messages in thread

* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-23 17:03     ` Duy Nguyen
  2018-07-23 20:51       ` Ben Peart
@ 2018-07-24  4:27       ` Jeff King
  1 sibling, 0 replies; 121+ messages in thread
From: Jeff King @ 2018-07-24  4:27 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Ben Peart, Ben Peart, Git Mailing List, Junio C Hamano

On Mon, Jul 23, 2018 at 07:03:16PM +0200, Duy Nguyen wrote:

> > > Try bumping core.deltaBaseCacheLimit to see if that has any impact. It's
> > > 96MB by default.
> > >
> > > There may also be some possible work in making it more aggressive about
> > > storing the intermediate results. I seem to recall from past
> > > explorations that it doesn't keep everything, and I don't know if its
> > > heuristics have ever been proven sane.
> 
> Could we be a bit more flexible about cache size? Say if we know
> there's 8 GB memory still available, we should be able to use like 1
> GB at least (and that's done automatically without tinkering with
> config).

I have mixed feelings on that kind of auto-scaling for caches. Git isn't
always the only program running (or maybe you even have several git
operations running at once). So in many cases you'd want a more holistic
view of the system, and what resources are available.

The OS already does OK scheduling CPU and the block cache for our mmap'd
files. I don't know if there's a way to communicate with it about this
kind of cache. I guess asking "what memory is free" is one way to do
that. But it's not always the best answer (because we might be happy to
trade off some block cache, etc.). On the other hand, that would always
give us a conservative value, so if we picked min(96MB, free_mem /
nr_cpu) or something, that might be an OK rule of thumb.
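
As a rough sketch of that rule of thumb (Linux-only sysinfo() as the
free-memory probe, online_cpus() from thread-utils.h; purely
illustrative, not a proposal):

#include "git-compat-util.h"
#include "thread-utils.h"
#include <sys/sysinfo.h>

/*
 * Sketch: never grow past the current 96MB default, but shrink when
 * free memory divided by the number of CPUs is smaller than that.
 */
static size_t delta_base_cache_guess(void)
{
	const size_t default_limit = 96 * 1024 * 1024;
	struct sysinfo si;
	size_t per_cpu;

	if (sysinfo(&si) < 0)
		return default_limit;
	per_cpu = (size_t)si.freeram * si.mem_unit / online_cpus();
	return per_cpu < default_limit ? per_cpu : default_limit;
}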

-Peff

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-23 20:51       ` Ben Peart
  2018-07-24  4:20         ` Jeff King
@ 2018-07-24  5:54         ` Junio C Hamano
  2018-07-24 15:13         ` Duy Nguyen
  2 siblings, 0 replies; 121+ messages in thread
From: Junio C Hamano @ 2018-07-24  5:54 UTC (permalink / raw)
  To: Ben Peart; +Cc: Duy Nguyen, Jeff King, Ben Peart, Git Mailing List

Ben Peart <peartben@gmail.com> writes:

>> Hmm.. this means cache-tree is fully valid, unless you have changes in
>> index. We're quite aggressive in repairing cache-tree since aecf567cbf
>> (cache-tree: create/update cache-tree on checkout - 2014-07-05). If we
>> have very good cache-tree records and still spend 33s on
>> traverse_trees, maybe there's something else.
>>
>
> I'm not at all familiar with the cache-tree and couldn't find any
> documentation on it other than index-format.txt which says "it helps
> speed up tree object generation for a new commit."  In this particular
> case, no new commit is being created so I don't know that the
> cache-tree would help.

cache-tree is an index extension that records tree object names for
subdirectories you see in the index.  Every time you write the
contents of the index as a tree object, we need to collect the
object name for each top-level path and write a new top-level tree
object out, after doing the same recursively for any modified
subdirectory.  Whenever you add, remove or modify a path in the
index, the cache-tree entries for the enclosing directories are
invalidated, so a cache-tree entry that is still valid means that
all the paths in the index under that directory match the contents
of the tree object that the cache-tree entry holds.

And that property is used by "diff-index --cached $TREE" that is run
internally.  When we find that the subdirectory "D"'s cache-tree
entry is valid in the index, and the tree object recorded in the
cache-tree for that subdirectory matches the subtree D in the tree
object $TREE, then "diff-index --cached" ignores the entire
subdirectory D (which saves relatively little on the index side, as
it only needs to scan forward through what is already in memory, but
on the $TREE traversal side it does not even have to open a subtree,
which can save a lot), and with a well-populated cache-tree, it can
save significant processing.
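
That invalidation property can be illustrated directly with the
cache-tree API; the helpers below are real, the assertions are only a
sketch:

#include "cache.h"
#include "cache-tree.h"

/*
 * Sketch of the invariant: touching one path invalidates the
 * cache-tree records of every enclosing directory, up to the root.
 */
static void illustrate_invalidation(struct index_state *istate)
{
	struct cache_tree *dir;

	/* suppose the index is updated at "D/file.c" ... */
	cache_tree_invalidate_path(istate, "D/file.c");

	dir = cache_tree_find(istate->cache_tree, "D");
	assert(!dir || dir->entry_count < 0);        /* "D" invalid */
	assert(istate->cache_tree->entry_count < 0); /* root too */
}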

I think that is what Duy meant to refer to while looking at the
numbers.

> After a quick look at the code, the only place I can find that tries
> to use cache_tree_matches_traversal() is in unpack_callback() and that
> only happens if n == 1 and in the "git checkout" case, n == 2. Am I
> missing something?

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-23 20:51       ` Ben Peart
  2018-07-24  4:20         ` Jeff King
  2018-07-24  5:54         ` Junio C Hamano
@ 2018-07-24 15:13         ` Duy Nguyen
  2018-07-24 21:21           ` Jeff King
  2018-07-25 16:09           ` Ben Peart
  2 siblings, 2 replies; 121+ messages in thread
From: Duy Nguyen @ 2018-07-24 15:13 UTC (permalink / raw)
  To: Ben Peart; +Cc: Jeff King, Ben Peart, Git Mailing List, Junio C Hamano

On Mon, Jul 23, 2018 at 04:51:38PM -0400, Ben Peart wrote:
> >>> What's the current state of the index before this checkout?
> >>
> >> This was after running "git checkout" multiple times so there was really
> >> nothing for git to do.
> > 
> > Hmm.. this means cache-tree is fully valid, unless you have changes in
> > index. We're quite aggressive in repairing cache-tree since aecf567cbf
> > (cache-tree: create/update cache-tree on checkout - 2014-07-05). If we
> > have very good cache-tree records and still spend 33s on
> > traverse_trees, maybe there's something else.
> > 
> 
> I'm not at all familiar with the cache-tree and couldn't find any 
> documentation on it other than index-format.txt which says "it helps 
> speed up tree object generation for a new commit."

I guess you have the starting points you need after Jeff's and Junio's
explanation (and it would be great if cache-tree could actually be used
for this two-way merge). But to make it easier for new people in the
future, maybe we should add this?

This is basically a ripoff of Junio's explanation with starting points
(write-tree and index-format.txt). I wanted to incorporate some pieces
from Jeff's too but I think Junio's already covered it well.

-- 8< --
Subject: [PATCH] cache-tree.h: more description of what it is and what it's used for

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache-tree.h | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/cache-tree.h b/cache-tree.h
index cfd5328cc9..d25a800a72 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -5,6 +5,35 @@
 #include "tree.h"
 #include "tree-walk.h"
 
+/*
+ * cache-tree is an index extension that records tree object names for
+ * subdirectories you see in the index. It is mainly used for
+ * generating trees from the index before you create a new commit (see
+ * builtin/write-tree.c as starting point) but it's also used in "git
+ * diff-index --cached $TREE" as an optimization. See index-format.txt
+ * for on-disk format.
+ *
+ * Every time you write the contents of the index as a tree object, we
+ * need to collect the object name for each top-level path and write
+ * a new top-level tree object out, after doing the same recursively
+ * for any modified subdirectory. Whenever you add, remove or modify a
+ * path in the index, the cache-tree entries for the enclosing
+ * directories are invalidated, so a cache-tree entry that is still
+ * valid means that all paths in the index under that directory match
+ * the contents of the tree object that the cache-tree entry holds.
+ *
+ * And that property is used by "diff-index --cached $TREE" that is
+ * run internally.  When we find that the subdirectory "D"'s
+ * cache-tree entry is valid in the index, and the tree object
+ * recorded in the cache-tree for that subdirectory matches the
+ * subtree D in the tree object $TREE, then "diff-index --cached"
+ * ignores the entire subdirectory D (which saves relatively little on
+ * the index side, as it only needs to scan forward through what is
+ * already in memory, but on the $TREE traversal side it does not even
+ * have to open a subtree, which can save a lot), and with a
+ * well-populated cache-tree, it can save significant processing.
+ */
+
 struct cache_tree;
 struct cache_tree_sub {
 	struct cache_tree *cache_tree;
-- 
2.18.0.656.gda699b98b3

-- 8< --

^ permalink raw reply related	[flat|nested] 121+ messages in thread

* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-24  4:20         ` Jeff King
@ 2018-07-24 15:33           ` Duy Nguyen
  2018-07-25 20:56             ` Ben Peart
  0 siblings, 1 reply; 121+ messages in thread
From: Duy Nguyen @ 2018-07-24 15:33 UTC (permalink / raw)
  To: Jeff King; +Cc: Ben Peart, Ben Peart, Git Mailing List, Junio C Hamano

On Tue, Jul 24, 2018 at 6:20 AM Jeff King <peff@peff.net> wrote:
> At least that's my view of it. unpack_trees() has always been a
> terrifying beast that I've avoided looking too closely at.

/me nods on the terrifying part.

> > After a quick look at the code, the only place I can find that tries to use
> > cache_tree_matches_traversal() is in unpack_callback() and that only happens
> > if n == 1 and in the "git checkout" case, n == 2. Am I missing something?

So we do not actually use cache-tree? Big optimization opportunity (if
we can make it!).

> Looks like it's trying to special-case "diff-index --cached". Which
> kind-of makes sense. In the non-cached case, we're thinking not only
> about the relationship between the index and the tree, but also whether
> the on-disk files are up to date.
>
> And that would be the same for checkout. We want to know not only
> whether there are changes to make to the index, but also whether the
> on-disk files need to be updated from the index.
>
> But I assume in your case that we've just refreshed the index quickly
> using fsmonitor. So I think in the long run what you want is:
>
>   1. fsmonitor tells us which index entries are not clean
>
>   2. based on the unclean list, we invalidate cache-tree entries for
>      those paths
>
>   3. if we have a valid cache-tree entry, we should be able to skip
>      digging into that tree; if not, then we walk the index and tree as
>      normal, adding/deleting index entries and updating (or complaining
>      about) modified on-disk files

If you tie this optimization to twoway_merge specifically (by checking
the "fn" field), then I think we can do it even better. Since
cache_tree_matches_traversal() is one (hopefully not too costly)
lookup, we can do it without checking with fsmonitor or whatever and
only do so when we have found a cache tree.

Then if we write this new special code just for twoway_merge, we need
to tighten the checks a bit. I think in this case twoway_merge() will
be called with "oldtree" as same as "newtree" (and "current" may
contains dirty stuff from the index). Then

 - o->df_conflict_entry should be NULL (because we handle it slightly
differently in twoway_merge)
 - "current" should not have CE_CONFLICTED

then I believe we will fall into case /* 20 or 21 */ where
merged_entry() is supposed to be called on all entries and it would
change nothing in the index since newtree is the same as oldtree, and
we could just jump over the whole tree in traverse_trees().
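
Spelled out against twoway_merge()'s conventions (src[0] = index,
src[1] = oldtree, src[2] = newtree), those checks would look roughly
like this sketch:

#include "cache.h"
#include "unpack-trees.h"

/*
 * Sketch of the preconditions for the shortcut: both tree sides are
 * present, identical, not the d/f-conflict sentinel, and the index
 * entry is not conflicted; then we fall into case 20/21 where
 * merged_entry() keeps the index unchanged.
 */
static int twoway_shortcut_ok(const struct cache_entry * const *src,
			      struct unpack_trees_options *o)
{
	const struct cache_entry *current = src[0];
	const struct cache_entry *oldtree = src[1];
	const struct cache_entry *newtree = src[2];

	return oldtree && newtree &&
	       oldtree != o->df_conflict_entry &&
	       newtree != o->df_conflict_entry &&
	       !oidcmp(&oldtree->oid, &newtree->oid) &&
	       (!current || !(current->ce_flags & CE_CONFLICTED));
}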

> I think the "n" adds an extra layer of complexity. n==2 means we're
> doing a "2-way" merge. Moving from tree X to tree Y, and dealing with
> the index as we go. Naively I _think_ we'd be OK to just extend the rule
> to "if both subtrees match each other _and_ match the valid cache-tree,
> then we can skip".
>
> Again, I'm a little out of my area of expertise here, but cargo-culting
> like this:
>
> diff --git a/sha1-file.c b/sha1-file.c
> index de4839e634..c105af70ce 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -1375,6 +1375,7 @@ static void *read_object(const unsigned char *sha1, enum object_type *type,
>
>         if (oid_object_info_extended(the_repository, &oid, &oi, 0) < 0)
>                 return NULL;
> +       trace_printf("reading %s %s", type_name(*type), sha1_to_hex(sha1));
>         return content;
>  }
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 66741130ae..cfdad4133d 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1075,6 +1075,23 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>                                 o->cache_bottom += matches;
>                                 return mask;
>                         }
> +               } else if (n == 2 && S_ISDIR(names->mode) &&
> +                          names[0].mode == names[1].mode &&
> +                          !strcmp(names[0].path, names[1].path) &&
> +                          !oidcmp(names[0].oid, names[1].oid)
> +                          /* && somehow account for modified on-disk files */) {
> +                       int matches;
> +
> +                       /*
> +                        * we know that the two trees have the same oid, so we
> +                        * only need to look at one of them
> +                        */
> +                       matches = cache_tree_matches_traversal(o->src_index->cache_tree,
> +                                                              names, info);
> +                       if (matches) {
> +                               o->cache_bottom += matches;
> +                               return mask;
> +                       }
>                 }
>
>                 if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>
> seems to avoid the tree reads when running "GIT_TRACE=1 git checkout".
> It also totally empties the index. ;) So clearly we have to do a bit
> more there. Probably rather than just bumping o->cache_bottom forward,
> we'd need to actually move those entries into the new index. Or maybe
> it's something else entirely (I did say cargo-culting, right?).

Ah this cache_bottom magic. I think this is Junio's alley ;-)

> -Peff
-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-24 15:13         ` Duy Nguyen
@ 2018-07-24 21:21           ` Jeff King
  2018-07-25 16:09           ` Ben Peart
  1 sibling, 0 replies; 121+ messages in thread
From: Jeff King @ 2018-07-24 21:21 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Ben Peart, Ben Peart, Git Mailing List, Junio C Hamano

On Tue, Jul 24, 2018 at 05:13:36PM +0200, Duy Nguyen wrote:

> I guess you have the starting points you need after Jeff's and Junio's
> explanation (and it would be great if cache-tree could actually be used
> for this two-way merge). But to make it easier for new people in the
> future, maybe we should add this?
> 
> This is basically a ripoff of Junio's explanation with starting points
> (write-tree and index-format.txt). I wanted to incorporate some pieces
> from Jeff's too but I think Junio's already covered it well.
> 
> -- 8< --
> Subject: [PATCH] cache-tree.h: more description of what it is and what it's used for

There is some discussion of this extension in
Documentation/technical/index-format.txt. But it's mostly the mechanical
bits, not how or why you would use it.

I like the idea of putting this explanation into the repo, though a lot
of it is pretty specific to "diff-index --cached", which could
potentially grow stale.

-Peff

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-24 15:13         ` Duy Nguyen
  2018-07-24 21:21           ` Jeff King
@ 2018-07-25 16:09           ` Ben Peart
  1 sibling, 0 replies; 121+ messages in thread
From: Ben Peart @ 2018-07-25 16:09 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Jeff King, Ben Peart, Git Mailing List, Junio C Hamano



On 7/24/2018 11:13 AM, Duy Nguyen wrote:
> On Mon, Jul 23, 2018 at 04:51:38PM -0400, Ben Peart wrote:
>>>>> What's the current state of the index before this checkout?
>>>>
>>>> This was after running "git checkout" multiple times so there was really
>>>> nothing for git to do.
>>>
>>> Hmm.. this means cache-tree is fully valid, unless you have changes in
>>> index. We're quite aggressive in repairing cache-tree since aecf567cbf
>>> (cache-tree: create/update cache-tree on checkout - 2014-07-05). If we
>>> have very good cache-tree records and still spend 33s on
>>> traverse_trees, maybe there's something else.
>>>
>>
>> I'm not at all familiar with the cache-tree and couldn't find any
>> documentation on it other than index-format.txt which says "it helps
>> speed up tree object generation for a new commit."
> 
> I guess you have the starting points you need after Jeff's and Junio's
> explanation (and it would be great if cache-tree could actually be used
> for this two-way merge). But to make it easier for new people in the
> future, maybe we should add this?
> 
> This is basically a ripoff of Junio's explanation with starting points
> (write-tree and index-format.txt). I wanted to incorporate some pieces
> from Jeff's too but I think Junio's already covered it well.
> 

I definitely like capturing this in the code or documentation somewhere.
Given that I checked the header file for any hints on the design, I think
that is a reasonable place to put it.

> -- 8< --
> Subject: [PATCH] cache-tree.h: more description of what it is and what it's used for
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>   cache-tree.h | 29 +++++++++++++++++++++++++++++
>   1 file changed, 29 insertions(+)
> 
> diff --git a/cache-tree.h b/cache-tree.h
> index cfd5328cc9..d25a800a72 100644
> --- a/cache-tree.h
> +++ b/cache-tree.h
> @@ -5,6 +5,35 @@
>   #include "tree.h"
>   #include "tree-walk.h"
>   
> +/*
> + * cache-tree is an index extension that records tree object names for
> + * subdirectories you see in the index. It is mainly used for
> + * generating trees from the index before you create a new commit (see
> + * builtin/write-tree.c as starting point) but it's also used in "git
> + * diff-index --cached $TREE" as an optimization. See index-format.txt
> + * for on-disk format.
> + *
> + * Every time you write the contents of the index as a tree object, we

I had to read this a couple of times to figure out what was meant by 
"write the contents of the index as a tree object."  Maybe it was just 
me but how about something like:

"Every time you write a new tree object from the index you need to 
collect the object name for each top-level path and write a new 
top-level tree object out and then do the same recursively for any 
subdirectory."

> + * need to collect the object name for each top-level path and write
> + * a new top-level tree object out, after doing the same recursively
> + * for any modified subdirectory. Whenever you add, remove or modify a
> + * path in the index, the cache-tree entries for the enclosing
> + * directories are invalidated, so a cache-tree entry that is still
> + * valid means that all paths in the index under that directory match
> + * the contents of the tree object that the cache-tree entry holds.
> + *
> + * And that property is used by "diff-index --cached $TREE" that is
> + * run internally.  When we find that the subdirectory "D"'s
> + * cache-tree entry is valid in the index, and the tree object
> + * recorded in the cache-tree for that subdirectory matches the
> + * subtree D in the tree object $TREE, then "diff-index --cached"
> + * ignores the entire subdirectory D (which saves relatively little on
> + * the index side, as it only needs to scan forward through what is
> + * already in memory, but on the $TREE traversal side it does not even
> + * have to open a subtree, which can save a lot), and with a
> + * well-populated cache-tree, it can save significant processing.
> + */
> +
>   struct cache_tree;
>   struct cache_tree_sub {
>   	struct cache_tree *cache_tree;
> 

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-24 15:33           ` Duy Nguyen
@ 2018-07-25 20:56             ` Ben Peart
  2018-07-26  5:30               ` Duy Nguyen
  2018-07-26 16:35               ` Duy Nguyen
  0 siblings, 2 replies; 121+ messages in thread
From: Ben Peart @ 2018-07-25 20:56 UTC (permalink / raw)
  To: Duy Nguyen, Jeff King; +Cc: Ben Peart, Git Mailing List, Junio C Hamano



On 7/24/2018 11:33 AM, Duy Nguyen wrote:
> On Tue, Jul 24, 2018 at 6:20 AM Jeff King <peff@peff.net> wrote:
>> At least that's my view of it. unpack_trees() has always been a
>> terrifying beast that I've avoided looking too closely at.
> 
> /me nods on the terrifying part.
> 
>>> After a quick look at the code, the only place I can find that tries to use
>>> cache_tree_matches_traversal() is in unpack_callback() and that only happens
>>> if n == 1 and in the "git checkout" case, n == 2. Am I missing something?
> 
> So we do not actually use cache-tree? Big optimization opportunity (if
> we can make it!).
> 

I agree!  Assuming we can figure out the technical issues around using
the cache tree to optimize two-way merges, another question I'm trying
to answer is how we can enable this optimization without causing
backward-compatibility issues.

We're discussing detecting that there are no changes for parts of the 
tree between two commits but that isn't the only thing that can trigger 
changes to be made to the index entries and working directory. Changes 
can come from other inputs as well.

One example I am aware of is sparse-checkout.  If you made changes to 
your sparse checkout settings or $GIT_DIR/info/sparse-checkout file, 
that could trigger the need to update index entries and files in the 
working directory.  Since that is a relatively rare occurrence, I can 
see detecting changes to those settings/file and bypassing the 
optimization if there have been changes.  But are there other cases of 
things that could cause unexpected changes in behavior?

One thought I had was to put the optimization behind a config setting so
that people had to opt in to the difference in behavior.  I submitted a
canary patch [1] to test out how receptive people would be to that idea.
Hopefully I can get some feedback on that aspect of the patch.

[1] 
https://public-inbox.org/git/ab8ee481-54fa-a014-69d9-8f621b136766@gmail.com/T/#m2a425a23df5e064a79b0a72537a5dd6ccba3b07b
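
The opt-in gate itself would be the usual config probe. A sketch,
where the key name "checkout.usecachetree" is made up for illustration
and is not the name used in the patch above:

#include "cache.h"
#include "config.h"

/*
 * Sketch of an opt-in switch; defaults to off so that nobody gets the
 * new behavior without asking for it.
 */
static int use_cache_tree_optimization(void)
{
	int enabled = 0;

	git_config_get_bool("checkout.usecachetree", &enabled);
	return enabled;
}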

>> Looks like it's trying to special-case "diff-index --cached". Which
>> kind-of makes sense. In the non-cached case, we're thinking not only
>> about the relationship between the index and the tree, but also whether
>> the on-disk files are up to date.
>>
>> And that would be the same for checkout. We want to know not only
>> whether there are changes to make to the index, but also whether the
>> on-disk files need to be updated from the index.
>>
>> But I assume in your case that we've just refreshed the index quickly
>> using fsmonitor. So I think in the long run what you want is:
>>
>>    1. fsmonitor tells us which index entries are not clean
>>
>>    2. based on the unclean list, we invalidate cache-tree entries for
>>       those paths
>>
>>    3. if we have a valid cache-tree entry, we should be able to skip
>>       digging into that tree; if not, then we walk the index and tree as
>>       normal, adding/deleting index entries and updating (or complaining
>>       about) modified on-disk files
> 
> If you tie this optimization to twoway_merge specifically (by checking
> the "fn" field), then I think we can do it even better. Since
> cache_tree_matches_traversal() is one (hopefully not too costly)
> lookup, we can do it without checking with fsmonitor or whatever and
> only do so when we have found a cache tree.
> 
> Then if we write this new special code just for twoway_merge, we need
> to tighten the checks a bit. I think in this case twoway_merge() will
> be called with "oldtree" as same as "newtree" (and "current" may
> contains dirty stuff from the index). Then
> 
>   - o->df_conflict_entry should be NULL (because we handle it slightly
> differently in twoway_merge)
>   - "current" should not have CE_CONFLICTED
> 
> then I believe we will fall into case /* 20 or 21 */ where
> merged_entry() is supposed to be called on all entries and it would
> change nothing in the index since newtree is the same as oldtree, and
> we could just jump over the whole tree in traverse_trees().
> 

I'm fine with tying specific optimizations to twoway_merge as that is a 
very common (if not the most common) merge.

I'm still very new to this part of the code so am trying to figure out 
what you're suggesting.  I've read your description a few times and what 
I'm getting out of it is that, with some additional checks (i.e. verify
it's a twoway_merge, no df_conflict_entry, not CE_CONFLICTED), we
should be able to skip the whole tree similar to how Peff demonstrated 
below without having to invalidate the cache tree to reflect modified 
on-disk files.  Is that correct or am I missing something?

>> I think the "n" adds an extra layer of complexity. n==2 means we're
>> doing a "2-way" merge. Moving from tree X to tree Y, and dealing with
>> the index as we go. Naively I _think_ we'd be OK to just extend the rule
>> to "if both subtrees match each other _and_ match the valid cache-tree,
>> then we can skip".
>>
>> Again, I'm a little out of my area of expertise here, but cargo-culting
>> like this:
>>
>> diff --git a/sha1-file.c b/sha1-file.c
>> index de4839e634..c105af70ce 100644
>> --- a/sha1-file.c
>> +++ b/sha1-file.c
>> @@ -1375,6 +1375,7 @@ static void *read_object(const unsigned char *sha1, enum object_type *type,
>>
>>          if (oid_object_info_extended(the_repository, &oid, &oi, 0) < 0)
>>                  return NULL;
>> +       trace_printf("reading %s %s", type_name(*type), sha1_to_hex(sha1));
>>          return content;
>>   }
>>
>> diff --git a/unpack-trees.c b/unpack-trees.c
>> index 66741130ae..cfdad4133d 100644
>> --- a/unpack-trees.c
>> +++ b/unpack-trees.c
>> @@ -1075,6 +1075,23 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
>>                                  o->cache_bottom += matches;
>>                                  return mask;
>>                          }
>> +               } else if (n == 2 && S_ISDIR(names->mode) &&
>> +                          names[0].mode == names[1].mode &&
>> +                          !strcmp(names[0].path, names[1].path) &&
>> +                          !oidcmp(names[0].oid, names[1].oid)
>> +                          /* && somehow account for modified on-disk files */) {
>> +                       int matches;
>> +
>> +                       /*
>> +                        * we know that the two trees have the same oid, so we
>> +                        * only need to look at one of them
>> +                        */
>> +                       matches = cache_tree_matches_traversal(o->src_index->cache_tree,
>> +                                                              names, info);
>> +                       if (matches) {
>> +                               o->cache_bottom += matches;
>> +                               return mask;
>> +                       }
>>                  }
>>
>>                  if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
>>
>> seems to avoid the tree reads when running "GIT_TRACE=1 git checkout".
>> It also totally empties the index. ;) So clearly we have to do a bit
>> more there. Probably rather than just bumping o->cache_bottom forward,
>> we'd need to actually move those entries into the new index. Or maybe
>> it's something else entirely (I did say cargo-culting, right?).
> 
> Ah this cache_bottom magic. I think this is Junio's alley ;-)
> 
>> -Peff

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-25 20:56             ` Ben Peart
@ 2018-07-26  5:30               ` Duy Nguyen
  2018-07-26 16:30                 ` Duy Nguyen
  2018-07-26 16:35               ` Duy Nguyen
  1 sibling, 1 reply; 121+ messages in thread
From: Duy Nguyen @ 2018-07-26  5:30 UTC (permalink / raw)
  To: Ben Peart; +Cc: Jeff King, Ben Peart, Git Mailing List, Junio C Hamano

On Wed, Jul 25, 2018 at 10:56 PM Ben Peart <peartben@gmail.com> wrote:
> I'm still very new to this part of the code so am trying to figure out
> what you're suggesting.  I've read your description a few times and what
> I'm getting out of it is that, with some additional checks (i.e. verify
> it's a twoway_merge, no df_conflict_entry, not CE_CONFLICTED), we
> should be able to skip the whole tree similar to how Peff demonstrated
> below without having to invalidate the cache tree to reflect modified
> on-disk files.  Is that correct or am I missing something?

And I didn't give you an easy time because I was not very clear in my
suggestion, I think. So let's start again, beginning with a potentially
more generic optimization using cache-tree that I noticed just now.

You now know traverse_trees() is used to walk N trees and the index at
the same time. Cache tree is also used to quickly check if a big chunk
of the index matches some tree object. So what if we try to avoid
reading tree objects where possible (which reduces I/O, object
inflation and tree parsing costs)? Say we're walking two trees X and Y,
and we notice through cache-tree that X is the same as in the index. Then
instead of walking the actual X, you could just get the same entry
from the index and make it "X". This way you only need to walk Y and
the index (until the shared tree ends of course). If Y happens to
match cache-tree too, all the better!
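
In code, "get the same entry from the index and make it X" could look
like the sketch below (the helpers are ones unpack-trees.c already
uses; fake_tree_entry() is a made-up name):

#include "cache.h"

/*
 * Sketch: synthesize tree X's entry for one path from the matching
 * index entry, so the n-way merge callback can run without ever
 * parsing X's tree object.
 */
static struct cache_entry *fake_tree_entry(const struct cache_entry *ce)
{
	int len = ce_namelen(ce);
	struct cache_entry *tree_ce = xcalloc(1, cache_entry_size(len));

	tree_ce->ce_mode = ce->ce_mode;
	tree_ce->ce_flags = create_ce_flags(0);
	tree_ce->ce_namelen = len;
	oidcpy(&tree_ce->oid, &ce->oid);
	memcpy(tree_ce->name, ce->name, len + 1);
	return tree_ce;	/* caller passes this in X's slot, then frees it */
}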

Let's get back to two-way merge. I suggest you read the two-way merge
section in git-read-tree.txt. Its table should give you a pretty good
idea of what's going on. twoway_merge() will be given a tuple of three
entries (I, H, M) of the same path name, for every path. I think what
we need to determine are the conditions under which the outcome is
known in advance, so that we can just skip walking the index for one
directory. One of the
checks we could do quickly is I==M or I==H (using cache-tree) and H==M
(using tree hash).

The first obvious cases that we can optimize are

clean (H==M)
       ------
     14 yes                 exists   exists   keep index
     15 no                  exists   exists   keep index

In other words, if we know H==M, there's not much we need to do since
we're keeping the index the same. But you don't really know how many
entries are in this directory where H==M. You would need cache-tree
for that, so in reality it's I==H==M.

The "clean" column is what fsmonitor comes in, though I'm not sure if
it's actually needed. I haven't checked how '-u' flag works.

There are two other cases that we can also optimize, though I think
they're less likely to occur:

        clean I==H  I==M (H!=M)
       ------------------
     18 yes   no    yes     exists   exists   keep index
     19 no    no    yes     exists   exists   keep index

Some other cases where I==H can also benefit from the generic tree-walk
optimization above, since we can skip parsing H.
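
Both checks are cheap; as a sketch (names[] follows the
traverse_trees() convention, dir_is_i_h_m() is a made-up name):

#include "cache.h"
#include "cache-tree.h"
#include "tree-walk.h"
#include "unpack-trees.h"

/*
 * Sketch: for a directory seen in both trees, detect I==H==M. A hit
 * returns the number of index entries the subdirectory covers.
 */
static int dir_is_i_h_m(struct unpack_trees_options *o,
			struct name_entry *names,	/* [0]=H, [1]=M */
			struct traverse_info *info)
{
	if (oidcmp(names[0].oid, names[1].oid))		/* H != M? */
		return 0;
	return cache_tree_matches_traversal(o->src_index->cache_tree,
					    names, info); /* I == H */
}
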
-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-26  5:30               ` Duy Nguyen
@ 2018-07-26 16:30                 ` Duy Nguyen
  2018-07-26 19:40                   ` Junio C Hamano
  0 siblings, 1 reply; 121+ messages in thread
From: Duy Nguyen @ 2018-07-26 16:30 UTC (permalink / raw)
  To: Ben Peart; +Cc: Jeff King, Ben Peart, Git Mailing List, Junio C Hamano

On Thu, Jul 26, 2018 at 07:30:20AM +0200, Duy Nguyen wrote:
> Let's get back to two-way merge. I suggest you read the two-way merge
> section in git-read-tree.txt. Its table should give you a pretty good
> idea of what's going on. twoway_merge() will be given a tuple of three
> entries (I, H, M) of the same path name, for every path. I think what
> we need to determine are the conditions under which the outcome is
> known in advance, so that we can just skip walking the index for one
> directory. One of the
> checks we could do quickly is I==M or I==H (using cache-tree) and H==M
> (using tree hash).
> 
> The first obvious cases that we can optimize are
> 
> clean (H==M)
>        ------
>      14 yes                 exists   exists   keep index
>      15 no                  exists   exists   keep index
> 
> In other words, if we know H==M, there's not much we need to do since
> we're keeping the index the same. But you don't really know how many
> entries are in this directory where H==M. You would need cache-tree
> for that, so in reality it's I==H==M.
> 
> The "clean" column is what fsmonitor comes in, though I'm not sure if
> it's actually needed. I haven't checked how '-u' flag works.
> 
> There are two other cases that we can also optimize, though I think
> they're less likely to occur:
> 
>         clean I==H  I==M (H!=M)
>        ------------------
>      18 yes   no    yes     exists   exists   keep index
>      19 no    no    yes     exists   exists   keep index
> 
> Some other cases where I==H can also benefit from the generic tree-walk
> optimization above, since we can skip parsing H.

I'm excited, so I decided to try it out anyway. This is what I've come
up with. Switching trees on git.git shows it could skip plenty of
entries, so it looks promising. It's ugly and it fails at t6020,
though; there's still work ahead. But I think I'll stop here.

A few notes after getting my hands dirty:

- one big difference between diff --cached and checkout is that diff is
  a read-only operation while checkout actually creates a new index.
  One side effect is that the cache-tree may be destroyed while we're
  walking the trees; I'm not so sure.

- I don't think we even need a special twoway_merge_same()
  here. That function could just call twoway_merge() with the right
  "src" parameter and the outcome should still be the same. Which
  means it'll work for three-way merge too.

- I'm still scared to death of that cache_bottom switching. I have no
  idea how it works or if I broke anything by changing the condition there.

-- 8< --
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 28627650cd..276712af64 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -515,6 +515,7 @@ static int merge_working_tree(const struct checkout_opts *opts,
 		topts.gently = opts->merge && old_branch_info->commit;
 		topts.verbose_update = opts->show_progress;
 		topts.fn = twoway_merge;
+		topts.fn_same = twoway_merge_same;
 		if (opts->overwrite_ignore) {
 			topts.dir = xcalloc(1, sizeof(*topts.dir));
 			topts.dir->flags |= DIR_SHOW_IGNORED;
diff --git a/diff-lib.c b/diff-lib.c
index a9f38eb5a3..48e6c4ab0d 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -485,6 +485,15 @@ static int oneway_diff(const struct cache_entry * const *src,
 	return 0;
 }
 
+static int oneway_diff_cached(int pos, int nr, struct unpack_trees_options *options)
+{
+	/*
+	 * Nothing to do. Unpack-trees can safely skip the whole
+	 * nr_matches cache entries.
+	 */
+	return 0;
+}
+
 static int diff_cache(struct rev_info *revs,
 		      const struct object_id *tree_oid,
 		      const char *tree_name,
@@ -501,8 +510,8 @@ static int diff_cache(struct rev_info *revs,
 	memset(&opts, 0, sizeof(opts));
 	opts.head_idx = 1;
 	opts.index_only = cached;
-	opts.diff_index_cached = (cached &&
-				  !revs->diffopt.flags.find_copies_harder);
+	if (cached && !revs->diffopt.flags.find_copies_harder)
+		opts.fn_same = oneway_diff_cached;
 	opts.merge = 1;
 	opts.fn = oneway_diff;
 	opts.unpack_data = revs;
diff --git a/unpack-trees.c b/unpack-trees.c
index 66741130ae..01e3f38807 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -615,7 +615,7 @@ static void restore_cache_bottom(struct traverse_info *info, int bottom)
 {
 	struct unpack_trees_options *o = info->data;
 
-	if (o->diff_index_cached)
+	if (o->fn_same)
 		return;
 	o->cache_bottom = bottom;
 }
@@ -625,7 +625,7 @@ static int switch_cache_bottom(struct traverse_info *info)
 	struct unpack_trees_options *o = info->data;
 	int ret, pos;
 
-	if (o->diff_index_cached)
+	if (o->fn_same)
 		return 0;
 	ret = o->cache_bottom;
 	pos = find_cache_pos(info->prev, &info->name);
@@ -996,6 +996,43 @@ static void debug_unpack_callback(int n,
 		debug_name_entry(i, names + i);
 }
 
+static int skip_dir(int n, unsigned long mask, unsigned long dirmask, struct name_entry *names, struct traverse_info *info)
+{
+	struct unpack_trees_options *o = info->data;
+	int i, matches;
+	int len;
+	char *name;
+	int pos;
+
+	if (dirmask != ((1 << n) - 1) || !S_ISDIR(names->mode))
+		return 0;
+
+	for (i = 1; i < n; i++)
+		if (oidcmp(names[0].oid, names[i].oid))
+			return 0;
+
+	matches = cache_tree_matches_traversal(o->src_index->cache_tree, names, info);
+	if (!matches)
+		return 0;
+
+	/*
+	 * Everything under the name matches; skip the entire
+	 * hierarchy. fn_same must special-case D/F conflicts in such
+	 * a way that it does not do any look-ahead, so this is safe.
+	 */
+	len = traverse_path_len(info, names);
+	name = xmalloc(len + 1);
+
+	make_traverse_path(name, info, names);
+	pos = index_name_pos(o->src_index, name, len);
+	if (pos >= 0)
+		die("NOOO");
+	trace_printf("dirmask = %lx, path = %s\n", dirmask, name);
+	if (o->fn_same(-pos-1, matches, o))
+		matches = 0;
+	free(name);
+	return matches;
+}
 static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, struct name_entry *names, struct traverse_info *info)
 {
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
@@ -1015,7 +1052,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			int cmp;
 			struct cache_entry *ce;
 
-			if (o->diff_index_cached)
+			if (o->fn_same)
 				ce = next_cache_entry(o);
 			else
 				ce = find_cache_entry(info, p);
@@ -1059,18 +1096,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 
 	/* Now handle any directories.. */
 	if (dirmask) {
-		/* special case: "diff-index --cached" looking at a tree */
-		if (o->diff_index_cached &&
-		    n == 1 && dirmask == 1 && S_ISDIR(names->mode)) {
-			int matches;
-			matches = cache_tree_matches_traversal(o->src_index->cache_tree,
-							       names, info);
-			/*
-			 * Everything under the name matches; skip the
-			 * entire hierarchy.  diff_index_cached codepath
-			 * special cases D/F conflicts in such a way that
-			 * it does not do any look-ahead, so this is safe.
-			 */
+		if (o->fn_same) {
+			int matches = skip_dir(n, mask, dirmask, names, info);
 			if (matches) {
 				o->cache_bottom += matches;
 				return mask;
@@ -1881,6 +1908,7 @@ static int deleted_entry(const struct cache_entry *ce,
 static int keep_entry(const struct cache_entry *ce,
 		      struct unpack_trees_options *o)
 {
+	trace_printf("keep_entry(%s)\n", ce->name);
 	add_entry(o, ce, 0, 0);
 	return 1;
 }
@@ -2132,6 +2160,25 @@ int twoway_merge(const struct cache_entry * const *src,
 	return deleted_entry(oldtree, current, o);
 }
 
+int twoway_merge_same(int pos, int nr, struct unpack_trees_options *o)
+{
+	int i;
+
+	/*
+	 * Since cache-tree at "src" exists, it means there's no
+	 * staged entries here (they would have invalidated cache-tree
+	 * otherwise). So no CE_CONFLICTED.
+	 *
+	 * And because I==H==M, we can't run into d/f conflicts
+	 * either: for every path name, we will always find a _file_
+	 * in the index as well as the two other trees.
+	 */
+	trace_printf("Skipping %d entries\n", nr);
+	for (i = 0; i < nr; i++)
+		keep_entry(o->src_index->cache[pos + i], o);
+	return 0;
+}
+
 /*
  * Bind merge.
  *
diff --git a/unpack-trees.h b/unpack-trees.h
index c2b434c606..45c69e2ed0 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -12,6 +12,9 @@ struct exclude_list;
 typedef int (*merge_fn_t)(const struct cache_entry * const *src,
 		struct unpack_trees_options *options);
 
+typedef int (*merge_same_fn_t)(int pos, int nr,
+			       struct unpack_trees_options *options);
+
 enum unpack_trees_error_types {
 	ERROR_WOULD_OVERWRITE = 0,
 	ERROR_NOT_UPTODATE_FILE,
@@ -49,7 +52,6 @@ struct unpack_trees_options {
 		     aggressive,
 		     skip_unmerged,
 		     initial_checkout,
-		     diff_index_cached,
 		     debug_unpack,
 		     skip_sparse_checkout,
 		     gently,
@@ -61,6 +63,7 @@ struct unpack_trees_options {
 	struct dir_struct *dir;
 	struct pathspec *pathspec;
 	merge_fn_t fn;
+	merge_same_fn_t fn_same;
 	const char *msgs[NB_UNPACK_TREES_ERROR_TYPES];
 	struct argv_array msgs_to_free;
 	/*
@@ -92,6 +95,8 @@ int threeway_merge(const struct cache_entry * const *stages,
 		   struct unpack_trees_options *o);
 int twoway_merge(const struct cache_entry * const *src,
 		 struct unpack_trees_options *o);
+int twoway_merge_same(int pos, int nr,
+		      struct unpack_trees_options *o);
 int bind_merge(const struct cache_entry * const *src,
 	       struct unpack_trees_options *o);
 int oneway_merge(const struct cache_entry * const *src,
-- 8< --

^ permalink raw reply related	[flat|nested] 121+ messages in thread

* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-25 20:56             ` Ben Peart
  2018-07-26  5:30               ` Duy Nguyen
@ 2018-07-26 16:35               ` Duy Nguyen
  1 sibling, 0 replies; 121+ messages in thread
From: Duy Nguyen @ 2018-07-26 16:35 UTC (permalink / raw)
  To: Ben Peart; +Cc: Jeff King, Ben Peart, Git Mailing List, Junio C Hamano

On Wed, Jul 25, 2018 at 10:56 PM Ben Peart <peartben@gmail.com> wrote:
>
>
>
> On 7/24/2018 11:33 AM, Duy Nguyen wrote:
> > On Tue, Jul 24, 2018 at 6:20 AM Jeff King <peff@peff.net> wrote:
> >> At least that's my view of it. unpack_trees() has always been a
> >> terrifying beast that I've avoided looking too closely at.
> >
> > /me nods on the terrifying part.
> >
> >>> After a quick look at the code, the only place I can find that tries to use
> >>> cache_tree_matches_traversal() is in unpack_callback() and that only happens
> >>> if n == 1 and in the "git checkout" case, n == 2. Am I missing something?
> >
> > So we do not actually use cache-tree? Big optimization opportunity (if
> > we can make it!).
> >
>
> I agree!  Assuming we can figure out the technical issues around using
> the cache tree to optimize two-way merges, another question I'm trying
> to answer is how we can enable this optimization without causing
> backward-compatibility issues.

If it works as I expect, then there are no compat issues at all (exactly
like the diff_index_cached optimization we already have). We simply
find a safe shortcut that does not add any side effects.
-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-26 16:30                 ` Duy Nguyen
@ 2018-07-26 19:40                   ` Junio C Hamano
  2018-07-27 15:42                     ` Duy Nguyen
  2018-07-27 15:50                     ` [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Ben Peart
  0 siblings, 2 replies; 121+ messages in thread
From: Junio C Hamano @ 2018-07-26 19:40 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Ben Peart, Jeff King, Ben Peart, Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> I'm excited, so I decided to try it out anyway. This is what I've come
> up with. Switching trees on git.git shows it could skip plenty of
> entries, so it looks promising. It's ugly and it fails at t6020,
> though; there's still work ahead. But I think I'll stop here.

We are extremely shallow compared to projects like the kernel and
stuff from java land, so that is quite an interesting find.


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-26 19:40                   ` Junio C Hamano
@ 2018-07-27 15:42                     ` Duy Nguyen
  2018-07-27 16:22                       ` Ben Peart
                                         ` (2 more replies)
  2018-07-27 15:50                     ` [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Ben Peart
  1 sibling, 3 replies; 121+ messages in thread
From: Duy Nguyen @ 2018-07-27 15:42 UTC (permalink / raw)
  To: Junio C Hamano, Ben Peart; +Cc: Jeff King, Ben Peart, Git Mailing List

On Thu, Jul 26, 2018 at 12:40:05PM -0700, Junio C Hamano wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
> 
> > I'm excited, so I decided to try it out anyway. This is what I've come
> > up with. Switching trees on git.git shows it could skip plenty of
> > entries, so it looks promising. It's ugly and it fails at t6020,
> > though; there's still work ahead. But I think I'll stop here.
> 
> We are extremely shallow compared to projects like the kernel and
> stuff from java land, so that is quite an interesting find.
> 

Yeah. I've got a more or less complete patch now with the full test
suite passing, and even with linux.git the numbers look pretty good.

Ben, is it possible for you to try this one out? I don't suppose it
will be that good on a really big repo. But I'm curious how much of a
speedup this patch can deliver.

I'm quite happy that I didn't have to write code specific to twoway
merge, which means this patch would also help real merges (3way)
too. Interestingly, this also helps reduce traverse_trees() time when
the diff_index_cached optimization is on. I have no idea how, but
well... can't complain.

-- 8< --
Subject: [PATCH] unpack-trees: optimize walking same trees with cache-tree

In order to merge one or many trees with the index, the unpack-trees
code walks multiple trees in parallel with the index and performs an
n-way merge. If we find out at the start of a directory that all trees
are the same (by comparing OIDs) and cache-tree happens to be available
for that directory as well, we can avoid walking the trees.

One nice attribute of cache-tree (and the index) is that the tree
structure is simply flattened out (into what we call "the index") and
we know how many files each directory has. With this information, we
can avoid accessing the object database to walk tree objects and just
take the entries from the index instead.

The upside is of course a lot less I/O, since we can potentially skip
lots of trees (think subtrees). We also save CPU because we don't have
to inflate objects and then apply deltas. The downside is of course
more fragile code, since the logic in some functions is now duplicated
elsewhere.

With this patch, switching between two trees on linux.git where only
one file has changed (the top-level Makefile) is sped up quite nicely.
Total checkout time goes down from 0.543s to 0.352s (35% less).
traverse_trees() for one two-way merge (the big one in unpack_trees())
goes from 0.157s to 0.036s (77% less).

Note that compared to the diff_index_cached optimization (which is very
similar to this) we do more work here. That is because diff_index_cached
only cares about side effects and does not modify the index, so it can
quickly jump over a big chunk of cache entries. For an n-way merge, we
need to add entries and verify things, which costs more CPU cycles.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 unpack-trees.c | 125 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 125 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index 66741130ae..9c791b55b2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -642,6 +642,110 @@ static inline int are_same_oid(struct name_entry *name_j, struct name_entry *nam
 	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
 }
 
+static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,
+					struct name_entry *names,
+					struct traverse_info *info)
+{
+	struct unpack_trees_options *o = info->data;
+	int i;
+
+	if (dirmask != ((1 << n) - 1) || !S_ISDIR(names->mode) || !o->merge)
+		return 0;
+
+	for (i = 1; i < n; i++)
+		if (!are_same_oid(names, names + i))
+			return 0;
+
+	return cache_tree_matches_traversal(o->src_index->cache_tree, names, info);
+}
+
+/*
+ * Fast path if we detect that all trees are the same as cache-tree at this
+ * path. We'll walk these trees recursively using cache-tree/index instead of
+ * the ODB since we already know what these trees contain.
+ */
+static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
+				  struct name_entry *names,
+				  struct traverse_info *info)
+{
+	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
+	struct unpack_trees_options *o = info->data;
+	int i, d;
+
+	/*
+	 * Do what unpack_callback() and unpack_nondirectories() normally
+	 * do. But we do it in one function call (even for nested trees)
+	 * instead.
+	 *
+	 * D/F conflicts and staged entries are not a concern because cache-tree
+	 * would be invalidated and we would never get here in the first place.
+	 */
+	for (i = 0; i < nr_entries; i++) {
+		struct cache_entry *tree_ce;
+		int len, rc;
+
+		src[0] = o->src_index->cache[pos + i];
+
+		/* Do what unpack_nondirectories() normally does */
+		len = ce_namelen(src[0]);
+		tree_ce = xcalloc(1, cache_entry_size(len));
+
+		tree_ce->ce_mode = src[0]->ce_mode;
+		tree_ce->ce_flags = create_ce_flags(0);
+		tree_ce->ce_namelen = len;
+		oidcpy(&tree_ce->oid, &src[0]->oid);
+		memcpy(tree_ce->name, src[0]->name, len + 1);
+
+		for (d = 1; d <= nr_names; d++)
+			src[d] = tree_ce;
+
+		rc = call_unpack_fn((const struct cache_entry * const *)src, o);
+		free(tree_ce);
+		if (rc < 0)
+			return rc;
+
+		mark_ce_used(src[0], o);
+	}
+	trace_printf("Quick traverse over %d entries from %s to %s\n",
+		     nr_entries,
+		     o->src_index->cache[pos]->name,
+		     o->src_index->cache[pos + nr_entries - 1]->name);
+	return 0;
+}
+
+static int index_pos_by_traverse_info(struct name_entry *names,
+				      struct traverse_info *info)
+{
+	struct unpack_trees_options *o = info->data;
+	int len = traverse_path_len(info, names);
+	char *name = xmalloc(len + 1);
+	int pos;
+
+	make_traverse_path(name, info, names);
+	pos = index_name_pos(o->src_index, name, len);
+	if (pos >= 0)
+		BUG("This is so wrong. This is a directory and should not exist in index");
+	pos = -pos - 1;
+	/*
+	 * There's no guarantee that pos points to the first entry of the
+	 * directory. If the directory name is "letters" and there's another
+	 * file named "letters.txt" in the index, pos will point to that file
+	 * instead.
+	 */
+	while (pos < o->src_index->cache_nr) {
+		const struct cache_entry *ce = o->src_index->cache[pos];
+		if (ce_namelen(ce) > len &&
+		    ce->name[len] == '/' &&
+		    !memcmp(ce->name, name, len))
+			break;
+		pos++;
+	}
+	if (pos == o->src_index->cache_nr)
+		BUG("This is still wrong");
+	free(name);
+	return pos;
+}
+
 static int traverse_trees_recursive(int n, unsigned long dirmask,
 				    unsigned long df_conflicts,
 				    struct name_entry *names,
@@ -653,6 +757,17 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
 	void *buf[MAX_UNPACK_TREES];
 	struct traverse_info newinfo;
 	struct name_entry *p;
+	int nr_entries;
+
+	nr_entries = all_trees_same_as_cache_tree(n, dirmask, names, info);
+	if (nr_entries > 0) {
+		struct unpack_trees_options *o = info->data;
+		int pos = index_pos_by_traverse_info(names, info);
+
+		if (!o->merge || df_conflicts)
+			BUG("Wrong condition to get here buddy");
+		return traverse_by_cache_tree(pos, nr_entries, n, names, info);
+	}
 
 	p = names;
 	while (!p->mode)
@@ -812,6 +927,11 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info, con
 	return ce;
 }
 
+/*
+ * Note that traverse_by_cache_tree() duplicates some logic in this function
+ * without actually calling it. If you change the logic here you may need to
+ * check and change there as well.
+ */
 static int unpack_nondirectories(int n, unsigned long mask,
 				 unsigned long dirmask,
 				 struct cache_entry **src,
@@ -996,6 +1116,11 @@ static void debug_unpack_callback(int n,
 		debug_name_entry(i, names + i);
 }
 
+/*
+ * Note that traverse_by_cache_tree() duplicates some logic in this function
+ * without actually calling it. If you change the logic here you may need to
+ * check and change there as well.
+ */
 static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, struct name_entry *names, struct traverse_info *info)
 {
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
-- 
2.18.0.656.gda699b98b3

-- 8< --
--
Duy

^ permalink raw reply related	[flat|nested] 121+ messages in thread

* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-26 19:40                   ` Junio C Hamano
  2018-07-27 15:42                     ` Duy Nguyen
@ 2018-07-27 15:50                     ` Ben Peart
  1 sibling, 0 replies; 121+ messages in thread
From: Ben Peart @ 2018-07-27 15:50 UTC (permalink / raw)
  To: Junio C Hamano, Duy Nguyen; +Cc: Jeff King, Ben Peart, Git Mailing List



On 7/26/2018 3:40 PM, Junio C Hamano wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
> 
>> I'm excited so I decided to try out anyway. This is what I've come up
>> with. Switching trees on git.git shows it could skip plenty entries,
>> so promising. It's ugly and it fails at t6020 though, there's still
>> work ahead. But I think it'll stop here.
> 
> We are extremely shallow compared to projects like the kernel and
> stuff from java land, so that is quite an interesting find.
> 

I had a few minutes so I applied this patch to the latest Git for
Windows and ran the p0006-read-tree-checkout.sh perf test on the git
repo as well as a synthetic large repo.  The results look quite
promising - up to 28.8% savings!
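
(For anyone wanting to reproduce: the t/perf harness can compare two
builds directly; a sketch, with placeholder revisions, and with
GIT_PERF_REPO pointing the copied test repo at whatever you want to
measure:

	cd t/perf
	GIT_PERF_REPO=/path/to/repo \
	./run <this-tree> <gfw> -- p0006-read-tree-checkout.sh

exact knobs may vary per setup.)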

I'm out of time this week but am _very_ interested in seeing if this can 
be completed successfully.

Ben


git repo results
================

Test                                                            this tree          gfw
---------------------------------------------------------------------------------------------------------
0006.2: read-tree br_base br_ballast (1000001)                  1.37(0.04+0.09)    1.34(0.03+0.09) -2.2%
0006.3: switch between br_base br_ballast (1000001)             50.21(0.07+0.09)   50.22(0.03+0.09) +0.0%
0006.4: switch between br_ballast br_ballast_plus_1 (1000001)   3.58(0.03+0.09)    4.61(0.03+0.10) +28.8%
0006.5: switch between aliases (1000001)                        3.67(0.03+0.07)    4.56(0.01+0.07) +24.3%


large synthetic repo results
============================

Test                                                            this tree          gfw
---------------------------------------------------------------------------------------------------------
0006.2: read-tree br_base br_ballast (1000001)                  1.33(0.04+0.04)    1.33(0.04+0.06) +0.0%
0006.3: switch between br_base br_ballast (1000001)             48.96(0.03+0.12)   50.76(0.03+0.07) +3.7%
0006.4: switch between br_ballast br_ballast_plus_1 (1000001)   3.64(0.01+0.09)    4.59(0.06+0.07) +26.1%
0006.5: switch between aliases (1000001)                        3.68(0.03+0.07)    4.66(0.04+0.06) +26.6%

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-27 15:42                     ` Duy Nguyen
@ 2018-07-27 16:22                       ` Ben Peart
  2018-07-27 18:00                         ` Duy Nguyen
  2018-07-27 17:14                       ` Junio C Hamano
  2018-07-29 10:33                       ` [PATCH v2 0/4] Speed up unpack_trees() Nguyễn Thái Ngọc Duy
  2 siblings, 1 reply; 121+ messages in thread
From: Ben Peart @ 2018-07-27 16:22 UTC (permalink / raw)
  To: Duy Nguyen, Junio C Hamano; +Cc: Jeff King, Ben Peart, Git Mailing List



On 7/27/2018 11:42 AM, Duy Nguyen wrote:
> On Thu, Jul 26, 2018 at 12:40:05PM -0700, Junio C Hamano wrote:
>> Duy Nguyen <pclouds@gmail.com> writes:
>>
>>> I'm excited so I decided to try out anyway. This is what I've come up
>>> with. Switching trees on git.git shows it could skip plenty entries,
>>> so promising. It's ugly and it fails at t6020 though, there's still
>>> work ahead. But I think it'll stop here.
>>
>> We are extremely shallow compared to projects like the kernel and
>> stuff from java land, so that is quite an interesting find.
>>
> 
> Yeah. I've got a more or less complete patch now with the full test
> suite passing, and even with linux.git the numbers look pretty good.
> 
> Ben, is it possible for you to try this one out? I don't suppose it
> will be that good on a really big repo. But I'm curious how much
> faster this patch could be.
> 

Thanks Duy.  I'm super excited about this so I did a quick and dirty
manual perf test.

I ran "git checkout" 5 times, discarded the first 2 runs and averaged 
the last 3 with and without this patch on top of VFSForGit in a large repo.

Without this patch the average time was 16.97 seconds.
With this patch the average time was 10.55 seconds.

That is a significant improvement!
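
(Nothing fancy on the measurement side; think a loop along these
lines, sketch only, GNU time assumed:

	for i in 1 2 3 4 5; do
		/usr/bin/time -f '%e' git checkout 2>&1 | tail -1
	done

then drop the first two runs and average the rest.)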

I really have to run but I'll be back next week to dig in more.

Ben

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-27 15:42                     ` Duy Nguyen
  2018-07-27 16:22                       ` Ben Peart
@ 2018-07-27 17:14                       ` Junio C Hamano
  2018-07-27 17:52                         ` Duy Nguyen
  2018-07-29 10:33                       ` [PATCH v2 0/4] Speed up unpack_trees() Nguyễn Thái Ngọc Duy
  2 siblings, 1 reply; 121+ messages in thread
From: Junio C Hamano @ 2018-07-27 17:14 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Ben Peart, Jeff King, Ben Peart, Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> diff --git a/unpack-trees.c b/unpack-trees.c
> index 66741130ae..9c791b55b2 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -642,6 +642,110 @@ static inline int are_same_oid(struct name_entry *name_j, struct name_entry *nam
>  	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
>  }
>  
> +static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,
> +					struct name_entry *names,
> +					struct traverse_info *info)
> +{
> +	struct unpack_trees_options *o = info->data;
> +	int i;
> +
> +	if (dirmask != ((1 << n) - 1) || !S_ISDIR(names->mode) || !o->merge)
> +		return 0;

In other words, punt if (1) not all are directories, (2) the first
name entry given by the caller in names[] is not ISDIR(), or (3) we
are not merging i.e. not "Are we supposed to look at the index too?"
in unpack_callback().
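
Annotated, that condition reads:

	if (dirmask != ((1 << n) - 1) ||	/* (1) */
	    !S_ISDIR(names->mode) ||		/* (2) */
	    !o->merge)				/* (3) */
		return 0;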

I am not sure if the second one is doing us any good.  When
S_ISDIR(names->mode) is not true, then the bit in dirmask that
corresponds to the one in the entry[] traverse_trees() filled and
passed to us must be zero, so the dirmask check would reject such a
case anyway, no?

I would have moved !o->merge to the front, not for performance
reasons but to make it clear that this function helps an
optimization that matters only when we are walking tree(s) together
with the index.
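
That is, something like:

	if (!o->merge || dirmask != ((1 << n) - 1))
		return 0;

with the ISDIR check dropped, if the reasoning above holds.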

> +	for (i = 1; i < n; i++)
> +		if (!are_same_oid(names, names + i))
> +			return 0;
> +
> +	return cache_tree_matches_traversal(o->src_index->cache_tree, names, info);
> +}
> +
> +/*
> + * Fast path if we detect that all trees are the same as cache-tree at this
> + * path. We'll walk these trees recursively using cache-tree/index instead of
> + * ODB since we already know what these trees contain.
> + */
> +static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
> +				  struct name_entry *names,
> +				  struct traverse_info *info)
> +{
> +	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
> +	struct unpack_trees_options *o = info->data;
> +	int i, d;
> +
> +	/*
> +	 * Do what unpack_callback() and unpack_nondirectories() normally
> +	 * do. But we do it in one function call (even for nested trees)
> +	 * instead.
> +	 *
> +	 * D/F conflicts and staged entries are not a concern because cache-tree
> +	 * would be invalidated and we would never get here in the first place.
> +	 */

We want to at least have

	if (!o->merge || ARRAY_SIZE(src) <= nr_names)
		BUG("");

here, I'd think.

> +	for (i = 0; i < nr_entries; i++) {
> +		struct cache_entry *tree_ce;
> +		int len, rc;
> +
> +		src[0] = o->src_index->cache[pos + i];
> +
> +		/* Do what unpack_nondirectories() normally does */
> +		len = ce_namelen(src[0]);
> +		tree_ce = xcalloc(1, cache_entry_size(len));

unpack_nondirectories() uses create_ce_entry() here.  Any reason why
we shouldn't use it and tell it to make a transient one?

> +		tree_ce->ce_mode = src[0]->ce_mode;
> +		tree_ce->ce_flags = create_ce_flags(0);
> +		tree_ce->ce_namelen = len;
> +		oidcpy(&tree_ce->oid, &src[0]->oid);
> +		memcpy(tree_ce->name, src[0]->name, len + 1);
> +
> +		for (d = 1; d <= nr_names; d++)
> +			src[d] = tree_ce;
> +
> +		rc = call_unpack_fn((const struct cache_entry * const *)src, o);
> +		free(tree_ce);
> +		if (rc < 0)
> +			return rc;
> +
> +		mark_ce_used(src[0], o);
> +	}
> +	trace_printf("Quick traverse over %d entries from %s to %s\n",
> +		     nr_entries,
> +		     o->src_index->cache[pos]->name,
> +		     o->src_index->cache[pos + nr_entries - 1]->name);
> +	return 0;
> +}

When I invented the cache-tree originally, primarily to speed up
writing of deeply nested trees, I had the "diff-index --cached"
optimization where a subtree with contents known to be the same as
the corresponding span in the index is entirely skipped without
getting even looked at.  I didn't realize this (now obvious)
optimization that scanning the index is faster than opening and
traversing trees (I was more focused on not even scanning, which
is what "diff-index --cached" optimization was about).

Nice.


> +static int index_pos_by_traverse_info(struct name_entry *names,
> +				      struct traverse_info *info)
> +{
> +	struct unpack_trees_options *o = info->data;
> +	int len = traverse_path_len(info, names);
> +	char *name = xmalloc(len + 1);
> +	int pos;
> +
> +	make_traverse_path(name, info, names);
> +	pos = index_name_pos(o->src_index, name, len);
> +	if (pos >= 0)
> +		BUG("This is so wrong. This is a directory and should not exist in index");
> +	pos = -pos - 1;
> +	/*
> +	 * There's no guarantee that pos points to the first entry of the
> +	 * directory. If the directory name is "letters" and there's another
> +	 * file named "letters.txt" in the index, pos will point to that file
> +	 * instead.
> +	 */

Is this trying to address the issue o->cache_bottom,
next_cache_entry(), etc. are trying to address?  i.e. an entry
"letters" appears at a different place relative to other entries in
a tree, depending on the type of the entry itself, so linear and
parallel scan of the index and the trees may miss matching entries
without backtracking?  If so, I am not sure if the loop below is
sufficient.

> +	while (pos < o->src_index->cache_nr) {
> +		const struct cache_entry *ce = o->src_index->cache[pos];
> +		if (ce_namelen(ce) > len &&
> +		    ce->name[len] == '/' &&
> +		    !memcmp(ce->name, name, len))
> +			break;
> +		pos++;
> +	}
> +	if (pos == o->src_index->cache_nr)
> +		BUG("This is still wrong");
> +	free(name);
> +	return pos;
> +}
> +

In anycase, nice progress.

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-27 17:14                       ` Junio C Hamano
@ 2018-07-27 17:52                         ` Duy Nguyen
  2018-07-29  6:24                           ` Duy Nguyen
  0 siblings, 1 reply; 121+ messages in thread
From: Duy Nguyen @ 2018-07-27 17:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ben Peart, Jeff King, Ben Peart, Git Mailing List

On Fri, Jul 27, 2018 at 7:14 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Duy Nguyen <pclouds@gmail.com> writes:
>
> > diff --git a/unpack-trees.c b/unpack-trees.c
> > index 66741130ae..9c791b55b2 100644
> > --- a/unpack-trees.c
> > +++ b/unpack-trees.c
> > @@ -642,6 +642,110 @@ static inline int are_same_oid(struct name_entry *name_j, struct name_entry *nam
> >       return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
> >  }
> >
> > +static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,
> > +                                     struct name_entry *names,
> > +                                     struct traverse_info *info)
> > +{
> > +     struct unpack_trees_options *o = info->data;
> > +     int i;
> > +
> > +     if (dirmask != ((1 << n) - 1) || !S_ISDIR(names->mode) || !o->merge)
> > +             return 0;
>
> In other words, punt if (1) not all are directories, (2) the first
> name entry given by the caller in names[] is not ISDIR(), or (3) we
> are not merging i.e. not "Are we supposed to look at the index too?"
> in unpack_callback().
>
> I am not sure if the second one is doing us any good.  When
> S_ISDIR(names->mode) is not true, then the bit in dirmask that
> corresponds to the one in the entry[] traverse_trees() filled and
> passed to us must be zero, so the dirmask check would reject such a
> case anyway, no?

You're right. This code kinda evolved from the diff_index_cached code
and I forgot about this.

> > +     for (i = 0; i < nr_entries; i++) {
> > +             struct cache_entry *tree_ce;
> > +             int len, rc;
> > +
> > +             src[0] = o->src_index->cache[pos + i];
> > +
> > +             /* Do what unpack_nondirectories() normally does */
> > +             len = ce_namelen(src[0]);
> > +             tree_ce = xcalloc(1, cache_entry_size(len));
>
> unpack_nondirectories() uses create_ce_entry() here.  Any reason why
> we shouldn't use it and tell it to make a transient one?

That one takes a struct name_entry to recreate the path, which would
not be correct since we go deep into subdirs in this loop as well.

Side note. I notice that I allocate/free (and memcpy even) more than I
should. The directory part in ce->name for example will never change.
And if the old tree_ce is large enough, we could avoid reallocation
too.

> > +             tree_ce->ce_mode = src[0]->ce_mode;
> > +             tree_ce->ce_flags = create_ce_flags(0);
> > +             tree_ce->ce_namelen = len;
> > +             oidcpy(&tree_ce->oid, &src[0]->oid);
> > +             memcpy(tree_ce->name, src[0]->name, len + 1);
> > +
> > +             for (d = 1; d <= nr_names; d++)
> > +                     src[d] = tree_ce;
> > +
> > +             rc = call_unpack_fn((const struct cache_entry * const *)src, o);
> > +             free(tree_ce);
> > +             if (rc < 0)
> > +                     return rc;
> > +
> > +             mark_ce_used(src[0], o);
> > +     }
> > +     trace_printf("Quick traverse over %d entries from %s to %s\n",
> > +                  nr_entries,
> > +                  o->src_index->cache[pos]->name,
> > +                  o->src_index->cache[pos + nr_entries - 1]->name);
> > +     return 0;
> > +}
>
> When I invented the cache-tree originally, primarily to speed up
> writing of deeply nested trees, I had the "diff-index --cached"
> optimization where a subtree with contents known to be the same as
> the corresponding span in the index is entirely skipped without
> getting even looked at.  I didn't realize this (now obvious)
> optimization that scanning the index is faster than opening and
> traversing trees (I was more focused on not even scanning, which
> is what "diff-index --cached" optimization was about).
>
> Nice.

I would still love to take this further. We should have valid
cache-tree for like 90% of HEAD, and even if we do a 2- or 3-way merge
where the other trees are very different, we should be able to just
"recreate" HEAD from the index by using cache-tree.

This is hard though, much trickier than dealing with this case. And I
guess that the benefit will be much smaller so probably not worth the
complexity.

> > +static int index_pos_by_traverse_info(struct name_entry *names,
> > +                                   struct traverse_info *info)
> > +{
> > +     struct unpack_trees_options *o = info->data;
> > +     int len = traverse_path_len(info, names);
> > +     char *name = xmalloc(len + 1);
> > +     int pos;
> > +
> > +     make_traverse_path(name, info, names);
> > +     pos = index_name_pos(o->src_index, name, len);
> > +     if (pos >= 0)
> > +             BUG("This is so wrong. This is a directory and should not exist in index");
> > +     pos = -pos - 1;
> > +     /*
> > +      * There's no guarantee that pos points to the first entry of the
> > +      * directory. If the directory name is "letters" and there's another
> > +      * file named "letters.txt" in the index, pos will point to that file
> > +      * instead.
> > +      */
>
> Is this trying to address the issue o->cache_bottom,
> next_cache_entry(), etc. are trying to address?  i.e. an entry
> "letters" appears at a different place relative to other entries in
> a tree, depending on the type of the entry itself, so linear and
> parallel scan of the index and the trees may miss matching entries
> without backtracking?  If so, I am not sure if the loop below is
> sufficient.

No, it's because index_name_pos() does not necessarily give us the right
starting point. This is why t6020 fails, where the index has "letters"
and "letters/foo" when the cache-tree for "letters" is valid. -pos-1
would give me the position of "letters", not "letters/foo". Ideally we
should be able to get this starting index from cache-tree code since
we're searching for it in there anyway. Then this code could be gone.
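
One way out, just a sketch: append a '/' (allocating one extra byte
for it) before doing the lookup, so the insertion point cannot land on
a sibling like "letters.txt":

	make_traverse_path(name, info, names);
	name[len++] = '/';
	name[len] = '\0';
	pos = index_name_pos(o->src_index, name, len);
	if (pos >= 0)
		BUG("This is a directory and should not exist in index");
	pos = -pos - 1;	/* first entry inside the directory */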

The cache_bottom stuff still scares me though. I reuse mark_ce_used()
with hope that it deals with cache_bottom correctly. And as you note,
the lookahead code to deal with D/F conflicts could probably mess up
here too. You're probably the best one to check this ;-)

> > +     while (pos < o->src_index->cache_nr) {
> > +             const struct cache_entry *ce = o->src_index->cache[pos];
> > +             if (ce_namelen(ce) > len &&
> > +                 ce->name[len] == '/' &&
> > +                 !memcmp(ce->name, name, len))
> > +                     break;
> > +             pos++;
> > +     }
> > +     if (pos == o->src_index->cache_nr)
> > +             BUG("This is still wrong");
> > +     free(name);
> > +     return pos;
> > +}
> > +
>
> In anycase, nice progress.

Just FYI I'm still trying to reduce execution time further and this
change happens to halve traverse_trees() time (which is a huge deal)

diff --git a/unpack-trees.c b/unpack-trees.c
index f0be9f298d..a2e63ad5bf 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -201,7 +201,7 @@ static int do_add_entry(struct unpack_trees_options *o, struct cache_entry *ce,
 
 	ce->ce_flags = (ce->ce_flags & ~clear) | set;
 	return add_index_entry(&o->result, ce,
-			       ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
+			       ADD_CACHE_JUST_APPEND | ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
 }
 
 static struct cache_entry *dup_entry(const struct cache_entry *ce)

It's probably not the right thing to do of course. But perhaps we
could do something in that direction (e.g. validate everything at the
end of traverse_by_cache_tree...)
-- 
Duy

^ permalink raw reply related	[flat|nested] 121+ messages in thread

* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-27 16:22                       ` Ben Peart
@ 2018-07-27 18:00                         ` Duy Nguyen
  0 siblings, 0 replies; 121+ messages in thread
From: Duy Nguyen @ 2018-07-27 18:00 UTC (permalink / raw)
  To: Ben Peart; +Cc: Junio C Hamano, Jeff King, Ben Peart, Git Mailing List

On Fri, Jul 27, 2018 at 6:22 PM Ben Peart <peartben@gmail.com> wrote:
>
>
>
> On 7/27/2018 11:42 AM, Duy Nguyen wrote:
> > On Thu, Jul 26, 2018 at 12:40:05PM -0700, Junio C Hamano wrote:
> >> Duy Nguyen <pclouds@gmail.com> writes:
> >>
> >>> I'm excited so I decided to try out anyway. This is what I've come up
> >>> with. Switching trees on git.git shows it could skip plenty entries,
> >>> so promising. It's ugly and it fails at t6020 though, there's still
> >>> work ahead. But I think it'll stop here.
> >>
> >> We are extremely shallow compared to projects like the kernel and
> >> stuff from java land, so that is quite an interesting find.
> >>
> >
> > Yeah. I've got a more or less complete patch now with the full test
> > suite passing, and even with linux.git the numbers look pretty good.
> >
> > Ben, is it possible for you to try this one out? I don't suppose it
> > will be that good on a really big repo. But I'm curious how much
> > faster this patch could be.
> >
>
> Thanks Duy.  I'm super excited about this so I did a quick and dirty
> manual perf test.
>
> I ran "git checkout" 5 times, discarded the first 2 runs and averaged
> the last 3 with and without this patch on top of VFSForGit in a large repo.
>
> Without this patch average times were 16.97
> With this patch average times were 10.55
>
> That is a significant improvement!

Meh! Junio cut down time to like 1/5th in b65982b608 (Optimize
"diff-index --cached" using cache-tree - 2009-05-20). This is not
enough!

OK, I'm kidding :) I'd like to see you measure traverse_trees() like
in your first mail though. The total checkout number is nice and all,
but I'd still like to see exactly how much time is reduced in
traverse_trees() alone (or unpack_trees() to be precise). That would
give me a much better picture of this unpacking business.
-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
  2018-07-27 17:52                         ` Duy Nguyen
@ 2018-07-29  6:24                           ` Duy Nguyen
  0 siblings, 0 replies; 121+ messages in thread
From: Duy Nguyen @ 2018-07-29  6:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ben Peart, Jeff King, Ben Peart, Git Mailing List

On Fri, Jul 27, 2018 at 07:52:33PM +0200, Duy Nguyen wrote:
> Just FYI I'm still trying to reduce execution time further and this
> change happens to halve traverse_trees() time (which is a huge deal)
> 
> diff --git a/unpack-trees.c b/unpack-trees.c
> index f0be9f298d..a2e63ad5bf 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -201,7 +201,7 @@ static int do_add_entry(struct unpack_trees_options *o, struct cache_entry *ce,
> 
>  	ce->ce_flags = (ce->ce_flags & ~clear) | set;
>  	return add_index_entry(&o->result, ce,
> -			       ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
> +			       ADD_CACHE_JUST_APPEND | ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
>  }
> 
>  static struct cache_entry *dup_entry(const struct cache_entry *ce)
> 
> It's probably not the right thing to do of course. But perhaps we
> could do something in that direction (e.g. validate everything at the
> end of traverse_by_cache_tree...)

There's just too much computation that could be reduced. The
following patch gives more or less the same performance gain as
adding ADD_CACHE_JUST_APPEND (traverse_trees() time cut down by half).

Of these, walking the cache-tree inside add_index_entry_with_check()
is the most expensive part, and we probably could just walk the
cache-tree in the traverse_by_cache_tree() loop and do the
invalidation there instead.

-- 8< --
diff --git a/cache.h b/cache.h
index 8b447652a7..e6f7ee4b64 100644
--- a/cache.h
+++ b/cache.h
@@ -673,6 +673,7 @@ extern int index_name_pos(const struct index_state *, const char *name, int name
 #define ADD_CACHE_JUST_APPEND 8		/* Append only; tree.c::read_tree() */
 #define ADD_CACHE_NEW_ONLY 16		/* Do not replace existing ones */
 #define ADD_CACHE_KEEP_CACHE_TREE 32	/* Do not invalidate cache-tree */
+#define ADD_CACHE_SKIP_VERIFY_PATH 64	/* Do not verify path */
 extern int add_index_entry(struct index_state *, struct cache_entry *ce, int option);
 extern void rename_index_entry_at(struct index_state *, int pos, const char *new_name);
 
diff --git a/read-cache.c b/read-cache.c
index e865254bea..b0b5df5de7 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1170,6 +1170,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 	int ok_to_add = option & ADD_CACHE_OK_TO_ADD;
 	int ok_to_replace = option & ADD_CACHE_OK_TO_REPLACE;
 	int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
+	int skip_verify_path = option & ADD_CACHE_SKIP_VERIFY_PATH;
 	int new_only = option & ADD_CACHE_NEW_ONLY;
 
 	if (!(option & ADD_CACHE_KEEP_CACHE_TREE))
@@ -1210,7 +1211,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 
 	if (!ok_to_add)
 		return -1;
-	if (!verify_path(ce->name, ce->ce_mode))
+	if (!skip_verify_path && !verify_path(ce->name, ce->ce_mode))
 		return error("Invalid path '%s'", ce->name);
 
 	if (!skip_df_check &&
diff --git a/unpack-trees.c b/unpack-trees.c
index f2a2db6ab8..ff6a0f2bd3 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -201,6 +201,7 @@ static int do_add_entry(struct unpack_trees_options *o, struct cache_entry *ce,
 
 	ce->ce_flags = (ce->ce_flags & ~clear) | set;
 	return add_index_entry(&o->result, ce,
+			       o->extra_add_index_flags |
 			       ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
 }
 
@@ -678,6 +679,25 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 	const char *first_name = o->src_index->cache[pos]->name;
 	int dirlen = (strrchr(first_name, '/') - first_name)+1;
 
+	/*
+	 * Try to keep add_index_entry() as fast as possible since
+	 * we're going to do a lot of them.
+	 *
+	 * Skipping verify_path() should totally be safe because these
+	 * paths are from the source index, which must have been
+	 * verified.
+	 *
+	 * Skipping D/F and cache-tree validation checks is trickier
+	 * because it assumes what n-merge code would do when all
+	 * trees and the index are the same. We probably could just
+	 * optimize that code instead (e.g. we don't invalidate that
+	 * many cache-trees, but searching for them is very
+	 * expensive).
+	 */
+	o->extra_add_index_flags = ADD_CACHE_SKIP_DFCHECK;
+	o->extra_add_index_flags |= ADD_CACHE_KEEP_CACHE_TREE;
+	o->extra_add_index_flags |= ADD_CACHE_SKIP_VERIFY_PATH;
+
 	/*
 	 * Do what unpack_callback() and unpack_nondirectories() normally
 	 * do. But we do it in one function call (for even nested trees)
@@ -721,6 +741,7 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 
 		mark_ce_used(src[0], o);
 	}
+	o->extra_add_index_flags = 0;
 	free(tree_ce);
 	trace_printf("Quick traverse over %d entries from %s to %s\n",
 		     nr_entries,
diff --git a/unpack-trees.h b/unpack-trees.h
index c2b434c606..94e1b14078 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -80,6 +80,7 @@ struct unpack_trees_options {
 	struct index_state result;
 
 	struct exclude_list *el; /* for internal use */
+	unsigned int extra_add_index_flags;
 };
 
 extern int unpack_trees(unsigned n, struct tree_desc *t,
-- 8< --

^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH v2 0/4] Speed up unpack_trees()
  2018-07-27 15:42                     ` Duy Nguyen
  2018-07-27 16:22                       ` Ben Peart
  2018-07-27 17:14                       ` Junio C Hamano
@ 2018-07-29 10:33                       ` Nguyễn Thái Ngọc Duy
  2018-07-29 10:33                         ` [PATCH v2 1/4] unpack-trees.c: add performance tracing Nguyễn Thái Ngọc Duy
                                           ` (6 more replies)
  2 siblings, 7 replies; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-07-29 10:33 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, peartben, peff

This series speeds up unpack_trees() a bit by using cache-tree.
unpack-trees can be split into three big parts:

- the actual tree unpacking and n-way merging
- updating the worktree, which could be expensive depending on how
  much I/O is involved
- repairing the cache-tree

This series focuses on the first part alone and could give 700%
speedup (best-case scenario; real-life ones are probably not that
impressive).

It also shows that repairing the cache-tree is kinda expensive. I have
an idea of reusing cache-tree from the original index, but I'll leave
that to Ben or others to try out and see if it helps at all.

v2 addresses the comments from Junio, adds more performance tracing
and reduces the cost of adding index entries.

Nguyễn Thái Ngọc Duy (4):
  unpack-trees.c: add performance tracing
  unpack-trees: optimize walking same trees with cache-tree
  unpack-trees: reduce malloc in cache-tree walk
  unpack-trees: cheaper index update when walking by cache-tree

 cache-tree.c   |   2 +
 cache.h        |   1 +
 read-cache.c   |   3 +-
 unpack-trees.c | 161 ++++++++++++++++++++++++++++++++++++++++++++++++-
 unpack-trees.h |   1 +
 5 files changed, 166 insertions(+), 2 deletions(-)

-- 
2.18.0.656.gda699b98b3


^ permalink raw reply	[flat|nested] 121+ messages in thread

* [PATCH v2 1/4] unpack-trees.c: add performance tracing
  2018-07-29 10:33                       ` [PATCH v2 0/4] Speed up unpack_trees() Nguyễn Thái Ngọc Duy
@ 2018-07-29 10:33                         ` Nguyễn Thái Ngọc Duy
  2018-07-30 20:16                           ` Ben Peart
  2018-07-29 10:33                         ` [PATCH v2 2/4] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
                                           ` (5 subsequent siblings)
  6 siblings, 1 reply; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-07-29 10:33 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, peartben, peff

We're going to optimize unpack_trees() a bit in the following
patches. Let's add some tracing to measure how long it takes before
and after. This is the baseline ("git checkout -" on gcc.git, 80k
files on worktree)

    0.018239226 s: read cache .git/index
    0.052541655 s: preload index
    0.001537598 s: refresh index
    0.168167768 s: unpack trees
    0.002897186 s: update worktree after a merge
    0.131661745 s: repair cache-tree
    0.075389117 s: write index, changed mask = 2a
    0.111702023 s: unpack trees
    0.000023245 s: update worktree after a merge
    0.111793866 s: diff-index
    0.587933288 s: git command: /home/pclouds/w/git/git checkout -

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache-tree.c   | 2 ++
 unpack-trees.c | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/cache-tree.c b/cache-tree.c
index 6b46711996..0dbe10fc85 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -426,6 +426,7 @@ static int update_one(struct cache_tree *it,
 
 int cache_tree_update(struct index_state *istate, int flags)
 {
+	uint64_t start = getnanotime();
 	struct cache_tree *it = istate->cache_tree;
 	struct cache_entry **cache = istate->cache;
 	int entries = istate->cache_nr;
@@ -437,6 +438,7 @@ int cache_tree_update(struct index_state *istate, int flags)
 	if (i < 0)
 		return i;
 	istate->cache_changed |= CACHE_TREE_CHANGED;
+	trace_performance_since(start, "repair cache-tree");
 	return 0;
 }
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 66741130ae..dc58d1f5ae 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -352,6 +352,7 @@ static int check_updates(struct unpack_trees_options *o)
 	struct progress *progress = NULL;
 	struct index_state *index = &o->result;
 	struct checkout state = CHECKOUT_INIT;
+	uint64_t start = getnanotime();
 	int i;
 
 	state.force = 1;
@@ -423,6 +424,7 @@ static int check_updates(struct unpack_trees_options *o)
 	errs |= finish_delayed_checkout(&state);
 	if (o->update)
 		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
+	trace_performance_since(start, "update worktree after a merge");
 	return errs != 0;
 }
 
@@ -1275,6 +1277,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	int i, ret;
 	static struct cache_entry *dfc;
 	struct exclude_list el;
+	uint64_t start = getnanotime();
 
 	if (len > MAX_UNPACK_TREES)
 		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
@@ -1423,6 +1426,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 			goto done;
 		}
 	}
+	trace_performance_since(start, "unpack trees");
 
 	ret = check_updates(o) ? (-2) : 0;
 	if (o->dst_index) {
-- 
2.18.0.656.gda699b98b3


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH v2 2/4] unpack-trees: optimize walking same trees with cache-tree
  2018-07-29 10:33                       ` [PATCH v2 0/4] Speed up unpack_trees() Nguyễn Thái Ngọc Duy
  2018-07-29 10:33                         ` [PATCH v2 1/4] unpack-trees.c: add performance tracing Nguyễn Thái Ngọc Duy
@ 2018-07-29 10:33                         ` Nguyễn Thái Ngọc Duy
  2018-07-30 20:52                           ` Ben Peart
  2018-07-29 10:33                         ` [PATCH v2 3/4] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
                                           ` (4 subsequent siblings)
  6 siblings, 1 reply; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-07-29 10:33 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, peartben, peff

From: Duy Nguyen <pclouds@gmail.com>

In order to merge one or many trees with the index, unpack-trees code
walks multiple trees in parallel with the index and performs n-way
merge. If we find out at start of a directory that all trees are the
same (by comparing OID) and cache-tree happens to be available for
that directory as well, we could avoid walking the trees because we
already know what these trees contain: it's flattened in what's called
"the index".

The upside is of course a lot less I/O since we can potentially skip
lots of trees (think subtrees). We also save CPU because we don't have
to inflate objects and apply deltas. The downside is of course more
fragile code since the logic in some functions is now duplicated
elsewhere.

"checkout -" with this patch on gcc.git:

    baseline      new
  --------------------------------------------------------------------
    0.018239226   0.019365414 s: read cache .git/index
    0.052541655   0.049605548 s: preload index
    0.001537598   0.001571695 s: refresh index
    0.168167768   0.049677212 s: unpack trees
    0.002897186   0.002845256 s: update worktree after a merge
    0.131661745   0.136597522 s: repair cache-tree
    0.075389117   0.075422517 s: write index, changed mask = 2a
    0.111702023   0.032813253 s: unpack trees
    0.000023245   0.000022002 s: update worktree after a merge
    0.111793866   0.032933140 s: diff-index
    0.587933288   0.398924370 s: git command: /home/pclouds/w/git/git

This command calls unpack_trees() twice, the first time doing a 2-way
merge and the second a 1-way merge. In both cases, "unpack trees" time
is reduced to one third. The overall time reduction is not that
impressive, of course, because index operations take a big chunk. And
there's that repair cache-tree line.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 unpack-trees.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 118 insertions(+), 1 deletion(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index dc58d1f5ae..39566b28fb 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -644,6 +644,102 @@ static inline int are_same_oid(struct name_entry *name_j, struct name_entry *nam
 	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
 }
 
+static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,
+					struct name_entry *names,
+					struct traverse_info *info)
+{
+	struct unpack_trees_options *o = info->data;
+	int i;
+
+	if (!o->merge || dirmask != ((1 << n) - 1))
+		return 0;
+
+	for (i = 1; i < n; i++)
+		if (!are_same_oid(names, names + i))
+			return 0;
+
+	return cache_tree_matches_traversal(o->src_index->cache_tree, names, info);
+}
+
+static int index_pos_by_traverse_info(struct name_entry *names,
+				      struct traverse_info *info)
+{
+	struct unpack_trees_options *o = info->data;
+	int len = traverse_path_len(info, names);
+	char *name = xmalloc(len + 1 /* slash */ + 1 /* NUL */);
+	int pos;
+
+	make_traverse_path(name, info, names);
+	name[len++] = '/';
+	name[len] = '\0';
+	pos = index_name_pos(o->src_index, name, len);
+	if (pos >= 0)
+		BUG("This is a directory and should not exist in index");
+	pos = -pos - 1;
+	if (!starts_with(o->src_index->cache[pos]->name, name) ||
+	    (pos > 0 && starts_with(o->src_index->cache[pos-1]->name, name)))
+		BUG("pos must point at the first entry in this directory");
+	free(name);
+	return pos;
+}
+
+/*
+ * Fast path if we detect that all trees are the same as cache-tree at this
+ * path. We'll walk these trees recursively using cache-tree/index instead of
+ * ODB since we already know what these trees contain.
+ */
+static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
+				  struct name_entry *names,
+				  struct traverse_info *info)
+{
+	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
+	struct unpack_trees_options *o = info->data;
+	int i, d;
+
+	if (!o->merge)
+		BUG("We need cache-tree to do this optimization");
+
+	/*
+	 * Do what unpack_callback() and unpack_nondirectories() normally
+	 * do. But we walk all paths recursively in just one loop instead.
+	 *
+	 * D/F conflicts and staged entries are not a concern because
+	 * cache-tree would be invalidated and we would never get here
+	 * in the first place.
+	 */
+	for (i = 0; i < nr_entries; i++) {
+		struct cache_entry *tree_ce;
+		int len, rc;
+
+		src[0] = o->src_index->cache[pos + i];
+
+		len = ce_namelen(src[0]);
+		tree_ce = xcalloc(1, cache_entry_size(len));
+
+		tree_ce->ce_mode = src[0]->ce_mode;
+		tree_ce->ce_flags = create_ce_flags(0);
+		tree_ce->ce_namelen = len;
+		oidcpy(&tree_ce->oid, &src[0]->oid);
+		memcpy(tree_ce->name, src[0]->name, len + 1);
+
+		for (d = 1; d <= nr_names; d++)
+			src[d] = tree_ce;
+
+		rc = call_unpack_fn((const struct cache_entry * const *)src, o);
+		free(tree_ce);
+		if (rc < 0)
+			return rc;
+
+		mark_ce_used(src[0], o);
+	}
+	if (o->debug_unpack)
+		printf("Unpacked %d entries from %s to %s using cache-tree\n",
+		       nr_entries,
+		       o->src_index->cache[pos]->name,
+		       o->src_index->cache[pos + nr_entries - 1]->name);
+	return 0;
+}
+
 static int traverse_trees_recursive(int n, unsigned long dirmask,
 				    unsigned long df_conflicts,
 				    struct name_entry *names,
@@ -655,6 +751,17 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
 	void *buf[MAX_UNPACK_TREES];
 	struct traverse_info newinfo;
 	struct name_entry *p;
+	int nr_entries;
+
+	nr_entries = all_trees_same_as_cache_tree(n, dirmask, names, info);
+	if (nr_entries > 0) {
+		struct unpack_trees_options *o = info->data;
+		int pos = index_pos_by_traverse_info(names, info);
+
+		if (!o->merge || df_conflicts)
+			BUG("Wrong condition to get here buddy");
+		return traverse_by_cache_tree(pos, nr_entries, n, names, info);
+	}
 
 	p = names;
 	while (!p->mode)
@@ -814,6 +921,11 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info, con
 	return ce;
 }
 
+/*
+ * Note that traverse_by_cache_tree() duplicates some logic in this function
+ * without actually calling it. If you change the logic here you may need to
+ * check and change there as well.
+ */
 static int unpack_nondirectories(int n, unsigned long mask,
 				 unsigned long dirmask,
 				 struct cache_entry **src,
@@ -998,6 +1110,11 @@ static void debug_unpack_callback(int n,
 		debug_name_entry(i, names + i);
 }
 
+/*
+ * Note that traverse_by_cache_tree() duplicates some logic in this function
+ * without actually calling it. If you change the logic here you may need to
+ * check and change there as well.
+ */
 static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, struct name_entry *names, struct traverse_info *info)
 {
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
@@ -1280,7 +1397,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	uint64_t start = getnanotime();
 
 	if (len > MAX_UNPACK_TREES)
-		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
+		die(_("unpack_trees takes at most %d trees"), MAX_UNPACK_TREES);
 
 	memset(&el, 0, sizeof(el));
 	if (!core_apply_sparse_checkout || !o->update)
-- 
2.18.0.656.gda699b98b3


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH v2 3/4] unpack-trees: reduce malloc in cache-tree walk
  2018-07-29 10:33                       ` [PATCH v2 0/4] Speed up unpack_trees() Nguyễn Thái Ngọc Duy
  2018-07-29 10:33                         ` [PATCH v2 1/4] unpack-trees.c: add performance tracing Nguyễn Thái Ngọc Duy
  2018-07-29 10:33                         ` [PATCH v2 2/4] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
@ 2018-07-29 10:33                         ` Nguyễn Thái Ngọc Duy
  2018-07-30 20:58                           ` Ben Peart
  2018-07-29 10:33                         ` [PATCH v2 4/4] unpack-trees: cheaper index update when walking by cache-tree Nguyễn Thái Ngọc Duy
                                           ` (3 subsequent siblings)
  6 siblings, 1 reply; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-07-29 10:33 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, peartben, peff

This is a micro optimization that probably only shines on repos with
deep directory structure. Instead of allocating and freeing a new
cache_entry in every iteration, we reuse the last one and only update
the parts that are new each iteration.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 unpack-trees.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 39566b28fb..c33ebaf001 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -694,6 +694,8 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 {
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
+	struct cache_entry *tree_ce = NULL;
+	int ce_len = 0;
 	int i, d;
 
 	if (!o->merge)
@@ -708,30 +710,39 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 	 * in the first place.
 	 */
 	for (i = 0; i < nr_entries; i++) {
-		struct cache_entry *tree_ce;
-		int len, rc;
+		int new_ce_len, len, rc;
 
 		src[0] = o->src_index->cache[pos + i];
 
 		len = ce_namelen(src[0]);
-		tree_ce = xcalloc(1, cache_entry_size(len));
+		new_ce_len = cache_entry_size(len);
+
+		if (new_ce_len > ce_len) {
+			new_ce_len <<= 1;
+			tree_ce = xrealloc(tree_ce, new_ce_len);
+			memset(tree_ce, 0, new_ce_len);
+			ce_len = new_ce_len;
+
+			tree_ce->ce_flags = create_ce_flags(0);
+
+			for (d = 1; d <= nr_names; d++)
+				src[d] = tree_ce;
+		}
 
 		tree_ce->ce_mode = src[0]->ce_mode;
-		tree_ce->ce_flags = create_ce_flags(0);
 		tree_ce->ce_namelen = len;
 		oidcpy(&tree_ce->oid, &src[0]->oid);
 		memcpy(tree_ce->name, src[0]->name, len + 1);
 
-		for (d = 1; d <= nr_names; d++)
-			src[d] = tree_ce;
-
 		rc = call_unpack_fn((const struct cache_entry * const *)src, o);
-		free(tree_ce);
-		if (rc < 0)
+		if (rc < 0) {
+			free(tree_ce);
 			return rc;
+		}
 
 		mark_ce_used(src[0], o);
 	}
+	free(tree_ce);
 	if (o->debug_unpack)
 		printf("Unpacked %d entries from %s to %s using cache-tree\n",
 		       nr_entries,
-- 
2.18.0.656.gda699b98b3


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH v2 4/4] unpack-trees: cheaper index update when walking by cache-tree
  2018-07-29 10:33                       ` [PATCH v2 0/4] Speed up unpack_trees() Nguyễn Thái Ngọc Duy
                                           ` (2 preceding siblings ...)
  2018-07-29 10:33                         ` [PATCH v2 3/4] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
@ 2018-07-29 10:33                         ` Nguyễn Thái Ngọc Duy
  2018-08-08 18:46                           ` Elijah Newren
  2018-07-30 18:10                         ` [PATCH v2 0/4] Speed up unpack_trees() Ben Peart
                                           ` (2 subsequent siblings)
  6 siblings, 1 reply; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-07-29 10:33 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, peartben, peff

With the new cache-tree optimization, we mostly avoid I/O (due to odb
access); the code mostly becomes a loop of "check this, check that,
add the entry to the index". We can skip a couple of checks in this
giant loop to go faster:

- We know here that we're copying entries from the source index to the
  result one. All paths in the source index must have been validated
  at load time already (and we're not taking strange paths from tree
  objects), which means we can skip verify_path() without compromise.

- We also know that D/F conflicts can't happen for all these entries
  (since cache-tree and all the trees are the same), so we can skip
  that check as well.

This gives rather nice speedups in the "unpack trees" rows: "unpack
trees" time is now cut in half compared to when
traverse_by_cache_tree() was first added, and is down to 1/7 of the
original "unpack trees" time.

   baseline      cache-tree    this patch
 --------------------------------------------------------------------
   0.018239226   0.019365414   0.020519621 s: read cache .git/index
   0.052541655   0.049605548   0.048814384 s: preload index
   0.001537598   0.001571695   0.001575382 s: refresh index
   0.168167768   0.049677212   0.024719308 s: unpack trees
   0.002897186   0.002845256   0.002805555 s: update worktree after a merge
   0.131661745   0.136597522   0.134891617 s: repair cache-tree
   0.075389117   0.075422517   0.074832291 s: write index, changed mask = 2a
   0.111702023   0.032813253   0.008616479 s: unpack trees
   0.000023245   0.000022002   0.000026630 s: update worktree after a merge
   0.111793866   0.032933140   0.008714071 s: diff-index
   0.587933288   0.398924370   0.380452871 s: git command: /home/pclouds/w/git/git

The total saving from this new patch looks even less impressive now
that the time spent in unpacking trees is so small, which is why the
next attempt should be on that "repair cache-tree" line.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h        |  1 +
 read-cache.c   |  3 ++-
 unpack-trees.c | 27 +++++++++++++++++++++++++++
 unpack-trees.h |  1 +
 4 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index 8b447652a7..e6f7ee4b64 100644
--- a/cache.h
+++ b/cache.h
@@ -673,6 +673,7 @@ extern int index_name_pos(const struct index_state *, const char *name, int name
 #define ADD_CACHE_JUST_APPEND 8		/* Append only; tree.c::read_tree() */
 #define ADD_CACHE_NEW_ONLY 16		/* Do not replace existing ones */
 #define ADD_CACHE_KEEP_CACHE_TREE 32	/* Do not invalidate cache-tree */
+#define ADD_CACHE_SKIP_VERIFY_PATH 64	/* Do not verify path */
 extern int add_index_entry(struct index_state *, struct cache_entry *ce, int option);
 extern void rename_index_entry_at(struct index_state *, int pos, const char *new_name);
 
diff --git a/read-cache.c b/read-cache.c
index e865254bea..b0b5df5de7 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1170,6 +1170,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 	int ok_to_add = option & ADD_CACHE_OK_TO_ADD;
 	int ok_to_replace = option & ADD_CACHE_OK_TO_REPLACE;
 	int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
+	int skip_verify_path = option & ADD_CACHE_SKIP_VERIFY_PATH;
 	int new_only = option & ADD_CACHE_NEW_ONLY;
 
 	if (!(option & ADD_CACHE_KEEP_CACHE_TREE))
@@ -1210,7 +1211,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 
 	if (!ok_to_add)
 		return -1;
-	if (!verify_path(ce->name, ce->ce_mode))
+	if (!skip_verify_path && !verify_path(ce->name, ce->ce_mode))
 		return error("Invalid path '%s'", ce->name);
 
 	if (!skip_df_check &&
diff --git a/unpack-trees.c b/unpack-trees.c
index c33ebaf001..dc62afd968 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -201,6 +201,7 @@ static int do_add_entry(struct unpack_trees_options *o, struct cache_entry *ce,
 
 	ce->ce_flags = (ce->ce_flags & ~clear) | set;
 	return add_index_entry(&o->result, ce,
+			       o->extra_add_index_flags |
 			       ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
 }
 
@@ -701,6 +702,24 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 	if (!o->merge)
 		BUG("We need cache-tree to do this optimization");
 
+	/*
+	 * Try to keep add_index_entry() as fast as possible since
+	 * we're going to do a lot of them.
+	 *
+	 * Skipping verify_path() should totally be safe because these
+	 * paths are from the source index, which must have been
+	 * verified.
+	 *
+	 * Skipping D/F and cache-tree validation checks is trickier
+	 * because it assumes what n-merge code would do when all
+	 * trees and the index are the same. We probably could just
+	 * optimize those code instead (e.g. we don't invalidate that
+	 * optimize that code instead (e.g. we don't invalidate that
+	 * many cache-trees, but searching for them is very
+	 */
+	o->extra_add_index_flags = ADD_CACHE_SKIP_DFCHECK;
+	o->extra_add_index_flags |= ADD_CACHE_SKIP_VERIFY_PATH;
+
 	/*
 	 * Do what unpack_callback() and unpack_nondirectories() normally
 	 * do. But we walk all paths recursively in just one loop instead.
@@ -742,6 +761,7 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 
 		mark_ce_used(src[0], o);
 	}
+	o->extra_add_index_flags = 0;
 	free(tree_ce);
 	if (o->debug_unpack)
 		printf("Unpacked %d entries from %s to %s using cache-tree\n",
@@ -1561,6 +1581,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 		if (!ret) {
 			if (!o->result.cache_tree)
 				o->result.cache_tree = cache_tree();
+			/*
+			 * TODO: Walk o.src_index->cache_tree, quickly check
+			 * if o->result.cache has the exact same content for
+			 * any valid cache-tree in o.src_index, then we can
+			 * just copy the cache-tree over instead of hashing a
+			 * new tree object.
+			 */
 			if (!cache_tree_fully_valid(o->result.cache_tree))
 				cache_tree_update(&o->result,
 						  WRITE_TREE_SILENT |
diff --git a/unpack-trees.h b/unpack-trees.h
index c2b434c606..94e1b14078 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -80,6 +80,7 @@ struct unpack_trees_options {
 	struct index_state result;
 
 	struct exclude_list *el; /* for internal use */
+	unsigned int extra_add_index_flags;
 };
 
 extern int unpack_trees(unsigned n, struct tree_desc *t,
-- 
2.18.0.656.gda699b98b3


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 0/4] Speed up unpack_trees()
  2018-07-29 10:33                       ` [PATCH v2 0/4] Speed up unpack_trees() Nguyễn Thái Ngọc Duy
                                           ` (3 preceding siblings ...)
  2018-07-29 10:33                         ` [PATCH v2 4/4] unpack-trees: cheaper index update when walking by cache-tree Nguyễn Thái Ngọc Duy
@ 2018-07-30 18:10                         ` Ben Peart
  2018-07-31 15:31                           ` Duy Nguyen
  2018-07-30 21:04                         ` Ben Peart
  2018-08-04  5:37                         ` [PATCH v3 " Nguyễn Thái Ngọc Duy
  6 siblings, 1 reply; 121+ messages in thread
From: Ben Peart @ 2018-07-30 18:10 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: Ben.Peart, git, gitster, peff



On 7/29/2018 6:33 AM, Nguyễn Thái Ngọc Duy wrote:
> This series speeds up unpack_trees() a bit by using cache-tree.
> unpack-trees can be split into three big parts:
> 
> - the actual tree unpacking and n-way merging
> - updating the worktree, which could be expensive depending on how
>   much I/O is involved
> - repairing the cache-tree
> 
> This series focuses on the first part alone and could give 700%
> speedup (best-case scenario; real-life ones are probably not that
> impressive).
> 
> It also shows that repairing the cache-tree is kinda expensive. I have
> an idea of reusing cache-tree from the original index, but I'll leave
> that to Ben or others to try out and see if it helps at all.
> 
> v2 addresses the comments from Junio, adds more performance tracing
> and reduces the cost of adding index entries.
> 
> Nguyễn Thái Ngọc Duy (4):
>    unpack-trees.c: add performance tracing
>    unpack-trees: optimize walking same trees with cache-tree
>    unpack-trees: reduce malloc in cache-tree walk
>    unpack-trees: cheaper index update when walking by cache-tree
> 
>   cache-tree.c   |   2 +
>   cache.h        |   1 +
>   read-cache.c   |   3 +-
>   unpack-trees.c | 161 ++++++++++++++++++++++++++++++++++++++++++++++++-
>   unpack-trees.h |   1 +
>   5 files changed, 166 insertions(+), 2 deletions(-)
> 

I ran "git checkout" on a large repo and averaged the results of 3 runs. 
  This clearly demonstrates the benefit of the optimized unpack_trees() 
as even the final "diff-index" is essentially a 3rd call to unpack_trees().

baseline	new	
----------------------------------------------------------------------
0.535510167	0.556558733	s: read cache .git/index
0.3057373	0.3147105	s: initialize name hash
0.0184082	0.023558433	s: preload index
0.086910967	0.089085967	s: refresh index
7.889590767	2.191554433	s: unpack trees
0.120760833	0.131941267	s: update worktree after a merge
2.2583504	2.572663167	s: repair cache-tree
0.8916137	0.959495233	s: write index, changed mask = 28
3.405199233	0.2710663	s: unpack trees
0.000999667	0.0021554	s: update worktree after a merge
3.4063306	0.273318333	s: diff-index
16.9524923	9.462943133	s: git command: 
'c:\git-sdk-64\usr\src\git\git.exe' checkout

The first call to unpack_trees() saves 72%
The 2nd and 3rd calls save 92%
Total time savings for the entire command was 44%
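
(Working those out from the table above: the first unpack_trees() drops
from 7.890 s to 2.192 s, i.e. (7.890 - 2.192) / 7.890 ~= 72%; the
diff-index pass drops from 3.406 s to 0.273 s ~= 92%; and the whole
command goes from 16.95 s to 9.46 s ~= 44%.)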

In the performance game of whack-a-mole, that call to repair cache-tree 
is now looking quite expensive...

Ben

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 1/4] unpack-trees.c: add performance tracing
  2018-07-29 10:33                         ` [PATCH v2 1/4] unpack-trees.c: add performance tracing Nguyễn Thái Ngọc Duy
@ 2018-07-30 20:16                           ` Ben Peart
  0 siblings, 0 replies; 121+ messages in thread
From: Ben Peart @ 2018-07-30 20:16 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: Ben.Peart, git, gitster, peff



On 7/29/2018 6:33 AM, Nguyễn Thái Ngọc Duy wrote:
> We're going to optimize unpack_trees() a bit in the following
> patches. Let's add some tracing to measure how long it takes before
> and after. This is the baseline ("git checkout -" on gcc.git, 80k
> files on worktree)
> 
>      0.018239226 s: read cache .git/index
>      0.052541655 s: preload index
>      0.001537598 s: refresh index
>      0.168167768 s: unpack trees
>      0.002897186 s: update worktree after a merge
>      0.131661745 s: repair cache-tree
>      0.075389117 s: write index, changed mask = 2a
>      0.111702023 s: unpack trees
>      0.000023245 s: update worktree after a merge
>      0.111793866 s: diff-index
>      0.587933288 s: git command: /home/pclouds/w/git/git checkout -
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

I've reviewed this patch and it looks good to me.  Nice to see the 
additional breakdown of where time is being spent.


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 2/4] unpack-trees: optimize walking same trees with cache-tree
  2018-07-29 10:33                         ` [PATCH v2 2/4] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
@ 2018-07-30 20:52                           ` Ben Peart
  0 siblings, 0 replies; 121+ messages in thread
From: Ben Peart @ 2018-07-30 20:52 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: Ben.Peart, git, gitster, peff



On 7/29/2018 6:33 AM, Nguyễn Thái Ngọc Duy wrote:
> From: Duy Nguyen <pclouds@gmail.com>
> 
> In order to merge one or many trees with the index, unpack-trees code
> walks multiple trees in parallel with the index and performs an n-way
> merge. If we find out at the start of a directory that all trees are the
> same (by comparing OID) and cache-tree happens to be available for
> that directory as well, we could avoid walking the trees because we
> already know what these trees contain: it's flattened in what's called
> "the index".
> 
> The upside is of course a lot less I/O since we can potentially skip
> lots of trees (think subtrees). We also save CPU because we don't have
> to inflate and then apply deltas. The downside is of course more
> fragile code since the logic in some functions is now duplicated
> elsewhere.
> 
> "checkout -" with this patch on gcc.git:
> 
>      baseline      new
>    --------------------------------------------------------------------
>      0.018239226   0.019365414 s: read cache .git/index
>      0.052541655   0.049605548 s: preload index
>      0.001537598   0.001571695 s: refresh index
>      0.168167768   0.049677212 s: unpack trees
>      0.002897186   0.002845256 s: update worktree after a merge
>      0.131661745   0.136597522 s: repair cache-tree
>      0.075389117   0.075422517 s: write index, changed mask = 2a
>      0.111702023   0.032813253 s: unpack trees
>      0.000023245   0.000022002 s: update worktree after a merge
>      0.111793866   0.032933140 s: diff-index
>      0.587933288   0.398924370 s: git command: /home/pclouds/w/git/git
> 
> This command calls unpack_trees() twice, first doing a 2-way merge
> and then a 1-way merge. Both times, "unpack trees" time is
> reduced to one third. Overall time reduction is not that impressive of
> course because index operations take a big chunk. And there's that
> repair cache-tree line.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>   unpack-trees.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 118 insertions(+), 1 deletion(-)
> 
> diff --git a/unpack-trees.c b/unpack-trees.c
> index dc58d1f5ae..39566b28fb 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -644,6 +644,102 @@ static inline int are_same_oid(struct name_entry *name_j, struct name_entry *nam
>   	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
>   }
>   
> +static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,
> +					struct name_entry *names,
> +					struct traverse_info *info)
> +{
> +	struct unpack_trees_options *o = info->data;
> +	int i;
> +
> +	if (!o->merge || dirmask != ((1 << n) - 1))
> +		return 0;
> +
> +	for (i = 1; i < n; i++)
> +		if (!are_same_oid(names, names + i))
> +			return 0;
> +
> +	return cache_tree_matches_traversal(o->src_index->cache_tree, names, info);
> +}
> +
> +static int index_pos_by_traverse_info(struct name_entry *names,
> +				      struct traverse_info *info)
> +{
> +	struct unpack_trees_options *o = info->data;
> +	int len = traverse_path_len(info, names);
> +	char *name = xmalloc(len + 1 /* slash */ + 1 /* NUL */);
> +	int pos;
> +
> +	make_traverse_path(name, info, names);
> +	name[len++] = '/';
> +	name[len] = '\0';
> +	pos = index_name_pos(o->src_index, name, len);
> +	if (pos >= 0)
> +		BUG("This is a directory and should not exist in index");
> +	pos = -pos - 1;
> +	if (!starts_with(o->src_index->cache[pos]->name, name) ||
> +	    (pos > 0 && starts_with(o->src_index->cache[pos-1]->name, name)))
> +		BUG("pos must point at the first entry in this directory");
> +	free(name);
> +	return pos;
> +}
> +
> +/*
> + * Fast path if we detect that all trees are the same as cache-tree at this
> + * path. We'll walk these trees recursively using cache-tree/index instead of
> + * ODB since we already know what these trees contain.
> + */
> +static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
> +				  struct name_entry *names,
> +				  struct traverse_info *info)
> +{
> +	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
> +	struct unpack_trees_options *o = info->data;
> +	int i, d;
> +
> +	if (!o->merge)
> +		BUG("We need cache-tree to do this optimization");
> +
> +	/*
> +	 * Do what unpack_callback() and unpack_nondirectories() normally
> +	 * do. But we walk all paths recursively in just one loop instead.
> +	 *
> +	 * D/F conflicts and staged entries are not a concern because
> +	 * cache-tree would be invalidated and we would never get here
> +	 * in the first place.
> +	 */
> +	for (i = 0; i < nr_entries; i++) {
> +		struct cache_entry *tree_ce;
> +		int len, rc;
> +
> +		src[0] = o->src_index->cache[pos + i];
> +
> +		len = ce_namelen(src[0]);
> +		tree_ce = xcalloc(1, cache_entry_size(len));
> +
> +		tree_ce->ce_mode = src[0]->ce_mode;
> +		tree_ce->ce_flags = create_ce_flags(0);
> +		tree_ce->ce_namelen = len;
> +		oidcpy(&tree_ce->oid, &src[0]->oid);
> +		memcpy(tree_ce->name, src[0]->name, len + 1);
> +

I don't like the overhead of having to create an entirely new cache 
entry here but I see you clean this up in the next patch.

> +		for (d = 1; d <= nr_names; d++)
> +			src[d] = tree_ce;
> +
> +		rc = call_unpack_fn((const struct cache_entry * const *)src, o);
> +		free(tree_ce);
> +		if (rc < 0)
> +			return rc;
> +
> +		mark_ce_used(src[0], o);
> +	}
> +	if (o->debug_unpack)
> +		printf("Unpacked %d entries from %s to %s using cache-tree\n",
> +		       nr_entries,
> +		       o->src_index->cache[pos]->name,
> +		       o->src_index->cache[pos + nr_entries - 1]->name);
> +	return 0;
> +}
> +
>   static int traverse_trees_recursive(int n, unsigned long dirmask,
>   				    unsigned long df_conflicts,
>   				    struct name_entry *names,
> @@ -655,6 +751,17 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
>   	void *buf[MAX_UNPACK_TREES];
>   	struct traverse_info newinfo;
>   	struct name_entry *p;
> +	int nr_entries;
> +
> +	nr_entries = all_trees_same_as_cache_tree(n, dirmask, names, info);
> +	if (nr_entries > 0) {
> +		struct unpack_trees_options *o = info->data;
> +		int pos = index_pos_by_traverse_info(names, info);
> +
> +		if (!o->merge || df_conflicts)
> +			BUG("Wrong condition to get here buddy");
> +		return traverse_by_cache_tree(pos, nr_entries, n, names, info);
> +	}
>   
>   	p = names;
>   	while (!p->mode)
> @@ -814,6 +921,11 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info, con
>   	return ce;
>   }
>   
> +/*
> + * Note that traverse_by_cache_tree() duplicates some logic in this funciton

s/funciton/function

> + * without actually calling it. If you change the logic here you may need to
> + * check and change there as well.
> + */
>   static int unpack_nondirectories(int n, unsigned long mask,
>   				 unsigned long dirmask,
>   				 struct cache_entry **src,
> @@ -998,6 +1110,11 @@ static void debug_unpack_callback(int n,
>   		debug_name_entry(i, names + i);
>   }
>   
> +/*
> + * Note that traverse_by_cache_tree() duplicates some logic in this funciton
> + * without actually calling it. If you change the logic here you may need to
> + * check and change there as well.
> + */
>   static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, struct name_entry *names, struct traverse_info *info)
>   {
>   	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
> @@ -1280,7 +1397,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>   	uint64_t start = getnanotime();
>   
>   	if (len > MAX_UNPACK_TREES)
> -		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
> +		die(_("unpack_trees takes at most %d trees"), MAX_UNPACK_TREES);
>   

I'd like to see this get in independently of this patch series.

>   	memset(&el, 0, sizeof(el));
>   	if (!core_apply_sparse_checkout || !o->update)
> 

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 3/4] unpack-trees: reduce malloc in cache-tree walk
  2018-07-29 10:33                         ` [PATCH v2 3/4] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
@ 2018-07-30 20:58                           ` Ben Peart
  0 siblings, 0 replies; 121+ messages in thread
From: Ben Peart @ 2018-07-30 20:58 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: Ben.Peart, git, gitster, peff



On 7/29/2018 6:33 AM, Nguyễn Thái Ngọc Duy wrote:
> This is a micro optimization that probably only shines on repos with
> deep directory structure. Instead of allocating and freeing a new
> cache_entry in every iteration, we reuse the last one and only update
> the parts that are new each iteration.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>   unpack-trees.c | 29 ++++++++++++++++++++---------
>   1 file changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 39566b28fb..c33ebaf001 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -694,6 +694,8 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
>   {
>   	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
>   	struct unpack_trees_options *o = info->data;
> +	struct cache_entry *tree_ce = NULL;
> +	int ce_len = 0;
>   	int i, d;
>   
>   	if (!o->merge)
> @@ -708,30 +710,39 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
>   	 * in the first place.
>   	 */
>   	for (i = 0; i < nr_entries; i++) {
> -		struct cache_entry *tree_ce;
> -		int len, rc;
> +		int new_ce_len, len, rc;
>   
>   		src[0] = o->src_index->cache[pos + i];
>   
>   		len = ce_namelen(src[0]);
> -		tree_ce = xcalloc(1, cache_entry_size(len));
> +		new_ce_len = cache_entry_size(len);
> +
> +		if (new_ce_len > ce_len) {
> +			new_ce_len <<= 1;
> +			tree_ce = xrealloc(tree_ce, new_ce_len);
> +			memset(tree_ce, 0, new_ce_len);
> +			ce_len = new_ce_len;
> +
> +			tree_ce->ce_flags = create_ce_flags(0);
> +
> +			for (d = 1; d <= nr_names; d++)
> +				src[d] = tree_ce;
> +		}

Nice optimization - especially when there are a lot of cache entries and 
large trees.

>   
>   		tree_ce->ce_mode = src[0]->ce_mode;
> -		tree_ce->ce_flags = create_ce_flags(0);
>   		tree_ce->ce_namelen = len;
>   		oidcpy(&tree_ce->oid, &src[0]->oid);
>   		memcpy(tree_ce->name, src[0]->name, len + 1);
>   
> -		for (d = 1; d <= nr_names; d++)
> -			src[d] = tree_ce;
> -
>   		rc = call_unpack_fn((const struct cache_entry * const *)src, o);
> -		free(tree_ce);
> -		if (rc < 0)
> +		if (rc < 0) {
> +			free(tree_ce);
>   			return rc;
> +		}
>   
>   		mark_ce_used(src[0], o);
>   	}
> +	free(tree_ce);
>   	if (o->debug_unpack)
>   		printf("Unpacked %d entries from %s to %s using cache-tree\n",
>   		       nr_entries,
> 

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 0/4] Speed up unpack_trees()
  2018-07-29 10:33                       ` [PATCH v2 0/4] Speed up unpack_trees() Nguyễn Thái Ngọc Duy
                                           ` (4 preceding siblings ...)
  2018-07-30 18:10                         ` [PATCH v2 0/4] Speed up unpack_trees() Ben Peart
@ 2018-07-30 21:04                         ` Ben Peart
  2018-08-04  5:37                         ` [PATCH v3 " Nguyễn Thái Ngọc Duy
  6 siblings, 0 replies; 121+ messages in thread
From: Ben Peart @ 2018-07-30 21:04 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: Ben.Peart, git, gitster, peff



On 7/29/2018 6:33 AM, Nguyễn Thái Ngọc Duy wrote:
> This series speeds up unpack_trees() a bit by using cache-tree.
> unpack-trees could be split into three big parts:
> 
> - the actual tree unpacking and running n-way merging
> - update worktree, which could be expensive depending on how much I/O
>    is involved
> - repair cache-tree
> 
> This series focuses on the first part alone and could give a 700%
> speedup (best-case scenario; real-life ones are probably not that
> impressive).
> 
> It also shows that repairing the cache-tree is kinda expensive. I have
> an idea of reusing the cache-tree from the original index, but I'll leave
> that to Ben or others to try out and see if it helps at all.
> 
> v2 addresses the comments from Junio, adds more performance tracing and
> reduces the cost of adding index entries.
> 
> Nguyễn Thái Ngọc Duy (4):
>    unpack-trees.c: add performance tracing
>    unpack-trees: optimize walking same trees with cache-tree
>    unpack-trees: reduce malloc in cache-tree walk
>    unpack-trees: cheaper index update when walking by cache-tree
> 
>   cache-tree.c   |   2 +
>   cache.h        |   1 +
>   read-cache.c   |   3 +-
>   unpack-trees.c | 161 ++++++++++++++++++++++++++++++++++++++++++++++++-
>   unpack-trees.h |   1 +
>   5 files changed, 166 insertions(+), 2 deletions(-)
> 

I have a limited understanding of this code path so I'm not the best 
person to review this but I didn't see any issues that concerned me.  I 
also was able to run our internal functional and performance tests in 
addition to the git tests and the results were positive.

Ben

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 0/4] Speed up unpack_trees()
  2018-07-30 18:10                         ` [PATCH v2 0/4] Speed up unpack_trees() Ben Peart
@ 2018-07-31 15:31                           ` Duy Nguyen
  2018-07-31 16:50                             ` Ben Peart
  0 siblings, 1 reply; 121+ messages in thread
From: Duy Nguyen @ 2018-07-31 15:31 UTC (permalink / raw)
  To: Ben Peart; +Cc: Ben Peart, Git Mailing List, Junio C Hamano, Jeff King

On Mon, Jul 30, 2018 at 8:10 PM Ben Peart <peartben@gmail.com> wrote:
> I ran "git checkout" on a large repo and averaged the results of 3 runs.
>   This clearly demonstrates the benefit of the optimized unpack_trees()
> as even the final "diff-index" is essentially a 3rd call to unpack_trees().
>
> baseline        new
> ----------------------------------------------------------------------
> 0.535510167     0.556558733     s: read cache .git/index
> 0.3057373       0.3147105       s: initialize name hash
> 0.0184082       0.023558433     s: preload index
> 0.086910967     0.089085967     s: refresh index
> 7.889590767     2.191554433     s: unpack trees
> 0.120760833     0.131941267     s: update worktree after a merge
> 2.2583504       2.572663167     s: repair cache-tree
> 0.8916137       0.959495233     s: write index, changed mask = 28
> 3.405199233     0.2710663       s: unpack trees
> 0.000999667     0.0021554       s: update worktree after a merge
> 3.4063306       0.273318333     s: diff-index
> 16.9524923      9.462943133     s: git command:
> 'c:\git-sdk-64\usr\src\git\git.exe' checkout
>
> The first call to unpack_trees() saves 72%
> The 2nd and 3rd calls save 92%

By the 3rd I guess you meant the "diff-index" line. I think it's the same
as the second call: diff-index triggers the second unpack-trees, but
there's no indentation here, so it's misleading to read this as if
diff-index and unpack-trees executed one after the other.

> Total time savings for the entire command was 44%

Wow.. I guess you have more trees since I could only save 30% on gcc.git.

> In the performance game of whack-a-mole, that call to repair cache-tree
> is now looking quite expensive...

Yeah and I think we can whack that mole too. I did some measurement.
Best case possible, we just need to scan through two indexes (one with
many good cache-tree, one with no cache-tree), compare and copy
cache-tree over. The scanning takes like 1% of the time of the current
repair step and I suspect it's the hashing that takes most of the time.
Of course the real world won't have such nice numbers, but I guess we
could maybe halve cache-tree update/repair time.
-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 0/4] Speed up unpack_trees()
  2018-07-31 15:31                           ` Duy Nguyen
@ 2018-07-31 16:50                             ` Ben Peart
  2018-07-31 17:31                               ` Ben Peart
  0 siblings, 1 reply; 121+ messages in thread
From: Ben Peart @ 2018-07-31 16:50 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Ben Peart, Git Mailing List, Junio C Hamano, Jeff King



On 7/31/2018 11:31 AM, Duy Nguyen wrote:
> On Mon, Jul 30, 2018 at 8:10 PM Ben Peart <peartben@gmail.com> wrote:
>> I ran "git checkout" on a large repo and averaged the results of 3 runs.
>>    This clearly demonstrates the benefit of the optimized unpack_trees()
>> as even the final "diff-index" is essentially a 3rd call to unpack_trees().
>>
>> baseline        new
>> ----------------------------------------------------------------------
>> 0.535510167     0.556558733     s: read cache .git/index
>> 0.3057373       0.3147105       s: initialize name hash
>> 0.0184082       0.023558433     s: preload index
>> 0.086910967     0.089085967     s: refresh index
>> 7.889590767     2.191554433     s: unpack trees
>> 0.120760833     0.131941267     s: update worktree after a merge
>> 2.2583504       2.572663167     s: repair cache-tree
>> 0.8916137       0.959495233     s: write index, changed mask = 28
>> 3.405199233     0.2710663       s: unpack trees
>> 0.000999667     0.0021554       s: update worktree after a merge
>> 3.4063306       0.273318333     s: diff-index
>> 16.9524923      9.462943133     s: git command:
>> 'c:\git-sdk-64\usr\src\git\git.exe' checkout
>>
>> The first call to unpack_trees() saves 72%
>> The 2nd and 3rd calls save 92%
> 
> By the 3rd I guess you meant the "diff-index" line. I think it's the same
> as the second call: diff-index triggers the second unpack-trees, but
> there's no indentation here, so it's misleading to read this as if
> diff-index and unpack-trees executed one after the other.
> 
>> Total time savings for the entire command was 44%
> 
> Wow.. I guess you have more trees since I could only save 30% on gcc.git.

Yes, with over 500K trees, this optimization really pays off for us.  I 
can't wait to see how this works out in the wild (vs my "lab" based 
performance testing).

Thank you!  I definitely owe you lunch. :)

> 
>> In the performance game of whack-a-mole, that call to repair cache-tree
>> is now looking quite expensive...
> 
> Yeah and I think we can whack that mole too. I did some measurement.
> Best case possible, we just need to scan through two indexes (one with
> many good cache-tree, one with no cache-tree), compare and copy
> cache-tree over. The scanning takes like 1% of the time of the current
> repair step and I suspect it's the hashing that takes most of the time.
> Of course the real world won't have such nice numbers, but I guess we
> could maybe halve cache-tree update/repair time.
> 

I have some great profiling tools available so I will take a look at this 
next and see exactly where the time is being spent.

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 0/4] Speed up unpack_trees()
  2018-07-31 16:50                             ` Ben Peart
@ 2018-07-31 17:31                               ` Ben Peart
  2018-08-01 16:38                                 ` Duy Nguyen
  0 siblings, 1 reply; 121+ messages in thread
From: Ben Peart @ 2018-07-31 17:31 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Ben Peart, Git Mailing List, Junio C Hamano, Jeff King



On 7/31/2018 12:50 PM, Ben Peart wrote:
> 
> 
> On 7/31/2018 11:31 AM, Duy Nguyen wrote:

>>
>>> In the performance game of whack-a-mole, that call to repair cache-tree
>>> is now looking quite expensive...
>>
>> Yeah and I think we can whack that mole too. I did some measurement.
>> Best case possible, we just need to scan through two indexes (one with
>> many good cache-tree, one with no cache-tree), compare and copy
>> cache-tree over. The scanning takes like 1% of the time of the current
>> repair step and I suspect it's the hashing that takes most of the time.
>> Of course the real world won't have such nice numbers, but I guess we
>> could maybe halve cache-tree update/repair time.
>>
> 
> I have some great profiling tools available so I will take a look at this 
> next and see exactly where the time is being spent.

Good instincts.  In cache_tree_update, the heavy hitter is definitely 
hash_object_file followed by has_object_file.

Name                               	Inc %	     Inc
+ git!cache_tree_update            	 12.4	   4,935
|+ git!update_one                  	 11.8	   4,706
| + git!update_one                 	 11.8	   4,706
|  + git!hash_object_file          	  6.1	   2,406
|  + git!has_object_file           	  2.0	     813
|  + OTHER <<vcruntime140d!strchr>>	  0.5	     203
|  + git!strbuf_addf               	  0.4	     155
|  + git!strbuf_release            	  0.4	     143
|  + git!strbuf_add                	  0.3	     121
|  + OTHER <<vcruntime140d!memcmp>>	  0.2	      93
|  + git!strbuf_grow               	  0.1	      25
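
That lines up with what update_one() has to do for every invalid
subtree: assemble the raw tree-object payload in a strbuf, hash it, and
probe the ODB for it.  A rough sketch of that inner step (not the
verbatim cache-tree.c code; `mode', `entlen', `path', `entry_oid' and
`it' are illustrative stand-ins):

	struct strbuf buffer = STRBUF_INIT;
	struct object_id oid;

	/* one record per file/subtree: "<octal mode> <name>" NUL <raw oid> */
	strbuf_addf(&buffer, "%o %.*s%c", mode, entlen, path, '\0');
	strbuf_add(&buffer, entry_oid->hash, 20);
	/* ... repeated for each entry in the directory, then: */

	hash_object_file(buffer.buf, buffer.len, tree_type, &oid); /* the 6.1% row */
	if (!has_object_file(&oid))	/* the 2.0% row */
		it->entry_count = -1;	/* WRITE_TREE_REPAIR marks it invalid */

So it's the hash computed over every subtree's payload, plus the object
existence probe, that dominates; the strbuf calls are the smaller rows.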

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 0/4] Speed up unpack_trees()
  2018-07-31 17:31                               ` Ben Peart
@ 2018-08-01 16:38                                 ` Duy Nguyen
  2018-08-08 20:53                                   ` Ben Peart
  0 siblings, 1 reply; 121+ messages in thread
From: Duy Nguyen @ 2018-08-01 16:38 UTC (permalink / raw)
  To: Ben Peart; +Cc: Ben Peart, Git Mailing List, Junio C Hamano, Jeff King

On Tue, Jul 31, 2018 at 01:31:31PM -0400, Ben Peart wrote:
> 
> 
> On 7/31/2018 12:50 PM, Ben Peart wrote:
> > 
> > 
> > On 7/31/2018 11:31 AM, Duy Nguyen wrote:
> 
> >>
> >>> In the performance game of whack-a-mole, that call to repair cache-tree
> >>> is now looking quite expensive...
> >>
> >> Yeah and I think we can whack that mole too. I did some measurement.
> >> Best case possible, we just need to scan through two indexes (one with
> >> many good cache-tree, one with no cache-tree), compare and copy
> >> cache-tree over. The scanning takes like 1% of the time of the current
> >> repair step and I suspect it's the hashing that takes most of the time.
> >> Of course the real world won't have such nice numbers, but I guess we
> >> could maybe halve cache-tree update/repair time.
> >>
> > 
> > I have some great profiling tools available so I will take a look at this 
> > next and see exactly where the time is being spent.
> 
> Good instincts.  In cache_tree_update, the heavy hitter is definitely 
> hash_object_file followed by has_object_file.
> 
> Name                               	Inc %	     Inc
> + git!cache_tree_update            	 12.4	   4,935
> |+ git!update_one                  	 11.8	   4,706
> | + git!update_one                 	 11.8	   4,706
> |  + git!hash_object_file          	  6.1	   2,406
> |  + git!has_object_file           	  2.0	     813
> |  + OTHER <<vcruntime140d!strchr>>	  0.5	     203
> |  + git!strbuf_addf               	  0.4	     155
> |  + git!strbuf_release            	  0.4	     143
> |  + git!strbuf_add                	  0.3	     121
> |  + OTHER <<vcruntime140d!memcmp>>	  0.2	      93
> |  + git!strbuf_grow               	  0.1	      25

Ben, if you work on this, this could be a good starting point. I will
not work on this because I still have some other things to catch up on
and follow through. You can have my sign-off if you reuse something
from this patch.

Even if it's a naive implementation, the initial numbers look pretty
good. Without the patch we have

18:31:05.970621 unpack-trees.c:1437     performance: 0.000001029 s: copy
18:31:05.975729 unpack-trees.c:1444     performance: 0.005082004 s: update

And with the patch

18:31:13.295655 unpack-trees.c:1437     performance: 0.000198017 s: copy
18:31:13.296757 unpack-trees.c:1444     performance: 0.001075935 s: update

Time saving is about 80% by the look of this (best possible case
because only the top tree needs to be hashed and written out).
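
(That is, 0.005082 s before vs. 0.000198 s copy + 0.001076 s update ~=
0.001274 s after; comparing just the "update" rows gives (0.005082 -
0.001076) / 0.005082 ~= 79%.)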

-- 8< --
diff --git a/cache-tree.c b/cache-tree.c
index 6b46711996..67a4a93100 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -440,6 +440,147 @@ int cache_tree_update(struct index_state *istate, int flags)
 	return 0;
 }
 
+static int same(const struct cache_entry *a, const struct cache_entry *b)
+{
+	if (ce_stage(a) || ce_stage(b))
+		return 0;
+	if ((a->ce_flags | b->ce_flags) & CE_CONFLICTED)
+		return 0;
+	return a->ce_mode == b->ce_mode &&
+	       !oidcmp(&a->oid, &b->oid);
+}
+
+static int cache_tree_name_pos(const struct index_state *istate,
+			       const struct strbuf *path)
+{
+	int pos;
+
+	if (!path->len)
+		return 0;
+
+	pos = index_name_pos(istate, path->buf, path->len);
+	if (pos >= 0)
+		BUG("No no no, directory path must not exist in index");
+	return -pos - 1;
+}
+
+/*
+ * Locate the same cache-tree in two separate indexes. Check that the
+ * cache-tree is still valid for the "to" index (i.e. that it contains
+ * the same set of entries as the "from" index).
+ */
+static int verify_one_cache_tree(const struct index_state *to,
+				 const struct index_state *from,
+				 const struct cache_tree *it,
+				 const struct strbuf *path)
+{
+	int i, spos, dpos;
+
+	spos = cache_tree_name_pos(from, path);
+	if (spos + it->entry_count > from->cache_nr)
+		return -1;
+
+	dpos = cache_tree_name_pos(to, path);
+	if (dpos + it->entry_count > to->cache_nr)
+		return -1;
+
+	/* Can we quickly check head and tail and bail out early? */
+	if (!same(from->cache[spos], to->cache[dpos]) ||
+	    !same(from->cache[spos + it->entry_count - 1],
+		  to->cache[dpos + it->entry_count - 1]))
+		return -1;
+
+	for (i = 1; i < it->entry_count - 1; i++)
+		if (!same(from->cache[spos + i],
+			  to->cache[dpos + i]))
+			return -1;
+
+	return 0;
+}
+
+static int verify_and_invalidate(struct index_state *to,
+				 const struct index_state *from,
+				 struct cache_tree *it,
+				 struct strbuf *path)
+{
+	/*
+	 * Optimistically verify the current tree first. Alternatively
+	 * we could verify all the subtrees first then do this
+	 * last. Any invalid subtree would also invalidate its
+	 * ancestors.
+	 */
+	if (it->entry_count != -1 &&
+	    verify_one_cache_tree(to, from, it, path))
+		it->entry_count = -1;
+
+	/*
+	 * If the current tree is valid, don't bother checking
+	 * inside. All subtrees _should_ also be valid
+	 */
+	if (it->entry_count == -1) {
+		int i, len = path->len;
+
+		for (i = 0; i < it->subtree_nr; i++) {
+			struct cache_tree_sub *down = it->down[i];
+
+			if (!down || !down->cache_tree)
+				continue;
+
+			strbuf_setlen(path, len);
+			strbuf_add(path, down->name, down->namelen);
+			strbuf_addch(path, '/');
+			if (verify_and_invalidate(to, from,
+						  down->cache_tree, path))
+				return -1;
+		}
+		strbuf_setlen(path, len);
+	}
+	return 0;
+}
+
+static struct cache_tree *duplicate_cache_tree(const struct cache_tree *src)
+{
+	struct cache_tree *dst;
+	int i;
+
+	if (!src)
+		return NULL;
+
+	dst = xmalloc(sizeof(*dst));
+	dst->entry_count = src->entry_count;
+	oidcpy(&dst->oid, &src->oid);
+	dst->subtree_nr = src->subtree_nr;
+	dst->subtree_alloc = dst->subtree_nr;
+	ALLOC_ARRAY(dst->down, dst->subtree_alloc);
+	for (i = 0; i < src->subtree_nr; i++) {
+		struct cache_tree_sub *dsrc = src->down[i];
+		struct cache_tree_sub *down;
+
+		FLEX_ALLOC_MEM(down, name, dsrc->name, dsrc->namelen);
+		down->count = dsrc->count;
+		down->namelen = dsrc->namelen;
+		down->used = dsrc->used;
+		down->cache_tree = duplicate_cache_tree(dsrc->cache_tree);
+		dst->down[i] = down;
+	}
+	return dst;
+}
+
+int cache_tree_copy(struct index_state *to, const struct index_state *from)
+{
+	struct cache_tree *it = duplicate_cache_tree(from->cache_tree);
+	struct strbuf path = STRBUF_INIT;
+	int ret;
+
+	if (to->cache_tree)
+		BUG("Sorry merging cache-tree is not supported yet");
+	ret = verify_and_invalidate(to, from, it, &path);
+	to->cache_tree = it;
+	to->cache_changed |= CACHE_TREE_CHANGED;
+	strbuf_release(&path);
+	return ret;
+}
+
 static void write_one(struct strbuf *buffer, struct cache_tree *it,
                       const char *path, int pathlen)
 {
diff --git a/cache-tree.h b/cache-tree.h
index cfd5328cc9..6981da8e0d 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -53,4 +53,6 @@ void prime_cache_tree(struct index_state *, struct tree *);
 
 extern int cache_tree_matches_traversal(struct cache_tree *, struct name_entry *ent, struct traverse_info *info);
 
+int cache_tree_copy(struct index_state *to, const struct index_state *from);
+
 #endif
diff --git a/unpack-trees.c b/unpack-trees.c
index cd0680f11e..cb3fdd42a6 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1427,12 +1427,22 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	ret = check_updates(o) ? (-2) : 0;
 	if (o->dst_index) {
 		if (!ret) {
-			if (!o->result.cache_tree)
+			if (!o->result.cache_tree) {
+				uint64_t start = getnanotime();
+#if 0
 				o->result.cache_tree = cache_tree();
-			if (!cache_tree_fully_valid(o->result.cache_tree))
+#else
+				cache_tree_copy(&o->result, o->src_index);
+#endif
+				trace_performance_since(start, "copy");
+			}
+			if (!cache_tree_fully_valid(o->result.cache_tree)) {
+				uint64_t start = getnanotime();
 				cache_tree_update(&o->result,
 						  WRITE_TREE_SILENT |
 						  WRITE_TREE_REPAIR);
+				trace_performance_since(start, "update");
+			}
 		}
 		move_index_extensions(&o->result, o->src_index);
 		discard_index(o->dst_index);
-- 8< --
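
To spell out the shape of the unpack-trees.c hunk above: instead of
starting o->result from an empty cache_tree() (the now-"#if 0" branch),
the result index receives a verified copy of the source index's
cache-tree, so the cache_tree_fully_valid()/cache_tree_update() pass
that follows only re-hashes whatever verify_and_invalidate() marked
invalid instead of rebuilding every tree from scratch.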

^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH v3 0/4] Speed up unpack_trees()
  2018-07-29 10:33                       ` [PATCH v2 0/4] Speed up unpack_trees() Nguyễn Thái Ngọc Duy
                                           ` (5 preceding siblings ...)
  2018-07-30 21:04                         ` Ben Peart
@ 2018-08-04  5:37                         ` Nguyễn Thái Ngọc Duy
  2018-08-04  5:37                           ` [PATCH v3 1/4] unpack-trees: add performance tracing Nguyễn Thái Ngọc Duy
                                             ` (5 more replies)
  6 siblings, 6 replies; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-04  5:37 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, peartben, peff

This is a minor update to address Ben's comments and add his
measurements in the commit message of 2/4 for the record.

I've also checked the lookahead logic in unpack_trees() to see
if we accidentally break something there, which is my biggest worry.
See [1] and [2] for context, but I believe since we can't have D/F
conflicts, the situation where lookahead is needed will not occur. So
we should be safe.

[1] da165f470e (unpack-trees.c: prepare for looking ahead in the index - 2010-01-07)
[2] 730f72840c (unpack-trees.c: look ahead in the index - 2009-09-20)

range-diff:

1:  789f7e2872 ! 1:  05eb762d2d unpack-trees.c: add performance tracing
    @@ -1,6 +1,6 @@
     Author: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
     
    -    unpack-trees.c: add performance tracing
    +    unpack-trees: add performance tracing
     
         We're going to optimize unpack_trees() a bit in the following
         patches. Let's add some tracing to measure how long it takes before
2:  589bed1366 ! 2:  02286ad123 unpack-trees: optimize walking same trees with cache-tree
    @@ -32,6 +32,24 @@
             0.111793866   0.032933140 s: diff-index
             0.587933288   0.398924370 s: git command: /home/pclouds/w/git/git
     
    +    Another measurement from Ben's running "git checkout" with over 500k
    +    trees (on the whole series):
    +
    +        baseline        new
    +      ----------------------------------------------------------------------
    +        0.535510167     0.556558733     s: read cache .git/index
    +        0.3057373       0.3147105       s: initialize name hash
    +        0.0184082       0.023558433     s: preload index
    +        0.086910967     0.089085967     s: refresh index
    +        7.889590767     2.191554433     s: unpack trees
    +        0.120760833     0.131941267     s: update worktree after a merge
    +        2.2583504       2.572663167     s: repair cache-tree
    +        0.8916137       0.959495233     s: write index, changed mask = 28
    +        3.405199233     0.2710663       s: unpack trees
    +        0.000999667     0.0021554       s: update worktree after a merge
    +        3.4063306       0.273318333     s: diff-index
    +        16.9524923      9.462943133     s: git command: git.exe checkout
    +
         This command calls unpack_trees() twice, the first time on 2way merge
         and the second 1way merge. In both times, "unpack trees" time is
         reduced to one third. Overall time reduction is not that impressive of
    @@ -39,7 +57,6 @@
         repair cache-tree line.
     
         Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
     diff --git a/unpack-trees.c b/unpack-trees.c
     --- a/unpack-trees.c
    @@ -170,7 +187,7 @@
      }
      
     +/*
    -+ * Note that traverse_by_cache_tree() duplicates some logic in this funciton
    ++ * Note that traverse_by_cache_tree() duplicates some logic in this function
     + * without actually calling it. If you change the logic here you may need to
     + * check and change there as well.
     + */
    @@ -189,12 +206,3 @@
      static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, struct name_entry *names, struct traverse_info *info)
      {
      	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
    -@@
    - 	uint64_t start = getnanotime();
    - 
    - 	if (len > MAX_UNPACK_TREES)
    --		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
    -+		die(_("unpack_trees takes at most %d trees"), MAX_UNPACK_TREES);
    - 
    - 	memset(&el, 0, sizeof(el));
    - 	if (!core_apply_sparse_checkout || !o->update)
3:  7c6f863fc0 = 3:  c87b82ffee unpack-trees: reduce malloc in cache-tree walk
4:  6ca17b1138 ! 4:  e791cdfc82 unpack-trees: cheaper index update when walking by cache-tree
    @@ -40,7 +40,6 @@
         attempt should be on that "repair cache-tree" line.
     
         Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
     diff --git a/cache.h b/cache.h
     --- a/cache.h
    @@ -119,20 +118,6 @@
      	free(tree_ce);
      	if (o->debug_unpack)
      		printf("Unpacked %d entries from %s to %s using cache-tree\n",
    -@@
    - 		if (!ret) {
    - 			if (!o->result.cache_tree)
    - 				o->result.cache_tree = cache_tree();
    -+			/*
     -+			 * TODO: Walk o->src_index->cache_tree, quickly check
     -+			 * if o->result.cache has the exact same content for
     -+			 * any valid cache-tree in o->src_index, then we can
    -+			 * just copy the cache-tree over instead of hashing a
    -+			 * new tree object.
    -+			 */
    - 			if (!cache_tree_fully_valid(o->result.cache_tree))
    - 				cache_tree_update(&o->result,
    - 						  WRITE_TREE_SILENT |
     
     diff --git a/unpack-trees.h b/unpack-trees.h
     --- a/unpack-trees.h

Nguyễn Thái Ngọc Duy (4):
  unpack-trees: add performance tracing
  unpack-trees: optimize walking same trees with cache-tree
  unpack-trees: reduce malloc in cache-tree walk
  unpack-trees: cheaper index update when walking by cache-tree

 cache-tree.c   |   2 +
 cache.h        |   1 +
 read-cache.c   |   3 +-
 unpack-trees.c | 152 +++++++++++++++++++++++++++++++++++++++++++++++++
 unpack-trees.h |   1 +
 5 files changed, 158 insertions(+), 1 deletion(-)

-- 
2.18.0.656.gda699b98b3

^ permalink raw reply	[flat|nested] 121+ messages in thread

* [PATCH v3 1/4] unpack-trees: add performance tracing
  2018-08-04  5:37                         ` [PATCH v3 " Nguyễn Thái Ngọc Duy
@ 2018-08-04  5:37                           ` Nguyễn Thái Ngọc Duy
  2018-08-04  5:37                           ` [PATCH v3 2/4] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
                                             ` (4 subsequent siblings)
  5 siblings, 0 replies; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-04  5:37 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, peartben, peff

We're going to optimize unpack_trees() a bit in the following
patches. Let's add some tracing to measure how long it takes before
and after. This is the baseline ("git checkout -" on gcc.git, 80k
files on worktree)

    0.018239226 s: read cache .git/index
    0.052541655 s: preload index
    0.001537598 s: refresh index
    0.168167768 s: unpack trees
    0.002897186 s: update worktree after a merge
    0.131661745 s: repair cache-tree
    0.075389117 s: write index, changed mask = 2a
    0.111702023 s: unpack trees
    0.000023245 s: update worktree after a merge
    0.111793866 s: diff-index
    0.587933288 s: git command: /home/pclouds/w/git/git checkout -

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 cache-tree.c   | 2 ++
 unpack-trees.c | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/cache-tree.c b/cache-tree.c
index 6b46711996..0dbe10fc85 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -426,6 +426,7 @@ static int update_one(struct cache_tree *it,
 
 int cache_tree_update(struct index_state *istate, int flags)
 {
+	uint64_t start = getnanotime();
 	struct cache_tree *it = istate->cache_tree;
 	struct cache_entry **cache = istate->cache;
 	int entries = istate->cache_nr;
@@ -437,6 +438,7 @@ int cache_tree_update(struct index_state *istate, int flags)
 	if (i < 0)
 		return i;
 	istate->cache_changed |= CACHE_TREE_CHANGED;
+	trace_performance_since(start, "repair cache-tree");
 	return 0;
 }
 
diff --git a/unpack-trees.c b/unpack-trees.c
index cd0680f11e..a32ddee159 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -352,6 +352,7 @@ static int check_updates(struct unpack_trees_options *o)
 	struct progress *progress = NULL;
 	struct index_state *index = &o->result;
 	struct checkout state = CHECKOUT_INIT;
+	uint64_t start = getnanotime();
 	int i;
 
 	state.force = 1;
@@ -423,6 +424,7 @@ static int check_updates(struct unpack_trees_options *o)
 	errs |= finish_delayed_checkout(&state);
 	if (o->update)
 		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
+	trace_performance_since(start, "update worktree after a merge");
 	return errs != 0;
 }
 
@@ -1275,6 +1277,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	int i, ret;
 	static struct cache_entry *dfc;
 	struct exclude_list el;
+	uint64_t start = getnanotime();
 
 	if (len > MAX_UNPACK_TREES)
 		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
@@ -1423,6 +1426,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 			goto done;
 		}
 	}
+	trace_performance_since(start, "unpack trees");
 
 	ret = check_updates(o) ? (-2) : 0;
 	if (o->dst_index) {
-- 
2.18.0.656.gda699b98b3
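
For anyone wanting to reproduce numbers like the ones above:
trace_performance_since() reports through git's performance tracing,
so with a build that has this patch applied, something like

	$ GIT_TRACE_PERFORMANCE=1 git checkout -

should print the new rows to stderr (GIT_TRACE_PERFORMANCE also accepts
an absolute path to send the trace to a file instead).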


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH v3 2/4] unpack-trees: optimize walking same trees with cache-tree
  2018-08-04  5:37                         ` [PATCH v3 " Nguyễn Thái Ngọc Duy
  2018-08-04  5:37                           ` [PATCH v3 1/4] unpack-trees: add performance tracing Nguyễn Thái Ngọc Duy
@ 2018-08-04  5:37                           ` Nguyễn Thái Ngọc Duy
  2018-08-08 18:23                             ` Elijah Newren
  2018-08-04  5:37                           ` [PATCH v3 3/4] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
                                             ` (3 subsequent siblings)
  5 siblings, 1 reply; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-04  5:37 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, peartben, peff

From: Duy Nguyen <pclouds@gmail.com>

In order to merge one or many trees with the index, unpack-trees code
walks multiple trees in parallel with the index and performs an n-way
merge. If we find out at the start of a directory that all trees are the
same (by comparing OID) and cache-tree happens to be available for
that directory as well, we could avoid walking the trees because we
already know what these trees contain: it's flattened in what's called
"the index".

The upside is of course a lot less I/O since we can potentially skip
lots of trees (think subtrees). We also save CPU because we don't have
to inflate and then apply deltas. The downside is of course more
fragile code since the logic in some functions is now duplicated
elsewhere.

"checkout -" with this patch on gcc.git:

    baseline      new
  --------------------------------------------------------------------
    0.018239226   0.019365414 s: read cache .git/index
    0.052541655   0.049605548 s: preload index
    0.001537598   0.001571695 s: refresh index
    0.168167768   0.049677212 s: unpack trees
    0.002897186   0.002845256 s: update worktree after a merge
    0.131661745   0.136597522 s: repair cache-tree
    0.075389117   0.075422517 s: write index, changed mask = 2a
    0.111702023   0.032813253 s: unpack trees
    0.000023245   0.000022002 s: update worktree after a merge
    0.111793866   0.032933140 s: diff-index
    0.587933288   0.398924370 s: git command: /home/pclouds/w/git/git

Another measurement from Ben's running "git checkout" with over 500k
trees (on the whole series):

    baseline        new
  ----------------------------------------------------------------------
    0.535510167     0.556558733     s: read cache .git/index
    0.3057373       0.3147105       s: initialize name hash
    0.0184082       0.023558433     s: preload index
    0.086910967     0.089085967     s: refresh index
    7.889590767     2.191554433     s: unpack trees
    0.120760833     0.131941267     s: update worktree after a merge
    2.2583504       2.572663167     s: repair cache-tree
    0.8916137       0.959495233     s: write index, changed mask = 28
    3.405199233     0.2710663       s: unpack trees
    0.000999667     0.0021554       s: update worktree after a merge
    3.4063306       0.273318333     s: diff-index
    16.9524923      9.462943133     s: git command: git.exe checkout

This command calls unpack_trees() twice, first doing a 2-way merge
and then a 1-way merge. Both times, "unpack trees" time is
reduced to one third. Overall time reduction is not that impressive of
course because index operations take a big chunk. And there's that
repair cache-tree line.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 unpack-trees.c | 117 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 117 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index a32ddee159..ba3d2e947e 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -644,6 +644,102 @@ static inline int are_same_oid(struct name_entry *name_j, struct name_entry *nam
 	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
 }
 
+static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,
+					struct name_entry *names,
+					struct traverse_info *info)
+{
+	struct unpack_trees_options *o = info->data;
+	int i;
+
+	if (!o->merge || dirmask != ((1 << n) - 1))
+		return 0;
+
+	for (i = 1; i < n; i++)
+		if (!are_same_oid(names, names + i))
+			return 0;
+
+	return cache_tree_matches_traversal(o->src_index->cache_tree, names, info);
+}
+
+static int index_pos_by_traverse_info(struct name_entry *names,
+				      struct traverse_info *info)
+{
+	struct unpack_trees_options *o = info->data;
+	int len = traverse_path_len(info, names);
+	char *name = xmalloc(len + 1 /* slash */ + 1 /* NUL */);
+	int pos;
+
+	make_traverse_path(name, info, names);
+	name[len++] = '/';
+	name[len] = '\0';
+	pos = index_name_pos(o->src_index, name, len);
+	if (pos >= 0)
+		BUG("This is a directory and should not exist in index");
+	pos = -pos - 1;
+	if (!starts_with(o->src_index->cache[pos]->name, name) ||
+	    (pos > 0 && starts_with(o->src_index->cache[pos-1]->name, name)))
+		BUG("pos must point at the first entry in this directory");
+	free(name);
+	return pos;
+}
+
+/*
+ * Fast path if we detect that all trees are the same as cache-tree at this
+ * path. We'll walk these trees recursively using cache-tree/index instead of
+ * ODB since we already know what these trees contain.
+ */
+static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
+				  struct name_entry *names,
+				  struct traverse_info *info)
+{
+	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
+	struct unpack_trees_options *o = info->data;
+	int i, d;
+
+	if (!o->merge)
+		BUG("We need cache-tree to do this optimization");
+
+	/*
+	 * Do what unpack_callback() and unpack_nondirectories() normally
+	 * do. But we walk all paths recursively in just one loop instead.
+	 *
+	 * D/F conflicts and staged entries are not a concern because
+	 * cache-tree would be invalidated and we would never get here
+	 * in the first place.
+	 */
+	for (i = 0; i < nr_entries; i++) {
+		struct cache_entry *tree_ce;
+		int len, rc;
+
+		src[0] = o->src_index->cache[pos + i];
+
+		len = ce_namelen(src[0]);
+		tree_ce = xcalloc(1, cache_entry_size(len));
+
+		tree_ce->ce_mode = src[0]->ce_mode;
+		tree_ce->ce_flags = create_ce_flags(0);
+		tree_ce->ce_namelen = len;
+		oidcpy(&tree_ce->oid, &src[0]->oid);
+		memcpy(tree_ce->name, src[0]->name, len + 1);
+
+		for (d = 1; d <= nr_names; d++)
+			src[d] = tree_ce;
+
+		rc = call_unpack_fn((const struct cache_entry * const *)src, o);
+		free(tree_ce);
+		if (rc < 0)
+			return rc;
+
+		mark_ce_used(src[0], o);
+	}
+	if (o->debug_unpack)
+		printf("Unpacked %d entries from %s to %s using cache-tree\n",
+		       nr_entries,
+		       o->src_index->cache[pos]->name,
+		       o->src_index->cache[pos + nr_entries - 1]->name);
+	return 0;
+}
+
 static int traverse_trees_recursive(int n, unsigned long dirmask,
 				    unsigned long df_conflicts,
 				    struct name_entry *names,
@@ -655,6 +751,17 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
 	void *buf[MAX_UNPACK_TREES];
 	struct traverse_info newinfo;
 	struct name_entry *p;
+	int nr_entries;
+
+	nr_entries = all_trees_same_as_cache_tree(n, dirmask, names, info);
+	if (nr_entries > 0) {
+		struct unpack_trees_options *o = info->data;
+		int pos = index_pos_by_traverse_info(names, info);
+
+		if (!o->merge || df_conflicts)
+			BUG("Wrong condition to get here buddy");
+		return traverse_by_cache_tree(pos, nr_entries, n, names, info);
+	}
 
 	p = names;
 	while (!p->mode)
@@ -814,6 +921,11 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info, con
 	return ce;
 }
 
+/*
+ * Note that traverse_by_cache_tree() duplicates some logic in this function
+ * without actually calling it. If you change the logic here you may need to
+ * check and change there as well.
+ */
 static int unpack_nondirectories(int n, unsigned long mask,
 				 unsigned long dirmask,
 				 struct cache_entry **src,
@@ -998,6 +1110,11 @@ static void debug_unpack_callback(int n,
 		debug_name_entry(i, names + i);
 }
 
+/*
+ * Note that traverse_by_cache_tree() duplicates some logic in this function
+ * without actually calling it. If you change the logic here you may need to
+ * check and change there as well.
+ */
 static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, struct name_entry *names, struct traverse_info *info)
 {
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
-- 
2.18.0.656.gda699b98b3
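
One detail that may help review: the nr_entries used above comes from
cache_tree_matches_traversal(), which looks up the cache-tree subtree
for the traversal path and returns its entry_count when the subtree is
valid and its OID matches the tree being walked, and 0 otherwise.  That
entry_count is the number of consecutive index entries the directory
covers, which is why the "nr_entries > 0" test in
traverse_trees_recursive() is enough to take the fast path and why
traverse_by_cache_tree() can simply iterate over the source index from
pos to pos + nr_entries - 1.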


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH v3 3/4] unpack-trees: reduce malloc in cache-tree walk
  2018-08-04  5:37                         ` [PATCH v3 " Nguyễn Thái Ngọc Duy
  2018-08-04  5:37                           ` [PATCH v3 1/4] unpack-trees: add performance tracing Nguyễn Thái Ngọc Duy
  2018-08-04  5:37                           ` [PATCH v3 2/4] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
@ 2018-08-04  5:37                           ` Nguyễn Thái Ngọc Duy
  2018-08-08 18:30                             ` Elijah Newren
  2018-08-04  5:37                           ` [PATCH v3 4/4] unpack-trees: cheaper index update when walking by cache-tree Nguyễn Thái Ngọc Duy
                                             ` (2 subsequent siblings)
  5 siblings, 1 reply; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-04  5:37 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, peartben, peff

This is a micro optimization that probably only shines on repos with
deep directory structure. Instead of allocating and freeing a new
cache_entry in every iteration, we reuse the last one and only update
the parts that are new each iteration.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 unpack-trees.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index ba3d2e947e..c8defc2015 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -694,6 +694,8 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 {
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
+	struct cache_entry *tree_ce = NULL;
+	int ce_len = 0;
 	int i, d;
 
 	if (!o->merge)
@@ -708,30 +710,39 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 	 * in the first place.
 	 */
 	for (i = 0; i < nr_entries; i++) {
-		struct cache_entry *tree_ce;
-		int len, rc;
+		int new_ce_len, len, rc;
 
 		src[0] = o->src_index->cache[pos + i];
 
 		len = ce_namelen(src[0]);
-		tree_ce = xcalloc(1, cache_entry_size(len));
+		new_ce_len = cache_entry_size(len);
+
+		if (new_ce_len > ce_len) {
+			new_ce_len <<= 1;
+			tree_ce = xrealloc(tree_ce, new_ce_len);
+			memset(tree_ce, 0, new_ce_len);
+			ce_len = new_ce_len;
+
+			tree_ce->ce_flags = create_ce_flags(0);
+
+			for (d = 1; d <= nr_names; d++)
+				src[d] = tree_ce;
+		}
 
 		tree_ce->ce_mode = src[0]->ce_mode;
-		tree_ce->ce_flags = create_ce_flags(0);
 		tree_ce->ce_namelen = len;
 		oidcpy(&tree_ce->oid, &src[0]->oid);
 		memcpy(tree_ce->name, src[0]->name, len + 1);
 
-		for (d = 1; d <= nr_names; d++)
-			src[d] = tree_ce;
-
 		rc = call_unpack_fn((const struct cache_entry * const *)src, o);
-		free(tree_ce);
-		if (rc < 0)
+		if (rc < 0) {
+			free(tree_ce);
 			return rc;
+		}
 
 		mark_ce_used(src[0], o);
 	}
+	free(tree_ce);
 	if (o->debug_unpack)
 		printf("Unpacked %d entries from %s to %s using cache-tree\n",
 		       nr_entries,
-- 
2.18.0.656.gda699b98b3
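
A note on the growth strategy above: the scratch entry is reallocated
only when a longer name shows up, and "new_ce_len <<= 1" doubles the
request each time, so the number of xrealloc() calls is bounded by the
logarithm of the longest path name instead of being one allocation and
free per entry as before.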


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH v3 4/4] unpack-trees: cheaper index update when walking by cache-tree
  2018-08-04  5:37                         ` [PATCH v3 " Nguyễn Thái Ngọc Duy
                                             ` (2 preceding siblings ...)
  2018-08-04  5:37                           ` [PATCH v3 3/4] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
@ 2018-08-04  5:37                           ` Nguyễn Thái Ngọc Duy
  2018-08-06 15:48                           ` [PATCH v3 0/4] Speed up unpack_trees() Junio C Hamano
  2018-08-12  8:15                           ` [PATCH v4 0/5] " Nguyễn Thái Ngọc Duy
  5 siblings, 0 replies; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-04  5:37 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, peartben, peff

With the new cache-tree, we can mostly avoid I/O (due to odb access) and
the code mostly becomes a loop of "check this, check that, add the
entry to the index". We can skip a couple of checks in this giant loop
to go faster:

- We know here that we're copying entries from the source index to the
  result one. All paths in the source index must have been validated
  at load time already (and we're not taking strange paths from tree
  objects) which means we can skip verify_path() without compromise.

- We also know that D/F conflicts can't happen for all these entries
  (since cache-tree and all the trees are the same) so we can skip
  that as well.

This gives rather nice speedups for the "unpack trees" rows, where
"unpack trees" time is now cut in half compared to the numbers right
after traverse_by_cache_tree() was added, or down to 1/7 of the
original "unpack trees" time.

   baseline      cache-tree    this patch
 --------------------------------------------------------------------
   0.018239226   0.019365414   0.020519621 s: read cache .git/index
   0.052541655   0.049605548   0.048814384 s: preload index
   0.001537598   0.001571695   0.001575382 s: refresh index
   0.168167768   0.049677212   0.024719308 s: unpack trees
   0.002897186   0.002845256   0.002805555 s: update worktree after a merge
   0.131661745   0.136597522   0.134891617 s: repair cache-tree
   0.075389117   0.075422517   0.074832291 s: write index, changed mask = 2a
   0.111702023   0.032813253   0.008616479 s: unpack trees
   0.000023245   0.000022002   0.000026630 s: update worktree after a merge
   0.111793866   0.032933140   0.008714071 s: diff-index
   0.587933288   0.398924370   0.380452871 s: git command: /home/pclouds/w/git/git

The total saving of this new patch looks even less impressive, now that
the time spent in unpacking trees is so small, which is why the next
attempt should be on that "repair cache-tree" line.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h        |  1 +
 read-cache.c   |  3 ++-
 unpack-trees.c | 20 ++++++++++++++++++++
 unpack-trees.h |  1 +
 4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index 8b447652a7..e6f7ee4b64 100644
--- a/cache.h
+++ b/cache.h
@@ -673,6 +673,7 @@ extern int index_name_pos(const struct index_state *, const char *name, int name
 #define ADD_CACHE_JUST_APPEND 8		/* Append only; tree.c::read_tree() */
 #define ADD_CACHE_NEW_ONLY 16		/* Do not replace existing ones */
 #define ADD_CACHE_KEEP_CACHE_TREE 32	/* Do not invalidate cache-tree */
+#define ADD_CACHE_SKIP_VERIFY_PATH 64	/* Do not verify path */
 extern int add_index_entry(struct index_state *, struct cache_entry *ce, int option);
 extern void rename_index_entry_at(struct index_state *, int pos, const char *new_name);
 
diff --git a/read-cache.c b/read-cache.c
index e865254bea..b0b5df5de7 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1170,6 +1170,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 	int ok_to_add = option & ADD_CACHE_OK_TO_ADD;
 	int ok_to_replace = option & ADD_CACHE_OK_TO_REPLACE;
 	int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
+	int skip_verify_path = option & ADD_CACHE_SKIP_VERIFY_PATH;
 	int new_only = option & ADD_CACHE_NEW_ONLY;
 
 	if (!(option & ADD_CACHE_KEEP_CACHE_TREE))
@@ -1210,7 +1211,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 
 	if (!ok_to_add)
 		return -1;
-	if (!verify_path(ce->name, ce->ce_mode))
+	if (!skip_verify_path && !verify_path(ce->name, ce->ce_mode))
 		return error("Invalid path '%s'", ce->name);
 
 	if (!skip_df_check &&
diff --git a/unpack-trees.c b/unpack-trees.c
index c8defc2015..1438ee1555 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -201,6 +201,7 @@ static int do_add_entry(struct unpack_trees_options *o, struct cache_entry *ce,
 
 	ce->ce_flags = (ce->ce_flags & ~clear) | set;
 	return add_index_entry(&o->result, ce,
+			       o->extra_add_index_flags |
 			       ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
 }
 
@@ -701,6 +702,24 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 	if (!o->merge)
 		BUG("We need cache-tree to do this optimization");
 
+	/*
+	 * Try to keep add_index_entry() as fast as possible since
+	 * we're going to do a lot of them.
+	 *
+	 * Skipping verify_path() should totally be safe because these
+	 * paths are from the source index, which must have been
+	 * verified.
+	 *
+	 * Skipping D/F and cache-tree validation checks is trickier
+	 * because it assumes what n-merge code would do when all
+	 * trees and the index are the same. We probably could just
+	 * optimize those code instead (e.g. we don't invalidate that
+	 * many cache-tree, but the searching for them is very
+	 * expensive).
+	 */
+	o->extra_add_index_flags = ADD_CACHE_SKIP_DFCHECK;
+	o->extra_add_index_flags |= ADD_CACHE_SKIP_VERIFY_PATH;
+
 	/*
 	 * Do what unpack_callback() and unpack_nondirectories() normally
 	 * do. But we walk all paths recursively in just one loop instead.
@@ -742,6 +761,7 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 
 		mark_ce_used(src[0], o);
 	}
+	o->extra_add_index_flags = 0;
 	free(tree_ce);
 	if (o->debug_unpack)
 		printf("Unpacked %d entries from %s to %s using cache-tree\n",
diff --git a/unpack-trees.h b/unpack-trees.h
index c2b434c606..94e1b14078 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -80,6 +80,7 @@ struct unpack_trees_options {
 	struct index_state result;
 
 	struct exclude_list *el; /* for internal use */
+	unsigned int extra_add_index_flags;
 };
 
 extern int unpack_trees(unsigned n, struct tree_desc *t,
-- 
2.18.0.656.gda699b98b3


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* Re: [PATCH v3 0/4] Speed up unpack_trees()
  2018-08-04  5:37                         ` [PATCH v3 " Nguyễn Thái Ngọc Duy
                                             ` (3 preceding siblings ...)
  2018-08-04  5:37                           ` [PATCH v3 4/4] unpack-trees: cheaper index update when walking by cache-tree Nguyễn Thái Ngọc Duy
@ 2018-08-06 15:48                           ` Junio C Hamano
  2018-08-06 15:59                             ` Duy Nguyen
  2018-08-12  8:15                           ` [PATCH v4 0/5] " Nguyễn Thái Ngọc Duy
  5 siblings, 1 reply; 121+ messages in thread
From: Junio C Hamano @ 2018-08-06 15:48 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: Ben.Peart, git, peartben, peff

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> This is a minor update to address Ben's comments and add his
> measurements in the commit message of 2/4 for the record.

Yay.

> I've also checked about the lookahead thing in unpack_trees() to see
> if we accidentally break something there, which is my biggest worry.
> See [1] and [2] for context, but I believe since we can't have D/F
> conflicts, the situation where lookahead is needed will not occur. So
> we should be safe.

Isn't this about branch switching, where the currently checked out
branch may have a regular file 't' and checking out another branch
that has directory 't' in it (or vice versa, possibly with the index
having either a regular file 't' or requiring 't' to be a directory
by having a blob 't/1' in it)?  The log message of [1] talks about
walking three trees together with the index, but even if we limit
ourselves to a two-tree walk, I do not think that the picture
fundamentally changes.  So I am not sure how we can confidently say
"we can't have D/F".  I'd need to block out a solid chunk of time to
take a look at the patches.

> [1] da165f470e (unpack-trees.c: prepare for looking ahead in the index - 2010-01-07)
> [2] 730f72840c (unpack-trees.c: look ahead in the index - 2009-09-20)

Thanks.

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v3 0/4] Speed up unpack_trees()
  2018-08-06 15:48                           ` [PATCH v3 0/4] Speed up unpack_trees() Junio C Hamano
@ 2018-08-06 15:59                             ` Duy Nguyen
  2018-08-06 18:59                               ` Junio C Hamano
  2018-08-08 17:46                               ` Junio C Hamano
  0 siblings, 2 replies; 121+ messages in thread
From: Duy Nguyen @ 2018-08-06 15:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ben Peart, Git Mailing List, Ben Peart, Jeff King

On Mon, Aug 6, 2018 at 5:48 PM Junio C Hamano <gitster@pobox.com> wrote:
> > I've also checked about the lookahead thing in unpack_trees() to see
> > if we accidentally break something there, which is my biggest worry.
> > See [1] and [2] for context, but I believe since we can't have D/F
> > conflicts, the situation where lookahead is needed will not occur. So
> > we should be safe.
>
> Isn't this about branch switching, where the currently checked out
> branch may have a regular file 't' and checking out another branch
> that has directory 't' in it (or vice versa, possibly with the index
> having either a regular file 't' or requiring 't' to be a diretory
> by having a blob 't/1' in it)?

We require the unpacked entry from all input trees to be tree
objects (the dirmask thing), so if one tree has 't' as a file,
all_trees_same_as_cache_tree() should return false and not trigger
this optimization. Same thing for the index: if it has the file 't',
then we should not have a cache-tree at path 't' and the
optimization is skipped as well.

So yes, branch switching definitely can have D/F conflicts, but we
should never ever accidentally run this new optimization when that
happens.
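
For reference, the guard is this shape (condensed from
all_trees_same_as_cache_tree() in 2/4):

	/* every input tree must have a tree entry at this path */
	if (!o->merge || dirmask != ((1 << n) - 1))
		return 0;

	/* all those tree entries must name the same object */
	for (i = 1; i < n; i++)
		if (!are_same_oid(names, names + i))
			return 0;

	/* and a valid cache-tree must cover the path in the index */
	return cache_tree_matches_traversal(o->src_index->cache_tree,
					    names, info);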

> The log message of [1] talks about
> walking three trees together with the index, but even if we limit
> ourselves to a two-tree walk, I do not think that the picture
> fundamentally changes.  So I am not sure how we can confidently say
> "we can't have D/F".  I'd need to block out a solid chunk of time to
> take a look at the patches.

Yes please :)
-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v3 0/4] Speed up unpack_trees()
  2018-08-06 15:59                             ` Duy Nguyen
@ 2018-08-06 18:59                               ` Junio C Hamano
  2018-08-08 17:00                                 ` Ben Peart
  2018-08-08 17:46                               ` Junio C Hamano
  1 sibling, 1 reply; 121+ messages in thread
From: Junio C Hamano @ 2018-08-06 18:59 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Ben Peart, Git Mailing List, Ben Peart, Jeff King

Duy Nguyen <pclouds@gmail.com> writes:

> We require the unpacked entry from all input trees to be tree
> objects (the dirmask thing), so if one tree has 't' as a file,

Ah, OK, this is still part of that "all the trees match cache tree
so we walk the index instead" optimization.  I forgot about that.


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v3 0/4] Speed up unpack_trees()
  2018-08-06 18:59                               ` Junio C Hamano
@ 2018-08-08 17:00                                 ` Ben Peart
  0 siblings, 0 replies; 121+ messages in thread
From: Ben Peart @ 2018-08-08 17:00 UTC (permalink / raw)
  To: Junio C Hamano, Duy Nguyen; +Cc: Ben Peart, Git Mailing List, Jeff King



On 8/6/2018 2:59 PM, Junio C Hamano wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
> 
>> We require the unpacked entry from all input trees to be tree
>> objects (the dirmask thing), so if one tree has 't' as a file,
> 
> Ah, OK, this is still part of that "all the trees match cache tree
> so we walk the index instead" optimization.  I forgot about that.
> 

I ran this set of patches through the VFS for Git set of functional
tests as well as our performance test suite (my earlier perf numbers
were from manual testing).  All the functional tests pass and the
performance tests are looking _very_ promising.

Checkout times are impacted most: on average they drop from 20.96
seconds to 11.63 seconds, a 45% savings.

Merge times drop from 19.44 seconds to 12.88 seconds, a 34% savings.

Rebase times drop from 26.78 seconds to 20.72 seconds, a 23% savings.

Overall, I'm looking forward to a good review of the patches and seeing 
them get merged as soon as they are ready.


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v3 0/4] Speed up unpack_trees()
  2018-08-06 15:59                             ` Duy Nguyen
  2018-08-06 18:59                               ` Junio C Hamano
@ 2018-08-08 17:46                               ` Junio C Hamano
  2018-08-08 18:12                                 ` Junio C Hamano
  1 sibling, 1 reply; 121+ messages in thread
From: Junio C Hamano @ 2018-08-08 17:46 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Ben Peart, Git Mailing List, Ben Peart, Jeff King

Duy Nguyen <pclouds@gmail.com> writes:

> On Mon, Aug 6, 2018 at 5:48 PM Junio C Hamano <gitster@pobox.com> wrote:
>> > I've also checked about the lookahead thing in unpack_trees() to see
>> > if we accidentally break something there, which is my biggest worry.
>> > See [1] and [2] for context, but I believe since we can't have D/F
>> > conflicts, the situation where lookahead is needed will not occur. So
>> > we should be safe.

I think you would want the same "switch cache-bottom before
descending into a subdirectory, and then restore cache-bottom after
the traversal comes back" dance that is done for the normal tree
traversal case:

	bottom = switch_cache_bottom(&newinfo);
	ret = traverse_trees(n, t, &newinfo);
	restore_cache_bottom(&newinfo, bottom);

During your walk of the index and the trees that are known to be in
sync, there is little reason to worry about the cache_bottom, which
is advanced by calling mark_ce_used() in traverse_by_cache_tree().
Where it matters is what happens after the traversal comes back out
of the subtree.  find_cache_pos() uses the bottom pointer so that it
does not have to go back too far to find an index entry that has not
been used to match with the entries from the trees (which are not
sorted exactly the same way as the index, unfortunately), so
forgetting to advance the bottom pointer while correctly marking a
ce as "used" is OK (i.e. hurts performance but not correctness), but
advancing the bottom pointer too much and leaving entries that are
not used behind is *not* OK.  And lack of restoring the bottom in
the new codepath makes me suspect exactly such a bug _after_ the
traversal exits the subtree we are using this new optimization in
and moves on.

Imagine we are iterating over the top-level of the trees and find
a subtree in them.  There may be some index entries before the first
path in this subtree that are not yet marked as "used", the earliest
of which is pointed at by the cache_bottom pointer.

Before descending into the subtree (and starting to consume the
index entries inside it), we stash away the current cache_bottom,
and then start walking the subtree.  While we are in that subtree,
the cache_bottom starts from the first name in the subtree and
increments, as we _know_ the entry at the old cache_bottom is
outside this subtree and will not match any entry from the
subtree.  Then when the traversal returns, all index
entries within the "subtree/" path will be marked "used".  At that
point, when we continue to scan the top-level of the trees, we need
to restore the cache_bottom, so that we do not forget entries that
we knew we needed to scan eventually, if there was any.
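
To put it in a picture, with made-up paths:

	index:  a   t-i   t-j   t/1   t/2   u
	             ^
	             cache_bottom ("t-i" not yet used)

	descend into "t/":  stash the bottom; the walk inside uses
	                    and advances past "t/1" and "t/2"
	return:             restore the bottom to "t-i", so that the
	                    outer scan can still find "t-i" and "t-j"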

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v3 0/4] Speed up unpack_trees()
  2018-08-08 17:46                               ` Junio C Hamano
@ 2018-08-08 18:12                                 ` Junio C Hamano
  2018-08-08 18:39                                   ` Junio C Hamano
  0 siblings, 1 reply; 121+ messages in thread
From: Junio C Hamano @ 2018-08-08 18:12 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Ben Peart, Git Mailing List, Ben Peart, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> not used behind is *not* OK.  And lack of restoring the bottom in
> the new codepath makes me suspect exactly such a bug _after_ the
> traversal exits the subtree we are using this new optimization in
> and moves on.

Hmph, thinking about this further, I cannot convince myself that
lack of bottom adjustment can lead to a triggerable bug.  The only
case that a subtree traversal needs to skip some unpacked entries in
the index and then revisit them by rewinding, e.g. entries "t-i" and
"t-j" that are left unprocessed while entries "t/1", "t/2", etc. are
processed, in the illustration of da165f47 ("unpack-trees.c: prepare
for looking ahead in the index", 2010-01-07), is when one of the
trees has a non-tree with the same name as the subtree we are
trying to descend into, and as long as we know all trees have the
thing as a tree, I cannot think of a case where such ordering
inversion would get in the way.
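
For illustration, with hypothetical paths, the problematic shape is:

	tree A:  100644 blob t    walk order: "t", "t-i", "t-j"
	tree B:  040000 tree t    walk order: "t-i", "t-j", "t/"

In tree-object ordering a directory sorts as "t/" ('/' is 0x2f),
which comes after "t-i" and "t-j" ('-' is 0x2d), while a blob sorts
as plain "t", which comes before them.  That inversion is what the
rewinding exists to handle; when every tree agrees that "t" is a
tree, as the dirmask check guarantees, the walk order is the same
everywhere.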

That was the only thing I found questionable in 2/4, which is the
most important piece in the series, so we probably are OK.

Thanks for working on this one.

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v3 2/4] unpack-trees: optimize walking same trees with cache-tree
  2018-08-04  5:37                           ` [PATCH v3 2/4] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
@ 2018-08-08 18:23                             ` Elijah Newren
  2018-08-10 16:29                               ` Duy Nguyen
  0 siblings, 1 reply; 121+ messages in thread
From: Elijah Newren @ 2018-08-08 18:23 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc
  Cc: Ben Peart, Git Mailing List, Junio C Hamano, Ben Peart, Jeff King

On Fri, Aug 3, 2018 at 10:39 PM Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> From: Duy Nguyen <pclouds@gmail.com>
>
> In order to merge one or many trees with the index, unpack-trees code
> walks multiple trees in parallel with the index and performs n-way
> merge. If we find out at start of a directory that all trees are the
> same (by comparing OID) and cache-tree happens to be available for
> that directory as well, we could avoid walking the trees because we
> already know what these trees contain: it's flattened in what's called
> "the index".

This is cool.

> The upside is of course a lot less I/O since we can potentially skip
> lots of trees (think subtrees). We also save CPU because we don't have
> to inflate and the apply deltas. The downside is of course more

s/and the apply/and apply the/

> fragile code since the logic in some functions are now duplicated
> elsewhere.
>
> "checkout -" with this patch on gcc.git:
>
>     baseline      new
>   --------------------------------------------------------------------
>     0.018239226   0.019365414 s: read cache .git/index
>     0.052541655   0.049605548 s: preload index
>     0.001537598   0.001571695 s: refresh index
>     0.168167768   0.049677212 s: unpack trees
>     0.002897186   0.002845256 s: update worktree after a merge
>     0.131661745   0.136597522 s: repair cache-tree
>     0.075389117   0.075422517 s: write index, changed mask = 2a
>     0.111702023   0.032813253 s: unpack trees
>     0.000023245   0.000022002 s: update worktree after a merge
>     0.111793866   0.032933140 s: diff-index
>     0.587933288   0.398924370 s: git command: /home/pclouds/w/git/git
>
> Another measurement from Ben's running "git checkout" with over 500k
> trees (on the whole series):
>
>     baseline        new
>   ----------------------------------------------------------------------
>     0.535510167     0.556558733     s: read cache .git/index
>     0.3057373       0.3147105       s: initialize name hash
>     0.0184082       0.023558433     s: preload index
>     0.086910967     0.089085967     s: refresh index
>     7.889590767     2.191554433     s: unpack trees
>     0.120760833     0.131941267     s: update worktree after a merge
>     2.2583504       2.572663167     s: repair cache-tree
>     0.8916137       0.959495233     s: write index, changed mask = 28
>     3.405199233     0.2710663       s: unpack trees
>     0.000999667     0.0021554       s: update worktree after a merge
>     3.4063306       0.273318333     s: diff-index
>     16.9524923      9.462943133     s: git command: git.exe checkout
>
> This command calls unpack_trees() twice, the first time on 2way merge
> and the second 1way merge. In both times, "unpack trees" time is
> reduced to one third. Overall time reduction is not that impressive of
> course because index operations take a big chunk. And there's that
> repair cache-tree line.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  unpack-trees.c | 117 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 117 insertions(+)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index a32ddee159..ba3d2e947e 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -644,6 +644,102 @@ static inline int are_same_oid(struct name_entry *name_j, struct name_entry *nam
>         return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
>  }
>
> +static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,
> +                                       struct name_entry *names,
> +                                       struct traverse_info *info)
> +{
> +       struct unpack_trees_options *o = info->data;
> +       int i;
> +
> +       if (!o->merge || dirmask != ((1 << n) - 1))
> +               return 0;
> +
> +       for (i = 1; i < n; i++)
> +               if (!are_same_oid(names, names + i))
> +                       return 0;
> +
> +       return cache_tree_matches_traversal(o->src_index->cache_tree, names, info);
> +}

I was curious whether this could also be extended in the case of a
merge; as long as HEAD and MERGE have the same tree, even if the base
commit doesn't match, we can still just use the tree from HEAD which
should be in the current index/cache_tree.  However, it'd be a
somewhat odd history for HEAD and MERGE to match on some significantly
sized tree when the base commit doesn't also match.

> +
> +static int index_pos_by_traverse_info(struct name_entry *names,
> +                                     struct traverse_info *info)
> +{
> +       struct unpack_trees_options *o = info->data;
> +       int len = traverse_path_len(info, names);
> +       char *name = xmalloc(len + 1 /* slash */ + 1 /* NUL */);
> +       int pos;
> +
> +       make_traverse_path(name, info, names);
> +       name[len++] = '/';
> +       name[len] = '\0';
> +       pos = index_name_pos(o->src_index, name, len);
> +       if (pos >= 0)
> +               BUG("This is a directory and should not exist in index");
> +       pos = -pos - 1;
> +       if (!starts_with(o->src_index->cache[pos]->name, name) ||
> +           (pos > 0 && starts_with(o->src_index->cache[pos-1]->name, name)))
> +               BUG("pos must point at the first entry in this directory");
> +       free(name);
> +       return pos;
> +}
> +
> +/*
> + * Fast path if we detect that all trees are the same as cache-tree at this
> + * path. We'll walk these trees recursively using cache-tree/index instead of
> + * ODB since already know what these trees contain.
> + */
> +static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
> +                                 struct name_entry *names,
> +                                 struct traverse_info *info)
> +{
> +       struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
> +       struct unpack_trees_options *o = info->data;
> +       int i, d;
> +
> +       if (!o->merge)
> +               BUG("We need cache-tree to do this optimization");
> +
> +       /*
> +        * Do what unpack_callback() and unpack_nondirectories() normally
> +        * do. But we walk all paths recursively in just one loop instead.
> +        *
> +        * D/F conflicts and staged entries are not a concern because

"staged entries"?  Do you mean "higher stage entries"?  I'm not sure
the correct terminology here, but the former makes me think of changes
the user has staged but not committed (i.e. stuff found at stage #0 in
the index, but which isn't found in any tree yet) vs. the latter which
I'd use to refer to entries at stages 1 or higher.

> +        * cache-tree would be invalidated and we would never get here
> +        * in the first place.
> +        */
> +       for (i = 0; i < nr_entries; i++) {
> +               struct cache_entry *tree_ce;
> +               int len, rc;
> +
> +               src[0] = o->src_index->cache[pos + i];
> +
> +               len = ce_namelen(src[0]);
> +               tree_ce = xcalloc(1, cache_entry_size(len));
> +
> +               tree_ce->ce_mode = src[0]->ce_mode;
> +               tree_ce->ce_flags = create_ce_flags(0);
> +               tree_ce->ce_namelen = len;
> +               oidcpy(&tree_ce->oid, &src[0]->oid);
> +               memcpy(tree_ce->name, src[0]->name, len + 1);

We do a bunch of work to setup tree_ce...

> +               for (d = 1; d <= nr_names; d++)
> +                       src[d] = tree_ce;

...then we make nr_names copies of tree_ce (so that *way_merge or
bind_merge or oneway_diff or whatever will have the expected number of
entries).

> +               rc = call_unpack_fn((const struct cache_entry * const *)src, o);

...then we call o->fn (via call_unpack_fn) to do various complicated
logic to figure out which tree_ce to use??  Isn't that just an
expensive way to recompute that what we currently have in the index is
what we want to keep there?

Granted, a caller of this may have set o->fn to something other than
{one,two,three}way_merge (or bind_merge), and that function might have
important side effects...but it just seems annoying to have to do so
much work when for most uses we already know the entry in the index is
the one we already want.  In fact, the only other thing in the
codebase that o->fn is now set to is oneway_diff, which I think is a
no-op when the two trees match.

Would be nice if we could avoid all this, at least in the common cases
where o->fn is a function known to not have side effects.  Or did I
not read those functions closely enough and they do have important
side effects?
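
Concretely, I'm imagining something of this shape (purely
hypothetical, and it glosses over the stat/CE_UPDATE handling those
callbacks also do, which is exactly the part I'm unsure about):

	if (o->fn == oneway_merge || o->fn == twoway_merge ||
	    o->fn == threeway_merge || o->fn == bind_merge)
		/* we already know src[0] is what we want to keep;
		 * install it directly instead of re-deriving it */
		rc = do_add_entry(o, src[0], 0, 0);
	else
		rc = call_unpack_fn((const struct cache_entry * const *)src, o);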

> +               free(tree_ce);
> +               if (rc < 0)
> +                       return rc;
> +
> +               mark_ce_used(src[0], o);
> +       }
> +       if (o->debug_unpack)
> +               printf("Unpacked %d entries from %s to %s using cache-tree\n",
> +                      nr_entries,
> +                      o->src_index->cache[pos]->name,
> +                      o->src_index->cache[pos + nr_entries - 1]->name);
> +       return 0;
> +}
> +
>  static int traverse_trees_recursive(int n, unsigned long dirmask,
>                                     unsigned long df_conflicts,
>                                     struct name_entry *names,
> @@ -655,6 +751,17 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
>         void *buf[MAX_UNPACK_TREES];
>         struct traverse_info newinfo;
>         struct name_entry *p;
> +       int nr_entries;
> +
> +       nr_entries = all_trees_same_as_cache_tree(n, dirmask, names, info);
> +       if (nr_entries > 0) {
> +               struct unpack_trees_options *o = info->data;
> +               int pos = index_pos_by_traverse_info(names, info);
> +
> +               if (!o->merge || df_conflicts)
> +                       BUG("Wrong condition to get here buddy");

heh.  :)

> +               return traverse_by_cache_tree(pos, nr_entries, n, names, info);
> +       }
>
>         p = names;
>         while (!p->mode)
> @@ -814,6 +921,11 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info, con
>         return ce;
>  }
>
> +/*
> + * Note that traverse_by_cache_tree() duplicates some logic in this function
> + * without actually calling it. If you change the logic here you may need to
> + * check and change there as well.
> + */
>  static int unpack_nondirectories(int n, unsigned long mask,
>                                  unsigned long dirmask,
>                                  struct cache_entry **src,
> @@ -998,6 +1110,11 @@ static void debug_unpack_callback(int n,
>                 debug_name_entry(i, names + i);
>  }
>
> +/*
> + * Note that traverse_by_cache_tree() duplicates some logic in this funciton

s/funciton/function/

> + * without actually calling it. If you change the logic here you may need to
> + * check and change there as well.
> + */
>  static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, struct name_entry *names, struct traverse_info *info)
>  {
>         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
> --
> 2.18.0.656.gda699b98b3

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v3 3/4] unpack-trees: reduce malloc in cache-tree walk
  2018-08-04  5:37                           ` [PATCH v3 3/4] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
@ 2018-08-08 18:30                             ` Elijah Newren
  0 siblings, 0 replies; 121+ messages in thread
From: Elijah Newren @ 2018-08-08 18:30 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc
  Cc: Ben Peart, Git Mailing List, Junio C Hamano, Ben Peart, Jeff King

On Fri, Aug 3, 2018 at 10:39 PM Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>
> This is a micro optimization that probably only shines on repos with
> deep directory structure. Instead of allocating and freeing a new
> cache_entry in every iteration, we reuse the last one and only update
> the parts that are new each iteration.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  unpack-trees.c | 29 ++++++++++++++++++++---------
>  1 file changed, 20 insertions(+), 9 deletions(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index ba3d2e947e..c8defc2015 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -694,6 +694,8 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
>  {
>         struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
>         struct unpack_trees_options *o = info->data;
> +       struct cache_entry *tree_ce = NULL;
> +       int ce_len = 0;
>         int i, d;
>
>         if (!o->merge)
> @@ -708,30 +710,39 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
>          * in the first place.
>          */
>         for (i = 0; i < nr_entries; i++) {
> -               struct cache_entry *tree_ce;
> -               int len, rc;
> +               int new_ce_len, len, rc;
>
>                 src[0] = o->src_index->cache[pos + i];
>
>                 len = ce_namelen(src[0]);
> -               tree_ce = xcalloc(1, cache_entry_size(len));
> +               new_ce_len = cache_entry_size(len);
> +
> +               if (new_ce_len > ce_len) {
> +                       new_ce_len <<= 1;
> +                       tree_ce = xrealloc(tree_ce, new_ce_len);
> +                       memset(tree_ce, 0, new_ce_len);
> +                       ce_len = new_ce_len;
> +
> +                       tree_ce->ce_flags = create_ce_flags(0);
> +
> +                       for (d = 1; d <= nr_names; d++)
> +                               src[d] = tree_ce;
> +               }
>
>                 tree_ce->ce_mode = src[0]->ce_mode;
> -               tree_ce->ce_flags = create_ce_flags(0);
>                 tree_ce->ce_namelen = len;
>                 oidcpy(&tree_ce->oid, &src[0]->oid);
>                 memcpy(tree_ce->name, src[0]->name, len + 1);
>
> -               for (d = 1; d <= nr_names; d++)
> -                       src[d] = tree_ce;
> -
>                 rc = call_unpack_fn((const struct cache_entry * const *)src, o);
> -               free(tree_ce);
> -               if (rc < 0)
> +               if (rc < 0) {
> +                       free(tree_ce);
>                         return rc;
> +               }
>
>                 mark_ce_used(src[0], o);
>         }
> +       free(tree_ce);
>         if (o->debug_unpack)
>                 printf("Unpacked %d entries from %s to %s using cache-tree\n",
>                        nr_entries,
> --
> 2.18.0.656.gda699b98b3

Seems reasonable, when we really do have to invoke call_unpack_fn.
I'm still curious if there are reasons why we couldn't just skip that
call (at least when o->fn is one of {oneway_merge, twoway_merge,
threeway_merge, bind_merge}), but I already brought that up in my
comments on patch 2.

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v3 0/4] Speed up unpack_trees()
  2018-08-08 18:12                                 ` Junio C Hamano
@ 2018-08-08 18:39                                   ` Junio C Hamano
  2018-08-10 16:53                                     ` Duy Nguyen
  0 siblings, 1 reply; 121+ messages in thread
From: Junio C Hamano @ 2018-08-08 18:39 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Ben Peart, Git Mailing List, Ben Peart, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> not used behind is *not* OK.  And lack of restoring the bottom in
>> the new codepath makes me suspect exactly such a bug _after_ the
>> traversal exits the subtree we are using this new optimization in
>> and moves on.
>
> Hmph, thinking about this further, I cannot convince myself that
> lack of bottom adjustment can lead to a triggerable bug.  The only
> case that a subtree traversal needs to skip some unpacked entries in
> the index and then revisit them by rewinding, e.g. entries "t-i" and
> "t-j" that are left unprocessed while entries "t/1", "t/2", etc. are
> processed, in the illustration of da165f47 ("unpack-trees.c: prepare
> for looking ahead in the index", 2010-01-07), is when one of the
> trees has a non-tree with the same name as the subtree we are
> trying to descend into, and as long as we know all trees have the
> thing as a tree, I cannot think of a case where such ordering
> inversion would get in the way.

One more, and hopefully the final, note.

A paranoid may be soothed by a simple "cache_bottom must match pos
at this point" at the beginning of the optimized traversal.  Just
like you already have an assert to ensure that pos points at the
first entry in the directory in index_pos_by_traverse_info(), there
should not be any unused entry in the index before that entry and
the bottom pointer must be pointing at it.  It is a cheap check, and
if violated, would indicate that the above "I cannot think of a
case ..." was incomplete.

> That was the only thing I found questionable in 2/4, which is the
> most important piece in the series, so we probably are OK.
>
> Thanks for working on this one.

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 4/4] unpack-trees: cheaper index update when walking by cache-tree
  2018-07-29 10:33                         ` [PATCH v2 4/4] unpack-trees: cheaper index update when walking by cache-tree Nguyễn Thái Ngọc Duy
@ 2018-08-08 18:46                           ` Elijah Newren
  2018-08-10 16:39                             ` Duy Nguyen
  0 siblings, 1 reply; 121+ messages in thread
From: Elijah Newren @ 2018-08-08 18:46 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc
  Cc: Ben Peart, Git Mailing List, Junio C Hamano, Ben Peart, Jeff King

On Sun, Jul 29, 2018 at 3:36 AM Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>
> With the new cache-tree, we could mostly avoid I/O (due to odb access)
> the code mostly becomes a loop of "check this, check that, add the
> entry to the index". We could skip a couple checks in this giant loop
> to go faster:
>
> - We know here that we're copying entries from the source index to the
>   result one. All paths in the source index must have been validated
>   at load time already (and we're not taking strange paths from tree
>   objects) which means we can skip verify_path() without compromise.
>
> - We also know that D/F conflicts can't happen for all these entries
>   (since cache-tree and all the trees are the same) so we can skip
>   that as well.
>
> This gives rather nice speedups for "unpack trees" rows where "unpack
> trees" time is now cut in half compared to when
> traverse_by_cache_tree() is added, or 1/7 of the original "unpack
> trees" time.
>
>    baseline      cache-tree    this patch
>  --------------------------------------------------------------------
>    0.018239226   0.019365414   0.020519621 s: read cache .git/index
>    0.052541655   0.049605548   0.048814384 s: preload index
>    0.001537598   0.001571695   0.001575382 s: refresh index
>    0.168167768   0.049677212   0.024719308 s: unpack trees
>    0.002897186   0.002845256   0.002805555 s: update worktree after a merge
>    0.131661745   0.136597522   0.134891617 s: repair cache-tree
>    0.075389117   0.075422517   0.074832291 s: write index, changed mask = 2a
>    0.111702023   0.032813253   0.008616479 s: unpack trees
>    0.000023245   0.000022002   0.000026630 s: update worktree after a merge
>    0.111793866   0.032933140   0.008714071 s: diff-index
>    0.587933288   0.398924370   0.380452871 s: git command: /home/pclouds/w/git/git
>
> Total saving of this new patch looks even less impressive, now that
> time spent in unpacking trees is so small. Which is why the next
> attempt should be on that "repair cache-tree" line.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  cache.h        |  1 +
>  read-cache.c   |  3 ++-
>  unpack-trees.c | 27 +++++++++++++++++++++++++++
>  unpack-trees.h |  1 +
>  4 files changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/cache.h b/cache.h
> index 8b447652a7..e6f7ee4b64 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -673,6 +673,7 @@ extern int index_name_pos(const struct index_state *, const char *name, int name
>  #define ADD_CACHE_JUST_APPEND 8                /* Append only; tree.c::read_tree() */
>  #define ADD_CACHE_NEW_ONLY 16          /* Do not replace existing ones */
>  #define ADD_CACHE_KEEP_CACHE_TREE 32   /* Do not invalidate cache-tree */
> +#define ADD_CACHE_SKIP_VERIFY_PATH 64  /* Do not verify path */
>  extern int add_index_entry(struct index_state *, struct cache_entry *ce, int option);
>  extern void rename_index_entry_at(struct index_state *, int pos, const char *new_name);
>
> diff --git a/read-cache.c b/read-cache.c
> index e865254bea..b0b5df5de7 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1170,6 +1170,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
>         int ok_to_add = option & ADD_CACHE_OK_TO_ADD;
>         int ok_to_replace = option & ADD_CACHE_OK_TO_REPLACE;
>         int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
> +       int skip_verify_path = option & ADD_CACHE_SKIP_VERIFY_PATH;
>         int new_only = option & ADD_CACHE_NEW_ONLY;
>
>         if (!(option & ADD_CACHE_KEEP_CACHE_TREE))
> @@ -1210,7 +1211,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
>
>         if (!ok_to_add)
>                 return -1;
> -       if (!verify_path(ce->name, ce->ce_mode))
> +       if (!skip_verify_path && !verify_path(ce->name, ce->ce_mode))
>                 return error("Invalid path '%s'", ce->name);
>
>         if (!skip_df_check &&
> diff --git a/unpack-trees.c b/unpack-trees.c
> index c33ebaf001..dc62afd968 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -201,6 +201,7 @@ static int do_add_entry(struct unpack_trees_options *o, struct cache_entry *ce,
>
>         ce->ce_flags = (ce->ce_flags & ~clear) | set;
>         return add_index_entry(&o->result, ce,
> +                              o->extra_add_index_flags |
>                                ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
>  }
>
> @@ -701,6 +702,24 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
>         if (!o->merge)
>                 BUG("We need cache-tree to do this optimization");
>
> +       /*
> +        * Try to keep add_index_entry() as fast as possible since
> +        * we're going to do a lot of them.
> +        *
> +        * Skipping verify_path() should totally be safe because these
> +        * paths are from the source index, which must have been
> +        * verified.
> +        *
> +        * Skipping D/F and cache-tree validation checks is trickier
> +        * because it assumes what n-merge code would do when all
> +        * trees and the index are the same. We probably could just
> +        * optimize those code instead (e.g. we don't invalidate that
> +        * many cache-tree, but the searching for them is very
> +        * expensive).
> +        */
> +       o->extra_add_index_flags = ADD_CACHE_SKIP_DFCHECK;
> +       o->extra_add_index_flags |= ADD_CACHE_SKIP_VERIFY_PATH;
> +

To sum up this whole patch: you notice that the Nway_merge functions
are still a bit of a bottleneck, but you know you have a special case
where you want them to put an entry in the index that matches what is
already there, so you try to set some extra flags to short-circuit
part of their logic and get to what you know is the correct result.

This seems a little scary to me.  I think it's probably safe as long
as o->fn is one of {oneway_merge, twoway_merge, threeway_merge,
bind_merge} (the cases you have in mind and which the current code
uses), but the caller isn't limited to those.  Right now in
diff-lib.c, there's a caller that has their own function, oneway_diff.
More could be added in the future.

If we're going to go this route, I think we should first check that
o->fn is one of those known safe functions.  And if we're going that
route, the comments I bring up on patch 2 about possibly avoiding
call_unpack_fn() altogether might even obviate this patch while
speeding things up more.
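
I.e., roughly (illustrative only; same caveat as before about
callbacks with side effects):

	/* enable the shortcut only for callbacks known to be safe */
	if (o->fn == oneway_merge || o->fn == twoway_merge ||
	    o->fn == threeway_merge || o->fn == bind_merge) {
		o->extra_add_index_flags = ADD_CACHE_SKIP_DFCHECK;
		o->extra_add_index_flags |= ADD_CACHE_SKIP_VERIFY_PATH;
	}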

>         /*
>          * Do what unpack_callback() and unpack_nondirectories() normally
>          * do. But we walk all paths recursively in just one loop instead.
> @@ -742,6 +761,7 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
>
>                 mark_ce_used(src[0], o);
>         }
> +       o->extra_add_index_flags = 0;
>         free(tree_ce);
>         if (o->debug_unpack)
>                 printf("Unpacked %d entries from %s to %s using cache-tree\n",
> @@ -1561,6 +1581,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>                 if (!ret) {
>                         if (!o->result.cache_tree)
>                                 o->result.cache_tree = cache_tree();
> +                       /*
> +                        * TODO: Walk o.src_index->cache_tree, quickly check
> +                        * if o->result.cache has the exact same content for
> +                        * any valid cache-tree in o.src_index, then we can
> +                        * just copy the cache-tree over instead of hashing a
> +                        * new tree object.
> +                        */

Interesting.  I really don't know how cache_tree works...but if we
avoided calling call_unpack_fn, and thus left the original index entry
in place instead of replacing it with an equal one, would that as a
side effect speed up the cache_tree_valid/cache_tree_update calls for
us?  Or is there still work here?

>                         if (!cache_tree_fully_valid(o->result.cache_tree))
>                                 cache_tree_update(&o->result,
>                                                   WRITE_TREE_SILENT |
> diff --git a/unpack-trees.h b/unpack-trees.h
> index c2b434c606..94e1b14078 100644
> --- a/unpack-trees.h
> +++ b/unpack-trees.h
> @@ -80,6 +80,7 @@ struct unpack_trees_options {
>         struct index_state result;
>
>         struct exclude_list *el; /* for internal use */
> +       unsigned int extra_add_index_flags;
>  };
>
>  extern int unpack_trees(unsigned n, struct tree_desc *t,
> --
> 2.18.0.656.gda699b98b3

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 0/4] Speed up unpack_trees()
  2018-08-01 16:38                                 ` Duy Nguyen
@ 2018-08-08 20:53                                   ` Ben Peart
  2018-08-09  8:16                                     ` Ben Peart
  2018-08-10 15:51                                     ` Duy Nguyen
  0 siblings, 2 replies; 121+ messages in thread
From: Ben Peart @ 2018-08-08 20:53 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Ben Peart, Git Mailing List, Junio C Hamano, Jeff King



On 8/1/2018 12:38 PM, Duy Nguyen wrote:
> On Tue, Jul 31, 2018 at 01:31:31PM -0400, Ben Peart wrote:
>>
>>
>> On 7/31/2018 12:50 PM, Ben Peart wrote:
>>>
>>>
>>> On 7/31/2018 11:31 AM, Duy Nguyen wrote:
>>
>>>>
>>>>> In the performance game of whack-a-mole, that call to repair cache-tree
>>>>> is now looking quite expensive...
>>>>
>>>> Yeah and I think we can whack that mole too. I did some measurement.
>>>> Best case possible, we just need to scan through two indexes (one with
>>>> many good cache-tree, one with no cache-tree), compare and copy
>>>> cache-tree over. The scanning takes like 1% time of current repair
>>>> step and I suspect it's the hashing that takes most of the time. Of
>>>> course real world won't have such nice numbers, but I guess we could
>>>> maybe half cache-tree update/repair time.
>>>>
>>>
>>> I have some great profiling tools available so will take a look at this
>>> next and see exactly where the time is being spent.
>>
>> Good instincts.  In cache_tree_update, the heavy hitter is definitely
>> hash_object_file followed by has_object_file.
>>
>> Name                               	Inc %	     Inc
>> + git!cache_tree_update            	 12.4	   4,935
>> |+ git!update_one                  	 11.8	   4,706
>> | + git!update_one                 	 11.8	   4,706
>> |  + git!hash_object_file          	  6.1	   2,406
>> |  + git!has_object_file           	  2.0	     813
>> |  + OTHER <<vcruntime140d!strchr>>	  0.5	     203
>> |  + git!strbuf_addf               	  0.4	     155
>> |  + git!strbuf_release            	  0.4	     143
>> |  + git!strbuf_add                	  0.3	     121
>> |  + OTHER <<vcruntime140d!memcmp>>	  0.2	      93
>> |  + git!strbuf_grow               	  0.1	      25
> 
Ben, if you work on this, this could be a good starting point. I will
not work on this because I still have some other things to catch up on
and follow through. You can have my sign-off if you reuse something
from this patch.
> 
> Even if it's a naive implementation, the initial numbers look pretty
> good. Without the patch we have
> 
> 18:31:05.970621 unpack-trees.c:1437     performance: 0.000001029 s: copy
> 18:31:05.975729 unpack-trees.c:1444     performance: 0.005082004 s: update
> 
> And with the patch
> 
> 18:31:13.295655 unpack-trees.c:1437     performance: 0.000198017 s: copy
> 18:31:13.296757 unpack-trees.c:1444     performance: 0.001075935 s: update
> 
> Time saving is about 80% by the look of this (best possible case
> because only the top tree needs to be hashed and written out).
> 
> -- 8< --
> diff --git a/cache-tree.c b/cache-tree.c
> index 6b46711996..67a4a93100 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -440,6 +440,147 @@ int cache_tree_update(struct index_state *istate, int flags)
>   	return 0;
>   }
>   
> +static int same(const struct cache_entry *a, const struct cache_entry *b)
> +{
> +	if (ce_stage(a) || ce_stage(b))
> +		return 0;
> +	if ((a->ce_flags | b->ce_flags) & CE_CONFLICTED)
> +		return 0;
> +	return a->ce_mode == b->ce_mode &&
> +	       !oidcmp(&a->oid, &b->oid);
> +}
> +
> +static int cache_tree_name_pos(const struct index_state *istate,
> +			       const struct strbuf *path)
> +{
> +	int pos;
> +
> +	if (!path->len)
> +		return 0;
> +
> +	pos = index_name_pos(istate, path->buf, path->len);
> +	if (pos >= 0)
> +		BUG("No no no, directory path must not exist in index");
> +	return -pos - 1;
> +}
> +
> +/*
> + * Locate the same cache-tree in two separate indexes. Check the
> + * cache-tree is still valid for the "to" index (i.e. it contains the
> + * same set of entries in the "from" index).
> + */
> +static int verify_one_cache_tree(const struct index_state *to,
> +				 const struct index_state *from,
> +				 const struct cache_tree *it,
> +				 const struct strbuf *path)
> +{
> +	int i, spos, dpos;
> +
> +	spos = cache_tree_name_pos(from, path);
> +	if (spos + it->entry_count > from->cache_nr)
> +		return -1;
> +
> +	dpos = cache_tree_name_pos(to, path);
> +	if (dpos + it->entry_count > to->cache_nr)
> +		return -1;
> +
> +	/* Can we quickly check head and tail and bail out early */
> +	/* Can we quickly check head and tail and bail out early */
> +	if (!same(from->cache[spos], to->cache[dpos]) ||
> +	    !same(from->cache[spos + it->entry_count - 1],
> +		  to->cache[dpos + it->entry_count - 1]))
> +		return -1;
> +
> +	for (i = 1; i < it->entry_count - 1; i++)
> +		if (!same(from->cache[spos + i],
> +			  to->cache[dpos + i]))
> +			return -1;
> +
> +	return 0;
> +}
> +
> +static int verify_and_invalidate(struct index_state *to,
> +				 const struct index_state *from,
> +				 struct cache_tree *it,
> +				 struct strbuf *path)
> +{
> +	/*
> +	 * Optimistically verify the current tree first. Alternatively
> +	 * we could verify all the subtrees first then do this
> +	 * last. Any invalid subtree would also invalidate its
> +	 * ancestors.
> +	 */
> +	if (it->entry_count != -1 &&
> +	    verify_one_cache_tree(to, from, it, path))
> +		it->entry_count = -1;
> +
> +	/*
> +	 * If the current tree is valid, don't bother checking
> +	 * inside. All subtrees _should_ also be valid
> +	 */
> +	if (it->entry_count == -1) {
> +		int i, len = path->len;
> +
> +		for (i = 0; i < it->subtree_nr; i++) {
> +			struct cache_tree_sub *down = it->down[i];
> +
> +			if (!down || !down->cache_tree)
> +				continue;
> +
> +			strbuf_setlen(path, len);
> +			strbuf_add(path, down->name, down->namelen);
> +			strbuf_addch(path, '/');
> +			if (verify_and_invalidate(to, from,
> +						  down->cache_tree, path))
> +				return -1;
> +		}
> +		strbuf_setlen(path, len);
> +	}
> +	return 0;
> +}
> +
> +static struct cache_tree *duplicate_cache_tree(const struct cache_tree *src)
> +{
> +	struct cache_tree *dst;
> +	int i;
> +
> +	if (!src)
> +		return NULL;
> +
> +	dst = xmalloc(sizeof(*dst));
> +	dst->entry_count = src->entry_count;
> +	oidcpy(&dst->oid, &src->oid);
> +	dst->subtree_nr = src->subtree_nr;
> +	dst->subtree_alloc = dst->subtree_nr;
> +	ALLOC_ARRAY(dst->down, dst->subtree_alloc);
> +	for (i = 0; i < src->subtree_nr; i++) {
> +		struct cache_tree_sub *dsrc = src->down[i];
> +		struct cache_tree_sub *down;
> +
> +		FLEX_ALLOC_MEM(down, name, dsrc->name, dsrc->namelen);
> +		down->count = dsrc->count;
> +		down->namelen = dsrc->namelen;
> +		down->used = dsrc->used;
> +		down->cache_tree = duplicate_cache_tree(dsrc->cache_tree);
> +		dst->down[i] = down;
> +	}
> +	return dst;
> +}
> +
> +int cache_tree_copy(struct index_state *to, const struct index_state *from)
> +{
> +	struct cache_tree *it = duplicate_cache_tree(from->cache_tree);
> +	struct strbuf path = STRBUF_INIT;
> +	int ret;
> +
> +	if (to->cache_tree)
> +		BUG("Sorry merging cache-tree is not supported yet");
> +	ret = verify_and_invalidate(to, from, it, &path);
> +	to->cache_tree = it;
> +	to->cache_changed |= CACHE_TREE_CHANGED;
> +	strbuf_release(&path);
> +	return ret;
> +}
> +
>   static void write_one(struct strbuf *buffer, struct cache_tree *it,
>                         const char *path, int pathlen)
>   {
> diff --git a/cache-tree.h b/cache-tree.h
> index cfd5328cc9..6981da8e0d 100644
> --- a/cache-tree.h
> +++ b/cache-tree.h
> @@ -53,4 +53,6 @@ void prime_cache_tree(struct index_state *, struct tree *);
>   
>   extern int cache_tree_matches_traversal(struct cache_tree *, struct name_entry *ent, struct traverse_info *info);
>   
> +int cache_tree_copy(struct index_state *to, const struct index_state *from);
> +
>   #endif
> diff --git a/unpack-trees.c b/unpack-trees.c
> index cd0680f11e..cb3fdd42a6 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1427,12 +1427,22 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>   	ret = check_updates(o) ? (-2) : 0;
>   	if (o->dst_index) {
>   		if (!ret) {
> -			if (!o->result.cache_tree)
> +			if (!o->result.cache_tree) {
> +				uint64_t start = getnanotime();
> +#if 0
>   				o->result.cache_tree = cache_tree();
> -			if (!cache_tree_fully_valid(o->result.cache_tree))
> +#else
> +				cache_tree_copy(&o->result, o->src_index);
> +#endif
> +				trace_performance_since(start, "copy");
> +			}
> +			if (!cache_tree_fully_valid(o->result.cache_tree)) {
> +				uint64_t start = getnanotime();
>   				cache_tree_update(&o->result,
>   						  WRITE_TREE_SILENT |
>   						  WRITE_TREE_REPAIR);
> +				trace_performance_since(start, "update");
> +			}
>   		}
>   		move_index_extensions(&o->result, o->src_index);
>   		discard_index(o->dst_index);
> -- 8< --
> 

I like the idea (and the perf win!) but it seems like there is an 
important piece missing.  If I'm reading this correctly, unpack_trees() 
will copy the source cache tree (instead of creating a new one) and then 
verify_and_invalidate() will walk the cache tree and for any tree that 
is dirty, it will flag its ancestors as dirty as well.

What I don't understand is how any cache tree entries that became 
invalid as a result of the merge of the n-trees are marked as invalid. 
It seems like something needs to walk the cache tree and call 
cache_tree_invalidate_path() for all entries that changed as a result of 
the merge before the call to verify_and_invalidate().

I thought at first cache_tree_fully_valid() might do that but it only 
looks for entries that are already marked as invalid (or are missing 
their corresponding object in the object store).  It assumes something 
else has marked the invalid paths already.
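
Something along these lines, run before the verify step, is what I'd
expect to be necessary (a hypothetical sketch; it assumes both
indexes are sorted and fully loaded, and ignores conflicted entries):

	int i;

	for (i = 0; i < o->result.cache_nr; i++) {
		const struct cache_entry *ce = o->result.cache[i];
		int pos = index_name_pos(o->src_index, ce->name,
					 ce_namelen(ce));

		/* entry is new, or was changed by the merge */
		if (pos < 0 ||
		    oidcmp(&o->src_index->cache[pos]->oid, &ce->oid) ||
		    o->src_index->cache[pos]->ce_mode != ce->ce_mode)
			cache_tree_invalidate_path(&o->result, ce->name);
	}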

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 0/4] Speed up unpack_trees()
  2018-08-08 20:53                                   ` Ben Peart
@ 2018-08-09  8:16                                     ` Ben Peart
  2018-08-10 16:08                                       ` Duy Nguyen
  2018-08-10 15:51                                     ` Duy Nguyen
  1 sibling, 1 reply; 121+ messages in thread
From: Ben Peart @ 2018-08-09  8:16 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Ben Peart, Git Mailing List, Junio C Hamano, Jeff King



On 8/8/2018 4:53 PM, Ben Peart wrote:
> 
> 
> On 8/1/2018 12:38 PM, Duy Nguyen wrote:
>> On Tue, Jul 31, 2018 at 01:31:31PM -0400, Ben Peart wrote:
>>>
>>>
>>> On 7/31/2018 12:50 PM, Ben Peart wrote:
>>>>
>>>>
>>>> On 7/31/2018 11:31 AM, Duy Nguyen wrote:
>>>
>>>>>
>>>>>> In the performance game of whack-a-mole, that call to repair 
>>>>>> cache-tree
>>>>>> is now looking quite expensive...
>>>>>
>>>>> Yeah and I think we can whack that mole too. I did some measurement.
>>>>> Best case possible, we just need to scan through two indexes (one with
>>>>> many good cache-tree, one with no cache-tree), compare and copy
>>>>> cache-tree over. The scanning takes like 1% time of current repair
>>>>> step and I suspect it's the hashing that takes most of the time. Of
>>>>> course real world won't have such nice numbers, but I guess we could
>>>>> maybe half cache-tree update/repair time.
>>>>>
>>>>
>>>> I have some great profiling tools available so will take a look at this
>>>> next and see exactly where the time is being spent.
>>>
>>> Good instincts.  In cache_tree_update, the heavy hitter is definitely
>>> hash_object_file followed by has_object_file.
>>>
>>> Name                                   Inc %         Inc
>>> + git!cache_tree_update                 12.4       4,935
>>> |+ git!update_one                       11.8       4,706
>>> | + git!update_one                      11.8       4,706
>>> |  + git!hash_object_file                6.1       2,406
>>> |  + git!has_object_file                 2.0         813
>>> |  + OTHER <<vcruntime140d!strchr>>      0.5         203
>>> |  + git!strbuf_addf                     0.4         155
>>> |  + git!strbuf_release                  0.4         143
>>> |  + git!strbuf_add                      0.3         121
>>> |  + OTHER <<vcruntime140d!memcmp>>      0.2          93
>>> |  + git!strbuf_grow                     0.1          25
>>
>> Ben, if you work on this, this could be a good starting point. I will
>> not work on this because I still have some other things to catch up
>> and follow through. You can have my sign off if you reuse something
>> from this patch
>>
>> Even if it's a naive implementation, the initial numbers look pretty
>> good. Without the patch we have
>>
>> 18:31:05.970621 unpack-trees.c:1437     performance: 0.000001029 s: copy
>> 18:31:05.975729 unpack-trees.c:1444     performance: 0.005082004 s: 
>> update
>>
>> And with the patch
>>
>> 18:31:13.295655 unpack-trees.c:1437     performance: 0.000198017 s: copy
>> 18:31:13.296757 unpack-trees.c:1444     performance: 0.001075935 s: 
>> update
>>
>> Time saving is about 80% by the look of this (best possible case
>> because only the top tree needs to be hashed and written out).
>>
>> -- 8< --
>> diff --git a/cache-tree.c b/cache-tree.c
>> index 6b46711996..67a4a93100 100644
>> --- a/cache-tree.c
>> +++ b/cache-tree.c
>> @@ -440,6 +440,147 @@ int cache_tree_update(struct index_state 
>> *istate, int flags)
>>       return 0;
>>   }
>> +static int same(const struct cache_entry *a, const struct cache_entry 
>> *b)
>> +{
>> +    if (ce_stage(a) || ce_stage(b))
>> +        return 0;
>> +    if ((a->ce_flags | b->ce_flags) & CE_CONFLICTED)
>> +        return 0;
>> +    return a->ce_mode == b->ce_mode &&
>> +           !oidcmp(&a->oid, &b->oid);
>> +}
>> +
>> +static int cache_tree_name_pos(const struct index_state *istate,
>> +                   const struct strbuf *path)
>> +{
>> +    int pos;
>> +
>> +    if (!path->len)
>> +        return 0;
>> +
>> +    pos = index_name_pos(istate, path->buf, path->len);
>> +    if (pos >= 0)
>> +        BUG("No no no, directory path must not exist in index");
>> +    return -pos - 1;
>> +}
>> +
>> +/*
>> + * Locate the same cache-tree in two separate indexes. Check the
>> + * cache-tree is still valid for the "to" index (i.e. it contains the
>> + * same set of entries in the "from" index).
>> + */
>> +static int verify_one_cache_tree(const struct index_state *to,
>> +                 const struct index_state *from,
>> +                 const struct cache_tree *it,
>> +                 const struct strbuf *path)
>> +{
>> +    int i, spos, dpos;
>> +
>> +    spos = cache_tree_name_pos(from, path);
>> +    if (spos + it->entry_count > from->cache_nr)
>> +        return -1;
>> +
>> +    dpos = cache_tree_name_pos(to, path);
>> +    if (dpos + it->entry_count > to->cache_nr)
>> +        return -1;
>> +
>> +    /* Can we quickly check head and tail and bail out early */
>> +    if (!same(from->cache[spos], to->cache[dpos]) ||
>> +        !same(from->cache[spos + it->entry_count - 1],
>> +          to->cache[dpos + it->entry_count - 1]))
>> +        return -1;
>> +
>> +    for (i = 1; i < it->entry_count - 1; i++)
>> +        if (!same(from->cache[spos + i],
>> +              to->cache[dpos + i]))
>> +            return -1;
>> +
>> +    return 0;
>> +}
>> +
>> +static int verify_and_invalidate(struct index_state *to,
>> +                 const struct index_state *from,
>> +                 struct cache_tree *it,
>> +                 struct strbuf *path)
>> +{
>> +    /*
>> +     * Optimistically verify the current tree first. Alternatively
>> +     * we could verify all the subtrees first then do this
>> +     * last. Any invalid subtree would also invalidate its
>> +     * ancestors.
>> +     */
>> +    if (it->entry_count != -1 &&
>> +        verify_one_cache_tree(to, from, it, path))
>> +        it->entry_count = -1;
>> +
>> +    /*
>> +     * If the current tree is valid, don't bother checking
>> +     * inside. All subtrees _should_ also be valid
>> +     */
>> +    if (it->entry_count == -1) {
>> +        int i, len = path->len;
>> +
>> +        for (i = 0; i < it->subtree_nr; i++) {
>> +            struct cache_tree_sub *down = it->down[i];
>> +
>> +            if (!down || !down->cache_tree)
>> +                continue;
>> +
>> +            strbuf_setlen(path, len);
>> +            strbuf_add(path, down->name, down->namelen);
>> +            strbuf_addch(path, '/');
>> +            if (verify_and_invalidate(to, from,
>> +                          down->cache_tree, path))
>> +                return -1;
>> +        }
>> +        strbuf_setlen(path, len);
>> +    }
>> +    return 0;
>> +}
>> +
>> +static struct cache_tree *duplicate_cache_tree(const struct cache_tree *src)
>> +{
>> +    struct cache_tree *dst;
>> +    int i;
>> +
>> +    if (!src)
>> +        return NULL;
>> +
>> +    dst = xmalloc(sizeof(*dst));
>> +    dst->entry_count = src->entry_count;
>> +    oidcpy(&dst->oid, &src->oid);
>> +    dst->subtree_nr = src->subtree_nr;
>> +    dst->subtree_alloc = dst->subtree_nr;
>> +    ALLOC_ARRAY(dst->down, dst->subtree_alloc);
>> +    for (i = 0; i < src->subtree_nr; i++) {
>> +        struct cache_tree_sub *dsrc = src->down[i];
>> +        struct cache_tree_sub *down;
>> +
>> +        FLEX_ALLOC_MEM(down, name, dsrc->name, dsrc->namelen);
>> +        down->count = dsrc->count;
>> +        down->namelen = dsrc->namelen;
>> +        down->used = dsrc->used;
>> +        down->cache_tree = duplicate_cache_tree(dsrc->cache_tree);
>> +        dst->down[i] = down;
>> +    }
>> +    return dst;
>> +}
>> +
>> +int cache_tree_copy(struct index_state *to, const struct index_state *from)
>> +{
>> +    struct cache_tree *it = duplicate_cache_tree(from->cache_tree);
>> +    struct strbuf path = STRBUF_INIT;
>> +    int ret;
>> +
>> +    if (to->cache_tree)
>> +        BUG("Sorry merging cache-tree is not supported yet");
>> +    ret = verify_and_invalidate(to, from, it, &path);
>> +    to->cache_tree = it;
>> +    to->cache_changed |= CACHE_TREE_CHANGED;
>> +    strbuf_release(&path);
>> +    return ret;
>> +}
>> +
>>   static void write_one(struct strbuf *buffer, struct cache_tree *it,
>>                         const char *path, int pathlen)
>>   {
>> diff --git a/cache-tree.h b/cache-tree.h
>> index cfd5328cc9..6981da8e0d 100644
>> --- a/cache-tree.h
>> +++ b/cache-tree.h
>> @@ -53,4 +53,6 @@ void prime_cache_tree(struct index_state *, struct tree *);
>>
>>   extern int cache_tree_matches_traversal(struct cache_tree *, struct name_entry *ent, struct traverse_info *info);
>>
>> +int cache_tree_copy(struct index_state *to, const struct index_state *from);
>> +
>>   #endif
>> diff --git a/unpack-trees.c b/unpack-trees.c
>> index cd0680f11e..cb3fdd42a6 100644
>> --- a/unpack-trees.c
>> +++ b/unpack-trees.c
>> @@ -1427,12 +1427,22 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>>       ret = check_updates(o) ? (-2) : 0;
>>       if (o->dst_index) {
>>           if (!ret) {
>> -            if (!o->result.cache_tree)
>> +            if (!o->result.cache_tree) {
>> +                uint64_t start = getnanotime();
>> +#if 0
>>                   o->result.cache_tree = cache_tree();
>> -            if (!cache_tree_fully_valid(o->result.cache_tree))
>> +#else
>> +                cache_tree_copy(&o->result, o->src_index);
>> +#endif
>> +                trace_performance_since(start, "copy");
>> +            }
>> +            if (!cache_tree_fully_valid(o->result.cache_tree)) {
>> +                uint64_t start = getnanotime();
>>                   cache_tree_update(&o->result,
>>                             WRITE_TREE_SILENT |
>>                             WRITE_TREE_REPAIR);
>> +                trace_performance_since(start, "update");
>> +            }
>>           }
>>           move_index_extensions(&o->result, o->src_index);
>>           discard_index(o->dst_index);
>> -- 8< --
>>
> 
> I like the idea (and the perf win!) but it seems like there is an 
> important piece missing.  If I'm reading this correctly, unpack_trees() 
> will copy the source cache tree (instead of creating a new one) and then 
> verify_and_invalidate() will walk the cache tree and for any tree that 
> is dirty, it will flag its ancestors as dirty as well.
> 
> What I don't understand is how any cache tree entries that became 
> invalid as a result of the merge of the n-trees are marked as invalid. 
> It seems like something needs to walk the cache tree and call 
> cache_tree_invalidate_path() for all entries that changed as a result of 
> the merge before the call to verify_and_invalidate().
> 
> I thought at first cache_tree_fully_valid() might do that but it only 
> looks for entries that are already marked as invalid (or are missing 
> their corresponding object in the object store).  It assumes something 
> else has marked the invalid paths already.

In fact, in the other [1] patch series, we're detecting the number of 
cache entries that are the same as the cache tree and using that to 
traverse_by_cache_tree().  At that point, couldn't we copy the 
corresponding cache tree entries over to the destination so that those 
don't have to get recreated in the later call to cache_tree_update()?

[1] 
https://public-inbox.org/git/20180727154241.GA21288@duynguyen.home/T/#mad6b94733dcf16c29350cbad4beccd9ca93beaed

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 0/4] Speed up unpack_trees()
  2018-08-08 20:53                                   ` Ben Peart
  2018-08-09  8:16                                     ` Ben Peart
@ 2018-08-10 15:51                                     ` Duy Nguyen
  1 sibling, 0 replies; 121+ messages in thread
From: Duy Nguyen @ 2018-08-10 15:51 UTC (permalink / raw)
  To: Ben Peart; +Cc: Ben Peart, Git Mailing List, Junio C Hamano, Jeff King

On Wed, Aug 8, 2018 at 10:53 PM Ben Peart <peartben@gmail.com> wrote:
>
>
>
> On 8/1/2018 12:38 PM, Duy Nguyen wrote:
> > On Tue, Jul 31, 2018 at 01:31:31PM -0400, Ben Peart wrote:
> >>
> >>
> >> On 7/31/2018 12:50 PM, Ben Peart wrote:
> >>>
> >>>
> >>> On 7/31/2018 11:31 AM, Duy Nguyen wrote:
> >>
> >>>>
> >>>>> In the performance game of whack-a-mole, that call to repair cache-tree
> >>>>> is now looking quite expensive...
> >>>>
> >>>> Yeah and I think we can whack that mole too. I did some measurement.
> >>>> Best case possible, we just need to scan through two indexes (one with
> >>>> many good cache-tree, one with no cache-tree), compare and copy
> >>>> cache-tree over. The scanning takes like 1% time of current repair
> >>>> step and I suspect it's the hashing that takes most of the time. Of
> >>>> course real world won't have such nice numbers, but I guess we could
> >>>> maybe half cache-tree update/repair time.
> >>>>
> >>>
> >>> I have some great profiling tools available so will take a look at this
> >>> next and see exactly where the time is being spent.
> >>
> >> Good instincts.  In cache_tree_update, the heavy hitter is definitely
> >> hash_object_file followed by has_object_file.
> >>
> >> Name                                 Inc %        Inc
> >> + git!cache_tree_update               12.4      4,935
> >> |+ git!update_one                     11.8      4,706
> >> | + git!update_one                    11.8      4,706
> >> |  + git!hash_object_file              6.1      2,406
> >> |  + git!has_object_file               2.0        813
> >> |  + OTHER <<vcruntime140d!strchr>>    0.5        203
> >> |  + git!strbuf_addf                   0.4        155
> >> |  + git!strbuf_release                0.4        143
> >> |  + git!strbuf_add                    0.3        121
> >> |  + OTHER <<vcruntime140d!memcmp>>    0.2         93
> >> |  + git!strbuf_grow                   0.1         25
> >
> > Ben, if you work on this, this could be a good starting point. I will
> > not work on this because I still have some other things to catch up
> > and follow through. You can have my sign off if you reuse something
> > from this patch
> >
> > Even if it's a naive implementation, the initial numbers look pretty
> > good. Without the patch we have
> >
> > 18:31:05.970621 unpack-trees.c:1437     performance: 0.000001029 s: copy
> > 18:31:05.975729 unpack-trees.c:1444     performance: 0.005082004 s: update
> >
> > And with the patch
> >
> > 18:31:13.295655 unpack-trees.c:1437     performance: 0.000198017 s: copy
> > 18:31:13.296757 unpack-trees.c:1444     performance: 0.001075935 s: update
> >
> > Time saving is about 80% by the look of this (best possible case
> > because only the top tree needs to be hashed and written out).
> >
> > -- 8< --
> > diff --git a/cache-tree.c b/cache-tree.c
> > index 6b46711996..67a4a93100 100644
> > --- a/cache-tree.c
> > +++ b/cache-tree.c
> > @@ -440,6 +440,147 @@ int cache_tree_update(struct index_state *istate, int flags)
> >       return 0;
> >   }
> >
> > +static int same(const struct cache_entry *a, const struct cache_entry *b)
> > +{
> > +     if (ce_stage(a) || ce_stage(b))
> > +             return 0;
> > +     if ((a->ce_flags | b->ce_flags) & CE_CONFLICTED)
> > +             return 0;
> > +     return a->ce_mode == b->ce_mode &&
> > +            !oidcmp(&a->oid, &b->oid);
> > +}
> > +
> > +static int cache_tree_name_pos(const struct index_state *istate,
> > +                            const struct strbuf *path)
> > +{
> > +     int pos;
> > +
> > +     if (!path->len)
> > +             return 0;
> > +
> > +     pos = index_name_pos(istate, path->buf, path->len);
> > +     if (pos >= 0)
> > +             BUG("No no no, directory path must not exist in index");
> > +     return -pos - 1;
> > +}
> > +
> > +/*
> > + * Locate the same cache-tree in two separate indexes. Check the
> > + * cache-tree is still valid for the "to" index (i.e. it contains the
> > + * same set of entries as the "from" index).
> > + */
> > +static int verify_one_cache_tree(const struct index_state *to,
> > +                              const struct index_state *from,
> > +                              const struct cache_tree *it,
> > +                              const struct strbuf *path)
> > +{
> > +     int i, spos, dpos;
> > +
> > +     spos = cache_tree_name_pos(from, path);
> > +     if (spos + it->entry_count > from->cache_nr)
> > +             return -1;
> > +
> > +     dpos = cache_tree_name_pos(to, path);
> > +     if (dpos + it->entry_count > to->cache_nr)
> > +             return -1;
> > +
> > +     /* Can we quickly check head and tail and bail out early */
> > +     if (!same(from->cache[spos], to->cache[dpos]) ||
> > +         !same(from->cache[spos + it->entry_count - 1],
> > +               to->cache[dpos + it->entry_count - 1]))
> > +             return -1;
> > +
> > +     for (i = 1; i < it->entry_count - 1; i++)
> > +             if (!same(from->cache[spos + i],
> > +                       to->cache[dpos + i]))
> > +                     return -1;
> > +
> > +     return 0;
> > +}
> > +
> > +static int verify_and_invalidate(struct index_state *to,
> > +                              const struct index_state *from,
> > +                              struct cache_tree *it,
> > +                              struct strbuf *path)
> > +{
> > +     /*
> > +      * Optimistically verify the current tree first. Alternatively
> > +      * we could verify all the subtrees first then do this
> > +      * last. Any invalid subtree would also invalidate its
> > +      * ancestors.
> > +      */
> > +     if (it->entry_count != -1 &&
> > +         verify_one_cache_tree(to, from, it, path))
> > +             it->entry_count = -1;
> > +
> > +     /*
> > +      * If the current tree is valid, don't bother checking
> > +      * inside. All subtrees _should_ also be valid
> > +      */
> > +     if (it->entry_count == -1) {
> > +             int i, len = path->len;
> > +
> > +             for (i = 0; i < it->subtree_nr; i++) {
> > +                     struct cache_tree_sub *down = it->down[i];
> > +
> > +                     if (!down || !down->cache_tree)
> > +                             continue;
> > +
> > +                     strbuf_setlen(path, len);
> > +                     strbuf_add(path, down->name, down->namelen);
> > +                     strbuf_addch(path, '/');
> > +                     if (verify_and_invalidate(to, from,
> > +                                               down->cache_tree, path))
> > +                             return -1;
> > +             }
> > +             strbuf_setlen(path, len);
> > +     }
> > +     return 0;
> > +}
> > +
> > +static struct cache_tree *duplicate_cache_tree(const struct cache_tree *src)
> > +{
> > +     struct cache_tree *dst;
> > +     int i;
> > +
> > +     if (!src)
> > +             return NULL;
> > +
> > +     dst = xmalloc(sizeof(*dst));
> > +     dst->entry_count = src->entry_count;
> > +     oidcpy(&dst->oid, &src->oid);
> > +     dst->subtree_nr = src->subtree_nr;
> > +     dst->subtree_alloc = dst->subtree_nr;
> > +     ALLOC_ARRAY(dst->down, dst->subtree_alloc);
> > +     for (i = 0; i < src->subtree_nr; i++) {
> > +             struct cache_tree_sub *dsrc = src->down[i];
> > +             struct cache_tree_sub *down;
> > +
> > +             FLEX_ALLOC_MEM(down, name, dsrc->name, dsrc->namelen);
> > +             down->count = dsrc->count;
> > +             down->namelen = dsrc->namelen;
> > +             down->used = dsrc->used;
> > +             down->cache_tree = duplicate_cache_tree(dsrc->cache_tree);
> > +             dst->down[i] = down;
> > +     }
> > +     return dst;
> > +}
> > +
> > +int cache_tree_copy(struct index_state *to, const struct index_state *from)
> > +{
> > +     struct cache_tree *it = duplicate_cache_tree(from->cache_tree);
> > +     struct strbuf path = STRBUF_INIT;
> > +     int ret;
> > +
> > +     if (to->cache_tree)
> > +             BUG("Sorry merging cache-tree is not supported yet");
> > +     ret = verify_and_invalidate(to, from, it, &path);
> > +     to->cache_tree = it;
> > +     to->cache_changed |= CACHE_TREE_CHANGED;
> > +     strbuf_release(&path);
> > +     return ret;
> > +}
> > +
> >   static void write_one(struct strbuf *buffer, struct cache_tree *it,
> >                         const char *path, int pathlen)
> >   {
> > diff --git a/cache-tree.h b/cache-tree.h
> > index cfd5328cc9..6981da8e0d 100644
> > --- a/cache-tree.h
> > +++ b/cache-tree.h
> > @@ -53,4 +53,6 @@ void prime_cache_tree(struct index_state *, struct tree *);
> >
> >   extern int cache_tree_matches_traversal(struct cache_tree *, struct name_entry *ent, struct traverse_info *info);
> >
> > +int cache_tree_copy(struct index_state *to, const struct index_state *from);
> > +
> >   #endif
> > diff --git a/unpack-trees.c b/unpack-trees.c
> > index cd0680f11e..cb3fdd42a6 100644
> > --- a/unpack-trees.c
> > +++ b/unpack-trees.c
> > @@ -1427,12 +1427,22 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> >       ret = check_updates(o) ? (-2) : 0;
> >       if (o->dst_index) {
> >               if (!ret) {
> > -                     if (!o->result.cache_tree)
> > +                     if (!o->result.cache_tree) {
> > +                             uint64_t start = getnanotime();
> > +#if 0
> >                               o->result.cache_tree = cache_tree();
> > -                     if (!cache_tree_fully_valid(o->result.cache_tree))
> > +#else
> > +                             cache_tree_copy(&o->result, o->src_index);
> > +#endif
> > +                             trace_performance_since(start, "copy");
> > +                     }
> > +                     if (!cache_tree_fully_valid(o->result.cache_tree)) {
> > +                             uint64_t start = getnanotime();
> >                               cache_tree_update(&o->result,
> >                                                 WRITE_TREE_SILENT |
> >                                                 WRITE_TREE_REPAIR);
> > +                             trace_performance_since(start, "update");
> > +                     }
> >               }
> >               move_index_extensions(&o->result, o->src_index);
> >               discard_index(o->dst_index);
> > -- 8< --
> >
>
> I like the idea (and the perf win!) but it seems like there is an
> important piece missing.  If I'm reading this correctly, unpack_trees()
> will copy the source cache tree (instead of creating a new one) and then
> verify_and_invalidate() will walk the cache tree and for any tree that
> is dirty, it will flag its ancestors as dirty as well.

That, and the verification part. The for loop at the bottom of
verify_one_cache_tree() makes sure that the cache-tree is valid. That
is, if we recreate cache-tree from scratch in the destination index,
it should produce the same OID as the cache-tree we copy over.

But I think I'm a bit loose in that check. Suppose in the source index we have

abc
foo/abc
foo/def
xyz

the for loop tries to make sure that, for the cache-tree of 'foo', the
destination index has foo/abc and foo/def (with the same mode,
oid, ...) but it fails to catch this

abc
foo/abc
foo/def
foo/xyz
xyz

If we recreate the cache-tree from scratch, the cache-tree for 'foo'
should cover three items and have a different oid than the one we
copied from the source index. The same problem could happen if we had
something in foo that sorts before foo/abc.
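
Something like this could tighten it up, I think (a rough and
completely untested sketch on top of verify_one_cache_tree(); the
helper name is made up):

/*
 * After matching it->entry_count entries, the next entry in the
 * destination index must not still be under "path" (which ends
 * with '/'), otherwise "to" has extra entries like foo/xyz that
 * the copied cache-tree would not cover.
 */
static int covers_extra_entries(const struct index_state *to,
				int dpos, int entry_count,
				const struct strbuf *path)
{
	const struct cache_entry *next;

	if (dpos + entry_count >= to->cache_nr)
		return 0; /* nothing after the matched range */
	next = to->cache[dpos + entry_count];
	return path->len &&
	       ce_namelen(next) > path->len &&
	       !memcmp(next->name, path->buf, path->len);
}

verify_one_cache_tree() would return -1 when that triggers. For the
"something before foo/abc" case, same() would also need a name
comparison (right now it only checks mode and oid), so that an extra
entry shifting the range cannot slip through.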

> What I don't understand is how any cache tree entries that became
> invalid as a result of the merge of the n-trees are marked as invalid.
> It seems like something needs to walk the cache tree and call
> cache_tree_invalidate_path() for all entries that changed as a result of
> the merge before the call to verify_and_invalidate().

I'm not sure I follow, but the way I understand it, when we merge from
o->src_index to o->result, we start o->result with an empty
cache-tree. There's nothing in there to invalidate, even though we do
call cache_tree_invalidate_path() (from invalidate_ce_path() in
unpack-trees.c).

I don't think the merge operation is related to this at all. This
problem can be stated as "I have a set of good cache-trees that are
associated with index 'A' and a new index 'B' with no cache-tree at
all. Can I (cheaply) reuse some cache-trees from 'A'?". My answer in
this patch is yes: for each cache-tree in A, make sure that the list
of cache-entries associated with it is present in B (which is almost
correct).
-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 0/4] Speed up unpack_trees()
  2018-08-09  8:16                                     ` Ben Peart
@ 2018-08-10 16:08                                       ` Duy Nguyen
  0 siblings, 0 replies; 121+ messages in thread
From: Duy Nguyen @ 2018-08-10 16:08 UTC (permalink / raw)
  To: Ben Peart; +Cc: Ben Peart, Git Mailing List, Junio C Hamano, Jeff King

On Thu, Aug 9, 2018 at 10:16 AM Ben Peart <peartben@gmail.com> wrote:
> In fact, in the other [1] patch series, we're detecting the number of
> cache entries that are the same as the cache tree and using that to
> traverse_by_cache_tree().  At that point, couldn't we copy the
> corresponding cache tree entries over to the destination so that those
> don't have to get recreated in the later call to cache_tree_update()?

We could. But as I stated in another mail, I saw this cache-tree
optimization as a separate problem and didn't want to mix them up.
That way cache_tree_copy() could be used elsewhere if the opportunity
shows up.

Mixing them up could also complicate the problem. When you merge stuff,
you add new cache-entries to o->result with add_index_entry(), which
tries to invalidate those paths in o->result's cache-tree. Right now
that cache-tree is empty so it's really a no-op. But if you copy the
cache-tree over while merging, that invalidation might either
invalidate your newly copied cache-tree, or slow things down, because
a non-empty cache-tree in o->result means you have to walk it to find
whether there is any path to invalidate.

PS. This code keeps messing me up. invalidate_ce_path() may also
invalidate cache-tree in the _source_ index. For this optimization to
really shine, you'd better keep the original cache-tree intact (so
that you can reuse as much as possible).

I don't see the purpose of this source cache-tree invalidation at all.
My guess at this point is that Linus actually made a mistake in
34110cd4e3 (Make 'unpack_trees()' have a separate source and
destination index - 2008-03-06) and he should have invalidated the
_destination_ index instead of the source one. I'm going to dig in
some more and probably will send a patch to remove this invalidation.
-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v3 2/4] unpack-trees: optimize walking same trees with cache-tree
  2018-08-08 18:23                             ` Elijah Newren
@ 2018-08-10 16:29                               ` Duy Nguyen
  2018-08-10 18:48                                 ` Elijah Newren
  0 siblings, 1 reply; 121+ messages in thread
From: Duy Nguyen @ 2018-08-10 16:29 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Ben Peart, Git Mailing List, Junio C Hamano, Ben Peart, Jeff King

On Wed, Aug 8, 2018 at 8:23 PM Elijah Newren <newren@gmail.com> wrote:
> > diff --git a/unpack-trees.c b/unpack-trees.c
> > index a32ddee159..ba3d2e947e 100644
> > --- a/unpack-trees.c
> > +++ b/unpack-trees.c
> > @@ -644,6 +644,102 @@ static inline int are_same_oid(struct name_entry *name_j, struct name_entry *nam
> >         return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
> >  }
> >
> > +static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,
> > +                                       struct name_entry *names,
> > +                                       struct traverse_info *info)
> > +{
> > +       struct unpack_trees_options *o = info->data;
> > +       int i;
> > +
> > +       if (!o->merge || dirmask != ((1 << n) - 1))
> > +               return 0;
> > +
> > +       for (i = 1; i < n; i++)
> > +               if (!are_same_oid(names, names + i))
> > +                       return 0;
> > +
> > +       return cache_tree_matches_traversal(o->src_index->cache_tree, names, info);
> > +}
>
> I was curious whether this could also be extended in the case of a
> merge; as long as HEAD and MERGE have the same tree, even if the base
> commit doesn't match, we can still just use the tree from HEAD which
> should be in the current index/cache_tree.  However, it'd be a
> somewhat odd history for HEAD and MERGE to match on some significantly
> sized tree when the base commit doesn't also match.

I did have 3-way merge in mind when I wrote this patch. Yes, it's
unlikely except in one case (I think). Consider a large "mono repo"
that contains stuff from many teams. When you branch out for your own
team, most of your changes will be in a few directories, with the rest
of the code base untouched. In that case we could have a lot of
identical trees in subdirectories outside the stuff your team touches.
This of course assumes that your team keeps the same base static for
some time, not constantly rebasing/merging on top of 'master'.

> > +       /*
> > +        * Do what unpack_callback() and unpack_nondirectories() normally
> > +        * do. But we walk all paths recursively in just one loop instead.
> > +        *
> > +        * D/F conflicts and staged entries are not a concern because
>
> "staged entries"?  Do you mean "higher stage entries"?  I'm not sure
> the correct terminology here, but the former makes me think of changes
> the user has staged but not committed (i.e. stuff found at stage #0 in
> the index, but which isn't found in any tree yet) vs. the latter which
> I'd use to refer to entries at stages 1 or higher.

Yep, stage 1 or higher (I was thinking of ce_stage() when I wrote this).
Will clarify.


> > +        * cache-tree would be invalidated and we would never get here
> > +        * in the first place.
> > +        */
> > +       for (i = 0; i < nr_entries; i++) {
> > +               struct cache_entry *tree_ce;
> > +               int len, rc;
> > +
> > +               src[0] = o->src_index->cache[pos + i];
> > +
> > +               len = ce_namelen(src[0]);
> > +               tree_ce = xcalloc(1, cache_entry_size(len));
> > +
> > +               tree_ce->ce_mode = src[0]->ce_mode;
> > +               tree_ce->ce_flags = create_ce_flags(0);
> > +               tree_ce->ce_namelen = len;
> > +               oidcpy(&tree_ce->oid, &src[0]->oid);
> > +               memcpy(tree_ce->name, src[0]->name, len + 1);
>
> We do a bunch of work to setup tree_ce...
>
> > +               for (d = 1; d <= nr_names; d++)
> > +                       src[d] = tree_ce;
>
> ...then we make nr_names copies of tree_ce (so that *way_merge or
> bind_merge or oneway_diff or whatever will have the expected number of
> entries).
>
> > +               rc = call_unpack_fn((const struct cache_entry * const *)src, o);
>
> ...then we call o->fn (via call_unpack_fn) to do various complicated
> logic to figure out which tree_ce to use??  Isn't that just an
> expensive way to recompute that what we currently have in the index is
> what we want to keep there?
>
> Granted, a caller of this may have set o->fn to something other than
> {one,two,three}way_merge (or bind_merge), and that function might have
> important side effects...but it just seems annoying to have to do so
> much work when for most uses we already know the entry in the index is
> the one we already want.

I'm not so sure about that. Which is why I keep it generic.

> In fact, the only other thing in the
> codebase that o->fn is now set to is oneway_diff, which I think is a
> no-op when the two trees match.
>
> Would be nice if we could avoid all this, at least in the common cases
> where o->fn is a function known to not have side effects.  Or did I
> not read those functions closely enough and they do have important
> side effects?

In one of my earlier "how about this" attempts, I introduced fn_same
[1], which can help achieve this without carving "known not to have
side effects" into common code. I think that is still a good direction
to go if we want to optimize more aggressively. We could have
something like this

diff --git a/unpack-trees.c b/unpack-trees.c
index 1f11991a51..01b80389e0 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -699,6 +699,9 @@ static int traverse_by_cache_tree(int pos, int
nr_entries, int nr_names,
        int ce_len = 0;
        int i, d;

+       if (o->fn_cache_tree)
+               return o->fn_cache_tree(pos, nr_entries, nr_names, names, info);
+
        if (!o->merge)
                BUG("We need cache-tree to do this optimization");

then you can add, say, threeway_cache_tree_merge(), which does what
traverse_by_cache_tree() does but more efficiently. This involves a
lot more work (mostly staring at those n-merge functions and making
sure you set up the right conditions before taking the fast path).

I didn't do it because.. well.. it's more work and also riskier. I
think we can leave that for later, unless you think we should do it
now.
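
For illustration only, such a callback could look roughly like this
(completely untested; mark_ce_used() and add_entry() are the existing
static helpers in unpack-trees.c, and the return convention is assumed
to match traverse_by_cache_tree()):

static int oneway_cache_tree_merge(int pos, int nr_entries, int nr_names,
				   struct name_entry *names,
				   struct traverse_info *info)
{
	struct unpack_trees_options *o = info->data;
	int i;

	/*
	 * Everything already matches the cache-tree, so keep each
	 * source entry as-is instead of going through
	 * call_unpack_fn() and the generic n-way merge logic.
	 */
	for (i = 0; i < nr_entries; i++) {
		struct cache_entry *ce = o->src_index->cache[pos + i];

		mark_ce_used(ce, o);
		add_entry(o, ce, 0, 0); /* dups ce into o->result */
	}
	return nr_entries;
}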

[1] https://public-inbox.org/git/20180726163049.GA15572@duynguyen.home/
-- 
Duy

^ permalink raw reply related	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 4/4] unpack-trees: cheaper index update when walking by cache-tree
  2018-08-08 18:46                           ` Elijah Newren
@ 2018-08-10 16:39                             ` Duy Nguyen
  2018-08-10 18:39                               ` Elijah Newren
  0 siblings, 1 reply; 121+ messages in thread
From: Duy Nguyen @ 2018-08-10 16:39 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Ben Peart, Git Mailing List, Junio C Hamano, Ben Peart, Jeff King

On Wed, Aug 8, 2018 at 8:46 PM Elijah Newren <newren@gmail.com> wrote:
> > @@ -701,6 +702,24 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
> >         if (!o->merge)
> >                 BUG("We need cache-tree to do this optimization");
> >
> > +       /*
> > +        * Try to keep add_index_entry() as fast as possible since
> > +        * we're going to do a lot of them.
> > +        *
> > +        * Skipping verify_path() should totally be safe because these
> > +        * paths are from the source index, which must have been
> > +        * verified.
> > +        *
> > +        * Skipping D/F and cache-tree validation checks is trickier
> > +        * because it assumes what n-merge code would do when all
> > +        * trees and the index are the same. We probably could just
> > +        * optimize those code instead (e.g. we don't invalidate that
> > +        * many cache-tree, but the searching for them is very
> > +        * expensive).
> > +        */
> > +       o->extra_add_index_flags = ADD_CACHE_SKIP_DFCHECK;
> > +       o->extra_add_index_flags |= ADD_CACHE_SKIP_VERIFY_PATH;
> > +
>
> In sum of this whole patch, you notice that the Nway_merge functions
> are still a bit of a bottleneck, but you know you have a special case
> where you want them to put an entry in the index that matches what is
> already there, so you try to set some extra flags to short-circuit
> part of their logic and get to what you know is the correct result.
>
> This seems a little scary to me.  I think it's probably safe as long
> as o->fn is one of {oneway_merge, twoway_merge, threeway_merge,
> bind_merge} (the cases you have in mind and which the current code
> uses), but the caller isn't limited to those.  Right now in
> diff-lib.c, there's a caller that has their own function, oneway_diff.
> More could be added in the future.
>
> If we're going to go this route, I think we should first check that
> o->fn is one of those known safe functions.  And if we're going that
> route, the comments I bring up on patch 2 about possibly avoiding
> call_unpack_fn() altogether might even obviate this patch while
> speeding things up more.

Yes I do need to check o->fn. I might have to think more about
avoiding call_unpack_fn(). Even if we avoid it though, we still go
through add_index_entry() and suffer the same checks every time unless
we do something like this (but then of course it's safer because
you're doing it in a specific x-way merge, not generic code like
this).
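
Something along these lines for the guard, perhaps (illustrative; the
four names are the existing merge functions from unpack-trees.h, the
helper itself is made up):

static int unpack_fn_known_safe(const struct unpack_trees_options *o)
{
	return o->fn == oneway_merge ||
	       o->fn == twoway_merge ||
	       o->fn == threeway_merge ||
	       o->fn == bind_merge;
}

and only set o->extra_add_index_flags when that returns true.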

> > @@ -1561,6 +1581,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> >                 if (!ret) {
> >                         if (!o->result.cache_tree)
> >                                 o->result.cache_tree = cache_tree();
> > +                       /*
> > +                        * TODO: Walk o.src_index->cache_tree, quickly check
> > +                        * if o->result.cache has the exact same content for
> > +                        * any valid cache-tree in o.src_index, then we can
> > +                        * just copy the cache-tree over instead of hashing a
> > +                        * new tree object.
> > +                        */
>
> Interesting.  I really don't know how cache_tree works...but if we
> avoided calling call_unpack_fn, and thus left the original index entry
> in place instead of replacing it with an equal one, would that as a
> side effect speed up the cache_tree_valid/cache_tree_update calls for
> us?  Or is there still work here?

Naah. Notice that we don't care at all about the source's cache-tree
when we update the o->result one (and we never ever do anything about
o->result's cache-tree during the merge). Whether you invalidate or
not, o->result's cache-tree is always empty and you still have to
recreate all the cache-trees in o->result. You essentially pay the
full cost of "git write-tree" here if I'm not mistaken.
-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v3 0/4] Speed up unpack_trees()
  2018-08-08 18:39                                   ` Junio C Hamano
@ 2018-08-10 16:53                                     ` Duy Nguyen
  0 siblings, 0 replies; 121+ messages in thread
From: Duy Nguyen @ 2018-08-10 16:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ben Peart, Git Mailing List, Ben Peart, Jeff King

On Wed, Aug 8, 2018 at 8:39 PM Junio C Hamano <gitster@pobox.com> wrote:
> One more, and hopefully the final, note.
>
> ..

Much appreciated. I don't think I could figure all this out no matter
how long I stare at those commits and current code.
-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 4/4] unpack-trees: cheaper index update when walking by cache-tree
  2018-08-10 16:39                             ` Duy Nguyen
@ 2018-08-10 18:39                               ` Elijah Newren
  2018-08-10 19:30                                 ` Duy Nguyen
  0 siblings, 1 reply; 121+ messages in thread
From: Elijah Newren @ 2018-08-10 18:39 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc
  Cc: Ben Peart, Git Mailing List, Junio C Hamano, Ben Peart, Jeff King

On Fri, Aug 10, 2018 at 9:39 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Wed, Aug 8, 2018 at 8:46 PM Elijah Newren <newren@gmail.com> wrote:
> > > @@ -701,6 +702,24 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,

> > If we're going to go this route, I think we should first check that
> > o->fn is one of those known safe functions.  And if we're going that
> > route, the comments I bring up on patch 2 about possibly avoiding
> > call_unpack_fn() altogether might even obviate this patch while
> > speeding things up more.
>
> Yes I do need to check o->fn. I might have to think more about
> avoiding call_unpack_fn(). Even if we avoid it though, we still go
> through add_index_entry() and suffer the same checks every time unless
> we do something like this (but then of course it's safer because
> you're doing it in a specific x-way merge, not generic code like
> this).

Why do we still need to go through add_index_entry()?  I thought that
the whole point was that you already checked that at the current path,
the trees being unpacked were all equal and matched both the index and
the cache_tree.  If so, why is there any need for an update at all?
(Did I read your all_trees_same_as_cache_tree() function wrong, and
you don't actually know these all match in some important way?)


> > > @@ -1561,6 +1581,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> > >                 if (!ret) {
> > >                         if (!o->result.cache_tree)
> > >                                 o->result.cache_tree = cache_tree();
> > > +                       /*
> > > +                        * TODO: Walk o.src_index->cache_tree, quickly check
> > > +                        * if o->result.cache has the exact same content for
> > > +                        * any valid cache-tree in o.src_index, then we can
> > > +                        * just copy the cache-tree over instead of hashing a
> > > +                        * new tree object.
> > > +                        */
> >
> > Interesting.  I really don't know how cache_tree works...but if we
> > avoided calling call_unpack_fn, and thus left the original index entry
> > in place instead of replacing it with an equal one, would that as a
> > side effect speed up the cache_tree_valid/cache_tree_update calls for
> > us?  Or is there still work here?
>
> Naah. Notice that we don't care at all about the source's cache-tree
> when we update the o->result one (and we never ever do anything about
> o->result's cache-tree during the merge). Whether you invalidate or
> not, o->result's cache-tree is always empty and you still have to
> recreate all the cache-trees in o->result. You essentially pay the
> full cost of "git write-tree" here if I'm not mistaken.

Oh...perhaps that answers my question above.  So we have to call
add_index_entry() for the side effect of populating the new
cache_tree?

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v3 2/4] unpack-trees: optimize walking same trees with cache-tree
  2018-08-10 16:29                               ` Duy Nguyen
@ 2018-08-10 18:48                                 ` Elijah Newren
  0 siblings, 0 replies; 121+ messages in thread
From: Elijah Newren @ 2018-08-10 18:48 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc
  Cc: Ben Peart, Git Mailing List, Junio C Hamano, Ben Peart, Jeff King

On Fri, Aug 10, 2018 at 9:29 AM Duy Nguyen <pclouds@gmail.com> wrote:
> On Wed, Aug 8, 2018 at 8:23 PM Elijah Newren <newren@gmail.com> wrote:

> > > +        * cache-tree would be invalidated and we would never get here
> > > +        * in the first place.
> > > +        */
> > > +       for (i = 0; i < nr_entries; i++) {
> > > +               struct cache_entry *tree_ce;
> > > +               int len, rc;
> > > +
> > > +               src[0] = o->src_index->cache[pos + i];
> > > +
> > > +               len = ce_namelen(src[0]);
> > > +               tree_ce = xcalloc(1, cache_entry_size(len));
> > > +
> > > +               tree_ce->ce_mode = src[0]->ce_mode;
> > > +               tree_ce->ce_flags = create_ce_flags(0);
> > > +               tree_ce->ce_namelen = len;
> > > +               oidcpy(&tree_ce->oid, &src[0]->oid);
> > > +               memcpy(tree_ce->name, src[0]->name, len + 1);
> >
> > We do a bunch of work to setup tree_ce...
> >
> > > +               for (d = 1; d <= nr_names; d++)
> > > +                       src[d] = tree_ce;
> >
> > ...then we make nr_names copies of tree_ce (so that *way_merge or
> > bind_merge or oneway_diff or whatever will have the expected number of
> > entries).
> >
> > > +               rc = call_unpack_fn((const struct cache_entry * const *)src, o);
> >
> > ...then we call o->fn (via call_unpack_fn) to do various complicated
> > logic to figure out which tree_ce to use??  Isn't that just an
> > expensive way to recompute that what we currently have in the index is
> > what we want to keep there?
> >
> > Granted, a caller of this may have set o->fn to something other than
> > {one,two,three}way_merge (or bind_merge), and that function might have
> > important side effects...but it just seems annoying to have to do so
> > much work when for most uses we already know the entry in the index is
> > the one we already want.
>
> I'm not so sure about that. Which is why I keep it generic.
>
> > In fact, the only other thing in the
> > codebase that o->fn is now set to is oneway_diff, which I think is a
> > no-op when the two trees match.
> >
> > Would be nice if we could avoid all this, at least in the common cases
> > where o->fn is a function known to not have side effects.  Or did I
> > not read those functions closely enough and they do have important
> > side effects?
>
> In one of my earlier "how about this" attempts, I introduced fn_same
> [1], which can help achieve this without carving "known not to have
> side effects" into common code. I think that is still a good direction
> to go if we want to optimize more aggressively. We could have
> something like this
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 1f11991a51..01b80389e0 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -699,6 +699,9 @@ static int traverse_by_cache_tree(int pos, int
> nr_entries, int nr_names,
>         int ce_len = 0;
>         int i, d;
>
> +       if (o->fn_cache_tree)
> +               return o->fn_cache_tree(pos, nr_entries, nr_names, names, info);
> +
>         if (!o->merge)
>                 BUG("We need cache-tree to do this optimization");
>
> then you can add, say, threeway_cache_tree_merge(), which does what
> traverse_by_cache_tree() does but more efficiently. This involves a
> lot more work (mostly staring at those n-merge functions and making
> sure you set up the right conditions before taking the fast path).
>
> I didn't do it because.. well.. it's more work and also riskier. I
> think we can leave that for later, unless you think we should do it
> now.
>
> [1] https://public-inbox.org/git/20180726163049.GA15572@duynguyen.home/

Yeah, from your other thread, I think I was missing some of the
intricacies of how the cache-tree works and the extra work that'd be
needed to bring it along. Deferring until later makes sense.

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 4/4] unpack-trees: cheaper index update when walking by cache-tree
  2018-08-10 18:39                               ` Elijah Newren
@ 2018-08-10 19:30                                 ` Duy Nguyen
  2018-08-10 19:40                                   ` Elijah Newren
  0 siblings, 1 reply; 121+ messages in thread
From: Duy Nguyen @ 2018-08-10 19:30 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Ben Peart, Git Mailing List, Junio C Hamano, Ben Peart, Jeff King

On Fri, Aug 10, 2018 at 8:39 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Fri, Aug 10, 2018 at 9:39 AM Duy Nguyen <pclouds@gmail.com> wrote:
> >
> > On Wed, Aug 8, 2018 at 8:46 PM Elijah Newren <newren@gmail.com> wrote:
> > > > @@ -701,6 +702,24 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
>
> > > If we're going to go this route, I think we should first check that
> > > o->fn is one of those known safe functions.  And if we're going that
> > > route, the comments I bring up on patch 2 about possibly avoiding
> > > call_unpack_fn() altogether might even obviate this patch while
> > > speeding things up more.
> >
> > Yes I do need to check o->fn. I might have to think more about
> > avoiding call_unpack_fn(). Even if we avoid it though, we still go
> > through add_index_entry() and suffer the same checks every time unless
> > we do something like this (but then of course it's safer because
> > you're doing it in a specific x-way merge, not generic code like
> > this).
>
> Why do we still need to go through add_index_entry()?  I thought that
> the whole point was that you already checked that at the current path,
> the trees being unpacked were all equal and matched both the index and
> the cache_tree.  If so, why is there any need for an update at all?
> (Did I read your all_trees_same_as_cache_tree() function wrong, and
> you don't actually know these all match in some important way?)

Unless fn is oneway_diff, we have to create a new index (in o->result)
based on o->src_index and some other trees. So we have to add entries
to o->result, and add_index_entry() is the way to do that (granted, if
we feel confident we could add ADD_CACHE_JUST_APPEND, which makes it
super cheap). This is the outcome of the n-way merge.

all_trees_same_as_cache_tree() only guarantees the input condition (all
trees the same, index also the same) but it can't affect what fn does.
I don't think we can simply skip the update entirely (like in the
o->diff_index_cached case) because o->result would be empty in the
end. And we need to create the (temporary) o->result before we can swap
it to o->dst_index as the result of a merge operation.
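
Roughly this, I mean (untested sketch, in traverse_by_cache_tree()
terms; the helper name is made up, and a real version would have to
duplicate the entry first, since o->src_index and o->result must not
free the same cache_entry twice):

static int append_entries_cheaply(struct unpack_trees_options *o,
				  int pos, int nr_entries)
{
	int i;

	for (i = 0; i < nr_entries; i++) {
		struct cache_entry *ce = o->src_index->cache[pos + i];

		/*
		 * Entries come from the sorted source index in order,
		 * so ADD_CACHE_JUST_APPEND can skip the position
		 * search, path verification and D/F checks entirely.
		 */
		if (add_index_entry(&o->result, ce,
				    ADD_CACHE_OK_TO_ADD |
				    ADD_CACHE_JUST_APPEND))
			return -1;
	}
	return 0;
}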

> > > > @@ -1561,6 +1581,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> > > >                 if (!ret) {
> > > >                         if (!o->result.cache_tree)
> > > >                                 o->result.cache_tree = cache_tree();
> > > > +                       /*
> > > > +                        * TODO: Walk o.src_index->cache_tree, quickly check
> > > > +                        * if o->result.cache has the exact same content for
> > > > +                        * any valid cache-tree in o.src_index, then we can
> > > > +                        * just copy the cache-tree over instead of hashing a
> > > > +                        * new tree object.
> > > > +                        */
> > >
> > > Interesting.  I really don't know how cache_tree works...but if we
> > > avoided calling call_unpack_fn, and thus left the original index entry
> > > in place instead of replacing it with an equal one, would that as a
> > > side effect speed up the cache_tree_valid/cache_tree_update calls for
> > > us?  Or is there still work here?
> >
> > Naah. Notice that we don't care at all about the source's cache-tree
> > when we update the o->result one (and we never ever do anything about
> > o->result's cache-tree during the merge). Whether you invalidate or
> > not, o->result's cache-tree is always empty and you still have to
> > recreate all the cache-trees in o->result. You essentially pay the
> > full cost of "git write-tree" here if I'm not mistaken.
>
> Oh...perhaps that answers my question above.  So we have to call
> add_index_entry() for the side effect of populating the new
> cache_tree?

I have a feeling that you're thinking we can swap o->src_index to
o->dst_index at the end? That might explain your confusion about
o->result and the original index (or I misread your replies
horribly)...
-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 4/4] unpack-trees: cheaper index update when walking by cache-tree
  2018-08-10 19:30                                 ` Duy Nguyen
@ 2018-08-10 19:40                                   ` Elijah Newren
  2018-08-10 19:48                                     ` Duy Nguyen
  0 siblings, 1 reply; 121+ messages in thread
From: Elijah Newren @ 2018-08-10 19:40 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc
  Cc: Ben Peart, Git Mailing List, Junio C Hamano, Ben Peart, Jeff King

On Fri, Aug 10, 2018 at 12:30 PM Duy Nguyen <pclouds@gmail.com> wrote:
> On Fri, Aug 10, 2018 at 8:39 PM Elijah Newren <newren@gmail.com> wrote:
...
> > Why do we still need to go through add_index_entry()?  I thought that
> > the whole point was that you already checked that at the current path,
> > the trees being unpacked were all equal and matched both the index and
> > the cache_tree.  If so, why is there any need for an update at all?
> > (Did I read your all_trees_same_as_cache_tree() function wrong, and
> > you don't actually know these all match in some important way?)
>
> Unless fn is oneway_diff, we have to create a new index (in o->result)
> based on o->src_index and some other trees. So we have to add entries

Oh, right, o->src_index may not equal o->dst_index (because of people
like me who call it that way from merge-recursive.c) and even if it
does, we still have the temporary o->result in the meantime.  I
should have remembered that; just didn't.

> to o->result, and add_index_entry() is the way to do that (granted, if
> we feel confident we could add ADD_CACHE_JUST_APPEND, which makes it
> super cheap). This is the outcome of the n-way merge.
>
> all_trees_same_as_cache_tree() only guarantees the input condition (all
> trees the same, index also the same) but it can't affect what fn does.
> I don't think we can simply skip the update entirely (like in the
> o->diff_index_cached case) because o->result would be empty in the
> end. And we need to create the (temporary) o->result before we can swap
> it to o->dst_index as the result of a merge operation.
>

...

> I have a feeling that you're thinking we can swap o->src_index to
> o->dst_index at the end? That might explain your confusion about
> o->result and the original index (or I misread your replies
> horribly)...

Yeah, thanks for figuring out my confusion and jogging my memory.

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v2 4/4] unpack-trees: cheaper index update when walking by cache-tree
  2018-08-10 19:40                                   ` Elijah Newren
@ 2018-08-10 19:48                                     ` Duy Nguyen
  0 siblings, 0 replies; 121+ messages in thread
From: Duy Nguyen @ 2018-08-10 19:48 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Ben Peart, Git Mailing List, Junio C Hamano, Ben Peart, Jeff King

On Fri, Aug 10, 2018 at 9:40 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Fri, Aug 10, 2018 at 12:30 PM Duy Nguyen <pclouds@gmail.com> wrote:
> > On Fri, Aug 10, 2018 at 8:39 PM Elijah Newren <newren@gmail.com> wrote:
> ...
> > > Why do we still need to go through add_index_entry()?  I thought that
> > > the whole point was that you already checked that at the current path,
> > > the trees being unpacked were all equal and matched both the index and
> > > the cache_tree.  If so, why is there any need for an update at all?
> > > (Did I read your all_trees_same_as_cache_tree() function wrong, and
> > > you don't actually know these all match in some important way?)
> >
> > Unless fn is oneway_diff, we have to create a new index (in o->result)
> > based on o->src_index and some other trees. So we have to add entries
>
> Oh, right, o->src_index may not equal o->dst_index (because of people
> like me who call it that way from merge-recursive.c) and even if it
> does, we still have the temporary o->result in the meantime.  I
> should have remembered that; just didn't.

Your forgetting about this actually helps. I think the idea of
avoiding add_index_entry() may be worth considering.

We know that 90% of unpack_trees() calls go from the_index to
the_index. So instead of creating a full temporary index, where 90%
of it might be the same as the source index, we could just mark in the
source index (e.g. in ce_flags) the entries that should be copied to
o->result and _not_ create them in o->result. When it's time to create
o->dst_index (which is the_index) from o->result, we could do a
little manipulation to delete the stuff that the_index has but
o->result does not, and add the few missing things. It is something
that at least sounds nice in my head, but I'm not sure if it works
out...
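
Hand-wavy sketch of the marking part (CE_UNPACK_KEEP is a made-up
flag, not in cache.h, and the final fix-up pass is elided):

/* hypothetical flag; pick a free bit in ce_flags */
#define CE_UNPACK_KEEP (1 << 30)

/*
 * During the merge, instead of duplicating the entry into
 * o->result, just mark it in the source index:
 */
static void keep_src_entry(struct unpack_trees_options *o, int pos)
{
	o->src_index->cache[pos]->ce_flags |= CE_UNPACK_KEEP;
}

Then the final pass would drop the unmarked entries from the_index and
splice in whatever new entries the merge actually created.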
-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* [PATCH v4 0/5] Speed up unpack_trees()
  2018-08-04  5:37                         ` [PATCH v3 " Nguyễn Thái Ngọc Duy
                                             ` (4 preceding siblings ...)
  2018-08-06 15:48                           ` [PATCH v3 0/4] Speed up unpack_trees() Junio C Hamano
@ 2018-08-12  8:15                           ` Nguyễn Thái Ngọc Duy
  2018-08-12  8:15                             ` [PATCH v4 1/5] trace.h: support nested performance tracing Nguyễn Thái Ngọc Duy
                                               ` (7 more replies)
  5 siblings, 8 replies; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-12  8:15 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, peartben, peff, Elijah Newren

v4 has a bunch of changes

- 1/5 is a new one to show indented tracing. This way it's less
  misleading to read nested time measurements (see the usage sketch
  after this list)
- 3/5 now has the switch/restore cache_bottom logic. Junio suggested a
  check instead in his final note, but I think this is safer (yeah I'm
  scared too)
- the old 4/4 is dropped because
  - it assumes n-way logic
  - the visible time saving is not worth the tradeoff
  - Elijah gave me an idea to avoid add_index_entry() that I think
    does not have n-way logic assumptions and gives better savings.
    But it requires some more changes so I'm going to do it later
- 5/5 is also new and should help reduce cache_tree_update() cost.
  I wrote somewhere I was not going to work on this part, but it turns
  out to be just a couple of lines, so I might as well do it now.
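
To illustrate 1/5 (a made-up call sequence, not taken from the
patches): with

	trace_performance_enter();
	...
	trace_performance_enter();
	...
	trace_performance_leave("read cache %s", path);
	...
	trace_performance_leave("unpack trees");

the inner measurement is printed indented one level deeper than the
outer one, so it is obvious which timings are nested inside which.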

Interdiff

diff --git a/cache-tree.c b/cache-tree.c
index 0dbe10fc85..105f13806f 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -426,7 +426,6 @@ static int update_one(struct cache_tree *it,
 
 int cache_tree_update(struct index_state *istate, int flags)
 {
-	uint64_t start = getnanotime();
 	struct cache_tree *it = istate->cache_tree;
 	struct cache_entry **cache = istate->cache;
 	int entries = istate->cache_nr;
@@ -434,11 +433,12 @@ int cache_tree_update(struct index_state *istate, int flags)
 
 	if (i)
 		return i;
+	trace_performance_enter();
 	i = update_one(it, cache, entries, "", 0, &skip, flags);
+	trace_performance_leave("cache_tree_update");
 	if (i < 0)
 		return i;
 	istate->cache_changed |= CACHE_TREE_CHANGED;
-	trace_performance_since(start, "repair cache-tree");
 	return 0;
 }
 
diff --git a/cache.h b/cache.h
index e6f7ee4b64..8b447652a7 100644
--- a/cache.h
+++ b/cache.h
@@ -673,7 +673,6 @@ extern int index_name_pos(const struct index_state *, const char *name, int name
 #define ADD_CACHE_JUST_APPEND 8		/* Append only; tree.c::read_tree() */
 #define ADD_CACHE_NEW_ONLY 16		/* Do not replace existing ones */
 #define ADD_CACHE_KEEP_CACHE_TREE 32	/* Do not invalidate cache-tree */
-#define ADD_CACHE_SKIP_VERIFY_PATH 64	/* Do not verify path */
 extern int add_index_entry(struct index_state *, struct cache_entry *ce, int option);
 extern void rename_index_entry_at(struct index_state *, int pos, const char *new_name);
 
diff --git a/diff-lib.c b/diff-lib.c
index a9f38eb5a3..1ffa22c882 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -518,8 +518,8 @@ static int diff_cache(struct rev_info *revs,
 int run_diff_index(struct rev_info *revs, int cached)
 {
 	struct object_array_entry *ent;
-	uint64_t start = getnanotime();
 
+	trace_performance_enter();
 	ent = revs->pending.objects;
 	if (diff_cache(revs, &ent->item->oid, ent->name, cached))
 		exit(128);
@@ -528,7 +528,7 @@ int run_diff_index(struct rev_info *revs, int cached)
 	diffcore_fix_diff_index(&revs->diffopt);
 	diffcore_std(&revs->diffopt);
 	diff_flush(&revs->diffopt);
-	trace_performance_since(start, "diff-index");
+	trace_performance_leave("diff-index");
 	return 0;
 }
 
diff --git a/dir.c b/dir.c
index 21e6f2520a..c5e9fc8cea 100644
--- a/dir.c
+++ b/dir.c
@@ -2263,11 +2263,11 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		   const char *path, int len, const struct pathspec *pathspec)
 {
 	struct untracked_cache_dir *untracked;
-	uint64_t start = getnanotime();
 
 	if (has_symlink_leading_path(path, len))
 		return dir->nr;
 
+	trace_performance_enter();
 	untracked = validate_untracked_cache(dir, len, pathspec);
 	if (!untracked)
 		/*
@@ -2302,7 +2302,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		dir->nr = i;
 	}
 
-	trace_performance_since(start, "read directory %.*s", len, path);
+	trace_performance_leave("read directory %.*s", len, path);
 	if (dir->untracked) {
 		static int force_untracked_cache = -1;
 		static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS);
diff --git a/name-hash.c b/name-hash.c
index 163849831c..1fcda73cb3 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -578,10 +578,10 @@ static void threaded_lazy_init_name_hash(
 
 static void lazy_init_name_hash(struct index_state *istate)
 {
-	uint64_t start = getnanotime();
 
 	if (istate->name_hash_initialized)
 		return;
+	trace_performance_enter();
 	hashmap_init(&istate->name_hash, cache_entry_cmp, NULL, istate->cache_nr);
 	hashmap_init(&istate->dir_hash, dir_entry_cmp, NULL, istate->cache_nr);
 
@@ -602,7 +602,7 @@ static void lazy_init_name_hash(struct index_state *istate)
 	}
 
 	istate->name_hash_initialized = 1;
-	trace_performance_since(start, "initialize name hash");
+	trace_performance_leave("initialize name hash");
 }
 
 /*
diff --git a/preload-index.c b/preload-index.c
index 4d08d44874..d7f7919ba2 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -78,7 +78,6 @@ static void preload_index(struct index_state *index,
 {
 	int threads, i, work, offset;
 	struct thread_data data[MAX_PARALLEL];
-	uint64_t start = getnanotime();
 
 	if (!core_preload_index)
 		return;
@@ -88,6 +87,7 @@ static void preload_index(struct index_state *index,
 		threads = 2;
 	if (threads < 2)
 		return;
+	trace_performance_enter();
 	if (threads > MAX_PARALLEL)
 		threads = MAX_PARALLEL;
 	offset = 0;
@@ -109,7 +109,7 @@ static void preload_index(struct index_state *index,
 		if (pthread_join(p->pthread, NULL))
 			die("unable to join threaded lstat");
 	}
-	trace_performance_since(start, "preload index");
+	trace_performance_leave("preload index");
 }
 #endif
 
diff --git a/read-cache.c b/read-cache.c
index b0b5df5de7..2b5646ef26 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1170,7 +1170,6 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 	int ok_to_add = option & ADD_CACHE_OK_TO_ADD;
 	int ok_to_replace = option & ADD_CACHE_OK_TO_REPLACE;
 	int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
-	int skip_verify_path = option & ADD_CACHE_SKIP_VERIFY_PATH;
 	int new_only = option & ADD_CACHE_NEW_ONLY;
 
 	if (!(option & ADD_CACHE_KEEP_CACHE_TREE))
@@ -1211,7 +1210,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 
 	if (!ok_to_add)
 		return -1;
-	if (!skip_verify_path && !verify_path(ce->name, ce->ce_mode))
+	if (!verify_path(ce->name, ce->ce_mode))
 		return error("Invalid path '%s'", ce->name);
 
 	if (!skip_df_check &&
@@ -1400,8 +1399,8 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	const char *typechange_fmt;
 	const char *added_fmt;
 	const char *unmerged_fmt;
-	uint64_t start = getnanotime();
 
+	trace_performance_enter();
 	modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n");
 	deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n");
 	typechange_fmt = (in_porcelain ? "T\t%s\n" : "%s needs update\n");
@@ -1471,7 +1470,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 
 		replace_index_entry(istate, i, new_entry);
 	}
-	trace_performance_since(start, "refresh index");
+	trace_performance_leave("refresh index");
 	return has_errors;
 }
 
@@ -1902,7 +1901,6 @@ static void freshen_shared_index(const char *shared_index, int warn)
 int read_index_from(struct index_state *istate, const char *path,
 		    const char *gitdir)
 {
-	uint64_t start = getnanotime();
 	struct split_index *split_index;
 	int ret;
 	char *base_oid_hex;
@@ -1912,8 +1910,9 @@ int read_index_from(struct index_state *istate, const char *path,
 	if (istate->initialized)
 		return istate->cache_nr;
 
+	trace_performance_enter();
 	ret = do_read_index(istate, path, 0);
-	trace_performance_since(start, "read cache %s", path);
+	trace_performance_leave("read cache %s", path);
 
 	split_index = istate->split_index;
 	if (!split_index || is_null_oid(&split_index->base_oid)) {
@@ -1921,6 +1920,7 @@ int read_index_from(struct index_state *istate, const char *path,
 		return ret;
 	}
 
+	trace_performance_enter();
 	if (split_index->base)
 		discard_index(split_index->base);
 	else
@@ -1937,8 +1937,8 @@ int read_index_from(struct index_state *istate, const char *path,
 	freshen_shared_index(base_path, 0);
 	merge_base_index(istate);
 	post_read_index_from(istate);
-	trace_performance_since(start, "read cache %s", base_path);
 	free(base_path);
+	trace_performance_leave("read cache %s", base_path);
 	return ret;
 }
 
@@ -2763,4 +2763,6 @@ void move_index_extensions(struct index_state *dst, struct index_state *src)
 {
 	dst->untracked = src->untracked;
 	src->untracked = NULL;
+	dst->cache_tree = src->cache_tree;
+	src->cache_tree = NULL;
 }
diff --git a/trace.c b/trace.c
index fc623e91fd..fa4a2e7120 100644
--- a/trace.c
+++ b/trace.c
@@ -176,10 +176,30 @@ void trace_strbuf_fl(const char *file, int line, struct trace_key *key,
 	strbuf_release(&buf);
 }
 
+static uint64_t perf_start_times[10];
+static int perf_indent;
+
+uint64_t trace_performance_enter(void)
+{
+	uint64_t now;
+
+	if (!trace_want(&trace_perf_key))
+		return 0;
+
+	now = getnanotime();
+	perf_start_times[perf_indent] = now;
+	if (perf_indent + 1 < ARRAY_SIZE(perf_start_times))
+		perf_indent++;
+	else
+		BUG("Too deep indentation");
+	return now;
+}
+
 static void trace_performance_vprintf_fl(const char *file, int line,
 					 uint64_t nanos, const char *format,
 					 va_list ap)
 {
+	static const char space[] = "          ";
 	struct strbuf buf = STRBUF_INIT;
 
 	if (!prepare_trace_line(file, line, &trace_perf_key, &buf))
@@ -188,7 +208,10 @@ static void trace_performance_vprintf_fl(const char *file, int line,
 	strbuf_addf(&buf, "performance: %.9f s", (double) nanos / 1000000000);
 
 	if (format && *format) {
-		strbuf_addstr(&buf, ": ");
+		if (perf_indent >= strlen(space))
+			BUG("Too deep indentation");
+
+		strbuf_addf(&buf, ":%.*s ", perf_indent, space);
 		strbuf_vaddf(&buf, format, ap);
 	}
 
@@ -244,6 +267,24 @@ void trace_performance_since(uint64_t start, const char *format, ...)
 	va_end(ap);
 }
 
+void trace_performance_leave(const char *format, ...)
+{
+	va_list ap;
+	uint64_t since;
+
+	if (perf_indent)
+		perf_indent--;
+
+	if (!format) /* Allow callers to leave without tracing anything */
+		return;
+
+	since = perf_start_times[perf_indent];
+	va_start(ap, format);
+	trace_performance_vprintf_fl(NULL, 0, getnanotime() - since,
+				     format, ap);
+	va_end(ap);
+}
+
 #else
 
 void trace_printf_key_fl(const char *file, int line, struct trace_key *key,
@@ -273,6 +314,24 @@ void trace_performance_fl(const char *file, int line, uint64_t nanos,
 	va_end(ap);
 }
 
+void trace_performance_leave_fl(const char *file, int line,
+				uint64_t nanos, const char *format, ...)
+{
+	va_list ap;
+	uint64_t since;
+
+	if (perf_indent)
+		perf_indent--;
+
+	if (!format) /* Allow callers to leave without tracing anything */
+		return;
+
+	since = perf_start_times[perf_indent];
+	va_start(ap, format);
+	trace_performance_vprintf_fl(file, line, nanos - since, format, ap);
+	va_end(ap);
+}
+
 #endif /* HAVE_VARIADIC_MACROS */
 
 
@@ -411,13 +470,11 @@ uint64_t getnanotime(void)
 	}
 }
 
-static uint64_t command_start_time;
 static struct strbuf command_line = STRBUF_INIT;
 
 static void print_command_performance_atexit(void)
 {
-	trace_performance_since(command_start_time, "git command:%s",
-				command_line.buf);
+	trace_performance_leave("git command:%s", command_line.buf);
 }
 
 void trace_command_performance(const char **argv)
@@ -425,10 +482,10 @@ void trace_command_performance(const char **argv)
 	if (!trace_want(&trace_perf_key))
 		return;
 
-	if (!command_start_time)
+	if (!command_line.len)
 		atexit(print_command_performance_atexit);
 
 	strbuf_reset(&command_line);
 	sq_quote_argv_pretty(&command_line, argv);
-	command_start_time = getnanotime();
+	trace_performance_enter();
 }
diff --git a/trace.h b/trace.h
index 2b6a1bc17c..171b256d26 100644
--- a/trace.h
+++ b/trace.h
@@ -23,6 +23,7 @@ extern void trace_disable(struct trace_key *key);
 extern uint64_t getnanotime(void);
 extern void trace_command_performance(const char **argv);
 extern void trace_verbatim(struct trace_key *key, const void *buf, unsigned len);
+uint64_t trace_performance_enter(void);
 
 #ifndef HAVE_VARIADIC_MACROS
 
@@ -45,6 +46,9 @@ extern void trace_performance(uint64_t nanos, const char *format, ...);
 __attribute__((format (printf, 2, 3)))
 extern void trace_performance_since(uint64_t start, const char *format, ...);
 
+__attribute__((format (printf, 1, 2)))
+void trace_performance_leave(const char *format, ...);
+
 #else
 
 /*
@@ -118,6 +122,14 @@ extern void trace_performance_since(uint64_t start, const char *format, ...);
 					     __VA_ARGS__);		    \
 	} while (0)
 
+#define trace_performance_leave(...)					    \
+	do {								    \
+		if (trace_pass_fl(&trace_perf_key))			    \
+			trace_performance_leave_fl(TRACE_CONTEXT, __LINE__, \
+						   getnanotime(),	    \
+						   __VA_ARGS__);	    \
+	} while (0)
+
 /* backend functions, use non-*fl macros instead */
 __attribute__((format (printf, 4, 5)))
 extern void trace_printf_key_fl(const char *file, int line, struct trace_key *key,
@@ -130,6 +142,9 @@ extern void trace_strbuf_fl(const char *file, int line, struct trace_key *key,
 __attribute__((format (printf, 4, 5)))
 extern void trace_performance_fl(const char *file, int line,
 				 uint64_t nanos, const char *fmt, ...);
+__attribute__((format (printf, 4, 5)))
+extern void trace_performance_leave_fl(const char *file, int line,
+				       uint64_t nanos, const char *fmt, ...);
 static inline int trace_pass_fl(struct trace_key *key)
 {
 	return key->fd || !key->initialized;
diff --git a/unpack-trees.c b/unpack-trees.c
index 1438ee1555..d822662c75 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -201,7 +201,6 @@ static int do_add_entry(struct unpack_trees_options *o, struct cache_entry *ce,
 
 	ce->ce_flags = (ce->ce_flags & ~clear) | set;
 	return add_index_entry(&o->result, ce,
-			       o->extra_add_index_flags |
 			       ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
 }
 
@@ -353,9 +352,9 @@ static int check_updates(struct unpack_trees_options *o)
 	struct progress *progress = NULL;
 	struct index_state *index = &o->result;
 	struct checkout state = CHECKOUT_INIT;
-	uint64_t start = getnanotime();
 	int i;
 
+	trace_performance_enter();
 	state.force = 1;
 	state.quiet = 1;
 	state.refresh_cache = 1;
@@ -425,7 +424,7 @@ static int check_updates(struct unpack_trees_options *o)
 	errs |= finish_delayed_checkout(&state);
 	if (o->update)
 		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
-	trace_performance_since(start, "update worktree after a merge");
+	trace_performance_leave("check_updates");
 	return errs != 0;
 }
 
@@ -702,31 +701,13 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 	if (!o->merge)
 		BUG("We need cache-tree to do this optimization");
 
-	/*
-	 * Try to keep add_index_entry() as fast as possible since
-	 * we're going to do a lot of them.
-	 *
-	 * Skipping verify_path() should totally be safe because these
-	 * paths are from the source index, which must have been
-	 * verified.
-	 *
-	 * Skipping D/F and cache-tree validation checks is trickier
-	 * because it assumes what n-merge code would do when all
-	 * trees and the index are the same. We probably could just
-	 * optimize those code instead (e.g. we don't invalidate that
-	 * many cache-tree, but the searching for them is very
-	 * expensive).
-	 */
-	o->extra_add_index_flags = ADD_CACHE_SKIP_DFCHECK;
-	o->extra_add_index_flags |= ADD_CACHE_SKIP_VERIFY_PATH;
-
 	/*
 	 * Do what unpack_callback() and unpack_nondirectories() normally
 	 * do. But we walk all paths recursively in just one loop instead.
 	 *
-	 * D/F conflicts and staged entries are not a concern because
-	 * cache-tree would be invalidated and we would never get here
-	 * in the first place.
+	 * D/F conflicts and higher stage entries are not a concern
+	 * because cache-tree would be invalidated and we would never
+	 * get here in the first place.
 	 */
 	for (i = 0; i < nr_entries; i++) {
 		int new_ce_len, len, rc;
@@ -761,7 +742,6 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 
 		mark_ce_used(src[0], o);
 	}
-	o->extra_add_index_flags = 0;
 	free(tree_ce);
 	if (o->debug_unpack)
 		printf("Unpacked %d entries from %s to %s using cache-tree\n",
@@ -791,7 +771,17 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
 
 		if (!o->merge || df_conflicts)
 			BUG("Wrong condition to get here buddy");
-		return traverse_by_cache_tree(pos, nr_entries, n, names, info);
+
+		/*
+		 * All entries up to 'pos' must have been processed
+		 * (i.e. marked CE_UNPACKED) at this point. But to be safe,
+		 * save and restore cache_bottom anyway to not miss
+		 * unprocessed entries before 'pos'.
+		 */
+		bottom = o->cache_bottom;
+		ret = traverse_by_cache_tree(pos, nr_entries, n, names, info);
+		o->cache_bottom = bottom;
+		return ret;
 	}
 
 	p = names;
@@ -1142,7 +1132,7 @@ static void debug_unpack_callback(int n,
 }
 
 /*
- * Note that traverse_by_cache_tree() duplicates some logic in this funciton
+ * Note that traverse_by_cache_tree() duplicates some logic in this function
  * without actually calling it. If you change the logic here you may need to
  * check and change there as well.
  */
@@ -1425,11 +1415,11 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	int i, ret;
 	static struct cache_entry *dfc;
 	struct exclude_list el;
-	uint64_t start = getnanotime();
 
 	if (len > MAX_UNPACK_TREES)
 		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
 
+	trace_performance_enter();
 	memset(&el, 0, sizeof(el));
 	if (!core_apply_sparse_checkout || !o->update)
 		o->skip_sparse_checkout = 1;
@@ -1502,7 +1492,10 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 			}
 		}
 
-		if (traverse_trees(len, t, &info) < 0)
+		trace_performance_enter();
+		ret = traverse_trees(len, t, &info);
+		trace_performance_leave("traverse_trees");
+		if (ret < 0)
 			goto return_failed;
 	}
 
@@ -1574,10 +1567,10 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 			goto done;
 		}
 	}
-	trace_performance_since(start, "unpack trees");
 
 	ret = check_updates(o) ? (-2) : 0;
 	if (o->dst_index) {
+		move_index_extensions(&o->result, o->src_index);
 		if (!ret) {
 			if (!o->result.cache_tree)
 				o->result.cache_tree = cache_tree();
@@ -1586,7 +1579,6 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 						  WRITE_TREE_SILENT |
 						  WRITE_TREE_REPAIR);
 		}
-		move_index_extensions(&o->result, o->src_index);
 		discard_index(o->dst_index);
 		*o->dst_index = o->result;
 	} else {
@@ -1595,6 +1587,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	o->src_index = NULL;
 
 done:
+	trace_performance_leave("unpack_trees");
 	clear_exclude_list(&el);
 	return ret;
 
diff --git a/unpack-trees.h b/unpack-trees.h
index 94e1b14078..c2b434c606 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -80,7 +80,6 @@ struct unpack_trees_options {
 	struct index_state result;
 
 	struct exclude_list *el; /* for internal use */
-	unsigned int extra_add_index_flags;
 };
 
 extern int unpack_trees(unsigned n, struct tree_desc *t,

Nguyễn Thái Ngọc Duy (5):
  trace.h: support nested performance tracing
  unpack-trees: add performance tracing
  unpack-trees: optimize walking same trees with cache-tree
  unpack-trees: reduce malloc in cache-tree walk
  unpack-trees: reuse (still valid) cache-tree from src_index

 cache-tree.c    |   2 +
 diff-lib.c      |   4 +-
 dir.c           |   4 +-
 name-hash.c     |   4 +-
 preload-index.c |   4 +-
 read-cache.c    |  13 +++--
 trace.c         |  69 ++++++++++++++++++++--
 trace.h         |  15 +++++
 unpack-trees.c  | 149 +++++++++++++++++++++++++++++++++++++++++++++++-
 9 files changed, 243 insertions(+), 21 deletions(-)

-- 
2.18.0.1004.g6639190530


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH v4 1/5] trace.h: support nested performance tracing
  2018-08-12  8:15                           ` [PATCH v4 0/5] " Nguyễn Thái Ngọc Duy
@ 2018-08-12  8:15                             ` Nguyễn Thái Ngọc Duy
  2018-08-13 18:39                               ` Ben Peart
  2018-08-12  8:15                             ` [PATCH v4 2/5] unpack-trees: add " Nguyễn Thái Ngọc Duy
                                               ` (6 subsequent siblings)
  7 siblings, 1 reply; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-12  8:15 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, peartben, peff, Elijah Newren

Performance measurements are listed right now as a flat list, which is
fine when we measure big blocks. But when we start adding more and
more measurements, some of them could be just part of a bigger
measurement, and a flat list gives the wrong impression that they are
executed at the same level instead of being nested.

Add trace_performance_enter() and trace_performance_leave() to allow
indenting these nested measurements. For now it does not help much
because the only nested thing is (lazy) name hash initialization
(e.g. called in diff-index from "git status"). This will help more
because I'm going to add some more tracing that's actually nested.
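
A minimal usage sketch (the call sites are the ones this patch adds to
dir.c below; the timings in the sample output are made up, only the
indentation behavior is the point):

    trace_performance_enter();
    ... /* anything traced in here nests one level deeper */
    trace_performance_leave("read directory %.*s", len, path);

Under GIT_TRACE_PERFORMANCE this prints the child measurement indented
one extra column under its parent:

    performance: 0.000500000 s:  read directory
    performance: 2.000000000 s: git command: git status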

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 diff-lib.c      |  4 +--
 dir.c           |  4 +--
 name-hash.c     |  4 +--
 preload-index.c |  4 +--
 read-cache.c    | 11 ++++----
 trace.c         | 69 ++++++++++++++++++++++++++++++++++++++++++++-----
 trace.h         | 15 +++++++++++
 7 files changed, 92 insertions(+), 19 deletions(-)

diff --git a/diff-lib.c b/diff-lib.c
index a9f38eb5a3..1ffa22c882 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -518,8 +518,8 @@ static int diff_cache(struct rev_info *revs,
 int run_diff_index(struct rev_info *revs, int cached)
 {
 	struct object_array_entry *ent;
-	uint64_t start = getnanotime();
 
+	trace_performance_enter();
 	ent = revs->pending.objects;
 	if (diff_cache(revs, &ent->item->oid, ent->name, cached))
 		exit(128);
@@ -528,7 +528,7 @@ int run_diff_index(struct rev_info *revs, int cached)
 	diffcore_fix_diff_index(&revs->diffopt);
 	diffcore_std(&revs->diffopt);
 	diff_flush(&revs->diffopt);
-	trace_performance_since(start, "diff-index");
+	trace_performance_leave("diff-index");
 	return 0;
 }
 
diff --git a/dir.c b/dir.c
index 21e6f2520a..c5e9fc8cea 100644
--- a/dir.c
+++ b/dir.c
@@ -2263,11 +2263,11 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		   const char *path, int len, const struct pathspec *pathspec)
 {
 	struct untracked_cache_dir *untracked;
-	uint64_t start = getnanotime();
 
 	if (has_symlink_leading_path(path, len))
 		return dir->nr;
 
+	trace_performance_enter();
 	untracked = validate_untracked_cache(dir, len, pathspec);
 	if (!untracked)
 		/*
@@ -2302,7 +2302,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		dir->nr = i;
 	}
 
-	trace_performance_since(start, "read directory %.*s", len, path);
+	trace_performance_leave("read directory %.*s", len, path);
 	if (dir->untracked) {
 		static int force_untracked_cache = -1;
 		static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS);
diff --git a/name-hash.c b/name-hash.c
index 163849831c..1fcda73cb3 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -578,10 +578,10 @@ static void threaded_lazy_init_name_hash(
 
 static void lazy_init_name_hash(struct index_state *istate)
 {
-	uint64_t start = getnanotime();
 
 	if (istate->name_hash_initialized)
 		return;
+	trace_performance_enter();
 	hashmap_init(&istate->name_hash, cache_entry_cmp, NULL, istate->cache_nr);
 	hashmap_init(&istate->dir_hash, dir_entry_cmp, NULL, istate->cache_nr);
 
@@ -602,7 +602,7 @@ static void lazy_init_name_hash(struct index_state *istate)
 	}
 
 	istate->name_hash_initialized = 1;
-	trace_performance_since(start, "initialize name hash");
+	trace_performance_leave("initialize name hash");
 }
 
 /*
diff --git a/preload-index.c b/preload-index.c
index 4d08d44874..d7f7919ba2 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -78,7 +78,6 @@ static void preload_index(struct index_state *index,
 {
 	int threads, i, work, offset;
 	struct thread_data data[MAX_PARALLEL];
-	uint64_t start = getnanotime();
 
 	if (!core_preload_index)
 		return;
@@ -88,6 +87,7 @@ static void preload_index(struct index_state *index,
 		threads = 2;
 	if (threads < 2)
 		return;
+	trace_performance_enter();
 	if (threads > MAX_PARALLEL)
 		threads = MAX_PARALLEL;
 	offset = 0;
@@ -109,7 +109,7 @@ static void preload_index(struct index_state *index,
 		if (pthread_join(p->pthread, NULL))
 			die("unable to join threaded lstat");
 	}
-	trace_performance_since(start, "preload index");
+	trace_performance_leave("preload index");
 }
 #endif
 
diff --git a/read-cache.c b/read-cache.c
index e865254bea..4fd35f4f37 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1399,8 +1399,8 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	const char *typechange_fmt;
 	const char *added_fmt;
 	const char *unmerged_fmt;
-	uint64_t start = getnanotime();
 
+	trace_performance_enter();
 	modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n");
 	deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n");
 	typechange_fmt = (in_porcelain ? "T\t%s\n" : "%s needs update\n");
@@ -1470,7 +1470,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 
 		replace_index_entry(istate, i, new_entry);
 	}
-	trace_performance_since(start, "refresh index");
+	trace_performance_leave("refresh index");
 	return has_errors;
 }
 
@@ -1901,7 +1901,6 @@ static void freshen_shared_index(const char *shared_index, int warn)
 int read_index_from(struct index_state *istate, const char *path,
 		    const char *gitdir)
 {
-	uint64_t start = getnanotime();
 	struct split_index *split_index;
 	int ret;
 	char *base_oid_hex;
@@ -1911,8 +1910,9 @@ int read_index_from(struct index_state *istate, const char *path,
 	if (istate->initialized)
 		return istate->cache_nr;
 
+	trace_performance_enter();
 	ret = do_read_index(istate, path, 0);
-	trace_performance_since(start, "read cache %s", path);
+	trace_performance_leave("read cache %s", path);
 
 	split_index = istate->split_index;
 	if (!split_index || is_null_oid(&split_index->base_oid)) {
@@ -1920,6 +1920,7 @@ int read_index_from(struct index_state *istate, const char *path,
 		return ret;
 	}
 
+	trace_performance_enter();
 	if (split_index->base)
 		discard_index(split_index->base);
 	else
@@ -1936,8 +1937,8 @@ int read_index_from(struct index_state *istate, const char *path,
 	freshen_shared_index(base_path, 0);
 	merge_base_index(istate);
 	post_read_index_from(istate);
-	trace_performance_since(start, "read cache %s", base_path);
 	free(base_path);
+	trace_performance_leave("read cache %s", base_path);
 	return ret;
 }
 
diff --git a/trace.c b/trace.c
index fc623e91fd..fa4a2e7120 100644
--- a/trace.c
+++ b/trace.c
@@ -176,10 +176,30 @@ void trace_strbuf_fl(const char *file, int line, struct trace_key *key,
 	strbuf_release(&buf);
 }
 
+static uint64_t perf_start_times[10];
+static int perf_indent;
+
+uint64_t trace_performance_enter(void)
+{
+	uint64_t now;
+
+	if (!trace_want(&trace_perf_key))
+		return 0;
+
+	now = getnanotime();
+	perf_start_times[perf_indent] = now;
+	if (perf_indent + 1 < ARRAY_SIZE(perf_start_times))
+		perf_indent++;
+	else
+		BUG("Too deep indentation");
+	return now;
+}
+
 static void trace_performance_vprintf_fl(const char *file, int line,
 					 uint64_t nanos, const char *format,
 					 va_list ap)
 {
+	static const char space[] = "          ";
 	struct strbuf buf = STRBUF_INIT;
 
 	if (!prepare_trace_line(file, line, &trace_perf_key, &buf))
@@ -188,7 +208,10 @@ static void trace_performance_vprintf_fl(const char *file, int line,
 	strbuf_addf(&buf, "performance: %.9f s", (double) nanos / 1000000000);
 
 	if (format && *format) {
-		strbuf_addstr(&buf, ": ");
+		if (perf_indent >= strlen(space))
+			BUG("Too deep indentation");
+
+		strbuf_addf(&buf, ":%.*s ", perf_indent, space);
 		strbuf_vaddf(&buf, format, ap);
 	}
 
@@ -244,6 +267,24 @@ void trace_performance_since(uint64_t start, const char *format, ...)
 	va_end(ap);
 }
 
+void trace_performance_leave(const char *format, ...)
+{
+	va_list ap;
+	uint64_t since;
+
+	if (perf_indent)
+		perf_indent--;
+
+	if (!format) /* Allow callers to leave without tracing anything */
+		return;
+
+	since = perf_start_times[perf_indent];
+	va_start(ap, format);
+	trace_performance_vprintf_fl(NULL, 0, getnanotime() - since,
+				     format, ap);
+	va_end(ap);
+}
+
 #else
 
 void trace_printf_key_fl(const char *file, int line, struct trace_key *key,
@@ -273,6 +314,24 @@ void trace_performance_fl(const char *file, int line, uint64_t nanos,
 	va_end(ap);
 }
 
+void trace_performance_leave_fl(const char *file, int line,
+				uint64_t nanos, const char *format, ...)
+{
+	va_list ap;
+	uint64_t since;
+
+	if (perf_indent)
+		perf_indent--;
+
+	if (!format) /* Allow callers to leave without tracing anything */
+		return;
+
+	since = perf_start_times[perf_indent];
+	va_start(ap, format);
+	trace_performance_vprintf_fl(file, line, nanos - since, format, ap);
+	va_end(ap);
+}
+
 #endif /* HAVE_VARIADIC_MACROS */
 
 
@@ -411,13 +470,11 @@ uint64_t getnanotime(void)
 	}
 }
 
-static uint64_t command_start_time;
 static struct strbuf command_line = STRBUF_INIT;
 
 static void print_command_performance_atexit(void)
 {
-	trace_performance_since(command_start_time, "git command:%s",
-				command_line.buf);
+	trace_performance_leave("git command:%s", command_line.buf);
 }
 
 void trace_command_performance(const char **argv)
@@ -425,10 +482,10 @@ void trace_command_performance(const char **argv)
 	if (!trace_want(&trace_perf_key))
 		return;
 
-	if (!command_start_time)
+	if (!command_line.len)
 		atexit(print_command_performance_atexit);
 
 	strbuf_reset(&command_line);
 	sq_quote_argv_pretty(&command_line, argv);
-	command_start_time = getnanotime();
+	trace_performance_enter();
 }
diff --git a/trace.h b/trace.h
index 2b6a1bc17c..171b256d26 100644
--- a/trace.h
+++ b/trace.h
@@ -23,6 +23,7 @@ extern void trace_disable(struct trace_key *key);
 extern uint64_t getnanotime(void);
 extern void trace_command_performance(const char **argv);
 extern void trace_verbatim(struct trace_key *key, const void *buf, unsigned len);
+uint64_t trace_performance_enter(void);
 
 #ifndef HAVE_VARIADIC_MACROS
 
@@ -45,6 +46,9 @@ extern void trace_performance(uint64_t nanos, const char *format, ...);
 __attribute__((format (printf, 2, 3)))
 extern void trace_performance_since(uint64_t start, const char *format, ...);
 
+__attribute__((format (printf, 1, 2)))
+void trace_performance_leave(const char *format, ...);
+
 #else
 
 /*
@@ -118,6 +122,14 @@ extern void trace_performance_since(uint64_t start, const char *format, ...);
 					     __VA_ARGS__);		    \
 	} while (0)
 
+#define trace_performance_leave(...)					    \
+	do {								    \
+		if (trace_pass_fl(&trace_perf_key))			    \
+			trace_performance_leave_fl(TRACE_CONTEXT, __LINE__, \
+						   getnanotime(),	    \
+						   __VA_ARGS__);	    \
+	} while (0)
+
 /* backend functions, use non-*fl macros instead */
 __attribute__((format (printf, 4, 5)))
 extern void trace_printf_key_fl(const char *file, int line, struct trace_key *key,
@@ -130,6 +142,9 @@ extern void trace_strbuf_fl(const char *file, int line, struct trace_key *key,
 __attribute__((format (printf, 4, 5)))
 extern void trace_performance_fl(const char *file, int line,
 				 uint64_t nanos, const char *fmt, ...);
+__attribute__((format (printf, 4, 5)))
+extern void trace_performance_leave_fl(const char *file, int line,
+				       uint64_t nanos, const char *fmt, ...);
 static inline int trace_pass_fl(struct trace_key *key)
 {
 	return key->fd || !key->initialized;
-- 
2.18.0.1004.g6639190530


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH v4 2/5] unpack-trees: add performance tracing
  2018-08-12  8:15                           ` [PATCH v4 0/5] " Nguyễn Thái Ngọc Duy
  2018-08-12  8:15                             ` [PATCH v4 1/5] trace.h: support nested performance tracing Nguyễn Thái Ngọc Duy
@ 2018-08-12  8:15                             ` Nguyễn Thái Ngọc Duy
  2018-08-12 10:05                               ` Thomas Adam
                                                 ` (2 more replies)
  2018-08-12  8:15                             ` [PATCH v4 3/5] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
                                               ` (5 subsequent siblings)
  7 siblings, 3 replies; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-12  8:15 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, peartben, peff, Elijah Newren

We're going to optimize unpack_trees() a bit in the following
patches. Let's add some tracing to measure how long it takes before
and after. This is the baseline ("git checkout -" on webkit.git, 275k
files in the worktree):

    performance: 0.056651714 s:  read cache .git/index
    performance: 0.183101080 s:  preload index
    performance: 0.008584433 s:  refresh index
    performance: 0.633767589 s:   traverse_trees
    performance: 0.340265448 s:   check_updates
    performance: 0.381884638 s:   cache_tree_update
    performance: 1.401562947 s:  unpack_trees
    performance: 0.338687914 s:  write index, changed mask = 2e
    performance: 0.411927922 s:    traverse_trees
    performance: 0.000023335 s:    check_updates
    performance: 0.423697246 s:   unpack_trees
    performance: 0.423708360 s:  diff-index
    performance: 2.559524127 s: git command: git checkout -

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache-tree.c   | 2 ++
 unpack-trees.c | 9 ++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/cache-tree.c b/cache-tree.c
index 6b46711996..105f13806f 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -433,7 +433,9 @@ int cache_tree_update(struct index_state *istate, int flags)
 
 	if (i)
 		return i;
+	trace_performance_enter();
 	i = update_one(it, cache, entries, "", 0, &skip, flags);
+	trace_performance_leave("cache_tree_update");
 	if (i < 0)
 		return i;
 	istate->cache_changed |= CACHE_TREE_CHANGED;
diff --git a/unpack-trees.c b/unpack-trees.c
index cd0680f11e..b237eaa0f2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -354,6 +354,7 @@ static int check_updates(struct unpack_trees_options *o)
 	struct checkout state = CHECKOUT_INIT;
 	int i;
 
+	trace_performance_enter();
 	state.force = 1;
 	state.quiet = 1;
 	state.refresh_cache = 1;
@@ -423,6 +424,7 @@ static int check_updates(struct unpack_trees_options *o)
 	errs |= finish_delayed_checkout(&state);
 	if (o->update)
 		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
+	trace_performance_leave("check_updates");
 	return errs != 0;
 }
 
@@ -1279,6 +1281,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	if (len > MAX_UNPACK_TREES)
 		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
 
+	trace_performance_enter();
 	memset(&el, 0, sizeof(el));
 	if (!core_apply_sparse_checkout || !o->update)
 		o->skip_sparse_checkout = 1;
@@ -1351,7 +1354,10 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 			}
 		}
 
-		if (traverse_trees(len, t, &info) < 0)
+		trace_performance_enter();
+		ret = traverse_trees(len, t, &info);
+		trace_performance_leave("traverse_trees");
+		if (ret < 0)
 			goto return_failed;
 	}
 
@@ -1443,6 +1449,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	o->src_index = NULL;
 
 done:
+	trace_performance_leave("unpack_trees");
 	clear_exclude_list(&el);
 	return ret;
 
-- 
2.18.0.1004.g6639190530


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH v4 3/5] unpack-trees: optimize walking same trees with cache-tree
  2018-08-12  8:15                           ` [PATCH v4 0/5] " Nguyễn Thái Ngọc Duy
  2018-08-12  8:15                             ` [PATCH v4 1/5] trace.h: support nested performance tracing Nguyễn Thái Ngọc Duy
  2018-08-12  8:15                             ` [PATCH v4 2/5] unpack-trees: add " Nguyễn Thái Ngọc Duy
@ 2018-08-12  8:15                             ` Nguyễn Thái Ngọc Duy
  2018-08-13 18:58                               ` Ben Peart
  2018-08-12  8:15                             ` [PATCH v4 4/5] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
                                               ` (4 subsequent siblings)
  7 siblings, 1 reply; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-12  8:15 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, peartben, peff, Elijah Newren

In order to merge one or many trees with the index, unpack-trees code
walks multiple trees in parallel with the index and performs an n-way
merge. If we find out at the start of a directory that all trees are
the same (by comparing OIDs) and a cache-tree happens to be available
for that directory as well, we can avoid walking the trees because we
already know what these trees contain: it's flattened in what's called
"the index".

The upside is of course a lot less I/O since we can potentially skip
lots of trees (think subtrees). We also save CPU because we don't have
to inflate and apply the deltas. The downside is of course more
fragile code since the logic of some functions is now duplicated
elsewhere.

"checkout -" with this patch on webkit.git (275k files):

    baseline      new
  --------------------------------------------------------------------
    0.056651714   0.080394752 s:  read cache .git/index
    0.183101080   0.216010838 s:  preload index
    0.008584433   0.008534301 s:  refresh index
    0.633767589   0.251992198 s:   traverse_trees
    0.340265448   0.377031383 s:   check_updates
    0.381884638   0.372768105 s:   cache_tree_update
    1.401562947   1.045887251 s:  unpack_trees
    0.338687914   0.314983512 s:  write index, changed mask = 2e
    0.411927922   0.062572653 s:    traverse_trees
    0.000023335   0.000022544 s:    check_updates
    0.423697246   0.073795585 s:   unpack_trees
    0.423708360   0.073807557 s:  diff-index
    2.559524127   1.938191592 s: git command: git checkout -

Another measurement, from Ben running "git checkout" on a repo with
over 500k trees (with the whole series applied):

    baseline        new
  ----------------------------------------------------------------------
    0.535510167     0.556558733     s: read cache .git/index
    0.3057373       0.3147105       s: initialize name hash
    0.0184082       0.023558433     s: preload index
    0.086910967     0.089085967     s: refresh index
    7.889590767     2.191554433     s: unpack trees
    0.120760833     0.131941267     s: update worktree after a merge
    2.2583504       2.572663167     s: repair cache-tree
    0.8916137       0.959495233     s: write index, changed mask = 28
    3.405199233     0.2710663       s: unpack trees
    0.000999667     0.0021554       s: update worktree after a merge
    3.4063306       0.273318333     s: diff-index
    16.9524923      9.462943133     s: git command: git.exe checkout

This command calls unpack_trees() twice, the first time doing a 2-way
merge and the second a 1-way merge. Both times, "unpack trees" time is
reduced to one third. The overall time reduction is of course not that
impressive because index operations take a big chunk, and there's that
repair cache-tree line.

PS. A note about cache-tree invalidation and the use of it in this
code.

We do invalidate cache-tree in _source_ index when we add new entries
to the (temporary) "result" index. But we also use the cache-tree from
source index in this optimization. Does this mean we end up having no
cache-tree in the source index to activate this optimization?

The answer is twisted: the order of finding a good cache-tree and
invalidating it matters. In this case we check for a good cache-tree
first in all_trees_same_as_cache_tree(), then we start to merge things
and potentially invalidate that same cache-tree in the process. Since
cache-tree invalidation happens after the optimization kicks in, we're
still good. But we may lose that cache-tree at the very first
call_unpack_fn() call in traverse_by_cache_tree().
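
To make that ordering concrete, a compressed sketch of the new fast
path in traverse_trees_recursive() (the full version in the patch
below also saves and restores o->cache_bottom):

    nr_entries = all_trees_same_as_cache_tree(n, dirmask, names, info);
    if (nr_entries > 0)
            /*
             * The cache-tree was found valid _before_ any merging;
             * call_unpack_fn() may invalidate it in o->src_index from
             * here on, but this directory no longer depends on it.
             */
            return traverse_by_cache_tree(pos, nr_entries, n, names, info);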

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 unpack-trees.c | 127 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 127 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index b237eaa0f2..07456d0fb2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -644,6 +644,102 @@ static inline int are_same_oid(struct name_entry *name_j, struct name_entry *nam
 	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
 }
 
+static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,
+					struct name_entry *names,
+					struct traverse_info *info)
+{
+	struct unpack_trees_options *o = info->data;
+	int i;
+
+	if (!o->merge || dirmask != ((1 << n) - 1))
+		return 0;
+
+	for (i = 1; i < n; i++)
+		if (!are_same_oid(names, names + i))
+			return 0;
+
+	return cache_tree_matches_traversal(o->src_index->cache_tree, names, info);
+}
+
+static int index_pos_by_traverse_info(struct name_entry *names,
+				      struct traverse_info *info)
+{
+	struct unpack_trees_options *o = info->data;
+	int len = traverse_path_len(info, names);
+	char *name = xmalloc(len + 1 /* slash */ + 1 /* NUL */);
+	int pos;
+
+	make_traverse_path(name, info, names);
+	name[len++] = '/';
+	name[len] = '\0';
+	pos = index_name_pos(o->src_index, name, len);
+	if (pos >= 0)
+		BUG("This is a directory and should not exist in index");
+	pos = -pos - 1;
+	if (!starts_with(o->src_index->cache[pos]->name, name) ||
+	    (pos > 0 && starts_with(o->src_index->cache[pos-1]->name, name)))
+		BUG("pos must point at the first entry in this directory");
+	free(name);
+	return pos;
+}
+
+/*
+ * Fast path if we detect that all trees are the same as cache-tree at this
+ * path. We'll walk these trees recursively using cache-tree/index instead of
+ * ODB since we already know what these trees contain.
+ */
+static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
+				  struct name_entry *names,
+				  struct traverse_info *info)
+{
+	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
+	struct unpack_trees_options *o = info->data;
+	int i, d;
+
+	if (!o->merge)
+		BUG("We need cache-tree to do this optimization");
+
+	/*
+	 * Do what unpack_callback() and unpack_nondirectories() normally
+	 * do. But we walk all paths recursively in just one loop instead.
+	 *
+	 * D/F conflicts and higher stage entries are not a concern
+	 * because cache-tree would be invalidated and we would never
+	 * get here in the first place.
+	 */
+	for (i = 0; i < nr_entries; i++) {
+		struct cache_entry *tree_ce;
+		int len, rc;
+
+		src[0] = o->src_index->cache[pos + i];
+
+		len = ce_namelen(src[0]);
+		tree_ce = xcalloc(1, cache_entry_size(len));
+
+		tree_ce->ce_mode = src[0]->ce_mode;
+		tree_ce->ce_flags = create_ce_flags(0);
+		tree_ce->ce_namelen = len;
+		oidcpy(&tree_ce->oid, &src[0]->oid);
+		memcpy(tree_ce->name, src[0]->name, len + 1);
+
+		for (d = 1; d <= nr_names; d++)
+			src[d] = tree_ce;
+
+		rc = call_unpack_fn((const struct cache_entry * const *)src, o);
+		free(tree_ce);
+		if (rc < 0)
+			return rc;
+
+		mark_ce_used(src[0], o);
+	}
+	if (o->debug_unpack)
+		printf("Unpacked %d entries from %s to %s using cache-tree\n",
+		       nr_entries,
+		       o->src_index->cache[pos]->name,
+		       o->src_index->cache[pos + nr_entries - 1]->name);
+	return 0;
+}
+
 static int traverse_trees_recursive(int n, unsigned long dirmask,
 				    unsigned long df_conflicts,
 				    struct name_entry *names,
@@ -655,6 +751,27 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
 	void *buf[MAX_UNPACK_TREES];
 	struct traverse_info newinfo;
 	struct name_entry *p;
+	int nr_entries;
+
+	nr_entries = all_trees_same_as_cache_tree(n, dirmask, names, info);
+	if (nr_entries > 0) {
+		struct unpack_trees_options *o = info->data;
+		int pos = index_pos_by_traverse_info(names, info);
+
+		if (!o->merge || df_conflicts)
+			BUG("Wrong condition to get here buddy");
+
+		/*
+		 * All entries up to 'pos' must have been processed
+		 * (i.e. marked CE_UNPACKED) at this point. But to be safe,
+		 * save and restore cache_bottom anyway to not miss
+		 * unprocessed entries before 'pos'.
+		 */
+		bottom = o->cache_bottom;
+		ret = traverse_by_cache_tree(pos, nr_entries, n, names, info);
+		o->cache_bottom = bottom;
+		return ret;
+	}
 
 	p = names;
 	while (!p->mode)
@@ -814,6 +931,11 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info, con
 	return ce;
 }
 
+/*
+ * Note that traverse_by_cache_tree() duplicates some logic in this function
+ * without actually calling it. If you change the logic here you may need to
+ * check and change there as well.
+ */
 static int unpack_nondirectories(int n, unsigned long mask,
 				 unsigned long dirmask,
 				 struct cache_entry **src,
@@ -998,6 +1120,11 @@ static void debug_unpack_callback(int n,
 		debug_name_entry(i, names + i);
 }
 
+/*
+ * Note that traverse_by_cache_tree() duplicates some logic in this function
+ * without actually calling it. If you change the logic here you may need to
+ * check and change there as well.
+ */
 static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, struct name_entry *names, struct traverse_info *info)
 {
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
-- 
2.18.0.1004.g6639190530


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH v4 4/5] unpack-trees: reduce malloc in cache-tree walk
  2018-08-12  8:15                           ` [PATCH v4 0/5] " Nguyễn Thái Ngọc Duy
                                               ` (2 preceding siblings ...)
  2018-08-12  8:15                             ` [PATCH v4 3/5] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
@ 2018-08-12  8:15                             ` Nguyễn Thái Ngọc Duy
  2018-08-12  8:15                             ` [PATCH v4 5/5] unpack-trees: reuse (still valid) cache-tree from src_index Nguyễn Thái Ngọc Duy
                                               ` (3 subsequent siblings)
  7 siblings, 0 replies; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-12  8:15 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, peartben, peff, Elijah Newren

This is a micro-optimization that probably only shines on repos with
a deep directory structure. Instead of allocating and freeing a new
cache_entry in every iteration, we reuse the last one and only update
the parts that change from one iteration to the next.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 unpack-trees.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 07456d0fb2..6deb04c163 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -694,6 +694,8 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 {
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
+	struct cache_entry *tree_ce = NULL;
+	int ce_len = 0;
 	int i, d;
 
 	if (!o->merge)
@@ -708,30 +710,39 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 	 * get here in the first place.
 	 */
 	for (i = 0; i < nr_entries; i++) {
-		struct cache_entry *tree_ce;
-		int len, rc;
+		int new_ce_len, len, rc;
 
 		src[0] = o->src_index->cache[pos + i];
 
 		len = ce_namelen(src[0]);
-		tree_ce = xcalloc(1, cache_entry_size(len));
+		new_ce_len = cache_entry_size(len);
+
+		if (new_ce_len > ce_len) {
+			new_ce_len <<= 1;
+			tree_ce = xrealloc(tree_ce, new_ce_len);
+			memset(tree_ce, 0, new_ce_len);
+			ce_len = new_ce_len;
+
+			tree_ce->ce_flags = create_ce_flags(0);
+
+			for (d = 1; d <= nr_names; d++)
+				src[d] = tree_ce;
+		}
 
 		tree_ce->ce_mode = src[0]->ce_mode;
-		tree_ce->ce_flags = create_ce_flags(0);
 		tree_ce->ce_namelen = len;
 		oidcpy(&tree_ce->oid, &src[0]->oid);
 		memcpy(tree_ce->name, src[0]->name, len + 1);
 
-		for (d = 1; d <= nr_names; d++)
-			src[d] = tree_ce;
-
 		rc = call_unpack_fn((const struct cache_entry * const *)src, o);
-		free(tree_ce);
-		if (rc < 0)
+		if (rc < 0) {
+			free(tree_ce);
 			return rc;
+		}
 
 		mark_ce_used(src[0], o);
 	}
+	free(tree_ce);
 	if (o->debug_unpack)
 		printf("Unpacked %d entries from %s to %s using cache-tree\n",
 		       nr_entries,
-- 
2.18.0.1004.g6639190530


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH v4 5/5] unpack-trees: reuse (still valid) cache-tree from src_index
  2018-08-12  8:15                           ` [PATCH v4 0/5] " Nguyễn Thái Ngọc Duy
                                               ` (3 preceding siblings ...)
  2018-08-12  8:15                             ` [PATCH v4 4/5] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
@ 2018-08-12  8:15                             ` Nguyễn Thái Ngọc Duy
  2018-08-13 15:48                               ` Elijah Newren
  2018-08-13 19:01                             ` [PATCH v4 0/5] Speed up unpack_trees() Junio C Hamano
                                               ` (2 subsequent siblings)
  7 siblings, 1 reply; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-12  8:15 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, peartben, peff, Elijah Newren

We do n-way merge by walking the source index and n trees at the same
time and add merge results to a new temporary index called o->result.
The merge result for any given path could be either

- keep_entry(): same old index entry in o->src_index is reused
- merged_entry(): either a new entry is added, or an existing one updated
- deleted_entry(): one entry from o->src_index is removed

For some reason [1] we keep making sure that the source index's
cache-tree is still valid if used by o->result: for all those
merged/deleted entries, we invalidate the same path in o->src_index,
so only cache-trees covering the "keep_entry" parts remain good.

Because of this, the cache-tree from o->src_index can be perfectly
reused in o->result. And in fact we already rely on this logic to
reuse untracked cache in edf3b90553 (unpack-trees: preserve index
extensions - 2017-05-08). Move the cache-tree to o->result before
doing cache_tree_update() to reduce hashing cost.

cache_tree_update() has risen to be one of the most expensive
parts of unpack_trees() after the last few patches, so this does help
reduce unpack_trees() time significantly (on webkit.git):

    before       after
  --------------------------------------------------------------------
    0.080394752  0.051258167 s:  read cache .git/index
    0.216010838  0.212106298 s:  preload index
    0.008534301  0.280521764 s:  refresh index
    0.251992198  0.218160442 s:   traverse_trees
    0.377031383  0.374948191 s:   check_updates
    0.372768105  0.037040114 s:   cache_tree_update
    1.045887251  0.672031609 s:  unpack_trees
    0.314983512  0.317456290 s:  write index, changed mask = 2e
    0.062572653  0.038382654 s:    traverse_trees
    0.000022544  0.000042731 s:    check_updates
    0.073795585  0.050930053 s:   unpack_trees
    0.073807557  0.051099735 s:  diff-index
    1.938191592  1.614241153 s: git command: git checkout -

[1] I'm pretty sure the reason is an oversight in 34110cd4e3 (Make
    'unpack_trees()' have a separate source and destination index -
    2008-03-06). That patch aims to _not_ update the source index at
    all. The invalidation should have been done on o->result in that
    patch. But there was no cache-tree on o->result back then either,
    so it would have been pointless to do so.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 read-cache.c   | 2 ++
 unpack-trees.c | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index 4fd35f4f37..2b5646ef26 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2763,4 +2763,6 @@ void move_index_extensions(struct index_state *dst, struct index_state *src)
 {
 	dst->untracked = src->untracked;
 	src->untracked = NULL;
+	dst->cache_tree = src->cache_tree;
+	src->cache_tree = NULL;
 }
diff --git a/unpack-trees.c b/unpack-trees.c
index 6deb04c163..d822662c75 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1570,6 +1570,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 
 	ret = check_updates(o) ? (-2) : 0;
 	if (o->dst_index) {
+		move_index_extensions(&o->result, o->src_index);
 		if (!ret) {
 			if (!o->result.cache_tree)
 				o->result.cache_tree = cache_tree();
@@ -1578,7 +1579,6 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 						  WRITE_TREE_SILENT |
 						  WRITE_TREE_REPAIR);
 		}
-		move_index_extensions(&o->result, o->src_index);
 		discard_index(o->dst_index);
 		*o->dst_index = o->result;
 	} else {
-- 
2.18.0.1004.g6639190530


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 2/5] unpack-trees: add performance tracing
  2018-08-12  8:15                             ` [PATCH v4 2/5] unpack-trees: add " Nguyễn Thái Ngọc Duy
@ 2018-08-12 10:05                               ` Thomas Adam
  2018-08-13 18:50                                 ` Junio C Hamano
  2018-08-13 18:44                               ` Ben Peart
  2018-08-13 19:25                               ` Jeff King
  2 siblings, 1 reply; 121+ messages in thread
From: Thomas Adam @ 2018-08-12 10:05 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, Git Users, gitster, peartben, peff, newren

On Sun, 12 Aug 2018 at 09:19, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:

Hi,

> +       trace_performance_leave("cache_tree_update");

I would suggest trace_performance_leave() calls use __func__ instead.
That way, there's no ambiguity if the function name ever changes.
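
E.g., just a sketch of that suggestion:

    trace_performance_leave("%s", __func__);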

Kindly,
Thomas

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 5/5] unpack-trees: reuse (still valid) cache-tree from src_index
  2018-08-12  8:15                             ` [PATCH v4 5/5] unpack-trees: reuse (still valid) cache-tree from src_index Nguyễn Thái Ngọc Duy
@ 2018-08-13 15:48                               ` Elijah Newren
  2018-08-13 15:57                                 ` Duy Nguyen
  2018-08-13 16:05                                 ` Ben Peart
  0 siblings, 2 replies; 121+ messages in thread
From: Elijah Newren @ 2018-08-13 15:48 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc
  Cc: Ben Peart, Git Mailing List, Junio C Hamano, Ben Peart, Jeff King

On Sun, Aug 12, 2018 at 1:16 AM Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>
> We do n-way merge by walking the source index and n trees at the same
> time and add merge results to a new temporary index called o->result.
> The merge result for any given path could be either
>
> - keep_entry(): same old index entry in o->src_index is reused
> - merged_entry(): either a new entry is added, or an existing one updated
> - deleted_entry(): one entry from o->src_index is removed
>
> For some reason [1] we keep making sure that the source index's
> cache-tree is still valid if used by o->result: for all those
> merged/deleted entries, we invalidate the same path in o->src_index,
> so only cache-trees covering the "keep_entry" parts remain good.
>
> Because of this, the cache-tree from o->src_index can be perfectly
> reused in o->result. And in fact we already rely on this logic to
> reuse untracked cache in edf3b90553 (unpack-trees: preserve index
> extensions - 2017-05-08). Move the cache-tree to o->result before
> doing cache_tree_update() to reduce hashing cost.
>
> cache_tree_update() has risen to be one of the most expensive
> parts of unpack_trees() after the last few patches, so this does help
> reduce unpack_trees() time significantly (on webkit.git):
>
>     before       after
>   --------------------------------------------------------------------
>     0.080394752  0.051258167 s:  read cache .git/index
>     0.216010838  0.212106298 s:  preload index
>     0.008534301  0.280521764 s:  refresh index
>     0.251992198  0.218160442 s:   traverse_trees
>     0.377031383  0.374948191 s:   check_updates
>     0.372768105  0.037040114 s:   cache_tree_update
>     1.045887251  0.672031609 s:  unpack_trees

Cool, nice drop in both cache_tree_update() and unpack_trees().  But
why did refresh_index() go up so much?  That should have been
unaffected by this patch too, so it seems like something odd is going
on.  Any ideas?

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 5/5] unpack-trees: reuse (still valid) cache-tree from src_index
  2018-08-13 15:48                               ` Elijah Newren
@ 2018-08-13 15:57                                 ` Duy Nguyen
  2018-08-13 16:05                                 ` Ben Peart
  1 sibling, 0 replies; 121+ messages in thread
From: Duy Nguyen @ 2018-08-13 15:57 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Ben Peart, Git Mailing List, Junio C Hamano, Ben Peart, Jeff King

On Mon, Aug 13, 2018 at 5:48 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Sun, Aug 12, 2018 at 1:16 AM Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> >
> > We do n-way merge by walking the source index and n trees at the same
> > time and add merge results to a new temporary index called o->result.
> > The merge result for any given path could be either
> >
> > - keep_entry(): same old index entry in o->src_index is reused
> > - merged_entry(): either a new entry is added, or an existing one updated
> > - deleted_entry(): one entry from o->src_index is removed
> >
> > For some reason [1] we keep making sure that the source index's
> > cache-tree is still valid if used by o->result: for all those
> > merged/deleted entries, we invalidate the same path in o->src_index,
> > so only cache-trees covering the "keep_entry" parts remain good.
> >
> > Because of this, the cache-tree from o->src_index can be perfectly
> > reused in o->result. And in fact we already rely on this logic to
> > reuse untracked cache in edf3b90553 (unpack-trees: preserve index
> > extensions - 2017-05-08). Move the cache-tree to o->result before
> > doing cache_tree_update() to reduce hashing cost.
> >
> > cache_tree_update() has risen to be one of the most expensive
> > parts of unpack_trees() after the last few patches, so this does help
> > reduce unpack_trees() time significantly (on webkit.git):
> >
> >     before       after
> >   --------------------------------------------------------------------
> >     0.080394752  0.051258167 s:  read cache .git/index
> >     0.216010838  0.212106298 s:  preload index
> >     0.008534301  0.280521764 s:  refresh index
> >     0.251992198  0.218160442 s:   traverse_trees
> >     0.377031383  0.374948191 s:   check_updates
> >     0.372768105  0.037040114 s:   cache_tree_update
> >     1.045887251  0.672031609 s:  unpack_trees
>
> Cool, nice drop in both cache_tree_update() and unpack_trees().  But
> why did refresh_index() go up so much?  That should have been
> unaffected by this patch to, so it seems like something odd is going
> on.  Any ideas?

Probably fs cache and stuff. This is a laptop with just 4GB RAM and a
very slow disk, so if something triggers in the background and evicts
some of webkit.git's stat info, refresh_index will get hot fast (and
with 275k files, webkit.git needs quite a bit of RAM to make sure
stat() calls don't hit the disk).
-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 5/5] unpack-trees: reuse (still valid) cache-tree from src_index
  2018-08-13 15:48                               ` Elijah Newren
  2018-08-13 15:57                                 ` Duy Nguyen
@ 2018-08-13 16:05                                 ` Ben Peart
  2018-08-13 16:25                                   ` Duy Nguyen
  1 sibling, 1 reply; 121+ messages in thread
From: Ben Peart @ 2018-08-13 16:05 UTC (permalink / raw)
  To: Elijah Newren, Nguyễn Thái Ngọc
  Cc: Ben Peart, Git Mailing List, Junio C Hamano, Jeff King



On 8/13/2018 11:48 AM, Elijah Newren wrote:
> On Sun, Aug 12, 2018 at 1:16 AM Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>>
>> We do n-way merge by walking the source index and n trees at the same
>> time and add merge results to a new temporary index called o->result.
>> The merge result for any given path could be either
>>
>> - keep_entry(): same old index entry in o->src_index is reused
>> - merged_entry(): either a new entry is added, or an existing one updated
>> - deleted_entry(): one entry from o->src_index is removed
>>
>> For some reason [1] we keep making sure that the source index's
>> cache-tree is still valid if used by o->result: for all those
>> merged/deleted entries, we invalidate the same path in o->src_index,
>> so only cache-trees covering the "keep_entry" parts remain good.
>>
>> Because of this, the cache-tree from o->src_index can be perfectly
>> reused in o->result. And in fact we already rely on this logic to
>> reuse untracked cache in edf3b90553 (unpack-trees: preserve index
>> extensions - 2017-05-08). Move the cache-tree to o->result before
>> doing cache_tree_update() to reduce hashing cost.
>>
>> cache_tree_update() has risen to be one of the most expensive
>> parts of unpack_trees() after the last few patches, so this does help
>> reduce unpack_trees() time significantly (on webkit.git):
>>
>>      before       after
>>    --------------------------------------------------------------------
>>      0.080394752  0.051258167 s:  read cache .git/index
>>      0.216010838  0.212106298 s:  preload index
>>      0.008534301  0.280521764 s:  refresh index
>>      0.251992198  0.218160442 s:   traverse_trees
>>      0.377031383  0.374948191 s:   check_updates
>>      0.372768105  0.037040114 s:   cache_tree_update
>>      1.045887251  0.672031609 s:  unpack_trees
> 
> Cool, nice drop in both cache_tree_update() and unpack_trees().  But
> why did refresh_index() go up so much?  That should have been
> unaffected by this patch too, so it seems like something odd is going
> on.  Any ideas?
> 

I was partway through writing a patch that would copy the valid parts 
of the cache-tree from the source index to the dest index, but the 
observation that the source index's cache-tree is already being 
invalidated properly, which allows the simple pointer "copy", is much better!

I run some tests on a large repo and the results look very promising.

base	new	diff	% saved	
0.55	0.52	0.02	4.32%	s:  read cache .git/index
0.31	0.30	0.01	2.98%	s:  initialize name hash
0.03	0.02	0.00	9.98%	s:  preload index
0.09	0.09	0.00	4.86%	s:  refresh index
5.93	1.19	4.74	79.95%	s:   traverse_trees
0.12	0.13	-0.01	-4.15%	s:   check_updates
2.14	0.00	2.14	100.00%	s:   cache_tree_update
10.63	4.29	6.33	59.59%	s:  unpack_trees
0.97	0.91	0.06	6.41%	s:  write index, changed mask = 28
3.49	0.18	3.31	94.91%	s:    traverse_trees
0.00	0.00	0.00	17.53%	s:    check_updates
3.61	0.30	3.31	91.77%	s:   unpack_trees
3.61	0.30	3.31	91.77%	s:  diff-index
17.28	8.36	8.92	51.62%	s: git command: c:git.exe checkout

Same methodology as before: I ran "git checkout" 5 times, threw away the 
first 2 runs, and averaged the last 3.  I entered 0 for the "new" 
cache_tree_update line as it no longer reports anything.

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 5/5] unpack-trees: reuse (still valid) cache-tree from src_index
  2018-08-13 16:05                                 ` Ben Peart
@ 2018-08-13 16:25                                   ` Duy Nguyen
  2018-08-13 17:15                                     ` Ben Peart
  0 siblings, 1 reply; 121+ messages in thread
From: Duy Nguyen @ 2018-08-13 16:25 UTC (permalink / raw)
  To: Ben Peart
  Cc: Elijah Newren, Ben Peart, Git Mailing List, Junio C Hamano,
	Jeff King

On Mon, Aug 13, 2018 at 6:05 PM Ben Peart <peartben@gmail.com> wrote:
> I was part way through writing a patch that would copy the valid parts
> of the cache-tree from the source index to the dest index

Yeah sorry about that. I make bad judgements all the time, unfortunately.

If it's sort of working though, please post to the list anyway to
archive it. Who knows, some time down the road we might actually need
it again.

> I ran some tests on a large repo and the results look very promising.
>
> base    new     diff    % saved
> 0.55    0.52    0.02    4.32%   s:  read cache .git/index
> 0.31    0.30    0.01    2.98%   s:  initialize name hash
> 0.03    0.02    0.00    9.98%   s:  preload index
> 0.09    0.09    0.00    4.86%   s:  refresh index
> 5.93    1.19    4.74    79.95%  s:   traverse_trees
> 0.12    0.13    -0.01   -4.15%  s:   check_updates
> 2.14    0.00    2.14    100.00% s:   cache_tree_update
> 10.63   4.29    6.33    59.59%  s:  unpack_trees

There's a big gap here, I think. unpack_trees() takes 4s but the sum
of traverse_trees, check_updates and cache_tree_update is 1.5s tops. I
guess that's sparse checkout and stuff? It's either that or there's
another big hidden thing we should pay attention to ;-)

> 0.97    0.91    0.06    6.41%   s:  write index, changed mask = 28
> 3.49    0.18    3.31    94.91%  s:    traverse_trees
> 0.00    0.00    0.00    17.53%  s:    check_updates
> 3.61    0.30    3.31    91.77%  s:   unpack_trees
> 3.61    0.30    3.31    91.77%  s:  diff-index
> 17.28   8.36    8.92    51.62%  s: git command: c:git.exe checkout
>
> Same methodology as before, I ran "git checkout" 5 times, threw away the
> first 2 runs and averaged the last 3.  I entered 0 for the "new"
> cache_tree_update line as it no longer reports anything.
-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 5/5] unpack-trees: reuse (still valid) cache-tree from src_index
  2018-08-13 16:25                                   ` Duy Nguyen
@ 2018-08-13 17:15                                     ` Ben Peart
  0 siblings, 0 replies; 121+ messages in thread
From: Ben Peart @ 2018-08-13 17:15 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Elijah Newren, Ben Peart, Git Mailing List, Junio C Hamano,
	Jeff King



On 8/13/2018 12:25 PM, Duy Nguyen wrote:
> On Mon, Aug 13, 2018 at 6:05 PM Ben Peart <peartben@gmail.com> wrote:
>> I was part way through writing a patch that would copy the valid parts
>> of the cache-tree from the source index to the dest index
> 
> Yeah sorry about that. I make bad judgements all the time, unfortunately.
> 
> If it's sort of working though, please post to the list anyway to
> archive it. Who knows, some time down the road we might actually need
> it again.
> 
>> I ran some tests on a large repo and the results look very promising.
>>
>> base    new     diff    % saved
>> 0.55    0.52    0.02    4.32%   s:  read cache .git/index
>> 0.31    0.30    0.01    2.98%   s:  initialize name hash
>> 0.03    0.02    0.00    9.98%   s:  preload index
>> 0.09    0.09    0.00    4.86%   s:  refresh index
>> 5.93    1.19    4.74    79.95%  s:   traverse_trees
>> 0.12    0.13    -0.01   -4.15%  s:   check_updates
>> 2.14    0.00    2.14    100.00% s:   cache_tree_update
>> 10.63   4.29    6.33    59.59%  s:  unpack_trees
> 
> There's a big gap here, I think. unpack_trees() takes 4s but the sum
> of traverse_trees, check_updates and cache_tree_update is 1.5s tops. I
> guess that's sparse checkout and stuff? It's either that or there's
> another big hidden thing we should pay attention to ;-)
> 

Yes, there are additional costs associated with the sparse-checkout and 
excludes logic.  We've sped that up significantly by converting it to a 
hashmap but it still has measurable cost as we have to compute the hash 
of the cache entry name before looking it up.

Name                                            Inc %	     Inc
+ git!unpack_trees                      	 50.9	   4,575
|+ git!clear_ce_flags_1                 	 16.5	   1,479
||+ git!is_included_in_virtualfilesystem	 15.9	   1,430
|| + git!check_includes_hashmap         	 15.8	   1,418
|+ git!traverse_trees                   	 16.3	   1,468
|+ git!cache_tree_fully_valid           	 11.7	   1,055
|+ git!check_updates                    	  1.9	     169
|+ git!discard_index                    	  1.8	     162
|+ git!apply_sparse_checkout            	  0.2	      15
|+ git!next_cache_entry                 	  0.0	       3
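
For reference, the shape of that lookup (a self-contained sketch of the
idea; FNV-1a stands in for whatever hash the real code uses, and the
names here are illustrative, not the actual GVFS code):

  #include <stdint.h>
  #include <string.h>

  struct bucket { const char *name; struct bucket *next; };

  static uint32_t fnv1a(const char *s)
  {
          uint32_t h = 2166136261u;
          while (*s)
                  h = (h ^ (unsigned char)*s++) * 16777619u;
          return h;
  }

  /* roughly what an "is this path included?" check costs */
  static int is_included(struct bucket **table, size_t nr_buckets,
                         const char *ce_name)
  {
          /* the O(length) hash over the full entry name... */
          uint32_t h = fnv1a(ce_name);
          struct bucket *b;

          /* ...followed by a cheap chained probe */
          for (b = table[h % nr_buckets]; b; b = b->next)
                  if (!strcmp(b->name, ce_name))
                          return 1;
          return 0;
  }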


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 1/5] trace.h: support nested performance tracing
  2018-08-12  8:15                             ` [PATCH v4 1/5] trace.h: support nested performance tracing Nguyễn Thái Ngọc Duy
@ 2018-08-13 18:39                               ` Ben Peart
  0 siblings, 0 replies; 121+ messages in thread
From: Ben Peart @ 2018-08-13 18:39 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: Ben.Peart, git, gitster, peff, Elijah Newren



On 8/12/2018 4:15 AM, Nguyễn Thái Ngọc Duy wrote:
> Performance measurements are listed right now as a flat list, which is
> fine when we measure big blocks. But when we start adding more and
> more measurements, some of them could be just part of a bigger
> measurement and a flat list gives a wrong impression that they are
> executed at the same level instead of nested.
> 
> Add trace_performance_enter() and trace_performance_leave() to allow
> indenting these nested measurements. For now it does not help much
> because the only nested thing is (lazy) name hash initialization
> (e.g. called in diff-index from "git status"). This will help more
> because I'm going to add some more tracing that's actually nested.
> 

I reviewed this and it looks reasonable to me.
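
For reference, the intended call pattern is simply a paired enter/leave
around the region being measured (a minimal sketch; measured_work() is a
placeholder, not a function from the patch):

  trace_performance_enter();
  measured_work();
  trace_performance_leave("measured work");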

> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>   diff-lib.c      |  4 +--
>   dir.c           |  4 +--
>   name-hash.c     |  4 +--
>   preload-index.c |  4 +--
>   read-cache.c    | 11 ++++----
>   trace.c         | 69 ++++++++++++++++++++++++++++++++++++++++++++-----
>   trace.h         | 15 +++++++++++
>   7 files changed, 92 insertions(+), 19 deletions(-)
> 
> diff --git a/diff-lib.c b/diff-lib.c
> index a9f38eb5a3..1ffa22c882 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -518,8 +518,8 @@ static int diff_cache(struct rev_info *revs,
>   int run_diff_index(struct rev_info *revs, int cached)
>   {
>   	struct object_array_entry *ent;
> -	uint64_t start = getnanotime();
>   
> +	trace_performance_enter();
>   	ent = revs->pending.objects;
>   	if (diff_cache(revs, &ent->item->oid, ent->name, cached))
>   		exit(128);
> @@ -528,7 +528,7 @@ int run_diff_index(struct rev_info *revs, int cached)
>   	diffcore_fix_diff_index(&revs->diffopt);
>   	diffcore_std(&revs->diffopt);
>   	diff_flush(&revs->diffopt);
> -	trace_performance_since(start, "diff-index");
> +	trace_performance_leave("diff-index");
>   	return 0;
>   }
>   
> diff --git a/dir.c b/dir.c
> index 21e6f2520a..c5e9fc8cea 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -2263,11 +2263,11 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
>   		   const char *path, int len, const struct pathspec *pathspec)
>   {
>   	struct untracked_cache_dir *untracked;
> -	uint64_t start = getnanotime();
>   

I think removing the cost of has_symlink_leading_path() from this perf 
trace is probably OK to simplify the enter/leave logic.

>   	if (has_symlink_leading_path(path, len))
>   		return dir->nr;
>   
> +	trace_performance_enter();
>   	untracked = validate_untracked_cache(dir, len, pathspec);
>   	if (!untracked)
>   		/*
> @@ -2302,7 +2302,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
>   		dir->nr = i;
>   	}
>   
> -	trace_performance_since(start, "read directory %.*s", len, path);
> +	trace_performance_leave("read directory %.*s", len, path);
>   	if (dir->untracked) {
>   		static int force_untracked_cache = -1;
>   		static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS);
> diff --git a/name-hash.c b/name-hash.c
> index 163849831c..1fcda73cb3 100644
> --- a/name-hash.c
> +++ b/name-hash.c
> @@ -578,10 +578,10 @@ static void threaded_lazy_init_name_hash(
>   
>   static void lazy_init_name_hash(struct index_state *istate)
>   {
> -	uint64_t start = getnanotime();
>   
>   	if (istate->name_hash_initialized)
>   		return;
> +	trace_performance_enter();
>   	hashmap_init(&istate->name_hash, cache_entry_cmp, NULL, istate->cache_nr);
>   	hashmap_init(&istate->dir_hash, dir_entry_cmp, NULL, istate->cache_nr);
>   
> @@ -602,7 +602,7 @@ static void lazy_init_name_hash(struct index_state *istate)
>   	}
>   
>   	istate->name_hash_initialized = 1;
> -	trace_performance_since(start, "initialize name hash");
> +	trace_performance_leave("initialize name hash");
>   }
>   
>   /*
> diff --git a/preload-index.c b/preload-index.c
> index 4d08d44874..d7f7919ba2 100644
> --- a/preload-index.c
> +++ b/preload-index.c
> @@ -78,7 +78,6 @@ static void preload_index(struct index_state *index,
>   {
>   	int threads, i, work, offset;
>   	struct thread_data data[MAX_PARALLEL];
> -	uint64_t start = getnanotime();
>   
>   	if (!core_preload_index)
>   		return;
> @@ -88,6 +87,7 @@ static void preload_index(struct index_state *index,
>   		threads = 2;
>   	if (threads < 2)
>   		return;
> +	trace_performance_enter();
>   	if (threads > MAX_PARALLEL)
>   		threads = MAX_PARALLEL;
>   	offset = 0;
> @@ -109,7 +109,7 @@ static void preload_index(struct index_state *index,
>   		if (pthread_join(p->pthread, NULL))
>   			die("unable to join threaded lstat");
>   	}
> -	trace_performance_since(start, "preload index");
> +	trace_performance_leave("preload index");
>   }
>   #endif
>   
> diff --git a/read-cache.c b/read-cache.c
> index e865254bea..4fd35f4f37 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1399,8 +1399,8 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>   	const char *typechange_fmt;
>   	const char *added_fmt;
>   	const char *unmerged_fmt;
> -	uint64_t start = getnanotime();
>   
> +	trace_performance_enter();
>   	modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n");
>   	deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n");
>   	typechange_fmt = (in_porcelain ? "T\t%s\n" : "%s needs update\n");
> @@ -1470,7 +1470,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>   
>   		replace_index_entry(istate, i, new_entry);
>   	}
> -	trace_performance_since(start, "refresh index");
> +	trace_performance_leave("refresh index");
>   	return has_errors;
>   }
>   
> @@ -1901,7 +1901,6 @@ static void freshen_shared_index(const char *shared_index, int warn)
>   int read_index_from(struct index_state *istate, const char *path,
>   		    const char *gitdir)
>   {
> -	uint64_t start = getnanotime();
>   	struct split_index *split_index;
>   	int ret;
>   	char *base_oid_hex;
> @@ -1911,8 +1910,9 @@ int read_index_from(struct index_state *istate, const char *path,
>   	if (istate->initialized)
>   		return istate->cache_nr;
>   
> +	trace_performance_enter();
>   	ret = do_read_index(istate, path, 0);
> -	trace_performance_since(start, "read cache %s", path);
> +	trace_performance_leave("read cache %s", path);
>   
>   	split_index = istate->split_index;
>   	if (!split_index || is_null_oid(&split_index->base_oid)) {
> @@ -1920,6 +1920,7 @@ int read_index_from(struct index_state *istate, const char *path,
>   		return ret;
>   	}
>   

It's a little odd how this splits the timing of the main index read 
from that of the split-index base, but it's no more odd than it was before.

> +	trace_performance_enter();
>   	if (split_index->base)
>   		discard_index(split_index->base);
>   	else
> @@ -1936,8 +1937,8 @@ int read_index_from(struct index_state *istate, const char *path,
>   	freshen_shared_index(base_path, 0);
>   	merge_base_index(istate);
>   	post_read_index_from(istate);
> -	trace_performance_since(start, "read cache %s", base_path);
>   	free(base_path);
> +	trace_performance_leave("read cache %s", base_path);
>   	return ret;
>   }
>   
> diff --git a/trace.c b/trace.c
> index fc623e91fd..fa4a2e7120 100644
> --- a/trace.c
> +++ b/trace.c
> @@ -176,10 +176,30 @@ void trace_strbuf_fl(const char *file, int line, struct trace_key *key,
>   	strbuf_release(&buf);
>   }
>   
> +static uint64_t perf_start_times[10];
> +static int perf_indent;
> +
> +uint64_t trace_performance_enter(void)
> +{
> +	uint64_t now;
> +
> +	if (!trace_want(&trace_perf_key))
> +		return 0;
> +
> +	now = getnanotime();
> +	perf_start_times[perf_indent] = now;
> +	if (perf_indent + 1 < ARRAY_SIZE(perf_start_times))
> +		perf_indent++;
> +	else
> +		BUG("Too deep indentation");
> +	return now;
> +}
> +
>   static void trace_performance_vprintf_fl(const char *file, int line,
>   					 uint64_t nanos, const char *format,
>   					 va_list ap)
>   {
> +	static const char space[] = "          ";
>   	struct strbuf buf = STRBUF_INIT;
>   
>   	if (!prepare_trace_line(file, line, &trace_perf_key, &buf))
> @@ -188,7 +208,10 @@ static void trace_performance_vprintf_fl(const char *file, int line,
>   	strbuf_addf(&buf, "performance: %.9f s", (double) nanos / 1000000000);
>   
>   	if (format && *format) {
> -		strbuf_addstr(&buf, ": ");
> +		if (perf_indent >= strlen(space))
> +			BUG("Too deep indentation");
> +
> +		strbuf_addf(&buf, ":%.*s ", perf_indent, space);
>   		strbuf_vaddf(&buf, format, ap);
>   	}
>   
> @@ -244,6 +267,24 @@ void trace_performance_since(uint64_t start, const char *format, ...)
>   	va_end(ap);
>   }
>   
> +void trace_performance_leave(const char *format, ...)
> +{
> +	va_list ap;
> +	uint64_t since;
> +
> +	if (perf_indent)
> +		perf_indent--;
> +
> +	if (!format) /* Allow callers to leave without tracing anything */
> +		return;
> +
> +	since = perf_start_times[perf_indent];
> +	va_start(ap, format);
> +	trace_performance_vprintf_fl(NULL, 0, getnanotime() - since,
> +				     format, ap);
> +	va_end(ap);
> +}
> +
>   #else
>   
>   void trace_printf_key_fl(const char *file, int line, struct trace_key *key,
> @@ -273,6 +314,24 @@ void trace_performance_fl(const char *file, int line, uint64_t nanos,
>   	va_end(ap);
>   }
>   
> +void trace_performance_leave_fl(const char *file, int line,
> +				uint64_t nanos, const char *format, ...)
> +{
> +	va_list ap;
> +	uint64_t since;
> +
> +	if (perf_indent)
> +		perf_indent--;
> +
> +	if (!format) /* Allow callers to leave without tracing anything */
> +		return;
> +
> +	since = perf_start_times[perf_indent];
> +	va_start(ap, format);
> +	trace_performance_vprintf_fl(file, line, nanos - since, format, ap);
> +	va_end(ap);
> +}
> +
>   #endif /* HAVE_VARIADIC_MACROS */
>   
>   
> @@ -411,13 +470,11 @@ uint64_t getnanotime(void)
>   	}
>   }
>   
> -static uint64_t command_start_time;
>   static struct strbuf command_line = STRBUF_INIT;
>   
>   static void print_command_performance_atexit(void)
>   {
> -	trace_performance_since(command_start_time, "git command:%s",
> -				command_line.buf);
> +	trace_performance_leave("git command:%s", command_line.buf);
>   }
>   
>   void trace_command_performance(const char **argv)
> @@ -425,10 +482,10 @@ void trace_command_performance(const char **argv)
>   	if (!trace_want(&trace_perf_key))
>   		return;
>   
> -	if (!command_start_time)
> +	if (!command_line.len)
>   		atexit(print_command_performance_atexit);
>   
>   	strbuf_reset(&command_line);
>   	sq_quote_argv_pretty(&command_line, argv);
> -	command_start_time = getnanotime();
> +	trace_performance_enter();
>   }
> diff --git a/trace.h b/trace.h
> index 2b6a1bc17c..171b256d26 100644
> --- a/trace.h
> +++ b/trace.h
> @@ -23,6 +23,7 @@ extern void trace_disable(struct trace_key *key);
>   extern uint64_t getnanotime(void);
>   extern void trace_command_performance(const char **argv);
>   extern void trace_verbatim(struct trace_key *key, const void *buf, unsigned len);
> +uint64_t trace_performance_enter(void);
>   
>   #ifndef HAVE_VARIADIC_MACROS
>   
> @@ -45,6 +46,9 @@ extern void trace_performance(uint64_t nanos, const char *format, ...);
>   __attribute__((format (printf, 2, 3)))
>   extern void trace_performance_since(uint64_t start, const char *format, ...);
>   
> +__attribute__((format (printf, 1, 2)))
> +void trace_performance_leave(const char *format, ...);
> +
>   #else
>   
>   /*
> @@ -118,6 +122,14 @@ extern void trace_performance_since(uint64_t start, const char *format, ...);
>   					     __VA_ARGS__);		    \
>   	} while (0)
>   
> +#define trace_performance_leave(...)					    \
> +	do {								    \
> +		if (trace_pass_fl(&trace_perf_key))			    \
> +			trace_performance_leave_fl(TRACE_CONTEXT, __LINE__, \
> +						   getnanotime(),	    \
> +						   __VA_ARGS__);	    \
> +	} while (0)
> +
>   /* backend functions, use non-*fl macros instead */
>   __attribute__((format (printf, 4, 5)))
>   extern void trace_printf_key_fl(const char *file, int line, struct trace_key *key,
> @@ -130,6 +142,9 @@ extern void trace_strbuf_fl(const char *file, int line, struct trace_key *key,
>   __attribute__((format (printf, 4, 5)))
>   extern void trace_performance_fl(const char *file, int line,
>   				 uint64_t nanos, const char *fmt, ...);
> +__attribute__((format (printf, 4, 5)))
> +extern void trace_performance_leave_fl(const char *file, int line,
> +				       uint64_t nanos, const char *fmt, ...);
>   static inline int trace_pass_fl(struct trace_key *key)
>   {
>   	return key->fd || !key->initialized;
> 

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 2/5] unpack-trees: add performance tracing
  2018-08-12  8:15                             ` [PATCH v4 2/5] unpack-trees: add " Nguyễn Thái Ngọc Duy
  2018-08-12 10:05                               ` Thomas Adam
@ 2018-08-13 18:44                               ` Ben Peart
  2018-08-13 19:25                               ` Jeff King
  2 siblings, 0 replies; 121+ messages in thread
From: Ben Peart @ 2018-08-13 18:44 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: Ben.Peart, git, gitster, peff, Elijah Newren



On 8/12/2018 4:15 AM, Nguyễn Thái Ngọc Duy wrote:
> We're going to optimize unpack_trees() a bit in the following
> patches. Let's add some tracing to measure how long it takes before
> and after. This is the baseline ("git checkout -" on webkit.git, 275k
> files on worktree)
> 
>      performance: 0.056651714 s:  read cache .git/index
>      performance: 0.183101080 s:  preload index
>      performance: 0.008584433 s:  refresh index
>      performance: 0.633767589 s:   traverse_trees
>      performance: 0.340265448 s:   check_updates
>      performance: 0.381884638 s:   cache_tree_update
>      performance: 1.401562947 s:  unpack_trees
>      performance: 0.338687914 s:  write index, changed mask = 2e
>      performance: 0.411927922 s:    traverse_trees
>      performance: 0.000023335 s:    check_updates
>      performance: 0.423697246 s:   unpack_trees
>      performance: 0.423708360 s:  diff-index
>      performance: 2.559524127 s: git command: git checkout -
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>   cache-tree.c   | 2 ++
>   unpack-trees.c | 9 ++++++++-
>   2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/cache-tree.c b/cache-tree.c
> index 6b46711996..105f13806f 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -433,7 +433,9 @@ int cache_tree_update(struct index_state *istate, int flags)
>   
>   	if (i)
>   		return i;
> +	trace_performance_enter();

This one is a little odd to me.  I think either the 
trace_performance_enter() call should move up to include the 
verify_cache() call, or the enter/leave should move into the update_one() 
call, as that is all it is measuring/reporting on.
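
I.e., something like this for the first alternative (a sketch, using the
NULL-format escape the patch provides for leaving without logging
anything):

  trace_performance_enter();
  i = verify_cache(cache, entries, flags);
  if (i) {
          trace_performance_leave(NULL); /* leave without logging */
          return i;
  }
  i = update_one(it, cache, entries, "", 0, &skip, flags);
  trace_performance_leave("cache_tree_update");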

>   	i = update_one(it, cache, entries, "", 0, &skip, flags);
> +	trace_performance_leave("cache_tree_update");
>   	if (i < 0)
>   		return i;
>   	istate->cache_changed |= CACHE_TREE_CHANGED;
> diff --git a/unpack-trees.c b/unpack-trees.c
> index cd0680f11e..b237eaa0f2 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -354,6 +354,7 @@ static int check_updates(struct unpack_trees_options *o)
>   	struct checkout state = CHECKOUT_INIT;
>   	int i;
>   
> +	trace_performance_enter();
>   	state.force = 1;
>   	state.quiet = 1;
>   	state.refresh_cache = 1;
> @@ -423,6 +424,7 @@ static int check_updates(struct unpack_trees_options *o)
>   	errs |= finish_delayed_checkout(&state);
>   	if (o->update)
>   		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
> +	trace_performance_leave("check_updates");
>   	return errs != 0;
>   }
>   
> @@ -1279,6 +1281,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>   	if (len > MAX_UNPACK_TREES)
>   		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
>   
> +	trace_performance_enter();
>   	memset(&el, 0, sizeof(el));
>   	if (!core_apply_sparse_checkout || !o->update)
>   		o->skip_sparse_checkout = 1;
> @@ -1351,7 +1354,10 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>   			}
>   		}
>   
> -		if (traverse_trees(len, t, &info) < 0)
> +		trace_performance_enter();
> +		ret = traverse_trees(len, t, &info);
> +		trace_performance_leave("traverse_trees");

Why not move this enter/leave pair into the traverse_trees() function 
itself?
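
I.e., something along these lines (a sketch of a thin traced wrapper;
equivalently the enter/leave pair could sit at the top and bottom of
traverse_trees() itself):

  static int traverse_trees_traced(int n, struct tree_desc *t,
                                   struct traverse_info *info)
  {
          int ret;

          trace_performance_enter();
          ret = traverse_trees(n, t, info); /* the real work */
          trace_performance_leave("traverse_trees");
          return ret;
  }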

> +		if (ret < 0)
>   			goto return_failed;
>   	}
>   
> @@ -1443,6 +1449,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>   	o->src_index = NULL;
>   
>   done:
> +	trace_performance_leave("unpack_trees");
>   	clear_exclude_list(&el);
>   	return ret;
>   
> 

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 2/5] unpack-trees: add performance tracing
  2018-08-12 10:05                               ` Thomas Adam
@ 2018-08-13 18:50                                 ` Junio C Hamano
  0 siblings, 0 replies; 121+ messages in thread
From: Junio C Hamano @ 2018-08-13 18:50 UTC (permalink / raw)
  To: Thomas Adam; +Cc: pclouds, Ben.Peart, Git Users, peartben, peff, newren

Thomas Adam <thomas@xteddy.org> writes:

> On Sun, 12 Aug 2018 at 09:19, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>
> Hi,
>
>> +       trace_performance_leave("cache_tree_update");
>
> I would suggest trace_performance_leave() calls use __func__ instead.
> That way, there's no ambiguity if the function name ever changes.

Please don't, unless you are certain that everybody has __func__ in
the first place.

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 3/5] unpack-trees: optimize walking same trees with cache-tree
  2018-08-12  8:15                             ` [PATCH v4 3/5] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
@ 2018-08-13 18:58                               ` Ben Peart
  2018-08-15 16:38                                 ` Duy Nguyen
  0 siblings, 1 reply; 121+ messages in thread
From: Ben Peart @ 2018-08-13 18:58 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: Ben.Peart, git, gitster, peff, Elijah Newren



On 8/12/2018 4:15 AM, Nguyễn Thái Ngọc Duy wrote:
> In order to merge one or many trees with the index, unpack-trees code
> walks multiple trees in parallel with the index and performs n-way
> merge. If we find out at start of a directory that all trees are the
> same (by comparing OID) and cache-tree happens to be available for
> that directory as well, we could avoid walking the trees because we
> already know what these trees contain: it's flattened in what's called
> "the index".
> 
> The upside is of course a lot less I/O since we can potentially skip
> lots of trees (think subtrees). We also save CPU because we don't have
> to inflate and apply the deltas. The downside is of course more
> fragile code since the logic in some functions is now duplicated
> elsewhere.
> 
> "checkout -" with this patch on webkit.git (275k files):
> 
>      baseline      new
>    --------------------------------------------------------------------
>      0.056651714   0.080394752 s:  read cache .git/index
>      0.183101080   0.216010838 s:  preload index
>      0.008584433   0.008534301 s:  refresh index
>      0.633767589   0.251992198 s:   traverse_trees
>      0.340265448   0.377031383 s:   check_updates
>      0.381884638   0.372768105 s:   cache_tree_update
>      1.401562947   1.045887251 s:  unpack_trees
>      0.338687914   0.314983512 s:  write index, changed mask = 2e
>      0.411927922   0.062572653 s:    traverse_trees
>      0.000023335   0.000022544 s:    check_updates
>      0.423697246   0.073795585 s:   unpack_trees
>      0.423708360   0.073807557 s:  diff-index
>      2.559524127   1.938191592 s: git command: git checkout -
> 
> Another measurement from Ben's running "git checkout" with over 500k
> trees (on the whole series):
> 
>      baseline        new
>    ----------------------------------------------------------------------
>      0.535510167     0.556558733     s: read cache .git/index
>      0.3057373       0.3147105       s: initialize name hash
>      0.0184082       0.023558433     s: preload index
>      0.086910967     0.089085967     s: refresh index
>      7.889590767     2.191554433     s: unpack trees
>      0.120760833     0.131941267     s: update worktree after a merge
>      2.2583504       2.572663167     s: repair cache-tree
>      0.8916137       0.959495233     s: write index, changed mask = 28
>      3.405199233     0.2710663       s: unpack trees
>      0.000999667     0.0021554       s: update worktree after a merge
>      3.4063306       0.273318333     s: diff-index
>      16.9524923      9.462943133     s: git command: git.exe checkout
> 
> This command calls unpack_trees() twice, the first time on 2way merge
> and the second 1way merge. In both times, "unpack trees" time is
> reduced to one third. Overall time reduction is not that impressive of
> course because index operations take a big chunk. And there's that
> repair cache-tree line.
> 
> PS. A note about cache-tree invalidation and the use of it in this
> code.
> 
> We do invalidate cache-tree in _source_ index when we add new entries
> to the (temporary) "result" index. But we also use the cache-tree from
> source index in this optimization. Does this mean we end up having no
> cache-tree in the source index to activate this optimization?
> 
> The answer is twisted: the order of finding a good cache-tree and
> invalidating it matters. In this case we check for a good cache-tree
> first in all_trees_same_as_cache_tree(), then we start to merge things
> and potentially invalidate that same cache-tree in the process. Since
> cache-tree invalidation happens after the optimization kicks in, we're
> still good. But we may lose that cache-tree at the very first
> call_unpack_fn() call in traverse_by_cache_tree().
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>   unpack-trees.c | 127 +++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 127 insertions(+)
> 
> diff --git a/unpack-trees.c b/unpack-trees.c
> index b237eaa0f2..07456d0fb2 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -644,6 +644,102 @@ static inline int are_same_oid(struct name_entry *name_j, struct name_entry *nam
>   	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
>   }
>   
> +static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,
> +					struct name_entry *names,
> +					struct traverse_info *info)
> +{
> +	struct unpack_trees_options *o = info->data;
> +	int i;
> +
> +	if (!o->merge || dirmask != ((1 << n) - 1))
> +		return 0;
> +
> +	for (i = 1; i < n; i++)
> +		if (!are_same_oid(names, names + i))
> +			return 0;
> +
> +	return cache_tree_matches_traversal(o->src_index->cache_tree, names, info);
> +}
> +
> +static int index_pos_by_traverse_info(struct name_entry *names,
> +				      struct traverse_info *info)
> +{
> +	struct unpack_trees_options *o = info->data;
> +	int len = traverse_path_len(info, names);
> +	char *name = xmalloc(len + 1 /* slash */ + 1 /* NUL */);
> +	int pos;
> +
> +	make_traverse_path(name, info, names);
> +	name[len++] = '/';
> +	name[len] = '\0';
> +	pos = index_name_pos(o->src_index, name, len);
> +	if (pos >= 0)
> +		BUG("This is a directory and should not exist in index");
> +	pos = -pos - 1;
> +	if (!starts_with(o->src_index->cache[pos]->name, name) ||
> +	    (pos > 0 && starts_with(o->src_index->cache[pos-1]->name, name)))
> +		BUG("pos must point at the first entry in this directory");
> +	free(name);
> +	return pos;
> +}
> +
> +/*
> + * Fast path if we detect that all trees are the same as cache-tree at this
> + * path. We'll walk these trees recursively using cache-tree/index instead of
> + * ODB since we already know what these trees contain.
> + */
> +static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
> +				  struct name_entry *names,
> +				  struct traverse_info *info)
> +{
> +	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
> +	struct unpack_trees_options *o = info->data;
> +	int i, d;
> +
> +	if (!o->merge)
> +		BUG("We need cache-tree to do this optimization");
> +
> +	/*
> +	 * Do what unpack_callback() and unpack_nondirectories() normally
> +	 * do. But we walk all paths recursively in just one loop instead.

This comment threw me for a second.  Instead of walking the paths 
recursively (like the old code path does), this code path actually does 
it in an iterative loop.  How about:

"But we walk all paths in an iterative loop instead."

> +	 *
> +	 * D/F conflicts and higher stage entries are not a concern
> +	 * because cache-tree would be invalidated and we would never
> +	 * get here in the first place.
> +	 */
> +	for (i = 0; i < nr_entries; i++) {
> +		struct cache_entry *tree_ce;
> +		int len, rc;
> +
> +		src[0] = o->src_index->cache[pos + i];
> +
> +		len = ce_namelen(src[0]);
> +		tree_ce = xcalloc(1, cache_entry_size(len));
> +
> +		tree_ce->ce_mode = src[0]->ce_mode;
> +		tree_ce->ce_flags = create_ce_flags(0);
> +		tree_ce->ce_namelen = len;
> +		oidcpy(&tree_ce->oid, &src[0]->oid);
> +		memcpy(tree_ce->name, src[0]->name, len + 1);
> +
> +		for (d = 1; d <= nr_names; d++)
> +			src[d] = tree_ce;
> +
> +		rc = call_unpack_fn((const struct cache_entry * const *)src, o);

I don't fully understand why this is still necessary since "we detect 
that all trees are the same as cache-tree at this path."  I do know 
(because I tried it :)) that if we don't actually call the unpack 
function the patch fails a bunch of tests so clearly something important 
is being missed.

> +		free(tree_ce);
> +		if (rc < 0)
> +			return rc;
> +
> +		mark_ce_used(src[0], o);
> +	}
> +	if (o->debug_unpack)
> +		printf("Unpacked %d entries from %s to %s using cache-tree\n",
> +		       nr_entries,
> +		       o->src_index->cache[pos]->name,
> +		       o->src_index->cache[pos + nr_entries - 1]->name);
> +	return 0;
> +}
> +
>   static int traverse_trees_recursive(int n, unsigned long dirmask,
>   				    unsigned long df_conflicts,
>   				    struct name_entry *names,
> @@ -655,6 +751,27 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
>   	void *buf[MAX_UNPACK_TREES];
>   	struct traverse_info newinfo;
>   	struct name_entry *p;
> +	int nr_entries;
> +
> +	nr_entries = all_trees_same_as_cache_tree(n, dirmask, names, info);
> +	if (nr_entries > 0) {
> +		struct unpack_trees_options *o = info->data;
> +		int pos = index_pos_by_traverse_info(names, info);
> +
> +		if (!o->merge || df_conflicts)
> +			BUG("Wrong condition to get here buddy");
> +
> +		/*
> +		 * All entries up to 'pos' must have been processed
> +		 * (i.e. marked CE_UNPACKED) at this point. But to be safe,
> +		 * save and restore cache_bottom anyway to not miss
> +		 * unprocessed entries before 'pos'.
> +		 */
> +		bottom = o->cache_bottom;
> +		ret = traverse_by_cache_tree(pos, nr_entries, n, names, info);
> +		o->cache_bottom = bottom;

I agree with adding this back in - very low cost to provide some 
consistency and additional safety.

> +		return ret;
> +	}
>   
>   	p = names;
>   	while (!p->mode)
> @@ -814,6 +931,11 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info, con
>   	return ce;
>   }
>   
> +/*
> + * Note that traverse_by_cache_tree() duplicates some logic in this function
> + * without actually calling it. If you change the logic here you may need to
> + * check and change there as well.
> + */
>   static int unpack_nondirectories(int n, unsigned long mask,
>   				 unsigned long dirmask,
>   				 struct cache_entry **src,
> @@ -998,6 +1120,11 @@ static void debug_unpack_callback(int n,
>   		debug_name_entry(i, names + i);
>   }
>   
> +/*
> + * Note that traverse_by_cache_tree() duplicates some logic in this function
> + * without actually calling it. If you change the logic here you may need to
> + * check and change there as well.
> + */
>   static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, struct name_entry *names, struct traverse_info *info)
>   {
>   	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
> 

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 0/5] Speed up unpack_trees()
  2018-08-12  8:15                           ` [PATCH v4 0/5] " Nguyễn Thái Ngọc Duy
                                               ` (4 preceding siblings ...)
  2018-08-12  8:15                             ` [PATCH v4 5/5] unpack-trees: reuse (still valid) cache-tree from src_index Nguyễn Thái Ngọc Duy
@ 2018-08-13 19:01                             ` Junio C Hamano
  2018-08-14 19:19                             ` Ben Peart
  2018-08-18 14:41                             ` [PATCH v5 0/7] " Nguyễn Thái Ngọc Duy
  7 siblings, 0 replies; 121+ messages in thread
From: Junio C Hamano @ 2018-08-13 19:01 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: Ben.Peart, git, peartben, peff, Elijah Newren

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> v4 has a bunch of changes
>
> - 1/5 is a new one to show indented tracing. This way it's less
>   misleading to read nested time measurements
> - 3/5 now has the switch/restore cache_bottom logic. Junio suggested a
>   check instead in his final note, but I think this is safer (yeah I'm
>   scared too)
> - the old 4/4 is dropped because
>   - it assumes n-way logic
>   - the visible time saving is not worth the tradeoff
>   - Elijah gave me an idea to avoid add_index_entry() that I think
>     does not have n-way logic assumptions and gives better saving.
>     But it requires some more changes so I'm going to do it later
> - 5/5 is also new and should help reduce cache_tree_update() cost.
>   I wrote somewhere I was not going to work on this part, but it turns
>   out just a couple lines, might as well do it now.

The last step feels a bit scary, but other than that I did not spot
anything iffy in the series.  Nicely done.

Thanks.

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 2/5] unpack-trees: add performance tracing
  2018-08-12  8:15                             ` [PATCH v4 2/5] unpack-trees: add " Nguyễn Thái Ngọc Duy
  2018-08-12 10:05                               ` Thomas Adam
  2018-08-13 18:44                               ` Ben Peart
@ 2018-08-13 19:25                               ` Jeff King
  2018-08-13 19:36                                 ` Stefan Beller
                                                   ` (2 more replies)
  2 siblings, 3 replies; 121+ messages in thread
From: Jeff King @ 2018-08-13 19:25 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: Ben.Peart, git, gitster, peartben, Elijah Newren

On Sun, Aug 12, 2018 at 10:15:48AM +0200, Nguyễn Thái Ngọc Duy wrote:

> We're going to optimize unpack_trees() a bit in the following
> patches. Let's add some tracing to measure how long it takes before
> and after. This is the baseline ("git checkout -" on webkit.git, 275k
> files on worktree)
> 
>     performance: 0.056651714 s:  read cache .git/index
>     performance: 0.183101080 s:  preload index
>     performance: 0.008584433 s:  refresh index
>     performance: 0.633767589 s:   traverse_trees
>     performance: 0.340265448 s:   check_updates
>     performance: 0.381884638 s:   cache_tree_update
>     performance: 1.401562947 s:  unpack_trees
>     performance: 0.338687914 s:  write index, changed mask = 2e
>     performance: 0.411927922 s:    traverse_trees
>     performance: 0.000023335 s:    check_updates
>     performance: 0.423697246 s:   unpack_trees
>     performance: 0.423708360 s:  diff-index
>     performance: 2.559524127 s: git command: git checkout -

Am I the only one who feels a little funny about us sprinkling these
performance probes through the code base?

On Linux, "perf" already does a great job of this without having to
modify the source, and there are tools like:

  http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html

that help make sense of the results.

I know that's not going to help on Windows, but presumably there are
hardware-counter based perf tools there, too.

I can buy the argument that it's nice to have some form of profiling
that works everywhere, even if it's lowest-common-denominator. I just
wonder if we could be investing effort into tooling around existing
solutions that will end up more powerful and flexible in the long run.

-Peff

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 2/5] unpack-trees: add performance tracing
  2018-08-13 19:25                               ` Jeff King
@ 2018-08-13 19:36                                 ` Stefan Beller
  2018-08-13 20:11                                   ` Ben Peart
  2018-08-13 19:52                                 ` Duy Nguyen
  2018-08-13 22:41                                 ` Junio C Hamano
  2 siblings, 1 reply; 121+ messages in thread
From: Stefan Beller @ 2018-08-13 19:36 UTC (permalink / raw)
  To: Jeff King
  Cc: Duy Nguyen, Ben Peart, git, Junio C Hamano, Ben Peart,
	Elijah Newren

On Mon, Aug 13, 2018 at 12:25 PM Jeff King <peff@peff.net> wrote:

> I can buy the argument that it's nice to have some form of profiling
> that works everywhere, even if it's lowest-common-denominator. I just
> wonder if we could be investing effort into tooling around existing
> solutions that will end up more powerful and flexible in the long run.

The issue AFAICT is that running perf is done by $YOU, the specialist,
whereas the performance framework put into place here can be
"turned on for the whole fleet" and the ability to collect data from
non-specialists is there. (Note: At GitHub you do the serving side,
whereas Google, MS also control the shipped binary on the client
side; asking a random engineer to run perf on their Git thing only
helps their special case and is unstructured; what helps is colorful
dashboards aggregating all the results from all the people).

So it really is "works everywhere," but not as you envisioned
(cross platform vs more machines) ;-)

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 2/5] unpack-trees: add performance tracing
  2018-08-13 19:25                               ` Jeff King
  2018-08-13 19:36                                 ` Stefan Beller
@ 2018-08-13 19:52                                 ` Duy Nguyen
  2018-08-13 21:47                                   ` Jeff King
  2018-08-13 22:41                                 ` Junio C Hamano
  2 siblings, 1 reply; 121+ messages in thread
From: Duy Nguyen @ 2018-08-13 19:52 UTC (permalink / raw)
  To: Jeff King
  Cc: Ben Peart, Git Mailing List, Junio C Hamano, Ben Peart,
	Elijah Newren

On Mon, Aug 13, 2018 at 9:25 PM Jeff King <peff@peff.net> wrote:
> Am I the only one who feels a little funny about us sprinkling these
> performance probes through the code base?
>
> On Linux, "perf" already does a great job of this without having to
> modify the source, and there are tools like:
>
>   http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
>
> that help make sense of the results.

I don't think I have fully mastered 'perf'. In this case, for
example, I don't think the default event 'cycles' is the right one
because we are hit hard by I/O as well. I think at least I now have an
excuse to try that famous flamegraph out ;-) but if you have time to
run a quick analysis of this unpack-trees with 'perf', I'd love to
learn a trick or two from you.

> I know that's not going to help on Windows, but presumably there are
> hardware-counter based perf tools there, too.
>
> I can buy the argument that it's nice to have some form of profiling
> that works everywhere, even if it's lowest-common-denominator. I just
> wonder if we could be investing effort into tooling around existing
> solutions that will end up more powerful and flexible in the long run.

I think part of this sprinkling is to highlight the performance-sensitive
spots in the code. And it would be helpful to ask a user to
enable GIT_TRACE_PERFORMANCE to have a quick breakdown when something
is reported slow. I don't care that much about other platforms to be
honest, but perf being largely restricted to root does prevent it from
replacing GIT_TRACE_PERFORMANCE in this case.
-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 2/5] unpack-trees: add performance tracing
  2018-08-13 19:36                                 ` Stefan Beller
@ 2018-08-13 20:11                                   ` Ben Peart
  0 siblings, 0 replies; 121+ messages in thread
From: Ben Peart @ 2018-08-13 20:11 UTC (permalink / raw)
  To: Stefan Beller, Jeff King
  Cc: Duy Nguyen, Ben Peart, git, Junio C Hamano, Elijah Newren,
	jeffhost



On 8/13/2018 3:36 PM, Stefan Beller wrote:
> On Mon, Aug 13, 2018 at 12:25 PM Jeff King <peff@peff.net> wrote:
> 
>> I can buy the argument that it's nice to have some form of profiling
>> that works everywhere, even if it's lowest-common-denominator. I just
>> wonder if we could be investing effort into tooling around existing
>> solutions that will end up more powerful and flexible in the long run.
> 
> The issue AFAICT is that running perf is done by $YOU, the specialist,
> whereas the performance framework put into place here can be
> "turned on for the whole fleet" and the ability to collect data from
> non-specialists is there. (Note: At GitHub you do the serving side,
> whereas Google, MS also control the shipped binary on the client
> side; asking a random engineer to run perf on their Git thing only
> helps their special case and is unstructured; what helps is colorful
> dashboards aggregating all the results from all the people).
> 
> So it really is "works everywhere," but not as you envisioned
> (cross platform vs more machines) ;-)
> 

I currently use GIT_TRACE_PERFORMANCE primarily to communicate 
performance measurements on the mailing list.  While it is occasionally 
convenient to run with it turned on locally, its main value is giving me 
a common reference when discussing performance with others on the list.

We have several excellent profiling tools available on Windows 
(perfview, VS, wpa, etc) so for any detailed investigations, I use 
those.  They obviously don't require any instrumenting in the code.

For our internal end-user performance data, we'll use structured logging 
and our custom telemetry solution rather than the GIT_TRACE_PERFORMANCE 
mechanism.  We never ask end users to turn on GIT_TRACE_PERFORMANCE.  If 
we need more than what we can gather using telemetry, we ask them to 
capture a perfview along with other diagnostic data and send it to us 
for evaluation.


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 2/5] unpack-trees: add performance tracing
  2018-08-13 19:52                                 ` Duy Nguyen
@ 2018-08-13 21:47                                   ` Jeff King
  0 siblings, 0 replies; 121+ messages in thread
From: Jeff King @ 2018-08-13 21:47 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Ben Peart, Git Mailing List, Junio C Hamano, Ben Peart,
	Elijah Newren

On Mon, Aug 13, 2018 at 09:52:41PM +0200, Duy Nguyen wrote:

> I don't think I have really fully mastered 'perf'. In this case for
> example, I don't think the default event 'cycles' is the right one
> because we are hit hard by I/O as well. I think at least I now have an
> excuse to try that famous flamegraph out ;-) but if you have time to
> run a quick analysis of this unpack-trees with 'perf', I'd love to
> learn a trick or two from you.

To be honest, I don't feel like I know how to use perf either. ;) But
I'll try to contribute what I know.

Usually I'd just use perf to get a callgraph with hot-spots, like:

  perf record -g git ...
  perf report --call-graph=fractal,0.05,caller

But that's not going to show you absolute times, which makes it lousy
for comparing run-to-run (if you speed something up, its percentage gets
smaller, but it's hard to tell _how much_ you've sped it up). And as you
note, it's measuring CPU cycles, not wall-clock.

To get output most similar to what you've shown, I think you'd define
some probes at functions of interest:

  for i in unpack_trees cache_tree_update; do
    # Cover both function entrance and return.
    perf probe -x $(which git) $i
    perf probe -x $(which git) ${i}%return
  done

and then record a run looking for those events:

  perf record -e 'probe_git:*' git ...

and then dump the result:

  perf script -F time,event

which gives you the times for each event. If you want elapsed times, you
have to compute them yourself:

  perf script -F time,event |
  perl -ne '
    /([0-9.]+):\s+probe_git:(.*):/ or die "confusing: $_";
    my ($t, $func) = ($1, $2);
    if ($func =~ s/__return$//) {
      my $start = pop @stack;
      printf "%0.9f", $t - $start;
      print " s: ";
      print "  " for (0..@stack-1);
      print $func, "\n";
    } else {
      push @stack, $t;
    }
  '

which gives inverted-graph elapsed-time output similar to what your trace
output gives. One annoying downside is that you have to be root to create
or use the dynamic probes. I don't know if there's an easy way around
that. Or if there's a perf command which already handles this kind of
elapsed stuff (there's a "perf trace" which seems really close, but I
couldn't convince it to look at elapsed time for non-syscalls).

> > I can buy the argument that it's nice to have some form of profiling
> > that works everywhere, even if it's lowest-common-denominator. I just
> > wonder if we could be investing effort into tooling around existing
> > solutions that will end up more powerful and flexible in the long run.
> 
> I think part of this sprinkling is to highlight the performance
> sensitive spots in the code. And it would be helpful to ask a user to
> enable GIT_TRACE_PERFORMANCE to have a quick breakdown when something
> is reported slow. I don't care that much about other platforms to be
> honest, but perf being largely restricted to root does prevent it from
> replacing GIT_TRACE_PERFORMANCE in this case.

Yeah, this line of reasoning (which is similar to what Stefan said) is
compelling to me. GIT_TRACE_* is _most_ useful when we can ask ordinary
users to give us output. Even if we scripted the complexity I showed
above, it's not guaranteed that perf is even available or that the user
has permissions to use it.

-Peff

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 2/5] unpack-trees: add performance tracing
  2018-08-13 19:25                               ` Jeff King
  2018-08-13 19:36                                 ` Stefan Beller
  2018-08-13 19:52                                 ` Duy Nguyen
@ 2018-08-13 22:41                                 ` Junio C Hamano
  2018-08-14 18:19                                   ` Jeff Hostetler
  2 siblings, 1 reply; 121+ messages in thread
From: Junio C Hamano @ 2018-08-13 22:41 UTC (permalink / raw)
  To: Jeff King
  Cc: Nguyễn Thái Ngọc Duy, Ben.Peart, git, peartben,
	Elijah Newren

Jeff King <peff@peff.net> writes:

> I can buy the argument that it's nice to have some form of profiling
> that works everywhere, even if it's lowest-common-denominator. I just
> wonder if we could be investing effort into tooling around existing
> solutions that will end up more powerful and flexible in the long run.

Another thing I noticed is that the codepaths we would find
interesting to annotate with trace_performance_* stuff often
overlap with the "slog" thing.  If the latter aims to eventually
replace GIT_TRACE (and if not, I suspect there is not much point
adding it in the first place), perhaps we can extend it to also
cover the need of these trace_performance_* calls, so that we do not
have to carry three different tracing mechanisms.


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 2/5] unpack-trees: add performance tracing
  2018-08-13 22:41                                 ` Junio C Hamano
@ 2018-08-14 18:19                                   ` Jeff Hostetler
  2018-08-14 18:32                                     ` Duy Nguyen
  0 siblings, 1 reply; 121+ messages in thread
From: Jeff Hostetler @ 2018-08-14 18:19 UTC (permalink / raw)
  To: Junio C Hamano, Jeff King
  Cc: Nguyễn Thái Ngọc Duy, Ben.Peart, git, peartben,
	Elijah Newren



On 8/13/2018 6:41 PM, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
>> I can buy the argument that it's nice to have some form of profiling
>> that works everywhere, even if it's lowest-common-denominator. I just
>> wonder if we could be investing effort into tooling around existing
>> solutions that will end up more powerful and flexible in the long run.
> 
> Another thing I noticed is that the codepaths we would find
> interesting to annotate with trace_performance_* stuff often
> overlaps with the "slog" thing.  If the latter aims to eventually
> replace GIT_TRACE (and if not, I suspect there is not much point
> adding it in the first place), perhaps we can extend it to also
> cover the need of these trace_performance_* calls, so that we do not
> have to carry three different tracing mechanisms.
> 

I'm looking at adding code to my SLOG (better name suggestions welcome)
patch series to eventually replace the existing git_trace facility.
And I would like to have a set of nested messages like Duy has proposed
be a part of that.

In an independent effort I've found the nested messages to be very
helpful in certain contexts.  They are not a replacement for the
various platform tools, like PerfView and friends as discussed earlier
on this thread, but then again I can ask a customer to turn a knob and
run it again and send me the output and hopefully get a rough idea of
the problem -- without having them install a bunch of perf tools.

Jeff


^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 2/5] unpack-trees: add performance tracing
  2018-08-14 18:19                                   ` Jeff Hostetler
@ 2018-08-14 18:32                                     ` Duy Nguyen
  2018-08-14 18:44                                       ` Stefan Beller
  0 siblings, 1 reply; 121+ messages in thread
From: Duy Nguyen @ 2018-08-14 18:32 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Junio C Hamano, Jeff King, Ben Peart, Git Mailing List, Ben Peart,
	Elijah Newren

On Tue, Aug 14, 2018 at 8:19 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
> I'm looking at adding code to my SLOG (better name suggestions welcome)
> patch series to eventually replace the existing git_trace facility.

Complement maybe. Replace, please no. I'd rather not stare at json messages.
-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 2/5] unpack-trees: add performance tracing
  2018-08-14 18:32                                     ` Duy Nguyen
@ 2018-08-14 18:44                                       ` Stefan Beller
  2018-08-14 18:51                                         ` Duy Nguyen
  2018-08-14 20:14                                         ` Jeff Hostetler
  0 siblings, 2 replies; 121+ messages in thread
From: Stefan Beller @ 2018-08-14 18:44 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Jeff Hostetler, Junio C Hamano, Jeff King, Ben Peart, git,
	Ben Peart, Elijah Newren

On Tue, Aug 14, 2018 at 11:32 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Tue, Aug 14, 2018 at 8:19 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
> > I'm looking at adding code to my SLOG (better name suggestions welcome)
> > patch series to eventually replace the existing git_trace facility.
>
> Complement maybe. Replace, please no. I'd rather not stare at json messages.

From the sidelines: We'd only need one logging infrastructure in place, as the
formatting would be done as a later step? For local operations we'd certainly
find better formatting than JSON, and we figured that we might end up desiring
ProtocolBuffers[1] instead of JSON, so if it were easy to change the output of
the structured logging, that would be great.

But AFAICT these series are all about putting the sampling points into the
code base, so formatting would be orthogonal to it?

Stefan

[1] https://developers.google.com/protocol-buffers/

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 2/5] unpack-trees: add performance tracing
  2018-08-14 18:44                                       ` Stefan Beller
@ 2018-08-14 18:51                                         ` Duy Nguyen
  2018-08-14 19:54                                           ` Jeff King
  2018-08-14 20:52                                           ` Junio C Hamano
  2018-08-14 20:14                                         ` Jeff Hostetler
  1 sibling, 2 replies; 121+ messages in thread
From: Duy Nguyen @ 2018-08-14 18:51 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jeff Hostetler, Junio C Hamano, Jeff King, Ben Peart,
	Git Mailing List, Ben Peart, Elijah Newren

On Tue, Aug 14, 2018 at 8:44 PM Stefan Beller <sbeller@google.com> wrote:
>
> On Tue, Aug 14, 2018 at 11:32 AM Duy Nguyen <pclouds@gmail.com> wrote:
> >
> > On Tue, Aug 14, 2018 at 8:19 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
> > > I'm looking at adding code to my SLOG (better name suggestions welcome)
> > > patch series to eventually replace the existing git_trace facility.
> >
> > Complement maybe. Replace, please no. I'd rather not stare at json messages.
>
> From the sidelines: We'd only need one logging infrastructure in place, as the
> formatting would be done as a later step? For local operations we'd certainly
> find better formatting than json, and we figured that we might end up desiring
> ProtocolBuffers[1] instead of JSon, so if it would be easy to change
> the output of
> the structured logging easily that would be great.

These trace messages are made for human consumption. Granted,
occasionally we need some processing, but I find one-liners mostly
suffice. Now we turn these into something made for machines, turning
people into second-class citizens. I've read these messages reformatted
for humans; it's usually too verbose even after reformatting.

> But AFAICT these series are all about putting the sampling points into the
> code base, so formatting would be orthogonal to it?

It's not just sampling points. There are things like the index id being
shown in the message, for example. I prefer to keep a free-style format
to help me read, and there are also things like the indentation I do
here for the same reason. Granted, you could do all that with scripts
and stuff, but will we pass around dumps of JSON messages in mail, to be
decoded locally?

> Stefan
>
> [1] https://developers.google.com/protocol-buffers/



-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 0/5] Speed up unpack_trees()
  2018-08-12  8:15                           ` [PATCH v4 0/5] " Nguyễn Thái Ngọc Duy
                                               ` (5 preceding siblings ...)
  2018-08-13 19:01                             ` [PATCH v4 0/5] Speed up unpack_trees() Junio C Hamano
@ 2018-08-14 19:19                             ` Ben Peart
  2018-08-18 14:41                             ` [PATCH v5 0/7] " Nguyễn Thái Ngọc Duy
  7 siblings, 0 replies; 121+ messages in thread
From: Ben Peart @ 2018-08-14 19:19 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: Ben.Peart, git, gitster, peff, Elijah Newren



On 8/12/2018 4:15 AM, Nguyễn Thái Ngọc Duy wrote:
> v4 has a bunch of changes
> 
> - 1/5 is a new one to show indented tracing. This way nested time
>    measurements are less misleading to read
> - 3/5 now has the switch/restore cache_bottom logic. Junio suggested a
>    check instead in his final note, but I think this is safer (yeah I'm
>    scared too)
> - the old 4/4 is dropped because
>    - it assumes n-way logic
>    - the visible time saving is not worth the tradeoff
>    - Elijah gave me an idea to avoid add_index_entry() that I think
>      does not have n-way logic assumptions and gives better savings.
>      But it requires some more changes so I'm going to do it later
> - 5/5 is also new and should help reduce cache_tree_update() cost.
>    I wrote somewhere that I was not going to work on this part, but
>    it turns out to be just a couple of lines, so I might as well do it now.
> 
> Interdiff
> 

I've now had a chance to run the git tests, as well as our own unit and 
functional tests with this patch series and all passed.

I reviewed the tests in t0090-cache-tree.sh and verified that there are 
tests that validate that the cache tree is correct after doing a checkout 
and merge (both of which exercise the new cache tree optimization in patch 5).

I've also run our perf test suite and the results are outstanding:

Checkout saves 51% on average
Merge saves 44%
Pull saves 30%
Rebase saves 26%

For perspective, that means these commands are going from ~20 seconds to 
~10 seconds.

I don't feel that any of my comments on the individual patches warrant 
a re-roll.  Given the ongoing discussion about the additional tracing, 
I'm happy to leave out the first 2 patches so that the rest can go in 
sooner rather than later.

Looks good!

> diff --git a/cache-tree.c b/cache-tree.c
> index 0dbe10fc85..105f13806f 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -426,7 +426,6 @@ static int update_one(struct cache_tree *it,
>   
>   int cache_tree_update(struct index_state *istate, int flags)
>   {
> -	uint64_t start = getnanotime();
>   	struct cache_tree *it = istate->cache_tree;
>   	struct cache_entry **cache = istate->cache;
>   	int entries = istate->cache_nr;
> @@ -434,11 +433,12 @@ int cache_tree_update(struct index_state *istate, int flags)
>   
>   	if (i)
>   		return i;
> +	trace_performance_enter();
>   	i = update_one(it, cache, entries, "", 0, &skip, flags);
> +	trace_performance_leave("cache_tree_update");
>   	if (i < 0)
>   		return i;
>   	istate->cache_changed |= CACHE_TREE_CHANGED;
> -	trace_performance_since(start, "repair cache-tree");
>   	return 0;
>   }
>   
> diff --git a/cache.h b/cache.h
> index e6f7ee4b64..8b447652a7 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -673,7 +673,6 @@ extern int index_name_pos(const struct index_state *, const char *name, int name
>   #define ADD_CACHE_JUST_APPEND 8		/* Append only; tree.c::read_tree() */
>   #define ADD_CACHE_NEW_ONLY 16		/* Do not replace existing ones */
>   #define ADD_CACHE_KEEP_CACHE_TREE 32	/* Do not invalidate cache-tree */
> -#define ADD_CACHE_SKIP_VERIFY_PATH 64	/* Do not verify path */
>   extern int add_index_entry(struct index_state *, struct cache_entry *ce, int option);
>   extern void rename_index_entry_at(struct index_state *, int pos, const char *new_name);
>   
> diff --git a/diff-lib.c b/diff-lib.c
> index a9f38eb5a3..1ffa22c882 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -518,8 +518,8 @@ static int diff_cache(struct rev_info *revs,
>   int run_diff_index(struct rev_info *revs, int cached)
>   {
>   	struct object_array_entry *ent;
> -	uint64_t start = getnanotime();
>   
> +	trace_performance_enter();
>   	ent = revs->pending.objects;
>   	if (diff_cache(revs, &ent->item->oid, ent->name, cached))
>   		exit(128);
> @@ -528,7 +528,7 @@ int run_diff_index(struct rev_info *revs, int cached)
>   	diffcore_fix_diff_index(&revs->diffopt);
>   	diffcore_std(&revs->diffopt);
>   	diff_flush(&revs->diffopt);
> -	trace_performance_since(start, "diff-index");
> +	trace_performance_leave("diff-index");
>   	return 0;
>   }
>   
> diff --git a/dir.c b/dir.c
> index 21e6f2520a..c5e9fc8cea 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -2263,11 +2263,11 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
>   		   const char *path, int len, const struct pathspec *pathspec)
>   {
>   	struct untracked_cache_dir *untracked;
> -	uint64_t start = getnanotime();
>   
>   	if (has_symlink_leading_path(path, len))
>   		return dir->nr;
>   
> +	trace_performance_enter();
>   	untracked = validate_untracked_cache(dir, len, pathspec);
>   	if (!untracked)
>   		/*
> @@ -2302,7 +2302,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
>   		dir->nr = i;
>   	}
>   
> -	trace_performance_since(start, "read directory %.*s", len, path);
> +	trace_performance_leave("read directory %.*s", len, path);
>   	if (dir->untracked) {
>   		static int force_untracked_cache = -1;
>   		static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS);
> diff --git a/name-hash.c b/name-hash.c
> index 163849831c..1fcda73cb3 100644
> --- a/name-hash.c
> +++ b/name-hash.c
> @@ -578,10 +578,10 @@ static void threaded_lazy_init_name_hash(
>   
>   static void lazy_init_name_hash(struct index_state *istate)
>   {
> -	uint64_t start = getnanotime();
>   
>   	if (istate->name_hash_initialized)
>   		return;
> +	trace_performance_enter();
>   	hashmap_init(&istate->name_hash, cache_entry_cmp, NULL, istate->cache_nr);
>   	hashmap_init(&istate->dir_hash, dir_entry_cmp, NULL, istate->cache_nr);
>   
> @@ -602,7 +602,7 @@ static void lazy_init_name_hash(struct index_state *istate)
>   	}
>   
>   	istate->name_hash_initialized = 1;
> -	trace_performance_since(start, "initialize name hash");
> +	trace_performance_leave("initialize name hash");
>   }
>   
>   /*
> diff --git a/preload-index.c b/preload-index.c
> index 4d08d44874..d7f7919ba2 100644
> --- a/preload-index.c
> +++ b/preload-index.c
> @@ -78,7 +78,6 @@ static void preload_index(struct index_state *index,
>   {
>   	int threads, i, work, offset;
>   	struct thread_data data[MAX_PARALLEL];
> -	uint64_t start = getnanotime();
>   
>   	if (!core_preload_index)
>   		return;
> @@ -88,6 +87,7 @@ static void preload_index(struct index_state *index,
>   		threads = 2;
>   	if (threads < 2)
>   		return;
> +	trace_performance_enter();
>   	if (threads > MAX_PARALLEL)
>   		threads = MAX_PARALLEL;
>   	offset = 0;
> @@ -109,7 +109,7 @@ static void preload_index(struct index_state *index,
>   		if (pthread_join(p->pthread, NULL))
>   			die("unable to join threaded lstat");
>   	}
> -	trace_performance_since(start, "preload index");
> +	trace_performance_leave("preload index");
>   }
>   #endif
>   
> diff --git a/read-cache.c b/read-cache.c
> index b0b5df5de7..2b5646ef26 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1170,7 +1170,6 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
>   	int ok_to_add = option & ADD_CACHE_OK_TO_ADD;
>   	int ok_to_replace = option & ADD_CACHE_OK_TO_REPLACE;
>   	int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
> -	int skip_verify_path = option & ADD_CACHE_SKIP_VERIFY_PATH;
>   	int new_only = option & ADD_CACHE_NEW_ONLY;
>   
>   	if (!(option & ADD_CACHE_KEEP_CACHE_TREE))
> @@ -1211,7 +1210,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
>   
>   	if (!ok_to_add)
>   		return -1;
> -	if (!skip_verify_path && !verify_path(ce->name, ce->ce_mode))
> +	if (!verify_path(ce->name, ce->ce_mode))
>   		return error("Invalid path '%s'", ce->name);
>   
>   	if (!skip_df_check &&
> @@ -1400,8 +1399,8 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>   	const char *typechange_fmt;
>   	const char *added_fmt;
>   	const char *unmerged_fmt;
> -	uint64_t start = getnanotime();
>   
> +	trace_performance_enter();
>   	modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n");
>   	deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n");
>   	typechange_fmt = (in_porcelain ? "T\t%s\n" : "%s needs update\n");
> @@ -1471,7 +1470,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>   
>   		replace_index_entry(istate, i, new_entry);
>   	}
> -	trace_performance_since(start, "refresh index");
> +	trace_performance_leave("refresh index");
>   	return has_errors;
>   }
>   
> @@ -1902,7 +1901,6 @@ static void freshen_shared_index(const char *shared_index, int warn)
>   int read_index_from(struct index_state *istate, const char *path,
>   		    const char *gitdir)
>   {
> -	uint64_t start = getnanotime();
>   	struct split_index *split_index;
>   	int ret;
>   	char *base_oid_hex;
> @@ -1912,8 +1910,9 @@ int read_index_from(struct index_state *istate, const char *path,
>   	if (istate->initialized)
>   		return istate->cache_nr;
>   
> +	trace_performance_enter();
>   	ret = do_read_index(istate, path, 0);
> -	trace_performance_since(start, "read cache %s", path);
> +	trace_performance_leave("read cache %s", path);
>   
>   	split_index = istate->split_index;
>   	if (!split_index || is_null_oid(&split_index->base_oid)) {
> @@ -1921,6 +1920,7 @@ int read_index_from(struct index_state *istate, const char *path,
>   		return ret;
>   	}
>   
> +	trace_performance_enter();
>   	if (split_index->base)
>   		discard_index(split_index->base);
>   	else
> @@ -1937,8 +1937,8 @@ int read_index_from(struct index_state *istate, const char *path,
>   	freshen_shared_index(base_path, 0);
>   	merge_base_index(istate);
>   	post_read_index_from(istate);
> -	trace_performance_since(start, "read cache %s", base_path);
>   	free(base_path);
> +	trace_performance_leave("read cache %s", base_path);
>   	return ret;
>   }
>   
> @@ -2763,4 +2763,6 @@ void move_index_extensions(struct index_state *dst, struct index_state *src)
>   {
>   	dst->untracked = src->untracked;
>   	src->untracked = NULL;
> +	dst->cache_tree = src->cache_tree;
> +	src->cache_tree = NULL;
>   }
> diff --git a/trace.c b/trace.c
> index fc623e91fd..fa4a2e7120 100644
> --- a/trace.c
> +++ b/trace.c
> @@ -176,10 +176,30 @@ void trace_strbuf_fl(const char *file, int line, struct trace_key *key,
>   	strbuf_release(&buf);
>   }
>   
> +static uint64_t perf_start_times[10];
> +static int perf_indent;
> +
> +uint64_t trace_performance_enter(void)
> +{
> +	uint64_t now;
> +
> +	if (!trace_want(&trace_perf_key))
> +		return 0;
> +
> +	now = getnanotime();
> +	perf_start_times[perf_indent] = now;
> +	if (perf_indent + 1 < ARRAY_SIZE(perf_start_times))
> +		perf_indent++;
> +	else
> +		BUG("Too deep indentation");
> +	return now;
> +}
> +
>   static void trace_performance_vprintf_fl(const char *file, int line,
>   					 uint64_t nanos, const char *format,
>   					 va_list ap)
>   {
> +	static const char space[] = "          ";
>   	struct strbuf buf = STRBUF_INIT;
>   
>   	if (!prepare_trace_line(file, line, &trace_perf_key, &buf))
> @@ -188,7 +208,10 @@ static void trace_performance_vprintf_fl(const char *file, int line,
>   	strbuf_addf(&buf, "performance: %.9f s", (double) nanos / 1000000000);
>   
>   	if (format && *format) {
> -		strbuf_addstr(&buf, ": ");
> +		if (perf_indent >= strlen(space))
> +			BUG("Too deep indentation");
> +
> +		strbuf_addf(&buf, ":%.*s ", perf_indent, space);
>   		strbuf_vaddf(&buf, format, ap);
>   	}
>   
> @@ -244,6 +267,24 @@ void trace_performance_since(uint64_t start, const char *format, ...)
>   	va_end(ap);
>   }
>   
> +void trace_performance_leave(const char *format, ...)
> +{
> +	va_list ap;
> +	uint64_t since;
> +
> +	if (perf_indent)
> +		perf_indent--;
> +
> +	if (!format) /* Allow callers to leave without tracing anything */
> +		return;
> +
> +	since = perf_start_times[perf_indent];
> +	va_start(ap, format);
> +	trace_performance_vprintf_fl(NULL, 0, getnanotime() - since,
> +				     format, ap);
> +	va_end(ap);
> +}
> +
>   #else
>   
>   void trace_printf_key_fl(const char *file, int line, struct trace_key *key,
> @@ -273,6 +314,24 @@ void trace_performance_fl(const char *file, int line, uint64_t nanos,
>   	va_end(ap);
>   }
>   
> +void trace_performance_leave_fl(const char *file, int line,
> +				uint64_t nanos, const char *format, ...)
> +{
> +	va_list ap;
> +	uint64_t since;
> +
> +	if (perf_indent)
> +		perf_indent--;
> +
> +	if (!format) /* Allow callers to leave without tracing anything */
> +		return;
> +
> +	since = perf_start_times[perf_indent];
> +	va_start(ap, format);
> +	trace_performance_vprintf_fl(file, line, nanos - since, format, ap);
> +	va_end(ap);
> +}
> +
>   #endif /* HAVE_VARIADIC_MACROS */
>   
>   
> @@ -411,13 +470,11 @@ uint64_t getnanotime(void)
>   	}
>   }
>   
> -static uint64_t command_start_time;
>   static struct strbuf command_line = STRBUF_INIT;
>   
>   static void print_command_performance_atexit(void)
>   {
> -	trace_performance_since(command_start_time, "git command:%s",
> -				command_line.buf);
> +	trace_performance_leave("git command:%s", command_line.buf);
>   }
>   
>   void trace_command_performance(const char **argv)
> @@ -425,10 +482,10 @@ void trace_command_performance(const char **argv)
>   	if (!trace_want(&trace_perf_key))
>   		return;
>   
> -	if (!command_start_time)
> +	if (!command_line.len)
>   		atexit(print_command_performance_atexit);
>   
>   	strbuf_reset(&command_line);
>   	sq_quote_argv_pretty(&command_line, argv);
> -	command_start_time = getnanotime();
> +	trace_performance_enter();
>   }
> diff --git a/trace.h b/trace.h
> index 2b6a1bc17c..171b256d26 100644
> --- a/trace.h
> +++ b/trace.h
> @@ -23,6 +23,7 @@ extern void trace_disable(struct trace_key *key);
>   extern uint64_t getnanotime(void);
>   extern void trace_command_performance(const char **argv);
>   extern void trace_verbatim(struct trace_key *key, const void *buf, unsigned len);
> +uint64_t trace_performance_enter(void);
>   
>   #ifndef HAVE_VARIADIC_MACROS
>   
> @@ -45,6 +46,9 @@ extern void trace_performance(uint64_t nanos, const char *format, ...);
>   __attribute__((format (printf, 2, 3)))
>   extern void trace_performance_since(uint64_t start, const char *format, ...);
>   
> +__attribute__((format (printf, 1, 2)))
> +void trace_performance_leave(const char *format, ...);
> +
>   #else
>   
>   /*
> @@ -118,6 +122,14 @@ extern void trace_performance_since(uint64_t start, const char *format, ...);
>   					     __VA_ARGS__);		    \
>   	} while (0)
>   
> +#define trace_performance_leave(...)					    \
> +	do {								    \
> +		if (trace_pass_fl(&trace_perf_key))			    \
> +			trace_performance_leave_fl(TRACE_CONTEXT, __LINE__, \
> +						   getnanotime(),	    \
> +						   __VA_ARGS__);	    \
> +	} while (0)
> +
>   /* backend functions, use non-*fl macros instead */
>   __attribute__((format (printf, 4, 5)))
>   extern void trace_printf_key_fl(const char *file, int line, struct trace_key *key,
> @@ -130,6 +142,9 @@ extern void trace_strbuf_fl(const char *file, int line, struct trace_key *key,
>   __attribute__((format (printf, 4, 5)))
>   extern void trace_performance_fl(const char *file, int line,
>   				 uint64_t nanos, const char *fmt, ...);
> +__attribute__((format (printf, 4, 5)))
> +extern void trace_performance_leave_fl(const char *file, int line,
> +				       uint64_t nanos, const char *fmt, ...);
>   static inline int trace_pass_fl(struct trace_key *key)
>   {
>   	return key->fd || !key->initialized;
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 1438ee1555..d822662c75 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -201,7 +201,6 @@ static int do_add_entry(struct unpack_trees_options *o, struct cache_entry *ce,
>   
>   	ce->ce_flags = (ce->ce_flags & ~clear) | set;
>   	return add_index_entry(&o->result, ce,
> -			       o->extra_add_index_flags |
>   			       ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
>   }
>   
> @@ -353,9 +352,9 @@ static int check_updates(struct unpack_trees_options *o)
>   	struct progress *progress = NULL;
>   	struct index_state *index = &o->result;
>   	struct checkout state = CHECKOUT_INIT;
> -	uint64_t start = getnanotime();
>   	int i;
>   
> +	trace_performance_enter();
>   	state.force = 1;
>   	state.quiet = 1;
>   	state.refresh_cache = 1;
> @@ -425,7 +424,7 @@ static int check_updates(struct unpack_trees_options *o)
>   	errs |= finish_delayed_checkout(&state);
>   	if (o->update)
>   		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
> -	trace_performance_since(start, "update worktree after a merge");
> +	trace_performance_leave("check_updates");
>   	return errs != 0;
>   }
>   
> @@ -702,31 +701,13 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
>   	if (!o->merge)
>   		BUG("We need cache-tree to do this optimization");
>   
> -	/*
> -	 * Try to keep add_index_entry() as fast as possible since
> -	 * we're going to do a lot of them.
> -	 *
> -	 * Skipping verify_path() should totally be safe because these
> -	 * paths are from the source index, which must have been
> -	 * verified.
> -	 *
> -	 * Skipping D/F and cache-tree validation checks is trickier
> -	 * because it assumes what n-merge code would do when all
> -	 * trees and the index are the same. We probably could just
> -	 * optimize those code instead (e.g. we don't invalidate that
> -	 * many cache-tree, but the searching for them is very
> -	 * expensive).
> -	 */
> -	o->extra_add_index_flags = ADD_CACHE_SKIP_DFCHECK;
> -	o->extra_add_index_flags |= ADD_CACHE_SKIP_VERIFY_PATH;
> -
>   	/*
>   	 * Do what unpack_callback() and unpack_nondirectories() normally
>   	 * do. But we walk all paths recursively in just one loop instead.
>   	 *
> -	 * D/F conflicts and staged entries are not a concern because
> -	 * cache-tree would be invalidated and we would never get here
> -	 * in the first place.
> +	 * D/F conflicts and higher stage entries are not a concern
> +	 * because cache-tree would be invalidated and we would never
> +	 * get here in the first place.
>   	 */
>   	for (i = 0; i < nr_entries; i++) {
>   		int new_ce_len, len, rc;
> @@ -761,7 +742,6 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
>   
>   		mark_ce_used(src[0], o);
>   	}
> -	o->extra_add_index_flags = 0;
>   	free(tree_ce);
>   	if (o->debug_unpack)
>   		printf("Unpacked %d entries from %s to %s using cache-tree\n",
> @@ -791,7 +771,17 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
>   
>   		if (!o->merge || df_conflicts)
>   			BUG("Wrong condition to get here buddy");
> -		return traverse_by_cache_tree(pos, nr_entries, n, names, info);
> +
> +		/*
> +		 * All entries up to 'pos' must have been processed
> +		 * (i.e. marked CE_UNPACKED) at this point. But to be safe,
> +		 * save and restore cache_bottom anyway to not miss
> +		 * unprocessed entries before 'pos'.
> +		 */
> +		bottom = o->cache_bottom;
> +		ret = traverse_by_cache_tree(pos, nr_entries, n, names, info);
> +		o->cache_bottom = bottom;
> +		return ret;
>   	}
>   
>   	p = names;
> @@ -1142,7 +1132,7 @@ static void debug_unpack_callback(int n,
>   }
>   
>   /*
> - * Note that traverse_by_cache_tree() duplicates some logic in this funciton
> + * Note that traverse_by_cache_tree() duplicates some logic in this function
>    * without actually calling it. If you change the logic here you may need to
>    * check and change there as well.
>    */
> @@ -1425,11 +1415,11 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>   	int i, ret;
>   	static struct cache_entry *dfc;
>   	struct exclude_list el;
> -	uint64_t start = getnanotime();
>   
>   	if (len > MAX_UNPACK_TREES)
>   		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
>   
> +	trace_performance_enter();
>   	memset(&el, 0, sizeof(el));
>   	if (!core_apply_sparse_checkout || !o->update)
>   		o->skip_sparse_checkout = 1;
> @@ -1502,7 +1492,10 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>   			}
>   		}
>   
> -		if (traverse_trees(len, t, &info) < 0)
> +		trace_performance_enter();
> +		ret = traverse_trees(len, t, &info);
> +		trace_performance_leave("traverse_trees");
> +		if (ret < 0)
>   			goto return_failed;
>   	}
>   
> @@ -1574,10 +1567,10 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>   			goto done;
>   		}
>   	}
> -	trace_performance_since(start, "unpack trees");
>   
>   	ret = check_updates(o) ? (-2) : 0;
>   	if (o->dst_index) {
> +		move_index_extensions(&o->result, o->src_index);
>   		if (!ret) {
>   			if (!o->result.cache_tree)
>   				o->result.cache_tree = cache_tree();
> @@ -1586,7 +1579,6 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>   						  WRITE_TREE_SILENT |
>   						  WRITE_TREE_REPAIR);
>   		}
> -		move_index_extensions(&o->result, o->src_index);
>   		discard_index(o->dst_index);
>   		*o->dst_index = o->result;
>   	} else {
> @@ -1595,6 +1587,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>   	o->src_index = NULL;
>   
>   done:
> +	trace_performance_leave("unpack_trees");
>   	clear_exclude_list(&el);
>   	return ret;
>   
> diff --git a/unpack-trees.h b/unpack-trees.h
> index 94e1b14078..c2b434c606 100644
> --- a/unpack-trees.h
> +++ b/unpack-trees.h
> @@ -80,7 +80,6 @@ struct unpack_trees_options {
>   	struct index_state result;
>   
>   	struct exclude_list *el; /* for internal use */
> -	unsigned int extra_add_index_flags;
>   };
>   
>   extern int unpack_trees(unsigned n, struct tree_desc *t,
> 
> Nguyễn Thái Ngọc Duy (5):
>    trace.h: support nested performance tracing
>    unpack-trees: add performance tracing
>    unpack-trees: optimize walking same trees with cache-tree
>    unpack-trees: reduce malloc in cache-tree walk
>    unpack-trees: reuse (still valid) cache-tree from src_index
> 
>   cache-tree.c    |   2 +
>   diff-lib.c      |   4 +-
>   dir.c           |   4 +-
>   name-hash.c     |   4 +-
>   preload-index.c |   4 +-
>   read-cache.c    |  13 +++--
>   trace.c         |  69 ++++++++++++++++++++--
>   trace.h         |  15 +++++
>   unpack-trees.c  | 149 +++++++++++++++++++++++++++++++++++++++++++++++-
>   9 files changed, 243 insertions(+), 21 deletions(-)
> 

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 2/5] unpack-trees: add performance tracing
  2018-08-14 18:51                                         ` Duy Nguyen
@ 2018-08-14 19:54                                           ` Jeff King
  2018-08-14 20:52                                           ` Junio C Hamano
  1 sibling, 0 replies; 121+ messages in thread
From: Jeff King @ 2018-08-14 19:54 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Stefan Beller, Jeff Hostetler, Junio C Hamano, Ben Peart,
	Git Mailing List, Ben Peart, Elijah Newren

On Tue, Aug 14, 2018 at 08:51:41PM +0200, Duy Nguyen wrote:

> > But AFAICT these series are all about putting the sampling points into the
> > code base, so formatting would be orthogonal to it?
> 
> It's not just sampling points. There are things like the index id
> being shown in the message, for example. I prefer to keep a free-style
> format to help me read. There are also things like the indentation I
> do here to help me read. Granted, you could do all that with scripts
> and such, but will we be passing around dumps of JSON messages in
> mail, to be decoded locally?

I think you could have both forms using the same entry points sprinkled
through the code.

At GitHub we have a similar telemetry-ish thing, where we collect some
data points and then the resulting JSON is stored for every operation
(for a few weeks for read ops, and indefinitely attached to every ref
write).

And I've found that the storage and the trace-style "just show a
human-readable message to stderr" interface complement each other in
both directions:

 - you can output a human-readable message that is sent immediately to
   the trace mechanism but _also_ becomes part of the telemetry. E.g.,
   imagine that one item in the json blob is "this is the last message
   from GIT_TRACE_FOO". Now you can push tracing messages into whatever
   plan you're using to store SLOG. We do this less with TRACE, and much
   more with error() and die() messages.

 - when a structured telemetry item is updated, we can still output a
   human-readable trace message with just that item. E.g., with:

     trace_performance(n, "foo");

   we could either store a json key (perf.foo=n) or output a nicely
   formatted string like we do now, depending on what the user has
   configured (or even both, of course).

It helps if the sampling points give enough information to cover both
cases (as in the trace_performance example), but you can generally
shoe-horn unstructured data into the structured log, and pretty-print
structured data.
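
To make this concrete, here is a minimal, self-contained sketch (all
names are made up; this is neither git's nor GitHub's actual code) of a
single sampling point feeding both the human-readable trace stream and
a structured record that gets emitted at the end:

    #include <stdio.h>

    /* toy structured store: key/value pairs accumulated per command */
    static const char *slog_keys[16];
    static double slog_vals[16];
    static int slog_nr;

    /* one sampling point, two consumers */
    static void trace_perf(double seconds, const char *label)
    {
            /* human-readable, printed immediately to stderr */
            fprintf(stderr, "performance: %.9f s: %s\n", seconds, label);

            /* structured, kept for the telemetry record */
            slog_keys[slog_nr] = label;
            slog_vals[slog_nr] = seconds;
            slog_nr++;
    }

    int main(void)
    {
            int i;

            trace_perf(0.251992198, "traverse_trees");
            trace_perf(0.377031383, "check_updates");

            /* emit the accumulated record as JSON at exit */
            printf("{");
            for (i = 0; i < slog_nr; i++)
                    printf("%s\"perf.%s\":%.9f",
                           i ? "," : "", slog_keys[i], slog_vals[i]);
            printf("}\n");
            return 0;
    }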

-Peff

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 2/5] unpack-trees: add performance tracing
  2018-08-14 18:44                                       ` Stefan Beller
  2018-08-14 18:51                                         ` Duy Nguyen
@ 2018-08-14 20:14                                         ` Jeff Hostetler
  1 sibling, 0 replies; 121+ messages in thread
From: Jeff Hostetler @ 2018-08-14 20:14 UTC (permalink / raw)
  To: Stefan Beller, Duy Nguyen
  Cc: Junio C Hamano, Jeff King, Ben Peart, git, Ben Peart,
	Elijah Newren



On 8/14/2018 2:44 PM, Stefan Beller wrote:
> On Tue, Aug 14, 2018 at 11:32 AM Duy Nguyen <pclouds@gmail.com> wrote:
>>
>> On Tue, Aug 14, 2018 at 8:19 PM Jeff Hostetler <git@jeffhostetler.com> wrote:
>>> I'm looking at adding code to my SLOG (better name suggestions welcome)
>>> patch series to eventually replace the existing git_trace facility.
>>
>> Complement maybe. Replace, please no. I'd rather not stare at json messages.
> 
>  From the sidelines: We'd only need one logging infrastructure in place, as the
> formatting would be done as a later step? For local operations we'd certainly
> find better formatting than JSON, and we figured that we might end up
> desiring ProtocolBuffers[1] instead of JSON, so if it were easy to
> change the output of the structured logging, that would be great.
> 
> But AFAICT these series are all about putting the sampling points into the
> code base, so formatting would be orthogonal to it?
> 
> Stefan
> 
> [1] https://developers.google.com/protocol-buffers/
> 

Last time I checked, protocol-buffers has a C++ binding but not
a C binding.

I've not had a chance to use pbuffers, so I have to ask what advantages
they would have over JSON or some other similar self-describing format.
And/or would it be possible for you to tail the JSON log file and
convert it to whatever format you preferred?

It seems like the important thing is to capture structured data
(whatever the format) to disk first.

Jeff

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 2/5] unpack-trees: add performance tracing
  2018-08-14 18:51                                         ` Duy Nguyen
  2018-08-14 19:54                                           ` Jeff King
@ 2018-08-14 20:52                                           ` Junio C Hamano
  2018-08-15 16:32                                             ` Duy Nguyen
  1 sibling, 1 reply; 121+ messages in thread
From: Junio C Hamano @ 2018-08-14 20:52 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Stefan Beller, Jeff Hostetler, Jeff King, Ben Peart,
	Git Mailing List, Ben Peart, Elijah Newren

Duy Nguyen <pclouds@gmail.com> writes:

> These trace messages are made for human consumption. Granted,
> occasionally we need some processing, but I find one-liners mostly
> suffice. Now we would turn these into something made for machines,
> turning people into second-class citizens. I've read these messages
> reformatted for humans; they're usually too verbose even after
> reformatting.

Actually, I actively hate how much the slog API exposes the fact
that it wants to take and show JSON, but if you look at the
"jw_object_*()" functions as _only_ filling in parameters to be
emitted, there is no reason to think we cannot enhance/extend the
slog_emit_*() functions to take a format string (perhaps inside the
jw structure) so that the formatter does not have to generate JSON
at all.  Envisioning that kind of future, json_writer is a misnomer
that too narrowly defines what it is: it is merely a generic data
container that the codepath being traced can use to communicate what
needs to be logged to the outside world.  slog_emit_*() can (and when
enhanced, should) be capable of paying attention to an external input
(e.g. an environment variable) to switch the output format, and JSON
could be just one of the choices.

> It's not just sampling points. There are things like the index id
> being shown in the message, for example. I prefer to keep a free-style
> format to help me read. There are also things like the indentation I
> do here to help me read.

Yup, I do not think that contradicts the approach of having a
single unified "data collection" API; you should also be able to
specify how that collection of data is to be presented in the trace
messages meant for humans, which would be discarded when emitting
JSON but would be used when showing a human-readable trace, no?
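
As an illustration only (the names below are made up, not the actual
slog or json_writer API), the "generic data container plus switchable
emitter" idea could look like this:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* generic container: the traced codepath only fills this in */
    struct log_event {
            const char *key;        /* machine-readable name */
            double seconds;         /* measured value */
            const char *human;      /* how to present it to a person */
    };

    /* one emitter; the output format is chosen externally */
    static void emit(const struct log_event *ev)
    {
            const char *fmt = getenv("TRACE_FORMAT"); /* made-up knob */

            if (fmt && !strcmp(fmt, "json"))
                    printf("{\"%s\":%.9f}\n", ev->key, ev->seconds);
            else
                    fprintf(stderr, "performance: %.9f s: %s\n",
                            ev->seconds, ev->human);
    }

    int main(void)
    {
            struct log_event ev = {
                    "unpack_trees", 1.045887251, "unpack trees"
            };
            emit(&ev);
            return 0;
    }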

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 2/5] unpack-trees: add performance tracing
  2018-08-14 20:52                                           ` Junio C Hamano
@ 2018-08-15 16:32                                             ` Duy Nguyen
  2018-08-15 18:28                                               ` Junio C Hamano
  0 siblings, 1 reply; 121+ messages in thread
From: Duy Nguyen @ 2018-08-15 16:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, Jeff Hostetler, Jeff King, Ben Peart,
	Git Mailing List, Ben Peart, Elijah Newren

On Tue, Aug 14, 2018 at 10:52 PM Junio C Hamano <gitster@pobox.com> wrote:
> > It's not just sampling points. There are things like the index id
> > being shown in the message, for example. I prefer to keep a free-style
> > format to help me read. There are also things like the indentation I
> > do here to help me read.
>
> Yup, I do not think that contradicts the approach of having a
> single unified "data collection" API; you should also be able to
> specify how that collection of data is to be presented in the trace
> messages meant for humans, which would be discarded when emitting
> JSON but would be used when showing a human-readable trace, no?

Yes. As Peff also pointed out in another mail, as long as this
structured logging stuff does not stop me from writing manual trace
messages and doesn't force more work on me when I add new traces, I
don't care if it exists.
-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 3/5] unpack-trees: optimize walking same trees with cache-tree
  2018-08-13 18:58                               ` Ben Peart
@ 2018-08-15 16:38                                 ` Duy Nguyen
  0 siblings, 0 replies; 121+ messages in thread
From: Duy Nguyen @ 2018-08-15 16:38 UTC (permalink / raw)
  To: Ben Peart
  Cc: Ben Peart, Git Mailing List, Junio C Hamano, Jeff King,
	Elijah Newren

On Mon, Aug 13, 2018 at 8:58 PM Ben Peart <peartben@gmail.com> wrote:
> > +      *
> > +      * D/F conflicts and higher stage entries are not a concern
> > +      * because cache-tree would be invalidated and we would never
> > +      * get here in the first place.
> > +      */
> > +     for (i = 0; i < nr_entries; i++) {
> > +             struct cache_entry *tree_ce;
> > +             int len, rc;
> > +
> > +             src[0] = o->src_index->cache[pos + i];
> > +
> > +             len = ce_namelen(src[0]);
> > +             tree_ce = xcalloc(1, cache_entry_size(len));
> > +
> > +             tree_ce->ce_mode = src[0]->ce_mode;
> > +             tree_ce->ce_flags = create_ce_flags(0);
> > +             tree_ce->ce_namelen = len;
> > +             oidcpy(&tree_ce->oid, &src[0]->oid);
> > +             memcpy(tree_ce->name, src[0]->name, len + 1);
> > +
> > +             for (d = 1; d <= nr_names; d++)
> > +                     src[d] = tree_ce;
> > +
> > +             rc = call_unpack_fn((const struct cache_entry * const *)src, o);
>
> I don't fully understand why this is still necessary since "we detect
> that all trees are the same as cache-tree at this path."  I do know
> (because I tried it :)) that if we don't actually call the unpack
> function, the patch fails a bunch of tests, so clearly something
> important is being missed.

Yeah, because removing this line assumes n-way logic, which most
likely means "use the index version if all trees are the same as the
index", but that's not necessarily true. There could be flags that
make the n-way merge behave differently. And even if we make that
assumption, we still need to copy src[0] to o->result (heh, I tried
that "skip call_unpack_fn" thing too when I thought this would be the
same as the diff-index --cached optimization path, and only realized
afterwards that copying to o->result was needed).
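
A toy model (this is not git's code; the structures are invented for
illustration) of why the entries must still be fed through the unpack
callback even when nothing changed:

    #include <stdio.h>

    struct entry { const char *name; };

    struct result_index {
            struct entry e[8];
            int nr;
    };

    /* stands in for call_unpack_fn(): the merge builds a *new* index,
     * so even an "unchanged" entry must be copied into o->result */
    static void unpack_fn(const struct entry *src, struct result_index *res)
    {
            res->e[res->nr++] = *src;
    }

    int main(void)
    {
            struct entry src_index[] = { { "Makefile" }, { "cache.h" } };
            struct result_index result = { .nr = 0 };
            int i;

            /* skipping these calls would leave the result index empty */
            for (i = 0; i < 2; i++)
                    unpack_fn(&src_index[i], &result);

            for (i = 0; i < result.nr; i++)
                    printf("%s\n", result.e[i].name);
            return 0;
    }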
-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v4 2/5] unpack-trees: add performance tracing
  2018-08-15 16:32                                             ` Duy Nguyen
@ 2018-08-15 18:28                                               ` Junio C Hamano
  0 siblings, 0 replies; 121+ messages in thread
From: Junio C Hamano @ 2018-08-15 18:28 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Stefan Beller, Jeff Hostetler, Jeff King, Ben Peart,
	Git Mailing List, Ben Peart, Elijah Newren

Duy Nguyen <pclouds@gmail.com> writes:

> On Tue, Aug 14, 2018 at 10:52 PM Junio C Hamano <gitster@pobox.com> wrote:
>> > It's not just sampling points. There are things like the index id
>> > being shown in the message, for example. I prefer to keep a free-style
>> > format to help me read. There are also things like the indentation I
>> > do here to help me read.
>>
>> Yup, I do not think that contradicts the approach of having a
>> single unified "data collection" API; you should also be able to
>> specify how that collection of data is to be presented in the trace
>> messages meant for humans, which would be discarded when emitting
>> JSON but would be used when showing a human-readable trace, no?
>
> Yes. As Peff also pointed out in another mail, as long as this
> structured logging stuff does not stop me from writing manual trace
> messages and doesn't force more work on me when I add new traces, I
> don't care if it exists.

I am hoping that we are on the same page, but just to make sure:
what I think we would want is to have just a single set of
annotations in the codepath, instead of "we can add annotations from
these two separate sets, and they do not interfere with each other,
so I do not care about what the other guy is doing".

IOW, I found it highly annoying having to resolve merges like
7234f27b ("Merge branch 'nd/unpack-trees-with-cache-tree' into pu",
2018-08-14), taking two topics that try to use different tracing
mechanisms in the same codepath.

^ permalink raw reply	[flat|nested] 121+ messages in thread

* [PATCH v5 0/7] Speed up unpack_trees()
  2018-08-12  8:15                           ` [PATCH v4 0/5] " Nguyễn Thái Ngọc Duy
                                               ` (6 preceding siblings ...)
  2018-08-14 19:19                             ` Ben Peart
@ 2018-08-18 14:41                             ` Nguyễn Thái Ngọc Duy
  2018-08-18 14:41                               ` [PATCH v5 1/7] trace.h: support nested performance tracing Nguyễn Thái Ngọc Duy
                                                 ` (8 more replies)
  7 siblings, 9 replies; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-18 14:41 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, newren, peartben, peff

v5 addresses some minor comments from round 4 and a big mistake in 5/5.
Junio's scary feeling turned out to be true. There was a missing
invalidation in keep_entry(), which is now added in 6/7. 7/7 makes
sure that similar problems will not slip through.

I had to rebase this series on top of 'master' because 7/7 caught a
bad cache-tree situation that has been fixed by Elijah in ad3762042a
(read-cache: fix directory/file conflict handling in
read_index_unmerged() - 2018-07-31). I believe the issue was that we
prime the cache-tree in 'git reset --hard' even though the index has
conflicts.

Range-diff (before the rebase):

1:  a192faf79e ! 1:  ed8763726b trace.h: support nested performance tracing
    @@ -49,13 +49,16 @@
      	struct untracked_cache_dir *untracked;
     -	uint64_t start = getnanotime();
      
    - 	if (has_symlink_leading_path(path, len))
    +-	if (has_symlink_leading_path(path, len))
    ++	trace_performance_enter();
    ++
    ++	if (has_symlink_leading_path(path, len)) {
    ++		trace_performance_leave("read directory %.*s", len, path);
      		return dir->nr;
    ++	}
      
    -+	trace_performance_enter();
      	untracked = validate_untracked_cache(dir, len, pathspec);
      	if (!untracked)
    - 		/*
     @@
      		dir->nr = i;
      	}
2:  9afe7c488a = 2:  9b70652fa2 unpack-trees: add performance tracing
3:  74101edb60 ! 3:  8b3cfea623 unpack-trees: optimize walking same trees with cache-tree
    @@ -141,7 +141,7 @@
     +
     +	/*
     +	 * Do what unpack_callback() and unpack_nondirectories() normally
    -+	 * do. But we walk all paths recursively in just one loop instead.
    ++	 * do. But we walk all paths in an iterative loop instead.
     +	 *
     +	 * D/F conflicts and higher stage entries are not a concern
     +	 * because cache-tree would be invalidated and we would never
4:  9261c5920e = 4:  5af28d44ca unpack-trees: reduce malloc in cache-tree walk
5:  43fac1154f = 5:  5657c92fe9 unpack-trees: reuse (still valid) cache-tree from src_index
-:  ---------- > 6:  3b91783afc unpack-trees: add missing cache invalidation
-:  ---------- > 7:  0d5464c0dc cache-tree: verify valid cache-tree in the test suite

Nguyễn Thái Ngọc Duy (7):
  trace.h: support nested performance tracing
  unpack-trees: add performance tracing
  unpack-trees: optimize walking same trees with cache-tree
  unpack-trees: reduce malloc in cache-tree walk
  unpack-trees: reuse (still valid) cache-tree from src_index
  unpack-trees: add missing cache invalidation
  cache-tree: verify valid cache-tree in the test suite

 cache-tree.c    |  80 +++++++++++++++++++++++++
 cache-tree.h    |   1 +
 diff-lib.c      |   4 +-
 dir.c           |   9 ++-
 name-hash.c     |   4 +-
 preload-index.c |   4 +-
 read-cache.c    |  16 +++--
 t/test-lib.sh   |   6 ++
 trace.c         |  69 ++++++++++++++++++++--
 trace.h         |  15 +++++
 unpack-trees.c  | 154 +++++++++++++++++++++++++++++++++++++++++++++++-
 11 files changed, 340 insertions(+), 22 deletions(-)

-- 
2.18.0.1004.g6639190530


^ permalink raw reply	[flat|nested] 121+ messages in thread

* [PATCH v5 1/7] trace.h: support nested performance tracing
  2018-08-18 14:41                             ` [PATCH v5 0/7] " Nguyễn Thái Ngọc Duy
@ 2018-08-18 14:41                               ` Nguyễn Thái Ngọc Duy
  2018-08-18 14:41                               ` [PATCH v5 2/7] unpack-trees: add " Nguyễn Thái Ngọc Duy
                                                 ` (7 subsequent siblings)
  8 siblings, 0 replies; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-18 14:41 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, newren, peartben, peff

Performance measurements are currently listed as a flat list, which is
fine when we measure big blocks. But when we start adding more and
more measurements, some of them may just be part of a bigger
measurement, and a flat list gives the wrong impression that they are
executed at the same level instead of being nested.

Add trace_performance_enter() and trace_performance_leave() to allow
indenting these nested measurements. For now this does not help much,
because the only nested thing is (lazy) name hash initialization
(e.g. called in diff-index from "git status"). It will help more later,
because I'm going to add some more tracing that's actually nested.
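
To illustrate the intended calling pattern (the function and label
below are placeholders, not code from this patch):

    static void outer_operation(void)
    {
            trace_performance_enter();
            inner_operation();      /* may itself enter/leave */
            trace_performance_leave("outer operation");
    }

With GIT_TRACE_PERFORMANCE enabled, the time for "outer operation" is
printed when leaving, and any measurement made by inner_operation()
is shown indented one extra level, making the nesting visible.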

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff-lib.c      |  4 +--
 dir.c           |  9 ++++---
 name-hash.c     |  4 +--
 preload-index.c |  4 +--
 read-cache.c    | 11 ++++----
 trace.c         | 69 ++++++++++++++++++++++++++++++++++++++++++++-----
 trace.h         | 15 +++++++++++
 7 files changed, 96 insertions(+), 20 deletions(-)

diff --git a/diff-lib.c b/diff-lib.c
index 732f684a49..d5bbb7ea50 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -518,11 +518,11 @@ static int diff_cache(struct rev_info *revs,
 int run_diff_index(struct rev_info *revs, int cached)
 {
 	struct object_array_entry *ent;
-	uint64_t start = getnanotime();
 
 	if (revs->pending.nr != 1)
 		BUG("run_diff_index must be passed exactly one tree");
 
+	trace_performance_enter();
 	ent = revs->pending.objects;
 	if (diff_cache(revs, &ent->item->oid, ent->name, cached))
 		exit(128);
@@ -531,7 +531,7 @@ int run_diff_index(struct rev_info *revs, int cached)
 	diffcore_fix_diff_index(&revs->diffopt);
 	diffcore_std(&revs->diffopt);
 	diff_flush(&revs->diffopt);
-	trace_performance_since(start, "diff-index");
+	trace_performance_leave("diff-index");
 	return 0;
 }
 
diff --git a/dir.c b/dir.c
index 32f5f72759..18b57b94cc 100644
--- a/dir.c
+++ b/dir.c
@@ -2263,10 +2263,13 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		   const char *path, int len, const struct pathspec *pathspec)
 {
 	struct untracked_cache_dir *untracked;
-	uint64_t start = getnanotime();
 
-	if (has_symlink_leading_path(path, len))
+	trace_performance_enter();
+
+	if (has_symlink_leading_path(path, len)) {
+		trace_performance_leave("read directory %.*s", len, path);
 		return dir->nr;
+	}
 
 	untracked = validate_untracked_cache(dir, len, pathspec);
 	if (!untracked)
@@ -2302,7 +2305,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		dir->nr = i;
 	}
 
-	trace_performance_since(start, "read directory %.*s", len, path);
+	trace_performance_leave("read directory %.*s", len, path);
 	if (dir->untracked) {
 		static int force_untracked_cache = -1;
 		static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS);
diff --git a/name-hash.c b/name-hash.c
index 163849831c..1fcda73cb3 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -578,10 +578,10 @@ static void threaded_lazy_init_name_hash(
 
 static void lazy_init_name_hash(struct index_state *istate)
 {
-	uint64_t start = getnanotime();
 
 	if (istate->name_hash_initialized)
 		return;
+	trace_performance_enter();
 	hashmap_init(&istate->name_hash, cache_entry_cmp, NULL, istate->cache_nr);
 	hashmap_init(&istate->dir_hash, dir_entry_cmp, NULL, istate->cache_nr);
 
@@ -602,7 +602,7 @@ static void lazy_init_name_hash(struct index_state *istate)
 	}
 
 	istate->name_hash_initialized = 1;
-	trace_performance_since(start, "initialize name hash");
+	trace_performance_leave("initialize name hash");
 }
 
 /*
diff --git a/preload-index.c b/preload-index.c
index 4d08d44874..d7f7919ba2 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -78,7 +78,6 @@ static void preload_index(struct index_state *index,
 {
 	int threads, i, work, offset;
 	struct thread_data data[MAX_PARALLEL];
-	uint64_t start = getnanotime();
 
 	if (!core_preload_index)
 		return;
@@ -88,6 +87,7 @@ static void preload_index(struct index_state *index,
 		threads = 2;
 	if (threads < 2)
 		return;
+	trace_performance_enter();
 	if (threads > MAX_PARALLEL)
 		threads = MAX_PARALLEL;
 	offset = 0;
@@ -109,7 +109,7 @@ static void preload_index(struct index_state *index,
 		if (pthread_join(p->pthread, NULL))
 			die("unable to join threaded lstat");
 	}
-	trace_performance_since(start, "preload index");
+	trace_performance_leave("preload index");
 }
 #endif
 
diff --git a/read-cache.c b/read-cache.c
index c5fabc844a..1c9c88c130 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1476,8 +1476,8 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	const char *typechange_fmt;
 	const char *added_fmt;
 	const char *unmerged_fmt;
-	uint64_t start = getnanotime();
 
+	trace_performance_enter();
 	modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n");
 	deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n");
 	typechange_fmt = (in_porcelain ? "T\t%s\n" : "%s needs update\n");
@@ -1547,7 +1547,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 
 		replace_index_entry(istate, i, new_entry);
 	}
-	trace_performance_since(start, "refresh index");
+	trace_performance_leave("refresh index");
 	return has_errors;
 }
 
@@ -2002,7 +2002,6 @@ static void freshen_shared_index(const char *shared_index, int warn)
 int read_index_from(struct index_state *istate, const char *path,
 		    const char *gitdir)
 {
-	uint64_t start = getnanotime();
 	struct split_index *split_index;
 	int ret;
 	char *base_oid_hex;
@@ -2012,8 +2011,9 @@ int read_index_from(struct index_state *istate, const char *path,
 	if (istate->initialized)
 		return istate->cache_nr;
 
+	trace_performance_enter();
 	ret = do_read_index(istate, path, 0);
-	trace_performance_since(start, "read cache %s", path);
+	trace_performance_leave("read cache %s", path);
 
 	split_index = istate->split_index;
 	if (!split_index || is_null_oid(&split_index->base_oid)) {
@@ -2021,6 +2021,7 @@ int read_index_from(struct index_state *istate, const char *path,
 		return ret;
 	}
 
+	trace_performance_enter();
 	if (split_index->base)
 		discard_index(split_index->base);
 	else
@@ -2037,8 +2038,8 @@ int read_index_from(struct index_state *istate, const char *path,
 	freshen_shared_index(base_path, 0);
 	merge_base_index(istate);
 	post_read_index_from(istate);
-	trace_performance_since(start, "read cache %s", base_path);
 	free(base_path);
+	trace_performance_leave("read cache %s", base_path);
 	return ret;
 }
 
diff --git a/trace.c b/trace.c
index fc623e91fd..fa4a2e7120 100644
--- a/trace.c
+++ b/trace.c
@@ -176,10 +176,30 @@ void trace_strbuf_fl(const char *file, int line, struct trace_key *key,
 	strbuf_release(&buf);
 }
 
+static uint64_t perf_start_times[10];
+static int perf_indent;
+
+uint64_t trace_performance_enter(void)
+{
+	uint64_t now;
+
+	if (!trace_want(&trace_perf_key))
+		return 0;
+
+	now = getnanotime();
+	perf_start_times[perf_indent] = now;
+	if (perf_indent + 1 < ARRAY_SIZE(perf_start_times))
+		perf_indent++;
+	else
+		BUG("Too deep indentation");
+	return now;
+}
+
 static void trace_performance_vprintf_fl(const char *file, int line,
 					 uint64_t nanos, const char *format,
 					 va_list ap)
 {
+	static const char space[] = "          ";
 	struct strbuf buf = STRBUF_INIT;
 
 	if (!prepare_trace_line(file, line, &trace_perf_key, &buf))
@@ -188,7 +208,10 @@ static void trace_performance_vprintf_fl(const char *file, int line,
 	strbuf_addf(&buf, "performance: %.9f s", (double) nanos / 1000000000);
 
 	if (format && *format) {
-		strbuf_addstr(&buf, ": ");
+		if (perf_indent >= strlen(space))
+			BUG("Too deep indentation");
+
+		strbuf_addf(&buf, ":%.*s ", perf_indent, space);
 		strbuf_vaddf(&buf, format, ap);
 	}
 
@@ -244,6 +267,24 @@ void trace_performance_since(uint64_t start, const char *format, ...)
 	va_end(ap);
 }
 
+void trace_performance_leave(const char *format, ...)
+{
+	va_list ap;
+	uint64_t since;
+
+	if (perf_indent)
+		perf_indent--;
+
+	if (!format) /* Allow callers to leave without tracing anything */
+		return;
+
+	since = perf_start_times[perf_indent];
+	va_start(ap, format);
+	trace_performance_vprintf_fl(NULL, 0, getnanotime() - since,
+				     format, ap);
+	va_end(ap);
+}
+
 #else
 
 void trace_printf_key_fl(const char *file, int line, struct trace_key *key,
@@ -273,6 +314,24 @@ void trace_performance_fl(const char *file, int line, uint64_t nanos,
 	va_end(ap);
 }
 
+void trace_performance_leave_fl(const char *file, int line,
+				uint64_t nanos, const char *format, ...)
+{
+	va_list ap;
+	uint64_t since;
+
+	if (perf_indent)
+		perf_indent--;
+
+	if (!format) /* Allow callers to leave without tracing anything */
+		return;
+
+	since = perf_start_times[perf_indent];
+	va_start(ap, format);
+	trace_performance_vprintf_fl(file, line, nanos - since, format, ap);
+	va_end(ap);
+}
+
 #endif /* HAVE_VARIADIC_MACROS */
 
 
@@ -411,13 +470,11 @@ uint64_t getnanotime(void)
 	}
 }
 
-static uint64_t command_start_time;
 static struct strbuf command_line = STRBUF_INIT;
 
 static void print_command_performance_atexit(void)
 {
-	trace_performance_since(command_start_time, "git command:%s",
-				command_line.buf);
+	trace_performance_leave("git command:%s", command_line.buf);
 }
 
 void trace_command_performance(const char **argv)
@@ -425,10 +482,10 @@ void trace_command_performance(const char **argv)
 	if (!trace_want(&trace_perf_key))
 		return;
 
-	if (!command_start_time)
+	if (!command_line.len)
 		atexit(print_command_performance_atexit);
 
 	strbuf_reset(&command_line);
 	sq_quote_argv_pretty(&command_line, argv);
-	command_start_time = getnanotime();
+	trace_performance_enter();
 }
diff --git a/trace.h b/trace.h
index 2b6a1bc17c..171b256d26 100644
--- a/trace.h
+++ b/trace.h
@@ -23,6 +23,7 @@ extern void trace_disable(struct trace_key *key);
 extern uint64_t getnanotime(void);
 extern void trace_command_performance(const char **argv);
 extern void trace_verbatim(struct trace_key *key, const void *buf, unsigned len);
+uint64_t trace_performance_enter(void);
 
 #ifndef HAVE_VARIADIC_MACROS
 
@@ -45,6 +46,9 @@ extern void trace_performance(uint64_t nanos, const char *format, ...);
 __attribute__((format (printf, 2, 3)))
 extern void trace_performance_since(uint64_t start, const char *format, ...);
 
+__attribute__((format (printf, 1, 2)))
+void trace_performance_leave(const char *format, ...);
+
 #else
 
 /*
@@ -118,6 +122,14 @@ extern void trace_performance_since(uint64_t start, const char *format, ...);
 					     __VA_ARGS__);		    \
 	} while (0)
 
+#define trace_performance_leave(...)					    \
+	do {								    \
+		if (trace_pass_fl(&trace_perf_key))			    \
+			trace_performance_leave_fl(TRACE_CONTEXT, __LINE__, \
+						   getnanotime(),	    \
+						   __VA_ARGS__);	    \
+	} while (0)
+
 /* backend functions, use non-*fl macros instead */
 __attribute__((format (printf, 4, 5)))
 extern void trace_printf_key_fl(const char *file, int line, struct trace_key *key,
@@ -130,6 +142,9 @@ extern void trace_strbuf_fl(const char *file, int line, struct trace_key *key,
 __attribute__((format (printf, 4, 5)))
 extern void trace_performance_fl(const char *file, int line,
 				 uint64_t nanos, const char *fmt, ...);
+__attribute__((format (printf, 4, 5)))
+extern void trace_performance_leave_fl(const char *file, int line,
+				       uint64_t nanos, const char *fmt, ...);
 static inline int trace_pass_fl(struct trace_key *key)
 {
 	return key->fd || !key->initialized;
-- 
2.18.0.1004.g6639190530


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH v5 2/7] unpack-trees: add performance tracing
  2018-08-18 14:41                             ` [PATCH v5 0/7] " Nguyễn Thái Ngọc Duy
  2018-08-18 14:41                               ` [PATCH v5 1/7] trace.h: support nested performance tracing Nguyễn Thái Ngọc Duy
@ 2018-08-18 14:41                               ` Nguyễn Thái Ngọc Duy
  2018-08-18 14:41                               ` [PATCH v5 3/7] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
                                                 ` (6 subsequent siblings)
  8 siblings, 0 replies; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-18 14:41 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, newren, peartben, peff

We're going to optimize unpack_trees() a bit in the following
patches. Let's add some tracing to measure how long it takes before
and after. This is the baseline ("git checkout -" on webkit.git, 275k
files in the worktree):

    performance: 0.056651714 s:  read cache .git/index
    performance: 0.183101080 s:  preload index
    performance: 0.008584433 s:  refresh index
    performance: 0.633767589 s:   traverse_trees
    performance: 0.340265448 s:   check_updates
    performance: 0.381884638 s:   cache_tree_update
    performance: 1.401562947 s:  unpack_trees
    performance: 0.338687914 s:  write index, changed mask = 2e
    performance: 0.411927922 s:    traverse_trees
    performance: 0.000023335 s:    check_updates
    performance: 0.423697246 s:   unpack_trees
    performance: 0.423708360 s:  diff-index
    performance: 2.559524127 s: git command: git checkout -

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 cache-tree.c   | 2 ++
 unpack-trees.c | 9 ++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/cache-tree.c b/cache-tree.c
index 181d5919f0..caafbff2ff 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -433,7 +433,9 @@ int cache_tree_update(struct index_state *istate, int flags)
 
 	if (i)
 		return i;
+	trace_performance_enter();
 	i = update_one(it, cache, entries, "", 0, &skip, flags);
+	trace_performance_leave("cache_tree_update");
 	if (i < 0)
 		return i;
 	istate->cache_changed |= CACHE_TREE_CHANGED;
diff --git a/unpack-trees.c b/unpack-trees.c
index f9efee0836..6d9f692ea6 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -345,6 +345,7 @@ static int check_updates(struct unpack_trees_options *o)
 	struct checkout state = CHECKOUT_INIT;
 	int i;
 
+	trace_performance_enter();
 	state.force = 1;
 	state.quiet = 1;
 	state.refresh_cache = 1;
@@ -414,6 +415,7 @@ static int check_updates(struct unpack_trees_options *o)
 	errs |= finish_delayed_checkout(&state);
 	if (o->update)
 		git_attr_set_direction(GIT_ATTR_CHECKIN, NULL);
+	trace_performance_leave("check_updates");
 	return errs != 0;
 }
 
@@ -1285,6 +1287,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	if (len > MAX_UNPACK_TREES)
 		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
 
+	trace_performance_enter();
 	memset(&el, 0, sizeof(el));
 	if (!core_apply_sparse_checkout || !o->update)
 		o->skip_sparse_checkout = 1;
@@ -1357,7 +1360,10 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 			}
 		}
 
-		if (traverse_trees(len, t, &info) < 0)
+		trace_performance_enter();
+		ret = traverse_trees(len, t, &info);
+		trace_performance_leave("traverse_trees");
+		if (ret < 0)
 			goto return_failed;
 	}
 
@@ -1449,6 +1455,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	o->src_index = NULL;
 
 done:
+	trace_performance_leave("unpack_trees");
 	clear_exclude_list(&el);
 	return ret;
 
-- 
2.18.0.1004.g6639190530


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH v5 3/7] unpack-trees: optimize walking same trees with cache-tree
  2018-08-18 14:41                             ` [PATCH v5 0/7] " Nguyễn Thái Ngọc Duy
  2018-08-18 14:41                               ` [PATCH v5 1/7] trace.h: support nested performance tracing Nguyễn Thái Ngọc Duy
  2018-08-18 14:41                               ` [PATCH v5 2/7] unpack-trees: add " Nguyễn Thái Ngọc Duy
@ 2018-08-18 14:41                               ` Nguyễn Thái Ngọc Duy
  2018-08-20 12:43                                 ` Ben Peart
  2018-08-18 14:41                               ` [PATCH v5 4/7] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
                                                 ` (5 subsequent siblings)
  8 siblings, 1 reply; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-18 14:41 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, newren, peartben, peff

In order to merge one or many trees with the index, the unpack-trees
code walks multiple trees in parallel with the index and performs an
n-way merge. If we find out at the start of a directory that all trees
are the same (by comparing OIDs) and a cache-tree happens to be
available for that directory as well, we can avoid walking the trees
because we already know what these trees contain: it's flattened in
what's called "the index".

The upside is of course a lot less I/O since we can potentially skip
lots of trees (think subtrees). We also save CPU because we don't have
to inflate and apply the deltas. The downside is of course more
fragile code, since the logic in some functions is now duplicated
elsewhere.

"checkout -" with this patch on webkit.git (275k files):

    baseline      new
  --------------------------------------------------------------------
    0.056651714   0.080394752 s:  read cache .git/index
    0.183101080   0.216010838 s:  preload index
    0.008584433   0.008534301 s:  refresh index
    0.633767589   0.251992198 s:   traverse_trees
    0.340265448   0.377031383 s:   check_updates
    0.381884638   0.372768105 s:   cache_tree_update
    1.401562947   1.045887251 s:  unpack_trees
    0.338687914   0.314983512 s:  write index, changed mask = 2e
    0.411927922   0.062572653 s:    traverse_trees
    0.000023335   0.000022544 s:    check_updates
    0.423697246   0.073795585 s:   unpack_trees
    0.423708360   0.073807557 s:  diff-index
    2.559524127   1.938191592 s: git command: git checkout -

Another measurement from Ben's running "git checkout" with over 500k
trees (on the whole series):

    baseline        new
  ----------------------------------------------------------------------
    0.535510167     0.556558733     s: read cache .git/index
    0.3057373       0.3147105       s: initialize name hash
    0.0184082       0.023558433     s: preload index
    0.086910967     0.089085967     s: refresh index
    7.889590767     2.191554433     s: unpack trees
    0.120760833     0.131941267     s: update worktree after a merge
    2.2583504       2.572663167     s: repair cache-tree
    0.8916137       0.959495233     s: write index, changed mask = 28
    3.405199233     0.2710663       s: unpack trees
    0.000999667     0.0021554       s: update worktree after a merge
    3.4063306       0.273318333     s: diff-index
    16.9524923      9.462943133     s: git command: git.exe checkout

This command calls unpack_trees() twice, the first time for a 2-way
merge and the second for a 1-way merge. Both times, "unpack trees" time is
reduced to one third. Overall time reduction is not that impressive of
course because index operations take a big chunk. And there's that
repair cache-tree line.

PS. A note about cache-tree invalidation and the use of it in this
code.

We do invalidate cache-tree in _source_ index when we add new entries
to the (temporary) "result" index. But we also use the cache-tree from
source index in this optimization. Does this mean we end up having no
cache-tree in the source index to activate this optimization?

The answer is twisted: the order of finding a good cache-tree and
invalidating it matters. In this case we check for a good cache-tree
first in all_trees_same_as_cache_tree(), then we start to merge things
and potentially invalidate that same cache-tree in the process. Since
cache-tree invalidation happens after the optimization kicks in, we're
still good. But we may lose that cache-tree at the very first
call_unpack_fn() call in traverse_by_cache_tree().
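
In code terms, the sequence in traverse_trees_recursive() below is
roughly this (a simplified sketch, not the verbatim code; 'pos' comes
from index_pos_by_traverse_info()):

    /* (1) consult the cache-tree first, while it is still valid... */
    nr_entries = all_trees_same_as_cache_tree(n, dirmask, names, info);
    if (nr_entries > 0)
            /*
             * (2) ...then unpack straight from the index. The
             * call_unpack_fn() calls in there may invalidate that same
             * cache-tree, but we are already done with it.
             */
            return traverse_by_cache_tree(pos, nr_entries, n, names, info);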

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 unpack-trees.c | 127 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 127 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index 6d9f692ea6..8376663b59 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -635,6 +635,102 @@ static inline int are_same_oid(struct name_entry *name_j, struct name_entry *nam
 	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
 }
 
+static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,
+					struct name_entry *names,
+					struct traverse_info *info)
+{
+	struct unpack_trees_options *o = info->data;
+	int i;
+
+	if (!o->merge || dirmask != ((1 << n) - 1))
+		return 0;
+
+	for (i = 1; i < n; i++)
+		if (!are_same_oid(names, names + i))
+			return 0;
+
+	return cache_tree_matches_traversal(o->src_index->cache_tree, names, info);
+}
+
+static int index_pos_by_traverse_info(struct name_entry *names,
+				      struct traverse_info *info)
+{
+	struct unpack_trees_options *o = info->data;
+	int len = traverse_path_len(info, names);
+	char *name = xmalloc(len + 1 /* slash */ + 1 /* NUL */);
+	int pos;
+
+	make_traverse_path(name, info, names);
+	name[len++] = '/';
+	name[len] = '\0';
+	pos = index_name_pos(o->src_index, name, len);
+	if (pos >= 0)
+		BUG("This is a directory and should not exist in index");
+	pos = -pos - 1;
+	if (!starts_with(o->src_index->cache[pos]->name, name) ||
+	    (pos > 0 && starts_with(o->src_index->cache[pos-1]->name, name)))
+		BUG("pos must point at the first entry in this directory");
+	free(name);
+	return pos;
+}
+
+/*
+ * Fast path if we detect that all trees are the same as cache-tree at this
+ * path. We'll walk these trees recursively using cache-tree/index instead of
+ * ODB since already know what these trees contain.
+ */
+static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
+				  struct name_entry *names,
+				  struct traverse_info *info)
+{
+	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
+	struct unpack_trees_options *o = info->data;
+	int i, d;
+
+	if (!o->merge)
+		BUG("We need cache-tree to do this optimization");
+
+	/*
+	 * Do what unpack_callback() and unpack_nondirectories() normally
+	 * do. But we walk all paths in an iterative loop instead.
+	 *
+	 * D/F conflicts and higher stage entries are not a concern
+	 * because cache-tree would be invalidated and we would never
+	 * get here in the first place.
+	 */
+	for (i = 0; i < nr_entries; i++) {
+		struct cache_entry *tree_ce;
+		int len, rc;
+
+		src[0] = o->src_index->cache[pos + i];
+
+		len = ce_namelen(src[0]);
+		tree_ce = xcalloc(1, cache_entry_size(len));
+
+		tree_ce->ce_mode = src[0]->ce_mode;
+		tree_ce->ce_flags = create_ce_flags(0);
+		tree_ce->ce_namelen = len;
+		oidcpy(&tree_ce->oid, &src[0]->oid);
+		memcpy(tree_ce->name, src[0]->name, len + 1);
+
+		for (d = 1; d <= nr_names; d++)
+			src[d] = tree_ce;
+
+		rc = call_unpack_fn((const struct cache_entry * const *)src, o);
+		free(tree_ce);
+		if (rc < 0)
+			return rc;
+
+		mark_ce_used(src[0], o);
+	}
+	if (o->debug_unpack)
+		printf("Unpacked %d entries from %s to %s using cache-tree\n",
+		       nr_entries,
+		       o->src_index->cache[pos]->name,
+		       o->src_index->cache[pos + nr_entries - 1]->name);
+	return 0;
+}
+
 static int traverse_trees_recursive(int n, unsigned long dirmask,
 				    unsigned long df_conflicts,
 				    struct name_entry *names,
@@ -646,6 +742,27 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
 	void *buf[MAX_UNPACK_TREES];
 	struct traverse_info newinfo;
 	struct name_entry *p;
+	int nr_entries;
+
+	nr_entries = all_trees_same_as_cache_tree(n, dirmask, names, info);
+	if (nr_entries > 0) {
+		struct unpack_trees_options *o = info->data;
+		int pos = index_pos_by_traverse_info(names, info);
+
+		if (!o->merge || df_conflicts)
+			BUG("Wrong condition to get here buddy");
+
+		/*
+		 * All entries up to 'pos' must have been processed
+		 * (i.e. marked CE_UNPACKED) at this point. But to be safe,
+		 * save and restore cache_bottom anyway to not miss
+		 * unprocessed entries before 'pos'.
+		 */
+		bottom = o->cache_bottom;
+		ret = traverse_by_cache_tree(pos, nr_entries, n, names, info);
+		o->cache_bottom = bottom;
+		return ret;
+	}
 
 	p = names;
 	while (!p->mode)
@@ -812,6 +929,11 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	return ce;
 }
 
+/*
+ * Note that traverse_by_cache_tree() duplicates some logic in this function
+ * without actually calling it. If you change the logic here you may need to
+ * check and change there as well.
+ */
 static int unpack_nondirectories(int n, unsigned long mask,
 				 unsigned long dirmask,
 				 struct cache_entry **src,
@@ -1004,6 +1126,11 @@ static void debug_unpack_callback(int n,
 		debug_name_entry(i, names + i);
 }
 
+/*
+ * Note that traverse_by_cache_tree() duplicates some logic in this function
+ * without actually calling it. If you change the logic here you may need to
+ * check and change there as well.
+ */
 static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, struct name_entry *names, struct traverse_info *info)
 {
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
-- 
2.18.0.1004.g6639190530


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH v5 4/7] unpack-trees: reduce malloc in cache-tree walk
  2018-08-18 14:41                             ` [PATCH v5 0/7] " Nguyễn Thái Ngọc Duy
                                                 ` (2 preceding siblings ...)
  2018-08-18 14:41                               ` [PATCH v5 3/7] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
@ 2018-08-18 14:41                               ` Nguyễn Thái Ngọc Duy
  2018-08-18 14:41                               ` [PATCH v5 5/7] unpack-trees: reuse (still valid) cache-tree from src_index Nguyễn Thái Ngọc Duy
                                                 ` (4 subsequent siblings)
  8 siblings, 0 replies; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-18 14:41 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, newren, peartben, peff

This is a micro optimization that probably only shines on repos with
deep directory structure. Instead of allocating and freeing a new
cache_entry in every iteration, we reuse the last one and only update
the parts that are new each iteration.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 unpack-trees.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 8376663b59..dbef6e1b8a 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -685,6 +685,8 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 {
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
 	struct unpack_trees_options *o = info->data;
+	struct cache_entry *tree_ce = NULL;
+	int ce_len = 0;
 	int i, d;
 
 	if (!o->merge)
@@ -699,30 +701,39 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 	 * get here in the first place.
 	 */
 	for (i = 0; i < nr_entries; i++) {
-		struct cache_entry *tree_ce;
-		int len, rc;
+		int new_ce_len, len, rc;
 
 		src[0] = o->src_index->cache[pos + i];
 
 		len = ce_namelen(src[0]);
-		tree_ce = xcalloc(1, cache_entry_size(len));
+		new_ce_len = cache_entry_size(len);
+
+		if (new_ce_len > ce_len) {
+			new_ce_len <<= 1;
+			tree_ce = xrealloc(tree_ce, new_ce_len);
+			memset(tree_ce, 0, new_ce_len);
+			ce_len = new_ce_len;
+
+			tree_ce->ce_flags = create_ce_flags(0);
+
+			for (d = 1; d <= nr_names; d++)
+				src[d] = tree_ce;
+		}
 
 		tree_ce->ce_mode = src[0]->ce_mode;
-		tree_ce->ce_flags = create_ce_flags(0);
 		tree_ce->ce_namelen = len;
 		oidcpy(&tree_ce->oid, &src[0]->oid);
 		memcpy(tree_ce->name, src[0]->name, len + 1);
 
-		for (d = 1; d <= nr_names; d++)
-			src[d] = tree_ce;
-
 		rc = call_unpack_fn((const struct cache_entry * const *)src, o);
-		free(tree_ce);
-		if (rc < 0)
+		if (rc < 0) {
+			free(tree_ce);
 			return rc;
+		}
 
 		mark_ce_used(src[0], o);
 	}
+	free(tree_ce);
 	if (o->debug_unpack)
 		printf("Unpacked %d entries from %s to %s using cache-tree\n",
 		       nr_entries,
-- 
2.18.0.1004.g6639190530


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH v5 5/7] unpack-trees: reuse (still valid) cache-tree from src_index
  2018-08-18 14:41                             ` [PATCH v5 0/7] " Nguyễn Thái Ngọc Duy
                                                 ` (3 preceding siblings ...)
  2018-08-18 14:41                               ` [PATCH v5 4/7] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
@ 2018-08-18 14:41                               ` Nguyễn Thái Ngọc Duy
  2018-08-18 14:41                               ` [PATCH v5 6/7] unpack-trees: add missing cache invalidation Nguyễn Thái Ngọc Duy
                                                 ` (3 subsequent siblings)
  8 siblings, 0 replies; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-18 14:41 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, newren, peartben, peff

We do an n-way merge by walking the source index and n trees at the same
time, adding merge results to a new temporary index called o->result.
The merge result for any given path could be either

- keep_entry(): same old index entry in o->src_index is reused
- merged_entry(): either a new entry is added, or an existing one updated
- deleted_entry(): one entry from o->src_index is removed

For some reason [1] we keep making sure that the source index's
cache-tree would still be valid if used by o->result: for all those
merged/deleted entries, we invalidate the same path in o->src_index,
so only cache-trees covering the "keep_entry" parts remain good.

Because of this, the cache-tree from o->src_index can be perfectly
reused in o->result. And in fact we already rely on this logic to
reuse untracked cache in edf3b90553 (unpack-trees: preserve index
extensions - 2017-05-08). Move the cache-tree to o->result before
doing cache_tree_update() to reduce hashing cost.
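
Concretely, the new ordering in unpack_trees() is roughly this (a
simplified sketch; the actual change below is just a two-line move):

    move_index_extensions(&o->result, o->src_index);
    /* o->result now owns the still-valid cache-tree of o->src_index */
    cache_tree_update(&o->result, WRITE_TREE_SILENT | WRITE_TREE_REPAIR);
    /* ...which only has to rehash the invalidated subtrees */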

cache_tree_update() has risen to become one of the most expensive
parts of unpack_trees() after the last few patches, so this does help
reduce unpack_trees() time significantly (on webkit.git):

    before       after
  --------------------------------------------------------------------
    0.080394752  0.051258167 s:  read cache .git/index
    0.216010838  0.212106298 s:  preload index
    0.008534301  0.280521764 s:  refresh index
    0.251992198  0.218160442 s:   traverse_trees
    0.377031383  0.374948191 s:   check_updates
    0.372768105  0.037040114 s:   cache_tree_update
    1.045887251  0.672031609 s:  unpack_trees
    0.314983512  0.317456290 s:  write index, changed mask = 2e
    0.062572653  0.038382654 s:    traverse_trees
    0.000022544  0.000042731 s:    check_updates
    0.073795585  0.050930053 s:   unpack_trees
    0.073807557  0.051099735 s:  diff-index
    1.938191592  1.614241153 s: git command: git checkout -

[1] I'm pretty sure the reason is an oversight in 34110cd4e3 (Make
    'unpack_trees()' have a separate source and destination index -
    2008-03-06). That patch aims to _not_ update the source index at
    all. The invalidation should have been done on o->result in that
    patch. But there was no cache-tree on o->result back then, so it
    would have been pointless anyway.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 read-cache.c   | 2 ++
 unpack-trees.c | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index 1c9c88c130..5ce40f39b3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2940,6 +2940,8 @@ void move_index_extensions(struct index_state *dst, struct index_state *src)
 {
 	dst->untracked = src->untracked;
 	src->untracked = NULL;
+	dst->cache_tree = src->cache_tree;
+	src->cache_tree = NULL;
 }
 
 struct cache_entry *dup_cache_entry(const struct cache_entry *ce,
diff --git a/unpack-trees.c b/unpack-trees.c
index dbef6e1b8a..aa80b65ee1 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1576,6 +1576,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 
 	ret = check_updates(o) ? (-2) : 0;
 	if (o->dst_index) {
+		move_index_extensions(&o->result, o->src_index);
 		if (!ret) {
 			if (!o->result.cache_tree)
 				o->result.cache_tree = cache_tree();
@@ -1584,7 +1585,6 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 						  WRITE_TREE_SILENT |
 						  WRITE_TREE_REPAIR);
 		}
-		move_index_extensions(&o->result, o->src_index);
 		discard_index(o->dst_index);
 		*o->dst_index = o->result;
 	} else {
-- 
2.18.0.1004.g6639190530


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH v5 6/7] unpack-trees: add missing cache invalidation
  2018-08-18 14:41                             ` [PATCH v5 0/7] " Nguyễn Thái Ngọc Duy
                                                 ` (4 preceding siblings ...)
  2018-08-18 14:41                               ` [PATCH v5 5/7] unpack-trees: reuse (still valid) cache-tree from src_index Nguyễn Thái Ngọc Duy
@ 2018-08-18 14:41                               ` Nguyễn Thái Ngọc Duy
  2018-08-18 14:41                               ` [PATCH v5 7/7] cache-tree: verify valid cache-tree in the test suite Nguyễn Thái Ngọc Duy
                                                 ` (2 subsequent siblings)
  8 siblings, 0 replies; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-18 14:41 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, newren, peartben, peff

Any changes to the output index should be (confusingly) marked in the
source index with invalidate_ce_path(). This is used to make sure we
still have valid untracked cache and cache-tree extensions in the end.
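
Invalidating here means walking the cache-tree from the root down along
the entry's path and dropping entry_count to -1 on each node touched (a
sketch of what cache_tree_invalidate_path() ends up doing, not actual
output):

    invalidate "a/b/c.c":
        ""  (root) -> entry_count = -1
        "a"        -> entry_count = -1
        "a/b"      -> entry_count = -1
    (sibling subtrees such as "a/x" keep their entry_count and remain
    reusable)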

We do a pretty good job of invalidating except in two places.
verify_clean_subdirectory() is called as part of verify_absent() and
verify_absent_sparse(). The former is usually called by merged_entry()
or directly in threeway_merge(). The latter is obviously used by
sparse checkout.

In these three call sites, only merged_entry() follows up with
invalidate_ce_path(). The other two don't, but they should not trigger
this ce removal because this is about D/F conflicts [1]. But let's be
safe and invalidate_ce_path() here as well.

The second place is keep_entry() which is also used by threeway_merge()
to keep higher stage entries. In order to reuse cache-tree we need to
invalidate these paths as well. It was not a problem in the past because
whenever a higher stage entry is present, cache-tree will not be
created [2]. Now that we salvage cache-tree even when higher stage
entries are present, we need more invalidation.

[1] c81935348b (Fix switching to a branch with D/F when current branch
    has file D. - 2007-03-15)

[2] This is probably too strict. We should be able to create and save
    cache-tree for the directories that do not have conflict entries
    in cache_tree_update(). And this becomes more important when
    cache-tree plays bigger role in terms of performance.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 unpack-trees.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/unpack-trees.c b/unpack-trees.c
index aa80b65ee1..bc43922922 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1774,6 +1774,7 @@ static int verify_clean_subdirectory(const struct cache_entry *ce,
 			if (verify_uptodate(ce2, o))
 				return -1;
 			add_entry(o, ce2, CE_REMOVE, 0);
+			invalidate_ce_path(ce, o);
 			mark_ce_used(ce2, o);
 		}
 		cnt++;
@@ -2033,6 +2034,8 @@ static int keep_entry(const struct cache_entry *ce,
 		      struct unpack_trees_options *o)
 {
 	add_entry(o, ce, 0, 0);
+	if (ce_stage(ce))
+		invalidate_ce_path(ce, o);
 	return 1;
 }
 
-- 
2.18.0.1004.g6639190530


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* [PATCH v5 7/7] cache-tree: verify valid cache-tree in the test suite
  2018-08-18 14:41                             ` [PATCH v5 0/7] " Nguyễn Thái Ngọc Duy
                                                 ` (5 preceding siblings ...)
  2018-08-18 14:41                               ` [PATCH v5 6/7] unpack-trees: add missing cache invalidation Nguyễn Thái Ngọc Duy
@ 2018-08-18 14:41                               ` Nguyễn Thái Ngọc Duy
  2018-08-18 21:45                                 ` Elijah Newren
  2018-08-18 22:01                               ` [PATCH v5 0/7] Speed up unpack_trees() Elijah Newren
  2018-08-25 12:18                               ` [PATCH] Document update for nd/unpack-trees-with-cache-tree Nguyễn Thái Ngọc Duy
  8 siblings, 1 reply; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-18 14:41 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, newren, peartben, peff

This makes sure that cache-tree is consistent with the index. The main
purpose is to catch potential problems right after unpack_trees()
produces its result index, but the check in write_locked_index() would
also help spot missing invalidation in other code.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache-tree.c   | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++
 cache-tree.h   |  1 +
 read-cache.c   |  3 ++
 t/test-lib.sh  |  6 ++++
 unpack-trees.c |  2 ++
 5 files changed, 90 insertions(+)

diff --git a/cache-tree.c b/cache-tree.c
index caafbff2ff..c3c206427c 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -4,6 +4,7 @@
 #include "tree-walk.h"
 #include "cache-tree.h"
 #include "object-store.h"
+#include "replace-object.h"
 
 #ifndef DEBUG
 #define DEBUG 0
@@ -732,3 +733,80 @@ int update_main_cache_tree(int flags)
 		the_index.cache_tree = cache_tree();
 	return cache_tree_update(&the_index, flags);
 }
+
+static void verify_one(struct index_state *istate,
+		       struct cache_tree *it,
+		       struct strbuf *path)
+{
+	int i, pos, len = path->len;
+	struct strbuf tree_buf = STRBUF_INIT;
+	struct object_id new_oid;
+
+	for (i = 0; i < it->subtree_nr; i++) {
+		strbuf_addf(path, "%s/", it->down[i]->name);
+		verify_one(istate, it->down[i]->cache_tree, path);
+		strbuf_setlen(path, len);
+	}
+
+	if (it->entry_count < 0 ||
+	    /* no verification on tests (t7003) that replace trees */
+	    lookup_replace_object(the_repository, &it->oid) != &it->oid)
+		return;
+
+	if (path->len) {
+		pos = index_name_pos(istate, path->buf, path->len);
+		pos = -pos - 1;
+	} else {
+		pos = 0;
+	}
+
+	i = 0;
+	while (i < it->entry_count) {
+		struct cache_entry *ce = istate->cache[pos + i];
+		const char *slash;
+		struct cache_tree_sub *sub = NULL;
+		const struct object_id *oid;
+		const char *name;
+		unsigned mode;
+		int entlen;
+
+		if (ce->ce_flags & (CE_STAGEMASK | CE_INTENT_TO_ADD | CE_REMOVE))
+			BUG("%s with flags 0x%x should not be in cache-tree",
+			    ce->name, ce->ce_flags);
+		name = ce->name + path->len;
+		slash = strchr(name, '/');
+		if (slash) {
+			entlen = slash - name;
+			sub = find_subtree(it, ce->name + path->len, entlen, 0);
+			if (!sub || sub->cache_tree->entry_count < 0)
+				BUG("bad subtree '%.*s'", entlen, name);
+			oid = &sub->cache_tree->oid;
+			mode = S_IFDIR;
+			i += sub->cache_tree->entry_count;
+		} else {
+			oid = &ce->oid;
+			mode = ce->ce_mode;
+			entlen = ce_namelen(ce) - path->len;
+			i++;
+		}
+		strbuf_addf(&tree_buf, "%o %.*s%c", mode, entlen, name, '\0');
+		strbuf_add(&tree_buf, oid->hash, the_hash_algo->rawsz);
+	}
+	hash_object_file(tree_buf.buf, tree_buf.len, tree_type, &new_oid);
+	if (oidcmp(&new_oid, &it->oid))
+		BUG("cache-tree for path %.*s does not match. "
+		    "Expected %s got %s", len, path->buf,
+		    oid_to_hex(&new_oid), oid_to_hex(&it->oid));
+	strbuf_setlen(path, len);
+	strbuf_release(&tree_buf);
+}
+
+void cache_tree_verify(struct index_state *istate)
+{
+	struct strbuf path = STRBUF_INIT;
+
+	if (!istate->cache_tree)
+		return;
+	verify_one(istate, istate->cache_tree, &path);
+	strbuf_release(&path);
+}
diff --git a/cache-tree.h b/cache-tree.h
index 9799e894f7..c1fde531f9 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -32,6 +32,7 @@ struct cache_tree *cache_tree_read(const char *buffer, unsigned long size);
 
 int cache_tree_fully_valid(struct cache_tree *);
 int cache_tree_update(struct index_state *, int);
+void cache_tree_verify(struct index_state *);
 
 int update_main_cache_tree(int);
 
diff --git a/read-cache.c b/read-cache.c
index 5ce40f39b3..41f313bc9e 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2744,6 +2744,9 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 	int new_shared_index, ret;
 	struct split_index *si = istate->split_index;
 
+	if (git_env_bool("GIT_TEST_CHECK_CACHE_TREE", 0))
+		cache_tree_verify(istate);
+
 	if ((flags & SKIP_IF_UNCHANGED) && !istate->cache_changed) {
 		if (flags & COMMIT_LOCK)
 			rollback_lock_file(lock);
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 78f7097746..5b50f6e2e6 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1083,6 +1083,12 @@ else
 	test_set_prereq C_LOCALE_OUTPUT
 fi
 
+if test -z "$GIT_TEST_CHECK_CACHE_TREE"
+then
+	GIT_TEST_CHECK_CACHE_TREE=true
+	export GIT_TEST_CHECK_CACHE_TREE
+fi
+
 test_lazy_prereq PIPE '
 	# test whether the filesystem supports FIFOs
 	test_have_prereq !MINGW,!CYGWIN &&
diff --git a/unpack-trees.c b/unpack-trees.c
index bc43922922..3394540842 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1578,6 +1578,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	if (o->dst_index) {
 		move_index_extensions(&o->result, o->src_index);
 		if (!ret) {
+			if (git_env_bool("GIT_TEST_CHECK_CACHE_TREE", 0))
+				cache_tree_verify(&o->result);
 			if (!o->result.cache_tree)
 				o->result.cache_tree = cache_tree();
 			if (!cache_tree_fully_valid(o->result.cache_tree))
-- 
2.18.0.1004.g6639190530


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* Re: [PATCH v5 7/7] cache-tree: verify valid cache-tree in the test suite
  2018-08-18 14:41                               ` [PATCH v5 7/7] cache-tree: verify valid cache-tree in the test suite Nguyễn Thái Ngọc Duy
@ 2018-08-18 21:45                                 ` Elijah Newren
  0 siblings, 0 replies; 121+ messages in thread
From: Elijah Newren @ 2018-08-18 21:45 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc
  Cc: Ben Peart, Git Mailing List, Junio C Hamano, Ben Peart, Jeff King

On Sat, Aug 18, 2018 at 7:41 AM Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
...
> diff --git a/read-cache.c b/read-cache.c
> index 5ce40f39b3..41f313bc9e 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2744,6 +2744,9 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
>         int new_shared_index, ret;
>         struct split_index *si = istate->split_index;
>
> +       if (git_env_bool("GIT_TEST_CHECK_CACHE_TREE", 0))
> +               cache_tree_verify(istate);
> +
>         if ((flags & SKIP_IF_UNCHANGED) && !istate->cache_changed) {
>                 if (flags & COMMIT_LOCK)
>                         rollback_lock_file(lock);
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 78f7097746..5b50f6e2e6 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -1083,6 +1083,12 @@ else
>         test_set_prereq C_LOCALE_OUTPUT
>  fi
>
> +if test -z "$GIT_TEST_CHECK_CACHE_TREE"
> +then
> +       GIT_TEST_CHECK_CACHE_TREE=true
> +       export GIT_TEST_CHECK_CACHE_TREE
> +fi
> +
>  test_lazy_prereq PIPE '
>         # test whether the filesystem supports FIFOs
>         test_have_prereq !MINGW,!CYGWIN &&
> diff --git a/unpack-trees.c b/unpack-trees.c
> index bc43922922..3394540842 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1578,6 +1578,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>         if (o->dst_index) {
>                 move_index_extensions(&o->result, o->src_index);
>                 if (!ret) {
> +                       if (git_env_bool("GIT_TEST_CHECK_CACHE_TREE", 0))
> +                               cache_tree_verify(&o->result);
>                         if (!o->result.cache_tree)
>                                 o->result.cache_tree = cache_tree();
>                         if (!cache_tree_fully_valid(o->result.cache_tree))
> --
> 2.18.0.1004.g6639190530

Should documentation of GIT_TEST_CHECK_CACHE_TREE be added in
t/README, in the "Running tests with special setups" section?

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v5 0/7] Speed up unpack_trees()
  2018-08-18 14:41                             ` [PATCH v5 0/7] " Nguyễn Thái Ngọc Duy
                                                 ` (6 preceding siblings ...)
  2018-08-18 14:41                               ` [PATCH v5 7/7] cache-tree: verify valid cache-tree in the test suite Nguyễn Thái Ngọc Duy
@ 2018-08-18 22:01                               ` Elijah Newren
  2018-08-19  5:09                                 ` Duy Nguyen
  2018-08-25 12:18                               ` [PATCH] Document update for nd/unpack-trees-with-cache-tree Nguyễn Thái Ngọc Duy
  8 siblings, 1 reply; 121+ messages in thread
From: Elijah Newren @ 2018-08-18 22:01 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc
  Cc: Ben Peart, Git Mailing List, Junio C Hamano, Ben Peart, Jeff King

On Sat, Aug 18, 2018 at 7:41 AM Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>
> v5 fixes some minor comments from round 4 and a big mistake in 5/5.
> Junio's scary feeling turns out true. There is a missing invalidation
> in keep_entry() which is not added in 6/7. 7/7 makes sure that similar

I'm having trouble parsing this.  Did you mean "...which is now
added..."?  Also, if 6/7 represents a fix to the "big mistake in 5/5",
why is 6/7 separate from 5/7 instead of squashed in?

> problems will not slip through.
>
> I had to rebase this series on top of 'master' because 7/7 caught a
> bad cache-tree situation that has been fixed by Elijah in ad3762042a

Cool, glad that helped.

...
> Nguyễn Thái Ngọc Duy (7):
>   trace.h: support nested performance tracing
>   unpack-trees: add performance tracing
>   unpack-trees: optimize walking same trees with cache-tree
>   unpack-trees: reduce malloc in cache-tree walk
>   unpack-trees: reuse (still valid) cache-tree from src_index
>   unpack-trees: add missing cache invalidation
>   cache-tree: verify valid cache-tree in the test suite

I read through the new series and only had one small comment.  I'm not
up to speed on cache-tree stuff, still, so don't feel qualified to
give an Ack on it.

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v5 0/7] Speed up unpack_trees()
  2018-08-18 22:01                               ` [PATCH v5 0/7] Speed up unpack_trees() Elijah Newren
@ 2018-08-19  5:09                                 ` Duy Nguyen
  0 siblings, 0 replies; 121+ messages in thread
From: Duy Nguyen @ 2018-08-19  5:09 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Ben Peart, Git Mailing List, Junio C Hamano, Ben Peart, Jeff King

On Sun, Aug 19, 2018 at 12:01 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Sat, Aug 18, 2018 at 7:41 AM Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> >
> > v5 fixes some minor comments from round 4 and a big mistake in 5/5.
> > Junio's scary feeling turns out true. There is a missing invalidation
> > in keep_entry() which is not added in 6/7. 7/7 makes sure that similar
>
> I'm having trouble parsing this.  Did you mean "...which is now
> added..."?

Oops. Yes.

>  Also, if 6/7 represents a fix to the "big mistake in 5/5",
> why is 6/7 separate from 5/7 instead of squashed in?

I felt that was cramming too much into the commit message. But if
it's the right thing to do, I'll reroll and combine 5/7 and 6/7 .
-- 
Duy

^ permalink raw reply	[flat|nested] 121+ messages in thread

* Re: [PATCH v5 3/7] unpack-trees: optimize walking same trees with cache-tree
  2018-08-18 14:41                               ` [PATCH v5 3/7] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
@ 2018-08-20 12:43                                 ` Ben Peart
  0 siblings, 0 replies; 121+ messages in thread
From: Ben Peart @ 2018-08-20 12:43 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: Ben.Peart, git, gitster, newren, peff



On 8/18/2018 10:41 AM, Nguyễn Thái Ngọc Duy wrote:
> In order to merge one or many trees with the index, unpack-trees code
> walks multiple trees in parallel with the index and performs n-way
> merge. If we find out at the start of a directory that all trees are the
> same (by comparing OID) and cache-tree happens to be available for
> that directory as well, we could avoid walking the trees because we
> already know what these trees contain: it's flattened in what's called
> "the index".
> 
> The upside is of course a lot less I/O since we can potentially skip
> lots of trees (think subtrees). We also save CPU because we don't have
> to inflate and apply the deltas. The downside is of course more
> fragile code since the logic in some functions are now duplicated
> elsewhere.
> 
> "checkout -" with this patch on webkit.git (275k files):
> 
>      baseline      new
>    --------------------------------------------------------------------
>      0.056651714   0.080394752 s:  read cache .git/index
>      0.183101080   0.216010838 s:  preload index
>      0.008584433   0.008534301 s:  refresh index
>      0.633767589   0.251992198 s:   traverse_trees
>      0.340265448   0.377031383 s:   check_updates
>      0.381884638   0.372768105 s:   cache_tree_update
>      1.401562947   1.045887251 s:  unpack_trees
>      0.338687914   0.314983512 s:  write index, changed mask = 2e
>      0.411927922   0.062572653 s:    traverse_trees
>      0.000023335   0.000022544 s:    check_updates
>      0.423697246   0.073795585 s:   unpack_trees
>      0.423708360   0.073807557 s:  diff-index
>      2.559524127   1.938191592 s: git command: git checkout -
> 
> Another measurement from Ben's running "git checkout" with over 500k
> trees (on the whole series):
> 
>      baseline        new
>    ----------------------------------------------------------------------
>      0.535510167     0.556558733     s: read cache .git/index
>      0.3057373       0.3147105       s: initialize name hash
>      0.0184082       0.023558433     s: preload index
>      0.086910967     0.089085967     s: refresh index
>      7.889590767     2.191554433     s: unpack trees
>      0.120760833     0.131941267     s: update worktree after a merge
>      2.2583504       2.572663167     s: repair cache-tree
>      0.8916137       0.959495233     s: write index, changed mask = 28
>      3.405199233     0.2710663       s: unpack trees
>      0.000999667     0.0021554       s: update worktree after a merge
>      3.4063306       0.273318333     s: diff-index
>      16.9524923      9.462943133     s: git command: git.exe checkout
> 
> This command calls unpack_trees() twice, the first time for a 2-way
> merge and the second for a 1-way merge. Both times, "unpack trees" time is
> reduced to one third. Overall time reduction is not that impressive of
> course because index operations take a big chunk. And there's that
> repair cache-tree line.
> 
> PS. A note about cache-tree invalidation and the use of it in this
> code.
> 
> We do invalidate cache-tree in _source_ index when we add new entries
> to the (temporary) "result" index. But we also use the cache-tree from
> source index in this optimization. Does this mean we end up having no
> cache-tree in the source index to activate this optimization?
> 
> The answer is twisted: the order of finding a good cache-tree and
> invalidating it matters. In this case we check for a good cache-tree
> first in all_trees_same_as_cache_tree(), then we start to merge things
> and potentially invalidate that same cache-tree in the process. Since
> cache-tree invalidation happens after the optimization kicks in, we're
> still good. But we may lose that cache-tree at the very first
> call_unpack_fn() call in traverse_by_cache_tree().
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>   unpack-trees.c | 127 +++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 127 insertions(+)
> 
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 6d9f692ea6..8376663b59 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -635,6 +635,102 @@ static inline int are_same_oid(struct name_entry *name_j, struct name_entry *nam
>   	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
>   }
>   
> +static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,
> +					struct name_entry *names,
> +					struct traverse_info *info)
> +{
> +	struct unpack_trees_options *o = info->data;
> +	int i;
> +
> +	if (!o->merge || dirmask != ((1 << n) - 1))
> +		return 0;
> +
> +	for (i = 1; i < n; i++)
> +		if (!are_same_oid(names, names + i))
> +			return 0;
> +
> +	return cache_tree_matches_traversal(o->src_index->cache_tree, names, info);
> +}
> +
> +static int index_pos_by_traverse_info(struct name_entry *names,
> +				      struct traverse_info *info)
> +{
> +	struct unpack_trees_options *o = info->data;
> +	int len = traverse_path_len(info, names);
> +	char *name = xmalloc(len + 1 /* slash */ + 1 /* NUL */);
> +	int pos;
> +
> +	make_traverse_path(name, info, names);
> +	name[len++] = '/';
> +	name[len] = '\0';
> +	pos = index_name_pos(o->src_index, name, len);
> +	if (pos >= 0)
> +		BUG("This is a directory and should not exist in index");
> +	pos = -pos - 1;
> +	if (!starts_with(o->src_index->cache[pos]->name, name) ||
> +	    (pos > 0 && starts_with(o->src_index->cache[pos-1]->name, name)))
> +		BUG("pos must point at the first entry in this directory");
> +	free(name);
> +	return pos;
> +}
> +
> +/*
> + * Fast path if we detect that all trees are the same as cache-tree at this
> + * path. We'll walk these trees recursively using cache-tree/index instead of

nit, not worth a re-roll

"We'll walk these trees in an iterative loop using cache-tree/index..."

> + * ODB since already know what these trees contain.
> + */
> +static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
> +				  struct name_entry *names,
> +				  struct traverse_info *info)
> +{
> +	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
> +	struct unpack_trees_options *o = info->data;
> +	int i, d;
> +
> +	if (!o->merge)
> +		BUG("We need cache-tree to do this optimization");
> +
> +	/*
> +	 * Do what unpack_callback() and unpack_nondirectories() normally
> +	 * do. But we walk all paths in an iterative loop instead.
> +	 *
> +	 * D/F conflicts and higher stage entries are not a concern
> +	 * because cache-tree would be invalidated and we would never
> +	 * get here in the first place.
> +	 */
> +	for (i = 0; i < nr_entries; i++) {
> +		struct cache_entry *tree_ce;
> +		int len, rc;
> +
> +		src[0] = o->src_index->cache[pos + i];
> +
> +		len = ce_namelen(src[0]);
> +		tree_ce = xcalloc(1, cache_entry_size(len));
> +
> +		tree_ce->ce_mode = src[0]->ce_mode;
> +		tree_ce->ce_flags = create_ce_flags(0);
> +		tree_ce->ce_namelen = len;
> +		oidcpy(&tree_ce->oid, &src[0]->oid);
> +		memcpy(tree_ce->name, src[0]->name, len + 1);
> +
> +		for (d = 1; d <= nr_names; d++)
> +			src[d] = tree_ce;
> +
> +		rc = call_unpack_fn((const struct cache_entry * const *)src, o);
> +		free(tree_ce);
> +		if (rc < 0)
> +			return rc;
> +
> +		mark_ce_used(src[0], o);
> +	}
> +	if (o->debug_unpack)
> +		printf("Unpacked %d entries from %s to %s using cache-tree\n",
> +		       nr_entries,
> +		       o->src_index->cache[pos]->name,
> +		       o->src_index->cache[pos + nr_entries - 1]->name);
> +	return 0;
> +}
> +
>   static int traverse_trees_recursive(int n, unsigned long dirmask,
>   				    unsigned long df_conflicts,
>   				    struct name_entry *names,
> @@ -646,6 +742,27 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
>   	void *buf[MAX_UNPACK_TREES];
>   	struct traverse_info newinfo;
>   	struct name_entry *p;
> +	int nr_entries;
> +
> +	nr_entries = all_trees_same_as_cache_tree(n, dirmask, names, info);
> +	if (nr_entries > 0) {
> +		struct unpack_trees_options *o = info->data;
> +		int pos = index_pos_by_traverse_info(names, info);
> +
> +		if (!o->merge || df_conflicts)
> +			BUG("Wrong condition to get here buddy");
> +
> +		/*
> +		 * All entries up to 'pos' must have been processed
> +		 * (i.e. marked CE_UNPACKED) at this point. But to be safe,
> +		 * save and restore cache_bottom anyway to not miss
> +		 * unprocessed entries before 'pos'.
> +		 */
> +		bottom = o->cache_bottom;
> +		ret = traverse_by_cache_tree(pos, nr_entries, n, names, info);
> +		o->cache_bottom = bottom;
> +		return ret;
> +	}
>   
>   	p = names;
>   	while (!p->mode)
> @@ -812,6 +929,11 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
>   	return ce;
>   }
>   
> +/*
> + * Note that traverse_by_cache_tree() duplicates some logic in this function
> + * without actually calling it. If you change the logic here you may need to
> + * check and change there as well.
> + */
>   static int unpack_nondirectories(int n, unsigned long mask,
>   				 unsigned long dirmask,
>   				 struct cache_entry **src,
> @@ -1004,6 +1126,11 @@ static void debug_unpack_callback(int n,
>   		debug_name_entry(i, names + i);
>   }
>   
> +/*
> + * Note that traverse_by_cache_tree() duplicates some logic in this function
> + * without actually calling it. If you change the logic here you may need to
> + * check and change there as well.
> + */
>   static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, struct name_entry *names, struct traverse_info *info)
>   {
>   	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
> 

^ permalink raw reply	[flat|nested] 121+ messages in thread

* [PATCH] Document update for nd/unpack-trees-with-cache-tree
  2018-08-18 14:41                             ` [PATCH v5 0/7] " Nguyễn Thái Ngọc Duy
                                                 ` (7 preceding siblings ...)
  2018-08-18 22:01                               ` [PATCH v5 0/7] Speed up unpack_trees() Elijah Newren
@ 2018-08-25 12:18                               ` Nguyễn Thái Ngọc Duy
  2018-08-25 12:31                                 ` Martin Ågren
  2018-08-25 13:02                                 ` [PATCH v2] " Nguyễn Thái Ngọc Duy
  8 siblings, 2 replies; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-25 12:18 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, newren, peartben

Fix an incorrect comment in the new code added in b4da37380b
(unpack-trees: optimize walking same trees with cache-tree -
2018-08-18) and document the new test variable that is enabled
by default in test-lib.sh in 4592e6080f (cache-tree: verify valid
cache-tree in the test suite - 2018-08-18).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 On top of nd/unpack-trees-with-cache-tree. Incremental update since
 this topic has entered 'next'

 t/README       | 4 ++++
 unpack-trees.c | 4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/t/README b/t/README
index 8373a27fea..0e7cc23734 100644
--- a/t/README
+++ b/t/README
@@ -315,6 +315,10 @@ packs on demand. This normally only happens when the object size is
 over 2GB. This variable forces the code path on any object larger than
 <n> bytes.
 
+GIT_TEST_CHECK_CACHE_TREE=<boolean> checks that cache-tree
+records are valid when the index is written out or after a merge. This
+is mostly to catch missing invalidation. Default is true.
+
 Naming Tests
 ------------
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 3394540842..5a18f36143 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -676,8 +676,8 @@ static int index_pos_by_traverse_info(struct name_entry *names,
 
 /*
  * Fast path if we detect that all trees are the same as cache-tree at this
- * path. We'll walk these trees recursively using cache-tree/index instead of
- * ODB since already know what these trees contain.
+ * path. We'll walk these trees in an iteractive loop using cache-tree/index
+ * instead of ODB since already know what these trees contain.
  */
 static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 				  struct name_entry *names,
-- 
2.19.0.rc0.337.ge906d732e7


^ permalink raw reply related	[flat|nested] 121+ messages in thread

* Re: [PATCH] Document update for nd/unpack-trees-with-cache-tree
  2018-08-25 12:18                               ` [PATCH] Document update for nd/unpack-trees-with-cache-tree Nguyễn Thái Ngọc Duy
@ 2018-08-25 12:31                                 ` Martin Ågren
  2018-08-25 13:02                                 ` [PATCH v2] " Nguyễn Thái Ngọc Duy
  1 sibling, 0 replies; 121+ messages in thread
From: Martin Ågren @ 2018-08-25 12:31 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: Ben Peart, Git Mailing List, Junio C Hamano, Elijah Newren,
	Ben Peart

On Sat, 25 Aug 2018 at 14:22, Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>   * Fast path if we detect that all trees are the same as cache-tree at this
> - * path. We'll walk these trees recursively using cache-tree/index instead of
> - * ODB since already know what these trees contain.
> + * path. We'll walk these trees in an iteractive loop using cache-tree/index
> + * instead of ODB since already know what these trees contain.

s/iteractive/iterative/ (i.e., drop "c")

Not new, but still: s/already/we already/

Martin

^ permalink raw reply	[flat|nested] 121+ messages in thread

* [PATCH v2] Document update for nd/unpack-trees-with-cache-tree
  2018-08-25 12:18                               ` [PATCH] Document update for nd/unpack-trees-with-cache-tree Nguyễn Thái Ngọc Duy
  2018-08-25 12:31                                 ` Martin Ågren
@ 2018-08-25 13:02                                 ` Nguyễn Thái Ngọc Duy
  1 sibling, 0 replies; 121+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-08-25 13:02 UTC (permalink / raw)
  To: pclouds; +Cc: Ben.Peart, git, gitster, newren, peartben, Martin Ågren

Fix an incorrect comment in the new code added in b4da37380b
(unpack-trees: optimize walking same trees with cache-tree -
2018-08-18) and document the new test variable that is enabled
by default in test-lib.sh in 4592e6080f (cache-tree: verify valid
cache-tree in the test suite - 2018-08-18).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Some more typo fixes, found by Martin.

 t/README       | 4 ++++
 unpack-trees.c | 4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/t/README b/t/README
index 8373a27fea..0e7cc23734 100644
--- a/t/README
+++ b/t/README
@@ -315,6 +315,10 @@ packs on demand. This normally only happens when the object size is
 over 2GB. This variable forces the code path on any object larger than
 <n> bytes.
 
+GIT_TEST_CHECK_CACHE_TREE=<boolean> checks that cache-tree
+records are valid when the index is written out or after a merge. This
+is mostly to catch missing invalidation. Default is true.
+
 Naming Tests
 ------------
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 3394540842..515c374373 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -676,8 +676,8 @@ static int index_pos_by_traverse_info(struct name_entry *names,
 
 /*
  * Fast path if we detect that all trees are the same as cache-tree at this
- * path. We'll walk these trees recursively using cache-tree/index instead of
- * ODB since already know what these trees contain.
+ * path. We'll walk these trees in an iterative loop using cache-tree/index
+ * instead of ODB since we already know what these trees contain.
  */
 static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
 				  struct name_entry *names,
-- 
2.19.0.rc0.337.ge906d732e7


^ permalink raw reply related	[flat|nested] 121+ messages in thread

end of thread

Thread overview: 121+ messages
2018-07-18 20:45 [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Ben Peart
2018-07-18 20:45 ` [PATCH v1 1/3] add unbounded Multi-Producer-Multi-Consumer queue Ben Peart
2018-07-18 20:57   ` Stefan Beller
2018-07-19 19:11   ` Junio C Hamano
2018-07-18 20:45 ` [PATCH v1 2/3] add performance tracing around traverse_trees() in unpack_trees() Ben Peart
2018-07-18 20:45 ` [PATCH v1 3/3] Add initial parallel version of unpack_trees() Ben Peart
2018-07-18 22:56   ` Junio C Hamano
2018-07-18 21:02 ` [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Stefan Beller
2018-07-18 21:34 ` Jeff King
2018-07-23 15:48   ` Ben Peart
2018-07-23 17:03     ` Duy Nguyen
2018-07-23 20:51       ` Ben Peart
2018-07-24  4:20         ` Jeff King
2018-07-24 15:33           ` Duy Nguyen
2018-07-25 20:56             ` Ben Peart
2018-07-26  5:30               ` Duy Nguyen
2018-07-26 16:30                 ` Duy Nguyen
2018-07-26 19:40                   ` Junio C Hamano
2018-07-27 15:42                     ` Duy Nguyen
2018-07-27 16:22                       ` Ben Peart
2018-07-27 18:00                         ` Duy Nguyen
2018-07-27 17:14                       ` Junio C Hamano
2018-07-27 17:52                         ` Duy Nguyen
2018-07-29  6:24                           ` Duy Nguyen
2018-07-29 10:33                       ` [PATCH v2 0/4] Speed up unpack_trees() Nguyễn Thái Ngọc Duy
2018-07-29 10:33                         ` [PATCH v2 1/4] unpack-trees.c: add performance tracing Nguyễn Thái Ngọc Duy
2018-07-30 20:16                           ` Ben Peart
2018-07-29 10:33                         ` [PATCH v2 2/4] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
2018-07-30 20:52                           ` Ben Peart
2018-07-29 10:33                         ` [PATCH v2 3/4] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
2018-07-30 20:58                           ` Ben Peart
2018-07-29 10:33                         ` [PATCH v2 4/4] unpack-trees: cheaper index update when walking by cache-tree Nguyễn Thái Ngọc Duy
2018-08-08 18:46                           ` Elijah Newren
2018-08-10 16:39                             ` Duy Nguyen
2018-08-10 18:39                               ` Elijah Newren
2018-08-10 19:30                                 ` Duy Nguyen
2018-08-10 19:40                                   ` Elijah Newren
2018-08-10 19:48                                     ` Duy Nguyen
2018-07-30 18:10                         ` [PATCH v2 0/4] Speed up unpack_trees() Ben Peart
2018-07-31 15:31                           ` Duy Nguyen
2018-07-31 16:50                             ` Ben Peart
2018-07-31 17:31                               ` Ben Peart
2018-08-01 16:38                                 ` Duy Nguyen
2018-08-08 20:53                                   ` Ben Peart
2018-08-09  8:16                                     ` Ben Peart
2018-08-10 16:08                                       ` Duy Nguyen
2018-08-10 15:51                                     ` Duy Nguyen
2018-07-30 21:04                         ` Ben Peart
2018-08-04  5:37                         ` [PATCH v3 " Nguyễn Thái Ngọc Duy
2018-08-04  5:37                           ` [PATCH v3 1/4] unpack-trees: add performance tracing Nguyễn Thái Ngọc Duy
2018-08-04  5:37                           ` [PATCH v3 2/4] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
2018-08-08 18:23                             ` Elijah Newren
2018-08-10 16:29                               ` Duy Nguyen
2018-08-10 18:48                                 ` Elijah Newren
2018-08-04  5:37                           ` [PATCH v3 3/4] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
2018-08-08 18:30                             ` Elijah Newren
2018-08-04  5:37                           ` [PATCH v3 4/4] unpack-trees: cheaper index update when walking by cache-tree Nguyễn Thái Ngọc Duy
2018-08-06 15:48                           ` [PATCH v3 0/4] Speed up unpack_trees() Junio C Hamano
2018-08-06 15:59                             ` Duy Nguyen
2018-08-06 18:59                               ` Junio C Hamano
2018-08-08 17:00                                 ` Ben Peart
2018-08-08 17:46                               ` Junio C Hamano
2018-08-08 18:12                                 ` Junio C Hamano
2018-08-08 18:39                                   ` Junio C Hamano
2018-08-10 16:53                                     ` Duy Nguyen
2018-08-12  8:15                           ` [PATCH v4 0/5] " Nguyễn Thái Ngọc Duy
2018-08-12  8:15                             ` [PATCH v4 1/5] trace.h: support nested performance tracing Nguyễn Thái Ngọc Duy
2018-08-13 18:39                               ` Ben Peart
2018-08-12  8:15                             ` [PATCH v4 2/5] unpack-trees: add " Nguyễn Thái Ngọc Duy
2018-08-12 10:05                               ` Thomas Adam
2018-08-13 18:50                                 ` Junio C Hamano
2018-08-13 18:44                               ` Ben Peart
2018-08-13 19:25                               ` Jeff King
2018-08-13 19:36                                 ` Stefan Beller
2018-08-13 20:11                                   ` Ben Peart
2018-08-13 19:52                                 ` Duy Nguyen
2018-08-13 21:47                                   ` Jeff King
2018-08-13 22:41                                 ` Junio C Hamano
2018-08-14 18:19                                   ` Jeff Hostetler
2018-08-14 18:32                                     ` Duy Nguyen
2018-08-14 18:44                                       ` Stefan Beller
2018-08-14 18:51                                         ` Duy Nguyen
2018-08-14 19:54                                           ` Jeff King
2018-08-14 20:52                                           ` Junio C Hamano
2018-08-15 16:32                                             ` Duy Nguyen
2018-08-15 18:28                                               ` Junio C Hamano
2018-08-14 20:14                                         ` Jeff Hostetler
2018-08-12  8:15                             ` [PATCH v4 3/5] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
2018-08-13 18:58                               ` Ben Peart
2018-08-15 16:38                                 ` Duy Nguyen
2018-08-12  8:15                             ` [PATCH v4 4/5] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
2018-08-12  8:15                             ` [PATCH v4 5/5] unpack-trees: reuse (still valid) cache-tree from src_index Nguyễn Thái Ngọc Duy
2018-08-13 15:48                               ` Elijah Newren
2018-08-13 15:57                                 ` Duy Nguyen
2018-08-13 16:05                                 ` Ben Peart
2018-08-13 16:25                                   ` Duy Nguyen
2018-08-13 17:15                                     ` Ben Peart
2018-08-13 19:01                             ` [PATCH v4 0/5] Speed up unpack_trees() Junio C Hamano
2018-08-14 19:19                             ` Ben Peart
2018-08-18 14:41                             ` [PATCH v5 0/7] " Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 1/7] trace.h: support nested performance tracing Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 2/7] unpack-trees: add " Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 3/7] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
2018-08-20 12:43                                 ` Ben Peart
2018-08-18 14:41                               ` [PATCH v5 4/7] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 5/7] unpack-trees: reuse (still valid) cache-tree from src_index Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 6/7] unpack-trees: add missing cache invalidation Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 7/7] cache-tree: verify valid cache-tree in the test suite Nguyễn Thái Ngọc Duy
2018-08-18 21:45                                 ` Elijah Newren
2018-08-18 22:01                               ` [PATCH v5 0/7] Speed up unpack_trees() Elijah Newren
2018-08-19  5:09                                 ` Duy Nguyen
2018-08-25 12:18                               ` [PATCH] Document update for nd/unpack-trees-with-cache-tree Nguyễn Thái Ngọc Duy
2018-08-25 12:31                                 ` Martin Ågren
2018-08-25 13:02                                 ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2018-07-27 15:50                     ` [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Ben Peart
2018-07-26 16:35               ` Duy Nguyen
2018-07-24  5:54         ` Junio C Hamano
2018-07-24 15:13         ` Duy Nguyen
2018-07-24 21:21           ` Jeff King
2018-07-25 16:09           ` Ben Peart
2018-07-24  4:27       ` Jeff King
