git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Duy Nguyen <pclouds@gmail.com>
To: Ben Peart <peartben@gmail.com>
Cc: Jeff King <peff@peff.net>, Ben Peart <Ben.Peart@microsoft.com>,
	Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
Date: Thu, 26 Jul 2018 18:30:49 +0200	[thread overview]
Message-ID: <20180726163049.GA15572@duynguyen.home> (raw)
In-Reply-To: <CACsJy8AWcHVYNBZGRUTdcg8FmwOGz3MSUHH+3uVSGrg6MMZMng@mail.gmail.com>

On Thu, Jul 26, 2018 at 07:30:20AM +0200, Duy Nguyen wrote:
> Let's get back to two-way merge. I suggest you read the two-way merge
> in git-read-tree.txt. That table could give you a pretty good idea
> what's going on. twoway_merge() will be given a tuple of three entries
> (I, H, M) of the same path name, for every path. I think what we need
> is determine the condition where the outcome is known in advance, so
> that we can just skip walking the index for one directory. One of the
> checks we could do quickly is I==M or I==H (using cache-tree) and H==M
> (using tree hash).
> 
> The first obvious cases that we can optimize are
> 
> clean (H==M)
>        ------
>      14 yes                 exists   exists   keep index
>      15 no                  exists   exists   keep index
> 
> In other words if we know H==M, there's no much we need to do since
> we're keeping the index the same. But you don't really know how many
> entries are in this directory where H==M. You would need cache-tree
> for that, so in reality it's I==H==M.
> 
> The "clean" column is what fsmonitor comes in, though I'm not sure if
> it's actually needed. I haven't checked how '-u' flag works.
> 
> There's two other cases that we can also optimize, though I think it's
> less likely to happen:
> 
>         clean I==H  I==M (H!=M)
>        ------------------
>      18 yes   no    yes     exists   exists   keep index
>      19 no    no    yes     exists   exists   keep index
> 
> Some other cases where I==H can benefit from the generic tree walk
> optimization above since we can skip parsing H.

I'm excited so I decided to try out anyway. This is what I've come up
with. Switching trees on git.git shows it could skip plenty entries,
so promising. It's ugly and it fails at t6020 though, there's still
work ahead. But I think it'll stop here.

A few notes after getting my hands dirty

- one big difference between diff --cached and checkout is, diff is a
  read-only operation while checkout actually creates new index.  One
  of the side effect is that cache-tree may be destroyed while we're
  walking the trees, i'm not so sure.

- I don't think we even need to a special twoway_merge_same()
  here. That function could just call twoway_merge() with the right
  "src" parameter and the outcome should still be the same. Which
  means it'll work for threeway merge too.

- i'm still scared of that cache_bottom switching to death. no idea
  how it works or if i broke anything by changing the condition there.

-- 8< --
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 28627650cd..276712af64 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -515,6 +515,7 @@ static int merge_working_tree(const struct checkout_opts *opts,
 		topts.gently = opts->merge && old_branch_info->commit;
 		topts.verbose_update = opts->show_progress;
 		topts.fn = twoway_merge;
+		topts.fn_same = twoway_merge_same;
 		if (opts->overwrite_ignore) {
 			topts.dir = xcalloc(1, sizeof(*topts.dir));
 			topts.dir->flags |= DIR_SHOW_IGNORED;
diff --git a/diff-lib.c b/diff-lib.c
index a9f38eb5a3..48e6c4ab0d 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -485,6 +485,15 @@ static int oneway_diff(const struct cache_entry * const *src,
 	return 0;
 }
 
+static int oneway_diff_cached(int pos, int nr, struct unpack_trees_options *options)
+{
+	/*
+	 * Nothing to do. Unpack-trees can safely skip the whole
+	 * nr_matches cache entries.
+	 */
+	return 0;
+}
+
 static int diff_cache(struct rev_info *revs,
 		      const struct object_id *tree_oid,
 		      const char *tree_name,
@@ -501,8 +510,8 @@ static int diff_cache(struct rev_info *revs,
 	memset(&opts, 0, sizeof(opts));
 	opts.head_idx = 1;
 	opts.index_only = cached;
-	opts.diff_index_cached = (cached &&
-				  !revs->diffopt.flags.find_copies_harder);
+	if (cached && !revs->diffopt.flags.find_copies_harder)
+		opts.fn_same = oneway_diff_cached;
 	opts.merge = 1;
 	opts.fn = oneway_diff;
 	opts.unpack_data = revs;
diff --git a/unpack-trees.c b/unpack-trees.c
index 66741130ae..01e3f38807 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -615,7 +615,7 @@ static void restore_cache_bottom(struct traverse_info *info, int bottom)
 {
 	struct unpack_trees_options *o = info->data;
 
-	if (o->diff_index_cached)
+	if (o->fn_same)
 		return;
 	o->cache_bottom = bottom;
 }
@@ -625,7 +625,7 @@ static int switch_cache_bottom(struct traverse_info *info)
 	struct unpack_trees_options *o = info->data;
 	int ret, pos;
 
-	if (o->diff_index_cached)
+	if (o->fn_same)
 		return 0;
 	ret = o->cache_bottom;
 	pos = find_cache_pos(info->prev, &info->name);
@@ -996,6 +996,43 @@ static void debug_unpack_callback(int n,
 		debug_name_entry(i, names + i);
 }
 
+static int skip_dir(int n, unsigned long mask, unsigned long dirmask, struct name_entry *names, struct traverse_info *info)
+{
+	struct unpack_trees_options *o = info->data;
+	int i, matches;
+	int len;
+	char *name;
+	int pos;
+
+	if (dirmask != ((1 << n) - 1) || !S_ISDIR(names->mode))
+		return 0;
+
+	for (i = 1; i < n; i++)
+		if (oidcmp(names[0].oid, names[i].oid))
+			return 0;
+
+	matches = cache_tree_matches_traversal(o->src_index->cache_tree, names, info);
+	if (!matches)
+		return 0;
+
+	/*
+	 * Everything under the name matches; skip the entire
+	 * hierarchy. fn_same must special cases D/F conflicts in such
+	 * a way that it does not do any look-ahead, so this is safe.
+	 */
+	len = traverse_path_len(info, names);
+	name = xmalloc(len + 1);
+
+	make_traverse_path(name, info, names);
+	pos = index_name_pos(o->src_index, name, len);
+	if (pos >= 0)
+		die("NOOO");
+	trace_printf("dirmask = %lx, path = %s\n", dirmask, name);
+	if (o->fn_same(-pos-1, matches, o))
+		matches = 0;
+	free(name);
+	return matches;
+}
 static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, struct name_entry *names, struct traverse_info *info)
 {
 	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
@@ -1015,7 +1052,7 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 			int cmp;
 			struct cache_entry *ce;
 
-			if (o->diff_index_cached)
+			if (o->fn_same)
 				ce = next_cache_entry(o);
 			else
 				ce = find_cache_entry(info, p);
@@ -1059,18 +1096,8 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 
 	/* Now handle any directories.. */
 	if (dirmask) {
-		/* special case: "diff-index --cached" looking at a tree */
-		if (o->diff_index_cached &&
-		    n == 1 && dirmask == 1 && S_ISDIR(names->mode)) {
-			int matches;
-			matches = cache_tree_matches_traversal(o->src_index->cache_tree,
-							       names, info);
-			/*
-			 * Everything under the name matches; skip the
-			 * entire hierarchy.  diff_index_cached codepath
-			 * special cases D/F conflicts in such a way that
-			 * it does not do any look-ahead, so this is safe.
-			 */
+		if (o->fn_same) {
+			int matches = skip_dir(n, mask, dirmask, names, info);
 			if (matches) {
 				o->cache_bottom += matches;
 				return mask;
@@ -1881,6 +1908,7 @@ static int deleted_entry(const struct cache_entry *ce,
 static int keep_entry(const struct cache_entry *ce,
 		      struct unpack_trees_options *o)
 {
+	trace_printf("keep_entry(%s)\n", ce->name);
 	add_entry(o, ce, 0, 0);
 	return 1;
 }
@@ -2132,6 +2160,25 @@ int twoway_merge(const struct cache_entry * const *src,
 	return deleted_entry(oldtree, current, o);
 }
 
+int twoway_merge_same(int pos, int nr, struct unpack_trees_options *o)
+{
+	int i;
+
+	/*
+	 * Since cache-tree at "src" exists, it means there's no
+	 * staged entries here (they would have invalidated cache-tree
+	 * otherwise). So no CE_CONFLICTED.
+	 *
+	 * And because I==H==M, we can't run into d/f conflicts
+	 * either: for every path name, we will always find a _file_
+	 * in the index as well as the two other trees.
+	 */
+	trace_printf("Skipping %d entries\n", nr);
+	for (i = 0; i < nr; i++)
+		keep_entry(o->src_index->cache[pos + i], o);
+	return 0;
+}
+
 /*
  * Bind merge.
  *
diff --git a/unpack-trees.h b/unpack-trees.h
index c2b434c606..45c69e2ed0 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -12,6 +12,9 @@ struct exclude_list;
 typedef int (*merge_fn_t)(const struct cache_entry * const *src,
 		struct unpack_trees_options *options);
 
+typedef int (*merge_same_fn_t)(int pos, int nr,
+			       struct unpack_trees_options *options);
+
 enum unpack_trees_error_types {
 	ERROR_WOULD_OVERWRITE = 0,
 	ERROR_NOT_UPTODATE_FILE,
@@ -49,7 +52,6 @@ struct unpack_trees_options {
 		     aggressive,
 		     skip_unmerged,
 		     initial_checkout,
-		     diff_index_cached,
 		     debug_unpack,
 		     skip_sparse_checkout,
 		     gently,
@@ -61,6 +63,7 @@ struct unpack_trees_options {
 	struct dir_struct *dir;
 	struct pathspec *pathspec;
 	merge_fn_t fn;
+	merge_same_fn_t fn_same;
 	const char *msgs[NB_UNPACK_TREES_ERROR_TYPES];
 	struct argv_array msgs_to_free;
 	/*
@@ -92,6 +95,8 @@ int threeway_merge(const struct cache_entry * const *stages,
 		   struct unpack_trees_options *o);
 int twoway_merge(const struct cache_entry * const *src,
 		 struct unpack_trees_options *o);
+int twoway_merge_same(int pos, int nr,
+		      struct unpack_trees_options *o);
 int bind_merge(const struct cache_entry * const *src,
 	       struct unpack_trees_options *o);
 int oneway_merge(const struct cache_entry * const *src,
-- 8< --

  reply	other threads:[~2018-07-26 16:30 UTC|newest]

Thread overview: 121+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-18 20:45 [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Ben Peart
2018-07-18 20:45 ` [PATCH v1 1/3] add unbounded Multi-Producer-Multi-Consumer queue Ben Peart
2018-07-18 20:57   ` Stefan Beller
2018-07-19 19:11   ` Junio C Hamano
2018-07-18 20:45 ` [PATCH v1 2/3] add performance tracing around traverse_trees() in unpack_trees() Ben Peart
2018-07-18 20:45 ` [PATCH v1 3/3] Add initial parallel version of unpack_trees() Ben Peart
2018-07-18 22:56   ` Junio C Hamano
2018-07-18 21:02 ` [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Stefan Beller
2018-07-18 21:34 ` Jeff King
2018-07-23 15:48   ` Ben Peart
2018-07-23 17:03     ` Duy Nguyen
2018-07-23 20:51       ` Ben Peart
2018-07-24  4:20         ` Jeff King
2018-07-24 15:33           ` Duy Nguyen
2018-07-25 20:56             ` Ben Peart
2018-07-26  5:30               ` Duy Nguyen
2018-07-26 16:30                 ` Duy Nguyen [this message]
2018-07-26 19:40                   ` Junio C Hamano
2018-07-27 15:42                     ` Duy Nguyen
2018-07-27 16:22                       ` Ben Peart
2018-07-27 18:00                         ` Duy Nguyen
2018-07-27 17:14                       ` Junio C Hamano
2018-07-27 17:52                         ` Duy Nguyen
2018-07-29  6:24                           ` Duy Nguyen
2018-07-29 10:33                       ` [PATCH v2 0/4] Speed up unpack_trees() Nguyễn Thái Ngọc Duy
2018-07-29 10:33                         ` [PATCH v2 1/4] unpack-trees.c: add performance tracing Nguyễn Thái Ngọc Duy
2018-07-30 20:16                           ` Ben Peart
2018-07-29 10:33                         ` [PATCH v2 2/4] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
2018-07-30 20:52                           ` Ben Peart
2018-07-29 10:33                         ` [PATCH v2 3/4] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
2018-07-30 20:58                           ` Ben Peart
2018-07-29 10:33                         ` [PATCH v2 4/4] unpack-trees: cheaper index update when walking by cache-tree Nguyễn Thái Ngọc Duy
2018-08-08 18:46                           ` Elijah Newren
2018-08-10 16:39                             ` Duy Nguyen
2018-08-10 18:39                               ` Elijah Newren
2018-08-10 19:30                                 ` Duy Nguyen
2018-08-10 19:40                                   ` Elijah Newren
2018-08-10 19:48                                     ` Duy Nguyen
2018-07-30 18:10                         ` [PATCH v2 0/4] Speed up unpack_trees() Ben Peart
2018-07-31 15:31                           ` Duy Nguyen
2018-07-31 16:50                             ` Ben Peart
2018-07-31 17:31                               ` Ben Peart
2018-08-01 16:38                                 ` Duy Nguyen
2018-08-08 20:53                                   ` Ben Peart
2018-08-09  8:16                                     ` Ben Peart
2018-08-10 16:08                                       ` Duy Nguyen
2018-08-10 15:51                                     ` Duy Nguyen
2018-07-30 21:04                         ` Ben Peart
2018-08-04  5:37                         ` [PATCH v3 " Nguyễn Thái Ngọc Duy
2018-08-04  5:37                           ` [PATCH v3 1/4] unpack-trees: add performance tracing Nguyễn Thái Ngọc Duy
2018-08-04  5:37                           ` [PATCH v3 2/4] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
2018-08-08 18:23                             ` Elijah Newren
2018-08-10 16:29                               ` Duy Nguyen
2018-08-10 18:48                                 ` Elijah Newren
2018-08-04  5:37                           ` [PATCH v3 3/4] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
2018-08-08 18:30                             ` Elijah Newren
2018-08-04  5:37                           ` [PATCH v3 4/4] unpack-trees: cheaper index update when walking by cache-tree Nguyễn Thái Ngọc Duy
2018-08-06 15:48                           ` [PATCH v3 0/4] Speed up unpack_trees() Junio C Hamano
2018-08-06 15:59                             ` Duy Nguyen
2018-08-06 18:59                               ` Junio C Hamano
2018-08-08 17:00                                 ` Ben Peart
2018-08-08 17:46                               ` Junio C Hamano
2018-08-08 18:12                                 ` Junio C Hamano
2018-08-08 18:39                                   ` Junio C Hamano
2018-08-10 16:53                                     ` Duy Nguyen
2018-08-12  8:15                           ` [PATCH v4 0/5] " Nguyễn Thái Ngọc Duy
2018-08-12  8:15                             ` [PATCH v4 1/5] trace.h: support nested performance tracing Nguyễn Thái Ngọc Duy
2018-08-13 18:39                               ` Ben Peart
2018-08-12  8:15                             ` [PATCH v4 2/5] unpack-trees: add " Nguyễn Thái Ngọc Duy
2018-08-12 10:05                               ` Thomas Adam
2018-08-13 18:50                                 ` Junio C Hamano
2018-08-13 18:44                               ` Ben Peart
2018-08-13 19:25                               ` Jeff King
2018-08-13 19:36                                 ` Stefan Beller
2018-08-13 20:11                                   ` Ben Peart
2018-08-13 19:52                                 ` Duy Nguyen
2018-08-13 21:47                                   ` Jeff King
2018-08-13 22:41                                 ` Junio C Hamano
2018-08-14 18:19                                   ` Jeff Hostetler
2018-08-14 18:32                                     ` Duy Nguyen
2018-08-14 18:44                                       ` Stefan Beller
2018-08-14 18:51                                         ` Duy Nguyen
2018-08-14 19:54                                           ` Jeff King
2018-08-14 20:52                                           ` Junio C Hamano
2018-08-15 16:32                                             ` Duy Nguyen
2018-08-15 18:28                                               ` Junio C Hamano
2018-08-14 20:14                                         ` Jeff Hostetler
2018-08-12  8:15                             ` [PATCH v4 3/5] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
2018-08-13 18:58                               ` Ben Peart
2018-08-15 16:38                                 ` Duy Nguyen
2018-08-12  8:15                             ` [PATCH v4 4/5] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
2018-08-12  8:15                             ` [PATCH v4 5/5] unpack-trees: reuse (still valid) cache-tree from src_index Nguyễn Thái Ngọc Duy
2018-08-13 15:48                               ` Elijah Newren
2018-08-13 15:57                                 ` Duy Nguyen
2018-08-13 16:05                                 ` Ben Peart
2018-08-13 16:25                                   ` Duy Nguyen
2018-08-13 17:15                                     ` Ben Peart
2018-08-13 19:01                             ` [PATCH v4 0/5] Speed up unpack_trees() Junio C Hamano
2018-08-14 19:19                             ` Ben Peart
2018-08-18 14:41                             ` [PATCH v5 0/7] " Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 1/7] trace.h: support nested performance tracing Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 2/7] unpack-trees: add " Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 3/7] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
2018-08-20 12:43                                 ` Ben Peart
2018-08-18 14:41                               ` [PATCH v5 4/7] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 5/7] unpack-trees: reuse (still valid) cache-tree from src_index Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 6/7] unpack-trees: add missing cache invalidation Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 7/7] cache-tree: verify valid cache-tree in the test suite Nguyễn Thái Ngọc Duy
2018-08-18 21:45                                 ` Elijah Newren
2018-08-18 22:01                               ` [PATCH v5 0/7] Speed up unpack_trees() Elijah Newren
2018-08-19  5:09                                 ` Duy Nguyen
2018-08-25 12:18                               ` [PATCH] Document update for nd/unpack-trees-with-cache-tree Nguyễn Thái Ngọc Duy
2018-08-25 12:31                                 ` Martin Ågren
2018-08-25 13:02                                 ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2018-07-27 15:50                     ` [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Ben Peart
2018-07-26 16:35               ` Duy Nguyen
2018-07-24  5:54         ` Junio C Hamano
2018-07-24 15:13         ` Duy Nguyen
2018-07-24 21:21           ` Jeff King
2018-07-25 16:09           ` Ben Peart
2018-07-24  4:27       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180726163049.GA15572@duynguyen.home \
    --to=pclouds@gmail.com \
    --cc=Ben.Peart@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peartben@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).