git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Ben Peart <peartben@gmail.com>
To: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Cc: Ben.Peart@microsoft.com, git@vger.kernel.org, gitster@pobox.com,
	peff@peff.net
Subject: Re: [PATCH v2 2/4] unpack-trees: optimize walking same trees with cache-tree
Date: Mon, 30 Jul 2018 16:52:54 -0400	[thread overview]
Message-ID: <2b97c84e-0634-3879-18a1-8f96a7b4a30d@gmail.com> (raw)
In-Reply-To: <20180729103306.16403-3-pclouds@gmail.com>



On 7/29/2018 6:33 AM, Nguyễn Thái Ngọc Duy wrote:
> From: Duy Nguyen <pclouds@gmail.com>
> 
> In order to merge one or many trees with the index, unpack-trees code
> walks multiple trees in parallel with the index and performs n-way
> merge. If we find out at start of a directory that all trees are the
> same (by comparing OID) and cache-tree happens to be available for
> that directory as well, we could avoid walking the trees because we
> already know what these trees contain: it's flattened in what's called
> "the index".
> 
> The upside is of course a lot less I/O since we can potentially skip
> lots of trees (think subtrees). We also save CPU because we don't have
> to inflate and the apply deltas. The downside is of course more
> fragile code since the logic in some functions are now duplicated
> elsewhere.
> 
> "checkout -" with this patch on gcc.git:
> 
>      baseline      new
>    --------------------------------------------------------------------
>      0.018239226   0.019365414 s: read cache .git/index
>      0.052541655   0.049605548 s: preload index
>      0.001537598   0.001571695 s: refresh index
>      0.168167768   0.049677212 s: unpack trees
>      0.002897186   0.002845256 s: update worktree after a merge
>      0.131661745   0.136597522 s: repair cache-tree
>      0.075389117   0.075422517 s: write index, changed mask = 2a
>      0.111702023   0.032813253 s: unpack trees
>      0.000023245   0.000022002 s: update worktree after a merge
>      0.111793866   0.032933140 s: diff-index
>      0.587933288   0.398924370 s: git command: /home/pclouds/w/git/git
> 
> This command calls unpack_trees() twice, the first time on 2way merge
> and the second 1way merge. In both times, "unpack trees" time is
> reduced to one third. Overall time reduction is not that impressive of
> course because index operations take a big chunk. And there's that
> repair cache-tree line.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>   unpack-trees.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 118 insertions(+), 1 deletion(-)
> 
> diff --git a/unpack-trees.c b/unpack-trees.c
> index dc58d1f5ae..39566b28fb 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -644,6 +644,102 @@ static inline int are_same_oid(struct name_entry *name_j, struct name_entry *nam
>   	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
>   }
>   
> +static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,
> +					struct name_entry *names,
> +					struct traverse_info *info)
> +{
> +	struct unpack_trees_options *o = info->data;
> +	int i;
> +
> +	if (!o->merge || dirmask != ((1 << n) - 1))
> +		return 0;
> +
> +	for (i = 1; i < n; i++)
> +		if (!are_same_oid(names, names + i))
> +			return 0;
> +
> +	return cache_tree_matches_traversal(o->src_index->cache_tree, names, info);
> +}
> +
> +static int index_pos_by_traverse_info(struct name_entry *names,
> +				      struct traverse_info *info)
> +{
> +	struct unpack_trees_options *o = info->data;
> +	int len = traverse_path_len(info, names);
> +	char *name = xmalloc(len + 1 /* slash */ + 1 /* NUL */);
> +	int pos;
> +
> +	make_traverse_path(name, info, names);
> +	name[len++] = '/';
> +	name[len] = '\0';
> +	pos = index_name_pos(o->src_index, name, len);
> +	if (pos >= 0)
> +		BUG("This is a directory and should not exist in index");
> +	pos = -pos - 1;
> +	if (!starts_with(o->src_index->cache[pos]->name, name) ||
> +	    (pos > 0 && starts_with(o->src_index->cache[pos-1]->name, name)))
> +		BUG("pos must point at the first entry in this directory");
> +	free(name);
> +	return pos;
> +}
> +
> +/*
> + * Fast path if we detect that all trees are the same as cache-tree at this
> + * path. We'll walk these trees recursively using cache-tree/index instead of
> + * ODB since already know what these trees contain.
> + */
> +static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
> +				  struct name_entry *names,
> +				  struct traverse_info *info)
> +{
> +	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
> +	struct unpack_trees_options *o = info->data;
> +	int i, d;
> +
> +	if (!o->merge)
> +		BUG("We need cache-tree to do this optimization");
> +
> +	/*
> +	 * Do what unpack_callback() and unpack_nondirectories() normally
> +	 * do. But we walk all paths recursively in just one loop instead.
> +	 *
> +	 * D/F conflicts and staged entries are not a concern because
> +	 * cache-tree would be invalidated and we would never get here
> +	 * in the first place.
> +	 */
> +	for (i = 0; i < nr_entries; i++) {
> +		struct cache_entry *tree_ce;
> +		int len, rc;
> +
> +		src[0] = o->src_index->cache[pos + i];
> +
> +		len = ce_namelen(src[0]);
> +		tree_ce = xcalloc(1, cache_entry_size(len));
> +
> +		tree_ce->ce_mode = src[0]->ce_mode;
> +		tree_ce->ce_flags = create_ce_flags(0);
> +		tree_ce->ce_namelen = len;
> +		oidcpy(&tree_ce->oid, &src[0]->oid);
> +		memcpy(tree_ce->name, src[0]->name, len + 1);
> +

I don't like the overhead of having to create an entirely new cache 
entry here but I see you clean this up in the next patch.

> +		for (d = 1; d <= nr_names; d++)
> +			src[d] = tree_ce;
> +
> +		rc = call_unpack_fn((const struct cache_entry * const *)src, o);
> +		free(tree_ce);
> +		if (rc < 0)
> +			return rc;
> +
> +		mark_ce_used(src[0], o);
> +	}
> +	if (o->debug_unpack)
> +		printf("Unpacked %d entries from %s to %s using cache-tree\n",
> +		       nr_entries,
> +		       o->src_index->cache[pos]->name,
> +		       o->src_index->cache[pos + nr_entries - 1]->name);
> +	return 0;
> +}
> +
>   static int traverse_trees_recursive(int n, unsigned long dirmask,
>   				    unsigned long df_conflicts,
>   				    struct name_entry *names,
> @@ -655,6 +751,17 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
>   	void *buf[MAX_UNPACK_TREES];
>   	struct traverse_info newinfo;
>   	struct name_entry *p;
> +	int nr_entries;
> +
> +	nr_entries = all_trees_same_as_cache_tree(n, dirmask, names, info);
> +	if (nr_entries > 0) {
> +		struct unpack_trees_options *o = info->data;
> +		int pos = index_pos_by_traverse_info(names, info);
> +
> +		if (!o->merge || df_conflicts)
> +			BUG("Wrong condition to get here buddy");
> +		return traverse_by_cache_tree(pos, nr_entries, n, names, info);
> +	}
>   
>   	p = names;
>   	while (!p->mode)
> @@ -814,6 +921,11 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info, con
>   	return ce;
>   }
>   
> +/*
> + * Note that traverse_by_cache_tree() duplicates some logic in this funciton

s/funciton/function

> + * without actually calling it. If you change the logic here you may need to
> + * check and change there as well.
> + */
>   static int unpack_nondirectories(int n, unsigned long mask,
>   				 unsigned long dirmask,
>   				 struct cache_entry **src,
> @@ -998,6 +1110,11 @@ static void debug_unpack_callback(int n,
>   		debug_name_entry(i, names + i);
>   }
>   
> +/*
> + * Note that traverse_by_cache_tree() duplicates some logic in this funciton
> + * without actually calling it. If you change the logic here you may need to
> + * check and change there as well.
> + */
>   static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, struct name_entry *names, struct traverse_info *info)
>   {
>   	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
> @@ -1280,7 +1397,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>   	uint64_t start = getnanotime();
>   
>   	if (len > MAX_UNPACK_TREES)
> -		die("unpack_trees takes at most %d trees", MAX_UNPACK_TREES);
> +		die(_("unpack_trees takes at most %d trees"), MAX_UNPACK_TREES);
>   

I'd like to see this get in independently of this patch series.

>   	memset(&el, 0, sizeof(el));
>   	if (!core_apply_sparse_checkout || !o->update)
> 

  reply	other threads:[~2018-07-30 20:53 UTC|newest]

Thread overview: 121+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-18 20:45 [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Ben Peart
2018-07-18 20:45 ` [PATCH v1 1/3] add unbounded Multi-Producer-Multi-Consumer queue Ben Peart
2018-07-18 20:57   ` Stefan Beller
2018-07-19 19:11   ` Junio C Hamano
2018-07-18 20:45 ` [PATCH v1 2/3] add performance tracing around traverse_trees() in unpack_trees() Ben Peart
2018-07-18 20:45 ` [PATCH v1 3/3] Add initial parallel version of unpack_trees() Ben Peart
2018-07-18 22:56   ` Junio C Hamano
2018-07-18 21:02 ` [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Stefan Beller
2018-07-18 21:34 ` Jeff King
2018-07-23 15:48   ` Ben Peart
2018-07-23 17:03     ` Duy Nguyen
2018-07-23 20:51       ` Ben Peart
2018-07-24  4:20         ` Jeff King
2018-07-24 15:33           ` Duy Nguyen
2018-07-25 20:56             ` Ben Peart
2018-07-26  5:30               ` Duy Nguyen
2018-07-26 16:30                 ` Duy Nguyen
2018-07-26 19:40                   ` Junio C Hamano
2018-07-27 15:42                     ` Duy Nguyen
2018-07-27 16:22                       ` Ben Peart
2018-07-27 18:00                         ` Duy Nguyen
2018-07-27 17:14                       ` Junio C Hamano
2018-07-27 17:52                         ` Duy Nguyen
2018-07-29  6:24                           ` Duy Nguyen
2018-07-29 10:33                       ` [PATCH v2 0/4] Speed up unpack_trees() Nguyễn Thái Ngọc Duy
2018-07-29 10:33                         ` [PATCH v2 1/4] unpack-trees.c: add performance tracing Nguyễn Thái Ngọc Duy
2018-07-30 20:16                           ` Ben Peart
2018-07-29 10:33                         ` [PATCH v2 2/4] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
2018-07-30 20:52                           ` Ben Peart [this message]
2018-07-29 10:33                         ` [PATCH v2 3/4] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
2018-07-30 20:58                           ` Ben Peart
2018-07-29 10:33                         ` [PATCH v2 4/4] unpack-trees: cheaper index update when walking by cache-tree Nguyễn Thái Ngọc Duy
2018-08-08 18:46                           ` Elijah Newren
2018-08-10 16:39                             ` Duy Nguyen
2018-08-10 18:39                               ` Elijah Newren
2018-08-10 19:30                                 ` Duy Nguyen
2018-08-10 19:40                                   ` Elijah Newren
2018-08-10 19:48                                     ` Duy Nguyen
2018-07-30 18:10                         ` [PATCH v2 0/4] Speed up unpack_trees() Ben Peart
2018-07-31 15:31                           ` Duy Nguyen
2018-07-31 16:50                             ` Ben Peart
2018-07-31 17:31                               ` Ben Peart
2018-08-01 16:38                                 ` Duy Nguyen
2018-08-08 20:53                                   ` Ben Peart
2018-08-09  8:16                                     ` Ben Peart
2018-08-10 16:08                                       ` Duy Nguyen
2018-08-10 15:51                                     ` Duy Nguyen
2018-07-30 21:04                         ` Ben Peart
2018-08-04  5:37                         ` [PATCH v3 " Nguyễn Thái Ngọc Duy
2018-08-04  5:37                           ` [PATCH v3 1/4] unpack-trees: add performance tracing Nguyễn Thái Ngọc Duy
2018-08-04  5:37                           ` [PATCH v3 2/4] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
2018-08-08 18:23                             ` Elijah Newren
2018-08-10 16:29                               ` Duy Nguyen
2018-08-10 18:48                                 ` Elijah Newren
2018-08-04  5:37                           ` [PATCH v3 3/4] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
2018-08-08 18:30                             ` Elijah Newren
2018-08-04  5:37                           ` [PATCH v3 4/4] unpack-trees: cheaper index update when walking by cache-tree Nguyễn Thái Ngọc Duy
2018-08-06 15:48                           ` [PATCH v3 0/4] Speed up unpack_trees() Junio C Hamano
2018-08-06 15:59                             ` Duy Nguyen
2018-08-06 18:59                               ` Junio C Hamano
2018-08-08 17:00                                 ` Ben Peart
2018-08-08 17:46                               ` Junio C Hamano
2018-08-08 18:12                                 ` Junio C Hamano
2018-08-08 18:39                                   ` Junio C Hamano
2018-08-10 16:53                                     ` Duy Nguyen
2018-08-12  8:15                           ` [PATCH v4 0/5] " Nguyễn Thái Ngọc Duy
2018-08-12  8:15                             ` [PATCH v4 1/5] trace.h: support nested performance tracing Nguyễn Thái Ngọc Duy
2018-08-13 18:39                               ` Ben Peart
2018-08-12  8:15                             ` [PATCH v4 2/5] unpack-trees: add " Nguyễn Thái Ngọc Duy
2018-08-12 10:05                               ` Thomas Adam
2018-08-13 18:50                                 ` Junio C Hamano
2018-08-13 18:44                               ` Ben Peart
2018-08-13 19:25                               ` Jeff King
2018-08-13 19:36                                 ` Stefan Beller
2018-08-13 20:11                                   ` Ben Peart
2018-08-13 19:52                                 ` Duy Nguyen
2018-08-13 21:47                                   ` Jeff King
2018-08-13 22:41                                 ` Junio C Hamano
2018-08-14 18:19                                   ` Jeff Hostetler
2018-08-14 18:32                                     ` Duy Nguyen
2018-08-14 18:44                                       ` Stefan Beller
2018-08-14 18:51                                         ` Duy Nguyen
2018-08-14 19:54                                           ` Jeff King
2018-08-14 20:52                                           ` Junio C Hamano
2018-08-15 16:32                                             ` Duy Nguyen
2018-08-15 18:28                                               ` Junio C Hamano
2018-08-14 20:14                                         ` Jeff Hostetler
2018-08-12  8:15                             ` [PATCH v4 3/5] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
2018-08-13 18:58                               ` Ben Peart
2018-08-15 16:38                                 ` Duy Nguyen
2018-08-12  8:15                             ` [PATCH v4 4/5] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
2018-08-12  8:15                             ` [PATCH v4 5/5] unpack-trees: reuse (still valid) cache-tree from src_index Nguyễn Thái Ngọc Duy
2018-08-13 15:48                               ` Elijah Newren
2018-08-13 15:57                                 ` Duy Nguyen
2018-08-13 16:05                                 ` Ben Peart
2018-08-13 16:25                                   ` Duy Nguyen
2018-08-13 17:15                                     ` Ben Peart
2018-08-13 19:01                             ` [PATCH v4 0/5] Speed up unpack_trees() Junio C Hamano
2018-08-14 19:19                             ` Ben Peart
2018-08-18 14:41                             ` [PATCH v5 0/7] " Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 1/7] trace.h: support nested performance tracing Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 2/7] unpack-trees: add " Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 3/7] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
2018-08-20 12:43                                 ` Ben Peart
2018-08-18 14:41                               ` [PATCH v5 4/7] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 5/7] unpack-trees: reuse (still valid) cache-tree from src_index Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 6/7] unpack-trees: add missing cache invalidation Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 7/7] cache-tree: verify valid cache-tree in the test suite Nguyễn Thái Ngọc Duy
2018-08-18 21:45                                 ` Elijah Newren
2018-08-18 22:01                               ` [PATCH v5 0/7] Speed up unpack_trees() Elijah Newren
2018-08-19  5:09                                 ` Duy Nguyen
2018-08-25 12:18                               ` [PATCH] Document update for nd/unpack-trees-with-cache-tree Nguyễn Thái Ngọc Duy
2018-08-25 12:31                                 ` Martin Ågren
2018-08-25 13:02                                 ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2018-07-27 15:50                     ` [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Ben Peart
2018-07-26 16:35               ` Duy Nguyen
2018-07-24  5:54         ` Junio C Hamano
2018-07-24 15:13         ` Duy Nguyen
2018-07-24 21:21           ` Jeff King
2018-07-25 16:09           ` Ben Peart
2018-07-24  4:27       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2b97c84e-0634-3879-18a1-8f96a7b4a30d@gmail.com \
    --to=peartben@gmail.com \
    --cc=Ben.Peart@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).