git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Ben Peart <peartben@gmail.com>, Jeff King <peff@peff.net>,
	Ben Peart <Ben.Peart@microsoft.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
Date: Fri, 27 Jul 2018 10:14:14 -0700	[thread overview]
Message-ID: <xmqqpnz8ob2x.fsf@gitster-ct.c.googlers.com> (raw)
In-Reply-To: <20180727154241.GA21288@duynguyen.home> (Duy Nguyen's message of "Fri, 27 Jul 2018 17:42:41 +0200")

Duy Nguyen <pclouds@gmail.com> writes:

> diff --git a/unpack-trees.c b/unpack-trees.c
> index 66741130ae..9c791b55b2 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -642,6 +642,110 @@ static inline int are_same_oid(struct name_entry *name_j, struct name_entry *nam
>  	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
>  }
>  
> +static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,
> +					struct name_entry *names,
> +					struct traverse_info *info)
> +{
> +	struct unpack_trees_options *o = info->data;
> +	int i;
> +
> +	if (dirmask != ((1 << n) - 1) || !S_ISDIR(names->mode) || !o->merge)
> +		return 0;

In other words, punt if (1) not all are directories, (2) the first
name entry given by the caller in names[] is not ISDIR(), or (3) we
are not merging i.e. not "Are we supposed to look at the index too?"
in unpack_callback().

I am not sure if the second one is doing us any good.  When
S_ISDIR(names->mode) is not true, then the bit in dirmask that
corresponds to the one in the entry[] traverse_trees() filled and
passed to us must be zero, so the dirmask check would reject such a
case anyway, no?

I would have moved !o->merge to the front, not for performance
reasons but to make it clear that this function helps an
optimization that matters only when we are walking tree(s) together
with the index.

> +	for (i = 1; i < n; i++)
> +		if (!are_same_oid(names, names + i))
> +			return 0;
> +
> +	return cache_tree_matches_traversal(o->src_index->cache_tree, names, info);
> +}
> +
> +/*
> + * Fast path if we detect that all trees are the same as cache-tree at this
> + * path. We'll walk these trees recursively using cache-tree/index instead of
> + * ODB since already know what these trees contain.
> + */
> +static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
> +				  struct name_entry *names,
> +				  struct traverse_info *info)
> +{
> +	struct cache_entry *src[MAX_UNPACK_TREES + 1] = { NULL, };
> +	struct unpack_trees_options *o = info->data;
> +	int i, d;
> +
> +	/*
> +	 * Do what unpack_callback() and unpack_nondirectories() normally
> +	 * do. But we do it in one function call (for even nested trees)
> +	 * instead.
> +	 *
> +	 * D/F conflicts and staged entries are not a concern because cache-tree
> +	 * would be invalidated and we would never get here in the first place.
> +	 */

We want to at least have

	if (!o->merge || ARRAY_SIZE(src) <= nr_names)
		BUG("");

here, I'd think.

> +	for (i = 0; i < nr_entries; i++) {
> +		struct cache_entry *tree_ce;
> +		int len, rc;
> +
> +		src[0] = o->src_index->cache[pos + i];
> +
> +		/* Do what unpack_nondirectories() normally does */
> +		len = ce_namelen(src[0]);
> +		tree_ce = xcalloc(1, cache_entry_size(len));

unpack_nondirectories() uses create_ce_entry() here.  Any reason why
we shouldn't use it and tell it to make a transient one?

> +		tree_ce->ce_mode = src[0]->ce_mode;
> +		tree_ce->ce_flags = create_ce_flags(0);
> +		tree_ce->ce_namelen = len;
> +		oidcpy(&tree_ce->oid, &src[0]->oid);
> +		memcpy(tree_ce->name, src[0]->name, len + 1);
> +
> +		for (d = 1; d <= nr_names; d++)
> +			src[d] = tree_ce;
> +
> +		rc = call_unpack_fn((const struct cache_entry * const *)src, o);
> +		free(tree_ce);
> +		if (rc < 0)
> +			return rc;
> +
> +		mark_ce_used(src[0], o);
> +	}
> +	trace_printf("Quick traverse over %d entries from %s to %s\n",
> +		     nr_entries,
> +		     o->src_index->cache[pos]->name,
> +		     o->src_index->cache[pos + nr_entries - 1]->name);
> +	return 0;
> +}

When I invented the cache-tree originally, primarily to speed up
writing of deeply nested trees, I had the "diff-index --cached"
optimization where a subtree with contents known to be the same as
the corresponding span in the index is entirely skipped without
getting even looked at.  I didn't realize this (now obvious)
optimization that scanning the index is faster than opening and
traversing trees (I was more focused on not even scanning, which
is what "diff-index --cached" optimization was about).

Nice.


> +static int index_pos_by_traverse_info(struct name_entry *names,
> +				      struct traverse_info *info)
> +{
> +	struct unpack_trees_options *o = info->data;
> +	int len = traverse_path_len(info, names);
> +	char *name = xmalloc(len + 1);
> +	int pos;
> +
> +	make_traverse_path(name, info, names);
> +	pos = index_name_pos(o->src_index, name, len);
> +	if (pos >= 0)
> +		BUG("This is so wrong. This is a directory and should not exist in index");
> +	pos = -pos - 1;
> +	/*
> +	 * There's no guarantee that pos points to the first entry of the
> +	 * directory. If the directory name is "letters" and there's another
> +	 * file named "letters.txt" in the index, pos will point to that file
> +	 * instead.
> +	 */

Is this trying to address the issue o->cache_bottom,
next_cache_entry(), etc. are trying to address?  i.e. an entry
"letters" appears at a different place relative to other entries in
a tree, depending on the type of the entry itself, so linear and
parallel scan of the index and the trees may miss matching entries
without backtracking?  If so, I am not sure if the loop below is
sufficient.

> +	while (pos < o->src_index->cache_nr) {
> +		const struct cache_entry *ce = o->src_index->cache[pos];
> +		if (ce_namelen(ce) > len &&
> +		    ce->name[len] == '/' &&
> +		    !memcmp(ce->name, name, len))
> +			break;
> +		pos++;
> +	}
> +	if (pos == o->src_index->cache_nr)
> +		BUG("This is still wrong");
> +	free(name);
> +	return pos;
> +}
> +

In anycase, nice progress.

  parent reply	other threads:[~2018-07-27 17:14 UTC|newest]

Thread overview: 121+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-18 20:45 [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Ben Peart
2018-07-18 20:45 ` [PATCH v1 1/3] add unbounded Multi-Producer-Multi-Consumer queue Ben Peart
2018-07-18 20:57   ` Stefan Beller
2018-07-19 19:11   ` Junio C Hamano
2018-07-18 20:45 ` [PATCH v1 2/3] add performance tracing around traverse_trees() in unpack_trees() Ben Peart
2018-07-18 20:45 ` [PATCH v1 3/3] Add initial parallel version of unpack_trees() Ben Peart
2018-07-18 22:56   ` Junio C Hamano
2018-07-18 21:02 ` [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Stefan Beller
2018-07-18 21:34 ` Jeff King
2018-07-23 15:48   ` Ben Peart
2018-07-23 17:03     ` Duy Nguyen
2018-07-23 20:51       ` Ben Peart
2018-07-24  4:20         ` Jeff King
2018-07-24 15:33           ` Duy Nguyen
2018-07-25 20:56             ` Ben Peart
2018-07-26  5:30               ` Duy Nguyen
2018-07-26 16:30                 ` Duy Nguyen
2018-07-26 19:40                   ` Junio C Hamano
2018-07-27 15:42                     ` Duy Nguyen
2018-07-27 16:22                       ` Ben Peart
2018-07-27 18:00                         ` Duy Nguyen
2018-07-27 17:14                       ` Junio C Hamano [this message]
2018-07-27 17:52                         ` Duy Nguyen
2018-07-29  6:24                           ` Duy Nguyen
2018-07-29 10:33                       ` [PATCH v2 0/4] Speed up unpack_trees() Nguyễn Thái Ngọc Duy
2018-07-29 10:33                         ` [PATCH v2 1/4] unpack-trees.c: add performance tracing Nguyễn Thái Ngọc Duy
2018-07-30 20:16                           ` Ben Peart
2018-07-29 10:33                         ` [PATCH v2 2/4] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
2018-07-30 20:52                           ` Ben Peart
2018-07-29 10:33                         ` [PATCH v2 3/4] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
2018-07-30 20:58                           ` Ben Peart
2018-07-29 10:33                         ` [PATCH v2 4/4] unpack-trees: cheaper index update when walking by cache-tree Nguyễn Thái Ngọc Duy
2018-08-08 18:46                           ` Elijah Newren
2018-08-10 16:39                             ` Duy Nguyen
2018-08-10 18:39                               ` Elijah Newren
2018-08-10 19:30                                 ` Duy Nguyen
2018-08-10 19:40                                   ` Elijah Newren
2018-08-10 19:48                                     ` Duy Nguyen
2018-07-30 18:10                         ` [PATCH v2 0/4] Speed up unpack_trees() Ben Peart
2018-07-31 15:31                           ` Duy Nguyen
2018-07-31 16:50                             ` Ben Peart
2018-07-31 17:31                               ` Ben Peart
2018-08-01 16:38                                 ` Duy Nguyen
2018-08-08 20:53                                   ` Ben Peart
2018-08-09  8:16                                     ` Ben Peart
2018-08-10 16:08                                       ` Duy Nguyen
2018-08-10 15:51                                     ` Duy Nguyen
2018-07-30 21:04                         ` Ben Peart
2018-08-04  5:37                         ` [PATCH v3 " Nguyễn Thái Ngọc Duy
2018-08-04  5:37                           ` [PATCH v3 1/4] unpack-trees: add performance tracing Nguyễn Thái Ngọc Duy
2018-08-04  5:37                           ` [PATCH v3 2/4] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
2018-08-08 18:23                             ` Elijah Newren
2018-08-10 16:29                               ` Duy Nguyen
2018-08-10 18:48                                 ` Elijah Newren
2018-08-04  5:37                           ` [PATCH v3 3/4] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
2018-08-08 18:30                             ` Elijah Newren
2018-08-04  5:37                           ` [PATCH v3 4/4] unpack-trees: cheaper index update when walking by cache-tree Nguyễn Thái Ngọc Duy
2018-08-06 15:48                           ` [PATCH v3 0/4] Speed up unpack_trees() Junio C Hamano
2018-08-06 15:59                             ` Duy Nguyen
2018-08-06 18:59                               ` Junio C Hamano
2018-08-08 17:00                                 ` Ben Peart
2018-08-08 17:46                               ` Junio C Hamano
2018-08-08 18:12                                 ` Junio C Hamano
2018-08-08 18:39                                   ` Junio C Hamano
2018-08-10 16:53                                     ` Duy Nguyen
2018-08-12  8:15                           ` [PATCH v4 0/5] " Nguyễn Thái Ngọc Duy
2018-08-12  8:15                             ` [PATCH v4 1/5] trace.h: support nested performance tracing Nguyễn Thái Ngọc Duy
2018-08-13 18:39                               ` Ben Peart
2018-08-12  8:15                             ` [PATCH v4 2/5] unpack-trees: add " Nguyễn Thái Ngọc Duy
2018-08-12 10:05                               ` Thomas Adam
2018-08-13 18:50                                 ` Junio C Hamano
2018-08-13 18:44                               ` Ben Peart
2018-08-13 19:25                               ` Jeff King
2018-08-13 19:36                                 ` Stefan Beller
2018-08-13 20:11                                   ` Ben Peart
2018-08-13 19:52                                 ` Duy Nguyen
2018-08-13 21:47                                   ` Jeff King
2018-08-13 22:41                                 ` Junio C Hamano
2018-08-14 18:19                                   ` Jeff Hostetler
2018-08-14 18:32                                     ` Duy Nguyen
2018-08-14 18:44                                       ` Stefan Beller
2018-08-14 18:51                                         ` Duy Nguyen
2018-08-14 19:54                                           ` Jeff King
2018-08-14 20:52                                           ` Junio C Hamano
2018-08-15 16:32                                             ` Duy Nguyen
2018-08-15 18:28                                               ` Junio C Hamano
2018-08-14 20:14                                         ` Jeff Hostetler
2018-08-12  8:15                             ` [PATCH v4 3/5] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
2018-08-13 18:58                               ` Ben Peart
2018-08-15 16:38                                 ` Duy Nguyen
2018-08-12  8:15                             ` [PATCH v4 4/5] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
2018-08-12  8:15                             ` [PATCH v4 5/5] unpack-trees: reuse (still valid) cache-tree from src_index Nguyễn Thái Ngọc Duy
2018-08-13 15:48                               ` Elijah Newren
2018-08-13 15:57                                 ` Duy Nguyen
2018-08-13 16:05                                 ` Ben Peart
2018-08-13 16:25                                   ` Duy Nguyen
2018-08-13 17:15                                     ` Ben Peart
2018-08-13 19:01                             ` [PATCH v4 0/5] Speed up unpack_trees() Junio C Hamano
2018-08-14 19:19                             ` Ben Peart
2018-08-18 14:41                             ` [PATCH v5 0/7] " Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 1/7] trace.h: support nested performance tracing Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 2/7] unpack-trees: add " Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 3/7] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
2018-08-20 12:43                                 ` Ben Peart
2018-08-18 14:41                               ` [PATCH v5 4/7] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 5/7] unpack-trees: reuse (still valid) cache-tree from src_index Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 6/7] unpack-trees: add missing cache invalidation Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 7/7] cache-tree: verify valid cache-tree in the test suite Nguyễn Thái Ngọc Duy
2018-08-18 21:45                                 ` Elijah Newren
2018-08-18 22:01                               ` [PATCH v5 0/7] Speed up unpack_trees() Elijah Newren
2018-08-19  5:09                                 ` Duy Nguyen
2018-08-25 12:18                               ` [PATCH] Document update for nd/unpack-trees-with-cache-tree Nguyễn Thái Ngọc Duy
2018-08-25 12:31                                 ` Martin Ågren
2018-08-25 13:02                                 ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2018-07-27 15:50                     ` [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Ben Peart
2018-07-26 16:35               ` Duy Nguyen
2018-07-24  5:54         ` Junio C Hamano
2018-07-24 15:13         ` Duy Nguyen
2018-07-24 21:21           ` Jeff King
2018-07-25 16:09           ` Ben Peart
2018-07-24  4:27       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqpnz8ob2x.fsf@gitster-ct.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=Ben.Peart@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    --cc=peartben@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).