git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Ben Peart <peartben@gmail.com>
Cc: Duy Nguyen <pclouds@gmail.com>,
	Ben Peart <Ben.Peart@microsoft.com>,
	Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc)
Date: Tue, 24 Jul 2018 00:20:17 -0400	[thread overview]
Message-ID: <20180724042017.GA13248@sigill.intra.peff.net> (raw)
In-Reply-To: <6ff6fbdc-d9cf-019f-317c-7fdba31105c6@gmail.com>

On Mon, Jul 23, 2018 at 04:51:38PM -0400, Ben Peart wrote:

> > Hmm.. this means cache-tree is fully valid, unless you have changes in
> > index. We're quite aggressive in repairing cache-tree since aecf567cbf
> > (cache-tree: create/update cache-tree on checkout - 2014-07-05). If we
> > have very good cache-tree records and still spend 33s on
> > traverse_trees, maybe there's something else.
> 
> I'm not at all familiar with the cache-tree and couldn't find any
> documentation on it other than index-format.txt which says "it helps speed
> up tree object generation for a new commit."  In this particular case, no
> new commit is being created so I don't know that the cache-tree would help.

It's basically an index extension that mirrors the tree structure within
the index, telling you the sha1 of the three that _would_ be generated
from any particular path. So any time you're walking a tree alongside
the index, in theory you should be able to say "the cache-tree for this
subset of the index matches the tree" and skip over a bunch of entries.

At least that's my view of it. unpack_trees() has always been a
terrifying beast that I've avoided looking too closely at.

> After a quick look at the code, the only place I can find that tries to use
> cache_tree_matches_traversal() is in unpack_callback() and that only happens
> if n == 1 and in the "git checkout" case, n == 2. Am I missing something?

Looks like it's trying to special-case "diff-index --cached". Which
kind-of makes sense. In the non-cached case, we're thinking not only
about the relationship between the index and the tree, but also whether
the on-disk files are up to date.

And that would be the same for checkout. We want to know not only
whether there are changes to make to the index, but also whether the
on-disk files need to be updated from the index.

But I assume in your case that we've just refreshed the index quickly
using fsmonitor. So I think in the long run what you want is:

  1. fsmonitor tells us which index entries are not clean

  2. based on the unclean list, we invalidate cache-tree entries for
     those paths

  3. if we have a valid cache-tree entry, we should be able to skip
     digging into that tree; if not, then we walk the index and tree as
     normal, adding/deleting index entries and updating (or complaining
     about) modified on-disk files

I think the "n" adds an extra layer of complexity. n==2 means we're
doing a "2-way" merge. Moving from tree X to tree Y, and dealing with
the index as we go. Naively I _think_ we'd be OK to just extend the rule
to "if both subtrees match each other _and_ match the valid cache-tree,
then we can skip".

Again, I'm a little out of my area of expertise here, but cargo-culting
like this:

diff --git a/sha1-file.c b/sha1-file.c
index de4839e634..c105af70ce 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1375,6 +1375,7 @@ static void *read_object(const unsigned char *sha1, enum object_type *type,
 
 	if (oid_object_info_extended(the_repository, &oid, &oi, 0) < 0)
 		return NULL;
+	trace_printf("reading %s %s", type_name(*type), sha1_to_hex(sha1));
 	return content;
 }
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 66741130ae..cfdad4133d 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1075,6 +1075,23 @@ static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
 				o->cache_bottom += matches;
 				return mask;
 			}
+		} else if (n == 2 && S_ISDIR(names->mode) &&
+			   names[0].mode == names[1].mode &&
+			   !strcmp(names[0].path, names[1].path) &&
+			   !oidcmp(names[0].oid, names[1].oid)
+			   /* && somehow account for modified on-disk files */) {
+			int matches;
+
+			/*
+			 * we know that the two trees have the same oid, so we
+			 * only need to look at one of them
+			 */
+			matches = cache_tree_matches_traversal(o->src_index->cache_tree,
+							       names, info);
+			if (matches) {
+				o->cache_bottom += matches;
+				return mask;
+			}
 		}
 
 		if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,

seems to avoid the tree reads when running "GIT_TRACE=1 git checkout".
It also totally empties the index. ;) So clearly we have to do a bit
more there. Probably rather than just bumping o->cache_bottom forward,
we'd need to actually move those entries into the new index. Or maybe
it's something else entirely (I did say cargo-culting, right?).

-Peff

  reply	other threads:[~2018-07-24  4:20 UTC|newest]

Thread overview: 121+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-18 20:45 [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Ben Peart
2018-07-18 20:45 ` [PATCH v1 1/3] add unbounded Multi-Producer-Multi-Consumer queue Ben Peart
2018-07-18 20:57   ` Stefan Beller
2018-07-19 19:11   ` Junio C Hamano
2018-07-18 20:45 ` [PATCH v1 2/3] add performance tracing around traverse_trees() in unpack_trees() Ben Peart
2018-07-18 20:45 ` [PATCH v1 3/3] Add initial parallel version of unpack_trees() Ben Peart
2018-07-18 22:56   ` Junio C Hamano
2018-07-18 21:02 ` [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Stefan Beller
2018-07-18 21:34 ` Jeff King
2018-07-23 15:48   ` Ben Peart
2018-07-23 17:03     ` Duy Nguyen
2018-07-23 20:51       ` Ben Peart
2018-07-24  4:20         ` Jeff King [this message]
2018-07-24 15:33           ` Duy Nguyen
2018-07-25 20:56             ` Ben Peart
2018-07-26  5:30               ` Duy Nguyen
2018-07-26 16:30                 ` Duy Nguyen
2018-07-26 19:40                   ` Junio C Hamano
2018-07-27 15:42                     ` Duy Nguyen
2018-07-27 16:22                       ` Ben Peart
2018-07-27 18:00                         ` Duy Nguyen
2018-07-27 17:14                       ` Junio C Hamano
2018-07-27 17:52                         ` Duy Nguyen
2018-07-29  6:24                           ` Duy Nguyen
2018-07-29 10:33                       ` [PATCH v2 0/4] Speed up unpack_trees() Nguyễn Thái Ngọc Duy
2018-07-29 10:33                         ` [PATCH v2 1/4] unpack-trees.c: add performance tracing Nguyễn Thái Ngọc Duy
2018-07-30 20:16                           ` Ben Peart
2018-07-29 10:33                         ` [PATCH v2 2/4] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
2018-07-30 20:52                           ` Ben Peart
2018-07-29 10:33                         ` [PATCH v2 3/4] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
2018-07-30 20:58                           ` Ben Peart
2018-07-29 10:33                         ` [PATCH v2 4/4] unpack-trees: cheaper index update when walking by cache-tree Nguyễn Thái Ngọc Duy
2018-08-08 18:46                           ` Elijah Newren
2018-08-10 16:39                             ` Duy Nguyen
2018-08-10 18:39                               ` Elijah Newren
2018-08-10 19:30                                 ` Duy Nguyen
2018-08-10 19:40                                   ` Elijah Newren
2018-08-10 19:48                                     ` Duy Nguyen
2018-07-30 18:10                         ` [PATCH v2 0/4] Speed up unpack_trees() Ben Peart
2018-07-31 15:31                           ` Duy Nguyen
2018-07-31 16:50                             ` Ben Peart
2018-07-31 17:31                               ` Ben Peart
2018-08-01 16:38                                 ` Duy Nguyen
2018-08-08 20:53                                   ` Ben Peart
2018-08-09  8:16                                     ` Ben Peart
2018-08-10 16:08                                       ` Duy Nguyen
2018-08-10 15:51                                     ` Duy Nguyen
2018-07-30 21:04                         ` Ben Peart
2018-08-04  5:37                         ` [PATCH v3 " Nguyễn Thái Ngọc Duy
2018-08-04  5:37                           ` [PATCH v3 1/4] unpack-trees: add performance tracing Nguyễn Thái Ngọc Duy
2018-08-04  5:37                           ` [PATCH v3 2/4] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
2018-08-08 18:23                             ` Elijah Newren
2018-08-10 16:29                               ` Duy Nguyen
2018-08-10 18:48                                 ` Elijah Newren
2018-08-04  5:37                           ` [PATCH v3 3/4] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
2018-08-08 18:30                             ` Elijah Newren
2018-08-04  5:37                           ` [PATCH v3 4/4] unpack-trees: cheaper index update when walking by cache-tree Nguyễn Thái Ngọc Duy
2018-08-06 15:48                           ` [PATCH v3 0/4] Speed up unpack_trees() Junio C Hamano
2018-08-06 15:59                             ` Duy Nguyen
2018-08-06 18:59                               ` Junio C Hamano
2018-08-08 17:00                                 ` Ben Peart
2018-08-08 17:46                               ` Junio C Hamano
2018-08-08 18:12                                 ` Junio C Hamano
2018-08-08 18:39                                   ` Junio C Hamano
2018-08-10 16:53                                     ` Duy Nguyen
2018-08-12  8:15                           ` [PATCH v4 0/5] " Nguyễn Thái Ngọc Duy
2018-08-12  8:15                             ` [PATCH v4 1/5] trace.h: support nested performance tracing Nguyễn Thái Ngọc Duy
2018-08-13 18:39                               ` Ben Peart
2018-08-12  8:15                             ` [PATCH v4 2/5] unpack-trees: add " Nguyễn Thái Ngọc Duy
2018-08-12 10:05                               ` Thomas Adam
2018-08-13 18:50                                 ` Junio C Hamano
2018-08-13 18:44                               ` Ben Peart
2018-08-13 19:25                               ` Jeff King
2018-08-13 19:36                                 ` Stefan Beller
2018-08-13 20:11                                   ` Ben Peart
2018-08-13 19:52                                 ` Duy Nguyen
2018-08-13 21:47                                   ` Jeff King
2018-08-13 22:41                                 ` Junio C Hamano
2018-08-14 18:19                                   ` Jeff Hostetler
2018-08-14 18:32                                     ` Duy Nguyen
2018-08-14 18:44                                       ` Stefan Beller
2018-08-14 18:51                                         ` Duy Nguyen
2018-08-14 19:54                                           ` Jeff King
2018-08-14 20:52                                           ` Junio C Hamano
2018-08-15 16:32                                             ` Duy Nguyen
2018-08-15 18:28                                               ` Junio C Hamano
2018-08-14 20:14                                         ` Jeff Hostetler
2018-08-12  8:15                             ` [PATCH v4 3/5] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
2018-08-13 18:58                               ` Ben Peart
2018-08-15 16:38                                 ` Duy Nguyen
2018-08-12  8:15                             ` [PATCH v4 4/5] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
2018-08-12  8:15                             ` [PATCH v4 5/5] unpack-trees: reuse (still valid) cache-tree from src_index Nguyễn Thái Ngọc Duy
2018-08-13 15:48                               ` Elijah Newren
2018-08-13 15:57                                 ` Duy Nguyen
2018-08-13 16:05                                 ` Ben Peart
2018-08-13 16:25                                   ` Duy Nguyen
2018-08-13 17:15                                     ` Ben Peart
2018-08-13 19:01                             ` [PATCH v4 0/5] Speed up unpack_trees() Junio C Hamano
2018-08-14 19:19                             ` Ben Peart
2018-08-18 14:41                             ` [PATCH v5 0/7] " Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 1/7] trace.h: support nested performance tracing Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 2/7] unpack-trees: add " Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 3/7] unpack-trees: optimize walking same trees with cache-tree Nguyễn Thái Ngọc Duy
2018-08-20 12:43                                 ` Ben Peart
2018-08-18 14:41                               ` [PATCH v5 4/7] unpack-trees: reduce malloc in cache-tree walk Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 5/7] unpack-trees: reuse (still valid) cache-tree from src_index Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 6/7] unpack-trees: add missing cache invalidation Nguyễn Thái Ngọc Duy
2018-08-18 14:41                               ` [PATCH v5 7/7] cache-tree: verify valid cache-tree in the test suite Nguyễn Thái Ngọc Duy
2018-08-18 21:45                                 ` Elijah Newren
2018-08-18 22:01                               ` [PATCH v5 0/7] Speed up unpack_trees() Elijah Newren
2018-08-19  5:09                                 ` Duy Nguyen
2018-08-25 12:18                               ` [PATCH] Document update for nd/unpack-trees-with-cache-tree Nguyễn Thái Ngọc Duy
2018-08-25 12:31                                 ` Martin Ågren
2018-08-25 13:02                                 ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2018-07-27 15:50                     ` [PATCH v1 0/3] [RFC] Speeding up checkout (and merge, rebase, etc) Ben Peart
2018-07-26 16:35               ` Duy Nguyen
2018-07-24  5:54         ` Junio C Hamano
2018-07-24 15:13         ` Duy Nguyen
2018-07-24 21:21           ` Jeff King
2018-07-25 16:09           ` Ben Peart
2018-07-24  4:27       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180724042017.GA13248@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=Ben.Peart@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=peartben@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).