git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: David Turner <dturner@twopensource.com>
To: git@vger.kernel.org
Subject: Re: [PATCH v2 2/2] do_compare_entry: use already-computed path
Date: Tue, 05 Jan 2016 14:40:58 -0500	[thread overview]
Message-ID: <1452022858.3892.48.camel@twopensource.com> (raw)
In-Reply-To: <1450737260-15965-3-git-send-email-dturner@twopensource.com>

[-- Attachment #1: Type: text/plain, Size: 202 bytes --]

On Mon, 2015-12-21 at 17:34 -0500, David Turner wrote:
> In traverse_trees, we generate the complete traverse path for a

Please replace with the attached version, which eliminates an
unnecessary check.

[-- Attachment #2: 0001-do_compare_entry-use-already-computed-path.patch --]
[-- Type: text/x-patch, Size: 4769 bytes --]

From 520cfbff15faa6de50d37b4a476b21dbe1598433 Mon Sep 17 00:00:00 2001
From: David Turner <dturner@twopensource.com>
Date: Mon, 21 Dec 2015 17:34:20 -0500
Subject: [PATCH] do_compare_entry: use already-computed path

In traverse_trees, we generate the complete traverse path for a
traverse_info.  Later, in do_compare_entry, we used to go do a bunch
of work to compare the traverse_info to a cache_entry's name without
computing that path.  But since we already have that path, we don't
need to do all that work.  Instead, we can just put the generated
path into the traverse_info, and do the comparison more directly.

We copy the path because prune_traversal might mutate `base`. This
doesn't happen in any codepaths where do_compare_entry is called,
but it's better to be safe.

This makes git checkout much faster -- about 25% on Twitter's
monorepo.  Deeper directory trees are likely to benefit more than
shallower ones.

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 tree-walk.c    |  7 +++++++
 tree-walk.h    |  1 +
 unpack-trees.c | 38 ++++++++++++++++++++++++++++++++++++--
 3 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index 6dccd2d..cd4bb2c 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -320,6 +320,7 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
 	struct tree_desc_x *tx = xcalloc(n, sizeof(*tx));
 	struct strbuf base = STRBUF_INIT;
 	int interesting = 1;
+	char *traverse_path;
 
 	for (i = 0; i < n; i++)
 		tx[i].d = t[i];
@@ -329,7 +330,11 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
 		make_traverse_path(base.buf, info->prev, &info->name);
 		base.buf[info->pathlen-1] = '/';
 		strbuf_setlen(&base, info->pathlen);
+		traverse_path = xstrndup(base.buf, info->pathlen);
+	} else {
+		traverse_path = xstrndup(info->name.path, info->pathlen);
 	}
+	info->traverse_path = traverse_path;
 	for (;;) {
 		int trees_used;
 		unsigned long mask, dirmask;
@@ -411,6 +416,8 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
 	for (i = 0; i < n; i++)
 		free_extended_entry(tx + i);
 	free(tx);
+	free(traverse_path);
+	info->traverse_path = NULL;
 	strbuf_release(&base);
 	return error;
 }
diff --git a/tree-walk.h b/tree-walk.h
index 3b2f7bf..174eb61 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -59,6 +59,7 @@ enum follow_symlinks_result {
 enum follow_symlinks_result get_tree_entry_follow_symlinks(unsigned char *tree_sha1, const char *name, unsigned char *result, struct strbuf *result_path, unsigned *mode);
 
 struct traverse_info {
+	const char *traverse_path;
 	struct traverse_info *prev;
 	struct name_entry name;
 	int pathlen;
diff --git a/unpack-trees.c b/unpack-trees.c
index 8e2032f..5f541c2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -498,13 +498,14 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
  * itself - the caller needs to do the final check for the cache
  * entry having more data at the end!
  */
-static int do_compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
+static int do_compare_entry_piecewise(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
 {
 	int len, pathlen, ce_len;
 	const char *ce_name;
 
 	if (info->prev) {
-		int cmp = do_compare_entry(ce, info->prev, &info->name);
+		int cmp = do_compare_entry_piecewise(ce, info->prev,
+						     &info->name);
 		if (cmp)
 			return cmp;
 	}
@@ -522,6 +523,39 @@ static int do_compare_entry(const struct cache_entry *ce, const struct traverse_
 	return df_name_compare(ce_name, ce_len, S_IFREG, n->path, len, n->mode);
 }
 
+static int do_compare_entry(const struct cache_entry *ce,
+			    const struct traverse_info *info,
+			    const struct name_entry *n)
+{
+	int len, pathlen, ce_len;
+	const char *ce_name;
+	int cmp;
+
+	/*
+	 * If we have not precomputed the traverse path, it is quicker
+	 * to avoid doing so.  But if we have precomputed it,
+	 * it is quicker to use the precomputed version.
+	 */
+	if (!info->traverse_path)
+		return do_compare_entry_piecewise(ce, info, n);
+
+	cmp = strncmp(ce->name, info->traverse_path, info->pathlen);
+	if (cmp)
+		return cmp;
+
+	pathlen = info->pathlen;
+	ce_len = ce_namelen(ce);
+
+	if (ce_len < pathlen)
+		return -1;
+
+	ce_len -= pathlen;
+	ce_name = ce->name + pathlen;
+
+	len = tree_entry_len(n);
+	return df_name_compare(ce_name, ce_len, S_IFREG, n->path, len, n->mode);
+}
+
 static int compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
 {
 	int cmp = do_compare_entry(ce, info, n);
-- 
2.4.2.749.g730654d-twtrsrc


  reply	other threads:[~2016-01-05 19:41 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-21 22:34 [PATCH v2 0/2] do_compare_entry: use already-computed path David Turner
2015-12-21 22:34 ` [PATCH v2 1/2] traverse_info: make mostly const David Turner
2015-12-21 23:15   ` David Turner
2015-12-21 22:34 ` [PATCH v2 2/2] do_compare_entry: use already-computed path David Turner
2016-01-05 19:40   ` David Turner [this message]
2015-12-21 23:27 ` [PATCH v2 0/2] " Junio C Hamano
2015-12-21 23:33   ` David Turner
     [not found]     ` <CAPc5daW4ru0j4Zd3ynnRcG8df7sZ9ZuVHu8mz2rxVonZpyE4Gw@mail.gmail.com>
2015-12-22  4:26       ` David Turner
2015-12-22 18:51         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1452022858.3892.48.camel@twopensource.com \
    --to=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).