git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: Jeff Hostetler <Jeff.Hostetler@microsoft.com>
Cc: "git\@vger.kernel.org" <git@vger.kernel.org>,
	Johannes Schindelin <johannes.schindelin@gmx.de>
Subject: Re: [PATCH 3/5] name-hash: precompute hash values during preload-index
Date: Sun, 19 Feb 2017 13:45:20 -0800
Message-ID: <xmqq1sutn9cf.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <MWHPR03MB295845950BB87BA9479E973E8A5F0@MWHPR03MB2958.namprd03.prod.outlook.com> (Jeff Hostetler's message of "Sun, 19 Feb 2017 00:19:58 +0000")

Jeff Hostetler <Jeff.Hostetler@microsoft.com> writes:

> I looked at doing this, but I didn't think the complexity and overhead of a
> forward search for peers at the current level warranted the limited gains.

It seems that I wasn't clear about what I meant.  I didn't mean
anything as complex as what you described.

Just something simple, like this on top of yours, that passes in the
previous entry and compares only against that one.  I do not know if
that gives any gain, though ;-).

 cache.h         |  2 +-
 name-hash.c     | 11 +++++++++--
 preload-index.c |  4 +++-
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index 390aa803df..bd2980f6e3 100644
--- a/cache.h
+++ b/cache.h
@@ -233,7 +233,7 @@ struct cache_entry {
 #error "CE_EXTENDED_FLAGS out of range"
 #endif
 
-void precompute_istate_hashes(struct cache_entry *ce);
+void precompute_istate_hashes(struct cache_entry *ce, struct cache_entry *prev);
 
 /* Forward structure decls */
 struct pathspec;
diff --git a/name-hash.c b/name-hash.c
index f95054f44c..5e09b79170 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -300,7 +300,7 @@ void free_name_hash(struct index_state *istate)
  * non-skip-worktree items (since status should not observe skipped items), but
  * because lazy_init_name_hash() hashes everything, we force it here.
  */
-void precompute_istate_hashes(struct cache_entry *ce)
+void precompute_istate_hashes(struct cache_entry *ce, struct cache_entry *prev)
 {
 	int namelen = ce_namelen(ce);
 
@@ -312,7 +312,14 @@ void precompute_istate_hashes(struct cache_entry *ce)
 		ce->precomputed_hash.root_entry = 1;
 	} else {
 		namelen--;
-		ce->precomputed_hash.dir = memihash(ce->name, namelen);
+
+		if (prev &&
+		    prev->precomputed_hash.initialized &&
+		    namelen <= ce_namelen(prev) &&
+		    !memcmp(ce->name, prev->name, namelen))
+			ce->precomputed_hash.dir = prev->precomputed_hash.dir;
+		else
+			ce->precomputed_hash.dir = memihash(ce->name, namelen);
 		ce->precomputed_hash.name = memihash_continue(
 			ce->precomputed_hash.dir, ce->name + namelen,
 			ce_namelen(ce) - namelen);
diff --git a/preload-index.c b/preload-index.c
index 602737f9d0..784378ffac 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -37,6 +37,7 @@ static void *preload_thread(void *_data)
 	struct thread_data *p = _data;
 	struct index_state *index = p->index;
 	struct cache_entry **cep = index->cache + p->offset;
+	struct cache_entry *previous = NULL;
 	struct cache_def cache = CACHE_DEF_INIT;
 
 	nr = p->nr;
@@ -47,7 +48,8 @@ static void *preload_thread(void *_data)
 		struct cache_entry *ce = *cep++;
 		struct stat st;
 
-		precompute_istate_hashes(ce);
+		precompute_istate_hashes(ce, previous);
+		previous = ce;
 
 		if (ce_stage(ce))
 			continue;

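The reuse-from-previous idea in the patch above can be tried outside of
Git's tree with a small, self-contained sketch.  Everything named here is
a hypothetical stand-in and not Git's actual code: "struct entry" plays
the role of struct cache_entry, memihash_cont() imitates
memihash_continue() with a case-insensitive FNV-1 style loop, and
precompute() / precompute_all() mimic precompute_istate_hashes() and the
per-thread loop in preload_thread().  One deliberate difference: the
sketch stores each entry's directory length and reuses the previous hash
only when the lengths are equal.  Assuming namelen in the patch is the
length of the leading directory, a prefix comparison alone would also
match "foo/baz" against a preceding "foo/bar/x", whose directory hash
covers "foo/bar" rather than "foo".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FNV32_BASE  0x811c9dc5u
#define FNV32_PRIME 0x01000193u

/* Case-insensitive FNV-1 style hash that can resume from an earlier value. */
static unsigned int memihash_cont(unsigned int hash, const char *buf, size_t len)
{
	const unsigned char *p = (const unsigned char *)buf;

	while (len--) {
		unsigned int c = *p++;
		if (c >= 'a' && c <= 'z')
			c -= 'a' - 'A';
		hash = (hash * FNV32_PRIME) ^ c;
	}
	return hash;
}

static unsigned int memihash(const char *buf, size_t len)
{
	return memihash_cont(FNV32_BASE, buf, len);
}

/* Hypothetical stand-in for struct cache_entry and its precomputed_hash. */
struct entry {
	const char *name;
	size_t dirlen;		/* bytes before the last '/', 0 at top level */
	unsigned int dir_hash;
	unsigned int name_hash;
	int initialized;
};

/* Length of the leading directory, excluding the final '/'. */
static size_t dir_length(const char *name)
{
	const char *slash = strrchr(name, '/');

	return slash ? (size_t)(slash - name) : 0;
}

/* Returns 1 if the directory hash was copied from prev, 0 if recomputed. */
int precompute(struct entry *e, const struct entry *prev)
{
	size_t len = strlen(e->name);
	int reused = 0;

	e->dirlen = dir_length(e->name);
	if (prev && prev->initialized &&
	    prev->dirlen == e->dirlen &&
	    !memcmp(e->name, prev->name, e->dirlen)) {
		e->dir_hash = prev->dir_hash;
		reused = 1;
	} else {
		e->dir_hash = memihash(e->name, e->dirlen);
	}
	/* Continue from the directory hash over "/basename". */
	e->name_hash = memihash_cont(e->dir_hash, e->name + e->dirlen,
				     len - e->dirlen);
	e->initialized = 1;
	return reused;
}

/* Walks a sorted list the way one preload thread would; returns reuse count. */
int precompute_all(struct entry *entries, size_t nr)
{
	const struct entry *prev = NULL;
	int reused = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		reused += precompute(&entries[i], prev);
		/* The shortcut must never change the resulting full-name hash. */
		assert(entries[i].name_hash ==
		       memihash(entries[i].name, strlen(entries[i].name)));
		prev = &entries[i];
	}
	return reused;
}
```

Calling precompute_all() on a sorted list such as "Makefile",
"foo/bar/x", "foo/bar/y", "foo/baz", "foo/qux" reuses the directory hash
only for "foo/bar/y" and "foo/qux"; the other entries either have no
usable predecessor or sit in a different directory.  Top-level entries
simply get an empty directory prefix in this sketch, which is simpler
than the root_entry flag used in the patch.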

> (I was just looking at the complexity of clear_ce_flags_1() in unpack-trees.c
> and how hard it has to look to find the end of the current directory and the
> effect that that has on the recursion and it felt like too much work for the
> potential gain.)
>
> Whereas remembering the previous one was basically free.  Granted, it only
> helps us for adjacent files in the index, so it's not perfect, but gives us the
> best bang for the buck.
>
> Jeff
