From: Junio C Hamano <gitster@pobox.com>
To: Jeff Hostetler <Jeff.Hostetler@microsoft.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
Johannes Schindelin <johannes.schindelin@gmx.de>
Subject: Re: [PATCH 3/5] name-hash: precompute hash values during preload-index
Date: Sun, 19 Feb 2017 13:45:20 -0800
Message-ID: <xmqq1sutn9cf.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <MWHPR03MB295845950BB87BA9479E973E8A5F0@MWHPR03MB2958.namprd03.prod.outlook.com> (Jeff Hostetler's message of "Sun, 19 Feb 2017 00:19:58 +0000")
Jeff Hostetler <Jeff.Hostetler@microsoft.com> writes:
> I looked at doing this, but I didn't think the limited gains warranted the
> complexity and overhead of a forward search for peers at the current level.
It seems that I wasn't clear about what I meant. I didn't mean anything
as complex as what you described.
Just something simple, like this on top of yours, that passes in the
previous entry and compares against only that one. I do not know if
that gives any gain, though ;-).
cache.h | 2 +-
name-hash.c | 11 +++++++++--
preload-index.c | 4 +++-
3 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/cache.h b/cache.h
index 390aa803df..bd2980f6e3 100644
--- a/cache.h
+++ b/cache.h
@@ -233,7 +233,7 @@ struct cache_entry {
#error "CE_EXTENDED_FLAGS out of range"
#endif
-void precompute_istate_hashes(struct cache_entry *ce);
+void precompute_istate_hashes(struct cache_entry *ce, struct cache_entry *prev);
/* Forward structure decls */
struct pathspec;
diff --git a/name-hash.c b/name-hash.c
index f95054f44c..5e09b79170 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -300,7 +300,7 @@ void free_name_hash(struct index_state *istate)
* non-skip-worktree items (since status should not observe skipped items), but
* because lazy_init_name_hash() hashes everything, we force it here.
*/
-void precompute_istate_hashes(struct cache_entry *ce)
+void precompute_istate_hashes(struct cache_entry *ce, struct cache_entry *prev)
{
int namelen = ce_namelen(ce);
@@ -312,7 +312,14 @@ void precompute_istate_hashes(struct cache_entry *ce)
ce->precomputed_hash.root_entry = 1;
} else {
namelen--;
- ce->precomputed_hash.dir = memihash(ce->name, namelen);
+
+ if (prev &&
+ prev->precomputed_hash.initialized &&
+ namelen <= ce_namelen(prev) &&
+ !memcmp(ce->name, prev->name, namelen))
+ ce->precomputed_hash.dir = prev->precomputed_hash.dir;
+ else
+ ce->precomputed_hash.dir = memihash(ce->name, namelen);
ce->precomputed_hash.name = memihash_continue(
ce->precomputed_hash.dir, ce->name + namelen,
ce_namelen(ce) - namelen);
diff --git a/preload-index.c b/preload-index.c
index 602737f9d0..784378ffac 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -37,6 +37,7 @@ static void *preload_thread(void *_data)
struct thread_data *p = _data;
struct index_state *index = p->index;
struct cache_entry **cep = index->cache + p->offset;
+ struct cache_entry *previous = NULL;
struct cache_def cache = CACHE_DEF_INIT;
nr = p->nr;
@@ -47,7 +48,8 @@ static void *preload_thread(void *_data)
struct cache_entry *ce = *cep++;
struct stat st;
- precompute_istate_hashes(ce);
+ precompute_istate_hashes(ce, previous);
+ previous = ce;
if (ce_stage(ce))
continue;
> (I was just looking at the complexity of clear_ce_flags_1() in unpack-trees.c
> and how hard it has to look to find the end of the current directory and the
> effect that that has on the recursion and it felt like too much work for the
> potential gain.)
>
> Whereas remembering the previous one was basically free. Granted, it only
> helps us for adjacent files in the index, so it's not perfect, but gives us the
> best bang for the buck.
>
> Jeff
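To make the "remember only the previous entry" idea concrete outside of the Git tree, here is a minimal standalone sketch. Everything in it is an assumption made for illustration: `struct entry`, `ihash()`, `ihash_continue()` and `precompute_hashes()` are hypothetical stand-ins, not Git's `struct cache_entry`, `memihash()` or `memihash_continue()`. The hash is a case-folding FNV-1 style loop chosen only because it can be resumed from an earlier value. Unlike the patch above, the sketch also stores the directory length and reuses the previous hash only when both entries have a directory part of exactly the same length. That is a more conservative check added for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an index entry; not Git's struct cache_entry. */
struct entry {
	const char *name;       /* full path, e.g. "src/lib/a.c" */
	size_t dir_len;         /* length of the directory part, no trailing '/' */
	unsigned int dir_hash;  /* hash of the directory part, if has_dir */
	unsigned int name_hash; /* hash of the full path */
	int has_dir;            /* 0 for top-level entries */
	int initialized;
};

/* Case-folding FNV-1 style hash that resumes from an earlier hash value. */
static unsigned int ihash_continue(unsigned int hash, const char *buf, size_t len)
{
	const unsigned char *p = (const unsigned char *)buf;

	while (len--) {
		unsigned int c = *p++;
		if (c >= 'a' && c <= 'z')
			c -= 'a' - 'A';
		hash = (hash * 0x01000193u) ^ c;
	}
	return hash;
}

static unsigned int ihash(const char *buf, size_t len)
{
	return ihash_continue(0x811c9dc5u, buf, len);
}

/*
 * Fill in the hashes for e. If prev lives in the same directory, copy
 * its directory hash instead of rehashing the prefix.
 * Returns 1 when the directory hash was reused, 0 otherwise.
 */
static int precompute_hashes(struct entry *e, const struct entry *prev)
{
	size_t len = strlen(e->name);
	size_t dir_len = len;
	int reused = 0;

	/* Back up to just past the last '/' (or to 0 if there is none). */
	while (dir_len > 0 && e->name[dir_len - 1] != '/')
		dir_len--;

	if (dir_len == 0) {
		e->has_dir = 0;
		e->dir_len = 0;
		e->dir_hash = 0;
		e->name_hash = ihash(e->name, len);
	} else {
		dir_len--; /* drop the trailing '/' */
		if (prev && prev->initialized && prev->has_dir &&
		    prev->dir_len == dir_len &&
		    !memcmp(e->name, prev->name, dir_len)) {
			e->dir_hash = prev->dir_hash;
			reused = 1;
		} else {
			e->dir_hash = ihash(e->name, dir_len);
		}
		e->has_dir = 1;
		e->dir_len = dir_len;
		/* Continue from the directory hash over "/basename". */
		e->name_hash = ihash_continue(e->dir_hash, e->name + dir_len,
					      len - dir_len);
	}
	e->initialized = 1;
	return reused;
}
```

A caller would walk the sorted entries in order, passing the entry it just processed as `prev` (and NULL for the first one), much like the `previous` variable in the preload loop above. With "src/lib/a.c" followed by "src/lib/b.c" the second call reuses the directory hash. With "src/lib/b.c" followed by "src/main.c" it does not, because the directory lengths differ.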