From: "René Scharfe" <l.s.r@web.de>
To: git@jeffhostetler.com, git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net,
Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH v1] unpack-trees: avoid duplicate ODB lookups during checkout
Date: Fri, 7 Apr 2017 02:32:44 +0200 [thread overview]
Message-ID: <a1e08e20-7fe6-da3c-cd18-74a8bcd266e0@web.de> (raw)
In-Reply-To: <20170406203732.47714-2-git@jeffhostetler.com>
Am 06.04.2017 um 22:37 schrieb git@jeffhostetler.com:
> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Teach traverse_trees_recursive() to not do redundant ODB
> lookups when both directories refer to the same OID.
>
> In operations such as read-tree, checkout, and merge when
> the differences between the commits are relatively small,
> there will likely be many directories that have the same
> SHA-1. In these cases we can avoid hitting the ODB multiple
> times for the same SHA-1.
>
> TODO This change is a first attempt to test that by comparing
> TODO the hashes of name[i] and name[i-i] and simply copying
> TODO the tree-descriptor data. I was thinking of the n=2
> TODO case here. We may want to extend this to the n=3 case.
>
> ================
> On the Windows repo (500K trees, 3.1M files, 450MB index),
> this reduced the overall time by 0.75 seconds when cycling
> between 2 commits with a single file difference.
>
> (avg) before: 22.699
> (avg) after: 21.955
> ===============
>
> ================
> Using the p0004-read-tree test (posted earlier this week)
> with 1M files on Linux:
>
> before:
> $ ./p0004-read-tree.sh
> 0004.5: switch work1 work2 (1003037) 11.99(8.12+3.32)
> 0004.6: switch commit aliases (1003037) 11.95(8.20+3.14)
>
> after:
> $ ./p0004-read-tree.sh
> 0004.5: switch work1 work2 (1003037) 11.17(7.84+2.76)
> 0004.6: switch commit aliases (1003037) 11.13(7.82+2.72)
> ================
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
> tree-walk.c | 8 ++++++++
> tree-walk.h | 1 +
> unpack-trees.c | 13 +++++++++----
> 3 files changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/tree-walk.c b/tree-walk.c
> index ff77605..3b82f0e 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -92,6 +92,14 @@ void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1)
> return buf;
> }
>
> +void *copy_tree_descriptor(struct tree_desc *dest, const struct tree_desc *src)
> +{
> + void *buf = xmalloc(src->size);
> + memcpy(buf, src->buffer, src->size);
> + init_tree_desc(dest, buf, src->size);
> + return buf;
> +}
> +
> static void entry_clear(struct name_entry *a)
> {
> memset(a, 0, sizeof(*a));
> diff --git a/tree-walk.h b/tree-walk.h
> index 68bb78b..ca4032b 100644
> --- a/tree-walk.h
> +++ b/tree-walk.h
> @@ -43,6 +43,7 @@ int tree_entry(struct tree_desc *, struct name_entry *);
> int tree_entry_gently(struct tree_desc *, struct name_entry *);
>
> void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1);
> +void *copy_tree_descriptor(struct tree_desc *dest, const struct tree_desc *src);
>
> struct traverse_info;
> typedef int (*traverse_callback_t)(int n, unsigned long mask, unsigned long dirmask, struct name_entry *entry, struct traverse_info *);
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 3a8ee19..50aacad 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -554,10 +554,15 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
> newinfo.df_conflicts |= df_conflicts;
>
> for (i = 0; i < n; i++, dirmask >>= 1) {
> - const unsigned char *sha1 = NULL;
> - if (dirmask & 1)
> - sha1 = names[i].oid->hash;
> - buf[i] = fill_tree_descriptor(t+i, sha1);
> + if (i > 0 && (dirmask & 1) && names[i].oid && names[i-1].oid &&
Can .oid even be NULL? (I didn't check, but it's dereferenced in the
sha1 assignment below, so I guess the answer is no and these two checks
are not needed.)
> + !hashcmp(names[i].oid->hash, names[i-1].oid->hash)) {
Calling oidcmp would be shorter.
> + buf[i] = copy_tree_descriptor(&t[i], &t[i-1]);
buf keeps track of the allocations that need to be freed at the end of
the function. I assume these buffers are read-only. Can you use an
alias here instead of a duplicate by calling init_tree_desc with the
predecessor's buffer and setting buf[i] to NULL? Or even just copying
t[i - 1] to t[i] with an assignment? That would be shorter and probably
also quicker.
> + } else {
> + const unsigned char *sha1 = NULL;
> + if (dirmask & 1)
> + sha1 = names[i].oid->hash;
> + buf[i] = fill_tree_descriptor(t+i, sha1);
> + }
> }
>
> bottom = switch_cache_bottom(&newinfo);
>
next prev parent reply other threads:[~2017-04-07 0:33 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-06 20:37 [PATCH v1] WIP unpack-trees: avoid duplicate ODB lookups during checkout git
2017-04-06 20:37 ` [PATCH v1] " git
2017-04-06 22:48 ` Stefan Beller
2017-04-07 5:19 ` Jeff King
2017-04-07 13:51 ` Jeff Hostetler
2017-04-07 17:35 ` Jeff Hostetler
2017-04-07 0:32 ` René Scharfe [this message]
2017-04-07 13:57 ` Jeff Hostetler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a1e08e20-7fe6-da3c-cd18-74a8bcd266e0@web.de \
--to=l.s.r@web.de \
--cc=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jeffhost@microsoft.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).