* [PATCH v2] unpack-trees: avoid duplicate ODB lookups during checkout
@ 2017-04-07 15:53 git
  2017-04-07 15:53 ` git
  0 siblings, 1 reply; 8+ messages in thread
From: git @ 2017-04-07 15:53 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Version 2 simplifies this and just copies the tree_descriptor data and
borrows the underlying buffer without mallocing.  It also handles the
n=3 cases, so merges should be helped too.

I've updated the p0004 perf times in the commit message.  The V2
changes took ~0.15 off the V1 times.  The total reduction is ~1 second.

================
Avoid duplicate ODB lookups for trees during traverse_trees_recursive().

Jeff Hostetler (1):
  unpack-trees: avoid duplicate ODB lookups during checkout

 unpack-trees.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

--
2.9.3

^ permalink raw reply	[flat|nested] 8+ messages in thread
* [PATCH v2] unpack-trees: avoid duplicate ODB lookups during checkout
  2017-04-07 15:53 [PATCH v2] unpack-trees: avoid duplicate ODB lookups during checkout git
@ 2017-04-07 15:53 ` git
  2017-04-08 14:06   ` René Scharfe
  0 siblings, 1 reply; 8+ messages in thread
From: git @ 2017-04-07 15:53 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach traverse_trees_recursive() to not do redundant ODB
lookups when both directories refer to the same OID.

In operations such as read-tree, checkout, and merge when
the differences between the commits are relatively small,
there will likely be many directories that have the same
SHA-1.  In these cases we can avoid hitting the ODB multiple
times for the same SHA-1.

This patch handles n=2 and n=3 cases and simply copies the
data rather than repeating the fill_tree_descriptor().

================
On the Windows repo (500K trees, 3.1M files, 450MB index),
this reduced the overall time by 0.75 seconds when cycling
between 2 commits with a single file difference.

(avg) before: 22.699
(avg) after:  21.955
================

================
Using the p0004-read-tree test (posted earlier this week)
with 1M files on Linux:

before:
$ ./p0004-read-tree.sh
0004.5: switch work1 work2 (1003037)      11.99(8.12+3.32)
0004.6: switch commit aliases (1003037)   11.95(8.20+3.14)

after:
$ ./p0004-read-tree.sh
0004.5: switch work1 work2 (1003037)      11.02(7.71+2.78)
0004.6: switch commit aliases (1003037)   10.95(7.57+2.82)
================

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 unpack-trees.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 3a8ee19..143c5d9 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -531,6 +531,11 @@ static int switch_cache_bottom(struct traverse_info *info)
 	return ret;
 }
 
+static inline int are_same_oid(struct name_entry *name_j, struct name_entry *name_k)
+{
+	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
+}
+
 static int traverse_trees_recursive(int n, unsigned long dirmask,
 				    unsigned long df_conflicts,
 				    struct name_entry *names,
@@ -554,10 +559,20 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
 	newinfo.df_conflicts |= df_conflicts;
 
 	for (i = 0; i < n; i++, dirmask >>= 1) {
-		const unsigned char *sha1 = NULL;
-		if (dirmask & 1)
-			sha1 = names[i].oid->hash;
-		buf[i] = fill_tree_descriptor(t+i, sha1);
+		if (i > 0 && are_same_oid(&names[i], &names[i-1])) {
+			/* implicitly borrow buf[i-1] inside tree_desc[i] */
+			memcpy(&t[i], &t[i-1], sizeof(struct tree_desc));
+			buf[i] = NULL;
+		} else if (i > 1 && are_same_oid(&names[i], &names[i-2])) {
+			/* implicitly borrow buf[i-2] inside tree_desc[i] */
+			memcpy(&t[i], &t[i-2], sizeof(struct tree_desc));
+			buf[i] = NULL;
+		} else {
+			const unsigned char *sha1 = NULL;
+			if (dirmask & 1)
+				sha1 = names[i].oid->hash;
+			buf[i] = fill_tree_descriptor(t+i, sha1);
+		}
 	}
 
 	bottom = switch_cache_bottom(&newinfo);

--
2.9.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread
* Re: [PATCH v2] unpack-trees: avoid duplicate ODB lookups during checkout
  2017-04-07 15:53 ` git
@ 2017-04-08 14:06   ` René Scharfe
  2017-04-10 20:55     ` Jeff King
  2017-04-10 21:26     ` Jeff Hostetler
  0 siblings, 2 replies; 8+ messages in thread
From: René Scharfe @ 2017-04-08 14:06 UTC (permalink / raw)
  To: git, git; +Cc: gitster, peff, Jeff Hostetler

Am 07.04.2017 um 17:53 schrieb git@jeffhostetler.com:
> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Teach traverse_trees_recursive() to not do redundant ODB
> lookups when both directories refer to the same OID.
>
> In operations such as read-tree, checkout, and merge when
> the differences between the commits are relatively small,
> there will likely be many directories that have the same
> SHA-1.  In these cases we can avoid hitting the ODB multiple
> times for the same SHA-1.
>
> This patch handles n=2 and n=3 cases and simply copies the
> data rather than repeating the fill_tree_descriptor().
>
> ================
> On the Windows repo (500K trees, 3.1M files, 450MB index),
> this reduced the overall time by 0.75 seconds when cycling
> between 2 commits with a single file difference.
>
> (avg) before: 22.699
> (avg) after:  21.955
> ================
>
> ================
> Using the p0004-read-tree test (posted earlier this week)
> with 1M files on Linux:
>
> before:
> $ ./p0004-read-tree.sh
> 0004.5: switch work1 work2 (1003037)      11.99(8.12+3.32)
> 0004.6: switch commit aliases (1003037)   11.95(8.20+3.14)
>
> after:
> $ ./p0004-read-tree.sh
> 0004.5: switch work1 work2 (1003037)      11.02(7.71+2.78)
> 0004.6: switch commit aliases (1003037)   10.95(7.57+2.82)
> ================
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  unpack-trees.c | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 3a8ee19..143c5d9 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -531,6 +531,11 @@ static int switch_cache_bottom(struct traverse_info *info)
>  	return ret;
>  }
>
> +static inline int are_same_oid(struct name_entry *name_j, struct name_entry *name_k)
> +{
> +	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
> +}
> +
>  static int traverse_trees_recursive(int n, unsigned long dirmask,
>  				    unsigned long df_conflicts,
>  				    struct name_entry *names,
> @@ -554,10 +559,20 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
>  	newinfo.df_conflicts |= df_conflicts;
>
>  	for (i = 0; i < n; i++, dirmask >>= 1) {
> -		const unsigned char *sha1 = NULL;
> -		if (dirmask & 1)
> -			sha1 = names[i].oid->hash;
> -		buf[i] = fill_tree_descriptor(t+i, sha1);
> +		if (i > 0 && are_same_oid(&names[i], &names[i-1])) {
> +			/* implicitly borrow buf[i-1] inside tree_desc[i] */
> +			memcpy(&t[i], &t[i-1], sizeof(struct tree_desc));

An assignment would be simpler:

	t[i] = t[i - 1];

> +			buf[i] = NULL;
> +		} else if (i > 1 && are_same_oid(&names[i], &names[i-2])) {
> +			/* implicitly borrow buf[i-2] inside tree_desc[i] */
> +			memcpy(&t[i], &t[i-2], sizeof(struct tree_desc));

Similar case.

> +			buf[i] = NULL;

What makes the previous two entries special, or differently put: Why not
just check *all* previous entries?  MAX_UNPACK_TREES is 8; the number of
comparisons would just about double in the worst case and stay the same
for three trees or less.  The order of four trees or more wouldn't
matter anymore.

> +		} else {
> +			const unsigned char *sha1 = NULL;
> +			if (dirmask & 1)
> +				sha1 = names[i].oid->hash;
> +			buf[i] = fill_tree_descriptor(t+i, sha1);
> +		}
>  	}
>
>  	bottom = switch_cache_bottom(&newinfo);

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [PATCH v2] unpack-trees: avoid duplicate ODB lookups during checkout
  2017-04-08 14:06 ` René Scharfe
@ 2017-04-10 20:55   ` Jeff King
  2017-04-10 21:28     ` Jeff Hostetler
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2017-04-10 20:55 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, git, gitster, Jeff Hostetler

On Sat, Apr 08, 2017 at 04:06:41PM +0200, René Scharfe wrote:

> > +		} else if (i > 1 && are_same_oid(&names[i], &names[i-2])) {
> > +			/* implicitly borrow buf[i-2] inside tree_desc[i] */
> > +			memcpy(&t[i], &t[i-2], sizeof(struct tree_desc));
>
> Similar case.
>
> > +			buf[i] = NULL;
>
> What makes the previous two entries special, or differently put: Why not
> just check *all* previous entries?  MAX_UNPACK_TREES is 8; the number of
> comparisons would just about double in the worst case and stay the same
> for three trees or less.  The order of four trees or more wouldn't
> matter anymore.

If I understand correctly, we've got N (up to 8) trees, and we want to
find sets of duplicates. Adjacency doesn't matter. Aren't our options
basically:

  - look through all previous entries naively, O(N^2)

  - sort the list, O(N log N); but we need the original order, so we'd
    have to unsort at the end, too

  - use a hash table, O(N) but with O(N) extra storage

I know N=8 isn't very big algorithmically speaking, but it would feel
ugly to introduce something quadratic. Especially the limit of 8 seems
rather arbitrary. In normal cases we see at most a 3-way merge. Beyond
that we're in octopus territory, and 8 is way too low there (I think the
octopus strategy just unpacks one at a time and barfs if there are any
conflicts).

I assume the rationale behind "check the last 2" is just "this
optimization will kick in reliably for all sane cases", weird octopus
unpack-trees be damned (though reading ca885a4fe, it sounds like 4-trees
can actually happen, but I'm not clear on how).

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [PATCH v2] unpack-trees: avoid duplicate ODB lookups during checkout
  2017-04-10 20:55 ` Jeff King
@ 2017-04-10 21:28   ` Jeff Hostetler
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff Hostetler @ 2017-04-10 21:28 UTC (permalink / raw)
  To: Jeff King, René Scharfe; +Cc: git, gitster, Jeff Hostetler

On 4/10/2017 4:55 PM, Jeff King wrote:
> On Sat, Apr 08, 2017 at 04:06:41PM +0200, René Scharfe wrote:
>
>>> +		} else if (i > 1 && are_same_oid(&names[i], &names[i-2])) {
>>> +			/* implicitly borrow buf[i-2] inside tree_desc[i] */
>>> +			memcpy(&t[i], &t[i-2], sizeof(struct tree_desc));
>>
>> Similar case.
>>
>>> +			buf[i] = NULL;
>>
>> What makes the previous two entries special, or differently put: Why not
>> just check *all* previous entries?  MAX_UNPACK_TREES is 8; the number of
>> comparisons would just about double in the worst case and stay the same
>> for three trees or less.  The order of four trees or more wouldn't
>> matter anymore.
>
> If I understand correctly, we've got N (up to 8) trees, and we want to
> find sets of duplicates. Adjacency doesn't matter. Aren't our options
> basically:
>
>   - look through all previous entries naively, O(N^2)
>
>   - sort the list, O(N log N); but we need the original order, so we'd
>     have to unsort at the end, too
>
>   - use a hash table, O(N) but with O(N) extra storage
>
> I know N=8 isn't very big algorithmically speaking, but it would feel
> ugly to introduce something quadratic. Especially the limit of 8 seems
> rather arbitrary. In normal cases we see at most a 3-way merge. Beyond
> that we're in octopus territory, and 8 is way too low there (I think the
> octopus strategy just unpacks one at a time and barfs if there are any
> conflicts).
>
> I assume the rationale behind "check the last 2" is just "this
> optimization will kick in reliably for all sane cases", weird octopus
> unpack-trees be damned (though reading ca885a4fe, it sounds like 4-trees
> can actually happen, but I'm not clear on how).

Yeah, I didn't want to pay for the obscure (n >= 4) cases with
any of the above.  Doing 2 or 3 gets us checkout and merge;
these are the common usages.

Jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [PATCH v2] unpack-trees: avoid duplicate ODB lookups during checkout
  2017-04-08 14:06 ` René Scharfe
  2017-04-10 20:55   ` Jeff King
@ 2017-04-10 21:26   ` Jeff Hostetler
  2017-04-10 23:09     ` René Scharfe
  1 sibling, 1 reply; 8+ messages in thread
From: Jeff Hostetler @ 2017-04-10 21:26 UTC (permalink / raw)
  To: René Scharfe, git; +Cc: gitster, peff, Jeff Hostetler

On 4/8/2017 10:06 AM, René Scharfe wrote:
> Am 07.04.2017 um 17:53 schrieb git@jeffhostetler.com:
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Teach traverse_trees_recursive() to not do redundant ODB
>> lookups when both directories refer to the same OID.
>>
>> In operations such as read-tree, checkout, and merge when
>> the differences between the commits are relatively small,
>> there will likely be many directories that have the same
>> SHA-1.  In these cases we can avoid hitting the ODB multiple
>> times for the same SHA-1.
>>
>> This patch handles n=2 and n=3 cases and simply copies the
>> data rather than repeating the fill_tree_descriptor().
>>
>> ================
>> On the Windows repo (500K trees, 3.1M files, 450MB index),
>> this reduced the overall time by 0.75 seconds when cycling
>> between 2 commits with a single file difference.
>>
>> (avg) before: 22.699
>> (avg) after:  21.955
>> ================
>>
>> ================
>> Using the p0004-read-tree test (posted earlier this week)
>> with 1M files on Linux:
>>
>> before:
>> $ ./p0004-read-tree.sh
>> 0004.5: switch work1 work2 (1003037)      11.99(8.12+3.32)
>> 0004.6: switch commit aliases (1003037)   11.95(8.20+3.14)
>>
>> after:
>> $ ./p0004-read-tree.sh
>> 0004.5: switch work1 work2 (1003037)      11.02(7.71+2.78)
>> 0004.6: switch commit aliases (1003037)   10.95(7.57+2.82)
>> ================
>>
>> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
>> ---
>>  unpack-trees.c | 23 +++++++++++++++++++----
>>  1 file changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/unpack-trees.c b/unpack-trees.c
>> index 3a8ee19..143c5d9 100644
>> --- a/unpack-trees.c
>> +++ b/unpack-trees.c
>> @@ -531,6 +531,11 @@ static int switch_cache_bottom(struct traverse_info *info)
>>  	return ret;
>>  }
>>
>> +static inline int are_same_oid(struct name_entry *name_j, struct name_entry *name_k)
>> +{
>> +	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
>> +}
>> +
>>  static int traverse_trees_recursive(int n, unsigned long dirmask,
>>  				    unsigned long df_conflicts,
>>  				    struct name_entry *names,
>> @@ -554,10 +559,20 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
>>  	newinfo.df_conflicts |= df_conflicts;
>>
>>  	for (i = 0; i < n; i++, dirmask >>= 1) {
>> -		const unsigned char *sha1 = NULL;
>> -		if (dirmask & 1)
>> -			sha1 = names[i].oid->hash;
>> -		buf[i] = fill_tree_descriptor(t+i, sha1);
>> +		if (i > 0 && are_same_oid(&names[i], &names[i-1])) {
>> +			/* implicitly borrow buf[i-1] inside tree_desc[i] */
>> +			memcpy(&t[i], &t[i-1], sizeof(struct tree_desc));
>
> An assignment would be simpler:
>
> 	t[i] = t[i - 1];

True, but this might be a coin toss.  Maybe we should
see what the generated assembly looks like for each ??

>
>> +			buf[i] = NULL;
>> +		} else if (i > 1 && are_same_oid(&names[i], &names[i-2])) {
>> +			/* implicitly borrow buf[i-2] inside tree_desc[i] */
>> +			memcpy(&t[i], &t[i-2], sizeof(struct tree_desc));
>
> Similar case.
>
>> +			buf[i] = NULL;
>
> What makes the previous two entries special, or differently put: Why not
> just check *all* previous entries?  MAX_UNPACK_TREES is 8; the number of
> comparisons would just about double in the worst case and stay the same
> for three trees or less.  The order of four trees or more wouldn't
> matter anymore.

I tried to hit the common cases.  This loop runs a lot
and I didn't want to put an O(n^2) thing in there to
look for any matching peer.  Most of the time we are
in a simple 2 or 3 way effort.  I didn't want to pay
for the looping/branching overhead for the obscure [4..8]
efforts.

>
>> +		} else {
>> +			const unsigned char *sha1 = NULL;
>> +			if (dirmask & 1)
>> +				sha1 = names[i].oid->hash;
>> +			buf[i] = fill_tree_descriptor(t+i, sha1);
>> +		}
>>  	}
>>
>>  	bottom = switch_cache_bottom(&newinfo);

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [PATCH v2] unpack-trees: avoid duplicate ODB lookups during checkout
  2017-04-10 21:26 ` Jeff Hostetler
@ 2017-04-10 23:09   ` René Scharfe
  2017-04-11 20:42     ` Jeff Hostetler
  0 siblings, 1 reply; 8+ messages in thread
From: René Scharfe @ 2017-04-10 23:09 UTC (permalink / raw)
  To: Jeff Hostetler, git; +Cc: gitster, peff, Jeff Hostetler

Am 10.04.2017 um 23:26 schrieb Jeff Hostetler:
> On 4/8/2017 10:06 AM, René Scharfe wrote:
>> Am 07.04.2017 um 17:53 schrieb git@jeffhostetler.com:
>>> +			/* implicitly borrow buf[i-1] inside tree_desc[i] */
>>> +			memcpy(&t[i], &t[i-1], sizeof(struct tree_desc));
>>
>> An assignment would be simpler:
>>
>> 	t[i] = t[i - 1];
>
> True, but this might be a coin toss.  Maybe we should
> see what the generated assembly looks like for each ??

Clang, GCC and ICC inline that memcpy call; the assembly output is the
same for both variants: https://godbolt.org/g/1q0YwK.  I guess you worry
about compilers that are just bad at struct assignments (i.e. worse than
calling memcpy)?  Do you have examples (just curious)?

Assignments are easier on the eye of human readers in any case, and
there is no way to silently get the size wrong.

> I tried to hit the common cases.  This loop runs a lot
> and I didn't want to put an O(n^2) thing in there to
> look for any matching peer.  Most of the time we are
> in a simple 2 or 3 way effort.  I didn't want to pay
> for the looping/branching overhead for the obscure [4..8]
> efforts.

Makes sense, and it's a nice heuristic.  Perhaps it would be a good idea
to document these choices in a comment?

René

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [PATCH v2] unpack-trees: avoid duplicate ODB lookups during checkout
  2017-04-10 23:09 ` René Scharfe
@ 2017-04-11 20:42   ` Jeff Hostetler
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff Hostetler @ 2017-04-11 20:42 UTC (permalink / raw)
  To: René Scharfe, git; +Cc: gitster, peff, Jeff Hostetler

On 4/10/2017 7:09 PM, René Scharfe wrote:
> Am 10.04.2017 um 23:26 schrieb Jeff Hostetler:
>> On 4/8/2017 10:06 AM, René Scharfe wrote:
>>> Am 07.04.2017 um 17:53 schrieb git@jeffhostetler.com:
>>>> +			/* implicitly borrow buf[i-1] inside tree_desc[i] */
>>>> +			memcpy(&t[i], &t[i-1], sizeof(struct tree_desc));
>>>
>>> An assignment would be simpler:
>>>
>>> 	t[i] = t[i - 1];
>>
>> True, but this might be a coin toss.  Maybe we should
>> see what the generated assembly looks like for each ??
>
> Clang, GCC and ICC inline that memcpy call; the assembly output is the
> same for both variants: https://godbolt.org/g/1q0YwK.  I guess you worry
> about compilers that are just bad at struct assignments (i.e. worse than
> calling memcpy)?  Do you have examples (just curious)?

Nice website!  Really!

Yes, my concern was that structure copies would be done field by
field rather than as a block copy.  No, I don't have any examples
-- maybe just some very old brain cells.... :-)

And I just checked VS2015: the structure copy is a few
instructions shorter, but roughly the same.

>
> Assignments are easier on the eye of human readers in any case, and
> there is no way to silently get the size wrong.

Agreed.  Thanks.

>
>> I tried to hit the common cases.  This loop runs a lot
>> and I didn't want to put an O(n^2) thing in there to
>> look for any matching peer.  Most of the time we are
>> in a simple 2 or 3 way effort.  I didn't want to pay
>> for the looping/branching overhead for the obscure [4..8]
>> efforts.
>
> Makes sense, and it's a nice heuristic.  Perhaps it would be a good idea
> to document these choices in a comment?

Good point.  Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-04-11 20:42 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-07 15:53 [PATCH v2] unpack-trees: avoid duplicate ODB lookups during checkout git
2017-04-07 15:53 ` git
2017-04-08 14:06   ` René Scharfe
2017-04-10 20:55     ` Jeff King
2017-04-10 21:28       ` Jeff Hostetler
2017-04-10 21:26     ` Jeff Hostetler
2017-04-10 23:09       ` René Scharfe
2017-04-11 20:42         ` Jeff Hostetler