git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] cache-tree: remove use of strbuf_addf in update_one
@ 2017-08-10 18:47 Kevin Willford
  2017-08-10 18:58 ` Stefan Beller
  2017-08-11 13:08 ` René Scharfe
  0 siblings, 2 replies; 7+ messages in thread
From: Kevin Willford @ 2017-08-10 18:47 UTC
  To: git; +Cc: peff, gitster, peartben, Kevin Willford

String formatting can be a performance issue when there are
hundreds of thousands of trees.

Stop using strbuf_addf and instead add the strings and
characters individually.

There are a limited number of modes, so add a switch for the
known ones, with a default case that falls back to strbuf_addf
for any mode git does not know about.

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
---
 cache-tree.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/cache-tree.c b/cache-tree.c
index 2440d1dc89..41744b3db7 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -390,7 +390,29 @@ static int update_one(struct cache_tree *it,
 			continue;
 
 		strbuf_grow(&buffer, entlen + 100);
-		strbuf_addf(&buffer, "%o %.*s%c", mode, entlen, path + baselen, '\0');
+
+		switch (mode) {
+		case 0100644:
+			strbuf_add(&buffer, "100644 ", 7);
+			break;
+		case 0100664:
+			strbuf_add(&buffer, "100664 ", 7);
+			break;
+		case 0100755:
+			strbuf_add(&buffer, "100755 ", 7);
+			break;
+		case 0120000:
+			strbuf_add(&buffer, "120000 ", 7);
+			break;
+		case 0160000:
+			strbuf_add(&buffer, "160000 ", 7);
+			break;
+		default:
+			strbuf_addf(&buffer, "%o ", mode);
+			break;
+		}
+		strbuf_add(&buffer, path + baselen, entlen);
+		strbuf_addch(&buffer, '\0');
 		strbuf_add(&buffer, sha1, 20);
 
 #if DEBUG
-- 
2.14.0.rc0.286.g44127d70e4
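One way to check that the switch in the patch above is a pure optimization is to confirm that it emits exactly the same bytes as the old `"%o %.*s%c"` format. The sketch below does that with a fixed-size `struct buf` standing in for git's `struct strbuf` (an assumption for illustration; `buf_add`, `entry_formatted`, `entry_fast` and `same_bytes` are hypothetical names). It leaves out the 20-byte object name that follows each entry.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fixed-size stand-in for git's struct strbuf (assumption: the real
 * strbuf grows on demand; this one just asserts it has room). */
struct buf {
	char data[512];
	size_t len;
};

static void buf_add(struct buf *b, const void *p, size_t n)
{
	assert(b->len + n <= sizeof(b->data));
	memcpy(b->data + b->len, p, n);
	b->len += n;
}

/* Old path: one formatted write of "<mode> <name>\0". */
static void entry_formatted(struct buf *b, unsigned mode,
			    const char *name, int namelen)
{
	size_t room = sizeof(b->data) - b->len;
	int n = snprintf(b->data + b->len, room, "%o %.*s%c",
			 mode, namelen, name, '\0');

	assert(n >= 0 && (size_t)n < room);  /* no truncation */
	b->len += (size_t)n;
}

/* New path, mirroring the patch: known modes are copied as literals,
 * anything else falls back to formatting just the mode. */
static void entry_fast(struct buf *b, unsigned mode,
		       const char *name, int namelen)
{
	switch (mode) {
	case 0100644: buf_add(b, "100644 ", 7); break;
	case 0100664: buf_add(b, "100664 ", 7); break;
	case 0100755: buf_add(b, "100755 ", 7); break;
	case 0120000: buf_add(b, "120000 ", 7); break;
	case 0160000: buf_add(b, "160000 ", 7); break;
	default: {
		char tmp[16];
		int n = snprintf(tmp, sizeof(tmp), "%o ", mode);

		assert(n > 0 && (size_t)n < sizeof(tmp));
		buf_add(b, tmp, (size_t)n);
		break;
	}
	}
	buf_add(b, name, (size_t)namelen);
	buf_add(b, "", 1);  /* the NUL that separates name and object ID */
}

/* Returns 1 if both paths produce byte-identical output. */
static int same_bytes(unsigned mode, const char *name)
{
	struct buf a = { .len = 0 }, f = { .len = 0 };
	int namelen = (int)strlen(name);

	entry_formatted(&a, mode, name, namelen);
	entry_fast(&f, mode, name, namelen);
	return a.len == f.len && !memcmp(a.data, f.data, a.len);
}
```

Modes outside the switch, such as 040000 for a directory, go through the default branch and still come out as `40000 `, matching the original format.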



* Re: [PATCH] cache-tree: remove use of strbuf_addf in update_one
  2017-08-10 18:47 [PATCH] cache-tree: remove use of strbuf_addf in update_one Kevin Willford
@ 2017-08-10 18:58 ` Stefan Beller
  2017-08-10 19:03   ` Jeff King
  2017-08-11 13:08 ` René Scharfe
  1 sibling, 1 reply; 7+ messages in thread
From: Stefan Beller @ 2017-08-10 18:58 UTC
  To: Kevin Willford
  Cc: git@vger.kernel.org, Jeff King, Junio C Hamano, Ben Peart,
	Kevin Willford

On Thu, Aug 10, 2017 at 11:47 AM, Kevin Willford <kcwillford@gmail.com> wrote:
> String formatting can be a performance issue when there are
> hundreds of thousands of trees.

When changing this for the sake of performance, could you give
an example (which kind of repository you need for this to become
a bottleneck? I presume the large Windows repo? Or can I
reproduce it with a small repo such as linux.git or even git.git?)
and some numbers on how this improves the performance?

> Stop using strbuf_addf and instead add the strings and
> characters individually.
>
> There are a limited number of modes, so add a switch for the
> known ones, with a default case that falls back to strbuf_addf
> for any mode git does not know about.
>
> Signed-off-by: Kevin Willford <kewillf@microsoft.com>
> ---
>  cache-tree.c | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/cache-tree.c b/cache-tree.c
> index 2440d1dc89..41744b3db7 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -390,7 +390,29 @@ static int update_one(struct cache_tree *it,
>                         continue;
>
>                 strbuf_grow(&buffer, entlen + 100);
> -               strbuf_addf(&buffer, "%o %.*s%c", mode, entlen, path + baselen, '\0');
> +
> +               switch (mode) {
> +               case 0100644:
> +                       strbuf_add(&buffer, "100644 ", 7);
> +                       break;
> +               case 0100664:
> +                       strbuf_add(&buffer, "100664 ", 7);
> +                       break;
> +               case 0100755:
> +                       strbuf_add(&buffer, "100755 ", 7);
> +                       break;
> +               case 0120000:
> +                       strbuf_add(&buffer, "120000 ", 7);
> +                       break;
> +               case 0160000:
> +                       strbuf_add(&buffer, "160000 ", 7);
> +                       break;

Maybe it is worth spelling out the modes symbolically instead
of numerically, e.g. S_IFGITLINK.

> +               default:
> +                       strbuf_addf(&buffer, "%o ", mode);

Given the repository you are measuring, maybe we could
get away with fewer entries here and only take the 2 or
3 most used entries and special case them?

Or in case this is assumed to be the exhaustive list,
we could issue a warning here?

Thanks,
Stefan
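Stefan's two suggestions can be combined: name the modes with symbolic constants and special-case only a few common ones, leaving everything else to the formatted fallback. A minimal sketch under those assumptions: `mode_prefix` is a hypothetical helper, and the choice of "common" modes is a guess rather than a measurement. It also special-cases `S_IFDIR`, on the assumption that subtree entries reach this code with that mode; if they do, the switch as posted would send every subtree through the `strbuf_addf` default.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Fallbacks in case <sys/stat.h> hides these (e.g. strict ISO mode). */
#ifndef S_IFREG
#define S_IFREG 0100000
#endif
#ifndef S_IFDIR
#define S_IFDIR 0040000
#endif
#ifndef S_IFGITLINK
#define S_IFGITLINK 0160000  /* git's gitlink (submodule) mode */
#endif

/* Hypothetical helper: writes "<octal mode> " into out (at least 8
 * bytes) and returns the number of bytes, excluding any NUL. */
static size_t mode_prefix(unsigned mode, char *out, size_t outlen)
{
	assert(outlen >= 8);
	switch (mode) {
	case S_IFREG | 0644:
		memcpy(out, "100644 ", 7);
		return 7;
	case S_IFREG | 0755:
		memcpy(out, "100755 ", 7);
		return 7;
	case S_IFDIR:
		memcpy(out, "40000 ", 6);  /* trees print without a leading 0 */
		return 6;
	default: {
		/* everything else, including S_IFGITLINK, is formatted */
		int n = snprintf(out, outlen, "%o ", mode);

		assert(n > 0 && (size_t)n < outlen);
		return (size_t)n;
	}
	}
}
```

The trade-off is the one Peff raises below: each extra case saves a format call for that mode, but adds lines that have to stay in sync with the `%o` output.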


* Re: [PATCH] cache-tree: remove use of strbuf_addf in update_one
  2017-08-10 18:58 ` Stefan Beller
@ 2017-08-10 19:03   ` Jeff King
  2017-08-10 19:57     ` Kevin Willford
  2017-08-14 18:51     ` Junio C Hamano
  0 siblings, 2 replies; 7+ messages in thread
From: Jeff King @ 2017-08-10 19:03 UTC
  To: Stefan Beller
  Cc: Kevin Willford, git@vger.kernel.org, Junio C Hamano, Ben Peart,
	Kevin Willford

On Thu, Aug 10, 2017 at 11:58:34AM -0700, Stefan Beller wrote:

> On Thu, Aug 10, 2017 at 11:47 AM, Kevin Willford <kcwillford@gmail.com> wrote:
> > String formatting can be a performance issue when there are
> > hundreds of thousands of trees.
> 
> When changing this for the sake of performance, could you give
> an example (which kind of repository you need for this to become
> a bottleneck? I presume the large Windows repo? Or can I
> reproduce it with a small repo such as linux.git or even git.git?)
> and some numbers on how this improves the performance?

I was about to say the same thing. Normally I don't mind a small
optimization without numbers if the result is obviously an improvement.

But in this case the result is a lot less readable, and it's not
entirely clear to me that it would always be an improvement (we now
always run 3 strbuf calls instead of one, and have to check the length
for each one).

What I'm wondering specifically is if vsnprintf() on Kevin's platform
(which I'll assume is Windows) is slow, and we would do better to
replace it with a faster compat/ routine.

-Peff
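Peff's alternative, avoiding the generic printf machinery instead of enumerating modes, might look like a small special-purpose octal formatter. This is a hypothetical sketch (`octal_mode_prefix` is not an existing git or compat/ function): it handles any mode without a lookup table and without parsing a format string, at the cost of a short digit loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes `mode` as octal digits followed by a
 * space into out, which must hold at least 12 bytes (11 octal digits
 * cover any 32-bit value), and returns the number of bytes written.
 * No format string is parsed and no terminating NUL is written. */
static size_t octal_mode_prefix(uint32_t mode, char *out)
{
	char digits[11];
	size_t n = 0, i;

	/* Emit digits least-significant first; do/while so 0 prints "0". */
	do {
		digits[n++] = (char)('0' + (mode & 7));
		mode >>= 3;
	} while (mode);

	/* Copy them out most-significant first, then the separator. */
	for (i = 0; i < n; i++)
		out[i] = digits[n - 1 - i];
	out[n] = ' ';
	return n + 1;
}
```

Whether this beats a tuned vsnprintf on a given platform is exactly the kind of question the requested numbers would answer.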


* Re: [PATCH] cache-tree: remove use of strbuf_addf in update_one
  2017-08-10 19:03   ` Jeff King
@ 2017-08-10 19:57     ` Kevin Willford
  2017-08-10 20:33       ` Stefan Beller
  2017-08-14 18:51     ` Junio C Hamano
  1 sibling, 1 reply; 7+ messages in thread
From: Kevin Willford @ 2017-08-10 19:57 UTC
  To: Jeff King, Stefan Beller
  Cc: git@vger.kernel.org, Junio C Hamano, Ben Peart, Kevin Willford



On 8/10/2017 3:03 PM, Jeff King wrote:
> On Thu, Aug 10, 2017 at 11:58:34AM -0700, Stefan Beller wrote:
>
>> On Thu, Aug 10, 2017 at 11:47 AM, Kevin Willford <kcwillford@gmail.com> wrote:
>>> String formatting can be a performance issue when there are
>>> hundreds of thousands of trees.
>> When changing this for the sake of performance, could you give
>> an example (which kind of repository you need for this to become
>> a bottleneck? I presume the large Windows repo? Or can I
>> reproduce it with a small repo such as linux.git or even git.git?)
>> and some numbers on how this improves the performance?
> I was about to say the same thing. Normally I don't mind a small
> optimization without numbers if the result is obviously an improvement.
>
> But in this case the result is a lot less readable, and it's not
> entirely clear to me that it would always be an improvement (we now
> always run 3 strbuf calls instead of one, and have to check the length
> for each one).
>
> What I'm wondering specifically is if vsnprintf() on Kevin's platform
> (which I'll assume is Windows) is slow, and we would do better to
> replace it with a faster compat/ routine.
>
> -Peff

The strbuf_add call essentially only has to do a memcpy, whereas
strbuf_addf has to parse the format string, determine the types,
convert the data, and then copy the result into the buffer.  That
could be made faster with a better compat/ routine, but I fear it
would still be far slower than a length check and a memcpy.

void strbuf_add(struct strbuf *sb, const void *data, size_t len)
{
	strbuf_grow(sb, len);
	memcpy(sb->buf + sb->len, data, len);
	strbuf_setlen(sb, sb->len + len);
}

Here are some of the performance numbers from the Windows repo.
I will work on writing a perf test for this change so that we
have a better idea of its impact on smaller repos.

Command      | Without fix | With fix
-------------+-------------+---------
git checkout |     36.08 s |  33.34 s
git checkout |     32.54 s |  28.26 s
git checkout |     44.10 s |  38.13 s
git merge    |     32.90 s |  30.56 s
git rebase   |     46.14 s |  42.18 s
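The relative savings implied by the table are (without - with) / without. A small sketch that does just that arithmetic over the five reported timings (the names `timings`, `percent_saved` and `saved_range` are illustrative); by this measure the savings run from about 7.1% for git merge to about 13.5% for the third git checkout.

```c
#include <assert.h>
#include <stddef.h>

/* Timings in seconds from the table above: {without fix, with fix}. */
static const double timings[][2] = {
	{ 36.08, 33.34 }, /* git checkout */
	{ 32.54, 28.26 }, /* git checkout */
	{ 44.10, 38.13 }, /* git checkout */
	{ 32.90, 30.56 }, /* git merge */
	{ 46.14, 42.18 }, /* git rebase */
};

/* Percentage of wall-clock time saved relative to the unpatched run. */
static double percent_saved(double before, double after)
{
	return 100.0 * (before - after) / before;
}

/* Smallest and largest saving across all rows of the table. */
static void saved_range(double *lo, double *hi)
{
	size_t i;

	*lo = 100.0;
	*hi = 0.0;
	for (i = 0; i < sizeof(timings) / sizeof(timings[0]); i++) {
		double p = percent_saved(timings[i][0], timings[i][1]);

		if (p < *lo)
			*lo = p;
		if (p > *hi)
			*hi = p;
	}
}
```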





* Re: [PATCH] cache-tree: remove use of strbuf_addf in update_one
  2017-08-10 19:57     ` Kevin Willford
@ 2017-08-10 20:33       ` Stefan Beller
  0 siblings, 0 replies; 7+ messages in thread
From: Stefan Beller @ 2017-08-10 20:33 UTC
  To: Kevin Willford
  Cc: Jeff King, git@vger.kernel.org, Junio C Hamano, Ben Peart,
	Kevin Willford

On Thu, Aug 10, 2017 at 12:57 PM, Kevin Willford <kcwillford@gmail.com> wrote:
> Here are some of the performance numbers from the Windows repo.
> I will work on writing a perf test for this change so that we
> have a better idea of its impact on smaller repos.
>
> Command      | Without fix | With fix
> -------------+-------------+---------
> git checkout |     36.08 s |  33.34 s
> git checkout |     32.54 s |  28.26 s
> git checkout |     44.10 s |  38.13 s
> git merge    |     32.90 s |  30.56 s
> git rebase   |     46.14 s |  42.18 s
>

~10-15% is impressive for this patch, I certainly did not
expect as much. Thanks for providing the numbers!

Stefan


* Re: [PATCH] cache-tree: remove use of strbuf_addf in update_one
  2017-08-10 18:47 [PATCH] cache-tree: remove use of strbuf_addf in update_one Kevin Willford
  2017-08-10 18:58 ` Stefan Beller
@ 2017-08-11 13:08 ` René Scharfe
  1 sibling, 0 replies; 7+ messages in thread
From: René Scharfe @ 2017-08-11 13:08 UTC
  To: Kevin Willford, git; +Cc: peff, gitster, peartben, Kevin Willford

Am 10.08.2017 um 20:47 schrieb Kevin Willford:
> String formatting can be a performance issue when there are
> hundreds of thousands of trees.
> 
> Stop using strbuf_addf and instead add the strings and
> characters individually.
> 
> There are a limited number of modes, so add a switch for the
> known ones, with a default case that falls back to strbuf_addf
> for any mode git does not know about.
> 
> Signed-off-by: Kevin Willford <kewillf@microsoft.com>
> ---
>   cache-tree.c | 24 +++++++++++++++++++++++-
>   1 file changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/cache-tree.c b/cache-tree.c
> index 2440d1dc89..41744b3db7 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -390,7 +390,29 @@ static int update_one(struct cache_tree *it,
>   			continue;
>   
>   		strbuf_grow(&buffer, entlen + 100);
> -		strbuf_addf(&buffer, "%o %.*s%c", mode, entlen, path + baselen, '\0');
> +
> +		switch (mode) {
> +		case 0100644:
> +			strbuf_add(&buffer, "100644 ", 7);
> +			break;
> +		case 0100664:
> +			strbuf_add(&buffer, "100664 ", 7);
> +			break;
> +		case 0100755:
> +			strbuf_add(&buffer, "100755 ", 7);
> +			break;
> +		case 0120000:
> +			strbuf_add(&buffer, "120000 ", 7);
> +			break;
> +		case 0160000:
> +			strbuf_add(&buffer, "160000 ", 7);
> +			break;

You can avoid specifying the string length by using strbuf_addstr.  The
compiler can determine that value; the resulting object code should be
the same with -O2.
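A minimal sketch of that suggestion, using a fixed-size `struct buf` as a stand-in for `struct strbuf` (an assumption for illustration; `buf_add`, `buf_addstr` and `add_mode_prefix` are hypothetical names). `buf_addstr` is modeled on git's `strbuf_addstr`, which forwards `strlen(s)` to `strbuf_add`; whether the compiler really folds `strlen("100644 ")` to 7 depends on the compiler and optimization level.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fixed-size stand-in for struct strbuf (assumption, for illustration). */
struct buf {
	char data[64];
	size_t len;
};

static void buf_add(struct buf *b, const void *p, size_t n)
{
	assert(b->len + n <= sizeof(b->data));
	memcpy(b->data + b->len, p, n);
	b->len += n;
}

/* Forwards strlen(s); with a literal argument and inlining, an
 * optimizing compiler can compute the length at compile time. */
static inline void buf_addstr(struct buf *b, const char *s)
{
	buf_add(b, s, strlen(s));
}

static void add_mode_prefix(struct buf *b, unsigned mode)
{
	switch (mode) {
	case 0100644:
		buf_addstr(b, "100644 ");  /* no hand-counted length */
		break;
	case 0100755:
		buf_addstr(b, "100755 ");
		break;
	default: {
		char tmp[16];
		int n = snprintf(tmp, sizeof(tmp), "%o ", mode);

		assert(n > 0 && (size_t)n < sizeof(tmp));
		buf_add(b, tmp, (size_t)n);
		break;
	}
	}
}
```

Dropping the literal `7` removes one way for a future edit to the string and its length to drift apart.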

> +		default:
> +			strbuf_addf(&buffer, "%o ", mode);
> +			break;
> +		}
> +		strbuf_add(&buffer, path + baselen, entlen);
> +		strbuf_addch(&buffer, '\0');

How much of the performance improvement is due to these two (especially
%.*s)?  Looking forward to the perf script to find out myself. :)
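One way to probe that question is a micro-benchmark that moves one piece at a time out of snprintf. This is a hypothetical harness, not the perf test Kevin mentioned: `write_entry` and `time_variant` are illustrative names, the entry name "Makefile" is an arbitrary example, and `clock()` timings of a tight loop are only indicative of behaviour on a real repository.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Which parts of an entry go through snprintf (hypothetical variants). */
enum variant {
	ALL_PRINTF,       /* old code: "%o %.*s%c" in one call */
	MODE_PRINTF_ONLY, /* "%o " formatted, name and NUL copied */
	NO_PRINTF         /* fixed mode string, name and NUL copied */
};

/* Writes "<mode> <name>\0" into out and returns its length (the 20-byte
 * object name that follows in a real tree entry is left out). */
static size_t write_entry(enum variant v, char *out, size_t outlen,
			  unsigned mode, const char *name, int namelen)
{
	size_t n = 0;
	int r;

	switch (v) {
	case ALL_PRINTF:
		r = snprintf(out, outlen, "%o %.*s%c", mode, namelen, name, '\0');
		assert(r >= 0 && (size_t)r < outlen);
		return (size_t)r;
	case MODE_PRINTF_ONLY:
		r = snprintf(out, outlen, "%o ", mode);
		assert(r >= 0 && (size_t)r < outlen);
		n = (size_t)r;
		break;
	case NO_PRINTF:
		/* like the patch's fast path, but only for one mode */
		assert(mode == 0100644);
		memcpy(out, "100644 ", 7);
		n = 7;
		break;
	}
	assert(n + (size_t)namelen + 1 <= outlen);
	memcpy(out + n, name, (size_t)namelen);
	out[n + (size_t)namelen] = '\0';
	return n + (size_t)namelen + 1;
}

/* Times `iterations` calls of one variant with clock(); the volatile
 * accumulator discourages the compiler from deleting the loop. */
static double time_variant(enum variant v, long iterations)
{
	char out[256];
	volatile size_t sink = 0;
	clock_t start = clock();
	long i;

	for (i = 0; i < iterations; i++)
		sink += write_entry(v, out, sizeof(out), 0100644, "Makefile", 8);
	(void)sink;
	return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_variant` for each variant with a few million iterations and comparing the results would suggest whether most of the cost sits in the `%o` conversion or in the `%.*s%c` part.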

>   		strbuf_add(&buffer, sha1, 20);
>   
>   #if DEBUG
> 


* Re: [PATCH] cache-tree: remove use of strbuf_addf in update_one
  2017-08-10 19:03   ` Jeff King
  2017-08-10 19:57     ` Kevin Willford
@ 2017-08-14 18:51     ` Junio C Hamano
  1 sibling, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2017-08-14 18:51 UTC
  To: Jeff King
  Cc: Stefan Beller, Kevin Willford, git@vger.kernel.org, Ben Peart,
	Kevin Willford

Jeff King <peff@peff.net> writes:

> On Thu, Aug 10, 2017 at 11:58:34AM -0700, Stefan Beller wrote:
>
>> On Thu, Aug 10, 2017 at 11:47 AM, Kevin Willford <kcwillford@gmail.com> wrote:
>> > String formatting can be a performance issue when there are
>> > hundreds of thousands of trees.
>> 
>> When changing this for the sake of performance, could you give
>> an example (which kind of repository you need for this to become
>> a bottleneck? I presume the large Windows repo? Or can I
>> reproduce it with a small repo such as linux.git or even git.git?)
>> and some numbers on how this improves the performance?
>
> I was about to say the same thing. Normally I don't mind a small
> optimization without numbers if the result is obviously an improvement.
>
> But in this case the result is a lot less readable, and it's not
> entirely clear to me that it would always be an improvement (we now
> always run 3 strbuf calls instead of one, and have to check the length
> for each one).
>
> What I'm wondering specifically is if vsnprintf() on Kevin's platform
> (which I'll assume is Windows) is slow, and we would do better to
> replace it with a faster compat/ routine.

Yeah, I had the same reaction.


