git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
* [PATCH 0/1] Optimize run_diff_files()' rename detection
@ 2019-06-08 14:42 Johannes Schindelin via GitGitGadget
  2019-06-08 14:42 ` [PATCH 1/1] diffcore-rename: speed up register_rename_src Jeff Hostetler via GitGitGadget
  2019-06-10 14:43 ` [PATCH 0/1] Optimize run_diff_files()' rename detection Jeff Hostetler
  0 siblings, 2 replies; 7+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2019-06-08 14:42 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

Just another patch from Git for Windows' branch thicket...

Jeff Hostetler (1):
  diffcore-rename: speed up register_rename_src

 diffcore-rename.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)


base-commit: 8104ec994ea3849a968b4667d072fedd1e688642
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-142%2Fdscho%2Fregister_rename_src-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-142/dscho/register_rename_src-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/142
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/1] diffcore-rename: speed up register_rename_src
  2019-06-08 14:42 [PATCH 0/1] Optimize run_diff_files()' rename detection Johannes Schindelin via GitGitGadget
@ 2019-06-08 14:42 ` Jeff Hostetler via GitGitGadget
  2019-06-08 22:27   ` René Scharfe
  2019-06-10 12:26   ` SZEDER Gábor
  2019-06-10 14:43 ` [PATCH 0/1] Optimize run_diff_files()' rename detection Jeff Hostetler
  1 sibling, 2 replies; 7+ messages in thread
From: Jeff Hostetler via GitGitGadget @ 2019-06-08 14:42 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach register_rename_src() to see if new file pair
can simply be appended to the rename_src[] array before
performing the binary search to find the proper insertion
point.

This is a performance optimization.  This routine is called
during run_diff_files in status and the caller is iterating
over the sorted index, so we should expect to be able to
append in the normal case.  The existing insert logic is
preserved so we don't have to assume that, but simply take
advantage of it if possible.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 diffcore-rename.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 07bd34b631..5bfc5f6c22 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -82,6 +82,18 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
 
 	first = 0;
 	last = rename_src_nr;
+
+	if (last > 0) {
+		struct diff_rename_src *src = &(rename_src[last-1]);
+		int cmp = strcmp(one->path, src->p->one->path);
+		if (!cmp)
+			return src;
+		if (cmp > 0) {
+			first = last;
+			goto append_it;
+		}
+	}
+
 	while (last > first) {
 		int next = (last + first) >> 1;
 		struct diff_rename_src *src = &(rename_src[next]);
@@ -95,6 +107,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
 		first = next+1;
 	}
 
+append_it:
 	/* insert to make it at "first" */
 	ALLOC_GROW(rename_src, rename_src_nr + 1, rename_src_alloc);
 	rename_src_nr++;
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] diffcore-rename: speed up register_rename_src
  2019-06-08 14:42 ` [PATCH 1/1] diffcore-rename: speed up register_rename_src Jeff Hostetler via GitGitGadget
@ 2019-06-08 22:27   ` René Scharfe
  2019-06-10 14:23     ` Jeff Hostetler
  2019-06-10 12:26   ` SZEDER Gábor
  1 sibling, 1 reply; 7+ messages in thread
From: René Scharfe @ 2019-06-08 22:27 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget, git; +Cc: Junio C Hamano, Jeff Hostetler

Am 08.06.19 um 16:42 schrieb Jeff Hostetler via GitGitGadget:
> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Teach register_rename_src() to see if new file pair
> can simply be appended to the rename_src[] array before
> performing the binary search to find the proper insertion
> point.
>
> This is a performance optimization.  This routine is called
> during run_diff_files in status and the caller is iterating
> over the sorted index, so we should expect to be able to
> append in the normal case.  The existing insert logic is
> preserved so we don't have to assume that, but simply take
> advantage of it if possible.
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  diffcore-rename.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/diffcore-rename.c b/diffcore-rename.c
> index 07bd34b631..5bfc5f6c22 100644
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> @@ -82,6 +82,18 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
>
>  	first = 0;
>  	last = rename_src_nr;
> +
> +	if (last > 0) {
> +		struct diff_rename_src *src = &(rename_src[last-1]);
> +		int cmp = strcmp(one->path, src->p->one->path);
> +		if (!cmp)
> +			return src;
> +		if (cmp > 0) {
> +			first = last;
> +			goto append_it;

The goto is not necessary from a logical point of view; the binary
search is skipped even without it because the loop condition below is
not met anymore.

> +		}

You could decrement "last" at this point as a micro-optimization, to use
all three possible outcomes of the comparison to make some progress.

Not sure if any of that would have a _measurable_ impact, though, so I
don't mind the patch going in as is.

> +	}
> +
>  	while (last > first) {
>  		int next = (last + first) >> 1;

Hmm, "last" and "first" are ints as well, so this will give weird results
when "last" > INT_MAX / 2.  I thought 19716b21a4 ("cleanup: fix possible
overflow errors in binary search", 2017-10-08) got us rid of those, but
git grep -n '(.*+.*) >> 1' actually finds some more cases:

   builtin/ls-files.c:376:         int next = (last + first) >> 1;
   diffcore-rename.c:26:           int next = (last + first) >> 1;
   diffcore-rename.c:86:           int next = (last + first) >> 1;
   dir.c:704:              int cmp, next = (last + first) >> 1;
   name-hash.c:349:                        int mid = (begin + end) >> 1;
   read-cache.c:552:               int next = (last + first) >> 1;
   sh-i18n--envsubst.c:252:          size_t j = (j1 + j2) >> 1;

(Not caused by this patch, of course, just noticed it in the context.)

>  		struct diff_rename_src *src = &(rename_src[next]);
> @@ -95,6 +107,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
>  		first = next+1;
>  	}
>
> +append_it:
>  	/* insert to make it at "first" */
>  	ALLOC_GROW(rename_src, rename_src_nr + 1, rename_src_alloc);
>  	rename_src_nr++;
>

Anyway, this here should work as well (and fix the possible overflow),
but may be too terse and quirky:

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 07bd34b631..f2a9007e08 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -76,14 +76,13 @@ static int rename_src_nr, rename_src_alloc;

 static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
 {
-	int first, last;
+	int first, last, next;
 	struct diff_filespec *one = p->one;
 	unsigned short score = p->score;

 	first = 0;
 	last = rename_src_nr;
-	while (last > first) {
-		int next = (last + first) >> 1;
+	for (next = last - 1; last > first; next = first + (last - first) / 2) {
 		struct diff_rename_src *src = &(rename_src[next]);
 		int cmp = strcmp(one->path, src->p->one->path);
 		if (!cmp)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] diffcore-rename: speed up register_rename_src
  2019-06-08 14:42 ` [PATCH 1/1] diffcore-rename: speed up register_rename_src Jeff Hostetler via GitGitGadget
  2019-06-08 22:27   ` René Scharfe
@ 2019-06-10 12:26   ` SZEDER Gábor
  2019-06-10 14:55     ` Jeff Hostetler
  1 sibling, 1 reply; 7+ messages in thread
From: SZEDER Gábor @ 2019-06-10 12:26 UTC (permalink / raw)
  To: Jeff Hostetler via GitGitGadget; +Cc: git, Junio C Hamano, Jeff Hostetler

On Sat, Jun 08, 2019 at 07:42:42AM -0700, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhost@microsoft.com>
> 
> Teach register_rename_src() to see if new file pair
> can simply be appended to the rename_src[] array before
> performing the binary search to find the proper insertion
> point.
> 
> This is a performance optimization.  This routine is called
> during run_diff_files in status and the caller is iterating
> over the sorted index, so we should expect to be able to
> append in the normal case.  The existing insert logic is
> preserved so we don't have to assume that, but simply take
> advantage of it if possible.

Could you add a command and performance figures to the commit message
to show off the benefits of this patch?

From the description it's clear that it's a performance optimization,
but it's unclear whether it has a noticeable, or at least measurable
effect, or it's "only" a micro-optimization.

I tried something more substantial than a simple 'git status':

  # without this patch
  $ time perf record -g ./git log --oneline -m --name-only v2.20.0 >/dev/null
  [ ... ]
  
  real    2m4.320s
  user    2m0.913s
  sys     0m2.284s
  $ perf report -g none |grep -E '(diffcore_rename|register_rename_src)'
      52.40%     0.79%  git      git                 [.] diffcore_rename
       0.01%     0.01%  git      git                 [.] register_rename_src

but it looks like that while more than half of the considerable
runtime is spent detecting renames, the time spent in
register_rename_src(), the function optimized in this patch, is
negligible.


> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  diffcore-rename.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/diffcore-rename.c b/diffcore-rename.c
> index 07bd34b631..5bfc5f6c22 100644
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> @@ -82,6 +82,18 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
>  
>  	first = 0;
>  	last = rename_src_nr;
> +
> +	if (last > 0) {
> +		struct diff_rename_src *src = &(rename_src[last-1]);
> +		int cmp = strcmp(one->path, src->p->one->path);
> +		if (!cmp)
> +			return src;
> +		if (cmp > 0) {
> +			first = last;
> +			goto append_it;
> +		}
> +	}
> +
>  	while (last > first) {
>  		int next = (last + first) >> 1;
>  		struct diff_rename_src *src = &(rename_src[next]);
> @@ -95,6 +107,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
>  		first = next+1;
>  	}
>  
> +append_it:
>  	/* insert to make it at "first" */
>  	ALLOC_GROW(rename_src, rename_src_nr + 1, rename_src_alloc);
>  	rename_src_nr++;
> -- 
> gitgitgadget

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] diffcore-rename: speed up register_rename_src
  2019-06-08 22:27   ` René Scharfe
@ 2019-06-10 14:23     ` Jeff Hostetler
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff Hostetler @ 2019-06-10 14:23 UTC (permalink / raw)
  To: René Scharfe, Jeff Hostetler via GitGitGadget, git
  Cc: Junio C Hamano, Jeff Hostetler



On 6/8/2019 6:27 PM, René Scharfe wrote:
> Am 08.06.19 um 16:42 schrieb Jeff Hostetler via GitGitGadget:
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Teach register_rename_src() to see if new file pair
>> can simply be appended to the rename_src[] array before
>> performing the binary search to find the proper insertion
>> point.
>>
>> This is a performance optimization.  This routine is called
>> during run_diff_files in status and the caller is iterating
>> over the sorted index, so we should expect to be able to
>> append in the normal case.  The existing insert logic is
>> preserved so we don't have to assume that, but simply take
>> advantage of it if possible.
>>
>> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
>> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>> ---
>>   diffcore-rename.c | 13 +++++++++++++
>>   1 file changed, 13 insertions(+)
>>
>> diff --git a/diffcore-rename.c b/diffcore-rename.c
>> index 07bd34b631..5bfc5f6c22 100644
>> --- a/diffcore-rename.c
>> +++ b/diffcore-rename.c
>> @@ -82,6 +82,18 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
>>
>>   	first = 0;
>>   	last = rename_src_nr;
>> +
>> +	if (last > 0) {
>> +		struct diff_rename_src *src = &(rename_src[last-1]);
>> +		int cmp = strcmp(one->path, src->p->one->path);
>> +		if (!cmp)
>> +			return src;
>> +		if (cmp > 0) {
>> +			first = last;
>> +			goto append_it;
> 
> The goto is not necessary from a logical point of view; the binary
> search is skipped even without it because the loop condition below is
> not met anymore.

Perhaps this is just a personal style thing, but I prefer the
goto because it makes it clear that we know the answer and are
done searching.  I know it's a small thing and I know we do it
all the time, but setting state here in just the right way as
to defeat the loop would work, but is a little less clear.  But
again, that is a personal style thing I guess.


>> +		}
> 
> You could decrement "last" at this point as a micro-optimization, to use
> all three possible outcomes of the comparison to make some progress.
> 
> Not sure if any of that would have a _measurable_ impact, though, so I
> don't mind the patch going in as is.

Yes, it looks like we could decrement "last" here.  And yes, I
suspect it would have minimal impact.  I'll pass on that in this
series to keep it focused on the advertised goal of simply
appending if possible.

> 
>> +	}
>> +
>>   	while (last > first) {
>>   		int next = (last + first) >> 1;
> 
> Hmm, "last" and "first" are ints as well, so this will give weird results
> when "last" > INT_MAX / 2.  I thought 19716b21a4 ("cleanup: fix possible
> overflow errors in binary search", 2017-10-08) got us rid of those, but
> git grep -n '(.*+.*) >> 1' actually finds some more cases:
> 
>     builtin/ls-files.c:376:         int next = (last + first) >> 1;
>     diffcore-rename.c:26:           int next = (last + first) >> 1;
>     diffcore-rename.c:86:           int next = (last + first) >> 1;
>     dir.c:704:              int cmp, next = (last + first) >> 1;
>     name-hash.c:349:                        int mid = (begin + end) >> 1;
>     read-cache.c:552:               int next = (last + first) >> 1;
>     sh-i18n--envsubst.c:252:          size_t j = (j1 + j2) >> 1;
> 
> (Not caused by this patch, of course, just noticed it in the context.)
> 
>>   		struct diff_rename_src *src = &(rename_src[next]);
>> @@ -95,6 +107,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
>>   		first = next+1;
>>   	}
>>
>> +append_it:
>>   	/* insert to make it at "first" */
>>   	ALLOC_GROW(rename_src, rename_src_nr + 1, rename_src_alloc);
>>   	rename_src_nr++;
>>
> 
> Anyway, this here should work as well (and fix the possible overflow),
> but may be too terse and quirky:
> 
> diff --git a/diffcore-rename.c b/diffcore-rename.c
> index 07bd34b631..f2a9007e08 100644
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> @@ -76,14 +76,13 @@ static int rename_src_nr, rename_src_alloc;
> 
>   static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
>   {
> -	int first, last;
> +	int first, last, next;
>   	struct diff_filespec *one = p->one;
>   	unsigned short score = p->score;
> 
>   	first = 0;
>   	last = rename_src_nr;
> -	while (last > first) {
> -		int next = (last + first) >> 1;
> +	for (next = last - 1; last > first; next = first + (last - first) / 2) {
>   		struct diff_rename_src *src = &(rename_src[next]);
>   		int cmp = strcmp(one->path, src->p->one->path);
>   		if (!cmp)
> 

I'd like to limit the scope of this patch series to just
the advertised topic and save the arithmetic fixes to
another patch series like 19716b21a4.

Jeff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/1] Optimize run_diff_files()' rename detection
  2019-06-08 14:42 [PATCH 0/1] Optimize run_diff_files()' rename detection Johannes Schindelin via GitGitGadget
  2019-06-08 14:42 ` [PATCH 1/1] diffcore-rename: speed up register_rename_src Jeff Hostetler via GitGitGadget
@ 2019-06-10 14:43 ` Jeff Hostetler
  1 sibling, 0 replies; 7+ messages in thread
From: Jeff Hostetler @ 2019-06-10 14:43 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget, git; +Cc: Junio C Hamano



On 6/8/2019 10:42 AM, Johannes Schindelin via GitGitGadget wrote:
> Just another patch from Git for Windows' branch thicket...
> 
> Jeff Hostetler (1):
>    diffcore-rename: speed up register_rename_src
> 
>   diffcore-rename.c | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
> 
> 
> base-commit: 8104ec994ea3849a968b4667d072fedd1e688642
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-142%2Fdscho%2Fregister_rename_src-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-142/dscho/register_rename_src-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/142
> 

For the sake of full disclosure, we added this patch
to Git for Windows in December 2016.

It was discussed on the mailing list in April 2017 [1] but
it was shelved for various reasons.

Let me put this one on hold while I dig up my notes from 2016
and re-review of the original mailing list suggestions and
see where I want to take this patch going forward.

Jeff


[1] 
https://public-inbox.org/git/20170418194421.22453-1-git@jeffhostetler.com/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] diffcore-rename: speed up register_rename_src
  2019-06-10 12:26   ` SZEDER Gábor
@ 2019-06-10 14:55     ` Jeff Hostetler
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff Hostetler @ 2019-06-10 14:55 UTC (permalink / raw)
  To: SZEDER Gábor, Jeff Hostetler via GitGitGadget
  Cc: git, Junio C Hamano, Jeff Hostetler



On 6/10/2019 8:26 AM, SZEDER Gábor wrote:
> On Sat, Jun 08, 2019 at 07:42:42AM -0700, Jeff Hostetler via GitGitGadget wrote:
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Teach register_rename_src() to see if new file pair
>> can simply be appended to the rename_src[] array before
>> performing the binary search to find the proper insertion
>> point.
>>
>> This is a performance optimization.  This routine is called
>> during run_diff_files in status and the caller is iterating
>> over the sorted index, so we should expect to be able to
>> append in the normal case.  The existing insert logic is
>> preserved so we don't have to assume that, but simply take
>> advantage of it if possible.
> 
> Could you add a command and performance figures to the commit message
> to show off the benefits of this patch?
> 
>  From the description it's clear that it's a performance optimization,
> but it's unclear whether it has a noticeable, or at least measurable
> effect, or it's "only" a micro-optimization.
> 
> I tried something more substantial than a simple 'git status':
> 
>    # without this patch
>    $ time perf record -g ./git log --oneline -m --name-only v2.20.0 >/dev/null
>    [ ... ]
>    
>    real    2m4.320s
>    user    2m0.913s
>    sys     0m2.284s
>    $ perf report -g none |grep -E '(diffcore_rename|register_rename_src)'
>        52.40%     0.79%  git      git                 [.] diffcore_rename
>         0.01%     0.01%  git      git                 [.] register_rename_src
> 
> but it looks like that while more than half of the considerable
> runtime is spent detecting renames, the time spent in
> register_rename_src(), the function optimized in this patch, is
> negligible.
> 

I just wanted to send an ACK.  I'll include perf numbers
and more clarity after I dig up my notes on this.

Jeff


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, back to index

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-08 14:42 [PATCH 0/1] Optimize run_diff_files()' rename detection Johannes Schindelin via GitGitGadget
2019-06-08 14:42 ` [PATCH 1/1] diffcore-rename: speed up register_rename_src Jeff Hostetler via GitGitGadget
2019-06-08 22:27   ` René Scharfe
2019-06-10 14:23     ` Jeff Hostetler
2019-06-10 12:26   ` SZEDER Gábor
2019-06-10 14:55     ` Jeff Hostetler
2019-06-10 14:43 ` [PATCH 0/1] Optimize run_diff_files()' rename detection Jeff Hostetler

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox