From: Jeff Hostetler <git@jeffhostetler.com>
To: "René Scharfe" <l.s.r@web.de>,
"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com>,
git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH 1/1] diffcore-rename: speed up register_rename_src
Date: Mon, 10 Jun 2019 10:23:11 -0400 [thread overview]
Message-ID: <bf63c8cc-7146-6f13-cb77-785763fb77c8@jeffhostetler.com> (raw)
In-Reply-To: <bcaff0ce-365d-1d17-6ea0-7e3b52fc928b@web.de>
On 6/8/2019 6:27 PM, René Scharfe wrote:
> Am 08.06.19 um 16:42 schrieb Jeff Hostetler via GitGitGadget:
>> From: Jeff Hostetler <jeffhost@microsoft.com>
>>
>> Teach register_rename_src() to see if new file pair
>> can simply be appended to the rename_src[] array before
>> performing the binary search to find the proper insertion
>> point.
>>
>> This is a performance optimization. This routine is called
>> during run_diff_files in status and the caller is iterating
>> over the sorted index, so we should expect to be able to
>> append in the normal case. The existing insert logic is
>> preserved so we don't have to assume that, but simply take
>> advantage of it if possible.
>>
>> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
>> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>> ---
>> diffcore-rename.c | 13 +++++++++++++
>> 1 file changed, 13 insertions(+)
>>
>> diff --git a/diffcore-rename.c b/diffcore-rename.c
>> index 07bd34b631..5bfc5f6c22 100644
>> --- a/diffcore-rename.c
>> +++ b/diffcore-rename.c
>> @@ -82,6 +82,18 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
>>
>> first = 0;
>> last = rename_src_nr;
>> +
>> + if (last > 0) {
>> + struct diff_rename_src *src = &(rename_src[last-1]);
>> + int cmp = strcmp(one->path, src->p->one->path);
>> + if (!cmp)
>> + return src;
>> + if (cmp > 0) {
>> + first = last;
>> + goto append_it;
>
> The goto is not necessary from a logical point of view; the binary
> search is skipped even without it because the loop condition below is
> not met anymore.
Perhaps this is just a personal style thing, but I prefer the
goto because it makes it clear that we know the answer and are
done searching. I know it's a small thing and I know we do it
all the time, but setting state here in just the right way as
to defeat the loop would work, but is a little less clear. But
again, that is a personal style thing I guess.
>> + }
>
> You could decrement "last" at this point as a micro-optimization, to use
> all three possible outcomes of the comparison to make some progress.
>
> Not sure if any of that would have a _measurable_ impact, though, so I
> don't mind the patch going in as is.
Yes, it looks like we could decrement "last" here. And yes, I
suspect it would have minimal impact. I'll pass on that in this
series to keep it focused on the advertised goal of simply
appending if possible.
>
>> + }
>> +
>> while (last > first) {
>> int next = (last + first) >> 1;
>
> Hmm, "last" and "first" are ints as well, so this will give weird results
> when "last" > INT_MAX / 2. I thought 19716b21a4 ("cleanup: fix possible
> overflow errors in binary search", 2017-10-08) got us rid of those, but
> git grep -n '(.*+.*) >> 1' actually finds some more cases:
>
> builtin/ls-files.c:376: int next = (last + first) >> 1;
> diffcore-rename.c:26: int next = (last + first) >> 1;
> diffcore-rename.c:86: int next = (last + first) >> 1;
> dir.c:704: int cmp, next = (last + first) >> 1;
> name-hash.c:349: int mid = (begin + end) >> 1;
> read-cache.c:552: int next = (last + first) >> 1;
> sh-i18n--envsubst.c:252: size_t j = (j1 + j2) >> 1;
>
> (Not caused by this patch, of course, just noticed it in the context.)
>
>> struct diff_rename_src *src = &(rename_src[next]);
>> @@ -95,6 +107,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
>> first = next+1;
>> }
>>
>> +append_it:
>> /* insert to make it at "first" */
>> ALLOC_GROW(rename_src, rename_src_nr + 1, rename_src_alloc);
>> rename_src_nr++;
>>
>
> Anyway, this here should work as well (and fix the possible overflow),
> but may be too terse and quirky:
>
> diff --git a/diffcore-rename.c b/diffcore-rename.c
> index 07bd34b631..f2a9007e08 100644
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> @@ -76,14 +76,13 @@ static int rename_src_nr, rename_src_alloc;
>
> static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
> {
> - int first, last;
> + int first, last, next;
> struct diff_filespec *one = p->one;
> unsigned short score = p->score;
>
> first = 0;
> last = rename_src_nr;
> - while (last > first) {
> - int next = (last + first) >> 1;
> + for (next = last - 1; last > first; next = first + (last - first) / 2) {
> struct diff_rename_src *src = &(rename_src[next]);
> int cmp = strcmp(one->path, src->p->one->path);
> if (!cmp)
>
I'd like to limit the scope of this patch series to just
the advertised topic and save the arithmetic fixes to
another patch series like 19716b21a4.
Jeff
next prev parent reply other threads:[~2019-06-10 14:23 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-08 14:42 [PATCH 0/1] Optimize run_diff_files()' rename detection Johannes Schindelin via GitGitGadget
2019-06-08 14:42 ` [PATCH 1/1] diffcore-rename: speed up register_rename_src Jeff Hostetler via GitGitGadget
2019-06-08 22:27 ` René Scharfe
2019-06-10 14:23 ` Jeff Hostetler [this message]
2019-06-10 12:26 ` SZEDER Gábor
2019-06-10 14:55 ` Jeff Hostetler
2019-06-10 14:43 ` [PATCH 0/1] Optimize run_diff_files()' rename detection Jeff Hostetler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bf63c8cc-7146-6f13-cb77-785763fb77c8@jeffhostetler.com \
--to=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=jeffhost@microsoft.com \
--cc=l.s.r@web.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).