* [PATCH 0/1] Optimize run_diff_files()' rename detection @ 2019-06-08 14:42 Johannes Schindelin via GitGitGadget 2019-06-08 14:42 ` [PATCH 1/1] diffcore-rename: speed up register_rename_src Jeff Hostetler via GitGitGadget 2019-06-10 14:43 ` [PATCH 0/1] Optimize run_diff_files()' rename detection Jeff Hostetler 0 siblings, 2 replies; 7+ messages in thread From: Johannes Schindelin via GitGitGadget @ 2019-06-08 14:42 UTC (permalink / raw) To: git; +Cc: Junio C Hamano Just another patch from Git for Windows' branch thicket... Jeff Hostetler (1): diffcore-rename: speed up register_rename_src diffcore-rename.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) base-commit: 8104ec994ea3849a968b4667d072fedd1e688642 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-142%2Fdscho%2Fregister_rename_src-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-142/dscho/register_rename_src-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/142 -- gitgitgadget ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/1] diffcore-rename: speed up register_rename_src 2019-06-08 14:42 [PATCH 0/1] Optimize run_diff_files()' rename detection Johannes Schindelin via GitGitGadget @ 2019-06-08 14:42 ` Jeff Hostetler via GitGitGadget 2019-06-08 22:27 ` René Scharfe 2019-06-10 12:26 ` SZEDER Gábor 2019-06-10 14:43 ` [PATCH 0/1] Optimize run_diff_files()' rename detection Jeff Hostetler 1 sibling, 2 replies; 7+ messages in thread From: Jeff Hostetler via GitGitGadget @ 2019-06-08 14:42 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Jeff Hostetler From: Jeff Hostetler <jeffhost@microsoft.com> Teach register_rename_src() to see if new file pair can simply be appended to the rename_src[] array before performing the binary search to find the proper insertion point. This is a performance optimization. This routine is called during run_diff_files in status and the caller is iterating over the sorted index, so we should expect to be able to append in the normal case. The existing insert logic is preserved so we don't have to assume that, but simply take advantage of it if possible. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- diffcore-rename.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/diffcore-rename.c b/diffcore-rename.c index 07bd34b631..5bfc5f6c22 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -82,6 +82,18 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p) first = 0; last = rename_src_nr; + + if (last > 0) { + struct diff_rename_src *src = &(rename_src[last-1]); + int cmp = strcmp(one->path, src->p->one->path); + if (!cmp) + return src; + if (cmp > 0) { + first = last; + goto append_it; + } + } + while (last > first) { int next = (last + first) >> 1; struct diff_rename_src *src = &(rename_src[next]); @@ -95,6 +107,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p) first = next+1; } +append_it: /* insert to make it at "first" */ ALLOC_GROW(rename_src, rename_src_nr + 1, rename_src_alloc); rename_src_nr++; -- gitgitgadget ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/1] diffcore-rename: speed up register_rename_src 2019-06-08 14:42 ` [PATCH 1/1] diffcore-rename: speed up register_rename_src Jeff Hostetler via GitGitGadget @ 2019-06-08 22:27 ` René Scharfe 2019-06-10 14:23 ` Jeff Hostetler 2019-06-10 12:26 ` SZEDER Gábor 1 sibling, 1 reply; 7+ messages in thread From: René Scharfe @ 2019-06-08 22:27 UTC (permalink / raw) To: Jeff Hostetler via GitGitGadget, git; +Cc: Junio C Hamano, Jeff Hostetler Am 08.06.19 um 16:42 schrieb Jeff Hostetler via GitGitGadget: > From: Jeff Hostetler <jeffhost@microsoft.com> > > Teach register_rename_src() to see if new file pair > can simply be appended to the rename_src[] array before > performing the binary search to find the proper insertion > point. > > This is a performance optimization. This routine is called > during run_diff_files in status and the caller is iterating > over the sorted index, so we should expect to be able to > append in the normal case. The existing insert logic is > preserved so we don't have to assume that, but simply take > advantage of it if possible. > > Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> > --- > diffcore-rename.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/diffcore-rename.c b/diffcore-rename.c > index 07bd34b631..5bfc5f6c22 100644 > --- a/diffcore-rename.c > +++ b/diffcore-rename.c > @@ -82,6 +82,18 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p) > > first = 0; > last = rename_src_nr; > + > + if (last > 0) { > + struct diff_rename_src *src = &(rename_src[last-1]); > + int cmp = strcmp(one->path, src->p->one->path); > + if (!cmp) > + return src; > + if (cmp > 0) { > + first = last; > + goto append_it; The goto is not necessary from a logical point of view; the binary search is skipped even without it because the loop condition below is not met anymore. > + } You could decrement "last" at this point as a micro-optimization, to use all three possible outcomes of the comparison to make some progress. Not sure if any of that would have a _measurable_ impact, though, so I don't mind the patch going in as is. > + } > + > while (last > first) { > int next = (last + first) >> 1; Hmm, "last" and "first" are ints as well, so this will give weird results when "last" > INT_MAX / 2. I thought 19716b21a4 ("cleanup: fix possible overflow errors in binary search", 2017-10-08) got us rid of those, but git grep -n '(.*+.*) >> 1' actually finds some more cases: builtin/ls-files.c:376: int next = (last + first) >> 1; diffcore-rename.c:26: int next = (last + first) >> 1; diffcore-rename.c:86: int next = (last + first) >> 1; dir.c:704: int cmp, next = (last + first) >> 1; name-hash.c:349: int mid = (begin + end) >> 1; read-cache.c:552: int next = (last + first) >> 1; sh-i18n--envsubst.c:252: size_t j = (j1 + j2) >> 1; (Not caused by this patch, of course, just noticed it in the context.) > struct diff_rename_src *src = &(rename_src[next]); > @@ -95,6 +107,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p) > first = next+1; > } > > +append_it: > /* insert to make it at "first" */ > ALLOC_GROW(rename_src, rename_src_nr + 1, rename_src_alloc); > rename_src_nr++; > Anyway, this here should work as well (and fix the possible overflow), but may be too terse and quirky: diff --git a/diffcore-rename.c b/diffcore-rename.c index 07bd34b631..f2a9007e08 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -76,14 +76,13 @@ static int rename_src_nr, rename_src_alloc; static struct diff_rename_src *register_rename_src(struct diff_filepair *p) { - int first, last; + int first, last, next; struct diff_filespec *one = p->one; unsigned short score = p->score; first = 0; last = rename_src_nr; - while (last > first) { - int next = (last + first) >> 1; + for (next = last - 1; last > first; next = first + (last - first) / 2) { struct diff_rename_src *src = &(rename_src[next]); int cmp = strcmp(one->path, src->p->one->path); if (!cmp) ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/1] diffcore-rename: speed up register_rename_src 2019-06-08 22:27 ` René Scharfe @ 2019-06-10 14:23 ` Jeff Hostetler 0 siblings, 0 replies; 7+ messages in thread From: Jeff Hostetler @ 2019-06-10 14:23 UTC (permalink / raw) To: René Scharfe, Jeff Hostetler via GitGitGadget, git Cc: Junio C Hamano, Jeff Hostetler On 6/8/2019 6:27 PM, René Scharfe wrote: > Am 08.06.19 um 16:42 schrieb Jeff Hostetler via GitGitGadget: >> From: Jeff Hostetler <jeffhost@microsoft.com> >> >> Teach register_rename_src() to see if new file pair >> can simply be appended to the rename_src[] array before >> performing the binary search to find the proper insertion >> point. >> >> This is a performance optimization. This routine is called >> during run_diff_files in status and the caller is iterating >> over the sorted index, so we should expect to be able to >> append in the normal case. The existing insert logic is >> preserved so we don't have to assume that, but simply take >> advantage of it if possible. >> >> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> >> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> >> --- >> diffcore-rename.c | 13 +++++++++++++ >> 1 file changed, 13 insertions(+) >> >> diff --git a/diffcore-rename.c b/diffcore-rename.c >> index 07bd34b631..5bfc5f6c22 100644 >> --- a/diffcore-rename.c >> +++ b/diffcore-rename.c >> @@ -82,6 +82,18 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p) >> >> first = 0; >> last = rename_src_nr; >> + >> + if (last > 0) { >> + struct diff_rename_src *src = &(rename_src[last-1]); >> + int cmp = strcmp(one->path, src->p->one->path); >> + if (!cmp) >> + return src; >> + if (cmp > 0) { >> + first = last; >> + goto append_it; > > The goto is not necessary from a logical point of view; the binary > search is skipped even without it because the loop condition below is > not met anymore. Perhaps this is just a personal style thing, but I prefer the goto because it makes it clear that we know the answer and are done searching. I know it's a small thing and I know we do it all the time, but setting state here in just the right way as to defeat the loop would work, but is a little less clear. But again, that is a personal style thing I guess. >> + } > > You could decrement "last" at this point as a micro-optimization, to use > all three possible outcomes of the comparison to make some progress. > > Not sure if any of that would have a _measurable_ impact, though, so I > don't mind the patch going in as is. Yes, it looks like we could decrement "last" here. And yes, I suspect it would have minimal impact. I'll pass on that in this series to keep it focused on the advertised goal of simply appending if possible. > >> + } >> + >> while (last > first) { >> int next = (last + first) >> 1; > > Hmm, "last" and "first" are ints as well, so this will give weird results > when "last" > INT_MAX / 2. I thought 19716b21a4 ("cleanup: fix possible > overflow errors in binary search", 2017-10-08) got us rid of those, but > git grep -n '(.*+.*) >> 1' actually finds some more cases: > > builtin/ls-files.c:376: int next = (last + first) >> 1; > diffcore-rename.c:26: int next = (last + first) >> 1; > diffcore-rename.c:86: int next = (last + first) >> 1; > dir.c:704: int cmp, next = (last + first) >> 1; > name-hash.c:349: int mid = (begin + end) >> 1; > read-cache.c:552: int next = (last + first) >> 1; > sh-i18n--envsubst.c:252: size_t j = (j1 + j2) >> 1; > > (Not caused by this patch, of course, just noticed it in the context.) > >> struct diff_rename_src *src = &(rename_src[next]); >> @@ -95,6 +107,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p) >> first = next+1; >> } >> >> +append_it: >> /* insert to make it at "first" */ >> ALLOC_GROW(rename_src, rename_src_nr + 1, rename_src_alloc); >> rename_src_nr++; >> > > Anyway, this here should work as well (and fix the possible overflow), > but may be too terse and quirky: > > diff --git a/diffcore-rename.c b/diffcore-rename.c > index 07bd34b631..f2a9007e08 100644 > --- a/diffcore-rename.c > +++ b/diffcore-rename.c > @@ -76,14 +76,13 @@ static int rename_src_nr, rename_src_alloc; > > static struct diff_rename_src *register_rename_src(struct diff_filepair *p) > { > - int first, last; > + int first, last, next; > struct diff_filespec *one = p->one; > unsigned short score = p->score; > > first = 0; > last = rename_src_nr; > - while (last > first) { > - int next = (last + first) >> 1; > + for (next = last - 1; last > first; next = first + (last - first) / 2) { > struct diff_rename_src *src = &(rename_src[next]); > int cmp = strcmp(one->path, src->p->one->path); > if (!cmp) > I'd like to limit the scope of this patch series to just the advertised topic and save the arithmetic fixes to another patch series like 19716b21a4. Jeff ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/1] diffcore-rename: speed up register_rename_src 2019-06-08 14:42 ` [PATCH 1/1] diffcore-rename: speed up register_rename_src Jeff Hostetler via GitGitGadget 2019-06-08 22:27 ` René Scharfe @ 2019-06-10 12:26 ` SZEDER Gábor 2019-06-10 14:55 ` Jeff Hostetler 1 sibling, 1 reply; 7+ messages in thread From: SZEDER Gábor @ 2019-06-10 12:26 UTC (permalink / raw) To: Jeff Hostetler via GitGitGadget; +Cc: git, Junio C Hamano, Jeff Hostetler On Sat, Jun 08, 2019 at 07:42:42AM -0700, Jeff Hostetler via GitGitGadget wrote: > From: Jeff Hostetler <jeffhost@microsoft.com> > > Teach register_rename_src() to see if new file pair > can simply be appended to the rename_src[] array before > performing the binary search to find the proper insertion > point. > > This is a performance optimization. This routine is called > during run_diff_files in status and the caller is iterating > over the sorted index, so we should expect to be able to > append in the normal case. The existing insert logic is > preserved so we don't have to assume that, but simply take > advantage of it if possible. Could you add a command and performance figures to the commit message to show off the benefits of this patch? From the description it's clear that it's a performance optimization, but it's unclear whether it has a noticeable, or at least measurable effect, or it's "only" a micro-optimization. I tried something more substantial than a simple 'git status': # without this patch $ time perf record -g ./git log --oneline -m --name-only v2.20.0 >/dev/null [ ... ] real 2m4.320s user 2m0.913s sys 0m2.284s $ perf report -g none |grep -E '(diffcore_rename|register_rename_src)' 52.40% 0.79% git git [.] diffcore_rename 0.01% 0.01% git git [.] register_rename_src but it looks like that while more than half of the considerable runtime is spent detecting renames, the time spent in register_rename_src(), the function optimized in this patch, is negligible. > Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> > --- > diffcore-rename.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/diffcore-rename.c b/diffcore-rename.c > index 07bd34b631..5bfc5f6c22 100644 > --- a/diffcore-rename.c > +++ b/diffcore-rename.c > @@ -82,6 +82,18 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p) > > first = 0; > last = rename_src_nr; > + > + if (last > 0) { > + struct diff_rename_src *src = &(rename_src[last-1]); > + int cmp = strcmp(one->path, src->p->one->path); > + if (!cmp) > + return src; > + if (cmp > 0) { > + first = last; > + goto append_it; > + } > + } > + > while (last > first) { > int next = (last + first) >> 1; > struct diff_rename_src *src = &(rename_src[next]); > @@ -95,6 +107,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p) > first = next+1; > } > > +append_it: > /* insert to make it at "first" */ > ALLOC_GROW(rename_src, rename_src_nr + 1, rename_src_alloc); > rename_src_nr++; > -- > gitgitgadget ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/1] diffcore-rename: speed up register_rename_src 2019-06-10 12:26 ` SZEDER Gábor @ 2019-06-10 14:55 ` Jeff Hostetler 0 siblings, 0 replies; 7+ messages in thread From: Jeff Hostetler @ 2019-06-10 14:55 UTC (permalink / raw) To: SZEDER Gábor, Jeff Hostetler via GitGitGadget Cc: git, Junio C Hamano, Jeff Hostetler On 6/10/2019 8:26 AM, SZEDER Gábor wrote: > On Sat, Jun 08, 2019 at 07:42:42AM -0700, Jeff Hostetler via GitGitGadget wrote: >> From: Jeff Hostetler <jeffhost@microsoft.com> >> >> Teach register_rename_src() to see if new file pair >> can simply be appended to the rename_src[] array before >> performing the binary search to find the proper insertion >> point. >> >> This is a performance optimization. This routine is called >> during run_diff_files in status and the caller is iterating >> over the sorted index, so we should expect to be able to >> append in the normal case. The existing insert logic is >> preserved so we don't have to assume that, but simply take >> advantage of it if possible. > > Could you add a command and performance figures to the commit message > to show off the benefits of this patch? > > From the description it's clear that it's a performance optimization, > but it's unclear whether it has a noticeable, or at least measurable > effect, or it's "only" a micro-optimization. > > I tried something more substantial than a simple 'git status': > > # without this patch > $ time perf record -g ./git log --oneline -m --name-only v2.20.0 >/dev/null > [ ... ] > > real 2m4.320s > user 2m0.913s > sys 0m2.284s > $ perf report -g none |grep -E '(diffcore_rename|register_rename_src)' > 52.40% 0.79% git git [.] diffcore_rename > 0.01% 0.01% git git [.] register_rename_src > > but it looks like that while more than half of the considerable > runtime is spent detecting renames, the time spent in > register_rename_src(), the function optimized in this patch, is > negligible. > I just wanted to send an ACK. I'll include perf numbers and more clarity after I dig up my notes on this. Jeff ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/1] Optimize run_diff_files()' rename detection 2019-06-08 14:42 [PATCH 0/1] Optimize run_diff_files()' rename detection Johannes Schindelin via GitGitGadget 2019-06-08 14:42 ` [PATCH 1/1] diffcore-rename: speed up register_rename_src Jeff Hostetler via GitGitGadget @ 2019-06-10 14:43 ` Jeff Hostetler 1 sibling, 0 replies; 7+ messages in thread From: Jeff Hostetler @ 2019-06-10 14:43 UTC (permalink / raw) To: Johannes Schindelin via GitGitGadget, git; +Cc: Junio C Hamano On 6/8/2019 10:42 AM, Johannes Schindelin via GitGitGadget wrote: > Just another patch from Git for Windows' branch thicket... > > Jeff Hostetler (1): > diffcore-rename: speed up register_rename_src > > diffcore-rename.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > > base-commit: 8104ec994ea3849a968b4667d072fedd1e688642 > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-142%2Fdscho%2Fregister_rename_src-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-142/dscho/register_rename_src-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/142 > For the sake of full disclosure, we added this patch to Git for Windows in December 2016. It was discussed on the mailing list in April 2017 [1] but it was shelved for various reasons. Let me put this one on hold while I dig up my notes from 2016 and re-review of the original mailing list suggestions and see where I want to take this patch going forward. Jeff [1] https://public-inbox.org/git/20170418194421.22453-1-git@jeffhostetler.com/ ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2019-06-10 14:55 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-06-08 14:42 [PATCH 0/1] Optimize run_diff_files()' rename detection Johannes Schindelin via GitGitGadget 2019-06-08 14:42 ` [PATCH 1/1] diffcore-rename: speed up register_rename_src Jeff Hostetler via GitGitGadget 2019-06-08 22:27 ` René Scharfe 2019-06-10 14:23 ` Jeff Hostetler 2019-06-10 12:26 ` SZEDER Gábor 2019-06-10 14:55 ` Jeff Hostetler 2019-06-10 14:43 ` [PATCH 0/1] Optimize run_diff_files()' rename detection Jeff Hostetler
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).