From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@jeffhostetler.com, git@vger.kernel.org,
Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH v1] diffcore-rename: speed up register_rename_src
Date: Tue, 18 Apr 2017 22:56:08 -0400 [thread overview]
Message-ID: <20170419025608.xy5nvso6k6lb5z7g@sigill.intra.peff.net> (raw)
In-Reply-To: <xmqqd1c9cdzi.fsf@gitster.mtv.corp.google.com>
On Tue, Apr 18, 2017 at 07:45:05PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > On Tue, Apr 18, 2017 at 07:44:21PM +0000, git@jeffhostetler.com wrote:
> >
> >> From: Jeff Hostetler <jeffhost@microsoft.com>
> >>
> >> Teach register_rename_src() to see if new file pair
> >> can simply be appended to the rename_src[] array before
> >> performing the binary search to find the proper insertion
> >> point.
> >
> > I guess your perf results show some minor improvement. But I suspect
> > this is because your synthetic repo does not resemble the real world
> > very much. You're saving a few strcmps, but for each of those files
> > you're potentially going to have actually zlib inflate the object
> > contents and do similarity analysis.
> >
> > So "absurd number of files doing 100% exact renames" is the absolute
> > best case, and it saves a few percent.
> >
> > I dunno. It is not that much code _here_, but I'm not excited about the
> > prospect of sprinkling this same "check the last one" optimization all
> > over the code base. I wonder if there's some way to generalize it.
>
> When adding many things, we often just append and then sort at the
> end after we finished adding. I wonder if recent "check the last
> one and append" optimization beats that strategy.
The big question is whether we need to detect duplicates while we're
appending to the list, which is hard on an unsorted list. In this
function, at least, we do detect when the path already exists and return
the existing entry. I'm not sure under what circumstances we would see
such a duplicate, though, as each filename should appear only once in
the tree diff. I would think.
Doing:
diff --git a/diffcore-rename.c b/diffcore-rename.c
index f7444c86b..56a493d97 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -86,7 +86,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
struct diff_rename_src *src = &(rename_src[next]);
int cmp = strcmp(one->path, src->p->one->path);
if (!cmp)
- return src;
+ die("BUG: duplicate rename src: %s", one->path);
if (cmp < 0) {
last = next;
continue;
passes the test suite, at least. :)
-Peff
next prev parent reply other threads:[~2017-04-19 2:56 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-18 19:44 [PATCH v1] diffcore-rename speedup git
2017-04-18 19:44 ` [PATCH v1] diffcore-rename: speed up register_rename_src git
2017-04-19 1:32 ` Jeff King
2017-04-19 2:45 ` Junio C Hamano
2017-04-19 2:56 ` Jeff King [this message]
2017-04-19 3:18 ` Jeff King
2017-04-20 14:00 ` Jeff Hostetler
2017-04-20 16:13 ` Jeff King
2017-04-20 18:08 ` Jeff Hostetler
2017-04-20 18:34 ` Jeff King
2017-04-21 1:19 ` Junio C Hamano
2017-04-20 10:40 ` Johannes Schindelin
2017-04-20 15:50 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170419025608.xy5nvso6k6lb5z7g@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jeffhost@microsoft.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).