From: Jeff King <peff@peff.net>
To: Jeff Hostetler <git@jeffhostetler.com>
Cc: Junio C Hamano <gitster@pobox.com>,
git@vger.kernel.org, Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH v1] diffcore-rename: speed up register_rename_src
Date: Thu, 20 Apr 2017 14:34:55 -0400 [thread overview]
Message-ID: <20170420183455.y5oz3hspz3v2g6yr@sigill.intra.peff.net> (raw)
In-Reply-To: <4d400fb6-201e-e1ba-cc3a-935951ab3e14@jeffhostetler.com>
On Thu, Apr 20, 2017 at 02:08:46PM -0400, Jeff Hostetler wrote:
> > That's not the minimal change you were going for, but I think the end
> > result is simpler and more consistent.
>
> OK, let me take a stab at something like that and
> see where it takes me.
Thanks.
I set the patch as a lump, but I think there are a few things going on
there:
- the return value of register_rename_src() is actively dangerous (it
points to memory which may be reallocated), so it's good that it
goes away in favor of an "int"
- we already refuse to do rename detection when there are duplicate
dsts. This adds the same for srcs. I don't know if the same safety
rules apply there, but it certainly seems like a reasonable and
consistent precaution to say "this tree looks broken, let's skip
rename detection". But it does mean a potential change in
functionality in that corner case.
- this patch probably adds "unsorted tree" to the list of breakages
that would cause us to skip rename detection. I don't know if that's
actually possible in practice (i.e., do we end up sorting the
diffq elsewhere anyway?). I also wondered if it might run afoul of
diffcore_order(), but that is applied after rename detection, so
we're OK.
> WRT your earlier comment about how often we add or delete 4M
> files and then run status. The use case that started this was a
> 1% sparse-checkout followed by a read-tree (which reset the
> skip-worktree bits) and then status (which thought 99% of the
> worktree had been deleted or maybe renamed). There are probably
> other ways to get into this state, but that's how this started.
Right, that sounds plausible. I guess I just wondered if this is
something an average developer runs daily, or something that they would
run into once a year. Shaving 4s of CPU off of a once-a-year operation
is less exciting.
> The more subtle point is that -- for these obscenely large
> values of n -- any time I see an O(n log n) operation that could
> or should be O(n), I want to stop and look at it.
Heh. I spent a fair bit of time in Git's past turning O(n^2) operations
into O(n log n), so I feel your pain. I do think it's important to pay
attention to whole-operation numbers, though. Quite often you have an
O(n log n) with a small constant (like a single strcmp) coupled with
something linear but with a huge constant (like loading blob contents),
and micro-optimizations to the former get drowned out by the latter.
-Peff
next prev parent reply other threads:[~2017-04-20 18:35 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-18 19:44 [PATCH v1] diffcore-rename speedup git
2017-04-18 19:44 ` [PATCH v1] diffcore-rename: speed up register_rename_src git
2017-04-19 1:32 ` Jeff King
2017-04-19 2:45 ` Junio C Hamano
2017-04-19 2:56 ` Jeff King
2017-04-19 3:18 ` Jeff King
2017-04-20 14:00 ` Jeff Hostetler
2017-04-20 16:13 ` Jeff King
2017-04-20 18:08 ` Jeff Hostetler
2017-04-20 18:34 ` Jeff King [this message]
2017-04-21 1:19 ` Junio C Hamano
2017-04-20 10:40 ` Johannes Schindelin
2017-04-20 15:50 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170420183455.y5oz3hspz3v2g6yr@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jeffhost@microsoft.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).