git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff Hostetler <git@jeffhostetler.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH v1] diffcore-rename: speed up register_rename_src
Date: Thu, 20 Apr 2017 14:08:46 -0400	[thread overview]
Message-ID: <4d400fb6-201e-e1ba-cc3a-935951ab3e14@jeffhostetler.com> (raw)
In-Reply-To: <20170420161359.haolllw4ac5jjqx4@sigill.intra.peff.net>



On 4/20/2017 12:13 PM, Jeff King wrote:
> On Thu, Apr 20, 2017 at 10:00:04AM -0400, Jeff Hostetler wrote:
>
>> Perhaps the thing to learn from this (and the other ones) is that
>> we have lots of places where we are building a sorted list by
>> iterating over a sorted list.  The insert routines are general
>> purpose and cannot assume this, so they search first.  Perhaps it
>> would be clearer to have independent _append and _insert functions
>> and have the caller explicitly call the appropriate one.  The mainline
>> iterations on the existing index could just call the _append form
>> and never have to worry about searching or the negative-integer
>> return trick.  Whereas, the random iterations (such as on the
>> command's arg list), would always call the _insert form.
>
> Yes. I'd be much happier if your patch was flipping between two
> general-purpose insertion functions. And if that same trick was used on
> the dst side.
>
> Or even, given that this these functions are called from a single
> location that has sorted input, the binary search was just replaced
> completely with an append combined with a sort-check.
>
> That's not the minimal change you were going for, but I think the end
> result is simpler and more consistent.

OK, let me take a stab at something like that and
see where it takes me.

WRT your earlier comment about how often we add or delete 4M
files and then run status.  The use case that started this was a
1% sparse-checkout followed by a read-tree (which reset the
skip-worktree bits) and then status (which thought 99% of the
worktree had been deleted or maybe renamed).  There are probably
other ways to get into this state, but that's how this started.
The more subtle point is that -- for these obscenely large
values of n -- any time I see an O(n log n) operation that could
or should be O(n), I want to stop and look at it.


  reply	other threads:[~2017-04-20 18:08 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-18 19:44 [PATCH v1] diffcore-rename speedup git
2017-04-18 19:44 ` [PATCH v1] diffcore-rename: speed up register_rename_src git
2017-04-19  1:32   ` Jeff King
2017-04-19  2:45     ` Junio C Hamano
2017-04-19  2:56       ` Jeff King
2017-04-19  3:18         ` Jeff King
2017-04-20 14:00           ` Jeff Hostetler
2017-04-20 16:13             ` Jeff King
2017-04-20 18:08               ` Jeff Hostetler [this message]
2017-04-20 18:34                 ` Jeff King
2017-04-21  1:19                   ` Junio C Hamano
2017-04-20 10:40     ` Johannes Schindelin
2017-04-20 15:50       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4d400fb6-201e-e1ba-cc3a-935951ab3e14@jeffhostetler.com \
    --to=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jeffhost@microsoft.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).