list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <>
To: Jeff Hostetler <>
Cc: Junio C Hamano <>,, Jeff Hostetler <>
Subject: Re: [PATCH v1] diffcore-rename: speed up register_rename_src
Date: Thu, 20 Apr 2017 14:34:55 -0400	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On Thu, Apr 20, 2017 at 02:08:46PM -0400, Jeff Hostetler wrote:

> > That's not the minimal change you were going for, but I think the end
> > result is simpler and more consistent.
> OK, let me take a stab at something like that and
> see where it takes me.


I set the patch as a lump, but I think there are a few things going on

  - the return value of register_rename_src() is actively dangerous (it
    points to memory which may be reallocated), so it's good that it
    goes away in favor of an "int"

  - we already refuse to do rename detection when there are duplicate
    dsts. This adds the same for srcs. I don't know if the same safety
    rules apply there, but it certainly seems like a reasonable and
    consistent precaution to say "this tree looks broken, let's skip
    rename detection". But it does mean a potential change in
    functionality in that corner case.

  - this patch probably adds "unsorted tree" to the list of breakages
    that would cause us to skip rename detection. I don't know if that's
    actually possible in practice (i.e., do we end up sorting the
    diffq elsewhere anyway?). I also wondered if it might run afoul of
    diffcore_order(), but that is applied after rename detection, so
    we're OK.

> WRT your earlier comment about how often we add or delete 4M
> files and then run status.  The use case that started this was a
> 1% sparse-checkout followed by a read-tree (which reset the
> skip-worktree bits) and then status (which thought 99% of the
> worktree had been deleted or maybe renamed).  There are probably
> other ways to get into this state, but that's how this started.

Right, that sounds plausible. I guess I just wondered if this is
something an average developer runs daily, or something that they would
run into once a year. Shaving 4s of CPU off of a once-a-year operation
is less exciting.

> The more subtle point is that -- for these obscenely large
> values of n -- any time I see an O(n log n) operation that could
> or should be O(n), I want to stop and look at it.

Heh. I spent a fair bit of time in Git's past turning O(n^2) operations
into O(n log n), so I feel your pain. I do think it's important to pay
attention to whole-operation numbers, though. Quite often you have an
O(n log n) with a small constant (like a single strcmp) coupled with
something linear but with a huge constant (like loading blob contents),
and micro-optimizations to the former get drowned out by the latter.


  reply	other threads:[~2017-04-20 18:35 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-18 19:44 [PATCH v1] diffcore-rename speedup git
2017-04-18 19:44 ` [PATCH v1] diffcore-rename: speed up register_rename_src git
2017-04-19  1:32   ` Jeff King
2017-04-19  2:45     ` Junio C Hamano
2017-04-19  2:56       ` Jeff King
2017-04-19  3:18         ` Jeff King
2017-04-20 14:00           ` Jeff Hostetler
2017-04-20 16:13             ` Jeff King
2017-04-20 18:08               ` Jeff Hostetler
2017-04-20 18:34                 ` Jeff King [this message]
2017-04-21  1:19                   ` Junio C Hamano
2017-04-20 10:40     ` Johannes Schindelin
2017-04-20 15:50       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

  List information:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \
    --subject='Re: [PATCH v1] diffcore-rename: speed up register_rename_src' \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ \
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
 note: .onion URLs require Tor:

code repositories for project(s) associated with this inbox:

AGPL code for this site: git clone