git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: Christopher Jefferson <caj21@st-andrews.ac.uk>,
	"git\@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: git rm VERY slow for directories with many files.
Date: Sun, 29 Oct 2017 09:51:55 +0900
Message-ID: <xmqqbmkqbwt0.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <20171028223103.wevq5zf4rjl7ietd@genre.crustytoothpaste.net> (brian m. carlson's message of "Sat, 28 Oct 2017 22:31:04 +0000")

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

>> Looking at an optimized profile, all the time seems to be spent in “get_tree_entry” — I assume there is some huge object representing the directory which is being re-expanded for each file?
>
> Yes, there's a tree object that represents each directory.
>
>> Is there any way I can speed up removing this directory?
>
> First, make sure your working directory is clean with no changes.  Then,
> remove the directory (by hand) or move it somewhere else.  Then, run
> "git add -u".
>
> That should allow you to commit the removal of those files quickly.

If get_tree_entry() shows up a lot in the profile, it would indicate
that a lot of cycles are spent in check_local_mod().  Bypassing it
with "-f" may be the first thing to try ;-)

The way "git rm" makes repeated calls to get_tree_entry() with deep
pathnames would be an easy recipe to get quadratic behaviour like
the one reported in the first message on this thread, as it always
goes from the root level, grabs a tree object and scans it to get
the entry for the next level, and (worse yet) a look-up of a path
component in each of these tree objects must be done as a linear
scan.
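
To make the quadratic cost concrete, here is a rough sketch in Python.
It is a toy model, not Git's code: a "tree" is just a list of
(name, child) pairs, and lookup() and total_comparisons() are
hypothetical names invented for this illustration.  The only thing it
borrows from the description above is that every look-up starts at the
root and scans each tree linearly.

```python
# Toy model only: a "tree" is a list of (name, child) pairs, where child
# is another such list for a subdirectory or None for a file.
# This is not Git's object format; all names here are hypothetical.

def lookup(root, path):
    """Look up path starting from the root, scanning each tree linearly.

    Returns (found, comparisons), where comparisons counts the name
    comparisons performed along the way.
    """
    comparisons = 0
    tree = root
    parts = path.split("/")
    for depth, part in enumerate(parts):
        hit = False
        child = None
        for name, entry in tree:      # linear scan of one tree
            comparisons += 1
            if name == part:
                hit = True
                child = entry
                break
        if not hit:
            return False, comparisons
        if depth < len(parts) - 1:
            if child is None:         # a file where a directory was expected
                return False, comparisons
            tree = child              # descend one level
    return True, comparisons


def total_comparisons(n):
    """Total cost of looking up each of n files in a single directory."""
    files = [("f%06d" % i, None) for i in range(n)]
    root = [("dir", files)]
    total = 0
    for name, _ in files:
        found, cost = lookup(root, "dir/" + name)
        assert found
        total += cost
    return total
```

In this model total_comparisons(n) is n + n(n+1)/2, so doubling the
number of files in the directory roughly quadruples the work.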

I wonder how fast "git diff-index --cached -r HEAD --", with the
same pathspec used for the problematic "git rm", runs in this same
50,000 path project.  

If it runs in a reasonable time, one easy way out may be to revamp
the codepath that calls check_local_mod() to:

 - first, before making the call, do the "diff-index --cached" thing
   internally with the same pathspec to grab the list of paths that
   have local modifications; save the set of paths in a hashmap or
   something.

 - pass that hashmap to check_local_mod(), and where the function
   does the "staged_changes" check, consult the hashmap to see whether
   the path in question differs between HEAD and the index.
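
The shape of that idea, as a rough Python sketch rather than a patch:
HEAD and the index are modelled as plain dicts from path to blob id,
and staged_changes() and removable() are hypothetical helpers invented
here, not functions in Git.  The point is only that the comparison is
done once up front, and each per-path check becomes a set look-up
instead of a walk from the root tree.

```python
# Toy model only: HEAD and the index are dicts mapping path -> blob id.
# Function names are hypothetical; this is not how builtin/rm.c is written.

def staged_changes(head, index, prefix):
    """Collect, in one pass, the paths under prefix that differ
    between HEAD and the index (including additions and deletions)."""
    changed = set()
    for path in head.keys() | index.keys():
        if path.startswith(prefix) and head.get(path) != index.get(path):
            changed.add(path)
    return changed


def removable(paths, changed):
    """Per-path check: a constant-time set look-up, no tree walking."""
    return [p for p in paths if p not in changed]


head = {"dir/a": "1", "dir/b": "2", "other/c": "3"}
index = {"dir/a": "1", "dir/b": "9", "other/c": "4"}

changed = staged_changes(head, index, "dir/")
safe = removable(["dir/a", "dir/b"], changed)
```

Here only "dir/b" differs under "dir/", so "dir/a" is the only path
reported as safe to remove; "other/c" is outside the prefix and is
never considered.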

