git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: Christopher Jefferson <caj21@st-andrews.ac.uk>,
	"git\@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: git rm VERY slow for directories with many files.
Date: Mon, 30 Oct 2017 10:36:27 +0900
Message-ID: <xmqqbmkp8lic.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <20171029165244.si4a5furgf6trqe3@genre.crustytoothpaste.net> (brian m. carlson's message of "Sun, 29 Oct 2017 16:52:44 +0000")

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On Sun, Oct 29, 2017 at 09:51:55AM +0900, Junio C Hamano wrote:
>> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>> > First, make sure your working directory is clean with no changes.  Then,
>> > remove the directory (by hand) or move it somewhere else.  Then, run
>> > "git add -u".
>> >
>> > That should allow you to commit the removal of those files quickly.
>> 
>> If get_tree_entry() shows up a lot in the profile, it would indicate
>> that a lot of cycles are spent in check_local_mod().  Bypassing it
>> with "-f" may be the first thing to try ;-)
>
> That is indeed faster.  I tested my solution by creating a directory
> with 20,000 files in a temporary repo.  git rm -r took 17.96s, and git
> rm -rf took .12s.  (This is on an SSD.)
>
> That's also a nicer and more intuitive solution than mine.

Heh, the above was meant as a joke, though.  "-f" is bypassing an
important safety valve.  In fact in my early draft of the message,
the paragraph that followed started with "Jokes aside, ..." ;-)
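
A simplified model of the per-path check that "-f" bypasses may help show why it is a safety valve. This is a hypothetical Python sketch, not git's code. rm_refusal() and its result strings are illustrative paraphrases. File modes, submodules and intent-to-add entries are ignored. The sketch also assumes, as check_local_mod() appears to do, that a path already missing from the working tree is not checked. The "index vs HEAD" comparison is the step that needs the per-path get_tree_entry() lookup.

```python
from typing import Optional


# Hypothetical model of the check plain "git rm" makes for each path.
# Arguments are the path's content in HEAD, in the index and in the
# working tree (any comparable value); None means "absent".
def rm_refusal(head: Optional[str], index: str, worktree: Optional[str],
               cached: bool = False, force: bool = False) -> Optional[str]:
    """Return a reason to refuse removing the path, or None if allowed."""
    if force:
        # "-f" skips the whole check, including the HEAD lookup.
        return None
    if worktree is None:
        # Assumed: a file already gone from the working tree is not checked.
        return None
    staged = index != head      # index differs from HEAD (or path is new)
    local = worktree != index   # working tree differs from the index
    if staged and local:
        # The staged content exists nowhere else; removing it would lose it.
        return "staged content differs from both the file and HEAD"
    if not cached:
        # Without --cached, any difference at all blocks the removal.
        if staged:
            return "has changes staged in the index"
        if local:
            return "has local modifications"
    return None
```

For example, rm_refusal("a", "b", "b") refuses because of staged changes. With cached=True the same call is allowed, because the staged content still exists in the working tree.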

>> I wonder how fast "git diff-index --cached -r HEAD --", with the
>> same pathspec used for the problematic "git rm", runs in this same
>> 50,000 path project.
>
> I'll let the original poster answer this one as well, but it was very
> fast in my test repo.  I'm not very familiar with the code path in
> question, but it definitely looks like we're avoiding the quadratic
> behavior in this case.

Because of the way "diff-index --cached" iterates over the index and
the tree in parallel, it should be a lot faster than doing
get_tree_entry() for each and every path you care about.  In
addition, the "--cached" form is further optimized to take advantage
of the cache-tree index extension, so you often can tell "all index
entries in this directory are untouched" without descending into
deep subdirectories.
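
The hypothetical flat model below illustrates the difference in access patterns. It assumes HEAD's tree and the index are both lists of (path, object name) pairs sorted by path. Real trees are nested and binary-encoded, and per_path_lookup() and parallel_walk() are illustrative names, not git functions. The first function looks up each path separately, scanning the tree until it reaches or passes the name. The second walks both lists side by side, like a merge.

```python
from typing import List, Tuple

# Hypothetical flat model: (path, object name) pairs sorted by path.
Entry = Tuple[str, str]


def per_path_lookup(index: List[Entry], tree: List[Entry]) -> Tuple[List[str], int]:
    """Look up each index path in the tree separately.

    Every lookup restarts at the beginning of the tree, so n paths
    against n tree entries cost O(n^2) comparisons in total.
    Returns (index paths that differ from or are missing in tree, comparisons).
    """
    changed: List[str] = []
    comparisons = 0
    for path, oid in index:
        found = None
        for name, tree_oid in tree:
            comparisons += 1
            if name == path:
                found = tree_oid
                break
            if name > path:  # sorted order: the path cannot appear later
                break
        if found != oid:
            changed.append(path)
    return changed, comparisons


def parallel_walk(index: List[Entry], tree: List[Entry]) -> Tuple[List[str], int]:
    """Walk index and tree in parallel: at most len(index) + len(tree) comparisons."""
    changed: List[str] = []
    comparisons = 0
    i = j = 0
    while i < len(index):
        path, oid = index[i]
        if j < len(tree):
            comparisons += 1
            name, tree_oid = tree[j]
            if name < path:  # entry exists only in the tree: skip it
                j += 1
                continue
            if name == path:  # same path on both sides: compare objects
                if tree_oid != oid:
                    changed.append(path)
                i += 1
                j += 1
                continue
        # Tree exhausted, or tree entry sorts after path: path is not in tree.
        changed.append(path)
        i += 1
    return changed, comparisons
```

With 2,000 identical entries on both sides, per_path_lookup() makes 2,001,000 comparisons (n(n+1)/2), while parallel_walk() makes 2,000. That is the quadratic-versus-linear gap discussed above. The sketch leaves out the cache-tree shortcut. With it, a whole subdirectory whose cached tree object name matches HEAD's can be declared untouched without comparing any of its entries.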


Thread overview: 6+ messages
2017-10-28 17:02 git rm VERY slow for directories with many files Christopher Jefferson
2017-10-28 22:31 ` brian m. carlson
2017-10-29  0:51   ` Junio C Hamano
2017-10-29  3:52     ` Junio C Hamano
2017-10-29 16:52     ` brian m. carlson
2017-10-30  1:36       ` Junio C Hamano [this message]
