From: Junio C Hamano <gitster@pobox.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: Christopher Jefferson <caj21@st-andrews.ac.uk>,
"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: git rm VERY slow for directories with many files.
Date: Mon, 30 Oct 2017 10:36:27 +0900
Message-ID: <xmqqbmkp8lic.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <20171029165244.si4a5furgf6trqe3@genre.crustytoothpaste.net> (brian m. carlson's message of "Sun, 29 Oct 2017 16:52:44 +0000")

"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> On Sun, Oct 29, 2017 at 09:51:55AM +0900, Junio C Hamano wrote:
>> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>> > First, make sure your working directory is clean with no changes. Then,
>> > remove the directory (by hand) or move it somewhere else. Then, run
>> > "git add -u".
>> >
>> > That should allow you to commit the removal of those files quickly.
>>
>> If get_tree_entry() shows up a lot in the profile, it would indicate
>> that a lot of cycles are spent in check_local_mod(). Bypassing it
>> with "-f" may be the first thing to try ;-)
>
> That is indeed faster. I tested my solution by creating a directory
> with 20,000 files in a temporary repo. git rm -r took 17.96s, and git
> rm -rf took .12s. (This is on an SSD.)
>
> That's also a nicer and more intuitive solution than mine.

Heh, the above was meant as a joke, though. "-f" bypasses an
important safety valve. In fact, in my early draft of the message,
the paragraph that followed started with "Jokes aside, ..." ;-)

>> I wonder how fast "git diff-index --cached -r HEAD --", with the
>> same pathspec used for the problematic "git rm", runs in this same
>> 50,000 path project.
>
> I'll let the original poster answer this one as well, but it was very
> fast in my test repo. I'm not very familiar with the code path in
> question, but it definitely looks like we're avoiding the quadratic
> behavior in this case.

Because of the way "diff-index --cached" iterates over the index and
the tree in parallel, it should be a lot faster than calling
get_tree_entry() for each and every path you care about, where every
call has to look the path up in the tree from the top. In addition,
the "--cached" form is further optimized to take advantage of the
cache-tree index extension, so it can often tell that all index
entries in a directory are untouched without descending into deep
subdirectories.
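A toy model may make the difference concrete. It is a hypothetical
sketch in Python, not git's C code: both the index and HEAD's tree are
modeled as flat lists of (path, blob id) pairs sorted by path, and a
per-path lookup is assumed to scan the tree list from the start, standing
in for a get_tree_entry()-style search. The names lookup_each() and
parallel_walk() are made up for illustration, and the step counters only
count loop iterations.

```python
"""Toy model: per-path tree lookups vs. one parallel walk.

Hypothetical sketch, not git's code. The index and HEAD's tree are both
modeled as flat lists of (path, blob_id) pairs sorted by path, and a
per-path lookup scans the tree list from the start.
"""
from typing import List, Tuple

Entry = Tuple[str, str]  # (path, blob id)


def lookup_each(index: List[Entry], tree: List[Entry]) -> Tuple[List[str], int]:
    """Look up every index path in the tree separately: O(n * m) steps."""
    changed: List[str] = []
    steps = 0
    for path, blob in index:
        found = None
        for tree_path, tree_blob in tree:  # restart the scan for every path
            steps += 1
            if tree_path == path:
                found = tree_blob
                break
        if found != blob:  # missing from the tree, or different content
            changed.append(path)
    return changed, steps


def parallel_walk(index: List[Entry], tree: List[Entry]) -> Tuple[List[str], int]:
    """Walk both sorted lists once, merge-style: O(n + m) steps."""
    changed: List[str] = []
    steps = 0
    i = j = 0
    while i < len(index) or j < len(tree):
        steps += 1
        if j == len(tree) or (i < len(index) and index[i][0] < tree[j][0]):
            changed.append(index[i][0])  # path only in the index (added)
            i += 1
        elif i == len(index) or tree[j][0] < index[i][0]:
            changed.append(tree[j][0])  # path only in the tree (removed)
            j += 1
        else:
            if index[i][1] != tree[j][1]:  # same path, different content
                changed.append(index[i][0])
            i += 1
            j += 1
    return changed, steps


if __name__ == "__main__":
    n = 2000
    tree = [(f"dir/file{k:05d}", "a") for k in range(n)]
    index = list(tree)
    index[10] = (index[10][0], "b")  # one staged modification
    print(lookup_each(index, tree))    # (['dir/file00010'], 2001000)
    print(parallel_walk(index, tree))  # (['dir/file00010'], 2000)
```

With 2,000 paths in one directory, the per-path version takes
n(n+1)/2 = 2,001,000 steps while the parallel walk takes 2,000, which is
the quadratic-versus-linear gap seen in the timings above. The cache-tree
shortcut is not modeled here; in the same spirit, it would let the walk
compare a directory's recorded tree id with the one in HEAD and skip the
whole directory when they match.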