From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: Elijah Newren <newren@gmail.com>
Cc: Martin Melka <martin.melka@gmail.com>,
Git Mailing List <git@vger.kernel.org>,
Samuel Lijin <sxlijin@gmail.com>
Subject: Re: git status --ignored hangs when a deep directory structure present in working tree
Date: Tue, 28 Jan 2020 14:57:07 +0100 [thread overview]
Message-ID: <20200128135707.GD10482@szeder.dev> (raw)
In-Reply-To: <CABPp-BGvU_DHQu66bqPZ+WXg5mL8bCP5Uxp4g5393WnWyO1Dhg@mail.gmail.com>
On Mon, Jan 27, 2020 at 09:06:01PM -0800, Elijah Newren wrote:
> > the runtime of 'git status
> > --ignored' grows quickly with the depth of the untracked directory.
> > Running this shell loop produces the numbers below:
> >
> > for depth in $(seq 10 30)
> > do
> > dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
> > rm -rf dir
> > mkdir -p $dirs
> > >$dirs/untracked-file
> > /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
> > done
> >
> > 10: 0.01
> > 11: 0.03
> > 12: 0.05
> > 13: 0.11
> > 14: 0.23
> > 15: 0.47
> > 16: 0.97
> > 17: 1.97
> > 18: 3.88
> > 19: 7.85
> > 20: 16.29
> > 21: 32.92
> > 22: 76.24
>
> Wow.
>
> Really nice testcase, though, thanks.
>
> > Beautifully quadratic, isn't it? :)
>
> I think you mean exponential (in particular 2^n rather than n^2).
Ouch, yeah, indeed.
> > Unless I messed up my numbers, with a depth of 120 directories it
> > would take over 6*10^23 years to complete... so yeah, it does qualify
> > as indefinitely.
>
> No comment about how people today are spoiled and addicted to instant
> gratification? I mean, can't folks just be a little patient? ;-)
Nope. Notice how my shell loop above goes to 30, but the results only
to 22 :)
> > This slowdown was caused by commit df5bcdf83a (dir: recurse into
> > untracked dirs for ignored files, 2017-05-18), which was part of a
> > patch series to fix 'git clean -d' deleting untracked directories even
> > if they contained ignored files.
> >
> > Cc'ing Samuel, author of that commit, and Elijah, who had quite some
> > fun with 'dir.c' recently.
>
> Heh, yes, what "fun" it was.
>
> Anyway, after digging around for quite a bit today... that commit
> added calling read_directory_recursive() directly from itself for
> certain untracked paths. This means that read_directory_recursive()
> (which I'll abbreviate to r_d_r()), when we're dealing with certain
> untracked paths:
>
> * Calls treat_path() -> treat_one_path() -> treat_directory() -> r_d_r()
> * Calls r_d_r() directly as well
>
> So, from the toplevel directory, r_d_r() will call itself twice on the
> next directory down. For each of those, it'll call r_d_r() twice on
> the second directory down. From each of those, it'll call r_d_r()
> twice on the third directory in the hierarchy and so on until we have
> 2^n calls to r_d_r() for the nth level deep directory.
Got it, thanks.
> Trying to back out the underlying problem, I _think_ the cause behind
> all this is that r_d_r() and friends all use the path_treatment enum,
> which says that the treatment of any path has to be one of those four
> types. The dichotomy between path_untracked and path_recurse in
> particular means the system has no way to mark that something should
> be both marked as untracked and recursed into, yet we definitely need
> to have some directories marked as untracked and we also need to
> recurse into them. This, I think led to Samuel's attempt to
> workaround that dichotomy by having the code in r_d_r() check for
> certain path_untracked cases which should also be recursed into. I
> think adding another type to the enum and shifting the logic elsewhere
> might enable us to both simplify the logic and avoid this expensive
> exponential behavior, but I haven't gotten it working yet. We'll see
> if my patch goes anywhere, or if it's just another dead-end among
> many.
I was wondering whether it would make sense to give the enum contants
power-of-two values, so we could say 'path_recurse | path_untracked'.
But while this particular combination makes sense, others, at least to
my superficial understanding, not at all (e.g. 'path_recurse |
path_exclude').
next prev parent reply other threads:[~2020-01-28 13:57 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-01-27 10:55 git status --ignored hangs when a deep directory structure present in working tree Martin Melka
2020-01-27 12:08 ` SZEDER Gábor
2020-01-28 5:06 ` Elijah Newren
2020-01-28 13:57 ` SZEDER Gábor [this message]
2020-01-28 15:00 ` Elijah Newren
2020-01-29 22:10 ` Elijah Newren
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200128135707.GD10482@szeder.dev \
--to=szeder.dev@gmail.com \
--cc=git@vger.kernel.org \
--cc=martin.melka@gmail.com \
--cc=newren@gmail.com \
--cc=sxlijin@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).