git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git status --ignored hangs when a deep directory structure present in working tree
@ 2020-01-27 10:55 Martin Melka
  2020-01-27 12:08 ` SZEDER Gábor
  0 siblings, 1 reply; 6+ messages in thread
From: Martin Melka @ 2020-01-27 10:55 UTC (permalink / raw)
  To: git

Hi all, I have ran across what might be a bug in git. When there is a
deep directory structure (tried on 100+ nested dirs), then git status
--ignored hangs indefinitely.
Discovered this on OSX (Mojave, git 2.20.1 (Apple Git-117)), but it
reproduces in Ubuntu (19.04, git 2.25.0) Docker container on OSX and
also on baremetal Ubuntu server (16.04, git 2.17.1).

Steps to reproduce:

1. Generate the deep dir structure:

    mkdir gittest; pushd gittest; git init; for i in $(seq 1 120); do
mkdir dir; cd dir; done; touch leaf; popd

2. Try to get git status --ignored

    cd gittest && git status --ignored


If there is a dir depth limit, git should probably exit with an error
rather than getting stuck endlessly.

StackOverflow report:
https://stackoverflow.com/questions/59928800/git-status-ignored-hangs-indefinitely

Best regards,
Martin Melka

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git status --ignored hangs when a deep directory structure present in working tree
  2020-01-27 10:55 git status --ignored hangs when a deep directory structure present in working tree Martin Melka
@ 2020-01-27 12:08 ` SZEDER Gábor
  2020-01-28  5:06   ` Elijah Newren
  0 siblings, 1 reply; 6+ messages in thread
From: SZEDER Gábor @ 2020-01-27 12:08 UTC (permalink / raw)
  To: Martin Melka; +Cc: git, Samuel Lijin, Elijah Newren

On Mon, Jan 27, 2020 at 11:55:01AM +0100, Martin Melka wrote:
> Hi all, I have ran across what might be a bug in git. When there is a
> deep directory structure (tried on 100+ nested dirs), then git status
> --ignored hangs indefinitely.
> Discovered this on OSX (Mojave, git 2.20.1 (Apple Git-117)), but it
> reproduces in Ubuntu (19.04, git 2.25.0) Docker container on OSX and
> also on baremetal Ubuntu server (16.04, git 2.17.1).
> 
> Steps to reproduce:
> 
> 1. Generate the deep dir structure:
> 
>     mkdir gittest; pushd gittest; git init; for i in $(seq 1 120); do
> mkdir dir; cd dir; done; touch leaf; popd
> 
> 2. Try to get git status --ignored
> 
>     cd gittest && git status --ignored
> 
> 
> If there is a dir depth limit, git should probably exit with an error
> rather than getting stuck endlessly.

This is interesting, thanks for the report.

There is no such directory depth limit, but the runtime of 'git status
--ignored' grows quickly with the depth of the untracked directory.
Running this shell loop produces the numbers below:

for depth in $(seq 10 30)
do
        dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
        rm -rf dir
        mkdir -p $dirs
        >$dirs/untracked-file
        /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
done

10: 0.01
11: 0.03
12: 0.05
13: 0.11
14: 0.23
15: 0.47
16: 0.97
17: 1.97
18: 3.88
19: 7.85
20: 16.29
21: 32.92
22: 76.24

Beautifully quadratic, isn't it? :)

Unless I messed up my numbers, with a depth of 120 directories it
would take over 6*10^23 years to complete... so yeah, it does qualify
as indefinitely.

This slowdown was caused by commit df5bcdf83a (dir: recurse into
untracked dirs for ignored files, 2017-05-18), which was part of a
patch series to fix 'git clean -d' deleting untracked directories even
if they contained ignored files.

Cc'ing Samuel, author of that commit, and Elijah, who had quite some
fun with 'dir.c' recently.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git status --ignored hangs when a deep directory structure present in working tree
  2020-01-27 12:08 ` SZEDER Gábor
@ 2020-01-28  5:06   ` Elijah Newren
  2020-01-28 13:57     ` SZEDER Gábor
  2020-01-29 22:10     ` Elijah Newren
  0 siblings, 2 replies; 6+ messages in thread
From: Elijah Newren @ 2020-01-28  5:06 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Martin Melka, Git Mailing List, Samuel Lijin

On Mon, Jan 27, 2020 at 4:08 AM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Mon, Jan 27, 2020 at 11:55:01AM +0100, Martin Melka wrote:
> > Hi all, I have ran across what might be a bug in git. When there is a
> > deep directory structure (tried on 100+ nested dirs), then git status
> > --ignored hangs indefinitely.
> > Discovered this on OSX (Mojave, git 2.20.1 (Apple Git-117)), but it
> > reproduces in Ubuntu (19.04, git 2.25.0) Docker container on OSX and
> > also on baremetal Ubuntu server (16.04, git 2.17.1).
> >
> > Steps to reproduce:
> >
> > 1. Generate the deep dir structure:
> >
> >     mkdir gittest; pushd gittest; git init; for i in $(seq 1 120); do
> > mkdir dir; cd dir; done; touch leaf; popd
> >
> > 2. Try to get git status --ignored
> >
> >     cd gittest && git status --ignored
> >
> >
> > If there is a dir depth limit, git should probably exit with an error
> > rather than getting stuck endlessly.
>
> This is interesting, thanks for the report.

Agreed.

> There is no such directory depth limit, but the runtime of 'git status
> --ignored' grows quickly with the depth of the untracked directory.
> Running this shell loop produces the numbers below:
>
> for depth in $(seq 10 30)
> do
>         dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
>         rm -rf dir
>         mkdir -p $dirs
>         >$dirs/untracked-file
>         /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
> done
>
> 10: 0.01
> 11: 0.03
> 12: 0.05
> 13: 0.11
> 14: 0.23
> 15: 0.47
> 16: 0.97
> 17: 1.97
> 18: 3.88
> 19: 7.85
> 20: 16.29
> 21: 32.92
> 22: 76.24

Wow.

Really nice testcase, though, thanks.

> Beautifully quadratic, isn't it? :)

I think you mean exponential (in particular 2^n rather than n^2).

> Unless I messed up my numbers, with a depth of 120 directories it
> would take over 6*10^23 years to complete... so yeah, it does qualify
> as indefinitely.

No comment about how people today are spoiled and addicted to instant
gratification?  I mean, can't folks just be a little patient?  ;-)

> This slowdown was caused by commit df5bcdf83a (dir: recurse into
> untracked dirs for ignored files, 2017-05-18), which was part of a
> patch series to fix 'git clean -d' deleting untracked directories even
> if they contained ignored files.
>
> Cc'ing Samuel, author of that commit, and Elijah, who had quite some
> fun with 'dir.c' recently.

Heh, yes, what "fun" it was.

Anyway, after digging around for quite a bit today... that commit
added calling read_directory_recursive() directly from itself for
certain untracked paths.  This means that read_directory_recursive()
(which I'll abbreviate to r_d_r()), when we're dealing with certain
untracked paths:

  * Calls treat_path() -> treat_one_path() -> treat_directory() -> r_d_r()
  * Calls r_d_r() directly as well

So, from the toplevel directory, r_d_r() will call itself twice on the
next directory down.  For each of those, it'll call r_d_r() twice on
the second directory down.  From each of those, it'll call r_d_r()
twice on the third directory in the hierarchy and so on until we have
2^n calls to r_d_r() for the nth level deep directory.

Trying to back out the underlying problem, I _think_ the cause behind
all this is that r_d_r() and friends all use the path_treatment enum,
which says that the treatment of any path has to be one of those four
types. The dichotomy between path_untracked and path_recurse in
particular means the system has no way to mark that something should
be both marked as untracked and recursed into, yet we definitely need
to have some directories marked as untracked and we also need to
recurse into them.  This, I think led to Samuel's attempt to
workaround that dichotomy by having the code in r_d_r() check for
certain path_untracked cases which should also be recursed into.  I
think adding another type to the enum and shifting the logic elsewhere
might enable us to both simplify the logic and avoid this expensive
exponential behavior, but I haven't gotten it working yet.  We'll see
if my patch goes anywhere, or if it's just another dead-end among
many.


Elijah

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git status --ignored hangs when a deep directory structure present in working tree
  2020-01-28  5:06   ` Elijah Newren
@ 2020-01-28 13:57     ` SZEDER Gábor
  2020-01-28 15:00       ` Elijah Newren
  2020-01-29 22:10     ` Elijah Newren
  1 sibling, 1 reply; 6+ messages in thread
From: SZEDER Gábor @ 2020-01-28 13:57 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Martin Melka, Git Mailing List, Samuel Lijin

On Mon, Jan 27, 2020 at 09:06:01PM -0800, Elijah Newren wrote:
> > the runtime of 'git status
> > --ignored' grows quickly with the depth of the untracked directory.
> > Running this shell loop produces the numbers below:
> >
> > for depth in $(seq 10 30)
> > do
> >         dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
> >         rm -rf dir
> >         mkdir -p $dirs
> >         >$dirs/untracked-file
> >         /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
> > done
> >
> > 10: 0.01
> > 11: 0.03
> > 12: 0.05
> > 13: 0.11
> > 14: 0.23
> > 15: 0.47
> > 16: 0.97
> > 17: 1.97
> > 18: 3.88
> > 19: 7.85
> > 20: 16.29
> > 21: 32.92
> > 22: 76.24
> 
> Wow.
> 
> Really nice testcase, though, thanks.
> 
> > Beautifully quadratic, isn't it? :)
> 
> I think you mean exponential (in particular 2^n rather than n^2).

Ouch, yeah, indeed.

> > Unless I messed up my numbers, with a depth of 120 directories it
> > would take over 6*10^23 years to complete... so yeah, it does qualify
> > as indefinitely.
> 
> No comment about how people today are spoiled and addicted to instant
> gratification?  I mean, can't folks just be a little patient?  ;-)

Nope.  Notice how my shell loop above goes to 30, but the results only
to 22 :)

> > This slowdown was caused by commit df5bcdf83a (dir: recurse into
> > untracked dirs for ignored files, 2017-05-18), which was part of a
> > patch series to fix 'git clean -d' deleting untracked directories even
> > if they contained ignored files.
> >
> > Cc'ing Samuel, author of that commit, and Elijah, who had quite some
> > fun with 'dir.c' recently.
> 
> Heh, yes, what "fun" it was.
> 
> Anyway, after digging around for quite a bit today... that commit
> added calling read_directory_recursive() directly from itself for
> certain untracked paths.  This means that read_directory_recursive()
> (which I'll abbreviate to r_d_r()), when we're dealing with certain
> untracked paths:
> 
>   * Calls treat_path() -> treat_one_path() -> treat_directory() -> r_d_r()
>   * Calls r_d_r() directly as well
> 
> So, from the toplevel directory, r_d_r() will call itself twice on the
> next directory down.  For each of those, it'll call r_d_r() twice on
> the second directory down.  From each of those, it'll call r_d_r()
> twice on the third directory in the hierarchy and so on until we have
> 2^n calls to r_d_r() for the nth level deep directory.

Got it, thanks.

> Trying to back out the underlying problem, I _think_ the cause behind
> all this is that r_d_r() and friends all use the path_treatment enum,
> which says that the treatment of any path has to be one of those four
> types. The dichotomy between path_untracked and path_recurse in
> particular means the system has no way to mark that something should
> be both marked as untracked and recursed into, yet we definitely need
> to have some directories marked as untracked and we also need to
> recurse into them.  This, I think led to Samuel's attempt to
> workaround that dichotomy by having the code in r_d_r() check for
> certain path_untracked cases which should also be recursed into.  I
> think adding another type to the enum and shifting the logic elsewhere
> might enable us to both simplify the logic and avoid this expensive
> exponential behavior, but I haven't gotten it working yet.  We'll see
> if my patch goes anywhere, or if it's just another dead-end among
> many.

I was wondering whether it would make sense to give the enum contants
power-of-two values, so we could say 'path_recurse | path_untracked'.
But while this particular combination makes sense, others, at least to
my superficial understanding, not at all (e.g. 'path_recurse |
path_exclude').


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git status --ignored hangs when a deep directory structure present in working tree
  2020-01-28 13:57     ` SZEDER Gábor
@ 2020-01-28 15:00       ` Elijah Newren
  0 siblings, 0 replies; 6+ messages in thread
From: Elijah Newren @ 2020-01-28 15:00 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Martin Melka, Git Mailing List, Samuel Lijin

On Tue, Jan 28, 2020 at 5:57 AM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Mon, Jan 27, 2020 at 09:06:01PM -0800, Elijah Newren wrote:
> > > the runtime of 'git status
> > > --ignored' grows quickly with the depth of the untracked directory.
> > > Running this shell loop produces the numbers below:
> > >
> > > for depth in $(seq 10 30)
> > > do
> > >         dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
> > >         rm -rf dir
> > >         mkdir -p $dirs
> > >         >$dirs/untracked-file
> > >         /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
> > > done
> > >
> > > 10: 0.01
> > > 11: 0.03
> > > 12: 0.05
> > > 13: 0.11
> > > 14: 0.23
> > > 15: 0.47
> > > 16: 0.97
> > > 17: 1.97
> > > 18: 3.88
> > > 19: 7.85
> > > 20: 16.29
> > > 21: 32.92
> > > 22: 76.24
[...]
> > > Unless I messed up my numbers, with a depth of 120 directories it
> > > would take over 6*10^23 years to complete... so yeah, it does qualify
> > > as indefinitely.
> >
> > No comment about how people today are spoiled and addicted to instant
> > gratification?  I mean, can't folks just be a little patient?  ;-)
>
> Nope.  Notice how my shell loop above goes to 30, but the results only
> to 22 :)

I was specifically referring to your 6*10^23 years estimate when I was
jokingly suggesting just a little more patience.  :-)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git status --ignored hangs when a deep directory structure present in working tree
  2020-01-28  5:06   ` Elijah Newren
  2020-01-28 13:57     ` SZEDER Gábor
@ 2020-01-29 22:10     ` Elijah Newren
  1 sibling, 0 replies; 6+ messages in thread
From: Elijah Newren @ 2020-01-29 22:10 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Martin Melka, Git Mailing List, Samuel Lijin

On Mon, Jan 27, 2020 at 9:06 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Mon, Jan 27, 2020 at 4:08 AM SZEDER Gábor <szeder.dev@gmail.com> wrote:
> >
> > On Mon, Jan 27, 2020 at 11:55:01AM +0100, Martin Melka wrote:
> > > Hi all, I have ran across what might be a bug in git. When there is a
> > > deep directory structure (tried on 100+ nested dirs), then git status
> > > --ignored hangs indefinitely.
> > > Discovered this on OSX (Mojave, git 2.20.1 (Apple Git-117)), but it
> > > reproduces in Ubuntu (19.04, git 2.25.0) Docker container on OSX and
> > > also on baremetal Ubuntu server (16.04, git 2.17.1).
> > >
> > > Steps to reproduce:
> > >
> > > 1. Generate the deep dir structure:
> > >
> > >     mkdir gittest; pushd gittest; git init; for i in $(seq 1 120); do
> > > mkdir dir; cd dir; done; touch leaf; popd
> > >
> > > 2. Try to get git status --ignored
> > >
> > >     cd gittest && git status --ignored
> > >
> > >
> > > If there is a dir depth limit, git should probably exit with an error
> > > rather than getting stuck endlessly.
> >
> > This is interesting, thanks for the report.
>
> Agreed.
>
> > There is no such directory depth limit, but the runtime of 'git status
> > --ignored' grows quickly with the depth of the untracked directory.
> > Running this shell loop produces the numbers below:
> >
> > for depth in $(seq 10 30)
> > do
> >         dirs=$(for i in $(seq 1 $depth) ; do printf 'dir/' ; done)
> >         rm -rf dir
> >         mkdir -p $dirs
> >         >$dirs/untracked-file
> >         /usr/bin/time --format="$depth: %e" git status --ignored >/dev/null
> > done
> >
> > 10: 0.01
> > 11: 0.03
> > 12: 0.05
> > 13: 0.11
> > 14: 0.23
> > 15: 0.47
> > 16: 0.97
> > 17: 1.97
> > 18: 3.88
> > 19: 7.85
> > 20: 16.29
> > 21: 32.92
> > 22: 76.24
>
> Wow.
>
> Really nice testcase, though, thanks.
>
[...]
> We'll see
> if my patch goes anywhere, or if it's just another dead-end among
> many.

Since gitgitgadget doesn't have an in-reply-to capability, for those
wanting to follow along see
https://lore.kernel.org/git/pull.700.git.git.1580335424.gitgitgadget@gmail.com/

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-01-29 22:11 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-27 10:55 git status --ignored hangs when a deep directory structure present in working tree Martin Melka
2020-01-27 12:08 ` SZEDER Gábor
2020-01-28  5:06   ` Elijah Newren
2020-01-28 13:57     ` SZEDER Gábor
2020-01-28 15:00       ` Elijah Newren
2020-01-29 22:10     ` Elijah Newren

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).