* Git clean enumerates ignored directories (since 2.27) @ 2021-04-14 17:17 Jason Gore 2021-04-14 22:56 ` brian m. carlson 0 siblings, 1 reply; 10+ messages in thread From: Jason Gore @ 2021-04-14 17:17 UTC (permalink / raw) To: git@vger.kernel.org I'm unsure of whether this issue is Windows-specific but at the very least I suspect this is a change of behavior that would affect all platforms. In July 2020 I reported an issue I initially encountered in 2.27 that still seems to be present in 2.31.1.windows.1: https://github.com/git-for-windows/git/issues/2732 Since I haven't seen a response on the issue and it still occurs I decided to email this list as well. Package managers such as PNPM tend to create very long filenames due to symlinking. Having git not enumerate these ignored directories is essential to leveraging any sort of clean behavior (both for performance and for warning output) as we did before version 2.27. Our repo clean functions went from taking a few seconds to over 10 minutes due to this change in behavior. I've also had repeated problems sending you this email as your email server seems to reject any email with a URL in it (per the github link above.) Thanks, Jason ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Git clean enumerates ignored directories (since 2.27) 2021-04-14 17:17 Git clean enumerates ignored directories (since 2.27) Jason Gore @ 2021-04-14 22:56 ` brian m. carlson 2021-04-15 8:51 ` Jeff King 0 siblings, 1 reply; 10+ messages in thread From: brian m. carlson @ 2021-04-14 22:56 UTC (permalink / raw) To: Jason Gore; +Cc: git@vger.kernel.org, Elijah Newren [-- Attachment #1: Type: text/plain, Size: 2380 bytes --] On 2021-04-14 at 17:17:46, Jason Gore wrote: > I'm unsure of whether this issue is Windows-specific but at the very least I suspect this is a change of behavior that would affect all platforms. > > In July 2020 I reported an issue I initially encountered in 2.27 that still seems to be present in 2.31.1.windows.1: https://github.com/git-for-windows/git/issues/2732 > > Since I haven't seen a response on the issue and it still occurs I decided to email this list as well. > > Package managers such as PNPM tend to create very long filenames due to symlinking. Having git not enumerate these ignored directories is essential to leveraging any sort of clean behavior (both for performance and for warning output) as we did before version 2.27. Our repo clean functions went from taking a few seconds to over 10 minutes due to this change in behavior. It looks like there were some changes to the dir.c code between 2.26 and 2.27 which may have impacted clean. I'm CCing Elijah Newren who wrote those patches and in any event would be more familiar with that code than I am. I've taken the liberty of pulling in your testcase from that issue and converting it to work in POSIX sh on a Linux machine. If the path is not sufficiently long for your needs, you can bump the value of the variable "last" below. ---- #!/bin/sh git init test-repo cd test-repo longname="directory" touch "$longname.txt" last=400 for x in $(seq 1 $last); do mkdir "x$longname$x" mv directory* "x$longname$x" mv "x$longname$x" "$longname$x" done git clean -ffdxn -e directory$last ---- When it fails, it will complain that it wasn't able to open the directory. It still exits zero, however. I haven't bisected this, so I don't know if those patches are related to the problem or not. I'm a little short on time today to investigate further, but hopefully this can get someone on the right path with a modified version and git bisect run if nothing else. > I've also had repeated problems sending you this email as your email server seems to reject any email with a URL in it (per the github link above.) I don't think that's the problem, since this email came through just fine. It's more likely that your email was being bounced for HTML, which isn't allowed on the list. -- brian m. carlson (he/him or they/them) Houston, Texas, US [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Git clean enumerates ignored directories (since 2.27) 2021-04-14 22:56 ` brian m. carlson @ 2021-04-15 8:51 ` Jeff King 2021-04-22 17:18 ` Jason Gore 2021-05-07 2:31 ` Elijah Newren 0 siblings, 2 replies; 10+ messages in thread From: Jeff King @ 2021-04-15 8:51 UTC (permalink / raw) To: brian m. carlson; +Cc: Jason Gore, git@vger.kernel.org, Elijah Newren On Wed, Apr 14, 2021 at 10:56:14PM +0000, brian m. carlson wrote: > ---- > #!/bin/sh > > git init test-repo > cd test-repo > longname="directory" > touch "$longname.txt" > last=400 > for x in $(seq 1 $last); do > mkdir "x$longname$x" > mv directory* "x$longname$x" > mv "x$longname$x" "$longname$x" > done > git clean -ffdxn -e directory$last > ---- > > When it fails, it will complain that it wasn't able to open the > directory. It still exits zero, however. > > I haven't bisected this, so I don't know if those patches are related to > the problem or not. I'm a little short on time today to investigate > further, but hopefully this can get someone on the right path with a > modified version and git bisect run if nothing else. It bisects to 8d92fb2927 (dir: replace exponential algorithm with a linear one, 2020-04-01). I won't pretend to understand everything going on in that commit, though. -Peff ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: Git clean enumerates ignored directories (since 2.27) 2021-04-15 8:51 ` Jeff King @ 2021-04-22 17:18 ` Jason Gore 2021-04-22 17:24 ` Elijah Newren 2021-05-07 4:08 ` Elijah Newren 2021-05-07 2:31 ` Elijah Newren 1 sibling, 2 replies; 10+ messages in thread From: Jason Gore @ 2021-04-22 17:18 UTC (permalink / raw) To: peff@peff.net, sandals@crustytoothpaste.net Cc: git@vger.kernel.org, newren@gmail.com Sorry to add noise to this thread but since the github issue was closed I wanted to make sure I'm not missing any updates here. Would any updates come through on this thread? Thanks! -----Original Message----- From: Jeff King <peff@peff.net> Sent: Thursday, April 15, 2021 1:52 AM To: brian m. carlson <sandals@crustytoothpaste.net> Cc: Jason Gore <Jason.Gore@microsoft.com>; git@vger.kernel.org; Elijah Newren <newren@gmail.com> Subject: Re: Git clean enumerates ignored directories (since 2.27) On Wed, Apr 14, 2021 at 10:56:14PM +0000, brian m. carlson wrote: > ---- > #!/bin/sh > > git init test-repo > cd test-repo > longname="directory" > touch "$longname.txt" > last=400 > for x in $(seq 1 $last); do > mkdir "x$longname$x" > mv directory* "x$longname$x" > mv "x$longname$x" "$longname$x" > done > git clean -ffdxn -e directory$last > ---- > > When it fails, it will complain that it wasn't able to open the > directory. It still exits zero, however. > > I haven't bisected this, so I don't know if those patches are related > to the problem or not. I'm a little short on time today to > investigate further, but hopefully this can get someone on the right > path with a modified version and git bisect run if nothing else. It bisects to 8d92fb2927 (dir: replace exponential algorithm with a linear one, 2020-04-01). I won't pretend to understand everything going on in that commit, though. -Peff ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Git clean enumerates ignored directories (since 2.27) 2021-04-22 17:18 ` Jason Gore @ 2021-04-22 17:24 ` Elijah Newren 2021-05-07 4:08 ` Elijah Newren 1 sibling, 0 replies; 10+ messages in thread From: Elijah Newren @ 2021-04-22 17:24 UTC (permalink / raw) To: Jason Gore Cc: peff@peff.net, sandals@crustytoothpaste.net, git@vger.kernel.org Yes, sorry, I've just been swamped. It's still on my list to look at. On Thu, Apr 22, 2021 at 10:18 AM Jason Gore <Jason.Gore@microsoft.com> wrote: > > Sorry to add noise to this thread but since the github issue was closed I wanted to make sure I'm not missing any updates here. Would any updates come through on this thread? Thanks! > > -----Original Message----- > From: Jeff King <peff@peff.net> > Sent: Thursday, April 15, 2021 1:52 AM > To: brian m. carlson <sandals@crustytoothpaste.net> > Cc: Jason Gore <Jason.Gore@microsoft.com>; git@vger.kernel.org; Elijah Newren <newren@gmail.com> > Subject: Re: Git clean enumerates ignored directories (since 2.27) > > On Wed, Apr 14, 2021 at 10:56:14PM +0000, brian m. carlson wrote: > > > ---- > > #!/bin/sh > > > > git init test-repo > > cd test-repo > > longname="directory" > > touch "$longname.txt" > > last=400 > > for x in $(seq 1 $last); do > > mkdir "x$longname$x" > > mv directory* "x$longname$x" > > mv "x$longname$x" "$longname$x" > > done > > git clean -ffdxn -e directory$last > > ---- > > > > When it fails, it will complain that it wasn't able to open the > > directory. It still exits zero, however. > > > > I haven't bisected this, so I don't know if those patches are related > > to the problem or not. I'm a little short on time today to > > investigate further, but hopefully this can get someone on the right > > path with a modified version and git bisect run if nothing else. > > It bisects to 8d92fb2927 (dir: replace exponential algorithm with a linear one, 2020-04-01). I won't pretend to understand everything going on in that commit, though. > > -Peff ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Git clean enumerates ignored directories (since 2.27) 2021-04-22 17:18 ` Jason Gore 2021-04-22 17:24 ` Elijah Newren @ 2021-05-07 4:08 ` Elijah Newren 1 sibling, 0 replies; 10+ messages in thread From: Elijah Newren @ 2021-05-07 4:08 UTC (permalink / raw) To: Jason Gore Cc: peff@peff.net, sandals@crustytoothpaste.net, git@vger.kernel.org On Thu, Apr 22, 2021 at 10:18 AM Jason Gore <Jason.Gore@microsoft.com> wrote: > > Sorry to add noise to this thread but since the github issue was closed I wanted to make sure I'm not missing any updates here. Would any updates come through on this thread? Thanks! I have posted some proposed fixes at https://lore.kernel.org/git/pull.1020.git.git.1620360300.gitgitgadget@gmail.com/T/#t (particularly patches 1 & 3); feel free to watch there, or over at https://github.com/git/git/pull/1020. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Git clean enumerates ignored directories (since 2.27) 2021-04-15 8:51 ` Jeff King 2021-04-22 17:18 ` Jason Gore @ 2021-05-07 2:31 ` Elijah Newren 2021-05-07 3:25 ` Jeff King 1 sibling, 1 reply; 10+ messages in thread From: Elijah Newren @ 2021-05-07 2:31 UTC (permalink / raw) To: Jeff King; +Cc: brian m. carlson, Jason Gore, git@vger.kernel.org On Thu, Apr 15, 2021 at 1:51 AM Jeff King <peff@peff.net> wrote: > > On Wed, Apr 14, 2021 at 10:56:14PM +0000, brian m. carlson wrote: > > > ---- > > #!/bin/sh > > > > git init test-repo > > cd test-repo > > longname="directory" > > touch "$longname.txt" > > last=400 > > for x in $(seq 1 $last); do > > mkdir "x$longname$x" > > mv directory* "x$longname$x" > > mv "x$longname$x" "$longname$x" > > done > > git clean -ffdxn -e directory$last > > ---- > > > > When it fails, it will complain that it wasn't able to open the > > directory. It still exits zero, however. > > > > I haven't bisected this, so I don't know if those patches are related to > > the problem or not. I'm a little short on time today to investigate > > further, but hopefully this can get someone on the right path with a > > modified version and git bisect run if nothing else. > > It bisects to 8d92fb2927 (dir: replace exponential algorithm with a > linear one, 2020-04-01). I won't pretend to understand everything going > on in that commit, though. I've got a patch series to fix this and another traversal problem, and the tests pass on all platforms. Wahoo! BUT! On alpine linux-musl, I get an "error: Tests passed but test cleanup failed; aborting", which makes it report as a failed build. I'm not sure how to fix it and am asking for ideas. Apparently the deeply nested directory hierarchy cannot be removed on that platform with a simple "rm -rf $dirname". It throws a "rm: can't stat '/__w/git/git/t/trash directory.t7300-clean/avoid-traversing-deep-hierarchy/directory400/directory399/directory398/.....(you get the idea)....': Filename too long" error message.[1] Adding a "test_when_finished find directory400 -delete" also gives a "Filename too long" message followed by a lot of "Directory not empty" messages.[2] Anyone have any bright ideas about how to tweak this test? See [3] for the current incarnation of the code, which was basically taken from Brian's sample testcase. [1] https://github.com/git/git/pull/1020/checks?check_run_id=2523414561 [2] https://github.com/git/git/pull/1020/checks?check_run_id=2523594095 [3] https://github.com/git/git/pull/1020 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Git clean enumerates ignored directories (since 2.27) 2021-05-07 2:31 ` Elijah Newren @ 2021-05-07 3:25 ` Jeff King 2021-05-07 3:43 ` Jeff King 2021-05-07 3:44 ` Elijah Newren 0 siblings, 2 replies; 10+ messages in thread From: Jeff King @ 2021-05-07 3:25 UTC (permalink / raw) To: Elijah Newren; +Cc: brian m. carlson, Jason Gore, git@vger.kernel.org On Thu, May 06, 2021 at 07:31:40PM -0700, Elijah Newren wrote: > On alpine linux-musl, I get an "error: Tests passed but test cleanup > failed; aborting", which makes it report as a failed build. I'm not > sure how to fix it and am asking for ideas. > > Apparently the deeply nested directory hierarchy cannot be removed on > that platform with a simple "rm -rf $dirname". It throws a "rm: can't > stat '/__w/git/git/t/trash > directory.t7300-clean/avoid-traversing-deep-hierarchy/directory400/directory399/directory398/.....(you > get the idea)....': Filename too long" error message.[1] > > Adding a "test_when_finished find directory400 -delete" also gives a > "Filename too long" message followed by a lot of "Directory not empty" > messages.[2] > > Anyone have any bright ideas about how to tweak this test? See [3] > for the current incarnation of the code, which was basically taken > from Brian's sample testcase. My guess is that that version of "rm" is trying to feed the entire pathname directly to unlink() and rmdir(), and it exceeds PATH_MAX. Even with GNU tools, for instance, I get: $ rmdir $(find avoid-traversing-deep-hierarchy -type d | tail -1) rmdir: failed to remove 'avoid-traversing-deep-hierarchy/directory400/ [...and so on...]/directory1': File name too long because it feeds the whole to a single rmdir() call. Whereas stracing GNU "rm -rf", it uses unlinkat() and openat() to delete each level individually (probably to avoid this exact problem). Is the actual path length important, or just the depth? If the latter, then calling it "d400/d399/.../d2/d1" would likely help, as that's less than 2000 bytes. -Peff ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Git clean enumerates ignored directories (since 2.27) 2021-05-07 3:25 ` Jeff King @ 2021-05-07 3:43 ` Jeff King 2021-05-07 3:44 ` Elijah Newren 1 sibling, 0 replies; 10+ messages in thread From: Jeff King @ 2021-05-07 3:43 UTC (permalink / raw) To: Elijah Newren; +Cc: brian m. carlson, Jason Gore, git@vger.kernel.org On Thu, May 06, 2021 at 11:25:54PM -0400, Jeff King wrote: > > Anyone have any bright ideas about how to tweak this test? See [3] > > for the current incarnation of the code, which was basically taken > > from Brian's sample testcase. > > My guess is that that version of "rm" is trying to feed the entire > pathname directly to unlink() and rmdir(), and it exceeds PATH_MAX. > > Even with GNU tools, for instance, I get: > > $ rmdir $(find avoid-traversing-deep-hierarchy -type d | tail -1) > rmdir: failed to remove 'avoid-traversing-deep-hierarchy/directory400/ > [...and so on...]/directory1': File name too long > > because it feeds the whole to a single rmdir() call. Whereas stracing > GNU "rm -rf", it uses unlinkat() and openat() to delete each level > individually (probably to avoid this exact problem). > > Is the actual path length important, or just the depth? If the latter, > then calling it "d400/d399/.../d2/d1" would likely help, as that's less > than 2000 bytes. Reading your commit messages a little more carefully, I'm still not quite sure of the answer to that question. But if you really do need the long length, a workaround is to avoid dealing with the full path all at once. E.g., make two strings, one with "directory400/.../directory200", and one with "directory199/.../directory1". And then you can probably: (cd $one && rm -rf directory199) && rm -rf directory400 to do it in two parts, with each "rm" seeing only a half-length path. I notice you also run O(n) "mkdir" and "mv" calls to create the directory. I "mkdir -p" would be much more efficient, though it might run afoul of similar path-length problems (especially on non-GNU systems). It might be worth turning to perl here: perl -e ' for (reverse 1..400) { my $d = "directory$_"; mkdir($d) and chdir($d) or die "mkdir($d): $!"; } open(my $fh, ">", "some-file"); ' and you could probably do something similar to remove it. Sadly, I don't think using File::Path makes building it easier, because it hits the same path limit (it builds up the string internally). However, its removal does work (and is in the set of core modules that we can count on always being available): perl -MFile::Path=remove_tree -e 'remove_tree("directory400")' -Peff ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Git clean enumerates ignored directories (since 2.27) 2021-05-07 3:25 ` Jeff King 2021-05-07 3:43 ` Jeff King @ 2021-05-07 3:44 ` Elijah Newren 1 sibling, 0 replies; 10+ messages in thread From: Elijah Newren @ 2021-05-07 3:44 UTC (permalink / raw) To: Jeff King; +Cc: brian m. carlson, Jason Gore, git@vger.kernel.org On Thu, May 6, 2021 at 8:25 PM Jeff King <peff@peff.net> wrote: > > On Thu, May 06, 2021 at 07:31:40PM -0700, Elijah Newren wrote: > > > On alpine linux-musl, I get an "error: Tests passed but test cleanup > > failed; aborting", which makes it report as a failed build. I'm not > > sure how to fix it and am asking for ideas. > > > > Apparently the deeply nested directory hierarchy cannot be removed on > > that platform with a simple "rm -rf $dirname". It throws a "rm: can't > > stat '/__w/git/git/t/trash > > directory.t7300-clean/avoid-traversing-deep-hierarchy/directory400/directory399/directory398/.....(you > > get the idea)....': Filename too long" error message.[1] > > > > Adding a "test_when_finished find directory400 -delete" also gives a > > "Filename too long" message followed by a lot of "Directory not empty" > > messages.[2] > > > > Anyone have any bright ideas about how to tweak this test? See [3] > > for the current incarnation of the code, which was basically taken > > from Brian's sample testcase. > > My guess is that that version of "rm" is trying to feed the entire > pathname directly to unlink() and rmdir(), and it exceeds PATH_MAX. > > Even with GNU tools, for instance, I get: > > $ rmdir $(find avoid-traversing-deep-hierarchy -type d | tail -1) > rmdir: failed to remove 'avoid-traversing-deep-hierarchy/directory400/ > [...and so on...]/directory1': File name too long > > because it feeds the whole to a single rmdir() call. Whereas stracing > GNU "rm -rf", it uses unlinkat() and openat() to delete each level > individually (probably to avoid this exact problem). > > Is the actual path length important, or just the depth? If the latter, > then calling it "d400/d399/.../d2/d1" would likely help, as that's less > than 2000 bytes. I needed some kind of way to notice and test that we had recursed into the directory. Excessively long paths trigger failures in open() calls in git itself, which was the only marker I knew of. So, I was essentially depending on the long paths. (The actual depth was not important.). If there were a different way to determine whether we recursed into the first level subdirectory, that'd be nicer. I thought about adding some trace2_region_enter/trace2_region_leave calls and letting them nest up to whatever depth, though that seemed a bit ugly. I also thought about trying to determine the maximum recursion level and seeing if that could be emitted via some kind of trace2 magic, but that sounded like another rabbit hole. In the end, I gave up and used Brian's modification of Jason's testcase. However, it looks like manually unstacking and deleting the directory is going to work. Incredibly ugly and slow, but adding the following at the end of the testcase makes it work (at least on that platform, still awaiting results on others): # alpine-linux-musl fails to "rm -rf" a directory with such # a deeply nested hierarchy. Help it out by deleting the # leading directories ourselves. Super slow, but, what else # can we do? Without this, we will hit a # error: Tests passed but test cleanup failed; aborting # so do this ugly manual cleanup... while test ! -f directory-random-file.txt; do name=$(ls -d directory*) && mv $name/* . && rmdir $name done ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2021-05-07 4:08 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-04-14 17:17 Git clean enumerates ignored directories (since 2.27) Jason Gore 2021-04-14 22:56 ` brian m. carlson 2021-04-15 8:51 ` Jeff King 2021-04-22 17:18 ` Jason Gore 2021-04-22 17:24 ` Elijah Newren 2021-05-07 4:08 ` Elijah Newren 2021-05-07 2:31 ` Elijah Newren 2021-05-07 3:25 ` Jeff King 2021-05-07 3:43 ` Jeff King 2021-05-07 3:44 ` Elijah Newren
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).