* [PATCH 0/5] Directory traversal fixes
From: Elijah Newren via GitGitGadget @ 2021-05-07 4:04 UTC
To: git; +Cc: Elijah Newren

This patchset fixes a few directory traversal issues, where
fill_directory() would traverse into directories that it shouldn't and
not traverse into directories that it should. One of these issues was
reported recently on this list[1], another was found at $DAYJOB.

The fifth patch might have backward compatibility implications, but is
easy to review. Even if the logic in dir.c makes your eyes glaze over, at
least take a look at the fifth patch.

Also, if anyone has any ideas about a better place to put the "Some
sidenotes" from the third commit message rather than keeping them in a
random commit message, that might be helpful too.

[1] See
https://lore.kernel.org/git/DM6PR00MB06829EC5B85E0C5AC595004E894E9@DM6PR00MB0682.namprd00.prod.outlook.com/
or alternatively https://github.com/git-for-windows/git/issues/2732.

Elijah Newren (5):
  t7300: add testcase showing unnecessary traversal into ignored
    directory
  t3001, t7300: add testcase showcasing missed directory traversal
  dir: avoid unnecessary traversal into ignored directory
  dir: traverse into untracked directories if they may have ignored
    subfiles
  [RFC] ls-files: error out on -i unless -o or -c are specified

 builtin/ls-files.c                 |  3 ++
 dir.c                              | 50 ++++++++++++++++---------
 t/t1306-xdg-files.sh               |  2 +-
 t/t3001-ls-files-others-exclude.sh |  5 +++
 t/t3003-ls-files-exclude.sh        |  4 +-
 t/t7300-clean.sh                   | 59 ++++++++++++++++++++++++++++++
 6 files changed, 103 insertions(+), 20 deletions(-)

base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1020%2Fnewren%2Fdirectory-traversal-fixes-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1020/newren/directory-traversal-fixes-v1
Pull-Request: https://github.com/git/git/pull/1020
-- 
gitgitgadget

* [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
From: Elijah Newren via GitGitGadget @ 2021-05-07 4:04 UTC
To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

PNPM is apparently creating deeply nested (but ignored) directory
structures; traversing them is costly performance-wise, unnecessary, and
in some cases is even throwing warnings/errors because the paths are too
long to handle on various platforms. Add a testcase that demonstrates
this problem.

Initial-test-by: Jason Gore <Jason.Gore@microsoft.com>
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t7300-clean.sh | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index a74816ca8b46..5f1dc397c11e 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -746,4 +746,44 @@ test_expect_success 'clean untracked paths by pathspec' '
 	test_must_be_empty actual
 '
 
+test_expect_failure 'avoid traversing into ignored directories' '
+	test_when_finished rm -f output error &&
+	test_create_repo avoid-traversing-deep-hierarchy &&
+	(
+		cd avoid-traversing-deep-hierarchy &&
+
+		>directory-random-file.txt &&
+		# Put this file under directory400/directory399/.../directory1/
+		depth=400 &&
+		for x in $(test_seq 1 $depth); do
+			mkdir "tmpdirectory$x" &&
+			mv directory* "tmpdirectory$x" &&
+			mv "tmpdirectory$x" "directory$x"
+		done &&
+
+		git clean -ffdxn -e directory$depth >../output 2>../error &&
+
+		test_must_be_empty ../output &&
+		# We especially do not want things like
+		#   "warning: could not open directory "
+		# appearing in the error output.  It is true that directories
+		# that are too long cannot be opened, but we should not be
+		# recursing into those directories anyway since the very first
+		# level is ignored.
+		test_must_be_empty ../error &&
+
+		# alpine-linux-musl fails to "rm -rf" a directory with such
+		# a deeply nested hierarchy.  Help it out by deleting the
+		# leading directories ourselves.  Super slow, but, what else
+		# can we do?  Without this, we will hit a
+		#     error: Tests passed but test cleanup failed; aborting
+		# so do this ugly manual cleanup...
+		while test ! -f directory-random-file.txt; do
+			name=$(ls -d directory*) &&
+			mv $name/* . &&
+			rmdir $name
+		done
+	)
+'
+
 test_done
-- 
gitgitgadget

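The nesting loop in the test above can be tried in isolation at a small
depth. What follows is only a minimal sketch, not part of the patch: the
scratch directory from `mktemp -d` and the depth of 3 (instead of 400)
are assumptions chosen for illustration, and a plain counter stands in
for the test suite's `test_seq` helper.

```shell
# Illustrative only: scratch location and depth are assumptions,
# not values taken from the patch.
scratch=$(mktemp -d) &&
cd "$scratch" &&
>directory-random-file.txt &&
depth=3 &&
x=1 &&
while test "$x" -le "$depth"
do
	# Wrap everything matching directory* in a new directory, then
	# rename that directory so the glob picks it up next time round.
	mkdir "tmpdirectory$x" &&
	mv directory* "tmpdirectory$x" &&
	mv "tmpdirectory$x" "directory$x" &&
	x=$((x + 1)) || break
done &&
# Prints ./directory3/directory2/directory1/directory-random-file.txt
find . -type f
```

Because each iteration only handles short relative names, the loop never
has to pass the full nested path to `mkdir` or `mv`.
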
* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
From: Eric Sunshine @ 2021-05-07 4:27 UTC
To: Elijah Newren via GitGitGadget; +Cc: Git List, Elijah Newren

On Fri, May 7, 2021 at 12:05 AM Elijah Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> PNPM is apparently creating deeply nested (but ignored) directory
> structures; traversing them is costly performance-wise, unnecessary, and
> in some cases is even throwing warnings/errors because the paths are too
> long to handle on various platforms. Add a testcase that demonstrates
> this problem.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
> diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
> @@ -746,4 +746,44 @@ test_expect_success 'clean untracked paths by pathspec' '
> +test_expect_failure 'avoid traversing into ignored directories' '
> +	test_when_finished rm -f output error &&
> +	test_create_repo avoid-traversing-deep-hierarchy &&
> +	(
> +		cd avoid-traversing-deep-hierarchy &&
> +
> +		>directory-random-file.txt &&
> +		# Put this file under directory400/directory399/.../directory1/
> +		depth=400 &&
> +		for x in $(test_seq 1 $depth); do
> +			mkdir "tmpdirectory$x" &&
> +			mv directory* "tmpdirectory$x" &&
> +			mv "tmpdirectory$x" "directory$x"
> +		done &&

Is this expensive/slow loop needed because you'd otherwise run afoul
of command-line length limits on some platforms if you tried creating
the entire mess of directories with a single `mkdir -p`?

> +		git clean -ffdxn -e directory$depth >../output 2>../error &&
> +
> +		test_must_be_empty ../output &&
> +		# We especially do not want things like
> +		#   "warning: could not open directory "
> +		# appearing in the error output.  It is true that directories
> +		# that are too long cannot be opened, but we should not be
> +		# recursing into those directories anyway since the very first
> +		# level is ignored.
> +		test_must_be_empty ../error &&
> +
> +		# alpine-linux-musl fails to "rm -rf" a directory with such
> +		# a deeply nested hierarchy.  Help it out by deleting the
> +		# leading directories ourselves.  Super slow, but, what else
> +		# can we do?  Without this, we will hit a
> +		#     error: Tests passed but test cleanup failed; aborting
> +		# so do this ugly manual cleanup...
> +		while test ! -f directory-random-file.txt; do
> +			name=$(ls -d directory*) &&
> +			mv $name/* . &&
> +			rmdir $name
> +		done

Shouldn't this cleanup loop be under the control of
test_when_finished() to ensure it is invoked regardless of how the
test exits?

> +	)
> +'

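The cleanup loop quoted above peels off one directory level per
iteration until the file reappears at the top. A minimal sketch of that
behaviour follows, assuming a scratch directory from `mktemp -d` and a
depth of 3 built directly with `mkdir -p`; both are illustrative
choices, and the `levels` counter is added here only to make the
iteration count visible.

```shell
# Illustrative only: scratch location, depth and the "levels" counter
# are assumptions added for this sketch.
scratch=$(mktemp -d) &&
cd "$scratch" &&
mkdir -p directory3/directory2/directory1 &&
>directory3/directory2/directory1/directory-random-file.txt &&
levels=0 &&
while test ! -f directory-random-file.txt
do
	# Exactly one directory* entry exists at the top on each pass:
	# hoist its contents up one level and remove the empty shell.
	name=$(ls -d directory*) &&
	mv "$name"/* . &&
	rmdir "$name" &&
	levels=$((levels + 1)) || break
done &&
echo "removed $levels levels" &&
ls
```

Each pass only ever calls `rmdir` on a single, already empty directory,
which is what lets it sidestep a recursive `rm -rf` over the whole
hierarchy.
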
* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
From: Elijah Newren @ 2021-05-07 5:00 UTC
To: Eric Sunshine; +Cc: Elijah Newren via GitGitGadget, Git List

On Thu, May 6, 2021 at 9:27 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Fri, May 7, 2021 at 12:05 AM Elijah Newren via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > PNPM is apparently creating deeply nested (but ignored) directory
> > structures; traversing them is costly performance-wise, unnecessary, and
> > in some cases is even throwing warnings/errors because the paths are too
> > long to handle on various platforms. Add a testcase that demonstrates
> > this problem.
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> > diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
> > @@ -746,4 +746,44 @@ test_expect_success 'clean untracked paths by pathspec' '
> > +test_expect_failure 'avoid traversing into ignored directories' '
> > +	test_when_finished rm -f output error &&
> > +	test_create_repo avoid-traversing-deep-hierarchy &&
> > +	(
> > +		cd avoid-traversing-deep-hierarchy &&
> > +
> > +		>directory-random-file.txt &&
> > +		# Put this file under directory400/directory399/.../directory1/
> > +		depth=400 &&
> > +		for x in $(test_seq 1 $depth); do
> > +			mkdir "tmpdirectory$x" &&
> > +			mv directory* "tmpdirectory$x" &&
> > +			mv "tmpdirectory$x" "directory$x"
> > +		done &&
>
> Is this expensive/slow loop needed because you'd otherwise run afoul
> of command-line length limits on some platforms if you tried creating
> the entire mess of directories with a single `mkdir -p`?

The whole point is creating a path long enough that it runs afoul of
limits, yes.

If we had an alternative way to check whether dir.c actually recursed
into a directory, then I could dispense with this and just have a
single directory (and it could be named a single character long for
that matter too), but I don't know of a good way to do that. (Some
possibilities I considered along that route are mentioned at
https://lore.kernel.org/git/CABPp-BF3e+MWQAGb6ER7d5jqjcV=kYqQ2stM_oDyaqvonPPPSw@mail.gmail.com/)

> > +		git clean -ffdxn -e directory$depth >../output 2>../error &&
> > +
> > +		test_must_be_empty ../output &&
> > +		# We especially do not want things like
> > +		#   "warning: could not open directory "
> > +		# appearing in the error output.  It is true that directories
> > +		# that are too long cannot be opened, but we should not be
> > +		# recursing into those directories anyway since the very first
> > +		# level is ignored.
> > +		test_must_be_empty ../error &&
> > +
> > +		# alpine-linux-musl fails to "rm -rf" a directory with such
> > +		# a deeply nested hierarchy.  Help it out by deleting the
> > +		# leading directories ourselves.  Super slow, but, what else
> > +		# can we do?  Without this, we will hit a
> > +		#     error: Tests passed but test cleanup failed; aborting
> > +		# so do this ugly manual cleanup...
> > +		while test ! -f directory-random-file.txt; do
> > +			name=$(ls -d directory*) &&
> > +			mv $name/* . &&
> > +			rmdir $name
> > +		done
>
> Shouldn't this cleanup loop be under the control of
> test_when_finished() to ensure it is invoked regardless of how the
> test exits?

I thought about that, but if the test fails, it seems nicer to leave
everything behind so it can be inspected. It's similar to test_done,
which will only delete the $TRASH_DIRECTORY if all the tests passed.
So no, I don't think this should be under the control of
test_when_finished.

* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
From: Eric Sunshine @ 2021-05-07 5:31 UTC
To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, Git List

On Fri, May 7, 2021 at 1:01 AM Elijah Newren <newren@gmail.com> wrote:
> On Thu, May 6, 2021 at 9:27 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > Is this expensive/slow loop needed because you'd otherwise run afoul
> > of command-line length limits on some platforms if you tried creating
> > the entire mess of directories with a single `mkdir -p`?
>
> The whole point is creating a path long enough that it runs afoul of
> limits, yes.
>
> If we had an alternative way to check whether dir.c actually recursed
> into a directory, then I could dispense with this and just have a
> single directory (and it could be named a single character long for
> that matter too), but I don't know of a good way to do that. (Some
> possibilities I considered along that route are mentioned at
> https://lore.kernel.org/git/CABPp-BF3e+MWQAGb6ER7d5jqjcV=kYqQ2stM_oDyaqvonPPPSw@mail.gmail.com/)

Thanks, I read that exchange (of course) immediately after sending the
above question.

> > > +		while test ! -f directory-random-file.txt; do
> > > +			name=$(ls -d directory*) &&
> > > +			mv $name/* . &&
> > > +			rmdir $name
> > > +		done
> >
> > Shouldn't this cleanup loop be under the control of
> > test_when_finished() to ensure it is invoked regardless of how the
> > test exits?
>
> I thought about that, but if the test fails, it seems nicer to leave
> everything behind so it can be inspected. It's similar to test_done,
> which will only delete the $TRASH_DIRECTORY if all the tests passed.
> So no, I don't think this should be under the control of
> test_when_finished.

I may be confused, but I'm not following this reasoning. If you're
using `-i` to debug a failure within the test, then the
test_when_finished() cleanup actions won't be triggered anyhow
(they're suppressed by `-i`), so everything will be left behind as
desired.

The problem with not placing this under control of
test_when_finished() is that, if something in the test proper does
break, after the "test failed" message, you'll get the undesirable
alpine-linux-musl behavior you explained in your earlier email where
test_done() bombs out.

* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
From: Elijah Newren @ 2021-05-07 5:42 UTC
To: Eric Sunshine; +Cc: Elijah Newren via GitGitGadget, Git List

On Thu, May 6, 2021 at 10:32 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Fri, May 7, 2021 at 1:01 AM Elijah Newren <newren@gmail.com> wrote:
> > On Thu, May 6, 2021 at 9:27 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > > Is this expensive/slow loop needed because you'd otherwise run afoul
> > > of command-line length limits on some platforms if you tried creating
> > > the entire mess of directories with a single `mkdir -p`?
> >
> > The whole point is creating a path long enough that it runs afoul of
> > limits, yes.
> >
> > If we had an alternative way to check whether dir.c actually recursed
> > into a directory, then I could dispense with this and just have a
> > single directory (and it could be named a single character long for
> > that matter too), but I don't know of a good way to do that. (Some
> > possibilities I considered along that route are mentioned at
> > https://lore.kernel.org/git/CABPp-BF3e+MWQAGb6ER7d5jqjcV=kYqQ2stM_oDyaqvonPPPSw@mail.gmail.com/)
>
> Thanks, I read that exchange (of course) immediately after sending the
> above question.
>
> > > > +		while test ! -f directory-random-file.txt; do
> > > > +			name=$(ls -d directory*) &&
> > > > +			mv $name/* . &&
> > > > +			rmdir $name
> > > > +		done
> > >
> > > Shouldn't this cleanup loop be under the control of
> > > test_when_finished() to ensure it is invoked regardless of how the
> > > test exits?
> >
> > I thought about that, but if the test fails, it seems nicer to leave
> > everything behind so it can be inspected. It's similar to test_done,
> > which will only delete the $TRASH_DIRECTORY if all the tests passed.
> > So no, I don't think this should be under the control of
> > test_when_finished.
>
> I may be confused, but I'm not following this reasoning. If you're
> using `-i` to debug a failure within the test, then the
> test_when_finished() cleanup actions won't be triggered anyhow
> (they're suppressed by `-i`), so everything will be left behind as
> desired.

I didn't know that about --immediate. It's good to know. However,
not all debugging is done with -i; someone can also just run the
testsuite expecting everything to pass, see a failure, and then decide
to go look around (and then maybe re-run with -i if the initial
looking around isn't clear). I do that every once in a while.

> The problem with not placing this under control of
> test_when_finished() is that, if something in the test proper does
> break, after the "test failed" message, you'll get the undesirable
> alpine-linux-musl behavior you explained in your earlier email where
> test_done() bombs out.

Unless I'm misunderstanding the test_done() code (I'm looking at
test-lib.sh, lines 1149-1183), test_done() only bombs out when it
tries to "rm -rf $TRASH_DIRECTORY", and it only runs that command if
there are 0 test failures (see test-lib.sh, lines 1149-1183). So, if
something in the test proper does break, that by itself will prevent
test_done() from bombing out.

* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
From: Eric Sunshine @ 2021-05-07 5:56 UTC
To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, Git List

On Fri, May 7, 2021 at 1:42 AM Elijah Newren <newren@gmail.com> wrote:
> On Thu, May 6, 2021 at 10:32 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > I may be confused, but I'm not following this reasoning. If you're
> > using `-i` to debug a failure within the test, then the
> > test_when_finished() cleanup actions won't be triggered anyhow
> > (they're suppressed by `-i`), so everything will be left behind as
> > desired.
>
> I didn't know that about --immediate. It's good to know. However,
> not all debugging is done with -i; someone can also just run the
> testsuite expecting everything to pass, see a failure, and then decide
> to go look around (and then maybe re-run with -i if the initial
> looking around isn't clear). I do that every once in a while.

That's certainly an approach, and it's made easier when each test
creates its own repo (as the tests you write typically do). In
general, though, the majority of Git test scripts run all their tests
in a single repo (per test script), with the result that state from a
failed test is very frequently clobbered by subsequent tests, which is
why --immediate is so useful (it stops the script as soon as one test
fails, so the test state is preserved as well as it can be). Due to
the "clobbering" problem, I don't think I've ever tried debugging a
failed test without using --immediate.

> > The problem with not placing this under control of
> > test_when_finished() is that, if something in the test proper does
> > break, after the "test failed" message, you'll get the undesirable
> > alpine-linux-musl behavior you explained in your earlier email where
> > test_done() bombs out.
>
> Unless I'm misunderstanding the test_done() code (I'm looking at
> test-lib.sh, lines 1149-1183), test_done() only bombs out when it
> tries to "rm -rf $TRASH_DIRECTORY", and it only runs that command if
> there are 0 test failures (see test-lib.sh, lines 1149-1183). So, if
> something in the test proper does break, that by itself will prevent
> test_done() from bombing out.

I see what you're saying. Okay.

* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
From: Jeff King @ 2021-05-07 23:05 UTC
To: Elijah Newren; +Cc: Eric Sunshine, Elijah Newren via GitGitGadget, Git List

On Thu, May 06, 2021 at 10:00:49PM -0700, Elijah Newren wrote:

> > > +		>directory-random-file.txt &&
> > > +		# Put this file under directory400/directory399/.../directory1/
> > > +		depth=400 &&
> > > +		for x in $(test_seq 1 $depth); do
> > > +			mkdir "tmpdirectory$x" &&
> > > +			mv directory* "tmpdirectory$x" &&
> > > +			mv "tmpdirectory$x" "directory$x"
> > > +		done &&
> >
> > Is this expensive/slow loop needed because you'd otherwise run afoul
> > of command-line length limits on some platforms if you tried creating
> > the entire mess of directories with a single `mkdir -p`?
>
> The whole point is creating a path long enough that it runs afoul of
> limits, yes.
>
> If we had an alternative way to check whether dir.c actually recursed
> into a directory, then I could dispense with this and just have a
> single directory (and it could be named a single character long for
> that matter too), but I don't know of a good way to do that. (Some
> possibilities I considered along that route are mentioned at
> https://lore.kernel.org/git/CABPp-BF3e+MWQAGb6ER7d5jqjcV=kYqQ2stM_oDyaqvonPPPSw@mail.gmail.com/)

I don't have a better way of checking the dir.c behavior. But I think
the other half of Eric's question was: why can't we do this setup way
more efficiently with "mkdir -p"?

I'd be suspicious that it would work portably because of the long path.
But I think the perl I showed earlier would create it in much less time:

  $ touch directory-file
  $ time sh -c '
      for x in $(seq 1 400)
      do
        mkdir tmpdirectory$x &&
        mv directory* tmpdirectory$x &&
        mv tmpdirectory$x directory$x
      done
    '

  real	0m2.222s
  user	0m1.481s
  sys	0m0.816s

  $ time perl -e '
      for (reverse 1..400) {
        my $d = "directory$_";
        mkdir($d) and chdir($d) or die "mkdir($d): $!";
      }
      open(my $fh, ">", "some-file");
    '

  real	0m0.010s
  user	0m0.001s
  sys	0m0.009s

-Peff

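The perl snippet above creates each level and immediately descends into
it, so no command ever sees more than one short path component. The
same mkdir-then-chdir idea can be written in plain shell; the following
is a minimal sketch under stated assumptions, not something proposed in
the thread: the scratch directory from `mktemp -d` and the depth of 5
are illustrative, and whether `cd` behaves at depth 400 on every shell
and platform has not been checked here.

```shell
# Illustrative only: scratch location and depth are assumptions; this is
# a shell rendering of the mkdir-then-chdir approach, not the perl code.
scratch=$(mktemp -d) &&
depth=5 &&
(
	cd "$scratch" &&
	x=$depth &&
	while test "$x" -ge 1
	do
		# Descend as we go, so mkdir only ever gets a short
		# relative name rather than the full nested path.
		mkdir "directory$x" &&
		cd "directory$x" &&
		x=$((x - 1)) || exit 1
	done &&
	>some-file
) &&
find "$scratch" -name some-file
```

Running the loop inside a subshell keeps the caller's working directory
unchanged once the hierarchy has been built.
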
* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
From: Eric Sunshine @ 2021-05-07 23:15 UTC
To: Jeff King; +Cc: Elijah Newren, Elijah Newren via GitGitGadget, Git List

On Fri, May 7, 2021 at 7:05 PM Jeff King <peff@peff.net> wrote:
> I don't have a better way of checking the dir.c behavior. But I think
> the other half of Eric's question was: why can't we do this setup way
> more efficiently with "mkdir -p"?

I didn't really have that other half-question, as I understood the
portability ramifications. Rather, I just wanted to make sure the
reason I thought the code was doing the for-loop-plus-mv dance was
indeed correct, and that I wasn't overlooking something non-obvious. I
was also indirectly hinting that that bit of code might deserve an
in-code comment explaining why the for-loop is there so that someone
doesn't come along in the future and try replacing it with `mkdir -p`.

> I'd be suspicious that it would work portably because of the long path.
> But I think the perl I showed earlier would create it in much less time:
>
>   $ time perl -e '
>       for (reverse 1..400) {
>         my $d = "directory$_";
>         mkdir($d) and chdir($d) or die "mkdir($d): $!";
>       }
>       open(my $fh, ">", "some-file");
>     '

Yep, this and your other Perl code snippet for removing the directory
seemed much nicer than the far more expensive shell for-loop-plus-mv
(especially for Windows folk).

* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
From: Elijah Newren @ 2021-05-08 0:04 UTC
To: Jeff King; +Cc: Eric Sunshine, Elijah Newren via GitGitGadget, Git List

On Fri, May 7, 2021 at 4:05 PM Jeff King <peff@peff.net> wrote:
>
> On Thu, May 06, 2021 at 10:00:49PM -0700, Elijah Newren wrote:
>
> > > > +		>directory-random-file.txt &&
> > > > +		# Put this file under directory400/directory399/.../directory1/
> > > > +		depth=400 &&
> > > > +		for x in $(test_seq 1 $depth); do
> > > > +			mkdir "tmpdirectory$x" &&
> > > > +			mv directory* "tmpdirectory$x" &&
> > > > +			mv "tmpdirectory$x" "directory$x"
> > > > +		done &&
> > >
> > > Is this expensive/slow loop needed because you'd otherwise run afoul
> > > of command-line length limits on some platforms if you tried creating
> > > the entire mess of directories with a single `mkdir -p`?
> >
> > The whole point is creating a path long enough that it runs afoul of
> > limits, yes.
> >
> > If we had an alternative way to check whether dir.c actually recursed
> > into a directory, then I could dispense with this and just have a
> > single directory (and it could be named a single character long for
> > that matter too), but I don't know of a good way to do that. (Some
> > possibilities I considered along that route are mentioned at
> > https://lore.kernel.org/git/CABPp-BF3e+MWQAGb6ER7d5jqjcV=kYqQ2stM_oDyaqvonPPPSw@mail.gmail.com/)
>
> I don't have a better way of checking the dir.c behavior. But I think
> the other half of Eric's question was: why can't we do this setup way
> more efficiently with "mkdir -p"?

I think I figured it out. I now have the test simplified down to just:

test_expect_success 'avoid traversing into ignored directories' '
	test_when_finished rm -f output error trace.* &&
	test_create_repo avoid-traversing-deep-hierarchy &&
	(
		mkdir -p untracked/subdir/with/a &&
		>untracked/subdir/with/a/random-file.txt &&

		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
		git clean -ffdxn -e untracked &&

		grep data.*read_directo.*visited ../trace.output \
			| cut -d "|" -f 9 >../trace.relevant &&
		cat >../trace.expect <<-EOF &&
		directories-visited:1
		paths-visited:4
		EOF
		test_cmp ../trace.expect ../trace.relevant
	)
'

This relies on a few extra changes to the code: (1) switching the
existing trace calls in dir.c over to using trace2 variants, and (2)
adding two new counters (visited_directories and visited_paths) that
are output using the trace2 framework. I'm a little unsure if I
should check the paths-visited counter (will some platform have
additional files in every directory besides '.' and '..'? Or not have
one of those?), but it is good to have it check that the code in this
case visits no directories other than the toplevel one (i.e. that
directories-visited is 1).

New patches incoming shortly...

* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
From: Eric Sunshine @ 2021-05-08 0:10 UTC
To: Elijah Newren; +Cc: Jeff King, Elijah Newren via GitGitGadget, Git List

On Fri, May 7, 2021 at 8:04 PM Elijah Newren <newren@gmail.com> wrote:
> I think I figured it out. I now have the test simplified down to just:
>
> test_expect_success 'avoid traversing into ignored directories' '
>	test_when_finished rm -f output error trace.* &&
>	test_create_repo avoid-traversing-deep-hierarchy &&
>	(
>		mkdir -p untracked/subdir/with/a &&
>		>untracked/subdir/with/a/random-file.txt &&
>
>		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
>		git clean -ffdxn -e untracked &&
>
>		grep data.*read_directo.*visited ../trace.output \
>			| cut -d "|" -f 9 >../trace.relevant &&
>		cat >../trace.expect <<-EOF &&
>		directories-visited:1
>		paths-visited:4
>		EOF
>		test_cmp ../trace.expect ../trace.relevant
>	)
> '

I believe that you can close the subshell immediately after `git
clean`, which would allow you to drop all the "../" prefixes on
pathnames.

> This relies on a few extra changes to the code: (1) switching the
> existing trace calls in dir.c over to using trace2 variants, and (2)
> adding two new counters (visited_directories and visited_paths) that
> are output using the trace2 framework. I'm a little unsure if I
> should check the paths-visited counter (will some platform have
> additional files in every directory besides '.' and '..'? Or not have
> one of those?), but it is good to have it check that the code in this
> case visits no directories other than the toplevel one (i.e. that
> directories-visited is 1).

I can't find the reference, but I recall a reply by jrnieder (to some
proposed patch) that not all platforms are guaranteed to have "." and
".." entries (but I'm not sure we need to worry about that presently).

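The point about closing the subshell early rests on ordinary shell
behaviour: a `cd` inside `( ... )` does not affect the working
directory once the parentheses close. A minimal standalone sketch of
that behaviour follows; the scratch directory and the file names
`inside.out` and `outside.out` are made up for illustration and have
nothing to do with the test in the thread.

```shell
# Illustrative only: all names here are hypothetical.
scratch=$(mktemp -d) &&
cd "$scratch" &&
mkdir repo &&
(
	cd repo &&
	# Still inside the subshell: the parent directory needs "../".
	echo "from inside" >../inside.out
) &&
# The subshell has exited, so the working directory is $scratch again
# and files here can be named without a "../" prefix.
echo "from outside" >outside.out &&
ls
```

Only the commands that truly need to run inside the repository have to
stay within the parentheses; everything after them can use plain
relative names.
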
* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory
From: Elijah Newren @ 2021-05-08 17:20 UTC
To: Eric Sunshine; +Cc: Jeff King, Elijah Newren via GitGitGadget, Git List

On Fri, May 7, 2021 at 5:11 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Fri, May 7, 2021 at 8:04 PM Elijah Newren <newren@gmail.com> wrote:
> > I think I figured it out. I now have the test simplified down to just:
> >
> > test_expect_success 'avoid traversing into ignored directories' '
> >	test_when_finished rm -f output error trace.* &&
> >	test_create_repo avoid-traversing-deep-hierarchy &&
> >	(
> >		mkdir -p untracked/subdir/with/a &&
> >		>untracked/subdir/with/a/random-file.txt &&
> >
> >		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
> >		git clean -ffdxn -e untracked &&
> >
> >		grep data.*read_directo.*visited ../trace.output \
> >			| cut -d "|" -f 9 >../trace.relevant &&
> >		cat >../trace.expect <<-EOF &&
> >		directories-visited:1
> >		paths-visited:4
> >		EOF
> >		test_cmp ../trace.expect ../trace.relevant
> >	)
> > '
>
> I believe that you can close the subshell immediately after `git
> clean`, which would allow you to drop all the "../" prefixes on
> pathnames.

Ah, good point. I'll make that fix.

> > This relies on a few extra changes to the code: (1) switching the
> > existing trace calls in dir.c over to using trace2 variants, and (2)
> > adding two new counters (visited_directories and visited_paths) that
> > are output using the trace2 framework. I'm a little unsure if I
> > should check the paths-visited counter (will some platform have
> > additional files in every directory besides '.' and '..'? Or not have
> > one of those?), but it is good to have it check that the code in this
> > case visits no directories other than the toplevel one (i.e. that
> > directories-visited is 1).
>
> I can't find the reference, but I recall a reply by jrnieder (to some
> proposed patch) that not all platforms are guaranteed to have "." and
> ".." entries (but I'm not sure we need to worry about that presently).

* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory 2021-05-07 4:04 ` [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget 2021-05-07 4:27 ` Eric Sunshine @ 2021-05-08 11:13 ` Philip Oakley 2021-05-08 17:20 ` Elijah Newren 1 sibling, 1 reply; 90+ messages in thread From: Philip Oakley @ 2021-05-08 11:13 UTC (permalink / raw) To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren On 07/05/2021 05:04, Elijah Newren via GitGitGadget wrote: > From: Elijah Newren <newren@gmail.com> > > PNPM for me, this was a UNA (un-named abbreviation), can we clarify it, e.g s/PNPM/& package manager/ > is apparently creating deeply nested (but ignored) directory > structures; traversing them is costly performance-wise, unnecessary, and > in some cases is even throwing warnings/errors because the paths are too > long to handle on various platforms. Add a testcase that demonstrates > this problem. > > Initial-test-by: Jason Gore <Jason.Gore@microsoft.com> > Helped-by: brian m. 
carlson <sandals@crustytoothpaste.net> > Signed-off-by: Elijah Newren <newren@gmail.com> > --- > t/t7300-clean.sh | 40 ++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 40 insertions(+) > > diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh > index a74816ca8b46..5f1dc397c11e 100755 > --- a/t/t7300-clean.sh > +++ b/t/t7300-clean.sh > @@ -746,4 +746,44 @@ test_expect_success 'clean untracked paths by pathspec' ' > test_must_be_empty actual > ' > > +test_expect_failure 'avoid traversing into ignored directories' ' > + test_when_finished rm -f output error && > + test_create_repo avoid-traversing-deep-hierarchy && > + ( > + cd avoid-traversing-deep-hierarchy && > + > + >directory-random-file.txt && > + # Put this file under directory400/directory399/.../directory1/ > + depth=400 && > + for x in $(test_seq 1 $depth); do > + mkdir "tmpdirectory$x" && > + mv directory* "tmpdirectory$x" && > + mv "tmpdirectory$x" "directory$x" > + done && > + > + git clean -ffdxn -e directory$depth >../output 2>../error && > + > + test_must_be_empty ../output && > + # We especially do not want things like > + # "warning: could not open directory " > + # appearing in the error output. It is true that directories > + # that are too long cannot be opened, but we should not be > + # recursing into those directories anyway since the very first > + # level is ignored. > + test_must_be_empty ../error && > + > + # alpine-linux-musl fails to "rm -rf" a directory with such > + # a deeply nested hierarchy. Help it out by deleting the > + # leading directories ourselves. Super slow, but, what else > + # can we do? Without this, we will hit a > + # error: Tests passed but test cleanup failed; aborting > + # so do this ugly manual cleanup... > + while test ! -f directory-random-file.txt; do > + name=$(ls -d directory*) && > + mv $name/* . && > + rmdir $name > + done > + ) > +' > + > test_done ^ permalink raw reply [flat|nested] 90+ messages in thread
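The directory-building loop in the quoted test is easy to misread once the diff is flattened. Below is a minimal sketch of the same wrapping trick outside the test harness. It uses a plain counter in place of test_seq, a depth of 5 instead of 400, and a scratch directory from mktemp; those three choices are assumptions made for illustration and are not part of the patch.

```shell
# Sketch: build directoryN/.../directory1/<file> by repeatedly wrapping
# the current top-level entry in a new directory, rather than by
# descending. The depth is kept tiny here; the patch uses 400.
workdir=$(mktemp -d) &&
cd "$workdir" &&
: >directory-random-file.txt &&
depth=5 &&
x=1 &&
while test "$x" -le "$depth"
do
	mkdir "tmpdirectory$x"
	# "directory*" matches the file on the first pass and the previous
	# wrapper directory afterwards; it never matches "tmpdirectory$x".
	mv directory* "tmpdirectory$x"
	mv "tmpdirectory$x" "directory$x"
	x=$((x + 1))
done
# Show where the file ended up.
nested=$(find . -type f)
echo "$nested"
```

Wrapping from the outside means no command ever has to name the full nested path, which matters at depth 400, where that path can exceed platform limits.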
* Re: [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory 2021-05-08 11:13 ` Philip Oakley @ 2021-05-08 17:20 ` Elijah Newren 0 siblings, 0 replies; 90+ messages in thread From: Elijah Newren @ 2021-05-08 17:20 UTC (permalink / raw) To: Philip Oakley; +Cc: Elijah Newren via GitGitGadget, Git Mailing List On Sat, May 8, 2021 at 4:13 AM Philip Oakley <philipoakley@iee.email> wrote: > > On 07/05/2021 05:04, Elijah Newren via GitGitGadget wrote: > > From: Elijah Newren <newren@gmail.com> > > > > PNPM > > for me, this was a UNA (un-named abbreviation), can we clarify it, e.g > s/PNPM/& package manager/ Will do, thanks. ^ permalink raw reply [flat|nested] 90+ messages in thread
* [PATCH 2/5] t3001, t7300: add testcase showcasing missed directory traversal 2021-05-07 4:04 [PATCH 0/5] Directory traversal fixes Elijah Newren via GitGitGadget 2021-05-07 4:04 ` [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget @ 2021-05-07 4:04 ` Elijah Newren via GitGitGadget 2021-05-07 4:04 ` [PATCH 3/5] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget ` (5 subsequent siblings) 7 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-07 4:04 UTC (permalink / raw) To: git; +Cc: Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> In the last commit, we added a testcase showing that the directory traversal machinery sometimes traverses into directories unnecessarily. Here we show that there are cases where it does the opposite: it does not traverse into directories, despite those directories having important files that need to be flagged. Add a testcase showing that `git ls-files -o -i --directory` can omit some of the files it should be listing, and another showing that `git clean -fX` can fail to clean out some of the expected files. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- t/t3001-ls-files-others-exclude.sh | 5 +++++ t/t7300-clean.sh | 19 +++++++++++++++++++ 2 files changed, 24 insertions(+) diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh index 1ec7cb57c7a8..ac05d1a17931 100755 --- a/t/t3001-ls-files-others-exclude.sh +++ b/t/t3001-ls-files-others-exclude.sh @@ -292,6 +292,11 @@ EOF test_cmp expect actual ' +test_expect_failure 'ls-files with "**" patterns and --directory' ' + # Expectation same as previous test + git ls-files --directory -o -i --exclude "**/a.1" >actual && + test_cmp expect actual +' test_expect_success 'ls-files with "**" patterns and no slashes' ' git ls-files -o -i --exclude "one**a.1" >actual && diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 5f1dc397c11e..337f9af1d74b 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -786,4 +786,23 @@ test_expect_failure 'avoid traversing into ignored directories' ' ) ' +test_expect_failure 'traverse into directories that may have ignored entries' ' + test_when_finished rm -f output && + test_create_repo need-to-traverse-into-hierarchy && + ( + cd need-to-traverse-into-hierarchy && + mkdir -p modules/foobar/src/generated && + > modules/foobar/src/generated/code.c && + > modules/foobar/Makefile && + echo "/modules/**/src/generated/" >.gitignore && + + git clean -fX modules/foobar >../output && + + grep Removing ../output && + + test_path_is_missing modules/foobar/src/generated/code.c && + test_path_is_file modules/foobar/Makefile + ) +' + test_done -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* [PATCH 3/5] dir: avoid unnecessary traversal into ignored directory 2021-05-07 4:04 [PATCH 0/5] Directory traversal fixes Elijah Newren via GitGitGadget 2021-05-07 4:04 ` [PATCH 1/5] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget 2021-05-07 4:04 ` [PATCH 2/5] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget @ 2021-05-07 4:04 ` Elijah Newren via GitGitGadget 2021-05-07 4:04 ` [PATCH 4/5] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget ` (4 subsequent siblings) 7 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-07 4:04 UTC (permalink / raw) To: git; +Cc: Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> The show_other_directories case in treat_directory() tried to handle both excludes and untracked files with the same logic, and mishandled both the excludes and the untracked files in the process, in different ways. Split that logic apart, and then focus on the logic for the excludes; a subsequent commit will address the logic for untracked files. For show_other_directories, an excluded directory means that every path underneath that directory will also be excluded. Given that the calling code requested to just show directories when everything under a directory had the same state (that's what the "DIR_SHOW_OTHER_DIRECTORIES" flag means), we generally do not need to traverse into such directories and can just immediately mark them as ignored (i.e. as path_excluded). The only reason we cannot just immediately return path_excluded is the DIR_HIDE_EMPTY_DIRECTORIES flag and the possibility that the ignored directory is an empty directory. The code previously treated DIR_SHOW_IGNORED_TOO in most cases as an exception as well, which was wrong. 
It can sometimes reduce the number of cases where we need to recurse (namely if DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set), but should not be able to increase the number of cases where we need to recurse. Fix the logic accordingly.

Some sidenotes about possible confusion with dir.c:

* "ignored" often refers to an "untracked ignore", i.e. a file which is not tracked and which matches one of the ignore/exclusion rules. But you can also have a "tracked ignore", a tracked file that happens to match one of the ignore/exclusion rules and which dir.c has to worry about since "git ls-files -c -i" is supposed to list them.

* The dir code often uses "ignored" and "excluded" interchangeably, which you need to keep in mind while reading the code. Sadly, though, it can get very confusing since ignore rules can have exclusions, as in the last of the following .gitignore rules:

      .gitignore
      *~
      *.log
      !settings.log

  In the last entry above, (pathspec->items[3].magic & PATHSPEC_EXCLUDE) will be true due to the '!' negating the rule. Someone might refer to this as "excluded". That means the file 'settings.log' will not match, and thus not be ignored. So we won't return path_excluded for it. So it's an exclude rule that prevents the file from being an exclude. The non-excluded rules are the ones that result in files being excludes. Great fun, eh? Sometimes it feels like dir.c needs its own glossary with its many definitions, including the multiply-defined terms.

Reported-by: Jason Gore <Jason.Gore@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 44 +++++++++++++++++++++++++++++--------------- t/t7300-clean.sh | 2 +- 2 files changed, 30 insertions(+), 16 deletions(-) diff --git a/dir.c b/dir.c index 3474e67e8f3c..4b183749843e 100644 --- a/dir.c +++ b/dir.c @@ -1844,6 +1844,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, } /* This is the "show_other_directories" case */ + assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES); /* * If we have a pathspec which could match something _below_ this @@ -1854,27 +1855,40 @@ static enum path_treatment treat_directory(struct dir_struct *dir, if (matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC) return path_recurse; + /* Special cases for where this directory is excluded/ignored */ + if (excluded) { + /* + * In the show_other_directories case, if we're not + * hiding empty directories, there is no need to + * recurse into an ignored directory. + */ + if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) + return path_excluded; + + /* + * Even if we are hiding empty directories, we can still avoid + * recursing into ignored directories for DIR_SHOW_IGNORED_TOO + * if DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set. + */ + if ((dir->flags & DIR_SHOW_IGNORED_TOO) && + (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) + return path_excluded; + } + /* - * Other than the path_recurse case immediately above, we only need - * to recurse into untracked/ignored directories if either of the - * following bits is set: + * Other than the path_recurse case above, we only need to + * recurse into untracked directories if either of the following + * bits is set: * - DIR_SHOW_IGNORED_TOO (because then we need to determine if * there are ignored entries below) * - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if * the directory is empty) */ - if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES))) - return excluded ? 
path_excluded : path_untracked; - - /* - * ...and even if DIR_SHOW_IGNORED_TOO is set, we can still avoid - * recursing into ignored directories if the path is excluded and - * DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set. - */ - if (excluded && - (dir->flags & DIR_SHOW_IGNORED_TOO) && - (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) - return path_excluded; + if (!excluded && + !(dir->flags & (DIR_SHOW_IGNORED_TOO | + DIR_HIDE_EMPTY_DIRECTORIES))) { + return path_untracked; + } /* * Even if we don't want to know all the paths under an untracked or diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 337f9af1d74b..00e5fa35dae3 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -746,7 +746,7 @@ test_expect_success 'clean untracked paths by pathspec' ' test_must_be_empty actual ' -test_expect_failure 'avoid traversing into ignored directories' ' +test_expect_success 'avoid traversing into ignored directories' ' test_when_finished rm -f output error && test_create_repo avoid-traversing-deep-hierarchy && ( -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
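The "exclude rule that prevents the file from being an exclude" point from the sidenotes in this commit message can be checked from the command line. The sketch below is an illustration only: it assumes a throwaway repository created with mktemp, and the file name debug.log is made up for the example. It writes the four .gitignore rules quoted in the commit message and asks git check-ignore about two paths.

```shell
# Sketch: a negated ("!") rule un-ignores a path that an earlier rule
# would otherwise ignore. Repository location and debug.log are
# hypothetical, chosen only for this demonstration.
repo=$(mktemp -d) &&
git init -q "$repo" &&
cd "$repo" &&
printf '%s\n' '.gitignore' '*~' '*.log' '!settings.log' >.gitignore &&
: >debug.log &&
: >settings.log

# git check-ignore exits 0 when the path is ignored and 1 when it is not.
if git check-ignore -q debug.log
then
	debug_state=ignored
else
	debug_state=not-ignored
fi
if git check-ignore -q settings.log
then
	settings_state=ignored
else
	settings_state=not-ignored
fi
echo "debug.log: $debug_state"
echo "settings.log: $settings_state"
```

debug.log matches "*.log" and is reported as ignored, while settings.log is matched last by the negated rule and is therefore not ignored. That is the case in which dir.c would not return path_excluded.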
* [PATCH 4/5] dir: traverse into untracked directories if they may have ignored subfiles 2021-05-07 4:04 [PATCH 0/5] Directory traversal fixes Elijah Newren via GitGitGadget ` (2 preceding siblings ...) 2021-05-07 4:04 ` [PATCH 3/5] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget @ 2021-05-07 4:04 ` Elijah Newren via GitGitGadget 2021-05-07 4:05 ` [PATCH 5/5] [RFC] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget ` (3 subsequent siblings) 7 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-07 4:04 UTC (permalink / raw) To: git; +Cc: Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> A directory that is untracked does not imply that all files under it should be categorized as untracked; in particular, if the caller is interested in ignored files, many files or directories underneath the untracked directory may be ignored. We previously partially handled this right with DIR_SHOW_IGNORED_TOO, but missed DIR_SHOW_IGNORED. It was not obvious, though, because the logic for untracked and excluded files had been fused together making it harder to reason about. The previous commit split that logic out, making it easier to notice that DIR_SHOW_IGNORED was missing. Add it. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 10 ++++++---- t/t3001-ls-files-others-exclude.sh | 2 +- t/t7300-clean.sh | 2 +- 3 files changed, 8 insertions(+), 6 deletions(-) diff --git a/dir.c b/dir.c index 4b183749843e..3beb8e17a839 100644 --- a/dir.c +++ b/dir.c @@ -1877,15 +1877,17 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* * Other than the path_recurse case above, we only need to - * recurse into untracked directories if either of the following + * recurse into untracked directories if any of the following * bits is set: - * - DIR_SHOW_IGNORED_TOO (because then we need to determine if - * there are ignored entries below) + * - DIR_SHOW_IGNORED (because then we need to determine if + * there are ignored entries below) + * - DIR_SHOW_IGNORED_TOO (same as above) * - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if * the directory is empty) */ if (!excluded && - !(dir->flags & (DIR_SHOW_IGNORED_TOO | + !(dir->flags & (DIR_SHOW_IGNORED | + DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES))) { return path_untracked; } diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh index ac05d1a17931..516c95ea0e82 100755 --- a/t/t3001-ls-files-others-exclude.sh +++ b/t/t3001-ls-files-others-exclude.sh @@ -292,7 +292,7 @@ EOF test_cmp expect actual ' -test_expect_failure 'ls-files with "**" patterns and --directory' ' +test_expect_success 'ls-files with "**" patterns and --directory' ' # Expectation same as previous test git ls-files --directory -o -i --exclude "**/a.1" >actual && test_cmp expect actual diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 00e5fa35dae3..c2a3b7b6a52b 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -786,7 +786,7 @@ test_expect_success 'avoid traversing into ignored directories' ' ) ' -test_expect_failure 'traverse into directories that may have ignored entries' ' +test_expect_success 'traverse into directories that may have ignored entries' ' 
test_when_finished rm -f output && test_create_repo need-to-traverse-into-hierarchy && ( -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* [PATCH 5/5] [RFC] ls-files: error out on -i unless -o or -c are specified 2021-05-07 4:04 [PATCH 0/5] Directory traversal fixes Elijah Newren via GitGitGadget ` (3 preceding siblings ...) 2021-05-07 4:04 ` [PATCH 4/5] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget @ 2021-05-07 4:05 ` Elijah Newren via GitGitGadget 2021-05-07 16:22 ` [PATCH 6/5] dir: update stale description of treat_directory() Derrick Stolee ` (2 subsequent siblings) 7 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-07 4:05 UTC (permalink / raw) To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

ls-files --ignored can be used together with either --others or --cached. After being perplexed for a bit and digging into the code, I assumed that ls-files -i was just broken and not printing anything and had a nice patch ready to submit when I finally realized that -i can be used with --cached to find tracked ignores.

While that was a mistake on my part, and a careful reading of the documentation could have made this more clear, I suspect this is an error others are likely to make as well. In fact, of two uses in our testsuite, I believe one of the two did make this error. In t1306.13, there are NO tracked files, and all the excludes built up and used in that test and in previous tests thus have to be about untracked files. However, since they were looking for an empty result, the mistake went unnoticed as their erroneous command also just happened to give an empty answer.

-i will most of the time be used with -o, which would suggest we could just make -i imply -o in the absence of either a -o or -c, but that would be a backward incompatible break. Instead, let's just flag -i without either a -o or -c as an error, and update the two relevant testcases to specify their intent.

Signed-off-by: Elijah Newren <newren@gmail.com> --- builtin/ls-files.c | 3 +++ t/t1306-xdg-files.sh | 2 +- t/t3003-ls-files-exclude.sh | 4 ++-- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/builtin/ls-files.c b/builtin/ls-files.c index 60a2913a01e9..9f74b1ab2e69 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -748,6 +748,9 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) if (pathspec.nr && error_unmatch) ps_matched = xcalloc(pathspec.nr, 1); + if ((dir.flags & DIR_SHOW_IGNORED) && !show_others && !show_cached) + die("ls-files --ignored is usually used with --others, but --cached is the default. Please specify which you want."); + if ((dir.flags & DIR_SHOW_IGNORED) && !exc_given) die("ls-files --ignored needs some exclude pattern"); diff --git a/t/t1306-xdg-files.sh b/t/t1306-xdg-files.sh index dd87b43be1a6..40d3c42618c0 100755 --- a/t/t1306-xdg-files.sh +++ b/t/t1306-xdg-files.sh @@ -116,7 +116,7 @@ test_expect_success 'Exclusion in a non-XDG global ignore file' ' test_expect_success 'Checking XDG ignore file when HOME is unset' ' (sane_unset HOME && git config --unset core.excludesfile && - git ls-files --exclude-standard --ignored >actual) && + git ls-files --exclude-standard --ignored --others >actual) && test_must_be_empty actual ' diff --git a/t/t3003-ls-files-exclude.sh b/t/t3003-ls-files-exclude.sh index d5ec333131f9..c41c4f046abf 100755 --- a/t/t3003-ls-files-exclude.sh +++ b/t/t3003-ls-files-exclude.sh @@ -29,11 +29,11 @@ test_expect_success 'add file to gitignore' ' ' check_all_output -test_expect_success 'ls-files -i lists only tracked-but-ignored files' ' +test_expect_success 'ls-files -i -c lists only tracked-but-ignored files' ' echo content >other-file && git add other-file && echo file >expect && - git ls-files -i --exclude-standard >output && + git ls-files -i -c --exclude-standard >output && test_cmp expect output ' -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
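To make the two meanings of -i in this patch concrete, here is a small sketch that asks for "tracked but ignored" and "untracked and ignored" files separately. The scratch repository and the file names tracked.log and untracked.log are assumptions for illustration; only the ls-files options come from the patch and its tests.

```shell
# Sketch: the same ignore rule yields different answers depending on
# whether -i is combined with -c (cached) or -o (others).
repo=$(mktemp -d) &&
git init -q "$repo" &&
cd "$repo" &&
echo "*.log" >.gitignore &&
: >tracked.log &&
: >untracked.log &&
# -f is needed because tracked.log matches an ignore rule.
git add -f tracked.log &&
tracked_ignored=$(git ls-files -i -c --exclude-standard) &&
untracked_ignored=$(git ls-files -i -o --exclude-standard) &&
echo "tracked but ignored: $tracked_ignored" &&
echo "untracked and ignored: $untracked_ignored"
```

Spelling out -c or -o, as the updated testcases do, keeps the intent unambiguous whether or not a bare -i is rejected.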
* [PATCH 6/5] dir: update stale description of treat_directory() 2021-05-07 4:04 [PATCH 0/5] Directory traversal fixes Elijah Newren via GitGitGadget ` (4 preceding siblings ...) 2021-05-07 4:05 ` [PATCH 5/5] [RFC] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget @ 2021-05-07 16:22 ` Derrick Stolee 2021-05-07 17:57 ` Elijah Newren 2021-05-07 16:27 ` [PATCH 0/5] Directory traversal fixes Derrick Stolee 2021-05-08 0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget 7 siblings, 1 reply; 90+ messages in thread From: Derrick Stolee @ 2021-05-07 16:22 UTC (permalink / raw) To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren On 5/7/2021 12:04 AM, Elijah Newren via GitGitGadget wrote: > This patchset fixes a few directory traversal issues, where fill_directory() > would traverse into directories that it shouldn't and not traverse into > directories that it should. One of these issues was reported recently on > this list[1], another was found at $DAYJOB. > > The fifth patch might have backward compatibility implications, but is easy > to review. Even if the logic in dir.c makes your eyes glaze over, at least > take a look at the fifth patch. My eyes were glazing over, so I went to read the whole treat_directory() method and its related documentation comment. I found it to be a bit confusing that it was referencing names that were deprecated 12 years ago. Here is a patch that you could add to this series to improve these comments. 
Thanks, -Stolee -- >8 -- From 587a94ac396c969b6e7734ee46afeac20e87ccb9 Mon Sep 17 00:00:00 2001 From: Derrick Stolee <dstolee@microsoft.com> Date: Fri, 7 May 2021 12:14:13 -0400 Subject: [PATCH] dir: update stale description of treat_directory() The documentation comment for treat_directory() was originally written in 095952 (Teach directory traversal about subprojects, 2007-04-11) which was before the 'struct dir_struct' split its bitfield of named options into a 'flags' enum in 7c4c97c0 (Turn the flags in struct dir_struct into a single variable, 2009-02-16). When those flags changed, the comment became stale, since members like 'show_other_directories' transitioned into flags like DIR_SHOW_OTHER_DIRECTORIES. Update the comments for treat_directory() to use these flag names rather than the old member names. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> --- dir.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/dir.c b/dir.c index 3beb8e17a83..0a0138bc1aa 100644 --- a/dir.c +++ b/dir.c @@ -1749,13 +1749,13 @@ static enum exist_status directory_exists_in_index(struct index_state *istate, * Case 3: if we didn't have it in the index previously, we * have a few sub-cases: * - * (a) if "show_other_directories" is true, we show it as - * just a directory, unless "hide_empty_directories" is + * (a) if DIR_SHOW_OTHER_DIRECTORIES flag is set, we show it as + * just a directory, unless DIR_HIDE_EMPTY_DIRECTORIES is * also true, in which case we need to check if it contains any * untracked and / or ignored files. - * (b) if it looks like a git directory, and we don't have - * 'no_gitlinks' set we treat it as a gitlink, and show it - * as a directory. + * (b) if it looks like a git directory and we don't have the + * DIR_NO_GITLINKS flag, then we treat it as a gitlink, and + * show it as a directory. * (c) otherwise, we recurse into it. 
*/ static enum path_treatment treat_directory(struct dir_struct *dir, @@ -1843,7 +1843,6 @@ static enum path_treatment treat_directory(struct dir_struct *dir, return path_recurse; } - /* This is the "show_other_directories" case */ assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES); /* @@ -1858,7 +1857,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* Special cases for where this directory is excluded/ignored */ if (excluded) { /* - * In the show_other_directories case, if we're not + * If DIR_SHOW_OTHER_DIRECTORIES is set and we're not * hiding empty directories, there is no need to * recurse into an ignored directory. */ -- 2.31.1.vfs.0.0.80.gb082c853c0e ^ permalink raw reply related [flat|nested] 90+ messages in thread
* Re: [PATCH 6/5] dir: update stale description of treat_directory() 2021-05-07 16:22 ` [PATCH 6/5] dir: update stale description of treat_directory() Derrick Stolee @ 2021-05-07 17:57 ` Elijah Newren 0 siblings, 0 replies; 90+ messages in thread From: Elijah Newren @ 2021-05-07 17:57 UTC (permalink / raw) To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, Git Mailing List On Fri, May 7, 2021 at 9:22 AM Derrick Stolee <stolee@gmail.com> wrote: > > On 5/7/2021 12:04 AM, Elijah Newren via GitGitGadget wrote: > > This patchset fixes a few directory traversal issues, where fill_directory() > > would traverse into directories that it shouldn't and not traverse into > > directories that it should. One of these issues was reported recently on > > this list[1], another was found at $DAYJOB. > > > > The fifth patch might have backward compatibility implications, but is easy > > to review. Even if the logic in dir.c makes your eyes glaze over, at least > > take a look at the fifth patch. > > My eyes were glazing over, so I went to read the whole treat_directory() > method and its related documentation comment. I found it to be a bit > confusing that it was referencing names that were deprecated 12 years ago. > > Here is a patch that you could add to this series to improve these > comments. > > Thanks, > -Stolee > > -- >8 -- > > From 587a94ac396c969b6e7734ee46afeac20e87ccb9 Mon Sep 17 00:00:00 2001 > From: Derrick Stolee <dstolee@microsoft.com> > Date: Fri, 7 May 2021 12:14:13 -0400 > Subject: [PATCH] dir: update stale description of treat_directory() > > The documentation comment for treat_directory() was originally written > in 095952 (Teach directory traversal about subprojects, 2007-04-11) > which was before the 'struct dir_struct' split its bitfield of named > options into a 'flags' enum in 7c4c97c0 (Turn the flags in struct > dir_struct into a single variable, 2009-02-16). 
When those flags > changed, the comment became stale, since members like > 'show_other_directories' transitioned into flags like > DIR_SHOW_OTHER_DIRECTORIES. > > Update the comments for treat_directory() to use these flag names rather > than the old member names. > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > dir.c | 13 ++++++------- > 1 file changed, 6 insertions(+), 7 deletions(-) > > diff --git a/dir.c b/dir.c > index 3beb8e17a83..0a0138bc1aa 100644 > --- a/dir.c > +++ b/dir.c > @@ -1749,13 +1749,13 @@ static enum exist_status directory_exists_in_index(struct index_state *istate, > * Case 3: if we didn't have it in the index previously, we > * have a few sub-cases: > * > - * (a) if "show_other_directories" is true, we show it as > - * just a directory, unless "hide_empty_directories" is > + * (a) if DIR_SHOW_OTHER_DIRECTORIES flag is set, we show it as > + * just a directory, unless DIR_HIDE_EMPTY_DIRECTORIES is > * also true, in which case we need to check if it contains any > * untracked and / or ignored files. > - * (b) if it looks like a git directory, and we don't have > - * 'no_gitlinks' set we treat it as a gitlink, and show it > - * as a directory. > + * (b) if it looks like a git directory and we don't have the > + * DIR_NO_GITLINKS flag, then we treat it as a gitlink, and > + * show it as a directory. > * (c) otherwise, we recurse into it. 
> */ > static enum path_treatment treat_directory(struct dir_struct *dir, > @@ -1843,7 +1843,6 @@ static enum path_treatment treat_directory(struct dir_struct *dir, > return path_recurse; > } > > - /* This is the "show_other_directories" case */ > assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES); > > /* > @@ -1858,7 +1857,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, > /* Special cases for where this directory is excluded/ignored */ > if (excluded) { > /* > - * In the show_other_directories case, if we're not > + * If DIR_SHOW_OTHER_DIRECTORIES is set and we're not > * hiding empty directories, there is no need to > * recurse into an ignored directory. > */ > -- > 2.31.1.vfs.0.0.80.gb082c853c0e Looks good to me; I'll give it some more time for other comments to come in, but when I re-roll, I'll include this patch of yours. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH 0/5] Directory traversal fixes 2021-05-07 4:04 [PATCH 0/5] Directory traversal fixes Elijah Newren via GitGitGadget ` (5 preceding siblings ...) 2021-05-07 16:22 ` [PATCH 6/5] dir: update stale description of treat_directory() Derrick Stolee @ 2021-05-07 16:27 ` Derrick Stolee 2021-05-08 0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget 7 siblings, 0 replies; 90+ messages in thread From: Derrick Stolee @ 2021-05-07 16:27 UTC (permalink / raw) To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren On 5/7/2021 12:04 AM, Elijah Newren via GitGitGadget wrote: > This patchset fixes a few directory traversal issues, where fill_directory() > would traverse into directories that it shouldn't and not traverse into > directories that it should. One of these issues was reported recently on > this list[1], another was found at $DAYJOB. > > The fifth patch might have backward compatibility implications, but is easy > to review. Even if the logic in dir.c makes your eyes glaze over, at least > take a look at the fifth patch. > > Also, if anyone has any ideas about a better place to put the "Some > sidenotes" from the third commit message rather than keeping them in a > random commit message, that might be helpful too. As for your patches themselves, I can't claim to understand all the complicated details about how treat_directory() is working, but your patches are well organized and the new tests are the real proof that this is working as intended. Thanks for the attention to detail here. -Stolee ^ permalink raw reply [flat|nested] 90+ messages in thread
* [PATCH v2 0/8] Directory traversal fixes 2021-05-07 4:04 [PATCH 0/5] Directory traversal fixes Elijah Newren via GitGitGadget ` (6 preceding siblings ...) 2021-05-07 16:27 ` [PATCH 0/5] Directory traversal fixes Derrick Stolee @ 2021-05-08 0:08 ` Elijah Newren via GitGitGadget 2021-05-08 0:08 ` [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget ` (8 more replies) 7 siblings, 9 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-08 0:08 UTC (permalink / raw) To: git; +Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Elijah Newren

This patchset fixes a few directory traversal issues, where fill_directory() would traverse into directories that it shouldn't and not traverse into directories that it should.

Changes since v1:

* Added a patch from Stolee to clean up some nearby comments that were made out-of-date 12 years ago
* Added a new RFC patch that switches dir.c from using trace1 to trace2
* Added a new RFC patch that adds directories-visited and paths-visited statistics using the trace2 output, and use that to vastly simplify (and accelerate) the t7300 testcase

I'm curious what others think of the backward compatibility ramifications of the RFC patches, patch 5 & patch 6. And whether my use of trace2 is clean, idiomatic, correct, etc. I've not used it before for things other than region_enter & region_leave.

Also, if anyone has any ideas about a better place to put the "Some sidenotes" from the third commit message rather than keeping them in a random commit message, that might be helpful too.

[1] See https://lore.kernel.org/git/DM6PR00MB06829EC5B85E0C5AC595004E894E9@DM6PR00MB0682.namprd00.prod.outlook.com/ or alternatively https://github.com/git-for-windows/git/issues/2732.

Derrick Stolee (1):
  dir: update stale description of treat_directory()

Elijah Newren (7):
  t7300: add testcase showing unnecessary traversal into ignored directory
  t3001, t7300: add testcase showcasing missed directory traversal
  dir: avoid unnecessary traversal into ignored directory
  dir: traverse into untracked directories if they may have ignored subfiles
  [RFC] ls-files: error out on -i unless -o or -c are specified
  [RFC] dir: convert trace calls to trace2 equivalents
  [RFC] dir: reported number of visited directories and paths with trace2

 builtin/ls-files.c                 |   3 +
 dir.c                              | 103 +++++++++------
 dir.h                              |   4 +
 t/t1306-xdg-files.sh               |   2 +-
 t/t3001-ls-files-others-exclude.sh |   5 +
 t/t3003-ls-files-exclude.sh        |   4 +-
 t/t7063-status-untracked-cache.sh  | 194 ++++++++++++++++-------------
 t/t7300-clean.sh                   |  41 ++++++
 t/t7519-status-fsmonitor.sh        |   8 +-
 9 files changed, 238 insertions(+), 126 deletions(-)

base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1020%2Fnewren%2Fdirectory-traversal-fixes-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1020/newren/directory-traversal-fixes-v2
Pull-Request: https://github.com/git/git/pull/1020

Range-diff vs v1:

 1:  a3bd253fa8e8 = 1:  a3bd253fa8e8 t7300: add testcase showing unnecessary traversal into ignored directory
 2:  aa3a41e26eca = 2:  aa3a41e26eca t3001, t7300: add testcase showcasing missed directory traversal
 3:  3c3f6111da13 = 3:  3c3f6111da13 dir: avoid unnecessary traversal into ignored directory
 4:  fad048339b81 = 4:  fad048339b81 dir: traverse into untracked directories if they may have ignored subfiles
 5:  3d8dd00ccd10 = 5:  3d8dd00ccd10 [RFC] ls-files: error out on -i unless -o or -c are specified
 -:  ------------ > 6:  1d825dfdc70b dir: update stale description of treat_directory()
 -:  ------------ > 7:  3a2394506a53 [RFC] dir: convert trace calls to trace2 equivalents
 -:  ------------ > 8:  fba4d65b78c7 [RFC] dir: reported number of visited directories and paths with trace2

-- 
gitgitgadget

^ permalink raw reply [flat|nested] 90+ messages in thread
* [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory 2021-05-08 0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget @ 2021-05-08 0:08 ` Elijah Newren via GitGitGadget 2021-05-08 10:13 ` Junio C Hamano 2021-05-08 10:19 ` Junio C Hamano 2021-05-08 0:08 ` [PATCH v2 2/8] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget ` (7 subsequent siblings) 8 siblings, 2 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-08 0:08 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> PNPM is apparently creating deeply nested (but ignored) directory structures; traversing them is costly performance-wise, unnecessary, and in some cases is even throwing warnings/errors because the paths are too long to handle on various platforms. Add a testcase that demonstrates this problem. Initial-test-by: Jason Gore <Jason.Gore@microsoft.com> Helped-by: brian m. 
carlson <sandals@crustytoothpaste.net> Signed-off-by: Elijah Newren <newren@gmail.com> --- t/t7300-clean.sh | 40 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index a74816ca8b46..5f1dc397c11e 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -746,4 +746,44 @@ test_expect_success 'clean untracked paths by pathspec' ' test_must_be_empty actual ' +test_expect_failure 'avoid traversing into ignored directories' ' + test_when_finished rm -f output error && + test_create_repo avoid-traversing-deep-hierarchy && + ( + cd avoid-traversing-deep-hierarchy && + + >directory-random-file.txt && + # Put this file under directory400/directory399/.../directory1/ + depth=400 && + for x in $(test_seq 1 $depth); do + mkdir "tmpdirectory$x" && + mv directory* "tmpdirectory$x" && + mv "tmpdirectory$x" "directory$x" + done && + + git clean -ffdxn -e directory$depth >../output 2>../error && + + test_must_be_empty ../output && + # We especially do not want things like + # "warning: could not open directory " + # appearing in the error output. It is true that directories + # that are too long cannot be opened, but we should not be + # recursing into those directories anyway since the very first + # level is ignored. + test_must_be_empty ../error && + + # alpine-linux-musl fails to "rm -rf" a directory with such + # a deeply nested hierarchy. Help it out by deleting the + # leading directories ourselves. Super slow, but, what else + # can we do? Without this, we will hit a + # error: Tests passed but test cleanup failed; aborting + # so do this ugly manual cleanup... + while test ! -f directory-random-file.txt; do + name=$(ls -d directory*) && + mv $name/* . && + rmdir $name + done + ) +' + test_done -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* Re: [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory 2021-05-08 0:08 ` [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget @ 2021-05-08 10:13 ` Junio C Hamano 2021-05-08 17:34 ` Elijah Newren 2021-05-08 10:19 ` Junio C Hamano 1 sibling, 1 reply; 90+ messages in thread From: Junio C Hamano @ 2021-05-08 10:13 UTC (permalink / raw) To: Elijah Newren via GitGitGadget Cc: git, Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Elijah Newren <newren@gmail.com> > > PNPM is apparently creating deeply nested (but ignored) directory Sorry, but what's PNPM? > structures; traversing them is costly performance-wise, unnecessary, and > in some cases is even throwing warnings/errors because the paths are too > long to handle on various platforms. Add a testcase that demonstrates > this problem. > > Initial-test-by: Jason Gore <Jason.Gore@microsoft.com> > Helped-by: brian m. carlson <sandals@crustytoothpaste.net> > Signed-off-by: Elijah Newren <newren@gmail.com> > --- > t/t7300-clean.sh | 40 ++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 40 insertions(+) > > diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh > index a74816ca8b46..5f1dc397c11e 100755 > --- a/t/t7300-clean.sh > +++ b/t/t7300-clean.sh > @@ -746,4 +746,44 @@ test_expect_success 'clean untracked paths by pathspec' ' > test_must_be_empty actual > ' > > +test_expect_failure 'avoid traversing into ignored directories' ' > + test_when_finished rm -f output error && > + test_create_repo avoid-traversing-deep-hierarchy && > + ( > + cd avoid-traversing-deep-hierarchy && > + > + >directory-random-file.txt && > + # Put this file under directory400/directory399/.../directory1/ > + depth=400 && > + for x in $(test_seq 1 $depth); do Style. Lose semicolon, have "do" on the next line on its own, aligned with "for". 
Tip: you shouldn't need any semicolon other than the doubled ones in case/esac in your shell script. > + mkdir "tmpdirectory$x" && > + mv directory* "tmpdirectory$x" && > + mv "tmpdirectory$x" "directory$x" > + done && > + > + git clean -ffdxn -e directory$depth >../output 2>../error && > + > + test_must_be_empty ../output && > + # We especially do not want things like > + # "warning: could not open directory " > + # appearing in the error output. It is true that directories > + # that are too long cannot be opened, but we should not be > + # recursing into those directories anyway since the very first > + # level is ignored. > + test_must_be_empty ../error && > + > + # alpine-linux-musl fails to "rm -rf" a directory with such > + # a deeply nested hierarchy. Help it out by deleting the > + # leading directories ourselves. Super slow, but, what else > + # can we do? Without this, we will hit a > + # error: Tests passed but test cleanup failed; aborting > + # so do this ugly manual cleanup... > + while test ! -f directory-random-file.txt; do Ditto. > + name=$(ls -d directory*) && > + mv $name/* . && > + rmdir $name > + done Hmph, after seeing the discussion thread of v1, I was expecting to see a helper in Perl that cd's down and then comes back up while removing what is in its directory (and I expected something similar for creation side we saw above). > + ) > +' > + > test_done ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory 2021-05-08 10:13 ` Junio C Hamano @ 2021-05-08 17:34 ` Elijah Newren 0 siblings, 0 replies; 90+ messages in thread From: Elijah Newren @ 2021-05-08 17:34 UTC (permalink / raw) To: Junio C Hamano Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine, Derrick Stolee, Jeff King On Sat, May 8, 2021 at 3:13 AM Junio C Hamano <gitster@pobox.com> wrote: > > "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Elijah Newren <newren@gmail.com> > > > > PNPM is apparently creating deeply nested (but ignored) directory > > Sorry, but what's PNPM? a package manager; I'll use Philip Oakley's suggestion to make it more clear. > > structures; traversing them is costly performance-wise, unnecessary, and > > in some cases is even throwing warnings/errors because the paths are too > > long to handle on various platforms. Add a testcase that demonstrates > > this problem. > > > > Initial-test-by: Jason Gore <Jason.Gore@microsoft.com> > > Helped-by: brian m. carlson <sandals@crustytoothpaste.net> > > Signed-off-by: Elijah Newren <newren@gmail.com> > > --- > > t/t7300-clean.sh | 40 ++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 40 insertions(+) > > > > diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh > > index a74816ca8b46..5f1dc397c11e 100755 > > --- a/t/t7300-clean.sh > > +++ b/t/t7300-clean.sh > > @@ -746,4 +746,44 @@ test_expect_success 'clean untracked paths by pathspec' ' > > test_must_be_empty actual > > ' > > > > +test_expect_failure 'avoid traversing into ignored directories' ' > > + test_when_finished rm -f output error && > > + test_create_repo avoid-traversing-deep-hierarchy && > > + ( > > + cd avoid-traversing-deep-hierarchy && > > + > > + >directory-random-file.txt && > > + # Put this file under directory400/directory399/.../directory1/ > > + depth=400 && > > + for x in $(test_seq 1 $depth); do > > Style. 
Lose semicolon, have "do" on the next line on its own, > aligned with "for". Tip: you shouldn't need any semicolon other > than the doubled ones in case/esac in your shell script. Thanks. > > > + mkdir "tmpdirectory$x" && > > + mv directory* "tmpdirectory$x" && > > + mv "tmpdirectory$x" "directory$x" > > + done && > > + > > + git clean -ffdxn -e directory$depth >../output 2>../error && > > + > > + test_must_be_empty ../output && > > + # We especially do not want things like > > + # "warning: could not open directory " > > + # appearing in the error output. It is true that directories > > + # that are too long cannot be opened, but we should not be > > + # recursing into those directories anyway since the very first > > + # level is ignored. > > + test_must_be_empty ../error && > > + > > + # alpine-linux-musl fails to "rm -rf" a directory with such > > + # a deeply nested hierarchy. Help it out by deleting the > > + # leading directories ourselves. Super slow, but, what else > > + # can we do? Without this, we will hit a > > + # error: Tests passed but test cleanup failed; aborting > > + # so do this ugly manual cleanup... > > + while test ! -f directory-random-file.txt; do > > Ditto. Yep, sorry. > > + name=$(ls -d directory*) && > > + mv $name/* . && > > + rmdir $name > > + done > > Hmph, after seeing the discussion thread of v1, I was expecting to > see a helper in Perl that cd's down and then comes back up while > removing what is in its directory (and I expected something similar > for creation side we saw above). Hmm, I was a bit unsure of the alternative route I took in patches 7 and 8 (switching trace1 to trace2 in dir.c, then using it to get more statistics which would allow a much more shallow directory structure for this test). 
I wasn't sure if the strategy seemed acceptable, and I wanted people to be able to see the two schemes side-by-side, but if that alternative is acceptable, I want to move patch 7 to the front of the series, the code change parts of patch 8 as the second patch, and then squash the rest of patch 8 into this patch vastly simplifying this testcase and obsoleting everyone's comments on it. Maybe I should have just refactored the series that way anyway. I'll send a reroll that does that, and put all the [RFC] patches first. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory 2021-05-08 0:08 ` [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget 2021-05-08 10:13 ` Junio C Hamano @ 2021-05-08 10:19 ` Junio C Hamano 2021-05-08 17:41 ` Elijah Newren 1 sibling, 1 reply; 90+ messages in thread
From: Junio C Hamano @ 2021-05-08 10:19 UTC (permalink / raw)
To: Elijah Newren via GitGitGadget
Cc: git, Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +		# alpine-linux-musl fails to "rm -rf" a directory with such
> +		# a deeply nested hierarchy. Help it out by deleting the
> +		# leading directories ourselves. Super slow, but, what else
> +		# can we do? Without this, we will hit a
> +		# error: Tests passed but test cleanup failed; aborting
> +		# so do this ugly manual cleanup...
> +		while test ! -f directory-random-file.txt; do
> +			name=$(ls -d directory*) &&
> +			mv $name/* . &&
> +			rmdir $name
> +		done

Another thing: this not being a test_when_finished handler means it would not help after a test failure. Perhaps wrap it in a helper

	clean_deep_hierarchy () {
		rm -fr directory* ||
		while test ! -f directory-random-file.txt
		do
			...
		done
	}

and call it from test_when_finished?

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory 2021-05-08 10:19 ` Junio C Hamano @ 2021-05-08 17:41 ` Elijah Newren 0 siblings, 0 replies; 90+ messages in thread From: Elijah Newren @ 2021-05-08 17:41 UTC (permalink / raw) To: Junio C Hamano Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine, Derrick Stolee, Jeff King On Sat, May 8, 2021 at 3:19 AM Junio C Hamano <gitster@pobox.com> wrote: > > "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > + # alpine-linux-musl fails to "rm -rf" a directory with such > > + # a deeply nested hierarchy. Help it out by deleting the > > + # leading directories ourselves. Super slow, but, what else > > + # can we do? Without this, we will hit a > > + # error: Tests passed but test cleanup failed; aborting > > + # so do this ugly manual cleanup... > > + while test ! -f directory-random-file.txt; do > > + name=$(ls -d directory*) && > > + mv $name/* . && > > + rmdir $name > > + done > > Another thing: this not being a test_when_finished handler means it > would not help after a test failure. test failures are irrelevant here; this code is here to help test_done's directory cleanup, which only fires when all tests pass. But if I restructure the series, this whole section of code disappears. I'll do that... ^ permalink raw reply [flat|nested] 90+ messages in thread
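For readers following the cleanup discussion above, a rough standalone sketch of the kind of helper Junio suggests might look like the following. Only the name clean_deep_hierarchy appears in the thread, and its loop body is elided there; the names make_deep_hierarchy and peel_deep_hierarchy are made up for illustration, and plain POSIX shell is assumed instead of test-lib helpers such as test_seq.

```shell
# Hypothetical helper: build
#   directory$depth/.../directory1/directory-random-file.txt
# in the current directory, the same way the test in patch 1 does.
make_deep_hierarchy () {
	depth=$1 &&
	>directory-random-file.txt &&
	x=1 &&
	while test "$x" -le "$depth"
	do
		mkdir "tmpdirectory$x" &&
		mv directory* "tmpdirectory$x" &&
		mv "tmpdirectory$x" "directory$x" &&
		x=$((x + 1)) || return 1
	done
}

# Hypothetical fallback: peel off one leading directory per iteration
# until the marker file is back at the top level.
peel_deep_hierarchy () {
	while test ! -f directory-random-file.txt
	do
		name=$(ls -d directory*) &&
		mv "$name"/* . &&
		rmdir "$name" || return 1
	done
}

# Try the fast path first; fall back to peeling only if "rm -fr"
# reports failure (as described for alpine-linux-musl in the thread).
clean_deep_hierarchy () {
	rm -fr directory* 2>/dev/null ||
	{
		peel_deep_hierarchy &&
		rm -f directory-random-file.txt
	}
}
```

In a test script, something like `test_when_finished clean_deep_hierarchy` inside the subdirectory would be one way to hook this up, though as Elijah notes in his reply the whole cleanup becomes unnecessary once the test no longer needs a deep hierarchy.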
* [PATCH v2 2/8] t3001, t7300: add testcase showcasing missed directory traversal 2021-05-08 0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget 2021-05-08 0:08 ` [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget @ 2021-05-08 0:08 ` Elijah Newren via GitGitGadget 2021-05-08 0:08 ` [PATCH v2 3/8] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget ` (6 subsequent siblings) 8 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-08 0:08 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> In the last commit, we added a testcase showing that the directory traversal machinery sometimes traverses into directories unnecessarily. Here we show that there are cases where it does the opposite: it does not traverse into directories, despite those directories having important files that need to be flagged. Add a testcase showing that `git ls-files -o -i --directory` can omit some of the files it should be listing, and another showing that `git clean -fX` can fail to clean out some of the expected files. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- t/t3001-ls-files-others-exclude.sh | 5 +++++ t/t7300-clean.sh | 19 +++++++++++++++++++ 2 files changed, 24 insertions(+) diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh index 1ec7cb57c7a8..ac05d1a17931 100755 --- a/t/t3001-ls-files-others-exclude.sh +++ b/t/t3001-ls-files-others-exclude.sh @@ -292,6 +292,11 @@ EOF test_cmp expect actual ' +test_expect_failure 'ls-files with "**" patterns and --directory' ' + # Expectation same as previous test + git ls-files --directory -o -i --exclude "**/a.1" >actual && + test_cmp expect actual +' test_expect_success 'ls-files with "**" patterns and no slashes' ' git ls-files -o -i --exclude "one**a.1" >actual && diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 5f1dc397c11e..337f9af1d74b 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -786,4 +786,23 @@ test_expect_failure 'avoid traversing into ignored directories' ' ) ' +test_expect_failure 'traverse into directories that may have ignored entries' ' + test_when_finished rm -f output && + test_create_repo need-to-traverse-into-hierarchy && + ( + cd need-to-traverse-into-hierarchy && + mkdir -p modules/foobar/src/generated && + > modules/foobar/src/generated/code.c && + > modules/foobar/Makefile && + echo "/modules/**/src/generated/" >.gitignore && + + git clean -fX modules/foobar >../output && + + grep Removing ../output && + + test_path_is_missing modules/foobar/src/generated/code.c && + test_path_is_file modules/foobar/Makefile + ) +' + test_done -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* [PATCH v2 3/8] dir: avoid unnecessary traversal into ignored directory 2021-05-08 0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget 2021-05-08 0:08 ` [PATCH v2 1/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget 2021-05-08 0:08 ` [PATCH v2 2/8] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget @ 2021-05-08 0:08 ` Elijah Newren via GitGitGadget 2021-05-08 0:08 ` [PATCH v2 4/8] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget ` (5 subsequent siblings) 8 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-08 0:08 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> The show_other_directories case in treat_directory() tried to handle both excludes and untracked files with the same logic, and mishandled both the excludes and the untracked files in the process, in different ways. Split that logic apart, and then focus on the logic for the excludes; a subsequent commit will address the logic for untracked files. For show_other_directories, an excluded directory means that every path underneath that directory will also be excluded. Given that the calling code requested to just show directories when everything under a directory had the same state (that's what the "DIR_SHOW_OTHER_DIRECTORIES" flag means), we generally do not need to traverse into such directories and can just immediately mark them as ignored (i.e. as path_excluded). The only reason we cannot just immediately return path_excluded is the DIR_HIDE_EMPTY_DIRECTORIES flag and the possibility that the ignored directory is an empty directory. The code previously treated DIR_SHOW_IGNORED_TOO in most cases as an exception as well, which was wrong. 
It can sometimes reduce the number of cases where we need to recurse (namely if DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set), but should not be able to increase the number of cases where we need to recurse. Fix the logic accordingly.

Some sidenotes about possible confusion with dir.c:

  * "ignored" often refers to an "untracked ignore", i.e. a file which is not tracked which matches one of the ignore/exclusion rules. But you can also have a "tracked ignore", a tracked file that happens to match one of the ignore/exclusion rules and which dir.c has to worry about since "git ls-files -c -i" is supposed to list them.

  * The dir code often uses "ignored" and "excluded" interchangeably, which you need to keep in mind while reading the code. Sadly, though, it can get very confusing since ignore rules can have exclusions, as in the last of the following .gitignore rules:

        .gitignore
        *~
        *.log
        !settings.log

    In the last entry above, (pathspec->items[3].magic & PATHSPEC_EXCLUDE) will be true due to the '!' negating the rule. Someone might refer to this as "excluded". That means the file 'settings.log' will not match, and thus not be ignored. So we won't return path_excluded for it. So it's an exclude rule that prevents the file from being an exclude. The non-excluded rules are the ones that result in files being excludes. Great fun, eh?

Sometimes it feels like dir.c needs its own glossary with its many definitions, including the multiply-defined terms.

Reported-by: Jason Gore <Jason.Gore@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 44 +++++++++++++++++++++++++++++--------------- t/t7300-clean.sh | 2 +- 2 files changed, 30 insertions(+), 16 deletions(-) diff --git a/dir.c b/dir.c index 3474e67e8f3c..4b183749843e 100644 --- a/dir.c +++ b/dir.c @@ -1844,6 +1844,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, } /* This is the "show_other_directories" case */ + assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES); /* * If we have a pathspec which could match something _below_ this @@ -1854,27 +1855,40 @@ static enum path_treatment treat_directory(struct dir_struct *dir, if (matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC) return path_recurse; + /* Special cases for where this directory is excluded/ignored */ + if (excluded) { + /* + * In the show_other_directories case, if we're not + * hiding empty directories, there is no need to + * recurse into an ignored directory. + */ + if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) + return path_excluded; + + /* + * Even if we are hiding empty directories, we can still avoid + * recursing into ignored directories for DIR_SHOW_IGNORED_TOO + * if DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set. + */ + if ((dir->flags & DIR_SHOW_IGNORED_TOO) && + (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) + return path_excluded; + } + /* - * Other than the path_recurse case immediately above, we only need - * to recurse into untracked/ignored directories if either of the - * following bits is set: + * Other than the path_recurse case above, we only need to + * recurse into untracked directories if either of the following + * bits is set: * - DIR_SHOW_IGNORED_TOO (because then we need to determine if * there are ignored entries below) * - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if * the directory is empty) */ - if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES))) - return excluded ? 
path_excluded : path_untracked; - - /* - * ...and even if DIR_SHOW_IGNORED_TOO is set, we can still avoid - * recursing into ignored directories if the path is excluded and - * DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set. - */ - if (excluded && - (dir->flags & DIR_SHOW_IGNORED_TOO) && - (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) - return path_excluded; + if (!excluded && + !(dir->flags & (DIR_SHOW_IGNORED_TOO | + DIR_HIDE_EMPTY_DIRECTORIES))) { + return path_untracked; + } /* * Even if we don't want to know all the paths under an untracked or diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 337f9af1d74b..00e5fa35dae3 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -746,7 +746,7 @@ test_expect_success 'clean untracked paths by pathspec' ' test_must_be_empty actual ' -test_expect_failure 'avoid traversing into ignored directories' ' +test_expect_success 'avoid traversing into ignored directories' ' test_when_finished rm -f output error && test_create_repo avoid-traversing-deep-hierarchy && ( -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
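The negation sidenote in the commit message above can be checked from the command line. What follows is only a minimal sketch: it assumes git is installed and relies on `git check-ignore -q` exiting with status 0 for a path that ends up ignored. The helper names classify and demo_negation and the path debug.log are illustrative and do not come from the patch.

```shell
# Report whether the ignore rules in the current repository leave
# the given path ignored. Any non-zero exit status from
# "git check-ignore" (including errors) is reported as "not ignored".
classify () {
	if git check-ignore -q -- "$1"
	then
		echo "$1: ignored"
	else
		echo "$1: not ignored"
	fi
}

# Set up a scratch repository with the four rules quoted in the
# commit message and classify two example paths against them.
demo_negation () {
	scratch=$(mktemp -d) &&
	(
		cd "$scratch" &&
		git init -q . 2>/dev/null &&
		printf "%s\n" ".gitignore" "*~" "*.log" "!settings.log" >.gitignore &&
		classify debug.log &&
		classify settings.log
	) &&
	rm -rf "$scratch"
}
```

Running demo_negation is expected to report debug.log as ignored (it matches "*.log") and settings.log as not ignored (the later "!settings.log" rule negates the match), which is the "exclude rule that prevents the file from being an exclude" situation described above.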
* [PATCH v2 4/8] dir: traverse into untracked directories if they may have ignored subfiles 2021-05-08 0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget ` (2 preceding siblings ...) 2021-05-08 0:08 ` [PATCH v2 3/8] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget @ 2021-05-08 0:08 ` Elijah Newren via GitGitGadget 2021-05-08 0:08 ` [PATCH v2 5/8] [RFC] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget ` (4 subsequent siblings) 8 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-08 0:08 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> A directory that is untracked does not imply that all files under it should be categorized as untracked; in particular, if the caller is interested in ignored files, many files or directories underneath the untracked directory may be ignored. We previously partially handled this right with DIR_SHOW_IGNORED_TOO, but missed DIR_SHOW_IGNORED. It was not obvious, though, because the logic for untracked and excluded files had been fused together making it harder to reason about. The previous commit split that logic out, making it easier to notice that DIR_SHOW_IGNORED was missing. Add it. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 10 ++++++---- t/t3001-ls-files-others-exclude.sh | 2 +- t/t7300-clean.sh | 2 +- 3 files changed, 8 insertions(+), 6 deletions(-) diff --git a/dir.c b/dir.c index 4b183749843e..3beb8e17a839 100644 --- a/dir.c +++ b/dir.c @@ -1877,15 +1877,17 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* * Other than the path_recurse case above, we only need to - * recurse into untracked directories if either of the following + * recurse into untracked directories if any of the following * bits is set: - * - DIR_SHOW_IGNORED_TOO (because then we need to determine if - * there are ignored entries below) + * - DIR_SHOW_IGNORED (because then we need to determine if + * there are ignored entries below) + * - DIR_SHOW_IGNORED_TOO (same as above) * - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if * the directory is empty) */ if (!excluded && - !(dir->flags & (DIR_SHOW_IGNORED_TOO | + !(dir->flags & (DIR_SHOW_IGNORED | + DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES))) { return path_untracked; } diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh index ac05d1a17931..516c95ea0e82 100755 --- a/t/t3001-ls-files-others-exclude.sh +++ b/t/t3001-ls-files-others-exclude.sh @@ -292,7 +292,7 @@ EOF test_cmp expect actual ' -test_expect_failure 'ls-files with "**" patterns and --directory' ' +test_expect_success 'ls-files with "**" patterns and --directory' ' # Expectation same as previous test git ls-files --directory -o -i --exclude "**/a.1" >actual && test_cmp expect actual diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 00e5fa35dae3..c2a3b7b6a52b 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -786,7 +786,7 @@ test_expect_success 'avoid traversing into ignored directories' ' ) ' -test_expect_failure 'traverse into directories that may have ignored entries' ' +test_expect_success 'traverse into directories that may have ignored entries' ' 
test_when_finished rm -f output && test_create_repo need-to-traverse-into-hierarchy && ( -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* [PATCH v2 5/8] [RFC] ls-files: error out on -i unless -o or -c are specified 2021-05-08 0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget ` (3 preceding siblings ...) 2021-05-08 0:08 ` [PATCH v2 4/8] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget @ 2021-05-08 0:08 ` Elijah Newren via GitGitGadget 2021-05-08 0:08 ` [PATCH v2 6/8] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget ` (3 subsequent siblings) 8 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-08 0:08 UTC (permalink / raw)
To: git
Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

ls-files --ignored can be used together with either --others or --cached. After being perplexed for a bit and digging into the code, I assumed that ls-files -i was just broken and not printing anything and had a nice patch ready to submit when I finally realized that -i can be used with --cached to find tracked ignores.

While that was a mistake on my part, and a careful reading of the documentation could have made this more clear, I suspect this is an error others are likely to make as well. In fact, of two uses in our testsuite, I believe one of the two did make this error. In t1306.13, there are NO tracked files, and all the excludes built up and used in that test and in previous tests thus have to be about untracked files. However, since they were looking for an empty result, the mistake went unnoticed as their erroneous command also just happened to give an empty answer.

-i will most of the time be used with -o, which would suggest we could just make -i imply -o in the absence of either a -o or -c, but that would be a backward incompatible break. Instead, let's just flag -i without either a -o or -c as an error, and update the two relevant testcases to specify their intent.

Signed-off-by: Elijah Newren <newren@gmail.com> --- builtin/ls-files.c | 3 +++ t/t1306-xdg-files.sh | 2 +- t/t3003-ls-files-exclude.sh | 4 ++-- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/builtin/ls-files.c b/builtin/ls-files.c index 60a2913a01e9..9f74b1ab2e69 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -748,6 +748,9 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) if (pathspec.nr && error_unmatch) ps_matched = xcalloc(pathspec.nr, 1); + if ((dir.flags & DIR_SHOW_IGNORED) && !show_others && !show_cached) + die("ls-files --ignored is usually used with --others, but --cached is the default. Please specify which you want."); + if ((dir.flags & DIR_SHOW_IGNORED) && !exc_given) die("ls-files --ignored needs some exclude pattern"); diff --git a/t/t1306-xdg-files.sh b/t/t1306-xdg-files.sh index dd87b43be1a6..40d3c42618c0 100755 --- a/t/t1306-xdg-files.sh +++ b/t/t1306-xdg-files.sh @@ -116,7 +116,7 @@ test_expect_success 'Exclusion in a non-XDG global ignore file' ' test_expect_success 'Checking XDG ignore file when HOME is unset' ' (sane_unset HOME && git config --unset core.excludesfile && - git ls-files --exclude-standard --ignored >actual) && + git ls-files --exclude-standard --ignored --others >actual) && test_must_be_empty actual ' diff --git a/t/t3003-ls-files-exclude.sh b/t/t3003-ls-files-exclude.sh index d5ec333131f9..c41c4f046abf 100755 --- a/t/t3003-ls-files-exclude.sh +++ b/t/t3003-ls-files-exclude.sh @@ -29,11 +29,11 @@ test_expect_success 'add file to gitignore' ' ' check_all_output -test_expect_success 'ls-files -i lists only tracked-but-ignored files' ' +test_expect_success 'ls-files -i -c lists only tracked-but-ignored files' ' echo content >other-file && git add other-file && echo file >expect && - git ls-files -i --exclude-standard >output && + git ls-files -i -c --exclude-standard >output && test_cmp expect output ' -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
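For scripts that call `git ls-files -i` today, the rule proposed in this patch can be approximated ahead of time with a small argument check. The following is a hedged sketch only: checked_ls_files_args is a hypothetical name, it is not part of git, and it merely mirrors the condition added to builtin/ls-files.c (an --ignored request with neither --others nor --cached). It deliberately does not understand bundled short options such as -io or abbreviated long options.

```shell
# Hypothetical pre-flight check, not part of git: succeed unless
# -i/--ignored is requested without -o/--others or -c/--cached.
checked_ls_files_args () {
	ignored= others= cached=
	for arg
	do
		case "$arg" in
		--)
			# Everything after "--" is a pathspec, not an option.
			break
			;;
		-i|--ignored)
			ignored=t
			;;
		-o|--others)
			others=t
			;;
		-c|--cached)
			cached=t
			;;
		esac
	done
	if test -n "$ignored" && test -z "$others$cached"
	then
		echo "error: --ignored needs --others or --cached" >&2
		return 1
	fi
}
```

A caller could then write `checked_ls_files_args "$@" && git ls-files "$@"`, so that an ambiguous invocation is rejected whether or not the installed git already carries this RFC change.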
* [PATCH v2 6/8] dir: update stale description of treat_directory() 2021-05-08 0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget ` (4 preceding siblings ...) 2021-05-08 0:08 ` [PATCH v2 5/8] [RFC] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget @ 2021-05-08 0:08 ` Derrick Stolee via GitGitGadget 2021-05-08 0:08 ` [PATCH v2 7/8] [RFC] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget ` (2 subsequent siblings) 8 siblings, 0 replies; 90+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-05-08 0:08 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Elijah Newren, Derrick Stolee From: Derrick Stolee <stolee@gmail.com> The documentation comment for treat_directory() was originally written in 095952 (Teach directory traversal about subprojects, 2007-04-11) which was before the 'struct dir_struct' split its bitfield of named options into a 'flags' enum in 7c4c97c0 (Turn the flags in struct dir_struct into a single variable, 2009-02-16). When those flags changed, the comment became stale, since members like 'show_other_directories' transitioned into flags like DIR_SHOW_OTHER_DIRECTORIES. Update the comments for treat_directory() to use these flag names rather than the old member names. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> --- dir.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/dir.c b/dir.c index 3beb8e17a839..0a0138bc1aa6 100644 --- a/dir.c +++ b/dir.c @@ -1749,13 +1749,13 @@ static enum exist_status directory_exists_in_index(struct index_state *istate, * Case 3: if we didn't have it in the index previously, we * have a few sub-cases: * - * (a) if "show_other_directories" is true, we show it as - * just a directory, unless "hide_empty_directories" is + * (a) if DIR_SHOW_OTHER_DIRECTORIES flag is set, we show it as + * just a directory, unless DIR_HIDE_EMPTY_DIRECTORIES is * also true, in which case we need to check if it contains any * untracked and / or ignored files. - * (b) if it looks like a git directory, and we don't have - * 'no_gitlinks' set we treat it as a gitlink, and show it - * as a directory. + * (b) if it looks like a git directory and we don't have the + * DIR_NO_GITLINKS flag, then we treat it as a gitlink, and + * show it as a directory. * (c) otherwise, we recurse into it. */ static enum path_treatment treat_directory(struct dir_struct *dir, @@ -1843,7 +1843,6 @@ static enum path_treatment treat_directory(struct dir_struct *dir, return path_recurse; } - /* This is the "show_other_directories" case */ assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES); /* @@ -1858,7 +1857,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* Special cases for where this directory is excluded/ignored */ if (excluded) { /* - * In the show_other_directories case, if we're not + * If DIR_SHOW_OTHER_DIRECTORIES is set and we're not * hiding empty directories, there is no need to * recurse into an ignored directory. */ -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* [PATCH v2 7/8] [RFC] dir: convert trace calls to trace2 equivalents 2021-05-08 0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget ` (5 preceding siblings ...) 2021-05-08 0:08 ` [PATCH v2 6/8] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget @ 2021-05-08 0:08 ` Elijah Newren via GitGitGadget 2021-05-08 0:08 ` [PATCH v2 8/8] [RFC] dir: reported number of visited directories and paths with trace2 Elijah Newren via GitGitGadget 2021-05-08 19:58 ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget 8 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-08 0:08 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 34 ++++-- t/t7063-status-untracked-cache.sh | 193 +++++++++++++++++------------- t/t7519-status-fsmonitor.sh | 8 +- 3 files changed, 135 insertions(+), 100 deletions(-) diff --git a/dir.c b/dir.c index 0a0138bc1aa6..23c71ab7e9a1 100644 --- a/dir.c +++ b/dir.c @@ -2775,12 +2775,29 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d return root; } +static void trace2_read_directory_statistics(struct dir_struct *dir, + struct repository *repo) +{ + if (!dir->untracked) + return; + trace2_data_intmax("read_directory", repo, + "node-creation", dir->untracked->dir_created); + trace2_data_intmax("read_directory", repo, + "gitignore-invalidation", + dir->untracked->gitignore_invalidated); + trace2_data_intmax("read_directory", repo, + "directory-invalidation", + dir->untracked->dir_invalidated); + trace2_data_intmax("read_directory", repo, + "opendir", dir->untracked->dir_opened); +} + int read_directory(struct dir_struct *dir, struct index_state *istate, const char *path, int len, const struct pathspec *pathspec) { struct untracked_cache_dir *untracked; - 
trace_performance_enter(); + trace2_region_enter("dir", "read_directory", istate->repo); if (has_symlink_leading_path(path, len)) { trace_performance_leave("read directory %.*s", len, path); @@ -2799,23 +2816,13 @@ int read_directory(struct dir_struct *dir, struct index_state *istate, QSORT(dir->entries, dir->nr, cmp_dir_entry); QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry); - trace_performance_leave("read directory %.*s", len, path); + trace2_region_leave("dir", "read_directory", istate->repo); if (dir->untracked) { static int force_untracked_cache = -1; - static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS); if (force_untracked_cache < 0) force_untracked_cache = git_env_bool("GIT_FORCE_UNTRACKED_CACHE", 0); - trace_printf_key(&trace_untracked_stats, - "node creation: %u\n" - "gitignore invalidation: %u\n" - "directory invalidation: %u\n" - "opendir: %u\n", - dir->untracked->dir_created, - dir->untracked->gitignore_invalidated, - dir->untracked->dir_invalidated, - dir->untracked->dir_opened); if (force_untracked_cache && dir->untracked == istate->untracked && (dir->untracked->dir_opened || @@ -2826,6 +2833,9 @@ int read_directory(struct dir_struct *dir, struct index_state *istate, FREE_AND_NULL(dir->untracked); } } + + if (trace2_is_enabled()) + trace2_read_directory_statistics(dir, istate->repo); return dir->nr; } diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh index accefde72fb1..6bce65b439e3 100755 --- a/t/t7063-status-untracked-cache.sh +++ b/t/t7063-status-untracked-cache.sh @@ -57,6 +57,19 @@ iuc () { return $ret } +get_relevant_traces() { + # From the GIT_TRACE2_PERF data of the form + # $TIME $FILE:$LINE | d0 | main | data | r1 | ? | ? | read_directo | $RELEVANT_STAT + # extract the $RELEVANT_STAT fields. We don't care about region_enter + # or region_leave, or stats for things outside read_directory. 
+ INPUT_FILE=$1 + OUTPUT_FILE=$2 + grep data.*read_directo $INPUT_FILE \ + | cut -d "|" -f 9 \ + >$OUTPUT_FILE +} + + test_lazy_prereq UNTRACKED_CACHE ' { git update-index --test-untracked-cache; ret=$?; } && test $ret -ne 1 @@ -129,19 +142,20 @@ EOF test_expect_success 'status first time (empty cache)' ' avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 3 -gitignore invalidation: 1 -directory invalidation: 0 -opendir: 4 + ..node-creation:3 + ..gitignore-invalidation:1 + ..directory-invalidation:0 + ..opendir:4 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'untracked cache after first status' ' @@ -151,19 +165,20 @@ test_expect_success 'untracked cache after first status' ' test_expect_success 'status second time (fully populated cache)' ' avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 0 + ..node-creation:0 + ..gitignore-invalidation:0 + ..directory-invalidation:0 + ..opendir:0 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'untracked cache after second status' ' @@ -174,8 +189,8 @@ test_expect_success 'untracked cache after second status' 
' test_expect_success 'modify in root directory, one dir invalidation' ' avoid_racy && : >four && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -189,13 +204,14 @@ A two EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 1 -opendir: 1 + ..node-creation:0 + ..gitignore-invalidation:0 + ..directory-invalidation:1 + ..opendir:1 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' @@ -223,8 +239,8 @@ EOF test_expect_success 'new .gitignore invalidates recursively' ' avoid_racy && echo four >.gitignore && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -238,13 +254,14 @@ A two EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 1 -directory invalidation: 1 -opendir: 4 + ..node-creation:0 + ..gitignore-invalidation:1 + ..directory-invalidation:1 + ..opendir:4 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' @@ -272,8 +289,8 @@ EOF test_expect_success 'new info/exclude invalidates everything' ' avoid_racy && echo three >>.git/info/exclude && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc 
status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -285,13 +302,14 @@ A two EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 1 -directory invalidation: 0 -opendir: 4 + ..node-creation:0 + ..gitignore-invalidation:1 + ..directory-invalidation:0 + ..opendir:4 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'verify untracked cache dump' ' @@ -330,8 +348,8 @@ EOF ' test_expect_success 'status after the move' ' - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -343,13 +361,14 @@ A one EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 1 + ..node-creation:0 + ..gitignore-invalidation:0 + ..directory-invalidation:0 + ..opendir:1 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'verify untracked cache dump' ' @@ -389,8 +408,8 @@ EOF ' test_expect_success 'status after the move' ' - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -402,13 +421,14 @@ A two EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore 
invalidation: 0 -directory invalidation: 0 -opendir: 1 + ..node-creation:0 + ..gitignore-invalidation:0 + ..directory-invalidation:0 + ..opendir:1 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'verify untracked cache dump' ' @@ -438,8 +458,8 @@ test_expect_success 'set up for sparse checkout testing' ' ' test_expect_success 'status after commit' ' - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -448,13 +468,14 @@ test_expect_success 'status after commit' ' EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 2 + ..node-creation:0 + ..gitignore-invalidation:0 + ..directory-invalidation:0 + ..opendir:2 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'untracked cache correct after commit' ' @@ -496,9 +517,9 @@ test_expect_success 'create/modify files, some of which are gitignored' ' ' test_expect_success 'test sparse status with untracked cache' ' - : >../trace && + : >../trace.output && avoid_racy && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../status.actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -509,13 +530,14 @@ test_expect_success 'test sparse status with untracked cache' ' EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 1 -directory 
invalidation: 2 -opendir: 2 + ..node-creation:0 + ..gitignore-invalidation:1 + ..directory-invalidation:2 + ..opendir:2 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'untracked cache correct after status' ' @@ -539,8 +561,8 @@ EOF test_expect_success 'test sparse status again with untracked cache' ' avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../status.actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -551,13 +573,14 @@ test_expect_success 'test sparse status again with untracked cache' ' EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 0 + ..node-creation:0 + ..gitignore-invalidation:0 + ..directory-invalidation:0 + ..opendir:0 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'set up for test of subdir and sparse checkouts' ' @@ -568,8 +591,8 @@ test_expect_success 'set up for test of subdir and sparse checkouts' ' test_expect_success 'test sparse status with untracked cache and subdir' ' avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../status.actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -581,13 +604,14 @@ test_expect_success 'test sparse status with untracked cache and subdir' ' EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 2 -gitignore 
invalidation: 0 -directory invalidation: 1 -opendir: 3 + ..node-creation:2 + ..gitignore-invalidation:0 + ..directory-invalidation:1 + ..opendir:3 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'verify untracked cache dump (sparse/subdirs)' ' @@ -616,19 +640,20 @@ EOF test_expect_success 'test sparse status again with untracked cache and subdir' ' avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../status.actual && iuc status --porcelain >../status.iuc && test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 0 + ..node-creation:0 + ..gitignore-invalidation:0 + ..directory-invalidation:0 + ..opendir:0 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'move entry in subdir from untracked to cached' ' diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh index 45d025f96010..637391c6ce46 100755 --- a/t/t7519-status-fsmonitor.sh +++ b/t/t7519-status-fsmonitor.sh @@ -334,7 +334,7 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR' git config core.fsmonitor .git/hooks/fsmonitor-test && git update-index --untracked-cache && git update-index --fsmonitor && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-before" \ + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-before" \ git status && test-tool dump-untracked-cache >../before ) && @@ -346,12 +346,12 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR' EOF ( cd dot-git && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-after" \ + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-after" \ git status && 
test-tool dump-untracked-cache >../after ) && - grep "directory invalidation" trace-before >>before && - grep "directory invalidation" trace-after >>after && + grep "directory-invalidation" trace-before | cut -d"|" -f 9 >>before && + grep "directory-invalidation" trace-after | cut -d"|" -f 9 >>after && # UNTR extension unchanged, dir invalidation count unchanged test_cmp before after ' -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
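For readers who have not looked at GIT_TRACE2_PERF output before, the
get_relevant_traces helper added in the patch above can be sketched in
isolation. The snippet below is only an illustration: it runs the same
grep/cut pipeline over a hand-written sample that follows the column
layout described in the helper's comment. The timestamps, file:line
values, durations and the region lines are invented, and the function
name extract_read_directory_stats is hypothetical.

```shell
# Hypothetical stand-in for get_relevant_traces: keep only "data"
# events in the read_directory category and print the ninth
# "|"-separated column, which holds the key:value statistic.
extract_read_directory_stats () {
	grep "data.*read_directo" "$1" | cut -d "|" -f 9
}

# Hand-written sample in the perf column layout; every value here is
# made up for illustration and is not real trace2 output.
sample=$(mktemp)
cat >"$sample" <<\EOF
12:00:00.000001 dir.c:2801 | d0 | main | region_enter | r1 | 0.001000 | | dir | label:read_directory
12:00:00.000002 dir.c:2819 | d0 | main | region_leave | r1 | 0.002000 | 0.001000 | dir | label:read_directory
12:00:00.000003 dir.c:2767 | d0 | main | data | r1 | 0.002100 | 0.002100 | read_directo | ..node-creation:3
12:00:00.000004 dir.c:2773 | d0 | main | data | r1 | 0.002200 | 0.002200 | read_directo | ..opendir:4
EOF

stats=$(extract_read_directory_stats "$sample")
printf '%s\n' "$stats"
rm -f "$sample"
```

Note that cut keeps the space that follows the last "|", which is why
the expected lines in the tests above start with a space.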
* [PATCH v2 8/8] [RFC] dir: reported number of visited directories and paths with trace2
  2021-05-08  0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget
                     ` (6 preceding siblings ...)
  2021-05-08  0:08   ` [PATCH v2 7/8] [RFC] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
@ 2021-05-08  0:08   ` Elijah Newren via GitGitGadget
  2021-05-08 19:58   ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget
  8 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-08  0:08 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Previously, tests that wanted to verify that we don't traverse into a
deep directory hierarchy that is ignored had no easy way to verify and
enforce that behavior.  Record information about the number of
directories and paths we inspect while traversing the directory
hierarchy in read_directory(), and when trace2 is enabled, print these
statistics.

Make use of these statistics in t7300 to simplify (and vastly improve
the performance of) the "avoid traversing into ignored directories"
test.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 dir.c                             |  8 ++++++
 dir.h                             |  4 +++
 t/t7063-status-untracked-cache.sh |  1 +
 t/t7300-clean.sh                  | 46 ++++++++++---------------------
 4 files changed, 27 insertions(+), 32 deletions(-)

diff --git a/dir.c b/dir.c
index 23c71ab7e9a1..896a9a62b2c7 100644
--- a/dir.c
+++ b/dir.c
@@ -2455,6 +2455,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 
 	if (open_cached_dir(&cdir, dir, untracked, istate, &path, check_only))
 		goto out;
+	dir->visited_directories++;
 
 	if (untracked)
 		untracked->check_only = !!check_only;
@@ -2463,6 +2464,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 		/* check how the file or directory should be treated */
 		state = treat_path(dir, untracked, &cdir, istate, &path,
 				   baselen, pathspec);
+		dir->visited_paths++;
 
 		if (state > dir_state)
 			dir_state = state;
@@ -2778,6 +2780,10 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
 static void trace2_read_directory_statistics(struct dir_struct *dir,
 					     struct repository *repo)
 {
+	trace2_data_intmax("read_directory", repo,
+			   "directories-visited", dir->visited_directories);
+	trace2_data_intmax("read_directory", repo,
+			   "paths-visited", dir->visited_paths);
 	if (!dir->untracked)
 		return;
 	trace2_data_intmax("read_directory", repo,
@@ -2798,6 +2804,8 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 	struct untracked_cache_dir *untracked;
 
 	trace2_region_enter("dir", "read_directory", istate->repo);
+	dir->visited_paths = 0;
+	dir->visited_directories = 0;
 
 	if (has_symlink_leading_path(path, len)) {
 		trace_performance_leave("read directory %.*s", len, path);
diff --git a/dir.h b/dir.h
index 04d886cfce75..22c67907f689 100644
--- a/dir.h
+++ b/dir.h
@@ -336,6 +336,10 @@ struct dir_struct {
 	struct oid_stat ss_info_exclude;
 	struct oid_stat ss_excludes_file;
 	unsigned unmanaged_exclude_files;
+
+	/* Stats about the traversal */
+	unsigned visited_paths;
+	unsigned visited_directories;
 };
 
 /*Count the number of slashes for string s*/
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index 6bce65b439e3..1517c316892f 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -65,6 +65,7 @@ get_relevant_traces() {
 	INPUT_FILE=$1
 	OUTPUT_FILE=$2
 	grep data.*read_directo $INPUT_FILE \
+	    | grep -v visited \
 	    | cut -d "|" -f 9 \
 	    >$OUTPUT_FILE
 }
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index c2a3b7b6a52b..2c10a7b64f11 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -747,42 +747,24 @@ test_expect_success 'clean untracked paths by pathspec' '
 '
 
 test_expect_success 'avoid traversing into ignored directories' '
-	test_when_finished rm -f output error &&
+	test_when_finished rm -f output error trace.* &&
 	test_create_repo avoid-traversing-deep-hierarchy &&
 	(
 		cd avoid-traversing-deep-hierarchy &&
 
-		>directory-random-file.txt &&
-		# Put this file under directory400/directory399/.../directory1/
-		depth=400 &&
-		for x in $(test_seq 1 $depth); do
-			mkdir "tmpdirectory$x" &&
-			mv directory* "tmpdirectory$x" &&
-			mv "tmpdirectory$x" "directory$x"
-		done &&
-
-		git clean -ffdxn -e directory$depth >../output 2>../error &&
-
-		test_must_be_empty ../output &&
-		# We especially do not want things like
-		#   "warning: could not open directory "
-		# appearing in the error output.  It is true that directories
-		# that are too long cannot be opened, but we should not be
-		# recursing into those directories anyway since the very first
-		# level is ignored.
-		test_must_be_empty ../error &&
-
-		# alpine-linux-musl fails to "rm -rf" a directory with such
-		# a deeply nested hierarchy.  Help it out by deleting the
-		# leading directories ourselves.  Super slow, but, what else
-		# can we do?  Without this, we will hit a
-		#     error: Tests passed but test cleanup failed; aborting
-		# so do this ugly manual cleanup...
-		while test ! -f directory-random-file.txt; do
-			name=$(ls -d directory*) &&
-			mv $name/* . &&
-			rmdir $name
-		done
+		mkdir -p untracked/subdir/with/a &&
+		>untracked/subdir/with/a/random-file.txt &&
+
+		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
+		git clean -ffdxn -e untracked &&
+
+		grep data.*read_directo.*visited ../trace.output \
+			| cut -d "|" -f 9 >../trace.relevant &&
+		cat >../trace.expect <<-EOF &&
+		 directories-visited:1
+		 paths-visited:4
+		EOF
+		test_cmp ../trace.expect ../trace.relevant
 	)
 '
 
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 90+ messages in thread
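The patch above ends up with two consumers of the same read_directory
data events: t7300 wants only the new "visited" counters, while t7063
has to filter them out so its existing untracked-cache expectations
stay unchanged. A minimal sketch of that split follows. The helper
names traversal_stats and untracked_cache_stats are hypothetical, and
the trace lines are hand-written: only the key:value payloads are
copied from the test expectations in this thread, everything else
(timestamps, file:line, durations, and having both kinds of keys in
one file) is invented.

```shell
# Hypothetical helpers: both start from "data" events in the
# read_directory category and differ only in the "visited" filter.
traversal_stats () {
	grep "data.*read_directo.*visited" "$1" | cut -d "|" -f 9
}
untracked_cache_stats () {
	grep "data.*read_directo" "$1" | grep -v visited | cut -d "|" -f 9
}

# Hand-written sample; not real trace2 output.
trace=$(mktemp)
cat >"$trace" <<\EOF
12:00:00.000001 dir.c:2783 | d0 | main | data | r1 | 0.001000 | 0.001000 | read_directo | directories-visited:1
12:00:00.000002 dir.c:2785 | d0 | main | data | r1 | 0.001100 | 0.001100 | read_directo | paths-visited:4
12:00:00.000003 dir.c:2789 | d0 | main | data | r1 | 0.001200 | 0.001200 | read_directo | ..node-creation:0
12:00:00.000004 dir.c:2797 | d0 | main | data | r1 | 0.001300 | 0.001300 | read_directo | ..opendir:0
EOF

visited=$(traversal_stats "$trace")
cache=$(untracked_cache_stats "$trace")
printf 'traversal:\n%s\nuntracked cache:\n%s\n' "$visited" "$cache"
rm -f "$trace"
```

Filtering on the word "visited" is simple but relies on no other
statistic key containing that substring, which holds for the keys
added in this series.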
* [PATCH v3 0/8] Directory traversal fixes
  2021-05-08  0:08 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget
                     ` (7 preceding siblings ...)
  2021-05-08  0:08   ` [PATCH v2 8/8] [RFC] dir: reported number of visited directories and paths with trace2 Elijah Newren via GitGitGadget
@ 2021-05-08 19:58   ` Elijah Newren via GitGitGadget
  2021-05-08 19:58     ` [PATCH v3 1/8] [RFC] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
                       ` (8 more replies)
  8 siblings, 9 replies; 90+ messages in thread
From: Elijah Newren via GitGitGadget @ 2021-05-08 19:58 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King,
	Philip Oakley, Jeff Hostetler, Josh Steadmon, Elijah Newren

This patchset fixes a few directory traversal issues, where
fill_directory() would traverse into directories that it shouldn't and not
traverse into directories that it should (one of which was originally
reported on this list at [1]). It also includes a few cleanups.

Changes since v2:

 * Move the RFC patches to the front.
 * Delete all the ugly test code that stole reviewer attention away from
   the rest of the series. :-) The RFC patches being first allow the test
   to be dramatically simplified and rewritten.
 * Include cleanups suggested by Philip Oakley and Eric Sunshine (the
   cleanups suggested by others are obsolete with the test rewrite).

Patches 1-3 are RFC because

 * (1) I'm not that familiar with trace1 & trace2; I've only used trace2
   for region_enter() and region_leave() calls before. And I'm unsure if
   removing trace1 counts as a backward compatibility issue or not, though
   the trace2 documentation claims it's meant to replace trace1.
 * (2) The ls-files -i handling to print an error instead of operating as
   before might be considered a backward incompatible change. I want to
   hear others' opinions on that.

Also, if anyone has any ideas about a better place to put the "Some
sidenotes" from the sixth commit message rather than keeping them in a
random commit message, that might be helpful too.

[1] See
https://lore.kernel.org/git/DM6PR00MB06829EC5B85E0C5AC595004E894E9@DM6PR00MB0682.namprd00.prod.outlook.com/
or alternatively https://github.com/git-for-windows/git/issues/2732.

Derrick Stolee (1):
  dir: update stale description of treat_directory()

Elijah Newren (7):
  [RFC] dir: convert trace calls to trace2 equivalents
  [RFC] dir: report number of visited directories and paths with trace2
  [RFC] ls-files: error out on -i unless -o or -c are specified
  t7300: add testcase showing unnecessary traversal into ignored
    directory
  t3001, t7300: add testcase showcasing missed directory traversal
  dir: avoid unnecessary traversal into ignored directory
  dir: traverse into untracked directories if they may have ignored
    subfiles

 builtin/ls-files.c                 |   3 +
 dir.c                              | 103 +++++++++------
 dir.h                              |   4 +
 t/t1306-xdg-files.sh               |   2 +-
 t/t3001-ls-files-others-exclude.sh |   5 +
 t/t3003-ls-files-exclude.sh        |   4 +-
 t/t7063-status-untracked-cache.sh  | 194 ++++++++++++++++-------------
 t/t7300-clean.sh                   |  41 ++++++
 t/t7519-status-fsmonitor.sh        |   8 +-
 9 files changed, 238 insertions(+), 126 deletions(-)


base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1020%2Fnewren%2Fdirectory-traversal-fixes-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1020/newren/directory-traversal-fixes-v3
Pull-Request: https://github.com/git/git/pull/1020

Range-diff vs v2:

 7:  3a2394506a53 = 1:  9f1c0d78d739 [RFC] dir: convert trace calls to trace2 equivalents
 8:  fba4d65b78c7 !
2: 8b511f228af8 [RFC] dir: reported number of visited directories and paths with trace2 @@ Metadata Author: Elijah Newren <newren@gmail.com> ## Commit message ## - [RFC] dir: reported number of visited directories and paths with trace2 + [RFC] dir: report number of visited directories and paths with trace2 - Previously, tests that wanted to verify that we don't traverse into a - deep directory hierarchy that is ignored had no easy way to verify and - enforce that behavior. Record information about the number of - directories and paths we inspect while traversing the directory - hierarchy in read_directory(), and when trace2 is enabled, print these - statistics. - - Make use of these statistics in t7300 to simplify (and vastly improve - the performance of) the "avoid traversing into ignored directories" - test. + Provide more statistics in trace2 output that include the number of + directories and total paths visited by the directory traversal logic. + Subsequent patches will take advantage of this to ensure we do not + unnecessarily traverse into ignored directories. 
Signed-off-by: Elijah Newren <newren@gmail.com> @@ t/t7063-status-untracked-cache.sh: get_relevant_traces() { | cut -d "|" -f 9 \ >$OUTPUT_FILE } - - ## t/t7300-clean.sh ## -@@ t/t7300-clean.sh: test_expect_success 'clean untracked paths by pathspec' ' - ' - - test_expect_success 'avoid traversing into ignored directories' ' -- test_when_finished rm -f output error && -+ test_when_finished rm -f output error trace.* && - test_create_repo avoid-traversing-deep-hierarchy && - ( - cd avoid-traversing-deep-hierarchy && - -- >directory-random-file.txt && -- # Put this file under directory400/directory399/.../directory1/ -- depth=400 && -- for x in $(test_seq 1 $depth); do -- mkdir "tmpdirectory$x" && -- mv directory* "tmpdirectory$x" && -- mv "tmpdirectory$x" "directory$x" -- done && -- -- git clean -ffdxn -e directory$depth >../output 2>../error && -- -- test_must_be_empty ../output && -- # We especially do not want things like -- # "warning: could not open directory " -- # appearing in the error output. It is true that directories -- # that are too long cannot be opened, but we should not be -- # recursing into those directories anyway since the very first -- # level is ignored. -- test_must_be_empty ../error && -- -- # alpine-linux-musl fails to "rm -rf" a directory with such -- # a deeply nested hierarchy. Help it out by deleting the -- # leading directories ourselves. Super slow, but, what else -- # can we do? Without this, we will hit a -- # error: Tests passed but test cleanup failed; aborting -- # so do this ugly manual cleanup... -- while test ! -f directory-random-file.txt; do -- name=$(ls -d directory*) && -- mv $name/* . 
&& -- rmdir $name -- done -+ mkdir -p untracked/subdir/with/a && -+ >untracked/subdir/with/a/random-file.txt && -+ -+ GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ -+ git clean -ffdxn -e untracked && -+ -+ grep data.*read_directo.*visited ../trace.output \ -+ | cut -d "|" -f 9 >../trace.relevant && -+ cat >../trace.expect <<-EOF && -+ directories-visited:1 -+ paths-visited:4 -+ EOF -+ test_cmp ../trace.expect ../trace.relevant - ) - ' - 5: 3d8dd00ccd10 = 3: 44a1322c4402 [RFC] ls-files: error out on -i unless -o or -c are specified 1: a3bd253fa8e8 ! 4: dc3d3f247141 t7300: add testcase showing unnecessary traversal into ignored directory @@ Metadata ## Commit message ## t7300: add testcase showing unnecessary traversal into ignored directory - PNPM is apparently creating deeply nested (but ignored) directory - structures; traversing them is costly performance-wise, unnecessary, and - in some cases is even throwing warnings/errors because the paths are too - long to handle on various platforms. Add a testcase that demonstrates - this problem. + The PNPM package manager is apparently creating deeply nested (but + ignored) directory structures; traversing them is costly + performance-wise, unnecessary, and in some cases is even throwing + warnings/errors because the paths are too long to handle on various + platforms. Add a testcase that checks for such unnecessary directory + traversal. - Initial-test-by: Jason Gore <Jason.Gore@microsoft.com> - Helped-by: brian m. 
carlson <sandals@crustytoothpaste.net> Signed-off-by: Elijah Newren <newren@gmail.com> ## t/t7300-clean.sh ## @@ t/t7300-clean.sh: test_expect_success 'clean untracked paths by pathspec' ' ' +test_expect_failure 'avoid traversing into ignored directories' ' -+ test_when_finished rm -f output error && ++ test_when_finished rm -f output error trace.* && + test_create_repo avoid-traversing-deep-hierarchy && + ( + cd avoid-traversing-deep-hierarchy && + -+ >directory-random-file.txt && -+ # Put this file under directory400/directory399/.../directory1/ -+ depth=400 && -+ for x in $(test_seq 1 $depth); do -+ mkdir "tmpdirectory$x" && -+ mv directory* "tmpdirectory$x" && -+ mv "tmpdirectory$x" "directory$x" -+ done && ++ mkdir -p untracked/subdir/with/a && ++ >untracked/subdir/with/a/random-file.txt && + -+ git clean -ffdxn -e directory$depth >../output 2>../error && ++ GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ ++ git clean -ffdxn -e untracked ++ ) && + -+ test_must_be_empty ../output && -+ # We especially do not want things like -+ # "warning: could not open directory " -+ # appearing in the error output. It is true that directories -+ # that are too long cannot be opened, but we should not be -+ # recursing into those directories anyway since the very first -+ # level is ignored. -+ test_must_be_empty ../error && -+ -+ # alpine-linux-musl fails to "rm -rf" a directory with such -+ # a deeply nested hierarchy. Help it out by deleting the -+ # leading directories ourselves. Super slow, but, what else -+ # can we do? Without this, we will hit a -+ # error: Tests passed but test cleanup failed; aborting -+ # so do this ugly manual cleanup... -+ while test ! -f directory-random-file.txt; do -+ name=$(ls -d directory*) && -+ mv $name/* . 
&& -+ rmdir $name -+ done -+ ) ++ grep data.*read_directo.*visited trace.output \ ++ | cut -d "|" -f 9 >trace.relevant && ++ cat >trace.expect <<-EOF && ++ directories-visited:1 ++ paths-visited:4 ++ EOF ++ test_cmp trace.expect trace.relevant +' + test_done 2: aa3a41e26eca ! 5: 73b03a1e8e05 t3001, t7300: add testcase showcasing missed directory traversal @@ t/t3001-ls-files-others-exclude.sh: EOF ## t/t7300-clean.sh ## @@ t/t7300-clean.sh: test_expect_failure 'avoid traversing into ignored directories' ' - ) + test_cmp trace.expect trace.relevant ' +test_expect_failure 'traverse into directories that may have ignored entries' ' 3: 3c3f6111da13 ! 6: 66ffc7f02d08 dir: avoid unnecessary traversal into ignored directory @@ t/t7300-clean.sh: test_expect_success 'clean untracked paths by pathspec' ' -test_expect_failure 'avoid traversing into ignored directories' ' +test_expect_success 'avoid traversing into ignored directories' ' - test_when_finished rm -f output error && + test_when_finished rm -f output error trace.* && test_create_repo avoid-traversing-deep-hierarchy && ( 4: fad048339b81 ! 7: acde436b220e dir: traverse into untracked directories if they may have ignored subfiles @@ t/t3001-ls-files-others-exclude.sh: EOF ## t/t7300-clean.sh ## @@ t/t7300-clean.sh: test_expect_success 'avoid traversing into ignored directories' ' - ) + test_cmp trace.expect trace.relevant ' -test_expect_failure 'traverse into directories that may have ignored entries' ' 6: 1d825dfdc70b = 8: 57135c357774 dir: update stale description of treat_directory() -- gitgitgadget ^ permalink raw reply [flat|nested] 90+ messages in thread
* [PATCH v3 1/8] [RFC] dir: convert trace calls to trace2 equivalents 2021-05-08 19:58 ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget @ 2021-05-08 19:58 ` Elijah Newren via GitGitGadget 2021-05-10 4:49 ` Junio C Hamano 2021-05-11 16:17 ` Jeff Hostetler 2021-05-08 19:58 ` [PATCH v3 2/8] [RFC] dir: report number of visited directories and paths with trace2 Elijah Newren via GitGitGadget ` (7 subsequent siblings) 8 siblings, 2 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-08 19:58 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 34 ++++-- t/t7063-status-untracked-cache.sh | 193 +++++++++++++++++------------- t/t7519-status-fsmonitor.sh | 8 +- 3 files changed, 135 insertions(+), 100 deletions(-) diff --git a/dir.c b/dir.c index 3474e67e8f3c..9f7c8debeab3 100644 --- a/dir.c +++ b/dir.c @@ -2760,12 +2760,29 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d return root; } +static void trace2_read_directory_statistics(struct dir_struct *dir, + struct repository *repo) +{ + if (!dir->untracked) + return; + trace2_data_intmax("read_directory", repo, + "node-creation", dir->untracked->dir_created); + trace2_data_intmax("read_directory", repo, + "gitignore-invalidation", + dir->untracked->gitignore_invalidated); + trace2_data_intmax("read_directory", repo, + "directory-invalidation", + dir->untracked->dir_invalidated); + trace2_data_intmax("read_directory", repo, + "opendir", dir->untracked->dir_opened); +} + int read_directory(struct dir_struct *dir, struct index_state *istate, const char *path, int len, const struct pathspec *pathspec) { struct untracked_cache_dir *untracked; - trace_performance_enter(); + trace2_region_enter("dir", "read_directory", istate->repo); if 
(has_symlink_leading_path(path, len)) { trace_performance_leave("read directory %.*s", len, path); @@ -2784,23 +2801,13 @@ int read_directory(struct dir_struct *dir, struct index_state *istate, QSORT(dir->entries, dir->nr, cmp_dir_entry); QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry); - trace_performance_leave("read directory %.*s", len, path); + trace2_region_leave("dir", "read_directory", istate->repo); if (dir->untracked) { static int force_untracked_cache = -1; - static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS); if (force_untracked_cache < 0) force_untracked_cache = git_env_bool("GIT_FORCE_UNTRACKED_CACHE", 0); - trace_printf_key(&trace_untracked_stats, - "node creation: %u\n" - "gitignore invalidation: %u\n" - "directory invalidation: %u\n" - "opendir: %u\n", - dir->untracked->dir_created, - dir->untracked->gitignore_invalidated, - dir->untracked->dir_invalidated, - dir->untracked->dir_opened); if (force_untracked_cache && dir->untracked == istate->untracked && (dir->untracked->dir_opened || @@ -2811,6 +2818,9 @@ int read_directory(struct dir_struct *dir, struct index_state *istate, FREE_AND_NULL(dir->untracked); } } + + if (trace2_is_enabled()) + trace2_read_directory_statistics(dir, istate->repo); return dir->nr; } diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh index accefde72fb1..6bce65b439e3 100755 --- a/t/t7063-status-untracked-cache.sh +++ b/t/t7063-status-untracked-cache.sh @@ -57,6 +57,19 @@ iuc () { return $ret } +get_relevant_traces() { + # From the GIT_TRACE2_PERF data of the form + # $TIME $FILE:$LINE | d0 | main | data | r1 | ? | ? | read_directo | $RELEVANT_STAT + # extract the $RELEVANT_STAT fields. We don't care about region_enter + # or region_leave, or stats for things outside read_directory. 
+ INPUT_FILE=$1 + OUTPUT_FILE=$2 + grep data.*read_directo $INPUT_FILE \ + | cut -d "|" -f 9 \ + >$OUTPUT_FILE +} + + test_lazy_prereq UNTRACKED_CACHE ' { git update-index --test-untracked-cache; ret=$?; } && test $ret -ne 1 @@ -129,19 +142,20 @@ EOF test_expect_success 'status first time (empty cache)' ' avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 3 -gitignore invalidation: 1 -directory invalidation: 0 -opendir: 4 + ..node-creation:3 + ..gitignore-invalidation:1 + ..directory-invalidation:0 + ..opendir:4 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'untracked cache after first status' ' @@ -151,19 +165,20 @@ test_expect_success 'untracked cache after first status' ' test_expect_success 'status second time (fully populated cache)' ' avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 0 + ..node-creation:0 + ..gitignore-invalidation:0 + ..directory-invalidation:0 + ..opendir:0 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'untracked cache after second status' ' @@ -174,8 +189,8 @@ test_expect_success 'untracked cache after second status' 
' test_expect_success 'modify in root directory, one dir invalidation' ' avoid_racy && : >four && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -189,13 +204,14 @@ A two EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 1 -opendir: 1 + ..node-creation:0 + ..gitignore-invalidation:0 + ..directory-invalidation:1 + ..opendir:1 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' @@ -223,8 +239,8 @@ EOF test_expect_success 'new .gitignore invalidates recursively' ' avoid_racy && echo four >.gitignore && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -238,13 +254,14 @@ A two EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 1 -directory invalidation: 1 -opendir: 4 + ..node-creation:0 + ..gitignore-invalidation:1 + ..directory-invalidation:1 + ..opendir:4 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' @@ -272,8 +289,8 @@ EOF test_expect_success 'new info/exclude invalidates everything' ' avoid_racy && echo three >>.git/info/exclude && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc 
status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -285,13 +302,14 @@ A two EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 1 -directory invalidation: 0 -opendir: 4 + ..node-creation:0 + ..gitignore-invalidation:1 + ..directory-invalidation:0 + ..opendir:4 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'verify untracked cache dump' ' @@ -330,8 +348,8 @@ EOF ' test_expect_success 'status after the move' ' - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -343,13 +361,14 @@ A one EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 1 + ..node-creation:0 + ..gitignore-invalidation:0 + ..directory-invalidation:0 + ..opendir:1 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'verify untracked cache dump' ' @@ -389,8 +408,8 @@ EOF ' test_expect_success 'status after the move' ' - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -402,13 +421,14 @@ A two EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore 
invalidation: 0 -directory invalidation: 0 -opendir: 1 + ..node-creation:0 + ..gitignore-invalidation:0 + ..directory-invalidation:0 + ..opendir:1 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'verify untracked cache dump' ' @@ -438,8 +458,8 @@ test_expect_success 'set up for sparse checkout testing' ' ' test_expect_success 'status after commit' ' - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -448,13 +468,14 @@ test_expect_success 'status after commit' ' EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 2 + ..node-creation:0 + ..gitignore-invalidation:0 + ..directory-invalidation:0 + ..opendir:2 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'untracked cache correct after commit' ' @@ -496,9 +517,9 @@ test_expect_success 'create/modify files, some of which are gitignored' ' ' test_expect_success 'test sparse status with untracked cache' ' - : >../trace && + : >../trace.output && avoid_racy && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../status.actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -509,13 +530,14 @@ test_expect_success 'test sparse status with untracked cache' ' EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 1 -directory 
invalidation: 2 -opendir: 2 + ..node-creation:0 + ..gitignore-invalidation:1 + ..directory-invalidation:2 + ..opendir:2 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'untracked cache correct after status' ' @@ -539,8 +561,8 @@ EOF test_expect_success 'test sparse status again with untracked cache' ' avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../status.actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -551,13 +573,14 @@ test_expect_success 'test sparse status again with untracked cache' ' EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 0 + ..node-creation:0 + ..gitignore-invalidation:0 + ..directory-invalidation:0 + ..opendir:0 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'set up for test of subdir and sparse checkouts' ' @@ -568,8 +591,8 @@ test_expect_success 'set up for test of subdir and sparse checkouts' ' test_expect_success 'test sparse status with untracked cache and subdir' ' avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../status.actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -581,13 +604,14 @@ test_expect_success 'test sparse status with untracked cache and subdir' ' EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 2 -gitignore 
invalidation: 0 -directory invalidation: 1 -opendir: 3 + ..node-creation:2 + ..gitignore-invalidation:0 + ..directory-invalidation:1 + ..opendir:3 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'verify untracked cache dump (sparse/subdirs)' ' @@ -616,19 +640,20 @@ EOF test_expect_success 'test sparse status again with untracked cache and subdir' ' avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../status.actual && iuc status --porcelain >../status.iuc && test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 0 + ..node-creation:0 + ..gitignore-invalidation:0 + ..directory-invalidation:0 + ..opendir:0 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'move entry in subdir from untracked to cached' ' diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh index 45d025f96010..637391c6ce46 100755 --- a/t/t7519-status-fsmonitor.sh +++ b/t/t7519-status-fsmonitor.sh @@ -334,7 +334,7 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR' git config core.fsmonitor .git/hooks/fsmonitor-test && git update-index --untracked-cache && git update-index --fsmonitor && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-before" \ + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-before" \ git status && test-tool dump-untracked-cache >../before ) && @@ -346,12 +346,12 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR' EOF ( cd dot-git && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-after" \ + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-after" \ git status && 
test-tool dump-untracked-cache >../after ) && - grep "directory invalidation" trace-before >>before && - grep "directory invalidation" trace-after >>after && + grep "directory-invalidation" trace-before | cut -d"|" -f 9 >>before && + grep "directory-invalidation" trace-after | cut -d"|" -f 9 >>after && # UNTR extension unchanged, dir invalidation count unchanged test_cmp before after ' -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* Re: [PATCH v3 1/8] [RFC] dir: convert trace calls to trace2 equivalents
From: Junio C Hamano @ 2021-05-10 4:49 UTC
To: Elijah Newren via GitGitGadget
Cc: git, Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +static void trace2_read_directory_statistics(struct dir_struct *dir,
> +					     struct repository *repo)
> +{
> +	if (!dir->untracked)
> +		return;
> +	trace2_data_intmax("read_directory", repo,
> +			   "node-creation", dir->untracked->dir_created);
> +	trace2_data_intmax("read_directory", repo,
> +			   "gitignore-invalidation",
> +			   dir->untracked->gitignore_invalidated);
> +	trace2_data_intmax("read_directory", repo,
> +			   "directory-invalidation",
> +			   dir->untracked->dir_invalidated);
> +	trace2_data_intmax("read_directory", repo,
> +			   "opendir", dir->untracked->dir_opened);
> +}
> +

This obviously looks like an equivalent to what happens in the
original inside the "if (dir->untracked)" block.

And we have a performance_{enter,leave} pair replaced with
a region_{enter,leave} pair.

> -	trace_performance_enter();
> +	trace2_region_enter("dir", "read_directory", istate->repo);
> ...
> -	trace_performance_leave("read directory %.*s", len, path);
> +	trace2_region_leave("dir", "read_directory", istate->repo);
>  	if (dir->untracked) {
>  		static int force_untracked_cache = -1;
> -		static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS);
>
>  		if (force_untracked_cache < 0)
>  			force_untracked_cache =
>  				git_env_bool("GIT_FORCE_UNTRACKED_CACHE", 0);
> -		trace_printf_key(&trace_untracked_stats,
> -				 "node creation: %u\n"
> -				 "gitignore invalidation: %u\n"
> -				 "directory invalidation: %u\n"
> -				 "opendir: %u\n",
> -				 dir->untracked->dir_created,
> -				 dir->untracked->gitignore_invalidated,
> -				 dir->untracked->dir_invalidated,
> -				 dir->untracked->dir_opened);
>  		if (force_untracked_cache &&
>  			dir->untracked == istate->untracked &&
>  		    (dir->untracked->dir_opened ||

Removal of the trace_printf() in the middle made the body of this
if() statement much less distracting, which is good.

> @@ -2811,6 +2818,9 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
>  			FREE_AND_NULL(dir->untracked);
>  		}
>  	}
> +
> +	if (trace2_is_enabled())
> +		trace2_read_directory_statistics(dir, istate->repo);

This slightly changes the semantics in that the original did an
equivalent emitting from inside the "if (dir->untracked)" block, but
this call is hoisted outside, and the new helper knows how to be
silent when the untracked thing is not in effect, so the net effect at
this step is the same. And if we ever add tracing statistics that are
relevant when !dir->untracked is true, the new code organization is
easier to work with.

The only curious thing is the guard "if (trace2_is_enabled())";
correctness-wise, are bad things going to happen if it is not
here, or is this a performance hack, or is it more for its
documentation value (meaning, it would be a bug if we later added
things that are irrelevant when trace is not enabled to the helper)?

> @@ -57,6 +57,19 @@ iuc () {
>  	return $ret
>  }
>
> +get_relevant_traces() {

Style. SP on both sides of "()".

> +	# From the GIT_TRACE2_PERF data of the form
> +	#    $TIME $FILE:$LINE | d0 | main | data | r1 | ? | ? | read_directo | $RELEVANT_STAT
> +	# extract the $RELEVANT_STAT fields.  We don't care about region_enter
> +	# or region_leave, or stats for things outside read_directory.
> +	INPUT_FILE=$1
> +	OUTPUT_FILE=$2
> +	grep data.*read_directo $INPUT_FILE \
> +	    | cut -d "|" -f 9 \
> +	    >$OUTPUT_FILE

Style. Wrapping the line after pipe '|' will allow you to omit the
backslash. Also quote the redirection target, i.e. >"$OUTPUT_FILE",
to help certain vintages of bash.

Those who are more familiar with the trace2 infrastructure may want
to further comment, but it looked obvious and straightforward to me.

Thanks.
* Re: [PATCH v3 1/8] [RFC] dir: convert trace calls to trace2 equivalents
From: Elijah Newren @ 2021-05-11 17:23 UTC
To: Junio C Hamano
Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon

On Sun, May 9, 2021 at 9:49 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > +static void trace2_read_directory_statistics(struct dir_struct *dir,
> > +					     struct repository *repo)
> > +{
> > +	if (!dir->untracked)
> > +		return;
> > +	trace2_data_intmax("read_directory", repo,
> > +			   "node-creation", dir->untracked->dir_created);
> > +	trace2_data_intmax("read_directory", repo,
> > +			   "gitignore-invalidation",
> > +			   dir->untracked->gitignore_invalidated);
> > +	trace2_data_intmax("read_directory", repo,
> > +			   "directory-invalidation",
> > +			   dir->untracked->dir_invalidated);
> > +	trace2_data_intmax("read_directory", repo,
> > +			   "opendir", dir->untracked->dir_opened);
> > +}
> > +
>
> This obviously looks like an equivalent to what happens in the
> original inside the "if (dir->untracked)" block.
>
> And we have a performance_{enter,leave} pair replaced with
> a region_{enter,leave} pair.
>
> > -	trace_performance_enter();
> > +	trace2_region_enter("dir", "read_directory", istate->repo);
> > ...
> > -	trace_performance_leave("read directory %.*s", len, path);
> > +	trace2_region_leave("dir", "read_directory", istate->repo);
> >  	if (dir->untracked) {
> >  		static int force_untracked_cache = -1;
> > -		static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS);
> >
> >  		if (force_untracked_cache < 0)
> >  			force_untracked_cache =
> >  				git_env_bool("GIT_FORCE_UNTRACKED_CACHE", 0);
> > -		trace_printf_key(&trace_untracked_stats,
> > -				 "node creation: %u\n"
> > -				 "gitignore invalidation: %u\n"
> > -				 "directory invalidation: %u\n"
> > -				 "opendir: %u\n",
> > -				 dir->untracked->dir_created,
> > -				 dir->untracked->gitignore_invalidated,
> > -				 dir->untracked->dir_invalidated,
> > -				 dir->untracked->dir_opened);
> >  		if (force_untracked_cache &&
> >  			dir->untracked == istate->untracked &&
> >  		    (dir->untracked->dir_opened ||
>
> Removal of the trace_printf() in the middle made the body of this
> if() statement much less distracting, which is good.
>
> > @@ -2811,6 +2818,9 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
> >  			FREE_AND_NULL(dir->untracked);
> >  		}
> >  	}
> > +
> > +	if (trace2_is_enabled())
> > +		trace2_read_directory_statistics(dir, istate->repo);
>
> This slightly changes the semantics in that the original did an
> equivalent emitting from inside the "if (dir->untracked)" block, but
> this call is hoisted outside, and the new helper knows how to be
> silent when the untracked thing is not in effect, so the net effect at
> this step is the same. And if we ever add tracing statistics that are
> relevant when !dir->untracked is true, the new code organization is
> easier to work with.
>
> The only curious thing is the guard "if (trace2_is_enabled())";
> correctness-wise, are bad things going to happen if it is not
> here, or is this a performance hack, or is it more for its
> documentation value (meaning, it would be a bug if we later added
> things that are irrelevant when trace is not enabled to the helper)?

No, nothing bad would happen here. The guard was a combination of a
performance hack and documentation, in case
trace2_read_directory_statistics() started gaining code other than
trace2_*() calls that was only relevant when trace2 was enabled.

It turns out, though, that Jeff's suggestion to also print the path in
the statistics will require me to create a temporary strbuf so that
I can get a NUL-terminated string. We only want to do that when
trace2_is_enabled(), so that will make the introduction of that check
a bit more natural.

> > @@ -57,6 +57,19 @@ iuc () {
> >  	return $ret
> >  }
> >
> > +get_relevant_traces() {
>
> Style. SP on both sides of "()".

Will fix.

> > +	# From the GIT_TRACE2_PERF data of the form
> > +	#    $TIME $FILE:$LINE | d0 | main | data | r1 | ? | ? | read_directo | $RELEVANT_STAT
> > +	# extract the $RELEVANT_STAT fields.  We don't care about region_enter
> > +	# or region_leave, or stats for things outside read_directory.
> > +	INPUT_FILE=$1
> > +	OUTPUT_FILE=$2
> > +	grep data.*read_directo $INPUT_FILE \
> > +	    | cut -d "|" -f 9 \
> > +	    >$OUTPUT_FILE
>
> Style. Wrapping the line after pipe '|' will allow you to omit the
> backslash. Also quote the redirection target, i.e. >"$OUTPUT_FILE",
> to help certain vintages of bash.

Will fix.
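For readers following the style discussion, here is a minimal sketch of how the helper might look with the three review points applied (SP on both sides of "()", the line wrapped after the pipe so no backslash is needed, and the redirection target quoted). This is an illustration only, not the version that was eventually committed. The sample trace lines are made up, and their column layout is assumed from the comment in the patch rather than copied from real GIT_TRACE2_PERF output.

```shell
# Sketch only: the helper from the patch, restyled per the review comments.
get_relevant_traces () {
	# From GIT_TRACE2_PERF data of the form
	#    $TIME $FILE:$LINE | d0 | main | data | r1 | ? | ? | read_directo | $RELEVANT_STAT
	# extract the $RELEVANT_STAT fields (the 9th "|"-separated column).
	INPUT_FILE=$1
	OUTPUT_FILE=$2
	# Ending the line with the pipe lets the command continue without a backslash.
	grep "data.*read_directo" "$INPUT_FILE" |
	cut -d "|" -f 9 >"$OUTPUT_FILE"
}

# Hypothetical sample input; timestamps, file names and values are invented.
cat >sample.trace <<\EOF
12:00:00.000001 dir.c:1 | d0 | main | region_enter | r1 | 0.001 | | dir | label:read_directory
12:00:00.000002 dir.c:2 | d0 | main | data | r1 | 0.002 | 0.001 | read_directo | ..node-creation:3
12:00:00.000003 dir.c:3 | d0 | main | data | r1 | 0.002 | 0.001 | read_directo | ..opendir:4
12:00:00.000004 dir.c:4 | d0 | main | region_leave | r1 | 0.003 | 0.002 | dir | label:read_directory
EOF

get_relevant_traces sample.trace sample.relevant
cat sample.relevant
```

With this sample input, only the two "data" lines survive the grep, and each output line keeps the leading space that follows the last "|" in the trace format.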
* Re: [PATCH v3 1/8] [RFC] dir: convert trace calls to trace2 equivalents
From: Jeff Hostetler @ 2021-05-11 16:17 UTC
To: Elijah Newren via GitGitGadget, git
Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon

On 5/8/21 3:58 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  dir.c                             |  34 ++++--
>  t/t7063-status-untracked-cache.sh | 193 +++++++++++++++++-------------
>  t/t7519-status-fsmonitor.sh       |   8 +-
>  3 files changed, 135 insertions(+), 100 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index 3474e67e8f3c..9f7c8debeab3 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -2760,12 +2760,29 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
>  	return root;
>  }
>
> +static void trace2_read_directory_statistics(struct dir_struct *dir,
> +					     struct repository *repo)
> +{
> +	if (!dir->untracked)
> +		return;

Is there value in also printing the path?
The existing `trace_performance_leave()` calls did, but
I'm not familiar enough with this code to say whether the output
wasn't always something like ".".

> +	trace2_data_intmax("read_directory", repo,
> +			   "node-creation", dir->untracked->dir_created);
> +	trace2_data_intmax("read_directory", repo,
> +			   "gitignore-invalidation",
> +			   dir->untracked->gitignore_invalidated);
> +	trace2_data_intmax("read_directory", repo,
> +			   "directory-invalidation",
> +			   dir->untracked->dir_invalidated);
> +	trace2_data_intmax("read_directory", repo,
> +			   "opendir", dir->untracked->dir_opened);
> +}
> +

The existing code was quite tangled and I think this helps
make things more clear.

>  int read_directory(struct dir_struct *dir, struct index_state *istate,
>  		   const char *path, int len, const struct pathspec *pathspec)
>  {
>  	struct untracked_cache_dir *untracked;
>
> -	trace_performance_enter();
> +	trace2_region_enter("dir", "read_directory", istate->repo);
>
>  	if (has_symlink_leading_path(path, len)) {
>  		trace_performance_leave("read directory %.*s", len, path);

This `trace_performance_leave()` inside the `if` needs to be
converted too.

> @@ -2784,23 +2801,13 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
>  	QSORT(dir->entries, dir->nr, cmp_dir_entry);
>  	QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry);
>
> -	trace_performance_leave("read directory %.*s", len, path);
> +	trace2_region_leave("dir", "read_directory", istate->repo);

Can we put the call to `trace2_read_directory_statistics()` before
the above `trace2_region_leave()` call? Then those stats will
appear indented between the begin- and end-region events in the output.

That way, the following `if (dir->untracked) {...}` is only
concerned with the untracked cache and/or freeing that data.

>  	if (dir->untracked) {
>  		static int force_untracked_cache = -1;
> -		static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS);
>
>  		if (force_untracked_cache < 0)
>  			force_untracked_cache =
>  				git_env_bool("GIT_FORCE_UNTRACKED_CACHE", 0);
> -		trace_printf_key(&trace_untracked_stats,
> -				 "node creation: %u\n"
> -				 "gitignore invalidation: %u\n"
> -				 "directory invalidation: %u\n"
> -				 "opendir: %u\n",
> -				 dir->untracked->dir_created,
> -				 dir->untracked->gitignore_invalidated,
> -				 dir->untracked->dir_invalidated,
> -				 dir->untracked->dir_opened);
>  		if (force_untracked_cache &&
>  			dir->untracked == istate->untracked &&
>  		    (dir->untracked->dir_opened ||
> @@ -2811,6 +2818,9 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
>  			FREE_AND_NULL(dir->untracked);
>  		}
>  	}
> +
> +	if (trace2_is_enabled())
> +		trace2_read_directory_statistics(dir, istate->repo);

Also, I think it'd be OK to move the `trace2_is_enabled()` call
inside the function, since we're also testing `!dir->untracked`
inside the function.

The more that I look at the before and after versions, the
more I think the `trace2_read_directory_statistics()` call
should be up before the `trace2_region_leave()`. Here at the
bottom of the function, we may have already freed `dir->untracked`.
I'm not familiar enough with this code to know if that is a
good or bad thing.

>  	return dir->nr;
>  }
> ...

Thanks,
Jeff
* Re: [PATCH v3 1/8] [RFC] dir: convert trace calls to trace2 equivalents
From: Elijah Newren @ 2021-05-11 17:29 UTC
To: Jeff Hostetler
Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon

On Tue, May 11, 2021 at 9:17 AM Jeff Hostetler <git@jeffhostetler.com> wrote:
>
> On 5/8/21 3:58 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  dir.c                             |  34 ++++--
> >  t/t7063-status-untracked-cache.sh | 193 +++++++++++++++++-------------
> >  t/t7519-status-fsmonitor.sh       |   8 +-
> >  3 files changed, 135 insertions(+), 100 deletions(-)
> >
> > diff --git a/dir.c b/dir.c
> > index 3474e67e8f3c..9f7c8debeab3 100644
> > --- a/dir.c
> > +++ b/dir.c
> > @@ -2760,12 +2760,29 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
> >  	return root;
> >  }
> >
> > +static void trace2_read_directory_statistics(struct dir_struct *dir,
> > +					     struct repository *repo)
> > +{
> > +	if (!dir->untracked)
> > +		return;
>
> Is there value to also printing the path?
> The existing `trace_performance_leave()` calls were, but
> I'm familiar enough with this code to say if the output
> wasn't always something like ".".

The path will most likely just be "" (i.e. the empty string) for the
toplevel directory, but not always, so it may be useful to print it.
I'll add it.

> > +	trace2_data_intmax("read_directory", repo,
> > +			   "node-creation", dir->untracked->dir_created);
> > +	trace2_data_intmax("read_directory", repo,
> > +			   "gitignore-invalidation",
> > +			   dir->untracked->gitignore_invalidated);
> > +	trace2_data_intmax("read_directory", repo,
> > +			   "directory-invalidation",
> > +			   dir->untracked->dir_invalidated);
> > +	trace2_data_intmax("read_directory", repo,
> > +			   "opendir", dir->untracked->dir_opened);
> > +}
> > +
>
> The existing code was quite tangled and I think this helps
> make things more clear.
>
> >  int read_directory(struct dir_struct *dir, struct index_state *istate,
> >  		   const char *path, int len, const struct pathspec *pathspec)
> >  {
> >  	struct untracked_cache_dir *untracked;
> >
> > -	trace_performance_enter();
> > +	trace2_region_enter("dir", "read_directory", istate->repo);
> >
> >  	if (has_symlink_leading_path(path, len)) {
> >  		trace_performance_leave("read directory %.*s", len, path);
>
> This `trace_performance_leave()` inside the `if` needs to be
> converted too.

Ooh, good catch. Will fix.

> > @@ -2784,23 +2801,13 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
> >  	QSORT(dir->entries, dir->nr, cmp_dir_entry);
> >  	QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry);
> >
> > -	trace_performance_leave("read directory %.*s", len, path);
> > +	trace2_region_leave("dir", "read_directory", istate->repo);
>
> Can we put the call to `trace2_read_directory_statistics()` before
> the above `trace2_region_leave()` call? Then those stats will
> appear indented between the begin- and end-region events in the output.
>
> That way, the following `if (dir-untracked) {...}` is only
> concerned with the untracked cache and/or freeing that data.

Makes sense, I'll move it.

> >  	if (dir->untracked) {
> >  		static int force_untracked_cache = -1;
> > -		static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS);
> >
> >  		if (force_untracked_cache < 0)
> >  			force_untracked_cache =
> >  				git_env_bool("GIT_FORCE_UNTRACKED_CACHE", 0);
> > -		trace_printf_key(&trace_untracked_stats,
> > -				 "node creation: %u\n"
> > -				 "gitignore invalidation: %u\n"
> > -				 "directory invalidation: %u\n"
> > -				 "opendir: %u\n",
> > -				 dir->untracked->dir_created,
> > -				 dir->untracked->gitignore_invalidated,
> > -				 dir->untracked->dir_invalidated,
> > -				 dir->untracked->dir_opened);
> >  		if (force_untracked_cache &&
> >  			dir->untracked == istate->untracked &&
> >  		    (dir->untracked->dir_opened ||
> > @@ -2811,6 +2818,9 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
> >  			FREE_AND_NULL(dir->untracked);
> >  		}
> >  	}
> > +
> > +	if (trace2_is_enabled())
> > +		trace2_read_directory_statistics(dir, istate->repo);
>
> Also, I think it'd be ok to move the `trace2_is_enabled()` call
> inside the function. Since we're also testing `!dir->untracked`
> inside the function.

Actually, I can't do that. The path passed to this function is not
going to always be (and will often not be) NUL-terminated, but
trace2_data_string() expects a NUL-terminated string. So, I'm going
to make a temporary strbuf and copy the path into it, but of course I
only want to spend time doing that if trace2_is_enabled().

> The more that I look at the before and after versions, the
> more I think the `trace2_read_directory_statistics()` call
> should be up before the `trace2_region_leave()`. Here at the
> bottom of the function, we may have already freed `dir->untracked`.
> I'm not familiar enough with this code to know if that is a
> good or bad thing.

Yeah, the statistics really need to be moved earlier, both for the
nesting reasons you point out and because otherwise the statistics
won't print whenever dir->untracked != istate->untracked. I'll move
them.
^ permalink raw reply [flat|nested] 90+ messages in thread
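A note on the NUL-termination point raised in the reply above: the path arrives as a pointer plus a length, so a consumer that expects a C string needs a terminated copy, and that copy is only worth making when tracing is on. The following is a minimal standalone sketch of that pattern, not git's code. `copy_path_for_trace()` and `report_path()` are hypothetical names, the `tracing_enabled` argument stands in for `trace2_is_enabled()`, and plain `malloc()` is used where the reply says a temporary strbuf would be used.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not git's strbuf API): duplicate the first `len`
 * bytes of `path` into a freshly allocated, NUL-terminated string.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static char *copy_path_for_trace(const char *path, size_t len)
{
	char *copy;

	assert(path != NULL);
	copy = malloc(len + 1);
	if (!copy)
		return NULL;
	memcpy(copy, path, len);
	copy[len] = '\0';
	return copy;
}

/*
 * Hypothetical reporting function. `tracing_enabled` is an assumption
 * standing in for a "is tracing on?" query; printf() stands in for a
 * consumer that requires a NUL-terminated string.
 * Returns 1 if the path was reported, 0 otherwise.
 */
static int report_path(int tracing_enabled, const char *path, size_t len)
{
	char *copy;

	if (!tracing_enabled)
		return 0;	/* skip the allocation and the copy entirely */
	copy = copy_path_for_trace(path, len);
	if (!copy)
		return 0;
	printf("path:%s\n", copy);
	free(copy);
	return 1;
}
```

Checking the enabled flag before copying keeps the common (untraced) case free of any extra allocation, which is the reason given in the reply for leaving the check at the call site.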
* [PATCH v3 2/8] [RFC] dir: report number of visited directories and paths with trace2 2021-05-08 19:58 ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget 2021-05-08 19:58 ` [PATCH v3 1/8] [RFC] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget @ 2021-05-08 19:58 ` Elijah Newren via GitGitGadget 2021-05-10 5:00 ` Junio C Hamano 2021-05-08 19:58 ` [PATCH v3 3/8] [RFC] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget ` (6 subsequent siblings) 8 siblings, 1 reply; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-08 19:58 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Provide more statistics in trace2 output that include the number of directories and total paths visited by the directory traversal logic. Subsequent patches will take advantage of this to ensure we do not unnecessarily traverse into ignored directories. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 8 ++++++++ dir.h | 4 ++++ t/t7063-status-untracked-cache.sh | 1 + 3 files changed, 13 insertions(+) diff --git a/dir.c b/dir.c index 9f7c8debeab3..dfb174227b36 100644 --- a/dir.c +++ b/dir.c @@ -2440,6 +2440,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, if (open_cached_dir(&cdir, dir, untracked, istate, &path, check_only)) goto out; + dir->visited_directories++; if (untracked) untracked->check_only = !!check_only; @@ -2448,6 +2449,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, /* check how the file or directory should be treated */ state = treat_path(dir, untracked, &cdir, istate, &path, baselen, pathspec); + dir->visited_paths++; if (state > dir_state) dir_state = state; @@ -2763,6 +2765,10 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d static void trace2_read_directory_statistics(struct dir_struct *dir, struct repository *repo) { + trace2_data_intmax("read_directory", repo, + "directories-visited", dir->visited_directories); + trace2_data_intmax("read_directory", repo, + "paths-visited", dir->visited_paths); if (!dir->untracked) return; trace2_data_intmax("read_directory", repo, @@ -2783,6 +2789,8 @@ int read_directory(struct dir_struct *dir, struct index_state *istate, struct untracked_cache_dir *untracked; trace2_region_enter("dir", "read_directory", istate->repo); + dir->visited_paths = 0; + dir->visited_directories = 0; if (has_symlink_leading_path(path, len)) { trace_performance_leave("read directory %.*s", len, path); diff --git a/dir.h b/dir.h index 04d886cfce75..22c67907f689 100644 --- a/dir.h +++ b/dir.h @@ -336,6 +336,10 @@ struct dir_struct { struct oid_stat ss_info_exclude; struct oid_stat ss_excludes_file; unsigned unmanaged_exclude_files; + + /* Stats about the traversal */ + unsigned visited_paths; + unsigned visited_directories; }; /*Count the number of slashes for string s*/ diff 
--git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh index 6bce65b439e3..1517c316892f 100755 --- a/t/t7063-status-untracked-cache.sh +++ b/t/t7063-status-untracked-cache.sh @@ -65,6 +65,7 @@ get_relevant_traces() { INPUT_FILE=$1 OUTPUT_FILE=$2 grep data.*read_directo $INPUT_FILE \ + | grep -v visited \ | cut -d "|" -f 9 \ >$OUTPUT_FILE } -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* Re: [PATCH v3 2/8] [RFC] dir: report number of visited directories and paths with trace2 2021-05-08 19:58 ` [PATCH v3 2/8] [RFC] dir: report number of visited directories and paths with trace2 Elijah Newren via GitGitGadget @ 2021-05-10 5:00 ` Junio C Hamano 0 siblings, 0 replies; 90+ messages in thread From: Junio C Hamano @ 2021-05-10 5:00 UTC (permalink / raw) To: Elijah Newren via GitGitGadget Cc: git, Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Elijah Newren <newren@gmail.com> > > Provide more statistics in trace2 output that include the number of > directories and total paths visited by the directory traversal logic. > Subsequent patches will take advantage of this to ensure we do not > unnecessarily traverse into ignored directories. And this change is the reason behind how the call to the trace statistics helper is now outside the "if (untracked)" block after patch 1/8; makes sense to me. ^ permalink raw reply [flat|nested] 90+ messages in thread
* [PATCH v3 3/8] [RFC] ls-files: error out on -i unless -o or -c are specified 2021-05-08 19:58 ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget 2021-05-08 19:58 ` [PATCH v3 1/8] [RFC] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget 2021-05-08 19:58 ` [PATCH v3 2/8] [RFC] dir: report number of visited directories and paths with trace2 Elijah Newren via GitGitGadget @ 2021-05-08 19:58 ` Elijah Newren via GitGitGadget 2021-05-10 5:09 ` Junio C Hamano 2021-05-08 19:59 ` [PATCH v3 4/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget ` (5 subsequent siblings) 8 siblings, 1 reply; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-08 19:58 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> ls-files --ignored can be used together with either --others or --cached. After being perplexed for a bit and digging in to the code, I assumed that ls-files -i was just broken and not printing anything and had a nice patch ready to submit when I finally realized that -i can be used with --cached to find tracked ignores. While that was a mistake on my part, and a careful reading of the documentation could have made this more clear, I suspect this is an error others are likely to make as well. In fact, of two uses in our testsuite, I believe one of the two did make this error. In t1306.13, there are NO tracked files, and all the excludes built up and used in that test and in previous tests thus have to be about untracked files. However, since they were looking for an empty result, the mistake went unnoticed as their erroneous command also just happened to give an empty answer. 
-i will most of the time be used with -o, which would suggest we could just make -i imply -o in the absence of either a -o or -c, but that would be a backward incompatible break. Instead, let's just flag -i without either a -o or -c as an error, and update the two relevant testcases to specify their intent. Signed-off-by: Elijah Newren <newren@gmail.com> --- builtin/ls-files.c | 3 +++ t/t1306-xdg-files.sh | 2 +- t/t3003-ls-files-exclude.sh | 4 ++-- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/builtin/ls-files.c b/builtin/ls-files.c index 60a2913a01e9..9f74b1ab2e69 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -748,6 +748,9 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) if (pathspec.nr && error_unmatch) ps_matched = xcalloc(pathspec.nr, 1); + if ((dir.flags & DIR_SHOW_IGNORED) && !show_others && !show_cached) + die("ls-files --ignored is usually used with --others, but --cached is the default. Please specify which you want."); + if ((dir.flags & DIR_SHOW_IGNORED) && !exc_given) die("ls-files --ignored needs some exclude pattern"); diff --git a/t/t1306-xdg-files.sh b/t/t1306-xdg-files.sh index dd87b43be1a6..40d3c42618c0 100755 --- a/t/t1306-xdg-files.sh +++ b/t/t1306-xdg-files.sh @@ -116,7 +116,7 @@ test_expect_success 'Exclusion in a non-XDG global ignore file' ' test_expect_success 'Checking XDG ignore file when HOME is unset' ' (sane_unset HOME && git config --unset core.excludesfile && - git ls-files --exclude-standard --ignored >actual) && + git ls-files --exclude-standard --ignored --others >actual) && test_must_be_empty actual ' diff --git a/t/t3003-ls-files-exclude.sh b/t/t3003-ls-files-exclude.sh index d5ec333131f9..c41c4f046abf 100755 --- a/t/t3003-ls-files-exclude.sh +++ b/t/t3003-ls-files-exclude.sh @@ -29,11 +29,11 @@ test_expect_success 'add file to gitignore' ' ' check_all_output -test_expect_success 'ls-files -i lists only tracked-but-ignored files' ' +test_expect_success 'ls-files -i -c lists only 
tracked-but-ignored files' ' echo content >other-file && git add other-file && echo file >expect && - git ls-files -i --exclude-standard >output && + git ls-files -i -c --exclude-standard >output && test_cmp expect output ' -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* Re: [PATCH v3 3/8] [RFC] ls-files: error out on -i unless -o or -c are specified 2021-05-08 19:58 ` [PATCH v3 3/8] [RFC] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget @ 2021-05-10 5:09 ` Junio C Hamano 2021-05-11 17:40 ` Elijah Newren 0 siblings, 1 reply; 90+ messages in thread From: Junio C Hamano @ 2021-05-10 5:09 UTC (permalink / raw) To: Elijah Newren via GitGitGadget Cc: git, Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > @@ -748,6 +748,9 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) > if (pathspec.nr && error_unmatch) > ps_matched = xcalloc(pathspec.nr, 1); > > + if ((dir.flags & DIR_SHOW_IGNORED) && !show_others && !show_cached) > + die("ls-files --ignored is usually used with --others, but --cached is the default. Please specify which you want."); > + So "git ls-files -i" would suddenly start erroring out and users are to scramble and patch their scripts? More importantly, the message does not make much sense. "I is usually used with O" is very true, but the mention of "usually" here means it is not an error for "I" to be used without "O". That part is very understandable and correct. But I do not know what "but --cached is the default" part wants to say. If it is the _default_, and (assuming that what I read in the proposed log message is correct) the combination of "-i -c" is valid, then I would understand the message if the code were more like this: if ((dir.flags & DIR_SHOW_IGNORED) && !show_others && !show_cached) { show_cached = 1; /* default */ warning("ls-files -i given without -o/-c; defaulting to -i -c"); } If we are not defaulting to cached, then die("ls-files -i must be used with either -o or -c"); would also make sense. The variant presented in the patch does not make sense to me. 
> diff --git a/t/t1306-xdg-files.sh b/t/t1306-xdg-files.sh > index dd87b43be1a6..40d3c42618c0 100755 > --- a/t/t1306-xdg-files.sh > +++ b/t/t1306-xdg-files.sh > @@ -116,7 +116,7 @@ test_expect_success 'Exclusion in a non-XDG global ignore file' ' > test_expect_success 'Checking XDG ignore file when HOME is unset' ' > (sane_unset HOME && > git config --unset core.excludesfile && > - git ls-files --exclude-standard --ignored >actual) && > + git ls-files --exclude-standard --ignored --others >actual) && > test_must_be_empty actual > ' > > diff --git a/t/t3003-ls-files-exclude.sh b/t/t3003-ls-files-exclude.sh > index d5ec333131f9..c41c4f046abf 100755 > --- a/t/t3003-ls-files-exclude.sh > +++ b/t/t3003-ls-files-exclude.sh > @@ -29,11 +29,11 @@ test_expect_success 'add file to gitignore' ' > ' > check_all_output > > -test_expect_success 'ls-files -i lists only tracked-but-ignored files' ' > +test_expect_success 'ls-files -i -c lists only tracked-but-ignored files' ' > echo content >other-file && > git add other-file && > echo file >expect && > - git ls-files -i --exclude-standard >output && > + git ls-files -i -c --exclude-standard >output && > test_cmp expect output > ' ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH v3 3/8] [RFC] ls-files: error out on -i unless -o or -c are specified 2021-05-10 5:09 ` Junio C Hamano @ 2021-05-11 17:40 ` Elijah Newren 2021-05-11 22:32 ` Junio C Hamano 0 siblings, 1 reply; 90+ messages in thread From: Elijah Newren @ 2021-05-11 17:40 UTC (permalink / raw) To: Junio C Hamano Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon On Sun, May 9, 2021 at 10:09 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > @@ -748,6 +748,9 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) > > if (pathspec.nr && error_unmatch) > > ps_matched = xcalloc(pathspec.nr, 1); > > > > + if ((dir.flags & DIR_SHOW_IGNORED) && !show_others && !show_cached) > > + die("ls-files --ignored is usually used with --others, but --cached is the default. Please specify which you want."); > > + > > So "git ls-files -i" would suddenly start erroring out and users are > to scramble and patch their scripts? Thus the reason I marked this as "RFC" and called it out in the cover letter for folks to comment on. I figured that if I was having difficulty using it correctly and even our own testsuite showed that 50% of such invocations were wrong (despite being reviewed[1]), then it seems likely to me that erroring out to inform folks of this problem might be warranted. But, if folks disagree, I can switch it to a warning instead. [1] https://lore.kernel.org/git/20120724133227.GA14422@sigill.intra.peff.net/#t > More importantly, the message does not make much sense. "I is > usually used with O" is very true, but the mention of "usually" here > means it is not an error for "I" to be used without "O". That part > is very understandable and correct. > > But I do not know what "but --cached is the default" part wants to > say. 
If it is the _default_, and (assuming that what I read in the > proposed log message is correct) the combination of "-i -c" is valid, > then I would understand the message if the code were more like this: > > if ((dir.flags & DIR_SHOW_IGNORED) && > !show_others && !show_cached) { > show_cached = 1; /* default */ > warning("ls-files -i given without -o/-c; defaulting to -i -c"); > } > > If we are not defaulting to cached, then > > die("ls-files -i must be used with either -o or -c"); > > would also make sense. Ooh, that wording is much nicer. I'll adopt the latter suggestion, but let me know if you'd rather I went the warning route. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH v3 3/8] [RFC] ls-files: error out on -i unless -o or -c are specified 2021-05-11 17:40 ` Elijah Newren @ 2021-05-11 22:32 ` Junio C Hamano 0 siblings, 0 replies; 90+ messages in thread From: Junio C Hamano @ 2021-05-11 22:32 UTC (permalink / raw) To: Elijah Newren Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon Elijah Newren <newren@gmail.com> writes: >> If we are not defaulting to cached, then >> >> die("ls-files -i must be used with either -o or -c"); >> >> would also make sense. > > Ooh, that wording is much nicer. I'll adopt the latter suggestion, > but let me know if you'd rather I went the warning route. Even though warning would be safer, I have no strong preference. Either way will resolve my puzzlement. Thanks. ^ permalink raw reply [flat|nested] 90+ messages in thread
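For readers skimming the sub-thread above, the agreed-upon rule (reject `-i` unless `-o` or `-c` accompanies it) can be sketched as a small standalone predicate. This is an illustration only, not the code in builtin/ls-files.c: `check_ignored_options()` is a hypothetical name, and its three int parameters are assumed stand-ins for the `DIR_SHOW_IGNORED` flag test and the `show_others` / `show_cached` variables seen in the quoted diff.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the option check discussed above.
 * Returns NULL when the combination is acceptable, or the error
 * message (wording taken from the review suggestion) otherwise.
 */
static const char *check_ignored_options(int show_ignored, int show_others,
					 int show_cached)
{
	if (show_ignored && !show_others && !show_cached)
		return "ls-files -i must be used with either -o or -c";
	return NULL;
}

/* Usage sketch: only "-i" on its own is rejected. */
static void demo_check_ignored_options(void)
{
	assert(check_ignored_options(1, 0, 0) != NULL);	/* -i alone */
	assert(check_ignored_options(1, 1, 0) == NULL);	/* -i -o */
	assert(check_ignored_options(1, 0, 1) == NULL);	/* -i -c */
	assert(check_ignored_options(0, 0, 0) == NULL);	/* no -i at all */
}
```

The warning variant floated earlier in the thread would instead set the cached flag and continue; the sketch covers only the error-out variant that the reply adopts.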
* [PATCH v3 4/8] t7300: add testcase showing unnecessary traversal into ignored directory 2021-05-08 19:58 ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget ` (2 preceding siblings ...) 2021-05-08 19:58 ` [PATCH v3 3/8] [RFC] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget @ 2021-05-08 19:59 ` Elijah Newren via GitGitGadget 2021-05-10 5:28 ` Junio C Hamano 2021-05-08 19:59 ` [PATCH v3 5/8] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget ` (4 subsequent siblings) 8 siblings, 1 reply; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-08 19:59 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> The PNPM package manager is apparently creating deeply nested (but ignored) directory structures; traversing them is costly performance-wise, unnecessary, and in some cases is even throwing warnings/errors because the paths are too long to handle on various platforms. Add a testcase that checks for such unnecessary directory traversal. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- t/t7300-clean.sh | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index a74816ca8b46..b7c9898fac5b 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -746,4 +746,26 @@ test_expect_success 'clean untracked paths by pathspec' ' test_must_be_empty actual ' +test_expect_failure 'avoid traversing into ignored directories' ' + test_when_finished rm -f output error trace.* && + test_create_repo avoid-traversing-deep-hierarchy && + ( + cd avoid-traversing-deep-hierarchy && + + mkdir -p untracked/subdir/with/a && + >untracked/subdir/with/a/random-file.txt && + + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ + git clean -ffdxn -e untracked + ) && + + grep data.*read_directo.*visited trace.output \ + | cut -d "|" -f 9 >trace.relevant && + cat >trace.expect <<-EOF && + directories-visited:1 + paths-visited:4 + EOF + test_cmp trace.expect trace.relevant +' + test_done -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* Re: [PATCH v3 4/8] t7300: add testcase showing unnecessary traversal into ignored directory 2021-05-08 19:59 ` [PATCH v3 4/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget @ 2021-05-10 5:28 ` Junio C Hamano 2021-05-11 17:45 ` Elijah Newren 0 siblings, 1 reply; 90+ messages in thread From: Junio C Hamano @ 2021-05-10 5:28 UTC (permalink / raw) To: Elijah Newren via GitGitGadget Cc: git, Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > +test_expect_failure 'avoid traversing into ignored directories' ' > + test_when_finished rm -f output error trace.* && > + test_create_repo avoid-traversing-deep-hierarchy && > + ( > + cd avoid-traversing-deep-hierarchy && > + > + mkdir -p untracked/subdir/with/a && > + >untracked/subdir/with/a/random-file.txt && > + > + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ > + git clean -ffdxn -e untracked > + ) && > + > + grep data.*read_directo.*visited trace.output \ > + | cut -d "|" -f 9 >trace.relevant && > + cat >trace.expect <<-EOF && > + directories-visited:1 > + paths-visited:4 Are the origins of '1' and '4' trivially obvious to those who are reading the test, or do these deserve comments? We create an empty test repository, go there and create a untracked/ hierarchy with a junk file, and tell "clean" that 'untracked' is "also" in the exclude pattern (but since there is no other exclude pattern, that is the only one), so everything underneath untracked/ we have no reason to inspect. So, we do not visit 'untracked' directory. Which ones do we visit? Is '1' coming from the top-level of the working tree '.'? What about the number of visited paths '4' (the trace is stored outside this new test repository, so that's not it). Thanks. 
> + EOF > + test_cmp trace.expect trace.relevant > +' > + > test_done ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH v3 4/8] t7300: add testcase showing unnecessary traversal into ignored directory 2021-05-10 5:28 ` Junio C Hamano @ 2021-05-11 17:45 ` Elijah Newren 2021-05-11 22:43 ` Junio C Hamano 0 siblings, 1 reply; 90+ messages in thread From: Elijah Newren @ 2021-05-11 17:45 UTC (permalink / raw) To: Junio C Hamano Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon On Sun, May 9, 2021 at 10:28 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > +test_expect_failure 'avoid traversing into ignored directories' ' > > + test_when_finished rm -f output error trace.* && > > + test_create_repo avoid-traversing-deep-hierarchy && > > + ( > > + cd avoid-traversing-deep-hierarchy && > > + > > + mkdir -p untracked/subdir/with/a && > > + >untracked/subdir/with/a/random-file.txt && > > + > > + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ > > + git clean -ffdxn -e untracked > > + ) && > > + > > + grep data.*read_directo.*visited trace.output \ > > + | cut -d "|" -f 9 >trace.relevant && > > + cat >trace.expect <<-EOF && > > + directories-visited:1 > > + paths-visited:4 > > Are the origins of '1' and '4' trivially obvious to those who are > reading the test, or do these deserve comments? > > We create an empty test repository, go there and create a untracked/ > hierarchy with a junk file, and tell "clean" that 'untracked' is > "also" in the exclude pattern (but since there is no other exclude > pattern, that is the only one), so everything underneath untracked/ > we have no reason to inspect. > > So, we do not visit 'untracked' directory. Which ones do we visit? > Is '1' coming from the top-level of the working tree '.'? What > about the number of visited paths '4' (the trace is stored outside > this new test repository, so that's not it). Good points. 
I'll make a comment that directories-visited:1 is about ensuring we only went into the toplevel directory, and I'll remove the paths-visited check. But to answer your question, the paths we visit are '.', '..', '.git', and 'untracked', the first three of which we mark as path_none and don't recurse into because of special rules for those paths, and the last of which we shouldn't recurse into since it is ignored. There weren't any non-directory files in the toplevel directory, or those would also be included in the paths-visited count. A later patch in the series will fix the code to not recurse into the 'untracked' directory, fixing this test. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH v3 4/8] t7300: add testcase showing unnecessary traversal into ignored directory 2021-05-11 17:45 ` Elijah Newren @ 2021-05-11 22:43 ` Junio C Hamano 2021-05-12 2:07 ` Elijah Newren 0 siblings, 1 reply; 90+ messages in thread From: Junio C Hamano @ 2021-05-11 22:43 UTC (permalink / raw) To: Elijah Newren Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon Elijah Newren <newren@gmail.com> writes: > But to answer your question, the paths we visit are '.', '..', '.git', > and 'untracked', the first three of which we mark as path_none and > don't recurse into because of special rules for those paths, and the > last of which we shouldn't recurse into since it is ignored. Not a hard requirement, but I wish if we entirely ignored "." and ".." in our code (not just not counting, but making whoever calls readdir() skip and call it again when it gets "." or ".."). https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html seems to imply that readdir() may not give "." or ".." (if dot or dot-dot exists, you are to return them only once, which implies that it is perfectly OK for dot or dot-dot to be missing). So dropping the test for number of visited paths would be nicer from portability's point of view ;-) Thanks. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH v3 4/8] t7300: add testcase showing unnecessary traversal into ignored directory 2021-05-11 22:43 ` Junio C Hamano @ 2021-05-12 2:07 ` Elijah Newren 2021-05-12 3:17 ` Junio C Hamano 0 siblings, 1 reply; 90+ messages in thread From: Elijah Newren @ 2021-05-12 2:07 UTC (permalink / raw) To: Junio C Hamano Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon On Tue, May 11, 2021 at 3:43 PM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > > > But to answer your question, the paths we visit are '.', '..', '.git', > > and 'untracked', the first three of which we mark as path_none and > > don't recurse into because of special rules for those paths, and the > > last of which we shouldn't recurse into since it is ignored. > > Not a hard requirement, but I wish if we entirely ignored "." and > ".." in our code (not just not counting, but making whoever calls > readdir() skip and call it again when it gets "." or ".."). > > https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html > > seems to imply that readdir() may not give "." or ".." (if dot or > dot-dot exists, you are to return them only once, which implies that > it is perfectly OK for dot or dot-dot to be missing). Something like this? diff --git a/dir.c b/dir.c index 993a12145f..7f470bc701 100644 --- a/dir.c +++ b/dir.c @@ -2341,7 +2341,11 @@ static int read_cached_dir(struct cached_dir *cdir) struct dirent *de; if (cdir->fdir) { - de = readdir(cdir->fdir); + while ((de = readdir(cdir->fdir))) { + /* Ignore '.' and '..' 
by re-looping; handle the rest */ + if (!de || !is_dot_or_dotdot(de->d_name)) + break; + } if (!de) { cdir->d_name = NULL; cdir->d_type = DT_UNKNOWN; It appears that the other two callers of readdir() in dir.c, namely in is_empty_dir() and remove_dir_recurse() already have such special repeat-if-is_dot_or_dotdot() logic built into them, so this was partially lifted from those. If you'd like, I can add another patch in the series with this change so that all readdir() calls in dir.c have such ignore '.' and '..' logic. Or, we could perhaps introduce a new readdir() wrapper that does nothing other than ignore '.' and '..' and have all three of these callsites use that new wrapper. > So dropping the test for number of visited paths would be nicer from > portability's point of view ;-) Yep, makes sense. I already did that in v4, which means it'll continue to pass with or without the above proposed change to read_cached_dir(). ^ permalink raw reply related [flat|nested] 90+ messages in thread
* Re: [PATCH v3 4/8] t7300: add testcase showing unnecessary traversal into ignored directory 2021-05-12 2:07 ` Elijah Newren @ 2021-05-12 3:17 ` Junio C Hamano 0 siblings, 0 replies; 90+ messages in thread From: Junio C Hamano @ 2021-05-12 3:17 UTC (permalink / raw) To: Elijah Newren Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon Elijah Newren <newren@gmail.com> writes: > If you'd like, I can add another patch in the series with this change > so that all readdir() calls in dir.c have such ignore '.' and '..' > logic. Or, we could perhaps introduce a new readdir() wrapper that > does nothing other than ignore '.' and '..' and have all three of > these callsites use that new wrapper. Yeah, it is good to be consistent (either implementation). >> So dropping the test for number of visited paths would be nicer from >> portability's point of view ;-) > > Yep, makes sense. I already did that in v4, which means it'll > continue to pass with or without the above proposed change to > read_cached_dir(). Yup. ^ permalink raw reply [flat|nested] 90+ messages in thread
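The "readdir() wrapper" option mentioned in the exchange above could look roughly like the following standalone sketch. It is an illustration under assumptions rather than the patch that was eventually written: `readdir_skipping_dots()` and `name_is_dot_or_dotdot()` are hypothetical names (the latter plays the role of the `is_dot_or_dotdot()` helper used in the quoted diff), and only POSIX `readdir()` is relied upon.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Return nonzero if `name` is exactly "." or "..". */
static int name_is_dot_or_dotdot(const char *name)
{
	return !strcmp(name, ".") || !strcmp(name, "..");
}

/*
 * Hypothetical wrapper: behaves like readdir(), except that "." and
 * ".." entries are never returned. Yields NULL at end of directory or
 * on error, just as readdir() does.
 */
static struct dirent *readdir_skipping_dots(DIR *dirp)
{
	struct dirent *de;

	assert(dirp != NULL);
	while ((de = readdir(dirp)) != NULL) {
		if (!name_is_dot_or_dotdot(de->d_name))
			break;
	}
	return de;
}
```

With a wrapper of this shape, callers never see dot or dot-dot whether or not the platform's readdir() reports them, so any count of visited paths no longer depends on that platform detail.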
* [PATCH v3 5/8] t3001, t7300: add testcase showcasing missed directory traversal 2021-05-08 19:58 ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget ` (3 preceding siblings ...) 2021-05-08 19:59 ` [PATCH v3 4/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget @ 2021-05-08 19:59 ` Elijah Newren via GitGitGadget 2021-05-08 19:59 ` [PATCH v3 6/8] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget ` (3 subsequent siblings) 8 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-08 19:59 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> In the last commit, we added a testcase showing that the directory traversal machinery sometimes traverses into directories unnecessarily. Here we show that there are cases where it does the opposite: it does not traverse into directories, despite those directories having important files that need to be flagged. Add a testcase showing that `git ls-files -o -i --directory` can omit some of the files it should be listing, and another showing that `git clean -fX` can fail to clean out some of the expected files. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- t/t3001-ls-files-others-exclude.sh | 5 +++++ t/t7300-clean.sh | 19 +++++++++++++++++++ 2 files changed, 24 insertions(+) diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh index 1ec7cb57c7a8..ac05d1a17931 100755 --- a/t/t3001-ls-files-others-exclude.sh +++ b/t/t3001-ls-files-others-exclude.sh @@ -292,6 +292,11 @@ EOF test_cmp expect actual ' +test_expect_failure 'ls-files with "**" patterns and --directory' ' + # Expectation same as previous test + git ls-files --directory -o -i --exclude "**/a.1" >actual && + test_cmp expect actual +' test_expect_success 'ls-files with "**" patterns and no slashes' ' git ls-files -o -i --exclude "one**a.1" >actual && diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index b7c9898fac5b..74d395838708 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -768,4 +768,23 @@ test_expect_failure 'avoid traversing into ignored directories' ' test_cmp trace.expect trace.relevant ' +test_expect_failure 'traverse into directories that may have ignored entries' ' + test_when_finished rm -f output && + test_create_repo need-to-traverse-into-hierarchy && + ( + cd need-to-traverse-into-hierarchy && + mkdir -p modules/foobar/src/generated && + > modules/foobar/src/generated/code.c && + > modules/foobar/Makefile && + echo "/modules/**/src/generated/" >.gitignore && + + git clean -fX modules/foobar >../output && + + grep Removing ../output && + + test_path_is_missing modules/foobar/src/generated/code.c && + test_path_is_file modules/foobar/Makefile + ) +' + test_done -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* [PATCH v3 6/8] dir: avoid unnecessary traversal into ignored directory 2021-05-08 19:58 ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget ` (4 preceding siblings ...) 2021-05-08 19:59 ` [PATCH v3 5/8] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget @ 2021-05-08 19:59 ` Elijah Newren via GitGitGadget 2021-05-10 5:48 ` Junio C Hamano 2021-05-08 19:59 ` [PATCH v3 7/8] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget ` (2 subsequent siblings) 8 siblings, 1 reply; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-08 19:59 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> The show_other_directories case in treat_directory() tried to handle both excludes and untracked files with the same logic, and mishandled both the excludes and the untracked files in the process, in different ways. Split that logic apart, and then focus on the logic for the excludes; a subsequent commit will address the logic for untracked files. For show_other_directories, an excluded directory means that every path underneath that directory will also be excluded. Given that the calling code requested to just show directories when everything under a directory had the same state (that's what the "DIR_SHOW_OTHER_DIRECTORIES" flag means), we generally do not need to traverse into such directories and can just immediately mark them as ignored (i.e. as path_excluded). The only reason we cannot just immediately return path_excluded is the DIR_HIDE_EMPTY_DIRECTORIES flag and the possibility that the ignored directory is an empty directory. The code previously treated DIR_SHOW_IGNORED_TOO in most cases as an exception as well, which was wrong. 
It can sometimes reduce the number of cases where we need to recurse (namely if DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set), but should not be able to increase the number of cases where we need to recurse. Fix the logic accordingly.

Some sidenotes about possible confusion with dir.c:

* "ignored" often refers to an "untracked ignore", i.e. a file which is not tracked and which matches one of the ignore/exclusion rules. But you can also have a "tracked ignore", a tracked file that happens to match one of the ignore/exclusion rules and which dir.c has to worry about since "git ls-files -c -i" is supposed to list them.

* The dir code often uses "ignored" and "excluded" interchangeably, which you need to keep in mind while reading the code. Sadly, though, it can get very confusing since ignore rules can have exclusions, as in the last of the following .gitignore rules:

    .gitignore
    *~
    *.log
    !settings.log

  In the last entry above, (pathspec->items[3].magic & PATHSPEC_EXCLUDE) will be true due to the '!' negating the rule. Someone might refer to this as "excluded". That means the file 'settings.log' will not match, and thus not be ignored. So we won't return path_excluded for it. So it's an exclude rule that prevents the file from being an exclude. The non-excluded rules are the ones that result in files being excludes. Great fun, eh?

Sometimes it feels like dir.c needs its own glossary with its many definitions, including the multiply-defined terms.
Reported-by: Jason Gore <Jason.Gore@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 44 +++++++++++++++++++++++++++++--------------- t/t7300-clean.sh | 2 +- 2 files changed, 30 insertions(+), 16 deletions(-) diff --git a/dir.c b/dir.c index dfb174227b36..3f2cfef2c2bb 100644 --- a/dir.c +++ b/dir.c @@ -1844,6 +1844,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, } /* This is the "show_other_directories" case */ + assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES); /* * If we have a pathspec which could match something _below_ this @@ -1854,27 +1855,40 @@ static enum path_treatment treat_directory(struct dir_struct *dir, if (matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC) return path_recurse; + /* Special cases for where this directory is excluded/ignored */ + if (excluded) { + /* + * In the show_other_directories case, if we're not + * hiding empty directories, there is no need to + * recurse into an ignored directory. + */ + if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) + return path_excluded; + + /* + * Even if we are hiding empty directories, we can still avoid + * recursing into ignored directories for DIR_SHOW_IGNORED_TOO + * if DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set. + */ + if ((dir->flags & DIR_SHOW_IGNORED_TOO) && + (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) + return path_excluded; + } + /* - * Other than the path_recurse case immediately above, we only need - * to recurse into untracked/ignored directories if either of the - * following bits is set: + * Other than the path_recurse case above, we only need to + * recurse into untracked directories if either of the following + * bits is set: * - DIR_SHOW_IGNORED_TOO (because then we need to determine if * there are ignored entries below) * - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if * the directory is empty) */ - if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES))) - return excluded ? 
path_excluded : path_untracked; - - /* - * ...and even if DIR_SHOW_IGNORED_TOO is set, we can still avoid - * recursing into ignored directories if the path is excluded and - * DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set. - */ - if (excluded && - (dir->flags & DIR_SHOW_IGNORED_TOO) && - (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) - return path_excluded; + if (!excluded && + !(dir->flags & (DIR_SHOW_IGNORED_TOO | + DIR_HIDE_EMPTY_DIRECTORIES))) { + return path_untracked; + } /* * Even if we don't want to know all the paths under an untracked or diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 74d395838708..a1d695ee9fe9 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -746,7 +746,7 @@ test_expect_success 'clean untracked paths by pathspec' ' test_must_be_empty actual ' -test_expect_failure 'avoid traversing into ignored directories' ' +test_expect_success 'avoid traversing into ignored directories' ' test_when_finished rm -f output error trace.* && test_create_repo avoid-traversing-deep-hierarchy && ( -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
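Editorial sketch (not part of the patch): the change above to the excluded-directory branch of treat_directory() can be modeled as a pure function of the flag bits. The bit values, the `enum decision` type, and the function names below are hypothetical and chosen only for illustration; git's real flag definitions live in dir.h, and KEEP_GOING merely stands for "fall through to the remaining code in treat_directory(), which may recurse".

```c
#include <assert.h>

/* Hypothetical bit values for illustration; see dir.h for the real ones. */
enum {
	DIR_SHOW_IGNORED_TOO               = 1 << 0,
	DIR_SHOW_OTHER_DIRECTORIES         = 1 << 1,
	DIR_HIDE_EMPTY_DIRECTORIES         = 1 << 2,
	DIR_SHOW_IGNORED_TOO_MODE_MATCHING = 1 << 3
};

/* Hypothetical result type: either return early or fall through. */
enum decision { RETURN_PATH_EXCLUDED, KEEP_GOING };

/* Model of the old checks, restricted to the case where excluded != 0. */
enum decision excluded_dir_before(unsigned flags)
{
	assert(flags & DIR_SHOW_OTHER_DIRECTORIES);

	/* Old code: DIR_SHOW_IGNORED_TOO alone was enough to keep going. */
	if (!(flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES)))
		return RETURN_PATH_EXCLUDED;
	if ((flags & DIR_SHOW_IGNORED_TOO) &&
	    (flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
		return RETURN_PATH_EXCLUDED;
	return KEEP_GOING;
}

/* Model of the new checks, restricted to the case where excluded != 0. */
enum decision excluded_dir_after(unsigned flags)
{
	assert(flags & DIR_SHOW_OTHER_DIRECTORIES);

	/* Only the need to detect empty directories prevents an early return. */
	if (!(flags & DIR_HIDE_EMPTY_DIRECTORIES))
		return RETURN_PATH_EXCLUDED;
	/* Even then, MODE_MATCHING lets us skip the recursion. */
	if ((flags & DIR_SHOW_IGNORED_TOO) &&
	    (flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING))
		return RETURN_PATH_EXCLUDED;
	return KEEP_GOING;
}
```

Under this model the two versions differ only when DIR_SHOW_IGNORED_TOO is set without DIR_HIDE_EMPTY_DIRECTORIES: for example, excluded_dir_before(DIR_SHOW_OTHER_DIRECTORIES | DIR_SHOW_IGNORED_TOO) keeps going, while excluded_dir_after() with the same flags returns early, which matches the commit message's claim that DIR_SHOW_IGNORED_TOO should never increase the number of cases where we recurse into an ignored directory.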
* Re: [PATCH v3 6/8] dir: avoid unnecessary traversal into ignored directory 2021-05-08 19:59 ` [PATCH v3 6/8] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget @ 2021-05-10 5:48 ` Junio C Hamano 2021-05-11 17:57 ` Elijah Newren 0 siblings, 1 reply; 90+ messages in thread From: Junio C Hamano @ 2021-05-10 5:48 UTC (permalink / raw) To: Elijah Newren via GitGitGadget Cc: git, Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > Some sidenotes about possible confusion with dir.c: Thanks for working on untangling this mess ;-) > * "ignored" often refers to an untracked ignore", i.e. a file which is > not tracked which matches one of the ignore/exclusion rules. But you > can also have a "tracked ignore", a tracked file that happens to match > one of the ignore/exclusion rules and which dir.c has to worry about > since "git ls-files -c -i" is supposed to list them. OK. This is to find a pattern in .gitignore that is too broad (i.e. if the path were to be added as a new thing today, it would require "add -f"), right? The combination of "-i -c" does make sense for that purpose. > * The dir code often uses "ignored" and "excluded" interchangeably, > which you need to keep in mind while reading the code. True. In tree .gitignore files are to hold exclude patterns, and per repository personal exclude file is called $GIT_DIR/info/exclude which is confusing. > Sadly, though, > it can get very confusing since ignore rules can have exclusions, as > in the last of the following .gitignore rules: > .gitignore > *~ > *.log > !settings.log > In the last entry above, (pathspec->items[3].magic & PATHSPEC_EXCLUDE) > will be true due the the '!' negating the rule. Someone might refer > to this as "excluded". That one I've never heard of. As far as I am concerned, that is a negative exclude pattern. 
I do wish we had started the project with .gitignore files and $GIT_DIR/info/ignore, both of which hold ignore patterns and negative ignore patterns, from day one, but that boat sailed a long time ago. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH v3 6/8] dir: avoid unnecessary traversal into ignored directory 2021-05-10 5:48 ` Junio C Hamano @ 2021-05-11 17:57 ` Elijah Newren 0 siblings, 0 replies; 90+ messages in thread From: Elijah Newren @ 2021-05-11 17:57 UTC (permalink / raw) To: Junio C Hamano Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon On Sun, May 9, 2021 at 10:48 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > Some sidenotes about possible confusion with dir.c: > > Thanks for working on untangling this mess ;-) > > > * "ignored" often refers to an untracked ignore", i.e. a file which is > > not tracked which matches one of the ignore/exclusion rules. But you > > can also have a "tracked ignore", a tracked file that happens to match > > one of the ignore/exclusion rules and which dir.c has to worry about > > since "git ls-files -c -i" is supposed to list them. > > OK. This is to find a pattern in .gitignore that is too broad > (i.e. if the path were to be added as a new thing today, it would > require "add -f"), right? The combination of "-i -c" does make > sense for that purpose. > > > * The dir code often uses "ignored" and "excluded" interchangeably, > > which you need to keep in mind while reading the code. > > True. In tree .gitignore files are to hold exclude patterns, and > per repository personal exclude file is called $GIT_DIR/info/exclude > which is confusing. > > > Sadly, though, > > it can get very confusing since ignore rules can have exclusions, as > > in the last of the following .gitignore rules: > > .gitignore > > *~ > > *.log > > !settings.log > > In the last entry above, (pathspec->items[3].magic & PATHSPEC_EXCLUDE) > > will be true due the the '!' negating the rule. Someone might refer > > to this as "excluded". > > That one I've never heard of. 
As far as I am concerned, that is a > negative exclude pattern. Oops, I was mixing up negative exclude patterns and negative (or excluded) pathspecs. So "exclude" can refer to "ignored" files, or be used in "PATHSPEC_EXCLUDE" for excluded pathspecs. ...and there's another way it's used. "exclude" can also be used to refer to "exclude" patterns, meaning the patterns that .gitignore (and related files) use. However, .git/info/sparse-checkout re-used these same rulesets, but then used them to determine path *inclusion*. At my request, Stolee mostly fixed that up in 65edd96aec ("treewide: rename 'exclude' methods to 'pattern'", 2019-09-03) but you can still occasionally find a code comment referring to an "exclude" pattern that might actually be used by the sparse-checkout stuff as an inclusion rule. And then we have a myriad of other variables and comments with "excl" in their name that might be derived from any of the above three...and it's sometimes difficult for me to remember which one of the concepts such a derived variable or comment might be referring to. *sigh* > I do wish we started the project with .gitignore files and > $GIT_DIR/info/ignore both of which holds ignore patterns and > negative ignore patterns from day one, but the boat sailed > long time ago. ^ permalink raw reply [flat|nested] 90+ messages in thread
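Editorial sketch (not from the thread): the "negative exclude pattern" meaning discussed above can be illustrated with a deliberately simplified model in which patterns are checked in order and the last matching one decides, with a leading '!' turning a match into "not ignored". This is an assumption-laden toy, not git's implementation: it uses POSIX fnmatch(3) on a bare file name rather than git's own pattern matcher, ignores directories and anchoring entirely, and the names `is_ignored`, `sample_gitignore` and `sample_gitignore_len` are hypothetical. Pathspec negation such as ':^settings.log' is a separate mechanism and is not modeled here.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* The three example rules from the commit message under discussion. */
const char *const sample_gitignore[] = { "*~", "*.log", "!settings.log" };
const size_t sample_gitignore_len = 3;

/*
 * Return 1 if `name` ends up ignored, 0 otherwise.
 * Simplified rule: every pattern is tried in order, and the last
 * pattern that matches determines the outcome.
 */
int is_ignored(const char *const *patterns, size_t n, const char *name)
{
	size_t i;
	int ignored = 0;

	assert(name != NULL);
	for (i = 0; i < n; i++) {
		const char *pat = patterns[i];
		int negative = (pat[0] == '!');

		if (negative)
			pat++; /* skip the '!' before matching */
		if (fnmatch(pat, name, 0) == 0)
			ignored = !negative;
	}
	return ignored;
}
```

With the sample rules, is_ignored(sample_gitignore, sample_gitignore_len, "debug.log") is 1 while the same call with "settings.log" is 0: the '!' rule is an "exclude pattern" in the file-format sense, yet its effect is to keep the file from being ignored.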
* [PATCH v3 7/8] dir: traverse into untracked directories if they may have ignored subfiles 2021-05-08 19:58 ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget ` (5 preceding siblings ...) 2021-05-08 19:59 ` [PATCH v3 6/8] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget @ 2021-05-08 19:59 ` Elijah Newren via GitGitGadget 2021-05-08 19:59 ` [PATCH v3 8/8] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget 2021-05-11 18:34 ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget 8 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-08 19:59 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> A directory that is untracked does not imply that all files under it should be categorized as untracked; in particular, if the caller is interested in ignored files, many files or directories underneath the untracked directory may be ignored. We previously partially handled this right with DIR_SHOW_IGNORED_TOO, but missed DIR_SHOW_IGNORED. It was not obvious, though, because the logic for untracked and excluded files had been fused together making it harder to reason about. The previous commit split that logic out, making it easier to notice that DIR_SHOW_IGNORED was missing. Add it. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 10 ++++++---- t/t3001-ls-files-others-exclude.sh | 2 +- t/t7300-clean.sh | 2 +- 3 files changed, 8 insertions(+), 6 deletions(-) diff --git a/dir.c b/dir.c index 3f2cfef2c2bb..f5d9732d9e68 100644 --- a/dir.c +++ b/dir.c @@ -1877,15 +1877,17 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* * Other than the path_recurse case above, we only need to - * recurse into untracked directories if either of the following + * recurse into untracked directories if any of the following * bits is set: - * - DIR_SHOW_IGNORED_TOO (because then we need to determine if - * there are ignored entries below) + * - DIR_SHOW_IGNORED (because then we need to determine if + * there are ignored entries below) + * - DIR_SHOW_IGNORED_TOO (same as above) * - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if * the directory is empty) */ if (!excluded && - !(dir->flags & (DIR_SHOW_IGNORED_TOO | + !(dir->flags & (DIR_SHOW_IGNORED | + DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES))) { return path_untracked; } diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh index ac05d1a17931..516c95ea0e82 100755 --- a/t/t3001-ls-files-others-exclude.sh +++ b/t/t3001-ls-files-others-exclude.sh @@ -292,7 +292,7 @@ EOF test_cmp expect actual ' -test_expect_failure 'ls-files with "**" patterns and --directory' ' +test_expect_success 'ls-files with "**" patterns and --directory' ' # Expectation same as previous test git ls-files --directory -o -i --exclude "**/a.1" >actual && test_cmp expect actual diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index a1d695ee9fe9..751764c0f1ae 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -768,7 +768,7 @@ test_expect_success 'avoid traversing into ignored directories' ' test_cmp trace.expect trace.relevant ' -test_expect_failure 'traverse into directories that may have ignored entries' ' +test_expect_success 'traverse into directories that 
may have ignored entries' ' test_when_finished rm -f output && test_create_repo need-to-traverse-into-hierarchy && ( -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
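Editorial sketch (not part of the patch): the one-flag change above to the untracked-directory shortcut can be modeled as a predicate that answers "may treat_directory() return path_untracked immediately, without looking inside the directory?". The bit values and function names are hypothetical; only the shape of the condition is taken from the diff.

```c
#include <assert.h>

/* Hypothetical bit values for illustration; see dir.h for the real ones. */
enum {
	DIR_SHOW_IGNORED            = 1 << 0,
	DIR_SHOW_IGNORED_TOO        = 1 << 1,
	DIR_SHOW_OTHER_DIRECTORIES  = 1 << 2,
	DIR_HIDE_EMPTY_DIRECTORIES  = 1 << 3
};

/* Old condition: DIR_SHOW_IGNORED was not considered. */
int untracked_shortcut_before(int excluded, unsigned flags)
{
	assert(flags & DIR_SHOW_OTHER_DIRECTORIES);
	return !excluded &&
	       !(flags & (DIR_SHOW_IGNORED_TOO |
			  DIR_HIDE_EMPTY_DIRECTORIES));
}

/* New condition: any interest in ignored entries forces a closer look. */
int untracked_shortcut_after(int excluded, unsigned flags)
{
	assert(flags & DIR_SHOW_OTHER_DIRECTORIES);
	return !excluded &&
	       !(flags & (DIR_SHOW_IGNORED |
			  DIR_SHOW_IGNORED_TOO |
			  DIR_HIDE_EMPTY_DIRECTORIES));
}
```

For a caller that sets DIR_SHOW_IGNORED but not DIR_SHOW_IGNORED_TOO, the old predicate returns 1 (the untracked directory is reported as a whole and any ignored files below it are missed), whereas the new predicate returns 0, so the traversal continues into the directory.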
* [PATCH v3 8/8] dir: update stale description of treat_directory() 2021-05-08 19:58 ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget ` (6 preceding siblings ...) 2021-05-08 19:59 ` [PATCH v3 7/8] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget @ 2021-05-08 19:59 ` Derrick Stolee via GitGitGadget 2021-05-11 18:34 ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget 8 siblings, 0 replies; 90+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-05-08 19:59 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Elijah Newren, Derrick Stolee From: Derrick Stolee <stolee@gmail.com> The documentation comment for treat_directory() was originally written in 095952 (Teach directory traversal about subprojects, 2007-04-11) which was before the 'struct dir_struct' split its bitfield of named options into a 'flags' enum in 7c4c97c0 (Turn the flags in struct dir_struct into a single variable, 2009-02-16). When those flags changed, the comment became stale, since members like 'show_other_directories' transitioned into flags like DIR_SHOW_OTHER_DIRECTORIES. Update the comments for treat_directory() to use these flag names rather than the old member names. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> --- dir.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/dir.c b/dir.c index f5d9732d9e68..896a9a62b2c7 100644 --- a/dir.c +++ b/dir.c @@ -1749,13 +1749,13 @@ static enum exist_status directory_exists_in_index(struct index_state *istate, * Case 3: if we didn't have it in the index previously, we * have a few sub-cases: * - * (a) if "show_other_directories" is true, we show it as - * just a directory, unless "hide_empty_directories" is + * (a) if DIR_SHOW_OTHER_DIRECTORIES flag is set, we show it as + * just a directory, unless DIR_HIDE_EMPTY_DIRECTORIES is * also true, in which case we need to check if it contains any * untracked and / or ignored files. - * (b) if it looks like a git directory, and we don't have - * 'no_gitlinks' set we treat it as a gitlink, and show it - * as a directory. + * (b) if it looks like a git directory and we don't have the + * DIR_NO_GITLINKS flag, then we treat it as a gitlink, and + * show it as a directory. * (c) otherwise, we recurse into it. */ static enum path_treatment treat_directory(struct dir_struct *dir, @@ -1843,7 +1843,6 @@ static enum path_treatment treat_directory(struct dir_struct *dir, return path_recurse; } - /* This is the "show_other_directories" case */ assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES); /* @@ -1858,7 +1857,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* Special cases for where this directory is excluded/ignored */ if (excluded) { /* - * In the show_other_directories case, if we're not + * If DIR_SHOW_OTHER_DIRECTORIES is set and we're not * hiding empty directories, there is no need to * recurse into an ignored directory. */ -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* [PATCH v4 0/8] Directory traversal fixes 2021-05-08 19:58 ` [PATCH v3 0/8] Directory traversal fixes Elijah Newren via GitGitGadget ` (7 preceding siblings ...) 2021-05-08 19:59 ` [PATCH v3 8/8] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget @ 2021-05-11 18:34 ` Elijah Newren via GitGitGadget 2021-05-11 18:34 ` [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget ` (8 more replies) 8 siblings, 9 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-11 18:34 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler, Elijah Newren

This patchset fixes a few directory traversal issues, where fill_directory() would traverse into directories that it shouldn't and not traverse into directories that it should (one of which was originally reported on this list at [1]). It also includes a few cleanups.

Changes since v3, including numerous cleanups suggested by Junio and Jeff (thanks for the reviews!):

 * Removed the RFC labels, but if folks want a warning instead of a die on ls-files -i (see patch 3), let me know
 * Include the path passed to read_directory() in the printed trace2 statistics
 * Print trace2 statistics before calling trace2_region_leave()
 * Make sure to convert both trace_performance_leave() calls
 * Testcase style fixes
 * Left a comment that directories-visited:1 referred to the toplevel directory
 * Fixed up some commit message comments about "exclude" and mentioned yet another way it can be confusing

As noted in previous versions of this series, if folks would prefer ls-files -i to continue running but print a warning rather than making it an error as I did in this series, let me know.
Also, if anyone has any ideas about a better place to put the "Some sidenotes" from the sixth commit message rather than keeping them in a random commit message, that might be helpful too. [1] See https://lore.kernel.org/git/DM6PR00MB06829EC5B85E0C5AC595004E894E9@DM6PR00MB0682.namprd00.prod.outlook.com/ or alternatively https://github.com/git-for-windows/git/issues/2732. Derrick Stolee (1): dir: update stale description of treat_directory() Elijah Newren (7): dir: convert trace calls to trace2 equivalents dir: report number of visited directories and paths with trace2 ls-files: error out on -i unless -o or -c are specified t7300: add testcase showing unnecessary traversal into ignored directory t3001, t7300: add testcase showcasing missed directory traversal dir: avoid unnecessary traversal into ignored directory dir: traverse into untracked directories if they may have ignored subfiles builtin/ls-files.c | 3 + dir.c | 112 +++++++++++----- dir.h | 4 + t/t1306-xdg-files.sh | 2 +- t/t3001-ls-files-others-exclude.sh | 5 + t/t3003-ls-files-exclude.sh | 4 +- t/t7063-status-untracked-cache.sh | 206 +++++++++++++++++------------ t/t7300-clean.sh | 42 ++++++ t/t7519-status-fsmonitor.sh | 8 +- 9 files changed, 259 insertions(+), 127 deletions(-) base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1020%2Fnewren%2Fdirectory-traversal-fixes-v4 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1020/newren/directory-traversal-fixes-v4 Pull-Request: https://github.com/git/git/pull/1020 Range-diff vs v3: 1: 9f1c0d78d739 ! 
1: 9204e36b7e90 [RFC] dir: convert trace calls to trace2 equivalents @@ Metadata Author: Elijah Newren <newren@gmail.com> ## Commit message ## - [RFC] dir: convert trace calls to trace2 equivalents + dir: convert trace calls to trace2 equivalents Signed-off-by: Elijah Newren <newren@gmail.com> @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st } +static void trace2_read_directory_statistics(struct dir_struct *dir, -+ struct repository *repo) ++ struct repository *repo, ++ const char *path) +{ + if (!dir->untracked) + return; ++ trace2_data_string("read_directory", repo, "path", path); + trace2_data_intmax("read_directory", repo, + "node-creation", dir->untracked->dir_created); + trace2_data_intmax("read_directory", repo, @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st + trace2_region_enter("dir", "read_directory", istate->repo); if (has_symlink_leading_path(path, len)) { - trace_performance_leave("read directory %.*s", len, path); +- trace_performance_leave("read directory %.*s", len, path); ++ trace2_region_leave("dir", "read_directory", istate->repo); + return dir->nr; + } + @@ dir.c: int read_directory(struct dir_struct *dir, struct index_state *istate, QSORT(dir->entries, dir->nr, cmp_dir_entry); QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry); - trace_performance_leave("read directory %.*s", len, path); ++ if (trace2_is_enabled()) { ++ struct strbuf tmp = STRBUF_INIT; ++ strbuf_add(&tmp, path, len); ++ trace2_read_directory_statistics(dir, istate->repo, tmp.buf); ++ strbuf_release(&tmp); ++ } ++ + trace2_region_leave("dir", "read_directory", istate->repo); if (dir->untracked) { static int force_untracked_cache = -1; @@ dir.c: int read_directory(struct dir_struct *dir, struct index_state *istate, } } + -+ if (trace2_is_enabled()) -+ trace2_read_directory_statistics(dir, istate->repo); return dir->nr; } @@ t/t7063-status-untracked-cache.sh: iuc () { return $ret } -+get_relevant_traces() { 
++get_relevant_traces () { + # From the GIT_TRACE2_PERF data of the form + # $TIME $FILE:$LINE | d0 | main | data | r1 | ? | ? | read_directo | $RELEVANT_STAT + # extract the $RELEVANT_STAT fields. We don't care about region_enter + # or region_leave, or stats for things outside read_directory. + INPUT_FILE=$1 + OUTPUT_FILE=$2 -+ grep data.*read_directo $INPUT_FILE \ -+ | cut -d "|" -f 9 \ -+ >$OUTPUT_FILE ++ grep data.*read_directo $INPUT_FILE | ++ cut -d "|" -f 9 \ ++ >"$OUTPUT_FILE" +} + + @@ t/t7063-status-untracked-cache.sh: EOF -gitignore invalidation: 1 -directory invalidation: 0 -opendir: 4 -+ ..node-creation:3 -+ ..gitignore-invalidation:1 -+ ..directory-invalidation:0 -+ ..opendir:4 ++ ....path: ++ ....node-creation:3 ++ ....gitignore-invalidation:1 ++ ....directory-invalidation:0 ++ ....opendir:4 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant @@ t/t7063-status-untracked-cache.sh: test_expect_success 'untracked cache after fi -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 0 -+ ..node-creation:0 -+ ..gitignore-invalidation:0 -+ ..directory-invalidation:0 -+ ..opendir:0 ++ ....path: ++ ....node-creation:0 ++ ....gitignore-invalidation:0 ++ ....directory-invalidation:0 ++ ....opendir:0 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant @@ t/t7063-status-untracked-cache.sh: A two -gitignore invalidation: 0 -directory invalidation: 1 -opendir: 1 -+ ..node-creation:0 -+ ..gitignore-invalidation:0 -+ ..directory-invalidation:1 -+ ..opendir:1 ++ ....path: ++ ....node-creation:0 ++ ....gitignore-invalidation:0 ++ ....directory-invalidation:1 ++ ....opendir:1 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant @@ t/t7063-status-untracked-cache.sh: A two -gitignore invalidation: 1 -directory invalidation: 1 -opendir: 4 -+ ..node-creation:0 -+ ..gitignore-invalidation:1 -+ ..directory-invalidation:1 -+ ..opendir:4 ++ ....path: ++ 
....node-creation:0 ++ ....gitignore-invalidation:1 ++ ....directory-invalidation:1 ++ ....opendir:4 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant @@ t/t7063-status-untracked-cache.sh: A two -gitignore invalidation: 1 -directory invalidation: 0 -opendir: 4 -+ ..node-creation:0 -+ ..gitignore-invalidation:1 -+ ..directory-invalidation:0 -+ ..opendir:4 ++ ....path: ++ ....node-creation:0 ++ ....gitignore-invalidation:1 ++ ....directory-invalidation:0 ++ ....opendir:4 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant @@ t/t7063-status-untracked-cache.sh: A one -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 1 -+ ..node-creation:0 -+ ..gitignore-invalidation:0 -+ ..directory-invalidation:0 -+ ..opendir:1 ++ ....path: ++ ....node-creation:0 ++ ....gitignore-invalidation:0 ++ ....directory-invalidation:0 ++ ....opendir:1 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant @@ t/t7063-status-untracked-cache.sh: A two -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 1 -+ ..node-creation:0 -+ ..gitignore-invalidation:0 -+ ..directory-invalidation:0 -+ ..opendir:1 ++ ....path: ++ ....node-creation:0 ++ ....gitignore-invalidation:0 ++ ....directory-invalidation:0 ++ ....opendir:1 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant @@ t/t7063-status-untracked-cache.sh: test_expect_success 'status after commit' ' -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 2 -+ ..node-creation:0 -+ ..gitignore-invalidation:0 -+ ..directory-invalidation:0 -+ ..opendir:2 ++ ....path: ++ ....node-creation:0 ++ ....gitignore-invalidation:0 ++ ....directory-invalidation:0 ++ ....opendir:2 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant @@ t/t7063-status-untracked-cache.sh: test_expect_success 'test sparse status with -gitignore invalidation: 1 -directory invalidation: 2 
-opendir: 2 -+ ..node-creation:0 -+ ..gitignore-invalidation:1 -+ ..directory-invalidation:2 -+ ..opendir:2 ++ ....path: ++ ....node-creation:0 ++ ....gitignore-invalidation:1 ++ ....directory-invalidation:2 ++ ....opendir:2 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant @@ t/t7063-status-untracked-cache.sh: test_expect_success 'test sparse status again -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 0 -+ ..node-creation:0 -+ ..gitignore-invalidation:0 -+ ..directory-invalidation:0 -+ ..opendir:0 ++ ....path: ++ ....node-creation:0 ++ ....gitignore-invalidation:0 ++ ....directory-invalidation:0 ++ ....opendir:0 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant @@ t/t7063-status-untracked-cache.sh: test_expect_success 'test sparse status with -gitignore invalidation: 0 -directory invalidation: 1 -opendir: 3 -+ ..node-creation:2 -+ ..gitignore-invalidation:0 -+ ..directory-invalidation:1 -+ ..opendir:3 ++ ....path: ++ ....node-creation:2 ++ ....gitignore-invalidation:0 ++ ....directory-invalidation:1 ++ ....opendir:3 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant @@ t/t7063-status-untracked-cache.sh: EOF -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 0 -+ ..node-creation:0 -+ ..gitignore-invalidation:0 -+ ..directory-invalidation:0 -+ ..opendir:0 ++ ....path: ++ ....node-creation:0 ++ ....gitignore-invalidation:0 ++ ....directory-invalidation:0 ++ ....opendir:0 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant 2: 8b511f228af8 ! 
2: 6939253be825 [RFC] dir: report number of visited directories and paths with trace2 @@ Metadata Author: Elijah Newren <newren@gmail.com> ## Commit message ## - [RFC] dir: report number of visited directories and paths with trace2 + dir: report number of visited directories and paths with trace2 Provide more statistics in trace2 output that include the number of directories and total paths visited by the directory traversal logic. @@ dir.c: static enum path_treatment read_directory_recursive(struct dir_struct *di if (state > dir_state) dir_state = state; -@@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d - static void trace2_read_directory_statistics(struct dir_struct *dir, - struct repository *repo) +@@ dir.c: static void trace2_read_directory_statistics(struct dir_struct *dir, + struct repository *repo, + const char *path) { + trace2_data_intmax("read_directory", repo, + "directories-visited", dir->visited_directories); @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st + "paths-visited", dir->visited_paths); if (!dir->untracked) return; - trace2_data_intmax("read_directory", repo, + trace2_data_string("read_directory", repo, "path", path); @@ dir.c: int read_directory(struct dir_struct *dir, struct index_state *istate, struct untracked_cache_dir *untracked; @@ dir.c: int read_directory(struct dir_struct *dir, struct index_state *istate, + dir->visited_directories = 0; if (has_symlink_leading_path(path, len)) { - trace_performance_leave("read directory %.*s", len, path); + trace2_region_leave("dir", "read_directory", istate->repo); ## dir.h ## @@ dir.h: struct dir_struct { @@ dir.h: struct dir_struct { /*Count the number of slashes for string s*/ ## t/t7063-status-untracked-cache.sh ## -@@ t/t7063-status-untracked-cache.sh: get_relevant_traces() { +@@ t/t7063-status-untracked-cache.sh: get_relevant_traces () { INPUT_FILE=$1 OUTPUT_FILE=$2 - grep data.*read_directo $INPUT_FILE \ -+ | grep 
-v visited \ - | cut -d "|" -f 9 \ - >$OUTPUT_FILE + grep data.*read_directo $INPUT_FILE | +- cut -d "|" -f 9 \ ++ cut -d "|" -f 9 | ++ grep -v visited \ + >"$OUTPUT_FILE" } + 3: 44a1322c4402 ! 3: 8d0ca8104be6 [RFC] ls-files: error out on -i unless -o or -c are specified @@ Metadata Author: Elijah Newren <newren@gmail.com> ## Commit message ## - [RFC] ls-files: error out on -i unless -o or -c are specified + ls-files: error out on -i unless -o or -c are specified ls-files --ignored can be used together with either --others or --cached. After being perplexed for a bit and digging in to the code, I assumed that ls-files -i was just broken and not printing anything and - had a nice patch ready to submit when I finally realized that -i can be + I had a nice patch ready to submit when I finally realized that -i can be used with --cached to find tracked ignores. While that was a mistake on my part, and a careful reading of the @@ builtin/ls-files.c: int cmd_ls_files(int argc, const char **argv, const char *cm ps_matched = xcalloc(pathspec.nr, 1); + if ((dir.flags & DIR_SHOW_IGNORED) && !show_others && !show_cached) -+ die("ls-files --ignored is usually used with --others, but --cached is the default. Please specify which you want."); ++ die("ls-files -i must be used with either -o or -c"); + if ((dir.flags & DIR_SHOW_IGNORED) && !exc_given) die("ls-files --ignored needs some exclude pattern"); 4: dc3d3f247141 ! 
4: 317abab3571e t7300: add testcase showing unnecessary traversal into ignored directory @@ t/t7300-clean.sh: test_expect_success 'clean untracked paths by pathspec' ' + git clean -ffdxn -e untracked + ) && + -+ grep data.*read_directo.*visited trace.output \ -+ | cut -d "|" -f 9 >trace.relevant && ++ # Make sure we only visited into the top-level directory, and did ++ # not traverse into the "untracked" subdirectory since it was excluded ++ grep data.*read_directo.*directories-visited trace.output | ++ cut -d "|" -f 9 >trace.relevant && + cat >trace.expect <<-EOF && -+ directories-visited:1 -+ paths-visited:4 ++ ..directories-visited:1 + EOF + test_cmp trace.expect trace.relevant +' 5: 73b03a1e8e05 = 5: 5eb019327b57 t3001, t7300: add testcase showcasing missed directory traversal 6: 66ffc7f02d08 ! 6: 89cc01ef8598 dir: avoid unnecessary traversal into ignored directory @@ Commit message since "git ls-files -c -i" is supposed to list them. * The dir code often uses "ignored" and "excluded" interchangeably, - which you need to keep in mind while reading the code. Sadly, though, - it can get very confusing since ignore rules can have exclusions, as - in the last of the following .gitignore rules: - .gitignore + which you need to keep in mind while reading the code. + + * "exclude" is used multiple ways in the code: + + * As noted above, "exclude" is often a synonym for "ignored". + + * The logic for parsing .gitignore files was re-used in + .git/info/sparse-checkout, except there it is used to mark paths that + the user wants to *keep*. This was mostly addressed by commit + 65edd96aec ("treewide: rename 'exclude' methods to 'pattern'", + 2019-09-03), but every once in a while you'll find a comment about + "exclude" referring to these patterns that might in fact be in use + by the sparse-checkout machinery for inclusion rules. 
+ + * The word "EXCLUDE" is also used for pathspec negation, as in + (pathspec->items[3].magic & PATHSPEC_EXCLUDE) + Thus if a user had a .gitignore file containing *~ *.log !settings.log - In the last entry above, (pathspec->items[3].magic & PATHSPEC_EXCLUDE) - will be true due the the '!' negating the rule. Someone might refer - to this as "excluded". That means the file 'settings.log' will not - match, and thus not be ignored. So we won't return path_excluded for - it. So it's an exclude rule that prevents the file from being an - exclude. The non-excluded rules are the ones that result in files - being excludes. Great fun, eh? + And then ran + git add -- 'settings.*' ':^settings.log' + Then :^settings.log is a pathspec negation making settings.log not + be requested to be added even though all other settings.* files are + being added. Also, !settings.log in the gitignore file is a negative + exclude pattern meaning that settings.log is normally a file we + want to track even though all other *.log files are ignored. Sometimes it feels like dir.c needs its own glossary with its many definitions, including the multiply-defined terms. 7: acde436b220e = 7: 4a561e1229e4 dir: traverse into untracked directories if they may have ignored subfiles 8: 57135c357774 = 8: 2945e749f5e3 dir: update stale description of treat_directory() -- gitgitgadget ^ permalink raw reply [flat|nested] 90+ messages in thread
* [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents 2021-05-11 18:34 ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget @ 2021-05-11 18:34 ` Elijah Newren via GitGitGadget 2021-05-11 19:06 ` Jeff Hostetler 2021-05-11 18:34 ` [PATCH v4 2/8] dir: report number of visited directories and paths with trace2 Elijah Newren via GitGitGadget ` (7 subsequent siblings) 8 siblings, 1 reply; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-11 18:34 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 43 +++++-- t/t7063-status-untracked-cache.sh | 205 ++++++++++++++++++------------ t/t7519-status-fsmonitor.sh | 8 +- 3 files changed, 155 insertions(+), 101 deletions(-) diff --git a/dir.c b/dir.c index 3474e67e8f3c..122fcbffdf89 100644 --- a/dir.c +++ b/dir.c @@ -2760,15 +2760,34 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d return root; } +static void trace2_read_directory_statistics(struct dir_struct *dir, + struct repository *repo, + const char *path) +{ + if (!dir->untracked) + return; + trace2_data_string("read_directory", repo, "path", path); + trace2_data_intmax("read_directory", repo, + "node-creation", dir->untracked->dir_created); + trace2_data_intmax("read_directory", repo, + "gitignore-invalidation", + dir->untracked->gitignore_invalidated); + trace2_data_intmax("read_directory", repo, + "directory-invalidation", + dir->untracked->dir_invalidated); + trace2_data_intmax("read_directory", repo, + "opendir", dir->untracked->dir_opened); +} + int read_directory(struct dir_struct *dir, struct index_state *istate, const char *path, int len, const struct pathspec *pathspec) { struct untracked_cache_dir *untracked; - trace_performance_enter(); + 
trace2_region_enter("dir", "read_directory", istate->repo); if (has_symlink_leading_path(path, len)) { - trace_performance_leave("read directory %.*s", len, path); + trace2_region_leave("dir", "read_directory", istate->repo); return dir->nr; } @@ -2784,23 +2803,20 @@ int read_directory(struct dir_struct *dir, struct index_state *istate, QSORT(dir->entries, dir->nr, cmp_dir_entry); QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry); - trace_performance_leave("read directory %.*s", len, path); + if (trace2_is_enabled()) { + struct strbuf tmp = STRBUF_INIT; + strbuf_add(&tmp, path, len); + trace2_read_directory_statistics(dir, istate->repo, tmp.buf); + strbuf_release(&tmp); + } + + trace2_region_leave("dir", "read_directory", istate->repo); if (dir->untracked) { static int force_untracked_cache = -1; - static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS); if (force_untracked_cache < 0) force_untracked_cache = git_env_bool("GIT_FORCE_UNTRACKED_CACHE", 0); - trace_printf_key(&trace_untracked_stats, - "node creation: %u\n" - "gitignore invalidation: %u\n" - "directory invalidation: %u\n" - "opendir: %u\n", - dir->untracked->dir_created, - dir->untracked->gitignore_invalidated, - dir->untracked->dir_invalidated, - dir->untracked->dir_opened); if (force_untracked_cache && dir->untracked == istate->untracked && (dir->untracked->dir_opened || @@ -2811,6 +2827,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate, FREE_AND_NULL(dir->untracked); } } + return dir->nr; } diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh index accefde72fb1..9710d33b3cd6 100755 --- a/t/t7063-status-untracked-cache.sh +++ b/t/t7063-status-untracked-cache.sh @@ -57,6 +57,19 @@ iuc () { return $ret } +get_relevant_traces () { + # From the GIT_TRACE2_PERF data of the form + # $TIME $FILE:$LINE | d0 | main | data | r1 | ? | ? | read_directo | $RELEVANT_STAT + # extract the $RELEVANT_STAT fields. 
We don't care about region_enter + # or region_leave, or stats for things outside read_directory. + INPUT_FILE=$1 + OUTPUT_FILE=$2 + grep data.*read_directo $INPUT_FILE | + cut -d "|" -f 9 \ + >"$OUTPUT_FILE" +} + + test_lazy_prereq UNTRACKED_CACHE ' { git update-index --test-untracked-cache; ret=$?; } && test $ret -ne 1 @@ -129,19 +142,21 @@ EOF test_expect_success 'status first time (empty cache)' ' avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 3 -gitignore invalidation: 1 -directory invalidation: 0 -opendir: 4 + ....path: + ....node-creation:3 + ....gitignore-invalidation:1 + ....directory-invalidation:0 + ....opendir:4 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'untracked cache after first status' ' @@ -151,19 +166,21 @@ test_expect_success 'untracked cache after first status' ' test_expect_success 'status second time (fully populated cache)' ' avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 0 + ....path: + ....node-creation:0 + ....gitignore-invalidation:0 + ....directory-invalidation:0 + ....opendir:0 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' 
test_expect_success 'untracked cache after second status' ' @@ -174,8 +191,8 @@ test_expect_success 'untracked cache after second status' ' test_expect_success 'modify in root directory, one dir invalidation' ' avoid_racy && : >four && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -189,13 +206,15 @@ A two EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 1 -opendir: 1 + ....path: + ....node-creation:0 + ....gitignore-invalidation:0 + ....directory-invalidation:1 + ....opendir:1 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' @@ -223,8 +242,8 @@ EOF test_expect_success 'new .gitignore invalidates recursively' ' avoid_racy && echo four >.gitignore && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -238,13 +257,15 @@ A two EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 1 -directory invalidation: 1 -opendir: 4 + ....path: + ....node-creation:0 + ....gitignore-invalidation:1 + ....directory-invalidation:1 + ....opendir:4 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' @@ -272,8 +293,8 @@ EOF test_expect_success 'new info/exclude invalidates everything' ' avoid_racy && echo three >>.git/info/exclude && - : >../trace 
&& - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -285,13 +306,15 @@ A two EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 1 -directory invalidation: 0 -opendir: 4 + ....path: + ....node-creation:0 + ....gitignore-invalidation:1 + ....directory-invalidation:0 + ....opendir:4 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'verify untracked cache dump' ' @@ -330,8 +353,8 @@ EOF ' test_expect_success 'status after the move' ' - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -343,13 +366,15 @@ A one EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 1 + ....path: + ....node-creation:0 + ....gitignore-invalidation:0 + ....directory-invalidation:0 + ....opendir:1 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'verify untracked cache dump' ' @@ -389,8 +414,8 @@ EOF ' test_expect_success 'status after the move' ' - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -402,13 +427,15 @@ 
A two EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 1 + ....path: + ....node-creation:0 + ....gitignore-invalidation:0 + ....directory-invalidation:0 + ....opendir:1 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'verify untracked cache dump' ' @@ -438,8 +465,8 @@ test_expect_success 'set up for sparse checkout testing' ' ' test_expect_success 'status after commit' ' - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -448,13 +475,15 @@ test_expect_success 'status after commit' ' EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 2 + ....path: + ....node-creation:0 + ....gitignore-invalidation:0 + ....directory-invalidation:0 + ....opendir:2 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'untracked cache correct after commit' ' @@ -496,9 +525,9 @@ test_expect_success 'create/modify files, some of which are gitignored' ' ' test_expect_success 'test sparse status with untracked cache' ' - : >../trace && + : >../trace.output && avoid_racy && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../status.actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -509,13 +538,15 @@ test_expect_success 'test sparse status with untracked 
cache' ' EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 1 -directory invalidation: 2 -opendir: 2 + ....path: + ....node-creation:0 + ....gitignore-invalidation:1 + ....directory-invalidation:2 + ....opendir:2 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'untracked cache correct after status' ' @@ -539,8 +570,8 @@ EOF test_expect_success 'test sparse status again with untracked cache' ' avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../status.actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -551,13 +582,15 @@ test_expect_success 'test sparse status again with untracked cache' ' EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 0 + ....path: + ....node-creation:0 + ....gitignore-invalidation:0 + ....directory-invalidation:0 + ....opendir:0 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'set up for test of subdir and sparse checkouts' ' @@ -568,8 +601,8 @@ test_expect_success 'set up for test of subdir and sparse checkouts' ' test_expect_success 'test sparse status with untracked cache and subdir' ' avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../status.actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -581,13 +614,15 
@@ test_expect_success 'test sparse status with untracked cache and subdir' ' EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 2 -gitignore invalidation: 0 -directory invalidation: 1 -opendir: 3 + ....path: + ....node-creation:2 + ....gitignore-invalidation:0 + ....directory-invalidation:1 + ....opendir:3 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'verify untracked cache dump (sparse/subdirs)' ' @@ -616,19 +651,21 @@ EOF test_expect_success 'test sparse status again with untracked cache and subdir' ' avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../status.actual && iuc status --porcelain >../status.iuc && test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 0 + ....path: + ....node-creation:0 + ....gitignore-invalidation:0 + ....directory-invalidation:0 + ....opendir:0 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'move entry in subdir from untracked to cached' ' diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh index 45d025f96010..637391c6ce46 100755 --- a/t/t7519-status-fsmonitor.sh +++ b/t/t7519-status-fsmonitor.sh @@ -334,7 +334,7 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR' git config core.fsmonitor .git/hooks/fsmonitor-test && git update-index --untracked-cache && git update-index --fsmonitor && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-before" \ + 
GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-before" \ git status && test-tool dump-untracked-cache >../before ) && @@ -346,12 +346,12 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR' EOF ( cd dot-git && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-after" \ + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-after" \ git status && test-tool dump-untracked-cache >../after ) && - grep "directory invalidation" trace-before >>before && - grep "directory invalidation" trace-after >>after && + grep "directory-invalidation" trace-before | cut -d"|" -f 9 >>before && + grep "directory-invalidation" trace-after | cut -d"|" -f 9 >>after && # UNTR extension unchanged, dir invalidation count unchanged test_cmp before after ' -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
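The get_relevant_traces helper in the patch above keeps only the ninth "|"-separated column of each matching GIT_TRACE2_PERF line. For readers more at home in C than in grep/cut pipelines, here is a minimal sketch of that field extraction. nth_pipe_field is a hypothetical name, not a git function, and it only roughly mirrors `cut -d "|" -f n` (it reports an error for a missing field where cut would print an empty one).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): copy the n-th (1-based)
 * '|'-delimited field of `line` into `out`.
 * Returns 0 on success, -1 if the field is missing or does not fit.
 */
static int nth_pipe_field(const char *line, int n, char *out, size_t outsz)
{
	const char *start = line;
	const char *end;
	size_t len;

	assert(n >= 1 && outsz > 0);

	/* Skip past the first n-1 delimiters. */
	for (int i = 1; i < n; i++) {
		start = strchr(start, '|');
		if (!start)
			return -1;	/* fewer than n fields */
		start++;		/* step over the '|' itself */
	}

	/* The field runs up to the next '|' or to the end of the line. */
	end = strchr(start, '|');
	len = end ? (size_t)(end - start) : strlen(start);
	if (len >= outsz)
		return -1;

	memcpy(out, start, len);
	out[len] = '\0';
	return 0;
}
```

Called with n = 9 on a line shaped like the one described in the helper's comment, this returns whatever follows the eighth "|", including any leading space.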
* Re: [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents
  2021-05-11 18:34 ` [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget
@ 2021-05-11 19:06 ` Jeff Hostetler
  2021-05-11 20:12 ` Elijah Newren
  0 siblings, 1 reply; 90+ messages in thread
From: Jeff Hostetler @ 2021-05-11 19:06 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon

On 5/11/21 2:34 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  dir.c                             |  43 +++++--
>  t/t7063-status-untracked-cache.sh | 205 ++++++++++++++++++------------
>  t/t7519-status-fsmonitor.sh       |   8 +-
>  3 files changed, 155 insertions(+), 101 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index 3474e67e8f3c..122fcbffdf89 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -2760,15 +2760,34 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
>       return root;
>   }
>
> +static void trace2_read_directory_statistics(struct dir_struct *dir,
> +                                             struct repository *repo,
> +                                             const char *path)
> +{
> +     if (!dir->untracked)
> +             return;
> +     trace2_data_string("read_directory", repo, "path", path);

I'm probably just nit-picking here, but should this look more like:

        if (path && *path)
                trace2_data_string(...)
        if (!dir->untracked)
                return;

Then when you add the visited fields in the next commit,
you'll have the path with them (when present).

(and it would let you optionally avoid the tmp strbuf in
the caller.)

Jeff

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents 2021-05-11 19:06 ` Jeff Hostetler @ 2021-05-11 20:12 ` Elijah Newren 2021-05-11 23:12 ` Jeff Hostetler 0 siblings, 1 reply; 90+ messages in thread From: Elijah Newren @ 2021-05-11 20:12 UTC (permalink / raw) To: Jeff Hostetler Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon On Tue, May 11, 2021 at 12:06 PM Jeff Hostetler <git@jeffhostetler.com> wrote: > > On 5/11/21 2:34 PM, Elijah Newren via GitGitGadget wrote: > > From: Elijah Newren <newren@gmail.com> > > > > Signed-off-by: Elijah Newren <newren@gmail.com> > > --- > > dir.c | 43 +++++-- > > t/t7063-status-untracked-cache.sh | 205 ++++++++++++++++++------------ > > t/t7519-status-fsmonitor.sh | 8 +- > > 3 files changed, 155 insertions(+), 101 deletions(-) > > > > diff --git a/dir.c b/dir.c > > index 3474e67e8f3c..122fcbffdf89 100644 > > --- a/dir.c > > +++ b/dir.c > > @@ -2760,15 +2760,34 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d > > return root; > > } > > > > +static void trace2_read_directory_statistics(struct dir_struct *dir, > > + struct repository *repo, > > + const char *path) > > +{ > > + if (!dir->untracked) > > + return; > > + trace2_data_string("read_directory", repo, "path", path); > > I'm probably just nit-picking here, but should this look more like: nit-picking and questions are totally fine. :-) Thanks for reviewing. > > if (path && *path) > trace2_data_string(...) path is always non-NULL (it'd be an error to call read_directory() with a NULL path). So the first part of the check isn't meaningful for this particular code. The second half is interesting. Do we want to omit the path when it happens to be the toplevel directory (the case where !*path)? The original trace_performance_leave() calls certainly didn't, and I was just trying to provide the same info they do, as you suggested. 
I guess people could determine the path by knowing that the code
doesn't print it when it's empty, but do we want trace2 users to need
to read the code to figure out statistics and info?

>         if (!dir->untracked)
>                 return;
>
> Then when you add the visited fields in the next commit,
> you'll have the path with them (when present).

There is always a path with them, it's just that the empty string
denotes the toplevel directory.

> (and it would let you optionally avoid the tmp strbuf in
> the caller.)

The path in read_directory() is not necessarily NUL-delimited, so
attempting to use it as-is, or even with your checks, would cause us
to possibly print garbage and do out-of-bounds reads.  We need the tmp
strbuf.

^ permalink raw reply [flat|nested] 90+ messages in thread
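For context on why the old trace_performance_leave() call needed no temporary copy: it passed the length along with the pointer through a "%.*s" conversion, which bounds how many bytes are read. The sketch below shows that mechanism in plain C. format_read_directory_msg is a hypothetical name used only for illustration, and snprintf stands in for git's trace formatting.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical illustration (not git code): format at most `len` bytes
 * of `path` into `out`. With an explicit "%.*s" precision, printf-style
 * functions read no more than `len` bytes, so `path` does not have to
 * be NUL-terminated as long as it holds at least `len` bytes.
 * Returns what snprintf returns (the length of the full message).
 */
static int format_read_directory_msg(char *out, size_t outsz,
				     const char *path, int len)
{
	assert(len >= 0);
	return snprintf(out, outsz, "read directory %.*s", len, path);
}
```

A function that accepts only a plain `const char *` value has no such length argument, which is why the patch builds a NUL-terminated copy first.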
* Re: [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents 2021-05-11 20:12 ` Elijah Newren @ 2021-05-11 23:12 ` Jeff Hostetler 2021-05-12 0:44 ` Elijah Newren 0 siblings, 1 reply; 90+ messages in thread From: Jeff Hostetler @ 2021-05-11 23:12 UTC (permalink / raw) To: Elijah Newren Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon On 5/11/21 4:12 PM, Elijah Newren wrote: > On Tue, May 11, 2021 at 12:06 PM Jeff Hostetler <git@jeffhostetler.com> wrote: >> >> On 5/11/21 2:34 PM, Elijah Newren via GitGitGadget wrote: >>> From: Elijah Newren <newren@gmail.com> >>> >>> Signed-off-by: Elijah Newren <newren@gmail.com> >>> --- >>> dir.c | 43 +++++-- >>> t/t7063-status-untracked-cache.sh | 205 ++++++++++++++++++------------ >>> t/t7519-status-fsmonitor.sh | 8 +- >>> 3 files changed, 155 insertions(+), 101 deletions(-) >>> >>> diff --git a/dir.c b/dir.c >>> index 3474e67e8f3c..122fcbffdf89 100644 >>> --- a/dir.c >>> +++ b/dir.c >>> @@ -2760,15 +2760,34 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d >>> return root; >>> } >>> >>> +static void trace2_read_directory_statistics(struct dir_struct *dir, >>> + struct repository *repo, >>> + const char *path) >>> +{ >>> + if (!dir->untracked) >>> + return; >>> + trace2_data_string("read_directory", repo, "path", path); >> >> I'm probably just nit-picking here, but should this look more like: > > nit-picking and questions are totally fine. :-) Thanks for reviewing. > >> >> if (path && *path) >> trace2_data_string(...) > > path is always non-NULL (it'd be an error to call read_directory() > with a NULL path). So the first part of the check isn't meaningful > for this particular code. The second half is interesting. Do we want > to omit the path when it happens to be the toplevel directory (the > case where !*path)? 
> The original trace_performance_leave() calls
> certainly didn't, and I was just trying to provide the same info they
> do, as you suggested.  I guess people could determine the path by
> knowing that the code doesn't print it when it's empty, but do we want
> trace2 users to need to read the code to figure out statistics and
> info?

that's fine.  it might be easier to just always print it (even if
blank) so that post-processors know that rather than have to assume
it.

>
>>         if (!dir->untracked)
>>                 return;
>>
>> Then when you add the visited fields in the next commit,
>> you'll have the path with them (when present).
>
> There is always a path with them, it's just that the empty string
> denotes the toplevel directory.
>
>> (and it would let you optionally avoid the tmp strbuf in
>> the caller.)
>
> The path in read_directory() is not necessarily NUL-delimited, so
> attempting to use it as-is, or even with your checks, would cause us
> to possibly print garbage and do out-of-bounds reads.  We need the tmp
> strbuf.
>

I just meant, "if (!len) pass NULL, else build and pass tmp.buf".
but i'm nit-picking again.

Jeff

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents 2021-05-11 23:12 ` Jeff Hostetler @ 2021-05-12 0:44 ` Elijah Newren 2021-05-12 12:26 ` Jeff Hostetler 0 siblings, 1 reply; 90+ messages in thread From: Elijah Newren @ 2021-05-12 0:44 UTC (permalink / raw) To: Jeff Hostetler Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon On Tue, May 11, 2021 at 4:12 PM Jeff Hostetler <git@jeffhostetler.com> wrote: > > On 5/11/21 4:12 PM, Elijah Newren wrote: > > On Tue, May 11, 2021 at 12:06 PM Jeff Hostetler <git@jeffhostetler.com> wrote: > >> > >> On 5/11/21 2:34 PM, Elijah Newren via GitGitGadget wrote: > >>> From: Elijah Newren <newren@gmail.com> > >>> > >>> Signed-off-by: Elijah Newren <newren@gmail.com> > >>> --- > >>> dir.c | 43 +++++-- > >>> t/t7063-status-untracked-cache.sh | 205 ++++++++++++++++++------------ > >>> t/t7519-status-fsmonitor.sh | 8 +- > >>> 3 files changed, 155 insertions(+), 101 deletions(-) > >>> > >>> diff --git a/dir.c b/dir.c > >>> index 3474e67e8f3c..122fcbffdf89 100644 > >>> --- a/dir.c > >>> +++ b/dir.c > >>> @@ -2760,15 +2760,34 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d > >>> return root; > >>> } > >>> > >>> +static void trace2_read_directory_statistics(struct dir_struct *dir, > >>> + struct repository *repo, > >>> + const char *path) > >>> +{ > >>> + if (!dir->untracked) > >>> + return; > >>> + trace2_data_string("read_directory", repo, "path", path); > >> > >> I'm probably just nit-picking here, but should this look more like: > > > > nit-picking and questions are totally fine. :-) Thanks for reviewing. > > > >> > >> if (path && *path) > >> trace2_data_string(...) > > > > path is always non-NULL (it'd be an error to call read_directory() > > with a NULL path). So the first part of the check isn't meaningful > > for this particular code. The second half is interesting. 
> > Do we want
> > to omit the path when it happens to be the toplevel directory (the
> > case where !*path)?  The original trace_performance_leave() calls
> > certainly didn't, and I was just trying to provide the same info they
> > do, as you suggested.  I guess people could determine the path by
> > knowing that the code doesn't print it when it's empty, but do we want
> > trace2 users to need to read the code to figure out statistics and
> > info?
>
> that's fine.  it might be easier to just always print it (even if
> blank) so that post-processors know that rather than have to assume
> it.
>
> >
> >>         if (!dir->untracked)
> >>                 return;
> >>
> >> Then when you add the visited fields in the next commit,
> >> you'll have the path with them (when present).
> >
> > There is always a path with them, it's just that the empty string
> > denotes the toplevel directory.
> >
> >> (and it would let you optionally avoid the tmp strbuf in
> >> the caller.)
> >
> > The path in read_directory() is not necessarily NUL-delimited, so
> > attempting to use it as-is, or even with your checks, would cause us
> > to possibly print garbage and do out-of-bounds reads.  We need the tmp
> > strbuf.
> >
>
> I just meant, "if (!len) pass NULL, else build and pass tmp.buf".

Ah, gotcha, that's why you were checking non-NULL.

However, what about the other case when len is nonzero?  Let's say
that len = 8 and path points at
"filename*%&#)aWholeBunchOfTotalGarbageAfterTheRealFilenameThatShouldNotBeReadOrIncluded\0\0\0\0\0\0\0\0\0\0"
?

How do you make it print "filename" and only "filename" without the
other stuff without using the tmp strbuf?

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents 2021-05-12 0:44 ` Elijah Newren @ 2021-05-12 12:26 ` Jeff Hostetler 2021-05-12 15:24 ` Elijah Newren 0 siblings, 1 reply; 90+ messages in thread From: Jeff Hostetler @ 2021-05-12 12:26 UTC (permalink / raw) To: Elijah Newren Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon On 5/11/21 8:44 PM, Elijah Newren wrote: > On Tue, May 11, 2021 at 4:12 PM Jeff Hostetler <git@jeffhostetler.com> wrote: >> >> On 5/11/21 4:12 PM, Elijah Newren wrote: >>> On Tue, May 11, 2021 at 12:06 PM Jeff Hostetler <git@jeffhostetler.com> wrote: >>>> >>>> On 5/11/21 2:34 PM, Elijah Newren via GitGitGadget wrote: >>>>> From: Elijah Newren <newren@gmail.com> >>>>> >>>>> Signed-off-by: Elijah Newren <newren@gmail.com> >>>>> --- >>>>> dir.c | 43 +++++-- >>>>> t/t7063-status-untracked-cache.sh | 205 ++++++++++++++++++------------ >>>>> t/t7519-status-fsmonitor.sh | 8 +- >>>>> 3 files changed, 155 insertions(+), 101 deletions(-) >>>>> >>>>> diff --git a/dir.c b/dir.c >>>>> index 3474e67e8f3c..122fcbffdf89 100644 >>>>> --- a/dir.c >>>>> +++ b/dir.c >>>>> @@ -2760,15 +2760,34 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d >>>>> return root; >>>>> } >>>>> >>>>> +static void trace2_read_directory_statistics(struct dir_struct *dir, >>>>> + struct repository *repo, >>>>> + const char *path) >>>>> +{ >>>>> + if (!dir->untracked) >>>>> + return; >>>>> + trace2_data_string("read_directory", repo, "path", path); >>>> >>>> I'm probably just nit-picking here, but should this look more like: >>> >>> nit-picking and questions are totally fine. :-) Thanks for reviewing. >>> >>>> >>>> if (path && *path) >>>> trace2_data_string(...) >>> >>> path is always non-NULL (it'd be an error to call read_directory() >>> with a NULL path). So the first part of the check isn't meaningful >>> for this particular code. 
The second half is interesting. Do we want
>>> to omit the path when it happens to be the toplevel directory (the
>>> case where !*path)? The original trace_performance_leave() calls
>>> certainly didn't, and I was just trying to provide the same info they
>>> do, as you suggested. I guess people could determine the path by
>>> knowing that the code doesn't print it when it's empty, but do we want
>>> trace2 users to need to read the code to figure out statistics and
>>> info?
>>
>> that's fine. it might be easier to just always print it (even if
>> blank) so that post-processors know that rather than have to assume
>> it.
>>
>>>
>>>>     if (!dir->untracked)
>>>>             return;
>>>>
>>>> Then when you add the visited fields in the next commit,
>>>> you'll have the path with them (when present).
>>>
>>> There is always a path with them, it's just that the empty string
>>> denotes the toplevel directory.
>>>
>>>> (and it would let you optionally avoid the tmp strbuf in
>>>> the caller.)
>>>
>>> The path in read_directory() is not necessarily NUL-delimited, so
>>> attempting to use it as-is, or even with your checks, would cause us
>>> to possibly print garbage and do out-of-bounds reads. We need the tmp
>>> strbuf.
>>>
>>
>> I just meant, "if (!len) pass NULL, else build and pass tmp.buf".
>
> Ah, gotcha, that's why you were checking non-NULL.
>
> However, what about the other case when len is nonzero. Let's say
> that len = 8 and path points at
> "filename*%&#)aWholeBunchOfTotalGarbageAfterTheRealFilenameThatShouldNotBeReadOrIncluded\0\0\0\0\0\0\0\0\0\0"
> ?
>
> How do you make it print "filename" and only "filename" without the
> other stuff without using the tmp strbuf?
>

I was still saying to use the "strbuf tmp" in the non-zero len case,
but just pass NULL (or "") for the len==0 case.

Alternatively, since `trace2_read_directory_statistics()` is a static
local function, we could move all of the path manipulation into it.
    static void emit_stats(
            struct dir_struct *dir,
            struct repository *repo,
            const char* path_buf,
            size_t path_len)
    {
            if (!path_len)
                    trace2_data_string("read_directory", repo,
                            "path", "");
            else {
                    struct strbuf tmp = STRBUF_INIT;
                    strbuf_add(&tmp, path_buf, path_len);
                    trace2_data_string("read_directory", repo,
                            "path", tmp.buf);
                    strbuf_release(&tmp);
            }
            ... the rest of intmax stats ...
    }

BTW, could we also rename your stats function? I've been trying
to keep the "trace2_" prefix reserved for the Trace2 API.

Thanks,
Jeff

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents 2021-05-12 12:26 ` Jeff Hostetler @ 2021-05-12 15:24 ` Elijah Newren 0 siblings, 0 replies; 90+ messages in thread From: Elijah Newren @ 2021-05-12 15:24 UTC (permalink / raw) To: Jeff Hostetler Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon On Wed, May 12, 2021 at 5:26 AM Jeff Hostetler <git@jeffhostetler.com> wrote: > > On 5/11/21 8:44 PM, Elijah Newren wrote: > > On Tue, May 11, 2021 at 4:12 PM Jeff Hostetler <git@jeffhostetler.com> wrote: > >> > >> On 5/11/21 4:12 PM, Elijah Newren wrote: > >>> On Tue, May 11, 2021 at 12:06 PM Jeff Hostetler <git@jeffhostetler.com> wrote: > >>>> > >>>> On 5/11/21 2:34 PM, Elijah Newren via GitGitGadget wrote: > >>>>> From: Elijah Newren <newren@gmail.com> > >>>>> > >>>>> Signed-off-by: Elijah Newren <newren@gmail.com> > >>>>> --- > >>>>> dir.c | 43 +++++-- > >>>>> t/t7063-status-untracked-cache.sh | 205 ++++++++++++++++++------------ > >>>>> t/t7519-status-fsmonitor.sh | 8 +- > >>>>> 3 files changed, 155 insertions(+), 101 deletions(-) > >>>>> > >>>>> diff --git a/dir.c b/dir.c > >>>>> index 3474e67e8f3c..122fcbffdf89 100644 > >>>>> --- a/dir.c > >>>>> +++ b/dir.c > >>>>> @@ -2760,15 +2760,34 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d > >>>>> return root; > >>>>> } > >>>>> > >>>>> +static void trace2_read_directory_statistics(struct dir_struct *dir, > >>>>> + struct repository *repo, > >>>>> + const char *path) > >>>>> +{ > >>>>> + if (!dir->untracked) > >>>>> + return; > >>>>> + trace2_data_string("read_directory", repo, "path", path); > >>>> > >>>> I'm probably just nit-picking here, but should this look more like: > >>> > >>> nit-picking and questions are totally fine. :-) Thanks for reviewing. > >>> > >>>> > >>>> if (path && *path) > >>>> trace2_data_string(...) 
> >>>
> >>> path is always non-NULL (it'd be an error to call read_directory()
> >>> with a NULL path). So the first part of the check isn't meaningful
> >>> for this particular code. The second half is interesting. Do we want
> >>> to omit the path when it happens to be the toplevel directory (the
> >>> case where !*path)? The original trace_performance_leave() calls
> >>> certainly didn't, and I was just trying to provide the same info they
> >>> do, as you suggested. I guess people could determine the path by
> >>> knowing that the code doesn't print it when it's empty, but do we want
> >>> trace2 users to need to read the code to figure out statistics and
> >>> info?
> >>
> >> that's fine. it might be easier to just always print it (even if
> >> blank) so that post-processors know that rather than have to assume
> >> it.
> >>
> >>>
> >>>>     if (!dir->untracked)
> >>>>             return;
> >>>>
> >>>> Then when you add the visited fields in the next commit,
> >>>> you'll have the path with them (when present).
> >>>
> >>> There is always a path with them, it's just that the empty string
> >>> denotes the toplevel directory.
> >>>
> >>>> (and it would let you optionally avoid the tmp strbuf in
> >>>> the caller.)
> >>>
> >>> The path in read_directory() is not necessarily NUL-delimited, so
> >>> attempting to use it as-is, or even with your checks, would cause us
> >>> to possibly print garbage and do out-of-bounds reads. We need the tmp
> >>> strbuf.
> >>>
> >>
> >> I just meant, "if (!len) pass NULL, else build and pass tmp.buf".
> >
> > Ah, gotcha, that's why you were checking non-NULL.
> >
> > However, what about the other case when len is nonzero. Let's say
> > that len = 8 and path points at
> > "filename*%&#)aWholeBunchOfTotalGarbageAfterTheRealFilenameThatShouldNotBeReadOrIncluded\0\0\0\0\0\0\0\0\0\0"
> > ?
> >
> > How do you make it print "filename" and only "filename" without the
> > other stuff without using the tmp strbuf?
> >
>
> I was still saying to use the "strbuf tmp" in the non-zero len case,
> but just pass NULL (or "") for the len==0 case.

Ah, now I see what you were saying. Sorry for not getting it earlier.

> Alternatively, since `trace2_read_directory_statistics()` is a static
> local function, we could move all of the path manipulation into it.
>
>     static void emit_stats(
>             struct dir_struct *dir,
>             struct repository *repo,
>             const char* path_buf,
>             size_t path_len)
>     {
>             if (!path_len)
>                     trace2_data_string("read_directory", repo,
>                             "path", "");
>             else {
>                     struct strbuf tmp = STRBUF_INIT;
>                     strbuf_add(&tmp, path_buf, path_len);
>                     trace2_data_string("read_directory", repo,
>                             "path", tmp.buf);
>                     strbuf_release(&tmp);
>             }
>             ... the rest of intmax stats ...
>     }

Makes sense.

> BTW, could we also rename your stats function? I've been trying
> to keep the "trace2_" prefix reserved for the Trace2 API.

Sure, will do.

^ permalink raw reply [flat|nested] 90+ messages in thread
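The point settled in this exchange can be shown in a small standalone sketch: a path buffer that is only valid for `len` bytes has to be copied into a NUL-terminated temporary before it is passed to anything that expects a C string. The helper name `copy_path_prefix()` is hypothetical and uses plain `malloc()`; the code under review uses git's strbuf API (`strbuf_add()` / `strbuf_release()`) for the same purpose, which is not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): return a freshly allocated,
 * NUL-terminated copy of the first `len` bytes of `buf`.  Bytes past
 * buf[len - 1] are never read, so trailing garbage in the caller's
 * buffer cannot end up in the result.  len == 0 yields the empty
 * string, which in the thread denotes the toplevel directory.
 * The caller owns the result and must free() it.
 */
static char *copy_path_prefix(const char *buf, size_t len)
{
	char *out;

	assert(buf != NULL || len == 0);
	out = malloc(len + 1);
	if (!out)
		return NULL;
	if (len)
		memcpy(out, buf, len);
	out[len] = '\0';
	return out;
}
```

With the example from the thread (len = 8, buffer starting with "filename" followed by unrelated bytes), the copy holds only "filename", and a zero length gives "" without touching the buffer at all.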
* [PATCH v4 2/8] dir: report number of visited directories and paths with trace2 2021-05-11 18:34 ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget 2021-05-11 18:34 ` [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget @ 2021-05-11 18:34 ` Elijah Newren via GitGitGadget 2021-05-11 18:34 ` [PATCH v4 3/8] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget ` (6 subsequent siblings) 8 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-11 18:34 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Provide more statistics in trace2 output that include the number of directories and total paths visited by the directory traversal logic. Subsequent patches will take advantage of this to ensure we do not unnecessarily traverse into ignored directories. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 8 ++++++++ dir.h | 4 ++++ t/t7063-status-untracked-cache.sh | 3 ++- 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/dir.c b/dir.c index 122fcbffdf89..69b8c9d7f9fb 100644 --- a/dir.c +++ b/dir.c @@ -2440,6 +2440,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, if (open_cached_dir(&cdir, dir, untracked, istate, &path, check_only)) goto out; + dir->visited_directories++; if (untracked) untracked->check_only = !!check_only; @@ -2448,6 +2449,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, /* check how the file or directory should be treated */ state = treat_path(dir, untracked, &cdir, istate, &path, baselen, pathspec); + dir->visited_paths++; if (state > dir_state) dir_state = state; @@ -2764,6 +2766,10 @@ static void trace2_read_directory_statistics(struct dir_struct *dir, struct repository *repo, const char *path) { + trace2_data_intmax("read_directory", repo, + "directories-visited", dir->visited_directories); + trace2_data_intmax("read_directory", repo, + "paths-visited", dir->visited_paths); if (!dir->untracked) return; trace2_data_string("read_directory", repo, "path", path); @@ -2785,6 +2791,8 @@ int read_directory(struct dir_struct *dir, struct index_state *istate, struct untracked_cache_dir *untracked; trace2_region_enter("dir", "read_directory", istate->repo); + dir->visited_paths = 0; + dir->visited_directories = 0; if (has_symlink_leading_path(path, len)) { trace2_region_leave("dir", "read_directory", istate->repo); diff --git a/dir.h b/dir.h index 04d886cfce75..22c67907f689 100644 --- a/dir.h +++ b/dir.h @@ -336,6 +336,10 @@ struct dir_struct { struct oid_stat ss_info_exclude; struct oid_stat ss_excludes_file; unsigned unmanaged_exclude_files; + + /* Stats about the traversal */ + unsigned visited_paths; + unsigned visited_directories; }; /*Count the number of slashes for string s*/ diff --git 
a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh index 9710d33b3cd6..a0c123b0a77a 100755 --- a/t/t7063-status-untracked-cache.sh +++ b/t/t7063-status-untracked-cache.sh @@ -65,7 +65,8 @@ get_relevant_traces () { INPUT_FILE=$1 OUTPUT_FILE=$2 grep data.*read_directo $INPUT_FILE | - cut -d "|" -f 9 \ + cut -d "|" -f 9 | + grep -v visited \ >"$OUTPUT_FILE" } -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
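As a rough illustration of the counting scheme in this patch, the sketch below walks a toy in-memory tree and keeps the same two counters: one incremented per directory opened, one per entry examined, both reset at the start of each top-level traversal. The `toy_*` types and functions are invented for the example and are not git's data structures; only the counter names mirror the fields added to `struct dir_struct`.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a directory entry; not git's data structures. */
struct toy_entry {
	const char *name;
	int is_dir;
	const struct toy_entry *children;	/* entries inside a directory */
	size_t nr_children;
};

/* Counters named after the two fields the patch adds. */
struct toy_stats {
	unsigned visited_paths;
	unsigned visited_directories;
};

static void toy_walk(const struct toy_entry *dir, struct toy_stats *stats)
{
	size_t i;

	stats->visited_directories++;		/* one per directory opened */
	for (i = 0; i < dir->nr_children; i++) {
		const struct toy_entry *e = &dir->children[i];

		stats->visited_paths++;		/* one per entry examined */
		if (e->is_dir)
			toy_walk(e, stats);
	}
}

/* Reset the counters, then traverse; stale values never carry over. */
static void toy_read_directory(const struct toy_entry *root,
			       struct toy_stats *stats)
{
	assert(root->is_dir);
	stats->visited_paths = 0;
	stats->visited_directories = 0;
	toy_walk(root, stats);
}
```

A test that wants to prove a directory was not entered only needs to compare `visited_directories` against the expected count, which is how later patches in the series use the trace2 output.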
* [PATCH v4 3/8] ls-files: error out on -i unless -o or -c are specified 2021-05-11 18:34 ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget 2021-05-11 18:34 ` [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget 2021-05-11 18:34 ` [PATCH v4 2/8] dir: report number of visited directories and paths with trace2 Elijah Newren via GitGitGadget @ 2021-05-11 18:34 ` Elijah Newren via GitGitGadget 2021-05-11 18:34 ` [PATCH v4 4/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget ` (5 subsequent siblings) 8 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-11 18:34 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> ls-files --ignored can be used together with either --others or --cached. After being perplexed for a bit and digging in to the code, I assumed that ls-files -i was just broken and not printing anything and I had a nice patch ready to submit when I finally realized that -i can be used with --cached to find tracked ignores. While that was a mistake on my part, and a careful reading of the documentation could have made this more clear, I suspect this is an error others are likely to make as well. In fact, of two uses in our testsuite, I believe one of the two did make this error. In t1306.13, there are NO tracked files, and all the excludes built up and used in that test and in previous tests thus have to be about untracked files. However, since they were looking for an empty result, the mistake went unnoticed as their erroneous command also just happened to give an empty answer. 
-i will most of the time be used with -o, which would suggest we could
just make -i imply -o in the absence of either a -o or -c, but that
would be a backward incompatible break. Instead, let's just flag -i
without either a -o or -c as an error, and update the two relevant
testcases to specify their intent.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/ls-files.c          | 3 +++
 t/t1306-xdg-files.sh        | 2 +-
 t/t3003-ls-files-exclude.sh | 4 ++--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 60a2913a01e9..e8e25006c647 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -748,6 +748,9 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 	if (pathspec.nr && error_unmatch)
 		ps_matched = xcalloc(pathspec.nr, 1);
 
+	if ((dir.flags & DIR_SHOW_IGNORED) && !show_others && !show_cached)
+		die("ls-files -i must be used with either -o or -c");
+
 	if ((dir.flags & DIR_SHOW_IGNORED) && !exc_given)
 		die("ls-files --ignored needs some exclude pattern");
 
diff --git a/t/t1306-xdg-files.sh b/t/t1306-xdg-files.sh
index dd87b43be1a6..40d3c42618c0 100755
--- a/t/t1306-xdg-files.sh
+++ b/t/t1306-xdg-files.sh
@@ -116,7 +116,7 @@ test_expect_success 'Exclusion in a non-XDG global ignore file' '
 test_expect_success 'Checking XDG ignore file when HOME is unset' '
 	(sane_unset HOME &&
 	 git config --unset core.excludesfile &&
-	 git ls-files --exclude-standard --ignored >actual) &&
+	 git ls-files --exclude-standard --ignored --others >actual) &&
 	test_must_be_empty actual
 '
 
diff --git a/t/t3003-ls-files-exclude.sh b/t/t3003-ls-files-exclude.sh
index d5ec333131f9..c41c4f046abf 100755
--- a/t/t3003-ls-files-exclude.sh
+++ b/t/t3003-ls-files-exclude.sh
@@ -29,11 +29,11 @@ test_expect_success 'add file to gitignore' '
 '
 check_all_output
 
-test_expect_success 'ls-files -i lists only tracked-but-ignored files' '
+test_expect_success 'ls-files -i -c lists only tracked-but-ignored files' '
 	echo content >other-file &&
 	git
add other-file &&
 	echo file >expect &&
-	git ls-files -i --exclude-standard >output &&
+	git ls-files -i -c --exclude-standard >output &&
 	test_cmp expect output
 '
 
--
gitgitgadget

^ permalink raw reply related [flat|nested] 90+ messages in thread
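The new check in this patch boils down to one predicate over three booleans. A minimal sketch, assuming the flags have already been reduced to plain ints: `check_ignored_mode()` is a hypothetical name, and it returns the error text instead of calling git's `die()` so that it can be exercised in isolation.

```c
#include <stddef.h>

/*
 * Illustrative only (not git code): return NULL when the option
 * combination is accepted, or the message the patch passes to die()
 * when -i is given without either -o or -c.
 */
static const char *check_ignored_mode(int show_ignored,
				      int show_others,
				      int show_cached)
{
	if (show_ignored && !show_others && !show_cached)
		return "ls-files -i must be used with either -o or -c";
	return NULL;
}
```

Rejecting the bare `-i` form, rather than silently implying `-o`, keeps existing `-i -c` and `-i -o` invocations behaving exactly as before, which is the backward-compatibility argument made in the commit message.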
* [PATCH v4 4/8] t7300: add testcase showing unnecessary traversal into ignored directory 2021-05-11 18:34 ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget ` (2 preceding siblings ...) 2021-05-11 18:34 ` [PATCH v4 3/8] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget @ 2021-05-11 18:34 ` Elijah Newren via GitGitGadget 2021-05-11 18:34 ` [PATCH v4 5/8] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget ` (4 subsequent siblings) 8 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-11 18:34 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> The PNPM package manager is apparently creating deeply nested (but ignored) directory structures; traversing them is costly performance-wise, unnecessary, and in some cases is even throwing warnings/errors because the paths are too long to handle on various platforms. Add a testcase that checks for such unnecessary directory traversal. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- t/t7300-clean.sh | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index a74816ca8b46..07e8ba2d4b85 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -746,4 +746,27 @@ test_expect_success 'clean untracked paths by pathspec' ' test_must_be_empty actual ' +test_expect_failure 'avoid traversing into ignored directories' ' + test_when_finished rm -f output error trace.* && + test_create_repo avoid-traversing-deep-hierarchy && + ( + cd avoid-traversing-deep-hierarchy && + + mkdir -p untracked/subdir/with/a && + >untracked/subdir/with/a/random-file.txt && + + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ + git clean -ffdxn -e untracked + ) && + + # Make sure we only visited into the top-level directory, and did + # not traverse into the "untracked" subdirectory since it was excluded + grep data.*read_directo.*directories-visited trace.output | + cut -d "|" -f 9 >trace.relevant && + cat >trace.expect <<-EOF && + ..directories-visited:1 + EOF + test_cmp trace.expect trace.relevant +' + test_done -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* [PATCH v4 5/8] t3001, t7300: add testcase showcasing missed directory traversal 2021-05-11 18:34 ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget ` (3 preceding siblings ...) 2021-05-11 18:34 ` [PATCH v4 4/8] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget @ 2021-05-11 18:34 ` Elijah Newren via GitGitGadget 2021-05-11 18:34 ` [PATCH v4 6/8] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget ` (3 subsequent siblings) 8 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-11 18:34 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> In the last commit, we added a testcase showing that the directory traversal machinery sometimes traverses into directories unnecessarily. Here we show that there are cases where it does the opposite: it does not traverse into directories, despite those directories having important files that need to be flagged. Add a testcase showing that `git ls-files -o -i --directory` can omit some of the files it should be listing, and another showing that `git clean -fX` can fail to clean out some of the expected files. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- t/t3001-ls-files-others-exclude.sh | 5 +++++ t/t7300-clean.sh | 19 +++++++++++++++++++ 2 files changed, 24 insertions(+) diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh index 1ec7cb57c7a8..ac05d1a17931 100755 --- a/t/t3001-ls-files-others-exclude.sh +++ b/t/t3001-ls-files-others-exclude.sh @@ -292,6 +292,11 @@ EOF test_cmp expect actual ' +test_expect_failure 'ls-files with "**" patterns and --directory' ' + # Expectation same as previous test + git ls-files --directory -o -i --exclude "**/a.1" >actual && + test_cmp expect actual +' test_expect_success 'ls-files with "**" patterns and no slashes' ' git ls-files -o -i --exclude "one**a.1" >actual && diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 07e8ba2d4b85..34c08c325407 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -769,4 +769,23 @@ test_expect_failure 'avoid traversing into ignored directories' ' test_cmp trace.expect trace.relevant ' +test_expect_failure 'traverse into directories that may have ignored entries' ' + test_when_finished rm -f output && + test_create_repo need-to-traverse-into-hierarchy && + ( + cd need-to-traverse-into-hierarchy && + mkdir -p modules/foobar/src/generated && + > modules/foobar/src/generated/code.c && + > modules/foobar/Makefile && + echo "/modules/**/src/generated/" >.gitignore && + + git clean -fX modules/foobar >../output && + + grep Removing ../output && + + test_path_is_missing modules/foobar/src/generated/code.c && + test_path_is_file modules/foobar/Makefile + ) +' + test_done -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* [PATCH v4 6/8] dir: avoid unnecessary traversal into ignored directory 2021-05-11 18:34 ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget ` (4 preceding siblings ...) 2021-05-11 18:34 ` [PATCH v4 5/8] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget @ 2021-05-11 18:34 ` Elijah Newren via GitGitGadget 2021-05-11 18:34 ` [PATCH v4 7/8] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget ` (2 subsequent siblings) 8 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-11 18:34 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> The show_other_directories case in treat_directory() tried to handle both excludes and untracked files with the same logic, and mishandled both the excludes and the untracked files in the process, in different ways. Split that logic apart, and then focus on the logic for the excludes; a subsequent commit will address the logic for untracked files. For show_other_directories, an excluded directory means that every path underneath that directory will also be excluded. Given that the calling code requested to just show directories when everything under a directory had the same state (that's what the "DIR_SHOW_OTHER_DIRECTORIES" flag means), we generally do not need to traverse into such directories and can just immediately mark them as ignored (i.e. as path_excluded). The only reason we cannot just immediately return path_excluded is the DIR_HIDE_EMPTY_DIRECTORIES flag and the possibility that the ignored directory is an empty directory. The code previously treated DIR_SHOW_IGNORED_TOO in most cases as an exception as well, which was wrong. 
It can sometimes reduce the number of cases where we need to recurse
(namely if DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set), but should
not be able to increase the number of cases where we need to recurse.
Fix the logic accordingly.

Some sidenotes about possible confusion with dir.c:

* "ignored" often refers to an "untracked ignore", i.e. a file which
  is not tracked which matches one of the ignore/exclusion rules. But
  you can also have a "tracked ignore", a tracked file that happens to
  match one of the ignore/exclusion rules and which dir.c has to worry
  about since "git ls-files -c -i" is supposed to list them.

* The dir code often uses "ignored" and "excluded" interchangeably,
  which you need to keep in mind while reading the code.

* "exclude" is used multiple ways in the code:

  * As noted above, "exclude" is often a synonym for "ignored".

  * The logic for parsing .gitignore files was re-used in
    .git/info/sparse-checkout, except there it is used to mark paths
    that the user wants to *keep*. This was mostly addressed by commit
    65edd96aec ("treewide: rename 'exclude' methods to 'pattern'",
    2019-09-03), but every once in a while you'll find a comment about
    "exclude" referring to these patterns that might in fact be in use
    by the sparse-checkout machinery for inclusion rules.

  * The word "EXCLUDE" is also used for pathspec negation, as in
      (pathspec->items[3].magic & PATHSPEC_EXCLUDE)
    Thus if a user had a .gitignore file containing
      *~
      *.log
      !settings.log
    And then ran
      git add -- 'settings.*' ':^settings.log'
    Then :^settings.log is a pathspec negation making settings.log not
    be requested to be added even though all other settings.* files
    are being added. Also, !settings.log in the gitignore file is a
    negative exclude pattern meaning that settings.log is normally a
    file we want to track even though all other *.log files are
    ignored.

Sometimes it feels like dir.c needs its own glossary with its many
definitions, including the multiply-defined terms.
Reported-by: Jason Gore <Jason.Gore@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 44 +++++++++++++++++++++++++++++--------------- t/t7300-clean.sh | 2 +- 2 files changed, 30 insertions(+), 16 deletions(-) diff --git a/dir.c b/dir.c index 69b8c9d7f9fb..0126e2f08af7 100644 --- a/dir.c +++ b/dir.c @@ -1844,6 +1844,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, } /* This is the "show_other_directories" case */ + assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES); /* * If we have a pathspec which could match something _below_ this @@ -1854,27 +1855,40 @@ static enum path_treatment treat_directory(struct dir_struct *dir, if (matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC) return path_recurse; + /* Special cases for where this directory is excluded/ignored */ + if (excluded) { + /* + * In the show_other_directories case, if we're not + * hiding empty directories, there is no need to + * recurse into an ignored directory. + */ + if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) + return path_excluded; + + /* + * Even if we are hiding empty directories, we can still avoid + * recursing into ignored directories for DIR_SHOW_IGNORED_TOO + * if DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set. + */ + if ((dir->flags & DIR_SHOW_IGNORED_TOO) && + (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) + return path_excluded; + } + /* - * Other than the path_recurse case immediately above, we only need - * to recurse into untracked/ignored directories if either of the - * following bits is set: + * Other than the path_recurse case above, we only need to + * recurse into untracked directories if either of the following + * bits is set: * - DIR_SHOW_IGNORED_TOO (because then we need to determine if * there are ignored entries below) * - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if * the directory is empty) */ - if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES))) - return excluded ? 
path_excluded : path_untracked; - - /* - * ...and even if DIR_SHOW_IGNORED_TOO is set, we can still avoid - * recursing into ignored directories if the path is excluded and - * DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set. - */ - if (excluded && - (dir->flags & DIR_SHOW_IGNORED_TOO) && - (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) - return path_excluded; + if (!excluded && + !(dir->flags & (DIR_SHOW_IGNORED_TOO | + DIR_HIDE_EMPTY_DIRECTORIES))) { + return path_untracked; + } /* * Even if we don't want to know all the paths under an untracked or diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 34c08c325407..21e48b3ba591 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -746,7 +746,7 @@ test_expect_success 'clean untracked paths by pathspec' ' test_must_be_empty actual ' -test_expect_failure 'avoid traversing into ignored directories' ' +test_expect_success 'avoid traversing into ignored directories' ' test_when_finished rm -f output error trace.* && test_create_repo avoid-traversing-deep-hierarchy && ( -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
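The early returns this patch adds for an excluded directory can be modelled as a small pure function. The sketch below is an approximation under stated assumptions: the `SK_*` flag values and the `sketch_treat_excluded_dir()` name are invented for illustration (git defines the real `DIR_*` flags in dir.h), and `SK_FALL_THROUGH` stands for "continue to the later code in treat_directory()", which is not modelled here.

```c
/* Illustrative flag bits; the values are not git's. */
enum sketch_flags {
	SK_HIDE_EMPTY_DIRECTORIES         = 1 << 0,
	SK_SHOW_IGNORED_TOO               = 1 << 1,
	SK_SHOW_IGNORED_TOO_MODE_MATCHING = 1 << 2
};

enum sketch_treatment {
	SK_PATH_EXCLUDED,	/* report the directory as ignored, do not recurse */
	SK_FALL_THROUGH		/* later code must decide, possibly by recursing */
};

/* Decision for a directory already known to be excluded/ignored. */
static enum sketch_treatment sketch_treat_excluded_dir(unsigned flags)
{
	/* Not hiding empty directories: no reason to look inside. */
	if (!(flags & SK_HIDE_EMPTY_DIRECTORIES))
		return SK_PATH_EXCLUDED;

	/*
	 * Hiding empty directories, but SHOW_IGNORED_TOO together with
	 * MODE_MATCHING still lets us skip the recursion.
	 */
	if ((flags & SK_SHOW_IGNORED_TOO) &&
	    (flags & SK_SHOW_IGNORED_TOO_MODE_MATCHING))
		return SK_PATH_EXCLUDED;

	return SK_FALL_THROUGH;
}
```

Note that in this model SHOW_IGNORED_TOO on its own never causes extra recursion into an excluded directory; it can only remove a case, which is the behavior the commit message argues for.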
* [PATCH v4 7/8] dir: traverse into untracked directories if they may have ignored subfiles 2021-05-11 18:34 ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget ` (5 preceding siblings ...) 2021-05-11 18:34 ` [PATCH v4 6/8] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget @ 2021-05-11 18:34 ` Elijah Newren via GitGitGadget 2021-05-11 18:34 ` [PATCH v4 8/8] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget 2021-05-12 17:28 ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget 8 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-11 18:34 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> A directory that is untracked does not imply that all files under it should be categorized as untracked; in particular, if the caller is interested in ignored files, many files or directories underneath the untracked directory may be ignored. We previously partially handled this right with DIR_SHOW_IGNORED_TOO, but missed DIR_SHOW_IGNORED. It was not obvious, though, because the logic for untracked and excluded files had been fused together making it harder to reason about. The previous commit split that logic out, making it easier to notice that DIR_SHOW_IGNORED was missing. Add it. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 10 ++++++---- t/t3001-ls-files-others-exclude.sh | 2 +- t/t7300-clean.sh | 2 +- 3 files changed, 8 insertions(+), 6 deletions(-) diff --git a/dir.c b/dir.c index 0126e2f08af7..deeff1a58319 100644 --- a/dir.c +++ b/dir.c @@ -1877,15 +1877,17 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* * Other than the path_recurse case above, we only need to - * recurse into untracked directories if either of the following + * recurse into untracked directories if any of the following * bits is set: - * - DIR_SHOW_IGNORED_TOO (because then we need to determine if - * there are ignored entries below) + * - DIR_SHOW_IGNORED (because then we need to determine if + * there are ignored entries below) + * - DIR_SHOW_IGNORED_TOO (same as above) * - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if * the directory is empty) */ if (!excluded && - !(dir->flags & (DIR_SHOW_IGNORED_TOO | + !(dir->flags & (DIR_SHOW_IGNORED | + DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES))) { return path_untracked; } diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh index ac05d1a17931..516c95ea0e82 100755 --- a/t/t3001-ls-files-others-exclude.sh +++ b/t/t3001-ls-files-others-exclude.sh @@ -292,7 +292,7 @@ EOF test_cmp expect actual ' -test_expect_failure 'ls-files with "**" patterns and --directory' ' +test_expect_success 'ls-files with "**" patterns and --directory' ' # Expectation same as previous test git ls-files --directory -o -i --exclude "**/a.1" >actual && test_cmp expect actual diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 21e48b3ba591..0399701e6276 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -769,7 +769,7 @@ test_expect_success 'avoid traversing into ignored directories' ' test_cmp trace.expect trace.relevant ' -test_expect_failure 'traverse into directories that may have ignored entries' ' +test_expect_success 'traverse into directories that 
may have ignored entries' ' test_when_finished rm -f output && test_create_repo need-to-traverse-into-hierarchy && ( -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
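The fix in this patch is a one-bit change to a mask, which a standalone sketch can make concrete. The `SK7_*` flag values and the function name are hypothetical stand-ins for git's `DIR_*` flags; the function answers only the question "may a non-excluded untracked directory be reported as-is, without looking inside?".

```c
/* Illustrative flag bits; the values are not git's. */
enum sk7_flags {
	SK7_SHOW_IGNORED           = 1 << 0,
	SK7_SHOW_IGNORED_TOO       = 1 << 1,
	SK7_HIDE_EMPTY_DIRECTORIES = 1 << 2
};

/*
 * Return 1 when an untracked, non-excluded directory can be reported
 * directly (path_untracked in the patch), 0 when the traversal has to
 * continue because ignored entries or emptiness must be determined.
 * Before the fix, SK7_SHOW_IGNORED was missing from this mask.
 */
static int sk7_can_skip_untracked_dir(int excluded, unsigned flags)
{
	return !excluded &&
	       !(flags & (SK7_SHOW_IGNORED |
			  SK7_SHOW_IGNORED_TOO |
			  SK7_HIDE_EMPTY_DIRECTORIES));
}
```

With the extra bit in the mask, a caller that asks for ignored files no longer stops at an untracked directory that may contain them, which is the case the `git clean -fX` and `ls-files -o -i --directory` tests exercise.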
* [PATCH v4 8/8] dir: update stale description of treat_directory() 2021-05-11 18:34 ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget ` (6 preceding siblings ...) 2021-05-11 18:34 ` [PATCH v4 7/8] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget @ 2021-05-11 18:34 ` Derrick Stolee via GitGitGadget 2021-05-12 17:28 ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget 8 siblings, 0 replies; 90+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-05-11 18:34 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler, Elijah Newren, Derrick Stolee From: Derrick Stolee <stolee@gmail.com> The documentation comment for treat_directory() was originally written in 095952 (Teach directory traversal about subprojects, 2007-04-11) which was before the 'struct dir_struct' split its bitfield of named options into a 'flags' enum in 7c4c97c0 (Turn the flags in struct dir_struct into a single variable, 2009-02-16). When those flags changed, the comment became stale, since members like 'show_other_directories' transitioned into flags like DIR_SHOW_OTHER_DIRECTORIES. Update the comments for treat_directory() to use these flag names rather than the old member names. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> --- dir.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/dir.c b/dir.c index deeff1a58319..993a12145f9d 100644 --- a/dir.c +++ b/dir.c @@ -1749,13 +1749,13 @@ static enum exist_status directory_exists_in_index(struct index_state *istate, * Case 3: if we didn't have it in the index previously, we * have a few sub-cases: * - * (a) if "show_other_directories" is true, we show it as - * just a directory, unless "hide_empty_directories" is + * (a) if DIR_SHOW_OTHER_DIRECTORIES flag is set, we show it as + * just a directory, unless DIR_HIDE_EMPTY_DIRECTORIES is * also true, in which case we need to check if it contains any * untracked and / or ignored files. - * (b) if it looks like a git directory, and we don't have - * 'no_gitlinks' set we treat it as a gitlink, and show it - * as a directory. + * (b) if it looks like a git directory and we don't have the + * DIR_NO_GITLINKS flag, then we treat it as a gitlink, and + * show it as a directory. * (c) otherwise, we recurse into it. */ static enum path_treatment treat_directory(struct dir_struct *dir, @@ -1843,7 +1843,6 @@ static enum path_treatment treat_directory(struct dir_struct *dir, return path_recurse; } - /* This is the "show_other_directories" case */ assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES); /* @@ -1858,7 +1857,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* Special cases for where this directory is excluded/ignored */ if (excluded) { /* - * In the show_other_directories case, if we're not + * If DIR_SHOW_OTHER_DIRECTORIES is set and we're not * hiding empty directories, there is no need to * recurse into an ignored directory. */ -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* [PATCH v5 0/9] Directory traversal fixes 2021-05-11 18:34 ` [PATCH v4 0/8] Directory traversal fixes Elijah Newren via GitGitGadget ` (7 preceding siblings ...) 2021-05-11 18:34 ` [PATCH v4 8/8] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget @ 2021-05-12 17:28 ` Elijah Newren via GitGitGadget 2021-05-12 17:28 ` [PATCH v5 1/9] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget ` (9 more replies) 8 siblings, 10 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-12 17:28 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler, Elijah Newren This patchset fixes a few directory traversal issues, where fill_directory() would traverse into directories that it shouldn't and not traverse into directories that it should (one of which was originally reported on this list at [1]). It also includes a few cleanups. Changes since v4: * Tweak the trace2 statistics emitting a bit, as per suggestions from Jeff. * Introduce a new readdir_skip_dot_and_dotdot() helper at the end of the series, and use it everywhere we repeat the same code to skip '.' and '..' entries from readdir. Also use it in dir.c's read_cached_dir() so we can be consistent about skipping them, even for statistics, across platforms. If anyone has any ideas about a better place to put the "Some sidenotes" from the sixth commit message rather than keeping them in a random commit message, that might be helpful. [1] See https://lore.kernel.org/git/DM6PR00MB06829EC5B85E0C5AC595004E894E9@DM6PR00MB0682.namprd00.prod.outlook.com/ or alternatively https://github.com/git-for-windows/git/issues/2732. 
Derrick Stolee (1): dir: update stale description of treat_directory() Elijah Newren (8): dir: convert trace calls to trace2 equivalents dir: report number of visited directories and paths with trace2 ls-files: error out on -i unless -o or -c are specified t7300: add testcase showing unnecessary traversal into ignored directory t3001, t7300: add testcase showcasing missed directory traversal dir: avoid unnecessary traversal into ignored directory dir: traverse into untracked directories if they may have ignored subfiles dir: introduce readdir_skip_dot_and_dotdot() helper builtin/clean.c | 4 +- builtin/ls-files.c | 3 + builtin/worktree.c | 4 +- diff-no-index.c | 5 +- dir.c | 146 +++++++++++++------- dir.h | 6 + entry.c | 5 +- notes-merge.c | 5 +- object-file.c | 4 +- packfile.c | 5 +- rerere.c | 4 +- t/t1306-xdg-files.sh | 2 +- t/t3001-ls-files-others-exclude.sh | 5 + t/t3003-ls-files-exclude.sh | 4 +- t/t7063-status-untracked-cache.sh | 206 +++++++++++++++++------------ t/t7300-clean.sh | 42 ++++++ t/t7519-status-fsmonitor.sh | 8 +- worktree.c | 12 +- 18 files changed, 298 insertions(+), 172 deletions(-) base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1020%2Fnewren%2Fdirectory-traversal-fixes-v5 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1020/newren/directory-traversal-fixes-v5 Pull-Request: https://github.com/git/git/pull/1020 Range-diff vs v4: 1: 9204e36b7e90 ! 
1: 6b1b4820dd20 dir: convert trace calls to trace2 equivalents @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st return root; } -+static void trace2_read_directory_statistics(struct dir_struct *dir, -+ struct repository *repo, -+ const char *path) ++static void emit_traversal_statistics(struct dir_struct *dir, ++ struct repository *repo, ++ const char *path, ++ int path_len) +{ ++ if (!trace2_is_enabled()) ++ return; ++ ++ if (!path_len) { ++ trace2_data_string("read_directory", repo, "path", ""); ++ } else { ++ struct strbuf tmp = STRBUF_INIT; ++ strbuf_add(&tmp, path, path_len); ++ trace2_data_string("read_directory", repo, "path", tmp.buf); ++ strbuf_release(&tmp); ++ } ++ + if (!dir->untracked) + return; -+ trace2_data_string("read_directory", repo, "path", path); + trace2_data_intmax("read_directory", repo, + "node-creation", dir->untracked->dir_created); + trace2_data_intmax("read_directory", repo, @@ dir.c: int read_directory(struct dir_struct *dir, struct index_state *istate, QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry); - trace_performance_leave("read directory %.*s", len, path); -+ if (trace2_is_enabled()) { -+ struct strbuf tmp = STRBUF_INIT; -+ strbuf_add(&tmp, path, len); -+ trace2_read_directory_statistics(dir, istate->repo, tmp.buf); -+ strbuf_release(&tmp); -+ } ++ emit_traversal_statistics(dir, istate->repo, path, len); + + trace2_region_leave("dir", "read_directory", istate->repo); if (dir->untracked) { 2: 6939253be825 ! 
2: cfe2898b7a7e dir: report number of visited directories and paths with trace2 @@ dir.c: static enum path_treatment read_directory_recursive(struct dir_struct *di if (state > dir_state) dir_state = state; -@@ dir.c: static void trace2_read_directory_statistics(struct dir_struct *dir, - struct repository *repo, - const char *path) - { +@@ dir.c: static void emit_traversal_statistics(struct dir_struct *dir, + strbuf_release(&tmp); + } + + trace2_data_intmax("read_directory", repo, + "directories-visited", dir->visited_directories); + trace2_data_intmax("read_directory", repo, + "paths-visited", dir->visited_paths); ++ if (!dir->untracked) return; - trace2_data_string("read_directory", repo, "path", path); + trace2_data_intmax("read_directory", repo, @@ dir.c: int read_directory(struct dir_struct *dir, struct index_state *istate, struct untracked_cache_dir *untracked; 3: 8d0ca8104be6 = 3: 279ef30ffbc2 ls-files: error out on -i unless -o or -c are specified 4: 317abab3571e = 4: 5a8807a1992c t7300: add testcase showing unnecessary traversal into ignored directory 5: 5eb019327b57 = 5: b014ccbbaf3e t3001, t7300: add testcase showcasing missed directory traversal 6: 89cc01ef8598 = 6: ae1c9e37b21b dir: avoid unnecessary traversal into ignored directory 7: 4a561e1229e4 = 7: 6fa1e85edf2f dir: traverse into untracked directories if they may have ignored subfiles 8: 2945e749f5e3 = 8: 179f992edc92 dir: update stale description of treat_directory() -: ------------ > 9: b7c6176560bd dir: introduce readdir_skip_dot_and_dotdot() helper -- gitgitgadget ^ permalink raw reply [flat|nested] 90+ messages in thread
* [PATCH v5 1/9] dir: convert trace calls to trace2 equivalents 2021-05-12 17:28 ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget @ 2021-05-12 17:28 ` Elijah Newren via GitGitGadget 2021-05-12 17:28 ` [PATCH v5 2/9] dir: report number of visited directories and paths with trace2 Elijah Newren via GitGitGadget ` (8 subsequent siblings) 9 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-12 17:28 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 50 ++++++-- t/t7063-status-untracked-cache.sh | 205 ++++++++++++++++++------------ t/t7519-status-fsmonitor.sh | 8 +- 3 files changed, 162 insertions(+), 101 deletions(-) diff --git a/dir.c b/dir.c index 3474e67e8f3c..cf19a83d3e2c 100644 --- a/dir.c +++ b/dir.c @@ -2760,15 +2760,46 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d return root; } +static void emit_traversal_statistics(struct dir_struct *dir, + struct repository *repo, + const char *path, + int path_len) +{ + if (!trace2_is_enabled()) + return; + + if (!path_len) { + trace2_data_string("read_directory", repo, "path", ""); + } else { + struct strbuf tmp = STRBUF_INIT; + strbuf_add(&tmp, path, path_len); + trace2_data_string("read_directory", repo, "path", tmp.buf); + strbuf_release(&tmp); + } + + if (!dir->untracked) + return; + trace2_data_intmax("read_directory", repo, + "node-creation", dir->untracked->dir_created); + trace2_data_intmax("read_directory", repo, + "gitignore-invalidation", + dir->untracked->gitignore_invalidated); + trace2_data_intmax("read_directory", repo, + "directory-invalidation", + dir->untracked->dir_invalidated); + trace2_data_intmax("read_directory", repo, + "opendir", dir->untracked->dir_opened); +} + int 
read_directory(struct dir_struct *dir, struct index_state *istate, const char *path, int len, const struct pathspec *pathspec) { struct untracked_cache_dir *untracked; - trace_performance_enter(); + trace2_region_enter("dir", "read_directory", istate->repo); if (has_symlink_leading_path(path, len)) { - trace_performance_leave("read directory %.*s", len, path); + trace2_region_leave("dir", "read_directory", istate->repo); return dir->nr; } @@ -2784,23 +2815,15 @@ int read_directory(struct dir_struct *dir, struct index_state *istate, QSORT(dir->entries, dir->nr, cmp_dir_entry); QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry); - trace_performance_leave("read directory %.*s", len, path); + emit_traversal_statistics(dir, istate->repo, path, len); + + trace2_region_leave("dir", "read_directory", istate->repo); if (dir->untracked) { static int force_untracked_cache = -1; - static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS); if (force_untracked_cache < 0) force_untracked_cache = git_env_bool("GIT_FORCE_UNTRACKED_CACHE", 0); - trace_printf_key(&trace_untracked_stats, - "node creation: %u\n" - "gitignore invalidation: %u\n" - "directory invalidation: %u\n" - "opendir: %u\n", - dir->untracked->dir_created, - dir->untracked->gitignore_invalidated, - dir->untracked->dir_invalidated, - dir->untracked->dir_opened); if (force_untracked_cache && dir->untracked == istate->untracked && (dir->untracked->dir_opened || @@ -2811,6 +2834,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate, FREE_AND_NULL(dir->untracked); } } + return dir->nr; } diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh index accefde72fb1..9710d33b3cd6 100755 --- a/t/t7063-status-untracked-cache.sh +++ b/t/t7063-status-untracked-cache.sh @@ -57,6 +57,19 @@ iuc () { return $ret } +get_relevant_traces () { + # From the GIT_TRACE2_PERF data of the form + # $TIME $FILE:$LINE | d0 | main | data | r1 | ? | ? 
| read_directo | $RELEVANT_STAT + # extract the $RELEVANT_STAT fields. We don't care about region_enter + # or region_leave, or stats for things outside read_directory. + INPUT_FILE=$1 + OUTPUT_FILE=$2 + grep data.*read_directo $INPUT_FILE | + cut -d "|" -f 9 \ + >"$OUTPUT_FILE" +} + + test_lazy_prereq UNTRACKED_CACHE ' { git update-index --test-untracked-cache; ret=$?; } && test $ret -ne 1 @@ -129,19 +142,21 @@ EOF test_expect_success 'status first time (empty cache)' ' avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 3 -gitignore invalidation: 1 -directory invalidation: 0 -opendir: 4 + ....path: + ....node-creation:3 + ....gitignore-invalidation:1 + ....directory-invalidation:0 + ....opendir:4 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'untracked cache after first status' ' @@ -151,19 +166,21 @@ test_expect_success 'untracked cache after first status' ' test_expect_success 'status second time (fully populated cache)' ' avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 0 + ....path: + ....node-creation:0 + ....gitignore-invalidation:0 + ....directory-invalidation:0 + ....opendir:0 EOF - test_cmp 
../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'untracked cache after second status' ' @@ -174,8 +191,8 @@ test_expect_success 'untracked cache after second status' ' test_expect_success 'modify in root directory, one dir invalidation' ' avoid_racy && : >four && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -189,13 +206,15 @@ A two EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 1 -opendir: 1 + ....path: + ....node-creation:0 + ....gitignore-invalidation:0 + ....directory-invalidation:1 + ....opendir:1 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' @@ -223,8 +242,8 @@ EOF test_expect_success 'new .gitignore invalidates recursively' ' avoid_racy && echo four >.gitignore && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -238,13 +257,15 @@ A two EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 1 -directory invalidation: 1 -opendir: 4 + ....path: + ....node-creation:0 + ....gitignore-invalidation:1 + ....directory-invalidation:1 + ....opendir:4 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' @@ -272,8 +293,8 @@ EOF test_expect_success 'new info/exclude invalidates 
everything' ' avoid_racy && echo three >>.git/info/exclude && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -285,13 +306,15 @@ A two EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 1 -directory invalidation: 0 -opendir: 4 + ....path: + ....node-creation:0 + ....gitignore-invalidation:1 + ....directory-invalidation:0 + ....opendir:4 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'verify untracked cache dump' ' @@ -330,8 +353,8 @@ EOF ' test_expect_success 'status after the move' ' - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -343,13 +366,15 @@ A one EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 1 + ....path: + ....node-creation:0 + ....gitignore-invalidation:0 + ....directory-invalidation:0 + ....opendir:1 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'verify untracked cache dump' ' @@ -389,8 +414,8 @@ EOF ' test_expect_success 'status after the move' ' - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status 
--porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -402,13 +427,15 @@ A two EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 1 + ....path: + ....node-creation:0 + ....gitignore-invalidation:0 + ....directory-invalidation:0 + ....opendir:1 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'verify untracked cache dump' ' @@ -438,8 +465,8 @@ test_expect_success 'set up for sparse checkout testing' ' ' test_expect_success 'status after commit' ' - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -448,13 +475,15 @@ test_expect_success 'status after commit' ' EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 2 + ....path: + ....node-creation:0 + ....gitignore-invalidation:0 + ....directory-invalidation:0 + ....opendir:2 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'untracked cache correct after commit' ' @@ -496,9 +525,9 @@ test_expect_success 'create/modify files, some of which are gitignored' ' ' test_expect_success 'test sparse status with untracked cache' ' - : >../trace && + : >../trace.output && avoid_racy && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../status.actual && iuc status --porcelain >../status.iuc && cat >../status.expect 
<<EOF && @@ -509,13 +538,15 @@ test_expect_success 'test sparse status with untracked cache' ' EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 1 -directory invalidation: 2 -opendir: 2 + ....path: + ....node-creation:0 + ....gitignore-invalidation:1 + ....directory-invalidation:2 + ....opendir:2 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'untracked cache correct after status' ' @@ -539,8 +570,8 @@ EOF test_expect_success 'test sparse status again with untracked cache' ' avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../status.actual && iuc status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -551,13 +582,15 @@ test_expect_success 'test sparse status again with untracked cache' ' EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 0 + ....path: + ....node-creation:0 + ....gitignore-invalidation:0 + ....directory-invalidation:0 + ....opendir:0 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'set up for test of subdir and sparse checkouts' ' @@ -568,8 +601,8 @@ test_expect_success 'set up for test of subdir and sparse checkouts' ' test_expect_success 'test sparse status with untracked cache and subdir' ' avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../status.actual && iuc 
status --porcelain >../status.iuc && cat >../status.expect <<EOF && @@ -581,13 +614,15 @@ test_expect_success 'test sparse status with untracked cache and subdir' ' EOF test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 2 -gitignore invalidation: 0 -directory invalidation: 1 -opendir: 3 + ....path: + ....node-creation:2 + ....gitignore-invalidation:0 + ....directory-invalidation:1 + ....opendir:3 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'verify untracked cache dump (sparse/subdirs)' ' @@ -616,19 +651,21 @@ EOF test_expect_success 'test sparse status again with untracked cache and subdir' ' avoid_racy && - : >../trace && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace" \ + : >../trace.output && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ git status --porcelain >../status.actual && iuc status --porcelain >../status.iuc && test_cmp ../status.expect ../status.iuc && test_cmp ../status.expect ../status.actual && + get_relevant_traces ../trace.output ../trace.relevant && cat >../trace.expect <<EOF && -node creation: 0 -gitignore invalidation: 0 -directory invalidation: 0 -opendir: 0 + ....path: + ....node-creation:0 + ....gitignore-invalidation:0 + ....directory-invalidation:0 + ....opendir:0 EOF - test_cmp ../trace.expect ../trace + test_cmp ../trace.expect ../trace.relevant ' test_expect_success 'move entry in subdir from untracked to cached' ' diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh index 45d025f96010..637391c6ce46 100755 --- a/t/t7519-status-fsmonitor.sh +++ b/t/t7519-status-fsmonitor.sh @@ -334,7 +334,7 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR' git config core.fsmonitor .git/hooks/fsmonitor-test && git update-index --untracked-cache && git update-index --fsmonitor && - 
GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-before" \ + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-before" \ git status && test-tool dump-untracked-cache >../before ) && @@ -346,12 +346,12 @@ test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR' EOF ( cd dot-git && - GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-after" \ + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-after" \ git status && test-tool dump-untracked-cache >../after ) && - grep "directory invalidation" trace-before >>before && - grep "directory invalidation" trace-after >>after && + grep "directory-invalidation" trace-before | cut -d"|" -f 9 >>before && + grep "directory-invalidation" trace-after | cut -d"|" -f 9 >>after && # UNTR extension unchanged, dir invalidation count unchanged test_cmp before after ' -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* [PATCH v5 2/9] dir: report number of visited directories and paths with trace2 2021-05-12 17:28 ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget 2021-05-12 17:28 ` [PATCH v5 1/9] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget @ 2021-05-12 17:28 ` Elijah Newren via GitGitGadget 2021-05-12 17:28 ` [PATCH v5 3/9] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget ` (7 subsequent siblings) 9 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-12 17:28 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> Provide more statistics in trace2 output that include the number of directories and total paths visited by the directory traversal logic. Subsequent patches will take advantage of this to ensure we do not unnecessarily traverse into ignored directories. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 9 +++++++++ dir.h | 4 ++++ t/t7063-status-untracked-cache.sh | 3 ++- 3 files changed, 15 insertions(+), 1 deletion(-) diff --git a/dir.c b/dir.c index cf19a83d3e2c..f6dec5fd4a78 100644 --- a/dir.c +++ b/dir.c @@ -2440,6 +2440,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, if (open_cached_dir(&cdir, dir, untracked, istate, &path, check_only)) goto out; + dir->visited_directories++; if (untracked) untracked->check_only = !!check_only; @@ -2448,6 +2449,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir, /* check how the file or directory should be treated */ state = treat_path(dir, untracked, &cdir, istate, &path, baselen, pathspec); + dir->visited_paths++; if (state > dir_state) dir_state = state; @@ -2777,6 +2779,11 @@ static void emit_traversal_statistics(struct dir_struct *dir, strbuf_release(&tmp); } + trace2_data_intmax("read_directory", repo, + "directories-visited", dir->visited_directories); + trace2_data_intmax("read_directory", repo, + "paths-visited", dir->visited_paths); + if (!dir->untracked) return; trace2_data_intmax("read_directory", repo, @@ -2797,6 +2804,8 @@ int read_directory(struct dir_struct *dir, struct index_state *istate, struct untracked_cache_dir *untracked; trace2_region_enter("dir", "read_directory", istate->repo); + dir->visited_paths = 0; + dir->visited_directories = 0; if (has_symlink_leading_path(path, len)) { trace2_region_leave("dir", "read_directory", istate->repo); diff --git a/dir.h b/dir.h index 04d886cfce75..22c67907f689 100644 --- a/dir.h +++ b/dir.h @@ -336,6 +336,10 @@ struct dir_struct { struct oid_stat ss_info_exclude; struct oid_stat ss_excludes_file; unsigned unmanaged_exclude_files; + + /* Stats about the traversal */ + unsigned visited_paths; + unsigned visited_directories; }; /*Count the number of slashes for string s*/ diff --git a/t/t7063-status-untracked-cache.sh 
b/t/t7063-status-untracked-cache.sh index 9710d33b3cd6..a0c123b0a77a 100755 --- a/t/t7063-status-untracked-cache.sh +++ b/t/t7063-status-untracked-cache.sh @@ -65,7 +65,8 @@ get_relevant_traces () { INPUT_FILE=$1 OUTPUT_FILE=$2 grep data.*read_directo $INPUT_FILE | - cut -d "|" -f 9 \ + cut -d "|" -f 9 | + grep -v visited \ >"$OUTPUT_FILE" } -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
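To make the meaning of the two new counters concrete, here is a toy model of where they are bumped. It is only a sketch under simplifying assumptions: struct sketch_node is a hypothetical in-memory tree (git walks the real filesystem or the untracked cache), and this walk always recurses, whereas read_directory_recursive() recurses only when treat_path() asks for it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory tree node used only for this illustration. */
struct sketch_node {
	const char *name;
	int is_dir;
	const struct sketch_node *children;	/* entries, if is_dir */
	size_t nr_children;
};

struct sketch_stats {
	unsigned visited_paths;
	unsigned visited_directories;
};

/*
 * Count one "directory visited" per directory opened and one
 * "path visited" per entry examined inside an opened directory.
 */
static void sketch_walk(const struct sketch_node *dir,
			struct sketch_stats *stats)
{
	size_t i;

	assert(dir && dir->is_dir && stats);

	stats->visited_directories++;
	for (i = 0; i < dir->nr_children; i++) {
		const struct sketch_node *child = &dir->children[i];

		stats->visited_paths++;
		if (child->is_dir)
			sketch_walk(child, stats);
	}
}
```

For a root holding one file and one subdirectory that itself holds one file, this model reports two directories and three paths. A traversal that declines to enter the subdirectory would report one directory, which is the kind of difference the later t7300 test checks for through the directories-visited statistic.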
* [PATCH v5 3/9] ls-files: error out on -i unless -o or -c are specified 2021-05-12 17:28 ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget 2021-05-12 17:28 ` [PATCH v5 1/9] dir: convert trace calls to trace2 equivalents Elijah Newren via GitGitGadget 2021-05-12 17:28 ` [PATCH v5 2/9] dir: report number of visited directories and paths with trace2 Elijah Newren via GitGitGadget @ 2021-05-12 17:28 ` Elijah Newren via GitGitGadget 2021-05-12 17:28 ` [PATCH v5 4/9] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget ` (6 subsequent siblings) 9 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-12 17:28 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> ls-files --ignored can be used together with either --others or --cached. After being perplexed for a bit and digging into the code, I assumed that ls-files -i was just broken and not printing anything, and I had a nice patch ready to submit when I finally realized that -i can be used with --cached to find tracked ignores. While that was a mistake on my part, and a careful reading of the documentation could have made this clearer, I suspect this is an error others are likely to make as well. In fact, of two uses in our testsuite, I believe one of the two did make this error. In t1306.13, there are NO tracked files, and all the excludes built up and used in that test and in previous tests thus have to be about untracked files. However, since they were looking for an empty result, the mistake went unnoticed as their erroneous command also just happened to give an empty answer. 
-i will most of the time be used with -o, which would suggest we could just make -i imply -o in the absence of either a -o or -c, but that would be a backward-incompatible break. Instead, let's just flag -i without either a -o or -c as an error, and update the two relevant testcases to specify their intent. Signed-off-by: Elijah Newren <newren@gmail.com> --- builtin/ls-files.c | 3 +++ t/t1306-xdg-files.sh | 2 +- t/t3003-ls-files-exclude.sh | 4 ++-- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/builtin/ls-files.c b/builtin/ls-files.c index 60a2913a01e9..e8e25006c647 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -748,6 +748,9 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) if (pathspec.nr && error_unmatch) ps_matched = xcalloc(pathspec.nr, 1); + if ((dir.flags & DIR_SHOW_IGNORED) && !show_others && !show_cached) + die("ls-files -i must be used with either -o or -c"); + if ((dir.flags & DIR_SHOW_IGNORED) && !exc_given) die("ls-files --ignored needs some exclude pattern"); diff --git a/t/t1306-xdg-files.sh b/t/t1306-xdg-files.sh index dd87b43be1a6..40d3c42618c0 100755 --- a/t/t1306-xdg-files.sh +++ b/t/t1306-xdg-files.sh @@ -116,7 +116,7 @@ test_expect_success 'Exclusion in a non-XDG global ignore file' ' test_expect_success 'Checking XDG ignore file when HOME is unset' ' (sane_unset HOME && git config --unset core.excludesfile && - git ls-files --exclude-standard --ignored >actual) && + git ls-files --exclude-standard --ignored --others >actual) && test_must_be_empty actual ' diff --git a/t/t3003-ls-files-exclude.sh b/t/t3003-ls-files-exclude.sh index d5ec333131f9..c41c4f046abf 100755 --- a/t/t3003-ls-files-exclude.sh +++ b/t/t3003-ls-files-exclude.sh @@ -29,11 +29,11 @@ test_expect_success 'add file to gitignore' ' ' check_all_output -test_expect_success 'ls-files -i lists only tracked-but-ignored files' ' +test_expect_success 'ls-files -i -c lists only tracked-but-ignored files' ' echo content >other-file && git 
add other-file && echo file >expect && - git ls-files -i --exclude-standard >output && + git ls-files -i -c --exclude-standard >output && test_cmp expect output ' -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
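The option check added to cmd_ls_files() in the patch above can be modeled in isolation. What follows is only a minimal sketch, not the code in builtin/ls-files.c: the struct, the SKETCH_DIR_SHOW_IGNORED constant, and check_ignored_usage() are hypothetical names invented for illustration, and the function returns the error text instead of calling die(). The order of the two checks and the two message strings are taken from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the DIR_SHOW_IGNORED bit; the value is arbitrary. */
#define SKETCH_DIR_SHOW_IGNORED (1u << 0)

/* Hypothetical bundle of the state the patch consults. */
struct sketch_ls_files_opts {
	unsigned dir_flags;  /* carries SKETCH_DIR_SHOW_IGNORED when -i is given */
	int show_others;     /* -o / --others */
	int show_cached;     /* -c / --cached */
	int exc_given;       /* at least one exclude pattern source was supplied */
};

/*
 * Returns NULL when the option combination is accepted, otherwise the
 * message the patched code would pass to die().
 */
static const char *check_ignored_usage(const struct sketch_ls_files_opts *o)
{
	assert(o != NULL);

	/* Without -i neither check applies. */
	if (!(o->dir_flags & SKETCH_DIR_SHOW_IGNORED))
		return NULL;

	/* New in this patch: -i alone is ambiguous, so reject it. */
	if (!o->show_others && !o->show_cached)
		return "ls-files -i must be used with either -o or -c";

	/* Pre-existing check, kept after the new one as in the diff. */
	if (!o->exc_given)
		return "ls-files --ignored needs some exclude pattern";

	return NULL;
}
```

With this model, "-i --exclude-standard" alone is rejected, while "-i -o --exclude-standard" and "-i -c --exclude-standard" are both accepted, which matches the two testcase updates in the patch.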
* [PATCH v5 4/9] t7300: add testcase showing unnecessary traversal into ignored directory 2021-05-12 17:28 ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget ` (2 preceding siblings ...) 2021-05-12 17:28 ` [PATCH v5 3/9] ls-files: error out on -i unless -o or -c are specified Elijah Newren via GitGitGadget @ 2021-05-12 17:28 ` Elijah Newren via GitGitGadget 2021-05-12 17:28 ` [PATCH v5 5/9] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget ` (5 subsequent siblings) 9 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-12 17:28 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> The PNPM package manager is apparently creating deeply nested (but ignored) directory structures; traversing them is costly performance-wise, unnecessary, and in some cases is even throwing warnings/errors because the paths are too long to handle on various platforms. Add a testcase that checks for such unnecessary directory traversal. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- t/t7300-clean.sh | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index a74816ca8b46..07e8ba2d4b85 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -746,4 +746,27 @@ test_expect_success 'clean untracked paths by pathspec' ' test_must_be_empty actual ' +test_expect_failure 'avoid traversing into ignored directories' ' + test_when_finished rm -f output error trace.* && + test_create_repo avoid-traversing-deep-hierarchy && + ( + cd avoid-traversing-deep-hierarchy && + + mkdir -p untracked/subdir/with/a && + >untracked/subdir/with/a/random-file.txt && + + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \ + git clean -ffdxn -e untracked + ) && + + # Make sure we only visited into the top-level directory, and did + # not traverse into the "untracked" subdirectory since it was excluded + grep data.*read_directo.*directories-visited trace.output | + cut -d "|" -f 9 >trace.relevant && + cat >trace.expect <<-EOF && + ..directories-visited:1 + EOF + test_cmp trace.expect trace.relevant +' + test_done -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* [PATCH v5 5/9] t3001, t7300: add testcase showcasing missed directory traversal 2021-05-12 17:28 ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget ` (3 preceding siblings ...) 2021-05-12 17:28 ` [PATCH v5 4/9] t7300: add testcase showing unnecessary traversal into ignored directory Elijah Newren via GitGitGadget @ 2021-05-12 17:28 ` Elijah Newren via GitGitGadget 2021-05-12 17:28 ` [PATCH v5 6/9] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget ` (4 subsequent siblings) 9 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-12 17:28 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> In the last commit, we added a testcase showing that the directory traversal machinery sometimes traverses into directories unnecessarily. Here we show that there are cases where it does the opposite: it does not traverse into directories, despite those directories having important files that need to be flagged. Add a testcase showing that `git ls-files -o -i --directory` can omit some of the files it should be listing, and another showing that `git clean -fX` can fail to clean out some of the expected files. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- t/t3001-ls-files-others-exclude.sh | 5 +++++ t/t7300-clean.sh | 19 +++++++++++++++++++ 2 files changed, 24 insertions(+) diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh index 1ec7cb57c7a8..ac05d1a17931 100755 --- a/t/t3001-ls-files-others-exclude.sh +++ b/t/t3001-ls-files-others-exclude.sh @@ -292,6 +292,11 @@ EOF test_cmp expect actual ' +test_expect_failure 'ls-files with "**" patterns and --directory' ' + # Expectation same as previous test + git ls-files --directory -o -i --exclude "**/a.1" >actual && + test_cmp expect actual +' test_expect_success 'ls-files with "**" patterns and no slashes' ' git ls-files -o -i --exclude "one**a.1" >actual && diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 07e8ba2d4b85..34c08c325407 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -769,4 +769,23 @@ test_expect_failure 'avoid traversing into ignored directories' ' test_cmp trace.expect trace.relevant ' +test_expect_failure 'traverse into directories that may have ignored entries' ' + test_when_finished rm -f output && + test_create_repo need-to-traverse-into-hierarchy && + ( + cd need-to-traverse-into-hierarchy && + mkdir -p modules/foobar/src/generated && + > modules/foobar/src/generated/code.c && + > modules/foobar/Makefile && + echo "/modules/**/src/generated/" >.gitignore && + + git clean -fX modules/foobar >../output && + + grep Removing ../output && + + test_path_is_missing modules/foobar/src/generated/code.c && + test_path_is_file modules/foobar/Makefile + ) +' + test_done -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* [PATCH v5 6/9] dir: avoid unnecessary traversal into ignored directory 2021-05-12 17:28 ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget ` (4 preceding siblings ...) 2021-05-12 17:28 ` [PATCH v5 5/9] t3001, t7300: add testcase showcasing missed directory traversal Elijah Newren via GitGitGadget @ 2021-05-12 17:28 ` Elijah Newren via GitGitGadget 2021-05-12 17:28 ` [PATCH v5 7/9] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget ` (3 subsequent siblings) 9 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-12 17:28 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> The show_other_directories case in treat_directory() tried to handle both excludes and untracked files with the same logic, and mishandled both the excludes and the untracked files in the process, in different ways. Split that logic apart, and then focus on the logic for the excludes; a subsequent commit will address the logic for untracked files. For show_other_directories, an excluded directory means that every path underneath that directory will also be excluded. Given that the calling code requested to just show directories when everything under a directory had the same state (that's what the "DIR_SHOW_OTHER_DIRECTORIES" flag means), we generally do not need to traverse into such directories and can just immediately mark them as ignored (i.e. as path_excluded). The only reason we cannot just immediately return path_excluded is the DIR_HIDE_EMPTY_DIRECTORIES flag and the possibility that the ignored directory is an empty directory. The code previously treated DIR_SHOW_IGNORED_TOO in most cases as an exception as well, which was wrong. 
It can sometimes reduce the number of cases where we need to recurse (namely if DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set), but should not be able to increase the number of cases where we need to recurse. Fix the logic accordingly.

Some sidenotes about possible confusion with dir.c:

* "ignored" often refers to an "untracked ignore", i.e. a file which is not tracked and which matches one of the ignore/exclusion rules. But you can also have a "tracked ignore", a tracked file that happens to match one of the ignore/exclusion rules and which dir.c has to worry about since "git ls-files -c -i" is supposed to list them.

* The dir code often uses "ignored" and "excluded" interchangeably, which you need to keep in mind while reading the code.

* "exclude" is used in multiple ways in the code:

  * As noted above, "exclude" is often a synonym for "ignored".

  * The logic for parsing .gitignore files was re-used in .git/info/sparse-checkout, except there it is used to mark paths that the user wants to *keep*. This was mostly addressed by commit 65edd96aec ("treewide: rename 'exclude' methods to 'pattern'", 2019-09-03), but every once in a while you'll find a comment about "exclude" referring to these patterns that might in fact be in use by the sparse-checkout machinery for inclusion rules.

  * The word "EXCLUDE" is also used for pathspec negation, as in

        (pathspec->items[3].magic & PATHSPEC_EXCLUDE)

    Thus if a user had a .gitignore file containing

        *~
        *.log
        !settings.log

    And then ran

        git add -- 'settings.*' ':^settings.log'

    Then :^settings.log is a pathspec negation making settings.log not be requested to be added even though all other settings.* files are being added. Also, !settings.log in the gitignore file is a negative exclude pattern meaning that settings.log is normally a file we want to track even though all other *.log files are ignored.

Sometimes it feels like dir.c needs its own glossary with its many definitions, including the multiply-defined terms.
Reported-by: Jason Gore <Jason.Gore@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 44 +++++++++++++++++++++++++++++--------------- t/t7300-clean.sh | 2 +- 2 files changed, 30 insertions(+), 16 deletions(-) diff --git a/dir.c b/dir.c index f6dec5fd4a78..db2ae516a3aa 100644 --- a/dir.c +++ b/dir.c @@ -1844,6 +1844,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, } /* This is the "show_other_directories" case */ + assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES); /* * If we have a pathspec which could match something _below_ this @@ -1854,27 +1855,40 @@ static enum path_treatment treat_directory(struct dir_struct *dir, if (matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC) return path_recurse; + /* Special cases for where this directory is excluded/ignored */ + if (excluded) { + /* + * In the show_other_directories case, if we're not + * hiding empty directories, there is no need to + * recurse into an ignored directory. + */ + if (!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES)) + return path_excluded; + + /* + * Even if we are hiding empty directories, we can still avoid + * recursing into ignored directories for DIR_SHOW_IGNORED_TOO + * if DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set. + */ + if ((dir->flags & DIR_SHOW_IGNORED_TOO) && + (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) + return path_excluded; + } + /* - * Other than the path_recurse case immediately above, we only need - * to recurse into untracked/ignored directories if either of the - * following bits is set: + * Other than the path_recurse case above, we only need to + * recurse into untracked directories if either of the following + * bits is set: * - DIR_SHOW_IGNORED_TOO (because then we need to determine if * there are ignored entries below) * - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if * the directory is empty) */ - if (!(dir->flags & (DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES))) - return excluded ? 
path_excluded : path_untracked; - - /* - * ...and even if DIR_SHOW_IGNORED_TOO is set, we can still avoid - * recursing into ignored directories if the path is excluded and - * DIR_SHOW_IGNORED_TOO_MODE_MATCHING is also set. - */ - if (excluded && - (dir->flags & DIR_SHOW_IGNORED_TOO) && - (dir->flags & DIR_SHOW_IGNORED_TOO_MODE_MATCHING)) - return path_excluded; + if (!excluded && + !(dir->flags & (DIR_SHOW_IGNORED_TOO | + DIR_HIDE_EMPTY_DIRECTORIES))) { + return path_untracked; + } /* * Even if we don't want to know all the paths under an untracked or diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 34c08c325407..21e48b3ba591 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -746,7 +746,7 @@ test_expect_success 'clean untracked paths by pathspec' ' test_must_be_empty actual ' -test_expect_failure 'avoid traversing into ignored directories' ' +test_expect_success 'avoid traversing into ignored directories' ' test_when_finished rm -f output error trace.* && test_create_repo avoid-traversing-deep-hierarchy && ( -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* [PATCH v5 7/9] dir: traverse into untracked directories if they may have ignored subfiles 2021-05-12 17:28 ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget ` (5 preceding siblings ...) 2021-05-12 17:28 ` [PATCH v5 6/9] dir: avoid unnecessary traversal into ignored directory Elijah Newren via GitGitGadget @ 2021-05-12 17:28 ` Elijah Newren via GitGitGadget 2021-05-12 17:28 ` [PATCH v5 8/9] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget ` (2 subsequent siblings) 9 siblings, 0 replies; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-12 17:28 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com> A directory that is untracked does not imply that all files under it should be categorized as untracked; in particular, if the caller is interested in ignored files, many files or directories underneath the untracked directory may be ignored. We previously partially handled this right with DIR_SHOW_IGNORED_TOO, but missed DIR_SHOW_IGNORED. It was not obvious, though, because the logic for untracked and excluded files had been fused together making it harder to reason about. The previous commit split that logic out, making it easier to notice that DIR_SHOW_IGNORED was missing. Add it. 
Signed-off-by: Elijah Newren <newren@gmail.com> --- dir.c | 10 ++++++---- t/t3001-ls-files-others-exclude.sh | 2 +- t/t7300-clean.sh | 2 +- 3 files changed, 8 insertions(+), 6 deletions(-) diff --git a/dir.c b/dir.c index db2ae516a3aa..c0233bbba36c 100644 --- a/dir.c +++ b/dir.c @@ -1877,15 +1877,17 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* * Other than the path_recurse case above, we only need to - * recurse into untracked directories if either of the following + * recurse into untracked directories if any of the following * bits is set: - * - DIR_SHOW_IGNORED_TOO (because then we need to determine if - * there are ignored entries below) + * - DIR_SHOW_IGNORED (because then we need to determine if + * there are ignored entries below) + * - DIR_SHOW_IGNORED_TOO (same as above) * - DIR_HIDE_EMPTY_DIRECTORIES (because we have to determine if * the directory is empty) */ if (!excluded && - !(dir->flags & (DIR_SHOW_IGNORED_TOO | + !(dir->flags & (DIR_SHOW_IGNORED | + DIR_SHOW_IGNORED_TOO | DIR_HIDE_EMPTY_DIRECTORIES))) { return path_untracked; } diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh index ac05d1a17931..516c95ea0e82 100755 --- a/t/t3001-ls-files-others-exclude.sh +++ b/t/t3001-ls-files-others-exclude.sh @@ -292,7 +292,7 @@ EOF test_cmp expect actual ' -test_expect_failure 'ls-files with "**" patterns and --directory' ' +test_expect_success 'ls-files with "**" patterns and --directory' ' # Expectation same as previous test git ls-files --directory -o -i --exclude "**/a.1" >actual && test_cmp expect actual diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh index 21e48b3ba591..0399701e6276 100755 --- a/t/t7300-clean.sh +++ b/t/t7300-clean.sh @@ -769,7 +769,7 @@ test_expect_success 'avoid traversing into ignored directories' ' test_cmp trace.expect trace.relevant ' -test_expect_failure 'traverse into directories that may have ignored entries' ' +test_expect_success 'traverse into directories that 
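Taken together, the two dir.c patches above (6/9 and 7/9) leave the show-other-directories tail of treat_directory() with a small decision table. The following is a hedged sketch of just that table, assuming the flag semantics described in the commit messages. The SK_* constants, the enum, and classify_other_dir() are hypothetical names with arbitrary bit values, not the definitions from dir.h, and SK_RECURSE merely stands for "fall through to the code that inspects the directory's contents". The earlier MATCHED_RECURSIVELY_LEADING_PATHSPEC shortcut is deliberately left out.

```c
#include <assert.h>

/* Hypothetical stand-ins for the dir.h flag bits; values are arbitrary. */
#define SK_SHOW_IGNORED                   (1u << 0)
#define SK_HIDE_EMPTY_DIRECTORIES         (1u << 1)
#define SK_SHOW_IGNORED_TOO               (1u << 2)
#define SK_SHOW_IGNORED_TOO_MODE_MATCHING (1u << 3)

enum sketch_treatment {
	SK_EXCLUDED,   /* report the directory itself as ignored */
	SK_UNTRACKED,  /* report the directory itself as untracked */
	SK_RECURSE     /* contents must be examined before deciding */
};

static enum sketch_treatment classify_other_dir(unsigned flags, int excluded)
{
	if (excluded) {
		/*
		 * Everything below an ignored directory is ignored too, so
		 * unless empty directories must be hidden there is nothing
		 * to learn by descending.
		 */
		if (!(flags & SK_HIDE_EMPTY_DIRECTORIES))
			return SK_EXCLUDED;

		/* The "matching" mode also lets us stop at the directory. */
		if ((flags & SK_SHOW_IGNORED_TOO) &&
		    (flags & SK_SHOW_IGNORED_TOO_MODE_MATCHING))
			return SK_EXCLUDED;
	}

	/*
	 * An untracked directory can be reported as-is only when the
	 * caller neither asks about ignored entries nor hides empty
	 * directories. SK_SHOW_IGNORED is the bit added by patch 7/9.
	 */
	if (!excluded &&
	    !(flags & (SK_SHOW_IGNORED |
		       SK_SHOW_IGNORED_TOO |
		       SK_HIDE_EMPTY_DIRECTORIES)))
		return SK_UNTRACKED;

	return SK_RECURSE;
}
```

In this model, the "avoid traversing into ignored directories" testcase corresponds to the first early return, and the `git clean -fX` testcase corresponds to an untracked directory with SK_SHOW_IGNORED set, which now yields SK_RECURSE rather than SK_UNTRACKED.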
may have ignored entries' ' test_when_finished rm -f output && test_create_repo need-to-traverse-into-hierarchy && ( -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* [PATCH v5 8/9] dir: update stale description of treat_directory() 2021-05-12 17:28 ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget ` (6 preceding siblings ...) 2021-05-12 17:28 ` [PATCH v5 7/9] dir: traverse into untracked directories if they may have ignored subfiles Elijah Newren via GitGitGadget @ 2021-05-12 17:28 ` Derrick Stolee via GitGitGadget 2021-05-17 17:20 ` Derrick Stolee 2021-05-12 17:28 ` [PATCH v5 9/9] dir: introduce readdir_skip_dot_and_dotdot() helper Elijah Newren via GitGitGadget 2021-05-17 17:23 ` [PATCH v5 0/9] Directory traversal fixes Derrick Stolee 9 siblings, 1 reply; 90+ messages in thread From: Derrick Stolee via GitGitGadget @ 2021-05-12 17:28 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler, Elijah Newren, Derrick Stolee From: Derrick Stolee <stolee@gmail.com> The documentation comment for treat_directory() was originally written in 095952 (Teach directory traversal about subprojects, 2007-04-11) which was before the 'struct dir_struct' split its bitfield of named options into a 'flags' enum in 7c4c97c0 (Turn the flags in struct dir_struct into a single variable, 2009-02-16). When those flags changed, the comment became stale, since members like 'show_other_directories' transitioned into flags like DIR_SHOW_OTHER_DIRECTORIES. Update the comments for treat_directory() to use these flag names rather than the old member names. 
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> --- dir.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/dir.c b/dir.c index c0233bbba36c..4794c822b47f 100644 --- a/dir.c +++ b/dir.c @@ -1749,13 +1749,13 @@ static enum exist_status directory_exists_in_index(struct index_state *istate, * Case 3: if we didn't have it in the index previously, we * have a few sub-cases: * - * (a) if "show_other_directories" is true, we show it as - * just a directory, unless "hide_empty_directories" is + * (a) if DIR_SHOW_OTHER_DIRECTORIES flag is set, we show it as + * just a directory, unless DIR_HIDE_EMPTY_DIRECTORIES is * also true, in which case we need to check if it contains any * untracked and / or ignored files. - * (b) if it looks like a git directory, and we don't have - * 'no_gitlinks' set we treat it as a gitlink, and show it - * as a directory. + * (b) if it looks like a git directory and we don't have the + * DIR_NO_GITLINKS flag, then we treat it as a gitlink, and + * show it as a directory. * (c) otherwise, we recurse into it. */ static enum path_treatment treat_directory(struct dir_struct *dir, @@ -1843,7 +1843,6 @@ static enum path_treatment treat_directory(struct dir_struct *dir, return path_recurse; } - /* This is the "show_other_directories" case */ assert(dir->flags & DIR_SHOW_OTHER_DIRECTORIES); /* @@ -1858,7 +1857,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir, /* Special cases for where this directory is excluded/ignored */ if (excluded) { /* - * In the show_other_directories case, if we're not + * If DIR_SHOW_OTHER_DIRECTORIES is set and we're not * hiding empty directories, there is no need to * recurse into an ignored directory. */ -- gitgitgadget ^ permalink raw reply related [flat|nested] 90+ messages in thread
* Re: [PATCH v5 8/9] dir: update stale description of treat_directory() 2021-05-12 17:28 ` [PATCH v5 8/9] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget @ 2021-05-17 17:20 ` Derrick Stolee 2021-05-17 19:44 ` Junio C Hamano 0 siblings, 1 reply; 90+ messages in thread From: Derrick Stolee @ 2021-05-17 17:20 UTC (permalink / raw) To: Derrick Stolee via GitGitGadget, git Cc: Eric Sunshine, Elijah Newren, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler On 5/12/2021 1:28 PM, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <stolee@gmail.com> > > The documentation comment for treat_directory() was originally written > in 095952 (Teach directory traversal about subprojects, 2007-04-11) > which was before the 'struct dir_struct' split its bitfield of named > options into a 'flags' enum in 7c4c97c0 (Turn the flags in struct > dir_struct into a single variable, 2009-02-16). When those flags > changed, the comment became stale, since members like > 'show_other_directories' transitioned into flags like > DIR_SHOW_OTHER_DIRECTORIES. > > Update the comments for treat_directory() to use these flag names rather > than the old member names. > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > Reviewed-by: Elijah Newren <newren@gmail.com> I think you want the "Reviewed-by" before the "Signed-off-by", followed by your own sign-off. Thanks, -Stolee ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH v5 8/9] dir: update stale description of treat_directory() 2021-05-17 17:20 ` Derrick Stolee @ 2021-05-17 19:44 ` Junio C Hamano 2021-05-18 3:32 ` Elijah Newren 0 siblings, 1 reply; 90+ messages in thread From: Junio C Hamano @ 2021-05-17 19:44 UTC (permalink / raw) To: Derrick Stolee Cc: Derrick Stolee via GitGitGadget, git, Eric Sunshine, Elijah Newren, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler Derrick Stolee <stolee@gmail.com> writes: > On 5/12/2021 1:28 PM, Derrick Stolee via GitGitGadget wrote: >> From: Derrick Stolee <stolee@gmail.com> >> >> The documentation comment for treat_directory() was originally written >> in 095952 (Teach directory traversal about subprojects, 2007-04-11) >> which was before the 'struct dir_struct' split its bitfield of named >> options into a 'flags' enum in 7c4c97c0 (Turn the flags in struct >> dir_struct into a single variable, 2009-02-16). When those flags >> changed, the comment became stale, since members like >> 'show_other_directories' transitioned into flags like >> DIR_SHOW_OTHER_DIRECTORIES. >> >> Update the comments for treat_directory() to use these flag names rather >> than the old member names. >> >> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> >> Reviewed-by: Elijah Newren <newren@gmail.com> > > I think you want the "Reviewed-by" before the "Signed-off-by", > followed by your own sign-off. Grabbing somebody else's signed-off patch, and forwarding it (with or without tweaks and enhancements) with your own sign-off would be a sufficient sign that you've inspected the patch deeply enough to be confident that it is worth forwarding. So I think you can even lose the reviewed-by. But as long as you are relaying somebody else's patch, DCO asks you to sign it off yourself. Thanks. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH v5 8/9] dir: update stale description of treat_directory() 2021-05-17 19:44 ` Junio C Hamano @ 2021-05-18 3:32 ` Elijah Newren 2021-05-19 1:44 ` Junio C Hamano 0 siblings, 1 reply; 90+ messages in thread From: Elijah Newren @ 2021-05-18 3:32 UTC (permalink / raw) To: Junio C Hamano Cc: Derrick Stolee, Derrick Stolee via GitGitGadget, Git Mailing List, Eric Sunshine, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler On Mon, May 17, 2021 at 12:44 PM Junio C Hamano <gitster@pobox.com> wrote: > > Derrick Stolee <stolee@gmail.com> writes: > > > On 5/12/2021 1:28 PM, Derrick Stolee via GitGitGadget wrote: > >> From: Derrick Stolee <stolee@gmail.com> > >> > >> The documentation comment for treat_directory() was originally written > >> in 095952 (Teach directory traversal about subprojects, 2007-04-11) > >> which was before the 'struct dir_struct' split its bitfield of named > >> options into a 'flags' enum in 7c4c97c0 (Turn the flags in struct > >> dir_struct into a single variable, 2009-02-16). When those flags > >> changed, the comment became stale, since members like > >> 'show_other_directories' transitioned into flags like > >> DIR_SHOW_OTHER_DIRECTORIES. > >> > >> Update the comments for treat_directory() to use these flag names rather > >> than the old member names. > >> > >> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > >> Reviewed-by: Elijah Newren <newren@gmail.com> > > > > I think you want the "Reviewed-by" before the "Signed-off-by", > > followed by your own sign-off. > > Grabbing somebody else's signed-off patch, and forwarding it (with > or without tweaks and enhancements) with your own sign-off would be > a sufficient sign that you've inspected the patch deeply enough to > be confident that it is worth forwarding. So I think you can even > lose the reviewed-by. > > But as long as you are relaying somebody else's patch, DCO asks you > to sign it off yourself. > > Thanks. 
I was going to go fix this up, but it looks like en/dir-traversal has already merged down to next. We could revert the last two patches of the series out of next (allowing the first seven with the important fixes to merge down) and then I could resubmit just the last two patches. Or we could just let them all merge down as-is. Preferences? ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [PATCH v5 8/9] dir: update stale description of treat_directory() 2021-05-18 3:32 ` Elijah Newren @ 2021-05-19 1:44 ` Junio C Hamano 0 siblings, 0 replies; 90+ messages in thread From: Junio C Hamano @ 2021-05-19 1:44 UTC (permalink / raw) To: Elijah Newren Cc: Derrick Stolee, Derrick Stolee via GitGitGadget, Git Mailing List, Eric Sunshine, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler Elijah Newren <newren@gmail.com> writes: > We could revert the last two patches of the series out of next > (allowing the first seven with the important fixes to merge down) and > then I could resubmit just the last two patches. Or we could just let > them all merge down as-is. Preferences? The former, thanks. ^ permalink raw reply [flat|nested] 90+ messages in thread
* [PATCH v5 9/9] dir: introduce readdir_skip_dot_and_dotdot() helper 2021-05-12 17:28 ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget ` (7 preceding siblings ...) 2021-05-12 17:28 ` [PATCH v5 8/9] dir: update stale description of treat_directory() Derrick Stolee via GitGitGadget @ 2021-05-12 17:28 ` Elijah Newren via GitGitGadget 2021-05-17 17:22 ` Derrick Stolee 2021-05-17 17:23 ` [PATCH v5 0/9] Directory traversal fixes Derrick Stolee 9 siblings, 1 reply; 90+ messages in thread From: Elijah Newren via GitGitGadget @ 2021-05-12 17:28 UTC (permalink / raw) To: git Cc: Eric Sunshine, Elijah Newren, Derrick Stolee, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler, Elijah Newren, Elijah Newren From: Elijah Newren <newren@gmail.com>

Many places in the code were doing

    while ((d = readdir(dir)) != NULL) {
        if (is_dot_or_dotdot(d->d_name))
            continue;
        ...process d...
    }

Introduce a readdir_skip_dot_and_dotdot() helper to make that a one-liner:

    while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) {
        ...process d...
    }

This helper particularly simplifies checks for empty directories.

Also use this helper in read_cached_dir() so that our statistics are consistent across platforms. (In other words, read_cached_dir() should have been using is_dot_or_dotdot() and skipping such entries, but did not and left it to treat_path() to detect and mark such entries as path_none.)
Signed-off-by: Elijah Newren <newren@gmail.com> --- builtin/clean.c | 4 +--- builtin/worktree.c | 4 +--- diff-no-index.c | 5 ++--- dir.c | 26 +++++++++++++++++--------- dir.h | 2 ++ entry.c | 5 +---- notes-merge.c | 5 +---- object-file.c | 4 +--- packfile.c | 5 +---- rerere.c | 4 +--- worktree.c | 12 +++--------- 11 files changed, 31 insertions(+), 45 deletions(-) diff --git a/builtin/clean.c b/builtin/clean.c index 995053b79173..a1a57476153b 100644 --- a/builtin/clean.c +++ b/builtin/clean.c @@ -189,10 +189,8 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag, strbuf_complete(path, '/'); len = path->len; - while ((e = readdir(dir)) != NULL) { + while ((e = readdir_skip_dot_and_dotdot(dir)) != NULL) { struct stat st; - if (is_dot_or_dotdot(e->d_name)) - continue; strbuf_setlen(path, len); strbuf_addstr(path, e->d_name); diff --git a/builtin/worktree.c b/builtin/worktree.c index 877145349381..ae28249e0f0b 100644 --- a/builtin/worktree.c +++ b/builtin/worktree.c @@ -118,10 +118,8 @@ static void prune_worktrees(void) struct dirent *d; if (!dir) return; - while ((d = readdir(dir)) != NULL) { + while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) { char *path; - if (is_dot_or_dotdot(d->d_name)) - continue; strbuf_reset(&reason); if (should_prune_worktree(d->d_name, &reason, &path, expire)) prune_worktree(d->d_name, reason.buf); diff --git a/diff-no-index.c b/diff-no-index.c index 7814eabfe028..e5cc87837143 100644 --- a/diff-no-index.c +++ b/diff-no-index.c @@ -26,9 +26,8 @@ static int read_directory_contents(const char *path, struct string_list *list) if (!(dir = opendir(path))) return error("Could not open directory %s", path); - while ((e = readdir(dir))) - if (!is_dot_or_dotdot(e->d_name)) - string_list_insert(list, e->d_name); + while ((e = readdir_skip_dot_and_dotdot(dir))) + string_list_insert(list, e->d_name); closedir(dir); return 0; diff --git a/dir.c b/dir.c index 4794c822b47f..66c8518947dd 100644 --- a/dir.c +++ b/dir.c @@ 
-59,6 +59,18 @@ void dir_init(struct dir_struct *dir)
 	memset(dir, 0, sizeof(*dir));
 }
 
+struct dirent *
+readdir_skip_dot_and_dotdot(DIR *dirp)
+{
+	struct dirent *e;
+
+	while ((e = readdir(dirp)) != NULL) {
+		if (!is_dot_or_dotdot(e->d_name))
+			break;
+	}
+	return e;
+}
+
 int count_slashes(const char *s)
 {
 	int cnt = 0;
@@ -2341,7 +2353,7 @@ static int read_cached_dir(struct cached_dir *cdir)
 	struct dirent *de;
 
 	if (cdir->fdir) {
-		de = readdir(cdir->fdir);
+		de = readdir_skip_dot_and_dotdot(cdir->fdir);
 		if (!de) {
 			cdir->d_name = NULL;
 			cdir->d_type = DT_UNKNOWN;
@@ -2940,11 +2952,9 @@ int is_empty_dir(const char *path)
 	if (!dir)
 		return 0;
 
-	while ((e = readdir(dir)) != NULL)
-		if (!is_dot_or_dotdot(e->d_name)) {
-			ret = 0;
-			break;
-		}
+	e = readdir_skip_dot_and_dotdot(dir);
+	if (e)
+		ret = 0;
 
 	closedir(dir);
 	return ret;
@@ -2984,10 +2994,8 @@ static int remove_dir_recurse(struct strbuf *path, int flag, int *kept_up)
 	strbuf_complete(path, '/');
 
 	len = path->len;
-	while ((e = readdir(dir)) != NULL) {
+	while ((e = readdir_skip_dot_and_dotdot(dir)) != NULL) {
 		struct stat st;
-		if (is_dot_or_dotdot(e->d_name))
-			continue;
 
 		strbuf_setlen(path, len);
 		strbuf_addstr(path, e->d_name);
diff --git a/dir.h b/dir.h
index 22c67907f689..a704e466afd5 100644
--- a/dir.h
+++ b/dir.h
@@ -342,6 +342,8 @@ struct dir_struct {
 	unsigned visited_directories;
 };
 
+struct dirent *readdir_skip_dot_and_dotdot(DIR *dirp);
+
 /*Count the number of slashes for string s*/
 int count_slashes(const char *s);
 
diff --git a/entry.c b/entry.c
index 2dc94ba5cc2a..6da589696770 100644
--- a/entry.c
+++ b/entry.c
@@ -57,12 +57,9 @@ static void remove_subtree(struct strbuf *path)
 
 	if (!dir)
 		die_errno("cannot opendir '%s'", path->buf);
-	while ((de = readdir(dir)) != NULL) {
+	while ((de = readdir_skip_dot_and_dotdot(dir)) != NULL) {
 		struct stat st;
 
-		if (is_dot_or_dotdot(de->d_name))
-			continue;
-
 		strbuf_addch(path, '/');
 		strbuf_addstr(path, de->d_name);
 		if (lstat(path->buf, &st))
diff --git a/notes-merge.c b/notes-merge.c
index d2771fa3d43c..e9d6f86d3428 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -695,13 +695,10 @@ int notes_merge_commit(struct notes_merge_options *o,
 
 	strbuf_addch(&path, '/');
 	baselen = path.len;
-	while ((e = readdir(dir)) != NULL) {
+	while ((e = readdir_skip_dot_and_dotdot(dir)) != NULL) {
 		struct stat st;
 		struct object_id obj_oid, blob_oid;
 
-		if (is_dot_or_dotdot(e->d_name))
-			continue;
-
 		if (get_oid_hex(e->d_name, &obj_oid)) {
 			if (o->verbosity >= 3)
 				printf("Skipping non-SHA1 entry '%s%s'\n",
diff --git a/object-file.c b/object-file.c
index 624af408cdcd..77bdcfd21bc8 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2304,10 +2304,8 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr,
 	strbuf_addch(path, '/');
 	baselen = path->len;
 
-	while ((de = readdir(dir))) {
+	while ((de = readdir_skip_dot_and_dotdot(dir))) {
 		size_t namelen;
-		if (is_dot_or_dotdot(de->d_name))
-			continue;
 
 		namelen = strlen(de->d_name);
 		strbuf_setlen(path, baselen);
diff --git a/packfile.c b/packfile.c
index 8668345d9309..7c8f1b7202ca 100644
--- a/packfile.c
+++ b/packfile.c
@@ -813,10 +813,7 @@ void for_each_file_in_pack_dir(const char *objdir,
 	}
 	strbuf_addch(&path, '/');
 	dirnamelen = path.len;
-	while ((de = readdir(dir)) != NULL) {
-		if (is_dot_or_dotdot(de->d_name))
-			continue;
-
+	while ((de = readdir_skip_dot_and_dotdot(dir)) != NULL) {
 		strbuf_setlen(&path, dirnamelen);
 		strbuf_addstr(&path, de->d_name);
 
diff --git a/rerere.c b/rerere.c
index dee60dc6df63..d83d58df4fbc 100644
--- a/rerere.c
+++ b/rerere.c
@@ -1190,13 +1190,11 @@ void rerere_gc(struct repository *r, struct string_list *rr)
 	if (!dir)
 		die_errno(_("unable to open rr-cache directory"));
 	/* Collect stale conflict IDs ... */
-	while ((e = readdir(dir))) {
+	while ((e = readdir_skip_dot_and_dotdot(dir))) {
 		struct rerere_dir *rr_dir;
 		struct rerere_id id;
 		int now_empty;
 
-		if (is_dot_or_dotdot(e->d_name))
-			continue;
 		if (!is_rr_cache_dirname(e->d_name))
 			continue; /* or should we remove e->d_name? */
 
diff --git a/worktree.c b/worktree.c
index f35ac40a84a5..237517baee67 100644
--- a/worktree.c
+++ b/worktree.c
@@ -128,10 +128,8 @@ struct worktree **get_worktrees(void)
 	dir = opendir(path.buf);
 	strbuf_release(&path);
 	if (dir) {
-		while ((d = readdir(dir)) != NULL) {
+		while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) {
 			struct worktree *linked = NULL;
-			if (is_dot_or_dotdot(d->d_name))
-				continue;
 
 			if ((linked = get_linked_worktree(d->d_name))) {
 				ALLOC_GROW(list, counter + 1, alloc);
@@ -486,13 +484,9 @@ int submodule_uses_worktrees(const char *path)
 	if (!dir)
 		return 0;
 
-	while ((d = readdir(dir)) != NULL) {
-		if (is_dot_or_dotdot(d->d_name))
-			continue;
-
+	d = readdir_skip_dot_and_dotdot(dir);
+	if (d != NULL)
 		ret = 1;
-		break;
-	}
 	closedir(dir);
 	return ret;
 }
-- 
gitgitgadget

* Re: [PATCH v5 9/9] dir: introduce readdir_skip_dot_and_dotdot() helper
  2021-05-12 17:28 ` [PATCH v5 9/9] dir: introduce readdir_skip_dot_and_dotdot() helper Elijah Newren via GitGitGadget
@ 2021-05-17 17:22   ` Derrick Stolee
  2021-05-18  3:34     ` Elijah Newren
  0 siblings, 1 reply; 90+ messages in thread
From: Derrick Stolee @ 2021-05-17 17:22 UTC
  To: Elijah Newren via GitGitGadget, git
  Cc: Eric Sunshine, Elijah Newren, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler

On 5/12/2021 1:28 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
>
> Many places in the code were doing
>     while ((d = readdir(dir)) != NULL) {
>         if (is_dot_or_dotdot(d->d_name))
>             continue;
>         ...process d...
>     }
> Introduce a readdir_skip_dot_and_dotdot() helper to make that a one-liner:
>     while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) {
>         ...process d...
>     }
>
> This helper particularly simplifies checks for empty directories.
>
> Also use this helper in read_cached_dir() so that our statistics are
> consistent across platforms.  (In other words, read_cached_dir() should
> have been using is_dot_or_dotdot() and skipping such entries, but did
> not and left it to treat_path() to detect and mark such entries as
> path_none.)

I like the idea of this helper!

> +struct dirent *
> +readdir_skip_dot_and_dotdot(DIR *dirp)

nit: This seems like an accidental newline between the
return type and the method name.

Otherwise, patch LGTM.

-Stolee

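For readers without a git checkout at hand, the pattern under review can be tried outside of git with a minimal standalone sketch. It is not the code from the patch: every name ending in _sketch is hypothetical, is_dot_or_dotdot_sketch() is only a stand-in for git's is_dot_or_dotdot(), and count_entries_sketch() is added here purely to show the one-line loop form. Only opendir(), readdir(), closedir() and strcmp() are assumed.

```c
#include <dirent.h>
#include <string.h>

/* Hypothetical stand-in for git's is_dot_or_dotdot(); not git's code. */
static int is_dot_or_dotdot_sketch(const char *name)
{
	return !strcmp(name, ".") || !strcmp(name, "..");
}

/*
 * Same shape as the helper in the patch: return the next entry that is
 * neither "." nor "..", or NULL once the directory is exhausted.
 */
static struct dirent *readdir_skip_dot_and_dotdot_sketch(DIR *dirp)
{
	struct dirent *e;

	while ((e = readdir(dirp)) != NULL) {
		if (!is_dot_or_dotdot_sketch(e->d_name))
			break;
	}
	return e;
}

/*
 * Empty-directory check: one call to the helper is enough.
 * Returns 1 if path is an empty directory, 0 otherwise
 * (including when path cannot be opened).
 */
static int is_empty_dir_sketch(const char *path)
{
	DIR *dir = opendir(path);
	int ret;

	if (!dir)
		return 0;
	ret = readdir_skip_dot_and_dotdot_sketch(dir) == NULL;
	closedir(dir);
	return ret;
}

/*
 * The one-liner loop form: count entries other than "." and "..".
 * Returns -1 if path cannot be opened.
 */
static int count_entries_sketch(const char *path)
{
	DIR *dir = opendir(path);
	int n = 0;

	if (!dir)
		return -1;
	while (readdir_skip_dot_and_dotdot_sketch(dir) != NULL)
		n++;
	closedir(dir);
	return n;
}
```

The emptiness check needs no loop at the call site because the helper already skips "." and "..", so a NULL from its first call means the directory has no other entries.
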
* Re: [PATCH v5 9/9] dir: introduce readdir_skip_dot_and_dotdot() helper
  2021-05-17 17:22   ` Derrick Stolee
@ 2021-05-18  3:34     ` Elijah Newren
  0 siblings, 0 replies; 90+ messages in thread
From: Elijah Newren @ 2021-05-18  3:34 UTC
  To: Derrick Stolee
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Eric Sunshine, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler

On Mon, May 17, 2021 at 10:22 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 5/12/2021 1:28 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Many places in the code were doing
> >     while ((d = readdir(dir)) != NULL) {
> >         if (is_dot_or_dotdot(d->d_name))
> >             continue;
> >         ...process d...
> >     }
> > Introduce a readdir_skip_dot_and_dotdot() helper to make that a one-liner:
> >     while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) {
> >         ...process d...
> >     }
> >
> > This helper particularly simplifies checks for empty directories.
> >
> > Also use this helper in read_cached_dir() so that our statistics are
> > consistent across platforms.  (In other words, read_cached_dir() should
> > have been using is_dot_or_dotdot() and skipping such entries, but did
> > not and left it to treat_path() to detect and mark such entries as
> > path_none.)
>
> I like the idea of this helper!
>
> > +struct dirent *
> > +readdir_skip_dot_and_dotdot(DIR *dirp)
>
> nit: This seems like an accidental newline between the
> return type and the method name.

I would fix this, but the patch is already in next.  If Junio decides
to revert the last two patches out of next because of the
Signed-off-by issue on the previous patch, then I'll resubmit this
patch as well with this issue fixed.

* Re: [PATCH v5 0/9] Directory traversal fixes
  2021-05-12 17:28 ` [PATCH v5 0/9] Directory traversal fixes Elijah Newren via GitGitGadget
                     ` (8 preceding siblings ...)
  2021-05-12 17:28   ` [PATCH v5 9/9] dir: introduce readdir_skip_dot_and_dotdot() helper Elijah Newren via GitGitGadget
@ 2021-05-17 17:23   ` Derrick Stolee
  9 siblings, 0 replies; 90+ messages in thread
From: Derrick Stolee @ 2021-05-17 17:23 UTC
  To: Elijah Newren via GitGitGadget, git
  Cc: Eric Sunshine, Elijah Newren, Jeff King, Philip Oakley, Jeff Hostetler, Josh Steadmon, Jeff Hostetler

On 5/12/2021 1:28 PM, Elijah Newren via GitGitGadget wrote:
> This patchset fixes a few directory traversal issues, where fill_directory()
> would traverse into directories that it shouldn't and not traverse into
> directories that it should (one of which was originally reported on this
> list at [1]). And it includes a few cleanups

Sorry that I've been sleeping on this series since v1. I re-read
this version from scratch and only found a couple nitpicks.

> If anyone has any ideas about a better place to put the "Some sidenotes"
> from the sixth commit message rather than keeping them in a random commit
> message, that might be helpful.

I don't have any better ideas, sorry.

Thanks,
-Stolee
