git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [RFC PATCH 0/1] dir: consider worktree config in path recursion
@ 2022-05-05 20:32 Goss Geppert
  2022-05-05 20:32 ` [RFC PATCH 1/1] " Goss Geppert
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Goss Geppert @ 2022-05-05 20:32 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, christian w, Elijah Newren, Derrick Stolee

Please see the commit message for a description. Effectively, since
8d92fb2927 (dir: replace exponential algorithm with a linear one,
2020-04-01) the following no longer does what it used to do:

```
test_repo="$(mktemp -d /tmp/git-test-XXXXXXX)"
cd "$test_repo"
>test-file
git init
git config core.worktree /tmp
git add test-file
```

In terms of desired behavior, it seems that if it is possible to track
files that are outside the default worktree, then it should also be
possible to track files inside the default worktree location.
Personally, I was using this setup to track dotfiles and some other
system related snippets and liked that design more than alternatives I'm
aware of.  I ran into this problem because I was recently unable to add a
file to an existing repository with this configuration.

There are a couple open items but I wanted to get some feedback to
confirm that the behavior of this patch is desirable or if I'm
overlooking something entirely.

TODOs:
[ ] add test
[ ] add signoffs

Goss Geppert (1):
  dir: consider worktree config in path recursion

 dir.c | 37 ++++++++++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 7 deletions(-)


base-commit: 0f828332d5ac36fc63b7d8202652efa152809856
-- 
2.36.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH 1/1] dir: consider worktree config in path recursion
  2022-05-05 20:32 [RFC PATCH 0/1] dir: consider worktree config in path recursion Goss Geppert
@ 2022-05-05 20:32 ` Goss Geppert
  2022-05-07  3:26   ` Elijah Newren
  2022-05-06 17:02 ` [RFC PATCH 0/1] " Junio C Hamano
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Goss Geppert @ 2022-05-05 20:32 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, christian w, Elijah Newren, Derrick Stolee

Since 8d92fb2927 (dir: replace exponential algorithm with a linear one,
2020-04-01) the following no longer works:

    1) Initialize a repository.
    2) Set the `core.worktree` location to a parent directory of the
       default worktree.
    3) Add a file located in the default worktree location to the index
       (i.e. anywhere in the immediate parent directory of the gitdir).

This commit adds a check to determine whether a nested repository that
is encountered in recursing a path is actually `the_repository`.  If so,
simply treat the directory as if it doesn't contain a nested repository.

Prior to this commit, the `add` operation was silently ignored.
---
 dir.c | 37 ++++++++++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/dir.c b/dir.c
index f2b0f24210..cef39f43d8 100644
--- a/dir.c
+++ b/dir.c
@@ -1861,7 +1861,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	 */
 	enum path_treatment state;
 	int matches_how = 0;
-	int nested_repo = 0, check_only, stop_early;
+	int check_only, stop_early;
 	int old_ignored_nr, old_untracked_nr;
 	/* The "len-1" is to strip the final '/' */
 	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
@@ -1893,16 +1893,39 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 
 	if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
 		!(dir->flags & DIR_NO_GITLINKS)) {
+		/*
+		 * Determine if `dirname` is a nested repo by confirming that:
+		 * 1) we are in a nonbare repository, and
+		 * 2) `dirname` is not an immediate parent of `the_repository->gitdir`,
+		 *    which could occur if the `worktree` location was manually
+		 *    configured by the user
+		 */
+		int nested_repo;
 		struct strbuf sb = STRBUF_INIT;
 		strbuf_addstr(&sb, dirname);
 		nested_repo = is_nonbare_repository_dir(&sb);
+
+		if (nested_repo) {
+			char *real_dirname, *real_gitdir;
+			strbuf_reset(&sb);
+			strbuf_addstr(&sb, dirname);
+			strbuf_complete(&sb, '/');
+			strbuf_addstr(&sb, ".git");
+			real_dirname = real_pathdup(sb.buf, 0);
+			real_gitdir = real_pathdup(the_repository->gitdir, 0);
+
+			nested_repo = !!strcmp(real_dirname, real_gitdir);
+			free(real_gitdir);
+			free(real_dirname);
+		}
 		strbuf_release(&sb);
-	}
-	if (nested_repo) {
-		if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
-		    (matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC))
-			return path_none;
-		return excluded ? path_excluded : path_untracked;
+
+		if (nested_repo) {
+			if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
+				(matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC))
+				return path_none;
+			return excluded ? path_excluded : path_untracked;
+		}
 	}
 
 	if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) {
-- 
2.36.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 0/1] dir: consider worktree config in path recursion
  2022-05-05 20:32 [RFC PATCH 0/1] dir: consider worktree config in path recursion Goss Geppert
  2022-05-05 20:32 ` [RFC PATCH 1/1] " Goss Geppert
@ 2022-05-06 17:02 ` Junio C Hamano
  2022-05-06 20:00   ` oss dev
  2022-05-10 17:15 ` [PATCH v2 0/2] " Goss Geppert
  2022-05-20 19:28 ` [PATCH v3 0/3] dir: traverse into repository Goss Geppert
  3 siblings, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2022-05-06 17:02 UTC (permalink / raw)
  To: Goss Geppert; +Cc: git, christian w, Elijah Newren, Derrick Stolee

Goss Geppert <gg.oss.dev@gmail.com> writes:

> Please see the commit message for a description. Effectively, since
> 8d92fb2927 (dir: replace exponential algorithm with a linear one,
> 2020-04-01) the following no longer does what it used to do:
>
> ```
> test_repo="$(mktemp -d /tmp/git-test-XXXXXXX)"
> cd "$test_repo"
> >test-file
> git init
> git config core.worktree /tmp
> git add test-file

And this is supposed to add a new entry to the index at
"git-test-123456/test-file"?

If it added a new index entry 'test-file', that means "git config
core.worktree" is completely ignored, because we are not treating
"/tmp" as the root of the working tree as it is instructed and that
sounds like a bug.  Is that what you mean by "no longer does what it
used to do"?

Thanks for raising an issue (although it is not quite clear yet to
me what the issue is, exactly ;-).

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 0/1] dir: consider worktree config in path recursion
  2022-05-06 17:02 ` [RFC PATCH 0/1] " Junio C Hamano
@ 2022-05-06 20:00   ` oss dev
  0 siblings, 0 replies; 24+ messages in thread
From: oss dev @ 2022-05-06 20:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, christian w, Elijah Newren, Derrick Stolee

On Fri, May 6, 2022 at 1:02 PM Junio C Hamano <gitster@pobox.com> wrote:

> And this is supposed to add a new entry to the index at
> "git-test-123456/test-file"?

Correct, I expected an index entry at "git-test-123456/test-file".  If
I run the following:

```
test_repo="$(mktemp -d /tmp/git-test-XXXXXXX)"
cd "$test_repo"
>test-file
>/tmp/test-file
git init
git config core.worktree /tmp
git add test-file
git add /tmp/test-file
```

.. then with current git I get an index entry for "/tmp/test-file" but
not "/tmp/git-test-123456/test-file".  That is, I can add files
outside the standard worktree location but not inside it.

A playground example of the prior behavior (git 2.25.1) is here:
https://www.onlinegdb.com/OXt2cvL8S

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 1/1] dir: consider worktree config in path recursion
  2022-05-05 20:32 ` [RFC PATCH 1/1] " Goss Geppert
@ 2022-05-07  3:26   ` Elijah Newren
  2022-05-07 17:59     ` oss dev
  0 siblings, 1 reply; 24+ messages in thread
From: Elijah Newren @ 2022-05-07  3:26 UTC (permalink / raw)
  To: Goss Geppert
  Cc: Git Mailing List, Junio C Hamano, christian w, Derrick Stolee

On Thu, May 5, 2022 at 1:32 PM Goss Geppert <gg.oss.dev@gmail.com> wrote:
>
> Since 8d92fb2927 (dir: replace exponential algorithm with a linear one,
> 2020-04-01) the following no longer works:
>
>     1) Initialize a repository.
>     2) Set the `core.worktree` location to a parent directory of the
>        default worktree.
>     3) Add a file located in the default worktree location to the index
>        (i.e. anywhere in the immediate parent directory of the gitdir).
>
> This commit adds a check to determine whether a nested repository that
> is encountered in recursing a path is actually `the_repository`.  If so,
> simply treat the directory as if it doesn't contain a nested repository.
>
> Prior to this commit, the `add` operation was silently ignored.
> ---
>  dir.c | 37 ++++++++++++++++++++++++++++++-------
>  1 file changed, 30 insertions(+), 7 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index f2b0f24210..cef39f43d8 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1861,7 +1861,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>          */
>         enum path_treatment state;
>         int matches_how = 0;
> -       int nested_repo = 0, check_only, stop_early;
> +       int check_only, stop_early;

This part of the patch, along with two other parts below, is
orthogonal to the actual fix being made.  It probably makes the code
clearer, but should be done in an independent cleanup commit.

>         int old_ignored_nr, old_untracked_nr;
>         /* The "len-1" is to strip the final '/' */
>         enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
> @@ -1893,16 +1893,39 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>
>         if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
>                 !(dir->flags & DIR_NO_GITLINKS)) {
> +               /*
> +                * Determine if `dirname` is a nested repo by confirming that:
> +                * 1) we are in a nonbare repository, and
> +                * 2) `dirname` is not an immediate parent of `the_repository->gitdir`,
> +                *    which could occur if the `worktree` location was manually
> +                *    configured by the user
> +                */

This is a good comment, but it'd be really nice to be able to point a
user at a testcase in the testsuite for them to inspect further (e.g.
"see t####, 'TESTCASE DESCRIPTION' for an case where this matters").

> +               int nested_repo;

Just this line immediately above this comment is part of the orthogonal cleanup.

>                 struct strbuf sb = STRBUF_INIT;
>                 strbuf_addstr(&sb, dirname);
>                 nested_repo = is_nonbare_repository_dir(&sb);
> +
> +               if (nested_repo) {
> +                       char *real_dirname, *real_gitdir;
> +                       strbuf_reset(&sb);
> +                       strbuf_addstr(&sb, dirname);
> +                       strbuf_complete(&sb, '/');
> +                       strbuf_addstr(&sb, ".git");

> +                       real_dirname = real_pathdup(sb.buf, 0);
> +                       real_gitdir = real_pathdup(the_repository->gitdir, 0);

I haven't thought it through, but is there a reason you can't just
compare the_repository->gitdir to sb.buf at this point?  Why is
real_pathdup needed?

> +
> +                       nested_repo = !!strcmp(real_dirname, real_gitdir);
> +                       free(real_gitdir);
> +                       free(real_dirname);
> +               }

Everything up to here other than the two parts I mentioned as being
orthogonal (both relating to where "nested_repo" is defined), makes up
the actual fix.

>                 strbuf_release(&sb);
> -       }
> -       if (nested_repo) {
> -               if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
> -                   (matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC))
> -                       return path_none;
> -               return excluded ? path_excluded : path_untracked;
> +
> +               if (nested_repo) {
> +                       if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
> +                               (matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC))
> +                               return path_none;
> +                       return excluded ? path_excluded : path_untracked;
> +               }

This final block is part of the orthogonal code cleanup that belongs
in a separate commit.

>         }
>
>         if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) {
> --
> 2.36.0

Thanks for finding, reporting, and sending in a patch.  Could you
split up the patch as indicated above, and add a testcase to the patch
with the fix, and include your Signed-off-by trailer on the commits?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 1/1] dir: consider worktree config in path recursion
  2022-05-07  3:26   ` Elijah Newren
@ 2022-05-07 17:59     ` oss dev
  0 siblings, 0 replies; 24+ messages in thread
From: oss dev @ 2022-05-07 17:59 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Git Mailing List, Junio C Hamano, christian w, Derrick Stolee

On Fri, May 6, 2022 at 11:26 PM Elijah Newren <newren@gmail.com> wrote:
> I haven't thought it through, but is there a reason you can't just
> compare the_repository->gitdir to sb.buf at this point?  Why is
> real_pathdup needed?

I used `real_pathdup` here because `the_repository->gitdir` is not necessarily
normalized as required for the `strcmp`.  For example, when using option
`--git-dir=./././xyz/.git` while outside the worktree, the path will remain as
is (see [1] and [2]).

[1]: https://github.com/git/git/blob/e8005e4871f130c4e402ddca2032c111252f070a/setup.c#L962
[2]: https://github.com/git/git/blob/e8005e4871f130c4e402ddca2032c111252f070a/environment.c#L353

> Thanks for finding, reporting, and sending in a patch.  Could you
> split up the patch as indicated above, and add a testcase to the patch
> with the fix, and include your Signed-off-by trailer on the commits?

WIll do, thx.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 0/2] dir: consider worktree config in path recursion
  2022-05-05 20:32 [RFC PATCH 0/1] dir: consider worktree config in path recursion Goss Geppert
  2022-05-05 20:32 ` [RFC PATCH 1/1] " Goss Geppert
  2022-05-06 17:02 ` [RFC PATCH 0/1] " Junio C Hamano
@ 2022-05-10 17:15 ` Goss Geppert
  2022-05-10 17:15   ` [PATCH v2 1/2] " Goss Geppert
  2022-05-10 17:15   ` [PATCH v2 2/2] dir: minor refactoring / clean-up Goss Geppert
  2022-05-20 19:28 ` [PATCH v3 0/3] dir: traverse into repository Goss Geppert
  3 siblings, 2 replies; 24+ messages in thread
From: Goss Geppert @ 2022-05-10 17:15 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, christian w, Elijah Newren, Derrick Stolee

Updated for requested revisions:
* Split patch between fix and clean-up
* Added a test
* Included my sign-off

It wasn't clear to me where the testcases should be placed--let me know
if a different location is more appropriate.


Goss Geppert (2):
  dir: consider worktree config in path recursion
  dir: minor refactoring / clean-up

 dir.c                          | 38 ++++++++++++---
 t/t2205-add-worktree-config.sh | 89 ++++++++++++++++++++++++++++++++++
 2 files changed, 120 insertions(+), 7 deletions(-)
 create mode 100755 t/t2205-add-worktree-config.sh

Range-diff against v1:
1:  b9ca67e61d ! 1:  83bf9f40ac dir: consider worktree config in path recursion
    @@ Commit message
     
         Prior to this commit, the `add` operation was silently ignored.
     
    +    Signed-off-by: Goss Geppert <ggossdev@gmail.com>
    +
      ## dir.c ##
     @@ dir.c: static enum path_treatment treat_directory(struct dir_struct *dir,
    - 	 */
    - 	enum path_treatment state;
    - 	int matches_how = 0;
    --	int nested_repo = 0, check_only, stop_early;
    -+	int check_only, stop_early;
    - 	int old_ignored_nr, old_untracked_nr;
    - 	/* The "len-1" is to strip the final '/' */
    - 	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
    -@@ dir.c: static enum path_treatment treat_directory(struct dir_struct *dir,
      
      	if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
      		!(dir->flags & DIR_NO_GITLINKS)) {
    @@ dir.c: static enum path_treatment treat_directory(struct dir_struct *dir,
     +		 * 1) we are in a nonbare repository, and
     +		 * 2) `dirname` is not an immediate parent of `the_repository->gitdir`,
     +		 *    which could occur if the `worktree` location was manually
    -+		 *    configured by the user
    ++		 *    configured by the user; see t2205 testcases 1a-1d for examples
    ++		 *    where this matters
     +		 */
    -+		int nested_repo;
      		struct strbuf sb = STRBUF_INIT;
      		strbuf_addstr(&sb, dirname);
      		nested_repo = is_nonbare_repository_dir(&sb);
    @@ dir.c: static enum path_treatment treat_directory(struct dir_struct *dir,
     +			free(real_dirname);
     +		}
      		strbuf_release(&sb);
    --	}
    --	if (nested_repo) {
    --		if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
    --		    (matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC))
    --			return path_none;
    --		return excluded ? path_excluded : path_untracked;
    -+
    -+		if (nested_repo) {
    -+			if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
    -+				(matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC))
    -+				return path_none;
    -+			return excluded ? path_excluded : path_untracked;
    -+		}
      	}
    - 
    - 	if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) {
    + 	if (nested_repo) {
    +
    + ## t/t2205-add-worktree-config.sh (new) ##
    +@@
    ++#!/bin/sh
    ++
    ++test_description='directory traversal respects worktree config
    ++
    ++This test configures the repository`s worktree to be two levels above the
    ++`.git` directory and checks whether we are able to add to the index those files
    ++that are in either (1) the manually configured worktree directory or (2) the
    ++standard worktree location with respect to the `.git` directory (i.e. ensuring
    ++that the encountered `.git` directory is not treated as belonging to a foreign
    ++nested repository)'
    ++
    ++. ./test-lib.sh
    ++
    ++test_expect_success '1a: setup' '
    ++	test_create_repo test1 &&
    ++	git --git-dir="test1/.git" config core.worktree "$(pwd)" &&
    ++
    ++	mkdir -p outside-tracked outside-untracked &&
    ++	mkdir -p test1/inside-tracked test1/inside-untracked &&
    ++	>file-tracked &&
    ++	>file-untracked &&
    ++	>outside-tracked/file &&
    ++	>outside-untracked/file &&
    ++	>test1/file-tracked &&
    ++	>test1/file-untracked &&
    ++	>test1/inside-tracked/file &&
    ++	>test1/inside-untracked/file &&
    ++
    ++	cat >expect-tracked-unsorted <<-EOF &&
    ++	../file-tracked
    ++	../outside-tracked/file
    ++	file-tracked
    ++	inside-tracked/file
    ++	EOF
    ++
    ++	cat >expect-untracked-unsorted <<-EOF &&
    ++	../file-untracked
    ++	../outside-untracked/file
    ++	file-untracked
    ++	inside-untracked/file
    ++	EOF
    ++
    ++	cat expect-tracked-unsorted expect-untracked-unsorted >expect-all-unsorted &&
    ++
    ++	cat >.gitignore <<-EOF
    ++	.gitignore
    ++	actual-*
    ++	expect-*
    ++	EOF
    ++'
    ++
    ++test_expect_success '1b: pre-add all' '
    ++	local parent_dir="$(pwd)" &&
    ++	(
    ++		cd test1 &&
    ++		git ls-files -o --exclude-standard "$parent_dir" >../actual-all-unsorted
    ++	) &&
    ++	sort actual-all-unsorted >actual-all &&
    ++	sort expect-all-unsorted >expect-all &&
    ++	test_cmp expect-all actual-all
    ++'
    ++
    ++test_expect_success '1c: post-add tracked' '
    ++	local parent_dir="$(pwd)" &&
    ++	(
    ++		cd test1 &&
    ++		git add file-tracked &&
    ++		git add inside-tracked &&
    ++		git add ../outside-tracked &&
    ++		git add "$parent_dir/file-tracked" &&
    ++		git ls-files "$parent_dir" >../actual-tracked-unsorted
    ++	) &&
    ++	sort actual-tracked-unsorted >actual-tracked &&
    ++	sort expect-tracked-unsorted >expect-tracked &&
    ++	test_cmp expect-tracked actual-tracked
    ++'
    ++
    ++test_expect_success '1d: post-add untracked' '
    ++	local parent_dir="$(pwd)" &&
    ++	(
    ++		cd test1 &&
    ++		git ls-files -o --exclude-standard "$parent_dir" >../actual-untracked-unsorted
    ++	) &&
    ++	sort actual-untracked-unsorted >actual-untracked &&
    ++	sort expect-untracked-unsorted >expect-untracked &&
    ++	test_cmp expect-untracked actual-untracked
    ++'
    ++
    ++test_done
-:  ---------- > 2:  0da9804776 dir: minor refactoring / clean-up
-- 
2.36.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/2] dir: consider worktree config in path recursion
  2022-05-10 17:15 ` [PATCH v2 0/2] " Goss Geppert
@ 2022-05-10 17:15   ` Goss Geppert
  2022-05-11 16:37     ` Junio C Hamano
                       ` (2 more replies)
  2022-05-10 17:15   ` [PATCH v2 2/2] dir: minor refactoring / clean-up Goss Geppert
  1 sibling, 3 replies; 24+ messages in thread
From: Goss Geppert @ 2022-05-10 17:15 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, christian w, Elijah Newren, Derrick Stolee

Since 8d92fb2927 (dir: replace exponential algorithm with a linear one,
2020-04-01) the following no longer works:

    1) Initialize a repository.
    2) Set the `core.worktree` location to a parent directory of the
       default worktree.
    3) Add a file located in the default worktree location to the index
       (i.e. anywhere in the immediate parent directory of the gitdir).

This commit adds a check to determine whether a nested repository that
is encountered in recursing a path is actually `the_repository`.  If so,
simply treat the directory as if it doesn't contain a nested repository.

Prior to this commit, the `add` operation was silently ignored.

Signed-off-by: Goss Geppert <ggossdev@gmail.com>
---
 dir.c                          | 22 +++++++++
 t/t2205-add-worktree-config.sh | 89 ++++++++++++++++++++++++++++++++++
 2 files changed, 111 insertions(+)
 create mode 100755 t/t2205-add-worktree-config.sh

diff --git a/dir.c b/dir.c
index f2b0f24210..a1886e61a3 100644
--- a/dir.c
+++ b/dir.c
@@ -1893,9 +1893,31 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 
 	if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
 		!(dir->flags & DIR_NO_GITLINKS)) {
+		/*
+		 * Determine if `dirname` is a nested repo by confirming that:
+		 * 1) we are in a nonbare repository, and
+		 * 2) `dirname` is not an immediate parent of `the_repository->gitdir`,
+		 *    which could occur if the `worktree` location was manually
+		 *    configured by the user; see t2205 testcases 1a-1d for examples
+		 *    where this matters
+		 */
 		struct strbuf sb = STRBUF_INIT;
 		strbuf_addstr(&sb, dirname);
 		nested_repo = is_nonbare_repository_dir(&sb);
+
+		if (nested_repo) {
+			char *real_dirname, *real_gitdir;
+			strbuf_reset(&sb);
+			strbuf_addstr(&sb, dirname);
+			strbuf_complete(&sb, '/');
+			strbuf_addstr(&sb, ".git");
+			real_dirname = real_pathdup(sb.buf, 0);
+			real_gitdir = real_pathdup(the_repository->gitdir, 0);
+
+			nested_repo = !!strcmp(real_dirname, real_gitdir);
+			free(real_gitdir);
+			free(real_dirname);
+		}
 		strbuf_release(&sb);
 	}
 	if (nested_repo) {
diff --git a/t/t2205-add-worktree-config.sh b/t/t2205-add-worktree-config.sh
new file mode 100755
index 0000000000..ca70cf3fe2
--- /dev/null
+++ b/t/t2205-add-worktree-config.sh
@@ -0,0 +1,89 @@
+#!/bin/sh
+
+test_description='directory traversal respects worktree config
+
+This test configures the repository`s worktree to be two levels above the
+`.git` directory and checks whether we are able to add to the index those files
+that are in either (1) the manually configured worktree directory or (2) the
+standard worktree location with respect to the `.git` directory (i.e. ensuring
+that the encountered `.git` directory is not treated as belonging to a foreign
+nested repository)'
+
+. ./test-lib.sh
+
+test_expect_success '1a: setup' '
+	test_create_repo test1 &&
+	git --git-dir="test1/.git" config core.worktree "$(pwd)" &&
+
+	mkdir -p outside-tracked outside-untracked &&
+	mkdir -p test1/inside-tracked test1/inside-untracked &&
+	>file-tracked &&
+	>file-untracked &&
+	>outside-tracked/file &&
+	>outside-untracked/file &&
+	>test1/file-tracked &&
+	>test1/file-untracked &&
+	>test1/inside-tracked/file &&
+	>test1/inside-untracked/file &&
+
+	cat >expect-tracked-unsorted <<-EOF &&
+	../file-tracked
+	../outside-tracked/file
+	file-tracked
+	inside-tracked/file
+	EOF
+
+	cat >expect-untracked-unsorted <<-EOF &&
+	../file-untracked
+	../outside-untracked/file
+	file-untracked
+	inside-untracked/file
+	EOF
+
+	cat expect-tracked-unsorted expect-untracked-unsorted >expect-all-unsorted &&
+
+	cat >.gitignore <<-EOF
+	.gitignore
+	actual-*
+	expect-*
+	EOF
+'
+
+test_expect_success '1b: pre-add all' '
+	local parent_dir="$(pwd)" &&
+	(
+		cd test1 &&
+		git ls-files -o --exclude-standard "$parent_dir" >../actual-all-unsorted
+	) &&
+	sort actual-all-unsorted >actual-all &&
+	sort expect-all-unsorted >expect-all &&
+	test_cmp expect-all actual-all
+'
+
+test_expect_success '1c: post-add tracked' '
+	local parent_dir="$(pwd)" &&
+	(
+		cd test1 &&
+		git add file-tracked &&
+		git add inside-tracked &&
+		git add ../outside-tracked &&
+		git add "$parent_dir/file-tracked" &&
+		git ls-files "$parent_dir" >../actual-tracked-unsorted
+	) &&
+	sort actual-tracked-unsorted >actual-tracked &&
+	sort expect-tracked-unsorted >expect-tracked &&
+	test_cmp expect-tracked actual-tracked
+'
+
+test_expect_success '1d: post-add untracked' '
+	local parent_dir="$(pwd)" &&
+	(
+		cd test1 &&
+		git ls-files -o --exclude-standard "$parent_dir" >../actual-untracked-unsorted
+	) &&
+	sort actual-untracked-unsorted >actual-untracked &&
+	sort expect-untracked-unsorted >expect-untracked &&
+	test_cmp expect-untracked actual-untracked
+'
+
+test_done
-- 
2.36.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 2/2] dir: minor refactoring / clean-up
  2022-05-10 17:15 ` [PATCH v2 0/2] " Goss Geppert
  2022-05-10 17:15   ` [PATCH v2 1/2] " Goss Geppert
@ 2022-05-10 17:15   ` Goss Geppert
  2022-05-11 16:51     ` Junio C Hamano
  1 sibling, 1 reply; 24+ messages in thread
From: Goss Geppert @ 2022-05-10 17:15 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, christian w, Elijah Newren, Derrick Stolee

Improve readability.

Signed-off-by: Goss Geppert <ggossdev@gmail.com>
---
 dir.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/dir.c b/dir.c
index a1886e61a3..329db72999 100644
--- a/dir.c
+++ b/dir.c
@@ -1861,7 +1861,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	 */
 	enum path_treatment state;
 	int matches_how = 0;
-	int nested_repo = 0, check_only, stop_early;
+	int check_only, stop_early;
 	int old_ignored_nr, old_untracked_nr;
 	/* The "len-1" is to strip the final '/' */
 	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
@@ -1901,6 +1901,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		 *    configured by the user; see t2205 testcases 1a-1d for examples
 		 *    where this matters
 		 */
+		int nested_repo;
 		struct strbuf sb = STRBUF_INIT;
 		strbuf_addstr(&sb, dirname);
 		nested_repo = is_nonbare_repository_dir(&sb);
@@ -1919,12 +1920,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 			free(real_dirname);
 		}
 		strbuf_release(&sb);
-	}
-	if (nested_repo) {
-		if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
-		    (matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC))
-			return path_none;
-		return excluded ? path_excluded : path_untracked;
+
+		if (nested_repo) {
+			if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
+				(matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC))
+				return path_none;
+			return excluded ? path_excluded : path_untracked;
+		}
 	}
 
 	if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) {
-- 
2.36.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] dir: consider worktree config in path recursion
  2022-05-10 17:15   ` [PATCH v2 1/2] " Goss Geppert
@ 2022-05-11 16:37     ` Junio C Hamano
  2022-05-20 19:45       ` oss dev
  2022-05-24 14:29       ` Elijah Newren
  2022-05-11 23:07     ` Junio C Hamano
  2022-05-23 19:23     ` Derrick Stolee
  2 siblings, 2 replies; 24+ messages in thread
From: Junio C Hamano @ 2022-05-11 16:37 UTC (permalink / raw)
  To: Goss Geppert; +Cc: git, christian w, Elijah Newren, Derrick Stolee

Goss Geppert <gg.oss.dev@gmail.com> writes:

> diff --git a/dir.c b/dir.c
> index f2b0f24210..a1886e61a3 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1893,9 +1893,31 @@ static enum path_treatment treat_directory(struct dir_struct *dir,

>  	if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
>  		!(dir->flags & DIR_NO_GITLINKS)) {
> +		/*
> +		 * Determine if `dirname` is a nested repo by confirming that:
> +		 * 1) we are in a nonbare repository, and
> +		 * 2) `dirname` is not an immediate parent of `the_repository->gitdir`,
> +		 *    which could occur if the `worktree` location was manually
> +		 *    configured by the user; see t2205 testcases 1a-1d for examples
> +		 *    where this matters
> +		 */
>  		struct strbuf sb = STRBUF_INIT;
>  		strbuf_addstr(&sb, dirname);
>  		nested_repo = is_nonbare_repository_dir(&sb);
> +
> +		if (nested_repo) {
> +			char *real_dirname, *real_gitdir;
> +			strbuf_reset(&sb);
> +			strbuf_addstr(&sb, dirname);
> +			strbuf_complete(&sb, '/');
> +			strbuf_addstr(&sb, ".git");
> +			real_dirname = real_pathdup(sb.buf, 0);

Is this _complete() necessary?

Near the top of this function there is

	/* The "len-1" is to strip the final '/' */
	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);

which indicates that we are safe to assume dirname ends with '/'.

> +			real_gitdir = real_pathdup(the_repository->gitdir, 0);

This function is repeatedly called during the traversal.

How expensive is it to keep calling real_pathdup() on the constant
the_repository->gitdir just in case it might be the same as our true
GIT_DIR?

> +
> +			nested_repo = !!strcmp(real_dirname, real_gitdir);
> +			free(real_gitdir);
> +			free(real_dirname);
> +		}
>  		strbuf_release(&sb);
>  	}
>  	if (nested_repo) {
> diff --git a/t/t2205-add-worktree-config.sh b/t/t2205-add-worktree-config.sh
> new file mode 100755
> index 0000000000..ca70cf3fe2
> --- /dev/null
> +++ b/t/t2205-add-worktree-config.sh
> @@ -0,0 +1,89 @@
> +#!/bin/sh
> +
> +test_description='directory traversal respects worktree config
> +
> +This test configures the repository`s worktree to be two levels above the
> +`.git` directory and checks whether we are able to add to the index those files
> +that are in either (1) the manually configured worktree directory or (2) the
> +standard worktree location with respect to the `.git` directory (i.e. ensuring
> +that the encountered `.git` directory is not treated as belonging to a foreign
> +nested repository)'
> +
> +. ./test-lib.sh
> +
> +test_expect_success '1a: setup' '
> +	test_create_repo test1 &&
> +	git --git-dir="test1/.git" config core.worktree "$(pwd)" &&
> +
> +	mkdir -p outside-tracked outside-untracked &&
> +	mkdir -p test1/inside-tracked test1/inside-untracked &&
> +	>file-tracked &&
> +	>file-untracked &&
> +	>outside-tracked/file &&
> +	>outside-untracked/file &&
> +	>test1/file-tracked &&
> +	>test1/file-untracked &&
> +	>test1/inside-tracked/file &&
> +	>test1/inside-untracked/file &&
> +
> +	cat >expect-tracked-unsorted <<-EOF &&
> +	../file-tracked
> +	../outside-tracked/file
> +	file-tracked
> +	inside-tracked/file
> +	EOF
> +
> +	cat >expect-untracked-unsorted <<-EOF &&
> +	../file-untracked
> +	../outside-untracked/file
> +	file-untracked
> +	inside-untracked/file
> +	EOF
> +
> +	cat expect-tracked-unsorted expect-untracked-unsorted >expect-all-unsorted &&
> +
> +	cat >.gitignore <<-EOF
> +	.gitignore
> +	actual-*
> +	expect-*
> +	EOF
> +'
> +
> +test_expect_success '1b: pre-add all' '
> +	local parent_dir="$(pwd)" &&
> +	(
> +		cd test1 &&
> +		git ls-files -o --exclude-standard "$parent_dir" >../actual-all-unsorted
> +	) &&
> +	sort actual-all-unsorted >actual-all &&
> +	sort expect-all-unsorted >expect-all &&
> +	test_cmp expect-all actual-all
> +'
> +
> +test_expect_success '1c: post-add tracked' '
> +	local parent_dir="$(pwd)" &&
> +	(
> +		cd test1 &&
> +		git add file-tracked &&
> +		git add inside-tracked &&
> +		git add ../outside-tracked &&
> +		git add "$parent_dir/file-tracked" &&
> +		git ls-files "$parent_dir" >../actual-tracked-unsorted
> +	) &&
> +	sort actual-tracked-unsorted >actual-tracked &&
> +	sort expect-tracked-unsorted >expect-tracked &&
> +	test_cmp expect-tracked actual-tracked
> +'
> +
> +test_expect_success '1d: post-add untracked' '
> +	local parent_dir="$(pwd)" &&
> +	(
> +		cd test1 &&
> +		git ls-files -o --exclude-standard "$parent_dir" >../actual-untracked-unsorted
> +	) &&
> +	sort actual-untracked-unsorted >actual-untracked &&
> +	sort expect-untracked-unsorted >expect-untracked &&
> +	test_cmp expect-untracked actual-untracked
> +'
> +
> +test_done

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/2] dir: minor refactoring / clean-up
  2022-05-10 17:15   ` [PATCH v2 2/2] dir: minor refactoring / clean-up Goss Geppert
@ 2022-05-11 16:51     ` Junio C Hamano
  2022-05-20 20:03       ` oss dev
  0 siblings, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2022-05-11 16:51 UTC (permalink / raw)
  To: Goss Geppert; +Cc: git, christian w, Elijah Newren, Derrick Stolee

Goss Geppert <gg.oss.dev@gmail.com> writes:

> Improve readability.

It reads somewhat a subjective opinion, without explaining how/why
the change makes it more readable.  Perhaps

	Narrow the scope of the nested_repo variable to the block
	that uses it.

or something?

I was hoping that we can/should apply these two patches near where
the breakage happened, but unfortunately this part has been updated
a bit since then, so merging upwards would involve a bit of conflict
resolution.  We should be able to manage.

Thanks.


> Signed-off-by: Goss Geppert <ggossdev@gmail.com>
> ---
>  dir.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index a1886e61a3..329db72999 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1861,7 +1861,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>  	 */
>  	enum path_treatment state;
>  	int matches_how = 0;
> -	int nested_repo = 0, check_only, stop_early;
> +	int check_only, stop_early;
>  	int old_ignored_nr, old_untracked_nr;
>  	/* The "len-1" is to strip the final '/' */
>  	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
> @@ -1901,6 +1901,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>  		 *    configured by the user; see t2205 testcases 1a-1d for examples
>  		 *    where this matters
>  		 */
> +		int nested_repo;
>  		struct strbuf sb = STRBUF_INIT;
>  		strbuf_addstr(&sb, dirname);
>  		nested_repo = is_nonbare_repository_dir(&sb);
> @@ -1919,12 +1920,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>  			free(real_dirname);
>  		}
>  		strbuf_release(&sb);
> -	}
> -	if (nested_repo) {
> -		if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
> -		    (matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC))
> -			return path_none;
> -		return excluded ? path_excluded : path_untracked;
> +
> +		if (nested_repo) {
> +			if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
> +				(matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC))
> +				return path_none;
> +			return excluded ? path_excluded : path_untracked;
> +		}
>  	}
>  
>  	if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) {

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] dir: consider worktree config in path recursion
  2022-05-10 17:15   ` [PATCH v2 1/2] " Goss Geppert
  2022-05-11 16:37     ` Junio C Hamano
@ 2022-05-11 23:07     ` Junio C Hamano
  2022-05-20 20:01       ` oss dev
  2022-05-23 19:23     ` Derrick Stolee
  2 siblings, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2022-05-11 23:07 UTC (permalink / raw)
  To: Goss Geppert; +Cc: git, christian w, Elijah Newren, Derrick Stolee

Goss Geppert <gg.oss.dev@gmail.com> writes:

> Since 8d92fb2927 (dir: replace exponential algorithm with a linear one,
> 2020-04-01) the following no longer works:
>
>     1) Initialize a repository.
>     2) Set the `core.worktree` location to a parent directory of the
>        default worktree.
>     3) Add a file located in the default worktree location to the index
>        (i.e. anywhere in the immediate parent directory of the gitdir).

I think I've mentioned this ealier, but the above only describes the
scenario and does not say what behaviour is expected and what behaviour
is observed.  "no longer works" is OK, but not sufficient.

Easily remedied by saying "It used to do X, but now it does Y" after
the above description of the scenario to describe what happens and
what you want to happen.

> +	git --git-dir="test1/.git" config core.worktree "$(pwd)" &&

I wonder if this lets funny paths to be added to the index, e.g.
would "git add test1" recursively add everything in that directory?

Thanks.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 0/3] dir: traverse into repository
  2022-05-05 20:32 [RFC PATCH 0/1] dir: consider worktree config in path recursion Goss Geppert
                   ` (2 preceding siblings ...)
  2022-05-10 17:15 ` [PATCH v2 0/2] " Goss Geppert
@ 2022-05-20 19:28 ` Goss Geppert
  2022-05-20 19:28   ` [PATCH v3 1/3] " Goss Geppert
                     ` (2 more replies)
  3 siblings, 3 replies; 24+ messages in thread
From: Goss Geppert @ 2022-05-20 19:28 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, christian w, Elijah Newren, Derrick Stolee

Thanks for your feedback so far. This revision includes a number of changes:

* I realized that using option `--git-dir` (or environment variable `GIT_DIR`)
  when the working directory is outside the repository can also cause the
  directory traversal to fail to enter `the_repository`.  I've added test cases
  and updated comments / messages to reflect this.
* I noticed that `real_pathdup` can fail even for valid paths (and return NULL)
  if e.g. the path contains too many nested symlinks (currently 32).  We'll now
  `die_on_error`.
* I no longer reset the string buffer holding the `dirname` and just reuse it.
  While the prior call to `is_nonbare_repository_dir` does not treat its
  argument as const, the function does reset the argument to its effective
  input state.  The `strbuf_addstr` and `strbuf_complete` have also been
  removed.
* I added a separate commit to cache the `real_gitdir` value.  On my mystery
  machine, the call to `real_pathdup`, instrumented with trace2, takes an
  anecdotal 40 usecs for a somewhat lengthy path [1].  This compares to around
  10 usecs for an empty trace2 region.  I can remove the commit if you don't
  think the performance savings are worth the clutter.

Let me know if you have further comments or suggestions.

[1]: path with one symlink: './repo/link/a/b/././././..//////c/d/./xyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxy/././../e/../f/../g/../h/../././///../../..'

Goss Geppert (3):
  dir: traverse into repository
  dir: cache git_dir's realpath
  dir: minor refactoring / clean-up

 dir.c                          |  90 ++++++++---
 t/t2205-add-worktree-config.sh | 274 +++++++++++++++++++++++++++++++++
 2 files changed, 341 insertions(+), 23 deletions(-)
 create mode 100755 t/t2205-add-worktree-config.sh

Range-diff against v2:
1:  26e6a878cd ! 1:  391fd4d8cf dir: consider worktree config in path recursion
    @@ Metadata
     Author: Goss Geppert <ggossdev@gmail.com>
     
      ## Commit message ##
    -    dir: consider worktree config in path recursion
    +    dir: traverse into repository
     
         Since 8d92fb2927 (dir: replace exponential algorithm with a linear one,
    -    2020-04-01) the following no longer works:
    +    2020-04-01) traversing into a repository's directory tree when the
    +    traversal began outside the repository's standard location has failed
    +    because the encountered repository was identified as a nested foreign
    +    repository.
     
    -        1) Initialize a repository.
    -        2) Set the `core.worktree` location to a parent directory of the
    -           default worktree.
    -        3) Add a file located in the default worktree location to the index
    -           (i.e. anywhere in the immediate parent directory of the gitdir).
    +    Prior to this commit, the failure to traverse into a repository's
    +    default worktree location was observable from a user's perspective under
    +    either of the following conditions (there may be others):
    +
    +        1) Set the `core.worktree` location to a parent directory of the
    +           default worktree; or
    +        2) Use the `--git_dir` option while the working directory is outside
    +           the repository's default worktree location
    +
    +    Under either of these conditions, symptoms of the failure to traverse
    +    into the repository's default worktree location include the inability to
    +    add files to the index or get a list of untracked files via ls-files.
     
         This commit adds a check to determine whether a nested repository that
         is encountered in recursing a path is actually `the_repository`.  If so,
    -    simply treat the directory as if it doesn't contain a nested repository.
    +    we simply treat the directory as if it doesn't contain a nested
    +    repository.
     
    -    Prior to this commit, the `add` operation was silently ignored.
    +    The commit includes test-cases to reduce the likelihood of future
    +    regressions.
     
         Signed-off-by: Goss Geppert <ggossdev@gmail.com>
     
    @@ dir.c: static enum path_treatment treat_directory(struct dir_struct *dir,
     +		 * Determine if `dirname` is a nested repo by confirming that:
     +		 * 1) we are in a nonbare repository, and
     +		 * 2) `dirname` is not an immediate parent of `the_repository->gitdir`,
    -+		 *    which could occur if the `worktree` location was manually
    -+		 *    configured by the user; see t2205 testcases 1a-1d for examples
    -+		 *    where this matters
    ++		 *    which could occur if the git_dir or worktree location was
    ++		 *    manually configured by the user; see t2205 testcases 1-3 for
    ++		 *    examples where this matters
     +		 */
      		struct strbuf sb = STRBUF_INIT;
      		strbuf_addstr(&sb, dirname);
    @@ dir.c: static enum path_treatment treat_directory(struct dir_struct *dir,
     +
     +		if (nested_repo) {
     +			char *real_dirname, *real_gitdir;
    -+			strbuf_reset(&sb);
    -+			strbuf_addstr(&sb, dirname);
    -+			strbuf_complete(&sb, '/');
     +			strbuf_addstr(&sb, ".git");
    -+			real_dirname = real_pathdup(sb.buf, 0);
    -+			real_gitdir = real_pathdup(the_repository->gitdir, 0);
    ++			real_dirname = real_pathdup(sb.buf, 1);
    ++			real_gitdir = real_pathdup(the_repository->gitdir, 1);
     +
     +			nested_repo = !!strcmp(real_dirname, real_gitdir);
     +			free(real_gitdir);
    @@ t/t2205-add-worktree-config.sh (new)
     @@
     +#!/bin/sh
     +
    -+test_description='directory traversal respects worktree config
    ++test_description='directory traversal respects user config
    ++
    ++This test verifies the traversal of the directory tree when the traversal begins
    ++outside the repository.  Two instances for which this can occur are tested:
     +
    -+This test configures the repository`s worktree to be two levels above the
    -+`.git` directory and checks whether we are able to add to the index those files
    -+that are in either (1) the manually configured worktree directory or (2) the
    -+standard worktree location with respect to the `.git` directory (i.e. ensuring
    -+that the encountered `.git` directory is not treated as belonging to a foreign
    -+nested repository)'
    ++	1) The user manually sets the worktree.  For this instance, the test sets
    ++	   the worktree two levels above the `.git` directory and checks whether we
    ++	   are able to add to the index those files that are in either (1) the
    ++	   manually configured worktree directory or (2) the standard worktree
    ++	   location with respect to the `.git` directory (i.e. ensuring that the
    ++	   encountered `.git` directory is not treated as belonging to a foreign
    ++	   nested repository).
    ++	2) The user manually sets the `git_dir` while the working directory is
    ++	   outside the repository.  The test checks that files inside the
    ++	   repository can be added to the index.
    ++	'
     +
     +. ./test-lib.sh
     +
    -+test_expect_success '1a: setup' '
    -+	test_create_repo test1 &&
    -+	git --git-dir="test1/.git" config core.worktree "$(pwd)" &&
    ++test_expect_success '1a: setup--config worktree' '
    ++	mkdir test1 &&
    ++	(
    ++	cd test1 &&
    ++	test_create_repo repo &&
    ++	git --git-dir="repo/.git" config core.worktree "$(pwd)" &&
     +
     +	mkdir -p outside-tracked outside-untracked &&
    -+	mkdir -p test1/inside-tracked test1/inside-untracked &&
    ++	mkdir -p repo/inside-tracked repo/inside-untracked &&
     +	>file-tracked &&
     +	>file-untracked &&
     +	>outside-tracked/file &&
     +	>outside-untracked/file &&
    -+	>test1/file-tracked &&
    -+	>test1/file-untracked &&
    -+	>test1/inside-tracked/file &&
    -+	>test1/inside-untracked/file &&
    ++	>repo/file-tracked &&
    ++	>repo/file-untracked &&
    ++	>repo/inside-tracked/file &&
    ++	>repo/inside-untracked/file &&
     +
     +	cat >expect-tracked-unsorted <<-EOF &&
     +	../file-tracked
    @@ t/t2205-add-worktree-config.sh (new)
     +	inside-untracked/file
     +	EOF
     +
    ++	cat >expect-all-dir-unsorted <<-EOF &&
    ++	../file-untracked
    ++	../file-tracked
    ++	../outside-untracked/
    ++	../outside-tracked/
    ++	./
    ++	EOF
    ++
     +	cat expect-tracked-unsorted expect-untracked-unsorted >expect-all-unsorted &&
     +
     +	cat >.gitignore <<-EOF
    @@ t/t2205-add-worktree-config.sh (new)
     +	actual-*
     +	expect-*
     +	EOF
    ++	)
     +'
     +
     +test_expect_success '1b: pre-add all' '
    ++	(
    ++	cd test1 &&
     +	local parent_dir="$(pwd)" &&
     +	(
    -+		cd test1 &&
    ++		cd repo &&
     +		git ls-files -o --exclude-standard "$parent_dir" >../actual-all-unsorted
     +	) &&
     +	sort actual-all-unsorted >actual-all &&
     +	sort expect-all-unsorted >expect-all &&
     +	test_cmp expect-all actual-all
    ++	)
     +'
     +
    -+test_expect_success '1c: post-add tracked' '
    ++test_expect_success '1c: pre-add dir all' '
    ++	(
    ++	cd test1 &&
     +	local parent_dir="$(pwd)" &&
     +	(
    -+		cd test1 &&
    ++		cd repo &&
    ++		git ls-files -o --directory --exclude-standard "$parent_dir" >../actual-all-dir-unsorted
    ++	) &&
    ++	sort actual-all-dir-unsorted >actual-all &&
    ++	sort expect-all-dir-unsorted >expect-all &&
    ++	test_cmp expect-all actual-all
    ++	)
    ++'
    ++
    ++test_expect_success '1d: post-add tracked' '
    ++	(
    ++	cd test1 &&
    ++	local parent_dir="$(pwd)" &&
    ++	(
    ++		cd repo &&
     +		git add file-tracked &&
     +		git add inside-tracked &&
     +		git add ../outside-tracked &&
    @@ t/t2205-add-worktree-config.sh (new)
     +	sort actual-tracked-unsorted >actual-tracked &&
     +	sort expect-tracked-unsorted >expect-tracked &&
     +	test_cmp expect-tracked actual-tracked
    ++	)
     +'
     +
    -+test_expect_success '1d: post-add untracked' '
    ++test_expect_success '1e: post-add untracked' '
    ++	(
    ++	cd test1 &&
     +	local parent_dir="$(pwd)" &&
     +	(
    -+		cd test1 &&
    ++		cd repo &&
     +		git ls-files -o --exclude-standard "$parent_dir" >../actual-untracked-unsorted
     +	) &&
     +	sort actual-untracked-unsorted >actual-untracked &&
     +	sort expect-untracked-unsorted >expect-untracked &&
     +	test_cmp expect-untracked actual-untracked
    ++	)
    ++'
    ++
    ++test_expect_success '2a: setup--set git-dir' '
    ++	mkdir test2 &&
    ++	(
    ++	cd test2 &&
    ++	test_create_repo repo &&
    ++	# create two foreign repositories that should remain untracked
    ++	test_create_repo repo-outside &&
    ++	test_create_repo repo/repo-inside &&
    ++
    ++	mkdir -p repo/inside-tracked repo/inside-untracked &&
    ++	>repo/file-tracked &&
    ++	>repo/file-untracked &&
    ++	>repo/inside-tracked/file &&
    ++	>repo/inside-untracked/file &&
    ++	>repo-outside/file &&
    ++	>repo/repo-inside/file &&
    ++
    ++	cat >expect-tracked-unsorted <<-EOF &&
    ++	repo/file-tracked
    ++	repo/inside-tracked/file
    ++	EOF
    ++
    ++	cat >expect-untracked-unsorted <<-EOF &&
    ++	repo/file-untracked
    ++	repo/inside-untracked/file
    ++	repo/repo-inside/
    ++	repo-outside/
    ++	EOF
    ++
    ++	cat >expect-all-dir-unsorted <<-EOF &&
    ++	repo/
    ++	repo-outside/
    ++	EOF
    ++
    ++	cat expect-tracked-unsorted expect-untracked-unsorted >expect-all-unsorted &&
    ++
    ++	cat >.gitignore <<-EOF
    ++	.gitignore
    ++	actual-*
    ++	expect-*
    ++	EOF
    ++	)
    ++'
    ++
    ++test_expect_success '2b: pre-add all' '
    ++	(
    ++	cd test2 &&
    ++	git --git-dir=repo/.git ls-files -o --exclude-standard >actual-all-unsorted &&
    ++	sort actual-all-unsorted >actual-all &&
    ++	sort expect-all-unsorted >expect-all &&
    ++	test_cmp expect-all actual-all
    ++	)
    ++'
    ++
    ++test_expect_success '2c: pre-add dir all' '
    ++	(
    ++	cd test2 &&
    ++	git --git-dir=repo/.git ls-files -o --directory --exclude-standard >actual-all-dir-unsorted &&
    ++	sort actual-all-dir-unsorted >actual-all &&
    ++	sort expect-all-dir-unsorted >expect-all &&
    ++	test_cmp expect-all actual-all
    ++	)
    ++'
    ++
    ++test_expect_success '2d: post-add tracked' '
    ++	(
    ++	cd test2 &&
    ++	git --git-dir=repo/.git add repo/file-tracked &&
    ++	git --git-dir=repo/.git add repo/inside-tracked &&
    ++	git --git-dir=repo/.git ls-files >actual-tracked-unsorted &&
    ++	sort actual-tracked-unsorted >actual-tracked &&
    ++	sort expect-tracked-unsorted >expect-tracked &&
    ++	test_cmp expect-tracked actual-tracked
    ++	)
    ++'
    ++
    ++test_expect_success '2e: post-add untracked' '
    ++	(
    ++	cd test2 &&
    ++	git --git-dir=repo/.git ls-files -o --exclude-standard >actual-untracked-unsorted &&
    ++	sort actual-untracked-unsorted >actual-untracked &&
    ++	sort expect-untracked-unsorted >expect-untracked &&
    ++	test_cmp expect-untracked actual-untracked
    ++	)
    ++'
    ++
    ++test_expect_success '3a: setup--add repo dir' '
    ++	mkdir test3 &&
    ++	(
    ++	cd test3 &&
    ++	test_create_repo repo &&
    ++
    ++	mkdir -p repo/inside-tracked repo/inside-ignored &&
    ++	>repo/file-tracked &&
    ++	>repo/file-ignored &&
    ++	>repo/inside-tracked/file &&
    ++	>repo/inside-ignored/file &&
    ++
    ++	cat >.gitignore <<-EOF &&
    ++	.gitignore
    ++	actual-*
    ++	expect-*
    ++	*ignored
    ++	EOF
    ++
    ++	cat >expect-tracked-unsorted <<-EOF &&
    ++	repo/file-tracked
    ++	repo/inside-tracked/file
    ++	EOF
    ++
    ++	cat >expect-ignored-unsorted <<-EOF
    ++	repo/file-ignored
    ++	repo/inside-ignored/
    ++	.gitignore
    ++	actual-ignored-unsorted
    ++	expect-ignored-unsorted
    ++	expect-tracked-unsorted
    ++	EOF
    ++	)
    ++'
    ++
    ++test_expect_success '3b: ignored' '
    ++	(
    ++	cd test3 &&
    ++	git --git-dir=repo/.git ls-files -io --directory --exclude-standard >actual-ignored-unsorted &&
    ++	sort actual-ignored-unsorted >actual-ignored &&
    ++	sort expect-ignored-unsorted >expect-ignored &&
    ++	test_cmp expect-ignored actual-ignored
    ++	)
    ++'
    ++
    ++test_expect_success '3c: add repo' '
    ++	(
    ++	cd test3 &&
    ++	git --git-dir=repo/.git add repo &&
    ++	git --git-dir=repo/.git ls-files >actual-tracked-unsorted &&
    ++	sort actual-tracked-unsorted >actual-tracked &&
    ++	sort expect-tracked-unsorted >expect-tracked &&
    ++	test_cmp expect-tracked actual-tracked
    ++	)
     +'
     +
     +test_done
-:  ---------- > 2:  16e87a0345 dir: cache git_dir's realpath
2:  2e0a178c78 ! 3:  6de7932731 dir: minor refactoring / clean-up
    @@ Metadata
      ## Commit message ##
         dir: minor refactoring / clean-up
     
    -    Improve readability.
    +    Narrow the scope of the `nested_repo` variable and conditional return
    +    statement to the block where the variable is set.
     
         Signed-off-by: Goss Geppert <ggossdev@gmail.com>
     
    @@ dir.c: static enum path_treatment treat_directory(struct dir_struct *dir,
      	/* The "len-1" is to strip the final '/' */
      	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
     @@ dir.c: static enum path_treatment treat_directory(struct dir_struct *dir,
    - 		 *    configured by the user; see t2205 testcases 1a-1d for examples
    - 		 *    where this matters
    + 		 *    manually configured by the user; see t2205 testcases 1-3 for
    + 		 *    examples where this matters
      		 */
     +		int nested_repo;
      		struct strbuf sb = STRBUF_INIT;
-- 
2.36.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 1/3] dir: traverse into repository
  2022-05-20 19:28 ` [PATCH v3 0/3] dir: traverse into repository Goss Geppert
@ 2022-05-20 19:28   ` Goss Geppert
  2022-05-20 19:28   ` [PATCH v3 2/3] dir: cache git_dir's realpath Goss Geppert
  2022-05-20 19:28   ` [PATCH v3 3/3] dir: minor refactoring / clean-up Goss Geppert
  2 siblings, 0 replies; 24+ messages in thread
From: Goss Geppert @ 2022-05-20 19:28 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, christian w, Elijah Newren, Derrick Stolee

Since 8d92fb2927 (dir: replace exponential algorithm with a linear one,
2020-04-01) traversing into a repository's directory tree when the
traversal began outside the repository's standard location has failed
because the encountered repository was identified as a nested foreign
repository.

Prior to this commit, the failure to traverse into a repository's
default worktree location was observable from a user's perspective under
either of the following conditions (there may be others):

    1) Set the `core.worktree` location to a parent directory of the
       default worktree; or
    2) Use the `--git_dir` option while the working directory is outside
       the repository's default worktree location

Under either of these conditions, symptoms of the failure to traverse
into the repository's default worktree location include the inability to
add files to the index or get a list of untracked files via ls-files.

This commit adds a check to determine whether a nested repository that
is encountered in recursing a path is actually `the_repository`.  If so,
we simply treat the directory as if it doesn't contain a nested
repository.

The commit includes test-cases to reduce the likelihood of future
regressions.

Signed-off-by: Goss Geppert <ggossdev@gmail.com>
---
 dir.c                          |  19 +++
 t/t2205-add-worktree-config.sh | 274 +++++++++++++++++++++++++++++++++
 2 files changed, 293 insertions(+)
 create mode 100755 t/t2205-add-worktree-config.sh

diff --git a/dir.c b/dir.c
index 26c4d141ab..e45e117875 100644
--- a/dir.c
+++ b/dir.c
@@ -1893,9 +1893,28 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 
 	if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
 		!(dir->flags & DIR_NO_GITLINKS)) {
+		/*
+		 * Determine if `dirname` is a nested repo by confirming that:
+		 * 1) we are in a nonbare repository, and
+		 * 2) `dirname` is not an immediate parent of `the_repository->gitdir`,
+		 *    which could occur if the git_dir or worktree location was
+		 *    manually configured by the user; see t2205 testcases 1-3 for
+		 *    examples where this matters
+		 */
 		struct strbuf sb = STRBUF_INIT;
 		strbuf_addstr(&sb, dirname);
 		nested_repo = is_nonbare_repository_dir(&sb);
+
+		if (nested_repo) {
+			char *real_dirname, *real_gitdir;
+			strbuf_addstr(&sb, ".git");
+			real_dirname = real_pathdup(sb.buf, 1);
+			real_gitdir = real_pathdup(the_repository->gitdir, 1);
+
+			nested_repo = !!strcmp(real_dirname, real_gitdir);
+			free(real_gitdir);
+			free(real_dirname);
+		}
 		strbuf_release(&sb);
 	}
 	if (nested_repo) {
diff --git a/t/t2205-add-worktree-config.sh b/t/t2205-add-worktree-config.sh
new file mode 100755
index 0000000000..056894eb94
--- /dev/null
+++ b/t/t2205-add-worktree-config.sh
@@ -0,0 +1,274 @@
+#!/bin/sh
+
+test_description='directory traversal respects user config
+
+This test verifies the traversal of the directory tree when the traversal begins
+outside the repository.  Two instances for which this can occur are tested:
+
+	1) The user manually sets the worktree.  For this instance, the test sets
+	   the worktree two levels above the `.git` directory and checks whether we
+	   are able to add to the index those files that are in either (1) the
+	   manually configured worktree directory or (2) the standard worktree
+	   location with respect to the `.git` directory (i.e. ensuring that the
+	   encountered `.git` directory is not treated as belonging to a foreign
+	   nested repository).
+	2) The user manually sets the `git_dir` while the working directory is
+	   outside the repository.  The test checks that files inside the
+	   repository can be added to the index.
+	'
+
+. ./test-lib.sh
+
+test_expect_success '1a: setup--config worktree' '
+	mkdir test1 &&
+	(
+	cd test1 &&
+	test_create_repo repo &&
+	git --git-dir="repo/.git" config core.worktree "$(pwd)" &&
+
+	mkdir -p outside-tracked outside-untracked &&
+	mkdir -p repo/inside-tracked repo/inside-untracked &&
+	>file-tracked &&
+	>file-untracked &&
+	>outside-tracked/file &&
+	>outside-untracked/file &&
+	>repo/file-tracked &&
+	>repo/file-untracked &&
+	>repo/inside-tracked/file &&
+	>repo/inside-untracked/file &&
+
+	cat >expect-tracked-unsorted <<-EOF &&
+	../file-tracked
+	../outside-tracked/file
+	file-tracked
+	inside-tracked/file
+	EOF
+
+	cat >expect-untracked-unsorted <<-EOF &&
+	../file-untracked
+	../outside-untracked/file
+	file-untracked
+	inside-untracked/file
+	EOF
+
+	cat >expect-all-dir-unsorted <<-EOF &&
+	../file-untracked
+	../file-tracked
+	../outside-untracked/
+	../outside-tracked/
+	./
+	EOF
+
+	cat expect-tracked-unsorted expect-untracked-unsorted >expect-all-unsorted &&
+
+	cat >.gitignore <<-EOF
+	.gitignore
+	actual-*
+	expect-*
+	EOF
+	)
+'
+
+test_expect_success '1b: pre-add all' '
+	(
+	cd test1 &&
+	local parent_dir="$(pwd)" &&
+	(
+		cd repo &&
+		git ls-files -o --exclude-standard "$parent_dir" >../actual-all-unsorted
+	) &&
+	sort actual-all-unsorted >actual-all &&
+	sort expect-all-unsorted >expect-all &&
+	test_cmp expect-all actual-all
+	)
+'
+
+test_expect_success '1c: pre-add dir all' '
+	(
+	cd test1 &&
+	local parent_dir="$(pwd)" &&
+	(
+		cd repo &&
+		git ls-files -o --directory --exclude-standard "$parent_dir" >../actual-all-dir-unsorted
+	) &&
+	sort actual-all-dir-unsorted >actual-all &&
+	sort expect-all-dir-unsorted >expect-all &&
+	test_cmp expect-all actual-all
+	)
+'
+
+test_expect_success '1d: post-add tracked' '
+	(
+	cd test1 &&
+	local parent_dir="$(pwd)" &&
+	(
+		cd repo &&
+		git add file-tracked &&
+		git add inside-tracked &&
+		git add ../outside-tracked &&
+		git add "$parent_dir/file-tracked" &&
+		git ls-files "$parent_dir" >../actual-tracked-unsorted
+	) &&
+	sort actual-tracked-unsorted >actual-tracked &&
+	sort expect-tracked-unsorted >expect-tracked &&
+	test_cmp expect-tracked actual-tracked
+	)
+'
+
+test_expect_success '1e: post-add untracked' '
+	(
+	cd test1 &&
+	local parent_dir="$(pwd)" &&
+	(
+		cd repo &&
+		git ls-files -o --exclude-standard "$parent_dir" >../actual-untracked-unsorted
+	) &&
+	sort actual-untracked-unsorted >actual-untracked &&
+	sort expect-untracked-unsorted >expect-untracked &&
+	test_cmp expect-untracked actual-untracked
+	)
+'
+
+test_expect_success '2a: setup--set git-dir' '
+	mkdir test2 &&
+	(
+	cd test2 &&
+	test_create_repo repo &&
+	# create two foreign repositories that should remain untracked
+	test_create_repo repo-outside &&
+	test_create_repo repo/repo-inside &&
+
+	mkdir -p repo/inside-tracked repo/inside-untracked &&
+	>repo/file-tracked &&
+	>repo/file-untracked &&
+	>repo/inside-tracked/file &&
+	>repo/inside-untracked/file &&
+	>repo-outside/file &&
+	>repo/repo-inside/file &&
+
+	cat >expect-tracked-unsorted <<-EOF &&
+	repo/file-tracked
+	repo/inside-tracked/file
+	EOF
+
+	cat >expect-untracked-unsorted <<-EOF &&
+	repo/file-untracked
+	repo/inside-untracked/file
+	repo/repo-inside/
+	repo-outside/
+	EOF
+
+	cat >expect-all-dir-unsorted <<-EOF &&
+	repo/
+	repo-outside/
+	EOF
+
+	cat expect-tracked-unsorted expect-untracked-unsorted >expect-all-unsorted &&
+
+	cat >.gitignore <<-EOF
+	.gitignore
+	actual-*
+	expect-*
+	EOF
+	)
+'
+
+test_expect_success '2b: pre-add all' '
+	(
+	cd test2 &&
+	git --git-dir=repo/.git ls-files -o --exclude-standard >actual-all-unsorted &&
+	sort actual-all-unsorted >actual-all &&
+	sort expect-all-unsorted >expect-all &&
+	test_cmp expect-all actual-all
+	)
+'
+
+test_expect_success '2c: pre-add dir all' '
+	(
+	cd test2 &&
+	git --git-dir=repo/.git ls-files -o --directory --exclude-standard >actual-all-dir-unsorted &&
+	sort actual-all-dir-unsorted >actual-all &&
+	sort expect-all-dir-unsorted >expect-all &&
+	test_cmp expect-all actual-all
+	)
+'
+
+test_expect_success '2d: post-add tracked' '
+	(
+	cd test2 &&
+	git --git-dir=repo/.git add repo/file-tracked &&
+	git --git-dir=repo/.git add repo/inside-tracked &&
+	git --git-dir=repo/.git ls-files >actual-tracked-unsorted &&
+	sort actual-tracked-unsorted >actual-tracked &&
+	sort expect-tracked-unsorted >expect-tracked &&
+	test_cmp expect-tracked actual-tracked
+	)
+'
+
+test_expect_success '2e: post-add untracked' '
+	(
+	cd test2 &&
+	git --git-dir=repo/.git ls-files -o --exclude-standard >actual-untracked-unsorted &&
+	sort actual-untracked-unsorted >actual-untracked &&
+	sort expect-untracked-unsorted >expect-untracked &&
+	test_cmp expect-untracked actual-untracked
+	)
+'
+
+test_expect_success '3a: setup--add repo dir' '
+	mkdir test3 &&
+	(
+	cd test3 &&
+	test_create_repo repo &&
+
+	mkdir -p repo/inside-tracked repo/inside-ignored &&
+	>repo/file-tracked &&
+	>repo/file-ignored &&
+	>repo/inside-tracked/file &&
+	>repo/inside-ignored/file &&
+
+	cat >.gitignore <<-EOF &&
+	.gitignore
+	actual-*
+	expect-*
+	*ignored
+	EOF
+
+	cat >expect-tracked-unsorted <<-EOF &&
+	repo/file-tracked
+	repo/inside-tracked/file
+	EOF
+
+	cat >expect-ignored-unsorted <<-EOF
+	repo/file-ignored
+	repo/inside-ignored/
+	.gitignore
+	actual-ignored-unsorted
+	expect-ignored-unsorted
+	expect-tracked-unsorted
+	EOF
+	)
+'
+
+test_expect_success '3b: ignored' '
+	(
+	cd test3 &&
+	git --git-dir=repo/.git ls-files -io --directory --exclude-standard >actual-ignored-unsorted &&
+	sort actual-ignored-unsorted >actual-ignored &&
+	sort expect-ignored-unsorted >expect-ignored &&
+	test_cmp expect-ignored actual-ignored
+	)
+'
+
+test_expect_success '3c: add repo' '
+	(
+	cd test3 &&
+	git --git-dir=repo/.git add repo &&
+	git --git-dir=repo/.git ls-files >actual-tracked-unsorted &&
+	sort actual-tracked-unsorted >actual-tracked &&
+	sort expect-tracked-unsorted >expect-tracked &&
+	test_cmp expect-tracked actual-tracked
+	)
+'
+
+test_done
-- 
2.36.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 2/3] dir: cache git_dir's realpath
  2022-05-20 19:28 ` [PATCH v3 0/3] dir: traverse into repository Goss Geppert
  2022-05-20 19:28   ` [PATCH v3 1/3] " Goss Geppert
@ 2022-05-20 19:28   ` Goss Geppert
  2022-05-24 14:32     ` Elijah Newren
  2022-05-20 19:28   ` [PATCH v3 3/3] dir: minor refactoring / clean-up Goss Geppert
  2 siblings, 1 reply; 24+ messages in thread
From: Goss Geppert @ 2022-05-20 19:28 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, christian w, Elijah Newren, Derrick Stolee

When traversing the directory tree, the realpath of the `git_dir` is
required each time a nested repository is encountered to determine how
the nested repository should be treated.  To prevent excessive resource
usage in pathological cases (e.g. many nested repositories, a long path,
symlinks, etc.), cache the realpath after the first usage until the
traversal is complete.
---
 dir.c | 63 ++++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 43 insertions(+), 20 deletions(-)

diff --git a/dir.c b/dir.c
index e45e117875..45b89560fc 100644
--- a/dir.c
+++ b/dir.c
@@ -47,10 +47,22 @@ struct cached_dir {
 	struct untracked_cache_dir *ucd;
 };
 
+/*
+ * Support data structure for the recursing into the directory tree.
+ */
+struct traversal_cache {
+	/*
+	 * The realpath for the `git_dir` may be required in the recursion and is
+	 * cached for repeated use.
+	 */
+	char *real_gitdir;
+};
+
 static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 	struct index_state *istate, const char *path, int len,
 	struct untracked_cache_dir *untracked,
-	int check_only, int stop_at_first_file, const struct pathspec *pathspec);
+	int check_only, int stop_at_first_file, const struct pathspec *pathspec,
+	struct traversal_cache *cache);
 static int resolve_dtype(int dtype, struct index_state *istate,
 			 const char *path, int len);
 struct dirent *readdir_skip_dot_and_dotdot(DIR *dirp)
@@ -1852,7 +1864,8 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	struct index_state *istate,
 	struct untracked_cache_dir *untracked,
 	const char *dirname, int len, int baselen, int excluded,
-	const struct pathspec *pathspec)
+	const struct pathspec *pathspec,
+	struct traversal_cache *cache)
 {
 	/*
 	 * WARNING: From this function, you can return path_recurse or you
@@ -1906,13 +1919,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		nested_repo = is_nonbare_repository_dir(&sb);
 
 		if (nested_repo) {
-			char *real_dirname, *real_gitdir;
+			char *real_dirname;
 			strbuf_addstr(&sb, ".git");
 			real_dirname = real_pathdup(sb.buf, 1);
-			real_gitdir = real_pathdup(the_repository->gitdir, 1);
+			if (!cache->real_gitdir)
+				cache->real_gitdir = real_pathdup(the_repository->gitdir, 1);
 
-			nested_repo = !!strcmp(real_dirname, real_gitdir);
-			free(real_gitdir);
+			nested_repo = !!strcmp(real_dirname, cache->real_gitdir);
 			free(real_dirname);
 		}
 		strbuf_release(&sb);
@@ -1944,7 +1957,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 				return path_excluded;
 
 			if (read_directory_recursive(dir, istate, dirname, len,
-						     untracked, 1, 1, pathspec) == path_excluded)
+						     untracked, 1, 1, pathspec, cache) == path_excluded)
 				return path_excluded;
 
 			return path_none;
@@ -2045,7 +2058,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	untracked = lookup_untracked(dir->untracked, untracked,
 				     dirname + baselen, len - baselen);
 	state = read_directory_recursive(dir, istate, dirname, len, untracked,
-					 check_only, stop_early, pathspec);
+					 check_only, stop_early, pathspec, cache);
 
 	/* There are a variety of reasons we may need to fixup the state... */
 	if (state == path_excluded) {
@@ -2242,7 +2255,8 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir,
 					   struct index_state *istate,
 					   struct strbuf *path,
 					   int baselen,
-					   const struct pathspec *pathspec)
+					   const struct pathspec *pathspec,
+					   struct traversal_cache *cache)
 {
 	/*
 	 * WARNING: From this function, you can return path_recurse or you
@@ -2264,7 +2278,7 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir,
 		 * with check_only set.
 		 */
 		return read_directory_recursive(dir, istate, path->buf, path->len,
-						cdir->ucd, 1, 0, pathspec);
+						cdir->ucd, 1, 0, pathspec, cache);
 	/*
 	 * We get path_recurse in the first run when
 	 * directory_exists_in_index() returns index_nonexistent. We
@@ -2280,13 +2294,14 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 				      struct index_state *istate,
 				      struct strbuf *path,
 				      int baselen,
-				      const struct pathspec *pathspec)
+				      const struct pathspec *pathspec,
+				      struct traversal_cache *cache)
 {
 	int has_path_in_index, dtype, excluded;
 
 	if (!cdir->d_name)
 		return treat_path_fast(dir, cdir, istate, path,
-				       baselen, pathspec);
+				       baselen, pathspec, cache);
 	if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git"))
 		return path_none;
 	strbuf_setlen(path, baselen);
@@ -2348,7 +2363,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 		strbuf_addch(path, '/');
 		return treat_directory(dir, istate, untracked,
 				       path->buf, path->len,
-				       baselen, excluded, pathspec);
+				       baselen, excluded, pathspec, cache);
 	case DT_REG:
 	case DT_LNK:
 		if (pathspec &&
@@ -2549,7 +2564,8 @@ static void add_path_to_appropriate_result_list(struct dir_struct *dir,
 static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 	struct index_state *istate, const char *base, int baselen,
 	struct untracked_cache_dir *untracked, int check_only,
-	int stop_at_first_file, const struct pathspec *pathspec)
+	int stop_at_first_file, const struct pathspec *pathspec,
+	struct traversal_cache *cache)
 {
 	/*
 	 * WARNING: Do NOT recurse unless path_recurse is returned from
@@ -2572,7 +2588,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 	while (!read_cached_dir(&cdir)) {
 		/* check how the file or directory should be treated */
 		state = treat_path(dir, untracked, &cdir, istate, &path,
-				   baselen, pathspec);
+				   baselen, pathspec, cache);
 		dir->visited_paths++;
 
 		if (state > dir_state)
@@ -2587,7 +2603,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
 			subdir_state =
 				read_directory_recursive(dir, istate, path.buf,
 							 path.len, ud,
-							 check_only, stop_at_first_file, pathspec);
+							 check_only, stop_at_first_file, pathspec, cache);
 			if (subdir_state > dir_state)
 				dir_state = subdir_state;
 
@@ -2661,7 +2677,8 @@ int check_dir_entry_contains(const struct dir_entry *out, const struct dir_entry
 static int treat_leading_path(struct dir_struct *dir,
 			      struct index_state *istate,
 			      const char *path, int len,
-			      const struct pathspec *pathspec)
+			      const struct pathspec *pathspec,
+			      struct traversal_cache *cache)
 {
 	struct strbuf sb = STRBUF_INIT;
 	struct strbuf subdir = STRBUF_INIT;
@@ -2713,7 +2730,8 @@ static int treat_leading_path(struct dir_struct *dir,
 		strbuf_reset(&subdir);
 		strbuf_add(&subdir, path+prevlen, baselen-prevlen);
 		cdir.d_name = subdir.buf;
-		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, pathspec);
+
+		state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, pathspec, cache);
 
 		if (state != path_recurse)
 			break; /* do not recurse into it */
@@ -2986,6 +3004,9 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		   const char *path, int len, const struct pathspec *pathspec)
 {
 	struct untracked_cache_dir *untracked;
+	struct traversal_cache cache = {
+		.real_gitdir = NULL,
+	};
 
 	trace2_region_enter("dir", "read_directory", istate->repo);
 	dir->visited_paths = 0;
@@ -3003,8 +3024,10 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		 * e.g. prep_exclude()
 		 */
 		dir->untracked = NULL;
-	if (!len || treat_leading_path(dir, istate, path, len, pathspec))
-		read_directory_recursive(dir, istate, path, len, untracked, 0, 0, pathspec);
+	if (!len || treat_leading_path(dir, istate, path, len, pathspec, &cache)) {
+		read_directory_recursive(dir, istate, path, len, untracked, 0, 0, pathspec, &cache);
+	}
+	FREE_AND_NULL(cache.real_gitdir);
 	QSORT(dir->entries, dir->nr, cmp_dir_entry);
 	QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry);
 
-- 
2.36.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 3/3] dir: minor refactoring / clean-up
  2022-05-20 19:28 ` [PATCH v3 0/3] dir: traverse into repository Goss Geppert
  2022-05-20 19:28   ` [PATCH v3 1/3] " Goss Geppert
  2022-05-20 19:28   ` [PATCH v3 2/3] dir: cache git_dir's realpath Goss Geppert
@ 2022-05-20 19:28   ` Goss Geppert
  2 siblings, 0 replies; 24+ messages in thread
From: Goss Geppert @ 2022-05-20 19:28 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, christian w, Elijah Newren, Derrick Stolee

Narrow the scope of the `nested_repo` variable and conditional return
statement to the block where the variable is set.

Signed-off-by: Goss Geppert <ggossdev@gmail.com>
---
 dir.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/dir.c b/dir.c
index 45b89560fc..92bdd4daf8 100644
--- a/dir.c
+++ b/dir.c
@@ -1874,7 +1874,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 	 */
 	enum path_treatment state;
 	int matches_how = 0;
-	int nested_repo = 0, check_only, stop_early;
+	int check_only, stop_early;
 	int old_ignored_nr, old_untracked_nr;
 	/* The "len-1" is to strip the final '/' */
 	enum exist_status status = directory_exists_in_index(istate, dirname, len-1);
@@ -1914,6 +1914,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 		 *    manually configured by the user; see t2205 testcases 1-3 for
 		 *    examples where this matters
 		 */
+		int nested_repo;
 		struct strbuf sb = STRBUF_INIT;
 		strbuf_addstr(&sb, dirname);
 		nested_repo = is_nonbare_repository_dir(&sb);
@@ -1929,12 +1930,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
 			free(real_dirname);
 		}
 		strbuf_release(&sb);
-	}
-	if (nested_repo) {
-		if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
-		    (matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC))
-			return path_none;
-		return excluded ? path_excluded : path_untracked;
+
+		if (nested_repo) {
+			if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
+				(matches_how == MATCHED_RECURSIVELY_LEADING_PATHSPEC))
+				return path_none;
+			return excluded ? path_excluded : path_untracked;
+		}
 	}
 
 	if (!(dir->flags & DIR_SHOW_OTHER_DIRECTORIES)) {
-- 
2.36.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] dir: consider worktree config in path recursion
  2022-05-11 16:37     ` Junio C Hamano
@ 2022-05-20 19:45       ` oss dev
  2022-05-24 14:29       ` Elijah Newren
  1 sibling, 0 replies; 24+ messages in thread
From: oss dev @ 2022-05-20 19:45 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, christian w, Elijah Newren, Derrick Stolee

> Is this _complete() necessary?

It is not.  Though _complete won't actually change the buffer if the
trailing character matches, I've removed it.  The '/' is added prior
to each call here [1].

[1]: https://github.com/git/git/blob/277cf0bc36094f6dc4297d8c9cef79df045b735d/dir.c#L2329

> This function is repeatedly called during the traversal.
>
> How expensive is it to keep calling real_pathdup() on the constant
> the_repository->gitdir just in case it might be the same as our true
> GIT_DIR?

Using trace2, `real_pathdup` takes around 40 usecs on my machine
compared to 10 usecs for an empty region.

Please see the cover letter for v3 of the patch series for additional
details on both of these items.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] dir: consider worktree config in path recursion
  2022-05-11 23:07     ` Junio C Hamano
@ 2022-05-20 20:01       ` oss dev
  0 siblings, 0 replies; 24+ messages in thread
From: oss dev @ 2022-05-20 20:01 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, christian w, Elijah Newren, Derrick Stolee

> I think I've mentioned this ealier, but the above only describes the
> scenario and does not say what behaviour is expected and what behaviour
> is observed.  "no longer works" is OK, but not sufficient.

I've updated the commit message for the fact that using option
`--git-dir` can also cause the traversal to skip over the repository.
I've therefore had to be a bit more generic in the description, so let
me know if it's not specific enough.

> > +     git --git-dir="test1/.git" config core.worktree "$(pwd)" &&
>
> I wonder if this lets funny paths to be added to the index, e.g.
> would "git add test1" recursively add everything in that directory?

Yes, it would recursively add files inside test1.  I've added testcase
3c which I believe does what we want but let me know if that doesn't
address your question.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/2] dir: minor refactoring / clean-up
  2022-05-11 16:51     ` Junio C Hamano
@ 2022-05-20 20:03       ` oss dev
  0 siblings, 0 replies; 24+ messages in thread
From: oss dev @ 2022-05-20 20:03 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, christian w, Elijah Newren, Derrick Stolee

> > Improve readability.
>
> It reads somewhat a subjective opinion, without explaining how/why
> the change makes it more readable.  Perhaps
>
>         Narrow the scope of the nested_repo variable to the block
>         that uses it.
>
> or something?

I've updated this message.  Thanks!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] dir: consider worktree config in path recursion
  2022-05-10 17:15   ` [PATCH v2 1/2] " Goss Geppert
  2022-05-11 16:37     ` Junio C Hamano
  2022-05-11 23:07     ` Junio C Hamano
@ 2022-05-23 19:23     ` Derrick Stolee
  2 siblings, 0 replies; 24+ messages in thread
From: Derrick Stolee @ 2022-05-23 19:23 UTC (permalink / raw)
  To: Goss Geppert, git; +Cc: Junio C Hamano, christian w, Elijah Newren

On 5/10/22 1:15 PM, Goss Geppert wrote:
> Since 8d92fb2927 (dir: replace exponential algorithm with a linear one,
> 2020-04-01) the following no longer works:
> 
>     1) Initialize a repository.
>     2) Set the `core.worktree` location to a parent directory of the
>        default worktree.
>     3) Add a file located in the default worktree location to the index
>        (i.e. anywhere in the immediate parent directory of the gitdir).
> 
> This commit adds a check to determine whether a nested repository that
> is encountered in recursing a path is actually `the_repository`.  If so,
> simply treat the directory as if it doesn't contain a nested repository.
> 
> Prior to this commit, the `add` operation was silently ignored.
> 
> Signed-off-by: Goss Geppert <ggossdev@gmail.com>
> ---
>  dir.c                          | 22 +++++++++
>  t/t2205-add-worktree-config.sh | 89 ++++++++++++++++++++++++++++++++++
>  2 files changed, 111 insertions(+)
>  create mode 100755 t/t2205-add-worktree-config.sh
> 
> diff --git a/dir.c b/dir.c
> index f2b0f24210..a1886e61a3 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1893,9 +1893,31 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>  
>  	if ((dir->flags & DIR_SKIP_NESTED_GIT) ||
>  		!(dir->flags & DIR_NO_GITLINKS)) {
> +		/*
> +		 * Determine if `dirname` is a nested repo by confirming that:
> +		 * 1) we are in a nonbare repository, and
> +		 * 2) `dirname` is not an immediate parent of `the_repository->gitdir`,
> +		 *    which could occur if the `worktree` location was manually
> +		 *    configured by the user; see t2205 testcases 1a-1d for examples
> +		 *    where this matters
> +		 */
>  		struct strbuf sb = STRBUF_INIT;
>  		strbuf_addstr(&sb, dirname);
>  		nested_repo = is_nonbare_repository_dir(&sb);
> +
> +		if (nested_repo) {
> +			char *real_dirname, *real_gitdir;
> +			strbuf_reset(&sb);
> +			strbuf_addstr(&sb, dirname);
> +			strbuf_complete(&sb, '/');
> +			strbuf_addstr(&sb, ".git");

This could be strbuf_add(&sb, ".git", 4); to avoid a (minor)
strlen() computation.

> +			real_dirname = real_pathdup(sb.buf, 0);
> +			real_gitdir = real_pathdup(the_repository->gitdir, 0);

With regards to Junio's comment about repeating real_pathdup()
on the_repository->gitdir, we might be able to short-circuit
this by making real_gitdir static and only calling
real_pathdup() if real_gitdir is NULL. It would cut the cost
of these checks by half for big traversals.

> +
> +			nested_repo = !!strcmp(real_dirname, real_gitdir);
> +			free(real_gitdir);
> +			free(real_dirname);
> +		}
...
> +
> +test_description='directory traversal respects worktree config
> +
> +This test configures the repository`s worktree to be two levels above the
> +`.git` directory and checks whether we are able to add to the index those files
> +that are in either (1) the manually configured worktree directory or (2) the
> +standard worktree location with respect to the `.git` directory (i.e. ensuring
> +that the encountered `.git` directory is not treated as belonging to a foreign
> +nested repository)'

The test description should be a single line. The longer paragraph
could be a comment before your "setup" test case.

> +. ./test-lib.sh
> +
> +test_expect_success '1a: setup' '

Also, there is no need to number your tests in their names, as the
testing infrastructure will apply numbers based on their order in
the test file. These labels will grow stale if tests are removed
or inserted into the order.

> +	test_create_repo test1 &&
> +	git --git-dir="test1/.git" config core.worktree "$(pwd)" &&
> +
> +	mkdir -p outside-tracked outside-untracked &&
> +	mkdir -p test1/inside-tracked test1/inside-untracked &&
> +	>file-tracked &&
> +	>file-untracked &&
> +	>outside-tracked/file &&
> +	>outside-untracked/file &&
> +	>test1/file-tracked &&
> +	>test1/file-untracked &&
> +	>test1/inside-tracked/file &&
> +	>test1/inside-untracked/file &&
> +
> +	cat >expect-tracked-unsorted <<-EOF &&
> +	../file-tracked
> +	../outside-tracked/file
> +	file-tracked
> +	inside-tracked/file
> +	EOF
> +
> +	cat >expect-untracked-unsorted <<-EOF &&
> +	../file-untracked
> +	../outside-untracked/file
> +	file-untracked
> +	inside-untracked/file
> +	EOF
> +
> +	cat expect-tracked-unsorted expect-untracked-unsorted >expect-all-unsorted &&
> +
> +	cat >.gitignore <<-EOF
> +	.gitignore
> +	actual-*
> +	expect-*
> +	EOF

This use of .gitignore to ignore the helper files being used
during the test is interesting. I was thinking it would be better
to create isolated directories where the only activity was that
which you were testing:

	mkdir -p worktree &&
	test_create_repo worktree/contains-git &&

...or something like that. The names are more descriptive, and
we don't need the .gitignore trick.

> +'
> +
> +test_expect_success '1b: pre-add all' '
> +	local parent_dir="$(pwd)" &&

I was surprised by this "local". It isn't needed at this
point in the test script. We use it for helper methods to
be sure that we aren't stomping variables from test scripts,
but since the test is being executed inside an eval(), this
local isn't important.

> +	(
> +		cd test1 &&
> +		git ls-files -o --exclude-standard "$parent_dir" >../actual-all-unsorted

You can avoid the sub-shell using "git -C test1 ls-files ..."
right?

> +	) &&
> +	sort actual-all-unsorted >actual-all &&
> +	sort expect-all-unsorted >expect-all &&
> +	test_cmp expect-all actual-all
> +'
> +
> +test_expect_success '1c: post-add tracked' '
> +	local parent_dir="$(pwd)" &&
> +	(
> +		cd test1 &&
> +		git add file-tracked &&
> +		git add inside-tracked &&
> +		git add ../outside-tracked &&
> +		git add "$parent_dir/file-tracked" &&
> +		git ls-files "$parent_dir" >../actual-tracked-unsorted
> +	) &&

This sub-shell is important because of the relative paths
involved. OK.

This also checks to see if _any_ of these "git add"
commands fail, as opposed to failing immediately after
the first one fails. I think your approach is simpler and
should be relatively simple to identify which command did
the wrong thing by looking at the test_cmp output.

> +	sort actual-tracked-unsorted >actual-tracked &&
> +	sort expect-tracked-unsorted >expect-tracked &&
> +	test_cmp expect-tracked actual-tracked
> +'
> +
> +test_expect_success '1d: post-add untracked' '
> +	local parent_dir="$(pwd)" &&
> +	(
> +		cd test1 &&
> +		git ls-files -o --exclude-standard "$parent_dir" >../actual-untracked-unsorted
> +	) &&

Again, this one is not needed.

> +	sort actual-untracked-unsorted >actual-untracked &&
> +	sort expect-untracked-unsorted >expect-untracked &&
> +	test_cmp expect-untracked actual-untracked
> +'
> +

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] dir: consider worktree config in path recursion
  2022-05-11 16:37     ` Junio C Hamano
  2022-05-20 19:45       ` oss dev
@ 2022-05-24 14:29       ` Elijah Newren
  2022-05-24 19:45         ` Junio C Hamano
  1 sibling, 1 reply; 24+ messages in thread
From: Elijah Newren @ 2022-05-24 14:29 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Goss Geppert, Git Mailing List, christian w, Derrick Stolee

On Wed, May 11, 2022 at 9:37 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Goss Geppert <gg.oss.dev@gmail.com> writes:
>
> > diff --git a/dir.c b/dir.c
> > index f2b0f24210..a1886e61a3 100644
> > --- a/dir.c
> > +++ b/dir.c
> > @@ -1893,9 +1893,31 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>
[...]
>
> > +                     real_gitdir = real_pathdup(the_repository->gitdir, 0);
>
> This function is repeatedly called during the traversal.
>
> How expensive is it to keep calling real_pathdup() on the constant
> the_repository->gitdir just in case it might be the same as our true
> GIT_DIR?

I agree that treat_directory is called many times, but this
real_pathdup() call is inside the "if (nested_repo)" block, so this
new real_pathdup() invocation should occur very seldom.  Or are you
worried about cases where users have *very* large numbers of bare
repositories nested under the working directory?  Even in that case,
which seems pathological to me, I'd suspect the
is_nonbare_repository_dir() ->
read_gitfile_gently()/is_git_directory() codepath (used to determine
the value of nested_repo) would be much more expensive than this call
to real_pathdup(), so would it be worth trying to optimize this
real_pathdup() call away even in that rare case?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/3] dir: cache git_dir's realpath
  2022-05-20 19:28   ` [PATCH v3 2/3] dir: cache git_dir's realpath Goss Geppert
@ 2022-05-24 14:32     ` Elijah Newren
  0 siblings, 0 replies; 24+ messages in thread
From: Elijah Newren @ 2022-05-24 14:32 UTC (permalink / raw)
  To: Goss Geppert
  Cc: Git Mailing List, Junio C Hamano, christian w, Derrick Stolee

On Fri, May 20, 2022 at 12:29 PM Goss Geppert <gg.oss.dev@gmail.com> wrote:
>
> When traversing the directory tree, the realpath of the `git_dir` is
> required each time a nested repository is encountered to determine how
> the nested repository should be treated.  To prevent excessive resource
> usage in pathological cases (e.g. many nested repositories, a long path,
> symlinks, etc.), cache the realpath after the first usage until the
> traversal is complete.
> ---
>  dir.c | 63 ++++++++++++++++++++++++++++++++++++++++-------------------
>  1 file changed, 43 insertions(+), 20 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index e45e117875..45b89560fc 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -47,10 +47,22 @@ struct cached_dir {
>         struct untracked_cache_dir *ucd;
>  };
>
> +/*
> + * Support data structure for the recursing into the directory tree.
> + */
> +struct traversal_cache {
> +       /*
> +        * The realpath for the `git_dir` may be required in the recursion and is
> +        * cached for repeated use.
> +        */
> +       char *real_gitdir;
> +};
> +
>  static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>         struct index_state *istate, const char *path, int len,
>         struct untracked_cache_dir *untracked,
> -       int check_only, int stop_at_first_file, const struct pathspec *pathspec);
> +       int check_only, int stop_at_first_file, const struct pathspec *pathspec,
> +       struct traversal_cache *cache);
>  static int resolve_dtype(int dtype, struct index_state *istate,
>                          const char *path, int len);
>  struct dirent *readdir_skip_dot_and_dotdot(DIR *dirp)
> @@ -1852,7 +1864,8 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>         struct index_state *istate,
>         struct untracked_cache_dir *untracked,
>         const char *dirname, int len, int baselen, int excluded,
> -       const struct pathspec *pathspec)
> +       const struct pathspec *pathspec,
> +       struct traversal_cache *cache)
>  {
>         /*
>          * WARNING: From this function, you can return path_recurse or you
> @@ -1906,13 +1919,13 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>                 nested_repo = is_nonbare_repository_dir(&sb);
>
>                 if (nested_repo) {
> -                       char *real_dirname, *real_gitdir;
> +                       char *real_dirname;
>                         strbuf_addstr(&sb, ".git");
>                         real_dirname = real_pathdup(sb.buf, 1);
> -                       real_gitdir = real_pathdup(the_repository->gitdir, 1);
> +                       if (!cache->real_gitdir)
> +                               cache->real_gitdir = real_pathdup(the_repository->gitdir, 1);
>
> -                       nested_repo = !!strcmp(real_dirname, real_gitdir);
> -                       free(real_gitdir);
> +                       nested_repo = !!strcmp(real_dirname, cache->real_gitdir);
>                         free(real_dirname);
>                 }
>                 strbuf_release(&sb);
> @@ -1944,7 +1957,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>                                 return path_excluded;
>
>                         if (read_directory_recursive(dir, istate, dirname, len,
> -                                                    untracked, 1, 1, pathspec) == path_excluded)
> +                                                    untracked, 1, 1, pathspec, cache) == path_excluded)
>                                 return path_excluded;
>
>                         return path_none;
> @@ -2045,7 +2058,7 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>         untracked = lookup_untracked(dir->untracked, untracked,
>                                      dirname + baselen, len - baselen);
>         state = read_directory_recursive(dir, istate, dirname, len, untracked,
> -                                        check_only, stop_early, pathspec);
> +                                        check_only, stop_early, pathspec, cache);
>
>         /* There are a variety of reasons we may need to fixup the state... */
>         if (state == path_excluded) {
> @@ -2242,7 +2255,8 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir,
>                                            struct index_state *istate,
>                                            struct strbuf *path,
>                                            int baselen,
> -                                          const struct pathspec *pathspec)
> +                                          const struct pathspec *pathspec,
> +                                          struct traversal_cache *cache)
>  {
>         /*
>          * WARNING: From this function, you can return path_recurse or you
> @@ -2264,7 +2278,7 @@ static enum path_treatment treat_path_fast(struct dir_struct *dir,
>                  * with check_only set.
>                  */
>                 return read_directory_recursive(dir, istate, path->buf, path->len,
> -                                               cdir->ucd, 1, 0, pathspec);
> +                                               cdir->ucd, 1, 0, pathspec, cache);
>         /*
>          * We get path_recurse in the first run when
>          * directory_exists_in_index() returns index_nonexistent. We
> @@ -2280,13 +2294,14 @@ static enum path_treatment treat_path(struct dir_struct *dir,
>                                       struct index_state *istate,
>                                       struct strbuf *path,
>                                       int baselen,
> -                                     const struct pathspec *pathspec)
> +                                     const struct pathspec *pathspec,
> +                                     struct traversal_cache *cache)
>  {
>         int has_path_in_index, dtype, excluded;
>
>         if (!cdir->d_name)
>                 return treat_path_fast(dir, cdir, istate, path,
> -                                      baselen, pathspec);
> +                                      baselen, pathspec, cache);
>         if (is_dot_or_dotdot(cdir->d_name) || !fspathcmp(cdir->d_name, ".git"))
>                 return path_none;
>         strbuf_setlen(path, baselen);
> @@ -2348,7 +2363,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
>                 strbuf_addch(path, '/');
>                 return treat_directory(dir, istate, untracked,
>                                        path->buf, path->len,
> -                                      baselen, excluded, pathspec);
> +                                      baselen, excluded, pathspec, cache);
>         case DT_REG:
>         case DT_LNK:
>                 if (pathspec &&
> @@ -2549,7 +2564,8 @@ static void add_path_to_appropriate_result_list(struct dir_struct *dir,
>  static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>         struct index_state *istate, const char *base, int baselen,
>         struct untracked_cache_dir *untracked, int check_only,
> -       int stop_at_first_file, const struct pathspec *pathspec)
> +       int stop_at_first_file, const struct pathspec *pathspec,
> +       struct traversal_cache *cache)
>  {
>         /*
>          * WARNING: Do NOT recurse unless path_recurse is returned from
> @@ -2572,7 +2588,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>         while (!read_cached_dir(&cdir)) {
>                 /* check how the file or directory should be treated */
>                 state = treat_path(dir, untracked, &cdir, istate, &path,
> -                                  baselen, pathspec);
> +                                  baselen, pathspec, cache);
>                 dir->visited_paths++;
>
>                 if (state > dir_state)
> @@ -2587,7 +2603,7 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>                         subdir_state =
>                                 read_directory_recursive(dir, istate, path.buf,
>                                                          path.len, ud,
> -                                                        check_only, stop_at_first_file, pathspec);
> +                                                        check_only, stop_at_first_file, pathspec, cache);
>                         if (subdir_state > dir_state)
>                                 dir_state = subdir_state;
>
> @@ -2661,7 +2677,8 @@ int check_dir_entry_contains(const struct dir_entry *out, const struct dir_entry
>  static int treat_leading_path(struct dir_struct *dir,
>                               struct index_state *istate,
>                               const char *path, int len,
> -                             const struct pathspec *pathspec)
> +                             const struct pathspec *pathspec,
> +                             struct traversal_cache *cache)
>  {
>         struct strbuf sb = STRBUF_INIT;
>         struct strbuf subdir = STRBUF_INIT;
> @@ -2713,7 +2730,8 @@ static int treat_leading_path(struct dir_struct *dir,
>                 strbuf_reset(&subdir);
>                 strbuf_add(&subdir, path+prevlen, baselen-prevlen);
>                 cdir.d_name = subdir.buf;
> -               state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, pathspec);
> +
> +               state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen, pathspec, cache);
>
>                 if (state != path_recurse)
>                         break; /* do not recurse into it */
> @@ -2986,6 +3004,9 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
>                    const char *path, int len, const struct pathspec *pathspec)
>  {
>         struct untracked_cache_dir *untracked;
> +       struct traversal_cache cache = {
> +               .real_gitdir = NULL,
> +       };
>
>         trace2_region_enter("dir", "read_directory", istate->repo);
>         dir->visited_paths = 0;
> @@ -3003,8 +3024,10 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
>                  * e.g. prep_exclude()
>                  */
>                 dir->untracked = NULL;
> -       if (!len || treat_leading_path(dir, istate, path, len, pathspec))
> -               read_directory_recursive(dir, istate, path, len, untracked, 0, 0, pathspec);
> +       if (!len || treat_leading_path(dir, istate, path, len, pathspec, &cache)) {
> +               read_directory_recursive(dir, istate, path, len, untracked, 0, 0, pathspec, &cache);
> +       }
> +       FREE_AND_NULL(cache.real_gitdir);
>         QSORT(dir->entries, dir->nr, cmp_dir_entry);
>         QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry);
>
> --
> 2.36.0

This makes sense as a way to avoid the call to real_pathdup() if
that's what we want, but I feel it's a micro-optimization that isn't
worth this code churn.  I'd rather drop this patch from the series and
just use the real_pathdup() you had before (see
https://lore.kernel.org/git/CABPp-BGXRzYCvyM38dEUvQ125+VtRu++7L9UiRz98u+1=Lov7A@mail.gmail.com/).
Let's see what Junio says, but I'm hoping he agrees this patch isn't
needed.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] dir: consider worktree config in path recursion
  2022-05-24 14:29       ` Elijah Newren
@ 2022-05-24 19:45         ` Junio C Hamano
  2022-05-25  3:46           ` Elijah Newren
  0 siblings, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2022-05-24 19:45 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Goss Geppert, Git Mailing List, christian w, Derrick Stolee

Elijah Newren <newren@gmail.com> writes:

> On Wed, May 11, 2022 at 9:37 AM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Goss Geppert <gg.oss.dev@gmail.com> writes:
>>
>> > diff --git a/dir.c b/dir.c
>> > index f2b0f24210..a1886e61a3 100644
>> > --- a/dir.c
>> > +++ b/dir.c
>> > @@ -1893,9 +1893,31 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
>>
> [...]
>>
>> > +                     real_gitdir = real_pathdup(the_repository->gitdir, 0);
>>
>> This function is repeatedly called during the traversal.
>>
>> How expensive is it to keep calling real_pathdup() on the constant
>> the_repository->gitdir just in case it might be the same as our true
>> GIT_DIR?
>
> I agree that treat_directory is called many times, but this
> real_pathdup() call is inside the "if (nested_repo)" block, so this
> new real_pathdup() invocation should occur very seldom.  Or are you
> worried about cases where users have *very* large numbers of bare
> repositories nested under the working directory?

No.  I wasn't worried about anything in particular.  I just wanted
to get the feel of how deep a thought the patch was backed by by
spot checking what was and what was not taken into account when
designing the change.

I do not care too much when there are very large numbers of things
that cause this codepath to be exercised.  Strange situations can be
left for later optimization only when they turn up in the real world
and prove to be a problem.

By the way, where is a bare repository involved?  did you mean
non-bare aka worktree-full repository?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] dir: consider worktree config in path recursion
  2022-05-24 19:45         ` Junio C Hamano
@ 2022-05-25  3:46           ` Elijah Newren
  0 siblings, 0 replies; 24+ messages in thread
From: Elijah Newren @ 2022-05-25  3:46 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Goss Geppert, Git Mailing List, christian w, Derrick Stolee

On Tue, May 24, 2022 at 12:45 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > On Wed, May 11, 2022 at 9:37 AM Junio C Hamano <gitster@pobox.com> wrote:
> >>
> >> Goss Geppert <gg.oss.dev@gmail.com> writes:
> >>
> >> > diff --git a/dir.c b/dir.c
> >> > index f2b0f24210..a1886e61a3 100644
> >> > --- a/dir.c
> >> > +++ b/dir.c
> >> > @@ -1893,9 +1893,31 @@ static enum path_treatment treat_directory(struct dir_struct *dir,
> >>
> > [...]
> >>
> >> > +                     real_gitdir = real_pathdup(the_repository->gitdir, 0);
> >>
> >> This function is repeatedly called during the traversal.
> >>
> >> How expensive is it to keep calling real_pathdup() on the constant
> >> the_repository->gitdir just in case it might be the same as our true
> >> GIT_DIR?
> >
> > I agree that treat_directory is called many times, but this
> > real_pathdup() call is inside the "if (nested_repo)" block, so this
> > new real_pathdup() invocation should occur very seldom.  Or are you
> > worried about cases where users have *very* large numbers of bare
> > repositories nested under the working directory?
>
> No.  I wasn't worried about anything in particular.  I just wanted
> to get the feel of how deep a thought the patch was backed by by
> spot checking what was and what was not taken into account when
> designing the change.
>
> I do not care too much when there are very large numbers of things
> that cause this codepath to be exercised.  Strange situations can be
> left for later optimization only when they turn up in the real world
> and prove to be a problem.

Ah, gotcha.

> By the way, where is a bare repository involved?  did you mean
> non-bare aka worktree-full repository?

Yes, sorry, I meant non-bare repository.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2022-05-25  3:47 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-05 20:32 [RFC PATCH 0/1] dir: consider worktree config in path recursion Goss Geppert
2022-05-05 20:32 ` [RFC PATCH 1/1] " Goss Geppert
2022-05-07  3:26   ` Elijah Newren
2022-05-07 17:59     ` oss dev
2022-05-06 17:02 ` [RFC PATCH 0/1] " Junio C Hamano
2022-05-06 20:00   ` oss dev
2022-05-10 17:15 ` [PATCH v2 0/2] " Goss Geppert
2022-05-10 17:15   ` [PATCH v2 1/2] " Goss Geppert
2022-05-11 16:37     ` Junio C Hamano
2022-05-20 19:45       ` oss dev
2022-05-24 14:29       ` Elijah Newren
2022-05-24 19:45         ` Junio C Hamano
2022-05-25  3:46           ` Elijah Newren
2022-05-11 23:07     ` Junio C Hamano
2022-05-20 20:01       ` oss dev
2022-05-23 19:23     ` Derrick Stolee
2022-05-10 17:15   ` [PATCH v2 2/2] dir: minor refactoring / clean-up Goss Geppert
2022-05-11 16:51     ` Junio C Hamano
2022-05-20 20:03       ` oss dev
2022-05-20 19:28 ` [PATCH v3 0/3] dir: traverse into repository Goss Geppert
2022-05-20 19:28   ` [PATCH v3 1/3] " Goss Geppert
2022-05-20 19:28   ` [PATCH v3 2/3] dir: cache git_dir's realpath Goss Geppert
2022-05-24 14:32     ` Elijah Newren
2022-05-20 19:28   ` [PATCH v3 3/3] dir: minor refactoring / clean-up Goss Geppert

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).