git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git 2.34.0: Behavior of `**` in gitignore is different from previous versions.
@ 2021-11-18 16:41 Danial Alihosseini
  2021-11-18 17:04 ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Danial Alihosseini @ 2021-11-18 16:41 UTC (permalink / raw)
  To: git

Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.

What did you do before the bug happened? (Steps to reproduce your issue)
Consider the following project structure
- data
  - data1
    - file1
    - file1.txt
  - data2
    - file2
    - file2.txt
- .gitignore


`.gitignore` is as follows:
```
data/**
!data/**/
!data/**/*.txt
```
What did you expect to happen? (Expected behavior)

I expect all files in `data` folder to be ignored except `.txt` files.

What happened instead? (Actual behavior)

`file1` and `file2` are not ignored.
Here is the `check-ignore` output:
```
$ git check-ignore -v data/data1/file1
.gitignore:2:!/data/**/ data/data1/file1
```

What's different between what you expected and what actually happened?
It seems that the behavior of `**` in gitignore is different from
previous versions.

Anything else you want to add:

Please review the rest of the bug report below.
You can delete any lines you don't wish to share.


[System Info]
git version:
git version 2.34.0
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
uname: Linux 5.15.2-arch1-1
compiler info: gnuc: 11.1
libc info: glibc: 2.33
$SHELL (typically, interactive shell): /usr/bin/zsh


[Enabled Hooks]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git 2.34.0: Behavior of `**` in gitignore is different from previous versions.
  2021-11-18 16:41 git 2.34.0: Behavior of `**` in gitignore is different from previous versions Danial Alihosseini
@ 2021-11-18 17:04 ` Jeff King
  2021-11-18 22:09   ` Derrick Stolee
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2021-11-18 17:04 UTC (permalink / raw)
  To: Danial Alihosseini; +Cc: Derrick Stolee, git

On Thu, Nov 18, 2021 at 08:11:04PM +0330, Danial Alihosseini wrote:

> What did you do before the bug happened? (Steps to reproduce your issue)
> Consider the following project structure
> - data
>   - data1
>     - file1
>     - file1.txt
>   - data2
>     - file2
>     - file2.txt
> - .gitignore
> 
> 
> `.gitignore` is as follows:
> ```
> data/**
> !data/**/
> !data/**/*.txt
> ```
> What did you expect to happen? (Expected behavior)
> 
> I expect all files in `data` folder to be ignored except `.txt` files.
> 
> What happened instead? (Actual behavior)
> 
> `file1` and `file2` are not ignored.
> Here is the `check-ignore` output:
> ```
> $ git check-ignore -v data/data1/file1
> .gitignore:2:!/data/**/ data/data1/file1
> ```

Thanks for an easy reproduction. It looks like this changed in
f6526728f9 (dir: select directories correctly, 2021-09-24). Author cc'd.

The key thing seems to be that the second line of your .gitignore should
match only directories (because of the trailing slash), but no longer
does.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git 2.34.0: Behavior of `**` in gitignore is different from previous versions.
  2021-11-18 17:04 ` Jeff King
@ 2021-11-18 22:09   ` Derrick Stolee
  2021-11-19  4:01     ` Danial Alihosseini
  0 siblings, 1 reply; 13+ messages in thread
From: Derrick Stolee @ 2021-11-18 22:09 UTC (permalink / raw)
  To: Jeff King, Danial Alihosseini; +Cc: Derrick Stolee, git

On 11/18/2021 12:04 PM, Jeff King wrote:
> On Thu, Nov 18, 2021 at 08:11:04PM +0330, Danial Alihosseini wrote:
> 
>> What did you do before the bug happened? (Steps to reproduce your issue)
>> Consider the following project structure
>> - data
>>   - data1
>>     - file1
>>     - file1.txt
>>   - data2
>>     - file2
>>     - file2.txt
>> - .gitignore
>>
>>
>> `.gitignore` is as follows:
>> ```
>> data/**
>> !data/**/
>> !data/**/*.txt
>> ```
>> What did you expect to happen? (Expected behavior)
>>
>> I expect all files in `data` folder to be ignored except `.txt` files.
>>
>> What happened instead? (Actual behavior)
>>
>> `file1` and `file2` are not ignored.
>> Here is the `check-ignore` output:
>> ```
>> $ git check-ignore -v data/data1/file1
>> .gitignore:2:!/data/**/ data/data1/file1
>> ```
> 
> Thanks for an easy reproduction. It looks like this changed in
> f6526728f9 (dir: select directories correctly, 2021-09-24). Author cc'd.

Thanks for the bisect and CC.

> The key thing seems to be that the second line of your .gitignore should
> match only directories (because of the trailing slash), but no longer
> does.

Doesn't "matching only directories" mean it would match everything
within that directory? (It also means that "data/file" is not matched,
which is still correct.)

My interpretation of these patterns is that everything in data/data1/
and data/data2/ should not be ignored, making it seem like the change
fixed a bug (it definitely changed behavior).

Just for extra clarity, this test currently passes:

test_expect_success 'directories and ** matches' '
	cat >.gitignore <<-\EOF &&
	data/**
	!data/**/
	!data/**/*.txt
	EOF
	git check-ignore file \
		data/file data/data1/file1 data/data1/file1.txt \
		data/data2/file2 data/data2/file2.txt >actual &&
	cat >expect <<-\EOF &&
	data/file
	EOF
	test_cmp expect actual
'

but the previous behavior would have passed this test:

test_expect_success 'directories and ** matches' '
	cat >.gitignore <<-\EOF &&
	data/**
	!data/**/
	!data/**/*.txt
	EOF
	git check-ignore file \
		data/file data/data1/file1.txt \
		data/data2/file2.txt >actual &&
	cat >expect <<-\EOF &&
	data/file
	EOF
	test_cmp expect actual
'

I seek more clarity on this. Specifically: if we match a directory
then should we not also match the contents within?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git 2.34.0: Behavior of `**` in gitignore is different from previous versions.
  2021-11-18 22:09   ` Derrick Stolee
@ 2021-11-19  4:01     ` Danial Alihosseini
  2021-11-19 14:51       ` Derrick Stolee
  0 siblings, 1 reply; 13+ messages in thread
From: Danial Alihosseini @ 2021-11-19  4:01 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Jeff King, Derrick Stolee, git

Thanks for your follow-up.

I wanted to ignore all files in the "data" folder except ".txt" ones.

As mentioned in the gitignore doc, there should be a difference
between "**" and "**/".

> A trailing "/**" matches everything inside. For example, "abc/**" matches all files inside directory "abc", relative to the location of the .gitignore file, with infinite depth.

and,
>
> A slash followed by two consecutive asterisks then a slash matches zero or more directories. For example, "a/**/b" matches "a/b", "a/x/b", "a/x/y/b" and so on.

and also,
>
> It is not possible to re-include a file if a parent directory of that file is excluded.

So, I excluded all files by "data/**", re-included just directories
(at any depth) by "!data/**/" and, re-included ".txt" files (at any
depth) by "!data/**/*.txt".
However, if I wanted to re-include a directory and all of its
contents, I could use something like "!data/data1/**", without
trailing slash.
As I see, there is no separation between "**" and "**/" in the current version.

I think there is another point that the previous behavior test case
should be like this:
test_expect_success 'directories and ** matches' '
    cat >.gitignore <<-\EOF &&
    data/**
    !data/**/
    !data/**/*.txt
    EOF
    git check-ignore file \
        data/file data/data1/file1 data/data1/file1.txt \
        data/data2/file2 data/data2/file2.txt >actual &&
    cat >expect <<-\EOF &&
    data/file
    data/data1/file1
    data/data2/file2
    EOF
    test_cmp expect actual
'
Thanks
Danial


On Fri, Nov 19, 2021 at 1:40 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 11/18/2021 12:04 PM, Jeff King wrote:
> > On Thu, Nov 18, 2021 at 08:11:04PM +0330, Danial Alihosseini wrote:
> >
> >> What did you do before the bug happened? (Steps to reproduce your issue)
> >> Consider the following project structure
> >> - data
> >>   - data1
> >>     - file1
> >>     - file1.txt
> >>   - data2
> >>     - file2
> >>     - file2.txt
> >> - .gitignore
> >>
> >>
> >> `.gitignore` is as follows:
> >> ```
> >> data/**
> >> !data/**/
> >> !data/**/*.txt
> >> ```
> >> What did you expect to happen? (Expected behavior)
> >>
> >> I expect all files in `data` folder to be ignored except `.txt` files.
> >>
> >> What happened instead? (Actual behavior)
> >>
> >> `file1` and `file2` are not ignored.
> >> Here is the `check-ignore` output:
> >> ```
> >> $ git check-ignore -v data/data1/file1
> >> .gitignore:2:!/data/**/ data/data1/file1
> >> ```
> >
> > Thanks for an easy reproduction. It looks like this changed in
> > f6526728f9 (dir: select directories correctly, 2021-09-24). Author cc'd.
>
> Thanks for the bisect and CC.
>
> > The key thing seems to be that the second line of your .gitignore should
> > match only directories (because of the trailing slash), but no longer
> > does.
>
> Doesn't "matching only directories" mean it would match everything
> within that directory? (It also means that "data/file" is not matched,
> which is still correct.)
>
> My interpretation of these patterns is that everything in data/data1/
> and data/data2/ should not be ignored, making it seem like the change
> fixed a bug (it definitely changed behavior).
>
> Just for extra clarity, this test currently passes:
>
> test_expect_success 'directories and ** matches' '
>         cat >.gitignore <<-\EOF &&
>         data/**
>         !data/**/
>         !data/**/*.txt
>         EOF
>         git check-ignore file \
>                 data/file data/data1/file1 data/data1/file1.txt \
>                 data/data2/file2 data/data2/file2.txt >actual &&
>         cat >expect <<-\EOF &&
>         data/file
>         EOF
>         test_cmp expect actual
> '
>
> but the previous behavior would have passed this test:
>
> test_expect_success 'directories and ** matches' '
>         cat >.gitignore <<-\EOF &&
>         data/**
>         !data/**/
>         !data/**/*.txt
>         EOF
>         git check-ignore file \
>                 data/file data/data1/file1.txt \
>                 data/data2/file2.txt >actual &&
>         cat >expect <<-\EOF &&
>         data/file
>         EOF
>         test_cmp expect actual
> '
>
> I seek more clarity on this. Specifically: if we match a directory
> then should we not also match the contents within?
>
> Thanks,
> -Stolee

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git 2.34.0: Behavior of `**` in gitignore is different from previous versions.
  2021-11-19  4:01     ` Danial Alihosseini
@ 2021-11-19 14:51       ` Derrick Stolee
  2021-11-19 17:06         ` Danial Alihosseini
  2021-11-19 20:05         ` Johannes Sixt
  0 siblings, 2 replies; 13+ messages in thread
From: Derrick Stolee @ 2021-11-19 14:51 UTC (permalink / raw)
  To: Danial Alihosseini; +Cc: Jeff King, Derrick Stolee, git

On 11/18/2021 11:01 PM, Danial Alihosseini wrote:
> Thanks for your follow-up.
> 
> I wanted to ignore all files in the "data" folder except ".txt" ones.
> 
> As mentioned in the gitignore doc, there should be a difference
> between "**" and "**/".
> 
>> A trailing "/**" matches everything inside. For example, "abc/**" matches all files inside directory "abc", relative to the location of the .gitignore file, with infinite depth.
> 
> and,
>>
>> A slash followed by two consecutive asterisks then a slash matches zero or more directories. For example, "a/**/b" matches "a/b", "a/x/b", "a/x/y/b" and so on.
> 
> and also,
>>
>> It is not possible to re-include a file if a parent directory of that file is excluded.

So this is the key point of why you can't just omit the !data/**/
line.

What is unclear to me is what exactly "match a directory" means.
If we ignore a directory, then we ignore everything inside it (until
another pattern says we should care about it), but the converse
should also hold: if we have a pattern like "!data/**/", then that
should mean "include everything inside data/<A>/ where <A> is any
directory name".

My inability to form a mental model where the existing behavior
matches the documented specification is an indicator that this was
changed erroneously. A revert patch is included at the end of this
message.

If anyone could help clarify my understanding here, then maybe
there is room for improving the documentation.

> So, I excluded all files by "data/**", re-included just directories
> (at any depth) by "!data/**/" and, re-included ".txt" files (at any
> depth) by "!data/**/*.txt".
> However, if I wanted to re-include a directory and all of its
> contents, I could use something like "!data/data1/**", without
> trailing slash.
> As I see, there is no separation between "**" and "**/" in the current version.
> 
> I think there is another point that the previous behavior test case
> should be like this:
> test_expect_success 'directories and ** matches' '
>     cat >.gitignore <<-\EOF &&
>     data/**
>     !data/**/
>     !data/**/*.txt
>     EOF
>     git check-ignore file \
>         data/file data/data1/file1 data/data1/file1.txt \
>         data/data2/file2 data/data2/file2.txt >actual &&
>     cat >expect <<-\EOF &&
>     data/file
>     data/data1/file1
>     data/data2/file2
>     EOF
>     test_cmp expect actual
> '

Thank you for seeing my poorly-edited test and understanding
what it _should_ have been.

Here is a revert including this new test. I further tested
with the Scalar functional tests [1] to see if I could find
what I was thinking when making this original change. Since
this is so similar to another revert we caught before the
2.34.0 release, I regret not checking adjacent commits for
a similar "reverting does not affect the test suite" case.

---- >8 ----

From f833967da83ecd51090ebb75a4b8173a775d16f1 Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@microsoft.com>
Date: Fri, 19 Nov 2021 09:13:49 -0500
Subject: [PATCH] dir: revert "dir: select directories correctly"

This reverts commit f6526728f950cacfd5b5e42bcc65f2c47f3da654.

The change in f652672 (dir: select directories correctly, 2021-09-24)
caused a regression in directory-based matches with non-cone-mode
patterns, especially for .gitignore patterns. A test is included to
prevent this regression in the future.

The commit ed495847 (dir: fix pattern matching on dirs, 2021-09-24) was
reverted in 5ceb663 (dir: fix directory-matching bug, 2021-11-02) for
similar reasons. Neither commit changed tests, and tests added later in
the series continue to pass when these commits are reverted.

Reported-by: Danial Alihosseini <danial.alihosseini@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c              | 54 +++++-----------------------------------------
 t/t0008-ignores.sh | 17 +++++++++++++++
 2 files changed, 22 insertions(+), 49 deletions(-)

diff --git a/dir.c b/dir.c
index 9ea6cfe61c..86afa2eae0 100644
--- a/dir.c
+++ b/dir.c
@@ -1303,44 +1303,6 @@ int match_pathname(const char *pathname, int pathlen,
 				 WM_PATHNAME) == 0;
 }
 
-static int path_matches_dir_pattern(const char *pathname,
-				    int pathlen,
-				    struct strbuf **path_parent,
-				    int *dtype,
-				    struct path_pattern *pattern,
-				    struct index_state *istate)
-{
-	if (!*path_parent) {
-		char *slash;
-		CALLOC_ARRAY(*path_parent, 1);
-		strbuf_add(*path_parent, pathname, pathlen);
-		slash = find_last_dir_sep((*path_parent)->buf);
-
-		if (slash)
-			strbuf_setlen(*path_parent, slash - (*path_parent)->buf);
-		else
-			strbuf_setlen(*path_parent, 0);
-	}
-
-	/*
-	 * If the parent directory matches the pattern, then we do not
-	 * need to check for dtype.
-	 */
-	if ((*path_parent)->len &&
-	    match_pathname((*path_parent)->buf, (*path_parent)->len,
-			   pattern->base,
-			   pattern->baselen ? pattern->baselen - 1 : 0,
-			   pattern->pattern, pattern->nowildcardlen,
-			   pattern->patternlen, pattern->flags))
-		return 1;
-
-	*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
-	if (*dtype != DT_DIR)
-		return 0;
-
-	return 1;
-}
-
 /*
  * Scan the given exclude list in reverse to see whether pathname
  * should be ignored.  The first match (i.e. the last on the list), if
@@ -1356,7 +1318,6 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname
 {
 	struct path_pattern *res = NULL; /* undecided */
 	int i;
-	struct strbuf *path_parent = NULL;
 
 	if (!pl->nr)
 		return NULL;	/* undefined */
@@ -1366,10 +1327,11 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname
 		const char *exclude = pattern->pattern;
 		int prefix = pattern->nowildcardlen;
 
-		if (pattern->flags & PATTERN_FLAG_MUSTBEDIR &&
-		    !path_matches_dir_pattern(pathname, pathlen, &path_parent,
-					      dtype, pattern, istate))
-			continue;
+		if (pattern->flags & PATTERN_FLAG_MUSTBEDIR) {
+			*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
+			if (*dtype != DT_DIR)
+				continue;
+		}
 
 		if (pattern->flags & PATTERN_FLAG_NODIR) {
 			if (match_basename(basename,
@@ -1393,12 +1355,6 @@ static struct path_pattern *last_matching_pattern_from_list(const char *pathname
 			break;
 		}
 	}
-
-	if (path_parent) {
-		strbuf_release(path_parent);
-		free(path_parent);
-	}
-
 	return res;
 }
 
diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
index 46a5038fed..fce7c7b408 100755
--- a/t/t0008-ignores.sh
+++ b/t/t0008-ignores.sh
@@ -828,6 +828,23 @@ test_expect_success 'exact prefix matching (without root)' '
 	test_cmp expect actual
 '
 
+test_expect_success 'directories and ** matches' '
+	cat >.gitignore <<-\EOF &&
+	data/**
+	!data/**/
+	!data/**/*.txt
+	EOF
+	git check-ignore file \
+		data/file data/data1/file1 data/data1/file1.txt \
+		data/data2/file2 data/data2/file2.txt >actual &&
+	cat >expect <<-\EOF &&
+	data/file
+	data/data1/file1
+	data/data2/file2
+	EOF
+	test_cmp expect actual
+'
+
 ############################################################################
 #
 # test whitespace handling
-- 
2.34.0.dirty


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: git 2.34.0: Behavior of `**` in gitignore is different from previous versions.
  2021-11-19 14:51       ` Derrick Stolee
@ 2021-11-19 17:06         ` Danial Alihosseini
  2021-11-19 20:05         ` Johannes Sixt
  1 sibling, 0 replies; 13+ messages in thread
From: Danial Alihosseini @ 2021-11-19 17:06 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Jeff King, Derrick Stolee, git

Thanks Derrick.

> What is unclear to me is what exactly "match a directory" means.
> If we ignore a directory, then we ignore everything inside it (until
> another pattern says we should care about it), but the converse
> should also hold: if we have a pattern like "!data/**/", then that
> should mean "include everything inside data/<A>/ where <A> is any
> directory name".

I think if we consider directory and files separately, it would be more clear.
If a pattern in gitignore matches a directory, it is just about that
directory and not about its contents.
The contents are ignored by another rule which states:
> It is not possible to re-include a file if a parent directory of that file is excluded.

In our test case, the "data1" directory is ignored by the "data/**" pattern.
However, the contents are ignored by two rules. One of them is that
the parent ("data1") is not tracked. Another one is that the "data/**"
pattern matches them.
By adding the "!data/**/" pattern, the "data1" directory is
re-included, but its contents are still ignored by the "data/**"
pattern.

I hope this will help to make the rules more clear.

Thanks
Danial

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git 2.34.0: Behavior of `**` in gitignore is different from previous versions.
  2021-11-19 14:51       ` Derrick Stolee
  2021-11-19 17:06         ` Danial Alihosseini
@ 2021-11-19 20:05         ` Johannes Sixt
  2021-11-19 20:33           ` Derrick Stolee
  1 sibling, 1 reply; 13+ messages in thread
From: Johannes Sixt @ 2021-11-19 20:05 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Danial Alihosseini, Jeff King, Derrick Stolee, git

Am 19.11.21 um 15:51 schrieb Derrick Stolee:
> What is unclear to me is what exactly "match a directory" means.
> If we ignore a directory, then we ignore everything inside it (until
> another pattern says we should care about it), but the converse
> should also hold: if we have a pattern like "!data/**/", then that
> should mean "include everything inside data/<A>/ where <A> is any
> directory name".
> 
> My inability to form a mental model where the existing behavior
> matches the documented specification is an indicator that this was
> changed erroneously. A revert patch is included at the end of this
> message.
> 
> If anyone could help clarify my understanding here, then maybe
> there is room for improving the documentation.

You form a wrong mental model when you start with the grand picture of a
working tree. That is, when you say

- here I have theeeeeese many files and directories,
- and I want to ignore some: foo/**/,
- but I don't want to ignore others: !bar/**/.

This forms the wrong mental model because that is not how Git sees the
working tree: it never has a grand picture of all of its contents.

Git only ever sees the contents of one directory. When Git determines
that a sub-directory is ignored, then that one's contents are never
inspected, and there is no opportunity to un-ignore some of the
sub-directory's contents.

-- Hannes

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git 2.34.0: Behavior of `**` in gitignore is different from previous versions.
  2021-11-19 20:05         ` Johannes Sixt
@ 2021-11-19 20:33           ` Derrick Stolee
  2021-11-19 20:57             ` Ævar Arnfjörð Bjarmason
  2021-11-20 22:41             ` Chris Torek
  0 siblings, 2 replies; 13+ messages in thread
From: Derrick Stolee @ 2021-11-19 20:33 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Danial Alihosseini, Jeff King, Derrick Stolee, git

On 11/19/2021 3:05 PM, Johannes Sixt wrote:
> Am 19.11.21 um 15:51 schrieb Derrick Stolee:
>> What is unclear to me is what exactly "match a directory" means.
>> If we ignore a directory, then we ignore everything inside it (until
>> another pattern says we should care about it), but the converse
>> should also hold: if we have a pattern like "!data/**/", then that
>> should mean "include everything inside data/<A>/ where <A> is any
>> directory name".
>>
>> My inability to form a mental model where the existing behavior
>> matches the documented specification is an indicator that this was
>> changed erroneously. A revert patch is included at the end of this
>> message.
>>
>> If anyone could help clarify my understanding here, then maybe
>> there is room for improving the documentation.
> 
> You form a wrong mental model when you start with the grand picture of a
> working tree. That is, when you say
> 
> - here I have theeeeeese many files and directories,
> - and I want to ignore some: foo/**/,
> - but I don't want to ignore others: !bar/**/.
> 
> This forms the wrong mental model because that is not how Git sees the
> working tree: it never has a grand picture of all of its contents.
> 
> Git only ever sees the contents of one directory. When Git determines
> that a sub-directory is ignored, then that one's contents are never
> inspected, and there is no opportunity to un-ignore some of the
> sub-directory's contents.

So the problem is this: I want to know "I have a file named <X>, and
a certain pattern set, does <X> match the patterns or not?" but in
fact it's not just "check <X> against the patterns in order" but
actually "check every parent directory of <X> in order to see if
any directory is unmatched, which would preclude any later matches
to other parents of <X>"

So really, to check a path, we really want to first iterate on the
parent directories. If we get a match on a positive pattern on level
i, then we check level (i+1) for a match on a negative pattern. If
we find that negative pattern match, then continue. If we do not see
a negative match, then we terminate by matching the entire path <X>.

I'm still not seeing a clear way of describing the matching procedure
here for a single path, and that's fine. Me understanding is not a
necessary condition for fixing this bug.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git 2.34.0: Behavior of `**` in gitignore is different from previous versions.
  2021-11-19 20:33           ` Derrick Stolee
@ 2021-11-19 20:57             ` Ævar Arnfjörð Bjarmason
  2021-11-20 22:41             ` Chris Torek
  1 sibling, 0 replies; 13+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-19 20:57 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Johannes Sixt, Danial Alihosseini, Jeff King, Derrick Stolee, git


On Fri, Nov 19 2021, Derrick Stolee wrote:

> On 11/19/2021 3:05 PM, Johannes Sixt wrote:
>> Am 19.11.21 um 15:51 schrieb Derrick Stolee:
>>> What is unclear to me is what exactly "match a directory" means.
>>> If we ignore a directory, then we ignore everything inside it (until
>>> another pattern says we should care about it), but the converse
>>> should also hold: if we have a pattern like "!data/**/", then that
>>> should mean "include everything inside data/<A>/ where <A> is any
>>> directory name".
>>>
>>> My inability to form a mental model where the existing behavior
>>> matches the documented specification is an indicator that this was
>>> changed erroneously. A revert patch is included at the end of this
>>> message.
>>>
>>> If anyone could help clarify my understanding here, then maybe
>>> there is room for improving the documentation.
>> 
>> You form a wrong mental model when you start with the grand picture of a
>> working tree. That is, when you say
>> 
>> - here I have theeeeeese many files and directories,
>> - and I want to ignore some: foo/**/,
>> - but I don't want to ignore others: !bar/**/.
>> 
>> This forms the wrong mental model because that is not how Git sees the
>> working tree: it never has a grand picture of all of its contents.
>> 
>> Git only ever sees the contents of one directory. When Git determines
>> that a sub-directory is ignored, then that one's contents are never
>> inspected, and there is no opportunity to un-ignore some of the
>> sub-directory's contents.
>
> So the problem is this: I want to know "I have a file named <X>, and
> a certain pattern set, does <X> match the patterns or not?" but in
> fact it's not just "check <X> against the patterns in order" but
> actually "check every parent directory of <X> in order to see if
> any directory is unmatched, which would preclude any later matches
> to other parents of <X>"
>
> So really, to check a path, we really want to first iterate on the
> parent directories. If we get a match on a positive pattern on level
> i, then we check level (i+1) for a match on a negative pattern. If
> we find that negative pattern match, then continue. If we do not see
> a negative match, then we terminate by matching the entire path <X>.
>
> I'm still not seeing a clear way of describing the matching procedure
> here for a single path, and that's fine. Me understanding is not a
> necessary condition for fixing this bug.

Just watching this thread on the sidelines I think it would help if it
can be distilled down to a wildatch() test that doesn't have to do with
the pathspec matching code.

I.e. can you stick the "should this match?" into t3070 and it does the
same thing, or is this to do with the pathspec-specific sugar on top,
either that it splits paths and then matches them, that there's some
information about the path type in there added on top, or that it's to
do with the specifics of the exclude/include gitignore matching?

FWIW I have some old WIP patches somewhere where I made this match
behavior much faster by compiling the (using a mode PCREv2 has) glob
syntax into PCRE's, which are then JIT'ed, and matched.

To do that I had to unpeel this whole truncation of the pattern thing,
and IIRC it didn't matter for speed (or maybe it did just with the
wildmatch code?).

Maybe all of this is irrelevant, sorry. I haven't looked into this issue
at all, just skimmed this growing thread over the past day, maybe some
of the above helps, or not...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git 2.34.0: Behavior of `**` in gitignore is different from previous versions.
  2021-11-19 20:33           ` Derrick Stolee
  2021-11-19 20:57             ` Ævar Arnfjörð Bjarmason
@ 2021-11-20 22:41             ` Chris Torek
  2021-11-21  0:46               ` Junio C Hamano
  1 sibling, 1 reply; 13+ messages in thread
From: Chris Torek @ 2021-11-20 22:41 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Johannes Sixt, Danial Alihosseini, Jeff King, Derrick Stolee,
	Git List

On Fri, Nov 19, 2021 at 12:44 PM Derrick Stolee <stolee@gmail.com> wrote:
> So the problem is this: I want to know "I have a file named <X>, and
> a certain pattern set, does <X> match the patterns or not?" but in
> fact it's not just "check <X> against the patterns in order"  ...

Right.  You're starting with the wrong problem by assuming that
you *do* have file named <X> in the first place.  You would not
have it at all in some cases.

I've always found this somewhat puzzling, or at least very tricky
to explain, in the presence of a file that is tracked because it's
already in the index, despite having a parent directory that would
have caused the file to not have been found.  So the standard
explanation -- at least, the one I use -- is this:

 * Git opens and reads the working tree directory.  For each file
   or directory that is actually present here, Git checks it
   against the ignore rules.  Some rules match only directories
   and others match both directories and files.  Some rules say
   "do ignore" and some say "do not ignore".

 * The *last* applicable rule wins.

 * If this is a file and the file is ignored, it's ignored.
   Unless, that is, it's in the index already, because then it's
   tracked and can't be ignored.

 * If this is a directory and the directory is ignored, it's
   not even opened and read.  It's not in the index because
   directories are never in the index (at least nominally).
   If it is opened and read, the entire set of rules here
   apply recursively.

This works, but skips over files that are in the index and are in
a directory that won't be read.  So I add one last rule, which is
that already-tracked files are checked despite not being scanned
during the above process.  (That's because the process above is
only used to determine which files to *complain about* at `git
status` time, or auto-add with `git add --all` or the like.)

So: these files are not ignored ... but is their directory ever
read?  The actual answer, per testing, is "no", which matches with
the parenthetical in the paragraph just above.  But no
documentation says this, explicitly, one way or another.

Incidentally, while I have no patches to contribute at this point,
I do think it would be sensible for Git to read a `.gitignore`
that says:

    *
    !a/b/c

as meaning:

    *
    !a/
    !a/b/
    !a/b/c

That is, declaring an un-ignored file within some ignored
directory should automatically imply that we *must* un-ignore the
parents.  It doesn't seem like it should be that hard to insert
some extra rules internally here (though without `*` we'd want
to ignore `a/*`, i.e., the above is optimized beyond what I
would expect from automated rule insertion).

Chris

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git 2.34.0: Behavior of `**` in gitignore is different from previous versions.
  2021-11-20 22:41             ` Chris Torek
@ 2021-11-21  0:46               ` Junio C Hamano
  2021-11-23 12:21                 ` Philip Oakley
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2021-11-21  0:46 UTC (permalink / raw)
  To: Chris Torek
  Cc: Derrick Stolee, Johannes Sixt, Danial Alihosseini, Jeff King,
	Derrick Stolee, Git List

Chris Torek <chris.torek@gmail.com> writes:

> ...  So the standard
> explanation -- at least, the one I use -- is this:
>
>  * Git opens and reads the working tree directory.  For each file
>    or directory that is actually present here, Git checks it
>    against the ignore rules.  Some rules match only directories
>    and others match both directories and files.  Some rules say
>    "do ignore" and some say "do not ignore".
>
>  * The *last* applicable rule wins.
>
>  * If this is a file and the file is ignored, it's ignored.
>    Unless, that is, it's in the index already, because then it's
>    tracked and can't be ignored.
>
>  * If this is a directory and the directory is ignored, it's
>    not even opened and read.  It's not in the index because
>    directories are never in the index (at least nominally).
>    If it is opened and read, the entire set of rules here
>    apply recursively.
>
> This works, but skips over files that are in the index and are in
> a directory that won't be read.  So I add one last rule, which is

All of the above is sensible.  If you deal with a path that is in
the index upfront, you can simplify the later rules somewhat, I
would think.  I.e. add a first rule before everything else that
says:

 * A file in the index is not ignored.  Everything below is about a
   path not in the index.

Then your third rule can lose "Unless...", and you do not have to
add one last rule outside the bulleted list, either.

;-)


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git 2.34.0: Behavior of `**` in gitignore is different from previous versions.
  2021-11-21  0:46               ` Junio C Hamano
@ 2021-11-23 12:21                 ` Philip Oakley
  2021-11-23 21:13                   ` Danial Alihosseini
  0 siblings, 1 reply; 13+ messages in thread
From: Philip Oakley @ 2021-11-23 12:21 UTC (permalink / raw)
  To: Junio C Hamano, Chris Torek
  Cc: Derrick Stolee, Johannes Sixt, Danial Alihosseini, Jeff King,
	Derrick Stolee, Git List

On 21/11/2021 00:46, Junio C Hamano wrote:
> Chris Torek <chris.torek@gmail.com> writes:
>
>> ...  So the standard
>> explanation -- at least, the one I use -- is this:
>>
>>  * Git opens and reads the working tree directory.  For each file
>>    or directory that is actually present here, Git checks it
>>    against the ignore rules.  Some rules match only directories
>>    and others match both directories and files.  Some rules say
>>    "do ignore" and some say "do not ignore".
>>
>>  * The *last* applicable rule wins.
>>
>>  * If this is a file and the file is ignored, it's ignored.
>>    Unless, that is, it's in the index already, because then it's
>>    tracked and can't be ignored.
>>
>>  * If this is a directory and the directory is ignored, it's
>>    not even opened and read.  It's not in the index because
>>    directories are never in the index (at least nominally).
>>    If it is opened and read, the entire set of rules here
>>    apply recursively.
>>
>> This works, but skips over files that are in the index and are in
>> a directory that won't be read.  So I add one last rule, which is
> All of the above is sensible.  If you deal with a path that is in
> the index upfront, you can simplify the later rules somewhat, I
> would think.  I.e. add a first rule before everything else that
> says:
>
>  * A file in the index is not ignored.  Everything below is about a
>    path not in the index.
>
> Then your third rule can lose "Unless...", and you do not have to
> add one last rule outside the bulleted list, either.
>
> ;-)
>
Perhaps, the clarifications are an opportunity to improve the
documentation, should the Danial think that the explanation would have
helped. The 'ignore' matching rules does appear to show up quite often
on the list.

Any thoughts, Danial?

Philip

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git 2.34.0: Behavior of `**` in gitignore is different from previous versions.
  2021-11-23 12:21                 ` Philip Oakley
@ 2021-11-23 21:13                   ` Danial Alihosseini
  0 siblings, 0 replies; 13+ messages in thread
From: Danial Alihosseini @ 2021-11-23 21:13 UTC (permalink / raw)
  To: philipoakley
  Cc: Junio C Hamano, Chris Torek, Derrick Stolee, Johannes Sixt,
	Jeff King, Derrick Stolee, Git List

>  * Git opens and reads the working tree directory.  For each file
>
>    or directory that is actually present here, Git checks it
>
>    against the ignore rules.  Some rules match only directories
>
>    and others match both directories and files.  Some rules say
>    "do ignore" and some say "do not ignore".
>  * The *last* applicable rule wins.
>  * If this is a file and the file is ignored, it's ignored.
>    Unless, that is, it's in the index already, because then it's
>    tracked and can't be ignored.
>  * If this is a directory and the directory is ignored, it's
>    not even opened and read.  It's not in the index because
>    directories are never in the index (at least nominally).
>    If it is opened and read, the entire set of rules here
>    apply recursively.

> I do think it would be sensible for Git to read a `.gitignore`
>
> that says:
>
>     *
>
>     !a/b/c
>
> as meaning:
>     *
>     !a/
>     !a/b/
>     !a/b/c
> That is, declaring an un-ignored file within some ignored
> directory should automatically imply that we *must* un-ignore the
> parents.

Thanks, Chris. Yes, it mostly makes sense to me too.
I think this will better describe the gitignore behavior.
However, checking the indexed files at first may enhance it.

In addition to that, there should be an exception about negating patterns.
For example, for the "!**/file.txt" pattern, it might be tricky to
re-include all possible parents of "file.txt" files. You have to
traverse directories to reach files and find out which parents to
un-ignore. Any negating pattern with "**" at the beginning or in the
middle of the pattern can make trouble.
The existing rule which states the file cannot be re-included in case
of ignored parents may help to prevent this.

Thanks
Danial

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-11-23 21:13 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-18 16:41 git 2.34.0: Behavior of `**` in gitignore is different from previous versions Danial Alihosseini
2021-11-18 17:04 ` Jeff King
2021-11-18 22:09   ` Derrick Stolee
2021-11-19  4:01     ` Danial Alihosseini
2021-11-19 14:51       ` Derrick Stolee
2021-11-19 17:06         ` Danial Alihosseini
2021-11-19 20:05         ` Johannes Sixt
2021-11-19 20:33           ` Derrick Stolee
2021-11-19 20:57             ` Ævar Arnfjörð Bjarmason
2021-11-20 22:41             ` Chris Torek
2021-11-21  0:46               ` Junio C Hamano
2021-11-23 12:21                 ` Philip Oakley
2021-11-23 21:13                   ` Danial Alihosseini

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).