* Oddities in gitignore matching
From: John Thorvald Wodder II @ 2022-06-26 19:34 UTC
To: git
Git developers & fans,
I'm developing a Python library for performing gitignore-style path matching (because all the pre-existing ones I've found so far leave something to be desired), and, in my experiments to nail down Git's exact behavior, I've encountered three odd things that may be bugs or may be deficiencies in the documentation; let me know which.
First: I've found that the pattern "foo**/bar" causes the path "foo/glarch/bar" (as well as "foobie/glarch/bar") to be ignored. However, the gitignore(5) documentation states that "**/" only has special meaning when it's "leading"; in other circumstances, the double star should be treated the same as a single star (and "foo*/bar" does not match "foo/glarch/bar"). Is this behavior of non-leading "**/" deliberate or a bug?
Interestingly, checking the pattern with the wildmatch test-tool (`t/helper/test-tool wildmatch wildmatch foo/glarch/bar 'foo**/bar'`) shows that the pattern should not match the path.
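As a rough illustration of the documented rule (not Git's actual matcher), the Python sketch below turns a pattern into a regular expression in which any run of asterisks behaves like a single "*" that cannot cross a slash. The name `documented_match` is hypothetical, and the sketch assumes the pattern contains only literal characters and asterisks; it does not handle leading "**/", trailing "/**", "/**/", "?", bracket expressions, or escapes.

```python
import re


def documented_match(pattern: str, path: str) -> bool:
    """Match path against pattern, treating every run of '*' as one '*'
    that matches any characters except '/'.

    Illustration only: handles literal characters and '*', nothing else.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "*":
            # Collapse "**", "***", ... into a single star.
            while i < len(pattern) and pattern[i] == "*":
                i += 1
            parts.append("[^/]*")
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    # A pattern with a slash in the middle is matched against the whole path.
    return re.fullmatch("".join(parts), path) is not None


if __name__ == "__main__":
    for pattern in ("foo*/bar", "foo**/bar"):
        for path in ("foo/glarch/bar", "foobie/glarch/bar", "foo/bar"):
            print(pattern, path, documented_match(pattern, path))
```

Under this reading, "foo*/bar" and "foo**/bar" give identical results, and neither matches "foo/glarch/bar" or "foobie/glarch/bar".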
Second: The pattern "[[:space:]]" does not match 0x0B (\v, vertical tab) or 0x0C (\f, form feed) despite the fact that the C isspace() function accepts these characters, and I cannot figure out the cause for this discrepancy. (The pattern does match the other characters that isspace() accepts, though — tab, line feed, carriage return, and space character.) The wildmatch test-tool agrees with this behavior, though.
Third: While the documentation for `git-check-ignore` only states that it works on files, I've found that it also works with directory paths, as well as treating any nonexistent path ending in a slash as a directory. For example, in a fresh repository containing only a .gitignore file with the pattern "foo/", `git-check-ignore` will accept the path "foo/" but not "foo", and if `mkdir foo` is run, it will accept both.
However, I've found a case in which `git-check-ignore` deviates from the actual .gitignore behavior regarding ignoring directories. If .gitignore contains only the pattern "foo/*", then (regardless of whether a directory named "foo" exists or not), `git-check-ignore` will accept "foo/" but not "foo" — and yet, if you do `mkdir foo; touch foo/bar` and run `git status --ignored=matching --porcelain`, it shows "!! foo/bar", rather than "!! foo/" (which you get with the .gitignore pattern "foo/"), indicating that "foo/*" matches the contents of "foo" but not "foo" itself, in apparent disagreement with `git-check-ignore`. Is this a flaw in `git-check-ignore`, or should it just not be trusted in the first place when it comes to directories?
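For anyone who wants to repeat the `git check-ignore` part of this experiment, here is a minimal Python sketch. It assumes `git` is on the PATH and relies on the documented exit status of `git check-ignore -q` (0 when the path is ignored, 1 when it is not). The helper names `check_ignore_argv`, `is_ignored`, and `try_pattern` are made up for this example, and no expected results are hard-coded, since the results are exactly what is in question.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def check_ignore_argv(path: str) -> list[str]:
    """Build the command line used to query a single path."""
    return ["git", "check-ignore", "-q", path]


def is_ignored(repo: Path, path: str) -> bool:
    """Return True if `git check-ignore` reports path as ignored in repo."""
    proc = subprocess.run(check_ignore_argv(path), cwd=repo)
    if proc.returncode not in (0, 1):
        raise RuntimeError(f"git check-ignore failed with {proc.returncode}")
    return proc.returncode == 0


def try_pattern(pattern: str, paths: list[str]) -> dict[str, bool]:
    """Create a fresh repository whose .gitignore holds only pattern,
    then query each path. No files or directories besides .gitignore exist."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        subprocess.run(["git", "init", "-q", str(repo)], check=True)
        (repo / ".gitignore").write_text(pattern + "\n")
        return {p: is_ignored(repo, p) for p in paths}


if __name__ == "__main__":
    if shutil.which("git") is None:
        print("git not found on PATH; nothing to run")
    else:
        for pattern in ("foo/", "foo/*"):
            print(repr(pattern), try_pattern(pattern, ["foo", "foo/"]))
```

The sketch covers only the case where "foo" does not exist; creating the directory inside `try_pattern` before querying would cover the other case.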
These observations were made using Git version 2.36.1, installed via Homebrew on macOS 11.6.6, with test-tool compiled from commit 39c15e4855 of the Git source.
Thank you for your time reading & responding,
-- John Wodder
* Re: Oddities in gitignore matching
From: Phillip Wood @ 2022-06-28 9:13 UTC
To: John Thorvald Wodder II, git
Hi John
On 26/06/2022 20:34, John Thorvald Wodder II wrote:
> First: I've found that the pattern "foo**/bar" causes the path "foo/glarch/bar" (as well as "foobie/glarch/bar") to be ignored. However, the gitignore(5) documentation states that "**/" only has special meaning when it's "leading"; in other circumstances, the double star should be treated the same as a single star (and "foo*/bar" does not match "foo/glarch/bar"). Is this behavior of non-leading "**/" deliberate or a bug?
I've no idea if it is deliberate or not, but it seems reasonable, and I think it matches shells like fish, tcsh and zsh, though not bash (I think our documented behavior matches bash).
> Interestingly, checking the pattern with the wildmatch test-tool (`t/helper/test-tool wildmatch wildmatch foo/glarch/bar 'foo**/bar'`) shows that the pattern should not match the path.
>
> Second: The pattern "[[:space:]]" does not match 0x0B (\v, vertical tab) or 0x0C (\f, form feed) despite the fact that the C isspace() function accepts these characters, and I cannot figure out the cause for this discrepancy. (The pattern does match the other characters that isspace() accepts, though — tab, line feed, carriage return, and space character.) The wildmatch test-tool agrees with this behavior, though.
This is because git defines its own isspace() that does not treat '\v' or '\f' as whitespace (see git-compat-util.h and ctype.c). I'm not sure why we exclude those characters; I think the reason for defining our own isspace() is to avoid the locale-dependent behaviour of the standard version.
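The difference can be sketched in a few lines of Python. This assumes, based on the behaviour reported in this thread rather than on a copy of ctype.c, that git's isspace() accepts exactly space, tab, line feed, and carriage return, while the standard isspace() in the "C" locale also accepts vertical tab and form feed. The names `GIT_SPACE`, `C_LOCALE_SPACE`, `git_isspace`, and `c_isspace` are invented for the illustration.

```python
# Characters accepted by each predicate (assumed sets, see the note above).
GIT_SPACE = frozenset(" \t\n\r")
C_LOCALE_SPACE = frozenset(" \t\n\v\f\r")


def git_isspace(ch: str) -> bool:
    """Model of git's own isspace() as described in this thread."""
    return ch in GIT_SPACE


def c_isspace(ch: str) -> bool:
    """Model of the standard C isspace() in the "C" locale."""
    return ch in C_LOCALE_SPACE


if __name__ == "__main__":
    for ch in sorted(C_LOCALE_SPACE):
        print(f"0x{ord(ch):02X} git={git_isspace(ch)} c={c_isspace(ch)}")
```

The two models disagree only on 0x0B and 0x0C, which are the two characters that "[[:space:]]" was observed not to match.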
I'm afraid I don't have anything useful to add for your third point.
Best Wishes
Phillip
* Re: Oddities in gitignore matching
From: John Thorvald Wodder II @ 2022-06-28 13:48 UTC
To: phillip.wood; +Cc: git
On 2022 Jun 28, at 05:13, Phillip Wood <phillip.wood123@gmail.com> wrote:
>
> Hi John
>
> On 26/06/2022 20:34, John Thorvald Wodder II wrote:
>> First: I've found that the pattern "foo**/bar" causes the path "foo/glarch/bar" (as well as "foobie/glarch/bar") to be ignored. However, the gitignore(5) documentation states that "**/" only has special meaning when it's "leading"; in other circumstances, the double star should be treated the same as a single star (and "foo*/bar" does not match "foo/glarch/bar"). Is this behavior of non-leading "**/" deliberate or a bug?
>
> I've no idea if it is deliberate or not but it seems reasonable and I think it matches shells like fish, tcsh and zsh though not bash (I think our documented behavior matches bash).
OK, but it turns out that "foo**/bar" also matches just "foobar", no slash, which definitely seems wrong.
>> Interestingly, checking the pattern with the wildmatch test-tool (`t/helper/test-tool wildmatch wildmatch foo/glarch/bar 'foo**/bar'`) shows that the pattern should not match the path.
>> Second: The pattern "[[:space:]]" does not match 0x0B (\v, vertical tab) or 0x0C (\f, form feed) despite the fact that the C isspace() function accepts these characters, and I cannot figure out the cause for this discrepancy. (The pattern does match the other characters that isspace() accepts, though — tab, line feed, carriage return, and space character.) The wildmatch test-tool agrees with this behavior, though.
>
> This is because git defines its own isspace() that does not treat '\v' or '\f' as whitespace (see git-compat-util.h and ctype.c). I'm not sure why we exclude those characters, I think the reason for defining our own isspace() is to avoid the locale dependent behaviour of the standard version.
Thank you for the explanation.
---
Through further experimentation, I've discovered a fourth oddity with gitignore: If "foo//" (with two or more trailing slashes) is added to .gitignore and `mkdir -p foo/bar` is run, then `git status --ignored=matching --porcelain` won't show "foo/" or "foo/bar/" at all, which is something I'd previously only encountered for completely empty top-level directories. This holds true no matter how deep or wide you make the directory tree at "foo/", as long as it's all-directories; once a file gets added somewhere under "foo/", the "git status" command shows "foo/" as ignored.
-- John Wodder
* Re: Oddities in gitignore matching
From: John Thorvald Wodder II @ 2022-06-28 13:57 UTC
To: John Thorvald Wodder II; +Cc: phillip.wood, git
On 2022 Jun 28, at 09:48, John Thorvald Wodder II <jwodder@gmail.com> wrote:
> Through further experimentation, I've discovered a fourth oddity with gitignore: If "foo//" (with two or more trailing slashes) is added to .gitignore and `mkdir -p foo/bar` is run, then `git status --ignored=matching --porcelain` won't show "foo/" or "foo/bar/" at all, which is something I'd previously only encountered for completely empty top-level directories. This holds true no matter how deep or wide you make the directory tree at "foo/", as long as it's all-directories; once a file gets added somewhere under "foo/", the "git status" command shows "foo/" as ignored.
Correction: When a file is added under "foo/", then "git status" shows it as *untracked*, not ignored. I also see now that a pure-directory tree being absent from "git status" happens even without a .gitignore, so presumably this is intended behavior. It's just a little odd that, if "foo/" is an empty top-level directory, it only shows up in "git status --ignored=matching" if it's ignored.
-- John Wodder
* Re: Oddities in gitignore matching
From: Phillip Wood @ 2022-06-29 9:22 UTC
To: John Thorvald Wodder II, phillip.wood
Cc: git, Ævar Arnfjörð Bjarmason
On 28/06/2022 14:48, John Thorvald Wodder II wrote:
> On 2022 Jun 28, at 05:13, Phillip Wood <phillip.wood123@gmail.com> wrote:
>>
>> Hi John
>>
>> On 26/06/2022 20:34, John Thorvald Wodder II wrote:
>>> First: I've found that the pattern "foo**/bar" causes the path "foo/glarch/bar" (as well as "foobie/glarch/bar") to be ignored. However, the gitignore(5) documentation states that "**/" only has special meaning when it's "leading"; in other circumstances, the double star should be treated the same as a single star (and "foo*/bar" does not match "foo/glarch/bar"). Is this behavior of non-leading "**/" deliberate or a bug?
>>
>> I've no idea if it is deliberate or not but it seems reasonable and I think it matches shells like fish, tcsh and zsh though not bash (I think our documented behavior matches bash).
>
> OK, but it turns out that "foo**/bar" also matches just "foobar", no slash, which definitely seems wrong.
Yes, that definitely sounds like a bug. I've cc'd Ævar, who I think is more familiar with the pattern matching code than I am.
Best Wishes
Phillip