git@vger.kernel.org mailing list mirror (one of many)
* Oddities in gitignore matching
@ 2022-06-26 19:34 John Thorvald Wodder II
  2022-06-28  9:13 ` Phillip Wood
  0 siblings, 1 reply; 5+ messages in thread
From: John Thorvald Wodder II @ 2022-06-26 19:34 UTC (permalink / raw)
  To: git

Git developers & fans,

I'm developing a Python library for performing gitignore-style path matching (because all the pre-existing ones I've found so far leave something to be desired), and, in my experiments to nail down Git's exact behavior, I've encountered three odd things that may be bugs or may be deficiencies in the documentation; let me know which.

First: I've found that the pattern "foo**/bar" causes the path "foo/glarch/bar" (as well as "foobie/glarch/bar") to be ignored.  However, the gitignore(5) documentation states that "**/" only has special meaning when it's "leading"; in other circumstances, the double star should be treated the same as a single star (and "foo*/bar" does not match "foo/glarch/bar").  Is this behavior of non-leading "**/" deliberate or a bug?

Interestingly, checking the pattern with the wildmatch test-tool (`t/helper/test-tool wildmatch wildmatch foo/glarch/bar 'foo**/bar'`) shows that the pattern should not match the path.
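To make the documented rule concrete, here is a minimal Python sketch of the semantics described in gitignore(5). It is not Git's actual matcher. The names `documented_regex` and `documented_match` are hypothetical, and the sketch assumes a pattern that contains a slash and uses no bracket expressions, escapes, negation, or trailing slash.

```python
import re


def documented_regex(pattern: str) -> re.Pattern:
    """Translate a simple gitignore pattern per the documented rules.

    Only a leading "**/", a trailing "/**", and an inner "/**/" are special;
    any other run of asterisks is treated like a single "*".
    """
    out = []
    i, n = 0, len(pattern)
    while i < n:
        if i == 0 and pattern.startswith("**/"):
            out.append("(?:.*/)?")      # zero or more leading directories
            i += 3
        elif pattern.startswith("/**/", i):
            out.append("/(?:.*/)?")     # zero or more directories in between
            i += 4
        elif pattern.startswith("/**", i) and i + 3 == n:
            out.append("/.*")           # everything inside the directory
            i += 3
        elif pattern[i] == "*":
            while i < n and pattern[i] == "*":
                i += 1                  # collapse "**" into a plain "*"
            out.append("[^/]*")         # "*" never crosses a slash
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def documented_match(pattern: str, path: str) -> bool:
    """Return True if the whole path matches under the documented rules."""
    return documented_regex(pattern).fullmatch(path) is not None
```

Under these assumptions, `documented_match("foo**/bar", "foo/glarch/bar")` is false, which agrees with the wildmatch test-tool result and disagrees with what the ignore machinery actually does.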

Second: The pattern "[[:space:]]" does not match 0x0B (\v, vertical tab) or 0x0C (\f, form feed) despite the fact that the C isspace() function accepts these characters, and I cannot figure out the cause for this discrepancy.  (The pattern does match the other characters that isspace() accepts, though — tab, line feed, carriage return, and space character.)  The wildmatch test-tool agrees with this behavior, though.

Third: While the documentation for `git-check-ignore` only states that it works on files, I've found that it also works with directory paths, as well as treating any nonexistent path ending in a slash as a directory.  For example, in a fresh repository containing only a .gitignore file with the pattern "foo/", `git-check-ignore` will accept the path "foo/" but not "foo", and if `mkdir foo` is run, it will accept both.
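The behaviour observed for the "foo/" pattern can be modelled with a short Python sketch. This is only a model of the observations above, not of Git's code. The names `treated_as_directory` and `dir_only_pattern_accepts` are hypothetical, and the sketch handles only a literal directory-only pattern such as "foo/".

```python
import os
import tempfile


def treated_as_directory(root: str, path: str) -> bool:
    """A path counts as a directory if it ends in "/" or exists as one."""
    return path.endswith("/") or os.path.isdir(os.path.join(root, path))


def dir_only_pattern_accepts(root: str, pattern: str, path: str) -> bool:
    """Model a literal directory-only pattern such as "foo/"."""
    if not pattern.endswith("/"):
        raise ValueError("this sketch handles directory-only patterns only")
    name = pattern.rstrip("/")
    return path.rstrip("/") == name and treated_as_directory(root, path)


def demo():
    """Check "foo/" and "foo" before and after creating the directory."""
    with tempfile.TemporaryDirectory() as root:
        before = [dir_only_pattern_accepts(root, "foo/", p) for p in ("foo/", "foo")]
        os.mkdir(os.path.join(root, "foo"))
        after = [dir_only_pattern_accepts(root, "foo/", p) for p in ("foo/", "foo")]
    return before, after
```

In this model, `demo()` reports that only "foo/" is accepted before the `mkdir`, and that both paths are accepted afterwards.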

However, I've found a case in which `git-check-ignore` deviates from the actual .gitignore behavior regarding ignoring directories.  If .gitignore contains only the pattern "foo/*", then (regardless of whether a directory named "foo" exists or not), `git-check-ignore` will accept "foo/" but not "foo" — and yet, if you do `mkdir foo; touch foo/bar` and run `git status --ignored=matching --porcelain`, it shows "!! foo/bar", rather than "!! foo/" (which you get with the .gitignore pattern "foo/"), indicating that "foo/*" matches the contents of "foo" but not "foo" itself, in apparent disagreement with `git-check-ignore`.  Is this a flaw in `git-check-ignore`, or should it just not be trusted in the first place when it comes to directories?
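For anyone who wants to repeat the "foo/*" comparison, the steps above can be scripted roughly as follows. This is a sketch that assumes a `git` executable on PATH. The names `COMMANDS` and `reproduce` are hypothetical, and no expected output is hard-coded because it may vary between Git versions.

```python
import os
import shutil
import subprocess
import tempfile

# The two commands whose answers are being compared.
COMMANDS = {
    "check-ignore": ["git", "check-ignore", "foo", "foo/"],
    "status": ["git", "status", "--ignored=matching", "--porcelain"],
}


def reproduce(pattern="foo/*"):
    """Run both commands in a scratch repository; return their stdout lines.

    Returns None when git is not installed.
    """
    if shutil.which("git") is None:
        return None
    outputs = {}
    with tempfile.TemporaryDirectory() as repo:
        subprocess.run(["git", "init", "--quiet"], cwd=repo, check=True)
        with open(os.path.join(repo, ".gitignore"), "w") as fh:
            fh.write(pattern + "\n")
        # Equivalent of `mkdir foo; touch foo/bar`.
        os.mkdir(os.path.join(repo, "foo"))
        open(os.path.join(repo, "foo", "bar"), "w").close()
        for name, argv in COMMANDS.items():
            proc = subprocess.run(argv, cwd=repo, capture_output=True, text=True)
            outputs[name] = proc.stdout.splitlines()
    return outputs
```

Calling `reproduce("foo/*")` and then `reproduce("foo/")` gives the two sets of output to compare side by side.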

These observations were made using Git version 2.36.1, installed via Homebrew on macOS 11.6.6, with test-tool compiled from commit 39c15e4855 of the Git source.

Thank you for your time reading & responding,

-- John Wodder



* Re: Oddities in gitignore matching
  2022-06-26 19:34 Oddities in gitignore matching John Thorvald Wodder II
@ 2022-06-28  9:13 ` Phillip Wood
  2022-06-28 13:48   ` John Thorvald Wodder II
  0 siblings, 1 reply; 5+ messages in thread
From: Phillip Wood @ 2022-06-28  9:13 UTC (permalink / raw)
  To: John Thorvald Wodder II, git

Hi John

On 26/06/2022 20:34, John Thorvald Wodder II wrote:
> First: I've found that the pattern "foo**/bar" causes the path "foo/glarch/bar" (as well as "foobie/glarch/bar") to be ignored.  However, the gitignore(5) documentation states that "**/" only has special meaning when it's "leading"; in other circumstances, the double star should be treated the same as a single star (and "foo*/bar" does not match "foo/glarch/bar").  Is this behavior of non-leading "**/" deliberate or a bug?

I've no idea whether it is deliberate, but it seems reasonable, and I
think it matches shells like fish, tcsh and zsh, though not bash (I think
our documented behavior matches bash).

> Interestingly, checking the pattern with the wildmatch test-tool (`t/helper/test-tool wildmatch wildmatch foo/glarch/bar 'foo**/bar'`) shows that the pattern should not match the path.
> 
> Second: The pattern "[[:space:]]" does not match 0x0B (\v, vertical tab) or 0x0C (\f, form feed) despite the fact that the C isspace() function accepts these characters, and I cannot figure out the cause for this discrepancy.  (The pattern does match the other characters that isspace() accepts, though — tab, line feed, carriage return, and space character.)  The wildmatch test-tool agrees with this behavior, though.

This is because Git defines its own isspace() that does not treat '\v'
or '\f' as whitespace (see git-compat-util.h and ctype.c). I'm not sure
why we exclude those characters; I think the reason for defining our own
isspace() is to avoid the locale-dependent behaviour of the standard
version.
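The difference between the two character sets can be written out in a few lines of Python. This is a sketch: the names `C_LOCALE_ISSPACE`, `GIT_ISSPACE` and `space_class_matches` are hypothetical, the first set is what isspace() accepts in the "C" locale, and the second set is taken from the characters reported in this thread as matching "[[:space:]]".

```python
# Bytes accepted by the standard isspace() in the "C" locale.
C_LOCALE_ISSPACE = frozenset(b" \t\n\v\f\r")

# Bytes reported in this thread as matching "[[:space:]]" in Git.
GIT_ISSPACE = frozenset(b" \t\n\r")


def space_class_matches(byte: int, table=GIT_ISSPACE) -> bool:
    """Return True if the byte value is in the given whitespace table."""
    return byte in table


# The bytes that the standard function accepts but Git's version does not.
MISSING = sorted(C_LOCALE_ISSPACE - GIT_ISSPACE)
```

Here `MISSING` contains exactly 0x0B and 0x0C, the vertical tab and form feed from the original report.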

I'm afraid I don't have anything useful to add for your third point.

Best Wishes

Phillip



* Re: Oddities in gitignore matching
  2022-06-28  9:13 ` Phillip Wood
@ 2022-06-28 13:48   ` John Thorvald Wodder II
  2022-06-28 13:57     ` John Thorvald Wodder II
  2022-06-29  9:22     ` Phillip Wood
  0 siblings, 2 replies; 5+ messages in thread
From: John Thorvald Wodder II @ 2022-06-28 13:48 UTC (permalink / raw)
  To: phillip.wood; +Cc: git

On 2022 Jun 28, at 05:13, Phillip Wood <phillip.wood123@gmail.com> wrote:
> 
> Hi John
> 
> On 26/06/2022 20:34, John Thorvald Wodder II wrote:
>> First: I've found that the pattern "foo**/bar" causes the path "foo/glarch/bar" (as well as "foobie/glarch/bar") to be ignored.  However, the gitignore(5) documentation states that "**/" only has special meaning when it's "leading"; in other circumstances, the double star should be treated the same as a single star (and "foo*/bar" does not match "foo/glarch/bar").  Is this behavior of non-leading "**/" deliberate or a bug?
> 
> I've no idea if it is deliberate or not but it seems reasonable and I think it matches shells like fish, tcsh and zsh though not bash (I think our documented behavior matches bash).

OK, but it turns out that "foo**/bar" also matches just "foobar", no slash, which definitely seems wrong.
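One possible explanation, which nobody in this thread has confirmed, is that the literal prefix "foo" is compared and stripped from both pattern and path before the wildcard matcher runs, so that the remainder "**/bar" is then seen as a leading "**/". The Python sketch below models only that assumption. All function names are hypothetical, and the remainder matcher handles nothing beyond "**/" followed by literal text.

```python
WILDCARD_CHARS = "*?[\\"


def literal_prefix_len(pattern: str) -> int:
    """Length of the part of the pattern before the first wildcard character."""
    for i, ch in enumerate(pattern):
        if ch in WILDCARD_CHARS:
            return i
    return len(pattern)


def match_leading_double_star(pattern: str, name: str) -> bool:
    """Match "**/literal": the literal at the end, after any directories."""
    if not pattern.startswith("**/"):
        raise ValueError('this sketch only handles "**/" plus literal text')
    tail = pattern[3:]
    return name == tail or name.endswith("/" + tail)


def match_with_prefix_stripping(pattern: str, path: str) -> bool:
    """Assumed behaviour: compare the literal prefix, then match the rest."""
    n = literal_prefix_len(pattern)
    if not path.startswith(pattern[:n]):
        return False
    return match_leading_double_star(pattern[n:], path[n:])
```

Under this assumption, "foo**/bar" matches "foo/glarch/bar", "foobie/glarch/bar" and "foobar", which is the same set of results reported above. That agreement makes the hypothesis plausible but does not prove it.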

>> Interestingly, checking the pattern with the wildmatch test-tool (`t/helper/test-tool wildmatch wildmatch foo/glarch/bar 'foo**/bar'`) shows that the pattern should not match the path.
>> Second: The pattern "[[:space:]]" does not match 0x0B (\v, vertical tab) or 0x0C (\f, form feed) despite the fact that the C isspace() function accepts these characters, and I cannot figure out the cause for this discrepancy.  (The pattern does match the other characters that isspace() accepts, though — tab, line feed, carriage return, and space character.)  The wildmatch test-tool agrees with this behavior, though.
> 
> This is because git defines its own isspace() that does not treat '\v' or '\f' as whitespace (see git-compat-util.h and ctype.c). I'm not sure why we exclude those characters, I think the reason for defining our own isspace() is to avoid the locale dependent behaviour of the standard version.

Thank you for the explanation.

---

Through further experimentation, I've discovered a fourth oddity with gitignore: If "foo//" (with two or more trailing slashes) is added to .gitignore and `mkdir -p foo/bar` is run, then `git status --ignored=matching --porcelain` won't show "foo/" or "foo/bar/" at all, which is something I'd previously only encountered for completely empty top-level directories.  This holds true no matter how deep or wide you make the directory tree at "foo/", as long as it's all-directories; once a file gets added somewhere under "foo/", the "git status" command shows "foo/" as ignored.

-- John Wodder


* Re: Oddities in gitignore matching
  2022-06-28 13:48   ` John Thorvald Wodder II
@ 2022-06-28 13:57     ` John Thorvald Wodder II
  2022-06-29  9:22     ` Phillip Wood
  1 sibling, 0 replies; 5+ messages in thread
From: John Thorvald Wodder II @ 2022-06-28 13:57 UTC (permalink / raw)
  To: John Thorvald Wodder II; +Cc: phillip.wood, git

On 2022 Jun 28, at 09:48, John Thorvald Wodder II <jwodder@gmail.com> wrote:
> Through further experimentation, I've discovered a fourth oddity with gitignore: If "foo//" (with two or more trailing slashes) is added to .gitignore and `mkdir -p foo/bar` is run, then `git status --ignored=matching --porcelain` won't show "foo/" or "foo/bar/" at all, which is something I'd previously only encountered for completely empty top-level directories.  This holds true no matter how deep or wide you make the directory tree at "foo/", as long as it's all-directories; once a file gets added somewhere under "foo/", the "git status" command shows "foo/" as ignored.

Correction: When a file is added under "foo/", then "git status" shows it as *untracked*, not ignored.  I also see now that a pure-directory tree being absent from "git status" happens even without a .gitignore, so presumably this is intended behavior.  It's just a little odd that, if "foo/" is an empty top-level directory, it only shows up in "git status --ignored=matching" if it's ignored.

-- John Wodder


* Re: Oddities in gitignore matching
  2022-06-28 13:48   ` John Thorvald Wodder II
  2022-06-28 13:57     ` John Thorvald Wodder II
@ 2022-06-29  9:22     ` Phillip Wood
  1 sibling, 0 replies; 5+ messages in thread
From: Phillip Wood @ 2022-06-29  9:22 UTC (permalink / raw)
  To: John Thorvald Wodder II, phillip.wood
  Cc: git, Ævar Arnfjörð Bjarmason

On 28/06/2022 14:48, John Thorvald Wodder II wrote:
> On 2022 Jun 28, at 05:13, Phillip Wood <phillip.wood123@gmail.com> wrote:
>>
>> Hi John
>>
>> On 26/06/2022 20:34, John Thorvald Wodder II wrote:
>>> First: I've found that the pattern "foo**/bar" causes the path "foo/glarch/bar" (as well as "foobie/glarch/bar") to be ignored.  However, the gitignore(5) documentation states that "**/" only has special meaning when it's "leading"; in other circumstances, the double star should be treated the same as a single star (and "foo*/bar" does not match "foo/glarch/bar").  Is this behavior of non-leading "**/" deliberate or a bug?
>>
>> I've no idea if it is deliberate or not but it seems reasonable and I think it matches shells like fish, tcsh and zsh though not bash (I think our documented behavior matches bash).
> 
> OK, but it turns out that "foo**/bar" also matches just "foobar", no slash, which definitely seems wrong.

Yes, that definitely sounds like a bug. I've cc'd Ævar, who I think is
more familiar with the pattern matching code than I am.

Best Wishes

Phillip

