git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PROPOSAL] .gitignore syntax modification
@ 2010-10-08 21:26 Kevin Ballard
  2010-10-08 21:58 ` Maaartin
  2010-10-13  2:24 ` Nguyen Thai Ngoc Duy
  0 siblings, 2 replies; 13+ messages in thread
From: Kevin Ballard @ 2010-10-08 21:26 UTC (permalink / raw)
  To: Git Mailing List

A number of times I've run into a problem with .gitignore in the handling of folder packages. For the unaware, a folder package on OS X is a folder which is treated as a file by the file browser and all UI-level tools. Ideally, Git would learn about these and treat them as files, but as a stopgap measure I like to set the binary attribute on various files inside these packages. Similarly, for foo.xcodeproj packages I like to ignore all contained files except project.pbxproj. Unfortunately, .gitignore has no good way of matching this sort of thing at anything other than the root level. Here's an example of my global ~/.gitignore file:

*.xcodeproj/*
!*.xcodeproj/*.pbxproj

As you can see, this applies my rule for foo.xcodeproj globally, except it only works when the project is at the root of a repository. On any repository where this isn't the case, I have to duplicate this pattern into a .gitignore file at the right level. Similarly, I have the following in a repo's .gitattributes:

*.xcmappingmodel/* binary

This sets the binary attribute on all files inside of foo.xcmappingmodel/, but again I have to keep the .gitattributes at the right level. I would love to put this into a global ~/.gitattributes file but I can't.

What I would really like is for the gitignore syntax to support ** a la zsh/bash v4. As I understand it, gitignore uses fnmatch() to do the actual lifting, and unfortunately fnmatch doesn't support this syntax. There's 2 reasonable possibilities I can see here for providing this functionality. The first is to provide our own implementation of fnmatch() that does support this syntax. We already have an implementation of fnmatch() inside of compat/, would there be any problem with using this implementation for everybody and updating it to support **? The second solution is to simply special-case having **/ at the very beginning of a pattern (similar to how we special-case !) and recursively apply the pattern to all nested path components until one matches or we run out of components. This is
  obviously not ideal, but it would allow us to continue to use the system-provided fnmatch().

Any thoughts/problems/suggestions?

-Kevin Ballard

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PROPOSAL] .gitignore syntax modification
  2010-10-08 21:26 [PROPOSAL] .gitignore syntax modification Kevin Ballard
@ 2010-10-08 21:58 ` Maaartin
  2010-10-09  0:03   ` Kevin Ballard
  2010-10-13  2:24 ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 13+ messages in thread
From: Maaartin @ 2010-10-08 21:58 UTC (permalink / raw)
  To: git

Kevin Ballard <kevin <at> sb.org> writes:

> ... Similarly, for foo.xcodeproj packages I
> like to ignore all contained files except project.pbxproj. 
Unfortunately, .gitignore has no good way of
> matching this sort of thing at anything other than the root level. Here's an 
example of my global
> ~/.gitignore file:
> 
> *.xcodeproj/*
> !*.xcodeproj/*.pbxproj
> 
> ... On any repository where this isn't the case, I have to duplicate this 
pattern into
> a .gitignore file at the right level.

You don't. You can do something like

*.xcodeproj/*
!.xcodeproj/subdir_with_project
.xcodeproj/subdir_with_project/*
!.xcodeproj/subdir_with_project/*.pbxproj

I'm a beginner but I just stumbled upon this very problem five minutes ago.

The reason for git ignoring everything in the directory without ever looking 
there is efficiency. Maybe we would need something like "weak ignore" meaning 
ignore all matching files but traverse the directories. I'd propose the syntax

?*.xcodeproj
!*.pbxproj

for this. The pattern after the question would exclude all files below 
*.xcodeproj but still traverse the directories there. The second pattern would 
re-include the *.pbxproj files.

The lower efficiency would be no problem, since this was user's choice not to 
use the standard pattern. You could still prevent digging too deep by using 
additionally something like

*.xcodeproj/*/*

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PROPOSAL] .gitignore syntax modification
  2010-10-08 21:58 ` Maaartin
@ 2010-10-09  0:03   ` Kevin Ballard
  2010-10-11 23:46     ` Maaartin
  0 siblings, 1 reply; 13+ messages in thread
From: Kevin Ballard @ 2010-10-09  0:03 UTC (permalink / raw)
  To: Maaartin; +Cc: git

On Oct 8, 2010, at 2:58 PM, Maaartin wrote:

> You don't. You can do something like
> 
> *.xcodeproj/*
> !.xcodeproj/subdir_with_project
> .xcodeproj/subdir_with_project/*
> !.xcodeproj/subdir_with_project/*.pbxproj
> 
> I'm a beginner but I just stumbled upon this very problem five minutes ago.

That's not the right layout. I'd have to do

*.xcodeproj/*
!*.xcodeproj/project.pbxproj
*/*.xcodeproj/*
!*/*.xcodeproj/project.pbxproj

And so on, once for each possible level of nesting.

> The reason for git ignoring everything in the directory without ever looking 
> there is efficiency.

That's not what's happening either. That happens if you ignore the directory itself, such as

*.xcodeproj/

That won't look in the directory at all to match non-ignore patterns. The problem I'm talking about is simply that you cannot write a pattern that includes a slash and have that pattern match at any nesting level.

Upon further reflection, if we stick with platform-provided fnmatch() we don't have to special-case a prefixed **/. We could simply define patterns as always matching in that way, and you can use the already-existing prefixed / to root it at the current level. So if my pattern looks like

*.xcodeproj/*

Then it will attempt to match this pattern against the last 2 path components of any path rooted in this directory. It can simply count the slashes to determine the number of path components. If I really want it to just match *.xcodeproj files in the current directory then my pattern should look like

/*.xcodeproj/*

-Kevin Ballard

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PROPOSAL] .gitignore syntax modification
  2010-10-09  0:03   ` Kevin Ballard
@ 2010-10-11 23:46     ` Maaartin
  0 siblings, 0 replies; 13+ messages in thread
From: Maaartin @ 2010-10-11 23:46 UTC (permalink / raw)
  To: git

Kevin Ballard <kevin <at> sb.org> writes:

> On Oct 8, 2010, at 2:58 PM, Maaartin wrote:
> That's not the right layout. I'd have to do
> 
> *.xcodeproj/*
> !*.xcodeproj/project.pbxproj
> */*.xcodeproj/*
> !*/*.xcodeproj/project.pbxproj
> 
> And so on, once for each possible level of nesting.

Sorry, that's was I meant, I was too sloppy/tired/stupid when writing this.

> ... The problem I'm talking about is
> simply that you cannot write a pattern that includes a slash and have that 
pattern match at any nesting level.

I see I wrong again. At least I learned something.

> Upon further reflection, if we stick with platform-provided fnmatch() we 
don't have to special-case a
> prefixed **/. We could simply define patterns as always matching in that way, 
and you can use the
> already-existing prefixed / to root it at the current level. So if my pattern 
looks like
> 
> *.xcodeproj/*
> 
> Then it will attempt to match this pattern against the last 2 path components 
of any path rooted in this
> directory. It can simply count the slashes to determine the number of path 
components. If I really want it
> to just match *.xcodeproj files in the current directory then my pattern 
should look like
> 
> /*.xcodeproj/*

IMHO, this is how it should work, simple and consequent. Sorry for clobbering 
this thread by my mistakes.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PROPOSAL] .gitignore syntax modification
  2010-10-08 21:26 [PROPOSAL] .gitignore syntax modification Kevin Ballard
  2010-10-08 21:58 ` Maaartin
@ 2010-10-13  2:24 ` Nguyen Thai Ngoc Duy
  2010-10-13  2:32   ` Kevin Ballard
  1 sibling, 1 reply; 13+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-10-13  2:24 UTC (permalink / raw)
  To: Kevin Ballard; +Cc: Git Mailing List

On Sat, Oct 9, 2010 at 4:26 AM, Kevin Ballard <kevin@sb.org> wrote:
> A number of times I've run into a problem with .gitignore in the handling of folder packages. For the unaware, a folder package on OS X is a folder which is treated as a file by the file browser and all UI-level tools. Ideally, Git would learn about these and treat them as files, but as a stopgap measure I like to set the binary attribute on various files inside these packages. Similarly, for foo.xcodeproj packages I like to ignore all contained files except project.pbxproj. Unfortunately, .gitignore has no good way of matching this sort of thing at anything other than the root level. Here's an example of my global ~/.gitignore file:
>
> *.xcodeproj/*
> !*.xcodeproj/*.pbxproj
>
> As you can see, this applies my rule for foo.xcodeproj globally, except it only works when the project is at the root of a repository. On any repository where this isn't the case, I have to duplicate this pattern into a .gitignore file at the right level. Similarly, I have the following in a repo's .gitattributes:
>
> *.xcmappingmodel/* binary
>
> This sets the binary attribute on all files inside of foo.xcmappingmodel/, but again I have to keep the .gitattributes at the right level. I would love to put this into a global ~/.gitattributes file but I can't.
>
> What I would really like is for the gitignore syntax to support ** a la zsh/bash v4. As I understand it, gitignore uses fnmatch() to do the actual lifting, and unfortunately fnmatch doesn't support this syntax. There's 2 reasonable possibilities I can see here for providing this functionality. The first is to provide our own implementation of fnmatch() that does support this syntax. We already have an implementation of fnmatch() inside of compat/, would there be any problem with using this implementation for everybody and updating it to support **? The second solution is to simply special-case having **/ at the very beginning of a pattern (similar to how we special-case !) and recursively apply the pattern to all nested path components until one matches or we run out of components. This 
 is obviously not ideal, but it would allow us to continue to use the system-provided fnmatch().

Special case "**/" (also "path/to/**/") is probably good enough. You
might need to handle all combinations of "**/" and other optimizations
in excluded_from_list() though. Can you make a patch (or a few
patches) for it?
-- 
Duy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PROPOSAL] .gitignore syntax modification
  2010-10-13  2:24 ` Nguyen Thai Ngoc Duy
@ 2010-10-13  2:32   ` Kevin Ballard
  2010-10-13  2:40     ` Jonathan Nieder
  2010-10-13 12:15     ` Nguyen Thai Ngoc Duy
  0 siblings, 2 replies; 13+ messages in thread
From: Kevin Ballard @ 2010-10-13  2:32 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Git Mailing List

On Oct 12, 2010, at 7:24 PM, Nguyen Thai Ngoc Duy wrote:

> On Sat, Oct 9, 2010 at 4:26 AM, Kevin Ballard <kevin@sb.org> wrote:
>> A number of times I've run into a problem with .gitignore in the handling of folder packages. For the unaware, a folder package on OS X is a folder which is treated as a file by the file browser and all UI-level tools. Ideally, Git would learn about these and treat them as files, but as a stopgap measure I like to set the binary attribute on various files inside these packages. Similarly, for foo.xcodeproj packages I like to ignore all contained files except project.pbxproj. Unfortunately, .gitignore has no good way of matching this sort of thing at anything other than the root level. Here's an example of my global ~/.gitignore file:
>> 
>> *.xcodeproj/*
>> !*.xcodeproj/*.pbxproj
>> 
>> As you can see, this applies my rule for foo.xcodeproj globally, except it only works when the project is at the root of a repository. On any repository where this isn't the case, I have to duplicate this pattern into a .gitignore file at the right level. Similarly, I have the following in a repo's .gitattributes:
>> 
>> *.xcmappingmodel/* binary
>> 
>> This sets the binary attribute on all files inside of foo.xcmappingmodel/, but again I have to keep the .gitattributes at the right level. I would love to put this into a global ~/.gitattributes file but I can't.
>> 
>> What I would really like is for the gitignore syntax to support ** a la zsh/bash v4. As I understand it, gitignore uses fnmatch() to do the actual lifting, and unfortunately fnmatch doesn't support this syntax. There's 2 reasonable possibilities I can see here for providing this functionality. The first is to provide our own implementation of fnmatch() that does support this syntax. We already have an implementation of fnmatch() inside of compat/, would there be any problem with using this implementation for everybody and updating it to support **? The second solution is to simply special-case having **/ at the very beginning of a pattern (similar to how we special-case !) and recursively apply the pattern to all nested path components until one matches or we run out of components. This
  is obviously not ideal, but it would allow us to continue to use the system-provided fnmatch().
> 
> Special case "**/" (also "path/to/**/") is probably good enough. You
> might need to handle all combinations of "**/" and other optimizations
> in excluded_from_list() though. Can you make a patch (or a few
> patches) for it?

As soon as I find the time, I'll be working on a patch for this. I only wrote up the proposal because I want to make sure that what I end up implementing is actually something that will be accepted. At this point I'm actually in favor of simply assuming all paths that don't start with / can be matched at any level but I recognize that this is a change to existing functionality, though I personally think that all patterns that are meant to match only at the current level should be prefixed with / anyway. Would such a change to existing behavior be rejected out-of-hand?

-Kevin Ballard

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PROPOSAL] .gitignore syntax modification
  2010-10-13  2:32   ` Kevin Ballard
@ 2010-10-13  2:40     ` Jonathan Nieder
  2010-10-13  3:05       ` Kevin Ballard
  2010-10-13 12:15     ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 13+ messages in thread
From: Jonathan Nieder @ 2010-10-13  2:40 UTC (permalink / raw)
  To: Kevin Ballard; +Cc: Nguyen Thai Ngoc Duy, Git Mailing List

Kevin Ballard wrote:
> On Oct 12, 2010, at 7:24 PM, Nguyen Thai Ngoc Duy wrote:

>> Special case "**/" (also "path/to/**/") is probably good enough. You
>> might need to handle all combinations of "**/" and other optimizations
>> in excluded_from_list() though. Can you make a patch (or a few
>> patches) for it?
[...]
>                                                                  At
> this point I'm actually in favor of simply assuming all paths that
> don't start with / can be matched at any level

As a long-term change with early warnings and proper deprecation
procedure that may (or may not) be an ok idea, but in the short
term it is certainly a no-go.

If linux-2.6 is any indication of what to expect, some projects are
not using / in front of any paths with / at all.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PROPOSAL] .gitignore syntax modification
  2010-10-13  2:40     ` Jonathan Nieder
@ 2010-10-13  3:05       ` Kevin Ballard
  2010-10-13 16:51         ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Kevin Ballard @ 2010-10-13  3:05 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Nguyen Thai Ngoc Duy, Git Mailing List

On Oct 12, 2010, at 7:40 PM, Jonathan Nieder wrote:

> Kevin Ballard wrote:
>> On Oct 12, 2010, at 7:24 PM, Nguyen Thai Ngoc Duy wrote:
> 
>>> Special case "**/" (also "path/to/**/") is probably good enough. You
>>> might need to handle all combinations of "**/" and other optimizations
>>> in excluded_from_list() though. Can you make a patch (or a few
>>> patches) for it?
> [...]
>>                                                                 At
>> this point I'm actually in favor of simply assuming all paths that
>> don't start with / can be matched at any level
> 
> As a long-term change with early warnings and proper deprecation
> procedure that may (or may not) be an ok idea, but in the short
> term it is certainly a no-go.
> 
> If linux-2.6 is any indication of what to expect, some projects are
> not using / in front of any paths with / at all.

Another possibility is using this new behavior for core.excludesfile and possibly .git/info/exclude, but not for patterns in .gitignore files. The benefit here is any existing project that uses .gitignore files will be unchanged, but global (and possibly repo-wide) patterns can benefit from the new behavior. The drawback is this would introduce a behavioral difference for patterns depending on which file they're in, which is probably not acceptable.

I will look into how much work is required to support **/ patterns without doing a wholesale replacement of fnmatch(). Hopefully it won't be infeasible.

-Kevin Ballard

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PROPOSAL] .gitignore syntax modification
  2010-10-13  2:32   ` Kevin Ballard
  2010-10-13  2:40     ` Jonathan Nieder
@ 2010-10-13 12:15     ` Nguyen Thai Ngoc Duy
  2010-10-15 11:01       ` Kevin Ballard
  1 sibling, 1 reply; 13+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-10-13 12:15 UTC (permalink / raw)
  To: Kevin Ballard; +Cc: Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 1202 bytes --]

On Tue, Oct 12, 2010 at 07:32:54PM -0700, Kevin Ballard wrote:
> As soon as I find the time, I'll be working on a patch for this. I
> only wrote up the proposal because I want to make sure that what I
> end up implementing is actually something that will be accepted. At
> this point I'm actually in favor of simply assuming all paths that
> don't start with / can be matched at any level but I recognize that
> this is a change to existing functionality, though I personally
> think that all patterns that are meant to match only at the current
> level should be prefixed with / anyway. Would such a change to
> existing behavior be rejected out-of-hand?

Yes, patterns that only match current level should be prefixed with a
slash. There are also other cases apart from "current level only" and
"any level": foo/a* will match only second level, not any level.

I was thinking of doing like this. It's not complete (not even build)
but it shows the idea. I don't think this way it will change existing
behaviors. Performance is something I haven't thought through.

Anyway, what do you think? I'm afraid I don't have time to work on
this. The pathspec unification work still needs to be done.
-- 
Duy

[-- Attachment #2: 0001-excluded_from_list-split-the-core-exclude-code-out-a.patch --]
[-- Type: text/plain, Size: 4039 bytes --]

>From 058ed97a52a3a155655193d0ecd94506781c93c1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nguy=E1=BB=85n=20Th=C3=A1i=20Ng=E1=BB=8Dc=20Duy?= <pclouds@gmail.com>
Date: Wed, 13 Oct 2010 19:01:11 +0700
Subject: [PATCH 1/3] excluded_from_list: split the core exclude code out as a separate function

---
 dir.c |  114 ++++++++++++++++++++++++++++++++++------------------------------
 1 files changed, 61 insertions(+), 53 deletions(-)

diff --git a/dir.c b/dir.c
index a44c7b3..56974e5 100644
--- a/dir.c
+++ b/dir.c
@@ -343,6 +343,64 @@ static void prep_exclude(struct dir_struct *dir, const char *base, int baselen)
 	dir->basebuf[baselen] = '\0';
 }
 
+static int excluded_from_list_1(const char *pathname, int pathlen, const char *basename, struct exclude *x, int *dtype)
+{
+	const char *exclude = x->pattern;
+	int to_exclude = x->to_exclude;
+
+	if (x->flags & EXC_FLAG_MUSTBEDIR) {
+		if (!dtype) {
+			if (!prefixcmp(pathname, exclude))
+				return to_exclude;
+			else
+				return -1;
+		}
+		if (*dtype == DT_UNKNOWN)
+			*dtype = get_dtype(NULL, pathname, pathlen);
+		if (*dtype != DT_DIR)
+			return -1;
+	}
+
+	if (x->flags & EXC_FLAG_NODIR) {
+		/* match basename */
+		if (x->flags & EXC_FLAG_NOWILDCARD) {
+			if (!strcmp(exclude, basename))
+				return to_exclude;
+		} else if (x->flags & EXC_FLAG_ENDSWITH) {
+			if (x->patternlen - 1 <= pathlen &&
+			    !strcmp(exclude + 1, pathname + pathlen - x->patternlen + 1))
+				return to_exclude;
+		} else {
+			if (fnmatch(exclude, basename, 0) == 0)
+				return to_exclude;
+		}
+	}
+	else {
+		/* match with FNM_PATHNAME:
+		 * exclude has base (baselen long) implicitly
+		 * in front of it.
+		 */
+		int baselen = x->baselen;
+		if (*exclude == '/')
+			exclude++;
+
+		if (pathlen < baselen ||
+		    (baselen && pathname[baselen-1] != '/') ||
+		    strncmp(pathname, x->base, baselen))
+		    return -1;
+
+		if (x->flags & EXC_FLAG_NOWILDCARD) {
+			if (!strcmp(exclude, pathname + baselen))
+				return to_exclude;
+		} else {
+			if (fnmatch(exclude, pathname+baselen,
+				    FNM_PATHNAME) == 0)
+			    return to_exclude;
+		}
+	}
+	return -1;
+}
+
 /* Scan the list and let the last match determine the fate.
  * Return 1 for exclude, 0 for include and -1 for undecided.
  */
@@ -355,59 +413,9 @@ int excluded_from_list(const char *pathname,
 	if (el->nr) {
 		for (i = el->nr - 1; 0 <= i; i--) {
 			struct exclude *x = el->excludes[i];
-			const char *exclude = x->pattern;
-			int to_exclude = x->to_exclude;
-
-			if (x->flags & EXC_FLAG_MUSTBEDIR) {
-				if (!dtype) {
-					if (!prefixcmp(pathname, exclude))
-						return to_exclude;
-					else
-						continue;
-				}
-				if (*dtype == DT_UNKNOWN)
-					*dtype = get_dtype(NULL, pathname, pathlen);
-				if (*dtype != DT_DIR)
-					continue;
-			}
-
-			if (x->flags & EXC_FLAG_NODIR) {
-				/* match basename */
-				if (x->flags & EXC_FLAG_NOWILDCARD) {
-					if (!strcmp(exclude, basename))
-						return to_exclude;
-				} else if (x->flags & EXC_FLAG_ENDSWITH) {
-					if (x->patternlen - 1 <= pathlen &&
-					    !strcmp(exclude + 1, pathname + pathlen - x->patternlen + 1))
-						return to_exclude;
-				} else {
-					if (fnmatch(exclude, basename, 0) == 0)
-						return to_exclude;
-				}
-			}
-			else {
-				/* match with FNM_PATHNAME:
-				 * exclude has base (baselen long) implicitly
-				 * in front of it.
-				 */
-				int baselen = x->baselen;
-				if (*exclude == '/')
-					exclude++;
-
-				if (pathlen < baselen ||
-				    (baselen && pathname[baselen-1] != '/') ||
-				    strncmp(pathname, x->base, baselen))
-				    continue;
-
-				if (x->flags & EXC_FLAG_NOWILDCARD) {
-					if (!strcmp(exclude, pathname + baselen))
-						return to_exclude;
-				} else {
-					if (fnmatch(exclude, pathname+baselen,
-						    FNM_PATHNAME) == 0)
-					    return to_exclude;
-				}
-			}
+			int to_exclude = excluded_from_list_1(pathname, pathlen, basename, x, dtype);
+			if (to_exclude != -1)
+				return to_exclude;
 		}
 	}
 	return -1; /* undecided */
-- 
1.7.0.2.445.gcbdb3


[-- Attachment #3: 0002-Set-EXC_FLAG_STARSTAR-if-a-pattern-starts-with.patch --]
[-- Type: text/plain, Size: 1335 bytes --]

>From afa6ef0ee1a299aef486367792f6dcc6e4e686d5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nguy=E1=BB=85n=20Th=C3=A1i=20Ng=E1=BB=8Dc=20Duy?= <pclouds@gmail.com>
Date: Wed, 13 Oct 2010 19:10:35 +0700
Subject: [PATCH 2/3] Set EXC_FLAG_STARSTAR if a pattern starts with "**/"

---
 dir.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index 56974e5..241fbce 100644
--- a/dir.c
+++ b/dir.c
@@ -171,6 +171,10 @@ void add_exclude(const char *string, const char *base,
 		to_exclude = 0;
 		string++;
 	}
+	if (*string == '*' && string[1] == '*' && string[2] == '/') {
+		flags |= EXC_FLAG_STARSTAR;
+		string += 3;
+	}
 	len = strlen(string);
 	if (len && string[len - 1] == '/') {
 		char *s;
@@ -180,7 +184,7 @@ void add_exclude(const char *string, const char *base,
 		s[len - 1] = '\0';
 		string = s;
 		x->pattern = s;
-		flags = EXC_FLAG_MUSTBEDIR;
+		flags |= EXC_FLAG_MUSTBEDIR;
 	} else {
 		x = xmalloc(sizeof(*x));
 		x->pattern = string;
@@ -190,7 +194,7 @@ void add_exclude(const char *string, const char *base,
 	x->base = base;
 	x->baselen = baselen;
 	x->flags = flags;
-	if (!strchr(string, '/'))
+	if (!(x->flags & EXC_FLAG_STARSTAR) && !strchr(string, '/'))
 		x->flags |= EXC_FLAG_NODIR;
 	if (no_wildcard(string))
 		x->flags |= EXC_FLAG_NOWILDCARD;
-- 
1.7.0.2.445.gcbdb3


[-- Attachment #4: 0003-excluded_from_list-support-EXC_FLAG_STARSTAR.patch --]
[-- Type: text/plain, Size: 1371 bytes --]

>From c11a949328b90aca3398f730a50eab35033f0334 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nguy=E1=BB=85n=20Th=C3=A1i=20Ng=E1=BB=8Dc=20Duy?= <pclouds@gmail.com>
Date: Wed, 13 Oct 2010 19:06:07 +0700
Subject: [PATCH 3/3] excluded_from_list: support EXC_FLAG_STARSTAR

if that flag is set, try to match all full pathname, then skip one
directory and try again until it completely fails.
---
 dir.c |   20 +++++++++++++++++---
 1 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/dir.c b/dir.c
index 241fbce..11c3e7d 100644
--- a/dir.c
+++ b/dir.c
@@ -417,9 +417,23 @@ int excluded_from_list(const char *pathname,
 	if (el->nr) {
 		for (i = el->nr - 1; 0 <= i; i--) {
 			struct exclude *x = el->excludes[i];
-			int to_exclude = excluded_from_list_1(pathname, pathlen, basename, x, dtype);
-			if (to_exclude != -1)
-				return to_exclude;
+			const char *new_pathname = pathname;
+			int new_pathlen = pathlen;
+
+			while (1) {
+				int to_exclude = excluded_from_list_1(new_pathname, new_pathlen, basename, x, dtype);
+				if (to_exclude != -1)
+					return to_exclude;
+				if (x->flags & EXC_FLAG_STARSTAR) {
+					const char *p = strchr(new_pathname, '/');
+					if (p) {
+						new_pathlen -= p - new_pathname - 1;
+						new_pathname = p + 1;
+						continue;
+					}
+				}
+				break;
+			}
 		}
 	}
 	return -1; /* undecided */
-- 
1.7.0.2.445.gcbdb3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PROPOSAL] .gitignore syntax modification
  2010-10-13  3:05       ` Kevin Ballard
@ 2010-10-13 16:51         ` Junio C Hamano
  0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2010-10-13 16:51 UTC (permalink / raw)
  To: Kevin Ballard; +Cc: Jonathan Nieder, Nguyen Thai Ngoc Duy, Git Mailing List

Kevin Ballard <kevin@sb.org> writes:

> Another possibility is using this new behavior for core.excludesfile and
> possibly .git/info/exclude, but not for patterns in .gitignore
> files.

I suspect that that approach is totally unacceptable.  These out-of-tree
exclude pattern files do not have any semantic difference from the in-tree
exclude pattern files (.gitignore); it makes no sense to make them
syntactically different.

More importantly it is much harder to diagnose if you break people's
private configuration without introducing the same breakage to what's
available to wider public guinea pigs.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PROPOSAL] .gitignore syntax modification
  2010-10-13 12:15     ` Nguyen Thai Ngoc Duy
@ 2010-10-15 11:01       ` Kevin Ballard
  2010-10-15 12:57         ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 13+ messages in thread
From: Kevin Ballard @ 2010-10-15 11:01 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Git Mailing List

On Oct 13, 2010, at 5:15 AM, Nguyen Thai Ngoc Duy wrote:

> On Tue, Oct 12, 2010 at 07:32:54PM -0700, Kevin Ballard wrote:
>> As soon as I find the time, I'll be working on a patch for this. I
>> only wrote up the proposal because I want to make sure that what I
>> end up implementing is actually something that will be accepted. At
>> this point I'm actually in favor of simply assuming all paths that
>> don't start with / can be matched at any level but I recognize that
>> this is a change to existing functionality, though I personally
>> think that all patterns that are meant to match only at the current
>> level should be prefixed with / anyway. Would such a change to
>> existing behavior be rejected out-of-hand?
> 
> Yes, patterns that only match current level should be prefixed with a
> slash. There are also other cases apart from "current level only" and
> "any level": foo/a* will match only second level, not any level.

Sure, "foo/a*" will match second level only, but that's identical to "/foo/a*". I would have preferred the semantics of .gitignore to behave as if all patterns are prefixed with "**/" unless the pattern starts with "/", but that ship sailed a long time ago and I agree that it would be a bad idea to introduce that behavior now.

> I was thinking of doing like this. It's not complete (not even build)
> but it shows the idea. I don't think this way it will change existing
> behaviors. Performance is something I haven't thought through.
> 
> Anyway, what do you think? I'm afraid I don't have time to work on
> this. The pathspec unification work still needs to be done.

Got around to glancing at your patch. Looks pretty good, and it does build if you simply define EXC_FLAG_STARSTAR, though there are a few changes that are definitely necessary (a path of "*" will cause this to run off the end of the string while trying to detect "**/"). I'll have some more time next week to take a much closer look though. As for performance, I'm not particularly worried. The only performance change is if EXC_FLAG_STARSTAR is checked, in the worst-case it'll try to apply the pattern once per level of directory nesting. As this is just string twiddling, it's bound to be pretty fast, and I don't think there's any viable alternative to doing this kind of loop anyway. That said, I'd still like to support putting **/ anywhere in the pattern instead of just at the beginning, and 
 possibly even support ** (without the trailing /).

If we do support ** by itself, I wonder if we should special-case having ** as the last path component of the pattern. The possible behavior change we could have is making this only match files and not directories. The use-case here is putting something like "foo/**" in the top-level .gitignore and then a few levels into foo we could put another .gitignore with an inverse pattern in order to un-ignore some deep file (or just "!foo/*/*/bar.c" inside that top-level .gitignore as well). The only way I can think of to achieve this behavior with the current gitignore is something along the lines of

foo/*
!foo/bar/
foo/bar/*
!foo/bar/baz/
foo/bar/baz/*
!foo/bar/baz/bar.c

And even this will only work if you know all the intermediate directories. I cannot think of any way at all right now to ignore everything in a single directory except for one file at least 1 level of nesting deeper if you don't know the names of the intermediate directories. With the proposed special-case we can say

foo/**
!foo/*/*/bar.c

and it will behave exactly as specified.

It occurs to me that we could actually tweak this slightly, to say that if a ** is encountered and there are zero slashes in the pattern after it, then it will only match files (with zero or more leading directories). This way you can have a pattern "foo/**.d" which only ignores files with the extension ".d" but will still avoid ignoring directories that end in ".d".

This turned out a bit longer than intended, and slightly more rambling as well, and I apologize for that. I will revisit this again next week.

-Kevin Ballard

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PROPOSAL] .gitignore syntax modification
  2010-10-15 11:01       ` Kevin Ballard
@ 2010-10-15 12:57         ` Nguyen Thai Ngoc Duy
  2010-10-15 20:15           ` Kevin Ballard
  0 siblings, 1 reply; 13+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-10-15 12:57 UTC (permalink / raw)
  To: Kevin Ballard; +Cc: Git Mailing List

On Fri, Oct 15, 2010 at 6:01 PM, Kevin Ballard <kevin@sb.org> wrote:
> Got around to glancing at your patch. Looks pretty good, and it does build if you simply define EXC_FLAG_STARSTAR, though there are a few changes that are definitely necessary (a path of "*" will cause this to run off the end of the string while trying to detect "**/"). I'll have some more time next week to take a much closer look though. As for performance, I'm not particularly worried. The only performance change is if EXC_FLAG_STARSTAR is checked, in the worst-case it'll try to apply the pattern once per level of directory nesting. As this is just string twiddling, it's bound to be pretty fast, and I don't think there's any viable alternative to doing this kind of loop anyway. That said, I'd still like to support putting **/ anywhere in the pattern instead of just at the beginning, an
 d possibly even support ** (without the trailing /).

Well that would mean reimplementing fnmatch(). I don't know, maybe
it's not hard to do that. '*' can already match '/' if FNM_PATHNAME is
not given. So one just needs to tell fnmatch() '**' is '*' without
FNM_PATHNAME.

"**/" optimization can be extended to support "path/to/**/" quite
easily as long as no wildcards are used in "path/to/" part.

> If we do support ** by itself, I wonder if we should special-case having ** as the last path component of the pattern. The possible behavior change we could have is making this only match files and not directories. The use-case here is putting something like "foo/**" in the top-level .gitignore and then a few levels into foo we could put another .gitignore with an inverse pattern in order to un-ignore some deep file (or just "!foo/*/*/bar.c" inside that top-level .gitignore as well). The only way I can think of to achieve this behavior with the current gitignore is something along the lines of
>
> foo/*
> !foo/bar/
> foo/bar/*
> !foo/bar/baz/
> foo/bar/baz/*
> !foo/bar/baz/bar.c
>
> And even this will only work if you know all the intermediate directories. I cannot think of any way at all right now to ignore everything in a single directory except for one file at least 1 level of nesting deeper if you don't know the names of the intermediate directories. With the proposed special-case we can say
>
> foo/**
> !foo/*/*/bar.c
>
> and it will behave exactly as specified.
>
> It occurs to me that we could actually tweak this slightly, to say that if a ** is encountered and there are zero slashes in the pattern after it, then it will only match files (with zero or more leading directories). This way you can have a pattern "foo/**.d" which only ignores files with the extension ".d" but will still avoid ignoring directories that end in ".d".

No idea. Seems overkill to me. But I don't use .gitignore heavily. For
really complex ignore rules, how about allowing an external process to
do the job? It would keep .gitignore syntax simple, yet powerful when
needed.

A leading '|' marks an external process and can be used intermixed
with normal patterns in .gitignore. When excluded_from_list examines a
'|' pattern, it sends all information to the associated process' stdin
and expects to a result code in stdout. The process is started when it
is examined the first time and is kept alive until git process
terminates.
-- 
Duy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PROPOSAL] .gitignore syntax modification
  2010-10-15 12:57         ` Nguyen Thai Ngoc Duy
@ 2010-10-15 20:15           ` Kevin Ballard
  0 siblings, 0 replies; 13+ messages in thread
From: Kevin Ballard @ 2010-10-15 20:15 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Git Mailing List

On Oct 15, 2010, at 5:57 AM, Nguyen Thai Ngoc Duy wrote:

> On Fri, Oct 15, 2010 at 6:01 PM, Kevin Ballard <kevin@sb.org> wrote:
>> Got around to glancing at your patch. Looks pretty good, and it does build if you simply define EXC_FLAG_STARSTAR, though there are a few changes that are definitely necessary (a path of "*" will cause this to run off the end of the string while trying to detect "**/"). I'll have some more time next week to take a much closer look though. As for performance, I'm not particularly worried. The only performance change is if EXC_FLAG_STARSTAR is checked, in the worst-case it'll try to apply the pattern once per level of directory nesting. As this is just string twiddling, it's bound to be pretty fast, and I don't think there's any viable alternative to doing this kind of loop anyway. That said, I'd still like to support putting **/ anywhere in the pattern instead of just at the beginning, a
 nd possibly even support ** (without the trailing /).
> 
> Well that would mean reimplementing fnmatch(). I don't know, maybe
> it's not hard to do that. '*' can already match '/' if FNM_PATHNAME is
> not given. So one just needs to tell fnmatch() '**' is '*' without
> FNM_PATHNAME.
> 
> "**/" optimization can be extended to support "path/to/**/" quite
> easily as long as no wildcards are used in "path/to/" part.

I think both cases can be dealt with while still using fnmatch(). We can split the string along all instances of "**" and then match each pattern segment with fnmatch() along parts of the path. If a given segment matches part of the path, then we can assume that's correct and move on (e.g. never backtrack to the ** before it). The only specialization is the very last path segment has to match at the end of the path, and we can use slash-counting in each path segment in order to figure out how to slice up the path to pass to fnmatch().

>> If we do support ** by itself, I wonder if we should special-case having ** as the last path component of the pattern. The possible behavior change we could have is making this only match files and not directories. The use-case here is putting something like "foo/**" in the top-level .gitignore and then a few levels into foo we could put another .gitignore with an inverse pattern in order to un-ignore some deep file (or just "!foo/*/*/bar.c" inside that top-level .gitignore as well). The only way I can think of to achieve this behavior with the current gitignore is something along the lines of
>> 
>> foo/*
>> !foo/bar/
>> foo/bar/*
>> !foo/bar/baz/
>> foo/bar/baz/*
>> !foo/bar/baz/bar.c
>> 
>> And even this will only work if you know all the intermediate directories. I cannot think of any way at all right now to ignore everything in a single directory except for one file at least 1 level of nesting deeper if you don't know the names of the intermediate directories. With the proposed special-case we can say
>> 
>> foo/**
>> !foo/*/*/bar.c
>> 
>> and it will behave exactly as specified.
>> 
>> It occurs to me that we could actually tweak this slightly, to say that if a ** is encountered and there are zero slashes in the pattern after it, then it will only match files (with zero or more leading directories). This way you can have a pattern "foo/**.d" which only ignores files with the extension ".d" but will still avoid ignoring directories that end in ".d".
> 
> No idea. Seems overkill to me. But I don't use .gitignore heavily. For
> really complex ignore rules, how about allowing an external process to
> do the job? It would keep .gitignore syntax simple, yet powerful when
> needed.
> 
> A leading '|' marks an external process and can be used intermixed
> with normal patterns in .gitignore. When excluded_from_list examines a
> '|' pattern, it sends all information to the associated process' stdin
> and expects to a result code in stdout. The process is started when it
> is examined the first time and is kept alive until git process
> terminates.

That would certainly be powerful, but I don't know how much work it would take to implement. I still haven't really looked at the gitignore code yet. I think this is a good suggestion to do, but I still want to handle ** natively if possible.

-Kevin Ballard

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2010-10-15 20:16 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-08 21:26 [PROPOSAL] .gitignore syntax modification Kevin Ballard
2010-10-08 21:58 ` Maaartin
2010-10-09  0:03   ` Kevin Ballard
2010-10-11 23:46     ` Maaartin
2010-10-13  2:24 ` Nguyen Thai Ngoc Duy
2010-10-13  2:32   ` Kevin Ballard
2010-10-13  2:40     ` Jonathan Nieder
2010-10-13  3:05       ` Kevin Ballard
2010-10-13 16:51         ` Junio C Hamano
2010-10-13 12:15     ` Nguyen Thai Ngoc Duy
2010-10-15 11:01       ` Kevin Ballard
2010-10-15 12:57         ` Nguyen Thai Ngoc Duy
2010-10-15 20:15           ` Kevin Ballard

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).