git@vger.kernel.org list mirror (unofficial, one of many)
* [PATCH 0/8] wildmatch take 3
@ 2012-10-09  3:08 Nguyễn Thái Ngọc Duy
  2012-10-09  3:09 ` [PATCH 1/8] Import wildmatch from rsync Nguyễn Thái Ngọc Duy
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-09  3:08 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Still experimental, but I think the semantics are getting better.
Please point out anything that still does not make sense. Quoting from
the last patch:

  Two consecutive asterisks ("`**`") in patterns matched against
  full pathname may have special meaning:
  
   - A leading "`**`" followed by a slash means match in all
     directories. For example, "`**/foo`" matches file or directory
     "`foo`" anywhere, the same as pattern "`foo`". "**/foo/bar"
     matches file or directory "`bar`" anywhere that is directly
     under directory "`foo`".
  
   - A trailing "/**" matches everything inside. For example,
     "abc/**" is equivalent to "`/abc/`".
  
   - A slash followed by two consecutive asterisks then a slash
     matches zero or more directories. For example, "`a/**/b`"
     matches "`a/b`", "`a/x/b`", "`a/x/y/b`" and so on.
  
   - Consecutive asterisks otherwise are treated like normal
     asterisk wildcards.

"abc/**" and "abc/*" are different in attr (the former matches all
files, the latter only files directly under abc) while they are the
same for ignore. I don't like these subtleties, but I don't
think we have a choice unless we rework the attr matching machinery.

Tests t3070.100 and .101 fail on some machines and not on others due to
(I guess) differences in fnmatch behavior. Maybe we should fall back on
compat/fnmatch in such cases for consistent behavior.
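
One way to see how a given libc behaves is a small probe around POSIX
`fnmatch()`. The sketch below is an illustration only: `probe_fnmatch`
and `probe_examples` are hypothetical names, and the two malformed
bracket patterns are merely the kind of input where implementations
might disagree. They are an assumption, not a diagnosis of the two
failing tests.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the local fnmatch() reports a match. */
int probe_fnmatch(const char *pattern, const char *text, int flags)
{
	int matched = fnmatch(pattern, text, flags) == 0;

	printf("%-8s %-12s flags=%d -> %s\n", pattern, text, flags,
	       matched ? "match" : "no match");
	return matched;
}

void probe_examples(void)
{
	/* Specified by POSIX: with FNM_PATHNAME, '*' cannot match '/'. */
	int with_pathname = probe_fnmatch("foo*bar", "foo/baz/bar", FNM_PATHNAME);
	int without = probe_fnmatch("foo*bar", "foo/baz/bar", 0);

	assert(!with_pathname);
	assert(without);

	/* Malformed brackets: only report what this libc does. */
	probe_fnmatch("ab[", "ab[", FNM_PATHNAME);
	probe_fnmatch("a[]b", "a[]b", FNM_PATHNAME);
}
```

Running `probe_examples()` on two machines and comparing the printed
lines would show whether the local fnmatch implementations differ on
these inputs.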

There are problems with asciidoc and "`**/`", but that's not something
we need to care about now.

This applies on top of master (with a small conflict with nd/attr-match-optim-more).

Nguyễn Thái Ngọc Duy (8):
  Import wildmatch from rsync
  wildmatch: remove unnecessary functions
  Integrate wildmatch to git
  wildmatch: remove static variable force_lower_case
  wildmatch: fix case-insensitive matching
  wildmatch: adjust "**" behavior
  wildmatch: make /**/ match zero or more directories
  Support "**" wildcard in .gitignore and .gitattributes

 .gitignore                         |   1 +
 Documentation/gitignore.txt        |  19 +++
 Makefile                           |   3 +
 attr.c                             |   4 +-
 dir.c                              |   4 +-
 t/t0003-attributes.sh              |  38 ++++++
 t/t3001-ls-files-others-exclude.sh |  19 +++
 t/t3070-wildmatch.sh               | 184 ++++++++++++++++++++++++++++
 test-wildmatch.c                   |  14 +++
 wildmatch.c                        | 245 +++++++++++++++++++++++++++++++++++++
 wildmatch.h                        |   3 +
 11 files changed, 532 insertions(+), 2 deletions(-)
 create mode 100755 t/t3070-wildmatch.sh
 create mode 100644 test-wildmatch.c
 create mode 100644 wildmatch.c
 create mode 100644 wildmatch.h

-- 
1.8.0.rc0.29.g1fdd78f

* [PATCH 1/8] Import wildmatch from rsync
  2012-10-09  3:08 [PATCH 0/8] wildmatch take 3 Nguyễn Thái Ngọc Duy
@ 2012-10-09  3:09 ` Nguyễn Thái Ngọc Duy
  2012-10-09  3:09 ` [PATCH 2/8] wildmatch: remove unnecessary functions Nguyễn Thái Ngọc Duy
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-09  3:09 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

These files are from rsync.git commit
f92f5b166e3019db42bc7fe1aa2f1a9178cd215d, which was the last commit
before rsync turned GPL-3. All files are imported as-is and have no
effect on the build yet. Adaptation is done in a separate patch.

rsync.git           ->  git.git
lib/wildmatch.[ch]      wildmatch.[ch]
wildtest.txt            t/t3070/wildtest.txt

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 t/t3070/wildtest.txt | 165 +++++++++++++++++++++++
 wildmatch.c          | 368 +++++++++++++++++++++++++++++++++++++++++++++++++++
 wildmatch.h          |   6 +
 3 files changed, 539 insertions(+)
 create mode 100644 t/t3070/wildtest.txt
 create mode 100644 wildmatch.c
 create mode 100644 wildmatch.h

diff --git a/t/t3070/wildtest.txt b/t/t3070/wildtest.txt
new file mode 100644
index 0000000..42c1678
--- /dev/null
+++ b/t/t3070/wildtest.txt
@@ -0,0 +1,165 @@
+# Input is in the following format (all items white-space separated):
+#
+# The first two items are 1 or 0 indicating if the wildmat call is expected to
+# succeed and if fnmatch works the same way as wildmat, respectively.  After
+# that is a text string for the match, and a pattern string.  Strings can be
+# quoted (if desired) in either double or single quotes, as well as backticks.
+#
+# MATCH FNMATCH_SAME "text to match" 'pattern to use'
+
+# Basic wildmat features
+1 1 foo			foo
+0 1 foo			bar
+1 1 ''			""
+1 1 foo			???
+0 1 foo			??
+1 1 foo			*
+1 1 foo			f*
+0 1 foo			*f
+1 1 foo			*foo*
+1 1 foobar		*ob*a*r*
+1 1 aaaaaaabababab	*ab
+1 1 foo*		foo\*
+0 1 foobar		foo\*bar
+1 1 f\oo		f\\oo
+1 1 ball		*[al]?
+0 1 ten			[ten]
+1 1 ten			**[!te]
+0 1 ten			**[!ten]
+1 1 ten			t[a-g]n
+0 1 ten			t[!a-g]n
+1 1 ton			t[!a-g]n
+1 1 ton			t[^a-g]n
+1 1 a]b			a[]]b
+1 1 a-b			a[]-]b
+1 1 a]b			a[]-]b
+0 1 aab			a[]-]b
+1 1 aab			a[]a-]b
+1 1 ]			]
+
+# Extended slash-matching features
+0 1 foo/baz/bar		foo*bar
+1 1 foo/baz/bar		foo**bar
+0 1 foo/bar		foo?bar
+0 1 foo/bar		foo[/]bar
+0 1 foo/bar		f[^eiu][^eiu][^eiu][^eiu][^eiu]r
+1 1 foo-bar		f[^eiu][^eiu][^eiu][^eiu][^eiu]r
+0 1 foo			**/foo
+1 1 /foo		**/foo
+1 1 bar/baz/foo		**/foo
+0 1 bar/baz/foo		*/foo
+0 0 foo/bar/baz		**/bar*
+1 1 deep/foo/bar/baz	**/bar/*
+0 1 deep/foo/bar/baz/	**/bar/*
+1 1 deep/foo/bar/baz/	**/bar/**
+0 1 deep/foo/bar	**/bar/*
+1 1 deep/foo/bar/	**/bar/**
+1 1 foo/bar/baz		**/bar**
+1 1 foo/bar/baz/x	*/bar/**
+0 0 deep/foo/bar/baz/x	*/bar/**
+1 1 deep/foo/bar/baz/x	**/bar/*/*
+
+# Various additional tests
+0 1 acrt		a[c-c]st
+1 1 acrt		a[c-c]rt
+0 1 ]			[!]-]
+1 1 a			[!]-]
+0 1 ''			\
+0 1 \			\
+0 1 /\			*/\
+1 1 /\			*/\\
+1 1 foo			foo
+1 1 @foo		@foo
+0 1 foo			@foo
+1 1 [ab]		\[ab]
+1 1 [ab]		[[]ab]
+1 1 [ab]		[[:]ab]
+0 1 [ab]		[[::]ab]
+1 1 [ab]		[[:digit]ab]
+1 1 [ab]		[\[:]ab]
+1 1 ?a?b		\??\?b
+1 1 abc			\a\b\c
+0 1 foo			''
+1 1 foo/bar/baz/to	**/t[o]
+
+# Character class tests
+1 1 a1B		[[:alpha:]][[:digit:]][[:upper:]]
+0 1 a		[[:digit:][:upper:][:space:]]
+1 1 A		[[:digit:][:upper:][:space:]]
+1 1 1		[[:digit:][:upper:][:space:]]
+0 1 1		[[:digit:][:upper:][:spaci:]]
+1 1 ' '		[[:digit:][:upper:][:space:]]
+0 1 .		[[:digit:][:upper:][:space:]]
+1 1 .		[[:digit:][:punct:][:space:]]
+1 1 5		[[:xdigit:]]
+1 1 f		[[:xdigit:]]
+1 1 D		[[:xdigit:]]
+1 1 _		[[:alnum:][:alpha:][:blank:][:cntrl:][:digit:][:graph:][:lower:][:print:][:punct:][:space:][:upper:][:xdigit:]]
+#1 1 …		[^[:alnum:][:alpha:][:blank:][:cntrl:][:digit:][:graph:][:lower:][:print:][:punct:][:space:][:upper:][:xdigit:]]
+1 1 \x7f		[^[:alnum:][:alpha:][:blank:][:digit:][:graph:][:lower:][:print:][:punct:][:space:][:upper:][:xdigit:]]
+1 1 .		[^[:alnum:][:alpha:][:blank:][:cntrl:][:digit:][:lower:][:space:][:upper:][:xdigit:]]
+1 1 5		[a-c[:digit:]x-z]
+1 1 b		[a-c[:digit:]x-z]
+1 1 y		[a-c[:digit:]x-z]
+0 1 q		[a-c[:digit:]x-z]
+
+# Additional tests, including some malformed wildmats
+1 1 ]		[\\-^]
+0 1 [		[\\-^]
+1 1 -		[\-_]
+1 1 ]		[\]]
+0 1 \]		[\]]
+0 1 \		[\]]
+0 1 ab		a[]b
+0 1 a[]b	a[]b
+0 1 ab[		ab[
+0 1 ab		[!
+0 1 ab		[-
+1 1 -		[-]
+0 1 -		[a-
+0 1 -		[!a-
+1 1 -		[--A]
+1 1 5		[--A]
+1 1 ' '		'[ --]'
+1 1 $		'[ --]'
+1 1 -		'[ --]'
+0 1 0		'[ --]'
+1 1 -		[---]
+1 1 -		[------]
+0 1 j		[a-e-n]
+1 1 -		[a-e-n]
+1 1 a		[!------]
+0 1 [		[]-a]
+1 1 ^		[]-a]
+0 1 ^		[!]-a]
+1 1 [		[!]-a]
+1 1 ^		[a^bc]
+1 1 -b]		[a-]b]
+0 1 \		[\]
+1 1 \		[\\]
+0 1 \		[!\\]
+1 1 G		[A-\\]
+0 1 aaabbb	b*a
+0 1 aabcaa	*ba*
+1 1 ,		[,]
+1 1 ,		[\\,]
+1 1 \		[\\,]
+1 1 -		[,-.]
+0 1 +		[,-.]
+0 1 -.]		[,-.]
+1 1 2		[\1-\3]
+1 1 3		[\1-\3]
+0 1 4		[\1-\3]
+1 1 \		[[-\]]
+1 1 [		[[-\]]
+1 1 ]		[[-\]]
+0 1 -		[[-\]]
+
+# Test recursion and the abort code (use "wildtest -i" to see iteration counts)
+1 1 -adobe-courier-bold-o-normal--12-120-75-75-m-70-iso8859-1	-*-*-*-*-*-*-12-*-*-*-m-*-*-*
+0 1 -adobe-courier-bold-o-normal--12-120-75-75-X-70-iso8859-1	-*-*-*-*-*-*-12-*-*-*-m-*-*-*
+0 1 -adobe-courier-bold-o-normal--12-120-75-75-/-70-iso8859-1	-*-*-*-*-*-*-12-*-*-*-m-*-*-*
+1 1 /adobe/courier/bold/o/normal//12/120/75/75/m/70/iso8859/1	/*/*/*/*/*/*/12/*/*/*/m/*/*/*
+0 1 /adobe/courier/bold/o/normal//12/120/75/75/X/70/iso8859/1	/*/*/*/*/*/*/12/*/*/*/m/*/*/*
+1 1 abcd/abcdefg/abcdefghijk/abcdefghijklmnop.txt		**/*a*b*g*n*t
+0 1 abcd/abcdefg/abcdefghijk/abcdefghijklmnop.txtz		**/*a*b*g*n*t
diff --git a/wildmatch.c b/wildmatch.c
new file mode 100644
index 0000000..f3a1731
--- /dev/null
+++ b/wildmatch.c
@@ -0,0 +1,368 @@
+/*
+**  Do shell-style pattern matching for ?, \, [], and * characters.
+**  It is 8bit clean.
+**
+**  Written by Rich $alz, mirror!rs, Wed Nov 26 19:03:17 EST 1986.
+**  Rich $alz is now <rsalz@bbn.com>.
+**
+**  Modified by Wayne Davison to special-case '/' matching, to make '**'
+**  work differently than '*', and to fix the character-class code.
+*/
+
+#include "rsync.h"
+
+/* What character marks an inverted character class? */
+#define NEGATE_CLASS	'!'
+#define NEGATE_CLASS2	'^'
+
+#define FALSE 0
+#define TRUE 1
+#define ABORT_ALL -1
+#define ABORT_TO_STARSTAR -2
+
+#define CC_EQ(class, len, litmatch) ((len) == sizeof (litmatch)-1 \
+				    && *(class) == *(litmatch) \
+				    && strncmp((char*)class, litmatch, len) == 0)
+
+#if defined STDC_HEADERS || !defined isascii
+# define ISASCII(c) 1
+#else
+# define ISASCII(c) isascii(c)
+#endif
+
+#ifdef isblank
+# define ISBLANK(c) (ISASCII(c) && isblank(c))
+#else
+# define ISBLANK(c) ((c) == ' ' || (c) == '\t')
+#endif
+
+#ifdef isgraph
+# define ISGRAPH(c) (ISASCII(c) && isgraph(c))
+#else
+# define ISGRAPH(c) (ISASCII(c) && isprint(c) && !isspace(c))
+#endif
+
+#define ISPRINT(c) (ISASCII(c) && isprint(c))
+#define ISDIGIT(c) (ISASCII(c) && isdigit(c))
+#define ISALNUM(c) (ISASCII(c) && isalnum(c))
+#define ISALPHA(c) (ISASCII(c) && isalpha(c))
+#define ISCNTRL(c) (ISASCII(c) && iscntrl(c))
+#define ISLOWER(c) (ISASCII(c) && islower(c))
+#define ISPUNCT(c) (ISASCII(c) && ispunct(c))
+#define ISSPACE(c) (ISASCII(c) && isspace(c))
+#define ISUPPER(c) (ISASCII(c) && isupper(c))
+#define ISXDIGIT(c) (ISASCII(c) && isxdigit(c))
+
+#ifdef WILD_TEST_ITERATIONS
+int wildmatch_iteration_count;
+#endif
+
+static int force_lower_case = 0;
+
+/* Match pattern "p" against the a virtually-joined string consisting
+ * of "text" and any strings in array "a". */
+static int dowild(const uchar *p, const uchar *text, const uchar*const *a)
+{
+    uchar p_ch;
+
+#ifdef WILD_TEST_ITERATIONS
+    wildmatch_iteration_count++;
+#endif
+
+    for ( ; (p_ch = *p) != '\0'; text++, p++) {
+	int matched, special;
+	uchar t_ch, prev_ch;
+	while ((t_ch = *text) == '\0') {
+	    if (*a == NULL) {
+		if (p_ch != '*')
+		    return ABORT_ALL;
+		break;
+	    }
+	    text = *a++;
+	}
+	if (force_lower_case && ISUPPER(t_ch))
+	    t_ch = tolower(t_ch);
+	switch (p_ch) {
+	  case '\\':
+	    /* Literal match with following character.  Note that the test
+	     * in "default" handles the p[1] == '\0' failure case. */
+	    p_ch = *++p;
+	    /* FALLTHROUGH */
+	  default:
+	    if (t_ch != p_ch)
+		return FALSE;
+	    continue;
+	  case '?':
+	    /* Match anything but '/'. */
+	    if (t_ch == '/')
+		return FALSE;
+	    continue;
+	  case '*':
+	    if (*++p == '*') {
+		while (*++p == '*') {}
+		special = TRUE;
+	    } else
+		special = FALSE;
+	    if (*p == '\0') {
+		/* Trailing "**" matches everything.  Trailing "*" matches
+		 * only if there are no more slash characters. */
+		if (!special) {
+		    do {
+			if (strchr((char*)text, '/') != NULL)
+			    return FALSE;
+		    } while ((text = *a++) != NULL);
+		}
+		return TRUE;
+	    }
+	    while (1) {
+		if (t_ch == '\0') {
+		    if ((text = *a++) == NULL)
+			break;
+		    t_ch = *text;
+		    continue;
+		}
+		if ((matched = dowild(p, text, a)) != FALSE) {
+		    if (!special || matched != ABORT_TO_STARSTAR)
+			return matched;
+		} else if (!special && t_ch == '/')
+		    return ABORT_TO_STARSTAR;
+		t_ch = *++text;
+	    }
+	    return ABORT_ALL;
+	  case '[':
+	    p_ch = *++p;
+#ifdef NEGATE_CLASS2
+	    if (p_ch == NEGATE_CLASS2)
+		p_ch = NEGATE_CLASS;
+#endif
+	    /* Assign literal TRUE/FALSE because of "matched" comparison. */
+	    special = p_ch == NEGATE_CLASS? TRUE : FALSE;
+	    if (special) {
+		/* Inverted character class. */
+		p_ch = *++p;
+	    }
+	    prev_ch = 0;
+	    matched = FALSE;
+	    do {
+		if (!p_ch)
+		    return ABORT_ALL;
+		if (p_ch == '\\') {
+		    p_ch = *++p;
+		    if (!p_ch)
+			return ABORT_ALL;
+		    if (t_ch == p_ch)
+			matched = TRUE;
+		} else if (p_ch == '-' && prev_ch && p[1] && p[1] != ']') {
+		    p_ch = *++p;
+		    if (p_ch == '\\') {
+			p_ch = *++p;
+			if (!p_ch)
+			    return ABORT_ALL;
+		    }
+		    if (t_ch <= p_ch && t_ch >= prev_ch)
+			matched = TRUE;
+		    p_ch = 0; /* This makes "prev_ch" get set to 0. */
+		} else if (p_ch == '[' && p[1] == ':') {
+		    const uchar *s;
+		    int i;
+		    for (s = p += 2; (p_ch = *p) && p_ch != ']'; p++) {} /*SHARED ITERATOR*/
+		    if (!p_ch)
+			return ABORT_ALL;
+		    i = p - s - 1;
+		    if (i < 0 || p[-1] != ':') {
+			/* Didn't find ":]", so treat like a normal set. */
+			p = s - 2;
+			p_ch = '[';
+			if (t_ch == p_ch)
+			    matched = TRUE;
+			continue;
+		    }
+		    if (CC_EQ(s,i, "alnum")) {
+			if (ISALNUM(t_ch))
+			    matched = TRUE;
+		    } else if (CC_EQ(s,i, "alpha")) {
+			if (ISALPHA(t_ch))
+			    matched = TRUE;
+		    } else if (CC_EQ(s,i, "blank")) {
+			if (ISBLANK(t_ch))
+			    matched = TRUE;
+		    } else if (CC_EQ(s,i, "cntrl")) {
+			if (ISCNTRL(t_ch))
+			    matched = TRUE;
+		    } else if (CC_EQ(s,i, "digit")) {
+			if (ISDIGIT(t_ch))
+			    matched = TRUE;
+		    } else if (CC_EQ(s,i, "graph")) {
+			if (ISGRAPH(t_ch))
+			    matched = TRUE;
+		    } else if (CC_EQ(s,i, "lower")) {
+			if (ISLOWER(t_ch))
+			    matched = TRUE;
+		    } else if (CC_EQ(s,i, "print")) {
+			if (ISPRINT(t_ch))
+			    matched = TRUE;
+		    } else if (CC_EQ(s,i, "punct")) {
+			if (ISPUNCT(t_ch))
+			    matched = TRUE;
+		    } else if (CC_EQ(s,i, "space")) {
+			if (ISSPACE(t_ch))
+			    matched = TRUE;
+		    } else if (CC_EQ(s,i, "upper")) {
+			if (ISUPPER(t_ch))
+			    matched = TRUE;
+		    } else if (CC_EQ(s,i, "xdigit")) {
+			if (ISXDIGIT(t_ch))
+			    matched = TRUE;
+		    } else /* malformed [:class:] string */
+			return ABORT_ALL;
+		    p_ch = 0; /* This makes "prev_ch" get set to 0. */
+		} else if (t_ch == p_ch)
+		    matched = TRUE;
+	    } while (prev_ch = p_ch, (p_ch = *++p) != ']');
+	    if (matched == special || t_ch == '/')
+		return FALSE;
+	    continue;
+	}
+    }
+
+    do {
+	if (*text)
+	    return FALSE;
+    } while ((text = *a++) != NULL);
+
+    return TRUE;
+}
+
+/* Match literal string "s" against the a virtually-joined string consisting
+ * of "text" and any strings in array "a". */
+static int doliteral(const uchar *s, const uchar *text, const uchar*const *a)
+{
+    for ( ; *s != '\0'; text++, s++) {
+	while (*text == '\0') {
+	    if ((text = *a++) == NULL)
+		return FALSE;
+	}
+	if (*text != *s)
+	    return FALSE;
+    }
+
+    do {
+	if (*text)
+	    return FALSE;
+    } while ((text = *a++) != NULL);
+
+    return TRUE;
+}
+
+/* Return the last "count" path elements from the concatenated string.
+ * We return a string pointer to the start of the string, and update the
+ * array pointer-pointer to point to any remaining string elements. */
+static const uchar *trailing_N_elements(const uchar*const **a_ptr, int count)
+{
+    const uchar*const *a = *a_ptr;
+    const uchar*const *first_a = a;
+
+    while (*a)
+	    a++;
+
+    while (a != first_a) {
+	const uchar *s = *--a;
+	s += strlen((char*)s);
+	while (--s >= *a) {
+	    if (*s == '/' && !--count) {
+		*a_ptr = a+1;
+		return s+1;
+	    }
+	}
+    }
+
+    if (count == 1) {
+	*a_ptr = a+1;
+	return *a;
+    }
+
+    return NULL;
+}
+
+/* Match the "pattern" against the "text" string. */
+int wildmatch(const char *pattern, const char *text)
+{
+    static const uchar *nomore[1]; /* A NULL pointer. */
+#ifdef WILD_TEST_ITERATIONS
+    wildmatch_iteration_count = 0;
+#endif
+    return dowild((const uchar*)pattern, (const uchar*)text, nomore) == TRUE;
+}
+
+/* Match the "pattern" against the forced-to-lower-case "text" string. */
+int iwildmatch(const char *pattern, const char *text)
+{
+    static const uchar *nomore[1]; /* A NULL pointer. */
+    int ret;
+#ifdef WILD_TEST_ITERATIONS
+    wildmatch_iteration_count = 0;
+#endif
+    force_lower_case = 1;
+    ret = dowild((const uchar*)pattern, (const uchar*)text, nomore) == TRUE;
+    force_lower_case = 0;
+    return ret;
+}
+
+/* Match pattern "p" against the a virtually-joined string consisting
+ * of all the pointers in array "texts" (which has a NULL pointer at the
+ * end).  The int "where" can be 0 (normal matching), > 0 (match only
+ * the trailing N slash-separated filename components of "texts"), or < 0
+ * (match the "pattern" at the start or after any slash in "texts"). */
+int wildmatch_array(const char *pattern, const char*const *texts, int where)
+{
+    const uchar *p = (const uchar*)pattern;
+    const uchar*const *a = (const uchar*const*)texts;
+    const uchar *text;
+    int matched;
+
+#ifdef WILD_TEST_ITERATIONS
+    wildmatch_iteration_count = 0;
+#endif
+
+    if (where > 0)
+	text = trailing_N_elements(&a, where);
+    else
+	text = *a++;
+    if (!text)
+	return FALSE;
+
+    if ((matched = dowild(p, text, a)) != TRUE && where < 0
+     && matched != ABORT_ALL) {
+	while (1) {
+	    if (*text == '\0') {
+		if ((text = (uchar*)*a++) == NULL)
+		    return FALSE;
+		continue;
+	    }
+	    if (*text++ == '/' && (matched = dowild(p, text, a)) != FALSE
+	     && matched != ABORT_TO_STARSTAR)
+		break;
+	}
+    }
+    return matched == TRUE;
+}
+
+/* Match literal string "s" against the a virtually-joined string consisting
+ * of all the pointers in array "texts" (which has a NULL pointer at the
+ * end).  The int "where" can be 0 (normal matching), or > 0 (match
+ * only the trailing N slash-separated filename components of "texts"). */
+int litmatch_array(const char *string, const char*const *texts, int where)
+{
+    const uchar *s = (const uchar*)string;
+    const uchar*const *a = (const uchar* const*)texts;
+    const uchar *text;
+
+    if (where > 0)
+	text = trailing_N_elements(&a, where);
+    else
+	text = *a++;
+    if (!text)
+	return FALSE;
+
+    return doliteral(s, text, a) == TRUE;
+}
diff --git a/wildmatch.h b/wildmatch.h
new file mode 100644
index 0000000..e7f1a35
--- /dev/null
+++ b/wildmatch.h
@@ -0,0 +1,6 @@
+/* wildmatch.h */
+
+int wildmatch(const char *pattern, const char *text);
+int iwildmatch(const char *pattern, const char *text);
+int wildmatch_array(const char *pattern, const char*const *texts, int where);
+int litmatch_array(const char *string, const char*const *texts, int where);
-- 
1.8.0.rc0.29.g1fdd78f

* [PATCH 2/8] wildmatch: remove unnecessary functions
  2012-10-09  3:08 [PATCH 0/8] wildmatch take 3 Nguyễn Thái Ngọc Duy
  2012-10-09  3:09 ` [PATCH 1/8] Import wildmatch from rsync Nguyễn Thái Ngọc Duy
@ 2012-10-09  3:09 ` Nguyễn Thái Ngọc Duy
  2012-10-09  3:09 ` [PATCH 3/8] Integrate wildmatch to git Nguyễn Thái Ngọc Duy
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-09  3:09 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 wildmatch.c | 161 ++++--------------------------------------------------------
 wildmatch.h |   2 -
 2 files changed, 9 insertions(+), 154 deletions(-)

diff --git a/wildmatch.c b/wildmatch.c
index f3a1731..71dba76 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -53,33 +53,19 @@
 #define ISUPPER(c) (ISASCII(c) && isupper(c))
 #define ISXDIGIT(c) (ISASCII(c) && isxdigit(c))
 
-#ifdef WILD_TEST_ITERATIONS
-int wildmatch_iteration_count;
-#endif
-
 static int force_lower_case = 0;
 
 /* Match pattern "p" against the a virtually-joined string consisting
  * of "text" and any strings in array "a". */
-static int dowild(const uchar *p, const uchar *text, const uchar*const *a)
+static int dowild(const uchar *p, const uchar *text)
 {
     uchar p_ch;
 
-#ifdef WILD_TEST_ITERATIONS
-    wildmatch_iteration_count++;
-#endif
-
     for ( ; (p_ch = *p) != '\0'; text++, p++) {
 	int matched, special;
 	uchar t_ch, prev_ch;
-	while ((t_ch = *text) == '\0') {
-	    if (*a == NULL) {
-		if (p_ch != '*')
-		    return ABORT_ALL;
-		break;
-	    }
-	    text = *a++;
-	}
+	if ((t_ch = *text) == '\0' && p_ch != '*')
+		return ABORT_ALL;
 	if (force_lower_case && ISUPPER(t_ch))
 	    t_ch = tolower(t_ch);
 	switch (p_ch) {
@@ -107,21 +93,15 @@ static int dowild(const uchar *p, const uchar *text, const uchar*const *a)
 		/* Trailing "**" matches everything.  Trailing "*" matches
 		 * only if there are no more slash characters. */
 		if (!special) {
-		    do {
 			if (strchr((char*)text, '/') != NULL)
 			    return FALSE;
-		    } while ((text = *a++) != NULL);
 		}
 		return TRUE;
 	    }
 	    while (1) {
-		if (t_ch == '\0') {
-		    if ((text = *a++) == NULL)
-			break;
-		    t_ch = *text;
-		    continue;
-		}
-		if ((matched = dowild(p, text, a)) != FALSE) {
+		if (t_ch == '\0')
+		    break;
+		if ((matched = dowild(p, text)) != FALSE) {
 		    if (!special || matched != ABORT_TO_STARSTAR)
 			return matched;
 		} else if (!special && t_ch == '/')
@@ -225,144 +205,21 @@ static int dowild(const uchar *p, const uchar *text, const uchar*const *a)
 	}
     }
 
-    do {
-	if (*text)
-	    return FALSE;
-    } while ((text = *a++) != NULL);
-
-    return TRUE;
-}
-
-/* Match literal string "s" against the a virtually-joined string consisting
- * of "text" and any strings in array "a". */
-static int doliteral(const uchar *s, const uchar *text, const uchar*const *a)
-{
-    for ( ; *s != '\0'; text++, s++) {
-	while (*text == '\0') {
-	    if ((text = *a++) == NULL)
-		return FALSE;
-	}
-	if (*text != *s)
-	    return FALSE;
-    }
-
-    do {
-	if (*text)
-	    return FALSE;
-    } while ((text = *a++) != NULL);
-
-    return TRUE;
-}
-
-/* Return the last "count" path elements from the concatenated string.
- * We return a string pointer to the start of the string, and update the
- * array pointer-pointer to point to any remaining string elements. */
-static const uchar *trailing_N_elements(const uchar*const **a_ptr, int count)
-{
-    const uchar*const *a = *a_ptr;
-    const uchar*const *first_a = a;
-
-    while (*a)
-	    a++;
-
-    while (a != first_a) {
-	const uchar *s = *--a;
-	s += strlen((char*)s);
-	while (--s >= *a) {
-	    if (*s == '/' && !--count) {
-		*a_ptr = a+1;
-		return s+1;
-	    }
-	}
-    }
-
-    if (count == 1) {
-	*a_ptr = a+1;
-	return *a;
-    }
-
-    return NULL;
+    return *text ? FALSE : TRUE;
 }
 
 /* Match the "pattern" against the "text" string. */
 int wildmatch(const char *pattern, const char *text)
 {
-    static const uchar *nomore[1]; /* A NULL pointer. */
-#ifdef WILD_TEST_ITERATIONS
-    wildmatch_iteration_count = 0;
-#endif
-    return dowild((const uchar*)pattern, (const uchar*)text, nomore) == TRUE;
+    return dowild((const uchar*)pattern, (const uchar*)text) == TRUE;
 }
 
 /* Match the "pattern" against the forced-to-lower-case "text" string. */
 int iwildmatch(const char *pattern, const char *text)
 {
-    static const uchar *nomore[1]; /* A NULL pointer. */
     int ret;
-#ifdef WILD_TEST_ITERATIONS
-    wildmatch_iteration_count = 0;
-#endif
     force_lower_case = 1;
-    ret = dowild((const uchar*)pattern, (const uchar*)text, nomore) == TRUE;
+    ret = dowild((const uchar*)pattern, (const uchar*)text) == TRUE;
     force_lower_case = 0;
     return ret;
 }
-
-/* Match pattern "p" against the a virtually-joined string consisting
- * of all the pointers in array "texts" (which has a NULL pointer at the
- * end).  The int "where" can be 0 (normal matching), > 0 (match only
- * the trailing N slash-separated filename components of "texts"), or < 0
- * (match the "pattern" at the start or after any slash in "texts"). */
-int wildmatch_array(const char *pattern, const char*const *texts, int where)
-{
-    const uchar *p = (const uchar*)pattern;
-    const uchar*const *a = (const uchar*const*)texts;
-    const uchar *text;
-    int matched;
-
-#ifdef WILD_TEST_ITERATIONS
-    wildmatch_iteration_count = 0;
-#endif
-
-    if (where > 0)
-	text = trailing_N_elements(&a, where);
-    else
-	text = *a++;
-    if (!text)
-	return FALSE;
-
-    if ((matched = dowild(p, text, a)) != TRUE && where < 0
-     && matched != ABORT_ALL) {
-	while (1) {
-	    if (*text == '\0') {
-		if ((text = (uchar*)*a++) == NULL)
-		    return FALSE;
-		continue;
-	    }
-	    if (*text++ == '/' && (matched = dowild(p, text, a)) != FALSE
-	     && matched != ABORT_TO_STARSTAR)
-		break;
-	}
-    }
-    return matched == TRUE;
-}
-
-/* Match literal string "s" against the a virtually-joined string consisting
- * of all the pointers in array "texts" (which has a NULL pointer at the
- * end).  The int "where" can be 0 (normal matching), or > 0 (match
- * only the trailing N slash-separated filename components of "texts"). */
-int litmatch_array(const char *string, const char*const *texts, int where)
-{
-    const uchar *s = (const uchar*)string;
-    const uchar*const *a = (const uchar* const*)texts;
-    const uchar *text;
-
-    if (where > 0)
-	text = trailing_N_elements(&a, where);
-    else
-	text = *a++;
-    if (!text)
-	return FALSE;
-
-    return doliteral(s, text, a) == TRUE;
-}
diff --git a/wildmatch.h b/wildmatch.h
index e7f1a35..562faa3 100644
--- a/wildmatch.h
+++ b/wildmatch.h
@@ -2,5 +2,3 @@
 
 int wildmatch(const char *pattern, const char *text);
 int iwildmatch(const char *pattern, const char *text);
-int wildmatch_array(const char *pattern, const char*const *texts, int where);
-int litmatch_array(const char *string, const char*const *texts, int where);
-- 
1.8.0.rc0.29.g1fdd78f

* [PATCH 3/8] Integrate wildmatch to git
  2012-10-09  3:08 [PATCH 0/8] wildmatch take 3 Nguyễn Thái Ngọc Duy
  2012-10-09  3:09 ` [PATCH 1/8] Import wildmatch from rsync Nguyễn Thái Ngọc Duy
  2012-10-09  3:09 ` [PATCH 2/8] wildmatch: remove unnecessary functions Nguyễn Thái Ngọc Duy
@ 2012-10-09  3:09 ` Nguyễn Thái Ngọc Duy
  2012-10-09  3:09 ` [PATCH 4/8] wildmatch: remove static variable force_lower_case Nguyễn Thái Ngọc Duy
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-09  3:09 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

This makes wildmatch.c part of libgit.a and builds test-wildmatch; the
dependency on libpopt in the original has been replaced with the use
of our parse-options. Global variables in test-wildmatch are marked
static to avoid sparse warnings.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 .gitignore           |   1 +
 Makefile             |   3 +
 t/t3070-wildmatch.sh | 178 +++++++++++++++++++++++++++++++++++++++++++++++++++
 t/t3070/wildtest.txt | 165 -----------------------------------------------
 test-wildmatch.c     |  14 ++++
 wildmatch.c          |   8 ++-
 6 files changed, 203 insertions(+), 166 deletions(-)
 create mode 100755 t/t3070-wildmatch.sh
 delete mode 100644 t/t3070/wildtest.txt
 create mode 100644 test-wildmatch.c

diff --git a/.gitignore b/.gitignore
index a188a82..37c3507 100644
--- a/.gitignore
+++ b/.gitignore
@@ -197,6 +197,7 @@
 /test-string-list
 /test-subprocess
 /test-svn-fe
+/test-wildmatch
 /common-cmds.h
 *.tar.gz
 *.dsc
diff --git a/Makefile b/Makefile
index 8413606..9a97379 100644
--- a/Makefile
+++ b/Makefile
@@ -523,6 +523,7 @@ TEST_PROGRAMS_NEED_X += test-sigchain
 TEST_PROGRAMS_NEED_X += test-string-list
 TEST_PROGRAMS_NEED_X += test-subprocess
 TEST_PROGRAMS_NEED_X += test-svn-fe
+TEST_PROGRAMS_NEED_X += test-wildmatch
 
 TEST_PROGRAMS = $(patsubst %,%$X,$(TEST_PROGRAMS_NEED_X))
 
@@ -695,6 +696,7 @@ LIB_H += userdiff.h
 LIB_H += utf8.h
 LIB_H += varint.h
 LIB_H += walker.h
+LIB_H += wildmatch.h
 LIB_H += wt-status.h
 LIB_H += xdiff-interface.h
 LIB_H += xdiff/xdiff.h
@@ -826,6 +828,7 @@ LIB_OBJS += utf8.o
 LIB_OBJS += varint.o
 LIB_OBJS += version.o
 LIB_OBJS += walker.o
+LIB_OBJS += wildmatch.o
 LIB_OBJS += wrapper.o
 LIB_OBJS += write_or_die.o
 LIB_OBJS += ws.o
diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
new file mode 100755
index 0000000..bb92f8d
--- /dev/null
+++ b/t/t3070-wildmatch.sh
@@ -0,0 +1,178 @@
+#!/bin/sh
+
+test_description='wildmatch tests'
+
+. ./test-lib.sh
+
+match() {
+    test_expect_success "wildmatch $*" "
+	if [ $1 = 1 ]; then
+	    test-wildmatch wildmatch '$3' '$4'
+	else
+	    ! test-wildmatch wildmatch '$3' '$4'
+	fi &&
+	if [ $2 = 1 ]; then
+	    test-wildmatch fnmatch '$3' '$4'
+	else
+	    ! test-wildmatch fnmatch '$3' '$4'
+	fi
+    "
+}
+
+# Basic wildmat features
+match 1 1 foo foo
+match 0 0 foo bar
+match 1 1 '' ""
+match 1 1 foo '???'
+match 0 0 foo '??'
+match 1 1 foo '*'
+match 1 1 foo 'f*'
+match 0 0 foo '*f'
+match 1 1 foo '*foo*'
+match 1 1 foobar '*ob*a*r*'
+match 1 1 aaaaaaabababab '*ab'
+match 1 1 'foo*' 'foo\*'
+match 0 0 foobar 'foo\*bar'
+match 1 1 'f\oo' 'f\\oo'
+match 1 1 ball '*[al]?'
+match 0 0 ten '[ten]'
+match 1 1 ten '**[!te]'
+match 0 0 ten '**[!ten]'
+match 1 1 ten 't[a-g]n'
+match 0 0 ten 't[!a-g]n'
+match 1 1 ton 't[!a-g]n'
+match 1 1 ton 't[^a-g]n'
+match 1 1 'a]b' 'a[]]b'
+match 1 1 a-b 'a[]-]b'
+match 1 1 'a]b' 'a[]-]b'
+match 0 0 aab 'a[]-]b'
+match 1 1 aab 'a[]a-]b'
+match 1 1 ']' ']'
+
+# Extended slash-matching features
+match 0 0 'foo/baz/bar' 'foo*bar'
+match 1 0 'foo/baz/bar' 'foo**bar'
+match 0 0 'foo/bar' 'foo?bar'
+match 0 0 'foo/bar' 'foo[/]bar'
+match 0 0 'foo/bar' 'f[^eiu][^eiu][^eiu][^eiu][^eiu]r'
+match 1 1 'foo-bar' 'f[^eiu][^eiu][^eiu][^eiu][^eiu]r'
+match 0 0 'foo' '**/foo'
+match 1 1 '/foo' '**/foo'
+match 1 0 'bar/baz/foo' '**/foo'
+match 0 0 'bar/baz/foo' '*/foo'
+match 0 0 'foo/bar/baz' '**/bar*'
+match 1 0 'deep/foo/bar/baz' '**/bar/*'
+match 0 0 'deep/foo/bar/baz/' '**/bar/*'
+match 1 0 'deep/foo/bar/baz/' '**/bar/**'
+match 0 0 'deep/foo/bar' '**/bar/*'
+match 1 0 'deep/foo/bar/' '**/bar/**'
+match 1 0 'foo/bar/baz' '**/bar**'
+match 1 0 'foo/bar/baz/x' '*/bar/**'
+match 0 0 'deep/foo/bar/baz/x' '*/bar/**'
+match 1 0 'deep/foo/bar/baz/x' '**/bar/*/*'
+
+# Various additional tests
+match 0 0 'acrt' 'a[c-c]st'
+match 1 1 'acrt' 'a[c-c]rt'
+match 0 0 ']' '[!]-]'
+match 1 1 'a' '[!]-]'
+match 0 0 '' '\'
+match 0 0 '\' '\'
+match 0 0 '/\' '*/\'
+match 1 1 '/\' '*/\\'
+match 1 1 'foo' 'foo'
+match 1 1 '@foo' '@foo'
+match 0 0 'foo' '@foo'
+match 1 1 '[ab]' '\[ab]'
+match 1 1 '[ab]' '[[]ab]'
+match 1 1 '[ab]' '[[:]ab]'
+match 0 0 '[ab]' '[[::]ab]'
+match 1 1 '[ab]' '[[:digit]ab]'
+match 1 1 '[ab]' '[\[:]ab]'
+match 1 1 '?a?b' '\??\?b'
+match 1 1 'abc' '\a\b\c'
+match 0 0 'foo' ''
+match 1 0 'foo/bar/baz/to' '**/t[o]'
+
+# Character class tests
+match 1 1 'a1B' '[[:alpha:]][[:digit:]][[:upper:]]'
+match 0 0 'a' '[[:digit:][:upper:][:space:]]'
+match 1 1 'A' '[[:digit:][:upper:][:space:]]'
+match 1 0 '1' '[[:digit:][:upper:][:space:]]'
+match 0 0 '1' '[[:digit:][:upper:][:spaci:]]'
+match 1 1 ' ' '[[:digit:][:upper:][:space:]]'
+match 0 0 '.' '[[:digit:][:upper:][:space:]]'
+match 1 1 '.' '[[:digit:][:punct:][:space:]]'
+match 1 1 '5' '[[:xdigit:]]'
+match 1 1 'f' '[[:xdigit:]]'
+match 1 1 'D' '[[:xdigit:]]'
+match 1 0 '_' '[[:alnum:][:alpha:][:blank:][:cntrl:][:digit:][:graph:][:lower:][:print:][:punct:][:space:][:upper:][:xdigit:]]'
+match 1 0 '_' '[[:alnum:][:alpha:][:blank:][:cntrl:][:digit:][:graph:][:lower:][:print:][:punct:][:space:][:upper:][:xdigit:]]'
+match 1 1 '.' '[^[:alnum:][:alpha:][:blank:][:cntrl:][:digit:][:lower:][:space:][:upper:][:xdigit:]]'
+match 1 1 '5' '[a-c[:digit:]x-z]'
+match 1 1 'b' '[a-c[:digit:]x-z]'
+match 1 1 'y' '[a-c[:digit:]x-z]'
+match 0 0 'q' '[a-c[:digit:]x-z]'
+
+# Additional tests, including some malformed wildmats
+match 1 1 ']' '[\\-^]'
+match 0 0 '[' '[\\-^]'
+match 1 1 '-' '[\-_]'
+match 1 1 ']' '[\]]'
+match 0 0 '\]' '[\]]'
+match 0 0 '\' '[\]]'
+match 0 0 'ab' 'a[]b'
+match 0 1 'a[]b' 'a[]b'
+match 0 1 'ab[' 'ab['
+match 0 0 'ab' '[!'
+match 0 0 'ab' '[-'
+match 1 1 '-' '[-]'
+match 0 0 '-' '[a-'
+match 0 0 '-' '[!a-'
+match 1 1 '-' '[--A]'
+match 1 1 '5' '[--A]'
+match 1 1 ' ' '[ --]'
+match 1 1 '$' '[ --]'
+match 1 1 '-' '[ --]'
+match 0 0 '0' '[ --]'
+match 1 1 '-' '[---]'
+match 1 1 '-' '[------]'
+match 0 0 'j' '[a-e-n]'
+match 1 1 '-' '[a-e-n]'
+match 1 1 'a' '[!------]'
+match 0 0 '[' '[]-a]'
+match 1 1 '^' '[]-a]'
+match 0 0 '^' '[!]-a]'
+match 1 1 '[' '[!]-a]'
+match 1 1 '^' '[a^bc]'
+match 1 1 '-b]' '[a-]b]'
+match 0 0 '\' '[\]'
+match 1 1 '\' '[\\]'
+match 0 0 '\' '[!\\]'
+match 1 1 'G' '[A-\\]'
+match 0 0 'aaabbb' 'b*a'
+match 0 0 'aabcaa' '*ba*'
+match 1 1 ',' '[,]'
+match 1 1 ',' '[\\,]'
+match 1 1 '\' '[\\,]'
+match 1 1 '-' '[,-.]'
+match 0 0 '+' '[,-.]'
+match 0 0 '-.]' '[,-.]'
+match 1 1 '2' '[\1-\3]'
+match 1 1 '3' '[\1-\3]'
+match 0 0 '4' '[\1-\3]'
+match 1 1 '\' '[[-\]]'
+match 1 1 '[' '[[-\]]'
+match 1 1 ']' '[[-\]]'
+match 0 0 '-' '[[-\]]'
+
+# Test recursion and the abort code (use "wildtest -i" to see iteration counts)
+match 1 1 '-adobe-courier-bold-o-normal--12-120-75-75-m-70-iso8859-1' '-*-*-*-*-*-*-12-*-*-*-m-*-*-*'
+match 0 0 '-adobe-courier-bold-o-normal--12-120-75-75-X-70-iso8859-1' '-*-*-*-*-*-*-12-*-*-*-m-*-*-*'
+match 0 0 '-adobe-courier-bold-o-normal--12-120-75-75-/-70-iso8859-1' '-*-*-*-*-*-*-12-*-*-*-m-*-*-*'
+match 1 1 '/adobe/courier/bold/o/normal//12/120/75/75/m/70/iso8859/1' '/*/*/*/*/*/*/12/*/*/*/m/*/*/*'
+match 0 0 '/adobe/courier/bold/o/normal//12/120/75/75/X/70/iso8859/1' '/*/*/*/*/*/*/12/*/*/*/m/*/*/*'
+match 1 0 'abcd/abcdefg/abcdefghijk/abcdefghijklmnop.txt' '**/*a*b*g*n*t'
+match 0 0 'abcd/abcdefg/abcdefghijk/abcdefghijklmnop.txtz' '**/*a*b*g*n*t'
+
+test_done
diff --git a/t/t3070/wildtest.txt b/t/t3070/wildtest.txt
deleted file mode 100644
index 42c1678..0000000
--- a/t/t3070/wildtest.txt
+++ /dev/null
@@ -1,165 +0,0 @@
-# Input is in the following format (all items white-space separated):
-#
-# The first two items are 1 or 0 indicating if the wildmat call is expected to
-# succeed and if fnmatch works the same way as wildmat, respectively.  After
-# that is a text string for the match, and a pattern string.  Strings can be
-# quoted (if desired) in either double or single quotes, as well as backticks.
-#
-# MATCH FNMATCH_SAME "text to match" 'pattern to use'
-
-# Basic wildmat features
-1 1 foo			foo
-0 1 foo			bar
-1 1 ''			""
-1 1 foo			???
-0 1 foo			??
-1 1 foo			*
-1 1 foo			f*
-0 1 foo			*f
-1 1 foo			*foo*
-1 1 foobar		*ob*a*r*
-1 1 aaaaaaabababab	*ab
-1 1 foo*		foo\*
-0 1 foobar		foo\*bar
-1 1 f\oo		f\\oo
-1 1 ball		*[al]?
-0 1 ten			[ten]
-1 1 ten			**[!te]
-0 1 ten			**[!ten]
-1 1 ten			t[a-g]n
-0 1 ten			t[!a-g]n
-1 1 ton			t[!a-g]n
-1 1 ton			t[^a-g]n
-1 1 a]b			a[]]b
-1 1 a-b			a[]-]b
-1 1 a]b			a[]-]b
-0 1 aab			a[]-]b
-1 1 aab			a[]a-]b
-1 1 ]			]
-
-# Extended slash-matching features
-0 1 foo/baz/bar		foo*bar
-1 1 foo/baz/bar		foo**bar
-0 1 foo/bar		foo?bar
-0 1 foo/bar		foo[/]bar
-0 1 foo/bar		f[^eiu][^eiu][^eiu][^eiu][^eiu]r
-1 1 foo-bar		f[^eiu][^eiu][^eiu][^eiu][^eiu]r
-0 1 foo			**/foo
-1 1 /foo		**/foo
-1 1 bar/baz/foo		**/foo
-0 1 bar/baz/foo		*/foo
-0 0 foo/bar/baz		**/bar*
-1 1 deep/foo/bar/baz	**/bar/*
-0 1 deep/foo/bar/baz/	**/bar/*
-1 1 deep/foo/bar/baz/	**/bar/**
-0 1 deep/foo/bar	**/bar/*
-1 1 deep/foo/bar/	**/bar/**
-1 1 foo/bar/baz		**/bar**
-1 1 foo/bar/baz/x	*/bar/**
-0 0 deep/foo/bar/baz/x	*/bar/**
-1 1 deep/foo/bar/baz/x	**/bar/*/*
-
-# Various additional tests
-0 1 acrt		a[c-c]st
-1 1 acrt		a[c-c]rt
-0 1 ]			[!]-]
-1 1 a			[!]-]
-0 1 ''			\
-0 1 \			\
-0 1 /\			*/\
-1 1 /\			*/\\
-1 1 foo			foo
-1 1 @foo		@foo
-0 1 foo			@foo
-1 1 [ab]		\[ab]
-1 1 [ab]		[[]ab]
-1 1 [ab]		[[:]ab]
-0 1 [ab]		[[::]ab]
-1 1 [ab]		[[:digit]ab]
-1 1 [ab]		[\[:]ab]
-1 1 ?a?b		\??\?b
-1 1 abc			\a\b\c
-0 1 foo			''
-1 1 foo/bar/baz/to	**/t[o]
-
-# Character class tests
-1 1 a1B		[[:alpha:]][[:digit:]][[:upper:]]
-0 1 a		[[:digit:][:upper:][:space:]]
-1 1 A		[[:digit:][:upper:][:space:]]
-1 1 1		[[:digit:][:upper:][:space:]]
-0 1 1		[[:digit:][:upper:][:spaci:]]
-1 1 ' '		[[:digit:][:upper:][:space:]]
-0 1 .		[[:digit:][:upper:][:space:]]
-1 1 .		[[:digit:][:punct:][:space:]]
-1 1 5		[[:xdigit:]]
-1 1 f		[[:xdigit:]]
-1 1 D		[[:xdigit:]]
-1 1 _		[[:alnum:][:alpha:][:blank:][:cntrl:][:digit:][:graph:][:lower:][:print:][:punct:][:space:][:upper:][:xdigit:]]
-#1 1 …		[^[:alnum:][:alpha:][:blank:][:cntrl:][:digit:][:graph:][:lower:][:print:][:punct:][:space:][:upper:][:xdigit:]]
-1 1 \x7f		[^[:alnum:][:alpha:][:blank:][:digit:][:graph:][:lower:][:print:][:punct:][:space:][:upper:][:xdigit:]]
-1 1 .		[^[:alnum:][:alpha:][:blank:][:cntrl:][:digit:][:lower:][:space:][:upper:][:xdigit:]]
-1 1 5		[a-c[:digit:]x-z]
-1 1 b		[a-c[:digit:]x-z]
-1 1 y		[a-c[:digit:]x-z]
-0 1 q		[a-c[:digit:]x-z]
-
-# Additional tests, including some malformed wildmats
-1 1 ]		[\\-^]
-0 1 [		[\\-^]
-1 1 -		[\-_]
-1 1 ]		[\]]
-0 1 \]		[\]]
-0 1 \		[\]]
-0 1 ab		a[]b
-0 1 a[]b	a[]b
-0 1 ab[		ab[
-0 1 ab		[!
-0 1 ab		[-
-1 1 -		[-]
-0 1 -		[a-
-0 1 -		[!a-
-1 1 -		[--A]
-1 1 5		[--A]
-1 1 ' '		'[ --]'
-1 1 $		'[ --]'
-1 1 -		'[ --]'
-0 1 0		'[ --]'
-1 1 -		[---]
-1 1 -		[------]
-0 1 j		[a-e-n]
-1 1 -		[a-e-n]
-1 1 a		[!------]
-0 1 [		[]-a]
-1 1 ^		[]-a]
-0 1 ^		[!]-a]
-1 1 [		[!]-a]
-1 1 ^		[a^bc]
-1 1 -b]		[a-]b]
-0 1 \		[\]
-1 1 \		[\\]
-0 1 \		[!\\]
-1 1 G		[A-\\]
-0 1 aaabbb	b*a
-0 1 aabcaa	*ba*
-1 1 ,		[,]
-1 1 ,		[\\,]
-1 1 \		[\\,]
-1 1 -		[,-.]
-0 1 +		[,-.]
-0 1 -.]		[,-.]
-1 1 2		[\1-\3]
-1 1 3		[\1-\3]
-0 1 4		[\1-\3]
-1 1 \		[[-\]]
-1 1 [		[[-\]]
-1 1 ]		[[-\]]
-0 1 -		[[-\]]
-
-# Test recursion and the abort code (use "wildtest -i" to see iteration counts)
-1 1 -adobe-courier-bold-o-normal--12-120-75-75-m-70-iso8859-1	-*-*-*-*-*-*-12-*-*-*-m-*-*-*
-0 1 -adobe-courier-bold-o-normal--12-120-75-75-X-70-iso8859-1	-*-*-*-*-*-*-12-*-*-*-m-*-*-*
-0 1 -adobe-courier-bold-o-normal--12-120-75-75-/-70-iso8859-1	-*-*-*-*-*-*-12-*-*-*-m-*-*-*
-1 1 /adobe/courier/bold/o/normal//12/120/75/75/m/70/iso8859/1	/*/*/*/*/*/*/12/*/*/*/m/*/*/*
-0 1 /adobe/courier/bold/o/normal//12/120/75/75/X/70/iso8859/1	/*/*/*/*/*/*/12/*/*/*/m/*/*/*
-1 1 abcd/abcdefg/abcdefghijk/abcdefghijklmnop.txt		**/*a*b*g*n*t
-0 1 abcd/abcdefg/abcdefghijk/abcdefghijklmnop.txtz		**/*a*b*g*n*t
diff --git a/test-wildmatch.c b/test-wildmatch.c
new file mode 100644
index 0000000..08962d5
--- /dev/null
+++ b/test-wildmatch.c
@@ -0,0 +1,14 @@
+#include "cache.h"
+#include "wildmatch.h"
+
+int main(int argc, char **argv)
+{
+	if (!strcmp(argv[1], "wildmatch"))
+		return wildmatch(argv[3], argv[2]) ? 0 : 1;
+	else if (!strcmp(argv[1], "iwildmatch"))
+		return iwildmatch(argv[3], argv[2]) ? 0 : 1;
+	else if (!strcmp(argv[1], "fnmatch"))
+		return fnmatch(argv[3], argv[2], FNM_PATHNAME);
+	else
+		return 1;
+}
diff --git a/wildmatch.c b/wildmatch.c
index 71dba76..7b64a6b 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -9,7 +9,13 @@
 **  work differently than '*', and to fix the character-class code.
 */
 
-#include "rsync.h"
+#include <stddef.h>
+#include <ctype.h>
+#include <string.h>
+
+#include "wildmatch.h"
+
+typedef unsigned char uchar;
 
 /* What character marks an inverted character class? */
 #define NEGATE_CLASS	'!'
-- 
1.8.0.rc0.29.g1fdd78f

* [PATCH 4/8] wildmatch: remove static variable force_lower_case
  2012-10-09  3:08 [PATCH 0/8] wildmatch take 3 Nguyễn Thái Ngọc Duy
                   ` (2 preceding siblings ...)
  2012-10-09  3:09 ` [PATCH 3/8] Integrate wildmatch to git Nguyễn Thái Ngọc Duy
@ 2012-10-09  3:09 ` Nguyễn Thái Ngọc Duy
  2012-10-09 20:47   ` Junio C Hamano
  2012-10-09  3:09 ` [PATCH 5/8] wildmatch: fix case-insensitive matching Nguyễn Thái Ngọc Duy
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-09  3:09 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

One less place to worry about thread safety. Also combine wildmatch()
and iwildmatch() into a single function that takes a flags argument.
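
As a rough illustration of the idea (a simplified sketch, not the actual
wildmatch code): the case-folding choice travels down as a parameter
instead of living in a file-scope static, so concurrent callers cannot
disturb each other. The names sketch_equal(), sketch_match() and
SKETCH_CASEFOLD are made up for this example; SKETCH_CASEFOLD merely
stands in for FNM_CASEFOLD, and only literal comparison is modelled.

```c
#include <ctype.h>

/* Hypothetical flag standing in for FNM_CASEFOLD; the value is arbitrary. */
#define SKETCH_CASEFOLD 0x01

/* Literal comparison; the text is optionally folded to lower case. */
int sketch_equal(const char *pattern, const char *text, int fold_text)
{
	for (; *pattern && *text; pattern++, text++) {
		unsigned char t = (unsigned char)*text;

		if (fold_text && isupper(t))
			t = (unsigned char)tolower(t);
		if (t != (unsigned char)*pattern)
			return 0;
	}
	/* both strings must be exhausted together */
	return *pattern == '\0' && *text == '\0';
}

/* Single entry point: the flag is passed down, no global state is touched. */
int sketch_match(const char *pattern, const char *text, int flags)
{
	return sketch_equal(pattern, text, flags & SKETCH_CASEFOLD ? 1 : 0);
}
```

With this shape, sketch_match("foo", "FOO", SKETCH_CASEFOLD) succeeds
and sketch_match("foo", "FOO", 0) fails, without any setup or teardown
of shared state around the call.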

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 test-wildmatch.c |  4 ++--
 wildmatch.c      | 23 ++++++-----------------
 wildmatch.h      |  3 +--
 3 files changed, 9 insertions(+), 21 deletions(-)

diff --git a/test-wildmatch.c b/test-wildmatch.c
index 08962d5..5c18cf8 100644
--- a/test-wildmatch.c
+++ b/test-wildmatch.c
@@ -4,9 +4,9 @@
 int main(int argc, char **argv)
 {
 	if (!strcmp(argv[1], "wildmatch"))
-		return wildmatch(argv[3], argv[2]) ? 0 : 1;
+		return wildmatch(argv[3], argv[2], 0) ? 0 : 1;
 	else if (!strcmp(argv[1], "iwildmatch"))
-		return iwildmatch(argv[3], argv[2]) ? 0 : 1;
+		return wildmatch(argv[3], argv[2], FNM_CASEFOLD) ? 0 : 1;
 	else if (!strcmp(argv[1], "fnmatch"))
 		return fnmatch(argv[3], argv[2], FNM_PATHNAME);
 	else
diff --git a/wildmatch.c b/wildmatch.c
index 7b64a6b..2382873 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -11,8 +11,8 @@
 
 #include <stddef.h>
 #include <ctype.h>
-#include <string.h>
 
+#include "cache.h"
 #include "wildmatch.h"
 
 typedef unsigned char uchar;
@@ -59,11 +59,9 @@ typedef unsigned char uchar;
 #define ISUPPER(c) (ISASCII(c) && isupper(c))
 #define ISXDIGIT(c) (ISASCII(c) && isxdigit(c))
 
-static int force_lower_case = 0;
-
 /* Match pattern "p" against the a virtually-joined string consisting
  * of "text" and any strings in array "a". */
-static int dowild(const uchar *p, const uchar *text)
+static int dowild(const uchar *p, const uchar *text, int force_lower_case)
 {
     uchar p_ch;
 
@@ -107,7 +105,7 @@ static int dowild(const uchar *p, const uchar *text)
 	    while (1) {
 		if (t_ch == '\0')
 		    break;
-		if ((matched = dowild(p, text)) != FALSE) {
+		if ((matched = dowild(p, text, force_lower_case)) != FALSE) {
 		    if (!special || matched != ABORT_TO_STARSTAR)
 			return matched;
 		} else if (!special && t_ch == '/')
@@ -215,17 +213,8 @@ static int dowild(const uchar *p, const uchar *text)
 }
 
 /* Match the "pattern" against the "text" string. */
-int wildmatch(const char *pattern, const char *text)
-{
-    return dowild((const uchar*)pattern, (const uchar*)text) == TRUE;
-}
-
-/* Match the "pattern" against the forced-to-lower-case "text" string. */
-int iwildmatch(const char *pattern, const char *text)
+int wildmatch(const char *pattern, const char *text, int flags)
 {
-    int ret;
-    force_lower_case = 1;
-    ret = dowild((const uchar*)pattern, (const uchar*)text) == TRUE;
-    force_lower_case = 0;
-    return ret;
+    return dowild((const uchar*)pattern, (const uchar*)text,
+		  flags & FNM_CASEFOLD ? 1 : 0) == TRUE;
 }
diff --git a/wildmatch.h b/wildmatch.h
index 562faa3..e974f9a 100644
--- a/wildmatch.h
+++ b/wildmatch.h
@@ -1,4 +1,3 @@
 /* wildmatch.h */
 
-int wildmatch(const char *pattern, const char *text);
-int iwildmatch(const char *pattern, const char *text);
+int wildmatch(const char *pattern, const char *text, int flags);
-- 
1.8.0.rc0.29.g1fdd78f

* [PATCH 5/8] wildmatch: fix case-insensitive matching
  2012-10-09  3:08 [PATCH 0/8] wildmatch take 3 Nguyễn Thái Ngọc Duy
                   ` (3 preceding siblings ...)
  2012-10-09  3:09 ` [PATCH 4/8] wildmatch: remove static variable force_lower_case Nguyễn Thái Ngọc Duy
@ 2012-10-09  3:09 ` Nguyễn Thái Ngọc Duy
  2012-10-09  3:09 ` [PATCH 6/8] wildmatch: adjust "**" behavior Nguyễn Thái Ngọc Duy
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-09  3:09 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

dowild() does case-insensitive matching by lower-casing the text only.
That means lower-case letters in patterns imply case-insensitive
matching, but upper-case letters mean exact matching.

We do not want that subtlety. Lower-case the pattern too, so that
wildmatch() with FNM_CASEFOLD always does what we expect it to do.
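
The asymmetry can be seen in a small stand-alone model (a sketch under
simplifying assumptions, not the dowild() code itself): the text is
always folded, and the hypothetical fold_pattern argument selects
whether the pattern is folded as well. sketch_icase_equal() is a made-up
name and only literal characters are handled.

```c
#include <ctype.h>

/*
 * Literal comparison with the text always lower-cased.
 * fold_pattern == 0 models folding the text only;
 * fold_pattern == 1 models folding both sides.
 */
int sketch_icase_equal(const char *pattern, const char *text, int fold_pattern)
{
	for (; *pattern && *text; pattern++, text++) {
		unsigned char p = (unsigned char)*pattern;
		unsigned char t = (unsigned char)tolower((unsigned char)*text);

		if (fold_pattern)
			p = (unsigned char)tolower(p);
		if (p != t)
			return 0;
	}
	return *pattern == '\0' && *text == '\0';
}
```

In this simplified model, pattern "foo" matches text "FOO" either way,
but pattern "Foo" only matches "foo" once the pattern is folded too.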

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 wildmatch.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/wildmatch.c b/wildmatch.c
index 2382873..fdb8cb1 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -72,6 +72,8 @@ static int dowild(const uchar *p, const uchar *text, int force_lower_case)
 		return ABORT_ALL;
 	if (force_lower_case && ISUPPER(t_ch))
 	    t_ch = tolower(t_ch);
+	if (force_lower_case && ISUPPER(p_ch))
+	    p_ch = tolower(p_ch);
 	switch (p_ch) {
 	  case '\\':
 	    /* Literal match with following character.  Note that the test
-- 
1.8.0.rc0.29.g1fdd78f

* [PATCH 6/8] wildmatch: adjust "**" behavior
  2012-10-09  3:08 [PATCH 0/8] wildmatch take 3 Nguyễn Thái Ngọc Duy
                   ` (4 preceding siblings ...)
  2012-10-09  3:09 ` [PATCH 5/8] wildmatch: fix case-insensitive matching Nguyễn Thái Ngọc Duy
@ 2012-10-09  3:09 ` Nguyễn Thái Ngọc Duy
  2012-10-09  3:09 ` [PATCH 7/8] wildmatch: make /**/ match zero or more directories Nguyễn Thái Ngọc Duy
  2012-10-09  3:09 ` [PATCH 8/8] Support "**" wildcard in .gitignore and .gitattributes Nguyễn Thái Ngọc Duy
  7 siblings, 0 replies; 15+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-09  3:09 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

Standard wildmatch() sees consecutive asterisks as "*" that can also
match slashes. But that may be hard to explain to users as
"abc/**/def" can match "abcdef", "abcxyzdef", "abc/def", "abc/x/def",
"abc/x/y/def"...

This patch changes wildmatch so that users can write

- "**/def" -> all paths ending with file/directory 'def'
- "abc/**" -> equivalent to "/abc/"
- "abc/**/def" -> "abc/x/def", "abc/x/y/def"...
- other "**" cases are downgraded to normal "*"

Basically, the magic of "**" only remains if it is surrounded by
slashes.
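
The decision of whether a run of asterisks keeps its magic could be
sketched as a stand-alone helper like the one below. This is only an
illustration: sketch_starstar_is_special() is a made-up name, and the
sketch receives the start of the pattern plus an index, which is an
assumption about the intent of the beginning-of-pattern test rather
than a copy of the hunk that follows.

```c
#include <stddef.h>

/*
 * Return 1 if the run of asterisks starting at pattern[pos] keeps its
 * slash-matching ("special") meaning, 0 if it behaves like a plain "*".
 */
int sketch_starstar_is_special(const char *pattern, size_t pos)
{
	size_t end = pos;

	while (pattern[end] == '*')
		end++;
	if (end - pos < 2)
		return 0;	/* a single "*" is never special */

	/* preceded by the start of the pattern or by a slash */
	if (pos == 0 || pattern[pos - 1] == '/')
		return 1;
	/* followed by the end of the pattern, a slash, or an escaped slash */
	if (pattern[end] == '\0' || pattern[end] == '/' ||
	    (pattern[end] == '\\' && pattern[end + 1] == '/'))
		return 1;
	return 0;
}
```

Note that the two tests are joined by "or" here, as in the hunk below,
so a run with a slash on only one side (for example "a/**b") still
counts as special in this sketch, while "foo**bar" does not.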

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 t/t3070-wildmatch.sh | 2 +-
 wildmatch.c          | 8 +++++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
index bb92f8d..d320f84 100755
--- a/t/t3070-wildmatch.sh
+++ b/t/t3070-wildmatch.sh
@@ -51,7 +51,7 @@ match 1 1 ']' ']'
 
 # Extended slash-matching features
 match 0 0 'foo/baz/bar' 'foo*bar'
-match 1 0 'foo/baz/bar' 'foo**bar'
+match 0 0 'foo/baz/bar' 'foo**bar'
 match 0 0 'foo/bar' 'foo?bar'
 match 0 0 'foo/bar' 'foo[/]bar'
 match 0 0 'foo/bar' 'f[^eiu][^eiu][^eiu][^eiu][^eiu]r'
diff --git a/wildmatch.c b/wildmatch.c
index fdb8cb1..1b39346 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -91,8 +91,14 @@ static int dowild(const uchar *p, const uchar *text, int force_lower_case)
 	    continue;
 	  case '*':
 	    if (*++p == '*') {
+		const uchar *prev_p = p - 2;
 		while (*++p == '*') {}
-		special = TRUE;
+		if ((prev_p == text || *prev_p == '/') ||
+		    (*p == '\0' || *p == '/' ||
+		     (p[0] == '\\' && p[1] == '/'))) {
+		    special = TRUE;
+		} else
+		    special = FALSE;
 	    } else
 		special = FALSE;
 	    if (*p == '\0') {
-- 
1.8.0.rc0.29.g1fdd78f

* [PATCH 7/8] wildmatch: make /**/ match zero or more directories
  2012-10-09  3:08 [PATCH 0/8] wildmatch take 3 Nguyễn Thái Ngọc Duy
                   ` (5 preceding siblings ...)
  2012-10-09  3:09 ` [PATCH 6/8] wildmatch: adjust "**" behavior Nguyễn Thái Ngọc Duy
@ 2012-10-09  3:09 ` Nguyễn Thái Ngọc Duy
  2012-10-09  3:09 ` [PATCH 8/8] Support "**" wildcard in .gitignore and .gitattributes Nguyễn Thái Ngọc Duy
  7 siblings, 0 replies; 15+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-09  3:09 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

"foo/**/bar" currently matches "foo/x/bar", "foo/x/y/bar"... but not
"foo/bar". Add a special case: when "foo/**/" is detected (and the
"foo/" part has already matched), try matching "bar" against the rest
of the string.

"Match one or more directories" semantics can still be achieved easily
with "foo/*/**/bar".

This also makes "**/foo" match "foo" in addition to "x/foo",
"x/y/foo"...
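
The "try zero directories first" idea can be sketched with a tiny
stand-alone matcher. This is a simplified model, not dowild():
sketch_wild() is a made-up name, and it only understands literal
characters, a single "*" that does not cross '/', and a double asterisk
immediately followed by '/'.

```c
/* Return 1 if text t matches pattern p under the simplified rules above. */
int sketch_wild(const char *p, const char *t)
{
	while (*p) {
		if (p[0] == '*' && p[1] == '*' && p[2] == '/') {
			/* first assume the double asterisk matches nothing */
			if (sketch_wild(p + 3, t))
				return 1;
			/* otherwise consume one directory of the text and retry */
			while (*t && *t != '/')
				t++;
			if (*t != '/')
				return 0;
			t++;
			continue;	/* p still points at the double asterisk */
		}
		if (*p == '*') {
			/* single "*": any run of non-slash characters */
			p++;
			for (;;) {
				if (sketch_wild(p, t))
					return 1;
				if (*t == '\0' || *t == '/')
					return 0;
				t++;
			}
		}
		if (*p != *t)
			return 0;
		p++;
		t++;
	}
	return *t == '\0';
}
```

Under these rules "foo/**/bar" matches "foo/bar" as well as
"foo/x/y/bar", while "foo/*/**/bar" still requires at least one
directory between "foo" and "bar".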

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 t/t3070-wildmatch.sh |  8 +++++++-
 wildmatch.c          | 17 +++++++++++++++++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
index d320f84..a247a36 100755
--- a/t/t3070-wildmatch.sh
+++ b/t/t3070-wildmatch.sh
@@ -52,11 +52,17 @@ match 1 1 ']' ']'
 # Extended slash-matching features
 match 0 0 'foo/baz/bar' 'foo*bar'
 match 0 0 'foo/baz/bar' 'foo**bar'
+match 1 1 'foo/baz/bar' 'foo/**/bar'
+match 1 0 'foo/baz/bar' 'foo/**/**/bar'
+match 1 0 'foo/b/a/z/bar' 'foo/**/bar'
+match 1 0 'foo/b/a/z/bar' 'foo/**/**/bar'
+match 1 0 'foo/bar' 'foo/**/bar'
+match 1 0 'foo/bar' 'foo/**/**/bar'
 match 0 0 'foo/bar' 'foo?bar'
 match 0 0 'foo/bar' 'foo[/]bar'
 match 0 0 'foo/bar' 'f[^eiu][^eiu][^eiu][^eiu][^eiu]r'
 match 1 1 'foo-bar' 'f[^eiu][^eiu][^eiu][^eiu][^eiu]r'
-match 0 0 'foo' '**/foo'
+match 1 0 'foo' '**/foo'
 match 1 1 '/foo' '**/foo'
 match 1 0 'bar/baz/foo' '**/foo'
 match 0 0 'bar/baz/foo' '*/foo'
diff --git a/wildmatch.c b/wildmatch.c
index 1b39346..4069b2d 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -96,6 +96,23 @@ static int dowild(const uchar *p, const uchar *text, int force_lower_case)
 		if ((prev_p == text || *prev_p == '/') ||
 		    (*p == '\0' || *p == '/' ||
 		     (p[0] == '\\' && p[1] == '/'))) {
+			/*
+			 * Assuming we already match 'foo/' and are at
+			 * <star star slash>, just assume it matches
+			 * nothing and go ahead match the rest of the
+			 * pattern with the remaining string. This
+			 * helps make foo/<*><*>/bar (<> because
+			 * otherwise it breaks C comment syntax) match
+			 * both foo/bar and foo/a/bar.
+			 *
+			 * Crazy patterns like /<*><*>/<*><*>/ are
+			 * treated like /<*><*>/. But undefined
+			 * behavior is even appropriate for people
+			 * writing such a pattern.
+			 */
+			if (p[0] == '/' &&
+			    dowild(p + 1, text, force_lower_case) == TRUE)
+				return TRUE;
 		    special = TRUE;
 		} else
 		    special = FALSE;
-- 
1.8.0.rc0.29.g1fdd78f

* [PATCH 8/8] Support "**" wildcard in .gitignore and .gitattributes
  2012-10-09  3:08 [PATCH 0/8] wildmatch take 3 Nguyễn Thái Ngọc Duy
                   ` (6 preceding siblings ...)
  2012-10-09  3:09 ` [PATCH 7/8] wildmatch: make /**/ match zero or more directories Nguyễn Thái Ngọc Duy
@ 2012-10-09  3:09 ` Nguyễn Thái Ngọc Duy
  2012-10-09  7:57   ` Michael Haggerty
  7 siblings, 1 reply; 15+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-10-09  3:09 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/gitignore.txt        | 19 +++++++++++++++++++
 attr.c                             |  4 +++-
 dir.c                              |  4 +++-
 t/t0003-attributes.sh              | 38 ++++++++++++++++++++++++++++++++++++++
 t/t3001-ls-files-others-exclude.sh | 19 +++++++++++++++++++
 5 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/Documentation/gitignore.txt b/Documentation/gitignore.txt
index 96639e0..5a9c9f7 100644
--- a/Documentation/gitignore.txt
+++ b/Documentation/gitignore.txt
@@ -104,6 +104,25 @@ PATTERN FORMAT
    For example, "/{asterisk}.c" matches "cat-file.c" but not
    "mozilla-sha1/sha1.c".
 
+Two consecutive asterisks ("`**`") in patterns matched against
+full pathname may have special meaning:
+
+ - A leading "`**`" followed by a slash means match in all
+   directories. For example, "`**/foo`" matches file or directory
+   "`foo`" anywhere, the same as pattern "`foo`". "**/foo/bar"
+   matches file or directory "`bar`" anywhere that is directly
+   under directory "`foo`".
+
+ - A trailing "/**" matches everything inside. For example,
+   "abc/**" is equivalent to "`/abc/`".
+
+ - A slash followed by two consecutive asterisks then a slash
+   matches zero or more directories. For example, "`a/**/b`"
+   matches "`a/b`", "`a/x/b`", "`a/x/y/b`" and so on.
+
+ - Consecutive asterisks otherwise are treated like normal
+   asterisk wildcards.
+
 NOTES
 -----
 
diff --git a/attr.c b/attr.c
index 887a9ae..e85e5ed 100644
--- a/attr.c
+++ b/attr.c
@@ -12,6 +12,7 @@
 #include "exec_cmd.h"
 #include "attr.h"
 #include "dir.h"
+#include "wildmatch.h"
 
 const char git_attr__true[] = "(builtin)true";
 const char git_attr__false[] = "\0(builtin)false";
@@ -666,7 +667,8 @@ static int path_matches(const char *pathname, int pathlen,
 		return 0;
 	if (baselen != 0)
 		baselen++;
-	return fnmatch_icase(pattern, pathname + baselen, FNM_PATHNAME) == 0;
+	return wildmatch(pattern, pathname + baselen,
+			 ignore_case ? FNM_CASEFOLD : 0);
 }
 
 static int macroexpand_one(int attr_nr, int rem);
diff --git a/dir.c b/dir.c
index 4868339..dc721c0 100644
--- a/dir.c
+++ b/dir.c
@@ -8,6 +8,7 @@
 #include "cache.h"
 #include "dir.h"
 #include "refs.h"
+#include "wildmatch.h"
 
 struct path_simplify {
 	int len;
@@ -575,7 +576,8 @@ int excluded_from_list(const char *pathname,
 			namelen -= prefix;
 		}
 
-		if (!namelen || !fnmatch_icase(exclude, name, FNM_PATHNAME))
+		if (!namelen ||
+		    wildmatch(exclude, name, ignore_case ? FNM_CASEFOLD : 0))
 			return to_exclude;
 	}
 	return -1; /* undecided */
diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
index febc45c..67a5694 100755
--- a/t/t0003-attributes.sh
+++ b/t/t0003-attributes.sh
@@ -232,4 +232,42 @@ test_expect_success 'bare repository: test info/attributes' '
 	attr_check subdir/a/i unspecified
 '
 
+test_expect_success '"**" test' '
+	cd .. &&
+	echo "**/f foo=bar" >.gitattributes &&
+	cat <<\EOF >expect &&
+f: foo: bar
+a/f: foo: bar
+a/b/f: foo: bar
+a/b/c/f: foo: bar
+EOF
+	git check-attr foo -- "f" >actual 2>err &&
+	git check-attr foo -- "a/f" >>actual 2>>err &&
+	git check-attr foo -- "a/b/f" >>actual 2>>err &&
+	git check-attr foo -- "a/b/c/f" >>actual 2>>err &&
+	test_cmp expect actual &&
+	test_line_count = 0 err
+'
+
+test_expect_success '"**" with no slashes test' '
+	echo "a**f foo=bar" >.gitattributes &&
+	git check-attr foo -- "f" >actual &&
+	cat <<\EOF >expect &&
+f: foo: unspecified
+af: foo: bar
+axf: foo: bar
+a/f: foo: unspecified
+a/b/f: foo: unspecified
+a/b/c/f: foo: unspecified
+EOF
+	git check-attr foo -- "f" >actual 2>err &&
+	git check-attr foo -- "af" >>actual 2>>err &&
+	git check-attr foo -- "axf" >>actual 2>>err &&
+	git check-attr foo -- "a/f" >>actual 2>>err &&
+	git check-attr foo -- "a/b/f" >>actual 2>>err &&
+	git check-attr foo -- "a/b/c/f" >>actual 2>>err &&
+	test_cmp expect actual &&
+	test_line_count = 0 err
+'
+
 test_done
diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh
index c8fe978..278315d 100755
--- a/t/t3001-ls-files-others-exclude.sh
+++ b/t/t3001-ls-files-others-exclude.sh
@@ -214,4 +214,23 @@ test_expect_success 'subdirectory ignore (l1)' '
 	test_cmp expect actual
 '
 
+
+test_expect_success 'ls-files with "**" patterns' '
+	cat <<\EOF >expect &&
+a.1
+one/a.1
+one/two/a.1
+three/a.1
+EOF
+	git ls-files -o -i --exclude "**/a.1" >actual &&
+	test_cmp expect actual
+'
+
+
+test_expect_success 'ls-files with "**" patterns and no slashes' '
+	: >expect &&
+	git ls-files -o -i --exclude "one**a.1" >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
1.8.0.rc0.29.g1fdd78f

* Re: [PATCH 8/8] Support "**" wildcard in .gitignore and .gitattributes
  2012-10-09  3:09 ` [PATCH 8/8] Support "**" wildcard in .gitignore and .gitattributes Nguyễn Thái Ngọc Duy
@ 2012-10-09  7:57   ` Michael Haggerty
  2012-10-10  5:40     ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Haggerty @ 2012-10-09  7:57 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

I like how this series is going and it's going to be a nice new feature.
Some comments...

It would be helpful if you would use

    --subject-prefix='PATCH v3'

etc. to help spectators keep track of the different versions of your
patch series.

On 10/09/2012 05:09 AM, Nguyễn Thái Ngọc Duy wrote:
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  Documentation/gitignore.txt        | 19 +++++++++++++++++++
>  attr.c                             |  4 +++-
>  dir.c                              |  4 +++-
>  t/t0003-attributes.sh              | 38 ++++++++++++++++++++++++++++++++++++++
>  t/t3001-ls-files-others-exclude.sh | 19 +++++++++++++++++++
>  5 files changed, 82 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/gitignore.txt b/Documentation/gitignore.txt
> index 96639e0..5a9c9f7 100644
> --- a/Documentation/gitignore.txt
> +++ b/Documentation/gitignore.txt
> @@ -104,6 +104,25 @@ PATTERN FORMAT
>     For example, "/{asterisk}.c" matches "cat-file.c" but not
>     "mozilla-sha1/sha1.c".
>  
> +Two consecutive asterisks ("`**`") in patterns matched against
> +full pathname may have special meaning:
> +
> + - A leading "`**`" followed by a slash means match in all
> +   directories. For example, "`**/foo`" matches file or directory
> +   "`foo`" anywhere, the same as pattern "`foo`". "**/foo/bar"
> +   matches file or directory "`bar`" anywhere that is directly
> +   under directory "`foo`".
> +
> + - A trailing "/**" matches everything inside. For example,
> +   "abc/**" is equivalent to "`/abc/`".

It seems odd that you add a leading slash in this example.  I assume
that is because of the rule that a pattern containing a slash is
considered anchored at the current directory.  But I find it confusing
because the addition of the leading slash is not part of the rule you
are trying to illustrate here, and is therefore a distraction.  I
suggest that you write either

- A trailing "/**" matches everything inside. For example,
  "/abc/**" is equivalent to "`/abc/`".

or

- A trailing "/**" matches everything inside. For example,
  "abc/**" is equivalent to "`abc/`" (which is also equivalent
  to "`/abc/`").

> +
> + - A slash followed by two consecutive asterisks then a slash
> +   matches zero or more directories. For example, "`a/**/b`"
> +   matches "`a/b`", "`a/x/b`", "`a/x/y/b`" and so on.
> +
> + - Consecutive asterisks otherwise are treated like normal
> +   asterisk wildcards.
> +

I don't like the last rule.  (1) This construct is superfluous; why
wouldn't the user just use a single asterisk?  (2) Allowing this
construct means that it could appear in .gitignore files, creating
unnecessary confusion: extrapolating from the other meanings of "**"
users would expect that it would somehow match slashes.  (3) It is
conceivable (though admittedly unlikely) that we might want to assign a
distinct meaning to this construct in the future, and accepting it now
as a different way to spell "*" would prevent such a change.

Perhaps this rule was intended for backwards compatibility?

I think it would be preferable to say that other uses of consecutive
asterisks are undefined, and probably make them trigger a warning.
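
A check along these lines might look like the sketch below. It is only
an illustration of the suggestion: sketch_has_undefined_starstar() is a
hypothetical helper, "defined" is taken to mean that the run of
asterisks is delimited on both sides by a slash or by the start/end of
the pattern, and bracket expressions are not parsed, so a pattern such
as "[**]" would be flagged as well.

```c
#include <stddef.h>

/*
 * Return 1 if the pattern contains a run of two or more asterisks that
 * is not delimited on both sides by '/' or by the pattern boundaries.
 */
int sketch_has_undefined_starstar(const char *pattern)
{
	size_t i = 0;

	while (pattern[i]) {
		size_t start;

		if (pattern[i] == '\\' && pattern[i + 1]) {
			i += 2;		/* skip an escaped character */
			continue;
		}
		if (pattern[i] != '*') {
			i++;
			continue;
		}
		start = i;
		while (pattern[i] == '*')
			i++;
		if (i - start < 2)
			continue;	/* a single "*" is always fine */
		if ((start != 0 && pattern[start - 1] != '/') ||
		    (pattern[i] != '\0' && pattern[i] != '/'))
			return 1;
	}
	return 0;
}
```

A caller could use the return value to emit a warning while reading
.gitignore or .gitattributes; "a**f" and "one**a.1" would be reported,
whereas "**/foo", "abc/**" and "a/**/b" would not.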

>  NOTES
>  -----
>  
> diff --git a/attr.c b/attr.c
> index 887a9ae..e85e5ed 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -12,6 +12,7 @@
>  #include "exec_cmd.h"
>  #include "attr.h"
>  #include "dir.h"
> +#include "wildmatch.h"
>  
>  const char git_attr__true[] = "(builtin)true";
>  const char git_attr__false[] = "\0(builtin)false";
> @@ -666,7 +667,8 @@ static int path_matches(const char *pathname, int pathlen,
>  		return 0;
>  	if (baselen != 0)
>  		baselen++;
> -	return fnmatch_icase(pattern, pathname + baselen, FNM_PATHNAME) == 0;
> +	return wildmatch(pattern, pathname + baselen,
> +			 ignore_case ? FNM_CASEFOLD : 0);
>  }
>  
>  static int macroexpand_one(int attr_nr, int rem);
> diff --git a/dir.c b/dir.c
> index 4868339..dc721c0 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -8,6 +8,7 @@
>  #include "cache.h"
>  #include "dir.h"
>  #include "refs.h"
> +#include "wildmatch.h"
>  
>  struct path_simplify {
>  	int len;
> @@ -575,7 +576,8 @@ int excluded_from_list(const char *pathname,
>  			namelen -= prefix;
>  		}
>  
> -		if (!namelen || !fnmatch_icase(exclude, name, FNM_PATHNAME))
> +		if (!namelen ||
> +		    wildmatch(exclude, name, ignore_case ? FNM_CASEFOLD : 0))
>  			return to_exclude;
>  	}
>  	return -1; /* undecided */
> diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
> index febc45c..67a5694 100755
> --- a/t/t0003-attributes.sh
> +++ b/t/t0003-attributes.sh
> @@ -232,4 +232,42 @@ test_expect_success 'bare repository: test info/attributes' '
>  	attr_check subdir/a/i unspecified
>  '
>  
> +test_expect_success '"**" test' '
> +	cd .. &&
> +	echo "**/f foo=bar" >.gitattributes &&
> +	cat <<\EOF >expect &&
> +f: foo: bar
> +a/f: foo: bar
> +a/b/f: foo: bar
> +a/b/c/f: foo: bar
> +EOF
> +	git check-attr foo -- "f" >actual 2>err &&
> +	git check-attr foo -- "a/f" >>actual 2>>err &&
> +	git check-attr foo -- "a/b/f" >>actual 2>>err &&
> +	git check-attr foo -- "a/b/c/f" >>actual 2>>err &&
> +	test_cmp expect actual &&
> +	test_line_count = 0 err
> +'
> +
> +test_expect_success '"**" with no slashes test' '
> +	echo "a**f foo=bar" >.gitattributes &&
> +	git check-attr foo -- "f" >actual &&
> +	cat <<\EOF >expect &&
> +f: foo: unspecified
> +af: foo: bar
> +axf: foo: bar
> +a/f: foo: unspecified
> +a/b/f: foo: unspecified
> +a/b/c/f: foo: unspecified
> +EOF
> +	git check-attr foo -- "f" >actual 2>err &&
> +	git check-attr foo -- "af" >>actual 2>>err &&
> +	git check-attr foo -- "axf" >>actual 2>>err &&
> +	git check-attr foo -- "a/f" >>actual 2>>err &&
> +	git check-attr foo -- "a/b/f" >>actual 2>>err &&
> +	git check-attr foo -- "a/b/c/f" >>actual 2>>err &&
> +	test_cmp expect actual &&
> +	test_line_count = 0 err
> +'
> +
>  test_done
> diff --git a/t/t3001-ls-files-others-exclude.sh b/t/t3001-ls-files-others-exclude.sh
> index c8fe978..278315d 100755
> --- a/t/t3001-ls-files-others-exclude.sh
> +++ b/t/t3001-ls-files-others-exclude.sh
> @@ -214,4 +214,23 @@ test_expect_success 'subdirectory ignore (l1)' '
>  	test_cmp expect actual
>  '
>  
> +
> +test_expect_success 'ls-files with "**" patterns' '
> +	cat <<\EOF >expect &&
> +a.1
> +one/a.1
> +one/two/a.1
> +three/a.1
> +EOF
> +	git ls-files -o -i --exclude "**/a.1" >actual &&
> +	test_cmp expect actual
> +'
> +
> +
> +test_expect_success 'ls-files with "**" patterns and no slashes' '
> +	: >expect &&
> +	git ls-files -o -i --exclude "one**a.1" >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_done
> 

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

* Re: [PATCH 4/8] wildmatch: remove static variable force_lower_case
  2012-10-09  3:09 ` [PATCH 4/8] wildmatch: remove static variable force_lower_case Nguyễn Thái Ngọc Duy
@ 2012-10-09 20:47   ` Junio C Hamano
  2012-10-10  5:14     ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2012-10-09 20:47 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> diff --git a/wildmatch.c b/wildmatch.c
> index 7b64a6b..2382873 100644
> --- a/wildmatch.c
> +++ b/wildmatch.c
> @@ -11,8 +11,8 @@
>  
>  #include <stddef.h>
>  #include <ctype.h>
> -#include <string.h>
>  
> +#include "cache.h"
>  #include "wildmatch.h"

This is wrong; the includes from the system headers should have
been removed in the previous step where the series "integrated"
wildmatch to git, after which point the first include in any C source
that is not at the platform-compatibility layer should be cache.h
or git-compat-util.h.


* Re: [PATCH 4/8] wildmatch: remove static variable force_lower_case
  2012-10-09 20:47   ` Junio C Hamano
@ 2012-10-10  5:14     ` Nguyen Thai Ngoc Duy
  2012-10-10  5:31       ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-10-10  5:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, Oct 10, 2012 at 3:47 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> diff --git a/wildmatch.c b/wildmatch.c
>> index 7b64a6b..2382873 100644
>> --- a/wildmatch.c
>> +++ b/wildmatch.c
>> @@ -11,8 +11,8 @@
>>
>>  #include <stddef.h>
>>  #include <ctype.h>
>> -#include <string.h>
>>
>> +#include "cache.h"
>>  #include "wildmatch.h"
>
> This is wrong; the includes from the system headers should have
> been removed in the previous step where the series "integrated"
> wildmatch to git, after which point the first include in any C source
> that is not at the platform-compatibility layer should be cache.h
> or git-compat-util.h.

Git's ctype does not seem to be complete for wildmatch's use so
ctype.h is required. But that can be easily fixed later on.
-- 
Duy


* Re: [PATCH 4/8] wildmatch: remove static variable force_lower_case
  2012-10-10  5:14     ` Nguyen Thai Ngoc Duy
@ 2012-10-10  5:31       ` Junio C Hamano
  2012-10-10  5:47         ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2012-10-10  5:31 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> Git's ctype does not seem to be complete for wildmatch's use so
> ctype.h is required. But that can be easily fixed later on.

Until "later on", I cannot even compile the series.


* Re: [PATCH 8/8] Support "**" wildcard in .gitignore and .gitattributes
  2012-10-09  7:57   ` Michael Haggerty
@ 2012-10-10  5:40     ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 15+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-10-10  5:40 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git

On Tue, Oct 9, 2012 at 2:57 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> + - A trailing "/**" matches everything inside. For example,
>> +   "abc/**" is equivalent to "`/abc/`".
>
> It seems odd that you add a leading slash in this example.  I assume
> that is because of the rule that a pattern containing a slash is
> considered anchored at the current directory. But I find it confusing
> because the addition of the leading slash is not part of the rule you
> are trying to illustrate here, and is therefore a distraction.  I
> suggest that you write either
>
> - A trailing "/**" matches everything inside. For example,
>   "/abc/**" is equivalent to "`/abc/`".
>
> or
>
> - A trailing "/**" matches everything inside. For example,
>   "abc/**" is equivalent to "`abc/`" (which is also equivalent
>   to "`/abc/`").

The tricky thing in .gitignore is that a trailing '/' alone does not
anchor the pattern: "abc/" matches a _directory_ named abc anywhere in
the worktree. So the former is probably better. I should also add a note
here (or in gitattributes.txt) about the difference between "/abc/*"
and "/abc/**": the former assigns attributes to all files directly
under abc (i.e. depth 1), the latter to files at any depth.
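
To make the depth difference concrete, here is a minimal sketch of a
matcher that treats "*" as never crossing a slash and a trailing "/**"
as matching everything inside. It is not the wildmatch() code from this
series: the name sketch_match is hypothetical, leading-slash anchoring
is left out (patterns are written relative, e.g. "abc/*"), and brackets,
"?", escapes, leading "**/" and "/**/" are not handled.

```c
#include <string.h>

/*
 * Hypothetical, simplified matcher (NOT git's wildmatch()).
 * Supports literal characters, "*" (never matches '/'), and a
 * trailing "/**" (matches everything inside the directory).
 * Any other run of asterisks is treated like a single "*".
 * Returns 1 on match, 0 otherwise.
 */
int sketch_match(const char *pat, const char *path)
{
	while (*pat) {
		if (!strcmp(pat, "/**"))
			/* trailing "/**": a slash plus at least one more character */
			return path[0] == '/' && path[1] != '\0';
		if (*pat == '*') {
			while (*pat == '*')
				pat++;	/* collapse "**" into "*" */
			for (;;) {
				if (sketch_match(pat, path))
					return 1;
				/* "*" stops at the end of the path component */
				if (*path == '\0' || *path == '/')
					return 0;
				path++;
			}
		}
		if (*pat != *path)
			return 0;
		pat++;
		path++;
	}
	return *path == '\0';
}
```

With this sketch, "abc/*" matches "abc/f" but not "abc/x/f", while
"abc/**" matches both. "a**f" matches "af" and "axf" but not "a/f",
which is what the '"**" with no slashes' test in the patch expects.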

>> + - A slash followed by two consecutive asterisks then a slash
>> +   matches zero or more directories. For example, "`a/**/b`"
>> +   matches "`a/b`", "`a/x/b`", "`a/x/y/b`" and so on.
>> +
>> + - Consecutive asterisks otherwise are treated like normal
>> +   asterisk wildcards.
>> +
>
> I don't like the last rule.  (1) This construct is superfluous; why
> wouldn't the user just use a single asterisk?  (2) Allowing this
> construct means that it could appear in .gitignore files, creating
> unnecessary confusion: extrapolating from the other meanings of "**"
> users would expect that it would somehow match slashes.  (3) It is
> conceivable (though admittedly unlikely) that we might want to assign a
> distinct meaning to this construct in the future, and accepting it now
> as a different way to spell "*" would prevent such a change.
>
> Perhaps this rule was intended for backwards compatibility?

We break backwards compatibility already. Existing "**/" or "/**"
patterns now behave differently.

> I think it would be preferable to say that other uses of consecutive
> asterisks are undefined, and probably make them trigger a warning.

Instead of leaving it undefined, we can reject the pattern as "broken".
I have to check how fnmatch/wildmatch deals with broken patterns (it
must handle them somehow). If it returns a different code for broken
patterns, then we can warn users, and that would not be limited to just
"**" breakage.
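
As a rough sketch of what "a different code for broken patterns" could
look like, the check below flags any run of two or more asterisks that
is not a whole path component. Everything here is an assumption made
for illustration: the names check_double_star, PATTERN_OK and
PATTERN_BROKEN are hypothetical, a bare "**" is accepted, and escapes
and bracket expressions are ignored. It does not describe what fnmatch
or wildmatch actually return.

```c
#include <stddef.h>

/* Hypothetical status codes; not part of wildmatch.h. */
enum pattern_status { PATTERN_OK = 0, PATTERN_BROKEN = -1 };

/*
 * Accept "**" only when it forms a whole path component, i.e. it is
 * preceded by the start of the pattern or '/', and followed by the end
 * of the pattern or '/'. Runs of three or more asterisks are rejected.
 */
enum pattern_status check_double_star(const char *pat)
{
	size_t i = 0;

	while (pat[i]) {
		size_t start = i;

		if (pat[i] != '*') {
			i++;
			continue;
		}
		while (pat[i] == '*')
			i++;
		if (i - start < 2)
			continue;	/* a single "*" is always fine */
		if (i - start > 2)
			return PATTERN_BROKEN;
		if (start > 0 && pat[start - 1] != '/')
			return PATTERN_BROKEN;
		if (pat[i] != '\0' && pat[i] != '/')
			return PATTERN_BROKEN;
	}
	return PATTERN_OK;
}
```

A caller could warn when this returns PATTERN_BROKEN, e.g. for "a**f"
or "one**a.1", while "**/foo", "abc/**" and "a/**/b" pass.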
-- 
Duy


* Re: [PATCH 4/8] wildmatch: remove static variable force_lower_case
  2012-10-10  5:31       ` Junio C Hamano
@ 2012-10-10  5:47         ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 15+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-10-10  5:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, Oct 10, 2012 at 12:31 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>
>> Git's ctype does not seem to be complete for wildmatch's use so
>> ctype.h is required. But that can be easily fixed later on.
>
> Until "later on", I cannot even compile the series.

So that's why you noticed this patch :) It builds fine here. I'll fix
up and send an update later.
-- 
Duy


Thread overview: 15+ messages
2012-10-09  3:08 [PATCH 0/8] wildmatch take 3 Nguyễn Thái Ngọc Duy
2012-10-09  3:09 ` [PATCH 1/8] Import wildmatch from rsync Nguyễn Thái Ngọc Duy
2012-10-09  3:09 ` [PATCH 2/8] wildmatch: remove unnecessary functions Nguyễn Thái Ngọc Duy
2012-10-09  3:09 ` [PATCH 3/8] Integrate wildmatch to git Nguyễn Thái Ngọc Duy
2012-10-09  3:09 ` [PATCH 4/8] wildmatch: remove static variable force_lower_case Nguyễn Thái Ngọc Duy
2012-10-09 20:47   ` Junio C Hamano
2012-10-10  5:14     ` Nguyen Thai Ngoc Duy
2012-10-10  5:31       ` Junio C Hamano
2012-10-10  5:47         ` Nguyen Thai Ngoc Duy
2012-10-09  3:09 ` [PATCH 5/8] wildmatch: fix case-insensitive matching Nguyễn Thái Ngọc Duy
2012-10-09  3:09 ` [PATCH 6/8] wildmatch: adjust "**" behavior Nguyễn Thái Ngọc Duy
2012-10-09  3:09 ` [PATCH 7/8] wildmatch: make /**/ match zero or more directories Nguyễn Thái Ngọc Duy
2012-10-09  3:09 ` [PATCH 8/8] Support "**" wildcard in .gitignore and .gitattributes Nguyễn Thái Ngọc Duy
2012-10-09  7:57   ` Michael Haggerty
2012-10-10  5:40     ` Nguyen Thai Ngoc Duy
