git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/3] fix "git add" pattern matching
@ 2009-01-14 14:54 Clemens Buchacher
  2009-01-14 14:54 ` [PATCH 1/3] clean up pathspec matching Clemens Buchacher
  0 siblings, 1 reply; 16+ messages in thread
From: Clemens Buchacher @ 2009-01-14 14:54 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, johannes

Hi,

Johannes Schneider reported that "git add '*'" did not add matching files.
And indeed pattern matching for tracked files is broken since 1.6.1. I've
never used this feature myself, but I can imagine it's useful in some
situations. For example, you can do "git add '*.c'" to add all .c files. The
equivalent command without pattern matching would be
"find .  -name '*.c' | xargs git add".

[PATCH 1/3] clean up pathspec matching
[PATCH 2/3] remove pathspec_match, use match_pathspec instead
[PATCH 3/3] implement pattern matching in ce_path_match

This patch series fixes pattern matching for "git add". The first two
patches are cleanups only. PATCH 3/3 then implements pattern matching in
ce_path_match. It is very intrusive in the sense that all commands using
ce_path_match will now have pattern matching. I suppose this is good in
general, but can also have undesired side-effects, as in "git log --follow
'*'", for example. I'd appreciate some double-checking in that context.

Cheers,
Clemens

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/3] clean up pathspec matching
  2009-01-14 14:54 [PATCH 0/3] fix "git add" pattern matching Clemens Buchacher
@ 2009-01-14 14:54 ` Clemens Buchacher
  2009-01-14 14:54   ` [PATCH 2/3] remove pathspec_match, use match_pathspec instead Clemens Buchacher
  0 siblings, 1 reply; 16+ messages in thread
From: Clemens Buchacher @ 2009-01-14 14:54 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, johannes, Clemens Buchacher

If pathspec already matched exactly, it cannot match any more.
Originally, we had to continue anyways, because we did not
differentiate between exact, recursive and globbing matches.

Signed-off-by: Clemens Buchacher <drizzd@aon.at>
---
 dir.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/dir.c b/dir.c
index 7c59829..87a9758 100644
--- a/dir.c
+++ b/dir.c
@@ -118,7 +118,7 @@ int match_pathspec(const char **pathspec, const char *name, int namelen, int pre
 
 	for (retval = 0; (match = *pathspec++) != NULL; seen++) {
 		int how;
-		if (retval && *seen == MATCHED_EXACTLY)
+		if (*seen == MATCHED_EXACTLY)
 			continue;
 		match += prefix;
 		how = match_one(match, name, namelen);
-- 
1.6.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/3] remove pathspec_match, use match_pathspec instead
  2009-01-14 14:54 ` [PATCH 1/3] clean up pathspec matching Clemens Buchacher
@ 2009-01-14 14:54   ` Clemens Buchacher
  2009-01-14 14:54     ` [PATCH 3/3] implement pattern matching in ce_path_match Clemens Buchacher
  2009-01-14 15:40     ` [PATCH 2/3] remove pathspec_match, use match_pathspec instead Johannes Schindelin
  0 siblings, 2 replies; 16+ messages in thread
From: Clemens Buchacher @ 2009-01-14 14:54 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, johannes, Clemens Buchacher

Both versions have the same functionality. This removes any
redundancy.

This also adds makes two extensions to match_pathspec:

- If pathspec is NULL, return 1. This reflects the behavior of git
  commands, for which no paths usually means "match all paths".

- If seen is NULL, do not use it.

Signed-off-by: Clemens Buchacher <drizzd@aon.at>
---
 builtin-checkout.c |    6 +++---
 builtin-commit.c   |    2 +-
 builtin-ls-files.c |   40 ++--------------------------------------
 cache.h            |    1 -
 dir.c              |   19 +++++++++++--------
 5 files changed, 17 insertions(+), 51 deletions(-)

diff --git a/builtin-checkout.c b/builtin-checkout.c
index b5dd9c0..84a2825 100644
--- a/builtin-checkout.c
+++ b/builtin-checkout.c
@@ -240,7 +240,7 @@ static int checkout_paths(struct tree *source_tree, const char **pathspec,
 
 	for (pos = 0; pos < active_nr; pos++) {
 		struct cache_entry *ce = active_cache[pos];
-		pathspec_match(pathspec, ps_matched, ce->name, 0);
+		match_pathspec(pathspec, ce->name, ce_namelen(ce), 0, ps_matched);
 	}
 
 	if (report_path_error(ps_matched, pathspec, 0))
@@ -249,7 +249,7 @@ static int checkout_paths(struct tree *source_tree, const char **pathspec,
 	/* Any unmerged paths? */
 	for (pos = 0; pos < active_nr; pos++) {
 		struct cache_entry *ce = active_cache[pos];
-		if (pathspec_match(pathspec, NULL, ce->name, 0)) {
+		if (match_pathspec(pathspec, ce->name, ce_namelen(ce), 0, NULL)) {
 			if (!ce_stage(ce))
 				continue;
 			if (opts->force) {
@@ -274,7 +274,7 @@ static int checkout_paths(struct tree *source_tree, const char **pathspec,
 	state.refresh_cache = 1;
 	for (pos = 0; pos < active_nr; pos++) {
 		struct cache_entry *ce = active_cache[pos];
-		if (pathspec_match(pathspec, NULL, ce->name, 0)) {
+		if (match_pathspec(pathspec, ce->name, ce_namelen(ce), 0, NULL)) {
 			if (!ce_stage(ce)) {
 				errs |= checkout_entry(ce, &state, NULL);
 				continue;
diff --git a/builtin-commit.c b/builtin-commit.c
index 7cf227a..913aa89 100644
--- a/builtin-commit.c
+++ b/builtin-commit.c
@@ -166,7 +166,7 @@ static int list_paths(struct string_list *list, const char *with_tree,
 		struct cache_entry *ce = active_cache[i];
 		if (ce->ce_flags & CE_UPDATE)
 			continue;
-		if (!pathspec_match(pattern, m, ce->name, 0))
+		if (!match_pathspec(pattern, ce->name, ce_namelen(ce), 0, m))
 			continue;
 		string_list_insert(ce->name, list);
 	}
diff --git a/builtin-ls-files.c b/builtin-ls-files.c
index f72eb85..3434031 100644
--- a/builtin-ls-files.c
+++ b/builtin-ls-files.c
@@ -36,42 +36,6 @@ static const char *tag_other = "";
 static const char *tag_killed = "";
 static const char *tag_modified = "";
 
-
-/*
- * Match a pathspec against a filename. The first "skiplen" characters
- * are the common prefix
- */
-int pathspec_match(const char **spec, char *ps_matched,
-		   const char *filename, int skiplen)
-{
-	const char *m;
-
-	while ((m = *spec++) != NULL) {
-		int matchlen = strlen(m + skiplen);
-
-		if (!matchlen)
-			goto matched;
-		if (!strncmp(m + skiplen, filename + skiplen, matchlen)) {
-			if (m[skiplen + matchlen - 1] == '/')
-				goto matched;
-			switch (filename[skiplen + matchlen]) {
-			case '/': case '\0':
-				goto matched;
-			}
-		}
-		if (!fnmatch(m + skiplen, filename + skiplen, 0))
-			goto matched;
-		if (ps_matched)
-			ps_matched++;
-		continue;
-	matched:
-		if (ps_matched)
-			*ps_matched = 1;
-		return 1;
-	}
-	return 0;
-}
-
 static void show_dir_entry(const char *tag, struct dir_entry *ent)
 {
 	int len = prefix_len;
@@ -80,7 +44,7 @@ static void show_dir_entry(const char *tag, struct dir_entry *ent)
 	if (len >= ent->len)
 		die("git ls-files: internal error - directory entry not superset of prefix");
 
-	if (pathspec && !pathspec_match(pathspec, ps_matched, ent->name, len))
+	if (!match_pathspec(pathspec, ent->name, ent->len, len, ps_matched))
 		return;
 
 	fputs(tag, stdout);
@@ -156,7 +120,7 @@ static void show_ce_entry(const char *tag, struct cache_entry *ce)
 	if (len >= ce_namelen(ce))
 		die("git ls-files: internal error - cache entry not superset of prefix");
 
-	if (pathspec && !pathspec_match(pathspec, ps_matched, ce->name, len))
+	if (!match_pathspec(pathspec, ce->name, ce_namelen(ce), len, ps_matched))
 		return;
 
 	if (tag && *tag && show_valid_bit &&
diff --git a/cache.h b/cache.h
index 2dbd546..eba8afc 100644
--- a/cache.h
+++ b/cache.h
@@ -940,7 +940,6 @@ extern int ws_fix_copy(char *, const char *, int, unsigned, int *);
 extern int ws_blank_line(const char *line, int len, unsigned ws_rule);
 
 /* ls-files */
-int pathspec_match(const char **spec, char *matched, const char *filename, int skiplen);
 int report_path_error(const char *ps_matched, const char **pathspec, int prefix_offset);
 void overlay_tree_on_cache(const char *tree_name, const char *prefix);
 
diff --git a/dir.c b/dir.c
index 87a9758..d3b92de 100644
--- a/dir.c
+++ b/dir.c
@@ -108,25 +108,28 @@ static int match_one(const char *match, const char *name, int namelen)
  * and a mark is left in seen[] array for pathspec element that
  * actually matched anything.
  */
-int match_pathspec(const char **pathspec, const char *name, int namelen, int prefix, char *seen)
+int match_pathspec(const char **pathspec, const char *name, int namelen,
+		int prefix, char *seen)
 {
-	int retval;
-	const char *match;
+	int i, retval = 0;
+
+	if (!pathspec)
+		return 1;
 
 	name += prefix;
 	namelen -= prefix;
 
-	for (retval = 0; (match = *pathspec++) != NULL; seen++) {
+	for (i = 0; pathspec[i] != NULL; i++) {
 		int how;
-		if (*seen == MATCHED_EXACTLY)
+		const char *match = pathspec[i] + prefix;
+		if (seen && seen[i] == MATCHED_EXACTLY)
 			continue;
-		match += prefix;
 		how = match_one(match, name, namelen);
 		if (how) {
 			if (retval < how)
 				retval = how;
-			if (*seen < how)
-				*seen = how;
+			if (seen && seen[i] < how)
+				seen[i] = how;
 		}
 	}
 	return retval;
-- 
1.6.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/3] implement pattern matching in ce_path_match
  2009-01-14 14:54   ` [PATCH 2/3] remove pathspec_match, use match_pathspec instead Clemens Buchacher
@ 2009-01-14 14:54     ` Clemens Buchacher
  2009-01-14 15:25       ` Clemens Buchacher
                         ` (2 more replies)
  2009-01-14 15:40     ` [PATCH 2/3] remove pathspec_match, use match_pathspec instead Johannes Schindelin
  1 sibling, 3 replies; 16+ messages in thread
From: Clemens Buchacher @ 2009-01-14 14:54 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, johannes, Clemens Buchacher

For tracked files, matching globbing pattern in "git add" was broken
by 1e5f764c (builtin-add.c: optimize -A option and "git add ."), which
uses ce_path_match instead of match_pathspec for tracked files.
But ce_path_match does not implement pattern matching.

With this patch ce_path_match uses match_pathspec in order to perform
pattern matching.

This also implies pattern matching for many other git commands, such
as add -u, blame, log, etc.  For "git log --follow", we have to
disallow globbing pattern, because they potentially match more than
one file.

Reported-by: Johannes Schneider <johannes@familieschneider.info>
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
---
 builtin-log.c         |   13 ++++++++++++-
 read-cache.c          |   23 +----------------------
 t/t2200-add-update.sh |   10 ++++++++++
 3 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/builtin-log.c b/builtin-log.c
index c7aa48e..16e9207 100644
--- a/builtin-log.c
+++ b/builtin-log.c
@@ -25,6 +25,17 @@ static int default_show_root = 1;
 static const char *fmt_patch_subject_prefix = "PATCH";
 static const char *fmt_pretty;
 
+static int has_special(const char *p)
+{
+	int x;
+
+	while ((x = *p++) != '\0')
+		if (isspecial(x))
+			return 1;
+
+	return 0;
+}
+
 static void cmd_log_init(int argc, const char **argv, const char *prefix,
 		      struct rev_info *rev)
 {
@@ -49,7 +60,7 @@ static void cmd_log_init(int argc, const char **argv, const char *prefix,
 		rev->always_show_header = 0;
 	if (DIFF_OPT_TST(&rev->diffopt, FOLLOW_RENAMES)) {
 		rev->always_show_header = 0;
-		if (rev->diffopt.nr_paths != 1)
+		if (rev->diffopt.nr_paths != 1 || has_special(rev->diffopt.paths[0]))
 			usage("git logs can only follow renames on one pathname at a time");
 	}
 	for (i = 1; i < argc; i++) {
diff --git a/read-cache.c b/read-cache.c
index b1475ff..8a5f306 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -640,28 +640,7 @@ int ce_same_name(struct cache_entry *a, struct cache_entry *b)
 
 int ce_path_match(const struct cache_entry *ce, const char **pathspec)
 {
-	const char *match, *name;
-	int len;
-
-	if (!pathspec)
-		return 1;
-
-	len = ce_namelen(ce);
-	name = ce->name;
-	while ((match = *pathspec++) != NULL) {
-		int matchlen = strlen(match);
-		if (matchlen > len)
-			continue;
-		if (memcmp(name, match, matchlen))
-			continue;
-		if (matchlen && name[matchlen-1] == '/')
-			return 1;
-		if (name[matchlen] == '/' || !name[matchlen])
-			return 1;
-		if (!matchlen)
-			return 1;
-	}
-	return 0;
+	return match_pathspec(pathspec, ce->name, ce_namelen(ce), 0, NULL);
 }
 
 /*
diff --git a/t/t2200-add-update.sh b/t/t2200-add-update.sh
index cd9231c..6d99461 100755
--- a/t/t2200-add-update.sh
+++ b/t/t2200-add-update.sh
@@ -128,4 +128,14 @@ test_expect_success 'add -n -u should not add but just report' '
 
 '
 
+test_expect_success 'wildcard update' '
+
+	: >a.x &&
+	git add "*.x" &&
+	echo asdf >a.x &&
+	git add -u "*.x" &&
+	test -z "`git ls-files -m a.x`"
+
+'
+
 test_done
-- 
1.6.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] implement pattern matching in ce_path_match
  2009-01-14 14:54     ` [PATCH 3/3] implement pattern matching in ce_path_match Clemens Buchacher
@ 2009-01-14 15:25       ` Clemens Buchacher
  2009-01-14 15:44       ` Johannes Schindelin
  2009-01-14 18:39       ` Junio C Hamano
  2 siblings, 0 replies; 16+ messages in thread
From: Clemens Buchacher @ 2009-01-14 15:25 UTC (permalink / raw
  To: git; +Cc: Junio C Hamano, johannes

On Wed, Jan 14, 2009 at 03:54:36PM +0100, Clemens Buchacher wrote:
> This also implies pattern matching for many other git commands, such
> as add -u, blame, log, etc.

Oops, I thought I had removed that. AFAICT blame is not affected by this.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] remove pathspec_match, use match_pathspec instead
  2009-01-14 14:54   ` [PATCH 2/3] remove pathspec_match, use match_pathspec instead Clemens Buchacher
  2009-01-14 14:54     ` [PATCH 3/3] implement pattern matching in ce_path_match Clemens Buchacher
@ 2009-01-14 15:40     ` Johannes Schindelin
  2009-01-14 16:09       ` Clemens Buchacher
  1 sibling, 1 reply; 16+ messages in thread
From: Johannes Schindelin @ 2009-01-14 15:40 UTC (permalink / raw
  To: Clemens Buchacher; +Cc: git, Junio C Hamano, johannes

Hi,

On Wed, 14 Jan 2009, Clemens Buchacher wrote:

> Both versions have the same functionality. This removes any
> redundancy.
> 
> This also adds makes two extensions to match_pathspec:

s/makes//

>  5 files changed, 17 insertions(+), 51 deletions(-)

Nice.  Does it still pass the test suite?  (From my reading, it should, 
but I do not quite have the time to run it right now.)

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] implement pattern matching in ce_path_match
  2009-01-14 14:54     ` [PATCH 3/3] implement pattern matching in ce_path_match Clemens Buchacher
  2009-01-14 15:25       ` Clemens Buchacher
@ 2009-01-14 15:44       ` Johannes Schindelin
  2009-01-14 15:55         ` Sverre Rabbelier
  2009-01-14 16:18         ` Samuel Tardieu
  2009-01-14 18:39       ` Junio C Hamano
  2 siblings, 2 replies; 16+ messages in thread
From: Johannes Schindelin @ 2009-01-14 15:44 UTC (permalink / raw
  To: Clemens Buchacher; +Cc: git, Junio C Hamano, johannes

Hi,

On Wed, 14 Jan 2009, Clemens Buchacher wrote:

> @@ -25,6 +25,17 @@ static int default_show_root = 1;
>  static const char *fmt_patch_subject_prefix = "PATCH";
>  static const char *fmt_pretty;
>  
> +static int has_special(const char *p)
> +{
> +	int x;
> +
> +	while ((x = *p++) != '\0')
> +		if (isspecial(x))
> +			return 1;
> +
> +	return 0;
> +}

I would prefer something like this:

	static int has_special(const char *p)
	{
		while (*p)
			if (isspecial(*(p++)))
				return 1;
		return 0;
	}

but that is probably a matter of taste.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] implement pattern matching in ce_path_match
  2009-01-14 15:44       ` Johannes Schindelin
@ 2009-01-14 15:55         ` Sverre Rabbelier
  2009-01-14 16:18         ` Samuel Tardieu
  1 sibling, 0 replies; 16+ messages in thread
From: Sverre Rabbelier @ 2009-01-14 15:55 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: Clemens Buchacher, git, Junio C Hamano, johannes

On Wed, Jan 14, 2009 at 16:44, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>> +static int has_special(const char *p)
>> +{
>> +     int x;
>> +
>> +     while ((x = *p++) != '\0')
>> +             if (isspecial(x))
>> +                     return 1;
>> +
>> +     return 0;
>> +}
>
> I would prefer something like this:
>
>        static int has_special(const char *p)
>        {
>                while (*p)
>                        if (isspecial(*(p++)))
>                                return 1;
>                return 0;
>        }
>
> but that is probably a matter of taste.

FWIW, I think the above is a lot less readable due to the assignment
in the while loop's conditional. Whereas in Dscho's version it is
intuitively obvious what the termination condition of the while loop
is.

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] remove pathspec_match, use match_pathspec instead
  2009-01-14 15:40     ` [PATCH 2/3] remove pathspec_match, use match_pathspec instead Johannes Schindelin
@ 2009-01-14 16:09       ` Clemens Buchacher
  0 siblings, 0 replies; 16+ messages in thread
From: Clemens Buchacher @ 2009-01-14 16:09 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: git, Junio C Hamano, johannes

On Wed, Jan 14, 2009 at 04:40:42PM +0100, Johannes Schindelin wrote:
> > Both versions have the same functionality. This removes any
> > redundancy.
> >
> > This also adds makes two extensions to match_pathspec:
> 
> s/makes//

Thanks.

> Nice.  Does it still pass the test suite?  (From my reading, it should, 
> but I do not quite have the time to run it right now.)

It sure does. I am not confident enough to send untested patches yet. :-)

On Wed, Jan 14, 2009 at 04:44:36PM +0100, Johannes Schindelin wrote:
> I would prefer something like this:
> 
>       static int has_special(const char *p)
>       {
>               while (*p)
>                       if (isspecial(*(p++)))
>                               return 1;
>               return 0;
>       }

Ok, no problem.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] implement pattern matching in ce_path_match
  2009-01-14 15:44       ` Johannes Schindelin
  2009-01-14 15:55         ` Sverre Rabbelier
@ 2009-01-14 16:18         ` Samuel Tardieu
  2009-01-14 16:53           ` Jeff King
  1 sibling, 1 reply; 16+ messages in thread
From: Samuel Tardieu @ 2009-01-14 16:18 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: Clemens Buchacher, git, Junio C Hamano, johannes

>>>>> "Johannes" == Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

Dscho> I would prefer something like this:

My taste would favor:

  static int has_special(const char *p)
  {
    for (; *p; p++)
      if (isspecial(*p))
        return 1;
    return 0;
  }

as it underlines the intent (loop over "p" characters and stop no
later than the end of the string) while avoiding using side effects in
the body to increment the pointer. This habit comes from Ada, where
loop indices are considered read-only in the loop body.

It also eases further extensions such as

  static int has_special(const char *p)
  {
    for (; *p; p++)
      if (isspecial(*p) || isveryspecial(*p))
        return 1;
    return 0;
  }

without having to move the "++" somewhere else.

Dscho> but that is probably a matter of taste.

Agreed.

  Sam
-- 
Samuel Tardieu -- sam@rfc1149.net -- http://www.rfc1149.net/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] implement pattern matching in ce_path_match
  2009-01-14 16:18         ` Samuel Tardieu
@ 2009-01-14 16:53           ` Jeff King
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2009-01-14 16:53 UTC (permalink / raw
  To: Samuel Tardieu
  Cc: Johannes Schindelin, Clemens Buchacher, git, Junio C Hamano,
	johannes

On Wed, Jan 14, 2009 at 05:18:39PM +0100, Samuel Tardieu wrote:

> My taste would favor:
> 
>   static int has_special(const char *p)
>   {
>     for (; *p; p++)
>       if (isspecial(*p))
>         return 1;
>     return 0;
>   }

That was my first thought upon reading the other two, as well. And I
agree with all of the reasoning you gave.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] implement pattern matching in ce_path_match
  2009-01-14 14:54     ` [PATCH 3/3] implement pattern matching in ce_path_match Clemens Buchacher
  2009-01-14 15:25       ` Clemens Buchacher
  2009-01-14 15:44       ` Johannes Schindelin
@ 2009-01-14 18:39       ` Junio C Hamano
  2009-01-14 19:23         ` Clemens Buchacher
  2009-01-16  2:51         ` Junio C Hamano
  2 siblings, 2 replies; 16+ messages in thread
From: Junio C Hamano @ 2009-01-14 18:39 UTC (permalink / raw
  To: Clemens Buchacher; +Cc: git, johannes

Clemens Buchacher <drizzd@aon.at> writes:

> With this patch ce_path_match uses match_pathspec in order to perform
> pattern matching.

We have two conflicting definitions of pattern matching in our system.
I'd make it more explicit which kind of pattern matching you are talking
about here.

The family of operations based on the diff-tree machinery (e.g. path
limited revision walking "git log A..B -- dir1/dir2") define the pattern
matching as "leading path match (exact match is just a special case of
this)".  Other operations that work on paths in the work tree and the
index (e.g. grep, ls-files) uses "leading path match, but fall back to
globbing".

In the longer term we really should unify them by teaching the former to
fall back to globbing without getting undue performance hit, and this
patch may be a step in the right direction.  There are optimizations that
assume the "leading path" semantics to trim the input early and avoid
opening and descending into a tree object if pathspec patterns cannot
possibly match (see tree-diff.c::tree_entry_interesting() for an example),
and we need to teach them to notice a glob wildcard in an earlier part of
a pathspec and to descend into some trees that they would have skipped
with the old definition of pathspec.

> @@ -49,7 +60,7 @@ static void cmd_log_init(int argc, const char **argv, const char *prefix,
>  		rev->always_show_header = 0;
>  	if (DIFF_OPT_TST(&rev->diffopt, FOLLOW_RENAMES)) {
>  		rev->always_show_header = 0;
> -		if (rev->diffopt.nr_paths != 1)
> +		if (rev->diffopt.nr_paths != 1 || has_special(rev->diffopt.paths[0]))
>  			usage("git logs can only follow renames on one pathname at a time");
>  	}

The reason match_pathspec() first tries exact match and then falls back to
globbing is so that the user can say "I have a file whose name ends with a
question mark, please match it literally."  This patch defeats it, but it
probably is a minor point.

1/3 and 2/3 in the series looked good.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] implement pattern matching in ce_path_match
  2009-01-14 18:39       ` Junio C Hamano
@ 2009-01-14 19:23         ` Clemens Buchacher
  2009-01-14 22:27           ` Junio C Hamano
  2009-01-16  2:51         ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Clemens Buchacher @ 2009-01-14 19:23 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, johannes

On Wed, Jan 14, 2009 at 10:39:29AM -0800, Junio C Hamano wrote:
> Clemens Buchacher <drizzd@aon.at> writes:
> 
> > With this patch ce_path_match uses match_pathspec in order to perform
> > pattern matching.
> 
> We have two conflicting definitions of pattern matching in our system.
> I'd make it more explicit which kind of pattern matching you are talking
> about here.

Right, will fix.

> In the longer term we really should unify them by teaching the former to
> fall back to globbing without getting undue performance hit, and this
> patch may be a step in the right direction.  There are optimizations that
> assume the "leading path" semantics to trim the input early and avoid
> opening and descending into a tree object if pathspec patterns cannot
> possibly match (see tree-diff.c::tree_entry_interesting() for an example),
> and we need to teach them to notice a glob wildcard in an earlier part of
> a pathspec and to descend into some trees that they would have skipped
> with the old definition of pathspec.

I see. I can probably fix that this weekend.

> > @@ -49,7 +60,7 @@ static void cmd_log_init(int argc, const char **argv, const char *prefix,
> >  		rev->always_show_header = 0;
> >  	if (DIFF_OPT_TST(&rev->diffopt, FOLLOW_RENAMES)) {
> >  		rev->always_show_header = 0;
> > -		if (rev->diffopt.nr_paths != 1)
> > +		if (rev->diffopt.nr_paths != 1 || has_special(rev->diffopt.paths[0]))
> >  			usage("git logs can only follow renames on one pathname at a time");
> >  	}
> 
> The reason match_pathspec() first tries exact match and then falls back to
> globbing is so that the user can say "I have a file whose name ends with a
> question mark, please match it literally."  This patch defeats it, but it
> probably is a minor point.

I was wondering actually if we should disallow such paths altogether,
since there would be no way to match only 'a?', if something like 'ab' also
exists. So if you added 'a?' by accident, you cannot even remove it without
also removing 'ab'.

I think we could at least add an option to disable globbing. Then we can
also disable the above check conditioned on that. If we allowed globbing
pattern for following renames wouldn't that result in following the first
file (or last in history) to match the pattern, which is potentially
confusing?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] implement pattern matching in ce_path_match
  2009-01-14 19:23         ` Clemens Buchacher
@ 2009-01-14 22:27           ` Junio C Hamano
  2009-01-15  8:20             ` Clemens Buchacher
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2009-01-14 22:27 UTC (permalink / raw
  To: Clemens Buchacher; +Cc: git, johannes

Clemens Buchacher <drizzd@aon.at> writes:

> I think we could at least add an option to disable globbing. Then we can
> also disable the above check conditioned on that. If we allowed globbing
> pattern for following renames wouldn't that result in following the first
> file (or last in history) to match the pattern, which is potentially
> confusing?

Yeah, I agree that would be a reasonable thing to do.  

In places we read paths from the index or from the work tree and add them
as pathspec elements---you would want to mark them as non-globbing, too.
Which probably means that "is it Ok to glob this" setting has to be per
pathspec array elements.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] implement pattern matching in ce_path_match
  2009-01-14 22:27           ` Junio C Hamano
@ 2009-01-15  8:20             ` Clemens Buchacher
  0 siblings, 0 replies; 16+ messages in thread
From: Clemens Buchacher @ 2009-01-15  8:20 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, johannes

On Wed, Jan 14, 2009 at 02:27:03PM -0800, Junio C Hamano wrote:
> In places we read paths from the index or from the work tree and add them
> as pathspec elements---you would want to mark them as non-globbing, too.
> Which probably means that "is it Ok to glob this" setting has to be per
> pathspec array elements.

Right. This certainly complicates things. Also note that this invalidates
1/3, because even if '?' matched exactly, it can still match '*', and vice
versa. Depending on ordering one of these two cases would pose a problem.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] implement pattern matching in ce_path_match
  2009-01-14 18:39       ` Junio C Hamano
  2009-01-14 19:23         ` Clemens Buchacher
@ 2009-01-16  2:51         ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2009-01-16  2:51 UTC (permalink / raw
  To: Clemens Buchacher; +Cc: git, johannes

Junio C Hamano <gitster@pobox.com> writes:

> Clemens Buchacher <drizzd@aon.at> writes:
>
>> With this patch ce_path_match uses match_pathspec in order to perform
>> pattern matching.
>
> We have two conflicting definitions of pattern matching in our system.
> I'd make it more explicit which kind of pattern matching you are talking
> about here.
>
> The family of operations based on the diff-tree machinery (e.g. path
> limited revision walking "git log A..B -- dir1/dir2") define the pattern
> matching as "leading path match (exact match is just a special case of
> this)".  Other operations that work on paths in the work tree and the
> index (e.g. grep, ls-files) uses "leading path match, but fall back to
> globbing".
>
> In the longer term we really should unify them by teaching the former to
> fall back to globbing without getting undue performance hit, and this
> patch may be a step in the right direction.  There are optimizations that
> assume the "leading path" semantics to trim the input early and avoid
> opening and descending into a tree object if pathspec patterns cannot
> possibly match (see tree-diff.c::tree_entry_interesting() for an example),
> and we need to teach them to notice a glob wildcard in an earlier part of
> a pathspec and to descend into some trees that they would have skipped
> with the old definition of pathspec.

Actually there was an earlier attempt that resulted in the pathspec
matching tree traverser builtin-grep uses.  Even though it has to work
with trees (when grepping inside a tree-ish) and has optimizations not to
open unnecessary subtrees similar to the one the diff-tree machinery has,
it also knows how to handle globs.  If we were to pick one of existing
implementations for the longer term unification, I think that is probably
the one we should build on top of.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2009-01-16  2:52 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-01-14 14:54 [PATCH 0/3] fix "git add" pattern matching Clemens Buchacher
2009-01-14 14:54 ` [PATCH 1/3] clean up pathspec matching Clemens Buchacher
2009-01-14 14:54   ` [PATCH 2/3] remove pathspec_match, use match_pathspec instead Clemens Buchacher
2009-01-14 14:54     ` [PATCH 3/3] implement pattern matching in ce_path_match Clemens Buchacher
2009-01-14 15:25       ` Clemens Buchacher
2009-01-14 15:44       ` Johannes Schindelin
2009-01-14 15:55         ` Sverre Rabbelier
2009-01-14 16:18         ` Samuel Tardieu
2009-01-14 16:53           ` Jeff King
2009-01-14 18:39       ` Junio C Hamano
2009-01-14 19:23         ` Clemens Buchacher
2009-01-14 22:27           ` Junio C Hamano
2009-01-15  8:20             ` Clemens Buchacher
2009-01-16  2:51         ` Junio C Hamano
2009-01-14 15:40     ` [PATCH 2/3] remove pathspec_match, use match_pathspec instead Johannes Schindelin
2009-01-14 16:09       ` Clemens Buchacher

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).