git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1])
@ 2010-12-13  9:46 Nguyễn Thái Ngọc Duy
  2010-12-13  9:46 ` [PATCH 01/19] Add struct pathspec Nguyễn Thái Ngọc Duy
                   ` (18 more replies)
  0 siblings, 19 replies; 45+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-12-13  9:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

Background:

pathspecs in git can be handled differently in three places

 1. log family uses tree_entry_interesting() and ce_path_match()
 2. most index-related operations use match_pathspec()
 3. grep uses its own pathspec_matches()

Out of three, #3 provides the most advanced functionalities, while #1
has a few good optimizations, but not as powerful as #3. #2 is sort of
trade-off between the other two.

This series brings all the #3 goodness to #1 and #2, then kills #3. I
don't want to kill #2 because it takes a list as input, while #1 takes
trees (ce_path_match() takes list though). There could be different
optmizations based on different input type.

Summary of patches:

  Add struct pathspec
  diff-no-index: use diff_tree_setup_paths()
  pathspec: cache string length when initializing pathspec
  Convert struct diff_options to use struct pathspec
  tree_entry_interesting(): remove dependency on struct diff_options
  Move tree_entry_interesting() to tree-walk.c and export it

This is unchanged from nd/struct-pathspec in pu. There is one patch
from pu replaced later.

  glossary: define pathspec

This is what I am aiming to. If I make mistakes, blame Jonathan
because he mis-specifies it ;-)

  pathspec: mark wildcard pathspecs from the beginning

>From old nd/struct-pathspec, to recognize potential wildcard pathspecs
early.

  tree-diff.c: reserve space in "base" for pathname concatenation

The (probably most) used operation in traversing trees is concatenate
dirname and basename into full path (especially for wildcard matching).
This requires a new buffer every time. This patch ensures that the
caller prepares a writable buffer with dirname already filled. If the
callee wants full path, it does not have to allocate another buffer
(and does shorter memcpy).

This patch is not strictly needed though.

  tree_entry_interesting(): factor out most matching logic

For readibility of the next patches.

  tree_entry_interesting: support depth limit

Goodness from #3.

  tree_entry_interesting(): support wildcard matching
  tree_entry_interesting(): optimize fnmatch when base is matched

This is something t_e_i() lacks for so long. However, in order to make
log family commands work properly, ce_path_match() also needs to learn
wildcards.

This changes tree_entry_interesting() interface, therefore breaks
en/object-list-with-pathspec. I'll send fixes shortly.

  Convert ce_path_match() use to match_pathspec()

So that log family now works with wildcards.

  pathspec: add match_pathspec_depth()

This is new match_pathspec(). I don't want to replace the old one
because it changes more places. But once it works, another patch to
kill match_pathspec() should be easy.

  grep: convert to use struct pathspec
  grep: use match_pathspec_depth() for cache grepping
  grep: use preallocated buffer for grep_tree()
  grep: drop pathspec_matches() in favor of tree_entry_interesting()

grep (especially t7810) is how I test all these. I need to write more
tests to make sure things work. But for now t7810 passes.

Hopefully I did not lose any optimizations in pathspec_matches().

It's time to rebase negative pathspec patches on top and get back to
my narrow clone.

[1] https://git.wiki.kernel.org/index.php/SoC2010Ideas#Unify_Pathspec_Semantics

 Documentation/glossary-content.txt |   23 ++++
 builtin/diff-files.c               |    2 +-
 builtin/diff.c                     |    4 +-
 builtin/grep.c                     |  200 ++++++++---------------------
 builtin/log.c                      |    2 +-
 cache.h                            |   14 ++
 diff-lib.c                         |    2 +-
 diff-no-index.c                    |   13 +-
 diff.h                             |    4 +-
 dir.c                              |   98 ++++++++++++++
 dir.h                              |    4 +
 read-cache.c                       |   20 +---
 revision.c                         |    6 +-
 t/t4010-diff-pathspec.sh           |   14 ++
 tree-diff.c                        |  246 ++++++++----------------------------
 tree-walk.c                        |  186 +++++++++++++++++++++++++++
 tree-walk.h                        |    2 +
 17 files changed, 461 insertions(+), 379 deletions(-)

-- 
1.7.3.3.476.g10a82

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 01/19] Add struct pathspec
  2010-12-13  9:46 [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1]) Nguyễn Thái Ngọc Duy
@ 2010-12-13  9:46 ` Nguyễn Thái Ngọc Duy
  2010-12-13 17:31   ` Thiago Farina
  2010-12-13  9:46 ` [PATCH 02/19] diff-no-index: use diff_tree_setup_paths() Nguyễn Thái Ngọc Duy
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 45+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-12-13  9:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

This struct for now is just a wrapper for the current pathspec form:
const char **. It is intended to be extended with more useful
pathspec-related information over time.

The data structure for passing pathspec around remains const char **,
struct pathspec will be initialized locally to be used and destroyed.
Hopefully all pathspec related code will be gradually migrated to pass
this struct instead.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h |    7 +++++++
 dir.c   |   18 ++++++++++++++++++
 2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/cache.h b/cache.h
index 2ef2fa3..3330769 100644
--- a/cache.h
+++ b/cache.h
@@ -493,6 +493,13 @@ extern int index_name_is_other(const struct index_state *, const char *, int);
 extern int ie_match_stat(const struct index_state *, struct cache_entry *, struct stat *, unsigned int);
 extern int ie_modified(const struct index_state *, struct cache_entry *, struct stat *, unsigned int);
 
+struct pathspec {
+	const char **raw; /* get_pathspec() result, not freed by free_pathspec() */
+	int nr;
+};
+
+extern int init_pathspec(struct pathspec *, const char **);
+extern void free_pathspec(struct pathspec *);
 extern int ce_path_match(const struct cache_entry *ce, const char **pathspec);
 extern int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object, enum object_type type, const char *path);
 extern int index_path(unsigned char *sha1, const char *path, struct stat *st, int write_object);
diff --git a/dir.c b/dir.c
index 133f472..205adc4 100644
--- a/dir.c
+++ b/dir.c
@@ -1071,3 +1071,21 @@ int remove_path(const char *name)
 	return 0;
 }
 
+int init_pathspec(struct pathspec *pathspec, const char **paths)
+{
+	const char **p = paths;
+
+	memset(pathspec, 0, sizeof(*pathspec));
+	if (!p)
+		return 0;
+	while (*p)
+		p++;
+	pathspec->raw = paths;
+	pathspec->nr = p - paths;
+	return 0;
+}
+
+void free_pathspec(struct pathspec *pathspec)
+{
+	; /* do nothing */
+}
-- 
1.7.3.3.476.g10a82

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 02/19] diff-no-index: use diff_tree_setup_paths()
  2010-12-13  9:46 [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1]) Nguyễn Thái Ngọc Duy
  2010-12-13  9:46 ` [PATCH 01/19] Add struct pathspec Nguyễn Thái Ngọc Duy
@ 2010-12-13  9:46 ` Nguyễn Thái Ngọc Duy
  2010-12-13  9:46 ` [PATCH 03/19] pathspec: cache string length when initializing pathspec Nguyễn Thái Ngọc Duy
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 45+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-12-13  9:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

diff_options.{paths,nr_paths} will be removed later. Do not
modify them directly.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 diff-no-index.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/diff-no-index.c b/diff-no-index.c
index ce9e783..e48ab92 100644
--- a/diff-no-index.c
+++ b/diff-no-index.c
@@ -231,8 +231,9 @@ void diff_no_index(struct rev_info *revs,
 
 	if (prefix) {
 		int len = strlen(prefix);
+		const char *paths[3];
+		memset(paths, 0, sizeof(paths));
 
-		revs->diffopt.paths = xcalloc(2, sizeof(char *));
 		for (i = 0; i < 2; i++) {
 			const char *p = argv[argc - 2 + i];
 			/*
@@ -242,12 +243,12 @@ void diff_no_index(struct rev_info *revs,
 			p = (strcmp(p, "-")
 			     ? xstrdup(prefix_filename(prefix, len, p))
 			     : p);
-			revs->diffopt.paths[i] = p;
+			paths[i] = p;
 		}
+		diff_tree_setup_paths(paths, &revs->diffopt);
 	}
 	else
-		revs->diffopt.paths = argv + argc - 2;
-	revs->diffopt.nr_paths = 2;
+		diff_tree_setup_paths(argv + argc - 2, &revs->diffopt);
 	revs->diffopt.skip_stat_unmatch = 1;
 	if (!revs->diffopt.output_format)
 		revs->diffopt.output_format = DIFF_FORMAT_PATCH;
-- 
1.7.3.3.476.g10a82

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 03/19] pathspec: cache string length when initializing pathspec
  2010-12-13  9:46 [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1]) Nguyễn Thái Ngọc Duy
  2010-12-13  9:46 ` [PATCH 01/19] Add struct pathspec Nguyễn Thái Ngọc Duy
  2010-12-13  9:46 ` [PATCH 02/19] diff-no-index: use diff_tree_setup_paths() Nguyễn Thái Ngọc Duy
@ 2010-12-13  9:46 ` Nguyễn Thái Ngọc Duy
  2010-12-13  9:46 ` [PATCH 04/19] Convert struct diff_options to use struct pathspec Nguyễn Thái Ngọc Duy
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 45+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-12-13  9:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

This field will be used when tree_entry_interesting() is converted to
use struct pathspec. Currently it uses pathlens[] in struct
diff_options to avoid calculating string over and over again.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h |    3 +++
 dir.c   |   13 ++++++++++++-
 2 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/cache.h b/cache.h
index 3330769..36819b6 100644
--- a/cache.h
+++ b/cache.h
@@ -496,6 +496,9 @@ extern int ie_modified(const struct index_state *, struct cache_entry *, struct
 struct pathspec {
 	const char **raw; /* get_pathspec() result, not freed by free_pathspec() */
 	int nr;
+	struct pathspec_item {
+		int len;
+	} *items;
 };
 
 extern int init_pathspec(struct pathspec *, const char **);
diff --git a/dir.c b/dir.c
index 205adc4..646c79f 100644
--- a/dir.c
+++ b/dir.c
@@ -1074,6 +1074,7 @@ int remove_path(const char *name)
 int init_pathspec(struct pathspec *pathspec, const char **paths)
 {
 	const char **p = paths;
+	int i;
 
 	memset(pathspec, 0, sizeof(*pathspec));
 	if (!p)
@@ -1082,10 +1083,20 @@ int init_pathspec(struct pathspec *pathspec, const char **paths)
 		p++;
 	pathspec->raw = paths;
 	pathspec->nr = p - paths;
+	if (!pathspec->nr)
+		return 0;
+
+	pathspec->items = xmalloc(sizeof(struct pathspec_item)*pathspec->nr);
+	for (i = 0; i < pathspec->nr; i++) {
+		struct pathspec_item *item = pathspec->items+i;
+
+		item->len = strlen(paths[i]);
+	}
 	return 0;
 }
 
 void free_pathspec(struct pathspec *pathspec)
 {
-	; /* do nothing */
+	free(pathspec->items);
+	pathspec->items = NULL;
 }
-- 
1.7.3.3.476.g10a82

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 04/19] Convert struct diff_options to use struct pathspec
  2010-12-13  9:46 [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1]) Nguyễn Thái Ngọc Duy
                   ` (2 preceding siblings ...)
  2010-12-13  9:46 ` [PATCH 03/19] pathspec: cache string length when initializing pathspec Nguyễn Thái Ngọc Duy
@ 2010-12-13  9:46 ` Nguyễn Thái Ngọc Duy
  2010-12-13 19:00   ` Junio C Hamano
  2010-12-13  9:46 ` [PATCH 05/19] tree_entry_interesting(): remove dependency on struct diff_options Nguyễn Thái Ngọc Duy
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 45+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-12-13  9:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/diff-files.c |    2 +-
 builtin/diff.c       |    4 ++--
 builtin/log.c        |    2 +-
 diff-lib.c           |    2 +-
 diff-no-index.c      |    4 ++--
 diff.h               |    4 +---
 revision.c           |    6 +-----
 tree-diff.c          |   46 +++++++++++-----------------------------------
 8 files changed, 20 insertions(+), 50 deletions(-)

diff --git a/builtin/diff-files.c b/builtin/diff-files.c
index 951c7c8..46085f8 100644
--- a/builtin/diff-files.c
+++ b/builtin/diff-files.c
@@ -61,7 +61,7 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix)
 	    (rev.diffopt.output_format & DIFF_FORMAT_PATCH))
 		rev.combine_merges = rev.dense_combined_merges = 1;
 
-	if (read_cache_preload(rev.diffopt.paths) < 0) {
+	if (read_cache_preload(rev.diffopt.pathspec.raw) < 0) {
 		perror("read_cache_preload");
 		return -1;
 	}
diff --git a/builtin/diff.c b/builtin/diff.c
index a43d326..76c42d8 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -135,7 +135,7 @@ static int builtin_diff_index(struct rev_info *revs,
 	    revs->max_count != -1 || revs->min_age != -1 ||
 	    revs->max_age != -1)
 		usage(builtin_diff_usage);
-	if (read_cache_preload(revs->diffopt.paths) < 0) {
+	if (read_cache_preload(revs->diffopt.pathspec.raw) < 0) {
 		perror("read_cache_preload");
 		return -1;
 	}
@@ -237,7 +237,7 @@ static int builtin_diff_files(struct rev_info *revs, int argc, const char **argv
 		revs->combine_merges = revs->dense_combined_merges = 1;
 
 	setup_work_tree();
-	if (read_cache_preload(revs->diffopt.paths) < 0) {
+	if (read_cache_preload(revs->diffopt.pathspec.raw) < 0) {
 		perror("read_cache_preload");
 		return -1;
 	}
diff --git a/builtin/log.c b/builtin/log.c
index eaa1ee0..92779a5 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -89,7 +89,7 @@ static void cmd_log_init(int argc, const char **argv, const char *prefix,
 		rev->always_show_header = 0;
 	if (DIFF_OPT_TST(&rev->diffopt, FOLLOW_RENAMES)) {
 		rev->always_show_header = 0;
-		if (rev->diffopt.nr_paths != 1)
+		if (rev->diffopt.pathspec.nr != 1)
 			usage("git logs can only follow renames on one pathname at a time");
 	}
 	for (i = 1; i < argc; i++) {
diff --git a/diff-lib.c b/diff-lib.c
index 392ce2b..3b809f2 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -501,7 +501,7 @@ int do_diff_cache(const unsigned char *tree_sha1, struct diff_options *opt)
 	active_nr = dst - active_cache;
 
 	init_revisions(&revs, NULL);
-	revs.prune_data = opt->paths;
+	revs.prune_data = opt->pathspec.raw;
 	tree = parse_tree_indirect(tree_sha1);
 	if (!tree)
 		die("bad tree object %s", sha1_to_hex(tree_sha1));
diff --git a/diff-no-index.c b/diff-no-index.c
index e48ab92..3a36144 100644
--- a/diff-no-index.c
+++ b/diff-no-index.c
@@ -260,8 +260,8 @@ void diff_no_index(struct rev_info *revs,
 	if (diff_setup_done(&revs->diffopt) < 0)
 		die("diff_setup_done failed");
 
-	if (queue_diff(&revs->diffopt, revs->diffopt.paths[0],
-		       revs->diffopt.paths[1]))
+	if (queue_diff(&revs->diffopt, revs->diffopt.pathspec.raw[0],
+		       revs->diffopt.pathspec.raw[1]))
 		exit(1);
 	diff_set_mnemonic_prefix(&revs->diffopt, "1/", "2/");
 	diffcore_std(&revs->diffopt);
diff --git a/diff.h b/diff.h
index bf2f44d..6497b71 100644
--- a/diff.h
+++ b/diff.h
@@ -133,9 +133,7 @@ struct diff_options {
 	FILE *file;
 	int close_file;
 
-	int nr_paths;
-	const char **paths;
-	int *pathlens;
+	struct pathspec pathspec;
 	change_fn_t change;
 	add_remove_fn_t add_remove;
 	diff_format_fn_t format_callback;
diff --git a/revision.c b/revision.c
index b1c1890..b2a5867 100644
--- a/revision.c
+++ b/revision.c
@@ -553,11 +553,7 @@ static void cherry_pick_list(struct commit_list *list, struct rev_info *revs)
 
 	left_first = left_count < right_count;
 	init_patch_ids(&ids);
-	if (revs->diffopt.nr_paths) {
-		ids.diffopts.nr_paths = revs->diffopt.nr_paths;
-		ids.diffopts.paths = revs->diffopt.paths;
-		ids.diffopts.pathlens = revs->diffopt.pathlens;
-	}
+	ids.diffopts.pathspec = revs->diffopt.pathspec;
 
 	/* Compute patch-ids for one side */
 	for (p = list; p; p = p->next) {
diff --git a/tree-diff.c b/tree-diff.c
index cd659c6..986c0f4 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -100,16 +100,16 @@ static int tree_entry_interesting(struct tree_desc *desc, const char *base, int
 	int pathlen;
 	int never_interesting = -1;
 
-	if (!opt->nr_paths)
+	if (!opt->pathspec.nr)
 		return 1;
 
 	sha1 = tree_entry_extract(desc, &path, &mode);
 
 	pathlen = tree_entry_len(path, sha1);
 
-	for (i = 0; i < opt->nr_paths; i++) {
-		const char *match = opt->paths[i];
-		int matchlen = opt->pathlens[i];
+	for (i = 0; i < opt->pathspec.nr; i++) {
+		const char *match = opt->pathspec.raw[i];
+		int matchlen = opt->pathspec.items[i].len;
 		int m = -1; /* signals that we haven't called strncmp() */
 
 		if (baselen >= matchlen) {
@@ -289,7 +289,7 @@ int diff_tree(struct tree_desc *t1, struct tree_desc *t2, const char *base, stru
 		if (DIFF_OPT_TST(opt, QUICK) &&
 		    DIFF_OPT_TST(opt, HAS_CHANGES))
 			break;
-		if (opt->nr_paths) {
+		if (opt->pathspec.nr) {
 			skip_uninteresting(t1, base, baselen, opt);
 			skip_uninteresting(t2, base, baselen, opt);
 		}
@@ -348,7 +348,7 @@ static void try_to_follow_renames(struct tree_desc *t1, struct tree_desc *t2, co
 	DIFF_OPT_SET(&diff_opts, RECURSIVE);
 	DIFF_OPT_SET(&diff_opts, FIND_COPIES_HARDER);
 	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
-	diff_opts.single_follow = opt->paths[0];
+	diff_opts.single_follow = opt->pathspec.raw[0];
 	diff_opts.break_opt = opt->break_opt;
 	paths[0] = NULL;
 	diff_tree_setup_paths(paths, &diff_opts);
@@ -368,15 +368,15 @@ static void try_to_follow_renames(struct tree_desc *t1, struct tree_desc *t2, co
 		 * diff_queued_diff, we will also use that as the path in
 		 * the future!
 		 */
-		if ((p->status == 'R' || p->status == 'C') && !strcmp(p->two->path, opt->paths[0])) {
+		if ((p->status == 'R' || p->status == 'C') && !strcmp(p->two->path, opt->pathspec.raw[0])) {
 			/* Switch the file-pairs around */
 			q->queue[i] = choice;
 			choice = p;
 
 			/* Update the path we use from now on.. */
 			diff_tree_release_paths(opt);
-			opt->paths[0] = xstrdup(p->one->path);
-			diff_tree_setup_paths(opt->paths, opt);
+			opt->pathspec.raw[0] = xstrdup(p->one->path);
+			diff_tree_setup_paths(opt->pathspec.raw, opt);
 
 			/*
 			 * The caller expects us to return a set of vanilla
@@ -451,36 +451,12 @@ int diff_root_tree_sha1(const unsigned char *new, const char *base, struct diff_
 	return retval;
 }
 
-static int count_paths(const char **paths)
-{
-	int i = 0;
-	while (*paths++)
-		i++;
-	return i;
-}
-
 void diff_tree_release_paths(struct diff_options *opt)
 {
-	free(opt->pathlens);
+	free_pathspec(&opt->pathspec);
 }
 
 void diff_tree_setup_paths(const char **p, struct diff_options *opt)
 {
-	opt->nr_paths = 0;
-	opt->pathlens = NULL;
-	opt->paths = NULL;
-
-	if (p) {
-		int i;
-
-		opt->paths = p;
-		opt->nr_paths = count_paths(p);
-		if (opt->nr_paths == 0) {
-			opt->pathlens = NULL;
-			return;
-		}
-		opt->pathlens = xmalloc(opt->nr_paths * sizeof(int));
-		for (i=0; i < opt->nr_paths; i++)
-			opt->pathlens[i] = strlen(p[i]);
-	}
+	init_pathspec(&opt->pathspec, p);
 }
-- 
1.7.3.3.476.g10a82

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 05/19] tree_entry_interesting(): remove dependency on struct diff_options
  2010-12-13  9:46 [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1]) Nguyễn Thái Ngọc Duy
                   ` (3 preceding siblings ...)
  2010-12-13  9:46 ` [PATCH 04/19] Convert struct diff_options to use struct pathspec Nguyễn Thái Ngọc Duy
@ 2010-12-13  9:46 ` Nguyễn Thái Ngọc Duy
  2010-12-13 19:11   ` Junio C Hamano
  2010-12-13  9:46 ` [PATCH 06/19] Move tree_entry_interesting() to tree-walk.c and export it Nguyễn Thái Ngọc Duy
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 45+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-12-13  9:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

This function can be potentially used in more places than just
tree-diff.c. "struct diff_options" does not make much sense outside
diff_tree_sha1().

While removing the use of diff_options, it also removes
tree_entry_extract() call, which means S_ISDIR() uses the entry->mode
directly, without being filtered by canon_mode() (called internally
inside tree_entry_extract)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 tree-diff.c |   28 +++++++++++-----------------
 1 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/tree-diff.c b/tree-diff.c
index 986c0f4..822d45e 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -91,25 +91,20 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const
  *  - zero for no
  *  - negative for "no, and no subsequent entries will be either"
  */
-static int tree_entry_interesting(struct tree_desc *desc, const char *base, int baselen, struct diff_options *opt)
+static int tree_entry_interesting(const struct name_entry *entry, const char *base, int baselen, const struct pathspec *ps)
 {
-	const char *path;
-	const unsigned char *sha1;
-	unsigned mode;
 	int i;
 	int pathlen;
 	int never_interesting = -1;
 
-	if (!opt->pathspec.nr)
+	if (!ps || !ps->nr)
 		return 1;
 
-	sha1 = tree_entry_extract(desc, &path, &mode);
-
-	pathlen = tree_entry_len(path, sha1);
+	pathlen = tree_entry_len(entry->path, entry->sha1);
 
-	for (i = 0; i < opt->pathspec.nr; i++) {
-		const char *match = opt->pathspec.raw[i];
-		int matchlen = opt->pathspec.items[i].len;
+	for (i = 0; i < ps->nr; i++) {
+		const char *match = ps->raw[i];
+		int matchlen = ps->items[i].len;
 		int m = -1; /* signals that we haven't called strncmp() */
 
 		if (baselen >= matchlen) {
@@ -147,7 +142,7 @@ static int tree_entry_interesting(struct tree_desc *desc, const char *base, int
 			 * Does match sort strictly earlier than path
 			 * with their common parts?
 			 */
-			m = strncmp(match, path,
+			m = strncmp(match, entry->path,
 				    (matchlen < pathlen) ? matchlen : pathlen);
 			if (m < 0)
 				continue;
@@ -174,7 +169,7 @@ static int tree_entry_interesting(struct tree_desc *desc, const char *base, int
 		if (matchlen > pathlen) {
 			if (match[pathlen] != '/')
 				continue;
-			if (!S_ISDIR(mode))
+			if (!S_ISDIR(entry->mode))
 				continue;
 		}
 
@@ -183,7 +178,7 @@ static int tree_entry_interesting(struct tree_desc *desc, const char *base, int
 			 * we cheated and did not do strncmp(), so we do
 			 * that here.
 			 */
-			m = strncmp(match, path, pathlen);
+			m = strncmp(match, entry->path, pathlen);
 
 		/*
 		 * If common part matched earlier then it is a hit,
@@ -206,8 +201,7 @@ static void show_tree(struct diff_options *opt, const char *prefix, struct tree_
 		if (all_interesting)
 			show = 1;
 		else {
-			show = tree_entry_interesting(desc, base, baselen,
-						      opt);
+			show = tree_entry_interesting(&desc->entry, base, baselen, &opt->pathspec);
 			if (show == 2)
 				all_interesting = 1;
 		}
@@ -266,7 +260,7 @@ static void skip_uninteresting(struct tree_desc *t, const char *base, int basele
 		if (all_interesting)
 			show = 1;
 		else {
-			show = tree_entry_interesting(t, base, baselen, opt);
+			show = tree_entry_interesting(&t->entry, base, baselen, &opt->pathspec);
 			if (show == 2)
 				all_interesting = 1;
 		}
-- 
1.7.3.3.476.g10a82

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 06/19] Move tree_entry_interesting() to tree-walk.c and export it
  2010-12-13  9:46 [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1]) Nguyễn Thái Ngọc Duy
                   ` (4 preceding siblings ...)
  2010-12-13  9:46 ` [PATCH 05/19] tree_entry_interesting(): remove dependency on struct diff_options Nguyễn Thái Ngọc Duy
@ 2010-12-13  9:46 ` Nguyễn Thái Ngọc Duy
  2010-12-13  9:46 ` [PATCH 07/19] glossary: define pathspec Nguyễn Thái Ngọc Duy
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 45+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-12-13  9:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 tree-diff.c |  109 ---------------------------------------------------------
 tree-walk.c |  111 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tree-walk.h |    2 +
 3 files changed, 113 insertions(+), 109 deletions(-)

diff --git a/tree-diff.c b/tree-diff.c
index 822d45e..50d7e6d 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -82,115 +82,6 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const
 	return 0;
 }
 
-/*
- * Is a tree entry interesting given the pathspec we have?
- *
- * Return:
- *  - 2 for "yes, and all subsequent entries will be"
- *  - 1 for yes
- *  - zero for no
- *  - negative for "no, and no subsequent entries will be either"
- */
-static int tree_entry_interesting(const struct name_entry *entry, const char *base, int baselen, const struct pathspec *ps)
-{
-	int i;
-	int pathlen;
-	int never_interesting = -1;
-
-	if (!ps || !ps->nr)
-		return 1;
-
-	pathlen = tree_entry_len(entry->path, entry->sha1);
-
-	for (i = 0; i < ps->nr; i++) {
-		const char *match = ps->raw[i];
-		int matchlen = ps->items[i].len;
-		int m = -1; /* signals that we haven't called strncmp() */
-
-		if (baselen >= matchlen) {
-			/* If it doesn't match, move along... */
-			if (strncmp(base, match, matchlen))
-				continue;
-
-			/*
-			 * If the base is a subdirectory of a path which
-			 * was specified, all of them are interesting.
-			 */
-			if (!matchlen ||
-			    base[matchlen] == '/' ||
-			    match[matchlen - 1] == '/')
-				return 2;
-
-			/* Just a random prefix match */
-			continue;
-		}
-
-		/* Does the base match? */
-		if (strncmp(base, match, baselen))
-			continue;
-
-		match += baselen;
-		matchlen -= baselen;
-
-		if (never_interesting) {
-			/*
-			 * We have not seen any match that sorts later
-			 * than the current path.
-			 */
-
-			/*
-			 * Does match sort strictly earlier than path
-			 * with their common parts?
-			 */
-			m = strncmp(match, entry->path,
-				    (matchlen < pathlen) ? matchlen : pathlen);
-			if (m < 0)
-				continue;
-
-			/*
-			 * If we come here even once, that means there is at
-			 * least one pathspec that would sort equal to or
-			 * later than the path we are currently looking at.
-			 * In other words, if we have never reached this point
-			 * after iterating all pathspecs, it means all
-			 * pathspecs are either outside of base, or inside the
-			 * base but sorts strictly earlier than the current
-			 * one.  In either case, they will never match the
-			 * subsequent entries.  In such a case, we initialized
-			 * the variable to -1 and that is what will be
-			 * returned, allowing the caller to terminate early.
-			 */
-			never_interesting = 0;
-		}
-
-		if (pathlen > matchlen)
-			continue;
-
-		if (matchlen > pathlen) {
-			if (match[pathlen] != '/')
-				continue;
-			if (!S_ISDIR(entry->mode))
-				continue;
-		}
-
-		if (m == -1)
-			/*
-			 * we cheated and did not do strncmp(), so we do
-			 * that here.
-			 */
-			m = strncmp(match, entry->path, pathlen);
-
-		/*
-		 * If common part matched earlier then it is a hit,
-		 * because we rejected the case where path is not a
-		 * leading directory and is shorter than match.
-		 */
-		if (!m)
-			return 1;
-	}
-	return never_interesting; /* No matches */
-}
-
 /* A whole sub-tree went away or appeared */
 static void show_tree(struct diff_options *opt, const char *prefix, struct tree_desc *desc, const char *base, int baselen)
 {
diff --git a/tree-walk.c b/tree-walk.c
index a9bbf4e..01168ea 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -455,3 +455,114 @@ int get_tree_entry(const unsigned char *tree_sha1, const char *name, unsigned ch
 	free(tree);
 	return retval;
 }
+
+/*
+ * Is a tree entry interesting given the pathspec we have?
+ *
+ * Return:
+ *  - 2 for "yes, and all subsequent entries will be"
+ *  - 1 for yes
+ *  - zero for no
+ *  - negative for "no, and no subsequent entries will be either"
+ */
+int tree_entry_interesting(const struct name_entry *entry,
+			   const char *base, int baselen,
+			   const struct pathspec *ps)
+{
+	int i;
+	int pathlen;
+	int never_interesting = -1;
+
+	if (!ps || !ps->nr)
+		return 1;
+
+	pathlen = tree_entry_len(entry->path, entry->sha1);
+
+	for (i = 0; i < ps->nr; i++) {
+		const char *match = ps->raw[i];
+		int matchlen = ps->items[i].len;
+		int m = -1; /* signals that we haven't called strncmp() */
+
+		if (baselen >= matchlen) {
+			/* If it doesn't match, move along... */
+			if (strncmp(base, match, matchlen))
+				continue;
+
+			/*
+			 * If the base is a subdirectory of a path which
+			 * was specified, all of them are interesting.
+			 */
+			if (!matchlen ||
+			    base[matchlen] == '/' ||
+			    match[matchlen - 1] == '/')
+				return 2;
+
+			/* Just a random prefix match */
+			continue;
+		}
+
+		/* Does the base match? */
+		if (strncmp(base, match, baselen))
+			continue;
+
+		match += baselen;
+		matchlen -= baselen;
+
+		if (never_interesting) {
+			/*
+			 * We have not seen any match that sorts later
+			 * than the current path.
+			 */
+
+			/*
+			 * Does match sort strictly earlier than path
+			 * with their common parts?
+			 */
+			m = strncmp(match, entry->path,
+				    (matchlen < pathlen) ? matchlen : pathlen);
+			if (m < 0)
+				continue;
+
+			/*
+			 * If we come here even once, that means there is at
+			 * least one pathspec that would sort equal to or
+			 * later than the path we are currently looking at.
+			 * In other words, if we have never reached this point
+			 * after iterating all pathspecs, it means all
+			 * pathspecs are either outside of base, or inside the
+			 * base but sorts strictly earlier than the current
+			 * one.  In either case, they will never match the
+			 * subsequent entries.  In such a case, we initialized
+			 * the variable to -1 and that is what will be
+			 * returned, allowing the caller to terminate early.
+			 */
+			never_interesting = 0;
+		}
+
+		if (pathlen > matchlen)
+			continue;
+
+		if (matchlen > pathlen) {
+			if (match[pathlen] != '/')
+				continue;
+			if (!S_ISDIR(entry->mode))
+				continue;
+		}
+
+		if (m == -1)
+			/*
+			 * we cheated and did not do strncmp(), so we do
+			 * that here.
+			 */
+			m = strncmp(match, entry->path, pathlen);
+
+		/*
+		 * If common part matched earlier then it is a hit,
+		 * because we rejected the case where path is not a
+		 * leading directory and is shorter than match.
+		 */
+		if (!m)
+			return 1;
+	}
+	return never_interesting; /* No matches */
+}
diff --git a/tree-walk.h b/tree-walk.h
index 7e3e0b5..c12f0a2 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -60,4 +60,6 @@ static inline int traverse_path_len(const struct traverse_info *info, const stru
 	return info->pathlen + tree_entry_len(n->path, n->sha1);
 }
 
+extern int tree_entry_interesting(const struct name_entry *, const char *, int, const struct pathspec *ps);
+
 #endif
-- 
1.7.3.3.476.g10a82

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 07/19] glossary: define pathspec
  2010-12-13  9:46 [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1]) Nguyễn Thái Ngọc Duy
                   ` (5 preceding siblings ...)
  2010-12-13  9:46 ` [PATCH 06/19] Move tree_entry_interesting() to tree-walk.c and export it Nguyễn Thái Ngọc Duy
@ 2010-12-13  9:46 ` Nguyễn Thái Ngọc Duy
  2010-12-13  9:46 ` [PATCH 08/19] pathspec: mark wildcard pathspecs from the beginning Nguyễn Thái Ngọc Duy
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 45+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-12-13  9:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jonathan Nieder,
	Nguyễn Thái Ngọc Duy

From: Jonathan Nieder <jrnieder@gmail.com>


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/glossary-content.txt |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/Documentation/glossary-content.txt b/Documentation/glossary-content.txt
index 1f029f8..4ed2a28 100644
--- a/Documentation/glossary-content.txt
+++ b/Documentation/glossary-content.txt
@@ -273,6 +273,29 @@ This commit is referred to as a "merge commit", or sometimes just a
 	<<def_pack,pack>>, to assist in efficiently accessing the contents of a
 	pack.
 
+[[def_pathspec]]pathspec::
+       Pattern used to specify paths.
++
+Pathspecs are used on the command line of "git ls-files", "git
+ls-tree", "git grep", "git checkout", and many other commands to
+limit the scope of operations to some subset of the tree or
+worktree.  See the documentation of each command for whether
+paths are relative to the current directory or toplevel.  The
+pathspec syntax is as follows:
+
+* any path matches itself
+* the pathspec up to the last slash represents a
+  directory prefix.  The scope of that pathspec is
+  limited to that subtree.
+* the rest of the pathspec is a pattern for the remainder
+  of the pathname.  Paths relative to the directory
+  prefix will be matched against that pattern using fnmatch(3);
+  in particular, '*' and '?' _can_ match directory separators.
++
+For example, Documentation/*.jpg will match all .jpg files
+in the Documentation subtree,
+including Documentation/chapter_1/figure_1.jpg.
+
 [[def_parent]]parent::
 	A <<def_commit_object,commit object>> contains a (possibly empty) list
 	of the logical predecessor(s) in the line of development, i.e. its
-- 
1.7.3.3.476.g10a82

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 08/19] pathspec: mark wildcard pathspecs from the beginning
  2010-12-13  9:46 [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1]) Nguyễn Thái Ngọc Duy
                   ` (6 preceding siblings ...)
  2010-12-13  9:46 ` [PATCH 07/19] glossary: define pathspec Nguyễn Thái Ngọc Duy
@ 2010-12-13  9:46 ` Nguyễn Thái Ngọc Duy
  2010-12-13 18:09   ` Junio C Hamano
  2010-12-13  9:46 ` [PATCH 09/19] tree-diff.c: reserve space in "base" for pathname concatenation Nguyễn Thái Ngọc Duy
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 45+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-12-13  9:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 prefix_len was used, but no longer. Still hesitate to remove it. It
 might get used again..

 cache.h |    4 +++-
 dir.c   |   13 +++++++++++--
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index 36819b6..3a1acf1 100644
--- a/cache.h
+++ b/cache.h
@@ -496,8 +496,10 @@ extern int ie_modified(const struct index_state *, struct cache_entry *, struct
 struct pathspec {
 	const char **raw; /* get_pathspec() result, not freed by free_pathspec() */
 	int nr;
+	int has_wildcard:1;
 	struct pathspec_item {
-		int len;
+		int len, prefix_len;
+		int has_wildcard:1;
 	} *items;
 };
 
diff --git a/dir.c b/dir.c
index 646c79f..0987d0c 100644
--- a/dir.c
+++ b/dir.c
@@ -1089,8 +1089,17 @@ int init_pathspec(struct pathspec *pathspec, const char **paths)
 	pathspec->items = xmalloc(sizeof(struct pathspec_item)*pathspec->nr);
 	for (i = 0; i < pathspec->nr; i++) {
 		struct pathspec_item *item = pathspec->items+i;
-
-		item->len = strlen(paths[i]);
+		const char *path = paths[i];
+
+		item->len = strlen(path);
+		item->has_wildcard = !no_wildcard(path);
+		if (item->has_wildcard) {
+			pathspec->has_wildcard = 1;
+			item->prefix_len = 0;
+			while (item->prefix_len < item->len &&
+			       strchr("*?[{\\", path[item->prefix_len]) == NULL)
+				item->prefix_len++;
+		}
 	}
 	return 0;
 }
-- 
1.7.3.3.476.g10a82

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 09/19] tree-diff.c: reserve space in "base" for pathname concatenation
  2010-12-13  9:46 [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1]) Nguyễn Thái Ngọc Duy
                   ` (7 preceding siblings ...)
  2010-12-13  9:46 ` [PATCH 08/19] pathspec: mark wildcard pathspecs from the beginning Nguyễn Thái Ngọc Duy
@ 2010-12-13  9:46 ` Nguyễn Thái Ngọc Duy
  2010-12-13 18:10   ` Junio C Hamano
  2010-12-13  9:46 ` [PATCH 10/19] tree_entry_interesting(): factor out most matching logic Nguyễn Thái Ngọc Duy
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 45+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-12-13  9:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

This patch make sure that "base" parameter is writable. The callees
are free to modify it as long as base remains the same before
entering and after leaving the callee.

This avoids quite a bit of malloc and memcpy().

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 tree-diff.c |   87 +++++++++++++++++++++++++++-------------------------------
 1 files changed, 41 insertions(+), 46 deletions(-)

diff --git a/tree-diff.c b/tree-diff.c
index 50d7e6d..a870f6c 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -6,34 +6,17 @@
 #include "diffcore.h"
 #include "tree.h"
 
-static char *malloc_base(const char *base, int baselen, const char *path, int pathlen)
-{
-	char *newbase = xmalloc(baselen + pathlen + 2);
-	memcpy(newbase, base, baselen);
-	memcpy(newbase + baselen, path, pathlen);
-	memcpy(newbase + baselen + pathlen, "/", 2);
-	return newbase;
-}
+static void show_entry(struct diff_options *opt, const char *prefix,
+		       struct tree_desc *desc, char *base, int baselen);
 
-static char *malloc_fullname(const char *base, int baselen, const char *path, int pathlen)
-{
-	char *fullname = xmalloc(baselen + pathlen + 1);
-	memcpy(fullname, base, baselen);
-	memcpy(fullname + baselen, path, pathlen);
-	fullname[baselen + pathlen] = 0;
-	return fullname;
-}
-
-static void show_entry(struct diff_options *opt, const char *prefix, struct tree_desc *desc,
-		       const char *base, int baselen);
-
-static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const char *base, int baselen, struct diff_options *opt)
+static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2,
+			      char *base, int baselen,
+			      struct diff_options *opt)
 {
 	unsigned mode1, mode2;
 	const char *path1, *path2;
 	const unsigned char *sha1, *sha2;
 	int cmp, pathlen1, pathlen2;
-	char *fullname;
 
 	sha1 = tree_entry_extract(t1, &path1, &mode1);
 	sha2 = tree_entry_extract(t2, &path2, &mode2);
@@ -64,26 +47,31 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const
 
 	if (DIFF_OPT_TST(opt, RECURSIVE) && S_ISDIR(mode1)) {
 		int retval;
-		char *newbase = malloc_base(base, baselen, path1, pathlen1);
+
+		memcpy(base + baselen, path1, pathlen1);
+		memcpy(base + baselen + pathlen1, "/", 2);
+
 		if (DIFF_OPT_TST(opt, TREE_IN_RECURSIVE)) {
-			newbase[baselen + pathlen1] = 0;
+			base[baselen + pathlen1] = 0;
 			opt->change(opt, mode1, mode2,
-				    sha1, sha2, newbase, 0, 0);
-			newbase[baselen + pathlen1] = '/';
+				    sha1, sha2, base, 0, 0);
+			base[baselen + pathlen1] = '/';
 		}
-		retval = diff_tree_sha1(sha1, sha2, newbase, opt);
-		free(newbase);
+		retval = diff_tree_sha1(sha1, sha2, base, opt);
+		base[baselen] = 0;
 		return retval;
 	}
 
-	fullname = malloc_fullname(base, baselen, path1, pathlen1);
-	opt->change(opt, mode1, mode2, sha1, sha2, fullname, 0, 0);
-	free(fullname);
+	memcpy(base + baselen, path1, pathlen1);
+	base[baselen + pathlen1] = 0;
+	opt->change(opt, mode1, mode2, sha1, sha2, base, 0, 0);
+	base[baselen] = 0;
 	return 0;
 }
 
 /* A whole sub-tree went away or appeared */
-static void show_tree(struct diff_options *opt, const char *prefix, struct tree_desc *desc, const char *base, int baselen)
+static void show_tree(struct diff_options *opt, const char *prefix,
+		      struct tree_desc *desc, char *base, int baselen)
 {
 	int all_interesting = 0;
 	while (desc->size) {
@@ -105,8 +93,8 @@ static void show_tree(struct diff_options *opt, const char *prefix, struct tree_
 }
 
 /* A file entry went away or appeared */
-static void show_entry(struct diff_options *opt, const char *prefix, struct tree_desc *desc,
-		       const char *base, int baselen)
+static void show_entry(struct diff_options *opt, const char *prefix,
+		       struct tree_desc *desc, char *base, int baselen)
 {
 	unsigned mode;
 	const char *path;
@@ -115,34 +103,38 @@ static void show_entry(struct diff_options *opt, const char *prefix, struct tree
 
 	if (DIFF_OPT_TST(opt, RECURSIVE) && S_ISDIR(mode)) {
 		enum object_type type;
-		char *newbase = malloc_base(base, baselen, path, pathlen);
 		struct tree_desc inner;
 		void *tree;
 		unsigned long size;
 
+		memcpy(base + baselen, path, pathlen);
+		memcpy(base + baselen + pathlen, "/", 2);
+
 		tree = read_sha1_file(sha1, &type, &size);
 		if (!tree || type != OBJ_TREE)
 			die("corrupt tree sha %s", sha1_to_hex(sha1));
 
 		if (DIFF_OPT_TST(opt, TREE_IN_RECURSIVE)) {
-			newbase[baselen + pathlen] = 0;
-			opt->add_remove(opt, *prefix, mode, sha1, newbase, 0);
-			newbase[baselen + pathlen] = '/';
+			base[baselen + pathlen] = 0;
+			opt->add_remove(opt, *prefix, mode, sha1, base, 0);
+			base[baselen + pathlen] = '/';
 		}
 
 		init_tree_desc(&inner, tree, size);
-		show_tree(opt, prefix, &inner, newbase, baselen + 1 + pathlen);
+		show_tree(opt, prefix, &inner, base, baselen + 1 + pathlen);
 
+		base[baselen] = 0;
 		free(tree);
-		free(newbase);
 	} else {
-		char *fullname = malloc_fullname(base, baselen, path, pathlen);
-		opt->add_remove(opt, prefix[0], mode, sha1, fullname, 0);
-		free(fullname);
+		memcpy(base + baselen, path, pathlen);
+		base[baselen + pathlen] = 0;
+		opt->add_remove(opt, prefix[0], mode, sha1, base, 0);
+		base[baselen] = 0;
 	}
 }
 
-static void skip_uninteresting(struct tree_desc *t, const char *base, int baselen, struct diff_options *opt)
+static void skip_uninteresting(struct tree_desc *t, char *base,
+			       int baselen, struct diff_options *opt)
 {
 	int all_interesting = 0;
 	while (t->size) {
@@ -166,10 +158,13 @@ static void skip_uninteresting(struct tree_desc *t, const char *base, int basele
 	}
 }
 
-int diff_tree(struct tree_desc *t1, struct tree_desc *t2, const char *base, struct diff_options *opt)
+int diff_tree(struct tree_desc *t1, struct tree_desc *t2,
+	      const char *base_, struct diff_options *opt)
 {
-	int baselen = strlen(base);
+	char base[PATH_MAX];
+	int baselen = strlen(base_);
 
+	memcpy(base, base_, baselen+1);
 	for (;;) {
 		if (DIFF_OPT_TST(opt, QUICK) &&
 		    DIFF_OPT_TST(opt, HAS_CHANGES))
-- 
1.7.3.3.476.g10a82

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 10/19] tree_entry_interesting(): factor out most matching logic
  2010-12-13  9:46 [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1]) Nguyễn Thái Ngọc Duy
                   ` (8 preceding siblings ...)
  2010-12-13  9:46 ` [PATCH 09/19] tree-diff.c: reserve space in "base" for pathname concatenation Nguyễn Thái Ngọc Duy
@ 2010-12-13  9:46 ` Nguyễn Thái Ngọc Duy
  2010-12-13 18:10   ` Junio C Hamano
  2010-12-13  9:46 ` [PATCH 11/19] tree_entry_interesting: support depth limit Nguyễn Thái Ngọc Duy
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 45+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-12-13  9:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 tree-walk.c |  168 ++++++++++++++++++++++++++++++++--------------------------
 1 files changed, 93 insertions(+), 75 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index 01168ea..40a4657 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -456,6 +456,91 @@ int get_tree_entry(const unsigned char *tree_sha1, const char *name, unsigned ch
 	return retval;
 }
 
+static int match_entry(const struct name_entry *entry, int pathlen,
+		       const char *match, int matchlen,
+		       int *never_interesting)
+{
+	int m = -1; /* signals that we haven't called strncmp() */
+
+	if (*never_interesting) {
+		/*
+		 * We have not seen any match that sorts later
+		 * than the current path.
+		 */
+
+		/*
+		 * Does match sort strictly earlier than path
+		 * with their common parts?
+		 */
+		m = strncmp(match, entry->path,
+			    (matchlen < pathlen) ? matchlen : pathlen);
+		if (m < 0)
+			return 0;
+
+		/*
+		 * If we come here even once, that means there is at
+		 * least one pathspec that would sort equal to or
+		 * later than the path we are currently looking at.
+		 * In other words, if we have never reached this point
+		 * after iterating all pathspecs, it means all
+		 * pathspecs are either outside of base, or inside the
+		 * base but sorts strictly earlier than the current
+		 * one.  In either case, they will never match the
+		 * subsequent entries.  In such a case, we initialized
+		 * the variable to -1 and that is what will be
+		 * returned, allowing the caller to terminate early.
+		 */
+		*never_interesting = 0;
+	}
+
+	if (pathlen > matchlen)
+		return 0;
+
+	if (matchlen > pathlen) {
+		if (match[pathlen] != '/')
+			return 0;
+		if (!S_ISDIR(entry->mode))
+			return 0;
+	}
+
+	if (m == -1)
+		/*
+		 * we cheated and did not do strncmp(), so we do
+		 * that here.
+		 */
+		m = strncmp(match, entry->path, pathlen);
+
+	/*
+	 * If common part matched earlier then it is a hit,
+	 * because we rejected the case where path is not a
+	 * leading directory and is shorter than match.
+	 */
+	if (!m)
+		return 1;
+
+	return 0;
+}
+
+static int match_dir_prefix(const char *base, int baselen,
+			    const char *match, int matchlen)
+{
+	/* If it doesn't match, move along... */
+	if (strncmp(base, match, matchlen))
+		return 0;
+
+	/*
+	 * If the base is a subdirectory of a path which
+	 * was specified, all of them are interesting.
+	 */
+	if (!matchlen ||
+	    base[matchlen] == '/' ||
+	    match[matchlen - 1] == '/')
+		return 1;
+
+	/* Just a random prefix match */
+	return 0;
+}
+
 /*
  * Is a tree entry interesting given the pathspec we have?
  *
@@ -481,88 +566,21 @@ int tree_entry_interesting(const struct name_entry *entry,
 	for (i = 0; i < ps->nr; i++) {
 		const char *match = ps->raw[i];
 		int matchlen = ps->items[i].len;
-		int m = -1; /* signals that we haven't called strncmp() */
 
 		if (baselen >= matchlen) {
-			/* If it doesn't match, move along... */
-			if (strncmp(base, match, matchlen))
+			if (!match_dir_prefix(base, baselen, match, matchlen))
+				/* Just a random prefix match */
 				continue;
-
-			/*
-			 * If the base is a subdirectory of a path which
-			 * was specified, all of them are interesting.
-			 */
-			if (!matchlen ||
-			    base[matchlen] == '/' ||
-			    match[matchlen - 1] == '/')
-				return 2;
-
-			/* Just a random prefix match */
-			continue;
+			return 2;
 		}
 
 		/* Does the base match? */
-		if (strncmp(base, match, baselen))
-			continue;
-
-		match += baselen;
-		matchlen -= baselen;
-
-		if (never_interesting) {
-			/*
-			 * We have not seen any match that sorts later
-			 * than the current path.
-			 */
-
-			/*
-			 * Does match sort strictly earlier than path
-			 * with their common parts?
-			 */
-			m = strncmp(match, entry->path,
-				    (matchlen < pathlen) ? matchlen : pathlen);
-			if (m < 0)
-				continue;
-
-			/*
-			 * If we come here even once, that means there is at
-			 * least one pathspec that would sort equal to or
-			 * later than the path we are currently looking at.
-			 * In other words, if we have never reached this point
-			 * after iterating all pathspecs, it means all
-			 * pathspecs are either outside of base, or inside the
-			 * base but sorts strictly earlier than the current
-			 * one.  In either case, they will never match the
-			 * subsequent entries.  In such a case, we initialized
-			 * the variable to -1 and that is what will be
-			 * returned, allowing the caller to terminate early.
-			 */
-			never_interesting = 0;
+		if (!strncmp(base, match, baselen)) {
+			if (match_entry(entry, pathlen,
+					match + baselen, matchlen - baselen,
+					&never_interesting))
+				return 1;
 		}
-
-		if (pathlen > matchlen)
-			continue;
-
-		if (matchlen > pathlen) {
-			if (match[pathlen] != '/')
-				continue;
-			if (!S_ISDIR(entry->mode))
-				continue;
-		}
-
-		if (m == -1)
-			/*
-			 * we cheated and did not do strncmp(), so we do
-			 * that here.
-			 */
-			m = strncmp(match, entry->path, pathlen);
-
-		/*
-		 * If common part matched earlier then it is a hit,
-		 * because we rejected the case where path is not a
-		 * leading directory and is shorter than match.
-		 */
-		if (!m)
-			return 1;
 	}
 	return never_interesting; /* No matches */
 }
-- 
1.7.3.3.476.g10a82

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 11/19] tree_entry_interesting: support depth limit
  2010-12-13  9:46 [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1]) Nguyễn Thái Ngọc Duy
                   ` (9 preceding siblings ...)
  2010-12-13  9:46 ` [PATCH 10/19] tree_entry_interesting(): factor out most matching logic Nguyễn Thái Ngọc Duy
@ 2010-12-13  9:46 ` Nguyễn Thái Ngọc Duy
  2010-12-13 18:10   ` Junio C Hamano
  2010-12-13  9:46 ` [PATCH 12/19] tree_entry_interesting(): support wildcard matching Nguyễn Thái Ngọc Duy
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 45+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-12-13  9:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

This is needed to replace pathspec_matches() in builtin/grep.c. Depth
limit is only effective when pathspec.recursive == 1

max_depth == -1 means unlimited depth.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 pathspec.recursive is also needed for wildcard matching later on.

 cache.h     |    2 ++
 dir.c       |   15 +++++++++++++++
 dir.h       |    1 +
 tree-diff.c |    2 ++
 tree-walk.c |   19 +++++++++++++++++--
 5 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 3a1acf1..56da281 100644
--- a/cache.h
+++ b/cache.h
@@ -497,6 +497,8 @@ struct pathspec {
 	const char **raw; /* get_pathspec() result, not freed by free_pathspec() */
 	int nr;
 	int has_wildcard:1;
+	int recursive:1;
+	int max_depth;
 	struct pathspec_item {
 		int len, prefix_len;
 		int has_wildcard:1;
diff --git a/dir.c b/dir.c
index 0987d0c..bb5076c 100644
--- a/dir.c
+++ b/dir.c
@@ -71,6 +71,21 @@ int fill_directory(struct dir_struct *dir, const char **pathspec)
 	return len;
 }
 
+int within_depth(const char *name, int namelen,
+			int depth, int max_depth)
+{
+	const char *cp = name, *cpe = name + namelen;
+
+	while (cp < cpe) {
+		if (*cp++ != '/')
+			continue;
+		depth++;
+		if (depth > max_depth)
+			return 0;
+	}
+	return 1;
+}
+
 /*
  * Does 'match' match the given name?
  * A match is found if
diff --git a/dir.h b/dir.h
index 278d84c..c71de08 100644
--- a/dir.h
+++ b/dir.h
@@ -65,6 +65,7 @@ struct dir_struct {
 #define MATCHED_FNMATCH 2
 #define MATCHED_EXACTLY 3
 extern int match_pathspec(const char **pathspec, const char *name, int namelen, int prefix, char *seen);
+extern int within_depth(const char *name, int namelen, int depth, int max_depth);
 
 extern int fill_directory(struct dir_struct *dir, const char **pathspec);
 extern int read_directory(struct dir_struct *, const char *path, int len, const char **pathspec);
diff --git a/tree-diff.c b/tree-diff.c
index a870f6c..7c3b770 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -165,6 +165,8 @@ int diff_tree(struct tree_desc *t1, struct tree_desc *t2,
 	int baselen = strlen(base_);
 
 	memcpy(base, base_, baselen+1);
+	opt->pathspec.recursive = DIFF_OPT_TST(opt, RECURSIVE);
+	opt->pathspec.max_depth = -1;
 	for (;;) {
 		if (DIFF_OPT_TST(opt, QUICK) &&
 		    DIFF_OPT_TST(opt, HAS_CHANGES))
diff --git a/tree-walk.c b/tree-walk.c
index 40a4657..d28de30 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "tree-walk.h"
 #include "unpack-trees.h"
+#include "dir.h"
 #include "tree.h"
 
 static const char *get_mode(const char *str, unsigned int *modep)
@@ -558,9 +559,17 @@ int tree_entry_interesting(const struct name_entry *entry,
 	int pathlen;
 	int never_interesting = -1;
 
-	if (!ps || !ps->nr)
+	if (!ps)
 		return 1;
 
+	if (!ps->nr) {
+		if (!ps->recursive || ps->max_depth == -1)
+			return 1;
+		return !!within_depth(base, baselen,
+				      !!S_ISDIR(entry->mode),
+				      ps->max_depth);
+	}
+
 	pathlen = tree_entry_len(entry->path, entry->sha1);
 
 	for (i = 0; i < ps->nr; i++) {
@@ -571,7 +580,13 @@ int tree_entry_interesting(const struct name_entry *entry,
 			if (!match_dir_prefix(base, baselen, match, matchlen))
 				/* Just a random prefix match */
 				continue;
-			return 2;
+
+			if (!ps->recursive || ps->max_depth == -1)
+				return 2;
+
+			return !!within_depth(base+matchlen+1, baselen-matchlen-1,
+					      !!S_ISDIR(entry->mode),
+					      ps->max_depth);
 		}
 
 		/* Does the base match? */
-- 
1.7.3.3.476.g10a82

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 12/19] tree_entry_interesting(): support wildcard matching
  2010-12-13  9:46 [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1]) Nguyễn Thái Ngọc Duy
                   ` (10 preceding siblings ...)
  2010-12-13  9:46 ` [PATCH 11/19] tree_entry_interesting: support depth limit Nguyễn Thái Ngọc Duy
@ 2010-12-13  9:46 ` Nguyễn Thái Ngọc Duy
  2010-12-13 18:10   ` Junio C Hamano
  2010-12-13  9:46 ` [PATCH 13/19] tree_entry_interesting(): optimize fnmatch when base is matched Nguyễn Thái Ngọc Duy
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 45+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-12-13  9:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 tree-walk.c |   33 ++++++++++++++++++++++++++++++---
 tree-walk.h |    2 +-
 2 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index d28de30..b5ad42b 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -552,7 +552,7 @@ static int match_dir_prefix(const char *base, int baselen,
  *  - negative for "no, and no subsequent entries will be either"
  */
 int tree_entry_interesting(const struct name_entry *entry,
-			   const char *base, int baselen,
+			   char *base, int baselen,
 			   const struct pathspec *ps)
 {
 	int i;
@@ -573,13 +573,15 @@ int tree_entry_interesting(const struct name_entry *entry,
 	pathlen = tree_entry_len(entry->path, entry->sha1);
 
 	for (i = 0; i < ps->nr; i++) {
+		const struct pathspec_item *item = ps->items+i;
+
 		const char *match = ps->raw[i];
-		int matchlen = ps->items[i].len;
+		int matchlen = item->len;
 
 		if (baselen >= matchlen) {
 			if (!match_dir_prefix(base, baselen, match, matchlen))
 				/* Just a random prefix match */
-				continue;
+				goto match_wildcards;
 
 			if (!ps->recursive || ps->max_depth == -1)
 				return 2;
@@ -596,6 +598,31 @@ int tree_entry_interesting(const struct name_entry *entry,
 					&never_interesting))
 				return 1;
 		}
+
+match_wildcards:
+		/*
+		 * Concatenate base and entry->path into one and do
+		 * fnmatch() on it.
+		 */
+
+		if (!item->has_wildcard)
+			continue;
+
+		never_interesting = 0;
+		memcpy(base + baselen, entry->path, pathlen+1);
+
+		if (!fnmatch(match, base, 0)) {
+			base[baselen] = 0;
+			return 1;
+		}
+		base[baselen] = 0;
+
+		/*
+		 * Match all directories. We'll try to match files
+		 * later on.
+		 */
+		if (ps->recursive && S_ISDIR(entry->mode))
+			return 1;
 	}
 	return never_interesting; /* No matches */
 }
diff --git a/tree-walk.h b/tree-walk.h
index c12f0a2..94e0ef4 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -60,6 +60,6 @@ static inline int traverse_path_len(const struct traverse_info *info, const stru
 	return info->pathlen + tree_entry_len(n->path, n->sha1);
 }
 
-extern int tree_entry_interesting(const struct name_entry *, const char *, int, const struct pathspec *ps);
+extern int tree_entry_interesting(const struct name_entry *, char *, int, const struct pathspec *ps);
 
 #endif
-- 
1.7.3.3.476.g10a82

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 13/19] tree_entry_interesting(): optimize fnmatch when base is matched
  2010-12-13  9:46 [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1]) Nguyễn Thái Ngọc Duy
                   ` (11 preceding siblings ...)
  2010-12-13  9:46 ` [PATCH 12/19] tree_entry_interesting(): support wildcard matching Nguyễn Thái Ngọc Duy
@ 2010-12-13  9:46 ` Nguyễn Thái Ngọc Duy
  2010-12-13 18:10   ` Junio C Hamano
  2010-12-13  9:46 ` [PATCH 14/19] Convert ce_path_match() use to match_pathspec() Nguyễn Thái Ngọc Duy
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 45+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-12-13  9:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

If base is already matched, skip that part when calling
fnmatch(). This happens quite often when users start a command from
worktree's subdirectory and prefix is usually prepended to all
pathspecs.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 t/t4010-diff-pathspec.sh |   14 ++++++++++++++
 tree-walk.c              |   15 +++++++++++++++
 2 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/t/t4010-diff-pathspec.sh b/t/t4010-diff-pathspec.sh
index 94df7ae..4b120f8 100755
--- a/t/t4010-diff-pathspec.sh
+++ b/t/t4010-diff-pathspec.sh
@@ -70,4 +70,18 @@ test_expect_success 'diff-tree pathspec' '
 	test_cmp expected current
 '
 
+EMPTY_TREE=4b825dc642cb6eb9a060e54bf8d69288fbee4904
+
+test_expect_success 'diff-tree with wildcard shows dir also matches' '
+	git diff-tree --name-only $EMPTY_TREE $tree -- "f*" >result &&
+	echo file0 >expected &&
+	test_cmp expected result
+'
+
+test_expect_success 'diff-tree -r with wildcard' '
+	git diff-tree -r --name-only $EMPTY_TREE $tree -- "*file1" >result &&
+	echo path1/file1 >expected &&
+	test_cmp expected result
+'
+
 test_done
diff --git a/tree-walk.c b/tree-walk.c
index b5ad42b..e1a18fc 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -597,6 +597,21 @@ int tree_entry_interesting(const struct name_entry *entry,
 					match + baselen, matchlen - baselen,
 					&never_interesting))
 				return 1;
+
+			if (item->has_wildcard) {
+				never_interesting = 0;
+				if (!fnmatch(match + baselen, entry->path, 0))
+					return 1;
+
+				/*
+				 * Match all directories. We'll try to
+				 * match files later on.
+				 */
+				if (ps->recursive && S_ISDIR(entry->mode))
+					return 1;
+			}
+
+			continue;
 		}
 
 match_wildcards:
-- 
1.7.3.3.476.g10a82

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 14/19] Convert ce_path_match() use to match_pathspec()
  2010-12-13  9:46 [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1]) Nguyễn Thái Ngọc Duy
                   ` (12 preceding siblings ...)
  2010-12-13  9:46 ` [PATCH 13/19] tree_entry_interesting(): optimize fnmatch when base is matched Nguyễn Thái Ngọc Duy
@ 2010-12-13  9:46 ` Nguyễn Thái Ngọc Duy
  2010-12-13 19:31   ` Junio C Hamano
  2010-12-13  9:46 ` [PATCH 15/19] pathspec: add match_pathspec_depth() Nguyễn Thái Ngọc Duy
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 45+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-12-13  9:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

Previously ce_path_match() is used together with tree_entry_interesting().
Both do not support wildcards. tree_entry_interesting() understands
wildcards now, so it's time to teach ce_path_match() to do the same.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 read-cache.c |   20 +-------------------
 1 files changed, 1 insertions(+), 19 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 1f42473..cbabd8b 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -685,28 +685,10 @@ int ce_same_name(struct cache_entry *a, struct cache_entry *b)
 
 int ce_path_match(const struct cache_entry *ce, const char **pathspec)
 {
-	const char *match, *name;
-	int len;
-
 	if (!pathspec)
 		return 1;
 
-	len = ce_namelen(ce);
-	name = ce->name;
-	while ((match = *pathspec++) != NULL) {
-		int matchlen = strlen(match);
-		if (matchlen > len)
-			continue;
-		if (memcmp(name, match, matchlen))
-			continue;
-		if (matchlen && name[matchlen-1] == '/')
-			return 1;
-		if (name[matchlen] == '/' || !name[matchlen])
-			return 1;
-		if (!matchlen)
-			return 1;
-	}
-	return 0;
+	return match_pathspec(pathspec, ce->name, ce_namelen(ce), 0, NULL);
 }
 
 /*
-- 
1.7.3.3.476.g10a82

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 15/19] pathspec: add match_pathspec_depth()
  2010-12-13  9:46 [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1]) Nguyễn Thái Ngọc Duy
                   ` (13 preceding siblings ...)
  2010-12-13  9:46 ` [PATCH 14/19] Convert ce_path_match() use to match_pathspec() Nguyễn Thái Ngọc Duy
@ 2010-12-13  9:46 ` Nguyễn Thái Ngọc Duy
  2010-12-13 19:28   ` Junio C Hamano
  2010-12-13  9:46 ` [PATCH 16/19] grep: convert to use struct pathspec Nguyễn Thái Ngọc Duy
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 45+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-12-13  9:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

match_pathspec_depth() is similar to match_pathspec() except that it
can take depth limit.

In long term, match_pathspec() should be removed in favor of this
function.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 dir.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 dir.h |    3 +++
 2 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/dir.c b/dir.c
index bb5076c..e12dbdd 100644
--- a/dir.c
+++ b/dir.c
@@ -169,6 +169,51 @@ int match_pathspec(const char **pathspec, const char *name, int namelen,
 	return retval;
 }
 
+int match_pathspec_depth(const char **pathspec, int max_depth,
+			 const char *name, int namelen,
+			 int prefix, char *seen)
+{
+	int i, retval = 0;
+
+	if (!pathspec) {
+		if (max_depth == -1)
+			return MATCHED_RECURSIVELY;
+
+		if (within_depth(name, namelen, 0, max_depth))
+			return MATCHED_EXACTLY;
+		else
+			return 0;
+	}
+
+	name += prefix;
+	namelen -= prefix;
+
+	for (i = 0; pathspec[i] != NULL; i++) {
+		int how;
+		const char *match = pathspec[i] + prefix;
+		if (seen && seen[i] == MATCHED_EXACTLY)
+			continue;
+		how = match_one(match, name, namelen);
+		if (max_depth != -1 &&
+		    how && how != MATCHED_FNMATCH) {
+			int len = strlen(match);
+			if (name[len] == '/')
+				len++;
+			if (within_depth(name+len, namelen-len, 0, max_depth))
+				how = MATCHED_EXACTLY;
+			else
+				how = 0;
+		}
+		if (how) {
+			if (retval < how)
+				retval = how;
+			if (seen && seen[i] < how)
+				seen[i] = how;
+		}
+	}
+	return retval;
+}
+
 static int no_wildcard(const char *string)
 {
 	return string[strcspn(string, "*?[{\\")] == '\0';
diff --git a/dir.h b/dir.h
index c71de08..9dc9592 100644
--- a/dir.h
+++ b/dir.h
@@ -65,6 +65,9 @@ struct dir_struct {
 #define MATCHED_FNMATCH 2
 #define MATCHED_EXACTLY 3
 extern int match_pathspec(const char **pathspec, const char *name, int namelen, int prefix, char *seen);
+extern int match_pathspec_depth(const char **pathspec, int max_depth,
+				const char *name, int namelen,
+				int prefix, char *seen);
 extern int within_depth(const char *name, int namelen, int depth, int max_depth);
 
 extern int fill_directory(struct dir_struct *dir, const char **pathspec);
-- 
1.7.3.3.476.g10a82

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 16/19] grep: convert to use struct pathspec
  2010-12-13  9:46 [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1]) Nguyễn Thái Ngọc Duy
                   ` (14 preceding siblings ...)
  2010-12-13  9:46 ` [PATCH 15/19] pathspec: add match_pathspec_depth() Nguyễn Thái Ngọc Duy
@ 2010-12-13  9:46 ` Nguyễn Thái Ngọc Duy
  2010-12-13  9:46 ` [PATCH 17/19] grep: use match_pathspec_depth() for cache/worktree grepping Nguyễn Thái Ngọc Duy
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 45+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-12-13  9:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/grep.c |   30 ++++++++++++++++--------------
 1 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index da32f3d..4179af8 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -585,7 +585,7 @@ static void run_pager(struct grep_opt *opt, const char *prefix)
 	free(argv);
 }
 
-static int grep_cache(struct grep_opt *opt, const char **paths, int cached)
+static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int cached)
 {
 	int hit = 0;
 	int nr;
@@ -595,7 +595,7 @@ static int grep_cache(struct grep_opt *opt, const char **paths, int cached)
 		struct cache_entry *ce = active_cache[nr];
 		if (!S_ISREG(ce->ce_mode))
 			continue;
-		if (!pathspec_matches(paths, ce->name, opt->max_depth))
+		if (!pathspec_matches(pathspec->raw, ce->name, opt->max_depth))
 			continue;
 		/*
 		 * If CE_VALID is on, we assume worktree file and its cache entry
@@ -622,7 +622,7 @@ static int grep_cache(struct grep_opt *opt, const char **paths, int cached)
 	return hit;
 }
 
-static int grep_tree(struct grep_opt *opt, const char **paths,
+static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 		     struct tree_desc *tree,
 		     const char *tree_name, const char *base)
 {
@@ -656,7 +656,7 @@ static int grep_tree(struct grep_opt *opt, const char **paths,
 			strbuf_addch(&pathbuf, '/');
 
 		down = pathbuf.buf + tn_len;
-		if (!pathspec_matches(paths, down, opt->max_depth))
+		if (!pathspec_matches(pathspec->raw, down, opt->max_depth))
 			;
 		else if (S_ISREG(entry.mode))
 			hit |= grep_sha1(opt, entry.sha1, pathbuf.buf, tn_len);
@@ -671,7 +671,7 @@ static int grep_tree(struct grep_opt *opt, const char **paths,
 				die("unable to read tree (%s)",
 				    sha1_to_hex(entry.sha1));
 			init_tree_desc(&sub, data, size);
-			hit |= grep_tree(opt, paths, &sub, tree_name, down);
+			hit |= grep_tree(opt, pathspec, &sub, tree_name, down);
 			free(data);
 		}
 		if (hit && opt->status_only)
@@ -681,7 +681,7 @@ static int grep_tree(struct grep_opt *opt, const char **paths,
 	return hit;
 }
 
-static int grep_object(struct grep_opt *opt, const char **paths,
+static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 		       struct object *obj, const char *name)
 {
 	if (obj->type == OBJ_BLOB)
@@ -696,14 +696,14 @@ static int grep_object(struct grep_opt *opt, const char **paths,
 		if (!data)
 			die("unable to read tree (%s)", sha1_to_hex(obj->sha1));
 		init_tree_desc(&tree, data, size);
-		hit = grep_tree(opt, paths, &tree, name, "");
+		hit = grep_tree(opt, pathspec, &tree, name, "");
 		free(data);
 		return hit;
 	}
 	die("unable to grep from object of type %s", typename(obj->type));
 }
 
-static int grep_objects(struct grep_opt *opt, const char **paths,
+static int grep_objects(struct grep_opt *opt, const struct pathspec *pathspec,
 			const struct object_array *list)
 {
 	unsigned int i;
@@ -713,7 +713,7 @@ static int grep_objects(struct grep_opt *opt, const char **paths,
 	for (i = 0; i < nr; i++) {
 		struct object *real_obj;
 		real_obj = deref_tag(list->objects[i].item, NULL, 0);
-		if (grep_object(opt, paths, real_obj, list->objects[i].name)) {
+		if (grep_object(opt, pathspec, real_obj, list->objects[i].name)) {
 			hit = 1;
 			if (opt->status_only)
 				break;
@@ -722,7 +722,7 @@ static int grep_objects(struct grep_opt *opt, const char **paths,
 	return hit;
 }
 
-static int grep_directory(struct grep_opt *opt, const char **paths)
+static int grep_directory(struct grep_opt *opt, const struct pathspec *pathspec)
 {
 	struct dir_struct dir;
 	int i, hit = 0;
@@ -730,7 +730,7 @@ static int grep_directory(struct grep_opt *opt, const char **paths)
 	memset(&dir, 0, sizeof(dir));
 	setup_standard_excludes(&dir);
 
-	fill_directory(&dir, paths);
+	fill_directory(&dir, pathspec->raw);
 	for (i = 0; i < dir.nr; i++) {
 		hit |= grep_file(opt, dir.entries[i]->name);
 		if (hit && opt->status_only)
@@ -836,6 +836,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	struct grep_opt opt;
 	struct object_array list = OBJECT_ARRAY_INIT;
 	const char **paths = NULL;
+	struct pathspec pathspec;
 	struct string_list path_list = STRING_LIST_INIT_NODUP;
 	int i;
 	int dummy;
@@ -1063,6 +1064,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		paths[0] = prefix;
 		paths[1] = NULL;
 	}
+	init_pathspec(&pathspec, paths);
 
 	if (show_in_pager && (cached || list.nr))
 		die("--open-files-in-pager only works on the worktree");
@@ -1093,16 +1095,16 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			die("--cached cannot be used with --no-index.");
 		if (list.nr)
 			die("--no-index cannot be used with revs.");
-		hit = grep_directory(&opt, paths);
+		hit = grep_directory(&opt, &pathspec);
 	} else if (!list.nr) {
 		if (!cached)
 			setup_work_tree();
 
-		hit = grep_cache(&opt, paths, cached);
+		hit = grep_cache(&opt, &pathspec, cached);
 	} else {
 		if (cached)
 			die("both --cached and trees are given.");
-		hit = grep_objects(&opt, paths, &list);
+		hit = grep_objects(&opt, &pathspec, &list);
 	}
 
 	if (use_threads)
-- 
1.7.3.3.476.g10a82

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 17/19] grep: use match_pathspec_depth() for cache/worktree grepping
  2010-12-13  9:46 [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1]) Nguyễn Thái Ngọc Duy
                   ` (15 preceding siblings ...)
  2010-12-13  9:46 ` [PATCH 16/19] grep: convert to use struct pathspec Nguyễn Thái Ngọc Duy
@ 2010-12-13  9:46 ` Nguyễn Thái Ngọc Duy
  2010-12-13  9:46 ` [PATCH 18/19] grep: use preallocated buffer for grep_tree() Nguyễn Thái Ngọc Duy
  2010-12-13  9:46 ` [PATCH 19/19] grep: drop pathspec_matches() in favor of tree_entry_interesting() Nguyễn Thái Ngọc Duy
  18 siblings, 0 replies; 45+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-12-13  9:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/grep.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 4179af8..1646e15 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -595,7 +595,8 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int
 		struct cache_entry *ce = active_cache[nr];
 		if (!S_ISREG(ce->ce_mode))
 			continue;
-		if (!pathspec_matches(pathspec->raw, ce->name, opt->max_depth))
+		if (!match_pathspec_depth(pathspec->raw, opt->max_depth,
+					  ce->name, ce_namelen(ce), 0, NULL))
 			continue;
 		/*
 		 * If CE_VALID is on, we assume worktree file and its cache entry
-- 
1.7.3.3.476.g10a82

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 18/19] grep: use preallocated buffer for grep_tree()
  2010-12-13  9:46 [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1]) Nguyễn Thái Ngọc Duy
                   ` (16 preceding siblings ...)
  2010-12-13  9:46 ` [PATCH 17/19] grep: use match_pathspec_depth() for cache/worktree grepping Nguyễn Thái Ngọc Duy
@ 2010-12-13  9:46 ` Nguyễn Thái Ngọc Duy
  2010-12-13  9:46 ` [PATCH 19/19] grep: drop pathspec_matches() in favor of tree_entry_interesting() Nguyễn Thái Ngọc Duy
  18 siblings, 0 replies; 45+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-12-13  9:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 because tree_entry_interesting() will need writable "base".

 builtin/grep.c |   59 +++++++++++++++++++++++++++++--------------------------
 1 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 1646e15..6bd5728 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -623,44 +623,38 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int
 	return hit;
 }
 
+/*
+ * "base" is a writable buffer where
+ * - base[-tree_name_len..-1] contains tree_name
+ * - base[0..baselen-1] contains tree base
+ */
 static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
-		     struct tree_desc *tree,
-		     const char *tree_name, const char *base)
+		     struct tree_desc *tree, char *base,
+		     int baselen, int tree_name_len)
 {
-	int len;
 	int hit = 0;
 	struct name_entry entry;
-	char *down;
-	int tn_len = strlen(tree_name);
-	struct strbuf pathbuf;
-
-	strbuf_init(&pathbuf, PATH_MAX + tn_len);
-
-	if (tn_len) {
-		strbuf_add(&pathbuf, tree_name, tn_len);
-		strbuf_addch(&pathbuf, ':');
-		tn_len = pathbuf.len;
-	}
-	strbuf_addstr(&pathbuf, base);
-	len = pathbuf.len;
 
 	while (tree_entry(tree, &entry)) {
 		int te_len = tree_entry_len(entry.path, entry.sha1);
-		pathbuf.len = len;
-		strbuf_add(&pathbuf, entry.path, te_len);
+		int len = baselen;
+		memcpy(base + baselen, entry.path, te_len+1);
 
-		if (S_ISDIR(entry.mode))
+		len += te_len;
+		if (S_ISDIR(entry.mode)) {
 			/* Match "abc/" against pathspec to
 			 * decide if we want to descend into "abc"
 			 * directory.
 			 */
-			strbuf_addch(&pathbuf, '/');
+			base[len++] = '/';
+			base[len] = 0;
+		}
 
-		down = pathbuf.buf + tn_len;
-		if (!pathspec_matches(pathspec->raw, down, opt->max_depth))
+		if (!pathspec_matches(pathspec->raw, base, opt->max_depth))
 			;
-		else if (S_ISREG(entry.mode))
-			hit |= grep_sha1(opt, entry.sha1, pathbuf.buf, tn_len);
+		else if (S_ISREG(entry.mode)) {
+			hit |= grep_sha1(opt, entry.sha1, base-tree_name_len, tree_name_len);
+		}
 		else if (S_ISDIR(entry.mode)) {
 			enum object_type type;
 			struct tree_desc sub;
@@ -672,13 +666,12 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 				die("unable to read tree (%s)",
 				    sha1_to_hex(entry.sha1));
 			init_tree_desc(&sub, data, size);
-			hit |= grep_tree(opt, pathspec, &sub, tree_name, down);
+			hit |= grep_tree(opt, pathspec, &sub, base, len, tree_name_len);
 			free(data);
 		}
 		if (hit && opt->status_only)
 			break;
 	}
-	strbuf_release(&pathbuf);
 	return hit;
 }
 
@@ -691,13 +684,23 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 		struct tree_desc tree;
 		void *data;
 		unsigned long size;
-		int hit;
+		int hit, len;
+		char *base;
+
 		data = read_object_with_reference(obj->sha1, tree_type,
 						  &size, NULL);
 		if (!data)
 			die("unable to read tree (%s)", sha1_to_hex(obj->sha1));
 		init_tree_desc(&tree, data, size);
-		hit = grep_tree(opt, pathspec, &tree, name, "");
+		len = name ? strlen(name) : 0;
+		base = xmalloc(PATH_MAX + len + 1);
+		if (len) {
+			memcpy(base, name, len);
+			base[len++] = ':';
+		}
+		base[len] = 0;
+		hit = grep_tree(opt, pathspec, &tree, base+len, 0, len);
+		free(base);
 		free(data);
 		return hit;
 	}
-- 
1.7.3.3.476.g10a82

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 19/19] grep: drop pathspec_matches() in favor of tree_entry_interesting()
  2010-12-13  9:46 [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1]) Nguyễn Thái Ngọc Duy
                   ` (17 preceding siblings ...)
  2010-12-13  9:46 ` [PATCH 18/19] grep: use preallocated buffer for grep_tree() Nguyễn Thái Ngọc Duy
@ 2010-12-13  9:46 ` Nguyễn Thái Ngọc Duy
  18 siblings, 0 replies; 45+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2010-12-13  9:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy


Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/grep.c |  130 ++++++-------------------------------------------------
 1 files changed, 15 insertions(+), 115 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 6bd5728..a56cdd6 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -333,106 +333,6 @@ static int grep_config(const char *var, const char *value, void *cb)
 	return 0;
 }
 
-/*
- * Return non-zero if max_depth is negative or path has no more then max_depth
- * slashes.
- */
-static int accept_subdir(const char *path, int max_depth)
-{
-	if (max_depth < 0)
-		return 1;
-
-	while ((path = strchr(path, '/')) != NULL) {
-		max_depth--;
-		if (max_depth < 0)
-			return 0;
-		path++;
-	}
-	return 1;
-}
-
-/*
- * Return non-zero if name is a subdirectory of match and is not too deep.
- */
-static int is_subdir(const char *name, int namelen,
-		const char *match, int matchlen, int max_depth)
-{
-	if (matchlen > namelen || strncmp(name, match, matchlen))
-		return 0;
-
-	if (name[matchlen] == '\0') /* exact match */
-		return 1;
-
-	if (!matchlen || match[matchlen-1] == '/' || name[matchlen] == '/')
-		return accept_subdir(name + matchlen + 1, max_depth);
-
-	return 0;
-}
-
-/*
- * git grep pathspecs are somewhat different from diff-tree pathspecs;
- * pathname wildcards are allowed.
- */
-static int pathspec_matches(const char **paths, const char *name, int max_depth)
-{
-	int namelen, i;
-	if (!paths || !*paths)
-		return accept_subdir(name, max_depth);
-	namelen = strlen(name);
-	for (i = 0; paths[i]; i++) {
-		const char *match = paths[i];
-		int matchlen = strlen(match);
-		const char *cp, *meta;
-
-		if (is_subdir(name, namelen, match, matchlen, max_depth))
-			return 1;
-		if (!fnmatch(match, name, 0))
-			return 1;
-		if (name[namelen-1] != '/')
-			continue;
-
-		/* We are being asked if the directory ("name") is worth
-		 * descending into.
-		 *
-		 * Find the longest leading directory name that does
-		 * not have metacharacter in the pathspec; the name
-		 * we are looking at must overlap with that directory.
-		 */
-		for (cp = match, meta = NULL; cp - match < matchlen; cp++) {
-			char ch = *cp;
-			if (ch == '*' || ch == '[' || ch == '?') {
-				meta = cp;
-				break;
-			}
-		}
-		if (!meta)
-			meta = cp; /* fully literal */
-
-		if (namelen <= meta - match) {
-			/* Looking at "Documentation/" and
-			 * the pattern says "Documentation/howto/", or
-			 * "Documentation/diff*.txt".  The name we
-			 * have should match prefix.
-			 */
-			if (!memcmp(match, name, namelen))
-				return 1;
-			continue;
-		}
-
-		if (meta - match < namelen) {
-			/* Looking at "Documentation/howto/" and
-			 * the pattern says "Documentation/h*";
-			 * match up to "Do.../h"; this avoids descending
-			 * into "Documentation/technical/".
-			 */
-			if (!memcmp(match, name, meta - match))
-				return 1;
-			continue;
-		}
-	}
-	return 0;
-}
-
 static void *lock_and_read_sha1_file(const unsigned char *sha1, enum object_type *type, unsigned long *size)
 {
 	void *data;
@@ -632,39 +532,37 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 		     struct tree_desc *tree, char *base,
 		     int baselen, int tree_name_len)
 {
-	int hit = 0;
+	int hit = 0, matched = 0;
 	struct name_entry entry;
 
 	while (tree_entry(tree, &entry)) {
 		int te_len = tree_entry_len(entry.path, entry.sha1);
-		int len = baselen;
-		memcpy(base + baselen, entry.path, te_len+1);
 
-		len += te_len;
-		if (S_ISDIR(entry.mode)) {
-			/* Match "abc/" against pathspec to
-			 * decide if we want to descend into "abc"
-			 * directory.
-			 */
-			base[len++] = '/';
-			base[len] = 0;
+		if (matched != 2) {
+			matched = tree_entry_interesting(&entry, base, baselen, pathspec);
+			if (matched == -1)
+				break; /* no more matches */
+			if (!matched)
+				continue;
 		}
 
-		if (!pathspec_matches(pathspec->raw, base, opt->max_depth))
-			;
-		else if (S_ISREG(entry.mode)) {
+		memcpy(base + baselen, entry.path, te_len+1);
+		if (S_ISREG(entry.mode))
 			hit |= grep_sha1(opt, entry.sha1, base-tree_name_len, tree_name_len);
-		}
 		else if (S_ISDIR(entry.mode)) {
 			enum object_type type;
 			struct tree_desc sub;
 			void *data;
 			unsigned long size;
+			int len = baselen + te_len;
 
 			data = lock_and_read_sha1_file(entry.sha1, &type, &size);
 			if (!data)
 				die("unable to read tree (%s)",
 				    sha1_to_hex(entry.sha1));
+
+			base[len++] = '/';
+			base[len] = 0;
 			init_tree_desc(&sub, data, size);
 			hit |= grep_tree(opt, pathspec, &sub, base, len, tree_name_len);
 			free(data);
@@ -1069,6 +967,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		paths[1] = NULL;
 	}
 	init_pathspec(&pathspec, paths);
+	pathspec.max_depth = opt.max_depth;
+	pathspec.recursive = 1;
 
 	if (show_in_pager && (cached || list.nr))
 		die("--open-files-in-pager only works on the worktree");
-- 
1.7.3.3.476.g10a82

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 01/19] Add struct pathspec
  2010-12-13  9:46 ` [PATCH 01/19] Add struct pathspec Nguyễn Thái Ngọc Duy
@ 2010-12-13 17:31   ` Thiago Farina
  2010-12-14 12:50     ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 45+ messages in thread
From: Thiago Farina @ 2010-12-13 17:31 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Junio C Hamano

2010/12/13 Nguyễn Thái Ngọc Duy <pclouds@gmail.com>:
> This struct for now is just a wrapper for the current pathspec form:
> const char **. It is intended to be extended with more useful
> pathspec-related information over time.
>
> The data structure for passing pathspec around remains const char **,
> struct pathspec will be initialized locally to be used and destroyed.
> Hopefully all pathspec related code will be gradually migrated to pass
> this struct instead.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  cache.h |    7 +++++++
>  dir.c   |   18 ++++++++++++++++++
>  2 files changed, 25 insertions(+), 0 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index 2ef2fa3..3330769 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -493,6 +493,13 @@ extern int index_name_is_other(const struct index_state *, const char *, int);
>  extern int ie_match_stat(const struct index_state *, struct cache_entry *, struct stat *, unsigned int);
>  extern int ie_modified(const struct index_state *, struct cache_entry *, struct stat *, unsigned int);
>
> +struct pathspec {
> +       const char **raw; /* get_pathspec() result, not freed by free_pathspec() */
> +       int nr;
> +};
> +
> +extern int init_pathspec(struct pathspec *, const char **);
> +extern void free_pathspec(struct pathspec *);
>  extern int ce_path_match(const struct cache_entry *ce, const char **pathspec);
>  extern int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object, enum object_type type, const char *path);
>  extern int index_path(unsigned char *sha1, const char *path, struct stat *st, int write_object);
> diff --git a/dir.c b/dir.c
> index 133f472..205adc4 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1071,3 +1071,21 @@ int remove_path(const char *name)
>        return 0;
>  }
>
> +int init_pathspec(struct pathspec *pathspec, const char **paths)
> +{
> +       const char **p = paths;
> +
> +       memset(pathspec, 0, sizeof(*pathspec));
> +       if (!p)
> +               return 0;
> +       while (*p)
> +               p++;
> +       pathspec->raw = paths;
> +       pathspec->nr = p - paths;
> +       return 0;
> +}
> +
> +void free_pathspec(struct pathspec *pathspec)
> +{
> +       ; /* do nothing */

Of curse this ; here is not necessary :)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 08/19] pathspec: mark wildcard pathspecs from the beginning
  2010-12-13  9:46 ` [PATCH 08/19] pathspec: mark wildcard pathspecs from the beginning Nguyễn Thái Ngọc Duy
@ 2010-12-13 18:09   ` Junio C Hamano
  0 siblings, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2010-12-13 18:09 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Junio C Hamano

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> diff --git a/dir.c b/dir.c
> index 646c79f..0987d0c 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1089,8 +1089,17 @@ int init_pathspec(struct pathspec *pathspec, const char **paths)
>  	pathspec->items = xmalloc(sizeof(struct pathspec_item)*pathspec->nr);
>  	for (i = 0; i < pathspec->nr; i++) {
>  		struct pathspec_item *item = pathspec->items+i;
> -
> -		item->len = strlen(paths[i]);
> +		const char *path = paths[i];
> +
> +		item->len = strlen(path);
> +		item->has_wildcard = !no_wildcard(path);
> +		if (item->has_wildcard) {
> +			pathspec->has_wildcard = 1;
> +			item->prefix_len = 0;
> +			while (item->prefix_len < item->len &&
> +			       strchr("*?[{\\", path[item->prefix_len]) == NULL)
> +				item->prefix_len++;
> +		}

Would it make sense to use strcspn(3) here?

>  	}
>  	return 0;
>  }
> -- 
> 1.7.3.3.476.g10a82

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 09/19] tree-diff.c: reserve space in "base" for pathname concatenation
  2010-12-13  9:46 ` [PATCH 09/19] tree-diff.c: reserve space in "base" for pathname concatenation Nguyễn Thái Ngọc Duy
@ 2010-12-13 18:10   ` Junio C Hamano
  2010-12-14  5:00     ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2010-12-13 18:10 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> This patch make sure that "base" parameter is writable. The callees
> are free to modify it as long as base remains the same before
> entering and after leaving the callee.
>
> This avoids quite a bit of malloc and memcpy().
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---

I like what I see in this patch in general, but there is nothing that
guarantees that you are "reserving" enough space.  Doesn't the buffer
overflow with deep enough trees?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 10/19] tree_entry_interesting(): factor out most matching logic
  2010-12-13  9:46 ` [PATCH 10/19] tree_entry_interesting(): factor out most matching logic Nguyễn Thái Ngọc Duy
@ 2010-12-13 18:10   ` Junio C Hamano
  0 siblings, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2010-12-13 18:10 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> diff --git a/tree-walk.c b/tree-walk.c
> index 01168ea..40a4657 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -481,88 +566,21 @@ int tree_entry_interesting(const struct name_entry *entry,
>  	for (i = 0; i < ps->nr; i++) {
>  		const char *match = ps->raw[i];
>  		int matchlen = ps->items[i].len;
> -		int m = -1; /* signals that we haven't called strncmp() */
>  
>  		if (baselen >= matchlen) {
> -			/* If it doesn't match, move along... */

Why lose this comment?  That is exactly what this simplified part is
about.  If match-dir-prefix says it doesn't match, we move along.

Keep this comment here, and drop the one you copied to the beginning of
match-dir-prefix, which is not about "moving along" anymore.  The function
is about deciding if they match or not.

> -			if (strncmp(base, match, matchlen))
> +			if (!match_dir_prefix(base, baselen, match, matchlen))
> +				/* Just a random prefix match */
>  				continue;

This comment no longer holds true, even though the same comment at the end
of match-dir-prefix function is correct.  This should probably say "does
not match", but that is something you already said.

Other than that, this is a very nice refactoring.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 12/19] tree_entry_interesting(): support wildcard matching
  2010-12-13  9:46 ` [PATCH 12/19] tree_entry_interesting(): support wildcard matching Nguyễn Thái Ngọc Duy
@ 2010-12-13 18:10   ` Junio C Hamano
  2010-12-14 15:04     ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2010-12-13 18:10 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  tree-walk.c |   33 ++++++++++++++++++++++++++++++---
>  tree-walk.h |    2 +-
>  2 files changed, 31 insertions(+), 4 deletions(-)

;-)  Looks almost too easy.

> diff --git a/tree-walk.c b/tree-walk.c
> index d28de30..b5ad42b 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -596,6 +598,31 @@ int tree_entry_interesting(const struct name_entry *entry,
>  					&never_interesting))
>  				return 1;
>  		}
> +
> +match_wildcards:
> +		/*
> +		 * Concatenate base and entry->path into one and do
> +		 * fnmatch() on it.
> +		 */
> +
> +		if (!item->has_wildcard)
> +			continue;

I think the comment comes after this if--continue.

> +		never_interesting = 0;

When we have wildcard we would want to disable the never-interesting
optimization, but I wonder if doing so only when we do not have exact hit
is what we want.  If a sick person had a path "a?b" tracked, and asked to
match a pathspec "a?b", don't we still want to say "'a?b' of course
matches, but 'a1b' ('1' comes earlier than '?' in the sort order) and
'aAb' ('A' comes later) also match"?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 13/19] tree_entry_interesting(): optimize fnmatch when base is matched
  2010-12-13  9:46 ` [PATCH 13/19] tree_entry_interesting(): optimize fnmatch when base is matched Nguyễn Thái Ngọc Duy
@ 2010-12-13 18:10   ` Junio C Hamano
  0 siblings, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2010-12-13 18:10 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Looks good.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 11/19] tree_entry_interesting: support depth limit
  2010-12-13  9:46 ` [PATCH 11/19] tree_entry_interesting: support depth limit Nguyễn Thái Ngọc Duy
@ 2010-12-13 18:10   ` Junio C Hamano
  2010-12-14 14:44     ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2010-12-13 18:10 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> diff --git a/dir.c b/dir.c
> index 0987d0c..bb5076c 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -71,6 +71,21 @@ int fill_directory(struct dir_struct *dir, const char **pathspec)
>  	return len;
>  }
>  
> +int within_depth(const char *name, int namelen,
> +			int depth, int max_depth)
> +{
> +	const char *cp = name, *cpe = name + namelen;
> +
> +	while (cp < cpe) {
> +		if (*cp++ != '/')
> +			continue;
> +		depth++;
> +		if (depth > max_depth)
> +			return 0;
> +	}
> +	return 1;
> +}

Makes me almost suspect that it may make more sense to keep track of the
"depth" in a similar way as "base" and "baselen" as "traversal state" on
the side of the caller so that you do not have to scan the string for
slashes over and over again.  But given the codeflow, doing so might make
the result look too ugly, so I won't recommend that without thinking,
though.

> diff --git a/tree-walk.c b/tree-walk.c
> index 40a4657..d28de30 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -558,9 +559,17 @@ int tree_entry_interesting(const struct name_entry *entry,
>  	int pathlen;
>  	int never_interesting = -1;
>  
> -	if (!ps || !ps->nr)
> +	if (!ps)
>  		return 1;
>  
> +	if (!ps->nr) {
> +		if (!ps->recursive || ps->max_depth == -1)
> +			return 1;
> +		return !!within_depth(base, baselen,
> +				      !!S_ISDIR(entry->mode),
> +				      ps->max_depth);
> +	}

This gives different behaviour to between callers that give you NULL as
pathspec and callers that give you a pathspec with zero elements.  Is this
intended?  What is the use case (iow, what does an end user give from the
command line to experience this difference)?

> @@ -571,7 +580,13 @@ int tree_entry_interesting(const struct name_entry *entry,
>  			if (!match_dir_prefix(base, baselen, match, matchlen))
>  				/* Just a random prefix match */
>  				continue;
> -			return 2;
> +
> +			if (!ps->recursive || ps->max_depth == -1)
> +				return 2;
> +
> +			return !!within_depth(base+matchlen+1, baselen-matchlen-1,
> +					      !!S_ISDIR(entry->mode),
> +					      ps->max_depth);
>  		}

If two pathspecs that overlap with each other (e.g. "Documentation/" and
"Documentation/technical") are given, and if the shorter one comes before
the longer one in ps[], wouldn't this give you an unexpected result?  When
inspecting "Documentation/technical/api/foo.txt" with depth limit of 2, if
you didn't have "Documentation/" pathspec, you count "api/foo.txt"
relative to "Documentation/technical", declare that the path is within
limit, and show it.  But if you have "Documentation/" in ps[], you look at
it, decide "technical/api/foo.txt" is too deep and return false without
even looking at "Documentation/technical" that may appear later in ps[],
no?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 04/19] Convert struct diff_options to use struct pathspec
  2010-12-13  9:46 ` [PATCH 04/19] Convert struct diff_options to use struct pathspec Nguyễn Thái Ngọc Duy
@ 2010-12-13 19:00   ` Junio C Hamano
  2010-12-14  5:02     ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2010-12-13 19:00 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> diff --git a/diff-lib.c b/diff-lib.c
> index 392ce2b..3b809f2 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -501,7 +501,7 @@ int do_diff_cache(const unsigned char *tree_sha1, struct diff_options *opt)
>  	active_nr = dst - active_cache;
>  
>  	init_revisions(&revs, NULL);
> -	revs.prune_data = opt->paths;
> +	revs.prune_data = opt->pathspec.raw;
>  	tree = parse_tree_indirect(tree_sha1);
>  	if (!tree)
>  		die("bad tree object %s", sha1_to_hex(tree_sha1));

Hopefully the prune_data will become opt->pathspec not "raw" and use your
generalied/unified pathspec matching code in later patches in the series,
yes?

Other than that (no, "including that", really), looks nicely done.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 05/19] tree_entry_interesting(): remove dependency on struct diff_options
  2010-12-13  9:46 ` [PATCH 05/19] tree_entry_interesting(): remove dependency on struct diff_options Nguyễn Thái Ngọc Duy
@ 2010-12-13 19:11   ` Junio C Hamano
  0 siblings, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2010-12-13 19:11 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> This function can be potentially used in more places than just
> tree-diff.c. "struct diff_options" does not make much sense outside
> diff_tree_sha1().
>
> While removing the use of diff_options, it also removes
> tree_entry_extract() call, which means S_ISDIR() uses the entry->mode
> directly, without being filtered by canon_mode() (called internally
> inside tree_entry_extract)
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

The patch looks good, but the second paragraph above sounded unnecessarily
alarming and I had to read it three times to make sure nothing fishy is
going on ;-).

It bypasses tree-entry-extract call and uses fields of "entry" directly.
With the change, entry->mode is used without first getting normalized with
canon_mode(), but the only use of the mode information in this function is
to check the type of the entry by giving it to S_ISDIR() macro, and the
result does not change with or without canon_mode(), so it is Ok.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 15/19] pathspec: add match_pathspec_depth()
  2010-12-13  9:46 ` [PATCH 15/19] pathspec: add match_pathspec_depth() Nguyễn Thái Ngọc Duy
@ 2010-12-13 19:28   ` Junio C Hamano
  2010-12-14  5:07     ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2010-12-13 19:28 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> match_pathspec_depth() is similar to match_pathspec() except that it
> can take depth limit.
>
> In long term, match_pathspec() should be removed in favor of this
> function.

Hmm, this strongly suggests that match_pathspec() should take "const
struct pathspec *" which already contains the necessary information and
more, including the depth limit, no?

> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  dir.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
>  dir.h |    3 +++
>  2 files changed, 48 insertions(+), 0 deletions(-)
>
> diff --git a/dir.c b/dir.c
> index bb5076c..e12dbdd 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -169,6 +169,51 @@ int match_pathspec(const char **pathspec, const char *name, int namelen,
>  	return retval;
>  }
>  
> +int match_pathspec_depth(const char **pathspec, int max_depth,
> +			 const char *name, int namelen,
> +			 int prefix, char *seen)
> +{
> +	int i, retval = 0;
> +
> +	if (!pathspec) {
> +		if (max_depth == -1)
> +			return MATCHED_RECURSIVELY;
> +
> +		if (within_depth(name, namelen, 0, max_depth))
> +			return MATCHED_EXACTLY;

Why the difference between _RECURSIVELY and _EXACTLY here?  If you have a
five-level deep project and give max-depth of 1000, shouldn't you get the
same result as you run the same command with unlimited depth?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 14/19] Convert ce_path_match() use to match_pathspec()
  2010-12-13  9:46 ` [PATCH 14/19] Convert ce_path_match() use to match_pathspec() Nguyễn Thái Ngọc Duy
@ 2010-12-13 19:31   ` Junio C Hamano
  2010-12-14 15:14     ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2010-12-13 19:31 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> Previously ce_path_match() is used together with tree_entry_interesting().
> Both do not support wildcards. tree_entry_interesting() understands
> wildcards now, so it's time to teach ce_path_match() to do the same.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---

This function is called from both Porcelains (e.g. "git diff" via
diff-lib.c and "git status" via wt-status.c) and plumbing commands
(e.g. "git update-index"), and we are changing the semantics in a big way,
even though it is a huge improvement.

I am fairly excited with the progress of this series.  When it gets merged
to 'master', we will be declaring 1.8.0 ;-)

I imagine that eventually ce_path_match() will also take "struct pathspec *"
not "const char **" when the series is completed, yes?  Or there is no
real need for that?

>  read-cache.c |   20 +-------------------
>  1 files changed, 1 insertions(+), 19 deletions(-)
>
> diff --git a/read-cache.c b/read-cache.c
> index 1f42473..cbabd8b 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -685,28 +685,10 @@ int ce_same_name(struct cache_entry *a, struct cache_entry *b)
>  
>  int ce_path_match(const struct cache_entry *ce, const char **pathspec)
>  {
> -	const char *match, *name;
> -	int len;
> -
>  	if (!pathspec)
>  		return 1;
>  
> -	len = ce_namelen(ce);
> -	name = ce->name;
> -	while ((match = *pathspec++) != NULL) {
> -		int matchlen = strlen(match);
> -		if (matchlen > len)
> -			continue;
> -		if (memcmp(name, match, matchlen))
> -			continue;
> -		if (matchlen && name[matchlen-1] == '/')
> -			return 1;
> -		if (name[matchlen] == '/' || !name[matchlen])
> -			return 1;
> -		if (!matchlen)
> -			return 1;
> -	}
> -	return 0;
> +	return match_pathspec(pathspec, ce->name, ce_namelen(ce), 0, NULL);
>  }
>  
>  /*
> -- 
> 1.7.3.3.476.g10a82

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 09/19] tree-diff.c: reserve space in "base" for pathname concatenation
  2010-12-13 18:10   ` Junio C Hamano
@ 2010-12-14  5:00     ` Nguyen Thai Ngoc Duy
  2010-12-14  5:32       ` Junio C Hamano
  0 siblings, 1 reply; 45+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-12-14  5:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

2010/12/14 Junio C Hamano <gitster@pobox.com>:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> This patch make sure that "base" parameter is writable. The callees
>> are free to modify it as long as base remains the same before
>> entering and after leaving the callee.
>>
>> This avoids quite a bit of malloc and memcpy().
>>
>> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>> ---
>
> I like what I see in this patch in general, but there is nothing that
> guarantees that you are "reserving" enough space.  Doesn't the buffer
> overflow with deep enough trees?

All paths should not exceed PATH_MAX, right? That's my assumption. If
it's wrong, then, well.. we need another way.
-- 
Duy

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 04/19] Convert struct diff_options to use struct pathspec
  2010-12-13 19:00   ` Junio C Hamano
@ 2010-12-14  5:02     ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 45+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-12-14  5:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

2010/12/14 Junio C Hamano <gitster@pobox.com>:
> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>
>> diff --git a/diff-lib.c b/diff-lib.c
>> index 392ce2b..3b809f2 100644
>> --- a/diff-lib.c
>> +++ b/diff-lib.c
>> @@ -501,7 +501,7 @@ int do_diff_cache(const unsigned char *tree_sha1, struct diff_options *opt)
>>       active_nr = dst - active_cache;
>>
>>       init_revisions(&revs, NULL);
>> -     revs.prune_data = opt->paths;
>> +     revs.prune_data = opt->pathspec.raw;
>>       tree = parse_tree_indirect(tree_sha1);
>>       if (!tree)
>>               die("bad tree object %s", sha1_to_hex(tree_sha1));
>
> Hopefully the prune_data will become opt->pathspec not "raw" and use your
> generalied/unified pathspec matching code in later patches in the series,
> yes?

Yes. But that needs closer look. "prune_data" is void* and casted in
many places. Compiler won't catch mistyping for us.
-- 
Duy

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 15/19] pathspec: add match_pathspec_depth()
  2010-12-13 19:28   ` Junio C Hamano
@ 2010-12-14  5:07     ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 45+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-12-14  5:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

2010/12/14 Junio C Hamano <gitster@pobox.com>:
> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>
>> match_pathspec_depth() is similar to match_pathspec() except that it
>> can take depth limit.
>>
>> In long term, match_pathspec() should be removed in favor of this
>> function.
>
> Hmm, this strongly suggests that match_pathspec() should take "const
> struct pathspec *" which already contains the necessary information and
> more, including the depth limit, no?

Good idea. Thanks!

>> +int match_pathspec_depth(const char **pathspec, int max_depth,
>> +                      const char *name, int namelen,
>> +                      int prefix, char *seen)
>> +{
>> +     int i, retval = 0;
>> +
>> +     if (!pathspec) {
>> +             if (max_depth == -1)
>> +                     return MATCHED_RECURSIVELY;
>> +
>> +             if (within_depth(name, namelen, 0, max_depth))
>> +                     return MATCHED_EXACTLY;
>
> Why the difference between _RECURSIVELY and _EXACTLY here?  If you have a
> five-level deep project and give max-depth of 1000, shouldn't you get the
> same result as you run the same command with unlimited depth?

But if max-depth is 5 and the project is 1000-level deep, it should
return _EXACTLY, not _RECURSIVELY, right?
-- 
Duy

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 09/19] tree-diff.c: reserve space in "base" for pathname concatenation
  2010-12-14  5:00     ` Nguyen Thai Ngoc Duy
@ 2010-12-14  5:32       ` Junio C Hamano
  2010-12-14  7:10         ` Nguyen Thai Ngoc Duy
  2010-12-14  7:32         ` Johannes Sixt
  0 siblings, 2 replies; 45+ messages in thread
From: Junio C Hamano @ 2010-12-14  5:32 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> All paths should not exceed PATH_MAX, right?

Your PATH_MAX may be a lot shorter than the PATH_MAX on the system I
created my trees on that you are reading.  Besides, you can create
arbitrarily deep tree structure without busting the limit on the
filesystem by writing tree objects by hand.

In other words, all paths shouldn't exceed PATH_MAX, but the responsiblity
to verify that is _yours_.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 09/19] tree-diff.c: reserve space in "base" for pathname concatenation
  2010-12-14  5:32       ` Junio C Hamano
@ 2010-12-14  7:10         ` Nguyen Thai Ngoc Duy
  2010-12-14  7:32         ` Johannes Sixt
  1 sibling, 0 replies; 45+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-12-14  7:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, Dec 14, 2010 at 12:32 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>
>> All paths should not exceed PATH_MAX, right?
>
> Your PATH_MAX may be a lot shorter than the PATH_MAX on the system I
> created my trees on that you are reading.  Besides, you can create
> arbitrarily deep tree structure without busting the limit on the
> filesystem by writing tree objects by hand.
>
> In other words, all paths shouldn't exceed PATH_MAX, but the responsiblity
> to verify that is _yours_.

OK. I can check and die if any path exceeds PATH_MAX.
-- 
Duy

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 09/19] tree-diff.c: reserve space in "base" for pathname concatenation
  2010-12-14  5:32       ` Junio C Hamano
  2010-12-14  7:10         ` Nguyen Thai Ngoc Duy
@ 2010-12-14  7:32         ` Johannes Sixt
  2010-12-14  7:43           ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 45+ messages in thread
From: Johannes Sixt @ 2010-12-14  7:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nguyen Thai Ngoc Duy, git

Am 12/14/2010 6:32, schrieb Junio C Hamano:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
> 
>> All paths should not exceed PATH_MAX, right?
> 
> Your PATH_MAX may be a lot shorter than the PATH_MAX on the system I
> created my trees on that you are reading.

And that is not just gray theory: On Windows, PATH_MAX is whopping 260
characters!

-- Hannes

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 09/19] tree-diff.c: reserve space in "base" for pathname concatenation
  2010-12-14  7:32         ` Johannes Sixt
@ 2010-12-14  7:43           ` Nguyen Thai Ngoc Duy
  2010-12-14  8:21             ` Johannes Sixt
  0 siblings, 1 reply; 45+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-12-14  7:43 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, git

On Tue, Dec 14, 2010 at 2:32 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
> Am 12/14/2010 6:32, schrieb Junio C Hamano:
>> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>>
>>> All paths should not exceed PATH_MAX, right?
>>
>> Your PATH_MAX may be a lot shorter than the PATH_MAX on the system I
>> created my trees on that you are reading.
>
> And that is not just gray theory: On Windows, PATH_MAX is whopping 260
> characters!

Does that mean Windows won't accept any path longer than that, or
PATH_MAX is defined just for fun?
-- 
Duy

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 09/19] tree-diff.c: reserve space in "base" for pathname concatenation
  2010-12-14  7:43           ` Nguyen Thai Ngoc Duy
@ 2010-12-14  8:21             ` Johannes Sixt
  2010-12-14 13:01               ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 45+ messages in thread
From: Johannes Sixt @ 2010-12-14  8:21 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Junio C Hamano, git

Am 12/14/2010 8:43, schrieb Nguyen Thai Ngoc Duy:
> On Tue, Dec 14, 2010 at 2:32 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
>> Am 12/14/2010 6:32, schrieb Junio C Hamano:
>>> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>>>
>>>> All paths should not exceed PATH_MAX, right?
>>>
>>> Your PATH_MAX may be a lot shorter than the PATH_MAX on the system I
>>> created my trees on that you are reading.
>>
>> And that is not just gray theory: On Windows, PATH_MAX is whopping 260
>> characters!
> 
> Does that mean Windows won't accept any path longer than that, or
> PATH_MAX is defined just for fun?

It is the limit for path names that we can have without MAJOR extra
efforts. It is not a limit implied by the file system.

-- Hannes

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 01/19] Add struct pathspec
  2010-12-13 17:31   ` Thiago Farina
@ 2010-12-14 12:50     ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 45+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-12-14 12:50 UTC (permalink / raw)
  To: Thiago Farina; +Cc: git, Junio C Hamano

2010/12/14 Thiago Farina <tfransosi@gmail.com>:
>> +void free_pathspec(struct pathspec *pathspec)
>> +{
>> +       ; /* do nothing */
>
> Of curse this ; here is not necessary :)
>

Well, can't please everyone [1]. It's for clarity only.
free_pathspec() becomes real in the next patches.

[1] http://mid.gmane.org/7vbp7ix20u.fsf@alter.siamese.dyndns.org
-- 
Duy

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 09/19] tree-diff.c: reserve space in "base" for pathname concatenation
  2010-12-14  8:21             ` Johannes Sixt
@ 2010-12-14 13:01               ` Nguyen Thai Ngoc Duy
  2010-12-14 17:11                 ` Junio C Hamano
  0 siblings, 1 reply; 45+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-12-14 13:01 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, git

On Tue, Dec 14, 2010 at 3:21 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
> Am 12/14/2010 8:43, schrieb Nguyen Thai Ngoc Duy:
>> On Tue, Dec 14, 2010 at 2:32 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
>>> Am 12/14/2010 6:32, schrieb Junio C Hamano:
>>>> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>>>>
>>>>> All paths should not exceed PATH_MAX, right?
>>>>
>>>> Your PATH_MAX may be a lot shorter than the PATH_MAX on the system I
>>>> created my trees on that you are reading.
>>>
>>> And that is not just gray theory: On Windows, PATH_MAX is whopping 260
>>> characters!
>>
>> Does that mean Windows won't accept any path longer than that, or
>> PATH_MAX is defined just for fun?
>
> It is the limit for path names that we can have without MAJOR extra
> efforts. It is not a limit implied by the file system.

Just googled around and found this [1]. I'm going with strbuf so it
can be reallocated if necessary. A bit tricky because "base" can be
moved, but I think it's worth it.

[1] http://insanecoding.blogspot.com/2007/11/pathmax-simply-isnt.html
-- 
Duy

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 11/19] tree_entry_interesting: support depth limit
  2010-12-13 18:10   ` Junio C Hamano
@ 2010-12-14 14:44     ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 45+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-12-14 14:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

2010/12/14 Junio C Hamano <gitster@pobox.com>:
>> +int within_depth(const char *name, int namelen,
>> +                     int depth, int max_depth)
>> +{
>> +     const char *cp = name, *cpe = name + namelen;
>> +
>> +     while (cp < cpe) {
>> +             if (*cp++ != '/')
>> +                     continue;
>> +             depth++;
>> +             if (depth > max_depth)
>> +                     return 0;
>> +     }
>> +     return 1;
>> +}
>
> Makes me almost suspect that it may make more sense to keep track of the
> "depth" in a similar way as "base" and "baselen" as "traversal state" on
> the side of the caller so that you do not have to scan the string for
> slashes over and over again.  But given the codeflow, doing so might make
> the result look too ugly, so I won't recommend that without thinking,
> though.

Moreover, this function is also used by match_pathspec_depth().

>> -     if (!ps || !ps->nr)
>> +     if (!ps)
>>               return 1;
>>
>> +     if (!ps->nr) {
>> +             if (!ps->recursive || ps->max_depth == -1)
>> +                     return 1;
>> +             return !!within_depth(base, baselen,
>> +                                   !!S_ISDIR(entry->mode),
>> +                                   ps->max_depth);
>> +     }
>
> This gives different behaviour to between callers that give you NULL as
> pathspec and callers that give you a pathspec with zero elements.  Is this
> intended?  What is the use case (iow, what does an end user give from the
> command line to experience this difference)?

Old pathspec type's legacy. When pathspec is of "const char **", it
can be NULL. When struct pathspec is used, I don't think we need to
support NULL pathspec. Will remove that "if (!ps)" clause.

>> @@ -571,7 +580,13 @@ int tree_entry_interesting(const struct name_entry *entry,
>>                       if (!match_dir_prefix(base, baselen, match, matchlen))
>>                               /* Just a random prefix match */
>>                               continue;
>> -                     return 2;
>> +
>> +                     if (!ps->recursive || ps->max_depth == -1)
>> +                             return 2;
>> +
>> +                     return !!within_depth(base+matchlen+1, baselen-matchlen-1,
>> +                                           !!S_ISDIR(entry->mode),
>> +                                           ps->max_depth);
>>               }
>
> If two pathspecs that overlap with each other (e.g. "Documentation/" and
> "Documentation/technical") are given, and if the shorter one comes before
> the longer one in ps[], wouldn't this give you an unexpected result?  When
> inspecting "Documentation/technical/api/foo.txt" with depth limit of 2, if
> you didn't have "Documentation/" pathspec, you count "api/foo.txt"
> relative to "Documentation/technical", declare that the path is within
> limit, and show it.  But if you have "Documentation/" in ps[], you look at
> it, decide "technical/api/foo.txt" is too deep and return false without
> even looking at "Documentation/technical" that may appear later in ps[],
> no?

Right. Hmm.. grep's pathspec_matches() probably has this problem too.
-- 
Duy

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 12/19] tree_entry_interesting(): support wildcard matching
  2010-12-13 18:10   ` Junio C Hamano
@ 2010-12-14 15:04     ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 45+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-12-14 15:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

2010/12/14 Junio C Hamano <gitster@pobox.com>:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>> ---
>>  tree-walk.c |   33 ++++++++++++++++++++++++++++++---
>>  tree-walk.h |    2 +-
>>  2 files changed, 31 insertions(+), 4 deletions(-)
>
> ;-)  Looks almost too easy.

Much better, yes. I went the wrong way and allowed wildcards in
directory matching too. It turned tree_entry_interesting() into hell.

>> +             never_interesting = 0;
>
> When we have wildcard we would want to disable the never-interesting
> optimization, but I wonder if doing so only when we do not have exact hit
> is what we want.  If a sick person had a path "a?b" tracked, and asked to
> match a pathspec "a?b", don't we still want to say "'a?b' of course
> matches, but 'a1b' ('1' comes earlier than '?' in the sort order) and
> 'aAb' ('A' comes later) also match"?

Heck, yeah. Easier for me though, just put this at top :-)

int never_interesting = ps->has_wildcard ? 0 : -1;
-- 
Duy

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 14/19] Convert ce_path_match() use to match_pathspec()
  2010-12-13 19:31   ` Junio C Hamano
@ 2010-12-14 15:14     ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 45+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-12-14 15:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

2010/12/14 Junio C Hamano <gitster@pobox.com>:
> I imagine that eventually ce_path_match() will also take "struct pathspec *"
> not "const char **" when the series is completed, yes?  Or there is no
> real need for that?

But I do have the patches to convert ce_path_match to struct
pathspec*. Hmm.. I did. Need to look into reflogs..
-- 
Duy

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 09/19] tree-diff.c: reserve space in "base" for pathname concatenation
  2010-12-14 13:01               ` Nguyen Thai Ngoc Duy
@ 2010-12-14 17:11                 ` Junio C Hamano
  0 siblings, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2010-12-14 17:11 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Johannes Sixt, git

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> Just googled around and found this [1]. I'm going with strbuf so it
> can be reallocated if necessary. A bit tricky because "base" can be
> moved, but I think it's worth it.

Thanks---I was about to suggest using strbuf but you figured it out
already, which is good ;-).

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2010-12-14 17:11 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-13  9:46 [PATCH 00/19] nd/struct-pathspec (or pathspec unification [1]) Nguyễn Thái Ngọc Duy
2010-12-13  9:46 ` [PATCH 01/19] Add struct pathspec Nguyễn Thái Ngọc Duy
2010-12-13 17:31   ` Thiago Farina
2010-12-14 12:50     ` Nguyen Thai Ngoc Duy
2010-12-13  9:46 ` [PATCH 02/19] diff-no-index: use diff_tree_setup_paths() Nguyễn Thái Ngọc Duy
2010-12-13  9:46 ` [PATCH 03/19] pathspec: cache string length when initializing pathspec Nguyễn Thái Ngọc Duy
2010-12-13  9:46 ` [PATCH 04/19] Convert struct diff_options to use struct pathspec Nguyễn Thái Ngọc Duy
2010-12-13 19:00   ` Junio C Hamano
2010-12-14  5:02     ` Nguyen Thai Ngoc Duy
2010-12-13  9:46 ` [PATCH 05/19] tree_entry_interesting(): remove dependency on struct diff_options Nguyễn Thái Ngọc Duy
2010-12-13 19:11   ` Junio C Hamano
2010-12-13  9:46 ` [PATCH 06/19] Move tree_entry_interesting() to tree-walk.c and export it Nguyễn Thái Ngọc Duy
2010-12-13  9:46 ` [PATCH 07/19] glossary: define pathspec Nguyễn Thái Ngọc Duy
2010-12-13  9:46 ` [PATCH 08/19] pathspec: mark wildcard pathspecs from the beginning Nguyễn Thái Ngọc Duy
2010-12-13 18:09   ` Junio C Hamano
2010-12-13  9:46 ` [PATCH 09/19] tree-diff.c: reserve space in "base" for pathname concatenation Nguyễn Thái Ngọc Duy
2010-12-13 18:10   ` Junio C Hamano
2010-12-14  5:00     ` Nguyen Thai Ngoc Duy
2010-12-14  5:32       ` Junio C Hamano
2010-12-14  7:10         ` Nguyen Thai Ngoc Duy
2010-12-14  7:32         ` Johannes Sixt
2010-12-14  7:43           ` Nguyen Thai Ngoc Duy
2010-12-14  8:21             ` Johannes Sixt
2010-12-14 13:01               ` Nguyen Thai Ngoc Duy
2010-12-14 17:11                 ` Junio C Hamano
2010-12-13  9:46 ` [PATCH 10/19] tree_entry_interesting(): factor out most matching logic Nguyễn Thái Ngọc Duy
2010-12-13 18:10   ` Junio C Hamano
2010-12-13  9:46 ` [PATCH 11/19] tree_entry_interesting: support depth limit Nguyễn Thái Ngọc Duy
2010-12-13 18:10   ` Junio C Hamano
2010-12-14 14:44     ` Nguyen Thai Ngoc Duy
2010-12-13  9:46 ` [PATCH 12/19] tree_entry_interesting(): support wildcard matching Nguyễn Thái Ngọc Duy
2010-12-13 18:10   ` Junio C Hamano
2010-12-14 15:04     ` Nguyen Thai Ngoc Duy
2010-12-13  9:46 ` [PATCH 13/19] tree_entry_interesting(): optimize fnmatch when base is matched Nguyễn Thái Ngọc Duy
2010-12-13 18:10   ` Junio C Hamano
2010-12-13  9:46 ` [PATCH 14/19] Convert ce_path_match() use to match_pathspec() Nguyễn Thái Ngọc Duy
2010-12-13 19:31   ` Junio C Hamano
2010-12-14 15:14     ` Nguyen Thai Ngoc Duy
2010-12-13  9:46 ` [PATCH 15/19] pathspec: add match_pathspec_depth() Nguyễn Thái Ngọc Duy
2010-12-13 19:28   ` Junio C Hamano
2010-12-14  5:07     ` Nguyen Thai Ngoc Duy
2010-12-13  9:46 ` [PATCH 16/19] grep: convert to use struct pathspec Nguyễn Thái Ngọc Duy
2010-12-13  9:46 ` [PATCH 17/19] grep: use match_pathspec_depth() for cache/worktree grepping Nguyễn Thái Ngọc Duy
2010-12-13  9:46 ` [PATCH 18/19] grep: use preallocated buffer for grep_tree() Nguyễn Thái Ngọc Duy
2010-12-13  9:46 ` [PATCH 19/19] grep: drop pathspec_matches() in favor of tree_entry_interesting() Nguyễn Thái Ngọc Duy

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).