git@vger.kernel.org mailing list mirror (one of many)
* [RFC PATCH 0/5] recursively grep across submodules
@ 2016-10-27 22:38 Brandon Williams
  2016-10-27 22:38 ` [PATCH 1/5] submodules: add helper functions to determine presence of submodules Brandon Williams
                   ` (6 more replies)
  0 siblings, 7 replies; 126+ messages in thread
From: Brandon Williams @ 2016-10-27 22:38 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

This patch series adds some basic API functions to the submodule interface and
teaches grep to search recursively in submodules.

The additions to the submodule interface allow grep to verify that a submodule
has been initialized and checked out before launching a child process.  One
issue still needs to be worked out: when grepping history, the submodule may
not have a working tree (or the path it had in the past may not match what
currently exists), so instead of changing directory into the submodule you need
to look for the submodule's .git directory in the parent's .git/modules
directory.  If it exists, we would need to change directory to
.git/modules/<submodule> and run the child process from there.  This currently
doesn't work due to commit <10f5c52656>, since the GIT_DIR env variable is
explicitly set to '.git'.  I'm going to spend some more time thinking about
this problem and will address it in an additional patch in the series at a
later time.

As for the rest of the series, it should be ready for review or comments.


Brandon Williams (5):
  submodules: add helper functions to determine presence of submodules
  submodules: load gitmodules file from commit sha1
  grep: add submodules as a grep source type
  grep: optionally recurse into submodules
  grep: enable recurse-submodules to work on <tree> objects

 Documentation/git-grep.txt         |  14 ++
 builtin/grep.c                     | 364 ++++++++++++++++++++++++++++++++++---
 cache.h                            |   2 +
 config.c                           |   8 +-
 git.c                              |   2 +-
 grep.c                             |  16 +-
 grep.h                             |   1 +
 submodule-config.c                 |   6 +-
 submodule-config.h                 |   3 +
 submodule.c                        |  51 ++++++
 submodule.h                        |   3 +
 t/t7814-grep-recurse-submodules.sh | 141 ++++++++++++++
 tree-walk.c                        |  17 +-
 13 files changed, 588 insertions(+), 40 deletions(-)
 create mode 100755 t/t7814-grep-recurse-submodules.sh

-- 
2.10.1.613.g6021889



* [PATCH 1/5] submodules: add helper functions to determine presence of submodules
  2016-10-27 22:38 [RFC PATCH 0/5] recursively grep across submodules Brandon Williams
@ 2016-10-27 22:38 ` Brandon Williams
  2016-10-27 22:38 ` [PATCH 2/5] submodules: load gitmodules file from commit sha1 Brandon Williams
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-10-27 22:38 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

Add two helper functions to submodule.c.
`is_submodule_initialized()` checks if a submodule has been initialized
at a given path and `is_submodule_checked_out()` checks if a submodule
has been checked out at a given path.
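
For readers without the git tree at hand, the checked-out test can be
approximated outside of git.  The following is only a minimal sketch, not
the code in this patch: `submodule_dot_git_exists()` is a hypothetical
name, and it uses plain stat() and a fixed-size buffer in place of git's
file_exists() and strbuf.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical stand-in for is_submodule_checked_out(): report whether
 * "<path>/.git" exists.  In a submodule, .git may be a directory or a
 * regular "gitfile", so any kind of filesystem entry counts.
 */
int submodule_dot_git_exists(const char *path)
{
	char buf[4096];
	struct stat st;
	int n = snprintf(buf, sizeof(buf), "%s/.git", path);

	if (n < 0 || (size_t)n >= sizeof(buf))
		return 0;	/* path too long: treat as not checked out */
	return stat(buf, &st) == 0;
}
```

A caller would pass the submodule path relative to the current directory,
for example `submodule_dot_git_exists("submodule")`.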

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 submodule.c | 39 +++++++++++++++++++++++++++++++++++++++
 submodule.h |  2 ++
 2 files changed, 41 insertions(+)

diff --git a/submodule.c b/submodule.c
index 6f7d883de..029b24440 100644
--- a/submodule.c
+++ b/submodule.c
@@ -198,6 +198,45 @@ void gitmodules_config(void)
 	}
 }
 
+/*
+ * Determine if a submodule has been initialized at a given 'path'
+ */
+int is_submodule_initialized(const char *path)
+{
+	int ret = 0;
+	const struct submodule *module = NULL;
+
+	module = submodule_from_path(null_sha1, path);
+
+	if (module) {
+		struct strbuf buf = STRBUF_INIT;
+		char *submodule_url = NULL;
+
+		strbuf_addf(&buf, "submodule.%s.url", module->name);
+		ret = !git_config_get_string(buf.buf, &submodule_url);
+
+		free(submodule_url);
+		strbuf_release(&buf);
+	}
+
+	return ret;
+}
+
+/*
+ * Determine if a submodule has been checked out at a given 'path'
+ */
+int is_submodule_checked_out(const char *path)
+{
+	int ret = 0;
+	struct strbuf buf = STRBUF_INIT;
+
+	strbuf_addf(&buf, "%s/.git", path);
+	ret = file_exists(buf.buf);
+
+	strbuf_release(&buf);
+	return ret;
+}
+
 int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst)
 {
diff --git a/submodule.h b/submodule.h
index d9e197a94..bd039ca98 100644
--- a/submodule.h
+++ b/submodule.h
@@ -37,6 +37,8 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
 		const char *path);
 int submodule_config(const char *var, const char *value, void *cb);
 void gitmodules_config(void);
+extern int is_submodule_initialized(const char *path);
+extern int is_submodule_checked_out(const char *path);
 int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst);
 const char *submodule_strategy_to_string(const struct submodule_update_strategy *s);
-- 
2.10.1.613.g6021889



* [PATCH 2/5] submodules: load gitmodules file from commit sha1
  2016-10-27 22:38 [RFC PATCH 0/5] recursively grep across submodules Brandon Williams
  2016-10-27 22:38 ` [PATCH 1/5] submodules: add helper functions to determine presence of submodules Brandon Williams
@ 2016-10-27 22:38 ` Brandon Williams
  2016-10-27 22:38 ` [PATCH 3/5] grep: add submodules as a grep source type Brandon Williams
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-10-27 22:38 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

Teach submodules to load a '.gitmodules' file from a commit sha1.  This
enables the population of the submodule_cache to be based on the state
of the '.gitmodules' file from a particular commit.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 cache.h            |  2 ++
 config.c           |  8 ++++----
 submodule-config.c |  6 +++---
 submodule-config.h |  3 +++
 submodule.c        | 12 ++++++++++++
 submodule.h        |  1 +
 6 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/cache.h b/cache.h
index f7ee41456..74b0c3cba 100644
--- a/cache.h
+++ b/cache.h
@@ -1681,6 +1681,8 @@ extern int git_default_config(const char *, const char *, void *);
 extern int git_config_from_file(config_fn_t fn, const char *, void *);
 extern int git_config_from_mem(config_fn_t fn, const enum config_origin_type,
 					const char *name, const char *buf, size_t len, void *data);
+extern int git_config_from_blob_sha1(config_fn_t fn, const char *name,
+				     const unsigned char *sha1, void *data);
 extern void git_config_push_parameter(const char *text);
 extern int git_config_from_parameters(config_fn_t fn, void *data);
 extern void git_config(config_fn_t fn, void *);
diff --git a/config.c b/config.c
index 83fdecb1b..4d78e7227 100644
--- a/config.c
+++ b/config.c
@@ -1214,10 +1214,10 @@ int git_config_from_mem(config_fn_t fn, const enum config_origin_type origin_typ
 	return do_config_from(&top, fn, data);
 }
 
-static int git_config_from_blob_sha1(config_fn_t fn,
-				     const char *name,
-				     const unsigned char *sha1,
-				     void *data)
+int git_config_from_blob_sha1(config_fn_t fn,
+			      const char *name,
+			      const unsigned char *sha1,
+			      void *data)
 {
 	enum object_type type;
 	char *buf;
diff --git a/submodule-config.c b/submodule-config.c
index 098085be6..8b9a2ef28 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -379,9 +379,9 @@ static int parse_config(const char *var, const char *value, void *data)
 	return ret;
 }
 
-static int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
-				      unsigned char *gitmodules_sha1,
-				      struct strbuf *rev)
+int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
+			       unsigned char *gitmodules_sha1,
+			       struct strbuf *rev)
 {
 	int ret = 0;
 
diff --git a/submodule-config.h b/submodule-config.h
index d05c542d2..78584ba6a 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -29,6 +29,9 @@ const struct submodule *submodule_from_name(const unsigned char *commit_sha1,
 		const char *name);
 const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
 		const char *path);
+extern int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
+				      unsigned char *gitmodules_sha1,
+				      struct strbuf *rev);
 void submodule_free(void);
 
 #endif /* SUBMODULE_CONFIG_H */
diff --git a/submodule.c b/submodule.c
index 029b24440..f2a56689f 100644
--- a/submodule.c
+++ b/submodule.c
@@ -198,6 +198,18 @@ void gitmodules_config(void)
 	}
 }
 
+void gitmodules_config_sha1(const unsigned char *commit_sha1)
+{
+	struct strbuf rev = STRBUF_INIT;
+	unsigned char sha1[20];
+
+	if (gitmodule_sha1_from_commit(commit_sha1, sha1, &rev)) {
+		git_config_from_blob_sha1(submodule_config, rev.buf,
+					  sha1, NULL);
+	}
+	strbuf_release(&rev);
+}
+
 /*
  * Determine if a submodule has been initialized at a given 'path'
  */
diff --git a/submodule.h b/submodule.h
index bd039ca98..9a24ac82e 100644
--- a/submodule.h
+++ b/submodule.h
@@ -37,6 +37,7 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
 		const char *path);
 int submodule_config(const char *var, const char *value, void *cb);
 void gitmodules_config(void);
+extern void gitmodules_config_sha1(const unsigned char *commit_sha1);
 extern int is_submodule_initialized(const char *path);
 extern int is_submodule_checked_out(const char *path);
 int parse_submodule_update_strategy(const char *value,
-- 
2.10.1.613.g6021889



* [PATCH 3/5] grep: add submodules as a grep source type
  2016-10-27 22:38 [RFC PATCH 0/5] recursively grep across submodules Brandon Williams
  2016-10-27 22:38 ` [PATCH 1/5] submodules: add helper functions to determine presence of submodules Brandon Williams
  2016-10-27 22:38 ` [PATCH 2/5] submodules: load gitmodules file from commit sha1 Brandon Williams
@ 2016-10-27 22:38 ` Brandon Williams
  2016-10-27 22:38 ` [PATCH 4/5] grep: optionally recurse into submodules Brandon Williams
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-10-27 22:38 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

Add `GREP_SOURCE_SUBMODULE` as a grep_source type and cases for this new
type in the various switch statements in grep.c.

When initializing a grep_source with type `GREP_SOURCE_SUBMODULE` the
identifier can either be NULL (to indicate that the working tree will be
used) or a SHA1 (the REV of the submodule to be grep'd).  If the
identifier is a SHA1 then we want to fall through to the
`GREP_SOURCE_SHA1` case to handle the copying of the SHA1.
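
The NULL-or-SHA1 handling described above can be illustrated in isolation.
This is a simplified sketch rather than the grep.c code: the `source`
struct, the `SOURCE_*` enum values and `source_init()` are hypothetical
names, and plain malloc()/memcpy() stand in for xmalloc()/hashcpy().

```c
#include <stdlib.h>
#include <string.h>

enum source_type { SOURCE_FILE, SOURCE_SHA1, SOURCE_BUF, SOURCE_SUBMODULE };

struct source {
	enum source_type type;
	void *identifier;	/* owned copy, or NULL */
};

/* Hypothetical, simplified analogue of grep_source_init(). */
void source_init(struct source *s, enum source_type type,
		 const void *identifier)
{
	s->type = type;
	s->identifier = NULL;

	switch (type) {
	case SOURCE_FILE: {
		/* the identifier is a NUL-terminated file name */
		size_t len = strlen(identifier) + 1;

		s->identifier = malloc(len);
		if (s->identifier)
			memcpy(s->identifier, identifier, len);
		break;
	}
	case SOURCE_SUBMODULE:
		if (!identifier)
			break;	/* NULL: search the working tree */
		/* fall through: a non-NULL identifier is a 20-byte SHA1 */
	case SOURCE_SHA1:
		s->identifier = malloc(20);
		if (s->identifier)
			memcpy(s->identifier, identifier, 20);
		break;
	case SOURCE_BUF:
		break;
	}
}
```

The point of the fall-through is that the submodule case shares the SHA1
copy instead of duplicating it, while the NULL case leaves early.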

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 grep.c | 16 +++++++++++++++-
 grep.h |  1 +
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 1194d35b5..0dbdc1d00 100644
--- a/grep.c
+++ b/grep.c
@@ -1735,12 +1735,23 @@ void grep_source_init(struct grep_source *gs, enum grep_source_type type,
 	case GREP_SOURCE_FILE:
 		gs->identifier = xstrdup(identifier);
 		break;
+	case GREP_SOURCE_SUBMODULE:
+		if (!identifier) {
+			gs->identifier = NULL;
+			break;
+		}
+		/*
+		 * FALL THROUGH
+		 * If the identifier is non-NULL (in the submodule case) it
+		 * will be a SHA1 that needs to be copied.
+		 */
 	case GREP_SOURCE_SHA1:
 		gs->identifier = xmalloc(20);
 		hashcpy(gs->identifier, identifier);
 		break;
 	case GREP_SOURCE_BUF:
 		gs->identifier = NULL;
+		break;
 	}
 }
 
@@ -1760,6 +1771,7 @@ void grep_source_clear_data(struct grep_source *gs)
 	switch (gs->type) {
 	case GREP_SOURCE_FILE:
 	case GREP_SOURCE_SHA1:
+	case GREP_SOURCE_SUBMODULE:
 		free(gs->buf);
 		gs->buf = NULL;
 		gs->size = 0;
@@ -1831,8 +1843,10 @@ static int grep_source_load(struct grep_source *gs)
 		return grep_source_load_sha1(gs);
 	case GREP_SOURCE_BUF:
 		return gs->buf ? 0 : -1;
+	case GREP_SOURCE_SUBMODULE:
+		break;
 	}
-	die("BUG: invalid grep_source type");
+	die("BUG: invalid grep_source type to load");
 }
 
 void grep_source_load_driver(struct grep_source *gs)
diff --git a/grep.h b/grep.h
index 5856a23e4..267534ca2 100644
--- a/grep.h
+++ b/grep.h
@@ -161,6 +161,7 @@ struct grep_source {
 		GREP_SOURCE_SHA1,
 		GREP_SOURCE_FILE,
 		GREP_SOURCE_BUF,
+		GREP_SOURCE_SUBMODULE,
 	} type;
 	void *identifier;
 
-- 
2.10.1.613.g6021889



* [PATCH 4/5] grep: optionally recurse into submodules
  2016-10-27 22:38 [RFC PATCH 0/5] recursively grep across submodules Brandon Williams
                   ` (2 preceding siblings ...)
  2016-10-27 22:38 ` [PATCH 3/5] grep: add submodules as a grep source type Brandon Williams
@ 2016-10-27 22:38 ` Brandon Williams
  2016-11-05  5:09   ` Jonathan Tan
  2016-10-27 22:38 ` [PATCH 5/5] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-10-27 22:38 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

Allow grep to recognize submodules and recursively search for patterns in
each submodule.  This is done by forking off a process to recursively
call grep on each submodule.  The top level --super-prefix option is
used to pass a path to the submodule which can in turn be used to
prepend to output or in pathspec matching logic.

Recursion only occurs for submodules which have been initialized and
checked out by the parent project.  If a submodule hasn't been
initialized and checked out it is simply skipped.

In order to support the existing multi-threading infrastructure in grep,
output from each child process is captured in a strbuf so that it can be
later printed to the console in an ordered fashion.

To limit the number of threads that are created, each child process has
half the number of threads as its parent (minimum of 1), otherwise we
potentially have a fork-bomb.
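
To make the halving concrete, here is a small standalone sketch of the
`(num_threads + 1) / 2` rule that this patch uses when it builds the
child's `--threads` option.  The function names `child_threads()` and
`threads_at_depth()` are hypothetical and exist only for illustration.

```c
/*
 * Thread count handed to a child grep: half of the parent's count,
 * rounded up, so a parent with at least one thread never passes 0.
 */
int child_threads(int parent_threads)
{
	return (parent_threads + 1) / 2;
}

/* Hypothetical helper: thread count at a given submodule nesting depth. */
int threads_at_depth(int top_level_threads, int depth)
{
	int n = top_level_threads;

	while (depth-- > 0)
		n = child_threads(n);
	return n;
}
```

Starting from the default of 8 threads, successive nesting levels get 4,
2, 1, 1, ... threads.  When threading is disabled (num_threads is 0) the
patch does not pass `--threads` to the child at all.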

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 Documentation/git-grep.txt         |   5 +
 builtin/grep.c                     | 301 ++++++++++++++++++++++++++++++++++---
 git.c                              |   2 +-
 t/t7814-grep-recurse-submodules.sh |  99 ++++++++++++
 4 files changed, 386 insertions(+), 21 deletions(-)
 create mode 100755 t/t7814-grep-recurse-submodules.sh

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 0ecea6e49..17aa1ba70 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -26,6 +26,7 @@ SYNOPSIS
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
+	   [--recurse-submodules]
 	   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...]
 	   [--] [<pathspec>...]
 
@@ -88,6 +89,10 @@ OPTIONS
 	mechanism.  Only useful when searching files in the current directory
 	with `--no-index`.
 
+--recurse-submodules::
+	Recursively search in each submodule that has been initialized and
+	checked out in the repository.
+
 -a::
 --text::
 	Process binary files as if they were text.
diff --git a/builtin/grep.c b/builtin/grep.c
index 8887b6add..f34f16df9 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -18,12 +18,20 @@
 #include "quote.h"
 #include "dir.h"
 #include "pathspec.h"
+#include "submodule.h"
 
 static char const * const grep_usage[] = {
 	N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"),
 	NULL
 };
 
+static const char *super_prefix;
+static int recurse_submodules;
+static struct argv_array submodule_options = ARGV_ARRAY_INIT;
+
+static int grep_submodule_launch(struct grep_opt *opt,
+				 const struct grep_source *gs);
+
 #define GREP_NUM_THREADS_DEFAULT 8
 static int num_threads;
 
@@ -174,7 +182,10 @@ static void *run(void *arg)
 			break;
 
 		opt->output_priv = w;
-		hit |= grep_source(opt, &w->source);
+		if (w->source.type == GREP_SOURCE_SUBMODULE)
+			hit |= grep_submodule_launch(opt, &w->source);
+		else
+			hit |= grep_source(opt, &w->source);
 		grep_source_clear_data(&w->source);
 		work_done(w);
 	}
@@ -300,6 +311,10 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 	if (opt->relative && opt->prefix_length) {
 		quote_path_relative(filename + tree_name_len, opt->prefix, &pathbuf);
 		strbuf_insert(&pathbuf, 0, filename, tree_name_len);
+	} else if (super_prefix) {
+		strbuf_add(&pathbuf, filename, tree_name_len);
+		strbuf_addstr(&pathbuf, super_prefix);
+		strbuf_addstr(&pathbuf, filename + tree_name_len);
 	} else {
 		strbuf_addstr(&pathbuf, filename);
 	}
@@ -328,10 +343,13 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 {
 	struct strbuf buf = STRBUF_INIT;
 
-	if (opt->relative && opt->prefix_length)
+	if (opt->relative && opt->prefix_length) {
 		quote_path_relative(filename, opt->prefix, &buf);
-	else
+	} else {
+		if (super_prefix)
+			strbuf_addstr(&buf, super_prefix);
 		strbuf_addstr(&buf, filename);
+	}
 
 #ifndef NO_PTHREADS
 	if (num_threads) {
@@ -378,31 +396,259 @@ static void run_pager(struct grep_opt *opt, const char *prefix)
 		exit(status);
 }
 
-static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int cached)
+static void compile_submodule_options(const struct grep_opt *opt,
+				      const struct pathspec *pathspec,
+				      int cached, int untracked,
+				      int opt_exclude, int use_index,
+				      int pattern_type_arg)
+{
+	struct grep_pat *pattern;
+	int i;
+
+	if (recurse_submodules)
+		argv_array_push(&submodule_options, "--recurse-submodules");
+
+	if (cached)
+		argv_array_push(&submodule_options, "--cached");
+	if (!use_index)
+		argv_array_push(&submodule_options, "--no-index");
+	if (untracked)
+		argv_array_push(&submodule_options, "--untracked");
+	if (opt_exclude > 0)
+		argv_array_push(&submodule_options, "--exclude-standard");
+
+	if (opt->invert)
+		argv_array_push(&submodule_options, "-v");
+	if (opt->ignore_case)
+		argv_array_push(&submodule_options, "-i");
+	if (opt->word_regexp)
+		argv_array_push(&submodule_options, "-w");
+	switch (opt->binary) {
+	case GREP_BINARY_NOMATCH:
+		argv_array_push(&submodule_options, "-I");
+		break;
+	case GREP_BINARY_TEXT:
+		argv_array_push(&submodule_options, "-a");
+		break;
+	default:
+		break;
+	}
+	if (opt->allow_textconv)
+		argv_array_push(&submodule_options, "--textconv");
+	if (opt->max_depth != -1)
+		argv_array_pushf(&submodule_options, "--max-depth=%d",
+				 opt->max_depth);
+	if (opt->linenum)
+		argv_array_push(&submodule_options, "-n");
+	if (!opt->pathname)
+		argv_array_push(&submodule_options, "-h");
+	if (!opt->relative)
+		argv_array_push(&submodule_options, "--full-name");
+	if (opt->name_only)
+		argv_array_push(&submodule_options, "-l");
+	if (opt->unmatch_name_only)
+		argv_array_push(&submodule_options, "-L");
+	if (opt->null_following_name)
+		argv_array_push(&submodule_options, "-z");
+	if (opt->count)
+		argv_array_push(&submodule_options, "-c");
+	if (opt->file_break)
+		argv_array_push(&submodule_options, "--break");
+	if (opt->heading)
+		argv_array_push(&submodule_options, "--heading");
+	if (opt->pre_context)
+		argv_array_pushf(&submodule_options, "--before-context=%d",
+				 opt->pre_context);
+	if (opt->post_context)
+		argv_array_pushf(&submodule_options, "--after-context=%d",
+				 opt->post_context);
+	if (opt->funcname)
+		argv_array_push(&submodule_options, "-p");
+	if (opt->funcbody)
+		argv_array_push(&submodule_options, "-W");
+	if (opt->all_match)
+		argv_array_push(&submodule_options, "--all-match");
+	if (opt->debug)
+		argv_array_push(&submodule_options, "--debug");
+	if (opt->status_only)
+		argv_array_push(&submodule_options, "-q");
+
+	switch (pattern_type_arg) {
+	case GREP_PATTERN_TYPE_BRE:
+		argv_array_push(&submodule_options, "-G");
+		break;
+	case GREP_PATTERN_TYPE_ERE:
+		argv_array_push(&submodule_options, "-E");
+		break;
+	case GREP_PATTERN_TYPE_FIXED:
+		argv_array_push(&submodule_options, "-F");
+		break;
+	case GREP_PATTERN_TYPE_PCRE:
+		argv_array_push(&submodule_options, "-P");
+		break;
+	case GREP_PATTERN_TYPE_UNSPECIFIED:
+		break;
+	}
+
+	for (pattern = opt->pattern_list; pattern != NULL;
+	     pattern = pattern->next) {
+		switch (pattern->token) {
+		case GREP_PATTERN:
+			argv_array_pushf(&submodule_options, "-e%s",
+					 pattern->pattern);
+			break;
+		case GREP_AND:
+		case GREP_OPEN_PAREN:
+		case GREP_CLOSE_PAREN:
+		case GREP_NOT:
+		case GREP_OR:
+			argv_array_push(&submodule_options, pattern->pattern);
+			break;
+		/* BODY and HEAD are not used by git-grep */
+		case GREP_PATTERN_BODY:
+		case GREP_PATTERN_HEAD:
+			break;
+		}
+	}
+
+	/*
+	 * Limit number of threads for child process to use.
+	 * This is to prevent potential fork-bomb behavior of git-grep as each
+	 * submodule process has its own thread pool.
+	 */
+	if (num_threads)
+		argv_array_pushf(&submodule_options, "--threads=%d",
+				 (num_threads + 1) / 2);
+
+	/* Add Pathspecs */
+	argv_array_push(&submodule_options, "--");
+	for (i = 0; i < pathspec->nr; i++)
+		argv_array_push(&submodule_options,
+				pathspec->items[i].original);
+}
+
+/*
+ * Launch child process to grep contents of a submodule
+ */
+static int grep_submodule_launch(struct grep_opt *opt,
+				 const struct grep_source *gs)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	int status, i;
+	struct work_item *w = opt->output_priv;
+
+	prepare_submodule_repo_env(&cp.env_array);
+
+	/* Add super prefix */
+	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
+			 super_prefix ? super_prefix: "",
+			 gs->path);
+	argv_array_push(&cp.args, "grep");
+
+	/* Add options */
+	for (i = 0; i < submodule_options.argc; i++)
+		argv_array_push(&cp.args, submodule_options.argv[i]);
+
+	cp.git_cmd = 1;
+	cp.dir = gs->path;
+
+	/*
+	 * Capture output to output buffer and check the return code from the
+	 * child process.  A '0' indicates a hit, a '1' indicates no hit and
+	 * anything else is an error.
+	 */
+	status = capture_command(&cp, &w->out, 0);
+	if (status && (status != 1))
+		exit(status);
+
+	/* invert the return code to make a hit equal to 1 */
+	return !status;
+}
+
+/*
+ * Prep grep structures for a submodule grep
+ * sha1: the sha1 of the submodule or NULL if using the working tree
+ * filename: name of the submodule including tree name of parent
+ * path: location of the submodule
+ */
+static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1,
+			  const char *filename, const char *path)
+{
+	if (!(is_submodule_initialized(path) &&
+	      is_submodule_checked_out(path))) {
+		warning("skipping submodule '%s%s' since it is not initialized and checked out",
+			super_prefix ? super_prefix: "",
+			path);
+		return 0;
+	}
+
+#ifndef NO_PTHREADS
+	if (num_threads) {
+		add_work(opt, GREP_SOURCE_SUBMODULE, filename, path, sha1);
+		return 0;
+	} else
+#endif
+	{
+		struct work_item w;
+		int hit;
+
+		grep_source_init(&w.source, GREP_SOURCE_SUBMODULE,
+				 filename, path, sha1);
+		strbuf_init(&w.out, 0);
+		opt->output_priv = &w;
+		hit = grep_submodule_launch(opt, &w.source);
+
+		write_or_die(1, w.out.buf, w.out.len);
+
+		grep_source_clear(&w.source);
+		strbuf_release(&w.out);
+		return hit;
+	}
+}
+
+static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec,
+		      int cached)
 {
 	int hit = 0;
 	int nr;
+	struct strbuf name = STRBUF_INIT;
+	int name_base_len = 0;
+	if (super_prefix) {
+		name_base_len = strlen(super_prefix);
+		strbuf_addstr(&name, super_prefix);
+	}
+
 	read_cache();
 
 	for (nr = 0; nr < active_nr; nr++) {
 		const struct cache_entry *ce = active_cache[nr];
-		if (!S_ISREG(ce->ce_mode))
-			continue;
-		if (!ce_path_match(ce, pathspec, NULL))
-			continue;
-		/*
-		 * If CE_VALID is on, we assume worktree file and its cache entry
-		 * are identical, even if worktree file has been modified, so use
-		 * cache version instead
-		 */
-		if (cached || (ce->ce_flags & CE_VALID) || ce_skip_worktree(ce)) {
-			if (ce_stage(ce) || ce_intent_to_add(ce))
-				continue;
-			hit |= grep_sha1(opt, ce->oid.hash, ce->name, 0,
-					 ce->name);
+		strbuf_setlen(&name, name_base_len);
+		strbuf_addstr(&name, ce->name);
+
+		if (S_ISREG(ce->ce_mode) &&
+		    match_pathspec(pathspec, name.buf, name.len, 0, NULL,
+				   S_ISDIR(ce->ce_mode) ||
+				   S_ISGITLINK(ce->ce_mode))) {
+			/*
+			 * If CE_VALID is on, we assume worktree file and its
+			 * cache entry are identical, even if worktree file has
+			 * been modified, so use cache version instead
+			 */
+			if (cached || (ce->ce_flags & CE_VALID) ||
+			    ce_skip_worktree(ce)) {
+				if (ce_stage(ce) || ce_intent_to_add(ce))
+					continue;
+				hit |= grep_sha1(opt, ce->oid.hash, ce->name,
+						 0, ce->name);
+			}
+			else {
+				hit |= grep_file(opt, ce->name);
+			}
+		} else if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
+			   submodule_path_match(pathspec, name.buf, NULL)) {
+			hit |= grep_submodule(opt, NULL, ce->name, ce->name);
 		}
-		else
-			hit |= grep_file(opt, ce->name);
+
 		if (ce_stage(ce)) {
 			do {
 				nr++;
@@ -413,6 +659,8 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int
 		if (hit && opt->status_only)
 			break;
 	}
+
+	strbuf_release(&name);
 	return hit;
 }
 
@@ -651,6 +899,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			N_("search in both tracked and untracked files")),
 		OPT_SET_INT(0, "exclude-standard", &opt_exclude,
 			    N_("ignore files specified via '.gitignore'"), 1),
+		OPT_BOOL(0, "recurse-submodules", &recurse_submodules,
+			 N_("recursively search in each submodule")),
 		OPT_GROUP(""),
 		OPT_BOOL('v', "invert-match", &opt.invert,
 			N_("show non-matching lines")),
@@ -755,6 +1005,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	init_grep_defaults();
 	git_config(grep_cmd_config, NULL);
 	grep_init(&opt, prefix);
+	super_prefix = get_super_prefix();
 
 	/*
 	 * If there is no -- then the paths must exist in the working
@@ -872,6 +1123,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	pathspec.max_depth = opt.max_depth;
 	pathspec.recursive = 1;
 
+	if (recurse_submodules) {
+		gitmodules_config();
+		compile_submodule_options(&opt, &pathspec, cached, untracked,
+					  opt_exclude, use_index,
+					  pattern_type_arg);
+	}
+
 	if (show_in_pager && (cached || list.nr))
 		die(_("--open-files-in-pager only works on the worktree"));
 
@@ -895,6 +1153,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		}
 	}
 
+	if (recurse_submodules && (!use_index || untracked || list.nr))
+		die(_("option not supported with --recurse-submodules."));
+
 	if (!show_in_pager && !opt.status_only)
 		setup_pager();
 
diff --git a/git.c b/git.c
index efa1059fe..a156efd85 100644
--- a/git.c
+++ b/git.c
@@ -434,7 +434,7 @@ static struct cmd_struct commands[] = {
 	{ "fsck-objects", cmd_fsck, RUN_SETUP },
 	{ "gc", cmd_gc, RUN_SETUP },
 	{ "get-tar-commit-id", cmd_get_tar_commit_id },
-	{ "grep", cmd_grep, RUN_SETUP_GENTLY },
+	{ "grep", cmd_grep, RUN_SETUP_GENTLY | SUPPORT_SUPER_PREFIX },
 	{ "hash-object", cmd_hash_object },
 	{ "help", cmd_help },
 	{ "index-pack", cmd_index_pack, RUN_SETUP_GENTLY },
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
new file mode 100755
index 000000000..b670c70cb
--- /dev/null
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -0,0 +1,99 @@
+#!/bin/sh
+
+test_description='Test grep recurse-submodules feature
+
+This test verifies the recurse-submodules feature correctly greps across
+submodules.
+'
+
+. ./test-lib.sh
+
+test_expect_success 'setup directory structure and submodule' '
+	echo "foobar" >a &&
+	mkdir b &&
+	echo "bar" >b/b &&
+	git add a b &&
+	git commit -m "add a and b" &&
+	git init submodule &&
+	echo "foobar" >submodule/a &&
+	git -C submodule add a &&
+	git -C submodule commit -m "add a" &&
+	git submodule add ./submodule &&
+	git commit -m "added submodule"
+'
+
+test_expect_success 'grep correctly finds patterns in a submodule' '
+	cat >expect <<-\EOF &&
+	a:foobar
+	b/b:bar
+	submodule/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and basic pathspecs' '
+	cat >expect <<-\EOF &&
+	submodule/a:foobar
+	EOF
+
+	git grep -e. --recurse-submodules -- submodule >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and nested submodules' '
+	git init submodule/sub &&
+	echo "foobar" >submodule/sub/a &&
+	git -C submodule/sub add a &&
+	git -C submodule/sub commit -m "add a" &&
+	git -C submodule submodule add ./sub &&
+	git -C submodule add sub &&
+	git -C submodule commit -m "added sub" &&
+	git add submodule &&
+	git commit -m "updated submodule" &&
+
+	cat >expect <<-\EOF &&
+	a:foobar
+	b/b:bar
+	submodule/a:foobar
+	submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and multiple patterns' '
+	cat >expect <<-\EOF &&
+	a:foobar
+	submodule/a:foobar
+	submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --and -e "foo" --recurse-submodules > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and multiple patterns with --not' '
+	cat >expect <<-\EOF &&
+	b/b:bar
+	EOF
+
+	git grep -e "bar" --and --not -e "foo" --recurse-submodules > actual &&
+	test_cmp expect actual
+'
+
+test_incompatible_with_recurse_submodules ()
+{
+	test_expect_success "--recurse-submodules and $1 are incompatible" "
+		test_must_fail git grep -e. --recurse-submodules $1 2>actual &&
+		test_i18ngrep 'not supported with --recurse-submodules' actual
+	"
+}
+
+test_incompatible_with_recurse_submodules --untracked
+test_incompatible_with_recurse_submodules --no-index
+test_incompatible_with_recurse_submodules HEAD
+
+test_done
-- 
2.10.1.613.g6021889



* [PATCH 5/5] grep: enable recurse-submodules to work on <tree> objects
  2016-10-27 22:38 [RFC PATCH 0/5] recursively grep across submodules Brandon Williams
                   ` (3 preceding siblings ...)
  2016-10-27 22:38 ` [PATCH 4/5] grep: optionally recurse into submodules Brandon Williams
@ 2016-10-27 22:38 ` Brandon Williams
  2016-10-28 19:35   ` Brandon Williams
  2016-10-27 23:26 ` [RFC PATCH 0/5] recursively grep across submodules Junio C Hamano
  2016-10-31 22:38 ` [PATCH v2 0/6] " Brandon Williams
  6 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-10-27 22:38 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams

Teach grep to recursively search in submodules when provided with a
<tree> object. This allows grep to search a submodule based on the state
of the submodule that is present in a commit of the super project.

When grep is provided with a <tree> object, the name of the object is
prefixed to all output.  In order to provide uniformity of output
between the parent and child processes the option `--parent-basename`
has been added so that the child can preface all of its output with the
name of the parent's object instead of the name of the commit SHA1 of
the submodule. This changes output from the command
`git grep -e. -l --recurse-submodules HEAD` from:
HEAD:file
<commit sha1 of submodule>:sub/file

to:
HEAD:file
HEAD:sub/file
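
The shape of the uniform output name can be sketched on its own.  This
follows the way patch 4 splices the super prefix between the tree name and
the path in grep_sha1(); it is not the code from this patch, and
`format_output_name()` is a hypothetical helper that uses snprintf()
instead of strbuf.  How `--parent-basename` supplies the tree name to the
child is not shown here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<tree name><super prefix><path>" to 'out'.
 * 'filename' is "<tree name><path>", and 'tree_name_len' is the length of
 * the leading tree name (for example 5 for "HEAD:").
 * Returns 0 on success, -1 if 'out' is too small.
 */
int format_output_name(char *out, size_t outlen, const char *filename,
		       size_t tree_name_len, const char *super_prefix)
{
	int n = snprintf(out, outlen, "%.*s%s%s",
			 (int)tree_name_len, filename,
			 super_prefix ? super_prefix : "",
			 filename + tree_name_len);

	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With filename "HEAD:file", a tree name length of 5 and super prefix
"sub/", the result is "HEAD:sub/file"; with no super prefix the filename
comes back unchanged.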

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 Documentation/git-grep.txt         | 13 ++++++--
 builtin/grep.c                     | 67 +++++++++++++++++++++++++++++++++++---
 t/t7814-grep-recurse-submodules.sh | 44 ++++++++++++++++++++++++-
 tree-walk.c                        | 17 +++++-----
 4 files changed, 125 insertions(+), 16 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 17aa1ba70..386a868c6 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -26,7 +26,7 @@ SYNOPSIS
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
-	   [--recurse-submodules]
+	   [--recurse-submodules] [--parent-basename]
 	   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...]
 	   [--] [<pathspec>...]
 
@@ -91,7 +91,16 @@ OPTIONS
 
 --recurse-submodules::
 	Recursively search in each submodule that has been initialized and
-	checked out in the repository.
+	checked out in the repository.  When used in combination with the
+	<tree> option the prefix of all submodule output will be the name of
+	the parent project's <tree> object.
+
+--parent-basename::
+	For internal use only.  In order to produce uniform output with the
+	--recurse-submodules option, this option can be used to provide the
+	basename of a parent's <tree> object to a submodule so the submodule
+	can prefix its output with the parent's name rather than the SHA1 of
+	the submodule.
 
 -a::
 --text::
diff --git a/builtin/grep.c b/builtin/grep.c
index f34f16df9..bdf1b9089 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -19,6 +19,7 @@
 #include "dir.h"
 #include "pathspec.h"
 #include "submodule.h"
+#include "submodule-config.h"
 
 static char const * const grep_usage[] = {
 	N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"),
@@ -28,6 +29,7 @@ static char const * const grep_usage[] = {
 static const char *super_prefix;
 static int recurse_submodules;
 static struct argv_array submodule_options = ARGV_ARRAY_INIT;
+static const char *parent_basename;
 
 static int grep_submodule_launch(struct grep_opt *opt,
 				 const struct grep_source *gs);
@@ -535,6 +537,7 @@ static int grep_submodule_launch(struct grep_opt *opt,
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
 	int status, i;
+	const char *end_of_base;
 	struct work_item *w = opt->output_priv;
 
 	prepare_submodule_repo_env(&cp.env_array);
@@ -545,9 +548,36 @@ static int grep_submodule_launch(struct grep_opt *opt,
 			 gs->path);
 	argv_array_push(&cp.args, "grep");
 
+	/*
+	 * Add basename of parent project
+	 * When performing grep on a <tree> object the filename is prefixed
+	 * with the object's name: '<tree-name>:filename'.  In order to
+	 * provide uniformity of output we want to pass the name of the
+	 * parent project's object name to the submodule so the submodule can
+	 * prefix its output with the parent's name and not its own SHA1.
+	 */
+	end_of_base = strchr(gs->name, ':');
+	if (end_of_base)
+		argv_array_pushf(&cp.args, "--parent-basename=%.*s",
+				 (int) (end_of_base - gs->name),
+				 gs->name);
+
 	/* Add options */
-	for (i = 0; i < submodule_options.argc; i++)
+	for (i = 0; i < submodule_options.argc; i++) {
+		/*
+		 * If there is a <tree> identifier for the submodule, add the
+		 * rev after adding the submodule options but before the
+		 * pathspecs.  To do this we listen for the '--' and insert the
+		 * sha1 before pushing the '--' onto the child process argv
+		 * array.
+		 */
+		if (gs->identifier &&
+		    !strcmp("--", submodule_options.argv[i])) {
+			argv_array_push(&cp.args, sha1_to_hex(gs->identifier));
+		}
+
 		argv_array_push(&cp.args, submodule_options.argv[i]);
+	}
 
 	cp.git_cmd = 1;
 	cp.dir = gs->path;
@@ -672,12 +702,21 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 	enum interesting match = entry_not_interesting;
 	struct name_entry entry;
 	int old_baselen = base->len;
+	struct strbuf name = STRBUF_INIT;
+	int name_base_len = 0;
+	if (super_prefix) {
+		name_base_len = strlen(super_prefix);
+		strbuf_addstr(&name, super_prefix);
+	}
 
 	while (tree_entry(tree, &entry)) {
 		int te_len = tree_entry_len(&entry);
 
 		if (match != all_entries_interesting) {
-			match = tree_entry_interesting(&entry, base, tn_len, pathspec);
+			strbuf_setlen(&name, name_base_len);
+			strbuf_addstr(&name, base->buf + tn_len);
+			match = tree_entry_interesting(&entry, &name,
+						       0, pathspec);
 			if (match == all_entries_not_interesting)
 				break;
 			if (match == entry_not_interesting)
@@ -689,8 +728,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 		if (S_ISREG(entry.mode)) {
 			hit |= grep_sha1(opt, entry.oid->hash, base->buf, tn_len,
 					 check_attr ? base->buf + tn_len : NULL);
-		}
-		else if (S_ISDIR(entry.mode)) {
+		} else if (S_ISDIR(entry.mode)) {
 			enum object_type type;
 			struct tree_desc sub;
 			void *data;
@@ -706,12 +744,18 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 			hit |= grep_tree(opt, pathspec, &sub, base, tn_len,
 					 check_attr);
 			free(data);
+		} else if (recurse_submodules && S_ISGITLINK(entry.mode)) {
+			hit |= grep_submodule(opt, entry.oid->hash, base->buf,
+					      base->buf + tn_len);
 		}
+
 		strbuf_setlen(base, old_baselen);
 
 		if (hit && opt->status_only)
 			break;
 	}
+
+	strbuf_release(&name);
 	return hit;
 }
 
@@ -735,6 +779,10 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 		if (!data)
 			die(_("unable to read tree (%s)"), oid_to_hex(&obj->oid));
 
+		/* Use parent's name as base when recursing submodules */
+		if (recurse_submodules && parent_basename)
+			name = parent_basename;
+
 		len = name ? strlen(name) : 0;
 		strbuf_init(&base, PATH_MAX + len + 1);
 		if (len) {
@@ -761,6 +809,12 @@ static int grep_objects(struct grep_opt *opt, const struct pathspec *pathspec,
 	for (i = 0; i < nr; i++) {
 		struct object *real_obj;
 		real_obj = deref_tag(list->objects[i].item, NULL, 0);
+
+		/* load the gitmodules file for this rev */
+		if (recurse_submodules) {
+			submodule_free();
+			gitmodules_config_sha1(real_obj->oid.hash);
+		}
 		if (grep_object(opt, pathspec, real_obj, list->objects[i].name, list->objects[i].path)) {
 			hit = 1;
 			if (opt->status_only)
@@ -901,6 +955,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			    N_("ignore files specified via '.gitignore'"), 1),
 		OPT_BOOL(0, "recurse-submodules", &recurse_submodules,
 			 N_("recursivley search in each submodule")),
+		OPT_STRING(0, "parent-basename", &parent_basename,
+			   N_("basename"),
+			   N_("prepend parent project's basename to output")),
 		OPT_GROUP(""),
 		OPT_BOOL('v', "invert-match", &opt.invert,
 			N_("show non-matching lines")),
@@ -1153,7 +1210,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	if (recurse_submodules && (!use_index || untracked || list.nr))
+	if (recurse_submodules && (!use_index || untracked))
 		die(_("option not supported with --recurse-submodules."));
 
 	if (!show_in_pager && !opt.status_only)
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index b670c70cb..3d1892dd7 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -84,6 +84,49 @@ test_expect_success 'grep and multiple patterns' '
 	test_cmp expect actual
 '
 
+test_expect_success 'basic grep tree' '
+	cat >expect <<-\EOF &&
+	HEAD:a:foobar
+	HEAD:b/b:bar
+	HEAD:submodule/a:foobar
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree HEAD^' '
+	cat >expect <<-\EOF &&
+	HEAD^:a:foobar
+	HEAD^:b/b:bar
+	HEAD^:submodule/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD^ > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree HEAD^^' '
+	cat >expect <<-\EOF &&
+	HEAD^^:a:foobar
+	HEAD^^:b/b:bar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD^^ > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree and pathspecs' '
+	cat >expect <<-\EOF &&
+	HEAD:submodule/a:foobar
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD -- submodule > actual &&
+	test_cmp expect actual
+'
+
 test_incompatible_with_recurse_submodules ()
 {
 	test_expect_success "--recurse-submodules and $1 are incompatible" "
@@ -94,6 +137,5 @@ test_incompatible_with_recurse_submodules ()
 
 test_incompatible_with_recurse_submodules --untracked
 test_incompatible_with_recurse_submodules --no-index
-test_incompatible_with_recurse_submodules HEAD
 
 test_done
diff --git a/tree-walk.c b/tree-walk.c
index 828f4356b..b3f996174 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -999,10 +999,11 @@ static enum interesting do_match(const struct name_entry *entry,
 					return entry_interesting;
 
 				/*
-				 * Match all directories. We'll try to
-				 * match files later on.
+				 * Match all directories and gitlinks. We'll
+				 * try to match files later on.
 				 */
-				if (ps->recursive && S_ISDIR(entry->mode))
+				if (ps->recursive && (S_ISDIR(entry->mode) ||
+						      S_ISGITLINK(entry->mode)))
 					return entry_interesting;
 			}
 
@@ -1043,13 +1044,13 @@ static enum interesting do_match(const struct name_entry *entry,
 		strbuf_setlen(base, base_offset + baselen);
 
 		/*
-		 * Match all directories. We'll try to match files
-		 * later on.
-		 * max_depth is ignored but we may consider support it
-		 * in future, see
+		 * Match all directories and gitlinks. We'll try to match files
+		 * later on.  max_depth is ignored but we may consider support
+		 * it in future, see
 		 * http://thread.gmane.org/gmane.comp.version-control.git/163757/focus=163840
 		 */
-		if (ps->recursive && S_ISDIR(entry->mode))
+		if (ps->recursive && (S_ISDIR(entry->mode) ||
+				      S_ISGITLINK(entry->mode)))
 			return entry_interesting;
 	}
 	return never_interesting; /* No matches */
-- 
2.10.1.613.g6021889



* Re: [RFC PATCH 0/5] recursively grep across submodules
  2016-10-27 22:38 [RFC PATCH 0/5] recursively grep across submodules Brandon Williams
                   ` (4 preceding siblings ...)
  2016-10-27 22:38 ` [PATCH 5/5] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
@ 2016-10-27 23:26 ` Junio C Hamano
  2016-10-28  0:59   ` Stefan Beller
  2016-10-28 17:02   ` Brandon Williams
  2016-10-31 22:38 ` [PATCH v2 0/6] " Brandon Williams
  6 siblings, 2 replies; 126+ messages in thread
From: Junio C Hamano @ 2016-10-27 23:26 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git

Brandon Williams <bmwill@google.com> writes:

> As for the rest of the series, it should be ready for review or comments.

Just a few brief comments, before reading the patches carefully.

 * It is somewhat surprising that [1/5] is even needed (in other
   words, I would have expected something like this to be already
   there, and my knee-jerk reaction was "Heh, how does 'git status'
   know how to show submodules that are and are not initialized
   differently without this?").

   The implementation that reads from the config of the current
   repository may be OK, but I actually would have expected that a
   check would be "given a $path, check to see if $path/.git is
   there and is a valid repository".  In a repository where the
   submodules originate, there may not even be submodule.$name.url
   entries there yet.

 * It is somewhat surprising that [4/5] does not even use the
   previous ls-files to find out the paths.  Also it is a bit
   disappointing to see that the way processes are spawned and
   managed does not share much with Stefan's earlier work, i.e.
   run_processes_parallel().  I was somehow hoping that it can be
   extended to support this use case, but apparently there aren't
   much to be shared.

Thanks.


* Re: [RFC PATCH 0/5] recursively grep across submodules
  2016-10-27 23:26 ` [RFC PATCH 0/5] recursively grep across submodules Junio C Hamano
@ 2016-10-28  0:59   ` Stefan Beller
  2016-10-28  2:50     ` Junio C Hamano
  2016-10-28 17:02   ` Brandon Williams
  1 sibling, 1 reply; 126+ messages in thread
From: Stefan Beller @ 2016-10-28  0:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brandon Williams, git@vger.kernel.org

On Thu, Oct 27, 2016 at 4:26 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Brandon Williams <bmwill@google.com> writes:
>
>> As for the rest of the series, it should be ready for review or comments.
>
> Just a few brief comments, before reading the patches carefully.
>
>  * It is somewhat surprising that [1/5] is even needed (in other
>    words, I would have expected something like this to be already
>    there, and my knee-jerk reaction was "Heh, how does 'git status'
>    know how to show submodules that are and are not initialized
>    differently without this?"

The issue with much of the existing code is that it is submodule centric,
i.e. it is written to not care about the rest.

git status for example just calls "git submodule summary" to
parse and display the submodule information additionally.
It doesn't integrate submodules and treats them "just like files".

git submodule summary then proceeds to use "submodule--helper list"
that lists submodules *only* ignoring all files.

>
>    The implementation that reads from the config of the current
>    repository may be OK, but I actually would have expected that a
>    check would be "given a $path, check to see if $path/.git is
>    there and is a valid repository".  In a repository where the
>    submodules originate, there may not even be submodule.$name.url
>    entries there yet.

My reaction to 1/5 was that the implementation is sound,
but the design may need rethinking.

Instead of asking all these questions, "Is a submodule
* initialized
* checked out (== has a working dir)
* equipped with a .git dir (think of deleted submodules that keep the
  historical git dir around)
* (in possession of commit X)?",
we would want to extend the submodule-config API
to also carry this information, just like the
name/path/sha1/url/shallow clone recommendation.

Obtaining the information above is however not as cheap,
because we'd need to do extra work in addition to parsing
the .gitmodules file. So the submodule-config would need to learn
an input that tells it which pieces of information should
be evaluated and which can be omitted.

>
>  * It is somewhat surprising that [4/5] does not even use the
>    previous ls-files to find out the paths.  Also it is a bit
>    disappointing to see that the way processes are spawned and
>    managed does not share much with Stefan's earlier work, i.e.
>    run_processes_parallel().  I was somehow hoping that it can be
>    extended to support this use case, but apparently there aren't
>    much to be shared.

I think there are 2 issues here:
* The API I designed runs processes in parallel and the order of
  output is non-deterministic. git-grep uses threads and output is
  alphabetically sorted. The order is fixable though (by e.g. adding
  a flag that indicates which parallel processing output the caller
  wants).

* git-grep already has its own thread pool; integrating/combining
  2 worker pools doesn't sound trivial even to someone who wrote
  one of them.
  Maybe we could extend/rewrite the run_processes_parallel
  API to not just run processes, but instead you could also provide
  a function pointer that is used in a thread instead.
  Then we'd have one machinery that e.g. keeps track of the
  number of parallel processes/threads.

Thanks,
Stefan


* Re: [RFC PATCH 0/5] recursively grep across submodules
  2016-10-28  0:59   ` Stefan Beller
@ 2016-10-28  2:50     ` Junio C Hamano
  2016-10-28  3:46       ` Stefan Beller
  2016-10-28 15:06       ` Philip Oakley
  0 siblings, 2 replies; 126+ messages in thread
From: Junio C Hamano @ 2016-10-28  2:50 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Brandon Williams, git@vger.kernel.org

Stefan Beller <sbeller@google.com> writes:

>> Just a few brief comments, before reading the patches carefully.
>>
>>  * It is somewhat surprising that [1/5] is even needed (in other
>>    words, I would have expected something like this to be already
>>    there, and my knee-jerk reaction was "Heh, how does 'git status'
>>    know how to show submodules that are and are not initialized
>>    differently without this?"
>
> The issue with much of the existing code is that it is submodule centric,
> i.e. it is written to not care about the rest.
>
> git status for example just calls "git submodule summary" to
> parse and display the submodule information additionally.
> It doesn't integrate submodules and treats them "just like files".

Oh, I know all that after/while writing the above "it is somewhat
surprising" and reading what wt-status.c does.  It was just that it
was somewhat surprising ;-)

> My reaction to 1/5 was that the implementation is sound,
> but the design may need rethinking.
>
> Instead of asking all these question, "Is a submodule
> * initialized
> * checked out (== have a working dir)
> * have a .git dir (think of deleted submodules that keep the
>   historical git dir around)
> (* have commit X)
> we would want to either extend the submodule-config API
> to also carry these informations just like
> name/path/sha1/url/shallow clone recommendation.

I think you are going in a wrong direction with all the above.

Unless you are imagining "git grep" to initialize and checkout a
submodule that is not checked out on-demand, I do not think you have
any reason to even look at ".gitmodules" for the purpose of "I want
to grep both in superproject and submodules that are checked out."

You only need to detect .gitlink that exists in the index of the
superproject, and then there would be only two cases:

 * $path has an empty directory (not even .git in there).  The user
   is not interested in that submodule.

 * $path has ".git", either a directory (old layout or we are
   dealing with the repository that originated the submodule) or a
   "gitdir:" file that points into .git/modules of the repository of
   the superproject.  The user is interested in the submodule.

If $path has ".git" and nothing else, the only explanation is that
the user removed the working tree files in the submodule.  If your
grep is looking at working tree files, it is correct not to find
anything in there.  If it is working with "--cached", go look at the
index of the submodule repository (either the ".git" directory, or
the stashed-away repository in .git/modules/ in the superproject).
If it is working with a tree-ish, again, go look at the object store
in that submodule repository.
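
The two cases above can be sketched roughly as follows.  This is only an
illustration under stated assumptions: classify_submodule() and the enum
are hypothetical names, the check uses plain POSIX stat() rather than
anything from git's own code, and it does not verify that what it finds
is a valid repository.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical classification of what sits at "$path/.git". */
enum submodule_presence {
	SUBMODULE_ABSENT,	/* no ".git": user is not interested */
	SUBMODULE_GITDIR,	/* ".git" is a directory */
	SUBMODULE_GITFILE	/* ".git" is a regular ("gitdir:") file */
};

/*
 * Look at "<path>/.git" and report which of the cases applies.
 * Anything that cannot be stat()ed, or is neither a directory nor a
 * regular file, is treated as absent.
 */
static enum submodule_presence classify_submodule(const char *path)
{
	char buf[4096];
	struct stat st;
	int n = snprintf(buf, sizeof(buf), "%s/.git", path);

	if (n < 0 || (size_t)n >= sizeof(buf))
		return SUBMODULE_ABSENT;
	if (stat(buf, &st))
		return SUBMODULE_ABSENT;
	if (S_ISDIR(st.st_mode))
		return SUBMODULE_GITDIR;
	if (S_ISREG(st.st_mode))
		return SUBMODULE_GITFILE;
	return SUBMODULE_ABSENT;
}
```

A caller would recurse only when the result is not SUBMODULE_ABSENT.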

>>  * It is somewhat surprising that [4/5] does not even use the
>>    previous ls-files to find out the paths.  Also it is a bit
>>    disappointing to see that the way processes are spawned and
>>    managed does not share much with Stefan's earlier work, i.e.
>>    run_processes_parallel().  I was somehow hoping that it can be
>>    extended to support this use case, but apparently there aren't
>>    much to be shared.
>
> I think there are 2 issues here:

There is no issue here.  I was just giving my impressions (i.e.
"somewhat surprising").

> * git-grep already has its own thread pool

I know.  I was expecting that the previous "ls-files" that recurses
will be used to feed into that thread pool, but I didn't find that
in my cursory look at the patch, hence "somewhat surprising".

I hate it when people become overly defensive and start making
excuses when given harmless observations.





* Re: [RFC PATCH 0/5] recursively grep across submodules
  2016-10-28  2:50     ` Junio C Hamano
@ 2016-10-28  3:46       ` Stefan Beller
  2016-10-28 15:06       ` Philip Oakley
  1 sibling, 0 replies; 126+ messages in thread
From: Stefan Beller @ 2016-10-28  3:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brandon Williams, git@vger.kernel.org

On Thu, Oct 27, 2016 at 7:50 PM, Junio C Hamano <gitster@pobox.com> wrote:


>
> Unless you are imagining "git grep" to initialize and checkout a
> submodule that is not checked out on-demand, I do not think you have
> any reason to even look at ".gitmodules" for the purpose of "I want
> to grep both in superproject and submodules that are checked out."

In tree-ish mode you may have this example:

    git -C superproject rm path/to/submodule
    git -C superproject commit -a -m "delete submodule"
    ...  time passes ...
    git -C superproject grep --recurse-submodules -e <expression> \
            HEAD~42 path/to/submodule

In the last command you need to map the path to submodule to
the name of the submodule to find out the place of the object store
for that submodule and see if it exists.

> If it is working with a tree-ish, again, go look at the object store
> in that submodule repository.

and to find out the object store for that submodule you need the
path -> name mapping at that point in time, i.e. you want to look
at the .gitmodules file at the given tree-ish.
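
A minimal sketch of that lookup, assuming the (name, path) pairs have
already been parsed out of the .gitmodules blob at the given tree-ish.
The struct and both function names are hypothetical and only serve this
example; git's real code goes through the submodule-config API instead.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record: one "[submodule "<name>"] path = <path>" entry. */
struct submodule_entry {
	const char *name;
	const char *path;
};

/* Return the name recorded for 'path', or NULL if there is none. */
static const char *submodule_name_for_path(const struct submodule_entry *entries,
					   size_t nr, const char *path)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(entries[i].path, path))
			return entries[i].name;
	return NULL;
}

/*
 * Build ".git/modules/<name>" (relative to the superproject) for the
 * submodule that lived at 'path'.  Returns 0 on success, -1 if the path
 * is unknown or the result does not fit into 'out'.
 */
static int submodule_gitdir(const struct submodule_entry *entries, size_t nr,
			    const char *path, char *out, size_t outlen)
{
	const char *name = submodule_name_for_path(entries, nr, path);
	int n;

	if (!name)
		return -1;
	n = snprintf(out, outlen, ".git/modules/%s", name);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Whether that directory still exists then decides if the old submodule
history can be searched at all.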


* Re: [RFC PATCH 0/5] recursively grep across submodules
  2016-10-28  2:50     ` Junio C Hamano
  2016-10-28  3:46       ` Stefan Beller
@ 2016-10-28 15:06       ` Philip Oakley
  1 sibling, 0 replies; 126+ messages in thread
From: Philip Oakley @ 2016-10-28 15:06 UTC (permalink / raw)
  To: Junio C Hamano, Stefan Beller; +Cc: Brandon Williams, Git List

From: "Junio C Hamano" <gitster@pobox.com>
>
> I hate it when people become overly defensive and start making
> excuses when given harmless observations.
>

Hi Junio,

It can sometimes be difficult for readers to appreciate which way comments 
are meant to be interpreted, especially as one cannot usually 'see' the 
issue being raised with one's personal work, no matter who writes them. I 
too have to supportively review the work of others (as a volunteer), who 
then don't always respond or understand, as hoped, and it can be 
frustrating.

It can be very hard to write a reasonable write up that gets the balance 
between being on the one hand patronising (e.g. over-explained) and on the 
other too terse, and yet still not be too nuanced that the points are 
missed. The responder has a similar problem, especially if they have
misunderstood the comment, and then just end up digging the hole
deeper by over-explaining their position. Extricating the discussion from
the trap can be tricky.

Thank you for your reviews.
--
Philip 



* Re: [RFC PATCH 0/5] recursively grep across submodules
  2016-10-27 23:26 ` [RFC PATCH 0/5] recursively grep across submodules Junio C Hamano
  2016-10-28  0:59   ` Stefan Beller
@ 2016-10-28 17:02   ` Brandon Williams
  2016-10-28 17:21     ` Junio C Hamano
  1 sibling, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-10-28 17:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 10/27, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
> 
> > As for the rest of the series, it should be ready for review or comments.
> 
> Just a few brief comments, before reading the patches carefully.
> 
>  * It is somewhat surprising that [1/5] is even needed (in other
>    words, I would have expected something like this to be already
>    there, and my knee-jerk reaction was "Heh, how does 'git status'
>    know how to show submodules that are and are not initialized
>    differently without this?"  

Yeah I was also surprised to find that this kind of functionality didn't
already exist.  Though I guess there are still many builtins that don't
play nice with submodules so maybe this kind of functionality just
wasn't needed until now.

>  * It is somewhat surprising that [4/5] does not even use the
>    previous ls-files to find out the paths.

The first attempt I made at this series used ls-files to produce a list
of files which was then fed to the grep machinery.  The problem I found
with this approach was when I started moving to work on grepping
history, at that point it seemed to make more sense to have a process
for each submodule.

-- 
Brandon Williams


* Re: [RFC PATCH 0/5] recursively grep across submodules
  2016-10-28 17:02   ` Brandon Williams
@ 2016-10-28 17:21     ` Junio C Hamano
  0 siblings, 0 replies; 126+ messages in thread
From: Junio C Hamano @ 2016-10-28 17:21 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git

Brandon Williams <bmwill@google.com> writes:

>> Just a few brief comments, before reading the patches carefully.
>> 
>>  * It is somewhat surprising that [1/5] is even needed (in other
>>    words, I would have expected something like this to be already
>>    there, and my knee-jerk reaction was "Heh, how does 'git status'
>>    know how to show submodules that are and are not initialized
>>    differently without this?"  
>
> Yeah I was also surprised to find that this kind of functionality didn't
> already exist.  Though I guess there are still many builtin's that don't
> play nice with submodules so maybe this kind of functionality just
> wasn't needed until now.
>
>>  * It is somewhat surprising that [4/5] does not even use the
>>    previous ls-files to find out the paths.
>
> The first attempt I made at this series used ls-files to produce a list
> of files which was then fed to the grep machinery.  The problem I found
> with this approach was when I started moving to work on grepping
> history, at that point it seemed to make more sense to have a process
> for each submodule.

Yup, that makes sense to me.



* Re: [PATCH 5/5] grep: enable recurse-submodules to work on <tree> objects
  2016-10-27 22:38 ` [PATCH 5/5] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
@ 2016-10-28 19:35   ` Brandon Williams
  0 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-10-28 19:35 UTC (permalink / raw)
  To: git

On 10/27, Brandon Williams wrote:
> diff --git a/tree-walk.c b/tree-walk.c
> index 828f4356b..b3f996174 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -999,10 +999,11 @@ static enum interesting do_match(const struct name_entry *entry,
>  					return entry_interesting;
>  
>  				/*
> -				 * Match all directories. We'll try to
> -				 * match files later on.
> +				 * Match all directories and gitlinks. We'll
> +				 * try to match files later on.
>  				 */
> -				if (ps->recursive && S_ISDIR(entry->mode))
> +				if (ps->recursive && (S_ISDIR(entry->mode) ||
> +						      S_ISGITLINK(entry->mode)))
>  					return entry_interesting;
>  			}
>  
> @@ -1043,13 +1044,13 @@ static enum interesting do_match(const struct name_entry *entry,
>  		strbuf_setlen(base, base_offset + baselen);
>  
>  		/*
> -		 * Match all directories. We'll try to match files
> -		 * later on.
> -		 * max_depth is ignored but we may consider support it
> -		 * in future, see
> +		 * Match all directories and gitlinks. We'll try to match files
> +		 * later on.  max_depth is ignored but we may consider support
> +		 * it in future, see
>  		 * http://thread.gmane.org/gmane.comp.version-control.git/163757/focus=163840
>  		 */
> -		if (ps->recursive && S_ISDIR(entry->mode))
> +		if (ps->recursive && (S_ISDIR(entry->mode) ||
> +				      S_ISGITLINK(entry->mode)))
>  			return entry_interesting;
>  	}
>  	return never_interesting; /* No matches */

Looks like this change actually breaks a test in t4010-diff-pathspec.sh.
I think I'll have to add a flag to optionally let submodules through, as
opposed to just treating them like directories.

-- 
Brandon Williams


* [PATCH v2 0/6] recursively grep across submodules
  2016-10-27 22:38 [RFC PATCH 0/5] recursively grep across submodules Brandon Williams
                   ` (5 preceding siblings ...)
  2016-10-27 23:26 ` [RFC PATCH 0/5] recursively grep across submodules Junio C Hamano
@ 2016-10-31 22:38 ` Brandon Williams
  2016-10-31 22:38   ` [PATCH v2 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
                     ` (6 more replies)
  6 siblings, 7 replies; 126+ messages in thread
From: Brandon Williams @ 2016-10-31 22:38 UTC (permalink / raw)
  To: git, sbeller; +Cc: Brandon Williams

A few minor style issues have been taken care of from v1 of this series.  I
also added an additional patch to enable grep to function on history where the
submodule has been moved.

I also changed how tree grep performs pathspec checking against submodule
entries in order to fix a test that was breaking with v1 of the series.

Brandon Williams (6):
  submodules: add helper functions to determine presence of submodules
  submodules: load gitmodules file from commit sha1
  grep: add submodules as a grep source type
  grep: optionally recurse into submodules
  grep: enable recurse-submodules to work on <tree> objects
  grep: search history of moved submodules

 Documentation/git-grep.txt         |  14 ++
 builtin/grep.c                     | 387 ++++++++++++++++++++++++++++++++++---
 cache.h                            |   2 +
 config.c                           |   8 +-
 git.c                              |   2 +-
 grep.c                             |  16 +-
 grep.h                             |   1 +
 submodule-config.c                 |   6 +-
 submodule-config.h                 |   3 +
 submodule.c                        |  51 +++++
 submodule.h                        |   3 +
 t/t7814-grep-recurse-submodules.sh | 182 +++++++++++++++++
 12 files changed, 643 insertions(+), 32 deletions(-)
 create mode 100755 t/t7814-grep-recurse-submodules.sh

-- 
2.8.0.rc3.226.g39d4020



* [PATCH v2 1/6] submodules: add helper functions to determine presence of submodules
  2016-10-31 22:38 ` [PATCH v2 0/6] " Brandon Williams
@ 2016-10-31 22:38   ` Brandon Williams
  2016-10-31 23:34     ` Stefan Beller
  2016-11-05  2:34     ` Jonathan Tan
  2016-10-31 22:38   ` [PATCH v2 2/6] submodules: load gitmodules file from commit sha1 Brandon Williams
                     ` (5 subsequent siblings)
  6 siblings, 2 replies; 126+ messages in thread
From: Brandon Williams @ 2016-10-31 22:38 UTC (permalink / raw)
  To: git, sbeller; +Cc: Brandon Williams

Add two helper functions to submodule.c.
`is_submodule_initialized()` checks if a submodule has been initialized
at a given path and `is_submodule_checked_out()` checks if a submodule
has been checked out at a given path.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 submodule.c | 39 +++++++++++++++++++++++++++++++++++++++
 submodule.h |  2 ++
 2 files changed, 41 insertions(+)

diff --git a/submodule.c b/submodule.c
index 6f7d883..ff4e7b2 100644
--- a/submodule.c
+++ b/submodule.c
@@ -198,6 +198,45 @@ void gitmodules_config(void)
 	}
 }
 
+/*
+ * Determine if a submodule has been initialized at a given 'path'
+ */
+int is_submodule_initialized(const char *path)
+{
+	int ret = 0;
+	const struct submodule *module = NULL;
+
+	module = submodule_from_path(null_sha1, path);
+
+	if (module) {
+		struct strbuf buf = STRBUF_INIT;
+		char *submodule_url = NULL;
+
+		strbuf_addf(&buf, "submodule.%s.url", module->name);
+		ret = !git_config_get_string(buf.buf, &submodule_url);
+
+		free(submodule_url);
+		strbuf_release(&buf);
+	}
+
+	return ret;
+}
+
+/*
+ * Determine if a submodule has been checked out at a given 'path'
+ */
+int is_submodule_checked_out(const char *path)
+{
+	int ret = 0;
+	struct strbuf buf = STRBUF_INIT;
+
+	strbuf_addf(&buf, "%s/.git", path);
+	ret = file_exists(buf.buf);
+
+	strbuf_release(&buf);
+	return ret;
+}
+
 int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst)
 {
diff --git a/submodule.h b/submodule.h
index d9e197a..bd039ca 100644
--- a/submodule.h
+++ b/submodule.h
@@ -37,6 +37,8 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
 		const char *path);
 int submodule_config(const char *var, const char *value, void *cb);
 void gitmodules_config(void);
+extern int is_submodule_initialized(const char *path);
+extern int is_submodule_checked_out(const char *path);
 int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst);
 const char *submodule_strategy_to_string(const struct submodule_update_strategy *s);
-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH v2 2/6] submodules: load gitmodules file from commit sha1
  2016-10-31 22:38 ` [PATCH v2 0/6] " Brandon Williams
  2016-10-31 22:38   ` [PATCH v2 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
@ 2016-10-31 22:38   ` Brandon Williams
  2016-11-01 16:39     ` Stefan Beller
  2016-10-31 22:38   ` [PATCH v2 3/6] grep: add submodules as a grep source type Brandon Williams
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-10-31 22:38 UTC (permalink / raw)
  To: git, sbeller; +Cc: Brandon Williams

Teach submodules to load a '.gitmodules' file from a commit sha1.  This
enables the population of the submodule_cache to be based on the state
of the '.gitmodules' file from a particular commit.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 cache.h            |  2 ++
 config.c           |  8 ++++----
 submodule-config.c |  6 +++---
 submodule-config.h |  3 +++
 submodule.c        | 12 ++++++++++++
 submodule.h        |  1 +
 6 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/cache.h b/cache.h
index 1be6526..559a461 100644
--- a/cache.h
+++ b/cache.h
@@ -1690,6 +1690,8 @@ extern int git_default_config(const char *, const char *, void *);
 extern int git_config_from_file(config_fn_t fn, const char *, void *);
 extern int git_config_from_mem(config_fn_t fn, const enum config_origin_type,
 					const char *name, const char *buf, size_t len, void *data);
+extern int git_config_from_blob_sha1(config_fn_t fn, const char *name,
+				     const unsigned char *sha1, void *data);
 extern void git_config_push_parameter(const char *text);
 extern int git_config_from_parameters(config_fn_t fn, void *data);
 extern void git_config(config_fn_t fn, void *);
diff --git a/config.c b/config.c
index 83fdecb..4d78e72 100644
--- a/config.c
+++ b/config.c
@@ -1214,10 +1214,10 @@ int git_config_from_mem(config_fn_t fn, const enum config_origin_type origin_typ
 	return do_config_from(&top, fn, data);
 }
 
-static int git_config_from_blob_sha1(config_fn_t fn,
-				     const char *name,
-				     const unsigned char *sha1,
-				     void *data)
+int git_config_from_blob_sha1(config_fn_t fn,
+			      const char *name,
+			      const unsigned char *sha1,
+			      void *data)
 {
 	enum object_type type;
 	char *buf;
diff --git a/submodule-config.c b/submodule-config.c
index 098085b..8b9a2ef 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -379,9 +379,9 @@ static int parse_config(const char *var, const char *value, void *data)
 	return ret;
 }
 
-static int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
-				      unsigned char *gitmodules_sha1,
-				      struct strbuf *rev)
+int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
+			       unsigned char *gitmodules_sha1,
+			       struct strbuf *rev)
 {
 	int ret = 0;
 
diff --git a/submodule-config.h b/submodule-config.h
index d05c542..78584ba 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -29,6 +29,9 @@ const struct submodule *submodule_from_name(const unsigned char *commit_sha1,
 		const char *name);
 const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
 		const char *path);
+extern int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
+				      unsigned char *gitmodules_sha1,
+				      struct strbuf *rev);
 void submodule_free(void);
 
 #endif /* SUBMODULE_CONFIG_H */
diff --git a/submodule.c b/submodule.c
index ff4e7b2..19dfbd4 100644
--- a/submodule.c
+++ b/submodule.c
@@ -198,6 +198,18 @@ void gitmodules_config(void)
 	}
 }
 
+void gitmodules_config_sha1(const unsigned char *commit_sha1)
+{
+	struct strbuf rev = STRBUF_INIT;
+	unsigned char sha1[20];
+
+	if (gitmodule_sha1_from_commit(commit_sha1, sha1, &rev)) {
+		git_config_from_blob_sha1(submodule_config, rev.buf,
+					  sha1, NULL);
+	}
+	strbuf_release(&rev);
+}
+
 /*
  * Determine if a submodule has been initialized at a given 'path'
  */
diff --git a/submodule.h b/submodule.h
index bd039ca..9a24ac8 100644
--- a/submodule.h
+++ b/submodule.h
@@ -37,6 +37,7 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
 		const char *path);
 int submodule_config(const char *var, const char *value, void *cb);
 void gitmodules_config(void);
+extern void gitmodules_config_sha1(const unsigned char *commit_sha1);
 extern int is_submodule_initialized(const char *path);
 extern int is_submodule_checked_out(const char *path);
 int parse_submodule_update_strategy(const char *value,
-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH v2 3/6] grep: add submodules as a grep source type
  2016-10-31 22:38 ` [PATCH v2 0/6] " Brandon Williams
  2016-10-31 22:38   ` [PATCH v2 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
  2016-10-31 22:38   ` [PATCH v2 2/6] submodules: load gitmodules file from commit sha1 Brandon Williams
@ 2016-10-31 22:38   ` Brandon Williams
  2016-11-01 16:53     ` Stefan Beller
  2016-11-01 17:31     ` Junio C Hamano
  2016-10-31 22:38   ` [PATCH v2 4/6] grep: optionally recurse into submodules Brandon Williams
                     ` (3 subsequent siblings)
  6 siblings, 2 replies; 126+ messages in thread
From: Brandon Williams @ 2016-10-31 22:38 UTC (permalink / raw)
  To: git, sbeller; +Cc: Brandon Williams

Add `GREP_SOURCE_SUBMODULE` as a grep_source type and cases for this new
type in the various switch statements in grep.c.

When initializing a grep_source with type `GREP_SOURCE_SUBMODULE` the
identifier can either be NULL (to indicate that the working tree will be
used) or a SHA1 (the REV of the submodule to be grep'd).  If the
identifier is a SHA1 then we want to fall through to the
`GREP_SOURCE_SHA1` case to handle the copying of the SHA1.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
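
Note for readers: the fall-through described above can be seen in isolation
in the sketch below.  The enum and copy_identifier() are hypothetical
miniatures of grep_source_init()'s identifier handling, using plain malloc()
and memcpy() in place of xmalloc() and hashcpy(); it is meant only to show
the control flow.

```c
#include <stdlib.h>
#include <string.h>

enum source_type { SRC_SHA1, SRC_FILE, SRC_BUF, SRC_SUBMODULE };

/*
 * Hypothetical miniature of the identifier handling: returns a heap copy
 * of the identifier, or NULL when there is nothing to copy.
 */
static void *copy_identifier(enum source_type type, const void *identifier)
{
	void *copy = NULL;

	switch (type) {
	case SRC_FILE: {
		/* A file is identified by its NUL-terminated path. */
		size_t len = strlen(identifier) + 1;
		copy = malloc(len);
		if (copy)
			memcpy(copy, identifier, len);
		break;
	}
	case SRC_SUBMODULE:
		if (!identifier)
			break;	/* working tree: no identifier to copy */
		/* fall through: a non-NULL identifier is a 20-byte SHA1 */
	case SRC_SHA1:
		copy = malloc(20);
		if (copy)
			memcpy(copy, identifier, 20);
		break;
	case SRC_BUF:
		break;
	}
	return copy;
}
```

The early break in the submodule case is what keeps a NULL identifier from
reaching the 20-byte copy.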
 grep.c | 16 +++++++++++++++-
 grep.h |  1 +
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 1194d35..0dbdc1d 100644
--- a/grep.c
+++ b/grep.c
@@ -1735,12 +1735,23 @@ void grep_source_init(struct grep_source *gs, enum grep_source_type type,
 	case GREP_SOURCE_FILE:
 		gs->identifier = xstrdup(identifier);
 		break;
+	case GREP_SOURCE_SUBMODULE:
+		if (!identifier) {
+			gs->identifier = NULL;
+			break;
+		}
+		/*
+		 * FALL THROUGH
+		 * If the identifier is non-NULL (in the submodule case) it
+		 * will be a SHA1 that needs to be copied.
+		 */
 	case GREP_SOURCE_SHA1:
 		gs->identifier = xmalloc(20);
 		hashcpy(gs->identifier, identifier);
 		break;
 	case GREP_SOURCE_BUF:
 		gs->identifier = NULL;
+		break;
 	}
 }
 
@@ -1760,6 +1771,7 @@ void grep_source_clear_data(struct grep_source *gs)
 	switch (gs->type) {
 	case GREP_SOURCE_FILE:
 	case GREP_SOURCE_SHA1:
+	case GREP_SOURCE_SUBMODULE:
 		free(gs->buf);
 		gs->buf = NULL;
 		gs->size = 0;
@@ -1831,8 +1843,10 @@ static int grep_source_load(struct grep_source *gs)
 		return grep_source_load_sha1(gs);
 	case GREP_SOURCE_BUF:
 		return gs->buf ? 0 : -1;
+	case GREP_SOURCE_SUBMODULE:
+		break;
 	}
-	die("BUG: invalid grep_source type");
+	die("BUG: invalid grep_source type to load");
 }
 
 void grep_source_load_driver(struct grep_source *gs)
diff --git a/grep.h b/grep.h
index 5856a23..267534c 100644
--- a/grep.h
+++ b/grep.h
@@ -161,6 +161,7 @@ struct grep_source {
 		GREP_SOURCE_SHA1,
 		GREP_SOURCE_FILE,
 		GREP_SOURCE_BUF,
+		GREP_SOURCE_SUBMODULE,
 	} type;
 	void *identifier;
 
-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH v2 4/6] grep: optionally recurse into submodules
  2016-10-31 22:38 ` [PATCH v2 0/6] " Brandon Williams
                     ` (2 preceding siblings ...)
  2016-10-31 22:38   ` [PATCH v2 3/6] grep: add submodules as a grep source type Brandon Williams
@ 2016-10-31 22:38   ` Brandon Williams
  2016-11-01 17:26     ` Stefan Beller
  2016-10-31 22:38   ` [PATCH v2 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-10-31 22:38 UTC (permalink / raw)
  To: git, sbeller; +Cc: Brandon Williams

Allow grep to recognize submodules and recursively search for patterns in
each submodule.  This is done by forking off a process to recursively
call grep on each submodule.  The top level --super-prefix option is
used to pass a path to the submodule which can in turn be used to
prepend to output or in pathspec matching logic.

Recursion only occurs for submodules which have been initialized and
checked out by the parent project.  If a submodule hasn't been
initialized and checked out it is simply skipped.

In order to support the existing multi-threading infrastructure in grep,
output from each child process is captured in a strbuf so that it can be
later printed to the console in an ordered fashion.

To limit the number of threads that are created, each child process has
half the number of threads as its parent (minimum of 1), otherwise we
potentially have a fork-bomb.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
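
Note for readers: two small rules from this patch are easy to check on their
own.  These are the thread count handed to each child and the mapping from
the child's exit status to a hit flag.  The sketch below restates both with
hypothetical helper names (child_threads, status_to_hit); the -1 return
stands in for the place where the patch calls exit(status).

```c
/*
 * Threads passed to a child via --threads: half of the parent's count,
 * rounded up, so a parent with one thread still gives the child one.
 */
static int child_threads(int parent_threads)
{
	return (parent_threads + 1) / 2;
}

/*
 * Map a child's exit status to a hit flag: 0 means a hit, 1 means no hit.
 * Returns -1 for any other status, which the patch treats as an error.
 */
static int status_to_hit(int status)
{
	if (status && status != 1)
		return -1;
	return !status;
}
```

In the patch the --threads option is only pushed when num_threads is
non-zero, so the zero case never reaches the halving expression.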
 Documentation/git-grep.txt         |   5 +
 builtin/grep.c                     | 300 ++++++++++++++++++++++++++++++++++---
 git.c                              |   2 +-
 t/t7814-grep-recurse-submodules.sh |  99 ++++++++++++
 4 files changed, 385 insertions(+), 21 deletions(-)
 create mode 100755 t/t7814-grep-recurse-submodules.sh

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 0ecea6e..17aa1ba 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -26,6 +26,7 @@ SYNOPSIS
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
+	   [--recurse-submodules]
 	   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...]
 	   [--] [<pathspec>...]
 
@@ -88,6 +89,10 @@ OPTIONS
 	mechanism.  Only useful when searching files in the current directory
 	with `--no-index`.
 
+--recurse-submodules::
+	Recursively search in each submodule that has been initialized and
+	checked out in the repository.
+
 -a::
 --text::
 	Process binary files as if they were text.
diff --git a/builtin/grep.c b/builtin/grep.c
index 8887b6a..cf4f51e 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -18,12 +18,20 @@
 #include "quote.h"
 #include "dir.h"
 #include "pathspec.h"
+#include "submodule.h"
 
 static char const * const grep_usage[] = {
 	N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"),
 	NULL
 };
 
+static const char *super_prefix;
+static int recurse_submodules;
+static struct argv_array submodule_options = ARGV_ARRAY_INIT;
+
+static int grep_submodule_launch(struct grep_opt *opt,
+				 const struct grep_source *gs);
+
 #define GREP_NUM_THREADS_DEFAULT 8
 static int num_threads;
 
@@ -174,7 +182,10 @@ static void *run(void *arg)
 			break;
 
 		opt->output_priv = w;
-		hit |= grep_source(opt, &w->source);
+		if (w->source.type == GREP_SOURCE_SUBMODULE)
+			hit |= grep_submodule_launch(opt, &w->source);
+		else
+			hit |= grep_source(opt, &w->source);
 		grep_source_clear_data(&w->source);
 		work_done(w);
 	}
@@ -300,6 +311,10 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 	if (opt->relative && opt->prefix_length) {
 		quote_path_relative(filename + tree_name_len, opt->prefix, &pathbuf);
 		strbuf_insert(&pathbuf, 0, filename, tree_name_len);
+	} else if (super_prefix) {
+		strbuf_add(&pathbuf, filename, tree_name_len);
+		strbuf_addstr(&pathbuf, super_prefix);
+		strbuf_addstr(&pathbuf, filename + tree_name_len);
 	} else {
 		strbuf_addstr(&pathbuf, filename);
 	}
@@ -328,10 +343,13 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 {
 	struct strbuf buf = STRBUF_INIT;
 
-	if (opt->relative && opt->prefix_length)
+	if (opt->relative && opt->prefix_length) {
 		quote_path_relative(filename, opt->prefix, &buf);
-	else
+	} else {
+		if (super_prefix)
+			strbuf_addstr(&buf, super_prefix);
 		strbuf_addstr(&buf, filename);
+	}
 
 #ifndef NO_PTHREADS
 	if (num_threads) {
@@ -378,31 +396,258 @@ static void run_pager(struct grep_opt *opt, const char *prefix)
 		exit(status);
 }
 
-static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int cached)
+static void compile_submodule_options(const struct grep_opt *opt,
+				      const struct pathspec *pathspec,
+				      int cached, int untracked,
+				      int opt_exclude, int use_index,
+				      int pattern_type_arg)
+{
+	struct grep_pat *pattern;
+	int i;
+
+	if (recurse_submodules)
+		argv_array_push(&submodule_options, "--recurse-submodules");
+
+	if (cached)
+		argv_array_push(&submodule_options, "--cached");
+	if (!use_index)
+		argv_array_push(&submodule_options, "--no-index");
+	if (untracked)
+		argv_array_push(&submodule_options, "--untracked");
+	if (opt_exclude > 0)
+		argv_array_push(&submodule_options, "--exclude-standard");
+
+	if (opt->invert)
+		argv_array_push(&submodule_options, "-v");
+	if (opt->ignore_case)
+		argv_array_push(&submodule_options, "-i");
+	if (opt->word_regexp)
+		argv_array_push(&submodule_options, "-w");
+	switch (opt->binary) {
+	case GREP_BINARY_NOMATCH:
+		argv_array_push(&submodule_options, "-I");
+		break;
+	case GREP_BINARY_TEXT:
+		argv_array_push(&submodule_options, "-a");
+		break;
+	default:
+		break;
+	}
+	if (opt->allow_textconv)
+		argv_array_push(&submodule_options, "--textconv");
+	if (opt->max_depth != -1)
+		argv_array_pushf(&submodule_options, "--max-depth=%d",
+				 opt->max_depth);
+	if (opt->linenum)
+		argv_array_push(&submodule_options, "-n");
+	if (!opt->pathname)
+		argv_array_push(&submodule_options, "-h");
+	if (!opt->relative)
+		argv_array_push(&submodule_options, "--full-name");
+	if (opt->name_only)
+		argv_array_push(&submodule_options, "-l");
+	if (opt->unmatch_name_only)
+		argv_array_push(&submodule_options, "-L");
+	if (opt->null_following_name)
+		argv_array_push(&submodule_options, "-z");
+	if (opt->count)
+		argv_array_push(&submodule_options, "-c");
+	if (opt->file_break)
+		argv_array_push(&submodule_options, "--break");
+	if (opt->heading)
+		argv_array_push(&submodule_options, "--heading");
+	if (opt->pre_context)
+		argv_array_pushf(&submodule_options, "--before-context=%d",
+				 opt->pre_context);
+	if (opt->post_context)
+		argv_array_pushf(&submodule_options, "--after-context=%d",
+				 opt->post_context);
+	if (opt->funcname)
+		argv_array_push(&submodule_options, "-p");
+	if (opt->funcbody)
+		argv_array_push(&submodule_options, "-W");
+	if (opt->all_match)
+		argv_array_push(&submodule_options, "--all-match");
+	if (opt->debug)
+		argv_array_push(&submodule_options, "--debug");
+	if (opt->status_only)
+		argv_array_push(&submodule_options, "-q");
+
+	switch (pattern_type_arg) {
+	case GREP_PATTERN_TYPE_BRE:
+		argv_array_push(&submodule_options, "-G");
+		break;
+	case GREP_PATTERN_TYPE_ERE:
+		argv_array_push(&submodule_options, "-E");
+		break;
+	case GREP_PATTERN_TYPE_FIXED:
+		argv_array_push(&submodule_options, "-F");
+		break;
+	case GREP_PATTERN_TYPE_PCRE:
+		argv_array_push(&submodule_options, "-P");
+		break;
+	case GREP_PATTERN_TYPE_UNSPECIFIED:
+		break;
+	}
+
+	for (pattern = opt->pattern_list; pattern != NULL;
+	     pattern = pattern->next) {
+		switch (pattern->token) {
+		case GREP_PATTERN:
+			argv_array_pushf(&submodule_options, "-e%s",
+					 pattern->pattern);
+			break;
+		case GREP_AND:
+		case GREP_OPEN_PAREN:
+		case GREP_CLOSE_PAREN:
+		case GREP_NOT:
+		case GREP_OR:
+			argv_array_push(&submodule_options, pattern->pattern);
+			break;
+		/* BODY and HEAD are not used by git-grep */
+		case GREP_PATTERN_BODY:
+		case GREP_PATTERN_HEAD:
+			break;
+		}
+	}
+
+	/*
+	 * Limit number of threads for child process to use.
+	 * This is to prevent potential fork-bomb behavior of git-grep as each
+	 * submodule process has its own thread pool.
+	 */
+	if (num_threads)
+		argv_array_pushf(&submodule_options, "--threads=%d",
+				 (num_threads + 1) / 2);
+
+	/* Add Pathspecs */
+	argv_array_push(&submodule_options, "--");
+	for (i = 0; i < pathspec->nr; i++)
+		argv_array_push(&submodule_options,
+				pathspec->items[i].original);
+}
+
+/*
+ * Launch child process to grep contents of a submodule
+ */
+static int grep_submodule_launch(struct grep_opt *opt,
+				 const struct grep_source *gs)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	int status, i;
+	struct work_item *w = opt->output_priv;
+
+	prepare_submodule_repo_env(&cp.env_array);
+
+	/* Add super prefix */
+	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
+			 super_prefix ? super_prefix : "",
+			 gs->name);
+	argv_array_push(&cp.args, "grep");
+
+	/* Add options */
+	for (i = 0; i < submodule_options.argc; i++)
+		argv_array_push(&cp.args, submodule_options.argv[i]);
+
+	cp.git_cmd = 1;
+	cp.dir = gs->path;
+
+	/*
+	 * Capture output to output buffer and check the return code from the
+	 * child process.  A '0' indicates a hit, a '1' indicates no hit and
+	 * anything else is an error.
+	 */
+	status = capture_command(&cp, &w->out, 0);
+	if (status && (status != 1))
+		exit(status);
+
+	/* invert the return code to make a hit equal to 1 */
+	return !status;
+}
+
+/*
+ * Prep grep structures for a submodule grep
+ * sha1: the sha1 of the submodule or NULL if using the working tree
+ * filename: name of the submodule including tree name of parent
+ * path: location of the submodule
+ */
+static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1,
+			  const char *filename, const char *path)
+{
+	if (!(is_submodule_initialized(path) &&
+	      is_submodule_checked_out(path))) {
+		warning("skipping submodule '%s%s' since it is not initialized and checked out",
+			super_prefix ? super_prefix : "",
+			path);
+		return 0;
+	}
+
+#ifndef NO_PTHREADS
+	if (num_threads) {
+		add_work(opt, GREP_SOURCE_SUBMODULE, filename, path, sha1);
+		return 0;
+	} else
+#endif
+	{
+		struct work_item w;
+		int hit;
+
+		grep_source_init(&w.source, GREP_SOURCE_SUBMODULE,
+				 filename, path, sha1);
+		strbuf_init(&w.out, 0);
+		opt->output_priv = &w;
+		hit = grep_submodule_launch(opt, &w.source);
+
+		write_or_die(1, w.out.buf, w.out.len);
+
+		grep_source_clear(&w.source);
+		strbuf_release(&w.out);
+		return hit;
+	}
+}
+
+static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec,
+		      int cached)
 {
 	int hit = 0;
 	int nr;
+	struct strbuf name = STRBUF_INIT;
+	int name_base_len = 0;
+	if (super_prefix) {
+		name_base_len = strlen(super_prefix);
+		strbuf_addstr(&name, super_prefix);
+	}
+
 	read_cache();
 
 	for (nr = 0; nr < active_nr; nr++) {
 		const struct cache_entry *ce = active_cache[nr];
-		if (!S_ISREG(ce->ce_mode))
-			continue;
-		if (!ce_path_match(ce, pathspec, NULL))
-			continue;
-		/*
-		 * If CE_VALID is on, we assume worktree file and its cache entry
-		 * are identical, even if worktree file has been modified, so use
-		 * cache version instead
-		 */
-		if (cached || (ce->ce_flags & CE_VALID) || ce_skip_worktree(ce)) {
-			if (ce_stage(ce) || ce_intent_to_add(ce))
-				continue;
-			hit |= grep_sha1(opt, ce->oid.hash, ce->name, 0,
-					 ce->name);
+		strbuf_setlen(&name, name_base_len);
+		strbuf_addstr(&name, ce->name);
+
+		if (S_ISREG(ce->ce_mode) &&
+		    match_pathspec(pathspec, name.buf, name.len, 0, NULL,
+				   S_ISDIR(ce->ce_mode) ||
+				   S_ISGITLINK(ce->ce_mode))) {
+			/*
+			 * If CE_VALID is on, we assume worktree file and its
+			 * cache entry are identical, even if worktree file has
+			 * been modified, so use cache version instead
+			 */
+			if (cached || (ce->ce_flags & CE_VALID) ||
+			    ce_skip_worktree(ce)) {
+				if (ce_stage(ce) || ce_intent_to_add(ce))
+					continue;
+				hit |= grep_sha1(opt, ce->oid.hash, ce->name,
+						 0, ce->name);
+			} else {
+				hit |= grep_file(opt, ce->name);
+			}
+		} else if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
+			   submodule_path_match(pathspec, name.buf, NULL)) {
+			hit |= grep_submodule(opt, NULL, ce->name, ce->name);
 		}
-		else
-			hit |= grep_file(opt, ce->name);
+
 		if (ce_stage(ce)) {
 			do {
 				nr++;
@@ -413,6 +658,8 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int
 		if (hit && opt->status_only)
 			break;
 	}
+
+	strbuf_release(&name);
 	return hit;
 }
 
@@ -651,6 +898,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			N_("search in both tracked and untracked files")),
 		OPT_SET_INT(0, "exclude-standard", &opt_exclude,
 			    N_("ignore files specified via '.gitignore'"), 1),
+		OPT_BOOL(0, "recurse-submodules", &recurse_submodules,
+			 N_("recursively search in each submodule")),
 		OPT_GROUP(""),
 		OPT_BOOL('v', "invert-match", &opt.invert,
 			N_("show non-matching lines")),
@@ -755,6 +1004,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	init_grep_defaults();
 	git_config(grep_cmd_config, NULL);
 	grep_init(&opt, prefix);
+	super_prefix = get_super_prefix();
 
 	/*
 	 * If there is no -- then the paths must exist in the working
@@ -872,6 +1122,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	pathspec.max_depth = opt.max_depth;
 	pathspec.recursive = 1;
 
+	if (recurse_submodules) {
+		gitmodules_config();
+		compile_submodule_options(&opt, &pathspec, cached, untracked,
+					  opt_exclude, use_index,
+					  pattern_type_arg);
+	}
+
 	if (show_in_pager && (cached || list.nr))
 		die(_("--open-files-in-pager only works on the worktree"));
 
@@ -895,6 +1152,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		}
 	}
 
+	if (recurse_submodules && (!use_index || untracked || list.nr))
+		die(_("option not supported with --recurse-submodules."));
+
 	if (!show_in_pager && !opt.status_only)
 		setup_pager();
 
diff --git a/git.c b/git.c
index efa1059..a156efd 100644
--- a/git.c
+++ b/git.c
@@ -434,7 +434,7 @@ static struct cmd_struct commands[] = {
 	{ "fsck-objects", cmd_fsck, RUN_SETUP },
 	{ "gc", cmd_gc, RUN_SETUP },
 	{ "get-tar-commit-id", cmd_get_tar_commit_id },
-	{ "grep", cmd_grep, RUN_SETUP_GENTLY },
+	{ "grep", cmd_grep, RUN_SETUP_GENTLY | SUPPORT_SUPER_PREFIX },
 	{ "hash-object", cmd_hash_object },
 	{ "help", cmd_help },
 	{ "index-pack", cmd_index_pack, RUN_SETUP_GENTLY },
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
new file mode 100755
index 0000000..b670c70
--- /dev/null
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -0,0 +1,99 @@
+#!/bin/sh
+
+test_description='Test grep recurse-submodules feature
+
+This test verifies the recurse-submodules feature correctly greps across
+submodules.
+'
+
+. ./test-lib.sh
+
+test_expect_success 'setup directory structure and submodule' '
+	echo "foobar" >a &&
+	mkdir b &&
+	echo "bar" >b/b &&
+	git add a b &&
+	git commit -m "add a and b" &&
+	git init submodule &&
+	echo "foobar" >submodule/a &&
+	git -C submodule add a &&
+	git -C submodule commit -m "add a" &&
+	git submodule add ./submodule &&
+	git commit -m "added submodule"
+'
+
+test_expect_success 'grep correctly finds patterns in a submodule' '
+	cat >expect <<-\EOF &&
+	a:foobar
+	b/b:bar
+	submodule/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and basic pathspecs' '
+	cat >expect <<-\EOF &&
+	submodule/a:foobar
+	EOF
+
+	git grep -e. --recurse-submodules -- submodule >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and nested submodules' '
+	git init submodule/sub &&
+	echo "foobar" >submodule/sub/a &&
+	git -C submodule/sub add a &&
+	git -C submodule/sub commit -m "add a" &&
+	git -C submodule submodule add ./sub &&
+	git -C submodule add sub &&
+	git -C submodule commit -m "added sub" &&
+	git add submodule &&
+	git commit -m "updated submodule" &&
+
+	cat >expect <<-\EOF &&
+	a:foobar
+	b/b:bar
+	submodule/a:foobar
+	submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and multiple patterns' '
+	cat >expect <<-\EOF &&
+	a:foobar
+	submodule/a:foobar
+	submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --and -e "foo" --recurse-submodules > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and multiple patterns using --not' '
+	cat >expect <<-\EOF &&
+	b/b:bar
+	EOF
+
+	git grep -e "bar" --and --not -e "foo" --recurse-submodules > actual &&
+	test_cmp expect actual
+'
+
+test_incompatible_with_recurse_submodules ()
+{
+	test_expect_success "--recurse-submodules and $1 are incompatible" "
+		test_must_fail git grep -e. --recurse-submodules $1 2>actual &&
+		test_i18ngrep 'not supported with --recurse-submodules' actual
+	"
+}
+
+test_incompatible_with_recurse_submodules --untracked
+test_incompatible_with_recurse_submodules --no-index
+test_incompatible_with_recurse_submodules HEAD
+
+test_done
-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH v2 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-10-31 22:38 ` [PATCH v2 0/6] " Brandon Williams
                     ` (3 preceding siblings ...)
  2016-10-31 22:38   ` [PATCH v2 4/6] grep: optionally recurse into submodules Brandon Williams
@ 2016-10-31 22:38   ` Brandon Williams
  2016-11-11 23:09     ` Jonathan Tan
  2016-10-31 22:38   ` [PATCH v2 6/6] grep: search history of moved submodules Brandon Williams
  2016-11-11 23:51   ` [PATCH v3 0/6] recursively grep across submodules Brandon Williams
  6 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-10-31 22:38 UTC (permalink / raw)
  To: git, sbeller; +Cc: Brandon Williams

Teach grep to recursively search in submodules when provided with a
<tree> object. This allows grep to search a submodule based on the state
of the submodule that is present in a commit of the super project.

When grep is provided with a <tree> object, the name of the object is
prefixed to all output.  In order to provide uniformity of output
between the parent and child processes the option `--parent-basename`
has been added so that the child can preface all of its output with the
name of the parent's object instead of the name of the commit SHA1 of
the submodule. This changes output from the command
`git grep -e. -l --recurse-submodules HEAD` from:
HEAD:file
<commit sha1 of submodule>:sub/file

to:
HEAD:file
HEAD:sub/file

Signed-off-by: Brandon Williams <bmwill@google.com>
---
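
Note for readers: grep_submodule_launch() in this patch splits a name of the
form "<tree-name>:path" at the first ':' to obtain the parent basename and
the submodule path.  The sketch below shows that split on its own.  The
struct split_name type and split_tree_name() are hypothetical names used
only for illustration.

```c
#include <string.h>

struct split_name {
	const char *base;	/* start of "<tree-name>", or NULL if absent */
	size_t base_len;	/* length of the tree name, excluding ':' */
	const char *name;	/* path of the submodule after the ':' */
};

/* Split "<tree-name>:path" at the first ':'; no copy is made. */
static struct split_name split_tree_name(const char *full)
{
	struct split_name s = { NULL, 0, full };
	const char *colon = strchr(full, ':');

	if (colon) {
		s.base = full;
		s.base_len = (size_t)(colon - full);
		s.name = colon + 1;
	}
	return s;
}
```

For "HEAD:sub" this yields a base of "HEAD" (length 4) and a name of "sub".
Because strchr() finds the first ':', the sketch does not address a tree
name that itself contains a colon.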
 Documentation/git-grep.txt         | 13 +++++-
 builtin/grep.c                     | 83 +++++++++++++++++++++++++++++++++++---
 t/t7814-grep-recurse-submodules.sh | 44 +++++++++++++++++++-
 3 files changed, 131 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 17aa1ba..386a868 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -26,7 +26,7 @@ SYNOPSIS
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
-	   [--recurse-submodules]
+	   [--recurse-submodules] [--parent-basename]
 	   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...]
 	   [--] [<pathspec>...]
 
@@ -91,7 +91,16 @@ OPTIONS
 
 --recurse-submodules::
 	Recursively search in each submodule that has been initialized and
-	checked out in the repository.
+	checked out in the repository.  When used in combination with the
+	<tree> option the prefix of all submodule output will be the name of
+	the parent project's <tree> object.
+
+--parent-basename::
+	For internal use only.  In order to produce uniform output with the
+	--recurse-submodules option, this option can be used to provide the
+	basename of a parent's <tree> object to a submodule so the submodule
+	can prefix its output with the parent's name rather than the SHA1 of
+	the submodule.
 
 -a::
 --text::
diff --git a/builtin/grep.c b/builtin/grep.c
index cf4f51e..2f10930 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -19,6 +19,7 @@
 #include "dir.h"
 #include "pathspec.h"
 #include "submodule.h"
+#include "submodule-config.h"
 
 static char const * const grep_usage[] = {
 	N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"),
@@ -28,6 +29,7 @@ static char const * const grep_usage[] = {
 static const char *super_prefix;
 static int recurse_submodules;
 static struct argv_array submodule_options = ARGV_ARRAY_INIT;
+static const char *parent_basename;
 
 static int grep_submodule_launch(struct grep_opt *opt,
 				 const struct grep_source *gs);
@@ -535,19 +537,53 @@ static int grep_submodule_launch(struct grep_opt *opt,
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
 	int status, i;
+	const char *end_of_base;
+	const char *name;
 	struct work_item *w = opt->output_priv;
 
+	end_of_base = strchr(gs->name, ':');
+	if (end_of_base)
+		name = end_of_base + 1;
+	else
+		name = gs->name;
+
 	prepare_submodule_repo_env(&cp.env_array);
 
 	/* Add super prefix */
 	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
 			 super_prefix ? super_prefix : "",
-			 gs->name);
+			 name);
 	argv_array_push(&cp.args, "grep");
 
+	/*
+	 * Add basename of parent project
+	 * When performing grep on a <tree> object the filename is prefixed
+	 * with the object's name: '<tree-name>:filename'.  In order to
+	 * provide uniformity of output we want to pass the name of the
+	 * parent project's object name to the submodule so the submodule can
+	 * prefix its output with the parent's name and not its own SHA1.
+	 */
+	if (end_of_base)
+		argv_array_pushf(&cp.args, "--parent-basename=%.*s",
+				 (int) (end_of_base - gs->name),
+				 gs->name);
+
 	/* Add options */
-	for (i = 0; i < submodule_options.argc; i++)
+	for (i = 0; i < submodule_options.argc; i++) {
+		/*
+		 * If there is a <tree> identifier for the submodule, add the
+		 * rev after adding the submodule options but before the
+		 * pathspecs.  To do this we listen for the '--' and insert the
+		 * sha1 before pushing the '--' onto the child process argv
+		 * array.
+		 */
+		if (gs->identifier &&
+		    !strcmp("--", submodule_options.argv[i])) {
+			argv_array_push(&cp.args, sha1_to_hex(gs->identifier));
+		}
+
 		argv_array_push(&cp.args, submodule_options.argv[i]);
+	}
 
 	cp.git_cmd = 1;
 	cp.dir = gs->path;
@@ -671,12 +707,29 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 	enum interesting match = entry_not_interesting;
 	struct name_entry entry;
 	int old_baselen = base->len;
+	struct strbuf name = STRBUF_INIT;
+	int name_base_len = 0;
+	if (super_prefix) {
+		name_base_len = strlen(super_prefix);
+		strbuf_addstr(&name, super_prefix);
+	}
 
 	while (tree_entry(tree, &entry)) {
 		int te_len = tree_entry_len(&entry);
 
 		if (match != all_entries_interesting) {
-			match = tree_entry_interesting(&entry, base, tn_len, pathspec);
+			strbuf_setlen(&name, name_base_len);
+			strbuf_addstr(&name, base->buf + tn_len);
+
+			if (recurse_submodules && S_ISGITLINK(entry.mode)) {
+				strbuf_addstr(&name, entry.path);
+				match = submodule_path_match(pathspec, name.buf,
+							     NULL);
+			} else {
+				match = tree_entry_interesting(&entry, &name,
+							       0, pathspec);
+			}
+
 			if (match == all_entries_not_interesting)
 				break;
 			if (match == entry_not_interesting)
@@ -688,8 +741,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 		if (S_ISREG(entry.mode)) {
 			hit |= grep_sha1(opt, entry.oid->hash, base->buf, tn_len,
 					 check_attr ? base->buf + tn_len : NULL);
-		}
-		else if (S_ISDIR(entry.mode)) {
+		} else if (S_ISDIR(entry.mode)) {
 			enum object_type type;
 			struct tree_desc sub;
 			void *data;
@@ -705,12 +757,18 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 			hit |= grep_tree(opt, pathspec, &sub, base, tn_len,
 					 check_attr);
 			free(data);
+		} else if (recurse_submodules && S_ISGITLINK(entry.mode)) {
+			hit |= grep_submodule(opt, entry.oid->hash, base->buf,
+					      base->buf + tn_len);
 		}
+
 		strbuf_setlen(base, old_baselen);
 
 		if (hit && opt->status_only)
 			break;
 	}
+
+	strbuf_release(&name);
 	return hit;
 }
 
@@ -734,6 +792,10 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 		if (!data)
 			die(_("unable to read tree (%s)"), oid_to_hex(&obj->oid));
 
+		/* Use parent's name as base when recursing submodules */
+		if (recurse_submodules && parent_basename)
+			name = parent_basename;
+
 		len = name ? strlen(name) : 0;
 		strbuf_init(&base, PATH_MAX + len + 1);
 		if (len) {
@@ -760,6 +822,12 @@ static int grep_objects(struct grep_opt *opt, const struct pathspec *pathspec,
 	for (i = 0; i < nr; i++) {
 		struct object *real_obj;
 		real_obj = deref_tag(list->objects[i].item, NULL, 0);
+
+		/* load the gitmodules file for this rev */
+		if (recurse_submodules) {
+			submodule_free();
+			gitmodules_config_sha1(real_obj->oid.hash);
+		}
 		if (grep_object(opt, pathspec, real_obj, list->objects[i].name, list->objects[i].path)) {
 			hit = 1;
 			if (opt->status_only)
@@ -900,6 +968,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			    N_("ignore files specified via '.gitignore'"), 1),
 		OPT_BOOL(0, "recurse-submodules", &recurse_submodules,
 			 N_("recursivley search in each submodule")),
+		OPT_STRING(0, "parent-basename", &parent_basename,
+			   N_("basename"),
+			   N_("prepend parent project's basename to output")),
 		OPT_GROUP(""),
 		OPT_BOOL('v', "invert-match", &opt.invert,
 			N_("show non-matching lines")),
@@ -1152,7 +1223,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	if (recurse_submodules && (!use_index || untracked || list.nr))
+	if (recurse_submodules && (!use_index || untracked))
 		die(_("option not supported with --recurse-submodules."));
 
 	if (!show_in_pager && !opt.status_only)
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index b670c70..3d1892d 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -84,6 +84,49 @@ test_expect_success 'grep and multiple patterns' '
 	test_cmp expect actual
 '
 
+test_expect_success 'basic grep tree' '
+	cat >expect <<-\EOF &&
+	HEAD:a:foobar
+	HEAD:b/b:bar
+	HEAD:submodule/a:foobar
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree HEAD^' '
+	cat >expect <<-\EOF &&
+	HEAD^:a:foobar
+	HEAD^:b/b:bar
+	HEAD^:submodule/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD^ > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree HEAD^^' '
+	cat >expect <<-\EOF &&
+	HEAD^^:a:foobar
+	HEAD^^:b/b:bar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD^^ > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree and pathspecs' '
+	cat >expect <<-\EOF &&
+	HEAD:submodule/a:foobar
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD -- submodule > actual &&
+	test_cmp expect actual
+'
+
 test_incompatible_with_recurse_submodules ()
 {
 	test_expect_success "--recurse-submodules and $1 are incompatible" "
@@ -94,6 +137,5 @@ test_incompatible_with_recurse_submodules ()
 
 test_incompatible_with_recurse_submodules --untracked
 test_incompatible_with_recurse_submodules --no-index
-test_incompatible_with_recurse_submodules HEAD
 
 test_done
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v2 6/6] grep: search history of moved submodules
  2016-10-31 22:38 ` [PATCH v2 0/6] " Brandon Williams
                     ` (4 preceding siblings ...)
  2016-10-31 22:38   ` [PATCH v2 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
@ 2016-10-31 22:38   ` Brandon Williams
  2016-11-11 23:51   ` [PATCH v3 0/6] recursively grep across submodules Brandon Williams
  6 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-10-31 22:38 UTC (permalink / raw)
  To: git, sbeller; +Cc: Brandon Williams

If a submodule was renamed at any point since its inception, then if you
were to try and grep on a commit prior to the submodule being moved, you
wouldn't be able to find a working directory for the submodule since the
path in the past is different from the current path.

This patch teaches grep to find the .git directory for a submodule in
the parent's .git/modules/ directory in the event the path to the
submodule in the commit that is being searched differs from the state of
the currently checked out commit.  If found, the child process that is
spawned to grep the submodule will chdir into its gitdir instead of a
working directory.

In order to override the explicit setting of the submodule child process's
gitdir environment variable (which was introduced in '10f5c526'),
`GIT_DIR_ENVIRONMENT` needs to be pushed onto the child process's env_array.
This allows the searching of history from a submodule's gitdir, rather
than from a working directory.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 builtin/grep.c                     | 18 +++++++++++++----
 t/t7814-grep-recurse-submodules.sh | 41 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 2f10930..032d476 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -548,6 +548,7 @@ static int grep_submodule_launch(struct grep_opt *opt,
 		name = gs->name;
 
 	prepare_submodule_repo_env(&cp.env_array);
+	argv_array_push(&cp.env_array, GIT_DIR_ENVIRONMENT);
 
 	/* Add super prefix */
 	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
@@ -612,10 +613,19 @@ static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1,
 {
 	if (!(is_submodule_initialized(path) &&
 	      is_submodule_checked_out(path))) {
-		warning("skiping submodule '%s%s' since it is not initialized and checked out",
-			super_prefix ? super_prefix : "",
-			path);
-		return 0;
+		/*
+		 * If searching history, check for the presence of the
+		 * submodule's gitdir before skipping the submodule.
+		 */
+		if (sha1) {
+			path = git_path("modules/%s",
+					submodule_from_path(null_sha1, path)->name);
+
+			if (!(is_directory(path) && is_git_directory(path)))
+				return 0;
+		} else {
+			return 0;
+		}
 	}
 
 #ifndef NO_PTHREADS
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index 3d1892d..ee173ad 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -127,6 +127,47 @@ test_expect_success 'grep tree and pathspecs' '
 	test_cmp expect actual
 '
 
+test_expect_success 'grep history with moved submodules' '
+	git init parent &&
+	echo "foobar" >parent/file &&
+	git -C parent add file &&
+	git -C parent commit -m "add file" &&
+
+	git init sub &&
+	echo "foobar" >sub/file &&
+	git -C sub add file &&
+	git -C sub commit -m "add file" &&
+
+	git -C parent submodule add ../sub &&
+	git -C parent commit -m "add submodule" &&
+
+	cat >expect <<-\EOF &&
+	file:foobar
+	sub/file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules > actual &&
+	test_cmp expect actual &&
+
+	git -C parent mv sub sub-moved &&
+	git -C parent commit -m "moved submodule" &&
+
+	cat >expect <<-\EOF &&
+	file:foobar
+	sub-moved/file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules > actual &&
+	test_cmp expect actual &&
+
+	cat >expect <<-\EOF &&
+	HEAD^:file:foobar
+	HEAD^:sub/file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules HEAD^ > actual &&
+	test_cmp expect actual &&
+
+	rm -rf parent sub
+'
+
 test_incompatible_with_recurse_submodules ()
 {
 	test_expect_success "--recurse-submodules and $1 are incompatible" "
-- 
2.8.0.rc3.226.g39d4020



* Re: [PATCH v2 1/6] submodules: add helper functions to determine presence of submodules
  2016-10-31 22:38   ` [PATCH v2 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
@ 2016-10-31 23:34     ` Stefan Beller
  2016-11-01 17:20       ` Junio C Hamano
  2016-11-01 17:23       ` Brandon Williams
  2016-11-05  2:34     ` Jonathan Tan
  1 sibling, 2 replies; 126+ messages in thread
From: Stefan Beller @ 2016-10-31 23:34 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git@vger.kernel.org

On Mon, Oct 31, 2016 at 3:38 PM, Brandon Williams <bmwill@google.com> wrote:
> +int is_submodule_checked_out(const char *path)
> +{
> +       int ret = 0;
> +       struct strbuf buf = STRBUF_INIT;
> +
> +       strbuf_addf(&buf, "%s/.git", path);
> +       ret = file_exists(buf.buf);

I think we can be tighter here; instead of checking
if the file or directory exists, we should be checking if
it is a valid git directory, i.e. s/file_exists/resolve_gitdir/
which returns a path to the actual git dir (in case of a .gitlink)
or NULL when nothing is found that looks like a git directory or
pointer to it.


> +
> +       strbuf_release(&buf);
> +       return ret;
> +}
> +
>  int parse_submodule_update_strategy(const char *value,
>                 struct submodule_update_strategy *dst)
>  {
> diff --git a/submodule.h b/submodule.h
> index d9e197a..bd039ca 100644
> --- a/submodule.h
> +++ b/submodule.h
> @@ -37,6 +37,8 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
>                 const char *path);
>  int submodule_config(const char *var, const char *value, void *cb);
>  void gitmodules_config(void);
> +extern int is_submodule_initialized(const char *path);
> +extern int is_submodule_checked_out(const char *path);

no need to put extern for function names. (no other functions in this
header are extern. so local consistency maybe? I'd also claim that
all other extern functions in headers ought to be declared without
being extern)

Also naming: I'd go with

    is_submodule_populated ;)

as that name says the function tells you whether there is a valid
submodule (and not just an empty dir as a placeholder).

You don't have to run "git checkout" to arrive in that state,
but a plumbing command such as read_tree may have been used.
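
For illustration only, a minimal standalone sketch of the stricter check
suggested above.  It is not git's resolve_gitdir(); the helper name
looks_populated() and the fixed buffer size are assumptions made for this
example.  It only asks whether <path>/.git is a directory or a file that
begins with "gitdir: ", and it does not validate what that entry points to.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not part of git): report whether "<path>/.git"
 * looks like a repository entry point.  Returns 1 if it is a directory,
 * or a regular file whose first bytes are "gitdir: "; returns 0 otherwise.
 */
static int looks_populated(const char *path)
{
	char buf[4096];
	char head[8];
	struct stat st;
	FILE *fp;
	size_t n;
	int len;

	len = snprintf(buf, sizeof(buf), "%s/.git", path);
	if (len < 0 || len >= (int)sizeof(buf))
		return 0;	/* path too long for this sketch */
	if (stat(buf, &st))
		return 0;	/* nothing there at all */
	if (S_ISDIR(st.st_mode))
		return 1;	/* embedded git directory */
	if (!S_ISREG(st.st_mode))
		return 0;

	/* A gitfile is a small text file starting with "gitdir: ". */
	fp = fopen(buf, "r");
	if (!fp)
		return 0;
	n = fread(head, 1, sizeof(head), fp);
	fclose(fp);
	return n == sizeof(head) && !memcmp(head, "gitdir: ", sizeof(head));
}
```

An empty placeholder directory has no .git entry, so the sketch reports it
as not populated, which is the distinction the proposed name is after.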


* Re: [PATCH v2 2/6] submodules: load gitmodules file from commit sha1
  2016-10-31 22:38   ` [PATCH v2 2/6] submodules: load gitmodules file from commit sha1 Brandon Williams
@ 2016-11-01 16:39     ` Stefan Beller
  0 siblings, 0 replies; 126+ messages in thread
From: Stefan Beller @ 2016-11-01 16:39 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git@vger.kernel.org

On Mon, Oct 31, 2016 at 3:38 PM, Brandon Williams <bmwill@google.com> wrote:
> Teach submodules to load a '.gitmodules' file from a commit sha1.  This
> enables the population of the submodule_cache to be based on the state
> of the '.gitmodules' file from a particular commit.
>
> Signed-off-by: Brandon Williams <bmwill@google.com>
> ---
>  cache.h            |  2 ++
>  config.c           |  8 ++++----
>  submodule-config.c |  6 +++---
>  submodule-config.h |  3 +++
>  submodule.c        | 12 ++++++++++++
>  submodule.h        |  1 +
>  6 files changed, 25 insertions(+), 7 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index 1be6526..559a461 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1690,6 +1690,8 @@ extern int git_default_config(const char *, const char *, void *);
>  extern int git_config_from_file(config_fn_t fn, const char *, void *);
>  extern int git_config_from_mem(config_fn_t fn, const enum config_origin_type,
>                                         const char *name, const char *buf, size_t len, void *data);
> +extern int git_config_from_blob_sha1(config_fn_t fn, const char *name,
> +                                    const unsigned char *sha1, void *data);
>  extern void git_config_push_parameter(const char *text);
>  extern int git_config_from_parameters(config_fn_t fn, void *data);
>  extern void git_config(config_fn_t fn, void *);
> diff --git a/config.c b/config.c
> index 83fdecb..4d78e72 100644
> --- a/config.c
> +++ b/config.c
> @@ -1214,10 +1214,10 @@ int git_config_from_mem(config_fn_t fn, const enum config_origin_type origin_typ
>         return do_config_from(&top, fn, data);
>  }
>
> -static int git_config_from_blob_sha1(config_fn_t fn,
> -                                    const char *name,
> -                                    const unsigned char *sha1,
> -                                    void *data)
> +int git_config_from_blob_sha1(config_fn_t fn,
> +                             const char *name,
> +                             const unsigned char *sha1,

While looking at this code, we may want to investigate if a conversion
to struct object_id (instead of const char * for sha1) is feasible.

> +                             void *data)
>  {
>         enum object_type type;
>         char *buf;
> diff --git a/submodule-config.c b/submodule-config.c
> index 098085b..8b9a2ef 100644
> --- a/submodule-config.c
> +++ b/submodule-config.c
> @@ -379,9 +379,9 @@ static int parse_config(const char *var, const char *value, void *data)
>         return ret;
>  }
>
> -static int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
> -                                     unsigned char *gitmodules_sha1,
> -                                     struct strbuf *rev)
> +int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
> +                              unsigned char *gitmodules_sha1,
> +                              struct strbuf *rev)
>  {
>         int ret = 0;
>
> diff --git a/submodule-config.h b/submodule-config.h
> index d05c542..78584ba 100644
> --- a/submodule-config.h
> +++ b/submodule-config.h
> @@ -29,6 +29,9 @@ const struct submodule *submodule_from_name(const unsigned char *commit_sha1,
>                 const char *name);
>  const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
>                 const char *path);
> +extern int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
> +                                     unsigned char *gitmodules_sha1,
> +                                     struct strbuf *rev);

no need for extern here as it is a function declaration, not a
variable declaration;
(as said on patch 1, I think consistency to the surrounding is important here)

>  void submodule_free(void);
>
>  #endif /* SUBMODULE_CONFIG_H */
> diff --git a/submodule.c b/submodule.c
> index ff4e7b2..19dfbd4 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -198,6 +198,18 @@ void gitmodules_config(void)
>         }
>  }
>
> +void gitmodules_config_sha1(const unsigned char *commit_sha1)
> +{
> +       struct strbuf rev = STRBUF_INIT;
> +       unsigned char sha1[20];
> +
> +       if (gitmodule_sha1_from_commit(commit_sha1, sha1, &rev)) {
> +               git_config_from_blob_sha1(submodule_config, rev.buf,
> +                                         sha1, NULL);
> +       }
> +       strbuf_release(&rev);
> +}
> +
>  /*
>   * Determine if a submodule has been initialized at a given 'path'
>   */
> diff --git a/submodule.h b/submodule.h
> index bd039ca..9a24ac8 100644
> --- a/submodule.h
> +++ b/submodule.h
> @@ -37,6 +37,7 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
>                 const char *path);
>  int submodule_config(const char *var, const char *value, void *cb);
>  void gitmodules_config(void);
> +extern void gitmodules_config_sha1(const unsigned char *commit_sha1);

same.

>  extern int is_submodule_initialized(const char *path);
>  extern int is_submodule_checked_out(const char *path);
>  int parse_submodule_update_strategy(const char *value,
> --
> 2.8.0.rc3.226.g39d4020
>


* Re: [PATCH v2 3/6] grep: add submodules as a grep source type
  2016-10-31 22:38   ` [PATCH v2 3/6] grep: add submodules as a grep source type Brandon Williams
@ 2016-11-01 16:53     ` Stefan Beller
  2016-11-01 17:31     ` Junio C Hamano
  1 sibling, 0 replies; 126+ messages in thread
From: Stefan Beller @ 2016-11-01 16:53 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git@vger.kernel.org

On Mon, Oct 31, 2016 at 3:38 PM, Brandon Williams <bmwill@google.com> wrote:
> Add `GREP_SOURCE_SUBMODULE` as a grep_source type and cases for this new
> type in the various switch statements in grep.c.
>
> When initializing a grep_source with type `GREP_SOURCE_SUBMODULE` the
> identifier can either be NULL (to indicate that the working tree will be
> used) or a SHA1 (the REV of the submodule to be grep'd).  If the
> identifier is a SHA1 then we want to fall through to the
> `GREP_SOURCE_SHA1` case to handle the copying of the SHA1.
>
> Signed-off-by: Brandon Williams <bmwill@google.com>

This patch only adds the (de-)initialization for the new type,
it is not yet made use of. Looks good.

Thanks,
Stefan

> ---
>  grep.c | 16 +++++++++++++++-
>  grep.h |  1 +
>  2 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/grep.c b/grep.c
> index 1194d35..0dbdc1d 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -1735,12 +1735,23 @@ void grep_source_init(struct grep_source *gs, enum grep_source_type type,
>         case GREP_SOURCE_FILE:
>                 gs->identifier = xstrdup(identifier);
>                 break;
> +       case GREP_SOURCE_SUBMODULE:
> +               if (!identifier) {
> +                       gs->identifier = NULL;
> +                       break;
> +               }
> +               /*
> +                * FALL THROUGH
> +                * If the identifier is non-NULL (in the submodule case) it
> +                * will be a SHA1 that needs to be copied.
> +                */
>         case GREP_SOURCE_SHA1:
>                 gs->identifier = xmalloc(20);
>                 hashcpy(gs->identifier, identifier);
>                 break;
>         case GREP_SOURCE_BUF:
>                 gs->identifier = NULL;
> +               break;
>         }
>  }
>
> @@ -1760,6 +1771,7 @@ void grep_source_clear_data(struct grep_source *gs)
>         switch (gs->type) {
>         case GREP_SOURCE_FILE:
>         case GREP_SOURCE_SHA1:
> +       case GREP_SOURCE_SUBMODULE:
>                 free(gs->buf);
>                 gs->buf = NULL;
>                 gs->size = 0;
> @@ -1831,8 +1843,10 @@ static int grep_source_load(struct grep_source *gs)
>                 return grep_source_load_sha1(gs);
>         case GREP_SOURCE_BUF:
>                 return gs->buf ? 0 : -1;
> +       case GREP_SOURCE_SUBMODULE:
> +               break;
>         }
> -       die("BUG: invalid grep_source type");
> +       die("BUG: invalid grep_source type to load");
>  }
>
>  void grep_source_load_driver(struct grep_source *gs)
> diff --git a/grep.h b/grep.h
> index 5856a23..267534c 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -161,6 +161,7 @@ struct grep_source {
>                 GREP_SOURCE_SHA1,
>                 GREP_SOURCE_FILE,
>                 GREP_SOURCE_BUF,
> +               GREP_SOURCE_SUBMODULE,
>         } type;
>         void *identifier;
>
> --
> 2.8.0.rc3.226.g39d4020
>


* Re: [PATCH v2 1/6] submodules: add helper functions to determine presence of submodules
  2016-10-31 23:34     ` Stefan Beller
@ 2016-11-01 17:20       ` Junio C Hamano
  2016-11-01 17:24         ` Brandon Williams
  2016-11-01 17:31         ` Stefan Beller
  2016-11-01 17:23       ` Brandon Williams
  1 sibling, 2 replies; 126+ messages in thread
From: Junio C Hamano @ 2016-11-01 17:20 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Brandon Williams, git@vger.kernel.org

Stefan Beller <sbeller@google.com> writes:

Overall, the suggestions from you in this review are good; please
consider that I agree with anything I did not mention.  Thanks.

>> +extern int is_submodule_initialized(const char *path);
>> +extern int is_submodule_checked_out(const char *path);
>
> no need to put extern for function names. (no other functions in this
> header are extern. so local consistency maybe? I'd also claim that
> all other extern functions in headers ought to be declared without
> being extern)

Maybe I am old fashioned, but I'd feel better to see these with
explicit "extern" in front (check the older header files like
cache.h when you are in doubt what the project convention has been).


* Re: [PATCH v2 1/6] submodules: add helper functions to determine presence of submodules
  2016-10-31 23:34     ` Stefan Beller
  2016-11-01 17:20       ` Junio C Hamano
@ 2016-11-01 17:23       ` Brandon Williams
  1 sibling, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-01 17:23 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org

On 10/31, Stefan Beller wrote:
> On Mon, Oct 31, 2016 at 3:38 PM, Brandon Williams <bmwill@google.com> wrote:
> > +int is_submodule_checked_out(const char *path)
> > +{
> > +       int ret = 0;
> > +       struct strbuf buf = STRBUF_INIT;
> > +
> > +       strbuf_addf(&buf, "%s/.git", path);
> > +       ret = file_exists(buf.buf);
> 
> I think we can be more tight here; instead of checking
> if the file or directory exists, we should be checking if
> it is a valid git directory, i.e. s/file_exists/resolve_gitdir/
> which returns a path to the actual git dir (in case of a .gitlink)
> or NULL when nothing is found that looks like a git directory or
> pointer to it.

Sounds good.

> > +
> > +       strbuf_release(&buf);
> > +       return ret;
> > +}
> > +
> >  int parse_submodule_update_strategy(const char *value,
> >                 struct submodule_update_strategy *dst)
> >  {
> > diff --git a/submodule.h b/submodule.h
> > index d9e197a..bd039ca 100644
> > --- a/submodule.h
> > +++ b/submodule.h
> > @@ -37,6 +37,8 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
> >                 const char *path);
> >  int submodule_config(const char *var, const char *value, void *cb);
> >  void gitmodules_config(void);
> > +extern int is_submodule_initialized(const char *path);
> > +extern int is_submodule_checked_out(const char *path);
> 
> no need to put extern for function names. (no other functions in this
> header are extern. so local consistency maybe? I'd also claim that
> all other extern functions in headers ought to be declared without
> being extern)

From looking around at other sections of the code, it seems like the
extern keyword is used for functions declared in header files. What
does the style guideline for the project say about this?

-- 
Brandon Williams


* Re: [PATCH v2 1/6] submodules: add helper functions to determine presence of submodules
  2016-11-01 17:20       ` Junio C Hamano
@ 2016-11-01 17:24         ` Brandon Williams
  2016-11-01 17:31         ` Stefan Beller
  1 sibling, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-01 17:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, git@vger.kernel.org

On 11/01, Junio C Hamano wrote:
> Stefan Beller <sbeller@google.com> writes:
> 
> Overall the suggestions from you in this review is good and please
> consider anything I did not mention I agree with you.  Thanks.
> 
> >> +extern int is_submodule_initialized(const char *path);
> >> +extern int is_submodule_checked_out(const char *path);
> >
> > no need to put extern for function names. (no other functions in this
> > header are extern. so local consistency maybe? I'd also claim that
> > all other extern functions in headers ought to be declared without
> > being extern)
> 
> Maybe I am old fashioned, but I'd feel better to see these with
> explicit "extern" in front (check the older header files like
> cache.h when you are in doubt what the project convention has been).

I wouldn't consider that old-fashioned, as I'm fairly new to all this and
I also prefer the explicit "extern" :P

-- 
Brandon Williams


* Re: [PATCH v2 4/6] grep: optionally recurse into submodules
  2016-10-31 22:38   ` [PATCH v2 4/6] grep: optionally recurse into submodules Brandon Williams
@ 2016-11-01 17:26     ` Stefan Beller
  2016-11-01 20:25       ` Brandon Williams
  0 siblings, 1 reply; 126+ messages in thread
From: Stefan Beller @ 2016-11-01 17:26 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git@vger.kernel.org

On Mon, Oct 31, 2016 at 3:38 PM, Brandon Williams <bmwill@google.com> wrote:

>
> +--recurse-submodules::
> +       Recursively search in each submodule that has been initialized and
> +       checked out in the repository.
> +

and warn otherwise.

> +
> +       /*
> +        * Limit number of threads for child process to use.
> +        * This is to prevent potential fork-bomb behavior of git-grep as each
> +        * submodule process has its own thread pool.
> +        */
> +       if (num_threads)
> +               argv_array_pushf(&submodule_options, "--threads=%d",
> +                                (num_threads + 1) / 2);

Just like in the run_parallel machinery this seems like an approximate
workaround. I'm ok with that for now.

Ideally the parent/child can send each other signals to hand
over threads. (SIGUSR1/SIGUSR2 would be enough to do that,
though I wonder if that is as portable as I would hope.) Or we could look at
"make" and see how it handles recursive calls.
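
To make the effect of the halving concrete, here is a rough sketch that
assumes a single chain of nested submodules.  child_thread_budget() mirrors
the (num_threads + 1) / 2 expression from the patch; threads_along_chain()
is a hypothetical helper that only sums the budgets and is not part of the
series.

```c
/* Thread budget handed to a child: half of the parent's, rounded up. */
static int child_thread_budget(int num_threads)
{
	return (num_threads + 1) / 2;
}

/*
 * Hypothetical helper: total threads in use along one chain of nested
 * submodules, counting the top-level process (depth 0) and 'depth'
 * levels of children below it.
 */
static int threads_along_chain(int num_threads, int depth)
{
	int total = 0;
	int i;

	for (i = 0; i <= depth; i++) {
		total += num_threads;
		num_threads = child_thread_budget(num_threads);
	}
	return total;
}
```

Starting from 8 threads, a chain three levels deep uses 8 + 4 + 2 + 1 = 15
threads, and because of the rounding the budget never drops below 1.  This
bounds one chain only; sibling submodules searched at the same time each
get their own budget.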

> +
> +       /*
> +        * Capture output to output buffer and check the return code from the
> +        * child process.  A '0' indicates a hit, a '1' indicates no hit and
> +        * anything else is an error.
> +        */
> +       status = capture_command(&cp, &w->out, 0);
> +       if (status && (status != 1))

Does the user have enough information about what went wrong?
Is the child verbose enough that we do not need to give a
die[_errno]("submodule process failed")?
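
As a sketch of what a more explicit report could look like, assuming the
exit-code convention stated in the quoted comment (0 is a hit, 1 is no hit,
anything else is an error).  The enum, both function names and the message
wording are made up for this example and are not in the patch.

```c
#include <stdio.h>

enum child_result { CHILD_HIT, CHILD_NO_HIT, CHILD_ERROR };

/* Map the child's exit status onto the three outcomes described above. */
static enum child_result classify_grep_status(int status)
{
	if (status == 0)
		return CHILD_HIT;
	if (status == 1)
		return CHILD_NO_HIT;
	return CHILD_ERROR;
}

/*
 * Hypothetical message formatter: names the submodule and the status so
 * the user can tell which child failed.  Returns 1 if the whole message
 * fit into 'out', 0 otherwise.
 */
static int format_child_failure(char *out, size_t outlen,
				const char *submodule, int status)
{
	int len = snprintf(out, outlen,
			   "grep in submodule '%s' exited with status %d",
			   submodule, status);

	return len >= 0 && (size_t)len < outlen;
}
```

Whether such a message is needed depends on how much the child already
prints to stderr, which is the open question here.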


> +static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1,
> +                         const char *filename, const char *path)
> +{
> +       if (!(is_submodule_initialized(path) &&

If it is not initialized, the user "obviously" doesn't care, so maybe
we only need to warn
if init, but not checked out?

> +             is_submodule_checked_out(path))) {
> +               warning("skiping submodule '%s%s' since it is not initialized and checked out",
> +                       super_prefix ? super_prefix : "",
> +                       path);
> +               return 0;
> +       }
> +
> +#ifndef NO_PTHREADS
> +       if (num_threads) {
> +               add_work(opt, GREP_SOURCE_SUBMODULE, filename, path, sha1);
> +               return 0;
> +       } else
> +#endif
> +       {
> +               struct work_item w;
> +               int hit;
> +
> +               grep_source_init(&w.source, GREP_SOURCE_SUBMODULE,
> +                                filename, path, sha1);
> +               strbuf_init(&w.out, 0);
> +               opt->output_priv = &w;
> +               hit = grep_submodule_launch(opt, &w.source);
> +
> +               write_or_die(1, w.out.buf, w.out.len);
> +
> +               grep_source_clear(&w.source);
> +               strbuf_release(&w.out);
> +               return hit;
> +       }
> +}
> +
> +static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec,
> +                     int cached)
>  {
>         int hit = 0;
>         int nr;
> +       struct strbuf name = STRBUF_INIT;
> +       int name_base_len = 0;
> +       if (super_prefix) {
> +               name_base_len = strlen(super_prefix);
> +               strbuf_addstr(&name, super_prefix);
> +       }
> +
>         read_cache();
>
>         for (nr = 0; nr < active_nr; nr++) {
>                 const struct cache_entry *ce = active_cache[nr];
> -               if (!S_ISREG(ce->ce_mode))
> -                       continue;
> -               if (!ce_path_match(ce, pathspec, NULL))
> -                       continue;
> -               /*
> -                * If CE_VALID is on, we assume worktree file and its cache entry
> -                * are identical, even if worktree file has been modified, so use
> -                * cache version instead
> -                */
> -               if (cached || (ce->ce_flags & CE_VALID) || ce_skip_worktree(ce)) {
> -                       if (ce_stage(ce) || ce_intent_to_add(ce))
> -                               continue;
> -                       hit |= grep_sha1(opt, ce->oid.hash, ce->name, 0,
> -                                        ce->name);
> +               strbuf_setlen(&name, name_base_len);
> +               strbuf_addstr(&name, ce->name);
> +
> +               if (S_ISREG(ce->ce_mode) &&
> +                   match_pathspec(pathspec, name.buf, name.len, 0, NULL,
> +                                  S_ISDIR(ce->ce_mode) ||
> +                                  S_ISGITLINK(ce->ce_mode))) {

Why do we have to pass the ISDIR and ISGITLINK here for the regular file
case? ce_path_match and match_pathspec are doing the same thing?

> +                       /*
> +                        * If CE_VALID is on, we assume worktree file and its
> +                        * cache entry are identical, even if worktree file has
> +                        * been modified, so use cache version instead
> +                        */
> +                       if (cached || (ce->ce_flags & CE_VALID) ||
> +                           ce_skip_worktree(ce)) {
> +                               if (ce_stage(ce) || ce_intent_to_add(ce))
> +                                       continue;
> +                               hit |= grep_sha1(opt, ce->oid.hash, ce->name,
> +                                                0, ce->name);
> +                       } else {
> +                               hit |= grep_file(opt, ce->name);
> +                       }
> +               } else if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
> +                          submodule_path_match(pathspec, name.buf, NULL)) {
> +                       hit |= grep_submodule(opt, NULL, ce->name, ce->name);

What is the difference between the last two parameters?

> + * filename: name of the submodule including tree name of parent
> + * path: location of the submodule

That sounds the same to me.
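
For what it is worth, in this index case both arguments are ce->name, so
they really are identical.  They only differ in the <tree> case added by
patch 5/6, where filename is base->buf (e.g. "HEAD:submodule") and path is
base->buf + tn_len (e.g. "submodule").  Below is a small standalone sketch
of how the tree name could be split off such a filename to build the
--parent-basename option.  The helper name is hypothetical, and locating
the split with strchr() on the first ':' is an assumption, since the quoted
hunks do not show how end_of_base is computed.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "--parent-basename=<tree-name>" into 'out'
 * when 'name' has the form "<tree-name>:<path>".  Returns 1 if an option
 * was written, 0 if 'name' has no ':' (working tree case) or if 'out'
 * is too small.
 */
static int format_parent_basename(char *out, size_t outlen, const char *name)
{
	/* Assumption: the tree name ends at the first ':' in the name. */
	const char *end_of_base = strchr(name, ':');
	int len;

	if (!end_of_base)
		return 0;
	len = snprintf(out, outlen, "--parent-basename=%.*s",
		       (int)(end_of_base - name), name);
	return len >= 0 && (size_t)len < outlen;
}
```

With "HEAD:submodule" as the filename this yields "--parent-basename=HEAD",
while a plain "submodule" produces no option at all.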

>         }
>
> +       if (recurse_submodules && (!use_index || untracked || list.nr))
> +               die(_("option not supported with --recurse-submodules."));

The user asks: Which option?

> +
> +test_expect_success 'grep and nested submodules' '
> +       git init submodule/sub &&
> +       echo "foobar" >submodule/sub/a &&
> +       git -C submodule/sub add a &&
> +       git -C submodule/sub commit -m "add a" &&
> +       git -C submodule submodule add ./sub &&
> +       git -C submodule add sub &&
> +       git -C submodule commit -m "added sub" &&
> +       git add submodule &&
> +       git commit -m "updated submodule" &&

Both in this test and in the setup, we set up a repository
with submodules that have clean working dirs.

What should happen with dirty working dirs? Dirty in the sense:
* file untracked in the submodule
* file added in the submodule, but not committed
* file committed in the submodule, that commit is
   untracked in the superproject
* file committed in the submodule, that commit is
  added to the index in the superproject
* (last case is just as above:) file committed in submodule,
   that commit was committed into the superproject.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v2 3/6] grep: add submodules as a grep source type
  2016-10-31 22:38   ` [PATCH v2 3/6] grep: add submodules as a grep source type Brandon Williams
  2016-11-01 16:53     ` Stefan Beller
@ 2016-11-01 17:31     ` Junio C Hamano
  1 sibling, 0 replies; 126+ messages in thread
From: Junio C Hamano @ 2016-11-01 17:31 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, sbeller

Brandon Williams <bmwill@google.com> writes:

> Add `GREP_SOURCE_SUBMODULE` as a grep_source type and cases for this new
> type in the various switch statements in grep.c.
>
> When initializing a grep_source with type `GREP_SOURCE_SUBMODULE` the
> identifier can either be NULL (to indicate that the working tree will be
> used) or a SHA1 (the REV of the submodule to be grep'd).  If the
> identifier is a SHA1 then we want to fall through to the
> `GREP_SOURCE_SHA1` case to handle the copying of the SHA1.
>
> Signed-off-by: Brandon Williams <bmwill@google.com>
> ---

Conceptually, it somehow feels strange to have SUBMODULE in this
set.

Source being SHA1 means we are doing a recursive grep in a tree
structure that is stored in the object store, being FILE means we
are reading from the filesystem, being BUF means we are fed in-core
buffer (e.g. to implement the "log --grep='string in message'").  It
is unclear how SUBMODULE fits in that picture, as we do not have a
caller that uses the type at this step yet.  Hopefully it will
become obvious why this new type belongs to that set as the series
progresses ;-)

>  grep.c | 16 +++++++++++++++-
>  grep.h |  1 +
>  2 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/grep.c b/grep.c
> index 1194d35..0dbdc1d 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -1735,12 +1735,23 @@ void grep_source_init(struct grep_source *gs, enum grep_source_type type,
>  	case GREP_SOURCE_FILE:
>  		gs->identifier = xstrdup(identifier);
>  		break;
> +	case GREP_SOURCE_SUBMODULE:
> +		if (!identifier) {
> +			gs->identifier = NULL;
> +			break;
> +		}
> +		/*
> +		 * FALL THROUGH
> +		 * If the identifier is non-NULL (in the submodule case) it
> +		 * will be a SHA1 that needs to be copied.
> +		 */
>  	case GREP_SOURCE_SHA1:
>  		gs->identifier = xmalloc(20);
>  		hashcpy(gs->identifier, identifier);
>  		break;
>  	case GREP_SOURCE_BUF:
>  		gs->identifier = NULL;
> +		break;
>  	}
>  }
>  
> @@ -1760,6 +1771,7 @@ void grep_source_clear_data(struct grep_source *gs)
>  	switch (gs->type) {
>  	case GREP_SOURCE_FILE:
>  	case GREP_SOURCE_SHA1:
> +	case GREP_SOURCE_SUBMODULE:
>  		free(gs->buf);
>  		gs->buf = NULL;
>  		gs->size = 0;
> @@ -1831,8 +1843,10 @@ static int grep_source_load(struct grep_source *gs)
>  		return grep_source_load_sha1(gs);
>  	case GREP_SOURCE_BUF:
>  		return gs->buf ? 0 : -1;
> +	case GREP_SOURCE_SUBMODULE:
> +		break;
>  	}
> -	die("BUG: invalid grep_source type");
> +	die("BUG: invalid grep_source type to load");
>  }
>  
>  void grep_source_load_driver(struct grep_source *gs)
> diff --git a/grep.h b/grep.h
> index 5856a23..267534c 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -161,6 +161,7 @@ struct grep_source {
>  		GREP_SOURCE_SHA1,
>  		GREP_SOURCE_FILE,
>  		GREP_SOURCE_BUF,
> +		GREP_SOURCE_SUBMODULE,
>  	} type;
>  	void *identifier;

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v2 1/6] submodules: add helper functions to determine presence of submodules
  2016-11-01 17:20       ` Junio C Hamano
  2016-11-01 17:24         ` Brandon Williams
@ 2016-11-01 17:31         ` Stefan Beller
  2016-11-06  7:42           ` Jacob Keller
  1 sibling, 1 reply; 126+ messages in thread
From: Stefan Beller @ 2016-11-01 17:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brandon Williams, git@vger.kernel.org

On Tue, Nov 1, 2016 at 10:20 AM, Junio C Hamano <gitster@pobox.com> wrote:

>
> Maybe I am old fashioned, but I'd feel better to see these with
> explicit "extern" in front (check the older header files like
> cache.h when you are in doubt what the project convention has been).

I did check the other files and saw them, so I was very unsure what to
suggest here. I only saw the extern keyword used in headers that were
there when Git was really young, so I assumed it's a style nit by kernel
developers. Thanks for clarifying!

I think we'll want to have some consistency though, so we
maybe want to coordinate a cleanup of submodule.h as well as
submodule-config.h to mark all the functions extern.

This doesn't need to be an all-at-once thing, but we'd keep it in mind
for future declarations in the header.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v2 4/6] grep: optionally recurse into submodules
  2016-11-01 17:26     ` Stefan Beller
@ 2016-11-01 20:25       ` Brandon Williams
  0 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-01 20:25 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org

On 11/01, Stefan Beller wrote:
> On Mon, Oct 31, 2016 at 3:38 PM, Brandon Williams <bmwill@google.com> wrote:
> 
> >
> > +--recurse-submodules::
> > +       Recursively search in each submodule that has been initialized and
> > +       checked out in the repository.
> > +
> 
> and warn otherwise.

I've been going back and forth on whether to warn the user...maybe
`grep` isn't really the right place for the warning?

> > +
> > +       /*
> > +        * Capture output to output buffer and check the return code from the
> > +        * child process.  A '0' indicates a hit, a '1' indicates no hit and
> > +        * anything else is an error.
> > +        */
> > +       status = capture_command(&cp, &w->out, 0);
> > +       if (status && (status != 1))
> 
> Does the user have enough information what went wrong?
> Is the child verbose enough, such that we do not need to give a
> die[_errno]("submodule process failed") ?

Good point. The output from the child is stored in a buffer and won't
actually get printed if this fails out.  Perhaps we should flush the
buffer and then die?
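
A rough sketch of that idea, assuming the convention from the comment
above (0 is a hit, 1 is no hit, anything else is an error).  The
function name and its parameters are made up for illustration, and
fprintf() plus exit(128) merely stand in for die():

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not code from this series.  'status' is the exit
 * code of the child grep, 'out'/'len' the output captured from it.
 * Returns 1 on a hit and 0 on no hit.  On any other status the captured
 * output is flushed first so the user can see what the child printed,
 * then the process exits with an error.
 */
static int finish_submodule_grep(int status, const char *out, size_t len,
				 const char *name)
{
	assert(name != NULL);

	if (status != 0 && status != 1) {
		if (len)
			fwrite(out, 1, len, stdout);
		fflush(stdout);
		fprintf(stderr,
			"fatal: process for submodule '%s' failed with exit code: %d\n",
			name, status);
		exit(128); /* stand-in for die() */
	}

	/* invert the exit code so that a hit becomes 1 */
	return !status;
}
```

Flushing before exiting keeps whatever the child managed to print
instead of silently dropping it.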

> > +               if (S_ISREG(ce->ce_mode) &&
> > +                   match_pathspec(pathspec, name.buf, name.len, 0, NULL,
> > +                                  S_ISDIR(ce->ce_mode) ||
> > +                                  S_ISGITLINK(ce->ce_mode))) {
> 
> Why do we have to pass the ISDIR and ISGITLINK here for the regular file
> case? ce_path_match and match_pathspec are doing the same thing?

I was simply doing what ce_path_match was doing.  And I needed to switch
to using match_pathspec instead because ce_path_match doesn't allow for
checking the super_prefix as part of the pathspec logic...Perhaps a
refactor (in the future) in the pathspec logic could do that via a flag?

> > +                          submodule_path_match(pathspec, name.buf, NULL)) {
> > +                       hit |= grep_submodule(opt, NULL, ce->name, ce->name);
> 
> What is the difference between the last two parameters?

Path and file name, in the cached case they are the same.

> > + * filename: name of the submodule including tree name of parent
> > + * path: location of the submodule
> 
> > That sounds the same to me.

They are similar.  path should be used as the directory to chdir into
for the child process, and it doesn't have the tree name prefixed to it.
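
To illustrate the distinction, here is a minimal sketch.  It is a
hypothetical helper, not code from this series, and it assumes the tree
name is separated from the path by the first ':' in the name, as in
"HEAD^:sub":

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given a name such as "HEAD^:sub", return the
 * part after the first ':' ("sub").  If there is no ':' (the working
 * tree or cached case) the name is returned unchanged.  When
 * 'prefix_len' is non-NULL it receives the length of the tree-name
 * prefix, or 0 if there is none.
 */
static const char *strip_tree_name(const char *name, size_t *prefix_len)
{
	const char *colon;

	assert(name != NULL);

	colon = strchr(name, ':');
	if (prefix_len)
		*prefix_len = colon ? (size_t)(colon - name) : 0;
	return colon ? colon + 1 : name;
}
```

With "HEAD^:sub" this yields "sub" and a prefix length of 5; with a
plain "sub" the name comes back unchanged and the prefix length is 0.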

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v2 1/6] submodules: add helper functions to determine presence of submodules
  2016-10-31 22:38   ` [PATCH v2 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
  2016-10-31 23:34     ` Stefan Beller
@ 2016-11-05  2:34     ` Jonathan Tan
  1 sibling, 0 replies; 126+ messages in thread
From: Jonathan Tan @ 2016-11-05  2:34 UTC (permalink / raw)
  To: Brandon Williams, git, sbeller

On 10/31/2016 03:38 PM, Brandon Williams wrote:
> +		struct strbuf buf = STRBUF_INIT;
> +		char *submodule_url = NULL;
> +
> +		strbuf_addf(&buf, "submodule.%s.url", module->name);
> +		ret = !git_config_get_string(buf.buf, &submodule_url);
> +
> +		free(submodule_url);
> +		strbuf_release(&buf);
> +	}
> +
> +	return ret;
> +}
> +
> +/*
> + * Determine if a submodule has been checked out at a given 'path'
> + */
> +int is_submodule_checked_out(const char *path)
> +{
> +	int ret = 0;
> +	struct strbuf buf = STRBUF_INIT;
> +
> +	strbuf_addf(&buf, "%s/.git", path);
> +	ret = file_exists(buf.buf);
> +
> +	strbuf_release(&buf);

In this and the previous function, you can use xstrfmt.
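
For readers unfamiliar with the helper: the idea is a printf-style
function that returns a freshly allocated string, which replaces the
strbuf init/addf/release dance with one allocation and one free.  The
sketch below is a standalone stand-in written for illustration (the
name fmt_alloc is made up); it is not Git's implementation, and unlike
it this version returns NULL on failure rather than dying:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Illustrative stand-in for an xstrfmt-like helper: format into a
 * newly malloc'ed buffer that the caller must free().  Returns NULL if
 * formatting or allocation fails.
 */
static char *fmt_alloc(const char *fmt, ...)
{
	va_list ap, ap_copy;
	int len;
	char *buf;

	assert(fmt != NULL);

	va_start(ap, fmt);
	va_copy(ap_copy, ap);
	/* first pass: measure the formatted length */
	len = vsnprintf(NULL, 0, fmt, ap_copy);
	va_end(ap_copy);
	if (len < 0) {
		va_end(ap);
		return NULL;
	}

	buf = malloc((size_t)len + 1);
	if (buf)
		/* second pass: write the string, including the NUL */
		vsnprintf(buf, (size_t)len + 1, fmt, ap);
	va_end(ap);
	return buf;
}
```

A caller would then write something like
char *gitdir = fmt_alloc("%s/.git", path); and free(gitdir) when done.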

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/5] grep: optionally recurse into submodules
  2016-10-27 22:38 ` [PATCH 4/5] grep: optionally recurse into submodules Brandon Williams
@ 2016-11-05  5:09   ` Jonathan Tan
  0 siblings, 0 replies; 126+ messages in thread
From: Jonathan Tan @ 2016-11-05  5:09 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git

On Thu, Oct 27, 2016 at 3:38 PM, Brandon Williams <bmwill@google.com> wrote:
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 8887b6add..f34f16df9 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -18,12 +18,20 @@
>  #include "quote.h"
>  #include "dir.h"
>  #include "pathspec.h"
> +#include "submodule.h"
>
>  static char const * const grep_usage[] = {
>         N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"),
>         NULL
>  };
>
> +static const char *super_prefix;

I think that the super_prefix changes could be in its own patch.

> +static int recurse_submodules;
> +static struct argv_array submodule_options = ARGV_ARRAY_INIT;

I guess this has to be static because it is shared by multiple threads.

> +
> +static int grep_submodule_launch(struct grep_opt *opt,
> +                                const struct grep_source *gs);
> +
>  #define GREP_NUM_THREADS_DEFAULT 8
>  static int num_threads;
>
> @@ -174,7 +182,10 @@ static void *run(void *arg)
>                         break;
>
>                 opt->output_priv = w;
> -               hit |= grep_source(opt, &w->source);
> +               if (w->source.type == GREP_SOURCE_SUBMODULE)
> +                       hit |= grep_submodule_launch(opt, &w->source);
> +               else
> +                       hit |= grep_source(opt, &w->source);

It seems to me that GREP_SOURCE_SUBMODULE is of a different nature
than the other GREP_SOURCE_.* - in struct work_item, could we instead
have another variable that distinguishes between submodules and
"native" sources? This might also assuage Junio's concerns in
<xmqq37jbqf83.fsf@gitster.mtv.corp.google.com> about the nature of the
sources.

That variable could also be the discriminant for a tagged union, such
that we have "struct grep_source" for the "native" sources and a new
struct (holding only submodule-relevant information) for the
submodule.
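
Roughly what I have in mind, as a sketch only.  All names below are
hypothetical, the two structs are simplified stand-ins for the real
ones, and the two worker functions are stubs that return fixed values
so the dispatch is visible:

```c
#include <assert.h>
#include <stddef.h>

/* discriminant: which member of the union is valid */
enum work_kind { WORK_SOURCE, WORK_SUBMODULE };

/* simplified stand-in for struct grep_source */
struct native_source {
	const char *name;
};

/* holds only submodule-relevant information */
struct submodule_source {
	const char *path;
	const unsigned char *sha1; /* NULL means "use the working tree" */
};

struct work_item_sketch {
	enum work_kind kind;
	union {
		struct native_source source;
		struct submodule_source submodule;
	} u;
};

/* stub: would grep a file, blob or buffer in this process */
static int grep_native(const struct native_source *s)
{
	assert(s != NULL);
	return 1;
}

/* stub: would launch a child process inside the submodule */
static int launch_submodule(const struct submodule_source *s)
{
	assert(s != NULL);
	return 2;
}

/* single place where a work item is dispatched on its kind */
static int run_work_item(const struct work_item_sketch *w)
{
	assert(w != NULL);

	switch (w->kind) {
	case WORK_SOURCE:
		return grep_native(&w->u.source);
	case WORK_SUBMODULE:
		return launch_submodule(&w->u.submodule);
	}
	return -1; /* unknown kind */
}
```

This keeps the set of grep_source types limited to things that can
actually be loaded and searched in-process, with the submodule case
handled one level up.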

> +/*
> + * Prep grep structures for a submodule grep
> + * sha1: the sha1 of the submodule or NULL if using the working tree
> + * filename: name of the submodule including tree name of parent
> + * path: location of the submodule
> + */
> +static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1,
> +                         const char *filename, const char *path)
> +{
> +       if (!(is_submodule_initialized(path) &&
> +             is_submodule_checked_out(path))) {
> +               warning("skiping submodule '%s%s' since it is not initialized and checked out",
> +                       super_prefix ? super_prefix: "",
> +                       path);
> +               return 0;
> +       }
> +
> +#ifndef NO_PTHREADS
> +       if (num_threads) {
> +               add_work(opt, GREP_SOURCE_SUBMODULE, filename, path, sha1);
> +               return 0;
> +       } else
> +#endif
> +       {
> +               struct work_item w;
> +               int hit;
> +
> +               grep_source_init(&w.source, GREP_SOURCE_SUBMODULE,
> +                                filename, path, sha1);
> +               strbuf_init(&w.out, 0);
> +               opt->output_priv = &w;
> +               hit = grep_submodule_launch(opt, &w.source);
> +
> +               write_or_die(1, w.out.buf, w.out.len);
> +
> +               grep_source_clear(&w.source);
> +               strbuf_release(&w.out);
> +               return hit;
> +       }

This is at least the third invocation of this "if pthreads, add work,
otherwise do it now" pattern - could this be extracted into its own
function (in another patch)? Ideally, there would also be exactly one
function in which the grep_source.* functions are invoked, and both
"run" and the non-pthread code path can use it.

> +}
> +
> +static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec,
> +                     int cached)

This line isn't modified other than the line break, as far as I can
tell, so I wouldn't break it.

> diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
> new file mode 100755
> index 000000000..b670c70cb
> --- /dev/null
> +++ b/t/t7814-grep-recurse-submodules.sh
> @@ -0,0 +1,99 @@
> +#!/bin/sh
> +
> +test_description='Test grep recurse-submodules feature
> +
> +This test verifies the recurse-submodules feature correctly greps across
> +submodules.
> +'
> +
> +. ./test-lib.sh
> +

Would it be possible to also test it while num_threads is zero? (Or,
if num_threads is already zero, to test it while it is not zero?)

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v2 1/6] submodules: add helper functions to determine presence of submodules
  2016-11-01 17:31         ` Stefan Beller
@ 2016-11-06  7:42           ` Jacob Keller
  0 siblings, 0 replies; 126+ messages in thread
From: Jacob Keller @ 2016-11-06  7:42 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, Brandon Williams, git@vger.kernel.org

On Tue, Nov 1, 2016 at 10:31 AM, Stefan Beller <sbeller@google.com> wrote:
> On Tue, Nov 1, 2016 at 10:20 AM, Junio C Hamano <gitster@pobox.com> wrote:
>
>>
>> Maybe I am old fashioned, but I'd feel better to see these with
>> explicit "extern" in front (check the older header files like
>> cache.h when you are in doubt what the project convention has been).
>
> I did check the other files and saw them, so I was very unsure what to
> suggest here. I only saw the extern keyword used in headers that were
> there when Git was really young, so I assumed it's a style nit by kernel
> developers. Thanks for clarifying!
>
> I think we'll want to have some consistency though, so we
> maybe want to coordinate a cleanup of submodule.h as well as
> submodule-config.h to mark all the functions extern.
>
> This doesn't need to be an all-at-once thing, but we'd keep it in mind
> for future declarations in the header.
>
> Thanks,
> Stefan

Extern is generally used when you want to declare a header for a
function that's in a different object file. I'm not sure if we
actually need it or not though.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v2 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-10-31 22:38   ` [PATCH v2 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
@ 2016-11-11 23:09     ` Jonathan Tan
  0 siblings, 0 replies; 126+ messages in thread
From: Jonathan Tan @ 2016-11-11 23:09 UTC (permalink / raw)
  To: Brandon Williams, git, sbeller

On 10/31/2016 03:38 PM, Brandon Williams wrote:
> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> index 17aa1ba..386a868 100644
> --- a/Documentation/git-grep.txt
> +++ b/Documentation/git-grep.txt
> @@ -26,7 +26,7 @@ SYNOPSIS
>  	   [--threads <num>]
>  	   [-f <file>] [-e] <pattern>
>  	   [--and|--or|--not|(|)|-e <pattern>...]
> -	   [--recurse-submodules]
> +	   [--recurse-submodules] [--parent-basename]

Maybe add something after --parent-basename, since it takes an argument 
(like --threads above).

> @@ -91,7 +91,16 @@ OPTIONS
>
>  --recurse-submodules::
>  	Recursively search in each submodule that has been initialized and
> -	checked out in the repository.
> +	checked out in the repository.  When used in combination with the
> +	<tree> option the prefix of all submodule output will be the name of
> +	the parent project's <tree> object.
> +
> +--parent-basename::

Same comment as above.

> diff --git a/builtin/grep.c b/builtin/grep.c
> index cf4f51e..2f10930 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -19,6 +19,7 @@
>  #include "dir.h"
>  #include "pathspec.h"
>  #include "submodule.h"
> +#include "submodule-config.h"
>
>  static char const * const grep_usage[] = {
>  	N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"),
> @@ -28,6 +29,7 @@ static char const * const grep_usage[] = {
>  static const char *super_prefix;
>  static int recurse_submodules;
>  static struct argv_array submodule_options = ARGV_ARRAY_INIT;
> +static const char *parent_basename;

Can this be passed as an argument to the functions (grep_objects and 
grep_object, it seems) instead of having a file-visible variable?

> @@ -671,12 +707,29 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
>  	enum interesting match = entry_not_interesting;
>  	struct name_entry entry;
>  	int old_baselen = base->len;
> +	struct strbuf name = STRBUF_INIT;
> +	int name_base_len = 0;
> +	if (super_prefix) {
> +		name_base_len = strlen(super_prefix);
> +		strbuf_addstr(&name, super_prefix);

Better to invoke strbuf_addstr, and then set name_base_len from 
name.len. This makes it clear where strbuf_setlen (subsequently) resets 
the strbuf to, and is also a slight performance improvement.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* [PATCH v3 0/6] recursively grep across submodules
  2016-10-31 22:38 ` [PATCH v2 0/6] " Brandon Williams
                     ` (5 preceding siblings ...)
  2016-10-31 22:38   ` [PATCH v2 6/6] grep: search history of moved submodules Brandon Williams
@ 2016-11-11 23:51   ` Brandon Williams
  2016-11-11 23:51     ` [PATCH v3 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
                       ` (7 more replies)
  6 siblings, 8 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-11 23:51 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

Most of the changes between v2 and v3 of this series address the few
reviewer comments:
* use 'path' as the directory to cd into for the child process and use 'name'
  for the super_prefix
* flush output from the child process on error and print a more useful
  error message
* rename is_submodule_checked_out to is_submodule_populated
* stop GIT_DIR from being set to '.git' for a submodule child process
* add a test for searching the history of a submodule which had been moved

The series as a whole probably needs a few more rounds of
review so any additional input would be appreciated!  

Thanks!
Brandon

Brandon Williams (6):
  submodules: add helper functions to determine presence of submodules
  submodules: load gitmodules file from commit sha1
  grep: add submodules as a grep source type
  grep: optionally recurse into submodules
  grep: enable recurse-submodules to work on <tree> objects
  grep: search history of moved submodules

 Documentation/git-grep.txt         |  14 ++
 builtin/grep.c                     | 391 ++++++++++++++++++++++++++++++++++---
 cache.h                            |   2 +
 config.c                           |   8 +-
 git.c                              |   2 +-
 grep.c                             |  16 +-
 grep.h                             |   1 +
 submodule-config.c                 |   6 +-
 submodule-config.h                 |   3 +
 submodule.c                        |  50 +++++
 submodule.h                        |   3 +
 t/t7814-grep-recurse-submodules.sh | 182 +++++++++++++++++
 12 files changed, 646 insertions(+), 32 deletions(-)
 create mode 100755 t/t7814-grep-recurse-submodules.sh

-- interdiff based on 'bw/grep-recurse-submodules'

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 386a868..71f32f3 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -26,7 +26,7 @@ SYNOPSIS
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
-	   [--recurse-submodules] [--parent-basename]
+	   [--recurse-submodules] [--parent-basename <basename>]
 	   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...]
 	   [--] [<pathspec>...]
 
@@ -95,7 +95,7 @@ OPTIONS
 	<tree> option the prefix of all submodule output will be the name of
 	the parent project's <tree> object.
 
---parent-basename::
+--parent-basename <basename>::
 	For internal use only.  In order to produce uniform output with the
 	--recurse-submodules option, this option can be used to provide the
 	basename of a parent's <tree> object to a submodule so the submodule
diff --git a/builtin/grep.c b/builtin/grep.c
index bdf1b90..1879432 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -538,14 +538,22 @@ static int grep_submodule_launch(struct grep_opt *opt,
 	struct child_process cp = CHILD_PROCESS_INIT;
 	int status, i;
 	const char *end_of_base;
+	const char *name;
 	struct work_item *w = opt->output_priv;
 
+	end_of_base = strchr(gs->name, ':');
+	if (end_of_base)
+		name = end_of_base + 1;
+	else
+		name = gs->name;
+
 	prepare_submodule_repo_env(&cp.env_array);
+	argv_array_push(&cp.env_array, GIT_DIR_ENVIRONMENT);
 
 	/* Add super prefix */
 	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
-			 super_prefix ? super_prefix: "",
-			 gs->path);
+			 super_prefix ? super_prefix : "",
+			 name);
 	argv_array_push(&cp.args, "grep");
 
 	/*
@@ -556,7 +564,6 @@ static int grep_submodule_launch(struct grep_opt *opt,
 	 * parent project's object name to the submodule so the submodule can
 	 * prefix its output with the parent's name and not its own SHA1.
 	 */
-	end_of_base = strchr(gs->name, ':');
 	if (end_of_base)
 		argv_array_pushf(&cp.args, "--parent-basename=%.*s",
 				 (int) (end_of_base - gs->name),
@@ -588,8 +595,12 @@ static int grep_submodule_launch(struct grep_opt *opt,
 	 * anything else is an error.
 	 */
 	status = capture_command(&cp, &w->out, 0);
-	if (status && (status != 1))
-		exit(status);
+	if (status && (status != 1)) {
+		/* flush the buffer */
+		write_or_die(1, w->out.buf, w->out.len);
+		die("process for submodule '%s' failed with exit code: %d",
+		    gs->name, status);
+	}
 
 	/* invert the return code to make a hit equal to 1 */
 	return !status;
@@ -605,11 +616,20 @@ static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1,
 			  const char *filename, const char *path)
 {
 	if (!(is_submodule_initialized(path) &&
-	      is_submodule_checked_out(path))) {
-		warning("skiping submodule '%s%s' since it is not initialized and checked out",
-			super_prefix ? super_prefix: "",
-			path);
-		return 0;
+	      is_submodule_populated(path))) {
+		/*
+		 * If searching history, check for the presence of the
+		 * submodule's gitdir before skipping the submodule.
+		 */
+		if (sha1) {
+			path = git_path("modules/%s",
+					submodule_from_path(null_sha1, path)->name);
+
+			if (!(is_directory(path) && is_git_directory(path)))
+				return 0;
+		} else {
+			return 0;
+		}
 	}
 
 #ifndef NO_PTHREADS
@@ -670,8 +690,7 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec,
 					continue;
 				hit |= grep_sha1(opt, ce->oid.hash, ce->name,
 						 0, ce->name);
-			}
-			else {
+			} else {
 				hit |= grep_file(opt, ce->name);
 			}
 		} else if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
@@ -705,8 +724,8 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 	struct strbuf name = STRBUF_INIT;
 	int name_base_len = 0;
 	if (super_prefix) {
-		name_base_len = strlen(super_prefix);
 		strbuf_addstr(&name, super_prefix);
+		name_base_len = name.len;
 	}
 
 	while (tree_entry(tree, &entry)) {
@@ -715,8 +734,16 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 		if (match != all_entries_interesting) {
 			strbuf_setlen(&name, name_base_len);
 			strbuf_addstr(&name, base->buf + tn_len);
-			match = tree_entry_interesting(&entry, &name,
-						       0, pathspec);
+
+			if (recurse_submodules && S_ISGITLINK(entry.mode)) {
+				strbuf_addstr(&name, entry.path);
+				match = submodule_path_match(pathspec, name.buf,
+							     NULL);
+			} else {
+				match = tree_entry_interesting(&entry, &name,
+							       0, pathspec);
+			}
+
 			if (match == all_entries_not_interesting)
 				break;
 			if (match == entry_not_interesting)
diff --git a/submodule.c b/submodule.c
index f2a5668..062e58b 100644
--- a/submodule.c
+++ b/submodule.c
@@ -221,31 +221,30 @@ int is_submodule_initialized(const char *path)
 	module = submodule_from_path(null_sha1, path);
 
 	if (module) {
-		struct strbuf buf = STRBUF_INIT;
-		char *submodule_url = NULL;
+		char *key = xstrfmt("submodule.%s.url", module->name);
+		char *value = NULL;
 
-		strbuf_addf(&buf, "submodule.%s.url",module->name);
-		ret = !git_config_get_string(buf.buf, &submodule_url);
+		ret = !git_config_get_string(key, &value);
 
-		free(submodule_url);
-		strbuf_release(&buf);
+		free(value);
+		free(key);
 	}
 
 	return ret;
 }
 
 /*
- * Determine if a submodule has been checked out at a given 'path'
+ * Determine if a submodule has been populated at a given 'path'
  */
-int is_submodule_checked_out(const char *path)
+int is_submodule_populated(const char *path)
 {
 	int ret = 0;
-	struct strbuf buf = STRBUF_INIT;
+	char *gitdir = xstrfmt("%s/.git", path);
 
-	strbuf_addf(&buf, "%s/.git", path);
-	ret = file_exists(buf.buf);
+	if (resolve_gitdir(gitdir))
+		ret = 1;
 
-	strbuf_release(&buf);
+	free(gitdir);
 	return ret;
 }
 
diff --git a/submodule.h b/submodule.h
index 9a24ac8..9203d89 100644
--- a/submodule.h
+++ b/submodule.h
@@ -39,7 +39,7 @@ int submodule_config(const char *var, const char *value, void *cb);
 void gitmodules_config(void);
 extern void gitmodules_config_sha1(const unsigned char *commit_sha1);
 extern int is_submodule_initialized(const char *path);
-extern int is_submodule_checked_out(const char *path);
+extern int is_submodule_populated(const char *path);
 int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst);
 const char *submodule_strategy_to_string(const struct submodule_update_strategy *s);
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index 3d1892d..ee173ad 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -127,6 +127,47 @@ test_expect_success 'grep tree and pathspecs' '
 	test_cmp expect actual
 '
 
+test_expect_success 'grep history with moved submodules' '
+	git init parent &&
+	echo "foobar" >parent/file &&
+	git -C parent add file &&
+	git -C parent commit -m "add file" &&
+
+	git init sub &&
+	echo "foobar" >sub/file &&
+	git -C sub add file &&
+	git -C sub commit -m "add file" &&
+
+	git -C parent submodule add ../sub &&
+	git -C parent commit -m "add submodule" &&
+
+	cat >expect <<-\EOF &&
+	file:foobar
+	sub/file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules > actual &&
+	test_cmp expect actual &&
+
+	git -C parent mv sub sub-moved &&
+	git -C parent commit -m "moved submodule" &&
+
+	cat >expect <<-\EOF &&
+	file:foobar
+	sub-moved/file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules > actual &&
+	test_cmp expect actual &&
+
+	cat >expect <<-\EOF &&
+	HEAD^:file:foobar
+	HEAD^:sub/file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules HEAD^ > actual &&
+	test_cmp expect actual &&
+
+	rm -rf parent sub
+'
+
 test_incompatible_with_recurse_submodules ()
 {
 	test_expect_success "--recurse-submodules and $1 are incompatible" "
diff --git a/tree-walk.c b/tree-walk.c
index b3f9961..828f435 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -999,11 +999,10 @@ static enum interesting do_match(const struct name_entry *entry,
 					return entry_interesting;
 
 				/*
-				 * Match all directories and gitlinks. We'll
-				 * try to match files later on.
+				 * Match all directories. We'll try to
+				 * match files later on.
 				 */
-				if (ps->recursive && (S_ISDIR(entry->mode) ||
-						      S_ISGITLINK(entry->mode)))
+				if (ps->recursive && S_ISDIR(entry->mode))
 					return entry_interesting;
 			}
 
@@ -1044,13 +1043,13 @@ static enum interesting do_match(const struct name_entry *entry,
 		strbuf_setlen(base, base_offset + baselen);
 
 		/*
-		 * Match all directories and gitlinks. We'll try to match files
-		 * later on.  max_depth is ignored but we may consider support
-		 * it in future, see
+		 * Match all directories. We'll try to match files
+		 * later on.
+		 * max_depth is ignored but we may consider support it
+		 * in future, see
 		 * http://thread.gmane.org/gmane.comp.version-control.git/163757/focus=163840
 		 */
-		if (ps->recursive && (S_ISDIR(entry->mode) ||
-				      S_ISGITLINK(entry->mode)))
+		if (ps->recursive && S_ISDIR(entry->mode))
 			return entry_interesting;
 	}
 	return never_interesting; /* No matches */

-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH v3 1/6] submodules: add helper functions to determine presence of submodules
  2016-11-11 23:51   ` [PATCH v3 0/6] recursively grep across submodules Brandon Williams
@ 2016-11-11 23:51     ` Brandon Williams
  2016-11-15 23:49       ` Stefan Beller
  2016-11-11 23:51     ` [PATCH v3 2/6] submodules: load gitmodules file from commit sha1 Brandon Williams
                       ` (6 subsequent siblings)
  7 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-11-11 23:51 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

Add two helper functions to submodule.c.
`is_submodule_initialized()` checks if a submodule has been initialized
at a given path and `is_submodule_populated()` checks if a submodule
has been checked out at a given path.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 submodule.c | 38 ++++++++++++++++++++++++++++++++++++++
 submodule.h |  2 ++
 2 files changed, 40 insertions(+)

diff --git a/submodule.c b/submodule.c
index 6f7d883..f5107f0 100644
--- a/submodule.c
+++ b/submodule.c
@@ -198,6 +198,44 @@ void gitmodules_config(void)
 	}
 }
 
+/*
+ * Determine if a submodule has been initialized at a given 'path'
+ */
+int is_submodule_initialized(const char *path)
+{
+	int ret = 0;
+	const struct submodule *module = NULL;
+
+	module = submodule_from_path(null_sha1, path);
+
+	if (module) {
+		char *key = xstrfmt("submodule.%s.url", module->name);
+		char *value = NULL;
+
+		ret = !git_config_get_string(key, &value);
+
+		free(value);
+		free(key);
+	}
+
+	return ret;
+}
+
+/*
+ * Determine if a submodule has been populated at a given 'path'
+ */
+int is_submodule_populated(const char *path)
+{
+	int ret = 0;
+	char *gitdir = xstrfmt("%s/.git", path);
+
+	if (resolve_gitdir(gitdir))
+		ret = 1;
+
+	free(gitdir);
+	return ret;
+}
+
 int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst)
 {
diff --git a/submodule.h b/submodule.h
index d9e197a..6ec5f2f 100644
--- a/submodule.h
+++ b/submodule.h
@@ -37,6 +37,8 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
 		const char *path);
 int submodule_config(const char *var, const char *value, void *cb);
 void gitmodules_config(void);
+extern int is_submodule_initialized(const char *path);
+extern int is_submodule_populated(const char *path);
 int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst);
 const char *submodule_strategy_to_string(const struct submodule_update_strategy *s);
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v3 2/6] submodules: load gitmodules file from commit sha1
  2016-11-11 23:51   ` [PATCH v3 0/6] recursively grep across submodules Brandon Williams
  2016-11-11 23:51     ` [PATCH v3 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
@ 2016-11-11 23:51     ` Brandon Williams
  2016-11-12  0:22       ` Stefan Beller
  2016-11-11 23:51     ` [PATCH v3 3/6] grep: add submodules as a grep source type Brandon Williams
                       ` (5 subsequent siblings)
  7 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-11-11 23:51 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

Teach submodules to load a '.gitmodules' file from a commit sha1.  This
enables the population of the submodule_cache to be based on the state
of the '.gitmodules' file from a particular commit.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 cache.h            |  2 ++
 config.c           |  8 ++++----
 submodule-config.c |  6 +++---
 submodule-config.h |  3 +++
 submodule.c        | 12 ++++++++++++
 submodule.h        |  1 +
 6 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/cache.h b/cache.h
index 1be6526..559a461 100644
--- a/cache.h
+++ b/cache.h
@@ -1690,6 +1690,8 @@ extern int git_default_config(const char *, const char *, void *);
 extern int git_config_from_file(config_fn_t fn, const char *, void *);
 extern int git_config_from_mem(config_fn_t fn, const enum config_origin_type,
 					const char *name, const char *buf, size_t len, void *data);
+extern int git_config_from_blob_sha1(config_fn_t fn, const char *name,
+				     const unsigned char *sha1, void *data);
 extern void git_config_push_parameter(const char *text);
 extern int git_config_from_parameters(config_fn_t fn, void *data);
 extern void git_config(config_fn_t fn, void *);
diff --git a/config.c b/config.c
index 83fdecb..4d78e72 100644
--- a/config.c
+++ b/config.c
@@ -1214,10 +1214,10 @@ int git_config_from_mem(config_fn_t fn, const enum config_origin_type origin_typ
 	return do_config_from(&top, fn, data);
 }
 
-static int git_config_from_blob_sha1(config_fn_t fn,
-				     const char *name,
-				     const unsigned char *sha1,
-				     void *data)
+int git_config_from_blob_sha1(config_fn_t fn,
+			      const char *name,
+			      const unsigned char *sha1,
+			      void *data)
 {
 	enum object_type type;
 	char *buf;
diff --git a/submodule-config.c b/submodule-config.c
index 098085b..8b9a2ef 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -379,9 +379,9 @@ static int parse_config(const char *var, const char *value, void *data)
 	return ret;
 }
 
-static int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
-				      unsigned char *gitmodules_sha1,
-				      struct strbuf *rev)
+int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
+			       unsigned char *gitmodules_sha1,
+			       struct strbuf *rev)
 {
 	int ret = 0;
 
diff --git a/submodule-config.h b/submodule-config.h
index d05c542..78584ba 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -29,6 +29,9 @@ const struct submodule *submodule_from_name(const unsigned char *commit_sha1,
 		const char *name);
 const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
 		const char *path);
+extern int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
+				      unsigned char *gitmodules_sha1,
+				      struct strbuf *rev);
 void submodule_free(void);
 
 #endif /* SUBMODULE_CONFIG_H */
diff --git a/submodule.c b/submodule.c
index f5107f0..062e58b 100644
--- a/submodule.c
+++ b/submodule.c
@@ -198,6 +198,18 @@ void gitmodules_config(void)
 	}
 }
 
+void gitmodules_config_sha1(const unsigned char *commit_sha1)
+{
+	struct strbuf rev = STRBUF_INIT;
+	unsigned char sha1[20];
+
+	if (gitmodule_sha1_from_commit(commit_sha1, sha1, &rev)) {
+		git_config_from_blob_sha1(submodule_config, rev.buf,
+					  sha1, NULL);
+	}
+	strbuf_release(&rev);
+}
+
 /*
  * Determine if a submodule has been initialized at a given 'path'
  */
diff --git a/submodule.h b/submodule.h
index 6ec5f2f..9203d89 100644
--- a/submodule.h
+++ b/submodule.h
@@ -37,6 +37,7 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
 		const char *path);
 int submodule_config(const char *var, const char *value, void *cb);
 void gitmodules_config(void);
+extern void gitmodules_config_sha1(const unsigned char *commit_sha1);
 extern int is_submodule_initialized(const char *path);
 extern int is_submodule_populated(const char *path);
 int parse_submodule_update_strategy(const char *value,
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v3 3/6] grep: add submodules as a grep source type
  2016-11-11 23:51   ` [PATCH v3 0/6] recursively grep across submodules Brandon Williams
  2016-11-11 23:51     ` [PATCH v3 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
  2016-11-11 23:51     ` [PATCH v3 2/6] submodules: load gitmodules file from commit sha1 Brandon Williams
@ 2016-11-11 23:51     ` Brandon Williams
  2016-11-11 23:51     ` [PATCH v3 4/6] grep: optionally recurse into submodules Brandon Williams
                       ` (4 subsequent siblings)
  7 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-11 23:51 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

Add `GREP_SOURCE_SUBMODULE` as a grep_source type and cases for this new
type in the various switch statements in grep.c.

When initializing a grep_source with type `GREP_SOURCE_SUBMODULE` the
identifier can either be NULL (to indicate that the working tree will be
used) or a SHA1 (the REV of the submodule to be grep'd).  If the
identifier is a SHA1 then we want to fall through to the
`GREP_SOURCE_SHA1` case to handle the copying of the SHA1.
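
For readers who want to see the NULL-or-SHA1 rule in isolation, here is a
minimal standalone sketch of the same switch shape.  It is not the code from
grep.c: every `sketch_*`/`SKETCH_*` name is hypothetical, and plain
malloc()/memcpy() stand in for git's xmalloc()/hashcpy().

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_SHA1_LEN 20

/* Hypothetical stand-ins for the grep_source types in grep.h. */
enum sketch_source_type {
	SKETCH_SHA1,
	SKETCH_FILE,
	SKETCH_BUF,
	SKETCH_SUBMODULE
};

struct sketch_source {
	enum sketch_source_type type;
	void *identifier;
};

/*
 * Initialize 'gs'.  For SKETCH_SUBMODULE a NULL identifier means "use the
 * working tree"; a non-NULL identifier is treated as a 20-byte SHA1 and is
 * copied by falling through to the SKETCH_SHA1 case.
 * Returns 0 on success, -1 on allocation failure.
 */
int sketch_source_init(struct sketch_source *gs,
		       enum sketch_source_type type,
		       const void *identifier)
{
	gs->type = type;
	gs->identifier = NULL;

	switch (type) {
	case SKETCH_FILE: {
		size_t len = strlen((const char *)identifier) + 1;

		gs->identifier = malloc(len);
		if (!gs->identifier)
			return -1;
		memcpy(gs->identifier, identifier, len);
		break;
	}
	case SKETCH_SUBMODULE:
		if (!identifier)
			break;
		/* FALL THROUGH */
	case SKETCH_SHA1:
		gs->identifier = malloc(SKETCH_SHA1_LEN);
		if (!gs->identifier)
			return -1;
		memcpy(gs->identifier, identifier, SKETCH_SHA1_LEN);
		break;
	case SKETCH_BUF:
		break;
	}
	return 0;
}
```

The caller owns `identifier` afterwards and should free() it; a NULL
identifier needs no cleanup.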

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 grep.c | 16 +++++++++++++++-
 grep.h |  1 +
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 1194d35..0dbdc1d 100644
--- a/grep.c
+++ b/grep.c
@@ -1735,12 +1735,23 @@ void grep_source_init(struct grep_source *gs, enum grep_source_type type,
 	case GREP_SOURCE_FILE:
 		gs->identifier = xstrdup(identifier);
 		break;
+	case GREP_SOURCE_SUBMODULE:
+		if (!identifier) {
+			gs->identifier = NULL;
+			break;
+		}
+		/*
+		 * FALL THROUGH
+		 * If the identifier is non-NULL (in the submodule case) it
+		 * will be a SHA1 that needs to be copied.
+		 */
 	case GREP_SOURCE_SHA1:
 		gs->identifier = xmalloc(20);
 		hashcpy(gs->identifier, identifier);
 		break;
 	case GREP_SOURCE_BUF:
 		gs->identifier = NULL;
+		break;
 	}
 }
 
@@ -1760,6 +1771,7 @@ void grep_source_clear_data(struct grep_source *gs)
 	switch (gs->type) {
 	case GREP_SOURCE_FILE:
 	case GREP_SOURCE_SHA1:
+	case GREP_SOURCE_SUBMODULE:
 		free(gs->buf);
 		gs->buf = NULL;
 		gs->size = 0;
@@ -1831,8 +1843,10 @@ static int grep_source_load(struct grep_source *gs)
 		return grep_source_load_sha1(gs);
 	case GREP_SOURCE_BUF:
 		return gs->buf ? 0 : -1;
+	case GREP_SOURCE_SUBMODULE:
+		break;
 	}
-	die("BUG: invalid grep_source type");
+	die("BUG: invalid grep_source type to load");
 }
 
 void grep_source_load_driver(struct grep_source *gs)
diff --git a/grep.h b/grep.h
index 5856a23..267534c 100644
--- a/grep.h
+++ b/grep.h
@@ -161,6 +161,7 @@ struct grep_source {
 		GREP_SOURCE_SHA1,
 		GREP_SOURCE_FILE,
 		GREP_SOURCE_BUF,
+		GREP_SOURCE_SUBMODULE,
 	} type;
 	void *identifier;
 
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v3 4/6] grep: optionally recurse into submodules
  2016-11-11 23:51   ` [PATCH v3 0/6] recursively grep across submodules Brandon Williams
                       ` (2 preceding siblings ...)
  2016-11-11 23:51     ` [PATCH v3 3/6] grep: add submodules as a grep source type Brandon Williams
@ 2016-11-11 23:51     ` Brandon Williams
  2016-11-16  0:07       ` Stefan Beller
  2016-11-11 23:51     ` [PATCH v3 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
                       ` (3 subsequent siblings)
  7 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-11-11 23:51 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

Allow grep to recognize submodules and recursively search for patterns in
each submodule.  This is done by forking off a process to recursively
call grep on each submodule.  The top level --super-prefix option is
used to pass a path to the submodule which can in turn be used to
prepend to output or in pathspec matching logic.
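
As a rough illustration of how the prefix accumulates across nesting levels,
the sketch below follows the "%s%s/" format this patch passes to
--super-prefix.  Both helper names are hypothetical, and snprintf() into a
fixed buffer stands in for the argv_array/strbuf calls used in builtin/grep.c.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the super-prefix a child process would be
 * given for the submodule 'submodule_name'.  A NULL 'super_prefix' means
 * we are the top-level process.
 * Returns 0 on success, -1 if 'out' is too small.
 */
int child_super_prefix(char *out, size_t out_size,
		       const char *super_prefix, const char *submodule_name)
{
	int n = snprintf(out, out_size, "%s%s/",
			 super_prefix ? super_prefix : "", submodule_name);

	return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/*
 * Hypothetical helper: prepend the super-prefix (if any) to a path before
 * it is printed, so output is relative to the top-level project.
 * Returns 0 on success, -1 if 'out' is too small.
 */
int prefixed_path(char *out, size_t out_size,
		  const char *super_prefix, const char *path)
{
	int n = snprintf(out, out_size, "%s%s",
			 super_prefix ? super_prefix : "", path);

	return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Chaining the first helper twice ("submodule", then "sub") gives
"submodule/sub/", which is how a file 'a' in the nested submodule ends up
reported as submodule/sub/a.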

Recursion only occurs for submodules which have been initialized and
checked out by the parent project.  If a submodule hasn't been
initialized and checked out it is simply skipped.

In order to support the existing multi-threading infrastructure in grep,
output from each child process is captured in a strbuf so that it can be
later printed to the console in an ordered fashion.

To limit the number of threads that are created, each child process has
half the number of threads as its parent (minimum of 1), otherwise we
potentially have a fork-bomb.
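
A small sketch of the arithmetic, assuming the "(num_threads + 1) / 2"
expression from compile_submodule_options() is the only thing that decides
the child's thread count.  The function names here are hypothetical and
exist only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper mirroring "(num_threads + 1) / 2": halve the thread
 * count, rounding up, so a threaded parent never hands its child fewer
 * than one thread.
 */
int child_thread_count(int parent_threads)
{
	assert(parent_threads > 0);
	return (parent_threads + 1) / 2;
}

/* Print the thread count at each nesting level, starting at the top. */
void print_thread_budget(int top_level_threads, int depth)
{
	int level, n = top_level_threads;

	for (level = 0; level < depth; level++) {
		printf("level %d: %d thread(s)\n", level, n);
		n = child_thread_count(n);
	}
}
```

Starting from the default of 8 threads the counts go 8, 4, 2, 1 and then
stay at 1 for any deeper nesting.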

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 Documentation/git-grep.txt         |   5 +
 builtin/grep.c                     | 300 ++++++++++++++++++++++++++++++++++---
 git.c                              |   2 +-
 t/t7814-grep-recurse-submodules.sh |  99 ++++++++++++
 4 files changed, 385 insertions(+), 21 deletions(-)
 create mode 100755 t/t7814-grep-recurse-submodules.sh

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 0ecea6e..17aa1ba 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -26,6 +26,7 @@ SYNOPSIS
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
+	   [--recurse-submodules]
 	   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...]
 	   [--] [<pathspec>...]
 
@@ -88,6 +89,10 @@ OPTIONS
 	mechanism.  Only useful when searching files in the current directory
 	with `--no-index`.
 
+--recurse-submodules::
+	Recursively search in each submodule that has been initialized and
+	checked out in the repository.
+
 -a::
 --text::
 	Process binary files as if they were text.
diff --git a/builtin/grep.c b/builtin/grep.c
index 8887b6a..1fd292f 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -18,12 +18,20 @@
 #include "quote.h"
 #include "dir.h"
 #include "pathspec.h"
+#include "submodule.h"
 
 static char const * const grep_usage[] = {
 	N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"),
 	NULL
 };
 
+static const char *super_prefix;
+static int recurse_submodules;
+static struct argv_array submodule_options = ARGV_ARRAY_INIT;
+
+static int grep_submodule_launch(struct grep_opt *opt,
+				 const struct grep_source *gs);
+
 #define GREP_NUM_THREADS_DEFAULT 8
 static int num_threads;
 
@@ -174,7 +182,10 @@ static void *run(void *arg)
 			break;
 
 		opt->output_priv = w;
-		hit |= grep_source(opt, &w->source);
+		if (w->source.type == GREP_SOURCE_SUBMODULE)
+			hit |= grep_submodule_launch(opt, &w->source);
+		else
+			hit |= grep_source(opt, &w->source);
 		grep_source_clear_data(&w->source);
 		work_done(w);
 	}
@@ -300,6 +311,10 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 	if (opt->relative && opt->prefix_length) {
 		quote_path_relative(filename + tree_name_len, opt->prefix, &pathbuf);
 		strbuf_insert(&pathbuf, 0, filename, tree_name_len);
+	} else if (super_prefix) {
+		strbuf_add(&pathbuf, filename, tree_name_len);
+		strbuf_addstr(&pathbuf, super_prefix);
+		strbuf_addstr(&pathbuf, filename + tree_name_len);
 	} else {
 		strbuf_addstr(&pathbuf, filename);
 	}
@@ -328,10 +343,13 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 {
 	struct strbuf buf = STRBUF_INIT;
 
-	if (opt->relative && opt->prefix_length)
+	if (opt->relative && opt->prefix_length) {
 		quote_path_relative(filename, opt->prefix, &buf);
-	else
+	} else {
+		if (super_prefix)
+			strbuf_addstr(&buf, super_prefix);
 		strbuf_addstr(&buf, filename);
+	}
 
 #ifndef NO_PTHREADS
 	if (num_threads) {
@@ -378,31 +396,258 @@ static void run_pager(struct grep_opt *opt, const char *prefix)
 		exit(status);
 }
 
-static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int cached)
+static void compile_submodule_options(const struct grep_opt *opt,
+				      const struct pathspec *pathspec,
+				      int cached, int untracked,
+				      int opt_exclude, int use_index,
+				      int pattern_type_arg)
+{
+	struct grep_pat *pattern;
+	int i;
+
+	if (recurse_submodules)
+		argv_array_push(&submodule_options, "--recurse-submodules");
+
+	if (cached)
+		argv_array_push(&submodule_options, "--cached");
+	if (!use_index)
+		argv_array_push(&submodule_options, "--no-index");
+	if (untracked)
+		argv_array_push(&submodule_options, "--untracked");
+	if (opt_exclude > 0)
+		argv_array_push(&submodule_options, "--exclude-standard");
+
+	if (opt->invert)
+		argv_array_push(&submodule_options, "-v");
+	if (opt->ignore_case)
+		argv_array_push(&submodule_options, "-i");
+	if (opt->word_regexp)
+		argv_array_push(&submodule_options, "-w");
+	switch (opt->binary) {
+	case GREP_BINARY_NOMATCH:
+		argv_array_push(&submodule_options, "-I");
+		break;
+	case GREP_BINARY_TEXT:
+		argv_array_push(&submodule_options, "-a");
+		break;
+	default:
+		break;
+	}
+	if (opt->allow_textconv)
+		argv_array_push(&submodule_options, "--textconv");
+	if (opt->max_depth != -1)
+		argv_array_pushf(&submodule_options, "--max-depth=%d",
+				 opt->max_depth);
+	if (opt->linenum)
+		argv_array_push(&submodule_options, "-n");
+	if (!opt->pathname)
+		argv_array_push(&submodule_options, "-h");
+	if (!opt->relative)
+		argv_array_push(&submodule_options, "--full-name");
+	if (opt->name_only)
+		argv_array_push(&submodule_options, "-l");
+	if (opt->unmatch_name_only)
+		argv_array_push(&submodule_options, "-L");
+	if (opt->null_following_name)
+		argv_array_push(&submodule_options, "-z");
+	if (opt->count)
+		argv_array_push(&submodule_options, "-c");
+	if (opt->file_break)
+		argv_array_push(&submodule_options, "--break");
+	if (opt->heading)
+		argv_array_push(&submodule_options, "--heading");
+	if (opt->pre_context)
+		argv_array_pushf(&submodule_options, "--before-context=%d",
+				 opt->pre_context);
+	if (opt->post_context)
+		argv_array_pushf(&submodule_options, "--after-context=%d",
+				 opt->post_context);
+	if (opt->funcname)
+		argv_array_push(&submodule_options, "-p");
+	if (opt->funcbody)
+		argv_array_push(&submodule_options, "-W");
+	if (opt->all_match)
+		argv_array_push(&submodule_options, "--all-match");
+	if (opt->debug)
+		argv_array_push(&submodule_options, "--debug");
+	if (opt->status_only)
+		argv_array_push(&submodule_options, "-q");
+
+	switch (pattern_type_arg) {
+	case GREP_PATTERN_TYPE_BRE:
+		argv_array_push(&submodule_options, "-G");
+		break;
+	case GREP_PATTERN_TYPE_ERE:
+		argv_array_push(&submodule_options, "-E");
+		break;
+	case GREP_PATTERN_TYPE_FIXED:
+		argv_array_push(&submodule_options, "-F");
+		break;
+	case GREP_PATTERN_TYPE_PCRE:
+		argv_array_push(&submodule_options, "-P");
+		break;
+	case GREP_PATTERN_TYPE_UNSPECIFIED:
+		break;
+	}
+
+	for (pattern = opt->pattern_list; pattern != NULL;
+	     pattern = pattern->next) {
+		switch (pattern->token) {
+		case GREP_PATTERN:
+			argv_array_pushf(&submodule_options, "-e%s",
+					 pattern->pattern);
+			break;
+		case GREP_AND:
+		case GREP_OPEN_PAREN:
+		case GREP_CLOSE_PAREN:
+		case GREP_NOT:
+		case GREP_OR:
+			argv_array_push(&submodule_options, pattern->pattern);
+			break;
+		/* BODY and HEAD are not used by git-grep */
+		case GREP_PATTERN_BODY:
+		case GREP_PATTERN_HEAD:
+			break;
+		}
+	}
+
+	/*
+	 * Limit number of threads for child process to use.
+	 * This is to prevent potential fork-bomb behavior of git-grep as each
+	 * submodule process has its own thread pool.
+	 */
+	if (num_threads)
+		argv_array_pushf(&submodule_options, "--threads=%d",
+				 (num_threads + 1) / 2);
+
+	/* Add Pathspecs */
+	argv_array_push(&submodule_options, "--");
+	for (i = 0; i < pathspec->nr; i++)
+		argv_array_push(&submodule_options,
+				pathspec->items[i].original);
+}
+
+/*
+ * Launch child process to grep contents of a submodule
+ */
+static int grep_submodule_launch(struct grep_opt *opt,
+				 const struct grep_source *gs)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	int status, i;
+	struct work_item *w = opt->output_priv;
+
+	prepare_submodule_repo_env(&cp.env_array);
+
+	/* Add super prefix */
+	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
+			 super_prefix ? super_prefix : "",
+			 gs->name);
+	argv_array_push(&cp.args, "grep");
+
+	/* Add options */
+	for (i = 0; i < submodule_options.argc; i++)
+		argv_array_push(&cp.args, submodule_options.argv[i]);
+
+	cp.git_cmd = 1;
+	cp.dir = gs->path;
+
+	/*
+	 * Capture output to output buffer and check the return code from the
+	 * child process.  A '0' indicates a hit, a '1' indicates no hit and
+	 * anything else is an error.
+	 */
+	status = capture_command(&cp, &w->out, 0);
+	if (status && (status != 1)) {
+		/* flush the buffer */
+		write_or_die(1, w->out.buf, w->out.len);
+		die("process for submodule '%s' failed with exit code: %d",
+		    gs->name, status);
+	}
+
+	/* invert the return code to make a hit equal to 1 */
+	return !status;
+}
+
+/*
+ * Prep grep structures for a submodule grep
+ * sha1: the sha1 of the submodule or NULL if using the working tree
+ * filename: name of the submodule including tree name of parent
+ * path: location of the submodule
+ */
+static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1,
+			  const char *filename, const char *path)
+{
+	if (!(is_submodule_initialized(path) &&
+	      is_submodule_populated(path)))
+		return 0;
+
+#ifndef NO_PTHREADS
+	if (num_threads) {
+		add_work(opt, GREP_SOURCE_SUBMODULE, filename, path, sha1);
+		return 0;
+	} else
+#endif
+	{
+		struct work_item w;
+		int hit;
+
+		grep_source_init(&w.source, GREP_SOURCE_SUBMODULE,
+				 filename, path, sha1);
+		strbuf_init(&w.out, 0);
+		opt->output_priv = &w;
+		hit = grep_submodule_launch(opt, &w.source);
+
+		write_or_die(1, w.out.buf, w.out.len);
+
+		grep_source_clear(&w.source);
+		strbuf_release(&w.out);
+		return hit;
+	}
+}
+
+static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec,
+		      int cached)
 {
 	int hit = 0;
 	int nr;
+	struct strbuf name = STRBUF_INIT;
+	int name_base_len = 0;
+	if (super_prefix) {
+		name_base_len = strlen(super_prefix);
+		strbuf_addstr(&name, super_prefix);
+	}
+
 	read_cache();
 
 	for (nr = 0; nr < active_nr; nr++) {
 		const struct cache_entry *ce = active_cache[nr];
-		if (!S_ISREG(ce->ce_mode))
-			continue;
-		if (!ce_path_match(ce, pathspec, NULL))
-			continue;
-		/*
-		 * If CE_VALID is on, we assume worktree file and its cache entry
-		 * are identical, even if worktree file has been modified, so use
-		 * cache version instead
-		 */
-		if (cached || (ce->ce_flags & CE_VALID) || ce_skip_worktree(ce)) {
-			if (ce_stage(ce) || ce_intent_to_add(ce))
-				continue;
-			hit |= grep_sha1(opt, ce->oid.hash, ce->name, 0,
-					 ce->name);
+		strbuf_setlen(&name, name_base_len);
+		strbuf_addstr(&name, ce->name);
+
+		if (S_ISREG(ce->ce_mode) &&
+		    match_pathspec(pathspec, name.buf, name.len, 0, NULL,
+				   S_ISDIR(ce->ce_mode) ||
+				   S_ISGITLINK(ce->ce_mode))) {
+			/*
+			 * If CE_VALID is on, we assume worktree file and its
+			 * cache entry are identical, even if worktree file has
+			 * been modified, so use cache version instead
+			 */
+			if (cached || (ce->ce_flags & CE_VALID) ||
+			    ce_skip_worktree(ce)) {
+				if (ce_stage(ce) || ce_intent_to_add(ce))
+					continue;
+				hit |= grep_sha1(opt, ce->oid.hash, ce->name,
+						 0, ce->name);
+			} else {
+				hit |= grep_file(opt, ce->name);
+			}
+		} else if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
+			   submodule_path_match(pathspec, name.buf, NULL)) {
+			hit |= grep_submodule(opt, NULL, ce->name, ce->name);
 		}
-		else
-			hit |= grep_file(opt, ce->name);
+
 		if (ce_stage(ce)) {
 			do {
 				nr++;
@@ -413,6 +658,8 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int
 		if (hit && opt->status_only)
 			break;
 	}
+
+	strbuf_release(&name);
 	return hit;
 }
 
@@ -651,6 +898,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			N_("search in both tracked and untracked files")),
 		OPT_SET_INT(0, "exclude-standard", &opt_exclude,
 			    N_("ignore files specified via '.gitignore'"), 1),
+		OPT_BOOL(0, "recurse-submodules", &recurse_submodules,
+			 N_("recursively search in each submodule")),
 		OPT_GROUP(""),
 		OPT_BOOL('v', "invert-match", &opt.invert,
 			N_("show non-matching lines")),
@@ -755,6 +1004,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	init_grep_defaults();
 	git_config(grep_cmd_config, NULL);
 	grep_init(&opt, prefix);
+	super_prefix = get_super_prefix();
 
 	/*
 	 * If there is no -- then the paths must exist in the working
@@ -872,6 +1122,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	pathspec.max_depth = opt.max_depth;
 	pathspec.recursive = 1;
 
+	if (recurse_submodules) {
+		gitmodules_config();
+		compile_submodule_options(&opt, &pathspec, cached, untracked,
+					  opt_exclude, use_index,
+					  pattern_type_arg);
+	}
+
 	if (show_in_pager && (cached || list.nr))
 		die(_("--open-files-in-pager only works on the worktree"));
 
@@ -895,6 +1152,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		}
 	}
 
+	if (recurse_submodules && (!use_index || untracked || list.nr))
+		die(_("option not supported with --recurse-submodules."));
+
 	if (!show_in_pager && !opt.status_only)
 		setup_pager();
 
diff --git a/git.c b/git.c
index efa1059..a156efd 100644
--- a/git.c
+++ b/git.c
@@ -434,7 +434,7 @@ static struct cmd_struct commands[] = {
 	{ "fsck-objects", cmd_fsck, RUN_SETUP },
 	{ "gc", cmd_gc, RUN_SETUP },
 	{ "get-tar-commit-id", cmd_get_tar_commit_id },
-	{ "grep", cmd_grep, RUN_SETUP_GENTLY },
+	{ "grep", cmd_grep, RUN_SETUP_GENTLY | SUPPORT_SUPER_PREFIX },
 	{ "hash-object", cmd_hash_object },
 	{ "help", cmd_help },
 	{ "index-pack", cmd_index_pack, RUN_SETUP_GENTLY },
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
new file mode 100755
index 0000000..b670c70
--- /dev/null
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -0,0 +1,99 @@
+#!/bin/sh
+
+test_description='Test grep recurse-submodules feature
+
+This test verifies the recurse-submodules feature correctly greps across
+submodules.
+'
+
+. ./test-lib.sh
+
+test_expect_success 'setup directory structure and submodule' '
+	echo "foobar" >a &&
+	mkdir b &&
+	echo "bar" >b/b &&
+	git add a b &&
+	git commit -m "add a and b" &&
+	git init submodule &&
+	echo "foobar" >submodule/a &&
+	git -C submodule add a &&
+	git -C submodule commit -m "add a" &&
+	git submodule add ./submodule &&
+	git commit -m "added submodule"
+'
+
+test_expect_success 'grep correctly finds patterns in a submodule' '
+	cat >expect <<-\EOF &&
+	a:foobar
+	b/b:bar
+	submodule/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and basic pathspecs' '
+	cat >expect <<-\EOF &&
+	submodule/a:foobar
+	EOF
+
+	git grep -e. --recurse-submodules -- submodule >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and nested submodules' '
+	git init submodule/sub &&
+	echo "foobar" >submodule/sub/a &&
+	git -C submodule/sub add a &&
+	git -C submodule/sub commit -m "add a" &&
+	git -C submodule submodule add ./sub &&
+	git -C submodule add sub &&
+	git -C submodule commit -m "added sub" &&
+	git add submodule &&
+	git commit -m "updated submodule" &&
+
+	cat >expect <<-\EOF &&
+	a:foobar
+	b/b:bar
+	submodule/a:foobar
+	submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and multiple patterns' '
+	cat >expect <<-\EOF &&
+	a:foobar
+	submodule/a:foobar
+	submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --and -e "foo" --recurse-submodules > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and multiple patterns with --not' '
+	cat >expect <<-\EOF &&
+	b/b:bar
+	EOF
+
+	git grep -e "bar" --and --not -e "foo" --recurse-submodules > actual &&
+	test_cmp expect actual
+'
+
+test_incompatible_with_recurse_submodules ()
+{
+	test_expect_success "--recurse-submodules and $1 are incompatible" "
+		test_must_fail git grep -e. --recurse-submodules $1 2>actual &&
+		test_i18ngrep 'not supported with --recurse-submodules' actual
+	"
+}
+
+test_incompatible_with_recurse_submodules --untracked
+test_incompatible_with_recurse_submodules --no-index
+test_incompatible_with_recurse_submodules HEAD
+
+test_done
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v3 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-11-11 23:51   ` [PATCH v3 0/6] recursively grep across submodules Brandon Williams
                       ` (3 preceding siblings ...)
  2016-11-11 23:51     ` [PATCH v3 4/6] grep: optionally recurse into submodules Brandon Williams
@ 2016-11-11 23:51     ` Brandon Williams
  2016-11-14 18:10       ` Junio C Hamano
  2016-11-16  1:09       ` Stefan Beller
  2016-11-11 23:51     ` [PATCH v3 6/6] grep: search history of moved submodules Brandon Williams
                       ` (2 subsequent siblings)
  7 siblings, 2 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-11 23:51 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

Teach grep to recursively search in submodules when provided with a
<tree> object. This allows grep to search a submodule based on the state
of the submodule that is present in a commit of the super project.

When grep is provided with a <tree> object, the name of the object is
prefixed to all output.  In order to provide uniformity of output
between the parent and child processes the option `--parent-basename`
has been added so that the child can preface all of its output with the
name of the parent's object instead of the name of the commit SHA1 of
the submodule. This changes output from the command
`git grep -e. -l --recurse-submodules HEAD` from:
HEAD:file
<commit sha1 of submodule>:sub/file

to:
HEAD:file
HEAD:sub/file
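
The split of '<tree-name>:<path>' that makes this work can be sketched on its
own as below.  This is an illustration under assumptions, not the code in
builtin/grep.c: the two helper names are hypothetical and snprintf() replaces
argv_array_pushf().  Like the patch, it splits at the first ':' found by
strchr().

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the path part of "<tree-name>:<path>", or
 * the whole string when there is no ':' (the working-tree case).
 */
const char *submodule_path_part(const char *full_name)
{
	const char *colon = strchr(full_name, ':');

	return colon ? colon + 1 : full_name;
}

/*
 * Hypothetical helper: format the --parent-basename argument from the part
 * of 'full_name' before the first ':'.
 * Returns 1 if an argument was written, 0 if 'full_name' has no tree name,
 * and -1 if 'out' is too small.
 */
int parent_basename_arg(char *out, size_t out_size, const char *full_name)
{
	const char *colon = strchr(full_name, ':');
	int n;

	if (!colon)
		return 0;

	n = snprintf(out, out_size, "--parent-basename=%.*s",
		     (int)(colon - full_name), full_name);
	return (n < 0 || (size_t)n >= out_size) ? -1 : 1;
}
```

For "HEAD:sub" this yields the argument --parent-basename=HEAD and the path
"sub"; for a plain "sub" no argument is produced.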

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 Documentation/git-grep.txt         | 13 +++++-
 builtin/grep.c                     | 83 +++++++++++++++++++++++++++++++++++---
 t/t7814-grep-recurse-submodules.sh | 44 +++++++++++++++++++-
 3 files changed, 131 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 17aa1ba..71f32f3 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -26,7 +26,7 @@ SYNOPSIS
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
-	   [--recurse-submodules]
+	   [--recurse-submodules] [--parent-basename <basename>]
 	   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...]
 	   [--] [<pathspec>...]
 
@@ -91,7 +91,16 @@ OPTIONS
 
 --recurse-submodules::
 	Recursively search in each submodule that has been initialized and
-	checked out in the repository.
+	checked out in the repository.  When used in combination with the
+	<tree> option the prefix of all submodule output will be the name of
+	the parent project's <tree> object.
+
+--parent-basename <basename>::
+	For internal use only.  In order to produce uniform output with the
+	--recurse-submodules option, this option can be used to provide the
+	basename of a parent's <tree> object to a submodule so the submodule
+	can prefix its output with the parent's name rather than the SHA1 of
+	the submodule.
 
 -a::
 --text::
diff --git a/builtin/grep.c b/builtin/grep.c
index 1fd292f..93e5405 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -19,6 +19,7 @@
 #include "dir.h"
 #include "pathspec.h"
 #include "submodule.h"
+#include "submodule-config.h"
 
 static char const * const grep_usage[] = {
 	N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"),
@@ -28,6 +29,7 @@ static char const * const grep_usage[] = {
 static const char *super_prefix;
 static int recurse_submodules;
 static struct argv_array submodule_options = ARGV_ARRAY_INIT;
+static const char *parent_basename;
 
 static int grep_submodule_launch(struct grep_opt *opt,
 				 const struct grep_source *gs);
@@ -535,19 +537,53 @@ static int grep_submodule_launch(struct grep_opt *opt,
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
 	int status, i;
+	const char *end_of_base;
+	const char *name;
 	struct work_item *w = opt->output_priv;
 
+	end_of_base = strchr(gs->name, ':');
+	if (end_of_base)
+		name = end_of_base + 1;
+	else
+		name = gs->name;
+
 	prepare_submodule_repo_env(&cp.env_array);
 
 	/* Add super prefix */
 	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
 			 super_prefix ? super_prefix : "",
-			 gs->name);
+			 name);
 	argv_array_push(&cp.args, "grep");
 
+	/*
+	 * Add basename of parent project
+	 * When performing grep on a <tree> object the filename is prefixed
+	 * with the object's name: '<tree-name>:filename'.  In order to
+	 * provide uniformity of output we want to pass the name of the
+	 * parent project's object name to the submodule so the submodule can
+	 * prefix its output with the parent's name and not its own SHA1.
+	 */
+	if (end_of_base)
+		argv_array_pushf(&cp.args, "--parent-basename=%.*s",
+				 (int) (end_of_base - gs->name),
+				 gs->name);
+
 	/* Add options */
-	for (i = 0; i < submodule_options.argc; i++)
+	for (i = 0; i < submodule_options.argc; i++) {
+		/*
+		 * If there is a <tree> identifier for the submodule, add the
+		 * rev after adding the submodule options but before the
+		 * pathspecs.  To do this we listen for the '--' and insert the
+		 * sha1 before pushing the '--' onto the child process argv
+		 * array.
+		 */
+		if (gs->identifier &&
+		    !strcmp("--", submodule_options.argv[i])) {
+			argv_array_push(&cp.args, sha1_to_hex(gs->identifier));
+		}
+
 		argv_array_push(&cp.args, submodule_options.argv[i]);
+	}
 
 	cp.git_cmd = 1;
 	cp.dir = gs->path;
@@ -671,12 +707,29 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 	enum interesting match = entry_not_interesting;
 	struct name_entry entry;
 	int old_baselen = base->len;
+	struct strbuf name = STRBUF_INIT;
+	int name_base_len = 0;
+	if (super_prefix) {
+		strbuf_addstr(&name, super_prefix);
+		name_base_len = name.len;
+	}
 
 	while (tree_entry(tree, &entry)) {
 		int te_len = tree_entry_len(&entry);
 
 		if (match != all_entries_interesting) {
-			match = tree_entry_interesting(&entry, base, tn_len, pathspec);
+			strbuf_setlen(&name, name_base_len);
+			strbuf_addstr(&name, base->buf + tn_len);
+
+			if (recurse_submodules && S_ISGITLINK(entry.mode)) {
+				strbuf_addstr(&name, entry.path);
+				match = submodule_path_match(pathspec, name.buf,
+							     NULL);
+			} else {
+				match = tree_entry_interesting(&entry, &name,
+							       0, pathspec);
+			}
+
 			if (match == all_entries_not_interesting)
 				break;
 			if (match == entry_not_interesting)
@@ -688,8 +741,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 		if (S_ISREG(entry.mode)) {
 			hit |= grep_sha1(opt, entry.oid->hash, base->buf, tn_len,
 					 check_attr ? base->buf + tn_len : NULL);
-		}
-		else if (S_ISDIR(entry.mode)) {
+		} else if (S_ISDIR(entry.mode)) {
 			enum object_type type;
 			struct tree_desc sub;
 			void *data;
@@ -705,12 +757,18 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 			hit |= grep_tree(opt, pathspec, &sub, base, tn_len,
 					 check_attr);
 			free(data);
+		} else if (recurse_submodules && S_ISGITLINK(entry.mode)) {
+			hit |= grep_submodule(opt, entry.oid->hash, base->buf,
+					      base->buf + tn_len);
 		}
+
 		strbuf_setlen(base, old_baselen);
 
 		if (hit && opt->status_only)
 			break;
 	}
+
+	strbuf_release(&name);
 	return hit;
 }
 
@@ -734,6 +792,10 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 		if (!data)
 			die(_("unable to read tree (%s)"), oid_to_hex(&obj->oid));
 
+		/* Use parent's name as base when recursing submodules */
+		if (recurse_submodules && parent_basename)
+			name = parent_basename;
+
 		len = name ? strlen(name) : 0;
 		strbuf_init(&base, PATH_MAX + len + 1);
 		if (len) {
@@ -760,6 +822,12 @@ static int grep_objects(struct grep_opt *opt, const struct pathspec *pathspec,
 	for (i = 0; i < nr; i++) {
 		struct object *real_obj;
 		real_obj = deref_tag(list->objects[i].item, NULL, 0);
+
+		/* load the gitmodules file for this rev */
+		if (recurse_submodules) {
+			submodule_free();
+			gitmodules_config_sha1(real_obj->oid.hash);
+		}
 		if (grep_object(opt, pathspec, real_obj, list->objects[i].name, list->objects[i].path)) {
 			hit = 1;
 			if (opt->status_only)
@@ -900,6 +968,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			    N_("ignore files specified via '.gitignore'"), 1),
 		OPT_BOOL(0, "recurse-submodules", &recurse_submodules,
 			 N_("recursivley search in each submodule")),
+		OPT_STRING(0, "parent-basename", &parent_basename,
+			   N_("basename"),
+			   N_("prepend parent project's basename to output")),
 		OPT_GROUP(""),
 		OPT_BOOL('v', "invert-match", &opt.invert,
 			N_("show non-matching lines")),
@@ -1152,7 +1223,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	if (recurse_submodules && (!use_index || untracked || list.nr))
+	if (recurse_submodules && (!use_index || untracked))
 		die(_("option not supported with --recurse-submodules."));
 
 	if (!show_in_pager && !opt.status_only)
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index b670c70..3d1892d 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -84,6 +84,49 @@ test_expect_success 'grep and multiple patterns' '
 	test_cmp expect actual
 '
 
+test_expect_success 'basic grep tree' '
+	cat >expect <<-\EOF &&
+	HEAD:a:foobar
+	HEAD:b/b:bar
+	HEAD:submodule/a:foobar
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree HEAD^' '
+	cat >expect <<-\EOF &&
+	HEAD^:a:foobar
+	HEAD^:b/b:bar
+	HEAD^:submodule/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD^ > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree HEAD^^' '
+	cat >expect <<-\EOF &&
+	HEAD^^:a:foobar
+	HEAD^^:b/b:bar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD^^ > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree and pathspecs' '
+	cat >expect <<-\EOF &&
+	HEAD:submodule/a:foobar
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD -- submodule > actual &&
+	test_cmp expect actual
+'
+
 test_incompatible_with_recurse_submodules ()
 {
 	test_expect_success "--recurse-submodules and $1 are incompatible" "
@@ -94,6 +137,5 @@ test_incompatible_with_recurse_submodules ()
 
 test_incompatible_with_recurse_submodules --untracked
 test_incompatible_with_recurse_submodules --no-index
-test_incompatible_with_recurse_submodules HEAD
 
 test_done
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v3 6/6] grep: search history of moved submodules
  2016-11-11 23:51   ` [PATCH v3 0/6] recursively grep across submodules Brandon Williams
                       ` (4 preceding siblings ...)
  2016-11-11 23:51     ` [PATCH v3 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
@ 2016-11-11 23:51     ` Brandon Williams
  2016-11-12  0:30       ` Stefan Beller
  2016-11-15 17:42     ` [PATCH v3 0/6] recursively grep across submodules Stefan Beller
  2016-11-18 19:58     ` [PATCH v4 " Brandon Williams
  7 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-11-11 23:51 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

If a submodule was renamed at any point since its inception, then if you
were to try and grep on a commit prior to the submodule being moved, you
wouldn't be able to find a working directory for the submodule since the
path in the past is different from the current path.

This patch teaches grep to find the .git directory for a submodule in
the parent's .git/modules/ directory in the event the path to the
submodule in the commit that is being searched differs from the state of
the currently checked out commit.  If found, the child process that is
spawned to grep the submodule will chdir into its gitdir instead of a
working directory.

In order to override the explicit setting of the submodule child process's
gitdir environment variable (which was introduced in '10f5c526'),
`GIT_DIR_ENVIRONMENT` needs to be pushed onto the child process's env_array.
This allows the searching of history from a submodule's gitdir, rather
than from a working directory.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 builtin/grep.c                     | 18 +++++++++++++++--
 t/t7814-grep-recurse-submodules.sh | 41 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 93e5405..1879432 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -548,6 +548,7 @@ static int grep_submodule_launch(struct grep_opt *opt,
 		name = gs->name;
 
 	prepare_submodule_repo_env(&cp.env_array);
+	argv_array_push(&cp.env_array, GIT_DIR_ENVIRONMENT);
 
 	/* Add super prefix */
 	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
@@ -615,8 +616,21 @@ static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1,
 			  const char *filename, const char *path)
 {
 	if (!(is_submodule_initialized(path) &&
-	      is_submodule_populated(path)))
-		return 0;
+	      is_submodule_populated(path))) {
+		/*
+		 * If searching history, check for the presence of the
+		 * submodule's gitdir before skipping the submodule.
+		 */
+		if (sha1) {
+			path = git_path("modules/%s",
+					submodule_from_path(null_sha1, path)->name);
+
+			if(!(is_directory(path) && is_git_directory(path)))
+				return 0;
+		} else {
+			return 0;
+		}
+	}
 
 #ifndef NO_PTHREADS
 	if (num_threads) {
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index 3d1892d..ee173ad 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -127,6 +127,47 @@ test_expect_success 'grep tree and pathspecs' '
 	test_cmp expect actual
 '
 
+test_expect_success 'grep history with moved submodules' '
+	git init parent &&
+	echo "foobar" >parent/file &&
+	git -C parent add file &&
+	git -C parent commit -m "add file" &&
+
+	git init sub &&
+	echo "foobar" >sub/file &&
+	git -C sub add file &&
+	git -C sub commit -m "add file" &&
+
+	git -C parent submodule add ../sub &&
+	git -C parent commit -m "add submodule" &&
+
+	cat >expect <<-\EOF &&
+	file:foobar
+	sub/file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules > actual &&
+	test_cmp expect actual &&
+
+	git -C parent mv sub sub-moved &&
+	git -C parent commit -m "moved submodule" &&
+
+	cat >expect <<-\EOF &&
+	file:foobar
+	sub-moved/file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules > actual &&
+	test_cmp expect actual &&
+
+	cat >expect <<-\EOF &&
+	HEAD^:file:foobar
+	HEAD^:sub/file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules HEAD^ > actual &&
+	test_cmp expect actual &&
+
+	rm -rf parent sub
+'
+
 test_incompatible_with_recurse_submodules ()
 {
 	test_expect_success "--recurse-submodules and $1 are incompatible" "
-- 
2.8.0.rc3.226.g39d4020



* Re: [PATCH v3 2/6] submodules: load gitmodules file from commit sha1
  2016-11-11 23:51     ` [PATCH v3 2/6] submodules: load gitmodules file from commit sha1 Brandon Williams
@ 2016-11-12  0:22       ` Stefan Beller
  0 siblings, 0 replies; 126+ messages in thread
From: Stefan Beller @ 2016-11-12  0:22 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git@vger.kernel.org, Jonathan Tan, Junio C Hamano

On Fri, Nov 11, 2016 at 3:51 PM, Brandon Williams <bmwill@google.com> wrote:
> teach submodules to load a '.gitmodules' file from a commit sha1.  This
> enables the population of the submodule_cache to be based on the state
> of the '.gitmodules' file from a particular commit.

This is the actual implementation that led to
https://public-inbox.org/git/20161102231722.15787-4-sbeller@google.com/
(part of origin/sb/submodule-config-cleanup)

To produce cleaner history, we may want to pick that commit into this patch?
That would allow us to extend the documentation or just this commit message
to talk about raciness in case we ever want to go multi-threaded with this,
as the current API is not ready for threading. AFAICT this will be used as:

    gitmodules_config_sha1(&interested_sha1)

    const struct submodule *sub = submodule_from_path(null_sha1, path);

and the reason you need this API for now is that the two lines of code
happen to be called at very different places, such that it is more
convenient to have this API instead of calling submodule_from_path with
the correct sha1 in the first place. This is because the sha1 is not
available at the place where you want to call submodule_from_path.

>
> Signed-off-by: Brandon Williams <bmwill@google.com>
> ---
>  cache.h            |  2 ++
>  config.c           |  8 ++++----
>  submodule-config.c |  6 +++---
>  submodule-config.h |  3 +++
>  submodule.c        | 12 ++++++++++++
>  submodule.h        |  1 +
>  6 files changed, 25 insertions(+), 7 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index 1be6526..559a461 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1690,6 +1690,8 @@ extern int git_default_config(const char *, const char *, void *);
>  extern int git_config_from_file(config_fn_t fn, const char *, void *);
>  extern int git_config_from_mem(config_fn_t fn, const enum config_origin_type,
>                                         const char *name, const char *buf, size_t len, void *data);
> +extern int git_config_from_blob_sha1(config_fn_t fn, const char *name,
> +                                    const unsigned char *sha1, void *data);
>  extern void git_config_push_parameter(const char *text);
>  extern int git_config_from_parameters(config_fn_t fn, void *data);
>  extern void git_config(config_fn_t fn, void *);
> diff --git a/config.c b/config.c
> index 83fdecb..4d78e72 100644
> --- a/config.c
> +++ b/config.c
> @@ -1214,10 +1214,10 @@ int git_config_from_mem(config_fn_t fn, const enum config_origin_type origin_typ
>         return do_config_from(&top, fn, data);
>  }
>
> -static int git_config_from_blob_sha1(config_fn_t fn,
> -                                    const char *name,
> -                                    const unsigned char *sha1,
> -                                    void *data)
> +int git_config_from_blob_sha1(config_fn_t fn,
> +                             const char *name,
> +                             const unsigned char *sha1,
> +                             void *data)
>  {
>         enum object_type type;
>         char *buf;
> diff --git a/submodule-config.c b/submodule-config.c
> index 098085b..8b9a2ef 100644
> --- a/submodule-config.c
> +++ b/submodule-config.c
> @@ -379,9 +379,9 @@ static int parse_config(const char *var, const char *value, void *data)
>         return ret;
>  }
>
> -static int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
> -                                     unsigned char *gitmodules_sha1,
> -                                     struct strbuf *rev)
> +int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
> +                              unsigned char *gitmodules_sha1,
> +                              struct strbuf *rev)
>  {
>         int ret = 0;
>
> diff --git a/submodule-config.h b/submodule-config.h
> index d05c542..78584ba 100644
> --- a/submodule-config.h
> +++ b/submodule-config.h
> @@ -29,6 +29,9 @@ const struct submodule *submodule_from_name(const unsigned char *commit_sha1,
>                 const char *name);
>  const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
>                 const char *path);
> +extern int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
> +                                     unsigned char *gitmodules_sha1,
> +                                     struct strbuf *rev);
>  void submodule_free(void);
>
>  #endif /* SUBMODULE_CONFIG_H */
> diff --git a/submodule.c b/submodule.c
> index f5107f0..062e58b 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -198,6 +198,18 @@ void gitmodules_config(void)
>         }
>  }
>
> +void gitmodules_config_sha1(const unsigned char *commit_sha1)
> +{
> +       struct strbuf rev = STRBUF_INIT;
> +       unsigned char sha1[20];
> +
> +       if (gitmodule_sha1_from_commit(commit_sha1, sha1, &rev)) {
> +               git_config_from_blob_sha1(submodule_config, rev.buf,
> +                                         sha1, NULL);
> +       }
> +       strbuf_release(&rev);
> +}
> +
>  /*
>   * Determine if a submodule has been initialized at a given 'path'
>   */
> diff --git a/submodule.h b/submodule.h
> index 6ec5f2f..9203d89 100644
> --- a/submodule.h
> +++ b/submodule.h
> @@ -37,6 +37,7 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
>                 const char *path);
>  int submodule_config(const char *var, const char *value, void *cb);
>  void gitmodules_config(void);
> +extern void gitmodules_config_sha1(const unsigned char *commit_sha1);
>  extern int is_submodule_initialized(const char *path);
>  extern int is_submodule_populated(const char *path);
>  int parse_submodule_update_strategy(const char *value,
> --
> 2.8.0.rc3.226.g39d4020
>


* Re: [PATCH v3 6/6] grep: search history of moved submodules
  2016-11-11 23:51     ` [PATCH v3 6/6] grep: search history of moved submodules Brandon Williams
@ 2016-11-12  0:30       ` Stefan Beller
  2016-11-14 17:43         ` Brandon Williams
  0 siblings, 1 reply; 126+ messages in thread
From: Stefan Beller @ 2016-11-12  0:30 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git@vger.kernel.org, Jonathan Tan, Junio C Hamano

On Fri, Nov 11, 2016 at 3:51 PM, Brandon Williams <bmwill@google.com> wrote:

> +
> +       rm -rf parent sub

This line sounds like a perfect candidate for "test_when_finished"
at the beginning of the test


* Re: [PATCH v3 6/6] grep: search history of moved submodules
  2016-11-12  0:30       ` Stefan Beller
@ 2016-11-14 17:43         ` Brandon Williams
  0 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-14 17:43 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org, Jonathan Tan, Junio C Hamano

On 11/11, Stefan Beller wrote:
> On Fri, Nov 11, 2016 at 3:51 PM, Brandon Williams <bmwill@google.com> wrote:
> 
> > +
> > +       rm -rf parent sub
> 
> This line sounds like a perfect candidate for "test_when_finished"
> at the beginning of the test

K will do.

-- 
Brandon Williams


* Re: [PATCH v3 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-11-11 23:51     ` [PATCH v3 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
@ 2016-11-14 18:10       ` Junio C Hamano
  2016-11-14 18:44         ` Jonathan Tan
  2016-11-16  1:09       ` Stefan Beller
  1 sibling, 1 reply; 126+ messages in thread
From: Junio C Hamano @ 2016-11-14 18:10 UTC (permalink / raw)
  To: jonathantanmy; +Cc: git, sbeller, Brandon Williams

Brandon Williams <bmwill@google.com> writes:

> Teach grep to recursively search in submodules when provided with a
> <tree> object. This allows grep to search a submodule based on the state
> of the submodule that is present in a commit of the super project.
>
> When grep is provided with a <tree> object, the name of the object is
> prefixed to all output.  In order to provide uniformity of output
> between the parent and child processes the option `--parent-basename`
> has been added so that the child can preface all of its output with the
> name of the parent's object instead of the name of the commit SHA1 of
> the submodule. This changes output from the command
> `git grep -e. -l --recurse-submodules HEAD` from:
> HEAD:file
> <commit sha1 of submodule>:sub/file
>
> to:
> HEAD:file
> HEAD:sub/file
>
> Signed-off-by: Brandon Williams <bmwill@google.com>
> ---

Unrelated tangent, but this makes readers wonder what the updated
trailer code would do to the last paragraph ;-).  Does it behave
sensibly (with some sane definition of sensibleness)?

I am guessing that it would, because neither to: nor HEAD: is what we
normally recognize as a known trailer block element.




* Re: [PATCH v3 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-11-14 18:10       ` Junio C Hamano
@ 2016-11-14 18:44         ` Jonathan Tan
  2016-11-14 18:56           ` Junio C Hamano
  0 siblings, 1 reply; 126+ messages in thread
From: Jonathan Tan @ 2016-11-14 18:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, sbeller, Brandon Williams

On 11/14/2016 10:10 AM, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
>
>> Teach grep to recursively search in submodules when provided with a
>> <tree> object. This allows grep to search a submodule based on the state
>> of the submodule that is present in a commit of the super project.
>>
>> When grep is provided with a <tree> object, the name of the object is
>> prefixed to all output.  In order to provide uniformity of output
>> between the parent and child processes the option `--parent-basename`
>> has been added so that the child can preface all of its output with the
>> name of the parent's object instead of the name of the commit SHA1 of
>> the submodule. This changes output from the command
>> `git grep -e. -l --recurse-submodules HEAD` from:
>> HEAD:file
>> <commit sha1 of submodule>:sub/file
>>
>> to:
>> HEAD:file
>> HEAD:sub/file
>>
>> Signed-off-by: Brandon Williams <bmwill@google.com>
>> ---
>
> Unrelated tangent, but this makes readers wonder what the updated
> trailer code would do to the last paragraph ;-).  Does it behave
> sensibly (with some sane definition of sensibleness)?
>
> I am guessing that it would, because neither to: nor HEAD: is what we
> normally recognize as a known trailer block element.

Yes, it behaves sensibly :-) because "Signed-off-by:" is preceded by a 
blank line, so the trailer block consists only of that line.

Having said that, it is probably better to indent those examples in the 
commit message (by at least one space or one tab) - then they will never 
be confused with trailers (once my patch set is in).


* Re: [PATCH v3 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-11-14 18:44         ` Jonathan Tan
@ 2016-11-14 18:56           ` Junio C Hamano
  2016-11-14 19:08             ` Jonathan Tan
  0 siblings, 1 reply; 126+ messages in thread
From: Junio C Hamano @ 2016-11-14 18:56 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, sbeller, Brandon Williams

Jonathan Tan <jonathantanmy@google.com> writes:

>>> to:
>>> HEAD:file
>>> HEAD:sub/file
>>>
>>> Signed-off-by: Brandon Williams <bmwill@google.com>
>>> ---
>>
>> Unrelated tangent, but this makes readers wonder what the updated
>> trailer code would do to the last paragraph ;-).  Does it behave
>> sensibly (with some sane definition of sensibleness)?
>>
>> I am guessing that it would, because neither to: nor HEAD: is what we
>> normally recognize as a known trailer block element.
>
> Yes, it behaves sensibly :-) because "Signed-off-by:" is preceded by a
> blank line, so the trailer block consists only of that line.

Oh, that was not what I was wondering.  Imagine Brandon writing his
message that ends in these three questionable lines and then running
"commit -s --amend" to add his sign-off---that was the case I was
wondering.



* Re: [PATCH v3 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-11-14 18:56           ` Junio C Hamano
@ 2016-11-14 19:08             ` Jonathan Tan
  2016-11-14 19:14               ` Brandon Williams
  0 siblings, 1 reply; 126+ messages in thread
From: Jonathan Tan @ 2016-11-14 19:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, sbeller, Brandon Williams

On 11/14/2016 10:56 AM, Junio C Hamano wrote:
> Jonathan Tan <jonathantanmy@google.com> writes:
>
>>>> to:
>>>> HEAD:file
>>>> HEAD:sub/file
>>>>
>>>> Signed-off-by: Brandon Williams <bmwill@google.com>
>>>> ---
>>>
>>> Unrelated tangent, but this makes readers wonder what the updated
>>> trailer code would do to the last paragraph ;-).  Does it behave
>>> sensibly (with some sane definition of sensibleness)?
>>>
>>> I am guessing that it would, because neither to: nor HEAD: is what we
>>> normally recognize as a known trailer block element.
>>
>> Yes, it behaves sensibly :-) because "Signed-off-by:" is preceded by a
>> blank line, so the trailer block consists only of that line.
>
> Oh, that was not what I was wondering.  Imagine Brandon writing his
> message that ends in these three questionable lines and then running
> "commit -s --amend" to add his sign-off---that was the case I was
> wondering.

Ah, I see. In that case, it would consider the last block as a trailer 
block and attach it directly:

   to:
   HEAD:file
   HEAD:sub/file
   Signed-off-by: ...

It is true that neither to: nor HEAD: are known trailers, but my patch 
set accepts trailer blocks that are 100% well-formed regardless of 
whether the trailers are known (to provide backwards compatibility with 
git-interpret-trailers, and to satisfy certain use cases that I
brought up). The "known trailer" check is used when the trailer block is 
not 100% well-formed.

This issue can be avoided if those lines were indented with at least one 
space or at least one tab.
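
To make the rule above concrete, here is a rough, self-contained sketch
of the "100% well-formed" test. It is a simplified model and not the
code from the trailer patch set: the function names are made up for
illustration, a line counts as a trailer only if it starts at column 0
with a non-empty token that has no whitespace before the first ':', and
an indented line is treated as a non-trailer line.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: does [line, line + len) look like "token:value"? */
static int looks_like_trailer(const char *line, size_t len)
{
	size_t i;

	if (len == 0 || isspace((unsigned char)line[0]))
		return 0;	/* empty or indented: not a trailer line */
	for (i = 0; i < len; i++) {
		if (line[i] == ':')
			return i > 0;	/* need a non-empty token before ':' */
		if (isspace((unsigned char)line[i]))
			return 0;	/* whitespace inside the token */
	}
	return 0;		/* no ':' separator at all */
}

/*
 * Hypothetical helper: return 1 if every line of the last paragraph
 * of msg looks like a trailer, 0 otherwise.
 */
int last_paragraph_is_trailer_block(const char *msg)
{
	const char *start, *end, *p;

	assert(msg != NULL);

	/* Ignore trailing newlines. */
	end = msg + strlen(msg);
	while (end > msg && end[-1] == '\n')
		end--;
	if (end == msg)
		return 0;

	/* The last paragraph starts after the last blank line. */
	start = msg;
	for (p = msg; p + 1 < end; p++)
		if (p[0] == '\n' && p[1] == '\n')
			start = p + 2;

	/* Every line in that paragraph must look like a trailer. */
	while (start < end) {
		const char *nl = memchr(start, '\n', (size_t)(end - start));
		size_t len = nl ? (size_t)(nl - start) : (size_t)(end - start);

		if (!looks_like_trailer(start, len))
			return 0;
		start += len + (nl ? 1 : 0);
	}
	return 1;
}
```

Under this model a message ending in the unindented "to:", "HEAD:file",
"HEAD:sub/file" paragraph is reported as a trailer block, so a sign-off
would be appended directly to it, while the same paragraph with the two
HEAD lines indented is not.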


* Re: [PATCH v3 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-11-14 19:08             ` Jonathan Tan
@ 2016-11-14 19:14               ` Brandon Williams
  0 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-14 19:14 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Junio C Hamano, git, sbeller

On 11/14, Jonathan Tan wrote:
> On 11/14/2016 10:56 AM, Junio C Hamano wrote:
> >Jonathan Tan <jonathantanmy@google.com> writes:
> >
> >>>>to:
> >>>>HEAD:file
> >>>>HEAD:sub/file
> >>>>
> >>>>Signed-off-by: Brandon Williams <bmwill@google.com>
> >>>>---
> >>>
> >>>Unrelated tangent, but this makes readers wonder what the updated
> >>>trailer code would do to the last paragraph ;-).  Does it behave
> >>>sensibly (with some sane definition of sensibleness)?
> >>>
> >>>I am guessing that it would, because neither to: nor HEAD: is what we
> >>>normally recognize as a known trailer block element.
> >>
> >>Yes, it behaves sensibly :-) because "Signed-off-by:" is preceded by a
> >>blank line, so the trailer block consists only of that line.
> >
> >Oh, that was not what I was wondering.  Imagine Brandon writing his
> >message that ends in these three questionable lines and then running
> >"commit -s --amend" to add his sign-off---that was the case I was
> >wondering.
> 
> Ah, I see. In that case, it would consider the last block as a
> trailer block and attach it directly:
> 
>   to:
>   HEAD:file
>   HEAD:sub/file
>   Signed-off-by: ...
> 
> It is true that neither to: nor HEAD: are known trailers, but my
> patch set accepts trailer blocks that are 100% well-formed
> regardless of whether the trailers are known (to provide backwards
> compatibility with git-interpret-trailers, and to satisfy
> certain use cases that I brought up). The "known trailer" check is
> used when the trailer block is not 100% well-formed.
> 
> This issue can be avoided if those lines were indented with at least
> one space or at least one tab.

Who would have thought my simple example would cause this kind of
discussion!  I can update the commit message and indent the output so
that it looks like the following:

to:
  HEAD:file
  HEAD:sub/file

-- 
Brandon Williams


* Re: [PATCH v3 0/6] recursively grep across submodules
  2016-11-11 23:51   ` [PATCH v3 0/6] recursively grep across submodules Brandon Williams
                       ` (5 preceding siblings ...)
  2016-11-11 23:51     ` [PATCH v3 6/6] grep: search history of moved submodules Brandon Williams
@ 2016-11-15 17:42     ` Stefan Beller
  2016-11-18 19:58     ` [PATCH v4 " Brandon Williams
  7 siblings, 0 replies; 126+ messages in thread
From: Stefan Beller @ 2016-11-15 17:42 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git@vger.kernel.org, Jonathan Tan, Junio C Hamano

Coverity seems to dislike this part:

*** CID 1394367:  Null pointer dereferences  (NULL_RETURNS)
/builtin/grep.c: 625 in grep_submodule()
619                   is_submodule_populated(path))) {
620                     /*
621                      * If searching history, check for the presense of the
622                      * submodule's gitdir before skipping the submodule.
623                      */
624                     if (sha1) {
>>>     CID 1394367:  Null pointer dereferences  (NULL_RETURNS)
>>>     Dereferencing a null pointer "submodule_from_path(null_sha1, path)".
625                             path = git_path("modules/%s",
626                                             submodule_from_path(null_sha1, path)->name);
627
628                             if (!(is_directory(path) && is_git_directory(path)))
629                                     return 0;
630                     } else {
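
One possible shape for a fix, as a self-contained sketch only: look the
submodule up first and bail out when nothing is found, instead of
dereferencing the result inline. The struct and both functions below
are hypothetical stand-ins (lookup_submodule() plays the role of
submodule_from_path(), and the table of known submodules is invented
for the example); this is not the code from the series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for git's struct submodule; only two fields. */
struct submodule {
	const char *path;
	const char *name;
};

/*
 * Hypothetical stand-in for submodule_from_path(): returns NULL when
 * no submodule is configured at the given path.
 */
const struct submodule *lookup_submodule(const char *path)
{
	static const struct submodule known[] = {
		{ "sub", "sub" },
	};
	size_t i;

	for (i = 0; i < sizeof(known) / sizeof(known[0]); i++)
		if (!strcmp(known[i].path, path))
			return &known[i];
	return NULL;
}

/*
 * Write "modules/<name>" for the submodule at path into buf.
 * Returns 0 on success, -1 if there is no such submodule or the
 * buffer is too small.
 */
int submodule_gitdir_relpath(char *buf, size_t size, const char *path)
{
	const struct submodule *sub;
	int n;

	assert(buf != NULL && path != NULL);

	sub = lookup_submodule(path);
	if (!sub)
		return -1;	/* the NULL return that Coverity flags */

	n = snprintf(buf, size, "modules/%s", sub->name);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

In grep_submodule() the equivalent would be to treat a NULL lookup the
same way as a missing gitdir and skip the submodule.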


* Re: [PATCH v3 1/6] submodules: add helper functions to determine presence of submodules
  2016-11-11 23:51     ` [PATCH v3 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
@ 2016-11-15 23:49       ` Stefan Beller
  0 siblings, 0 replies; 126+ messages in thread
From: Stefan Beller @ 2016-11-15 23:49 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git@vger.kernel.org, Jonathan Tan, Junio C Hamano

On Fri, Nov 11, 2016 at 3:51 PM, Brandon Williams <bmwill@google.com> wrote:
> Add two helper functions to submodule.c.
> `is_submodule_initialized()` checks if a submodule has been initialized
> at a given path and `is_submodule_populated()` checks if a submodule
> has been checked out at a given path.

This reminds me to write the documentation patch explaining the
concepts of submodules (specifically that overview page would state
all the possible states of submodules)

This patch looks good,
Stefan

> +
> +       if (module) {
> +               char *key = xstrfmt("submodule.%s.url", module->name);
> +               char *value = NULL;

minor nit:
In case a reroll is needed, you could replace `value` by
`not_needed` or `unused` to make it easier to follow.
Hence it also doesn't need initialization (Doh, it does
for free() to work, nevermind).


* Re: [PATCH v3 4/6] grep: optionally recurse into submodules
  2016-11-11 23:51     ` [PATCH v3 4/6] grep: optionally recurse into submodules Brandon Williams
@ 2016-11-16  0:07       ` Stefan Beller
  2016-11-17 22:13         ` Brandon Williams
  0 siblings, 1 reply; 126+ messages in thread
From: Stefan Beller @ 2016-11-16  0:07 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git@vger.kernel.org, Jonathan Tan, Junio C Hamano

On Fri, Nov 11, 2016 at 3:51 PM, Brandon Williams <bmwill@google.com> wrote:
> Allow grep to recognize submodules and recursively search for patterns in
> each submodule.  This is done by forking off a process to recursively
> call grep on each submodule.  The top level --super-prefix option is
> used to pass a path to the submodule which can in turn be used to
> prepend to output or in pathspec matching logic.
>
> Recursion only occurs for submodules which have been initialized and
> checked out by the parent project.  If a submodule hasn't been
> initialized and checked out it is simply skipped.
>
> In order to support the existing multi-threading infrastructure in grep,
> output from each child process is captured in a strbuf so that it can be
> later printed to the console in an ordered fashion.
>
> To limit the number of threads that are created, each child process has
> half the number of threads as its parent (minimum of 1), otherwise we
> potentially have a fork-bomb.
>
> Signed-off-by: Brandon Williams <bmwill@google.com>
> ---
>  Documentation/git-grep.txt         |   5 +
>  builtin/grep.c                     | 300 ++++++++++++++++++++++++++++++++++---
>  git.c                              |   2 +-
>  t/t7814-grep-recurse-submodules.sh |  99 ++++++++++++
>  4 files changed, 385 insertions(+), 21 deletions(-)
>  create mode 100755 t/t7814-grep-recurse-submodules.sh
>
> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> index 0ecea6e..17aa1ba 100644
> --- a/Documentation/git-grep.txt
> +++ b/Documentation/git-grep.txt
> @@ -26,6 +26,7 @@ SYNOPSIS
>            [--threads <num>]
>            [-f <file>] [-e] <pattern>
>            [--and|--or|--not|(|)|-e <pattern>...]
> +          [--recurse-submodules]
>            [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...]
>            [--] [<pathspec>...]
>
> @@ -88,6 +89,10 @@ OPTIONS
>         mechanism.  Only useful when searching files in the current directory
>         with `--no-index`.
>
> +--recurse-submodules::
> +       Recursively search in each submodule that has been initialized and
> +       checked out in the repository.
> +
>  -a::
>  --text::
>         Process binary files as if they were text.
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 8887b6a..1fd292f 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -18,12 +18,20 @@
>  #include "quote.h"
>  #include "dir.h"
>  #include "pathspec.h"
> +#include "submodule.h"
>
>  static char const * const grep_usage[] = {
>         N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"),
>         NULL
>  };
>
> +static const char *super_prefix;
> +static int recurse_submodules;
> +static struct argv_array submodule_options = ARGV_ARRAY_INIT;
> +
> +static int grep_submodule_launch(struct grep_opt *opt,
> +                                const struct grep_source *gs);
> +
>  #define GREP_NUM_THREADS_DEFAULT 8
>  static int num_threads;
>
> @@ -174,7 +182,10 @@ static void *run(void *arg)
>                         break;
>
>                 opt->output_priv = w;
> -               hit |= grep_source(opt, &w->source);
> +               if (w->source.type == GREP_SOURCE_SUBMODULE)
> +                       hit |= grep_submodule_launch(opt, &w->source);
> +               else
> +                       hit |= grep_source(opt, &w->source);
>                 grep_source_clear_data(&w->source);
>                 work_done(w);
>         }
> @@ -300,6 +311,10 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
>         if (opt->relative && opt->prefix_length) {
>                 quote_path_relative(filename + tree_name_len, opt->prefix, &pathbuf);
>                 strbuf_insert(&pathbuf, 0, filename, tree_name_len);
> +       } else if (super_prefix) {
> +               strbuf_add(&pathbuf, filename, tree_name_len);
> +               strbuf_addstr(&pathbuf, super_prefix);
> +               strbuf_addstr(&pathbuf, filename + tree_name_len);
>         } else {
>                 strbuf_addstr(&pathbuf, filename);
>         }
> @@ -328,10 +343,13 @@ static int grep_file(struct grep_opt *opt, const char *filename)
>  {
>         struct strbuf buf = STRBUF_INIT;
>
> -       if (opt->relative && opt->prefix_length)
> +       if (opt->relative && opt->prefix_length) {
>                 quote_path_relative(filename, opt->prefix, &buf);
> -       else
> +       } else {
> +               if (super_prefix)
> +                       strbuf_addstr(&buf, super_prefix);
>                 strbuf_addstr(&buf, filename);
> +       }
>
>  #ifndef NO_PTHREADS
>         if (num_threads) {
> @@ -378,31 +396,258 @@ static void run_pager(struct grep_opt *opt, const char *prefix)
>                 exit(status);
>  }
>
> -static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int cached)
> +static void compile_submodule_options(const struct grep_opt *opt,
> +                                     const struct pathspec *pathspec,
> +                                     int cached, int untracked,
> +                                     int opt_exclude, int use_index,
> +                                     int pattern_type_arg)
> +{
> +       struct grep_pat *pattern;
> +       int i;
> +
> +       if (recurse_submodules)
> +               argv_array_push(&submodule_options, "--recurse-submodules");
> +
> +       if (cached)
> +               argv_array_push(&submodule_options, "--cached");
> +       if (!use_index)
> +               argv_array_push(&submodule_options, "--no-index");
> +       if (untracked)
> +               argv_array_push(&submodule_options, "--untracked");
> +       if (opt_exclude > 0)
> +               argv_array_push(&submodule_options, "--exclude-standard");
> +
> +       if (opt->invert)
> +               argv_array_push(&submodule_options, "-v");
> +       if (opt->ignore_case)
> +               argv_array_push(&submodule_options, "-i");
> +       if (opt->word_regexp)
> +               argv_array_push(&submodule_options, "-w");
> +       switch (opt->binary) {
> +       case GREP_BINARY_NOMATCH:
> +               argv_array_push(&submodule_options, "-I");
> +               break;
> +       case GREP_BINARY_TEXT:
> +               argv_array_push(&submodule_options, "-a");
> +               break;
> +       default:
> +               break;
> +       }
> +       if (opt->allow_textconv)
> +               argv_array_push(&submodule_options, "--textconv");
> +       if (opt->max_depth != -1)
> +               argv_array_pushf(&submodule_options, "--max-depth=%d",
> +                                opt->max_depth);
> +       if (opt->linenum)
> +               argv_array_push(&submodule_options, "-n");
> +       if (!opt->pathname)
> +               argv_array_push(&submodule_options, "-h");
> +       if (!opt->relative)
> +               argv_array_push(&submodule_options, "--full-name");
> +       if (opt->name_only)
> +               argv_array_push(&submodule_options, "-l");
> +       if (opt->unmatch_name_only)
> +               argv_array_push(&submodule_options, "-L");
> +       if (opt->null_following_name)
> +               argv_array_push(&submodule_options, "-z");
> +       if (opt->count)
> +               argv_array_push(&submodule_options, "-c");
> +       if (opt->file_break)
> +               argv_array_push(&submodule_options, "--break");
> +       if (opt->heading)
> +               argv_array_push(&submodule_options, "--heading");
> +       if (opt->pre_context)
> +               argv_array_pushf(&submodule_options, "--before-context=%d",
> +                                opt->pre_context);
> +       if (opt->post_context)
> +               argv_array_pushf(&submodule_options, "--after-context=%d",
> +                                opt->post_context);
> +       if (opt->funcname)
> +               argv_array_push(&submodule_options, "-p");
> +       if (opt->funcbody)
> +               argv_array_push(&submodule_options, "-W");
> +       if (opt->all_match)
> +               argv_array_push(&submodule_options, "--all-match");
> +       if (opt->debug)
> +               argv_array_push(&submodule_options, "--debug");
> +       if (opt->status_only)
> +               argv_array_push(&submodule_options, "-q");
> +
> +       switch (pattern_type_arg) {
> +       case GREP_PATTERN_TYPE_BRE:
> +               argv_array_push(&submodule_options, "-G");
> +               break;
> +       case GREP_PATTERN_TYPE_ERE:
> +               argv_array_push(&submodule_options, "-E");
> +               break;
> +       case GREP_PATTERN_TYPE_FIXED:
> +               argv_array_push(&submodule_options, "-F");
> +               break;
> +       case GREP_PATTERN_TYPE_PCRE:
> +               argv_array_push(&submodule_options, "-P");
> +               break;
> +       case GREP_PATTERN_TYPE_UNSPECIFIED:
> +               break;
> +       }
> +
> +       for (pattern = opt->pattern_list; pattern != NULL;
> +            pattern = pattern->next) {
> +               switch (pattern->token) {
> +               case GREP_PATTERN:
> +                       argv_array_pushf(&submodule_options, "-e%s",
> +                                        pattern->pattern);
> +                       break;
> +               case GREP_AND:
> +               case GREP_OPEN_PAREN:
> +               case GREP_CLOSE_PAREN:
> +               case GREP_NOT:
> +               case GREP_OR:
> +                       argv_array_push(&submodule_options, pattern->pattern);
> +                       break;
> +               /* BODY and HEAD are not used by git-grep */
> +               case GREP_PATTERN_BODY:
> +               case GREP_PATTERN_HEAD:
> +                       break;
> +               }
> +       }
> +
> +       /*
> +        * Limit number of threads for child process to use.
> +        * This is to prevent potential fork-bomb behavior of git-grep as each
> +        * submodule process has its own thread pool.
> +        */
> +       if (num_threads)
> +               argv_array_pushf(&submodule_options, "--threads=%d",
> +                                (num_threads + 1) / 2);

I think you would want to pass --threads=%d unconditionally,
as it also serves as a weak defusal for fork bombs. Is it possible to come here
with num_threads=0? (i.e. what happens if the user doesn't specify the number
of threads or such, do we fall back to some default or is it just 0?)

I have seen some other places that check for num_threads unequal to 0,
as e.g. no mutex needs to be locked then (assuming we don't have any
thread but grep within the main process), but as you intend to use this also
as a helper to not blow up the number of threads recursively, we'd need to
pass a number != 0 here?

> +
> +       git grep -e "bar" --and -e "foo" --recurse-submodules > actual &&

nit here and in the tests below:
We prefer to have no whitespace between > and the file being redirected to.

> +       test_cmp expect actual
> +'
> +
> +test_expect_success 'grep and multiple patterns' '
> +       cat >expect <<-\EOF &&
> +       b/b:bar
> +       EOF
> +
> +       git grep -e "bar" --and --not -e "foo" --recurse-submodules > actual &&

Otherwise, this patch looks good.

Thanks,
Stefan

* Re: [PATCH v3 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-11-11 23:51     ` [PATCH v3 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
  2016-11-14 18:10       ` Junio C Hamano
@ 2016-11-16  1:09       ` Stefan Beller
  2016-11-17 23:34         ` Brandon Williams
  1 sibling, 1 reply; 126+ messages in thread
From: Stefan Beller @ 2016-11-16  1:09 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git@vger.kernel.org, Jonathan Tan, Junio C Hamano

On Fri, Nov 11, 2016 at 3:51 PM, Brandon Williams <bmwill@google.com> wrote:

> to:
> HEAD:file
> HEAD:sub/file

  Maybe indent this ;)

>  static struct argv_array submodule_options = ARGV_ARRAY_INIT;
> +static const char *parent_basename;
>
>  static int grep_submodule_launch(struct grep_opt *opt,
>                                  const struct grep_source *gs);
> @@ -535,19 +537,53 @@ static int grep_submodule_launch(struct grep_opt *opt,
>  {
>         struct child_process cp = CHILD_PROCESS_INIT;
>         int status, i;
> +       const char *end_of_base;
> +       const char *name;
>         struct work_item *w = opt->output_priv;
>
> +       end_of_base = strchr(gs->name, ':');
> +       if (end_of_base)
> +               name = end_of_base + 1;
> +       else
> +               name = gs->name;
> +
>         prepare_submodule_repo_env(&cp.env_array);
>
>         /* Add super prefix */
>         argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
>                          super_prefix ? super_prefix : "",
> -                        gs->name);
> +                        name);
>         argv_array_push(&cp.args, "grep");
>
> +       /*
> +        * Add basename of parent project
> +        * When performing grep on a <tree> object the filename is prefixed
> +        * with the object's name: '<tree-name>:filename'.

This comment is hard to read as it's unclear what the <angle brackets> mean.
(Are they supposed to indicate a variable? If so, why is filename not marked up?)

>  In order to
> +        * provide uniformity of output we want to pass the name of the
> +        * parent project's object name to the submodule so the submodule can
> +        * prefix its output with the parent's name and not its own SHA1.
> +        */
> +       if (end_of_base)
> +               argv_array_pushf(&cp.args, "--parent-basename=%.*s",
> +                                (int) (end_of_base - gs->name),
> +                                gs->name);

Do we pass this only with the tree-ish?
What if we are grepping the working tree and the file name contains a colon?

> +test_expect_success 'grep tree HEAD^' '
> +       cat >expect <<-\EOF &&
> +       HEAD^:a:foobar
> +       HEAD^:b/b:bar
> +       HEAD^:submodule/a:foobar
> +       EOF
> +
> +       git grep -e "bar" --recurse-submodules HEAD^ > actual &&
> +       test_cmp expect actual
> +'
> +
> +test_expect_success 'grep tree HEAD^^' '
> +       cat >expect <<-\EOF &&
> +       HEAD^^:a:foobar
> +       HEAD^^:b/b:bar
> +       EOF
> +
> +       git grep -e "bar" --recurse-submodules HEAD^^ > actual &&
> +       test_cmp expect actual
> +'
> +
> +test_expect_success 'grep tree and pathspecs' '
> +       cat >expect <<-\EOF &&
> +       HEAD:submodule/a:foobar
> +       HEAD:submodule/sub/a:foobar
> +       EOF
> +
> +       git grep -e "bar" --recurse-submodules HEAD -- submodule > actual &&
> +       test_cmp expect actual
> +'

Mind adding tests for
* recursive submodules (say 2 levels), preferably not having the
  gitlink at the root each, i.e. root has a sub1 at path subs/sub1 and
  sub1 has a sub2 at path subs/sub2, such that recursing would produce
  a path like HEAD:subs/sub1/subs/sub2/dir/file ?
* file names with a colon in them
* instead of just HEAD referencing trees, maybe a sha1 referenced test as well
  (though it is not immediately clear what the benefit would be)
* what if the submodule doesn't have the commit referenced in the given sha1
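
To make the nested case concrete, here is a minimal standalone sketch of how
such an output name could be assembled. It is modeled on the super_prefix
branch of grep_sha1() in patch 4/6 (tree name, then super prefix, then the
path inside the repository). compose_output_name() and its parameters are
hypothetical names, not part of the series, and the sketch assumes that
tree_name_len counts the tree name including its trailing ':'.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the series.  It mirrors the
 * super_prefix branch of grep_sha1(): keep the tree name, insert the
 * super prefix, then append the path inside the repository.
 * Assumption: tree_name_len covers the tree name including the ':'.
 * Returns 0 on success, -1 if the result does not fit in 'out'.
 */
static int compose_output_name(char *out, size_t outlen,
			       const char *filename, size_t tree_name_len,
			       const char *super_prefix)
{
	int n;

	if (tree_name_len > strlen(filename))
		return -1;
	n = snprintf(out, outlen, "%.*s%s%s",
		     (int)tree_name_len, filename,
		     super_prefix ? super_prefix : "",
		     filename + tree_name_len);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Called with filename "HEAD:dir/file", tree_name_len 5 and super_prefix
"subs/sub1/subs/sub2/", the buffer ends up holding
"HEAD:subs/sub1/subs/sub2/dir/file", which is the shape of path a two-level
test would have to expect.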

Thanks,
Stefan

* Re: [PATCH v3 4/6] grep: optionally recurse into submodules
  2016-11-16  0:07       ` Stefan Beller
@ 2016-11-17 22:13         ` Brandon Williams
  0 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-17 22:13 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org, Jonathan Tan, Junio C Hamano

On 11/15, Stefan Beller wrote:
> > +       /*
> > +        * Limit number of threads for child process to use.
> > +        * This is to prevent potential fork-bomb behavior of git-grep as each
> > +        * submodule process has its own thread pool.
> > +        */
> > +       if (num_threads)
> > +               argv_array_pushf(&submodule_options, "--threads=%d",
> > +                                (num_threads + 1) / 2);
> 
> I think you would want to pass --threads=%d unconditionally,
> as it also serves as a weak defusal for fork bombs. Is it possible to come here
> with num_threads=0? (i.e. what happens if the user doesn't specify the number
> of threads or such, do we fall back to some default or is it just 0?)
> 
> I have seen some other places that check for num_threads unequal to 0,
> as e.g. no mutex needs to be locked then (assuming we don't have any
> thread but grep within the main process), but as you intend to use this also
> as a helper to not blow up the number of threads recursively, we'd need to
> pass at a number != 0 here?

The option parsing logic in cmd_grep handles the cases where num_threads
is some odd value (and fails if <0).  In the case where it is 0, it will
default to 8 under certain circumstances.  I figured I would just let
that logic handle the cases where num_threads ends up being 0 instead of
explicitly passing threads=1.  You can't pass threads=0 in some cases
due to the default "oh look threads==0, looks like we should use 8!"
case.
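
For reference, the halving arithmetic under discussion can be sketched in
isolation. child_threads() uses the same expression as the patch;
threads_at_depth() is a hypothetical helper added only to show how the count
shrinks as grep recurses. Nothing here models the option parsing in cmd_grep.

```c
/*
 * Same arithmetic as the patch: each child gets half of the parent's
 * threads, rounded up.
 */
static int child_threads(int num_threads)
{
	return (num_threads + 1) / 2;
}

/*
 * Hypothetical helper: thread count requested at a given recursion
 * depth, starting from 'top' threads at depth 0.
 */
static int threads_at_depth(int top, int depth)
{
	while (depth-- > 0)
		top = child_threads(top);
	return top;
}
```

Starting from 8 threads the sequence is 8, 4, 2, 1, 1, and so on: a positive
count never drops below 1.  An input of 0 stays 0, which is exactly the case
the question above is about.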

> 
> > +
> > +       git grep -e "bar" --and -e "foo" --recurse-submodules > actual &&
> 
> nit here and in the tests below:
> We prefer to have no white space between > and the file piped to.

I'll fix that up everywhere.

-- 
Brandon Williams

* Re: [PATCH v3 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-11-16  1:09       ` Stefan Beller
@ 2016-11-17 23:34         ` Brandon Williams
  0 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-17 23:34 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git@vger.kernel.org, Jonathan Tan, Junio C Hamano

On 11/15, Stefan Beller wrote:
> On Fri, Nov 11, 2016 at 3:51 PM, Brandon Williams <bmwill@google.com> wrote:
> > +       /*
> > +        * Add basename of parent project
> > +        * When performing grep on a <tree> object the filename is prefixed
> > +        * with the object's name: '<tree-name>:filename'.
> 
> This comment is hard to read as it's unclear what the <angle brackets> mean.
> (Are the supposed to indicate a variable? If so why is file name not marked up?)

Yeah you're right, the angle brackets don't really add anything to the
comment.  I'll drop them.

> >  In order to
> > +        * provide uniformity of output we want to pass the name of the
> > +        * parent project's object name to the submodule so the submodule can
> > +        * prefix its output with the parent's name and not its own SHA1.
> > +        */
> > +       if (end_of_base)
> > +               argv_array_pushf(&cp.args, "--parent-basename=%.*s",
> > +                                (int) (end_of_base - gs->name),
> > +                                gs->name);
> 
> Do we pass this only with the tree-ish?
> What if we are grepping the working tree and the file name contains a colon?

Actually you're right, this would only happen if we are passing a
tree-ish, which has a tree-name prefixed to the filename.  I'll add that
as an additional check to ensure that this handles file names with a
colon correctly....though why you have a colon in a filename is beyond
me :P
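
A minimal sketch of the guarded split, assuming the caller knows whether the
name carries a tree-name prefix. path_in_repo() and has_tree_name are
hypothetical names used for illustration; they only stand in for the kind of
additional check described above.

```c
#include <string.h>

/*
 * Hypothetical helper, not part of the series.  'name' is either
 * "tree-name:path" (has_tree_name != 0) or a plain path.  Returns the
 * path part and stores the length of the tree name in *base_len
 * (0 when there is none).
 */
static const char *path_in_repo(const char *name, int has_tree_name,
				size_t *base_len)
{
	const char *end_of_base = strchr(name, ':');

	if (has_tree_name && end_of_base) {
		*base_len = (size_t)(end_of_base - name);
		return end_of_base + 1;
	}
	*base_len = 0;
	return name;
}
```

Without the flag, a working-tree path such as "su:b" would be cut at its
colon; with the flag unset it is returned whole, while "HEAD:su:b" with the
flag set still yields "su:b" and a base length of 4.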

> > +test_expect_success 'grep tree HEAD^' '
> > +       cat >expect <<-\EOF &&
> > +       HEAD^:a:foobar
> > +       HEAD^:b/b:bar
> > +       HEAD^:submodule/a:foobar
> > +       EOF
> > +
> > +       git grep -e "bar" --recurse-submodules HEAD^ > actual &&
> > +       test_cmp expect actual
> > +'
> > +
> > +test_expect_success 'grep tree HEAD^^' '
> > +       cat >expect <<-\EOF &&
> > +       HEAD^^:a:foobar
> > +       HEAD^^:b/b:bar
> > +       EOF
> > +
> > +       git grep -e "bar" --recurse-submodules HEAD^^ > actual &&
> > +       test_cmp expect actual
> > +'
> > +
> > +test_expect_success 'grep tree and pathspecs' '
> > +       cat >expect <<-\EOF &&
> > +       HEAD:submodule/a:foobar
> > +       HEAD:submodule/sub/a:foobar
> > +       EOF
> > +
> > +       git grep -e "bar" --recurse-submodules HEAD -- submodule > actual &&
> > +       test_cmp expect actual
> > +'
> 
> Mind to add tests for
> * recursive submodules (say 2 levels), preferrably not having the
>   gitlink at the root each, i.e. root has a sub1 at path subs/sub1 and
> sub1 has a sub2
>   at path subs/sub2, such that recursing would produce a path like
>   HEAD:subs/sub1/subs/sub2/dir/file ?
> * file names with a colon in it
> * instead of just HEAD referencing trees, maybe a sha1 referenced test as well
>   (though it is not immediately clear what the benefit would be)
> * what if the submodule doesn't have the commit referenced in the given sha1

I'll add more tests too!

-- 
Brandon Williams

* [PATCH v4 0/6] recursively grep across submodules
  2016-11-11 23:51   ` [PATCH v3 0/6] recursively grep across submodules Brandon Williams
                       ` (6 preceding siblings ...)
  2016-11-15 17:42     ` [PATCH v3 0/6] recursively grep across submodules Stefan Beller
@ 2016-11-18 19:58     ` Brandon Williams
  2016-11-18 19:58       ` [PATCH v4 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
                         ` (7 more replies)
  7 siblings, 8 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-18 19:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

This revision of this series should address all of the problems brought up with
v3.

* indent output example in patch 5/6.
* fix ':' in submodule names and add a test to verify.
* clean up some comments.
* fixed tests to test the case where a submodule isn't at the root of a
  repository.
* always pass --threads=%d in order to limit the threads of child processes.

Brandon Williams (6):
  submodules: add helper functions to determine presence of submodules
  submodules: load gitmodules file from commit sha1
  grep: add submodules as a grep source type
  grep: optionally recurse into submodules
  grep: enable recurse-submodules to work on <tree> objects
  grep: search history of moved submodules

 Documentation/git-grep.txt         |  14 ++
 builtin/grep.c                     | 393 ++++++++++++++++++++++++++++++++++---
 cache.h                            |   2 +
 config.c                           |   8 +-
 git.c                              |   2 +-
 grep.c                             |  16 +-
 grep.h                             |   1 +
 submodule-config.c                 |   6 +-
 submodule-config.h                 |   3 +
 submodule.c                        |  50 +++++
 submodule.h                        |   3 +
 t/t7814-grep-recurse-submodules.sh | 213 ++++++++++++++++++++
 12 files changed, 679 insertions(+), 32 deletions(-)
 create mode 100755 t/t7814-grep-recurse-submodules.sh

-- interdiff based on 'bw/grep-recurse-submodules'

diff --git a/builtin/grep.c b/builtin/grep.c
index 1cd2be9..747b0c3 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -518,9 +518,8 @@ static void compile_submodule_options(const struct grep_opt *opt,
 	 * This is to prevent potential fork-bomb behavior of git-grep as each
 	 * submodule process has its own thread pool.
 	 */
-	if (num_threads)
-		argv_array_pushf(&submodule_options, "--threads=%d",
-				 (num_threads + 1) / 2);
+	argv_array_pushf(&submodule_options, "--threads=%d",
+			 (num_threads + 1) / 2);
 
 	/* Add Pathspecs */
 	argv_array_push(&submodule_options, "--");
@@ -542,7 +541,7 @@ static int grep_submodule_launch(struct grep_opt *opt,
 	struct work_item *w = opt->output_priv;
 
 	end_of_base = strchr(gs->name, ':');
-	if (end_of_base)
+	if (gs->identifier && end_of_base)
 		name = end_of_base + 1;
 	else
 		name = gs->name;
@@ -558,13 +557,13 @@ static int grep_submodule_launch(struct grep_opt *opt,
 
 	/*
 	 * Add basename of parent project
-	 * When performing grep on a <tree> object the filename is prefixed
-	 * with the object's name: '<tree-name>:filename'.  In order to
+	 * When performing grep on a tree object the filename is prefixed
+	 * with the object's name: 'tree-name:filename'.  In order to
 	 * provide uniformity of output we want to pass the name of the
 	 * parent project's object name to the submodule so the submodule can
 	 * prefix its output with the parent's name and not its own SHA1.
 	 */
-	if (end_of_base)
+	if (gs->identifier && end_of_base)
 		argv_array_pushf(&cp.args, "--parent-basename=%.*s",
 				 (int) (end_of_base - gs->name),
 				 gs->name);
@@ -572,7 +571,7 @@ static int grep_submodule_launch(struct grep_opt *opt,
 	/* Add options */
 	for (i = 0; i < submodule_options.argc; i++) {
 		/*
-		 * If there is a <tree> identifier for the submodule, add the
+		 * If there is a tree identifier for the submodule, add the
 		 * rev after adding the submodule options but before the
 		 * pathspecs.  To do this we listen for the '--' and insert the
 		 * sha1 before pushing the '--' onto the child process argv
@@ -615,17 +614,20 @@ static int grep_submodule_launch(struct grep_opt *opt,
 static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1,
 			  const char *filename, const char *path)
 {
-	if (!(is_submodule_initialized(path) &&
-	      is_submodule_populated(path))) {
+	if (!is_submodule_initialized(path))
+		return 0;
+	if (!is_submodule_populated(path)) {
 		/*
 		 * If searching history, check for the presense of the
 		 * submodule's gitdir before skipping the submodule.
 		 */
 		if (sha1) {
-			path = git_path("modules/%s",
-					submodule_from_path(null_sha1, path)->name);
+			const struct submodule *sub =
+					submodule_from_path(null_sha1, path);
+			if (sub)
+				path = git_path("modules/%s", sub->name);
 
-			if (!(is_directory(path) && is_git_directory(path)))
+			if(!(is_directory(path) && is_git_directory(path)))
 				return 0;
 		} else {
 			return 0;
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index ee173ad..7d66716 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -60,7 +60,7 @@ test_expect_success 'grep and nested submodules' '
 	submodule/sub/a:foobar
 	EOF
 
-	git grep -e "bar" --recurse-submodules > actual &&
+	git grep -e "bar" --recurse-submodules >actual &&
 	test_cmp expect actual
 '
 
@@ -71,7 +71,7 @@ test_expect_success 'grep and multiple patterns' '
 	submodule/sub/a:foobar
 	EOF
 
-	git grep -e "bar" --and -e "foo" --recurse-submodules > actual &&
+	git grep -e "bar" --and -e "foo" --recurse-submodules >actual &&
 	test_cmp expect actual
 '
 
@@ -80,7 +80,7 @@ test_expect_success 'grep and multiple patterns' '
 	b/b:bar
 	EOF
 
-	git grep -e "bar" --and --not -e "foo" --recurse-submodules > actual &&
+	git grep -e "bar" --and --not -e "foo" --recurse-submodules >actual &&
 	test_cmp expect actual
 '
 
@@ -92,7 +92,7 @@ test_expect_success 'basic grep tree' '
 	HEAD:submodule/sub/a:foobar
 	EOF
 
-	git grep -e "bar" --recurse-submodules HEAD > actual &&
+	git grep -e "bar" --recurse-submodules HEAD >actual &&
 	test_cmp expect actual
 '
 
@@ -103,7 +103,7 @@ test_expect_success 'grep tree HEAD^' '
 	HEAD^:submodule/a:foobar
 	EOF
 
-	git grep -e "bar" --recurse-submodules HEAD^ > actual &&
+	git grep -e "bar" --recurse-submodules HEAD^ >actual &&
 	test_cmp expect actual
 '
 
@@ -113,7 +113,7 @@ test_expect_success 'grep tree HEAD^^' '
 	HEAD^^:b/b:bar
 	EOF
 
-	git grep -e "bar" --recurse-submodules HEAD^^ > actual &&
+	git grep -e "bar" --recurse-submodules HEAD^^ >actual &&
 	test_cmp expect actual
 '
 
@@ -123,49 +123,80 @@ test_expect_success 'grep tree and pathspecs' '
 	HEAD:submodule/sub/a:foobar
 	EOF
 
-	git grep -e "bar" --recurse-submodules HEAD -- submodule > actual &&
+	git grep -e "bar" --recurse-submodules HEAD -- submodule >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep recurse submodule colon in name' '
+	git init parent &&
+	test_when_finished "rm -rf parent" &&
+	echo "foobar" >"parent/fi:le" &&
+	git -C parent add "fi:le" &&
+	git -C parent commit -m "add fi:le" &&
+
+	git init "su:b" &&
+	test_when_finished "rm -rf su:b" &&
+	echo "foobar" >"su:b/fi:le" &&
+	git -C "su:b" add "fi:le" &&
+	git -C "su:b" commit -m "add fi:le" &&
+
+	git -C parent submodule add "../su:b" "su:b" &&
+	git -C parent commit -m "add submodule" &&
+
+	cat >expect <<-\EOF &&
+	fi:le:foobar
+	su:b/fi:le:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules >actual &&
+	test_cmp expect actual &&
+
+	cat >expect <<-\EOF &&
+	HEAD:fi:le:foobar
+	HEAD:su:b/fi:le:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules HEAD >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success 'grep history with moved submoules' '
 	git init parent &&
+	test_when_finished "rm -rf parent" &&
 	echo "foobar" >parent/file &&
 	git -C parent add file &&
 	git -C parent commit -m "add file" &&
 
 	git init sub &&
+	test_when_finished "rm -rf sub" &&
 	echo "foobar" >sub/file &&
 	git -C sub add file &&
 	git -C sub commit -m "add file" &&
 
-	git -C parent submodule add ../sub &&
+	git -C parent submodule add ../sub dir/sub &&
 	git -C parent commit -m "add submodule" &&
 
 	cat >expect <<-\EOF &&
+	dir/sub/file:foobar
 	file:foobar
-	sub/file:foobar
 	EOF
-	git -C parent grep -e "foobar" --recurse-submodules > actual &&
+	git -C parent grep -e "foobar" --recurse-submodules >actual &&
 	test_cmp expect actual &&
 
-	git -C parent mv sub sub-moved &&
+	git -C parent mv dir/sub sub-moved &&
 	git -C parent commit -m "moved submodule" &&
 
 	cat >expect <<-\EOF &&
 	file:foobar
 	sub-moved/file:foobar
 	EOF
-	git -C parent grep -e "foobar" --recurse-submodules > actual &&
+	git -C parent grep -e "foobar" --recurse-submodules >actual &&
 	test_cmp expect actual &&
 
 	cat >expect <<-\EOF &&
+	HEAD^:dir/sub/file:foobar
 	HEAD^:file:foobar
-	HEAD^:sub/file:foobar
 	EOF
-	git -C parent grep -e "foobar" --recurse-submodules HEAD^ > actual &&
-	test_cmp expect actual &&
-
-	rm -rf parent sub
+	git -C parent grep -e "foobar" --recurse-submodules HEAD^ >actual &&
+	test_cmp expect actual
 '
 
 test_incompatible_with_recurse_submodules ()

-- 
2.8.0.rc3.226.g39d4020


* [PATCH v4 1/6] submodules: add helper functions to determine presence of submodules
  2016-11-18 19:58     ` [PATCH v4 " Brandon Williams
@ 2016-11-18 19:58       ` Brandon Williams
  2016-11-18 19:58       ` [PATCH v4 2/6] submodules: load gitmodules file from commit sha1 Brandon Williams
                         ` (6 subsequent siblings)
  7 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-18 19:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

Add two helper functions to submodule.c.
`is_submodule_initialized()` checks if a submodule has been initialized
at a given path and `is_submodule_populated()` checks if a submodule
has been checked out at a given path.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 submodule.c | 38 ++++++++++++++++++++++++++++++++++++++
 submodule.h |  2 ++
 2 files changed, 40 insertions(+)

diff --git a/submodule.c b/submodule.c
index 6f7d883..f5107f0 100644
--- a/submodule.c
+++ b/submodule.c
@@ -198,6 +198,44 @@ void gitmodules_config(void)
 	}
 }
 
+/*
+ * Determine if a submodule has been initialized at a given 'path'
+ */
+int is_submodule_initialized(const char *path)
+{
+	int ret = 0;
+	const struct submodule *module = NULL;
+
+	module = submodule_from_path(null_sha1, path);
+
+	if (module) {
+		char *key = xstrfmt("submodule.%s.url", module->name);
+		char *value = NULL;
+
+		ret = !git_config_get_string(key, &value);
+
+		free(value);
+		free(key);
+	}
+
+	return ret;
+}
+
+/*
+ * Determine if a submodule has been populated at a given 'path'
+ */
+int is_submodule_populated(const char *path)
+{
+	int ret = 0;
+	char *gitdir = xstrfmt("%s/.git", path);
+
+	if (resolve_gitdir(gitdir))
+		ret = 1;
+
+	free(gitdir);
+	return ret;
+}
+
 int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst)
 {
diff --git a/submodule.h b/submodule.h
index d9e197a..6ec5f2f 100644
--- a/submodule.h
+++ b/submodule.h
@@ -37,6 +37,8 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
 		const char *path);
 int submodule_config(const char *var, const char *value, void *cb);
 void gitmodules_config(void);
+extern int is_submodule_initialized(const char *path);
+extern int is_submodule_populated(const char *path);
 int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst);
 const char *submodule_strategy_to_string(const struct submodule_update_strategy *s);
-- 
2.8.0.rc3.226.g39d4020
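
As a rough standalone illustration of the "populated" check in the patch
above: the real helper builds "<path>/.git" and hands it to resolve_gitdir().
The sketch below swaps that call for a plain existence test with POSIX
access(), so it is weaker than the real check. looks_populated() is a
hypothetical name, not part of the series.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical approximation of is_submodule_populated().
 * Assumption: only the existence of "<path>/.git" is tested; the
 * real helper resolves the gitdir instead.
 * Returns 1 if "<path>/.git" exists, 0 otherwise.
 */
static int looks_populated(const char *path)
{
	char gitdir[4096];
	int n = snprintf(gitdir, sizeof(gitdir), "%s/.git", path);

	if (n < 0 || (size_t)n >= sizeof(gitdir))
		return 0;
	return access(gitdir, F_OK) == 0;
}
```

A caller in the spirit of the series would skip a submodule whenever this
returns 0.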


* [PATCH v4 2/6] submodules: load gitmodules file from commit sha1
  2016-11-18 19:58     ` [PATCH v4 " Brandon Williams
  2016-11-18 19:58       ` [PATCH v4 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
@ 2016-11-18 19:58       ` Brandon Williams
  2016-11-18 19:58       ` [PATCH v4 3/6] grep: add submodules as a grep source type Brandon Williams
                         ` (5 subsequent siblings)
  7 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-18 19:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

Teach submodules to load a '.gitmodules' file from a commit sha1.  This
enables the population of the submodule_cache to be based on the state
of the '.gitmodules' file from a particular commit.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 cache.h            |  2 ++
 config.c           |  8 ++++----
 submodule-config.c |  6 +++---
 submodule-config.h |  3 +++
 submodule.c        | 12 ++++++++++++
 submodule.h        |  1 +
 6 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/cache.h b/cache.h
index 1be6526..559a461 100644
--- a/cache.h
+++ b/cache.h
@@ -1690,6 +1690,8 @@ extern int git_default_config(const char *, const char *, void *);
 extern int git_config_from_file(config_fn_t fn, const char *, void *);
 extern int git_config_from_mem(config_fn_t fn, const enum config_origin_type,
 					const char *name, const char *buf, size_t len, void *data);
+extern int git_config_from_blob_sha1(config_fn_t fn, const char *name,
+				     const unsigned char *sha1, void *data);
 extern void git_config_push_parameter(const char *text);
 extern int git_config_from_parameters(config_fn_t fn, void *data);
 extern void git_config(config_fn_t fn, void *);
diff --git a/config.c b/config.c
index 83fdecb..4d78e72 100644
--- a/config.c
+++ b/config.c
@@ -1214,10 +1214,10 @@ int git_config_from_mem(config_fn_t fn, const enum config_origin_type origin_typ
 	return do_config_from(&top, fn, data);
 }
 
-static int git_config_from_blob_sha1(config_fn_t fn,
-				     const char *name,
-				     const unsigned char *sha1,
-				     void *data)
+int git_config_from_blob_sha1(config_fn_t fn,
+			      const char *name,
+			      const unsigned char *sha1,
+			      void *data)
 {
 	enum object_type type;
 	char *buf;
diff --git a/submodule-config.c b/submodule-config.c
index 098085b..8b9a2ef 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -379,9 +379,9 @@ static int parse_config(const char *var, const char *value, void *data)
 	return ret;
 }
 
-static int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
-				      unsigned char *gitmodules_sha1,
-				      struct strbuf *rev)
+int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
+			       unsigned char *gitmodules_sha1,
+			       struct strbuf *rev)
 {
 	int ret = 0;
 
diff --git a/submodule-config.h b/submodule-config.h
index d05c542..78584ba 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -29,6 +29,9 @@ const struct submodule *submodule_from_name(const unsigned char *commit_sha1,
 		const char *name);
 const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
 		const char *path);
+extern int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
+				      unsigned char *gitmodules_sha1,
+				      struct strbuf *rev);
 void submodule_free(void);
 
 #endif /* SUBMODULE_CONFIG_H */
diff --git a/submodule.c b/submodule.c
index f5107f0..062e58b 100644
--- a/submodule.c
+++ b/submodule.c
@@ -198,6 +198,18 @@ void gitmodules_config(void)
 	}
 }
 
+void gitmodules_config_sha1(const unsigned char *commit_sha1)
+{
+	struct strbuf rev = STRBUF_INIT;
+	unsigned char sha1[20];
+
+	if (gitmodule_sha1_from_commit(commit_sha1, sha1, &rev)) {
+		git_config_from_blob_sha1(submodule_config, rev.buf,
+					  sha1, NULL);
+	}
+	strbuf_release(&rev);
+}
+
 /*
  * Determine if a submodule has been initialized at a given 'path'
  */
diff --git a/submodule.h b/submodule.h
index 6ec5f2f..9203d89 100644
--- a/submodule.h
+++ b/submodule.h
@@ -37,6 +37,7 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
 		const char *path);
 int submodule_config(const char *var, const char *value, void *cb);
 void gitmodules_config(void);
+extern void gitmodules_config_sha1(const unsigned char *commit_sha1);
 extern int is_submodule_initialized(const char *path);
 extern int is_submodule_populated(const char *path);
 int parse_submodule_update_strategy(const char *value,
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v4 3/6] grep: add submodules as a grep source type
  2016-11-18 19:58     ` [PATCH v4 " Brandon Williams
  2016-11-18 19:58       ` [PATCH v4 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
  2016-11-18 19:58       ` [PATCH v4 2/6] submodules: load gitmodules file from commit sha1 Brandon Williams
@ 2016-11-18 19:58       ` Brandon Williams
  2016-11-18 21:37         ` Junio C Hamano
  2016-11-18 19:58       ` [PATCH v4 4/6] grep: optionally recurse into submodules Brandon Williams
                         ` (4 subsequent siblings)
  7 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-11-18 19:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

Add `GREP_SOURCE_SUBMODULE` as a grep_source type and cases for this new
type in the various switch statements in grep.c.

When initializing a grep_source with type `GREP_SOURCE_SUBMODULE` the
identifier can either be NULL (to indicate that the working tree will be
used) or a SHA1 (the REV of the submodule to be grep'd).  If the
identifier is a SHA1 then we want to fall through to the
`GREP_SOURCE_SHA1` case to handle the copying of the SHA1.
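
The fall-through described above can be sketched in isolation.  This is a
standalone mock-up, not the grep.c code: the enum, struct and function names
(source_type, source, source_init, dup_bytes) are made up for the example,
and 20 is the raw SHA1 length.

```c
#include <stdlib.h>
#include <string.h>

enum source_type { SRC_SHA1, SRC_FILE, SRC_BUF, SRC_SUBMODULE };

struct source {
	enum source_type type;
	void *identifier;	/* owned copy, or NULL */
};

/* Duplicate len bytes; abort on allocation failure, much like xmalloc(). */
static void *dup_bytes(const void *src, size_t len)
{
	void *copy = malloc(len);

	if (!copy)
		abort();
	memcpy(copy, src, len);
	return copy;
}

static void source_init(struct source *s, enum source_type type,
			const void *identifier)
{
	s->type = type;
	s->identifier = NULL;

	switch (type) {
	case SRC_FILE:
		/* identifier is a NUL-terminated path */
		s->identifier = dup_bytes(identifier, strlen(identifier) + 1);
		break;
	case SRC_SUBMODULE:
		if (!identifier)
			break;	/* working tree: nothing to copy */
		/* fall through */
	case SRC_SHA1:
		/* identifier is a 20-byte SHA1 */
		s->identifier = dup_bytes(identifier, 20);
		break;
	case SRC_BUF:
		break;
	}
}
```

The early break keeps the NULL case out of the copy, so only a non-NULL
submodule identifier shares the SHA1 path.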

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 grep.c | 16 +++++++++++++++-
 grep.h |  1 +
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 1194d35..0dbdc1d 100644
--- a/grep.c
+++ b/grep.c
@@ -1735,12 +1735,23 @@ void grep_source_init(struct grep_source *gs, enum grep_source_type type,
 	case GREP_SOURCE_FILE:
 		gs->identifier = xstrdup(identifier);
 		break;
+	case GREP_SOURCE_SUBMODULE:
+		if (!identifier) {
+			gs->identifier = NULL;
+			break;
+		}
+		/*
+		 * FALL THROUGH
+		 * If the identifier is non-NULL (in the submodule case) it
+		 * will be a SHA1 that needs to be copied.
+		 */
 	case GREP_SOURCE_SHA1:
 		gs->identifier = xmalloc(20);
 		hashcpy(gs->identifier, identifier);
 		break;
 	case GREP_SOURCE_BUF:
 		gs->identifier = NULL;
+		break;
 	}
 }
 
@@ -1760,6 +1771,7 @@ void grep_source_clear_data(struct grep_source *gs)
 	switch (gs->type) {
 	case GREP_SOURCE_FILE:
 	case GREP_SOURCE_SHA1:
+	case GREP_SOURCE_SUBMODULE:
 		free(gs->buf);
 		gs->buf = NULL;
 		gs->size = 0;
@@ -1831,8 +1843,10 @@ static int grep_source_load(struct grep_source *gs)
 		return grep_source_load_sha1(gs);
 	case GREP_SOURCE_BUF:
 		return gs->buf ? 0 : -1;
+	case GREP_SOURCE_SUBMODULE:
+		break;
 	}
-	die("BUG: invalid grep_source type");
+	die("BUG: invalid grep_source type to load");
 }
 
 void grep_source_load_driver(struct grep_source *gs)
diff --git a/grep.h b/grep.h
index 5856a23..267534c 100644
--- a/grep.h
+++ b/grep.h
@@ -161,6 +161,7 @@ struct grep_source {
 		GREP_SOURCE_SHA1,
 		GREP_SOURCE_FILE,
 		GREP_SOURCE_BUF,
+		GREP_SOURCE_SUBMODULE,
 	} type;
 	void *identifier;
 
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v4 4/6] grep: optionally recurse into submodules
  2016-11-18 19:58     ` [PATCH v4 " Brandon Williams
                         ` (2 preceding siblings ...)
  2016-11-18 19:58       ` [PATCH v4 3/6] grep: add submodules as a grep source type Brandon Williams
@ 2016-11-18 19:58       ` Brandon Williams
  2016-11-18 21:48         ` Junio C Hamano
                           ` (2 more replies)
  2016-11-18 19:58       ` [PATCH v4 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
                         ` (3 subsequent siblings)
  7 siblings, 3 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-18 19:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

Allow grep to recognize submodules and recursively search for patterns in
each submodule.  This is done by forking off a process to recursively
call grep on each submodule.  The top level --super-prefix option is
used to pass a path to the submodule which can in turn be used to
prepend to output or in pathspec matching logic.
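
To make the prefix handling concrete, here is a small sketch of how a child's
prefix is derived from the parent's.  It mirrors the "%s%s/" format used in
the patch; the helper name child_super_prefix() is invented for this example.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the prefix handed to a child process from the
 * parent's prefix (may be NULL at the top level) and the submodule name.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int child_super_prefix(char *buf, size_t size,
			      const char *super_prefix, const char *name)
{
	int n = snprintf(buf, size, "%s%s/",
			 super_prefix ? super_prefix : "", name);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

At the top level a submodule named "submodule" yields "submodule/", and a
nested submodule "sub" inside it yields "submodule/sub/", which is why the
nested test below expects paths such as submodule/sub/a.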

Recursion only occurs for submodules which have been initialized and
checked out by the parent project.  If a submodule hasn't been
initialized and checked out it is simply skipped.

In order to support the existing multi-threading infrastructure in grep,
output from each child process is captured in a strbuf so that it can be
later printed to the console in an ordered fashion.

To limit the number of threads that are created, each child process gets
half the number of threads of its parent (minimum of 1); otherwise we
potentially have a fork-bomb.
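
The halving rule is the "(num_threads + 1) / 2" expression in the patch.  A
minimal sketch of how the per-process count shrinks with nesting depth (the
function names are made up for illustration):

```c
/* Threads handed to a child grep: half the parent's, rounded up. */
static int child_threads(int parent_threads)
{
	return (parent_threads + 1) / 2;
}

/* Threads used by a grep process `depth` submodule levels down. */
static int threads_at_depth(int top_threads, int depth)
{
	int n = top_threads;

	while (depth-- > 0)
		n = child_threads(n);
	return n;
}
```

Starting from the default of 8 the sequence is 8, 4, 2, 1, 1, ...; rounding
up is what keeps the minimum at 1.  A parent running with 0 threads
(threading disabled) passes 0 on to its children.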

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 Documentation/git-grep.txt         |   5 +
 builtin/grep.c                     | 300 ++++++++++++++++++++++++++++++++++---
 git.c                              |   2 +-
 t/t7814-grep-recurse-submodules.sh |  99 ++++++++++++
 4 files changed, 385 insertions(+), 21 deletions(-)
 create mode 100755 t/t7814-grep-recurse-submodules.sh

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 0ecea6e..17aa1ba 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -26,6 +26,7 @@ SYNOPSIS
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
+	   [--recurse-submodules]
 	   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...]
 	   [--] [<pathspec>...]
 
@@ -88,6 +89,10 @@ OPTIONS
 	mechanism.  Only useful when searching files in the current directory
 	with `--no-index`.
 
+--recurse-submodules::
+	Recursively search in each submodule that has been initialized and
+	checked out in the repository.
+
 -a::
 --text::
 	Process binary files as if they were text.
diff --git a/builtin/grep.c b/builtin/grep.c
index 8887b6a..cfafa15 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -18,12 +18,20 @@
 #include "quote.h"
 #include "dir.h"
 #include "pathspec.h"
+#include "submodule.h"
 
 static char const * const grep_usage[] = {
 	N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"),
 	NULL
 };
 
+static const char *super_prefix;
+static int recurse_submodules;
+static struct argv_array submodule_options = ARGV_ARRAY_INIT;
+
+static int grep_submodule_launch(struct grep_opt *opt,
+				 const struct grep_source *gs);
+
 #define GREP_NUM_THREADS_DEFAULT 8
 static int num_threads;
 
@@ -174,7 +182,10 @@ static void *run(void *arg)
 			break;
 
 		opt->output_priv = w;
-		hit |= grep_source(opt, &w->source);
+		if (w->source.type == GREP_SOURCE_SUBMODULE)
+			hit |= grep_submodule_launch(opt, &w->source);
+		else
+			hit |= grep_source(opt, &w->source);
 		grep_source_clear_data(&w->source);
 		work_done(w);
 	}
@@ -300,6 +311,10 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 	if (opt->relative && opt->prefix_length) {
 		quote_path_relative(filename + tree_name_len, opt->prefix, &pathbuf);
 		strbuf_insert(&pathbuf, 0, filename, tree_name_len);
+	} else if (super_prefix) {
+		strbuf_add(&pathbuf, filename, tree_name_len);
+		strbuf_addstr(&pathbuf, super_prefix);
+		strbuf_addstr(&pathbuf, filename + tree_name_len);
 	} else {
 		strbuf_addstr(&pathbuf, filename);
 	}
@@ -328,10 +343,13 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 {
 	struct strbuf buf = STRBUF_INIT;
 
-	if (opt->relative && opt->prefix_length)
+	if (opt->relative && opt->prefix_length) {
 		quote_path_relative(filename, opt->prefix, &buf);
-	else
+	} else {
+		if (super_prefix)
+			strbuf_addstr(&buf, super_prefix);
 		strbuf_addstr(&buf, filename);
+	}
 
 #ifndef NO_PTHREADS
 	if (num_threads) {
@@ -378,31 +396,258 @@ static void run_pager(struct grep_opt *opt, const char *prefix)
 		exit(status);
 }
 
-static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int cached)
+static void compile_submodule_options(const struct grep_opt *opt,
+				      const struct pathspec *pathspec,
+				      int cached, int untracked,
+				      int opt_exclude, int use_index,
+				      int pattern_type_arg)
+{
+	struct grep_pat *pattern;
+	int i;
+
+	if (recurse_submodules)
+		argv_array_push(&submodule_options, "--recurse-submodules");
+
+	if (cached)
+		argv_array_push(&submodule_options, "--cached");
+	if (!use_index)
+		argv_array_push(&submodule_options, "--no-index");
+	if (untracked)
+		argv_array_push(&submodule_options, "--untracked");
+	if (opt_exclude > 0)
+		argv_array_push(&submodule_options, "--exclude-standard");
+
+	if (opt->invert)
+		argv_array_push(&submodule_options, "-v");
+	if (opt->ignore_case)
+		argv_array_push(&submodule_options, "-i");
+	if (opt->word_regexp)
+		argv_array_push(&submodule_options, "-w");
+	switch (opt->binary) {
+	case GREP_BINARY_NOMATCH:
+		argv_array_push(&submodule_options, "-I");
+		break;
+	case GREP_BINARY_TEXT:
+		argv_array_push(&submodule_options, "-a");
+		break;
+	default:
+		break;
+	}
+	if (opt->allow_textconv)
+		argv_array_push(&submodule_options, "--textconv");
+	if (opt->max_depth != -1)
+		argv_array_pushf(&submodule_options, "--max-depth=%d",
+				 opt->max_depth);
+	if (opt->linenum)
+		argv_array_push(&submodule_options, "-n");
+	if (!opt->pathname)
+		argv_array_push(&submodule_options, "-h");
+	if (!opt->relative)
+		argv_array_push(&submodule_options, "--full-name");
+	if (opt->name_only)
+		argv_array_push(&submodule_options, "-l");
+	if (opt->unmatch_name_only)
+		argv_array_push(&submodule_options, "-L");
+	if (opt->null_following_name)
+		argv_array_push(&submodule_options, "-z");
+	if (opt->count)
+		argv_array_push(&submodule_options, "-c");
+	if (opt->file_break)
+		argv_array_push(&submodule_options, "--break");
+	if (opt->heading)
+		argv_array_push(&submodule_options, "--heading");
+	if (opt->pre_context)
+		argv_array_pushf(&submodule_options, "--before-context=%d",
+				 opt->pre_context);
+	if (opt->post_context)
+		argv_array_pushf(&submodule_options, "--after-context=%d",
+				 opt->post_context);
+	if (opt->funcname)
+		argv_array_push(&submodule_options, "-p");
+	if (opt->funcbody)
+		argv_array_push(&submodule_options, "-W");
+	if (opt->all_match)
+		argv_array_push(&submodule_options, "--all-match");
+	if (opt->debug)
+		argv_array_push(&submodule_options, "--debug");
+	if (opt->status_only)
+		argv_array_push(&submodule_options, "-q");
+
+	switch (pattern_type_arg) {
+	case GREP_PATTERN_TYPE_BRE:
+		argv_array_push(&submodule_options, "-G");
+		break;
+	case GREP_PATTERN_TYPE_ERE:
+		argv_array_push(&submodule_options, "-E");
+		break;
+	case GREP_PATTERN_TYPE_FIXED:
+		argv_array_push(&submodule_options, "-F");
+		break;
+	case GREP_PATTERN_TYPE_PCRE:
+		argv_array_push(&submodule_options, "-P");
+		break;
+	case GREP_PATTERN_TYPE_UNSPECIFIED:
+		break;
+	}
+
+	for (pattern = opt->pattern_list; pattern != NULL;
+	     pattern = pattern->next) {
+		switch (pattern->token) {
+		case GREP_PATTERN:
+			argv_array_pushf(&submodule_options, "-e%s",
+					 pattern->pattern);
+			break;
+		case GREP_AND:
+		case GREP_OPEN_PAREN:
+		case GREP_CLOSE_PAREN:
+		case GREP_NOT:
+		case GREP_OR:
+			argv_array_push(&submodule_options, pattern->pattern);
+			break;
+		/* BODY and HEAD are not used by git-grep */
+		case GREP_PATTERN_BODY:
+		case GREP_PATTERN_HEAD:
+			break;
+		}
+	}
+
+	/*
+	 * Limit number of threads for child process to use.
+	 * This is to prevent potential fork-bomb behavior of git-grep as each
+	 * submodule process has its own thread pool.
+	 */
+	argv_array_pushf(&submodule_options, "--threads=%d",
+			 (num_threads + 1) / 2);
+
+	/* Add Pathspecs */
+	argv_array_push(&submodule_options, "--");
+	for (i = 0; i < pathspec->nr; i++)
+		argv_array_push(&submodule_options,
+				pathspec->items[i].original);
+}
+
+/*
+ * Launch child process to grep contents of a submodule
+ */
+static int grep_submodule_launch(struct grep_opt *opt,
+				 const struct grep_source *gs)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	int status, i;
+	struct work_item *w = opt->output_priv;
+
+	prepare_submodule_repo_env(&cp.env_array);
+
+	/* Add super prefix */
+	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
+			 super_prefix ? super_prefix : "",
+			 gs->name);
+	argv_array_push(&cp.args, "grep");
+
+	/* Add options */
+	for (i = 0; i < submodule_options.argc; i++)
+		argv_array_push(&cp.args, submodule_options.argv[i]);
+
+	cp.git_cmd = 1;
+	cp.dir = gs->path;
+
+	/*
+	 * Capture output to output buffer and check the return code from the
+	 * child process.  A '0' indicates a hit, a '1' indicates no hit and
+	 * anything else is an error.
+	 */
+	status = capture_command(&cp, &w->out, 0);
+	if (status && (status != 1)) {
+		/* flush the buffer */
+		write_or_die(1, w->out.buf, w->out.len);
+		die("process for submodule '%s' failed with exit code: %d",
+		    gs->name, status);
+	}
+
+	/* invert the return code to make a hit equal to 1 */
+	return !status;
+}
+
+/*
+ * Prep grep structures for a submodule grep
+ * sha1: the sha1 of the submodule or NULL if using the working tree
+ * filename: name of the submodule including tree name of parent
+ * path: location of the submodule
+ */
+static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1,
+			  const char *filename, const char *path)
+{
+	if (!is_submodule_initialized(path))
+		return 0;
+	if (!is_submodule_populated(path))
+		return 0;
+
+#ifndef NO_PTHREADS
+	if (num_threads) {
+		add_work(opt, GREP_SOURCE_SUBMODULE, filename, path, sha1);
+		return 0;
+	} else
+#endif
+	{
+		struct work_item w;
+		int hit;
+
+		grep_source_init(&w.source, GREP_SOURCE_SUBMODULE,
+				 filename, path, sha1);
+		strbuf_init(&w.out, 0);
+		opt->output_priv = &w;
+		hit = grep_submodule_launch(opt, &w.source);
+
+		write_or_die(1, w.out.buf, w.out.len);
+
+		grep_source_clear(&w.source);
+		strbuf_release(&w.out);
+		return hit;
+	}
+}
+
+static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec,
+		      int cached)
 {
 	int hit = 0;
 	int nr;
+	struct strbuf name = STRBUF_INIT;
+	int name_base_len = 0;
+	if (super_prefix) {
+		name_base_len = strlen(super_prefix);
+		strbuf_addstr(&name, super_prefix);
+	}
+
 	read_cache();
 
 	for (nr = 0; nr < active_nr; nr++) {
 		const struct cache_entry *ce = active_cache[nr];
-		if (!S_ISREG(ce->ce_mode))
-			continue;
-		if (!ce_path_match(ce, pathspec, NULL))
-			continue;
-		/*
-		 * If CE_VALID is on, we assume worktree file and its cache entry
-		 * are identical, even if worktree file has been modified, so use
-		 * cache version instead
-		 */
-		if (cached || (ce->ce_flags & CE_VALID) || ce_skip_worktree(ce)) {
-			if (ce_stage(ce) || ce_intent_to_add(ce))
-				continue;
-			hit |= grep_sha1(opt, ce->oid.hash, ce->name, 0,
-					 ce->name);
+		strbuf_setlen(&name, name_base_len);
+		strbuf_addstr(&name, ce->name);
+
+		if (S_ISREG(ce->ce_mode) &&
+		    match_pathspec(pathspec, name.buf, name.len, 0, NULL,
+				   S_ISDIR(ce->ce_mode) ||
+				   S_ISGITLINK(ce->ce_mode))) {
+			/*
+			 * If CE_VALID is on, we assume worktree file and its
+			 * cache entry are identical, even if worktree file has
+			 * been modified, so use cache version instead
+			 */
+			if (cached || (ce->ce_flags & CE_VALID) ||
+			    ce_skip_worktree(ce)) {
+				if (ce_stage(ce) || ce_intent_to_add(ce))
+					continue;
+				hit |= grep_sha1(opt, ce->oid.hash, ce->name,
+						 0, ce->name);
+			} else {
+				hit |= grep_file(opt, ce->name);
+			}
+		} else if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
+			   submodule_path_match(pathspec, name.buf, NULL)) {
+			hit |= grep_submodule(opt, NULL, ce->name, ce->name);
 		}
-		else
-			hit |= grep_file(opt, ce->name);
+
 		if (ce_stage(ce)) {
 			do {
 				nr++;
@@ -413,6 +658,8 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int
 		if (hit && opt->status_only)
 			break;
 	}
+
+	strbuf_release(&name);
 	return hit;
 }
 
@@ -651,6 +898,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			N_("search in both tracked and untracked files")),
 		OPT_SET_INT(0, "exclude-standard", &opt_exclude,
 			    N_("ignore files specified via '.gitignore'"), 1),
+		OPT_BOOL(0, "recurse-submodules", &recurse_submodules,
+			 N_("recursively search in each submodule")),
 		OPT_GROUP(""),
 		OPT_BOOL('v', "invert-match", &opt.invert,
 			N_("show non-matching lines")),
@@ -755,6 +1004,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	init_grep_defaults();
 	git_config(grep_cmd_config, NULL);
 	grep_init(&opt, prefix);
+	super_prefix = get_super_prefix();
 
 	/*
 	 * If there is no -- then the paths must exist in the working
@@ -872,6 +1122,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	pathspec.max_depth = opt.max_depth;
 	pathspec.recursive = 1;
 
+	if (recurse_submodules) {
+		gitmodules_config();
+		compile_submodule_options(&opt, &pathspec, cached, untracked,
+					  opt_exclude, use_index,
+					  pattern_type_arg);
+	}
+
 	if (show_in_pager && (cached || list.nr))
 		die(_("--open-files-in-pager only works on the worktree"));
 
@@ -895,6 +1152,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		}
 	}
 
+	if (recurse_submodules && (!use_index || untracked || list.nr))
+		die(_("option not supported with --recurse-submodules."));
+
 	if (!show_in_pager && !opt.status_only)
 		setup_pager();
 
diff --git a/git.c b/git.c
index efa1059..a156efd 100644
--- a/git.c
+++ b/git.c
@@ -434,7 +434,7 @@ static struct cmd_struct commands[] = {
 	{ "fsck-objects", cmd_fsck, RUN_SETUP },
 	{ "gc", cmd_gc, RUN_SETUP },
 	{ "get-tar-commit-id", cmd_get_tar_commit_id },
-	{ "grep", cmd_grep, RUN_SETUP_GENTLY },
+	{ "grep", cmd_grep, RUN_SETUP_GENTLY | SUPPORT_SUPER_PREFIX },
 	{ "hash-object", cmd_hash_object },
 	{ "help", cmd_help },
 	{ "index-pack", cmd_index_pack, RUN_SETUP_GENTLY },
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
new file mode 100755
index 0000000..1019125
--- /dev/null
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -0,0 +1,99 @@
+#!/bin/sh
+
+test_description='Test grep recurse-submodules feature
+
+This test verifies the recurse-submodules feature correctly greps across
+submodules.
+'
+
+. ./test-lib.sh
+
+test_expect_success 'setup directory structure and submodule' '
+	echo "foobar" >a &&
+	mkdir b &&
+	echo "bar" >b/b &&
+	git add a b &&
+	git commit -m "add a and b" &&
+	git init submodule &&
+	echo "foobar" >submodule/a &&
+	git -C submodule add a &&
+	git -C submodule commit -m "add a" &&
+	git submodule add ./submodule &&
+	git commit -m "added submodule"
+'
+
+test_expect_success 'grep correctly finds patterns in a submodule' '
+	cat >expect <<-\EOF &&
+	a:foobar
+	b/b:bar
+	submodule/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and basic pathspecs' '
+	cat >expect <<-\EOF &&
+	submodule/a:foobar
+	EOF
+
+	git grep -e. --recurse-submodules -- submodule >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and nested submodules' '
+	git init submodule/sub &&
+	echo "foobar" >submodule/sub/a &&
+	git -C submodule/sub add a &&
+	git -C submodule/sub commit -m "add a" &&
+	git -C submodule submodule add ./sub &&
+	git -C submodule add sub &&
+	git -C submodule commit -m "added sub" &&
+	git add submodule &&
+	git commit -m "updated submodule" &&
+
+	cat >expect <<-\EOF &&
+	a:foobar
+	b/b:bar
+	submodule/a:foobar
+	submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and multiple patterns' '
+	cat >expect <<-\EOF &&
+	a:foobar
+	submodule/a:foobar
+	submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --and -e "foo" --recurse-submodules >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and multiple patterns' '
+	cat >expect <<-\EOF &&
+	b/b:bar
+	EOF
+
+	git grep -e "bar" --and --not -e "foo" --recurse-submodules >actual &&
+	test_cmp expect actual
+'
+
+test_incompatible_with_recurse_submodules ()
+{
+	test_expect_success "--recurse-submodules and $1 are incompatible" "
+		test_must_fail git grep -e. --recurse-submodules $1 2>actual &&
+		test_i18ngrep 'not supported with --recurse-submodules' actual
+	"
+}
+
+test_incompatible_with_recurse_submodules --untracked
+test_incompatible_with_recurse_submodules --no-index
+test_incompatible_with_recurse_submodules HEAD
+
+test_done
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v4 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-11-18 19:58     ` [PATCH v4 " Brandon Williams
                         ` (3 preceding siblings ...)
  2016-11-18 19:58       ` [PATCH v4 4/6] grep: optionally recurse into submodules Brandon Williams
@ 2016-11-18 19:58       ` Brandon Williams
  2016-11-18 22:19         ` Junio C Hamano
  2016-11-18 19:58       ` [PATCH v4 6/6] grep: search history of moved submodules Brandon Williams
                         ` (2 subsequent siblings)
  7 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-11-18 19:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

Teach grep to recursively search in submodules when provided with a
<tree> object. This allows grep to search a submodule based on the state
of the submodule that is present in a commit of the super project.

When grep is provided with a <tree> object, the name of the object is
prefixed to all output.  In order to provide uniformity of output
between the parent and child processes the option `--parent-basename`
has been added so that the child can preface all of its output with the
name of the parent's object instead of the name of the commit SHA1 of
the submodule. This changes output from the command
`git grep -e. -l --recurse-submodules HEAD`

from:
  HEAD:file
  <commit sha1 of submodule>:sub/file

to:
  HEAD:file
  HEAD:sub/file
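
A rough sketch of the name splitting this relies on: when grepping a tree,
the submodule's name arrives as "tree-name:path" and is cut at the first
colon.  The struct and function below are illustrative only; they assume the
tree name itself contains no colon.

```c
#include <string.h>

struct split_name {
	const char *base;	/* start of the tree name, or NULL */
	size_t base_len;	/* length of the tree name */
	const char *path;	/* submodule path */
};

/*
 * Hypothetical helper: split "tree-name:path" at the first colon, but only
 * when the source carries a tree identifier.  Working-tree names are left
 * alone, so a path such as "su:b" is not misparsed.
 */
static struct split_name split_tree_name(const char *name, int has_identifier)
{
	struct split_name out = { NULL, 0, name };
	const char *colon = strchr(name, ':');

	if (has_identifier && colon) {
		out.base = name;
		out.base_len = (size_t)(colon - name);
		out.path = colon + 1;
	}
	return out;
}
```

For "HEAD:sub" with an identifier this gives base "HEAD" (length 4) and path
"sub"; the base is what ends up in --parent-basename and the path in
--super-prefix.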

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 Documentation/git-grep.txt         | 13 +++++-
 builtin/grep.c                     | 83 +++++++++++++++++++++++++++++++++++---
 t/t7814-grep-recurse-submodules.sh | 75 +++++++++++++++++++++++++++++++++-
 3 files changed, 162 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 17aa1ba..71f32f3 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -26,7 +26,7 @@ SYNOPSIS
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
-	   [--recurse-submodules]
+	   [--recurse-submodules] [--parent-basename <basename>]
 	   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...]
 	   [--] [<pathspec>...]
 
@@ -91,7 +91,16 @@ OPTIONS
 
 --recurse-submodules::
 	Recursively search in each submodule that has been initialized and
-	checked out in the repository.
+	checked out in the repository.  When used in combination with the
+	<tree> option the prefix of all submodule output will be the name of
+	the parent project's <tree> object.
+
+--parent-basename <basename>::
+	For internal use only.  In order to produce uniform output with the
+	--recurse-submodules option, this option can be used to provide the
+	basename of a parent's <tree> object to a submodule so the submodule
+	can prefix its output with the parent's name rather than the SHA1 of
+	the submodule.
 
 -a::
 --text::
diff --git a/builtin/grep.c b/builtin/grep.c
index cfafa15..9b795ee 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -19,6 +19,7 @@
 #include "dir.h"
 #include "pathspec.h"
 #include "submodule.h"
+#include "submodule-config.h"
 
 static char const * const grep_usage[] = {
 	N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"),
@@ -28,6 +29,7 @@ static char const * const grep_usage[] = {
 static const char *super_prefix;
 static int recurse_submodules;
 static struct argv_array submodule_options = ARGV_ARRAY_INIT;
+static const char *parent_basename;
 
 static int grep_submodule_launch(struct grep_opt *opt,
 				 const struct grep_source *gs);
@@ -534,19 +536,53 @@ static int grep_submodule_launch(struct grep_opt *opt,
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
 	int status, i;
+	const char *end_of_base;
+	const char *name;
 	struct work_item *w = opt->output_priv;
 
+	end_of_base = strchr(gs->name, ':');
+	if (gs->identifier && end_of_base)
+		name = end_of_base + 1;
+	else
+		name = gs->name;
+
 	prepare_submodule_repo_env(&cp.env_array);
 
 	/* Add super prefix */
 	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
 			 super_prefix ? super_prefix : "",
-			 gs->name);
+			 name);
 	argv_array_push(&cp.args, "grep");
 
+	/*
+	 * Add basename of parent project
+	 * When performing grep on a tree object the filename is prefixed
+	 * with the object's name: 'tree-name:filename'.  In order to
+	 * provide uniformity of output we want to pass the name of the
+	 * parent project's object name to the submodule so the submodule can
+	 * prefix its output with the parent's name and not its own SHA1.
+	 */
+	if (gs->identifier && end_of_base)
+		argv_array_pushf(&cp.args, "--parent-basename=%.*s",
+				 (int) (end_of_base - gs->name),
+				 gs->name);
+
 	/* Add options */
-	for (i = 0; i < submodule_options.argc; i++)
+	for (i = 0; i < submodule_options.argc; i++) {
+		/*
+		 * If there is a tree identifier for the submodule, add the
+		 * rev after adding the submodule options but before the
+		 * pathspecs.  To do this we listen for the '--' and insert the
+		 * sha1 before pushing the '--' onto the child process argv
+		 * array.
+		 */
+		if (gs->identifier &&
+		    !strcmp("--", submodule_options.argv[i])) {
+			argv_array_push(&cp.args, sha1_to_hex(gs->identifier));
+		}
+
 		argv_array_push(&cp.args, submodule_options.argv[i]);
+	}
 
 	cp.git_cmd = 1;
 	cp.dir = gs->path;
@@ -671,12 +707,29 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 	enum interesting match = entry_not_interesting;
 	struct name_entry entry;
 	int old_baselen = base->len;
+	struct strbuf name = STRBUF_INIT;
+	int name_base_len = 0;
+	if (super_prefix) {
+		strbuf_addstr(&name, super_prefix);
+		name_base_len = name.len;
+	}
 
 	while (tree_entry(tree, &entry)) {
 		int te_len = tree_entry_len(&entry);
 
 		if (match != all_entries_interesting) {
-			match = tree_entry_interesting(&entry, base, tn_len, pathspec);
+			strbuf_setlen(&name, name_base_len);
+			strbuf_addstr(&name, base->buf + tn_len);
+
+			if (recurse_submodules && S_ISGITLINK(entry.mode)) {
+				strbuf_addstr(&name, entry.path);
+				match = submodule_path_match(pathspec, name.buf,
+							     NULL);
+			} else {
+				match = tree_entry_interesting(&entry, &name,
+							       0, pathspec);
+			}
+
 			if (match == all_entries_not_interesting)
 				break;
 			if (match == entry_not_interesting)
@@ -688,8 +741,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 		if (S_ISREG(entry.mode)) {
 			hit |= grep_sha1(opt, entry.oid->hash, base->buf, tn_len,
 					 check_attr ? base->buf + tn_len : NULL);
-		}
-		else if (S_ISDIR(entry.mode)) {
+		} else if (S_ISDIR(entry.mode)) {
 			enum object_type type;
 			struct tree_desc sub;
 			void *data;
@@ -705,12 +757,18 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 			hit |= grep_tree(opt, pathspec, &sub, base, tn_len,
 					 check_attr);
 			free(data);
+		} else if (recurse_submodules && S_ISGITLINK(entry.mode)) {
+			hit |= grep_submodule(opt, entry.oid->hash, base->buf,
+					      base->buf + tn_len);
 		}
+
 		strbuf_setlen(base, old_baselen);
 
 		if (hit && opt->status_only)
 			break;
 	}
+
+	strbuf_release(&name);
 	return hit;
 }
 
@@ -734,6 +792,10 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 		if (!data)
 			die(_("unable to read tree (%s)"), oid_to_hex(&obj->oid));
 
+		/* Use parent's name as base when recursing submodules */
+		if (recurse_submodules && parent_basename)
+			name = parent_basename;
+
 		len = name ? strlen(name) : 0;
 		strbuf_init(&base, PATH_MAX + len + 1);
 		if (len) {
@@ -760,6 +822,12 @@ static int grep_objects(struct grep_opt *opt, const struct pathspec *pathspec,
 	for (i = 0; i < nr; i++) {
 		struct object *real_obj;
 		real_obj = deref_tag(list->objects[i].item, NULL, 0);
+
+		/* load the gitmodules file for this rev */
+		if (recurse_submodules) {
+			submodule_free();
+			gitmodules_config_sha1(real_obj->oid.hash);
+		}
 		if (grep_object(opt, pathspec, real_obj, list->objects[i].name, list->objects[i].path)) {
 			hit = 1;
 			if (opt->status_only)
@@ -900,6 +968,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			    N_("ignore files specified via '.gitignore'"), 1),
 		OPT_BOOL(0, "recurse-submodules", &recurse_submodules,
 			 N_("recursively search in each submodule")),
+		OPT_STRING(0, "parent-basename", &parent_basename,
+			   N_("basename"),
+			   N_("prepend parent project's basename to output")),
 		OPT_GROUP(""),
 		OPT_BOOL('v', "invert-match", &opt.invert,
 			N_("show non-matching lines")),
@@ -1152,7 +1223,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	if (recurse_submodules && (!use_index || untracked || list.nr))
+	if (recurse_submodules && (!use_index || untracked))
 		die(_("option not supported with --recurse-submodules."));
 
 	if (!show_in_pager && !opt.status_only)
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index 1019125..d1fd7ed 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -84,6 +84,80 @@ test_expect_success 'grep and multiple patterns' '
 	test_cmp expect actual
 '
 
+test_expect_success 'basic grep tree' '
+	cat >expect <<-\EOF &&
+	HEAD:a:foobar
+	HEAD:b/b:bar
+	HEAD:submodule/a:foobar
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree HEAD^' '
+	cat >expect <<-\EOF &&
+	HEAD^:a:foobar
+	HEAD^:b/b:bar
+	HEAD^:submodule/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD^ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree HEAD^^' '
+	cat >expect <<-\EOF &&
+	HEAD^^:a:foobar
+	HEAD^^:b/b:bar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD^^ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree and pathspecs' '
+	cat >expect <<-\EOF &&
+	HEAD:submodule/a:foobar
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD -- submodule >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep recurse submodule colon in name' '
+	git init parent &&
+	test_when_finished "rm -rf parent" &&
+	echo "foobar" >"parent/fi:le" &&
+	git -C parent add "fi:le" &&
+	git -C parent commit -m "add fi:le" &&
+
+	git init "su:b" &&
+	test_when_finished "rm -rf su:b" &&
+	echo "foobar" >"su:b/fi:le" &&
+	git -C "su:b" add "fi:le" &&
+	git -C "su:b" commit -m "add fi:le" &&
+
+	git -C parent submodule add "../su:b" "su:b" &&
+	git -C parent commit -m "add submodule" &&
+
+	cat >expect <<-\EOF &&
+	fi:le:foobar
+	su:b/fi:le:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules >actual &&
+	test_cmp expect actual &&
+
+	cat >expect <<-\EOF &&
+	HEAD:fi:le:foobar
+	HEAD:su:b/fi:le:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_incompatible_with_recurse_submodules ()
 {
 	test_expect_success "--recurse-submodules and $1 are incompatible" "
@@ -94,6 +168,5 @@ test_incompatible_with_recurse_submodules ()
 
 test_incompatible_with_recurse_submodules --untracked
 test_incompatible_with_recurse_submodules --no-index
-test_incompatible_with_recurse_submodules HEAD
 
 test_done
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v4 6/6] grep: search history of moved submodules
  2016-11-18 19:58     ` [PATCH v4 " Brandon Williams
                         ` (4 preceding siblings ...)
  2016-11-18 19:58       ` [PATCH v4 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
@ 2016-11-18 19:58       ` Brandon Williams
  2016-11-18 20:10       ` [PATCH v4 0/6] recursively grep across submodules Stefan Beller
  2016-11-22 18:46       ` [PATCH v5 " Brandon Williams
  7 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-18 19:58 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

If a submodule was renamed at any point since its inception and you
try to grep a commit from before the move, grep cannot find a working
directory for the submodule, because the path recorded in that commit
differs from the current path.

This patch teaches grep to find the .git directory for a submodule in
the parent's .git/modules/ directory in the event the path to the
submodule in the commit that is being searched differs from the state of
the currently checked out commit.  If found, the child process that is
spawned to grep the submodule will chdir into its gitdir instead of a
working directory.

In order to override the explicit setting of the submodule child process's
gitdir environment variable (which was introduced in '10f5c526'),
`GIT_DIR_ENVIRONMENT` needs to be pushed onto the child process's env_array.
This allows the searching of history from a submodule's gitdir, rather
than from a working directory.
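
The fallback described above can be sketched roughly as follows.  This is
a self-contained illustration, not the patch itself:
`choose_submodule_dir()` and its flag parameters are hypothetical
stand-ins for `is_submodule_populated()`, the `sha1` argument and the
`is_git_directory()` check, and the ".git/modules/" prefix is hard-coded
here only for readability.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical model of the fallback: returns the directory the child
 * grep should chdir into, or NULL if the submodule must be skipped.
 */
static const char *choose_submodule_dir(int populated, int searching_history,
					int gitdir_exists,
					const char *worktree_path,
					const char *sub_name,
					char *buf, size_t buflen)
{
	/* A checked-out submodule is searched from its working tree. */
	if (populated)
		return worktree_path;
	/* Without a commit to search there is nothing to fall back to. */
	if (!searching_history)
		return NULL;
	/* The gitdir under the parent's modules directory must exist. */
	if (!gitdir_exists)
		return NULL;
	snprintf(buf, buflen, ".git/modules/%s", sub_name);
	return buf;
}
```

With a submodule named "dir/sub" that is no longer populated at its old
path, a history search would end up in ".git/modules/dir/sub".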

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 builtin/grep.c                     | 20 +++++++++++++++++--
 t/t7814-grep-recurse-submodules.sh | 41 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 9b795ee..747b0c3 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -547,6 +547,7 @@ static int grep_submodule_launch(struct grep_opt *opt,
 		name = gs->name;
 
 	prepare_submodule_repo_env(&cp.env_array);
+	argv_array_push(&cp.env_array, GIT_DIR_ENVIRONMENT);
 
 	/* Add super prefix */
 	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
@@ -615,8 +616,23 @@ static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1,
 {
 	if (!is_submodule_initialized(path))
 		return 0;
-	if (!is_submodule_populated(path))
-		return 0;
+	if (!is_submodule_populated(path)) {
+		/*
+		 * If searching history, check for the presence of the
+		 * submodule's gitdir before skipping the submodule.
+		 */
+		if (sha1) {
+			const struct submodule *sub =
+					submodule_from_path(null_sha1, path);
+			if (sub)
+				path = git_path("modules/%s", sub->name);
+
+			if(!(is_directory(path) && is_git_directory(path)))
+				return 0;
+		} else {
+			return 0;
+		}
+	}
 
 #ifndef NO_PTHREADS
 	if (num_threads) {
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index d1fd7ed..7d66716 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -158,6 +158,47 @@ test_expect_success 'grep recurse submodule colon in name' '
 	test_cmp expect actual
 '
 
+test_expect_success 'grep history with moved submodules' '
+	git init parent &&
+	test_when_finished "rm -rf parent" &&
+	echo "foobar" >parent/file &&
+	git -C parent add file &&
+	git -C parent commit -m "add file" &&
+
+	git init sub &&
+	test_when_finished "rm -rf sub" &&
+	echo "foobar" >sub/file &&
+	git -C sub add file &&
+	git -C sub commit -m "add file" &&
+
+	git -C parent submodule add ../sub dir/sub &&
+	git -C parent commit -m "add submodule" &&
+
+	cat >expect <<-\EOF &&
+	dir/sub/file:foobar
+	file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules >actual &&
+	test_cmp expect actual &&
+
+	git -C parent mv dir/sub sub-moved &&
+	git -C parent commit -m "moved submodule" &&
+
+	cat >expect <<-\EOF &&
+	file:foobar
+	sub-moved/file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules >actual &&
+	test_cmp expect actual &&
+
+	cat >expect <<-\EOF &&
+	HEAD^:dir/sub/file:foobar
+	HEAD^:file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules HEAD^ >actual &&
+	test_cmp expect actual
+'
+
 test_incompatible_with_recurse_submodules ()
 {
 	test_expect_success "--recurse-submodules and $1 are incompatible" "
-- 
2.8.0.rc3.226.g39d4020



* Re: [PATCH v4 0/6] recursively grep across submodules
  2016-11-18 19:58     ` [PATCH v4 " Brandon Williams
                         ` (5 preceding siblings ...)
  2016-11-18 19:58       ` [PATCH v4 6/6] grep: search history of moved submodules Brandon Williams
@ 2016-11-18 20:10       ` Stefan Beller
  2016-11-22 18:46       ` [PATCH v5 " Brandon Williams
  7 siblings, 0 replies; 126+ messages in thread
From: Stefan Beller @ 2016-11-18 20:10 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git@vger.kernel.org, Jonathan Tan, Junio C Hamano

On Fri, Nov 18, 2016 at 11:58 AM, Brandon Williams <bmwill@google.com> wrote:
> This revision of this series should address all of the problems brought up with
> v3.
>
> * indent output example in patch 5/6.
> * fix ':' in submodule names and add a test to verify.
> * cleanup some comments.
> * fixed tests to test the case where a submodule isn't at the root of a
>   repository.
> * always pass --threads=%d in order to limit threads to child processes.
>
>
> -- interdiff based on 'bw/grep-recurse-submodules'

Thanks for interdiff!

I only skimmed the patches, but rather reviewed this interdiff in detail.
The series looks good to me, no nits!

Thanks,
Stefan


* Re: [PATCH v4 3/6] grep: add submodules as a grep source type
  2016-11-18 19:58       ` [PATCH v4 3/6] grep: add submodules as a grep source type Brandon Williams
@ 2016-11-18 21:37         ` Junio C Hamano
  2016-11-18 22:56           ` Brandon Williams
  0 siblings, 1 reply; 126+ messages in thread
From: Junio C Hamano @ 2016-11-18 21:37 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, sbeller, jonathantanmy

Brandon Williams <bmwill@google.com> writes:

> diff --git a/grep.h b/grep.h
> index 5856a23..267534c 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -161,6 +161,7 @@ struct grep_source {
>  		GREP_SOURCE_SHA1,
>  		GREP_SOURCE_FILE,
>  		GREP_SOURCE_BUF,
> +		GREP_SOURCE_SUBMODULE,
>  	} type;
>  	void *identifier;

Hmph, interesting.  We have avoided ending enum definition with a
comma, because it is only valid in more recent C than what we aim to
support.  This patch is not introducing a new problem, but just
doing the same thing that would have broken older compilers as the
existing code.  Perhaps those older compilers have died out?
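
For reference, the portable spelling simply omits the comma after the
last enumerator; a trailing comma in an enumerator list only became valid
with C99.  A minimal sketch, using hypothetical names rather than the
real enum from grep.h:

```c
/*
 * Hypothetical enum.  C89 compilers accept it because the last
 * enumerator is not followed by a comma.
 */
enum source_kind {
	SOURCE_KIND_FILE,
	SOURCE_KIND_BUF,
	SOURCE_KIND_SUBMODULE
};
```

The cost of this style is that adding a new enumerator at the end touches
two lines in the diff instead of one.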



* Re: [PATCH v4 4/6] grep: optionally recurse into submodules
  2016-11-18 19:58       ` [PATCH v4 4/6] grep: optionally recurse into submodules Brandon Williams
@ 2016-11-18 21:48         ` Junio C Hamano
  2016-11-18 22:01         ` Junio C Hamano
  2016-11-18 22:14         ` Junio C Hamano
  2 siblings, 0 replies; 126+ messages in thread
From: Junio C Hamano @ 2016-11-18 21:48 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, sbeller, jonathantanmy

Brandon Williams <bmwill@google.com> writes:

> +static void compile_submodule_options(const struct grep_opt *opt,
> +				      const struct pathspec *pathspec,
> +				      int cached, int untracked,
> +				      int opt_exclude, int use_index,
> +				      int pattern_type_arg)
> +{
> +	struct grep_pat *pattern;
> +	int i;
> +
> +	if (recurse_submodules)
> +		argv_array_push(&submodule_options, "--recurse-submodules");
> +
> +	if (cached)
> +		argv_array_push(&submodule_options, "--cached");
> +...
> +
> +	/* Add Pathspecs */
> +	argv_array_push(&submodule_options, "--");
> +	for (i = 0; i < pathspec->nr; i++)
> +		argv_array_push(&submodule_options,
> +				pathspec->items[i].original);
> +}

When I do

    $ git grep --recurse-submodules pattern submodules/ lib/

where I have a bunch of submodules in the "submodules/" directory in the
top-level project, the top-level grep would try to find the pattern
in its own files in its "lib/" directory and then invoke sub-greps
in the submodules/a, submodules/b, etc. working trees.

This passes the "submodules/" and "lib/" pathspec down to these
sub-greps.   These sub-greps in turn learn via --super-prefix where
they are in the super-project's context (e.g. "submodules/a/") to
adjust the given pathspec patterns, so everything cancels out
(e.g. they know "lib/" is totally outside of their area and their
files do not match with the pathspec element "lib/" at all).
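
To make the "cancels out" argument concrete, here is a small sketch of
the idea.  It is an illustration under simplifying assumptions, not the
real pathspec code: `matches_dir_pathspec()` is a hypothetical helper
that only understands plain leading-directory pathspecs (no wildcards, no
pathspec magic).

```c
#include <string.h>

/*
 * Hypothetical helper: does "super_prefix + path" fall under the
 * directory pathspec "spec" (e.g. "lib/")?  The two pieces are compared
 * as if they had been concatenated, without building the full name.
 */
static int matches_dir_pathspec(const char *super_prefix, const char *path,
				const char *spec)
{
	size_t prefix_len = strlen(super_prefix);
	size_t spec_len = strlen(spec);
	size_t i;

	for (i = 0; i < spec_len; i++) {
		/* Pick the i-th character of the virtual full name. */
		char c = i < prefix_len ? super_prefix[i]
					 : path[i - prefix_len];
		if (c != spec[i])
			return 0;
	}
	return 1;
}
```

A sub-grep running with the super prefix "submodules/a/" would accept its
own "lib/foo.c" for the pathspec "submodules/" but reject it for "lib/",
because the full name it compares is "submodules/a/lib/foo.c".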

Looking good.






* Re: [PATCH v4 4/6] grep: optionally recurse into submodules
  2016-11-18 19:58       ` [PATCH v4 4/6] grep: optionally recurse into submodules Brandon Williams
  2016-11-18 21:48         ` Junio C Hamano
@ 2016-11-18 22:01         ` Junio C Hamano
  2016-11-18 22:14         ` Junio C Hamano
  2 siblings, 0 replies; 126+ messages in thread
From: Junio C Hamano @ 2016-11-18 22:01 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, sbeller, jonathantanmy

Brandon Williams <bmwill@google.com> writes:

> @@ -300,6 +311,10 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
>  	if (opt->relative && opt->prefix_length) {
>  		quote_path_relative(filename + tree_name_len, opt->prefix, &pathbuf);
>  		strbuf_insert(&pathbuf, 0, filename, tree_name_len);
> +	} else if (super_prefix) {
> +		strbuf_add(&pathbuf, filename, tree_name_len);
> +		strbuf_addstr(&pathbuf, super_prefix);
> +		strbuf_addstr(&pathbuf, filename + tree_name_len);
>  	} else {
>  		strbuf_addstr(&pathbuf, filename);
>  	}
> @@ -328,10 +343,13 @@ static int grep_file(struct grep_opt *opt, const char *filename)
>  {
>  	struct strbuf buf = STRBUF_INIT;
>  
> -	if (opt->relative && opt->prefix_length)
> +	if (opt->relative && opt->prefix_length) {
>  		quote_path_relative(filename, opt->prefix, &buf);
> -	else
> +	} else {
> +		if (super_prefix)
> +			strbuf_addstr(&buf, super_prefix);
>  		strbuf_addstr(&buf, filename);
> +	}

The above two hunks both assume that the super_prefix option is
usable only from the top-level (i.e. opt->prefix_length == 0) and
also "--no-full-name" (which is the default) cannot be used.  The
only invoker that runs "grep" with "--super-prefix" is the "grep"
that runs in the superproject, and it will only run us from the
top-level of the working tree, so the former assumption is OK.  

It is a bit unclear to me how the "relative" and "--recurse-submodules"
would interact with each other, though.
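
For readers following along, the name composition done by the first hunk
can be sketched like this.  `format_display_name()` is a hypothetical
stand-in for the three strbuf calls, and it assumes that `filename` is
"<tree-name><path>" with the tree name (including its trailing colon)
occupying the first `tree_name_len` bytes.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the strbuf calls in the hunk: the super
 * prefix is placed between the tree name and the path inside the
 * submodule.
 */
static void format_display_name(char *out, size_t outlen,
				const char *filename, size_t tree_name_len,
				const char *super_prefix)
{
	snprintf(out, outlen, "%.*s%s%s", (int)tree_name_len, filename,
		 super_prefix ? super_prefix : "",
		 filename + tree_name_len);
}
```

With filename "HEAD:a", a tree name length of 5 and the super prefix
"submodule/", this yields "HEAD:submodule/a", which is the shape the
tests in this series expect.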





* Re: [PATCH v4 4/6] grep: optionally recurse into submodules
  2016-11-18 19:58       ` [PATCH v4 4/6] grep: optionally recurse into submodules Brandon Williams
  2016-11-18 21:48         ` Junio C Hamano
  2016-11-18 22:01         ` Junio C Hamano
@ 2016-11-18 22:14         ` Junio C Hamano
  2016-11-18 22:58           ` Brandon Williams
  2 siblings, 1 reply; 126+ messages in thread
From: Junio C Hamano @ 2016-11-18 22:14 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, sbeller, jonathantanmy

Brandon Williams <bmwill@google.com> writes:

> +static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec,
> +		      int cached)
>  {
>  	int hit = 0;
>  	int nr;
> +	struct strbuf name = STRBUF_INIT;
> +	int name_base_len = 0;
> +	if (super_prefix) {
> +		name_base_len = strlen(super_prefix);
> +		strbuf_addstr(&name, super_prefix);
> +	}
> +
>  	read_cache();
>  
>  	for (nr = 0; nr < active_nr; nr++) {
>  		const struct cache_entry *ce = active_cache[nr];
> -		if (!S_ISREG(ce->ce_mode))
> -			continue;
> -		if (!ce_path_match(ce, pathspec, NULL))
> -			continue;
> -		/*
> -		 * If CE_VALID is on, we assume worktree file and its cache entry
> -		 * are identical, even if worktree file has been modified, so use
> -		 * cache version instead
> -		 */
> -		if (cached || (ce->ce_flags & CE_VALID) || ce_skip_worktree(ce)) {
> -			if (ce_stage(ce) || ce_intent_to_add(ce))
> -				continue;
> -			hit |= grep_sha1(opt, ce->oid.hash, ce->name, 0,
> -					 ce->name);
> +		strbuf_setlen(&name, name_base_len);
> +		strbuf_addstr(&name, ce->name);
> +
> +		if (S_ISREG(ce->ce_mode) &&
> +		    match_pathspec(pathspec, name.buf, name.len, 0, NULL,
> +				   S_ISDIR(ce->ce_mode) ||
> +				   S_ISGITLINK(ce->ce_mode))) {
> +			/*
> +			 * If CE_VALID is on, we assume worktree file and its
> +			 * cache entry are identical, even if worktree file has
> +			 * been modified, so use cache version instead
> +			 */
> +			if (cached || (ce->ce_flags & CE_VALID) ||
> +			    ce_skip_worktree(ce)) {
> +				if (ce_stage(ce) || ce_intent_to_add(ce))
> +					continue;
> +				hit |= grep_sha1(opt, ce->oid.hash, ce->name,
> +						 0, ce->name);
> +			} else {
> +				hit |= grep_file(opt, ce->name);
> +			}
> +		} else if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
> +			   submodule_path_match(pathspec, name.buf, NULL)) {
> +			hit |= grep_submodule(opt, NULL, ce->name, ce->name);
>  		}
> -		else
> -			hit |= grep_file(opt, ce->name);

We used to reject anything other than S_ISREG() upfront in the loop,
and then did either grep_sha1() from the cache or grep_file() from
the working tree.

Now, the guard upfront is removed, and we do the same in the first
part of this if/elseif.  The elseif part deals with a submodule that
could match the pathspec.

Don't we need a final else clause that would skip the remainder of
this loop?  What would happen to a S_ISREG() path that does *NOT*
match the given pathspec?  We used to just "continue", but it seems
to me that such a path will fall through the above if/elseif in the
new code.  Would that be a problem?
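
A toy model of the control flow in question, to show what "fall through"
means here.  Everything in it is hypothetical (`struct entry`,
`count_reaching_tail()`); it only counts how many entries reach whatever
code follows the if/else-if chain.

```c
enum entry_kind { ENTRY_FILE, ENTRY_GITLINK, ENTRY_OTHER };

struct entry {
	enum entry_kind kind;
	int matches_pathspec;
};

/*
 * Hypothetical model of the loop.  With "use_final_else" set, entries
 * that took neither branch are skipped, as the old upfront guards did.
 */
static int count_reaching_tail(const struct entry *entries, int nr,
			       int use_final_else)
{
	int i, reached = 0;

	for (i = 0; i < nr; i++) {
		const struct entry *e = &entries[i];

		if (e->kind == ENTRY_FILE && e->matches_pathspec) {
			/* grep the file or its cached blob */
		} else if (e->kind == ENTRY_GITLINK && e->matches_pathspec) {
			/* recurse into the submodule */
		} else if (use_final_else) {
			continue;
		}
		/* stands for the code after the if/else-if chain */
		reached++;
	}
	return reached;
}
```

Without the final else, a regular file that does not match the pathspec
still reaches the tail of the loop body; with it, only the entries that
were actually handled do.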


* Re: [PATCH v4 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-11-18 19:58       ` [PATCH v4 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
@ 2016-11-18 22:19         ` Junio C Hamano
  2016-11-18 22:52           ` Brandon Williams
  0 siblings, 1 reply; 126+ messages in thread
From: Junio C Hamano @ 2016-11-18 22:19 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, sbeller, jonathantanmy

Brandon Williams <bmwill@google.com> writes:

> @@ -671,12 +707,29 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
>  	enum interesting match = entry_not_interesting;
>  	struct name_entry entry;
>  	int old_baselen = base->len;
> +	struct strbuf name = STRBUF_INIT;
> +	int name_base_len = 0;
> +	if (super_prefix) {
> +		strbuf_addstr(&name, super_prefix);
> +		name_base_len = name.len;
> +	}
>  
>  	while (tree_entry(tree, &entry)) {
>  		int te_len = tree_entry_len(&entry);
>  
>  		if (match != all_entries_interesting) {
> -			match = tree_entry_interesting(&entry, base, tn_len, pathspec);
> +			strbuf_setlen(&name, name_base_len);
> +			strbuf_addstr(&name, base->buf + tn_len);
> +
> +			if (recurse_submodules && S_ISGITLINK(entry.mode)) {
> +				strbuf_addstr(&name, entry.path);
> +				match = submodule_path_match(pathspec, name.buf,
> +							     NULL);

The vocabulary that submodule_path_match() returns is the same as
that of do_match_pathspec() and match_pathspec_item() which is
MATCHED_{EXACTLY,FNMATCH,RECURSIVELY}, which is different from the
vocabulary of the variable "match" which is "enum interesting" that
is used by the tree-walk infrastructure.

I doubt they are compatible to be usable like this.  Am I missing
something?
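
A sketch of why mixing the two vocabularies is risky.  The numeric values
below are as I recall them from git's tree-walk.h and dir.h of that era,
so please treat them as assumptions; `from_pathspec_match()` is a
hypothetical translation helper, not something in the tree.

```c
/* Tree-walk vocabulary (values assumed, quoted from memory). */
enum interesting {
	all_entries_not_interesting = -1,
	entry_not_interesting = 0,
	entry_interesting = 1,
	all_entries_interesting = 2
};

/* Pathspec-matching vocabulary (values assumed, quoted from memory). */
#define MATCHED_RECURSIVELY 1
#define MATCHED_FNMATCH 2
#define MATCHED_EXACTLY 3

/*
 * Hypothetical translation: any kind of pathspec match only says that
 * this one entry is interesting, never that all later entries are.
 */
static enum interesting from_pathspec_match(int matched)
{
	return matched ? entry_interesting : entry_not_interesting;
}
```

Under these assumed values, storing MATCHED_FNMATCH directly in "match"
would read back as all_entries_interesting, and the loop would stop
consulting the pathspec for every later entry in that tree.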

> +			} else {
> +				match = tree_entry_interesting(&entry, &name,
> +							       0, pathspec);
> +			}
> +
>  			if (match == all_entries_not_interesting)
>  				break;
>  			if (match == entry_not_interesting)


* Re: [PATCH v4 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-11-18 22:19         ` Junio C Hamano
@ 2016-11-18 22:52           ` Brandon Williams
  2016-11-21 18:14             ` Brandon Williams
  0 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-11-18 22:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, sbeller, jonathantanmy

On 11/18, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
> 
> > @@ -671,12 +707,29 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
> >  	enum interesting match = entry_not_interesting;
> >  	struct name_entry entry;
> >  	int old_baselen = base->len;
> > +	struct strbuf name = STRBUF_INIT;
> > +	int name_base_len = 0;
> > +	if (super_prefix) {
> > +		strbuf_addstr(&name, super_prefix);
> > +		name_base_len = name.len;
> > +	}
> >  
> >  	while (tree_entry(tree, &entry)) {
> >  		int te_len = tree_entry_len(&entry);
> >  
> >  		if (match != all_entries_interesting) {
> > -			match = tree_entry_interesting(&entry, base, tn_len, pathspec);
> > +			strbuf_setlen(&name, name_base_len);
> > +			strbuf_addstr(&name, base->buf + tn_len);
> > +
> > +			if (recurse_submodules && S_ISGITLINK(entry.mode)) {
> > +				strbuf_addstr(&name, entry.path);
> > +				match = submodule_path_match(pathspec, name.buf,
> > +							     NULL);
> 
> The vocabulary that submodule_path_match() returns is the same as
> that of do_match_pathspec() and match_pathspec_item() which is
> MATCHED_{EXACTLY,FNMATCH,RECURSIVELY}, which is different from the
> vocabulary of the variable "match" which is "enum interesting" that
> is used by the tree-walk infrastructure.
> 
> I doubt they are compatible to be usable like this.  Am I missing
> something?

I think I initially must have thought it would work out, but looking
back at this I can clearly see that they aren't 100% compatible...

It slightly feels odd to me that we have so many different means for
checking pathspecs, all of which pretty much duplicate some of the
functionality of the other.  Is there any reason there are these two
different code paths?  Do we want them to remain separate or have them
be unified at some point?

Also, in order to use the tree_entry_interesting code it looks like I'll
have to pipe through a flag saying 'yes I want to match against
submodules' like I did for the other pathspec codepath.  Either that or
add functionality to perform wildmatching against partial matches (i.e.
directories and submodules) since currently the tree_entry_interesting
code path just punts and says 'well say it matches for now and check
again later' whenever it runs into a directory (I can't really make it
do that for submodules without a flag of some sort as tests could break).
Or maybe both?

-- 
Brandon Williams


* Re: [PATCH v4 3/6] grep: add submodules as a grep source type
  2016-11-18 21:37         ` Junio C Hamano
@ 2016-11-18 22:56           ` Brandon Williams
  0 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-18 22:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, sbeller, jonathantanmy

On 11/18, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
> 
> > diff --git a/grep.h b/grep.h
> > index 5856a23..267534c 100644
> > --- a/grep.h
> > +++ b/grep.h
> > @@ -161,6 +161,7 @@ struct grep_source {
> >  		GREP_SOURCE_SHA1,
> >  		GREP_SOURCE_FILE,
> >  		GREP_SOURCE_BUF,
> > +		GREP_SOURCE_SUBMODULE,
> >  	} type;
> >  	void *identifier;
> 
> Hmph, interesting.  We have avoided ending enum definition with a
> comma, because it is only valid in more recent C than what we aim to
> support.  This patch is not introducing a new problem, but just
> doing the same thing that would have broken older compilers as the
> existing code.  Perhaps those older compilers have died out?

Perhaps it is time to move to a new C standard! :P

-- 
Brandon Williams


* Re: [PATCH v4 4/6] grep: optionally recurse into submodules
  2016-11-18 22:14         ` Junio C Hamano
@ 2016-11-18 22:58           ` Brandon Williams
  0 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-18 22:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, sbeller, jonathantanmy

On 11/18, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
> 
> > +static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec,
> > +		      int cached)
> >  {
> >  	int hit = 0;
> >  	int nr;
> > +	struct strbuf name = STRBUF_INIT;
> > +	int name_base_len = 0;
> > +	if (super_prefix) {
> > +		name_base_len = strlen(super_prefix);
> > +		strbuf_addstr(&name, super_prefix);
> > +	}
> > +
> >  	read_cache();
> >  
> >  	for (nr = 0; nr < active_nr; nr++) {
> >  		const struct cache_entry *ce = active_cache[nr];
> > -		if (!S_ISREG(ce->ce_mode))
> > -			continue;
> > -		if (!ce_path_match(ce, pathspec, NULL))
> > -			continue;
> > -		/*
> > -		 * If CE_VALID is on, we assume worktree file and its cache entry
> > -		 * are identical, even if worktree file has been modified, so use
> > -		 * cache version instead
> > -		 */
> > -		if (cached || (ce->ce_flags & CE_VALID) || ce_skip_worktree(ce)) {
> > -			if (ce_stage(ce) || ce_intent_to_add(ce))
> > -				continue;
> > -			hit |= grep_sha1(opt, ce->oid.hash, ce->name, 0,
> > -					 ce->name);
> > +		strbuf_setlen(&name, name_base_len);
> > +		strbuf_addstr(&name, ce->name);
> > +
> > +		if (S_ISREG(ce->ce_mode) &&
> > +		    match_pathspec(pathspec, name.buf, name.len, 0, NULL,
> > +				   S_ISDIR(ce->ce_mode) ||
> > +				   S_ISGITLINK(ce->ce_mode))) {
> > +			/*
> > +			 * If CE_VALID is on, we assume worktree file and its
> > +			 * cache entry are identical, even if worktree file has
> > +			 * been modified, so use cache version instead
> > +			 */
> > +			if (cached || (ce->ce_flags & CE_VALID) ||
> > +			    ce_skip_worktree(ce)) {
> > +				if (ce_stage(ce) || ce_intent_to_add(ce))
> > +					continue;
> > +				hit |= grep_sha1(opt, ce->oid.hash, ce->name,
> > +						 0, ce->name);
> > +			} else {
> > +				hit |= grep_file(opt, ce->name);
> > +			}
> > +		} else if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
> > +			   submodule_path_match(pathspec, name.buf, NULL)) {
> > +			hit |= grep_submodule(opt, NULL, ce->name, ce->name);
> >  		}
> > -		else
> > -			hit |= grep_file(opt, ce->name);
> 
> We used to reject anything other than S_ISREG() upfront in the loop,
> and then did either grep_sha1() from the cache or grep_file() from
> the working tree.
> 
> Now, the guard upfront is removed, and we do the same in the first
> part of this if/elseif.  The elseif part deals with a submodule that
> could match the pathspec.
> 
> Don't we need a final else clause that would skip the remainder of
> this loop?  What would happen to a S_ISREG() path that does *NOT*
> match the given pathspec?  We used to just "continue", but it seems
> to me that such a path will fall through the above if/elseif in the
> new code.  Would that be a problem?

It may be (Though I didn't see any issues when running tests).  It would
be easy enough to add an 'else continue;' at the end though.

-- 
Brandon Williams


* Re: [PATCH v4 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-11-18 22:52           ` Brandon Williams
@ 2016-11-21 18:14             ` Brandon Williams
  0 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-21 18:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, sbeller, jonathantanmy

On 11/18, Brandon Williams wrote:

> Also, in order to use the tree_entry_interesting code it looks like I'll
> have to pipe through a flag saying 'yes I want to match against
> submodules' like I did for the other pathspec codepath.  Either that or
> add functionality to perform wildmatching against partial matches (i.e.
> directories and submodules) since currently the tree_entry_interesting
> code path just punts and says 'well say it matches for now and check
> again later' whenever it runs into a directory (I can't really make it
> do that for submodules without a flag of some sort as tests could break).
> Or maybe both?

Looks like my initial assumption was incorrect, I just needed to be
smarter than punting when running into a submodule.  Should be able to
just ensure that the entry matches up to at least the first wildcard
character before punting and all should be good.
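
Roughly, the check I have in mind looks like the sketch below.  It is a
simplified model meant for pathspecs that contain a wildcard, not the
actual tree-walk change: `submodule_may_match()` is a hypothetical name,
and it ignores the base path handling that the real code has to do.

```c
#include <string.h>

/*
 * Simplified model: a submodule entry is worth descending into if it
 * agrees with the pathspec up to the first wildcard character (or up to
 * the end of the entry name, whichever comes first).  Exact matching is
 * left to the grep that runs inside the submodule.
 */
static int submodule_may_match(const char *pathspec, const char *entry_path)
{
	size_t nowildcard_len = strcspn(pathspec, "*?[\\");
	size_t entry_len = strlen(entry_path);
	size_t n = nowildcard_len < entry_len ? nowildcard_len : entry_len;

	return !strncmp(pathspec, entry_path, n);
}
```

So "submodul?/a" and "submodul*/sub/a" would both let a gitlink named
"submodule" through, while "lib/*.c" would not.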

-- 
Brandon Williams


* [PATCH v5 0/6] recursively grep across submodules
  2016-11-18 19:58     ` [PATCH v4 " Brandon Williams
                         ` (6 preceding siblings ...)
  2016-11-18 20:10       ` [PATCH v4 0/6] recursively grep across submodules Stefan Beller
@ 2016-11-22 18:46       ` Brandon Williams
  2016-11-22 18:46         ` [PATCH v5 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
                           ` (6 more replies)
  7 siblings, 7 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-22 18:46 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

The major change in v5 is to use the tree_entry_interesting api instead of the
vanilla pathspec code for submodules.  This fixes the issue in the last series
where I mixed the two types.  More tests were also added to ensure that the
changes to the pathspec code function properly.

Brandon Williams (6):
  submodules: add helper functions to determine presence of submodules
  submodules: load gitmodules file from commit sha1
  grep: add submodules as a grep source type
  grep: optionally recurse into submodules
  grep: enable recurse-submodules to work on <tree> objects
  grep: search history of moved submodules

 Documentation/git-grep.txt         |  14 ++
 builtin/grep.c                     | 386 ++++++++++++++++++++++++++++++++++---
 cache.h                            |   2 +
 config.c                           |   8 +-
 git.c                              |   2 +-
 grep.c                             |  16 +-
 grep.h                             |   1 +
 submodule-config.c                 |   6 +-
 submodule-config.h                 |   3 +
 submodule.c                        |  50 +++++
 submodule.h                        |   3 +
 t/t7814-grep-recurse-submodules.sh | 241 +++++++++++++++++++++++
 tree-walk.c                        |  28 +++
 13 files changed, 729 insertions(+), 31 deletions(-)
 create mode 100755 t/t7814-grep-recurse-submodules.sh

-- interdiff based on 'bw/grep-recurse-submodules'

diff --git a/builtin/grep.c b/builtin/grep.c
index 052f605..2c727ef 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -698,6 +698,8 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec,
 		} else if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
 			   submodule_path_match(pathspec, name.buf, NULL)) {
 			hit |= grep_submodule(opt, NULL, ce->name, ce->name);
+		} else {
+			continue;
 		}
 
 		if (ce_stage(ce)) {
@@ -734,17 +736,10 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 		int te_len = tree_entry_len(&entry);
 
 		if (match != all_entries_interesting) {
-			strbuf_setlen(&name, name_base_len);
 			strbuf_addstr(&name, base->buf + tn_len);
-
-			if (recurse_submodules && S_ISGITLINK(entry.mode)) {
-				strbuf_addstr(&name, entry.path);
-				match = submodule_path_match(pathspec, name.buf,
-							     NULL);
-			} else {
-				match = tree_entry_interesting(&entry, &name,
-							       0, pathspec);
-			}
+			match = tree_entry_interesting(&entry, &name,
+						       0, pathspec);
+			strbuf_setlen(&name, name_base_len);
 
 			if (match == all_entries_not_interesting)
 				break;
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index 7d66716..0507771 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -127,6 +127,34 @@ test_expect_success 'grep tree and pathspecs' '
 	test_cmp expect actual
 '
 
+test_expect_success 'grep tree and pathspecs' '
+	cat >expect <<-\EOF &&
+	HEAD:submodule/a:foobar
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD -- "submodule*a" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree and more pathspecs' '
+	cat >expect <<-\EOF &&
+	HEAD:submodule/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD -- "submodul?/a" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree and more pathspecs' '
+	cat >expect <<-\EOF &&
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD -- "submodul*/sub/a" >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'grep recurse submodule colon in name' '
 	git init parent &&
 	test_when_finished "rm -rf parent" &&
diff --git a/tree-walk.c b/tree-walk.c
index 828f435..ff77605 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -1004,6 +1004,19 @@ static enum interesting do_match(const struct name_entry *entry,
 				 */
 				if (ps->recursive && S_ISDIR(entry->mode))
 					return entry_interesting;
+
+				/*
+				 * When matching against submodules with
+				 * wildcard characters, ensure that the entry
+				 * at least matches up to the first wild
+				 * character.  More accurate matching can then
+				 * be performed in the submodule itself.
+				 */
+				if (ps->recursive && S_ISGITLINK(entry->mode) &&
+				    !ps_strncmp(item, match + baselen,
+						entry->path,
+						item->nowildcard_len - baselen))
+					return entry_interesting;
 			}
 
 			continue;
@@ -1040,6 +1053,21 @@ static enum interesting do_match(const struct name_entry *entry,
 			strbuf_setlen(base, base_offset + baselen);
 			return entry_interesting;
 		}
+
+		/*
+		 * When matching against submodules with
+		 * wildcard characters, ensure that the entry
+		 * at least matches up to the first wild
+		 * character.  More accurate matching can then
+		 * be performed in the submodule itself.
+		 */
+		if (ps->recursive && S_ISGITLINK(entry->mode) &&
+		    !ps_strncmp(item, match, base->buf + base_offset,
+				item->nowildcard_len)) {
+			strbuf_setlen(base, base_offset + baselen);
+			return entry_interesting;
+		}
+
 		strbuf_setlen(base, base_offset + baselen);
 
 		/*

-- 
2.8.0.rc3.226.g39d4020



* [PATCH v5 1/6] submodules: add helper functions to determine presence of submodules
  2016-11-22 18:46       ` [PATCH v5 " Brandon Williams
@ 2016-11-22 18:46         ` Brandon Williams
  2016-11-22 18:46         ` [PATCH v5 2/6] submodules: load gitmodules file from commit sha1 Brandon Williams
                           ` (5 subsequent siblings)
  6 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-22 18:46 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

Add two helper functions to submodule.c.
`is_submodule_initialized()` checks whether a submodule has been
initialized at a given path and `is_submodule_populated()` checks whether
a submodule has been checked out at a given path.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
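A rough standalone illustration of the "populated" check described above,
for readers without the git tree at hand.  It only tests whether
"<path>/.git" exists as a file or a directory; the helper in this patch
goes through resolve_gitdir(), which also validates what it finds.  The
name has_dot_git() is hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical stand-in for is_submodule_populated(): report whether
 * "<path>/.git" exists as a regular file (gitfile) or a directory.
 * Unlike resolve_gitdir(), it does not validate the contents.
 */
int has_dot_git(const char *path)
{
	char buf[4096];
	struct stat st;
	int n;

	assert(path != NULL);
	n = snprintf(buf, sizeof(buf), "%s/.git", path);
	if (n < 0 || (size_t)n >= sizeof(buf))
		return 0;	/* path too long: treat as not populated */
	if (stat(buf, &st))
		return 0;	/* nothing at <path>/.git */
	return S_ISREG(st.st_mode) || S_ISDIR(st.st_mode);
}
```

Calling has_dot_git("submodule") from the top of a superproject would
return 1 once the submodule has been checked out, under the assumptions
above.
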
 submodule.c | 38 ++++++++++++++++++++++++++++++++++++++
 submodule.h |  2 ++
 2 files changed, 40 insertions(+)

diff --git a/submodule.c b/submodule.c
index 6f7d883..f5107f0 100644
--- a/submodule.c
+++ b/submodule.c
@@ -198,6 +198,44 @@ void gitmodules_config(void)
 	}
 }
 
+/*
+ * Determine if a submodule has been initialized at a given 'path'
+ */
+int is_submodule_initialized(const char *path)
+{
+	int ret = 0;
+	const struct submodule *module = NULL;
+
+	module = submodule_from_path(null_sha1, path);
+
+	if (module) {
+		char *key = xstrfmt("submodule.%s.url", module->name);
+		char *value = NULL;
+
+		ret = !git_config_get_string(key, &value);
+
+		free(value);
+		free(key);
+	}
+
+	return ret;
+}
+
+/*
+ * Determine if a submodule has been populated at a given 'path'
+ */
+int is_submodule_populated(const char *path)
+{
+	int ret = 0;
+	char *gitdir = xstrfmt("%s/.git", path);
+
+	if (resolve_gitdir(gitdir))
+		ret = 1;
+
+	free(gitdir);
+	return ret;
+}
+
 int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst)
 {
diff --git a/submodule.h b/submodule.h
index d9e197a..6ec5f2f 100644
--- a/submodule.h
+++ b/submodule.h
@@ -37,6 +37,8 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
 		const char *path);
 int submodule_config(const char *var, const char *value, void *cb);
 void gitmodules_config(void);
+extern int is_submodule_initialized(const char *path);
+extern int is_submodule_populated(const char *path);
 int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst);
 const char *submodule_strategy_to_string(const struct submodule_update_strategy *s);
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v5 2/6] submodules: load gitmodules file from commit sha1
  2016-11-22 18:46       ` [PATCH v5 " Brandon Williams
  2016-11-22 18:46         ` [PATCH v5 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
@ 2016-11-22 18:46         ` Brandon Williams
  2016-11-22 18:46         ` [PATCH v5 3/6] grep: add submodules as a grep source type Brandon Williams
                           ` (4 subsequent siblings)
  6 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-22 18:46 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

Teach submodules to load a '.gitmodules' file from a commit sha1.  This
enables the population of the submodule_cache to be based on the state
of the '.gitmodules' file from a particular commit.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 cache.h            |  2 ++
 config.c           |  8 ++++----
 submodule-config.c |  6 +++---
 submodule-config.h |  3 +++
 submodule.c        | 12 ++++++++++++
 submodule.h        |  1 +
 6 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/cache.h b/cache.h
index 1be6526..559a461 100644
--- a/cache.h
+++ b/cache.h
@@ -1690,6 +1690,8 @@ extern int git_default_config(const char *, const char *, void *);
 extern int git_config_from_file(config_fn_t fn, const char *, void *);
 extern int git_config_from_mem(config_fn_t fn, const enum config_origin_type,
 					const char *name, const char *buf, size_t len, void *data);
+extern int git_config_from_blob_sha1(config_fn_t fn, const char *name,
+				     const unsigned char *sha1, void *data);
 extern void git_config_push_parameter(const char *text);
 extern int git_config_from_parameters(config_fn_t fn, void *data);
 extern void git_config(config_fn_t fn, void *);
diff --git a/config.c b/config.c
index 83fdecb..4d78e72 100644
--- a/config.c
+++ b/config.c
@@ -1214,10 +1214,10 @@ int git_config_from_mem(config_fn_t fn, const enum config_origin_type origin_typ
 	return do_config_from(&top, fn, data);
 }
 
-static int git_config_from_blob_sha1(config_fn_t fn,
-				     const char *name,
-				     const unsigned char *sha1,
-				     void *data)
+int git_config_from_blob_sha1(config_fn_t fn,
+			      const char *name,
+			      const unsigned char *sha1,
+			      void *data)
 {
 	enum object_type type;
 	char *buf;
diff --git a/submodule-config.c b/submodule-config.c
index 098085b..8b9a2ef 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -379,9 +379,9 @@ static int parse_config(const char *var, const char *value, void *data)
 	return ret;
 }
 
-static int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
-				      unsigned char *gitmodules_sha1,
-				      struct strbuf *rev)
+int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
+			       unsigned char *gitmodules_sha1,
+			       struct strbuf *rev)
 {
 	int ret = 0;
 
diff --git a/submodule-config.h b/submodule-config.h
index d05c542..78584ba 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -29,6 +29,9 @@ const struct submodule *submodule_from_name(const unsigned char *commit_sha1,
 		const char *name);
 const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
 		const char *path);
+extern int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
+				      unsigned char *gitmodules_sha1,
+				      struct strbuf *rev);
 void submodule_free(void);
 
 #endif /* SUBMODULE_CONFIG_H */
diff --git a/submodule.c b/submodule.c
index f5107f0..062e58b 100644
--- a/submodule.c
+++ b/submodule.c
@@ -198,6 +198,18 @@ void gitmodules_config(void)
 	}
 }
 
+void gitmodules_config_sha1(const unsigned char *commit_sha1)
+{
+	struct strbuf rev = STRBUF_INIT;
+	unsigned char sha1[20];
+
+	if (gitmodule_sha1_from_commit(commit_sha1, sha1, &rev)) {
+		git_config_from_blob_sha1(submodule_config, rev.buf,
+					  sha1, NULL);
+	}
+	strbuf_release(&rev);
+}
+
 /*
  * Determine if a submodule has been initialized at a given 'path'
  */
diff --git a/submodule.h b/submodule.h
index 6ec5f2f..9203d89 100644
--- a/submodule.h
+++ b/submodule.h
@@ -37,6 +37,7 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
 		const char *path);
 int submodule_config(const char *var, const char *value, void *cb);
 void gitmodules_config(void);
+extern void gitmodules_config_sha1(const unsigned char *commit_sha1);
 extern int is_submodule_initialized(const char *path);
 extern int is_submodule_populated(const char *path);
 int parse_submodule_update_strategy(const char *value,
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v5 3/6] grep: add submodules as a grep source type
  2016-11-22 18:46       ` [PATCH v5 " Brandon Williams
  2016-11-22 18:46         ` [PATCH v5 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
  2016-11-22 18:46         ` [PATCH v5 2/6] submodules: load gitmodules file from commit sha1 Brandon Williams
@ 2016-11-22 18:46         ` Brandon Williams
  2016-11-22 18:46         ` [PATCH v5 4/6] grep: optionally recurse into submodules Brandon Williams
                           ` (3 subsequent siblings)
  6 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-22 18:46 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

Add `GREP_SOURCE_SUBMODULE` as a grep_source type and cases for this new
type in the various switch statements in grep.c.

When initializing a grep_source with type `GREP_SOURCE_SUBMODULE` the
identifier can either be NULL (to indicate that the working tree will be
used) or a SHA1 (the REV of the submodule to be grep'd).  If the
identifier is a SHA1 then we want to fall through to the
`GREP_SOURCE_SHA1` case to handle the copying of the SHA1.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
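The fall-through described above can be sketched in isolation as follows.
This is a simplified model, not the code in grep.c: the enum and struct
names (source_type, source, source_init) are made up for the example, and
plain malloc()/memcpy() stand in for xmalloc()/hashcpy().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced version of the grep_source type tags. */
enum source_type { SRC_FILE, SRC_SHA1, SRC_BUF, SRC_SUBMODULE };

struct source {
	enum source_type type;
	void *identifier;
};

void source_init(struct source *s, enum source_type type,
		 const void *identifier)
{
	s->type = type;
	switch (type) {
	case SRC_FILE: {
		/* identifier is a NUL-terminated path: copy the string */
		size_t len = strlen(identifier) + 1;

		s->identifier = malloc(len);
		assert(s->identifier);
		memcpy(s->identifier, identifier, len);
		break;
	}
	case SRC_SUBMODULE:
		if (!identifier) {
			/* NULL means "use the working tree" */
			s->identifier = NULL;
			break;
		}
		/* FALL THROUGH: a non-NULL identifier is a 20-byte SHA1 */
	case SRC_SHA1:
		s->identifier = malloc(20);
		assert(s->identifier);
		memcpy(s->identifier, identifier, 20);
		break;
	case SRC_BUF:
		s->identifier = NULL;
		break;
	}
}
```

The point of the layout is that the submodule case needs no copy of the
SHA1 handling; it only has to intercept the NULL identifier first.
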
 grep.c | 16 +++++++++++++++-
 grep.h |  1 +
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 1194d35..0dbdc1d 100644
--- a/grep.c
+++ b/grep.c
@@ -1735,12 +1735,23 @@ void grep_source_init(struct grep_source *gs, enum grep_source_type type,
 	case GREP_SOURCE_FILE:
 		gs->identifier = xstrdup(identifier);
 		break;
+	case GREP_SOURCE_SUBMODULE:
+		if (!identifier) {
+			gs->identifier = NULL;
+			break;
+		}
+		/*
+		 * FALL THROUGH
+		 * If the identifier is non-NULL (in the submodule case) it
+		 * will be a SHA1 that needs to be copied.
+		 */
 	case GREP_SOURCE_SHA1:
 		gs->identifier = xmalloc(20);
 		hashcpy(gs->identifier, identifier);
 		break;
 	case GREP_SOURCE_BUF:
 		gs->identifier = NULL;
+		break;
 	}
 }
 
@@ -1760,6 +1771,7 @@ void grep_source_clear_data(struct grep_source *gs)
 	switch (gs->type) {
 	case GREP_SOURCE_FILE:
 	case GREP_SOURCE_SHA1:
+	case GREP_SOURCE_SUBMODULE:
 		free(gs->buf);
 		gs->buf = NULL;
 		gs->size = 0;
@@ -1831,8 +1843,10 @@ static int grep_source_load(struct grep_source *gs)
 		return grep_source_load_sha1(gs);
 	case GREP_SOURCE_BUF:
 		return gs->buf ? 0 : -1;
+	case GREP_SOURCE_SUBMODULE:
+		break;
 	}
-	die("BUG: invalid grep_source type");
+	die("BUG: invalid grep_source type to load");
 }
 
 void grep_source_load_driver(struct grep_source *gs)
diff --git a/grep.h b/grep.h
index 5856a23..267534c 100644
--- a/grep.h
+++ b/grep.h
@@ -161,6 +161,7 @@ struct grep_source {
 		GREP_SOURCE_SHA1,
 		GREP_SOURCE_FILE,
 		GREP_SOURCE_BUF,
+		GREP_SOURCE_SUBMODULE,
 	} type;
 	void *identifier;
 
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v5 4/6] grep: optionally recurse into submodules
  2016-11-22 18:46       ` [PATCH v5 " Brandon Williams
                           ` (2 preceding siblings ...)
  2016-11-22 18:46         ` [PATCH v5 3/6] grep: add submodules as a grep source type Brandon Williams
@ 2016-11-22 18:46         ` Brandon Williams
  2016-11-22 18:46         ` [PATCH v5 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
                           ` (2 subsequent siblings)
  6 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-22 18:46 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

Allow grep to recognize submodules and recursively search for patterns in
each submodule.  This is done by forking off a process to recursively
call grep on each submodule.  The top level --super-prefix option is
used to pass a path to the submodule which can in turn be used to
prepend to output or in pathspec matching logic.

Recursion only occurs for submodules which have been initialized and
checked out by the parent project.  If a submodule hasn't been
initialized and checked out it is simply skipped.

In order to support the existing multi-threading infrastructure in grep,
output from each child process is captured in a strbuf so that it can be
later printed to the console in an ordered fashion.

To limit the number of threads that are created, each child process gets
half the number of threads of its parent (minimum of 1); otherwise we
could potentially have a fork-bomb.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
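Two small pieces of arithmetic in this patch can be shown on their own:
the thread count passed to a child via --threads, and the mapping from a
child's exit status to a "hit" flag.  The function names below are
hypothetical.  The -1 return for unexpected statuses is also an assumption
made for the sketch; the patch itself calls die() in that case.

```c
#include <assert.h>

/*
 * Threads handed to a child process: half of the parent's count,
 * rounded up, as in "(num_threads + 1) / 2".  A threaded parent
 * therefore never hands out fewer than 1; a parent running with
 * 0 threads passes 0 on.
 */
int child_threads(int parent_threads)
{
	assert(parent_threads >= 0);
	return (parent_threads + 1) / 2;
}

/*
 * Map a child's exit status to a hit flag: 0 means the child found a
 * match, 1 means it did not.  Anything else is an error, reported
 * here as -1 (hypothetical; the patch dies instead).
 */
int hit_from_status(int status)
{
	if (status == 0)
		return 1;
	if (status == 1)
		return 0;
	return -1;
}
```

With the default of 8 threads, successive levels of nesting would get
4, 2, 1, 1, ... threads under this formula.
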
 Documentation/git-grep.txt         |   5 +
 builtin/grep.c                     | 300 ++++++++++++++++++++++++++++++++++---
 git.c                              |   2 +-
 t/t7814-grep-recurse-submodules.sh |  99 ++++++++++++
 4 files changed, 386 insertions(+), 20 deletions(-)
 create mode 100755 t/t7814-grep-recurse-submodules.sh

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 0ecea6e..17aa1ba 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -26,6 +26,7 @@ SYNOPSIS
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
+	   [--recurse-submodules]
 	   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...]
 	   [--] [<pathspec>...]
 
@@ -88,6 +89,10 @@ OPTIONS
 	mechanism.  Only useful when searching files in the current directory
 	with `--no-index`.
 
+--recurse-submodules::
+	Recursively search in each submodule that has been initialized and
+	checked out in the repository.
+
 -a::
 --text::
 	Process binary files as if they were text.
diff --git a/builtin/grep.c b/builtin/grep.c
index 8887b6a..dca0be6 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -18,12 +18,20 @@
 #include "quote.h"
 #include "dir.h"
 #include "pathspec.h"
+#include "submodule.h"
 
 static char const * const grep_usage[] = {
 	N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"),
 	NULL
 };
 
+static const char *super_prefix;
+static int recurse_submodules;
+static struct argv_array submodule_options = ARGV_ARRAY_INIT;
+
+static int grep_submodule_launch(struct grep_opt *opt,
+				 const struct grep_source *gs);
+
 #define GREP_NUM_THREADS_DEFAULT 8
 static int num_threads;
 
@@ -174,7 +182,10 @@ static void *run(void *arg)
 			break;
 
 		opt->output_priv = w;
-		hit |= grep_source(opt, &w->source);
+		if (w->source.type == GREP_SOURCE_SUBMODULE)
+			hit |= grep_submodule_launch(opt, &w->source);
+		else
+			hit |= grep_source(opt, &w->source);
 		grep_source_clear_data(&w->source);
 		work_done(w);
 	}
@@ -300,6 +311,10 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 	if (opt->relative && opt->prefix_length) {
 		quote_path_relative(filename + tree_name_len, opt->prefix, &pathbuf);
 		strbuf_insert(&pathbuf, 0, filename, tree_name_len);
+	} else if (super_prefix) {
+		strbuf_add(&pathbuf, filename, tree_name_len);
+		strbuf_addstr(&pathbuf, super_prefix);
+		strbuf_addstr(&pathbuf, filename + tree_name_len);
 	} else {
 		strbuf_addstr(&pathbuf, filename);
 	}
@@ -328,10 +343,13 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 {
 	struct strbuf buf = STRBUF_INIT;
 
-	if (opt->relative && opt->prefix_length)
+	if (opt->relative && opt->prefix_length) {
 		quote_path_relative(filename, opt->prefix, &buf);
-	else
+	} else {
+		if (super_prefix)
+			strbuf_addstr(&buf, super_prefix);
 		strbuf_addstr(&buf, filename);
+	}
 
 #ifndef NO_PTHREADS
 	if (num_threads) {
@@ -378,31 +396,260 @@ static void run_pager(struct grep_opt *opt, const char *prefix)
 		exit(status);
 }
 
-static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int cached)
+static void compile_submodule_options(const struct grep_opt *opt,
+				      const struct pathspec *pathspec,
+				      int cached, int untracked,
+				      int opt_exclude, int use_index,
+				      int pattern_type_arg)
+{
+	struct grep_pat *pattern;
+	int i;
+
+	if (recurse_submodules)
+		argv_array_push(&submodule_options, "--recurse-submodules");
+
+	if (cached)
+		argv_array_push(&submodule_options, "--cached");
+	if (!use_index)
+		argv_array_push(&submodule_options, "--no-index");
+	if (untracked)
+		argv_array_push(&submodule_options, "--untracked");
+	if (opt_exclude > 0)
+		argv_array_push(&submodule_options, "--exclude-standard");
+
+	if (opt->invert)
+		argv_array_push(&submodule_options, "-v");
+	if (opt->ignore_case)
+		argv_array_push(&submodule_options, "-i");
+	if (opt->word_regexp)
+		argv_array_push(&submodule_options, "-w");
+	switch (opt->binary) {
+	case GREP_BINARY_NOMATCH:
+		argv_array_push(&submodule_options, "-I");
+		break;
+	case GREP_BINARY_TEXT:
+		argv_array_push(&submodule_options, "-a");
+		break;
+	default:
+		break;
+	}
+	if (opt->allow_textconv)
+		argv_array_push(&submodule_options, "--textconv");
+	if (opt->max_depth != -1)
+		argv_array_pushf(&submodule_options, "--max-depth=%d",
+				 opt->max_depth);
+	if (opt->linenum)
+		argv_array_push(&submodule_options, "-n");
+	if (!opt->pathname)
+		argv_array_push(&submodule_options, "-h");
+	if (!opt->relative)
+		argv_array_push(&submodule_options, "--full-name");
+	if (opt->name_only)
+		argv_array_push(&submodule_options, "-l");
+	if (opt->unmatch_name_only)
+		argv_array_push(&submodule_options, "-L");
+	if (opt->null_following_name)
+		argv_array_push(&submodule_options, "-z");
+	if (opt->count)
+		argv_array_push(&submodule_options, "-c");
+	if (opt->file_break)
+		argv_array_push(&submodule_options, "--break");
+	if (opt->heading)
+		argv_array_push(&submodule_options, "--heading");
+	if (opt->pre_context)
+		argv_array_pushf(&submodule_options, "--before-context=%d",
+				 opt->pre_context);
+	if (opt->post_context)
+		argv_array_pushf(&submodule_options, "--after-context=%d",
+				 opt->post_context);
+	if (opt->funcname)
+		argv_array_push(&submodule_options, "-p");
+	if (opt->funcbody)
+		argv_array_push(&submodule_options, "-W");
+	if (opt->all_match)
+		argv_array_push(&submodule_options, "--all-match");
+	if (opt->debug)
+		argv_array_push(&submodule_options, "--debug");
+	if (opt->status_only)
+		argv_array_push(&submodule_options, "-q");
+
+	switch (pattern_type_arg) {
+	case GREP_PATTERN_TYPE_BRE:
+		argv_array_push(&submodule_options, "-G");
+		break;
+	case GREP_PATTERN_TYPE_ERE:
+		argv_array_push(&submodule_options, "-E");
+		break;
+	case GREP_PATTERN_TYPE_FIXED:
+		argv_array_push(&submodule_options, "-F");
+		break;
+	case GREP_PATTERN_TYPE_PCRE:
+		argv_array_push(&submodule_options, "-P");
+		break;
+	case GREP_PATTERN_TYPE_UNSPECIFIED:
+		break;
+	}
+
+	for (pattern = opt->pattern_list; pattern != NULL;
+	     pattern = pattern->next) {
+		switch (pattern->token) {
+		case GREP_PATTERN:
+			argv_array_pushf(&submodule_options, "-e%s",
+					 pattern->pattern);
+			break;
+		case GREP_AND:
+		case GREP_OPEN_PAREN:
+		case GREP_CLOSE_PAREN:
+		case GREP_NOT:
+		case GREP_OR:
+			argv_array_push(&submodule_options, pattern->pattern);
+			break;
+		/* BODY and HEAD are not used by git-grep */
+		case GREP_PATTERN_BODY:
+		case GREP_PATTERN_HEAD:
+			break;
+		}
+	}
+
+	/*
+	 * Limit number of threads for child process to use.
+	 * This is to prevent potential fork-bomb behavior of git-grep as each
+	 * submodule process has its own thread pool.
+	 */
+	argv_array_pushf(&submodule_options, "--threads=%d",
+			 (num_threads + 1) / 2);
+
+	/* Add Pathspecs */
+	argv_array_push(&submodule_options, "--");
+	for (i = 0; i < pathspec->nr; i++)
+		argv_array_push(&submodule_options,
+				pathspec->items[i].original);
+}
+
+/*
+ * Launch child process to grep contents of a submodule
+ */
+static int grep_submodule_launch(struct grep_opt *opt,
+				 const struct grep_source *gs)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	int status, i;
+	struct work_item *w = opt->output_priv;
+
+	prepare_submodule_repo_env(&cp.env_array);
+
+	/* Add super prefix */
+	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
+			 super_prefix ? super_prefix : "",
+			 gs->name);
+	argv_array_push(&cp.args, "grep");
+
+	/* Add options */
+	for (i = 0; i < submodule_options.argc; i++)
+		argv_array_push(&cp.args, submodule_options.argv[i]);
+
+	cp.git_cmd = 1;
+	cp.dir = gs->path;
+
+	/*
+	 * Capture output to output buffer and check the return code from the
+	 * child process.  A '0' indicates a hit, a '1' indicates no hit and
+	 * anything else is an error.
+	 */
+	status = capture_command(&cp, &w->out, 0);
+	if (status && (status != 1)) {
+		/* flush the buffer */
+		write_or_die(1, w->out.buf, w->out.len);
+		die("process for submodule '%s' failed with exit code: %d",
+		    gs->name, status);
+	}
+
+	/* invert the return code to make a hit equal to 1 */
+	return !status;
+}
+
+/*
+ * Prep grep structures for a submodule grep
+ * sha1: the sha1 of the submodule or NULL if using the working tree
+ * filename: name of the submodule including tree name of parent
+ * path: location of the submodule
+ */
+static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1,
+			  const char *filename, const char *path)
+{
+	if (!is_submodule_initialized(path))
+		return 0;
+	if (!is_submodule_populated(path))
+		return 0;
+
+#ifndef NO_PTHREADS
+	if (num_threads) {
+		add_work(opt, GREP_SOURCE_SUBMODULE, filename, path, sha1);
+		return 0;
+	} else
+#endif
+	{
+		struct work_item w;
+		int hit;
+
+		grep_source_init(&w.source, GREP_SOURCE_SUBMODULE,
+				 filename, path, sha1);
+		strbuf_init(&w.out, 0);
+		opt->output_priv = &w;
+		hit = grep_submodule_launch(opt, &w.source);
+
+		write_or_die(1, w.out.buf, w.out.len);
+
+		grep_source_clear(&w.source);
+		strbuf_release(&w.out);
+		return hit;
+	}
+}
+
+static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec,
+		      int cached)
 {
 	int hit = 0;
 	int nr;
+	struct strbuf name = STRBUF_INIT;
+	int name_base_len = 0;
+	if (super_prefix) {
+		name_base_len = strlen(super_prefix);
+		strbuf_addstr(&name, super_prefix);
+	}
+
 	read_cache();
 
 	for (nr = 0; nr < active_nr; nr++) {
 		const struct cache_entry *ce = active_cache[nr];
-		if (!S_ISREG(ce->ce_mode))
-			continue;
-		if (!ce_path_match(ce, pathspec, NULL))
+		strbuf_setlen(&name, name_base_len);
+		strbuf_addstr(&name, ce->name);
+
+		if (S_ISREG(ce->ce_mode) &&
+		    match_pathspec(pathspec, name.buf, name.len, 0, NULL,
+				   S_ISDIR(ce->ce_mode) ||
+				   S_ISGITLINK(ce->ce_mode))) {
+			/*
+			 * If CE_VALID is on, we assume worktree file and its
+			 * cache entry are identical, even if worktree file has
+			 * been modified, so use cache version instead
+			 */
+			if (cached || (ce->ce_flags & CE_VALID) ||
+			    ce_skip_worktree(ce)) {
+				if (ce_stage(ce) || ce_intent_to_add(ce))
+					continue;
+				hit |= grep_sha1(opt, ce->oid.hash, ce->name,
+						 0, ce->name);
+			} else {
+				hit |= grep_file(opt, ce->name);
+			}
+		} else if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
+			   submodule_path_match(pathspec, name.buf, NULL)) {
+			hit |= grep_submodule(opt, NULL, ce->name, ce->name);
+		} else {
 			continue;
-		/*
-		 * If CE_VALID is on, we assume worktree file and its cache entry
-		 * are identical, even if worktree file has been modified, so use
-		 * cache version instead
-		 */
-		if (cached || (ce->ce_flags & CE_VALID) || ce_skip_worktree(ce)) {
-			if (ce_stage(ce) || ce_intent_to_add(ce))
-				continue;
-			hit |= grep_sha1(opt, ce->oid.hash, ce->name, 0,
-					 ce->name);
 		}
-		else
-			hit |= grep_file(opt, ce->name);
+
 		if (ce_stage(ce)) {
 			do {
 				nr++;
@@ -413,6 +660,8 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int
 		if (hit && opt->status_only)
 			break;
 	}
+
+	strbuf_release(&name);
 	return hit;
 }
 
@@ -651,6 +900,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			N_("search in both tracked and untracked files")),
 		OPT_SET_INT(0, "exclude-standard", &opt_exclude,
 			    N_("ignore files specified via '.gitignore'"), 1),
+		OPT_BOOL(0, "recurse-submodules", &recurse_submodules,
+			 N_("recursively search in each submodule")),
 		OPT_GROUP(""),
 		OPT_BOOL('v', "invert-match", &opt.invert,
 			N_("show non-matching lines")),
@@ -755,6 +1006,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	init_grep_defaults();
 	git_config(grep_cmd_config, NULL);
 	grep_init(&opt, prefix);
+	super_prefix = get_super_prefix();
 
 	/*
 	 * If there is no -- then the paths must exist in the working
@@ -872,6 +1124,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	pathspec.max_depth = opt.max_depth;
 	pathspec.recursive = 1;
 
+	if (recurse_submodules) {
+		gitmodules_config();
+		compile_submodule_options(&opt, &pathspec, cached, untracked,
+					  opt_exclude, use_index,
+					  pattern_type_arg);
+	}
+
 	if (show_in_pager && (cached || list.nr))
 		die(_("--open-files-in-pager only works on the worktree"));
 
@@ -895,6 +1154,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		}
 	}
 
+	if (recurse_submodules && (!use_index || untracked || list.nr))
+		die(_("option not supported with --recurse-submodules."));
+
 	if (!show_in_pager && !opt.status_only)
 		setup_pager();
 
diff --git a/git.c b/git.c
index efa1059..a156efd 100644
--- a/git.c
+++ b/git.c
@@ -434,7 +434,7 @@ static struct cmd_struct commands[] = {
 	{ "fsck-objects", cmd_fsck, RUN_SETUP },
 	{ "gc", cmd_gc, RUN_SETUP },
 	{ "get-tar-commit-id", cmd_get_tar_commit_id },
-	{ "grep", cmd_grep, RUN_SETUP_GENTLY },
+	{ "grep", cmd_grep, RUN_SETUP_GENTLY | SUPPORT_SUPER_PREFIX },
 	{ "hash-object", cmd_hash_object },
 	{ "help", cmd_help },
 	{ "index-pack", cmd_index_pack, RUN_SETUP_GENTLY },
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
new file mode 100755
index 0000000..1019125
--- /dev/null
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -0,0 +1,99 @@
+#!/bin/sh
+
+test_description='Test grep recurse-submodules feature
+
+This test verifies the recurse-submodules feature correctly greps across
+submodules.
+'
+
+. ./test-lib.sh
+
+test_expect_success 'setup directory structure and submodule' '
+	echo "foobar" >a &&
+	mkdir b &&
+	echo "bar" >b/b &&
+	git add a b &&
+	git commit -m "add a and b" &&
+	git init submodule &&
+	echo "foobar" >submodule/a &&
+	git -C submodule add a &&
+	git -C submodule commit -m "add a" &&
+	git submodule add ./submodule &&
+	git commit -m "added submodule"
+'
+
+test_expect_success 'grep correctly finds patterns in a submodule' '
+	cat >expect <<-\EOF &&
+	a:foobar
+	b/b:bar
+	submodule/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and basic pathspecs' '
+	cat >expect <<-\EOF &&
+	submodule/a:foobar
+	EOF
+
+	git grep -e. --recurse-submodules -- submodule >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and nested submodules' '
+	git init submodule/sub &&
+	echo "foobar" >submodule/sub/a &&
+	git -C submodule/sub add a &&
+	git -C submodule/sub commit -m "add a" &&
+	git -C submodule submodule add ./sub &&
+	git -C submodule add sub &&
+	git -C submodule commit -m "added sub" &&
+	git add submodule &&
+	git commit -m "updated submodule" &&
+
+	cat >expect <<-\EOF &&
+	a:foobar
+	b/b:bar
+	submodule/a:foobar
+	submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and multiple patterns' '
+	cat >expect <<-\EOF &&
+	a:foobar
+	submodule/a:foobar
+	submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --and -e "foo" --recurse-submodules >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and multiple patterns' '
+	cat >expect <<-\EOF &&
+	b/b:bar
+	EOF
+
+	git grep -e "bar" --and --not -e "foo" --recurse-submodules >actual &&
+	test_cmp expect actual
+'
+
+test_incompatible_with_recurse_submodules ()
+{
+	test_expect_success "--recurse-submodules and $1 are incompatible" "
+		test_must_fail git grep -e. --recurse-submodules $1 2>actual &&
+		test_i18ngrep 'not supported with --recurse-submodules' actual
+	"
+}
+
+test_incompatible_with_recurse_submodules --untracked
+test_incompatible_with_recurse_submodules --no-index
+test_incompatible_with_recurse_submodules HEAD
+
+test_done
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v5 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-11-22 18:46       ` [PATCH v5 " Brandon Williams
                           ` (3 preceding siblings ...)
  2016-11-22 18:46         ` [PATCH v5 4/6] grep: optionally recurse into submodules Brandon Williams
@ 2016-11-22 18:46         ` Brandon Williams
  2016-11-22 22:59           ` Junio C Hamano
  2016-11-22 18:46         ` [PATCH v5 6/6] grep: search history of moved submodules Brandon Williams
  2016-12-01  1:28         ` [PATCH v6 0/6] recursively grep across submodules Brandon Williams
  6 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-11-22 18:46 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

Teach grep to recursively search in submodules when provided with a
<tree> object. This allows grep to search a submodule based on the state
of the submodule that is present in a commit of the super project.

When grep is provided with a <tree> object, the name of the object is
prefixed to all output.  In order to provide uniformity of output
between the parent and child processes the option `--parent-basename`
has been added so that the child can preface all of its output with the
name of the parent's object instead of the name of the commit SHA1 of
the submodule. This changes output from the command
`git grep -e. -l --recurse-submodules HEAD`

from:
  HEAD:file
  <commit sha1 of submodule>:sub/file

to:
  HEAD:file
  HEAD:sub/file

Signed-off-by: Brandon Williams <bmwill@google.com>
---
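The way a name of the form 'tree-name:filename' is split before launching
the child can be illustrated with a standalone helper.  This is only a
sketch of the strchr()-based logic: split_tree_name() is a hypothetical
name, and the patch additionally requires gs->identifier to be set before
it treats the part before the colon as the parent's basename.

```c
#include <assert.h>
#include <string.h>

/*
 * Split "tree-name:path" at the first ':'.
 * Returns a pointer to the path part and stores the length of the
 * tree name in *base_len.  If there is no ':', the whole string is
 * the path and *base_len is set to 0.
 */
const char *split_tree_name(const char *full, int *base_len)
{
	const char *colon;

	assert(full && base_len);
	colon = strchr(full, ':');
	if (!colon) {
		*base_len = 0;
		return full;
	}
	*base_len = (int)(colon - full);
	return colon + 1;
}
```

For "HEAD:sub/file" this yields a basename length of 4 ("HEAD") and the
path "sub/file", which correspond to the values passed to
--parent-basename and --super-prefix respectively.
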
 Documentation/git-grep.txt         |  13 ++++-
 builtin/grep.c                     |  76 ++++++++++++++++++++++++---
 t/t7814-grep-recurse-submodules.sh | 103 ++++++++++++++++++++++++++++++++++++-
 tree-walk.c                        |  28 ++++++++++
 4 files changed, 211 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 17aa1ba..71f32f3 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -26,7 +26,7 @@ SYNOPSIS
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
-	   [--recurse-submodules]
+	   [--recurse-submodules] [--parent-basename <basename>]
 	   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...]
 	   [--] [<pathspec>...]
 
@@ -91,7 +91,16 @@ OPTIONS
 
 --recurse-submodules::
 	Recursively search in each submodule that has been initialized and
-	checked out in the repository.
+	checked out in the repository.  When used in combination with the
+	<tree> option the prefix of all submodule output will be the name of
+	the parent project's <tree> object.
+
+--parent-basename <basename>::
+	For internal use only.  In order to produce uniform output with the
+	--recurse-submodules option, this option can be used to provide the
+	basename of a parent's <tree> object to a submodule so the submodule
+	can prefix its output with the parent's name rather than the SHA1 of
+	the submodule.
 
 -a::
 --text::
diff --git a/builtin/grep.c b/builtin/grep.c
index dca0be6..5918a26 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -19,6 +19,7 @@
 #include "dir.h"
 #include "pathspec.h"
 #include "submodule.h"
+#include "submodule-config.h"
 
 static char const * const grep_usage[] = {
 	N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"),
@@ -28,6 +29,7 @@ static char const * const grep_usage[] = {
 static const char *super_prefix;
 static int recurse_submodules;
 static struct argv_array submodule_options = ARGV_ARRAY_INIT;
+static const char *parent_basename;
 
 static int grep_submodule_launch(struct grep_opt *opt,
 				 const struct grep_source *gs);
@@ -534,19 +536,53 @@ static int grep_submodule_launch(struct grep_opt *opt,
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
 	int status, i;
+	const char *end_of_base;
+	const char *name;
 	struct work_item *w = opt->output_priv;
 
+	end_of_base = strchr(gs->name, ':');
+	if (gs->identifier && end_of_base)
+		name = end_of_base + 1;
+	else
+		name = gs->name;
+
 	prepare_submodule_repo_env(&cp.env_array);
 
 	/* Add super prefix */
 	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
 			 super_prefix ? super_prefix : "",
-			 gs->name);
+			 name);
 	argv_array_push(&cp.args, "grep");
 
+	/*
+	 * Add basename of parent project
+	 * When performing grep on a tree object the filename is prefixed
+	 * with the object's name: 'tree-name:filename'.  In order to
+	 * provide uniformity of output we want to pass the name of the
+	 * parent project's object name to the submodule so the submodule can
+	 * prefix its output with the parent's name and not its own SHA1.
+	 */
+	if (gs->identifier && end_of_base)
+		argv_array_pushf(&cp.args, "--parent-basename=%.*s",
+				 (int) (end_of_base - gs->name),
+				 gs->name);
+
 	/* Add options */
-	for (i = 0; i < submodule_options.argc; i++)
+	for (i = 0; i < submodule_options.argc; i++) {
+		/*
+		 * If there is a tree identifier for the submodule, add the
+		 * rev after adding the submodule options but before the
+		 * pathspecs.  To do this we listen for the '--' and insert the
+		 * sha1 before pushing the '--' onto the child process argv
+		 * array.
+		 */
+		if (gs->identifier &&
+		    !strcmp("--", submodule_options.argv[i])) {
+			argv_array_push(&cp.args, sha1_to_hex(gs->identifier));
+		}
+
 		argv_array_push(&cp.args, submodule_options.argv[i]);
+	}
 
 	cp.git_cmd = 1;
 	cp.dir = gs->path;
@@ -673,12 +709,22 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 	enum interesting match = entry_not_interesting;
 	struct name_entry entry;
 	int old_baselen = base->len;
+	struct strbuf name = STRBUF_INIT;
+	int name_base_len = 0;
+	if (super_prefix) {
+		strbuf_addstr(&name, super_prefix);
+		name_base_len = name.len;
+	}
 
 	while (tree_entry(tree, &entry)) {
 		int te_len = tree_entry_len(&entry);
 
 		if (match != all_entries_interesting) {
-			match = tree_entry_interesting(&entry, base, tn_len, pathspec);
+			strbuf_addstr(&name, base->buf + tn_len);
+			match = tree_entry_interesting(&entry, &name,
+						       0, pathspec);
+			strbuf_setlen(&name, name_base_len);
+
 			if (match == all_entries_not_interesting)
 				break;
 			if (match == entry_not_interesting)
@@ -690,8 +736,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 		if (S_ISREG(entry.mode)) {
 			hit |= grep_sha1(opt, entry.oid->hash, base->buf, tn_len,
 					 check_attr ? base->buf + tn_len : NULL);
-		}
-		else if (S_ISDIR(entry.mode)) {
+		} else if (S_ISDIR(entry.mode)) {
 			enum object_type type;
 			struct tree_desc sub;
 			void *data;
@@ -707,12 +752,18 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 			hit |= grep_tree(opt, pathspec, &sub, base, tn_len,
 					 check_attr);
 			free(data);
+		} else if (recurse_submodules && S_ISGITLINK(entry.mode)) {
+			hit |= grep_submodule(opt, entry.oid->hash, base->buf,
+					      base->buf + tn_len);
 		}
+
 		strbuf_setlen(base, old_baselen);
 
 		if (hit && opt->status_only)
 			break;
 	}
+
+	strbuf_release(&name);
 	return hit;
 }
 
@@ -736,6 +787,10 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 		if (!data)
 			die(_("unable to read tree (%s)"), oid_to_hex(&obj->oid));
 
+		/* Use parent's name as base when recursing submodules */
+		if (recurse_submodules && parent_basename)
+			name = parent_basename;
+
 		len = name ? strlen(name) : 0;
 		strbuf_init(&base, PATH_MAX + len + 1);
 		if (len) {
@@ -762,6 +817,12 @@ static int grep_objects(struct grep_opt *opt, const struct pathspec *pathspec,
 	for (i = 0; i < nr; i++) {
 		struct object *real_obj;
 		real_obj = deref_tag(list->objects[i].item, NULL, 0);
+
+		/* load the gitmodules file for this rev */
+		if (recurse_submodules) {
+			submodule_free();
+			gitmodules_config_sha1(real_obj->oid.hash);
+		}
 		if (grep_object(opt, pathspec, real_obj, list->objects[i].name, list->objects[i].path)) {
 			hit = 1;
 			if (opt->status_only)
@@ -902,6 +963,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			    N_("ignore files specified via '.gitignore'"), 1),
 		OPT_BOOL(0, "recurse-submodules", &recurse_submodules,
 			 N_("recursivley search in each submodule")),
+		OPT_STRING(0, "parent-basename", &parent_basename,
+			   N_("basename"),
+			   N_("prepend parent project's basename to output")),
 		OPT_GROUP(""),
 		OPT_BOOL('v', "invert-match", &opt.invert,
 			N_("show non-matching lines")),
@@ -1154,7 +1218,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	if (recurse_submodules && (!use_index || untracked || list.nr))
+	if (recurse_submodules && (!use_index || untracked))
 		die(_("option not supported with --recurse-submodules."));
 
 	if (!show_in_pager && !opt.status_only)
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index 1019125..9e93fe7 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -84,6 +84,108 @@ test_expect_success 'grep and multiple patterns' '
 	test_cmp expect actual
 '
 
+test_expect_success 'basic grep tree' '
+	cat >expect <<-\EOF &&
+	HEAD:a:foobar
+	HEAD:b/b:bar
+	HEAD:submodule/a:foobar
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree HEAD^' '
+	cat >expect <<-\EOF &&
+	HEAD^:a:foobar
+	HEAD^:b/b:bar
+	HEAD^:submodule/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD^ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree HEAD^^' '
+	cat >expect <<-\EOF &&
+	HEAD^^:a:foobar
+	HEAD^^:b/b:bar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD^^ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree and pathspecs' '
+	cat >expect <<-\EOF &&
+	HEAD:submodule/a:foobar
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD -- submodule >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree and pathspecs' '
+	cat >expect <<-\EOF &&
+	HEAD:submodule/a:foobar
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD -- "submodule*a" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree and more pathspecs' '
+	cat >expect <<-\EOF &&
+	HEAD:submodule/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD -- "submodul?/a" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree and more pathspecs' '
+	cat >expect <<-\EOF &&
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD -- "submodul*/sub/a" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep recurse submodule colon in name' '
+	git init parent &&
+	test_when_finished "rm -rf parent" &&
+	echo "foobar" >"parent/fi:le" &&
+	git -C parent add "fi:le" &&
+	git -C parent commit -m "add fi:le" &&
+
+	git init "su:b" &&
+	test_when_finished "rm -rf su:b" &&
+	echo "foobar" >"su:b/fi:le" &&
+	git -C "su:b" add "fi:le" &&
+	git -C "su:b" commit -m "add fi:le" &&
+
+	git -C parent submodule add "../su:b" "su:b" &&
+	git -C parent commit -m "add submodule" &&
+
+	cat >expect <<-\EOF &&
+	fi:le:foobar
+	su:b/fi:le:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules >actual &&
+	test_cmp expect actual &&
+
+	cat >expect <<-\EOF &&
+	HEAD:fi:le:foobar
+	HEAD:su:b/fi:le:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_incompatible_with_recurse_submodules ()
 {
 	test_expect_success "--recurse-submodules and $1 are incompatible" "
@@ -94,6 +196,5 @@ test_incompatible_with_recurse_submodules ()
 
 test_incompatible_with_recurse_submodules --untracked
 test_incompatible_with_recurse_submodules --no-index
-test_incompatible_with_recurse_submodules HEAD
 
 test_done
diff --git a/tree-walk.c b/tree-walk.c
index 828f435..ff77605 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -1004,6 +1004,19 @@ static enum interesting do_match(const struct name_entry *entry,
 				 */
 				if (ps->recursive && S_ISDIR(entry->mode))
 					return entry_interesting;
+
+				/*
+				 * When matching against submodules with
+				 * wildcard characters, ensure that the entry
+				 * at least matches up to the first wild
+				 * character.  More accurate matching can then
+				 * be performed in the submodule itself.
+				 */
+				if (ps->recursive && S_ISGITLINK(entry->mode) &&
+				    !ps_strncmp(item, match + baselen,
+						entry->path,
+						item->nowildcard_len - baselen))
+					return entry_interesting;
 			}
 
 			continue;
@@ -1040,6 +1053,21 @@ static enum interesting do_match(const struct name_entry *entry,
 			strbuf_setlen(base, base_offset + baselen);
 			return entry_interesting;
 		}
+
+		/*
+		 * When matching against submodules with
+		 * wildcard characters, ensure that the entry
+		 * at least matches up to the first wild
+		 * character.  More accurate matching can then
+		 * be performed in the submodule itself.
+		 */
+		if (ps->recursive && S_ISGITLINK(entry->mode) &&
+		    !ps_strncmp(item, match, base->buf + base_offset,
+				item->nowildcard_len)) {
+			strbuf_setlen(base, base_offset + baselen);
+			return entry_interesting;
+		}
+
 		strbuf_setlen(base, base_offset + baselen);
 
 		/*
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v5 6/6] grep: search history of moved submodules
  2016-11-22 18:46       ` [PATCH v5 " Brandon Williams
                           ` (4 preceding siblings ...)
  2016-11-22 18:46         ` [PATCH v5 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
@ 2016-11-22 18:46         ` Brandon Williams
  2016-12-01  1:28         ` [PATCH v6 0/6] recursively grep across submodules Brandon Williams
  6 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-22 18:46 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, sbeller, jonathantanmy, gitster

If a submodule was renamed at any point since its inception, then
grepping a commit from before the move would fail to find a working
directory for the submodule, because the path recorded in that commit
differs from the current path.

This patch teaches grep to look for the submodule's .git directory in
the parent's .git/modules/ directory when the path to the submodule in
the commit being searched differs from its path in the currently
checked out commit.  If it is found, the child process that is spawned
to grep the submodule will chdir into that gitdir instead of a working
directory.

In order to override the explicit setting of the submodule child
process's gitdir environment variable (which was introduced in
'10f5c526'), `GIT_DIR_ENVIRONMENT` needs to be pushed onto the child
process's env_array.  This allows history to be searched from a
submodule's gitdir rather than from a working directory.
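
The fallback described above can be sketched roughly as follows.  This
is a simplified standalone illustration, not the code in the patch:
`pick_submodule_dir` and its parameters are hypothetical, and it only
checks that the directories exist, whereas the patch also verifies the
candidate with `is_git_directory()`.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: choose the directory in which to run the child
 * grep for a submodule.  Returns 1 and writes the directory to 'out'
 * on success, 0 if the submodule should be skipped.
 */
static int pick_submodule_dir(const char *super_gitdir, const char *path,
			      const char *name, int searching_history,
			      char *out, size_t outlen)
{
	struct stat st;
	char buf[4096];

	/* Preferred: a populated working tree at the recorded path. */
	snprintf(buf, sizeof(buf), "%s/.git", path);
	if (!stat(buf, &st)) {
		snprintf(out, outlen, "%s", path);
		return 1;
	}

	/* Without a rev to search there is nothing to fall back to. */
	if (!searching_history)
		return 0;

	/* Fallback: the submodule's gitdir inside the superproject. */
	snprintf(buf, sizeof(buf), "%s/modules/%s", super_gitdir, name);
	if (!stat(buf, &st) && S_ISDIR(st.st_mode)) {
		snprintf(out, outlen, "%s", buf);
		return 1;
	}
	return 0;
}
```

The order matters: a populated working tree always wins, and the gitdir
under modules/ is consulted only when a rev is being searched.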

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 builtin/grep.c                     | 20 +++++++++++++++++--
 t/t7814-grep-recurse-submodules.sh | 41 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 5918a26..2c727ef 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -547,6 +547,7 @@ static int grep_submodule_launch(struct grep_opt *opt,
 		name = gs->name;
 
 	prepare_submodule_repo_env(&cp.env_array);
+	argv_array_push(&cp.env_array, GIT_DIR_ENVIRONMENT);
 
 	/* Add super prefix */
 	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
@@ -615,8 +616,23 @@ static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1,
 {
 	if (!is_submodule_initialized(path))
 		return 0;
-	if (!is_submodule_populated(path))
-		return 0;
+	if (!is_submodule_populated(path)) {
+		/*
+		 * If searching history, check for the presence of the
+		 * submodule's gitdir before skipping the submodule.
+		 */
+		if (sha1) {
+			const struct submodule *sub =
+					submodule_from_path(null_sha1, path);
+			if (sub)
+				path = git_path("modules/%s", sub->name);
+
+			if (!(is_directory(path) && is_git_directory(path)))
+				return 0;
+		} else {
+			return 0;
+		}
+	}
 
 #ifndef NO_PTHREADS
 	if (num_threads) {
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index 9e93fe7..0507771 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -186,6 +186,47 @@ test_expect_success 'grep recurse submodule colon in name' '
 	test_cmp expect actual
 '
 
+test_expect_success 'grep history with moved submodules' '
+	git init parent &&
+	test_when_finished "rm -rf parent" &&
+	echo "foobar" >parent/file &&
+	git -C parent add file &&
+	git -C parent commit -m "add file" &&
+
+	git init sub &&
+	test_when_finished "rm -rf sub" &&
+	echo "foobar" >sub/file &&
+	git -C sub add file &&
+	git -C sub commit -m "add file" &&
+
+	git -C parent submodule add ../sub dir/sub &&
+	git -C parent commit -m "add submodule" &&
+
+	cat >expect <<-\EOF &&
+	dir/sub/file:foobar
+	file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules >actual &&
+	test_cmp expect actual &&
+
+	git -C parent mv dir/sub sub-moved &&
+	git -C parent commit -m "moved submodule" &&
+
+	cat >expect <<-\EOF &&
+	file:foobar
+	sub-moved/file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules >actual &&
+	test_cmp expect actual &&
+
+	cat >expect <<-\EOF &&
+	HEAD^:dir/sub/file:foobar
+	HEAD^:file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules HEAD^ >actual &&
+	test_cmp expect actual
+'
+
 test_incompatible_with_recurse_submodules ()
 {
 	test_expect_success "--recurse-submodules and $1 are incompatible" "
-- 
2.8.0.rc3.226.g39d4020



* Re: [PATCH v5 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-11-22 18:46         ` [PATCH v5 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
@ 2016-11-22 22:59           ` Junio C Hamano
  2016-11-22 23:21             ` Brandon Williams
  0 siblings, 1 reply; 126+ messages in thread
From: Junio C Hamano @ 2016-11-22 22:59 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, sbeller, jonathantanmy

Brandon Williams <bmwill@google.com> writes:

> diff --git a/tree-walk.c b/tree-walk.c
> index 828f435..ff77605 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -1004,6 +1004,19 @@ static enum interesting do_match(const struct name_entry *entry,
>  				 */
>  				if (ps->recursive && S_ISDIR(entry->mode))
>  					return entry_interesting;
> +
> +				/*
> +				 * When matching against submodules with
> +				 * wildcard characters, ensure that the entry
> +				 * at least matches up to the first wild
> +				 * character.  More accurate matching can then
> +				 * be performed in the submodule itself.
> +				 */
> +				if (ps->recursive && S_ISGITLINK(entry->mode) &&
> +				    !ps_strncmp(item, match + baselen,
> +						entry->path,
> +						item->nowildcard_len - baselen))
> +					return entry_interesting;
>  			}

This one (and the other hunk) feels more correct than the previous
round.  One thing to keep in mind however is that ps->recursive is
about "do we show a tree as a tree aka 040000, or do we descend into
it to show its contents?", not about "do we recurse into submodules?",
AFAICT.

So this change may have an impact on "git ls-tree -r" with pathspec;
I offhand do not know if that impact is undesirable or not.  A test
or two may be in order to illustrate what happens?  With a submodule
at "sub/module", running "git ls-tree -r HEAD -- sub/module/*" or
something like that, perhaps?
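
The check in the hunk quoted above can be sketched in isolation as
follows.  This is only an illustration under assumptions:
`literal_prefix_len` and `gitlink_may_match` are hypothetical names, and
the set of glob characters is assumed to be `*`, `?`, `[` and `\`.

```c
#include <string.h>

/* Length of the leading part of 'pattern' that has no glob characters. */
static size_t literal_prefix_len(const char *pattern)
{
	return strcspn(pattern, "*?[\\");
}

/*
 * Hypothetical check: a gitlink entry is worth descending into when it
 * agrees with the pathspec up to the first wildcard; exact matching is
 * left to the grep that runs inside the submodule.
 */
static int gitlink_may_match(const char *pattern, const char *entry_path)
{
	return !strncmp(pattern, entry_path, literal_prefix_len(pattern));
}
```

For a submodule at `submodule`, the pathspec `submodul?/a` passes
because both agree on the first eight bytes, `submodul`.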


* Re: [PATCH v5 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-11-22 22:59           ` Junio C Hamano
@ 2016-11-22 23:21             ` Brandon Williams
  2016-11-22 23:28               ` Brandon Williams
  0 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-11-22 23:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, sbeller, jonathantanmy

On 11/22, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
> 
> > diff --git a/tree-walk.c b/tree-walk.c
> > index 828f435..ff77605 100644
> > --- a/tree-walk.c
> > +++ b/tree-walk.c
> > @@ -1004,6 +1004,19 @@ static enum interesting do_match(const struct name_entry *entry,
> >  				 */
> >  				if (ps->recursive && S_ISDIR(entry->mode))
> >  					return entry_interesting;
> > +
> > +				/*
> > +				 * When matching against submodules with
> > +				 * wildcard characters, ensure that the entry
> > +				 * at least matches up to the first wild
> > +				 * character.  More accurate matching can then
> > +				 * be performed in the submodule itself.
> > +				 */
> > +				if (ps->recursive && S_ISGITLINK(entry->mode) &&
> > +				    !ps_strncmp(item, match + baselen,
> > +						entry->path,
> > +						item->nowildcard_len - baselen))
> > +					return entry_interesting;
> >  			}
> 
> This one (and the other hunk) feels more correct than the previous
> round.  One thing to keep in mind however is that ps->recursive is
> about "do we show a tree as a tree aka 040000, or do we descend into
> it to show its contents?", not about "do we recurse into submodules?",
> AFAICT.
> 
> So this change may have an impact on "git ls-tree -r" with pathspec;
> I offhand do not know if that impact is undesirable or not.  A test
> or two may be in order to illustrate what happens?  With a submodule
> at "sub/module", running "git ls-tree -r HEAD -- sub/module/*" or
> something like that, perhaps?

Maybe unrelated, but it looks like wildcard characters are overridden in
ls-tree.c per '170260ae'.  As such, wildmatching just doesn't work with
ls-tree, so `git ls-tree -r HEAD -- "*"` results in no hits.

-- 
Brandon Williams


* Re: [PATCH v5 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-11-22 23:21             ` Brandon Williams
@ 2016-11-22 23:28               ` Brandon Williams
  2016-11-22 23:37                 ` Junio C Hamano
  0 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-11-22 23:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, sbeller, jonathantanmy

On 11/22, Brandon Williams wrote:
> On 11/22, Junio C Hamano wrote:
> > Brandon Williams <bmwill@google.com> writes:
> > 
> > > diff --git a/tree-walk.c b/tree-walk.c
> > > index 828f435..ff77605 100644
> > > --- a/tree-walk.c
> > > +++ b/tree-walk.c
> > > @@ -1004,6 +1004,19 @@ static enum interesting do_match(const struct name_entry *entry,
> > >  				 */
> > >  				if (ps->recursive && S_ISDIR(entry->mode))
> > >  					return entry_interesting;
> > > +
> > > +				/*
> > > +				 * When matching against submodules with
> > > +				 * wildcard characters, ensure that the entry
> > > +				 * at least matches up to the first wild
> > > +				 * character.  More accurate matching can then
> > > +				 * be performed in the submodule itself.
> > > +				 */
> > > +				if (ps->recursive && S_ISGITLINK(entry->mode) &&
> > > +				    !ps_strncmp(item, match + baselen,
> > > +						entry->path,
> > > +						item->nowildcard_len - baselen))
> > > +					return entry_interesting;
> > >  			}
> > 
> > This one (and the other hunk) feels more correct than the previous
> > round.  One thing to keep in mind however is that ps->recursive is
> > about "do we show a tree as a tree aka 040000, or do we descend into
> > it to show its contents?", not about "do we recurse into submodules?",
> > AFAICT.
> > 
> > So this change may have an impact on "git ls-tree -r" with pathspec;
> > I offhand do not know if that impact is undesirable or not.  A test
> > or two may be in order to illustrate what happens?  With a submodule
> > at "sub/module", running "git ls-tree -r HEAD -- sub/module/*" or
> > something like that, perhaps?
> 
> Maybe unrelated, but it looks like wildcard characters are overridden in
> ls-tree.c per '170260ae'.  As such wildmatching just doesn't work with
> ls-tree.  so `git ls-tree -r HEAD -- "*"` results in no hits.

Wrong commit.  It's this one (f0096c06bcdeb7aa6ae8a749ddc9d6d4a2c381d1)
that disabled wildmatching, since ls-tree is 'plumbing'.

-- 
Brandon Williams


* Re: [PATCH v5 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-11-22 23:28               ` Brandon Williams
@ 2016-11-22 23:37                 ` Junio C Hamano
  2016-11-22 23:54                   ` Brandon Williams
  0 siblings, 1 reply; 126+ messages in thread
From: Junio C Hamano @ 2016-11-22 23:37 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, sbeller, jonathantanmy

Brandon Williams <bmwill@google.com> writes:

>> > So this change may have an impact on "git ls-tree -r" with pathspec;
>> > I offhand do not know if that impact is undesirable or not.  A test
>> > or two may be in order to illustrate what happens?  With a submodule
>> > at "sub/module", running "git ls-tree -r HEAD -- sub/module/*" or
>> > something like that, perhaps?
>> 
>> Maybe unrelated, but it looks like wildcard characters are overridden in
>> ls-tree.c per '170260ae'.  As such wildmatching just doesn't work with
>> ls-tree.  so `git ls-tree -r HEAD -- "*"` results in no hits.
>
> Wrong commit.  Its this one (f0096c06bcdeb7aa6ae8a749ddc9d6d4a2c381d1)
> that disabled wildmatching since it is 'plumbing'

OK.  Things that share tree-walk other than "ls-tree -r" are still
affected, no?


* Re: [PATCH v5 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-11-22 23:37                 ` Junio C Hamano
@ 2016-11-22 23:54                   ` Brandon Williams
  0 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-11-22 23:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, sbeller, jonathantanmy

On 11/22, Junio C Hamano wrote:
> Brandon Williams <bmwill@google.com> writes:
> 
> >> > So this change may have an impact on "git ls-tree -r" with pathspec;
> >> > I offhand do not know if that impact is undesirable or not.  A test
> >> > or two may be in order to illustrate what happens?  With a submodule
> >> > at "sub/module", running "git ls-tree -r HEAD -- sub/module/*" or
> >> > something like that, perhaps?
> >> 
> >> Maybe unrelated, but it looks like wildcard characters are overridden in
> >> ls-tree.c per '170260ae'.  As such wildmatching just doesn't work with
> >> ls-tree.  so `git ls-tree -r HEAD -- "*"` results in no hits.
> >
> > Wrong commit.  Its this one (f0096c06bcdeb7aa6ae8a749ddc9d6d4a2c381d1)
> > that disabled wildmatching since it is 'plumbing'
> 
> OK.  Things that share tree-walk other than "ls-tree -r" are still
> affected, no?

Yeah, potentially, though I'm having a difficult time finding a case that
would actually be affected.

-- 
Brandon Williams


* [PATCH v6 0/6] recursively grep across submodules
  2016-11-22 18:46       ` [PATCH v5 " Brandon Williams
                           ` (5 preceding siblings ...)
  2016-11-22 18:46         ` [PATCH v5 6/6] grep: search history of moved submodules Brandon Williams
@ 2016-12-01  1:28         ` Brandon Williams
  2016-12-01  1:28           ` [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
                             ` (7 more replies)
  6 siblings, 8 replies; 126+ messages in thread
From: Brandon Williams @ 2016-12-01  1:28 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, peff, sbeller, jonathantanmy, gitster

v6 fixes a race condition that existed in the 'is_submodule_populated'
function.  Instead of calling 'resolve_gitdir' to check for the existence of a
.git file/directory, use 'stat'.  'resolve_gitdir' calls 'chdir', which can
affect other running threads trying to load their files into a buffer in
memory.
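
To illustrate why a `chdir` in one thread is a problem for the others:
the current working directory belongs to the process, not to a thread.
The following standalone sketch is unrelated to git's own code and uses
the hypothetical names `worker` and `cwd_follows_other_thread`; it shows
one thread's `chdir("/")` being observed by another.

```c
#include <pthread.h>
#include <string.h>
#include <unistd.h>

/* Thread body: change the working directory of the whole process. */
static void *worker(void *arg)
{
	(void)arg;
	if (chdir("/"))
		return (void *)1;
	return NULL;
}

/*
 * Returns 1 if the calling thread's cwd is "/" after another thread
 * called chdir("/"), 0 if it is not, and -1 on error.  The original
 * cwd is restored before returning.
 */
static int cwd_follows_other_thread(void)
{
	char before[4096], after[4096];
	pthread_t t;
	void *res;
	int seen;

	if (!getcwd(before, sizeof(before)))
		return -1;
	if (pthread_create(&t, NULL, worker, NULL))
		return -1;
	if (pthread_join(t, &res) || res)
		return -1;
	if (!getcwd(after, sizeof(after)))
		return -1;

	seen = !strcmp(after, "/");
	if (chdir(before))
		return -1;
	return seen;
}
```

Build with `-pthread`.  A `stat` on a path, by contrast, leaves this
shared state untouched, so it is safe to call while other threads are
opening files by relative path.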

Thanks to Stefan and Jeff for help debugging this problem.

Brandon Williams (6):
  submodules: add helper functions to determine presence of submodules
  submodules: load gitmodules file from commit sha1
  grep: add submodules as a grep source type
  grep: optionally recurse into submodules
  grep: enable recurse-submodules to work on <tree> objects
  grep: search history of moved submodules

 Documentation/git-grep.txt         |  14 ++
 builtin/grep.c                     | 386 ++++++++++++++++++++++++++++++++++---
 cache.h                            |   2 +
 config.c                           |   8 +-
 git.c                              |   2 +-
 grep.c                             |  16 +-
 grep.h                             |   1 +
 submodule-config.c                 |   6 +-
 submodule-config.h                 |   3 +
 submodule.c                        |  51 +++++
 submodule.h                        |   3 +
 t/t7814-grep-recurse-submodules.sh | 241 +++++++++++++++++++++++
 tree-walk.c                        |  28 +++
 13 files changed, 730 insertions(+), 31 deletions(-)
 create mode 100755 t/t7814-grep-recurse-submodules.sh

--- interdiff based on 'bw/grep-recurse-submodules'

diff --git a/submodule.c b/submodule.c
index 062e58b..8516ab0 100644
--- a/submodule.c
+++ b/submodule.c
@@ -239,9 +239,10 @@ int is_submodule_initialized(const char *path)
 int is_submodule_populated(const char *path)
 {
 	int ret = 0;
+	struct stat st;
 	char *gitdir = xstrfmt("%s/.git", path);
 
-	if (resolve_gitdir(gitdir))
+	if (!stat(gitdir, &st))
 		ret = 1;
 
 	free(gitdir);

-- 
2.8.0.rc3.226.g39d4020



* [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules
  2016-12-01  1:28         ` [PATCH v6 0/6] recursively grep across submodules Brandon Williams
@ 2016-12-01  1:28           ` Brandon Williams
  2016-12-01  4:29             ` Jeff King
  2016-12-01  1:28           ` [PATCH v6 2/6] submodules: load gitmodules file from commit sha1 Brandon Williams
                             ` (6 subsequent siblings)
  7 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-12-01  1:28 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, peff, sbeller, jonathantanmy, gitster

Add two helper functions to submodule.c.
`is_submodule_initialized()` checks whether a submodule has been initialized
at a given path and `is_submodule_populated()` checks whether a submodule
has been checked out at a given path.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 submodule.c | 39 +++++++++++++++++++++++++++++++++++++++
 submodule.h |  2 ++
 2 files changed, 41 insertions(+)

diff --git a/submodule.c b/submodule.c
index 6f7d883..f336ca9 100644
--- a/submodule.c
+++ b/submodule.c
@@ -198,6 +198,45 @@ void gitmodules_config(void)
 	}
 }
 
+/*
+ * Determine if a submodule has been initialized at a given 'path'
+ */
+int is_submodule_initialized(const char *path)
+{
+	int ret = 0;
+	const struct submodule *module = NULL;
+
+	module = submodule_from_path(null_sha1, path);
+
+	if (module) {
+		char *key = xstrfmt("submodule.%s.url", module->name);
+		char *value = NULL;
+
+		ret = !git_config_get_string(key, &value);
+
+		free(value);
+		free(key);
+	}
+
+	return ret;
+}
+
+/*
+ * Determine if a submodule has been populated at a given 'path'
+ */
+int is_submodule_populated(const char *path)
+{
+	int ret = 0;
+	struct stat st;
+	char *gitdir = xstrfmt("%s/.git", path);
+
+	if (!stat(gitdir, &st))
+		ret = 1;
+
+	free(gitdir);
+	return ret;
+}
+
 int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst)
 {
diff --git a/submodule.h b/submodule.h
index d9e197a..6ec5f2f 100644
--- a/submodule.h
+++ b/submodule.h
@@ -37,6 +37,8 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
 		const char *path);
 int submodule_config(const char *var, const char *value, void *cb);
 void gitmodules_config(void);
+extern int is_submodule_initialized(const char *path);
+extern int is_submodule_populated(const char *path);
 int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst);
 const char *submodule_strategy_to_string(const struct submodule_update_strategy *s);
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v6 2/6] submodules: load gitmodules file from commit sha1
  2016-12-01  1:28         ` [PATCH v6 0/6] recursively grep across submodules Brandon Williams
  2016-12-01  1:28           ` [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
@ 2016-12-01  1:28           ` Brandon Williams
  2016-12-01  1:28           ` [PATCH v6 3/6] grep: add submodules as a grep source type Brandon Williams
                             ` (5 subsequent siblings)
  7 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-12-01  1:28 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, peff, sbeller, jonathantanmy, gitster

Teach submodules to load a '.gitmodules' file from a commit sha1.  This
enables the population of the submodule_cache to be based on the state
of the '.gitmodules' file from a particular commit.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 cache.h            |  2 ++
 config.c           |  8 ++++----
 submodule-config.c |  6 +++---
 submodule-config.h |  3 +++
 submodule.c        | 12 ++++++++++++
 submodule.h        |  1 +
 6 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/cache.h b/cache.h
index 1be6526..559a461 100644
--- a/cache.h
+++ b/cache.h
@@ -1690,6 +1690,8 @@ extern int git_default_config(const char *, const char *, void *);
 extern int git_config_from_file(config_fn_t fn, const char *, void *);
 extern int git_config_from_mem(config_fn_t fn, const enum config_origin_type,
 					const char *name, const char *buf, size_t len, void *data);
+extern int git_config_from_blob_sha1(config_fn_t fn, const char *name,
+				     const unsigned char *sha1, void *data);
 extern void git_config_push_parameter(const char *text);
 extern int git_config_from_parameters(config_fn_t fn, void *data);
 extern void git_config(config_fn_t fn, void *);
diff --git a/config.c b/config.c
index 83fdecb..4d78e72 100644
--- a/config.c
+++ b/config.c
@@ -1214,10 +1214,10 @@ int git_config_from_mem(config_fn_t fn, const enum config_origin_type origin_typ
 	return do_config_from(&top, fn, data);
 }
 
-static int git_config_from_blob_sha1(config_fn_t fn,
-				     const char *name,
-				     const unsigned char *sha1,
-				     void *data)
+int git_config_from_blob_sha1(config_fn_t fn,
+			      const char *name,
+			      const unsigned char *sha1,
+			      void *data)
 {
 	enum object_type type;
 	char *buf;
diff --git a/submodule-config.c b/submodule-config.c
index 098085b..8b9a2ef 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -379,9 +379,9 @@ static int parse_config(const char *var, const char *value, void *data)
 	return ret;
 }
 
-static int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
-				      unsigned char *gitmodules_sha1,
-				      struct strbuf *rev)
+int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
+			       unsigned char *gitmodules_sha1,
+			       struct strbuf *rev)
 {
 	int ret = 0;
 
diff --git a/submodule-config.h b/submodule-config.h
index d05c542..78584ba 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -29,6 +29,9 @@ const struct submodule *submodule_from_name(const unsigned char *commit_sha1,
 		const char *name);
 const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
 		const char *path);
+extern int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
+				      unsigned char *gitmodules_sha1,
+				      struct strbuf *rev);
 void submodule_free(void);
 
 #endif /* SUBMODULE_CONFIG_H */
diff --git a/submodule.c b/submodule.c
index f336ca9..8516ab0 100644
--- a/submodule.c
+++ b/submodule.c
@@ -198,6 +198,18 @@ void gitmodules_config(void)
 	}
 }
 
+void gitmodules_config_sha1(const unsigned char *commit_sha1)
+{
+	struct strbuf rev = STRBUF_INIT;
+	unsigned char sha1[20];
+
+	if (gitmodule_sha1_from_commit(commit_sha1, sha1, &rev)) {
+		git_config_from_blob_sha1(submodule_config, rev.buf,
+					  sha1, NULL);
+	}
+	strbuf_release(&rev);
+}
+
 /*
  * Determine if a submodule has been initialized at a given 'path'
  */
diff --git a/submodule.h b/submodule.h
index 6ec5f2f..9203d89 100644
--- a/submodule.h
+++ b/submodule.h
@@ -37,6 +37,7 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
 		const char *path);
 int submodule_config(const char *var, const char *value, void *cb);
 void gitmodules_config(void);
+extern void gitmodules_config_sha1(const unsigned char *commit_sha1);
 extern int is_submodule_initialized(const char *path);
 extern int is_submodule_populated(const char *path);
 int parse_submodule_update_strategy(const char *value,
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v6 3/6] grep: add submodules as a grep source type
  2016-12-01  1:28         ` [PATCH v6 0/6] recursively grep across submodules Brandon Williams
  2016-12-01  1:28           ` [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
  2016-12-01  1:28           ` [PATCH v6 2/6] submodules: load gitmodules file from commit sha1 Brandon Williams
@ 2016-12-01  1:28           ` Brandon Williams
  2016-12-01  1:28           ` [PATCH v6 4/6] grep: optionally recurse into submodules Brandon Williams
                             ` (4 subsequent siblings)
  7 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-12-01  1:28 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, peff, sbeller, jonathantanmy, gitster

Add `GREP_SOURCE_SUBMODULE` as a grep_source type and cases for this new
type in the various switch statements in grep.c.

When initializing a grep_source with type `GREP_SOURCE_SUBMODULE` the
identifier can either be NULL (to indicate that the working tree will be
used) or a SHA1 (the REV of the submodule to be grep'd).  If the
identifier is a SHA1 then we want to fall through to the
`GREP_SOURCE_SHA1` case to handle the copying of the SHA1.
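
The NULL-or-SHA1 handling can be sketched outside of git as follows.  This is
a simplified stand-in, not the code in grep.c: `struct source`,
`enum source_type` and `source_init()` are hypothetical names, and plain
malloc/memcpy replace xmalloc/hashcpy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum source_type { SOURCE_SHA1, SOURCE_FILE, SOURCE_BUF, SOURCE_SUBMODULE };

struct source {
	enum source_type type;
	void *identifier;	/* owned copy, or NULL */
};

/* Hypothetical analogue of grep_source_init(), identifier handling only. */
static void source_init(struct source *s, enum source_type type,
			const void *identifier)
{
	s->type = type;
	s->identifier = NULL;

	switch (type) {
	case SOURCE_FILE: {
		/* The identifier is a NUL-terminated path. */
		size_t len = strlen(identifier) + 1;

		s->identifier = malloc(len);
		assert(s->identifier);
		memcpy(s->identifier, identifier, len);
		break;
	}
	case SOURCE_SUBMODULE:
		if (!identifier)
			break;	/* working tree: nothing to copy */
		/* A non-NULL identifier is a raw SHA-1 that must be copied. */
		/* FALL THROUGH */
	case SOURCE_SHA1:
		s->identifier = malloc(20);
		assert(s->identifier);
		memcpy(s->identifier, identifier, 20);
		break;
	case SOURCE_BUF:
		break;
	}
}
```

Sharing the `SOURCE_SHA1` arm keeps the copy logic in one place; the early
`break` is what separates the working-tree case from the SHA1 case.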

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 grep.c | 16 +++++++++++++++-
 grep.h |  1 +
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 1194d35..0dbdc1d 100644
--- a/grep.c
+++ b/grep.c
@@ -1735,12 +1735,23 @@ void grep_source_init(struct grep_source *gs, enum grep_source_type type,
 	case GREP_SOURCE_FILE:
 		gs->identifier = xstrdup(identifier);
 		break;
+	case GREP_SOURCE_SUBMODULE:
+		if (!identifier) {
+			gs->identifier = NULL;
+			break;
+		}
+		/*
+		 * FALL THROUGH
+		 * If the identifier is non-NULL (in the submodule case) it
+		 * will be a SHA1 that needs to be copied.
+		 */
 	case GREP_SOURCE_SHA1:
 		gs->identifier = xmalloc(20);
 		hashcpy(gs->identifier, identifier);
 		break;
 	case GREP_SOURCE_BUF:
 		gs->identifier = NULL;
+		break;
 	}
 }
 
@@ -1760,6 +1771,7 @@ void grep_source_clear_data(struct grep_source *gs)
 	switch (gs->type) {
 	case GREP_SOURCE_FILE:
 	case GREP_SOURCE_SHA1:
+	case GREP_SOURCE_SUBMODULE:
 		free(gs->buf);
 		gs->buf = NULL;
 		gs->size = 0;
@@ -1831,8 +1843,10 @@ static int grep_source_load(struct grep_source *gs)
 		return grep_source_load_sha1(gs);
 	case GREP_SOURCE_BUF:
 		return gs->buf ? 0 : -1;
+	case GREP_SOURCE_SUBMODULE:
+		break;
 	}
-	die("BUG: invalid grep_source type");
+	die("BUG: invalid grep_source type to load");
 }
 
 void grep_source_load_driver(struct grep_source *gs)
diff --git a/grep.h b/grep.h
index 5856a23..267534c 100644
--- a/grep.h
+++ b/grep.h
@@ -161,6 +161,7 @@ struct grep_source {
 		GREP_SOURCE_SHA1,
 		GREP_SOURCE_FILE,
 		GREP_SOURCE_BUF,
+		GREP_SOURCE_SUBMODULE,
 	} type;
 	void *identifier;
 
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v6 4/6] grep: optionally recurse into submodules
  2016-12-01  1:28         ` [PATCH v6 0/6] recursively grep across submodules Brandon Williams
                             ` (2 preceding siblings ...)
  2016-12-01  1:28           ` [PATCH v6 3/6] grep: add submodules as a grep source type Brandon Williams
@ 2016-12-01  1:28           ` Brandon Williams
  2016-12-01  1:28           ` [PATCH v6 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
                             ` (3 subsequent siblings)
  7 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-12-01  1:28 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, peff, sbeller, jonathantanmy, gitster

Allow grep to recognize submodules and recursively search for patterns in
each submodule.  This is done by forking off a process to recursively
call grep on each submodule.  The top level --super-prefix option is
used to pass a path to the submodule which can in turn be used to
prepend to output or in pathspec matching logic.
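
A minimal sketch of how the prefix composes across nesting levels, in
standalone C.  `child_super_prefix()` and `display_path()` are hypothetical
helpers written for this illustration; they mirror the
"--super-prefix=%s%s/" format used when launching a child and the prepending
done before a filename is printed, but they are not git functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: the value a parent would pass to a child as
 * --super-prefix, i.e. "<own super prefix><submodule path>/".
 * Returns 0 on success, -1 if out is too small.
 */
static int child_super_prefix(char *out, size_t len,
			      const char *super_prefix, const char *submodule)
{
	int n;

	assert(out && submodule);
	n = snprintf(out, len, "%s%s/",
		     super_prefix ? super_prefix : "", submodule);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: the path shown in output, i.e. the super prefix
 * (if any) followed by the path relative to the current repository.
 */
static int display_path(char *out, size_t len,
			const char *super_prefix, const char *path)
{
	int n;

	assert(out && path);
	n = snprintf(out, len, "%s%s",
		     super_prefix ? super_prefix : "", path);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With a submodule at "submodule" that itself contains "sub", the innermost
process would receive "submodule/sub/" and report its file "a" as
"submodule/sub/a", which is the shape the tests in this patch expect.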

Recursion only occurs for submodules which have been initialized and
checked out by the parent project.  If a submodule hasn't been
initialized and checked out it is simply skipped.

In order to support the existing multi-threading infrastructure in grep,
output from each child process is captured in a strbuf so that it can be
later printed to the console in an ordered fashion.

To limit the number of threads that are created, each child process has
half the number of threads as its parent (minimum of 1); otherwise we
potentially have a fork-bomb.
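
For illustration, the halving rule can be written as a one-line function.
`child_threads()` is a hypothetical name; the expression is the same
`(num_threads + 1) / 2` that the patch passes via --threads.  Note that the
"minimum of 1" holds for any parent count of at least 1; a parent count of 0
stays 0 under this expression.

```c
#include <assert.h>

/*
 * Hypothetical helper: thread count handed to a child process.
 * Rounds up when halving, so 1 stays 1 and the count never drops
 * to 0 for a parent that has at least one thread.
 */
static int child_threads(int parent_threads)
{
	assert(parent_threads >= 0);
	return (parent_threads + 1) / 2;
}
```

Starting from the default of 8 threads, successive nesting levels would get
4, 2, 1, 1, ... threads.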

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 Documentation/git-grep.txt         |   5 +
 builtin/grep.c                     | 300 ++++++++++++++++++++++++++++++++++---
 git.c                              |   2 +-
 t/t7814-grep-recurse-submodules.sh |  99 ++++++++++++
 4 files changed, 386 insertions(+), 20 deletions(-)
 create mode 100755 t/t7814-grep-recurse-submodules.sh

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 0ecea6e..17aa1ba 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -26,6 +26,7 @@ SYNOPSIS
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
+	   [--recurse-submodules]
 	   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...]
 	   [--] [<pathspec>...]
 
@@ -88,6 +89,10 @@ OPTIONS
 	mechanism.  Only useful when searching files in the current directory
 	with `--no-index`.
 
+--recurse-submodules::
+	Recursively search in each submodule that has been initialized and
+	checked out in the repository.
+
 -a::
 --text::
 	Process binary files as if they were text.
diff --git a/builtin/grep.c b/builtin/grep.c
index 8887b6a..dca0be6 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -18,12 +18,20 @@
 #include "quote.h"
 #include "dir.h"
 #include "pathspec.h"
+#include "submodule.h"
 
 static char const * const grep_usage[] = {
 	N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"),
 	NULL
 };
 
+static const char *super_prefix;
+static int recurse_submodules;
+static struct argv_array submodule_options = ARGV_ARRAY_INIT;
+
+static int grep_submodule_launch(struct grep_opt *opt,
+				 const struct grep_source *gs);
+
 #define GREP_NUM_THREADS_DEFAULT 8
 static int num_threads;
 
@@ -174,7 +182,10 @@ static void *run(void *arg)
 			break;
 
 		opt->output_priv = w;
-		hit |= grep_source(opt, &w->source);
+		if (w->source.type == GREP_SOURCE_SUBMODULE)
+			hit |= grep_submodule_launch(opt, &w->source);
+		else
+			hit |= grep_source(opt, &w->source);
 		grep_source_clear_data(&w->source);
 		work_done(w);
 	}
@@ -300,6 +311,10 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 	if (opt->relative && opt->prefix_length) {
 		quote_path_relative(filename + tree_name_len, opt->prefix, &pathbuf);
 		strbuf_insert(&pathbuf, 0, filename, tree_name_len);
+	} else if (super_prefix) {
+		strbuf_add(&pathbuf, filename, tree_name_len);
+		strbuf_addstr(&pathbuf, super_prefix);
+		strbuf_addstr(&pathbuf, filename + tree_name_len);
 	} else {
 		strbuf_addstr(&pathbuf, filename);
 	}
@@ -328,10 +343,13 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 {
 	struct strbuf buf = STRBUF_INIT;
 
-	if (opt->relative && opt->prefix_length)
+	if (opt->relative && opt->prefix_length) {
 		quote_path_relative(filename, opt->prefix, &buf);
-	else
+	} else {
+		if (super_prefix)
+			strbuf_addstr(&buf, super_prefix);
 		strbuf_addstr(&buf, filename);
+	}
 
 #ifndef NO_PTHREADS
 	if (num_threads) {
@@ -378,31 +396,260 @@ static void run_pager(struct grep_opt *opt, const char *prefix)
 		exit(status);
 }
 
-static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int cached)
+static void compile_submodule_options(const struct grep_opt *opt,
+				      const struct pathspec *pathspec,
+				      int cached, int untracked,
+				      int opt_exclude, int use_index,
+				      int pattern_type_arg)
+{
+	struct grep_pat *pattern;
+	int i;
+
+	if (recurse_submodules)
+		argv_array_push(&submodule_options, "--recurse-submodules");
+
+	if (cached)
+		argv_array_push(&submodule_options, "--cached");
+	if (!use_index)
+		argv_array_push(&submodule_options, "--no-index");
+	if (untracked)
+		argv_array_push(&submodule_options, "--untracked");
+	if (opt_exclude > 0)
+		argv_array_push(&submodule_options, "--exclude-standard");
+
+	if (opt->invert)
+		argv_array_push(&submodule_options, "-v");
+	if (opt->ignore_case)
+		argv_array_push(&submodule_options, "-i");
+	if (opt->word_regexp)
+		argv_array_push(&submodule_options, "-w");
+	switch (opt->binary) {
+	case GREP_BINARY_NOMATCH:
+		argv_array_push(&submodule_options, "-I");
+		break;
+	case GREP_BINARY_TEXT:
+		argv_array_push(&submodule_options, "-a");
+		break;
+	default:
+		break;
+	}
+	if (opt->allow_textconv)
+		argv_array_push(&submodule_options, "--textconv");
+	if (opt->max_depth != -1)
+		argv_array_pushf(&submodule_options, "--max-depth=%d",
+				 opt->max_depth);
+	if (opt->linenum)
+		argv_array_push(&submodule_options, "-n");
+	if (!opt->pathname)
+		argv_array_push(&submodule_options, "-h");
+	if (!opt->relative)
+		argv_array_push(&submodule_options, "--full-name");
+	if (opt->name_only)
+		argv_array_push(&submodule_options, "-l");
+	if (opt->unmatch_name_only)
+		argv_array_push(&submodule_options, "-L");
+	if (opt->null_following_name)
+		argv_array_push(&submodule_options, "-z");
+	if (opt->count)
+		argv_array_push(&submodule_options, "-c");
+	if (opt->file_break)
+		argv_array_push(&submodule_options, "--break");
+	if (opt->heading)
+		argv_array_push(&submodule_options, "--heading");
+	if (opt->pre_context)
+		argv_array_pushf(&submodule_options, "--before-context=%d",
+				 opt->pre_context);
+	if (opt->post_context)
+		argv_array_pushf(&submodule_options, "--after-context=%d",
+				 opt->post_context);
+	if (opt->funcname)
+		argv_array_push(&submodule_options, "-p");
+	if (opt->funcbody)
+		argv_array_push(&submodule_options, "-W");
+	if (opt->all_match)
+		argv_array_push(&submodule_options, "--all-match");
+	if (opt->debug)
+		argv_array_push(&submodule_options, "--debug");
+	if (opt->status_only)
+		argv_array_push(&submodule_options, "-q");
+
+	switch (pattern_type_arg) {
+	case GREP_PATTERN_TYPE_BRE:
+		argv_array_push(&submodule_options, "-G");
+		break;
+	case GREP_PATTERN_TYPE_ERE:
+		argv_array_push(&submodule_options, "-E");
+		break;
+	case GREP_PATTERN_TYPE_FIXED:
+		argv_array_push(&submodule_options, "-F");
+		break;
+	case GREP_PATTERN_TYPE_PCRE:
+		argv_array_push(&submodule_options, "-P");
+		break;
+	case GREP_PATTERN_TYPE_UNSPECIFIED:
+		break;
+	}
+
+	for (pattern = opt->pattern_list; pattern != NULL;
+	     pattern = pattern->next) {
+		switch (pattern->token) {
+		case GREP_PATTERN:
+			argv_array_pushf(&submodule_options, "-e%s",
+					 pattern->pattern);
+			break;
+		case GREP_AND:
+		case GREP_OPEN_PAREN:
+		case GREP_CLOSE_PAREN:
+		case GREP_NOT:
+		case GREP_OR:
+			argv_array_push(&submodule_options, pattern->pattern);
+			break;
+		/* BODY and HEAD are not used by git-grep */
+		case GREP_PATTERN_BODY:
+		case GREP_PATTERN_HEAD:
+			break;
+		}
+	}
+
+	/*
+	 * Limit number of threads for child process to use.
+	 * This is to prevent potential fork-bomb behavior of git-grep as each
+	 * submodule process has its own thread pool.
+	 */
+	argv_array_pushf(&submodule_options, "--threads=%d",
+			 (num_threads + 1) / 2);
+
+	/* Add Pathspecs */
+	argv_array_push(&submodule_options, "--");
+	for (i = 0; i < pathspec->nr; i++)
+		argv_array_push(&submodule_options,
+				pathspec->items[i].original);
+}
+
+/*
+ * Launch child process to grep contents of a submodule
+ */
+static int grep_submodule_launch(struct grep_opt *opt,
+				 const struct grep_source *gs)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	int status, i;
+	struct work_item *w = opt->output_priv;
+
+	prepare_submodule_repo_env(&cp.env_array);
+
+	/* Add super prefix */
+	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
+			 super_prefix ? super_prefix : "",
+			 gs->name);
+	argv_array_push(&cp.args, "grep");
+
+	/* Add options */
+	for (i = 0; i < submodule_options.argc; i++)
+		argv_array_push(&cp.args, submodule_options.argv[i]);
+
+	cp.git_cmd = 1;
+	cp.dir = gs->path;
+
+	/*
+	 * Capture output to output buffer and check the return code from the
+	 * child process.  A '0' indicates a hit, a '1' indicates no hit and
+	 * anything else is an error.
+	 */
+	status = capture_command(&cp, &w->out, 0);
+	if (status && (status != 1)) {
+		/* flush the buffer */
+		write_or_die(1, w->out.buf, w->out.len);
+		die("process for submodule '%s' failed with exit code: %d",
+		    gs->name, status);
+	}
+
+	/* invert the return code to make a hit equal to 1 */
+	return !status;
+}
+
+/*
+ * Prep grep structures for a submodule grep
+ * sha1: the sha1 of the submodule or NULL if using the working tree
+ * filename: name of the submodule including tree name of parent
+ * path: location of the submodule
+ */
+static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1,
+			  const char *filename, const char *path)
+{
+	if (!is_submodule_initialized(path))
+		return 0;
+	if (!is_submodule_populated(path))
+		return 0;
+
+#ifndef NO_PTHREADS
+	if (num_threads) {
+		add_work(opt, GREP_SOURCE_SUBMODULE, filename, path, sha1);
+		return 0;
+	} else
+#endif
+	{
+		struct work_item w;
+		int hit;
+
+		grep_source_init(&w.source, GREP_SOURCE_SUBMODULE,
+				 filename, path, sha1);
+		strbuf_init(&w.out, 0);
+		opt->output_priv = &w;
+		hit = grep_submodule_launch(opt, &w.source);
+
+		write_or_die(1, w.out.buf, w.out.len);
+
+		grep_source_clear(&w.source);
+		strbuf_release(&w.out);
+		return hit;
+	}
+}
+
+static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec,
+		      int cached)
 {
 	int hit = 0;
 	int nr;
+	struct strbuf name = STRBUF_INIT;
+	int name_base_len = 0;
+	if (super_prefix) {
+		name_base_len = strlen(super_prefix);
+		strbuf_addstr(&name, super_prefix);
+	}
+
 	read_cache();
 
 	for (nr = 0; nr < active_nr; nr++) {
 		const struct cache_entry *ce = active_cache[nr];
-		if (!S_ISREG(ce->ce_mode))
-			continue;
-		if (!ce_path_match(ce, pathspec, NULL))
+		strbuf_setlen(&name, name_base_len);
+		strbuf_addstr(&name, ce->name);
+
+		if (S_ISREG(ce->ce_mode) &&
+		    match_pathspec(pathspec, name.buf, name.len, 0, NULL,
+				   S_ISDIR(ce->ce_mode) ||
+				   S_ISGITLINK(ce->ce_mode))) {
+			/*
+			 * If CE_VALID is on, we assume worktree file and its
+			 * cache entry are identical, even if worktree file has
+			 * been modified, so use cache version instead
+			 */
+			if (cached || (ce->ce_flags & CE_VALID) ||
+			    ce_skip_worktree(ce)) {
+				if (ce_stage(ce) || ce_intent_to_add(ce))
+					continue;
+				hit |= grep_sha1(opt, ce->oid.hash, ce->name,
+						 0, ce->name);
+			} else {
+				hit |= grep_file(opt, ce->name);
+			}
+		} else if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
+			   submodule_path_match(pathspec, name.buf, NULL)) {
+			hit |= grep_submodule(opt, NULL, ce->name, ce->name);
+		} else {
 			continue;
-		/*
-		 * If CE_VALID is on, we assume worktree file and its cache entry
-		 * are identical, even if worktree file has been modified, so use
-		 * cache version instead
-		 */
-		if (cached || (ce->ce_flags & CE_VALID) || ce_skip_worktree(ce)) {
-			if (ce_stage(ce) || ce_intent_to_add(ce))
-				continue;
-			hit |= grep_sha1(opt, ce->oid.hash, ce->name, 0,
-					 ce->name);
 		}
-		else
-			hit |= grep_file(opt, ce->name);
+
 		if (ce_stage(ce)) {
 			do {
 				nr++;
@@ -413,6 +660,8 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int
 		if (hit && opt->status_only)
 			break;
 	}
+
+	strbuf_release(&name);
 	return hit;
 }
 
@@ -651,6 +900,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			N_("search in both tracked and untracked files")),
 		OPT_SET_INT(0, "exclude-standard", &opt_exclude,
 			    N_("ignore files specified via '.gitignore'"), 1),
+		OPT_BOOL(0, "recurse-submodules", &recurse_submodules,
+			 N_("recursively search in each submodule")),
 		OPT_GROUP(""),
 		OPT_BOOL('v', "invert-match", &opt.invert,
 			N_("show non-matching lines")),
@@ -755,6 +1006,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	init_grep_defaults();
 	git_config(grep_cmd_config, NULL);
 	grep_init(&opt, prefix);
+	super_prefix = get_super_prefix();
 
 	/*
 	 * If there is no -- then the paths must exist in the working
@@ -872,6 +1124,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	pathspec.max_depth = opt.max_depth;
 	pathspec.recursive = 1;
 
+	if (recurse_submodules) {
+		gitmodules_config();
+		compile_submodule_options(&opt, &pathspec, cached, untracked,
+					  opt_exclude, use_index,
+					  pattern_type_arg);
+	}
+
 	if (show_in_pager && (cached || list.nr))
 		die(_("--open-files-in-pager only works on the worktree"));
 
@@ -895,6 +1154,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		}
 	}
 
+	if (recurse_submodules && (!use_index || untracked || list.nr))
+		die(_("option not supported with --recurse-submodules."));
+
 	if (!show_in_pager && !opt.status_only)
 		setup_pager();
 
diff --git a/git.c b/git.c
index efa1059..a156efd 100644
--- a/git.c
+++ b/git.c
@@ -434,7 +434,7 @@ static struct cmd_struct commands[] = {
 	{ "fsck-objects", cmd_fsck, RUN_SETUP },
 	{ "gc", cmd_gc, RUN_SETUP },
 	{ "get-tar-commit-id", cmd_get_tar_commit_id },
-	{ "grep", cmd_grep, RUN_SETUP_GENTLY },
+	{ "grep", cmd_grep, RUN_SETUP_GENTLY | SUPPORT_SUPER_PREFIX },
 	{ "hash-object", cmd_hash_object },
 	{ "help", cmd_help },
 	{ "index-pack", cmd_index_pack, RUN_SETUP_GENTLY },
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
new file mode 100755
index 0000000..1019125
--- /dev/null
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -0,0 +1,99 @@
+#!/bin/sh
+
+test_description='Test grep recurse-submodules feature
+
+This test verifies the recurse-submodules feature correctly greps across
+submodules.
+'
+
+. ./test-lib.sh
+
+test_expect_success 'setup directory structure and submodule' '
+	echo "foobar" >a &&
+	mkdir b &&
+	echo "bar" >b/b &&
+	git add a b &&
+	git commit -m "add a and b" &&
+	git init submodule &&
+	echo "foobar" >submodule/a &&
+	git -C submodule add a &&
+	git -C submodule commit -m "add a" &&
+	git submodule add ./submodule &&
+	git commit -m "added submodule"
+'
+
+test_expect_success 'grep correctly finds patterns in a submodule' '
+	cat >expect <<-\EOF &&
+	a:foobar
+	b/b:bar
+	submodule/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and basic pathspecs' '
+	cat >expect <<-\EOF &&
+	submodule/a:foobar
+	EOF
+
+	git grep -e. --recurse-submodules -- submodule >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and nested submodules' '
+	git init submodule/sub &&
+	echo "foobar" >submodule/sub/a &&
+	git -C submodule/sub add a &&
+	git -C submodule/sub commit -m "add a" &&
+	git -C submodule submodule add ./sub &&
+	git -C submodule add sub &&
+	git -C submodule commit -m "added sub" &&
+	git add submodule &&
+	git commit -m "updated submodule" &&
+
+	cat >expect <<-\EOF &&
+	a:foobar
+	b/b:bar
+	submodule/a:foobar
+	submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and multiple patterns' '
+	cat >expect <<-\EOF &&
+	a:foobar
+	submodule/a:foobar
+	submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --and -e "foo" --recurse-submodules >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and multiple patterns' '
+	cat >expect <<-\EOF &&
+	b/b:bar
+	EOF
+
+	git grep -e "bar" --and --not -e "foo" --recurse-submodules >actual &&
+	test_cmp expect actual
+'
+
+test_incompatible_with_recurse_submodules ()
+{
+	test_expect_success "--recurse-submodules and $1 are incompatible" "
+		test_must_fail git grep -e. --recurse-submodules $1 2>actual &&
+		test_i18ngrep 'not supported with --recurse-submodules' actual
+	"
+}
+
+test_incompatible_with_recurse_submodules --untracked
+test_incompatible_with_recurse_submodules --no-index
+test_incompatible_with_recurse_submodules HEAD
+
+test_done
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v6 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-12-01  1:28         ` [PATCH v6 0/6] recursively grep across submodules Brandon Williams
                             ` (3 preceding siblings ...)
  2016-12-01  1:28           ` [PATCH v6 4/6] grep: optionally recurse into submodules Brandon Williams
@ 2016-12-01  1:28           ` Brandon Williams
  2016-12-01  7:25             ` Johannes Sixt
  2016-12-01  1:28           ` [PATCH v6 6/6] grep: search history of moved submodules Brandon Williams
                             ` (2 subsequent siblings)
  7 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-12-01  1:28 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, peff, sbeller, jonathantanmy, gitster

Teach grep to recursively search in submodules when provided with a
<tree> object. This allows grep to search a submodule based on the state
of the submodule that is present in a commit of the super project.

When grep is provided with a <tree> object, the name of the object is
prefixed to all output.  In order to provide uniformity of output
between the parent and child processes the option `--parent-basename`
has been added so that the child can preface all of its output with the
name of the parent's object instead of the name of the commit SHA1 of
the submodule. This changes output from the command
`git grep -e. -l --recurse-submodules HEAD`

from:
  HEAD:file
  <commit sha1 of submodule>:sub/file

to:
  HEAD:file
  HEAD:sub/file
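
The split of a tree-qualified name such as "HEAD:sub" into the parent's
basename and the submodule path can be sketched in standalone C.
`struct split_name` and `split_tree_name()` are hypothetical names for this
example; the logic follows the strchr()-based split in
grep_submodule_launch(), where the first ':' is only treated as a separator
when the source carries a tree identifier.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct split_name {
	char basename[64];	/* parent's tree name, e.g. "HEAD"; "" if none */
	const char *path;	/* submodule path, points into the input */
};

/*
 * Hypothetical helper: split "<tree-name>:<path>" at the first ':'.
 * has_identifier is nonzero when grepping a tree object; without it
 * the whole name is the path, even if it contains a ':'.
 * Basenames longer than 63 bytes are truncated in this sketch.
 */
static void split_tree_name(const char *name, int has_identifier,
			    struct split_name *out)
{
	const char *colon;

	assert(name && out);
	colon = strchr(name, ':');
	out->basename[0] = '\0';
	if (has_identifier && colon) {
		snprintf(out->basename, sizeof(out->basename), "%.*s",
			 (int)(colon - name), name);
		out->path = colon + 1;
	} else {
		out->path = name;
	}
}
```

The basename part is what would be passed along as --parent-basename, and the
path part is what gets appended to the super prefix.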

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 Documentation/git-grep.txt         |  13 ++++-
 builtin/grep.c                     |  76 ++++++++++++++++++++++++---
 t/t7814-grep-recurse-submodules.sh | 103 ++++++++++++++++++++++++++++++++++++-
 tree-walk.c                        |  28 ++++++++++
 4 files changed, 211 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 17aa1ba..71f32f3 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -26,7 +26,7 @@ SYNOPSIS
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
-	   [--recurse-submodules]
+	   [--recurse-submodules] [--parent-basename <basename>]
 	   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...]
 	   [--] [<pathspec>...]
 
@@ -91,7 +91,16 @@ OPTIONS
 
 --recurse-submodules::
 	Recursively search in each submodule that has been initialized and
-	checked out in the repository.
+	checked out in the repository.  When used in combination with the
+	<tree> option the prefix of all submodule output will be the name of
+	the parent project's <tree> object.
+
+--parent-basename <basename>::
+	For internal use only.  In order to produce uniform output with the
+	--recurse-submodules option, this option can be used to provide the
+	basename of a parent's <tree> object to a submodule so the submodule
+	can prefix its output with the parent's name rather than the SHA1 of
+	the submodule.
 
 -a::
 --text::
diff --git a/builtin/grep.c b/builtin/grep.c
index dca0be6..5918a26 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -19,6 +19,7 @@
 #include "dir.h"
 #include "pathspec.h"
 #include "submodule.h"
+#include "submodule-config.h"
 
 static char const * const grep_usage[] = {
 	N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"),
@@ -28,6 +29,7 @@ static char const * const grep_usage[] = {
 static const char *super_prefix;
 static int recurse_submodules;
 static struct argv_array submodule_options = ARGV_ARRAY_INIT;
+static const char *parent_basename;
 
 static int grep_submodule_launch(struct grep_opt *opt,
 				 const struct grep_source *gs);
@@ -534,19 +536,53 @@ static int grep_submodule_launch(struct grep_opt *opt,
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
 	int status, i;
+	const char *end_of_base;
+	const char *name;
 	struct work_item *w = opt->output_priv;
 
+	end_of_base = strchr(gs->name, ':');
+	if (gs->identifier && end_of_base)
+		name = end_of_base + 1;
+	else
+		name = gs->name;
+
 	prepare_submodule_repo_env(&cp.env_array);
 
 	/* Add super prefix */
 	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
 			 super_prefix ? super_prefix : "",
-			 gs->name);
+			 name);
 	argv_array_push(&cp.args, "grep");
 
+	/*
+	 * Add basename of parent project
+	 * When performing grep on a tree object the filename is prefixed
+	 * with the object's name: 'tree-name:filename'.  In order to
+	 * provide uniformity of output we want to pass the name of the
+	 * parent project's object name to the submodule so the submodule can
+	 * prefix its output with the parent's name and not its own SHA1.
+	 */
+	if (gs->identifier && end_of_base)
+		argv_array_pushf(&cp.args, "--parent-basename=%.*s",
+				 (int) (end_of_base - gs->name),
+				 gs->name);
+
 	/* Add options */
-	for (i = 0; i < submodule_options.argc; i++)
+	for (i = 0; i < submodule_options.argc; i++) {
+		/*
+		 * If there is a tree identifier for the submodule, add the
+		 * rev after adding the submodule options but before the
+		 * pathspecs.  To do this we listen for the '--' and insert the
+		 * sha1 before pushing the '--' onto the child process argv
+		 * array.
+		 */
+		if (gs->identifier &&
+		    !strcmp("--", submodule_options.argv[i])) {
+			argv_array_push(&cp.args, sha1_to_hex(gs->identifier));
+		}
+
 		argv_array_push(&cp.args, submodule_options.argv[i]);
+	}
 
 	cp.git_cmd = 1;
 	cp.dir = gs->path;
@@ -673,12 +709,22 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 	enum interesting match = entry_not_interesting;
 	struct name_entry entry;
 	int old_baselen = base->len;
+	struct strbuf name = STRBUF_INIT;
+	int name_base_len = 0;
+	if (super_prefix) {
+		strbuf_addstr(&name, super_prefix);
+		name_base_len = name.len;
+	}
 
 	while (tree_entry(tree, &entry)) {
 		int te_len = tree_entry_len(&entry);
 
 		if (match != all_entries_interesting) {
-			match = tree_entry_interesting(&entry, base, tn_len, pathspec);
+			strbuf_addstr(&name, base->buf + tn_len);
+			match = tree_entry_interesting(&entry, &name,
+						       0, pathspec);
+			strbuf_setlen(&name, name_base_len);
+
 			if (match == all_entries_not_interesting)
 				break;
 			if (match == entry_not_interesting)
@@ -690,8 +736,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 		if (S_ISREG(entry.mode)) {
 			hit |= grep_sha1(opt, entry.oid->hash, base->buf, tn_len,
 					 check_attr ? base->buf + tn_len : NULL);
-		}
-		else if (S_ISDIR(entry.mode)) {
+		} else if (S_ISDIR(entry.mode)) {
 			enum object_type type;
 			struct tree_desc sub;
 			void *data;
@@ -707,12 +752,18 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 			hit |= grep_tree(opt, pathspec, &sub, base, tn_len,
 					 check_attr);
 			free(data);
+		} else if (recurse_submodules && S_ISGITLINK(entry.mode)) {
+			hit |= grep_submodule(opt, entry.oid->hash, base->buf,
+					      base->buf + tn_len);
 		}
+
 		strbuf_setlen(base, old_baselen);
 
 		if (hit && opt->status_only)
 			break;
 	}
+
+	strbuf_release(&name);
 	return hit;
 }
 
@@ -736,6 +787,10 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 		if (!data)
 			die(_("unable to read tree (%s)"), oid_to_hex(&obj->oid));
 
+		/* Use parent's name as base when recursing submodules */
+		if (recurse_submodules && parent_basename)
+			name = parent_basename;
+
 		len = name ? strlen(name) : 0;
 		strbuf_init(&base, PATH_MAX + len + 1);
 		if (len) {
@@ -762,6 +817,12 @@ static int grep_objects(struct grep_opt *opt, const struct pathspec *pathspec,
 	for (i = 0; i < nr; i++) {
 		struct object *real_obj;
 		real_obj = deref_tag(list->objects[i].item, NULL, 0);
+
+		/* load the gitmodules file for this rev */
+		if (recurse_submodules) {
+			submodule_free();
+			gitmodules_config_sha1(real_obj->oid.hash);
+		}
 		if (grep_object(opt, pathspec, real_obj, list->objects[i].name, list->objects[i].path)) {
 			hit = 1;
 			if (opt->status_only)
@@ -902,6 +963,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			    N_("ignore files specified via '.gitignore'"), 1),
 		OPT_BOOL(0, "recurse-submodules", &recurse_submodules,
 			 N_("recursively search in each submodule")),
+		OPT_STRING(0, "parent-basename", &parent_basename,
+			   N_("basename"),
+			   N_("prepend parent project's basename to output")),
 		OPT_GROUP(""),
 		OPT_BOOL('v', "invert-match", &opt.invert,
 			N_("show non-matching lines")),
@@ -1154,7 +1218,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	if (recurse_submodules && (!use_index || untracked || list.nr))
+	if (recurse_submodules && (!use_index || untracked))
 		die(_("option not supported with --recurse-submodules."));
 
 	if (!show_in_pager && !opt.status_only)
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index 1019125..9e93fe7 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -84,6 +84,108 @@ test_expect_success 'grep and multiple patterns' '
 	test_cmp expect actual
 '
 
+test_expect_success 'basic grep tree' '
+	cat >expect <<-\EOF &&
+	HEAD:a:foobar
+	HEAD:b/b:bar
+	HEAD:submodule/a:foobar
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree HEAD^' '
+	cat >expect <<-\EOF &&
+	HEAD^:a:foobar
+	HEAD^:b/b:bar
+	HEAD^:submodule/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD^ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree HEAD^^' '
+	cat >expect <<-\EOF &&
+	HEAD^^:a:foobar
+	HEAD^^:b/b:bar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD^^ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree and pathspecs' '
+	cat >expect <<-\EOF &&
+	HEAD:submodule/a:foobar
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD -- submodule >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree and pathspecs' '
+	cat >expect <<-\EOF &&
+	HEAD:submodule/a:foobar
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD -- "submodule*a" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree and more pathspecs' '
+	cat >expect <<-\EOF &&
+	HEAD:submodule/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD -- "submodul?/a" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree and more pathspecs' '
+	cat >expect <<-\EOF &&
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD -- "submodul*/sub/a" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep recurse submodule colon in name' '
+	git init parent &&
+	test_when_finished "rm -rf parent" &&
+	echo "foobar" >"parent/fi:le" &&
+	git -C parent add "fi:le" &&
+	git -C parent commit -m "add fi:le" &&
+
+	git init "su:b" &&
+	test_when_finished "rm -rf su:b" &&
+	echo "foobar" >"su:b/fi:le" &&
+	git -C "su:b" add "fi:le" &&
+	git -C "su:b" commit -m "add fi:le" &&
+
+	git -C parent submodule add "../su:b" "su:b" &&
+	git -C parent commit -m "add submodule" &&
+
+	cat >expect <<-\EOF &&
+	fi:le:foobar
+	su:b/fi:le:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules >actual &&
+	test_cmp expect actual &&
+
+	cat >expect <<-\EOF &&
+	HEAD:fi:le:foobar
+	HEAD:su:b/fi:le:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_incompatible_with_recurse_submodules ()
 {
 	test_expect_success "--recurse-submodules and $1 are incompatible" "
@@ -94,6 +196,5 @@ test_incompatible_with_recurse_submodules ()
 
 test_incompatible_with_recurse_submodules --untracked
 test_incompatible_with_recurse_submodules --no-index
-test_incompatible_with_recurse_submodules HEAD
 
 test_done
diff --git a/tree-walk.c b/tree-walk.c
index 828f435..ff77605 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -1004,6 +1004,19 @@ static enum interesting do_match(const struct name_entry *entry,
 				 */
 				if (ps->recursive && S_ISDIR(entry->mode))
 					return entry_interesting;
+
+				/*
+				 * When matching against submodules with
+				 * wildcard characters, ensure that the entry
+				 * at least matches up to the first wild
+				 * character.  More accurate matching can then
+				 * be performed in the submodule itself.
+				 */
+				if (ps->recursive && S_ISGITLINK(entry->mode) &&
+				    !ps_strncmp(item, match + baselen,
+						entry->path,
+						item->nowildcard_len - baselen))
+					return entry_interesting;
 			}
 
 			continue;
@@ -1040,6 +1053,21 @@ static enum interesting do_match(const struct name_entry *entry,
 			strbuf_setlen(base, base_offset + baselen);
 			return entry_interesting;
 		}
+
+		/*
+		 * When matching against submodules with
+		 * wildcard characters, ensure that the entry
+		 * at least matches up to the first wild
+		 * character.  More accurate matching can then
+		 * be performed in the submodule itself.
+		 */
+		if (ps->recursive && S_ISGITLINK(entry->mode) &&
+		    !ps_strncmp(item, match, base->buf + base_offset,
+				item->nowildcard_len)) {
+			strbuf_setlen(base, base_offset + baselen);
+			return entry_interesting;
+		}
+
 		strbuf_setlen(base, base_offset + baselen);
 
 		/*
-- 
2.8.0.rc3.226.g39d4020


* [PATCH v6 6/6] grep: search history of moved submodules
  2016-12-01  1:28         ` [PATCH v6 0/6] recursively grep across submodules Brandon Williams
                             ` (4 preceding siblings ...)
  2016-12-01  1:28           ` [PATCH v6 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
@ 2016-12-01  1:28           ` Brandon Williams
  2016-12-01  4:22           ` [PATCH v6 0/6] recursively grep across submodules Jeff King
  2016-12-16 19:03           ` [PATCH v7 0/7] " Brandon Williams
  7 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-12-01  1:28 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, peff, sbeller, jonathantanmy, gitster

If a submodule was renamed at any point since its inception, then
grepping a commit from before the move fails to find a working directory
for the submodule, since its path in the past differs from the current
path.

This patch teaches grep to find the .git directory for a submodule in
the parent's .git/modules/ directory in the event the path to the
submodule in the commit that is being searched differs from the state of
the currently checked out commit.  If found, the child process that is
spawned to grep the submodule will chdir into its gitdir instead of a
working directory.

In order to override the explicit setting of the submodule child process's
gitdir environment variable (which was introduced in '10f5c526'),
`GIT_DIR_ENVIRONMENT` needs to be pushed onto the child process's env_array.
This allows the searching of history from a submodule's gitdir, rather
than from a working directory.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 builtin/grep.c                     | 20 +++++++++++++++++--
 t/t7814-grep-recurse-submodules.sh | 41 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 5918a26..2c727ef 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -547,6 +547,7 @@ static int grep_submodule_launch(struct grep_opt *opt,
 		name = gs->name;
 
 	prepare_submodule_repo_env(&cp.env_array);
+	argv_array_push(&cp.env_array, GIT_DIR_ENVIRONMENT);
 
 	/* Add super prefix */
 	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
@@ -615,8 +616,23 @@ static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1,
 {
 	if (!is_submodule_initialized(path))
 		return 0;
-	if (!is_submodule_populated(path))
-		return 0;
+	if (!is_submodule_populated(path)) {
+		/*
+		 * If searching history, check for the presence of the
+		 * submodule's gitdir before skipping the submodule.
+		 */
+		if (sha1) {
+			const struct submodule *sub =
+					submodule_from_path(null_sha1, path);
+			if (sub)
+				path = git_path("modules/%s", sub->name);
+
+			if (!(is_directory(path) && is_git_directory(path)))
+				return 0;
+		} else {
+			return 0;
+		}
+	}
 
 #ifndef NO_PTHREADS
 	if (num_threads) {
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index 9e93fe7..0507771 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -186,6 +186,47 @@ test_expect_success 'grep recurse submodule colon in name' '
 	test_cmp expect actual
 '
 
+test_expect_success 'grep history with moved submodules' '
+	git init parent &&
+	test_when_finished "rm -rf parent" &&
+	echo "foobar" >parent/file &&
+	git -C parent add file &&
+	git -C parent commit -m "add file" &&
+
+	git init sub &&
+	test_when_finished "rm -rf sub" &&
+	echo "foobar" >sub/file &&
+	git -C sub add file &&
+	git -C sub commit -m "add file" &&
+
+	git -C parent submodule add ../sub dir/sub &&
+	git -C parent commit -m "add submodule" &&
+
+	cat >expect <<-\EOF &&
+	dir/sub/file:foobar
+	file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules >actual &&
+	test_cmp expect actual &&
+
+	git -C parent mv dir/sub sub-moved &&
+	git -C parent commit -m "moved submodule" &&
+
+	cat >expect <<-\EOF &&
+	file:foobar
+	sub-moved/file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules >actual &&
+	test_cmp expect actual &&
+
+	cat >expect <<-\EOF &&
+	HEAD^:dir/sub/file:foobar
+	HEAD^:file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules HEAD^ >actual &&
+	test_cmp expect actual
+'
+
 test_incompatible_with_recurse_submodules ()
 {
 	test_expect_success "--recurse-submodules and $1 are incompatible" "
-- 
2.8.0.rc3.226.g39d4020


* Re: [PATCH v6 0/6] recursively grep across submodules
  2016-12-01  1:28         ` [PATCH v6 0/6] recursively grep across submodules Brandon Williams
                             ` (5 preceding siblings ...)
  2016-12-01  1:28           ` [PATCH v6 6/6] grep: search history of moved submodules Brandon Williams
@ 2016-12-01  4:22           ` Jeff King
  2016-12-01 17:45             ` Brandon Williams
  2016-12-16 19:03           ` [PATCH v7 0/7] " Brandon Williams
  7 siblings, 1 reply; 126+ messages in thread
From: Jeff King @ 2016-12-01  4:22 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, sbeller, jonathantanmy, gitster

On Wed, Nov 30, 2016 at 05:28:28PM -0800, Brandon Williams wrote:

> v6 fixes a race condition which existed in the 'is_submodule_populated'
> function.  Instead of calling 'resolve_gitdir' to check for the existence of a
> .git file/directory, use 'stat'.  'resolve_gitdir' calls 'chdir' which can
> affect other running threads trying to load their files into a buffer in
> memory.

This one passes my stress-test for t7814 (though I imagine you already
knew that).

I tried to think of things that could go wrong by using a simple stat()
instead of resolve_gitdir(). They should only differ when ".git" for
some reason does not point to a git repository. My initial thought is
that this might be more vocal about errors, because the child process
will complain. But actually, the original would already die if the
".git" file is funny, so we were pretty vocal already.

I also wondered whether the sub-process might skip a bogus ".git" file
and keep looking upward in the filesystem tree (which would confusingly
end up back in the super-project!). But it looks like we bail hard when
we see a ".git" file but it's bogus. Which is probably a good thing in
general for submodules.

I'm not sure any of that is actually even worth worrying about, as such
a setup is broken by definition. I just wanted to think it through as a
devil's advocate, and even that seems pretty reasonable.

-Peff

* Re: [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules
  2016-12-01  1:28           ` [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
@ 2016-12-01  4:29             ` Jeff King
  2016-12-01 18:31               ` Stefan Beller
  2016-12-01 18:46               ` Junio C Hamano
  0 siblings, 2 replies; 126+ messages in thread
From: Jeff King @ 2016-12-01  4:29 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, sbeller, jonathantanmy, gitster

On Wed, Nov 30, 2016 at 05:28:29PM -0800, Brandon Williams wrote:

> +/*
> + * Determine if a submodule has been populated at a given 'path'
> + */
> +int is_submodule_populated(const char *path)
> +{
> +	int ret = 0;
> +	struct stat st;
> +	char *gitdir = xstrfmt("%s/.git", path);
> +
> +	if (!stat(gitdir, &st))
> +		ret = 1;
> +
> +	free(gitdir);
> +	return ret;
> +}

I don't know if it's worth changing or not, but this could be a bit
shorter:

  int is_submodule_populated(const char *path)
  {
	return !access(mkpath("%s/.git", path), F_OK);
  }

There is a file_exists() helper, but it uses lstat(), which I think you
don't want (because you'd prefer to bail on a broken .git symlink). But
access(F_OK) does what you want, I think.
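
The difference between the two kinds of existence check can be sketched
in isolation. This is only an illustration, not git code: both helper
names below are hypothetical, and the only real calls are POSIX access()
and lstat(). For a dangling symlink the first helper reports "missing"
while the second reports "present".

```c
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if 'path' exists after following
 * symlinks, so a dangling ".git" symlink counts as missing.
 */
static int exists_following_links(const char *path)
{
	return !access(path, F_OK);
}

/*
 * Hypothetical helper: returns 1 if the directory entry itself exists,
 * even when it is a symlink whose target is gone (lstat() semantics).
 */
static int exists_no_follow(const char *path)
{
	struct stat st;

	return !lstat(path, &st);
}
```

stat() follows symlinks the same way access(F_OK) does, so either one
rejects a broken ".git" symlink where an lstat()-based check would not.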

mkpath() is generally an unsafe function because it uses a static
buffer, but it's handy and safe for handing values to syscalls like
this.

I say "I don't know if it's worth it" because what you've written is
fine, and while more lines, it's fairly obvious and safe.

-Peff

* Re: [PATCH v6 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-12-01  1:28           ` [PATCH v6 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
@ 2016-12-01  7:25             ` Johannes Sixt
  2016-12-01 17:51               ` Brandon Williams
  0 siblings, 1 reply; 126+ messages in thread
From: Johannes Sixt @ 2016-12-01  7:25 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, peff, sbeller, jonathantanmy, gitster

Am 01.12.2016 um 02:28 schrieb Brandon Williams:
> +	git init "su:b" &&

Don't do that. Colons in file names won't work on Windows.

-- Hannes


* Re: [PATCH v6 0/6] recursively grep across submodules
  2016-12-01  4:22           ` [PATCH v6 0/6] recursively grep across submodules Jeff King
@ 2016-12-01 17:45             ` Brandon Williams
  2016-12-01 19:03               ` Jeff King
  0 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-12-01 17:45 UTC (permalink / raw)
  To: Jeff King; +Cc: git, sbeller, jonathantanmy, gitster

On 11/30, Jeff King wrote:
> On Wed, Nov 30, 2016 at 05:28:28PM -0800, Brandon Williams wrote:
> 
> > v6 fixes a race condition which existed in the 'is_submodule_populated'
> > function.  Instead of calling 'resolve_gitdir' to check for the existence of a
> > .git file/directory, use 'stat'.  'resolve_gitdir' calls 'chdir' which can
> > affect other running threads trying to load their files into a buffer in
> > memory.
> 
> This one passes my stress-test for t7814 (though I imagine you already
> knew that).
> 
> I tried to think of things that could go wrong by using a simple stat()
> instead of resolve_gitdir(). They should only differ when ".git" for
> some reason does not point to a git repository. My initial thought is
> that this might be more vocal about errors, because the child process
> will complain. But actually, the original would already die if the
> ".git" file is funny, so we were pretty vocal already.
> 
> I also wondered whether the sub-process might skip a bogus ".git" file
> and keep looking upward in the filesystem tree (which would confusingly
> end up back in the super-project!). But it looks like we bail hard when
> we see a ".git" file but it's bogus. Which is probably a good thing in
> general for submodules.
> 
> I'm not sure any of that is actually even worth worrying about, as such
> a setup is broken by definition. I just wanted to think it through as a
> devil's advocate, and even that seems pretty reasonable.
> 
> -Peff

Yeah I was trying to think through these scenarios myself last night.
And, as you found, it seemed alright to let the child process deal with
the .git file/dir as long as one actually exists at that path.  If one
didn't then there would be the possibility that we ended up back at the
superproject, which would result in an infinite loop.  And yeah if the
.git file doesn't resolve to anything sensible then the user probably
mangled their repository somehow anyways.

Thanks again for all the help!

-- 
Brandon Williams

* Re: [PATCH v6 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-12-01  7:25             ` Johannes Sixt
@ 2016-12-01 17:51               ` Brandon Williams
  2016-12-01 18:49                 ` Junio C Hamano
  2016-12-01 18:52                 ` Jeff King
  0 siblings, 2 replies; 126+ messages in thread
From: Brandon Williams @ 2016-12-01 17:51 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, peff, sbeller, jonathantanmy, gitster

On 12/01, Johannes Sixt wrote:
> Am 01.12.2016 um 02:28 schrieb Brandon Williams:
> >+	git init "su:b" &&
> 
> Don't do that. Colons in file names won't work on Windows.
> 
> -- Hannes
> 

This test is needed to see if the code still works with filenames that
contain colons.  Is there a way to mark the test to not run on windows?

-- 
Brandon Williams

* Re: [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules
  2016-12-01  4:29             ` Jeff King
@ 2016-12-01 18:31               ` Stefan Beller
  2016-12-01 18:46               ` Junio C Hamano
  1 sibling, 0 replies; 126+ messages in thread
From: Stefan Beller @ 2016-12-01 18:31 UTC (permalink / raw)
  To: Jeff King
  Cc: Brandon Williams, git@vger.kernel.org, Jonathan Tan,
	Junio C Hamano

On Wed, Nov 30, 2016 at 8:29 PM, Jeff King <peff@peff.net> wrote:
> On Wed, Nov 30, 2016 at 05:28:29PM -0800, Brandon Williams wrote:
>
>> +/*
>> + * Determine if a submodule has been populated at a given 'path'
>> + */
>> +int is_submodule_populated(const char *path)
>> +{
>> +     int ret = 0;
>> +     struct stat st;
>> +     char *gitdir = xstrfmt("%s/.git", path);
>> +
>> +     if (!stat(gitdir, &st))
>> +             ret = 1;
>> +
>> +     free(gitdir);
>> +     return ret;
>> +}
>
> I don't know if it's worth changing or not, but this could be a bit
> shorter:
>
>   int is_submodule_populated(const char *path)
>   {
>         return !access(mkpath("%s/.git", path), F_OK);
>   }
>
> There is a file_exists() helper, but it uses lstat(), which I think you
> don't want (because you'd prefer to bail on a broken .git symlink). But
> access(F_OK) does what you want, I think.
>
> mkpath() is generally an unsafe function because it uses a static
> buffer, but it's handy and safe for handing values to syscalls like
> this.
>
> I say "I don't know if it's worth it" because what you've written is
> fine, and while more lines, it's fairly obvious and safe.

OK, chiming in here as well. :)

I plan on making use of the is_submodule_populated method in
the checkout --recurse-submodules series, and for that I am
undecided whether a cheap stat is the right approach or if we want
to have the result of resolve_gitdir as that fails in weird corner cases.

Anyway, I'd propose to change the name when going with either the
code as is or what Jeff proposes to be one of

    is_submodule_populated_cheaply
    is_submodule_populated_with_no_sanity_check
    is_submodule_dot_git_present
    have_submodule_dot_git

I think I'd prefer the last one as that describes what the function
actually does in a concise way?

Thanks,
Stefan

* Re: [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules
  2016-12-01  4:29             ` Jeff King
  2016-12-01 18:31               ` Stefan Beller
@ 2016-12-01 18:46               ` Junio C Hamano
  2016-12-01 19:09                 ` Jeff King
  1 sibling, 1 reply; 126+ messages in thread
From: Junio C Hamano @ 2016-12-01 18:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Brandon Williams, git, sbeller, jonathantanmy

Jeff King <peff@peff.net> writes:

> On Wed, Nov 30, 2016 at 05:28:29PM -0800, Brandon Williams wrote:
>
>> +/*
>> + * Determine if a submodule has been populated at a given 'path'
>> + */
>> +int is_submodule_populated(const char *path)
>> +{
>> +	int ret = 0;
>> +	struct stat st;
>> +	char *gitdir = xstrfmt("%s/.git", path);
>> +
>> +	if (!stat(gitdir, &st))
>> +		ret = 1;
>> +
>> +	free(gitdir);
>> +	return ret;
>> +}
>
> I don't know if it's worth changing or not, but this could be a bit
> shorter:
>
>   int is_submodule_populated(const char *path)
>   {
> 	return !access(mkpath("%s/.git", path), F_OK);
>   }
>
> There is a file_exists() helper, but it uses lstat(), which I think you
> don't want (because you'd prefer to bail on a broken .git symlink). But
> access(F_OK) does what you want, I think.
>
> mkpath() is generally an unsafe function because it uses a static
> buffer, but it's handy and safe for handing values to syscalls like
> this.

I think your "unsafe" is not about thread-safety but about "the
caller cannot rely on returned value staying valid for long haul".
If this change since v5 is about thread-safety, I am not sure if it
is safe to use mkpath here.
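
Both hazards of a static-buffer formatter can be shown with a small
stand-in. This is a sketch under stated assumptions: fmt_static() is a
hypothetical helper, not git's mkpath() implementation, and it uses a
single static buffer to keep the effect easy to see.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a mkpath()-style helper; NOT git's code.
 * Every call formats into the same static buffer and returns it.
 */
static const char *fmt_static(const char *fmt, ...)
{
	static char buf[4096];
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(buf, sizeof(buf), fmt, ap);
	va_end(ap);
	return buf; /* overwritten by the next call from any caller */
}
```

Two successive calls return the same pointer, so the first result does
not stay valid for the long haul. With threads there is also no ordering
between one thread's vsnprintf() and another thread's read of the
buffer, which is the thread-safety concern raised here.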

I am a bit wary of making the check too sketchy like this, but this
is not about determining if a random "path" that has ".git" in a
superproject working tree is a submodule or not (that information
primarily comes from the superproject index), so I tend to agree
with the patch that it is sufficient to check presence of ".git"
alone.

* Re: [PATCH v6 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-12-01 17:51               ` Brandon Williams
@ 2016-12-01 18:49                 ` Junio C Hamano
  2016-12-01 18:52                 ` Jeff King
  1 sibling, 0 replies; 126+ messages in thread
From: Junio C Hamano @ 2016-12-01 18:49 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Johannes Sixt, git, peff, sbeller, jonathantanmy

Brandon Williams <bmwill@google.com> writes:

> On 12/01, Johannes Sixt wrote:
>> Am 01.12.2016 um 02:28 schrieb Brandon Williams:
>> >+	git init "su:b" &&
>> 
>> Don't do that. Colons in file names won't work on Windows.
>> 
>> -- Hannes
>> 
>
> This test is needed to see if the code still works with filenames that
> contain colons.  Is there a way to mark the test to not run on windows?

Something like:

test_expect_success !MINGW 'a test' '
	git init s:u:b
'


* Re: [PATCH v6 5/6] grep: enable recurse-submodules to work on <tree> objects
  2016-12-01 17:51               ` Brandon Williams
  2016-12-01 18:49                 ` Junio C Hamano
@ 2016-12-01 18:52                 ` Jeff King
  1 sibling, 0 replies; 126+ messages in thread
From: Jeff King @ 2016-12-01 18:52 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Johannes Sixt, git, sbeller, jonathantanmy, gitster

On Thu, Dec 01, 2016 at 09:51:07AM -0800, Brandon Williams wrote:

> On 12/01, Johannes Sixt wrote:
> > Am 01.12.2016 um 02:28 schrieb Brandon Williams:
> > >+	git init "su:b" &&
> > 
> > Don't do that. Colons in file names won't work on Windows.
> > 
> > -- Hannes
> > 
> 
> This test is needed to see if the code still works with filenames that
> contain colons.  Is there a way to mark the test to not run on windows?

Junio suggested !MINGW, which seems sensible. Earlier I mentioned doing
the whole thing in-index, but I think that might get tricky because we
try to find the submodule as ".git/modules/<path>". So it probably isn't
worth the trouble.

-Peff

* Re: [PATCH v6 0/6] recursively grep across submodules
  2016-12-01 17:45             ` Brandon Williams
@ 2016-12-01 19:03               ` Jeff King
  0 siblings, 0 replies; 126+ messages in thread
From: Jeff King @ 2016-12-01 19:03 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, sbeller, jonathantanmy, gitster

On Thu, Dec 01, 2016 at 09:45:47AM -0800, Brandon Williams wrote:

> Yeah I was trying to think through these scenarios myself last night.
> And like you found it seemed alright to let the child process deal with
> the .git file/dir as long as one actually exists at that path.  If one
> didn't then there would be the possibility that we ended up back at the
> superproject, which would result in an infinite loop.  And yeah if the
> .git file doesn't resolve to anything sensible then the user probably
> mangled their repository somehow anyways.

I hadn't considered the infinite loop. I thought the worst case is that
we might just generate bogus results by going back to the superproject.
But of course there is nothing to stop it from just recursing again.

However, it looks like there is a circuit-breaker; we end up back in the
superproject, but inside a subdirectory, which causes --super-prefix to
complain.

You can test it with just:

  rm submodule/.git
  mkdir submodule/.git

which says:

  fatal: can't use --super-prefix from a subdirectory
  fatal: process for submodule 'foo' failed with exit code: 128

It might be worth including a test to make sure that behavior remains.
I think it's more of an emergent behavior than something planned. :)

-Peff

* Re: [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules
  2016-12-01 18:46               ` Junio C Hamano
@ 2016-12-01 19:09                 ` Jeff King
  2016-12-01 19:16                   ` Brandon Williams
  0 siblings, 1 reply; 126+ messages in thread
From: Jeff King @ 2016-12-01 19:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brandon Williams, git, sbeller, jonathantanmy

On Thu, Dec 01, 2016 at 10:46:23AM -0800, Junio C Hamano wrote:

> > mkpath() is generally an unsafe function because it uses a static
> > buffer, but it's handy and safe for handing values to syscalls like
> > this.
> 
> I think your "unsafe" is not about thread-safety but about "the
> caller cannot rely on returned value staying valid for long haul".
> If this change since v5 is about thread-safety, I am not sure if it
> is safe to use mkpath here.

Oh, good point. I meant "staying valid", but somehow totally forgot that
we cared about thread reentrancy here. As if I hadn't just spent an hour
debugging a thread problem.

My suggestion is clearly nonsense.

> I am a bit wary of making the check too sketchy like this, but this
> is not about determining if a random "path" that has ".git" in a
> superproject working tree is a submodule or not (that information
> primarily comes from the superproject index), so I tend to agree
> with the patch that it is sufficient to check presence of ".git"
> alone.

The real danger is that it is a different check than the child process
is going to use, so they may disagree (see the almost-infinite-loop
discussion elsewhere).

It feels quite hacky, but checking:

  if (is_git_directory(suspect))
	return 1; /* actual git dir */
  if (!stat(suspect, &st) && S_ISREG(st.st_mode))
	return 1; /* gitfile; may or may not be valid */
  return 0;

is a little more robust, because the child process will happily skip a
non-repo ".git" and keep walking back up to the superproject. Whereas if
it sees any ".git" file, even if it is bogus, it will barf then and
there.

I'm actually not sure if that latter behavior is a bug or not. I don't
think it was really planned out, and it obviously is inconsistent with
the other repo-discovery cases. But it is a convenient side effect for
submodules, and I doubt anybody is bothered by it in practice.

-Peff

* Re: [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules
  2016-12-01 19:09                 ` Jeff King
@ 2016-12-01 19:16                   ` Brandon Williams
  2016-12-01 20:54                     ` Brandon Williams
  0 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-12-01 19:16 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, sbeller, jonathantanmy

On 12/01, Jeff King wrote:
> On Thu, Dec 01, 2016 at 10:46:23AM -0800, Junio C Hamano wrote:
> 
> > > mkpath() is generally an unsafe function because it uses a static
> > > buffer, but it's handy and safe for handing values to syscalls like
> > > this.
> > 
> > I think your "unsafe" is not about thread-safety but about "the
> > caller cannot rely on returned value staying valid for long haul".
> > If this change since v5 is about thread-safety, I am not sure if it
> > is safe to use mkpath here.
> 
> Oh, good point. I meant "staying valid", but somehow totally forgot that
> we cared about thread reentrancy here. As if I hadn't just spent an hour
> debugging a thread problem.
> 
> My suggestion is clearly nonsense.
> 
> > I am a bit wary of making the check too sketchy like this, but this
> > is not about determining if a random "path" that has ".git" in a
> > superproject working tree is a submodule or not (that information
> > primarily comes from the superproject index), so I tend to agree
> > with the patch that it is sufficient to check presence of ".git"
> > alone.
> 
> The real danger is that it is a different check than the child process
> is going to use, so they may disagree (see the almost-infinite-loop
> discussion elsewhere).
> 
> It feels quite hacky, but checking:
> 
>   if (is_git_directory(suspect))
> 	return 1; /* actual git dir */
>   if (!stat(suspect, &st) && S_ISREG(st.st_mode))
> 	return 1; /* gitfile; may or may not be valid */
>   return 0;
> 
> is a little more robust, because the child process will happily skip a
> non-repo ".git" and keep walking back up to the superproject. Whereas if
> it sees any ".git" file, even if it is bogus, it will barf then and
> there.
> 
> I'm actually not sure if that latter behavior is a bug or not. I don't
> think it was really planned out, and it obviously is inconsistent with
> the other repo-discovery cases. But it is a convenient side effect for
> submodules, and I doubt anybody is bothered by it in practice.
> 
> -Peff

I think this more robust check is probably a good idea, that way we
don't step into a submodule with a .git directory that isn't really a
.git dir.

-- 
Brandon Williams

* Re: [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules
  2016-12-01 19:16                   ` Brandon Williams
@ 2016-12-01 20:54                     ` Brandon Williams
  2016-12-01 20:59                       ` Jeff King
  0 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-12-01 20:54 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, sbeller, jonathantanmy

On 12/01, Brandon Williams wrote:
> On 12/01, Jeff King wrote:
> > On Thu, Dec 01, 2016 at 10:46:23AM -0800, Junio C Hamano wrote:
> > 
> > > > mkpath() is generally an unsafe function because it uses a static
> > > > buffer, but it's handy and safe for handing values to syscalls like
> > > > this.
> > > 
> > > I think your "unsafe" is not about thread-safety but about "the
> > > caller cannot rely on returned value staying valid for long haul".
> > > If this change since v5 is about thread-safety, I am not sure if it
> > > is safe to use mkpath here.
> > 
> > Oh, good point. I meant "staying valid", but somehow totally forgot that
> > we cared about thread reentrancy here. As if I hadn't just spent an hour
> > debugging a thread problem.
> > 
> > My suggestion is clearly nonsense.
> > 
> > > I am a bit wary of making the check too sketchy like this, but this
> > > is not about determining if a random "path" that has ".git" in a
> > > superproject working tree is a submodule or not (that information
> > > primarily comes from the superproject index), so I tend to agree
> > > with the patch that it is sufficient to check presence of ".git"
> > > alone.
> > 
> > The real danger is that it is a different check than the child process
> > is going to use, so they may disagree (see the almost-infinite-loop
> > discussion elsewhere).
> > 
> > It feels quite hacky, but checking:
> > 
> >   if (is_git_directory(suspect))
> > 	return 1; /* actual git dir */
> >   if (!stat(suspect, &st) && S_ISREG(st.st_mode))
> > 	return 1; /* gitfile; may or may not be valid */
> >   return 0;
> > 
> > is a little more robust, because the child process will happily skip a
> > non-repo ".git" and keep walking back up to the superproject. Whereas if
> > it sees any ".git" file, even if it is bogus, it will barf then and
> > there.
> > 
> > I'm actually not sure if that latter behavior is a bug or not. I don't
> > think it was really planned out, and it obviously is inconsistent with
> > the other repo-discovery cases. But it is a convenient side effect for
> > submodules, and I doubt anybody is bothered by it in practice.
> > 
> > -Peff
> 
> I think this more robust check is probably a good idea, that way we
> don't step into a submodule with a .git directory that isn't really a
> .git dir.

Looks like this is a no-go as well...the call to is_git_directory() ends
up calling real_path...which ends up performing the chdir call, which
puts us right back to where we started!  (as a side note I was using
is_git_directory elsewhere...which I now know I can't use)

-- 
Brandon Williams

* Re: [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules
  2016-12-01 20:54                     ` Brandon Williams
@ 2016-12-01 20:59                       ` Jeff King
  2016-12-01 21:56                         ` Stefan Beller
  0 siblings, 1 reply; 126+ messages in thread
From: Jeff King @ 2016-12-01 20:59 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Junio C Hamano, git, sbeller, jonathantanmy

On Thu, Dec 01, 2016 at 12:54:44PM -0800, Brandon Williams wrote:

> > I think this more robust check is probably a good idea, that way we
> > don't step into a submodule with a .git directory that isn't really a
> > .git dir.
> 
> Looks like this is a no-go as well...the call to is_git_directory() ends
> up calling real_path...which ends up performing the chdir call, which
> puts us right back to where we started!  (as a side note I was using
> is_git_directory elsewhere...which I now know I can't use)

Bleh. Looks like it happens as part of the recently-added
get_common_dir(). I'm not sure if that is ever relevant for submodules,
but I guess in theory you could have a submodule clone that is part of a
worktree?

-Peff

* Re: [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules
  2016-12-01 20:59                       ` Jeff King
@ 2016-12-01 21:56                         ` Stefan Beller
  2016-12-01 21:59                           ` Jeff King
  0 siblings, 1 reply; 126+ messages in thread
From: Stefan Beller @ 2016-12-01 21:56 UTC (permalink / raw)
  To: Jeff King
  Cc: Brandon Williams, Junio C Hamano, git@vger.kernel.org,
	Jonathan Tan

On Thu, Dec 1, 2016 at 12:59 PM, Jeff King <peff@peff.net> wrote:
> On Thu, Dec 01, 2016 at 12:54:44PM -0800, Brandon Williams wrote:
>
>> > I think this more robust check is probably a good idea, that way we
>> > don't step into a submodule with a .git directory that isn't really a
>> > .git dir.
>>
>> Looks like this is a no-go as well...the call to is_git_directory() ends
>> up calling real_path...which ends up performing the chdir call, which
>> puts us right back to where we started!  (as a side note I was using
>> is_git_directory elsewhere...which I now know I can't use)
>
> Bleh. Looks like it happens as part of the recently-added
> get_common_dir(). I'm not sure if that is ever relevant for submodules,
> but I guess in theory you could have a submodule clone that is part of a
> worktree?

Sure we can, for a test that we don't have that, see the embedgitdirs series. ;)

For now each submodule has its own complete git dir, but the vision
would be to have a common git dir for submodules in the common
superproject's git dir as well, such that objects are actually shared. :)

>
> -Peff

* Re: [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules
  2016-12-01 21:56                         ` Stefan Beller
@ 2016-12-01 21:59                           ` Jeff King
  2016-12-02 18:36                             ` Brandon Williams
  0 siblings, 1 reply; 126+ messages in thread
From: Jeff King @ 2016-12-01 21:59 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Brandon Williams, Junio C Hamano, git@vger.kernel.org,
	Jonathan Tan

On Thu, Dec 01, 2016 at 01:56:32PM -0800, Stefan Beller wrote:

> > Bleh. Looks like it happens as part of the recently-added
> > get_common_dir(). I'm not sure if that is ever relevant for submodules,
> > but I guess in theory you could have a submodule clone that is part of a
> > worktree?
> 
> Sure we can, for a test that we don't have that, see the embedgitdirs series. ;)
> 
> For now each submodule has its own complete git dir, but the vision
> would be to have a common git dir for submodules in the common
> superprojects git dir as well, such that objects are shared actually. :)

Fair enough. Given that it seems to behave OK even in error cases, the
simple stat() test may be the best option, then. I do think we should
consider adding a few test cases to make sure it continues to behave in
the error cases (just because we are relying partially on what git's
setup code happens to do currently, and we'd want to protect ourselves
against regressions).
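
For illustration, a minimal sketch of what such a stat()-only check
could look like.  The helper name has_dot_git() and the fixed-size
buffer are assumptions made for this example, not code from the
series.  It accepts either a ".git" directory or a ".git" regular file
(a gitfile, which may or may not be valid) and never calls chdir():

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (name not taken from the series): report whether
 * "<path>/.git" exists as a directory or as a regular file (gitfile).
 * Only stat() is used, so the working directory is never changed.
 */
int has_dot_git(const char *path)
{
	char buf[4096];
	struct stat st;
	int len;

	assert(path != NULL);
	len = snprintf(buf, sizeof(buf), "%s/.git", path);
	if (len < 0 || (size_t)len >= sizeof(buf))
		return 0; /* too long for this sketch; treat as absent */
	if (stat(buf, &st))
		return 0; /* missing or inaccessible */
	return S_ISDIR(st.st_mode) || S_ISREG(st.st_mode);
}
```

Whether the ".git" found this way is actually usable is still left to
the child process, which is why test cases for the error paths remain
worthwhile.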

-Peff

* Re: [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules
  2016-12-01 21:59                           ` Jeff King
@ 2016-12-02 18:36                             ` Brandon Williams
  2016-12-02 18:44                               ` Jacob Keller
  0 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-12-02 18:36 UTC (permalink / raw)
  To: Jeff King
  Cc: Stefan Beller, Junio C Hamano, git@vger.kernel.org, Jonathan Tan

On 12/01, Jeff King wrote:
> On Thu, Dec 01, 2016 at 01:56:32PM -0800, Stefan Beller wrote:
> 
> > > Bleh. Looks like it happens as part of the recently-added
> > > get_common_dir(). I'm not sure if that is ever relevant for submodules,
> > > but I guess in theory you could have a submodule clone that is part of a
> > > worktree?
> > 
> > Sure we can, for a test that we don't have that, see the embedgitdirs series. ;)
> > 
> > For now each submodule has its own complete git dir, but the vision
> > would be to have a common git dir for submodules in the common
> > superprojects git dir as well, such that objects are shared actually. :)
> 
> Fair enough. Given that it seems to behave OK even in error cases, the
> simple stat() test may be the best option, then. I do think we should
> consider adding a few test cases to make sure it continues to behave in
> the error cases (just because we are relying partially on what git's
> setup code happens to do currently, and we'd want to protect ourselves
> against regressions).

For the naive (i.e. me), is there a reason why real_path() couldn't be
re-implemented to avoid using chdir?  I tried looking into the history of
the function but couldn't find anything explaining why it was done that
way.  I assume it has to do with symlinks, but I thought there was a
syscall (readlink?) that could do the resolution.

-- 
Brandon Williams

* Re: [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules
  2016-12-02 18:36                             ` Brandon Williams
@ 2016-12-02 18:44                               ` Jacob Keller
  2016-12-02 18:49                                 ` Brandon Williams
  0 siblings, 1 reply; 126+ messages in thread
From: Jacob Keller @ 2016-12-02 18:44 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Jeff King, Stefan Beller, Junio C Hamano, git@vger.kernel.org,
	Jonathan Tan

On Fri, Dec 2, 2016 at 10:36 AM, Brandon Williams <bmwill@google.com> wrote:
> On 12/01, Jeff King wrote:
>> On Thu, Dec 01, 2016 at 01:56:32PM -0800, Stefan Beller wrote:
>>
>> > > Bleh. Looks like it happens as part of the recently-added
>> > > get_common_dir(). I'm not sure if that is ever relevant for submodules,
>> > > but I guess in theory you could have a submodule clone that is part of a
>> > > worktree?
>> >
>> > Sure we can, for a test that we don't have that, see the embedgitdirs series. ;)
>> >
>> > For now each submodule has its own complete git dir, but the vision
>> > would be to have a common git dir for submodules in the common
>> > superprojects git dir as well, such that objects are shared actually. :)
>>
>> Fair enough. Given that it seems to behave OK even in error cases, the
>> simple stat() test may be the best option, then. I do think we should
>> consider adding a few test cases to make sure it continues to behave in
>> the error cases (just because we are relying partially on what git's
>> setup code happens to do currently, and we'd want to protect ourselves
>> against regressions).
>
> For the naive (ie me), is there a reason why real_path() couldn't be
> re-implemented to avoid using chdir?  I tried looking into the history of
> the function but couldn't find anything explaining why it was done that
> way.  I assume it has to do with symlinks, but I thought there was a
> syscall (readlink?) that could do the resolution.
>
> --
> Brandon Williams

The reason, as far as I understand it, is that it uses chdir() to
guarantee that it follows symlinks correctly and then looks up the
resulting path after the chdir(). I do not think there is a syscall
that actually correctly works like real_path() does. You *could*
re-write real_path() to do the symlink lookups itself, but as Jeff
recently pointed out, that way lies madness.
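
To make the chdir() approach concrete, here is a rough sketch of the
general technique.  It is not git's actual real_path(); the function
name resolve_dir_via_chdir() and the fixed-size buffer are assumptions
for this example.  It enters the directory, asks the kernel where it
ended up, and then goes back:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical illustration, not git's real_path(): resolve a directory
 * by entering it and asking the kernel for the resulting location.
 * chdir() changes process-wide state, so every other thread sees the
 * working directory move while this function runs.
 * Returns 0 on success, -1 on failure.
 */
int resolve_dir_via_chdir(const char *dir, char *out, size_t outlen)
{
	char saved[4096];
	int ret = -1;

	assert(dir != NULL && out != NULL);
	if (!getcwd(saved, sizeof(saved)))
		return -1;
	if (chdir(dir))
		return -1;
	if (getcwd(out, outlen))
		ret = 0; /* symlinks were followed by the kernel */
	if (chdir(saved))
		return -1; /* could not restore the original directory */
	return ret;
}
```

The kernel does all of the symlink handling, which is what makes this
simple, but the temporary change of directory is exactly what is not
safe once other threads are running.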

Thanks,
Jake

* Re: [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules
  2016-12-02 18:44                               ` Jacob Keller
@ 2016-12-02 18:49                                 ` Brandon Williams
  2016-12-02 19:20                                   ` Jacob Keller
  0 siblings, 1 reply; 126+ messages in thread
From: Brandon Williams @ 2016-12-02 18:49 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Jeff King, Stefan Beller, Junio C Hamano, git@vger.kernel.org,
	Jonathan Tan

On 12/02, Jacob Keller wrote:
> On Fri, Dec 2, 2016 at 10:36 AM, Brandon Williams <bmwill@google.com> wrote:
> > On 12/01, Jeff King wrote:
> >> On Thu, Dec 01, 2016 at 01:56:32PM -0800, Stefan Beller wrote:
> >>
> >> > > Bleh. Looks like it happens as part of the recently-added
> >> > > get_common_dir(). I'm not sure if that is ever relevant for submodules,
> >> > > but I guess in theory you could have a submodule clone that is part of a
> >> > > worktree?
> >> >
> >> > Sure we can, for a test that we don't have that, see the embedgitdirs series. ;)
> >> >
> >> > For now each submodule has its own complete git dir, but the vision
> >> > would be to have a common git dir for submodules in the common
> >> > superprojects git dir as well, such that objects are shared actually. :)
> >>
> >> Fair enough. Given that it seems to behave OK even in error cases, the
> >> simple stat() test may be the best option, then. I do think we should
> >> consider adding a few test cases to make sure it continues to behave in
> >> the error cases (just because we are relying partially on what git's
> >> setup code happens to do currently, and we'd want to protect ourselves
> >> against regressions).
> >
> > For the naive (ie me), is there a reason why real_path() couldn't be
> > re-implemented to avoid using chdir?  I tried looking into the history of
> > the function but couldn't find anything explaining why it was done that
> > way.  I assume it has to do with symlinks, but I thought there was a
> > syscall (readlink?) that could do the resolution.
> >
> > --
> > Brandon Williams
> 
> The reason as far as I understand it, is that it uses chdir() to
> guarantee that it follows symlinks correctly and then looks up the
> resulting path after the chdir(). I do not think there is a syscall
> that actually correctly works like real_path() does. You *could*
> re-write real_path() to do the symlink lookups itself, but as Jeff
> recently pointed out, that way lies madness.

So is there a reason why the library function realpath() can't be used?
From a cursory look at its man page it seems to do the symlink
resolution.

-- 
Brandon Williams

* Re: [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules
  2016-12-02 18:49                                 ` Brandon Williams
@ 2016-12-02 19:20                                   ` Jacob Keller
  2016-12-02 19:28                                     ` Stefan Beller
  0 siblings, 1 reply; 126+ messages in thread
From: Jacob Keller @ 2016-12-02 19:20 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Jeff King, Stefan Beller, Junio C Hamano, git@vger.kernel.org,
	Jonathan Tan

On Fri, Dec 2, 2016 at 10:49 AM, Brandon Williams <bmwill@google.com> wrote:
> On 12/02, Jacob Keller wrote:
>> On Fri, Dec 2, 2016 at 10:36 AM, Brandon Williams <bmwill@google.com> wrote:
>> > On 12/01, Jeff King wrote:
>> >> On Thu, Dec 01, 2016 at 01:56:32PM -0800, Stefan Beller wrote:
>> >>
>> >> > > Bleh. Looks like it happens as part of the recently-added
>> >> > > get_common_dir(). I'm not sure if that is ever relevant for submodules,
>> >> > > but I guess in theory you could have a submodule clone that is part of a
>> >> > > worktree?
>> >> >
>> >> > Sure we can, for a test that we don't have that, see the embedgitdirs series. ;)
>> >> >
>> >> > For now each submodule has its own complete git dir, but the vision
>> >> > would be to have a common git dir for submodules in the common
>> >> > superprojects git dir as well, such that objects are shared actually. :)
>> >>
>> >> Fair enough. Given that it seems to behave OK even in error cases, the
>> >> simple stat() test may be the best option, then. I do think we should
>> >> consider adding a few test cases to make sure it continues to behave in
>> >> the error cases (just because we are relying partially on what git's
>> >> setup code happens to do currently, and we'd want to protect ourselves
>> >> against regressions).
>> >
>> > For the naive (ie me), is there a reason why real_path() couldn't be
>> > re-implemented to avoid using chdir?  I tried looking into the history of
>> > the function but couldn't find anything explaining why it was done that
>> > way.  I assume it has to do with symlinks, but I thought there was a
>> > syscall (readlink?) that could do the resolution.
>> >
>> > --
>> > Brandon Williams
>>
>> The reason as far as I understand it, is that it uses chdir() to
>> guarantee that it follows symlinks correctly and then looks up the
>> resulting path after the chdir(). I do not think there is a syscall
>> that actually correctly works like real_path() does. You *could*
>> re-write real_path() to do the symlink lookups itself, but as Jeff
>> recently pointed out, that way lies madness.
>
> So is there a reason why the library function realpath() can't be used?
> From a cursory look at its man page it seems to do the symlink
> resolution.
>
> --
> Brandon Williams

I believe it uses the same method and thus wouldn't actually resolve
the issue. But I'm not really 100% sure on this.

Thanks,
Jake

* Re: [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules
  2016-12-02 19:20                                   ` Jacob Keller
@ 2016-12-02 19:28                                     ` Stefan Beller
  2016-12-02 21:31                                       ` Jacob Keller
  2016-12-02 21:45                                       ` Jeff King
  0 siblings, 2 replies; 126+ messages in thread
From: Stefan Beller @ 2016-12-02 19:28 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Brandon Williams, Jeff King, Junio C Hamano, git@vger.kernel.org,
	Jonathan Tan

On Fri, Dec 2, 2016 at 11:20 AM, Jacob Keller <jacob.keller@gmail.com> wrote:
>>
>> So is there a reason why the library function realpath() can't be used?
>> From a cursory look at its man page it seems to do the symlink
>> resolution.
>>
>> --
>> Brandon Williams
>
> I believe it uses the same method and thus wouldn't actually resolve
> the issue. But I'm not really 100% sure on this.
>
> Thanks,
> Jake

I just reviewed 2 libc implementations (glibc and an Android libc) and
neither of them uses chdir internally; both use readlink and compose the
path 'manually', cf.
http://osxr.org:8080/glibc/source/stdlib/canonicalize.c?v=glibc-2.13

* Re: [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules
  2016-12-02 19:28                                     ` Stefan Beller
@ 2016-12-02 21:31                                       ` Jacob Keller
  2016-12-02 21:46                                         ` Brandon Williams
  2016-12-02 21:45                                       ` Jeff King
  1 sibling, 1 reply; 126+ messages in thread
From: Jacob Keller @ 2016-12-02 21:31 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Brandon Williams, Jeff King, Junio C Hamano, git@vger.kernel.org,
	Jonathan Tan

On Fri, Dec 2, 2016 at 11:28 AM, Stefan Beller <sbeller@google.com> wrote:
> On Fri, Dec 2, 2016 at 11:20 AM, Jacob Keller <jacob.keller@gmail.com> wrote:
>>>
>>> So is there a reason why the library function realpath() can't be used?
>>> From a cursory look at its man page it seems to do the symlink
>>> resolution.
>>>
>>> --
>>> Brandon Williams
>>
>> I believe it uses the same method and thus wouldn't actually resolve
>> the issue. But I'm not really 100% sure on this.
>>
>> Thanks,
>> Jake
>
> I just reviewed 2 libc implementations (glibc and an Android libc) and
> both of them
> do not use chdir internally, but use readlink and compose the path 'manually'
> c.f. http://osxr.org:8080/glibc/source/stdlib/canonicalize.c?v=glibc-2.13

Interesting. Would this be portable to Windows, though?

Thanks,
Jake

* Re: [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules
  2016-12-02 19:28                                     ` Stefan Beller
  2016-12-02 21:31                                       ` Jacob Keller
@ 2016-12-02 21:45                                       ` Jeff King
  2016-12-03  0:16                                         ` Brandon Williams
  1 sibling, 1 reply; 126+ messages in thread
From: Jeff King @ 2016-12-02 21:45 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jacob Keller, Brandon Williams, Junio C Hamano,
	git@vger.kernel.org, Jonathan Tan

On Fri, Dec 02, 2016 at 11:28:49AM -0800, Stefan Beller wrote:

> I just reviewed 2 libc implementations (glibc and an Android libc) and
> both of them
> do not use chdir internally, but use readlink and compose the path 'manually'
> c.f. http://osxr.org:8080/glibc/source/stdlib/canonicalize.c?v=glibc-2.13

Interesting. It might be worth updating our implementation. The original
comes all the way from 54f4b8745 (Library code for user-relative paths,
take three., 2005-11-17). That references a suggestion which I think
comes from:

  http://public-inbox.org/git/Pine.LNX.4.64.0510181728490.3369@g5.osdl.org/

where it's claimed to be simpler and more efficient (which sounds
plausible to me).  But back then it was _just_ git-daemon doing a
canonicalization, and nobody cared about things like thread safety.

Looking at the glibc implementation, it's really not that bad. We
_could_ even rely on the system realpath() and just provide our own
fallback for systems without it, but I think ours might be a little more
featureful (at the very least, it handles arbitrary-sized paths via
strbufs).

-Peff

* Re: [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules
  2016-12-02 21:31                                       ` Jacob Keller
@ 2016-12-02 21:46                                         ` Brandon Williams
  0 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-12-02 21:46 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Stefan Beller, Jeff King, Junio C Hamano, git@vger.kernel.org,
	Jonathan Tan

On 12/02, Jacob Keller wrote:
> On Fri, Dec 2, 2016 at 11:28 AM, Stefan Beller <sbeller@google.com> wrote:
> > On Fri, Dec 2, 2016 at 11:20 AM, Jacob Keller <jacob.keller@gmail.com> wrote:
> >>>
> >>> So is there a reason why the library function realpath() can't be used?
> >>> From a cursory look at its man page it seems to do the symlink
> >>> resolution.
> >>>
> >>> --
> >>> Brandon Williams
> >>
> >> I believe it uses the same method and thus wouldn't actually resolve
> >> the issue. But I'm not really 100% sure on this.
> >>
> >> Thanks,
> >> Jake
> >
> > I just reviewed 2 libc implementations (glibc and an Android libc) and
> > both of them
> > do not use chdir internally, but use readlink and compose the path 'manually'
> > c.f. http://osxr.org:8080/glibc/source/stdlib/canonicalize.c?v=glibc-2.13
> 
> Interesting. Would this be portable to Windows, though?

Perhaps.  It looks like the only crazy thing it does is use readlink,
which our real_path function is already doing.  I don't think we could
drop in their implementation though since there are other things that it
does that aren't portable to windows (like determining if a path is
absolute or not).  Rather their implementation gives me some hope that
it is possible to resolve the real path without using chdir.

-- 
Brandon Williams

* Re: [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules
  2016-12-02 21:45                                       ` Jeff King
@ 2016-12-03  0:16                                         ` Brandon Williams
  0 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-12-03  0:16 UTC (permalink / raw)
  To: Jeff King
  Cc: Stefan Beller, Jacob Keller, Junio C Hamano, git@vger.kernel.org,
	Jonathan Tan

On 12/02, Jeff King wrote:
> On Fri, Dec 02, 2016 at 11:28:49AM -0800, Stefan Beller wrote:
> 
> > I just reviewed 2 libc implementations (glibc and an Android libc) and
> > both of them
> > do not use chdir internally, but use readlink and compose the path 'manually'
> > c.f. http://osxr.org:8080/glibc/source/stdlib/canonicalize.c?v=glibc-2.13
> 
> Interesting. It might be worth updating our implementation. The original
> comes all the way from 54f4b8745 (Library code for user-relative paths,
> take three., 2005-11-17). That references a suggestion which I think
> comes from:
> 
>   http://public-inbox.org/git/Pine.LNX.4.64.0510181728490.3369@g5.osdl.org/
> 
> where it's claimed to be simpler and more efficient (which sounds
> plausible to me).  But back then it was _just_ git-daemon doing a
> canonicalization, and nobody cared about things like thread safety.
> 
> Looking at the glibc implementation, it's really not that bad. We
> _could_ even rely on the system realpath() and just provide our own
> fallback for systems without it, but I think ours might be a little more
> featureful (at the very least, it handles arbitrary-sized paths via
> strbufs).

I've actually been working on updating our implementation of realpath
today.  It's slow going, but we'll see if it works when I'm done :)

Also, we can't just drop in realpath since it requires that all path
components are valid, while ours allows for the final component to be
invalid.
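
The difference in behaviour (the system realpath() needing every
component to exist) could be bridged roughly as sketched here.  This is
only an illustration and not the implementation being worked on; the
name realpath_allow_missing_last() is made up for this example, and it
assumes a POSIX.1-2008 realpath() that allocates its result when the
second argument is NULL:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical wrapper (not the code from the series): like realpath(),
 * but tolerate a final path component that does not exist by resolving
 * the parent directory and appending the last component unchanged.
 * Returns a malloc()ed string, or NULL on failure.
 */
char *realpath_allow_missing_last(const char *path)
{
	char *resolved, *copy, *slash, *parent, *out = NULL;
	const char *last;
	size_t len;

	assert(path != NULL);
	resolved = realpath(path, NULL);
	if (resolved || errno != ENOENT)
		return resolved;

	copy = strdup(path);
	if (!copy)
		return NULL;
	slash = strrchr(copy, '/');
	if (slash) {
		*slash = '\0';
		last = slash + 1;
		parent = realpath(*copy ? copy : "/", NULL);
	} else {
		last = copy;
		parent = realpath(".", NULL);
	}
	if (parent && *last) {
		len = strlen(parent) + strlen(last) + 2;
		out = malloc(len);
		if (out) /* avoid "//name" when the parent is the root */
			snprintf(out, len, "%s/%s",
				 strcmp(parent, "/") ? parent : "", last);
	}
	free(parent);
	free(copy);
	return out;
}
```

A missing parent directory still makes the call fail, so only the last
component is allowed to be absent.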

-- 
Brandon Williams

* [PATCH v7 0/7] recursively grep across submodules
  2016-12-01  1:28         ` [PATCH v6 0/6] recursively grep across submodules Brandon Williams
                             ` (6 preceding siblings ...)
  2016-12-01  4:22           ` [PATCH v6 0/6] recursively grep across submodules Jeff King
@ 2016-12-16 19:03           ` Brandon Williams
  2016-12-16 19:03             ` [PATCH v7 1/7] submodules: add helper to determine if a submodule is populated Brandon Williams
                               ` (7 more replies)
  7 siblings, 8 replies; 126+ messages in thread
From: Brandon Williams @ 2016-12-16 19:03 UTC (permalink / raw)
  To: git
  Cc: peff, sbeller, jonathantanmy, gitster, jacob.keller, j6t,
	Brandon Williams

Changes in v7:
* Rebased on 'origin/bw/realpath-wo-chdir' in order to fix the race condition
  that occurs when verifying a submodule's gitdir.
* Reverted is_submodule_populated() to use resolve_gitdir() now that there is
  no race condition.
* Added !MINGW to a test in t7814 so that it won't run on Windows.  This is
  because the test checks that colons in filenames are still handled
  correctly, yet Windows doesn't allow colons in filenames.

Brandon Williams (7):
  submodules: add helper to determine if a submodule is populated
  submodules: add helper to determine if a submodule is initialized
  submodules: load gitmodules file from commit sha1
  grep: add submodules as a grep source type
  grep: optionally recurse into submodules
  grep: enable recurse-submodules to work on <tree> objects
  grep: search history of moved submodules

 Documentation/git-grep.txt         |  14 ++
 builtin/grep.c                     | 386 ++++++++++++++++++++++++++++++++++---
 cache.h                            |   2 +
 config.c                           |   8 +-
 git.c                              |   2 +-
 grep.c                             |  16 +-
 grep.h                             |   1 +
 submodule-config.c                 |   6 +-
 submodule-config.h                 |   3 +
 submodule.c                        |  50 +++++
 submodule.h                        |   3 +
 t/t7814-grep-recurse-submodules.sh | 241 +++++++++++++++++++++++
 tree-walk.c                        |  28 +++
 13 files changed, 729 insertions(+), 31 deletions(-)
 create mode 100755 t/t7814-grep-recurse-submodules.sh

-- 
2.8.0.rc3.226.g39d4020


* [PATCH v7 1/7] submodules: add helper to determine if a submodule is populated
  2016-12-16 19:03           ` [PATCH v7 0/7] " Brandon Williams
@ 2016-12-16 19:03             ` Brandon Williams
  2016-12-16 19:03             ` [PATCH v7 2/7] submodules: add helper to determine if a submodule is initialized Brandon Williams
                               ` (6 subsequent siblings)
  7 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-12-16 19:03 UTC (permalink / raw)
  To: git
  Cc: peff, sbeller, jonathantanmy, gitster, jacob.keller, j6t,
	Brandon Williams

Add the `is_submodule_populated()` helper function to submodule.c.
`is_submodule_populated()` performs a check to see if a submodule has
been checked out (and has a valid .git directory/file) at the given path.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 submodule.c | 15 +++++++++++++++
 submodule.h |  1 +
 2 files changed, 16 insertions(+)

diff --git a/submodule.c b/submodule.c
index c85ba50..ee3198d 100644
--- a/submodule.c
+++ b/submodule.c
@@ -198,6 +198,21 @@ void gitmodules_config(void)
 	}
 }
 
+/*
+ * Determine if a submodule has been populated at a given 'path'
+ */
+int is_submodule_populated(const char *path)
+{
+	int ret = 0;
+	char *gitdir = xstrfmt("%s/.git", path);
+
+	if (resolve_gitdir(gitdir))
+		ret = 1;
+
+	free(gitdir);
+	return ret;
+}
+
 int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst)
 {
diff --git a/submodule.h b/submodule.h
index d9e197a..c4af505 100644
--- a/submodule.h
+++ b/submodule.h
@@ -37,6 +37,7 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
 		const char *path);
 int submodule_config(const char *var, const char *value, void *cb);
 void gitmodules_config(void);
+extern int is_submodule_populated(const char *path);
 int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst);
 const char *submodule_strategy_to_string(const struct submodule_update_strategy *s);
-- 
2.8.0.rc3.226.g39d4020


* [PATCH v7 2/7] submodules: add helper to determine if a submodule is initialized
  2016-12-16 19:03           ` [PATCH v7 0/7] " Brandon Williams
  2016-12-16 19:03             ` [PATCH v7 1/7] submodules: add helper to determine if a submodule is populated Brandon Williams
@ 2016-12-16 19:03             ` Brandon Williams
  2016-12-16 19:03             ` [PATCH v7 3/7] submodules: load gitmodules file from commit sha1 Brandon Williams
                               ` (5 subsequent siblings)
  7 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-12-16 19:03 UTC (permalink / raw)
  To: git
  Cc: peff, sbeller, jonathantanmy, gitster, jacob.keller, j6t,
	Brandon Williams

Add the `is_submodule_initialized()` helper function to submodule.c.
`is_submodule_initialized()` performs a check to determine if the
submodule at the given path has been initialized.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 submodule.c | 23 +++++++++++++++++++++++
 submodule.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/submodule.c b/submodule.c
index ee3198d..edffaa1 100644
--- a/submodule.c
+++ b/submodule.c
@@ -199,6 +199,29 @@ void gitmodules_config(void)
 }
 
 /*
+ * Determine if a submodule has been initialized at a given 'path'
+ */
+int is_submodule_initialized(const char *path)
+{
+	int ret = 0;
+	const struct submodule *module = NULL;
+
+	module = submodule_from_path(null_sha1, path);
+
+	if (module) {
+		char *key = xstrfmt("submodule.%s.url", module->name);
+		char *value = NULL;
+
+		ret = !git_config_get_string(key, &value);
+
+		free(value);
+		free(key);
+	}
+
+	return ret;
+}
+
+/*
  * Determine if a submodule has been populated at a given 'path'
  */
 int is_submodule_populated(const char *path)
diff --git a/submodule.h b/submodule.h
index c4af505..6ec5f2f 100644
--- a/submodule.h
+++ b/submodule.h
@@ -37,6 +37,7 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
 		const char *path);
 int submodule_config(const char *var, const char *value, void *cb);
 void gitmodules_config(void);
+extern int is_submodule_initialized(const char *path);
 extern int is_submodule_populated(const char *path);
 int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst);
-- 
2.8.0.rc3.226.g39d4020


* [PATCH v7 3/7] submodules: load gitmodules file from commit sha1
  2016-12-16 19:03           ` [PATCH v7 0/7] " Brandon Williams
  2016-12-16 19:03             ` [PATCH v7 1/7] submodules: add helper to determine if a submodule is populated Brandon Williams
  2016-12-16 19:03             ` [PATCH v7 2/7] submodules: add helper to determine if a submodule is initialized Brandon Williams
@ 2016-12-16 19:03             ` Brandon Williams
  2016-12-16 19:03             ` [PATCH v7 4/7] grep: add submodules as a grep source type Brandon Williams
                               ` (4 subsequent siblings)
  7 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-12-16 19:03 UTC (permalink / raw)
  To: git
  Cc: peff, sbeller, jonathantanmy, gitster, jacob.keller, j6t,
	Brandon Williams

Teach submodules to load a '.gitmodules' file from a commit sha1.  This
enables the population of the submodule_cache to be based on the state
of the '.gitmodules' file from a particular commit.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 cache.h            |  2 ++
 config.c           |  8 ++++----
 submodule-config.c |  6 +++---
 submodule-config.h |  3 +++
 submodule.c        | 12 ++++++++++++
 submodule.h        |  1 +
 6 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/cache.h b/cache.h
index e12a5d9..de237ca 100644
--- a/cache.h
+++ b/cache.h
@@ -1693,6 +1693,8 @@ extern int git_default_config(const char *, const char *, void *);
 extern int git_config_from_file(config_fn_t fn, const char *, void *);
 extern int git_config_from_mem(config_fn_t fn, const enum config_origin_type,
 					const char *name, const char *buf, size_t len, void *data);
+extern int git_config_from_blob_sha1(config_fn_t fn, const char *name,
+				     const unsigned char *sha1, void *data);
 extern void git_config_push_parameter(const char *text);
 extern int git_config_from_parameters(config_fn_t fn, void *data);
 extern void git_config(config_fn_t fn, void *);
diff --git a/config.c b/config.c
index 83fdecb..4d78e72 100644
--- a/config.c
+++ b/config.c
@@ -1214,10 +1214,10 @@ int git_config_from_mem(config_fn_t fn, const enum config_origin_type origin_typ
 	return do_config_from(&top, fn, data);
 }
 
-static int git_config_from_blob_sha1(config_fn_t fn,
-				     const char *name,
-				     const unsigned char *sha1,
-				     void *data)
+int git_config_from_blob_sha1(config_fn_t fn,
+			      const char *name,
+			      const unsigned char *sha1,
+			      void *data)
 {
 	enum object_type type;
 	char *buf;
diff --git a/submodule-config.c b/submodule-config.c
index 098085b..8b9a2ef 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -379,9 +379,9 @@ static int parse_config(const char *var, const char *value, void *data)
 	return ret;
 }
 
-static int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
-				      unsigned char *gitmodules_sha1,
-				      struct strbuf *rev)
+int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
+			       unsigned char *gitmodules_sha1,
+			       struct strbuf *rev)
 {
 	int ret = 0;
 
diff --git a/submodule-config.h b/submodule-config.h
index d05c542..78584ba 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -29,6 +29,9 @@ const struct submodule *submodule_from_name(const unsigned char *commit_sha1,
 		const char *name);
 const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
 		const char *path);
+extern int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
+				      unsigned char *gitmodules_sha1,
+				      struct strbuf *rev);
 void submodule_free(void);
 
 #endif /* SUBMODULE_CONFIG_H */
diff --git a/submodule.c b/submodule.c
index edffaa1..2600908 100644
--- a/submodule.c
+++ b/submodule.c
@@ -198,6 +198,18 @@ void gitmodules_config(void)
 	}
 }
 
+void gitmodules_config_sha1(const unsigned char *commit_sha1)
+{
+	struct strbuf rev = STRBUF_INIT;
+	unsigned char sha1[20];
+
+	if (gitmodule_sha1_from_commit(commit_sha1, sha1, &rev)) {
+		git_config_from_blob_sha1(submodule_config, rev.buf,
+					  sha1, NULL);
+	}
+	strbuf_release(&rev);
+}
+
 /*
  * Determine if a submodule has been initialized at a given 'path'
  */
diff --git a/submodule.h b/submodule.h
index 6ec5f2f..9203d89 100644
--- a/submodule.h
+++ b/submodule.h
@@ -37,6 +37,7 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
 		const char *path);
 int submodule_config(const char *var, const char *value, void *cb);
 void gitmodules_config(void);
+extern void gitmodules_config_sha1(const unsigned char *commit_sha1);
 extern int is_submodule_initialized(const char *path);
 extern int is_submodule_populated(const char *path);
 int parse_submodule_update_strategy(const char *value,
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v7 4/7] grep: add submodules as a grep source type
  2016-12-16 19:03           ` [PATCH v7 0/7] " Brandon Williams
                               ` (2 preceding siblings ...)
  2016-12-16 19:03             ` [PATCH v7 3/7] submodules: load gitmodules file from commit sha1 Brandon Williams
@ 2016-12-16 19:03             ` Brandon Williams
  2016-12-16 19:03             ` [PATCH v7 5/7] grep: optionally recurse into submodules Brandon Williams
                               ` (3 subsequent siblings)
  7 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-12-16 19:03 UTC (permalink / raw)
  To: git
  Cc: peff, sbeller, jonathantanmy, gitster, jacob.keller, j6t,
	Brandon Williams

Add `GREP_SOURCE_SUBMODULE` as a grep_source type and cases for this new
type in the various switch statements in grep.c.

When initializing a grep_source with type `GREP_SOURCE_SUBMODULE` the
identifier can either be NULL (to indicate that the working tree will be
used) or a SHA1 (the REV of the submodule to be grep'd).  If the
identifier is a SHA1 then we want to fall through to the
`GREP_SOURCE_SHA1` case to handle the copying of the SHA1.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
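
The identifier handling described above can be sketched in isolation.
This is a minimal standalone illustration, not the code in grep.c:
`struct source`, `source_init()` and the `SRC_*` names are hypothetical
stand-ins, and plain malloc()/memcpy() stand in for xmalloc()/hashcpy().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HASH_LEN 20	/* raw SHA-1 length, matching the xmalloc(20) in the patch */

/* Hypothetical stand-ins for the grep_source types. */
enum source_type { SRC_SHA1, SRC_FILE, SRC_BUF, SRC_SUBMODULE };

struct source {
	enum source_type type;
	void *identifier;
};

/* Hypothetical analogue of the identifier switch in grep_source_init(). */
static void source_init(struct source *s, enum source_type type,
			const void *identifier)
{
	s->type = type;

	switch (type) {
	case SRC_FILE: {
		/* The identifier is a NUL-terminated path; duplicate it. */
		size_t len = strlen(identifier) + 1;
		s->identifier = malloc(len);
		assert(s->identifier);
		memcpy(s->identifier, identifier, len);
		break;
	}
	case SRC_SUBMODULE:
		if (!identifier) {
			/* NULL: the submodule's working tree will be used. */
			s->identifier = NULL;
			break;
		}
		/* FALL THROUGH - a non-NULL identifier is a SHA1 to copy */
	case SRC_SHA1:
		s->identifier = malloc(HASH_LEN);
		assert(s->identifier);
		memcpy(s->identifier, identifier, HASH_LEN);
		break;
	case SRC_BUF:
		s->identifier = NULL;
		break;
	}
}
```

Calling source_init() with SRC_SUBMODULE and a NULL identifier leaves
the identifier NULL, while a non-NULL identifier yields a private
20-byte copy that the caller must later free.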
 grep.c | 16 +++++++++++++++-
 grep.h |  1 +
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 1194d35..0dbdc1d 100644
--- a/grep.c
+++ b/grep.c
@@ -1735,12 +1735,23 @@ void grep_source_init(struct grep_source *gs, enum grep_source_type type,
 	case GREP_SOURCE_FILE:
 		gs->identifier = xstrdup(identifier);
 		break;
+	case GREP_SOURCE_SUBMODULE:
+		if (!identifier) {
+			gs->identifier = NULL;
+			break;
+		}
+		/*
+		 * FALL THROUGH
+		 * If the identifier is non-NULL (in the submodule case) it
+		 * will be a SHA1 that needs to be copied.
+		 */
 	case GREP_SOURCE_SHA1:
 		gs->identifier = xmalloc(20);
 		hashcpy(gs->identifier, identifier);
 		break;
 	case GREP_SOURCE_BUF:
 		gs->identifier = NULL;
+		break;
 	}
 }
 
@@ -1760,6 +1771,7 @@ void grep_source_clear_data(struct grep_source *gs)
 	switch (gs->type) {
 	case GREP_SOURCE_FILE:
 	case GREP_SOURCE_SHA1:
+	case GREP_SOURCE_SUBMODULE:
 		free(gs->buf);
 		gs->buf = NULL;
 		gs->size = 0;
@@ -1831,8 +1843,10 @@ static int grep_source_load(struct grep_source *gs)
 		return grep_source_load_sha1(gs);
 	case GREP_SOURCE_BUF:
 		return gs->buf ? 0 : -1;
+	case GREP_SOURCE_SUBMODULE:
+		break;
 	}
-	die("BUG: invalid grep_source type");
+	die("BUG: invalid grep_source type to load");
 }
 
 void grep_source_load_driver(struct grep_source *gs)
diff --git a/grep.h b/grep.h
index 5856a23..267534c 100644
--- a/grep.h
+++ b/grep.h
@@ -161,6 +161,7 @@ struct grep_source {
 		GREP_SOURCE_SHA1,
 		GREP_SOURCE_FILE,
 		GREP_SOURCE_BUF,
+		GREP_SOURCE_SUBMODULE,
 	} type;
 	void *identifier;
 
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v7 5/7] grep: optionally recurse into submodules
  2016-12-16 19:03           ` [PATCH v7 0/7] " Brandon Williams
                               ` (3 preceding siblings ...)
  2016-12-16 19:03             ` [PATCH v7 4/7] grep: add submodules as a grep source type Brandon Williams
@ 2016-12-16 19:03             ` Brandon Williams
  2016-12-16 19:03             ` [PATCH v7 6/7] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
                               ` (2 subsequent siblings)
  7 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-12-16 19:03 UTC (permalink / raw)
  To: git
  Cc: peff, sbeller, jonathantanmy, gitster, jacob.keller, j6t,
	Brandon Williams

Allow grep to recognize submodules and recursively search for patterns in
each submodule.  This is done by forking off a process to recursively
call grep on each submodule.  The top-level --super-prefix option passes
the path of the submodule to the child process, which prepends it to its
output and uses it in pathspec matching.

Recursion only occurs for submodules which have been initialized and
checked out by the parent project.  If a submodule hasn't been
initialized and checked out it is simply skipped.

In order to support the existing multi-threading infrastructure in grep,
output from each child process is captured in a strbuf so that it can be
later printed to the console in an ordered fashion.

To limit the number of threads that are created, each child process has
half the number of threads as its parent (minimum of 1), otherwise we
potentially have a fork-bomb.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
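
The thread halving and the exit-status handling described above come
down to a little arithmetic, sketched below.  The expression
`(num_threads + 1) / 2` and the 0/1/other status convention are taken
from the patch; the helper names `child_threads()`,
`depth_until_stable()` and `status_to_hit()` are hypothetical, and the
-1 return stands in for the die() call in grep_submodule_launch().

```c
#include <assert.h>

/* Thread count handed to a child grep: half the parent's, rounded up. */
static int child_threads(int parent_threads)
{
	assert(parent_threads >= 0);
	return (parent_threads + 1) / 2;
}

/* Number of nesting levels before the thread count stops shrinking. */
static int depth_until_stable(int threads)
{
	int depth = 0;

	while (child_threads(threads) != threads) {
		threads = child_threads(threads);
		depth++;
	}
	return depth;
}

/*
 * Map a child's exit status to a hit flag: 0 (hit) becomes 1,
 * 1 (no hit) becomes 0, and anything else is reported as -1.
 */
static int status_to_hit(int status)
{
	if (status && status != 1)
		return -1;
	return !status;
}
```

Starting from the default of 8 threads the sequence is 8, 4, 2, 1, and
a count of 1 stays at 1 because the division rounds up.  A count of 0
(threading disabled) stays at 0.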
 Documentation/git-grep.txt         |   5 +
 builtin/grep.c                     | 300 ++++++++++++++++++++++++++++++++++---
 git.c                              |   2 +-
 t/t7814-grep-recurse-submodules.sh |  99 ++++++++++++
 4 files changed, 386 insertions(+), 20 deletions(-)
 create mode 100755 t/t7814-grep-recurse-submodules.sh

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 0ecea6e..17aa1ba 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -26,6 +26,7 @@ SYNOPSIS
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
+	   [--recurse-submodules]
 	   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...]
 	   [--] [<pathspec>...]
 
@@ -88,6 +89,10 @@ OPTIONS
 	mechanism.  Only useful when searching files in the current directory
 	with `--no-index`.
 
+--recurse-submodules::
+	Recursively search in each submodule that has been initialized and
+	checked out in the repository.
+
 -a::
 --text::
 	Process binary files as if they were text.
diff --git a/builtin/grep.c b/builtin/grep.c
index 8887b6a..dca0be6 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -18,12 +18,20 @@
 #include "quote.h"
 #include "dir.h"
 #include "pathspec.h"
+#include "submodule.h"
 
 static char const * const grep_usage[] = {
 	N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"),
 	NULL
 };
 
+static const char *super_prefix;
+static int recurse_submodules;
+static struct argv_array submodule_options = ARGV_ARRAY_INIT;
+
+static int grep_submodule_launch(struct grep_opt *opt,
+				 const struct grep_source *gs);
+
 #define GREP_NUM_THREADS_DEFAULT 8
 static int num_threads;
 
@@ -174,7 +182,10 @@ static void *run(void *arg)
 			break;
 
 		opt->output_priv = w;
-		hit |= grep_source(opt, &w->source);
+		if (w->source.type == GREP_SOURCE_SUBMODULE)
+			hit |= grep_submodule_launch(opt, &w->source);
+		else
+			hit |= grep_source(opt, &w->source);
 		grep_source_clear_data(&w->source);
 		work_done(w);
 	}
@@ -300,6 +311,10 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1,
 	if (opt->relative && opt->prefix_length) {
 		quote_path_relative(filename + tree_name_len, opt->prefix, &pathbuf);
 		strbuf_insert(&pathbuf, 0, filename, tree_name_len);
+	} else if (super_prefix) {
+		strbuf_add(&pathbuf, filename, tree_name_len);
+		strbuf_addstr(&pathbuf, super_prefix);
+		strbuf_addstr(&pathbuf, filename + tree_name_len);
 	} else {
 		strbuf_addstr(&pathbuf, filename);
 	}
@@ -328,10 +343,13 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 {
 	struct strbuf buf = STRBUF_INIT;
 
-	if (opt->relative && opt->prefix_length)
+	if (opt->relative && opt->prefix_length) {
 		quote_path_relative(filename, opt->prefix, &buf);
-	else
+	} else {
+		if (super_prefix)
+			strbuf_addstr(&buf, super_prefix);
 		strbuf_addstr(&buf, filename);
+	}
 
 #ifndef NO_PTHREADS
 	if (num_threads) {
@@ -378,31 +396,260 @@ static void run_pager(struct grep_opt *opt, const char *prefix)
 		exit(status);
 }
 
-static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int cached)
+static void compile_submodule_options(const struct grep_opt *opt,
+				      const struct pathspec *pathspec,
+				      int cached, int untracked,
+				      int opt_exclude, int use_index,
+				      int pattern_type_arg)
+{
+	struct grep_pat *pattern;
+	int i;
+
+	if (recurse_submodules)
+		argv_array_push(&submodule_options, "--recurse-submodules");
+
+	if (cached)
+		argv_array_push(&submodule_options, "--cached");
+	if (!use_index)
+		argv_array_push(&submodule_options, "--no-index");
+	if (untracked)
+		argv_array_push(&submodule_options, "--untracked");
+	if (opt_exclude > 0)
+		argv_array_push(&submodule_options, "--exclude-standard");
+
+	if (opt->invert)
+		argv_array_push(&submodule_options, "-v");
+	if (opt->ignore_case)
+		argv_array_push(&submodule_options, "-i");
+	if (opt->word_regexp)
+		argv_array_push(&submodule_options, "-w");
+	switch (opt->binary) {
+	case GREP_BINARY_NOMATCH:
+		argv_array_push(&submodule_options, "-I");
+		break;
+	case GREP_BINARY_TEXT:
+		argv_array_push(&submodule_options, "-a");
+		break;
+	default:
+		break;
+	}
+	if (opt->allow_textconv)
+		argv_array_push(&submodule_options, "--textconv");
+	if (opt->max_depth != -1)
+		argv_array_pushf(&submodule_options, "--max-depth=%d",
+				 opt->max_depth);
+	if (opt->linenum)
+		argv_array_push(&submodule_options, "-n");
+	if (!opt->pathname)
+		argv_array_push(&submodule_options, "-h");
+	if (!opt->relative)
+		argv_array_push(&submodule_options, "--full-name");
+	if (opt->name_only)
+		argv_array_push(&submodule_options, "-l");
+	if (opt->unmatch_name_only)
+		argv_array_push(&submodule_options, "-L");
+	if (opt->null_following_name)
+		argv_array_push(&submodule_options, "-z");
+	if (opt->count)
+		argv_array_push(&submodule_options, "-c");
+	if (opt->file_break)
+		argv_array_push(&submodule_options, "--break");
+	if (opt->heading)
+		argv_array_push(&submodule_options, "--heading");
+	if (opt->pre_context)
+		argv_array_pushf(&submodule_options, "--before-context=%d",
+				 opt->pre_context);
+	if (opt->post_context)
+		argv_array_pushf(&submodule_options, "--after-context=%d",
+				 opt->post_context);
+	if (opt->funcname)
+		argv_array_push(&submodule_options, "-p");
+	if (opt->funcbody)
+		argv_array_push(&submodule_options, "-W");
+	if (opt->all_match)
+		argv_array_push(&submodule_options, "--all-match");
+	if (opt->debug)
+		argv_array_push(&submodule_options, "--debug");
+	if (opt->status_only)
+		argv_array_push(&submodule_options, "-q");
+
+	switch (pattern_type_arg) {
+	case GREP_PATTERN_TYPE_BRE:
+		argv_array_push(&submodule_options, "-G");
+		break;
+	case GREP_PATTERN_TYPE_ERE:
+		argv_array_push(&submodule_options, "-E");
+		break;
+	case GREP_PATTERN_TYPE_FIXED:
+		argv_array_push(&submodule_options, "-F");
+		break;
+	case GREP_PATTERN_TYPE_PCRE:
+		argv_array_push(&submodule_options, "-P");
+		break;
+	case GREP_PATTERN_TYPE_UNSPECIFIED:
+		break;
+	}
+
+	for (pattern = opt->pattern_list; pattern != NULL;
+	     pattern = pattern->next) {
+		switch (pattern->token) {
+		case GREP_PATTERN:
+			argv_array_pushf(&submodule_options, "-e%s",
+					 pattern->pattern);
+			break;
+		case GREP_AND:
+		case GREP_OPEN_PAREN:
+		case GREP_CLOSE_PAREN:
+		case GREP_NOT:
+		case GREP_OR:
+			argv_array_push(&submodule_options, pattern->pattern);
+			break;
+		/* BODY and HEAD are not used by git-grep */
+		case GREP_PATTERN_BODY:
+		case GREP_PATTERN_HEAD:
+			break;
+		}
+	}
+
+	/*
+	 * Limit number of threads for child process to use.
+	 * This is to prevent potential fork-bomb behavior of git-grep as each
+	 * submodule process has its own thread pool.
+	 */
+	argv_array_pushf(&submodule_options, "--threads=%d",
+			 (num_threads + 1) / 2);
+
+	/* Add Pathspecs */
+	argv_array_push(&submodule_options, "--");
+	for (i = 0; i < pathspec->nr; i++)
+		argv_array_push(&submodule_options,
+				pathspec->items[i].original);
+}
+
+/*
+ * Launch child process to grep contents of a submodule
+ */
+static int grep_submodule_launch(struct grep_opt *opt,
+				 const struct grep_source *gs)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	int status, i;
+	struct work_item *w = opt->output_priv;
+
+	prepare_submodule_repo_env(&cp.env_array);
+
+	/* Add super prefix */
+	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
+			 super_prefix ? super_prefix : "",
+			 gs->name);
+	argv_array_push(&cp.args, "grep");
+
+	/* Add options */
+	for (i = 0; i < submodule_options.argc; i++)
+		argv_array_push(&cp.args, submodule_options.argv[i]);
+
+	cp.git_cmd = 1;
+	cp.dir = gs->path;
+
+	/*
+	 * Capture output to output buffer and check the return code from the
+	 * child process.  A '0' indicates a hit, a '1' indicates no hit and
+	 * anything else is an error.
+	 */
+	status = capture_command(&cp, &w->out, 0);
+	if (status && (status != 1)) {
+		/* flush the buffer */
+		write_or_die(1, w->out.buf, w->out.len);
+		die("process for submodule '%s' failed with exit code: %d",
+		    gs->name, status);
+	}
+
+	/* invert the return code to make a hit equal to 1 */
+	return !status;
+}
+
+/*
+ * Prep grep structures for a submodule grep
+ * sha1: the sha1 of the submodule or NULL if using the working tree
+ * filename: name of the submodule including tree name of parent
+ * path: location of the submodule
+ */
+static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1,
+			  const char *filename, const char *path)
+{
+	if (!is_submodule_initialized(path))
+		return 0;
+	if (!is_submodule_populated(path))
+		return 0;
+
+#ifndef NO_PTHREADS
+	if (num_threads) {
+		add_work(opt, GREP_SOURCE_SUBMODULE, filename, path, sha1);
+		return 0;
+	} else
+#endif
+	{
+		struct work_item w;
+		int hit;
+
+		grep_source_init(&w.source, GREP_SOURCE_SUBMODULE,
+				 filename, path, sha1);
+		strbuf_init(&w.out, 0);
+		opt->output_priv = &w;
+		hit = grep_submodule_launch(opt, &w.source);
+
+		write_or_die(1, w.out.buf, w.out.len);
+
+		grep_source_clear(&w.source);
+		strbuf_release(&w.out);
+		return hit;
+	}
+}
+
+static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec,
+		      int cached)
 {
 	int hit = 0;
 	int nr;
+	struct strbuf name = STRBUF_INIT;
+	int name_base_len = 0;
+	if (super_prefix) {
+		name_base_len = strlen(super_prefix);
+		strbuf_addstr(&name, super_prefix);
+	}
+
 	read_cache();
 
 	for (nr = 0; nr < active_nr; nr++) {
 		const struct cache_entry *ce = active_cache[nr];
-		if (!S_ISREG(ce->ce_mode))
-			continue;
-		if (!ce_path_match(ce, pathspec, NULL))
+		strbuf_setlen(&name, name_base_len);
+		strbuf_addstr(&name, ce->name);
+
+		if (S_ISREG(ce->ce_mode) &&
+		    match_pathspec(pathspec, name.buf, name.len, 0, NULL,
+				   S_ISDIR(ce->ce_mode) ||
+				   S_ISGITLINK(ce->ce_mode))) {
+			/*
+			 * If CE_VALID is on, we assume worktree file and its
+			 * cache entry are identical, even if worktree file has
+			 * been modified, so use cache version instead
+			 */
+			if (cached || (ce->ce_flags & CE_VALID) ||
+			    ce_skip_worktree(ce)) {
+				if (ce_stage(ce) || ce_intent_to_add(ce))
+					continue;
+				hit |= grep_sha1(opt, ce->oid.hash, ce->name,
+						 0, ce->name);
+			} else {
+				hit |= grep_file(opt, ce->name);
+			}
+		} else if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
+			   submodule_path_match(pathspec, name.buf, NULL)) {
+			hit |= grep_submodule(opt, NULL, ce->name, ce->name);
+		} else {
 			continue;
-		/*
-		 * If CE_VALID is on, we assume worktree file and its cache entry
-		 * are identical, even if worktree file has been modified, so use
-		 * cache version instead
-		 */
-		if (cached || (ce->ce_flags & CE_VALID) || ce_skip_worktree(ce)) {
-			if (ce_stage(ce) || ce_intent_to_add(ce))
-				continue;
-			hit |= grep_sha1(opt, ce->oid.hash, ce->name, 0,
-					 ce->name);
 		}
-		else
-			hit |= grep_file(opt, ce->name);
+
 		if (ce_stage(ce)) {
 			do {
 				nr++;
@@ -413,6 +660,8 @@ static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int
 		if (hit && opt->status_only)
 			break;
 	}
+
+	strbuf_release(&name);
 	return hit;
 }
 
@@ -651,6 +900,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			N_("search in both tracked and untracked files")),
 		OPT_SET_INT(0, "exclude-standard", &opt_exclude,
 			    N_("ignore files specified via '.gitignore'"), 1),
+		OPT_BOOL(0, "recurse-submodules", &recurse_submodules,
+			 N_("recursively search in each submodule")),
 		OPT_GROUP(""),
 		OPT_BOOL('v', "invert-match", &opt.invert,
 			N_("show non-matching lines")),
@@ -755,6 +1006,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	init_grep_defaults();
 	git_config(grep_cmd_config, NULL);
 	grep_init(&opt, prefix);
+	super_prefix = get_super_prefix();
 
 	/*
 	 * If there is no -- then the paths must exist in the working
@@ -872,6 +1124,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	pathspec.max_depth = opt.max_depth;
 	pathspec.recursive = 1;
 
+	if (recurse_submodules) {
+		gitmodules_config();
+		compile_submodule_options(&opt, &pathspec, cached, untracked,
+					  opt_exclude, use_index,
+					  pattern_type_arg);
+	}
+
 	if (show_in_pager && (cached || list.nr))
 		die(_("--open-files-in-pager only works on the worktree"));
 
@@ -895,6 +1154,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		}
 	}
 
+	if (recurse_submodules && (!use_index || untracked || list.nr))
+		die(_("option not supported with --recurse-submodules."));
+
 	if (!show_in_pager && !opt.status_only)
 		setup_pager();
 
diff --git a/git.c b/git.c
index dce529f..c95d3e3 100644
--- a/git.c
+++ b/git.c
@@ -434,7 +434,7 @@ static struct cmd_struct commands[] = {
 	{ "fsck-objects", cmd_fsck, RUN_SETUP },
 	{ "gc", cmd_gc, RUN_SETUP },
 	{ "get-tar-commit-id", cmd_get_tar_commit_id },
-	{ "grep", cmd_grep, RUN_SETUP_GENTLY },
+	{ "grep", cmd_grep, RUN_SETUP_GENTLY | SUPPORT_SUPER_PREFIX },
 	{ "hash-object", cmd_hash_object },
 	{ "help", cmd_help },
 	{ "index-pack", cmd_index_pack, RUN_SETUP_GENTLY },
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
new file mode 100755
index 0000000..1019125
--- /dev/null
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -0,0 +1,99 @@
+#!/bin/sh
+
+test_description='Test grep recurse-submodules feature
+
+This test verifies the recurse-submodules feature correctly greps across
+submodules.
+'
+
+. ./test-lib.sh
+
+test_expect_success 'setup directory structure and submodule' '
+	echo "foobar" >a &&
+	mkdir b &&
+	echo "bar" >b/b &&
+	git add a b &&
+	git commit -m "add a and b" &&
+	git init submodule &&
+	echo "foobar" >submodule/a &&
+	git -C submodule add a &&
+	git -C submodule commit -m "add a" &&
+	git submodule add ./submodule &&
+	git commit -m "added submodule"
+'
+
+test_expect_success 'grep correctly finds patterns in a submodule' '
+	cat >expect <<-\EOF &&
+	a:foobar
+	b/b:bar
+	submodule/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and basic pathspecs' '
+	cat >expect <<-\EOF &&
+	submodule/a:foobar
+	EOF
+
+	git grep -e. --recurse-submodules -- submodule >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and nested submodules' '
+	git init submodule/sub &&
+	echo "foobar" >submodule/sub/a &&
+	git -C submodule/sub add a &&
+	git -C submodule/sub commit -m "add a" &&
+	git -C submodule submodule add ./sub &&
+	git -C submodule add sub &&
+	git -C submodule commit -m "added sub" &&
+	git add submodule &&
+	git commit -m "updated submodule" &&
+
+	cat >expect <<-\EOF &&
+	a:foobar
+	b/b:bar
+	submodule/a:foobar
+	submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and multiple patterns' '
+	cat >expect <<-\EOF &&
+	a:foobar
+	submodule/a:foobar
+	submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --and -e "foo" --recurse-submodules >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep and multiple patterns' '
+	cat >expect <<-\EOF &&
+	b/b:bar
+	EOF
+
+	git grep -e "bar" --and --not -e "foo" --recurse-submodules >actual &&
+	test_cmp expect actual
+'
+
+test_incompatible_with_recurse_submodules ()
+{
+	test_expect_success "--recurse-submodules and $1 are incompatible" "
+		test_must_fail git grep -e. --recurse-submodules $1 2>actual &&
+		test_i18ngrep 'not supported with --recurse-submodules' actual
+	"
+}
+
+test_incompatible_with_recurse_submodules --untracked
+test_incompatible_with_recurse_submodules --no-index
+test_incompatible_with_recurse_submodules HEAD
+
+test_done
-- 
2.8.0.rc3.226.g39d4020



* [PATCH v7 6/7] grep: enable recurse-submodules to work on <tree> objects
  2016-12-16 19:03           ` [PATCH v7 0/7] " Brandon Williams
                               ` (4 preceding siblings ...)
  2016-12-16 19:03             ` [PATCH v7 5/7] grep: optionally recurse into submodules Brandon Williams
@ 2016-12-16 19:03             ` Brandon Williams
  2016-12-16 19:03             ` [PATCH v7 7/7] grep: search history of moved submodules Brandon Williams
  2016-12-16 21:42             ` [PATCH v7 0/7] recursively grep across submodules Junio C Hamano
  7 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-12-16 19:03 UTC (permalink / raw)
  To: git
  Cc: peff, sbeller, jonathantanmy, gitster, jacob.keller, j6t,
	Brandon Williams

Teach grep to recursively search in submodules when provided with a
<tree> object. This allows grep to search a submodule based on the state
of the submodule that is present in a commit of the super project.

When grep is provided with a <tree> object, the name of the object is
prefixed to all output.  In order to provide uniformity of output
between the parent and child processes the option `--parent-basename`
has been added so that the child can preface all of its output with the
name of the parent's object instead of the commit SHA1 of the
submodule.  This changes output from the command
`git grep -e. -l --recurse-submodules HEAD`

from:
  HEAD:file
  <commit sha1 of submodule>:sub/file

to:
  HEAD:file
  HEAD:sub/file

Signed-off-by: Brandon Williams <bmwill@google.com>
---
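
How the parent's tree name is separated from the submodule path can be
sketched as follows.  The patch does this inline in
grep_submodule_launch() with strchr() and a "%.*s" format; the helper
`split_source_name()` and its parameters are hypothetical and exist
only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: split "tree-name:path" at the first ':'.
 * When the source has an identifier (a tree is being grepped) and a
 * colon is present, 'base' receives the tree name and '*path' points
 * just past the colon.  Otherwise 'base' is empty and '*path' is the
 * whole name, as in the working-tree case.
 */
static void split_source_name(const char *full, int has_identifier,
			      char *base, size_t base_size,
			      const char **path)
{
	const char *colon = strchr(full, ':');

	assert(base_size > 0);
	if (has_identifier && colon) {
		/* "%.*s" copies only the bytes before the colon. */
		snprintf(base, base_size, "%.*s",
			 (int)(colon - full), full);
		*path = colon + 1;
	} else {
		base[0] = '\0';
		*path = full;
	}
}
```

For the name "HEAD:submodule" with an identifier, the base is "HEAD"
(what would be passed as --parent-basename) and the path is "submodule"
(what would be appended to the super prefix).  Note that this sketch
splits at the first colon it finds.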
 Documentation/git-grep.txt         |  13 ++++-
 builtin/grep.c                     |  76 ++++++++++++++++++++++++---
 t/t7814-grep-recurse-submodules.sh | 103 ++++++++++++++++++++++++++++++++++++-
 tree-walk.c                        |  28 ++++++++++
 4 files changed, 211 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 17aa1ba..71f32f3 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -26,7 +26,7 @@ SYNOPSIS
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
-	   [--recurse-submodules]
+	   [--recurse-submodules] [--parent-basename <basename>]
 	   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...]
 	   [--] [<pathspec>...]
 
@@ -91,7 +91,16 @@ OPTIONS
 
 --recurse-submodules::
 	Recursively search in each submodule that has been initialized and
-	checked out in the repository.
+	checked out in the repository.  When used in combination with the
+	<tree> option the prefix of all submodule output will be the name of
+	the parent project's <tree> object.
+
+--parent-basename <basename>::
+	For internal use only.  In order to produce uniform output with the
+	--recurse-submodules option, this option can be used to provide the
+	basename of a parent's <tree> object to a submodule so the submodule
+	can prefix its output with the parent's name rather than the SHA1 of
+	the submodule.
 
 -a::
 --text::
diff --git a/builtin/grep.c b/builtin/grep.c
index dca0be6..5918a26 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -19,6 +19,7 @@
 #include "dir.h"
 #include "pathspec.h"
 #include "submodule.h"
+#include "submodule-config.h"
 
 static char const * const grep_usage[] = {
 	N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"),
@@ -28,6 +29,7 @@ static char const * const grep_usage[] = {
 static const char *super_prefix;
 static int recurse_submodules;
 static struct argv_array submodule_options = ARGV_ARRAY_INIT;
+static const char *parent_basename;
 
 static int grep_submodule_launch(struct grep_opt *opt,
 				 const struct grep_source *gs);
@@ -534,19 +536,53 @@ static int grep_submodule_launch(struct grep_opt *opt,
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
 	int status, i;
+	const char *end_of_base;
+	const char *name;
 	struct work_item *w = opt->output_priv;
 
+	end_of_base = strchr(gs->name, ':');
+	if (gs->identifier && end_of_base)
+		name = end_of_base + 1;
+	else
+		name = gs->name;
+
 	prepare_submodule_repo_env(&cp.env_array);
 
 	/* Add super prefix */
 	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
 			 super_prefix ? super_prefix : "",
-			 gs->name);
+			 name);
 	argv_array_push(&cp.args, "grep");
 
+	/*
+	 * Add basename of parent project
+	 * When performing grep on a tree object the filename is prefixed
+	 * with the object's name: 'tree-name:filename'.  In order to
+	 * provide uniformity of output we want to pass the name of the
+	 * parent project's object name to the submodule so the submodule can
+	 * prefix its output with the parent's name and not its own SHA1.
+	 */
+	if (gs->identifier && end_of_base)
+		argv_array_pushf(&cp.args, "--parent-basename=%.*s",
+				 (int) (end_of_base - gs->name),
+				 gs->name);
+
 	/* Add options */
-	for (i = 0; i < submodule_options.argc; i++)
+	for (i = 0; i < submodule_options.argc; i++) {
+		/*
+		 * If there is a tree identifier for the submodule, add the
+		 * rev after adding the submodule options but before the
+		 * pathspecs.  To do this we listen for the '--' and insert the
+		 * sha1 before pushing the '--' onto the child process argv
+		 * array.
+		 */
+		if (gs->identifier &&
+		    !strcmp("--", submodule_options.argv[i])) {
+			argv_array_push(&cp.args, sha1_to_hex(gs->identifier));
+		}
+
 		argv_array_push(&cp.args, submodule_options.argv[i]);
+	}
 
 	cp.git_cmd = 1;
 	cp.dir = gs->path;
@@ -673,12 +709,22 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 	enum interesting match = entry_not_interesting;
 	struct name_entry entry;
 	int old_baselen = base->len;
+	struct strbuf name = STRBUF_INIT;
+	int name_base_len = 0;
+	if (super_prefix) {
+		strbuf_addstr(&name, super_prefix);
+		name_base_len = name.len;
+	}
 
 	while (tree_entry(tree, &entry)) {
 		int te_len = tree_entry_len(&entry);
 
 		if (match != all_entries_interesting) {
-			match = tree_entry_interesting(&entry, base, tn_len, pathspec);
+			strbuf_addstr(&name, base->buf + tn_len);
+			match = tree_entry_interesting(&entry, &name,
+						       0, pathspec);
+			strbuf_setlen(&name, name_base_len);
+
 			if (match == all_entries_not_interesting)
 				break;
 			if (match == entry_not_interesting)
@@ -690,8 +736,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 		if (S_ISREG(entry.mode)) {
 			hit |= grep_sha1(opt, entry.oid->hash, base->buf, tn_len,
 					 check_attr ? base->buf + tn_len : NULL);
-		}
-		else if (S_ISDIR(entry.mode)) {
+		} else if (S_ISDIR(entry.mode)) {
 			enum object_type type;
 			struct tree_desc sub;
 			void *data;
@@ -707,12 +752,18 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 			hit |= grep_tree(opt, pathspec, &sub, base, tn_len,
 					 check_attr);
 			free(data);
+		} else if (recurse_submodules && S_ISGITLINK(entry.mode)) {
+			hit |= grep_submodule(opt, entry.oid->hash, base->buf,
+					      base->buf + tn_len);
 		}
+
 		strbuf_setlen(base, old_baselen);
 
 		if (hit && opt->status_only)
 			break;
 	}
+
+	strbuf_release(&name);
 	return hit;
 }
 
@@ -736,6 +787,10 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 		if (!data)
 			die(_("unable to read tree (%s)"), oid_to_hex(&obj->oid));
 
+		/* Use parent's name as base when recursing submodules */
+		if (recurse_submodules && parent_basename)
+			name = parent_basename;
+
 		len = name ? strlen(name) : 0;
 		strbuf_init(&base, PATH_MAX + len + 1);
 		if (len) {
@@ -762,6 +817,12 @@ static int grep_objects(struct grep_opt *opt, const struct pathspec *pathspec,
 	for (i = 0; i < nr; i++) {
 		struct object *real_obj;
 		real_obj = deref_tag(list->objects[i].item, NULL, 0);
+
+		/* load the gitmodules file for this rev */
+		if (recurse_submodules) {
+			submodule_free();
+			gitmodules_config_sha1(real_obj->oid.hash);
+		}
 		if (grep_object(opt, pathspec, real_obj, list->objects[i].name, list->objects[i].path)) {
 			hit = 1;
 			if (opt->status_only)
@@ -902,6 +963,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			    N_("ignore files specified via '.gitignore'"), 1),
 		OPT_BOOL(0, "recurse-submodules", &recurse_submodules,
 			 N_("recursivley search in each submodule")),
+		OPT_STRING(0, "parent-basename", &parent_basename,
+			   N_("basename"),
+			   N_("prepend parent project's basename to output")),
 		OPT_GROUP(""),
 		OPT_BOOL('v', "invert-match", &opt.invert,
 			N_("show non-matching lines")),
@@ -1154,7 +1218,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	if (recurse_submodules && (!use_index || untracked || list.nr))
+	if (recurse_submodules && (!use_index || untracked))
 		die(_("option not supported with --recurse-submodules."));
 
 	if (!show_in_pager && !opt.status_only)
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index 1019125..d5fc316 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -84,6 +84,108 @@ test_expect_success 'grep and multiple patterns' '
 	test_cmp expect actual
 '
 
+test_expect_success 'basic grep tree' '
+	cat >expect <<-\EOF &&
+	HEAD:a:foobar
+	HEAD:b/b:bar
+	HEAD:submodule/a:foobar
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree HEAD^' '
+	cat >expect <<-\EOF &&
+	HEAD^:a:foobar
+	HEAD^:b/b:bar
+	HEAD^:submodule/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD^ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree HEAD^^' '
+	cat >expect <<-\EOF &&
+	HEAD^^:a:foobar
+	HEAD^^:b/b:bar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD^^ >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree and pathspecs' '
+	cat >expect <<-\EOF &&
+	HEAD:submodule/a:foobar
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD -- submodule >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree and pathspecs' '
+	cat >expect <<-\EOF &&
+	HEAD:submodule/a:foobar
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD -- "submodule*a" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree and more pathspecs' '
+	cat >expect <<-\EOF &&
+	HEAD:submodule/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD -- "submodul?/a" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep tree and more pathspecs' '
+	cat >expect <<-\EOF &&
+	HEAD:submodule/sub/a:foobar
+	EOF
+
+	git grep -e "bar" --recurse-submodules HEAD -- "submodul*/sub/a" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success !MINGW 'grep recurse submodule colon in name' '
+	git init parent &&
+	test_when_finished "rm -rf parent" &&
+	echo "foobar" >"parent/fi:le" &&
+	git -C parent add "fi:le" &&
+	git -C parent commit -m "add fi:le" &&
+
+	git init "su:b" &&
+	test_when_finished "rm -rf su:b" &&
+	echo "foobar" >"su:b/fi:le" &&
+	git -C "su:b" add "fi:le" &&
+	git -C "su:b" commit -m "add fi:le" &&
+
+	git -C parent submodule add "../su:b" "su:b" &&
+	git -C parent commit -m "add submodule" &&
+
+	cat >expect <<-\EOF &&
+	fi:le:foobar
+	su:b/fi:le:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules >actual &&
+	test_cmp expect actual &&
+
+	cat >expect <<-\EOF &&
+	HEAD:fi:le:foobar
+	HEAD:su:b/fi:le:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_incompatible_with_recurse_submodules ()
 {
 	test_expect_success "--recurse-submodules and $1 are incompatible" "
@@ -94,6 +196,5 @@ test_incompatible_with_recurse_submodules ()
 
 test_incompatible_with_recurse_submodules --untracked
 test_incompatible_with_recurse_submodules --no-index
-test_incompatible_with_recurse_submodules HEAD
 
 test_done
diff --git a/tree-walk.c b/tree-walk.c
index 828f435..ff77605 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -1004,6 +1004,19 @@ static enum interesting do_match(const struct name_entry *entry,
 				 */
 				if (ps->recursive && S_ISDIR(entry->mode))
 					return entry_interesting;
+
+				/*
+				 * When matching against submodules with
+				 * wildcard characters, ensure that the entry
+				 * at least matches up to the first wild
+				 * character.  More accurate matching can then
+				 * be performed in the submodule itself.
+				 */
+				if (ps->recursive && S_ISGITLINK(entry->mode) &&
+				    !ps_strncmp(item, match + baselen,
+						entry->path,
+						item->nowildcard_len - baselen))
+					return entry_interesting;
 			}
 
 			continue;
@@ -1040,6 +1053,21 @@ static enum interesting do_match(const struct name_entry *entry,
 			strbuf_setlen(base, base_offset + baselen);
 			return entry_interesting;
 		}
+
+		/*
+		 * When matching against submodules with
+		 * wildcard characters, ensure that the entry
+		 * at least matches up to the first wild
+		 * character.  More accurate matching can then
+		 * be performed in the submodule itself.
+		 */
+		if (ps->recursive && S_ISGITLINK(entry->mode) &&
+		    !ps_strncmp(item, match, base->buf + base_offset,
+				item->nowildcard_len)) {
+			strbuf_setlen(base, base_offset + baselen);
+			return entry_interesting;
+		}
+
 		strbuf_setlen(base, base_offset + baselen);
 
 		/*
-- 
2.8.0.rc3.226.g39d4020
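
The submodule check added to tree-walk.c above compares a gitlink entry against the pathspec only up to the first wildcard character and leaves the precise matching to the grep that runs inside the submodule. A minimal standalone sketch of that idea follows. It is not git's implementation: `literal_prefix_len()` and `submodule_maybe_matches()` are hypothetical names, the set of wildcard characters is a simplifying assumption, and the sketch clamps the comparison to the length of the submodule path so that a pathspec reaching into the submodule is still considered.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of git): length of the leading part
 * of a pathspec that contains no glob characters.  The character set
 * used here is a simplification.
 */
static size_t literal_prefix_len(const char *pattern)
{
	return strcspn(pattern, "*?[\\");
}

/*
 * Return 1 if the submodule at `path` might contain matches for
 * `pattern`, judging only by the literal prefix of the pattern.
 * This deliberately over-approximates: a 1 means "recurse and let
 * the submodule's own grep decide", not "this is a match".
 */
static int submodule_maybe_matches(const char *pattern, const char *path)
{
	size_t n = literal_prefix_len(pattern);
	size_t len = strlen(path);

	/* Assumption of this sketch: never compare past the end of path. */
	if (n > len)
		n = len;
	return !strncmp(pattern, path, n);
}
```

With the pathspecs used in t7814, `submodule_maybe_matches("submodul?/a", "submodule")` compares only the eight bytes before the `?`, so the submodule is entered and the child process applies the full pattern.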



* [PATCH v7 7/7] grep: search history of moved submodules
  2016-12-16 19:03           ` [PATCH v7 0/7] " Brandon Williams
                               ` (5 preceding siblings ...)
  2016-12-16 19:03             ` [PATCH v7 6/7] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
@ 2016-12-16 19:03             ` Brandon Williams
  2016-12-16 21:42             ` [PATCH v7 0/7] recursively grep across submodules Junio C Hamano
  7 siblings, 0 replies; 126+ messages in thread
From: Brandon Williams @ 2016-12-16 19:03 UTC (permalink / raw)
  To: git
  Cc: peff, sbeller, jonathantanmy, gitster, jacob.keller, j6t,
	Brandon Williams

If a submodule was renamed at any point since its inception, then
grepping a commit from before the move fails to find a working
directory for the submodule, since the path recorded in that commit
differs from the current path.

This patch teaches grep to find the .git directory for a submodule in
the parent's .git/modules/ directory in the event the path to the
submodule in the commit that is being searched differs from the state of
the currently checked out commit.  If found, the child process that is
spawned to grep the submodule will chdir into its gitdir instead of a
working directory.

In order to override the explicit setting of the submodule child
process's gitdir environment variable (which was introduced in
'10f5c526'), `GIT_DIR_ENVIRONMENT` needs to be pushed onto the child
process's env_array.  This allows the searching of history from a
submodule's gitdir, rather than from a working directory.

Signed-off-by: Brandon Williams <bmwill@google.com>
---
 builtin/grep.c                     | 20 +++++++++++++++++--
 t/t7814-grep-recurse-submodules.sh | 41 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 5918a26..2c727ef 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -547,6 +547,7 @@ static int grep_submodule_launch(struct grep_opt *opt,
 		name = gs->name;
 
 	prepare_submodule_repo_env(&cp.env_array);
+	argv_array_push(&cp.env_array, GIT_DIR_ENVIRONMENT);
 
 	/* Add super prefix */
 	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
@@ -615,8 +616,23 @@ static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1,
 {
 	if (!is_submodule_initialized(path))
 		return 0;
-	if (!is_submodule_populated(path))
-		return 0;
+	if (!is_submodule_populated(path)) {
+		/*
+		 * If searching history, check for the presence of the
+		 * submodule's gitdir before skipping the submodule.
+		 */
+		if (sha1) {
+			const struct submodule *sub =
+					submodule_from_path(null_sha1, path);
+			if (sub)
+				path = git_path("modules/%s", sub->name);
+
+			if (!(is_directory(path) && is_git_directory(path)))
+				return 0;
+		} else {
+			return 0;
+		}
+	}
 
 #ifndef NO_PTHREADS
 	if (num_threads) {
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index d5fc316..67247a0 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -186,6 +186,47 @@ test_expect_success !MINGW 'grep recurse submodule colon in name' '
 	test_cmp expect actual
 '
 
+test_expect_success 'grep history with moved submodules' '
+	git init parent &&
+	test_when_finished "rm -rf parent" &&
+	echo "foobar" >parent/file &&
+	git -C parent add file &&
+	git -C parent commit -m "add file" &&
+
+	git init sub &&
+	test_when_finished "rm -rf sub" &&
+	echo "foobar" >sub/file &&
+	git -C sub add file &&
+	git -C sub commit -m "add file" &&
+
+	git -C parent submodule add ../sub dir/sub &&
+	git -C parent commit -m "add submodule" &&
+
+	cat >expect <<-\EOF &&
+	dir/sub/file:foobar
+	file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules >actual &&
+	test_cmp expect actual &&
+
+	git -C parent mv dir/sub sub-moved &&
+	git -C parent commit -m "moved submodule" &&
+
+	cat >expect <<-\EOF &&
+	file:foobar
+	sub-moved/file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules >actual &&
+	test_cmp expect actual &&
+
+	cat >expect <<-\EOF &&
+	HEAD^:dir/sub/file:foobar
+	HEAD^:file:foobar
+	EOF
+	git -C parent grep -e "foobar" --recurse-submodules HEAD^ >actual &&
+	test_cmp expect actual
+'
+
 test_incompatible_with_recurse_submodules ()
 {
 	test_expect_success "--recurse-submodules and $1 are incompatible" "
-- 
2.8.0.rc3.226.g39d4020
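
The fallback described in the commit message above can be sketched as a small decision function: use the working tree when the submodule is populated, otherwise fall back to the gitdir under the parent's modules directory, but only when searching history. This is a standalone illustration under stated assumptions, not the code from the patch: `submodule_gitdir_path()` and `pick_grep_dir()` are hypothetical names, and the `populated` and `gitdir_valid` flags stand in for the checks that the patch performs with is_submodule_populated() and is_git_directory().

```c
#include <stdio.h>

/*
 * Hypothetical helper: write "<super_gitdir>/modules/<name>" into buf.
 * Returns the length written, or -1 if the result does not fit.
 */
static int submodule_gitdir_path(char *buf, size_t size,
				 const char *super_gitdir, const char *name)
{
	int n = snprintf(buf, size, "%s/modules/%s", super_gitdir, name);

	if (n < 0 || (size_t)n >= size)
		return -1;
	return n;
}

/*
 * Pick the directory the child grep process should run from.
 * Returns NULL when the submodule should be skipped.
 */
static const char *pick_grep_dir(const char *worktree_path, int populated,
				 int searching_history,
				 const char *gitdir_path, int gitdir_valid)
{
	if (populated)
		return worktree_path;
	/* No working tree: only history searches may use the gitdir. */
	if (searching_history && gitdir_valid)
		return gitdir_path;
	return NULL;
}
```

In the moved-submodule test above, grepping `HEAD^` looks for `dir/sub`, which no longer has a working tree, so a caller would take the second branch and run the child from the gitdir instead.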



* Re: [PATCH v7 0/7] recursively grep across submodules
  2016-12-16 19:03           ` [PATCH v7 0/7] " Brandon Williams
                               ` (6 preceding siblings ...)
  2016-12-16 19:03             ` [PATCH v7 7/7] grep: search history of moved submodules Brandon Williams
@ 2016-12-16 21:42             ` Junio C Hamano
  7 siblings, 0 replies; 126+ messages in thread
From: Junio C Hamano @ 2016-12-16 21:42 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git, peff, sbeller, jonathantanmy, jacob.keller, j6t

Brandon Williams <bmwill@google.com> writes:

> Changes in v7:
> * Rebased on 'origin/bw/realpath-wo-chdir' in order to fix the race condition
>   that occurs when verifying a submodule's gitdir.
> * Reverted is_submodule_populated() to use resolve_gitdir() now that there is
>   no race condition.
> * Added !MINGW to a test in t7814 so that it won't run on windows.  This is due
>   to testing if colons in filenames are still handled correctly, yet windows
>   doesn't allow colons in filenames.

Nice.

Will queue again to see if those on other platforms have troubles
with it.  I read it through again and think the series is ready for
'next'.

Thanks.


Thread overview: 126+ messages
2016-10-27 22:38 [RFC PATCH 0/5] recursively grep across submodules Brandon Williams
2016-10-27 22:38 ` [PATCH 1/5] submodules: add helper functions to determine presence of submodules Brandon Williams
2016-10-27 22:38 ` [PATCH 2/5] submodules: load gitmodules file from commit sha1 Brandon Williams
2016-10-27 22:38 ` [PATCH 3/5] grep: add submodules as a grep source type Brandon Williams
2016-10-27 22:38 ` [PATCH 4/5] grep: optionally recurse into submodules Brandon Williams
2016-11-05  5:09   ` Jonathan Tan
2016-10-27 22:38 ` [PATCH 5/5] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
2016-10-28 19:35   ` Brandon Williams
2016-10-27 23:26 ` [RFC PATCH 0/5] recursively grep across submodules Junio C Hamano
2016-10-28  0:59   ` Stefan Beller
2016-10-28  2:50     ` Junio C Hamano
2016-10-28  3:46       ` Stefan Beller
2016-10-28 15:06       ` Philip Oakley
2016-10-28 17:02   ` Brandon Williams
2016-10-28 17:21     ` Junio C Hamano
2016-10-31 22:38 ` [PATCH v2 0/6] " Brandon Williams
2016-10-31 22:38   ` [PATCH v2 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
2016-10-31 23:34     ` Stefan Beller
2016-11-01 17:20       ` Junio C Hamano
2016-11-01 17:24         ` Brandon Williams
2016-11-01 17:31         ` Stefan Beller
2016-11-06  7:42           ` Jacob Keller
2016-11-01 17:23       ` Brandon Williams
2016-11-05  2:34     ` Jonathan Tan
2016-10-31 22:38   ` [PATCH v2 2/6] submodules: load gitmodules file from commit sha1 Brandon Williams
2016-11-01 16:39     ` Stefan Beller
2016-10-31 22:38   ` [PATCH v2 3/6] grep: add submodules as a grep source type Brandon Williams
2016-11-01 16:53     ` Stefan Beller
2016-11-01 17:31     ` Junio C Hamano
2016-10-31 22:38   ` [PATCH v2 4/6] grep: optionally recurse into submodules Brandon Williams
2016-11-01 17:26     ` Stefan Beller
2016-11-01 20:25       ` Brandon Williams
2016-10-31 22:38   ` [PATCH v2 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
2016-11-11 23:09     ` Jonathan Tan
2016-10-31 22:38   ` [PATCH v2 6/6] grep: search history of moved submodules Brandon Williams
2016-11-11 23:51   ` [PATCH v3 0/6] recursively grep across submodules Brandon Williams
2016-11-11 23:51     ` [PATCH v3 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
2016-11-15 23:49       ` Stefan Beller
2016-11-11 23:51     ` [PATCH v3 2/6] submodules: load gitmodules file from commit sha1 Brandon Williams
2016-11-12  0:22       ` Stefan Beller
2016-11-11 23:51     ` [PATCH v3 3/6] grep: add submodules as a grep source type Brandon Williams
2016-11-11 23:51     ` [PATCH v3 4/6] grep: optionally recurse into submodules Brandon Williams
2016-11-16  0:07       ` Stefan Beller
2016-11-17 22:13         ` Brandon Williams
2016-11-11 23:51     ` [PATCH v3 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
2016-11-14 18:10       ` Junio C Hamano
2016-11-14 18:44         ` Jonathan Tan
2016-11-14 18:56           ` Junio C Hamano
2016-11-14 19:08             ` Jonathan Tan
2016-11-14 19:14               ` Brandon Williams
2016-11-16  1:09       ` Stefan Beller
2016-11-17 23:34         ` Brandon Williams
2016-11-11 23:51     ` [PATCH v3 6/6] grep: search history of moved submodules Brandon Williams
2016-11-12  0:30       ` Stefan Beller
2016-11-14 17:43         ` Brandon Williams
2016-11-15 17:42     ` [PATCH v3 0/6] recursively grep across submodules Stefan Beller
2016-11-18 19:58     ` [PATCH v4 " Brandon Williams
2016-11-18 19:58       ` [PATCH v4 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
2016-11-18 19:58       ` [PATCH v4 2/6] submodules: load gitmodules file from commit sha1 Brandon Williams
2016-11-18 19:58       ` [PATCH v4 3/6] grep: add submodules as a grep source type Brandon Williams
2016-11-18 21:37         ` Junio C Hamano
2016-11-18 22:56           ` Brandon Williams
2016-11-18 19:58       ` [PATCH v4 4/6] grep: optionally recurse into submodules Brandon Williams
2016-11-18 21:48         ` Junio C Hamano
2016-11-18 22:01         ` Junio C Hamano
2016-11-18 22:14         ` Junio C Hamano
2016-11-18 22:58           ` Brandon Williams
2016-11-18 19:58       ` [PATCH v4 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
2016-11-18 22:19         ` Junio C Hamano
2016-11-18 22:52           ` Brandon Williams
2016-11-21 18:14             ` Brandon Williams
2016-11-18 19:58       ` [PATCH v4 6/6] grep: search history of moved submodules Brandon Williams
2016-11-18 20:10       ` [PATCH v4 0/6] recursively grep across submodules Stefan Beller
2016-11-22 18:46       ` [PATCH v5 " Brandon Williams
2016-11-22 18:46         ` [PATCH v5 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
2016-11-22 18:46         ` [PATCH v5 2/6] submodules: load gitmodules file from commit sha1 Brandon Williams
2016-11-22 18:46         ` [PATCH v5 3/6] grep: add submodules as a grep source type Brandon Williams
2016-11-22 18:46         ` [PATCH v5 4/6] grep: optionally recurse into submodules Brandon Williams
2016-11-22 18:46         ` [PATCH v5 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
2016-11-22 22:59           ` Junio C Hamano
2016-11-22 23:21             ` Brandon Williams
2016-11-22 23:28               ` Brandon Williams
2016-11-22 23:37                 ` Junio C Hamano
2016-11-22 23:54                   ` Brandon Williams
2016-11-22 18:46         ` [PATCH v5 6/6] grep: search history of moved submodules Brandon Williams
2016-12-01  1:28         ` [PATCH v6 0/6] recursively grep across submodules Brandon Williams
2016-12-01  1:28           ` [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules Brandon Williams
2016-12-01  4:29             ` Jeff King
2016-12-01 18:31               ` Stefan Beller
2016-12-01 18:46               ` Junio C Hamano
2016-12-01 19:09                 ` Jeff King
2016-12-01 19:16                   ` Brandon Williams
2016-12-01 20:54                     ` Brandon Williams
2016-12-01 20:59                       ` Jeff King
2016-12-01 21:56                         ` Stefan Beller
2016-12-01 21:59                           ` Jeff King
2016-12-02 18:36                             ` Brandon Williams
2016-12-02 18:44                               ` Jacob Keller
2016-12-02 18:49                                 ` Brandon Williams
2016-12-02 19:20                                   ` Jacob Keller
2016-12-02 19:28                                     ` Stefan Beller
2016-12-02 21:31                                       ` Jacob Keller
2016-12-02 21:46                                         ` Brandon Williams
2016-12-02 21:45                                       ` Jeff King
2016-12-03  0:16                                         ` Brandon Williams
2016-12-01  1:28           ` [PATCH v6 2/6] submodules: load gitmodules file from commit sha1 Brandon Williams
2016-12-01  1:28           ` [PATCH v6 3/6] grep: add submodules as a grep source type Brandon Williams
2016-12-01  1:28           ` [PATCH v6 4/6] grep: optionally recurse into submodules Brandon Williams
2016-12-01  1:28           ` [PATCH v6 5/6] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
2016-12-01  7:25             ` Johannes Sixt
2016-12-01 17:51               ` Brandon Williams
2016-12-01 18:49                 ` Junio C Hamano
2016-12-01 18:52                 ` Jeff King
2016-12-01  1:28           ` [PATCH v6 6/6] grep: search history of moved submodules Brandon Williams
2016-12-01  4:22           ` [PATCH v6 0/6] recursively grep across submodules Jeff King
2016-12-01 17:45             ` Brandon Williams
2016-12-01 19:03               ` Jeff King
2016-12-16 19:03           ` [PATCH v7 0/7] " Brandon Williams
2016-12-16 19:03             ` [PATCH v7 1/7] submodules: add helper to determine if a submodule is populated Brandon Williams
2016-12-16 19:03             ` [PATCH v7 2/7] submodules: add helper to determine if a submodule is initialized Brandon Williams
2016-12-16 19:03             ` [PATCH v7 3/7] submodules: load gitmodules file from commit sha1 Brandon Williams
2016-12-16 19:03             ` [PATCH v7 4/7] grep: add submodules as a grep source type Brandon Williams
2016-12-16 19:03             ` [PATCH v7 5/7] grep: optionally recurse into submodules Brandon Williams
2016-12-16 19:03             ` [PATCH v7 6/7] grep: enable recurse-submodules to work on <tree> objects Brandon Williams
2016-12-16 19:03             ` [PATCH v7 7/7] grep: search history of moved submodules Brandon Williams
2016-12-16 21:42             ` [PATCH v7 0/7] recursively grep across submodules Junio C Hamano
