git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git grep -F doesn't behave like grep -F?
@ 2012-05-18 11:00 Torne (Richard Coles)
  2012-05-18 12:37 ` René Scharfe
  0 siblings, 1 reply; 8+ messages in thread
From: Torne (Richard Coles) @ 2012-05-18 11:00 UTC (permalink / raw)
  To: git

Hi folks,

git grep -F is documented as: "Use fixed strings for patterns (don’t
interpret pattern as a regex)."

whereas grep -F is documented as "Interpret PATTERN as a  list  of
fixed  strings,  separated  by newlines,  any  of  which is to be
matched."

This accurately describes how they behave, which means that git grep
-F with a pattern containing newlines never matches anything (at least
as far as I can see). Is this intentional, or an oversight? The
ability to grep -F for a list (e.g. the output of another grep) is
pretty handy...

-- 
Torne (Richard Coles)
torne@google.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: git grep -F doesn't behave like grep -F?
  2012-05-18 11:00 git grep -F doesn't behave like grep -F? Torne (Richard Coles)
@ 2012-05-18 12:37 ` René Scharfe
  2012-05-18 12:41   ` Torne (Richard Coles)
  0 siblings, 1 reply; 8+ messages in thread
From: René Scharfe @ 2012-05-18 12:37 UTC (permalink / raw)
  To: Torne (Richard Coles); +Cc: git

Am 18.05.2012 13:00, schrieb Torne (Richard Coles):
> Hi folks,
>
> git grep -F is documented as: "Use fixed strings for patterns (don’t
> interpret pattern as a regex)."
>
> whereas grep -F is documented as "Interpret PATTERN as a  list  of
> fixed  strings,  separated  by newlines,  any  of  which is to be
> matched."
>
> This accurately describes how they behave, which means that git grep
> -F with a pattern containing newlines never matches anything (at least
> as far as I can see). Is this intentional, or an oversight? The
> ability to grep -F for a list (e.g. the output of another grep) is
> pretty handy...

You could use -f- (read patterns from stdin).

That said, it looks like a missing feature to me -- at least I didn't 
know that grep -F takes newline separated lists of search strings.  And 
this doesn't seem to be restricted to invocations with -F, only; a plain 
grep with regexps does it as well.

René

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: git grep -F doesn't behave like grep -F?
  2012-05-18 12:37 ` René Scharfe
@ 2012-05-18 12:41   ` Torne (Richard Coles)
  2012-05-20 14:32     ` [PATCH 1/3] grep: factor out create_grep_pat() René Scharfe
                       ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Torne (Richard Coles) @ 2012-05-18 12:41 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

On 18 May 2012 13:37, René Scharfe <rene.scharfe@lsrfire.ath.cx> wrote:
> Am 18.05.2012 13:00, schrieb Torne (Richard Coles):
>
>> Hi folks,
>>
>> git grep -F is documented as: "Use fixed strings for patterns (don’t
>> interpret pattern as a regex)."
>>
>> whereas grep -F is documented as "Interpret PATTERN as a  list  of
>> fixed  strings,  separated  by newlines,  any  of  which is to be
>> matched."
>>
>> This accurately describes how they behave, which means that git grep
>> -F with a pattern containing newlines never matches anything (at least
>> as far as I can see). Is this intentional, or an oversight? The
>> ability to grep -F for a list (e.g. the output of another grep) is
>> pretty handy...
>
>
> You could use -f- (read patterns from stdin).

Ah, yes, rewriting

git grep -F "`git grep -o someregex`"

as

git grep -o someregex | git grep -F -f-

works, but it's not how I immediately think to do it :)

> That said, it looks like a missing feature to me -- at least I didn't know
> that grep -F takes newline separated lists of search strings.  And this
> doesn't seem to be restricted to invocations with -F, only; a plain grep
> with regexps does it as well.

Yeah, it doesn't seem like adding it would break anything; patterns
with newlines don't match any lines by definition currently :)

-- 
Torne (Richard Coles)
torne@google.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/3] grep: factor out create_grep_pat()
  2012-05-18 12:41   ` Torne (Richard Coles)
@ 2012-05-20 14:32     ` René Scharfe
  2012-05-20 14:32     ` [PATCH 2/3] grep: factor out do_append_grep_pat() René Scharfe
  2012-05-20 14:33     ` [PATCH 3/3] grep: support newline separated pattern list René Scharfe
  2 siblings, 0 replies; 8+ messages in thread
From: René Scharfe @ 2012-05-20 14:32 UTC (permalink / raw)
  To: Torne (Richard Coles); +Cc: git, Junio C Hamano

Add create_grep_pat(), a shared helper for all grep pattern allocation
and initialization needs.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
 grep.c |   28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/grep.c b/grep.c
index f8ffa46..a673ced 100644
--- a/grep.c
+++ b/grep.c
@@ -3,15 +3,26 @@
 #include "userdiff.h"
 #include "xdiff-interface.h"
 
-void append_header_grep_pattern(struct grep_opt *opt, enum grep_header_field field, const char *pat)
+static struct grep_pat *create_grep_pat(const char *pat, size_t patlen,
+					const char *origin, int no,
+					enum grep_pat_token t,
+					enum grep_header_field field)
 {
 	struct grep_pat *p = xcalloc(1, sizeof(*p));
 	p->pattern = pat;
-	p->patternlen = strlen(pat);
-	p->origin = "header";
-	p->no = 0;
-	p->token = GREP_PATTERN_HEAD;
+	p->patternlen = patlen;
+	p->origin = origin;
+	p->no = no;
+	p->token = t;
 	p->field = field;
+	return p;
+}
+
+void append_header_grep_pattern(struct grep_opt *opt,
+				enum grep_header_field field, const char *pat)
+{
+	struct grep_pat *p = create_grep_pat(pat, strlen(pat), "header", 0,
+					     GREP_PATTERN_HEAD, field);
 	*opt->header_tail = p;
 	opt->header_tail = &p->next;
 	p->next = NULL;
@@ -26,12 +37,7 @@ void append_grep_pattern(struct grep_opt *opt, const char *pat,
 void append_grep_pat(struct grep_opt *opt, const char *pat, size_t patlen,
 		     const char *origin, int no, enum grep_pat_token t)
 {
-	struct grep_pat *p = xcalloc(1, sizeof(*p));
-	p->pattern = pat;
-	p->patternlen = patlen;
-	p->origin = origin;
-	p->no = no;
-	p->token = t;
+	struct grep_pat *p = create_grep_pat(pat, patlen, origin, no, t, 0);
 	*opt->pattern_tail = p;
 	opt->pattern_tail = &p->next;
 	p->next = NULL;
-- 
1.7.10.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/3] grep: factor out do_append_grep_pat()
  2012-05-18 12:41   ` Torne (Richard Coles)
  2012-05-20 14:32     ` [PATCH 1/3] grep: factor out create_grep_pat() René Scharfe
@ 2012-05-20 14:32     ` René Scharfe
  2012-05-20 14:33     ` [PATCH 3/3] grep: support newline separated pattern list René Scharfe
  2 siblings, 0 replies; 8+ messages in thread
From: René Scharfe @ 2012-05-20 14:32 UTC (permalink / raw)
  To: Torne (Richard Coles); +Cc: git, Junio C Hamano

Add do_append_grep_pat() as a shared function for adding patterns to
the header pattern list and the general pattern list.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
Our first three star function. :)

 grep.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/grep.c b/grep.c
index a673ced..f961c2e 100644
--- a/grep.c
+++ b/grep.c
@@ -18,14 +18,19 @@ static struct grep_pat *create_grep_pat(const char *pat, size_t patlen,
 	return p;
 }
 
+static void do_append_grep_pat(struct grep_pat ***tail, struct grep_pat *p)
+{
+	**tail = p;
+	*tail = &p->next;
+	p->next = NULL;
+}
+
 void append_header_grep_pattern(struct grep_opt *opt,
 				enum grep_header_field field, const char *pat)
 {
 	struct grep_pat *p = create_grep_pat(pat, strlen(pat), "header", 0,
 					     GREP_PATTERN_HEAD, field);
-	*opt->header_tail = p;
-	opt->header_tail = &p->next;
-	p->next = NULL;
+	do_append_grep_pat(&opt->header_tail, p);
 }
 
 void append_grep_pattern(struct grep_opt *opt, const char *pat,
@@ -38,9 +43,7 @@ void append_grep_pat(struct grep_opt *opt, const char *pat, size_t patlen,
 		     const char *origin, int no, enum grep_pat_token t)
 {
 	struct grep_pat *p = create_grep_pat(pat, patlen, origin, no, t, 0);
-	*opt->pattern_tail = p;
-	opt->pattern_tail = &p->next;
-	p->next = NULL;
+	do_append_grep_pat(&opt->pattern_tail, p);
 }
 
 struct grep_opt *grep_opt_dup(const struct grep_opt *opt)
-- 
1.7.10.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/3] grep: support newline separated pattern list
  2012-05-18 12:41   ` Torne (Richard Coles)
  2012-05-20 14:32     ` [PATCH 1/3] grep: factor out create_grep_pat() René Scharfe
  2012-05-20 14:32     ` [PATCH 2/3] grep: factor out do_append_grep_pat() René Scharfe
@ 2012-05-20 14:33     ` René Scharfe
  2012-05-20 22:29       ` Junio C Hamano
  2 siblings, 1 reply; 8+ messages in thread
From: René Scharfe @ 2012-05-20 14:33 UTC (permalink / raw)
  To: Torne (Richard Coles); +Cc: git, Junio C Hamano

Currently, patterns that contain newline characters don't match anything
when given to git grep.  Regular grep(1) interprets patterns as lists of
newline separated search strings instead.

Implement this functionality by creating and inserting extra grep_pat
structures for patterns consisting of multiple lines when appending to
the pattern lists.  For simplicity, all pattern strings are duplicated.
The original pattern is truncated in place to make it contain only the
first line.

Requested-by: Torne (Richard Coles) <torne@google.com>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
 Documentation/git-grep.txt |    4 +++-
 grep.c                     |   33 ++++++++++++++++++++++++++++++++-
 grep.h                     |    2 +-
 t/t7810-grep.sh            |    5 +++++
 4 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 4785f1c..3bec036 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -31,7 +31,9 @@ SYNOPSIS
 DESCRIPTION
 -----------
 Look for specified patterns in the tracked files in the work tree, blobs
-registered in the index file, or blobs in given tree objects.
+registered in the index file, or blobs in given tree objects.  Patterns
+are lists of one or more search expressions separated by newline
+characters.  An empty string as search expression matches all lines.
 
 
 CONFIGURATION
diff --git a/grep.c b/grep.c
index f961c2e..04e3ec6 100644
--- a/grep.c
+++ b/grep.c
@@ -9,7 +9,7 @@ static struct grep_pat *create_grep_pat(const char *pat, size_t patlen,
 					enum grep_header_field field)
 {
 	struct grep_pat *p = xcalloc(1, sizeof(*p));
-	p->pattern = pat;
+	p->pattern = xmemdupz(pat, patlen);
 	p->patternlen = patlen;
 	p->origin = origin;
 	p->no = no;
@@ -23,6 +23,36 @@ static void do_append_grep_pat(struct grep_pat ***tail, struct grep_pat *p)
 	**tail = p;
 	*tail = &p->next;
 	p->next = NULL;
+
+	switch (p->token) {
+	case GREP_PATTERN: /* atom */
+	case GREP_PATTERN_HEAD:
+	case GREP_PATTERN_BODY:
+		for (;;) {
+			struct grep_pat *new_pat;
+			size_t len = 0;
+			char *cp = p->pattern + p->patternlen, *nl = NULL;
+			while (++len <= p->patternlen) {
+				if (*(--cp) == '\n') {
+					nl = cp;
+					break;
+				}
+			}
+			if (!nl)
+				break;
+			new_pat = create_grep_pat(nl + 1, len - 1, p->origin,
+						  p->no, p->token, p->field);
+			new_pat->next = p->next;
+			if (!p->next)
+				*tail = &new_pat->next;
+			p->next = new_pat;
+			*nl = '\0';
+			p->patternlen -= len;
+		}
+		break;
+	default:
+		break;
+	}
 }
 
 void append_header_grep_pattern(struct grep_opt *opt,
@@ -439,6 +469,7 @@ void free_grep_patterns(struct grep_opt *opt)
 				free_pcre_regexp(p);
 			else
 				regfree(&p->regexp);
+			free(p->pattern);
 			break;
 		default:
 			break;
diff --git a/grep.h b/grep.h
index 36e49d8..ed7de6b 100644
--- a/grep.h
+++ b/grep.h
@@ -38,7 +38,7 @@ struct grep_pat {
 	const char *origin;
 	int no;
 	enum grep_pat_token token;
-	const char *pattern;
+	char *pattern;
 	size_t patternlen;
 	enum grep_header_field field;
 	regex_t regexp;
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index d9ad633..24e9b19 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -351,6 +351,11 @@ test_expect_success 'grep -f, multiple patterns' '
 	test_cmp expected actual
 '
 
+test_expect_success 'grep, multiple patterns' '
+	git grep "$(cat patterns)" >actual &&
+	test_cmp expected actual
+'
+
 cat >expected <<EOF
 file:foo mmap bar
 file:foo_mmap bar
-- 
1.7.10.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/3] grep: support newline separated pattern list
  2012-05-20 14:33     ` [PATCH 3/3] grep: support newline separated pattern list René Scharfe
@ 2012-05-20 22:29       ` Junio C Hamano
  2012-05-21 16:10         ` René Scharfe
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2012-05-20 22:29 UTC (permalink / raw)
  To: René Scharfe; +Cc: Torne (Richard Coles), git

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

> Currently, patterns that contain newline characters don't match anything
> when given to git grep.  Regular grep(1) interprets patterns as lists of
> newline separated search strings instead.
>
> Implement this functionality by creating and inserting extra grep_pat
> structures for patterns consisting of multiple lines when appending to
> the pattern lists.  For simplicity, all pattern strings are duplicated.
> The original pattern is truncated in place to make it contain only the
> first line.

Thanks for a fix; I am assuming that the duplication for simplicity is not
for simplified allocation but primarily for a simpler freeing?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/3] grep: support newline separated pattern list
  2012-05-20 22:29       ` Junio C Hamano
@ 2012-05-21 16:10         ` René Scharfe
  0 siblings, 0 replies; 8+ messages in thread
From: René Scharfe @ 2012-05-21 16:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Torne (Richard Coles), git

Am 21.05.2012 00:29, schrieb Junio C Hamano:
> René Scharfe<rene.scharfe@lsrfire.ath.cx>  writes:
> 
>> Currently, patterns that contain newline characters don't match anything
>> when given to git grep.  Regular grep(1) interprets patterns as lists of
>> newline separated search strings instead.
>>
>> Implement this functionality by creating and inserting extra grep_pat
>> structures for patterns consisting of multiple lines when appending to
>> the pattern lists.  For simplicity, all pattern strings are duplicated.
>> The original pattern is truncated in place to make it contain only the
>> first line.
> 
> Thanks for a fix; I am assuming that the duplication for simplicity is not
> for simplified allocation but primarily for a simpler freeing?

When we split a search string into multiple parts, we could allocate only
the new parts and remember which strings were allocated, so that we could
later free them (and only them).  Or we could allocate and leak them, as
there aren't that many and we keep them during the whole run of the program
anyway.  Or we could do without any allocations if all match backends
respected length limited strings like kwset does.  Or only allocate
non-fixed pattern.

Allocating any and all pattern strings and freeing them at the end, thus
disregarding these considerations, is simpler, cleaner and still doesn't
cause excessive memory usage.

That reminds me that we can now have this bonus patch:

-- >8 --
Subject: [PATCH 4/3] grep: stop leaking line strings with -f

When reading patterns from a file, we pass the lines as allocated string
buffers to append_grep_pat() and never free them.  That's not a problem
because they are needed until the program ends anyway.

However, now that the function duplicates the pattern string, we can
reuse the strbuf after calling that function.  This simplifies the code
a bit and plugs a minor memory leak.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
 builtin/grep.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 643938d..fe1726f 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -600,15 +600,12 @@ static int file_callback(const struct option *opt, const char *arg, int unset)
 	if (!patterns)
 		die_errno(_("cannot open '%s'"), arg);
 	while (strbuf_getline(&sb, patterns, '\n') == 0) {
-		char *s;
-		size_t len;
-
 		/* ignore empty line like grep does */
 		if (sb.len == 0)
 			continue;
 
-		s = strbuf_detach(&sb, &len);
-		append_grep_pat(grep_opt, s, len, arg, ++lno, GREP_PATTERN);
+		append_grep_pat(grep_opt, sb.buf, sb.len, arg, ++lno,
+				GREP_PATTERN);
 	}
 	if (!from_stdin)
 		fclose(patterns);
-- 
1.7.10.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2012-05-21 16:10 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-18 11:00 git grep -F doesn't behave like grep -F? Torne (Richard Coles)
2012-05-18 12:37 ` René Scharfe
2012-05-18 12:41   ` Torne (Richard Coles)
2012-05-20 14:32     ` [PATCH 1/3] grep: factor out create_grep_pat() René Scharfe
2012-05-20 14:32     ` [PATCH 2/3] grep: factor out do_append_grep_pat() René Scharfe
2012-05-20 14:33     ` [PATCH 3/3] grep: support newline separated pattern list René Scharfe
2012-05-20 22:29       ` Junio C Hamano
2012-05-21 16:10         ` René Scharfe

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).