git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] Interpret :/<pattern> as a regular expression
@ 2007-06-13  0:50 Johannes Schindelin
  2007-06-13  1:21 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Johannes Schindelin @ 2007-06-13  0:50 UTC (permalink / raw
  To: git


Earlier, Git interpreted the pattern as a strict prefix, which made
the operator unsuited in many cases.

Now, the pattern is interpreted as a regular expression (which does not 
change the behaviour too much, since few onelines contain special regex 
characters), so that you can say

	git diff :/.*^Signed-off-by:.Zack.Brown

to see the diff against the most recent reachable commit which was
signed off by Zack, whose Kernel Cousin I miss very much.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---

	For fun, you can see which committer signed himself off:

		git show :/.*^Signed-off:

	*grin*

 sha1_name.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index d9188ed..f1ba194 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -611,6 +611,8 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1)
 {
 	struct commit_list *list = NULL, *backup = NULL, *l;
 	int retval = -1;
+	regex_t regexp;
+	regmatch_t regmatch[1];
 
 	if (prefix[0] == '!') {
 		if (prefix[1] != '!')
@@ -622,6 +624,8 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1)
 	for_each_ref(handle_one_ref, &list);
 	for (l = list; l; l = l->next)
 		commit_list_insert(l->item, &backup);
+	if (regcomp(&regexp, prefix, REG_EXTENDED))
+		return error("invalid regexp: %s", prefix);
 	while (list) {
 		char *p;
 		struct commit *commit;
@@ -630,7 +634,9 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1)
 		parse_object(commit->object.sha1);
 		if (!commit->buffer || !(p = strstr(commit->buffer, "\n\n")))
 			continue;
-		if (!prefixcmp(p + 2, prefix)) {
+		if (!regexec(&regexp, p + 2, 1, regmatch, 0) &&
+				printf("match: %d\n", regmatch[0].rm_so) &&
+				regmatch[0].rm_so == 0) {
 			hashcpy(sha1, commit->object.sha1);
 			retval = 0;
 			break;
@@ -639,6 +645,7 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1)
 	free_commit_list(list);
 	for (l = backup; l; l = l->next)
 		clear_commit_marks(l->item, ONELINE_SEEN);
+	regfree(&regexp);
 	return retval;
 }
 
-- 
1.5.2.1.2827.gba84a8-dirty

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] Interpret :/<pattern> as a regular expression
  2007-06-13  0:50 [PATCH] Interpret :/<pattern> as a regular expression Johannes Schindelin
@ 2007-06-13  1:21 ` Junio C Hamano
  2007-06-13 13:13   ` Johannes Schindelin
  2007-06-13  4:52 ` Junio C Hamano
  2007-06-13 18:41 ` Jeff King
  2 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2007-06-13  1:21 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Earlier, Git interpreted the pattern as a strict prefix, which made
> the operator unsuited in many cases.
>
> Now, the pattern is interpreted as a regular expression (which does not 
> change the behaviour too much, since few onelines contain special regex 
> characters), so that you can say
>
> 	git diff :/.*^Signed-off-by:.Zack.Brown
>
> to see the diff against the most recent reachable commit which was
> signed off by Zack, whose Kernel Cousin I miss very much.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

While this perhaps is an improvement and people who are not
interested in paying the price have a choice of not using this
silly syntax, I am moderately annoyed that the syntax does not
define "the most recent reachable" very well.  It is more like
"the first one we happened to pick by diffing from reachable
refs".  It would be more useful if it took "$commit:/$pattern"
form to limit the search among reachable ones from named commit.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Interpret :/<pattern> as a regular expression
  2007-06-13  0:50 [PATCH] Interpret :/<pattern> as a regular expression Johannes Schindelin
  2007-06-13  1:21 ` Junio C Hamano
@ 2007-06-13  4:52 ` Junio C Hamano
  2007-06-13 11:10   ` Johannes Schindelin
  2007-06-13 11:17   ` (unknown) Johannes Schindelin
  2007-06-13 18:41 ` Jeff King
  2 siblings, 2 replies; 14+ messages in thread
From: Junio C Hamano @ 2007-06-13  4:52 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> diff --git a/sha1_name.c b/sha1_name.c
> index d9188ed..f1ba194 100644
> --- a/sha1_name.c
> +++ b/sha1_name.c
> @@ -611,6 +611,8 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1)
>  {
>  	struct commit_list *list = NULL, *backup = NULL, *l;
>  	int retval = -1;
> +	regex_t regexp;
> +	regmatch_t regmatch[1];
>  
>  	if (prefix[0] == '!') {
>  		if (prefix[1] != '!')

Because you are not extracting any match substring, I do not
think you would need regmatch[] there.

> @@ -622,6 +624,8 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1)
>  	for_each_ref(handle_one_ref, &list);
>  	for (l = list; l; l = l->next)
>  		commit_list_insert(l->item, &backup);
> +	if (regcomp(&regexp, prefix, REG_EXTENDED))
> +		return error("invalid regexp: %s", prefix);
>  	while (list) {
>  		char *p;
>  		struct commit *commit;

Why EXTENDED?

I think our code prefer traditional regexp by default, but in
places where extended behaviour is truly useful (e.g. grepping
for bulk of text) have command line option to enable extended.
Of course, extended SHA-1 notation has no chance to do "command
line option", but it somehow feels inconsistent.

Also you probably want REG_NEWLINE.

> @@ -630,7 +634,9 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1)
>  		parse_object(commit->object.sha1);
>  		if (!commit->buffer || !(p = strstr(commit->buffer, "\n\n")))
>  			continue;
> -		if (!prefixcmp(p + 2, prefix)) {
> +		if (!regexec(&regexp, p + 2, 1, regmatch, 0) &&
> +				printf("match: %d\n", regmatch[0].rm_so) &&
> +				regmatch[0].rm_so == 0) {
>  			hashcpy(sha1, commit->object.sha1);
>  			retval = 0;
>  			break;

Do we want to detect return value other than REG_NOMATCH from
regexec() when it does not return zero?

Please lose the debugging printf() before submitting.

> @@ -639,6 +645,7 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1)
>  	free_commit_list(list);
>  	for (l = backup; l; l = l->next)
>  		clear_commit_marks(l->item, ONELINE_SEEN);
> +	regfree(&regexp);
>  	return retval;
>  }
>  

Also I think you would want to fix get_sha1_oneline() so that it
not to refuse to work without save_commit_buffer.  These are to
parse user supplied strings (it would be crazy for scripts to
throw hundreds of ':/random string' to drive git -- it must come
from the end user), and the user has every right to use this
syntax, if he wants to, to specify the starting point for a
command that deliberately turns off save_commit_buffer to save
memory, because the command knows ithat t will traverse tons of
commits without needing the contents of the commit buffer.

After parse_object(), if commit->buffer is NULL, read the buffer
with read_sha1_file() yourself to look for match (and if you did
so you are also responsible for discarding it yourself).

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Interpret :/<pattern> as a regular expression
  2007-06-13  4:52 ` Junio C Hamano
@ 2007-06-13 11:10   ` Johannes Schindelin
  2007-06-13 11:17   ` (unknown) Johannes Schindelin
  1 sibling, 0 replies; 14+ messages in thread
From: Johannes Schindelin @ 2007-06-13 11:10 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

Hi,

On Tue, 12 Jun 2007, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > diff --git a/sha1_name.c b/sha1_name.c
> > index d9188ed..f1ba194 100644
> > --- a/sha1_name.c
> > +++ b/sha1_name.c
> > @@ -611,6 +611,8 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1)
> >  {
> >  	struct commit_list *list = NULL, *backup = NULL, *l;
> >  	int retval = -1;
> > +	regex_t regexp;
> > +	regmatch_t regmatch[1];
> >  
> >  	if (prefix[0] == '!') {
> >  		if (prefix[1] != '!')
> 
> Because you are not extracting any match substring, I do not
> think you would need regmatch[] there.

I explicitely want to anchor the match at the beginning of the message 
(otherwise, most of the interesting patterns would match a merge pulling 
that commit in), and therefore I have to check where the match was. Alas, 
that's only possible with regmatch.

> > @@ -622,6 +624,8 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1)
> >  	for_each_ref(handle_one_ref, &list);
> >  	for (l = list; l; l = l->next)
> >  		commit_list_insert(l->item, &backup);
> > +	if (regcomp(&regexp, prefix, REG_EXTENDED))
> > +		return error("invalid regexp: %s", prefix);
> >  	while (list) {
> >  		char *p;
> >  		struct commit *commit;
> 
> Why EXTENDED?

Sorry. Old habit. Will fix.

> Also you probably want REG_NEWLINE.

No. If I look for something in the body of the message (like I showed in 
the two examples), I'd like to say ':/.*Blub'. Of course, you could always 
say

	git diff ':/[ -~
]*Blub'

with REG_NEWLINE, to match newlines also. But frankly, it is much more 
often that I want to match something in the whole message than just in the 
oneline. And if I _do_ want the match only in the oneline, I can still go 
and do

	git diff ':/[^
]*Blub'

when REG_NEWLINE is disabled. If you can teach me a better way to do both 
things, matching in the oneline _or_ matching the whole message, _with_ 
REG_NEWLINE, I'll gladly change it, and provide an example in the commit 
message as well as the documentation for equally clueless subjects as me.

> > @@ -630,7 +634,9 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1)
> >  		parse_object(commit->object.sha1);
> >  		if (!commit->buffer || !(p = strstr(commit->buffer, "\n\n")))
> >  			continue;
> > -		if (!prefixcmp(p + 2, prefix)) {
> > +		if (!regexec(&regexp, p + 2, 1, regmatch, 0) &&
> > +				printf("match: %d\n", regmatch[0].rm_so) &&
> > +				regmatch[0].rm_so == 0) {
> >  			hashcpy(sha1, commit->object.sha1);
> >  			retval = 0;
> >  			break;
> 
> Do we want to detect return value other than REG_NOMATCH from
> regexec() when it does not return zero?

I am not well versed in the multitude of POSIX and other standards we have 
on this planet, therefore I just read my man page. And it says, quote:

	regexec() returns zero for a successful match or REG_NOMATCH for 
	failure.

Tertium non datur. At least according to my man page here. Am I mistaken 
in my assumption (which seems to be somewhat supported from my limited 
reading of the man page) that all errors should be caught at regcomp() 
time?

> Please lose the debugging printf() before submitting.

Ouch. Sorry.

> > @@ -639,6 +645,7 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1)
> >  	free_commit_list(list);
> >  	for (l = backup; l; l = l->next)
> >  		clear_commit_marks(l->item, ONELINE_SEEN);
> > +	regfree(&regexp);
> >  	return retval;
> >  }
> >  
> 
> Also I think you would want to fix get_sha1_oneline() so that it
> not to refuse to work without save_commit_buffer.

You're right, will fix.

I'll resubmit shortly. However, feel free to enlighten me with insights 
into working _with_ REG_NEWLINE, and I'll gladly submit another version of 
the patch with REG_NEWLINE enabled, along with the additions to the 
documentation.

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 14+ messages in thread

* (unknown)
  2007-06-13  4:52 ` Junio C Hamano
  2007-06-13 11:10   ` Johannes Schindelin
@ 2007-06-13 11:17   ` Johannes Schindelin
  2007-06-13 12:11     ` [PATCH] Interpret :/<pattern> as a regular expression Johannes Schindelin
  1 sibling, 1 reply; 14+ messages in thread
From: Johannes Schindelin @ 2007-06-13 11:17 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

[PATCH] Interpret :/<pattern> as a regular expression

Earlier, Git interpreted the pattern as a strict prefix, which made
the operator unsuited in many cases.

Now, the pattern is interpreted as a regular expression, on the whole
message, so that you can say

	git diff :/.*^Signed-off-by:.Zack.Brown

to see the diff against the most recent reachable commit which was
signed off by Zack, whose Kernel Cousin I miss very much.

If you want to match just the oneline, but with a regular expression,
say something like

	git diff ':/[^
]*intelligent'

Since it makes more sense to match the beginning of a message (otherwise,
a pattern like ':/git-gui: Improve' would match the _merge_ commit, 
pulling in the commit you are likely to want), the implementation uses the 
regmatch parameter of regexec() to anchor the pattern there.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---

	Anyone who can teach me how to match / not-match a newline
	using a more elegant syntax, please do so, by all means.

 Documentation/git-rev-parse.txt |    6 +++---
 sha1_name.c                     |   31 +++++++++++++++++++++++++------
 2 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-rev-parse.txt b/Documentation/git-rev-parse.txt
index 6380676..56e1561 100644
--- a/Documentation/git-rev-parse.txt
+++ b/Documentation/git-rev-parse.txt
@@ -194,9 +194,9 @@ blobs contained in a commit.
   found.
 
 * A colon, followed by a slash, followed by a text: this names
-  a commit whose commit message starts with the specified text.
-  This name returns the youngest matching commit which is
-  reachable from any ref.  If the commit message starts with a
+  a commit whose commit message starts with the specified regular
+  expression.  This name returns the youngest matching commit which
+  is reachable from any ref.  If the commit message starts with a
   '!', you have to repeat that;  the special sequence ':/!',
   followed by something else than '!' is reserved for now.
 
diff --git a/sha1_name.c b/sha1_name.c
index d9188ed..988d599 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -599,8 +599,8 @@ static int handle_one_ref(const char *path,
 
 /*
  * This interprets names like ':/Initial revision of "git"' by searching
- * through history and returning the first commit whose message starts
- * with the given string.
+ * through history and returning the first commit whose message matches
+ * the given regular expression.
  *
  * For future extension, ':/!' is reserved. If you want to match a message
  * beginning with a '!', you have to repeat the exclamation mark.
@@ -611,34 +611,53 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1)
 {
 	struct commit_list *list = NULL, *backup = NULL, *l;
 	int retval = -1;
+	regex_t regexp;
+	regmatch_t regmatch[1];
+	char *temp_commit_buffer = NULL;
 
 	if (prefix[0] == '!') {
 		if (prefix[1] != '!')
 			die ("Invalid search pattern: %s", prefix);
 		prefix++;
 	}
-	if (!save_commit_buffer)
-		return error("Could not expand oneline-name.");
 	for_each_ref(handle_one_ref, &list);
 	for (l = list; l; l = l->next)
 		commit_list_insert(l->item, &backup);
+	if (regcomp(&regexp, prefix, 0))
+		return error("invalid regexp: %s", prefix);
 	while (list) {
 		char *p;
 		struct commit *commit;
+		enum object_type type;
+		unsigned long size;
 
 		commit = pop_most_recent_commit(&list, ONELINE_SEEN);
 		parse_object(commit->object.sha1);
-		if (!commit->buffer || !(p = strstr(commit->buffer, "\n\n")))
+		if (temp_commit_buffer)
+			free(temp_commit_buffer);
+		if (commit->buffer)
+			p = commit->buffer;
+		else {
+			p = read_sha1_file(commit->object.sha1, &type, &size);
+			if (!p)
+				continue;
+			temp_commit_buffer = p;
+		}
+		if (!(p = strstr(p, "\n\n")))
 			continue;
-		if (!prefixcmp(p + 2, prefix)) {
+		if (!regexec(&regexp, p + 2, 1, regmatch, 0) &&
+				regmatch[0].rm_so == 0) {
 			hashcpy(sha1, commit->object.sha1);
 			retval = 0;
 			break;
 		}
 	}
+	if (temp_commit_buffer)
+		free(temp_commit_buffer);
 	free_commit_list(list);
 	for (l = backup; l; l = l->next)
 		clear_commit_marks(l->item, ONELINE_SEEN);
+	regfree(&regexp);
 	return retval;
 }
 
-- 
1.5.2.1.2827.gba84a8-dirty

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH] Interpret :/<pattern> as a regular expression
  2007-06-13 11:17   ` (unknown) Johannes Schindelin
@ 2007-06-13 12:11     ` Johannes Schindelin
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Schindelin @ 2007-06-13 12:11 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

Hi,

I don't know what happened to my subject in the mail I'm replying to... 
Sorry.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Interpret :/<pattern> as a regular expression
  2007-06-13  1:21 ` Junio C Hamano
@ 2007-06-13 13:13   ` Johannes Schindelin
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Schindelin @ 2007-06-13 13:13 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

Hi,

On Tue, 12 Jun 2007, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > Earlier, Git interpreted the pattern as a strict prefix, which made
> > the operator unsuited in many cases.
> >
> > Now, the pattern is interpreted as a regular expression (which does not 
> > change the behaviour too much, since few onelines contain special regex 
> > characters), so that you can say
> >
> > 	git diff :/.*^Signed-off-by:.Zack.Brown
> >
> > to see the diff against the most recent reachable commit which was
> > signed off by Zack, whose Kernel Cousin I miss very much.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> While this perhaps is an improvement and people who are not
> interested in paying the price have a choice of not using this
> silly syntax, I am moderately annoyed that the syntax does not
> define "the most recent reachable" very well.  It is more like
> "the first one we happened to pick by diffing from reachable
> refs".  It would be more useful if it took "$commit:/$pattern"
> form to limit the search among reachable ones from named commit.

"Unfortunately", $commit:/$pattern is not a good syntax, since it suggests 
that you want to search _in_ $commit, not _from $commit.

How about ':/!commit=$commit:$pattern'?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Interpret :/<pattern> as a regular expression
  2007-06-13  0:50 [PATCH] Interpret :/<pattern> as a regular expression Johannes Schindelin
  2007-06-13  1:21 ` Junio C Hamano
  2007-06-13  4:52 ` Junio C Hamano
@ 2007-06-13 18:41 ` Jeff King
  2007-06-13 18:54   ` Johannes Schindelin
  2 siblings, 1 reply; 14+ messages in thread
From: Jeff King @ 2007-06-13 18:41 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: git

On Wed, Jun 13, 2007 at 01:50:22AM +0100, Johannes Schindelin wrote:

> Earlier, Git interpreted the pattern as a strict prefix, which made
> the operator unsuited in many cases.

Thank you for working on this...I really like the :/ concept, but find
myself wishing for a regex all the time. I have been meaning to do it
since you introduced the original. :)

-Peff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Interpret :/<pattern> as a regular expression
  2007-06-13 18:41 ` Jeff King
@ 2007-06-13 18:54   ` Johannes Schindelin
  2007-06-13 20:00     ` Jeff King
  0 siblings, 1 reply; 14+ messages in thread
From: Johannes Schindelin @ 2007-06-13 18:54 UTC (permalink / raw
  To: Jeff King; +Cc: git

Hi,

On Wed, 13 Jun 2007, Jeff King wrote:

> On Wed, Jun 13, 2007 at 01:50:22AM +0100, Johannes Schindelin wrote:
> 
> > Earlier, Git interpreted the pattern as a strict prefix, which made
> > the operator unsuited in many cases.
> 
> Thank you for working on this...I really like the :/ concept, but find
> myself wishing for a regex all the time. I have been meaning to do it
> since you introduced the original. :)

:-) Since you seem comfortable with regular expressions, maybe you can 
help me: I am looking for a pattern which matches _any_ character, and one 
which matches only non-newlines, both with and without REG_NEWLINE. Hmm?

Tia,
Dscho

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Interpret :/<pattern> as a regular expression
  2007-06-13 18:54   ` Johannes Schindelin
@ 2007-06-13 20:00     ` Jeff King
  2007-06-13 22:20       ` Johannes Schindelin
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff King @ 2007-06-13 20:00 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: git

On Wed, Jun 13, 2007 at 07:54:59PM +0100, Johannes Schindelin wrote:

> :-) Since you seem comfortable with regular expressions, maybe you can 
> help me: I am looking for a pattern which matches _any_ character, and one 
> which matches only non-newlines, both with and without REG_NEWLINE. Hmm?

Without REG_NEWLINE, any character is just '.', but I think you are
stuck with '[^
]' for non-newlines, since POSIX makes no provisions for quoting the
newline (I just skimmed through POSIX chapter 9, and I didn't see
anything useful).

With REG_NEWLINE, non-newlines is of course '.'. Matching both is tricky
without using extended regular expressions (where you could just do '.|
'). In fact, I have been playing with it for a few minutes and I can't
seem to find a good way, since you really want to represent '.' _inside_
a bracketed alternation sequence. But I don't think there's a character
class for "everything".

I think this would be much easier with pcre, but ISTR some opposition to
that a few months back.

So that's probably not very helpful to you, but at least you have
confirmation from one other person that the answer isn't totally
obvious. :)

-Peff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Interpret :/<pattern> as a regular expression
  2007-06-13 20:00     ` Jeff King
@ 2007-06-13 22:20       ` Johannes Schindelin
  2007-06-14  7:48         ` Sam Vilain
  0 siblings, 1 reply; 14+ messages in thread
From: Johannes Schindelin @ 2007-06-13 22:20 UTC (permalink / raw
  To: Jeff King; +Cc: git

Hi,

On Wed, 13 Jun 2007, Jeff King wrote:

> On Wed, Jun 13, 2007 at 07:54:59PM +0100, Johannes Schindelin wrote:
> 
> > :-) Since you seem comfortable with regular expressions, maybe you can 
> > help me: I am looking for a pattern which matches _any_ character, and one 
> > which matches only non-newlines, both with and without REG_NEWLINE. Hmm?
> 
> Without REG_NEWLINE, any character is just '.', but I think you are
> stuck with '[^
> ]' for non-newlines, since POSIX makes no provisions for quoting the
> newline (I just skimmed through POSIX chapter 9, and I didn't see
> anything useful).
> 
> With REG_NEWLINE, non-newlines is of course '.'. Matching both is tricky
> without using extended regular expressions (where you could just do '.|
> '). In fact, I have been playing with it for a few minutes and I can't
> seem to find a good way, since you really want to represent '.' _inside_
> a bracketed alternation sequence. But I don't think there's a character
> class for "everything".
> 
> I think this would be much easier with pcre, but ISTR some opposition to
> that a few months back.

Actually, that's funny. Yesterday, I repeated my claim that pcre is 
slow on IRC, and Sam Villain on IRC accused me of trolling. But as you can 
see from my postings on this list ($gmane/41682), you can see that _I_ had 
numbers to back up my claim.

So no, I think pcre is just not worth it.

> So that's probably not very helpful to you, but at least you have
> confirmation from one other person that the answer isn't totally
> obvious. :)

That confirmation is at least some consolation to me :-)

Ciao,
Dscho "who is not here to teach, but to learn"

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Interpret :/<pattern> as a regular expression
  2007-06-13 22:20       ` Johannes Schindelin
@ 2007-06-14  7:48         ` Sam Vilain
  2007-06-14  8:09           ` Johannes Schindelin
  0 siblings, 1 reply; 14+ messages in thread
From: Sam Vilain @ 2007-06-14  7:48 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: Jeff King, git

Johannes Schindelin wrote:
> Actually, that's funny. Yesterday, I repeated my claim that pcre is 
> slow on IRC, and Sam Villain on IRC accused me of trolling. But as you can 
> see from my postings on this list ($gmane/41682), you can see that _I_ had 
> numbers to back up my claim.
>
> So no, I think pcre is just not worth it.
>   

A strange thing to conclude from your figures, which show pcre as the
fastest out of several libraries that you tested.

Your figures show exactly what I was saying on IRC - that a DFA
(external grep) vs NFA engine (most regex libraries) is inherently
faster. The paper I linked to, specially selected as I had previously
read a significant amount of the peer review the paper received,
explained this in detail. The one piece of feedback your numbers got
on-list also mentioned this.

However there is a further flaw in your study. All but one of the
performance tests use an external program, which on a given system may
or may not be faster because of pipeline performance characteristics.
You could improve the quality of the result by using the 'pcregrep'
program as a data point. It might also be worth trying a few more
complex patterns. I suggest reading the paper
(http://swtch.com/~rsc/regexp/regexp1.html) for some background before
repeating the experiment.

Apologies for not reviewing your numbers at the time; it sure is hard to
keep on top of this list. But very interesting that they seem to suggest
pcre would be the best choice from a performance perspective, even
though the figures are very preliminary. Perhaps it is worth pursuing
after all.

Sam.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Interpret :/<pattern> as a regular expression
  2007-06-14  7:48         ` Sam Vilain
@ 2007-06-14  8:09           ` Johannes Schindelin
  2007-06-14  9:07             ` Sam Vilain
  0 siblings, 1 reply; 14+ messages in thread
From: Johannes Schindelin @ 2007-06-14  8:09 UTC (permalink / raw
  To: Sam Vilain; +Cc: Jeff King, git

Hi,

On Thu, 14 Jun 2007, Sam Vilain wrote:

> Johannes Schindelin wrote:
>
> > So no, I think pcre is just not worth it.
>
> A strange thing to conclude from your figures, which show pcre as the 
> fastest out of several libraries that you tested.

The best of the worse. Yes. An external (!) program was 4x faster than 
pcre. I don't know how to make it more obvious that pcre sucks.

Hth,
Dscho

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Interpret :/<pattern> as a regular expression
  2007-06-14  8:09           ` Johannes Schindelin
@ 2007-06-14  9:07             ` Sam Vilain
  0 siblings, 0 replies; 14+ messages in thread
From: Sam Vilain @ 2007-06-14  9:07 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin wrote:
>> A strange thing to conclude from your figures, which show pcre as the 
>> fastest out of several libraries that you tested.
>>     
>
> The best of the worse. Yes. An external (!) program was 4x faster than 
> pcre. I don't know how to make it more obvious that pcre sucks.
>   

Oh, your position is obvious enough, but it is correct and supportable
by the available evidence? Why not read past the first paragraph of my
post and reply to that.

Sam.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2007-06-14  9:08 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-06-13  0:50 [PATCH] Interpret :/<pattern> as a regular expression Johannes Schindelin
2007-06-13  1:21 ` Junio C Hamano
2007-06-13 13:13   ` Johannes Schindelin
2007-06-13  4:52 ` Junio C Hamano
2007-06-13 11:10   ` Johannes Schindelin
2007-06-13 11:17   ` (unknown) Johannes Schindelin
2007-06-13 12:11     ` [PATCH] Interpret :/<pattern> as a regular expression Johannes Schindelin
2007-06-13 18:41 ` Jeff King
2007-06-13 18:54   ` Johannes Schindelin
2007-06-13 20:00     ` Jeff King
2007-06-13 22:20       ` Johannes Schindelin
2007-06-14  7:48         ` Sam Vilain
2007-06-14  8:09           ` Johannes Schindelin
2007-06-14  9:07             ` Sam Vilain

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).