git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>
Subject: Re: [PATCH v2 00/12] nd/icase updates
Date: Mon, 27 Jun 2016 07:53:59 -0700
Message-ID: <xmqqeg7id6ns.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <20160625052238.13615-1-pclouds@gmail.com> ("Nguyễn Thái Ngọc Duy"'s message of "Sat, 25 Jun 2016 07:22:26 +0200")

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> v2 fixes Junio's and Jeff's comments (both good). The sharing "!icase
> || ascii_only" is made a separate commit (6/12) because I think it
> takes some seconds to realize that the conversion is correct and it's
> technically not needed in 5/12 (and it's sort of the opposite of 1/12)
>
> Interdiff

OK.  regcomp_or_die() does make the code simpler.

> diff --git a/grep.c b/grep.c
> index cb058a5..92587a8 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -432,15 +432,8 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
>  	icase	       = opt->regflags & REG_ICASE || p->ignore_case;
>  	ascii_only     = !has_non_ascii(p->pattern);
>  
> +	if (opt->fixed || is_fixed(p->pattern, p->patternlen))
>  		p->fixed = !icase || ascii_only;
>  	else
>  		p->fixed = 0;
>  
> @@ -449,6 +442,9 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
>  		kwsincr(p->kws, p->pattern, p->patternlen);
>  		kwsprep(p->kws);
>  		return;
> +	} else if (opt->fixed) {
> +		compile_fixed_regexp(p, opt);
> +		return;
>  	}

This if/else-if/else cascade is now a lot simpler, and while the
discussion is fresh in my brain it makes sense, but it may deserve a
bit of commenting.

And while attempting to do so, I found one possible issue there.

Can't p->ignore_case be true even when opt->regflags does not have
REG_ICASE?  The user never asked us to do a regexp match in such a
case, and the logical place to compensate for that case would be
inside compile_fixed_regexp(), where we use the regexp engine behind
the user's back for our convenience, I would think.

In the current code, compile_fixed_regexp() is only called when we
want ICASE, but hardcoding that assumption into it unnecessarily robs
it of flexibility (and the function name does not tell us it is only
for icase in the first place), so I taught it to do the REG_ICASE
thing only when opt->ignore_case is set.
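
To illustrate the idea outside of grep.c, here is a minimal
standalone sketch using only POSIX <regex.h>.  The names
quote_basic_regex(), compile_literal() and literal_match() are
hypothetical; in particular quote_basic_regex() is a simplified
stand-in for basic_regex_quote_buf() and is not its real
implementation.  The point is only that the caller's flags are
masked and REG_ICASE is added when, and only when, case-insensitive
matching was requested.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for basic_regex_quote_buf():
 * backslash-escape the characters that are special in a POSIX
 * basic regexp so that the pattern matches literally.
 */
static void quote_basic_regex(char *dst, size_t dstlen, const char *src)
{
	size_t n = 0;

	/* keep room for an escaped character plus the trailing NUL */
	for (; *src && n + 2 < dstlen; src++) {
		if (strchr(".[\\*^$", *src))
			dst[n++] = '\\';
		dst[n++] = *src;
	}
	dst[n] = '\0';
}

/*
 * Compile "pattern" as a literal string.  REG_EXTENDED is masked
 * out because the quoting above targets basic regexps; REG_ICASE
 * is added only when the caller asked to ignore case.
 */
static int compile_literal(regex_t *re, const char *pattern,
			   int base_flags, int ignore_case)
{
	char quoted[256];
	int regflags = base_flags & ~REG_EXTENDED;

	if (ignore_case)
		regflags |= REG_ICASE;
	quote_basic_regex(quoted, sizeof(quoted), pattern);
	return regcomp(re, quoted, regflags);
}

/* Returns 1 on match, 0 on no match, -1 if regcomp() failed. */
static int literal_match(const char *pattern, const char *text,
			 int ignore_case)
{
	regex_t re;
	int hit;

	if (compile_literal(&re, pattern, REG_EXTENDED, ignore_case))
		return -1;
	hit = regexec(&re, text, 0, NULL, 0) == 0;
	regfree(&re);
	return hit;
}
```

With this sketch, literal_match("a.c*", "xA.C*y", 1) finds the
literal string regardless of case, while the same call with
ignore_case set to 0 does not, even though the base flags passed
in never contained REG_ICASE.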

How does this look?


 grep.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 92587a8..3a3a9f4 100644
--- a/grep.c
+++ b/grep.c
@@ -407,17 +407,21 @@ static int is_fixed(const char *s, size_t len)
 static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
 	struct strbuf sb = STRBUF_INIT;
 	int err;
+	int regflags;
 
 	basic_regex_quote_buf(&sb, p->pattern);
-	err = regcomp(&p->regexp, sb.buf, opt->regflags & ~REG_EXTENDED);
+	regflags = opt->regflags & ~REG_EXTENDED;
+	if (opt->ignore_case)
+		regflags |= REG_ICASE;
+	err = regcomp(&p->regexp, sb.buf, regflags);
 	if (opt->debug)
 		fprintf(stderr, "fixed %s\n", sb.buf);
 	strbuf_release(&sb);
 	if (err) {
 		char errbuf[1024];
 		regerror(err, &p->regexp, errbuf, sizeof(errbuf));
 		regfree(&p->regexp);
 		compile_regexp_failed(p, errbuf);
 	}
 }
@@ -425,38 +429,55 @@ static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
 static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
 	int icase, ascii_only;
 	int err;
 
 	p->word_regexp = opt->word_regexp;
 	p->ignore_case = opt->ignore_case;
 	icase	       = opt->regflags & REG_ICASE || p->ignore_case;
 	ascii_only     = !has_non_ascii(p->pattern);
 
+	/*
+	 * Even when -F (fixed) asks us to do a non-regexp search, we
+	 * may not be able to correctly case-fold when -i
+	 * (ignore-case) is asked (in which case, we'll synthesize a
+	 * regexp to match the pattern that matches regexp special
+	 * characters literally, while ignoring case differences).  On
+	 * the other hand, even without -F, if the pattern does not
+	 * have any regexp special characters and there is no need for
+	 * case-folding search, we can internally turn it into a
+	 * simple string match using kws.  p->fixed tells us if we
+	 * want to use kws.
+	 */
 	if (opt->fixed || is_fixed(p->pattern, p->patternlen))
 		p->fixed = !icase || ascii_only;
 	else
 		p->fixed = 0;
 
 	if (p->fixed) {
 		p->kws = kwsalloc(icase ? tolower_trans_tbl : NULL);
 		kwsincr(p->kws, p->pattern, p->patternlen);
 		kwsprep(p->kws);
 		return;
 	} else if (opt->fixed) {
+		/*
+		 * We only come here when -i is in effect and the
+		 * pattern has non-ASCII characters that kws cannot
+		 * case-fold, so we need a synthesized regexp.
+		 */
 		compile_fixed_regexp(p, opt);
 		return;
 	}
 
 	if (opt->pcre) {
 		compile_pcre_regexp(p, opt);
 		return;
 	}
 
 	err = regcomp(&p->regexp, p->pattern, opt->regflags);
 	if (err) {
 		char errbuf[1024];
 		regerror(err, &p->regexp, errbuf, 1024);
 		regfree(&p->regexp);
 		compile_regexp_failed(p, errbuf);
 	}
 }
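
For reference, the decision the cascade makes can be sketched as a
small standalone function.  This is only an illustration under
stated assumptions: choose_matcher(), has_non_ascii_bytes() and
is_literal() are hypothetical names, the two helpers are simplified
approximations of has_non_ascii() and is_fixed(), and the opt->pcre
branch is left out.

```c
#include <string.h>

enum matcher { USE_KWS, USE_FIXED_REGEXP, USE_REGEXP };

/* Hypothetical helper: true if any byte has its high bit set. */
static int has_non_ascii_bytes(const char *s)
{
	for (; *s; s++)
		if ((unsigned char)*s & 0x80)
			return 1;
	return 0;
}

/*
 * Hypothetical helper: true if the pattern has none of the
 * characters this sketch treats as regexp-special.
 */
static int is_literal(const char *s)
{
	return !s[strcspn(s, ".*[]^$\\+?(){}|")];
}

/*
 * Mirror the if/else-if/else cascade: use kws when the pattern
 * can be matched as a plain string, fall back to a synthesized
 * literal regexp when -F was given but kws cannot case-fold the
 * pattern, and use a normal regexp otherwise.
 */
static enum matcher choose_matcher(const char *pattern,
				   int opt_fixed, int icase)
{
	int ascii_only = !has_non_ascii_bytes(pattern);
	int fixed = 0;

	if (opt_fixed || is_literal(pattern))
		fixed = !icase || ascii_only;

	if (fixed)
		return USE_KWS;
	if (opt_fixed)
		return USE_FIXED_REGEXP;
	return USE_REGEXP;
}
```

In this sketch an ASCII pattern given with -F always goes to kws,
with or without -i.  The synthesized-regexp branch is reached only
when -F and -i are both in effect and the pattern contains
non-ASCII bytes.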

