git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 00/11] nd/icase updates
@ 2016-06-23 16:28 Nguyễn Thái Ngọc Duy
  2016-06-23 16:28 ` [PATCH 01/11] grep: break down an "if" stmt in preparation for next changes Nguyễn Thái Ngọc Duy
                   ` (12 more replies)
  0 siblings, 13 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-23 16:28 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

This update drops 1/12, which is an unnecessary change, and changes a
couple of echo/printf to test_write_lines. One of those echo uses
backlashes and causes problems with Debian dash.

Interdiff therefore is not really interesting

diff --git a/builtin/grep.c b/builtin/grep.c
index 46c5ba1..8c516a9 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -809,7 +809,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 
 	if (!opt.pattern_list)
 		die(_("no pattern given."));
-	if (opt.ignore_case)
+	if (!opt.fixed && opt.ignore_case)
 		opt.regflags |= REG_ICASE;
 
 	compile_grep_patterns(&opt);
diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh
index 4176625..169fd8d 100755
--- a/t/t7812-grep-icase-non-ascii.sh
+++ b/t/t7812-grep-icase-non-ascii.sh
@@ -5,7 +5,7 @@ test_description='grep icase on non-English locales'
 . ./lib-gettext.sh
 
 test_expect_success GETTEXT_LOCALE 'setup' '
-	printf "TILRAUN: Halló Heimur!" >file &&
+	test_write_lines "TILRAUN: Halló Heimur!" >file &&
 	git add file &&
 	LC_ALL="$is_IS_locale" &&
 	export LC_ALL
@@ -27,7 +27,7 @@ test_expect_success GETTEXT_LOCALE,LIBPCRE 'grep pcre utf-8 icase' '
 '
 
 test_expect_success GETTEXT_LOCALE,LIBPCRE 'grep pcre utf-8 string with "+"' '
-	printf "TILRAUN: Hallóó Heimur!" >file2 &&
+	test_write_lines "TILRAUN: Hallóó Heimur!" >file2 &&
 	git add file2 &&
 	git grep -l --perl-regexp "TILRAUN: H.lló+ Heimur!" >actual &&
 	echo file >expected &&
@@ -38,26 +38,26 @@ test_expect_success GETTEXT_LOCALE,LIBPCRE 'grep pcre utf-8 string with "+"' '
 test_expect_success REGEX_LOCALE 'grep literal string, with -F' '
 	git grep --debug -i -F "TILRAUN: Halló Heimur!"  2>&1 >/dev/null |
 		 grep fixed >debug1 &&
-	echo "fixed TILRAUN: Halló Heimur!" >expect1 &&
+	test_write_lines "fixed TILRAUN: Halló Heimur!" >expect1 &&
 	test_cmp expect1 debug1 &&
 
 	git grep --debug -i -F "TILRAUN: HALLÓ HEIMUR!"  2>&1 >/dev/null |
 		 grep fixed >debug2 &&
-	echo "fixed TILRAUN: HALLÓ HEIMUR!" >expect2 &&
+	test_write_lines "fixed TILRAUN: HALLÓ HEIMUR!" >expect2 &&
 	test_cmp expect2 debug2
 '
 
 test_expect_success REGEX_LOCALE 'grep string with regex, with -F' '
-	printf "^*TILR^AUN:.* \\Halló \$He[]imur!\$" >file &&
+	test_write_lines "^*TILR^AUN:.* \\Halló \$He[]imur!\$" >file &&
 
 	git grep --debug -i -F "^*TILR^AUN:.* \\Halló \$He[]imur!\$" 2>&1 >/dev/null |
 		 grep fixed >debug1 &&
-	echo "fixed \\^*TILR^AUN:\\.\\* \\\\Halló \$He\\[]imur!\\\$" >expect1 &&
+	test_write_lines "fixed \\^*TILR^AUN:\\.\\* \\\\Halló \$He\\[]imur!\\\$" >expect1 &&
 	test_cmp expect1 debug1 &&
 
 	git grep --debug -i -F "^*TILR^AUN:.* \\HALLÓ \$HE[]IMUR!\$"  2>&1 >/dev/null |
 		 grep fixed >debug2 &&
-	echo "fixed \\^*TILR^AUN:\\.\\* \\\\HALLÓ \$HE\\[]IMUR!\\\$" >expect2 &&
+	test_write_lines "fixed \\^*TILR^AUN:\\.\\* \\\\HALLÓ \$HE\\[]IMUR!\\\$" >expect2 &&
 	test_cmp expect2 debug2
 '
 

Nguyễn Thái Ngọc Duy (11):
  grep: break down an "if" stmt in preparation for next changes
  test-regex: isolate the bug test code
  test-regex: expose full regcomp() to the command line
  grep/icase: avoid kwsset on literal non-ascii strings
  grep/icase: avoid kwsset when -F is specified
  grep/pcre: prepare locale-dependent tables for icase matching
  gettext: add is_utf8_locale()
  grep/pcre: support utf-8
  diffcore-pickaxe: "share" regex error handling code
  diffcore-pickaxe: support case insensitive match on non-ascii
  grep.c: reuse "icase" variable

 diffcore-pickaxe.c                       | 27 ++++++++----
 gettext.c                                | 24 ++++++++++-
 gettext.h                                |  1 +
 grep.c                                   | 47 +++++++++++++++++----
 grep.h                                   |  1 +
 quote.c                                  | 37 +++++++++++++++++
 quote.h                                  |  1 +
 t/t0070-fundamental.sh                   |  2 +-
 t/t7812-grep-icase-non-ascii.sh (new +x) | 71 ++++++++++++++++++++++++++++++++
 t/t7813-grep-icase-iso.sh (new +x)       | 19 +++++++++
 test-regex.c                             | 59 +++++++++++++++++++++++++-
 11 files changed, 269 insertions(+), 20 deletions(-)
 create mode 100755 t/t7812-grep-icase-non-ascii.sh
 create mode 100755 t/t7813-grep-icase-iso.sh

-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 01/11] grep: break down an "if" stmt in preparation for next changes
  2016-06-23 16:28 [PATCH 00/11] nd/icase updates Nguyễn Thái Ngọc Duy
@ 2016-06-23 16:28 ` Nguyễn Thái Ngọc Duy
  2016-06-23 16:28 ` [PATCH 02/11] test-regex: isolate the bug test code Nguyễn Thái Ngọc Duy
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-23 16:28 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 grep.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 7b2b96a..609f218 100644
--- a/grep.c
+++ b/grep.c
@@ -403,7 +403,9 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 	p->word_regexp = opt->word_regexp;
 	p->ignore_case = opt->ignore_case;
 
-	if (opt->fixed || is_fixed(p->pattern, p->patternlen))
+	if (opt->fixed) {
+		p->fixed = 1;
+	} else if (is_fixed(p->pattern, p->patternlen))
 		p->fixed = 1;
 	else
 		p->fixed = 0;
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 02/11] test-regex: isolate the bug test code
  2016-06-23 16:28 [PATCH 00/11] nd/icase updates Nguyễn Thái Ngọc Duy
  2016-06-23 16:28 ` [PATCH 01/11] grep: break down an "if" stmt in preparation for next changes Nguyễn Thái Ngọc Duy
@ 2016-06-23 16:28 ` Nguyễn Thái Ngọc Duy
  2016-06-23 16:28 ` [PATCH 03/11] test-regex: expose full regcomp() to the command line Nguyễn Thái Ngọc Duy
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-23 16:28 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

This is in preparation to turn test-regex into some generic regex
testing command.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t0070-fundamental.sh |  2 +-
 test-regex.c           | 12 ++++++++++--
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/t/t0070-fundamental.sh b/t/t0070-fundamental.sh
index 5ed69a6..991ed2a 100755
--- a/t/t0070-fundamental.sh
+++ b/t/t0070-fundamental.sh
@@ -31,7 +31,7 @@ test_expect_success 'git_mkstemps_mode does not fail if fd 0 is not open' '
 
 test_expect_success 'check for a bug in the regex routines' '
 	# if this test fails, re-build git with NO_REGEX=1
-	test-regex
+	test-regex --bug
 '
 
 test_done
diff --git a/test-regex.c b/test-regex.c
index 0dc598e..67a1a65 100644
--- a/test-regex.c
+++ b/test-regex.c
@@ -1,6 +1,6 @@
 #include "git-compat-util.h"
 
-int main(int argc, char **argv)
+static int test_regex_bug(void)
 {
 	char *pat = "[^={} \t]+";
 	char *str = "={}\nfred";
@@ -16,5 +16,13 @@ int main(int argc, char **argv)
 	if (m[0].rm_so == 3) /* matches '\n' when it should not */
 		die("regex bug confirmed: re-build git with NO_REGEX=1");
 
-	exit(0);
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	if (argc == 2 && !strcmp(argv[1], "--bug"))
+		return test_regex_bug();
+	else
+		usage("test-regex --bug");
 }
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 03/11] test-regex: expose full regcomp() to the command line
  2016-06-23 16:28 [PATCH 00/11] nd/icase updates Nguyễn Thái Ngọc Duy
  2016-06-23 16:28 ` [PATCH 01/11] grep: break down an "if" stmt in preparation for next changes Nguyễn Thái Ngọc Duy
  2016-06-23 16:28 ` [PATCH 02/11] test-regex: isolate the bug test code Nguyễn Thái Ngọc Duy
@ 2016-06-23 16:28 ` Nguyễn Thái Ngọc Duy
  2016-06-23 16:29 ` [PATCH 04/11] grep/icase: avoid kwsset on literal non-ascii strings Nguyễn Thái Ngọc Duy
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-23 16:28 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 test-regex.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 49 insertions(+), 2 deletions(-)

diff --git a/test-regex.c b/test-regex.c
index 67a1a65..eff26f5 100644
--- a/test-regex.c
+++ b/test-regex.c
@@ -1,4 +1,21 @@
 #include "git-compat-util.h"
+#include "gettext.h"
+
+struct reg_flag {
+	const char *name;
+	int flag;
+};
+
+static struct reg_flag reg_flags[] = {
+	{ "EXTENDED",	 REG_EXTENDED	},
+	{ "NEWLINE",	 REG_NEWLINE	},
+	{ "ICASE",	 REG_ICASE	},
+	{ "NOTBOL",	 REG_NOTBOL	},
+#ifdef REG_STARTEND
+	{ "STARTEND",	 REG_STARTEND	},
+#endif
+	{ NULL, 0 }
+};
 
 static int test_regex_bug(void)
 {
@@ -21,8 +38,38 @@ static int test_regex_bug(void)
 
 int main(int argc, char **argv)
 {
+	const char *pat;
+	const char *str;
+	int flags = 0;
+	regex_t r;
+	regmatch_t m[1];
+
 	if (argc == 2 && !strcmp(argv[1], "--bug"))
 		return test_regex_bug();
-	else
-		usage("test-regex --bug");
+	else if (argc < 3)
+		usage("test-regex --bug\n"
+		      "test-regex <pattern> <string> [<options>]");
+
+	argv++;
+	pat = *argv++;
+	str = *argv++;
+	while (*argv) {
+		struct reg_flag *rf;
+		for (rf = reg_flags; rf->name; rf++)
+			if (!strcmp(*argv, rf->name)) {
+				flags |= rf->flag;
+				break;
+			}
+		if (!rf->name)
+			die("do not recognize %s", *argv);
+		argv++;
+	}
+	git_setup_gettext();
+
+	if (regcomp(&r, pat, flags))
+		die("failed regcomp() for pattern '%s'", pat);
+	if (regexec(&r, str, 1, m, 0))
+		return 1;
+
+	return 0;
 }
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 04/11] grep/icase: avoid kwsset on literal non-ascii strings
  2016-06-23 16:28 [PATCH 00/11] nd/icase updates Nguyễn Thái Ngọc Duy
                   ` (2 preceding siblings ...)
  2016-06-23 16:28 ` [PATCH 03/11] test-regex: expose full regcomp() to the command line Nguyễn Thái Ngọc Duy
@ 2016-06-23 16:29 ` Nguyễn Thái Ngọc Duy
  2016-06-23 20:12   ` Junio C Hamano
  2016-06-23 16:29 ` [PATCH 05/11] grep/icase: avoid kwsset when -F is specified Nguyễn Thái Ngọc Duy
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-23 16:29 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

When we detect the pattern is just a literal string, we avoid heavy
regex engine and use fast substring search implemented in kwsset.c.
But kws uses git-ctype which is locale-independent so it does not know
how to fold case properly outside ascii range. Let regcomp or pcre
take care of this case instead. Slower, but accurate.

Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Helped-by: René Scharfe <l.s.r@web.de>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 grep.c                                   |  7 ++++++-
 t/t7812-grep-icase-non-ascii.sh (new +x) | 23 +++++++++++++++++++++++
 2 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100755 t/t7812-grep-icase-non-ascii.sh

diff --git a/grep.c b/grep.c
index 609f218..f192727 100644
--- a/grep.c
+++ b/grep.c
@@ -4,6 +4,7 @@
 #include "xdiff-interface.h"
 #include "diff.h"
 #include "diffcore.h"
+#include "commit.h"
 
 static int grep_source_load(struct grep_source *gs);
 static int grep_source_is_binary(struct grep_source *gs);
@@ -398,14 +399,18 @@ static int is_fixed(const char *s, size_t len)
 
 static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
+	int icase, ascii_only;
 	int err;
 
 	p->word_regexp = opt->word_regexp;
 	p->ignore_case = opt->ignore_case;
+	icase	       = opt->regflags & REG_ICASE || p->ignore_case;
+	ascii_only     = !has_non_ascii(p->pattern);
 
 	if (opt->fixed) {
 		p->fixed = 1;
-	} else if (is_fixed(p->pattern, p->patternlen))
+	} else if ((!icase || ascii_only) &&
+		   is_fixed(p->pattern, p->patternlen))
 		p->fixed = 1;
 	else
 		p->fixed = 0;
diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh
new file mode 100755
index 0000000..b78a774
--- /dev/null
+++ b/t/t7812-grep-icase-non-ascii.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+test_description='grep icase on non-English locales'
+
+. ./lib-gettext.sh
+
+test_expect_success GETTEXT_LOCALE 'setup' '
+	test_write_lines "TILRAUN: Halló Heimur!" >file &&
+	git add file &&
+	LC_ALL="$is_IS_locale" &&
+	export LC_ALL
+'
+
+test_have_prereq GETTEXT_LOCALE &&
+test-regex "HALLÓ" "Halló" ICASE &&
+test_set_prereq REGEX_LOCALE
+
+test_expect_success REGEX_LOCALE 'grep literal string, no -F' '
+	git grep -i "TILRAUN: Halló Heimur!" &&
+	git grep -i "TILRAUN: HALLÓ HEIMUR!"
+'
+
+test_done
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 05/11] grep/icase: avoid kwsset when -F is specified
  2016-06-23 16:28 [PATCH 00/11] nd/icase updates Nguyễn Thái Ngọc Duy
                   ` (3 preceding siblings ...)
  2016-06-23 16:29 ` [PATCH 04/11] grep/icase: avoid kwsset on literal non-ascii strings Nguyễn Thái Ngọc Duy
@ 2016-06-23 16:29 ` Nguyễn Thái Ngọc Duy
  2016-06-23 20:25   ` Junio C Hamano
  2016-06-23 16:29 ` [PATCH 06/11] grep/pcre: prepare locale-dependent tables for icase matching Nguyễn Thái Ngọc Duy
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-23 16:29 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

Similar to the previous commit, we can't use kws on icase search
outside ascii range. But we can't simply pass the pattern to
regcomp/pcre like the previous commit because it may contain regex
special characters, so we need to quote the regex first.

To avoid misquote traps that could lead to undefined behavior, we
always stick to basic regex engine in this case. We don't need fancy
features for grepping a literal string anyway.

basic_regex_quote_buf() assumes that if the pattern is in a multibyte
encoding, ascii chars must be unambiguously encoded as single
bytes. This is true at least for UTF-8. For others, let's wait until
people yell up. Chances are nobody uses multibyte, non utf-8 charsets
anymore.

Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Helped-by: René Scharfe <l.s.r@web.de>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 grep.c                          | 25 ++++++++++++++++++++++++-
 quote.c                         | 37 +++++++++++++++++++++++++++++++++++++
 quote.h                         |  1 +
 t/t7812-grep-icase-non-ascii.sh | 26 ++++++++++++++++++++++++++
 4 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index f192727..6de58f3 100644
--- a/grep.c
+++ b/grep.c
@@ -5,6 +5,7 @@
 #include "diff.h"
 #include "diffcore.h"
 #include "commit.h"
+#include "quote.h"
 
 static int grep_source_load(struct grep_source *gs);
 static int grep_source_is_binary(struct grep_source *gs);
@@ -397,6 +398,24 @@ static int is_fixed(const char *s, size_t len)
 	return 1;
 }
 
+static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int err;
+
+	basic_regex_quote_buf(&sb, p->pattern);
+	err = regcomp(&p->regexp, sb.buf, opt->regflags & ~REG_EXTENDED);
+	if (opt->debug)
+		fprintf(stderr, "fixed %s\n", sb.buf);
+	strbuf_release(&sb);
+	if (err) {
+		char errbuf[1024];
+		regerror(err, &p->regexp, errbuf, sizeof(errbuf));
+		regfree(&p->regexp);
+		compile_regexp_failed(p, errbuf);
+	}
+}
+
 static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
 	int icase, ascii_only;
@@ -408,7 +427,11 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 	ascii_only     = !has_non_ascii(p->pattern);
 
 	if (opt->fixed) {
-		p->fixed = 1;
+		p->fixed = !icase || ascii_only;
+		if (!p->fixed) {
+			compile_fixed_regexp(p, opt);
+			return;
+		}
 	} else if ((!icase || ascii_only) &&
 		   is_fixed(p->pattern, p->patternlen))
 		p->fixed = 1;
diff --git a/quote.c b/quote.c
index fe884d2..c67adb7 100644
--- a/quote.c
+++ b/quote.c
@@ -440,3 +440,40 @@ void tcl_quote_buf(struct strbuf *sb, const char *src)
 	}
 	strbuf_addch(sb, '"');
 }
+
+void basic_regex_quote_buf(struct strbuf *sb, const char *src)
+{
+	char c;
+
+	if (*src == '^') {
+		/* only beginning '^' is special and needs quoting */
+		strbuf_addch(sb, '\\');
+		strbuf_addch(sb, *src++);
+	}
+	if (*src == '*')
+		/* beginning '*' is not special, no quoting */
+		strbuf_addch(sb, *src++);
+
+	while ((c = *src++)) {
+		switch (c) {
+		case '[':
+		case '.':
+		case '\\':
+		case '*':
+			strbuf_addch(sb, '\\');
+			strbuf_addch(sb, c);
+			break;
+
+		case '$':
+			/* only the end '$' is special and needs quoting */
+			if (*src == '\0')
+				strbuf_addch(sb, '\\');
+			strbuf_addch(sb, c);
+			break;
+
+		default:
+			strbuf_addch(sb, c);
+			break;
+		}
+	}
+}
diff --git a/quote.h b/quote.h
index 99e04d3..362d315 100644
--- a/quote.h
+++ b/quote.h
@@ -67,5 +67,6 @@ extern char *quote_path_relative(const char *in, const char *prefix,
 extern void perl_quote_buf(struct strbuf *sb, const char *src);
 extern void python_quote_buf(struct strbuf *sb, const char *src);
 extern void tcl_quote_buf(struct strbuf *sb, const char *src);
+extern void basic_regex_quote_buf(struct strbuf *sb, const char *src);
 
 #endif
diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh
index b78a774..1929809 100755
--- a/t/t7812-grep-icase-non-ascii.sh
+++ b/t/t7812-grep-icase-non-ascii.sh
@@ -20,4 +20,30 @@ test_expect_success REGEX_LOCALE 'grep literal string, no -F' '
 	git grep -i "TILRAUN: HALLÓ HEIMUR!"
 '
 
+test_expect_success REGEX_LOCALE 'grep literal string, with -F' '
+	git grep --debug -i -F "TILRAUN: Halló Heimur!"  2>&1 >/dev/null |
+		 grep fixed >debug1 &&
+	test_write_lines "fixed TILRAUN: Halló Heimur!" >expect1 &&
+	test_cmp expect1 debug1 &&
+
+	git grep --debug -i -F "TILRAUN: HALLÓ HEIMUR!"  2>&1 >/dev/null |
+		 grep fixed >debug2 &&
+	test_write_lines "fixed TILRAUN: HALLÓ HEIMUR!" >expect2 &&
+	test_cmp expect2 debug2
+'
+
+test_expect_success REGEX_LOCALE 'grep string with regex, with -F' '
+	test_write_lines "^*TILR^AUN:.* \\Halló \$He[]imur!\$" >file &&
+
+	git grep --debug -i -F "^*TILR^AUN:.* \\Halló \$He[]imur!\$" 2>&1 >/dev/null |
+		 grep fixed >debug1 &&
+	test_write_lines "fixed \\^*TILR^AUN:\\.\\* \\\\Halló \$He\\[]imur!\\\$" >expect1 &&
+	test_cmp expect1 debug1 &&
+
+	git grep --debug -i -F "^*TILR^AUN:.* \\HALLÓ \$HE[]IMUR!\$"  2>&1 >/dev/null |
+		 grep fixed >debug2 &&
+	test_write_lines "fixed \\^*TILR^AUN:\\.\\* \\\\HALLÓ \$HE\\[]IMUR!\\\$" >expect2 &&
+	test_cmp expect2 debug2
+'
+
 test_done
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 06/11] grep/pcre: prepare locale-dependent tables for icase matching
  2016-06-23 16:28 [PATCH 00/11] nd/icase updates Nguyễn Thái Ngọc Duy
                   ` (4 preceding siblings ...)
  2016-06-23 16:29 ` [PATCH 05/11] grep/icase: avoid kwsset when -F is specified Nguyễn Thái Ngọc Duy
@ 2016-06-23 16:29 ` Nguyễn Thái Ngọc Duy
  2016-06-23 16:29 ` [PATCH 07/11] gettext: add is_utf8_locale() Nguyễn Thái Ngọc Duy
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-23 16:29 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 2333 bytes --]

The default tables are usually built with C locale and only suitable
for LANG=C or similar.  This should make case insensitive search work
correctly for all single-byte charsets.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 grep.c                             |  8 ++++++--
 grep.h                             |  1 +
 t/t7813-grep-icase-iso.sh (new +x) | 19 +++++++++++++++++++
 3 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100755 t/t7813-grep-icase-iso.sh

diff --git a/grep.c b/grep.c
index 6de58f3..22f4d99 100644
--- a/grep.c
+++ b/grep.c
@@ -324,11 +324,14 @@ static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
 	int erroffset;
 	int options = PCRE_MULTILINE;
 
-	if (opt->ignore_case)
+	if (opt->ignore_case) {
+		if (has_non_ascii(p->pattern))
+			p->pcre_tables = pcre_maketables();
 		options |= PCRE_CASELESS;
+	}
 
 	p->pcre_regexp = pcre_compile(p->pattern, options, &error, &erroffset,
-			NULL);
+				      p->pcre_tables);
 	if (!p->pcre_regexp)
 		compile_regexp_failed(p, error);
 
@@ -362,6 +365,7 @@ static void free_pcre_regexp(struct grep_pat *p)
 {
 	pcre_free(p->pcre_regexp);
 	pcre_free(p->pcre_extra_info);
+	pcre_free((void *)p->pcre_tables);
 }
 #else /* !USE_LIBPCRE */
 static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
diff --git a/grep.h b/grep.h
index 95f197a..cee4357 100644
--- a/grep.h
+++ b/grep.h
@@ -48,6 +48,7 @@ struct grep_pat {
 	regex_t regexp;
 	pcre *pcre_regexp;
 	pcre_extra *pcre_extra_info;
+	const unsigned char *pcre_tables;
 	kwset_t kws;
 	unsigned fixed:1;
 	unsigned ignore_case:1;
diff --git a/t/t7813-grep-icase-iso.sh b/t/t7813-grep-icase-iso.sh
new file mode 100755
index 0000000..efef7fb
--- /dev/null
+++ b/t/t7813-grep-icase-iso.sh
@@ -0,0 +1,19 @@
+#!/bin/sh
+
+test_description='grep icase on non-English locales'
+
+. ./lib-gettext.sh
+
+test_expect_success GETTEXT_ISO_LOCALE 'setup' '
+	printf "TILRAUN: Halló Heimur!" >file &&
+	git add file &&
+	LC_ALL="$is_IS_iso_locale" &&
+	export LC_ALL
+'
+
+test_expect_success GETTEXT_ISO_LOCALE,LIBPCRE 'grep pcre string' '
+	git grep --perl-regexp -i "TILRAUN: H.lló Heimur!" &&
+	git grep --perl-regexp -i "TILRAUN: H.LLÓ HEIMUR!"
+'
+
+test_done
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 07/11] gettext: add is_utf8_locale()
  2016-06-23 16:28 [PATCH 00/11] nd/icase updates Nguyễn Thái Ngọc Duy
                   ` (5 preceding siblings ...)
  2016-06-23 16:29 ` [PATCH 06/11] grep/pcre: prepare locale-dependent tables for icase matching Nguyễn Thái Ngọc Duy
@ 2016-06-23 16:29 ` Nguyễn Thái Ngọc Duy
  2016-06-23 16:29 ` [PATCH 08/11] grep/pcre: support utf-8 Nguyễn Thái Ngọc Duy
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-23 16:29 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

This function returns true if git is running under an UTF-8
locale. pcre in the next patch will need this.

is_encoding_utf8() is used instead of strcmp() to catch both "utf-8"
and "utf8" suffixes.

When built with no gettext support, we peek in several env variables
to detect UTF-8. pcre library might support utf-8 even if libc is
built without locale support.. The peeking code is a copy from
compat/regex/regcomp.c

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 gettext.c | 24 ++++++++++++++++++++++--
 gettext.h |  1 +
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/gettext.c b/gettext.c
index a268a2c..db727ea 100644
--- a/gettext.c
+++ b/gettext.c
@@ -18,6 +18,8 @@
 #	endif
 #endif
 
+static const char *charset;
+
 /*
  * Guess the user's preferred languages from the value in LANGUAGE environment
  * variable and LC_MESSAGES locale category if NO_GETTEXT is not defined.
@@ -65,7 +67,6 @@ static int test_vsnprintf(const char *fmt, ...)
 	return ret;
 }
 
-static const char *charset;
 static void init_gettext_charset(const char *domain)
 {
 	/*
@@ -172,8 +173,27 @@ int gettext_width(const char *s)
 {
 	static int is_utf8 = -1;
 	if (is_utf8 == -1)
-		is_utf8 = !strcmp(charset, "UTF-8");
+		is_utf8 = is_utf8_locale();
 
 	return is_utf8 ? utf8_strwidth(s) : strlen(s);
 }
 #endif
+
+int is_utf8_locale(void)
+{
+#ifdef NO_GETTEXT
+	if (!charset) {
+		const char *env = getenv("LC_ALL");
+		if (!env || !*env)
+			env = getenv("LC_CTYPE");
+		if (!env || !*env)
+			env = getenv("LANG");
+		if (!env)
+			env = "";
+		if (strchr(env, '.'))
+			env = strchr(env, '.') + 1;
+		charset = xstrdup(env);
+	}
+#endif
+	return is_encoding_utf8(charset);
+}
diff --git a/gettext.h b/gettext.h
index 33696a4..7eee64a 100644
--- a/gettext.h
+++ b/gettext.h
@@ -90,5 +90,6 @@ const char *Q_(const char *msgid, const char *plu, unsigned long n)
 #endif
 
 const char *get_preferred_languages(void);
+extern int is_utf8_locale(void);
 
 #endif
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 08/11] grep/pcre: support utf-8
  2016-06-23 16:28 [PATCH 00/11] nd/icase updates Nguyễn Thái Ngọc Duy
                   ` (6 preceding siblings ...)
  2016-06-23 16:29 ` [PATCH 07/11] gettext: add is_utf8_locale() Nguyễn Thái Ngọc Duy
@ 2016-06-23 16:29 ` Nguyễn Thái Ngọc Duy
  2016-06-23 16:29 ` [PATCH 09/11] diffcore-pickaxe: "share" regex error handling code Nguyễn Thái Ngọc Duy
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-23 16:29 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

In the previous change in this function, we add locale support for
single-byte encodings only. It looks like pcre only supports utf-* as
multibyte encodings, the others are left in the cold (which is
fine).

We need to enable PCRE_UTF8 so pcre can find character boundary
correctly. It's needed for case folding (when --ignore-case is used)
or '*', '+' or similar syntax is used.

The "has_non_ascii()" check is to be on the conservative side. If
there's non-ascii in the pattern, the searched content could still be
in utf-8, but we can treat it just like a byte stream and everything
should work. If we force utf-8 based on locale only and pcre validates
utf-8 and the file content is in non-utf8 encoding, things break.

Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Helped-by: Plamen Totev <plamen.totev@abv.bg>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 grep.c                          |  2 ++
 t/t7812-grep-icase-non-ascii.sh | 15 +++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/grep.c b/grep.c
index 22f4d99..6e99b01 100644
--- a/grep.c
+++ b/grep.c
@@ -329,6 +329,8 @@ static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
 			p->pcre_tables = pcre_maketables();
 		options |= PCRE_CASELESS;
 	}
+	if (is_utf8_locale() && has_non_ascii(p->pattern))
+		options |= PCRE_UTF8;
 
 	p->pcre_regexp = pcre_compile(p->pattern, options, &error, &erroffset,
 				      p->pcre_tables);
diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh
index 1929809..08ae4c9 100755
--- a/t/t7812-grep-icase-non-ascii.sh
+++ b/t/t7812-grep-icase-non-ascii.sh
@@ -20,6 +20,21 @@ test_expect_success REGEX_LOCALE 'grep literal string, no -F' '
 	git grep -i "TILRAUN: HALLÓ HEIMUR!"
 '
 
+test_expect_success GETTEXT_LOCALE,LIBPCRE 'grep pcre utf-8 icase' '
+	git grep --perl-regexp    "TILRAUN: H.lló Heimur!" &&
+	git grep --perl-regexp -i "TILRAUN: H.lló Heimur!" &&
+	git grep --perl-regexp -i "TILRAUN: H.LLÓ HEIMUR!"
+'
+
+test_expect_success GETTEXT_LOCALE,LIBPCRE 'grep pcre utf-8 string with "+"' '
+	test_write_lines "TILRAUN: Hallóó Heimur!" >file2 &&
+	git add file2 &&
+	git grep -l --perl-regexp "TILRAUN: H.lló+ Heimur!" >actual &&
+	echo file >expected &&
+	echo file2 >>expected &&
+	test_cmp expected actual
+'
+
 test_expect_success REGEX_LOCALE 'grep literal string, with -F' '
 	git grep --debug -i -F "TILRAUN: Halló Heimur!"  2>&1 >/dev/null |
 		 grep fixed >debug1 &&
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 09/11] diffcore-pickaxe: "share" regex error handling code
  2016-06-23 16:28 [PATCH 00/11] nd/icase updates Nguyễn Thái Ngọc Duy
                   ` (7 preceding siblings ...)
  2016-06-23 16:29 ` [PATCH 08/11] grep/pcre: support utf-8 Nguyễn Thái Ngọc Duy
@ 2016-06-23 16:29 ` Nguyễn Thái Ngọc Duy
  2016-06-23 19:16   ` Jeff King
  2016-06-23 16:29 ` [PATCH 10/11] diffcore-pickaxe: support case insensitive match on non-ascii Nguyễn Thái Ngọc Duy
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-23 16:29 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

There's another regcomp code block coming in this function. By moving
the error handling code out of this block, we don't have to add the
same error handling code in the new block.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diffcore-pickaxe.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 7715c13..69c6567 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -204,20 +204,13 @@ void diffcore_pickaxe(struct diff_options *o)
 	int opts = o->pickaxe_opts;
 	regex_t regex, *regexp = NULL;
 	kwset_t kws = NULL;
+	int err = 0;
 
 	if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
-		int err;
 		int cflags = REG_EXTENDED | REG_NEWLINE;
 		if (DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE))
 			cflags |= REG_ICASE;
 		err = regcomp(&regex, needle, cflags);
-		if (err) {
-			/* The POSIX.2 people are surely sick */
-			char errbuf[1024];
-			regerror(err, &regex, errbuf, 1024);
-			regfree(&regex);
-			die("invalid regex: %s", errbuf);
-		}
 		regexp = &regex;
 	} else {
 		kws = kwsalloc(DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE)
@@ -225,6 +218,13 @@ void diffcore_pickaxe(struct diff_options *o)
 		kwsincr(kws, needle, strlen(needle));
 		kwsprep(kws);
 	}
+	if (err) {
+		/* The POSIX.2 people are surely sick */
+		char errbuf[1024];
+		regerror(err, &regex, errbuf, 1024);
+		regfree(&regex);
+		die("invalid regex: %s", errbuf);
+	}
 
 	/* Might want to warn when both S and G are on; I don't care... */
 	pickaxe(&diff_queued_diff, o, regexp, kws,
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 10/11] diffcore-pickaxe: support case insensitive match on non-ascii
  2016-06-23 16:28 [PATCH 00/11] nd/icase updates Nguyễn Thái Ngọc Duy
                   ` (8 preceding siblings ...)
  2016-06-23 16:29 ` [PATCH 09/11] diffcore-pickaxe: "share" regex error handling code Nguyễn Thái Ngọc Duy
@ 2016-06-23 16:29 ` Nguyễn Thái Ngọc Duy
  2016-06-23 16:29 ` [PATCH 11/11] grep.c: reuse "icase" variable Nguyễn Thái Ngọc Duy
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-23 16:29 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

Similar to the "grep -F -i" case, we can't use kws on icase search
outside ascii range, so we quote the string and pass it to regcomp as
a basic regexp and let regex engine deal with case sensitivity.

The new test is put in t7812 instead of t4209-log-pickaxe because
lib-gettext.sh might cause problems elsewhere, probably.

Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diffcore-pickaxe.c              | 11 +++++++++++
 t/t7812-grep-icase-non-ascii.sh |  7 +++++++
 2 files changed, 18 insertions(+)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 69c6567..0a5f877 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -7,6 +7,8 @@
 #include "diffcore.h"
 #include "xdiff-interface.h"
 #include "kwset.h"
+#include "commit.h"
+#include "quote.h"
 
 typedef int (*pickaxe_fn)(mmfile_t *one, mmfile_t *two,
 			  struct diff_options *o,
@@ -212,6 +214,15 @@ void diffcore_pickaxe(struct diff_options *o)
 			cflags |= REG_ICASE;
 		err = regcomp(&regex, needle, cflags);
 		regexp = &regex;
+	} else if (DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE) &&
+		   has_non_ascii(needle)) {
+		struct strbuf sb = STRBUF_INIT;
+		int cflags = REG_NEWLINE | REG_ICASE;
+
+		basic_regex_quote_buf(&sb, needle);
+		err = regcomp(&regex, sb.buf, cflags);
+		strbuf_release(&sb);
+		regexp = &regex;
 	} else {
 		kws = kwsalloc(DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE)
 			       ? tolower_trans_tbl : NULL);
diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh
index 08ae4c9..169fd8d 100755
--- a/t/t7812-grep-icase-non-ascii.sh
+++ b/t/t7812-grep-icase-non-ascii.sh
@@ -61,4 +61,11 @@ test_expect_success REGEX_LOCALE 'grep string with regex, with -F' '
 	test_cmp expect2 debug2
 '
 
+test_expect_success REGEX_LOCALE 'pickaxe -i on non-ascii' '
+	git commit -m first &&
+	git log --format=%f -i -S"TILRAUN: HALLÓ HEIMUR!" >actual &&
+	echo first >expected &&
+	test_cmp expected actual
+'
+
 test_done
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 11/11] grep.c: reuse "icase" variable
  2016-06-23 16:28 [PATCH 00/11] nd/icase updates Nguyễn Thái Ngọc Duy
                   ` (9 preceding siblings ...)
  2016-06-23 16:29 ` [PATCH 10/11] diffcore-pickaxe: support case insensitive match on non-ascii Nguyễn Thái Ngọc Duy
@ 2016-06-23 16:29 ` Nguyễn Thái Ngọc Duy
  2016-06-23 20:32 ` [PATCH 00/11] nd/icase updates Junio C Hamano
  2016-06-25  5:22 ` [PATCH v2 00/12] " Nguyễn Thái Ngọc Duy
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-23 16:29 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 grep.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/grep.c b/grep.c
index 6e99b01..cb058a5 100644
--- a/grep.c
+++ b/grep.c
@@ -445,10 +445,7 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 		p->fixed = 0;
 
 	if (p->fixed) {
-		if (opt->regflags & REG_ICASE || p->ignore_case)
-			p->kws = kwsalloc(tolower_trans_tbl);
-		else
-			p->kws = kwsalloc(NULL);
+		p->kws = kwsalloc(icase ? tolower_trans_tbl : NULL);
 		kwsincr(p->kws, p->pattern, p->patternlen);
 		kwsprep(p->kws);
 		return;
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 09/11] diffcore-pickaxe: "share" regex error handling code
  2016-06-23 16:29 ` [PATCH 09/11] diffcore-pickaxe: "share" regex error handling code Nguyễn Thái Ngọc Duy
@ 2016-06-23 19:16   ` Jeff King
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2016-06-23 19:16 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Junio C Hamano

On Thu, Jun 23, 2016 at 06:29:05PM +0200, Nguyễn Thái Ngọc Duy wrote:

> diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
> index 7715c13..69c6567 100644
> --- a/diffcore-pickaxe.c
> +++ b/diffcore-pickaxe.c
> @@ -204,20 +204,13 @@ void diffcore_pickaxe(struct diff_options *o)
>  	int opts = o->pickaxe_opts;
>  	regex_t regex, *regexp = NULL;
>  	kwset_t kws = NULL;
> +	int err = 0;
>  
>  	if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
> -		int err;
>  		int cflags = REG_EXTENDED | REG_NEWLINE;
>  		if (DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE))
>  			cflags |= REG_ICASE;
>  		err = regcomp(&regex, needle, cflags);
> -		if (err) {
> -			/* The POSIX.2 people are surely sick */
> -			char errbuf[1024];
> -			regerror(err, &regex, errbuf, 1024);
> -			regfree(&regex);
> -			die("invalid regex: %s", errbuf);
> -		}
>  		regexp = &regex;
>  	} else {
>  		kws = kwsalloc(DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE)
> @@ -225,6 +218,13 @@ void diffcore_pickaxe(struct diff_options *o)
>  		kwsincr(kws, needle, strlen(needle));
>  		kwsprep(kws);
>  	}
> +	if (err) {
> +		/* The POSIX.2 people are surely sick */
> +		char errbuf[1024];
> +		regerror(err, &regex, errbuf, 1024);
> +		regfree(&regex);
> +		die("invalid regex: %s", errbuf);
> +	}

Hrm. I wondered what happens if we see an error in the kwset code block,
which did not put anything useful in "regex" at all.

It's OK right now, because "err" is newly promoted to the top of the
function, and so we know that kwset cannot call it. But it seems like
an accident waiting to happen. Calling it "regex_err" or something might
help.

But I also wonder if a function wouldn't be better. You could even roll
it up with regcomp, like:

  static void regcomp_or_die(regex_t *regex, const char *pattern, int flags)
  {
	int err = regcomp(regex, pattern, flags);
	if (err) {
		char buf[1024];
		regerror(err, &regex, buf, sizeof(buf));
		regfree(&regex);
		die("invalid regex: %s", buf);
	}
  }

I think you could also skip the regfree(), since we are about to die. I
also think the error message would probably be better if it mentioned
the text of "pattern" itself (since it might be coming from config, or
you might have provided several patterns, or you might have thought
something was supposed to be a non-regex).

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 04/11] grep/icase: avoid kwsset on literal non-ascii strings
  2016-06-23 16:29 ` [PATCH 04/11] grep/icase: avoid kwsset on literal non-ascii strings Nguyễn Thái Ngọc Duy
@ 2016-06-23 20:12   ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-06-23 20:12 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

>  static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
>  {
> +	int icase, ascii_only;
>  	int err;
>  
>  	p->word_regexp = opt->word_regexp;
>  	p->ignore_case = opt->ignore_case;
> +	icase	       = opt->regflags & REG_ICASE || p->ignore_case;
> +	ascii_only     = !has_non_ascii(p->pattern);
>  
>  	if (opt->fixed) {
>  		p->fixed = 1;
> -	} else if (is_fixed(p->pattern, p->patternlen))
> +	} else if ((!icase || ascii_only) &&
> +		   is_fixed(p->pattern, p->patternlen))
>  		p->fixed = 1;

... we are not told to do "fixed" explicitly with "-F", and that is
safe for a literal pattern if the pattern is only ascii (with or
without -i) but not safe with "-i" when dealing with non-ascii
pattern.

Makes perfect sense.

>  	else
>  		p->fixed = 0;
> diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh
> new file mode 100755
> index 0000000..b78a774
> --- /dev/null
> +++ b/t/t7812-grep-icase-non-ascii.sh
> @@ -0,0 +1,23 @@
> +#!/bin/sh
> +
> +test_description='grep icase on non-English locales'
> +
> +. ./lib-gettext.sh
> +
> +test_expect_success GETTEXT_LOCALE 'setup' '
> +	test_write_lines "TILRAUN: Halló Heimur!" >file &&
> +	git add file &&
> +	LC_ALL="$is_IS_locale" &&
> +	export LC_ALL
> +'
> +
> +test_have_prereq GETTEXT_LOCALE &&
> +test-regex "HALLÓ" "Halló" ICASE &&
> +test_set_prereq REGEX_LOCALE
> +
> +test_expect_success REGEX_LOCALE 'grep literal string, no -F' '
> +	git grep -i "TILRAUN: Halló Heimur!" &&
> +	git grep -i "TILRAUN: HALLÓ HEIMUR!"
> +'
> +
> +test_done

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 05/11] grep/icase: avoid kwsset when -F is specified
  2016-06-23 16:29 ` [PATCH 05/11] grep/icase: avoid kwsset when -F is specified Nguyễn Thái Ngọc Duy
@ 2016-06-23 20:25   ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-06-23 20:25 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> @@ -408,7 +427,11 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
>  	ascii_only     = !has_non_ascii(p->pattern);
>  
>  	if (opt->fixed) {
> -		p->fixed = 1;
> +		p->fixed = !icase || ascii_only;
> +		if (!p->fixed) {
> +			compile_fixed_regexp(p, opt);
> +			return;
> +		}
>  	} else if ((!icase || ascii_only) &&
>  		   is_fixed(p->pattern, p->patternlen))
>  		p->fixed = 1;

Makes me feel somewhat dirty to see that we do the same "!icase ||
ascii_only" on both sides of the if/else cascade.

I wonder if it would be more readable to structure it like this
instead?

	if (opt->fixed || is_fixed(...))
		p->fixed = (!icase || ascii_only);
	else
        	p->fixed = 0;

	if (p->fixed)
        	/* do the kws thing and return */
	else if (opt->fixed)
        	/* do the "quote into regexp" thing and return */

	/* pcre and other from the original follows */


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 00/11] nd/icase updates
  2016-06-23 16:28 [PATCH 00/11] nd/icase updates Nguyễn Thái Ngọc Duy
                   ` (10 preceding siblings ...)
  2016-06-23 16:29 ` [PATCH 11/11] grep.c: reuse "icase" variable Nguyễn Thái Ngọc Duy
@ 2016-06-23 20:32 ` Junio C Hamano
  2016-06-25  5:22 ` [PATCH v2 00/12] " Nguyễn Thái Ngọc Duy
  12 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-06-23 20:32 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> This update drops 1/12, which is an unnecessary change, and changes a
> couple of echo/printf to test_write_lines. One of those echo uses
> backlashes and causes problems with Debian dash.
>
> Interdiff therefore is not really interesting

Neither Debian dash (that understands XSI echo) nor other shells
(that do not do XSI echo) was at fault, though.  It was "one of
thoese echo was written in an unportable way with backslashes" ;-)

In any case, I re-read the whole thing, and except for a small nit I
sent separately, all looked sensible.

Thanks, will replace what has been queued on 'pu'.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2 00/12] nd/icase updates
  2016-06-23 16:28 [PATCH 00/11] nd/icase updates Nguyễn Thái Ngọc Duy
                   ` (11 preceding siblings ...)
  2016-06-23 20:32 ` [PATCH 00/11] nd/icase updates Junio C Hamano
@ 2016-06-25  5:22 ` Nguyễn Thái Ngọc Duy
  2016-06-25  5:22   ` [PATCH v2 01/12] grep: break down an "if" stmt in preparation for next changes Nguyễn Thái Ngọc Duy
                     ` (12 more replies)
  12 siblings, 13 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-25  5:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

v2 fixes Junio's and Jeff's comments (both good). The sharing "!icase
|| ascii_only" is made a separate commit (6/12) because I think it
takes some seconds to realize that the conversion is correct and it's
technically not needed in 5/12 (and it's sort of the opposite of 1/12)

Interdiff

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 0a5f877..55067ca 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -200,19 +200,30 @@ static void pickaxe(struct diff_queue_struct *q, struct diff_options *o,
 	*q = outq;
 }
 
+static void regcomp_or_die(regex_t *regex, const char *needle, int cflags)
+{
+	int err = regcomp(regex, needle, cflags);
+	if (err) {
+		/* The POSIX.2 people are surely sick */
+		char errbuf[1024];
+		regerror(err, regex, errbuf, 1024);
+		regfree(regex);
+		die("invalid regex: %s", errbuf);
+	}
+}
+
 void diffcore_pickaxe(struct diff_options *o)
 {
 	const char *needle = o->pickaxe;
 	int opts = o->pickaxe_opts;
 	regex_t regex, *regexp = NULL;
 	kwset_t kws = NULL;
-	int err = 0;
 
 	if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
 		int cflags = REG_EXTENDED | REG_NEWLINE;
 		if (DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE))
 			cflags |= REG_ICASE;
-		err = regcomp(&regex, needle, cflags);
+		regcomp_or_die(&regex, needle, cflags);
 		regexp = &regex;
 	} else if (DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE) &&
 		   has_non_ascii(needle)) {
@@ -220,7 +231,7 @@ void diffcore_pickaxe(struct diff_options *o)
 		int cflags = REG_NEWLINE | REG_ICASE;
 
 		basic_regex_quote_buf(&sb, needle);
-		err = regcomp(&regex, sb.buf, cflags);
+		regcomp_or_die(&regex, sb.buf, cflags);
 		strbuf_release(&sb);
 		regexp = &regex;
 	} else {
@@ -229,13 +240,6 @@ void diffcore_pickaxe(struct diff_options *o)
 		kwsincr(kws, needle, strlen(needle));
 		kwsprep(kws);
 	}
-	if (err) {
-		/* The POSIX.2 people are surely sick */
-		char errbuf[1024];
-		regerror(err, &regex, errbuf, 1024);
-		regfree(&regex);
-		die("invalid regex: %s", errbuf);
-	}
 
 	/* Might want to warn when both S and G are on; I don't care... */
 	pickaxe(&diff_queued_diff, o, regexp, kws,
diff --git a/grep.c b/grep.c
index cb058a5..92587a8 100644
--- a/grep.c
+++ b/grep.c
@@ -432,15 +432,8 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 	icase	       = opt->regflags & REG_ICASE || p->ignore_case;
 	ascii_only     = !has_non_ascii(p->pattern);
 
-	if (opt->fixed) {
+	if (opt->fixed || is_fixed(p->pattern, p->patternlen))
 		p->fixed = !icase || ascii_only;
-		if (!p->fixed) {
-			compile_fixed_regexp(p, opt);
-			return;
-		}
-	} else if ((!icase || ascii_only) &&
-		   is_fixed(p->pattern, p->patternlen))
-		p->fixed = 1;
 	else
 		p->fixed = 0;
 
@@ -449,6 +442,9 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 		kwsincr(p->kws, p->pattern, p->patternlen);
 		kwsprep(p->kws);
 		return;
+	} else if (opt->fixed) {
+		compile_fixed_regexp(p, opt);
+		return;
 	}
 
 	if (opt->pcre) {

Nguyễn Thái Ngọc Duy (12):
  grep: break down an "if" stmt in preparation for next changes
  test-regex: isolate the bug test code
  test-regex: expose full regcomp() to the command line
  grep/icase: avoid kwsset on literal non-ascii strings
  grep/icase: avoid kwsset when -F is specified
  grep: rewrite an if/else condition to avoid duplicate expression
  grep/pcre: prepare locale-dependent tables for icase matching
  gettext: add is_utf8_locale()
  grep/pcre: support utf-8
  diffcore-pickaxe: Add regcomp_or_die()
  diffcore-pickaxe: support case insensitive match on non-ascii
  grep.c: reuse "icase" variable

 diffcore-pickaxe.c                       | 33 +++++++++++----
 gettext.c                                | 24 ++++++++++-
 gettext.h                                |  1 +
 grep.c                                   | 43 +++++++++++++++----
 grep.h                                   |  1 +
 quote.c                                  | 37 +++++++++++++++++
 quote.h                                  |  1 +
 t/t0070-fundamental.sh                   |  2 +-
 t/t7812-grep-icase-non-ascii.sh (new +x) | 71 ++++++++++++++++++++++++++++++++
 t/t7813-grep-icase-iso.sh (new +x)       | 19 +++++++++
 test-regex.c                             | 59 +++++++++++++++++++++++++-
 11 files changed, 270 insertions(+), 21 deletions(-)
 create mode 100755 t/t7812-grep-icase-non-ascii.sh
 create mode 100755 t/t7813-grep-icase-iso.sh

-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 01/12] grep: break down an "if" stmt in preparation for next changes
  2016-06-25  5:22 ` [PATCH v2 00/12] " Nguyễn Thái Ngọc Duy
@ 2016-06-25  5:22   ` Nguyễn Thái Ngọc Duy
  2016-06-25  5:22   ` [PATCH v2 02/12] test-regex: isolate the bug test code Nguyễn Thái Ngọc Duy
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-25  5:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 grep.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 7b2b96a..f430d7e 100644
--- a/grep.c
+++ b/grep.c
@@ -403,7 +403,9 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 	p->word_regexp = opt->word_regexp;
 	p->ignore_case = opt->ignore_case;
 
-	if (opt->fixed || is_fixed(p->pattern, p->patternlen))
+	if (opt->fixed)
+		p->fixed = 1;
+	else if (is_fixed(p->pattern, p->patternlen))
 		p->fixed = 1;
 	else
 		p->fixed = 0;
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 02/12] test-regex: isolate the bug test code
  2016-06-25  5:22 ` [PATCH v2 00/12] " Nguyễn Thái Ngọc Duy
  2016-06-25  5:22   ` [PATCH v2 01/12] grep: break down an "if" stmt in preparation for next changes Nguyễn Thái Ngọc Duy
@ 2016-06-25  5:22   ` Nguyễn Thái Ngọc Duy
  2016-06-25  5:22   ` [PATCH v2 03/12] test-regex: expose full regcomp() to the command line Nguyễn Thái Ngọc Duy
                     ` (10 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-25  5:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

This is in preparation to turn test-regex into some generic regex
testing command.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t0070-fundamental.sh |  2 +-
 test-regex.c           | 12 ++++++++++--
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/t/t0070-fundamental.sh b/t/t0070-fundamental.sh
index 5ed69a6..991ed2a 100755
--- a/t/t0070-fundamental.sh
+++ b/t/t0070-fundamental.sh
@@ -31,7 +31,7 @@ test_expect_success 'git_mkstemps_mode does not fail if fd 0 is not open' '
 
 test_expect_success 'check for a bug in the regex routines' '
 	# if this test fails, re-build git with NO_REGEX=1
-	test-regex
+	test-regex --bug
 '
 
 test_done
diff --git a/test-regex.c b/test-regex.c
index 0dc598e..67a1a65 100644
--- a/test-regex.c
+++ b/test-regex.c
@@ -1,6 +1,6 @@
 #include "git-compat-util.h"
 
-int main(int argc, char **argv)
+static int test_regex_bug(void)
 {
 	char *pat = "[^={} \t]+";
 	char *str = "={}\nfred";
@@ -16,5 +16,13 @@ int main(int argc, char **argv)
 	if (m[0].rm_so == 3) /* matches '\n' when it should not */
 		die("regex bug confirmed: re-build git with NO_REGEX=1");
 
-	exit(0);
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	if (argc == 2 && !strcmp(argv[1], "--bug"))
+		return test_regex_bug();
+	else
+		usage("test-regex --bug");
 }
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 03/12] test-regex: expose full regcomp() to the command line
  2016-06-25  5:22 ` [PATCH v2 00/12] " Nguyễn Thái Ngọc Duy
  2016-06-25  5:22   ` [PATCH v2 01/12] grep: break down an "if" stmt in preparation for next changes Nguyễn Thái Ngọc Duy
  2016-06-25  5:22   ` [PATCH v2 02/12] test-regex: isolate the bug test code Nguyễn Thái Ngọc Duy
@ 2016-06-25  5:22   ` Nguyễn Thái Ngọc Duy
  2016-06-25  5:22   ` [PATCH v2 04/12] grep/icase: avoid kwsset on literal non-ascii strings Nguyễn Thái Ngọc Duy
                     ` (9 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-25  5:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 test-regex.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 49 insertions(+), 2 deletions(-)

diff --git a/test-regex.c b/test-regex.c
index 67a1a65..eff26f5 100644
--- a/test-regex.c
+++ b/test-regex.c
@@ -1,4 +1,21 @@
 #include "git-compat-util.h"
+#include "gettext.h"
+
+struct reg_flag {
+	const char *name;
+	int flag;
+};
+
+static struct reg_flag reg_flags[] = {
+	{ "EXTENDED",	 REG_EXTENDED	},
+	{ "NEWLINE",	 REG_NEWLINE	},
+	{ "ICASE",	 REG_ICASE	},
+	{ "NOTBOL",	 REG_NOTBOL	},
+#ifdef REG_STARTEND
+	{ "STARTEND",	 REG_STARTEND	},
+#endif
+	{ NULL, 0 }
+};
 
 static int test_regex_bug(void)
 {
@@ -21,8 +38,38 @@ static int test_regex_bug(void)
 
 int main(int argc, char **argv)
 {
+	const char *pat;
+	const char *str;
+	int flags = 0;
+	regex_t r;
+	regmatch_t m[1];
+
 	if (argc == 2 && !strcmp(argv[1], "--bug"))
 		return test_regex_bug();
-	else
-		usage("test-regex --bug");
+	else if (argc < 3)
+		usage("test-regex --bug\n"
+		      "test-regex <pattern> <string> [<options>]");
+
+	argv++;
+	pat = *argv++;
+	str = *argv++;
+	while (*argv) {
+		struct reg_flag *rf;
+		for (rf = reg_flags; rf->name; rf++)
+			if (!strcmp(*argv, rf->name)) {
+				flags |= rf->flag;
+				break;
+			}
+		if (!rf->name)
+			die("do not recognize %s", *argv);
+		argv++;
+	}
+	git_setup_gettext();
+
+	if (regcomp(&r, pat, flags))
+		die("failed regcomp() for pattern '%s'", pat);
+	if (regexec(&r, str, 1, m, 0))
+		return 1;
+
+	return 0;
 }
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 04/12] grep/icase: avoid kwsset on literal non-ascii strings
  2016-06-25  5:22 ` [PATCH v2 00/12] " Nguyễn Thái Ngọc Duy
                     ` (2 preceding siblings ...)
  2016-06-25  5:22   ` [PATCH v2 03/12] test-regex: expose full regcomp() to the command line Nguyễn Thái Ngọc Duy
@ 2016-06-25  5:22   ` Nguyễn Thái Ngọc Duy
  2016-06-25  5:22   ` [PATCH v2 05/12] grep/icase: avoid kwsset when -F is specified Nguyễn Thái Ngọc Duy
                     ` (8 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-25  5:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

When we detect the pattern is just a literal string, we avoid heavy
regex engine and use fast substring search implemented in kwsset.c.
But kws uses git-ctype which is locale-independent so it does not know
how to fold case properly outside ascii range. Let regcomp or pcre
take care of this case instead. Slower, but accurate.

Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Helped-by: René Scharfe <l.s.r@web.de>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 grep.c                                   |  7 ++++++-
 t/t7812-grep-icase-non-ascii.sh (new +x) | 23 +++++++++++++++++++++++
 2 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100755 t/t7812-grep-icase-non-ascii.sh

diff --git a/grep.c b/grep.c
index f430d7e..451275d 100644
--- a/grep.c
+++ b/grep.c
@@ -4,6 +4,7 @@
 #include "xdiff-interface.h"
 #include "diff.h"
 #include "diffcore.h"
+#include "commit.h"
 
 static int grep_source_load(struct grep_source *gs);
 static int grep_source_is_binary(struct grep_source *gs);
@@ -398,14 +399,18 @@ static int is_fixed(const char *s, size_t len)
 
 static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
+	int icase, ascii_only;
 	int err;
 
 	p->word_regexp = opt->word_regexp;
 	p->ignore_case = opt->ignore_case;
+	icase	       = opt->regflags & REG_ICASE || p->ignore_case;
+	ascii_only     = !has_non_ascii(p->pattern);
 
 	if (opt->fixed)
 		p->fixed = 1;
-	else if (is_fixed(p->pattern, p->patternlen))
+	else if ((!icase || ascii_only) &&
+		 is_fixed(p->pattern, p->patternlen))
 		p->fixed = 1;
 	else
 		p->fixed = 0;
diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh
new file mode 100755
index 0000000..b78a774
--- /dev/null
+++ b/t/t7812-grep-icase-non-ascii.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+test_description='grep icase on non-English locales'
+
+. ./lib-gettext.sh
+
+test_expect_success GETTEXT_LOCALE 'setup' '
+	test_write_lines "TILRAUN: Halló Heimur!" >file &&
+	git add file &&
+	LC_ALL="$is_IS_locale" &&
+	export LC_ALL
+'
+
+test_have_prereq GETTEXT_LOCALE &&
+test-regex "HALLÓ" "Halló" ICASE &&
+test_set_prereq REGEX_LOCALE
+
+test_expect_success REGEX_LOCALE 'grep literal string, no -F' '
+	git grep -i "TILRAUN: Halló Heimur!" &&
+	git grep -i "TILRAUN: HALLÓ HEIMUR!"
+'
+
+test_done
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 05/12] grep/icase: avoid kwsset when -F is specified
  2016-06-25  5:22 ` [PATCH v2 00/12] " Nguyễn Thái Ngọc Duy
                     ` (3 preceding siblings ...)
  2016-06-25  5:22   ` [PATCH v2 04/12] grep/icase: avoid kwsset on literal non-ascii strings Nguyễn Thái Ngọc Duy
@ 2016-06-25  5:22   ` Nguyễn Thái Ngọc Duy
  2016-06-25  5:22   ` [PATCH v2 06/12] grep: rewrite an if/else condition to avoid duplicate expression Nguyễn Thái Ngọc Duy
                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-25  5:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

Similar to the previous commit, we can't use kws on icase search
outside ascii range. But we can't simply pass the pattern to
regcomp/pcre like the previous commit because it may contain regex
special characters, so we need to quote the regex first.

To avoid misquote traps that could lead to undefined behavior, we
always stick to basic regex engine in this case. We don't need fancy
features for grepping a literal string anyway.

basic_regex_quote_buf() assumes that if the pattern is in a multibyte
encoding, ascii chars must be unambiguously encoded as single
bytes. This is true at least for UTF-8. For others, let's wait until
people yell up. Chances are nobody uses multibyte, non utf-8 charsets
anymore.

Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Helped-by: René Scharfe <l.s.r@web.de>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 grep.c                          | 24 +++++++++++++++++++++++-
 quote.c                         | 37 +++++++++++++++++++++++++++++++++++++
 quote.h                         |  1 +
 t/t7812-grep-icase-non-ascii.sh | 26 ++++++++++++++++++++++++++
 4 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 451275d..5a8c52a 100644
--- a/grep.c
+++ b/grep.c
@@ -5,6 +5,7 @@
 #include "diff.h"
 #include "diffcore.h"
 #include "commit.h"
+#include "quote.h"
 
 static int grep_source_load(struct grep_source *gs);
 static int grep_source_is_binary(struct grep_source *gs);
@@ -397,6 +398,24 @@ static int is_fixed(const char *s, size_t len)
 	return 1;
 }
 
+static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int err;
+
+	basic_regex_quote_buf(&sb, p->pattern);
+	err = regcomp(&p->regexp, sb.buf, opt->regflags & ~REG_EXTENDED);
+	if (opt->debug)
+		fprintf(stderr, "fixed %s\n", sb.buf);
+	strbuf_release(&sb);
+	if (err) {
+		char errbuf[1024];
+		regerror(err, &p->regexp, errbuf, sizeof(errbuf));
+		regfree(&p->regexp);
+		compile_regexp_failed(p, errbuf);
+	}
+}
+
 static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
 	int icase, ascii_only;
@@ -408,7 +427,7 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 	ascii_only     = !has_non_ascii(p->pattern);
 
 	if (opt->fixed)
-		p->fixed = 1;
+		p->fixed = !icase || ascii_only;
 	else if ((!icase || ascii_only) &&
 		 is_fixed(p->pattern, p->patternlen))
 		p->fixed = 1;
@@ -423,6 +442,9 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 		kwsincr(p->kws, p->pattern, p->patternlen);
 		kwsprep(p->kws);
 		return;
+	} else if (opt->fixed) {
+		compile_fixed_regexp(p, opt);
+		return;
 	}
 
 	if (opt->pcre) {
diff --git a/quote.c b/quote.c
index fe884d2..c67adb7 100644
--- a/quote.c
+++ b/quote.c
@@ -440,3 +440,40 @@ void tcl_quote_buf(struct strbuf *sb, const char *src)
 	}
 	strbuf_addch(sb, '"');
 }
+
+void basic_regex_quote_buf(struct strbuf *sb, const char *src)
+{
+	char c;
+
+	if (*src == '^') {
+		/* only beginning '^' is special and needs quoting */
+		strbuf_addch(sb, '\\');
+		strbuf_addch(sb, *src++);
+	}
+	if (*src == '*')
+		/* beginning '*' is not special, no quoting */
+		strbuf_addch(sb, *src++);
+
+	while ((c = *src++)) {
+		switch (c) {
+		case '[':
+		case '.':
+		case '\\':
+		case '*':
+			strbuf_addch(sb, '\\');
+			strbuf_addch(sb, c);
+			break;
+
+		case '$':
+			/* only the end '$' is special and needs quoting */
+			if (*src == '\0')
+				strbuf_addch(sb, '\\');
+			strbuf_addch(sb, c);
+			break;
+
+		default:
+			strbuf_addch(sb, c);
+			break;
+		}
+	}
+}
diff --git a/quote.h b/quote.h
index 99e04d3..362d315 100644
--- a/quote.h
+++ b/quote.h
@@ -67,5 +67,6 @@ extern char *quote_path_relative(const char *in, const char *prefix,
 extern void perl_quote_buf(struct strbuf *sb, const char *src);
 extern void python_quote_buf(struct strbuf *sb, const char *src);
 extern void tcl_quote_buf(struct strbuf *sb, const char *src);
+extern void basic_regex_quote_buf(struct strbuf *sb, const char *src);
 
 #endif
diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh
index b78a774..1929809 100755
--- a/t/t7812-grep-icase-non-ascii.sh
+++ b/t/t7812-grep-icase-non-ascii.sh
@@ -20,4 +20,30 @@ test_expect_success REGEX_LOCALE 'grep literal string, no -F' '
 	git grep -i "TILRAUN: HALLÓ HEIMUR!"
 '
 
+test_expect_success REGEX_LOCALE 'grep literal string, with -F' '
+	git grep --debug -i -F "TILRAUN: Halló Heimur!"  2>&1 >/dev/null |
+		 grep fixed >debug1 &&
+	test_write_lines "fixed TILRAUN: Halló Heimur!" >expect1 &&
+	test_cmp expect1 debug1 &&
+
+	git grep --debug -i -F "TILRAUN: HALLÓ HEIMUR!"  2>&1 >/dev/null |
+		 grep fixed >debug2 &&
+	test_write_lines "fixed TILRAUN: HALLÓ HEIMUR!" >expect2 &&
+	test_cmp expect2 debug2
+'
+
+test_expect_success REGEX_LOCALE 'grep string with regex, with -F' '
+	test_write_lines "^*TILR^AUN:.* \\Halló \$He[]imur!\$" >file &&
+
+	git grep --debug -i -F "^*TILR^AUN:.* \\Halló \$He[]imur!\$" 2>&1 >/dev/null |
+		 grep fixed >debug1 &&
+	test_write_lines "fixed \\^*TILR^AUN:\\.\\* \\\\Halló \$He\\[]imur!\\\$" >expect1 &&
+	test_cmp expect1 debug1 &&
+
+	git grep --debug -i -F "^*TILR^AUN:.* \\HALLÓ \$HE[]IMUR!\$"  2>&1 >/dev/null |
+		 grep fixed >debug2 &&
+	test_write_lines "fixed \\^*TILR^AUN:\\.\\* \\\\HALLÓ \$HE\\[]IMUR!\\\$" >expect2 &&
+	test_cmp expect2 debug2
+'
+
 test_done
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 06/12] grep: rewrite an if/else condition to avoid duplicate expression
  2016-06-25  5:22 ` [PATCH v2 00/12] " Nguyễn Thái Ngọc Duy
                     ` (4 preceding siblings ...)
  2016-06-25  5:22   ` [PATCH v2 05/12] grep/icase: avoid kwsset when -F is specified Nguyễn Thái Ngọc Duy
@ 2016-06-25  5:22   ` Nguyễn Thái Ngọc Duy
  2016-06-25  5:22   ` [PATCH v2 07/12] grep/pcre: prepare locale-dependent tables for icase matching Nguyễn Thái Ngọc Duy
                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-25  5:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

"!icase || ascii_only" is repeated twice in this if/else chain as this
series evolves. Rewrite it (and basically revert the first if
condition back to before the "grep: break down an "if" stmt..." commit).

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 grep.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/grep.c b/grep.c
index 5a8c52a..4b6b7bc 100644
--- a/grep.c
+++ b/grep.c
@@ -426,11 +426,8 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 	icase	       = opt->regflags & REG_ICASE || p->ignore_case;
 	ascii_only     = !has_non_ascii(p->pattern);
 
-	if (opt->fixed)
+	if (opt->fixed || is_fixed(p->pattern, p->patternlen))
 		p->fixed = !icase || ascii_only;
-	else if ((!icase || ascii_only) &&
-		 is_fixed(p->pattern, p->patternlen))
-		p->fixed = 1;
 	else
 		p->fixed = 0;
 
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 07/12] grep/pcre: prepare locale-dependent tables for icase matching
  2016-06-25  5:22 ` [PATCH v2 00/12] " Nguyễn Thái Ngọc Duy
                     ` (5 preceding siblings ...)
  2016-06-25  5:22   ` [PATCH v2 06/12] grep: rewrite an if/else condition to avoid duplicate expression Nguyễn Thái Ngọc Duy
@ 2016-06-25  5:22   ` Nguyễn Thái Ngọc Duy
  2016-06-25  5:22   ` [PATCH v2 08/12] gettext: add is_utf8_locale() Nguyễn Thái Ngọc Duy
                     ` (5 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-25  5:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 2333 bytes --]

The default tables are usually built with C locale and only suitable
for LANG=C or similar.  This should make case insensitive search work
correctly for all single-byte charsets.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 grep.c                             |  8 ++++++--
 grep.h                             |  1 +
 t/t7813-grep-icase-iso.sh (new +x) | 19 +++++++++++++++++++
 3 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100755 t/t7813-grep-icase-iso.sh

diff --git a/grep.c b/grep.c
index 4b6b7bc..8cf4247 100644
--- a/grep.c
+++ b/grep.c
@@ -324,11 +324,14 @@ static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
 	int erroffset;
 	int options = PCRE_MULTILINE;
 
-	if (opt->ignore_case)
+	if (opt->ignore_case) {
+		if (has_non_ascii(p->pattern))
+			p->pcre_tables = pcre_maketables();
 		options |= PCRE_CASELESS;
+	}
 
 	p->pcre_regexp = pcre_compile(p->pattern, options, &error, &erroffset,
-			NULL);
+				      p->pcre_tables);
 	if (!p->pcre_regexp)
 		compile_regexp_failed(p, error);
 
@@ -362,6 +365,7 @@ static void free_pcre_regexp(struct grep_pat *p)
 {
 	pcre_free(p->pcre_regexp);
 	pcre_free(p->pcre_extra_info);
+	pcre_free((void *)p->pcre_tables);
 }
 #else /* !USE_LIBPCRE */
 static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
diff --git a/grep.h b/grep.h
index 95f197a..cee4357 100644
--- a/grep.h
+++ b/grep.h
@@ -48,6 +48,7 @@ struct grep_pat {
 	regex_t regexp;
 	pcre *pcre_regexp;
 	pcre_extra *pcre_extra_info;
+	const unsigned char *pcre_tables;
 	kwset_t kws;
 	unsigned fixed:1;
 	unsigned ignore_case:1;
diff --git a/t/t7813-grep-icase-iso.sh b/t/t7813-grep-icase-iso.sh
new file mode 100755
index 0000000..efef7fb
--- /dev/null
+++ b/t/t7813-grep-icase-iso.sh
@@ -0,0 +1,19 @@
+#!/bin/sh
+
+test_description='grep icase on non-English locales'
+
+. ./lib-gettext.sh
+
+test_expect_success GETTEXT_ISO_LOCALE 'setup' '
+	printf "TILRAUN: Halló Heimur!" >file &&
+	git add file &&
+	LC_ALL="$is_IS_iso_locale" &&
+	export LC_ALL
+'
+
+test_expect_success GETTEXT_ISO_LOCALE,LIBPCRE 'grep pcre string' '
+	git grep --perl-regexp -i "TILRAUN: H.lló Heimur!" &&
+	git grep --perl-regexp -i "TILRAUN: H.LLÓ HEIMUR!"
+'
+
+test_done
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 08/12] gettext: add is_utf8_locale()
  2016-06-25  5:22 ` [PATCH v2 00/12] " Nguyễn Thái Ngọc Duy
                     ` (6 preceding siblings ...)
  2016-06-25  5:22   ` [PATCH v2 07/12] grep/pcre: prepare locale-dependent tables for icase matching Nguyễn Thái Ngọc Duy
@ 2016-06-25  5:22   ` Nguyễn Thái Ngọc Duy
  2016-06-25  5:22   ` [PATCH v2 09/12] grep/pcre: support utf-8 Nguyễn Thái Ngọc Duy
                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-25  5:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

This function returns true if git is running under an UTF-8
locale. pcre in the next patch will need this.

is_encoding_utf8() is used instead of strcmp() to catch both "utf-8"
and "utf8" suffixes.

When built with no gettext support, we peek in several env variables
to detect UTF-8. pcre library might support utf-8 even if libc is
built without locale support.. The peeking code is a copy from
compat/regex/regcomp.c

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 gettext.c | 24 ++++++++++++++++++++++--
 gettext.h |  1 +
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/gettext.c b/gettext.c
index a268a2c..db727ea 100644
--- a/gettext.c
+++ b/gettext.c
@@ -18,6 +18,8 @@
 #	endif
 #endif
 
+static const char *charset;
+
 /*
  * Guess the user's preferred languages from the value in LANGUAGE environment
  * variable and LC_MESSAGES locale category if NO_GETTEXT is not defined.
@@ -65,7 +67,6 @@ static int test_vsnprintf(const char *fmt, ...)
 	return ret;
 }
 
-static const char *charset;
 static void init_gettext_charset(const char *domain)
 {
 	/*
@@ -172,8 +173,27 @@ int gettext_width(const char *s)
 {
 	static int is_utf8 = -1;
 	if (is_utf8 == -1)
-		is_utf8 = !strcmp(charset, "UTF-8");
+		is_utf8 = is_utf8_locale();
 
 	return is_utf8 ? utf8_strwidth(s) : strlen(s);
 }
 #endif
+
+int is_utf8_locale(void)
+{
+#ifdef NO_GETTEXT
+	if (!charset) {
+		const char *env = getenv("LC_ALL");
+		if (!env || !*env)
+			env = getenv("LC_CTYPE");
+		if (!env || !*env)
+			env = getenv("LANG");
+		if (!env)
+			env = "";
+		if (strchr(env, '.'))
+			env = strchr(env, '.') + 1;
+		charset = xstrdup(env);
+	}
+#endif
+	return is_encoding_utf8(charset);
+}
diff --git a/gettext.h b/gettext.h
index 33696a4..7eee64a 100644
--- a/gettext.h
+++ b/gettext.h
@@ -90,5 +90,6 @@ const char *Q_(const char *msgid, const char *plu, unsigned long n)
 #endif
 
 const char *get_preferred_languages(void);
+extern int is_utf8_locale(void);
 
 #endif
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 09/12] grep/pcre: support utf-8
  2016-06-25  5:22 ` [PATCH v2 00/12] " Nguyễn Thái Ngọc Duy
                     ` (7 preceding siblings ...)
  2016-06-25  5:22   ` [PATCH v2 08/12] gettext: add is_utf8_locale() Nguyễn Thái Ngọc Duy
@ 2016-06-25  5:22   ` Nguyễn Thái Ngọc Duy
  2016-06-25  5:22   ` [PATCH v2 10/12] diffcore-pickaxe: Add regcomp_or_die() Nguyễn Thái Ngọc Duy
                     ` (3 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-25  5:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

In the previous change in this function, we add locale support for
single-byte encodings only. It looks like pcre only supports utf-* as
multibyte encodings, the others are left in the cold (which is
fine).

We need to enable PCRE_UTF8 so pcre can find character boundary
correctly. It's needed for case folding (when --ignore-case is used)
or '*', '+' or similar syntax is used.

The "has_non_ascii()" check is to be on the conservative side. If
there's non-ascii in the pattern, the searched content could still be
in utf-8, but we can treat it just like a byte stream and everything
should work. If we force utf-8 based on locale only and pcre validates
utf-8 and the file content is in non-utf8 encoding, things break.

Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Helped-by: Plamen Totev <plamen.totev@abv.bg>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 grep.c                          |  2 ++
 t/t7812-grep-icase-non-ascii.sh | 15 +++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/grep.c b/grep.c
index 8cf4247..3d63612 100644
--- a/grep.c
+++ b/grep.c
@@ -329,6 +329,8 @@ static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
 			p->pcre_tables = pcre_maketables();
 		options |= PCRE_CASELESS;
 	}
+	if (is_utf8_locale() && has_non_ascii(p->pattern))
+		options |= PCRE_UTF8;
 
 	p->pcre_regexp = pcre_compile(p->pattern, options, &error, &erroffset,
 				      p->pcre_tables);
diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh
index 1929809..08ae4c9 100755
--- a/t/t7812-grep-icase-non-ascii.sh
+++ b/t/t7812-grep-icase-non-ascii.sh
@@ -20,6 +20,21 @@ test_expect_success REGEX_LOCALE 'grep literal string, no -F' '
 	git grep -i "TILRAUN: HALLÓ HEIMUR!"
 '
 
+test_expect_success GETTEXT_LOCALE,LIBPCRE 'grep pcre utf-8 icase' '
+	git grep --perl-regexp    "TILRAUN: H.lló Heimur!" &&
+	git grep --perl-regexp -i "TILRAUN: H.lló Heimur!" &&
+	git grep --perl-regexp -i "TILRAUN: H.LLÓ HEIMUR!"
+'
+
+test_expect_success GETTEXT_LOCALE,LIBPCRE 'grep pcre utf-8 string with "+"' '
+	test_write_lines "TILRAUN: Hallóó Heimur!" >file2 &&
+	git add file2 &&
+	git grep -l --perl-regexp "TILRAUN: H.lló+ Heimur!" >actual &&
+	echo file >expected &&
+	echo file2 >>expected &&
+	test_cmp expected actual
+'
+
 test_expect_success REGEX_LOCALE 'grep literal string, with -F' '
 	git grep --debug -i -F "TILRAUN: Halló Heimur!"  2>&1 >/dev/null |
 		 grep fixed >debug1 &&
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 10/12] diffcore-pickaxe: Add regcomp_or_die()
  2016-06-25  5:22 ` [PATCH v2 00/12] " Nguyễn Thái Ngọc Duy
                     ` (8 preceding siblings ...)
  2016-06-25  5:22   ` [PATCH v2 09/12] grep/pcre: support utf-8 Nguyễn Thái Ngọc Duy
@ 2016-06-25  5:22   ` Nguyễn Thái Ngọc Duy
  2016-06-25  5:22   ` [PATCH v2 11/12] diffcore-pickaxe: support case insensitive match on non-ascii Nguyễn Thái Ngọc Duy
                     ` (2 subsequent siblings)
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-25  5:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

There's another regcomp code block coming in this function that needs
the same error handling. This function can help avoid duplicating
error handling code.

Helped-by: Jeff King <peff@peff.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 diffcore-pickaxe.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 7715c13..2093b6a 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -198,6 +198,18 @@ static void pickaxe(struct diff_queue_struct *q, struct diff_options *o,
 	*q = outq;
 }
 
+static void regcomp_or_die(regex_t *regex, const char *needle, int cflags)
+{
+	int err = regcomp(regex, needle, cflags);
+	if (err) {
+		/* The POSIX.2 people are surely sick */
+		char errbuf[1024];
+		regerror(err, regex, errbuf, 1024);
+		regfree(regex);
+		die("invalid regex: %s", errbuf);
+	}
+}
+
 void diffcore_pickaxe(struct diff_options *o)
 {
 	const char *needle = o->pickaxe;
@@ -206,18 +218,10 @@ void diffcore_pickaxe(struct diff_options *o)
 	kwset_t kws = NULL;
 
 	if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
-		int err;
 		int cflags = REG_EXTENDED | REG_NEWLINE;
 		if (DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE))
 			cflags |= REG_ICASE;
-		err = regcomp(&regex, needle, cflags);
-		if (err) {
-			/* The POSIX.2 people are surely sick */
-			char errbuf[1024];
-			regerror(err, &regex, errbuf, 1024);
-			regfree(&regex);
-			die("invalid regex: %s", errbuf);
-		}
+		regcomp_or_die(&regex, needle, cflags);
 		regexp = &regex;
 	} else {
 		kws = kwsalloc(DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE)
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 11/12] diffcore-pickaxe: support case insensitive match on non-ascii
  2016-06-25  5:22 ` [PATCH v2 00/12] " Nguyễn Thái Ngọc Duy
                     ` (9 preceding siblings ...)
  2016-06-25  5:22   ` [PATCH v2 10/12] diffcore-pickaxe: Add regcomp_or_die() Nguyễn Thái Ngọc Duy
@ 2016-06-25  5:22   ` Nguyễn Thái Ngọc Duy
  2016-06-25  5:22   ` [PATCH v2 12/12] grep.c: reuse "icase" variable Nguyễn Thái Ngọc Duy
  2016-06-27 14:53   ` [PATCH v2 00/12] nd/icase updates Junio C Hamano
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-25  5:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

Similar to the "grep -F -i" case, we can't use kws on icase search
outside ascii range, so we quote the string and pass it to regcomp as
a basic regexp and let regex engine deal with case sensitivity.

The new test is put in t7812 instead of t4209-log-pickaxe because
lib-gettext.sh might cause problems elsewhere, probably.

Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 diffcore-pickaxe.c              | 11 +++++++++++
 t/t7812-grep-icase-non-ascii.sh |  7 +++++++
 2 files changed, 18 insertions(+)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 2093b6a..55067ca 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -7,6 +7,8 @@
 #include "diffcore.h"
 #include "xdiff-interface.h"
 #include "kwset.h"
+#include "commit.h"
+#include "quote.h"
 
 typedef int (*pickaxe_fn)(mmfile_t *one, mmfile_t *two,
 			  struct diff_options *o,
@@ -223,6 +225,15 @@ void diffcore_pickaxe(struct diff_options *o)
 			cflags |= REG_ICASE;
 		regcomp_or_die(&regex, needle, cflags);
 		regexp = &regex;
+	} else if (DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE) &&
+		   has_non_ascii(needle)) {
+		struct strbuf sb = STRBUF_INIT;
+		int cflags = REG_NEWLINE | REG_ICASE;
+
+		basic_regex_quote_buf(&sb, needle);
+		regcomp_or_die(&regex, sb.buf, cflags);
+		strbuf_release(&sb);
+		regexp = &regex;
 	} else {
 		kws = kwsalloc(DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE)
 			       ? tolower_trans_tbl : NULL);
diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh
index 08ae4c9..169fd8d 100755
--- a/t/t7812-grep-icase-non-ascii.sh
+++ b/t/t7812-grep-icase-non-ascii.sh
@@ -61,4 +61,11 @@ test_expect_success REGEX_LOCALE 'grep string with regex, with -F' '
 	test_cmp expect2 debug2
 '
 
+test_expect_success REGEX_LOCALE 'pickaxe -i on non-ascii' '
+	git commit -m first &&
+	git log --format=%f -i -S"TILRAUN: HALLÓ HEIMUR!" >actual &&
+	echo first >expected &&
+	test_cmp expected actual
+'
+
 test_done
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 12/12] grep.c: reuse "icase" variable
  2016-06-25  5:22 ` [PATCH v2 00/12] " Nguyễn Thái Ngọc Duy
                     ` (10 preceding siblings ...)
  2016-06-25  5:22   ` [PATCH v2 11/12] diffcore-pickaxe: support case insensitive match on non-ascii Nguyễn Thái Ngọc Duy
@ 2016-06-25  5:22   ` Nguyễn Thái Ngọc Duy
  2016-06-27 14:53   ` [PATCH v2 00/12] nd/icase updates Junio C Hamano
  12 siblings, 0 replies; 39+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-06-25  5:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 grep.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/grep.c b/grep.c
index 3d63612..92587a8 100644
--- a/grep.c
+++ b/grep.c
@@ -438,10 +438,7 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 		p->fixed = 0;
 
 	if (p->fixed) {
-		if (opt->regflags & REG_ICASE || p->ignore_case)
-			p->kws = kwsalloc(tolower_trans_tbl);
-		else
-			p->kws = kwsalloc(NULL);
+		p->kws = kwsalloc(icase ? tolower_trans_tbl : NULL);
 		kwsincr(p->kws, p->pattern, p->patternlen);
 		kwsprep(p->kws);
 		return;
-- 
2.8.2.526.g02eed6d


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 00/12] nd/icase updates
  2016-06-25  5:22 ` [PATCH v2 00/12] " Nguyễn Thái Ngọc Duy
                     ` (11 preceding siblings ...)
  2016-06-25  5:22   ` [PATCH v2 12/12] grep.c: reuse "icase" variable Nguyễn Thái Ngọc Duy
@ 2016-06-27 14:53   ` Junio C Hamano
  2016-06-27 15:00     ` Junio C Hamano
  2016-06-30 15:45     ` Duy Nguyen
  12 siblings, 2 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-06-27 14:53 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> v2 fixes Junio's and Jeff's comments (both good). The sharing "!icase
> || ascii_only" is made a separate commit (6/12) because I think it
> takes some seconds to realize that the conversion is correct and it's
> technically not needed in 5/12 (and it's sort of the opposite of 1/12)
>
> Interdiff

OK.  regcomp_or_die() does make the code simpler.

> diff --git a/grep.c b/grep.c
> index cb058a5..92587a8 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -432,15 +432,8 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
>  	icase	       = opt->regflags & REG_ICASE || p->ignore_case;
>  	ascii_only     = !has_non_ascii(p->pattern);
>  
> +	if (opt->fixed || is_fixed(p->pattern, p->patternlen))
>  		p->fixed = !icase || ascii_only;
>  	else
>  		p->fixed = 0;
>  
> @@ -449,6 +442,9 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
>  		kwsincr(p->kws, p->pattern, p->patternlen);
>  		kwsprep(p->kws);
>  		return;
> +	} else if (opt->fixed) {
> +		compile_fixed_regexp(p, opt);
> +		return;
>  	}

This if/elseif/else cascade made a lot simpler and while the
discussion is fresh in my brain it makes sense, but it may deserve a
bit of commenting.

And while attempting to do so, I found one possible issue there.

Can't p->ignore_case be true even when opt->regflags does not have
REG_ICASE?  The user never asked us to do a regexp match in such a
case, and the logical place to compensate for that case would be
inside compile_fixed_regexp(), where we use regexp engine behind
user's back for our convenience, I would think.

In the current code, compile_fixed_regexp() is only called when we
want ICASE, but hardcoding that assumption to it unnecessarily robs
flexibility (and the function name does not tell us it is only for
icase in the first place), so I taught it to do the REG_ICASE thing
only when opt->ignore_case is set.

How does this look?


 grep.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 92587a8..3a3a9f4 100644
--- a/grep.c
+++ b/grep.c
@@ -407,17 +407,21 @@ static int is_fixed(const char *s, size_t len)
 static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
 	struct strbuf sb = STRBUF_INIT;
 	int err;
+	int regflags;
 
 	basic_regex_quote_buf(&sb, p->pattern);
-	err = regcomp(&p->regexp, sb.buf, opt->regflags & ~REG_EXTENDED);
+	regflags = opt->regflags & ~REG_EXTENDED;
+	if (opt->ignore_case)
+		regflags |= REG_ICASE;
+	err = regcomp(&p->regexp, sb.buf, regflags);
 	if (opt->debug)
 		fprintf(stderr, "fixed %s\n", sb.buf);
 	strbuf_release(&sb);
 	if (err) {
 		char errbuf[1024];
 		regerror(err, &p->regexp, errbuf, sizeof(errbuf));
 		regfree(&p->regexp);
 		compile_regexp_failed(p, errbuf);
 	}
 }
@@ -425,38 +429,55 @@ static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
 static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
 	int icase, ascii_only;
 	int err;
 
 	p->word_regexp = opt->word_regexp;
 	p->ignore_case = opt->ignore_case;
 	icase	       = opt->regflags & REG_ICASE || p->ignore_case;
 	ascii_only     = !has_non_ascii(p->pattern);
 
+	/*
+	 * Even when -F (fixed) asks us to do a non-regexp search, we
+	 * may not be able to correctly case-fold when -i
+	 * (ignore-case) is asked (in which case, we'll synthesize a
+	 * regexp to match the pattern that matches regexp special
+	 * characters literally, while ignoring case differences).  On
+	 * the other hand, even without -F, if the pattern does not
+	 * have any regexp special characters and there is no need for
+	 * case-folding search, we can internally turn it into a
+	 * simple string match using kws.  p->fixed tells us if we
+	 * want to use kws.
+	 */
 	if (opt->fixed || is_fixed(p->pattern, p->patternlen))
 		p->fixed = !icase || ascii_only;
 	else
 		p->fixed = 0;
 
 	if (p->fixed) {
 		p->kws = kwsalloc(icase ? tolower_trans_tbl : NULL);
 		kwsincr(p->kws, p->pattern, p->patternlen);
 		kwsprep(p->kws);
 		return;
 	} else if (opt->fixed) {
+		/*
+		 * We only come here when the pattern has the regexp
+		 * special characters in it, which need to be matched
+		 * literally, while ignoring case.
+		 */
 		compile_fixed_regexp(p, opt);
 		return;
 	}
 
 	if (opt->pcre) {
 		compile_pcre_regexp(p, opt);
 		return;
 	}
 
 	err = regcomp(&p->regexp, p->pattern, opt->regflags);
 	if (err) {
 		char errbuf[1024];
 		regerror(err, &p->regexp, errbuf, 1024);
 		regfree(&p->regexp);
 		compile_regexp_failed(p, errbuf);
 	}
 }

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 00/12] nd/icase updates
  2016-06-27 14:53   ` [PATCH v2 00/12] nd/icase updates Junio C Hamano
@ 2016-06-27 15:00     ` Junio C Hamano
  2016-06-30 15:45     ` Duy Nguyen
  1 sibling, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-06-27 15:00 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

>  	if (p->fixed) {
>  		p->kws = kwsalloc(icase ? tolower_trans_tbl : NULL);
>  		kwsincr(p->kws, p->pattern, p->patternlen);
>  		kwsprep(p->kws);
>  		return;
>  	} else if (opt->fixed) {
> +		/*
> +		 * We only come here when the pattern has the regexp
> +		 * special characters in it, which need to be matched
> +		 * literally, while ignoring case.
> +		 */

This comment is wrong.

		/*
		 * We come here only when the pattern has non-ascii
		 * characters we cannot case-fold, and we were asked
		 * to ignore-case.
		 */

>  		compile_fixed_regexp(p, opt);
>  		return;
>  	}

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 00/12] nd/icase updates
  2016-06-27 14:53   ` [PATCH v2 00/12] nd/icase updates Junio C Hamano
  2016-06-27 15:00     ` Junio C Hamano
@ 2016-06-30 15:45     ` Duy Nguyen
  2016-07-01 18:18       ` Junio C Hamano
  2016-07-01 19:11       ` Junio C Hamano
  1 sibling, 2 replies; 39+ messages in thread
From: Duy Nguyen @ 2016-06-30 15:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Jeff King

On Mon, Jun 27, 2016 at 4:53 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> diff --git a/grep.c b/grep.c
>> index cb058a5..92587a8 100644
>> --- a/grep.c
>> +++ b/grep.c
>> @@ -432,15 +432,8 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
>>       icase          = opt->regflags & REG_ICASE || p->ignore_case;
>>       ascii_only     = !has_non_ascii(p->pattern);
>>
>> +     if (opt->fixed || is_fixed(p->pattern, p->patternlen))
>>               p->fixed = !icase || ascii_only;
>>       else
>>               p->fixed = 0;
>>
>> @@ -449,6 +442,9 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
>>               kwsincr(p->kws, p->pattern, p->patternlen);
>>               kwsprep(p->kws);
>>               return;
>> +     } else if (opt->fixed) {
>> +             compile_fixed_regexp(p, opt);
>> +             return;
>>       }
>
> This if/elseif/else cascade made a lot simpler and while the
> discussion is fresh in my brain it makes sense, but it may deserve a
> bit of commenting.
>
> And while attempting to do so, I found one possible issue there.
>
> Can't p->ignore_case be true even when opt->regflags does not have
> REG_ICASE?  The user never asked us to do a regexp match in such a
> case, and the logical place to compensate for that case would be
> inside compile_fixed_regexp(), where we use regexp engine behind
> user's back for our convenience, I would think.
>
> In the current code, compile_fixed_regexp() is only called when we
> want ICASE, but hardcoding that assumption to it unnecessarily robs
> flexibility (and the function name does not tell us it is only for
> icase in the first place), so I taught it to do the REG_ICASE thing
> only when opt->ignore_case is set.
>
> How does this look?
>
>
>  grep.c | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/grep.c b/grep.c
> index 92587a8..3a3a9f4 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -407,17 +407,21 @@ static int is_fixed(const char *s, size_t len)
>  static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
>  {
>         struct strbuf sb = STRBUF_INIT;
>         int err;
> +       int regflags;
>
>         basic_regex_quote_buf(&sb, p->pattern);
> -       err = regcomp(&p->regexp, sb.buf, opt->regflags & ~REG_EXTENDED);
> +       regflags = opt->regflags & ~REG_EXTENDED;
> +       if (opt->ignore_case)
> +               regflags |= REG_ICASE;
> +       err = regcomp(&p->regexp, sb.buf, regflags);

Makes sense. But then if opt->ignore_case is false and regflags
happens to have REG_ICASE set, should we clear it as well?

The rest looks good (after your comment fixup). I see you already have
all the changes in your SQUASH??? commit. Do you want me to resend or
you will just squash this in locally?
-- 
Duy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 00/12] nd/icase updates
  2016-06-30 15:45     ` Duy Nguyen
@ 2016-07-01 18:18       ` Junio C Hamano
  2016-07-01 18:46         ` Duy Nguyen
  2016-07-01 19:11       ` Junio C Hamano
  1 sibling, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2016-07-01 18:18 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Jeff King

Duy Nguyen <pclouds@gmail.com> writes:

> Makes sense. But then if opt->ignore_case is false and regflags
> happens to have REG_ICASE set, should we clear it as well?

I think .ignore_case is set iff '-i' is given, and .regflags has
REG_ICASE only if '-i' is given and the user said she does not want
literal string match (i.e. no '-F').

So... can .regflags have REG_ICASE when .ignore_case is false?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 00/12] nd/icase updates
  2016-07-01 18:18       ` Junio C Hamano
@ 2016-07-01 18:46         ` Duy Nguyen
  2016-07-01 18:54           ` Junio C Hamano
  0 siblings, 1 reply; 39+ messages in thread
From: Duy Nguyen @ 2016-07-01 18:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Jeff King

On Fri, Jul 1, 2016 at 8:18 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
>
>> Makes sense. But then if opt->ignore_case is false and regflags
>> happens to have REG_ICASE set, should we clear it as well?
>
> I think .ignore_case is set iff '-i' is given, and .regflags has
> REG_ICASE only if '-i' is given and the user said she does not want
> literal string match (i.e. no '-F').
>
> So... can .regflags have REG_ICASE when .ignore_case is false?

Yeah reg_icase is more like a subset of ignore_case. Ignore what I wrote.
-- 
Duy

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 00/12] nd/icase updates
  2016-07-01 18:46         ` Duy Nguyen
@ 2016-07-01 18:54           ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-07-01 18:54 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Jeff King

Duy Nguyen <pclouds@gmail.com> writes:

> On Fri, Jul 1, 2016 at 8:18 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Duy Nguyen <pclouds@gmail.com> writes:
>>
>>> Makes sense. But then if opt->ignore_case is false and regflags
>>> happens to have REG_ICASE set, should we clear it as well?
>>
>> I think .ignore_case is set iff '-i' is given, and .regflags has
>> REG_ICASE only if '-i' is given and the user said she does not want
>> literal string match (i.e. no '-F').
>>
>> So... can .regflags have REG_ICASE when .ignore_case is false?
>
> Yeah reg_icase is more like a subset of ignore_case. Ignore what I wrote.

The rule of the game can be updated to make whoever looks at
.ignore_case currently instead look at .regflags & REG_ICASE; that
way, we could remove .ignore_case and stop worrying about one is
subset of the other.

It is just it felt funny to be mucking with .regflags when the user
explicitly said -F, and that is why they are separate.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 00/12] nd/icase updates
  2016-06-30 15:45     ` Duy Nguyen
  2016-07-01 18:18       ` Junio C Hamano
@ 2016-07-01 19:11       ` Junio C Hamano
  2016-07-01 19:40         ` Junio C Hamano
  1 sibling, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2016-07-01 19:11 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Jeff King

Duy Nguyen <pclouds@gmail.com> writes:

> The rest looks good (after your comment fixup). I see you already have
> all the changes in your SQUASH??? commit. Do you want me to resend or
> you will just squash this in locally?

Squashing in would need to redo this into a few relevant commits,
so it won't be "just squash this in locally" I am afraid, but let me
try.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 00/12] nd/icase updates
  2016-07-01 19:11       ` Junio C Hamano
@ 2016-07-01 19:40         ` Junio C Hamano
  2016-07-01 20:06           ` Junio C Hamano
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2016-07-01 19:40 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> Duy Nguyen <pclouds@gmail.com> writes:
>
>> The rest looks good (after your comment fixup). I see you already have
>> all the changes in your SQUASH??? commit. Do you want me to resend or
>> you will just squash this in locally?
>
> Squashing in would need to redo this into a few relevant commits,
> so it won't be "just squash this in locally" I am afraid, but let me
> try.

Ok, there was a miniscule conflicts but otherwise the squashed
material was all coming from a single step in the original, so
I did so myself.  Let's start merging the result to 'next' ;-)

Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 00/12] nd/icase updates
  2016-07-01 19:40         ` Junio C Hamano
@ 2016-07-01 20:06           ` Junio C Hamano
  2016-07-01 20:07             ` Junio C Hamano
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2016-07-01 20:06 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Duy Nguyen <pclouds@gmail.com> writes:
>>
>>> The rest looks good (after your comment fixup). I see you already have
>>> all the changes in your SQUASH??? commit. Do you want me to resend or
>>> you will just squash this in locally?
>>
>> Squashing in would need to redo this into a few relevant commits,
>> so it won't be "just squash this in locally" I am afraid, but let me
>> try.
>
> Ok, there was a miniscule conflicts but otherwise the squashed
> material was all coming from a single step in the original, so
> I did so myself.  Let's start merging the result to 'next' ;-)
>
> Thanks.

IOW, this on top of the fixup we discussed.

 sideband.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/sideband.c b/sideband.c
index afa0136..2782df8 100644
--- a/sideband.c
+++ b/sideband.c
@@ -62,10 +62,7 @@ int recv_sideband(const char *me, int in_stream, int out)
 			 *
 			 * The output is accumulated in a buffer and
 			 * each line is printed to stderr using
-			 * fwrite(3).  This is a "best effort"
-			 * approach to support inter-process atomicity
-			 * (single fwrite(3) call is likely to end up
-			 * in single atomic write() system calls).
+			 * write(2) to ensure inter-process atomicity.
 			 */
 			while ((brk = strpbrk(b, "\n\r"))) {
 				int linelen = brk - b;
@@ -78,7 +75,7 @@ int recv_sideband(const char *me, int in_stream, int out)
 				} else {
 					strbuf_addf(&outbuf, "%c", *brk);
 				}
-				fwrite(outbuf.buf, 1, outbuf.len, stderr);
+				xwrite(2, outbuf.buf, outbuf.len);
 				strbuf_reset(&outbuf);
 
 				b = brk + 1;
@@ -101,7 +98,7 @@ int recv_sideband(const char *me, int in_stream, int out)
 
 	if (outbuf.len) {
 		strbuf_addf(&outbuf, "\n");
-		fwrite(outbuf.buf, 1, outbuf.len, stderr);
+		xwrite(2, outbuf.buf, outbuf.len);
 	}
 	strbuf_release(&outbuf);
 	return retval;

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 00/12] nd/icase updates
  2016-07-01 20:06           ` Junio C Hamano
@ 2016-07-01 20:07             ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2016-07-01 20:07 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

>> Ok, there was a miniscule conflicts but otherwise the squashed
>> material was all coming from a single step in the original, so
>> I did so myself.  Let's start merging the result to 'next' ;-)
>>
>> Thanks.
>
> IOW, this on top of the fixup we discussed.
>
>  sideband.c | 9 +++------

NO, sorry, a wrong thread.

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2016-07-01 20:07 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-23 16:28 [PATCH 00/11] nd/icase updates Nguyễn Thái Ngọc Duy
2016-06-23 16:28 ` [PATCH 01/11] grep: break down an "if" stmt in preparation for next changes Nguyễn Thái Ngọc Duy
2016-06-23 16:28 ` [PATCH 02/11] test-regex: isolate the bug test code Nguyễn Thái Ngọc Duy
2016-06-23 16:28 ` [PATCH 03/11] test-regex: expose full regcomp() to the command line Nguyễn Thái Ngọc Duy
2016-06-23 16:29 ` [PATCH 04/11] grep/icase: avoid kwsset on literal non-ascii strings Nguyễn Thái Ngọc Duy
2016-06-23 20:12   ` Junio C Hamano
2016-06-23 16:29 ` [PATCH 05/11] grep/icase: avoid kwsset when -F is specified Nguyễn Thái Ngọc Duy
2016-06-23 20:25   ` Junio C Hamano
2016-06-23 16:29 ` [PATCH 06/11] grep/pcre: prepare locale-dependent tables for icase matching Nguyễn Thái Ngọc Duy
2016-06-23 16:29 ` [PATCH 07/11] gettext: add is_utf8_locale() Nguyễn Thái Ngọc Duy
2016-06-23 16:29 ` [PATCH 08/11] grep/pcre: support utf-8 Nguyễn Thái Ngọc Duy
2016-06-23 16:29 ` [PATCH 09/11] diffcore-pickaxe: "share" regex error handling code Nguyễn Thái Ngọc Duy
2016-06-23 19:16   ` Jeff King
2016-06-23 16:29 ` [PATCH 10/11] diffcore-pickaxe: support case insensitive match on non-ascii Nguyễn Thái Ngọc Duy
2016-06-23 16:29 ` [PATCH 11/11] grep.c: reuse "icase" variable Nguyễn Thái Ngọc Duy
2016-06-23 20:32 ` [PATCH 00/11] nd/icase updates Junio C Hamano
2016-06-25  5:22 ` [PATCH v2 00/12] " Nguyễn Thái Ngọc Duy
2016-06-25  5:22   ` [PATCH v2 01/12] grep: break down an "if" stmt in preparation for next changes Nguyễn Thái Ngọc Duy
2016-06-25  5:22   ` [PATCH v2 02/12] test-regex: isolate the bug test code Nguyễn Thái Ngọc Duy
2016-06-25  5:22   ` [PATCH v2 03/12] test-regex: expose full regcomp() to the command line Nguyễn Thái Ngọc Duy
2016-06-25  5:22   ` [PATCH v2 04/12] grep/icase: avoid kwsset on literal non-ascii strings Nguyễn Thái Ngọc Duy
2016-06-25  5:22   ` [PATCH v2 05/12] grep/icase: avoid kwsset when -F is specified Nguyễn Thái Ngọc Duy
2016-06-25  5:22   ` [PATCH v2 06/12] grep: rewrite an if/else condition to avoid duplicate expression Nguyễn Thái Ngọc Duy
2016-06-25  5:22   ` [PATCH v2 07/12] grep/pcre: prepare locale-dependent tables for icase matching Nguyễn Thái Ngọc Duy
2016-06-25  5:22   ` [PATCH v2 08/12] gettext: add is_utf8_locale() Nguyễn Thái Ngọc Duy
2016-06-25  5:22   ` [PATCH v2 09/12] grep/pcre: support utf-8 Nguyễn Thái Ngọc Duy
2016-06-25  5:22   ` [PATCH v2 10/12] diffcore-pickaxe: Add regcomp_or_die() Nguyễn Thái Ngọc Duy
2016-06-25  5:22   ` [PATCH v2 11/12] diffcore-pickaxe: support case insensitive match on non-ascii Nguyễn Thái Ngọc Duy
2016-06-25  5:22   ` [PATCH v2 12/12] grep.c: reuse "icase" variable Nguyễn Thái Ngọc Duy
2016-06-27 14:53   ` [PATCH v2 00/12] nd/icase updates Junio C Hamano
2016-06-27 15:00     ` Junio C Hamano
2016-06-30 15:45     ` Duy Nguyen
2016-07-01 18:18       ` Junio C Hamano
2016-07-01 18:46         ` Duy Nguyen
2016-07-01 18:54           ` Junio C Hamano
2016-07-01 19:11       ` Junio C Hamano
2016-07-01 19:40         ` Junio C Hamano
2016-07-01 20:06           ` Junio C Hamano
2016-07-01 20:07             ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).