git@vger.kernel.org mailing list mirror (one of many)
* [PATCH v3 00/30] Easy to review grep & pre-PCRE changes
@ 2017-05-20 21:42 Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 01/30] Makefile & configure: reword inaccurate comment about PCRE Ævar Arnfjörð Bjarmason
                   ` (30 more replies)
  0 siblings, 31 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Easy to review? 29 (I mean 30) patches? Are you kidding me?!

As noted in v1 (<20170511091829.5634-1-avarab@gmail.com>;
https://public-inbox.org/git/20170511091829.5634-1-avarab@gmail.com/)
these are all doc, test, refactoring etc. changes needed by the
subsequent "PCRE v2, PCRE v1 JIT, log -P & fixes" series.

Since Junio hasn't been picking it up, I'm no longer sending updates to
that patch series and am waiting for this one to cook first.

See <20170513231509.7834-1-avarab@gmail.com>
(https://public-inbox.org/git/20170513231509.7834-1-avarab@gmail.com/)
for v2 & notes about that version. What changed this time around? See
below:

Ævar Arnfjörð Bjarmason (30):
  Makefile & configure: reword inaccurate comment about PCRE
  grep & rev-list doc: stop promising libpcre for --perl-regexp
  test-lib: rename the LIBPCRE prerequisite to PCRE

No changes.

  log: add exhaustive tests for pattern style options & config

Test comment clarifications in t4202-log.sh as pointed out by Junio.

  log: make --regexp-ignore-case work with --perl-regexp

NEW: I noticed that the `-i` in `git log --perl-regexp -i --grep=<rx>`
never worked as intended. I.e. the flag for ignoring the case of the
pattern wasn't picked up.

Fixing this was trivial (one-line change), so I've included it in this
series since it's needed by a new t/perf patch (see below).

  grep: add a test asserting that --perl-regexp dies when !PCRE
  grep: add a test for backreferences in PCRE patterns
  grep: change non-ASCII -i test to stop using --debug
  grep: add tests for --threads=N and grep.threads
  grep: amend submodule recursion test for regex engine testing
  grep: add tests for grep pattern types being passed to submodules

No changes.

  grep: add a test helper function for less verbose -f \0 tests

Trivial style changes in nul_match() suggested by Junio. No functional
changes.

  grep: prepare for testing binary regexes containing rx metacharacters

No changes.

  grep: add tests to fix blind spots with \0 patterns

Continued trivial style changes in nul_match() (the other half of the
code in that function is added in this commit).

  perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do
  perf: emit progress output when unpacking & building

No changes.

  perf: add a comparison test of grep regex engines
  perf: add a comparison test of grep regex engines with -F
  perf: add a comparison test of log --grep regex engines

The log --grep test is new, and all these tests learned to take an env
variable to pass arbitrary extra grep/log flags, so I can e.g. test
with -i, -v, -w etc.

Subsequent commit messages that e.g. mentioned perf tests with the
previous hardcoded -i test have been amended to mention the new test
results.

  grep: catch a missing enum in switch statement

Grammar fix in commit message.

  grep: remove redundant regflags assignments

The two commits that made changes to regflags assignments have been
squashed.

  grep: factor test for \0 in grep patterns into a function

Rewrote commit message to not go off on a tangent about what grep -f
[file-with-\0-pattern] should mean, which is not what this change is
about.

  grep: change the internal PCRE macro names to be PCRE1
  grep: change internal *pcre* variable & function names to be *pcre1*
  grep: move is_fixed() earlier to avoid forward declaration
  test-lib: add a PTHREADS prerequisite

No changes.

  pack-objects & index-pack: add test for --threads warning
  pack-objects: fix buggy warning about threads

Rewrote the tests in these two so that the first one sets up a failing
test which is subsequently fixed in the commit that fixes the bug, as
suggested by Junio.

Removed a stray `cat err` left over from debugging.

  grep: given --threads with NO_PTHREADS=YesPlease, warn
  grep: assert that threading is enabled when calling grep_{lock,unlock}

No changes.

 Documentation/git-grep.txt         |   7 +-
 Documentation/rev-list-options.txt |   8 +-
 Makefile                           |  14 ++-
 builtin/grep.c                     |  23 +++-
 builtin/pack-objects.c             |   4 +-
 configure.ac                       |  12 ++-
 grep.c                             | 108 ++++++++++---------
 grep.h                             |  10 +-
 revision.c                         |   1 +
 t/README                           |   8 +-
 t/perf/README                      |  19 +++-
 t/perf/p4220-log-grep-engines.sh   |  44 ++++++++
 t/perf/p7820-grep-engines.sh       |  47 ++++++++
 t/perf/p7821-grep-engines-fixed.sh |  32 ++++++
 t/perf/run                         |  13 ++-
 t/t4202-log.sh                     | 160 +++++++++++++++++++++++++--
 t/t5300-pack-object.sh             |  36 +++++++
 t/t7008-grep-binary.sh             | 135 +++++++++++++++++------
 t/t7810-grep.sh                    |  81 +++++++++++---
 t/t7812-grep-icase-non-ascii.sh    |  29 ++---
 t/t7813-grep-icase-iso.sh          |   2 +-
 t/t7814-grep-recurse-submodules.sh | 215 +++++++++++++++++++++++--------------
 t/test-lib.sh                      |   3 +-
 23 files changed, 771 insertions(+), 240 deletions(-)
 create mode 100755 t/perf/p4220-log-grep-engines.sh
 create mode 100755 t/perf/p7820-grep-engines.sh
 create mode 100755 t/perf/p7821-grep-engines-fixed.sh

-- 
2.13.0.303.g4ebf302169



* [PATCH v3 01/30] Makefile & configure: reword inaccurate comment about PCRE
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 02/30] grep & rev-list doc: stop promising libpcre for --perl-regexp Ævar Arnfjörð Bjarmason
                   ` (29 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Reword an outdated & inaccurate comment which suggests that only
git-grep can use PCRE.

This comment was added back when PCRE support was initially added in
commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09), and was true
at the time.

It hasn't been telling the full truth since git-log learned to use
PCRE with --grep in commit 727b6fc3ed ("log --grep: accept
--basic-regexp and --perl-regexp", 2012-10-03), and more importantly
is likely to get more inaccurate over time as more use is made of PCRE
in other areas.

Reword it to be more future-proof, and to more clearly explain that
this enables user-initiated runtime behavior.

Copy/pasting this so much in configure.ac is lame; these Makefile-like
flags aren't even used by autoconf, just the corresponding
--with[out]-* options. But copy/pasting the comments that make sense
for the Makefile to configure.ac where they make less sense is the
pattern everything else follows in that file. I'm not going to war
against that as part of this change, just following the existing
pattern.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile     |  6 ++++--
 configure.ac | 12 ++++++++----
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index e35542e631..eedadb8056 100644
--- a/Makefile
+++ b/Makefile
@@ -24,8 +24,10 @@ all::
 # Define NO_OPENSSL environment variable if you do not have OpenSSL.
 # This also implies BLK_SHA1.
 #
-# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
-# able to use Perl-compatible regular expressions.
+# Define USE_LIBPCRE if you have and want to use libpcre. Various
+# commands such as log and grep offer runtime options to use
+# Perl-compatible regular expressions instead of standard or extended
+# POSIX regular expressions.
 #
 # Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
 # /foo/bar/include and /foo/bar/lib directories.
diff --git a/configure.ac b/configure.ac
index 128165529f..deeb968daa 100644
--- a/configure.ac
+++ b/configure.ac
@@ -250,8 +250,10 @@ AS_HELP_STRING([--with-openssl],[use OpenSSL library (default is YES)])
 AS_HELP_STRING([],              [ARG can be prefix for openssl library and headers]),
 GIT_PARSE_WITH([openssl]))
 
-# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
-# able to use Perl-compatible regular expressions.
+# Define USE_LIBPCRE if you have and want to use libpcre. Various
+# commands such as log and grep offer runtime options to use
+# Perl-compatible regular expressions instead of standard or extended
+# POSIX regular expressions.
 #
 # Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
 # /foo/bar/include and /foo/bar/lib directories.
@@ -499,8 +501,10 @@ GIT_CONF_SUBST([NEEDS_SSL_WITH_CRYPTO])
 GIT_CONF_SUBST([NO_OPENSSL])
 
 #
-# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
-# able to use Perl-compatible regular expressions.
+# Define USE_LIBPCRE if you have and want to use libpcre. Various
+# commands such as log and grep offer runtime options to use
+# Perl-compatible regular expressions instead of standard or extended
+# POSIX regular expressions.
 #
 
 if test -n "$USE_LIBPCRE"; then
-- 
2.13.0.303.g4ebf302169



* [PATCH v3 02/30] grep & rev-list doc: stop promising libpcre for --perl-regexp
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 01/30] Makefile & configure: reword inaccurate comment about PCRE Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 03/30] test-lib: rename the LIBPCRE prerequisite to PCRE Ævar Arnfjörð Bjarmason
                   ` (28 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Stop promising in our grep & rev-list options documentation that we're
always going to be using libpcre when given the --perl-regexp option.

Instead talk about "Perl-compatible regular expressions" and describe
support for these types of patterns as "an optional compile-time
dependency".

Saying "libpcre" means that we're talking about libpcre.so, which is
always going to be v1. This change is part of an ongoing saga to add
support for libpcre2, which comes with PCRE v2.

In the future we might use some completely unrelated library to
provide perl-compatible regular expression support. By wording the
documentation differently and not promising any specific version of
PCRE or even PCRE at all we have more wiggle room to change the
implementation.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-grep.txt         | 7 +++++--
 Documentation/rev-list-options.txt | 8 ++++++--
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 71f32f3508..5033483db4 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -161,8 +161,11 @@ OPTIONS
 
 -P::
 --perl-regexp::
-	Use Perl-compatible regexp for patterns. Requires libpcre to be
-	compiled in.
+	Use Perl-compatible regular expressions for patterns.
++
+Support for these types of regular expressions is an optional
+compile-time dependency. If Git wasn't compiled with support for them
+providing this option will cause it to die.
 
 -F::
 --fixed-strings::
diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index a02f7324c0..a46f70c2b1 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -92,8 +92,12 @@ endif::git-rev-list[]
 	pattern as a regular expression).
 
 --perl-regexp::
-	Consider the limiting patterns to be Perl-compatible regular expressions.
-	Requires libpcre to be compiled in.
+	Consider the limiting patterns to be Perl-compatible regular
+	expressions.
++
+Support for these types of regular expressions is an optional
+compile-time dependency. If Git wasn't compiled with support for them
+providing this option will cause it to die.
 
 --remove-empty::
 	Stop when a given path disappears from the tree.
-- 
2.13.0.303.g4ebf302169



* [PATCH v3 03/30] test-lib: rename the LIBPCRE prerequisite to PCRE
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 01/30] Makefile & configure: reword inaccurate comment about PCRE Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 02/30] grep & rev-list doc: stop promising libpcre for --perl-regexp Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 04/30] log: add exhaustive tests for pattern style options & config Ævar Arnfjörð Bjarmason
                   ` (27 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Rename the LIBPCRE prerequisite to PCRE. This is in preparation for
libpcre2 support, where having just "LIBPCRE" would be confusing as it
implies v1 of the library.

None of these tests are incompatible between versions 1 & 2 of
libpcre, so it's less confusing to give them a more general name to make
it clear that they work on both library versions.
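
As a rough illustration of how such a prerequisite gates tests, here is
a minimal shell sketch. The sketch_set_prereq and sketch_have_prereq
helpers are hypothetical stand-ins written for this example, not the
real test_set_prereq and test_have_prereq from t/test-lib.sh; only the
USE_LIBPCRE check mirrors the line changed in this patch.

```shell
#!/bin/sh
# Hypothetical, simplified stand-ins for test-lib.sh's prerequisite helpers.
satisfied_prereqs=" "

sketch_set_prereq () {
	satisfied_prereqs="$satisfied_prereqs$1 "
}

sketch_have_prereq () {
	case "$satisfied_prereqs" in
	*" $1 "*) return 0 ;;
	*) return 1 ;;
	esac
}

# The build variable keeps its name; only the prerequisite is renamed.
USE_LIBPCRE=YesPlease
test -n "$USE_LIBPCRE" && sketch_set_prereq PCRE

if sketch_have_prereq PCRE
then
	echo "PCRE tests would run"
else
	echo "PCRE tests would be skipped"
fi
```

In this sketch a test still guarded by the old LIBPCRE name would be
skipped, which is presumably why every user of the prerequisite is
renamed in the same patch.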

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/README                        |  4 ++--
 t/t7810-grep.sh                 | 28 ++++++++++++++--------------
 t/t7812-grep-icase-non-ascii.sh |  4 ++--
 t/t7813-grep-icase-iso.sh       |  2 +-
 t/test-lib.sh                   |  2 +-
 5 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/t/README b/t/README
index ab386c3681..a90cb62583 100644
--- a/t/README
+++ b/t/README
@@ -803,9 +803,9 @@ use these, and "test_set_prereq" for how to define your own.
    Test is not run by root user, and an attempt to write to an
    unwritable file is expected to fail correctly.
 
- - LIBPCRE
+ - PCRE
 
-   Git was compiled with USE_LIBPCRE=YesPlease. Wrap any tests
+   Git was compiled with support for PCRE. Wrap any tests
    that use git-grep --perl-regexp or git-grep -P in these.
 
  - CASE_INSENSITIVE_FS
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index cee42097b0..c84c4d99f9 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -275,7 +275,7 @@ do
 		test_cmp expected actual
 	'
 
-	test_expect_success LIBPCRE "grep $L with grep.patterntype=perl" '
+	test_expect_success PCRE "grep $L with grep.patterntype=perl" '
 		echo "${HC}ab:a+b*c" >expected &&
 		git -c grep.patterntype=perl grep "a\x{2b}b\x{2a}c" $H ab >actual &&
 		test_cmp expected actual
@@ -1053,12 +1053,12 @@ hello.c:int main(int argc, const char **argv)
 hello.c:	printf("Hello world.\n");
 EOF
 
-test_expect_success LIBPCRE 'grep --perl-regexp pattern' '
+test_expect_success PCRE 'grep --perl-regexp pattern' '
 	git grep --perl-regexp "\p{Ps}.*?\p{Pe}" hello.c >actual &&
 	test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P pattern' '
+test_expect_success PCRE 'grep -P pattern' '
 	git grep -P "\p{Ps}.*?\p{Pe}" hello.c >actual &&
 	test_cmp expected actual
 '
@@ -1070,13 +1070,13 @@ test_expect_success 'grep pattern with grep.extendedRegexp=true' '
 	test_cmp empty actual
 '
 
-test_expect_success LIBPCRE 'grep -P pattern with grep.extendedRegexp=true' '
+test_expect_success PCRE 'grep -P pattern with grep.extendedRegexp=true' '
 	git -c grep.extendedregexp=true \
 		grep -P "\p{Ps}.*?\p{Pe}" hello.c >actual &&
 	test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P -v pattern' '
+test_expect_success PCRE 'grep -P -v pattern' '
 	{
 		echo "ab:a+b*c"
 		echo "ab:a+bc"
@@ -1085,7 +1085,7 @@ test_expect_success LIBPCRE 'grep -P -v pattern' '
 	test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P -i pattern' '
+test_expect_success PCRE 'grep -P -i pattern' '
 	cat >expected <<-EOF &&
 	hello.c:	printf("Hello world.\n");
 	EOF
@@ -1093,7 +1093,7 @@ test_expect_success LIBPCRE 'grep -P -i pattern' '
 	test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P -w pattern' '
+test_expect_success PCRE 'grep -P -w pattern' '
 	{
 		echo "hello_world:Hello world"
 		echo "hello_world:HeLLo world"
@@ -1118,11 +1118,11 @@ test_expect_success 'grep invalidpattern properly dies with grep.patternType=ext
 	test_must_fail git -c grep.patterntype=extended grep "a["
 '
 
-test_expect_success LIBPCRE 'grep -P invalidpattern properly dies ' '
+test_expect_success PCRE 'grep -P invalidpattern properly dies ' '
 	test_must_fail git grep -P "a["
 '
 
-test_expect_success LIBPCRE 'grep invalidpattern properly dies with grep.patternType=perl' '
+test_expect_success PCRE 'grep invalidpattern properly dies with grep.patternType=perl' '
 	test_must_fail git -c grep.patterntype=perl grep "a["
 '
 
@@ -1191,13 +1191,13 @@ test_expect_success 'grep pattern with grep.patternType=fixed, =basic, =perl, =e
 	test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -G -F -E -P pattern' '
+test_expect_success PCRE 'grep -G -F -E -P pattern' '
 	echo "d0:0" >expected &&
 	git grep -G -F -E -P "[\d]" d0 >actual &&
 	test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep pattern with grep.patternType=fixed, =basic, =extended, =perl' '
+test_expect_success PCRE 'grep pattern with grep.patternType=fixed, =basic, =extended, =perl' '
 	echo "d0:0" >expected &&
 	git \
 		-c grep.patterntype=fixed \
@@ -1208,7 +1208,7 @@ test_expect_success LIBPCRE 'grep pattern with grep.patternType=fixed, =basic, =
 	test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P pattern with grep.patternType=fixed' '
+test_expect_success PCRE 'grep -P pattern with grep.patternType=fixed' '
 	echo "ab:a+b*c" >expected &&
 	git \
 		-c grep.patterntype=fixed \
@@ -1343,12 +1343,12 @@ space: line with leading space2
 space: line with leading space3
 EOF
 
-test_expect_success LIBPCRE 'grep -E "^ "' '
+test_expect_success PCRE 'grep -E "^ "' '
 	git grep -E "^ " space >actual &&
 	test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P "^ "' '
+test_expect_success PCRE 'grep -P "^ "' '
 	git grep -P "^ " space >actual &&
 	test_cmp expected actual
 '
diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh
index 169fd8d706..04a61cb8e0 100755
--- a/t/t7812-grep-icase-non-ascii.sh
+++ b/t/t7812-grep-icase-non-ascii.sh
@@ -20,13 +20,13 @@ test_expect_success REGEX_LOCALE 'grep literal string, no -F' '
 	git grep -i "TILRAUN: HALLÓ HEIMUR!"
 '
 
-test_expect_success GETTEXT_LOCALE,LIBPCRE 'grep pcre utf-8 icase' '
+test_expect_success GETTEXT_LOCALE,PCRE 'grep pcre utf-8 icase' '
 	git grep --perl-regexp    "TILRAUN: H.lló Heimur!" &&
 	git grep --perl-regexp -i "TILRAUN: H.lló Heimur!" &&
 	git grep --perl-regexp -i "TILRAUN: H.LLÓ HEIMUR!"
 '
 
-test_expect_success GETTEXT_LOCALE,LIBPCRE 'grep pcre utf-8 string with "+"' '
+test_expect_success GETTEXT_LOCALE,PCRE 'grep pcre utf-8 string with "+"' '
 	test_write_lines "TILRAUN: Hallóó Heimur!" >file2 &&
 	git add file2 &&
 	git grep -l --perl-regexp "TILRAUN: H.lló+ Heimur!" >actual &&
diff --git a/t/t7813-grep-icase-iso.sh b/t/t7813-grep-icase-iso.sh
index efef7fb81f..701e08a8e5 100755
--- a/t/t7813-grep-icase-iso.sh
+++ b/t/t7813-grep-icase-iso.sh
@@ -11,7 +11,7 @@ test_expect_success GETTEXT_ISO_LOCALE 'setup' '
 	export LC_ALL
 '
 
-test_expect_success GETTEXT_ISO_LOCALE,LIBPCRE 'grep pcre string' '
+test_expect_success GETTEXT_ISO_LOCALE,PCRE 'grep pcre string' '
 	git grep --perl-regexp -i "TILRAUN: H.lló Heimur!" &&
 	git grep --perl-regexp -i "TILRAUN: H.LLÓ HEIMUR!"
 '
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 26b3edfb2e..04d857a42b 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1014,7 +1014,7 @@ esac
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PYTHON" && test_set_prereq PYTHON
-test -n "$USE_LIBPCRE" && test_set_prereq LIBPCRE
+test -n "$USE_LIBPCRE" && test_set_prereq PCRE
 test -z "$NO_GETTEXT" && test_set_prereq GETTEXT
 
 # Can we rely on git's output in the C locale?
-- 
2.13.0.303.g4ebf302169



* [PATCH v3 04/30] log: add exhaustive tests for pattern style options & config
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (2 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 03/30] test-lib: rename the LIBPCRE prerequisite to PCRE Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 05/30] log: make --regexp-ignore-case work with --perl-regexp Ævar Arnfjörð Bjarmason
                   ` (26 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Add exhaustive tests for how the different grep.patternType options &
the corresponding command-line options affect git-log.

Before this change it was possible to patch revision.c so that the
--basic-regexp option was synonymous with --extended-regexp, and
--perl-regexp wasn't recognized at all, and still have 100% of the
test suite pass.

This was because the first test being modified here, added in commit
34a4ae55b2 ("log --grep: use the same helper to set -E/-F options as
"git grep"", 2012-10-03), didn't actually check whether we'd enabled
extended regular expressions as distinct from re-toggling non-fixed
string support.

Fix that by changing the pattern to one that'll only match if the
--extended-regexp option is provided, but won't match under the
default --basic-regexp option.

Other potential regressions were possible since there were no tests
for the rest of the combinations of grep.patternType configuration
toggles & corresponding git-log command-line options. Add exhaustive
tests for those.

The patterns being passed to fixed/basic/extended/PCRE are carefully
crafted to return the wrong thing if the grep engine were to pick any
other matching method than the one it's told to use.
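
The idea can be tried outside of git with the system grep. This is only
a sketch: it assumes a POSIX grep is on the PATH, it uses grep rather
than git-log, and it leaves PCRE out because not every grep supports
-P. The sample lines mirror the commit subjects the new tests create.

```shell
#!/bin/sh
# Sample subject lines, mirroring the commits the new tests create.
subjects=$(printf '%s\n' '1' '(1|2)')

# Fixed strings: the pattern is matched literally.
fixed=$(printf '%s\n' "$subjects" | grep -F '(1|2)' || true)

# POSIX basic (grep's default): "(", "|" and ")" are ordinary characters.
basic=$(printf '%s\n' "$subjects" | grep '(.|.)' || true)

# POSIX extended: an escaped "|" is a literal "|".
extended=$(printf '%s\n' "$subjects" | grep -E '\|2' || true)

# The same "(.|.)" under POSIX extended is a group that matches any single
# character, so the wrong engine would select both subjects.
wrong_engine=$(printf '%s\n' "$subjects" | grep -E '(.|.)' || true)

# In a POSIX bracket expression the backslash is an ordinary character, so
# "[\d]" matches "\" or "d", not a digit as it would under PCRE.
bracket=$(printf '%s\n' '1d' '2e' | grep -E '[\d]' || true)

printf 'fixed: %s\nbasic: %s\nextended: %s\n' "$fixed" "$basic" "$extended"
printf 'bracket: %s\n' "$bracket"
```

Each pattern selects only "(1|2)" under the engine it was written for,
while the wrong_engine case shows how the result changes when a
different engine interprets the same pattern.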

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t4202-log.sh | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 97 insertions(+), 1 deletion(-)

diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index f577990716..a8dce0ca2d 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -262,7 +262,30 @@ test_expect_success 'log --grep -i' '
 
 test_expect_success 'log -F -E --grep=<ere> uses ere' '
 	echo second >expect &&
-	git log -1 --pretty="tformat:%s" -F -E --grep=s.c.nd >actual &&
+	# basic would need \(s\) to do the same
+	git log -1 --pretty="tformat:%s" -F -E --grep="(s).c.nd" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success PCRE 'log -F -E --perl-regexp --grep=<pcre> uses PCRE' '
+	test_when_finished "rm -rf num_commits" &&
+	git init num_commits &&
+	(
+		cd num_commits &&
+		test_commit 1d &&
+		test_commit 2e
+	) &&
+
+	# In PCRE \d in [\d] is like saying "0-9", and matches the 2
+	# in 2e...
+	echo 2e >expect &&
+	git -C num_commits log -1 --pretty="tformat:%s" -F -E --perl-regexp --grep="[\d]" >actual &&
+	test_cmp expect actual &&
+
+	# ...in POSIX basic and extended it is the same as [d],
+	# i.e. "d", which matches 1d, but does not match 2e.
+	echo 1d >expect &&
+	git -C num_commits log -1 --pretty="tformat:%s" -F -E --grep="[\d]" >actual &&
 	test_cmp expect actual
 '
 
@@ -280,6 +303,79 @@ test_expect_success 'log with grep.patternType configuration and command line' '
 	test_cmp expect actual
 '
 
+test_expect_success 'log with various grep.patternType configurations & command-lines' '
+	git init pattern-type &&
+	(
+		cd pattern-type &&
+		test_commit 1 file A &&
+
+		# The tagname is overridden here because creating a
+		# tag called "(1|2)" as test_commit would otherwise
+		# implicitly do would fail on e.g. MINGW.
+		test_commit "(1|2)" file B 2 &&
+
+		echo "(1|2)" >expect.fixed &&
+		cp expect.fixed expect.basic &&
+		cp expect.fixed expect.extended &&
+		cp expect.fixed expect.perl &&
+
+		# A strcmp-like match with fixed.
+		git -c grep.patternType=fixed log --pretty=tformat:%s \
+			--grep="(1|2)" >actual.fixed &&
+
+		# POSIX basic matches (, | and ) literally.
+		git -c grep.patternType=basic log --pretty=tformat:%s \
+			--grep="(.|.)" >actual.basic &&
+
+		# POSIX extended needs to have | escaped to match it
+		# literally, whereas under basic this is the same as
+		# (|2), i.e. it would also match "1". This test checks
+		# for extended by asserting that it is not matching
+		# what basic would match.
+		git -c grep.patternType=extended log --pretty=tformat:%s \
+			--grep="\|2" >actual.extended &&
+		if test_have_prereq PCRE
+		then
+			# Only PCRE would match [\d]\| with only
+			# "(1|2)" due to [\d]. POSIX basic would match
+			# both it and "1" since similarly to the
+			# extended match above it is the same as
+			# \([\d]\|\). POSIX extended would
+			# match neither.
+			git -c grep.patternType=perl log --pretty=tformat:%s \
+				--grep="[\d]\|" >actual.perl &&
+			test_cmp expect.perl actual.perl
+		fi &&
+		test_cmp expect.fixed actual.fixed &&
+		test_cmp expect.basic actual.basic &&
+		test_cmp expect.extended actual.extended &&
+
+		git log --pretty=tformat:%s -F \
+			--grep="(1|2)" >actual.fixed.short-arg &&
+		git log --pretty=tformat:%s -E \
+			--grep="\|2" >actual.extended.short-arg &&
+		test_cmp expect.fixed actual.fixed.short-arg &&
+		test_cmp expect.extended actual.extended.short-arg &&
+
+		git log --pretty=tformat:%s --fixed-strings \
+			--grep="(1|2)" >actual.fixed.long-arg &&
+		git log --pretty=tformat:%s --basic-regexp \
+			--grep="(.|.)" >actual.basic.long-arg &&
+		git log --pretty=tformat:%s --extended-regexp \
+			--grep="\|2" >actual.extended.long-arg &&
+		if test_have_prereq PCRE
+		then
+			git log --pretty=tformat:%s --perl-regexp \
+				--grep="[\d]\|" >actual.perl.long-arg &&
+			test_cmp expect.perl actual.perl.long-arg
+
+		fi &&
+		test_cmp expect.fixed actual.fixed.long-arg &&
+		test_cmp expect.basic actual.basic.long-arg &&
+		test_cmp expect.extended actual.extended.long-arg
+	)
+'
+
 cat > expect <<EOF
 * Second
 * sixth
-- 
2.13.0.303.g4ebf302169



* [PATCH v3 05/30] log: make --regexp-ignore-case work with --perl-regexp
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (3 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 04/30] log: add exhaustive tests for pattern style options & config Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 23:50   ` Junio C Hamano
  2017-05-20 21:42 ` [PATCH v3 06/30] grep: add a test asserting that --perl-regexp dies when !PCRE Ævar Arnfjörð Bjarmason
                   ` (25 subsequent siblings)
  30 siblings, 1 reply; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Make the --regexp-ignore-case option work with --perl-regexp. This
never worked, and there was no test for this. Fix the bug and add a
test.

When PCRE support was added in commit 63e7e9d8b6 ("git-grep: Learn
PCRE", 2011-05-09) compile_pcre_regexp() would only check
opt->ignore_case, but when the --perl-regexp option was added in
commit 727b6fc3ed ("log --grep: accept --basic-regexp and
--perl-regexp", 2012-10-03) the code didn't set the opt->ignore_case.

Change the test suite to test for -i and --invert-grep with
basic/extended/perl patterns in addition to fixed, which was the only
patternType that was tested for before in combination with those
options.
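
A likely reason for using bracketed patterns such as s[e]c in the
regex variants of these tests is that they only match when the pattern
really is treated as a regular expression. The sketch below shows that
with the system grep instead of git-log; it assumes a POSIX grep on the
PATH and again leaves PCRE out.

```shell
#!/bin/sh
subject='Second'

# As a regular expression (basic or extended), "s[e]c" means "sec", and -i
# lets it match the "Sec" in the subject.
basic=$(printf '%s\n' "$subject" | grep -i 's[e]c' || true)
extended=$(printf '%s\n' "$subject" | grep -E -i 's[e]c' || true)

# As a fixed string the brackets are literal, so nothing matches even with -i.
fixed=$(printf '%s\n' "$subject" | grep -F -i 's[e]c' || true)

# Without -i the regular expression does not match the capital "S".
case_sensitive=$(printf '%s\n' "$subject" | grep 's[e]c' || true)

printf 'basic: [%s]\nextended: [%s]\n' "$basic" "$extended"
printf 'fixed: [%s]\ncase-sensitive: [%s]\n' "$fixed" "$case_sensitive"
```

A test written this way fails both when the case-insensitivity flag is
dropped and when the pattern type silently falls back to fixed strings.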

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 revision.c     |  1 +
 t/t4202-log.sh | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 56 insertions(+), 5 deletions(-)

diff --git a/revision.c b/revision.c
index 8a8c1789c7..4883cdd2d0 100644
--- a/revision.c
+++ b/revision.c
@@ -1991,6 +1991,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--extended-regexp") || !strcmp(arg, "-E")) {
 		revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_ERE;
 	} else if (!strcmp(arg, "--regexp-ignore-case") || !strcmp(arg, "-i")) {
+		revs->grep_filter.ignore_case = 1;
 		revs->grep_filter.regflags |= REG_ICASE;
 		DIFF_OPT_SET(&revs->diffopt, PICKAXE_IGNORE_CASE);
 	} else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) {
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index a8dce0ca2d..547f4c19a7 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -231,14 +231,47 @@ second
 initial
 EOF
 test_expect_success 'log --invert-grep --grep' '
-	git log --pretty="tformat:%s" --invert-grep --grep=th --grep=Sec >actual &&
-	test_cmp expect actual
+	# Fixed
+	git -c grep.patternType=fixed log --pretty="tformat:%s" --invert-grep --grep=th --grep=Sec >actual &&
+	test_cmp expect actual &&
+
+	# POSIX basic
+	git -c grep.patternType=basic log --pretty="tformat:%s" --invert-grep --grep=t[h] --grep=S[e]c >actual &&
+	test_cmp expect actual &&
+
+	# POSIX extended
+	git -c grep.patternType=extended log --pretty="tformat:%s" --invert-grep --grep=t[h] --grep=S[e]c >actual &&
+	test_cmp expect actual &&
+
+	# PCRE
+	if test_have_prereq PCRE
+	then
+		git -c grep.patternType=perl log --pretty="tformat:%s" --invert-grep --grep=t[h] --grep=S[e]c >actual &&
+		test_cmp expect actual
+	fi
 '
 
 test_expect_success 'log --invert-grep --grep -i' '
 	echo initial >expect &&
-	git log --pretty="tformat:%s" --invert-grep -i --grep=th --grep=Sec >actual &&
-	test_cmp expect actual
+
+	# Fixed
+	git -c grep.patternType=fixed log --pretty="tformat:%s" --invert-grep -i --grep=th --grep=Sec >actual &&
+	test_cmp expect actual &&
+
+	# POSIX basic
+	git -c grep.patternType=basic log --pretty="tformat:%s" --invert-grep -i --grep=t[h] --grep=S[e]c >actual &&
+	test_cmp expect actual &&
+
+	# POSIX extended
+	git -c grep.patternType=extended log --pretty="tformat:%s" --invert-grep -i --grep=t[h] --grep=S[e]c >actual &&
+	test_cmp expect actual &&
+
+	# PCRE
+	if test_have_prereq PCRE
+	then
+		git -c grep.patternType=perl log --pretty="tformat:%s" --invert-grep -i --grep=t[h] --grep=S[e]c >actual &&
+		test_cmp expect actual
+	fi
 '
 
 test_expect_success 'log --grep option parsing' '
@@ -256,8 +289,25 @@ test_expect_success 'log -i --grep' '
 
 test_expect_success 'log --grep -i' '
 	echo Second >expect &&
+
+	# Fixed
 	git log -1 --pretty="tformat:%s" --grep=sec -i >actual &&
-	test_cmp expect actual
+	test_cmp expect actual &&
+
+	# POSIX basic
+	git -c grep.patternType=basic log -1 --pretty="tformat:%s" --grep=s[e]c -i >actual &&
+	test_cmp expect actual &&
+
+	# POSIX extended
+	git -c grep.patternType=extended log -1 --pretty="tformat:%s" --grep=s[e]c -i >actual &&
+	test_cmp expect actual &&
+
+	# PCRE
+	if test_have_prereq PCRE
+	then
+		git -c grep.patternType=perl log -1 --pretty="tformat:%s" --grep=s[e]c -i >actual &&
+		test_cmp expect actual
+	fi
 '
 
 test_expect_success 'log -F -E --grep=<ere> uses ere' '
-- 
2.13.0.303.g4ebf302169



* [PATCH v3 06/30] grep: add a test asserting that --perl-regexp dies when !PCRE
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (4 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 05/30] log: make --regexp-ignore-case work with --perl-regexp Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 07/30] grep: add a test for backreferences in PCRE patterns Ævar Arnfjörð Bjarmason
                   ` (24 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Add a test asserting that when --perl-regexp (and -P for grep) is
given to git-grep & git-log built without PCRE support, we die with
an error.

In developing the PCRE v2 series I introduced a regression where -P
would (through control-flow fall-through) become synonymous with basic
POSIX matching. I.e. "git grep -P '[\d]'" would match "d" instead of
digits.

The entire test suite would still pass with this serious regression,
since everything that tested for --perl-regexp was guarded by the
PCRE prerequisite. Fix that blind spot by adding tests under !PCRE
asserting that git must die when given --perl-regexp or -P.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t4202-log.sh  |  4 +++-
 t/t7810-grep.sh | 12 ++++++++++++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 547f4c19a7..dbed3efeee 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -418,7 +418,9 @@ test_expect_success 'log with various grep.patternType configurations & command-
 			git log --pretty=tformat:%s --perl-regexp \
 				--grep="[\d]\|" >actual.perl.long-arg &&
 			test_cmp expect.perl actual.perl.long-arg
-
+		else
+			test_must_fail git log --perl-regexp \
+				--grep="[\d]\|"
 		fi &&
 		test_cmp expect.fixed actual.fixed.long-arg &&
 		test_cmp expect.basic actual.basic.long-arg &&
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index c84c4d99f9..8d69113695 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -281,6 +281,10 @@ do
 		test_cmp expected actual
 	'
 
+	test_expect_success !PCRE "grep $L with grep.patterntype=perl errors without PCRE" '
+		test_must_fail git -c grep.patterntype=perl grep "foo.*bar"
+	'
+
 	test_expect_success "grep $L with grep.patternType=default and grep.extendedRegexp=true" '
 		echo "${HC}ab:abc" >expected &&
 		git \
@@ -1058,11 +1062,19 @@ test_expect_success PCRE 'grep --perl-regexp pattern' '
 	test_cmp expected actual
 '
 
+test_expect_success !PCRE 'grep --perl-regexp pattern errors without PCRE' '
+	test_must_fail git grep --perl-regexp "foo.*bar"
+'
+
 test_expect_success PCRE 'grep -P pattern' '
 	git grep -P "\p{Ps}.*?\p{Pe}" hello.c >actual &&
 	test_cmp expected actual
 '
 
+test_expect_success !PCRE 'grep -P pattern errors without PCRE' '
+	test_must_fail git grep -P "foo.*bar"
+'
+
 test_expect_success 'grep pattern with grep.extendedRegexp=true' '
 	>empty &&
 	test_must_fail git -c grep.extendedregexp=true \
-- 
2.13.0.303.g4ebf302169



* [PATCH v3 07/30] grep: add a test for backreferences in PCRE patterns
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (5 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 06/30] grep: add a test asserting that --perl-regexp dies when !PCRE Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 08/30] grep: change non-ASCII -i test to stop using --debug Ævar Arnfjörð Bjarmason
                   ` (23 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Add a test for backreferences such as (.)\1 in PCRE patterns. This
test ensures that the PCRE_NO_AUTO_CAPTURE option isn't turned
on. Before this change, turning it on would break these sorts of
patterns, but wouldn't break any tests.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
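
Not part of the commit: a minimal sketch of what a backreference does.
It uses POSIX basic regex syntax in plain grep(1) as a stand-in for
PCRE, an assumption made so it runs without a PCRE-enabled build. The
sample input is made up for illustration.

```shell
# \(.\)\1 matches any character immediately followed by itself.
# Hypothetical sample input: only "hello" has a doubled character ("ll").
doubled=$(printf 'hello\nworld\n' | grep '\(.\)\1')
echo "$doubled"
```

The test added below checks the same idea through "git grep -P", with
both the numbered form (.)\1 and a named group.
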
 t/t7810-grep.sh | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 8d69113695..daa906b9b0 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -1114,6 +1114,13 @@ test_expect_success PCRE 'grep -P -w pattern' '
 	test_cmp expected actual
 '
 
+test_expect_success PCRE 'grep -P backreferences work (the PCRE NO_AUTO_CAPTURE flag is not set)' '
+	git grep -P -h "(?P<one>.)(?P=one)" hello_world >actual &&
+	test_cmp hello_world actual &&
+	git grep -P -h "(.)\1" hello_world >actual &&
+	test_cmp hello_world actual
+'
+
 test_expect_success 'grep -G invalidpattern properly dies ' '
 	test_must_fail git grep -G "a["
 '
-- 
2.13.0.303.g4ebf302169



* [PATCH v3 08/30] grep: change non-ASCII -i test to stop using --debug
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (6 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 07/30] grep: add a test for backreferences in PCRE patterns Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 09/30] grep: add tests for --threads=N and grep.threads Ævar Arnfjörð Bjarmason
                   ` (22 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Change a non-ASCII case-insensitive test case to stop using --debug,
and instead simply test for the expected results.

The test coverage remains the same with this change, but the test
won't break due to internal refactoring.

This test was added in commit 793dc676e0 ("grep/icase: avoid kwsset
when -F is specified", 2016-06-25). It was asserting that the regex
must be compiled with compile_fixed_regexp(). Instead, test for the
expected results, allowing the underlying implementation to change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t7812-grep-icase-non-ascii.sh | 25 +++++--------------------
 1 file changed, 5 insertions(+), 20 deletions(-)

diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh
index 04a61cb8e0..0059a1f837 100755
--- a/t/t7812-grep-icase-non-ascii.sh
+++ b/t/t7812-grep-icase-non-ascii.sh
@@ -36,29 +36,14 @@ test_expect_success GETTEXT_LOCALE,PCRE 'grep pcre utf-8 string with "+"' '
 '
 
 test_expect_success REGEX_LOCALE 'grep literal string, with -F' '
-	git grep --debug -i -F "TILRAUN: Halló Heimur!"  2>&1 >/dev/null |
-		 grep fixed >debug1 &&
-	test_write_lines "fixed TILRAUN: Halló Heimur!" >expect1 &&
-	test_cmp expect1 debug1 &&
-
-	git grep --debug -i -F "TILRAUN: HALLÓ HEIMUR!"  2>&1 >/dev/null |
-		 grep fixed >debug2 &&
-	test_write_lines "fixed TILRAUN: HALLÓ HEIMUR!" >expect2 &&
-	test_cmp expect2 debug2
+	git grep -i -F "TILRAUN: Halló Heimur!" &&
+	git grep -i -F "TILRAUN: HALLÓ HEIMUR!"
 '
 
 test_expect_success REGEX_LOCALE 'grep string with regex, with -F' '
-	test_write_lines "^*TILR^AUN:.* \\Halló \$He[]imur!\$" >file &&
-
-	git grep --debug -i -F "^*TILR^AUN:.* \\Halló \$He[]imur!\$" 2>&1 >/dev/null |
-		 grep fixed >debug1 &&
-	test_write_lines "fixed \\^*TILR^AUN:\\.\\* \\\\Halló \$He\\[]imur!\\\$" >expect1 &&
-	test_cmp expect1 debug1 &&
-
-	git grep --debug -i -F "^*TILR^AUN:.* \\HALLÓ \$HE[]IMUR!\$"  2>&1 >/dev/null |
-		 grep fixed >debug2 &&
-	test_write_lines "fixed \\^*TILR^AUN:\\.\\* \\\\HALLÓ \$HE\\[]IMUR!\\\$" >expect2 &&
-	test_cmp expect2 debug2
+	test_write_lines "TILRAUN: Halló Heimur [abc]!" >file3 &&
+	git add file3 &&
+	git grep -i -F "TILRAUN: Halló Heimur [abc]!" file3
 '
 
 test_expect_success REGEX_LOCALE 'pickaxe -i on non-ascii' '
-- 
2.13.0.303.g4ebf302169



* [PATCH v3 09/30] grep: add tests for --threads=N and grep.threads
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (7 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 08/30] grep: change non-ASCII -i test to stop using --debug Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 10/30] grep: amend submodule recursion test for regex engine testing Ævar Arnfjörð Bjarmason
                   ` (21 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Add tests for --threads=N being supplied on the command-line, or
grep.threads=N being supplied in the configuration.

When the threading support was made run-time configurable in commit
89f09dd34e ("grep: add --threads=<num> option and grep.threads
configuration", 2015-12-15) no tests were added for it.

In developing a change to the grep code I was able to make
'--threads=1 <pat>' segfault, while the test suite still passed. This
change fixes that blind spot in the tests.

In addition to asserting that asking for N threads shouldn't segfault,
test that the grep output given any N is the same.

The choice to test only 0..10 as opposed to 0..8 or 0..16 or whatever
is arbitrary. Testing 1..1024 works locally for me (but gets
noticeably slower as more threads are spawned). Given the structure of
the code there's no reason to test an arbitrary number of threads;
only 0, 1 and >=2 are special modes of operation.

A later patch introduces a PTHREADS test prerequisite which is false
under NO_PTHREADS=UnfortunatelyYes, but even under NO_PTHREADS it's
fine to test --threads=N; we'll just ignore it and not use
threading. So these tests also make sense under that mode to assert
that --threads=N without pthreads still returns expected results.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
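
Not part of the commit: a rough sketch of the "same output for every
N" check described above, written as a standalone shell function
rather than with the test-lib helpers. The names check_same_output and
grep_with are hypothetical and exist only for this illustration.

```shell
# Hypothetical helper (not in the patch): run a command once per N in
# 0..max, passing --threads=N as its last argument, and fail as soon as
# one run's output differs from the previous run's.
check_same_output () {
	max=$1
	shift
	prev=
	n=0
	while test "$n" -le "$max"
	do
		cur=$("$@" "--threads=$n") || return 1
		if test "$n" -ge 1 && test "$cur" != "$prev"
		then
			echo "output differs at --threads=$n" >&2
			return 1
		fi
		prev=$cur
		n=$((n + 1))
	done
}

# Example wrapper so the option lands before the pattern.
grep_with () {
	git grep "$1" .
}
```

Inside a repository with tracked content this would be invoked as
"check_same_output 10 grep_with". Comparing each run against the
previous one mirrors what the test below does with actual.$threads.
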
 t/t7810-grep.sh | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index daa906b9b0..561709ef6a 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -775,6 +775,22 @@ test_expect_success 'grep -W with userdiff' '
 	test_cmp expected actual
 '
 
+for threads in $(test_seq 0 10)
+do
+	test_expect_success "grep --threads=$threads & -c grep.threads=$threads" "
+		git grep --threads=$threads . >actual.$threads &&
+		if test $threads -ge 1
+		then
+			test_cmp actual.\$(($threads - 1)) actual.$threads
+		fi &&
+		git -c grep.threads=$threads grep . >actual.$threads &&
+		if test $threads -ge 1
+		then
+			test_cmp actual.\$(($threads - 1)) actual.$threads
+		fi
+	"
+done
+
 test_expect_success 'grep from a subdirectory to search wider area (1)' '
 	mkdir -p s &&
 	(
-- 
2.13.0.303.g4ebf302169



* [PATCH v3 10/30] grep: amend submodule recursion test for regex engine testing
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (8 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 09/30] grep: add tests for --threads=N and grep.threads Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 11/30] grep: add tests for grep pattern types being passed to submodules Ævar Arnfjörð Bjarmason
                   ` (20 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Amend the submodule recursion test to prepare it for subsequent tests
of whether it passes along the grep.patternType to the submodule
greps.

This is the result of searching & replacing:

    foobar -> (1|2)d(3|4)
    foo    -> (1|2)
    bar    -> (3|4)

Currently there are no tests for whether e.g. -P or -E is correctly
passed along. Tests for that will be added in a follow-up change, but
first add content to the tests which will match differently under
different regex engines.

Reuse the pattern established in an earlier commit of mine in this
series ("log: add exhaustive tests for pattern style options &
config", 2017-04-07). The pattern "(.|.)[\d]" will match this content
differently under fixed/basic/extended & perl.

This test code was originally added in commit 0281e487fd ("grep:
optionally recurse into submodules", 2016-12-16).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t7814-grep-recurse-submodules.sh | 166 ++++++++++++++++++-------------------
 1 file changed, 83 insertions(+), 83 deletions(-)

diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index 5b6eb3a65e..1472855e1d 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -9,13 +9,13 @@ submodules.
 . ./test-lib.sh
 
 test_expect_success 'setup directory structure and submodule' '
-	echo "foobar" >a &&
+	echo "(1|2)d(3|4)" >a &&
 	mkdir b &&
-	echo "bar" >b/b &&
+	echo "(3|4)" >b/b &&
 	git add a b &&
 	git commit -m "add a and b" &&
 	git init submodule &&
-	echo "foobar" >submodule/a &&
+	echo "(1|2)d(3|4)" >submodule/a &&
 	git -C submodule add a &&
 	git -C submodule commit -m "add a" &&
 	git submodule add ./submodule &&
@@ -24,18 +24,18 @@ test_expect_success 'setup directory structure and submodule' '
 
 test_expect_success 'grep correctly finds patterns in a submodule' '
 	cat >expect <<-\EOF &&
-	a:foobar
-	b/b:bar
-	submodule/a:foobar
+	a:(1|2)d(3|4)
+	b/b:(3|4)
+	submodule/a:(1|2)d(3|4)
 	EOF
 
-	git grep -e "bar" --recurse-submodules >actual &&
+	git grep -e "(3|4)" --recurse-submodules >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success 'grep and basic pathspecs' '
 	cat >expect <<-\EOF &&
-	submodule/a:foobar
+	submodule/a:(1|2)d(3|4)
 	EOF
 
 	git grep -e. --recurse-submodules -- submodule >actual &&
@@ -44,7 +44,7 @@ test_expect_success 'grep and basic pathspecs' '
 
 test_expect_success 'grep and nested submodules' '
 	git init submodule/sub &&
-	echo "foobar" >submodule/sub/a &&
+	echo "(1|2)d(3|4)" >submodule/sub/a &&
 	git -C submodule/sub add a &&
 	git -C submodule/sub commit -m "add a" &&
 	git -C submodule submodule add ./sub &&
@@ -54,117 +54,117 @@ test_expect_success 'grep and nested submodules' '
 	git commit -m "updated submodule" &&
 
 	cat >expect <<-\EOF &&
-	a:foobar
-	b/b:bar
-	submodule/a:foobar
-	submodule/sub/a:foobar
+	a:(1|2)d(3|4)
+	b/b:(3|4)
+	submodule/a:(1|2)d(3|4)
+	submodule/sub/a:(1|2)d(3|4)
 	EOF
 
-	git grep -e "bar" --recurse-submodules >actual &&
+	git grep -e "(3|4)" --recurse-submodules >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success 'grep and multiple patterns' '
 	cat >expect <<-\EOF &&
-	a:foobar
-	submodule/a:foobar
-	submodule/sub/a:foobar
+	a:(1|2)d(3|4)
+	submodule/a:(1|2)d(3|4)
+	submodule/sub/a:(1|2)d(3|4)
 	EOF
 
-	git grep -e "bar" --and -e "foo" --recurse-submodules >actual &&
+	git grep -e "(3|4)" --and -e "(1|2)" --recurse-submodules >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success 'grep and multiple patterns' '
 	cat >expect <<-\EOF &&
-	b/b:bar
+	b/b:(3|4)
 	EOF
 
-	git grep -e "bar" --and --not -e "foo" --recurse-submodules >actual &&
+	git grep -e "(3|4)" --and --not -e "(1|2)" --recurse-submodules >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success 'basic grep tree' '
 	cat >expect <<-\EOF &&
-	HEAD:a:foobar
-	HEAD:b/b:bar
-	HEAD:submodule/a:foobar
-	HEAD:submodule/sub/a:foobar
+	HEAD:a:(1|2)d(3|4)
+	HEAD:b/b:(3|4)
+	HEAD:submodule/a:(1|2)d(3|4)
+	HEAD:submodule/sub/a:(1|2)d(3|4)
 	EOF
 
-	git grep -e "bar" --recurse-submodules HEAD >actual &&
+	git grep -e "(3|4)" --recurse-submodules HEAD >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success 'grep tree HEAD^' '
 	cat >expect <<-\EOF &&
-	HEAD^:a:foobar
-	HEAD^:b/b:bar
-	HEAD^:submodule/a:foobar
+	HEAD^:a:(1|2)d(3|4)
+	HEAD^:b/b:(3|4)
+	HEAD^:submodule/a:(1|2)d(3|4)
 	EOF
 
-	git grep -e "bar" --recurse-submodules HEAD^ >actual &&
+	git grep -e "(3|4)" --recurse-submodules HEAD^ >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success 'grep tree HEAD^^' '
 	cat >expect <<-\EOF &&
-	HEAD^^:a:foobar
-	HEAD^^:b/b:bar
+	HEAD^^:a:(1|2)d(3|4)
+	HEAD^^:b/b:(3|4)
 	EOF
 
-	git grep -e "bar" --recurse-submodules HEAD^^ >actual &&
+	git grep -e "(3|4)" --recurse-submodules HEAD^^ >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success 'grep tree and pathspecs' '
 	cat >expect <<-\EOF &&
-	HEAD:submodule/a:foobar
-	HEAD:submodule/sub/a:foobar
+	HEAD:submodule/a:(1|2)d(3|4)
+	HEAD:submodule/sub/a:(1|2)d(3|4)
 	EOF
 
-	git grep -e "bar" --recurse-submodules HEAD -- submodule >actual &&
+	git grep -e "(3|4)" --recurse-submodules HEAD -- submodule >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success 'grep tree and pathspecs' '
 	cat >expect <<-\EOF &&
-	HEAD:submodule/a:foobar
-	HEAD:submodule/sub/a:foobar
+	HEAD:submodule/a:(1|2)d(3|4)
+	HEAD:submodule/sub/a:(1|2)d(3|4)
 	EOF
 
-	git grep -e "bar" --recurse-submodules HEAD -- "submodule*a" >actual &&
+	git grep -e "(3|4)" --recurse-submodules HEAD -- "submodule*a" >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success 'grep tree and more pathspecs' '
 	cat >expect <<-\EOF &&
-	HEAD:submodule/a:foobar
+	HEAD:submodule/a:(1|2)d(3|4)
 	EOF
 
-	git grep -e "bar" --recurse-submodules HEAD -- "submodul?/a" >actual &&
+	git grep -e "(3|4)" --recurse-submodules HEAD -- "submodul?/a" >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success 'grep tree and more pathspecs' '
 	cat >expect <<-\EOF &&
-	HEAD:submodule/sub/a:foobar
+	HEAD:submodule/sub/a:(1|2)d(3|4)
 	EOF
 
-	git grep -e "bar" --recurse-submodules HEAD -- "submodul*/sub/a" >actual &&
+	git grep -e "(3|4)" --recurse-submodules HEAD -- "submodul*/sub/a" >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success !MINGW 'grep recurse submodule colon in name' '
 	git init parent &&
 	test_when_finished "rm -rf parent" &&
-	echo "foobar" >"parent/fi:le" &&
+	echo "(1|2)d(3|4)" >"parent/fi:le" &&
 	git -C parent add "fi:le" &&
 	git -C parent commit -m "add fi:le" &&
 
 	git init "su:b" &&
 	test_when_finished "rm -rf su:b" &&
-	echo "foobar" >"su:b/fi:le" &&
+	echo "(1|2)d(3|4)" >"su:b/fi:le" &&
 	git -C "su:b" add "fi:le" &&
 	git -C "su:b" commit -m "add fi:le" &&
 
@@ -172,30 +172,30 @@ test_expect_success !MINGW 'grep recurse submodule colon in name' '
 	git -C parent commit -m "add submodule" &&
 
 	cat >expect <<-\EOF &&
-	fi:le:foobar
-	su:b/fi:le:foobar
+	fi:le:(1|2)d(3|4)
+	su:b/fi:le:(1|2)d(3|4)
 	EOF
-	git -C parent grep -e "foobar" --recurse-submodules >actual &&
+	git -C parent grep -e "(1|2)d(3|4)" --recurse-submodules >actual &&
 	test_cmp expect actual &&
 
 	cat >expect <<-\EOF &&
-	HEAD:fi:le:foobar
-	HEAD:su:b/fi:le:foobar
+	HEAD:fi:le:(1|2)d(3|4)
+	HEAD:su:b/fi:le:(1|2)d(3|4)
 	EOF
-	git -C parent grep -e "foobar" --recurse-submodules HEAD >actual &&
+	git -C parent grep -e "(1|2)d(3|4)" --recurse-submodules HEAD >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success 'grep history with moved submoules' '
 	git init parent &&
 	test_when_finished "rm -rf parent" &&
-	echo "foobar" >parent/file &&
+	echo "(1|2)d(3|4)" >parent/file &&
 	git -C parent add file &&
 	git -C parent commit -m "add file" &&
 
 	git init sub &&
 	test_when_finished "rm -rf sub" &&
-	echo "foobar" >sub/file &&
+	echo "(1|2)d(3|4)" >sub/file &&
 	git -C sub add file &&
 	git -C sub commit -m "add file" &&
 
@@ -203,82 +203,82 @@ test_expect_success 'grep history with moved submoules' '
 	git -C parent commit -m "add submodule" &&
 
 	cat >expect <<-\EOF &&
-	dir/sub/file:foobar
-	file:foobar
+	dir/sub/file:(1|2)d(3|4)
+	file:(1|2)d(3|4)
 	EOF
-	git -C parent grep -e "foobar" --recurse-submodules >actual &&
+	git -C parent grep -e "(1|2)d(3|4)" --recurse-submodules >actual &&
 	test_cmp expect actual &&
 
 	git -C parent mv dir/sub sub-moved &&
 	git -C parent commit -m "moved submodule" &&
 
 	cat >expect <<-\EOF &&
-	file:foobar
-	sub-moved/file:foobar
+	file:(1|2)d(3|4)
+	sub-moved/file:(1|2)d(3|4)
 	EOF
-	git -C parent grep -e "foobar" --recurse-submodules >actual &&
+	git -C parent grep -e "(1|2)d(3|4)" --recurse-submodules >actual &&
 	test_cmp expect actual &&
 
 	cat >expect <<-\EOF &&
-	HEAD^:dir/sub/file:foobar
-	HEAD^:file:foobar
+	HEAD^:dir/sub/file:(1|2)d(3|4)
+	HEAD^:file:(1|2)d(3|4)
 	EOF
-	git -C parent grep -e "foobar" --recurse-submodules HEAD^ >actual &&
+	git -C parent grep -e "(1|2)d(3|4)" --recurse-submodules HEAD^ >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success 'grep using relative path' '
 	test_when_finished "rm -rf parent sub" &&
 	git init sub &&
-	echo "foobar" >sub/file &&
+	echo "(1|2)d(3|4)" >sub/file &&
 	git -C sub add file &&
 	git -C sub commit -m "add file" &&
 
 	git init parent &&
-	echo "foobar" >parent/file &&
+	echo "(1|2)d(3|4)" >parent/file &&
 	git -C parent add file &&
 	mkdir parent/src &&
-	echo "foobar" >parent/src/file2 &&
+	echo "(1|2)d(3|4)" >parent/src/file2 &&
 	git -C parent add src/file2 &&
 	git -C parent submodule add ../sub &&
 	git -C parent commit -m "add files and submodule" &&
 
 	# From top works
 	cat >expect <<-\EOF &&
-	file:foobar
-	src/file2:foobar
-	sub/file:foobar
+	file:(1|2)d(3|4)
+	src/file2:(1|2)d(3|4)
+	sub/file:(1|2)d(3|4)
 	EOF
-	git -C parent grep --recurse-submodules -e "foobar" >actual &&
+	git -C parent grep --recurse-submodules -e "(1|2)d(3|4)" >actual &&
 	test_cmp expect actual &&
 
 	# Relative path to top
 	cat >expect <<-\EOF &&
-	../file:foobar
-	file2:foobar
-	../sub/file:foobar
+	../file:(1|2)d(3|4)
+	file2:(1|2)d(3|4)
+	../sub/file:(1|2)d(3|4)
 	EOF
-	git -C parent/src grep --recurse-submodules -e "foobar" -- .. >actual &&
+	git -C parent/src grep --recurse-submodules -e "(1|2)d(3|4)" -- .. >actual &&
 	test_cmp expect actual &&
 
 	# Relative path to submodule
 	cat >expect <<-\EOF &&
-	../sub/file:foobar
+	../sub/file:(1|2)d(3|4)
 	EOF
-	git -C parent/src grep --recurse-submodules -e "foobar" -- ../sub >actual &&
+	git -C parent/src grep --recurse-submodules -e "(1|2)d(3|4)" -- ../sub >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success 'grep from a subdir' '
 	test_when_finished "rm -rf parent sub" &&
 	git init sub &&
-	echo "foobar" >sub/file &&
+	echo "(1|2)d(3|4)" >sub/file &&
 	git -C sub add file &&
 	git -C sub commit -m "add file" &&
 
 	git init parent &&
 	mkdir parent/src &&
-	echo "foobar" >parent/src/file &&
+	echo "(1|2)d(3|4)" >parent/src/file &&
 	git -C parent add src/file &&
 	git -C parent submodule add ../sub src/sub &&
 	git -C parent submodule add ../sub sub &&
@@ -286,19 +286,19 @@ test_expect_success 'grep from a subdir' '
 
 	# Verify grep from root works
 	cat >expect <<-\EOF &&
-	src/file:foobar
-	src/sub/file:foobar
-	sub/file:foobar
+	src/file:(1|2)d(3|4)
+	src/sub/file:(1|2)d(3|4)
+	sub/file:(1|2)d(3|4)
 	EOF
-	git -C parent grep --recurse-submodules -e "foobar" >actual &&
+	git -C parent grep --recurse-submodules -e "(1|2)d(3|4)" >actual &&
 	test_cmp expect actual &&
 
 	# Verify grep from a subdir works
 	cat >expect <<-\EOF &&
-	file:foobar
-	sub/file:foobar
+	file:(1|2)d(3|4)
+	sub/file:(1|2)d(3|4)
 	EOF
-	git -C parent/src grep --recurse-submodules -e "foobar" >actual &&
+	git -C parent/src grep --recurse-submodules -e "(1|2)d(3|4)" >actual &&
 	test_cmp expect actual
 '
 
-- 
2.13.0.303.g4ebf302169



* [PATCH v3 11/30] grep: add tests for grep pattern types being passed to submodules
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (9 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 10/30] grep: amend submodule recursion test for regex engine testing Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 12/30] grep: add a test helper function for less verbose -f \0 tests Ævar Arnfjörð Bjarmason
                   ` (19 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Add testing for grep pattern types being correctly passed to
submodules. The pattern "(.|.)[\d]" doesn't match at all under fixed,
and matches a different set of lines under each of basic, extended &
perl regular expressions, so this change asserts that the pattern
type is passed along correctly.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
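
Not part of the commit: a small sketch of why "(.|.)[\d]" tells the
pattern types apart, using plain grep(1) instead of git-grep so it
runs anywhere. The file name "sample" and its third line (standing in
for a .gitmodules line) are assumptions for illustration.

```shell
# Sample content mirroring the strings used in the test setup, plus
# one line resembling .gitmodules content.
printf '%s\n' '(1|2)d(3|4)' '(3|4)' 'path = submodule' >sample

pattern='(.|.)[\d]'

# Fixed: the pattern is a literal string that appears nowhere.
# grep exits non-zero on no match, hence the "|| :".
fixed=$(grep -F -e "$pattern" sample) || :

# Basic: ( | ) are literals and [\d] is a backslash or "d",
# so only the line containing "(1|2)d" matches.
basic=$(grep -G -e "$pattern" sample) || :

# Extended: (.|.) is any one character, followed by a backslash or
# "d", so "submodule" (which contains "od") matches as well.
extended=$(grep -E -e "$pattern" sample) || :

printf 'fixed=[%s]\nbasic=[%s]\nextended=[%s]\n' "$fixed" "$basic" "$extended"
```

Under Perl syntax [\d] is a digit instead, which is why the PCRE
expectation below picks up b/b and drops the .gitmodules lines.
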
 t/t7814-grep-recurse-submodules.sh | 49 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index 1472855e1d..3a58197f47 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -313,4 +313,53 @@ test_incompatible_with_recurse_submodules ()
 test_incompatible_with_recurse_submodules --untracked
 test_incompatible_with_recurse_submodules --no-index
 
+test_expect_success 'grep --recurse-submodules should pass the pattern type along' '
+	# Fixed
+	test_must_fail git grep -F --recurse-submodules -e "(.|.)[\d]" &&
+	test_must_fail git -c grep.patternType=fixed grep --recurse-submodules -e "(.|.)[\d]" &&
+
+	# Basic
+	git grep -G --recurse-submodules -e "(.|.)[\d]" >actual &&
+	cat >expect <<-\EOF &&
+	a:(1|2)d(3|4)
+	submodule/a:(1|2)d(3|4)
+	submodule/sub/a:(1|2)d(3|4)
+	EOF
+	test_cmp expect actual &&
+	git -c grep.patternType=basic grep --recurse-submodules -e "(.|.)[\d]" >actual &&
+	test_cmp expect actual &&
+
+	# Extended
+	git grep -E --recurse-submodules -e "(.|.)[\d]" >actual &&
+	cat >expect <<-\EOF &&
+	.gitmodules:[submodule "submodule"]
+	.gitmodules:	path = submodule
+	.gitmodules:	url = ./submodule
+	a:(1|2)d(3|4)
+	submodule/.gitmodules:[submodule "sub"]
+	submodule/a:(1|2)d(3|4)
+	submodule/sub/a:(1|2)d(3|4)
+	EOF
+	test_cmp expect actual &&
+	git -c grep.patternType=extended grep --recurse-submodules -e "(.|.)[\d]" >actual &&
+	test_cmp expect actual &&
+	git -c grep.extendedRegexp=true grep --recurse-submodules -e "(.|.)[\d]" >actual &&
+	test_cmp expect actual &&
+
+	# Perl
+	if test_have_prereq PCRE
+	then
+		git grep -P --recurse-submodules -e "(.|.)[\d]" >actual &&
+		cat >expect <<-\EOF &&
+		a:(1|2)d(3|4)
+		b/b:(3|4)
+		submodule/a:(1|2)d(3|4)
+		submodule/sub/a:(1|2)d(3|4)
+		EOF
+		test_cmp expect actual &&
+		git -c grep.patternType=perl grep --recurse-submodules -e "(.|.)[\d]" >actual &&
+		test_cmp expect actual
+	fi
+'
+
 test_done
-- 
2.13.0.303.g4ebf302169



* [PATCH v3 12/30] grep: add a test helper function for less verbose -f \0 tests
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (10 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 11/30] grep: add tests for grep pattern types being passed to submodules Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 13/30] grep: prepare for testing binary regexes containing rx metacharacters Ævar Arnfjörð Bjarmason
                   ` (18 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Add a helper function to make the tests which check for patterns with
\0 in them more succinct. Right now this isn't a big win, but
subsequent commits will add a lot more of these tests.

The helper is based on the match() function in t3070-wildmatch.sh.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
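
Not part of the commit: a sketch of what the "printf ... | q_to_nul
>f" step in nul_match produces. The tr-based q_to_nul here is a
stand-in written for this illustration, on the assumption that the
real helper from git's test library behaves the same way for this
input.

```shell
# Stand-in for the q_to_nul helper from git's test library; this
# tr-based definition is an assumption made so the sketch runs alone.
q_to_nul () {
	tr 'Q' '\000'
}

# Write the 3-byte pattern y<NUL>f to the file "f", as nul_match does.
printf 'yQf' | q_to_nul >f

# Dump the bytes in hex to confirm the NUL is really there.
bytes=$(od -An -tx1 f | tr -d ' \n')
echo "$bytes"
```

The pattern goes through a file and "git grep -f f" because a NUL
byte can't be passed as part of a command-line argument.
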
 t/t7008-grep-binary.sh | 58 +++++++++++++++++++++++++-------------------------
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index 9c9c378119..df93d8e44c 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -4,6 +4,29 @@ test_description='git grep in binary files'
 
 . ./test-lib.sh
 
+nul_match () {
+	matches=$1
+	flags=$2
+	pattern=$3
+	pattern_human=$(echo "$pattern" | sed 's/Q/<NUL>/g')
+
+	if test "$matches" = 1
+	then
+		test_expect_success "git grep -f f $flags '$pattern_human' a" "
+			printf '$pattern' | q_to_nul >f &&
+			git grep -f f $flags a
+		"
+	elif test "$matches" = 0
+	then
+		test_expect_success "git grep -f f $flags '$pattern_human' a" "
+			printf '$pattern' | q_to_nul >f &&
+			test_must_fail git grep -f f $flags a
+		"
+	else
+		test_expect_success "PANIC: Test framework error. Unknown matches value $matches" 'false'
+	fi
+}
+
 test_expect_success 'setup' "
 	echo 'binaryQfile' | q_to_nul >a &&
 	git add a &&
@@ -69,35 +92,12 @@ test_expect_failure 'git grep .fi a' '
 	git grep .fi a
 '
 
-test_expect_success 'git grep -F y<NUL>f a' "
-	printf 'yQf' | q_to_nul >f &&
-	git grep -f f -F a
-"
-
-test_expect_success 'git grep -F y<NUL>x a' "
-	printf 'yQx' | q_to_nul >f &&
-	test_must_fail git grep -f f -F a
-"
-
-test_expect_success 'git grep -Fi Y<NUL>f a' "
-	printf 'YQf' | q_to_nul >f &&
-	git grep -f f -Fi a
-"
-
-test_expect_success 'git grep -Fi Y<NUL>x a' "
-	printf 'YQx' | q_to_nul >f &&
-	test_must_fail git grep -f f -Fi a
-"
-
-test_expect_success 'git grep y<NUL>f a' "
-	printf 'yQf' | q_to_nul >f &&
-	git grep -f f a
-"
-
-test_expect_success 'git grep y<NUL>x a' "
-	printf 'yQx' | q_to_nul >f &&
-	test_must_fail git grep -f f a
-"
+nul_match 1 '-F' 'yQf'
+nul_match 0 '-F' 'yQx'
+nul_match 1 '-Fi' 'YQf'
+nul_match 0 '-Fi' 'YQx'
+nul_match 1 '' 'yQf'
+nul_match 0 '' 'yQx'
 
 test_expect_success 'grep respects binary diff attribute' '
 	echo text >t &&
-- 
2.13.0.303.g4ebf302169



* [PATCH v3 13/30] grep: prepare for testing binary regexes containing rx metacharacters
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (11 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 12/30] grep: add a test helper function for less verbose -f \0 tests Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 14/30] grep: add tests to fix blind spots with \0 patterns Ævar Arnfjörð Bjarmason
                   ` (17 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Add setup code needed for testing regexes that contain both binary
data and regex metacharacters.

The POSIX regcomp() function inherently can't support that, because it
takes a \0-terminated char *, but other regex engine APIs such as PCRE
v2 take a pattern/length pair, and are thus able to handle \0s in
patterns just like any other character.
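
As a rough illustration of that difference (a sketch only, not part of
this patch), the snippet below writes a three-byte pattern containing
a NUL and then emulates what a NUL-terminated consumer would see. The
q_to_nul function is a stand-in that assumes the test-lib helper of
the same name simply maps "Q" to NUL; the variable names are made up
for this example.

```shell
# Stand-in for the q_to_nul helper from git's test-lib; assumed here to
# simply map every "Q" to a NUL byte.
q_to_nul () {
	tr 'Q' '\000'
}

dir=$(mktemp -d)
printf 'yQf' | q_to_nul >"$dir/pattern"

# A pattern/length style API gets to see all three bytes.
full_len=$(($(wc -c <"$dir/pattern")))

# A consumer of a NUL-terminated string only sees what precedes the
# first NUL; emulate that by splitting on NUL and keeping the first piece.
c_string=$(tr '\000' '\n' <"$dir/pattern" | head -n 1)

echo "pattern/length view: $full_len bytes"
echo "NUL-terminated view: '$c_string'"
rm -rf "$dir"
```

Everything after the first NUL is invisible in the second view, which
is why a regcomp()-style interface cannot express such patterns.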

When kwset was imported in commit 9eceddeec6 ("Use kwset in grep",
2011-08-21) this limitation was fixed, but at the expense of
introducing the undocumented limitation that any pattern containing \0
implicitly becomes a fixed match (equivalent to -F having been
provided).

That's not something we'd like to keep in the future. The inability to
match patterns containing \0 as regexes is a leaky implementation
detail.

So add tests as a first step towards changing that. In order to test
that \0-patterns can properly match as regexes the test string needs
to have some regex metacharacters in it.

There were other blind spots in the tests. The code around kwset
specially handles case-insensitive & non-ASCII data, but there were no
tests for this.

Fix all of that by amending the text being matched to contain both
regex metacharacters & non-ASCII data.
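
To see why the literal "m[*]c" in the new test data is useful, here is
a small sketch using the system grep rather than git grep (an
assumption made purely to keep the example self-contained; the "try"
helper is hypothetical). The same text matches as a fixed string,
fails as a regex, and matches again with a wildcard regex, so the
three interpretations can be told apart:

```shell
# The literal text embedded in the amended test data.
haystack='m[*]c'

# Report whether grep, called with the given arguments, matches the haystack.
try () {
	if printf '%s\n' "$haystack" | grep -q "$@"
	then
		echo match
	else
		echo nomatch
	fi
}

fixed=$(try -F 'm[*]c')	# compared byte for byte
regex=$(try 'm[*]c')	# [*] is a bracket expression for a single "*"
wild=$(try 'm.*c')	# .* spans the "[*]" in the data

echo "fixed=$fixed regex=$regex wild=$wild"
```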

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t7008-grep-binary.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index df93d8e44c..20370d6e0c 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -28,7 +28,7 @@ nul_match () {
 }
 
 test_expect_success 'setup' "
-	echo 'binaryQfile' | q_to_nul >a &&
+	echo 'binaryQfileQm[*]cQ*æQð' | q_to_nul >a &&
 	git add a &&
 	git commit -m.
 "
@@ -162,7 +162,7 @@ test_expect_success 'grep does not honor textconv' '
 '
 
 test_expect_success 'grep --textconv honors textconv' '
-	echo "a:binaryQfile" >expect &&
+	echo "a:binaryQfileQm[*]cQ*æQð" >expect &&
 	git grep --textconv Qfile >actual &&
 	test_cmp expect actual
 '
@@ -172,7 +172,7 @@ test_expect_success 'grep --no-textconv does not honor textconv' '
 '
 
 test_expect_success 'grep --textconv blob honors textconv' '
-	echo "HEAD:a:binaryQfile" >expect &&
+	echo "HEAD:a:binaryQfileQm[*]cQ*æQð" >expect &&
 	git grep --textconv Qfile HEAD:a >actual &&
 	test_cmp expect actual
 '
-- 
2.13.0.303.g4ebf302169



* [PATCH v3 14/30] grep: add tests to fix blind spots with \0 patterns
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (12 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 13/30] grep: prepare for testing binary regexes containing rx metacharacters Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 15/30] perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do Ævar Arnfjörð Bjarmason
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Address a big blind spot in the tests for patterns containing \0. The
is_fixed() function considers any string that contains \0 fixed, even
if it contains regular expression metacharacters. Those patterns are
currently matched with kwset.

Before this change, removing the memchr(s, 0, len) check from
is_fixed() wouldn't change the result of any of the tests, since
regcomp() will happily match the part before the \0.

The kwset path is dependent on whether the -i flag is on, and
whether the pattern has any non-ASCII characters, but none of this was
tested for.
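
The effect of a matcher that stops at the first \0 can be sketched
with plain grep, using "Q" as a visible stand-in for NUL (an
assumption for readability; the helper names are hypothetical). A
pattern like "yQf" gives the same answer whether or not it is
truncated, whereas "yQNOMATCH" only gives the right answer when the
whole pattern is used, which is what the NOMATCH tests rely on:

```shell
# "Q" stands in for NUL throughout so that plain grep can be used.
haystack='binaryQfile'

# What a matcher that stops at the first NUL effectively searches for.
before_nul () {
	printf '%s\n' "${1%%Q*}"
}

# Print 1 if the fixed string $1 occurs in the haystack, 0 otherwise.
matches () {
	if printf '%s\n' "$haystack" | grep -q -F -e "$1"
	then
		echo 1
	else
		echo 0
	fi
}

for pattern in 'yQf' 'yQNOMATCH'
do
	whole=$(matches "$pattern")
	cut=$(matches "$(before_nul "$pattern")")
	echo "$pattern: whole=$whole truncated=$cut"
done
```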

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t7008-grep-binary.sh | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index 20370d6e0c..615e7e0162 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -22,6 +22,18 @@ nul_match () {
 			printf '$pattern' | q_to_nul >f &&
 			test_must_fail git grep -f f $flags a
 		"
+	elif test "$matches" = T1
+	then
+		test_expect_failure "git grep -f f $flags '$pattern_human' a" "
+			printf '$pattern' | q_to_nul >f &&
+			git grep -f f $flags a
+		"
+	elif test "$matches" = T0
+	then
+		test_expect_failure "git grep -f f $flags '$pattern_human' a" "
+			printf '$pattern' | q_to_nul >f &&
+			test_must_fail git grep -f f $flags a
+		"
 	else
 		test_expect_success "PANIC: Test framework error. Unknown matches value $matches" 'false'
 	fi
@@ -98,6 +110,65 @@ nul_match 1 '-Fi' 'YQf'
 nul_match 0 '-Fi' 'YQx'
 nul_match 1 '' 'yQf'
 nul_match 0 '' 'yQx'
+nul_match 1 '' 'æQð'
+nul_match 1 '-F' 'eQm[*]c'
+nul_match 1 '-Fi' 'EQM[*]C'
+
+# Regex patterns that would match but shouldn't with -F
+nul_match 0 '-F' 'yQ[f]'
+nul_match 0 '-F' '[y]Qf'
+nul_match 0 '-Fi' 'YQ[F]'
+nul_match 0 '-Fi' '[Y]QF'
+nul_match 0 '-F' 'æQ[ð]'
+nul_match 0 '-F' '[æ]Qð'
+nul_match 0 '-Fi' 'ÆQ[Ð]'
+nul_match 0 '-Fi' '[Æ]QÐ'
+
+# kwset is disabled on -i & non-ASCII. No way to match non-ASCII \0
+# patterns case-insensitively.
+nul_match T1 '-i' 'ÆQÐ'
+
+# \0 implicitly disables regexes. This is an undocumented internal
+# limitation.
+nul_match T1 '' 'yQ[f]'
+nul_match T1 '' '[y]Qf'
+nul_match T1 '-i' 'YQ[F]'
+nul_match T1 '-i' '[Y]Qf'
+nul_match T1 '' 'æQ[ð]'
+nul_match T1 '' '[æ]Qð'
+nul_match T1 '-i' 'ÆQ[Ð]'
+
+# ... because of \0 implicitly disabling regexes regexes that
+# should/shouldn't match don't do the right thing.
+nul_match T1 '' 'eQm.*cQ'
+nul_match T1 '-i' 'EQM.*cQ'
+nul_match T0 '' 'eQm[*]c'
+nul_match T0 '-i' 'EQM[*]C'
+
+# Due to the REG_STARTEND extension when kwset() is disabled on -i &
+# non-ASCII the string will be matched in its entirety, but the
+# pattern will be cut off at the first \0.
+nul_match 0 '-i' 'NOMATCHQð'
+nul_match T0 '-i' '[Æ]QNOMATCH'
+nul_match T0 '-i' '[æ]QNOMATCH'
+# Matches, but for the wrong reasons, just stops at [æ]
+nul_match 1 '-i' '[Æ]Qð'
+nul_match 1 '-i' '[æ]Qð'
+
+# Ensure that the matcher doesn't regress to something that stops at
+# \0
+nul_match 0 '-F' 'yQ[f]'
+nul_match 0 '-Fi' 'YQ[F]'
+nul_match 0 '' 'yQNOMATCH'
+nul_match 0 '' 'QNOMATCH'
+nul_match 0 '-i' 'YQNOMATCH'
+nul_match 0 '-i' 'QNOMATCH'
+nul_match 0 '-F' 'æQ[ð]'
+nul_match 0 '-Fi' 'ÆQ[Ð]'
+nul_match 0 '' 'yQNÓMATCH'
+nul_match 0 '' 'QNÓMATCH'
+nul_match 0 '-i' 'YQNÓMATCH'
+nul_match 0 '-i' 'QNÓMATCH'
 
 test_expect_success 'grep respects binary diff attribute' '
 	echo text >t &&
-- 
2.13.0.303.g4ebf302169



* [PATCH v3 15/30] perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (13 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 14/30] grep: add tests to fix blind spots with \0 patterns Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 23:50   ` Junio C Hamano
  2017-05-20 21:42 ` [PATCH v3 16/30] perf: emit progress output when unpacking & building Ævar Arnfjörð Bjarmason
                   ` (15 subsequent siblings)
  30 siblings, 1 reply; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Add a GIT_PERF_MAKE_COMMAND variable to complement the existing
GIT_PERF_MAKE_OPTS facility. This allows specifying an arbitrary shell
command to execute instead of 'make'.

This is useful e.g. in cases where the name, semantics or defaults of
a Makefile flag have changed over time. It can even be used to change
the contents of the tree, useful for monkeypatching ancient versions
of git to get them to build.

This opens Pandora's box in some ways: it's now possible to
"jailbreak" the perf environment and e.g. modify the source tree via
this arbitrary command instead of just issuing a custom "make"
command. Such a command has to be re-entrant in the sense that
subsequent perf runs will re-use the possibly modified tree.

It would be pointless to try to mitigate or work around that caveat in
a tool purely aimed at Git developers, so this change makes no attempt
to do so.
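
One way to keep such a command re-entrant is to guard the tree
modification with a marker file. The sketch below is only an
illustration: it runs against a throwaway directory instead of a real
build/$rev tree, the .perf-patched marker name is made up, and "echo
building" stands in for the real make invocation.

```shell
# Toy stand-in for build/$rev; nothing here touches a real git tree.
tree=$(mktemp -d)
echo 'CFLAGS = -O2' >"$tree/config.mak"

# Hypothetical command: patch once, guarded by a marker file, then "build".
GIT_PERF_MAKE_COMMAND='
	if ! test -e .perf-patched
	then
		echo "CFLAGS += -g" >>config.mak &&
		: >.perf-patched
	fi &&
	echo building'

# The command may be executed again on the same tree by a later run.
first=$(cd "$tree" && sh -c "$GIT_PERF_MAKE_COMMAND")
second=$(cd "$tree" && sh -c "$GIT_PERF_MAKE_COMMAND")

lines=$(($(wc -l <"$tree/config.mak")))
echo "config.mak has $lines lines after two runs"
rm -rf "$tree"
```

Without the marker check the second run would append the same line
again, which is the kind of surprise the re-entrancy caveat is about.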

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile      |  3 +++
 t/perf/README | 19 +++++++++++++++++--
 t/perf/run    | 11 +++++++++--
 3 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/Makefile b/Makefile
index eedadb8056..d1587452f1 100644
--- a/Makefile
+++ b/Makefile
@@ -2272,6 +2272,9 @@ endif
 ifdef GIT_PERF_MAKE_OPTS
 	@echo GIT_PERF_MAKE_OPTS=\''$(subst ','\'',$(subst ','\'',$(GIT_PERF_MAKE_OPTS)))'\' >>$@+
 endif
+ifdef GIT_PERF_MAKE_COMMAND
+	@echo GIT_PERF_MAKE_COMMAND=\''$(subst ','\'',$(subst ','\'',$(GIT_PERF_MAKE_COMMAND)))'\' >>$@+
+endif
 ifdef GIT_INTEROP_MAKE_OPTS
 	@echo GIT_INTEROP_MAKE_OPTS=\''$(subst ','\'',$(subst ','\'',$(GIT_INTEROP_MAKE_OPTS)))'\' >>$@+
 endif
diff --git a/t/perf/README b/t/perf/README
index 49ea4349be..b3d95042a8 100644
--- a/t/perf/README
+++ b/t/perf/README
@@ -60,8 +60,23 @@ You can set the following variables (also in your config.mak):
 
     GIT_PERF_MAKE_OPTS
 	Options to use when automatically building a git tree for
-	performance testing.  E.g., -j6 would be useful.
-
+	performance testing. E.g., -j6 would be useful. Passed
+	directly to make as "make $GIT_PERF_MAKE_OPTS".
+
+    GIT_PERF_MAKE_COMMAND
+	An arbitrary command that'll be run in place of the make
+	command, if set the GIT_PERF_MAKE_OPTS variable is
+	ignored. Useful in cases where source tree changes might
+	require issuing a different make command to different
+	revisions.
+
+	This can be (ab)used to monkeypatch or otherwise change the
+	tree about to be built. Note that the build directory can be
+	re-used for subsequent runs so the make command might get
+	executed multiple times on the same tree, but don't count on
+	any of that, that's an implementation detail that might change
+	in the future.
+ 
     GIT_PERF_REPO
     GIT_PERF_LARGE_REPO
 	Repositories to copy for the performance tests.  The normal
diff --git a/t/perf/run b/t/perf/run
index c788d713ae..b61024a830 100755
--- a/t/perf/run
+++ b/t/perf/run
@@ -37,8 +37,15 @@ build_git_rev () {
 			cp "../../$config" "build/$rev/"
 		fi
 	done
-	(cd build/$rev && make $GIT_PERF_MAKE_OPTS) ||
-	die "failed to build revision '$mydir'"
+	(
+		cd build/$rev &&
+		if test -n "$GIT_PERF_MAKE_COMMAND"
+		then
+			sh -c "$GIT_PERF_MAKE_COMMAND"
+		else
+			make $GIT_PERF_MAKE_OPTS
+		fi
+	) || die "failed to build revision '$mydir'"
 }
 
 run_dirs_helper () {
-- 
2.13.0.303.g4ebf302169



* [PATCH v3 16/30] perf: emit progress output when unpacking & building
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (14 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 15/30] perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 17/30] perf: add a comparison test of grep regex engines Ævar Arnfjörð Bjarmason
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Amend the t/perf/run output so that in addition to the "Running N
tests" heading currently being emitted, it also emits "Unpacking $rev"
and "Building $rev" when setting up the build/$rev directory & when
building it, respectively.

This makes it easier to see what's going on and what revision is being
tested as the output scrolls by.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/perf/run | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/perf/run b/t/perf/run
index b61024a830..beb4acc0e4 100755
--- a/t/perf/run
+++ b/t/perf/run
@@ -24,6 +24,7 @@ run_one_dir () {
 
 unpack_git_rev () {
 	rev=$1
+	echo "=== Unpacking $rev in build/$rev ==="
 	mkdir -p build/$rev
 	(cd "$(git rev-parse --show-cdup)" && git archive --format=tar $rev) |
 	(cd build/$rev && tar x)
@@ -37,6 +38,7 @@ build_git_rev () {
 			cp "../../$config" "build/$rev/"
 		fi
 	done
+	echo "=== Building $rev ==="
 	(
 		cd build/$rev &&
 		if test -n "$GIT_PERF_MAKE_COMMAND"
-- 
2.13.0.303.g4ebf302169



* [PATCH v3 17/30] perf: add a comparison test of grep regex engines
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (15 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 16/30] perf: emit progress output when unpacking & building Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 18/30] perf: add a comparison test of grep regex engines with -F Ævar Arnfjörð Bjarmason
                   ` (13 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Add a very basic performance comparison test comparing the POSIX
basic, extended and perl engines.

In theory the "basic" and "extended" engines should be implemented
using the same underlying code with a slightly different pattern
parser, but some implementations may not do this. Jump through some
slight hoops to test both, which is worthwhile since "basic" is the
default.
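
The hoops in question amount to writing each pattern in "basic"
syntax and deriving the "extended" and "perl" form by dropping the
backslashes. A minimal sketch of that conversion follows, using the
same sed expression as the test script; the bre_to_ere function name
is made up for this example, and the approach only works for patterns
like the ones tested here, where every backslash marks a BRE operator.

```shell
# Same sed expression as the perf script: drop every backslash.
bre_to_ere () {
	printf '%s\n' "$1" | sed 's/\\//g'
}

bre='\(e.t[^ ]*\|v.ry\) rare'
ere=$(bre_to_ere "$bre")
printf '%s\n' "$ere"
```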

Running this on an i7 3.4GHz Linux 4.9.0-2 Debian testing against a
checkout of linux.git & latest upstream PCRE, both PCRE and git
compiled with -O3 using gcc 7.1.1:

    $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux ./run p7820-grep-engines.sh
    [...]
    Test                                            this tree
    ---------------------------------------------------------------
    7820.1: basic grep 'how.to'                     0.34(1.24+0.53)
    7820.2: extended grep 'how.to'                  0.33(1.23+0.45)
    7820.3: perl grep 'how.to'                      0.31(1.05+0.56)
    7820.5: basic grep '^how to'                    0.32(1.24+0.42)
    7820.6: extended grep '^how to'                 0.33(1.20+0.44)
    7820.7: perl grep '^how to'                     0.57(2.67+0.42)
    7820.9: basic grep '[how] to'                   0.51(2.16+0.45)
    7820.10: extended grep '[how] to'               0.49(2.20+0.43)
    7820.11: perl grep '[how] to'                   0.56(2.60+0.43)
    7820.13: basic grep '\(e.t[^ ]*\|v.ry\) rare'   0.66(3.25+0.40)
    7820.14: extended grep '(e.t[^ ]*|v.ry) rare'   0.65(3.19+0.46)
    7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       1.05(5.74+0.34)
    7820.17: basic grep 'm\(ú\|u\)lt.b\(æ\|y\)te'   0.34(1.28+0.47)
    7820.18: extended grep 'm(ú|u)lt.b(æ|y)te'      0.34(1.38+0.38)
    7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.39(1.56+0.44)

Options can also be passed to git-grep via the GIT_PERF_7820_GREP_OPTS
environment variable. There are various modes such as "-v" that have
very different performance profiles, but handling the combinatorial
explosion of testing all those options would make this script much
more complex and harder to maintain. Instead just add the ability to
do one-shot runs with arbitrary options, e.g.:

    $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_7820_GREP_OPTS=" -i" ./run p7820-grep-engines.sh
    [...]
    Test                                               this tree
    ------------------------------------------------------------------
    7820.1: basic grep -i 'how.to'                     0.49(1.72+0.38)
    7820.2: extended grep -i 'how.to'                  0.46(1.64+0.42)
    7820.3: perl grep -i 'how.to'                      0.44(1.45+0.45)
    7820.5: basic grep -i '^how to'                    0.47(1.76+0.38)
    7820.6: extended grep -i '^how to'                 0.47(1.70+0.42)
    7820.7: perl grep -i '^how to'                     0.65(2.72+0.37)
    7820.9: basic grep -i '[how] to'                   0.86(3.64+0.42)
    7820.10: extended grep -i '[how] to'               0.84(3.62+0.46)
    7820.11: perl grep -i '[how] to'                   0.73(3.06+0.39)
    7820.13: basic grep -i '\(e.t[^ ]*\|v.ry\) rare'   1.63(8.13+0.36)
    7820.14: extended grep -i '(e.t[^ ]*|v.ry) rare'   1.64(8.01+0.44)
    7820.15: perl grep -i '(e.t[^ ]*|v.ry) rare'       1.44(6.88+0.44)
    7820.17: basic grep -i 'm\(ú\|u\)lt.b\(æ\|y\)te'   0.66(2.67+0.44)
    7820.18: extended grep -i 'm(ú|u)lt.b(æ|y)te'      0.66(2.67+0.43)
    7820.19: perl grep -i 'm(ú|u)lt.b(æ|y)te'          0.59(2.31+0.37)
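
The leading space in GIT_PERF_7820_GREP_OPTS matters because the
script appends the variable directly to "grep" when building both the
test title and the command. The following sketch only builds the
title string to show the effect; the title_for helper is hypothetical
and not part of the script.

```shell
# Build a test title the way the perf script does: the options variable
# is appended to "grep" with no separator of its own.
title_for () {
	printf '%s\n' "basic grep$GIT_PERF_7820_GREP_OPTS 'how.to'"
}

GIT_PERF_7820_GREP_OPTS=
plain=$(title_for)

GIT_PERF_7820_GREP_OPTS=' -i'
spaced=$(title_for)

# Forgetting the leading space glues the option onto "grep".
GIT_PERF_7820_GREP_OPTS='-i'
glued=$(title_for)

printf '%s\n' "$plain" "$spaced" "$glued"
```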

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/perf/p7820-grep-engines.sh | 47 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)
 create mode 100755 t/perf/p7820-grep-engines.sh

diff --git a/t/perf/p7820-grep-engines.sh b/t/perf/p7820-grep-engines.sh
new file mode 100755
index 0000000000..a3a1b9fa28
--- /dev/null
+++ b/t/perf/p7820-grep-engines.sh
@@ -0,0 +1,47 @@
+#!/bin/sh
+
+test_description="Comparison of git-grep's regex engines
+
+Set GIT_PERF_7820_GREP_OPTS in the environment to pass options to
+git-grep. Make sure to include a leading space,
+e.g. GIT_PERF_7820_GREP_OPTS=' -i'. Some options to try:
+
+	-i
+	-w
+	-v
+	-vi
+	-vw
+	-viw
+"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+test_checkout_worktree
+
+for pattern in \
+	'how.to' \
+	'^how to' \
+	'[how] to' \
+	'\(e.t[^ ]*\|v.ry\) rare' \
+	'm\(ú\|u\)lt.b\(æ\|y\)te'
+do
+	for engine in basic extended perl
+	do
+		if test $engine != "basic"
+		then
+			# Poor man's basic -> extended converter.
+			pattern=$(echo "$pattern" | sed 's/\\//g')
+		fi
+		test_perf "$engine grep$GIT_PERF_7820_GREP_OPTS '$pattern'" "
+			git -c grep.patternType=$engine grep$GIT_PERF_7820_GREP_OPTS -- '$pattern' >'out.$engine' || :
+		"
+	done
+
+	test_expect_success "assert that all engines found the same for$GIT_PERF_7820_GREP_OPTS '$pattern'" "
+		test_cmp 'out.basic' 'out.extended' &&
+		test_cmp 'out.basic' 'out.perl'
+	"
+done
+
+test_done
-- 
2.13.0.303.g4ebf302169



* [PATCH v3 18/30] perf: add a comparison test of grep regex engines with -F
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (16 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 17/30] perf: add a comparison test of grep regex engines Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 19/30] perf: add a comparison test of log --grep regex engines Ævar Arnfjörð Bjarmason
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Add a performance comparison test which compares both case-sensitive &
case-insensitive fixed-string grep, as well as non-ASCII
case-sensitive & case-insensitive grep.

    $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux ./run p7821-grep-engines-fixed.sh
    [...]
    Test                             this tree
    ------------------------------------------------
    7821.1: fixed grep int           0.61(1.72+0.65)
    7821.2: basic grep int           0.69(1.72+0.53)
    7821.3: extended grep int        0.60(1.72+0.54)
    7821.4: perl grep int            0.65(1.65+0.64)
    7821.6: fixed grep uncommon      0.25(0.53+0.48)
    7821.7: basic grep uncommon      0.26(0.57+0.46)
    7821.8: extended grep uncommon   0.25(0.52+0.51)
    7821.9: perl grep uncommon       0.26(0.56+0.48)
    7821.11: fixed grep æ            0.40(1.26+0.44)
    7821.12: basic grep æ            0.40(1.28+0.43)
    7821.13: extended grep æ         0.39(1.28+0.44)
    7821.14: perl grep æ             0.39(1.29+0.44)

This test needs to be run with GIT_PERF_7821_GREP_OPTS=' -i' to avoid
going through the same kwset.[ch] codepath, see the "Even when -F..."
comment in grep.c:

    $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_7821_GREP_OPTS=' -i' ./run p7821-grep-engines-fixed.sh
    [...]
    Test                                this tree
    ---------------------------------------------------
    7821.1: fixed grep -i int           1.55(1.86+0.66)
    7821.2: basic grep -i int           0.66(1.97+0.54)
    7821.3: extended grep -i int        0.72(1.88+0.62)
    7821.4: perl grep -i int            0.75(1.93+0.57)
    7821.6: fixed grep -i uncommon      0.27(0.52+0.54)
    7821.7: basic grep -i uncommon      0.25(0.58+0.44)
    7821.8: extended grep -i uncommon   0.26(0.62+0.43)
    7821.9: perl grep -i uncommon       0.26(0.55+0.53)
    7821.11: fixed grep -i æ            0.32(0.87+0.46)
    7821.12: basic grep -i æ            0.30(0.90+0.41)
    7821.13: extended grep -i æ         0.32(0.92+0.41)
    7821.14: perl grep -i æ             0.29(0.71+0.53)

I'm planning to change that; this performance test gives a baseline
for comparing performance before & after any such change.

See commit ("perf: add a comparison test of grep regex engines",
2017-04-19) for details on the machine the above test run was executed
on.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/perf/p7821-grep-engines-fixed.sh | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
 create mode 100755 t/perf/p7821-grep-engines-fixed.sh

diff --git a/t/perf/p7821-grep-engines-fixed.sh b/t/perf/p7821-grep-engines-fixed.sh
new file mode 100755
index 0000000000..d935194ecf
--- /dev/null
+++ b/t/perf/p7821-grep-engines-fixed.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+
+test_description="Comparison of git-grep's regex engines with -F
+
+Set GIT_PERF_7821_GREP_OPTS in the environment to pass options to
+git-grep. Make sure to include a leading space,
+e.g. GIT_PERF_7821_GREP_OPTS=' -w'. See p7820-grep-engines.sh for more
+options to try.
+"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+test_checkout_worktree
+
+for args in 'int' 'uncommon' 'æ'
+do
+	for engine in fixed basic extended perl
+	do
+		test_perf "$engine grep$GIT_PERF_7821_GREP_OPTS $args" "
+			git -c grep.patternType=$engine grep$GIT_PERF_7821_GREP_OPTS $args >'out.$engine.$args' || :
+		"
+	done
+
+	test_expect_success "assert that all engines found the same for$GIT_PERF_7821_GREP_OPTS $args" "
+		test_cmp 'out.fixed.$args' 'out.basic.$args' &&
+		test_cmp 'out.fixed.$args' 'out.extended.$args' &&
+		test_cmp 'out.fixed.$args' 'out.perl.$args'
+	"
+done
+
+test_done
-- 
2.13.0.303.g4ebf302169



* [PATCH v3 19/30] perf: add a comparison test of log --grep regex engines
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (17 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 18/30] perf: add a comparison test of grep regex engines with -F Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 20/30] grep: catch a missing enum in switch statement Ævar Arnfjörð Bjarmason
                   ` (11 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Add a very basic performance comparison test comparing the POSIX
basic, extended and perl engines with patterns matching log messages
via --grep=<pattern>.

    $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux ./run p4220-log-grep-engines.sh
    [...]
    Test                                                  this tree
    ---------------------------------------------------------------------
    4220.1: basic log --grep='how.to'                     6.22(6.00+0.21)
    4220.2: extended log --grep='how.to'                  6.23(5.98+0.23)
    4220.3: perl log --grep='how.to'                      6.07(5.79+0.25)
    4220.5: basic log --grep='^how to'                    6.19(5.93+0.22)
    4220.6: extended log --grep='^how to'                 6.19(5.93+0.23)
    4220.7: perl log --grep='^how to'                     6.14(5.88+0.24)
    4220.9: basic log --grep='[how] to'                   6.96(6.65+0.28)
    4220.10: extended log --grep='[how] to'               6.96(6.69+0.24)
    4220.11: perl log --grep='[how] to'                   6.95(6.58+0.33)
    4220.13: basic log --grep='\(e.t[^ ]*\|v.ry\) rare'   7.10(6.80+0.27)
    4220.14: extended log --grep='(e.t[^ ]*|v.ry) rare'   7.07(6.80+0.26)
    4220.15: perl log --grep='(e.t[^ ]*|v.ry) rare'       7.70(7.46+0.22)
    4220.17: basic log --grep='m\(ú\|u\)lt.b\(æ\|y\)te'   6.12(5.87+0.24)
    4220.18: extended log --grep='m(ú|u)lt.b(æ|y)te'      6.14(5.84+0.26)
    4220.19: perl log --grep='m(ú|u)lt.b(æ|y)te'          6.16(5.93+0.20)

With -i:

    $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_4220_LOG_OPTS=' -i' ./run p4220-log-grep-engines.sh
    [...]
    Test                                                     this tree
    ------------------------------------------------------------------------
    4220.1: basic log -i --grep='how.to'                     6.74(6.41+0.32)
    4220.2: extended log -i --grep='how.to'                  6.78(6.55+0.22)
    4220.3: perl log -i --grep='how.to'                      6.06(5.77+0.28)
    4220.5: basic log -i --grep='^how to'                    6.80(6.57+0.22)
    4220.6: extended log -i --grep='^how to'                 6.83(6.52+0.29)
    4220.7: perl log -i --grep='^how to'                     6.16(5.94+0.20)
    4220.9: basic log -i --grep='[how] to'                   7.87(7.61+0.24)
    4220.10: extended log -i --grep='[how] to'               7.85(7.57+0.27)
    4220.11: perl log -i --grep='[how] to'                   7.03(6.75+0.25)
    4220.13: basic log -i --grep='\(e.t[^ ]*\|v.ry\) rare'   8.68(8.41+0.25)
    4220.14: extended log -i --grep='(e.t[^ ]*|v.ry) rare'   8.80(8.44+0.28)
    4220.15: perl log -i --grep='(e.t[^ ]*|v.ry) rare'       7.85(7.56+0.26)
    4220.17: basic log -i --grep='m\(ú\|u\)lt.b\(æ\|y\)te'   6.94(6.68+0.24)
    4220.18: extended log -i --grep='m(ú|u)lt.b(æ|y)te'      7.04(6.76+0.24)
    4220.19: perl log -i --grep='m(ú|u)lt.b(æ|y)te'          6.26(5.92+0.29)

See commit ("perf: add a comparison test of grep regex engines",
2017-04-19) for details on the machine the above test run was executed
on.

Before commit ("log: make --regexp-ignore-case work with
--perl-regexp", 2017-05-20) this test will almost certainly
fail (depending on the repo) if passed the -i option, since -i wasn't
properly supported under PCRE.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/perf/p4220-log-grep-engines.sh | 44 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)
 create mode 100755 t/perf/p4220-log-grep-engines.sh

diff --git a/t/perf/p4220-log-grep-engines.sh b/t/perf/p4220-log-grep-engines.sh
new file mode 100755
index 0000000000..02793ac77b
--- /dev/null
+++ b/t/perf/p4220-log-grep-engines.sh
@@ -0,0 +1,44 @@
+#!/bin/sh
+
+test_description="Comparison of git-log's --grep regex engines
+
+Set GIT_PERF_4220_LOG_OPTS in the environment to pass options to
+git-log. Make sure to include a leading space,
+e.g. GIT_PERF_4220_LOG_OPTS=' -i'. Some options to try:
+
+	-i
+	--invert-grep
+	-i --invert-grep
+"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+test_checkout_worktree
+
+for pattern in \
+	'how.to' \
+	'^how to' \
+	'[how] to' \
+	'\(e.t[^ ]*\|v.ry\) rare' \
+	'm\(ú\|u\)lt.b\(æ\|y\)te'
+do
+	for engine in basic extended perl
+	do
+		if test $engine != "basic"
+		then
+			# Poor man's basic -> extended converter.
+			pattern=$(echo "$pattern" | sed 's/\\//g')
+		fi
+		test_perf "$engine log$GIT_PERF_4220_LOG_OPTS --grep='$pattern'" "
+			git -c grep.patternType=$engine log --pretty=format:%h$GIT_PERF_4220_LOG_OPTS --grep='$pattern' >'out.$engine' || :
+		"
+	done
+
+	test_expect_success "assert that all engines found the same for$GIT_PERF_4220_LOG_OPTS '$pattern'" "
+		test_cmp 'out.basic' 'out.extended' &&
+		test_cmp 'out.basic' 'out.perl'
+	"
+done
+
+test_done
-- 
2.13.0.303.g4ebf302169



* [PATCH v3 20/30] grep: catch a missing enum in switch statement
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (18 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 19/30] perf: add a comparison test of log --grep regex engines Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 21/30] grep: remove redundant regflags assignments Ævar Arnfjörð Bjarmason
                   ` (10 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Add a die(...) to a default case for the switch statement selecting
between grep pattern types under --recurse-submodules.

Normally this would be caught by -Wswitch, but the grep_pattern_type
type is converted to int by going through parse_options(). Changing
the argument type passed to compile_submodule_options() won't work,
since the value will just get coerced. The -Wswitch-default warning
would warn about it, but it produces so much noise across the codebase
that this potential issue would be drowned out.

Thus catching this at runtime is the least bad option. This won't ever
trigger in practice, but if a new pattern type were to be added this
catches an otherwise silent bug during development.

See commit 0281e487fd ("grep: optionally recurse into submodules",
2016-12-16) for the initial addition of this code.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/grep.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/grep.c b/builtin/grep.c
index 3ffb5b4e81..a191e2976b 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -495,6 +495,8 @@ static void compile_submodule_options(const struct grep_opt *opt,
 		break;
 	case GREP_PATTERN_TYPE_UNSPECIFIED:
 		break;
+	default:
+		die("BUG: Added a new grep pattern type without updating switch statement");
 	}
 
 	for (pattern = opt->pattern_list; pattern != NULL;
-- 
2.13.0.303.g4ebf302169


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 21/30] grep: remove redundant regflags assignments
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (19 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 20/30] grep: catch a missing enum in switch statement Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 22/30] grep: factor test for \0 in grep patterns into a function Ævar Arnfjörð Bjarmason
                   ` (9 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Remove redundant assignments to the "regflags" variable. This variable
is only set under GREP_PATTERN_TYPE_ERE, so there's no need to
un-set it under GREP_PATTERN_TYPE_{FIXED,BRE,PCRE}.
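
For readers less familiar with what the REG_EXTENDED bit changes, here
is a minimal sketch using only the POSIX regcomp()/regexec() API. The
helper name regex_matches() is made up for this example and is not
part of git:

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if "pattern" matches somewhere in
 * "text", 0 if it does not, and -1 if regcomp() rejects the pattern.
 * "cflags" is passed through, so the caller decides whether
 * REG_EXTENDED is set.
 */
static int regex_matches(const char *pattern, const char *text, int cflags)
{
	regex_t re;
	int hit;

	if (regcomp(&re, pattern, cflags | REG_NOSUB))
		return -1;
	hit = !regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return hit;
}
```

With REG_EXTENDED the pattern "(b)" is a group and matches the text
"b". Without it the parentheses are literal characters, so the same
pattern only matches text containing "(b)". That is the only
distinction the bit carries, which is why clearing it for the fixed
and PCRE types has no effect.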

Back in 5010cb5fcc[1], we did do "opt.regflags &= ~REG_EXTENDED" upon
seeing "-G" on the command line and flipped the bit on upon seeing
"-E", but I think that was perfectly sensible and it would have been a
bug if we didn't.  They were part of the command line parsing that
could have seen "-E" on the command line earlier.

When cca2c172 ("git-grep: do not die upon -F/-P when
grep.extendedRegexp is set.", 2011-05-09) switched the command line
parsing to "read into a 'tentatively this is what we saw the last'
variable and then finally commit just once", we didn't touch
opt.regflags for PCRE and FIXED, but we still had to flip regflags
between BRE and ERE, because parsing of grep.extendedregexp
configuration variable directly touched opt.regflags back then, which
was done by b22520a3 ("grep: allow -E and -n to be turned on by
default via configuration", 2011-03-30).

When 84befcd0 ("grep: add a grep.patternType configuration setting",
2012-08-03) introduced extended_regexp_option field, we stopped
flipping regflags while reading the configuration, and that was when
we should have noticed and stopped dropping REG_EXTENDED bit in the
"now we can commit what type to use" helper function.

There is no reason to do this anymore, so stop doing it, more to
reduce "wait this is used under fixed/BRE/PCRE how?" confusion when
reading the code, than to save ourselves trivial CPU cycles by
removing one assignment.

1. "built-in "git grep"", 2006-04-30.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/grep.c b/grep.c
index 47cee45067..bf6c2494fd 100644
--- a/grep.c
+++ b/grep.c
@@ -179,7 +179,6 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
 	case GREP_PATTERN_TYPE_BRE:
 		opt->fixed = 0;
 		opt->pcre = 0;
-		opt->regflags &= ~REG_EXTENDED;
 		break;
 
 	case GREP_PATTERN_TYPE_ERE:
@@ -191,13 +190,11 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
 	case GREP_PATTERN_TYPE_FIXED:
 		opt->fixed = 1;
 		opt->pcre = 0;
-		opt->regflags &= ~REG_EXTENDED;
 		break;
 
 	case GREP_PATTERN_TYPE_PCRE:
 		opt->fixed = 0;
 		opt->pcre = 1;
-		opt->regflags &= ~REG_EXTENDED;
 		break;
 	}
 }
@@ -415,10 +412,9 @@ static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
 	struct strbuf sb = STRBUF_INIT;
 	int err;
-	int regflags;
+	int regflags = opt->regflags;
 
 	basic_regex_quote_buf(&sb, p->pattern);
-	regflags = opt->regflags & ~REG_EXTENDED;
 	if (opt->ignore_case)
 		regflags |= REG_ICASE;
 	err = regcomp(&p->regexp, sb.buf, regflags);
-- 
2.13.0.303.g4ebf302169


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 22/30] grep: factor test for \0 in grep patterns into a function
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (20 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 21/30] grep: remove redundant regflags assignments Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-23 21:17   ` Brandon Williams
  2017-05-20 21:42 ` [PATCH v3 23/30] grep: change the internal PCRE macro names to be PCRE1 Ævar Arnfjörð Bjarmason
                   ` (8 subsequent siblings)
  30 siblings, 1 reply; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Factor the test for \0 in grep patterns into a function. Since commit
9eceddeec6 ("Use kwset in grep", 2011-08-21) any pattern containing a
\0 is considered fixed as regcomp() can't handle it.

This change makes later changes that make use of either has_null() or
is_fixed() (but not both) smaller.
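
To show why the check needs an explicit length, here is a small
standalone sketch. The name pattern_has_nul() is hypothetical and was
chosen to avoid implying it is the function added by this patch:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the first "len" bytes of "s"
 * contain a NUL byte, 0 otherwise. memchr() takes an explicit length,
 * so it keeps scanning where strlen()-style functions would stop at
 * the first NUL.
 */
static int pattern_has_nul(const char *s, size_t len)
{
	return memchr(s, '\0', len) != NULL;
}
```

For the three-byte pattern "a\0b", strlen() reports 1, so any
NUL-terminated API such as regcomp() would only ever see "a". The
sketch above reports the embedded NUL for that pattern and reports
nothing for "abc".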

While I'm at it make the comment conform to the style guide, i.e. add
an opening "/*\n".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/grep.c b/grep.c
index bf6c2494fd..79eb681c6e 100644
--- a/grep.c
+++ b/grep.c
@@ -321,6 +321,18 @@ static NORETURN void compile_regexp_failed(const struct grep_pat *p,
 	die("%s'%s': %s", where, p->pattern, error);
 }
 
+static int has_null(const char *s, size_t len)
+{
+	/*
+	 * regcomp cannot accept patterns with NULs so when using it
+	 * we consider any pattern containing a NUL fixed.
+	 */
+	if (memchr(s, 0, len))
+		return 1;
+
+	return 0;
+}
+
 #ifdef USE_LIBPCRE
 static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
 {
@@ -394,12 +406,6 @@ static int is_fixed(const char *s, size_t len)
 {
 	size_t i;
 
-	/* regcomp cannot accept patterns with NULs so we
-	 * consider any pattern containing a NUL fixed.
-	 */
-	if (memchr(s, 0, len))
-		return 1;
-
 	for (i = 0; i < len; i++) {
 		if (is_regex_special(s[i]))
 			return 0;
@@ -451,7 +457,7 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 	 * simple string match using kws.  p->fixed tells us if we
 	 * want to use kws.
 	 */
-	if (opt->fixed || is_fixed(p->pattern, p->patternlen))
+	if (opt->fixed || has_null(p->pattern, p->patternlen) || is_fixed(p->pattern, p->patternlen))
 		p->fixed = !icase || ascii_only;
 	else
 		p->fixed = 0;
-- 
2.13.0.303.g4ebf302169


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 23/30] grep: change the internal PCRE macro names to be PCRE1
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (21 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 22/30] grep: factor test for \0 in grep patterns into a function Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 24/30] grep: change internal *pcre* variable & function names to be *pcre1* Ævar Arnfjörð Bjarmason
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Change the internal USE_LIBPCRE define, & build options flag to use a
naming convention ending in PCRE1, without changing the long-standing
USE_LIBPCRE Makefile flag which enables this code.

This is in preparation for libpcre2 support where having things like
USE_LIBPCRE and USE_LIBPCRE2 in any more places than we absolutely
need to for backwards compatibility with old Makefile arguments would
be confusing.

In some ways it would be better to change everything that now uses
USE_LIBPCRE to use USE_LIBPCRE1, and to make specifying
USE_LIBPCRE (or --with-pcre) an error. This would impose a one-time
burden on packagers of git to s/USE_LIBPCRE/USE_LIBPCRE1/ in their
build scripts.

However I'd like to leave the door open to making
USE_LIBPCRE=YesPlease eventually mean USE_LIBPCRE2=YesPlease,
i.e. once PCRE v2 is ubiquitous enough that it makes sense to make it
the default.

This code and the USE_LIBPCRE Makefile argument was added in commit
63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09). At the time there was
no indication that the PCRE project would release an entirely new &
incompatible API around 3 years later.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile      | 4 ++--
 grep.c        | 6 +++---
 grep.h        | 2 +-
 t/test-lib.sh | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/Makefile b/Makefile
index d1587452f1..374fbc7e58 100644
--- a/Makefile
+++ b/Makefile
@@ -1088,7 +1088,7 @@ ifdef NO_LIBGEN_H
 endif
 
 ifdef USE_LIBPCRE
-	BASIC_CFLAGS += -DUSE_LIBPCRE
+	BASIC_CFLAGS += -DUSE_LIBPCRE1
 	ifdef LIBPCREDIR
 		BASIC_CFLAGS += -I$(LIBPCREDIR)/include
 		EXTLIBS += -L$(LIBPCREDIR)/$(lib) $(CC_LD_DYNPATH)$(LIBPCREDIR)/$(lib)
@@ -2240,7 +2240,7 @@ GIT-BUILD-OPTIONS: FORCE
 	@echo TAR=\''$(subst ','\'',$(subst ','\'',$(TAR)))'\' >>$@+
 	@echo NO_CURL=\''$(subst ','\'',$(subst ','\'',$(NO_CURL)))'\' >>$@+
 	@echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+
-	@echo USE_LIBPCRE=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+
+	@echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+
 	@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
 	@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
 	@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
diff --git a/grep.c b/grep.c
index 79eb681c6e..854f2404be 100644
--- a/grep.c
+++ b/grep.c
@@ -333,7 +333,7 @@ static int has_null(const char *s, size_t len)
 	return 0;
 }
 
-#ifdef USE_LIBPCRE
+#ifdef USE_LIBPCRE1
 static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
 {
 	const char *error;
@@ -385,7 +385,7 @@ static void free_pcre_regexp(struct grep_pat *p)
 	pcre_free(p->pcre_extra_info);
 	pcre_free((void *)p->pcre_tables);
 }
-#else /* !USE_LIBPCRE */
+#else /* !USE_LIBPCRE1 */
 static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
 {
 	die("cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE");
@@ -400,7 +400,7 @@ static int pcrematch(struct grep_pat *p, const char *line, const char *eol,
 static void free_pcre_regexp(struct grep_pat *p)
 {
 }
-#endif /* !USE_LIBPCRE */
+#endif /* !USE_LIBPCRE1 */
 
 static int is_fixed(const char *s, size_t len)
 {
diff --git a/grep.h b/grep.h
index 267534ca24..073b0e4c92 100644
--- a/grep.h
+++ b/grep.h
@@ -1,7 +1,7 @@
 #ifndef GREP_H
 #define GREP_H
 #include "color.h"
-#ifdef USE_LIBPCRE
+#ifdef USE_LIBPCRE1
 #include <pcre.h>
 #else
 typedef int pcre;
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 04d857a42b..1d0f636cbd 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1014,7 +1014,7 @@ esac
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PYTHON" && test_set_prereq PYTHON
-test -n "$USE_LIBPCRE" && test_set_prereq PCRE
+test -n "$USE_LIBPCRE1" && test_set_prereq PCRE
 test -z "$NO_GETTEXT" && test_set_prereq GETTEXT
 
 # Can we rely on git's output in the C locale?
-- 
2.13.0.303.g4ebf302169


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 24/30] grep: change internal *pcre* variable & function names to be *pcre1*
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (22 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 23/30] grep: change the internal PCRE macro names to be PCRE1 Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 25/30] grep: move is_fixed() earlier to avoid forward declaration Ævar Arnfjörð Bjarmason
                   ` (6 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Change the internal PCRE variable & function names to have a "1"
suffix. This is in preparation for libpcre2 support, where having
non-versioned names would be confusing.

An earlier change in this series ("grep: change the internal PCRE
macro names to be PCRE1", 2017-04-07) elaborates on the motivations
behind this change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 52 ++++++++++++++++++++++++++--------------------------
 grep.h |  8 ++++----
 2 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/grep.c b/grep.c
index 854f2404be..07512346b1 100644
--- a/grep.c
+++ b/grep.c
@@ -178,23 +178,23 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
 
 	case GREP_PATTERN_TYPE_BRE:
 		opt->fixed = 0;
-		opt->pcre = 0;
+		opt->pcre1 = 0;
 		break;
 
 	case GREP_PATTERN_TYPE_ERE:
 		opt->fixed = 0;
-		opt->pcre = 0;
+		opt->pcre1 = 0;
 		opt->regflags |= REG_EXTENDED;
 		break;
 
 	case GREP_PATTERN_TYPE_FIXED:
 		opt->fixed = 1;
-		opt->pcre = 0;
+		opt->pcre1 = 0;
 		break;
 
 	case GREP_PATTERN_TYPE_PCRE:
 		opt->fixed = 0;
-		opt->pcre = 1;
+		opt->pcre1 = 1;
 		break;
 	}
 }
@@ -334,7 +334,7 @@ static int has_null(const char *s, size_t len)
 }
 
 #ifdef USE_LIBPCRE1
-static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
+static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 {
 	const char *error;
 	int erroffset;
@@ -342,23 +342,23 @@ static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
 
 	if (opt->ignore_case) {
 		if (has_non_ascii(p->pattern))
-			p->pcre_tables = pcre_maketables();
+			p->pcre1_tables = pcre_maketables();
 		options |= PCRE_CASELESS;
 	}
 	if (is_utf8_locale() && has_non_ascii(p->pattern))
 		options |= PCRE_UTF8;
 
-	p->pcre_regexp = pcre_compile(p->pattern, options, &error, &erroffset,
-				      p->pcre_tables);
-	if (!p->pcre_regexp)
+	p->pcre1_regexp = pcre_compile(p->pattern, options, &error, &erroffset,
+				      p->pcre1_tables);
+	if (!p->pcre1_regexp)
 		compile_regexp_failed(p, error);
 
-	p->pcre_extra_info = pcre_study(p->pcre_regexp, 0, &error);
-	if (!p->pcre_extra_info && error)
+	p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, &error);
+	if (!p->pcre1_extra_info && error)
 		die("%s", error);
 }
 
-static int pcrematch(struct grep_pat *p, const char *line, const char *eol,
+static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
 		regmatch_t *match, int eflags)
 {
 	int ovector[30], ret, flags = 0;
@@ -366,7 +366,7 @@ static int pcrematch(struct grep_pat *p, const char *line, const char *eol,
 	if (eflags & REG_NOTBOL)
 		flags |= PCRE_NOTBOL;
 
-	ret = pcre_exec(p->pcre_regexp, p->pcre_extra_info, line, eol - line,
+	ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - line,
 			0, flags, ovector, ARRAY_SIZE(ovector));
 	if (ret < 0 && ret != PCRE_ERROR_NOMATCH)
 		die("pcre_exec failed with error code %d", ret);
@@ -379,25 +379,25 @@ static int pcrematch(struct grep_pat *p, const char *line, const char *eol,
 	return ret;
 }
 
-static void free_pcre_regexp(struct grep_pat *p)
+static void free_pcre1_regexp(struct grep_pat *p)
 {
-	pcre_free(p->pcre_regexp);
-	pcre_free(p->pcre_extra_info);
-	pcre_free((void *)p->pcre_tables);
+	pcre_free(p->pcre1_regexp);
+	pcre_free(p->pcre1_extra_info);
+	pcre_free((void *)p->pcre1_tables);
 }
 #else /* !USE_LIBPCRE1 */
-static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
+static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 {
 	die("cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE");
 }
 
-static int pcrematch(struct grep_pat *p, const char *line, const char *eol,
+static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
 		regmatch_t *match, int eflags)
 {
 	return 1;
 }
 
-static void free_pcre_regexp(struct grep_pat *p)
+static void free_pcre1_regexp(struct grep_pat *p)
 {
 }
 #endif /* !USE_LIBPCRE1 */
@@ -477,8 +477,8 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 		return;
 	}
 
-	if (opt->pcre) {
-		compile_pcre_regexp(p, opt);
+	if (opt->pcre1) {
+		compile_pcre1_regexp(p, opt);
 		return;
 	}
 
@@ -834,8 +834,8 @@ void free_grep_patterns(struct grep_opt *opt)
 		case GREP_PATTERN_BODY:
 			if (p->kws)
 				kwsfree(p->kws);
-			else if (p->pcre_regexp)
-				free_pcre_regexp(p);
+			else if (p->pcre1_regexp)
+				free_pcre1_regexp(p);
 			else
 				regfree(&p->regexp);
 			free(p->pattern);
@@ -914,8 +914,8 @@ static int patmatch(struct grep_pat *p, char *line, char *eol,
 
 	if (p->fixed)
 		hit = !fixmatch(p, line, eol, match);
-	else if (p->pcre_regexp)
-		hit = !pcrematch(p, line, eol, match, eflags);
+	else if (p->pcre1_regexp)
+		hit = !pcre1match(p, line, eol, match, eflags);
 	else
 		hit = !regexec_buf(&p->regexp, line, eol - line, 1, match,
 				   eflags);
diff --git a/grep.h b/grep.h
index 073b0e4c92..38ac82b638 100644
--- a/grep.h
+++ b/grep.h
@@ -46,9 +46,9 @@ struct grep_pat {
 	size_t patternlen;
 	enum grep_header_field field;
 	regex_t regexp;
-	pcre *pcre_regexp;
-	pcre_extra *pcre_extra_info;
-	const unsigned char *pcre_tables;
+	pcre *pcre1_regexp;
+	pcre_extra *pcre1_extra_info;
+	const unsigned char *pcre1_tables;
 	kwset_t kws;
 	unsigned fixed:1;
 	unsigned ignore_case:1;
@@ -111,7 +111,7 @@ struct grep_opt {
 	int allow_textconv;
 	int extended;
 	int use_reflog_filter;
-	int pcre;
+	int pcre1;
 	int relative;
 	int pathname;
 	int null_following_name;
-- 
2.13.0.303.g4ebf302169


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 25/30] grep: move is_fixed() earlier to avoid forward declaration
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (23 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 24/30] grep: change internal *pcre* variable & function names to be *pcre1* Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 26/30] test-lib: add a PTHREADS prerequisite Ævar Arnfjörð Bjarmason
                   ` (5 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Move the is_fixed() function, which is currently only used in
compile_regexp(), earlier so it can be used in the PCRE family of
functions in a later change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/grep.c b/grep.c
index 07512346b1..1157529115 100644
--- a/grep.c
+++ b/grep.c
@@ -321,6 +321,18 @@ static NORETURN void compile_regexp_failed(const struct grep_pat *p,
 	die("%s'%s': %s", where, p->pattern, error);
 }
 
+static int is_fixed(const char *s, size_t len)
+{
+	size_t i;
+
+	for (i = 0; i < len; i++) {
+		if (is_regex_special(s[i]))
+			return 0;
+	}
+
+	return 1;
+}
+
 static int has_null(const char *s, size_t len)
 {
 	/*
@@ -402,18 +414,6 @@ static void free_pcre1_regexp(struct grep_pat *p)
 }
 #endif /* !USE_LIBPCRE1 */
 
-static int is_fixed(const char *s, size_t len)
-{
-	size_t i;
-
-	for (i = 0; i < len; i++) {
-		if (is_regex_special(s[i]))
-			return 0;
-	}
-
-	return 1;
-}
-
 static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
 	struct strbuf sb = STRBUF_INIT;
-- 
2.13.0.303.g4ebf302169


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 26/30] test-lib: add a PTHREADS prerequisite
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (24 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 25/30] grep: move is_fixed() earlier to avoid forward declaration Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 27/30] pack-objects & index-pack: add test for --threads warning Ævar Arnfjörð Bjarmason
                   ` (4 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Add a PTHREADS prerequisite which is false when git is compiled with
NO_PTHREADS=YesPlease.

There's lots of custom code that runs when threading isn't available,
but before this prerequisite there was no way to test it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile      | 1 +
 t/README      | 4 ++++
 t/test-lib.sh | 1 +
 3 files changed, 6 insertions(+)

diff --git a/Makefile b/Makefile
index 374fbc7e58..a79274e5e6 100644
--- a/Makefile
+++ b/Makefile
@@ -2242,6 +2242,7 @@ GIT-BUILD-OPTIONS: FORCE
 	@echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+
 	@echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+
 	@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
+	@echo NO_PTHREADS=\''$(subst ','\'',$(subst ','\'',$(NO_PTHREADS)))'\' >>$@+
 	@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
 	@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
 	@echo PAGER_ENV=\''$(subst ','\'',$(subst ','\'',$(PAGER_ENV)))'\' >>$@+
diff --git a/t/README b/t/README
index a90cb62583..2f95860369 100644
--- a/t/README
+++ b/t/README
@@ -817,6 +817,10 @@ use these, and "test_set_prereq" for how to define your own.
    Test is run on a filesystem which converts decomposed utf-8 (nfd)
    to precomposed utf-8 (nfc).
 
+ - PTHREADS
+
+   Git wasn't compiled with NO_PTHREADS=YesPlease.
+
 Tips for Writing Tests
 ----------------------
 
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 1d0f636cbd..43529451f9 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1013,6 +1013,7 @@ esac
 
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
+test -z "$NO_PTHREADS" && test_set_prereq PTHREADS
 test -z "$NO_PYTHON" && test_set_prereq PYTHON
 test -n "$USE_LIBPCRE1" && test_set_prereq PCRE
 test -z "$NO_GETTEXT" && test_set_prereq GETTEXT
-- 
2.13.0.303.g4ebf302169


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 27/30] pack-objects & index-pack: add test for --threads warning
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (25 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 26/30] test-lib: add a PTHREADS prerequisite Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 28/30] pack-objects: fix buggy warning about threads Ævar Arnfjörð Bjarmason
                   ` (3 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Add a test for the warning that's emitted when --threads or
pack.threads is provided under NO_PTHREADS=YesPlease. This uses the
new PTHREADS prerequisite.

The assertion for C_LOCALE_OUTPUT in the latter test is currently
redundant, since unlike index-pack the pack-objects warnings aren't
i18n'd. However they might be changed to be i18n'd in the future, and
there's no harm in future-proofing the test.

There's an existing bug in the implementation of pack-objects which
this test currently tests for as-is. Details about the bug & the fix
are included in a follow-up change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t5300-pack-object.sh | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 43a672c345..6ed23ee1d2 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -421,6 +421,42 @@ test_expect_success 'index-pack <pack> works in non-repo' '
 	test_path_is_file foo.idx
 '
 
+test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'index-pack --threads=N or pack.threads=N warns when no pthreads' '
+	test_must_fail git index-pack --threads=2 2>err &&
+	grep ^warning: err >warnings &&
+	test_line_count = 1 warnings &&
+	grep -F "no threads support, ignoring --threads=2" err &&
+
+	test_must_fail git -c pack.threads=2 index-pack 2>err &&
+	grep ^warning: err >warnings &&
+	test_line_count = 1 warnings &&
+	grep -F "no threads support, ignoring pack.threads" err &&
+
+	test_must_fail git -c pack.threads=2 index-pack --threads=4 2>err &&
+	grep ^warning: err >warnings &&
+	test_line_count = 2 warnings &&
+	grep -F "no threads support, ignoring --threads=4" err &&
+	grep -F "no threads support, ignoring pack.threads" err
+'
+
+test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'pack-objects --threads=N or pack.threads=N warns when no pthreads' '
+	git pack-objects --threads=2 --stdout --all </dev/null >/dev/null 2>err &&
+	grep ^warning: err >warnings &&
+	test_line_count = 1 warnings &&
+	grep -F "no threads support, ignoring --threads" err &&
+
+	git -c pack.threads=2 pack-objects --stdout --all </dev/null >/dev/null 2>err &&
+	grep ^warning: err >warnings &&
+	test_must_fail test_line_count = 1 warnings &&
+	grep -F "no threads support, ignoring pack.threads" err &&
+
+	git -c pack.threads=2 pack-objects --threads=4 --stdout --all </dev/null >/dev/null 2>err &&
+	grep ^warning: err >warnings &&
+	test_line_count = 2 warnings &&
+	grep -F "no threads support, ignoring --threads" err &&
+	grep -F "no threads support, ignoring pack.threads" err
+'
+
 #
 # WARNING!
 #
-- 
2.13.0.303.g4ebf302169


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 28/30] pack-objects: fix buggy warning about threads
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (26 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 27/30] pack-objects & index-pack: add test for --threads warning Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 29/30] grep: given --threads with NO_PTHREADS=YesPlease, warn Ævar Arnfjörð Bjarmason
                   ` (2 subsequent siblings)
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Fix a buggy warning about threads under NO_PTHREADS=YesPlease. Due to
re-using the delta_search_threads variable for both the state of the
"pack.threads" config & the --threads option, setting "pack.threads"
but not supplying --threads would trigger the warning for both
"pack.threads" & --threads.

Solve this bug by resetting the delta_search_threads variable in
git_pack_config(). It might then be set by --threads again and be
subsequently warned about, as the test I'm changing here asserts.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/pack-objects.c | 4 +++-
 t/t5300-pack-object.sh | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 9b4ba8a80d..efa21a15dd 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2472,8 +2472,10 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 			die("invalid number of threads specified (%d)",
 			    delta_search_threads);
 #ifdef NO_PTHREADS
-		if (delta_search_threads != 1)
+		if (delta_search_threads != 1) {
 			warning("no threads support, ignoring %s", k);
+			delta_search_threads = 0;
+		}
 #endif
 		return 0;
 	}
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 6ed23ee1d2..9c68b99251 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -447,7 +447,7 @@ test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'pack-objects --threads=N or pack.
 
 	git -c pack.threads=2 pack-objects --stdout --all </dev/null >/dev/null 2>err &&
 	grep ^warning: err >warnings &&
-	test_must_fail test_line_count = 1 warnings &&
+	test_line_count = 1 warnings &&
 	grep -F "no threads support, ignoring pack.threads" err &&
 
 	git -c pack.threads=2 pack-objects --threads=4 --stdout --all </dev/null >/dev/null 2>err &&
-- 
2.13.0.303.g4ebf302169


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 29/30] grep: given --threads with NO_PTHREADS=YesPlease, warn
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (27 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 28/30] pack-objects: fix buggy warning about threads Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 21:42 ` [PATCH v3 30/30] grep: assert that threading is enabled when calling grep_{lock,unlock} Ævar Arnfjörð Bjarmason
  2017-05-20 23:50 ` [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Junio C Hamano
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Add a warning about missing thread support when grep.threads or
--threads is set to a value other than 0 (the default) or 1 (no
parallelism) under NO_PTHREADS=YesPlease.

This is for consistency with the index-pack & pack-objects commands,
which also take a --threads option & are configurable via
pack.threads, and have long warned about the same under
NO_PTHREADS=YesPlease.
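
The following is a simplified model of the warning flow under
NO_PTHREADS, meant only to make the expected warning counts in the new
test easier to follow. The function name and its parameters are
invented for this sketch. It assumes that the configuration is read
first and that the command-line option, when given, then overwrites
the same variable:

```c
/*
 * Hypothetical model: returns how many "no threads support" warnings
 * would be emitted. "config_threads" stands for grep.threads,
 * "has_cli" says whether --threads was given, and "cli_threads" is
 * its value.
 */
static int count_no_thread_warnings(int config_threads, int has_cli,
				    int cli_threads)
{
	int num_threads = config_threads;
	int warnings = 0;

	/* Config step: warn, then reset so the value is not seen again. */
	if (num_threads && num_threads != 1) {
		warnings++;
		num_threads = 0;
	}

	/* Option parsing overwrites the shared variable. */
	if (has_cli)
		num_threads = cli_threads;

	/* Post-parsing step: anything still set draws a second warning. */
	if (num_threads)
		warnings++;

	return warnings;
}
```

In this model, setting only grep.threads=2 or only --threads=2 yields
one warning, setting both grep.threads=2 and --threads=4 yields two,
and setting both to 0 yields none, which matches the line counts the
test asserts.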

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/grep.c  | 13 +++++++++++++
 t/t7810-grep.sh | 18 ++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/builtin/grep.c b/builtin/grep.c
index a191e2976b..3c721b75a5 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -289,6 +289,17 @@ static int grep_cmd_config(const char *var, const char *value, void *cb)
 		if (num_threads < 0)
 			die(_("invalid number of threads specified (%d) for %s"),
 			    num_threads, var);
+#ifdef NO_PTHREADS
+		else if (num_threads && num_threads != 1) {
+			/*
+			 * TRANSLATORS: %s is the configuration
+			 * variable for tweaking threads, currently
+			 * grep.threads
+			 */
+			warning(_("no threads support, ignoring %s"), var);
+			num_threads = 0;
+		}
+#endif
 	}
 
 	return st;
@@ -1229,6 +1240,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	else if (num_threads < 0)
 		die(_("invalid number of threads specified (%d)"), num_threads);
 #else
+	if (num_threads)
+		warning(_("no threads support, ignoring --threads"));
 	num_threads = 0;
 #endif
 
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 561709ef6a..f106387820 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -791,6 +791,24 @@ do
 	"
 done
 
+test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'grep --threads=N or grep.threads=N warns when no pthreads' '
+	git grep --threads=2 Hello hello_world 2>err &&
+	grep ^warning: err >warnings &&
+	test_line_count = 1 warnings &&
+	grep -F "no threads support, ignoring --threads" err &&
+	git -c grep.threads=2 grep Hello hello_world 2>err &&
+	grep ^warning: err >warnings &&
+	test_line_count = 1 warnings &&
+	grep -F "no threads support, ignoring grep.threads" err &&
+	git -c grep.threads=2 grep --threads=4 Hello hello_world 2>err &&
+	grep ^warning: err >warnings &&
+	test_line_count = 2 warnings &&
+	grep -F "no threads support, ignoring --threads" err &&
+	grep -F "no threads support, ignoring grep.threads" err &&
+	git -c grep.threads=0 grep --threads=0 Hello hello_world 2>err &&
+	test_line_count = 0 err
+'
+
 test_expect_success 'grep from a subdirectory to search wider area (1)' '
 	mkdir -p s &&
 	(
-- 
2.13.0.303.g4ebf302169
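
The rule the commit message describes for grep.threads can be sketched
as a small standalone function. This is only an illustration, not the
code in builtin/grep.c: resolve_threads(), have_pthreads and
warnings_emitted are names invented here, a runtime flag stands in for
the NO_PTHREADS build switch, and fprintf() stands in for git's
warning(). Negative values, which the real code rejects with die()
before this point, are not handled.

```c
#include <stdio.h>

/* Hypothetical counter so a caller can see how many warnings fired. */
static int warnings_emitted;

/*
 * Sketch of the grep.threads check: return the thread count to use.
 * `have_pthreads` is an assumption standing in for the NO_PTHREADS
 * build switch; `source` names the setting, e.g. "grep.threads".
 */
int resolve_threads(int requested, int have_pthreads, const char *source)
{
	if (have_pthreads)
		return requested;

	/* 0 (default) and 1 (no parallelism) need no threads: stay quiet. */
	if (requested && requested != 1) {
		fprintf(stderr, "warning: no threads support, ignoring %s\n",
			source);
		warnings_emitted++;
	}
	return 0;
}
```

Calling resolve_threads(2, 0, "grep.threads") prints one warning and
falls back to 0, while requests of 0 or 1 fall back silently.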


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 30/30] grep: assert that threading is enabled when calling grep_{lock,unlock}
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (28 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 29/30] grep: given --threads with NO_PTHREADS=YesPlease, warn Ævar Arnfjörð Bjarmason
@ 2017-05-20 21:42 ` Ævar Arnfjörð Bjarmason
  2017-05-20 23:50 ` [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Junio C Hamano
  30 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 21:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Change the grep_{lock,unlock} functions to assert that num_threads is
true, instead of only locking & unlocking the pthread mutex lock when
it is.

These functions are never called when num_threads isn't true. This
logic has gone through multiple iterations since the initial
introduction of grep threading in commit 5b594f457a ("Threaded grep",
2010-01-25), but ever since then they'd only be called if num_threads
was true, so this check made the code confusing to read.

Replace the check with an assertion, so that it's clear to the reader
that this code path is never taken unless we're spawning threads.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/grep.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 3c721b75a5..b1095362fb 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -73,14 +73,14 @@ static pthread_mutex_t grep_mutex;
 
 static inline void grep_lock(void)
 {
-	if (num_threads)
-		pthread_mutex_lock(&grep_mutex);
+	assert(num_threads);
+	pthread_mutex_lock(&grep_mutex);
 }
 
 static inline void grep_unlock(void)
 {
-	if (num_threads)
-		pthread_mutex_unlock(&grep_mutex);
+	assert(num_threads);
+	pthread_mutex_unlock(&grep_mutex);
 }
 
 /* Signalled when a new work_item is added to todo. */
-- 
2.13.0.303.g4ebf302169
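
As a self-contained illustration of the pattern, here is a minimal
sketch in which the lock helpers state their precondition with
assert() instead of silently doing nothing. It is not the code from
builtin/grep.c: total_hits and add_hits() are hypothetical, and the
mutex uses a static initializer, which is an assumption made to keep
the example short.

```c
#include <assert.h>
#include <pthread.h>

static int num_threads;	/* 0 means "not threaded" */
static pthread_mutex_t grep_mutex = PTHREAD_MUTEX_INITIALIZER;
static int total_hits;		/* hypothetical state guarded by grep_mutex */

void grep_lock(void)
{
	/* Document the invariant: only threaded code paths get here. */
	assert(num_threads);
	pthread_mutex_lock(&grep_mutex);
}

void grep_unlock(void)
{
	assert(num_threads);
	pthread_mutex_unlock(&grep_mutex);
}

/* Hypothetical caller that is only reached when threads are in use. */
void add_hits(int n)
{
	grep_lock();
	total_hits += n;
	grep_unlock();
}
```

With num_threads set to a non-zero value add_hits() behaves as before;
calling it with num_threads == 0 now aborts (unless NDEBUG is defined)
instead of skipping the lock, which makes a broken assumption visible.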


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 00/30] Easy to review grep & pre-PCRE changes
  2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
                   ` (29 preceding siblings ...)
  2017-05-20 21:42 ` [PATCH v3 30/30] grep: assert that threading is enabled when calling grep_{lock,unlock} Ævar Arnfjörð Bjarmason
@ 2017-05-20 23:50 ` Junio C Hamano
  2017-05-23 19:24   ` [PATCH v2 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes Ævar Arnfjörð Bjarmason
                     ` (7 more replies)
  30 siblings, 8 replies; 77+ messages in thread
From: Junio C Hamano @ 2017-05-20 23:50 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Jeffrey Walton, Michał Kiedrowicz, J Smith,
	Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Easy to review? 29 (I mean 30) patches? Are you kidding me?!
>
> As noted in v1 (<20170511091829.5634-1-avarab@gmail.com>;
> https://public-inbox.org/git/20170511091829.5634-1-avarab@gmail.com/)
> these are all doc, test, refactoring etc. changes needed by the
> subsequent "PCRE v2, PCRE v1 JIT, log -P & fixes" series.
>
> Since Junio hasn't been picking it I'm no longer sending updates to
> that patch series & waiting for this one to cook first.

I actually do not mind a reroll that goes together with this.  The
only reason why I skipped the earlier one was because I looked at
the original one, and the discussion on the reroll of this 'easy to
review' part indicated that it will be rerolled, before I got to
look at these upper layer patches.

Overall nicely done.  I only had just a few observations.

Thanks.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 05/30] log: make --regexp-ignore-case work with --perl-regexp
  2017-05-20 21:42 ` [PATCH v3 05/30] log: make --regexp-ignore-case work with --perl-regexp Ævar Arnfjörð Bjarmason
@ 2017-05-20 23:50   ` Junio C Hamano
  2017-05-21  6:58     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 77+ messages in thread
From: Junio C Hamano @ 2017-05-20 23:50 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Jeffrey Walton, Michał Kiedrowicz, J Smith,
	Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Make the --regexp-ignore-case option work with --perl-regexp. This
> never worked, and there was no test for this. Fix the bug and add a
> test.
>
> When PCRE support was added in commit 63e7e9d8b6 ("git-grep: Learn
> PCRE", 2011-05-09) compile_pcre_regexp() would only check
> opt->ignore_case, but when the --perl-regexp option was added in
> commit 727b6fc3ed ("log --grep: accept --basic-regexp and
> --perl-regexp", 2012-10-03) the code didn't set the opt->ignore_case.
>
> Change the test suite to test for -i and --invert-grep with
> basic/extended/perl patterns in addition to fixed, which was the only
> patternType that was tested for before in combination with those
> options.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  revision.c     |  1 +
>  t/t4202-log.sh | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++-----
>  2 files changed, 56 insertions(+), 5 deletions(-)
>
> diff --git a/revision.c b/revision.c
> index 8a8c1789c7..4883cdd2d0 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -1991,6 +1991,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
>  	} else if (!strcmp(arg, "--extended-regexp") || !strcmp(arg, "-E")) {
>  		revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_ERE;
>  	} else if (!strcmp(arg, "--regexp-ignore-case") || !strcmp(arg, "-i")) {
> +		revs->grep_filter.ignore_case = 1;
>  		revs->grep_filter.regflags |= REG_ICASE;
>  		DIFF_OPT_SET(&revs->diffopt, PICKAXE_IGNORE_CASE);
>  	} else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) {

Looks good.

I however wonder if it is a better approach in the longer term to
treat the .ignore_case field just like .extended_regexp_option
field, i.e. not committing immediately to .regflags but commit it
after config and command line parsing is done, just like we make the
"BRE? ERE?" decision in grep_commit_pattern_type().

Thanks.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 15/30] perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do
  2017-05-20 21:42 ` [PATCH v3 15/30] perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do Ævar Arnfjörð Bjarmason
@ 2017-05-20 23:50   ` Junio C Hamano
  2017-05-21  6:23     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 77+ messages in thread
From: Junio C Hamano @ 2017-05-20 23:50 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Jeffrey Walton, Michał Kiedrowicz, J Smith,
	Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> diff --git a/t/perf/README b/t/perf/README
> index 49ea4349be..b3d95042a8 100644
> --- a/t/perf/README
> +++ b/t/perf/README
> @@ -60,8 +60,23 @@ You can set the following variables (also in your config.mak):
>  
>      GIT_PERF_MAKE_OPTS
>  	Options to use when automatically building a git tree for
> -	performance testing.  E.g., -j6 would be useful.
> -
> +...
> +	any of that, that's an implementation detail that might change
> +	in the future.
> + 

I'll remove the trailing whitespace on this otherwise blank line
while queuing (no need to resend only to fix this one).

Thanks.

>      GIT_PERF_REPO
>      GIT_PERF_LARGE_REPO
>  	Repositories to copy for the performance tests.  The normal
> diff --git a/t/perf/run b/t/perf/run
> index c788d713ae..b61024a830 100755
> --- a/t/perf/run
> +++ b/t/perf/run
> @@ -37,8 +37,15 @@ build_git_rev () {
>  			cp "../../$config" "build/$rev/"
>  		fi
>  	done
> -	(cd build/$rev && make $GIT_PERF_MAKE_OPTS) ||
> -	die "failed to build revision '$mydir'"
> +	(
> +		cd build/$rev &&
> +		if test -n "$GIT_PERF_MAKE_COMMAND"
> +		then
> +			sh -c "$GIT_PERF_MAKE_COMMAND"
> +		else
> +			make $GIT_PERF_MAKE_OPTS
> +		fi
> +	) || die "failed to build revision '$mydir'"
>  }
>  
>  run_dirs_helper () {

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 15/30] perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do
  2017-05-20 23:50   ` Junio C Hamano
@ 2017-05-21  6:23     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-21  6:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Fredrik Kuivinen,
	Brandon Williams

On Sun, May 21, 2017 at 1:50 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> diff --git a/t/perf/README b/t/perf/README
>> index 49ea4349be..b3d95042a8 100644
>> --- a/t/perf/README
>> +++ b/t/perf/README
>> @@ -60,8 +60,23 @@ You can set the following variables (also in your config.mak):
>>
>>      GIT_PERF_MAKE_OPTS
>>       Options to use when automatically building a git tree for
>> -     performance testing.  E.g., -j6 would be useful.
>> -
>> +...
>> +     any of that, that's an implementation detail that might change
>> +     in the future.
>> +
>
> I'll remove the trailing whitespace on this otherwise blank line
> while queuing (no need to resend only to fix this one).
>
> Thanks.

Thanks, forgot about diff --check on the whole series with all the
other checks I was doing.

>>      GIT_PERF_REPO
>>      GIT_PERF_LARGE_REPO
>>       Repositories to copy for the performance tests.  The normal
>> diff --git a/t/perf/run b/t/perf/run
>> index c788d713ae..b61024a830 100755
>> --- a/t/perf/run
>> +++ b/t/perf/run
>> @@ -37,8 +37,15 @@ build_git_rev () {
>>                       cp "../../$config" "build/$rev/"
>>               fi
>>       done
>> -     (cd build/$rev && make $GIT_PERF_MAKE_OPTS) ||
>> -     die "failed to build revision '$mydir'"
>> +     (
>> +             cd build/$rev &&
>> +             if test -n "$GIT_PERF_MAKE_COMMAND"
>> +             then
>> +                     sh -c "$GIT_PERF_MAKE_COMMAND"
>> +             else
>> +                     make $GIT_PERF_MAKE_OPTS
>> +             fi
>> +     ) || die "failed to build revision '$mydir'"
>>  }
>>
>>  run_dirs_helper () {

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 05/30] log: make --regexp-ignore-case work with --perl-regexp
  2017-05-20 23:50   ` Junio C Hamano
@ 2017-05-21  6:58     ` Ævar Arnfjörð Bjarmason
  2017-05-22  0:17       ` Junio C Hamano
                         ` (6 more replies)
  0 siblings, 7 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-21  6:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Fredrik Kuivinen,
	Brandon Williams

On Sun, May 21, 2017 at 1:50 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Make the --regexp-ignore-case option work with --perl-regexp. This
>> never worked, and there was no test for this. Fix the bug and add a
>> test.
>>
>> When PCRE support was added in commit 63e7e9d8b6 ("git-grep: Learn
>> PCRE", 2011-05-09) compile_pcre_regexp() would only check
>> opt->ignore_case, but when the --perl-regexp option was added in
>> commit 727b6fc3ed ("log --grep: accept --basic-regexp and
>> --perl-regexp", 2012-10-03) the code didn't set the opt->ignore_case.
>>
>> Change the test suite to test for -i and --invert-grep with
>> basic/extended/perl patterns in addition to fixed, which was the only
>> patternType that was tested for before in combination with those
>> options.
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  revision.c     |  1 +
>>  t/t4202-log.sh | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++-----
>>  2 files changed, 56 insertions(+), 5 deletions(-)
>>
>> diff --git a/revision.c b/revision.c
>> index 8a8c1789c7..4883cdd2d0 100644
>> --- a/revision.c
>> +++ b/revision.c
>> @@ -1991,6 +1991,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
>>       } else if (!strcmp(arg, "--extended-regexp") || !strcmp(arg, "-E")) {
>>               revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_ERE;
>>       } else if (!strcmp(arg, "--regexp-ignore-case") || !strcmp(arg, "-i")) {
>> +             revs->grep_filter.ignore_case = 1;
>>               revs->grep_filter.regflags |= REG_ICASE;
>>               DIFF_OPT_SET(&revs->diffopt, PICKAXE_IGNORE_CASE);
>>       } else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) {
>
> Looks good.
>
> I however wonder if it is a better approach in the longer term to
> treat the .ignore_case field just like .extended_regexp_option
> field, i.e. not committing immediately to .regflags but commit it
> after config and command line parsing is done, just like we make the
> "BRE? ERE?" decision in grep_commit_pattern_type().

I started hacking up a patch to fix the root cause of this, i.e. the
users of the grep API should only set `.ignore_case = 1` and not care
about setting regflags, but it was more than a trivial change, so I
didn't include it in this series:

diff --git a/builtin/grep.c b/builtin/grep.c
index 3ffb5b4e81..be28c37265 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1151,8 +1151,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)

        if (!opt.pattern_list)
                die(_("no pattern given."));
-       if (!opt.fixed && opt.ignore_case)
-               opt.regflags |= REG_ICASE;

        compile_grep_patterns(&opt);

diff --git a/grep.c b/grep.c
index 47cee45067..7b13ee1043 100644
--- a/grep.c
+++ b/grep.c
@@ -435,12 +435,11 @@ static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)

 static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
-       int icase, ascii_only;
+       int ascii_only;
        int err;

        p->word_regexp = opt->word_regexp;
        p->ignore_case = opt->ignore_case;
-       icase          = opt->regflags & REG_ICASE || p->ignore_case;
        ascii_only     = !has_non_ascii(p->pattern);

        /*
@@ -456,12 +455,12 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
         * want to use kws.
         */
        if (opt->fixed || is_fixed(p->pattern, p->patternlen))
-               p->fixed = !icase || ascii_only;
+               p->fixed = !p->ignore_case || ascii_only;
        else
                p->fixed = 0;

        if (p->fixed) {
-               p->kws = kwsalloc(icase ? tolower_trans_tbl : NULL);
+               p->kws = kwsalloc(p->ignore_case ? tolower_trans_tbl : NULL);
                kwsincr(p->kws, p->pattern, p->patternlen);
                kwsprep(p->kws);
                return;
@@ -480,6 +479,8 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
                return;
        }

+       if (p->ignore_case)
+               opt->regflags |= REG_ICASE;
        err = regcomp(&p->regexp, p->pattern, opt->regflags);
        if (err) {
                char errbuf[1024];
diff --git a/revision.c b/revision.c
index 4883cdd2d0..30c23a1098 100644
--- a/revision.c
+++ b/revision.c
@@ -1992,7 +1992,6 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
                revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_ERE;
        } else if (!strcmp(arg, "--regexp-ignore-case") || !strcmp(arg, "-i")) {
                revs->grep_filter.ignore_case = 1;
-               revs->grep_filter.regflags |= REG_ICASE;
                DIFF_OPT_SET(&revs->diffopt, PICKAXE_IGNORE_CASE);
        } else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) {
                revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_FIXED;

But an even better solution is to get rid of passing the regflags
field in grep_opt entirely, this conflicts with some of my later
patches:

diff --git a/builtin/grep.c b/builtin/grep.c
index 3ffb5b4e81..be28c37265 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1151,8 +1151,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)

        if (!opt.pattern_list)
                die(_("no pattern given."));
-       if (!opt.fixed && opt.ignore_case)
-               opt.regflags |= REG_ICASE;

        compile_grep_patterns(&opt);
diff --git a/grep.c b/grep.c
index 47cee45067..1bde7037ba 100644
--- a/grep.c
+++ b/grep.c
@@ -34,7 +34,6 @@ void init_grep_defaults(void)
        memset(opt, 0, sizeof(*opt));
        opt->relative = 1;
        opt->pathname = 1;
-       opt->regflags = REG_NEWLINE;
        opt->max_depth = -1;
        opt->pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED;
        opt->extended_regexp_option = 0;
@@ -156,7 +155,6 @@ void grep_init(struct grep_opt *opt, const char *prefix)
        opt->linenum = def->linenum;
        opt->max_depth = def->max_depth;
        opt->pathname = def->pathname;
-       opt->regflags = def->regflags;
        opt->relative = def->relative;
        opt->output = def->output;

@@ -179,25 +177,25 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
        case GREP_PATTERN_TYPE_BRE:
                opt->fixed = 0;
                opt->pcre = 0;
-               opt->regflags &= ~REG_EXTENDED;
+               opt->extended = 0;
                break;
         case GREP_PATTERN_TYPE_ERE:
                opt->fixed = 0;
                opt->pcre = 0;
-               opt->regflags |= REG_EXTENDED;
+               opt->extended = 1;
                break;

        case GREP_PATTERN_TYPE_FIXED:
                opt->fixed = 1;
                opt->pcre = 0;
-               opt->regflags &= ~REG_EXTENDED;
+               opt->extended = 0;
                break;

        case GREP_PATTERN_TYPE_PCRE:
                opt->fixed = 0;
                opt->pcre = 1;
-               opt->regflags &= ~REG_EXTENDED;
+               opt->extended = 0;
                break;
        }
 }
@@ -415,10 +413,9 @@ static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
        struct strbuf sb = STRBUF_INIT;
        int err;
-       int regflags;
+       int regflags = REG_NEWLINE;

        basic_regex_quote_buf(&sb, p->pattern);
-       regflags = opt->regflags & ~REG_EXTENDED;
        if (opt->ignore_case)
                regflags |= REG_ICASE;
        err = regcomp(&p->regexp, sb.buf, regflags);
@@ -435,12 +432,12 @@ static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)

 static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
-       int icase, ascii_only;
+       int ascii_only;
        int err;
+       int regflags = REG_NEWLINE;

        p->word_regexp = opt->word_regexp;
        p->ignore_case = opt->ignore_case;
-       icase          = opt->regflags & REG_ICASE || p->ignore_case;
        ascii_only     = !has_non_ascii(p->pattern);

        /*
@@ -456,12 +453,12 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
         * want to use kws.
         */
        if (opt->fixed || is_fixed(p->pattern, p->patternlen))
-               p->fixed = !icase || ascii_only;
+               p->fixed = !p->ignore_case || ascii_only;
        else
                p->fixed = 0;

        if (p->fixed) {
-               p->kws = kwsalloc(icase ? tolower_trans_tbl : NULL);
+               p->kws = kwsalloc(p->ignore_case ? tolower_trans_tbl : NULL);
                kwsincr(p->kws, p->pattern, p->patternlen);
                kwsprep(p->kws);
                return;
@@ -480,7 +477,11 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
                return;
        }

-       err = regcomp(&p->regexp, p->pattern, opt->regflags);
+       if (p->ignore_case)
+               regflags |= REG_ICASE;
+       if (opt->extended)
+               regflags |= REG_EXTENDED;
+       err = regcomp(&p->regexp, p->pattern, regflags);
        if (err) {
                char errbuf[1024];
                regerror(err, &p->regexp, errbuf, 1024);
diff --git a/grep.h b/grep.h
index 267534ca24..d9d603deb1 100644
--- a/grep.h
+++ b/grep.h
@@ -129,7 +129,6 @@ struct grep_opt {
        char color_match_selected[COLOR_MAXLEN];
        char color_selected[COLOR_MAXLEN];
        char color_sep[COLOR_MAXLEN];
-       int regflags;
        unsigned pre_context;
        unsigned post_context;
        unsigned last_shown;
diff --git a/revision.c b/revision.c
index 4883cdd2d0..67240d38af 100644
--- a/revision.c
+++ b/revision.c
@@ -1362,7 +1362,6 @@ void init_revisions(struct rev_info *revs, const char *prefix)
        init_grep_defaults();
        grep_init(&revs->grep_filter, prefix);
        revs->grep_filter.status_only = 1;
-       revs->grep_filter.regflags = REG_NEWLINE;

        diff_setup(&revs->diffopt);
        if (prefix && !revs->diffopt.prefix) {
@@ -1992,7 +1991,6 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
                revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_ERE;
        } else if (!strcmp(arg, "--regexp-ignore-case") || !strcmp(arg, "-i")) {
                revs->grep_filter.ignore_case = 1;
-               revs->grep_filter.regflags |= REG_ICASE;
                DIFF_OPT_SET(&revs->diffopt, PICKAXE_IGNORE_CASE);
        } else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) {
                revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_FIXED;

But as all this code cleanup isn't needed for fixing this bug, and I'd
really like to get this series merged into next/master ASAP so I can
start submitting the grep/pcre patches that are actually interesting,
let's leave this orthogonal code cleanup for now.
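
To make the direction of the second diff concrete, here is a minimal
standalone sketch of deriving the regcomp(3) flags from boolean fields
at compile time rather than carrying a regflags field around. It is
not git code: struct pattern_opts, flags_for() and pattern_matches()
are hypothetical names, and only the POSIX <regex.h> calls are real.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct grep_opt. */
struct pattern_opts {
	int ignore_case;
	int extended;
};

/* Derive the regcomp(3) flags when compiling instead of storing them. */
int flags_for(const struct pattern_opts *opt)
{
	int regflags = REG_NEWLINE;

	if (opt->ignore_case)
		regflags |= REG_ICASE;
	if (opt->extended)
		regflags |= REG_EXTENDED;
	return regflags;
}

/* Return 1 on match, 0 on no match, -1 if the pattern fails to compile. */
int pattern_matches(const struct pattern_opts *opt, const char *pattern,
		    const char *line)
{
	regex_t re;
	int hit;

	if (regcomp(&re, pattern, flags_for(opt)))
		return -1;
	hit = !regexec(&re, line, 0, NULL, 0);
	regfree(&re);
	return hit;
}
```

Callers then only set .ignore_case or .extended; there is no second
place where REG_ICASE has to be kept in sync, which is the class of
bug the one-line fix in this patch works around.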

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 05/30] log: make --regexp-ignore-case work with --perl-regexp
  2017-05-21  6:58     ` Ævar Arnfjörð Bjarmason
@ 2017-05-22  0:17       ` Junio C Hamano
  2017-06-28 21:58       ` [PATCH 0/5] grep: remove redundant code & reflags from API Ævar Arnfjörð Bjarmason
                         ` (5 subsequent siblings)
  6 siblings, 0 replies; 77+ messages in thread
From: Junio C Hamano @ 2017-05-22  0:17 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Fredrik Kuivinen,
	Brandon Williams

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> Looks good.
>>
>> I however wonder if it is a better approach in the longer term to
>> treat the .ignore_case field just like .extended_regexp_option
>> field, i.e. not committing immediately to .regflags but commit it
>> after config and command line parsing is done, just like we make the
>> "BRE? ERE?" decision in grep_commit_pattern_type().
>
> I started hacking up a patch to fix the root cause of this, i.e. the
> users of the grep API should only set `.ignore_case = 1` and not care
> about setting regflags, but it was more than a trivial change, so I
> didn't include it in this series:

Ah, sorry.  Now I re-read my response, "Looks good.  I however
wonder..." does sound like I am requesting to do something more to
solve the same issue.  I shouldn't have phrased it that way.  It was
more like "while I was staring at the context lines of your patch, I
noticed this tangent", nothing more.

> ...
> But an even better solution is to get rid of passing the regflags
> field in grep_opt entirely, this conflicts with some of my later
> patches:

Yes, that may be a good direction to go in longer term, but let's
not push it before the other bits already in flight land safely.

> ...
> But as all this code cleanup isn't needed for fixing this bug, and I'd
> really like to get this series merged into next/master ASAP so I can
> start submitting the grep/pcre patches that are actually interesting,
> let's leave this orthogonal code cleanup for now.

Yes.  Thanks.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH v2 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes
  2017-05-20 23:50 ` [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Junio C Hamano
@ 2017-05-23 19:24   ` Ævar Arnfjörð Bjarmason
  2017-05-23 19:24   ` [PATCH v2 1/7] grep: don't redundantly compile throwaway patterns under threading Ævar Arnfjörð Bjarmason
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-23 19:24 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams, Stefan Beller,
	Johannes Schindelin, Simon Ruderich,
	Ævar Arnfjörð Bjarmason

On Sun, May 21, 2017 at 1:50 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Easy to review? 29 (I mean 30) patches? Are you kidding me?!
>>
>> As noted in v1 (<20170511091829.5634-1-avarab@gmail.com>;
>> https://public-inbox.org/git/20170511091829.5634-1-avarab@gmail.com/)
>> these are all doc, test, refactoring etc. changes needed by the
>> subsequent "PCRE v2, PCRE v1 JIT, log -P & fixes" series.
>>
>> Since Junio hasn't been picking it I'm no longer sending updates to
>> that patch series & waiting for this one to cook first.
>
> I actually do not mind a reroll that goes together with this.  The
> only reason why I skipped the earlier one was because I looked at
> the original one, and the discussion on the reroll of this 'easy to
> review' part indicated that it will be rerolled, before I got to
> look at these upper layer patches.

Great, now that the base of this is migrating to next, here's the
second part of this.

For v1 see <20170513234535.12749-1-avarab@gmail.com>
(https://public-inbox.org/git/20170513234535.12749-1-avarab@gmail.com/).

The only changes to the content are better if/else branching around
conditional macros (but no functional changes) in the PCRE v1 JIT API
patch in response to a comment by Simon Ruderich.

The only other changes are trivial updates to the commit messages to
account for t/perf changes made in the series this builds on.

Ævar Arnfjörð Bjarmason (7):
  grep: don't redundantly compile throwaway patterns under threading
  grep: skip pthreads overhead when using one thread
  log: add -P as a synonym for --perl-regexp
  grep: add support for the PCRE v1 JIT API
  grep: un-break building with PCRE < 8.32
  grep: un-break building with PCRE < 8.20
  grep: add support for PCRE v2

 Documentation/rev-list-options.txt |   1 +
 Makefile                           |  30 +++++--
 builtin/grep.c                     |  16 +++-
 configure.ac                       |  77 +++++++++++++---
 grep.c                             | 177 ++++++++++++++++++++++++++++++++++++-
 grep.h                             |  31 +++++++
 revision.c                         |   2 +-
 t/t4202-log.sh                     |  12 +++
 t/test-lib.sh                      |   2 +-
 9 files changed, 324 insertions(+), 24 deletions(-)

-- 
2.13.0.303.g4ebf302169


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH v2 1/7] grep: don't redundantly compile throwaway patterns under threading
  2017-05-20 23:50 ` [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Junio C Hamano
  2017-05-23 19:24   ` [PATCH v2 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes Ævar Arnfjörð Bjarmason
@ 2017-05-23 19:24   ` Ævar Arnfjörð Bjarmason
  2017-05-24  4:42     ` Junio C Hamano
  2017-05-23 19:24   ` [PATCH v2 2/7] grep: skip pthreads overhead when using one thread Ævar Arnfjörð Bjarmason
                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-23 19:24 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams, Stefan Beller,
	Johannes Schindelin, Simon Ruderich,
	Ævar Arnfjörð Bjarmason

Change the pattern compilation logic under threading so that grep
doesn't compile a pattern it never ends up using on the non-threaded
code path, only to compile it again N times for N threads which will
each use their own copy, ignoring the initially compiled pattern.

This redundant compilation dates back to the initial introduction of
the threaded grep in commit 5b594f457a ("Threaded grep",
2010-01-25).

There was never any reason for doing this redundant work other than an
oversight in the initial commit. Jeff King suggested on-list in
<20170414212325.fefrl3qdjigwyitd@sigill.intra.peff.net> that this
might be needed to check the pattern for sanity before threaded
execution commences.

That's not the case. The pattern is compiled under threading in
start_threads() before any concurrent execution has started by calling
pthread_create(), so if the pattern contains an error we still do the
right thing. I.e. die with one error before any threaded execution has
commenced, instead of e.g. spewing out an error for each N threads,
which could be a regression a change like this might inadvertently
introduce.

This change is not meant as an optimization; any performance gains
from this are in the hundreds to thousands of nanoseconds at most. If
we wanted more performance here we could just re-use the compiled
patterns in multiple threads (regcomp(3) is thread-safe), or partially
re-use them and the associated structures in the case of later PCRE
JIT changes.

Rather, it's just to make the code easier to reason about. It's
confusing to debug this under threading & non-threading when the
threading codepaths redundantly compile a pattern which is never used.

The reason the patterns are recompiled is as a side-effect of
duplicating the whole grep_opt structure, which is not thread safe,
writable, and munged during execution. The grep_opt structure then
points to the grep_pat structure where pattern or patterns are stored.

I looked into e.g. splitting the API into some "do & alloc threadsafe
stuff", "spawn thread", "do and alloc non-threadsafe stuff", but the
execution time of grep_opt_dup() & pattern compilation is trivial
compared to actually executing the grep, so there was no point. Even
with the more expensive JIT changes to follow the most expensive PCRE
patterns take something like 0.0X milliseconds to compile at most[1].

The undocumented --debug mode added in commit 17bf35a3c7 ("grep: teach
--debug option to dump the parse tree", 2012-09-13) still works
properly with this change. It only emits debugging info during pattern
compilation, which is now dumped when the pattern is compiled just
before the first thread is started.

1. http://sljit.sourceforge.net/pcre.html

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/grep.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index b1095362fb..12e62fcbf3 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -224,7 +224,8 @@ static void start_threads(struct grep_opt *opt)
 		int err;
 		struct grep_opt *o = grep_opt_dup(opt);
 		o->output = strbuf_out;
-		o->debug = 0;
+		if (i)
+			o->debug = 0;
 		compile_grep_patterns(o);
 		err = pthread_create(&threads[i], NULL, run, o);
 
@@ -1167,8 +1168,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	if (!opt.fixed && opt.ignore_case)
 		opt.regflags |= REG_ICASE;
 
-	compile_grep_patterns(&opt);
-
 	/*
 	 * We have to find "--" in a separate pass, because its presence
 	 * influences how we will parse arguments that come before it.
@@ -1245,6 +1244,15 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	num_threads = 0;
 #endif
 
+	if (!num_threads)
+		/*
+		 * The compiled patterns on the main path are only
+		 * used when not using threading. Otherwise
+		 * start_threads() below calls compile_grep_patterns()
+		 * for each thread.
+		 */
+		compile_grep_patterns(&opt);
+
 #ifndef NO_PTHREADS
 	if (num_threads) {
 		if (!(opt.name_only || opt.unmatch_name_only || opt.count)
-- 
2.13.0.303.g4ebf302169



* [PATCH v2 2/7] grep: skip pthreads overhead when using one thread
  2017-05-20 23:50 ` [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Junio C Hamano
  2017-05-23 19:24   ` [PATCH v2 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes Ævar Arnfjörð Bjarmason
  2017-05-23 19:24   ` [PATCH v2 1/7] grep: don't redundantly compile throwaway patterns under threading Ævar Arnfjörð Bjarmason
@ 2017-05-23 19:24   ` Ævar Arnfjörð Bjarmason
  2017-05-24  4:45     ` Junio C Hamano
  2017-05-23 19:24   ` [PATCH v2 3/7] log: add -P as a synonym for --perl-regexp Ævar Arnfjörð Bjarmason
                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-23 19:24 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams, Stefan Beller,
	Johannes Schindelin, Simon Ruderich,
	Ævar Arnfjörð Bjarmason

Skip the administrative overhead of using pthreads when only using one
thread. Instead take the non-threaded path which would be taken under
NO_PTHREADS.

The threading support was initially added in commit
5b594f457a ("Threaded grep", 2010-01-25) with a hardcoded compile-time
number of 8 threads. Later the number of threads was made configurable
in commit 89f09dd34e ("grep: add --threads=<num> option and
grep.threads configuration", 2015-12-15).

That change did not add any special handling for --threads=1. Now we
take a slightly faster path by skipping thread handling entirely when
1 thread is requested.
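
A minimal sketch of the resulting thread-count handling, assuming a
default of 8 threads as in the original threaded grep commit. The
helper name demo_normalize_num_threads() and the DEMO_ constant are
hypothetical, and where the real code calls die() the sketch returns
-1:

```c
/* Assumed default; the real constant is GREP_NUM_THREADS_DEFAULT. */
#define DEMO_NUM_THREADS_DEFAULT 8

/*
 * Map the requested thread count to the count actually used.
 * 0 in the result means "take the non-threaded path".
 */
static int demo_normalize_num_threads(int requested)
{
	if (requested < 0)
		return -1;	/* invalid; the real code die()s */
	if (requested == 0)
		return DEMO_NUM_THREADS_DEFAULT;
	if (requested == 1)
		return 0;	/* one thread: skip pthreads entirely */
	return requested;
}
```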

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/grep.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/grep.c b/builtin/grep.c
index 12e62fcbf3..bd008cb100 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1238,6 +1238,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		num_threads = GREP_NUM_THREADS_DEFAULT;
 	else if (num_threads < 0)
 		die(_("invalid number of threads specified (%d)"), num_threads);
+	if (num_threads == 1)
+		num_threads = 0;
 #else
 	if (num_threads)
 		warning(_("no threads support, ignoring --threads"));
-- 
2.13.0.303.g4ebf302169



* [PATCH v2 3/7] log: add -P as a synonym for --perl-regexp
  2017-05-20 23:50 ` [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Junio C Hamano
                     ` (2 preceding siblings ...)
  2017-05-23 19:24   ` [PATCH v2 2/7] grep: skip pthreads overhead when using one thread Ævar Arnfjörð Bjarmason
@ 2017-05-23 19:24   ` Ævar Arnfjörð Bjarmason
  2017-05-23 19:24   ` [PATCH v2 4/7] grep: add support for the PCRE v1 JIT API Ævar Arnfjörð Bjarmason
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-23 19:24 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams, Stefan Beller,
	Johannes Schindelin, Simon Ruderich,
	Ævar Arnfjörð Bjarmason

Add a short -P option as a synonym for the longer --perl-regexp, for
consistency with the options the corresponding grep invocations
accept.

This was intentionally omitted in commit 727b6fc3ed ("log --grep:
accept --basic-regexp and --perl-regexp", 2012-10-03) for unspecified
future use.

Make it consistent with "grep" rather than keeping it open for future
use, to avoid the confusion of -P meaning different things for grep &
log, as is the case with the -G option.

As noted in the aforementioned commit the --basic-regexp option can't
have a corresponding -G argument, as the log command already uses that
for -G<regex>.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/rev-list-options.txt |  1 +
 revision.c                         |  2 +-
 t/t4202-log.sh                     | 12 ++++++++++++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index a46f70c2b1..9c44eae55d 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -91,6 +91,7 @@ endif::git-rev-list[]
 	Consider the limiting patterns to be fixed strings (don't interpret
 	pattern as a regular expression).
 
+-P::
 --perl-regexp::
 	Consider the limiting patterns to be Perl-compatible regular
 	expressions.
diff --git a/revision.c b/revision.c
index 4883cdd2d0..60329da1bd 100644
--- a/revision.c
+++ b/revision.c
@@ -1996,7 +1996,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		DIFF_OPT_SET(&revs->diffopt, PICKAXE_IGNORE_CASE);
 	} else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) {
 		revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_FIXED;
-	} else if (!strcmp(arg, "--perl-regexp")) {
+	} else if (!strcmp(arg, "--perl-regexp") || !strcmp(arg, "-P")) {
 		revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_PCRE;
 	} else if (!strcmp(arg, "--all-match")) {
 		revs->grep_filter.all_match = 1;
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index dbed3efeee..2b07d1c0c2 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -404,8 +404,20 @@ test_expect_success 'log with various grep.patternType configurations & command-
 			--grep="(1|2)" >actual.fixed.short-arg &&
 		git log --pretty=tformat:%s -E \
 			--grep="\|2" >actual.extended.short-arg &&
+		if test_have_prereq PCRE
+		then
+			git log --pretty=tformat:%s -P \
+				--grep="[\d]\|" >actual.perl.short-arg
+		else
+			test_must_fail git log -P \
+				--grep="[\d]\|"
+		fi &&
 		test_cmp expect.fixed actual.fixed.short-arg &&
 		test_cmp expect.extended actual.extended.short-arg &&
+		if test_have_prereq PCRE
+		then
+			test_cmp expect.perl actual.perl.short-arg
+		fi &&
 
 		git log --pretty=tformat:%s --fixed-strings \
 			--grep="(1|2)" >actual.fixed.long-arg &&
-- 
2.13.0.303.g4ebf302169



* [PATCH v2 4/7] grep: add support for the PCRE v1 JIT API
  2017-05-20 23:50 ` [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Junio C Hamano
                     ` (3 preceding siblings ...)
  2017-05-23 19:24   ` [PATCH v2 3/7] log: add -P as a synonym for --perl-regexp Ævar Arnfjörð Bjarmason
@ 2017-05-23 19:24   ` Ævar Arnfjörð Bjarmason
  2017-05-24  5:17     ` Junio C Hamano
  2017-05-23 19:24   ` [PATCH v2 5/7] grep: un-break building with PCRE < 8.32 Ævar Arnfjörð Bjarmason
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-23 19:24 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams, Stefan Beller,
	Johannes Schindelin, Simon Ruderich,
	Ævar Arnfjörð Bjarmason

Change the grep PCRE v1 code to use JIT when available. When PCRE
support was initially added in commit 63e7e9d8b6 ("git-grep: Learn
PCRE", 2011-05-09) PCRE had no JIT support; it was integrated in
8.20, released on 2011-10-21.

Enabling JIT support usually improves performance by more than
40%. The pattern compilation times are relatively slower, but those
relative numbers are tiny, and are easily made back in all but the
most trivial cases of grep. Detailed benchmarks & an overview of
compilation times are at: http://sljit.sourceforge.net/pcre.html

With this change the difference in a t/perf/p7820-grep-engines.sh run
is, with just the /perl/ tests shown:

    $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~ HEAD p7820-grep-engines.sh
    Test                                           HEAD~             HEAD
    ---------------------------------------------------------------------------------------
    7820.3: perl grep 'how.to'                      0.35(1.11+0.43)   0.23(0.42+0.46) -34.3%
    7820.7: perl grep '^how to'                     0.64(2.71+0.36)   0.27(0.66+0.44) -57.8%
    7820.11: perl grep '[how] to'                   0.63(2.51+0.42)   0.33(0.98+0.39) -47.6%
    7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       1.17(5.61+0.35)   0.34(1.08+0.46) -70.9%
    7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.43(1.52+0.44)   0.30(0.88+0.42) -30.2%

The conditional support for JIT is implemented as suggested in the
pcrejit(3) man page. E.g. defining PCRE_STUDY_JIT_COMPILE to 0 if it's
not present.

The implementation is relatively verbose because even if
PCRE_CONFIG_JIT is defined only a call to pcre_config() can determine
if the JIT is available, and if so the faster pcre_jit_exec() function
should be called instead of pcre_exec(), and a different (but not
complementary!) function needs to be called to free pcre1_extra_info.
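
The combination of a compile-time check and a runtime flag can be
sketched as below. This is only an illustration of the "if/else
wrapped in #ifdef" shape: DEMO_HAVE_JIT, demo_fast_exec(),
demo_slow_exec() and demo_match() are hypothetical stand-ins for
PCRE_CONFIG_JIT, pcre_jit_exec(), pcre_exec() and pcre1match().

```c
/* Hypothetical feature macro; the patch keys off PCRE_CONFIG_JIT. */
#define DEMO_HAVE_JIT 1

/* Hypothetical stand-ins for pcre_jit_exec() and pcre_exec(). */
static int demo_fast_exec(void) { return 2; }
static int demo_slow_exec(void) { return 1; }

/*
 * jit_on plays the role of p->pcre1_jit_on: it is only set when the
 * runtime check (pcre_config() in the patch) reported a usable JIT.
 */
static int demo_match(int jit_on)
{
	int ret;

#ifdef DEMO_HAVE_JIT
	if (jit_on)
		ret = demo_fast_exec();
	else
#endif
	/* Reached as the "else" branch, or unconditionally without JIT. */
	ret = demo_slow_exec();

	return ret;
}
```

When the feature macro is not defined the whole if/else disappears
and only the slow call is left, which is why the same statement can
serve as both the fallback and the only code path.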

There's no graceful fallback if pcre_jit_stack_alloc() fails under
PCRE_CONFIG_JIT; instead the program will simply abort. I don't think
this is worth handling gracefully; it'll only fail in cases where
malloc() doesn't work, in which case we're screwed anyway.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 34 +++++++++++++++++++++++++++++++++-
 grep.h |  6 ++++++
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 1157529115..49e9aed457 100644
--- a/grep.c
+++ b/grep.c
@@ -351,6 +351,9 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 	const char *error;
 	int erroffset;
 	int options = PCRE_MULTILINE;
+#ifdef PCRE_CONFIG_JIT
+	int canjit;
+#endif
 
 	if (opt->ignore_case) {
 		if (has_non_ascii(p->pattern))
@@ -365,9 +368,20 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 	if (!p->pcre1_regexp)
 		compile_regexp_failed(p, error);
 
-	p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, &error);
+	p->pcre1_extra_info = pcre_study(p->pcre1_regexp, PCRE_STUDY_JIT_COMPILE, &error);
 	if (!p->pcre1_extra_info && error)
 		die("%s", error);
+
+#ifdef PCRE_CONFIG_JIT
+	pcre_config(PCRE_CONFIG_JIT, &canjit);
+	if (canjit == 1) {
+		p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024);
+		if (!p->pcre1_jit_stack)
+			die("BUG: Couldn't allocate PCRE JIT stack");
+		pcre_assign_jit_stack(p->pcre1_extra_info, NULL, p->pcre1_jit_stack);
+		p->pcre1_jit_on = 1;
+	}
+#endif
 }
 
 static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
@@ -378,8 +392,17 @@ static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
 	if (eflags & REG_NOTBOL)
 		flags |= PCRE_NOTBOL;
 
+#ifdef PCRE_CONFIG_JIT
+	if (p->pcre1_jit_on)
+		ret = pcre_jit_exec(p->pcre1_regexp, p->pcre1_extra_info, line,
+				    eol - line, 0, flags, ovector,
+				    ARRAY_SIZE(ovector), p->pcre1_jit_stack);
+	else
+#endif
+	/* PCRE_CONFIG_JIT !p->pcre1_jit_on else branch */
 	ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - line,
 			0, flags, ovector, ARRAY_SIZE(ovector));
+
 	if (ret < 0 && ret != PCRE_ERROR_NOMATCH)
 		die("pcre_exec failed with error code %d", ret);
 	if (ret > 0) {
@@ -394,7 +417,16 @@ static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
 static void free_pcre1_regexp(struct grep_pat *p)
 {
 	pcre_free(p->pcre1_regexp);
+
+#ifdef PCRE_CONFIG_JIT
+	if (p->pcre1_jit_on) {
+		pcre_free_study(p->pcre1_extra_info);
+		pcre_jit_stack_free(p->pcre1_jit_stack);
+	} else
+#endif
+	/* PCRE_CONFIG_JIT !p->pcre1_jit_on else branch */
 	pcre_free(p->pcre1_extra_info);
+
 	pcre_free((void *)p->pcre1_tables);
 }
 #else /* !USE_LIBPCRE1 */
diff --git a/grep.h b/grep.h
index 38ac82b638..14f47189f9 100644
--- a/grep.h
+++ b/grep.h
@@ -3,9 +3,13 @@
 #include "color.h"
 #ifdef USE_LIBPCRE1
 #include <pcre.h>
+#ifndef PCRE_STUDY_JIT_COMPILE
+#define PCRE_STUDY_JIT_COMPILE 0
+#endif
 #else
 typedef int pcre;
 typedef int pcre_extra;
+typedef int pcre_jit_stack;
 #endif
 #include "kwset.h"
 #include "thread-utils.h"
@@ -48,7 +52,9 @@ struct grep_pat {
 	regex_t regexp;
 	pcre *pcre1_regexp;
 	pcre_extra *pcre1_extra_info;
+	pcre_jit_stack *pcre1_jit_stack;
 	const unsigned char *pcre1_tables;
+	int pcre1_jit_on;
 	kwset_t kws;
 	unsigned fixed:1;
 	unsigned ignore_case:1;
-- 
2.13.0.303.g4ebf302169



* [PATCH v2 5/7] grep: un-break building with PCRE < 8.32
  2017-05-20 23:50 ` [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Junio C Hamano
                     ` (4 preceding siblings ...)
  2017-05-23 19:24   ` [PATCH v2 4/7] grep: add support for the PCRE v1 JIT API Ævar Arnfjörð Bjarmason
@ 2017-05-23 19:24   ` Ævar Arnfjörð Bjarmason
  2017-05-24  6:00     ` Junio C Hamano
  2017-05-23 19:24   ` [PATCH v2 6/7] grep: un-break building with PCRE < 8.20 Ævar Arnfjörð Bjarmason
  2017-05-23 19:24   ` [PATCH v2 7/7] grep: add support for PCRE v2 Ævar Arnfjörð Bjarmason
  7 siblings, 1 reply; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-23 19:24 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams, Stefan Beller,
	Johannes Schindelin, Simon Ruderich,
	Ævar Arnfjörð Bjarmason

Amend my change earlier in this series ("grep: add support for the
PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1
versions earlier than 8.32.

The JIT support was added in version 8.20 released on 2011-10-21, but
it wasn't until 8.32 released on 2012-11-30 that the fast code path to
use the JIT via pcre_jit_exec() was added[1] (see also [2]).

This means that versions 8.20 through 8.31 could still use the JIT,
but supporting it on those versions would add to the already verbose
macro soup around JIT support. I don't expect that the use-case of
compiling a brand new git against a 5 year old PCRE is particularly
common, and anyone who does that simply gets the existing pre-JIT
slow codepath.

So just take the easy way out and disable the JIT on any version older
than 8.32.
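
The check in grep.h tests the major and minor numbers separately,
which is sufficient as long as PCRE v1 releases stay on major version
8. A more general "at least this version" comparison, sketched here
with a hypothetical helper that is not part of this patch, treats any
higher major version as newer regardless of its minor number:

```c
/*
 * Returns 1 if version major.minor is at least
 * want_major.want_minor, 0 otherwise. Hypothetical helper.
 */
static int demo_version_at_least(int major, int minor,
				 int want_major, int want_minor)
{
	return major > want_major ||
	       (major == want_major && minor >= want_minor);
}
```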

This change isn't part of the initial PCRE JIT support change because
possibly slightly annoying someone who's bisecting with an ancient
PCRE is worth it to have a cleaner history showing which parts of the
implementation are only used for ancient PCRE versions. This also
makes it easier to revert this change if we ever decide to stop
supporting those old versions.

1. http://www.pcre.org/original/changelog.txt ("28. Introducing a
   native interface for JIT. Through this interface, the
   compiled[...]")
2. https://bugs.exim.org/show_bug.cgi?id=2121

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 8 ++++----
 grep.h | 5 +++++
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/grep.c b/grep.c
index 49e9aed457..3c0c30f033 100644
--- a/grep.c
+++ b/grep.c
@@ -351,7 +351,7 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 	const char *error;
 	int erroffset;
 	int options = PCRE_MULTILINE;
-#ifdef PCRE_CONFIG_JIT
+#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT
 	int canjit;
 #endif
 
@@ -372,7 +372,7 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 	if (!p->pcre1_extra_info && error)
 		die("%s", error);
 
-#ifdef PCRE_CONFIG_JIT
+#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT
 	pcre_config(PCRE_CONFIG_JIT, &canjit);
 	if (canjit == 1) {
 		p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024);
@@ -392,7 +392,7 @@ static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
 	if (eflags & REG_NOTBOL)
 		flags |= PCRE_NOTBOL;
 
-#ifdef PCRE_CONFIG_JIT
+#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT
 	if (p->pcre1_jit_on)
 		ret = pcre_jit_exec(p->pcre1_regexp, p->pcre1_extra_info, line,
 				    eol - line, 0, flags, ovector,
@@ -418,7 +418,7 @@ static void free_pcre1_regexp(struct grep_pat *p)
 {
 	pcre_free(p->pcre1_regexp);
 
-#ifdef PCRE_CONFIG_JIT
+#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT
 	if (p->pcre1_jit_on) {
 		pcre_free_study(p->pcre1_extra_info);
 		pcre_jit_stack_free(p->pcre1_jit_stack);
diff --git a/grep.h b/grep.h
index 14f47189f9..73ef0ef8ec 100644
--- a/grep.h
+++ b/grep.h
@@ -3,6 +3,11 @@
 #include "color.h"
 #ifdef USE_LIBPCRE1
 #include <pcre.h>
+#ifdef PCRE_CONFIG_JIT
+#if PCRE_MAJOR >= 8 && PCRE_MINOR >= 32
+#define GIT_PCRE1_CAN_DO_MODERN_JIT
+#endif
+#endif
 #ifndef PCRE_STUDY_JIT_COMPILE
 #define PCRE_STUDY_JIT_COMPILE 0
 #endif
-- 
2.13.0.303.g4ebf302169



* [PATCH v2 6/7] grep: un-break building with PCRE < 8.20
  2017-05-20 23:50 ` [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Junio C Hamano
                     ` (5 preceding siblings ...)
  2017-05-23 19:24   ` [PATCH v2 5/7] grep: un-break building with PCRE < 8.32 Ævar Arnfjörð Bjarmason
@ 2017-05-23 19:24   ` Ævar Arnfjörð Bjarmason
  2017-05-23 19:24   ` [PATCH v2 7/7] grep: add support for PCRE v2 Ævar Arnfjörð Bjarmason
  7 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-23 19:24 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams, Stefan Beller,
	Johannes Schindelin, Simon Ruderich,
	Ævar Arnfjörð Bjarmason

Amend my change earlier in this series ("grep: add support for the
PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1
versions earlier than 8.20.

The 8.20 release was the first release to have JIT & pcre_jit_stack in
the headers, so a mock type needs to be provided for it on earlier
releases.

Now git should compile with all PCRE versions that it supported before
my JIT change.

I've tested it as far back as version 7.5 released on 2008-01-10. Once
I got down to version 7.0 it wouldn't build anymore with GCC 7.1.1,
and I couldn't be bothered to test anything older than 7.5, as I'm
confident that if the build breaks on those older versions it's not
because of my JIT change.

See the "un-break" change in this series ("grep: un-break building
with PCRE < 8.32", 2017-05-10) for why this isn't squashed into the
main PCRE JIT commit.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/grep.h b/grep.h
index 73ef0ef8ec..b7b9d487b0 100644
--- a/grep.h
+++ b/grep.h
@@ -11,6 +11,9 @@
 #ifndef PCRE_STUDY_JIT_COMPILE
 #define PCRE_STUDY_JIT_COMPILE 0
 #endif
+#if PCRE_MAJOR <= 8 && PCRE_MINOR < 20
+typedef int pcre_jit_stack;
+#endif
 #else
 typedef int pcre;
 typedef int pcre_extra;
-- 
2.13.0.303.g4ebf302169



* [PATCH v2 7/7] grep: add support for PCRE v2
  2017-05-20 23:50 ` [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Junio C Hamano
                     ` (6 preceding siblings ...)
  2017-05-23 19:24   ` [PATCH v2 6/7] grep: un-break building with PCRE < 8.20 Ævar Arnfjörð Bjarmason
@ 2017-05-23 19:24   ` Ævar Arnfjörð Bjarmason
  2017-05-24  6:23     ` Junio C Hamano
  7 siblings, 1 reply; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-23 19:24 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams, Stefan Beller,
	Johannes Schindelin, Simon Ruderich,
	Ævar Arnfjörð Bjarmason

Add support for v2 of the PCRE API. This is a new major version of
PCRE that came out in early 2015[1].

The regular expression syntax is the same, but while the API is
similar, pretty much every function is either renamed or takes
different arguments. Thus using it via entirely new functions makes
sense, as opposed to trying to e.g. have one compile_pcre_pattern()
that would call either PCRE v1 or v2 functions.

Git can now be compiled with either USE_LIBPCRE1=YesPlease or
USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a
synonym for the former. Providing both is a compile-time error.

With earlier patches to enable JIT for PCRE v1 the performance of the
release versions of both libraries is almost exactly the same, with
PCRE v2 being around 1% slower.

However, after I reported this to the pcre-dev mailing list[2] I got a
lot of help with the API use from Zoltán Herczeg, who subsequently
optimized some of the JIT functionality in v2 of the library.

Running the p7820-grep-engines.sh performance test against the latest
Subversion trunk of both, with both of them and git compiled with -O3,
and the test run against linux.git, gives the following results. Just
the /perl/ tests shown:

    $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~2 HEAD~ HEAD p7820-grep-engines.sh
    [...]
    Test                                           HEAD~2            HEAD~                    HEAD
    ----------------------------------------------------------------------------------------------------------------
    7820.3: perl grep 'how.to'                      0.22(0.40+0.48)   0.22(0.31+0.58) +0.0%   0.22(0.26+0.59) +0.0%
    7820.7: perl grep '^how to'                     0.27(0.62+0.50)   0.28(0.60+0.50) +3.7%   0.22(0.25+0.60) -18.5%
    7820.11: perl grep '[how] to'                   0.33(0.92+0.47)   0.33(0.94+0.45) +0.0%   0.25(0.42+0.51) -24.2%
    7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       0.35(1.08+0.46)   0.35(1.12+0.41) +0.0%   0.25(0.52+0.50) -28.6%
    7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.30(0.78+0.51)   0.30(0.86+0.42) +0.0%   0.25(0.29+0.54) -16.7%

See commit ("perf: add a comparison test of grep regex engines",
2017-04-19) for details on the machine the above test run was executed
on.

Here HEAD~2 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with
JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine
mentioning p7820-grep-engines.sh for more details on the test setup.

For ease of readability, a different run of just HEAD~ and HEAD (PCRE
v1 with JIT vs. PCRE v2), again with just the /perl/ tests shown:

    Test                                           HEAD~             HEAD
    ---------------------------------------------------------------------------------------
    7820.3: perl grep 'how.to'                      0.23(0.41+0.47)   0.23(0.26+0.59) +0.0%
    7820.7: perl grep '^how to'                     0.27(0.64+0.47)   0.23(0.28+0.56) -14.8%
    7820.11: perl grep '[how] to'                   0.34(0.95+0.44)   0.25(0.38+0.56) -26.5%
    7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       0.34(1.07+0.46)   0.24(0.52+0.49) -29.4%
    7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.30(0.81+0.46)   0.22(0.33+0.54) -26.7%

I.e. the two are either neck-and-neck or, more usually, PCRE v2 pulls
ahead; when it does it's around 20% faster.
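
The percentage column is consistent with the relative change in
elapsed time, i.e. the first number in each cell. A small sketch with
hypothetical helper names (not part of t/perf) that reproduces the
arithmetic:

```c
/* Relative change from `before` to `after`, in percent. */
static double demo_percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Arithmetic mean of the n values in v. */
static double demo_mean(const double *v, int n)
{
	double sum = 0.0;
	int i;

	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / n;
}
```

For example demo_percent_change(0.27, 0.23) is about -14.8, matching
the '^how to' row, and the mean of the changes over the five rows
above is about -19.5.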

A brief note on thread safety: As noted in pcre2api(3) & pcre2jit(3)
the compiled pattern can be shared between threads, but not some of
the JIT context. However, the grep threading support does all pattern
& JIT compilation in separate threads, so this code doesn't need to
concern itself with thread safety.

See commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) for the
initial addition of PCRE v1. This change follows some of the same
patterns it did (and which were discussed on list at the time),
e.g. mocking up types with typedef instead of ifdef-ing them out when
USE_LIBPCRE2 isn't defined. This adds some trivial memory use to the
program, but makes the code look nicer.

1. https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html
2. https://lists.exim.org/lurker/thread/20170419.172322.833ee099.en.html

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile      |  30 +++++++++---
 configure.ac  |  77 ++++++++++++++++++++++++++-----
 grep.c        | 143 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 grep.h        |  17 +++++++
 t/test-lib.sh |   2 +-
 5 files changed, 250 insertions(+), 19 deletions(-)

diff --git a/Makefile b/Makefile
index a79274e5e6..d77ca4c1a5 100644
--- a/Makefile
+++ b/Makefile
@@ -29,7 +29,12 @@ all::
 # Perl-compatible regular expressions instead of standard or extended
 # POSIX regular expressions.
 #
-# Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
+# Currently USE_LIBPCRE is a synonym for USE_LIBPCRE1, define
+# USE_LIBPCRE2 instead if you'd like to use version 2 of the PCRE
+# library. The USE_LIBPCRE flag will likely be changed to mean v2 by
+# default in future releases.
+#
+# Define LIBPCREDIR=/foo/bar if your PCRE header and library files are in
 # /foo/bar/include and /foo/bar/lib directories.
 #
 # Define HAVE_ALLOCA_H if you have working alloca(3) defined in that header.
@@ -1087,15 +1092,27 @@ ifdef NO_LIBGEN_H
 	COMPAT_OBJS += compat/basename.o
 endif
 
-ifdef USE_LIBPCRE
-	BASIC_CFLAGS += -DUSE_LIBPCRE1
-	ifdef LIBPCREDIR
-		BASIC_CFLAGS += -I$(LIBPCREDIR)/include
-		EXTLIBS += -L$(LIBPCREDIR)/$(lib) $(CC_LD_DYNPATH)$(LIBPCREDIR)/$(lib)
+USE_LIBPCRE1 ?= $(USE_LIBPCRE)
+
+ifneq (,$(USE_LIBPCRE1))
+	ifdef USE_LIBPCRE2
+$(error Only set USE_LIBPCRE1 (or its alias USE_LIBPCRE) or USE_LIBPCRE2, not both!)
 	endif
+
+	BASIC_CFLAGS += -DUSE_LIBPCRE1
 	EXTLIBS += -lpcre
 endif
 
+ifdef USE_LIBPCRE2
+	BASIC_CFLAGS += -DUSE_LIBPCRE2
+	EXTLIBS += -lpcre2-8
+endif
+
+ifdef LIBPCREDIR
+	BASIC_CFLAGS += -I$(LIBPCREDIR)/include
+	EXTLIBS += -L$(LIBPCREDIR)/$(lib) $(CC_LD_DYNPATH)$(LIBPCREDIR)/$(lib)
+endif
+
 ifdef HAVE_ALLOCA_H
 	BASIC_CFLAGS += -DHAVE_ALLOCA_H
 endif
@@ -2241,6 +2258,7 @@ GIT-BUILD-OPTIONS: FORCE
 	@echo NO_CURL=\''$(subst ','\'',$(subst ','\'',$(NO_CURL)))'\' >>$@+
 	@echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+
 	@echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+
+	@echo USE_LIBPCRE2=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE2)))'\' >>$@+
 	@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
 	@echo NO_PTHREADS=\''$(subst ','\'',$(subst ','\'',$(NO_PTHREADS)))'\' >>$@+
 	@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
diff --git a/configure.ac b/configure.ac
index deeb968daa..11d083fbe0 100644
--- a/configure.ac
+++ b/configure.ac
@@ -255,21 +255,61 @@ GIT_PARSE_WITH([openssl]))
 # Perl-compatible regular expressions instead of standard or extended
 # POSIX regular expressions.
 #
-# Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
+# Currently USE_LIBPCRE is a synonym for USE_LIBPCRE1, define
+# USE_LIBPCRE2 instead if you'd like to use version 2 of the PCRE
+# library. The USE_LIBPCRE flag will likely be changed to mean v2 by
+# default in future releases.
+#
+# Define LIBPCREDIR=/foo/bar if your PCRE header and library files are in
 # /foo/bar/include and /foo/bar/lib directories.
 #
 AC_ARG_WITH(libpcre,
-AS_HELP_STRING([--with-libpcre],[support Perl-compatible regexes (default is NO)])
+AS_HELP_STRING([--with-libpcre],[synonym for --with-libpcre1]),
+    if test "$withval" = "no"; then
+	USE_LIBPCRE1=
+    elif test "$withval" = "yes"; then
+	USE_LIBPCRE1=YesPlease
+    else
+	USE_LIBPCRE1=YesPlease
+	LIBPCREDIR=$withval
+	AC_MSG_NOTICE([Setting LIBPCREDIR to $LIBPCREDIR])
+        dnl USE_LIBPCRE1 can still be modified below, so don't substitute
+        dnl it yet.
+	GIT_CONF_SUBST([LIBPCREDIR])
+    fi)
+
+AC_ARG_WITH(libpcre1,
+AS_HELP_STRING([--with-libpcre1],[support Perl-compatible regexes via libpcre1 (default is NO)])
+AS_HELP_STRING([],           [ARG can be also prefix for libpcre library and headers]),
+    if test "$withval" = "no"; then
+	USE_LIBPCRE1=
+    elif test "$withval" = "yes"; then
+	USE_LIBPCRE1=YesPlease
+    else
+	USE_LIBPCRE1=YesPlease
+	LIBPCREDIR=$withval
+	AC_MSG_NOTICE([Setting LIBPCREDIR to $LIBPCREDIR])
+        dnl USE_LIBPCRE1 can still be modified below, so don't substitute
+        dnl it yet.
+	GIT_CONF_SUBST([LIBPCREDIR])
+    fi)
+
+AC_ARG_WITH(libpcre2,
+AS_HELP_STRING([--with-libpcre2],[support Perl-compatible regexes via libpcre2 (default is NO)])
 AS_HELP_STRING([],           [ARG can be also prefix for libpcre library and headers]),
+    if test -n "$USE_LIBPCRE1"; then
+        AC_MSG_ERROR([Only supply one of --with-libpcre1 or --with-libpcre2!])
+    fi
+
     if test "$withval" = "no"; then
-	USE_LIBPCRE=
+	USE_LIBPCRE2=
     elif test "$withval" = "yes"; then
-	USE_LIBPCRE=YesPlease
+	USE_LIBPCRE2=YesPlease
     else
-	USE_LIBPCRE=YesPlease
+	USE_LIBPCRE2=YesPlease
 	LIBPCREDIR=$withval
 	AC_MSG_NOTICE([Setting LIBPCREDIR to $LIBPCREDIR])
-        dnl USE_LIBPCRE can still be modified below, so don't substitute
+        dnl USE_LIBPCRE2 can still be modified below, so don't substitute
         dnl it yet.
 	GIT_CONF_SUBST([LIBPCREDIR])
     fi)
@@ -501,13 +541,11 @@ GIT_CONF_SUBST([NEEDS_SSL_WITH_CRYPTO])
 GIT_CONF_SUBST([NO_OPENSSL])
 
 #
-# Define USE_LIBPCRE if you have and want to use libpcre. Various
-# commands such as log and grep offer runtime options to use
-# Perl-compatible regular expressions instead of standard or extended
-# POSIX regular expressions.
+# Handle the USE_LIBPCRE1 and USE_LIBPCRE2 options potentially set
+# above.
 #
 
-if test -n "$USE_LIBPCRE"; then
+if test -n "$USE_LIBPCRE1"; then
 
 GIT_STASH_FLAGS($LIBPCREDIR)
 
@@ -517,7 +555,22 @@ AC_CHECK_LIB([pcre], [pcre_version],
 
 GIT_UNSTASH_FLAGS($LIBPCREDIR)
 
-GIT_CONF_SUBST([USE_LIBPCRE])
+GIT_CONF_SUBST([USE_LIBPCRE1])
+
+fi
+
+
+if test -n "$USE_LIBPCRE2"; then
+
+GIT_STASH_FLAGS($LIBPCREDIR)
+
+AC_CHECK_LIB([pcre2-8], [pcre2_config_8],
+[USE_LIBPCRE2=YesPlease],
+[USE_LIBPCRE2=])
+
+GIT_UNSTASH_FLAGS($LIBPCREDIR)
+
+GIT_CONF_SUBST([USE_LIBPCRE2])
 
 fi
 
diff --git a/grep.c b/grep.c
index 3c0c30f033..569cf9e290 100644
--- a/grep.c
+++ b/grep.c
@@ -179,22 +179,36 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
 	case GREP_PATTERN_TYPE_BRE:
 		opt->fixed = 0;
 		opt->pcre1 = 0;
+		opt->pcre2 = 0;
 		break;
 
 	case GREP_PATTERN_TYPE_ERE:
 		opt->fixed = 0;
 		opt->pcre1 = 0;
+		opt->pcre2 = 0;
 		opt->regflags |= REG_EXTENDED;
 		break;
 
 	case GREP_PATTERN_TYPE_FIXED:
 		opt->fixed = 1;
 		opt->pcre1 = 0;
+		opt->pcre2 = 0;
 		break;
 
 	case GREP_PATTERN_TYPE_PCRE:
 		opt->fixed = 0;
+#ifdef USE_LIBPCRE2
+		opt->pcre1 = 0;
+		opt->pcre2 = 1;
+#else
+		/* It's important that pcre1 always be assigned to
+		 * even when there's no USE_LIBPCRE* defined. We still
+		 * call the PCRE stub function, it just dies with
+		 * "cannot use Perl-compatible regexes[...]".
+		 */
 		opt->pcre1 = 1;
+		opt->pcre2 = 0;
+#endif
 		break;
 	}
 }
@@ -446,6 +460,126 @@ static void free_pcre1_regexp(struct grep_pat *p)
 }
 #endif /* !USE_LIBPCRE1 */
 
+#ifdef USE_LIBPCRE2
+static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
+{
+	int error;
+	PCRE2_UCHAR errbuf[256];
+	PCRE2_SIZE erroffset;
+	int options = PCRE2_MULTILINE;
+	const uint8_t *character_tables = NULL;
+	uint32_t canjit;
+	int jitret;
+
+	assert(opt->pcre2);
+
+	p->pcre2_compile_context = NULL;
+
+	if (opt->ignore_case) {
+		if (has_non_ascii(p->pattern)) {
+			character_tables = pcre2_maketables(NULL);
+			p->pcre2_compile_context = pcre2_compile_context_create(NULL);
+			pcre2_set_character_tables(p->pcre2_compile_context, character_tables);
+		}
+		options |= PCRE2_CASELESS;
+	}
+	if (is_utf8_locale() && has_non_ascii(p->pattern))
+		options |= PCRE2_UTF;
+
+	p->pcre2_pattern = pcre2_compile((PCRE2_SPTR)p->pattern,
+					 p->patternlen, options, &error, &erroffset,
+					 p->pcre2_compile_context);
+
+	if (p->pcre2_pattern) {
+		p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, NULL);
+		if (!p->pcre2_match_data)
+			die("BUG: Couldn't allocate PCRE2 match data");
+	} else {
+		pcre2_get_error_message(error, errbuf, sizeof(errbuf));
+		compile_regexp_failed(p, (const char *)&errbuf);
+	}
+
+	pcre2_config(PCRE2_CONFIG_JIT, &canjit);
+	if (canjit == 1) {
+		jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE);
+		if (!jitret)
+			p->pcre2_jit_on = 1;
+		else
+			die("BUG: Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret);
+		p->pcre2_jit_stack = pcre2_jit_stack_create(1, 1024 * 1024, NULL);
+		if (!p->pcre2_jit_stack)
+			die("BUG: Couldn't allocate PCRE2 JIT stack");
+		p->pcre2_match_context = pcre2_match_context_create(NULL);
+		if (!p->pcre2_match_context)
+			die("BUG: Couldn't allocate PCRE2 match context");
+		pcre2_jit_stack_assign(p->pcre2_match_context, NULL, p->pcre2_jit_stack);
+	}
+}
+
+static int pcre2match(struct grep_pat *p, const char *line, const char *eol,
+		regmatch_t *match, int eflags)
+{
+	int ret, flags = 0;
+	PCRE2_SIZE *ovector;
+	PCRE2_UCHAR errbuf[256];
+
+	if (eflags & REG_NOTBOL)
+		flags |= PCRE2_NOTBOL;
+
+	if (p->pcre2_jit_on)
+		ret = pcre2_jit_match(p->pcre2_pattern, (unsigned char *)line,
+				      eol - line, 0, flags, p->pcre2_match_data,
+				      NULL);
+	else
+		ret = pcre2_match(p->pcre2_pattern, (unsigned char *)line,
+				  eol - line, 0, flags, p->pcre2_match_data,
+				  NULL);
+
+	if (ret < 0 && ret != PCRE2_ERROR_NOMATCH) {
+		pcre2_get_error_message(ret, errbuf, sizeof(errbuf));
+		die("%s failed with error code %d: %s",
+		    (p->pcre2_jit_on ? "pcre2_jit_match" : "pcre2_match"), ret,
+		    errbuf);
+	}
+	if (ret > 0) {
+		ovector = pcre2_get_ovector_pointer(p->pcre2_match_data);
+		ret = 0;
+		match->rm_so = (int)ovector[0];
+		match->rm_eo = (int)ovector[1];
+	}
+
+	return ret;
+}
+
+static void free_pcre2_pattern(struct grep_pat *p)
+{
+	pcre2_compile_context_free(p->pcre2_compile_context);
+	pcre2_code_free(p->pcre2_pattern);
+	pcre2_match_data_free(p->pcre2_match_data);
+	pcre2_jit_stack_free(p->pcre2_jit_stack);
+	pcre2_match_context_free(p->pcre2_match_context);
+}
+#else /* !USE_LIBPCRE2 */
+static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
+{
+	/* Unreachable until USE_LIBPCRE2 becomes synonymous with
+	 * USE_LIBPCRE. See the sibling comment in
+	 * grep_set_pattern_type_option().
+	 */
+	die("cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE");
+}
+
+static int pcre2match(struct grep_pat *p, const char *line, const char *eol,
+		regmatch_t *match, int eflags)
+{
+	return 1;
+}
+
+static void free_pcre2_pattern(struct grep_pat *p)
+{
+}
+#endif /* !USE_LIBPCRE2 */
+
 static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
 	struct strbuf sb = STRBUF_INIT;
@@ -509,6 +643,11 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 		return;
 	}
 
+	if (opt->pcre2) {
+		compile_pcre2_pattern(p, opt);
+		return;
+	}
+
 	if (opt->pcre1) {
 		compile_pcre1_regexp(p, opt);
 		return;
@@ -868,6 +1007,8 @@ void free_grep_patterns(struct grep_opt *opt)
 				kwsfree(p->kws);
 			else if (p->pcre1_regexp)
 				free_pcre1_regexp(p);
+			else if (p->pcre2_pattern)
+				free_pcre2_pattern(p);
 			else
 				regfree(&p->regexp);
 			free(p->pattern);
@@ -948,6 +1089,8 @@ static int patmatch(struct grep_pat *p, char *line, char *eol,
 		hit = !fixmatch(p, line, eol, match);
 	else if (p->pcre1_regexp)
 		hit = !pcre1match(p, line, eol, match, eflags);
+	else if (p->pcre2_pattern)
+		hit = !pcre2match(p, line, eol, match, eflags);
 	else
 		hit = !regexec_buf(&p->regexp, line, eol - line, 1, match,
 				   eflags);
diff --git a/grep.h b/grep.h
index b7b9d487b0..b40afc2e2f 100644
--- a/grep.h
+++ b/grep.h
@@ -19,6 +19,16 @@ typedef int pcre;
 typedef int pcre_extra;
 typedef int pcre_jit_stack;
 #endif
+#ifdef USE_LIBPCRE2
+#define PCRE2_CODE_UNIT_WIDTH 8
+#include <pcre2.h>
+#else
+typedef int pcre2_code;
+typedef int pcre2_match_data;
+typedef int pcre2_compile_context;
+typedef int pcre2_match_context;
+typedef int pcre2_jit_stack;
+#endif
 #include "kwset.h"
 #include "thread-utils.h"
 #include "userdiff.h"
@@ -63,6 +73,12 @@ struct grep_pat {
 	pcre_jit_stack *pcre1_jit_stack;
 	const unsigned char *pcre1_tables;
 	int pcre1_jit_on;
+	pcre2_code *pcre2_pattern;
+	pcre2_match_data *pcre2_match_data;
+	pcre2_compile_context *pcre2_compile_context;
+	pcre2_match_context *pcre2_match_context;
+	pcre2_jit_stack *pcre2_jit_stack;
+	int pcre2_jit_on;
 	kwset_t kws;
 	unsigned fixed:1;
 	unsigned ignore_case:1;
@@ -126,6 +142,7 @@ struct grep_opt {
 	int extended;
 	int use_reflog_filter;
 	int pcre1;
+	int pcre2;
 	int relative;
 	int pathname;
 	int null_following_name;
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 43529451f9..f5da636bea 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1015,7 +1015,7 @@ esac
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PTHREADS" && test_set_prereq PTHREADS
 test -z "$NO_PYTHON" && test_set_prereq PYTHON
-test -n "$USE_LIBPCRE1" && test_set_prereq PCRE
+test -n "$USE_LIBPCRE1$USE_LIBPCRE2" && test_set_prereq PCRE
 test -z "$NO_GETTEXT" && test_set_prereq GETTEXT
 
 # Can we rely on git's output in the C locale?
-- 
2.13.0.303.g4ebf302169


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 22/30] grep: factor test for \0 in grep patterns into a function
  2017-05-20 21:42 ` [PATCH v3 22/30] grep: factor test for \0 in grep patterns into a function Ævar Arnfjörð Bjarmason
@ 2017-05-23 21:17   ` Brandon Williams
  0 siblings, 0 replies; 77+ messages in thread
From: Brandon Williams @ 2017-05-23 21:17 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Fredrik Kuivinen

On 05/20, Ævar Arnfjörð Bjarmason wrote:
> Factor the test for \0 in grep patterns into a function. Since commit
> 9eceddeec6 ("Use kwset in grep", 2011-08-21) any pattern containing a
> \0 is considered fixed as regcomp() can't handle it.
> 
> This change makes later changes that make use of either has_null() or
> is_fixed() (but not both) smaller.
> 
> While I'm at it make the comment conform to the style guide, i.e. add
> an opening "/*\n".
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  grep.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/grep.c b/grep.c
> index bf6c2494fd..79eb681c6e 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -321,6 +321,18 @@ static NORETURN void compile_regexp_failed(const struct grep_pat *p,
>  	die("%s'%s': %s", where, p->pattern, error);
>  }
>  
> +static int has_null(const char *s, size_t len)
> +{
> +	/*
> +	 * regcomp cannot accept patterns with NULs so when using it
> +	 * we consider any pattern containing a NUL fixed.
> +	 */
> +	if (memchr(s, 0, len))
> +		return 1;
> +
> +	return 0;
> +}
> +
>  #ifdef USE_LIBPCRE
>  static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
>  {
> @@ -394,12 +406,6 @@ static int is_fixed(const char *s, size_t len)
>  {
>  	size_t i;
>  
> -	/* regcomp cannot accept patterns with NULs so we
> -	 * consider any pattern containing a NUL fixed.
> -	 */
> -	if (memchr(s, 0, len))
> -		return 1;
> -
>  	for (i = 0; i < len; i++) {
>  		if (is_regex_special(s[i]))
>  			return 0;
> @@ -451,7 +457,7 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
>  	 * simple string match using kws.  p->fixed tells us if we
>  	 * want to use kws.
>  	 */
> -	if (opt->fixed || is_fixed(p->pattern, p->patternlen))
> +	if (opt->fixed || has_null(p->pattern, p->patternlen) || is_fixed(p->pattern, p->patternlen))

small nit: longer than 80 char
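
One way to wrap it, as a rough sketch only. The has_null() body follows
the patch; is_fixed() here is a simplified stand-in with an assumed set
of special characters, and use_fixed_matcher() is a hypothetical helper
that does not exist in grep.c:

```c
#include <stddef.h>
#include <string.h>

/* Same idea as the patch: is there a NUL in the first len bytes? */
static int has_null(const char *s, size_t len)
{
	return memchr(s, 0, len) != NULL;
}

/*
 * Simplified stand-in for grep.c's is_fixed(). The set of special
 * characters below is an assumption made for this example.
 */
static int is_fixed(const char *s, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		/* Guard against NUL: strchr() would match the terminator. */
		if (s[i] && strchr(".*[]^$\\+?(){}|", s[i]))
			return 0;
	}
	return 1;
}

/* Hypothetical helper: the long condition, wrapped under 80 columns. */
static int use_fixed_matcher(int opt_fixed, const char *pattern, size_t len)
{
	return opt_fixed ||
	       has_null(pattern, len) ||
	       is_fixed(pattern, len);
}
```

The explicit length matters here: strlen() would stop at the first NUL,
so the check has to go through memchr() with p->patternlen.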

>  		p->fixed = !icase || ascii_only;
>  	else
>  		p->fixed = 0;
> -- 
> 2.13.0.303.g4ebf302169
> 

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 1/7] grep: don't redundantly compile throwaway patterns under threading
  2017-05-23 19:24   ` [PATCH v2 1/7] grep: don't redundantly compile throwaway patterns under threading Ævar Arnfjörð Bjarmason
@ 2017-05-24  4:42     ` Junio C Hamano
  2017-05-25 10:33       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 77+ messages in thread
From: Junio C Hamano @ 2017-05-24  4:42 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Jeffrey Walton, Michał Kiedrowicz, J Smith,
	Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams, Stefan Beller,
	Johannes Schindelin, Simon Ruderich

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Rather, it's just to make the code easier to reason about. It's
> confusing to debug this under threading & non-threading when the
> threading codepaths redundantly compile a pattern which is never used.
>
> The reason the patterns are recompiled is as a side-effect of
> duplicating the whole grep_opt structure, which is not thread safe,
> writable, and munged during execution. The grep_opt structure then
> points to the grep_pat structure where pattern or patterns are stored.
>
> I looked into e.g. splitting the API into some "do & alloc threadsafe
> stuff", "spawn thread", "do and alloc non-threadsafe stuff", but the
> execution time of grep_opt_dup() & pattern compilation is trivial
> compared to actually executing the grep, so there was no point. Even
> with the more expensive JIT changes to follow the most expensive PCRE
> patterns take something like 0.0X milliseconds to compile at most[1].

OK.

> The undocumented --debug mode added in commit 17bf35a3c7 ("grep: teach
> --debug option to dump the parse tree", 2012-09-13) still works
> properly with this change. It only emits debugging info during pattern
> compilation, which is now dumped by the pattern compiled just before
> the first thread is started.

When opt is passed to run(), opt->debug is still true for the first
worker thread.  As long as opt->debug never makes a difference after
compile_grep_patterns(opt) returns, I think the change in this patch
is safe.  I do not know if we want to rely on it, but we can explain it
away by saying "we'll only debug the runtime behaviour for the first
worker only", or something, so it is not a big deal either way.

Thanks.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 2/7] grep: skip pthreads overhead when using one thread
  2017-05-23 19:24   ` [PATCH v2 2/7] grep: skip pthreads overhead when using one thread Ævar Arnfjörð Bjarmason
@ 2017-05-24  4:45     ` Junio C Hamano
  0 siblings, 0 replies; 77+ messages in thread
From: Junio C Hamano @ 2017-05-24  4:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Jeffrey Walton, Michał Kiedrowicz, J Smith,
	Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams, Stefan Beller,
	Johannes Schindelin, Simon Ruderich

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Skip the administrative overhead of using pthreads when only using one
> thread. Instead take the non-threaded path which would be taken under
> NO_PTHREADS.
>
> The threading support was initially added in commit
> 5b594f457a ("Threaded grep", 2010-01-25) with a hardcoded compile-time
> number of 8 threads. Later the number of threads was made configurable
> in commit 89f09dd34e ("grep: add --threads=<num> option and
> grep.threads configuration", 2015-12-15).
>
> That change did not add any special handling for --threads=1. Now we
> take a slightly faster path by skipping thread handling entirely when
> 1 thread is requested.

OK, this is what Peff and you were discussing in the earlier round,
having the controller do the work himself, instead of sitting and
waiting for a sole worker to finish the work.  Looks good.

Thanks.

>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/grep.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 12e62fcbf3..bd008cb100 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -1238,6 +1238,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>  		num_threads = GREP_NUM_THREADS_DEFAULT;
>  	else if (num_threads < 0)
>  		die(_("invalid number of threads specified (%d)"), num_threads);
> +	if (num_threads == 1)
> +		num_threads = 0;
>  #else
>  	if (num_threads)
>  		warning(_("no threads support, ignoring --threads"));
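
For readers skimming the hunk above, the resulting decision can be
sketched in isolation roughly as follows. This is not the code in
builtin/grep.c: effective_threads() is a hypothetical helper, the
default of 8 is assumed from the "hardcoded compile-time number of 8
threads" remark in the log message, and returning -1 stands in for the
die() call.

```c
/* Assumed value, taken from the log message's "8 threads" remark. */
#define EXAMPLE_NUM_THREADS_DEFAULT 8

/*
 * Hypothetical helper. Returns the number of worker threads to spawn,
 * where 0 means "do the work in the calling thread", or -1 if the
 * requested value is invalid.
 */
static int effective_threads(int requested)
{
	int num_threads = requested;

	if (num_threads == 0)
		num_threads = EXAMPLE_NUM_THREADS_DEFAULT;
	else if (num_threads < 0)
		return -1;

	/* A single worker gains nothing over the non-threaded path. */
	if (num_threads == 1)
		num_threads = 0;

	return num_threads;
}
```

With this shape, --threads=1 and a NO_PTHREADS build end up on the same
code path, which is the point of the patch.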

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 4/7] grep: add support for the PCRE v1 JIT API
  2017-05-23 19:24   ` [PATCH v2 4/7] grep: add support for the PCRE v1 JIT API Ævar Arnfjörð Bjarmason
@ 2017-05-24  5:17     ` Junio C Hamano
  2017-05-24  7:37       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 77+ messages in thread
From: Junio C Hamano @ 2017-05-24  5:17 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Jeffrey Walton, Michał Kiedrowicz, J Smith,
	Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams, Stefan Beller,
	Johannes Schindelin, Simon Ruderich

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> diff --git a/grep.c b/grep.c
> index 1157529115..49e9aed457 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -351,6 +351,9 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
>  	const char *error;
>  	int erroffset;
>  	int options = PCRE_MULTILINE;
> +#ifdef PCRE_CONFIG_JIT
> +	int canjit;
> +#endif

Is "canjit" a property purely of the library (e.g. version and
compilation option), or of combination of that and nature of the
pattern, or something else like the memory pressure?

I am wondering if it is worth doing something like this:

	static int canjit = -1;
	if (canjit < 0) {
		pcre_config(PCRE_CONFIG_JIT, &canjit);
	}

if it depends purely on the library linked to the process.
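
Spelled out as something that compiles on its own, the idea would be
roughly the following. query_jit_support() is a made-up stand-in for
the pcre_config(PCRE_CONFIG_JIT, ...) call so that the sketch does not
need libpcre, and query_count exists only to show that the library is
asked once:

```c
/* Counts how often the (stand-in) library query is made. */
static int query_count;

/* Hypothetical stand-in for pcre_config(PCRE_CONFIG_JIT, &out). */
static void query_jit_support(int *out)
{
	query_count++;
	*out = 1;
}

/* Ask the library once and remember the answer for later callers. */
static int library_can_jit(void)
{
	static int canjit = -1;

	if (canjit < 0)
		query_jit_support(&canjit);
	return canjit;
}
```

A function-local static like this is not synchronized, so whether it is
appropriate depends on where pattern compilation happens relative to
the worker threads.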

> @@ -365,9 +368,20 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
>  	if (!p->pcre1_regexp)
>  		compile_regexp_failed(p, error);
>  
> -	p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, &error);
> +	p->pcre1_extra_info = pcre_study(p->pcre1_regexp, PCRE_STUDY_JIT_COMPILE, &error);
>  	if (!p->pcre1_extra_info && error)
>  		die("%s", error);
> +
> +#ifdef PCRE_CONFIG_JIT
> +	pcre_config(PCRE_CONFIG_JIT, &canjit);
> +	if (canjit == 1) {
> +		p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024);
> +		if (!p->pcre1_jit_stack)
> +			die("BUG: Couldn't allocate PCRE JIT stack");

I agree that dying is OK, but as far as I can tell, this is not a
BUG (there is no error a programmer can correct by a follow-up
patch); please do not mark it as such (it is likely that we'll later
do a tree-wide s/die("BUG:/BUG("/ and this will interfere).

> +		pcre_assign_jit_stack(p->pcre1_extra_info, NULL, p->pcre1_jit_stack);
> +		p->pcre1_jit_on = 1;

Contrary to what I wondered about "canjit" above, I think it makes
tons of sense to contain the "is JIT in use?" information in "struct
grep_pat" and not rely on any global state.  Not that we are likely
to want to be able to JIT some patterns while not doing others.  So
I agree with the design choice of adding pcre1_jit_on field to the
structure.

But then, wouldn't it make more sense to do all of the above without
the canjit variable at all?  i.e. something like...

        #ifdef PCRE_CONFIG_JIT
                pcre_config(PCRE_CONFIG_JIT, &p->pcre1_jit_on);
                if (p->pcre1_jit_on)
                        ... stack thing ...
        #else
                p->pcre1_jit_on = 0;
        #endif

> +#ifdef PCRE_CONFIG_JIT
> +	if (p->pcre1_jit_on) {
> +		pcre_free_study(p->pcre1_extra_info);
> +		pcre_jit_stack_free(p->pcre1_jit_stack);
> +	} else
> +#endif
> +	/* PCRE_CONFIG_JIT !p->pcre1_jit_on else branch */
>  	pcre_free(p->pcre1_extra_info);
> +
>  	pcre_free((void *)p->pcre1_tables);

It is very thoughtful to add a blank line here (and you did the same
in another similar hunk), but I have a feeling that it is still a
bit too subtle a hint to signal to the readers that these two
pcre_free()s fire differently, i.e. the former does not fire if jit
is on but the latter always fires.

Would this be a bit safer while being not too ugly to live, I wonder?

        #ifdef PCRE_CONFIG_JIT
                if (p->pcre1_jit_on) {
                        pcre_free_study(p->pcre1_extra_info);
                        pcre_jit_stack_free(p->pcre1_jit_stack);
                } else
        #endif
                {
                        /* PCRE_CONFIG_JIT !p->pcre1_jit_on else branch */
                        pcre_free(p->pcre1_extra_info);
                }
                pcre_free((void *)p->pcre1_tables);

Thanks.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 5/7] grep: un-break building with PCRE < 8.32
  2017-05-23 19:24   ` [PATCH v2 5/7] grep: un-break building with PCRE < 8.32 Ævar Arnfjörð Bjarmason
@ 2017-05-24  6:00     ` Junio C Hamano
  2017-05-24  6:38       ` Junio C Hamano
  0 siblings, 1 reply; 77+ messages in thread
From: Junio C Hamano @ 2017-05-24  6:00 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Jeffrey Walton, Michał Kiedrowicz, J Smith,
	Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams, Stefan Beller,
	Johannes Schindelin, Simon Ruderich

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Amend my change earlier in this series ("grep: add support for the
> PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1
> versions earlier than 8.32.
> ...
> So just take the easy way out and disable the JIT on any version older
> than 8.32.

The above were very understandable, but I had quite a hard time
parsing the first sentence of the next paragraph, especially everything
after "because".  In the end I think I figured out what you wanted
to say (so you do not have to explain it to me in a response), but I
wish there were an easier-to-understand way to write the same thing.

> The reason this change isn't part of the initial change PCRE JIT
> support is because possibly slightly annoying someone who's bisecting
> with an ancient PCRE is worth it to have a cleaner history showing
> which parts of the implementation are only used for ancient PCRE
> versions. This also makes it easier to revert this change if we ever
> decide to stop supporting those old versions.
>
> 1. http://www.pcre.org/original/changelog.txt ("28. Introducing a
>    native interface for JIT. Through this interface, the
>    compiled[...]")
> 2. https://bugs.exim.org/show_bug.cgi?id=2121
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  grep.c | 8 ++++----
>  grep.h | 5 +++++
>  2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/grep.c b/grep.c
> index 49e9aed457..3c0c30f033 100644
> --- a/grep.c
> +++ b/grep.c
> ...
> @@ -418,7 +418,7 @@ static void free_pcre1_regexp(struct grep_pat *p)
>  {
>  	pcre_free(p->pcre1_regexp);
>  
> -#ifdef PCRE_CONFIG_JIT
> +#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT
>  	if (p->pcre1_jit_on) {
>  		pcre_free_study(p->pcre1_extra_info);
>  		pcre_jit_stack_free(p->pcre1_jit_stack);
> diff --git a/grep.h b/grep.h
> index 14f47189f9..73ef0ef8ec 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -3,6 +3,11 @@
>  #include "color.h"
>  #ifdef USE_LIBPCRE1
>  #include <pcre.h>
> +#ifdef PCRE_CONFIG_JIT
> +#if PCRE_MAJOR >= 8 && PCRE_MINOR >= 32
> +#define GIT_PCRE1_CAN_DO_MODERN_JIT
> +#endif
> +#endif
>  #ifndef PCRE_STUDY_JIT_COMPILE
>  #define PCRE_STUDY_JIT_COMPILE 0
>  #endif
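
A side note on the version test in the hunk above: it asks for major >=
8 and minor >= 32 independently, so a hypothetical 9.0 would not pass.
That may never matter for PCRE v1, but the usual way to compare a
(major, minor) pair can be sketched like this. Both functions are
illustrative names only and are not part of the patch:

```c
/* The comparison as written in the patch, as a function. */
static int naive_at_least_8_32(int major, int minor)
{
	return major >= 8 && minor >= 32;
}

/* Lexicographic comparison of (major, minor) against a minimum. */
static int version_at_least(int major, int minor,
			    int want_major, int want_minor)
{
	return major > want_major ||
	       (major == want_major && minor >= want_minor);
}
```

As a preprocessor condition this would read something like
"PCRE_MAJOR > 8 || (PCRE_MAJOR == 8 && PCRE_MINOR >= 32)".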

After reading the patch, I do not necessarily agree with your
pros-and-cons between keeping the patches as separate steps and
squashing them into one, though.  Even if this were squashed into
[PATCH 4/7], the logic to set GIT_PCRE1_CAN_DO_MODERN_JIT based on
PCRE_CONFIG_JIT and PCRE's version in this patch is well isolated to
a single place, and it is easy to spot what needs to be done when we
decide to lose the version-based GIT_PCRE1_CAN_DO_MODERN_JIT from
our code after making sure that nobody uses versions older than
8.32.

By the time we will make such a decision, it is likely that we no
longer remember if "#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT" we have in
our code at that moment in the future were all already present in
this patch.  An update to drop support for older PCRE is likely to
be done by finding the above part in grep.h to remove the
version-dependent part, while still keeping #ifdef based on
GIT_PCRE1_CAN_DO_MODERN_JIT in the *.c code.  Being able to revert
this patch does not help there.

We might also do a find-and-replace of GIT_PCRE1_CAN_DO_MODERN_JIT
to PCRE_CONFIG_JIT when we drop support for older PCRE, but I do not
think we would assume that it is sufficient to revert this patch
when we do so.  We may have added more #ifdef on the GIT_MODERN_JIT
symbol we now need to change to PCRE_CONFIG_JIT, so that will be
done by find-and-replace of the then-current code, not by reverting
this patch.

So in that sense, I do not think keeping them separate in practice
has the "makes it easier to revert this change" benefit.

If I were doing these two patches, I'd squash them together into
one, rename GIT_PCRE1_CAN_DO_MODERN_JIT to GIT_PCRE1_USE_JIT, and
explain in the log message why we turn it off for versions older
than 8.32 like you did in the log message for this patch.

The reason for the "rename" is because I might also be tempted to
allow users of newer version to manually decline GIT_PCRE1_USE_JIT
in Makefile/config.mak; i.e. we may decide not to USE something even
if we CAN, and the #ifdef symbol you are using is about the decision
to USE or not USE, not necessarily if the library CAN.
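
To illustrate the CAN/USE split, here is a sketch with made-up names.
None of the EXAMPLE_* symbols exist in Git; in a real build the "can"
part would come from <pcre.h> and the opt-out from Makefile or
config.mak:

```c
/* Stand-in for what the library headers would tell us (assumed). */
#define EXAMPLE_PCRE1_CAN_JIT 1

/*
 * Hypothetical build knob. Defining EXAMPLE_NO_PCRE1_JIT before this
 * point would decline the JIT even though the library supports it.
 */

#if EXAMPLE_PCRE1_CAN_JIT && !defined(EXAMPLE_NO_PCRE1_JIT)
#define EXAMPLE_PCRE1_USE_JIT 1
#else
#define EXAMPLE_PCRE1_USE_JIT 0
#endif

/* The rest of the code only ever looks at the USE decision. */
static int example_pcre1_use_jit(void)
{
	return EXAMPLE_PCRE1_USE_JIT;
}
```

The *.c files would then test only the USE symbol, and the capability
check stays in one place in the header.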

Thanks.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 7/7] grep: add support for PCRE v2
  2017-05-23 19:24   ` [PATCH v2 7/7] grep: add support for PCRE v2 Ævar Arnfjörð Bjarmason
@ 2017-05-24  6:23     ` Junio C Hamano
  2017-05-25  9:49       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 77+ messages in thread
From: Junio C Hamano @ 2017-05-24  6:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Jeffrey Walton, Michał Kiedrowicz, J Smith,
	Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams, Stefan Beller,
	Johannes Schindelin, Simon Ruderich

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Add support for v2 of the PCRE API. This is a new major version of
> PCRE that came out in early 2015[1].
>
> The regular expression syntax is the same, but while the API is
> similar, pretty much every function is either renamed or takes
> different arguments. Thus using it via entirely new functions makes
> sense, as opposed to trying to e.g. have one compile_pcre_pattern()
> that would call either PCRE v1 or v2 functions.
>
> Git can now be compiled with either USE_LIBPCRE1=YesPlease or
> USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a
> synonym for the former. Providing both is a compile-time error.
>
> With earlier patches to enable JIT for PCRE v1 the performance of the
> release versions of both libraries is almost exactly the same, with
> PCRE v2 being around 1% slower.
>
> However, after I reported this to the pcre-dev mailing list[2] I got a
> lot of help with the API use from Zoltán Herczeg, who subsequently
> optimized some of the JIT functionality in v2 of the library.
>
> Running the p7820-grep-engines.sh performance test against the latest
> Subversion trunk of both, with both them and git compiled as -O3, and
> the test run against linux.git, gives the following results. Just the
> /perl/ tests shown:
>
>     $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~2 HEAD~ HEAD p7820-grep-engines.sh
>     [...]
>     Test                                           HEAD~2            HEAD~                    HEAD
>     ----------------------------------------------------------------------------------------------------------------
>     7820.3: perl grep 'how.to'                      0.22(0.40+0.48)   0.22(0.31+0.58) +0.0%   0.22(0.26+0.59) +0.0%
>     7820.7: perl grep '^how to'                     0.27(0.62+0.50)   0.28(0.60+0.50) +3.7%   0.22(0.25+0.60) -18.5%
>     7820.11: perl grep '[how] to'                   0.33(0.92+0.47)   0.33(0.94+0.45) +0.0%   0.25(0.42+0.51) -24.2%
>     7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       0.35(1.08+0.46)   0.35(1.12+0.41) +0.0%   0.25(0.52+0.50) -28.6%
>     7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.30(0.78+0.51)   0.30(0.86+0.42) +0.0%   0.25(0.29+0.54) -16.7%
>
> See commit ("perf: add a comparison test of grep regex engines",
> 2017-04-19) for details on the machine the above test run was executed
> on.
>
> Here HEAD~2 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with
> JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine
> mentioning p7820-grep-engines.sh for more details on the test setup.
>
> For ease of readability, a different run of just HEAD~ and HEAD (PCRE
> v1 with JIT vs. PCRE v2), again with just the /perl/ tests shown:
>
>     Test                                           HEAD~             HEAD
>     ---------------------------------------------------------------------------------------
>     7820.3: perl grep 'how.to'                      0.23(0.41+0.47)   0.23(0.26+0.59) +0.0%
>     7820.7: perl grep '^how to'                     0.27(0.64+0.47)   0.23(0.28+0.56) -14.8%
>     7820.11: perl grep '[how] to'                   0.34(0.95+0.44)   0.25(0.38+0.56) -26.5%
>     7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       0.34(1.07+0.46)   0.24(0.52+0.49) -29.4%
>     7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.30(0.81+0.46)   0.22(0.33+0.54) -26.7%
>
> I.e. the two are either neck-and-neck or PCRE v2 pulls ahead; when
> it does, it's around 20% faster.
>
> A brief note on thread safety: As noted in pcre2api(3) & pcre2jit(3)
> the compiled pattern can be shared between threads, but not some of
> the JIT context, however the grep threading support does all pattern &
> JIT compilation in separate threads, so this code doesn't need to
> concern itself with thread safety.

Nicely explained.

> -# Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
> +# Currently USE_LIBPCRE is a synonym for USE_LIBPCRE1, define
> +# USE_LIBPCRE2 instead if you'd like to use version 2 of the PCRE
> +# library. The USE_LIBPCRE flag will likely be changed to mean v2 by
> +# default in future releases.
> +#
> +# Define LIBPCREDIR=/foo/bar if your PCRE header and library files are in
>  # /foo/bar/include and /foo/bar/lib directories.

As there is no way to use both, having a single LIBPCREDIR is not a
hurting limitation, which makes sense.

> @@ -2241,6 +2258,7 @@ GIT-BUILD-OPTIONS: FORCE
>  	@echo NO_CURL=\''$(subst ','\'',$(subst ','\'',$(NO_CURL)))'\' >>$@+
>  	@echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+
>  	@echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+

Shouldn't the line above record $(USE_LIBPCRE1) instead of the
generic fallback?

> +	@echo USE_LIBPCRE2=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE2)))'\' >>$@+
>  	@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
>  	@echo NO_PTHREADS=\''$(subst ','\'',$(subst ','\'',$(NO_PTHREADS)))'\' >>$@+
>  	@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+

> diff --git a/grep.c b/grep.c
> index 3c0c30f033..569cf9e290 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -179,22 +179,36 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
>  	case GREP_PATTERN_TYPE_BRE:
>  		opt->fixed = 0;
>  		opt->pcre1 = 0;
> +		opt->pcre2 = 0;
>  		break;
>  
>  	case GREP_PATTERN_TYPE_ERE:
>  		opt->fixed = 0;
>  		opt->pcre1 = 0;
> +		opt->pcre2 = 0;
>  		opt->regflags |= REG_EXTENDED;
>  		break;
>  
>  	case GREP_PATTERN_TYPE_FIXED:
>  		opt->fixed = 1;
>  		opt->pcre1 = 0;
> +		opt->pcre2 = 0;
>  		break;
>  
>  	case GREP_PATTERN_TYPE_PCRE:
>  		opt->fixed = 0;
> +#ifdef USE_LIBPCRE2
> +		opt->pcre1 = 0;
> +		opt->pcre2 = 1;
> +#else
> +		/* It's important that pcre1 always be assigned to
> +		 * even when there's no USE_LIBPCRE* defined. We still
> +		 * call the PCRE stub function, it just dies with
> +		 * "cannot use Perl-compatible regexes[...]".
> +		 */
>  		opt->pcre1 = 1;

Very well thought-out comment.  Our style wants you to have
slash-aster that opens a multi-line comment on its own line, though.

> +		opt->pcre2 = 0;
> +#endif
>  		break;
>  	}
>  }
> @@ -446,6 +460,126 @@ static void free_pcre1_regexp(struct grep_pat *p)
>  }
>  #endif /* !USE_LIBPCRE1 */
>  
> +#ifdef USE_LIBPCRE2
> +static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
> +{
> +...
> +	p->pcre2_pattern = pcre2_compile((PCRE2_SPTR)p->pattern,
> +					 p->patternlen, options, &error, &erroffset,
> +					 p->pcre2_compile_context);

Are all die("BUG:...") in this function actual bugs, or just
"die()"?  Just like the comment on an earlier patch, things like
running out of memory that you as a Git programmer cannot fix by
correcting this code are not die("BUG:"), but normal runtime errors.

> +
> +	if (p->pcre2_pattern) {
> +		p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, NULL);
> +		if (!p->pcre2_match_data)
> +			die("BUG: Couldn't allocate PCRE2 match data");
> +	} else {
> +		pcre2_get_error_message(error, errbuf, sizeof(errbuf));
> +		compile_regexp_failed(p, (const char *)&errbuf);
> +	}
> +
> +	pcre2_config(PCRE2_CONFIG_JIT, &canjit);
> +	if (canjit == 1) {
> +		jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE);
> +		if (!jitret)
> +			p->pcre2_jit_on = 1;

I think the same "would it be better to do this without canjit?"
comment applies here.

> +#else /* !USE_LIBPCRE2 */
> +static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
> +{
> +	/* Unreachable until USE_LIBPCRE2 becomes synonymous with
> +	 * USE_LIBPCRE. See the sibling comment in
> +	 * grep_set_pattern_type_option().
> +	 */
> +	die("cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE");
> +}

Wow.  If I were doing this, I wouldn't have been this cautious, but
I have no complaints ;-).


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 5/7] grep: un-break building with PCRE < 8.32
  2017-05-24  6:00     ` Junio C Hamano
@ 2017-05-24  6:38       ` Junio C Hamano
  0 siblings, 0 replies; 77+ messages in thread
From: Junio C Hamano @ 2017-05-24  6:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Jeffrey Walton, Michał Kiedrowicz, J Smith,
	Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams, Stefan Beller,
	Johannes Schindelin, Simon Ruderich

Junio C Hamano <gitster@pobox.com> writes:

> So in that sense, I do not think keeping them separate in practice
> has the "makes it easier to revert this change" benefit.
>
> If I were doing these two patches, I'd squash them together into
> one, rename GIT_PCRE1_CAN_DO_MODERN_JIT to GIT_PCRE1_USE_JIT, and
> explain in the log message why we turn it off for versions older
> than 8.32 like you did in the log message for this patch.
>
> The reason for the "rename" is because I might also be tempted to
> allow users of newer version to manually decline GIT_PCRE1_USE_JIT
> in Makefile/config.mak; i.e. we may decide not to USE something even
> if we CAN, and the #ifdef symbol you are using is about the decision
> to USE or not USE, not necessarily if the library CAN.

Need to say a few things that I forgot to mention.

I said that I do not see a practical benefit in keeping the patches
separate.  But at the same time, I do not think the problem that
such a "main one that is partly broken" followed by "fix for one
minority" followed by "fix for another minority" pattern causes for
bisection is a huge one.  So I'd be OK either way.

Especially, I'd be more than OK if the "main one that is partly broken"
says "by the way, this is (deliberately) left broken for two cases,
and if you hit this during your bisection, do not answer good or bad
and instead reset to a few commits newer that has both fixes" in its
log message.  Then there is no downside in bisection.




^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 4/7] grep: add support for the PCRE v1 JIT API
  2017-05-24  5:17     ` Junio C Hamano
@ 2017-05-24  7:37       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-24  7:37 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Fredrik Kuivinen,
	Brandon Williams, Stefan Beller, Johannes Schindelin,
	Simon Ruderich

On Wed, May 24, 2017 at 7:17 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> diff --git a/grep.c b/grep.c
>> index 1157529115..49e9aed457 100644
>> --- a/grep.c
>> +++ b/grep.c
>> @@ -351,6 +351,9 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
>>       const char *error;
>>       int erroffset;
>>       int options = PCRE_MULTILINE;
>> +#ifdef PCRE_CONFIG_JIT
>> +     int canjit;
>> +#endif
>
> Is "canjit" a property purely of the library (e.g. version and
> compilation option), or of combination of that and nature of the
> pattern, or something else like the memory pressure?
>
> I am wondering if it is worth doing something like this:
>
>         static int canjit = -1;
>         if (canjit < 0) {
>                 pcre_config(PCRE_CONFIG_JIT, &canjit);
>         }
>
> if it depends purely on the library linked to the process.

It depends purely on how the library was compiled. I just wrote
it like that because compiling the pattern is not a hot codepath (i.e.
we call this 8 or so times at most, whereas exec will be called
thousands/millions/billions of times), so trying to avoid calling this
trivial function seemed pointless.

But looking at this again it would be simpler to combine what you're
suggesting with just passing a pointer to *.pcre[12]_jit_on directly,
skipping the canjit variables.
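
The two variants under discussion could be sketched as below. This
assumes a hypothetical sketch_config_jit() standing in for
pcre_config(PCRE_CONFIG_JIT, ...) and a cut-down struct in place of
struct grep_pat; it is an illustration, not the code that ended up in
the series:

```c
#include <assert.h>

/* Hypothetical stand-in for struct grep_pat; only the JIT flag is kept. */
struct sketch_pat {
	int jit_on;
};

/* Counts calls so the effect of caching can be observed. */
static int sketch_config_calls;

/*
 * Hypothetical stand-in for pcre_config(PCRE_CONFIG_JIT, &out): reports
 * a property that is fixed once the library has been built.
 */
static void sketch_config_jit(int *out)
{
	sketch_config_calls++;
	*out = 1;
}

/* Variant 1: cache the answer in a function-local static. */
static int sketch_can_jit_cached(void)
{
	static int canjit = -1;

	if (canjit < 0)
		sketch_config_jit(&canjit);
	return canjit;
}

/* Variant 2: no temporary; the query writes into the per-pattern field. */
static void sketch_compile(struct sketch_pat *p)
{
	assert(p);
	sketch_config_jit(&p->jit_on);
	if (p->jit_on) {
		/* a JIT stack would be allocated and assigned here */
	}
}
```

Variant 1 saves repeated queries at the cost of global state; variant
2 repeats a cheap query per pattern but keeps everything in the
per-pattern structure.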

>> @@ -365,9 +368,20 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
>>       if (!p->pcre1_regexp)
>>               compile_regexp_failed(p, error);
>>
>> -     p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, &error);
>> +     p->pcre1_extra_info = pcre_study(p->pcre1_regexp, PCRE_STUDY_JIT_COMPILE, &error);
>>       if (!p->pcre1_extra_info && error)
>>               die("%s", error);
>> +
>> +#ifdef PCRE_CONFIG_JIT
>> +     pcre_config(PCRE_CONFIG_JIT, &canjit);
>> +     if (canjit == 1) {
>> +             p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024);
>> +             if (!p->pcre1_jit_stack)
>> +                     die("BUG: Couldn't allocate PCRE JIT stack");
>
> I agree that dying is OK, but as far as I can tell, this is not a
> BUG (there is no error a programmer can correct by a follow-up
> patch); please do not mark it as such (it is likely that we'll later
> do a tree-wide s/die("BUG:/BUG("/ and this will interfere).

Makes sense. Looks like the convention for this sort of thing is to
just do s/BUG: //, e.g. the code in wrapper.c does that.

>> +             pcre_assign_jit_stack(p->pcre1_extra_info, NULL, p->pcre1_jit_stack);
>> +             p->pcre1_jit_on = 1;
>
> Contrary to what I wondered about "canjit" above, I think it makes
> tons of sense to contain the "is JIT in use?" information in "struct
> grep_pat" and not rely on any global state.  Not that we are likely
> to want to be able to JIT some patterns while not doing others.  So
> I agree with the design choice of adding pcre1_jit_on field to the
> structure.
>
> But then, wouldn't it make more sense to do all of the above without
> the canjit variable at all?  i.e. something like...
>
>         #ifdef PCRE_CONFIG_JIT
>                 pcre_config(PCRE_CONFIG_JIT, &p->pcre1_jit_on);
>                 if (p->pcre1_jit_on)
>                         ... stack thing ...
>         #else
>                 p->pcre1_jit_on = 0;
>         #endif

*Nod*

>> +#ifdef PCRE_CONFIG_JIT
>> +     if (p->pcre1_jit_on) {
>> +             pcre_free_study(p->pcre1_extra_info);
>> +             pcre_jit_stack_free(p->pcre1_jit_stack);
>> +     } else
>> +#endif
>> +     /* PCRE_CONFIG_JIT !p->pcre1_jit_on else branch */
>>       pcre_free(p->pcre1_extra_info);
>> +
>>       pcre_free((void *)p->pcre1_tables);
>
> It is very thoughtful to add a blank line here (and you did the same
> in another similar hunk), but I have a feeling that it is still a
> bit too subtle a hint to signal to the readers that these two
> pcre_free()s fire differently, i.e. the former does not fire if jit
> is on but the latter always fires.
>
> Would this be a bit safer while being not too ugly to live, I wonder?
>
>         #ifdef PCRE_CONFIG_JIT
>                 if (p->pcre1_jit_on) {
>                         pcre_free_study(p->pcre1_extra_info);
>                         pcre_jit_stack_free(p->pcre1_jit_stack);
>                 } else
>         #endif
>                 {
>                         /* PCRE_CONFIG_JIT !p->pcre1_jit_on else branch */
>                         pcre_free(p->pcre1_extra_info);
>                 }
>                 pcre_free((void *)p->pcre1_tables);
>

Makes sense. I'll change it.
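
For reference, the braced layout can be sketched with counters in
place of the real PCRE calls. SKETCH_HAVE_JIT and the struct are
hypothetical stand-ins for PCRE_CONFIG_JIT and struct grep_pat:

```c
#include <assert.h>

#define SKETCH_HAVE_JIT 1 /* illustrative stand-in for PCRE_CONFIG_JIT */

/* Hypothetical stand-in for struct grep_pat. */
struct sketch_pat {
	int jit_on;
};

/* Counters standing in for the individual free calls. */
static int study_frees, stack_frees, plain_frees, table_frees;

static void sketch_free(struct sketch_pat *p)
{
	assert(p);
#ifdef SKETCH_HAVE_JIT
	if (p->jit_on) {
		study_frees++;	/* stands in for pcre_free_study() */
		stack_frees++;	/* stands in for pcre_jit_stack_free() */
	} else
#endif
	{
		/* non-JIT branch, or the only branch without JIT support */
		plain_frees++;	/* stands in for pcre_free(extra_info) */
	}
	table_frees++;		/* always runs: pcre_free(tables) */
}
```

With the braces in place, the statement after the block cannot be
mistaken for part of the else branch, whether or not the macro is
defined.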

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 7/7] grep: add support for PCRE v2
  2017-05-24  6:23     ` Junio C Hamano
@ 2017-05-25  9:49       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-25  9:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Fredrik Kuivinen,
	Brandon Williams, Stefan Beller, Johannes Schindelin,
	Simon Ruderich

On Wed, May 24, 2017 at 8:23 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Add support for v2 of the PCRE API. This is a new major version of
>> PCRE that came out in early 2015[1].
>>
>> The regular expression syntax is the same, but while the API is
>> similar, pretty much every function is either renamed or takes
>> different arguments. Thus using it via entirely new functions makes
>> sense, as opposed to trying to e.g. have one compile_pcre_pattern()
>> that would call either PCRE v1 or v2 functions.
>>
>> Git can now be compiled with either USE_LIBPCRE1=YesPlease or
>> USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a
>> synonym for the former. Providing both is a compile-time error.
>>
>> With earlier patches to enable JIT for PCRE v1 the performance of the
>> release versions of both libraries is almost exactly the same, with
>> PCRE v2 being around 1% slower.
>>
>> However after I reported this to the pcre-dev mailing list[2] I got a
>> lot of help with the API use from Zoltán Herczeg, he subsequently
>> optimized some of the JIT functionality in v2 of the library.
>>
>> Running the p7820-grep-engines.sh performance test against the latest
>> Subversion trunk of both, with both them and git compiled as -O3, and
>> the test run against linux.git, gives the following results. Just the
>> /perl/ tests shown:
>>
>>     $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~2 HEAD~ HEAD p7820-grep-engines.sh
>>     [...]
>>     Test                                           HEAD~2            HEAD~                    HEAD
>>     ----------------------------------------------------------------------------------------------------------------
>>     7820.3: perl grep 'how.to'                      0.22(0.40+0.48)   0.22(0.31+0.58) +0.0%   0.22(0.26+0.59) +0.0%
>>     7820.7: perl grep '^how to'                     0.27(0.62+0.50)   0.28(0.60+0.50) +3.7%   0.22(0.25+0.60) -18.5%
>>     7820.11: perl grep '[how] to'                   0.33(0.92+0.47)   0.33(0.94+0.45) +0.0%   0.25(0.42+0.51) -24.2%
>>     7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       0.35(1.08+0.46)   0.35(1.12+0.41) +0.0%   0.25(0.52+0.50) -28.6%
>>     7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.30(0.78+0.51)   0.30(0.86+0.42) +0.0%   0.25(0.29+0.54) -16.7%
>>
>> See commit ("perf: add a comparison test of grep regex engines",
>> 2017-04-19) for details on the machine the above test run was executed
>> on.
>>
>> Here HEAD~2 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with
>> JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine
>> mentioning p7820-grep-engines.sh for more details on the test setup.
>>
>> For ease of readability, a different run just of HEAD~ (PCRE v1 with
>> JIT v.s. PCRE v2), again with just the /perl/ tests shown:
>>
>>     Test                                           HEAD~             HEAD
>>     ---------------------------------------------------------------------------------------
>>     7820.3: perl grep 'how.to'                      0.23(0.41+0.47)   0.23(0.26+0.59) +0.0%
>>     7820.7: perl grep '^how to'                     0.27(0.64+0.47)   0.23(0.28+0.56) -14.8%
>>     7820.11: perl grep '[how] to'                   0.34(0.95+0.44)   0.25(0.38+0.56) -26.5%
>>     7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       0.34(1.07+0.46)   0.24(0.52+0.49) -29.4%
>>     7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.30(0.81+0.46)   0.22(0.33+0.54) -26.7%
>>
>> I.e. the two are either neck-and-neck or PCRE v2 pulls ahead, and
>> when it does it's around 20% faster.
>>
>> A brief note on thread safety: As noted in pcre2api(3) & pcre2jit(3)
>> the compiled pattern can be shared between threads, but not some of
>> the JIT context, however the grep threading support does all pattern &
>> JIT compilation in separate threads, so this code doesn't need to
>> concern itself with thread safety.
>
> Nicely explained.
>
>> -# Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
>> +# Currently USE_LIBPCRE is a synonym for USE_LIBPCRE1, define
>> +# USE_LIBPCRE2 instead if you'd like to use version 2 of the PCRE
>> +# library. The USE_LIBPCRE flag will likely be changed to mean v2 by
>> +# default in future releases.
>> +#
>> +# Define LIBPCREDIR=/foo/bar if your PCRE header and library files are in
>>  # /foo/bar/include and /foo/bar/lib directories.
>
> As there is no way to use both, having a single LIBPCREDIR is not a
> hurting limitation, which makes sense.

Will nevertheless add a comment to clarify this.

>> @@ -2241,6 +2258,7 @@ GIT-BUILD-OPTIONS: FORCE
>>       @echo NO_CURL=\''$(subst ','\'',$(subst ','\'',$(NO_CURL)))'\' >>$@+
>>       @echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+
>>       @echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+
>
> Shouldn't the line above record $(USE_LIBPCRE1) instead of the
> generic fallback?

Yes, will fix.

>> +     @echo USE_LIBPCRE2=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE2)))'\' >>$@+
>>       @echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
>>       @echo NO_PTHREADS=\''$(subst ','\'',$(subst ','\'',$(NO_PTHREADS)))'\' >>$@+
>>       @echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
>
>> diff --git a/grep.c b/grep.c
>> index 3c0c30f033..569cf9e290 100644
>> --- a/grep.c
>> +++ b/grep.c
>> @@ -179,22 +179,36 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
>>       case GREP_PATTERN_TYPE_BRE:
>>               opt->fixed = 0;
>>               opt->pcre1 = 0;
>> +             opt->pcre2 = 0;
>>               break;
>>
>>       case GREP_PATTERN_TYPE_ERE:
>>               opt->fixed = 0;
>>               opt->pcre1 = 0;
>> +             opt->pcre2 = 0;
>>               opt->regflags |= REG_EXTENDED;
>>               break;
>>
>>       case GREP_PATTERN_TYPE_FIXED:
>>               opt->fixed = 1;
>>               opt->pcre1 = 0;
>> +             opt->pcre2 = 0;
>>               break;
>>
>>       case GREP_PATTERN_TYPE_PCRE:
>>               opt->fixed = 0;
>> +#ifdef USE_LIBPCRE2
>> +             opt->pcre1 = 0;
>> +             opt->pcre2 = 1;
>> +#else
>> +             /* It's important that pcre1 always be assigned to
>> +              * even when there's no USE_LIBPCRE* defined. We still
>> +              * call the PCRE stub function, it just dies with
>> +              * "cannot use Perl-compatible regexes[...]".
>> +              */
>>               opt->pcre1 = 1;
>
> Very well thought-out comment.  Our style wants you to have
> slash-aster that opens a multi-line comment on its own line, though.

Will fix.

>> +             opt->pcre2 = 0;
>> +#endif
>>               break;
>>       }
>>  }
>> @@ -446,6 +460,126 @@ static void free_pcre1_regexp(struct grep_pat *p)
>>  }
>>  #endif /* !USE_LIBPCRE1 */
>>
>> +#ifdef USE_LIBPCRE2
>> +static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
>> +{
>> +...
>> +     p->pcre2_pattern = pcre2_compile((PCRE2_SPTR)p->pattern,
>> +                                      p->patternlen, options, &error, &erroffset,
>> +                                      p->pcre2_compile_context);
>
> Are all die("BUG:...") in this function actual bugs, or just
> "die()"?  Just like the comment on an earlier patch, things like
> running out of memory that you as a Git programmer cannot fix by
> correcting this code are not die("BUG:"), but normal runtime errors.

Will fix these.

>> +
>> +     if (p->pcre2_pattern) {
>> +             p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, NULL);
>> +             if (!p->pcre2_match_data)
>> +                     die("BUG: Couldn't allocate PCRE2 match data");
>> +     } else {
>> +             pcre2_get_error_message(error, errbuf, sizeof(errbuf));
>> +             compile_regexp_failed(p, (const char *)&errbuf);
>> +     }
>> +
>> +     pcre2_config(PCRE2_CONFIG_JIT, &canjit);
>> +     if (canjit == 1) {
>> +             jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE);
>> +             if (!jitret)
>> +                     p->pcre2_jit_on = 1;
>
> I think the same "would it be better to do this without canjit?"
> comment applies here.

Yup, changed.

>> +#else /* !USE_LIBPCRE2 */
>> +static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
>> +{
>> +     /* Unreachable until USE_LIBPCRE2 becomes synonymous with
>> +      * USE_LIBPCRE. See the sibling comment in
>> +      * grep_set_pattern_type_option().
>> +      */
>> +     die("cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE");
>> +}
>
> Wow.  If I were doing this, I wouldn't have been this cautious, but
> I have no complaints ;-).
>

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 1/7] grep: don't redundantly compile throwaway patterns under threading
  2017-05-24  4:42     ` Junio C Hamano
@ 2017-05-25 10:33       ` Ævar Arnfjörð Bjarmason
  2017-05-26  0:58         ` Junio C Hamano
  0 siblings, 1 reply; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-25 10:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Fredrik Kuivinen,
	Brandon Williams, Stefan Beller, Johannes Schindelin,
	Simon Ruderich

On Wed, May 24, 2017 at 6:42 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Rather, it's just to make the code easier to reason about. It's
>> confusing to debug this under threading & non-threading when the
>> threading codepaths redundantly compile a pattern which is never used.
>>
>> The reason the patterns are recompiled is as a side-effect of
>> duplicating the whole grep_opt structure, which is not thread safe,
>> writable, and munged during execution. The grep_opt structure then
>> points to the grep_pat structure where pattern or patterns are stored.
>>
>> I looked into e.g. splitting the API into some "do & alloc threadsafe
>> stuff", "spawn thread", "do and alloc non-threadsafe stuff", but the
>> execution time of grep_opt_dup() & pattern compilation is trivial
>> compared to actually executing the grep, so there was no point. Even
>> with the more expensive JIT changes to follow the most expensive PCRE
>> patterns take something like 0.0X milliseconds to compile at most[1].
>
> OK.
>
>> The undocumented --debug mode added in commit 17bf35a3c7 ("grep: teach
>> --debug option to dump the parse tree", 2012-09-13) still works
>> properly with this change. It only emits debugging info during pattern
>> compilation, which is now dumped by the pattern compiled just before
>> the first thread is started.
>
> When opt is passed to run(), opt->debug is still true for the first
> worker thread.  As long as opt->debug never makes difference after
> compile_grep_patterns(opt) returns, I think the change in this patch
> is safe.

Right, the --debug feature only impacts pattern compilation.

> I do not know if we want to rely on it, but we can explain it
> away by saying "we'll only debug the runtime behaviour for the first
> worker only", or something, so it is not a big deal either way.

I think it's a pointless distraction to start speculating in this
commit message what we're going to do with --debug if it ever
starts emitting some debugging information at pattern execution time.

As an aside, I'd very much like to remove both --debug and
--and/--or/--all-match. Given the very rough edges in the UI and how
easy it is to make that feature error out or segfault, I suspect you
might be the only one using it.

There are pattern matching optimizations I'd like to do that are much
more of a pain with that feature around. It's easy to AND multiple
regexes together into one match via -e, but when you have to deal with
negation and arbitrarily complex chained & parenthesized AND/OR you
end up having to run your custom state machine on every line with
multiple regex matches per line.

The system grep doesn't have this feature, and people seem to do
without it. The motivation for it isn't explained in commit 79d3696cfb
("git-grep: boolean expression on pattern matching.", 2006-06-30), but
I suspect it's a hack around not being able to do "git grep ... | git
grep -v ...", which is how you'd do "I'd like to match this, but not
that" with the system grep.

Just supporting that would be much easier than supporting the and/or
matching machinery.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 1/7] grep: don't redundantly compile throwaway patterns under threading
  2017-05-25 10:33       ` Ævar Arnfjörð Bjarmason
@ 2017-05-26  0:58         ` Junio C Hamano
  2017-05-26  8:06           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 77+ messages in thread
From: Junio C Hamano @ 2017-05-26  0:58 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Fredrik Kuivinen,
	Brandon Williams, Stefan Beller, Johannes Schindelin,
	Simon Ruderich

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I think it's a pointless distraction to start speculating in this
> commit message what we're going to do with --debug if it ever
> starts emitting some debugging information at pattern execution time.

OK.

> As an aside, I'd very much like to remove both --debug and
> --and/--or/--all-match. Given the very rough edges in the UI and how
> easy it is to make that feature error out or segfault, I suspect you
> might be the only one using it.

I agree that rewriting "grep -e A -e B" to "grep -e A|B" as an
optimization is an interesting possibility to look into, and I can
understand that having to support "--and" and "--not" would
make such an optimization harder to implement. "-e A --and -e B"
must become "-e A.*B|B.*A" and as you get more terms your unified
pattern will grow combinatorially, at which point you would be better
off matching N patterns and combining the result.
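
A rough sketch of the trade-off, using POSIX regex.h rather than PCRE
so that it stays self-contained. All sketch_* names are hypothetical,
and the equivalence between the per-pattern AND and the unified
pattern only holds when the individual matches do not overlap:

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the POSIX extended regex "pattern" matches "line", else 0. */
static int sketch_match(const char *pattern, const char *line)
{
	regex_t re;
	int hit;

	assert(pattern && line);
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return 0; /* treat an uncompilable pattern as "no match" */
	hit = !regexec(&re, line, 0, NULL, 0);
	regfree(&re);
	return hit;
}

/* "-e p[0] --and -e p[1] ...": every pattern must hit the same line. */
static int sketch_match_all(const char **patterns, size_t n, const char *line)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!sketch_match(patterns[i], line))
			return 0;
	return 1;
}

/* Alternatives a unified pattern needs for n terms: n! orderings. */
static unsigned long sketch_orderings(unsigned int n)
{
	unsigned long count = 1;

	while (n > 1)
		count *= n--;
	return count;
}
```

For two terms the unified form is "foo.*bar|bar.*foo"; a bare
"foo.*bar" would miss lines where "bar" comes first. At four terms the
unified pattern already needs 24 alternatives, while the per-pattern
loop still does at most four matches per line.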

Ever saw a user run "ps | grep rogue | grep -v grep" to find a rogue
process to kill?  That would not work if the rogue process's command
line has a word "grep".  Because "git grep" is often run on files in
order to find the location the patterns appear in, "git grep -e
pattern | grep -v unwanted" shares the same issue--the unwanted
pattern may appear in the filename, and the downstream "grep -v" may
filter out a valid hit.  This is why "--not" exists [*1*].  I agree
that emulating it within the same "concatenate patterns into one"
optimization you are envisioning may be hard.
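
The filename problem can be illustrated with a small sketch. It uses
fixed-string matching via strstr() instead of regexes to keep things
short, and the struct and function names are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* One hit as "git grep" would report it: a path plus the matching line. */
struct sketch_hit {
	const char *path;
	const char *line;
};

/* Emulates "| grep -v unwanted": the filter sees the path and the line. */
static int sketch_kept_by_pipe(const struct sketch_hit *h, const char *unwanted)
{
	assert(h && unwanted);
	return !strstr(h->path, unwanted) && !strstr(h->line, unwanted);
}

/* Emulates the in-process "--not": only the line content is examined. */
static int sketch_kept_by_not(const struct sketch_hit *h, const char *unwanted)
{
	assert(h && unwanted);
	return !strstr(h->line, unwanted);
}
```

A hit in a file named "t/unwanted-dir/test.sh" whose line does not
contain "unwanted" is kept by the second function but dropped by the
first, which is the false negative described above.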

Attempting to optimize "--all-match" would share similar difficulty
with "--and", but your matching now must be done with the entire
buffer and not go line-by-line.  It was meant to make it possible to
say "find commits by avarab@ that talk about both regex and log", i.e.

	$ git log --author=avarab@ --all-match --grep=log --grep=regex

This is not something you can emulate by piping an output of grep to
another grep.
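
A minimal sketch of the "--all-match" semantics as described here:
every pattern must hit somewhere in the buffer, not necessarily on the
same line. It uses POSIX regex.h, the names are hypothetical, and the
fixed limits (8 patterns, 255-byte lines) are simplifications for the
example only:

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_PATTERNS 8

/*
 * Returns 1 if every pattern matches at least one line of "buf",
 * 0 otherwise (including when a pattern fails to compile).
 */
static int sketch_all_match(const char **patterns, size_t n, const char *buf)
{
	regex_t re[SKETCH_MAX_PATTERNS];
	int seen[SKETCH_MAX_PATTERNS] = { 0 };
	size_t compiled = 0, i;
	int ok = 1;
	const char *line = buf;
	char tmp[256];

	assert(patterns && buf && n <= SKETCH_MAX_PATTERNS);
	for (; compiled < n; compiled++)
		if (regcomp(&re[compiled], patterns[compiled],
			    REG_EXTENDED | REG_NOSUB)) {
			ok = 0;
			break;
		}

	while (ok && *line) {
		size_t len = strcspn(line, "\n");
		size_t copy = len < sizeof(tmp) - 1 ? len : sizeof(tmp) - 1;

		memcpy(tmp, line, copy); /* longer lines are truncated */
		tmp[copy] = '\0';
		for (i = 0; i < n; i++)
			if (!seen[i] && !regexec(&re[i], tmp, 0, NULL, 0))
				seen[i] = 1;
		line += len;
		if (*line == '\n')
			line++;
	}
	for (i = 0; i < n; i++)
		if (!seen[i])
			ok = 0;
	for (i = 0; i < compiled; i++)
		regfree(&re[i]);
	return ok;
}
```

The per-pattern "seen" state has to survive across lines, which is
why a line-oriented pipeline of greps cannot reproduce the result.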

But none of the above means you have to give up optimizing.  

You can choose not to combine them into a single pattern if certain
constructions are hard, and do only the easy ones.  If you think
that harder combinations are not used very often, the result would
be faster for many cases while not losing useful features, which is
what we want.


[Footnote]

*1* For human consumption, lack of "--not" may not hurt in the sense
    that there are workarounds (i.e. you can do without "| grep -v
    unwanted" and filter irrelevant ones by eyeballing).  But it is
    essential while scripting and trying to be precise.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 1/7] grep: don't redundantly compile throwaway patterns under threading
  2017-05-26  0:58         ` Junio C Hamano
@ 2017-05-26  8:06           ` Ævar Arnfjörð Bjarmason
  2017-05-26  9:49             ` Junio C Hamano
  0 siblings, 1 reply; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-26  8:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Fredrik Kuivinen,
	Brandon Williams, Stefan Beller, Johannes Schindelin,
	Simon Ruderich

On Fri, May 26, 2017 at 2:58 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> I think it's a pointless distraction to start speculating in this
>> commit message what we're going to do with --debug if it ever
>> starts emitting some debugging information at pattern execution time.
>
> OK.
>
>> As an aside, I'd very much like to remove both --debug and
>> --and/--or/--all-match. Given the very rough edges in the UI and how
>> easy it is to make that feature error out or segfault, I suspect you
>> might be the only one using it.
>
> I agree that rewriting "grep -e A -e B" to "grep -e A|B" as an
> optimization is an interesting possibility to look into, and I can
> understand that having to support "--and" and "--not" would
> make such an optimization harder to implement. "-e A --and -e B"
> must become "-e A.*B|B.*A" and as you get more terms your unified
> pattern will grow combinatorially, at which point you would be better
> off matching N patterns and combining the result.
>
> Ever saw a user run "ps | grep rogue | grep -v grep" to find a rogue
> process to kill?  That would not work if the rogue process's command
> line has a word "grep".  Because "git grep" is often run on files in
> order to find the location the patterns appear in, "git grep -e
> pattern | grep -v unwanted" shares the same issue--the unwanted
> pattern may appear in the filename, and the downstream "grep -v" may
> filter out a valid hit.  This is why "--not" exists [*1*].  I agree
> that emulating it within the same "concatenate patterns into one"
> optimization you are envisioning may be hard.
>
> Attempting to optimize "--all-match" would share similar difficulty
> with "--and", but your matching now must be done with the entire
> buffer and not go line-by-line.  It was meant to make it possible to
> say "find commits by avarab@ that talk about both regex and log", i.e.
>
>         $ git log --author=avarab@ --all-match --grep=log --grep=regex
>
> This is not something you can emulate by piping an output of grep to
> another grep.
>
> But none of the above means you have to give up optimizing.
>
> You can choose not to combine them into a single pattern if certain
> constructions are hard, and do only the easy ones.  If you think
> that harder combinations are not used very often, the result would
> be faster for many cases while not losing useful features, which is
> what we want.

To be clear the point of my mail was not to say "I can't think of a
way to support both of these things, help!", obviously we can continue
to maintain two codepaths. The point was to raise the idea that we
could simply remove the more complex & doomed to forever be slow
codepath.

Obviously there are caveats with the likes of "grep foo | grep bar"
that don't exist with "grep -e foo --and -e bar". I'm less interested
in whether we can come up with cases that wouldn't be possible if this
were removed, than if anyone's using them in practice.

I suspect that to the extent anyone uses this for common things it
could be emulated by --single-line --perl-regexp and e.g. 'foo.*bar'
instead of 'foo' --and 'bar'. I.e. we could offer to AND together your
regexes and match them over the entire content.

If someone needed something more complex we could just show an example
of piping e.g. \0-delimited commit messages into an arbitrary perl
script you provide.

Anyway, I've only looked this over a tiny bit, and I don't know
whether it's worth it to remove this, right now I was just interested
in some reports of what it was used for. I.e. whether anyone uses it
for N-level deep mixed AND/OR branches, or whether it's really just a
lazy way to concat regexes and get around the current limitation of
not being able to match across lines.

> [Footnote]
>
> *1* For human consumption, lack of "--not" may not hurt in the sense
>     that there are workarounds (i.e. you can do without "| grep -v
>     unwanted" and filter irrelevant ones by eyeballing).  But it is
>     essential while scripting and trying to be precise.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 1/7] grep: don't redundantly compile throwaway patterns under threading
  2017-05-26  8:06           ` Ævar Arnfjörð Bjarmason
@ 2017-05-26  9:49             ` Junio C Hamano
  0 siblings, 0 replies; 77+ messages in thread
From: Junio C Hamano @ 2017-05-26  9:49 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Fredrik Kuivinen,
	Brandon Williams, Stefan Beller, Johannes Schindelin,
	Simon Ruderich

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> To be clear the point of my mail was not to say "I can't think of a
> way to support both of these things, help!", obviously we can continue
> to maintain two codepaths. The point was to raise the idea that we
> could simply remove the more complex & doomed to forever be slow
> codepath.

To be clear, the point of my response was that these features must
remain.  As long as they are more convenient than sifting by eye
through output produced by a less powerful pattern matching engine
(which forces the user to give a wider pattern than desired, to avoid
false negatives), having to match each pattern one by one, instead of
being able to use a combined and more efficient single pattern, is
still more efficient for the end user, which is the point of using a
computer.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH 0/5] grep: remove redundant code & reflags from API
  2017-05-21  6:58     ` Ævar Arnfjörð Bjarmason
  2017-05-22  0:17       ` Junio C Hamano
@ 2017-06-28 21:58       ` Ævar Arnfjörð Bjarmason
  2017-06-29 22:22         ` [PATCH v2 0/6] " Ævar Arnfjörð Bjarmason
                           ` (6 more replies)
  2017-06-28 21:58       ` [PATCH 1/5] grep: remove redundant double assignment to 0 Ævar Arnfjörð Bjarmason
                         ` (4 subsequent siblings)
  6 siblings, 7 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-28 21:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, J Smith, Joe Ratterman,
	Fredrik Kuivinen, Ævar Arnfjörð Bjarmason

Here's a follow-up to a small tangent of discussion in my ~30 patch
grep cleanup series.

There are no functional changes here, just getting rid of dead code,
and removing the POSIX `regflags` variable from the grep API used by
grep/log, which was the cause of the long-standing bug of "-i" not
working with PCRE when used via git-log.
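
To illustrate the direction described above, here is a minimal sketch,
not the code from the series, of deriving the POSIX flags from boolean
options at the point where regcomp() is called instead of carrying a
pre-computed flags word around. The struct and the sketch_* names are
hypothetical; only regcomp(), regexec(), regfree() and the REG_*
constants are real POSIX API.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical stand-in for the two boolean options named above. */
struct sketch_opt {
	int ignore_case;
	int extended_regexp_option;
};

/* Derive the regcomp() flags from the booleans at compile time. */
static int sketch_regflags(const struct sketch_opt *opt)
{
	int regflags = REG_NEWLINE;

	assert(opt != NULL);
	if (opt->ignore_case)
		regflags |= REG_ICASE;
	if (opt->extended_regexp_option)
		regflags |= REG_EXTENDED;
	return regflags;
}

/* Return 1 on match, 0 on no match, -1 if the pattern does not compile. */
static int sketch_match(const struct sketch_opt *opt, const char *pattern,
			const char *text)
{
	regex_t re;
	int ret;

	if (regcomp(&re, pattern, sketch_regflags(opt)))
		return -1;
	ret = !regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return ret;
}
```

With this shape a caller that only sets ignore_case cannot forget to
also set a matching flag bit, which is the class of mistake the
paragraph above attributes to the old API.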

Ævar Arnfjörð Bjarmason (5):
  grep: remove redundant double assignment to 0
  grep: remove redundant grep pattern type assignment
  grep: remove redundant "fixed" field re-assignment to 0
  grep: remove redundant and verbose re-assignments to 0
  grep: remove regflags from the public grep_opt API

 builtin/grep.c |  2 --
 grep.c         | 59 +++++++++++++++++++++++++++++++++-------------------------
 grep.h         |  1 -
 revision.c     |  2 --
 4 files changed, 34 insertions(+), 30 deletions(-)

-- 
2.13.1.611.g7e3b11ae1


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH 1/5] grep: remove redundant double assignment to 0
  2017-05-21  6:58     ` Ævar Arnfjörð Bjarmason
  2017-05-22  0:17       ` Junio C Hamano
  2017-06-28 21:58       ` [PATCH 0/5] grep: remove redundant code & regflags from API Ævar Arnfjörð Bjarmason
@ 2017-06-28 21:58       ` Ævar Arnfjörð Bjarmason
  2017-06-28 21:58       ` [PATCH 2/5] grep: remove redundant grep pattern type assignment Ævar Arnfjörð Bjarmason
                         ` (3 subsequent siblings)
  6 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-28 21:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, J Smith, Joe Ratterman,
	Fredrik Kuivinen, Ævar Arnfjörð Bjarmason

Stop assigning 0 to the extended_regexp_option field right after we've
zeroed out the entire struct with memset() just a few lines earlier.

Unlike some of the code being refactored in subsequent commits, this
was always completely redundant. See the original code introduced in
84befcd0a4 ("grep: add a grep.patternType configuration setting",
2012-08-03).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/grep.c b/grep.c
index 98733db623..29439886e7 100644
--- a/grep.c
+++ b/grep.c
@@ -38,7 +38,6 @@ void init_grep_defaults(void)
 	opt->regflags = REG_NEWLINE;
 	opt->max_depth = -1;
 	opt->pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED;
-	opt->extended_regexp_option = 0;
 	color_set(opt->color_context, "");
 	color_set(opt->color_filename, "");
 	color_set(opt->color_function, "");
-- 
2.13.1.611.g7e3b11ae1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 2/5] grep: remove redundant grep pattern type assignment
  2017-05-21  6:58     ` Ævar Arnfjörð Bjarmason
                         ` (2 preceding siblings ...)
  2017-06-28 21:58       ` [PATCH 1/5] grep: remove redundant double assignment to 0 Ævar Arnfjörð Bjarmason
@ 2017-06-28 21:58       ` Ævar Arnfjörð Bjarmason
  2017-06-29 17:03         ` Stefan Beller
  2017-06-28 21:58       ` [PATCH 3/5] grep: remove redundant "fixed" field re-assignment to 0 Ævar Arnfjörð Bjarmason
                         ` (2 subsequent siblings)
  6 siblings, 1 reply; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-28 21:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, J Smith, Joe Ratterman,
	Fredrik Kuivinen, Ævar Arnfjörð Bjarmason

Remove a redundant assignment to extended_regexp_option to make it
zero if grep.extendedRegexp is not set. This is always called right
after init_grep_defaults() which memsets the entire structure to 0.

This is a logical follow-up to my commit to remove redundant regflags
assignments[1]. This logic was originally introduced in [2], but as
explained in the former commit it's working around a pattern in our
code that no longer exists, and is now confusing as it leads the
reader to think that this needs to be flipped back & forth.

1. e0b9f8ae09 ("grep: remove redundant regflags assignments",
   2017-05-25)
2. b22520a37c ("grep: allow -E and -n to be turned on by default via
   configuration", 2011-03-30)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/grep.c b/grep.c
index 29439886e7..6614042fdc 100644
--- a/grep.c
+++ b/grep.c
@@ -80,8 +80,6 @@ int grep_config(const char *var, const char *value, void *cb)
 	if (!strcmp(var, "grep.extendedregexp")) {
 		if (git_config_bool(var, value))
 			opt->extended_regexp_option = 1;
-		else
-			opt->extended_regexp_option = 0;
 		return 0;
 	}
 
-- 
2.13.1.611.g7e3b11ae1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 3/5] grep: remove redundant "fixed" field re-assignment to 0
  2017-05-21  6:58     ` Ævar Arnfjörð Bjarmason
                         ` (3 preceding siblings ...)
  2017-06-28 21:58       ` [PATCH 2/5] grep: remove redundant grep pattern type assignment Ævar Arnfjörð Bjarmason
@ 2017-06-28 21:58       ` Ævar Arnfjörð Bjarmason
  2017-06-29 17:10         ` Stefan Beller
  2017-06-28 21:58       ` [PATCH 4/5] grep: remove redundant and verbose re-assignments " Ævar Arnfjörð Bjarmason
  2017-06-28 21:58       ` [PATCH 5/5] grep: remove regflags from the public grep_opt API Ævar Arnfjörð Bjarmason
  6 siblings, 1 reply; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-28 21:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, J Smith, Joe Ratterman,
	Fredrik Kuivinen, Ævar Arnfjörð Bjarmason

Remove the redundant re-assignment of the fixed field to zero right
after the entire struct has been set to zero via memset(...).

Unlike some nearby commits this pattern doesn't date back to the
pattern described in e0b9f8ae09 ("grep: remove redundant regflags
assignments", 2017-05-25), instead it was apparently cargo-culted in
9eceddeec6 ("Use kwset in grep", 2011-08-21).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/grep.c b/grep.c
index 6614042fdc..7cd8a6512f 100644
--- a/grep.c
+++ b/grep.c
@@ -627,8 +627,6 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 	    has_null(p->pattern, p->patternlen) ||
 	    is_fixed(p->pattern, p->patternlen))
 		p->fixed = !icase || ascii_only;
-	else
-		p->fixed = 0;
 
 	if (p->fixed) {
 		p->kws = kwsalloc(icase ? tolower_trans_tbl : NULL);
-- 
2.13.1.611.g7e3b11ae1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 4/5] grep: remove redundant and verbose re-assignments to 0
  2017-05-21  6:58     ` Ævar Arnfjörð Bjarmason
                         ` (4 preceding siblings ...)
  2017-06-28 21:58       ` [PATCH 3/5] grep: remove redundant "fixed" field re-assignment to 0 Ævar Arnfjörð Bjarmason
@ 2017-06-28 21:58       ` Ævar Arnfjörð Bjarmason
  2017-06-28 21:58       ` [PATCH 5/5] grep: remove regflags from the public grep_opt API Ævar Arnfjörð Bjarmason
  6 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-28 21:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, J Smith, Joe Ratterman,
	Fredrik Kuivinen, Ævar Arnfjörð Bjarmason

Remove the redundant re-assignments of the fixed/pcre1/pcre2 fields to
zero right after the entire struct has been set to zero via
memset(...).

See an earlier related cleanup commit e0b9f8ae09 ("grep: remove
redundant regflags assignments", 2017-05-25) for an explanation of why
the code was structured like this to begin with.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/grep.c b/grep.c
index 7cd8a6512f..736e1e00d6 100644
--- a/grep.c
+++ b/grep.c
@@ -175,28 +175,18 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
 		/* fall through */
 
 	case GREP_PATTERN_TYPE_BRE:
-		opt->fixed = 0;
-		opt->pcre1 = 0;
-		opt->pcre2 = 0;
 		break;
 
 	case GREP_PATTERN_TYPE_ERE:
-		opt->fixed = 0;
-		opt->pcre1 = 0;
-		opt->pcre2 = 0;
 		opt->regflags |= REG_EXTENDED;
 		break;
 
 	case GREP_PATTERN_TYPE_FIXED:
 		opt->fixed = 1;
-		opt->pcre1 = 0;
-		opt->pcre2 = 0;
 		break;
 
 	case GREP_PATTERN_TYPE_PCRE:
-		opt->fixed = 0;
 #ifdef USE_LIBPCRE2
-		opt->pcre1 = 0;
 		opt->pcre2 = 1;
 #else
 		/*
@@ -206,7 +196,6 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
 		 * "cannot use Perl-compatible regexes[...]".
 		 */
 		opt->pcre1 = 1;
-		opt->pcre2 = 0;
 #endif
 		break;
 	}
-- 
2.13.1.611.g7e3b11ae1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 5/5] grep: remove regflags from the public grep_opt API
  2017-05-21  6:58     ` Ævar Arnfjörð Bjarmason
                         ` (5 preceding siblings ...)
  2017-06-28 21:58       ` [PATCH 4/5] grep: remove redundant and verbose re-assignments " Ævar Arnfjörð Bjarmason
@ 2017-06-28 21:58       ` Ævar Arnfjörð Bjarmason
  2017-06-29 17:43         ` Stefan Beller
  6 siblings, 1 reply; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-28 21:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, J Smith, Joe Ratterman,
	Fredrik Kuivinen, Ævar Arnfjörð Bjarmason

Refactor calls to the grep machinery to always pass opt.ignore_case &
opt.extended_regexp_option instead of setting the equivalent regflags
bits.

The bug fixed when making -i work with -P in commit 9e3cbc59d5 ("log:
make --regexp-ignore-case work with --perl-regexp", 2017-05-20) was
really just plastering over the code smell which this change fixes.

See my "Re: [PATCH v3 05/30] log: make --regexp-ignore-case work with
--perl-regexp"[1] for the discussion leading up to this.

The reason for adding the extensive commentary here is that I
discovered some subtle complexity in implementing this that really
should be called out explicitly to future readers.

Before this change we'd rely on the difference between
`extended_regexp_option` and `regflags` to serve as a membrane between
our preliminary parsing of grep.extendedRegexp and grep.patternType,
and what we decided to do internally.

Now that those two are the same thing, it's necessary to unset
`extended_regexp_option` just before we commit in cases where both of
those config variables are set. See 84befcd0a4 ("grep: add a
grep.patternType configuration setting", 2012-08-03) for the code and
documentation related to that.

The explanation of why the if/else branches in
grep_commit_pattern_type() are ordered the way they are exists in that
commit message, but I think it's worth calling this subtlety out
explicitly with a comment for future readers.

Unrelated to that: I could have factored out the default REG_NEWLINE
flag into some custom GIT_GREP_H_DEFAULT_REGFLAGS or something, but
since it's just used in two places I didn't think it was worth the
effort.

As an aside we're really lacking test coverage for regflags being
initialized as 0 instead of as REG_NEWLINE. Tests will fail if it's
removed from compile_regexp(), but not if it's removed from
compile_fixed_regexp(). I have not dug to see if it's actually needed
in the latter case or if the test coverage is lacking.

1. <CACBZZX6Hp4Q4TOj_X1fbdCA4twoXF5JemZ5ZbEn7wmkA=1KO2g@mail.gmail.com>
   (https://public-inbox.org/git/CACBZZX6Hp4Q4TOj_X1fbdCA4twoXF5JemZ5ZbEn7wmkA=1KO2g@mail.gmail.com/)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/grep.c |  2 --
 grep.c         | 43 ++++++++++++++++++++++++++++++++++---------
 grep.h         |  1 -
 revision.c     |  2 --
 4 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index f61a9d938b..b682966439 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1169,8 +1169,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 
 	if (!opt.pattern_list)
 		die(_("no pattern given."));
-	if (!opt.fixed && opt.ignore_case)
-		opt.regflags |= REG_ICASE;
 
 	/*
 	 * We have to find "--" in a separate pass, because its presence
diff --git a/grep.c b/grep.c
index 736e1e00d6..51aaad9f03 100644
--- a/grep.c
+++ b/grep.c
@@ -35,7 +35,6 @@ void init_grep_defaults(void)
 	memset(opt, 0, sizeof(*opt));
 	opt->relative = 1;
 	opt->pathname = 1;
-	opt->regflags = REG_NEWLINE;
 	opt->max_depth = -1;
 	opt->pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED;
 	color_set(opt->color_context, "");
@@ -154,7 +153,6 @@ void grep_init(struct grep_opt *opt, const char *prefix)
 	opt->linenum = def->linenum;
 	opt->max_depth = def->max_depth;
 	opt->pathname = def->pathname;
-	opt->regflags = def->regflags;
 	opt->relative = def->relative;
 	opt->output = def->output;
 
@@ -170,6 +168,24 @@ void grep_init(struct grep_opt *opt, const char *prefix)
 
 static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, struct grep_opt *opt)
 {
+	/*
+	 * When committing to the pattern type by setting the relevant
+	 * fields in grep_opt it's generally not necessary to zero out
+	 * the fields we're not choosing, since they won't have been
+	 * set by anything. The extended_regexp_option field is the
+	 * only exception to this.
+	 *
+	 * This is because in the process of parsing grep.patternType
+	 * & grep.extendedRegexp we set opt->pattern_type_option and
+	 * opt->extended_regexp_option, respectively. We then
+	 * internally use opt->extended_regexp_option to see if we're
+	 * compiling an ERE. It must be unset if that's not actually
+	 * the case.
+	 */
+	if (pattern_type != GREP_PATTERN_TYPE_ERE &&
+	    opt->extended_regexp_option)
+		opt->extended_regexp_option = 0;
+
 	switch (pattern_type) {
 	case GREP_PATTERN_TYPE_UNSPECIFIED:
 		/* fall through */
@@ -178,7 +194,7 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
 		break;
 
 	case GREP_PATTERN_TYPE_ERE:
-		opt->regflags |= REG_EXTENDED;
+		opt->extended_regexp_option = 1;
 		break;
 
 	case GREP_PATTERN_TYPE_FIXED:
@@ -208,6 +224,11 @@ void grep_commit_pattern_type(enum grep_pattern_type pattern_type, struct grep_o
 	else if (opt->pattern_type_option != GREP_PATTERN_TYPE_UNSPECIFIED)
 		grep_set_pattern_type_option(opt->pattern_type_option, opt);
 	else if (opt->extended_regexp_option)
+		/*
+		 * This branch *must* happen after setting from the
+		 * opt->pattern_type_option above, we don't want
+		 * grep.extendedRegexp to override grep.patternType!
+		 */
 		grep_set_pattern_type_option(GREP_PATTERN_TYPE_ERE, opt);
 }
 
@@ -573,7 +594,7 @@ static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
 	struct strbuf sb = STRBUF_INIT;
 	int err;
-	int regflags = opt->regflags;
+	int regflags = REG_NEWLINE;
 
 	basic_regex_quote_buf(&sb, p->pattern);
 	if (opt->ignore_case)
@@ -592,12 +613,12 @@ static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
 
 static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
-	int icase, ascii_only;
+	int ascii_only;
 	int err;
+	int regflags = REG_NEWLINE;
 
 	p->word_regexp = opt->word_regexp;
 	p->ignore_case = opt->ignore_case;
-	icase	       = opt->regflags & REG_ICASE || p->ignore_case;
 	ascii_only     = !has_non_ascii(p->pattern);
 
 	/*
@@ -615,10 +636,10 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 	if (opt->fixed ||
 	    has_null(p->pattern, p->patternlen) ||
 	    is_fixed(p->pattern, p->patternlen))
-		p->fixed = !icase || ascii_only;
+		p->fixed = !p->ignore_case || ascii_only;
 
 	if (p->fixed) {
-		p->kws = kwsalloc(icase ? tolower_trans_tbl : NULL);
+		p->kws = kwsalloc(p->ignore_case ? tolower_trans_tbl : NULL);
 		kwsincr(p->kws, p->pattern, p->patternlen);
 		kwsprep(p->kws);
 		return;
@@ -642,7 +663,11 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 		return;
 	}
 
-	err = regcomp(&p->regexp, p->pattern, opt->regflags);
+	if (p->ignore_case)
+		regflags |= REG_ICASE;
+	if (opt->extended_regexp_option)
+		regflags |= REG_EXTENDED;
+	err = regcomp(&p->regexp, p->pattern, regflags);
 	if (err) {
 		char errbuf[1024];
 		regerror(err, &p->regexp, errbuf, 1024);
diff --git a/grep.h b/grep.h
index b8f93bfc2d..0c091e5104 100644
--- a/grep.h
+++ b/grep.h
@@ -162,7 +162,6 @@ struct grep_opt {
 	char color_match_selected[COLOR_MAXLEN];
 	char color_selected[COLOR_MAXLEN];
 	char color_sep[COLOR_MAXLEN];
-	int regflags;
 	unsigned pre_context;
 	unsigned post_context;
 	unsigned last_shown;
diff --git a/revision.c b/revision.c
index e181ad1b70..207103d211 100644
--- a/revision.c
+++ b/revision.c
@@ -1362,7 +1362,6 @@ void init_revisions(struct rev_info *revs, const char *prefix)
 	init_grep_defaults();
 	grep_init(&revs->grep_filter, prefix);
 	revs->grep_filter.status_only = 1;
-	revs->grep_filter.regflags = REG_NEWLINE;
 
 	diff_setup(&revs->diffopt);
 	if (prefix && !revs->diffopt.prefix) {
@@ -2022,7 +2021,6 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_ERE;
 	} else if (!strcmp(arg, "--regexp-ignore-case") || !strcmp(arg, "-i")) {
 		revs->grep_filter.ignore_case = 1;
-		revs->grep_filter.regflags |= REG_ICASE;
 		DIFF_OPT_SET(&revs->diffopt, PICKAXE_IGNORE_CASE);
 	} else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) {
 		revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_FIXED;
-- 
2.13.1.611.g7e3b11ae1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH 2/5] grep: remove redundant grep pattern type assignment
  2017-06-28 21:58       ` [PATCH 2/5] grep: remove redundant grep pattern type assignment Ævar Arnfjörð Bjarmason
@ 2017-06-29 17:03         ` Stefan Beller
  2017-06-29 17:50           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 77+ messages in thread
From: Stefan Beller @ 2017-06-29 17:03 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git@vger.kernel.org, Junio C Hamano, Jeff King, J Smith,
	Joe Ratterman, Fredrik Kuivinen

On Wed, Jun 28, 2017 at 2:58 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> Remove a redundant assignment to extended_regexp_option to make it
> zero if grep.extendedRegexp is not set. This is always called right
> after init_grep_defaults() which memsets the entire structure to 0.
>
> This is a logical follow-up to my commit to remove redundant regflags
> assignments[1]. This logic was originally introduced in [2], but as
> explained in the former commit it's working around a pattern in our
> code that no longer exists, and is now confusing as it leads the
> reader to think that this needs to be flipped back & forth.
>
> 1. e0b9f8ae09 ("grep: remove redundant regflags assignments",
>    2017-05-25)
> 2. b22520a37c ("grep: allow -E and -n to be turned on by default via
>    configuration", 2011-03-30)
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  grep.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/grep.c b/grep.c
> index 29439886e7..6614042fdc 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -80,8 +80,6 @@ int grep_config(const char *var, const char *value, void *cb)
>         if (!strcmp(var, "grep.extendedregexp")) {
>                 if (git_config_bool(var, value))
>                         opt->extended_regexp_option = 1;
> -               else
> -                       opt->extended_regexp_option = 0;
>                 return 0;

Instead of having a condition here, have you considered removing the
condition altogether?

    if (!strcmp(var, "grep.extendedregexp")) {
        opt->extended_regexp_option = git_config_bool(var, value);
        return 0;
    }

Unlike the patch, this also assigns the value when it is 0,
but it may be easier to reason about when reading the code.

This would also conform to the code below in that function, that parses
grep.linenumber or grep.fullname
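
A minimal sketch of how the two forms can differ, under the assumption
(not verified here) that the callback may see the same key more than
once, e.g. set to true in one place and false in a later one. The
sketch_bool() helper is hypothetical and far simpler than
git_config_bool(); it only exists to keep the example self-contained.

```c
#include <string.h>

/* Hypothetical helper: "true" is 1, anything else (or NULL) is 0. */
static int sketch_bool(const char *value)
{
	return value && !strcmp(value, "true");
}

/* Form left by the patch: only ever sets the field, never clears it. */
static void set_only(int *field, const char *value)
{
	if (sketch_bool(value))
		*field = 1;
}

/* Form suggested above: the field always reflects the last value seen. */
static void assign(int *field, const char *value)
{
	*field = sketch_bool(value);
}
```

If the key is only ever seen once on a zeroed struct the two behave the
same; they diverge only when a later "false" is meant to override an
earlier "true".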

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 3/5] grep: remove redundant "fixed" field re-assignment to 0
  2017-06-28 21:58       ` [PATCH 3/5] grep: remove redundant "fixed" field re-assignment to 0 Ævar Arnfjörð Bjarmason
@ 2017-06-29 17:10         ` Stefan Beller
  0 siblings, 0 replies; 77+ messages in thread
From: Stefan Beller @ 2017-06-29 17:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git@vger.kernel.org, Junio C Hamano, Jeff King, J Smith,
	Joe Ratterman, Fredrik Kuivinen

On Wed, Jun 28, 2017 at 2:58 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> Remove the redundant re-assignment of the fixed field to zero right
> after the entire struct has been set to zero via memset(...).
>
> Unlike some nearby commits this pattern doesn't date back to the
> pattern described in e0b9f8ae09 ("grep: remove redundant regflags
> assignments", 2017-05-25), instead it was apparently cargo-culted in
> 9eceddeec6 ("Use kwset in grep", 2011-08-21).
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  grep.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/grep.c b/grep.c
> index 6614042fdc..7cd8a6512f 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -627,8 +627,6 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
>             has_null(p->pattern, p->patternlen) ||
>             is_fixed(p->pattern, p->patternlen))
>                 p->fixed = !icase || ascii_only;
> -       else
> -               p->fixed = 0;
>

I was about to propose a similar action as in 2/5,
but getting the condition right is not as easy:

    p->fixed = (opt->fixed ||
           has_null(p->pattern, p->patternlen) ||
           is_fixed(p->pattern, p->patternlen)) &&
           (!icase || ascii_only);

does not look as convincing here.
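
For what it's worth, a small sketch that the single expression and the
if-form agree for every input, assuming the field really does start
out as 0 as the commit message says. The function names and the plain
int parameters are hypothetical stand-ins for the struct fields and
helper calls in the hunk.

```c
/* Form left by the patch; relies on the field starting out as 0. */
static int fixed_if_form(int opt_fixed, int has_nul, int is_literal,
			 int icase, int ascii_only)
{
	int fixed = 0; /* stands in for the zeroed struct field */

	if (opt_fixed || has_nul || is_literal)
		fixed = !icase || ascii_only;
	return fixed;
}

/* Single-expression form from this mail. */
static int fixed_expr_form(int opt_fixed, int has_nul, int is_literal,
			   int icase, int ascii_only)
{
	return (opt_fixed || has_nul || is_literal) &&
	       (!icase || ascii_only);
}
```

So the choice between the two is purely about readability, not
behavior.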

Thanks for mentioning 9eceddeec6, as in that commit
it would have been easy to just propose

    p->fixed = opt->fixed || is_fixed(p->pattern, p->patternlen);

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 5/5] grep: remove regflags from the public grep_opt API
  2017-06-28 21:58       ` [PATCH 5/5] grep: remove regflags from the public grep_opt API Ævar Arnfjörð Bjarmason
@ 2017-06-29 17:43         ` Stefan Beller
  2017-06-29 18:16           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 77+ messages in thread
From: Stefan Beller @ 2017-06-29 17:43 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git@vger.kernel.org, Junio C Hamano, Jeff King, J Smith,
	Joe Ratterman, Fredrik Kuivinen

On Wed, Jun 28, 2017 at 2:58 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> Refactor calls to the grep machinery to always pass opt.ignore_case &
> opt.extended_regexp_option instead of setting the equivalent regflags
> bits.
>
> The bug fixed when making -i work with -P in commit 9e3cbc59d5 ("log:
> make --regexp-ignore-case work with --perl-regexp", 2017-05-20) was
> really just plastering over the code smell which this change fixes.
>
> See my "Re: [PATCH v3 05/30] log: make --regexp-ignore-case work with
> --perl-regexp"[1] for the discussion leading up to this.
>
> The reason for adding the extensive commentary here is that I
> discovered some subtle complexity in implementing this that really
> should be called out explicitly to future readers.
>
> Before this change we'd rely on the difference between
> `extended_regexp_option` and `regflags` to serve as a membrane between
> our preliminary parsing of grep.extendedRegexp and grep.patternType,
> and what we decided to do internally.
>
> Now that those two are the same thing, it's necessary to unset
> `extended_regexp_option` just before we commit in cases where both of
> those config variables are set. See 84befcd0a4 ("grep: add a
> grep.patternType configuration setting", 2012-08-03) for the code and
> documentation related to that.
>
> The explanation of why the if/else branches in
> grep_commit_pattern_type() are ordered the way they are exists in that
> commit message, but I think it's worth calling this subtlety out
> explicitly with a comment for future readers.

Up to here the commit message is inspiring confidence.

>
> Unrelated to that: I could have factored out the default REG_NEWLINE
> flag into some custom GIT_GREP_H_DEFAULT_REGFLAGS or something, but
> since it's just used in two places I didn't think it was worth the
> effort.
>
> As an aside we're really lacking test coverage for regflags being
> initialized as 0 instead of as REG_NEWLINE. Tests will fail if it's
> removed from compile_regexp(), but not if it's removed from
> compile_fixed_regexp(). I have not dug to see if it's actually needed
> in the latter case or if the test coverage is lacking.

This sounds as if extra careful review is needed.
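
As background for that review, a minimal sketch of what REG_NEWLINE
changes in plain POSIX regcomp(): with the flag, "." does not match a
newline and "^"/"$" also anchor at line boundaries inside the buffer.
The sketch_regmatch() helper is hypothetical. Whether the fixed-string
path is ever handed a buffer containing a newline is the open question
from the commit message and is not answered by this example.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 on match, 0 on no match, -1 if the pattern does not compile. */
static int sketch_regmatch(const char *pattern, const char *text,
			   int regflags)
{
	regex_t re;
	int ret;

	if (regcomp(&re, pattern, regflags))
		return -1;
	ret = !regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return ret;
}
```

For the text "a\nb", the pattern "a.b" is expected to match only
without REG_NEWLINE, and "^b" only with it.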


>
> 1. <CACBZZX6Hp4Q4TOj_X1fbdCA4twoXF5JemZ5ZbEn7wmkA=1KO2g@mail.gmail.com>
>    (https://public-inbox.org/git/CACBZZX6Hp4Q4TOj_X1fbdCA4twoXF5JemZ5ZbEn7wmkA=1KO2g@mail.gmail.com/)
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/grep.c |  2 --
>  grep.c         | 43 ++++++++++++++++++++++++++++++++++---------
>  grep.h         |  1 -
>  revision.c     |  2 --
>  4 files changed, 34 insertions(+), 14 deletions(-)
>
> diff --git a/builtin/grep.c b/builtin/grep.c
> index f61a9d938b..b682966439 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -1169,8 +1169,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>
>         if (!opt.pattern_list)
>                 die(_("no pattern given."));
> -       if (!opt.fixed && opt.ignore_case)
> -               opt.regflags |= REG_ICASE;
>
>         /*
>          * We have to find "--" in a separate pass, because its presence
> diff --git a/grep.c b/grep.c
> index 736e1e00d6..51aaad9f03 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -35,7 +35,6 @@ void init_grep_defaults(void)
>         memset(opt, 0, sizeof(*opt));
>         opt->relative = 1;
>         opt->pathname = 1;
> -       opt->regflags = REG_NEWLINE;
>         opt->max_depth = -1;
>         opt->pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED;
>         color_set(opt->color_context, "");
> @@ -154,7 +153,6 @@ void grep_init(struct grep_opt *opt, const char *prefix)
>         opt->linenum = def->linenum;
>         opt->max_depth = def->max_depth;
>         opt->pathname = def->pathname;
> -       opt->regflags = def->regflags;
>         opt->relative = def->relative;
>         opt->output = def->output;
>
> @@ -170,6 +168,24 @@ void grep_init(struct grep_opt *opt, const char *prefix)
>
>  static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, struct grep_opt *opt)
>  {
> +       /*
> +        * When committing to the pattern type by setting the relevant
> +        * fields in grep_opt it's generally not necessary to zero out
> +        * the fields we're not choosing, since they won't have been
> +        * set by anything. The extended_regexp_option field is the
> +        * only exception to this.
> +        *
> +        * This is because in the process of parsing grep.patternType
> +        * & grep.extendedRegexp we set opt->pattern_type_option and
> +        * opt->extended_regexp_option, respectively. We then
> +        * internally use opt->extended_regexp_option to see if we're
> +        * compiling an ERE. It must be unset if that's not actually
> +        * the case.
> +        */
> +       if (pattern_type != GREP_PATTERN_TYPE_ERE &&
> +           opt->extended_regexp_option)
> +               opt->extended_regexp_option = 0;
> +
>         switch (pattern_type) {
>         case GREP_PATTERN_TYPE_UNSPECIFIED:
>                 /* fall through */
> @@ -178,7 +194,7 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
>                 break;
>
>         case GREP_PATTERN_TYPE_ERE:
> -               opt->regflags |= REG_EXTENDED;
> +               opt->extended_regexp_option = 1;
>                 break;
>
>         case GREP_PATTERN_TYPE_FIXED:
> @@ -208,6 +224,11 @@ void grep_commit_pattern_type(enum grep_pattern_type pattern_type, struct grep_o
>         else if (opt->pattern_type_option != GREP_PATTERN_TYPE_UNSPECIFIED)
>                 grep_set_pattern_type_option(opt->pattern_type_option, opt);
>         else if (opt->extended_regexp_option)
> +               /*
> +                * This branch *must* happen after setting from the
> +                * opt->pattern_type_option above,

I do not quite understand this. Are you saying

  opt->pattern_type_option takes precedence over
  opt->extended_regexp_option if the former is not _UNSPECIFIED ?

As grep_set_pattern_type_option is only called from here,
I wondered if we can put the long comment (and the code)
here in this function grep_commit_pattern_type to have it less
subtle? I have no proposal how though.
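
A minimal sketch of the precedence as I read it, with hypothetical
names. Only the second and third branches are visible in the hunk
above; the first branch (an explicitly requested type winning over
both config-derived fields) is an assumption, as is mapping
"unspecified" to BRE based on the fall-through in the switch.

```c
enum sketch_type {
	SK_UNSPECIFIED = 0,
	SK_BRE,
	SK_ERE,
	SK_FIXED,
	SK_PCRE
};

/* Pick the pattern type to commit to, highest precedence first. */
static enum sketch_type sketch_commit(enum sketch_type requested,
				      enum sketch_type pattern_type_option,
				      int extended_regexp_option)
{
	if (requested != SK_UNSPECIFIED)
		return requested;		/* assumed first branch */
	if (pattern_type_option != SK_UNSPECIFIED)
		return pattern_type_option;	/* grep.patternType */
	if (extended_regexp_option)
		return SK_ERE;			/* grep.extendedRegexp */
	return SK_BRE;				/* default */
}
```

Written as one function returning a value, the ordering constraint is
visible in the control flow itself rather than in a comment.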

I think I grokked this patch and it makes sense, though the commit
message strongly hints at asking for tests. ;)

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2/5] grep: remove redundant grep pattern type assignment
  2017-06-29 17:03         ` Stefan Beller
@ 2017-06-29 17:50           ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-29 17:50 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Junio C Hamano, Jeff King, J Smith,
	Joe Ratterman, Fredrik Kuivinen


On Thu, Jun 29 2017, Stefan Beller jotted:

> On Wed, Jun 28, 2017 at 2:58 PM, Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> Remove a redundant assignment to extended_regexp_option to make it
>> zero if grep.extendedRegexp is not set. This is always called right
>> after init_grep_defaults() which memsets the entire structure to 0.
>>
>> This is a logical follow-up to my commit to remove redundant regflags
>> assignments[1]. This logic was originally introduced in [2], but as
>> explained in the former commit it's working around a pattern in our
>> code that no longer exists, and is now confusing as it leads the
>> reader to think that this needs to be flipped back & forth.
>>
>> 1. e0b9f8ae09 ("grep: remove redundant regflags assignments",
>>    2017-05-25)
>> 2. b22520a37c ("grep: allow -E and -n to be turned on by default via
>>    configuration", 2011-03-30)
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  grep.c | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/grep.c b/grep.c
>> index 29439886e7..6614042fdc 100644
>> --- a/grep.c
>> +++ b/grep.c
>> @@ -80,8 +80,6 @@ int grep_config(const char *var, const char *value, void *cb)
>>         if (!strcmp(var, "grep.extendedregexp")) {
>>                 if (git_config_bool(var, value))
>>                         opt->extended_regexp_option = 1;
>> -               else
>> -                       opt->extended_regexp_option = 0;
>>                 return 0;
>
> Instead of having a condition here, have you considered removing the
> condition altogether?
>
>     if (!strcmp(var, "grep.extendedregexp")) {
>         opt->extended_regexp_option = git_config_bool(var, value);
>         return 0;
>     }
>
> Unlike the patch, this also assigns the value when it is 0,
> but it may be easier to reason about when reading the code.
>
> This would also conform to the code below in that function, that parses
> grep.linenumber or grep.fullname

I didn't think about that. Good point. I'll do that instead in v2.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 5/5] grep: remove regflags from the public grep_opt API
  2017-06-29 17:43         ` Stefan Beller
@ 2017-06-29 18:16           ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-29 18:16 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Junio C Hamano, Jeff King, J Smith,
	Joe Ratterman, Fredrik Kuivinen


On Thu, Jun 29 2017, Stefan Beller jotted:

> On Wed, Jun 28, 2017 at 2:58 PM, Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> Refactor calls to the grep machinery to always pass opt.ignore_case &
>> opt.extended_regexp_option instead of setting the equivalent regflags
>> bits.
>>
>> The bug fixed when making -i work with -P in commit 9e3cbc59d5 ("log:
>> make --regexp-ignore-case work with --perl-regexp", 2017-05-20) was
>> really just plastering over the code smell which this change fixes.
>>
>> See my "Re: [PATCH v3 05/30] log: make --regexp-ignore-case work with
>> --perl-regexp"[1] for the discussion leading up to this.
>>
>> The reason for adding the extensive commentary here is that I
>> discovered some subtle complexity in implementing this that really
>> should be called out explicitly to future readers.
>>
>> Before this change we'd rely on the difference between
>> `extended_regexp_option` and `regflags` to serve as a membrane between
>> our preliminary parsing of grep.extendedRegexp and grep.patternType,
>> and what we decided to do internally.
>>
>> Now that those two are the same thing, it's necessary to unset
>> `extended_regexp_option` just before we commit in cases where both of
>> those config variables are set. See 84befcd0a4 ("grep: add a
>> grep.patternType configuration setting", 2012-08-03) for the code and
>> documentation related to that.
>>
>> The explanation of why the if/else branches in
>> grep_commit_pattern_type() are ordered the way they are exists in that
>> commit message, but I think it's worth calling this subtlety out
>> explicitly with a comment for future readers.
>
> Up to here the commit message is inspiring confidence.

Thanks.

>>
>> Unrelated to that: I could have factored out the default REG_NEWLINE
>> flag into some custom GIT_GREP_H_DEFAULT_REGFLAGS or something, but
>> since it's just used in two places I didn't think it was worth the
>> effort.
>>
>> As an aside we're really lacking test coverage for regflags being
>> initialized as 0 instead of as REG_NEWLINE. Tests will fail if it's
>> removed from compile_regexp(), but not if it's removed from
>> compile_fixed_regexp(). I have not dug to see if it's actually needed
>> in the latter case or if the test coverage is lacking.
>
> This sounds as if extra careful review is needed.

Note though (since I didn't say this explicitly) nothing about this
commit changes the semantics of what we pass to regcomp. I'm just noting
this caveat with REG_NEWLINE as an aside since I'm moving it around.

>>
>> 1. <CACBZZX6Hp4Q4TOj_X1fbdCA4twoXF5JemZ5ZbEn7wmkA=1KO2g@mail.gmail.com>
>>    (https://public-inbox.org/git/CACBZZX6Hp4Q4TOj_X1fbdCA4twoXF5JemZ5ZbEn7wmkA=1KO2g@mail.gmail.com/)
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  builtin/grep.c |  2 --
>>  grep.c         | 43 ++++++++++++++++++++++++++++++++++---------
>>  grep.h         |  1 -
>>  revision.c     |  2 --
>>  4 files changed, 34 insertions(+), 14 deletions(-)
>>
>> diff --git a/builtin/grep.c b/builtin/grep.c
>> index f61a9d938b..b682966439 100644
>> --- a/builtin/grep.c
>> +++ b/builtin/grep.c
>> @@ -1169,8 +1169,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>>
>>         if (!opt.pattern_list)
>>                 die(_("no pattern given."));
>> -       if (!opt.fixed && opt.ignore_case)
>> -               opt.regflags |= REG_ICASE;
>>
>>         /*
>>          * We have to find "--" in a separate pass, because its presence
>> diff --git a/grep.c b/grep.c
>> index 736e1e00d6..51aaad9f03 100644
>> --- a/grep.c
>> +++ b/grep.c
>> @@ -35,7 +35,6 @@ void init_grep_defaults(void)
>>         memset(opt, 0, sizeof(*opt));
>>         opt->relative = 1;
>>         opt->pathname = 1;
>> -       opt->regflags = REG_NEWLINE;
>>         opt->max_depth = -1;
>>         opt->pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED;
>>         color_set(opt->color_context, "");
>> @@ -154,7 +153,6 @@ void grep_init(struct grep_opt *opt, const char *prefix)
>>         opt->linenum = def->linenum;
>>         opt->max_depth = def->max_depth;
>>         opt->pathname = def->pathname;
>> -       opt->regflags = def->regflags;
>>         opt->relative = def->relative;
>>         opt->output = def->output;
>>
>> @@ -170,6 +168,24 @@ void grep_init(struct grep_opt *opt, const char *prefix)
>>
>>  static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, struct grep_opt *opt)
>>  {
>> +       /*
>> +        * When committing to the pattern type by setting the relevant
>> +        * fields in grep_opt it's generally not necessary to zero out
>> +        * the fields we're not choosing, since they won't have been
>> +        * set by anything. The extended_regexp_option field is the
>> +        * only exception to this.
>> +        *
>> +        * This is because in the process of parsing grep.patternType
>> +        * & grep.extendedRegexp we set opt->pattern_type_option and
>> +        * opt->extended_regexp_option, respectively. We then
>> +        * internally use opt->extended_regexp_option to see if we're
>> +        * compiling an ERE. It must be unset if that's not actually
>> +        * the case.
>> +        */
>> +       if (pattern_type != GREP_PATTERN_TYPE_ERE &&
>> +           opt->extended_regexp_option)
>> +               opt->extended_regexp_option = 0;
>> +
>>         switch (pattern_type) {
>>         case GREP_PATTERN_TYPE_UNSPECIFIED:
>>                 /* fall through */
>> @@ -178,7 +194,7 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
>>                 break;
>>
>>         case GREP_PATTERN_TYPE_ERE:
>> -               opt->regflags |= REG_EXTENDED;
>> +               opt->extended_regexp_option = 1;
>>                 break;
>>
>>         case GREP_PATTERN_TYPE_FIXED:
>> @@ -208,6 +224,11 @@ void grep_commit_pattern_type(enum grep_pattern_type pattern_type, struct grep_o
>>         else if (opt->pattern_type_option != GREP_PATTERN_TYPE_UNSPECIFIED)
>>                 grep_set_pattern_type_option(opt->pattern_type_option, opt);
>>         else if (opt->extended_regexp_option)
>> +               /*
>> +                * This branch *must* happen after setting from the
>> +                * opt->pattern_type_option above,
>
> I do not quite understand this. Are you saying
>
>   opt->pattern_type_option takes precedence over
>   opt->extended_regexp_option if the former is not _UNSPECIFIED ?

I mean this "else if" code *must* be in that order, i.e.:

	else if (opt->pattern_type_option != GREP_PATTERN_TYPE_UNSPECIFIED)
		grep_set_pattern_type_option(opt->pattern_type_option, opt);
	else if (opt->extended_regexp_option)
		grep_set_pattern_type_option(GREP_PATTERN_TYPE_ERE, opt);

Not:

	else if (opt->extended_regexp_option)
		grep_set_pattern_type_option(GREP_PATTERN_TYPE_ERE, opt);
	else if (opt->pattern_type_option != GREP_PATTERN_TYPE_UNSPECIFIED)
		grep_set_pattern_type_option(opt->pattern_type_option, opt);

Since we only want to pay attention to grep.extendedRegexp if
grep.patternType is not set. If grep.patternType is set then the
pattern_type_option will not be GREP_PATTERN_TYPE_UNSPECIFIED (but
e.g. GREP_PATTERN_TYPE_BRE).

> As grep_set_pattern_type_option is only called from here,
> I wondered if we can put the long comment (and the code)
> here in this function grep_commit_pattern_type to have it less
> subtle? I have no proposal how though.

Ah you mean the whole "When committing to the pattern type by" comment +
code. Yeah I think that makes sense. I'll try that for v2 and see if
it's better.

> I think I grokked this patch and it makes sense, though the commit
> message strongly hints at asking for tests. ;)

*Points up at "moving it around" comment above*


* [PATCH v2 0/6] grep: remove redundant code & reflags from API
  2017-06-28 21:58       ` [PATCH 0/5] grep: remove redundant code & reflags from API Ævar Arnfjörð Bjarmason
@ 2017-06-29 22:22         ` Ævar Arnfjörð Bjarmason
  2017-06-29 22:22         ` [PATCH v2 1/6] grep: remove redundant double assignment to 0 Ævar Arnfjörð Bjarmason
                           ` (5 subsequent siblings)
  6 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-29 22:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, J Smith, Joe Ratterman,
	Fredrik Kuivinen, Ævar Arnfjörð Bjarmason

Addresses comments from Stefan Beller (thanks!). I looked into it and
the REG_NEWLINE flag was redundant in 1/2 cases, see 6/6 for the
removal of that.

I looked into refactoring 5/6 as noted in 87zicqirrg.fsf@gmail.com,
but for the reasons now explained in the last paragraph of 5/6 decided
not to and to keep it as it was.

Ævar Arnfjörð Bjarmason (6):
  grep: remove redundant double assignment to 0
  grep: adjust a redundant grep pattern type assignment
  grep: remove redundant "fixed" field re-assignment to 0
  grep: remove redundant and verbose re-assignments to 0
  grep: remove regflags from the public grep_opt API
  grep: remove redundant REG_NEWLINE when compiling fixed regex

 builtin/grep.c |  2 --
 grep.c         | 62 +++++++++++++++++++++++++++++++++-------------------------
 grep.h         |  1 -
 revision.c     |  2 --
 4 files changed, 35 insertions(+), 32 deletions(-)

-- 
2.13.1.611.g7e3b11ae1



* [PATCH v2 1/6] grep: remove redundant double assignment to 0
  2017-06-28 21:58       ` [PATCH 0/5] grep: remove redundant code & reflags from API Ævar Arnfjörð Bjarmason
  2017-06-29 22:22         ` [PATCH v2 0/6] " Ævar Arnfjörð Bjarmason
@ 2017-06-29 22:22         ` Ævar Arnfjörð Bjarmason
  2017-06-29 22:22         ` [PATCH v2 2/6] grep: adjust a redundant grep pattern type assignment Ævar Arnfjörð Bjarmason
                           ` (4 subsequent siblings)
  6 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-29 22:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, J Smith, Joe Ratterman,
	Fredrik Kuivinen, Ævar Arnfjörð Bjarmason

Stop assigning 0 to the extended_regexp_option field right after we've
zeroed out the entire struct with memset() just a few lines earlier.

Unlike some of the code being refactored in subsequent commits, this
was always completely redundant. See the original code introduced in
84befcd0a4 ("grep: add a grep.patternType configuration setting",
2012-08-03).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/grep.c b/grep.c
index 98733db623..29439886e7 100644
--- a/grep.c
+++ b/grep.c
@@ -38,7 +38,6 @@ void init_grep_defaults(void)
 	opt->regflags = REG_NEWLINE;
 	opt->max_depth = -1;
 	opt->pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED;
-	opt->extended_regexp_option = 0;
 	color_set(opt->color_context, "");
 	color_set(opt->color_filename, "");
 	color_set(opt->color_function, "");
-- 
2.13.1.611.g7e3b11ae1



* [PATCH v2 2/6] grep: adjust a redundant grep pattern type assignment
  2017-06-28 21:58       ` [PATCH 0/5] grep: remove redundant code & reflags from API Ævar Arnfjörð Bjarmason
  2017-06-29 22:22         ` [PATCH v2 0/6] " Ævar Arnfjörð Bjarmason
  2017-06-29 22:22         ` [PATCH v2 1/6] grep: remove redundant double assignment to 0 Ævar Arnfjörð Bjarmason
@ 2017-06-29 22:22         ` Ævar Arnfjörð Bjarmason
  2017-06-29 22:22         ` [PATCH v2 3/6] grep: remove redundant "fixed" field re-assignment to 0 Ævar Arnfjörð Bjarmason
                           ` (3 subsequent siblings)
  6 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-29 22:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, J Smith, Joe Ratterman,
	Fredrik Kuivinen, Ævar Arnfjörð Bjarmason

Adjust a now-redundant assignment to extended_regexp_option to make it
zero if grep.extendedRegexp is not set. This is always called right
after init_grep_defaults() which memsets the entire structure to 0, so
there's no need to set it again to zero.

However, the if/else pattern is a holdover from [1], where
this was adjusted from a bitfield assignment to a boolean. Rather than
getting rid of the assignment to 0 in all cases, let's just use the
value returned by git_config_bool(), which is more idiomatic and in
sync with the rest of the boolean handling in this function.

This is a logical follow-up to my commit to remove redundant regflags
assignments[2]. This logic was originally introduced in [3], but as
explained in the former commit it's working around a pattern in our
code that no longer exists, and is now confusing as it leads the
reader to think that this needs to be flipped back & forth.

1. 84befcd0a4 ("grep: add a grep.patternType configuration setting",
   2012-08-03)
2. e0b9f8ae09 ("grep: remove redundant regflags assignments",
   2017-05-25)
3. b22520a37c ("grep: allow -E and -n to be turned on by default via
   configuration", 2011-03-30)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/grep.c b/grep.c
index 29439886e7..817270d081 100644
--- a/grep.c
+++ b/grep.c
@@ -78,10 +78,7 @@ int grep_config(const char *var, const char *value, void *cb)
 		return -1;
 
 	if (!strcmp(var, "grep.extendedregexp")) {
-		if (git_config_bool(var, value))
-			opt->extended_regexp_option = 1;
-		else
-			opt->extended_regexp_option = 0;
+		opt->extended_regexp_option = git_config_bool(var, value);
 		return 0;
 	}
 
-- 
2.13.1.611.g7e3b11ae1



* [PATCH v2 3/6] grep: remove redundant "fixed" field re-assignment to 0
  2017-06-28 21:58       ` [PATCH 0/5] grep: remove redundant code & reflags from API Ævar Arnfjörð Bjarmason
                           ` (2 preceding siblings ...)
  2017-06-29 22:22         ` [PATCH v2 2/6] grep: adjust a redundant grep pattern type assignment Ævar Arnfjörð Bjarmason
@ 2017-06-29 22:22         ` Ævar Arnfjörð Bjarmason
  2017-06-29 22:22         ` [PATCH v2 4/6] grep: remove redundant and verbose re-assignments " Ævar Arnfjörð Bjarmason
                           ` (2 subsequent siblings)
  6 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-29 22:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, J Smith, Joe Ratterman,
	Fredrik Kuivinen, Ævar Arnfjörð Bjarmason

Remove the redundant re-assignment of the fixed field to zero right
after the entire struct has been set to zero via memset(...).

Unlike some nearby commits this pattern doesn't date back to the
pattern described in e0b9f8ae09 ("grep: remove redundant regflags
assignments", 2017-05-25), instead it was apparently cargo-culted in
9eceddeec6 ("Use kwset in grep", 2011-08-21).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/grep.c b/grep.c
index 817270d081..86dc9b696f 100644
--- a/grep.c
+++ b/grep.c
@@ -626,8 +626,6 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 	    has_null(p->pattern, p->patternlen) ||
 	    is_fixed(p->pattern, p->patternlen))
 		p->fixed = !icase || ascii_only;
-	else
-		p->fixed = 0;
 
 	if (p->fixed) {
 		p->kws = kwsalloc(icase ? tolower_trans_tbl : NULL);
-- 
2.13.1.611.g7e3b11ae1



* [PATCH v2 4/6] grep: remove redundant and verbose re-assignments to 0
  2017-06-28 21:58       ` [PATCH 0/5] grep: remove redundant code & reflags from API Ævar Arnfjörð Bjarmason
                           ` (3 preceding siblings ...)
  2017-06-29 22:22         ` [PATCH v2 3/6] grep: remove redundant "fixed" field re-assignment to 0 Ævar Arnfjörð Bjarmason
@ 2017-06-29 22:22         ` Ævar Arnfjörð Bjarmason
  2017-06-29 22:22         ` [PATCH v2 5/6] grep: remove regflags from the public grep_opt API Ævar Arnfjörð Bjarmason
  2017-06-29 22:22         ` [PATCH v2 6/6] grep: remove redundant REG_NEWLINE when compiling fixed regex Ævar Arnfjörð Bjarmason
  6 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-29 22:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, J Smith, Joe Ratterman,
	Fredrik Kuivinen, Ævar Arnfjörð Bjarmason

Remove the redundant re-assignments of the fixed/pcre1/pcre2 fields to
zero right after the entire struct has been set to zero via
memset(...).

See an earlier related cleanup commit e0b9f8ae09 ("grep: remove
redundant regflags assignments", 2017-05-25) for an explanation of why
the code was structured like this to begin with.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/grep.c b/grep.c
index 86dc9b696f..7fcdaa0753 100644
--- a/grep.c
+++ b/grep.c
@@ -174,28 +174,18 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
 		/* fall through */
 
 	case GREP_PATTERN_TYPE_BRE:
-		opt->fixed = 0;
-		opt->pcre1 = 0;
-		opt->pcre2 = 0;
 		break;
 
 	case GREP_PATTERN_TYPE_ERE:
-		opt->fixed = 0;
-		opt->pcre1 = 0;
-		opt->pcre2 = 0;
 		opt->regflags |= REG_EXTENDED;
 		break;
 
 	case GREP_PATTERN_TYPE_FIXED:
 		opt->fixed = 1;
-		opt->pcre1 = 0;
-		opt->pcre2 = 0;
 		break;
 
 	case GREP_PATTERN_TYPE_PCRE:
-		opt->fixed = 0;
 #ifdef USE_LIBPCRE2
-		opt->pcre1 = 0;
 		opt->pcre2 = 1;
 #else
 		/*
@@ -205,7 +195,6 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
 		 * "cannot use Perl-compatible regexes[...]".
 		 */
 		opt->pcre1 = 1;
-		opt->pcre2 = 0;
 #endif
 		break;
 	}
-- 
2.13.1.611.g7e3b11ae1



* [PATCH v2 5/6] grep: remove regflags from the public grep_opt API
  2017-06-28 21:58       ` [PATCH 0/5] grep: remove redundant code & reflags from API Ævar Arnfjörð Bjarmason
                           ` (4 preceding siblings ...)
  2017-06-29 22:22         ` [PATCH v2 4/6] grep: remove redundant and verbose re-assignments " Ævar Arnfjörð Bjarmason
@ 2017-06-29 22:22         ` Ævar Arnfjörð Bjarmason
  2017-06-30 17:20           ` Junio C Hamano
  2017-06-29 22:22         ` [PATCH v2 6/6] grep: remove redundant REG_NEWLINE when compiling fixed regex Ævar Arnfjörð Bjarmason
  6 siblings, 1 reply; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-29 22:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, J Smith, Joe Ratterman,
	Fredrik Kuivinen, Ævar Arnfjörð Bjarmason

Refactor calls to the grep machinery to always pass opt.ignore_case &
opt.extended_regexp_option instead of setting the equivalent regflags
bits.

The bug fixed when making -i work with -P in commit 9e3cbc59d5 ("log:
make --regexp-ignore-case work with --perl-regexp", 2017-05-20) was
really just plastering over the code smell which this change fixes.

The reason for adding the extensive commentary here is that I
discovered some subtle complexity in implementing this that really
should be called out explicitly to future readers.

Before this change we'd rely on the difference between
`extended_regexp_option` and `regflags` to serve as a membrane between
our preliminary parsing of grep.extendedRegexp and grep.patternType,
and what we decided to do internally.

Now that those two are the same thing, it's necessary to unset
`extended_regexp_option` just before we commit in cases where both of
those config variables are set. See 84befcd0a4 ("grep: add a
grep.patternType configuration setting", 2012-08-03) for the code and
documentation related to that.

The explanation of why the if/else branches in
grep_commit_pattern_type() are ordered the way they are exists in that
commit message, but I think it's worth calling this subtlety out
explicitly with a comment for future readers.

Even though grep_commit_pattern_type() is the only caller of
grep_set_pattern_type_option(), it's simpler to reset the
extended_regexp_option flag in the latter, since 2/3 branches in the
former would otherwise need to reset it. This way we can do it in one
place.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/grep.c |  2 --
 grep.c         | 43 ++++++++++++++++++++++++++++++++++---------
 grep.h         |  1 -
 revision.c     |  2 --
 4 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index f61a9d938b..b682966439 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1169,8 +1169,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 
 	if (!opt.pattern_list)
 		die(_("no pattern given."));
-	if (!opt.fixed && opt.ignore_case)
-		opt.regflags |= REG_ICASE;
 
 	/*
 	 * We have to find "--" in a separate pass, because its presence
diff --git a/grep.c b/grep.c
index 7fcdaa0753..11a86548d6 100644
--- a/grep.c
+++ b/grep.c
@@ -35,7 +35,6 @@ void init_grep_defaults(void)
 	memset(opt, 0, sizeof(*opt));
 	opt->relative = 1;
 	opt->pathname = 1;
-	opt->regflags = REG_NEWLINE;
 	opt->max_depth = -1;
 	opt->pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED;
 	color_set(opt->color_context, "");
@@ -153,7 +152,6 @@ void grep_init(struct grep_opt *opt, const char *prefix)
 	opt->linenum = def->linenum;
 	opt->max_depth = def->max_depth;
 	opt->pathname = def->pathname;
-	opt->regflags = def->regflags;
 	opt->relative = def->relative;
 	opt->output = def->output;
 
@@ -169,6 +167,24 @@ void grep_init(struct grep_opt *opt, const char *prefix)
 
 static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, struct grep_opt *opt)
 {
+	/*
+	 * When committing to the pattern type by setting the relevant
+	 * fields in grep_opt it's generally not necessary to zero out
+	 * the fields we're not choosing, since they won't have been
+	 * set by anything. The extended_regexp_option field is the
+	 * only exception to this.
+	 *
+	 * This is because in the process of parsing grep.patternType
+	 * & grep.extendedRegexp we set opt->pattern_type_option and
+	 * opt->extended_regexp_option, respectively. We then
+	 * internally use opt->extended_regexp_option to see if we're
+	 * compiling an ERE. It must be unset if that's not actually
+	 * the case.
+	 */
+	if (pattern_type != GREP_PATTERN_TYPE_ERE &&
+	    opt->extended_regexp_option)
+		opt->extended_regexp_option = 0;
+
 	switch (pattern_type) {
 	case GREP_PATTERN_TYPE_UNSPECIFIED:
 		/* fall through */
@@ -177,7 +193,7 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
 		break;
 
 	case GREP_PATTERN_TYPE_ERE:
-		opt->regflags |= REG_EXTENDED;
+		opt->extended_regexp_option = 1;
 		break;
 
 	case GREP_PATTERN_TYPE_FIXED:
@@ -207,6 +223,11 @@ void grep_commit_pattern_type(enum grep_pattern_type pattern_type, struct grep_o
 	else if (opt->pattern_type_option != GREP_PATTERN_TYPE_UNSPECIFIED)
 		grep_set_pattern_type_option(opt->pattern_type_option, opt);
 	else if (opt->extended_regexp_option)
+		/*
+		 * This branch *must* happen after setting from the
+		 * opt->pattern_type_option above, we don't want
+		 * grep.extendedRegexp to override grep.patternType!
+		 */
 		grep_set_pattern_type_option(GREP_PATTERN_TYPE_ERE, opt);
 }
 
@@ -572,7 +593,7 @@ static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
 	struct strbuf sb = STRBUF_INIT;
 	int err;
-	int regflags = opt->regflags;
+	int regflags = REG_NEWLINE;
 
 	basic_regex_quote_buf(&sb, p->pattern);
 	if (opt->ignore_case)
@@ -591,12 +612,12 @@ static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
 
 static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
-	int icase, ascii_only;
+	int ascii_only;
 	int err;
+	int regflags = REG_NEWLINE;
 
 	p->word_regexp = opt->word_regexp;
 	p->ignore_case = opt->ignore_case;
-	icase	       = opt->regflags & REG_ICASE || p->ignore_case;
 	ascii_only     = !has_non_ascii(p->pattern);
 
 	/*
@@ -614,10 +635,10 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 	if (opt->fixed ||
 	    has_null(p->pattern, p->patternlen) ||
 	    is_fixed(p->pattern, p->patternlen))
-		p->fixed = !icase || ascii_only;
+		p->fixed = !p->ignore_case || ascii_only;
 
 	if (p->fixed) {
-		p->kws = kwsalloc(icase ? tolower_trans_tbl : NULL);
+		p->kws = kwsalloc(p->ignore_case ? tolower_trans_tbl : NULL);
 		kwsincr(p->kws, p->pattern, p->patternlen);
 		kwsprep(p->kws);
 		return;
@@ -641,7 +662,11 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 		return;
 	}
 
-	err = regcomp(&p->regexp, p->pattern, opt->regflags);
+	if (p->ignore_case)
+		regflags |= REG_ICASE;
+	if (opt->extended_regexp_option)
+		regflags |= REG_EXTENDED;
+	err = regcomp(&p->regexp, p->pattern, regflags);
 	if (err) {
 		char errbuf[1024];
 		regerror(err, &p->regexp, errbuf, 1024);
diff --git a/grep.h b/grep.h
index b8f93bfc2d..0c091e5104 100644
--- a/grep.h
+++ b/grep.h
@@ -162,7 +162,6 @@ struct grep_opt {
 	char color_match_selected[COLOR_MAXLEN];
 	char color_selected[COLOR_MAXLEN];
 	char color_sep[COLOR_MAXLEN];
-	int regflags;
 	unsigned pre_context;
 	unsigned post_context;
 	unsigned last_shown;
diff --git a/revision.c b/revision.c
index e181ad1b70..207103d211 100644
--- a/revision.c
+++ b/revision.c
@@ -1362,7 +1362,6 @@ void init_revisions(struct rev_info *revs, const char *prefix)
 	init_grep_defaults();
 	grep_init(&revs->grep_filter, prefix);
 	revs->grep_filter.status_only = 1;
-	revs->grep_filter.regflags = REG_NEWLINE;
 
 	diff_setup(&revs->diffopt);
 	if (prefix && !revs->diffopt.prefix) {
@@ -2022,7 +2021,6 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_ERE;
 	} else if (!strcmp(arg, "--regexp-ignore-case") || !strcmp(arg, "-i")) {
 		revs->grep_filter.ignore_case = 1;
-		revs->grep_filter.regflags |= REG_ICASE;
 		DIFF_OPT_SET(&revs->diffopt, PICKAXE_IGNORE_CASE);
 	} else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) {
 		revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_FIXED;
-- 
2.13.1.611.g7e3b11ae1



* [PATCH v2 6/6] grep: remove redundant REG_NEWLINE when compiling fixed regex
  2017-06-28 21:58       ` [PATCH 0/5] grep: remove redundant code & reflags from API Ævar Arnfjörð Bjarmason
                           ` (5 preceding siblings ...)
  2017-06-29 22:22         ` [PATCH v2 5/6] grep: remove regflags from the public grep_opt API Ævar Arnfjörð Bjarmason
@ 2017-06-29 22:22         ` Ævar Arnfjörð Bjarmason
  6 siblings, 0 replies; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-29 22:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, J Smith, Joe Ratterman,
	Fredrik Kuivinen, Ævar Arnfjörð Bjarmason

Remove the redundant REG_NEWLINE regcomp() flag from the code that
compiles a fixed-string regular-expression.

The REG_NEWLINE flag stops metacharacters such as "." from matching a
newline. Since the basic_regex_quote_buf() function being called here
escapes all metacharacters, using REG_NEWLINE is confusing and redundant.

The use of this flag was introduced as an unintended emergent property
of 793dc676e0 ("grep/icase: avoid kwsset when -F is specified",
2016-06-25).

That change amended the existing regflags, which were initialized to
REG_NEWLINE in init_grep_defaults() assuming a subsequent non-fixed
regcomp().

Manual testing reveals that this was always redundant, since no flags
of any use were inherited from opt->regflags even back
then. 793dc676e0 passes all tests with this on top:

    diff --git a/grep.c b/grep.c
    index 627ae3e3e8..89e84ed7fd 100644
    --- a/grep.c
    +++ b/grep.c
    @@ -407,3 +407,3 @@ static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
            basic_regex_quote_buf(&sb, p->pattern);
    -       regflags = opt->regflags & ~REG_EXTENDED;
    +       regflags = 0;
            if (opt->ignore_case)

Since this isn't used for anything and never was, remove it to reduce
confusion when reading this code.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 11a86548d6..2efec0e182 100644
--- a/grep.c
+++ b/grep.c
@@ -593,7 +593,7 @@ static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
 	struct strbuf sb = STRBUF_INIT;
 	int err;
-	int regflags = REG_NEWLINE;
+	int regflags = 0;
 
 	basic_regex_quote_buf(&sb, p->pattern);
 	if (opt->ignore_case)
-- 
2.13.1.611.g7e3b11ae1



* Re: [PATCH v2 5/6] grep: remove regflags from the public grep_opt API
  2017-06-29 22:22         ` [PATCH v2 5/6] grep: remove regflags from the public grep_opt API Ævar Arnfjörð Bjarmason
@ 2017-06-30 17:20           ` Junio C Hamano
  0 siblings, 0 replies; 77+ messages in thread
From: Junio C Hamano @ 2017-06-30 17:20 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, J Smith, Joe Ratterman, Fredrik Kuivinen

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> @@ -169,6 +167,24 @@ void grep_init(struct grep_opt *opt, const char *prefix)
>  
>  static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, struct grep_opt *opt)
>  {
> +	/*
> +	 * When committing to the pattern type by setting the relevant
> +	 * fields in grep_opt it's generally not necessary to zero out
> +	 * the fields we're not choosing, since they won't have been
> +	 * set by anything. The extended_regexp_option field is the
> +	 * only exception to this.
> +	 *
> +	 * This is because in the process of parsing grep.patternType
> +	 * & grep.extendedRegexp we set opt->pattern_type_option and
> +	 * opt->extended_regexp_option, respectively. We then
> +	 * internally use opt->extended_regexp_option to see if we're
> +	 * compiling an ERE. It must be unset if that's not actually
> +	 * the case.
> +	 */
> +	if (pattern_type != GREP_PATTERN_TYPE_ERE &&
> +	    opt->extended_regexp_option)
> +		opt->extended_regexp_option = 0;

Good to have the reasoning in an in-code comment like the above.
But after reading these two paragraphs and then before reading the
three line code, a more natural embodiment in the code of the
commentary that came to my mind was

	if (pattern_type != GREP_PATTERN_TYPE_ERE)
		opt->extended_regexp_option = 0;

The end-result is the same as yours, of course, but I somehow found
it matched the reasoning better.

Now, I wonder if this can further be tweaked to

	opt->extended_regexp_option = (pattern_type == GREP_PATTERN_TYPE_ERE);

which might lead us in a direction to really unify the two related
fields extended_regexp_option and pattern_type_option.

Even if that were a good longer term direction to go in, it is
outside the scope of this step, of course.  I am merely bringing it
up as a conversation item ;-).



end of thread, other threads:[~2017-06-30 17:20 UTC | newest]

Thread overview: 77+ messages
2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 01/30] Makefile & configure: reword inaccurate comment about PCRE Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 02/30] grep & rev-list doc: stop promising libpcre for --perl-regexp Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 03/30] test-lib: rename the LIBPCRE prerequisite to PCRE Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 04/30] log: add exhaustive tests for pattern style options & config Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 05/30] log: make --regexp-ignore-case work with --perl-regexp Ævar Arnfjörð Bjarmason
2017-05-20 23:50   ` Junio C Hamano
2017-05-21  6:58     ` Ævar Arnfjörð Bjarmason
2017-05-22  0:17       ` Junio C Hamano
2017-06-28 21:58       ` [PATCH 0/5] grep: remove redundant code & reflags from API Ævar Arnfjörð Bjarmason
2017-06-29 22:22         ` [PATCH v2 0/6] " Ævar Arnfjörð Bjarmason
2017-06-29 22:22         ` [PATCH v2 1/6] grep: remove redundant double assignment to 0 Ævar Arnfjörð Bjarmason
2017-06-29 22:22         ` [PATCH v2 2/6] grep: adjust a redundant grep pattern type assignment Ævar Arnfjörð Bjarmason
2017-06-29 22:22         ` [PATCH v2 3/6] grep: remove redundant "fixed" field re-assignment to 0 Ævar Arnfjörð Bjarmason
2017-06-29 22:22         ` [PATCH v2 4/6] grep: remove redundant and verbose re-assignments " Ævar Arnfjörð Bjarmason
2017-06-29 22:22         ` [PATCH v2 5/6] grep: remove regflags from the public grep_opt API Ævar Arnfjörð Bjarmason
2017-06-30 17:20           ` Junio C Hamano
2017-06-29 22:22         ` [PATCH v2 6/6] grep: remove redundant REG_NEWLINE when compiling fixed regex Ævar Arnfjörð Bjarmason
2017-06-28 21:58       ` [PATCH 1/5] grep: remove redundant double assignment to 0 Ævar Arnfjörð Bjarmason
2017-06-28 21:58       ` [PATCH 2/5] grep: remove redundant grep pattern type assignment Ævar Arnfjörð Bjarmason
2017-06-29 17:03         ` Stefan Beller
2017-06-29 17:50           ` Ævar Arnfjörð Bjarmason
2017-06-28 21:58       ` [PATCH 3/5] grep: remove redundant "fixed" field re-assignment to 0 Ævar Arnfjörð Bjarmason
2017-06-29 17:10         ` Stefan Beller
2017-06-28 21:58       ` [PATCH 4/5] grep: remove redundant and verbose re-assignments " Ævar Arnfjörð Bjarmason
2017-06-28 21:58       ` [PATCH 5/5] grep: remove regflags from the public grep_opt API Ævar Arnfjörð Bjarmason
2017-06-29 17:43         ` Stefan Beller
2017-06-29 18:16           ` Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 06/30] grep: add a test asserting that --perl-regexp dies when !PCRE Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 07/30] grep: add a test for backreferences in PCRE patterns Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 08/30] grep: change non-ASCII -i test to stop using --debug Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 09/30] grep: add tests for --threads=N and grep.threads Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 10/30] grep: amend submodule recursion test for regex engine testing Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 11/30] grep: add tests for grep pattern types being passed to submodules Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 12/30] grep: add a test helper function for less verbose -f \0 tests Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 13/30] grep: prepare for testing binary regexes containing rx metacharacters Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 14/30] grep: add tests to fix blind spots with \0 patterns Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 15/30] perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do Ævar Arnfjörð Bjarmason
2017-05-20 23:50   ` Junio C Hamano
2017-05-21  6:23     ` Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 16/30] perf: emit progress output when unpacking & building Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 17/30] perf: add a comparison test of grep regex engines Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 18/30] perf: add a comparison test of grep regex engines with -F Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 19/30] perf: add a comparison test of log --grep regex engines Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 20/30] grep: catch a missing enum in switch statement Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 21/30] grep: remove redundant regflags assignments Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 22/30] grep: factor test for \0 in grep patterns into a function Ævar Arnfjörð Bjarmason
2017-05-23 21:17   ` Brandon Williams
2017-05-20 21:42 ` [PATCH v3 23/30] grep: change the internal PCRE macro names to be PCRE1 Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 24/30] grep: change internal *pcre* variable & function names to be *pcre1* Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 25/30] grep: move is_fixed() earlier to avoid forward declaration Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 26/30] test-lib: add a PTHREADS prerequisite Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 27/30] pack-objects & index-pack: add test for --threads warning Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 28/30] pack-objects: fix buggy warning about threads Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 29/30] grep: given --threads with NO_PTHREADS=YesPlease, warn Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 30/30] grep: assert that threading is enabled when calling grep_{lock,unlock} Ævar Arnfjörð Bjarmason
2017-05-20 23:50 ` [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Junio C Hamano
2017-05-23 19:24   ` [PATCH v2 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes Ævar Arnfjörð Bjarmason
2017-05-23 19:24   ` [PATCH v2 1/7] grep: don't redundantly compile throwaway patterns under threading Ævar Arnfjörð Bjarmason
2017-05-24  4:42     ` Junio C Hamano
2017-05-25 10:33       ` Ævar Arnfjörð Bjarmason
2017-05-26  0:58         ` Junio C Hamano
2017-05-26  8:06           ` Ævar Arnfjörð Bjarmason
2017-05-26  9:49             ` Junio C Hamano
2017-05-23 19:24   ` [PATCH v2 2/7] grep: skip pthreads overhead when using one thread Ævar Arnfjörð Bjarmason
2017-05-24  4:45     ` Junio C Hamano
2017-05-23 19:24   ` [PATCH v2 3/7] log: add -P as a synonym for --perl-regexp Ævar Arnfjörð Bjarmason
2017-05-23 19:24   ` [PATCH v2 4/7] grep: add support for the PCRE v1 JIT API Ævar Arnfjörð Bjarmason
2017-05-24  5:17     ` Junio C Hamano
2017-05-24  7:37       ` Ævar Arnfjörð Bjarmason
2017-05-23 19:24   ` [PATCH v2 5/7] grep: un-break building with PCRE < 8.32 Ævar Arnfjörð Bjarmason
2017-05-24  6:00     ` Junio C Hamano
2017-05-24  6:38       ` Junio C Hamano
2017-05-23 19:24   ` [PATCH v2 6/7] grep: un-break building with PCRE < 8.20 Ævar Arnfjörð Bjarmason
2017-05-23 19:24   ` [PATCH v2 7/7] grep: add support for PCRE v2 Ævar Arnfjörð Bjarmason
2017-05-24  6:23     ` Junio C Hamano
2017-05-25  9:49       ` Ævar Arnfjörð Bjarmason
