git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes
@ 2017-05-11 17:01 Ævar Arnfjörð Bjarmason
  2017-05-11 17:01 ` [PATCH 1/7] grep: don't redundantly compile throwaway patterns under threading Ævar Arnfjörð Bjarmason
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-11 17:01 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

This goes on top of the 29 patch series of "Easy to review grep &
pre-PCRE changes" (<20170511091829.5634-1-avarab@gmail.com>;
https://public-inbox.org/git/20170511091829.5634-1-avarab@gmail.com/).

This could be split into 3 unrelated things, but I have think it's
probably easier for everyone to bundle these up, since they all go on
top of the other series. Comments below:

Ævar Arnfjörð Bjarmason (7):
  grep: don't redundantly compile throwaway patterns under threading
  grep: skip pthreads overhead when using one thread

Internal changes to grep to not redundantly spawn threads. No
functional changes, just internal cleanup.

  log: add -P as a synonym for --perl-regexp

Trivial change to add -P.

  grep: add support for the PCRE v1 JIT API
  grep: un-break building with PCRE < 8.32
  grep: un-break building with PCRE < 8.20

I tested ancient versions of PCRE, which turned up build issues that
are fixed this time around.

  grep: add support for PCRE v2

The main point of this whole thing.

 Documentation/rev-list-options.txt |   1 +
 Makefile                           |  30 +++++--
 builtin/grep.c                     |  16 +++-
 configure.ac                       |  77 +++++++++++++---
 grep.c                             | 180 ++++++++++++++++++++++++++++++++++++-
 grep.h                             |  31 +++++++
 revision.c                         |   2 +-
 t/t4202-log.sh                     |  12 +++
 t/test-lib.sh                      |   2 +-
 9 files changed, 327 insertions(+), 24 deletions(-)

-- 
2.11.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/7] grep: don't redundantly compile throwaway patterns under threading
  2017-05-11 17:01 [PATCH 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes Ævar Arnfjörð Bjarmason
@ 2017-05-11 17:01 ` Ævar Arnfjörð Bjarmason
  2017-05-12 17:35   ` Brandon Williams
  2017-05-11 17:01 ` [PATCH 2/7] grep: skip pthreads overhead when using one thread Ævar Arnfjörð Bjarmason
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 10+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-11 17:01 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Change the pattern compilation logic under threading so that grep
doesn't compile a pattern it never ends up using on the non-threaded
code path, only to compile it again N times for N threads which will
each use their own copy, ignoring the initially compiled pattern.

This redundant compilation dates back to the initial introduction of
the threaded grep in commit 5b594f457a ("Threaded grep",
2010-01-25).

There was never any reason for doing this redundant work other than an
oversight in the initial commit. Jeff King suggested on-list in
<20170414212325.fefrl3qdjigwyitd@sigill.intra.peff.net> that this
might be needed to check the pattern for sanity before threaded
execution commences.

That's not the case. The pattern is compiled under threading in
start_threads() before any concurrent execution has started by calling
pthread_create(), so if the pattern contains an error we still do the
right thing. I.e. die with one error before any threaded execution has
commenced, instead of e.g. spewing out an error for each N threads,
which could be a regression a change like this might inadvertently
introduce.

The undocumented --debug mode added in commit 17bf35a3c7 ("grep: teach
--debug option to dump the parse tree", 2012-09-13) still works
properly with this change. It only emits debugging info during pattern
compilation, which is now dumped by the pattern compiled just before
the first thread is started.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/grep.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 50e4bd2cd2..7baa4778b7 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -224,7 +224,8 @@ static void start_threads(struct grep_opt *opt)
 		int err;
 		struct grep_opt *o = grep_opt_dup(opt);
 		o->output = strbuf_out;
-		o->debug = 0;
+		if (i)
+			o->debug = 0;
 		compile_grep_patterns(o);
 		err = pthread_create(&threads[i], NULL, run, o);
 
@@ -1169,8 +1170,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	if (!opt.fixed && opt.ignore_case)
 		opt.regflags |= REG_ICASE;
 
-	compile_grep_patterns(&opt);
-
 	/*
 	 * We have to find "--" in a separate pass, because its presence
 	 * influences how we will parse arguments that come before it.
@@ -1247,6 +1246,15 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	num_threads = 0;
 #endif
 
+	if (!num_threads)
+		/*
+		 * The compiled patterns on the main path are only
+		 * used when not using threading. Otherwise
+		 * start_threads() below calls compile_grep_patterns()
+		 * for each thread.
+		 */
+		compile_grep_patterns(&opt);
+
 #ifndef NO_PTHREADS
 	if (num_threads) {
 		if (!(opt.name_only || opt.unmatch_name_only || opt.count)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/7] grep: skip pthreads overhead when using one thread
  2017-05-11 17:01 [PATCH 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes Ævar Arnfjörð Bjarmason
  2017-05-11 17:01 ` [PATCH 1/7] grep: don't redundantly compile throwaway patterns under threading Ævar Arnfjörð Bjarmason
@ 2017-05-11 17:01 ` Ævar Arnfjörð Bjarmason
  2017-05-11 17:01 ` [PATCH 3/7] log: add -P as a synonym for --perl-regexp Ævar Arnfjörð Bjarmason
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-11 17:01 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Skip the administrative overhead of using pthreads when only using one
thread. Instead take the non-threaded path which would be taken under
NO_PTHREADS.

The threading support was initially added in commit
5b594f457a ("Threaded grep", 2010-01-25) with a hardcoded compile-time
number of 8 threads. Later the number of threads was made configurable
in commit 89f09dd34e ("grep: add --threads=<num> option and
grep.threads configuration", 2015-12-15).

That change did not add any special handling for --threads=1. Now we
take a slightly faster path by skipping thread handling entirely when
1 thread is requested.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/grep.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/grep.c b/builtin/grep.c
index 7baa4778b7..9c0d1ecc12 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1240,6 +1240,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		num_threads = GREP_NUM_THREADS_DEFAULT;
 	else if (num_threads < 0)
 		die(_("invalid number of threads specified (%d)"), num_threads);
+	if (num_threads == 1)
+		num_threads = 0;
 #else
 	if (num_threads)
 		warning(_("no threads support, ignoring --threads"));
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/7] log: add -P as a synonym for --perl-regexp
  2017-05-11 17:01 [PATCH 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes Ævar Arnfjörð Bjarmason
  2017-05-11 17:01 ` [PATCH 1/7] grep: don't redundantly compile throwaway patterns under threading Ævar Arnfjörð Bjarmason
  2017-05-11 17:01 ` [PATCH 2/7] grep: skip pthreads overhead when using one thread Ævar Arnfjörð Bjarmason
@ 2017-05-11 17:01 ` Ævar Arnfjörð Bjarmason
  2017-05-11 17:01 ` [PATCH 4/7] grep: add support for the PCRE v1 JIT API Ævar Arnfjörð Bjarmason
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-11 17:01 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Add a short -P option as a synonym for the longer --perl-regexp, for
consistency with the options the corresponding grep invocations
accept.

This was intentionally omitted in commit 727b6fc3ed ("log --grep:
accept --basic-regexp and --perl-regexp", 2012-10-03) for unspecified
future use.

Make it consistent with "grep" rather than to keep it open for future
use, and to avoid the confusion of -P meaning different things for
grep & log, as is the case with the -G option.

As noted in the aforementioned commit the --basic-regexp option can't
have a corresponding -G argument, as the log command already uses that
for -G<regex>.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/rev-list-options.txt |  1 +
 revision.c                         |  2 +-
 t/t4202-log.sh                     | 12 ++++++++++++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index a46f70c2b1..9c44eae55d 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -91,6 +91,7 @@ endif::git-rev-list[]
 	Consider the limiting patterns to be fixed strings (don't interpret
 	pattern as a regular expression).
 
+-P::
 --perl-regexp::
 	Consider the limiting patterns to be Perl-compatible regular
 	expressions.
diff --git a/revision.c b/revision.c
index 7ff61ff5f7..03a3a012de 100644
--- a/revision.c
+++ b/revision.c
@@ -1995,7 +1995,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		DIFF_OPT_SET(&revs->diffopt, PICKAXE_IGNORE_CASE);
 	} else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) {
 		revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_FIXED;
-	} else if (!strcmp(arg, "--perl-regexp")) {
+	} else if (!strcmp(arg, "--perl-regexp") || !strcmp(arg, "-P")) {
 		revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_PCRE;
 	} else if (!strcmp(arg, "--all-match")) {
 		revs->grep_filter.all_match = 1;
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index b0122a1991..a87b7f6089 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -331,8 +331,20 @@ test_expect_success 'log with various grep.patternType configurations & command-
 			--grep="(1|2)" >actual.fixed.short-arg &&
 		git log --pretty=tformat:%s -E \
 			--grep="\|2" >actual.extended.short-arg &&
+		if test_have_prereq PCRE
+		then
+			git log --pretty=tformat:%s -P \
+				--grep="[\d]\|" >actual.perl.short-arg
+		else
+			test_must_fail git log -P \
+				--grep="[\d]\|"
+		fi &&
 		test_cmp expect.fixed actual.fixed.short-arg &&
 		test_cmp expect.extended actual.extended.short-arg &&
+		if test_have_prereq PCRE
+		then
+			test_cmp expect.perl actual.perl.short-arg
+		fi &&
 
 		git log --pretty=tformat:%s --fixed-strings \
 			--grep="(1|2)" >actual.fixed.long-arg &&
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/7] grep: add support for the PCRE v1 JIT API
  2017-05-11 17:01 [PATCH 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes Ævar Arnfjörð Bjarmason
                   ` (2 preceding siblings ...)
  2017-05-11 17:01 ` [PATCH 3/7] log: add -P as a synonym for --perl-regexp Ævar Arnfjörð Bjarmason
@ 2017-05-11 17:01 ` Ævar Arnfjörð Bjarmason
  2017-05-11 17:01 ` [PATCH 5/7] grep: un-break building with PCRE < 8.32 Ævar Arnfjörð Bjarmason
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-11 17:01 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Change the grep PCRE v1 code to use JIT when available. When PCRE
support was initially added in commit 63e7e9d8b6 ("git-grep: Learn
PCRE", 2011-05-09) PCRE had no JIT support, it was integrated into
8.20 released on 2011-10-21.

Enabling JIT support usually improves performance by more than
40%. The pattern compilation times are relatively slower, but those
relative numbers are tiny, and are easily made back in all but the
most trivial cases of grep. Detailed benchmarks & overview of
compilation times is at: http://sljit.sourceforge.net/pcre.html

With this change the difference in a t/perf/p7820-grep-engines.sh run
is, with just the /perl/ tests shown:

    $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~ HEAD p7820-grep-engines.sh
    Test                                           HEAD~             HEAD
    ---------------------------------------------------------------------------------------
    7820.3: perl grep how.to                       0.31(1.11+0.44)   0.20(0.35+0.57) -35.5%
    7820.7: perl grep ^how to                      0.57(2.66+0.38)   0.23(0.65+0.46) -59.6%
    7820.11: perl grep [how] to                    0.55(2.54+0.43)   0.29(0.86+0.52) -47.3%
    7820.15: perl grep (e.t[^ ]*|v.ry) rare        1.05(5.54+0.33)   0.30(1.10+0.44) -71.4%
    7820.19: perl grep m(ú|u)ult.b(æ|y)te          0.37(1.53+0.43)   0.24(0.70+0.47) -35.1%

The conditional support for JIT is implemented as suggested in the
pcrejit(3) man page. E.g. defining PCRE_STUDY_JIT_COMPILE to 0 if it's
not present.

The implementation is relatively verbose because even if
PCRE_CONFIG_JIT is defined only a call to pcre_config() can determine
if the JIT is available, and if so the faster pcre_jit_exec() function
should be called instead of pcre_exec(), and a different (but not
complimentary!) function needs to be called to free pcre1_extra_info.

There's no graceful fallback if pcre_jit_stack_alloc() fails under
PCRE_CONFIG_JIT, instead the program will simply abort. I don't think
this is worth handling gracefully, it'll only fail in cases where
malloc() doesn't work, in which case we're screwed anyway.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 37 ++++++++++++++++++++++++++++++++++++-
 grep.h |  6 ++++++
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 5c808f8971..593e72f92a 100644
--- a/grep.c
+++ b/grep.c
@@ -350,6 +350,9 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 	const char *error;
 	int erroffset;
 	int options = PCRE_MULTILINE;
+#ifdef PCRE_CONFIG_JIT
+	int canjit;
+#endif
 
 	if (opt->ignore_case) {
 		if (has_non_ascii(p->pattern))
@@ -364,9 +367,20 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 	if (!p->pcre1_regexp)
 		compile_regexp_failed(p, error);
 
-	p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, &error);
+	p->pcre1_extra_info = pcre_study(p->pcre1_regexp, PCRE_STUDY_JIT_COMPILE, &error);
 	if (!p->pcre1_extra_info && error)
 		die("%s", error);
+
+#ifdef PCRE_CONFIG_JIT
+	pcre_config(PCRE_CONFIG_JIT, &canjit);
+	if (canjit == 1) {
+		p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024);
+		if (!p->pcre1_jit_stack)
+			die("BUG: Couldn't allocate PCRE JIT stack");
+		pcre_assign_jit_stack(p->pcre1_extra_info, NULL, p->pcre1_jit_stack);
+		p->pcre1_jit_on = 1;
+	}
+#endif
 }
 
 static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
@@ -377,8 +391,20 @@ static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
 	if (eflags & REG_NOTBOL)
 		flags |= PCRE_NOTBOL;
 
+#ifdef PCRE_CONFIG_JIT
+	if (p->pcre1_jit_on)
+		ret = pcre_jit_exec(p->pcre1_regexp, p->pcre1_extra_info, line,
+				    eol - line, 0, flags, ovector,
+				    ARRAY_SIZE(ovector), p->pcre1_jit_stack);
+	else
+		ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line,
+				eol - line, 0, flags, ovector,
+				ARRAY_SIZE(ovector));
+#else
 	ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - line,
 			0, flags, ovector, ARRAY_SIZE(ovector));
+#endif
+
 	if (ret < 0 && ret != PCRE_ERROR_NOMATCH)
 		die("pcre_exec failed with error code %d", ret);
 	if (ret > 0) {
@@ -393,7 +419,16 @@ static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
 static void free_pcre1_regexp(struct grep_pat *p)
 {
 	pcre_free(p->pcre1_regexp);
+#ifdef PCRE_CONFIG_JIT
+	if (p->pcre1_jit_on) {
+		pcre_free_study(p->pcre1_extra_info);
+		pcre_jit_stack_free(p->pcre1_jit_stack);
+	} else {
+		pcre_free(p->pcre1_extra_info);
+	}
+#else
 	pcre_free(p->pcre1_extra_info);
+#endif
 	pcre_free((void *)p->pcre1_tables);
 }
 #else /* !USE_LIBPCRE1 */
diff --git a/grep.h b/grep.h
index 38ac82b638..14f47189f9 100644
--- a/grep.h
+++ b/grep.h
@@ -3,9 +3,13 @@
 #include "color.h"
 #ifdef USE_LIBPCRE1
 #include <pcre.h>
+#ifndef PCRE_STUDY_JIT_COMPILE
+#define PCRE_STUDY_JIT_COMPILE 0
+#endif
 #else
 typedef int pcre;
 typedef int pcre_extra;
+typedef int pcre_jit_stack;
 #endif
 #include "kwset.h"
 #include "thread-utils.h"
@@ -48,7 +52,9 @@ struct grep_pat {
 	regex_t regexp;
 	pcre *pcre1_regexp;
 	pcre_extra *pcre1_extra_info;
+	pcre_jit_stack *pcre1_jit_stack;
 	const unsigned char *pcre1_tables;
+	int pcre1_jit_on;
 	kwset_t kws;
 	unsigned fixed:1;
 	unsigned ignore_case:1;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 5/7] grep: un-break building with PCRE < 8.32
  2017-05-11 17:01 [PATCH 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes Ævar Arnfjörð Bjarmason
                   ` (3 preceding siblings ...)
  2017-05-11 17:01 ` [PATCH 4/7] grep: add support for the PCRE v1 JIT API Ævar Arnfjörð Bjarmason
@ 2017-05-11 17:01 ` Ævar Arnfjörð Bjarmason
  2017-05-11 17:01 ` [PATCH 6/7] grep: un-break building with PCRE < 8.20 Ævar Arnfjörð Bjarmason
  2017-05-11 17:01 ` [PATCH 7/7] grep: add support for PCRE v2 Ævar Arnfjörð Bjarmason
  6 siblings, 0 replies; 10+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-11 17:01 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Amend my change earlier in this series ("grep: add support for the
PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1
versions earlier than 8.32.

The JIT support was added in version 8.20 released on 2011-10-21, but
it wasn't until 8.32 released on 2012-11-30 that the fast code path to
use the JIT via pcre_jit_exec() was added[1] (see also [2]).

This means that versions 8.20 through 8.31 could still use the JIT,
but supporting it on those versions would add to the already verbose
macro soup around JIT support it, and I don't expect that the use-case
of compiling a brand new git against a 5 year old PCRE is particularly
common, and if someone does that they can just get the existing
pre-JIT slow codepath.

So just take the easy way out and disable the JIT on any version older
than 8.32.

The reason this change isn't part of the initial change PCRE JIT
support is because possibly slightly annoying someone who's bisecting
with an ancient PCRE is worth it to have a cleaner history showing
which parts of the implementation are only used for ancient PCRE
versions. This also makes it easier to revert this change if we ever
decide to stop supporting those old versions.

1. http://www.pcre.org/original/changelog.txt ("28. Introducing a
   native interface for JIT. Through this interface, the
   compiled[...]")
2. https://bugs.exim.org/show_bug.cgi?id=2121

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 8 ++++----
 grep.h | 5 +++++
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/grep.c b/grep.c
index 593e72f92a..9ba9aab1a9 100644
--- a/grep.c
+++ b/grep.c
@@ -350,7 +350,7 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 	const char *error;
 	int erroffset;
 	int options = PCRE_MULTILINE;
-#ifdef PCRE_CONFIG_JIT
+#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT
 	int canjit;
 #endif
 
@@ -371,7 +371,7 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 	if (!p->pcre1_extra_info && error)
 		die("%s", error);
 
-#ifdef PCRE_CONFIG_JIT
+#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT
 	pcre_config(PCRE_CONFIG_JIT, &canjit);
 	if (canjit == 1) {
 		p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024);
@@ -391,7 +391,7 @@ static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
 	if (eflags & REG_NOTBOL)
 		flags |= PCRE_NOTBOL;
 
-#ifdef PCRE_CONFIG_JIT
+#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT
 	if (p->pcre1_jit_on)
 		ret = pcre_jit_exec(p->pcre1_regexp, p->pcre1_extra_info, line,
 				    eol - line, 0, flags, ovector,
@@ -419,7 +419,7 @@ static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
 static void free_pcre1_regexp(struct grep_pat *p)
 {
 	pcre_free(p->pcre1_regexp);
-#ifdef PCRE_CONFIG_JIT
+#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT
 	if (p->pcre1_jit_on) {
 		pcre_free_study(p->pcre1_extra_info);
 		pcre_jit_stack_free(p->pcre1_jit_stack);
diff --git a/grep.h b/grep.h
index 14f47189f9..73ef0ef8ec 100644
--- a/grep.h
+++ b/grep.h
@@ -3,6 +3,11 @@
 #include "color.h"
 #ifdef USE_LIBPCRE1
 #include <pcre.h>
+#ifdef PCRE_CONFIG_JIT
+#if PCRE_MAJOR >= 8 && PCRE_MINOR >= 32
+#define GIT_PCRE1_CAN_DO_MODERN_JIT
+#endif
+#endif
 #ifndef PCRE_STUDY_JIT_COMPILE
 #define PCRE_STUDY_JIT_COMPILE 0
 #endif
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 6/7] grep: un-break building with PCRE < 8.20
  2017-05-11 17:01 [PATCH 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes Ævar Arnfjörð Bjarmason
                   ` (4 preceding siblings ...)
  2017-05-11 17:01 ` [PATCH 5/7] grep: un-break building with PCRE < 8.32 Ævar Arnfjörð Bjarmason
@ 2017-05-11 17:01 ` Ævar Arnfjörð Bjarmason
  2017-05-11 17:01 ` [PATCH 7/7] grep: add support for PCRE v2 Ævar Arnfjörð Bjarmason
  6 siblings, 0 replies; 10+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-11 17:01 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Amend my change earlier in this series ("grep: add support for the
PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1
versions earlier than 8.20.

The 8.20 release was the first release to have JIT & pcre_jit_stack in
the headers, so a mock type needs to be provided for it on those
releases.

Now git should compile with all PCRE versions that it supported before
my JIT change.

I've tested it as far back as version 7.5 released on 2008-01-10, once
I got down to version 7.0 it wouldn't build anymore with GCC 7.1.1,
and I couldn't be bothered to anything older than 7.5 as I'm confident
that if the build breaks on those older versions it's not because of
my JIT change.

See the "un-break" change in this series ("grep: un-break building
with PCRE < 8.32", 2017-05-10) for why this isn't squashed into the
main PCRE JIT commit.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/grep.h b/grep.h
index 73ef0ef8ec..b7b9d487b0 100644
--- a/grep.h
+++ b/grep.h
@@ -11,6 +11,9 @@
 #ifndef PCRE_STUDY_JIT_COMPILE
 #define PCRE_STUDY_JIT_COMPILE 0
 #endif
+#if PCRE_MAJOR <= 8 && PCRE_MINOR < 20
+typedef int pcre_jit_stack;
+#endif
 #else
 typedef int pcre;
 typedef int pcre_extra;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 7/7] grep: add support for PCRE v2
  2017-05-11 17:01 [PATCH 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes Ævar Arnfjörð Bjarmason
                   ` (5 preceding siblings ...)
  2017-05-11 17:01 ` [PATCH 6/7] grep: un-break building with PCRE < 8.20 Ævar Arnfjörð Bjarmason
@ 2017-05-11 17:01 ` Ævar Arnfjörð Bjarmason
  6 siblings, 0 replies; 10+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-11 17:01 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jeffrey Walton, Michał Kiedrowicz,
	J Smith, Victor Leschuk, Nguyễn Thái Ngọc Duy,
	Fredrik Kuivinen, Brandon Williams,
	Ævar Arnfjörð Bjarmason

Add support for v2 of the PCRE API. This is a new major version of
PCRE that came out in early 2015[1].

The regular expression syntax is the same, but while the API is
similar, pretty much every function is either renamed or takes
different arguments. Thus using it via entirely new functions makes
sense, as opposed to trying to e.g. have one compile_pcre_pattern()
that would call either PCRE v1 or v2 functions.

Git can now be compiled with either USE_LIBPCRE1=YesPlease or
USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a
synonym for the former. Providing both is a compile-time error.

With earlier patches to enable JIT for PCRE v1 the performance of the
release versions of both libraries is almost exactly the same, with
PCRE v2 being around 1% slower.

However after I reported this to the pcre-dev mailing list[2] I got a
lot of help with the API use from Zoltán Herczeg, he subsequently
optimized some of the JIT functionality in v2 of the library.

Running the p7820-grep-engines.sh performance test against the latest
Subversion trunk of both, with both them and git compiled as -O3, and
the test run against linux.git, gives the following results. Just the
/perl/ tests shown:

    $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~2 HEAD~ HEAD p7820-grep-engines.sh
    [...]
    Test                                           HEAD~2            HEAD~                    HEAD
    ----------------------------------------------------------------------------------------------------------------
    7820.2: perl grep how.to                       0.30(1.07+0.48)   0.20(0.36+0.54) -33.3%   0.20(0.26+0.60) -33.3%
    7820.5: perl grep ^how to                      0.56(2.62+0.42)   0.25(0.61+0.48) -55.4%   0.19(0.27+0.61) -66.1%
    7820.8: perl grep [how] to                     0.57(2.54+0.41)   0.29(0.95+0.43) -49.1%   0.22(0.36+0.58) -61.4%
    7820.11: perl grep (e.t[^ ]*|v.ry) rare        1.03(5.57+0.34)   0.30(1.07+0.47) -70.9%   0.23(0.54+0.44) -77.7%
    7820.14: perl grep m(ú|u)ult.b(æ|y)te          0.37(1.60+0.37)   0.25(0.73+0.44) -32.4%   0.20(0.29+0.55) -45.9%

See commit ("perf: add a performance comparison test of grep -G, -E
and -P", 2017-04-19) for further details on the machine the above test
run was executed on.

Here HEAD~2 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with
JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine
mentioning p7820-grep-engines.sh for more details on the test setup.

For ease of readability, a different run just of HEAD~ (PCRE v1 with
JIT v.s. PCRE v2), again with just the /perl/ tests shown:

    Test                                           HEAD~             HEAD
    ---------------------------------------------------------------------------------------
    7820.2: perl grep how.to                       0.20(0.40+0.48)   0.20(0.31+0.55) +0.0%
    7820.5: perl grep ^how to                      0.24(0.62+0.50)   0.19(0.30+0.58) -20.8%
    7820.8: perl grep [how] to                     0.29(0.98+0.39)   0.22(0.35+0.61) -24.1%
    7820.11: perl grep (e.t[^ ]*|v.ry) rare        0.30(1.16+0.38)   0.22(0.50+0.50) -26.7%
    7820.14: perl grep m(ú|u)ult.b(æ|y)te          0.24(0.74+0.43)   0.20(0.31+0.54) -16.7%

I.e. the two are either neck-to-neck, but PCRE v2 usually pulls ahead,
when it does it's around 20% faster.

A brief note on thread safety: As noted in pcre2api(3) & pcre2jit(3)
the compiled pattern can be shared between threads, but not some of
the JIT context, however the grep threading support does all pattern &
JIT compilation in separate threads, so this code doesn't need to
concern itself with thread safety.

See commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) for the
initial addition of PCRE v1. This change follows some of the same
patterns it did (and which were discussed on list at the time),
e.g. mocking up types with typedef instead of ifdef-ing them out when
USE_LIBPCRE2 isn't defined. This adds some trivial memory use to the
program, but makes the code look nicer.

1. https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html
2. https://lists.exim.org/lurker/thread/20170419.172322.833ee099.en.html

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile      |  30 +++++++++---
 configure.ac  |  77 ++++++++++++++++++++++++++-----
 grep.c        | 143 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 grep.h        |  17 +++++++
 t/test-lib.sh |   2 +-
 5 files changed, 250 insertions(+), 19 deletions(-)

diff --git a/Makefile b/Makefile
index a79274e5e6..d77ca4c1a5 100644
--- a/Makefile
+++ b/Makefile
@@ -29,7 +29,12 @@ all::
 # Perl-compatible regular expressions instead of standard or extended
 # POSIX regular expressions.
 #
-# Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
+# Currently USE_LIBPCRE is a synonym for USE_LIBPCRE1, define
+# USE_LIBPCRE2 instead if you'd like to use version 2 of the PCRE
+# library. The USE_LIBPCRE flag will likely be changed to mean v2 by
+# default in future releases.
+#
+# Define LIBPCREDIR=/foo/bar if your PCRE header and library files are in
 # /foo/bar/include and /foo/bar/lib directories.
 #
 # Define HAVE_ALLOCA_H if you have working alloca(3) defined in that header.
@@ -1087,15 +1092,27 @@ ifdef NO_LIBGEN_H
 	COMPAT_OBJS += compat/basename.o
 endif
 
-ifdef USE_LIBPCRE
-	BASIC_CFLAGS += -DUSE_LIBPCRE1
-	ifdef LIBPCREDIR
-		BASIC_CFLAGS += -I$(LIBPCREDIR)/include
-		EXTLIBS += -L$(LIBPCREDIR)/$(lib) $(CC_LD_DYNPATH)$(LIBPCREDIR)/$(lib)
+USE_LIBPCRE1 ?= $(USE_LIBPCRE)
+
+ifneq (,$(USE_LIBPCRE1))
+	ifdef USE_LIBPCRE2
+$(error Only set USE_LIBPCRE1 (or its alias USE_LIBPCRE) or USE_LIBPCRE2, not both!)
 	endif
+
+	BASIC_CFLAGS += -DUSE_LIBPCRE1
 	EXTLIBS += -lpcre
 endif
 
+ifdef USE_LIBPCRE2
+	BASIC_CFLAGS += -DUSE_LIBPCRE2
+	EXTLIBS += -lpcre2-8
+endif
+
+ifdef LIBPCREDIR
+	BASIC_CFLAGS += -I$(LIBPCREDIR)/include
+	EXTLIBS += -L$(LIBPCREDIR)/$(lib) $(CC_LD_DYNPATH)$(LIBPCREDIR)/$(lib)
+endif
+
 ifdef HAVE_ALLOCA_H
 	BASIC_CFLAGS += -DHAVE_ALLOCA_H
 endif
@@ -2241,6 +2258,7 @@ GIT-BUILD-OPTIONS: FORCE
 	@echo NO_CURL=\''$(subst ','\'',$(subst ','\'',$(NO_CURL)))'\' >>$@+
 	@echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+
 	@echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+
+	@echo USE_LIBPCRE2=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE2)))'\' >>$@+
 	@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
 	@echo NO_PTHREADS=\''$(subst ','\'',$(subst ','\'',$(NO_PTHREADS)))'\' >>$@+
 	@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
diff --git a/configure.ac b/configure.ac
index deeb968daa..11d083fbe0 100644
--- a/configure.ac
+++ b/configure.ac
@@ -255,21 +255,61 @@ GIT_PARSE_WITH([openssl]))
 # Perl-compatible regular expressions instead of standard or extended
 # POSIX regular expressions.
 #
-# Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
+# Currently USE_LIBPCRE is a synonym for USE_LIBPCRE1, define
+# USE_LIBPCRE2 instead if you'd like to use version 2 of the PCRE
+# library. The USE_LIBPCRE flag will likely be changed to mean v2 by
+# default in future releases.
+#
+# Define LIBPCREDIR=/foo/bar if your PCRE header and library files are in
 # /foo/bar/include and /foo/bar/lib directories.
 #
 AC_ARG_WITH(libpcre,
-AS_HELP_STRING([--with-libpcre],[support Perl-compatible regexes (default is NO)])
+AS_HELP_STRING([--with-libpcre],[synonym for --with-libpcre1]),
+    if test "$withval" = "no"; then
+	USE_LIBPCRE1=
+    elif test "$withval" = "yes"; then
+	USE_LIBPCRE1=YesPlease
+    else
+	USE_LIBPCRE1=YesPlease
+	LIBPCREDIR=$withval
+	AC_MSG_NOTICE([Setting LIBPCREDIR to $LIBPCREDIR])
+        dnl USE_LIBPCRE1 can still be modified below, so don't substitute
+        dnl it yet.
+	GIT_CONF_SUBST([LIBPCREDIR])
+    fi)
+
+AC_ARG_WITH(libpcre1,
+AS_HELP_STRING([--with-libpcre1],[support Perl-compatible regexes via libpcre1 (default is NO)])
+AS_HELP_STRING([],           [ARG can be also prefix for libpcre library and headers]),
+    if test "$withval" = "no"; then
+	USE_LIBPCRE1=
+    elif test "$withval" = "yes"; then
+	USE_LIBPCRE1=YesPlease
+    else
+	USE_LIBPCRE1=YesPlease
+	LIBPCREDIR=$withval
+	AC_MSG_NOTICE([Setting LIBPCREDIR to $LIBPCREDIR])
+        dnl USE_LIBPCRE1 can still be modified below, so don't substitute
+        dnl it yet.
+	GIT_CONF_SUBST([LIBPCREDIR])
+    fi)
+
+AC_ARG_WITH(libpcre2,
+AS_HELP_STRING([--with-libpcre2],[support Perl-compatible regexes via libpcre2 (default is NO)])
 AS_HELP_STRING([],           [ARG can be also prefix for libpcre library and headers]),
+    if test -n "$USE_LIBPCRE1"; then
+        AC_MSG_ERROR([Only supply one of --with-libpcre1 or --with-libpcre2!])
+    fi
+
     if test "$withval" = "no"; then
-	USE_LIBPCRE=
+	USE_LIBPCRE2=
     elif test "$withval" = "yes"; then
-	USE_LIBPCRE=YesPlease
+	USE_LIBPCRE2=YesPlease
     else
-	USE_LIBPCRE=YesPlease
+	USE_LIBPCRE2=YesPlease
 	LIBPCREDIR=$withval
 	AC_MSG_NOTICE([Setting LIBPCREDIR to $LIBPCREDIR])
-        dnl USE_LIBPCRE can still be modified below, so don't substitute
+        dnl USE_LIBPCRE2 can still be modified below, so don't substitute
         dnl it yet.
 	GIT_CONF_SUBST([LIBPCREDIR])
     fi)
@@ -501,13 +541,11 @@ GIT_CONF_SUBST([NEEDS_SSL_WITH_CRYPTO])
 GIT_CONF_SUBST([NO_OPENSSL])
 
 #
-# Define USE_LIBPCRE if you have and want to use libpcre. Various
-# commands such as log and grep offer runtime options to use
-# Perl-compatible regular expressions instead of standard or extended
-# POSIX regular expressions.
+# Handle the USE_LIBPCRE1 and USE_LIBPCRE2 options potentially set
+# above.
 #
 
-if test -n "$USE_LIBPCRE"; then
+if test -n "$USE_LIBPCRE1"; then
 
 GIT_STASH_FLAGS($LIBPCREDIR)
 
@@ -517,7 +555,22 @@ AC_CHECK_LIB([pcre], [pcre_version],
 
 GIT_UNSTASH_FLAGS($LIBPCREDIR)
 
-GIT_CONF_SUBST([USE_LIBPCRE])
+GIT_CONF_SUBST([USE_LIBPCRE1])
+
+fi
+
+
+if test -n "$USE_LIBPCRE2"; then
+
+GIT_STASH_FLAGS($LIBPCREDIR)
+
+AC_CHECK_LIB([pcre2-8], [pcre2_config_8],
+[USE_LIBPCRE2=YesPlease],
+[USE_LIBPCRE2=])
+
+GIT_UNSTASH_FLAGS($LIBPCREDIR)
+
+GIT_CONF_SUBST([USE_LIBPCRE2])
 
 fi
 
diff --git a/grep.c b/grep.c
index 9ba9aab1a9..3855428dfb 100644
--- a/grep.c
+++ b/grep.c
@@ -179,22 +179,36 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
 	case GREP_PATTERN_TYPE_BRE:
 		opt->fixed = 0;
 		opt->pcre1 = 0;
+		opt->pcre2 = 0;
 		break;
 
 	case GREP_PATTERN_TYPE_ERE:
 		opt->fixed = 0;
 		opt->pcre1 = 0;
+		opt->pcre2 = 0;
 		opt->regflags |= REG_EXTENDED;
 		break;
 
 	case GREP_PATTERN_TYPE_FIXED:
 		opt->fixed = 1;
 		opt->pcre1 = 0;
+		opt->pcre2 = 0;
 		break;
 
 	case GREP_PATTERN_TYPE_PCRE:
 		opt->fixed = 0;
+#ifdef USE_LIBPCRE2
+		opt->pcre1 = 0;
+		opt->pcre2 = 1;
+#else
+		/* It's important that pcre1 always be assigned to
+		 * even when there's no USE_LIBPCRE* defined. We still
+		 * call the PCRE stub function, it just dies with
+		 * "cannot use Perl-compatible regexes[...]".
+		 */
 		opt->pcre1 = 1;
+		opt->pcre2 = 0;
+#endif
 		break;
 	}
 }
@@ -448,6 +462,126 @@ static void free_pcre1_regexp(struct grep_pat *p)
 }
 #endif /* !USE_LIBPCRE1 */
 
+#ifdef USE_LIBPCRE2
+static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
+{
+	int error;
+	PCRE2_UCHAR errbuf[256];
+	PCRE2_SIZE erroffset;
+	int options = PCRE2_MULTILINE;
+	const uint8_t *character_tables = NULL;
+	uint32_t canjit;
+	int jitret;
+
+	assert(opt->pcre2);
+
+	p->pcre2_compile_context = NULL;
+
+	if (opt->ignore_case) {
+		if (has_non_ascii(p->pattern)) {
+			character_tables = pcre2_maketables(NULL);
+			p->pcre2_compile_context = pcre2_compile_context_create(NULL);
+			pcre2_set_character_tables(p->pcre2_compile_context, character_tables);
+		}
+		options |= PCRE2_CASELESS;
+	}
+	if (is_utf8_locale() && has_non_ascii(p->pattern))
+		options |= PCRE2_UTF;
+
+	p->pcre2_pattern = pcre2_compile((PCRE2_SPTR)p->pattern,
+					 p->patternlen, options, &error, &erroffset,
+					 p->pcre2_compile_context);
+
+	if (p->pcre2_pattern) {
+		p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, NULL);
+		if (!p->pcre2_match_data)
+			die("BUG: Couldn't allocate PCRE2 match data");
+	} else {
+		pcre2_get_error_message(error, errbuf, sizeof(errbuf));
+		compile_regexp_failed(p, (const char *)&errbuf);
+	}
+
+	pcre2_config(PCRE2_CONFIG_JIT, &canjit);
+	if (canjit == 1) {
+		jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE);
+		if (!jitret)
+			p->pcre2_jit_on = 1;
+		else
+			die("BUG: Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret);
+		p->pcre2_jit_stack = pcre2_jit_stack_create(1, 1024 * 1024, NULL);
+		if (!p->pcre2_jit_stack)
+			die("BUG: Couldn't allocate PCRE2 JIT stack");
+		p->pcre2_match_context = pcre2_match_context_create(NULL);
+		if (!p->pcre2_jit_stack)
+			die("BUG: Couldn't allocate PCRE2 match context");
+		pcre2_jit_stack_assign(p->pcre2_match_context, NULL, p->pcre2_jit_stack);
+	}
+}
+
+static int pcre2match(struct grep_pat *p, const char *line, const char *eol,
+		regmatch_t *match, int eflags)
+{
+	int ret, flags = 0;
+	PCRE2_SIZE *ovector;
+	PCRE2_UCHAR errbuf[256];
+
+	if (eflags & REG_NOTBOL)
+		flags |= PCRE2_NOTBOL;
+
+	if (p->pcre2_jit_on)
+		ret = pcre2_jit_match(p->pcre2_pattern, (unsigned char *)line,
+				      eol - line, 0, flags, p->pcre2_match_data,
+				      NULL);
+	else
+		ret = pcre2_match(p->pcre2_pattern, (unsigned char *)line,
+				  eol - line, 0, flags, p->pcre2_match_data,
+				  NULL);
+
+	if (ret < 0 && ret != PCRE2_ERROR_NOMATCH) {
+		pcre2_get_error_message(ret, errbuf, sizeof(errbuf));
+		die("%s failed with error code %d: %s",
+		    (p->pcre2_jit_on ? "pcre2_jit_match" : "pcre2_match"), ret,
+		    errbuf);
+	}
+	if (ret > 0) {
+		ovector = pcre2_get_ovector_pointer(p->pcre2_match_data);
+		ret = 0;
+		match->rm_so = (int)ovector[0];
+		match->rm_eo = (int)ovector[1];
+	}
+
+	return ret;
+}
+
+static void free_pcre2_pattern(struct grep_pat *p)
+{
+	pcre2_compile_context_free(p->pcre2_compile_context);
+	pcre2_code_free(p->pcre2_pattern);
+	pcre2_match_data_free(p->pcre2_match_data);
+	pcre2_jit_stack_free(p->pcre2_jit_stack);
+	pcre2_match_context_free(p->pcre2_match_context);
+}
+#else /* !USE_LIBPCRE2 */
+static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
+{
+	/* Unreachable until USE_LIBPCRE2 becomes synonymous with
+	 * USE_LIBPCRE. See the sibling comment in
+	 * grep_set_pattern_type_option().
+	 */
+	die("cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE");
+}
+
+static int pcre2match(struct grep_pat *p, const char *line, const char *eol,
+		regmatch_t *match, int eflags)
+{
+	return 1;
+}
+
+static void free_pcre2_pattern(struct grep_pat *p)
+{
+}
+#endif /* !USE_LIBPCRE2 */
+
 static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
 	struct strbuf sb = STRBUF_INIT;
@@ -511,6 +645,11 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 		return;
 	}
 
+	if (opt->pcre2) {
+		compile_pcre2_pattern(p, opt);
+		return;
+	}
+
 	if (opt->pcre1) {
 		compile_pcre1_regexp(p, opt);
 		return;
@@ -870,6 +1009,8 @@ void free_grep_patterns(struct grep_opt *opt)
 				kwsfree(p->kws);
 			else if (p->pcre1_regexp)
 				free_pcre1_regexp(p);
+			else if (p->pcre2_pattern)
+				free_pcre2_pattern(p);
 			else
 				regfree(&p->regexp);
 			free(p->pattern);
@@ -950,6 +1091,8 @@ static int patmatch(struct grep_pat *p, char *line, char *eol,
 		hit = !fixmatch(p, line, eol, match);
 	else if (p->pcre1_regexp)
 		hit = !pcre1match(p, line, eol, match, eflags);
+	else if (p->pcre2_pattern)
+		hit = !pcre2match(p, line, eol, match, eflags);
 	else
 		hit = !regexec_buf(&p->regexp, line, eol - line, 1, match,
 				   eflags);
diff --git a/grep.h b/grep.h
index b7b9d487b0..b40afc2e2f 100644
--- a/grep.h
+++ b/grep.h
@@ -19,6 +19,16 @@ typedef int pcre;
 typedef int pcre_extra;
 typedef int pcre_jit_stack;
 #endif
+#ifdef USE_LIBPCRE2
+#define PCRE2_CODE_UNIT_WIDTH 8
+#include <pcre2.h>
+#else
+typedef int pcre2_code;
+typedef int pcre2_match_data;
+typedef int pcre2_compile_context;
+typedef int pcre2_match_context;
+typedef int pcre2_jit_stack;
+#endif
 #include "kwset.h"
 #include "thread-utils.h"
 #include "userdiff.h"
@@ -63,6 +73,12 @@ struct grep_pat {
 	pcre_jit_stack *pcre1_jit_stack;
 	const unsigned char *pcre1_tables;
 	int pcre1_jit_on;
+	pcre2_code *pcre2_pattern;
+	pcre2_match_data *pcre2_match_data;
+	pcre2_compile_context *pcre2_compile_context;
+	pcre2_match_context *pcre2_match_context;
+	pcre2_jit_stack *pcre2_jit_stack;
+	int pcre2_jit_on;
 	kwset_t kws;
 	unsigned fixed:1;
 	unsigned ignore_case:1;
@@ -126,6 +142,7 @@ struct grep_opt {
 	int extended;
 	int use_reflog_filter;
 	int pcre1;
+	int pcre2;
 	int relative;
 	int pathname;
 	int null_following_name;
diff --git a/t/test-lib.sh b/t/test-lib.sh
index ab92c0ebaa..44d4679384 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1011,7 +1011,7 @@ esac
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PTHREADS" && test_set_prereq PTHREADS
 test -z "$NO_PYTHON" && test_set_prereq PYTHON
-test -n "$USE_LIBPCRE1" && test_set_prereq PCRE
+test -n "$USE_LIBPCRE1$USE_LIBPCRE2" && test_set_prereq PCRE
 test -z "$NO_GETTEXT" && test_set_prereq GETTEXT
 
 # Can we rely on git's output in the C locale?
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/7] grep: don't redundantly compile throwaway patterns under threading
  2017-05-11 17:01 ` [PATCH 1/7] grep: don't redundantly compile throwaway patterns under threading Ævar Arnfjörð Bjarmason
@ 2017-05-12 17:35   ` Brandon Williams
  2017-05-12 18:17     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 10+ messages in thread
From: Brandon Williams @ 2017-05-12 17:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Fredrik Kuivinen

On 05/11, Ævar Arnfjörð Bjarmason wrote:
> Change the pattern compilation logic under threading so that grep
> doesn't compile a pattern it never ends up using on the non-threaded
> code path, only to compile it again N times for N threads which will
> each use their own copy, ignoring the initially compiled pattern.

Is there a reason each thread needs to compile the patterns as opposed
to them being compiled a single time and being copies N time for N
threads?

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/7] grep: don't redundantly compile throwaway patterns under threading
  2017-05-12 17:35   ` Brandon Williams
@ 2017-05-12 18:17     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 10+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-12 18:17 UTC (permalink / raw)
  To: Brandon Williams
  Cc: Git Mailing List, Junio C Hamano, Jeff King, Jeffrey Walton,
	Michał Kiedrowicz, J Smith, Victor Leschuk,
	Nguyễn Thái Ngọc Duy, Fredrik Kuivinen

On Fri, May 12, 2017 at 7:35 PM, Brandon Williams <bmwill@google.com> wrote:
> On 05/11, Ęvar Arnfjörš Bjarmason wrote:
>> Change the pattern compilation logic under threading so that grep
>> doesn't compile a pattern it never ends up using on the non-threaded
>> code path, only to compile it again N times for N threads which will
>> each use their own copy, ignoring the initially compiled pattern.
>
> Is there a reason each thread needs to compile the patterns as opposed
> to them being compiled a single time and being copies N time for N
> threads?

Not really, just simplicity.

I'll amend the commit message to mention that I did this not as an
optimization, but just so the code is easier to reason about and
debug, when debugging this I found that there were two different
stacktraces to get to where I was compiling the pattern, and e.g.
"give each compilation/execution an id" for ad-hoc debugging would
always end up with one unused pattern, confusingly.

The reason it's like this is because it's a side-effect of duplicating
the whole grep_opt structure, which is not thread safe, writable, and
munged during execution, that's where the pattern is stored.

We could re-use the compiled regexp itself for POSIX, for PCRE you can
pre-JIT, post-JIT you can only partially do it, i.e. the pattern is
const, thread safe and can be shared, but you need to also marshal
around mutable per-thread for the JIT stack etc.

I looked into e.g. splitting the API into some "do & alloc threadsafe
stuff", "spawn thread", "do and alloc non-threadsafe stuff", but the
execution time of grep_opt_dup() & pattern compilation is trivial
compared to actually executing the grep, so there was no point. Even
with the more expensive JIT some of the most expensive PCRE patterns
take something like 0.0X milliseconds to compile at most[1].

1. http://sljit.sourceforge.net/pcre.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-05-12 18:18 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-11 17:01 [PATCH 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes Ævar Arnfjörð Bjarmason
2017-05-11 17:01 ` [PATCH 1/7] grep: don't redundantly compile throwaway patterns under threading Ævar Arnfjörð Bjarmason
2017-05-12 17:35   ` Brandon Williams
2017-05-12 18:17     ` Ævar Arnfjörð Bjarmason
2017-05-11 17:01 ` [PATCH 2/7] grep: skip pthreads overhead when using one thread Ævar Arnfjörð Bjarmason
2017-05-11 17:01 ` [PATCH 3/7] log: add -P as a synonym for --perl-regexp Ævar Arnfjörð Bjarmason
2017-05-11 17:01 ` [PATCH 4/7] grep: add support for the PCRE v1 JIT API Ævar Arnfjörð Bjarmason
2017-05-11 17:01 ` [PATCH 5/7] grep: un-break building with PCRE < 8.32 Ævar Arnfjörð Bjarmason
2017-05-11 17:01 ` [PATCH 6/7] grep: un-break building with PCRE < 8.20 Ævar Arnfjörð Bjarmason
2017-05-11 17:01 ` [PATCH 7/7] grep: add support for PCRE v2 Ævar Arnfjörð Bjarmason

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).