git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Carlo Marcelo Arenas Belón" <carenas@gmail.com>,
	"Beat Bolli" <dev+git@drbeat.li>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH v2 3/8] grep: stop using a custom JIT stack with PCRE v1
Date: Fri, 26 Jul 2019 17:08:13 +0200
Message-ID: <20190726150818.6373-4-avarab@gmail.com>
In-Reply-To: <20190724151415.3698-1-avarab@gmail.com>

Simplify the PCRE v1 code for the same reasons as for the PCRE v2 code
in the last commit. Unlike with v2 we actually used the custom stack
in v1, but let's use PCRE's built-in 32 KB one instead, since
experience with v2 shows that's enough. Most distros are already using
v2 as a default, and the underlying sljit code is the same.

Unfortunately we can't just pass a NULL stack to pcre_jit_exec() as
we can with pcre2_jit_match(); unlike the v2 function, it doesn't
support that. Instead we need to use the fatter pcre_exec() if we'd
like the same behavior.

This will make things slightly slower than the fast-path function,
but that's OK: we care less about v1 performance these days, since we
have and recommend v2. Running a performance test similar to the one
I ran in fbaceaac47 ("grep: add support for the PCRE v1 JIT API",
2017-05-25) via:

    GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE1=Y CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst' ./run HEAD~ HEAD p7820-grep-engines.sh

Gives us the following (just the /perl/ results):

    Test                                            HEAD~             HEAD
    ---------------------------------------------------------------------------------------
    7820.3: perl grep 'how.to'                      0.19(0.67+0.52)   0.19(0.65+0.52) +0.0%
    7820.7: perl grep '^how to'                     0.19(0.78+0.44)   0.19(0.72+0.49) +0.0%
    7820.11: perl grep '[how] to'                   0.39(2.13+0.43)   0.40(2.10+0.46) +2.6%
    7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       0.44(2.55+0.37)   0.45(2.47+0.41) +2.3%
    7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.23(1.06+0.42)   0.22(1.03+0.43) -4.3%
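
As a sanity check on the last column: the percentages appear to be the
relative change in the first (elapsed) number of each cell. A minimal
sketch of that arithmetic follows; percent_change() and print_change()
are hypothetical helpers written for illustration, not part of the
perf suite:

```c
#include <stdio.h>

/* Hypothetical helper: relative change from "before" to "after", in percent. */
double percent_change(double before, double after)
{
	return 100.0 * (after - before) / before;
}

/* Print one row's change in the same "+2.6%" style as the table above. */
void print_change(const char *test, double before, double after)
{
	printf("%-10s %+.1f%%\n", test, percent_change(before, after));
}
```

Feeding it the elapsed times for 7820.11 (0.39 and 0.40) and 7820.19
(0.23 and 0.22) reproduces the +2.6% and -4.3% shown above.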

This change will also implicitly re-enable UTF-8 validation for PCRE
v1. As noted in [1], this means there are now cases where PCRE v1 is
more eager to error out. Subsequent patches will fix that for v2, and
I think it's fair to tell v1 users to "just upgrade" rather than
worry about that edge case for v1.
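
To illustrate the kind of input that such validation rejects, here is
a simplified well-formedness check. It is only a sketch written for
this explanation: is_valid_utf8() is a hypothetical function, and it
is not PCRE's implementation, whose exact checks and error codes may
differ.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if buf[0..len) is well-formed UTF-8,
 * 0 otherwise. Rejects stray or missing continuation bytes, overlong
 * encodings, surrogates and code points above U+10FFFF.
 */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = buf[i];
		unsigned long cp;
		size_t need, k;

		if (c < 0x80) {			/* plain ASCII */
			i++;
			continue;
		} else if ((c & 0xE0) == 0xC0) {	/* 2-byte sequence */
			need = 1;
			cp = c & 0x1F;
		} else if ((c & 0xF0) == 0xE0) {	/* 3-byte sequence */
			need = 2;
			cp = c & 0x0F;
		} else if ((c & 0xF8) == 0xF0) {	/* 4-byte sequence */
			need = 3;
			cp = c & 0x07;
		} else {
			return 0;	/* stray continuation or invalid lead byte */
		}

		if (len - i - 1 < need)
			return 0;	/* sequence truncated at end of buffer */

		for (k = 1; k <= need; k++) {
			if ((buf[i + k] & 0xC0) != 0x80)
				return 0;	/* not a continuation byte */
			cp = (cp << 6) | (buf[i + k] & 0x3F);
		}

		/* Overlong forms encode a value that fits in fewer bytes. */
		if ((need == 1 && cp < 0x80) ||
		    (need == 2 && cp < 0x800) ||
		    (need == 3 && cp < 0x10000))
			return 0;
		if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
			return 0;

		i += need + 1;
	}
	return 1;
}
```

A haystack line containing, say, a lone 0xFF byte fails a check like
this, which is the sort of data a matcher that validates its input
will refuse instead of searching.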

1.  https://public-inbox.org/git/CAPUEsphZJ_Uv9o1-yDpjNLA_q-f7gWXz9g1gCY2pYAYN8ri40g@mail.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 28 +++++-----------------------
 grep.h |  5 -----
 2 files changed, 5 insertions(+), 28 deletions(-)

diff --git a/grep.c b/grep.c
index 4b1e917ac5..9c2b259771 100644
--- a/grep.c
+++ b/grep.c
@@ -394,12 +394,6 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 
 #ifdef GIT_PCRE1_USE_JIT
 	pcre_config(PCRE_CONFIG_JIT, &p->pcre1_jit_on);
-	if (p->pcre1_jit_on) {
-		p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024);
-		if (!p->pcre1_jit_stack)
-			die("Couldn't allocate PCRE JIT stack");
-		pcre_assign_jit_stack(p->pcre1_extra_info, NULL, p->pcre1_jit_stack);
-	}
 #endif
 }
 
@@ -411,18 +405,9 @@ static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
 	if (eflags & REG_NOTBOL)
 		flags |= PCRE_NOTBOL;
 
-#ifdef GIT_PCRE1_USE_JIT
-	if (p->pcre1_jit_on) {
-		ret = pcre_jit_exec(p->pcre1_regexp, p->pcre1_extra_info, line,
-				    eol - line, 0, flags, ovector,
-				    ARRAY_SIZE(ovector), p->pcre1_jit_stack);
-	} else
-#endif
-	{
-		ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line,
-				eol - line, 0, flags, ovector,
-				ARRAY_SIZE(ovector));
-	}
+	ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line,
+			eol - line, 0, flags, ovector,
+			ARRAY_SIZE(ovector));
 
 	if (ret < 0 && ret != PCRE_ERROR_NOMATCH)
 		die("pcre_exec failed with error code %d", ret);
@@ -439,14 +424,11 @@ static void free_pcre1_regexp(struct grep_pat *p)
 {
 	pcre_free(p->pcre1_regexp);
 #ifdef GIT_PCRE1_USE_JIT
-	if (p->pcre1_jit_on) {
+	if (p->pcre1_jit_on)
 		pcre_free_study(p->pcre1_extra_info);
-		pcre_jit_stack_free(p->pcre1_jit_stack);
-	} else
+	else
 #endif
-	{
 		pcre_free(p->pcre1_extra_info);
-	}
 	pcre_free((void *)p->pcre1_tables);
 }
 #else /* !USE_LIBPCRE1 */
diff --git a/grep.h b/grep.h
index 4d8e300175..ce2d72571f 100644
--- a/grep.h
+++ b/grep.h
@@ -14,13 +14,9 @@
 #ifndef GIT_PCRE_STUDY_JIT_COMPILE
 #define GIT_PCRE_STUDY_JIT_COMPILE 0
 #endif
-#if PCRE_MAJOR <= 8 && PCRE_MINOR < 20
-typedef int pcre_jit_stack;
-#endif
 #else
 typedef int pcre;
 typedef int pcre_extra;
-typedef int pcre_jit_stack;
 #endif
 #ifdef USE_LIBPCRE2
 #define PCRE2_CODE_UNIT_WIDTH 8
@@ -85,7 +81,6 @@ struct grep_pat {
 	regex_t regexp;
 	pcre *pcre1_regexp;
 	pcre_extra *pcre1_extra_info;
-	pcre_jit_stack *pcre1_jit_stack;
 	const unsigned char *pcre1_tables;
 	int pcre1_jit_on;
 	pcre2_code *pcre2_pattern;
-- 
2.22.0.455.g172b71a6c5

