git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
	"Jeff King" <peff@peff.net>,
	"Jeffrey Walton" <noloader@gmail.com>,
	"Michał Kiedrowicz" <michal.kiedrowicz@gmail.com>,
	"J Smith" <dark.panda@gmail.com>,
	"Victor Leschuk" <vleschuk@gmail.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Fredrik Kuivinen" <frekui@gmail.com>,
	"Brandon Williams" <bmwill@google.com>,
	"Stefan Beller" <sbeller@google.com>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>,
	"Simon Ruderich" <simon@ruderich.org>
Subject: Re: [PATCH v2 7/7] grep: add support for PCRE v2
Date: Thu, 25 May 2017 11:49:15 +0200	[thread overview]
Message-ID: <CACBZZX5Bp43eoWYvpnTtkwA8LyHzwZmrumxBG=-XXuqu1DMKSw@mail.gmail.com> (raw)
In-Reply-To: <xmqqa862hixh.fsf@gitster.mtv.corp.google.com>

On Wed, May 24, 2017 at 8:23 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Add support for v2 of the PCRE API. This is a new major version of
>> PCRE that came out in early 2015[1].
>>
>> The regular expression syntax is the same, but while the API is
>> similar, pretty much every function is either renamed or takes
>> different arguments. Thus using it via entirely new functions makes
>> sense, as opposed to trying to e.g. have one compile_pcre_pattern()
>> that would call either PCRE v1 or v2 functions.
>>
>> Git can now be compiled with either USE_LIBPCRE1=YesPlease or
>> USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a
>> synonym for the former. Providing both is a compile-time error.
>>
>> With earlier patches to enable JIT for PCRE v1 the performance of the
>> release versions of both libraries is almost exactly the same, with
>> PCRE v2 being around 1% slower.
>>
>> However after I reported this to the pcre-dev mailing list[2] I got a
>> lot of help with the API use from Zoltán Herczeg, he subsequently
>> optimized some of the JIT functionality in v2 of the library.
>>
>> Running the p7820-grep-engines.sh performance test against the latest
>> Subversion trunk of both, with both them and git compiled as -O3, and
>> the test run against linux.git, gives the following results. Just the
>> /perl/ tests shown:
>>
>>     $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~2 HEAD~ HEAD p7820-grep-engines.sh
>>     [...]
>>     Test                                           HEAD~2            HEAD~                    HEAD
>>     ----------------------------------------------------------------------------------------------------------------
>>     7820.3: perl grep 'how.to'                      0.22(0.40+0.48)   0.22(0.31+0.58) +0.0%   0.22(0.26+0.59) +0.0%
>>     7820.7: perl grep '^how to'                     0.27(0.62+0.50)   0.28(0.60+0.50) +3.7%   0.22(0.25+0.60) -18.5%
>>     7820.11: perl grep '[how] to'                   0.33(0.92+0.47)   0.33(0.94+0.45) +0.0%   0.25(0.42+0.51) -24.2%
>>     7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       0.35(1.08+0.46)   0.35(1.12+0.41) +0.0%   0.25(0.52+0.50) -28.6%
>>     7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.30(0.78+0.51)   0.30(0.86+0.42) +0.0%   0.25(0.29+0.54) -16.7%
>>
>> See commit ("perf: add a comparison test of grep regex engines",
>> 2017-04-19) for details on the machine the above test run was executed
>> on.
>>
>> Here HEAD~2 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with
>> JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine
>> mentioning p7820-grep-engines.sh for more details on the test setup.
>>
>> For ease of readability, a different run just of HEAD~ (PCRE v1 with
>> JIT v.s. PCRE v2), again with just the /perl/ tests shown:
>>
>>     Test                                           HEAD~             HEAD
>>     ---------------------------------------------------------------------------------------
>>     7820.3: perl grep 'how.to'                      0.23(0.41+0.47)   0.23(0.26+0.59) +0.0%
>>     7820.7: perl grep '^how to'                     0.27(0.64+0.47)   0.23(0.28+0.56) -14.8%
>>     7820.11: perl grep '[how] to'                   0.34(0.95+0.44)   0.25(0.38+0.56) -26.5%
>>     7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       0.34(1.07+0.46)   0.24(0.52+0.49) -29.4%
>>     7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.30(0.81+0.46)   0.22(0.33+0.54) -26.7%
>>
>> I.e. the two are either neck-to-neck, but PCRE v2 usually pulls ahead,
>> when it does it's around 20% faster.
>>
>> A brief note on thread safety: As noted in pcre2api(3) & pcre2jit(3)
>> the compiled pattern can be shared between threads, but not some of
>> the JIT context, however the grep threading support does all pattern &
>> JIT compilation in separate threads, so this code doesn't need to
>> concern itself with thread safety.
>
> Nicely explained.
>
>> -# Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
>> +# Currently USE_LIBPCRE is a synonym for USE_LIBPCRE1, define
>> +# USE_LIBPCRE2 instead if you'd like to use version 2 of the PCRE
>> +# library. The USE_LIBPCRE flag will likely be changed to mean v2 by
>> +# default in future releases.
>> +#
>> +# Define LIBPCREDIR=/foo/bar if your PCRE header and library files are in
>>  # /foo/bar/include and /foo/bar/lib directories.
>
> As there is no way to use both, having a single LIBPCREDIR is not a
> hurting limitation, which makes sense.

Will nevertheless add a comment to clarify this.

>> @@ -2241,6 +2258,7 @@ GIT-BUILD-OPTIONS: FORCE
>>       @echo NO_CURL=\''$(subst ','\'',$(subst ','\'',$(NO_CURL)))'\' >>$@+
>>       @echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+
>>       @echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+
>
> Shouldn't the line above record $(USE_LIBPCRE1) instead of the
> generic fallback?

Yes, will fix.

>> +     @echo USE_LIBPCRE2=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE2)))'\' >>$@+
>>       @echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
>>       @echo NO_PTHREADS=\''$(subst ','\'',$(subst ','\'',$(NO_PTHREADS)))'\' >>$@+
>>       @echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
>
>> diff --git a/grep.c b/grep.c
>> index 3c0c30f033..569cf9e290 100644
>> --- a/grep.c
>> +++ b/grep.c
>> @@ -179,22 +179,36 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st
>>       case GREP_PATTERN_TYPE_BRE:
>>               opt->fixed = 0;
>>               opt->pcre1 = 0;
>> +             opt->pcre2 = 0;
>>               break;
>>
>>       case GREP_PATTERN_TYPE_ERE:
>>               opt->fixed = 0;
>>               opt->pcre1 = 0;
>> +             opt->pcre2 = 0;
>>               opt->regflags |= REG_EXTENDED;
>>               break;
>>
>>       case GREP_PATTERN_TYPE_FIXED:
>>               opt->fixed = 1;
>>               opt->pcre1 = 0;
>> +             opt->pcre2 = 0;
>>               break;
>>
>>       case GREP_PATTERN_TYPE_PCRE:
>>               opt->fixed = 0;
>> +#ifdef USE_LIBPCRE2
>> +             opt->pcre1 = 0;
>> +             opt->pcre2 = 1;
>> +#else
>> +             /* It's important that pcre1 always be assigned to
>> +              * even when there's no USE_LIBPCRE* defined. We still
>> +              * call the PCRE stub function, it just dies with
>> +              * "cannot use Perl-compatible regexes[...]".
>> +              */
>>               opt->pcre1 = 1;
>
> Very well thought-out comment.  Our style wants you to have
> slash-aster that opens a multi-line comment on its own line, though.
Will fix.
>> +             opt->pcre2 = 0;
>> +#endif
>>               break;
>>       }
>>  }
>> @@ -446,6 +460,126 @@ static void free_pcre1_regexp(struct grep_pat *p)
>>  }
>>  #endif /* !USE_LIBPCRE1 */
>>
>> +#ifdef USE_LIBPCRE2
>> +static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
>> +{
>> +...
>> +     p->pcre2_pattern = pcre2_compile((PCRE2_SPTR)p->pattern,
>> +                                      p->patternlen, options, &error, &erroffset,
>> +                                      p->pcre2_compile_context);
>
> Are all die("BUG:...") in this function actual bugs, or just
> "die()"?  Just like the comment on an earlier patch, things like
> running out of memory that you as a Git programmer cannot fix by
> correcting this code are not die("BUG:"), but normal runtime errors.
Will fix these.
>> +
>> +     if (p->pcre2_pattern) {
>> +             p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, NULL);
>> +             if (!p->pcre2_match_data)
>> +                     die("BUG: Couldn't allocate PCRE2 match data");
>> +     } else {
>> +             pcre2_get_error_message(error, errbuf, sizeof(errbuf));
>> +             compile_regexp_failed(p, (const char *)&errbuf);
>> +     }
>> +
>> +     pcre2_config(PCRE2_CONFIG_JIT, &canjit);
>> +     if (canjit == 1) {
>> +             jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE);
>> +             if (!jitret)
>> +                     p->pcre2_jit_on = 1;
>
> I think the same "would it be better to do this without canjit?"
> comment applies here.
Yup, changed.
>> +#else /* !USE_LIBPCRE2 */
>> +static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
>> +{
>> +     /* Unreachable until USE_LIBPCRE2 becomes synonymous with
>> +      * USE_LIBPCRE. See the sibling comment in
>> +      * grep_set_pattern_type_option().
>> +      */
>> +     die("cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE");
>> +}
>
> Wow.  If I were doing this, I wouldn't have been this cautious, but
> I have no complaints ;-).
>

  reply	other threads:[~2017-05-25  9:49 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-20 21:42 [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 01/30] Makefile & configure: reword inaccurate comment about PCRE Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 02/30] grep & rev-list doc: stop promising libpcre for --perl-regexp Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 03/30] test-lib: rename the LIBPCRE prerequisite to PCRE Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 04/30] log: add exhaustive tests for pattern style options & config Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 05/30] log: make --regexp-ignore-case work with --perl-regexp Ævar Arnfjörð Bjarmason
2017-05-20 23:50   ` Junio C Hamano
2017-05-21  6:58     ` Ævar Arnfjörð Bjarmason
2017-05-22  0:17       ` Junio C Hamano
2017-06-28 21:58       ` [PATCH 0/5] grep: remove redundant code & reflags from API Ævar Arnfjörð Bjarmason
2017-06-29 22:22         ` [PATCH v2 0/6] " Ævar Arnfjörð Bjarmason
2017-06-29 22:22         ` [PATCH v2 1/6] grep: remove redundant double assignment to 0 Ævar Arnfjörð Bjarmason
2017-06-29 22:22         ` [PATCH v2 2/6] grep: adjust a redundant grep pattern type assignment Ævar Arnfjörð Bjarmason
2017-06-29 22:22         ` [PATCH v2 3/6] grep: remove redundant "fixed" field re-assignment to 0 Ævar Arnfjörð Bjarmason
2017-06-29 22:22         ` [PATCH v2 4/6] grep: remove redundant and verbose re-assignments " Ævar Arnfjörð Bjarmason
2017-06-29 22:22         ` [PATCH v2 5/6] grep: remove regflags from the public grep_opt API Ævar Arnfjörð Bjarmason
2017-06-30 17:20           ` Junio C Hamano
2017-06-29 22:22         ` [PATCH v2 6/6] grep: remove redundant REG_NEWLINE when compiling fixed regex Ævar Arnfjörð Bjarmason
2017-06-28 21:58       ` [PATCH 1/5] grep: remove redundant double assignment to 0 Ævar Arnfjörð Bjarmason
2017-06-28 21:58       ` [PATCH 2/5] grep: remove redundant grep pattern type assignment Ævar Arnfjörð Bjarmason
2017-06-29 17:03         ` Stefan Beller
2017-06-29 17:50           ` Ævar Arnfjörð Bjarmason
2017-06-28 21:58       ` [PATCH 3/5] grep: remove redundant "fixed" field re-assignment to 0 Ævar Arnfjörð Bjarmason
2017-06-29 17:10         ` Stefan Beller
2017-06-28 21:58       ` [PATCH 4/5] grep: remove redundant and verbose re-assignments " Ævar Arnfjörð Bjarmason
2017-06-28 21:58       ` [PATCH 5/5] grep: remove regflags from the public grep_opt API Ævar Arnfjörð Bjarmason
2017-06-29 17:43         ` Stefan Beller
2017-06-29 18:16           ` Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 06/30] grep: add a test asserting that --perl-regexp dies when !PCRE Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 07/30] grep: add a test for backreferences in PCRE patterns Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 08/30] grep: change non-ASCII -i test to stop using --debug Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 09/30] grep: add tests for --threads=N and grep.threads Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 10/30] grep: amend submodule recursion test for regex engine testing Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 11/30] grep: add tests for grep pattern types being passed to submodules Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 12/30] grep: add a test helper function for less verbose -f \0 tests Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 13/30] grep: prepare for testing binary regexes containing rx metacharacters Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 14/30] grep: add tests to fix blind spots with \0 patterns Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 15/30] perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do Ævar Arnfjörð Bjarmason
2017-05-20 23:50   ` Junio C Hamano
2017-05-21  6:23     ` Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 16/30] perf: emit progress output when unpacking & building Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 17/30] perf: add a comparison test of grep regex engines Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 18/30] perf: add a comparison test of grep regex engines with -F Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 19/30] perf: add a comparison test of log --grep regex engines Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 20/30] grep: catch a missing enum in switch statement Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 21/30] grep: remove redundant regflags assignments Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 22/30] grep: factor test for \0 in grep patterns into a function Ævar Arnfjörð Bjarmason
2017-05-23 21:17   ` Brandon Williams
2017-05-20 21:42 ` [PATCH v3 23/30] grep: change the internal PCRE macro names to be PCRE1 Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 24/30] grep: change internal *pcre* variable & function names to be *pcre1* Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 25/30] grep: move is_fixed() earlier to avoid forward declaration Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 26/30] test-lib: add a PTHREADS prerequisite Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 27/30] pack-objects & index-pack: add test for --threads warning Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 28/30] pack-objects: fix buggy warning about threads Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 29/30] grep: given --threads with NO_PTHREADS=YesPlease, warn Ævar Arnfjörð Bjarmason
2017-05-20 21:42 ` [PATCH v3 30/30] grep: assert that threading is enabled when calling grep_{lock,unlock} Ævar Arnfjörð Bjarmason
2017-05-20 23:50 ` [PATCH v3 00/30] Easy to review grep & pre-PCRE changes Junio C Hamano
2017-05-23 19:24   ` [PATCH v2 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes Ævar Arnfjörð Bjarmason
2017-05-23 19:24   ` [PATCH v2 1/7] grep: don't redundantly compile throwaway patterns under threading Ævar Arnfjörð Bjarmason
2017-05-24  4:42     ` Junio C Hamano
2017-05-25 10:33       ` Ævar Arnfjörð Bjarmason
2017-05-26  0:58         ` Junio C Hamano
2017-05-26  8:06           ` Ævar Arnfjörð Bjarmason
2017-05-26  9:49             ` Junio C Hamano
2017-05-23 19:24   ` [PATCH v2 2/7] grep: skip pthreads overhead when using one thread Ævar Arnfjörð Bjarmason
2017-05-24  4:45     ` Junio C Hamano
2017-05-23 19:24   ` [PATCH v2 3/7] log: add -P as a synonym for --perl-regexp Ævar Arnfjörð Bjarmason
2017-05-23 19:24   ` [PATCH v2 4/7] grep: add support for the PCRE v1 JIT API Ævar Arnfjörð Bjarmason
2017-05-24  5:17     ` Junio C Hamano
2017-05-24  7:37       ` Ævar Arnfjörð Bjarmason
2017-05-23 19:24   ` [PATCH v2 5/7] grep: un-break building with PCRE < 8.32 Ævar Arnfjörð Bjarmason
2017-05-24  6:00     ` Junio C Hamano
2017-05-24  6:38       ` Junio C Hamano
2017-05-23 19:24   ` [PATCH v2 6/7] grep: un-break building with PCRE < 8.20 Ævar Arnfjörð Bjarmason
2017-05-23 19:24   ` [PATCH v2 7/7] grep: add support for PCRE v2 Ævar Arnfjörð Bjarmason
2017-05-24  6:23     ` Junio C Hamano
2017-05-25  9:49       ` Ævar Arnfjörð Bjarmason [this message]
  -- strict thread matches above, loose matches on Subject: below --
2017-05-13 23:45 [PATCH v2 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes Ævar Arnfjörð Bjarmason
2017-05-13 23:45 ` [PATCH v2 7/7] grep: add support for PCRE v2 Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACBZZX5Bp43eoWYvpnTtkwA8LyHzwZmrumxBG=-XXuqu1DMKSw@mail.gmail.com' \
    --to=avarab@gmail.com \
    --cc=bmwill@google.com \
    --cc=dark.panda@gmail.com \
    --cc=frekui@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=michal.kiedrowicz@gmail.com \
    --cc=noloader@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    --cc=simon@ruderich.org \
    --cc=vleschuk@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).