git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Git Mailing List" <git@vger.kernel.org>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: PCRE v2 compile error, was Re: What's cooking in git.git (May 2017, #01; Mon, 1)
Date: Tue, 9 May 2017 02:00:18 +0200	[thread overview]
Message-ID: <CACBZZX4G_ThE55Gi53QJt1=9K4jQXqJ3QL8JSGpiSSSDRrKeNA@mail.gmail.com> (raw)
In-Reply-To: <20170508233224.udpuuzlygjpsjogt@genre.crustytoothpaste.net>

On Tue, May 9, 2017 at 1:32 AM, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> On Mon, May 08, 2017 at 04:10:41PM +0900, Junio C Hamano wrote:
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>
>> > This won't be in my next PCRE submission, but I have a path locally to
>> > simply import PCRE into git.git as compat/pcre2, so it can be compiled
>> > with NO_PCRE=Y similar to how NO_REGEX=Y works.
>> >
>> > This will hopefully address your concerns partially, i.e. when you do
>> > want to try it out it'll be easier.
>>
>> Eek, please don't.
>>
>> Until pcre2 becomes _so_ stable that all reasonable distros give
>> choice to the end-users to install it easily in a packaged form,
>> such a "not a fork, but a copy" will be a moving target that I'd
>> rather not to have in compat/.  IOW, our compat/$pkg should be a
>> last resort to help those on distros that are so hard to convince to
>> carry the version/variant of $pkg we would like to use.

The reason I'm looking into this is because the WIP part of my PCRE
branch has changes which start to use PCRE as a general matching
engine in git, even. I.e.:

* git grep -F will be powered by it rather than kwset (which'll be git rm'd)
* Long standing limitations with \0s in regexes go away.
* grep -G and -E will use PCRE via a WIP POSIX -> PCRE  pattern
translator (https://bugs.exim.org/show_bug.cgi?id=2106)
* Perhaps we can remove compat/regex/ entirely & use PCRE via its
POSIX emulation mode + pattern translator (we use regcomp/regexec a
lot for non-grep/log), I'm not sure yet.

I have messy but working code for this in a WIP branch, here's the
performance improvement against linux.git:

Test                                           v2.13.0-rc2       HEAD
---------------------------------------------------------------------------------------
7820.1: basic grep how.to                      0.31(1.20+0.46)
0.19(0.33+0.55) -38.7%
7820.2: extended grep how.to                   0.31(1.19+0.46)
0.19(0.33+0.55) -38.7%
7820.3: perl grep how.to                       0.30(1.12+0.46)
0.19(0.28+0.62) -36.7%
7820.5: basic grep ^how to                     0.31(1.24+0.39)
0.19(0.32+0.56) -38.7%
7820.6: extended grep ^how to                  0.30(1.18+0.44)
0.19(0.22+0.66) -36.7%
7820.7: perl grep ^how to                      0.55(2.68+0.41)
0.19(0.32+0.56) -65.5%
7820.9: basic grep [how] to                    0.47(2.17+0.44)
0.22(0.39+0.54) -53.2%
7820.10: extended grep [how] to                0.47(2.21+0.40)
0.22(0.39+0.55) -53.2%
7820.11: perl grep [how] to                    0.53(2.64+0.39)
0.21(0.37+0.58) -60.4%
7820.13: basic grep \(e.t[^ ]*\|v.ry\) rare    0.63(3.16+0.48)
0.21(0.48+0.53) -66.7%
7820.14: extended grep (e.t[^ ]*|v.ry) rare    0.64(3.28+0.38)
0.21(0.45+0.57) -67.2%
7820.15: perl grep (e.t[^ ]*|v.ry) rare        1.00(5.77+0.37)
0.21(0.50+0.53) -79.0%
7820.17: basic grep m\(ú\|u\)ult.b\(æ\|y\)te   0.31(1.23+0.51)
0.19(0.30+0.58) -38.7%
7820.18: extended grep m(ú|u)ult.b(æ|y)te      0.32(1.26+0.47)
0.19(0.27+0.61) -40.6%
7820.19: perl grep m(ú|u)ult.b(æ|y)te          0.36(1.61+0.37)
0.19(0.30+0.57) -47.2%
7821.1: fixed grep int                         0.52(1.71+0.64)
0.43(1.11+0.72) -17.3%
7821.2: basic grep int                         0.54(1.62+0.70)
0.42(1.14+0.62) -22.2%
7821.3: extended grep int                      0.53(1.67+0.64)
0.51(1.17+0.62) -3.8%
7821.4: perl grep int                          0.53(1.71+0.59)
0.72(1.13+0.63) +35.8%
7821.6: fixed grep -i int                      0.58(1.86+0.67)
0.47(1.32+0.62) -19.0%
7821.7: basic grep -i int                      0.62(1.94+0.61)
0.57(1.25+0.72) -8.1%
7821.8: extended grep -i int                   0.82(1.86+0.68)
0.50(1.41+0.56) -39.0%
7821.9: perl grep -i int                       0.70(1.88+0.68)
0.56(1.25+0.70) -20.0%
7821.11: fixed grep æ                          0.33(1.30+0.43)
0.19(0.22+0.64) -42.4%
7821.12: basic grep æ                          0.33(1.35+0.38)
0.18(0.26+0.59) -45.5%
7821.13: extended grep æ                       0.33(1.20+0.52)
0.18(0.32+0.53) -45.5%
7821.14: perl grep æ                           0.33(1.31+0.40)
0.18(0.28+0.56) -45.5%
7821.16: fixed grep -i æ                       0.25(0.87+0.50)
0.18(0.24+0.60) -28.0%
7821.17: basic grep -i æ                       0.26(0.88+0.48)
0.18(0.24+0.60) -30.8%
7821.18: extended grep -i æ                    0.26(0.92+0.44)
0.18(0.24+0.61) -30.8%
7821.19: perl grep -i æ                        0.25(0.79+0.45)
0.19(0.32+0.56) -24.0%

In case that comes out misformatted it's also available at
https://github.com/avar/git/commit/ee5b2040e2c697e22a73c7b5f07f1b1e591f07e3

I.e. grepping is sped up by 50% and more in many cases, even for -G
and -E patterns which now get translated internally into PCRE
patterns.

> PCRE and PCRE2 also tend to have a lot of security updates, so I would
> prefer if we didn't import them into the tree.  It is far better for
> users to use their distro's packages for PCRE, as it means they get
> automatic security updates even if they're using an old Git.
>
> We shouldn't consider shipping anything with a remotely frequent history
> of security updates in our tree, since people very frequently run old or
> ancient versions of Git.

I'm aware of its security record[1], but I wonder what threat model
you have in mind here. I'm not aware of any parts of git (except maybe
gitweb?) where we take regexes from untrusted sources.

I.e. yes there have been DoS's & even some overflow bugs leading code
execution in PCRE, but in the context of powering git-grep & git-log
with PCRE this falls into the "stop hitting yourself" category.

1. https://www.cvedetails.com/vendor/3265/Pcre.html

  reply	other threads:[~2017-05-09  0:00 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-01  5:35 What's cooking in git.git (May 2017, #01; Mon, 1) Junio C Hamano
2017-05-01 14:25 ` Ævar Arnfjörð Bjarmason
2017-05-01 16:21   ` Brandon Williams
2017-05-01 17:44     ` Ævar Arnfjörð Bjarmason
2017-05-01 19:29   ` Jeff King
2017-05-01 23:21   ` Junio C Hamano
2017-05-02  8:23     ` Ævar Arnfjörð Bjarmason
2017-05-02  9:35       ` Junio C Hamano
2017-05-02 12:31         ` Ævar Arnfjörð Bjarmason
2017-05-02 11:11 ` Johannes Schindelin
2017-05-02 12:09 ` PCRE v2 compile error, was " Johannes Schindelin
2017-05-02 12:27   ` Ævar Arnfjörð Bjarmason
2017-05-02 16:05     ` Johannes Schindelin
2017-05-02 18:52       ` Ævar Arnfjörð Bjarmason
2017-05-02 20:51         ` Kevin Daudt
2017-05-02 21:16           ` Ævar Arnfjörð Bjarmason
2017-05-03  9:45         ` Johannes Schindelin
2017-05-03 11:15           ` Duy Nguyen
2017-05-03 15:10           ` Ævar Arnfjörð Bjarmason
2017-05-04  9:11             ` Johannes Schindelin
2017-05-04 10:24               ` Ævar Arnfjörð Bjarmason
2017-05-04 11:32                 ` Johannes Schindelin
2017-05-04 11:53                   ` Ævar Arnfjörð Bjarmason
2017-05-08  6:30                   ` Ævar Arnfjörð Bjarmason
2017-05-08  7:10                     ` Junio C Hamano
2017-05-08 23:32                       ` brian m. carlson
2017-05-09  0:00                         ` Ævar Arnfjörð Bjarmason [this message]
2017-05-09  0:37                           ` brian m. carlson
2017-05-09 10:40                             ` Johannes Schindelin
2017-05-09 11:29                               ` Ævar Arnfjörð Bjarmason
2017-05-09 11:12                             ` Ævar Arnfjörð Bjarmason
2017-05-09 14:22                               ` demerphq
2017-05-09 14:46                                 ` Ævar Arnfjörð Bjarmason
2017-05-02 17:43     ` Brandon Williams
2017-05-02 17:49       ` Ævar Arnfjörð Bjarmason
2017-05-05  1:12 ` Ramsay Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACBZZX4G_ThE55Gi53QJt1=9K4jQXqJ3QL8JSGpiSSSDRrKeNA@mail.gmail.com' \
    --to=avarab@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).