git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Git Mailing List" <git@vger.kernel.org>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: PCRE v2 compile error, was Re: What's cooking in git.git (May 2017, #01; Mon, 1)
Date: Tue, 9 May 2017 13:12:41 +0200	[thread overview]
Message-ID: <CACBZZX6V8qbnrZAdhRvPthy5Z91iEG8rrJ=Sf9tdkOt52M9j1Q@mail.gmail.com> (raw)
In-Reply-To: <20170509003714.ylwn5ezvu5h36kj7@genre.crustytoothpaste.net>

On Tue, May 9, 2017 at 2:37 AM, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> On Tue, May 09, 2017 at 02:00:18AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> On Tue, May 9, 2017 at 1:32 AM, brian m. carlson
>> <sandals@crustytoothpaste.net> wrote:
>> > PCRE and PCRE2 also tend to have a lot of security updates, so I would
>> > prefer if we didn't import them into the tree.  It is far better for
>> > users to use their distro's packages for PCRE, as it means they get
>> > automatic security updates even if they're using an old Git.
>> >
>> > We shouldn't consider shipping anything with a remotely frequent history
>> > of security updates in our tree, since people very frequently run old or
>> > ancient versions of Git.
>>
>> I'm aware of its security record[1], but I wonder what threat model
>> you have in mind here. I'm not aware of any parts of git (except maybe
>> gitweb?) where we take regexes from untrusted sources.
>>
>> I.e. yes there have been DoS's & even some overflow bugs leading code
>> execution in PCRE, but in the context of powering git-grep & git-log
>> with PCRE this falls into the "stop hitting yourself" category.
>
> Just because you don't drive Git with untrusted regexes doesn't mean
> other people don't.  It's not a good idea to require a stronger security
> model than we absolutely have to, since people can and will violate it.
> Think how devastating Shellshock was even though technically nobody
> should provide insecure environment variables to the shell.

Yes this is definitely something we should keep in mind. I see my
comment could be read as dismissing your concerns, I didn't mean that.
This is definitely something to be concerned about.

I was trying to solicit feedback on whether there were any other parts
of stock git that could take external input as regexes than gitweb,
I'm not aware of any.

I thought that gitweb wouldn't take regexes by default, but I see now
that the 'search' feature is on by default, and that allows you to
grep / pickaxe with regexes. Either -E or -F for grep, or
--pickaxe-regex (which implies -E) for log.

> And, yes, gitweb does in fact call git grep.  That means that git grep
> must in fact be secure against untrusted regexes, or you have a remote
> code execution vulnerability.

Indeed, but it's not as clear-cut as turning on on PCRE making you vulnerable.

* gitweb is vulnerable to CPU DoS now in its default configuration.
It's easy to provide an ERE that ends up slurping up 100% CPU for
several seconds on any non-trivial sized repo, do that in parallel &
you have a DoS vector.

* Obviously something that's 2x as fast or more (which my WIP code is)
makes this better.

* PCRE tends to be worse at pathological patterns, but this is because
it has really large limits by default and will try rather insanely
hard to match your pattern.

                -DHEAP_LIMIT=20000000 \
                -DMATCH_LIMIT=10000000 \
                -DMATCH_LIMIT_DEPTH=10000000 \
                -DMAX_NAME_COUNT=10000 \
                -DMAX_NAME_SIZE=32 \
                -DPARENS_NEST_LIMIT=250 \

This can be reduced dynamically via the API (and the patterns can't
change it, except to reduce it). For example on my system 2.11.0
(before any of my changes) on linux.git:

    $ time git grep -E -- '-?-?-?-?-?-?-?-?-?-?-?-----------$' |wc -l
    12434
    real    0m0.444s
    $ time git grep -P -- '-?-?-?-?-?-?-?-?-?-?-?-----------$' | wc -l
    12434
    real    1m20.849s

With my JIT changes:

    $ time ~/g/git/git --exec-path=/home/avar/g/git grep -P --
'-?-?-?-?-?-?-?-?-?-?-?-----------$' | wc -l
    12434
    real    0m5.334s

However for user-supplied patterns we can just turn on really
conservative settings:

    $ time ~/g/git/git --exec-path=/home/avar/g/git grep -P --
'(*LIMIT_RECURSION=1000)(*LIMIT_MATCH=1000)-?-?-?-?-?-?-?-?-?-?-?-----------$'
| wc -l
    fatal: pcre2_jit_match failed with error code -47: match limit exceeded
    0
    real    0m0.021s

When I locally compile git with something like this:

-               -DMATCH_LIMIT=10000000 \
-               -DMATCH_LIMIT_DEPTH=10000000 \
-               -DMATCH_LIMIT_RECURSION=10000000 \
-               -DMAX_NAME_COUNT=10000 \
+               -DMATCH_LIMIT=1000 \
+               -DMATCH_LIMIT_DEPTH=1000 \
+               -DMATCH_LIMIT_RECURSION=1000 \
+               -DMAX_NAME_COUNT=100 \
                -DMAX_NAME_SIZE=32 \
-               -DPARENS_NEST_LIMIT=250 \
+               -DPARENS_NEST_LIMIT=10 \

All tests still pass, and from my own testing I can't find any
non-pathological patterns this would break. So we might actually
consider turning this down for git by default.

* Much of the rest of the patterns PCRE has CVEs for can similarly be
mitigated by simply turning various features off.

> Furthermore, at work we distribute Git with all releases of our product.
> We normally only do non-security updates to the last couple of releases,
> but we provide security updates to all supported versions.  I'm not
> comfortable shipping the entirety of PCRE or PCRE2 to customers without
> providing security updates, so you're going to make my job (and my
> coworkers') a lot harder by shipping it.  Please don't.

I'm not going to make your job harder. This WIP patch I have for
embedding PCRE2 as compat/pcre2 is for use by those who'd want to get
a faster grep, but for whatever reason don't have a handy PCRE v2
package already, or wouldn't mind supporting a git that comes bundled
with it.

My comments about "git rm"-ing kwset & compat/regex were meant in the
sense of "I have a branch where we don't use that anymore, and it's
faster", not that you couldn't compile a git without PCRE for the
forseeable future.

My WIP branch does remove much of the non-PCRE code, but that's just
because it was easier to hack up like that and not support a bunch of
if pcre/else branches, I'll amend that and keep PCRE optional when I
end up submitting those patches.

  parent reply	other threads:[~2017-05-09 11:13 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-01  5:35 What's cooking in git.git (May 2017, #01; Mon, 1) Junio C Hamano
2017-05-01 14:25 ` Ævar Arnfjörð Bjarmason
2017-05-01 16:21   ` Brandon Williams
2017-05-01 17:44     ` Ævar Arnfjörð Bjarmason
2017-05-01 19:29   ` Jeff King
2017-05-01 23:21   ` Junio C Hamano
2017-05-02  8:23     ` Ævar Arnfjörð Bjarmason
2017-05-02  9:35       ` Junio C Hamano
2017-05-02 12:31         ` Ævar Arnfjörð Bjarmason
2017-05-02 11:11 ` Johannes Schindelin
2017-05-02 12:09 ` PCRE v2 compile error, was " Johannes Schindelin
2017-05-02 12:27   ` Ævar Arnfjörð Bjarmason
2017-05-02 16:05     ` Johannes Schindelin
2017-05-02 18:52       ` Ævar Arnfjörð Bjarmason
2017-05-02 20:51         ` Kevin Daudt
2017-05-02 21:16           ` Ævar Arnfjörð Bjarmason
2017-05-03  9:45         ` Johannes Schindelin
2017-05-03 11:15           ` Duy Nguyen
2017-05-03 15:10           ` Ævar Arnfjörð Bjarmason
2017-05-04  9:11             ` Johannes Schindelin
2017-05-04 10:24               ` Ævar Arnfjörð Bjarmason
2017-05-04 11:32                 ` Johannes Schindelin
2017-05-04 11:53                   ` Ævar Arnfjörð Bjarmason
2017-05-08  6:30                   ` Ævar Arnfjörð Bjarmason
2017-05-08  7:10                     ` Junio C Hamano
2017-05-08 23:32                       ` brian m. carlson
2017-05-09  0:00                         ` Ævar Arnfjörð Bjarmason
2017-05-09  0:37                           ` brian m. carlson
2017-05-09 10:40                             ` Johannes Schindelin
2017-05-09 11:29                               ` Ævar Arnfjörð Bjarmason
2017-05-09 11:12                             ` Ævar Arnfjörð Bjarmason [this message]
2017-05-09 14:22                               ` demerphq
2017-05-09 14:46                                 ` Ævar Arnfjörð Bjarmason
2017-05-02 17:43     ` Brandon Williams
2017-05-02 17:49       ` Ævar Arnfjörð Bjarmason
2017-05-05  1:12 ` Ramsay Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACBZZX6V8qbnrZAdhRvPthy5Z91iEG8rrJ=Sf9tdkOt52M9j1Q@mail.gmail.com' \
    --to=avarab@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).