From: Mathias Krause <minipli@grsecurity.net>
To: Junio C Hamano <gitster@pobox.com>, Stephane Odul <stephane@clumio.com>
Cc: git@vger.kernel.org,
"Carlo Marcelo Arenas Belón" <carenas@gmail.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: Suspected git grep regression in git 2.40.0
Date: Tue, 21 Mar 2023 20:20:42 +0100 [thread overview]
Message-ID: <b0f4b588-9871-8e59-e5a2-3f8745a7c4cd@grsecurity.net> (raw)
In-Reply-To: <xmqqttyejc7y.fsf@gitster.g>
On 21.03.23 17:33, Junio C Hamano wrote:
> [jc: on the CC: line, summoned a few people who may know a lot more
> about pcre than I do]
>
> Stephane Odul <stephane@clumio.com> writes:
>
>> We have a CI pipeline on a private repository that started failing
>> consistently while running `git grep -P` commands. The command
>> fails with an exit code of -11 and is failing pretty
>> consistently. With prior versions of git there is no issue
>> whatsoever, but with 2.40.0 it always fails on the same
>> call. Other git grep calls are fine, but that one is failing
>> consistently.
>>
>> I was not able to reproduce locally but my main machine is an M1
>> MacBook Pro, the CI pipeline runs under Kubernetes in AWS and the
>> container is based on Ubuntu 20.04 with the git client installed
>> via the PPA.
>>
>> The error is for this pattern `git grep -cP '^\w+ = json.load'`.
>>
>> As a workaround we tried to download and install the microsoft-git
>> v2.39.2 deb package since it allows us to downgrade, but then the
>> git grep commands just got stuck.
>
> One "grep -P" related change we had recently between 2.39 and 2.40
> was
>
> 50b6ad55 (grep: fall back to interpreter if JIT memory
> allocation fails, 2023-01-31)
>
> The code tries to figure out at runtime if pcre engine has
> functioning JIT by making an extra JIT compilation of a sample
> pattern and when it fails with a specific reason, fall back to
> interpreted pattern matching. The change made the code to emit a
> bit more detailed information when it fails, but a controlled exit
> from the codepath should give $?=128, not 11.
>
> So the above commit may or may not be related. It could be that the
> version of pcre library linked to Git 2.40 and older Git you are
> running in your CI environment has been updated.
I don't think it's that commit. I just tried the pattern here on a system that
would require the interpreter fallback and one that wouldn't and both didn't
error out (Debian bookworm/sid; libpcre2-8: 10.42-1).
Looking at the history, though, another commit stands out:
$ git log --oneline --no-merges v2.39.0..v2.40.0 -- grep.c
fb2ebe72a374 grep API: plug memory leaks by freeing "header_list"
891c9965fbc0 grep.c: refactor free_grep_patterns()
50b6ad55b04d grep: fall back to interpreter if JIT memory allocation fails
acabd2048ee0 grep: correctly identify utf-8 characters with \{b,w} in -P
Commit acabd2048ee0 suspiciously matches the pattern with its "\w" mention.
Looking at the diff gives:
if (!opt->ignore_locale && is_utf8_locale() && !literal)
- options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF);
+ options |= (PCRE2_UTF | PCRE2_UCP | PCRE2_MATCH_INVALID_UTF);
So maybe the PCRE2_UCP option isn't supported by the system in question or it
forces PCRE2 to do a verification it didn't do before?
The exit code of -11 could relate to PCRE2's error PCRE2_ERROR_UTF8_ERR9 (-11)
strengthening the suspicion it's related to that change. It's described as:
PCRE2_ERROR_UTF8_ERR9 5th-byte's two top bits are not 0x80
Maybe the input data isn't valid UTF-8 encoded?
Stephane, can you pin point the file that the grep is failing on and verify it
is indeed a valid UFT-8 encoded file via:
$ iconv -f UTF-8 your_file > /dev/null && echo OK || echo "Not valid"
If this prints "OK", we need to investigate further. However, if it prints
"Not valid", a broken file encoding is probably the cause and we should
revisit the above change and think of alternatives, as this might be seen as
a regression (even though the file isn't strictly valid UTF-8).
Thanks,
Mathias
>
> Does it make a difference if you disable JIT by prefixing the
> pattern with (*NO_JIT)?
>
> Thanks.
next prev parent reply other threads:[~2023-03-21 19:21 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-21 8:04 Suspected git grep regression in git 2.40.0 Stephane Odul
2023-03-21 12:33 ` Bagas Sanjaya
2023-03-21 16:33 ` Junio C Hamano
2023-03-21 19:20 ` Mathias Krause [this message]
2023-03-21 20:46 ` [EXTERNAL SENDER] " Stephane Odul
2023-03-22 19:52 ` Mathias Krause
2023-03-22 20:04 ` [EXTERNAL SENDER] " Stephane Odul
2023-03-23 14:40 ` Suspected git grep regression in git 2.40.0 - proposed fix Mathias Krause
2023-03-23 16:19 ` Junio C Hamano
2023-03-23 16:36 ` Mathias Krause
2023-03-23 17:25 ` [PATCH v2] grep: work around UTF-8 related JIT bug in PCRE2 <= 10.34 Mathias Krause
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b0f4b588-9871-8e59-e5a2-3f8745a7c4cd@grsecurity.net \
--to=minipli@grsecurity.net \
--cc=avarab@gmail.com \
--cc=carenas@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=stephane@clumio.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).