git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Mathias Krause <minipli@grsecurity.net>
To: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Cc: "Mathias Krause" <minipli@grsecurity.net>,
	"Stephane Odul" <stephane@clumio.com>,
	"Carlo Marcelo Arenas Belón" <carenas@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: Suspected git grep regression in git 2.40.0 - proposed fix
Date: Thu, 23 Mar 2023 15:40:00 +0100	[thread overview]
Message-ID: <20230323144000.21146-1-minipli@grsecurity.net> (raw)
In-Reply-To: <4a103812-c4c6-a010-c2e5-4e42e9855c2e@grsecurity.net>

I looked a little further and found an interesting entry in the PCRE2's
changelog for version 10.35:

https://github.com/PCRE2Project/pcre2/blob/pcre2-10.35/ChangeLog#L66:

  17. Fix a crash which occurs when the character type of an invalid UTF
  character is decoded in JIT.

Its fix commit https://github.com/PCRE2Project/pcre2/commit/c21bd977547d
and associated unit test is basically doing the same as what Stephane is
running into: making the JIT compiled code choke on something that's not
a valid UTF-8 string.

So it looks like we're out of luck and have to implement yet another
quirk to handle these broken versions. We can either disable the JIT
compiler completely for these versions (and exchange the segfault for a
serve performance regression) or fall-back to the previous behaviour and
ignore Unicode properties (and reintroduce the bug commit acabd2048ee0
("grep: correctly identify utf-8 characters with \{b,w} in -P") wanted
to fix).

I went with the second option and could confirm the below patch fixes
the segfault on Ubuntu 20.04 which is shipping the broken version.

Junio, what's your call on it? Below patch or simply a revert of commit
acabd2048ee0? Other ideas?

Thanks,
Mathias

-- >8 --
Subject: [PATCH] grep: work around UTF-8 related JIT bug in PCRE2 <= 10.34

Stephane is reporting[1] a regression introduced in git v2.40.0 that leads
to 'git grep' segfaulting in his CI pipeline. It turns out, he's using an
older version of libpcre2 that triggers a wild pointer dereference in
the generated JIT code that was fixed in PCRE2 10.35.

Instead of completely disabling the JIT compiler for the buggy version,
just mask out the Unicode property handling as we used to do prior to
commit acabd2048ee0 ("grep: correctly identify utf-8 characters with
\{b,w} in -P").

[1] https://lore.kernel.org/git/7E83DAA1-F9A9-4151-8D07-D80EA6D59EEA@clumio.com/

Reported-by: Stephane Odul <stephane@clumio.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
---
 grep.c | 11 ++++++++++-
 grep.h |  3 +++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index cee44a78d044..d249beae60d0 100644
--- a/grep.c
+++ b/grep.c
@@ -317,8 +317,17 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 		}
 		options |= PCRE2_CASELESS;
 	}
-	if (!opt->ignore_locale && is_utf8_locale() && !literal)
+	if (!opt->ignore_locale && is_utf8_locale() && !literal) {
 		options |= (PCRE2_UTF | PCRE2_UCP | PCRE2_MATCH_INVALID_UTF);
+#ifndef GIT_PCRE2_VERSION_10_35_OR_HIGHER
+		/*
+		 * Work around a JIT bug related to invalid Unicode character
+		 * handling fixed in 10.35:
+		 * https://github.com/PCRE2Project/pcre2/commit/c21bd977547d
+		 */
+		options ^= PCRE2_UCP;
+#endif
+	}
 
 #ifndef GIT_PCRE2_VERSION_10_36_OR_HIGHER
 	/* Work around https://bugs.exim.org/show_bug.cgi?id=2642 fixed in 10.36 */
diff --git a/grep.h b/grep.h
index 6075f997e68f..c59592e3bdba 100644
--- a/grep.h
+++ b/grep.h
@@ -7,6 +7,9 @@
 #if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 36) || PCRE2_MAJOR >= 11
 #define GIT_PCRE2_VERSION_10_36_OR_HIGHER
 #endif
+#if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 35) || PCRE2_MAJOR >= 11
+#define GIT_PCRE2_VERSION_10_35_OR_HIGHER
+#endif
 #if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 34) || PCRE2_MAJOR >= 11
 #define GIT_PCRE2_VERSION_10_34_OR_HIGHER
 #endif
-- 
2.39.2


  parent reply	other threads:[~2023-03-23 14:40 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-21  8:04 Suspected git grep regression in git 2.40.0 Stephane Odul
2023-03-21 12:33 ` Bagas Sanjaya
2023-03-21 16:33 ` Junio C Hamano
2023-03-21 19:20   ` Mathias Krause
2023-03-21 20:46     ` [EXTERNAL SENDER] " Stephane Odul
2023-03-22 19:52       ` Mathias Krause
2023-03-22 20:04         ` [EXTERNAL SENDER] " Stephane Odul
2023-03-23 14:40         ` Mathias Krause [this message]
2023-03-23 16:19           ` Suspected git grep regression in git 2.40.0 - proposed fix Junio C Hamano
2023-03-23 16:36             ` Mathias Krause
2023-03-23 17:25           ` [PATCH v2] grep: work around UTF-8 related JIT bug in PCRE2 <= 10.34 Mathias Krause

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230323144000.21146-1-minipli@grsecurity.net \
    --to=minipli@grsecurity.net \
    --cc=avarab@gmail.com \
    --cc=carenas@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=stephane@clumio.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).