git@vger.kernel.org mailing list mirror (one of many)
From: "René Scharfe" <l.s.r@web.de>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Git List" <git@vger.kernel.org>,
	"Hamza Mahfooz" <someguy@effective-light.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Carlo Marcelo Arenas Belón" <carenas@gmail.com>,
	"Andreas Schwab" <schwab@linux-m68k.org>
Subject: Re: [v2.35.0 regression] some PCRE hangs under UTF-8 locale (was: [PATCH 1/2] grep/pcre2: use PCRE2_UTF even with ASCII patterns)
Date: Sun, 30 Jan 2022 14:32:46 +0100
Message-ID: <b74f781c-548b-5254-d3d1-fc1873c70e10@web.de>
In-Reply-To: <20220130090422.GA4769@szeder.dev>

On 30.01.22 at 10:04, SZEDER Gábor wrote:
> On Sun, Jan 30, 2022 at 08:55:02AM +0100, René Scharfe wrote:
>> e0c6029 (Fix inifinite loop when a single byte newline is searched in
>> JIT., 2020-05-29) [1] sounds like it might have fixed it.  It's part of
>> version 10.36.
>
> I saw this hang on two Ubuntu 20.04 based boxes, which predate that
> fix you mention only by a month or two, and apparently the almost two
> years since then was not enough for this fix to trickle down into
> updated 20.04 pcre packages, because:
>
>> Do you still get the error when you disable JIT, i.e. when you use the
>> pattern "(*NO_JIT)^\s" instead?
>
> No, with this pattern it works as expected.
>
> So is there a more convenient way to disable PCRE JIT in Git?  FWIW,
> (non-git) 'grep -P' works with the same patterns.

I don't know a better way.  We could do it automatically, though:

--- >8 ---
Subject: [PATCH] grep: disable JIT on PCRE2 before 10.36 to avoid endless loop

Commit e0c6029 (Fix inifinite loop when a single byte newline is
searched in JIT., 2020-05-29) of PCRE2 adds the following point to its
ChangeLog for version 10.36:

  2. Fix inifinite loop when a single byte newline is searched in JIT when
  invalid utf8 mode is enabled.

Avoid that bug on older versions (which are still reportedly found in
the wild) by disabling the JIT when handling UTF-8.

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
Not sure how to test it.  Killing git grep after a second or so seems a
bit clumsy.  timeout(1) from GNU coreutils at least allows doing that
from the shell, but it's not a standard tool.  Perhaps we need a new
test helper for that purpose?
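
To make the test-helper idea above a bit more concrete, here is a rough
sketch of what such a helper could look like. It is not part of Git or of
this patch; the name run_with_timeout() and the use of exit code 124 (the
value GNU timeout reports) are assumptions for illustration. It relies on
POSIX behaviour: a pending alarm() survives execvp(), so the child is
terminated by SIGALRM unless the program blocks or handles that signal.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of Git: run the command in argv and let
 * SIGALRM terminate it if it is still running after `seconds` (> 0).
 * Returns the command's exit code, 124 if it was killed by the alarm,
 * or -1 if fork()/waitpid() failed or it died from another signal.
 */
static int run_with_timeout(char *const argv[], unsigned int seconds)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (!pid) {
		alarm(seconds);         /* pending alarms survive execvp() */
		execvp(argv[0], argv);
		_exit(127);             /* only reached if exec failed */
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (WIFSIGNALED(status) && WTERMSIG(status) == SIGALRM)
		return 124;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A test could then call it with something like { "git", "grep", "-P",
"^\\s", NULL } and a timeout of a few seconds, and treat 124 as a failure.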

 grep.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/grep.c b/grep.c
index 7bb0360869..16629a2301 100644
--- a/grep.c
+++ b/grep.c
@@ -406,6 +406,14 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 	}

 	pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on);
+#ifndef GIT_PCRE2_VERSION_10_36_OR_HIGHER
+	/*
+	 * Work around the bug fixed by e0c6029 (Fix inifinite loop when a
+	 * single byte newline is searched in JIT., 2020-05-29).
+	 */
+	if (options & PCRE2_MATCH_INVALID_UTF)
+		p->pcre2_jit_on = 0;
+#endif
 	if (p->pcre2_jit_on) {
 		jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE);
 		if (jitret)
--
2.35.0
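
The guard macro used in the hunk above is defined elsewhere in Git and is
not shown in this patch. As a minimal sketch of how such a compile-time
check can be derived from the PCRE2_MAJOR and PCRE2_MINOR macros that
pcre2.h provides, consider the following. Everything prefixed with
EXAMPLE_ or example_ is hypothetical, and the stand-in version numbers
only exist so the snippet compiles without pcre2.h.

```c
/*
 * Stand-ins so the sketch compiles on its own; with <pcre2.h> included,
 * the header's real PCRE2_MAJOR and PCRE2_MINOR would be used instead.
 */
#ifndef PCRE2_MAJOR
#define PCRE2_MAJOR 10
#define PCRE2_MINOR 34
#endif

/* True if major.minor is at least want_major.want_minor. */
#define EXAMPLE_VERSION_AT_LEAST(major, minor, want_major, want_minor) \
	((major) > (want_major) || \
	 ((major) == (want_major) && (minor) >= (want_minor)))

#if EXAMPLE_VERSION_AT_LEAST(PCRE2_MAJOR, PCRE2_MINOR, 10, 36)
#define EXAMPLE_PCRE2_10_36_OR_HIGHER
#endif

/*
 * Hypothetical mirror of the decision in the hunk above: on headers older
 * than 10.36, refuse the JIT whenever invalid-UTF matching is requested.
 */
static int example_jit_allowed(int jit_available, int match_invalid_utf)
{
#ifndef EXAMPLE_PCRE2_10_36_OR_HIGHER
	if (match_invalid_utf)
		return 0;
#endif
	(void)match_invalid_utf; /* unused when built against 10.36 or newer */
	return jit_available;
}
```

Note that a check like this looks at the headers used at build time, not
at the library that is loaded at run time.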

