git@vger.kernel.org mailing list mirror (one of many)
From: "René Scharfe" <l.s.r@web.de>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: "SZEDER Gábor" <szeder.dev@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Git List" <git@vger.kernel.org>,
	"Hamza Mahfooz" <someguy@effective-light.com>,
	"Carlo Marcelo Arenas Belón" <carenas@gmail.com>,
	"Andreas Schwab" <schwab@linux-m68k.org>
Subject: Re: [v2.35.0 regression] some PCRE hangs under UTF-8 locale (was: [PATCH 1/2] grep/pcre2: use PCRE2_UTF even with ASCII patterns)
Date: Thu, 17 Feb 2022 22:14:29 +0100
Message-ID: <4e391e2e-6561-3c2e-0306-c860a37356bc@web.de>
In-Reply-To: <220212.86zgmvx13i.gmgdl@evledraar.gmail.com>

Am 12.02.22 um 21:46 schrieb Ævar Arnfjörð Bjarmason:
>
> On Sat, Feb 05 2022, René Scharfe wrote:
>
>>
>> I can't actually test the effectiveness of the patch because PCRE2's
>> JIT doesn't work on my development machine at all (Apple M1), as I just
>> discovered. :-/  While we know that disabling JIT helps, we didn't
>> actually determine, yet, if e0c6029 (Fix inifinite loop when a single
>> byte newline is searched in JIT., 2020-05-29) really fixes the "^\s"
>> bug.
>>
>> So I have to abandon this patch, unfortunately.  Any volunteer to pick
>> it up?
>
> We can test it in CI, and have a proposed patch from Hamza Mahfooz to do
> so. See
> https://lore.kernel.org/git/211220.865yrjszg4.gmgdl@evledraar.gmail.com/
>
> There's been some minor changes to the main.yml since then, but I think
> you should be able to just pick that patch up, adjust it, apply whatever
> changes you want to test on top, and push it to github.

Good idea!  Except the "just" is not justified, I feel.  I learned that

  - t7810 fails with PCRE2 built with --disable-unicode because it uses
    \p{...} unconditionally, and that's not supported without Unicode
    support -- no idea how to detect that and skip those tests except
    by trying and maybe looking for the note that "this version of PCRE2
    does not have support for \P, \p, or \X", which somehow feels iffy,

  - PCRE2 10.35 doesn't build on Ubuntu x64 without adding -mshstk to
    CFLAGS, and that's the version I wanted to test,

  - many of the Unicode-related tests require Icelandic language support,
    and "sudo apt-get -y install `check-language-support -l is`"
    installs it,

  - the condition for our workaround for bug 2642 is reversed,

  - with that fixed I can't trigger the endless loop.

So perhaps that's the only fix we need here -- or perhaps I got
confused by the multitude of options.

--- >8 ---
Subject: [PATCH] grep: fix triggering PCRE2_NO_START_OPTIMIZE workaround

PCRE2 bug 2642 was fixed in version 10.36.  Our 95ca1f987e (grep/pcre2:
better support invalid UTF-8 haystacks, 2021-01-24) worked around it on
older versions by setting the flag PCRE2_NO_START_OPTIMIZE.  797c359978
(grep/pcre2: use compile-time PCREv2 version test, 2021-02-18) switched
it around to set the flag on 10.36 and higher instead, while it claimed
to use "the same test done at compile-time".

Switch the condition back to apply the workaround on PCRE2 versions
_before_ 10.36.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 grep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 5bec7fd793..ef34d764f9 100644
--- a/grep.c
+++ b/grep.c
@@ -386,7 +386,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 	if (!opt->ignore_locale && is_utf8_locale() && !literal)
 		options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF);

-#ifdef GIT_PCRE2_VERSION_10_36_OR_HIGHER
+#ifndef GIT_PCRE2_VERSION_10_36_OR_HIGHER
 	/* Work around https://bugs.exim.org/show_bug.cgi?id=2642 fixed in 10.36 */
 	if (PCRE2_MATCH_INVALID_UTF && options & (PCRE2_UTF | PCRE2_CASELESS))
 		options |= PCRE2_NO_START_OPTIMIZE;
--
2.35.1
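
To spell out the corrected condition: the workaround should apply only
to PCRE2 releases older than 10.36.  Below is a minimal sketch in plain
C.  The helper needs_bug_2642_workaround() is hypothetical and not part
of git, which makes this decision at compile time through the
GIT_PCRE2_VERSION_10_36_OR_HIGHER macro instead.

```c
#include <stdbool.h>

/*
 * Hypothetical helper for illustration only; git itself uses a
 * compile-time macro rather than a runtime check.
 *
 * Returns true when the given PCRE2 version predates 10.36, i.e.
 * when bug 2642 is still present and PCRE2_NO_START_OPTIMIZE would
 * have to be set as a workaround.
 */
bool needs_bug_2642_workaround(int major, int minor)
{
	if (major != 10)
		return major < 10;	/* anything from 11.x on has the fix */
	return minor < 36;		/* 10.00 .. 10.35 are affected */
}
```

With this, needs_bug_2642_workaround(10, 35) is true and
needs_bug_2642_workaround(10, 36) is false.  That matches the #ifndef
in the patch and is the opposite of what the reversed #ifdef did.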
