* [PATCH] grep: fall back to interpreter mode if JIT fails @ 2022-12-16 12:15 Mathias Krause 2022-12-16 16:12 ` Ævar Arnfjörð Bjarmason ` (2 more replies) 0 siblings, 3 replies; 36+ messages in thread From: Mathias Krause @ 2022-12-16 12:15 UTC (permalink / raw) To: git; +Cc: Mathias Krause, Carlo Marcelo Arenas Belón From: Carlo Marcelo Arenas Belón <carenas@gmail.com> Under Linux systems with SELinux's 'deny_execmem' or PaX's MPROTECT enabled, the allocation of PCRE2's JIT rwx memory may be prohibited, making pcre2_jit_compile() fail with PCRE2_ERROR_NOMEMORY (-48): [user@fedora git]$ git grep -c PCRE2_JIT grep.c:1 [user@fedora git]$ # Enable SELinux's W^X policy [user@fedora git]$ sudo semanage boolean -m -1 deny_execmem [user@fedora git]$ # JIT memory allocation fails, breaking 'git grep' [user@fedora git]$ git grep -c PCRE2_JIT fatal: Couldn't JIT the PCRE2 pattern 'PCRE2_JIT', got '-48' Instead of failing hard in this case and making 'git grep' unusable on such systems, simply fall back to interpreter mode, leading to a much better user experience. Such a change was already proposed 4 years ago [1] but wasn't merged for unknown reasons. 1. https://lore.kernel.org/r/20181209230024.43444-3-carenas@gmail.com Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Mathias Krause <minipli@grsecurity.net> # tweaked changelog, added comment --- grep.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/grep.c b/grep.c index 06eed694936c..f2ada528b21d 100644 --- a/grep.c +++ b/grep.c @@ -317,8 +317,21 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on); if (p->pcre2_jit_on) { jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE); - if (jitret) - die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret); + if (jitret) { + /* + * Even though pcre2_config(PCRE2_CONFIG_JIT, ...) + * indicated JIT support, the library might still + * fail to generate JIT code for various reasons, + * e.g. when SELinux's 'deny_execmem' or PaX's + * MPROTECT prevent creating W|X memory mappings. + * + * Instead of faling hard, fall back to interpreter + * mode, just as if the pattern was prefixed with + * '(*NO_JIT)'. + */ + p->pcre2_jit_on = 0; + return; + } /* * The pcre2_config(PCRE2_CONFIG_JIT, ...) call just -- 2.35.1 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH] grep: fall back to interpreter mode if JIT fails 2022-12-16 12:15 [PATCH] grep: fall back to interpreter mode if JIT fails Mathias Krause @ 2022-12-16 16:12 ` Ævar Arnfjörð Bjarmason 2022-12-16 19:26 ` Mathias Krause 2022-12-16 22:52 ` Junio C Hamano 2023-01-27 15:49 ` [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails Mathias Krause 2 siblings, 1 reply; 36+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-12-16 16:12 UTC (permalink / raw) To: Mathias Krause; +Cc: git, Carlo Marcelo Arenas Belón On Fri, Dec 16 2022, Mathias Krause wrote: > From: Carlo Marcelo Arenas Belón <carenas@gmail.com> > > Under Linux systems with SELinux's 'deny_execmem' or PaX's MPROTECT > enabled, the allocation of PCRE2's JIT rwx memory may be prohibited, > making pcre2_jit_compile() fail with PCRE2_ERROR_NOMEMORY (-48): > > [user@fedora git]$ git grep -c PCRE2_JIT > grep.c:1 > > [user@fedora git]$ # Enable SELinux's W^X policy > [user@fedora git]$ sudo semanage boolean -m -1 deny_execmem > > [user@fedora git]$ # JIT memory allocation fails, breaking 'git grep' > [user@fedora git]$ git grep -c PCRE2_JIT > fatal: Couldn't JIT the PCRE2 pattern 'PCRE2_JIT', got '-48' > > Instead of failing hard in this case and making 'git grep' unusable on > such systems, simply fall back to interpreter mode, leading to a much > better user experience. > > Such a change was already proposed 4 years ago [1] but wasn't merged for > unknown reasons. Yeah, it's unfortunate that it fell between the cracks, and it's good to have such a fallback mechanism. > 1. https://lore.kernel.org/r/20181209230024.43444-3-carenas@gmail.com > > Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> > Signed-off-by: Mathias Krause <minipli@grsecurity.net> # tweaked changelog, added comment > --- > grep.c | 17 +++++++++++++++-- > 1 file changed, 15 insertions(+), 2 deletions(-) > > diff --git a/grep.c b/grep.c > index 06eed694936c..f2ada528b21d 100644 > --- a/grep.c > +++ b/grep.c > @@ -317,8 +317,21 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt > pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on); > if (p->pcre2_jit_on) { > jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE); > - if (jitret) > - die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret); > + if (jitret) { > + /* > + * Even though pcre2_config(PCRE2_CONFIG_JIT, ...) > + * indicated JIT support, the library might still > + * fail to generate JIT code for various reasons, > + * e.g. when SELinux's 'deny_execmem' or PaX's > + * MPROTECT prevent creating W|X memory mappings. > + * > + * Instead of faling hard, fall back to interpreter > + * mode, just as if the pattern was prefixed with > + * '(*NO_JIT)'. > + */ > + p->pcre2_jit_on = 0; > + return; From my reading of the docs it returns two different codes: PCRE2_ERROR_JIT_BADOPTION or PCRE2_ERROR_NOMEMORY. This change will start treating both the same, but we only want to allow the latter, surely? So shouldn't this be e.g.: jitret = pcre2_jit_compile(...); if (jitret == PCRE2_ERROR_NOMEMORY) { /* code you added here */ } else if (jitret) { BUG(...); } I put a BUG() there, we could keep the die(), but PCRE2_ERROR_JIT_BADOPTION is really more appropriate as a BUG(), and if it starts returning any other codes our use of the API is also in some unknown state, so we should also BUG() out. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] grep: fall back to interpreter mode if JIT fails 2022-12-16 16:12 ` Ævar Arnfjörð Bjarmason @ 2022-12-16 19:26 ` Mathias Krause 2022-12-16 23:09 ` Junio C Hamano 0 siblings, 1 reply; 36+ messages in thread From: Mathias Krause @ 2022-12-16 19:26 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Carlo Marcelo Arenas Belón [-- Attachment #1.1: Type: text/plain, Size: 3722 bytes --] Am 16.12.22 um 17:12 schrieb Ævar Arnfjörð Bjarmason: > > On Fri, Dec 16 2022, Mathias Krause wrote: > >> From: Carlo Marcelo Arenas Belón <carenas@gmail.com> >> >> Under Linux systems with SELinux's 'deny_execmem' or PaX's MPROTECT >> enabled, the allocation of PCRE2's JIT rwx memory may be prohibited, >> making pcre2_jit_compile() fail with PCRE2_ERROR_NOMEMORY (-48): >> >> [user@fedora git]$ git grep -c PCRE2_JIT >> grep.c:1 >> >> [user@fedora git]$ # Enable SELinux's W^X policy >> [user@fedora git]$ sudo semanage boolean -m -1 deny_execmem >> >> [user@fedora git]$ # JIT memory allocation fails, breaking 'git grep' >> [user@fedora git]$ git grep -c PCRE2_JIT >> fatal: Couldn't JIT the PCRE2 pattern 'PCRE2_JIT', got '-48' >> >> Instead of failing hard in this case and making 'git grep' unusable on >> such systems, simply fall back to interpreter mode, leading to a much >> better user experience. >> >> Such a change was already proposed 4 years ago [1] but wasn't merged for >> unknown reasons. > > Yeah, it's unfortunate that it fell between the cracks, and it's good to > have such a fallback mechanism. I fully agree, obviously! >> 1. https://lore.kernel.org/r/20181209230024.43444-3-carenas@gmail.com >> >> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> >> Signed-off-by: Mathias Krause <minipli@grsecurity.net> # tweaked changelog, added comment >> --- >> grep.c | 17 +++++++++++++++-- >> 1 file changed, 15 insertions(+), 2 deletions(-) >> >> diff --git a/grep.c b/grep.c >> index 06eed694936c..f2ada528b21d 100644 >> --- a/grep.c >> +++ b/grep.c >> @@ -317,8 +317,21 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt >> pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on); >> if (p->pcre2_jit_on) { >> jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE); >> - if (jitret) >> - die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret); >> + if (jitret) { >> + /* >> + * Even though pcre2_config(PCRE2_CONFIG_JIT, ...) >> + * indicated JIT support, the library might still >> + * fail to generate JIT code for various reasons, >> + * e.g. when SELinux's 'deny_execmem' or PaX's >> + * MPROTECT prevent creating W|X memory mappings. >> + * >> + * Instead of faling hard, fall back to interpreter >> + * mode, just as if the pattern was prefixed with >> + * '(*NO_JIT)'. >> + */ >> + p->pcre2_jit_on = 0; >> + return; > > From my reading of the docs it returns two different codes: > PCRE2_ERROR_JIT_BADOPTION or PCRE2_ERROR_NOMEMORY. > > This change will start treating both the same, but we only want to allow > the latter, surely? > > So shouldn't this be e.g.: > > jitret = pcre2_jit_compile(...); > if (jitret == PCRE2_ERROR_NOMEMORY) { > /* code you added here */ > } else if (jitret) { > BUG(...); > } I'm fine with it either way. In fact, just testing for the "JIT memory cannot be allocated" error might be more on point and could detect wrong API usage by the means of an error. However, from a user's point of view a fall back to interpreter mode might still be desired in this case, as a failing 'git grep' is simply not acceptable, IMHO. > I put a BUG() there, we could keep the die(), but > PCRE2_ERROR_JIT_BADOPTION is really more appropriate as a BUG(), and if > it starts returning any other codes our use of the API is also in some > unknown state, so we should also BUG() out. Valid points. I'll give others some more time to churn in and if there are no strong objections, I'll change it like you suggest. Thanks, Mathias [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 665 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] grep: fall back to interpreter mode if JIT fails 2022-12-16 19:26 ` Mathias Krause @ 2022-12-16 23:09 ` Junio C Hamano 2022-12-17 2:50 ` Carlo Arenas 0 siblings, 1 reply; 36+ messages in thread From: Junio C Hamano @ 2022-12-16 23:09 UTC (permalink / raw) To: Mathias Krause Cc: Ævar Arnfjörð Bjarmason, git, Carlo Marcelo Arenas Belón Mathias Krause <minipli@grsecurity.net> writes: > ... However, from a user's point of view a fall back to > interpreter mode might still be desired in this case, as a failing > 'git grep' is simply not acceptable, IMHO. "git grep" that silently produces a wrong result (by falling back after a problem is detected) would not be acceptable, either. Receiving BADOPTION could be a sign that there is something wrong in the input, not from the end-user but from the code, in which case stopping with BUG() may be a more appropriate? >> I put a BUG() there, we could keep the die(), but >> PCRE2_ERROR_JIT_BADOPTION is really more appropriate as a BUG(), and if >> it starts returning any other codes our use of the API is also in some >> unknown state, so we should also BUG() out. > > Valid points. I'll give others some more time to churn in and if there > are no strong objections, I'll change it like you suggest. Yup, sounds quite sensible. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] grep: fall back to interpreter mode if JIT fails 2022-12-16 23:09 ` Junio C Hamano @ 2022-12-17 2:50 ` Carlo Arenas 2022-12-19 9:00 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 36+ messages in thread From: Carlo Arenas @ 2022-12-17 2:50 UTC (permalink / raw) To: Junio C Hamano Cc: Mathias Krause, Ævar Arnfjörð Bjarmason, git On Fri, Dec 16, 2022 at 3:09 PM Junio C Hamano <gitster@pobox.com> wrote: > > Mathias Krause <minipli@grsecurity.net> writes: > > > ... However, from a user's point of view a fall back to > > interpreter mode might still be desired in this case, as a failing > > 'git grep' is simply not acceptable, IMHO. > > "git grep" that silently produces a wrong result (by falling back > after a problem is detected) would not be acceptable, either. except that an error at this point only invalidates the use of JIT, so calling pcre2_jit_match() is invalid but calling pcre2_match() is not. the later is setup to be used later by the code that is added, > Receiving BADOPTION could be a sign that there is something wrong in > the input, not from the end-user but from the code, in which case > stopping with BUG() may be a more appropriate? The way PCRE handles this kind of errors internally is to instruct pcre2_match() to use the interpreter. While a BUG() might be a way to ensure the code is using the right set of options I would expect that the failure will be reported by pcre2_compile instead, with the only cases left, only being interna to PCRE (ex: JIT can't yet support a feature the interpreter has) Carlo ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] grep: fall back to interpreter mode if JIT fails 2022-12-17 2:50 ` Carlo Arenas @ 2022-12-19 9:00 ` Ævar Arnfjörð Bjarmason 2022-12-20 19:29 ` Mathias Krause 0 siblings, 1 reply; 36+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-12-19 9:00 UTC (permalink / raw) To: Carlo Arenas; +Cc: Junio C Hamano, Mathias Krause, git, pcre-dev On Fri, Dec 16 2022, Carlo Arenas wrote: [CC-ing pcre-dev@ for this "future error API" discussion] > On Fri, Dec 16, 2022 at 3:09 PM Junio C Hamano <gitster@pobox.com> wrote: >> >> Mathias Krause <minipli@grsecurity.net> writes: >> >> > ... However, from a user's point of view a fall back to >> > interpreter mode might still be desired in this case, as a failing >> > 'git grep' is simply not acceptable, IMHO. >> >> "git grep" that silently produces a wrong result (by falling back >> after a problem is detected) would not be acceptable, either. > > except that an error at this point only invalidates the use of JIT, > so calling pcre2_jit_match() is invalid but calling pcre2_match() is not. > > the later is setup to be used later by the code that is added, I think we could stumble ahead, but if this were to happen our assumptions about how the API works have been invalidated. The pcre2_jit_compile() doesn't promise to return a finite set of error codes, but: [...]0 for success, or a negative error code otherwise[...] But if new codes were added it's anyone's guess what state we'd be in, so I think the safe thing is to BUG() out if we get as far as pcre2_jit_compile() and don't get either PCRE2_ERROR_JIT_BADOPTION or PCRE2_ERROR_NOMEMORY. >> Receiving BADOPTION could be a sign that there is something wrong in >> the input, not from the end-user but from the code, in which case >> stopping with BUG() may be a more appropriate? > > The way PCRE handles this kind of errors internally is to instruct pcre2_match() > to use the interpreter. > > While a BUG() might be a way to ensure the code is using the right set > of options > I would expect that the failure will be reported by pcre2_compile > instead, with the > only cases left, only being interna to PCRE (ex: JIT can't yet support > a feature the > interpreter has) I agree that it's possible in general that an external library might start returning a "benign" error code that we could recover from, so BUG(...) would be overdoing it. I just don't see that libpcre would do that in this case. In general the JIT supports all patterns that the normal engine does, so if we're past the "is it available?" hump it should work. If it starts not doing that I'd think they'd do a major version upgrade, or opt-in with a new flag etc. Note that it already has a way of checking for "we tried to do the jit thing, but it wasn't on in the end". See the code I added in a25b9085043[1] (grep: fix segfault under -P + PCRE2 <=10.30 + (*NO_JIT), 2017-11-23), which comes right after the pcre2_jit_compile(). So not only would a BUG() biting us here require them to create a new code for the state of "we have the JIT, but can't use it here" (for some reason I can't imagine, as "PCRE2_ERROR_NOMEMORY" is already "overloaded" to mean that). It would also require them to invent a new "soft" failure mode for the JIT, i.e. not the facility added in a25b9085043, where we can use the JIT, but it's not on after all due to a "(*NO_JIT)" in the pattern itself. But I may be wrong, so I've CC'd pcre-dev@ to see if they have any commentary on this proposed API paranoia. For them: The greater context of this thread on the git ML is at [2]. 1. https://github.com/git/git/commit/a25b9085043b8029169b4d5b172b78ca3f38fb37 2. https://lore.kernel.org/git/20221216121557.30714-1-minipli@grsecurity.net/ ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] grep: fall back to interpreter mode if JIT fails 2022-12-19 9:00 ` Ævar Arnfjörð Bjarmason @ 2022-12-20 19:29 ` Mathias Krause 2022-12-20 21:11 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 36+ messages in thread From: Mathias Krause @ 2022-12-20 19:29 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason, Carlo Arenas Cc: Junio C Hamano, git, pcre-dev Am 19.12.22 um 10:00 schrieb Ævar Arnfjörð Bjarmason: > > On Fri, Dec 16 2022, Carlo Arenas wrote: > > [CC-ing pcre-dev@ for this "future error API" discussion] > >> On Fri, Dec 16, 2022 at 3:09 PM Junio C Hamano <gitster@pobox.com> wrote: >>> >>> Mathias Krause <minipli@grsecurity.net> writes: >>> >>>> ... However, from a user's point of view a fall back to >>>> interpreter mode might still be desired in this case, as a failing >>>> 'git grep' is simply not acceptable, IMHO. >>> >>> "git grep" that silently produces a wrong result (by falling back >>> after a problem is detected) would not be acceptable, either. >> >> except that an error at this point only invalidates the use of JIT, >> so calling pcre2_jit_match() is invalid but calling pcre2_match() is not. >> >> the later is setup to be used later by the code that is added, > > I think we could stumble ahead, but if this were to happen our > assumptions about how the API works have been invalidated. Well, pcre2_jit_compile() might fail for internal reasons, e.g. pcre2jit(3) states: "[...] If a pattern is too big, a call to pcre2_jit_compile() returns PCRE2_ERROR_NOMEMORY." For example, the following fails for me: $ git grep -P "$(perl -e 'print "(.)" x 4000')" -- grep.c fatal: Couldn't JIT the PCRE2 pattern '(.)(.)(.)(.)… But explicitly disabling JIT makes it "work": $ git grep -P "(*NO_JIT)$(perl -e 'print "(.)" x 4000')" -- grep.c $ It's a made up example and might even be intended behavior by git, but it also proves a point Carlo already mentioned, a failing call to pcre2_jit_compile() only invalidates the use of the JIT engine. We can still use and fall back to the interpreter. It would be used anyway if PCRE2 was compiled without JIT support, so I don't see any issues with falling back to interpreter mode if the JIT compilation fails -- for whatever reason. > The pcre2_jit_compile() doesn't promise to return a finite set of error > codes, but: > > [...]0 for success, or a negative error code otherwise[...] > > But if new codes were added it's anyone's guess what state we'd be in, > so I think the safe thing is to BUG() out if we get as far as > pcre2_jit_compile() and don't get either PCRE2_ERROR_JIT_BADOPTION or > PCRE2_ERROR_NOMEMORY. But why BUG()? JIT is an optimization that might fail for PCRE2 internal reasons. Why should we make 'git grep' fail too in this case when we can handle it just fine by attempting to use the interpreter? If the pattern is really bogus, the interpreter will complain as well and we'll error out. But failing just because the JIT engine can't handle the pattern? Doesn't sound right to me. >>> Receiving BADOPTION could be a sign that there is something wrong in >>> the input, not from the end-user but from the code, in which case >>> stopping with BUG() may be a more appropriate? >> >> The way PCRE handles this kind of errors internally is to instruct pcre2_match() >> to use the interpreter. >> >> While a BUG() might be a way to ensure the code is using the right set >> of options >> I would expect that the failure will be reported by pcre2_compile >> instead, with the >> only cases left, only being interna to PCRE (ex: JIT can't yet support >> a feature the >> interpreter has) > > I agree that it's possible in general that an external library might > start returning a "benign" error code that we could recover from, so > BUG(...) would be overdoing it. And I think that's the case here: JIT is an optimization that might not be available under all circumstances, as, for example, under SELinux's 'deny_execmem' setting. So we need to have a backup plan for such systems anyway. Why not always try to use the interpreter if JIT compilation fails? > I just don't see that libpcre would do that in this case. In general the > JIT supports all patterns that the normal engine does, so if we're past > the "is it available?" hump it should work. Leaving my above example aside, showing that JIT cannot handle very long patterns the interpreter mode does, it is, in fact, unfortunate that libpcre2 announces JIT support when it could know better that it does not. In fact, I also try to fix this misbehavior in PCRE2/SLJIT[1,2], but the maintainer thinks it's a bug in git which should be fixed by handling errors from pcre2_jit_compile() and falling back to interpreter mode[3]. While I still think this can pretty much be fixed in PCRE2 alone, handling possible PCRE2 JIT errors in git and falling back to interpreter mode is still a sensible thing to do. [1] https://github.com/PCRE2Project/pcre2/pull/157 [2] https://github.com/zherczeg/sljit/pull/136 [3] https://github.com/zherczeg/sljit/pull/136#issuecomment-1307254018 > If it starts not doing that I'd think they'd do a major version upgrade, > or opt-in with a new flag etc. > > Note that it already has a way of checking for "we tried to do the jit > thing, but it wasn't on in the end". See the code I added in > a25b9085043[1] (grep: fix segfault under -P + PCRE2 <=10.30 + (*NO_JIT), > 2017-11-23), which comes right after the pcre2_jit_compile(). Yeah, that's required as well, unfortunately. :/ > So not only would a BUG() biting us here require them to create a new > code for the state of "we have the JIT, but can't use it here" (for some > reason I can't imagine, as "PCRE2_ERROR_NOMEMORY" is already > "overloaded" to mean that). > > It would also require them to invent a new "soft" failure mode for the > JIT, i.e. not the facility added in a25b9085043, where we can use the > JIT, but it's not on after all due to a "(*NO_JIT)" in the pattern > itself. We should really treat PCRE2 JIT as an *optional* optimization that might not be available for certain cases. For these we should, IMHO, simply use the interpreter mode, instead of bugging users with a BUG() / die(). > But I may be wrong, so I've CC'd pcre-dev@ to see if they have any > commentary on this proposed API paranoia. For them: The greater context > of this thread on the git ML is at [2]. > > 1. https://github.com/git/git/commit/a25b9085043b8029169b4d5b172b78ca3f38fb37 > 2. https://lore.kernel.org/git/20221216121557.30714-1-minipli@grsecurity.net/ Thanks, Mathias ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] grep: fall back to interpreter mode if JIT fails 2022-12-20 19:29 ` Mathias Krause @ 2022-12-20 21:11 ` Ævar Arnfjörð Bjarmason 2023-01-18 14:22 ` Mathias Krause 0 siblings, 1 reply; 36+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-12-20 21:11 UTC (permalink / raw) To: Mathias Krause; +Cc: Carlo Arenas, Junio C Hamano, git On Tue, Dec 20 2022, Mathias Krause wrote: [De-CC-ing pcre-dev@, since this part is all git-specific] > Am 19.12.22 um 10:00 schrieb Ævar Arnfjörð Bjarmason: >> >> On Fri, Dec 16 2022, Carlo Arenas wrote: >> >> [CC-ing pcre-dev@ for this "future error API" discussion] >> >>> On Fri, Dec 16, 2022 at 3:09 PM Junio C Hamano <gitster@pobox.com> wrote: >>>> >>>> Mathias Krause <minipli@grsecurity.net> writes: >>>> >>>>> ... However, from a user's point of view a fall back to >>>>> interpreter mode might still be desired in this case, as a failing >>>>> 'git grep' is simply not acceptable, IMHO. >>>> >>>> "git grep" that silently produces a wrong result (by falling back >>>> after a problem is detected) would not be acceptable, either. >>> >>> except that an error at this point only invalidates the use of JIT, >>> so calling pcre2_jit_match() is invalid but calling pcre2_match() is not. >>> >>> the later is setup to be used later by the code that is added, >> >> I think we could stumble ahead, but if this were to happen our >> assumptions about how the API works have been invalidated. > > Well, pcre2_jit_compile() might fail for internal reasons, e.g. > pcre2jit(3) states: "[...] If a pattern is too big, a call to > pcre2_jit_compile() returns PCRE2_ERROR_NOMEMORY." > > For example, the following fails for me: > $ git grep -P "$(perl -e 'print "(.)" x 4000')" -- grep.c > fatal: Couldn't JIT the PCRE2 pattern '(.)(.)(.)(.)… > > But explicitly disabling JIT makes it "work": > $ git grep -P "(*NO_JIT)$(perl -e 'print "(.)" x 4000')" -- grep.c > $ > > It's a made up example and might even be intended behavior by git, but > it also proves a point Carlo already mentioned, a failing call to > pcre2_jit_compile() only invalidates the use of the JIT engine. We can > still use and fall back to the interpreter. We should arguably do this, I hadn't bothered because I haven't been able to find anything except pathological patterns where it matters, and silently falling back in those cases will suck a lot more IMO. If you are using such a pathological pattern it's almost always a better idea to adjust your crazy pattern. So I think in the *general* case we really should just keep this, and *maybe* suggest the user try with (*NO_JIT) in the pattern. But silently falling back kind of sucks, but unfortunately pcre2 doesn't provide a way to say "failed because of SELinux" v.s. "failed because the pattern is crazy", except that we could try to compile a known-good pattern with the JIT, to disambiguate the two. Anyway, if this is your goal you should really lead with that, not with fixing a relatively obscure SELinux edge case... > It would be used anyway if PCRE2 was compiled without JIT support, so I > don't see any issues with falling back to interpreter mode if the JIT > compilation fails -- for whatever reason. It's the "for whatever reason" that I take issue with. We'd be in an unknown state with the API behaving differently than we expect, and returning unknown codes. That's different than the *known* error codes (e.g. "no memory", oven though it's meaning is apparently overloaded to the point of near-uselessness). >> The pcre2_jit_compile() doesn't promise to return a finite set of error >> codes, but: >> >> [...]0 for success, or a negative error code otherwise[...] >> >> But if new codes were added it's anyone's guess what state we'd be in, >> so I think the safe thing is to BUG() out if we get as far as >> pcre2_jit_compile() and don't get either PCRE2_ERROR_JIT_BADOPTION or >> PCRE2_ERROR_NOMEMORY. > > But why BUG()? JIT is an optimization that might fail for PCRE2 internal > reasons. Why should we make 'git grep' fail too in this case when we can > handle it just fine by attempting to use the interpreter? > > If the pattern is really bogus, the interpreter will complain as well > and we'll error out. But failing just because the JIT engine can't > handle the pattern? Doesn't sound right to me. See above, we're failing because our assumptions about how to use the API have broken down at that point. We usually bug out in those cases. >>>> Receiving BADOPTION could be a sign that there is something wrong in >>>> the input, not from the end-user but from the code, in which case >>>> stopping with BUG() may be a more appropriate? >>> >>> The way PCRE handles this kind of errors internally is to instruct pcre2_match() >>> to use the interpreter. >>> >>> While a BUG() might be a way to ensure the code is using the right set >>> of options >>> I would expect that the failure will be reported by pcre2_compile >>> instead, with the >>> only cases left, only being interna to PCRE (ex: JIT can't yet support >>> a feature the >>> interpreter has) >> >> I agree that it's possible in general that an external library might >> start returning a "benign" error code that we could recover from, so >> BUG(...) would be overdoing it. > > And I think that's the case here: JIT is an optimization that might not > be available under all circumstances, as, for example, under SELinux's > 'deny_execmem' setting. So we need to have a backup plan for such > systems anyway. Why not always try to use the interpreter if JIT > compilation fails? See above, but maybe it's the least sucky thing (and definitely simpler). I'm mainly checking that we're doing that we want here, and that we're going into it with eyes open. That we're now discussing a topic entirely different from SELinux on a thread where we're (according to the commit message) fixing pcre2 where the JIT is "unusable on such systems" is my main concern here. >> So not only would a BUG() biting us here require them to create a new >> code for the state of "we have the JIT, but can't use it here" (for some >> reason I can't imagine, as "PCRE2_ERROR_NOMEMORY" is already >> "overloaded" to mean that). >> >> It would also require them to invent a new "soft" failure mode for the >> JIT, i.e. not the facility added in a25b9085043, where we can use the >> JIT, but it's not on after all due to a "(*NO_JIT)" in the pattern >> itself. > > We should really treat PCRE2 JIT as an *optional* optimization that > might not be available for certain cases. For these we should, IMHO, > simply use the interpreter mode, instead of bugging users with a BUG() / > die(). To summarize some of the above, I think performance also matters, we have cases where: A. We could use the non-JIT B. We could use the JIT, and it's a *lot* faster C. We can't use the JIT at all D. We can't use the JIT because we run into its limits I think it's fair to die on "D" as in practice you only (I think!) run into it on pathological patterns, but yes, another option would be to fall back to "A". But thinking you're doing "B" and not wanting to implicitly fall back to "A" is also a valid use-case. So I'm inclined to suggest that we should be less helpful with automatic fallbacks, and just suggest a "try it with '(*NO_JIT)'" advice() or something. But as noted above needing to always disable an apparently "available" JIT on some systems (SELinux) does throw a monkey wrench into that particular suggestion :( So I'm not sure, I'm mainly trying to encourage you to think through the edge cases, and to summarize the full impact of the change in a re-roll. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] grep: fall back to interpreter mode if JIT fails 2022-12-20 21:11 ` Ævar Arnfjörð Bjarmason @ 2023-01-18 14:22 ` Mathias Krause 2023-01-18 15:44 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 36+ messages in thread From: Mathias Krause @ 2023-01-18 14:22 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Carlo Arenas, Junio C Hamano, git On 20.12.22 22:11, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Dec 20 2022, Mathias Krause wrote: > > [De-CC-ing pcre-dev@, since this part is all git-specific] > >> Am 19.12.22 um 10:00 schrieb Ævar Arnfjörð Bjarmason: >>> >>> On Fri, Dec 16 2022, Carlo Arenas wrote: >>> >>> [CC-ing pcre-dev@ for this "future error API" discussion] >>> >>>> On Fri, Dec 16, 2022 at 3:09 PM Junio C Hamano <gitster@pobox.com> wrote: >>>>> >>>>> Mathias Krause <minipli@grsecurity.net> writes: >>>>> >>>>>> ... However, from a user's point of view a fall back to >>>>>> interpreter mode might still be desired in this case, as a failing >>>>>> 'git grep' is simply not acceptable, IMHO. >>>>> >>>>> "git grep" that silently produces a wrong result (by falling back >>>>> after a problem is detected) would not be acceptable, either. >>>> >>>> except that an error at this point only invalidates the use of JIT, >>>> so calling pcre2_jit_match() is invalid but calling pcre2_match() is not. >>>> >>>> the later is setup to be used later by the code that is added, >>> >>> I think we could stumble ahead, but if this were to happen our >>> assumptions about how the API works have been invalidated. >> >> Well, pcre2_jit_compile() might fail for internal reasons, e.g. >> pcre2jit(3) states: "[...] If a pattern is too big, a call to >> pcre2_jit_compile() returns PCRE2_ERROR_NOMEMORY." >> >> For example, the following fails for me: >> $ git grep -P "$(perl -e 'print "(.)" x 4000')" -- grep.c >> fatal: Couldn't JIT the PCRE2 pattern '(.)(.)(.)(.)… >> >> But explicitly disabling JIT makes it "work": >> $ git grep -P "(*NO_JIT)$(perl -e 'print "(.)" x 4000')" -- grep.c >> $ >> >> It's a made up example and might even be intended behavior by git, but >> it also proves a point Carlo already mentioned, a failing call to >> pcre2_jit_compile() only invalidates the use of the JIT engine. We can >> still use and fall back to the interpreter. > > We should arguably do this, I hadn't bothered because I haven't been > able to find anything except pathological patterns where it matters, and > silently falling back in those cases will suck a lot more IMO. > > If you are using such a pathological pattern it's almost always a better > idea to adjust your crazy pattern. > > So I think in the *general* case we really should just keep this, and > *maybe* suggest the user try with (*NO_JIT) in the pattern. Except for the case I'm trying to address, where we simply cannot detect *why* the JIT is failing from the error code alone. It'll error out with PCRE2_ERROR_NOMEMORY for the case where the JIT fails to allocate W|X memory as well as for the case where the pattern is crazy. > But silently falling back kind of sucks, but unfortunately pcre2 doesn't > provide a way to say "failed because of SELinux" v.s. "failed because > the pattern is crazy", except that we could try to compile a known-good > pattern with the JIT, to disambiguate the two. Exactly, so what about something like this: If JIT is generally available, try to JIT the user supplied pattern: 1/ If it fails with PCRE2_ERROR_NOMEMORY, try to compile a know good pattern, e.g. ".": 1a/ If this fails with PCRE2_ERROR_NOMEMORY as well, fall back to interpreter, as JIT is non-functional because of SELinux / PaX. 1b/ If not, it's a "crazy" pattern, suggest '(*NO_JIT)'. 2/ If it succeeds or fails with a different error, continue as of now, i.e. use JIT on success or die() on error, optionally suggesting '(*NO_JIT)'. That should handle the case you're concerned about and only fall back to interpreter mode if JIT won't be functional anyway. Admitted, this would also allow crazy patterns, but there's simply no way to detect these under such constraints. > Anyway, if this is your goal you should really lead with that, not with > fixing a relatively obscure SELinux edge case... It's just that I'm suffering from that "obscure SELinux edge case", just not under SELinux but PaX MPROTECT. I only mentioned SELinux to state, it's not only an issue for PaX but regular systems as well. So it's not a hypothetical case I like to get handled, but my work environment to get usable again. >> It would be used anyway if PCRE2 was compiled without JIT support, so I >> don't see any issues with falling back to interpreter mode if the JIT >> compilation fails -- for whatever reason. > > It's the "for whatever reason" that I take issue with. We'd be in an > unknown state with the API behaving differently than we expect, and > returning unknown codes. That's different than the *known* error codes > (e.g. "no memory", oven though it's meaning is apparently overloaded to > the point of near-uselessness). > >>> The pcre2_jit_compile() doesn't promise to return a finite set of error >>> codes, but: >>> >>> [...]0 for success, or a negative error code otherwise[...] >>> >>> But if new codes were added it's anyone's guess what state we'd be in, >>> so I think the safe thing is to BUG() out if we get as far as >>> pcre2_jit_compile() and don't get either PCRE2_ERROR_JIT_BADOPTION or >>> PCRE2_ERROR_NOMEMORY. >> >> But why BUG()? JIT is an optimization that might fail for PCRE2 internal >> reasons. Why should we make 'git grep' fail too in this case when we can >> handle it just fine by attempting to use the interpreter? >> >> If the pattern is really bogus, the interpreter will complain as well >> and we'll error out. But failing just because the JIT engine can't >> handle the pattern? Doesn't sound right to me. > > See above, we're failing because our assumptions about how to use the > API have broken down at that point. We usually bug out in those cases. > >>>>> Receiving BADOPTION could be a sign that there is something wrong in >>>>> the input, not from the end-user but from the code, in which case >>>>> stopping with BUG() may be a more appropriate? >>>> >>>> The way PCRE handles this kind of errors internally is to instruct pcre2_match() >>>> to use the interpreter. >>>> >>>> While a BUG() might be a way to ensure the code is using the right set >>>> of options >>>> I would expect that the failure will be reported by pcre2_compile >>>> instead, with the >>>> only cases left, only being interna to PCRE (ex: JIT can't yet support >>>> a feature the >>>> interpreter has) >>> >>> I agree that it's possible in general that an external library might >>> start returning a "benign" error code that we could recover from, so >>> BUG(...) would be overdoing it. >> >> And I think that's the case here: JIT is an optimization that might not >> be available under all circumstances, as, for example, under SELinux's >> 'deny_execmem' setting. So we need to have a backup plan for such >> systems anyway. Why not always try to use the interpreter if JIT >> compilation fails? > > See above, but maybe it's the least sucky thing (and definitely > simpler). I'm mainly checking that we're doing that we want here, and > that we're going into it with eyes open. > > That we're now discussing a topic entirely different from SELinux on a > thread where we're (according to the commit message) fixing pcre2 where > the JIT is "unusable on such systems" is my main concern here. Yeah, I overlooked that angle initially, but it's a valid concern. However, limiting the functional change of falling back to interpreter mode on "JIT's broken anyway" systems only should address these and get me a functional 'git grep' again. >>> So not only would a BUG() biting us here require them to create a new >>> code for the state of "we have the JIT, but can't use it here" (for some >>> reason I can't imagine, as "PCRE2_ERROR_NOMEMORY" is already >>> "overloaded" to mean that). >>> >>> It would also require them to invent a new "soft" failure mode for the >>> JIT, i.e. not the facility added in a25b9085043, where we can use the >>> JIT, but it's not on after all due to a "(*NO_JIT)" in the pattern >>> itself. >> >> We should really treat PCRE2 JIT as an *optional* optimization that >> might not be available for certain cases. For these we should, IMHO, >> simply use the interpreter mode, instead of bugging users with a BUG() / >> die(). > > To summarize some of the above, I think performance also matters, we > have cases where: > > A. We could use the non-JIT > B. We could use the JIT, and it's a *lot* faster > C. We can't use the JIT at all > D. We can't use the JIT because we run into its limits > > I think it's fair to die on "D" as in practice you only (I think!) run > into it on pathological patterns, but yes, another option would be to > fall back to "A". > > But thinking you're doing "B" and not wanting to implicitly fall back to > "A" is also a valid use-case. Agreed. My above sketched handling should do that, as in not falling back to interpreter mode when the JIT would be functional per se, but just failed on this particular pattern. > So I'm inclined to suggest that we should be less helpful with automatic > fallbacks, and just suggest a "try it with '(*NO_JIT)'" advice() or > something. Well, that's a real pain from a user's (my) point of view. Trust me, I'm suffering from this, otherwise I wouldn't have brought up the issue ;) > But as noted above needing to always disable an apparently "available" > JIT on some systems (SELinux) does throw a monkey wrench into that > particular suggestion :( Yep. > So I'm not sure, I'm mainly trying to encourage you to think through the > edge cases, and to summarize the full impact of the change in a re-roll. Yeah, I agree. The implied fallback to the interpreter, even for the more obscure cases should have been mentioned in the changelog, but I overlooked / ignored that case initially. My take from the discussion is to do a re-roll with something like this: if (p->pcre2_jit_on) { jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE); if (jitret == PCRE2_ERROR_NOMEMORY && !pcre2_jit_functional()) { /* JIT's non-functional because of SELinux / PaX */ p->pcre2_jit_on = 0; return; } else if (jitret) { die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n" "Try prefixing the pattern with '(*NO_JIT)'\n", p->pattern, jitret); } ... } ...with pcre2_jit_functional() being something like this: static int pcre2_jit_functional(void) { pcre2_code *code; size_t off; int err; code = pcre2_compile((PCRE2_SPTR)".", 1, 0, &err, &off, NULL); if (!code) return 0; err = pcre2_jit_compile(code, PCRE2_JIT_COMPLETE); pcre2_code_free(code); return err == 0; } Thanks, Mathias ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] grep: fall back to interpreter mode if JIT fails 2023-01-18 14:22 ` Mathias Krause @ 2023-01-18 15:44 ` Ævar Arnfjörð Bjarmason 2023-01-19 9:19 ` Mathias Krause 0 siblings, 1 reply; 36+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2023-01-18 15:44 UTC (permalink / raw) To: Mathias Krause; +Cc: Carlo Arenas, Junio C Hamano, git On Wed, Jan 18 2023, Mathias Krause wrote: > On 20.12.22 22:11, Ævar Arnfjörð Bjarmason wrote: >> >> On Tue, Dec 20 2022, Mathias Krause wrote: >> >> [De-CC-ing pcre-dev@, since this part is all git-specific] >> >>> Am 19.12.22 um 10:00 schrieb Ævar Arnfjörð Bjarmason: >>>> >>>> On Fri, Dec 16 2022, Carlo Arenas wrote: >>>> >>>> [CC-ing pcre-dev@ for this "future error API" discussion] >>>> >>>>> On Fri, Dec 16, 2022 at 3:09 PM Junio C Hamano <gitster@pobox.com> wrote: >>>>>> >>>>>> Mathias Krause <minipli@grsecurity.net> writes: >>>>>> >>>>>>> ... However, from a user's point of view a fall back to >>>>>>> interpreter mode might still be desired in this case, as a failing >>>>>>> 'git grep' is simply not acceptable, IMHO. >>>>>> >>>>>> "git grep" that silently produces a wrong result (by falling back >>>>>> after a problem is detected) would not be acceptable, either. >>>>> >>>>> except that an error at this point only invalidates the use of JIT, >>>>> so calling pcre2_jit_match() is invalid but calling pcre2_match() is not. >>>>> >>>>> the later is setup to be used later by the code that is added, >>>> >>>> I think we could stumble ahead, but if this were to happen our >>>> assumptions about how the API works have been invalidated. >>> >>> Well, pcre2_jit_compile() might fail for internal reasons, e.g. >>> pcre2jit(3) states: "[...] If a pattern is too big, a call to >>> pcre2_jit_compile() returns PCRE2_ERROR_NOMEMORY." >>> >>> For example, the following fails for me: >>> $ git grep -P "$(perl -e 'print "(.)" x 4000')" -- grep.c >>> fatal: Couldn't JIT the PCRE2 pattern '(.)(.)(.)(.)… >>> >>> But explicitly disabling JIT makes it "work": >>> $ git grep -P "(*NO_JIT)$(perl -e 'print "(.)" x 4000')" -- grep.c >>> $ >>> >>> It's a made up example and might even be intended behavior by git, but >>> it also proves a point Carlo already mentioned, a failing call to >>> pcre2_jit_compile() only invalidates the use of the JIT engine. We can >>> still use and fall back to the interpreter. >> >> We should arguably do this, I hadn't bothered because I haven't been >> able to find anything except pathological patterns where it matters, and >> silently falling back in those cases will suck a lot more IMO. >> >> If you are using such a pathological pattern it's almost always a better >> idea to adjust your crazy pattern. >> >> So I think in the *general* case we really should just keep this, and >> *maybe* suggest the user try with (*NO_JIT) in the pattern. > > Except for the case I'm trying to address, where we simply cannot detect > *why* the JIT is failing from the error code alone. It'll error out with > PCRE2_ERROR_NOMEMORY for the case where the JIT fails to allocate W|X > memory as well as for the case where the pattern is crazy. *nod*, though I see you saw that I addressed that below. >> But silently falling back kind of sucks, but unfortunately pcre2 doesn't >> provide a way to say "failed because of SELinux" v.s. "failed because >> the pattern is crazy", except that we could try to compile a known-good >> pattern with the JIT, to disambiguate the two. > > Exactly, so what about something like this: > > If JIT is generally available, try to JIT the user supplied pattern: > 1/ If it fails with PCRE2_ERROR_NOMEMORY, try to compile a know good > pattern, e.g. ".": > 1a/ If this fails with PCRE2_ERROR_NOMEMORY as well, fall back to > interpreter, as JIT is non-functional because of SELinux / PaX. > 1b/ If not, it's a "crazy" pattern, suggest '(*NO_JIT)'. > 2/ If it succeeds or fails with a different error, continue as of now, > i.e. use JIT on success or die() on error, optionally suggesting > '(*NO_JIT)'. > > That should handle the case you're concerned about and only fall back to > interpreter mode if JIT won't be functional anyway. Admitted, this would > also allow crazy patterns, but there's simply no way to detect these > under such constraints. That sounds good, i.e. we could narrow the JIT falling back case to these SELinux cases and the like, distinct from generic internal errors. Maybe it's too much paranoia, but it should work & get rid of the ambiguity. >> Anyway, if this is your goal you should really lead with that, not with >> fixing a relatively obscure SELinux edge case... > > It's just that I'm suffering from that "obscure SELinux edge case", just > not under SELinux but PaX MPROTECT. I only mentioned SELinux to state, > it's not only an issue for PaX but regular systems as well. So it's not > a hypothetical case I like to get handled, but my work environment to > get usable again. Right, I didn't mean to imply it was, just that for the rest of us the more general effect of this change outside of SELinux is of more general interest. >>> It would be used anyway if PCRE2 was compiled without JIT support, so I >>> don't see any issues with falling back to interpreter mode if the JIT >>> compilation fails -- for whatever reason. >> >> It's the "for whatever reason" that I take issue with. We'd be in an >> unknown state with the API behaving differently than we expect, and >> returning unknown codes. That's different than the *known* error codes >> (e.g. "no memory", oven though it's meaning is apparently overloaded to >> the point of near-uselessness). >> >>>> The pcre2_jit_compile() doesn't promise to return a finite set of error >>>> codes, but: >>>> >>>> [...]0 for success, or a negative error code otherwise[...] >>>> >>>> But if new codes were added it's anyone's guess what state we'd be in, >>>> so I think the safe thing is to BUG() out if we get as far as >>>> pcre2_jit_compile() and don't get either PCRE2_ERROR_JIT_BADOPTION or >>>> PCRE2_ERROR_NOMEMORY. >>> >>> But why BUG()? JIT is an optimization that might fail for PCRE2 internal >>> reasons. Why should we make 'git grep' fail too in this case when we can >>> handle it just fine by attempting to use the interpreter? >>> >>> If the pattern is really bogus, the interpreter will complain as well >>> and we'll error out. But failing just because the JIT engine can't >>> handle the pattern? Doesn't sound right to me. >> >> See above, we're failing because our assumptions about how to use the >> API have broken down at that point. We usually bug out in those cases. >> >>>>>> Receiving BADOPTION could be a sign that there is something wrong in >>>>>> the input, not from the end-user but from the code, in which case >>>>>> stopping with BUG() may be a more appropriate? >>>>> >>>>> The way PCRE handles this kind of errors internally is to instruct pcre2_match() >>>>> to use the interpreter. >>>>> >>>>> While a BUG() might be a way to ensure the code is using the right set >>>>> of options >>>>> I would expect that the failure will be reported by pcre2_compile >>>>> instead, with the >>>>> only cases left, only being interna to PCRE (ex: JIT can't yet support >>>>> a feature the >>>>> interpreter has) >>>> >>>> I agree that it's possible in general that an external library might >>>> start returning a "benign" error code that we could recover from, so >>>> BUG(...) would be overdoing it. >>> >>> And I think that's the case here: JIT is an optimization that might not >>> be available under all circumstances, as, for example, under SELinux's >>> 'deny_execmem' setting. So we need to have a backup plan for such >>> systems anyway. Why not always try to use the interpreter if JIT >>> compilation fails? >> >> See above, but maybe it's the least sucky thing (and definitely >> simpler). I'm mainly checking that we're doing that we want here, and >> that we're going into it with eyes open. >> >> That we're now discussing a topic entirely different from SELinux on a >> thread where we're (according to the commit message) fixing pcre2 where >> the JIT is "unusable on such systems" is my main concern here. > > Yeah, I overlooked that angle initially, but it's a valid concern. > However, limiting the functional change of falling back to interpreter > mode on "JIT's broken anyway" systems only should address these and get > me a functional 'git grep' again. *nod* >>>> So not only would a BUG() biting us here require them to create a new >>>> code for the state of "we have the JIT, but can't use it here" (for some >>>> reason I can't imagine, as "PCRE2_ERROR_NOMEMORY" is already >>>> "overloaded" to mean that). >>>> >>>> It would also require them to invent a new "soft" failure mode for the >>>> JIT, i.e. not the facility added in a25b9085043, where we can use the >>>> JIT, but it's not on after all due to a "(*NO_JIT)" in the pattern >>>> itself. >>> >>> We should really treat PCRE2 JIT as an *optional* optimization that >>> might not be available for certain cases. For these we should, IMHO, >>> simply use the interpreter mode, instead of bugging users with a BUG() / >>> die(). >> >> To summarize some of the above, I think performance also matters, we >> have cases where: >> >> A. We could use the non-JIT >> B. We could use the JIT, and it's a *lot* faster >> C. We can't use the JIT at all >> D. We can't use the JIT because we run into its limits >> >> I think it's fair to die on "D" as in practice you only (I think!) run >> into it on pathological patterns, but yes, another option would be to >> fall back to "A". >> >> But thinking you're doing "B" and not wanting to implicitly fall back to >> "A" is also a valid use-case. > > Agreed. My above sketched handling should do that, as in not falling > back to interpreter mode when the JIT would be functional per se, but > just failed on this particular pattern. > >> So I'm inclined to suggest that we should be less helpful with automatic >> fallbacks, and just suggest a "try it with '(*NO_JIT)'" advice() or >> something. > > Well, that's a real pain from a user's (my) point of view. Trust me, I'm > suffering from this, otherwise I wouldn't have brought up the issue ;) That's fair enough, falling back in the "D" case sounds good. >> But as noted above needing to always disable an apparently "available" >> JIT on some systems (SELinux) does throw a monkey wrench into that >> particular suggestion :( > > Yep. > >> So I'm not sure, I'm mainly trying to encourage you to think through the >> edge cases, and to summarize the full impact of the change in a re-roll. > > Yeah, I agree. The implied fallback to the interpreter, even for the > more obscure cases should have been mentioned in the changelog, but I > overlooked / ignored that case initially. > > My take from the discussion is to do a re-roll with something like this: > > if (p->pcre2_jit_on) { > jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE); > if (jitret == PCRE2_ERROR_NOMEMORY && !pcre2_jit_functional()) { > /* JIT's non-functional because of SELinux / PaX */ > p->pcre2_jit_on = 0; > return; > } else if (jitret) { > die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n" > "Try prefixing the pattern with '(*NO_JIT)'\n", > p->pattern, jitret); > } > ... > } > > ...with pcre2_jit_functional() being something like this: > > static int pcre2_jit_functional(void) > { > pcre2_code *code; > size_t off; > int err; > > code = pcre2_compile((PCRE2_SPTR)".", 1, 0, &err, &off, NULL); > if (!code) > return 0; > > err = pcre2_jit_compile(code, PCRE2_JIT_COMPLETE); > pcre2_code_free(code); > > return err == 0; > } This seems sensible as pseudocode, although please don't add another pcre2_compile() entry point (as e.g. passing NULL here will bypass the context we carefully set up, and if we have a custom allocator...). Instead you could just re-enter the API itself via compile_grep_patterns(), or perhaps lower via compile_pcre2_pattern(). Then add some flag to "struct grep_opt" to indicate that we shouldn't die() or BUG() there, but just run a "jit test". This code also treats all failures of pcre2_jit_compile() the same, but here we only want PCRE2_ERROR_NOMEMORY. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] grep: fall back to interpreter mode if JIT fails 2023-01-18 15:44 ` Ævar Arnfjörð Bjarmason @ 2023-01-19 9:19 ` Mathias Krause 0 siblings, 0 replies; 36+ messages in thread From: Mathias Krause @ 2023-01-19 9:19 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Carlo Arenas, Junio C Hamano, git On 18.01.23 16:44, Ævar Arnfjörð Bjarmason wrote: >>> [snip] >>> But silently falling back kind of sucks, but unfortunately pcre2 doesn't >>> provide a way to say "failed because of SELinux" v.s. "failed because >>> the pattern is crazy", except that we could try to compile a known-good >>> pattern with the JIT, to disambiguate the two. >> >> Exactly, so what about something like this: >> >> If JIT is generally available, try to JIT the user supplied pattern: >> 1/ If it fails with PCRE2_ERROR_NOMEMORY, try to compile a know good >> pattern, e.g. ".": >> 1a/ If this fails with PCRE2_ERROR_NOMEMORY as well, fall back to >> interpreter, as JIT is non-functional because of SELinux / PaX. >> 1b/ If not, it's a "crazy" pattern, suggest '(*NO_JIT)'. >> 2/ If it succeeds or fails with a different error, continue as of now, >> i.e. use JIT on success or die() on error, optionally suggesting >> '(*NO_JIT)'. >> >> That should handle the case you're concerned about and only fall back to >> interpreter mode if JIT won't be functional anyway. Admitted, this would >> also allow crazy patterns, but there's simply no way to detect these >> under such constraints. > > That sounds good, i.e. we could narrow the JIT falling back case to > these SELinux cases and the like, distinct from generic internal errors. > > Maybe it's too much paranoia, but it should work & get rid of the > ambiguity. Nah, it's fine. >>> [snip] >>> See above, but maybe it's the least sucky thing (and definitely >>> simpler). I'm mainly checking that we're doing that we want here, and >>> that we're going into it with eyes open. >>> >>> That we're now discussing a topic entirely different from SELinux on a >>> thread where we're (according to the commit message) fixing pcre2 where >>> the JIT is "unusable on such systems" is my main concern here. >> >> Yeah, I overlooked that angle initially, but it's a valid concern. >> However, limiting the functional change of falling back to interpreter >> mode on "JIT's broken anyway" systems only should address these and get >> me a functional 'git grep' again. > > *nod* > >>>>> [snip] >>> >>> To summarize some of the above, I think performance also matters, we >>> have cases where: >>> >>> A. We could use the non-JIT >>> B. We could use the JIT, and it's a *lot* faster >>> C. We can't use the JIT at all >>> D. We can't use the JIT because we run into its limits >>> >>> I think it's fair to die on "D" as in practice you only (I think!) run >>> into it on pathological patterns, but yes, another option would be to >>> fall back to "A". >>> >>> But thinking you're doing "B" and not wanting to implicitly fall back to >>> "A" is also a valid use-case. >> >> Agreed. My above sketched handling should do that, as in not falling >> back to interpreter mode when the JIT would be functional per se, but >> just failed on this particular pattern. >> >>> So I'm inclined to suggest that we should be less helpful with automatic >>> fallbacks, and just suggest a "try it with '(*NO_JIT)'" advice() or >>> something. >> >> Well, that's a real pain from a user's (my) point of view. Trust me, I'm >> suffering from this, otherwise I wouldn't have brought up the issue ;) > > That's fair enough, falling back in the "D" case sounds good. > >>> But as noted above needing to always disable an apparently "available" >>> JIT on some systems (SELinux) does throw a monkey wrench into that >>> particular suggestion :( >> >> Yep. >> >>> So I'm not sure, I'm mainly trying to encourage you to think through the >>> edge cases, and to summarize the full impact of the change in a re-roll. >> >> Yeah, I agree. The implied fallback to the interpreter, even for the >> more obscure cases should have been mentioned in the changelog, but I >> overlooked / ignored that case initially. >> >> My take from the discussion is to do a re-roll with something like this: >> >> if (p->pcre2_jit_on) { >> jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE); >> if (jitret == PCRE2_ERROR_NOMEMORY && !pcre2_jit_functional()) { >> /* JIT's non-functional because of SELinux / PaX */ >> p->pcre2_jit_on = 0; >> return; >> } else if (jitret) { >> die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n" >> "Try prefixing the pattern with '(*NO_JIT)'\n", >> p->pattern, jitret); >> } >> ... >> } >> >> ...with pcre2_jit_functional() being something like this: >> >> static int pcre2_jit_functional(void) >> { >> pcre2_code *code; >> size_t off; >> int err; >> >> code = pcre2_compile((PCRE2_SPTR)".", 1, 0, &err, &off, NULL); >> if (!code) >> return 0; >> >> err = pcre2_jit_compile(code, PCRE2_JIT_COMPLETE); >> pcre2_code_free(code); >> >> return err == 0; >> } > > This seems sensible as pseudocode, although please don't add another > pcre2_compile() entry point (as e.g. passing NULL here will bypass the > context we carefully set up, and if we have a custom allocator...). That was intentional, actually. I specifically want to use the most basic API to test for "general JIT availability." That test should tell if PCRE2 JIT is working /in general/, not specifically how we make use of it in git, which might not. However, that would be a different error, i.e. not because PCRE2 failed to allocate JIT memory but us using the API wrong, hitting limitations, etc. Making use of the PCRE2 context we set up in compile_pcre2_pattern() (and thereby possibly generating errors because of it) could mask API usage errors resulting from these and may lead to a false fallback to interpreter mode when we want to die() instead? > Instead you could just re-enter the API itself via > compile_grep_patterns(), or perhaps lower via > compile_pcre2_pattern(). Then add some flag to "struct grep_opt" to > indicate that we shouldn't die() or BUG() there, but just run a "jit > test". This feels kinda hacky and overkill, actually. A dedicated simplistic test looks much cleaner, IMHO. > This code also treats all failures of pcre2_jit_compile() the same, but > here we only want PCRE2_ERROR_NOMEMORY. Does it really matter? I mean, we could change the final test to 'err != PCRE2_ERROR_NOMEMORY' but we also return early when we're unable to generate PCRE2 code for the "." pattern -- not strictly a JIT error as well. I'd say instead of splitting hairs, pcre2_jit_functional() should stay as above, as a failure to compile "." as a pattern can only happen (a) under very tight memory constraints or (b) for JIT internal reasons, which would boil down to not being able to allocate the required resources to generate JIT code --- it's such a simple pattern it's not expected to fail in any other way. Case (a) can be ignored, IMHO, as that would lead to follow up errors soon anyway. Thanks, Mathias ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] grep: fall back to interpreter mode if JIT fails 2022-12-16 12:15 [PATCH] grep: fall back to interpreter mode if JIT fails Mathias Krause 2022-12-16 16:12 ` Ævar Arnfjörð Bjarmason @ 2022-12-16 22:52 ` Junio C Hamano 2022-12-20 20:40 ` Mathias Krause 2023-01-27 15:49 ` [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails Mathias Krause 2 siblings, 1 reply; 36+ messages in thread From: Junio C Hamano @ 2022-12-16 22:52 UTC (permalink / raw) To: Mathias Krause; +Cc: git, Carlo Marcelo Arenas Belón Mathias Krause <minipli@grsecurity.net> writes: > Such a change was already proposed 4 years ago [1] but wasn't merged for > unknown reasons. > > 1. https://lore.kernel.org/r/20181209230024.43444-3-carenas@gmail.com This part does not belong to the log message but should go below three-dash line. If you have a more concrete "it was rejected because ...", to help judging why this version improves upon the previous attempt and it is clear the reason for past rejection no longer applies, that is a very different story, though. The way I read the original thread (assuming that the lore archive is showing all relevant messages there) is that RFC was reviewed well and everybody was happy about the direction it took. It even received fairly concrete suggestions for the real, non-RFC version, but that never materialized. It is very clear that the posted patch was not yet in a mergeable state as-is; "for unknown reasons" is less than helpful. Now, we learned a more concrete reason, i.e. "it got tons of help improving the draft into the final version, but the help was discarded and the final version did not materialize", does the attempt this time around improve on it sufficiently to make it mergeable, or is it sufficiently different that it needs a fresh review from scratch? If the latter, then "for unknown reasons" becomes even less relevant. The rest of the proposed log message, and the change itself, both look very reasonable and well explained, at least to me. Thanks. > Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> > Signed-off-by: Mathias Krause <minipli@grsecurity.net> # tweaked changelog, added comment > --- > grep.c | 17 +++++++++++++++-- > 1 file changed, 15 insertions(+), 2 deletions(-) > > diff --git a/grep.c b/grep.c > index 06eed694936c..f2ada528b21d 100644 > --- a/grep.c > +++ b/grep.c > @@ -317,8 +317,21 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt > pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on); > if (p->pcre2_jit_on) { > jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE); > - if (jitret) > - die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret); > + if (jitret) { > + /* > + * Even though pcre2_config(PCRE2_CONFIG_JIT, ...) > + * indicated JIT support, the library might still > + * fail to generate JIT code for various reasons, > + * e.g. when SELinux's 'deny_execmem' or PaX's > + * MPROTECT prevent creating W|X memory mappings. > + * > + * Instead of faling hard, fall back to interpreter > + * mode, just as if the pattern was prefixed with > + * '(*NO_JIT)'. > + */ > + p->pcre2_jit_on = 0; > + return; > + } > > /* > * The pcre2_config(PCRE2_CONFIG_JIT, ...) call just ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] grep: fall back to interpreter mode if JIT fails 2022-12-16 22:52 ` Junio C Hamano @ 2022-12-20 20:40 ` Mathias Krause 0 siblings, 0 replies; 36+ messages in thread From: Mathias Krause @ 2022-12-20 20:40 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Carlo Marcelo Arenas Belón Am 16.12.22 um 23:52 schrieb Junio C Hamano: > Mathias Krause <minipli@grsecurity.net> writes: > >> Such a change was already proposed 4 years ago [1] but wasn't merged for >> unknown reasons. >> >> 1. https://lore.kernel.org/r/20181209230024.43444-3-carenas@gmail.com > > This part does not belong to the log message but should go below > three-dash line. If you have a more concrete "it was rejected > because ...", to help judging why this version improves upon the > previous attempt and it is clear the reason for past rejection no > longer applies, that is a very different story, though. > > The way I read the original thread (assuming that the lore archive > is showing all relevant messages there) is that RFC was reviewed > well and everybody was happy about the direction it took. It even > received fairly concrete suggestions for the real, non-RFC version, > but that never materialized. It had a follow-up RFC[1] half a year later, adding a config option and, after some more discussion, even command line switches[2]. But not just IMHO is that far too much, but even you suggested to simply revert back to the initial RFC and implement the automatic fallback[3], basically merging it with a proper changelog[4]. As that never happened, I took up the work and tried to do just that. 1. https://lore.kernel.org/git/20190728235427.41425-1-carenas@gmail.com/ 2. https://lore.kernel.org/git/20190729105955.44390-1-carenas@gmail.com/ 3. https://lore.kernel.org/git/xmqqh874vikk.fsf@gitster-ct.c.googlers.com/ 4. https://lore.kernel.org/git/xmqqef1zmkp5.fsf@gitster-ct.c.googlers.com/ > It is very clear that the posted patch was not yet in a mergeable > state as-is; "for unknown reasons" is less than helpful. > > Now, we learned a more concrete reason, i.e. "it got tons of help > improving the draft into the final version, but the help was > discarded and the final version did not materialize", does the > attempt this time around improve on it sufficiently to make it > mergeable, or is it sufficiently different that it needs a fresh > review from scratch? If the latter, then "for unknown reasons" > becomes even less relevant. Sorry for not digging up that information for the initial patch submission. I'll add it to v2. > The rest of the proposed log message, and the change itself, both > look very reasonable and well explained, at least to me. Thanks a lot for the detailed review. I'll update the commit log accordingly but will wait for more feedback to see on which solution we want to settle on. > Thanks. > > >> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> >> Signed-off-by: Mathias Krause <minipli@grsecurity.net> # tweaked changelog, added comment >> --- >> grep.c | 17 +++++++++++++++-- >> 1 file changed, 15 insertions(+), 2 deletions(-) >> >> diff --git a/grep.c b/grep.c >> index 06eed694936c..f2ada528b21d 100644 >> --- a/grep.c >> +++ b/grep.c >> @@ -317,8 +317,21 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt >> pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on); >> if (p->pcre2_jit_on) { >> jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE); >> - if (jitret) >> - die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret); >> + if (jitret) { >> + /* >> + * Even though pcre2_config(PCRE2_CONFIG_JIT, ...) >> + * indicated JIT support, the library might still >> + * fail to generate JIT code for various reasons, >> + * e.g. when SELinux's 'deny_execmem' or PaX's >> + * MPROTECT prevent creating W|X memory mappings. >> + * >> + * Instead of faling hard, fall back to interpreter >> + * mode, just as if the pattern was prefixed with >> + * '(*NO_JIT)'. >> + */ >> + p->pcre2_jit_on = 0; >> + return; >> + } >> >> /* >> * The pcre2_config(PCRE2_CONFIG_JIT, ...) call just ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2022-12-16 12:15 [PATCH] grep: fall back to interpreter mode if JIT fails Mathias Krause 2022-12-16 16:12 ` Ævar Arnfjörð Bjarmason 2022-12-16 22:52 ` Junio C Hamano @ 2023-01-27 15:49 ` Mathias Krause 2023-01-27 16:34 ` Junio C Hamano 2023-01-31 18:56 ` [PATCH v3] " Mathias Krause 2 siblings, 2 replies; 36+ messages in thread From: Mathias Krause @ 2023-01-27 15:49 UTC (permalink / raw) To: git Cc: Ævar Arnfjörð Bjarmason, Mathias Krause, Carlo Marcelo Arenas Belón Under Linux systems with SELinux's 'deny_execmem' or PaX's MPROTECT enabled, the allocation of PCRE2's JIT rwx memory may be prohibited, making pcre2_jit_compile() fail with PCRE2_ERROR_NOMEMORY (-48): [user@fedora git]$ git grep -c PCRE2_JIT grep.c:1 [user@fedora git]$ # Enable SELinux's W^X policy [user@fedora git]$ sudo semanage boolean -m -1 deny_execmem [user@fedora git]$ # JIT memory allocation fails, breaking 'git grep' [user@fedora git]$ git grep -c PCRE2_JIT fatal: Couldn't JIT the PCRE2 pattern 'PCRE2_JIT', got '-48' Instead of failing hard in this case and making 'git grep' unusable on such systems, simply fall back to interpreter mode, leading to a much better user experience. As having a functional PCRE2 JIT compiler is a legitimate use case for performance reasons, we'll only do the fallback if the supposedly available JIT is found to be non-functional by attempting to JIT compile a very simple pattern. If this fails, JIT is deemed to be non-functional and we do the interpreter fallback. For all other cases, i.e. the simple pattern can be compiled but the user provided cannot, we fail hard as we do now as the reason for the failure must be the pattern itself. Cc: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Mathias Krause <minipli@grsecurity.net> --- This patch is based on a previous attempt proposed by Carlo already 4 years ago[1]. However, it wasn't applied as there were still ongoing discussions about how to handle and possibly avoid the automatic fallback. A follow-up RFC had been posted half a year later[2], adding a config option and, after some more discussion, even command line switches[3]. But, after all, it was agreed on that this is far too much and Junio suggested to simply revert back to the initial RFC and implement the automatic fallback[4], basically merging it with a proper changelog[5]. As that never happened, I took up the work and tried to do just that. This, again, lead to some discussion but, fortunately, less about the config knobs, but more about the implications of such a change. I tried to address Ævar's concerns about always falling back to the interpreter and limited it to the problematic case I want to get solved. 1. https://lore.kernel.org/r/20181209230024.43444-3-carenas@gmail.com 2. https://lore.kernel.org/git/20190728235427.41425-1-carenas@gmail.com/ 3. https://lore.kernel.org/git/20190729105955.44390-1-carenas@gmail.com/ 4. https://lore.kernel.org/git/xmqqh874vikk.fsf@gitster-ct.c.googlers.com/ 5. https://lore.kernel.org/git/xmqqef1zmkp5.fsf@gitster-ct.c.googlers.com/ Changes in v2: The current version takes a conservative approach by only implementing the fallback to interpreter mode when the runtime test for basic JIT support fails as well, indicating the inability of PCRE2's memory allocator to acquire a W|X mappings for runtime code generation. pcre2_jit_functional() very much intentional calls pcre2_compile() without making use of the context set up in compile_pcre2_pattern() to exclude any errors related to that -- a wrong combination of options making pcre2_jit_compile() fail for these reasons instead of the memory mapping error we try to detect. This ensures we're doing a dumb simple JIT compile test to probe its general runtime availability and not wrongly fall back to interpreter mode because the option combination we're trying to make use of isn't supported by the JIT. I also changed the author to myself as the current state has little in common with what Carlo once proposed. --- grep.c | 42 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/grep.c b/grep.c index 06eed694936c..59afc3f07fc9 100644 --- a/grep.c +++ b/grep.c @@ -262,6 +262,31 @@ static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data) free(pointer); } +static int pcre2_jit_functional(void) +{ + static int jit_working = -1; + pcre2_code *code; + size_t off; + int err; + + if (jit_working != -1) + return jit_working; + + /* + * Try to JIT compile a simple pattern to probe if the JIT is + * working in general. It might fail for systems where creating + * memory mappings for runtime code generation is restricted. + */ + code = pcre2_compile((PCRE2_SPTR)".", 1, 0, &err, &off, NULL); + if (!code) + return 0; + + jit_working = pcre2_jit_compile(code, PCRE2_JIT_COMPLETE) == 0; + pcre2_code_free(code); + + return jit_working; +} + static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt) { int error; @@ -317,8 +342,23 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on); if (p->pcre2_jit_on) { jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE); - if (jitret) + if (jitret == PCRE2_ERROR_NOMEMORY && !pcre2_jit_functional()) { + /* + * Even though pcre2_config(PCRE2_CONFIG_JIT, ...) + * indicated JIT support, the library might still + * fail to generate JIT code for various reasons, + * e.g. when SELinux's 'deny_execmem' or PaX's + * MPROTECT prevent creating W|X memory mappings. + * + * Instead of faling hard, fall back to interpreter + * mode, just as if the pattern was prefixed with + * '(*NO_JIT)'. + */ + p->pcre2_jit_on = 0; + return; + } else if (jitret) { die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret); + } /* * The pcre2_config(PCRE2_CONFIG_JIT, ...) call just -- 2.39.0 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2023-01-27 15:49 ` [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails Mathias Krause @ 2023-01-27 16:34 ` Junio C Hamano 2023-01-27 17:39 ` Junio C Hamano 2023-01-29 12:28 ` Mathias Krause 2023-01-31 18:56 ` [PATCH v3] " Mathias Krause 1 sibling, 2 replies; 36+ messages in thread From: Junio C Hamano @ 2023-01-27 16:34 UTC (permalink / raw) To: Mathias Krause Cc: git, Ævar Arnfjörð Bjarmason, Carlo Marcelo Arenas Belón Mathias Krause <minipli@grsecurity.net> writes: > As having a functional PCRE2 JIT compiler is a legitimate use case for > performance reasons, we'll only do the fallback if the supposedly > available JIT is found to be non-functional by attempting to JIT compile > a very simple pattern. If this fails, JIT is deemed to be non-functional > and we do the interpreter fallback. For all other cases, i.e. the simple > pattern can be compiled but the user provided cannot, we fail hard as we > do now as the reason for the failure must be the pattern itself. I do not know if it is a good idea to rely on the "very simple pattern". The implementation of JIT could devise various ways to succeed for such simple patterns without having writable-executable piece of memory. What happened to the earlier idea of falling back to the interpreted codepath, which will catch any bad pattern that has "the reason for the failure" by failing anyway? > +static int pcre2_jit_functional(void) > +{ > + static int jit_working = -1; > + pcre2_code *code; > + size_t off; > + int err; > + > + if (jit_working != -1) > + return jit_working; > + > + /* > + * Try to JIT compile a simple pattern to probe if the JIT is > + * working in general. It might fail for systems where creating > + * memory mappings for runtime code generation is restricted. > + */ > + code = pcre2_compile((PCRE2_SPTR)".", 1, 0, &err, &off, NULL); > + if (!code) > + return 0; > + > + jit_working = pcre2_jit_compile(code, PCRE2_JIT_COMPLETE) == 0; > + pcre2_code_free(code); I'd prefer not having to worry about: Or it might not fail for such systems, as the pattern is too simple and future versions of pcre2_compile() could have special case code. > @@ -317,8 +342,23 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt > pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on); > if (p->pcre2_jit_on) { > jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE); > - if (jitret) > + if (jitret == PCRE2_ERROR_NOMEMORY && !pcre2_jit_functional()) { > + /* > + * Even though pcre2_config(PCRE2_CONFIG_JIT, ...) > + * indicated JIT support, the library might still > + * fail to generate JIT code for various reasons, > + * e.g. when SELinux's 'deny_execmem' or PaX's > + * MPROTECT prevent creating W|X memory mappings. > + * > + * Instead of faling hard, fall back to interpreter > + * mode, just as if the pattern was prefixed with > + * '(*NO_JIT)'. > + */ > + p->pcre2_jit_on = 0; > + return; Yes, the "instead of failing hard, fall back" makes sense. Just that I do not see why the runtime test is a good thing to have. In short, we are not in the business of catching bugs in pcre2_jit implementations, so if they say they cannot compile the pattern (I would even say I doubt the point of checking the return code to ensure it is NOMEMORY), it would be fine to let the interpreter codepath to inspect the pattern and diagnose problems with it, or take the slow match without JIT. What am I missing? ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2023-01-27 16:34 ` Junio C Hamano @ 2023-01-27 17:39 ` Junio C Hamano 2023-01-27 18:46 ` Junio C Hamano 2023-01-29 13:36 ` Mathias Krause 2023-01-29 12:28 ` Mathias Krause 1 sibling, 2 replies; 36+ messages in thread From: Junio C Hamano @ 2023-01-27 17:39 UTC (permalink / raw) To: Mathias Krause Cc: git, Ævar Arnfjörð Bjarmason, Carlo Marcelo Arenas Belón Junio C Hamano <gitster@pobox.com> writes: > Yes, the "instead of failing hard, fall back" makes sense. Just > that I do not see why the runtime test is a good thing to have. In > short, we are not in the business of catching bugs in pcre2_jit > implementations, so if they say they cannot compile the pattern (I > would even say I doubt the point of checking the return code to > ensure it is NOMEMORY), it would be fine to let the interpreter > codepath to inspect the pattern and diagnose problems with it, or > take the slow match without JIT. > > What am I missing? Note that I've seen and recently re-read the discussion that leads to https://lore.kernel.org/git/f680b274-fa85-6624-096a-7753a2671c15@grsecurity.net/ I suspect that this auto-probe is related to solving "the user thinks JIT is in use but because of failing JIT the user's pattern is getting horrible performance" somehow. But I do not think a hard failure is a good approach to help users in such a situation. After such a failure, the user can prefix "(*NO_JIT)" to the pattern and retry, or give up the operation altogether and not get a useful result, but wouldn't it be far more helpful to just fallback as if (*NO_JIT) was on from the beginning? Also I notice that p->pcre2_jit_on is per "struct grep_pat", so it is not like "once we see a pathological pattern, we turn off JIT completely for other patterns", right? That is, if you have git grep -P -e "$A" -e "$B" and we fail to compile "$A" (for whatever reason), we could still (attempt to) compile "$B". Perhaps $A was too complex or was incompatible with JIT combined with other options, but $B may be easy enough to still be JITtable, in which case we would match with the JITted version of $B with interpreted version of $A, instead of failing, right? THanks. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2023-01-27 17:39 ` Junio C Hamano @ 2023-01-27 18:46 ` Junio C Hamano 2023-01-29 13:37 ` Mathias Krause 2023-01-29 13:36 ` Mathias Krause 1 sibling, 1 reply; 36+ messages in thread From: Junio C Hamano @ 2023-01-27 18:46 UTC (permalink / raw) To: Mathias Krause Cc: git, Ævar Arnfjörð Bjarmason, Carlo Marcelo Arenas Belón Junio C Hamano <gitster@pobox.com> writes: >> What am I missing? > > Note that I've seen and recently re-read the discussion that leads to > https://lore.kernel.org/git/f680b274-fa85-6624-096a-7753a2671c15@grsecurity.net/ > > I suspect that this auto-probe is related to solving "the user > thinks JIT is in use but because of failing JIT the user's pattern > is getting horrible performance" somehow. But I do not think a hard > failure is a good approach to help users in such a situation. I guess what I am saying is that the previous one that has been queued on 'seen' may be better. It should cover your original "SELinux and other mechanisms can render JIT unusable because they do not allow dynamic generation of code" use case. Thanks. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2023-01-27 18:46 ` Junio C Hamano @ 2023-01-29 13:37 ` Mathias Krause 0 siblings, 0 replies; 36+ messages in thread From: Mathias Krause @ 2023-01-29 13:37 UTC (permalink / raw) To: Junio C Hamano Cc: git, Ævar Arnfjörð Bjarmason, Carlo Marcelo Arenas Belón On 27.01.23 19:46, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > >>> What am I missing? >> >> Note that I've seen and recently re-read the discussion that leads to >> https://lore.kernel.org/git/f680b274-fa85-6624-096a-7753a2671c15@grsecurity.net/ >> >> I suspect that this auto-probe is related to solving "the user >> thinks JIT is in use but because of failing JIT the user's pattern >> is getting horrible performance" somehow. But I do not think a hard >> failure is a good approach to help users in such a situation. > > I guess what I am saying is that the previous one that has been > queued on 'seen' may be better. It should cover your original > "SELinux and other mechanisms can render JIT unusable because they > do not allow dynamic generation of code" use case. It clearly does cover my use case but it has a bad impact on the runtime of pathological patterns. But if you think that's not an issue, I'll update the changelog of v1 accordingly and resent it. Thanks, Mathias ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2023-01-27 17:39 ` Junio C Hamano 2023-01-27 18:46 ` Junio C Hamano @ 2023-01-29 13:36 ` Mathias Krause 2023-01-29 17:15 ` Junio C Hamano 1 sibling, 1 reply; 36+ messages in thread From: Mathias Krause @ 2023-01-29 13:36 UTC (permalink / raw) To: Junio C Hamano Cc: git, Ævar Arnfjörð Bjarmason, Carlo Marcelo Arenas Belón On 27.01.23 18:39, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > >> Yes, the "instead of failing hard, fall back" makes sense. Just >> that I do not see why the runtime test is a good thing to have. In >> short, we are not in the business of catching bugs in pcre2_jit >> implementations, so if they say they cannot compile the pattern (I >> would even say I doubt the point of checking the return code to >> ensure it is NOMEMORY), it would be fine to let the interpreter >> codepath to inspect the pattern and diagnose problems with it, or >> take the slow match without JIT. >> >> What am I missing? > > Note that I've seen and recently re-read the discussion that leads to > https://lore.kernel.org/git/f680b274-fa85-6624-096a-7753a2671c15@grsecurity.net/ Ahh, so ignore my last advise in the previous Email. You already read it. > I suspect that this auto-probe is related to solving "the user > thinks JIT is in use but because of failing JIT the user's pattern > is getting horrible performance" somehow. But I do not think a hard > failure is a good approach to help users in such a situation. Yes, it's exactly trying to detect this case and not cause a regression for users expecting 'git grep' to make use of the JIT. So, beside the W|X memory allocation error, other errors are likely only to point out limitations of the JIT compiler, e.g. the example I gave in https://lore.kernel.org/all/2b04b19a-a2bd-3dd5-6f21-ed0b0ad3e02f@grsecurity.net/, which is, admitted, a made up example that can easily be worked around by manually prefixing it with '(*NO_JIT)'. It would be a pain having to do that for *every* pattern, but only doing it for the pathological cases should be fine. Otherwise more users would have run into it already and complained about it, no? > After such a failure, the user can prefix "(*NO_JIT)" to the pattern > and retry, or give up the operation altogether and not get a useful > result, but wouldn't it be far more helpful to just fallback as if > (*NO_JIT) was on from the beginning? Sure, if it would be required for *every*, i.e. "normal" patterns. But always doing it even for abusive ones doesn't feel right. > Also I notice that p->pcre2_jit_on is per "struct grep_pat", so it > is not like "once we see a pathological pattern, we turn off JIT > completely for other patterns", right? That is, if you have > > git grep -P -e "$A" -e "$B" > > and we fail to compile "$A" (for whatever reason), we could still > (attempt to) compile "$B". Perhaps $A was too complex or was > incompatible with JIT combined with other options, but $B may be > easy enough to still be JITtable, in which case we would match with > the JITted version of $B with interpreted version of $A, instead of > failing, right? The current version of git would fail hard if it fails to JIT compile "$A". My patch doesn't change that behavior and that's intentional, as I share Ævar's thinking about falling back to the interpreter mode for "complex patterns" (which means pathological cases, really) is a bad idea. While we might be able to compile the pattern and run it in interpreter mode, it'll likely have a *much* higher runtime. Just to get you a glimpse of how much longer, this is what it takes running the pathological pattern from above's example on git.git: $ time git grep -P "(*NO_JIT)$(perl -e 'print "(.)" x 4000')" Binary file git-gui/macosx/git-gui.icns matches Binary file t/t5000/pax.tar matches Binary file t/t5004/big-pack.zip matches Binary file t/t5004/empty-with-pax-header.tar matches real 44m42,150s user 577m14,623s sys 0m46,210s So this grep run eat up ~9.5 *hours* of CPU time. Do we really want to fall back to something like this for the pathological cases? ...Yeah, I don't think so either. Thanks, Mathias ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2023-01-29 13:36 ` Mathias Krause @ 2023-01-29 17:15 ` Junio C Hamano 2023-01-30 10:56 ` Ævar Arnfjörð Bjarmason 2023-01-30 11:08 ` Mathias Krause 0 siblings, 2 replies; 36+ messages in thread From: Junio C Hamano @ 2023-01-29 17:15 UTC (permalink / raw) To: Mathias Krause Cc: git, Ævar Arnfjörð Bjarmason, Carlo Marcelo Arenas Belón Mathias Krause <minipli@grsecurity.net> writes: > ... While we might be able to compile the pattern and run it in > interpreter mode, it'll likely have a *much* higher runtime. > ... > So this grep run eat up ~9.5 *hours* of CPU time. Do we really want to > fall back to something like this for the pathological cases? ...Yeah, I > don't think so either. You may not, but I do not agree with you at all. The code should not outsmart the user in such a case. Even if the pattern the user came up with is impossible to grok for a working JIT compiler, and it might be hard to grok for the interpreter, what is the next step you recommend the user if you refuse to fall back on the interprete? "Rewrite it to please the JIT compiler"? If that is the best pattern the user can produce to solve the problem at hand, being able to give the user an answer in 9 hours is much better than not being able to give anything at all. Maybe after waiting for 5 minutes, the user gets bored and ^C, or without killing it, open another terminal and try a different patern, and in 9 hours, perhaps comes up with an equivalent (or different but close enough) pattern that happens to run much faster, at which time the user may kill the original one. In any of these cases, by refusing to run, the code is not doing any service to the user. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2023-01-29 17:15 ` Junio C Hamano @ 2023-01-30 10:56 ` Ævar Arnfjörð Bjarmason 2023-01-30 18:49 ` Junio C Hamano 2023-01-30 11:08 ` Mathias Krause 1 sibling, 1 reply; 36+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2023-01-30 10:56 UTC (permalink / raw) To: Junio C Hamano; +Cc: Mathias Krause, git, Carlo Marcelo Arenas Belón On Sun, Jan 29 2023, Junio C Hamano wrote: > Mathias Krause <minipli@grsecurity.net> writes: > >> ... While we might be able to compile the pattern and run it in >> interpreter mode, it'll likely have a *much* higher runtime. >> ... >> So this grep run eat up ~9.5 *hours* of CPU time. Do we really want to >> fall back to something like this for the pathological cases? ...Yeah, I >> don't think so either. > > You may not, but I do not agree with you at all. The code should > not outsmart the user in such a case. It's the falling back in the nominal case that would be outsmarting the user. If I compile libpcre2 with JIT support I'm expecting Git to use that, and not fall back in those cases where the JIT engine would give up. > Even if the pattern the user came up with is impossible to grok for > a working JIT compiler, and it might be hard to grok for the > interpreter, what is the next step you recommend the user if you > refuse to fall back on the interprete? "Rewrite it to please the > JIT compiler"? I'd argue that it's pretty much impossible to unintentionally write such pathological patterns, the edge cases where e.g. the JIT would run out of resources v.s. the normal engine are a non-issue for any "normal" use. Pathological regexes are pretty much only interesting to anyone in the context of DoS attacks where they're being used to cause intentional slowdowns. Here we're discussing an orthagonal case where the "JIT fails", but rather than some pathological pattern it's because SELinux has made it not work at runtime, and we're trying to tease the two cases apart. > If that is the best pattern the user can produce to solve the > problem at hand, being able to give the user an answer in 9 hours is > much better than not being able to give anything at all. Speed is a feature in itself, and in a lot of cases (e.g. user-supplied patterns vulnerable to a DoS attack) continuing on the slow path is much worse. Even just using my terminal for ad-hoc "git grep", I'd *much* rather get an early error about the pattern exceeding JIT resources than continuing on the fallback path. If I had somehow written one by accident (and this is stretching credulity) you can usually apply some minor tweaks to the pattern, and then execute it in seconds instead of minutes/hours. > Maybe after waiting for 5 minutes, the user gets bored and ^C, or > without killing it, open another terminal and try a different > patern, and in 9 hours, perhaps comes up with an equivalent (or > different but close enough) pattern that happens to run much faster, > at which time the user may kill the original one. In any of these > cases, by refusing to run, the code is not doing any service to the > user. I don't think this is plausible at all per the above, and that we shouldn't harm realistic use-cases to satisfy hypothetical ones. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2023-01-30 10:56 ` Ævar Arnfjörð Bjarmason @ 2023-01-30 18:49 ` Junio C Hamano 2023-01-31 8:34 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 36+ messages in thread From: Junio C Hamano @ 2023-01-30 18:49 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Mathias Krause, git, Carlo Marcelo Arenas Belón Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > If I compile libpcre2 with JIT support I'm expecting Git to use that, > and not fall back in those cases where the JIT engine would give up. The thing is, the reason why their Git has JIT enabled pcre2 for many users is not because they choose to compile their own Git for themselves because they wanted to play with JIT. To them, their distro and/or their employer gave a precompiled Git, in the hope that with JIT would be faster than without JIT when JIT is usable. In that context, "Speed is a feature in itself" is correct but "failing fast, forcing the user to try different things" is not a "Speed" feature at all. It may be interesting only for those who are curious to see what pattern was rejected by JIT. It is especially true as (1) we are willing to fall back to interpreter in the SELinux senario, and (2) for normal users who want to use Git, and not necessarily interested in playing with JIT, there is no other recourse than prefixing "I do not want this JITted" to their pattern ANYWAY. Why fail fast and force the user to take the only recourse manually, when the machinery already knows what the user's only viable alternative is (i.e. falling back to the interpreter)? > Pathological regexes are pretty much only interesting to anyone in the > context of DoS attacks where they're being used to cause intentional > slowdowns. Exactly. > Here we're discussing an orthagonal case where the "JIT fails", but > rather than some pathological pattern it's because SELinux has made it > not work at runtime, and we're trying to tease the two cases apart. s/and we're/but you're/. And I do not think you want to. > I don't think this is plausible at all per the above, and that we > shouldn't harm realistic use-cases to satisfy hypothetical ones. To me, what you are advocating is exactly the hypothetical ones that harm end-users who did not choose to enable JIT themselves. When JIT fails for whatever reason (including the SELinux senario) for them, they do not need to be told by Git failing, when the interpreter can give them the correct answer. Wanting to see the result of the operation they asked Git to do, while allowing Git to use clever optimizations WHEN ABLE, is what I see as realistic use-cases. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2023-01-30 18:49 ` Junio C Hamano @ 2023-01-31 8:34 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 36+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2023-01-31 8:34 UTC (permalink / raw) To: Junio C Hamano; +Cc: Mathias Krause, git, Carlo Marcelo Arenas Belón On Mon, Jan 30 2023, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> If I compile libpcre2 with JIT support I'm expecting Git to use that, >> and not fall back in those cases where the JIT engine would give up. > > The thing is, the reason why their Git has JIT enabled pcre2 for > many users is not because they choose to compile their own Git for > themselves because they wanted to play with JIT. To them, their > distro and/or their employer gave a precompiled Git, in the hope > that with JIT would be faster than without JIT when JIT is usable. > > In that context, "Speed is a feature in itself" is correct but > "failing fast, forcing the user to try different things" is not a > "Speed" feature at all. It may be interesting only for those who > are curious to see what pattern was rejected by JIT. It is > especially true as (1) we are willing to fall back to interpreter in > the SELinux senario, and (2) for normal users who want to use Git, > and not necessarily interested in playing with JIT, there is no > other recourse than prefixing "I do not want this JITted" to their > pattern ANYWAY. Why fail fast and force the user to take the only > recourse manually, when the machinery already knows what the user's > only viable alternative is (i.e. falling back to the interpreter)? Because we have an issue with (1), but not (2). How would (2) happen? So far I've only seen intentionally pathological patterns designed to trigger the JIT's limits. I don't think it's worth DWYM-ing that path, when we're having to assume a lot about the "M" part of that. >> Pathological regexes are pretty much only interesting to anyone in the >> context of DoS attacks where they're being used to cause intentional >> slowdowns. > > Exactly. > >> Here we're discussing an orthagonal case where the "JIT fails", but >> rather than some pathological pattern it's because SELinux has made it >> not work at runtime, and we're trying to tease the two cases apart. > > s/and we're/but you're/. And I do not think you want to. That s/// is fair, but brings me back to my question above of why we're trying to solve (2) here. >> I don't think this is plausible at all per the above, and that we >> shouldn't harm realistic use-cases to satisfy hypothetical ones. > > To me, what you are advocating is exactly the hypothetical ones that > harm end-users who did not choose to enable JIT themselves. When JIT > fails for whatever reason (including the SELinux senario) for them, > they do not need to be told by Git failing, when the interpreter can > give them the correct answer. Wanting to see the result of the > operation they asked Git to do, while allowing Git to use clever > optimizations WHEN ABLE, is what I see as realistic use-cases. I'm saying that the "JIT fails for whatever reason" is hypothetical. It'll fail because of: - The (1) case, where we're categorically unable to run the JIT. Then we should proceed as if the JIT isn't available (as we do when it's e.g. not compiled into PCRE). - The pattern is pathological enough that it's about to take eons to execute it (2). The lack of bug reports about "hey, my existing 'git grep' pattern failed" when the JIT was shipped with v2.14.0 shows that this doesn't happen in practice. - The case where the API is returning some new error code that's unknown to us, let's call that (3). ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2023-01-29 17:15 ` Junio C Hamano 2023-01-30 10:56 ` Ævar Arnfjörð Bjarmason @ 2023-01-30 11:08 ` Mathias Krause 2023-01-30 18:54 ` Junio C Hamano 1 sibling, 1 reply; 36+ messages in thread From: Mathias Krause @ 2023-01-30 11:08 UTC (permalink / raw) To: Junio C Hamano Cc: git, Ævar Arnfjörð Bjarmason, Carlo Marcelo Arenas Belón On 29.01.23 18:15, Junio C Hamano wrote: > Mathias Krause <minipli@grsecurity.net> writes: > >> ... While we might be able to compile the pattern and run it in >> interpreter mode, it'll likely have a *much* higher runtime. >> ... >> So this grep run eat up ~9.5 *hours* of CPU time. Do we really want to >> fall back to something like this for the pathological cases? ...Yeah, I >> don't think so either. > > You may not, but I do not agree with you at all. The code should > not outsmart the user in such a case. It doesn't. My rhetoric question was just missing "automatically" to state that I would dislike an *automatic* fallback to the interpreter for *pathological cases.* But I'm fine with (and that's what this patch is all about!) a fallback to the interpreter for patterns that simply fail the JIT because it's broken. Sorry for the confusion. > Even if the pattern the user came up with is impossible to grok for > a working JIT compiler, and it might be hard to grok for the > interpreter, what is the next step you recommend the user if you > refuse to fall back on the interprete? "Rewrite it to please the > JIT compiler"? Not at all. A user is still free to disable the JIT and enforce using the interpreter by using the "(*NO_JIT)" prefix. My patch doesn't disable this behavior. My patch only tries to avoid having to specify it for "regular" patterns when the JIT is broken anyways. The key here is that this would be a manual step (in contrast to an automatic fallback), i.e. we require explicit user consent to accept the worse runtime performance. And, IMHO, that should be acceptable from a usability point of view as this would only be required for the pathological cases an otherwise functional JIT simply cannot handle. > If that is the best pattern the user can produce to solve the > problem at hand, being able to give the user an answer in 9 hours is > much better than not being able to give anything at all. Sure, fully agree. > Maybe after waiting for 5 minutes, the user gets bored and ^C, or > without killing it, open another terminal and try a different > patern, and in 9 hours, perhaps comes up with an equivalent (or > different but close enough) pattern that happens to run much faster, > at which time the user may kill the original one. In any of these > cases, by refusing to run, the code is not doing any service to the > user. My patch doesn't make it worse than what 'git grep' would currently be doing. On the contrary, actually. It allows me to use PaX's MPROTECT and have a functional 'git grep' as well. Maybe the missing piece here is simply something like below to make users more aware of the possibility to disable the JIT for the more complex cases?: diff --git a/grep.c b/grep.c index 59afc3f07fc9..1422f168b087 100644 --- a/grep.c +++ b/grep.c @@ -357,7 +357,8 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt p->pcre2_jit_on = 0; return; } else if (jitret) { - die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret); + die("Couldn't JIT the PCRE2 pattern '%s', got '%d'%s\n", p->pattern, jitret, + pcre2_jit_functional() ? "\nYou might retry by prefixing the pattern with '(*NO_JIT)'" : ""); } /* (Sorry about the wrapped lines, my mailer is just broken. I'll make it a proper patch, if such functionality is indeed wanted.) Thanks, Mathias ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2023-01-30 11:08 ` Mathias Krause @ 2023-01-30 18:54 ` Junio C Hamano 2023-01-30 20:08 ` Junio C Hamano 0 siblings, 1 reply; 36+ messages in thread From: Junio C Hamano @ 2023-01-30 18:54 UTC (permalink / raw) To: Mathias Krause Cc: git, Ævar Arnfjörð Bjarmason, Carlo Marcelo Arenas Belón Mathias Krause <minipli@grsecurity.net> writes: > My patch doesn't make it worse than what 'git grep' would currently be > doing. On the contrary, actually. It allows me to use PaX's MPROTECT and > have a functional 'git grep' as well. I know. But then without the "why did it fail?" logic (i.e. the v1 patch), it does not make it worse than the current code, either, and of course allows you to use JIT-enabled pcre2 even where JIT is impossible due to MPROTECT and whatever reasson. > Maybe the missing piece here is simply something like below to make > users more aware of the possibility to disable the JIT for the more > complex cases?: If we were to keep that "die", it is absolutely required, I would think. Users who got their Git with JIT-enabled pcre2 may be viewing JIT merely as "a clever optimization the implementation is allowed to use when able", without knowing and more importantly without wanting to know how to disable it from within their patterns. But can't we drop that die() if we took the v1 route? > diff --git a/grep.c b/grep.c > index 59afc3f07fc9..1422f168b087 100644 > --- a/grep.c > +++ b/grep.c > @@ -357,7 +357,8 @@ static void compile_pcre2_pattern(struct grep_pat > *p, const struct grep_opt *opt > p->pcre2_jit_on = 0; > return; > } else if (jitret) { > - die("Couldn't JIT the PCRE2 pattern '%s', got > '%d'\n", p->pattern, jitret); > + die("Couldn't JIT the PCRE2 pattern '%s', got > '%d'%s\n", p->pattern, jitret, > + pcre2_jit_functional() ? "\nYou might retry > by prefixing the pattern with '(*NO_JIT)'" : ""); > } > > /* > > (Sorry about the wrapped lines, my mailer is just broken. I'll make it a > proper patch, if such functionality is indeed wanted.) > > Thanks, > Mathias ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2023-01-30 18:54 ` Junio C Hamano @ 2023-01-30 20:08 ` Junio C Hamano 2023-01-30 21:21 ` Junio C Hamano 2023-01-31 7:30 ` Mathias Krause 0 siblings, 2 replies; 36+ messages in thread From: Junio C Hamano @ 2023-01-30 20:08 UTC (permalink / raw) To: Mathias Krause Cc: git, Ævar Arnfjörð Bjarmason, Carlo Marcelo Arenas Belón Junio C Hamano <gitster@pobox.com> writes: > If we were to keep that "die", it is absolutely required, I would > think. Users who got their Git with JIT-enabled pcre2 may be > viewing JIT merely as "a clever optimization the implementation is > allowed to use when able", without knowing and more importantly > without wanting to know how to disable it from within their > patterns. > > But can't we drop that die() if we took the v1 route? Having said all that, I do not mind queuing v2 if the "use *NO_JIT to disable" is added to the message to help users who are forced to redo the query. And in practice, it shouldn't make that much difference, because the only scenario (other than the SELinux-like situation where JIT is compiled in but does not work at all) that the difference may matter would happen when a non-trivial portion of the patterns users use are not workable with JIT, but if that were the case, we would have written JIT off as not mature enough and not yet usable long time ago. So, in practice, patterns refused by JIT would be a very tiny minority to matter in real life, and "failing fast to inconvenience users" would not be too bad. So while I still think v1's simplicity is the right thing to have here, I think it is waste of our braincell to compare v1 vs v2. As v2 gives smaller incremental behaviour change perceived by end users, if somebody really wanted to, I'd expect that a low-hanging fruit #leftoverbit on top of such a patch, after the dust settles, would be to (1) rename pcre2_jit_functional() to fall_back_to_interpreter() or something, (2) add a configuration variable to tell fall_back_to_interpreter() that any form of JIT error is allowed to fall back to interpreter(). and such a patch will essentially give back the simplicity of v1 to folks who opt into the configuration. Thanks. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2023-01-30 20:08 ` Junio C Hamano @ 2023-01-30 21:21 ` Junio C Hamano 2023-01-30 22:30 ` Ramsay Jones 2023-01-31 7:48 ` Mathias Krause 2023-01-31 7:30 ` Mathias Krause 1 sibling, 2 replies; 36+ messages in thread From: Junio C Hamano @ 2023-01-30 21:21 UTC (permalink / raw) To: Mathias Krause Cc: git, Ævar Arnfjörð Bjarmason, Carlo Marcelo Arenas Belón Junio C Hamano <gitster@pobox.com> writes: > Having said all that, I do not mind queuing v2 if the "use *NO_JIT > to disable" is added to the message to help users who are forced to > redo the query. In the meantime, here is what I plan to apply on top of v2 while queuing it. The message given to die() should lack the terminating LF, and the overlong line can and should be split at operator boundary. Thanks. grep.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git c/grep.c w/grep.c index 59afc3f07f..42f184bd09 100644 --- c/grep.c +++ w/grep.c @@ -357,7 +357,11 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt p->pcre2_jit_on = 0; return; } else if (jitret) { - die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret); + die("Couldn't JIT the PCRE2 pattern '%s', got '%d'%s", + p->pattern, jitret, + pcre2_jit_functional() + ? "\nPerhaps prefix (*NO_GIT) to your pattern?" + : ""); } /* ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2023-01-30 21:21 ` Junio C Hamano @ 2023-01-30 22:30 ` Ramsay Jones 2023-01-30 23:27 ` Junio C Hamano 2023-01-31 7:48 ` Mathias Krause 1 sibling, 1 reply; 36+ messages in thread From: Ramsay Jones @ 2023-01-30 22:30 UTC (permalink / raw) To: Junio C Hamano, Mathias Krause Cc: git, Ævar Arnfjörð Bjarmason, Carlo Marcelo Arenas Belón On 30/01/2023 21:21, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > >> Having said all that, I do not mind queuing v2 if the "use *NO_JIT >> to disable" is added to the message to help users who are forced to >> redo the query. > > In the meantime, here is what I plan to apply on top of v2 while > queuing it. The message given to die() should lack the terminating > LF, and the overlong line can and should be split at operator > boundary. > > Thanks. > > grep.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git c/grep.c w/grep.c > index 59afc3f07f..42f184bd09 100644 > --- c/grep.c > +++ w/grep.c > @@ -357,7 +357,11 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt > p->pcre2_jit_on = 0; > return; > } else if (jitret) { > - die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret); > + die("Couldn't JIT the PCRE2 pattern '%s', got '%d'%s", > + p->pattern, jitret, > + pcre2_jit_functional() > + ? "\nPerhaps prefix (*NO_GIT) to your pattern?" s/NO_GIT/NO_JIT/ ? :) ATB, Ramsay Jones > + : ""); > } > > /* ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2023-01-30 22:30 ` Ramsay Jones @ 2023-01-30 23:27 ` Junio C Hamano 0 siblings, 0 replies; 36+ messages in thread From: Junio C Hamano @ 2023-01-30 23:27 UTC (permalink / raw) To: Ramsay Jones Cc: Mathias Krause, git, Ævar Arnfjörð Bjarmason, Carlo Marcelo Arenas Belón Ramsay Jones <ramsay@ramsayjones.plus.com> writes: >> + die("Couldn't JIT the PCRE2 pattern '%s', got '%d'%s", >> + p->pattern, jitret, >> + pcre2_jit_functional() >> + ? "\nPerhaps prefix (*NO_GIT) to your pattern?" > > s/NO_GIT/NO_JIT/ ? :) Indeed. Thanks. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2023-01-30 21:21 ` Junio C Hamano 2023-01-30 22:30 ` Ramsay Jones @ 2023-01-31 7:48 ` Mathias Krause 2023-01-31 16:41 ` Junio C Hamano 1 sibling, 1 reply; 36+ messages in thread From: Mathias Krause @ 2023-01-31 7:48 UTC (permalink / raw) To: Junio C Hamano Cc: git, Ævar Arnfjörð Bjarmason, Carlo Marcelo Arenas Belón On 30.01.23 22:21, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > >> Having said all that, I do not mind queuing v2 if the "use *NO_JIT >> to disable" is added to the message to help users who are forced to >> redo the query. > > In the meantime, here is what I plan to apply on top of v2 while > queuing it. The message given to die() should lack the terminating > LF, and the overlong line can and should be split at operator > boundary. > > Thanks. > > grep.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git c/grep.c w/grep.c > index 59afc3f07f..42f184bd09 100644 > --- c/grep.c > +++ w/grep.c > @@ -357,7 +357,11 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt > p->pcre2_jit_on = 0; > return; > } else if (jitret) { > - die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret); > + die("Couldn't JIT the PCRE2 pattern '%s', got '%d'%s", > + p->pattern, jitret, > + pcre2_jit_functional() > + ? "\nPerhaps prefix (*NO_GIT) to your pattern?" > + : ""); > } > > /* Looks sensible, but maybe something like below would be even better? diff --git a/grep.c b/grep.c index 59afc3f07fc9..e0144ba77e7a 100644 --- a/grep.c +++ b/grep.c @@ -357,7 +357,13 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt p->pcre2_jit_on = 0; return; } else if (jitret) { - die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret); + int do_clip = p->patternlen > 64; + int clip_len = do_clip ? 64 : p->patternlen; + die("Couldn't JIT the PCRE2 pattern '%.*s'%s, got '%d'%s", + clip_len, p->pattern, do_clip ? "..." : "", jitret, + pcre2_jit_functional() + ? "\nPerhaps prefix (*NO_JIT) to your pattern?" + : ""); } /* It'll ensure, git will be printing the hint even for very long patterns, like the one I was testing this with ("$(perl -e 'print "(.)" x 4000')"). Thanks, Mathias ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2023-01-31 7:48 ` Mathias Krause @ 2023-01-31 16:41 ` Junio C Hamano 2023-01-31 18:34 ` Mathias Krause 0 siblings, 1 reply; 36+ messages in thread From: Junio C Hamano @ 2023-01-31 16:41 UTC (permalink / raw) To: Mathias Krause Cc: git, Ævar Arnfjörð Bjarmason, Carlo Marcelo Arenas Belón Mathias Krause <minipli@grsecurity.net> writes: > Looks sensible, but maybe something like below would be even better? When I say "in the meantime", I expect it not to be the final one. This time, it was meant as a mere reminder to me while I wait for the (hopefully final) reroll. Thanks. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2023-01-31 16:41 ` Junio C Hamano @ 2023-01-31 18:34 ` Mathias Krause 0 siblings, 0 replies; 36+ messages in thread From: Mathias Krause @ 2023-01-31 18:34 UTC (permalink / raw) To: Junio C Hamano Cc: git, Ævar Arnfjörð Bjarmason, Carlo Marcelo Arenas Belón On 31.01.23 17:41, Junio C Hamano wrote: > Mathias Krause <minipli@grsecurity.net> writes: > >> Looks sensible, but maybe something like below would be even better? > > When I say "in the meantime", I expect it not to be the final one. > This time, it was meant as a mere reminder to me while I wait for > the (hopefully final) reroll. Got it, will send v3 (the final one, as three times is the charm, they say) integrating the change to die() in a moment. Thanks, Mathias ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2023-01-30 20:08 ` Junio C Hamano 2023-01-30 21:21 ` Junio C Hamano @ 2023-01-31 7:30 ` Mathias Krause 1 sibling, 0 replies; 36+ messages in thread From: Mathias Krause @ 2023-01-31 7:30 UTC (permalink / raw) To: Junio C Hamano Cc: git, Ævar Arnfjörð Bjarmason, Carlo Marcelo Arenas Belón On 30.01.23 21:08, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > >> If we were to keep that "die", it is absolutely required, I would >> think. Users who got their Git with JIT-enabled pcre2 may be >> viewing JIT merely as "a clever optimization the implementation is >> allowed to use when able", without knowing and more importantly >> without wanting to know how to disable it from within their >> patterns. >> >> But can't we drop that die() if we took the v1 route? > > Having said all that, I do not mind queuing v2 if the "use *NO_JIT > to disable" is added to the message to help users who are forced to > redo the query. > > And in practice, it shouldn't make that much difference, because the > only scenario (other than the SELinux-like situation where JIT is > compiled in but does not work at all) that the difference may matter > would happen when a non-trivial portion of the patterns users use > are not workable with JIT, but if that were the case, we would have > written JIT off as not mature enough and not yet usable long time > ago. So, in practice, patterns refused by JIT would be a very tiny > minority to matter in real life, and "failing fast to inconvenience > users" would not be too bad. Exactly! > So while I still think v1's simplicity is the right thing to have > here, I think it is waste of our braincell to compare v1 vs v2. As > v2 gives smaller incremental behaviour change perceived by end > users, if somebody really wanted to, I'd expect that a low-hanging > fruit #leftoverbit on top of such a patch, after the dust settles, > would be to > > (1) rename pcre2_jit_functional() to fall_back_to_interpreter() or > something, > > (2) add a configuration variable to tell fall_back_to_interpreter() > that any form of JIT error is allowed to fall back to > interpreter(). > > and such a patch will essentially give back the simplicity of v1 to > folks who opt into the configuration. Fair enough. But aside from the W|X memory allocation denial exception is the likelihood to run into the limitations of PCRE2's JIT requiring the interpreter fallback so little (as otherwise we'd see it in the past already), I think, the demand for such a knob is basically nonexistent. Thanks, Mathias ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails 2023-01-27 16:34 ` Junio C Hamano 2023-01-27 17:39 ` Junio C Hamano @ 2023-01-29 12:28 ` Mathias Krause 1 sibling, 0 replies; 36+ messages in thread From: Mathias Krause @ 2023-01-29 12:28 UTC (permalink / raw) To: Junio C Hamano Cc: git, Ævar Arnfjörð Bjarmason, Carlo Marcelo Arenas Belón On 27.01.23 17:34, Junio C Hamano wrote: > Mathias Krause <minipli@grsecurity.net> writes: > >> As having a functional PCRE2 JIT compiler is a legitimate use case for >> performance reasons, we'll only do the fallback if the supposedly >> available JIT is found to be non-functional by attempting to JIT compile >> a very simple pattern. If this fails, JIT is deemed to be non-functional >> and we do the interpreter fallback. For all other cases, i.e. the simple >> pattern can be compiled but the user provided cannot, we fail hard as we >> do now as the reason for the failure must be the pattern itself. > > I do not know if it is a good idea to rely on the "very simple > pattern". The implementation of JIT could devise various ways to > succeed for such simple patterns without having writable-executable > piece of memory. Well, if PCRE2 JIT ever changes to optimize this case, we would be back to the error I'm seeing right now. But I doubt that PCRE2 will be doing optimizations like that. The current implementation does the JIT memory allocation test very early, even before looking at the pattern: https://github.com/PCRE2Project/pcre2/blob/pcre2-10.42/src/pcre2_jit_compile.c#L14450-L14465 But I can add a call to pcre2_pattern_info(PCRE2_INFO_JITSIZE) if you really like me to, but IMHO it's not needed. > What happened to the earlier idea of falling back > to the interpreted codepath, which will catch any bad pattern that > has "the reason for the failure" by failing anyway? Ævar's concerns about always falling back to the interpreter mode made me change the patch like this. Basically what he's concerned about are two things: 1/ "Crazy patterns" that fail the JIT but will work for the interpreter can be a serve performance regression. 2/ Always falling back to interpreter mode might mask JIT API usage errors, we'd like to see. While 1/ could also be seen as a limitation of current 'git grep', I share Ævar's extended runtime regression concerns. If, for example, some web interface offers users to supply arbitrary grep patterns, abusing the interpreter mode fallback will consume significant more CPU resources than it does right now (which simply fails with an error). > >> +static int pcre2_jit_functional(void) >> +{ >> + static int jit_working = -1; >> + pcre2_code *code; >> + size_t off; >> + int err; >> + >> + if (jit_working != -1) >> + return jit_working; >> + >> + /* >> + * Try to JIT compile a simple pattern to probe if the JIT is >> + * working in general. It might fail for systems where creating >> + * memory mappings for runtime code generation is restricted. >> + */ >> + code = pcre2_compile((PCRE2_SPTR)".", 1, 0, &err, &off, NULL); >> + if (!code) >> + return 0; >> + >> + jit_working = pcre2_jit_compile(code, PCRE2_JIT_COMPLETE) == 0; >> + pcre2_code_free(code); > > I'd prefer not having to worry about: Or it might not fail for such > systems, as the pattern is too simple and future versions of > pcre2_compile() could have special case code. See above, it's unlikely to happen. > >> @@ -317,8 +342,23 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt >> pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on); >> if (p->pcre2_jit_on) { >> jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE); >> - if (jitret) >> + if (jitret == PCRE2_ERROR_NOMEMORY && !pcre2_jit_functional()) { >> + /* >> + * Even though pcre2_config(PCRE2_CONFIG_JIT, ...) >> + * indicated JIT support, the library might still >> + * fail to generate JIT code for various reasons, >> + * e.g. when SELinux's 'deny_execmem' or PaX's >> + * MPROTECT prevent creating W|X memory mappings. >> + * >> + * Instead of faling hard, fall back to interpreter >> + * mode, just as if the pattern was prefixed with >> + * '(*NO_JIT)'. >> + */ >> + p->pcre2_jit_on = 0; >> + return; > > Yes, the "instead of failing hard, fall back" makes sense. Just > that I do not see why the runtime test is a good thing to have. It prevents the fallback from being abused and introducing new regressions. So it's good to have. > In > short, we are not in the business of catching bugs in pcre2_jit > implementations, so if they say they cannot compile the pattern (I > would even say I doubt the point of checking the return code to > ensure it is NOMEMORY), it would be fine to let the interpreter > codepath to inspect the pattern and diagnose problems with it, or > take the slow match without JIT. Yeah, unfortunately they're not gonna fix what's a bug, IMHO. They think it's a feature: https://github.com/PCRE2Project/pcre2/pull/157 Anyhow, the error code is very well documented, see pcre2_jit_compile(3) """ [...] The function can also return PCRE2_ERROR_NOMEMORY if JIT is unable to allocate executable memory for the compiler, even if it was because of a system security restriction. """ And that's very much in line with what the test in pcre2_jit_compile()'s current implementation does. > What am I missing? Please have a look at Ævar's reasoning here: https://lore.kernel.org/git/221220.86bknxwy9t.gmgdl@evledraar.gmail.com/ Thanks, Mathias ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3] grep: fall back to interpreter if JIT memory allocation fails 2023-01-27 15:49 ` [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails Mathias Krause 2023-01-27 16:34 ` Junio C Hamano @ 2023-01-31 18:56 ` Mathias Krause 2023-01-31 21:05 ` Junio C Hamano 1 sibling, 1 reply; 36+ messages in thread From: Mathias Krause @ 2023-01-31 18:56 UTC (permalink / raw) To: git Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason, Mathias Krause, Carlo Marcelo Arenas Belón Under Linux systems with SELinux's 'deny_execmem' or PaX's MPROTECT enabled, the allocation of PCRE2's JIT rwx memory may be prohibited, making pcre2_jit_compile() fail with PCRE2_ERROR_NOMEMORY (-48): [user@fedora git]$ git grep -c PCRE2_JIT grep.c:1 [user@fedora git]$ # Enable SELinux's W^X policy [user@fedora git]$ sudo semanage boolean -m -1 deny_execmem [user@fedora git]$ # JIT memory allocation fails, breaking 'git grep' [user@fedora git]$ git grep -c PCRE2_JIT fatal: Couldn't JIT the PCRE2 pattern 'PCRE2_JIT', got '-48' Instead of failing hard in this case and making 'git grep' unusable on such systems, simply fall back to interpreter mode, leading to a much better user experience. As having a functional PCRE2 JIT compiler is a legitimate use case for performance reasons, we'll only do the fallback if the supposedly available JIT is found to be non-functional by attempting to JIT compile a very simple pattern. If this fails, JIT is deemed to be non-functional and we do the interpreter fallback. For all other cases, i.e. the simple pattern can be compiled but the user provided cannot, we fail hard as we do now as the reason for the failure must be the pattern itself. To aid users in helping themselves change the error message to include a hint about the '(*NO_JIT)' prefix. Also clip the pattern at 64 characters to ensure the hint will be seen by the user and not internally truncated by the die() function. Cc: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Mathias Krause <minipli@grsecurity.net> --- v2: https://lore.kernel.org/git/20230127154952.485913-1-minipli@grsecurity.net/ Changes in v3: Mention the possibility to prefix a failing pattern with '(*NO_JIT)' in case we run into the JIT's limitations, as per Junio. Also clip the printed pattern to ensure the hint actually gets printed. --- grep.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 48 insertions(+), 2 deletions(-) diff --git a/grep.c b/grep.c index 06eed694936c..c6fd44e3ec94 100644 --- a/grep.c +++ b/grep.c @@ -262,6 +262,31 @@ static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data) free(pointer); } +static int pcre2_jit_functional(void) +{ + static int jit_working = -1; + pcre2_code *code; + size_t off; + int err; + + if (jit_working != -1) + return jit_working; + + /* + * Try to JIT compile a simple pattern to probe if the JIT is + * working in general. It might fail for systems where creating + * memory mappings for runtime code generation is restricted. + */ + code = pcre2_compile((PCRE2_SPTR)".", 1, 0, &err, &off, NULL); + if (!code) + return 0; + + jit_working = pcre2_jit_compile(code, PCRE2_JIT_COMPLETE) == 0; + pcre2_code_free(code); + + return jit_working; +} + static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt) { int error; @@ -317,8 +342,29 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on); if (p->pcre2_jit_on) { jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE); - if (jitret) - die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret); + if (jitret == PCRE2_ERROR_NOMEMORY && !pcre2_jit_functional()) { + /* + * Even though pcre2_config(PCRE2_CONFIG_JIT, ...) + * indicated JIT support, the library might still + * fail to generate JIT code for various reasons, + * e.g. when SELinux's 'deny_execmem' or PaX's + * MPROTECT prevent creating W|X memory mappings. + * + * Instead of faling hard, fall back to interpreter + * mode, just as if the pattern was prefixed with + * '(*NO_JIT)'. + */ + p->pcre2_jit_on = 0; + return; + } else if (jitret) { + int need_clip = p->patternlen > 64; + int clip_len = need_clip ? 64 : p->patternlen; + die("Couldn't JIT the PCRE2 pattern '%.*s'%s, got '%d'%s", + clip_len, p->pattern, need_clip ? "..." : "", jitret, + pcre2_jit_functional() + ? "\nPerhaps prefix (*NO_JIT) to your pattern?" + : ""); + } /* * The pcre2_config(PCRE2_CONFIG_JIT, ...) call just -- 2.39.1 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v3] grep: fall back to interpreter if JIT memory allocation fails 2023-01-31 18:56 ` [PATCH v3] " Mathias Krause @ 2023-01-31 21:05 ` Junio C Hamano 0 siblings, 0 replies; 36+ messages in thread From: Junio C Hamano @ 2023-01-31 21:05 UTC (permalink / raw) To: Mathias Krause Cc: git, Ævar Arnfjörð Bjarmason, Carlo Marcelo Arenas Belón Mathias Krause <minipli@grsecurity.net> writes: > Mention the possibility to prefix a failing pattern with '(*NO_JIT)' in > case we run into the JIT's limitations, as per Junio. Also clip the > printed pattern to ensure the hint actually gets printed. Looking good. Will queue. Thanks. ^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2023-01-31 21:05 UTC | newest] Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-12-16 12:15 [PATCH] grep: fall back to interpreter mode if JIT fails Mathias Krause 2022-12-16 16:12 ` Ævar Arnfjörð Bjarmason 2022-12-16 19:26 ` Mathias Krause 2022-12-16 23:09 ` Junio C Hamano 2022-12-17 2:50 ` Carlo Arenas 2022-12-19 9:00 ` Ævar Arnfjörð Bjarmason 2022-12-20 19:29 ` Mathias Krause 2022-12-20 21:11 ` Ævar Arnfjörð Bjarmason 2023-01-18 14:22 ` Mathias Krause 2023-01-18 15:44 ` Ævar Arnfjörð Bjarmason 2023-01-19 9:19 ` Mathias Krause 2022-12-16 22:52 ` Junio C Hamano 2022-12-20 20:40 ` Mathias Krause 2023-01-27 15:49 ` [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails Mathias Krause 2023-01-27 16:34 ` Junio C Hamano 2023-01-27 17:39 ` Junio C Hamano 2023-01-27 18:46 ` Junio C Hamano 2023-01-29 13:37 ` Mathias Krause 2023-01-29 13:36 ` Mathias Krause 2023-01-29 17:15 ` Junio C Hamano 2023-01-30 10:56 ` Ævar Arnfjörð Bjarmason 2023-01-30 18:49 ` Junio C Hamano 2023-01-31 8:34 ` Ævar Arnfjörð Bjarmason 2023-01-30 11:08 ` Mathias Krause 2023-01-30 18:54 ` Junio C Hamano 2023-01-30 20:08 ` Junio C Hamano 2023-01-30 21:21 ` Junio C Hamano 2023-01-30 22:30 ` Ramsay Jones 2023-01-30 23:27 ` Junio C Hamano 2023-01-31 7:48 ` Mathias Krause 2023-01-31 16:41 ` Junio C Hamano 2023-01-31 18:34 ` Mathias Krause 2023-01-31 7:30 ` Mathias Krause 2023-01-29 12:28 ` Mathias Krause 2023-01-31 18:56 ` [PATCH v3] " Mathias Krause 2023-01-31 21:05 ` Junio C Hamano
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).