git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git bug: Perl compatible regular expressions do not work as expected
@ 2023-03-25 12:31 Mario Grgic
  2023-03-25 12:42 ` Kristoffer Haugsbakk
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Mario Grgic @ 2023-03-25 12:31 UTC (permalink / raw)
  To: git

Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.

What did you do before the bug happened? (Steps to reproduce your issue)

Initilize a new git repo, create a simple C file that you commit into the new
repo. Then search the git history using Perl compatible regular expression. 

Basically,

mkdir test
cd test
git init

Create a file 

cat <<END > test.c
int main(int argc, const char *argv[])
{
    return 0;
}
END 

git commit -m 'added test file'

Then run:

git log --all -p -G '\bmain\b'

git is compiled with pcre2 library support but does not find main word in the
file we just added.


What did you expect to happen? (Expected behavior)
git compiled with pcre2 library should support Perl compatible regular
expressions

What happened instead? (Actual behavior)
Nothing is found, when in fact the search for term is present in the git
history

What's different between what you expected and what actually happened?
git should print out the commit containing the search term.

Anything else you want to add:

git was compiled with pcre2 library. Here is output of otool -L

$ otool -L git
git:
	/System/Library/Frameworks/CoreServices.framework/Versions/A/CoreServices (compatibility version 1.0.0, current version 1069.24.0)
	/usr/local/lib/libpcre2-8.0.dylib (compatibility version 12.0.0, current version 12.2.0)
	/usr/local/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.13)
	/usr/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0)
	/usr/local/lib/libintl.8.dylib (compatibility version 11.0.0, current version 11.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.100.1)
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1677.104.0)
	/usr/lib/libcharset.1.dylib (compatibility version 2.0.0, current version 2.0.0)


Please review the rest of the bug report below.
You can delete any lines you don't wish to share.


[System Info]
git version:
git version 2.40.0
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
uname: Darwin 19.6.0 Darwin Kernel Version 19.6.0: Tue Jun 21 21:18:39 PDT 2022; root:xnu-6153.141.66~1/RELEASE_X86_64 x86_64
compiler info: clang: 12.0.0 (clang-1200.0.32.21)
libc info: no libc information available
$SHELL (typically, interactive shell): /bin/bash


[Enabled Hooks]
not run from a git repository - no hooks to show


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: git bug: Perl compatible regular expressions do not work as expected
  2023-03-25 12:31 git bug: Perl compatible regular expressions do not work as expected Mario Grgic
@ 2023-03-25 12:42 ` Kristoffer Haugsbakk
  2023-03-25 12:59   ` Mario Grgic
  2023-03-25 13:04 ` demerphq
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Kristoffer Haugsbakk @ 2023-03-25 12:42 UTC (permalink / raw)
  To: Mario Grgic; +Cc: git

On Sat, Mar 25, 2023, at 13:31, Mario Grgic wrote:
> What happened instead? (Actual behavior)
> Nothing is found, when in fact the search for term is present in the git
> history
>
> What's different between what you expected and what actually happened?
> git should print out the commit containing the search term.

Weird. It works for me.

    $ mkdir test
    $ cd test
    $ git init
    $ [add file]
    $ # This was missing from the instructions
    $ git add --all
    $ git commit -m 'added test file'
    $ git log --all -p -G '\bmain\b'
    commit 56fbac5e12649c4de95071cc1872569d7c34055e (HEAD -> main)
    Author: Kristoffer Haugsbakk <code@khaugsbakk.name>
    Date:   Sat Mar 25 13:33:51 2023 +0100

        added test file

    diff --git a/test.c b/test.c
    new file mode 100644
    index 0000000..0bffa6a
    --- /dev/null
    +++ b/test.c
    @@ -0,0 +1,5 @@
    +int main(int argc, const char *argv[])
    +{
    +    return 0;
    +}
    +

.

    [System Info]
    git version:
    git version 2.40.0
    cpu: x86_64
    no commit associated with this build
    sizeof-long: 8
    sizeof-size_t: 8
    shell-path: /bin/sh
    uname: Linux 5.4.0-144-generic #161~18.04.1-Ubuntu SMP Fri Feb 10 15:55:22 UTC 2023 x86_64
    compiler info: gnuc: 7.5
    libc info: glibc: 2.27
    $SHELL (typically, interactive shell): /bin/bash


    [Enabled Hooks]

-- 
Kristoffer Haugsbakk

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: git bug: Perl compatible regular expressions do not work as expected
  2023-03-25 12:42 ` Kristoffer Haugsbakk
@ 2023-03-25 12:59   ` Mario Grgic
  0 siblings, 0 replies; 18+ messages in thread
From: Mario Grgic @ 2023-03-25 12:59 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: git

I did some bisecting and this is broken for me on git-2.39.2 as well, but works in git-2.38.4 and previous versions (tried 2.30.0 as well).


> On Mar 25, 2023, at 8:42 AM, Kristoffer Haugsbakk <code@khaugsbakk.name> wrote:
> 
> On Sat, Mar 25, 2023, at 13:31, Mario Grgic wrote:
>> What happened instead? (Actual behavior)
>> Nothing is found, when in fact the search for term is present in the git
>> history
>> 
>> What's different between what you expected and what actually happened?
>> git should print out the commit containing the search term.
> 
> Weird. It works for me.
> 
>    $ mkdir test
>    $ cd test
>    $ git init
>    $ [add file]
>    $ # This was missing from the instructions
>    $ git add --all
>    $ git commit -m 'added test file'
>    $ git log --all -p -G '\bmain\b'
>    commit 56fbac5e12649c4de95071cc1872569d7c34055e (HEAD -> main)
>    Author: Kristoffer Haugsbakk <code@khaugsbakk.name>
>    Date:   Sat Mar 25 13:33:51 2023 +0100
> 
>        added test file
> 
>    diff --git a/test.c b/test.c
>    new file mode 100644
>    index 0000000..0bffa6a
>    --- /dev/null
>    +++ b/test.c
>    @@ -0,0 +1,5 @@
>    +int main(int argc, const char *argv[])
>    +{
>    +    return 0;
>    +}
>    +
> 
> .
> 
>    [System Info]
>    git version:
>    git version 2.40.0
>    cpu: x86_64
>    no commit associated with this build
>    sizeof-long: 8
>    sizeof-size_t: 8
>    shell-path: /bin/sh
>    uname: Linux 5.4.0-144-generic #161~18.04.1-Ubuntu SMP Fri Feb 10 15:55:22 UTC 2023 x86_64
>    compiler info: gnuc: 7.5
>    libc info: glibc: 2.27
>    $SHELL (typically, interactive shell): /bin/bash
> 
> 
>    [Enabled Hooks]
> 
> -- 
> Kristoffer Haugsbakk


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: git bug: Perl compatible regular expressions do not work as expected
  2023-03-25 12:31 git bug: Perl compatible regular expressions do not work as expected Mario Grgic
  2023-03-25 12:42 ` Kristoffer Haugsbakk
@ 2023-03-25 13:04 ` demerphq
  2023-03-25 13:09   ` Mario Grgic
  2023-03-25 14:16 ` Mario Grgic
  2023-03-25 15:39 ` Mario Grgic
  3 siblings, 1 reply; 18+ messages in thread
From: demerphq @ 2023-03-25 13:04 UTC (permalink / raw)
  To: Mario Grgic; +Cc: git

On Sat, 25 Mar 2023 at 14:00, Mario Grgic <mario_grgic@hotmail.com> wrote:
> git log --all -p -G '\bmain\b'

I think you want -P not -p.

cheers,
Yves

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: git bug: Perl compatible regular expressions do not work as expected
  2023-03-25 13:04 ` demerphq
@ 2023-03-25 13:09   ` Mario Grgic
  2023-03-25 13:24     ` demerphq
  2023-03-25 18:09     ` René Scharfe
  0 siblings, 2 replies; 18+ messages in thread
From: Mario Grgic @ 2023-03-25 13:09 UTC (permalink / raw)
  To: demerphq; +Cc: git

The lowercase -p is to print the output in patch format. You can rewrite the command line as 

 git log --all --patch --perl-regexp -G '\bmain\b’

I still get no output in any git version after 2.38.4

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: git bug: Perl compatible regular expressions do not work as expected
  2023-03-25 13:09   ` Mario Grgic
@ 2023-03-25 13:24     ` demerphq
  2023-03-25 18:09     ` René Scharfe
  1 sibling, 0 replies; 18+ messages in thread
From: demerphq @ 2023-03-25 13:24 UTC (permalink / raw)
  To: Mario Grgic; +Cc: git

On Sat, 25 Mar 2023 at 14:10, Mario Grgic <mario_grgic@hotmail.com> wrote:
>
> The lowercase -p is to print the output in patch format. You can rewrite the command line as
>
>  git log --all --patch --perl-regexp -G '\bmain\b’
>
> I still get no output in any git version after 2.38.4

RIght, sorry about that. My bad.

Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: git bug: Perl compatible regular expressions do not work as expected
  2023-03-25 12:31 git bug: Perl compatible regular expressions do not work as expected Mario Grgic
  2023-03-25 12:42 ` Kristoffer Haugsbakk
  2023-03-25 13:04 ` demerphq
@ 2023-03-25 14:16 ` Mario Grgic
  2023-03-25 15:39 ` Mario Grgic
  3 siblings, 0 replies; 18+ messages in thread
From: Mario Grgic @ 2023-03-25 14:16 UTC (permalink / raw)
  To: git

According to git bisect, this got broken in the following commit:


commit 1819ad327b7a1f19540a819813b70a0e8a7f798f
Author: Diomidis Spinellis <dds@aueb.gr>
Date:   Fri Aug 26 11:58:15 2022 +0300

    grep: fix multibyte regex handling under macOS
    
    The commit 29de20504e (Makefile: fix default regex settings on
    Darwin, 2013-05-11) fixed t0070-fundamental.sh under Darwin (macOS) by
    adopting Git's regex library.  However, this library is compiled with
    NO_MBSUPPORT, which causes git-grep to work incorrectly on multibyte
    (e.g. UTF-8) files.  Current macOS versions pass t0070-fundamental.sh
    with the native macOS regex library, which also supports multibyte
    characters.
    
    Adjust the Makefile to use the native regex library, and call
    setlocale(3) to set CTYPE according to the user's preference.
    The setlocale call is required on all platforms, but in platforms
    supporting gettext(3), setlocale was called as a side-effect of
    initializing gettext.  Therefore, move the CTYPE setlocale call from
    gettext.c to common-main.c and the corresponding locale.h include
    into git-compat-util.h.
    
    Thanks to the global initialization of CTYPE setlocale, the test-tool
    regex command now works correctly with supported multibyte regexes, and
    is used to set the MB_REGEX test prerequisite by assessing a platform's
    support for them.
    
    Signed-off-by: Diomidis Spinellis <dds@aueb.gr>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>

 Makefile          |  2 +-
 common-main.c     |  1 +
 gettext.c         |  2 --
 git-compat-util.h |  1 +
 t/t7810-grep.sh   | 15 +++++++++++++++
 5 files changed, 18 insertions(+), 3 deletions(-)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: git bug: Perl compatible regular expressions do not work as expected
  2023-03-25 12:31 git bug: Perl compatible regular expressions do not work as expected Mario Grgic
                   ` (2 preceding siblings ...)
  2023-03-25 14:16 ` Mario Grgic
@ 2023-03-25 15:39 ` Mario Grgic
  2023-03-27 16:30   ` Junio C Hamano
  3 siblings, 1 reply; 18+ messages in thread
From: Mario Grgic @ 2023-03-25 15:39 UTC (permalink / raw)
  To: git

Confirming that putting back NO_REGEX = YesPlease in the Makefile fixes the problem. I.e. the following patch fixes it for me:

--- Makefile	2023-03-25 11:24:01.000000000 -0400
+++ Makefile.patched	2023-03-25 11:25:11.000000000 -0400
@@ -1554,6 +1554,7 @@
 		APPLE_COMMON_CRYPTO = YesPlease
 		COMPAT_CFLAGS += -DAPPLE_COMMON_CRYPTO
 	endif
+	NO_REGEX = YesPlease
 	PTHREAD_LIBS =
 endif
 


Perhaps the real problem is that the build system should not use macOS native regex library is its configured to use —pcre2 library.




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: git bug: Perl compatible regular expressions do not work as expected
  2023-03-25 13:09   ` Mario Grgic
  2023-03-25 13:24     ` demerphq
@ 2023-03-25 18:09     ` René Scharfe
  2023-03-27 16:29       ` Junio C Hamano
  1 sibling, 1 reply; 18+ messages in thread
From: René Scharfe @ 2023-03-25 18:09 UTC (permalink / raw)
  To: Mario Grgic, demerphq; +Cc: git

Am 25.03.23 um 14:09 schrieb Mario Grgic:
> The lowercase -p is to print the output in patch format. You can rewrite the command line as
>
>  git log --all --patch --perl-regexp -G '\bmain\b’
>
> I still get no output in any git version after 2.38.4

-G doesn't support Perl regular expressions.  --perl-regexp only affects
--grep, --grep-reflog, --author, and --committer.  Neither POSIX basic
nor extended regular expressions support \b as word boundary.  GNU regex
and our compat/regex/ do, as extensions.  macOS regex supports it if the
flag REG_ENHANCED is given to regcomp(3).

So perhaps this is rather a feature request to support Perl regular
expressions for -G (and probably -S as well).  Or to enable REG_ENHANCED
for them, at least, like 54463d32ef (use enhanced basic regular
expressions on macOS, 2023-01-08) did to get alternations for git grep
on macOS.

René


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: git bug: Perl compatible regular expressions do not work as expected
  2023-03-25 18:09     ` René Scharfe
@ 2023-03-27 16:29       ` Junio C Hamano
  2023-03-27 17:23         ` René Scharfe
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2023-03-27 16:29 UTC (permalink / raw)
  To: René Scharfe; +Cc: Mario Grgic, demerphq, git

René Scharfe <l.s.r@web.de> writes:

> Am 25.03.23 um 14:09 schrieb Mario Grgic:
>> The lowercase -p is to print the output in patch format. You can rewrite the command line as
>>
>>  git log --all --patch --perl-regexp -G '\bmain\b’
>>
>> I still get no output in any git version after 2.38.4
>
> -G doesn't support Perl regular expressions.  --perl-regexp only affects
> --grep, --grep-reflog, --author, and --committer.  Neither POSIX basic
> nor extended regular expressions support \b as word boundary.  GNU regex
> and our compat/regex/ do, as extensions.  macOS regex supports it if the
> flag REG_ENHANCED is given to regcomp(3).

Good summary to unconfuse speculations in the thread.

> So perhaps this is rather a feature request to support Perl regular
> expressions for -G (and probably -S as well).  

Perhaps.  I used to be a "it would be wonderful if pcre were usable
everywhere" dreamer, but after seeing our share of bugs caused by
use of pcre, I am not a huge proponent anymore.  I do not object to
such an enhancement at all, as long as it is done cleanly and in
such a way that it is clear pcre cannot be used by accident when the
user does not ask for it.

> Or to enable REG_ENHANCED
> for them, at least, like 54463d32ef (use enhanced basic regular
> expressions on macOS, 2023-01-08) did to get alternations for git grep
> on macOS.

This one sounds like a reasonable thing, which may not have huge
unintended fallout, to do.  I am a bit surprised that we have to
cover each individual callsite of regcomp(3), though.  Doesn't the
54463d32ef fix use "#define regcomp git_regcomp" to cover everybody?




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: git bug: Perl compatible regular expressions do not work as expected
  2023-03-25 15:39 ` Mario Grgic
@ 2023-03-27 16:30   ` Junio C Hamano
  2023-03-27 17:22     ` Mario Grgic
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2023-03-27 16:30 UTC (permalink / raw)
  To: Mario Grgic; +Cc: git

Mario Grgic <mario_grgic@hotmail.com> writes:

> Confirming that putting back NO_REGEX = YesPlease in the Makefile fixes the problem. I.e. the following patch fixes it for me:
>
> --- Makefile	2023-03-25 11:24:01.000000000 -0400
> +++ Makefile.patched	2023-03-25 11:25:11.000000000 -0400
> @@ -1554,6 +1554,7 @@
>  		APPLE_COMMON_CRYPTO = YesPlease
>  		COMPAT_CFLAGS += -DAPPLE_COMMON_CRYPTO
>  	endif
> +	NO_REGEX = YesPlease
>  	PTHREAD_LIBS =
>  endif

It will unfortunately break multibyte support on macOS by reverting
what 1819ad32 (grep: fix multibyte regex handling under macOS,
2022-08-26) did.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: git bug: Perl compatible regular expressions do not work as expected
  2023-03-27 16:30   ` Junio C Hamano
@ 2023-03-27 17:22     ` Mario Grgic
  2023-03-27 21:11       ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Mario Grgic @ 2023-03-27 17:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

In my case, I compiled git with pcre2 support, using third party PCRE2 library: https://github.com/PCRE2Project/pcre2
 and PCRE and multibyte support in git works with it just fine.

E.g. adding and committing a file:


```
#!/usr/bin/env node

console.log('顔🏁');
```

and searching it with

git log --all -p  -G '顔🏁’

works. 

Again, perhaps the solution here is to distinguish two cases:

1. git is compiled with native regex library on macOS 
2. git is compiled with third party regex library with PCRE and multibyte support on macOS

in case 2 everything works. 

Case 1 is perhaps more work on macOS and a feature request.



> On Mar 27, 2023, at 12:30 PM, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Mario Grgic <mario_grgic@hotmail.com> writes:
> 
>> Confirming that putting back NO_REGEX = YesPlease in the Makefile fixes the problem. I.e. the following patch fixes it for me:
>> 
>> --- Makefile	2023-03-25 11:24:01.000000000 -0400
>> +++ Makefile.patched	2023-03-25 11:25:11.000000000 -0400
>> @@ -1554,6 +1554,7 @@
>> 		APPLE_COMMON_CRYPTO = YesPlease
>> 		COMPAT_CFLAGS += -DAPPLE_COMMON_CRYPTO
>> 	endif
>> +	NO_REGEX = YesPlease
>> 	PTHREAD_LIBS =
>> endif
> 
> It will unfortunately break multibyte support on macOS by reverting
> what 1819ad32 (grep: fix multibyte regex handling under macOS,
> 2022-08-26) did.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: git bug: Perl compatible regular expressions do not work as expected
  2023-03-27 16:29       ` Junio C Hamano
@ 2023-03-27 17:23         ` René Scharfe
  2023-03-27 21:33           ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: René Scharfe @ 2023-03-27 17:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Mario Grgic, demerphq, git

Am 27.03.23 um 18:29 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> Or to enable REG_ENHANCED
>> for them, at least, like 54463d32ef (use enhanced basic regular
>> expressions on macOS, 2023-01-08) did to get alternations for git grep
>> on macOS.
>
> This one sounds like a reasonable thing, which may not have huge
> unintended fallout, to do.  I am a bit surprised that we have to
> cover each individual callsite of regcomp(3), though.  Doesn't the
> 54463d32ef fix use "#define regcomp git_regcomp" to cover everybody?

54463d32ef only affects basic regular expressions (BRE), but -G and -S
use extended ones (ERE).  macOS allows both to be "enhanced".

I have a hard time finding a readable reference to the documentation to
brings us all onto the same page regarding what REG_ENHANCED actually
does.  [1] is in raw troff format, [2] isn't very pretty and shows ads.
Anyway, the rather lengthy section "ENHANCED FEATURES" explains it.

René


[1] https://opensource.apple.com/source/Libc/Libc-1439.40.11/regex/FreeBSD/re_format.7.auto.html
[2] https://www.unix.com/man-page/osx/7/re_format/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: git bug: Perl compatible regular expressions do not work as expected
  2023-03-27 17:22     ` Mario Grgic
@ 2023-03-27 21:11       ` Junio C Hamano
  2023-03-28  0:03         ` Mario Grgic
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2023-03-27 21:11 UTC (permalink / raw)
  To: Mario Grgic; +Cc: git

Mario Grgic <mario_grgic@hotmail.com> writes:

[administrivia: do not top post]

>> Mario Grgic <mario_grgic@hotmail.com> writes:
>>  ...
>>> +	NO_REGEX = YesPlease
>>> 	PTHREAD_LIBS =
>>> endif
>> 
>> It will unfortunately break multibyte support on macOS by reverting
>> what 1819ad32 (grep: fix multibyte regex handling under macOS,
>> 2022-08-26) did.

> In my case, I compiled git with pcre2 support, using third party
> PCRE2 library: https://github.com/PCRE2Project/pcre2 and PCRE and
> multibyte support in git works with it just fine.

Sorry, you misunderstood.  1819ad32 is about enabling multi-byte
support for normal regexp types, and does not have anything to do
with pcre.  By setting NO_REGEX, the build will not link with
Apple's regex library but the one from compat/ and that version is
what is used for -G and -E (not -P).  -G/-E patterns with multi-byte
would not work with compat/ stuff, but they should work when linked
with Apple's regex library.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: git bug: Perl compatible regular expressions do not work as expected
  2023-03-27 17:23         ` René Scharfe
@ 2023-03-27 21:33           ` Junio C Hamano
  2023-03-28 13:47             ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2023-03-27 21:33 UTC (permalink / raw)
  To: René Scharfe; +Cc: Mario Grgic, demerphq, git

René Scharfe <l.s.r@web.de> writes:

> 54463d32ef only affects basic regular expressions (BRE), but -G and -S
> use extended ones (ERE).  macOS allows both to be "enhanced".
>
> I have a hard time finding a readable reference to the documentation to
> brings us all onto the same page regarding what REG_ENHANCED actually
> does.  [1] is in raw troff format, [2] isn't very pretty and shows ads.

Wow, [2] is also misleading in that it loses the distinction between
'+' and '\+'; in [1] at least you can see the differences between \&+
and \e+ ;-)

> Anyway, the rather lengthy section "ENHANCED FEATURES" explains it.

I suspect that 54463d32ef was done in a conservative way to avoid
unintended side effects to make ERE "enhanced".  I am not 100%
certain, but after reading the documentation you pointed at, I do
not see a valid expression without ENHANCED flag starting to mean
totally different thing with it (well, an extra '?' turning a
pattern from greedy to minimal may count as such a change in
semantics, but I do not see anybody sensible adding an extra '?'
in a pattern in the first place).

Perhaps it is safe to go ahead and loosen what 54463d32ef did, with
something like this (this is not even compile tested, of course)?

 Makefile                  | 9 +++++++++
 compat/regcomp_enhanced.c | 2 ++
 git-compat-util.h         | 2 +-
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git c/Makefile w/Makefile
index 50ee51fde3..1fe0eb399e 100644
--- c/Makefile
+++ w/Makefile
@@ -293,6 +293,10 @@ include shared.mak
 # the flag REG_ENHANCED and you'd like to use it to enable enhanced basic
 # regular expressions.
 #
+# Define USE_ENHANCED_REGULAR_EXPRESSIONS if your C library provides
+# the flag REG_ENHANCED and you'd like to use it to enable enhanced
+# regular expressions for both BRE and ERE.
+#
 # Define HAVE_DEV_TTY if your system can open /dev/tty to interact with the
 # user.
 #
@@ -2041,11 +2045,16 @@ ifdef NO_REGEX
 	COMPAT_CFLAGS += -Icompat/regex
 	COMPAT_OBJS += compat/regex/regex.o
 else
+ifdef USE_ENHANCED_REGULAR_EXPRESSIONS
+	COMPAT_CFLAGS += -DUSE_ENHANCED_REGULAR_EXPRESSIONS
+	COMPAT_OBJS += compat/regcomp_enhanced.o
+else
 ifdef USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS
 	COMPAT_CFLAGS += -DUSE_ENHANCED_BASIC_REGULAR_EXPRESSIONS
 	COMPAT_OBJS += compat/regcomp_enhanced.o
 endif
 endif
+endif
 ifdef NATIVE_CRLF
 	BASIC_CFLAGS += -DNATIVE_CRLF
 endif
diff --git c/compat/regcomp_enhanced.c w/compat/regcomp_enhanced.c
index 84193ce53b..aadbbac6d3 100644
--- c/compat/regcomp_enhanced.c
+++ w/compat/regcomp_enhanced.c
@@ -3,7 +3,9 @@
 
 int git_regcomp(regex_t *preg, const char *pattern, int cflags)
 {
+#ifndef USE_ENHANCED_REGULAR_EXPRESSIONS
 	if (!(cflags & REG_EXTENDED))
+#endif
 		cflags |= REG_ENHANCED;
 	return regcomp(preg, pattern, cflags);
 }
diff --git c/git-compat-util.h w/git-compat-util.h
index 1e6592624d..95d9b75a15 100644
--- c/git-compat-util.h
+++ w/git-compat-util.h
@@ -1377,7 +1377,7 @@ static inline int regexec_buf(const regex_t *preg, const char *buf, size_t size,
 	return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND);
 }
 
-#ifdef USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS
+#if defined(USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS) || defined(USE_ENHANCED_REGULAR_EXPRESSIONS)
 int git_regcomp(regex_t *preg, const char *pattern, int cflags);
 #define regcomp git_regcomp
 #endif


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: git bug: Perl compatible regular expressions do not work as expected
  2023-03-27 21:11       ` Junio C Hamano
@ 2023-03-28  0:03         ` Mario Grgic
  0 siblings, 0 replies; 18+ messages in thread
From: Mario Grgic @ 2023-03-28  0:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Interesting. I could be wrong. Let me clarify what I did and what I observe. I edited the Makefile and put back the NO_REGEX = YesPlease line. Then I configured the build as 

./configure —with-libpcre2=/usr/local 

then built and installed git. 

otool -L git shows git binary is dynamically linking the libpcre in /usr/local

However, it looks like this third party library is also used for -G  searches (plain POSIX regex or PCRE), since it accepts and correctly finds things like ‘\btext\b’. That is 

git log —all -p -G ‘\main\b’

works

Furthermore, -G works with multibyte strings as well: 

git log --all -p  -G '顔🏁’

What doesn’t work is multibyte strings with PCRESs, so for example this does not work:

git log --all -p  -G ‘\b顔🏁\b’

This doesn’t seem to be consistent with the thesis that regex library from compat is used for -G searches and that it doesn’t support multibyte strings?


> On Mar 27, 2023, at 5:11 PM, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Mario Grgic <mario_grgic@hotmail.com> writes:
> 
> [administrivia: do not top post]
> 
>>> Mario Grgic <mario_grgic@hotmail.com> writes:
>>> ...
>>>> +	NO_REGEX = YesPlease
>>>> 	PTHREAD_LIBS =
>>>> endif
>>> 
>>> It will unfortunately break multibyte support on macOS by reverting
>>> what 1819ad32 (grep: fix multibyte regex handling under macOS,
>>> 2022-08-26) did.
> 
>> In my case, I compiled git with pcre2 support, using third party
>> PCRE2 library: https://github.com/PCRE2Project/pcre2 and PCRE and
>> multibyte support in git works with it just fine.
> 
> Sorry, you misunderstood.  1819ad32 is about enabling multi-byte
> support for normal regexp types, and does not have anything to do
> with pcre.  By setting NO_REGEX, the build will not link with
> Apple's regex library but the one from compat/ and that version is
> what is used for -G and -E (not -P).  -G/-E patterns with multi-byte
> would not work with compat/ stuff, but they should work when linked
> with Apple's regex library.
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: git bug: Perl compatible regular expressions do not work as expected
  2023-03-27 21:33           ` Junio C Hamano
@ 2023-03-28 13:47             ` Junio C Hamano
  2023-03-28 17:56               ` René Scharfe
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2023-03-28 13:47 UTC (permalink / raw)
  To: René Scharfe; +Cc: Mario Grgic, demerphq, git

Junio C Hamano <gitster@pobox.com> writes:

> I suspect that 54463d32ef was done in a conservative way to avoid
> unintended side effects to make ERE "enhanced".  I am not 100%
> certain, but after reading the documentation you pointed at, I do
> not see a valid expression without ENHANCED flag starting to mean
> totally different thing with it (well, an extra '?' turning a
> pattern from greedy to minimal may count as such a change in
> semantics, but I do not see anybody sensible adding an extra '?'
> in a pattern in the first place).

Sorry, but that is nonsense.  We cannot avoid being backward
incompatible if we suddenly flip the "enhanced" bit for BRE.  A
sane pattern written expecting non-enhanced BRE can change its
meaning when the "enhanced" mode is enabled.

But if "enhanced" is what users want, and if that is what the other
tools on the platform use, then perhaps flipping the "enhanced" bit
may not be a bad idea.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: git bug: Perl compatible regular expressions do not work as expected
  2023-03-28 13:47             ` Junio C Hamano
@ 2023-03-28 17:56               ` René Scharfe
  0 siblings, 0 replies; 18+ messages in thread
From: René Scharfe @ 2023-03-28 17:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Mario Grgic, demerphq, git

Am 28.03.23 um 15:47 schrieb Junio C Hamano:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> I suspect that 54463d32ef was done in a conservative way to avoid
>> unintended side effects to make ERE "enhanced".  I am not 100%
>> certain, but after reading the documentation you pointed at, I do
>> not see a valid expression without ENHANCED flag starting to mean
>> totally different thing with it (well, an extra '?' turning a
>> pattern from greedy to minimal may count as such a change in
>> semantics, but I do not see anybody sensible adding an extra '?'
>> in a pattern in the first place).
>
> Sorry, but that is nonsense.  We cannot avoid being backward
> incompatible if we suddenly flip the "enhanced" bit for BRE.  A
> sane pattern written expecting non-enhanced BRE can change its
> meaning when the "enhanced" mode is enabled.

Right, and there are even more enhancements for ERE, i.e. more
incompatibilities.  Not sure how big of a problem that actually would
be, though.

> But if "enhanced" is what users want, and if that is what the other
> tools on the platform use, then perhaps flipping the "enhanced" bit
> may not be a bad idea.

Apple's grep(1) uses it, with and without -E:

https://github.com/apple-oss-distributions/text_cmds/search?q=REG_ENHANCED

E.g. awk(1) and sed(1) don't, AFAICS.

René

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-03-28 17:56 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-25 12:31 git bug: Perl compatible regular expressions do not work as expected Mario Grgic
2023-03-25 12:42 ` Kristoffer Haugsbakk
2023-03-25 12:59   ` Mario Grgic
2023-03-25 13:04 ` demerphq
2023-03-25 13:09   ` Mario Grgic
2023-03-25 13:24     ` demerphq
2023-03-25 18:09     ` René Scharfe
2023-03-27 16:29       ` Junio C Hamano
2023-03-27 17:23         ` René Scharfe
2023-03-27 21:33           ` Junio C Hamano
2023-03-28 13:47             ` Junio C Hamano
2023-03-28 17:56               ` René Scharfe
2023-03-25 14:16 ` Mario Grgic
2023-03-25 15:39 ` Mario Grgic
2023-03-27 16:30   ` Junio C Hamano
2023-03-27 17:22     ` Mario Grgic
2023-03-27 21:11       ` Junio C Hamano
2023-03-28  0:03         ` Mario Grgic

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).