git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* BUG: git grep behave oddly with alternatives
@ 2023-01-03  9:53 Marco Nenciarini
  2023-01-03 16:29 ` René Scharfe
  0 siblings, 1 reply; 22+ messages in thread
From: Marco Nenciarini @ 2023-01-03  9:53 UTC (permalink / raw)
  To: git

I've installed latest git from brew and suddenly git grep started 
behaving oddly when using alternatives.

```
$ echo abd > test.file
$ git grep --untracked 'a\(b\|c\)d' test.file
$ git grep --untracked 'a\(b\)d' test.file
test.file:abd
```

It should have matched in both cases.


If I switch to exented pattern it starts working again

```
$ git grep --untracked -E 'a(b|c)d' test.file
test.file:abd
```

[System Info]
git version:
git version 2.39.0
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
uname: Darwin 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:08:47 
PST 2022; root:xnu-8792.61.2~4/RELEASE_X86_64 x86_64
compiler info: clang: 14.0.0 (clang-1400.0.29.202)
libc info: no libc information available
$SHELL (typically, interactive shell): /usr/local/bin/bash


-- 
Marco Nenciarini - EnterpriseDB
marco.nenciarini@enterprisedb.com | www.enterprisedb.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-03  9:53 BUG: git grep behave oddly with alternatives Marco Nenciarini
@ 2023-01-03 16:29 ` René Scharfe
  2023-01-03 18:13   ` Marco Nenciarini
  0 siblings, 1 reply; 22+ messages in thread
From: René Scharfe @ 2023-01-03 16:29 UTC (permalink / raw)
  To: Marco Nenciarini, git

Am 03.01.23 um 10:53 schrieb Marco Nenciarini:
> I've installed latest git from brew and suddenly git grep started behaving oddly when using alternatives.
>
> ```
> $ echo abd > test.file
> $ git grep --untracked 'a\(b\|c\)d' test.file
> $ git grep --untracked 'a\(b\)d' test.file
> test.file:abd
> ```
>
> It should have matched in both cases.
>
>
> If I switch to exented pattern it starts working again
>
> ```
> $ git grep --untracked -E 'a(b|c)d' test.file
> test.file:abd
> ```

This is expected: Basic regular expressions don't support alternation.

Under which circumstances did it work for you without -E or -P?

>
> [System Info]
> git version:
> git version 2.39.0
> cpu: x86_64
> no commit associated with this build
> sizeof-long: 8
> sizeof-size_t: 8
> shell-path: /bin/sh
> feature: fsmonitor--daemon
> uname: Darwin 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:08:47 PST 2022; root:xnu-8792.61.2~4/RELEASE_X86_64 x86_64
> compiler info: clang: 14.0.0 (clang-1400.0.29.202)
> libc info: no libc information available
> $SHELL (typically, interactive shell): /usr/local/bin/bash
>
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-03 16:29 ` René Scharfe
@ 2023-01-03 18:13   ` Marco Nenciarini
  2023-01-03 20:52     ` René Scharfe
  0 siblings, 1 reply; 22+ messages in thread
From: Marco Nenciarini @ 2023-01-03 18:13 UTC (permalink / raw)
  To: René Scharfe, git

On 03/01/23 17:29, René Scharfe wrote:
> Am 03.01.23 um 10:53 schrieb Marco Nenciarini:
>> I've installed latest git from brew and suddenly git grep started behaving oddly when using alternatives.
>>
>> ```
>> $ echo abd > test.file
>> $ git grep --untracked 'a\(b\|c\)d' test.file
>> $ git grep --untracked 'a\(b\)d' test.file
>> test.file:abd
>> ```
>>
>> It should have matched in both cases.
>>
>>
>> If I switch to exented pattern it starts working again
>>
>> ```
>> $ git grep --untracked -E 'a(b|c)d' test.file
>> test.file:abd
>> ```
> 
> This is expected: Basic regular expressions don't support alternation.
> 
> Under which circumstances did it work for you without -E or -P?
> 

It has always worked until I installed 2.39.0 on my mac. I've also 
checked with other developers that are using a mac and they reports the 
same. On Linux it works regardless the git version.

Using grep directly also works, so it is a different behavior between 
grep and git grep, that is surprising.

>>
>> [System Info]
>> git version:
>> git version 2.39.0
>> cpu: x86_64
>> no commit associated with this build
>> sizeof-long: 8
>> sizeof-size_t: 8
>> shell-path: /bin/sh
>> feature: fsmonitor--daemon
>> uname: Darwin 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:08:47 PST 2022; root:xnu-8792.61.2~4/RELEASE_X86_64 x86_64
>> compiler info: clang: 14.0.0 (clang-1400.0.29.202)
>> libc info: no libc information available
>> $SHELL (typically, interactive shell): /usr/local/bin/bash
>>
>>
> 

-- 
Marco Nenciarini - EnterpriseDB
marco.nenciarini@enterprisedb.com | www.enterprisedb.com


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-03 18:13   ` Marco Nenciarini
@ 2023-01-03 20:52     ` René Scharfe
  2023-01-04  6:13       ` Junio C Hamano
  2023-01-04  7:46       ` Jeff King
  0 siblings, 2 replies; 22+ messages in thread
From: René Scharfe @ 2023-01-03 20:52 UTC (permalink / raw)
  To: Marco Nenciarini, git
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason

Am 03.01.23 um 19:13 schrieb Marco Nenciarini:
> On 03/01/23 17:29, René Scharfe wrote:
>> Am 03.01.23 um 10:53 schrieb Marco Nenciarini:
>>> I've installed latest git from brew and suddenly git grep started behaving oddly when using alternatives.
>>>
>>> ```
>>> $ echo abd > test.file
>>> $ git grep --untracked 'a\(b\|c\)d' test.file
>>> $ git grep --untracked 'a\(b\)d' test.file
>>> test.file:abd
>>> ```
>>>
>>> It should have matched in both cases.
>>>
>>>
>>> If I switch to exented pattern it starts working again
>>>
>>> ```
>>> $ git grep --untracked -E 'a(b|c)d' test.file
>>> test.file:abd
>>> ```
>>
>> This is expected: Basic regular expressions don't support alternation.
>>
>> Under which circumstances did it work for you without -E or -P?
>>
>
> It has always worked until I installed 2.39.0 on my mac. I've also checked with other developers that are using a mac and they reports the same. On Linux it works regardless the git version.
>
> Using grep directly also works, so it is a different behavior between grep and git grep, that is surprising.

Meaning you used Apple's version of git before?

   $ echo abd > test.file
   $ /usr/bin/git grep --untracked 'b\|c' test.file
   test.file:abd
   $ /usr/bin/git version
   git version 2.37.1 (Apple Git-137.1)

   $ git grep --untracked 'b\|c' test.file
   $ git version
   git version 2.39.0

So I can reproduce your findings on macOS Ventura.  Same with grep:

   $ grep 'b\|c' test.file
   abd
   $ grep --version
   grep (BSD grep, GNU compatible) 2.6.0-FreeBSD

re_format(7) says:

   "Obsolete (“basic”) regular expressions differ in several respects.
    ‘|’ is an ordinary character and there is no equivalent for its
    functionality.".

Under the headline "ENHANCED FEATURES" it continues, however:

   "When the REG_ENHANCED flag is passed to one of the regcomp()
    variants, additional features are activated."

And:

   "For enhanced basic REs, ‘+’, ‘?’ and ‘|’ remain regular characters,
    but ‘\+’, ‘\?’ and ‘\|’ have the same special meaning as the
    unescaped characters do for extended REs, i.e., one or more
    matches, zero or one matches and alteration, respectively."

So apparently Apple chose a middle ground between basic and extended
regular expressions for its grep and git grep.

Under "IMPLEMENTATION CHOICES" it says:

   "The regex implementation in Mac OS X 10.8 and later is based on a
    heavily modified subset of TRE (http://laurikari.net/tre/)."

The manpage of GNU grep says:

   "grep understands three different versions of regular expression
    syntax: “basic” (BRE), “extended” (ERE) and “perl” (PCRE).  In
    GNU grep there is no difference in available functionality
    between basic and extended syntax.  In other implementations,
    basic regular expressions are less powerful."

And under the headline "Basic vs Extended Regular Expressions":

   "In basic regular expressions the meta-characters ?, +, {, |, (,
    and ) lose their special meaning; instead use the backslashed
    versions \?, \+, \{, \|, \(, and \)."

So GNU grep apparently made the same choice as Apple, probably far
earlier.

When I compile git with NO_REGEX and thus with the fallback in
compat/regex/, the result supports alternation as well:

   $ ./git grep --untracked 'b\|c' test.file
   test.file:abd
   $ nm ./git | grep regcomp
   0000000100255978 T _git_regcomp

Based on that I suggest:

--- >8 ---
Subject: grep: use REG_ENHANCED on macOS

GNU grep(1) and regcomp(3) use enhanced basic regular expressions by
default, which means that it e.g. supports alternation, e.g. "a\|b"
matches both "a" and "b".  The same is true for our compat/regex/
implementation.

On macOS Ventura (and possibly earlier) grep(1) also uses enhanced BREs,
but regcomp(3) requires the flag REG_ENHANCED to turn them on.  Do that
for git grep if possible, for consistency with the platform's grep(1)
and our fallback library.

It would be simpler to use REG_ENHANCED everywhere it is defined instead
of adding a Makefile setting, but it's a non-standard extension and
might mean something else on other platforms.

Reported-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 Makefile         | 8 ++++++++
 config.mak.uname | 1 +
 grep.c           | 4 ++++
 3 files changed, 13 insertions(+)

diff --git a/Makefile b/Makefile
index db447d0738..15e7edc9d2 100644
--- a/Makefile
+++ b/Makefile
@@ -289,6 +289,10 @@ include shared.mak
 # Define NO_REGEX if your C library lacks regex support with REG_STARTEND
 # feature.
 #
+# Define GIT_GREP_USES_REG_ENHANCED if your C library provides the flag
+# REG_ENHANCED to enable enhanced basic regular expressions and you'd
+# like to use it in git grep.
+#
 # Define HAVE_DEV_TTY if your system can open /dev/tty to interact with the
 # user.
 #
@@ -2037,6 +2041,10 @@ endif
 ifdef NO_REGEX
 	COMPAT_CFLAGS += -Icompat/regex
 	COMPAT_OBJS += compat/regex/regex.o
+else
+ifdef GIT_GREP_USES_REG_ENHANCED
+	COMPAT_CFLAGS += -DGIT_GREP_USES_REG_ENHANCED
+endif
 endif
 ifdef NATIVE_CRLF
 	BASIC_CFLAGS += -DNATIVE_CRLF
diff --git a/config.mak.uname b/config.mak.uname
index d63629fe80..14c145ae42 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -147,6 +147,7 @@ ifeq ($(uname_S),Darwin)
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
 	HAVE_NS_GET_EXECUTABLE_PATH = YesPlease
 	CSPRNG_METHOD = arc4random
+	GIT_GREP_USES_REG_ENHANCED = YesPlease

 	# Workaround for `gettext` being keg-only and not even being linked via
 	# `brew link --force gettext`, should be obsolete as of
diff --git a/grep.c b/grep.c
index 06eed69493..a8430daaba 100644
--- a/grep.c
+++ b/grep.c
@@ -502,6 +502,10 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 		regflags |= REG_ICASE;
 	if (opt->pattern_type_option == GREP_PATTERN_TYPE_ERE)
 		regflags |= REG_EXTENDED;
+#if defined(GIT_GREP_USES_REG_ENHANCED) && defined(REG_ENHANCED)
+	else
+		regflags |= REG_ENHANCED;
+#endif
 	err = regcomp(&p->regexp, p->pattern, regflags);
 	if (err) {
 		char errbuf[1024];
--
2.39.0

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-03 20:52     ` René Scharfe
@ 2023-01-04  6:13       ` Junio C Hamano
  2023-01-04  7:46       ` Jeff King
  1 sibling, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2023-01-04  6:13 UTC (permalink / raw)
  To: René Scharfe
  Cc: Marco Nenciarini, git, Ævar Arnfjörð Bjarmason

René Scharfe <l.s.r@web.de> writes:

>    "For enhanced basic REs, ‘+’, ‘?’ and ‘|’ remain regular characters,
>     but ‘\+’, ‘\?’ and ‘\|’ have the same special meaning as the
>     unescaped characters do for extended REs, i.e., one or more
>     matches, zero or one matches and alteration, respectively."
>
> So apparently Apple chose a middle ground between basic and extended
> regular expressions for its grep and git grep.

Sounds like GNU extension to me.

> So GNU grep apparently made the same choice as Apple, probably far
> earlier.

Yup.

> Based on that I suggest:
> ...
>
> It would be simpler to use REG_ENHANCED everywhere it is defined instead
> of adding a Makefile setting, but it's a non-standard extension and
> might mean something else on other platforms.

OK.  Very conservative and good.

> Reported-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  Makefile         | 8 ++++++++
>  config.mak.uname | 1 +
>  grep.c           | 4 ++++
>  3 files changed, 13 insertions(+)
>
> diff --git a/Makefile b/Makefile
> index db447d0738..15e7edc9d2 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -289,6 +289,10 @@ include shared.mak
>  # Define NO_REGEX if your C library lacks regex support with REG_STARTEND
>  # feature.
>  #
> +# Define GIT_GREP_USES_REG_ENHANCED if your C library provides the flag
> +# REG_ENHANCED to enable enhanced basic regular expressions and you'd
> +# like to use it in git grep.
> +#
>  # Define HAVE_DEV_TTY if your system can open /dev/tty to interact with the
>  # user.
>  #
> @@ -2037,6 +2041,10 @@ endif
>  ifdef NO_REGEX
>  	COMPAT_CFLAGS += -Icompat/regex
>  	COMPAT_OBJS += compat/regex/regex.o
> +else
> +ifdef GIT_GREP_USES_REG_ENHANCED
> +	COMPAT_CFLAGS += -DGIT_GREP_USES_REG_ENHANCED
> +endif
>  endif
>  ifdef NATIVE_CRLF
>  	BASIC_CFLAGS += -DNATIVE_CRLF
> diff --git a/config.mak.uname b/config.mak.uname
> index d63629fe80..14c145ae42 100644
> --- a/config.mak.uname
> +++ b/config.mak.uname
> @@ -147,6 +147,7 @@ ifeq ($(uname_S),Darwin)
>  	FREAD_READS_DIRECTORIES = UnfortunatelyYes
>  	HAVE_NS_GET_EXECUTABLE_PATH = YesPlease
>  	CSPRNG_METHOD = arc4random
> +	GIT_GREP_USES_REG_ENHANCED = YesPlease
>
>  	# Workaround for `gettext` being keg-only and not even being linked via
>  	# `brew link --force gettext`, should be obsolete as of
> diff --git a/grep.c b/grep.c
> index 06eed69493..a8430daaba 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -502,6 +502,10 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
>  		regflags |= REG_ICASE;
>  	if (opt->pattern_type_option == GREP_PATTERN_TYPE_ERE)
>  		regflags |= REG_EXTENDED;
> +#if defined(GIT_GREP_USES_REG_ENHANCED) && defined(REG_ENHANCED)
> +	else
> +		regflags |= REG_ENHANCED;
> +#endif
>  	err = regcomp(&p->regexp, p->pattern, regflags);
>  	if (err) {
>  		char errbuf[1024];
> --
> 2.39.0

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-03 20:52     ` René Scharfe
  2023-01-04  6:13       ` Junio C Hamano
@ 2023-01-04  7:46       ` Jeff King
  2023-01-04 16:36         ` René Scharfe
  1 sibling, 1 reply; 22+ messages in thread
From: Jeff King @ 2023-01-04  7:46 UTC (permalink / raw)
  To: René Scharfe
  Cc: Marco Nenciarini, git, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

On Tue, Jan 03, 2023 at 09:52:27PM +0100, René Scharfe wrote:

> diff --git a/Makefile b/Makefile
> index db447d0738..15e7edc9d2 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -289,6 +289,10 @@ include shared.mak
>  # Define NO_REGEX if your C library lacks regex support with REG_STARTEND
>  # feature.
>  #
> +# Define GIT_GREP_USES_REG_ENHANCED if your C library provides the flag
> +# REG_ENHANCED to enable enhanced basic regular expressions and you'd
> +# like to use it in git grep.

I didn't test, but just from looking at the patch I'd expect this to
affect other parts of Git besides git-grep. E.g., "git log --grep".
Which raises two questions:

 - would a more generalized name be better? USE_REG_ENHANCED or
   something? That might be _too_ general, but see below.

 - should this cover other cases? Grepping for "regcomp", would people
   want this to behave consistently for "git config --get-regexp", or
   diff funcnames, and so on?

If so, then I could envision a USE_REG_ENHANCED which just wraps the
system regcomp and adds the REG_ENHANCED flag when REG_EXTENDED is not
set?

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-04  7:46       ` Jeff King
@ 2023-01-04 16:36         ` René Scharfe
  2023-01-06  9:09           ` Jeff King
  0 siblings, 1 reply; 22+ messages in thread
From: René Scharfe @ 2023-01-04 16:36 UTC (permalink / raw)
  To: Jeff King
  Cc: Marco Nenciarini, git, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Am 04.01.23 um 08:46 schrieb Jeff King:
> On Tue, Jan 03, 2023 at 09:52:27PM +0100, René Scharfe wrote:
>
>> diff --git a/Makefile b/Makefile
>> index db447d0738..15e7edc9d2 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -289,6 +289,10 @@ include shared.mak
>>  # Define NO_REGEX if your C library lacks regex support with REG_STARTEND
>>  # feature.
>>  #
>> +# Define GIT_GREP_USES_REG_ENHANCED if your C library provides the flag
>> +# REG_ENHANCED to enable enhanced basic regular expressions and you'd
>> +# like to use it in git grep.
>
> I didn't test, but just from looking at the patch I'd expect this to
> affect other parts of Git besides git-grep. E.g., "git log --grep".
> Which raises two questions:
>
>  - would a more generalized name be better? USE_REG_ENHANCED or
>    something? That might be _too_ general, but see below.
>
>  - should this cover other cases? Grepping for "regcomp", would people
>    want this to behave consistently for "git config --get-regexp", or
>    diff funcnames, and so on?
>
> If so, then I could envision a USE_REG_ENHANCED which just wraps the
> system regcomp and adds the REG_ENHANCED flag when REG_EXTENDED is not
> set?

Good point.  I don't know what people want, though.  re_format(7) on
macOS/BSD and regex(7) on Linux call basic REs "obsolete" and extended
REs "modern", so they seem to push people away from the old kind,
enhanced or not.

But making a consistent choice for all regex use makes sense --
platforms that use compat/regex/ get the same enhanced flavor
everywhere.

René

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-04 16:36         ` René Scharfe
@ 2023-01-06  9:09           ` Jeff King
  2023-01-08  0:42             ` René Scharfe
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2023-01-06  9:09 UTC (permalink / raw)
  To: René Scharfe
  Cc: Marco Nenciarini, git, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

On Wed, Jan 04, 2023 at 05:36:21PM +0100, René Scharfe wrote:

> > I didn't test, but just from looking at the patch I'd expect this to
> > affect other parts of Git besides git-grep. E.g., "git log --grep".
> > Which raises two questions:
> >
> >  - would a more generalized name be better? USE_REG_ENHANCED or
> >    something? That might be _too_ general, but see below.
> >
> >  - should this cover other cases? Grepping for "regcomp", would people
> >    want this to behave consistently for "git config --get-regexp", or
> >    diff funcnames, and so on?
> >
> > If so, then I could envision a USE_REG_ENHANCED which just wraps the
> > system regcomp and adds the REG_ENHANCED flag when REG_EXTENDED is not
> > set?
> 
> Good point.  I don't know what people want, though.  re_format(7) on
> macOS/BSD and regex(7) on Linux call basic REs "obsolete" and extended
> REs "modern", so they seem to push people away from the old kind,
> enhanced or not.

Oh, good point. I was just grepping for regcomp(), but of course any
case which is already passing REG_EXTENDED would not be affected anyway.
And most places are already using that. E.g., the config code always
does so, and it looks like pickaxe "-G" does so.

For diffs, we have diff.*.xfuncname, which uses EREs. We do still
support regular "funcname" for backwards compatibility, but we only
document the extended version. Ironically, that option was introduced
because BREs did not portably support things like alternation, even with
the "enhanced" syntax. ;) See 45d9414fa5 (diff.*.xfuncname which uses
"extended" regex's for hunk header selection, 2008-09-18).

So I think we are embracing the "everyone should use EREs" mentality
already. The only spots I see that use BREs are:

  - grep.c, which handles "git grep" and "git log --grep"

  - line-range.c, presumably for "-L" function matching

  - deprecated non-ERE funcname patterns

Your patch is handling the first, which is by the far most important. I
would be OK leaving the others as-is, but I also wouldn't mind a patch
that works at the regcomp() level to make things automatically
consistent.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-06  9:09           ` Jeff King
@ 2023-01-08  0:42             ` René Scharfe
  2023-01-08  1:27               ` Junio C Hamano
  2023-01-11 18:56               ` Jeff King
  0 siblings, 2 replies; 22+ messages in thread
From: René Scharfe @ 2023-01-08  0:42 UTC (permalink / raw)
  To: Jeff King
  Cc: Marco Nenciarini, git, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Am 06.01.23 um 10:09 schrieb Jeff King:
> On Wed, Jan 04, 2023 at 05:36:21PM +0100, René Scharfe wrote:
>
>>> I didn't test, but just from looking at the patch I'd expect this to
>>> affect other parts of Git besides git-grep. E.g., "git log --grep".
>>> Which raises two questions:
>>>
>>>  - would a more generalized name be better? USE_REG_ENHANCED or
>>>    something? That might be _too_ general, but see below.
>>>
>>>  - should this cover other cases? Grepping for "regcomp", would people
>>>    want this to behave consistently for "git config --get-regexp", or
>>>    diff funcnames, and so on?
>>>
>>> If so, then I could envision a USE_REG_ENHANCED which just wraps the
>>> system regcomp and adds the REG_ENHANCED flag when REG_EXTENDED is not
>>> set?
>>
>> Good point.  I don't know what people want, though.  re_format(7) on
>> macOS/BSD and regex(7) on Linux call basic REs "obsolete" and extended
>> REs "modern", so they seem to push people away from the old kind,
>> enhanced or not.
>
> Oh, good point. I was just grepping for regcomp(), but of course any
> case which is already passing REG_EXTENDED would not be affected anyway.
> And most places are already using that. E.g., the config code always
> does so, and it looks like pickaxe "-G" does so.
>
> For diffs, we have diff.*.xfuncname, which uses EREs. We do still
> support regular "funcname" for backwards compatibility, but we only
> document the extended version. Ironically, that option was introduced
> because BREs did not portably support things like alternation, even with
> the "enhanced" syntax. ;) See 45d9414fa5 (diff.*.xfuncname which uses
> "extended" regex's for hunk header selection, 2008-09-18).
>
> So I think we are embracing the "everyone should use EREs" mentality
> already. The only spots I see that use BREs are:
>
>   - grep.c, which handles "git grep" and "git log --grep"
>
>   - line-range.c, presumably for "-L" function matching
>
>   - deprecated non-ERE funcname patterns
>
> Your patch is handling the first, which is by the far most important. I
> would be OK leaving the others as-is, but I also wouldn't mind a patch
> that works at the regcomp() level to make things automatically
> consistent.

There's also the code handling "git log -i -S nonäscii" in
diffcore_pickaxe(), but it quotes special characters and thus
effectively does a fixed-string search, so you're right in omitting it
above.

REG_ENHANCED on macOS affects REG_EXTENDED as well.  It unlocks e.g.
non-greedy repetitions and inline comments.  Sounds nice, but also
potentially surprising.  I was unable to find a current version of
the re_format(7) manpage online, unfortunately.

Apple's latest version of Git sets NO_REGEX and thus uses
compat/regex, if I read their source correctly:

https://github.com/apple-oss-distributions/Git/blob/Git-128/src/git/Makefile#L1302

The easiest and most consistent option would be to do the same.  But
we can't do that, because it would break multibyte support, which was
fixed by 1819ad327b (grep: fix multibyte regex handling under macOS,
2022-08-26), which started to use the system regex functions again.

Which begs the question: Isn't that a problem for the platforms that
still have to use NO_REGEX?  Shouldn't we fix compat/regex?

Anyway, here's an attempt at a more general, but still targeted fix
for macOS:

--- >8 ---
Subject: [PATCH] use enhanced basic regular expressions on macOS

When 1819ad327b (grep: fix multibyte regex handling under macOS,
2022-08-26) started to use the native regex library instead of Git's
own (compat/regex/), it lost support for alternation in basic
regular expressions.

Bring it back by enabling the flag REG_ENHANCED on macOS when
compiling basic regular expressions.

Reported-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 Makefile                  | 9 +++++++++
 compat/regcomp_enhanced.c | 9 +++++++++
 config.mak.uname          | 1 +
 git-compat-util.h         | 5 +++++
 4 files changed, 24 insertions(+)
 create mode 100644 compat/regcomp_enhanced.c

diff --git a/Makefile b/Makefile
index db447d0738..46e30be673 100644
--- a/Makefile
+++ b/Makefile
@@ -289,6 +289,10 @@ include shared.mak
 # Define NO_REGEX if your C library lacks regex support with REG_STARTEND
 # feature.
 #
+# Define USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS if your C library provides
+# the flag REG_ENHANCED and you'd like to use it to enable enhanced basic
+# regular expressions.
+#
 # Define HAVE_DEV_TTY if your system can open /dev/tty to interact with the
 # user.
 #
@@ -2037,6 +2041,11 @@ endif
 ifdef NO_REGEX
 	COMPAT_CFLAGS += -Icompat/regex
 	COMPAT_OBJS += compat/regex/regex.o
+else
+ifdef USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS
+	COMPAT_CFLAGS += -DUSE_ENHANCED_BASIC_REGULAR_EXPRESSIONS
+	COMPAT_OBJS += compat/regcomp_enhanced.o
+endif
 endif
 ifdef NATIVE_CRLF
 	BASIC_CFLAGS += -DNATIVE_CRLF
diff --git a/compat/regcomp_enhanced.c b/compat/regcomp_enhanced.c
new file mode 100644
index 0000000000..84193ce53b
--- /dev/null
+++ b/compat/regcomp_enhanced.c
@@ -0,0 +1,9 @@
+#include "../git-compat-util.h"
+#undef regcomp
+
+int git_regcomp(regex_t *preg, const char *pattern, int cflags)
+{
+	if (!(cflags & REG_EXTENDED))
+		cflags |= REG_ENHANCED;
+	return regcomp(preg, pattern, cflags);
+}
diff --git a/config.mak.uname b/config.mak.uname
index d63629fe80..7d25995265 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -147,6 +147,7 @@ ifeq ($(uname_S),Darwin)
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
 	HAVE_NS_GET_EXECUTABLE_PATH = YesPlease
 	CSPRNG_METHOD = arc4random
+	USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS = YesPlease

 	# Workaround for `gettext` being keg-only and not even being linked via
 	# `brew link --force gettext`, should be obsolete as of
diff --git a/git-compat-util.h b/git-compat-util.h
index 76e4b11131..1efa834089 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1338,6 +1338,11 @@ static inline int regexec_buf(const regex_t *preg, const char *buf, size_t size,
 	return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND);
 }

+#ifdef USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS
+int git_regcomp(regex_t *preg, const char *pattern, int cflags);
+#define regcomp git_regcomp
+#endif
+
 #ifndef DIR_HAS_BSD_GROUP_SEMANTICS
 # define FORCE_DIR_SET_GID S_ISGID
 #else
--
2.39.0

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-08  0:42             ` René Scharfe
@ 2023-01-08  1:27               ` Junio C Hamano
  2023-01-11 18:56               ` Jeff King
  1 sibling, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2023-01-08  1:27 UTC (permalink / raw)
  To: René Scharfe
  Cc: Jeff King, Marco Nenciarini, git,
	Ævar Arnfjörð Bjarmason

René Scharfe <l.s.r@web.de> writes:

> diff --git a/Makefile b/Makefile
> index db447d0738..46e30be673 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -289,6 +289,10 @@ include shared.mak
>  # Define NO_REGEX if your C library lacks regex support with REG_STARTEND
>  # feature.
>  #
> +# Define USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS if your C library provides
> +# the flag REG_ENHANCED and you'd like to use it to enable enhanced basic
> +# regular expressions.
> +#

I wondered if we should mention macOS somewhere in the description
to help those users that may be affeced, but it seems that looking
for "REG_ENHANCED BSD" with a search engine finds pages that
indicate this is available on BSD's in general?

> @@ -2037,6 +2041,11 @@ endif
>  ifdef NO_REGEX
>  	COMPAT_CFLAGS += -Icompat/regex
>  	COMPAT_OBJS += compat/regex/regex.o
> +else
> +ifdef USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS
> +	COMPAT_CFLAGS += -DUSE_ENHANCED_BASIC_REGULAR_EXPRESSIONS
> +	COMPAT_OBJS += compat/regcomp_enhanced.o
> +endif

OK.

> diff --git a/compat/regcomp_enhanced.c b/compat/regcomp_enhanced.c
> new file mode 100644
> index 0000000000..84193ce53b
> --- /dev/null
> +++ b/compat/regcomp_enhanced.c
> @@ -0,0 +1,9 @@
> +#include "../git-compat-util.h"
> +#undef regcomp
> +int git_regcomp(regex_t *preg, const char *pattern, int cflags)
> +{
> +	if (!(cflags & REG_EXTENDED))
> +		cflags |= REG_ENHANCED;
> +	return regcomp(preg, pattern, cflags);
> +}

OK.  I like the "we only want to affect BRE" bit, that is carefully
done.

> diff --git a/config.mak.uname b/config.mak.uname
> index d63629fe80..7d25995265 100644
> --- a/config.mak.uname
> +++ b/config.mak.uname
> @@ -147,6 +147,7 @@ ifeq ($(uname_S),Darwin)
>  	FREAD_READS_DIRECTORIES = UnfortunatelyYes
>  	HAVE_NS_GET_EXECUTABLE_PATH = YesPlease
>  	CSPRNG_METHOD = arc4random
> +	USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS = YesPlease

OK.  This would give macOS folks who have already been using the
enhanced mode (without us asking) with their older libraries the
behaviour they are more familiar with.  Good.

> diff --git a/git-compat-util.h b/git-compat-util.h
> index 76e4b11131..1efa834089 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -1338,6 +1338,11 @@ static inline int regexec_buf(const regex_t *preg, const char *buf, size_t size,
>  	return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND);
>  }
>
> +#ifdef USE_ENHANCED_BASIC_REGULAR_EXPRESSIONS
> +int git_regcomp(regex_t *preg, const char *pattern, int cflags);
> +#define regcomp git_regcomp
> +#endif

OK.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-08  0:42             ` René Scharfe
  2023-01-08  1:27               ` Junio C Hamano
@ 2023-01-11 18:56               ` Jeff King
  2023-01-12 17:13                 ` René Scharfe
  1 sibling, 1 reply; 22+ messages in thread
From: Jeff King @ 2023-01-11 18:56 UTC (permalink / raw)
  To: René Scharfe
  Cc: Marco Nenciarini, git, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

On Sun, Jan 08, 2023 at 01:42:04AM +0100, René Scharfe wrote:

> REG_ENHANCED on macOS affects REG_EXTENDED as well.  It unlocks e.g.
> non-greedy repetitions and inline comments.  Sounds nice, but also
> potentially surprising.  I was unable to find a current version of
> the re_format(7) manpage online, unfortunately.

I'm not quite sure what you mean here by "non-greedy repetitions".
Something like:

  # prefer "foo bar" to "foo bar bar"; only matters for colorizing or
  # --only-matching
  git grep -E 'foo.*?bar'

? If so, then yeah, that changes the meaning of a bare "?" and people
might be surprised by it.

> Apple's latest version of Git sets NO_REGEX and thus uses
> compat/regex, if I read their source correctly:
> 
> https://github.com/apple-oss-distributions/Git/blob/Git-128/src/git/Makefile#L1302
> 
> The easiest and most consistent option would be to do the same.  But
> we can't do that, because it would break multibyte support, which was
> fixed by 1819ad327b (grep: fix multibyte regex handling under macOS,
> 2022-08-26), which started to use the system regex functions again.

Looks like that NO_REGEX line was dropped by 1819ad327b. So I don't
think Apple added it themselves; their version is just based on an older
version of Git (looks like 2.24.3).

> Which begs the question: Isn't that a problem for the platforms that
> still have to use NO_REGEX?  Shouldn't we fix compat/regex?

I'm not sure. I always assumed our fallback was similar-ish to what was
in glibc and was thus pretty featureful, but that may not be true (it
actually comes from gawk). It looks like we just didn't pull over the
multi-byte parts in a997bf423d (compat/regex: get the gawk regex engine
to compile within git, 2010-08-17).

> Anyway, here's an attempt at a more general, but still targeted fix
> for macOS:
> 
> --- >8 ---
> Subject: [PATCH] use enhanced basic regular expressions on macOS

This seems pretty sensible, regardless of whether we improve multi-byte
support for compat/regex.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-11 18:56               ` Jeff King
@ 2023-01-12 17:13                 ` René Scharfe
  2023-01-12 17:52                   ` Ævar Arnfjörð Bjarmason
  2023-01-12 21:54                   ` Jeff King
  0 siblings, 2 replies; 22+ messages in thread
From: René Scharfe @ 2023-01-12 17:13 UTC (permalink / raw)
  To: Jeff King
  Cc: Marco Nenciarini, git, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

Am 11.01.23 um 19:56 schrieb Jeff King:
> On Sun, Jan 08, 2023 at 01:42:04AM +0100, René Scharfe wrote:
>
>> REG_ENHANCED on macOS affects REG_EXTENDED as well.  It unlocks e.g.
>> non-greedy repetitions and inline comments.  Sounds nice, but also
>> potentially surprising.  I was unable to find a current version of
>> the re_format(7) manpage online, unfortunately.
>
> I'm not quite sure what you mean here by "non-greedy repetitions".
> Something like:
>
>   # prefer "foo bar" to "foo bar bar"; only matters for colorizing or
>   # --only-matching
>   git grep -E 'foo.*?bar'
>
> ? If so, then yeah, that changes the meaning of a bare "?" and people
> might be surprised by it.

Right.  To be fair, question mark is a special character and you'd
probably need to quote it anyway if you want to match a literal
question mark.  Otherwise I get:

   $ git grep -E 'foo.*?bar'
   fatal: command line, 'foo.*?bar': repetition-operator operand invalid

>> Apple's latest version of Git sets NO_REGEX and thus uses
>> compat/regex, if I read their source correctly:
>>
>> https://github.com/apple-oss-distributions/Git/blob/Git-128/src/git/Makefile#L1302
>>
>> The easiest and most consistent option would be to do the same.  But
>> we can't do that, because it would break multibyte support, which was
>> fixed by 1819ad327b (grep: fix multibyte regex handling under macOS,
>> 2022-08-26), which started to use the system regex functions again.
>
> Looks like that NO_REGEX line was dropped by 1819ad327b. So I don't
> think Apple added it themselves; their version is just based on an older
> version of Git (looks like 2.24.3).

Makes sense.

>> Which begs the question: Isn't that a problem for the platforms that
>> still have to use NO_REGEX?  Shouldn't we fix compat/regex?
>
> I'm not sure. I always assumed our fallback was similar-ish to what was
> in glibc and was thus pretty featureful, but that may not be true (it
> actually comes from gawk). It looks like we just didn't pull over the
> multi-byte parts in a997bf423d (compat/regex: get the gawk regex engine
> to compile within git, 2010-08-17).

GAWK removed NO_MBSUPPORT, NO_MBSUPPORT and mbsupport.h in the meantime.
I guess that means they support multi-byte characters everywhere now.

René

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-12 17:13                 ` René Scharfe
@ 2023-01-12 17:52                   ` Ævar Arnfjörð Bjarmason
  2023-01-12 21:54                   ` Jeff King
  1 sibling, 0 replies; 22+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-01-12 17:52 UTC (permalink / raw)
  To: René Scharfe; +Cc: Jeff King, Marco Nenciarini, git, Junio C Hamano


On Thu, Jan 12 2023, René Scharfe wrote:

> Am 11.01.23 um 19:56 schrieb Jeff King:
>> On Sun, Jan 08, 2023 at 01:42:04AM +0100, René Scharfe wrote:
> [...]
>>> Which begs the question: Isn't that a problem for the platforms that
>>> still have to use NO_REGEX?  Shouldn't we fix compat/regex?
>>
>> I'm not sure. I always assumed our fallback was similar-ish to what was
>> in glibc and was thus pretty featureful, but that may not be true (it
>> actually comes from gawk). It looks like we just didn't pull over the
>> multi-byte parts in a997bf423d (compat/regex: get the gawk regex engine
>> to compile within git, 2010-08-17).
>
> GAWK removed NO_MBSUPPORT, NO_MBSUPPORT and mbsupport.h in the meantime.
> I guess that means they support multi-byte characters everywhere now.
>
> René

Note that GAWK doesn't really have its own regex engine, their is mostly
glibc.git's, which they've ported over to their codebase with some
difficulty over the years (it not being in gnulib, like most such shared
GNU libraries).

Ours is then an old copy of gawk.git's, which I attempted to update a
few years back, but that patch series ran into some issues I can't
remember, so we still have that very old version.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-12 17:13                 ` René Scharfe
  2023-01-12 17:52                   ` Ævar Arnfjörð Bjarmason
@ 2023-01-12 21:54                   ` Jeff King
  2023-01-13  8:28                     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 22+ messages in thread
From: Jeff King @ 2023-01-12 21:54 UTC (permalink / raw)
  To: René Scharfe
  Cc: Marco Nenciarini, git, Junio C Hamano,
	Ævar Arnfjörð Bjarmason

On Thu, Jan 12, 2023 at 06:13:13PM +0100, René Scharfe wrote:

> > I'm not quite sure what you mean here by "non-greedy repetitions".
> > Something like:
> >
> >   # prefer "foo bar" to "foo bar bar"; only matters for colorizing or
> >   # --only-matching
> >   git grep -E 'foo.*?bar'
> >
> > ? If so, then yeah, that changes the meaning of a bare "?" and people
> > might be surprised by it.
> 
> Right.  To be fair, question mark is a special character and you'd
> probably need to quote it anyway if you want to match a literal
> question mark.  Otherwise I get:
> 
>    $ git grep -E 'foo.*?bar'
>    fatal: command line, 'foo.*?bar': repetition-operator operand invalid

This is on macOS, I assume? With glibc it seems to be quietly ignored:

  $ git grep -E -o 'foo.*?ba' .clang-format
  .clang-format:foo, bar, ba

So it is not treated literally (as it would be without -E). But nor does
it make the match non-greedy (otherwise it would have output "foo, ba",
as "git grep -P" does).

So it does seem like all bets are off for what people can and should
expect here. Which isn't to say we should make things worse. I mostly
wondered if REG_ENHANCED might take us closer to what glibc was doing by
default, but it doesn't seem like it.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-12 21:54                   ` Jeff King
@ 2023-01-13  8:28                     ` Ævar Arnfjörð Bjarmason
  2023-01-13 17:19                       ` Junio C Hamano
  2023-01-13 17:24                       ` René Scharfe
  0 siblings, 2 replies; 22+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2023-01-13  8:28 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Marco Nenciarini, git, Junio C Hamano


On Thu, Jan 12 2023, Jeff King wrote:

> On Thu, Jan 12, 2023 at 06:13:13PM +0100, René Scharfe wrote:
>
>> > I'm not quite sure what you mean here by "non-greedy repetitions".
>> > Something like:
>> >
>> >   # prefer "foo bar" to "foo bar bar"; only matters for colorizing or
>> >   # --only-matching
>> >   git grep -E 'foo.*?bar'
>> >
>> > ? If so, then yeah, that changes the meaning of a bare "?" and people
>> > might be surprised by it.
>> 
>> Right.  To be fair, question mark is a special character and you'd
>> probably need to quote it anyway if you want to match a literal
>> question mark.  Otherwise I get:
>> 
>>    $ git grep -E 'foo.*?bar'
>>    fatal: command line, 'foo.*?bar': repetition-operator operand invalid
>
> This is on macOS, I assume? With glibc it seems to be quietly ignored:
>
>   $ git grep -E -o 'foo.*?ba' .clang-format
>   .clang-format:foo, bar, ba
>
> So it is not treated literally (as it would be without -E). But nor does
> it make the match non-greedy (otherwise it would have output "foo, ba",
> as "git grep -P" does).
>
> So it does seem like all bets are off for what people can and should
> expect here. Which isn't to say we should make things worse. I mostly
> wondered if REG_ENHANCED might take us closer to what glibc was doing by
> default, but it doesn't seem like it.

There's a couple of ways out of this that I don't see in this thread:

- Declare it not a problem: We have -G, -E and -P to map to BRE, ERE and
  PCRE. One view is to say the first two must match POSIX, another is
  tha whatever the platform thinks they should do is how they should
  act.

  Of course that makes git regex invocations "unportable", but it might
  be acceptable. People can always use PCRE if they want "portability".

- Just mandate PCRE for Mac OS, then map -E to -P. We could do this with
  the pcre2convert() API and PCRE2_CONVERT_POSIX_EXTENDED flag,
  i.e. PCRE's own "translate this to ERE".

  But Perl/PCRE syntax is already a superset of ERE syntax, minus things
  like (*VERB), (?: etc., which people are unlikely to write without
  intending to get the PCRE semantics.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-13  8:28                     ` Ævar Arnfjörð Bjarmason
@ 2023-01-13 17:19                       ` Junio C Hamano
  2023-01-14  6:44                         ` René Scharfe
  2023-01-13 17:24                       ` René Scharfe
  1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2023-01-13 17:19 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, René Scharfe, Marco Nenciarini, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Thu, Jan 12 2023, Jeff King wrote:
>
>> So it does seem like all bets are off for what people can and should
>> expect here. Which isn't to say we should make things worse. I mostly
>> wondered if REG_ENHANCED might take us closer to what glibc was doing by
>> default, but it doesn't seem like it.

I thought that René's "Use enhanced only when doing BRE" was fairly
focused, but I am very tempted to accept ...

> There's a couple of ways out of this that I don't see in this thread:
>
> - Declare it not a problem: We have -G, -E and -P to map to BRE, ERE and
>   PCRE. One view is to say the first two must match POSIX, another is
>   tha whatever the platform thinks they should do is how they should
>   act.

... this view.  The story "BRE and ERE work via what system
libraries provide, and 'git grep' matches what system grep' does" is
an easy to understand view.

> - Just mandate PCRE for Mac OS, then map -E to -P. We could do this with
>   the pcre2convert() API and PCRE2_CONVERT_POSIX_EXTENDED flag,
>   i.e. PCRE's own "translate this to ERE".

Presumably this is to ensure -E works identically everywhere?  If
so, then we should do that everywhere, not just on macOS.  But again
it makes "git grep -E" slightly incompatible with "grep" on systems
(including macOS), so...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-13  8:28                     ` Ævar Arnfjörð Bjarmason
  2023-01-13 17:19                       ` Junio C Hamano
@ 2023-01-13 17:24                       ` René Scharfe
  2023-01-13 23:03                         ` René Scharfe
  1 sibling, 1 reply; 22+ messages in thread
From: René Scharfe @ 2023-01-13 17:24 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Jeff King
  Cc: Marco Nenciarini, git, Junio C Hamano

Am 13.01.23 um 09:28 schrieb Ævar Arnfjörð Bjarmason:
>
> There's a couple of ways out of this that I don't see in this thread:
>
> - Declare it not a problem: We have -G, -E and -P to map to BRE, ERE and
>   PCRE. One view is to say the first two must match POSIX, another is
>   tha whatever the platform thinks they should do is how they should
>   act.

I like that and it was my first reaction -- alternation is a
non-standard extension to BREs.  But we supported it explicitly since
3632cfc248 (Use compatibility regex library for OSX/Darwin, 2008-09-07),
so not having it anymore is a regression that we should fix, I think.

>   Of course that makes git regex invocations "unportable", but it might
>   be acceptable. People can always use PCRE if they want "portability".
>
> - Just mandate PCRE for Mac OS, then map -E to -P. We could do this with
>   the pcre2convert() API and PCRE2_CONVERT_POSIX_EXTENDED flag,
>   i.e. PCRE's own "translate this to ERE".
>
>   But Perl/PCRE syntax is already a superset of ERE syntax, minus things
>   like (*VERB), (?: etc., which people are unlikely to write without
>   intending to get the PCRE semantics.

PCRE2_CONVERT_POSIX_EXTENDED is a flag for pcre2_pattern_convert(),
which allows converting an ERE into a PCRE2 regex
(https://pcre.org/current/doc/html/pcre2convert.html).  So this feature
allows using PCRE for every kind of regexes, right?

There may be feature differences for EREs between libc on macOS, glibc
and/or GAWK, but I'm not aware of any complaints so far.  So I currently
don't see the need for that.

René

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-13 17:24                       ` René Scharfe
@ 2023-01-13 23:03                         ` René Scharfe
  0 siblings, 0 replies; 22+ messages in thread
From: René Scharfe @ 2023-01-13 23:03 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Jeff King
  Cc: Marco Nenciarini, git, Junio C Hamano

Am 13.01.23 um 18:24 schrieb René Scharfe:
> Am 13.01.23 um 09:28 schrieb Ævar Arnfjörð Bjarmason:
>>
>> - Just mandate PCRE for Mac OS, then map -E to -P. We could do this with
>>   the pcre2convert() API and PCRE2_CONVERT_POSIX_EXTENDED flag,
>>   i.e. PCRE's own "translate this to ERE".
>>
>>   But Perl/PCRE syntax is already a superset of ERE syntax, minus things
>>   like (*VERB), (?: etc., which people are unlikely to write without
>>   intending to get the PCRE semantics.
>
> PCRE2_CONVERT_POSIX_EXTENDED is a flag for pcre2_pattern_convert(),
> which allows converting an ERE into a PCRE2 regex
> (https://pcre.org/current/doc/html/pcre2convert.html).  So this feature
> allows using PCRE for every kind of regexes, right?
>
> There may be feature differences for EREs between libc on macOS, glibc
> and/or GAWK, but I'm not aware of any complaints so far.  So I currently
> don't see the need for that.

Thought that maybe this could help us matching NUL characters and fix the
TODO in t7815, because PCRE2 can handle binary files just fine.  But no:
pcre2_pattern_convert() uses (NUL*) in its result, meaning that NUL is
treated as a line separator and . (dot) won't match it.  And it's still
experimental according to the documentation link I mentioned above.

René

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-13 17:19                       ` Junio C Hamano
@ 2023-01-14  6:44                         ` René Scharfe
  2023-01-14  8:31                           ` René Scharfe
  0 siblings, 1 reply; 22+ messages in thread
From: René Scharfe @ 2023-01-14  6:44 UTC (permalink / raw)
  To: Junio C Hamano, Ævar Arnfjörð Bjarmason
  Cc: Jeff King, Marco Nenciarini, git

Am 13.01.23 um 18:19 schrieb Junio C Hamano:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Thu, Jan 12 2023, Jeff King wrote:
>>
>>> So it does seem like all bets are off for what people can and should
>>> expect here. Which isn't to say we should make things worse. I mostly
>>> wondered if REG_ENHANCED might take us closer to what glibc was doing by
>>> default, but it doesn't seem like it.
>
> I thought that René's "Use enhanced only when doing BRE" was fairly
> focused, but I am very tempted to accept ...
>
>> There's a couple of ways out of this that I don't see in this thread:
>>
>> - Declare it not a problem: We have -G, -E and -P to map to BRE, ERE and
>>   PCRE. One view is to say the first two must match POSIX, another is
>>   tha whatever the platform thinks they should do is how they should
>>   act.
>
> ... this view.  The story "BRE and ERE work via what system
> libraries provide, and 'git grep' matches what system grep' does" is
> an easy to understand view.

That was my stance in my first reply as well.  But 3632cfc248 (Use
compatibility regex library for OSX/Darwin, 2008-09-07) explicitly
added alternation support for BREs on macOS, and 1819ad327b (grep: fix
multibyte regex handling under macOS, 2022-08-26) removed it seemingly
by accident.  And grep(1) does support them on macOS 13.1:

   $ uname -rs
   Darwin 22.2.0
   $ which grep
   /usr/bin/grep
   $ grep --version
   grep (BSD grep, GNU compatible) 2.6.0-FreeBSD
   $ grep '\(REG_STARTEND\|NeededForASAN\)' Makefile
   # Define NO_REGEX if your C library lacks regex support with REG_STARTEND
   NO_REGEX = NeededForASAN

René


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-14  6:44                         ` René Scharfe
@ 2023-01-14  8:31                           ` René Scharfe
  2023-01-14 12:45                             ` Diomidis Spinellis
  0 siblings, 1 reply; 22+ messages in thread
From: René Scharfe @ 2023-01-14  8:31 UTC (permalink / raw)
  To: Junio C Hamano, Ævar Arnfjörð Bjarmason
  Cc: Jeff King, Marco Nenciarini, git, Diomidis Spinellis

Am 14.01.23 um 07:44 schrieb René Scharfe:
> Am 13.01.23 um 18:19 schrieb Junio C Hamano:
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>
>>> On Thu, Jan 12 2023, Jeff King wrote:
>>>
>>>> So it does seem like all bets are off for what people can and should
>>>> expect here. Which isn't to say we should make things worse. I mostly
>>>> wondered if REG_ENHANCED might take us closer to what glibc was doing by
>>>> default, but it doesn't seem like it.
>>
>> I thought that René's "Use enhanced only when doing BRE" was fairly
>> focused, but I am very tempted to accept ...
>>
>>> There's a couple of ways out of this that I don't see in this thread:
>>>
>>> - Declare it not a problem: We have -G, -E and -P to map to BRE, ERE and
>>>   PCRE. One view is to say the first two must match POSIX, another is
>>>   tha whatever the platform thinks they should do is how they should
>>>   act.
>>
>> ... this view.  The story "BRE and ERE work via what system
>> libraries provide, and 'git grep' matches what system grep' does" is
>> an easy to understand view.
>
> That was my stance in my first reply as well.  But 3632cfc248 (Use
> compatibility regex library for OSX/Darwin, 2008-09-07) explicitly
> added alternation support for BREs on macOS, and 1819ad327b (grep: fix
> multibyte regex handling under macOS, 2022-08-26) removed it seemingly
> by accident.  And grep(1) does support them on macOS 13.1:
>
>    $ uname -rs
>    Darwin 22.2.0
>    $ which grep
>    /usr/bin/grep
>    $ grep --version
>    grep (BSD grep, GNU compatible) 2.6.0-FreeBSD
>    $ grep '\(REG_STARTEND\|NeededForASAN\)' Makefile
>    # Define NO_REGEX if your C library lacks regex support with REG_STARTEND
>    NO_REGEX = NeededForASAN

And I neglected to copy the author of 1819ad327b until now. :-|

@Diomidis: Here's a link to the start of this thread:
https://lore.kernel.org/git/f82ae28a-fb56-8d1f-96c8-550b61439d3a@enterprisedb.com/

René

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-14  8:31                           ` René Scharfe
@ 2023-01-14 12:45                             ` Diomidis Spinellis
  2023-01-14 16:08                               ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Diomidis Spinellis @ 2023-01-14 12:45 UTC (permalink / raw)
  To: René Scharfe, Junio C Hamano,
	Ævar Arnfjörð Bjarmason
  Cc: Jeff King, Marco Nenciarini, git

On 14-Jan-23 10:31, René Scharfe wrote:
> Am 14.01.23 um 07:44 schrieb René Scharfe:
>> Am 13.01.23 um 18:19 schrieb Junio C Hamano:
>>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>>
>>>> On Thu, Jan 12 2023, Jeff King wrote:
>>>> There's a couple of ways out of this that I don't see in this thread:
>>>>
>>>> - Declare it not a problem: We have -G, -E and -P to map to BRE, ERE and
>>>>    PCRE. One view is to say the first two must match POSIX, another is
>>>>    tha whatever the platform thinks they should do is how they should
>>>>    act.
>>>
>>> ... this view.  The story "BRE and ERE work via what system
>>> libraries provide, and 'git grep' matches what system grep' does" is
>>> an easy to understand view.
>>
>> That was my stance in my first reply as well.  But 3632cfc248 (Use
>> compatibility regex library for OSX/Darwin, 2008-09-07) explicitly
>> added alternation support for BREs on macOS, and 1819ad327b (grep: fix
>> multibyte regex handling under macOS, 2022-08-26) removed it seemingly
>> by accident.  And grep(1) does support them on macOS 13.1:

Indeed, I removed the alternation handling functionality by accident.  I 
was not aware of the BRE-handling difference between the GNU and the 
macOS native regex library.  I think having git-grep behave the same as 
grep(1) on each platform is consistent with the principle of least 
astonishment (POLA).  This would mean that on macOS plain git-grep 
should use enhanced basic REs as proposed in René's patch.

Diomidis

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: git grep behave oddly with alternatives
  2023-01-14 12:45                             ` Diomidis Spinellis
@ 2023-01-14 16:08                               ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2023-01-14 16:08 UTC (permalink / raw)
  To: Diomidis Spinellis
  Cc: René Scharfe, Ævar Arnfjörð Bjarmason,
	Jeff King, Marco Nenciarini, git

Diomidis Spinellis <dds@aueb.gr> writes:

> Indeed, I removed the alternation handling functionality by accident.
> I was not aware of the BRE-handling difference between the GNU and the
> macOS native regex library.  I think having git-grep behave the same
> as grep(1) on each platform is consistent with the principle of least
> astonishment (POLA).  This would mean that on macOS plain git-grep
> should use enhanced basic REs as proposed in René's patch.

Thanks.  I am fine with what René's patch does, which we already
queued, then ;-)

Let's merge 54463d32 (use enhanced basic regular expressions on
macOS, 2023-01-08) down to 'next'.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2023-01-14 16:09 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-03  9:53 BUG: git grep behave oddly with alternatives Marco Nenciarini
2023-01-03 16:29 ` René Scharfe
2023-01-03 18:13   ` Marco Nenciarini
2023-01-03 20:52     ` René Scharfe
2023-01-04  6:13       ` Junio C Hamano
2023-01-04  7:46       ` Jeff King
2023-01-04 16:36         ` René Scharfe
2023-01-06  9:09           ` Jeff King
2023-01-08  0:42             ` René Scharfe
2023-01-08  1:27               ` Junio C Hamano
2023-01-11 18:56               ` Jeff King
2023-01-12 17:13                 ` René Scharfe
2023-01-12 17:52                   ` Ævar Arnfjörð Bjarmason
2023-01-12 21:54                   ` Jeff King
2023-01-13  8:28                     ` Ævar Arnfjörð Bjarmason
2023-01-13 17:19                       ` Junio C Hamano
2023-01-14  6:44                         ` René Scharfe
2023-01-14  8:31                           ` René Scharfe
2023-01-14 12:45                             ` Diomidis Spinellis
2023-01-14 16:08                               ` Junio C Hamano
2023-01-13 17:24                       ` René Scharfe
2023-01-13 23:03                         ` René Scharfe

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).