From: "René Scharfe" <l.s.r@web.de>
To: Marco Nenciarini <marco.nenciarini@enterprisedb.com>,
git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: BUG: git grep behave oddly with alternatives
Date: Tue, 3 Jan 2023 21:52:27 +0100 [thread overview]
Message-ID: <343a891e-d737-0ace-26a9-3839d3bd5583@web.de> (raw)
In-Reply-To: <1f61b660-b2d0-ba93-3182-05a9ab97b00e@enterprisedb.com>
Am 03.01.23 um 19:13 schrieb Marco Nenciarini:
> On 03/01/23 17:29, René Scharfe wrote:
>> Am 03.01.23 um 10:53 schrieb Marco Nenciarini:
>>> I've installed latest git from brew and suddenly git grep started behaving oddly when using alternatives.
>>>
>>> ```
>>> $ echo abd > test.file
>>> $ git grep --untracked 'a\(b\|c\)d' test.file
>>> $ git grep --untracked 'a\(b\)d' test.file
>>> test.file:abd
>>> ```
>>>
>>> It should have matched in both cases.
>>>
>>>
>>> If I switch to exented pattern it starts working again
>>>
>>> ```
>>> $ git grep --untracked -E 'a(b|c)d' test.file
>>> test.file:abd
>>> ```
>>
>> This is expected: Basic regular expressions don't support alternation.
>>
>> Under which circumstances did it work for you without -E or -P?
>>
>
> It has always worked until I installed 2.39.0 on my mac. I've also checked with other developers that are using a mac and they reports the same. On Linux it works regardless the git version.
>
> Using grep directly also works, so it is a different behavior between grep and git grep, that is surprising.
Meaning you used Apple's version of git before?
$ echo abd > test.file
$ /usr/bin/git grep --untracked 'b\|c' test.file
test.file:abd
$ /usr/bin/git version
git version 2.37.1 (Apple Git-137.1)
$ git grep --untracked 'b\|c' test.file
$ git version
git version 2.39.0
So I can reproduce your findings on macOS Ventura. Same with grep:
$ grep 'b\|c' test.file
abd
$ grep --version
grep (BSD grep, GNU compatible) 2.6.0-FreeBSD
re_format(7) says:
"Obsolete (“basic”) regular expressions differ in several respects.
‘|’ is an ordinary character and there is no equivalent for its
functionality.".
Under the headline "ENHANCED FEATURES" it continues, however:
"When the REG_ENHANCED flag is passed to one of the regcomp()
variants, additional features are activated."
And:
"For enhanced basic REs, ‘+’, ‘?’ and ‘|’ remain regular characters,
but ‘\+’, ‘\?’ and ‘\|’ have the same special meaning as the
unescaped characters do for extended REs, i.e., one or more
matches, zero or one matches and alteration, respectively."
So apparently Apple chose a middle ground between basic and extended
regular expressions for its grep and git grep.
Under "IMPLEMENTATION CHOICES" it says:
"The regex implementation in Mac OS X 10.8 and later is based on a
heavily modified subset of TRE (http://laurikari.net/tre/)."
The manpage of GNU grep says:
"grep understands three different versions of regular expression
syntax: “basic” (BRE), “extended” (ERE) and “perl” (PCRE). In
GNU grep there is no difference in available functionality
between basic and extended syntax. In other implementations,
basic regular expressions are less powerful."
And under the headline "Basic vs Extended Regular Expressions":
"In basic regular expressions the meta-characters ?, +, {, |, (,
and ) lose their special meaning; instead use the backslashed
versions \?, \+, \{, \|, \(, and \)."
So GNU grep apparently made the same choice as Apple, probably far
earlier.
When I compile git with NO_REGEX and thus with the fallback in
compat/regex/, the result supports alternation as well:
$ ./git grep --untracked 'b\|c' test.file
test.file:abd
$ nm ./git | grep regcomp
0000000100255978 T _git_regcomp
Based on that I suggest:
--- >8 ---
Subject: grep: use REG_ENHANCED on macOS
GNU grep(1) and regcomp(3) use enhanced basic regular expressions by
default, which means that it e.g. supports alternation, e.g. "a\|b"
matches both "a" and "b". The same is true for our compat/regex/
implementation.
On macOS Ventura (and possibly earlier) grep(1) also uses enhanced BREs,
but regcomp(3) requires the flag REG_ENHANCED to turn them on. Do that
for git grep if possible, for consistency with the platform's grep(1)
and our fallback library.
It would be simpler to use REG_ENHANCED everywhere it is defined instead
of adding a Makefile setting, but it's a non-standard extension and
might mean something else on other platforms.
Reported-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
Makefile | 8 ++++++++
config.mak.uname | 1 +
grep.c | 4 ++++
3 files changed, 13 insertions(+)
diff --git a/Makefile b/Makefile
index db447d0738..15e7edc9d2 100644
--- a/Makefile
+++ b/Makefile
@@ -289,6 +289,10 @@ include shared.mak
# Define NO_REGEX if your C library lacks regex support with REG_STARTEND
# feature.
#
+# Define GIT_GREP_USES_REG_ENHANCED if your C library provides the flag
+# REG_ENHANCED to enable enhanced basic regular expressions and you'd
+# like to use it in git grep.
+#
# Define HAVE_DEV_TTY if your system can open /dev/tty to interact with the
# user.
#
@@ -2037,6 +2041,10 @@ endif
ifdef NO_REGEX
COMPAT_CFLAGS += -Icompat/regex
COMPAT_OBJS += compat/regex/regex.o
+else
+ifdef GIT_GREP_USES_REG_ENHANCED
+ COMPAT_CFLAGS += -DGIT_GREP_USES_REG_ENHANCED
+endif
endif
ifdef NATIVE_CRLF
BASIC_CFLAGS += -DNATIVE_CRLF
diff --git a/config.mak.uname b/config.mak.uname
index d63629fe80..14c145ae42 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -147,6 +147,7 @@ ifeq ($(uname_S),Darwin)
FREAD_READS_DIRECTORIES = UnfortunatelyYes
HAVE_NS_GET_EXECUTABLE_PATH = YesPlease
CSPRNG_METHOD = arc4random
+ GIT_GREP_USES_REG_ENHANCED = YesPlease
# Workaround for `gettext` being keg-only and not even being linked via
# `brew link --force gettext`, should be obsolete as of
diff --git a/grep.c b/grep.c
index 06eed69493..a8430daaba 100644
--- a/grep.c
+++ b/grep.c
@@ -502,6 +502,10 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
regflags |= REG_ICASE;
if (opt->pattern_type_option == GREP_PATTERN_TYPE_ERE)
regflags |= REG_EXTENDED;
+#if defined(GIT_GREP_USES_REG_ENHANCED) && defined(REG_ENHANCED)
+ else
+ regflags |= REG_ENHANCED;
+#endif
err = regcomp(&p->regexp, p->pattern, regflags);
if (err) {
char errbuf[1024];
--
2.39.0
next prev parent reply other threads:[~2023-01-03 20:52 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-03 9:53 BUG: git grep behave oddly with alternatives Marco Nenciarini
2023-01-03 16:29 ` René Scharfe
2023-01-03 18:13 ` Marco Nenciarini
2023-01-03 20:52 ` René Scharfe [this message]
2023-01-04 6:13 ` Junio C Hamano
2023-01-04 7:46 ` Jeff King
2023-01-04 16:36 ` René Scharfe
2023-01-06 9:09 ` Jeff King
2023-01-08 0:42 ` René Scharfe
2023-01-08 1:27 ` Junio C Hamano
2023-01-11 18:56 ` Jeff King
2023-01-12 17:13 ` René Scharfe
2023-01-12 17:52 ` Ævar Arnfjörð Bjarmason
2023-01-12 21:54 ` Jeff King
2023-01-13 8:28 ` Ævar Arnfjörð Bjarmason
2023-01-13 17:19 ` Junio C Hamano
2023-01-14 6:44 ` René Scharfe
2023-01-14 8:31 ` René Scharfe
2023-01-14 12:45 ` Diomidis Spinellis
2023-01-14 16:08 ` Junio C Hamano
2023-01-13 17:24 ` René Scharfe
2023-01-13 23:03 ` René Scharfe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=343a891e-d737-0ace-26a9-3839d3bd5583@web.de \
--to=l.s.r@web.de \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=marco.nenciarini@enterprisedb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).