git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Marco Nenciarini <marco.nenciarini@enterprisedb.com>,
	git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: BUG: git grep behave oddly with alternatives
Date: Tue, 3 Jan 2023 21:52:27 +0100	[thread overview]
Message-ID: <343a891e-d737-0ace-26a9-3839d3bd5583@web.de> (raw)
In-Reply-To: <1f61b660-b2d0-ba93-3182-05a9ab97b00e@enterprisedb.com>

Am 03.01.23 um 19:13 schrieb Marco Nenciarini:
> On 03/01/23 17:29, René Scharfe wrote:
>> Am 03.01.23 um 10:53 schrieb Marco Nenciarini:
>>> I've installed latest git from brew and suddenly git grep started behaving oddly when using alternatives.
>>>
>>> ```
>>> $ echo abd > test.file
>>> $ git grep --untracked 'a\(b\|c\)d' test.file
>>> $ git grep --untracked 'a\(b\)d' test.file
>>> test.file:abd
>>> ```
>>>
>>> It should have matched in both cases.
>>>
>>>
>>> If I switch to exented pattern it starts working again
>>>
>>> ```
>>> $ git grep --untracked -E 'a(b|c)d' test.file
>>> test.file:abd
>>> ```
>>
>> This is expected: Basic regular expressions don't support alternation.
>>
>> Under which circumstances did it work for you without -E or -P?
>>
>
> It has always worked until I installed 2.39.0 on my mac. I've also checked with other developers that are using a mac and they reports the same. On Linux it works regardless the git version.
>
> Using grep directly also works, so it is a different behavior between grep and git grep, that is surprising.

Meaning you used Apple's version of git before?

   $ echo abd > test.file
   $ /usr/bin/git grep --untracked 'b\|c' test.file
   test.file:abd
   $ /usr/bin/git version
   git version 2.37.1 (Apple Git-137.1)

   $ git grep --untracked 'b\|c' test.file
   $ git version
   git version 2.39.0

So I can reproduce your findings on macOS Ventura.  Same with grep:

   $ grep 'b\|c' test.file
   abd
   $ grep --version
   grep (BSD grep, GNU compatible) 2.6.0-FreeBSD

re_format(7) says:

   "Obsolete (“basic”) regular expressions differ in several respects.
    ‘|’ is an ordinary character and there is no equivalent for its
    functionality.".

Under the headline "ENHANCED FEATURES" it continues, however:

   "When the REG_ENHANCED flag is passed to one of the regcomp()
    variants, additional features are activated."

And:

   "For enhanced basic REs, ‘+’, ‘?’ and ‘|’ remain regular characters,
    but ‘\+’, ‘\?’ and ‘\|’ have the same special meaning as the
    unescaped characters do for extended REs, i.e., one or more
    matches, zero or one matches and alteration, respectively."

So apparently Apple chose a middle ground between basic and extended
regular expressions for its grep and git grep.

Under "IMPLEMENTATION CHOICES" it says:

   "The regex implementation in Mac OS X 10.8 and later is based on a
    heavily modified subset of TRE (http://laurikari.net/tre/)."

The manpage of GNU grep says:

   "grep understands three different versions of regular expression
    syntax: “basic” (BRE), “extended” (ERE) and “perl” (PCRE).  In
    GNU grep there is no difference in available functionality
    between basic and extended syntax.  In other implementations,
    basic regular expressions are less powerful."

And under the headline "Basic vs Extended Regular Expressions":

   "In basic regular expressions the meta-characters ?, +, {, |, (,
    and ) lose their special meaning; instead use the backslashed
    versions \?, \+, \{, \|, \(, and \)."

So GNU grep apparently made the same choice as Apple, probably far
earlier.

When I compile git with NO_REGEX and thus with the fallback in
compat/regex/, the result supports alternation as well:

   $ ./git grep --untracked 'b\|c' test.file
   test.file:abd
   $ nm ./git | grep regcomp
   0000000100255978 T _git_regcomp

Based on that I suggest:

--- >8 ---
Subject: grep: use REG_ENHANCED on macOS

GNU grep(1) and regcomp(3) use enhanced basic regular expressions by
default, which means that it e.g. supports alternation, e.g. "a\|b"
matches both "a" and "b".  The same is true for our compat/regex/
implementation.

On macOS Ventura (and possibly earlier) grep(1) also uses enhanced BREs,
but regcomp(3) requires the flag REG_ENHANCED to turn them on.  Do that
for git grep if possible, for consistency with the platform's grep(1)
and our fallback library.

It would be simpler to use REG_ENHANCED everywhere it is defined instead
of adding a Makefile setting, but it's a non-standard extension and
might mean something else on other platforms.

Reported-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 Makefile         | 8 ++++++++
 config.mak.uname | 1 +
 grep.c           | 4 ++++
 3 files changed, 13 insertions(+)

diff --git a/Makefile b/Makefile
index db447d0738..15e7edc9d2 100644
--- a/Makefile
+++ b/Makefile
@@ -289,6 +289,10 @@ include shared.mak
 # Define NO_REGEX if your C library lacks regex support with REG_STARTEND
 # feature.
 #
+# Define GIT_GREP_USES_REG_ENHANCED if your C library provides the flag
+# REG_ENHANCED to enable enhanced basic regular expressions and you'd
+# like to use it in git grep.
+#
 # Define HAVE_DEV_TTY if your system can open /dev/tty to interact with the
 # user.
 #
@@ -2037,6 +2041,10 @@ endif
 ifdef NO_REGEX
 	COMPAT_CFLAGS += -Icompat/regex
 	COMPAT_OBJS += compat/regex/regex.o
+else
+ifdef GIT_GREP_USES_REG_ENHANCED
+	COMPAT_CFLAGS += -DGIT_GREP_USES_REG_ENHANCED
+endif
 endif
 ifdef NATIVE_CRLF
 	BASIC_CFLAGS += -DNATIVE_CRLF
diff --git a/config.mak.uname b/config.mak.uname
index d63629fe80..14c145ae42 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -147,6 +147,7 @@ ifeq ($(uname_S),Darwin)
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
 	HAVE_NS_GET_EXECUTABLE_PATH = YesPlease
 	CSPRNG_METHOD = arc4random
+	GIT_GREP_USES_REG_ENHANCED = YesPlease

 	# Workaround for `gettext` being keg-only and not even being linked via
 	# `brew link --force gettext`, should be obsolete as of
diff --git a/grep.c b/grep.c
index 06eed69493..a8430daaba 100644
--- a/grep.c
+++ b/grep.c
@@ -502,6 +502,10 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 		regflags |= REG_ICASE;
 	if (opt->pattern_type_option == GREP_PATTERN_TYPE_ERE)
 		regflags |= REG_EXTENDED;
+#if defined(GIT_GREP_USES_REG_ENHANCED) && defined(REG_ENHANCED)
+	else
+		regflags |= REG_ENHANCED;
+#endif
 	err = regcomp(&p->regexp, p->pattern, regflags);
 	if (err) {
 		char errbuf[1024];
--
2.39.0

  reply	other threads:[~2023-01-03 20:52 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-03  9:53 BUG: git grep behave oddly with alternatives Marco Nenciarini
2023-01-03 16:29 ` René Scharfe
2023-01-03 18:13   ` Marco Nenciarini
2023-01-03 20:52     ` René Scharfe [this message]
2023-01-04  6:13       ` Junio C Hamano
2023-01-04  7:46       ` Jeff King
2023-01-04 16:36         ` René Scharfe
2023-01-06  9:09           ` Jeff King
2023-01-08  0:42             ` René Scharfe
2023-01-08  1:27               ` Junio C Hamano
2023-01-11 18:56               ` Jeff King
2023-01-12 17:13                 ` René Scharfe
2023-01-12 17:52                   ` Ævar Arnfjörð Bjarmason
2023-01-12 21:54                   ` Jeff King
2023-01-13  8:28                     ` Ævar Arnfjörð Bjarmason
2023-01-13 17:19                       ` Junio C Hamano
2023-01-14  6:44                         ` René Scharfe
2023-01-14  8:31                           ` René Scharfe
2023-01-14 12:45                             ` Diomidis Spinellis
2023-01-14 16:08                               ` Junio C Hamano
2023-01-13 17:24                       ` René Scharfe
2023-01-13 23:03                         ` René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=343a891e-d737-0ace-26a9-3839d3bd5583@web.de \
    --to=l.s.r@web.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=marco.nenciarini@enterprisedb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).