From: Jeff King <peff@peff.net>
To: demerphq <demerphq@gmail.com>
Cc: "D. Ben Knoble" <ben.knoble@gmail.com>, git@vger.kernel.org
Subject: Re: grep: fix multibyte regex handling under macOS (1819ad327b7a1f19540a819813b70a0e8a7f798f)
Date: Fri, 3 Feb 2023 12:01:37 -0500
Message-ID: <Y9098dyaTtiNk506@coredump.intra.peff.net>
In-Reply-To: <CANgJU+XNLqf0E2+YC8yxtRPVh=mevc3P0eeye2_nx=ULB2iVWw@mail.gmail.com>

On Thu, Feb 02, 2023 at 05:22:37PM +0100, demerphq wrote:
> I've been lurking watching some of the regex discussion on the list
> and personally I think it is asking for trouble to use "whatever regex
> engine is traditional in a given environment" instead of just choosing
> a good open source engine and using it consistently everywhere. I
> don't really buy the arguments I have seen to justify a policy of "use
> the standard library version"; regex engines vary widely in
> performance and implementation and feature set, and even the really
> good ones do not entirely agree on every semantic[1], so if you don't
> standardize you will be forever dealing with bugs related to those
> differences.

I think this is a perennial question for portable software: is it better
to be consistent across platforms (by shipping our own regex engine), or
consistent with other programs on the same platform (by using the system
regex)?

I don't have a strong opinion either way. The main concern I'd have is
handling dependencies. I like pcre a lot, but I'm not sure that I would
want building Git to require pcre on every platform. If there's an
engine we can ship as a vendored dependency that builds everywhere, that
helps. We have the engine imported from gawk in compat/regex. That
_probably_ builds everywhere (though we don't really know, because any
platform that doesn't set NO_REGEX has been happily using the system
routines). But it also may not be the best choice; avoiding its
multi-byte handling was the reason behind 1819ad327 in the first place.

> I think the git project should choose the feature set[2] it thinks is
> important, and then choose a regex engine that provides those features
> and is well supported, and then use it consistently everywhere that
> git needs to do regex based matching. Anything else is asking for
> trouble at some level or another.

IMHO the biggest issue here is that the built-in userdiff regexes are
doing something a bit questionable, which is embedding high-bit
characters directly into the regex. If we can avoid that, then having
consistency in multi-byte handling across platforms becomes a lot less
important.

-Peff
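
To make the high-bit concern concrete, here is a minimal sketch. It uses
Python's re module purely as a stand-in for "a byte-oriented engine" versus
"a character-oriented engine"; Git itself goes through regcomp()/regexec()
in C, so this is an illustration under that assumption, not Git's code. The
pattern is an assumed shape for a "UTF-8 lead byte plus continuation bytes"
expression and is not quoted from userdiff.c. The function names are
hypothetical.

```python
import re

# Assumed example pattern: a UTF-8 lead byte followed by one or more
# continuation bytes. Written with escapes so this file stays pure ASCII.
HIGH_BIT = r"[\xc0-\xff][\x80-\xbf]+"


def match_as_bytes(text):
    """Byte-oriented matching: pattern and subject are raw UTF-8 bytes."""
    return re.findall(HIGH_BIT.encode("ascii"), text.encode("utf-8"))


def match_as_chars(text):
    """Character-oriented matching: the same escapes now name code points
    U+00C0..U+00FF and U+0080..U+00BF instead of individual bytes."""
    return re.findall(HIGH_BIT, text)


if __name__ == "__main__":
    sample = "na\u00efve caf\u00e9"  # "naive cafe" with i-diaeresis, e-acute
    print(match_as_bytes(sample))  # two 2-byte UTF-8 sequences
    print(match_as_chars(sample))  # empty list
```

Matched bytewise, the pattern picks out both accented characters as
two-byte sequences. Matched on decoded characters, the same pattern finds
nothing, because each accented letter is now a single code point with no
"continuation" after it. A pattern that sticks to ASCII and POSIX classes
does not have this split personality, which is why avoiding embedded
high-bit bytes makes cross-platform multi-byte consistency matter less.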