From: "D. Ben Knoble" <ben.knoble@gmail.com>
To: Jeff King <peff@peff.net>
Cc: demerphq <demerphq@gmail.com>, git@vger.kernel.org
Subject: Re: grep: fix multibyte regex handling under macOS (1819ad327b7a1f19540a819813b70a0e8a7f798f)
Date: Thu, 2 Feb 2023 15:47:28 -0500
Message-ID: <CALnO6CA3LL2TbMyvVsgeNgGHr9tGq4-FYR0-RMyJJiMvV3P91w@mail.gmail.com>
In-Reply-To: <Y9rv29c0dYUAYx8B@coredump.intra.peff.net>
On Wed, Feb 1, 2023 at 6:03 PM Jeff King <peff@peff.net> wrote:
> So the regex engine is complaining that it is getting bytes with high
> bits set, but that are not part of a multi-byte character. I.e., it is
> not happy to do bytewise matching, but really wants valid UTF8 in the
> expression.
I did manage to find that the call to regcomp in diff.c's
init_diff_words_data (line 2212 in v2.39.1) is what crashes; I could
not step into it with gdb, however.
Further, the following C program compiles without warnings (except for
the unused main parameters):
```
#include <regex.h>
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

int main(int argc, char **argv) {
	regex_t re;
	int ret = regcomp(&re, "[\xc0-\xff][\x80-\xbf]+",
			  REG_EXTENDED | REG_NEWLINE);
	/* assert(ret != 0); */
	/* Ask regerror for the required buffer size, then fetch the message. */
	size_t errbuf_size = regerror(ret, &re, NULL, 0);
	char errbuf[errbuf_size];
	regerror(ret, &re, errbuf, errbuf_size);
	printf("%s\n", errbuf);
}
```
```
# CFLAGS='-Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition -Wshadow -Wpointer-arith -Wcast-qual -pedantic -std=c11'
# cc $CFLAGS regtest.c -o regtest && ./regtest
*** unknown regexp error code ***
```
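(With the assertion enabled, it fails, because regcomp succeeds! That
is presumably also why regerror has no message to give: it is being
asked to describe a return code of 0.)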
So I can neither find out what's to blame nor what to fix. Here are
the linked libraries on macOS (IIUC):
```
# otool -L regtest
regtest:
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)
# otool -L ./git-diff # from v2.39.1 source build today
./git-diff:
	/System/Library/Frameworks/CoreServices.framework/Versions/A/CoreServices (compatibility version 1.0.0, current version 1141.1.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
	/usr/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0)
	/usr/local/opt/gettext/lib/libintl.8.dylib (compatibility version 12.0.0, current version 12.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1856.105.0)
```
--
D. Ben Knoble