From: demerphq <demerphq@gmail.com>
To: "D. Ben Knoble" <ben.knoble@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: grep: fix multibyte regex handling under macOS (1819ad327b7a1f19540a819813b70a0e8a7f798f)
Date: Wed, 1 Feb 2023 17:09:33 +0100
Message-ID: <CANgJU+X_e0owKC3uWPaA_gVP54syF1+MJ-cTn+fjPrNS5LDsMA@mail.gmail.com>
In-Reply-To: <CALnO6CAZtwfGY4SYeOuKqdP9+e_0EYNf4F703DRQB7UUfd_bUg@mail.gmail.com>

On Wed, 1 Feb 2023 at 16:25, D. Ben Knoble <ben.knoble@gmail.com> wrote:
>
> I recently updated to git 2.39.1 and noticed today that `git diff
> --word-diff` fails for files with `diff=scheme`. I was able to narrow
> the failure down to the inclusion of control characters \xc0, \xff,
> \x80, \xbf by https://github.com/git/git/blob/2fc9e9ca3c7505bc60069f11e7ef09b1aeeee473/userdiff.c#L17
> in the definition of the scheme diff pattern (really, all patterns).
>
> I suspect the commit referenced in the subject, given that it messes
> with regex handling on macOS.
>
> Relevant environment that I can think of:
> ```
> # locale
> LANG="fr_FR.UTF-8"
> LC_COLLATE="fr_FR.UTF-8"
> LC_CTYPE="fr_FR.UTF-8"
> LC_MESSAGES="fr_FR.UTF-8"
> LC_MONETARY="fr_FR.UTF-8"
> LC_NUMERIC="fr_FR.UTF-8"
> LC_TIME="fr_FR.UTF-8"
> LC_ALL="fr_FR.UTF-8"
> ```
>
> I'm on macOS 11.7.
>
> Failure (using Zsh to produce the characters; I think there's a Bash
> equivalent):
> ```
> # git diff --word-diff --word-diff-regex=$'[\xc0-\xff][\x80-\xbf]+'
> fatal : invalid regular expression: [¿-ˇ][Ä-ø]+
> ```

FWIW that looks pretty weird to me, like the escapes in the charclass
were interpolated before being fed to the regex engine. Are you sure
you tested the right thing?
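
To make that concrete, here is a minimal Python sketch (mine, not
anything from git). It assumes the shell's $'...' quoting expanded the
escapes into raw bytes, and it guesses that the terminal rendered those
bytes as Mac OS Roman; both are assumptions on my part. It checks
whether the resulting pattern is valid UTF-8 and what it looks like
when read as Mac OS Roman:

```python
# Bytes the regex engine would receive if the shell expanded the
# $'...' escapes before git ever saw the argument (assumption).
pattern = b"[\xc0-\xff][\x80-\xbf]+"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Assumption: the terminal displayed the raw bytes as Mac OS Roman.
as_mac_roman = pattern.decode("mac_roman")

if __name__ == "__main__":
    print(is_valid_utf8(pattern))
    print(as_mac_roman)
```

If that guess is right, the engine was handed the bare bytes 0xC0,
0xFF, 0x80 and 0xBF rather than backslash escapes, and none of those is
a complete character in a UTF-8 locale.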

Yves


--
perl -Mre=debug -e "/just|another|perl|hacker/"

