ruby-core@ruby-lang.org archive (unofficial mirror)
From: duerst via ruby-core <ruby-core@ml.ruby-lang.org>
To: ruby-core@ml.ruby-lang.org
Cc: duerst <noreply@ruby-lang.org>
Subject: [ruby-core:117591] [Ruby master Misc#20434] Deprecate encoding-related regular expression modifiers
Date: Thu, 18 Apr 2024 06:22:30 +0000 (UTC)
Message-ID: <redmine.journal-107992.20240418062231.10206@ruby-lang.org>
In-Reply-To: redmine.issue-20434.20240417155807.10206@ruby-lang.org

Issue #20434 has been updated by duerst (Martin Dürst).


I guess there might still be some use for the encoding-related modifiers in one-line scripts and the like. But I don't have an actual use case; I hope whoever has such a use case comes forward.

The replacement code (`::Regexp.new(::String.new("\x81\x40", encoding: "Windows-31J"))`) is quite lengthy. This makes it clear that while each regular expression has an encoding, in the same way as each String has an encoding, regular expressions don't really allow manipulating that encoding. Strings have #force_encoding and #encode, so maybe adding one or both of these methods to Regexp would help. The example could then be written as `/\x81\x40/.force_encoding("Windows-31J")` or `/\u3000/.encode("Windows-31J")`.
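Until something like Regexp#encode exists, its intended effect can be approximated in user code. The sketch below is a hypothetical helper (`regexp_encode` is an illustrative name, not an existing or proposed Ruby API) that transcodes the pattern source with String#encode and compiles a new Regexp. It assumes the pattern is written with literal characters; escape sequences such as `\x81` or `\u3000` are plain ASCII text in the source and are not transcoded.

```ruby
# Hypothetical helper, NOT an existing Ruby method: approximates the proposed
# Regexp#encode by transcoding the pattern source and compiling a new Regexp.
# Limitation: only literal characters in the source are transcoded; escape
# sequences such as \x81 or \u3000 are plain ASCII text and stay unchanged.
def regexp_encode(regexp, encoding)
  # Keep the i/x/m flags; drop the encoding bits that #options may also report.
  flags = regexp.options & (Regexp::IGNORECASE | Regexp::EXTENDED | Regexp::MULTILINE)
  Regexp.new(regexp.source.encode(encoding), flags)
end

utf8_re = Regexp.new("\u3000", Regexp::IGNORECASE) # source holds the literal U+3000
sjis_re = regexp_encode(utf8_re, "Windows-31J")

p sjis_re.encoding                                  # the target encoding
p sjis_re.casefold?                                 # the i flag is carried over
p sjis_re.match?("\u3000".encode("Windows-31J"))    # matches Windows-31J text
```

A #force_encoding-style variant would use `regexp.source.dup.force_encoding(encoding)` instead, reinterpreting the existing bytes rather than transcoding them.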

----------------------------------------
Misc #20434: Deprecate encoding-related regular expression modifiers
https://bugs.ruby-lang.org/issues/20434#change-107992

* Author: kddnewton (Kevin Newton)
* Status: Open
----------------------------------------
This is a follow-up to @duerst's comment here: https://bugs.ruby-lang.org/issues/20406#note-6.

As noted in the other issue, there are many encodings that factor into how a regular expression operates. These include:

* The encoding of the file
* The encoding of the string parts within the regular expression
* The regular expression encoding modifiers
* The encoding of the string being matched
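A short Ruby sketch of how several of these encodings show up at runtime, assuming the file is saved in UTF-8 (Ruby's default source encoding); the variable names are illustrative only. It shows that a regexp whose encoding is fixed, whether by a `\u` escape or by the `s` modifier, only matches non-ASCII strings in its own encoding, while an ASCII-only regexp accepts any ASCII-compatible string.

```ruby
ascii_re = /abc/        # ASCII-only literal: US-ASCII, encoding not fixed
utf8_re  = /\u3000/     # the \u escape fixes the encoding to UTF-8
sjis_re  = /\x81\x40/s  # the s modifier fixes the encoding to Windows-31J

utf8_text = "x\u3000abc"                      # a UTF-8 string being matched
sjis_text = utf8_text.encode("Windows-31J")   # the same text in Windows-31J

p [ascii_re.encoding, utf8_re.encoding, sjis_re.encoding]

# An ASCII-only regexp matches ASCII-compatible strings of any encoding.
p ascii_re.match?(utf8_text), ascii_re.match?(sjis_text)

# Fixed-encoding regexps match strings in their own encoding...
p utf8_re.match?(utf8_text), sjis_re.match?(sjis_text)

# ...but crossing the encodings raises instead of returning false.
mismatch =
  begin
    sjis_re.match?(utf8_text)
  rescue Encoding::CompatibilityError => e
    e
  end
p mismatch.class
```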

At the time the modifiers were introduced, I believe they may have been the only encoding that factored in here, though I am not certain of this. At this point, however, they can lead to quite a bit of confusion, as noted in the other ticket.

I would like to propose deprecating the regular expression encoding modifiers. Instead, a warning could suggest creating the regular expression from an encoded string. For example, when we find:

```ruby
/\x81\x40/s
```

we would instead suggest:

```ruby
::Regexp.new(::String.new("\x81\x40", encoding: "Windows-31J"))
```
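The following Ruby sketch checks, assuming a UTF-8 source file, that the suggested replacement behaves like the modifier form for matching purposes: both regexps have the fixed encoding Windows-31J and both match an ideographic space (U+3000, the bytes 0x81 0x40 in Windows-31J). They are not `==` to each other, because Regexp#== also compares the pattern source, which is the escape text `\x81\x40` in one case and the two raw bytes in the other.

```ruby
literal     = /\x81\x40/s
replacement = ::Regexp.new(::String.new("\x81\x40", encoding: "Windows-31J"))

sjis_space = "\u3000".encode("Windows-31J")   # the bytes 0x81 0x40

p literal.encoding, replacement.encoding              # same encoding
p literal.fixed_encoding?, replacement.fixed_encoding?
p literal.match?(sjis_space), replacement.match?(sjis_space)
p literal == replacement   # false: the sources differ (escape text vs raw bytes)
```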

or equivalent. As a migration path, we could do the following:

1. Emit a warning to change to the suggested expression
2. Change the compiler to compile to the suggested expression when those flags are found
3. Remove support for the flags
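Step 1 could be prototyped as a text transformation that builds the suggested replacement for a warning message. The sketch below is hypothetical (`MODIFIER_ENCODINGS`, `suggested_replacement`, and `deprecation_warning` are illustrative names, and the warning wording is invented). It copies the pattern body verbatim into a double-quoted string literal, which is only correct when every escape means the same thing in both contexts: true for `\xHH`, but not for regexp-only escapes such as `\d` or `\s`. The `n` modifier is left out because it does not map as simply onto a single encoding.

```ruby
# Hypothetical mapping from the encoding modifiers to the encodings they select.
MODIFIER_ENCODINGS = {
  "e" => Encoding::EUC_JP,
  "s" => Encoding::Windows_31J,
  "u" => Encoding::UTF_8,
}.freeze

# Builds the suggested replacement source text for a pattern body and modifier.
# The body is copied verbatim into a string literal (safe for \xHH escapes only).
def suggested_replacement(body, modifier)
  encoding = MODIFIER_ENCODINGS.fetch(modifier)
  format('::Regexp.new(::String.new("%s", encoding: "%s"))', body, encoding.name)
end

# Formats an illustrative deprecation warning that embeds the suggestion.
def deprecation_warning(body, modifier)
  "warning: /#{body}/#{modifier} uses a deprecated encoding modifier; " \
    "use #{suggested_replacement(body, modifier)} instead"
end

puts deprecation_warning('\x81\x40', "s")
```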

Step 2 may be unnecessary depending on how long a timeline we would like to provide. To be clear, I'm not advocating for any particular timeline, and would be fine with this being multiple years/versions to give plenty of time for people to migrate. But I do think this would be a good change to eliminate confusion about the interaction between the four different encodings at play.



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/


Thread overview: 6+ messages
2024-04-17 15:58 [ruby-core:117568] [Ruby master Misc#20434] Deprecate regular expression modifiers kddnewton (Kevin Newton) via ruby-core
2024-04-18  0:23 ` [ruby-core:117581] [Ruby master Misc#20434] Deprecate encoding-releated " shyouhei (Shyouhei Urabe) via ruby-core
2024-04-18  6:22 ` duerst via ruby-core [this message]
2024-04-18  9:23 ` [ruby-core:117595] " byroot (Jean Boussier) via ruby-core
2024-04-18 11:41 ` [ruby-core:117596] [Ruby master Misc#20434] Deprecate encoding-related " Eregon (Benoit Daloze) via ruby-core
2024-04-18 18:50 ` [ruby-core:117600] " kddnewton (Kevin Newton) via ruby-core
