Date: Thu, 18 Apr 2024 09:23:44 +0000 (UTC)
Subject: [ruby-core:117595] [Ruby master Misc#20434] Deprecate encoding-related regular expression modifiers
From: byroot (Jean Boussier) via ruby-core

Issue #20434 has been updated by byroot (Jean Boussier).

`/\x81\x40/.force_encoding("Windows-31J")` wouldn't work because `String#force_encoding` mutates the string, and Regexp literals are immutable.

Similarly, `String#encode` doesn't just change the string's encoding attribute; it converts the bytes to the new encoding.
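To make the distinction above concrete, here is a small sketch, assuming Ruby 3.0 or later and a UTF-8 source file. It only exercises existing `String` methods, since `Regexp` has no `force_encoding` method; the variable names are illustrative.

```ruby
# Regexp literals are frozen since Ruby 3.0, so an in-place mutator
# (such as a hypothetical Regexp#force_encoding) could not be applied to them.
literal = /abc/
puts literal.frozen? # => true

# force_encoding keeps the bytes and only changes the encoding label.
# It mutates its receiver, so call it on a copy (.dup) rather than on a
# literal that may be frozen.
relabeled = "\x81\x40".dup.force_encoding(Encoding::WINDOWS_31J)
puts relabeled.encoding        # => Windows-31J
puts relabeled.bytes.inspect   # => [129, 64]
puts relabeled.valid_encoding? # => true

# encode converts characters from the source encoding (UTF-8 here) into the
# target encoding. "\x81\x40" is not valid UTF-8, so the conversion raises.
error_message =
  begin
    "\x81\x40".encode(Encoding::WINDOWS_31J)
    nil
  rescue Encoding::InvalidByteSequenceError => e
    e.message
  end
puts error_message
```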
So I'd expect `/\3000/.encode("Windows-31J")` to fail with:

```ruby
"\x81" on UTF-8 (Encoding::InvalidByteSequenceError)
```

So I think the String API to mirror would be `String.new(encoding:)`:

- `Regexp.new(/\x81\x40/, encoding: Encoding::WINDOWS_31J)`
- `Regexp.new("\x81\x40", encoding: Encoding::WINDOWS_31J)`

But if we want an instance method, I think something like `/\x81\x40/.encoded(Encoding::WINDOWS_31J)`, which by the way would also be useful on `String`. For example, this pattern is common:

```ruby
# frozen_string_literal: true
THING = "f\xE9e".dup.force_encoding(Encoding::ISO8859_1)
```

So it could become:

```ruby
# frozen_string_literal: true
THING = "f\xE9e".encoded(Encoding::ISO8859_1)
```

----------------------------------------
Misc #20434: Deprecate encoding-related regular expression modifiers
https://bugs.ruby-lang.org/issues/20434#change-107997

* Author: kddnewton (Kevin Newton)
* Status: Open

----------------------------------------

This is a follow-up to @duerst's comment here: https://bugs.ruby-lang.org/issues/20406#note-6.

As noted in the other issue, there are many encodings that factor into how a regular expression operates. These include:

* The encoding of the file
* The encoding of the string parts within the regular expression
* The regular expression encoding modifiers
* The encoding of the string being matched

At the time the modifiers were introduced, I believe they may have been the only (??) encoding that factored in here. At this point, however, they can lead to quite a bit of confusion, as noted in the other ticket.

I would like to propose deprecating the regular expression encoding modifiers. Instead, the warning could suggest creating a regular expression from an encoded string. For example, when we find:

```ruby
/\x81\x40/s
```

we would instead suggest:

```ruby
::Regexp.new(::String.new("\x81\x40", encoding: "Windows-31J"))
```

or equivalent. As a migration path, we could do the following:

1. Emit a warning suggesting the replacement expression.
2. Change the compiler to compile to the suggested expression when those flags are found.
3. Remove support for the flags.

Step 2 may be unnecessary depending on how long a timeline we would like to provide. To be clear, I'm not advocating for any particular timeline, and would be fine with this taking multiple years/versions to give people plenty of time to migrate. But I do think this would be a good change to eliminate confusion about the interaction between the four different encodings at play.
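A rough sketch of the issue's proposed rewrite, assuming Ruby 3.x and a UTF-8 source file. It checks that the `s` modifier and the `Regexp.new(String.new(..., encoding:))` form both yield Windows-31J regexps that match the same text, and it uses `String.new(str, encoding:)` as a non-mutating alternative to the `.dup.force_encoding` pattern. The `Regexp.new(..., encoding:)` and `#encoded` forms discussed in the comment are proposals rather than existing methods, so they are not used here; the variable names are illustrative.

```ruby
# frozen_string_literal: true

# The `s` modifier marks the regexp as Windows-31J.
with_modifier = /\x81\x40/s
# The suggested replacement: build the pattern from a Windows-31J string.
without_modifier = ::Regexp.new(::String.new("\x81\x40", encoding: "Windows-31J"))

puts with_modifier.encoding    # => Windows-31J
puts without_modifier.encoding # => Windows-31J

# A Windows-31J subject containing the same two bytes
# (U+3000 IDEOGRAPHIC SPACE in that encoding).
subject = ::String.new("a\x81\x40b", encoding: Encoding::WINDOWS_31J)
puts with_modifier.match?(subject)    # => true
puts without_modifier.match?(subject) # => true

# String.new(str, encoding:) returns a new, unfrozen copy with the given
# encoding, so it never mutates the frozen literal it was given.
latin1 = ::String.new("f\xE9e", encoding: Encoding::ISO8859_1)
puts latin1.valid_encoding? # => true
puts latin1.length          # => 3
```

Because `String.new` copies its argument, this works even with `frozen_string_literal` enabled, which is the situation the `.dup.force_encoding` idiom works around.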