Date: Thu, 04 Apr 2024 11:35:56 +0000 (UTC)
To: ruby-core@ml.ruby-lang.org
Subject: [ruby-core:117441] [Ruby master Misc#20406] Question about Regexp encoding negotiation
From: "Eregon (Benoit Daloze) via ruby-core"

Issue #20406 has been updated by Eregon (Benoit Daloze).

Indeed, on a similar topic I wonder how much encoding negotiation at Regexp creation time matters, because there is another encoding negotiation, between the regexp and the string being matched, which happens when matching.

Maybe the Regexp encoding should e.g. always be US-ASCII if there are only 7-bit characters in the Regexp source, or maybe always UTF-8 in that case, since a regexp is most likely to be matched against UTF-8 strings. That both choices are plausible illustrates that the Regexp encoding doesn't really matter for the 7-bit source case.

Or maybe Regexp literals should just always use the source encoding; that would make things a lot simpler and closer to string literals. The `/nesu` flags would then just override the source encoding (and maybe eventually be deprecated, though that is probably not worth it if their semantics are clear).

I'm not sure what the point of `Regexp#fixed_encoding?` is either: it seems that, regardless of it, a Regexp can be matched against strings of different but compatible encodings (the docs about this in `ri Regexp` are incorrect).
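To make the match-time negotiation mentioned above concrete, here is a minimal sketch. It is not taken from the issue: the helper `try_match` and all variable names are hypothetical, and the commented results reflect CRuby behavior as I understand it, so treat it as an illustration rather than a specification.

```ruby
# Hypothetical helper: returns true/false for a match attempt, or
# :incompatible if the regexp and string encodings cannot be negotiated.
def try_match(re, str)
  re.match?(str)
rescue Encoding::CompatibilityError
  :incompatible
end

plain = /abc/   # 7-bit source, no modifier
fixed = /abc/u  # same source, encoding forced with the /u modifier

utf8        = "caf\u00e9 abc"                # UTF-8, contains a non-ASCII character
eucjp_ascii = "abc".encode("EUC-JP")         # EUC-JP, but only 7-bit characters
eucjp_wide  = "\u3042 abc".encode("EUC-JP")  # EUC-JP, contains a non-ASCII character

p plain.encoding         # US-ASCII
p plain.fixed_encoding?  # false
p fixed.encoding         # UTF-8
p fixed.fixed_encoding?  # true

# A non-fixed regexp adapts to the string's encoding when matching.
p try_match(plain, utf8)        # true
p try_match(plain, eucjp_wide)  # true

# A fixed-encoding regexp still matches a 7-bit string in another encoding,
# but not a string with non-ASCII characters in another encoding.
p try_match(fixed, eucjp_ascii)  # true
p try_match(fixed, eucjp_wide)   # :incompatible
```

In this sketch the creation-time encoding of `plain` has no visible effect on the outcome; only the forced `/u` case changes which strings can be matched.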
----------------------------------------
Misc #20406: Question about Regexp encoding negotiation
https://bugs.ruby-lang.org/issues/20406#change-107821

* Author: andrykonchin (Andrew Konchin)
* Status: Open
----------------------------------------
I am wondering what the rules are for calculating a Regexp literal's encoding when an encoding modifier is specified.

From the documentation:

> By default, a regexp with only US-ASCII characters has US-ASCII encoding:
> ...
> A regular expression containing non-US-ASCII characters is assumed to use the source encoding. This can be overridden with one of the following modifiers.
> //n ...
> //u ...
> //e ...
> //s ...

Looking at the following examples, I would assume that these rules are followed, except in one case:

```ruby
p /\xc2\xa1/e .encoding       # EUC-JP
p /#{ }\xc2\xa1/e .encoding   # EUC-JP
p /a/e .encoding              # EUC-JP
p /a #{} a/e .encoding        # EUC-JP
p /#{} a/e .encoding          # US-ASCII
```

The last Regexp, `/#{} a/e`, is supposed to have `EUC-JP` encoding but has `US-ASCII`. So I am wondering what rule is applied in this case.

-- 
https://bugs.ruby-lang.org/