ruby-core@ruby-lang.org archive (unofficial mirror)
From: duerst via ruby-core <ruby-core@ml.ruby-lang.org>
To: ruby-core@ml.ruby-lang.org
Cc: duerst <noreply@ruby-lang.org>
Subject: [ruby-core:117437] [Ruby master Misc#20406] Question about Regexp encoding negotiation
Date: Thu, 04 Apr 2024 00:09:16 +0000 (UTC)
Message-ID: <redmine.journal-107813.20240404000916.13553@ruby-lang.org>
In-Reply-To: redmine.issue-20406.20240402171943.13553@ruby-lang.org

Issue #20406 has been updated by duerst (Martin Dürst).


This is a more general comment, but my impression is that the encoding flags on regular expressions may be outdated. They have existed since before Ruby introduced encoding information for Strings,... in Ruby 1.9. It may be time to look into how and when they can be deprecated.

----------------------------------------
Misc #20406: Question about Regexp encoding negotiation
https://bugs.ruby-lang.org/issues/20406#change-107813

* Author: andrykonchin (Andrew Konchin)
* Status: Open
----------------------------------------
I am wondering what the rules are for determining the encoding of a Regexp literal when an encoding modifier is specified.


From the documentation:

> By default, a regexp with only US-ASCII characters has US-ASCII encoding:
> ...
> A regular expression containing non-US-ASCII characters is assumed to use the source encoding. This can be overridden with one of the following modifiers.
> //n ...
> //u ...
> //e ...
> //s ...
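
The quoted rules can be checked directly with a few ASCII-only literals. This is only a minimal sketch: the variable names are illustrative, and it covers the `u`, `e`, and `s` modifiers only, using `Regexp#encoding` and `Regexp#fixed_encoding?`.

```ruby
# ASCII-only Regexp literals, with and without an encoding modifier.
plain = /a/    # no modifier
utf8  = /a/u   # UTF-8 modifier
eucjp = /a/e   # EUC-JP modifier
sjis  = /a/s   # Windows-31J modifier

puts plain.encoding   # US-ASCII
puts utf8.encoding    # UTF-8
puts eucjp.encoding   # EUC-JP
puts sjis.encoding    # Windows-31J

# A modifier also marks the regexp as having a fixed encoding.
puts plain.fixed_encoding?   # false
puts eucjp.fixed_encoding?   # true
```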

Looking at the following examples, I would assume that these rules are followed except in one case:

```ruby
 p /\xc2\xa1/e     .encoding # EUC-JP
 p /#{ }\xc2\xa1/e .encoding # EUC-JP

 p /a/e            .encoding # EUC-JP
 p /a #{} a/e      .encoding # EUC-JP
 p /#{} a/e        .encoding # US-ASCII
```

The last Regexp `/#{} a/e` is supposed to have `EUC-JP` encoding but has `US-ASCII`. So I am wondering what rule is applied in this case.
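
To compare the cases side by side on a given Ruby build, a small script along these lines may help. It is a sketch only: `describe` is a hypothetical helper (not part of Ruby), and what the last line prints may depend on the Ruby version, so no expected output is given for it.

```ruby
# Report the encoding and the fixed-encoding flag of a Regexp.
# `describe` is a hypothetical helper introduced for this illustration.
def describe(label, regexp)
  "#{label}: #{regexp.encoding}, fixed_encoding?=#{regexp.fixed_encoding?}"
end

# Labels are single-quoted, so '#{}' inside them is not interpolated.
puts describe('/a/e',       /a/e)
puts describe('/a #{} a/e', /a #{} a/e)
puts describe('/#{} a/e',   /#{} a/e)   # the case in question
```

Comparing `fixed_encoding?` across the three lines shows whether the `e` modifier was applied at all in the interpolated cases, or only the reported encoding differs.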



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

Thread overview: 7+ messages
2024-04-02 17:19 [ruby-core:117408] [Ruby master Misc#20406] Question about Regexp encoding negotiation andrykonchin (Andrew Konchin) via ruby-core
2024-04-03  6:10 ` [ruby-core:117421] " shyouhei (Shyouhei Urabe) via ruby-core
2024-04-03  9:57 ` [ruby-core:117426] " Eregon (Benoit Daloze) via ruby-core
2024-04-03 10:29 ` [ruby-core:117427] " Eregon (Benoit Daloze) via ruby-core
2024-04-03 10:33 ` [ruby-core:117428] " Eregon (Benoit Daloze) via ruby-core
2024-04-04  0:09 ` duerst via ruby-core [this message]
2024-04-04 11:35 ` [ruby-core:117441] " Eregon (Benoit Daloze) via ruby-core
