ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:114181] [Ruby master Bug#19767] [Not really a bug, but more a not ideal notification] "historical binary regexp match" when using the "n" modifier in a ruby regex
@ 2023-07-14  8:10 rubyFeedback (robert heiler) via ruby-core
  2023-09-29 12:57 ` [ruby-core:114922] [Ruby master Misc#19767] " Dan0042 (Daniel DeLorme) via ruby-core
  0 siblings, 1 reply; 3+ messages in thread
From: rubyFeedback (robert heiler) via ruby-core @ 2023-07-14  8:10 UTC
  To: ruby-core; +Cc: rubyFeedback (robert heiler)

Issue #19767 has been reported by rubyFeedback (robert heiler).

----------------------------------------
Bug #19767: [Not really a bug, but more a not ideal notification] "historical binary regexp match" when using the "n" modifier in a ruby regex
https://bugs.ruby-lang.org/issues/19767

* Author: rubyFeedback (robert heiler)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN
----------------------------------------
To bring my knowledge of Ruby regexes up to date, I have been
going through this tutorial/book:

https://learnbyexample.github.io/Ruby_Regexp/unicode.html

One example it provides is this, with some non-ASCII (Greek) characters:

    'fox:αλεπού'.scan(/\w+/n)

This matches the ASCII word ("fox"), but it also emits
the following warning:

    warning: historical binary regexp match /.../n against UTF-8 string

This may be obvious to others, but I am not sure what a "historical"
binary regexp match actually is. I assume it means that this was used
more in the past and may be discouraged now? Or is something else
meant? What does "historical" mean in this context?

I may not be the only one who does not fully understand the term.
Most of Ruby's warnings are fairly easy to understand, but this one
seems odd. Right now I do not know whether we can still use the "n"
modifier in a regex. Not that I have a good use case for it (I am
using UTF-8 these days, so I don't seem to need ASCII-8BIT anyway),
but perhaps the warning could be changed a little.

I have no good alternative wording to suggest, largely because I do
not know what the warning actually means, e.g. what is "historical"
about it. Even so, I'd recommend against the word "historical":
"deprecated" is easy to understand, whereas "historical" does not
tell me anything.

Perhaps it could be expressed differently, without the word
"historical"? Either way, it's a tiny issue, so I was not even sure
whether to report it. But compared with other warnings, I believe
the term "historical" does not tell the user enough about what the
issue is here.


    (irb):1: warning: historical binary regexp match /.../n against UTF-8 string
    => ["fox"]
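For anyone who wants to reproduce the report outside of irb, here is a minimal sketch. It assumes the warning is written through `$stderr` (so it can be captured with a `StringIO`) and that warnings are not silenced via `$VERBOSE = nil`. The helper name `capture_warnings` and the ASCII-only comparison string are made up for illustration and are not part of the original report.

```ruby
require "stringio"

# Hypothetical helper (not part of Ruby): run the block and return
# [block result, text written to $stderr while the block ran].
def capture_warnings
  old_stderr = $stderr
  old_verbose = $VERBOSE
  $stderr = StringIO.new
  $VERBOSE = false # assumption: warnings are dropped only when $VERBOSE is nil
  result = yield
  [result, $stderr.string]
ensure
  $stderr = old_stderr
  $VERBOSE = old_verbose
end

text = 'fox:αλεπού' # UTF-8 string containing non-ASCII characters

# The call from the report: a /n regexp against a non-ASCII UTF-8 string.
with_n, with_n_warning = capture_warnings { text.scan(/\w+/n) }

# The same pattern without the "n" modifier.
without_n, without_n_warning = capture_warnings { text.scan(/\w+/) }

# A /n regexp against an ASCII-only string (illustrative input).
ascii_only, ascii_warning = capture_warnings { "fox:dog".scan(/\w+/n) }

p with_n
puts with_n_warning
p without_n
p ascii_only
```

In this sketch the "n" modifier does not change the match result for the reported input; the warning only appears when the string actually contains non-ASCII characters.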




-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/


* [ruby-core:114922] [Ruby master Misc#19767] [Not really a bug, but more a not ideal notification] "historical binary regexp match" when using the "n" modifier in a ruby regex
  2023-07-14  8:10 [ruby-core:114181] [Ruby master Bug#19767] [Not really a bug, but more a not ideal notification] "historical binary regexp match" when using the "n" modifier in a ruby regex rubyFeedback (robert heiler) via ruby-core
@ 2023-09-29 12:57 ` Dan0042 (Daniel DeLorme) via ruby-core
  2023-09-29 13:55   ` [ruby-core:114923] " Владислав Родин via ruby-core
  0 siblings, 1 reply; 3+ messages in thread
From: Dan0042 (Daniel DeLorme) via ruby-core @ 2023-09-29 12:57 UTC
  To: ruby-core; +Cc: Dan0042 (Daniel DeLorme)

Issue #19767 has been updated by Dan0042 (Daniel DeLorme).


The "historical" and "binary" parts were added in 2017:
https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/d8cee4ff0a851037e96fe76d951a1549284c875a/diff/re.c
https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/dbd4c4a7b373061d235857f7f34e15859a7f1051/diff/re.c
The original warning was added in 2008:
https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/880a96c795d30d95497cb216c8bfc7fa1b3b5387/diff/re.c

It means that even though it may look like a binary regexp, it doesn't act like one when matched against a UTF-8 string: `"é"[/./n] == "é"`, not the first byte of "é".

TBH I don't know why it was done that way. It would be convenient if `/.../n =~ str` were equivalent to `/.../n =~ str.b` but without the intermediate string.
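The difference can be checked directly. The following is only a sketch: `byte_match` is a hypothetical helper, not something Ruby provides, and it gets byte-wise matching by going through `String#b`, so it does allocate the intermediate binary copy mentioned above.

```ruby
# Hypothetical helper (not part of Ruby): match a regexp against the raw
# bytes of a string by matching against its ASCII-8BIT copy (String#b).
def byte_match(regexp, str)
  regexp.match(str.b)
end

str = "\u00E9" # "é", encoded in UTF-8 as the two bytes 0xC3 0xA9

# Matching the UTF-8 string directly: "." consumes the whole character,
# and Ruby emits the "historical binary regexp match" warning.
char_match = str[/./n]

# Matching the binary copy: "." consumes a single byte.
first_byte = byte_match(/./n, str)[0]

p char_match.bytes # => [195, 169]
p first_byte.bytes # => [195]
```

So, as things stand, byte-wise behaviour has to be requested explicitly on the string side (for example with `str.b`); the "n" modifier alone does not provide it for a UTF-8 string.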

----------------------------------------
Misc #19767: [Not really a bug, but more a not ideal notification] "historical binary regexp match" when using the "n" modifier in a ruby regex
https://bugs.ruby-lang.org/issues/19767#change-104790

* Author: rubyFeedback (robert heiler)
* Status: Open
* Priority: Normal
----------------------------------------

* [ruby-core:114923] Re: [Ruby master Misc#19767] [Not really a bug, but more a not ideal notification] "historical binary regexp match" when using the "n" modifier in a ruby regex
  2023-09-29 12:57 ` [ruby-core:114922] [Ruby master Misc#19767] " Dan0042 (Daniel DeLorme) via ruby-core
@ 2023-09-29 13:55   ` Владислав Родин via ruby-core
  0 siblings, 0 replies; 3+ messages in thread
From: Владислав Родин via ruby-core @ 2023-09-29 13:55 UTC
  To: Ruby developers
  Cc: Владислав Родин



😃

On Fri, 29 Sep 2023 at 18:57, Dan0042 (Daniel DeLorme) via ruby-core <
ruby-core@ml.ruby-lang.org> wrote:

> Issue #19767 has been updated by Dan0042 (Daniel DeLorme).
>
>
> The "historical" and "binary" parts were added in 2017
>
> https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/d8cee4ff0a851037e96fe76d951a1549284c875a/diff/re.c
>
> https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/dbd4c4a7b373061d235857f7f34e15859a7f1051/diff/re.c
> The original warning was added in 2008
>
> https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/880a96c795d30d95497cb216c8bfc7fa1b3b5387/diff/re.c
>
> It means that even though it may look like a binary regexp, it doesn't act
> like one. `"é"[/./n] == "é"`, not the first byte of "é"
>
> TBH I don't know why it was done that way. It would be convenient if
> `/.../n =~ str` was equivalent to `/.../n =~ str.b` but without the
> intermediary string.
