ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:107624] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE
@ 2022-02-17 11:02 andrykonchin (Andrew Konchin)
  2022-02-21  3:34 ` [ruby-core:107677] " mame (Yusuke Endoh)
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: andrykonchin (Andrew Konchin) @ 2022-02-17 11:02 UTC
  To: ruby-core

Issue #18590 has been reported by andrykonchin (Andrew Konchin).

----------------------------------------
Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE
https://bugs.ruby-lang.org/issues/18590

* Author: andrykonchin (Andrew Konchin)
* Status: Open
* Priority: Normal
* ruby -v: 3.1.0p0
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
Downcasing the "İ" character works in an unexpected way:

```ruby
'İ'.downcase
=> "i̇"
```

Expected result: downcasing should return "i". Instead, it returns a small "i" followed by an additional "dot" character:

```ruby
'İ'.downcase.chars
=> ["i", "̇"]
```

According to the standard Unicode case mapping, the character 'İ' (0130) maps to lowercase 'i' (0069).

```
0130;LATIN CAPITAL LETTER I WITH DOT ABOVE;Lu;0;L;0049 0307;;;;N;LATIN CAPITAL LETTER I DOT;;;0069;
```

https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
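
For reference, the simple (one-to-one) mappings sit in fields 12 to 14 of a UnicodeData.txt record (uppercase, lowercase, titlecase). The following is a minimal sketch that reads only the record quoted above and compares it with what `String#downcase` returns; the `simple_lowercase` helper and the constant name are hypothetical, not part of Ruby:

```ruby
# UnicodeData.txt record for U+0130, copied from the report above.
UNICODE_DATA_RECORD = '0130;LATIN CAPITAL LETTER I WITH DOT ABOVE;Lu;0;L;0049 0307;;;;N;LATIN CAPITAL LETTER I DOT;;;0069;'

# Hypothetical helper: returns the simple lowercase mapping (field 13)
# as a string, or nil when the field is empty.
def simple_lowercase(record)
  fields = record.split(';', -1) # -1 keeps trailing empty fields
  hex = fields[13]
  hex.empty? ? nil : [hex.hex].pack('U')
end

simple_result  = simple_lowercase(UNICODE_DATA_RECORD)
default_result = [0x0130].pack('U').downcase

# Print both results as codepoints so the extra combining dot is visible.
puts simple_result.codepoints.map { |c| format('U+%04X', c) }.join(' ')
puts default_result.codepoints.map { |c| format('U+%04X', c) }.join(' ')
```

The two printed lines differ, which is the discrepancy this report is about.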




-- 
https://bugs.ruby-lang.org/


* [ruby-core:107677] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE
  2022-02-17 11:02 [ruby-core:107624] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE andrykonchin (Andrew Konchin)
@ 2022-02-21  3:34 ` mame (Yusuke Endoh)
  2022-02-21  3:44 ` [ruby-core:107678] " mame (Yusuke Endoh)
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: mame (Yusuke Endoh) @ 2022-02-21  3:34 UTC
  To: ruby-core

Issue #18590 has been updated by mame (Yusuke Endoh).

Assignee set to duerst (Martin Dürst)
Status changed from Open to Assigned

The Unicode case folding data file (http://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt) says:

```
0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE
```

"F" is for "full case folding", and "T" is for "Turkic languages".
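
As a rough illustration of how the status letter selects a mapping, the sketch below parses just the two entries quoted above. The constant and variable names are made up for this example, and it does not read the real CaseFolding.txt file:

```ruby
# The two CaseFolding.txt entries quoted above.
CASE_FOLDING_LINES = [
  '0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE',
  '0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE'
].freeze

# Map each status letter ("F" or "T") to the folded string it describes.
foldings = CASE_FOLDING_LINES.to_h do |line|
  _code, status, mapping = line.split(';').map(&:strip)
  [status, mapping.split.map(&:hex).pack('U*')]
end

foldings.each do |status, folded|
  codepoints = folded.codepoints.map { |c| format('U+%04X', c) }.join(' ')
  puts "#{status}: #{codepoints}"
end
```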

String#downcase uses full Unicode case mapping by default (see https://docs.ruby-lang.org/en/3.0/String.html#method-i-downcase). You can get the result you expected with the `:turkic` option.

```ruby
'İ'.downcase(:turkic).chars
=> ["i"]
```
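
The difference between the default and the `:turkic` option is easier to see as codepoints. This is a small sketch using only the two calls already shown in this thread; the variable names are illustrative:

```ruby
dotted_i = "\u0130" # LATIN CAPITAL LETTER I WITH DOT ABOVE

full_result   = dotted_i.downcase          # default: full Unicode case mapping
turkic_result = dotted_i.downcase(:turkic) # Turkic-specific mapping

p full_result.codepoints   # [105, 775], i.e. U+0069 U+0307
p turkic_result.codepoints # [105], i.e. U+0069
```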



----------------------------------------
Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE
https://bugs.ruby-lang.org/issues/18590#change-96596

* Author: andrykonchin (Andrew Konchin)
* Status: Assigned
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: 3.1.0p0
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN

* [ruby-core:107678] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE
  2022-02-17 11:02 [ruby-core:107624] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE andrykonchin (Andrew Konchin)
  2022-02-21  3:34 ` [ruby-core:107677] " mame (Yusuke Endoh)
@ 2022-02-21  3:44 ` mame (Yusuke Endoh)
  2022-02-22 23:15 ` [ruby-core:107727] " andrykonchin (Andrew Konchin)
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: mame (Yusuke Endoh) @ 2022-02-21  3:44 UTC
  To: ruby-core

Issue #18590 has been updated by mame (Yusuke Endoh).


@duerst It looks like this document, https://www.unicode.org/charts/case/ (which is referred to by https://docs.ruby-lang.org/en/master/doc/case_mapping_rdoc.html), says that the lowercase of U+0130 is U+0069. Which is correct?

----------------------------------------
Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE
https://bugs.ruby-lang.org/issues/18590#change-96597

* Author: andrykonchin (Andrew Konchin)
* Status: Assigned
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: 3.1.0p0
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN

* [ruby-core:107727] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE
  2022-02-17 11:02 [ruby-core:107624] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE andrykonchin (Andrew Konchin)
  2022-02-21  3:34 ` [ruby-core:107677] " mame (Yusuke Endoh)
  2022-02-21  3:44 ` [ruby-core:107678] " mame (Yusuke Endoh)
@ 2022-02-22 23:15 ` andrykonchin (Andrew Konchin)
  2022-02-23  8:17 ` [ruby-core:107731] " duerst
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: andrykonchin (Andrew Konchin) @ 2022-02-22 23:15 UTC
  To: ruby-core

Issue #18590 has been updated by andrykonchin (Andrew Konchin).


Thank you for the suggestion.

I am wondering whether `String#downcase` (when called without arguments) follows only the Unicode case mapping rules (as stated in the [documentation]), or also the case folding ones.

I would expect a call to `String#downcase` without arguments to use the one-to-one case mapping rules that are specified in the [UnicodeData.txt] file.

[documentation]: https://ruby-doc.org/core-3.0.0/String.html#method-i-downcase
[UnicodeData.txt]: https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt

----------------------------------------
Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE
https://bugs.ruby-lang.org/issues/18590#change-96650

* Author: andrykonchin (Andrew Konchin)
* Status: Assigned
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: 3.1.0p0
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN

* [ruby-core:107731] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE
  2022-02-17 11:02 [ruby-core:107624] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE andrykonchin (Andrew Konchin)
                   ` (2 preceding siblings ...)
  2022-02-22 23:15 ` [ruby-core:107727] " andrykonchin (Andrew Konchin)
@ 2022-02-23  8:17 ` duerst
  2022-02-23  9:27 ` [ruby-core:107732] " andrykonchin (Andrew Konchin)
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: duerst @ 2022-02-23  8:17 UTC
  To: ruby-core

Issue #18590 has been updated by duerst (Martin Dürst).

Status changed from Assigned to Closed

andrykonchin (Andrew Konchin) wrote in #note-3:
> Thank you for the suggestion.
> 
> I am wondering whether `String#downcase` (when called without arguments) follows only Unicode case mapping rules (as stated in the [documentation]). Or also the folding ones?
> 
> I would expect that a call of `String#downcase` without arguments uses the one-to-one case mapping rules, that are specified in the [UnicodeData.txt] file.

It should use the mappings in https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt.

And that is 0069 0307 (i.e. 'i' followed by dot above) for 'İ'.downcase.

> [documentation]: https://ruby-doc.org/core-3.0.0/String.html#method-i-downcase
> [UnicodeData.txt]: https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt

The data in UnicodeData is restricted to simple case mappings (i.e. mappings that don't change the length of the string in terms of number of codepoints). In Ruby, there is no need for such a restriction. See also https://www.sw.it.aoyama.ac.jp/2016/pub/RubyKaigi/, slide 23.
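
A short sketch of what "changing the length of the string" means in practice. The `"ß"` case and the `:ascii` option are not discussed in this thread; they are added here only as assumptions for illustration:

```ruby
dotted_i = "\u0130" # LATIN CAPITAL LETTER I WITH DOT ABOVE
sharp_s  = "\u00DF" # LATIN SMALL LETTER SHARP S (extra example, not from the thread)

# Full case mapping may return more codepoints than it was given.
puts dotted_i.length          # one codepoint before downcasing
puts dotted_i.downcase.length # two codepoints after downcasing
puts sharp_s.upcase           # upcases to two letters

# The :ascii option only touches A-Z and a-z, so U+0130 is left unchanged.
ascii_result = dotted_i.downcase(:ascii)
puts ascii_result == dotted_i
```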

I'm closing this, because it works as intended/described, as far as I can see.


----------------------------------------
Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE
https://bugs.ruby-lang.org/issues/18590#change-96654

* Author: andrykonchin (Andrew Konchin)
* Status: Closed
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: 3.1.0p0
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN

* [ruby-core:107732] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE
  2022-02-17 11:02 [ruby-core:107624] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE andrykonchin (Andrew Konchin)
                   ` (3 preceding siblings ...)
  2022-02-23  8:17 ` [ruby-core:107731] " duerst
@ 2022-02-23  9:27 ` andrykonchin (Andrew Konchin)
  2022-02-24  3:05 ` [ruby-core:107735] " mame (Yusuke Endoh)
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: andrykonchin (Andrew Konchin) @ 2022-02-23  9:27 UTC
  To: ruby-core

Issue #18590 has been updated by andrykonchin (Andrew Konchin).


Thank you for your clarification.

----------------------------------------
Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE
https://bugs.ruby-lang.org/issues/18590#change-96655

* Author: andrykonchin (Andrew Konchin)
* Status: Closed
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: 3.1.0p0
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN

* [ruby-core:107735] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE
  2022-02-17 11:02 [ruby-core:107624] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE andrykonchin (Andrew Konchin)
                   ` (4 preceding siblings ...)
  2022-02-23  9:27 ` [ruby-core:107732] " andrykonchin (Andrew Konchin)
@ 2022-02-24  3:05 ` mame (Yusuke Endoh)
  2022-02-24  3:09 ` [ruby-core:107736] " mame (Yusuke Endoh)
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: mame (Yusuke Endoh) @ 2022-02-24  3:05 UTC
  To: ruby-core

Issue #18590 has been updated by mame (Yusuke Endoh).

Status changed from Closed to Open

@duerst Let me confirm. The rdoc of 3.1 and master refers to https://www.unicode.org/charts/case/.

> Default Case Mapping
> By default, all of these methods use full Unicode case mapping, which is suitable for most languages. See [Unicode Latin Case Chart](https://www.unicode.org/charts/case/).

It is not clear to me that the document says "0069 0307 for 'İ'.downcase". Is it okay? Should it be replaced with https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt ?

----------------------------------------
Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE
https://bugs.ruby-lang.org/issues/18590#change-96661

* Author: andrykonchin (Andrew Konchin)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: 3.1.0p0
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN

* [ruby-core:107736] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE
  2022-02-17 11:02 [ruby-core:107624] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE andrykonchin (Andrew Konchin)
                   ` (5 preceding siblings ...)
  2022-02-24  3:05 ` [ruby-core:107735] " mame (Yusuke Endoh)
@ 2022-02-24  3:09 ` mame (Yusuke Endoh)
  2022-02-24  9:50 ` [ruby-core:107737] " duerst
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: mame (Yusuke Endoh) @ 2022-02-24  3:09 UTC
  To: ruby-core

Issue #18590 has been updated by mame (Yusuke Endoh).


BTW, the rdoc of String#downcase in 3.1 and master is much less informative, and has a broken link (which is maybe the same issue as #18468). It was changed at commit:f7e266e6d2ccad63e4245a106a80c82ef2b38cbf between 3.0 and 3.1. Personally I strongly prefer [the 3.0 style](https://ruby-doc.org/core-3.0.0/String.html#method-i-downcase).

----------------------------------------
Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE
https://bugs.ruby-lang.org/issues/18590#change-96662

* Author: andrykonchin (Andrew Konchin)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: 3.1.0p0
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN

* [ruby-core:107737] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE
  2022-02-17 11:02 [ruby-core:107624] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE andrykonchin (Andrew Konchin)
                   ` (6 preceding siblings ...)
  2022-02-24  3:09 ` [ruby-core:107736] " mame (Yusuke Endoh)
@ 2022-02-24  9:50 ` duerst
  2022-02-24 13:37 ` [ruby-core:107738] " mame (Yusuke Endoh)
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: duerst @ 2022-02-24  9:50 UTC
  To: ruby-core

Issue #18590 has been updated by duerst (Martin Dürst).


mame (Yusuke Endoh) wrote in #note-7:
> BTW, the rdoc of String#downcase in 3.1 and master is very less informative, and has a broken link (which is maybe the same issue as #18468). It was changed at commit:f7e266e6d2ccad63e4245a106a80c82ef2b38cbf between 3.0 and 3.1. Personally I strongly prefer [the 3.0 style](https://ruby-doc.org/core-3.0.0/String.html#method-i-downcase).

I also prefer the 3.0 version, but that's probably because I wrote the documentation for these methods (when I implemented them). Anyway, I think the 3.1 way of documenting things could also work, but the options link on each casing method should include a fragment and point to https://ruby-doc.org/core-3.1.0/doc/case_mapping_rdoc.html#label-Default+Case+Mapping, not just to https://ruby-doc.org/core-3.1.0/doc/case_mapping_rdoc.html. @BurdetteLamar



mame (Yusuke Endoh) wrote in #note-6:
> @duerst Let me confirm. The rdoc of 3.1 and master refers to https://www.unicode.org/charts/case/.
> 
> > Default Case Mapping
> > By default, all of these methods use full Unicode case mapping, which is suitable for most languages. See [Unicode Latin Case Chart](https://www.unicode.org/charts/case/).
> 
> It is not clear to me that the document says "0069 0307 for 'İ'.downcase".

That document does NOT say "0069 0307 for 'İ'.downcase".

> Is it okay?

I reported to Unicode that they should check it and clarify how this chart was made.

> Should it be replaced with https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt ?

In the Ruby documentation, probably yes. SpecialCasing.txt is an official Unicode data file. The case charts are just a Web page. But the case charts may be easier to understand for non-experts.

----------------------------------------
Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE
https://bugs.ruby-lang.org/issues/18590#change-96663

* Author: andrykonchin (Andrew Konchin)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: 3.1.0p0
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN

* [ruby-core:107738] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE
  2022-02-17 11:02 [ruby-core:107624] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE andrykonchin (Andrew Konchin)
                   ` (7 preceding siblings ...)
  2022-02-24  9:50 ` [ruby-core:107737] " duerst
@ 2022-02-24 13:37 ` mame (Yusuke Endoh)
  2022-02-27  5:36 ` [ruby-core:107753] " duerst
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: mame (Yusuke Endoh) @ 2022-02-24 13:37 UTC
  To: ruby-core

Issue #18590 has been updated by mame (Yusuke Endoh).


duerst (Martin Dürst) wrote in #note-8:
> > Is it okay?
> 
> I reported to Unicode that they should check it an clarify how this chart was made.

I see, thanks!

> > Should it be replaced with https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt ?
> 
> In the Ruby documentation, probably yes. SpecialCasing.txt is an official Unicode data file. The case charts are just a Web page. But the case charts may be easier to understand for non-experts.

It's certainly easy to understand, but if it's wrong, I don't think it's even worth considering.

I wanted to create a PR to fix the document, but I am unsure which document is the best reference for full case mapping. @duerst Could you please fix it? Or should we wait until the chart is fixed?

----------------------------------------
Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE
https://bugs.ruby-lang.org/issues/18590#change-96664

* Author: andrykonchin (Andrew Konchin)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: 3.1.0p0
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN

* [ruby-core:107753] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE
  2022-02-17 11:02 [ruby-core:107624] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE andrykonchin (Andrew Konchin)
                   ` (8 preceding siblings ...)
  2022-02-24 13:37 ` [ruby-core:107738] " mame (Yusuke Endoh)
@ 2022-02-27  5:36 ` duerst
  2022-02-28  8:04 ` [ruby-core:107758] " mame (Yusuke Endoh)
  2022-06-09  9:25 ` [ruby-core:108826] " mame (Yusuke Endoh)
  11 siblings, 0 replies; 13+ messages in thread
From: duerst @ 2022-02-27  5:36 UTC
  To: ruby-core

Issue #18590 has been updated by duerst (Martin Dürst).


mame (Yusuke Endoh) wrote in #note-9:

> I wanted to create a PR to fix the document, but I am unsure what document is the best reference for full case mapping. @duerst Could you please fix it? Or should we wait until the chart will be fixed?

The best reference is section 3.13 (Default Case Algorithms) of https://www.unicode.org/versions/latest/ch03.pdf. This is a lot of text, not as easy to understand as a table. But maybe this is better. People don't need a table, it's easy to create one with Ruby :-).
[Please note that this URI currently redirects to https://www.unicode.org/versions/Unicode14.0.0/ch03.pdf, but I still have to upgrade Ruby to Unicode 14.0.0; I hope to be able to do this in the next couple of weeks.]
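
One possible way to build such a table is sketched below. The helper names (`hex_codepoints`, `case_table`) and the chosen range (Latin Extended-A, U+0100 to U+017F) are arbitrary assumptions, and the output reflects whatever Unicode version the running Ruby was built with:

```ruby
# Format a string as space-separated hexadecimal codepoints.
def hex_codepoints(str)
  str.codepoints.map { |c| format('%04X', c) }.join(' ')
end

# Build one row per codepoint with its full lowercase and uppercase mappings.
def case_table(range)
  range.map do |cp|
    char = [cp].pack('U')
    {
      code: format('%04X', cp),
      lower: hex_codepoints(char.downcase),
      upper: hex_codepoints(char.upcase)
    }
  end
end

table = case_table(0x0100..0x017F)
table.each do |row|
  puts format('%-6s lower: %-12s upper: %s', row[:code], row[:lower], row[:upper])
end
```

Rows where `lower` or `upper` lists more than one codepoint are the full (length-changing) mappings discussed earlier in the thread.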

----------------------------------------
Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE
https://bugs.ruby-lang.org/issues/18590#change-96678

* Author: andrykonchin (Andrew Konchin)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: 3.1.0p0
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN

* [ruby-core:107758] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE
  2022-02-17 11:02 [ruby-core:107624] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE andrykonchin (Andrew Konchin)
                   ` (9 preceding siblings ...)
  2022-02-27  5:36 ` [ruby-core:107753] " duerst
@ 2022-02-28  8:04 ` mame (Yusuke Endoh)
  2022-06-09  9:25 ` [ruby-core:108826] " mame (Yusuke Endoh)
  11 siblings, 0 replies; 13+ messages in thread
From: mame (Yusuke Endoh) @ 2022-02-28  8:04 UTC
  To: ruby-core

Issue #18590 has been updated by mame (Yusuke Endoh).


@duerst Thanks, I have created a PR. https://github.com/ruby/ruby/pull/5607

----------------------------------------
Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE
https://bugs.ruby-lang.org/issues/18590#change-96683

* Author: andrykonchin (Andrew Konchin)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: 3.1.0p0
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN

* [ruby-core:108826] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE
  2022-02-17 11:02 [ruby-core:107624] [Ruby master Bug#18590] String#downcase and CAPITAL LETTER I WITH DOT ABOVE andrykonchin (Andrew Konchin)
                   ` (10 preceding siblings ...)
  2022-02-28  8:04 ` [ruby-core:107758] " mame (Yusuke Endoh)
@ 2022-06-09  9:25 ` mame (Yusuke Endoh)
  11 siblings, 0 replies; 13+ messages in thread
From: mame (Yusuke Endoh) @ 2022-06-09  9:25 UTC
  To: ruby-core

Issue #18590 has been updated by mame (Yusuke Endoh).

Status changed from Open to Closed

Fixed at commit:bda4d91f0599a8e2d278bc13660a5576d4ced353

----------------------------------------
Bug #18590: String#downcase and CAPITAL LETTER I WITH DOT ABOVE
https://bugs.ruby-lang.org/issues/18590#change-97902

* Author: andrykonchin (Andrew Konchin)
* Status: Closed
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: 3.1.0p0
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
