ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:109930] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data and removed properties since 13.0
@ 2022-09-17  4:10 nobu (Nobuyoshi Nakada)
  2022-09-18 16:01 ` [ruby-core:109950] " nobu (Nobuyoshi Nakada)
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: nobu (Nobuyoshi Nakada) @ 2022-09-17  4:10 UTC (permalink / raw)
  To: ruby-core

Issue #19007 has been reported by nobu (Nobuyoshi Nakada).

----------------------------------------
Bug #19007: Unicode tables differences from Unicode.org 14.0 data and removed properties since 13.0
https://bugs.ruby-lang.org/issues/19007

* Author: nobu (Nobuyoshi Nakada)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* Target version: 3.2
* ruby -v: 3.2.0 6898984f1cd
* Backport: 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED
----------------------------------------
I found that the header in the Unicode Emoji 14.0 data files had changed slightly (and again at 15.0), but `enc/unicode/case-folding.rb` didn't follow it.
After fixing it and rebuilding the headers under `enc/unicode/14.0.0`, `name2ctype.h` differed from the master, as shown below.

The `CR_Lower`, `CR_Cased` and `CR_Other_Lowercase` differences just seem to be ranges missed in the previous update, and are not a problem.

But U+11720..U+11721 in `CR_Grapheme_Cluster_Break_SpacingMark` is absent from the original Unicode.org data.
According to @naruse's investigation, it was removed in the commit [Update to Unicode 14.0.0], while U+11720 is still SpacingMark in the latest https://www.unicode.org/reports/tr29/.

[Update to Unicode 14.0.0]: https://github.com/latex3/unicode-data/commit/5570040ac8a30e2c2ca4912d415ecaa0498fa23a#diff-1e957b94de10ea96d32a338c005b1f05788af458cf335fc92683bc297e53ed94L582

```diff
diff --git a/enc/unicode/14.0.0/name2ctype.h b/enc/unicode/14.0.0/name2ctype.h
index 99a3eeca190..f49e5cd7273 100644
--- a/enc/unicode/14.0.0/name2ctype.h
+++ b/enc/unicode/14.0.0/name2ctype.h
@@ -1565,7 +1565,7 @@ static const OnigCodePoint CR_Graph[] = {
 
 /* 'Lower': [[:Lower:]] */
 static const OnigCodePoint CR_Lower[] = {
-	664,
+	668,
 	0x0061, 0x007a,
 	0x00aa, 0x00aa,
 	0x00b5, 0x00b5,
@@ -2196,6 +2196,10 @@ static const OnigCodePoint CR_Lower[] = {
 	0x105a3, 0x105b1,
 	0x105b3, 0x105b9,
 	0x105bb, 0x105bc,
+	0x10780, 0x10780,
+	0x10783, 0x10785,
+	0x10787, 0x107b0,
+	0x107b2, 0x107ba,
 	0x10cc0, 0x10cf2,
 	0x118c0, 0x118df,
 	0x16e60, 0x16e7f,
@@ -12651,7 +12655,7 @@ static const OnigCodePoint CR_Math[] = {
 
 /* 'Cased': Derived Property */
 static const OnigCodePoint CR_Cased[] = {
-	151,
+	155,
 	0x0041, 0x005a,
 	0x0061, 0x007a,
 	0x00aa, 0x00aa,
@@ -12763,6 +12767,10 @@ static const OnigCodePoint CR_Cased[] = {
 	0x105a3, 0x105b1,
 	0x105b3, 0x105b9,
 	0x105bb, 0x105bc,
+	0x10780, 0x10780,
+	0x10783, 0x10785,
+	0x10787, 0x107b0,
+	0x107b2, 0x107ba,
 	0x10c80, 0x10cb2,
 	0x10cc0, 0x10cf2,
 	0x118a0, 0x118df,
@@ -22615,7 +22623,7 @@ static const OnigCodePoint CR_Extender[] = {
 
 /* 'Other_Lowercase': Binary Property */
 static const OnigCodePoint CR_Other_Lowercase[] = {
-	20,
+	24,
 	0x00aa, 0x00aa,
 	0x00ba, 0x00ba,
 	0x02b0, 0x02b8,
@@ -22636,6 +22644,10 @@ static const OnigCodePoint CR_Other_Lowercase[] = {
 	0xa770, 0xa770,
 	0xa7f8, 0xa7f9,
 	0xab5c, 0xab5f,
+	0x10780, 0x10780,
+	0x10783, 0x10785,
+	0x10787, 0x107b0,
+	0x107b2, 0x107ba,
 }; /* CR_Other_Lowercase */
 
 /* 'Other_Uppercase': Binary Property */
@@ -37049,7 +37061,7 @@ static const OnigCodePoint CR_Grapheme_Cluster_Break_Extend[] = {
 
 /* 'Grapheme_Cluster_Break_SpacingMark': Grapheme_Cluster_Break=SpacingMark */
 static const OnigCodePoint CR_Grapheme_Cluster_Break_SpacingMark[] = {
-	161,
+	160,
 	0x0903, 0x0903,
 	0x093b, 0x093b,
 	0x093e, 0x0940,
@@ -37183,7 +37195,6 @@ static const OnigCodePoint CR_Grapheme_Cluster_Break_SpacingMark[] = {
 	0x116ac, 0x116ac,
 	0x116ae, 0x116af,
 	0x116b6, 0x116b6,
-	0x11720, 0x11721,
 	0x11726, 0x11726,
 	0x1182c, 0x1182e,
 	0x11838, 0x11838,
```
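
The tables in this diff appear to follow a layout where the first element is the number of ranges and the remaining elements are inclusive (from, to) pairs. That would explain why each count changes by exactly the number of ranges added or removed (664 to 668, 161 to 160). A minimal Ruby sketch of that layout, assuming this reading is right; `EXCERPT`, `count_consistent?` and `in_code_range?` are hypothetical names for illustration, not part of Ruby or Onigmo:

```ruby
# Hypothetical model of a code-range table: [count, from1, to1, from2, to2, ...].
# The values are copied from the tail of CR_Other_Lowercase in the diff above;
# the real table has 24 ranges, so this excerpt is only illustrative.
EXCERPT = [
  7,
  0xa770, 0xa770,
  0xa7f8, 0xa7f9,
  0xab5c, 0xab5f,
  0x10780, 0x10780,
  0x10783, 0x10785,
  0x10787, 0x107b0,
  0x107b2, 0x107ba,
].freeze

# True when the leading count matches the number of (from, to) pairs.
def count_consistent?(table)
  count, *bounds = table
  bounds.size == count * 2
end

# Binary search for a range containing +code+.
# Assumes the ranges are sorted in ascending order and do not overlap.
def in_code_range?(table, code)
  ranges = table.drop(1).each_slice(2).to_a
  # Find-minimum mode: the first range whose upper bound is >= code.
  hit = ranges.bsearch { |_from, to| code <= to }
  !hit.nil? && hit[0] <= code
end

puts count_consistent?(EXCERPT)        # true
puts in_code_range?(EXCERPT, 0x10780)  # true  (one of the added ranges)
puts in_code_range?(EXCERPT, 0x10781)  # false (gap between 0x10780 and 0x10783)
```

A check like `count_consistent?` is a cheap sanity test to run on a regenerated table before comparing it with the committed one.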



-- 
https://bugs.ruby-lang.org/

* [ruby-core:109950] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data and removed properties since 13.0
  2022-09-17  4:10 [ruby-core:109930] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data and removed properties since 13.0 nobu (Nobuyoshi Nakada)
@ 2022-09-18 16:01 ` nobu (Nobuyoshi Nakada)
  2022-10-07 13:11 ` [ruby-core:110229] " nobu (Nobuyoshi Nakada)
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: nobu (Nobuyoshi Nakada) @ 2022-09-18 16:01 UTC (permalink / raw)
  To: ruby-core

Issue #19007 has been updated by nobu (Nobuyoshi Nakada).


https://github.com/nobu/ruby/tree/emoji

----------------------------------------
Bug #19007: Unicode tables differences from Unicode.org 14.0 data and removed properties since 13.0
https://bugs.ruby-lang.org/issues/19007#change-99198

* Author: nobu (Nobuyoshi Nakada)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* Target version: 3.2
* ruby -v: 3.2.0 6898984f1cd
* Backport: 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED
----------------------------------------

-- 
https://bugs.ruby-lang.org/

* [ruby-core:110229] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data and removed properties since 13.0
  2022-09-17  4:10 [ruby-core:109930] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data and removed properties since 13.0 nobu (Nobuyoshi Nakada)
  2022-09-18 16:01 ` [ruby-core:109950] " nobu (Nobuyoshi Nakada)
@ 2022-10-07 13:11 ` nobu (Nobuyoshi Nakada)
  2022-12-06  0:08 ` [ruby-core:111215] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data duerst
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: nobu (Nobuyoshi Nakada) @ 2022-10-07 13:11 UTC (permalink / raw)
  To: ruby-core

Issue #19007 has been updated by nobu (Nobuyoshi Nakada).


nobu (Nobuyoshi Nakada) wrote:
> But U+11720..U+11721 in `CR_Grapheme_Cluster_Break_SpacingMark` is absent from the original Unicode.org data.
> According to @naruse's investigation, it was removed in the commit [Update to Unicode 14.0.0], while U+11720 is still SpacingMark in the latest https://www.unicode.org/reports/tr29/.

I read https://www.unicode.org/reports/tr29/#SpacingMark again and found that U+11720 and U+11721 are listed as exceptions.

> Exceptions: The following (which have General_Category = Spacing_Mark and would otherwise be included) are specifically excluded:

That means they were removed intentionally.
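
The rule quoted above can be sketched in Ruby as a set difference followed by range compression. This is a simplified illustration only: the full UAX #29 definition has further conditions, both lists are tiny samples taken from this thread (the General_Category of the sample code points is assumed, not looked up), and `gcb_spacing_mark` and `to_ranges` are hypothetical helper names.

```ruby
# Simplified sketch: SpacingMark = (General_Category = Spacing_Mark) minus exceptions.
# Sample code points taken from the diff in this thread; assumed to be Spacing_Mark.
SPACING_MARK_GC = [0x116b6, 0x11720, 0x11721, 0x11726].freeze
# Code points named as exceptions in the comment above.
EXCEPTIONS = [0x11720, 0x11721].freeze

# Remove the explicitly excluded code points.
def gcb_spacing_mark(gc_spacing_mark, exceptions)
  gc_spacing_mark - exceptions
end

# Collapse code points into inclusive [from, to] pairs, the shape used in name2ctype.h.
def to_ranges(code_points)
  code_points.sort
             .slice_when { |a, b| b != a + 1 }
             .map { |run| [run.first, run.last] }
end

kept = gcb_spacing_mark(SPACING_MARK_GC, EXCEPTIONS)
to_ranges(kept).each { |from, to| puts format("0x%05x, 0x%05x,", from, to) }
```

With the exceptions applied, the sample no longer yields a `0x11720, 0x11721` pair, which matches the line removed in the regenerated table.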

----------------------------------------
Bug #19007: Unicode tables differences from Unicode.org 14.0 data and removed properties since 13.0
https://bugs.ruby-lang.org/issues/19007#change-99513

* Author: nobu (Nobuyoshi Nakada)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* Target version: 3.2
* ruby -v: 3.2.0 6898984f1cd
* Backport: 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED
----------------------------------------

-- 
https://bugs.ruby-lang.org/

* [ruby-core:111215] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data
  2022-09-17  4:10 [ruby-core:109930] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data and removed properties since 13.0 nobu (Nobuyoshi Nakada)
  2022-09-18 16:01 ` [ruby-core:109950] " nobu (Nobuyoshi Nakada)
  2022-10-07 13:11 ` [ruby-core:110229] " nobu (Nobuyoshi Nakada)
@ 2022-12-06  0:08 ` duerst
  2022-12-12  4:31 ` [ruby-core:111257] " hsbt (Hiroshi SHIBATA)
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: duerst @ 2022-12-06  0:08 UTC (permalink / raw)
  To: ruby-core

Issue #19007 has been updated by duerst (Martin Dürst).


The wrong properties were fixed for version 14.0.0 with commit e31d645. This issue should stay open until we are sure what caused the wrong properties in the first place.

----------------------------------------
Bug #19007: Unicode tables differences from Unicode.org 14.0 data
https://bugs.ruby-lang.org/issues/19007#change-100505

* Author: nobu (Nobuyoshi Nakada)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* Target version: 3.2
* ruby -v: 3.2.0 6898984f1cd
* Backport: 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED
----------------------------------------

-- 
https://bugs.ruby-lang.org/

 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

* [ruby-core:111257] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data
  2022-09-17  4:10 [ruby-core:109930] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data and removed properties since 13.0 nobu (Nobuyoshi Nakada)
                   ` (2 preceding siblings ...)
  2022-12-06  0:08 ` [ruby-core:111215] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data duerst
@ 2022-12-12  4:31 ` hsbt (Hiroshi SHIBATA)
  2022-12-12  4:51 ` [ruby-core:111258] " duerst
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: hsbt (Hiroshi SHIBATA) @ 2022-12-12  4:31 UTC (permalink / raw)
  To: ruby-core

Issue #19007 has been updated by hsbt (Hiroshi SHIBATA).


@duerst Is there any action needed for Ruby 3.2 related to this? If there is nothing to do for Ruby 3.2, I'll remove this from the Ruby 3.2 milestone.

----------------------------------------
Bug #19007: Unicode tables differences from Unicode.org 14.0 data
https://bugs.ruby-lang.org/issues/19007#change-100561

* Author: nobu (Nobuyoshi Nakada)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* Target version: 3.2
* ruby -v: 3.2.0 6898984f1cd
* Backport: 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED
----------------------------------------

-- 
https://bugs.ruby-lang.org/

* [ruby-core:111258] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data
  2022-09-17  4:10 [ruby-core:109930] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data and removed properties since 13.0 nobu (Nobuyoshi Nakada)
                   ` (3 preceding siblings ...)
  2022-12-12  4:31 ` [ruby-core:111257] " hsbt (Hiroshi SHIBATA)
@ 2022-12-12  4:51 ` duerst
  2022-12-12  5:05 ` [ruby-core:111259] " hsbt (Hiroshi SHIBATA)
  2023-08-31  3:33 ` [ruby-core:114602] " jeremyevans0 (Jeremy Evans) via ruby-core
  6 siblings, 0 replies; 8+ messages in thread
From: duerst @ 2022-12-12  4:51 UTC (permalink / raw)
  To: ruby-core

Issue #19007 has been updated by duerst (Martin Dürst).


hsbt (Hiroshi SHIBATA) wrote in #note-6:
> @duerst Is there any action needed for Ruby 3.2 related to this? If there is nothing to do for Ruby 3.2, I'll remove this from the Ruby 3.2 milestone.

No, nothing should be needed for Ruby 3.2. I removed it myself.

BTW, can we add a "3.3" target version?

----------------------------------------
Bug #19007: Unicode tables differences from Unicode.org 14.0 data
https://bugs.ruby-lang.org/issues/19007#change-100566

* Author: nobu (Nobuyoshi Nakada)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: 3.2.0 6898984f1cd
* Backport: 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED
----------------------------------------

-- 
https://bugs.ruby-lang.org/

* [ruby-core:111259] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data
  2022-09-17  4:10 [ruby-core:109930] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data and removed properties since 13.0 nobu (Nobuyoshi Nakada)
                   ` (4 preceding siblings ...)
  2022-12-12  4:51 ` [ruby-core:111258] " duerst
@ 2022-12-12  5:05 ` hsbt (Hiroshi SHIBATA)
  2023-08-31  3:33 ` [ruby-core:114602] " jeremyevans0 (Jeremy Evans) via ruby-core
  6 siblings, 0 replies; 8+ messages in thread
From: hsbt (Hiroshi SHIBATA) @ 2022-12-12  5:05 UTC (permalink / raw)
  To: ruby-core

Issue #19007 has been updated by hsbt (Hiroshi SHIBATA).


Thanks. I created the [3.3 milestone](https://bugs.ruby-lang.org/versions/71).

----------------------------------------
Bug #19007: Unicode tables differences from Unicode.org 14.0 data
https://bugs.ruby-lang.org/issues/19007#change-100569

* Author: nobu (Nobuyoshi Nakada)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: 3.2.0 6898984f1cd
* Backport: 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED
----------------------------------------

I found the header in Unicode Emoji 14.0 data files had changed slightly (and again at 15.0), but `enc/unicode/case-folding.rb` didn't follow it.

Then I fixed it and rebuilt the headers under `enc/unicode/14.0.0`, `name2ctype.h` had diffences from the master, as bellow.



`CR_Lower`, `CR_Cased` and `CR_Other_Lowercase` just seem misses in the previous operation, and no problems.



But U+11720..U+11721 in `CR_Grapheme_Cluster_Break_SpacingMark` is absent in the original data of the Unicode.org.

According to @naruse's investigation, it was removed at the commit [Update to Unicode 14.0.0], while U+11720 is still SpacingMark in the latest https://www.unicode.org/reports/tr29/.



[Update to Unicode 14.0.0]: https://github.com/latex3/unicode-data/commit/5570040ac8a30e2c2ca4912d415ecaa0498fa23a#diff-1e957b94de10ea96d32a338c005b1f05788af458cf335fc92683bc297e53ed94L582



```diff
diff --git a/enc/unicode/14.0.0/name2ctype.h b/enc/unicode/14.0.0/name2ctype.h
index 99a3eeca190..f49e5cd7273 100644
--- a/enc/unicode/14.0.0/name2ctype.h
+++ b/enc/unicode/14.0.0/name2ctype.h
@@ -1565,7 +1565,7 @@ static const OnigCodePoint CR_Graph[] = {
 
 /* 'Lower': [[:Lower:]] */
 static const OnigCodePoint CR_Lower[] = {
-	664,
+	668,
 	0x0061, 0x007a,
 	0x00aa, 0x00aa,
 	0x00b5, 0x00b5,
@@ -2196,6 +2196,10 @@ static const OnigCodePoint CR_Lower[] = {
 	0x105a3, 0x105b1,
 	0x105b3, 0x105b9,
 	0x105bb, 0x105bc,
+	0x10780, 0x10780,
+	0x10783, 0x10785,
+	0x10787, 0x107b0,
+	0x107b2, 0x107ba,
 	0x10cc0, 0x10cf2,
 	0x118c0, 0x118df,
 	0x16e60, 0x16e7f,
@@ -12651,7 +12655,7 @@ static const OnigCodePoint CR_Math[] = {
 
 /* 'Cased': Derived Property */
 static const OnigCodePoint CR_Cased[] = {
-	151,
+	155,
 	0x0041, 0x005a,
 	0x0061, 0x007a,
 	0x00aa, 0x00aa,
@@ -12763,6 +12767,10 @@ static const OnigCodePoint CR_Cased[] = {
 	0x105a3, 0x105b1,
 	0x105b3, 0x105b9,
 	0x105bb, 0x105bc,
+	0x10780, 0x10780,
+	0x10783, 0x10785,
+	0x10787, 0x107b0,
+	0x107b2, 0x107ba,
 	0x10c80, 0x10cb2,
 	0x10cc0, 0x10cf2,
 	0x118a0, 0x118df,
@@ -22615,7 +22623,7 @@ static const OnigCodePoint CR_Extender[] = {
 
 /* 'Other_Lowercase': Binary Property */
 static const OnigCodePoint CR_Other_Lowercase[] = {
-	20,
+	24,
 	0x00aa, 0x00aa,
 	0x00ba, 0x00ba,
 	0x02b0, 0x02b8,
@@ -22636,6 +22644,10 @@ static const OnigCodePoint CR_Other_Lowercase[] = {
 	0xa770, 0xa770,
 	0xa7f8, 0xa7f9,
 	0xab5c, 0xab5f,
+	0x10780, 0x10780,
+	0x10783, 0x10785,
+	0x10787, 0x107b0,
+	0x107b2, 0x107ba,
 }; /* CR_Other_Lowercase */
 
 /* 'Other_Uppercase': Binary Property */
@@ -37049,7 +37061,7 @@ static const OnigCodePoint CR_Grapheme_Cluster_Break_Extend[] = {
 
 /* 'Grapheme_Cluster_Break_SpacingMark': Grapheme_Cluster_Break=SpacingMark */
 static const OnigCodePoint CR_Grapheme_Cluster_Break_SpacingMark[] = {
-	161,
+	160,
 	0x0903, 0x0903,
 	0x093b, 0x093b,
 	0x093e, 0x0940,
@@ -37183,7 +37195,6 @@ static const OnigCodePoint CR_Grapheme_Cluster_Break_SpacingMark[] = {
 	0x116ac, 0x116ac,
 	0x116ae, 0x116af,
 	0x116b6, 0x116b6,
-	0x11720, 0x11721,
 	0x11726, 0x11726,
 	0x1182c, 0x1182e,
 	0x11838, 0x11838,
```
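
For readers unfamiliar with the table layout in the diff: each `CR_*` array appears to start with the number of ranges, followed by that many (first, last) code point pairs. That matches the hunks above, where adding four ranges moves the leading value from 664 to 668, 151 to 155 and 20 to 24, and dropping one range moves it from 161 to 160. The following is a minimal Ruby sketch of that layout, not the code Ruby itself uses. The constant `OTHER_LOWERCASE_EXCERPT` and the helpers `split_table`, `consistent?` and `member?` are hypothetical names made up for illustration, and the excerpt holds only the last seven ranges of `CR_Other_Lowercase` shown in the diff.

```ruby
# Sketch only: models the layout seen in the diff, where the first element
# is the number of ranges and the rest are (first, last) code point pairs.
# Hypothetical excerpt (tail of CR_Other_Lowercase after the change),
# not the full table.
OTHER_LOWERCASE_EXCERPT = [
  7,
  0xa770, 0xa770,
  0xa7f8, 0xa7f9,
  0xab5c, 0xab5f,
  0x10780, 0x10780,
  0x10783, 0x10785,
  0x10787, 0x107b0,
  0x107b2, 0x107ba,
].freeze

# Split a table into its declared count and its list of [first, last] pairs.
def split_table(table)
  count, *flat = table
  [count, flat.each_slice(2).to_a]
end

# True when the declared count matches the number of pairs and the ranges
# are well-formed, sorted, and non-overlapping.
def consistent?(table)
  count, ranges = split_table(table)
  return false unless count == ranges.size
  return false unless ranges.all? { |r| r.size == 2 && r[0] <= r[1] }
  ranges.each_cons(2).all? { |a, b| a[1] < b[0] }
end

# Binary search (find-any mode) for a code point within the sorted ranges.
def member?(table, code_point)
  _, ranges = split_table(table)
  hit = ranges.bsearch do |first, last|
    if code_point < first
      -1 # target lies to the left of this range
    elsif code_point > last
      1  # target lies to the right of this range
    else
      0  # target is inside this range
    end
  end
  !hit.nil?
end
```

With this excerpt, `member?(OTHER_LOWERCASE_EXCERPT, 0x10784)` is true and `member?(OTHER_LOWERCASE_EXCERPT, 0x10786)` is false, since U+10786 falls in the gap between two ranges. A table whose leading count was not updated after adding ranges would fail `consistent?`.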







-- 

https://bugs.ruby-lang.org/

 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [ruby-core:114602] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data
  2022-09-17  4:10 [ruby-core:109930] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data and removed properties since 13.0 nobu (Nobuyoshi Nakada)
                   ` (5 preceding siblings ...)
  2022-12-12  5:05 ` [ruby-core:111259] " hsbt (Hiroshi SHIBATA)
@ 2023-08-31  3:33 ` jeremyevans0 (Jeremy Evans) via ruby-core
  6 siblings, 0 replies; 8+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2023-08-31  3:33 UTC (permalink / raw)
  To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)

Issue #19007 has been updated by jeremyevans0 (Jeremy Evans).





@duerst Did this issue reoccur in the update to Unicode 15?  If not, do you think this can be closed?
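
One possible way to check for this kind of discrepancy is to compare the ranges in a generated table against the ranges listed for the same property value in the upstream data file (for Grapheme_Cluster_Break, that would be GraphemeBreakProperty.txt). The sketch below assumes the usual UCD line layout, `XXXX..YYYY ; Value # comment`. The helper names `ranges_for`, `code_points` and `dropped` are hypothetical, and `SAMPLE` is made-up placeholder input, not actual Unicode data.

```ruby
# Sketch only: parse lines in the UCD property-file layout
# ("XXXX..YYYY ; Value # comment") and collect the ranges for one value.
# SAMPLE is made-up illustrative input, not actual Unicode data.
SAMPLE = <<~DATA
  # illustrative lines only
  0041..005A    ; ValueA # comment text
  0061          ; ValueB # comment text
  0100..0102    ; ValueA
DATA

# Return [[first, last], ...] for every line whose property value matches.
def ranges_for(text, value)
  text.each_line.filter_map do |line|
    body = line.sub(/#.*/, "").strip          # drop comments and whitespace
    next if body.empty?
    range, prop = body.split(";").map(&:strip)
    next unless prop == value
    first, last = range.split("..").map { |hex| hex.to_i(16) }
    [first, last || first]                    # a single code point is a 1-wide range
  end
end

# Expand ranges to individual code points so that differently merged
# ranges still compare equal.
def code_points(ranges)
  ranges.flat_map { |first, last| (first..last).to_a }
end

# Code points covered by old_ranges but not by new_ranges.
def dropped(old_ranges, new_ranges)
  code_points(old_ranges) - code_points(new_ranges)
end
```

Comparing at the code point level rather than range by range avoids false alarms when adjacent ranges are merged in one source but not in the other.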



----------------------------------------
Bug #19007: Unicode tables differences from Unicode.org 14.0 data
https://bugs.ruby-lang.org/issues/19007#change-104423

* Author: nobu (Nobuyoshi Nakada)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* ruby -v: 3.2.0 6898984f1cd
* Backport: 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED
----------------------------------------

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-08-31  3:33 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-17  4:10 [ruby-core:109930] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data and removed properties since 13.0 nobu (Nobuyoshi Nakada)
2022-09-18 16:01 ` [ruby-core:109950] " nobu (Nobuyoshi Nakada)
2022-10-07 13:11 ` [ruby-core:110229] " nobu (Nobuyoshi Nakada)
2022-12-06  0:08 ` [ruby-core:111215] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data duerst
2022-12-12  4:31 ` [ruby-core:111257] " hsbt (Hiroshi SHIBATA)
2022-12-12  4:51 ` [ruby-core:111258] " duerst
2022-12-12  5:05 ` [ruby-core:111259] " hsbt (Hiroshi SHIBATA)
2023-08-31  3:33 ` [ruby-core:114602] " jeremyevans0 (Jeremy Evans) via ruby-core