ruby-core@ruby-lang.org archive (unofficial mirror)
From: "jeremyevans0 (Jeremy Evans) via ruby-core" <ruby-core@ml.ruby-lang.org>
To: ruby-core@ml.ruby-lang.org
Cc: "jeremyevans0 (Jeremy Evans)" <noreply@ruby-lang.org>
Subject: [ruby-core:114602] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data
Date: Thu, 31 Aug 2023 03:33:07 +0000 (UTC)
Message-ID: <redmine.journal-104423.20230831033307.4@ruby-lang.org>
In-Reply-To: redmine.issue-19007.20220917041048.4@ruby-lang.org

Issue #19007 has been updated by jeremyevans0 (Jeremy Evans).





@duerst Did this issue reoccur in the update to Unicode 15?  If not, do you think this can be closed?



----------------------------------------

Bug #19007: Unicode tables differences from Unicode.org 14.0 data

https://bugs.ruby-lang.org/issues/19007#change-104423



* Author: nobu (Nobuyoshi Nakada)

* Status: Open

* Priority: Normal

* Assignee: duerst (Martin Dürst)

* ruby -v: 3.2.0 6898984f1cd

* Backport: 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED

----------------------------------------

I found that the header of the Unicode Emoji 14.0 data files had changed slightly (and again in 15.0), but `enc/unicode/case-folding.rb` had not been updated to follow it.

After I fixed it and rebuilt the headers under `enc/unicode/14.0.0`, `name2ctype.h` differed from master, as shown below.



The changes to `CR_Lower`, `CR_Cased` and `CR_Other_Lowercase` just seem to be omissions in the previous update and are not a problem.



But U+11720..U+11721, which master lists in `CR_Grapheme_Cluster_Break_SpacingMark`, is absent from the original Unicode.org data.

According to @naruse's investigation, it was removed in the commit [Update to Unicode 14.0.0], although U+11720 is still listed as SpacingMark in the latest https://www.unicode.org/reports/tr29/.



[Update to Unicode 14.0.0]: https://github.com/latex3/unicode-data/commit/5570040ac8a30e2c2ca4912d415ecaa0498fa23a#diff-1e957b94de10ea96d32a338c005b1f05788af458cf335fc92683bc297e53ed94L582
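
To make the discrepancy concrete, here is a minimal sketch of a membership check against the two versions of the table. The ranges are only an excerpt copied from the `CR_Grapheme_Cluster_Break_SpacingMark` hunk in the diff below; the constant names and the `in_ranges?` helper are hypothetical and written for illustration, not taken from Ruby or Onigmo.

```ruby
# Ranges copied from the master (pre-change) side of the
# CR_Grapheme_Cluster_Break_SpacingMark hunk; an excerpt, not the full table.
MASTER_EXCERPT = [
  [0x116ac, 0x116ac],
  [0x116ae, 0x116af],
  [0x116b6, 0x116b6],
  [0x11720, 0x11721],
  [0x11726, 0x11726],
  [0x1182c, 0x1182e],
  [0x11838, 0x11838],
].freeze

# The same excerpt as rebuilt from the Unicode.org 14.0 data: one range fewer.
REBUILT_EXCERPT = MASTER_EXCERPT.reject { |r| r == [0x11720, 0x11721] }.freeze

# Hypothetical helper: binary search over sorted, non-overlapping
# [first, last] pairs. Finds the first range ending at or after the
# code point, then checks that the range also starts at or before it.
def in_ranges?(ranges, code_point)
  hit = ranges.bsearch { |(_first, last)| code_point <= last }
  !hit.nil? && hit[0] <= code_point
end

[0x11720, 0x11721, 0x11726].each do |cp|
  printf("U+%04X master=%s rebuilt=%s\n", cp,
         in_ranges?(MASTER_EXCERPT, cp), in_ranges?(REBUILT_EXCERPT, cp))
end
```

With these excerpts, only U+11720 and U+11721 change their answer between the two tables; neighbouring entries such as U+11726 are unaffected.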



```diff
diff --git a/enc/unicode/14.0.0/name2ctype.h b/enc/unicode/14.0.0/name2ctype.h
index 99a3eeca190..f49e5cd7273 100644
--- a/enc/unicode/14.0.0/name2ctype.h
+++ b/enc/unicode/14.0.0/name2ctype.h
@@ -1565,7 +1565,7 @@ static const OnigCodePoint CR_Graph[] = {
 
 /* 'Lower': [[:Lower:]] */
 static const OnigCodePoint CR_Lower[] = {
-	664,
+	668,
 	0x0061, 0x007a,
 	0x00aa, 0x00aa,
 	0x00b5, 0x00b5,
@@ -2196,6 +2196,10 @@ static const OnigCodePoint CR_Lower[] = {
 	0x105a3, 0x105b1,
 	0x105b3, 0x105b9,
 	0x105bb, 0x105bc,
+	0x10780, 0x10780,
+	0x10783, 0x10785,
+	0x10787, 0x107b0,
+	0x107b2, 0x107ba,
 	0x10cc0, 0x10cf2,
 	0x118c0, 0x118df,
 	0x16e60, 0x16e7f,
@@ -12651,7 +12655,7 @@ static const OnigCodePoint CR_Math[] = {
 
 /* 'Cased': Derived Property */
 static const OnigCodePoint CR_Cased[] = {
-	151,
+	155,
 	0x0041, 0x005a,
 	0x0061, 0x007a,
 	0x00aa, 0x00aa,
@@ -12763,6 +12767,10 @@ static const OnigCodePoint CR_Cased[] = {
 	0x105a3, 0x105b1,
 	0x105b3, 0x105b9,
 	0x105bb, 0x105bc,
+	0x10780, 0x10780,
+	0x10783, 0x10785,
+	0x10787, 0x107b0,
+	0x107b2, 0x107ba,
 	0x10c80, 0x10cb2,
 	0x10cc0, 0x10cf2,
 	0x118a0, 0x118df,
@@ -22615,7 +22623,7 @@ static const OnigCodePoint CR_Extender[] = {
 
 /* 'Other_Lowercase': Binary Property */
 static const OnigCodePoint CR_Other_Lowercase[] = {
-	20,
+	24,
 	0x00aa, 0x00aa,
 	0x00ba, 0x00ba,
 	0x02b0, 0x02b8,
@@ -22636,6 +22644,10 @@ static const OnigCodePoint CR_Other_Lowercase[] = {
 	0xa770, 0xa770,
 	0xa7f8, 0xa7f9,
 	0xab5c, 0xab5f,
+	0x10780, 0x10780,
+	0x10783, 0x10785,
+	0x10787, 0x107b0,
+	0x107b2, 0x107ba,
 }; /* CR_Other_Lowercase */
 
 /* 'Other_Uppercase': Binary Property */
@@ -37049,7 +37061,7 @@ static const OnigCodePoint CR_Grapheme_Cluster_Break_Extend[] = {
 
 /* 'Grapheme_Cluster_Break_SpacingMark': Grapheme_Cluster_Break=SpacingMark */
 static const OnigCodePoint CR_Grapheme_Cluster_Break_SpacingMark[] = {
-	161,
+	160,
 	0x0903, 0x0903,
 	0x093b, 0x093b,
 	0x093e, 0x0940,
@@ -37183,7 +37195,6 @@ static const OnigCodePoint CR_Grapheme_Cluster_Break_SpacingMark[] = {
 	0x116ac, 0x116ac,
 	0x116ae, 0x116af,
 	0x116b6, 0x116b6,
-	0x11720, 0x11721,
 	0x11726, 0x11726,
 	0x1182c, 0x1182e,
 	0x11838, 0x11838,
```
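
In the diff above, the first element of each `CR_*` array changes in step with the number of range pairs added or removed (664 to 668, 151 to 155 and 20 to 24 for four added pairs; 161 to 160 for one removed pair), so it appears to be a count of ranges. As a rough sketch of how that invariant could be checked, the script below parses the body of such an initializer. The helper names are hypothetical, and the sample data is only the tail of `CR_Other_Lowercase` with the leading count adjusted to fit the excerpt, so it is not the real table.

```ruby
# Hypothetical helper: split an initializer body into its leading count
# and the [first, last] pairs that follow it.
def parse_code_range_table(body)
  count, *flat = body.scan(/0x\h+|\d+/).map { |tok| Integer(tok) }
  [count, flat.each_slice(2).to_a]
end

# A table is consistent when the count matches the number of pairs, every
# pair is ordered, and the pairs are sorted without overlapping.
def consistent_table?(count, ranges)
  count == ranges.size &&
    ranges.all? { |(first, last)| first <= last } &&
    ranges.each_cons(2).all? { |(_, prev_last), (next_first, _)| prev_last < next_first }
end

# Excerpt of CR_Other_Lowercase before the rebuild; the count 3 is
# adjusted for this excerpt (the real table starts with 20).
BEFORE = <<~TABLE
  3,
  0xa770, 0xa770,
  0xa7f8, 0xa7f9,
  0xab5c, 0xab5f,
TABLE

# The same excerpt after the rebuild, with the four added ranges
# (count adjusted to 7; the real table starts with 24).
AFTER = <<~TABLE
  7,
  0xa770, 0xa770,
  0xa7f8, 0xa7f9,
  0xab5c, 0xab5f,
  0x10780, 0x10780,
  0x10783, 0x10785,
  0x10787, 0x107b0,
  0x107b2, 0x107ba,
TABLE

before_count, before_ranges = parse_code_range_table(BEFORE)
after_count, after_ranges = parse_code_range_table(AFTER)

puts "before consistent: #{consistent_table?(before_count, before_ranges)}"
puts "after consistent:  #{consistent_table?(after_count, after_ranges)}"
puts "ranges added:      #{(after_ranges - before_ranges).size}"
```

A check like this only validates the shape of a generated table; it says nothing about whether the ranges agree with the Unicode.org data, which is the actual question in this issue.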







-- 

https://bugs.ruby-lang.org/

 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

