From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 06 Dec 2022 00:08:43 +0000 (UTC)
From: duerst
Mime-Version: 1.0
X-Redmine-Project: ruby-master
X-Redmine-Issue-Tracker: Bug
X-Redmine-Issue-Id: 19007
X-Redmine-Issue-Author: nobu
X-Redmine-Issue-Assignee: duerst
X-Redmine-Sender: duerst
X-Mailer: Redmine
X-Redmine-Host: bugs.ruby-lang.org
X-Redmine-Site: Ruby Issue Tracking System
To: ruby-core@ml.ruby-lang.org
Reply-To: Ruby developers
Subject:
[ruby-core:111215] [Ruby master Bug#19007] Unicode tables differences from Unicode.org 14.0 data
List-Id: Ruby developers

Issue #19007 has been updated by duerst (Martin Dürst).

The wrong properties were fixed for version 14.0.0 with commit e31d645. This issue should stay open until we are sure what caused the wrong properties in the first place.

----------------------------------------
Bug #19007: Unicode tables differences from Unicode.org 14.0 data
https://bugs.ruby-lang.org/issues/19007#change-100505

* Author: nobu (Nobuyoshi Nakada)
* Status: Open
* Priority: Normal
* Assignee: duerst (Martin Dürst)
* Target version: 3.2
* ruby -v: 3.2.0 6898984f1cd
* Backport: 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED
----------------------------------------
I found that the header in the Unicode Emoji 14.0 data files had changed slightly (and again in 15.0), but `enc/unicode/case-folding.rb` did not follow the change.
When I fixed that and rebuilt the headers under `enc/unicode/14.0.0`, `name2ctype.h` differed from master, as shown below.

The `CR_Lower`, `CR_Cased` and `CR_Other_Lowercase` entries seem to have simply been missed in the previous update and are not a problem.
But U+11720..U+11721 in `CR_Grapheme_Cluster_Break_SpacingMark` is absent from the original Unicode.org data.

According to @naruse's investigation, it was removed in the commit [Update to Unicode 14.0.0], while U+11720 is still SpacingMark in the latest https://www.unicode.org/reports/tr29/.

[Update to Unicode 14.0.0]: https://github.com/latex3/unicode-data/commit/5570040ac8a30e2c2ca4912d415ecaa0498fa23a#diff-1e957b94de10ea96d32a338c005b1f05788af458cf335fc92683bc297e53ed94L582

```diff
diff --git a/enc/unicode/14.0.0/name2ctype.h b/enc/unicode/14.0.0/name2ctype.h
index 99a3eeca190..f49e5cd7273 100644
--- a/enc/unicode/14.0.0/name2ctype.h
+++ b/enc/unicode/14.0.0/name2ctype.h
@@ -1565,7 +1565,7 @@ static const OnigCodePoint CR_Graph[] = {
 
 /* 'Lower': [[:Lower:]] */
 static const OnigCodePoint CR_Lower[] = {
-	664,
+	668,
 	0x0061, 0x007a,
 	0x00aa, 0x00aa,
 	0x00b5, 0x00b5,
@@ -2196,6 +2196,10 @@ static const OnigCodePoint CR_Lower[] = {
 	0x105a3, 0x105b1,
 	0x105b3, 0x105b9,
 	0x105bb, 0x105bc,
+	0x10780, 0x10780,
+	0x10783, 0x10785,
+	0x10787, 0x107b0,
+	0x107b2, 0x107ba,
 	0x10cc0, 0x10cf2,
 	0x118c0, 0x118df,
 	0x16e60, 0x16e7f,
@@ -12651,7 +12655,7 @@ static const OnigCodePoint CR_Math[] = {
 
 /* 'Cased': Derived Property */
 static const OnigCodePoint CR_Cased[] = {
-	151,
+	155,
 	0x0041, 0x005a,
 	0x0061, 0x007a,
 	0x00aa, 0x00aa,
@@ -12763,6 +12767,10 @@ static const OnigCodePoint CR_Cased[] = {
 	0x105a3, 0x105b1,
 	0x105b3, 0x105b9,
 	0x105bb, 0x105bc,
+	0x10780, 0x10780,
+	0x10783, 0x10785,
+	0x10787, 0x107b0,
+	0x107b2, 0x107ba,
 	0x10c80, 0x10cb2,
 	0x10cc0, 0x10cf2,
 	0x118a0, 0x118df,
@@ -22615,7 +22623,7 @@ static const OnigCodePoint CR_Extender[] = {
 
 /* 'Other_Lowercase': Binary Property */
 static const OnigCodePoint CR_Other_Lowercase[] = {
-	20,
+	24,
 	0x00aa, 0x00aa,
 	0x00ba, 0x00ba,
 	0x02b0, 0x02b8,
@@ -22636,6 +22644,10 @@ static const OnigCodePoint CR_Other_Lowercase[] = {
 	0xa770, 0xa770,
 	0xa7f8, 0xa7f9,
 	0xab5c, 0xab5f,
+	0x10780, 0x10780,
+	0x10783, 0x10785,
+	0x10787, 0x107b0,
+	0x107b2, 0x107ba,
 }; /* CR_Other_Lowercase */
 
 /* 'Other_Uppercase': Binary Property */
@@ -37049,7 +37061,7 @@ static const OnigCodePoint CR_Grapheme_Cluster_Break_Extend[] = {
 
 /* 'Grapheme_Cluster_Break_SpacingMark': Grapheme_Cluster_Break=SpacingMark */
 static const OnigCodePoint CR_Grapheme_Cluster_Break_SpacingMark[] = {
-	161,
+	160,
 	0x0903, 0x0903,
 	0x093b, 0x093b,
 	0x093e, 0x0940,
@@ -37183,7 +37195,6 @@ static const OnigCodePoint CR_Grapheme_Cluster_Break_SpacingMark[] = {
 	0x116ac, 0x116ac,
 	0x116ae, 0x116af,
 	0x116b6, 0x116b6,
-	0x11720, 0x11721,
 	0x11726, 0x11726,
 	0x1182c, 0x1182e,
 	0x11838, 0x11838,
```
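
A note on reading the diff above: the leading number in each `CR_*` array appears to be the number of (first, last) ranges that follow, which would explain why it grows by 4 where four ranges are added and drops by 1 where U+11720..U+11721 is removed. The Ruby sketch below only illustrates that layout. `SAMPLE_SPACING_MARK`, `well_formed?` and `in_code_range?` are hypothetical names, the table is a three-range excerpt rather than the real data, and the binary search is an assumption about how such a table can be queried, not the lookup code Ruby's regexp engine actually uses.

```ruby
# Hypothetical three-range excerpt in the layout seen in the diff:
# element 0 is the number of ranges, followed by (first, last) pairs
# sorted by code point.
SAMPLE_SPACING_MARK = [
  3,
  0x116b6, 0x116b6,
  0x11720, 0x11721,
  0x11726, 0x11726,
].freeze

# The count must match the number of pairs that follow it. A table edited
# by hand without updating the count would fail this check.
def well_formed?(table)
  table.length == 1 + 2 * table[0]
end

# Binary search over the sorted ranges for the one containing code_point.
def in_code_range?(table, code_point)
  low = 0
  high = table[0]
  while low < high
    mid = (low + high) / 2
    first = table[1 + 2 * mid]
    last = table[2 + 2 * mid]
    if code_point < first
      high = mid
    elsif code_point > last
      low = mid + 1
    else
      return true
    end
  end
  false
end
```

With this excerpt, `in_code_range?(SAMPLE_SPACING_MARK, 0x11720)` is true. Deleting the `0x11720, 0x11721` pair and lowering the count to 2, as the last hunk does for the real table, would make it false.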