From: Carlos O'Donell via Libc-alpha
Reply-To: Carlos O'Donell
To: libc-alpha@sourceware.org, fweimer@redhat.com
Subject: [PATCH v9 0/2] C.UTF-8
Date: Wed, 1 Sep 2021 22:05:44 -0400
Message-Id: <20210902020546.90935-1-carlos@redhat.com>

The following changes implement a minimally sized C.UTF-8.

First we implement the 'codepoint_collation' directive. Then we
implement C.UTF-8 with an LC_COLLATE that uses the
'codepoint_collation' directive to support using strcmp or wcscmp for
collation, i.e. code point sorting.

The final C.UTF-8 is only ~396KiB, with the largest part, ~346KiB, in
LC_CTYPE for all of Unicode.

v9 is rebased against the changes to remove ISO-8859-1 characters from
the bug-regex1.c test (69623c0db0a540f26ee537bae09446d3dcdf1f80).

v8 includes a NEWS entry for the updated C.UTF-8.

v7 fixed the regressions detected in Fedora Rawhide here:
https://bugzilla.redhat.com/show_bug.cgi?id=1986421, but does so by
generating identity tables for _NL_COLLATE_COLLSEQMB and
_NL_COLLATE_COLLSEQWC to provide mappings for ASCII characters. This
ensures that static applications using the new C.UTF-8 have a
functioning fnmatch, regcomp, and regexec for ASCII ranges. This raises
the size of LC_COLLATE from 92 to 1406 bytes. Valgrind reports no
errors using the tables with C.UTF-8 under tst-fnmatch.
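
As background for the strcmp claim above: byte-wise comparison of
well-formed UTF-8 gives the same order as comparing the decoded code
points, which is the property code point collation depends on. The
following is a minimal sketch of that property only, not code from
these patches; decode_utf8, cmp_codepoints, sign, and check_same_order
are hypothetical helper names, and the decoder assumes valid input.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode one code point from well-formed UTF-8 and advance *p.
   Hypothetical helper for illustration; no validation is done.  */
uint32_t
decode_utf8 (const unsigned char **p)
{
  const unsigned char *s = *p;
  uint32_t cp;
  int extra;

  if (s[0] < 0x80)
    { cp = s[0]; extra = 0; }
  else if ((s[0] & 0xE0) == 0xC0)
    { cp = s[0] & 0x1F; extra = 1; }
  else if ((s[0] & 0xF0) == 0xE0)
    { cp = s[0] & 0x0F; extra = 2; }
  else
    { cp = s[0] & 0x07; extra = 3; }

  /* Each continuation byte contributes its low six bits.  */
  for (int i = 1; i <= extra; i++)
    cp = (cp << 6) | (s[i] & 0x3F);

  *p = s + extra + 1;
  return cp;
}

/* Compare two UTF-8 strings code point by code point.
   Returns -1, 0, or 1.  */
int
cmp_codepoints (const char *a, const char *b)
{
  const unsigned char *pa = (const unsigned char *) a;
  const unsigned char *pb = (const unsigned char *) b;

  for (;;)
    {
      /* A string that ends first sorts first.  */
      if (*pa == 0 || *pb == 0)
        return (*pa != 0) - (*pb != 0);

      uint32_t ca = decode_utf8 (&pa);
      uint32_t cb = decode_utf8 (&pb);
      if (ca != cb)
        return ca < cb ? -1 : 1;
    }
}

/* Reduce any integer to -1, 0, or 1.  */
int
sign (int x)
{
  return (x > 0) - (x < 0);
}

/* Assert that strcmp and code point comparison agree on the order.  */
void
check_same_order (const char *a, const char *b)
{
  assert (sign (strcmp (a, b)) == cmp_codepoints (a, b));
}
```

For example, check_same_order ("z", "\xC3\xA9") compares U+007A with
U+00E9, and check_same_order ("\xF0\x9F\x98\x80", "\xE2\x82\xAC")
compares U+1F600 with U+20AC; both assertions should hold because
strcmp compares bytes as unsigned char.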

v7 also corrected collation sequence byte ordering on BE targets, and I
verified this by building crossed locales with localedef --big-endian
and confirming that a natively built s390x C.UTF-8 is the same as an
x86_64 C.UTF-8 built with --big-endian.

The fixes that were in v4 for nrules == 0 will be included in the next
release of glibc, and when those are proven correct they can be
backported to provide dynamic or newly compiled static applications
with the ability to use all code points in ranges.

Carlos O'Donell (2):
  Add 'codepoint_collation' support for LC_COLLATE.
  Add generic C.UTF-8 locale (Bug 17318)

 NEWS                             |  10 +-
 iconv/Makefile                   |  22 +-
 iconv/tst-iconv9.c               |  87 +++++
 locale/C-collate-seq.c           | 101 ++++++
 locale/C-collate.c               |  78 +----
 locale/programs/ld-collate.c     |  36 +-
 locale/programs/locfile-kw.gperf |   1 +
 locale/programs/locfile-kw.h     | 299 ++++++++---------
 locale/programs/locfile-token.h  |   1 +
 localedata/C.UTF-8.in            | 157 +++++++++
 localedata/Makefile              |   2 +
 localedata/SUPPORTED             |   1 +
 localedata/locales/C             | 194 +++++++++++
 posix/Makefile                   |  16 +-
 posix/bug-regex1.c               |  20 ++
 posix/bug-regex19.c              |  22 +-
 posix/bug-regex4.c               |  25 ++
 posix/bug-regex6.c               |   2 +-
 posix/transbug.c                 |  22 +-
 posix/tst-fnmatch.input          | 549 ++++++++++++++++++++++++++++++-
 posix/tst-regcomp-truncated.c    |   1 +
 posix/tst-regex.c                |  25 +-
 22 files changed, 1413 insertions(+), 258 deletions(-)
 create mode 100644 iconv/tst-iconv9.c
 create mode 100644 locale/C-collate-seq.c
 create mode 100644 localedata/C.UTF-8.in
 create mode 100644 localedata/locales/C

--
2.31.1