From: Carlos O'Donell via Libc-alpha
Reply-To: Carlos O'Donell
To: libc-alpha@sourceware.org
Subject: [PATCH v7 0/2] C.UTF-8
Date: Sun, 1 Aug 2021 01:54:03 -0400
Message-Id: <20210801055405.433547-1-carlos@redhat.com>

The following changes implement a minimally sized C.UTF-8.

First we implement the 'codepoint_collation' directive. Then we
implement C.UTF-8 with an LC_COLLATE that uses the
'codepoint_collation' directive to support using strcmp or wcscmp
for collation, i.e. code point sorting.

The final C.UTF-8 is only ~396KiB, of which the largest part,
~346KiB, is LC_CTYPE covering all of Unicode.

v7 fixes the regressions detected in Fedora Rawhide here:
https://bugzilla.redhat.com/show_bug.cgi?id=1986421, but does so by
generating identity tables for _NL_COLLATE_COLLSEQMB and
_NL_COLLATE_COLLSEQWC to provide mappings for ASCII characters. This
ensures that static applications using the new C.UTF-8 have a
functioning fnmatch, regcomp, and regexec for ASCII ranges. This
raises the size of LC_COLLATE from 92 to 1406 bytes. Valgrind reports
no errors using the tables with C.UTF-8 under tst-fnmatch.

v7 also corrects collation sequence byte ordering on BE targets. I
verified this by building cross-built locales with
localedef --big-endian and confirming that a natively built s390x
C.UTF-8 is the same as an x86_64 C.UTF-8 built with --big-endian.
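
As an illustration of why strcmp is sufficient for code point sorting:
UTF-8 is designed so that comparing two valid strings byte by byte (as
unsigned bytes, which is how strcmp compares) gives the same order as
comparing their code points. The sketch below is not taken from the
patches; compare_bytes and sort_by_code_point are hypothetical names
used only for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: order two C strings by raw byte value.  strcmp
   compares bytes as unsigned char, which for valid UTF-8 matches
   code point order.  */
static int
compare_bytes (const void *a, const void *b)
{
  const char *const *left = a;
  const char *const *right = b;
  return strcmp (*left, *right);
}

/* Hypothetical helper: sort COUNT UTF-8 strings in code point order
   without consulting any collation tables.  */
void
sort_by_code_point (const char **strings, size_t count)
{
  assert (strings != NULL);
  qsort (strings, count, sizeof strings[0], compare_bytes);
}
```

Sorting the UTF-8 encodings of U+20AC, "z", U+00E9 and "a" this way
yields "a", "z", U+00E9, U+20AC, i.e. ascending code point order.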

The fixes that were in v4 for nrules == 0 will be included in the
next release of glibc, and when those are proven correct they can be
backported to provide dynamic or newly compiled static applications
with the ability to use all code points in ranges.

Carlos O'Donell (2):
  Add 'codepoint_collation' support for LC_COLLATE.
  Add generic C.UTF-8 locale (Bug 17318)

 iconv/Makefile                   |  22 +-
 iconv/tst-iconv9.c               |  87 +++++
 locale/C-collate-seq.c           | 101 ++++++
 locale/C-collate.c               |  78 +----
 locale/programs/ld-collate.c     |  36 +-
 locale/programs/locfile-kw.gperf |   1 +
 locale/programs/locfile-kw.h     | 299 ++++++++---------
 locale/programs/locfile-token.h  |   1 +
 localedata/C.UTF-8.in            | 157 +++++++++
 localedata/Makefile              |   2 +
 localedata/SUPPORTED             |   1 +
 localedata/locales/C             | 194 +++++++++++
 posix/Makefile                   |  16 +-
 posix/bug-regex1.c               |  20 ++
 posix/bug-regex19.c              |  22 +-
 posix/bug-regex4.c               |  25 ++
 posix/bug-regex6.c               |   2 +-
 posix/transbug.c                 |  22 +-
 posix/tst-fnmatch.input          | 549 ++++++++++++++++++++++++++++++-
 posix/tst-regcomp-truncated.c    |   1 +
 posix/tst-regex.c                |  25 +-
 21 files changed, 1404 insertions(+), 257 deletions(-)
 create mode 100644 iconv/tst-iconv9.c
 create mode 100644 locale/C-collate-seq.c
 create mode 100644 localedata/C.UTF-8.in
 create mode 100644 localedata/locales/C

-- 
2.31.1