From: Carlos O'Donell via Libc-alpha
Reply-To: Carlos O'Donell
To: libc-alpha@sourceware.org
Subject: [PATCH v5 0/2] C.UTF-8
Date: Fri, 30 Jul 2021 00:32:16 -0400
Message-Id: <20210730043218.3701358-1-carlos@redhat.com>
List-Id: Libc-alpha mailing list

The following changes implement a minimally sized C.UTF-8.

First we implement the 'strcmp_collation' directive. Then we implement
C.UTF-8 with an LC_COLLATE that uses the 'strcmp_collation' directive
to support using strcmp for collation, i.e. code point sorting. The
final C.UTF-8 is only ~396KiB, with the largest part, ~346KiB, in
LC_CTYPE for all of Unicode.

v5 fixes the regressions detected in Fedora Rawhide
(https://bugzilla.redhat.com/show_bug.cgi?id=1986421), but does so by
generating identity tables for _NL_COLLATE_COLLSEQMB and
_NL_COLLATE_COLLSEQWC to provide mappings for ASCII characters. This
ensures that static applications using the new C.UTF-8 have a
functioning fnmatch, regcomp, and regexec for ASCII ranges. This raises
the size of LC_COLLATE from 92 to 1406 bytes. Valgrind reports no
errors using the tables with C.UTF-8 under tst-fnmatch.

The fixes that were in v4 for nrules == 0 will be included in the next
release of glibc, and when those are proven correct they can be
backported to provide dynamic or newly compiled static applications
with the ability to use all code points in ranges.
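As a rough illustration of why strcmp is enough for code point sorting,
and of what an identity collation-sequence table provides for ASCII
ranges, here is a minimal sketch. It is not the code from the patches:
codepoint_collate, collseq, init_identity_collseq, and in_range are
hypothetical names, and the flat 256-entry table is an assumption made
for illustration, not the actual layout of _NL_COLLATE_COLLSEQMB.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: collate two valid UTF-8 strings by code point.
   UTF-8 preserves code point order under unsigned byte comparison,
   and strcmp compares bytes as unsigned char, so no table is needed.  */
static int
codepoint_collate (const char *a, const char *b)
{
  return strcmp (a, b);
}

/* Hypothetical stand-in for an identity collation sequence table:
   byte value i maps to sequence position i.  */
static unsigned char collseq[256];

static void
init_identity_collseq (void)
{
  for (size_t i = 0; i < sizeof collseq; i++)
    collseq[i] = (unsigned char) i;
}

/* Range test in the style of a bracket expression such as [a-z].
   It compares sequence positions looked up in the table, so it only
   works once the table has been filled in.  */
static int
in_range (unsigned char lo, unsigned char hi, unsigned char c)
{
  assert (lo <= hi);
  return collseq[lo] <= collseq[c] && collseq[c] <= collseq[hi];
}
```

With the identity table initialized, in_range ('a', 'z', 'm') is true
and in_range ('a', 'z', 'A') is false, which is the ASCII range
behavior a matcher would expect. With an all-zero (uninitialized)
table every byte would appear to be in every range.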
Carlos O'Donell (2):
  Add 'strcmp_collation' support for LC_COLLATE.
  Add generic C.UTF-8 locale (Bug 17318)

 iconv/Makefile                   |  22 +-
 iconv/tst-iconv9.c               |  87 +++++
 locale/C-collate-seq.c           |  97 ++++++
 locale/C-collate.c               |  78 +----
 locale/programs/ld-collate.c     |  38 ++-
 locale/programs/locfile-kw.gperf |   1 +
 locale/programs/locfile-kw.h     | 306 ++++++++---------
 locale/programs/locfile-token.h  |   1 +
 localedata/C.UTF-8.in            | 157 +++++++++
 localedata/Makefile              |   2 +
 localedata/SUPPORTED             |   1 +
 localedata/locales/C             | 194 +++++++++++
 posix/Makefile                   |  16 +-
 posix/bug-regex1.c               |  20 ++
 posix/bug-regex19.c              |  22 +-
 posix/bug-regex4.c               |  25 ++
 posix/bug-regex6.c               |   2 +-
 posix/transbug.c                 |  22 +-
 posix/tst-fnmatch.input          | 549 ++++++++++++++++++++++++++++++-
 posix/tst-regcomp-truncated.c    |   1 +
 posix/tst-regex.c                |  25 +-
 21 files changed, 1406 insertions(+), 260 deletions(-)
 create mode 100644 iconv/tst-iconv9.c
 create mode 100644 locale/C-collate-seq.c
 create mode 100644 localedata/C.UTF-8.in
 create mode 100644 localedata/locales/C

-- 
2.31.1