Subject: Re: [PATCH v9 2/2] Add generic C.UTF-8 locale (Bug 17318)
From: Carlos O'Donell via Libc-alpha
Reply-To: Carlos O'Donell
To: Florian Weimer
Cc: libc-alpha@sourceware.org
Date: Sun, 5 Sep 2021 23:41:17 -0400
Organization: Red Hat
Message-ID: <837d13d5-fccd-0dfe-759f-910cf9a01f5d@redhat.com>
In-Reply-To: <87mtov81g2.fsf@oldenburg.str.redhat.com>
References: <20210902020546.90935-1-carlos@redhat.com> <20210902020546.90935-3-carlos@redhat.com> <87mtov81g2.fsf@oldenburg.str.redhat.com>
List-Id: Libc-alpha mailing list

On 9/2/21 11:03 AM, Florian Weimer wrote:
> * Carlos O'Donell:
>
>> diff --git a/NEWS b/NEWS
>> index 79c895e382..807105a596 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -9,7 +9,15 @@ Version 2.35
>>
>>  Major new features:
>>
>> -  [Add new features here]
>> +* Support for the C.UTF-8 locale has been added to glibc. The locale
>> +  supports full code-point sorting for all valid Unicode code points.
>> +  A limitation in the framework for fnmatch, regexec, and regcomp requires
>> +  a compromise to save space and only ASCII-based range expressions are
>> +  supported for now (see bug 28255). The full size of the locale is only
>> +  ~400KiB, with 346KiB coming from LC_CTYPE information for Unicode. This
>> +  locale harmonizes downstream C.UTF-8 already shipping in Gentoo, Debian,
>> +  Ubuntu, Fedora, CentOS Stream, and RHEL. The locale is not built into
>> +  glibc, and must be installed.
>
> I would say “various downstream distributions”. You left out SUSE's
> distributions, and they have C.UTF-8 as well:
>
>

I double checked that implementation and it's a copy of Mike Fabian's
original that we put into Fedora/RHEL so we are already harmonized with
that, which is good.

I've adjusted the text following your recommendation though, it's clearer.

>> --- /dev/null
>> +++ b/iconv/tst-iconv9.c
>
>> +  /* From ISO-8859-1 to ASCII. */
>
>> +  /* From UTF-8 to ASCII. */
>
> Missing spaces after “.”.

Fixed.

>> diff --git a/posix/transbug.c b/posix/transbug.c
>> index d0983b4d44..71632b7976 100644
>> --- a/posix/transbug.c
>> +++ b/posix/transbug.c
>> @@ -116,14 +116,30 @@ do_test (void)
>>    static const char lower[] = "[[:lower:]]+";
>>    static const char upper[] = "[[:upper:]]+";
>>    struct re_registers regs[4];
>> +  int result;
>>
>> +#define CHECK(exp) \
>> +  if (exp) { puts (#exp); result = 1; }
>> +
>> +  printf ("INFO: Checking C.\n");
>>    setlocale (LC_ALL, "C");
>>
>>    (void) re_set_syntax (RE_SYNTAX_GNU_AWK);
>>
>> -  int result;
>> -#define CHECK(exp) \
>> -  if (exp) { puts (#exp); result = 1; }
>> +  result = run_test (lower, regs);
>> +  result |= run_test (upper, &regs[2]);
>> +  if (! result)
>> +    {
>> +      CHECK (regs[0].start[0] != regs[2].start[0]);
>> +      CHECK (regs[0].end[0] != regs[2].end[0]);
>> +      CHECK (regs[1].start[0] != regs[3].start[0]);
>> +      CHECK (regs[1].end[0] != regs[3].end[0]);
>> +    }
>> +
>> +  printf ("INFO: Checking C.UTF-8.\n");
>> +  setlocale (LC_ALL, "C.UTF-8");
>> +
>> +  (void) re_set_syntax (RE_SYNTAX_GNU_AWK);
>>
>>    result = run_test (lower, regs);
>>    result |= run_test (upper, &regs[2]);
>
> The second-to-last line overwrites the previous test results.
>
> I think this can go in if you address those nits.

Fixed.
I'll use |= for all of them and init to zero.

I'll post a v10. Only 2/2 needs a Reviewed-by.

Thanks for your review.

-- 
Cheers,
Carlos.
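
For reference, the review nit about the overwritten result can be illustrated
with a minimal sketch. This is not the v10 change itself. It assumes plain
POSIX regcomp/regexec rather than the GNU interface (re_set_syntax, struct
re_registers) used by the test, and the names struct check, run_check, and
run_all are hypothetical, made up for this illustration. The point is only
that the result starts at zero and every later check is folded in with |=,
so a passing check can never hide an earlier failure.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical test case: PATTERN is expected to match TEXT.  */
struct check
{
  const char *pattern;
  const char *text;
};

/* Hypothetical stand-in for run_test: return 0 if PATTERN matches TEXT
   under the current locale, 1 otherwise.  */
int
run_check (const char *pattern, const char *text)
{
  regex_t re;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return 1;
  int failed = regexec (&re, text, 0, NULL, 0) != 0;
  regfree (&re);
  return failed;
}

/* Run all N checks under LOCALE and return the accumulated result:
   0 if every check passed, 1 if any failed or the locale is missing.  */
int
run_all (const char *locale, const struct check *checks, size_t n)
{
  int result = 0;	/* Initialized to zero before the first check.  */

  if (setlocale (LC_ALL, locale) == NULL)
    return 1;
  for (size_t i = 0; i < n; i++)
    /* Use |=, not =, so an earlier failure is never overwritten.  */
    result |= run_check (checks[i].pattern, checks[i].text);
  return result;
}
```

A caller would combine locales the same way, for example
result |= run_all ("C", checks, n); followed by
result |= run_all ("C.UTF-8", checks, n); with result initialized to zero.
The second call only succeeds where the C.UTF-8 locale is installed.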