Subject: Re: LC_COLLATE in the C locale
To: Bruno Haible
Cc: bug-gnulib@gnu.org
From: Paul Eggert
Organization: UCLA Computer Science Department
Date: Wed, 18 Dec 2019 08:27:02 -0800
Message-ID: <05cbd4fe-5480-5075-3413-b3ae613992a4@cs.ucla.edu>
In-Reply-To: <8726723.BRRUbPPWXg@omega>
References: <175192568.e2XXTFFdkW@omega> <09a43701-a998-5c26-ea9e-51c8c3446084@cs.ucla.edu> <8726723.BRRUbPPWXg@omega>

On 12/18/19 2:29 AM, Bruno Haible wrote:
> Hi Paul,
>
>> I do have a qualm in that coreutils (and I assume others) interpret !hard_locale
>> (LC_COLLATE) as meaning that the locale is unibyte and uses native byte
>> comparison.
> Isn't this warranted by section "LC_COLLATE Category in the POSIX Locale" in
> ?

I don't see where that section requires unibyte.

>> As I recall on some platforms (macOS maybe?), the C locale uses
>> UTF-8 so this interpretation isn't correct.
> UTF-8 has the nice property that byte-per-byte comparison and codepoint-per-
> codepoint comparison are equivalent.

True, so the code that assumes strcmp == strcoll should work.
But I think some code specifically assumes unibyte. Presumably that code should also check MB_CUR_MAX, which should be enough in practice (even though it doesn't suffice in theory).
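
A minimal sketch of the extra check, for illustration only. The helper names collate_is_c_or_posix and compare_keys are hypothetical, and the first one is just a rough stand-in for gnulib's hard_locale (LC_COLLATE) that inspects the locale name; it is not how gnulib implements it. Note that MB_CUR_MAX reflects LC_CTYPE rather than LC_COLLATE, which is one reason the check suffices only in practice.

```c
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for !hard_locale (LC_COLLATE): true if the
   current LC_COLLATE locale is named "C" or "POSIX".  */
static bool
collate_is_c_or_posix (void)
{
  const char *name = setlocale (LC_COLLATE, NULL);
  return name && (strcmp (name, "C") == 0 || strcmp (name, "POSIX") == 0);
}

/* True if callers may rely on byte order alone (strcmp == strcoll).  */
static bool
can_use_byte_comparison (void)
{
  return collate_is_c_or_posix ();
}

/* True if callers may additionally treat every byte as one character.
   MB_CUR_MAX depends on LC_CTYPE, so this is a practical check,
   not a theoretical guarantee.  */
static bool
can_assume_unibyte (void)
{
  return collate_is_c_or_posix () && MB_CUR_MAX == 1;
}

/* Hypothetical comparison helper: use strcmp when byte order is known
   to match collation order, and fall back to strcoll otherwise.  */
static int
compare_keys (const char *a, const char *b)
{
  return can_use_byte_comparison () ? strcmp (a, b) : strcoll (a, b);
}
```

Code that only needs strcmp == strcoll would call can_use_byte_comparison, while code that indexes or splits strings by byte would need can_assume_unibyte as well.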