From: Bruno Haible
To: Paul Eggert
Cc: Florian Weimer, libc-alpha@sourceware.org, bug-gnulib@gnu.org, Adhemerval Zanella
Subject: Re: dealing with non-ASCII-safe encodings
Date: Sat, 06 Mar 2021 21:17:57 +0100
Message-ID: <403769359.ktsG0tK06S@omega>
List-Id: Gnulib discussion list

Paul Eggert wrote:
> However, my worry is that good support for non-ASCII-safe encodings like
> Shift-JIS is hard to do, and that any such support we'd add to
> Gnulib/coreutils/etc. would not only increase maintenance costs and
> reduce runtime performance

Shift_JIS is not the only non-ASCII-safe encoding; GB18030, BIG5, BIG5-HKSCS,
and GBK are as well, and among these GB18030 is used as locale encoding in
China. Therefore it is important for programs to support these locale
encodings.
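
To illustrate what "non-ASCII-safe" means in practice, here is a minimal
sketch, not Gnulib code. In Shift_JIS the second byte of a two-byte character
can have the same value as an ASCII character; for example, the character
U+8868 is encoded as 0x95 0x5C, and 0x5C is also the ASCII backslash. A
byte-oriented search such as strchr can therefore report a match in the middle
of a character. The helper names sjis_is_lead and sjis_chr are hypothetical,
and the lead-byte ranges are hard-coded for Shift_JIS only; a real program
would rely on the locale's own multibyte functions instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Gnulib): nonzero if B can start a
   two-byte Shift_JIS character.  Assumes the usual lead-byte ranges
   0x81..0x9F and 0xE0..0xFC.  */
static int sjis_is_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical character-aware search: find the ASCII character C in the
   Shift_JIS string S, stepping over two-byte characters as a unit so that
   a trail byte is never mistaken for C.  Returns NULL if C is not found.  */
const char *sjis_chr(const char *s, char c)
{
    assert(s != NULL);
    while (*s != '\0') {
        if (sjis_is_lead((unsigned char) *s) && s[1] != '\0') {
            s += 2;             /* skip lead byte and trail byte together */
            continue;
        }
        if (*s == c)
            return s;           /* a genuine single-byte match */
        s++;
    }
    return NULL;
}
```

For the byte sequence 0x95 0x5C 0x5C 0x61 (U+8868, a backslash, then "a"),
strchr (text, '\\') returns the address of byte 1, which is the trail byte of
the first character, whereas sjis_chr (text, '\\') returns the address of
byte 2, the actual backslash.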

Gnulib has the support for it:
  - It has replacement functions that operate correctly with these locale
    encodings:
      strstr, c_strstr -> mbsstr
      strchr           -> mbschr
      strrchr          -> mbsrchr
      strspn           -> mbsspn
      strcspn          -> mbscspn
      strpbrk          -> mbspbrk
      strsep           -> mbssep
      strtok_r         -> mbstok_r
  - It has warnings (through _GL_WARN_ON_USE) for uses of the functions
    that are not OK for non-ASCII-safe encodings.
  - It has modules mbchar, mbiter, mbfile for iterating through the multibyte
    characters of a string or file, that work for all locale encodings.

Yes, it does reduce the performance to use these safer functions. I have shown
in the past, through coreutils patches, how to accommodate both a "fast path"
and a "safe path" in the same binary.

Bruno