Date: Mon, 17 Dec 2018 00:16:16 +0100
From: Ingo Schwarze
To: Bruno Haible
Cc: bug-gnulib@gnu.org
Subject: Re: OpenBSD locale system
Message-ID: <20181216231616.GM90457@athene.usta.de>
In-Reply-To: <45443746.orUszaOq50@omega>
References: <7256199.52JTtAUSKG@omega> <20181216180407.GF90457@athene.usta.de> <45443746.orUszaOq50@omega>
List-Id: Gnulib discussion list

Hi Bruno,

Bruno Haible wrote on Sun, Dec 16, 2018 at 08:01:04PM +0100:
> Ingo Schwarze wrote:

>> The OpenBSD C library intentionally doesn't implement any other
>> locale(1) categories except LC_CTYPE because many here regard the
>> other categories as overengineering and as detrimental to system
>> security

> I partially agree with this, regarding specific categories, such as
>
>   - LC_MONETARY: The main API function for this category, strfmon(),
>     is defined in such a way that, if implemented correctly, it
>     produces misleading results.
>
>   - LC_PAPER: Any software which wants to print something should
>     better ask the attached printer, rather than make assumptions
>     about the printer device based on the locale.
>
> However, locale categories such as LC_NUMERIC and LC_MESSAGES
> are useful when you assume that your software does have end-users
> that are not sysadmins.

Probably, you are right that LC_MESSAGES is not dangerous as long
as the C library doesn't actually attempt to translate system error
strings.  But LC_NUMERIC is certainly dangerous, it can break parsers
in subtle and surprising ways, whereas it doesn't really matter all
that much for end users in the first place.

But i guess discussing such considerations in detail would be
off-topic on this mailing list; i merely mentioned them to provide
minimal context regarding why certain decisions were made; so let's
focus on the consequences of the decisions, how gnulib should best
deal with them, and possibly identify parts that might need
revisiting, see below.

[...]
> Regarding OpenBSD, the uselocale support is useful for adding a checkmark
> to the checkbox "We support POSIX locale_t API", but is not useful, for
> example, to have a multithreaded web server honor the Accept-Language
> settings given by a browser user, other than by reimplementing all
> needed locale-dependent behaviour.

The "all needed" in this sentence sounds like it were a big deal;
but all that is needed here is storing one language code per user,
right?  Why would any programmer call a library API for that rather
than simply storing the selected language in a variable?

For comparison, the point of using {set,new,use}locale(3) with
LC_CTYPE is not merely remembering which character set the user
asked for, but also changing the behaviour of many *wc*(3) and
*mb*(3) library functions.  LC_MESSAGES, on the other hand, will
never have any effect on the behaviour of any library function in
the OpenBSD libc.

Also, in your web server example, you certainly don't want syslog
messages in languages requested by clients, so calling uselocale(3)
would merely be asking for trouble...  (Of course it's still possible
to write correct code, but harder.)

>> POSIX does not require that "de_DE.UTF-8" and
>> "fr_FR.UTF-8" must be different locales, or that they behave
>> differently from each other in any way.

> Here you need to distinguish
>   - locale-dependent behaviour defined by POSIX functions and
>   - locale-dependent behaviour defined by the application.
>
> In setlocale.c you made this distinction, as witnessed by the
> comment in
> https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/locale/setlocale.c?annotate=1.29
> lines 72..75.

Actually, originally i proposed to delete that behaviour for
consistency with {new,use}locale(3), but no consensus was reached
on that point - some argued: given that it is already implemented,
why not simply keep it in setlocale(3)?  It may be useful in some
situations.  So it was kept.

But i consider setlocale(3) the odd one out here rather than
{new,use}locale(3), because setlocale(3) supports storing a string
in the library that the application program could just as easily,
or arguably even more easily, store itself.

> Why not also for the per-thread locales? By implementing the FreeBSD
> querylocale API (the equivalent of setlocale(category,NULL) for locale_t
> objects), you would make it possible for applications to pull out
> German versus French messages, depending whether the per-thread locale
> is "de_DE.UTF-8" or "fr_FR.UTF-8".

So, you suggest to store this string in the library (where it has
no effect) even though POSIX does not define a method to retrieve
it again once it is stored?  I don't quite see yet how that might
be useful - not even for your webserver example, because the
webserver couldn't portably retrieve the string, or could it?

I hoped to understand better what your point is by looking at the
HEAD of the master branch of the git repo of GNU grep because you
mentioned a test failure there - but grepping the grep repo, i can't
even seem to find any usage of newlocale(3) or setlocale(3) in
there, so i'm not quite sure what you are actually trying to achieve.
Also, you mentioned "a test failure of test-localename", but
"grep -RF localename *" returns nothing for me in the grep repo
either...
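
[Illustration: the alternative argued for above, where the application
keeps the selected language in its own state instead of in a locale_t,
could be sketched roughly as follows.  This is only a minimal sketch
under stated assumptions: struct request, request_set_lang(),
greeting_for() and the tiny in-memory catalog are all hypothetical
names invented for this example, and the Accept-Language handling is
deliberately naive (first two letters only, no q-values).  It calls no
locale API at all, so it behaves the same whether or not the C library
implements LC_MESSAGES.]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-request state: the language lives in the
 * application, not in the C library. */
struct request {
	char lang[3];		/* two-letter code, NUL-terminated */
};

/* Hypothetical in-memory catalog; a real server would load files. */
struct message {
	const char *lang;
	const char *text;
};

static const struct message greetings[] = {
	{ "de", "Willkommen" },
	{ "fr", "Bienvenue" },
	{ "en", "Welcome" },	/* last entry doubles as the fallback */
};

/* Naive on purpose: keep only the first two letters of the header
 * value and fall back to "en" when the header is missing or short. */
static void
request_set_lang(struct request *req, const char *accept_language)
{
	assert(req != NULL);
	if (accept_language != NULL && strlen(accept_language) >= 2) {
		req->lang[0] = accept_language[0];
		req->lang[1] = accept_language[1];
	} else {
		req->lang[0] = 'e';
		req->lang[1] = 'n';
	}
	req->lang[2] = '\0';
}

/* Look up the greeting for this request; unknown languages get the
 * fallback entry. */
static const char *
greeting_for(const struct request *req)
{
	size_t i, n = sizeof(greetings) / sizeof(greetings[0]);

	for (i = 0; i < n; i++)
		if (strcmp(greetings[i].lang, req->lang) == 0)
			return greetings[i].text;
	return greetings[n - 1].text;
}
```

[Because the language is per-request data, each worker thread can serve
a different language without touching process-wide or per-thread locale
state, and messages written to syslog are unaffected by what a client
asked for.]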

I also tried running the build myself in order to reproduce your
issue on OpenBSD-current.  Here are the findings:

 1. ./bootstrap appears to run wget(1), unconditionally,
    which didn't exist on my system.  On OpenBSD, the program
    for that purpose is called ftp(1) - even for https:// URIs.

 2. make check yields only two failures:

XFAIL: equiv-classes
XFAIL: triple-backref
============================================================================
Testsuite summary for GNU grep 3.1.51-e767
============================================================================
# TOTAL: 109
# PASS:  80
# SKIP:  27
# XFAIL: 2
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================

============================================================================
Testsuite summary for GNU grep 3.1.51-e767
============================================================================
# TOTAL: 173
# PASS:  157
# SKIP:  16
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================

In particular, i see:

PASS: test-localename

Do you need more info?  If so, what exactly?
Better on or off list?

Of course, asking for querylocale(3) support - as opposed to
questioning the implementation of uselocale(3) - would be a rather
different matter.  But while i did hear from porters that the lack
of {new,use}locale(3) and the related interfaces did cause porting
trouble in the past, i didn't hear about trouble that would go away
by implementing querylocale(3) so far, and given that it isn't
standardized, that doesn't seem very surprising.  Of course, i may
simply have missed such trouble.

Anyway, in case what you really ask for is implementing querylocale(3),
then i no longer understand what is broken about {new,use}locale(3)
as long as querylocale(3) does not exist, so why exactly it needs
to be marked as non-working...

Yours,
  Ingo