From: Bruno Haible
To: ag
Cc: Tim Rühsen, bug-gnulib@gnu.org
Subject: Re: string types
Date: Fri, 27 Dec 2019 11:51:18 +0100
Message-ID: <2179574.G9OhZXe8sF@omega>
In-Reply-To: <20191226221225.GA800@HATZ>
References: <175192568.e2XXTFFdkW@omega> <20191226221225.GA800@HATZ>

Aga wrote:
> I do not know if
> you can (or if it is possible, how it can be done), extract with a way a specific
> a functionality from gnulib, with the absolute necessary code and only that.

gnulib-tool does this. With its --avoid option, the developer can even
customize their notion of "absolutely necessary".

> In a myriad of codebases a string type is implemented at least as:
>   size_t mem_size;
>   size_t num_bytes;
>   char *bytes;

This is actually a string-buffer type. A string type does not need two
size_t members.
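
To make the distinction concrete, here is a minimal sketch in C. All names (imm_string, string_buffer, sb_append, sb_freeze) are hypothetical and are not gnulib APIs. The buffer's capacity and length fields play the roles of mem_size and num_bytes in the quoted struct, while the string type carries a single length and a pointer to bytes that are never modified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical immutable string: one size field, contents are read-only. */
typedef struct {
    size_t length;
    const char *data;
} imm_string;

/* Hypothetical string buffer: needs both the allocated size and the
   number of bytes in use, because it grows while being filled. */
typedef struct {
    size_t capacity;   /* bytes allocated (mem_size in the quoted struct) */
    size_t length;     /* bytes in use (num_bytes in the quoted struct) */
    char *data;
} string_buffer;

/* Append n bytes to the buffer, growing it as needed.
   Returns 0 on success, -1 on overflow or allocation failure. */
static int sb_append(string_buffer *sb, const char *src, size_t n)
{
    assert(sb != NULL);
    if (n == 0)
        return 0;
    if (n > SIZE_MAX - sb->length)
        return -1;                      /* length would overflow size_t */
    size_t needed = sb->length + n;
    if (needed > sb->capacity) {
        size_t new_cap = sb->capacity ? sb->capacity : 16;
        while (new_cap < needed)
            new_cap = (new_cap > SIZE_MAX / 2) ? needed : new_cap * 2;
        char *p = realloc(sb->data, new_cap);
        if (p == NULL)
            return -1;
        sb->data = p;
        sb->capacity = new_cap;
    }
    memcpy(sb->data + sb->length, src, n);
    sb->length = needed;
    return 0;
}

/* Hand the accumulated bytes over as an immutable string and reset the
   buffer. The caller later releases the bytes with free((char *) s.data). */
static imm_string sb_freeze(string_buffer *sb)
{
    assert(sb != NULL);
    imm_string s = { sb->length, sb->data };
    sb->data = NULL;
    sb->capacity = 0;
    sb->length = 0;
    return s;
}
```

In this sketch, a caller starts from a zero-initialized string_buffer, calls sb_append repeatedly, and then calls sb_freeze once the contents are final. From that point on only the single-length imm_string is passed around.
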
Long-term experience has shown that using different types for string and
string-buffer is a win, because
  - a string can be put in a read-only virtual memory area, thus enforcing
    immutability (-> reducing multithread problems),
  - providing primitives for string allocation reduces the number of buffer
    overflow bugs that otherwise occur in this area. [1]

Unfortunately, the common string type in C is 'char *' with NUL termination,
and a different type is hard to establish
  - because developers already know how to use 'char *',
  - because existing functions like printf consume 'char *' strings,
  - because few programs have had the need to correctly handle strings with
    embedded NULs.

> An extended ustring (unicode|utf8) type can include information for its bytes with
> character semantics, like:
> (utf8 typedef'ed as signed int)
>   utf8 code; // the integer representation
>   int len; // the number of the needed bytes
>   int width; // the number of the occupied cells
>   char buf[5]; // and probably the character representation

Such a type would have a niche use, IMO, because
  - 99% of the processing would not need to access the width (screen columns) -
    so why spend CPU time and RAM to store it and keep it up-to-date?
  - 80% of the processing does not care about the Unicode code points either,
    and libraries like libunistring can do the Unicode-aware processing.

> But the programmer mind would be probably best
> if could concentrate to how to express the thought (with whatever meaning of what we
> are calling "thought") and follow this flow, or if could concentrate the energy to
> understand the intentions (while reading) of the code (instead of wasting self with
> the "details" of the code) and finally to the actual algorithm (usually conditions
> that can or can't be met).

That is the idea behind the container types (list, map) in gnulib. However, I
don't see how to reasonably transpose this principle to string types.
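
As a small illustration of the earlier point about embedded NULs, the following sketch contrasts a NUL-terminated 'char *' with a length-carrying string. The type counted_string and both helper functions are hypothetical names made up for this example, not part of gnulib or libunistring.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying string; the length is stored explicitly,
   so a NUL byte is ordinary content rather than a terminator. */
typedef struct {
    size_t length;
    const char *data;
} counted_string;

/* Count occurrences of byte c, examining all s.length bytes. */
static size_t counted_count_byte(counted_string s, char c)
{
    size_t n = 0;
    for (size_t i = 0; i < s.length; i++)
        if (s.data[i] == c)
            n++;
    return n;
}

/* Same count for a NUL-terminated string: strlen stops at the first
   NUL byte, so anything after an embedded NUL is never examined. */
static size_t cstr_count_byte(const char *s, char c)
{
    assert(s != NULL);
    size_t len = strlen(s);
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (s[i] == c)
            n++;
    return n;
}
```

For the four bytes 'a', NUL, 'b', 'a', the 'char *' view ends after the first byte, whereas the counted view covers all four.
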
Bruno

[1] https://lists.gnu.org/archive/html/bug-gnulib/2019-09/msg00031.html