Date: Sun, 29 Dec 2019 19:13:39 +0200
From: ag
To: Bruno Haible
Cc: Tim Rühsen, Paul Eggert, bug-gnulib@gnu.org
Subject: Re: string types
Message-ID: <20191229171339.GB789@HATZ>
In-Reply-To: <1726435.PWpjjHmTz1@omega>
References: <175192568.e2XXTFFdkW@omega> <2179574.G9OhZXe8sF@omega> <20191228131438.GA797@HATZ> <1726435.PWpjjHmTz1@omega>
List-Id: Gnulib discussion list

On Sun, Dec 29, at 10:19 Bruno Haible wrote:
> I agree with the goal. How to do it precisely, is an art however.

Ok, let's see what we have so far.

First, the Base: (easy)
that is the maximum size that malloc can be asked for, and that is
PTRDIFF_MAX. Here we also have to forget SIZE_MAX, as it is not
guaranteed that PTRDIFF_MAX equals SIZE_MAX.

Second, the Requirement (for the function's return value): (easy)
a signed type. There is agreement that the introduced functions should
return -1 on error, otherwise the interface gets complicated, and we do
not want complication. So ptrdiff_t is adequate, since ptrdiff_t is in
standard C and comes with stddef.h.

The rest:

Catching out-of-bounds conditions: (rather easy, and already implemented
in snprintf)
after the destination argument there will follow an argument with the
allocated destination size (from the stack or from the heap).
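To make the three points above concrete, here is a minimal sketch of what
such an interface could look like. The name str_copy and its exact
behavior are assumptions made up for illustration; nothing like it has
been agreed on in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an existing gnulib function.
   Copies src into dst, which has room for dst_size bytes (including
   the terminating NUL).  Sizes are ptrdiff_t, so they can never exceed
   PTRDIFF_MAX.  Returns the number of bytes copied (excluding the NUL),
   or -1 on a NULL argument, a non-positive size, or when src does not
   fit.  On failure dst is left untouched.  */
ptrdiff_t
str_copy (char *dst, ptrdiff_t dst_size, const char *src)
{
  if (dst == NULL || src == NULL || dst_size <= 0)
    return -1;

  size_t len = strlen (src);
  if (len >= (size_t) dst_size)
    return -1;                  /* would write out of bounds */

  memcpy (dst, src, len + 1);   /* copy including the NUL */
  return (ptrdiff_t) len;
}
```

The caller only has to test for -1, which is the whole point of the
signed return type.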
Now, snprintf uses size_t here, but (question) isn't this a
contradiction with the above? Probably not, but it's better to ask, to
de-confuse things (as clarity is a requirement: the semantics should be
understandable by mere humans).

Another concern: what if the destination is NULL? Should the internal
functions proceed by allocating a buffer of the requested size? What
will they do if the requested size is <= 0? There are precedents here,
like realpath(), which allocates a buffer that is up to the user to
free.

Also, internal variables declared static are considered harmful. But
sometimes it is desirable to keep some data in a private, protected
place, handy to work with without side effects. This is solved, however,
by the new im(mutable) module.

Catching truncation (first priority, maybe):

There is the choice to complicate the interface a bit and return more
values than -1, but this is rejected on the perfectly legitimate
assumption that humans are lazy, probably because they have been exposed
to try/catch (not bad if you ask me, but inappropriate for C).

The other thing that could be done is to return -1 and set errno
according to the error. But does such an error exist or not? If not,
ETRUNC should be introduced. Few programmers will take the risk of
making their program depend on something that is not standard, but
perhaps they will (doubtful at this stage, though).

The other thing that is left is to check the returned value.

Now, in snprintf(3) there are notes about this, and a method to
calculate truncation (misty, though):

    The functions snprintf() and vsnprintf() do not write more than
    size bytes (including the terminating null byte ('\0')). If the
    output was truncated due to this limit, then the return value is
    the number of characters (excluding the terminating null byte)
    which would have been written to the final string if enough space
    had been available. Thus, a return value of size or more means
    that the output was truncated. (See also below under NOTES.)
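The rule quoted above can be sketched as follows. The wrapper name
format_pair is hypothetical, and folding truncation into the same -1 as
every other error is only one of the choices discussed here, not a
settled design.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper: formats "name=value" into dst, which has room
   for dst_size bytes.  Returns the number of bytes written (excluding
   the NUL), or -1 on a bad argument, an output error, or truncation.
   On truncation dst holds the cut-off, NUL-terminated output.  */
ptrdiff_t
format_pair (char *dst, ptrdiff_t dst_size, const char *name, int value)
{
  if (dst == NULL || name == NULL || dst_size <= 0)
    return -1;

  int n = snprintf (dst, (size_t) dst_size, "%s=%d", name, value);
  if (n < 0)
    return -1;                  /* output error */
  if (n >= dst_size)
    return -1;                  /* "size or more": output was truncated */
  return n;
}
```

Note that the caller cannot tell truncation from the other failures this
way; that is exactly the gap an ETRUNC-like errno value would fill.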
"which would have been written?" why not always the bytes that had been written? Ok i got it after a break; still difficult to parse though and for what? We have to admit that this a programmer error. [Sh|H]e should know her strings. But we still want to help here. How? Three choises comes to mind. 1. Use a bit map flag argument to control the function behavior. But this adds verbosity but at the same time allows extensibility. Which conditions could be covered with that? Perhaps to return an error if destination is NULL and the function directed with the flag to return in this condition. Same with the source. Very convenient but still verbose as you have to learn another set of FLAGS. 2. Introduce wrappers. Actually wrappers maybe will be used either way. Or introduce a complete set of same functions, post-fixed with _un (to denote unsafety, if _s (not sure) means safe). 3. The programmer knows best. Based on that, either continue with the implementation like it is, or (where is appropriate) use a fourh argument for the requested bytes to be written. And sleep in full conscience, that you did your best you could. He should do the same. Now. What concerns me most is the userspace and all these functions that takes a variable number of arguments and a format string. I was fighting in my code to know with a reliable way the actual bytes produced by the sum of those arguments (as this can be really difficult to catch some of those described conditions above). You also said at one point that noone that does system programming will use (because of the overhead this set of functions). We could go further and say. Noone sane (sorry) would want to format big strings. Such functions are very prone to errors, but are easy to work with them. So what should do with them? There is a method to calculate the size beforehand (means before the declaration) and is given in the printf(3) Linux man page. 
    /* Determine required size */
    va_start(ap, fmt);
    size = vsnprintf(p, size, fmt, ap);
    va_end(ap);

So it parses the varargs twice. Plus, one compiler version (not 9*) gave
a warning with -fsanitize=undefined. Could we (users) do better? Can we
rely on something else? No, we can't. It's the only way. C strings are
like this.

So (just thinking out loud here), is it possible to introduce a function
that will handle this? Something like a growing buffer? But no. Usually
such usage is with stack-allocated strings in function scope, but maybe
with some kind of recursivity (if there is such a word), when such a
hypothetical function sees at some point that the actual bytes exceed
the allocated size. Sorry, as I said, it's about user convenience and
safety (at the same time), but as was proved with the immutable string,
perhaps there is a way with mmap (I do not really know).

Lastly, since we were talking about assumptions and such: it's better to
think of them as guarantees. And if we really want to go ahead, perhaps
in a way that makes no provision for obsolete systems, or that cares in
this interface only for systems that also provide these guarantees
(perhaps systems that were developed in this decade), then we can wrap
this whole interface in a big fat:

    #if WE_WANT_TO_MOVE_ON
    ...
    #else
    continue with what you have, but I cannot help you as I want and can
    #endif

Bruno,

Starting from zero always gives a breath of energy. So if we really want
to move on, then the best that can be done is to do it the way you want
to do it, without any obligations to anyone. It's always us and (for) us
in the end. The art here is that, through us, the world outside of us
will benefit at exactly the same time. This is called dada, I believe.

> Bruno

Thanks,
 Αγαθοκλής
(you know the funny thing: Iggy, Nushrat-Fateh, the unknown to you (but
our great) Manolis, and you are my beloved idols. What a life!)