From: Bruno Haible
To: bug-gnulib@gnu.org
Subject: supporting strings > 2 GB
Date: Sat, 12 Oct 2019 16:38:49 +0200
Message-ID: <15256545.f1uGFDiRv1@omega>

Hi Paul, Eric,

I'd like to get over the INT_MAX limit on string size for
  * the *printf family of functions,
  * the wcswidth, mbswidth functions,
like it has been done for large files and regular expressions.

The benefit I expect from that is:
  - Support of strings > 2 GB or 4 GB without making applications more
    complex.
  - Since such strings occur rarely, these corner cases of the code are
    most often untested. The change would eliminate these untested
    corners, thus eliminating a number of bugs.

How was it done for regular expressions?
  1) POSIX introduced a type 'regoff_t' that is to be used instead of
     'int', in the context of the regex APIs.
     https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/regex.h.html
  2) glibc introduced a preprocessor define _REGEX_LARGE_OFFSETS.
  3) gnulib defines _REGEX_LARGE_OFFSETS to 1.
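
To illustrate the regex precedent, here is a minimal sketch of caller code that keeps match offsets in 'regoff_t' rather than 'int', so it stays correct whichever width the library chose. The helper name first_match_length is made up for this example; regcomp, regexec, regfree and regmatch_t are the standard POSIX interfaces. The sketch deliberately does not define _REGEX_LARGE_OFFSETS itself, since that only makes sense together with a regex implementation built for it (as gnulib provides).

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return the length of the first match of PATTERN
   in TEXT, or -1 if the pattern does not compile or nothing matches.
   Offsets stay in regoff_t and are only widened to ptrdiff_t, never
   narrowed to int.  */
static ptrdiff_t
first_match_length (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m[1];
  ptrdiff_t len = -1;

  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;

  if (regexec (&re, text, 1, m, 0) == 0)
    {
      regoff_t start = m[0].rm_so;   /* byte offset of match start */
      regoff_t end = m[0].rm_eo;     /* byte offset just past the match */
      len = (ptrdiff_t) (end - start);
    }

  regfree (&re);
  return len;
}
```

Calling first_match_length ("b+", "aabbbcc") yields 3. The same source compiles unchanged whether regoff_t is 'int' or a wider signed type, which is the property the regoff_t approach is meant to give.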

In a similar vein, I think it could be done like this for *printf:
  1) Introduce a type 'printf_len_t' that is a signed type, either 'int'
     or 'ptrdiff_t'. And a constant PRINTF_LEN_MAX accordingly.
  2) For each *printf function that returns 'int', define a similar
     function *printfl, that returns 'printf_len_t'.
  3) Introduce %ln as a printf_len_t alternative to %n.
  4) If _PRINTF_LARGE is defined and non-zero, define xxxprintf as an
     alias of xxxprintfl (e.g. '#define xxxprintf xxxprintfl').
  5) Gnulib defines _PRINTF_LARGE to 1.

And similarly for wcswidth, with new function wclswidth and macro
_WCSWIDTH_LARGE.

This way, applications could switch from *printf to *printfl at their
own pace, without introducing uncaught overflow bugs at any moment.

Has this already been discussed in the Austin Group, or on the glibc
list?

Bruno
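
For illustration only, the *printf steps above could look roughly like the following sketch. Nothing here exists in glibc or gnulib: printf_len_t and PRINTF_LEN_MAX are the names proposed in this message, while sketch_snprintfl, sketch_snprintf and SKETCH_PRINTF_LARGE are placeholder names invented for the example (standing in for snprintfl, snprintf and _PRINTF_LARGE). The wrapper just forwards to vsnprintf, so it shows the shape of the API and not a way past INT_MAX.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Step 1: a signed length type and its maximum (proposed names, not an
   existing API).  Here the wide variant is chosen.  */
typedef ptrdiff_t printf_len_t;
#define PRINTF_LEN_MAX PTRDIFF_MAX

/* Step 2: an 'l' variant that returns printf_len_t instead of int.
   Assumption: this sketch forwards to vsnprintf and is therefore still
   limited to INT_MAX internally; a real implementation would need a
   formatting engine that counts in printf_len_t.  */
static printf_len_t
sketch_snprintfl (char *buf, size_t size, const char *format, ...)
{
  va_list args;
  int ret;

  va_start (args, format);
  ret = vsnprintf (buf, size, format, args);
  va_end (args);

  return ret < 0 ? -1 : (printf_len_t) ret;
}

/* Steps 4 and 5: when the opt-in macro is non-zero, the short name is an
   alias of the 'l' variant.  Gnulib would set the macro to 1.  */
#define SKETCH_PRINTF_LARGE 1
#if SKETCH_PRINTF_LARGE
# define sketch_snprintf sketch_snprintfl
#endif
```

A caller that writes 'printf_len_t n = sketch_snprintf (buf, sizeof buf, "%s-%d", "abc", 42);' gets n == 6 and never stores the result in an 'int', which is the migration path the proposal aims for. Step 3 (%ln) is not sketched, since it needs support inside the format parser itself.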