Date: Fri, 22 May 2020 21:16:16 -0400
From: Rich Felker
To: Eric Blake
Cc: Florian Weimer, glibc list, "libguestfs@redhat.com"
Subject: Re: RFC: *scanf vs. overflow
Message-ID: <20200523011614.GE1079@brightrain.aerifal.cx>

On Fri, May 22, 2020 at 03:59:14PM -0500, Eric Blake via Libc-alpha wrote:
> It has long been known that the C specification of *scanf() leaves
> behavior undefined for things like
> int i;
> sscanf("9999999999999999", "%i", &i);
>
> C11 7.21.6.2 P12
> "Matches an optionally signed integer, whose format is the same as
> expected for the subject sequence of the strtol function with the
> value 0 for the base argument."
> C11 7.21.6.2 P10
> "If this object does not have an appropriate type, or if the result
> of the conversion cannot be represented in the object, the behavior
> is undefined."
>
> as there is an overflow when consuming the input which matches the
> strtol subject sequence but does not fit in the width of an int. On
> my Linux system, 'man sscanf' mentions that ERANGE might be set in
> such a case, but neither C nor POSIX actually requires this
> behavior; other likely behaviors is storing the value mod 2^32 into
> i, or storing INT_MAX into i, or ...
>
> This is annoying - the only safe way to parse integers from
> untrustworthy sources, where overflow MUST be detected, is to
> manually open-code strtol() calls, which can get quite lengthy in
> comparison to the concise representations possible with *scanf.
>
> Would glibc be willing to consider a GNU extension to add an
> optional flag character between '%' and the various numeric
> conversion specifiers (both integral based on strto*l, and floating
> point based on strtod), where we could force *scanf to treat numeric
> overflow as a matching failure, rather than undefined behavior? Or
> even a second flag to request that printf stop consuming characters
> if the next character in input would cause overflow in the current
> specifier, leaving that character to instead be matched to the
> remainder of the format string?

Since conversion specifier forms outside the standard *also* have
undefined behavior, I see no advantage to defining that particular
undefined case vs just defining the result of the overflowing
conversion, unless you're worried the standard might later define a
conflicting definition. Neither way is amenable to configure detection
(without breaking cross compiling) without also adopting something
like my proposal on libc-coord:

https://www.openwall.com/lists/libc-coord/2020/04/22/1

BTW there is a portable only-somewhat-hideous way to do this with
sscanf: using assignment suppression combined with %n, then strtol,
etc. with the offsets produced by %n.

> Let's suppose for arguments that we add '^' as a request to force
> overflow to be a matching error. Then sscanf("9999999999999999",
> "%^i", &i) would be well-specified to return 0, rather than
> returning 1 with an unknown value assigned into i or any other
> behavior that other libc do with the undefined behavior when the ^
> is not present.
>
> And if glibc likes the idea of such an extension, and we see an
> uptick in applications actually using it, I'd also be happy to
> champion the addition of such an extension in POSIX (but the POSIX
> folks will definitely want to see existing practice first - both an
> implementation and applications that use that implementation). The
> libguestfs suite of programs is willing to be an early adopter, if
> glibc is willing to pursue adding such a safety valve.

I think it would be more useful to look for existing practice where
the UB blows up in horrible ways, and if there is none (if all
implementations behave somewhat reasonably) define the intersection of
their behaviors as standard and get rid of the UB here. A new feature
will not reliably be usable for decades in portable software, but new
documentation of existing universal practice would be immediately
usable.

Rich
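
A minimal sketch of the assignment-suppression-plus-%n workaround
mentioned above, not code from the original message. The input shape
"x=<integer>" and the helper name parse_x are hypothetical, chosen only
for illustration. %*i matches the integer without storing it, so there
is no object for the result to overflow; the two %n conversions record
where the field begins and ends, and strtol then reparses that span
with explicit range checks.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original message): parses
   "x=<integer>" and treats overflow as a failure.
   Returns 1 and stores the value on success, 0 otherwise. */
static int parse_x(const char *s, int *out)
{
    /* %n does not count toward sscanf's return value, so the offsets
       start at -1 and are checked directly afterwards. */
    int start = -1, end = -1;

    /* %*i matches an integer but stores nothing; the surrounding %n
       pair records the offsets of the matched field. */
    (void)sscanf(s, "x=%n%*i%n", &start, &end);
    if (end < 0)
        return 0;               /* literal or integer did not match */

    char *stop;
    errno = 0;
    long v = strtol(s + start, &stop, 0);   /* base 0, same as %i */

    if (stop != s + end)
        return 0;               /* sscanf and strtol disagree on extent */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;               /* does not fit in an int */

    *out = (int)v;
    return 1;
}
```

With this sketch, parse_x("x=42", &v) stores 42 and returns 1, while
parse_x("x=9999999999999999", &v) returns 0 and leaves v untouched.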