Date: Sat, 23 May 2020 12:11:43 -0400
From: Rich Felker
To: Paul Eggert
Cc: Florian Weimer, glibc list, Eric Blake, "libguestfs@redhat.com"
Subject: Re: RFC: *scanf vs. overflow
Message-ID: <20200523161143.GI1079@brightrain.aerifal.cx>
References: <20200523011614.GE1079@brightrain.aerifal.cx>

On Fri, May 22, 2020 at 08:06:34PM -0700, Paul Eggert wrote:
> On 5/22/20 6:16 PM, Rich Felker wrote:
> > A new feature
> > will not reliably be usable for decades in portable software, but new
> > documentation of existing universal practice would be immediately
> > usable.
>
> We could do both.
>
> Also, we could change glibc's behavior in a simpler way, by not adding a new
> flag; but if an integer is out of range, then scan only the initial prefix that
> fits, leaving the trailing digits for the rest of the format to scan. This also
> conforms to POSIX and is more likely to cause C programs to do the right thing
> (i.e., report a failure) than the current behavior does. And with luck perhaps
> we could eventually get POSIX to standardize this behavior.

I'm not really a fan of stopping on an initial prefix. While UB allows
anything, that's contrary to the abstract behavior defined for scanf
(matching fields syntactically then value conversion) and does not
admit easily sharing a backend with strto*. It's also even *more
likely* to break programs that don't expect the behavior than just
storing a wrapped or clamped value, since all the remaining fields
will misalign with the conversion specifier string.

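As an illustration of the match-then-convert model described above (an editorial sketch, not code from this thread), a field parser built on strtol consumes the whole run of digits first and only then reports whether the value fit. The helper name parse_long_field and its return codes are hypothetical; only strtol, errno, and ERANGE are standard C.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not from the thread): parse one decimal field.
 * Returns 0 on success, -1 if no digits were matched, -2 on overflow.
 * *end is set past the entire syntactic field even when the value is
 * out of range, so later fields stay aligned with the input. */
static int parse_long_field(const char *s, long *out, const char **end)
{
    char *stop;

    errno = 0;
    long v = strtol(s, &stop, 10);  /* skips leading whitespace */
    *end = stop;

    if (stop == s)
        return -1;                  /* matching failure: no digits */
    if (errno == ERANGE)
        return -2;                  /* matched, but value out of range */

    *out = v;
    return 0;
}
```

Called twice on the string "99999999999999999999999999 7", the first call reports overflow with *end left just after the long digit run, and the second call still parses 7. Under a prefix-stopping rule the second field would instead begin in the middle of the first number.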
FILE-based (as opposed to string-based) scanf forms inherently do not
admit any kind of "recovery" after mismatch without the caller seeking
backwards (requiring a seekable stream); many of them are lossy on
error. This is mainly a reason not to use them, not a justification
for a weird definition for one special case.

I'm pretty sure the real answer here is just "don't use *scanf for
that."

Rich