From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-Status: No, score=-3.9 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_PASS, SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id 2A9CD1F55B for ; Sat, 23 May 2020 16:21:16 +0000 (UTC) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 58450384A034; Sat, 23 May 2020 16:21:15 +0000 (GMT) Received: from brightrain.aerifal.cx (brightrain.aerifal.cx [216.12.86.13]) by sourceware.org (Postfix) with ESMTPS id 2EF373851C16 for ; Sat, 23 May 2020 16:21:13 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 2EF373851C16 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=libc.org Authentication-Results: sourceware.org; spf=none smtp.mailfrom=dalias@libc.org Date: Sat, 23 May 2020 12:21:12 -0400 From: Rich Felker To: "Richard W.M. Jones" Subject: Re: RFC: *scanf vs. overflow Message-ID: <20200523162112.GJ1079@brightrain.aerifal.cx> References: <20200523070654.GO3888@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200523070654.GO3888@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Florian Weimer , glibc list , Eric Blake , "libguestfs@redhat.com" Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha" On Sat, May 23, 2020 at 08:06:54AM +0100, Richard W.M. Jones via Libc-alpha wrote: > The context to this is that nbdkit uses sscanf to parse simple file > formats in various places, eg: > > https://github.com/libguestfs/nbdkit/blob/b23f4f53cf71326f1dba481f64f7f182c20fa3dc/plugins/data/format.c#L171-L172 > https://github.com/libguestfs/nbdkit/blob/b23f4f53cf71326f1dba481f64f7f182c20fa3dc/filters/ddrescue/ddrescue.c#L98 > > We can only do this safely where we can prove that overflow does not > matter. Being that it's specified as UB, it can never "not matter"; arbitrarily bad side effects are permitted. So it's only safe to use scanf where the input is *trusted not to contain overflowing values*. What would be really nice to fix here is getting the standard to specify that overflow has behavior like strto* or at least "unspecified value" rather than "undefined behavior" so that it's safe to let it overflow in cases where you don't care (e.g. you'll be consistency-checking the value afterwards anyway). > In other cases we've had to change sscanf uses to strto* etc > which is much more difficult to use correctly. Just look at how much > code is required to wrap strto* functions to use them safely: > > https://github.com/libguestfs/nbdkit/blob/b23f4f53cf71326f1dba481f64f7f182c20fa3dc/server/public.c#L113-L296 Really that much code is just for the sake of verbose error messages, and they're not even accurate. "errno!=0" does not mean "could not parse"; it can also be overflow of a perfectly parseable value. And if you've already caught errno!=0 then end==str is impossible (dead code). The last case, not hitting null, is also likely spurious/wrong; you usually *want* to pick up where strto* stopped, and the next thing the parser does will catch whether the characters after the number are valid there or not. strto* do have some annoying design flaws in error reporting, but they're not really that hard to use right, and much easier than scanf which just *lacks the reporting channels* for the kind of fine-grained error reporting you're insisting on doing here. FWIW this code would also be a lot cleaner as a static inline function rather than a many-line macro. Rich