From: Eric Blake via Libc-alpha
Reply-To: Eric Blake
To: glibc list
Cc: Florian Weimer, "libguestfs@redhat.com"
Subject: RFC: *scanf vs. overflow
Date: Fri, 22 May 2020 15:59:14 -0500
Organization: Red Hat, Inc.
List-Id: Libc-alpha mailing list

It has long been known that the C specification of *scanf() leaves
behavior undefined for things like

  int i;
  sscanf("9999999999999999", "%i", &i);

C11 7.21.6.2 P12 "Matches an optionally signed integer, whose format is
the same as expected for the subject sequence of the strtol function
with the value 0 for the base argument."

C11 7.21.6.2 P10 "If this object does not have an appropriate type, or
if the result of the conversion cannot be represented in the object, the
behavior is undefined."

as there is an overflow when consuming the input which matches the
strtol subject sequence but does not fit in the width of an int.
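
For comparison, here is a minimal sketch of what detecting that overflow
looks like when it is open-coded on top of strtol(). The helper name
parse_int() and its return convention are hypothetical, chosen only for
illustration; it assumes the whole string must be consumed and uses base
0 to mirror %i.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not part of any libc): parse all of s as an int,
   using base 0 as %i does.  Returns 0 on success, or -1 if no digits
   were found, trailing characters remain, or the value overflows.
   *out is written only on success. */
int parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;                  /* strtol() reports overflow only via errno */
    v = strtol(s, &end, 0);
    if (end == s || *end != '\0')
        return -1;              /* nothing parsed, or trailing characters */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -1;              /* does not fit in long, or does not fit in int */
    *out = (int)v;
    return 0;
}
```

With this helper, parse_int("9999999999999999", &i) returns -1 and leaves
i untouched, whereas the sscanf() call above has undefined behavior. Two
separate range checks are needed because on platforms where long is wider
than int the value fits in a long and strtol() does not set ERANGE.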

On my Linux system, 'man sscanf' mentions that ERANGE might be set in
such a case, but neither C nor POSIX actually requires this behavior;
other likely behaviors are storing the value mod 2^32 into i, or storing
INT_MAX into i, or ...

This is annoying - the only safe way to parse integers from
untrustworthy sources, where overflow MUST be detected, is to manually
open-code strtol() calls, which can get quite lengthy in comparison to
the concise representations possible with *scanf.

Would glibc be willing to consider a GNU extension to add an optional
flag character between '%' and the various numeric conversion specifiers
(both integral, based on strto*l, and floating point, based on strtod),
where we could force *scanf to treat numeric overflow as a matching
failure, rather than undefined behavior? Or even a second flag to
request that *scanf stop consuming characters if the next character in
the input would cause overflow in the current specifier, leaving that
character to instead be matched against the remainder of the format
string?

Let's suppose, for the sake of argument, that we add '^' as a request to
force overflow to be a matching error. Then
sscanf("9999999999999999", "%^i", &i) would be well-specified to return
0, rather than returning 1 with an unknown value assigned into i, or
whatever else other libc implementations do with the undefined behavior
when the ^ is not present.

And if glibc likes the idea of such an extension, and we see an uptick
in applications actually using it, I'd also be happy to champion the
addition of such an extension in POSIX (but the POSIX folks will
definitely want to see existing practice first - both an implementation
and applications that use that implementation). The libguestfs suite of
programs is willing to be an early adopter, if glibc is willing to
pursue adding such a safety valve.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org