From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS3215 2.6.0.0/16 X-Spam-Status: No, score=-3.9 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE,URIBL_CSS,URIBL_CSS_A shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from out1.vger.email (out1.vger.email [IPv6:2620:137:e000::1:20]) by dcvr.yhbt.net (Postfix) with ESMTP id 181E31F4D7 for ; Thu, 9 Jun 2022 18:28:00 +0000 (UTC) Authentication-Results: dcvr.yhbt.net; dkim=pass (1024-bit key; unprotected) header.d=pobox.com header.i=@pobox.com header.b="OTMgIixy"; dkim-atps=neutral Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345431AbiFIS1x (ORCPT ); Thu, 9 Jun 2022 14:27:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50868 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345415AbiFIS1t (ORCPT ); Thu, 9 Jun 2022 14:27:49 -0400 Received: from pb-smtp21.pobox.com (pb-smtp21.pobox.com [173.228.157.53]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0CF35FD36B for ; Thu, 9 Jun 2022 11:27:47 -0700 (PDT) Received: from pb-smtp21.pobox.com (unknown [127.0.0.1]) by pb-smtp21.pobox.com (Postfix) with ESMTP id 71BC71A974F; Thu, 9 Jun 2022 14:27:47 -0400 (EDT) (envelope-from junio@pobox.com) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=pobox.com; h=from:to:cc :subject:references:date:in-reply-to:message-id:mime-version :content-type; s=sasl; bh=sc3RTr09Jcu4jD2OzvD16IG7trKgd2CVoJVwxw /wewI=; b=OTMgIixycWo1I4O8Rd7iuBhtylUHnNo7nleN88Ycx+yYHVSN0c23Ye cBhZwLR2VOOPXqQohXK4FpQoDdixHab8xaDPbX89z7FsZnjI8ziZHp+YS1Al4ixC srzj+jXmmq5yNjfBk09Dldvc+XHfPNgyGU3YbFzv1GtZH/pFrkLZg= Received: from pb-smtp21.sea.icgroup.com (unknown [127.0.0.1]) by pb-smtp21.pobox.com (Postfix) with ESMTP id 69FFE1A974E; Thu, 9 Jun 2022 14:27:47 -0400 (EDT) (envelope-from junio@pobox.com) Received: from pobox.com (unknown [34.83.92.57]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by pb-smtp21.pobox.com (Postfix) with ESMTPSA id 094ED1A974D; Thu, 9 Jun 2022 14:27:44 -0400 (EDT) (envelope-from junio@pobox.com) From: Junio C Hamano To: Han Xin Cc: =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason , Git List , Jiang Xin , =?utf-8?Q?Ren=C3=A9?= Scharfe , Derrick Stolee , Philip Oakley , Neeraj Singh , Elijah Newren , Han Xin , Jiang Xin Subject: Re: [PATCH v13 1/7] unpack-objects: low memory footprint for get_data() in dry_run mode References: Date: Thu, 09 Jun 2022 11:27:42 -0700 In-Reply-To: (Han Xin's message of "Thu, 9 Jun 2022 12:10:09 +0800") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Pobox-Relay-ID: DA975568-E821-11EC-AE63-CBA7845BAAA9-77302942!pb-smtp21.pobox.com Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Han Xin writes: >> I am not sure if this is not loosening the error checking in the >> dry-run case, though. In the original code, we set the avail_out >> to the total expected size so >> >> (1) if the caller gives too small a size, git_inflate() would stop >> at stream.total_out with ret that is not STREAM_END nor OK, >> bypassing the "break", and we catch the error. >> >> (2) if the caller gives too large a size, git_inflate() would stop >> at the true size of inflated zstream, with STREAM_END and would >> not hit this "break", and we catch the error. >> >> With the new code, since we keep refreshing avail_out (see below), >> git_inflate() does not even learn how many bytes we are _expecting_ >> to see. Is the error checking in the loop, with the updated code, >> catch the mismatch between expected and actual size (plausibly >> caused by a corrupted zstream) the same way as we do in the >> non dry-run code path? >> > > Unlike the original implementation, if we get a corrupted zstream, we > won't break at Z_BUFFER_ERROR, maybe until we've read all the > input. I think it can still catch the mismatch between expected and > actual size when "fill(1)" gets an EOF, if it's not too late. That is only one half of the two possible failure cases, i.e. input is shorter than the expected size. If the caller specified size is smaller than what the stream inflates to, I do not see the new code to be limiting the .avail_out near the end of the iteration, which would be necessary to catch such an error, even if we are not interested in using the inflated contents, no?