From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-4.0 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id 70D7B1F62E for ; Tue, 15 Jan 2019 15:06:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728508AbfAOPG4 (ORCPT ); Tue, 15 Jan 2019 10:06:56 -0500 Received: from mail-qt1-f196.google.com ([209.85.160.196]:33624 "EHLO mail-qt1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726077AbfAOPG4 (ORCPT ); Tue, 15 Jan 2019 10:06:56 -0500 Received: by mail-qt1-f196.google.com with SMTP id l11so3323961qtp.0 for ; Tue, 15 Jan 2019 07:06:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding:content-language; bh=1TOgT8WNSEbRsehZ5a6zZgBXLOh4BZTGipxumg7BtmI=; b=aDGs1C5alcve/oc620ovKIsJnlIIYoiclK9u1/HWSfVYL7sEGehIRArYLkMboUfm0+ 6QPbNZD0v0eGpGFMUNgcIWwom5dGPJjUNXeKYSBtbNQ6zJcGKvAGmh5e0mfl8b/DLlc1 304DGpim5tb2y1Gds7OhyQRQgpwJX2egA96ogGYmSLwxcscO+pWWhQNK6YWC9zwDJn2Y 2ss//u5/c8YrPmfeqhua2gli+3n62T/FSLR7sd3a2/s8pTif5XpOedsRKV9vkpjnI1CB h6/2UBMNipAqDDdk/PgHsQXLoLgG+4GQwUIa4Xrx9W9RUsEuzWXw48BZKhPzl8BRueZr 36cA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=1TOgT8WNSEbRsehZ5a6zZgBXLOh4BZTGipxumg7BtmI=; b=Cz/zXALbVZXCnKxBofvpE2MeAfjV1ZR5t4o/kRjCSM9nBryzeVx0pNViS7msUWwL3D itD9kETFYBtrCibZzAfwBPvGO+MRZVDumPTqBhswiRxxweRsjJ+WSADcrEvCP8SgS+5I DWhKIO6LFezzG5vNvPDGPvPrg6jF3TvGQqvwCKtacAstKaYZzPkVVQM8t7iVQrn2XJxe ZB5oNjHBmkjx3rUmPy2vndw6k2thzNcDaeayTs2var50pFJ8REvcBA6H5IfuQtUtyxDp mAUTrFzxdPeu9uzz+TdsFzwsq5XD6j8201yFkYy/z7N7Vh4mmTUo1nhZEjvBRNv8JxSq GLgA== X-Gm-Message-State: AJcUukdTal/zxQQXYrNuq27GTejI9sfGgolXlBST3z6xOzsZRNFMtiWz iu0ZBdYGUBKqFfa3WrQVcoE= X-Google-Smtp-Source: ALg8bN7WSTEBSavf3Vm8eoC9KBdCRrQEkrDaePU7IYFxX0I+NE6Nd7nYS1dnnAehp0wQfS8h1IKH5w== X-Received: by 2002:a0c:9489:: with SMTP id j9mr3299051qvj.116.1547564814562; Tue, 15 Jan 2019 07:06:54 -0800 (PST) Received: from [10.0.1.15] ([98.122.160.2]) by smtp.gmail.com with ESMTPSA id l73sm59842719qkl.16.2019.01.15.07.06.53 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 15 Jan 2019 07:06:54 -0800 (PST) Subject: Re: [PATCH v4 3/6] pack-objects: add --sparse option To: Junio C Hamano , Derrick Stolee via GitGitGadget Cc: git@vger.kernel.org, peff@peff.net, avarab@gmail.com, jrnieder@gmail.com, Derrick Stolee References: From: Derrick Stolee Message-ID: <8c50e066-b132-cf6e-ed0f-ce843cf3a634@gmail.com> Date: Tue, 15 Jan 2019 10:06:53 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:65.0) Gecko/20100101 Thunderbird/65.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On 1/11/2019 5:30 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" writes: > >> From: Derrick Stolee >> >> Add a '--sparse' option flag to the pack-objects builtin. This >> allows the user to specify that they want to use the new logic >> for walking trees. This logic currently does not differ from the >> existing output, but will in a later change. >> >> Create a new test script, t5322-pack-objects-sparse.sh, to ensure >> the object list that is selected matches what we expect. When we >> update the logic to walk in a sparse fashion, the final test will >> be updated to show the extra objects that are added. > Somehow checking the "these are exactly what we expect" feels > brittle. In a history with three relevant commits A---B---C, > packing B..C could omit trees and blobs in C that appear in A but > not in B, but traditionally, because we stop traversal at B and do > not even look at A, we do not notice that such objects that need to > complete C's tree are already available in a repository that has B. > The approach of test in this patch feels similar to saying "we must > send these duplicates because we must stay dumb", even though with a > perfect knowledge about the history, e.g. bitmap, we would be able > to omit these objects in A that appear in C but not in B. My intention with this test was to show that the existing algorithm also sends "extra" objects in certain situations, which is later contrasted by a test where the new algorithm sends objects the old algorithm did not. Instead of "we must stay dumb" I wanted to say "we are not dumber (in this case)". I understand your brittle feeling. If the logic changed in either to be smarter, then we would need to change the test. In some sense, that does mean we have extra coverage of "this is how it works now" so we would understand the behavior change of that hypothetical future change. > I think we want to test test both "are we sending enough, even > though we might be wasting some bandwidth by not noticing the other > side already have some?" and "are we still avoiding from becoming > overly stupid, even though we may be cheating to save traversal cost > and risking to redundantly send some objects?" > > IOW, a set of tests to make sure that truly new objects are all sent > by seeing what is sent is a strict superset of what we expect. And > another set of tests to make sure that objects that are so obviously > (this criterion may be highly subjective) be present in the > receiving repository (e.g. the tree object of commit B and what it > contains when seinding B..C) are never sent, even when using an > algorithm that are tuned for traversal cost over bandwidth > consumption. To properly test "these objects are included," do we have an established pattern for saying "this file is a subset of that file"? Or, should I use something like `comm expected actual >common && test_cmp expected common`? > The code change in this step looks all trivially good, and it may > want to be squashed into the previous step to become a single patch. > Otherwise, [2/6] would be adding a dead code that nobody exercises > until this step. Will squash. Thanks, -Stolee