Date: Fri, 24 Apr 2020 01:32:04 -0400
From: Jeff King
To: Jonathan Tan
Cc: Junio C Hamano, lkundrak@v3.sk, jrnieder@gmail.com, git@vger.kernel.org
Subject: Re: Git 2.26 fetches many times more objects than it should, wasting gigabytes
Message-ID: <20200424053204.GD1648190@coredump.intra.peff.net>
In-Reply-To: <20200423213735.242662-1-jonathantanmy@google.com>

On Thu, Apr 23, 2020 at 02:37:35PM -0700, Jonathan Tan wrote:

> Thanks for the reproduction recipe (in [1]) and your analysis. I took a
> look, and it's because the check for in_vain is done differently. In v0:
>
>     if (got_continue && MAX_IN_VAIN < in_vain) {
>
> reflecting the documentation in pack-protocol.txt:
>
>     However, the 256 limit *only* turns on in the canonical client
>     implementation if we have received at least one "ACK %s continue"
>     during a prior round. This helps to ensure that at least one common
>     ancestor is found before we give up entirely.

Ah, thanks for that; I hadn't thought to look in that file for more
clues.

> When debugging, I noticed that in_vain was increasing far in excess of
> MAX_IN_VAIN, but because got_continue was false, the client did not give
> up.
>
> But in v2:
>
>     if (!haves_added || *in_vain >= MAX_IN_VAIN) {
>
> ("haves_added" is irrelevant to this discussion. It is another
> termination condition - when we have run out of "have"s to send.)
>
> So there is no check that "continue" was sent. We probably should change
> v2 to match v0. I can start writing a patch unless someone else would
> like to take a further look at it.

Yeah, this fills in the final pieces of the puzzle I was chasing in:

  https://lore.kernel.org/git/20200422193324.GB558336@coredump.intra.peff.net/

And the patch you suggest sounds like the best solution. I think there's
some room for discussion about what the optimal strategy is (e.g., v0
does send a lot more haves than v2 in this instance, and that wouldn't
always be helpful). But it makes sense to me to put v2 and v0 on the
same footing for now, especially given the regressions people have
mentioned, and then we can explore new options at our convenience (like
switching on the skipping negotiation algorithm).

-Peff
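
For reference, the two termination checks quoted above can be put side by side. This is only a rough sketch: the function names are hypothetical, the first two conditions are lifted from the quoted lines, and the third function is one guess at what "make v2 match v0" might look like, not the actual patch. MAX_IN_VAIN is assumed to be 256, per the "256 limit" in the pack-protocol.txt excerpt.

```c
#include <stdbool.h>

/* Assumed value, taken from the "256 limit" wording in pack-protocol.txt. */
#define MAX_IN_VAIN 256

/*
 * v0-style check (hypothetical wrapper around the quoted condition):
 * the in_vain limit only applies once at least one "ACK %s continue"
 * has been received.
 */
static bool v0_should_give_up(bool got_continue, int in_vain)
{
	return got_continue && MAX_IN_VAIN < in_vain;
}

/*
 * v2-style check as quoted: nothing guards the in_vain limit, so the
 * client can give up before any common commit has been found.
 */
static bool v2_should_give_up(bool haves_added, int in_vain)
{
	return !haves_added || in_vain >= MAX_IN_VAIN;
}

/*
 * One possible way to put v2 on the same footing as v0 (an assumption
 * for illustration only): apply the limit only after an ACK was seen.
 */
static bool v2_should_give_up_guarded(bool haves_added, bool got_continue,
				      int in_vain)
{
	return !haves_added || (got_continue && in_vain >= MAX_IN_VAIN);
}
```

With in_vain at 1000, haves still being added, and no ACK received yet, the v0 check keeps negotiating while the unguarded v2 check gives up; the guarded variant behaves like v0 in that case.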