git@vger.kernel.org mailing list mirror (one of many)
From: Jonathan Tan <jonathantanmy@google.com>
To: peff@peff.net
Cc: lkundrak@v3.sk, jrnieder@gmail.com, git@vger.kernel.org,
	Jonathan Tan <jonathantanmy@google.com>
Subject: Re: Git 2.26 fetches many times more objects than it should, wasting gigabytes
Date: Thu, 23 Apr 2020 14:37:35 -0700
Message-ID: <20200423213735.242662-1-jonathantanmy@google.com>
In-Reply-To: <20200422104000.GA551233@coredump.intra.peff.net>

> On Wed, Apr 22, 2020 at 06:30:11AM -0400, Jeff King wrote:
> 
> > So it really just seems like v2 does not try hard enough. I think the
> > culprit is the MAX_IN_VAIN setting. If I do this:
> > 
> > diff --git a/fetch-pack.c b/fetch-pack.c
> > index 1734a573b0..016a413d49 100644
> > --- a/fetch-pack.c
> > +++ b/fetch-pack.c
> > @@ -46,7 +46,7 @@ static struct strbuf fsck_msg_types = STRBUF_INIT;
> >   * After sending this many "have"s if we do not get any new ACK , we
> >   * give up traversing our history.
> >   */
> > -#define MAX_IN_VAIN 256
> > +#define MAX_IN_VAIN 20000
> >  
> >  static int multi_ack, use_sideband;
> >  /* Allow specifying sha1 if it is a ref tip. */
> > 
> > then I get that same 48k objects, 23MB fetch that v0 does.
> 
> I don't quite think that's the solution, though. Both old and new are
> supposed to be respecting MAX_IN_VAIN. So it's not at all clear to me
> why it restricts the number of haves we'll send in v2, but not in v0.
> 
> Maybe somebody more familiar with the negotiation code can comment
> further.

Thanks for the reproduction recipe (in [1]) and your analysis. I took a
look, and it's because the check for in_vain is done differently. In v0:

  if (got_continue && MAX_IN_VAIN < in_vain) {

reflecting the documentation in pack-protocol.txt:

  However, the 256 limit *only* turns on in the canonical client
  implementation if we have received at least one "ACK %s continue"
  during a prior round.  This helps to ensure that at least one common
  ancestor is found before we give up entirely.

(Note that both the code and the documentation call it "continue", but
the code also correctly handles multi_ack_detailed, which instructs the
server to send "ACK common" and "ACK ready" in lieu of "ACK continue".)

When debugging, I noticed that in_vain was increasing far in excess of
MAX_IN_VAIN, but because got_continue was false, the client did not give
up.

But in v2:

  if (!haves_added || *in_vain >= MAX_IN_VAIN) {

("haves_added" is irrelevant to this discussion. It is another
termination condition - when we have run out of "have"s to send.)

So v2 never checks whether any ACK has been received. We probably should change
v2 to match v0. I can start writing a patch unless someone else would
like to take a further look at it.
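
To make the difference concrete, here is a minimal standalone sketch of
the two termination checks quoted above, plus one possible way of gating
the v2 check on a prior ACK. The function names and the "seen_ack"
parameter are hypothetical and exist only for illustration; this is not
the actual fetch-pack.c code and not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IN_VAIN 256

/* v0-style check: the limit only applies once an ACK has been seen. */
static bool v0_gives_up(bool got_continue, int in_vain)
{
	return got_continue && MAX_IN_VAIN < in_vain;
}

/* v2-style check as quoted above: no dependency on a prior ACK. */
static bool v2_gives_up(bool haves_added, int in_vain)
{
	return !haves_added || in_vain >= MAX_IN_VAIN;
}

/*
 * Hypothetical variant: keep the "ran out of haves" condition, but
 * apply the in_vain limit only after some ACK has been seen.
 */
static bool v2_gives_up_gated(bool haves_added, bool seen_ack, int in_vain)
{
	return !haves_added || (seen_ack && in_vain >= MAX_IN_VAIN);
}

/* Walk through the situation described in this thread. */
static void check_examples(void)
{
	/* 1000 "have"s sent in vain, no ACK yet, more "have"s available. */
	assert(!v0_gives_up(false, 1000));             /* v0 keeps going */
	assert(v2_gives_up(true, 1000));               /* v2 stops early */
	assert(!v2_gives_up_gated(true, false, 1000)); /* gated: keeps going */

	/* Once an ACK has been seen, the limit applies everywhere. */
	assert(v0_gives_up(true, 1000));
	assert(v2_gives_up_gated(true, true, 1000));
}
```

Calling check_examples() from a main() exercises the cases. Note that
the two quoted conditions also differ in strictness: v0 compares with
"<" (gives up once in_vain exceeds 256), while v2 compares with ">="
(gives up once in_vain reaches 256).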

[1] https://lore.kernel.org/git/20200422095702.GA475060@coredump.intra.peff.net/
