From: Jonathan Nieder <jrnieder@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Lubomir Rintel <lkundrak@v3.sk>,
	git@vger.kernel.org, Jonathan Tan <jonathantanmy@google.com>
Subject: Re: Git 2.26 fetches many times more objects than it should, wasting gigabytes
Date: Wed, 22 Apr 2020 08:40:25 -0700
Message-ID: <20200422154025.GA91734@google.com>
In-Reply-To: <20200422095702.GA475060@coredump.intra.peff.net>

(+cc: Jonathan Tan)
Hi,

Jeff King wrote:

> Here's a recipe based on your fetches that shows the problem.
>
>   # start with an up-to-date regular clone of linus's tree; I had one
>   # lying around from https://github.com/torvalds/linux, but the source
>   # shouldn't matter
>   rm -rf repo.git
>   git clone --bare /path/to/linux repo.git
>   cd repo.git
>
>   git remote add next git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next
>   git remote add xo git@github.com:hackerspace/olpc-xo175-linux
>   git fetch --all
>
> The "next" fetch grabs about 30MB of objects. But the xo one downloads
> 1.5GB from 7.4M objects. That's using v2.26.2, so protocol 2.

Thanks!  I'll give it a try.

[...]
> There are a few data points we've been wanting to collect:
>
>  - does setting fetch.negotiationAlgorithm=skipping help? Yes, but not
>    as much as the v0 protocol does. It sends 84k objects, 33MB.

That's pretty good.  Tightening it further would require changing the
protocol to allow the client to say "please don't send me a pack; I want
to continue with negotiation".

>  - does the same fetch over v0 stateless-http have similar problems? No,
>    swapping out the second "remote add" for:
>
>      git remote add xo https://github.com/hackerspace/olpc-xo175-linux
>
>    results in the same 48k, 32MB fetch. The v0 conversation involved 10
>    POST requests. The v2 conversation only took 6 (and generates the
>    same big response as the ssh session, unsurprisingly).
>
> So it really does seem like something in v2 is not trying as hard to
> negotiate as v0 did, even when using stateless-http.

Interesting!  So it sounds like some refs that are not being fetched
are important here to the negotiation.  And the default (non-skipping)
negotiation algorithm is doing a bad job of exploring that part of
history.

Will take a closer look.
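
For anyone who wants to compare the configurations discussed above, here
is a minimal sketch. It assumes the protocol.version and
fetch.negotiationAlgorithm config keys as they exist in v2.26. To stay
runnable offline it fetches from a throwaway local repository (the
temporary paths and the "demo" identity are made up for illustration)
rather than the real xo remote, so it only shows how to pass the
settings and will not reproduce the object counts from this report.

```shell
#!/bin/sh
set -e

tmp=$(mktemp -d)

# Stand-in upstream containing a single empty commit (hypothetical data).
git init -q "$tmp/upstream"
git -C "$tmp/upstream" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

# Bare client repository with a remote named "xo", as in the recipe.
git init -q --bare "$tmp/repo.git"
git -C "$tmp/repo.git" remote add xo "$tmp/upstream"

# 1. Protocol v2 with the default negotiation algorithm.
git -C "$tmp/repo.git" -c protocol.version=2 fetch -q xo

# 2. Protocol v2 with the "skipping" negotiation algorithm.
git -C "$tmp/repo.git" -c protocol.version=2 \
    -c fetch.negotiationAlgorithm=skipping fetch -q xo

# 3. Protocol v0 with the default negotiation algorithm.
git -C "$tmp/repo.git" -c protocol.version=0 fetch -q xo
```

Against the real remote, skip the setup, start from a fresh copy of
repo.git for each variant, and drop -q so the "Receiving objects" line
shows how many objects were sent. Setting GIT_TRACE_PACKET=1 in the
environment additionally prints the have/ACK exchange for each round.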

I think this still suggests that we should go ahead and switch
negotiation algorithms, both because it avoids hitting this MAX_IN_VAIN
limit and because it reduces the number of rounds needed to make
progress.

I'd also be tempted to get rid of MAX_IN_VAIN.  If we're at the point
of giving up, shouldn't we error out instead of having the server send
a copy of the entirety of history?

Jonathan

