git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Nieder <jrnieder@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Lubomir Rintel <lkundrak@v3.sk>,
	git@vger.kernel.org, Jonathan Tan <jonathantanmy@google.com>
Subject: Re: Git 2.26 fetches many times more objects than it should, wasting gigabytes
Date: Wed, 22 Apr 2020 09:53:58 -0700	[thread overview]
Message-ID: <20200422165358.GB140314@google.com> (raw)
In-Reply-To: <20200422095702.GA475060@coredump.intra.peff.net>

Jeff King wrote:

> So it really does seem like something in v2 is not trying as hard to
> negotiate as v0 did, even when using stateless-http.
>
> I'm attaching for-each-ref output before and after the xo fetch. That
> should be sufficient to recreate the situation synthetically even once
> these repos have moved on.

Thanks again.  Looked a little closer and the advertised refs shouldn't
matter.  We just have two completely different code paths for
negotiation, one for v0 and one for v2.  In v0, we do

	fetch_negotiator_init(r, negotiator);
	mark_complete_and_common_ref(negotiator, args, &ref);
	filter_refs(argrs, &ref, sought, nr_sought);
	find_common(negotiator, args, &ref, sought, nr_sought);

find_common does

	count = 0;
	while ((oid = negotiator->next(negotiator)) {
		write "have %s\n", oid
		if (++count >= flush_at) {
			... flush ...
			flush_at = next_flush(args->stateless_rpc, count);
			...

In v2, the corresponding code is in add_haves:

	while ((oid = negotiator->next(negotiator))) {
		write "have %s\n", oid
		if (++haves_added >= *haves_to_send)
			break;
	}
	*in_vain += haves_added;
	if (!haves_added || *in_vain >= MAX_IN_VAIN)
		write "done"
	*haves_to_send = next_flush(1, *haves_to_send);

In other words, in v2 we reset the counter on each round whereas in v0
we keep a running total.  That would be expected to produce larger
negotiate requests in v2 than v0 (which looks like an unintentional
difference, but not the one producing this bug).

So much for flush_at handling.  in_vain seems like another area to
compare: in v2, the driving logic is in do_fetch_pack_v2:

	haves_to_send = INITIAL_FLUSH;
 negotiate_loop:
	while (!send_fetch_request(negotiator, fd, args, ref,
				   &common, &haves_to_send,
				   &in_vain, reader.use_sideband))
		switch (process_acks(negotiator, &reader, &common)) {
		case 1:
			in_vain = 0; /* renewed hope!
			continue;
		case 2:
			break negotiate_loop; /* time to move on */
		default:
			continue;
		}

When process_acks sees an ACK, it passes it on to the negotiator.
It wants to record that it received an ack to reset in_vain, but
it forgets to!  The function is initialized and read but never
written to.

So I'd expect the following to help:

diff --git i/fetch-pack.c w/fetch-pack.c
index 1734a573b01..a1d743e1f61 100644
--- i/fetch-pack.c
+++ w/fetch-pack.c
@@ -1287,6 +1287,8 @@ static int process_acks(struct fetch_negotiator *negotiator,
 			struct object_id oid;
 			if (!get_oid_hex(arg, &oid)) {
 				struct commit *commit;
+
+				received_ack = 1;
 				oidset_insert(common, &oid);
 				commit = lookup_commit(the_repository, &oid);
 				if (negotiator)

  parent reply	other threads:[~2020-04-22 16:54 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-22  8:42 Git 2.26 fetches many times more objects than it should, wasting gigabytes Lubomir Rintel
2020-04-22  9:57 ` Jeff King
2020-04-22 10:30   ` Jeff King
2020-04-22 10:40     ` Jeff King
2020-04-22 15:33       ` Junio C Hamano
2020-04-22 19:33         ` Jeff King
2020-04-23 21:37       ` Jonathan Tan
2020-04-23 21:54         ` Junio C Hamano
2020-04-24  5:32         ` Jeff King
2020-04-22 15:40   ` Jonathan Nieder
2020-04-22 19:36     ` Jeff King
2020-04-22 15:50   ` [PATCH] Revert "fetch: default to protocol version 2" Jonathan Nieder
2020-04-22 18:23     ` Junio C Hamano
2020-04-22 19:40     ` Jeff King
2020-04-22 19:47       ` Jeff King
2020-04-22 16:53   ` Jonathan Nieder [this message]
2020-04-22 17:32     ` Git 2.26 fetches many times more objects than it should, wasting gigabytes Junio C Hamano
2020-04-22 19:18     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200422165358.GB140314@google.com \
    --to=jrnieder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=lkundrak@v3.sk \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).