git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "Patrick Marlier (pamarlie)" <pamarlie@cisco.com>
Cc: Jonathan Tan <jonathantanmy@google.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: [PATCH] send-pack: use OBJECT_INFO_QUICK to check negative objects
Date: Wed, 27 Nov 2019 07:32:11 -0500	[thread overview]
Message-ID: <20191127123211.GG22221@sigill.intra.peff.net> (raw)
In-Reply-To: <CH2PR11MB429411CA1288526D21C7AF26CF4C0@CH2PR11MB4294.namprd11.prod.outlook.com>

On Tue, Nov 19, 2019 at 01:12:51PM +0000, Patrick Marlier (pamarlie) wrote:

> I am hitting a performance issue with "git push <remote> <refspec>".
> The local repository has only few refs and the remote repository has a
> lot of refs (1000+) with objects unknown to the local repository.
> 
> "git push" of only one refspec takes minutes to complete. A quick
> analysis shows that a lot of time is spent in the client side.
> A deeper analysis shows that the client receives the entire list of
> refs on the remote, then the client is checking in its local
> repository if the objects exist for all remote refs.

Right, this is expected. The client send-pack feeds the list of remote
objects (that it has) to pack-objects, which can then limit the size of
the packfile it sends based on what the other side has.

So the patch you showed (to skip refs that aren't part of the push)
would miss many opportunities for a smaller push. E.g., imagine I create
a new branch "topic" from the remote branch "master", and then try to
push it up. If I do not look at which objects are in "master", I'll end
up sending the whole history again, when I really just need to send the
differences between the two.

> Since the local repository has a only few refs, most of the objects
> are unknown.
>
> This issue is particularly amplified because the local repository is
> using many alternates. Indeed for each unknown object, git will try to
> find in all alternates too.

I think the patch below would help you.

-- >8 --
Subject: [PATCH] send-pack: use OBJECT_INFO_QUICK to check negative objects

When pushing, we feed pack-objects a list of both positive and negative
objects. The positive objects are what we want to send, and the negative
objects are what the other side told us they have, which we can use to
limit the size of the push.

Before passing along a negative object, send_pack() will make sure we
actually have it (since we only know about it because the remote
mentioned it, not because it's one of our refs). So it's expected that
some of these objects will be missing on the local side. But looking for
a missing object is more expensive than one that we have: it triggers
reprepare_packed_git() to handle a racy repack, plus it has to explore
every alternate's loose object tree (which can be slow if you have a lot
of them, or have a high-latency filesystem).

This isn't usually a big problem, since repositories you're pushing to
don't generally have a large number of refs that are unrelated to what
the client has. But there's no reason such a setup is wrong, and it
currently performs poorly.

We can fix this by using OBJECT_INFO_QUICK, which tells the lookup
code that we expect objects to be missing. Notably, it will not re-scan
the packs, and it will use the loose cache from 61c7711cfe (sha1-file:
use loose object cache for quick existence check, 2018-11-12).

The downside is that in the rare case that we race with a local repack,
we might fail to feed some objects to pack-objects, making the resulting
push larger. But we'd never produce an invalid or incorrect push, just a
less optimal one. That seems like a reasonable tradeoff, and we already
do similar things on the fetch side (e.g., when marking COMPLETE
commits).

Signed-off-by: Jeff King <peff@peff.net>
---
Interestingly, upload-pack does not use OBJECT_INFO_QUICK when it's
getting oids from the other side. But I think it could possibly benefit
in the same way. Nobody seems to have noticed. Perhaps it simply comes
up less, as servers would tend to have more objects than their clients?

 send-pack.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/send-pack.c b/send-pack.c
index 34c77cbb1a..16d6584439 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -41,7 +41,9 @@ int option_parse_push_signed(const struct option *opt,
 static void feed_object(const struct object_id *oid, FILE *fh, int negative)
 {
 	if (negative &&
-	    !has_object_file_with_flags(oid, OBJECT_INFO_SKIP_FETCH_OBJECT))
+	    !has_object_file_with_flags(oid,
+					OBJECT_INFO_SKIP_FETCH_OBJECT |
+					OBJECT_INFO_QUICK))
 		return;
 
 	if (negative)
-- 
2.24.0.716.g722aff65ed


  parent reply	other threads:[~2019-11-27 12:32 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-19 13:12 Push a ref to a remote with many refs Patrick Marlier (pamarlie)
2019-11-25 16:22 ` Patrick Marlier (pamarlie)
2019-11-27 12:32 ` Jeff King [this message]
2019-11-29  9:22   ` [PATCH] send-pack: use OBJECT_INFO_QUICK to check negative objects Patrick Marlier (pamarlie)
2019-11-30 17:08   ` Junio C Hamano
2019-12-03 23:20     ` Jeff King
2019-12-04 20:53       ` Jonathan Tan
2019-12-04 21:37         ` Junio C Hamano
2019-12-04  3:55   ` Jonathan Nieder
2019-12-04  4:05     ` Jeff King
2019-12-10 16:16       ` Patrick Marlier (pamarlie)
2019-12-10 20:27         ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191127123211.GG22221@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=pamarlie@cisco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).