git@vger.kernel.org mailing list mirror (one of many)
From: Jonathan Nieder <jrnieder@gmail.com>
To: Jeff King <peff@peff.net>
Cc: "Patrick Marlier (pamarlie)" <pamarlie@cisco.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [PATCH] send-pack: use OBJECT_INFO_QUICK to check negative objects
Date: Tue, 3 Dec 2019 19:55:22 -0800
Message-ID: <20191204035522.GC214771@google.com>
In-Reply-To: <20191127123211.GG22221@sigill.intra.peff.net>

Hi,

Jeff King wrote:

> Subject: [PATCH] send-pack: use OBJECT_INFO_QUICK to check negative objects
>
> When pushing, we feed pack-objects a list of both positive and negative
> objects. The positive objects are what we want to send, and the negative
> objects are what the other side told us they have, which we can use to
> limit the size of the push.
>
> Before passing along a negative object, send_pack() will make sure we
> actually have it (since we only know about it because the remote
> mentioned it, not because it's one of our refs). So it's expected that
> some of these objects will be missing on the local side. But looking for
> a missing object is more expensive than one that we have: it triggers
> reprepare_packed_git() to handle a racy repack, plus it has to explore
> every alternate's loose object tree (which can be slow if you have a lot
> of them, or have a high-latency filesystem).

Nice analysis.

> This isn't usually a big problem, since repositories you're pushing to
> don't generally have a large number of refs that are unrelated to what
> the client has. But there's no reason such a setup is wrong, and it
> currently performs poorly.
>
> We can fix this by using OBJECT_INFO_QUICK, which tells the lookup
> code that we expect objects to be missing. Notably, it will not re-scan
> the packs, and it will use the loose cache from 61c7711cfe (sha1-file:
> use loose object cache for quick existence check, 2018-11-12).

On first reading, I wondered how this would interact with alternates,
since you had mentioned that checking alternates can be expensive.  Does
this go too far in that direction by treating an object as missing
whenever it's not in the local object store, even if it's available from
an alternate?

But I believe that was a misreading.  With this patch, we still pay
the cost of checking alternates for the missing object.  The savings
come instead from not having to *double* check.

Am I understanding correctly?
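
To make that reading concrete, here is a toy model of the lookup as I
understand it.  This is only a sketch, not git's actual code: every
name in it (toy_odb, toy_stats, toy_find, toy_has_object,
TOY_INFO_QUICK) is hypothetical, and the "rescan" is just a counter
standing in for reprepare_packed_git().  It assumes the quick flag
skips only the second pass, not the alternates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for OBJECT_INFO_QUICK; not git's real value. */
#define TOY_INFO_QUICK 1u

/* One object directory: the local store or an alternate. */
struct toy_odb {
	const char *const *ids;	/* object names present here */
	size_t nr;
};

/* Counters so a caller can observe how much work a lookup did. */
struct toy_stats {
	int odb_checks;		/* object directories searched */
	int pack_rescans;	/* simulated reprepare_packed_git() calls */
};

/* Search a single object directory, counting the visit. */
static int toy_odb_has(const struct toy_odb *odb, const char *id,
		       struct toy_stats *st)
{
	size_t i;

	st->odb_checks++;
	for (i = 0; i < odb->nr; i++)
		if (!strcmp(odb->ids[i], id))
			return 1;
	return 0;
}

/* One pass over the local store and then every alternate. */
static int toy_find(const struct toy_odb *odbs, size_t nr_odbs,
		    const char *id, struct toy_stats *st)
{
	size_t i;

	for (i = 0; i < nr_odbs; i++)
		if (toy_odb_has(&odbs[i], id, st))
			return 1;
	return 0;
}

/* Existence check; the quick flag only suppresses the second pass. */
static int toy_has_object(const struct toy_odb *odbs, size_t nr_odbs,
			  const char *id, unsigned flags,
			  struct toy_stats *st)
{
	assert(odbs && id && st);

	if (toy_find(odbs, nr_odbs, id, st))
		return 1;
	if (flags & TOY_INFO_QUICK)
		return 0;	/* caller expects misses: trust the first pass */

	/* Slow path: assume a racy repack, rescan, and look everywhere again. */
	st->pack_rescans++;
	return toy_find(odbs, nr_odbs, id, st);
}
```

With a local store plus one alternate, a miss with TOY_INFO_QUICK set
searches both directories once and never rescans.  The same miss
without the flag rescans once and searches both directories a second
time.  An object that lives only in the alternate is found either way.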

[...]
> Signed-off-by: Jeff King <peff@peff.net>

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

> ---
> Interestingly, upload-pack does not use OBJECT_INFO_QUICK when it's
> getting oids from the other side. But I think it could possibly benefit
> in the same way. Nobody seems to have noticed. Perhaps it simply comes
> up less, as servers would tend to have more objects than their clients?

I like to imagine that servers are also more likely to keep a tidy set
of packs and to avoid alternates.  But using OBJECT_INFO_QUICK when
checking the fetcher's "have"s does sound like a sensible change to me.

Thanks,
Jonathan
