git@vger.kernel.org mailing list mirror (one of many)
From: Richard Oliver <roliver@roku.com>
To: Jeff King <peff@peff.net>, Taylor Blau <me@ttaylorr.com>
Cc: Derrick Stolee <derrickstolee@github.com>,
	git@vger.kernel.org, jonathantanmy@google.com
Subject: Re: [PATCH] mktree: learn about promised objects
Date: Wed, 15 Jun 2022 18:40:46 +0100
Message-ID: <ad9b5ec9-14fd-cd66-be87-2fe1eb24296a@roku.com>
In-Reply-To: <YqlZb3Ycc71+dPu4@coredump.intra.peff.net>

On 15/06/2022 05:00, Jeff King wrote:
> On Tue, Jun 14, 2022 at 08:35:16PM -0400, Taylor Blau wrote:
> 
>> On Tue, Jun 14, 2022 at 01:27:18PM -0400, Derrick Stolee wrote:
>>>> Did you have any other sort of performance test in mind? The remotes we
>>>> typically deal with are geographically far away and deal with a high volume
>>>> of traffic so we're keen to move behaviour to the client where it makes sense
>>>> to do so.
>>>
>>> I guess I wonder how large your promisor pack-files are in this test,
>>> since your implementation depends on for_each_packed_object(), which
>>> should be really inefficient if you're actually dealing with a large
>>> partial clone.
>>
>> I had the same thought. Storing data available in the promisor packs
>> into an oid_map is going to be expensive if there are many such objects.
>>
>> Is there a reason that we can't introduce a variant of
>> find_kept_pack_entry() that deals only with .promisor packs and look
>> these things up as-needed?
> 
> It's much worse than that. The promisor mechanism is fundamentally very
> inefficient in runtime, optimizing instead for size. Imagine I have a
> partial clone and I retrieve tree X, which points to a blob Y that I
> don't get. I have X in a promisor pack, and asking about it is
> efficient. But if I want to know about Y, I have no data structure
> mentioning Y except the tree X itself. So to enumerate all of the
> promisor edges, I have to walk all of the trees in the promisor pack.
> 
> So it is not just lookup, but actual tree walking that is expensive. The
> flip side is that you don't have to store a complete separate list of
> the promised objects. Whether that's a win depends on how many local
> objects you have, versus how many are promised.
> 
> But it would be possible to cache the promisor list to make the tradeoff
> separately. E.g., do the walk over the promisor trees once (perhaps at
> pack creation time), and store a sorted list of fixed-length (oid, type)
> records that could be binary searched. You could even put it in the
> .promisor file. :)
> 
> -Peff
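To illustrate why enumerating promisor edges costs a full tree walk, here is a minimal C sketch. It assumes toy stand-ins that are not git's data structures: object IDs are small integers (`oid_t`), and a promisor pack is modelled as an array of hypothetical `struct tree` records that list the IDs their entries point to. The only place a promised blob is named is inside the tree that references it, so every tree has to be visited. The set built here is the pack's trees plus everything they reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: a real object ID is a 20- or 32-byte hash. */
typedef uint32_t oid_t;

/* Hypothetical model of a tree object stored in a promisor pack. */
struct tree {
	oid_t oid;            /* the tree itself, present locally */
	const oid_t *entries; /* objects it names, possibly not present */
	size_t nr_entries;
};

/* Linear membership test; a real implementation would use a hash set. */
static int contains(const oid_t *set, size_t nr, oid_t oid)
{
	for (size_t i = 0; i < nr; i++)
		if (set[i] == oid)
			return 1;
	return 0;
}

/* Append oid if it is not already in the set; returns the new size. */
static size_t add_unique(oid_t *set, size_t nr, size_t cap, oid_t oid)
{
	if (contains(set, nr, oid))
		return nr;
	assert(nr < cap); /* caller must provide enough room */
	set[nr] = oid;
	return nr + 1;
}

/*
 * Walk every tree in the (modelled) promisor pack, collecting the tree
 * and every object it points to. A blob Y referenced only by tree X
 * can be discovered only by reading X's entries.
 */
static size_t collect_promisor_set(const struct tree *trees, size_t nr_trees,
				   oid_t *out, size_t cap)
{
	size_t nr = 0;

	for (size_t i = 0; i < nr_trees; i++) {
		nr = add_unique(out, nr, cap, trees[i].oid);
		for (size_t j = 0; j < trees[i].nr_entries; j++)
			nr = add_unique(out, nr, cap, trees[i].entries[j]);
	}
	return nr;
}
```

The cost is proportional to the total number of tree entries in the promisor packs, and it is paid on every lookup unless the result is cached somewhere.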

I like the idea of caching the promisor list at pack creation time;
I'll start work on a patch set that implements this.
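The cache Peff suggests (a sorted list of fixed-length (oid, type) records that can be binary searched) might look roughly like the following C sketch. The record layout, the `promised_rec` name, and the type constants are illustrative assumptions only. No on-disk format has been decided here, and whether it would live in the .promisor file is left open. The records are sorted once, for example at pack creation time, after which each lookup is O(log n) using the standard `qsort` and `bsearch`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define OID_RAWSZ 20 /* SHA-1; a SHA-256 repository would need 32 */

/* Illustrative type codes for this sketch. */
enum obj_type { T_COMMIT = 1, T_TREE = 2, T_BLOB = 3, T_TAG = 4 };

/* One fixed-length record of a hypothetical promised-object cache. */
struct promised_rec {
	unsigned char oid[OID_RAWSZ];
	uint8_t type;
};

/* Order records by raw object ID bytes. */
static int rec_cmp(const void *a, const void *b)
{
	const struct promised_rec *ra = a, *rb = b;

	return memcmp(ra->oid, rb->oid, OID_RAWSZ);
}

/* Done once, e.g. when the promisor pack is written. */
static void promised_cache_sort(struct promised_rec *recs, size_t nr)
{
	qsort(recs, nr, sizeof(*recs), rec_cmp);
}

/* Binary search; returns the object type, or 0 if oid is not promised. */
static int promised_cache_lookup(const struct promised_rec *recs, size_t nr,
				 const unsigned char *oid)
{
	struct promised_rec key;
	const struct promised_rec *hit;

	memcpy(key.oid, oid, OID_RAWSZ); /* only the oid is compared */
	hit = bsearch(&key, recs, nr, sizeof(*recs), rec_cmp);
	return hit ? hit->type : 0;
}
```

Because each record has a fixed size, the same lookup would work directly on a memory-mapped file without parsing it first.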

Meanwhile, is it worth considering a '--promised-as-missing' option
(or a config option) for invocations such as 'mktree --missing' that
prevents promised objects from being faulted in? Currently, the only
reliable way I've found to stop 'mktree --missing' from faulting in
promised objects is to remove the remote. Such an option could either
set the global variable 'fetch_if_missing' to '0' or ensure that
'OBJECT_INFO_SKIP_FETCH_OBJECT' is passed where appropriate.
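The two mechanisms mentioned above (a process-wide switch like 'fetch_if_missing', or a per-call flag like 'OBJECT_INFO_SKIP_FETCH_OBJECT') can be contrasted with a small self-contained C model. Everything here is a hypothetical stand-in: `model_fetch_if_missing`, `MODEL_SKIP_FETCH`, `model_object_lookup` and `promised_as_missing_flags` only imitate the behaviour being discussed and are not git's API. A "fetch" is simulated by incrementing a counter.

```c
#include <assert.h>

/* Hypothetical per-call flag, modelled on OBJECT_INFO_SKIP_FETCH_OBJECT. */
#define MODEL_SKIP_FETCH (1u << 0)

/* Hypothetical process-wide switch, modelled on fetch_if_missing. */
static int model_fetch_if_missing = 1;

/* Counts simulated lazy fetches from the promisor remote. */
static int model_fetches;

enum model_result { MODEL_FOUND, MODEL_MISSING, MODEL_FETCHED };

/* Decide what a lookup does for an object that may be promised. */
static enum model_result model_object_lookup(int present_locally, int promised,
					     unsigned flags)
{
	if (present_locally)
		return MODEL_FOUND;
	if (promised && model_fetch_if_missing && !(flags & MODEL_SKIP_FETCH)) {
		model_fetches++; /* a real lookup would contact the remote */
		return MODEL_FETCHED;
	}
	return MODEL_MISSING; /* e.g. what 'mktree --missing' would accept */
}

/* What a hypothetical --promised-as-missing option could translate to. */
static unsigned promised_as_missing_flags(int promised_as_missing)
{
	return promised_as_missing ? MODEL_SKIP_FETCH : 0;
}
```

The per-call flag confines the behaviour to the lookups a command chooses to mark, whereas flipping the global affects every lookup in the process.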

Cheers,
Richard


Thread overview: 20+ messages
2022-06-14 13:36 [PATCH] mktree: learn about promised objects Richard Oliver
2022-06-14 14:14 ` Derrick Stolee
2022-06-14 16:33   ` Richard Oliver
2022-06-14 17:27     ` Derrick Stolee
2022-06-15  0:35       ` Taylor Blau
2022-06-15  4:00         ` Jeff King
2022-06-15 17:40           ` Richard Oliver [this message]
2022-06-15 18:17             ` Derrick Stolee
2022-06-16  6:07               ` Jeff King
2022-06-16  6:54                 ` [PATCH] is_promisor_object(): walk promisor packs in pack-order Jeff King
2022-06-16 14:00                   ` Derrick Stolee
2022-06-17 19:50                   ` Jonathan Tan
2022-06-16 13:59                 ` [PATCH] mktree: learn about promised objects Derrick Stolee
2022-06-15 21:01             ` Junio C Hamano
2022-06-16  5:02               ` Jeff King
2022-06-16 15:46               ` [PATCH] mktree: Make '--missing' behave as documented Richard Oliver
2022-06-16 17:44                 ` Junio C Hamano
2022-06-21 13:59                   ` [PATCH] mktree: do not check type of remote objects Richard Oliver
2022-06-21 16:51                     ` Junio C Hamano
2022-06-21 17:48                     ` Junio C Hamano
