git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Derrick Stolee <derrickstolee@github.com>
Cc: Richard Oliver <roliver@roku.com>, Taylor Blau <me@ttaylorr.com>,
	git@vger.kernel.org, jonathantanmy@google.com
Subject: Re: [PATCH] mktree: learn about promised objects
Date: Thu, 16 Jun 2022 02:07:41 -0400	[thread overview]
Message-ID: <YqrIrYHKUP6b/EtN@coredump.intra.peff.net> (raw)
In-Reply-To: <1fe6c00a-806c-89de-cb67-d063dc4a5279@github.com>

On Wed, Jun 15, 2022 at 02:17:58PM -0400, Derrick Stolee wrote:

> On 6/15/2022 1:40 PM, Richard Oliver wrote:
> > On 15/06/2022 05:00, Jeff King wrote:
> 
> >> So it is not just lookup, but actual tree walking that is expensive. The
> >> flip side is that you don't have to store a complete separate list of
> >> the promised objects. Whether that's a win depends on how many local
> >> objects you have, versus how many are promised.
> 
> This is also why blobless (or blob-size filters) are the recommended way
> to use partial clone. It's just too expensive to have tree misses.

I agree that tree misses are awful, but I'm actually talking about
something different: traversing the local trees we _do_ have in order to
find the set of promised objects. Which is worse for blob:none, because
it means you have more trees locally. :)

Try this with a big repo like linux.git:

  git clone --no-local --filter=blob:none linux.git repo
  cd repo

  # this is fast; we mark the promisor trees as UNINTERESTING, so we do
  # not look at them as part of the traversal, and never call
  # is_promisor_object().
  time git rev-list --count --objects --all --exclude-promisor-objects

  # but imagine we had a fixed mktree[1] that did not fault in the blobs
  # unnecessarily, and we made a new tree that references a promised
  # blob.
  tree=$(git ls-tree HEAD~1000 | grep Makefile | git mktree --missing)
  commit=$(echo foo | git commit-tree -p HEAD $tree)
  git update-ref refs/heads/foo $commit

  # this is now slow; even though we only call is_promisor_object()
  # once, we have to open every single tree in the pack to find it!
  time git rev-list --count --objects --all --exclude-promisor-objects

Those rev-lists run in 1.7s and 224s respectively. Ouch!

Now the setup there is kind of contrived, and it's actually not trivial
to convince rev-list to actually call is_promisor_object() these days.
That's because promisor-ness (promisosity?) tends to be one-way
transitive. A promised blob is usually either only referenced by
promised trees (which we have in this case), or is faulted in as part of
making a new tree.

But it's not guaranteed, and certainly a faulted-in object could later
be deleted from the local repo, since it's promised. I suspect there are
less convoluted workflows to get to a similar state, but I don't know
them offhand. There's more discussion of this issue in this thread from
last summer:

  https://lore.kernel.org/git/20210403090412.GH2271@szeder.dev/

-Peff

[1] The mktree I used was the fix discussed elsewhere in the thread:

diff --git a/builtin/mktree.c b/builtin/mktree.c
index 902edba6d2..42ae82230c 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -117,15 +117,11 @@ static void mktree_line(char *buf, int nul_term_line, int allow_missing)
 	}
 
 	/* Check the type of object identified by sha1 */
-	obj_type = oid_object_info(the_repository, &oid, NULL);
-	if (obj_type < 0) {
-		if (allow_missing) {
-			; /* no problem - missing objects are presumed to be of the right type */
-		} else {
+	if (!allow_missing) {
+		obj_type = oid_object_info(the_repository, &oid, NULL);
+		if (obj_type < 0)
 			die("entry '%s' object %s is unavailable", path, oid_to_hex(&oid));
-		}
-	} else {
-		if (obj_type != mode_type) {
+		else if (obj_type != mode_type) {
 			/*
 			 * The object exists but is of the wrong type.
 			 * This is a problem regardless of allow_missing

  reply	other threads:[~2022-06-16  6:08 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-14 13:36 [PATCH] mktree: learn about promised objects Richard Oliver
2022-06-14 14:14 ` Derrick Stolee
2022-06-14 16:33   ` Richard Oliver
2022-06-14 17:27     ` Derrick Stolee
2022-06-15  0:35       ` Taylor Blau
2022-06-15  4:00         ` Jeff King
2022-06-15 17:40           ` Richard Oliver
2022-06-15 18:17             ` Derrick Stolee
2022-06-16  6:07               ` Jeff King [this message]
2022-06-16  6:54                 ` [PATCH] is_promisor_object(): walk promisor packs in pack-order Jeff King
2022-06-16 14:00                   ` Derrick Stolee
2022-06-17 19:50                   ` Jonathan Tan
2022-06-16 13:59                 ` [PATCH] mktree: learn about promised objects Derrick Stolee
2022-06-15 21:01             ` Junio C Hamano
2022-06-16  5:02               ` Jeff King
2022-06-16 15:46               ` [PATCH] mktree: Make '--missing' behave as documented Richard Oliver
2022-06-16 17:44                 ` Junio C Hamano
2022-06-21 13:59                   ` [PATCH] mktree: do not check type of remote objects Richard Oliver
2022-06-21 16:51                     ` Junio C Hamano
2022-06-21 17:48                     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YqrIrYHKUP6b/EtN@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=roliver@roku.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).