git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Derrick Stolee <derrickstolee@github.com>
Cc: Richard Oliver <roliver@roku.com>, Taylor Blau <me@ttaylorr.com>,
	git@vger.kernel.org, jonathantanmy@google.com
Subject: [PATCH] is_promisor_object(): walk promisor packs in pack-order
Date: Thu, 16 Jun 2022 02:54:41 -0400	[thread overview]
Message-ID: <YqrTsbXbEjx0Pabn@coredump.intra.peff.net> (raw)
In-Reply-To: <YqrIrYHKUP6b/EtN@coredump.intra.peff.net>

On Thu, Jun 16, 2022 at 02:07:41AM -0400, Jeff King wrote:

> Those rev-lists run in 1.7s and 224s respectively. Ouch!

Even though I expected the second one to be slow, I was shocked at just
how slow it was. The patch below speeds it up by a factor of 2, and I
think is worth applying separately, regardless of anything else being
discussed in this thread.

-- >8 --
Subject: is_promisor_object(): walk promisor packs in pack-order

When we generate the list of promisor objects, we walk every pack with a
.promisor file and examine its objects for any links to other objects.
By default, for_each_packed_object() will go in pack .idx order.

This is the worst case with respect to our delta base cache. If we have
a delta chain of A->B->C->D, then visiting A may require reconstructing
both B and C, unless we also visited B recently, in which case we may
have cached its value. Because .idx order is based on sha1, it's random
with respect to the actual object contents and deltas, and thus we're
unlikely to get many cache hits.

If we instead traverse in pack order, then we get the optimal case:
packs are written to keep delta families together, and to place bases
before their children.

Even on a modest repository like git.git, this has a noticeable speedup
on p5600.4, which runs "fsck" on a partial clone with blob:none (so lots
of trees which need to be walked, and which delta well):

Test       HEAD^               HEAD
-------------------------------------------------------
5600.4:    17.87(17.83+0.04)   15.42(15.35+0.06) -13.7%

On a larger repository like linux.git, the speedup is even more
pronounced:

Test       HEAD^                 HEAD
-----------------------------------------------------------
5600.4:    322.47(322.01+0.42)   186.41(185.76+0.63) -42.2%

Any other operations that call is_promisor_object(), like "rev-list
--exclude-promisor-objects", would similarly benefit, but the
invocations in p5600 don't actually trigger any such cases.

Note that we may pay a small price to build a rev-index in-memory to do
the pack-order traversal. But it's still a big net win, and even that
small cost goes away if you are using pack.writeReverseIndex.

Signed-off-by: Jeff King <peff@peff.net>
---
As an aside, feels like pack.writeReverseIndex ought to become the
default (which AFAIK hasn't happened yet).

 packfile.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/packfile.c b/packfile.c
index 8e812a84a3..ed69fe457b 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2275,7 +2275,8 @@ int is_promisor_object(const struct object_id *oid)
 		if (has_promisor_remote()) {
 			for_each_packed_object(add_promisor_object,
 					       &promisor_objects,
-					       FOR_EACH_OBJECT_PROMISOR_ONLY);
+					       FOR_EACH_OBJECT_PROMISOR_ONLY |
+					       FOR_EACH_OBJECT_PACK_ORDER);
 		}
 		promisor_objects_prepared = 1;
 	}
-- 
2.37.0.rc0.352.g10876ef154


  reply	other threads:[~2022-06-16  6:55 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-14 13:36 [PATCH] mktree: learn about promised objects Richard Oliver
2022-06-14 14:14 ` Derrick Stolee
2022-06-14 16:33   ` Richard Oliver
2022-06-14 17:27     ` Derrick Stolee
2022-06-15  0:35       ` Taylor Blau
2022-06-15  4:00         ` Jeff King
2022-06-15 17:40           ` Richard Oliver
2022-06-15 18:17             ` Derrick Stolee
2022-06-16  6:07               ` Jeff King
2022-06-16  6:54                 ` Jeff King [this message]
2022-06-16 14:00                   ` [PATCH] is_promisor_object(): walk promisor packs in pack-order Derrick Stolee
2022-06-17 19:50                   ` Jonathan Tan
2022-06-16 13:59                 ` [PATCH] mktree: learn about promised objects Derrick Stolee
2022-06-15 21:01             ` Junio C Hamano
2022-06-16  5:02               ` Jeff King
2022-06-16 15:46               ` [PATCH] mktree: Make '--missing' behave as documented Richard Oliver
2022-06-16 17:44                 ` Junio C Hamano
2022-06-21 13:59                   ` [PATCH] mktree: do not check type of remote objects Richard Oliver
2022-06-21 16:51                     ` Junio C Hamano
2022-06-21 17:48                     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YqrTsbXbEjx0Pabn@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=roliver@roku.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).