git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: Taylor Blau <me@ttaylorr.com>,
	git@vger.kernel.org, dstolee@microsoft.com
Subject: Re: looking up object types quickly, was Re: [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference when merging
Date: Sun, 22 Mar 2020 15:18:38 -0400	[thread overview]
Message-ID: <20200322191838.GA671552@coredump.intra.peff.net> (raw)
In-Reply-To: <20200322184519.GA654322@coredump.intra.peff.net>

On Sun, Mar 22, 2020 at 02:45:19PM -0400, Jeff King wrote:

> 3. To find the type of a delta object, we have to walk back to a base
>    object with a concrete type. When we access the actual object
>    contents, we cache intermediate bases to avoid digging down the same
>    chain over and over. But I don't think there's any such cache for the
>    type lookup. So if you have a 50-deep delta chain and want the type
>    of each object, you'll walk down 49 links, then 48 links, and so on.
>    It probably wouldn't be that hard to have a small type cache (or just
>    allow the existing cache to have entries for "I know the type, but
>    don't have the contents").
> 
> I suspect (1) would give the biggest speedup, but is more work and
> requires bitmaps. I think (3) is likely to give a moderate win and
> probably isn't that big a patch.

Sadly, it doesn't seem to help. Here's a hacky implementation (I tested
it only on my limited cat-file case; I wouldn't be at all surprised if
it causes problems for users of the delta cache that want the actual
object contents).

After some rudimentary profiling (which I _should_ have done in the
first place before opening my mouth), it looks like finding the types
isn't actually the most expensive thing anyway: it's just dealing with
the sheer number of objects in the pack.

That implies that teaching packed_object_info() to use bitmaps to find
the individual types wouldn't help much either (but being able to ask
"just give me the commits" and iterating over the commit-type bitmap
probably _would_).

Anyway, here's the patch for the curious.

---
diff --git a/packfile.c b/packfile.c
index f4e752996d..482733689a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1273,6 +1273,13 @@ static int retry_bad_packed_offset(struct repository *r,
 
 #define POI_STACK_PREALLOC 64
 
+/* some reordering of static functions could avoid these */
+static enum object_type get_delta_base_cache_type(struct packed_git *p,
+						  off_t curpos);
+static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
+				 void *base, unsigned long base_size,
+				 enum object_type type);
+
 static enum object_type packed_to_object_type(struct repository *r,
 					      struct packed_git *p,
 					      off_t obj_offset,
@@ -1287,6 +1294,14 @@ static enum object_type packed_to_object_type(struct repository *r,
 	while (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
 		off_t base_offset;
 		unsigned long size;
+		enum object_type cached_type;
+
+		cached_type = get_delta_base_cache_type(p, obj_offset);
+		if (cached_type > 0) {
+			type = cached_type;
+			goto out;
+		}
+
 		/* Push the object we're going to leave behind */
 		if (poi_stack_nr >= poi_stack_alloc && poi_stack == small_poi_stack) {
 			poi_stack_alloc = alloc_nr(poi_stack_nr);
@@ -1326,6 +1341,11 @@ static enum object_type packed_to_object_type(struct repository *r,
 	}
 
 out:
+	if (type != OBJ_BAD) {
+		size_t i;
+		for (i = 0; i < poi_stack_nr; i++)
+			add_delta_base_cache(p, poi_stack[i], NULL, 0, type);
+	}
 	if (poi_stack != small_poi_stack)
 		free(poi_stack);
 	return type;
@@ -1385,6 +1405,13 @@ get_delta_base_cache_entry(struct packed_git *p, off_t base_offset)
 	return e ? container_of(e, struct delta_base_cache_entry, ent) : NULL;
 }
 
+static enum object_type get_delta_base_cache_type(struct packed_git *p, off_t curpos)
+{
+	struct delta_base_cache_entry *ent =
+		get_delta_base_cache_entry(p, curpos);
+	return ent ? ent->type : OBJ_BAD;
+}
+
 static int delta_base_cache_key_eq(const struct delta_base_cache_key *a,
 				   const struct delta_base_cache_key *b)
 {
@@ -1433,7 +1460,7 @@ static void *cache_or_unpack_entry(struct repository *r, struct packed_git *p,
 	struct delta_base_cache_entry *ent;
 
 	ent = get_delta_base_cache_entry(p, base_offset);
-	if (!ent)
+	if (!ent || !ent->data)
 		return unpack_entry(r, p, base_offset, type, base_size);
 
 	if (type)
@@ -1677,7 +1704,7 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 		struct delta_base_cache_entry *ent;
 
 		ent = get_delta_base_cache_entry(p, curpos);
-		if (ent) {
+		if (ent && ent->data) {
 			type = ent->type;
 			data = ent->data;
 			size = ent->size;

  reply	other threads:[~2020-03-22 19:18 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-21  3:44 [PATCH 0/1] commit-graph: avoid unnecessary tag deference when merging Taylor Blau
2020-03-21  3:44 ` [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference " Taylor Blau
2020-03-21  5:00   ` Jeff King
2020-03-21  6:11     ` Taylor Blau
2020-03-21  6:24       ` Taylor Blau
2020-03-21  7:03       ` Jeff King
2020-03-21 17:27         ` Taylor Blau
2020-03-22  5:36           ` Jeff King
2020-03-22 11:04             ` SZEDER Gábor
2020-03-22 18:45               ` looking up object types quickly, was " Jeff King
2020-03-22 19:18                 ` Jeff King [this message]
2020-03-23 20:15               ` Taylor Blau
2020-03-22 16:45             ` Taylor Blau
2020-03-24  6:06               ` Jeff King
2020-03-21 18:50         ` Junio C Hamano
2020-03-22  0:03           ` Derrick Stolee
2020-03-22  0:20             ` Taylor Blau
2020-03-22  0:23               ` Derrick Stolee
2020-03-22  5:49                 ` Jeff King
2020-03-22  6:04                   ` Jeff King
2020-03-22 15:47                     ` Taylor Blau
2020-03-24  6:11                       ` Jeff King
2020-03-24 23:08                         ` Taylor Blau
2020-03-27  8:42                           ` Jeff King
2020-03-27 15:03                             ` Taylor Blau
2020-03-22 15:44                   ` Taylor Blau
2020-03-24  6:14                     ` Jeff King
2020-03-21  5:01   ` Junio C Hamano
2020-03-21  4:56 ` [PATCH 0/1] commit-graph: avoid unnecessary tag deference " Junio C Hamano
2020-03-21  5:04   ` Jeff King
2020-03-21  6:12     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200322191838.GA671552@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).