From: Jeff King <peff@peff.net>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: Taylor Blau <me@ttaylorr.com>,
git@vger.kernel.org, dstolee@microsoft.com
Subject: Re: looking up object types quickly, was Re: [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference when merging
Date: Sun, 22 Mar 2020 15:18:38 -0400 [thread overview]
Message-ID: <20200322191838.GA671552@coredump.intra.peff.net> (raw)
In-Reply-To: <20200322184519.GA654322@coredump.intra.peff.net>
On Sun, Mar 22, 2020 at 02:45:19PM -0400, Jeff King wrote:
> 3. To find the type of a delta object, we have to walk back to a base
> object with a concrete type. When we access the actual object
> contents, we cache intermediate bases to avoid digging down the same
> chain over and over. But I don't think there's any such cache for the
> type lookup. So if you have a 50-deep delta chain and want the type
> of each object, you'll walk down 49 links, then 48 links, and so on.
> It probably wouldn't be that hard to have a small type cache (or just
> allow the existing cache to have entries for "I know the type, but
> don't have the contents").
>
> I suspect (1) would give the biggest speedup, but is more work and
> requires bitmaps. I think (3) is likely to give a moderate win and
> probably isn't that big a patch.
Sadly, it doesn't seem to help. Here's a hacky implementation (I tested
it only on my limited cat-file case; I wouldn't be at all surprised if
it causes problems for users of the delta cache that want the actual
object contents).
After some rudimentary profiling (which I _should_ have done in the
first place before opening my mouth), it looks like finding the types
isn't actually the most expensive thing anyway: it's just dealing with
the sheer number of objects in the pack.
That implies that teaching packed_object_info() to use bitmaps to find
the individual types wouldn't help much either (but being able to ask
"just give me the commits" and iterating over the commit-type bitmap
probably _would_).
Anyway, here's the patch for the curious.
---
diff --git a/packfile.c b/packfile.c
index f4e752996d..482733689a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1273,6 +1273,13 @@ static int retry_bad_packed_offset(struct repository *r,
#define POI_STACK_PREALLOC 64
+/* some reordering of static functions could avoid these */
+static enum object_type get_delta_base_cache_type(struct packed_git *p,
+ off_t curpos);
+static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
+ void *base, unsigned long base_size,
+ enum object_type type);
+
static enum object_type packed_to_object_type(struct repository *r,
struct packed_git *p,
off_t obj_offset,
@@ -1287,6 +1294,14 @@ static enum object_type packed_to_object_type(struct repository *r,
while (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
off_t base_offset;
unsigned long size;
+ enum object_type cached_type;
+
+ cached_type = get_delta_base_cache_type(p, obj_offset);
+ if (cached_type > 0) {
+ type = cached_type;
+ goto out;
+ }
+
/* Push the object we're going to leave behind */
if (poi_stack_nr >= poi_stack_alloc && poi_stack == small_poi_stack) {
poi_stack_alloc = alloc_nr(poi_stack_nr);
@@ -1326,6 +1341,11 @@ static enum object_type packed_to_object_type(struct repository *r,
}
out:
+ if (type != OBJ_BAD) {
+ size_t i;
+ for (i = 0; i < poi_stack_nr; i++)
+ add_delta_base_cache(p, poi_stack[i], NULL, 0, type);
+ }
if (poi_stack != small_poi_stack)
free(poi_stack);
return type;
@@ -1385,6 +1405,13 @@ get_delta_base_cache_entry(struct packed_git *p, off_t base_offset)
return e ? container_of(e, struct delta_base_cache_entry, ent) : NULL;
}
+static enum object_type get_delta_base_cache_type(struct packed_git *p, off_t curpos)
+{
+ struct delta_base_cache_entry *ent =
+ get_delta_base_cache_entry(p, curpos);
+ return ent ? ent->type : OBJ_BAD;
+}
+
static int delta_base_cache_key_eq(const struct delta_base_cache_key *a,
const struct delta_base_cache_key *b)
{
@@ -1433,7 +1460,7 @@ static void *cache_or_unpack_entry(struct repository *r, struct packed_git *p,
struct delta_base_cache_entry *ent;
ent = get_delta_base_cache_entry(p, base_offset);
- if (!ent)
+ if (!ent || !ent->data)
return unpack_entry(r, p, base_offset, type, base_size);
if (type)
@@ -1677,7 +1704,7 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
struct delta_base_cache_entry *ent;
ent = get_delta_base_cache_entry(p, curpos);
- if (ent) {
+ if (ent && ent->data) {
type = ent->type;
data = ent->data;
size = ent->size;
next prev parent reply other threads:[~2020-03-22 19:18 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-21 3:44 [PATCH 0/1] commit-graph: avoid unnecessary tag deference when merging Taylor Blau
2020-03-21 3:44 ` [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference " Taylor Blau
2020-03-21 5:00 ` Jeff King
2020-03-21 6:11 ` Taylor Blau
2020-03-21 6:24 ` Taylor Blau
2020-03-21 7:03 ` Jeff King
2020-03-21 17:27 ` Taylor Blau
2020-03-22 5:36 ` Jeff King
2020-03-22 11:04 ` SZEDER Gábor
2020-03-22 18:45 ` looking up object types quickly, was " Jeff King
2020-03-22 19:18 ` Jeff King [this message]
2020-03-23 20:15 ` Taylor Blau
2020-03-22 16:45 ` Taylor Blau
2020-03-24 6:06 ` Jeff King
2020-03-21 18:50 ` Junio C Hamano
2020-03-22 0:03 ` Derrick Stolee
2020-03-22 0:20 ` Taylor Blau
2020-03-22 0:23 ` Derrick Stolee
2020-03-22 5:49 ` Jeff King
2020-03-22 6:04 ` Jeff King
2020-03-22 15:47 ` Taylor Blau
2020-03-24 6:11 ` Jeff King
2020-03-24 23:08 ` Taylor Blau
2020-03-27 8:42 ` Jeff King
2020-03-27 15:03 ` Taylor Blau
2020-03-22 15:44 ` Taylor Blau
2020-03-24 6:14 ` Jeff King
2020-03-21 5:01 ` Junio C Hamano
2020-03-21 4:56 ` [PATCH 0/1] commit-graph: avoid unnecessary tag deference " Junio C Hamano
2020-03-21 5:04 ` Jeff King
2020-03-21 6:12 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200322191838.GA671552@coredump.intra.peff.net \
--to=peff@peff.net \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=me@ttaylorr.com \
--cc=szeder.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).