From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net
Subject: [PATCH v2 7/8] packfile: add kept-pack cache for find_kept_pack_entry()
Date: Wed, 3 Feb 2021 22:59:21 -0500 [thread overview]
Message-ID: <f1c07324f62cf4d087c41165cefed98f554cfd78.1612411124.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1612411123.git.me@ttaylorr.com>
From: Jeff King <peff@peff.net>
In a recent patch we added a function 'find_kept_pack_entry()' to look
for an object only among kept packs.
While this function avoids doing any lookup work in non-kept packs, it
is still linear in the number of packs, since we have to traverse the
linked list of packs once per object. Let's cache a reduced version of
that list to save us time.
Note that this cache will last the lifetime of the program. We could
invalidate it on reprepare_packed_git(), but there's not much point in
being rigorous here:
- we might already fail to notice new .keep packs showing up after the
program starts. We only reprepare_packed_git() when we fail to find
an object. But adding a new pack won't cause that to happen.
Somebody repacking could add a new pack and delete an old one, but
most of the time we'd have a descriptor or mmap open to the old
pack anyway, so we might not even notice.
- in pack-objects we already cache the .keep state at startup, since
56dfeb6263 (pack-objects: compute local/ignore_pack_keep early,
2016-07-29). So this is just extending that concept further.
- we don't have to worry about any packed_git being removed; we always
keep the old structs around, even after reprepare_packed_git()
Here are p5303 results (as always, measured against the kernel):
Test HEAD^ HEAD
----------------------------------------------------------------------------------------------
5303.5: repack (1) 57.44(54.71+10.78) 57.06(54.29+10.96) -0.7%
5303.6: repack with --stdin-packs (1) 0.01(0.00+0.01) 0.01(0.01+0.00) +0.0%
5303.10: repack (50) 71.32(88.38+4.90) 71.47(88.60+5.04) +0.2%
5303.11: repack with --stdin-packs (50) 3.43(11.81+0.22) 3.49(12.21+0.26) +1.7%
5303.15: repack (1000) 215.59(493.75+14.62) 217.41(495.36+14.85) +0.8%
5303.16: repack with --stdin-packs (1000) 131.44(314.24+8.11) 126.75(309.88+8.09) -3.6%
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
builtin/pack-objects.c | 6 +--
object-store.h | 10 ++++
packfile.c | 103 +++++++++++++++++++++++------------------
packfile.h | 4 --
revision.c | 8 ++--
5 files changed, 76 insertions(+), 55 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index fbd7b54d70..b2ba5aa14f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1225,9 +1225,9 @@ static int want_found_object(const struct object_id *oid, int exclude,
*/
unsigned flags = 0;
if (ignore_packed_keep_on_disk)
- flags |= ON_DISK_KEEP_PACKS;
+ flags |= CACHE_ON_DISK_KEEP_PACKS;
if (ignore_packed_keep_in_core)
- flags |= IN_CORE_KEEP_PACKS;
+ flags |= CACHE_IN_CORE_KEEP_PACKS;
if (ignore_packed_keep_on_disk && p->pack_keep)
return 0;
@@ -3089,7 +3089,7 @@ static void read_packs_list_from_stdin(void)
* an optimization during delta selection.
*/
revs.no_kept_objects = 1;
- revs.keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
+ revs.keep_pack_cache_flags |= CACHE_IN_CORE_KEEP_PACKS;
revs.blob_objects = 1;
revs.tree_objects = 1;
revs.tag_objects = 1;
diff --git a/object-store.h b/object-store.h
index c4fc9dd74e..4cbe8eae3c 100644
--- a/object-store.h
+++ b/object-store.h
@@ -105,6 +105,14 @@ static inline int pack_map_entry_cmp(const void *unused_cmp_data,
return strcmp(pg1->pack_name, key ? key : pg2->pack_name);
}
+#define CACHE_ON_DISK_KEEP_PACKS 1
+#define CACHE_IN_CORE_KEEP_PACKS 2
+
+struct kept_pack_cache {
+ struct packed_git **packs;
+ unsigned flags;
+};
+
struct raw_object_store {
/*
* Set of all object directories; the main directory is first (and
@@ -150,6 +158,8 @@ struct raw_object_store {
/* A most-recently-used ordered version of the packed_git list. */
struct list_head packed_git_mru;
+ struct kept_pack_cache *kept_pack_cache;
+
/*
* A map of packfiles to packed_git structs for tracking which
* packs have been loaded already.
diff --git a/packfile.c b/packfile.c
index 5f35cfe788..2a139c907b 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2031,10 +2031,7 @@ static int fill_pack_entry(const struct object_id *oid,
return 1;
}
-static int find_one_pack_entry(struct repository *r,
- const struct object_id *oid,
- struct pack_entry *e,
- int kept_only)
+int find_pack_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e)
{
struct list_head *pos;
struct multi_pack_index *m;
@@ -2044,49 +2041,64 @@ static int find_one_pack_entry(struct repository *r,
return 0;
for (m = r->objects->multi_pack_index; m; m = m->next) {
- if (!fill_midx_entry(r, oid, e, m))
- continue;
-
- if (!kept_only)
- return 1;
-
- if (((kept_only & ON_DISK_KEEP_PACKS) && e->p->pack_keep) ||
- ((kept_only & IN_CORE_KEEP_PACKS) && e->p->pack_keep_in_core))
+ if (fill_midx_entry(r, oid, e, m))
return 1;
}
list_for_each(pos, &r->objects->packed_git_mru) {
struct packed_git *p = list_entry(pos, struct packed_git, mru);
- if (p->multi_pack_index && !kept_only) {
- /*
- * If this pack is covered by the MIDX, we'd have found
- * the object already in the loop above if it was here,
- * so don't bother looking.
- *
- * The exception is if we are looking only at kept
- * packs. An object can be present in two packs covered
- * by the MIDX, one kept and one not-kept. And as the
- * MIDX points to only one copy of each object, it might
- * have returned only the non-kept version above. We
- * have to check again to be thorough.
- */
- continue;
- }
- if (!kept_only ||
- (((kept_only & ON_DISK_KEEP_PACKS) && p->pack_keep) ||
- ((kept_only & IN_CORE_KEEP_PACKS) && p->pack_keep_in_core))) {
- if (fill_pack_entry(oid, e, p)) {
- list_move(&p->mru, &r->objects->packed_git_mru);
- return 1;
- }
+ if (!p->multi_pack_index && fill_pack_entry(oid, e, p)) {
+ list_move(&p->mru, &r->objects->packed_git_mru);
+ return 1;
}
}
return 0;
}
-int find_pack_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e)
+static void maybe_invalidate_kept_pack_cache(struct repository *r,
+ unsigned flags)
{
- return find_one_pack_entry(r, oid, e, 0);
+ if (!r->objects->kept_pack_cache)
+ return;
+ if (r->objects->kept_pack_cache->flags == flags)
+ return;
+ free(r->objects->kept_pack_cache->packs);
+ FREE_AND_NULL(r->objects->kept_pack_cache);
+}
+
+static struct packed_git **kept_pack_cache(struct repository *r, unsigned flags)
+{
+ maybe_invalidate_kept_pack_cache(r, flags);
+
+ if (!r->objects->kept_pack_cache) {
+ struct packed_git **packs = NULL;
+ size_t nr = 0, alloc = 0;
+ struct packed_git *p;
+
+ /*
+ * We want "all" packs here, because we need to cover ones that
+ * are used by a midx, as well. We need to look in every one of
+ * them (instead of the midx itself) to cover duplicates. It's
+ * possible that an object is found in two packs that the midx
+ * covers, one kept and one not kept, but the midx returns only
+ * the non-kept version.
+ */
+ for (p = get_all_packs(r); p; p = p->next) {
+ if ((p->pack_keep && (flags & CACHE_ON_DISK_KEEP_PACKS)) ||
+ (p->pack_keep_in_core && (flags & CACHE_IN_CORE_KEEP_PACKS))) {
+ ALLOC_GROW(packs, nr + 1, alloc);
+ packs[nr++] = p;
+ }
+ }
+ ALLOC_GROW(packs, nr + 1, alloc);
+ packs[nr] = NULL;
+
+ r->objects->kept_pack_cache = xmalloc(sizeof(*r->objects->kept_pack_cache));
+ r->objects->kept_pack_cache->packs = packs;
+ r->objects->kept_pack_cache->flags = flags;
+ }
+
+ return r->objects->kept_pack_cache->packs;
}
int find_kept_pack_entry(struct repository *r,
@@ -2094,13 +2106,15 @@ int find_kept_pack_entry(struct repository *r,
unsigned flags,
struct pack_entry *e)
{
- /*
- * Load all packs, including midx packs, since our "kept" strategy
- * relies on that. We're relying on the side effect of it setting up
- * r->objects->packed_git, which is a little ugly.
- */
- get_all_packs(r);
- return find_one_pack_entry(r, oid, e, flags);
+ struct packed_git **cache;
+
+ for (cache = kept_pack_cache(r, flags); *cache; cache++) {
+ struct packed_git *p = *cache;
+ if (fill_pack_entry(oid, e, p))
+ return 1;
+ }
+
+ return 0;
}
int has_object_pack(const struct object_id *oid)
@@ -2109,7 +2123,8 @@ int has_object_pack(const struct object_id *oid)
return find_pack_entry(the_repository, oid, &e);
}
-int has_object_kept_pack(const struct object_id *oid, unsigned flags)
+int has_object_kept_pack(const struct object_id *oid,
+ unsigned flags)
{
struct pack_entry e;
return find_kept_pack_entry(the_repository, oid, flags, &e);
diff --git a/packfile.h b/packfile.h
index 624327f64d..eb56db2a7b 100644
--- a/packfile.h
+++ b/packfile.h
@@ -161,10 +161,6 @@ int packed_object_info(struct repository *r,
void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1);
const struct packed_git *has_packed_and_bad(struct repository *r, const unsigned char *sha1);
-#define ON_DISK_KEEP_PACKS 1
-#define IN_CORE_KEEP_PACKS 2
-#define ALL_KEEP_PACKS (ON_DISK_KEEP_PACKS | IN_CORE_KEEP_PACKS)
-
/*
* Iff a pack file in the given repository contains the object named by sha1,
* return true and store its location to e.
diff --git a/revision.c b/revision.c
index 4c5adb90b1..41c0478705 100644
--- a/revision.c
+++ b/revision.c
@@ -2338,14 +2338,14 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
die(_("--unpacked=<packfile> no longer supported"));
} else if (!strcmp(arg, "--no-kept-objects")) {
revs->no_kept_objects = 1;
- revs->keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
- revs->keep_pack_cache_flags |= ON_DISK_KEEP_PACKS;
+ revs->keep_pack_cache_flags |= CACHE_IN_CORE_KEEP_PACKS;
+ revs->keep_pack_cache_flags |= CACHE_ON_DISK_KEEP_PACKS;
} else if (skip_prefix(arg, "--no-kept-objects=", &optarg)) {
revs->no_kept_objects = 1;
if (!strcmp(optarg, "in-core"))
- revs->keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
+ revs->keep_pack_cache_flags |= CACHE_IN_CORE_KEEP_PACKS;
if (!strcmp(optarg, "on-disk"))
- revs->keep_pack_cache_flags |= ON_DISK_KEEP_PACKS;
+ revs->keep_pack_cache_flags |= CACHE_ON_DISK_KEEP_PACKS;
} else if (!strcmp(arg, "-r")) {
revs->diff = 1;
revs->diffopt.flags.recursive = 1;
--
2.30.0.533.g2f8b6b552f.dirty
next prev parent reply other threads:[~2021-02-04 4:01 UTC|newest]
Thread overview: 120+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-19 23:23 [PATCH 00/10] repack: support repacking into a geometric sequence Taylor Blau
2021-01-19 23:24 ` [PATCH 01/10] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-01-20 13:40 ` Derrick Stolee
2021-01-20 14:38 ` Taylor Blau
2021-01-29 2:33 ` Junio C Hamano
2021-01-29 18:38 ` Taylor Blau
2021-01-29 19:31 ` Jeff King
2021-01-29 20:20 ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 02/10] revision: learn '--no-kept-objects' Taylor Blau
2021-01-29 3:10 ` Junio C Hamano
2021-01-29 19:13 ` Taylor Blau
2021-01-19 23:24 ` [PATCH 03/10] builtin/pack-objects.c: learn '--assume-kept-packs-closed' Taylor Blau
2021-01-29 3:21 ` Junio C Hamano
2021-01-29 19:19 ` Jeff King
2021-01-29 20:01 ` Taylor Blau
2021-01-29 20:25 ` Jeff King
2021-01-29 22:10 ` Taylor Blau
2021-01-29 22:57 ` Jeff King
2021-01-29 23:03 ` Junio C Hamano
2021-01-29 23:28 ` Taylor Blau
2021-02-02 3:04 ` Taylor Blau
2021-01-29 23:31 ` Jeff King
2021-01-29 22:13 ` Junio C Hamano
2021-01-29 20:30 ` Junio C Hamano
2021-01-29 22:43 ` Jeff King
2021-01-29 22:53 ` Taylor Blau
2021-01-29 23:00 ` Jeff King
2021-01-29 23:10 ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 04/10] p5303: add missing &&-chains Taylor Blau
2021-01-19 23:24 ` [PATCH 05/10] p5303: measure time to repack with keep Taylor Blau
2021-01-29 3:40 ` Junio C Hamano
2021-01-29 19:32 ` Jeff King
2021-01-29 20:04 ` [PATCH] p5303: avoid sed GNU-ism Jeff King
2021-01-29 20:19 ` Eric Sunshine
2021-01-29 20:27 ` Jeff King
2021-01-29 20:36 ` Eric Sunshine
2021-01-29 22:11 ` Taylor Blau
2021-01-29 20:38 ` [PATCH 05/10] p5303: measure time to repack with keep Junio C Hamano
2021-01-29 22:10 ` Jeff King
2021-01-29 23:12 ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 06/10] pack-objects: rewrite honor-pack-keep logic Taylor Blau
2021-01-19 23:24 ` [PATCH 07/10] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-01-19 23:24 ` [PATCH 08/10] builtin/pack-objects.c: teach '--keep-pack-stdin' Taylor Blau
2021-01-19 23:24 ` [PATCH 09/10] builtin/repack.c: extract loose object handling Taylor Blau
2021-01-20 13:59 ` Derrick Stolee
2021-01-20 14:34 ` Taylor Blau
2021-01-20 15:51 ` Derrick Stolee
2021-01-21 3:45 ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 10/10] builtin/repack.c: add '--geometric' option Taylor Blau
2021-01-20 14:05 ` [PATCH 00/10] repack: support repacking into a geometric sequence Derrick Stolee
2021-02-04 3:58 ` [PATCH v2 0/8] " Taylor Blau
2021-02-04 3:58 ` [PATCH v2 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-16 21:42 ` Jeff King
2021-02-16 21:48 ` Taylor Blau
2021-02-04 3:58 ` [PATCH v2 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-16 23:17 ` Jeff King
2021-02-17 18:35 ` Taylor Blau
2021-02-04 3:59 ` [PATCH v2 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-16 23:46 ` Jeff King
2021-02-17 18:59 ` Taylor Blau
2021-02-17 19:21 ` Jeff King
2021-02-04 3:59 ` [PATCH v2 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-04 3:59 ` [PATCH v2 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-16 23:58 ` Jeff King
2021-02-17 0:02 ` Jeff King
2021-02-17 19:13 ` Taylor Blau
2021-02-17 19:25 ` Jeff King
2021-02-04 3:59 ` [PATCH v2 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-17 16:05 ` Jeff King
2021-02-17 19:23 ` Taylor Blau
2021-02-17 19:29 ` Jeff King
2021-02-04 3:59 ` Taylor Blau [this message]
2021-02-17 17:11 ` [PATCH v2 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Jeff King
2021-02-17 19:54 ` Taylor Blau
2021-02-17 20:25 ` Jeff King
2021-02-17 20:29 ` Taylor Blau
2021-02-17 21:43 ` Jeff King
2021-02-04 3:59 ` [PATCH v2 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-17 18:17 ` Jeff King
2021-02-17 20:01 ` Taylor Blau
2021-02-17 0:01 ` [PATCH v2 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-17 18:18 ` Jeff King
2021-02-18 3:14 ` [PATCH v3 " Taylor Blau
2021-02-18 3:14 ` [PATCH v3 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-18 3:14 ` [PATCH v3 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-18 3:14 ` [PATCH v3 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-18 3:14 ` [PATCH v3 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-18 3:14 ` [PATCH v3 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-18 3:14 ` [PATCH v3 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-18 3:14 ` [PATCH v3 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-18 3:14 ` [PATCH v3 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-23 0:31 ` [PATCH v3 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-23 1:06 ` Taylor Blau
2021-02-23 1:42 ` Jeff King
2021-02-23 2:24 ` [PATCH v4 " Taylor Blau
2021-02-23 2:25 ` [PATCH v4 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-23 2:25 ` [PATCH v4 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-23 2:25 ` [PATCH v4 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-23 8:07 ` Junio C Hamano
2021-02-23 18:51 ` Jeff King
2021-02-23 2:25 ` [PATCH v4 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-23 2:25 ` [PATCH v4 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-23 2:25 ` [PATCH v4 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-23 2:25 ` [PATCH v4 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-23 2:25 ` [PATCH v4 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-24 23:19 ` Junio C Hamano
2021-02-24 23:43 ` Junio C Hamano
2021-03-04 21:40 ` Taylor Blau
2021-03-04 21:55 ` Taylor Blau
2021-02-23 3:39 ` [PATCH v4 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-23 7:43 ` Junio C Hamano
2021-02-23 18:44 ` Jeff King
2021-02-23 19:54 ` Martin Fick
2021-02-23 20:06 ` Taylor Blau
2021-02-23 21:57 ` Martin Fick
2021-02-23 20:15 ` Jeff King
2021-02-23 21:41 ` Martin Fick
2021-02-23 21:53 ` Jeff King
2021-02-24 18:13 ` Martin Fick
2021-02-26 6:23 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f1c07324f62cf4d087c41165cefed98f554cfd78.1612411124.git.me@ttaylorr.com \
--to=me@ttaylorr.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).