From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: "Jeff King" <peff@peff.net>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Junio C Hamano" <gitster@pobox.com>,
"Derrick Stolee" <stolee@gmail.com>,
"René Scharfe" <l.s.r@web.de>
Subject: [PATCH v2 0/7] Speed up mirror-fetches with many refs
Date: Tue, 24 Aug 2021 12:36:51 +0200 [thread overview]
Message-ID: <cover.1629800774.git.ps@pks.im> (raw)
In-Reply-To: <cover.1629452412.git.ps@pks.im>
[-- Attachment #1: Type: text/plain, Size: 13857 bytes --]
Hi,
this is the second version of my patch series to speed up mirror-fetches
with many refs. This topic applies on top of Junio's 9d5700f60b (Merge
branch 'ps/connectivity-optim' into jch, 2021-08-23).
Changes compared to v1:
- Patch 1/7: I've applied Stolee's proposal to only
opportunistically load objects via the commit-graph in case the
reference is not in refs/tags/ such that we don't regress repos
with many annotated tags.
- Patch 3/7: The return parameter of the iterator is now const to
allow further optimizations by the compiler, as suggested by
René. I've also re-benchmarked this, and one can now see a very
slight performance improvement of ~1%.
- Patch 4/7: Added my missing DCO, as pointed out by Junio.
- Patch 5, 6, 7: I've redone these to make it clearer that the
refactoring I'm doing doesn't cause us to miss any object
connectivity checks. Most importantly, I've merged `fetch_refs()`
and `consume_refs()` into `fetch_and_consume_refs()` in 6/7, which
makes the optimization where we elide the second connectivity
check in 7/7 trivial.
Thanks for your feedback!
Patrick
Patrick Steinhardt (7):
fetch: speed up lookup of want refs via commit-graph
fetch: avoid unpacking headers in object existence check
connected: refactor iterator to return next object ID directly
fetch-pack: optimize loading of refs via commit graph
fetch: refactor fetch refs to be more extendable
fetch: merge fetching and consuming refs
fetch: avoid second connectivity check if we already have all objects
builtin/clone.c | 8 ++---
builtin/fetch.c | 74 +++++++++++++++++++++++-------------------
builtin/receive-pack.c | 17 ++++------
connected.c | 15 +++++----
connected.h | 2 +-
fetch-pack.c | 14 +++++---
6 files changed, 68 insertions(+), 62 deletions(-)
Range-diff against v1:
1: 6872979c45 ! 1: 4a819a6830 fetch: speed up lookup of want refs via commit-graph
@@ Commit message
referenced objects.
Speed this up by opportunistcally trying to resolve object IDs via the
- commit graph: more likely than not, they're going to be a commit anyway,
- and this lets us avoid having to unpack object headers completely in
- case the object is a commit that is part of the commit-graph. This
- significantly speeds up mirror-fetches in a real-world repository with
+ commit graph. We only do so for any refs which are not in "refs/tags":
+ more likely than not, these are going to be a commit anyway, and this
+ lets us avoid having to unpack object headers completely in case the
+ object is a commit that is part of the commit-graph. This significantly
+ speeds up mirror-fetches in a real-world repository with
2.3M refs:
Benchmark #1: HEAD~: git-fetch
- Time (mean ± σ): 56.942 s ± 0.449 s [User: 53.360 s, System: 5.356 s]
- Range (min … max): 56.372 s … 57.533 s 5 runs
+ Time (mean ± σ): 56.482 s ± 0.384 s [User: 53.340 s, System: 5.365 s]
+ Range (min … max): 56.050 s … 57.045 s 5 runs
Benchmark #2: HEAD: git-fetch
- Time (mean ± σ): 33.657 s ± 0.167 s [User: 30.302 s, System: 5.181 s]
- Range (min … max): 33.454 s … 33.844 s 5 runs
+ Time (mean ± σ): 33.727 s ± 0.170 s [User: 30.252 s, System: 5.194 s]
+ Range (min … max): 33.452 s … 33.871 s 5 runs
Summary
'HEAD: git-fetch' ran
- 1.69 ± 0.02 times faster than 'HEAD~: git-fetch'
+ 1.67 ± 0.01 times faster than 'HEAD~: git-fetch'
Signed-off-by: Patrick Steinhardt <ps@pks.im>
## builtin/fetch.c ##
+@@ builtin/fetch.c: static int store_updated_refs(const char *raw_url, const char *remote_name,
+ int connectivity_checked, struct ref *ref_map)
+ {
+ struct fetch_head fetch_head;
+- struct commit *commit;
+ int url_len, i, rc = 0;
+ struct strbuf note = STRBUF_INIT, err = STRBUF_INIT;
+ struct ref_transaction *transaction = NULL;
+@@ builtin/fetch.c: static int store_updated_refs(const char *raw_url, const char *remote_name,
+ want_status <= FETCH_HEAD_IGNORE;
+ want_status++) {
+ for (rm = ref_map; rm; rm = rm->next) {
++ struct commit *commit = NULL;
+ struct ref *ref = NULL;
+
+ if (rm->status == REF_STATUS_REJECT_SHALLOW) {
@@ builtin/fetch.c: static int store_updated_refs(const char *raw_url, const char *remote_name,
continue;
}
@@ builtin/fetch.c: static int store_updated_refs(const char *raw_url, const char *
- 1);
- if (!commit)
- rm->fetch_head_status = FETCH_HEAD_NOT_FOR_MERGE;
-+ commit = lookup_commit_in_graph(the_repository, &rm->old_oid);
++ /*
++ * References in "refs/tags/" are often going to point
++ * to annotated tags, which are not part of the
++ * commit-graph. We thus only try to look up refs in
++ * the graph which are not in that namespace to not
++ * regress performance in repositories with many
++ * annotated tags.
++ */
++ if (!starts_with(rm->name, "refs/tags/"))
++ commit = lookup_commit_in_graph(the_repository, &rm->old_oid);
+ if (!commit) {
+ commit = lookup_commit_reference_gently(the_repository,
+ &rm->old_oid,
2: d3dac607f2 = 2: 81ebadabe8 fetch: avoid unpacking headers in object existence check
3: 3bdad7bc8b ! 3: 98e981ced9 connected: refactor iterator to return next object ID directly
@@ Commit message
The object ID iterator used by the connectivity checks returns the next
object ID via an out-parameter and then uses a return code to indicate
whether an item was found. This is a bit roundabout: instead of a
- separate error code, we can just retrun the next object ID directly and
+ separate error code, we can just return the next object ID directly and
use `NULL` pointers as indicator that the iterator got no items left.
Furthermore, this avoids a copy of the object ID.
Refactor the iterator and all its implementations to return object IDs
- directly. While I was honestly hoping for a small speedup given that we
- can now avoid a copy, both versions perform the same. Still, the end
- result is easier to understand and thus it makes sense to keep this
- refactoring regardless.
+ directly. This brings a tiny performance improvement when doing a mirror-fetch of a repository with about 2.3M refs:
+
+ Benchmark #1: 328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch
+ Time (mean ± σ): 30.110 s ± 0.148 s [User: 27.161 s, System: 5.075 s]
+ Range (min … max): 29.934 s … 30.406 s 10 runs
+
+ Benchmark #2: 328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch
+ Time (mean ± σ): 29.899 s ± 0.109 s [User: 26.916 s, System: 5.104 s]
+ Range (min … max): 29.696 s … 29.996 s 10 runs
+
+ Summary
+ '328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch' ran
+ 1.01 ± 0.01 times faster than '328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch'
+
+ While this 1% speedup could be labelled as statistically insignificant,
+ the speedup is consistent on my machine. Furthermore, this is an end to
+ end test, so it is expected that the improvement in the connectivity
+ check itself is more significant.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
@@ builtin/clone.c: static void write_followtags(const struct ref *refs, const char
}
-static int iterate_ref_map(void *cb_data, struct object_id *oid)
-+static struct object_id *iterate_ref_map(void *cb_data)
++static const struct object_id *iterate_ref_map(void *cb_data)
{
struct ref **rm = cb_data;
struct ref *ref = *rm;
@@ builtin/fetch.c: static int update_local_ref(struct ref *ref,
}
-static int iterate_ref_map(void *cb_data, struct object_id *oid)
-+static struct object_id *iterate_ref_map(void *cb_data)
++static const struct object_id *iterate_ref_map(void *cb_data)
{
struct ref **rm = cb_data;
struct ref *ref = *rm;
@@ builtin/receive-pack.c: static void refuse_unconfigured_deny_delete_current(void
}
-static int command_singleton_iterator(void *cb_data, struct object_id *oid);
-+static struct object_id *command_singleton_iterator(void *cb_data);
++static const struct object_id *command_singleton_iterator(void *cb_data);
static int update_shallow_ref(struct command *cmd, struct shallow_info *si)
{
struct shallow_lock shallow_lock = SHALLOW_LOCK_INIT;
@@ builtin/receive-pack.c: static void check_aliased_updates(struct command *comman
}
-static int command_singleton_iterator(void *cb_data, struct object_id *oid)
-+static struct object_id *command_singleton_iterator(void *cb_data)
++static const struct object_id *command_singleton_iterator(void *cb_data)
{
struct command **cmd_list = cb_data;
struct command *cmd = *cmd_list;
@@ builtin/receive-pack.c: struct iterate_data {
};
-static int iterate_receive_command_list(void *cb_data, struct object_id *oid)
-+static struct object_id *iterate_receive_command_list(void *cb_data)
++static const struct object_id *iterate_receive_command_list(void *cb_data)
{
struct iterate_data *data = cb_data;
struct command **cmd_list = &data->cmds;
@@ connected.c: int check_connected(oid_iterate_fn fn, void *cb_data,
FILE *rev_list_in;
struct check_connected_options defaults = CHECK_CONNECTED_INIT;
- struct object_id oid;
-+ struct object_id *oid;
++ const struct object_id *oid;
int err = 0;
struct packed_git *new_pack = NULL;
struct transport *transport;
@@ connected.h: struct transport;
* to signal EOF, otherwise return 0.
*/
-typedef int (*oid_iterate_fn)(void *, struct object_id *oid);
-+typedef struct object_id *(*oid_iterate_fn)(void *);
++typedef const struct object_id *(*oid_iterate_fn)(void *);
/*
* Named-arguments struct for check_connected. All arguments are
@@ fetch-pack.c: static void update_shallow(struct fetch_pack_args *args,
}
-static int iterate_ref_map(void *cb_data, struct object_id *oid)
-+static struct object_id *iterate_ref_map(void *cb_data)
++static const struct object_id *iterate_ref_map(void *cb_data)
{
struct ref **rm = cb_data;
struct ref *ref = *rm;
4: 67917af7ce ! 4: 6311203f08 fetch-pack: optimize loading of refs via commit graph
@@ Commit message
In case this fails, we fall back to the old code which peels the
objects to a commit.
+ Signed-off-by: Patrick Steinhardt <ps@pks.im>
+
## fetch-pack.c ##
@@ fetch-pack.c: static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
{
5: 7653f8eabc ! 5: 56a9158ac3 fetch: refactor fetch refs to be more extendable
@@ builtin/fetch.c: static int check_exist_and_connected(struct ref *ref_map)
static int fetch_refs(struct transport *transport, struct ref *ref_map)
{
- int ret = check_exist_and_connected(ref_map);
-- if (ret) {
-- trace2_region_enter("fetch", "fetch_refs", the_repository);
-- ret = transport_fetch_refs(transport, ref_map);
-- trace2_region_leave("fetch", "fetch_refs", the_repository);
-- }
+ int ret;
+
+ /*
@@ builtin/fetch.c: static int check_exist_and_connected(struct ref *ref_map)
+ * refs.
+ */
+ ret = check_exist_and_connected(ref_map);
- if (!ret)
+ if (ret) {
+ trace2_region_enter("fetch", "fetch_refs", the_repository);
+ ret = transport_fetch_refs(transport, ref_map);
+ trace2_region_leave("fetch", "fetch_refs", the_repository);
++ if (ret) {
++ transport_unlock_pack(transport);
++ return ret;
++ }
+ }
+- if (!ret)
- /*
- * Keep the new pack's ".keep" file around to allow the caller
- * time to update refs to reference the new objects.
- */
- return 0;
+- return 0;
- transport_unlock_pack(transport);
- return ret;
+
-+ trace2_region_enter("fetch", "fetch_refs", the_repository);
-+ ret = transport_fetch_refs(transport, ref_map);
-+ trace2_region_leave("fetch", "fetch_refs", the_repository);
-+ if (ret) {
-+ transport_unlock_pack(transport);
-+ return ret;
-+ }
-+
+ /*
+ * Keep the new pack's ".keep" file around to allow the caller
+ * time to update refs to reference the new objects.
6: 646ac90e62 < -: ---------- fetch: avoid second connectivity check if we already have all objects
-: ---------- > 6: 31d9f72edf fetch: merge fetching and consuming refs
-: ---------- > 7: 84e39c847f fetch: avoid second connectivity check if we already have all objects
--
2.33.0
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2021-08-24 10:37 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-20 10:08 [PATCH 0/6] Speed up mirror-fetches with many refs Patrick Steinhardt
2021-08-20 10:08 ` [PATCH 1/6] fetch: speed up lookup of want refs via commit-graph Patrick Steinhardt
2021-08-20 14:27 ` Derrick Stolee
2021-08-20 17:18 ` Junio C Hamano
2021-08-23 6:46 ` Patrick Steinhardt
2021-08-25 14:12 ` Derrick Stolee
2021-08-20 10:08 ` [PATCH 2/6] fetch: avoid unpacking headers in object existence check Patrick Steinhardt
2021-08-25 23:44 ` Ævar Arnfjörð Bjarmason
2021-08-20 10:08 ` [PATCH 3/6] connected: refactor iterator to return next object ID directly Patrick Steinhardt
2021-08-20 14:32 ` Derrick Stolee
2021-08-20 17:43 ` Junio C Hamano
2021-08-20 17:43 ` René Scharfe
2021-08-23 6:47 ` Patrick Steinhardt
2021-08-20 10:08 ` [PATCH 4/6] fetch-pack: optimize loading of refs via commit graph Patrick Steinhardt
2021-08-20 14:37 ` Derrick Stolee
2021-08-20 10:08 ` [PATCH 5/6] fetch: refactor fetch refs to be more extendable Patrick Steinhardt
2021-08-20 14:41 ` Derrick Stolee
2021-08-20 10:08 ` [PATCH 6/6] fetch: avoid second connectivity check if we already have all objects Patrick Steinhardt
2021-08-20 14:47 ` Derrick Stolee
2021-08-23 6:52 ` Patrick Steinhardt
2021-08-20 14:50 ` [PATCH 0/6] Speed up mirror-fetches with many refs Derrick Stolee
2021-08-21 0:09 ` Junio C Hamano
2021-08-24 10:36 ` Patrick Steinhardt [this message]
2021-08-24 10:36 ` [PATCH v2 1/7] fetch: speed up lookup of want refs via commit-graph Patrick Steinhardt
2021-08-25 14:16 ` Derrick Stolee
2021-08-24 10:37 ` [PATCH v2 2/7] fetch: avoid unpacking headers in object existence check Patrick Steinhardt
2021-08-24 10:37 ` [PATCH v2 3/7] connected: refactor iterator to return next object ID directly Patrick Steinhardt
2021-08-24 10:37 ` [PATCH v2 4/7] fetch-pack: optimize loading of refs via commit graph Patrick Steinhardt
2021-08-24 10:37 ` [PATCH v2 5/7] fetch: refactor fetch refs to be more extendable Patrick Steinhardt
2021-08-25 14:19 ` Derrick Stolee
2021-09-01 12:48 ` Patrick Steinhardt
2021-08-24 10:37 ` [PATCH v2 6/7] fetch: merge fetching and consuming refs Patrick Steinhardt
2021-08-25 14:26 ` Derrick Stolee
2021-09-01 12:49 ` Patrick Steinhardt
2021-08-24 10:37 ` [PATCH v2 7/7] fetch: avoid second connectivity check if we already have all objects Patrick Steinhardt
2021-08-24 22:48 ` [PATCH v2 0/7] Speed up mirror-fetches with many refs Junio C Hamano
2021-08-25 6:04 ` Patrick Steinhardt
2021-08-25 14:27 ` Derrick Stolee
2021-09-01 13:09 ` [PATCH v3 " Patrick Steinhardt
2021-09-01 13:09 ` [PATCH v3 1/7] fetch: speed up lookup of want refs via commit-graph Patrick Steinhardt
2021-09-01 13:09 ` [PATCH v3 2/7] fetch: avoid unpacking headers in object existence check Patrick Steinhardt
2021-09-01 13:09 ` [PATCH v3 3/7] connected: refactor iterator to return next object ID directly Patrick Steinhardt
2021-09-01 13:09 ` [PATCH v3 4/7] fetch-pack: optimize loading of refs via commit graph Patrick Steinhardt
2021-09-01 13:09 ` [PATCH v3 5/7] fetch: refactor fetch refs to be more extendable Patrick Steinhardt
2021-09-01 13:10 ` [PATCH v3 6/7] fetch: merge fetching and consuming refs Patrick Steinhardt
2021-09-01 13:10 ` [PATCH v3 7/7] fetch: avoid second connectivity check if we already have all objects Patrick Steinhardt
2021-09-01 19:58 ` [PATCH v3 0/7] Speed up mirror-fetches with many refs Junio C Hamano
2021-09-08 0:08 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1629800774.git.ps@pks.im \
--to=ps@pks.im \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=l.s.r@web.de \
--cc=peff@peff.net \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).