git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Subject: [PATCH 2/2] receive-pack: use advertised reference tips to inform connectivity check
Date: Fri, 28 Oct 2022 16:42:27 +0200	[thread overview]
Message-ID: <006e89f384be1227b922fb6fdc8755ae84cac587.1666967670.git.ps@pks.im> (raw)
In-Reply-To: <cover.1666967670.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 8421 bytes --]

When serving a push, git-receive-pack(1) needs to verify that the
packfile sent by the client contains all objects that are required by
the updated references. This connectivity check works by marking all
preexisting references as uninteresting and using the new reference tips
as starting point for a graph walk.

This strategy has the major downside that it will not require any object
to be sent by the client that is reachable by any of the repositories'
references. While that sounds like it would be indeed what we are after
with the connectivity check, it is arguably not. The administrator that
manages the server-side Git repository may have configured certain refs
to be hidden during the reference advertisement via `transfer.hideRefs`
or `receivepack.hideRefs`. Whatever the reason, the result is that the
client shouldn't expect that any of those hidden references exists on
the remote side, and neither should they assume any of the pointed-to
objects to exist except if referenced by any visible reference. But
because we treat _all_ local refs as uninteresting in the connectivity
check, a client is free to send a packfile that references objects that
are only reachable via a hidden reference on the server-side, and we
will gladly accept it.

Besides the correctness issue there is also a performance issue. Git
forges tend to do internal bookkeeping to keep alive sets of objects for
internal use or make them easy to find via certain references. These
references are typically hidden away from the user so that they are
neither advertised nor writeable. At GitLab, we have one particular
repository that contains a total of 7 million references, of which 6.8
million are indeed internal references. With the current connectivity
check we are forced to load all these references in order to mark them
as uninteresting, and this alone takes around 15 seconds to compute.

We can fix both of these issues by changing the logic for stateful
invocations of git-receive-pack(1) where the reference advertisement and
packfile negotiation are served by the same process. Instead of marking
all preexisting references as unreachable, we will only mark those that
we have announced to the client.

Besides the stated fix to correctness this also provides a huge boost to
performance in the repository mentioned above. Pushing a new commit into
this repo with `transfer.hideRefs` set up to hide 6.8 million of 7 refs
as it is configured in Gitaly leads to an almost 7.5-fold speedup:

    Benchmark 1: main
      Time (mean ± σ):     29.902 s ±  0.105 s    [User: 29.176 s, System: 1.052 s]
      Range (min … max):   29.781 s … 29.969 s    3 runs

    Benchmark 2: pks-connectivity-check-hide-refs
      Time (mean ± σ):      4.033 s ±  0.088 s    [User: 4.071 s, System: 0.374 s]
      Range (min … max):    3.953 s …  4.128 s    3 runs

    Summary
      'pks-connectivity-check-hide-refs' ran
        7.42 ± 0.16 times faster than 'main'

Unfortunately, this change comes with a performance hit when refs are
not hidden. Executed in the same repository:

    Benchmark 1: main
      Time (mean ± σ):     45.780 s ±  0.507 s    [User: 46.908 s, System: 4.838 s]
      Range (min … max):   45.453 s … 46.364 s    3 runs

    Benchmark 2: pks-connectivity-check-hide-refs
      Time (mean ± σ):     49.886 s ±  0.282 s    [User: 51.168 s, System: 5.015 s]
      Range (min … max):   49.589 s … 50.149 s    3 runs

    Summary
      'main' ran
        1.09 ± 0.01 times faster than 'pks-connectivity-check-hide-refs'

This is probably caused by the overhead of reachable tips being passed
in via git-rev-list(1)'s standard input, which seems to be slower than
reading the references from disk.

It is debatable what to do about this. If this were only about improving
performance then it would be trivial to make the new logic depend on
whether or not `transfer.hideRefs` has been configured in the repo. But
as explained this is also about correctness, even though this can be
considered an edge case. Furthermore, this slowdown is really only
noticeable in outliers like the above repository with an unreasonable
amount of refs. The same benchmark in linux-stable.git with about
4500 references shows no measurable difference:

    Benchmark 1: main
      Time (mean ± σ):     375.4 ms ±  25.4 ms    [User: 312.2 ms, System: 155.7 ms]
      Range (min … max):   324.2 ms … 492.9 ms    50 runs

    Benchmark 2: pks-connectivity-check-hide-refs
      Time (mean ± σ):     374.9 ms ±  36.9 ms    [User: 311.6 ms, System: 158.2 ms]
      Range (min … max):   319.2 ms … 583.1 ms    50 runs

    Summary
      'pks-connectivity-check-hide-refs' ran
        1.00 ± 0.12 times faster than 'main'

Let's keep this as-is for the time being and accept the performance hit.
It is arguably extremely noticeable to a user if a push now performs 7.5
times faster than before, but a lot less so in case an already-slow push
becomes about 10% slower.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/receive-pack.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 44bcea3a5b..50794539c6 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -326,13 +326,10 @@ static void show_one_alternate_ref(const struct object_id *oid,
 	show_ref(".have", oid);
 }
 
-static void write_head_info(void)
+static void write_head_info(struct oidset *announced_objects)
 {
-	static struct oidset seen = OIDSET_INIT;
-
-	for_each_ref(show_ref_cb, &seen);
-	for_each_alternate_ref(show_one_alternate_ref, &seen);
-	oidset_clear(&seen);
+	for_each_ref(show_ref_cb, announced_objects);
+	for_each_alternate_ref(show_one_alternate_ref, announced_objects);
 	if (!sent_capabilities)
 		show_ref("capabilities^{}", null_oid());
 
@@ -1896,12 +1893,20 @@ static void execute_commands_atomic(struct command *commands,
 	strbuf_release(&err);
 }
 
+static const struct object_id *iterate_announced_oids(void *cb_data)
+{
+	struct oidset_iter *iter = cb_data;
+	return oidset_iter_next(iter);
+}
+
 static void execute_commands(struct command *commands,
 			     const char *unpacker_error,
 			     struct shallow_info *si,
-			     const struct string_list *push_options)
+			     const struct string_list *push_options,
+			     struct oidset *announced_oids)
 {
 	struct check_connected_options opt = CHECK_CONNECTED_INIT;
+	struct oidset_iter announced_oids_iter;
 	struct command *cmd;
 	struct iterate_data data;
 	struct async muxer;
@@ -1928,6 +1933,12 @@ static void execute_commands(struct command *commands,
 	opt.err_fd = err_fd;
 	opt.progress = err_fd && !quiet;
 	opt.env = tmp_objdir_env(tmp_objdir);
+	if (oidset_size(announced_oids) != 0) {
+		oidset_iter_init(announced_oids, &announced_oids_iter);
+		opt.reachable_oids_fn = iterate_announced_oids;
+		opt.reachable_oids_data = &announced_oids_iter;
+	}
+
 	if (check_connected(iterate_receive_command_list, &data, &opt))
 		set_connectivity_errors(commands, si);
 
@@ -2462,6 +2473,7 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 {
 	int advertise_refs = 0;
 	struct command *commands;
+	struct oidset announced_oids = OIDSET_INIT;
 	struct oid_array shallow = OID_ARRAY_INIT;
 	struct oid_array ref = OID_ARRAY_INIT;
 	struct shallow_info si;
@@ -2524,7 +2536,7 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 	}
 
 	if (advertise_refs || !stateless_rpc) {
-		write_head_info();
+		write_head_info(&announced_oids);
 	}
 	if (advertise_refs)
 		return 0;
@@ -2554,7 +2566,7 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 		}
 		use_keepalive = KEEPALIVE_ALWAYS;
 		execute_commands(commands, unpack_status, &si,
-				 &push_options);
+				 &push_options, &announced_oids);
 		if (pack_lockfile)
 			unlink_or_warn(pack_lockfile);
 		sigchain_push(SIGPIPE, SIG_IGN);
@@ -2591,6 +2603,7 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 		packet_flush(1);
 	oid_array_clear(&shallow);
 	oid_array_clear(&ref);
+	oidset_clear(&announced_oids);
 	free((void *)push_cert_nonce);
 	return 0;
 }
-- 
2.38.1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2022-10-28 14:44 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-28 14:42 [PATCH 0/2] receive-pack: use advertised reference tips to inform connectivity check Patrick Steinhardt
2022-10-28 14:42 ` [PATCH 1/2] connected: allow supplying different view of reachable objects Patrick Steinhardt
2022-10-28 14:54   ` Ævar Arnfjörð Bjarmason
2022-10-28 18:12   ` Junio C Hamano
2022-10-30 18:49     ` Taylor Blau
2022-10-31 13:10     ` Patrick Steinhardt
2022-11-01  1:16       ` Taylor Blau
2022-10-28 14:42 ` Patrick Steinhardt [this message]
2022-10-28 15:01   ` [PATCH 2/2] receive-pack: use advertised reference tips to inform connectivity check Ævar Arnfjörð Bjarmason
2022-10-31 14:21     ` Patrick Steinhardt
2022-10-31 15:36       ` Ævar Arnfjörð Bjarmason
2022-10-30 19:09   ` Taylor Blau
2022-10-31 14:45     ` Patrick Steinhardt
2022-11-01  1:28       ` Taylor Blau
2022-11-01  7:20         ` Patrick Steinhardt
2022-11-01 11:53           ` Patrick Steinhardt
2022-11-02  1:05             ` Taylor Blau
2022-11-01  8:28       ` Jeff King
2022-10-28 16:40 ` [PATCH 0/2] " Junio C Hamano
2022-11-01  1:30 ` Taylor Blau
2022-11-01  9:00 ` Jeff King
2022-11-01 11:49   ` Patrick Steinhardt
2022-11-03 14:37 ` [PATCH v2 0/3] receive-pack: only use visible refs for " Patrick Steinhardt
2022-11-03 14:37   ` [PATCH v2 1/3] refs: get rid of global list of hidden refs Patrick Steinhardt
2022-11-03 14:37   ` [PATCH v2 2/3] revision: add new parameter to specify all visible refs Patrick Steinhardt
2022-11-05 12:46     ` Jeff King
2022-11-07  8:20       ` Patrick Steinhardt
2022-11-08 14:32         ` Jeff King
2022-11-05 12:55     ` Jeff King
2022-11-03 14:37   ` [PATCH v2 3/3] receive-pack: only use visible refs for connectivity check Patrick Steinhardt
2022-11-05  0:40   ` [PATCH v2 0/3] " Taylor Blau
2022-11-05 12:55     ` Jeff King
2022-11-05 12:52   ` Jeff King
2022-11-07 12:16 ` [PATCH v3 0/6] " Patrick Steinhardt
2022-11-07 12:16   ` [PATCH v3 1/6] refs: get rid of global list of hidden refs Patrick Steinhardt
2022-11-07 12:16   ` [PATCH v3 2/6] revision: move together exclusion-related functions Patrick Steinhardt
2022-11-07 12:16   ` [PATCH v3 3/6] revision: introduce struct to handle exclusions Patrick Steinhardt
2022-11-07 12:51     ` Ævar Arnfjörð Bjarmason
2022-11-08  9:11       ` Patrick Steinhardt
2022-11-07 12:16   ` [PATCH v3 4/6] revision: add new parameter to exclude hidden refs Patrick Steinhardt
2022-11-07 13:34     ` Ævar Arnfjörð Bjarmason
2022-11-07 17:07       ` Ævar Arnfjörð Bjarmason
2022-11-08  9:48         ` Patrick Steinhardt
2022-11-08  9:22       ` Patrick Steinhardt
2022-11-08  0:57     ` Taylor Blau
2022-11-08  8:16       ` Patrick Steinhardt
2022-11-08 14:42         ` Jeff King
2022-11-07 12:16   ` [PATCH v3 5/6] revparse: add `--exclude-hidden=` option Patrick Steinhardt
2022-11-08 14:44     ` Jeff King
2022-11-07 12:16   ` [PATCH v3 6/6] receive-pack: only use visible refs for connectivity check Patrick Steinhardt
2022-11-08  0:59   ` [PATCH v3 0/6] " Taylor Blau
2022-11-08 10:03 ` [PATCH v4 " Patrick Steinhardt
2022-11-08 10:03   ` [PATCH v4 1/6] refs: get rid of global list of hidden refs Patrick Steinhardt
2022-11-08 13:36     ` Ævar Arnfjörð Bjarmason
2022-11-08 14:49       ` Patrick Steinhardt
2022-11-08 14:51     ` Jeff King
2022-11-08 10:03   ` [PATCH v4 2/6] revision: move together exclusion-related functions Patrick Steinhardt
2022-11-08 10:03   ` [PATCH v4 3/6] revision: introduce struct to handle exclusions Patrick Steinhardt
2022-11-08 10:03   ` [PATCH v4 4/6] revision: add new parameter to exclude hidden refs Patrick Steinhardt
2022-11-08 15:07     ` Jeff King
2022-11-08 21:13       ` Taylor Blau
2022-11-11  5:48       ` Patrick Steinhardt
2022-11-08 10:03   ` [PATCH v4 5/6] rev-parse: add `--exclude-hidden=` option Patrick Steinhardt
2022-11-08 10:04   ` [PATCH v4 6/6] receive-pack: only use visible refs for connectivity check Patrick Steinhardt
2022-11-11  6:49 ` [PATCH v5 0/7] " Patrick Steinhardt
2022-11-11  6:49   ` [PATCH v5 1/7] refs: fix memory leak when parsing hideRefs config Patrick Steinhardt
2022-11-11  6:49   ` [PATCH v5 2/7] refs: get rid of global list of hidden refs Patrick Steinhardt
2022-11-11  6:50   ` [PATCH v5 3/7] revision: move together exclusion-related functions Patrick Steinhardt
2022-11-11  6:50   ` [PATCH v5 4/7] revision: introduce struct to handle exclusions Patrick Steinhardt
2022-11-11  6:50   ` [PATCH v5 5/7] revision: add new parameter to exclude hidden refs Patrick Steinhardt
2022-11-11  6:50   ` [PATCH v5 6/7] rev-parse: add `--exclude-hidden=` option Patrick Steinhardt
2022-11-11  6:50   ` [PATCH v5 7/7] receive-pack: only use visible refs for connectivity check Patrick Steinhardt
2022-11-11 22:18   ` [PATCH v5 0/7] " Taylor Blau
2022-11-15 17:26     ` Jeff King
2022-11-16 21:22       ` Taylor Blau
2022-11-16 22:04         ` Jeff King
2022-11-16 22:33           ` Taylor Blau
2022-11-17  5:45             ` Patrick Steinhardt
2022-11-17  5:46 ` [PATCH v6 " Patrick Steinhardt
2022-11-17  5:46   ` [PATCH v6 1/7] refs: fix memory leak when parsing hideRefs config Patrick Steinhardt
2022-11-17  5:46   ` [PATCH v6 2/7] refs: get rid of global list of hidden refs Patrick Steinhardt
2022-11-17  5:46   ` [PATCH v6 3/7] revision: move together exclusion-related functions Patrick Steinhardt
2022-11-17  5:46   ` [PATCH v6 4/7] revision: introduce struct to handle exclusions Patrick Steinhardt
2022-11-17  5:46   ` [PATCH v6 5/7] revision: add new parameter to exclude hidden refs Patrick Steinhardt
2022-11-17  5:47   ` [PATCH v6 6/7] rev-parse: add `--exclude-hidden=` option Patrick Steinhardt
2022-11-17  5:47   ` [PATCH v6 7/7] receive-pack: only use visible refs for connectivity check Patrick Steinhardt
2022-11-17 15:03   ` [PATCH v6 0/7] " Jeff King
2022-11-17 21:24     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=006e89f384be1227b922fb6fdc8755ae84cac587.1666967670.git.ps@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).