git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC PATCH] push: perform negotiation before sending packfile
@ 2021-02-18  1:21 Jonathan Tan
  2021-02-18  1:34 ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Tan @ 2021-02-18  1:21 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

The idea of adding negotiation to push has been floating around for a
while. Here's my implementation of my idea of reusing a lot of the fetch
mechanism to perform the negotiation.

The basic idea is that when a client pushes, the client will first
perform the negotiation steps that it normally does during a fetch,
except that it does not send any "want"s and it only uses the commits to
be pushed as negotiation tips (instead of all refs). Once the client has
received enough ACKs that all ancestral paths from all tips to the
original orphans are blocked by ACKed commits, it will proceed with the
push, using this information to determine the contents of the
to-be-pushed packfile. (This check is done by the server when doing a
user-triggered fetch.) This requires a minimal change on the server to
accept an argument that tells the server that it cannot take the
initiative to send the packfile, and to allow the client to not send any
"want"s.

The main features (which are implemented in this patch) are:

 - Minimal change to the fetch-serving part of the server, as described
   (new "wait-for-done" argument that inhibits the server taking the
   initiative to send the packfile, and to allow the client to not send
   any "want"s).
 - No change needed to the push-serving part of the server, although for
   performance, we would want something that can stop the ref
   advertisement since it is no longer needed (perhaps a simplification
   of my earlier work [1]).
 - Reuse of a lot of fetch code.
 - We inherit any fetch negotiation algorithm improvements (e.g. the
   "skipping" negotiator).

What hasn't been done yet:

 - Refactor negotiation tip code that I copied over from builtin/fetch.c
   to builtin/fetch-pack.c.
 - Non-main-path cases (e.g. if the server doesn't support this new
   "wait-for-done")
 - Certain optimizations like avoiding the ls-refs call when negotiating
 - Config option on the client to enable/disable this feature
 - Do we need statistics in the commit message to show the performance
   gains?
 - Anything else that comes up upon more scrutiny

Any comments are welcome, especially if you have ideas about the overall
design or implementation, and/or if you've thought about similar things
before.

[1] https://lore.kernel.org/git/20201109224732.2549561-1-jonathantanmy@google.com/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/fetch-pack.c  | 57 +++++++++++++++++++++++++++++++++++++++++++
 fetch-pack.c          | 49 ++++++++++++++++++++++++++++++++++---
 fetch-pack.h          |  6 +++++
 object.h              |  2 +-
 send-pack.c           | 51 +++++++++++++++++++++++++++++++++++---
 t/t5510-fetch.sh      | 19 +++++++++++++++
 t/t5516-fetch-push.sh | 24 ++++++++++++++++++
 transport.c           |  1 +
 transport.h           |  5 ++++
 upload-pack.c         | 16 +++++++++---
 10 files changed, 217 insertions(+), 13 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 58b7c1fbdc..c8c942b42c 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -5,6 +5,8 @@
 #include "connect.h"
 #include "oid-array.h"
 #include "protocol.h"
+#include "oidset.h"
+#include "refs.h"
 
 static const char fetch_pack_usage[] =
 "git fetch-pack [--all] [--stdin] [--quiet | -q] [--keep | -k] [--thin] "
@@ -40,6 +42,39 @@ static void add_sought_entry(struct ref ***sought, int *nr, int *alloc,
 	(*sought)[*nr - 1] = ref;
 }
 
+static int add_oid(const char *refname, const struct object_id *oid, int flags,
+		   void *cb_data)
+{
+	struct oid_array *oids = cb_data;
+
+	oid_array_append(oids, oid);
+	return 0;
+}
+
+static void add_negotiation_tips(struct fetch_pack_args *args, struct string_list *negotiation_tip)
+{
+	struct oid_array *oids = xcalloc(1, sizeof(*oids));
+	int i;
+
+	for (i = 0; i < negotiation_tip->nr; i++) {
+		const char *s = negotiation_tip->items[i].string;
+		int old_nr;
+		if (!has_glob_specials(s)) {
+			struct object_id oid;
+			if (get_oid(s, &oid))
+				die("%s is not a valid object", s);
+			oid_array_append(oids, &oid);
+			continue;
+		}
+		old_nr = oids->nr;
+		for_each_glob_ref(add_oid, s, oids);
+		if (old_nr == oids->nr)
+			warning("Ignoring --negotiation-tip=%s because it does not match any refs",
+				s);
+	}
+	args->negotiation_tips = oids;
+}
+
 int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 {
 	int i, ret;
@@ -54,6 +89,8 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 	struct fetch_pack_args args;
 	struct oid_array shallow = OID_ARRAY_INIT;
 	struct string_list deepen_not = STRING_LIST_INIT_DUP;
+	struct string_list negotiation_tip = STRING_LIST_INIT_DUP;
+	struct oidset acked_commits = OIDSET_INIT;
 	struct packet_reader reader;
 	enum protocol_version version;
 
@@ -161,10 +198,20 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 			list_objects_filter_set_no_filter(&args.filter_options);
 			continue;
 		}
+		if (skip_prefix(arg, "--negotiation-tip=", &arg)) {
+			string_list_append(&negotiation_tip, arg);
+			continue;
+		}
+		if (!strcmp("--negotiate-only", arg)) {
+			args.acked_commits = &acked_commits;
+			continue;
+		}
 		usage(fetch_pack_usage);
 	}
 	if (deepen_not.nr)
 		args.deepen_not = &deepen_not;
+	if (negotiation_tip.nr)
+		add_negotiation_tips(&args, &negotiation_tip);
 
 	if (i < argc)
 		dest = argv[i++];
@@ -232,6 +279,16 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 
 	ref = fetch_pack(&args, fd, ref, sought, nr_sought,
 			 &shallow, pack_lockfiles_ptr, version);
+	if (args.acked_commits) {
+		struct oidset_iter iter;
+		const struct object_id *oid;
+		oidset_iter_init(args.acked_commits, &iter);
+
+		while ((oid = oidset_iter_next(&iter))) {
+			printf("%s\n", oid_to_hex(oid));
+		}
+		return 0;
+	}
 	if (pack_lockfiles.nr) {
 		int i;
 
diff --git a/fetch-pack.c b/fetch-pack.c
index 1eaedcb5dc..4d2c6119f1 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -23,6 +23,7 @@
 #include "fetch-negotiator.h"
 #include "fsck.h"
 #include "shallow.h"
+#include "commit-reach.h"
 
 static int transfer_unpack_limit = -1;
 static int fetch_unpack_limit = -1;
@@ -44,6 +45,8 @@ static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
 /* Remember to update object flag allocation in object.h */
 #define COMPLETE	(1U << 0)
 #define ALTERNATE	(1U << 1)
+#define COMMON		(1U << 6)
+#define REACH_SCRATCH	(1U << 7)
 
 /*
  * After sending this many "have"s if we do not get any new ACK , we
@@ -1224,6 +1227,8 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
 		packet_buf_write(&req_buf, "ofs-delta");
 	if (sideband_all)
 		packet_buf_write(&req_buf, "sideband-all");
+	if (args->acked_commits)
+		packet_buf_write(&req_buf, "wait-for-done");
 
 	/* Add shallow-info and deepen request */
 	if (server_supports_feature("fetch", "shallow", 0))
@@ -1324,12 +1329,15 @@ enum common_found {
 	 * "ready" was received, indicating that the server is ready to send
 	 * the packfile without any further negotiation.
 	 */
-	READY
+	READY,
+
+	CLOSURE
 };
 
 static enum common_found process_acks(struct fetch_negotiator *negotiator,
 				      struct packet_reader *reader,
-				      struct oidset *common)
+				      struct oidset *common,
+				      struct object_array *negotiation_tips)
 {
 	/* received */
 	int received_ready = 0;
@@ -1351,6 +1359,8 @@ static enum common_found process_acks(struct fetch_negotiator *negotiator,
 				commit = lookup_commit(the_repository, &oid);
 				if (negotiator)
 					negotiator->ack(negotiator, commit);
+				if (negotiation_tips)
+					commit->object.flags |= COMMON;
 			}
 			continue;
 		}
@@ -1379,6 +1389,13 @@ static enum common_found process_acks(struct fetch_negotiator *negotiator,
 	if (!received_ready && reader->status != PACKET_READ_FLUSH)
 		die(_("expected no other sections to be sent after no 'ready'"));
 
+	if (negotiation_tips) {
+		if (can_all_from_reach_with_flag(negotiation_tips, COMMON,
+						    REACH_SCRATCH, 0,
+						    GENERATION_NUMBER_ZERO))
+			return CLOSURE;
+	}
+
 	return received_ready ? READY :
 		(received_ack ? COMMON_FOUND : NO_COMMON_FOUND);
 }
@@ -1509,6 +1526,14 @@ static void do_check_stateless_delimiter(const struct fetch_pack_args *args,
 				  _("git fetch-pack: expected response end packet"));
 }
 
+static int add_to_object_array(const struct object_id *oid, void *data)
+{
+	struct object_array *a = data;
+
+	add_object_array(parse_object(the_repository, oid), "", a);
+	return 0;
+}
+
 static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 				    int fd[2],
 				    const struct ref *orig_ref,
@@ -1527,6 +1552,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	struct fetch_negotiator negotiator_alloc;
 	struct fetch_negotiator *negotiator;
 	int seen_ack = 0;
+	struct object_array negotiation_tips_object_array = OBJECT_ARRAY_INIT;
 	struct string_list packfile_uris = STRING_LIST_INIT_DUP;
 	int i;
 
@@ -1542,6 +1568,11 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 		reader.me = "fetch-pack";
 	}
 
+	if (args->acked_commits)
+		oid_array_for_each((struct oid_array *) args->negotiation_tips,
+				   add_to_object_array,
+				   &negotiation_tips_object_array);
+
 	while (state != FETCH_DONE) {
 		switch (state) {
 		case FETCH_CHECK_LOCAL:
@@ -1557,7 +1588,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 			/* Filter 'ref' by 'sought' and those that aren't local */
 			mark_complete_and_common_ref(negotiator, args, &ref);
 			filter_refs(args, &ref, sought, nr_sought);
-			if (everything_local(args, &ref))
+			if (!args->acked_commits && everything_local(args, &ref))
 				state = FETCH_DONE;
 			else
 				state = FETCH_SEND_REQUEST;
@@ -1574,6 +1605,11 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 						    the_repository);
 			}
 			if (send_fetch_request(negotiator, fd[1], args, ref,
+					       /*
+						* If using acked_commits, we
+						* want an empty oidset here, so
+						* &common is used in all cases.
+						*/
 					       &common,
 					       &haves_to_send, &in_vain,
 					       reader.use_sideband,
@@ -1584,7 +1620,12 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 			break;
 		case FETCH_PROCESS_ACKS:
 			/* Process ACKs/NAKs */
-			switch (process_acks(negotiator, &reader, &common)) {
+			switch (process_acks(negotiator, &reader,
+					     args->acked_commits ? args->acked_commits : &common,
+					     args->acked_commits ? &negotiation_tips_object_array : NULL)) {
+			case CLOSURE:
+				state = FETCH_DONE;
+				break;
 			case READY:
 				/*
 				 * Don't check for response delimiter; get_pack() will
diff --git a/fetch-pack.h b/fetch-pack.h
index 736a3dae46..3fc8ba6126 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -23,6 +23,12 @@ struct fetch_pack_args {
 	 */
 	const struct oid_array *negotiation_tips;
 
+	/*
+	 * If not NULL, fetch-pack will not fetch any packs but will only store
+	 * in here commits ACKed by the server during negotiation.
+	 */
+	struct oidset *acked_commits;
+
 	unsigned deepen_relative:1;
 	unsigned quiet:1;
 	unsigned keep_pack:1;
diff --git a/object.h b/object.h
index 59daadce21..4806fa8e66 100644
--- a/object.h
+++ b/object.h
@@ -60,7 +60,7 @@ struct object_array {
 /*
  * object flag allocation:
  * revision.h:               0---------10         15             23------26
- * fetch-pack.c:             01
+ * fetch-pack.c:             01    67
  * negotiator/default.c:       2--5
  * walker.c:                 0-2
  * upload-pack.c:                4       11-----14  16-----19
diff --git a/send-pack.c b/send-pack.c
index 9045f8a082..f846b408c7 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -56,7 +56,7 @@ static void feed_object(const struct object_id *oid, FILE *fh, int negative)
 /*
  * Make a pack stream and spit it out into file descriptor fd
  */
-static int pack_objects(int fd, struct ref *refs, struct oid_array *extra, struct send_pack_args *args)
+static int pack_objects(int fd, struct ref *refs, struct oid_array *advertised, struct oid_array *negotiated, struct send_pack_args *args)
 {
 	/*
 	 * The child becomes pack-objects --revs; we feed
@@ -94,8 +94,10 @@ static int pack_objects(int fd, struct ref *refs, struct oid_array *extra, struc
 	 * parameters by writing to the pipe.
 	 */
 	po_in = xfdopen(po.in, "w");
-	for (i = 0; i < extra->nr; i++)
-		feed_object(&extra->oid[i], po_in, 1);
+	for (i = 0; i < advertised->nr; i++)
+		feed_object(&advertised->oid[i], po_in, 1);
+	for (i = 0; i < negotiated->nr; i++)
+		feed_object(&negotiated->oid[i], po_in, 1);
 
 	while (refs) {
 		if (!is_null_oid(&refs->old_oid))
@@ -409,11 +411,49 @@ static void reject_invalid_nonce(const char *nonce, int len)
 	}
 }
 
+static void get_commons_through_negotiation(const char *url,
+					    const struct ref *remote_refs,
+					    struct oid_array *commons)
+{
+	struct child_process child = CHILD_PROCESS_INIT;
+	const struct ref *ref;
+	int len = the_hash_algo->hexsz + 1; /* hash + NL */
+
+	child.git_cmd = 1;
+	child.no_stdin = 1;
+	child.out = -1;
+	strvec_pushl(&child.args, "fetch-pack", "--negotiate-only", NULL);
+	for (ref = remote_refs; ref; ref = ref->next)
+		strvec_pushf(&child.args, "--negotiation-tip=%s", oid_to_hex(&ref->new_oid));
+	strvec_push(&child.args, url);
+
+	if (start_command(&child))
+		die(_("send-pack: unable to fork off fetch-pack subprocess"));
+
+	do {
+		char hex_hash[GIT_MAX_HEXSZ + 1];
+		int read_len = read_in_full(child.out, hex_hash, len);
+		struct object_id oid;
+		const char *end;
+
+		if (!read_len)
+			return;
+		if (read_len != len)
+			die("invalid length read %d", read_len);
+		if (parse_oid_hex(hex_hash, &oid, &end) || *end != '\n')
+			die("invalid hash");
+		oid_array_append(commons, &oid);
+	} while (1);
+
+	finish_command(&child);
+}
+
 int send_pack(struct send_pack_args *args,
 	      int fd[], struct child_process *conn,
 	      struct ref *remote_refs,
 	      struct oid_array *extra_have)
 {
+	struct oid_array commons = OID_ARRAY_INIT;
 	int in = fd[0];
 	int out = fd[1];
 	struct strbuf req_buf = STRBUF_INIT;
@@ -437,6 +477,9 @@ int send_pack(struct send_pack_args *args,
 	const char *push_cert_nonce = NULL;
 	struct packet_reader reader;
 
+	if (getenv("GIT_TEST_PUSH_NEGOTIATION"))
+		get_commons_through_negotiation(args->url, remote_refs, &commons);
+
 	git_config_get_bool("transfer.advertisesid", &advertise_sid);
 
 	/* Does the other end support the reporting? */
@@ -625,7 +668,7 @@ int send_pack(struct send_pack_args *args,
 			   PACKET_READ_DIE_ON_ERR_PACKET);
 
 	if (need_pack_data && cmds_sent) {
-		if (pack_objects(out, remote_refs, extra_have, args) < 0) {
+		if (pack_objects(out, remote_refs, extra_have, &commons, args) < 0) {
 			if (args->stateless_rpc)
 				close(out);
 			if (git_connection_is_socket(conn))
diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
index 42f5503004..3b87a411a5 100755
--- a/t/t5510-fetch.sh
+++ b/t/t5510-fetch.sh
@@ -1214,6 +1214,25 @@ test_expect_success '--negotiation-tip understands abbreviated SHA-1' '
 	check_negotiation_tip
 '
 
+test_expect_success '--negotiate-only' '
+	rm -rf server client trace &&
+
+	git init server &&
+	test_commit -C server one &&
+	test_commit -C server two &&
+
+	git clone "file://$(pwd)/server" client &&
+	test_commit -C client three &&
+
+	git -C client for-each-ref &&
+	git -C client fetch-pack \
+		--negotiate-only \
+		--negotiation-tip=$(git -C client rev-parse HEAD) \
+		"file://$(pwd)/server" >out &&
+	COMMON=$(git -C server rev-parse two) &&
+	grep "$COMMON" out
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh
index 15262b4192..684f4a7fc8 100755
--- a/t/t5516-fetch-push.sh
+++ b/t/t5516-fetch-push.sh
@@ -191,6 +191,30 @@ test_expect_success 'fetch with pushInsteadOf (should not rewrite)' '
 	)
 '
 
+grep_wrote () {
+	object_count=$1
+	file_name=$2
+	grep 'write_pack_file/wrote.*"value":"'$1'"' $2
+}
+
+
+test_expect_success 'push with negotiation' '
+	mk_empty testrepo &&
+	git push testrepo $the_first_commit:refs/remotes/origin/first_commit &&
+	git -C testrepo config receive.hideRefs refs/remotes/origin/first_commit &&
+	echo now pushing without negotiation &&
+	GIT_TRACE2_EVENT="$(pwd)/event" git push testrepo refs/heads/main:refs/remotes/origin/main &&
+	grep_wrote 5 event && # 2 commits, 2 trees, 1 blob
+
+	rm event &&
+	mk_empty testrepo &&
+	git push testrepo $the_first_commit:refs/remotes/origin/first_commit &&
+	git -C testrepo config receivePack.hideRefs refs/remotes/origin/first_commit &&
+	echo now pushing with negotiation &&
+	GIT_TEST_PUSH_NEGOTIATION=1 GIT_TRACE2_EVENT="$(pwd)/event" git push testrepo refs/heads/main:refs/remotes/origin/main &&
+	grep_wrote 2 event # 1 commit, 1 tree
+'
+
 test_expect_success 'push without wildcard' '
 	mk_empty testrepo &&
 
diff --git a/transport.c b/transport.c
index 679a35e7c1..0c2925bb12 100644
--- a/transport.c
+++ b/transport.c
@@ -370,6 +370,7 @@ static int fetch_refs_via_pack(struct transport *transport,
 	args.stateless_rpc = transport->stateless_rpc;
 	args.server_options = transport->server_options;
 	args.negotiation_tips = data->options.negotiation_tips;
+	args.acked_commits = data->options.acked_commits;
 
 	if (!data->got_remote_heads) {
 		int i;
diff --git a/transport.h b/transport.h
index 24558c027d..b6659f8aa6 100644
--- a/transport.h
+++ b/transport.h
@@ -46,6 +46,11 @@ struct git_transport_options {
 	 * transport_set_option().
 	 */
 	struct oid_array *negotiation_tips;
+
+	/*
+	 * See fetch-pack.h for more information.
+	 */
+	struct oidset *acked_commits;
 };
 
 enum transport_family {
diff --git a/upload-pack.c b/upload-pack.c
index 4ab55ce2b5..bdca9daf7d 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -103,6 +103,7 @@ struct upload_pack_data {
 	unsigned use_ofs_delta : 1;
 	unsigned no_progress : 1;
 	unsigned use_include_tag : 1;
+	unsigned wait_for_done : 1;
 	unsigned allow_filter : 1;
 	unsigned allow_filter_fallback : 1;
 	unsigned long tree_filter_max_depth;
@@ -1503,6 +1504,10 @@ static void process_args(struct packet_reader *request,
 			data->done = 1;
 			continue;
 		}
+		if (!strcmp(arg, "wait-for-done")) {
+			data->wait_for_done = 1;
+			continue;
+		}
 
 		/* Shallow related arguments */
 		if (process_shallow(arg, &data->shallows))
@@ -1585,7 +1590,7 @@ static int send_acks(struct upload_pack_data *data, struct oid_array *acks)
 				    oid_to_hex(&acks->oid[i]));
 	}
 
-	if (ok_to_give_up(data)) {
+	if (!data->wait_for_done && ok_to_give_up(data)) {
 		/* Send Ready */
 		packet_writer_write(&data->writer, "ready\n");
 		return 1;
@@ -1675,10 +1680,13 @@ int upload_pack_v2(struct repository *r, struct strvec *keys,
 		case FETCH_PROCESS_ARGS:
 			process_args(request, &data);
 
-			if (!data.want_obj.nr) {
+			if (!data.want_obj.nr && !data.wait_for_done) {
 				/*
-				 * Request didn't contain any 'want' lines,
-				 * guess they didn't want anything.
+				 * Request didn't contain any 'want' lines (and
+				 * the request does not contain
+				 * "wait-for-done", in which it is reasonable
+				 * to just send 'have's without 'want's); guess
+				 * they didn't want anything.
 				 */
 				state = FETCH_DONE;
 			} else if (data.haves.nr) {
-- 
2.30.0.478.g8a0d178c01-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] push: perform negotiation before sending packfile
  2021-02-18  1:21 [RFC PATCH] push: perform negotiation before sending packfile Jonathan Tan
@ 2021-02-18  1:34 ` Junio C Hamano
  2021-02-18 20:25   ` Junio C Hamano
  2021-02-18 23:02   ` Jonathan Tan
  0 siblings, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2021-02-18  1:34 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> The idea of adding negotiation to push has been floating around for a
> while. Here's my implementation of my idea of reusing a lot of the fetch
> mechanism to perform the negotiation.

Finally? Yay!

> The basic idea is that when a client pushes, the client will first
> perform the negotiation steps that it normally does during a fetch,
> except that it does not send any "want"s and it only uses the commits to
> be pushed as negotiation tips (instead of all refs). Once the client has
> received enough ACKs that all ancestral paths from all tips to the
> original orphans are blocked by ACKed commits, it will proceed with the
> push, using this information to determine the contents of the
> to-be-pushed packfile. (This check is done by the server when doing a
> user-triggered fetch.)

So when pushing 'HEAD' to some ref, we say "I have HEAD^{commit},
HEAD^^, HEAD^^^, ..." and they keep saying "never heard of it" for
each of them until they find "ah, I know that one" with an ACK, at
which point we can stop traversing our side of the history behind
that acked commit (because everything behind it is common between
us). And that way, we know what we do not have to send (i.e. what
we should use as the negative ends of "rev-list --objects A..B";
their ACK lets us discover "A").

Do we take advantage of the ref advertisement the other side
perform, or is this v2 only and we even skip ls-refs?

What do you mean by an "orphan", though?  Except for that part, I
think what you wrote the above makes quite a lot of sense.

When we have an "--allow-unrelated-histories" merge with a history
they've never heard of, we'd end up digging down to the root of the
unrelated side history with "have/nack" exchange.  On the fetch
side, we have "give up with too many nack" band-aid.  Do we inherit
the same from the fetch side?

>  - Do we need statistics in the commit message to show the performance
>    gains?

Not until we see the thing fully working, I would say.

>  - Anything else that comes up upon more scrutiny
>
> Any comments are welcome, especially if you have ideas about the overall
> design or implementation, and/or if you've thought about similar things
> before.

I'll have more comments after reading the code, but that will happen
much later tonight.

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] push: perform negotiation before sending packfile
  2021-02-18  1:34 ` Junio C Hamano
@ 2021-02-18 20:25   ` Junio C Hamano
  2021-02-18 21:33     ` Junio C Hamano
  2021-02-19  0:42     ` Jonathan Tan
  2021-02-18 23:02   ` Jonathan Tan
  1 sibling, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2021-02-18 20:25 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

>> Any comments are welcome, especially if you have ideas about the overall
>> design or implementation, and/or if you've thought about similar things
>> before.

I think that it is a good way to prototype by adding a separate
session with the upload-pack to only learn about what is common
before talking to the receive-pack, presumably running on the same
repository and may share common view with the previous upload-pack.

And the code seems to implement the idea just fine.  I wonder how
much we can cut out the unnecessary resource usage in pathological
cases with this approach.

 - If the target ref got rewound, with the current send-pack
   approach, we may not find any common commit, or we may end up
   using a mediocre one---that would result in unnecessary packdata
   transfer, which we may be able to cut down significantly.

   I thought you guys have some server-side hacks to send a bit
   stale commit as a "fake" ref to help with this situation, though.
   So the comparison is really between how much each of these two
   approaches helps.

 - With this approach, we can cut down the initial ref advertisement
   from the receiving end to the minimum.  The protocol with
   negotiation could work correctly without receiving end
   advertising no refs, but I suspect that it would be very common
   in a publishing repository (not a shared everybody-pushes-into
   repository) that the tip of the target repository is known by the
   pusher, and it may help such a case to know where the ref being
   updated with this push originally points at.  But even in such a
   case, additional negotiation may help---the target branch may
   have not advanced, but may of the side branches the pusher have
   merged into the commit being pushed to the target branch may
   already be known by the receiver.

For a real implementation, I think we'd want to do the negotiation
inside the conversation between send-pack and receive-pack, so that
what is agreed to be common between two parties will not shift in
the middle (in the same spirit that upload-pack grabs all the
relevant refs first, advertises them, negotiates what is common and
creates a pack, all using the same worldview of where the tips of
refs are throughout the process, even if some refs change in the
meantime).

Thanks for a fun read.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] push: perform negotiation before sending packfile
  2021-02-18 20:25   ` Junio C Hamano
@ 2021-02-18 21:33     ` Junio C Hamano
  2021-02-19  0:42     ` Jonathan Tan
  1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2021-02-18 21:33 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

>  - With this approach, we can cut down the initial ref advertisement
>    from the receiving end to the minimum.  The protocol with
>    negotiation could work correctly without receiving end
>    advertising no refs, but I suspect that it would be very common
>    in a publishing repository (not a shared everybody-pushes-into
>    repository) that the tip of the target repository is known by the
>    pusher, and it may help such a case to know where the ref being
>    updated with this push originally points at.  But even in such a
>    case, additional negotiation may help---the target branch may
>    have not advanced, but may of the side branches the pusher have
>    merged into the commit being pushed to the target branch may
>    already be known by the receiver.

Addendum.  The original push protocol where the pusher learns all
the tips of refs the receiver has cuts down the resulting packdata
transfer in such a "side branches already known by the receiver,
only the fact that they were merged to the target branch is new"
case.  The pain only starts happening when we try to reduce the
advertisement made by the receiver.

As the "negotiation" is more targetted for the ref that actually is
being pushed (as opposed to the blanket "here is everything I have"
that can get unreasonably big), I think the idea to introduce the
same common ancestor discovery that happens on the "fetch" side, and
to share the code with the "fetch" side that has been battle tested,
is a great one.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] push: perform negotiation before sending packfile
  2021-02-18  1:34 ` Junio C Hamano
  2021-02-18 20:25   ` Junio C Hamano
@ 2021-02-18 23:02   ` Jonathan Tan
  1 sibling, 0 replies; 9+ messages in thread
From: Jonathan Tan @ 2021-02-18 23:02 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > The idea of adding negotiation to push has been floating around for a
> > while. Here's my implementation of my idea of reusing a lot of the fetch
> > mechanism to perform the negotiation.
> 
> Finally? Yay!

Thanks.

> > The basic idea is that when a client pushes, the client will first
> > perform the negotiation steps that it normally does during a fetch,
> > except that it does not send any "want"s and it only uses the commits to
> > be pushed as negotiation tips (instead of all refs). Once the client has
> > received enough ACKs that all ancestral paths from all tips to the
> > original orphans are blocked by ACKed commits, it will proceed with the
> > push, using this information to determine the contents of the
> > to-be-pushed packfile. (This check is done by the server when doing a
> > user-triggered fetch.)
> 
> So when pushing 'HEAD' to some ref, we say "I have HEAD^{commit},
> HEAD^^, HEAD^^^, ..." and they keep saying "never heard of it" for
> each of them until they find "ah, I know that one" with an ACK, at
> which point we can stop traversing our side of the history behind
> that acked commit (because everything behind it is common between
> us). And that way, we know what we do not have to send (i.e. what
> we should use as the negative ends of "rev-list --objects A..B";
> their ACK lets us discover "A").

Yes, that's right.

> Do we take advantage of the ref advertisement the other side
> perform, or is this v2 only and we even skip ls-refs?

My plan is to make it v2-only, but I don't think that there are
technical limitations in adding it to v0. I'm planning to skip ls-refs
(the current proof-of-concept code still calls ls-refs but doesn't use
its results). If we need to take advantage of the ref advertisement, we
could just use push's one.

> What do you mean by an "orphan", though?  Except for that part, I
> think what you wrote the above makes quite a lot of sense.

By "orphan" I meant the commits that don't have any parents - so, the
root commits.

> When we have an "--allow-unrelated-histories" merge with a history
> they've never heard of, we'd end up digging down to the root of the
> unrelated side history with "have/nack" exchange.  On the fetch
> side, we have "give up with too many nack" band-aid.  Do we inherit
> the same from the fetch side?

Yes. (But like fetch, this "in vain" check triggers only after the first
ACK.)

> >  - Do we need statistics in the commit message to show the performance
> >    gains?
> 
> Not until we see the thing fully working, I would say.

OK.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] push: perform negotiation before sending packfile
  2021-02-18 20:25   ` Junio C Hamano
  2021-02-18 21:33     ` Junio C Hamano
@ 2021-02-19  0:42     ` Jonathan Tan
  2021-02-20  3:25       ` Junio C Hamano
  1 sibling, 1 reply; 9+ messages in thread
From: Jonathan Tan @ 2021-02-19  0:42 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> For a real implementation, I think we'd want to do the negotiation
> inside the conversation between send-pack and receive-pack, so that
> what is agreed to be common between two parties will not shift in
> the middle (in the same spirit that upload-pack grabs all the
> relevant refs first, advertises them, negotiates what is common and
> creates a pack, all using the same worldview of where the tips of
> refs are throughout the process, even if some refs change in the
> meantime).

Upload-pack does that for protocol v0 ssh:// and git:// but not
http(s)://, and does not do that for protocol v2, I believe.

If we were to do that, I don't think it would work for the transports
that have are stateless (e.g. HTTP). Also, this seems like it would
involve a significant reworking of how the server serves (receive-pack
would need to know to communicate just like upload-pack does temporarily
before proceeding with its usual behavior, and the client would need to
learn this new way - as opposed to the idea in this patch to do it
separately and reuse the already existing fetch and push mechanisms). I
think the greater complexity is not worth it for something that won't
work in HTTP, but I'm open to other opinions.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] push: perform negotiation before sending packfile
  2021-02-19  0:42     ` Jonathan Tan
@ 2021-02-20  3:25       ` Junio C Hamano
  2021-02-22 20:01         ` Jonathan Tan
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2021-02-20  3:25 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

>> For a real implementation, I think we'd want to do the negotiation
>> inside the conversation between send-pack and receive-pack, so that
>> what is agreed to be common between two parties will not shift in
>> the middle (in the same spirit that upload-pack grabs all the
>> relevant refs first, advertises them, negotiates what is common and
>> creates a pack, all using the same worldview of where the tips of
>> refs are throughout the process, even if some refs change in the
>> meantime).
>
> Upload-pack does that for protocol v0 ssh:// and git:// but not
> http(s)://, and does not do that for protocol v2, I believe.
>
> If we were to do that, I don't think it would work for the transports
> that have are stateless (e.g. HTTP).

Yeah, I consider it a bug in the "stateless" hack, though, and v2
somehow chose to take the common denominator to propagate the same
bug to protocols that are otherwise capable of being stateful.

In any case, I think I heard in another response from you that you
plan to do only v2, and I think that is OK.  Perhaps we can have a
separate service (like 'ls-refs' is a service that can be used
independent of the 'fetch' service in v2, and can be used by
somebody trying to 'push') 'negotiate' that can become a separate
thing, so that "fetch<->upload-pack" conversation would become
ls-refs plus negotiate plus fetch, while "push<->receive-pack"
conversation would become ls-refs plus negotiate plus push?

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] push: perform negotiation before sending packfile
  2021-02-20  3:25       ` Junio C Hamano
@ 2021-02-22 20:01         ` Jonathan Tan
  2021-02-22 21:32           ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Tan @ 2021-02-22 20:01 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> >> For a real implementation, I think we'd want to do the negotiation
> >> inside the conversation between send-pack and receive-pack, so that
> >> what is agreed to be common between two parties will not shift in
> >> the middle (in the same spirit that upload-pack grabs all the
> >> relevant refs first, advertises them, negotiates what is common and
> >> creates a pack, all using the same worldview of where the tips of
> >> refs are throughout the process, even if some refs change in the
> >> meantime).
> >
> > Upload-pack does that for protocol v0 ssh:// and git:// but not
> > http(s)://, and does not do that for protocol v2, I believe.
> >
> > If we were to do that, I don't think it would work for the transports
> > that have are stateless (e.g. HTTP).
> 
> Yeah, I consider it a bug in the "stateless" hack, though, and v2
> somehow chose to take the common denominator to propagate the same
> bug to protocols that are otherwise capable of being stateful.
> 
> In any case, I think I heard in another response from you that you
> plan to do only v2, and I think that is OK.  Perhaps we can have a
> separate service (like 'ls-refs' is a service that can be used
> independent of the 'fetch' service in v2, and can be used by
> somebody trying to 'push') 'negotiate' that can become a separate
> thing, so that "fetch<->upload-pack" conversation would become
> ls-refs plus negotiate plus fetch,

That does make sense conceptually, although the fact that every
negotiate step could potentially also include a packfile (when fetching,
as we do today) makes things more complicated. (Besides the fact that we
would be making another change in the protocol.)

> while "push<->receive-pack"
> conversation would become ls-refs plus negotiate plus push?
> 
> Thanks.

I guess the idea is to have a push that does not start with a ref
advertisement, therefore making everything more modular? That sounds
reasonable (and does mean that if we ever decide that pushes with
negotiate don't need ref advertisement at all, we can just remove the
ls-refs part), but this sounds like it would require some sort of v2 for
push - which is another discussion topic.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] push: perform negotiation before sending packfile
  2021-02-22 20:01         ` Jonathan Tan
@ 2021-02-22 21:32           ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2021-02-22 21:32 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> I guess the idea is to have a push that does not start with a ref
> advertisement, therefore making everything more modular?

Yes, making things modular and reusable would be valuable---if the
fetch side were already structured like I dreamed in the message you
are responding to with a separate 'negotiate' service, the RFC patch
would have looked much nicer.

I am also interested in seeing was not to require a new connection
for an extra roundtrip.

Thanks.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-02-22 21:36 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-18  1:21 [RFC PATCH] push: perform negotiation before sending packfile Jonathan Tan
2021-02-18  1:34 ` Junio C Hamano
2021-02-18 20:25   ` Junio C Hamano
2021-02-18 21:33     ` Junio C Hamano
2021-02-19  0:42     ` Jonathan Tan
2021-02-20  3:25       ` Junio C Hamano
2021-02-22 20:01         ` Jonathan Tan
2021-02-22 21:32           ` Junio C Hamano
2021-02-18 23:02   ` Jonathan Tan

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).