git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/4] "Push" protocol change proposal: user-specified base
@ 2020-11-03  0:26 Jonathan Tan
  2020-11-03  0:26 ` [PATCH 1/4] connect: refactor building of Extra Parameters Jonathan Tan
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Jonathan Tan @ 2020-11-03  0:26 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

At $DAY_JOB, we've noticed some pushes being larger than necessary.
Here's a first step towards solving that. In this protocol change, we're
allowing the user to specify an ancestor believed to be common; if the
server agrees, the server sends that ancestor as the lone so-called ref
being advertised instead of the full ref advertisement. Not only does
this save bandwidth for the ref advertisement, but this avoids the issue
of the server's refs being ahead of the client (and thus the client
cannot use them since it is unaware of those objects).

More information (including a design doc) can be found in the
documentation patch.

This works for file://, ssh://, and http(s)://.

One last concern I had is the locations of the tests - I just bundled
the protocol v0 ones with their corresponding protocol v1 ones, as I
couldn't find good places to put them.

Jonathan Tan (4):
  connect: refactor building of Extra Parameters
  push: teach --base for ssh:// and file://
  remote-curl: teach --base for http(s)://
  Doc: push with --base

 Documentation/gitremote-helpers.txt        |   8 ++
 Documentation/technical/pack-protocol.txt  |  10 +-
 Documentation/technical/push-with-base.txt |  61 ++++++++++++
 builtin/fetch-pack.c                       |   2 +-
 builtin/push.c                             |  12 ++-
 builtin/receive-pack.c                     |  46 ++++++++-
 builtin/send-pack.c                        |   2 +-
 builtin/upload-pack.c                      |   2 +-
 connect.c                                  |  39 +++++---
 connect.h                                  |   4 +-
 http-backend.c                             |   2 +-
 protocol.c                                 |   5 +-
 protocol.h                                 |   5 +-
 remote-curl.c                              |  25 +++--
 t/t5700-protocol-v1.sh                     | 110 +++++++++++++++++++++
 transport-helper.c                         |  15 +++
 transport.c                                |  14 ++-
 transport.h                                |   9 ++
 18 files changed, 332 insertions(+), 39 deletions(-)
 create mode 100644 Documentation/technical/push-with-base.txt

-- 
2.29.1.341.ge80a0c044ae-goog


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/4] connect: refactor building of Extra Parameters
  2020-11-03  0:26 [PATCH 0/4] "Push" protocol change proposal: user-specified base Jonathan Tan
@ 2020-11-03  0:26 ` Jonathan Tan
  2020-11-03  0:26 ` [PATCH 2/4] push: teach --base for ssh:// and file:// Jonathan Tan
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Jonathan Tan @ 2020-11-03  0:26 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Build the Extra Parameters string ("version=%d") in only one place
(git_connect()) instead of two (git_connect() and push_ssh_options()).
This is in preparation for a subsequent patch, which will change how the
Extra Parameters are built.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 connect.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/connect.c b/connect.c
index 8b8f56cf6d..5221f1764c 100644
--- a/connect.c
+++ b/connect.c
@@ -1201,14 +1201,13 @@ static struct child_process *git_connect_git(int fd[2], char *hostandport,
  */
 static void push_ssh_options(struct strvec *args, struct strvec *env,
 			     enum ssh_variant variant, const char *port,
-			     enum protocol_version version, int flags)
+			     const char *extra_parameters, int flags)
 {
-	if (variant == VARIANT_SSH &&
-	    version > 0) {
+	if (variant == VARIANT_SSH && extra_parameters) {
 		strvec_push(args, "-o");
 		strvec_push(args, "SendEnv=" GIT_PROTOCOL_ENVIRONMENT);
-		strvec_pushf(env, GIT_PROTOCOL_ENVIRONMENT "=version=%d",
-			     version);
+		strvec_pushf(env, GIT_PROTOCOL_ENVIRONMENT "=%s",
+			     extra_parameters);
 	}
 
 	if (flags & CONNECT_IPV4) {
@@ -1261,7 +1260,7 @@ static void push_ssh_options(struct strvec *args, struct strvec *env,
 
 /* Prepare a child_process for use by Git's SSH-tunneled transport. */
 static void fill_ssh_args(struct child_process *conn, const char *ssh_host,
-			  const char *port, enum protocol_version version,
+			  const char *port, const char *extra_parameters,
 			  int flags)
 {
 	const char *ssh;
@@ -1296,14 +1295,14 @@ static void fill_ssh_args(struct child_process *conn, const char *ssh_host,
 		strvec_push(&detect.args, ssh);
 		strvec_push(&detect.args, "-G");
 		push_ssh_options(&detect.args, &detect.env_array,
-				 VARIANT_SSH, port, version, flags);
+				 VARIANT_SSH, port, extra_parameters, flags);
 		strvec_push(&detect.args, ssh_host);
 
 		variant = run_command(&detect) ? VARIANT_SIMPLE : VARIANT_SSH;
 	}
 
 	strvec_push(&conn->args, ssh);
-	push_ssh_options(&conn->args, &conn->env_array, variant, port, version, flags);
+	push_ssh_options(&conn->args, &conn->env_array, variant, port, extra_parameters, flags);
 	strvec_push(&conn->args, ssh_host);
 }
 
@@ -1352,6 +1351,7 @@ struct child_process *git_connect(int fd[2], const char *url,
 	} else {
 		struct strbuf cmd = STRBUF_INIT;
 		const char *const *var;
+		struct strbuf extra_parameters = STRBUF_INIT;
 
 		conn = xmalloc(sizeof(*conn));
 		child_process_init(conn);
@@ -1367,6 +1367,9 @@ struct child_process *git_connect(int fd[2], const char *url,
 		for (var = local_repo_env; *var; var++)
 			strvec_push(&conn->env_array, *var);
 
+		if (version > 0)
+			strbuf_addf(&extra_parameters, "version=%d", version);
+
 		conn->use_shell = 1;
 		conn->in = conn->out = -1;
 		if (protocol == PROTO_SSH) {
@@ -1392,15 +1395,16 @@ struct child_process *git_connect(int fd[2], const char *url,
 				return NULL;
 			}
 			conn->trace2_child_class = "transport/ssh";
-			fill_ssh_args(conn, ssh_host, port, version, flags);
+			fill_ssh_args(conn, ssh_host, port,
+				      extra_parameters.len ? extra_parameters.buf : NULL,
+				      flags);
 		} else {
 			transport_check_allowed("file");
 			conn->trace2_child_class = "transport/file";
-			if (version > 0) {
+			if (extra_parameters.len)
 				strvec_pushf(&conn->env_array,
-					     GIT_PROTOCOL_ENVIRONMENT "=version=%d",
-					     version);
-			}
+					     GIT_PROTOCOL_ENVIRONMENT "=%s",
+					     extra_parameters.buf);
 		}
 		strvec_push(&conn->args, cmd.buf);
 
@@ -1409,6 +1413,7 @@ struct child_process *git_connect(int fd[2], const char *url,
 
 		fd[0] = conn->out; /* read from child's stdout */
 		fd[1] = conn->in;  /* write to child's stdin */
+		strbuf_release(&extra_parameters);
 		strbuf_release(&cmd);
 	}
 	free(hostandport);
-- 
2.29.1.341.ge80a0c044ae-goog


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 2/4] push: teach --base for ssh:// and file://
  2020-11-03  0:26 [PATCH 0/4] "Push" protocol change proposal: user-specified base Jonathan Tan
  2020-11-03  0:26 ` [PATCH 1/4] connect: refactor building of Extra Parameters Jonathan Tan
@ 2020-11-03  0:26 ` Jonathan Tan
  2020-11-03 10:23   ` SZEDER Gábor
                     ` (2 more replies)
  2020-11-03  0:26 ` [PATCH 3/4] remote-curl: teach --base for http(s):// Jonathan Tan
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 21+ messages in thread
From: Jonathan Tan @ 2020-11-03  0:26 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

For the ssh:// and file:// transports, teach push the "--base"
parameter, which indicates an ancestor of the commits to be pushed that
is believed to be known by the server.

If specified, it will be communicated to receive-pack through an Extra
Parameter. If the repository served by receive-pack indeed has that
object, receive-pack will send an abbreviated ref advertisement
containing only that object instead of the usual ref advertisement,
reducing the size of the ref advertisement and, potentially, the size of
the packfile to be sent.

This is supported both on protocol v0 and v1. (Protocol v1 is the same
as protocol v0, except that the client sends "version=1" as an Extra
Parameter and the server responds with "version 1".)

Support for http(s):// will be added in a subsequent patch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/fetch-pack.c   |  2 +-
 builtin/push.c         | 12 +++++--
 builtin/receive-pack.c | 46 +++++++++++++++++++++---
 builtin/send-pack.c    |  2 +-
 builtin/upload-pack.c  |  2 +-
 connect.c              |  8 ++++-
 connect.h              |  4 ++-
 http-backend.c         |  2 +-
 protocol.c             |  5 ++-
 protocol.h             |  5 ++-
 t/t5700-protocol-v1.sh | 80 ++++++++++++++++++++++++++++++++++++++++++
 transport.c            | 14 ++++++--
 transport.h            |  9 +++++
 13 files changed, 175 insertions(+), 16 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 58b7c1fbdc..9452969319 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -207,7 +207,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
 		if (args.diag_url)
 			flags |= CONNECT_DIAG_URL;
 		conn = git_connect(fd, dest, args.uploadpack,
-				   flags);
+				   flags, NULL);
 		if (!conn)
 			return args.diag_url ? 0 : 1;
 	}
diff --git a/builtin/push.c b/builtin/push.c
index 6da3a8e5d3..a9f3278a24 100644
--- a/builtin/push.c
+++ b/builtin/push.c
@@ -381,7 +381,8 @@ static int push_with_options(struct transport *transport, struct refspec *rs,
 
 static int do_push(int flags,
 		   const struct string_list *push_options,
-		   struct remote *remote)
+		   struct remote *remote,
+		   const char *push_base)
 {
 	int i, errs;
 	const char **url;
@@ -405,6 +406,8 @@ static int do_push(int flags,
 				transport_get(remote, url[i]);
 			if (flags & TRANSPORT_PUSH_OPTIONS)
 				transport->push_options = push_options;
+			if (push_base)
+				transport_set_option(transport, TRANS_OPT_PUSH_BASE, push_base);
 			if (push_with_options(transport, push_refspec, flags))
 				errs++;
 		}
@@ -413,6 +416,8 @@ static int do_push(int flags,
 			transport_get(remote, NULL);
 		if (flags & TRANSPORT_PUSH_OPTIONS)
 			transport->push_options = push_options;
+		if (push_base)
+			transport_set_option(transport, TRANS_OPT_PUSH_BASE, push_base);
 		if (push_with_options(transport, push_refspec, flags))
 			errs++;
 	}
@@ -526,6 +531,7 @@ int cmd_push(int argc, const char **argv, const char *prefix)
 	struct string_list *push_options;
 	const struct string_list_item *item;
 	struct remote *remote;
+	const char *push_base = NULL;
 
 	struct option options[] = {
 		OPT__VERBOSITY(&verbosity),
@@ -562,6 +568,8 @@ int cmd_push(int argc, const char **argv, const char *prefix)
 				TRANSPORT_FAMILY_IPV4),
 		OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
 				TRANSPORT_FAMILY_IPV6),
+		OPT_STRING(0, "base", &push_base, N_("revision"),
+			   N_("ancestor of commits to be pushed that is believed to be known by the server")),
 		OPT_END()
 	};
 
@@ -629,7 +637,7 @@ int cmd_push(int argc, const char **argv, const char *prefix)
 		if (strchr(item->string, '\n'))
 			die(_("push options must not have new line characters"));
 
-	rc = do_push(flags, push_options, remote);
+	rc = do_push(flags, push_options, remote, push_base);
 	string_list_clear(&push_options_cmdline, 0);
 	string_list_clear(&push_options_config, 0);
 	if (rc == -1)
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index bb9909c52e..274f39c944 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -313,13 +313,38 @@ static void show_one_alternate_ref(const struct object_id *oid,
 	show_ref(".have", oid);
 }
 
-static void write_head_info(void)
+struct check_connected_for_one_args {
+	struct object_id *base;
+	unsigned already_emitted : 1;
+};
+static int check_connected_for_one(void *args, struct object_id *oid)
+{
+	struct check_connected_for_one_args *a = args;
+
+	if (a->already_emitted)
+		return -1;
+	oidcpy(oid, a->base);
+	a->already_emitted = 1;
+	return 0;
+}
+
+static void write_head_info(struct object_id *base)
 {
 	static struct oidset seen = OIDSET_INIT;
 
+	if (base) {
+		struct check_connected_for_one_args args = {base};
+
+		if (!check_connected(check_connected_for_one, &args, NULL)) {
+			show_ref(".have", base);
+			goto refs_shown;
+		}
+	}
+
 	for_each_ref(show_ref_cb, &seen);
 	for_each_alternate_ref(show_one_alternate_ref, &seen);
 	oidset_clear(&seen);
+refs_shown:
 	if (!sent_capabilities)
 		show_ref("capabilities^{}", &null_oid);
 
@@ -2417,6 +2442,7 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 	struct oid_array ref = OID_ARRAY_INIT;
 	struct shallow_info si;
 	struct packet_reader reader;
+	struct strbuf base_sb = STRBUF_INIT;
 
 	struct option options[] = {
 		OPT__QUIET(&quiet, N_("quiet")),
@@ -2451,7 +2477,7 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 	else if (0 <= receive_unpack_limit)
 		unpack_limit = receive_unpack_limit;
 
-	switch (determine_protocol_version_server()) {
+	switch (determine_protocol_version_server(&base_sb)) {
 	case protocol_v2:
 		/*
 		 * push support for protocol v2 has not been implemented yet,
@@ -2474,10 +2500,21 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 	}
 
 	if (advertise_refs || !stateless_rpc) {
-		write_head_info();
+		if (base_sb.len) {
+			struct object_id oid;
+			const char *p;
+
+			if (parse_oid_hex(base_sb.buf, &oid, &p) || *p != '\0')
+				die("invalid base");
+			write_head_info(&oid);
+		} else {
+			write_head_info(NULL);
+		}
 	}
-	if (advertise_refs)
+	if (advertise_refs) {
+		strbuf_release(&base_sb);
 		return 0;
+	}
 
 	packet_reader_init(&reader, 0, NULL, 0,
 			   PACKET_READ_CHOMP_NEWLINE |
@@ -2540,6 +2577,7 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 	}
 	if (use_sideband)
 		packet_flush(1);
+	strbuf_release(&base_sb);
 	oid_array_clear(&shallow);
 	oid_array_clear(&ref);
 	free((void *)push_cert_nonce);
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 7af148d733..1557db7a42 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -262,7 +262,7 @@ int cmd_send_pack(int argc, const char **argv, const char *prefix)
 		fd[1] = 1;
 	} else {
 		conn = git_connect(fd, dest, receivepack,
-			args.verbose ? CONNECT_VERBOSE : 0);
+			args.verbose ? CONNECT_VERBOSE : 0, NULL);
 	}
 
 	packet_reader_init(&reader, fd[0], NULL, 0,
diff --git a/builtin/upload-pack.c b/builtin/upload-pack.c
index 6da8fa2607..32aaa7742c 100644
--- a/builtin/upload-pack.c
+++ b/builtin/upload-pack.c
@@ -48,7 +48,7 @@ int cmd_upload_pack(int argc, const char **argv, const char *prefix)
 	if (!enter_repo(dir, strict))
 		die("'%s' does not appear to be a git repository", dir);
 
-	switch (determine_protocol_version_server()) {
+	switch (determine_protocol_version_server(NULL)) {
 	case protocol_v2:
 		serve_opts.advertise_capabilities = opts.advertise_refs;
 		serve_opts.stateless_rpc = opts.stateless_rpc;
diff --git a/connect.c b/connect.c
index 5221f1764c..fa5b7ea886 100644
--- a/connect.c
+++ b/connect.c
@@ -1318,7 +1318,8 @@ static void fill_ssh_args(struct child_process *conn, const char *ssh_host,
  * the connection failed).
  */
 struct child_process *git_connect(int fd[2], const char *url,
-				  const char *prog, int flags)
+				  const char *prog, int flags,
+				  const char *other_extra_parameters)
 {
 	char *hostandport, *path;
 	struct child_process *conn;
@@ -1369,6 +1370,11 @@ struct child_process *git_connect(int fd[2], const char *url,
 
 		if (version > 0)
 			strbuf_addf(&extra_parameters, "version=%d", version);
+		if (other_extra_parameters) {
+			if (extra_parameters.len)
+				strbuf_addch(&extra_parameters, ':');
+			strbuf_addstr(&extra_parameters, other_extra_parameters);
+		}
 
 		conn->use_shell = 1;
 		conn->in = conn->out = -1;
diff --git a/connect.h b/connect.h
index c53586e929..6d502d4224 100644
--- a/connect.h
+++ b/connect.h
@@ -7,7 +7,9 @@
 #define CONNECT_DIAG_URL      (1u << 1)
 #define CONNECT_IPV4          (1u << 2)
 #define CONNECT_IPV6          (1u << 3)
-struct child_process *git_connect(int fd[2], const char *url, const char *prog, int flags);
+struct child_process *git_connect(int fd[2], const char *url,
+				  const char *prog, int flags,
+				  const char *other_extra_parameters);
 int finish_connect(struct child_process *conn);
 int git_connection_is_socket(struct child_process *conn);
 int server_supports(const char *feature);
diff --git a/http-backend.c b/http-backend.c
index a03b4bae22..7cd684c8b5 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -544,7 +544,7 @@ static void get_info_refs(struct strbuf *hdr, char *arg)
 		end_headers(hdr);
 
 
-		if (determine_protocol_version_server() != protocol_v2) {
+		if (determine_protocol_version_server(NULL) != protocol_v2) {
 			packet_write_fmt(1, "# service=git-%s\n", svc->name);
 			packet_flush(1);
 		}
diff --git a/protocol.c b/protocol.c
index 052d7edbb9..ad653b7521 100644
--- a/protocol.c
+++ b/protocol.c
@@ -42,7 +42,7 @@ enum protocol_version get_protocol_version_config(void)
 	return protocol_v2;
 }
 
-enum protocol_version determine_protocol_version_server(void)
+enum protocol_version determine_protocol_version_server(struct strbuf *base_sb)
 {
 	const char *git_protocol = getenv(GIT_PROTOCOL_ENVIRONMENT);
 	enum protocol_version version = protocol_v0;
@@ -67,6 +67,9 @@ enum protocol_version determine_protocol_version_server(void)
 				v = parse_protocol_version(value);
 				if (v > version)
 					version = v;
+			} else if (skip_prefix(item->string, "base=", &value)) {
+				if (base_sb)
+					strbuf_addstr(base_sb, value);
 			}
 		}
 
diff --git a/protocol.h b/protocol.h
index cef1a4a01c..e418b92049 100644
--- a/protocol.h
+++ b/protocol.h
@@ -22,8 +22,11 @@ enum protocol_version get_protocol_version_config(void);
  * by setting appropriate values for the key 'version'.  If a client doesn't
  * request a particular protocol version, a default of 'protocol_v0' will be
  * used.
+ *
+ * If base_sb is not NULL and an extra parameter "base" is specified, this
+ * function will append its value to base_sb.
  */
-enum protocol_version determine_protocol_version_server(void);
+enum protocol_version determine_protocol_version_server(struct strbuf *base_sb);
 
 /*
  * Used by a client to determine which protocol version the server is speaking
diff --git a/t/t5700-protocol-v1.sh b/t/t5700-protocol-v1.sh
index 022901b9eb..22459d37f5 100755
--- a/t/t5700-protocol-v1.sh
+++ b/t/t5700-protocol-v1.sh
@@ -146,6 +146,56 @@ test_expect_success 'push with file:// using protocol v1' '
 	grep "push< version 1" log
 '
 
+test_expect_success 'push with file:// using protocol v1 and --base' '
+	test_commit -C file_child four &&
+	COMMON_HASH=$(git -C file_child rev-parse three) &&
+
+	# Push to another branch, as the target repository has the
+	# master branch checked out and we cannot push into it.
+	GIT_TRACE_PACKET=1 git -C file_child -c protocol.version=1 \
+		push --base=three origin HEAD:client_branch_four 2>log &&
+
+	# Server responded using protocol v1
+	grep "push< version 1" log &&
+	# Server advertised only the expected object
+	grep "$COMMON_HASH .have" log
+'
+
+test_expect_success 'push with file:// using protocol v0 and --base' '
+	test_commit -C file_child five &&
+	COMMON_HASH=$(git -C file_child rev-parse four) &&
+
+	# Push to another branch, as the target repository has the
+	# master branch checked out and we cannot push into it.
+	GIT_TRACE_PACKET=1 git -C file_child -c protocol.version=0 \
+		push --base=four origin HEAD:client_branch_five 2>log &&
+
+	# Server did not respond with any version
+	! grep "push< version" log &&
+	# Server advertised only the expected object
+	grep "$COMMON_HASH .have" log
+'
+
+test_expect_success 'push with invalid --base' '
+	test_commit -C file_child six &&
+
+	# Server does not have "six".
+	test_must_fail git -C file_child -c protocol.version=0 \
+		push --base=an_invalid_object origin HEAD:client_branch_six 2>log &&
+	grep "is not a valid object" log
+'
+
+test_expect_success 'push with --base that does not exist on server' '
+	COMMON_HASH=$(git -C file_child rev-parse six) &&
+
+	# The push still succeeds.
+	GIT_TRACE_PACKET=1 git -C file_child -c protocol.version=0 \
+		push --base=six origin HEAD:client_branch_six 2>log &&
+
+	# Server did not advertise "six", since it does not know it
+	! grep "$COMMON_HASH .have" log
+'
+
 # Test protocol v1 with 'ssh://' transport
 #
 test_expect_success 'setup ssh wrapper' '
@@ -226,6 +276,36 @@ test_expect_success 'push with ssh:// using protocol v1' '
 	grep "push< version 1" log
 '
 
+test_expect_success 'push with ssh:// using protocol v1 and --base' '
+	test_commit -C ssh_child four &&
+	COMMON_HASH=$(git -C ssh_child rev-parse three) &&
+
+	# Push to another branch, as the target repository has the
+	# master branch checked out and we cannot push into it.
+	GIT_TRACE_PACKET=1 git -C ssh_child -c protocol.version=1 \
+		push --base="$COMMON_HASH" origin HEAD:client_branch_four 2>log &&
+
+	# Server responded using protocol v1
+	grep "push< version 1" log &&
+	# Server advertised only the expected object
+	grep "$COMMON_HASH .have" log
+'
+
+test_expect_success 'push with ssh:// using protocol v0 and --base' '
+	test_commit -C ssh_child five &&
+	COMMON_HASH=$(git -C ssh_child rev-parse four) &&
+
+	# Push to another branch, as the target repository has the
+	# master branch checked out and we cannot push into it.
+	GIT_TRACE_PACKET=1 git -C ssh_child -c protocol.version=0 \
+		push --base="$COMMON_HASH" origin HEAD:client_branch_five 2>log &&
+
+	# Server did not respond with any version
+	! grep "push< version" log &&
+	# Server advertised only the expected object
+	grep "$COMMON_HASH .have" log
+'
+
 # Test protocol v1 with 'http://' transport
 #
 . "$TEST_DIRECTORY"/lib-httpd.sh
diff --git a/transport.c b/transport.c
index ffe2115845..531ca0a834 100644
--- a/transport.c
+++ b/transport.c
@@ -236,6 +236,10 @@ static int set_git_option(struct git_transport_options *opts,
 		list_objects_filter_die_if_populated(&opts->filter_options);
 		parse_list_objects_filter(&opts->filter_options, value);
 		return 0;
+	} else if (!strcmp(name, TRANS_OPT_PUSH_BASE)) {
+		if (get_oid(value, &opts->push_base))
+			die(_("transport: '%s' is not a valid object"), value);
+		return 0;
 	}
 	return 1;
 }
@@ -244,6 +248,7 @@ static int connect_setup(struct transport *transport, int for_push)
 {
 	struct git_transport_data *data = transport->data;
 	int flags = transport->verbose > 0 ? CONNECT_VERBOSE : 0;
+	char *extra_parameters = NULL;
 
 	if (data->conn)
 		return 0;
@@ -254,11 +259,16 @@ static int connect_setup(struct transport *transport, int for_push)
 	case TRANSPORT_FAMILY_IPV6: flags |= CONNECT_IPV6; break;
 	}
 
+	if (!is_null_oid(&data->options.push_base))
+		extra_parameters = xstrfmt("base=%s",
+					   oid_to_hex(&data->options.push_base));
+
 	data->conn = git_connect(data->fd, transport->url,
 				 for_push ? data->options.receivepack :
 				 data->options.uploadpack,
-				 flags);
+				 flags, extra_parameters);
 
+	free(extra_parameters);
 	return 0;
 }
 
@@ -815,7 +825,7 @@ static int connect_git(struct transport *transport, const char *name,
 {
 	struct git_transport_data *data = transport->data;
 	data->conn = git_connect(data->fd, transport->url,
-				 executable, 0);
+				 executable, 0, NULL);
 	fd[0] = data->fd[0];
 	fd[1] = data->fd[1];
 	return 0;
diff --git a/transport.h b/transport.h
index ca409ea1e4..ce51edd673 100644
--- a/transport.h
+++ b/transport.h
@@ -46,6 +46,12 @@ struct git_transport_options {
 	 * transport_set_option().
 	 */
 	struct oid_array *negotiation_tips;
+
+	/*
+	 * When pushing, if this is not a null OID, indicates an ancestor of
+	 * the commits to be pushed that is believed to be known by the server.
+	 */
+	struct object_id push_base;
 };
 
 enum transport_family {
@@ -208,6 +214,9 @@ void transport_check_allowed(const char *type);
 /* Request atomic (all-or-nothing) updates when pushing */
 #define TRANS_OPT_ATOMIC "atomic"
 
+/* See "push_base" in struct git_transport_options */
+#define TRANS_OPT_PUSH_BASE "push-base"
+
 /**
  * Returns 0 if the option was used, non-zero otherwise. Prints a
  * message to stderr if the option is not used.
-- 
2.29.1.341.ge80a0c044ae-goog


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 3/4] remote-curl: teach --base for http(s)://
  2020-11-03  0:26 [PATCH 0/4] "Push" protocol change proposal: user-specified base Jonathan Tan
  2020-11-03  0:26 ` [PATCH 1/4] connect: refactor building of Extra Parameters Jonathan Tan
  2020-11-03  0:26 ` [PATCH 2/4] push: teach --base for ssh:// and file:// Jonathan Tan
@ 2020-11-03  0:26 ` Jonathan Tan
  2020-11-03  1:13   ` Junio C Hamano
  2020-11-03  0:26 ` [PATCH 4/4] Doc: push with --base Jonathan Tan
  2020-11-03  0:46 ` [PATCH 0/4] "Push" protocol change proposal: user-specified base Junio C Hamano
  4 siblings, 1 reply; 21+ messages in thread
From: Jonathan Tan @ 2020-11-03  0:26 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

For the http(s):// transports, teach push the "--base" parameter, which
indicates an ancestor of the commits to be pushed that is believed to be
known by the server.

This is done through a new remote helper capability and option
"push-base". If a remote helper advertises this capability, then it also
supports the option of the same name. Git can then use this option to
inform the remote helper of a base that the user has specified.

See the commit message of this commit's parent for more information.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/gitremote-helpers.txt |  8 ++++++++
 remote-curl.c                       | 25 +++++++++++++++++-------
 t/t5700-protocol-v1.sh              | 30 +++++++++++++++++++++++++++++
 transport-helper.c                  | 15 +++++++++++++++
 4 files changed, 71 insertions(+), 7 deletions(-)

diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt
index 6f1e269ae4..b4eff16a59 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -116,6 +116,9 @@ Supported commands: 'stateless-connect'.
 +
 Supported commands: 'list for-push', 'push'.
 
+'push-base'::
+	Indicates that this remote helper supports the 'push-base' option.
+
 'export'::
 	Can discover remote refs and push specified objects from a
 	fast-import stream to remote refs.
@@ -541,6 +544,11 @@ set by Git if the remote helper has the 'option' capability.
 If set to an algorithm, indicate that the caller wants to interact with
 the remote side using that algorithm.
 
+'option push-base' <ref>::
+	Only supported if this remote helper advertises the 'push-base'
+	capability. Indicate that <ref> is an ancestor of the commits to be
+	pushed, and it is believed to be known by the server.
+
 SEE ALSO
 --------
 linkgit:git-remote[1]
diff --git a/remote-curl.c b/remote-curl.c
index 32cc4a0c55..7919e4ebcf 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -29,6 +29,7 @@ struct options {
 	struct string_list deepen_not;
 	struct string_list push_options;
 	char *filter;
+	struct object_id push_base;
 	unsigned progress : 1,
 		check_self_contained_and_connected : 1,
 		cloning : 1,
@@ -205,6 +206,10 @@ static int set_option(const char *name, const char *value)
 			options.hash_algo = &hash_algos[algo];
 		}
 		return 0;
+	} else if (!strcmp(name, "push-base")) {
+		if (get_oid(value, &options.push_base))
+			die(_("%s is not a valid object"), value);
+		return 0;
 	} else {
 		return 1 /* unsupported */;
 	}
@@ -364,16 +369,21 @@ static int show_http_message(struct strbuf *type, struct strbuf *charset,
 }
 
 static int get_protocol_http_header(enum protocol_version version,
+				    const struct object_id *base,
 				    struct strbuf *header)
 {
-	if (version > 0) {
+	if (version > 0)
 		strbuf_addf(header, GIT_PROTOCOL_HEADER ": version=%d",
 			    version);
-
-		return 1;
+	if (!is_null_oid(base)) {
+		if (version > 0)
+			strbuf_addch(header, ':');
+		else
+			strbuf_addstr(header, GIT_PROTOCOL_HEADER ": ");
+		strbuf_addf(header, "base=%s", oid_to_hex(base));
 	}
 
-	return 0;
+	return !!(version > 0 || !(is_null_oid(base)));
 }
 
 static void check_smart_http(struct discovery *d, const char *service,
@@ -469,7 +479,7 @@ static struct discovery *discover_refs(const char *service, int for_push)
 		version = protocol_v0;
 
 	/* Add the extra Git-Protocol header */
-	if (get_protocol_http_header(version, &protocol_header))
+	if (get_protocol_http_header(version, &options.push_base, &protocol_header))
 		string_list_append(&extra_headers, protocol_header.buf);
 
 	memset(&http_options, 0, sizeof(http_options));
@@ -1074,7 +1084,7 @@ static int rpc_service(struct rpc_state *rpc, struct discovery *heads,
 	strbuf_addf(&buf, "Accept: application/x-%s-result", svc);
 	rpc->hdr_accept = strbuf_detach(&buf, NULL);
 
-	if (get_protocol_http_header(heads->version, &buf))
+	if (get_protocol_http_header(heads->version, &options.push_base, &buf))
 		rpc->protocol_header = strbuf_detach(&buf, NULL);
 	else
 		rpc->protocol_header = NULL;
@@ -1406,7 +1416,7 @@ static int stateless_connect(const char *service_name)
 	rpc.service_url = xstrfmt("%s%s", url.buf, rpc.service_name);
 	rpc.hdr_content_type = xstrfmt("Content-Type: application/x-%s-request", rpc.service_name);
 	rpc.hdr_accept = xstrfmt("Accept: application/x-%s-result", rpc.service_name);
-	if (get_protocol_http_header(discover->version, &buf)) {
+	if (get_protocol_http_header(discover->version, &options.push_base, &buf)) {
 		rpc.protocol_header = strbuf_detach(&buf, NULL);
 	} else {
 		rpc.protocol_header = NULL;
@@ -1538,6 +1548,7 @@ int cmd_main(int argc, const char **argv)
 			printf("push\n");
 			printf("check-connectivity\n");
 			printf("object-format\n");
+			printf("push-base\n");
 			printf("\n");
 			fflush(stdout);
 		} else if (skip_prefix(buf.buf, "stateless-connect ", &arg)) {
diff --git a/t/t5700-protocol-v1.sh b/t/t5700-protocol-v1.sh
index 22459d37f5..4ee29c1be0 100755
--- a/t/t5700-protocol-v1.sh
+++ b/t/t5700-protocol-v1.sh
@@ -373,6 +373,36 @@ test_expect_success 'push with http:// using protocol v1' '
 	grep "git< version 1" log
 '
 
+test_expect_success 'push with http:// using protocol v1 and --base' '
+	test_commit -C http_child four &&
+	COMMON_HASH=$(git -C http_child rev-parse three) &&
+
+	# Push to another branch, as the target repository has the
+	# master branch checked out and we cannot push into it.
+	GIT_TRACE_PACKET=1 test_might_fail git -C http_child -c protocol.version=1 \
+		push --base=three origin HEAD:client_branch_four 2>log &&
+
+	# Server responded using protocol v1
+	grep "git< version 1" log &&
+	# Server advertised only the expected object
+	grep "$COMMON_HASH .have" log
+'
+
+test_expect_success 'push with http:// using protocol v0 and --base' '
+	test_commit -C http_child five &&
+	COMMON_HASH=$(git -C http_child rev-parse four) &&
+
+	# Push to another branch, as the target repository has the
+	# master branch checked out and we cannot push into it.
+	GIT_TRACE_PACKET=1 git -C http_child -c protocol.version=0 \
+		push --base=four origin HEAD:client_branch_five 2>log &&
+
+	# Server did not respond with any version
+	! grep "git< version" log &&
+	# Server advertised only the expected object
+	grep "$COMMON_HASH .have" log
+'
+
 # DO NOT add non-httpd-specific tests here, because the last part of this
 # test script is only executed when httpd is available and enabled.
 
diff --git a/transport-helper.c b/transport-helper.c
index b573b6c730..15d9527419 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -27,6 +27,7 @@ struct helper_data {
 		export : 1,
 		option : 1,
 		push : 1,
+		push_base : 1,
 		connect : 1,
 		stateless_connect : 1,
 		signed_tags : 1,
@@ -186,6 +187,8 @@ static struct child_process *get_helper(struct transport *transport)
 			data->option = 1;
 		else if (!strcmp(capname, "push"))
 			data->push = 1;
+		else if (!strcmp(capname, "push-base"))
+			data->push_base = 1;
 		else if (!strcmp(capname, "import"))
 			data->import = 1;
 		else if (!strcmp(capname, "bidi-import"))
@@ -1183,6 +1186,18 @@ static struct ref *get_refs_list_using_list(struct transport *transport,
 			exit(128);
 	}
 
+	if (!is_null_oid(&data->transport_options.push_base)) {
+		if (data->push_base) {
+			write_str_in_full(helper->in, "option push-base ");
+			write_str_in_full(helper->in, oid_to_hex(&data->transport_options.push_base));
+			write_str_in_full(helper->in, "\n");
+			if (recvline(data, &buf) || strcmp(buf.buf, "ok"))
+				exit(128);
+		} else {
+			warning(_("transport does not support --base"));
+		}
+	}
+
 	if (data->push && for_push)
 		write_str_in_full(helper->in, "list for-push\n");
 	else
-- 
2.29.1.341.ge80a0c044ae-goog


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 4/4] Doc: push with --base
  2020-11-03  0:26 [PATCH 0/4] "Push" protocol change proposal: user-specified base Jonathan Tan
                   ` (2 preceding siblings ...)
  2020-11-03  0:26 ` [PATCH 3/4] remote-curl: teach --base for http(s):// Jonathan Tan
@ 2020-11-03  0:26 ` Jonathan Tan
  2020-11-03  5:35   ` Jonathan Nieder
  2020-11-03 13:53   ` Derrick Stolee
  2020-11-03  0:46 ` [PATCH 0/4] "Push" protocol change proposal: user-specified base Junio C Hamano
  4 siblings, 2 replies; 21+ messages in thread
From: Jonathan Tan @ 2020-11-03  0:26 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Includes protocol documentation and a design document.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/technical/pack-protocol.txt  | 10 ++--
 Documentation/technical/push-with-base.txt | 61 ++++++++++++++++++++++
 2 files changed, 68 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/technical/push-with-base.txt

diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index e13a2c064d..0485616701 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -59,9 +59,9 @@ Parameters", and are supported by the Git, SSH, and HTTP protocols.
 Each Extra Parameter takes the form of `<key>=<value>` or `<key>`.
 
 Servers that receive any such Extra Parameters MUST ignore all
-unrecognized keys. Currently, the only Extra Parameter recognized is
-"version" with a value of '1' or '2'.  See protocol-v2.txt for more
-information on protocol version 2.
+unrecognized keys. Currently, the only Extra Parameters recognized are
+"version" with a value of '1' or '2' and, for push, "base" with an OID.  See
+protocol-v2.txt for more information on protocol version 2.
 
 Git Transport
 -------------
@@ -506,6 +506,10 @@ real difference is that the capability listing is different - the only
 possible values are 'report-status', 'report-status-v2', 'delete-refs',
 'ofs-delta', 'atomic' and 'push-options'.
 
+If a "base=<oid>" Extra Parameter was sent by the client, and the
+server recognizes that object, the server MAY send "<oid> .have" in
+lieu of all the reference obj-ids and names.
+
 Reference Update Request and Packfile Transfer
 ----------------------------------------------
 
diff --git a/Documentation/technical/push-with-base.txt b/Documentation/technical/push-with-base.txt
new file mode 100644
index 0000000000..d56aa7f900
--- /dev/null
+++ b/Documentation/technical/push-with-base.txt
@@ -0,0 +1,61 @@
+Push with base design notes
+===========================
+
+This feature allows clients, when pushing, to indicate that a
+certain object is an ancestor of all pushed commits and that they
+believe that the server knows of this object. This in turn allows
+servers to send an abbreviated ref advertisement containing only that
+object.
+
+Besides bandwidth savings, this also ensures that the ref
+advertisement contains information relevant to the client. For
+example, at least one project (Gerrit [1]) have included workarounds
+to send ancestors of refs that move often, even though the ref
+advertisement is only meant to contain refs.
+
+[1] https://gerrit.googlesource.com/gerrit/+/refs/heads/master/java/com/google/gerrit/server/git/receive/HackPushNegotiateHook.java
+
+
+Design overview
+---------------
+
+The "base" being sent is sent as an Extra Parameter, supported in the
+git://, ssh://, and http(s):// protocols. By sending it as an Extra
+Parameter, the server is aware of this parameter before it generates
+the ref advertisement, thus making it able to tailor the ref
+advertisement accordingly. Sending it as an Extra Parameter also makes
+this protocol backwards-compatible, as servers will ignore any Extra
+Parameters they do not understand. (The push will then proceed as if
+neither party had this feature.)
+
+The remote helper protocol has been extended to support the
+"push-base" capability and an option of the same name. When a remote
+helper advertises this capability, it thus indicates that it supports
+this option. Git then will send "option push-base" if the user
+specifies it when invoking "git push".
+
+The remote-curl remote helper bundled with Git has been updated to
+support this capability and option.
+
+
+Future work
+-----------
+
+In the future, we might want a way to automatically determine the base
+instead of always having the user specify it. However, this does not
+make obsolete any of the current work - once the base is automatically
+determined, we still need this protocol to communicate it to the
+server, and allowing the user to specify the base manually is still
+useful.
+
+
+Alternatives
+------------
+
+- Making a more substantial protocol change like "fetch" protocol v2.
+  This would eliminate the need for some of the remote helper updates;
+  as part of the protocol change, the protocol could be made to
+  support "stateless-connect" and thus no remote helper updates (like
+  "push-base") would be needed. For "fetch", the protocol change has
+  enabled features like wanted-refs and packfile-uris, but I do not
+  have any similar ideas in mind for "push".
-- 
2.29.1.341.ge80a0c044ae-goog


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/4] "Push" protocol change proposal: user-specified base
  2020-11-03  0:26 [PATCH 0/4] "Push" protocol change proposal: user-specified base Jonathan Tan
                   ` (3 preceding siblings ...)
  2020-11-03  0:26 ` [PATCH 4/4] Doc: push with --base Jonathan Tan
@ 2020-11-03  0:46 ` Junio C Hamano
  4 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2020-11-03  0:46 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> At $DAY_JOB, we've noticed some pushes being larger than necessary.
> Here's a first step towards solving that. In this protocol change, we're
> allowing the user to specify an ancestor believed to be common

Is this designed to be a short-term kludge until we have a similar
"common discovery" negotiation we have on the upload-pack side, or
do we know some fundamental reason why the "push" side cannot do so?

Just being curious as to what the eventual and ideal future beyond
"we must have an immediate problem worked around in some way even
with an ugly hack before the end of this cycle" would be.

Will take a look but not today.

Thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/4] remote-curl: teach --base for http(s)://
  2020-11-03  0:26 ` [PATCH 3/4] remote-curl: teach --base for http(s):// Jonathan Tan
@ 2020-11-03  1:13   ` Junio C Hamano
  0 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2020-11-03  1:13 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> diff --git a/t/t5700-protocol-v1.sh b/t/t5700-protocol-v1.sh
> index 22459d37f5..4ee29c1be0 100755
> --- a/t/t5700-protocol-v1.sh
> +++ b/t/t5700-protocol-v1.sh
> ...
> +	# Push to another branch, as the target repository has the
> +	# master branch checked out and we cannot push into it.
> +	GIT_TRACE_PACKET=1 test_might_fail git -C http_child -c protocol.version=1 \
> +		push --base=three origin HEAD:client_branch_four 2>log &&

This attempt to one-shot export GIT_TRACE_PACKET is flagged as an
error by test-lint-shell-syntax.

    test_might_fail env GIT_TRACE_PACKET=1 \
	git -C ... push --base=three ... &&

perhaps?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] Doc: push with --base
  2020-11-03  0:26 ` [PATCH 4/4] Doc: push with --base Jonathan Tan
@ 2020-11-03  5:35   ` Jonathan Nieder
  2020-11-03 15:18     ` Jeff King
  2020-11-03 13:53   ` Derrick Stolee
  1 sibling, 1 reply; 21+ messages in thread
From: Jonathan Nieder @ 2020-11-03  5:35 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, Elijah Newren, Derrick Stolee

(+cc: Elijah Newren and Derrick Stolee because of [1])
Hi,

Jonathan Tan wrote:

> Includes protocol documentation and a design document.

Thanks for writing this.

> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  Documentation/technical/pack-protocol.txt  | 10 ++--
>  Documentation/technical/push-with-base.txt | 61 ++++++++++++++++++++++
>  2 files changed, 68 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/technical/push-with-base.txt
> 
> diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
> index e13a2c064d..0485616701 100644
> --- a/Documentation/technical/pack-protocol.txt
> +++ b/Documentation/technical/pack-protocol.txt
> @@ -59,9 +59,9 @@ Parameters", and are supported by the Git, SSH, and HTTP protocols.
>  Each Extra Parameter takes the form of `<key>=<value>` or `<key>`.
>  
>  Servers that receive any such Extra Parameters MUST ignore all
> -unrecognized keys. Currently, the only Extra Parameter recognized is
> -"version" with a value of '1' or '2'.  See protocol-v2.txt for more
> -information on protocol version 2.
> +unrecognized keys. Currently, the only Extra Parameters recognized are
> +"version" with a value of '1' or '2' and, for push, "base" with an OID.  See
> +protocol-v2.txt for more information on protocol version 2.

It's nice to see an example of extra parameters being useful in
protocol v0 as well. :)

>  
>  Git Transport
>  -------------
> @@ -506,6 +506,10 @@ real difference is that the capability listing is different - the only
>  possible values are 'report-status', 'report-status-v2', 'delete-refs',
>  'ofs-delta', 'atomic' and 'push-options'.
>  
> +If a "base=<oid>" Extra Parameter was sent by the client, and the
> +server recognizes that object, the server MAY send "<oid> .have" in
> +lieu of all the reference obj-ids and names.
> +

Can this only appear once, or is it permitted to pass multiple oids
this way?

I'm also curious about what Junio asked about elsewhere in the thread:
for cases that benefit from more complex negotiation than the client
proposing a particular oid, what comes next in this roadmap?  I like
that this is an optional feature so we could clean up later by
removing support for it; do we expect to?  If we do expect to, is
there anything we could do to minimize the impact of the feature later
(e.g. using a less short-and-sweet key name than `base`, maybe)?

>  Reference Update Request and Packfile Transfer
>  ----------------------------------------------
>  
> diff --git a/Documentation/technical/push-with-base.txt b/Documentation/technical/push-with-base.txt
> new file mode 100644
> index 0000000000..d56aa7f900
> --- /dev/null
> +++ b/Documentation/technical/push-with-base.txt
> @@ -0,0 +1,61 @@
> +Push with base design notes
> +===========================
> +
> +This feature allows clients, when pushing, to indicate that a
> +certain object is an ancestor of all pushed commits and that they
> +believe that the server knows of this object. This in turn allows
> +servers to send an abbreviated ref advertisement containing only that
> +object.
> +
> +Besides bandwidth savings, this also ensures that the ref
> +advertisement contains information relevant to the client. For
> +example, at least one project (Gerrit [1]) have included workarounds
> +to send ancestors of refs that move often, even though the ref
> +advertisement is only meant to contain refs.
> +
> +[1] https://gerrit.googlesource.com/gerrit/+/refs/heads/master/java/com/google/gerrit/server/git/receive/HackPushNegotiateHook.java
> +
> +
> +Design overview
> +---------------
> +
> +The "base" being sent is sent as an Extra Parameter, supported in the
> +git://, ssh://, and http(s):// protocols. By sending it as an Extra
> +Parameter, the server is aware of this parameter before it generates
> +the ref advertisement, thus making it able to tailor the ref
> +advertisement accordingly. Sending it as an Extra Parameter also makes
> +this protocol backwards-compatible, as servers will ignore any Extra
> +Parameters they do not understand. (The push will then proceed as if
> +neither party had this feature.)
> +
> +The remote helper protocol has been extended to support the
> +"push-base" capability and an option of the same name. When a remote
> +helper advertises this capability, it thus indicates that it supports
> +this option. Git then will send "option push-base" if the user
> +specifies it when invoking "git push".
> +
> +The remote-curl remote helper bundled with Git has been updated to
> +support this capability and option.

Since this is an optional remote helper capability, it could be
removed later.  Good (but the capability and option names would still
need to be reserved).

> +
> +
> +Future work
> +-----------
> +
> +In the future, we might want a way to automatically determine the base
> +instead of always having the user specify it. However, this does not
> +make obsolete any of the current work - once the base is automatically
> +determined, we still need this protocol to communicate it to the
> +server, and allowing the user to specify the base manually is still
> +useful.

Makes sense.  I think this isn't "might", but "would"; most users
that do not know why their push is slow wouldn't know to use this
feature.

> +
> +
> +Alternatives
> +------------
> +
> +- Making a more substantial protocol change like "fetch" protocol v2.
> +  This would eliminate the need for some of the remote helper updates;
> +  as part of the protocol change, the protocol could be made to
> +  support "stateless-connect" and thus no remote helper updates (like
> +  "push-base") would be needed. For "fetch", the protocol change has
> +  enabled features like wanted-refs and packfile-uris, but I do not
> +  have any similar ideas in mind for "push".

I think you're saying that we don't need a "push" v2 because v0
already has what a user would want.

Git protocol v2 for fetch brought two major changes:

- it changed the response for the initial request, allowing
  abbreviating the ref advertisement at last

- it defined a structure for requests and responses, simplifying the
  addition of later protocol improvements.  In particular, because the
  initial response is a capability advertisement, it allows changing
  the ref advertisement format more in the future.

Both of those changes would be valuable for push.  The ref
advertisements are large, and matching the structure of commands used
by fetchv2 would make debugging easier.

There are some specific applications I'm interested in after that
(e.g., pushing symrefs), but the fundamental extensibility improvement
is larger than any particular application I could think of.

That said, I'm not against experimenting with extra parameters before
we go there, as a way of getting more information about what a
workable negotiation for push looks like.  The push ref advertisement
serves two roles: it advertises commits, to serve as negotiation, and
it tells the client the current values of refs they may want to
update, so they can send an appropriate compare-and-swap command for
a force-push.  If we have a workable replacement negotiation system,
then the other role can be replaced with something simpler.  For
example:

- listing ref updates before the pack, to allow the server to bail
  out early if they would be rejected, and

- allowing an explicit force update of a ref instead of requiring the
  client to compare-and-swap if they don't care about the old value
  (but taking care to still make the client "fetch first" when
  appropriate!)

Thanks,
Jonathan

[1] https://lore.kernel.org/git/CABPp-BEswHLhymQ1_07g3qqu=7kFR3eQyAHR0qMgSvi6THy=zQ@mail.gmail.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4] push: teach --base for ssh:// and file://
  2020-11-03  0:26 ` [PATCH 2/4] push: teach --base for ssh:// and file:// Jonathan Tan
@ 2020-11-03 10:23   ` SZEDER Gábor
  2020-11-08 19:31     ` Junio C Hamano
  2020-11-03 13:57   ` Derrick Stolee
  2020-11-08 19:30   ` Junio C Hamano
  2 siblings, 1 reply; 21+ messages in thread
From: SZEDER Gábor @ 2020-11-03 10:23 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Mon, Nov 02, 2020 at 04:26:11PM -0800, Jonathan Tan wrote:
> diff --git a/t/t5700-protocol-v1.sh b/t/t5700-protocol-v1.sh
> index 022901b9eb..22459d37f5 100755
> --- a/t/t5700-protocol-v1.sh
> +++ b/t/t5700-protocol-v1.sh
> @@ -146,6 +146,56 @@ test_expect_success 'push with file:// using protocol v1' '
>  	grep "push< version 1" log
>  '
>  
> +test_expect_success 'push with file:// using protocol v1 and --base' '
> +	test_commit -C file_child four &&
> +	COMMON_HASH=$(git -C file_child rev-parse three) &&
> +
> +	# Push to another branch, as the target repository has the
> +	# master branch checked out and we cannot push into it.
> +	GIT_TRACE_PACKET=1 git -C file_child -c protocol.version=1 \
> +		push --base=three origin HEAD:client_branch_four 2>log &&
> +
> +	# Server responded using protocol v1
> +	grep "push< version 1" log &&
> +	# Server advertised only the expected object
> +	grep "$COMMON_HASH .have" log
> +'
> +
> +test_expect_success 'push with file:// using protocol v0 and --base' '
> +	test_commit -C file_child five &&
> +	COMMON_HASH=$(git -C file_child rev-parse four) &&
> +
> +	# Push to another branch, as the target repository has the
> +	# master branch checked out and we cannot push into it.
> +	GIT_TRACE_PACKET=1 git -C file_child -c protocol.version=0 \
> +		push --base=four origin HEAD:client_branch_five 2>log &&
> +
> +	# Server did not respond with any version
> +	! grep "push< version" log &&
> +	# Server advertised only the expected object
> +	grep "$COMMON_HASH .have" log
> +'
> +
> +test_expect_success 'push with invalid --base' '
> +	test_commit -C file_child six &&
> +
> +	# Server does not have "six".
> +	test_must_fail git -C file_child -c protocol.version=0 \
> +		push --base=an_invalid_object origin HEAD:client_branch_six 2>log &&
> +	grep "is not a valid object" log

This should rather use 'test_i18ngrep' ...

> +'
> +
> +test_expect_success 'push with --base that does not exist on server' '
> +	COMMON_HASH=$(git -C file_child rev-parse six) &&
> +
> +	# The push still succeeds.
> +	GIT_TRACE_PACKET=1 git -C file_child -c protocol.version=0 \
> +		push --base=six origin HEAD:client_branch_six 2>log &&
> +
> +	# Server did not advertise "six", since it does not know it
> +	! grep "$COMMON_HASH .have" log
> +'
> +
>  # Test protocol v1 with 'ssh://' transport
>  #
>  test_expect_success 'setup ssh wrapper' '


> diff --git a/transport.c b/transport.c
> index ffe2115845..531ca0a834 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -236,6 +236,10 @@ static int set_git_option(struct git_transport_options *opts,
>  		list_objects_filter_die_if_populated(&opts->filter_options);
>  		parse_list_objects_filter(&opts->filter_options, value);
>  		return 0;
> +	} else if (!strcmp(name, TRANS_OPT_PUSH_BASE)) {
> +		if (get_oid(value, &opts->push_base))
> +			die(_("transport: '%s' is not a valid object"), value);

... because the error message here is translated.

> +		return 0;
>  	}
>  	return 1;
>  }

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] Doc: push with --base
  2020-11-03  0:26 ` [PATCH 4/4] Doc: push with --base Jonathan Tan
  2020-11-03  5:35   ` Jonathan Nieder
@ 2020-11-03 13:53   ` Derrick Stolee
  1 sibling, 0 replies; 21+ messages in thread
From: Derrick Stolee @ 2020-11-03 13:53 UTC (permalink / raw)
  To: Jonathan Tan, git

On 11/2/2020 7:26 PM, Jonathan Tan wrote:
> Includes protocol documentation and a design document.

>  Servers that receive any such Extra Parameters MUST ignore all
> -unrecognized keys. Currently, the only Extra Parameter recognized is
> -"version" with a value of '1' or '2'.  See protocol-v2.txt for more
> -information on protocol version 2.
> +unrecognized keys. Currently, the only Extra Parameters recognized are
> +"version" with a value of '1' or '2' and, for push, "base" with an OID.  See
> +protocol-v2.txt for more information on protocol version 2.
>  
>  Git Transport
>  -------------
> @@ -506,6 +506,10 @@ real difference is that the capability listing is different - the only
>  possible values are 'report-status', 'report-status-v2', 'delete-refs',
>  'ofs-delta', 'atomic' and 'push-options'.
>  
> +If a "base=<oid>" Extra Parameter was sent by the client, and the
> +server recognizes that object, the server MAY send "<oid> .have" in
> +lieu of all the reference obj-ids and names.
> +

nit: no comma before "and the server recognizes" as these are both
conditions of the "if". My preference is also to include a "then"
before the result, especially with compound conditions like this.
Perhaps also make it more active voice while we are here?

+ If the client sent a "base=<oid>" Extra Parameter and the server
+ recognizes that object, then the server MAY send "<oid> .have" in
+ lieu of all the reference object ids and names.

> +Push with base design notes
> +===========================

Perhaps start with a brief problem statement?

+ If a client wishes to push data to a server without first running
+ `git fetch`, then they might not have local copies of the objects
+ at the server's ref tips. The client cannot compute an adequate
+ common base and will send more objects than necessary.

Then,

+ The "push with base" feature allows...

> +This feature allows clients, when pushing, to indicate that a
> +certain object is an ancestor of all pushed commits and that they
> +believe that the server knows of this object. This in turn allows
> +servers to send an abbreviated ref advertisement containing only that
> +object.

Why only one base? Could there be value in sending multiple possible
bases and the server can report the subset that they know about?

> +Besides bandwidth savings,

It seems you are underselling the bandwidth savings here, because
we save in both the ref advertisement and the push pack size. It
might be good to call out both metrics so we can see that the ref
improvement works even for cases where the client recently fetched
and can figure out the proper base on its own!

> this also ensures that the ref
> +advertisement contains information relevant to the client. For
> +example, at least one project (Gerrit [1]) have included workarounds
> +to send ancestors of refs that move often, even though the ref
> +advertisement is only meant to contain refs.
> +
> +[1] https://gerrit.googlesource.com/gerrit/+/refs/heads/master/java/com/google/gerrit/server/git/receive/HackPushNegotiateHook.java
> +
> +
> +Design overview
> +---------------
> +
> +The "base" being sent is sent as an Extra Parameter, supported in the
> +git://, ssh://, and http(s):// protocols. By sending it as an Extra
> +Parameter, the server is aware of this parameter before it generates
> +the ref advertisement, thus making it able to tailor the ref
> +advertisement accordingly. Sending it as an Extra Parameter also makes
> +this protocol backwards-compatible, as servers will ignore any Extra
> +Parameters they do not understand. (The push will then proceed as if
> +neither party had this feature.)

I'm not familiar enough with the extra parameter logic to know if we
allow multi-valued extra parameters. It would be good to relax the
"base" parameter to either allow multiple instances of the parameter
or to have something like a comma-separated list of OIDs be allowed.

This might be particularly important for multiple refs being pushed
at the same time.

> +The remote helper protocol has been extended to support the
> +"push-base" capability and an option of the same name. When a remote
> +helper advertises this capability, it thus indicates that it supports
> +this option. Git then will send "option push-base" if the user
> +specifies it when invoking "git push".
> +
> +The remote-curl remote helper bundled with Git has been updated to
> +support this capability and option.> +
> +
> +Future work
> +-----------
> +
> +In the future, we might want a way to automatically determine the base
> +instead of always having the user specify it. However, this does not
> +make obsolete any of the current work - once the base is automatically
> +determined, we still need this protocol to communicate it to the
> +server, and allowing the user to specify the base manually is still
> +useful.

It is appropriate to separate the protocol ability from the automatic
computation on the client. It would be nice to know how you expect that
to work.

For the single-ref push case, my default instinct would be to find a
merge-base between that ref and all of the remote refs matching the
remote we are pushing to. Bonus points if we find _all_ (maximal)
merge-bases!

The equivalent for the multi-ref push might be to find the boundary
commits from something like the following command:

	git rev-list --boundary \
		--not refs/remotes/origin/A \
		--not refs/remotes/origin/B \
		--not refs/remotes/origin/C \
		refs/heads/toPush/1 \
		refs/heads/toPush/2 \
		refs/heads/toPush/3

But you might have something better in mind.

> +
> +
> +Alternatives
> +------------

Odd to have a single bullet in a bulleted list.

> +- Making a more substantial protocol change like "fetch" protocol v2.

This isn't a full sentence.

Perhaps: "One considered approach was to introduce a multi-phase
negotiation step similar to how 'git fetch' operates."

> +  This would eliminate the need for some of the remote helper updates;
> +  as part of the protocol change, the protocol could be made to
> +  support "stateless-connect" and thus no remote helper updates (like
> +  "push-base") would be needed. For "fetch", the protocol change has
> +  enabled features like wanted-refs and packfile-uris, but I do not
> +  have any similar ideas in mind for "push".

I think that you should mention that a negotiation phase here would
use significantly more bandwidth to do the ref advertisements on both
ends with only a rare case that the negotiated bases are better than
the locally-computed bases. Everything is about tradeoffs here!

I'm generally excited about the opportunities here. I'd love to see
some measurements for reduced ref advertisements and reduced object
counts in the pushed pack.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4] push: teach --base for ssh:// and file://
  2020-11-03  0:26 ` [PATCH 2/4] push: teach --base for ssh:// and file:// Jonathan Tan
  2020-11-03 10:23   ` SZEDER Gábor
@ 2020-11-03 13:57   ` Derrick Stolee
  2020-11-08 19:30   ` Junio C Hamano
  2 siblings, 0 replies; 21+ messages in thread
From: Derrick Stolee @ 2020-11-03 13:57 UTC (permalink / raw)
  To: Jonathan Tan, git

On 11/2/2020 7:26 PM, Jonathan Tan wrote:
> For the ssh:// and file:// transports, teach push the "--base"
> parameter, which indicates an ancestor of the commits to be pushed that
> is believed to be known by the server.

Looking at this patch, I don't see any reason why this couldn't be
a multi-valued parameter that populates a string_list on the client
and the server translates into a list of object_ids. Converting from
the current single-base model to a multiple-base model would probably
be just as big of a patch as this one, so perhaps jump straight to
that capability instead of pausing at the single-base model here?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] Doc: push with --base
  2020-11-03  5:35   ` Jonathan Nieder
@ 2020-11-03 15:18     ` Jeff King
  2020-11-03 17:35       ` Junio C Hamano
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff King @ 2020-11-03 15:18 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Jonathan Tan, git, Elijah Newren, Derrick Stolee

On Mon, Nov 02, 2020 at 09:35:54PM -0800, Jonathan Nieder wrote:

> I think you're saying that we don't need a "push" v2 because v0
> already has what a user would want.
> 
> Git protocol v2 for fetch brought two major changes:
> 
> - it changed the response for the initial request, allowing
>   abbreviating the ref advertisement at last
> 
> - it defined a structure for requests and responses, simplifying the
>   addition of later protocol improvements.  In particular, because the
>   initial response is a capability advertisement, it allows changing
>   the ref advertisement format more in the future.
> 
> Both of those changes would be valuable for push.  The ref
> advertisements are large, and matching the structure of commands used
> by fetchv2 would make debugging easier.
> 
> There are some specific applications I'm interested in after that
> (e.g., pushing symrefs), but the fundamental extensibility improvement
> is larger than any particular application I could think of.

You pretty much summed up what I was going to respond. :)

But I'd go further here...

> That said, I'm not against experimenting with extra parameters before
> we go there, as a way of getting more information about what a
> workable negotiation for push looks like.

I'd prefer to avoid doing this as an extra parameter for a few reasons:

  - once it's in a released version, it's much harder for us to take it
    away

  - the extra parameters area is a hack that helped us bootstrap v2. We
    could probably use the same hack to bootstrap v3, etc. But it has
    limitations for stuffing in arbitrary data. An obvious one is size.
    We can transmit a single base, but would be limited if we wanted to
    be able to send multiple. We already ran into this once with the
    "symref=foo:bar" capability overflowing pkt-line limits. Here I'm
    not even sure what the limits might be (it's subject to things like
    how big an HTTP header a proxy will pass, or how large an
    environment variable an ssh implementation supports)

  - it potentially pushes more data/work outside of the git protocol
    itself. E.g., web servers have to translate Git-Protocol headers
    into the GIT_PROTOCOL environment for v2. I guess this new field
    works in our tests because we copy the header's value entirely in
    our apache.conf. But I wonder how many systems in the wild may only
    work if it contains "version=2".

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] Doc: push with --base
  2020-11-03 15:18     ` Jeff King
@ 2020-11-03 17:35       ` Junio C Hamano
  2020-11-09 19:56         ` Jonathan Tan
  0 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2020-11-03 17:35 UTC (permalink / raw)
  To: Jeff King
  Cc: Jonathan Nieder, Jonathan Tan, git, Elijah Newren, Derrick Stolee

Jeff King <peff@peff.net> writes:

> On Mon, Nov 02, 2020 at 09:35:54PM -0800, Jonathan Nieder wrote:
>
>> I think you're saying that we don't need a "push" v2 because v0
>> already has what a user would want.
>> 
>> Git protocol v2 for fetch brought two major changes:
>> 
>> - it changed the response for the initial request, allowing
>>   abbreviating the ref advertisement at last
>> 
>> - it defined a structure for requests and responses, simplifying the
>>   addition of later protocol improvements.  In particular, because the
>>   initial response is a capability advertisement, it allows changing
>>   the ref advertisement format more in the future.
>> 
>> Both of those changes would be valuable for push.  The ref
>> advertisements are large, and matching the structure of commands used
>> by fetchv2 would make debugging easier.
>> 
>> There are some specific applications I'm interested in after that
>> (e.g., pushing symrefs), but the fundamental extensibility improvement
>> is larger than any particular application I could think of.
>
> You pretty much summed up what I was going to respond. :)
>
> But I'd go further here...
>
>> That said, I'm not against experimenting with extra parameters before
>> we go there, as a way of getting more information about what a
>> workable negotiation for push looks like.
>
> I'd prefer to avoid doing this as an extra parameter for a few reasons:
>
>   - once it's in a released version, it's much harder for us to take it
>     away
>
>   - the extra parameters area is a hack that helped us bootstrap v2. We
>     could probably use the same hack to bootstrap v3, etc. But it has
>     limitations for stuffing in arbitrary data. An obvious one is size.
>     We can transmit a single base, but would be limited if we wanted to
>     be able to send multiple. We already ran into this once with the
>     "symref=foo:bar" capability overflowing pkt-line limits. Here I'm
>     not even sure what the limits might be (it's subject to things like
>     how big an HTTP header a proxy will pass, or how large an
>     environment variable an ssh implementation supports)
>
>   - it potentially pushes more data/work outside of the git protocol
>     itself. E.g., web servers have to translate Git-Protocol headers
>     into the GIT_PROTOCOL environment for v2. I guess this new field
>     works in our tests because we copy the header's value entirely in
>     our apache.conf. But I wonder how many systems in the wild may only
>     work if it contains "version=2".

I do not have much to add to what has been said so far, other than
offering historical perspective.

The single biggest reason why "fetch" has common ancestor discovery
negotiation and "push" does not is because the design comes from the
use case the inventor of Git and those worked on the early protocol
wanted to support---you are pushing into your own repository you
alone push into, your work is disseminated to others who fetch from
your repository, and you get others' work by fetching from theirs.

In such a world without a central server where everybody pushes
into, by definition, a pusher knows all the objects that have ever
been pushed into the receiving repository when running "git push".
They are all objects that passed through the repository you are
pushing from to the receiving repository in your previous pushes.
The advertised ref(s) are expected to be known to the repository you
are pusing from anyway, and if that is not the case, you would first
fetch from there before force-pushing.

Hence, when you push, there isn't much need to walk back from the
tip of refs at the remote to discover common ancestor like we do for
the fetch side in pre-central-server world.

On the other hand, you expect that remote refs point at objects
unknown to you when you fetch from your colleagues, so it is
expected that you have to perform the common ancestor discovery
negotiation.

After 15 years, we live in a different world.

People expect that a single repository at their hosting sites can be
used as the central meeting point for the project, just like CVS/SVN
servers were in older world.  "git push" would need to accept that
reality and start common ancestor discovery eventually.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4] push: teach --base for ssh:// and file://
  2020-11-03  0:26 ` [PATCH 2/4] push: teach --base for ssh:// and file:// Jonathan Tan
  2020-11-03 10:23   ` SZEDER Gábor
  2020-11-03 13:57   ` Derrick Stolee
@ 2020-11-08 19:30   ` Junio C Hamano
  2 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2020-11-08 19:30 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

>  static int do_push(int flags,
>  		   const struct string_list *push_options,
> -		   struct remote *remote)
> +		   struct remote *remote,
> +		   const char *push_base)
>  {
>  	int i, errs;
>  	const char **url;
> @@ -405,6 +406,8 @@ static int do_push(int flags,
>  				transport_get(remote, url[i]);
>  			if (flags & TRANSPORT_PUSH_OPTIONS)
>  				transport->push_options = push_options;
> +			if (push_base)
> +				transport_set_option(transport, TRANS_OPT_PUSH_BASE, push_base);
>  			if (push_with_options(transport, push_refspec, flags))
>  				errs++;
>  		}
> @@ -413,6 +416,8 @@ static int do_push(int flags,
>  			transport_get(remote, NULL);
>  		if (flags & TRANSPORT_PUSH_OPTIONS)
>  			transport->push_options = push_options;
> +		if (push_base)
> +			transport_set_option(transport, TRANS_OPT_PUSH_BASE, push_base);
>  		if (push_with_options(transport, push_refspec, flags))
>  			errs++;
>  	}

These just send push_base as-is.

> @@ -526,6 +531,7 @@ int cmd_push(int argc, const char **argv, const char *prefix)
>  	struct string_list *push_options;
>  	const struct string_list_item *item;
>  	struct remote *remote;
> +	const char *push_base = NULL;
>  
>  	struct option options[] = {
>  		OPT__VERBOSITY(&verbosity),
> @@ -562,6 +568,8 @@ int cmd_push(int argc, const char **argv, const char *prefix)
>  				TRANSPORT_FAMILY_IPV4),
>  		OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
>  				TRANSPORT_FAMILY_IPV6),
> +		OPT_STRING(0, "base", &push_base, N_("revision"),
> +			   N_("ancestor of commits to be pushed that is believed to be known by the server")),
>  		OPT_END()
>  	};

And this takes push_base as a string that is not even validated for
any constraints.

> @@ -629,7 +637,7 @@ int cmd_push(int argc, const char **argv, const char *prefix)
>  		if (strchr(item->string, '\n'))
>  			die(_("push options must not have new line characters"));
>  
> -	rc = do_push(flags, push_options, remote);
> +	rc = do_push(flags, push_options, remote, push_base);


And passes that arbitrary cruft given by the user down to the
transport.

As the spirit of the "base" parameter is to tell the other side that
it is what the receiving end believes to be common, shouldn't we
make sure we do have it on our side after getting it from the user
with OPT_STRING() before passing it down to the transport layer and
have the transport layer convert it to an object name?  This patch
assumes that running get_oid_hex() at the transport layer and
assuming that the transport would keep working on the_repository
(hence when we say "We expect that the receiving end has 'master'",
the transport somehow knows that is 'master' in our repository, not
in a submodule repository, for example), but by converting it to
full object name early, we do not have to assume transport to stay
that way.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4] push: teach --base for ssh:// and file://
  2020-11-03 10:23   ` SZEDER Gábor
@ 2020-11-08 19:31     ` Junio C Hamano
  0 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2020-11-08 19:31 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Jonathan Tan, git

SZEDER Gábor <szeder.dev@gmail.com> writes:

>> +	# Server does not have "six".
>> +	test_must_fail git -C file_child -c protocol.version=0 \
>> +		push --base=an_invalid_object origin HEAD:client_branch_six 2>log &&
>> +	grep "is not a valid object" log
>
> This should rather use 'test_i18ngrep' ...
>
>> +'
>> +
>> +test_expect_success 'push with --base that does not exist on server' '
>> +	COMMON_HASH=$(git -C file_child rev-parse six) &&
>> +
>> +	# The push still succeeds.
>> +	GIT_TRACE_PACKET=1 git -C file_child -c protocol.version=0 \
>> +		push --base=six origin HEAD:client_branch_six 2>log &&
>> +
>> +	# Server did not advertise "six", since it does not know it
>> +	! grep "$COMMON_HASH .have" log
>> +'
>> +
>>  # Test protocol v1 with 'ssh://' transport
>>  #
>>  test_expect_success 'setup ssh wrapper' '
>
>
>> diff --git a/transport.c b/transport.c
>> index ffe2115845..531ca0a834 100644
>> --- a/transport.c
>> +++ b/transport.c
>> @@ -236,6 +236,10 @@ static int set_git_option(struct git_transport_options *opts,
>>  		list_objects_filter_die_if_populated(&opts->filter_options);
>>  		parse_list_objects_filter(&opts->filter_options, value);
>>  		return 0;
>> +	} else if (!strcmp(name, TRANS_OPT_PUSH_BASE)) {
>> +		if (get_oid(value, &opts->push_base))
>> +			die(_("transport: '%s' is not a valid object"), value);
>
> ... because the error message here is translated.
>
>> +		return 0;
>>  	}
>>  	return 1;
>>  }

Yes, and we are getting CI failure ever since we queued this patch.

Let's discard the topic for now.

Thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] Doc: push with --base
  2020-11-03 17:35       ` Junio C Hamano
@ 2020-11-09 19:56         ` Jonathan Tan
  2020-11-09 21:00           ` Derrick Stolee
  2020-11-09 21:20           ` Junio C Hamano
  0 siblings, 2 replies; 21+ messages in thread
From: Jonathan Tan @ 2020-11-09 19:56 UTC (permalink / raw)
  To: gitster; +Cc: peff, jrnieder, jonathantanmy, git, newren, stolee

> People expect that a single repository at their hosting sites can be
> used as the central meeting point for the project, just like CVS/SVN
> servers were in older world.  "git push" would need to accept that
> reality and start common ancestor discovery eventually.

Thanks for your reply (and everyone else's). I was thinking that a more
rudimentary form of the feature would suffice, since I wasn't expecting
much more need in the future, but looks like this isn't the case. I'll
be thinking of a more comprehensive idea.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] Doc: push with --base
  2020-11-09 19:56         ` Jonathan Tan
@ 2020-11-09 21:00           ` Derrick Stolee
  2020-11-09 22:22             ` Jonathan Tan
  2020-11-09 21:20           ` Junio C Hamano
  1 sibling, 1 reply; 21+ messages in thread
From: Derrick Stolee @ 2020-11-09 21:00 UTC (permalink / raw)
  To: Jonathan Tan, gitster; +Cc: peff, jrnieder, git, newren

On 11/9/20 2:56 PM, Jonathan Tan wrote:
>> People expect that a single repository at their hosting sites can be
>> used as the central meeting point for the project, just like CVS/SVN
>> servers were in older world.  "git push" would need to accept that
>> reality and start common ancestor discovery eventually.
> 
> Thanks for your reply (and everyone else's). I was thinking that a more
> rudimentary form of the feature would suffice, since I wasn't expecting
> much more need in the future, but looks like this isn't the case. I'll
> be thinking of a more comprehensive idea.

I think this "half round negotiation" idea you have has merit, and can
get us 95% of the benefit that a multi-round negotiation would bring
without those extra steps.

My concerns with the current series is that it isn't fully ready for
even that case. In my mind, a protocol change like this would need:

1. A top-to-bottom implementation that allows a user to opt-in to
   this new behavior with a config setting.

2. A demonstration of situations where this algorithm out-performs
   the existing algorithm (i.e. client is far behind server, but
   topic is a small change based on something in server's history)

3. A clear way to handle odd cases, such as multiple merge-bases.
   This leads to a change in how you are sending the data.

Perhaps this "send multiple OIDs in a payload" is already half-way to
implementing a full negotiation, and we might as well go all the way
in the writing. I expect that sending all maximal merge-bases will be
sufficient for the vast majority of cases, and so any multi-round
negotiation process will almost always end after the client sends that
data.

Looking forward to your next version.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] Doc: push with --base
  2020-11-09 19:56         ` Jonathan Tan
  2020-11-09 21:00           ` Derrick Stolee
@ 2020-11-09 21:20           ` Junio C Hamano
  2020-11-09 21:40             ` Jeff King
  2020-11-09 22:47             ` Jonathan Tan
  1 sibling, 2 replies; 21+ messages in thread
From: Junio C Hamano @ 2020-11-09 21:20 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: peff, jrnieder, git, newren, stolee

Jonathan Tan <jonathantanmy@google.com> writes:

>> People expect that a single repository at their hosting sites can be
>> used as the central meeting point for the project, just like CVS/SVN
>> servers were in older world.  "git push" would need to accept that
>> reality and start common ancestor discovery eventually.
>
> Thanks for your reply (and everyone else's). I was thinking that a more
> rudimentary form of the feature would suffice, since I wasn't expecting
> much more need in the future, but looks like this isn't the case. I'll
> be thinking of a more comprehensive idea.

I said "eventually", meaning that we may not have to solve it
immediately, but judging from the need for ad-hoc workarounds like
sending older commits that are not necessarily at the tip of
anything from the receiving end as if they are tips and then another
ad-hoc workaround like this one, it seems that the need is real.

Would the earlier refactoring of the negotiation part into a
separate negotiator module help, or did the refactor not remove the
deep assumption that it is only about the fetch/upload-pack traffic
and we need a design for push/receive-pack from scratch?

Thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] Doc: push with --base
  2020-11-09 21:20           ` Junio C Hamano
@ 2020-11-09 21:40             ` Jeff King
  2020-11-09 22:47             ` Jonathan Tan
  1 sibling, 0 replies; 21+ messages in thread
From: Jeff King @ 2020-11-09 21:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, jrnieder, git, newren, stolee

On Mon, Nov 09, 2020 at 01:20:14PM -0800, Junio C Hamano wrote:

> Would the earlier refactoring of the negotiation part into a
> separate negotiator module help, or did the refactor not remove the
> deep assumption that it is only about the fetch/upload-pack traffic
> and we need a design for push/receive-pack from scratch?

I haven't thought too deeply about this problem area, but there is one
challenge I'd expect with reusing the existing negotiation protocol and
code: the stateless side is flipped for http.

In the fetch protocol, the client is stateful and makes a series of
requests to a stateless server, each time saying "I want X, we agreed
previously on Y, and here are some more options Z; let me know if that's
enough to generate a pack". If we were to flip that around, the client
must remain the stateful party, but it's no longer the receiver of the
pack. For the receiver (i.e., the stateless server) to iterate through
its history looking for a shard point, I think the message would have to
be more like "last time you told me about X, and gave me cut-points Y to
resume your traversal; tell me more about Y so that I can decide if
we're ready to send a pack".

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] Doc: push with --base
  2020-11-09 21:00           ` Derrick Stolee
@ 2020-11-09 22:22             ` Jonathan Tan
  0 siblings, 0 replies; 21+ messages in thread
From: Jonathan Tan @ 2020-11-09 22:22 UTC (permalink / raw)
  To: stolee; +Cc: jonathantanmy, gitster, peff, jrnieder, git, newren

> On 11/9/20 2:56 PM, Jonathan Tan wrote:
> >> People expect that a single repository at their hosting sites can be
> >> used as the central meeting point for the project, just like CVS/SVN
> >> servers were in older world.  "git push" would need to accept that
> >> reality and start common ancestor discovery eventually.
> > 
> > Thanks for your reply (and everyone else's). I was thinking that a more
> > rudimentary form of the feature would suffice, since I wasn't expecting
> > much more need in the future, but looks like this isn't the case. I'll
> > be thinking of a more comprehensive idea.
> 
> I think this "half round negotiation" idea you have has merit, and can
> get us 95% of the benefit that a multi-round negotiation would bring
> without those extra steps.

Thanks.

> My concerns with the current series is that it isn't fully ready for
> even that case. In my mind, a protocol change like this would need:
> 
> 1. A top-to-bottom implementation that allows a user to opt-in to
>    this new behavior with a config setting.

Makes sense. The client already needs to opt-in by specifying the base,
but yes, an option needs to be added for the server.

> 2. A demonstration of situations where this algorithm out-performs
>    the existing algorithm (i.e. client is far behind server, but
>    topic is a small change based on something in server's history)

Noted.

> 3. A clear way to handle odd cases, such as multiple merge-bases.
>    This leads to a change in how you are sending the data.
> 
> Perhaps this "send multiple OIDs in a payload" is already half-way to
> implementing a full negotiation, and we might as well go all the way
> in the writing. I expect that sending all maximal merge-bases will be
> sufficient for the vast majority of cases, and so any multi-round
> negotiation process will almost always end after the client sends that
> data.

I think the ability to send multiple bases would work in this scenario,
but this might run into concerns that other reviewers have brought up
(specifically, the fact that we're using environment variables and HTTP
headers to pass this information - and among other things, line length
limits). So a more comprehensive solution might be needed, even if only
for this reason.

> Looking forward to your next version.

Thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/4] Doc: push with --base
  2020-11-09 21:20           ` Junio C Hamano
  2020-11-09 21:40             ` Jeff King
@ 2020-11-09 22:47             ` Jonathan Tan
  1 sibling, 0 replies; 21+ messages in thread
From: Jonathan Tan @ 2020-11-09 22:47 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, peff, jrnieder, git, newren, stolee

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> >> People expect that a single repository at their hosting sites can be
> >> used as the central meeting point for the project, just like CVS/SVN
> >> servers were in older world.  "git push" would need to accept that
> >> reality and start common ancestor discovery eventually.
> >
> > Thanks for your reply (and everyone else's). I was thinking that a more
> > rudimentary form of the feature would suffice, since I wasn't expecting
> > much more need in the future, but looks like this isn't the case. I'll
> > be thinking of a more comprehensive idea.
> 
> I said "eventually", meaning that we may not have to solve it
> immediately, but judging from the need for ad-hoc workarounds like
> sending older commits that are not necessarily at the tip of
> anything from the receiving end as if they are tips and then another
> ad-hoc workaround like this one, it seems that the need is real.

That's true. And if we need a more comprehensive solution than putting
bases as Extra Parameters (and don't think that base-in-Extra-Parameter
will form a part of that comprehensive solution), then I do think that
we should go for the comprehensive solution instead of making something
obsolete that we will still need to support later.

> Would the earlier refactoring of the negotiation part into a
> separate negotiator module help, or did the refactor not remove the
> deep assumption that it is only about the fetch/upload-pack traffic
> and we need a design for push/receive-pack from scratch?

The negotiator module might help - looking that the API, it takes a list
of known commons (none in this case, because we want to skip the long
ref advertisement) and a list of tips to check (for this case, the tips
we want to push), and then the negotiation starts. So we might be able
to use it in a push v2.

Having said that, it might be possible to reuse the existing fetch v0/v2
protocol to perform this negotiation (preferably v2, so that we can skip
the ref advertisement). We'll just need to add the opposite of the
"no-done" capability to make sure that the server never sends a
packfile.

Once we have found the base commit(s) either through push v2 or through
fetch+opposite-of-no-done, we'll need to send the packfile somehow. My
proposal here could do it, although then we might run into the problems
Peff describes about Extra Parameters [1]. If we don't use Extra
Parameters, we would probably need a push v2, but we might then run into
similar problems to what we had during the fetch v2 migration (e.g.
unexpected subtly different behaviors).

[1] https://lore.kernel.org/git/20201103151859.GA444466@coredump.intra.peff.net/

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-11-09 22:47 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-03  0:26 [PATCH 0/4] "Push" protocol change proposal: user-specified base Jonathan Tan
2020-11-03  0:26 ` [PATCH 1/4] connect: refactor building of Extra Parameters Jonathan Tan
2020-11-03  0:26 ` [PATCH 2/4] push: teach --base for ssh:// and file:// Jonathan Tan
2020-11-03 10:23   ` SZEDER Gábor
2020-11-08 19:31     ` Junio C Hamano
2020-11-03 13:57   ` Derrick Stolee
2020-11-08 19:30   ` Junio C Hamano
2020-11-03  0:26 ` [PATCH 3/4] remote-curl: teach --base for http(s):// Jonathan Tan
2020-11-03  1:13   ` Junio C Hamano
2020-11-03  0:26 ` [PATCH 4/4] Doc: push with --base Jonathan Tan
2020-11-03  5:35   ` Jonathan Nieder
2020-11-03 15:18     ` Jeff King
2020-11-03 17:35       ` Junio C Hamano
2020-11-09 19:56         ` Jonathan Tan
2020-11-09 21:00           ` Derrick Stolee
2020-11-09 22:22             ` Jonathan Tan
2020-11-09 21:20           ` Junio C Hamano
2020-11-09 21:40             ` Jeff King
2020-11-09 22:47             ` Jonathan Tan
2020-11-03 13:53   ` Derrick Stolee
2020-11-03  0:46 ` [PATCH 0/4] "Push" protocol change proposal: user-specified base Junio C Hamano

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).