git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
* Add proto v2 archive command with HTTP support
@ 2018-09-12  5:35 Josh Steadmon
  2018-09-12  5:35 ` [PATCH 1/3] archive: use packet_reader for communications Josh Steadmon
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Josh Steadmon @ 2018-09-12  5:35 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, l.s.r, sandals

This series adds a new protocol v2 command for archiving, and allows
this command to work over HTTP(S). This was previously discussed in [1].
I've CCed everyone who participated in that discussion.

[1]: https://public-inbox.org/git/CANq=j3tK7QeBJOC7VNWkh4+WBNibMJJp5YUkd9te5NaYwukAow@mail.gmail.com/



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/3] archive: use packet_reader for communications
  2018-09-12  5:35 Add proto v2 archive command with HTTP support Josh Steadmon
@ 2018-09-12  5:35 ` Josh Steadmon
  2018-09-12 22:01   ` Stefan Beller
  2018-09-13 14:58   ` Junio C Hamano
  2018-09-12  5:35 ` [PATCH 2/3] archive: implement protocol v2 archive command Josh Steadmon
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 19+ messages in thread
From: Josh Steadmon @ 2018-09-12  5:35 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, l.s.r, sandals, Josh Steadmon

Using packet_reader will simplify version detection and capability
handling, which will make implementation of protocol v2 support in
git-archive easier.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 builtin/archive.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/builtin/archive.c b/builtin/archive.c
index e74f67539..e54fc39ad 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -27,10 +27,11 @@ static int run_remote_archiver(int argc, const char **argv,
 			       const char *remote, const char *exec,
 			       const char *name_hint)
 {
-	char *buf;
 	int fd[2], i, rv;
 	struct transport *transport;
 	struct remote *_remote;
+	struct packet_reader reader;
+	enum packet_read_status status;
 
 	_remote = remote_get(remote);
 	if (!_remote->url[0])
@@ -38,6 +39,8 @@ static int run_remote_archiver(int argc, const char **argv,
 	transport = transport_get(_remote, _remote->url[0]);
 	transport_connect(transport, "git-upload-archive", exec, fd);
 
+	packet_reader_init(&reader, fd[0], NULL, 0, PACKET_READ_CHOMP_NEWLINE);
+
 	/*
 	 * Inject a fake --format field at the beginning of the
 	 * arguments, with the format inferred from our output
@@ -53,18 +56,20 @@ static int run_remote_archiver(int argc, const char **argv,
 		packet_write_fmt(fd[1], "argument %s\n", argv[i]);
 	packet_flush(fd[1]);
 
-	buf = packet_read_line(fd[0], NULL);
-	if (!buf)
+	status = packet_reader_read(&reader);
+
+	if (status == PACKET_READ_FLUSH)
 		die(_("git archive: expected ACK/NAK, got a flush packet"));
-	if (strcmp(buf, "ACK")) {
-		if (starts_with(buf, "NACK "))
-			die(_("git archive: NACK %s"), buf + 5);
-		if (starts_with(buf, "ERR "))
-			die(_("remote error: %s"), buf + 4);
+	if (strcmp(reader.buffer, "ACK")) {
+		if (starts_with(reader.buffer, "NACK "))
+			die(_("git archive: NACK %s"), reader.buffer + 5);
+		if (starts_with(reader.buffer, "ERR "))
+			die(_("remote error: %s"), reader.buffer + 4);
 		die(_("git archive: protocol error"));
 	}
 
-	if (packet_read_line(fd[0], NULL))
+	status = packet_reader_read(&reader);
+	if (status != PACKET_READ_FLUSH)
 		die(_("git archive: expected a flush"));
 
 	/* Now, start reading from fd[0] and spit it out to stdout */
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 2/3] archive: implement protocol v2 archive command
  2018-09-12  5:35 Add proto v2 archive command with HTTP support Josh Steadmon
  2018-09-12  5:35 ` [PATCH 1/3] archive: use packet_reader for communications Josh Steadmon
@ 2018-09-12  5:35 ` Josh Steadmon
  2018-09-12 22:28   ` Stefan Beller
                     ` (2 more replies)
  2018-09-12  5:35 ` [PATCH 3/3] archive: allow archive over HTTP(S) with proto v2 Josh Steadmon
  2018-09-14  5:36 ` Add proto v2 archive command with HTTP support Jonathan Nieder
  3 siblings, 3 replies; 19+ messages in thread
From: Josh Steadmon @ 2018-09-12  5:35 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, l.s.r, sandals, Josh Steadmon

This adds a new archive command for protocol v2. The command expects
arguments in the form "argument X" which are passed unmodified to
git-upload-archive--writer.

This command works over the file://, Git, and SSH transports. HTTP
support will be added in a separate patch.

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 builtin/archive.c        | 45 +++++++++++++++++++++++++++-------------
 builtin/upload-archive.c | 44 ++++++++++++++++++++++++++++++++++++---
 t/t5000-tar-tree.sh      |  5 +++++
 3 files changed, 77 insertions(+), 17 deletions(-)

diff --git a/builtin/archive.c b/builtin/archive.c
index e54fc39ad..73831887d 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -5,9 +5,11 @@
 #include "cache.h"
 #include "builtin.h"
 #include "archive.h"
+#include "connect.h"
 #include "transport.h"
 #include "parse-options.h"
 #include "pkt-line.h"
+#include "protocol.h"
 #include "sideband.h"
 
 static void create_output_file(const char *output_file)
@@ -23,6 +25,13 @@ static void create_output_file(const char *output_file)
 	}
 }
 
+static int do_v2_command_and_cap(int out)
+{
+	packet_write_fmt(out, "command=archive\n");
+	/* Capability list would go here, if we had any. */
+	packet_delim(out);
+}
+
 static int run_remote_archiver(int argc, const char **argv,
 			       const char *remote, const char *exec,
 			       const char *name_hint)
@@ -32,6 +41,7 @@ static int run_remote_archiver(int argc, const char **argv,
 	struct remote *_remote;
 	struct packet_reader reader;
 	enum packet_read_status status;
+	enum protocol_version version;
 
 	_remote = remote_get(remote);
 	if (!_remote->url[0])
@@ -41,6 +51,11 @@ static int run_remote_archiver(int argc, const char **argv,
 
 	packet_reader_init(&reader, fd[0], NULL, 0, PACKET_READ_CHOMP_NEWLINE);
 
+	version = discover_version(&reader);
+
+	if (version == protocol_v2)
+		do_v2_command_and_cap(fd[1]);
+
 	/*
 	 * Inject a fake --format field at the beginning of the
 	 * arguments, with the format inferred from our output
@@ -56,22 +71,24 @@ static int run_remote_archiver(int argc, const char **argv,
 		packet_write_fmt(fd[1], "argument %s\n", argv[i]);
 	packet_flush(fd[1]);
 
-	status = packet_reader_read(&reader);
-
-	if (status == PACKET_READ_FLUSH)
-		die(_("git archive: expected ACK/NAK, got a flush packet"));
-	if (strcmp(reader.buffer, "ACK")) {
-		if (starts_with(reader.buffer, "NACK "))
-			die(_("git archive: NACK %s"), reader.buffer + 5);
-		if (starts_with(reader.buffer, "ERR "))
-			die(_("remote error: %s"), reader.buffer + 4);
-		die(_("git archive: protocol error"));
+	if (version == protocol_v0) {
+		status = packet_reader_read(&reader);
+
+		if (status == PACKET_READ_FLUSH)
+			die(_("git archive: expected ACK/NAK, got a flush packet"));
+		if (strcmp(reader.buffer, "ACK")) {
+			if (starts_with(reader.buffer, "NACK "))
+				die(_("git archive: NACK %s"), reader.buffer + 5);
+			if (starts_with(reader.buffer, "ERR "))
+				die(_("remote error: %s"), reader.buffer + 4);
+			die(_("git archive: protocol error"));
+		}
+
+		status = packet_reader_read(&reader);
+		if (status != PACKET_READ_FLUSH)
+			die(_("git archive: expected a flush"));
 	}
 
-	status = packet_reader_read(&reader);
-	if (status != PACKET_READ_FLUSH)
-		die(_("git archive: expected a flush"));
-
 	/* Now, start reading from fd[0] and spit it out to stdout */
 	rv = recv_sideband("archive", fd[0], 1);
 	rv |= transport_disconnect(transport);
diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
index 25d911635..534e8fd56 100644
--- a/builtin/upload-archive.c
+++ b/builtin/upload-archive.c
@@ -5,6 +5,7 @@
 #include "builtin.h"
 #include "archive.h"
 #include "pkt-line.h"
+#include "protocol.h"
 #include "sideband.h"
 #include "run-command.h"
 #include "argv-array.h"
@@ -73,13 +74,53 @@ static ssize_t process_input(int child_fd, int band)
 	return sz;
 }
 
+static int handle_v2_command_and_cap(void)
+{
+	struct packet_reader reader;
+	enum packet_read_status status;
+
+	packet_reader_init(&reader, 0, NULL, 0, PACKET_READ_CHOMP_NEWLINE);
+
+	packet_write_fmt(1, "version 2\n");
+	/*
+	 * We don't currently send any capabilities, but maybe we could list
+	 * supported archival formats?
+	 */
+	packet_flush(1);
+
+	status = packet_reader_read(&reader);
+	if (status != PACKET_READ_NORMAL ||
+	    strcmp(reader.buffer, "command=archive"))
+		die(_("upload-archive: expected command=archive"));
+	while (status == PACKET_READ_NORMAL) {
+		/* We don't currently expect any client capabilities, but we
+		 * should still read (and ignore) any that happen to get sent.
+		 */
+		status = packet_reader_read(&reader);
+	}
+	if (status != PACKET_READ_DELIM)
+		die(_("upload-archive: expected delim packet"));
+
+	/* Let git-upload-archive--writer handle the arguments. */
+
+	return 0;
+}
+
 int cmd_upload_archive(int argc, const char **argv, const char *prefix)
 {
 	struct child_process writer = { argv };
+	enum protocol_version version = determine_protocol_version_server();
 
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage(upload_archive_usage);
 
+	if (version == protocol_v2)
+		handle_v2_command_and_cap();
+	else {
+		packet_write_fmt(1, "ACK\n");
+		packet_flush(1);
+	}
+
 	/*
 	 * Set up sideband subprocess.
 	 *
@@ -96,9 +137,6 @@ int cmd_upload_archive(int argc, const char **argv, const char *prefix)
 		die("upload-archive: %s", strerror(err));
 	}
 
-	packet_write_fmt(1, "ACK\n");
-	packet_flush(1);
-
 	while (1) {
 		struct pollfd pfd[2];
 
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 2a97b27b0..4be74d6e9 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -145,6 +145,11 @@ test_expect_success \
 
 check_tar b
 
+test_expect_success 'protocol v2 for remote' '
+	GIT_PROTOCOL="version=2" git archive --remote=. HEAD >v2_remote.tar
+'
+check_tar v2_remote
+
 test_expect_success 'git archive --prefix=prefix/' '
 	git archive --prefix=prefix/ HEAD >with_prefix.tar
 '
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 3/3] archive: allow archive over HTTP(S) with proto v2
  2018-09-12  5:35 Add proto v2 archive command with HTTP support Josh Steadmon
  2018-09-12  5:35 ` [PATCH 1/3] archive: use packet_reader for communications Josh Steadmon
  2018-09-12  5:35 ` [PATCH 2/3] archive: implement protocol v2 archive command Josh Steadmon
@ 2018-09-12  5:35 ` Josh Steadmon
  2018-09-12 22:38   ` Stefan Beller
                     ` (2 more replies)
  2018-09-14  5:36 ` Add proto v2 archive command with HTTP support Jonathan Nieder
  3 siblings, 3 replies; 19+ messages in thread
From: Josh Steadmon @ 2018-09-12  5:35 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, l.s.r, sandals, Josh Steadmon

Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 builtin/archive.c  |  8 +++++++-
 http-backend.c     | 10 +++++++++-
 transport-helper.c |  5 +++--
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/builtin/archive.c b/builtin/archive.c
index 73831887d..5fa75b3f7 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -87,7 +87,13 @@ static int run_remote_archiver(int argc, const char **argv,
 		status = packet_reader_read(&reader);
 		if (status != PACKET_READ_FLUSH)
 			die(_("git archive: expected a flush"));
-	}
+	} else if (version == protocol_v2 &&
+		   starts_with(transport->url, "http"))
+		/*
+		 * Commands over HTTP require two requests, so there's an
+		 * additional server response to parse.
+		 */
+		discover_version(&reader);
 
 	/* Now, start reading from fd[0] and spit it out to stdout */
 	rv = recv_sideband("archive", fd[0], 1);
diff --git a/http-backend.c b/http-backend.c
index 458642ef7..d62d583c7 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -32,6 +32,7 @@ struct rpc_service {
 static struct rpc_service rpc_service[] = {
 	{ "upload-pack", "uploadpack", 1, 1 },
 	{ "receive-pack", "receivepack", 0, -1 },
+	{ "upload-archive", "uploadarchive", 1, 1 },
 };
 
 static struct string_list *get_parameters(void)
@@ -637,6 +638,12 @@ static void service_rpc(struct strbuf *hdr, char *service_name)
 	struct rpc_service *svc = select_service(hdr, service_name);
 	struct strbuf buf = STRBUF_INIT;
 
+	if (!strcmp(service_name, "git-upload-archive")) {
+		/* git-upload-archive doesn't need --stateless-rpc */
+		argv[1] = ".";
+		argv[2] = NULL;
+	}
+
 	strbuf_reset(&buf);
 	strbuf_addf(&buf, "application/x-git-%s-request", svc->name);
 	check_content_type(hdr, buf.buf);
@@ -713,7 +720,8 @@ static struct service_cmd {
 	{"GET", "/objects/pack/pack-[0-9a-f]{40}\\.idx$", get_idx_file},
 
 	{"POST", "/git-upload-pack$", service_rpc},
-	{"POST", "/git-receive-pack$", service_rpc}
+	{"POST", "/git-receive-pack$", service_rpc},
+	{"POST", "/git-upload-archive$", service_rpc},
 };
 
 static int bad_request(struct strbuf *hdr, const struct service_cmd *c)
diff --git a/transport-helper.c b/transport-helper.c
index 143ca008c..b4b96fc89 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -605,7 +605,8 @@ static int process_connect_service(struct transport *transport,
 		ret = run_connect(transport, &cmdbuf);
 	} else if (data->stateless_connect &&
 		   (get_protocol_version_config() == protocol_v2) &&
-		   !strcmp("git-upload-pack", name)) {
+		   (!strcmp("git-upload-pack", name) ||
+		    !strcmp("git-upload-archive", name))) {
 		strbuf_addf(&cmdbuf, "stateless-connect %s\n", name);
 		ret = run_connect(transport, &cmdbuf);
 		if (ret)
@@ -639,7 +640,7 @@ static int connect_helper(struct transport *transport, const char *name,
 
 	/* Get_helper so connect is inited. */
 	get_helper(transport);
-	if (!data->connect)
+	if (!data->connect && !data->stateless_connect)
 		die(_("operation not supported by protocol"));
 
 	if (!process_connect_service(transport, name, exec))
-- 
2.19.0.397.gdd90340f6a-goog


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] archive: use packet_reader for communications
  2018-09-12  5:35 ` [PATCH 1/3] archive: use packet_reader for communications Josh Steadmon
@ 2018-09-12 22:01   ` Stefan Beller
  2018-09-13 14:58   ` Junio C Hamano
  1 sibling, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2018-09-12 22:01 UTC (permalink / raw)
  To: Josh Steadmon
  Cc: git, Jonathan Nieder, Junio C Hamano, René Scharfe,
	brian m. carlson

On Tue, Sep 11, 2018 at 10:35 PM Josh Steadmon <steadmon@google.com> wrote:
>
> Using packet_reader will simplify version detection and capability
> handling, which will make implementation of protocol v2 support in
> git-archive easier.
>
> Signed-off-by: Josh Steadmon <steadmon@google.com>

Reviewed-by: Stefan Beller <sbeller@google.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] archive: implement protocol v2 archive command
  2018-09-12  5:35 ` [PATCH 2/3] archive: implement protocol v2 archive command Josh Steadmon
@ 2018-09-12 22:28   ` Stefan Beller
  2018-09-13 18:45     ` Ævar Arnfjörð Bjarmason
  2018-09-13 16:31   ` Junio C Hamano
  2018-09-14  5:39   ` Jonathan Nieder
  2 siblings, 1 reply; 19+ messages in thread
From: Stefan Beller @ 2018-09-12 22:28 UTC (permalink / raw)
  To: Josh Steadmon
  Cc: git, Jonathan Nieder, Junio C Hamano, René Scharfe,
	brian m. carlson

On Tue, Sep 11, 2018 at 10:36 PM Josh Steadmon <steadmon@google.com> wrote:
>
> This adds a new archive command for protocol v2. The command expects
> arguments in the form "argument X" which are passed unmodified to
> git-upload-archive--writer.
>
> This command works over the file://, Git, and SSH transports. HTTP
> support will be added in a separate patch.
>
> Signed-off-by: Josh Steadmon <steadmon@google.com>
> ---
>  builtin/archive.c        | 45 +++++++++++++++++++++++++++-------------
>  builtin/upload-archive.c | 44 ++++++++++++++++++++++++++++++++++++---
>  t/t5000-tar-tree.sh      |  5 +++++
>  3 files changed, 77 insertions(+), 17 deletions(-)
>
> diff --git a/builtin/archive.c b/builtin/archive.c
> index e54fc39ad..73831887d 100644
> --- a/builtin/archive.c
> +++ b/builtin/archive.c
> @@ -5,9 +5,11 @@
>  #include "cache.h"
>  #include "builtin.h"
>  #include "archive.h"
> +#include "connect.h"
>  #include "transport.h"
>  #include "parse-options.h"
>  #include "pkt-line.h"
> +#include "protocol.h"
>  #include "sideband.h"
>
>  static void create_output_file(const char *output_file)
> @@ -23,6 +25,13 @@ static void create_output_file(const char *output_file)
>         }
>  }
>
> +static int do_v2_command_and_cap(int out)
> +{
> +       packet_write_fmt(out, "command=archive\n");
> +       /* Capability list would go here, if we had any. */
> +       packet_delim(out);
> +}
> +
>  static int run_remote_archiver(int argc, const char **argv,
>                                const char *remote, const char *exec,
>                                const char *name_hint)
> @@ -32,6 +41,7 @@ static int run_remote_archiver(int argc, const char **argv,
>         struct remote *_remote;
>         struct packet_reader reader;
>         enum packet_read_status status;
> +       enum protocol_version version;
>
>         _remote = remote_get(remote);
>         if (!_remote->url[0])
> @@ -41,6 +51,11 @@ static int run_remote_archiver(int argc, const char **argv,
>
>         packet_reader_init(&reader, fd[0], NULL, 0, PACKET_READ_CHOMP_NEWLINE);
>
> +       version = discover_version(&reader);
> +
> +       if (version == protocol_v2)
> +               do_v2_command_and_cap(fd[1]);
> +
>         /*
>          * Inject a fake --format field at the beginning of the
>          * arguments, with the format inferred from our output
> @@ -56,22 +71,24 @@ static int run_remote_archiver(int argc, const char **argv,
>                 packet_write_fmt(fd[1], "argument %s\n", argv[i]);
>         packet_flush(fd[1]);
>
> -       status = packet_reader_read(&reader);
> -
> -       if (status == PACKET_READ_FLUSH)
> -               die(_("git archive: expected ACK/NAK, got a flush packet"));
> -       if (strcmp(reader.buffer, "ACK")) {
> -               if (starts_with(reader.buffer, "NACK "))
> -                       die(_("git archive: NACK %s"), reader.buffer + 5);
> -               if (starts_with(reader.buffer, "ERR "))
> -                       die(_("remote error: %s"), reader.buffer + 4);
> -               die(_("git archive: protocol error"));

Maybe we also want to support v1
(which is v0 prefixed with one pkt_line saying it is v1).

    If (version == protocol_v1)
        /* drop version v1 line, and then follow v0 logic. */
        packet_reader_read(&reader);

Do we care about v1, or do we just ignore it here? why?
(Don't answer me here, but rather put it in the commit message)

> +       if (version == protocol_v0) {
> +               status = packet_reader_read(&reader);
> +
> +               if (status == PACKET_READ_FLUSH)
> +                       die(_("git archive: expected ACK/NAK, got a flush packet"));
> +               if (strcmp(reader.buffer, "ACK")) {
> +                       if (starts_with(reader.buffer, "NACK "))
> +                               die(_("git archive: NACK %s"), reader.buffer + 5);
> +                       if (starts_with(reader.buffer, "ERR "))
> +                               die(_("remote error: %s"), reader.buffer + 4);
> +                       die(_("git archive: protocol error"));
> +               }
> +
> +               status = packet_reader_read(&reader);
> +               if (status != PACKET_READ_FLUSH)
> +                       die(_("git archive: expected a flush"));
>         }
>
> -       status = packet_reader_read(&reader);
> -       if (status != PACKET_READ_FLUSH)
> -               die(_("git archive: expected a flush"));
> -
>         /* Now, start reading from fd[0] and spit it out to stdout */
>         rv = recv_sideband("archive", fd[0], 1);
>         rv |= transport_disconnect(transport);
> diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
> index 25d911635..534e8fd56 100644
> --- a/builtin/upload-archive.c
> +++ b/builtin/upload-archive.c
> @@ -5,6 +5,7 @@
>  #include "builtin.h"
>  #include "archive.h"
>  #include "pkt-line.h"
> +#include "protocol.h"
>  #include "sideband.h"
>  #include "run-command.h"
>  #include "argv-array.h"
> @@ -73,13 +74,53 @@ static ssize_t process_input(int child_fd, int band)
>         return sz;
>  }
>
> +static int handle_v2_command_and_cap(void)
> +{
> +       struct packet_reader reader;
> +       enum packet_read_status status;
> +
> +       packet_reader_init(&reader, 0, NULL, 0, PACKET_READ_CHOMP_NEWLINE);
> +
> +       packet_write_fmt(1, "version 2\n");
> +       /*
> +        * We don't currently send any capabilities, but maybe we could list
> +        * supported archival formats?
> +        */
> +       packet_flush(1);
> +
> +       status = packet_reader_read(&reader);
> +       if (status != PACKET_READ_NORMAL ||
> +           strcmp(reader.buffer, "command=archive"))
> +               die(_("upload-archive: expected command=archive"));
> +       while (status == PACKET_READ_NORMAL) {
> +               /* We don't currently expect any client capabilities, but we
> +                * should still read (and ignore) any that happen to get sent.

/*
 * Makes sense to ignore the client capabilities here,
 * but the multi line comments take their opening
 * and closing line on a separate line. just like above.
 */

> +                */
> +               status = packet_reader_read(&reader);
> +       }
> +       if (status != PACKET_READ_DELIM)
> +               die(_("upload-archive: expected delim packet"));

This is upload-archive, which is a low level plumbing command
(see the main man page of git for an explanation of that category),
so we do not translate the error/die() calls. Besides, this is executed
on the server, which might have a different locale than the requesting
client?

Would asking for a setlocale() on the server side be an unreasonable
feature request for the capabilities (in a follow up patch, and then not
just for archive but also fetch/push, etc.)?

>  int cmd_upload_archive(int argc, const char **argv, const char *prefix)
>  {
>         struct child_process writer = { argv };
> +       enum protocol_version version = determine_protocol_version_server();
>
>         if (argc == 2 && !strcmp(argv[1], "-h"))
>                 usage(upload_archive_usage);
>
> +       if (version == protocol_v2)
> +               handle_v2_command_and_cap();
> +       else {

So if the client asked for v1, we still fall back to v0 here,
which answers my question above.

> +               packet_write_fmt(1, "ACK\n");
> +               packet_flush(1);
> +       }
> +
>         /*
>          * Set up sideband subprocess.
>          *
> @@ -96,9 +137,6 @@ int cmd_upload_archive(int argc, const char **argv, const char *prefix)
>                 die("upload-archive: %s", strerror(err));
>         }
>
> -       packet_write_fmt(1, "ACK\n");
> -       packet_flush(1);
> -
>         while (1) {
>                 struct pollfd pfd[2];
>
> diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
> index 2a97b27b0..4be74d6e9 100755
> --- a/t/t5000-tar-tree.sh
> +++ b/t/t5000-tar-tree.sh
> @@ -145,6 +145,11 @@ test_expect_success \
>
>  check_tar b
>
> +test_expect_success 'protocol v2 for remote' '
> +       GIT_PROTOCOL="version=2" git archive --remote=. HEAD >v2_remote.tar
> +'
> +check_tar v2_remote

Our current standard is to keep all executions inside
a test_expect_* block, but here it is hard to comply with
that as the check_tar function contains test_expect_*
and calling test_expect_* from within itself doesn't work
with our test suite.

So bonus points for a refactoring to bring t5000 up to
our current standard (c.f. t0020 for a reasonable new
code, and t2002 for older code, though that only covers
syntax, not functions)

The check itself is just testing that giving GIT_PROTOCOL=2
in the environment also let's you obtain an archive. It doesn't
test if the actual communication *is* v2.
See 5e3548ef161 (fetch: send server options when using
protocol v2, 2018-04-23) for an example how to sniff on the
network traffic in tests, i.e. use GIT_TRACE_PACKET=...
and grep on that?

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] archive: allow archive over HTTP(S) with proto v2
  2018-09-12  5:35 ` [PATCH 3/3] archive: allow archive over HTTP(S) with proto v2 Josh Steadmon
@ 2018-09-12 22:38   ` Stefan Beller
  2018-09-13 16:47   ` Junio C Hamano
  2018-09-14  5:57   ` Jonathan Nieder
  2 siblings, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2018-09-12 22:38 UTC (permalink / raw)
  To: Josh Steadmon
  Cc: git, Jonathan Nieder, Junio C Hamano, René Scharfe,
	brian m. carlson

On Tue, Sep 11, 2018 at 10:36 PM Josh Steadmon <steadmon@google.com> wrote:
>
> Signed-off-by: Josh Steadmon <steadmon@google.com>
> ---
>  builtin/archive.c  |  8 +++++++-
>  http-backend.c     | 10 +++++++++-
>  transport-helper.c |  5 +++--
>  3 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/archive.c b/builtin/archive.c
> index 73831887d..5fa75b3f7 100644
> --- a/builtin/archive.c
> +++ b/builtin/archive.c
> @@ -87,7 +87,13 @@ static int run_remote_archiver(int argc, const char **argv,
>                 status = packet_reader_read(&reader);
>                 if (status != PACKET_READ_FLUSH)
>                         die(_("git archive: expected a flush"));
> -       }
> +       } else if (version == protocol_v2 &&
> +                  starts_with(transport->url, "http"))

I assume this is a smart way to cover both http and https
as the latter is prefixed by the former.

How does this interact with remote helpers
(See gitremote-helpers(1), which doesn't mention archives,
but I would imagine that one would be able to use a remote
helper eventually, too?

    git archive --remote=abc://example.org/test ....

> +               /*
> +                * Commands over HTTP require two requests, so there's an
> +                * additional server response to parse.
> +                */
> +               discover_version(&reader);
>
>         /* Now, start reading from fd[0] and spit it out to stdout */
>         rv = recv_sideband("archive", fd[0], 1);
> diff --git a/http-backend.c b/http-backend.c
> index 458642ef7..d62d583c7 100644
> --- a/http-backend.c
> +++ b/http-backend.c
> @@ -32,6 +32,7 @@ struct rpc_service {
>  static struct rpc_service rpc_service[] = {
>         { "upload-pack", "uploadpack", 1, 1 },
>         { "receive-pack", "receivepack", 0, -1 },
> +       { "upload-archive", "uploadarchive", 1, 1 },

So git archives are not supported in protocol v0 via http?
Then it makes sense to see not mention of archives in
the remote helpers as well.

The rest of the code is a surprisingly small patch.

Stefan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] archive: use packet_reader for communications
  2018-09-12  5:35 ` [PATCH 1/3] archive: use packet_reader for communications Josh Steadmon
  2018-09-12 22:01   ` Stefan Beller
@ 2018-09-13 14:58   ` Junio C Hamano
  2018-09-13 15:34     ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2018-09-13 14:58 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, jrnieder, l.s.r, sandals

Josh Steadmon <steadmon@google.com> writes:

> Using packet_reader will simplify version detection and capability
> handling, which will make implementation of protocol v2 support in
> git-archive easier.

Is this meant as a change in implementation without any change in
behaviour?

> Signed-off-by: Josh Steadmon <steadmon@google.com>
> ---
>  builtin/archive.c | 23 ++++++++++++++---------
>  1 file changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/builtin/archive.c b/builtin/archive.c
> index e74f67539..e54fc39ad 100644
> --- a/builtin/archive.c
> +++ b/builtin/archive.c
> @@ -27,10 +27,11 @@ static int run_remote_archiver(int argc, const char **argv,
>  			       const char *remote, const char *exec,
>  			       const char *name_hint)
>  {
> -	char *buf;
>  	int fd[2], i, rv;
>  	struct transport *transport;
>  	struct remote *_remote;
> +	struct packet_reader reader;
> +	enum packet_read_status status;
>  
>  	_remote = remote_get(remote);
>  	if (!_remote->url[0])
> @@ -38,6 +39,8 @@ static int run_remote_archiver(int argc, const char **argv,
>  	transport = transport_get(_remote, _remote->url[0]);
>  	transport_connect(transport, "git-upload-archive", exec, fd);
>  
> +	packet_reader_init(&reader, fd[0], NULL, 0, PACKET_READ_CHOMP_NEWLINE);
> +
>  	/*
>  	 * Inject a fake --format field at the beginning of the
>  	 * arguments, with the format inferred from our output
> @@ -53,18 +56,20 @@ static int run_remote_archiver(int argc, const char **argv,
>  		packet_write_fmt(fd[1], "argument %s\n", argv[i]);
>  	packet_flush(fd[1]);
>  
> -	buf = packet_read_line(fd[0], NULL);
> -	if (!buf)
> +	status = packet_reader_read(&reader);
> +
> +	if (status == PACKET_READ_FLUSH)
>  		die(_("git archive: expected ACK/NAK, got a flush packet"));

It is true that packet_read_line() returns a NULL on flush, but the
function also returns NULL on other conditions for which underlying
packet_read() returns 0 (or negative) length.  EOF, normal data with
zero length (i.e. packet length itself is 4), and DELIM packets
would all have led to this die() in the original code.  We fail to
notice that we got something unexpected when we were expecting to
get a normal packet with ACK or NACK on it.

> -	if (strcmp(buf, "ACK")) {
> -		if (starts_with(buf, "NACK "))
> -			die(_("git archive: NACK %s"), buf + 5);
> -		if (starts_with(buf, "ERR "))
> -			die(_("remote error: %s"), buf + 4);
> +	if (strcmp(reader.buffer, "ACK")) {

The way I read packet_reader_read()'s implementation and existing
callers (like the ones in fetch-pack.c) tells me that consumers
should not be looking at "reader.buffer"; they should instead be
reading from "reader.line".

> +		if (starts_with(reader.buffer, "NACK "))
> +			die(_("git archive: NACK %s"), reader.buffer + 5);
> +		if (starts_with(reader.buffer, "ERR "))
> +			die(_("remote error: %s"), reader.buffer + 4);
>  		die(_("git archive: protocol error"));
>  	}
>  
> -	if (packet_read_line(fd[0], NULL))
> +	status = packet_reader_read(&reader);
> +	if (status != PACKET_READ_FLUSH)
>  		die(_("git archive: expected a flush"));

This makes me wonder what happens if we got an EOF instead.  We fail
to notice protocol error here, but do the code after this part
correctly handle the situation?

>  	/* Now, start reading from fd[0] and spit it out to stdout */

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] archive: use packet_reader for communications
  2018-09-13 14:58   ` Junio C Hamano
@ 2018-09-13 15:34     ` Junio C Hamano
  0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2018-09-13 15:34 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, jrnieder, l.s.r, sandals

Junio C Hamano <gitster@pobox.com> writes:

>> -	if (packet_read_line(fd[0], NULL))
>> +	status = packet_reader_read(&reader);
>> +	if (status != PACKET_READ_FLUSH)
>>  		die(_("git archive: expected a flush"));
>
> This makes me wonder what happens if we got an EOF instead.  We fail
> to notice protocol error here, but do the code after this part
> correctly handle the situation?

Sorry, this part of my comment is completely backwards.

We require they send a flush, not a 0-length data packet of length
4, and otherwise we die, even though we used to treate 0-length data
packet of length 4 just like a flush.

So this is making the code more strict than the original.  As long
as all the existing implementations correctly use flush here, there
would be no unintended regression, but it bothers me that we have to
even worry if these behaviour changes affect the already deployed
software negatively.

>
>>  	/* Now, start reading from fd[0] and spit it out to stdout */

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] archive: implement protocol v2 archive command
  2018-09-12  5:35 ` [PATCH 2/3] archive: implement protocol v2 archive command Josh Steadmon
  2018-09-12 22:28   ` Stefan Beller
@ 2018-09-13 16:31   ` Junio C Hamano
  2018-09-14  5:39   ` Jonathan Nieder
  2 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2018-09-13 16:31 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, jrnieder, l.s.r, sandals

Josh Steadmon <steadmon@google.com> writes:

> +static int do_v2_command_and_cap(int out)
> +{
> +	packet_write_fmt(out, "command=archive\n");
> +	/* Capability list would go here, if we had any. */
> +	packet_delim(out);
> +}
> +
>  static int run_remote_archiver(int argc, const char **argv,
>  			       const char *remote, const char *exec,
>  			       const char *name_hint)
> @@ -32,6 +41,7 @@ static int run_remote_archiver(int argc, const char **argv,
>  	struct remote *_remote;
>  	struct packet_reader reader;
>  	enum packet_read_status status;
> +	enum protocol_version version;
>  
>  	_remote = remote_get(remote);
>  	if (!_remote->url[0])
> @@ -41,6 +51,11 @@ static int run_remote_archiver(int argc, const char **argv,
>  
>  	packet_reader_init(&reader, fd[0], NULL, 0, PACKET_READ_CHOMP_NEWLINE);
>  
> +	version = discover_version(&reader);

The original version of upload-archive that is correctly running on
the other end sends either NACK (unable to spawn) or ACK (ready to
serve) to us without waiting for us to speak first, so peeking that
with this discover_version() is a safe thing to do.

> +	if (version == protocol_v2)
> +		do_v2_command_and_cap(fd[1]);
> +

With proto v2, "server capabilities" have already been collected in
server_capabilities_v2 array in discover_version().  We are to pick
and ask the capabilities in that function and respond.  Right now we
do not need to do much, as we saw that very thin implementation of
that function above.

>  	/*
>  	 * Inject a fake --format field at the beginning of the
>  	 * arguments, with the format inferred from our output

And then after that, both the original and updated protocol lets us
send the archive format and arguments (like revs and pathspecs),
followed by a flush packet...

> @@ -56,22 +71,24 @@ static int run_remote_archiver(int argc, const char **argv,
>  		packet_write_fmt(fd[1], "argument %s\n", argv[i]);
>  	packet_flush(fd[1]);

... which is a piece of code shared between the protocol versions
that ends here.

> -	status = packet_reader_read(&reader);
> -
> -	if (status == PACKET_READ_FLUSH)
> -		die(_("git archive: expected ACK/NAK, got a flush packet"));
> -	if (strcmp(reader.buffer, "ACK")) {
> -		if (starts_with(reader.buffer, "NACK "))
> -			die(_("git archive: NACK %s"), reader.buffer + 5);
> -		if (starts_with(reader.buffer, "ERR "))
> -			die(_("remote error: %s"), reader.buffer + 4);
> -		die(_("git archive: protocol error"));
> +	if (version == protocol_v0) {
> +		status = packet_reader_read(&reader);
> +
> +		if (status == PACKET_READ_FLUSH)
> +			die(_("git archive: expected ACK/NAK, got a flush packet"));
> +		if (strcmp(reader.buffer, "ACK")) {
> +			if (starts_with(reader.buffer, "NACK "))
> +				die(_("git archive: NACK %s"), reader.buffer + 5);
> +			if (starts_with(reader.buffer, "ERR "))
> +				die(_("remote error: %s"), reader.buffer + 4);
> +			die(_("git archive: protocol error"));
> +		}
> +
> +		status = packet_reader_read(&reader);
> +		if (status != PACKET_READ_FLUSH)
> +			die(_("git archive: expected a flush"));
>  	}

The original protocol lets upload-archive to report failure to spawn
the writer backend process and lets us act on it.  We do not need a
similar support in the updated protocol and instead can jump right
into receiving the archive stream because...?

> -	status = packet_reader_read(&reader);
> -	if (status != PACKET_READ_FLUSH)
> -		die(_("git archive: expected a flush"));
> -

>  	/* Now, start reading from fd[0] and spit it out to stdout */
>  	rv = recv_sideband("archive", fd[0], 1);
>  	rv |= transport_disconnect(transport);



> diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
> index 25d911635..534e8fd56 100644
> --- a/builtin/upload-archive.c
> +++ b/builtin/upload-archive.c
> @@ -5,6 +5,7 @@
>  #include "builtin.h"
>  #include "archive.h"
>  #include "pkt-line.h"
> +#include "protocol.h"
>  #include "sideband.h"
>  #include "run-command.h"
>  #include "argv-array.h"
> @@ -73,13 +74,53 @@ static ssize_t process_input(int child_fd, int band)
>  	return sz;
>  }
>  
> +static int handle_v2_command_and_cap(void)
> +{
> +	struct packet_reader reader;
> +	enum packet_read_status status;
> +
> +	packet_reader_init(&reader, 0, NULL, 0, PACKET_READ_CHOMP_NEWLINE);
> +
> +	packet_write_fmt(1, "version 2\n");

This lets the discover_version() on the other side notice that we
are speaking version 2.

> +	/*
> +	 * We don't currently send any capabilities, but maybe we could list
> +	 * supported archival formats?
> +	 */
> +	packet_flush(1);

process_capabilities_v2() expects the list of caps ends with a
flush, which is given here.

> +	status = packet_reader_read(&reader);
> +	if (status != PACKET_READ_NORMAL ||
> +	    strcmp(reader.buffer, "command=archive"))
> +		die(_("upload-archive: expected command=archive"));

The other side in do_v2_command_and_cap() would ask command=archive
and that is verified.  _() is unwanted, I suppose, by the way, as
you do not know what language the other side wants anyway.

> +	while (status == PACKET_READ_NORMAL) {
> +		/* We don't currently expect any client capabilities, but we
> +		 * should still read (and ignore) any that happen to get sent.
> +		 */
> +		status = packet_reader_read(&reader);

It is wrong to say we should "ignore".  If you are asked to behave
in a certain way by a capability that is not understood, the other
side expects you to honor that request and you have no idea how to
comply.  At least you should make sure that what is asked is among
the capabilities you offered (or you understand), and you should
error out when you see an unknown one, no?

> +	}
> +	if (status != PACKET_READ_DELIM)
> +		die(_("upload-archive: expected delim packet"));
> +
> +	/* Let git-upload-archive--writer handle the arguments. */

The choice of DELIM here over FLUSH is a bit curious, but it is
consistent between upload-archive and run-remote-archiver.

> +	return 0;
> +}
> +
>  int cmd_upload_archive(int argc, const char **argv, const char *prefix)
>  {
>  	struct child_process writer = { argv };
> +	enum protocol_version version = determine_protocol_version_server();
>  
>  	if (argc == 2 && !strcmp(argv[1], "-h"))
>  		usage(upload_archive_usage);
>  
> +	if (version == protocol_v2)
> +		handle_v2_command_and_cap();
> +	else {
> +		packet_write_fmt(1, "ACK\n");
> +		packet_flush(1);
> +	}
> +

This breaks the original protocol, no?  At this point we haven't
even tried to start the writer process, and letting the other side
go by giving ACK + flush prematurely.  After start_command() fails,
we may say NACK, but the other side is no longer listening to it.

>  	/*
>  	 * Set up sideband subprocess.
>  	 *
> @@ -96,9 +137,6 @@ int cmd_upload_archive(int argc, const char **argv, const char *prefix)
>  		die("upload-archive: %s", strerror(err));
>  	}
>  
> -	packet_write_fmt(1, "ACK\n");
> -	packet_flush(1);
> -
>  	while (1) {
>  		struct pollfd pfd[2];
>  
> diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
> index 2a97b27b0..4be74d6e9 100755
> --- a/t/t5000-tar-tree.sh
> +++ b/t/t5000-tar-tree.sh
> @@ -145,6 +145,11 @@ test_expect_success \
>  
>  check_tar b
>  
> +test_expect_success 'protocol v2 for remote' '
> +	GIT_PROTOCOL="version=2" git archive --remote=. HEAD >v2_remote.tar
> +'
> +check_tar v2_remote
> +
>  test_expect_success 'git archive --prefix=prefix/' '
>  	git archive --prefix=prefix/ HEAD >with_prefix.tar
>  '

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] archive: allow archive over HTTP(S) with proto v2
  2018-09-12  5:35 ` [PATCH 3/3] archive: allow archive over HTTP(S) with proto v2 Josh Steadmon
  2018-09-12 22:38   ` Stefan Beller
@ 2018-09-13 16:47   ` Junio C Hamano
  2018-09-14  5:57   ` Jonathan Nieder
  2 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2018-09-13 16:47 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, jrnieder, l.s.r, sandals

Josh Steadmon <steadmon@google.com> writes:

> Signed-off-by: Josh Steadmon <steadmon@google.com>
> ---
>  builtin/archive.c  |  8 +++++++-
>  http-backend.c     | 10 +++++++++-
>  transport-helper.c |  5 +++--
>  3 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/archive.c b/builtin/archive.c
> index 73831887d..5fa75b3f7 100644
> --- a/builtin/archive.c
> +++ b/builtin/archive.c
> @@ -87,7 +87,13 @@ static int run_remote_archiver(int argc, const char **argv,
>  		status = packet_reader_read(&reader);
>  		if (status != PACKET_READ_FLUSH)
>  			die(_("git archive: expected a flush"));
> -	}
> +	} else if (version == protocol_v2 &&
> +		   starts_with(transport->url, "http"))
> +		/*
> +		 * Commands over HTTP require two requests, so there's an
> +		 * additional server response to parse.
> +		 */
> +		discover_version(&reader);

What should happen if the version discovered here is different from
v2 or the capabilities offered are different from what we saw
before?  Perhaps we need some sanity checks, as we on this side of
the connection know we are making two requests, and may even end up
talking with an instance of "upload-archive" that is different from
the one we talked with earlier.

> diff --git a/http-backend.c b/http-backend.c
> index 458642ef7..d62d583c7 100644
> --- a/http-backend.c
> +++ b/http-backend.c
> @@ -32,6 +32,7 @@ struct rpc_service {
>  static struct rpc_service rpc_service[] = {
>  	{ "upload-pack", "uploadpack", 1, 1 },
>  	{ "receive-pack", "receivepack", 0, -1 },
> +	{ "upload-archive", "uploadarchive", 1, 1 },
>  };
>  
>  static struct string_list *get_parameters(void)
> @@ -637,6 +638,12 @@ static void service_rpc(struct strbuf *hdr, char *service_name)
>  	struct rpc_service *svc = select_service(hdr, service_name);
>  	struct strbuf buf = STRBUF_INIT;
>  
> +	if (!strcmp(service_name, "git-upload-archive")) {
> +		/* git-upload-archive doesn't need --stateless-rpc */
> +		argv[1] = ".";
> +		argv[2] = NULL;
> +	}
> +
>  	strbuf_reset(&buf);
>  	strbuf_addf(&buf, "application/x-git-%s-request", svc->name);
>  	check_content_type(hdr, buf.buf);
> @@ -713,7 +720,8 @@ static struct service_cmd {
>  	{"GET", "/objects/pack/pack-[0-9a-f]{40}\\.idx$", get_idx_file},
>  
>  	{"POST", "/git-upload-pack$", service_rpc},
> -	{"POST", "/git-receive-pack$", service_rpc}
> +	{"POST", "/git-receive-pack$", service_rpc},
> +	{"POST", "/git-upload-archive$", service_rpc},
>  };
>  
>  static int bad_request(struct strbuf *hdr, const struct service_cmd *c)
> diff --git a/transport-helper.c b/transport-helper.c
> index 143ca008c..b4b96fc89 100644
> --- a/transport-helper.c
> +++ b/transport-helper.c
> @@ -605,7 +605,8 @@ static int process_connect_service(struct transport *transport,
>  		ret = run_connect(transport, &cmdbuf);
>  	} else if (data->stateless_connect &&
>  		   (get_protocol_version_config() == protocol_v2) &&
> -		   !strcmp("git-upload-pack", name)) {
> +		   (!strcmp("git-upload-pack", name) ||
> +		    !strcmp("git-upload-archive", name))) {
>  		strbuf_addf(&cmdbuf, "stateless-connect %s\n", name);
>  		ret = run_connect(transport, &cmdbuf);
>  		if (ret)
> @@ -639,7 +640,7 @@ static int connect_helper(struct transport *transport, const char *name,
>  
>  	/* Get_helper so connect is inited. */
>  	get_helper(transport);
> -	if (!data->connect)
> +	if (!data->connect && !data->stateless_connect)
>  		die(_("operation not supported by protocol"));

This is somewhat curious.  The upload-pack going over HTTP also is
triggered by the same "stateless-connect" remote helper command, as
we just saw in the previous hunk, and that support is not new.  Why
do we need this change then?  What's different between the "data"
that is obtained by get_helper(transport) for driving upload-pack
and upload-archive?  Presumably upload-pack was working without this
change because it also sets the connect bit on, and upload-archive
does not work without this change because it does not?  Why do these
two behave differently?

>  	if (!process_connect_service(transport, name, exec))

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] archive: implement protocol v2 archive command
  2018-09-12 22:28   ` Stefan Beller
@ 2018-09-13 18:45     ` Ævar Arnfjörð Bjarmason
  2018-09-14  6:05       ` Jonathan Nieder
  0 siblings, 1 reply; 19+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-13 18:45 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Josh Steadmon, git, Jonathan Nieder, Junio C Hamano,
	René Scharfe, brian m. carlson, Brandon Williams, Ben Peart,
	Jeff Hostetler


On Wed, Sep 12 2018, Stefan Beller wrote:

> On Tue, Sep 11, 2018 at 10:36 PM Josh Steadmon <steadmon@google.com> wrote:
>> +                */
>> +               status = packet_reader_read(&reader);
>> +       }
>> +       if (status != PACKET_READ_DELIM)
>> +               die(_("upload-archive: expected delim packet"));
>
> This is upload-archive, which is a low level plumbing command
> (see the main man page of git for an explanation of that category),
> so we do not translate the error/die() calls. Besides, this is executed
> on the server, which might have a different locale than the requesting
> client?
>
> Would asking for a setlocale() on the server side be an unreasonable
> feature request for the capabilities (in a follow up patch, and then not
> just for archive but also fetch/push, etc.)?

This would be very nice to have, but as you suggest in some follow-up
change.

I think though that instead of doing setlocale() it would be better to
pass some flag saying we're operating in a machine-readable mode, and
then we'd (as part of the protocol defintion) say we're going to emit
GIT_ERR_UPLOAD_ARCHIVE_EXPECTED_DELIM_PACKET or whatever.

Advantages of doing that over a server-side setlocale():

 1) Purely for translation purposes, users can update to a newer client
    to get new translations, even though they're talking to an old
    server.

 2) Again, only for translation purposes, servers may not have the
    appropriate locales generated and/or linked to libgettext.

 3) Ditto, some clients that aren't git.git may want/need to emit
    different translation messages to their consumers than what we have,
    think some GUI client / Emacs magit etc. whose UI is different from
    ours.

 4) Aside from translation purposes, getting a machine-readable
    "push/pull" etc. mode would be very handy. E.g. now you need to
    parse stderr to see why exactly your push failed (hook denied, or
    non-fast-forward, or non-fast-forward where there was a lock race
    condition? ...).

I also wonder if something like #4 wouldn't compliment something like
the proposed structured logging[1]. I.e. even though we'd like to run
git.git, and present exactly the message to the user we do now, we might
want to run in such a machine-readable mode under the hood when talking
to the server so we can log exactly how the push went / how it failed
for the purposes of aggregation.

1. https://public-inbox.org/git/20180713165621.52017-2-git@jeffhostetler.com/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Add proto v2 archive command with HTTP support
  2018-09-12  5:35 Add proto v2 archive command with HTTP support Josh Steadmon
                   ` (2 preceding siblings ...)
  2018-09-12  5:35 ` [PATCH 3/3] archive: allow archive over HTTP(S) with proto v2 Josh Steadmon
@ 2018-09-14  5:36 ` Jonathan Nieder
  3 siblings, 0 replies; 19+ messages in thread
From: Jonathan Nieder @ 2018-09-14  5:36 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, gitster, l.s.r, sandals

Hi,

Josh Steadmon wrote:

> This series adds a new protocol v2 command for archiving, and allows
> this command to work over HTTP(S). This was previously discussed in [1].
> I've CCed everyone who participated in that discussion.

Yay!  Getting ready to read it now.

For the future, "git format-patch --cover-letter" does a few things
that can be nice for this kind of opening message:

- it lists the patches in the series, and a diffstat
- it puts [PATCH 0/3] in the subject line so people know what to expect

Thanks,
Jonathan

> [1]: https://public-inbox.org/git/CANq=j3tK7QeBJOC7VNWkh4+WBNibMJJp5YUkd9te5NaYwukAow@mail.gmail.com/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] archive: implement protocol v2 archive command
  2018-09-12  5:35 ` [PATCH 2/3] archive: implement protocol v2 archive command Josh Steadmon
  2018-09-12 22:28   ` Stefan Beller
  2018-09-13 16:31   ` Junio C Hamano
@ 2018-09-14  5:39   ` Jonathan Nieder
  2 siblings, 0 replies; 19+ messages in thread
From: Jonathan Nieder @ 2018-09-14  5:39 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, gitster, l.s.r, sandals

Hi,

Josh Steadmon wrote:

> This adds a new archive command for protocol v2. The command expects
> arguments in the form "argument X" which are passed unmodified to
> git-upload-archive--writer.
>
> This command works over the file://, Git, and SSH transports. HTTP
> support will be added in a separate patch.
>
> Signed-off-by: Josh Steadmon <steadmon@google.com>
> ---
>  builtin/archive.c        | 45 +++++++++++++++++++++++++++-------------
>  builtin/upload-archive.c | 44 ++++++++++++++++++++++++++++++++++++---
>  t/t5000-tar-tree.sh      |  5 +++++
>  3 files changed, 77 insertions(+), 17 deletions(-)

I like the diffstat. :)

Can this include some docs in Documentation/technical/ as well, to
help other implementors to understand the protocol so they can
interoperate?

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] archive: allow archive over HTTP(S) with proto v2
  2018-09-12  5:35 ` [PATCH 3/3] archive: allow archive over HTTP(S) with proto v2 Josh Steadmon
  2018-09-12 22:38   ` Stefan Beller
  2018-09-13 16:47   ` Junio C Hamano
@ 2018-09-14  5:57   ` Jonathan Nieder
  2 siblings, 0 replies; 19+ messages in thread
From: Jonathan Nieder @ 2018-09-14  5:57 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, gitster, l.s.r, sandals

Hi,

Josh Steadmon wrote:

> Subject: archive: allow archive over HTTP(S) with proto v2

It's interesting how little this has to touch the client.

> Signed-off-by: Josh Steadmon <steadmon@google.com>
> ---
>  builtin/archive.c  |  8 +++++++-
>  http-backend.c     | 10 +++++++++-
>  transport-helper.c |  5 +++--
>  3 files changed, 19 insertions(+), 4 deletions(-)
[....]
> --- a/builtin/archive.c
> +++ b/builtin/archive.c
> @@ -87,7 +87,13 @@ static int run_remote_archiver(int argc, const char **argv,
>  		status = packet_reader_read(&reader);
>  		if (status != PACKET_READ_FLUSH)
>  			die(_("git archive: expected a flush"));
> -	}
> +	} else if (version == protocol_v2 &&
> +		   starts_with(transport->url, "http"))

As Stefan noticed, this starts_with test seems a bit too loose.  For
example, what happens if I try an scp-style SSH URL like
http.example.com:path/to/repo, a local path like http/foo/bar, or a
custom protocol like httplikebutbetter://path/to/repo (honest
question: I haven't tried)?

> +		/*
> +		 * Commands over HTTP require two requests, so there's an
> +		 * additional server response to parse.
> +		 */
> +		discover_version(&reader);

Can this be made consistent with the non-http case?  The original
capabilities (/info/refs) response told us what protocol version the
server wants to use, which means that a hypothetical protocol v3 could
use a completely different request format for the followup commands:
so could the server omit the protocol version in the v2
/git-upload-archive response?  Alternatively, if we want to include
the protocol version again, could we do that in stateful protocols as
well?

Related question: what should happen if the two responses declare
different protocol versions?  Should we diagnose that as a protocol
error?

[...]
> --- a/http-backend.c
> +++ b/http-backend.c
> @@ -32,6 +32,7 @@ struct rpc_service {
>  static struct rpc_service rpc_service[] = {
>  	{ "upload-pack", "uploadpack", 1, 1 },
>  	{ "receive-pack", "receivepack", 0, -1 },
> +	{ "upload-archive", "uploadarchive", 1, 1 },

shell.c orders these in almost-alphabetical order (receive-pack,
upload-pack, upload-archive).  I guess they should both use actual
alphabetical order?  (If you agree, then please feel free to do that
in a separate patch.)

[...]
> @@ -637,6 +638,12 @@ static void service_rpc(struct strbuf *hdr, char *service_name)
>  	struct rpc_service *svc = select_service(hdr, service_name);
>  	struct strbuf buf = STRBUF_INIT;
>  
> +	if (!strcmp(service_name, "git-upload-archive")) {
> +		/* git-upload-archive doesn't need --stateless-rpc */

This comment doesn't seem actionable.  Can it say why?  E.g. "[...]
because an upload-archive command always involves a single
round-trip".  Or alternatively, I think it's fine to omit the comment.

> +		argv[1] = ".";
> +		argv[2] = NULL;
> +	}
[...]
> @@ -713,7 +720,8 @@ static struct service_cmd {
>  	{"GET", "/objects/pack/pack-[0-9a-f]{40}\\.idx$", get_idx_file},
>  
>  	{"POST", "/git-upload-pack$", service_rpc},
> -	{"POST", "/git-receive-pack$", service_rpc}
> +	{"POST", "/git-receive-pack$", service_rpc},
> +	{"POST", "/git-upload-archive$", service_rpc},

Same comment about services seeming to be in a randomish order.

[...]
> --- a/transport-helper.c
> +++ b/transport-helper.c
> @@ -605,7 +605,8 @@ static int process_connect_service(struct transport *transport,
>  		ret = run_connect(transport, &cmdbuf);
>  	} else if (data->stateless_connect &&
>  		   (get_protocol_version_config() == protocol_v2) &&

(not about this patch) These parens don't help --- they make it harder
for me to read, especially with the new parens to try to match them up
with.

> -		   !strcmp("git-upload-pack", name)) {
> +		   (!strcmp("git-upload-pack", name) ||
> +		    !strcmp("git-upload-archive", name))) {

A part of me wonders about the wasted cycles comparing to
"git-upload-" twice, but (1) it is tiny relative to actually serving
the request and (2) if we're lucky, the compiler (or a compiler of the
future) inlines the strcmp call and could optimize it out.

[...]
> @@ -639,7 +640,7 @@ static int connect_helper(struct transport *transport, const char *name,
>  
>  	/* Get_helper so connect is inited. */
>  	get_helper(transport);
> -	if (!data->connect)
> +	if (!data->connect && !data->stateless_connect)
>  		die(_("operation not supported by protocol"));

I don't understand this part.  Can you explain it further (possibly by
putting it in its own patch)?

Thanks for a pleasant read,
Jonathan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] archive: implement protocol v2 archive command
  2018-09-13 18:45     ` Ævar Arnfjörð Bjarmason
@ 2018-09-14  6:05       ` Jonathan Nieder
  2018-09-14 14:31         ` Ævar Arnfjörð Bjarmason
  2018-09-14 16:14         ` Junio C Hamano
  0 siblings, 2 replies; 19+ messages in thread
From: Jonathan Nieder @ 2018-09-14  6:05 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Stefan Beller, Josh Steadmon, git, Junio C Hamano,
	René Scharfe, brian m. carlson, Brandon Williams, Ben Peart,
	Jeff Hostetler

Ævar Arnfjörð Bjarmason wrote:
> On Wed, Sep 12 2018, Stefan Beller wrote:

>> Would asking for a setlocale() on the server side be an unreasonable
>> feature request for the capabilities (in a follow up patch, and then not
>> just for archive but also fetch/push, etc.)?
>
> This would be very nice to have, but as you suggest in some follow-up
> change.

Indeed, I think we've gone pretty far afield from the goal of this
patch series.

> I think though that instead of doing setlocale() it would be better to
> pass some flag saying we're operating in a machine-readable mode, and
> then we'd (as part of the protocol defintion) say we're going to emit
> GIT_ERR_UPLOAD_ARCHIVE_EXPECTED_DELIM_PACKET or whatever.

I think you're suggesting client-side message generation, and that is
one way to handle internationalization of server output.

The main downside is when the server really does want to provide a
custom error message.  For that, we'd need

 1. To propagate LANG to the server, so it knows what human language
    to generate messages in.

 2. On the server side, to produce messages in that language if
    available, with an appropriate fallback if not.

We've been thinking of doing at least (1) using the same trick as
server-options use (cramming it into client capabilities).

It is difficult to use setlocale for this because it affects the whole
program (problematic for a threaded server) and affects features like
collation order instead of just message generation (problematic for
many things).  Does gettext have a variant that takes a locale_t
argument?

[...]
>  4) Aside from translation purposes, getting a machine-readable
>     "push/pull" etc. mode would be very handy. E.g. now you need to
>     parse stderr to see why exactly your push failed (hook denied, or
>     non-fast-forward, or non-fast-forward where there was a lock race
>     condition? ...).

Indeed, this is a good reason to provide error codes instead of (in
the case where the message doesn't add anything to it) or alongside
(in case the error message is more specialized) human-oriented error
messages.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] archive: implement protocol v2 archive command
  2018-09-14  6:05       ` Jonathan Nieder
@ 2018-09-14 14:31         ` Ævar Arnfjörð Bjarmason
  2018-09-14 16:14         ` Junio C Hamano
  1 sibling, 0 replies; 19+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-14 14:31 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Stefan Beller, Josh Steadmon, git, Junio C Hamano,
	René Scharfe, brian m. carlson, Brandon Williams, Ben Peart,
	Jeff Hostetler


On Fri, Sep 14 2018, Jonathan Nieder wrote:

> Ævar Arnfjörð Bjarmason wrote:
>> On Wed, Sep 12 2018, Stefan Beller wrote:
>
>>> Would asking for a setlocale() on the server side be an unreasonable
>>> feature request for the capabilities (in a follow up patch, and then not
>>> just for archive but also fetch/push, etc.)?
>>
>> This would be very nice to have, but as you suggest in some follow-up
>> change.
>
> Indeed, I think we've gone pretty far afield from the goal of this
> patch series.
>
>> I think though that instead of doing setlocale() it would be better to
>> pass some flag saying we're operating in a machine-readable mode, and
>> then we'd (as part of the protocol defintion) say we're going to emit
>> GIT_ERR_UPLOAD_ARCHIVE_EXPECTED_DELIM_PACKET or whatever.
>
> I think you're suggesting client-side message generation, and that is
> one way to handle internationalization of server output.
>
> The main downside is when the server really does want to provide a
> custom error message.  For that, we'd need

Yeah you can't do it for everything. E.g. hooks will want to spew out
custom messages, and this hypothetical protocol extension won't have
codes for those. So you'll still need to pass $LANG along.

>  1. To propagate LANG to the server, so it knows what human language
>     to generate messages in.
>
>  2. On the server side, to produce messages in that language if
>     available, with an appropriate fallback if not.
>
> We've been thinking of doing at least (1) using the same trick as
> server-options use (cramming it into client capabilities).
>
> It is difficult to use setlocale for this because it affects the whole
> program (problematic for a threaded server) and affects features like
> collation order instead of just message generation (problematic for
> many things).  Does gettext have a variant that takes a locale_t
> argument?

No, its API is fairly crappy and depends on setlocale().

Keep in mind though that we're not tied to that API. E.g. one way to
work around this problem is to simply loop through all the languages we
have translations for at server startup, for each one call setlocale()
and gettext(), and save the result in a hash table for runtime lookup,
then you'd just call sprintf(hash[language][message_id], ...) at
runtime.

That's all libintl is really doing under the hood, in a roundabout way
where calls to setlocale() determine what table we're looking things up
in.

The reason I opted to go for gettext to begin with was mainly a) it was
there b) the ubiquitous availability of tooling for translators when it
comes to the *.po files.

But the API for looking things up at runtime is fairly small, and easy
to replace. We could e.g. replace all of our own gettext.[ch] wrapper
with something that works somewhat like what I described above, with
some extra build step to extract the relevant data from the *.mo files
(or parse the *.po directly).

> [...]
>>  4) Aside from translation purposes, getting a machine-readable
>>     "push/pull" etc. mode would be very handy. E.g. now you need to
>>     parse stderr to see why exactly your push failed (hook denied, or
>>     non-fast-forward, or non-fast-forward where there was a lock race
>>     condition? ...).
>
> Indeed, this is a good reason to provide error codes instead of (in
> the case where the message doesn't add anything to it) or alongside
> (in case the error message is more specialized) human-oriented error
> messages.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] archive: implement protocol v2 archive command
  2018-09-14  6:05       ` Jonathan Nieder
  2018-09-14 14:31         ` Ævar Arnfjörð Bjarmason
@ 2018-09-14 16:14         ` Junio C Hamano
  2018-09-14 16:19           ` Jonathan Nieder
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2018-09-14 16:14 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Ævar Arnfjörð Bjarmason, Stefan Beller,
	Josh Steadmon, git, René Scharfe, brian m. carlson,
	Brandon Williams, Ben Peart, Jeff Hostetler

Jonathan Nieder <jrnieder@gmail.com> writes:

>> I think though that instead of doing setlocale() it would be better to
>> pass some flag saying we're operating in a machine-readable mode, and
>> then we'd (as part of the protocol defintion) say we're going to emit
>> GIT_ERR_UPLOAD_ARCHIVE_EXPECTED_DELIM_PACKET or whatever.
>
> I think you're suggesting client-side message generation, and that is
> one way to handle internationalization of server output.
>
> The main downside is when the server really does want to provide a
> custom error message.  For that, we'd need
>
>  1. To propagate LANG to the server, so it knows what human language
>     to generate messages in.
>
>  2. On the server side, to produce messages in that language if
>     available, with an appropriate fallback if not.

That is one way to do so, but it does not have to be the only way, I
would think.  You can send a machine parsable message in pieces, and
assemble the parts of speech into a message at the receiving end.
Like sending a msgid to identify an entry in the .pot file, and
values to be filled in.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] archive: implement protocol v2 archive command
  2018-09-14 16:14         ` Junio C Hamano
@ 2018-09-14 16:19           ` Jonathan Nieder
  0 siblings, 0 replies; 19+ messages in thread
From: Jonathan Nieder @ 2018-09-14 16:19 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Stefan Beller,
	Josh Steadmon, git, René Scharfe, brian m. carlson,
	Brandon Williams, Ben Peart, Jeff Hostetler

Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:

>>> I think though that instead of doing setlocale() it would be better to
>>> pass some flag saying we're operating in a machine-readable mode, and
>>> then we'd (as part of the protocol defintion) say we're going to emit
>>> GIT_ERR_UPLOAD_ARCHIVE_EXPECTED_DELIM_PACKET or whatever.
>>
>> I think you're suggesting client-side message generation, and that is
>> one way to handle internationalization of server output.
>>
>> The main downside is when the server really does want to provide a
>> custom error message.  For that, we'd need
>>
>>  1. To propagate LANG to the server, so it knows what human language
>>     to generate messages in.
>>
>>  2. On the server side, to produce messages in that language if
>>     available, with an appropriate fallback if not.
>
> That is one way to do so, but it does not have to be the only way, I
> would think.  You can send a machine parsable message in pieces, and
> assemble the parts of speech into a message at the receiving end.
> Like sending a msgid to identify an entry in the .pot file, and
> values to be filled in.

That works if the same party controls the client and server and the
client is up to date enough to know about every message the server
would want to send.

It doesn't work for
- hooks
- alternate server implementations
- messages involved in an emergency fix
- ... etc ...

Don't get me wrong: for messages with a machine as an audience, error
codes or similar structured errors are a great way to go, and getting
client-side generation of messages for humans (not to mention styling,
etc) are a nice bonus there.  I stand by what's in the message you're
replying to, though: if we actually want to be able to consistently
provide useful messages to people who do not like to read English,
then client-side generation won't get us all the way there.

Jonathan

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, back to index

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-12  5:35 Add proto v2 archive command with HTTP support Josh Steadmon
2018-09-12  5:35 ` [PATCH 1/3] archive: use packet_reader for communications Josh Steadmon
2018-09-12 22:01   ` Stefan Beller
2018-09-13 14:58   ` Junio C Hamano
2018-09-13 15:34     ` Junio C Hamano
2018-09-12  5:35 ` [PATCH 2/3] archive: implement protocol v2 archive command Josh Steadmon
2018-09-12 22:28   ` Stefan Beller
2018-09-13 18:45     ` Ævar Arnfjörð Bjarmason
2018-09-14  6:05       ` Jonathan Nieder
2018-09-14 14:31         ` Ævar Arnfjörð Bjarmason
2018-09-14 16:14         ` Junio C Hamano
2018-09-14 16:19           ` Jonathan Nieder
2018-09-13 16:31   ` Junio C Hamano
2018-09-14  5:39   ` Jonathan Nieder
2018-09-12  5:35 ` [PATCH 3/3] archive: allow archive over HTTP(S) with proto v2 Josh Steadmon
2018-09-12 22:38   ` Stefan Beller
2018-09-13 16:47   ` Junio C Hamano
2018-09-14  5:57   ` Jonathan Nieder
2018-09-14  5:36 ` Add proto v2 archive command with HTTP support Jonathan Nieder

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox