git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v7 0/9] connect: various cleanups
@ 2016-05-21 23:17 Mike Hommey
  2016-05-21 23:17 ` [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port Mike Hommey
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Mike Hommey @ 2016-05-21 23:17 UTC (permalink / raw)
  To: git; +Cc: gitster, tboegi

Difference with v6:
- Dropped the core.gitProxy change as per discussion.
- Added a comment about the extra get_port.

Note that after this series, parse_connect_url can be further refactored to
avoid the kind of back-and-forth with host_end, get_host_and_port and
get_port.

Mike Hommey (9):
  connect: document why we sometimes call get_port after
    get_host_and_port
  connect: call get_host_and_port() earlier
  connect: re-derive a host:port string from the separate host and port
    variables
  connect: make parse_connect_url() return separated host and port
  connect: group CONNECT_DIAG_URL handling code
  connect: make parse_connect_url() return the user part of the url as a
    separate value
  connect: change the --diag-url output to separate user and host
  connect: actively reject git:// urls with a user part
  connect: move ssh command line preparation to a separate function

 connect.c             | 231 +++++++++++++++++++++++++++++---------------------
 t/t5500-fetch-pack.sh |  42 ++++++---
 2 files changed, 166 insertions(+), 107 deletions(-)

-- 
2.8.3.401.ga81c606.dirty

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port
  2016-05-21 23:17 [PATCH v7 0/9] connect: various cleanups Mike Hommey
@ 2016-05-21 23:17 ` Mike Hommey
  2016-05-22  2:40   ` Eric Sunshine
  2016-05-22  6:07   ` Torsten Bögershausen
  2016-05-21 23:17 ` [PATCH v7 2/9] connect: call get_host_and_port() earlier Mike Hommey
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 19+ messages in thread
From: Mike Hommey @ 2016-05-21 23:17 UTC (permalink / raw)
  To: git; +Cc: gitster, tboegi

Signed-off-by: Mike Hommey <mh@glandium.org>
---
 connect.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/connect.c b/connect.c
index c53f3f1..caa2a3c 100644
--- a/connect.c
+++ b/connect.c
@@ -742,6 +742,12 @@ struct child_process *git_connect(int fd[2], const char *url,
 			transport_check_allowed("ssh");
 			get_host_and_port(&ssh_host, &port);
 
+			/* get_host_and_port may not return a port even when
+			 * there is one: In the [host:port]:path case,
+			 * get_host_and_port is called with "[host:port]" and
+			 * returns "host:port" and NULL.
+			 * In that specific case, we still need to split the
+			 * port. */
 			if (!port)
 				port = get_port(ssh_host);
 
-- 
2.8.3.401.ga81c606.dirty

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v7 2/9] connect: call get_host_and_port() earlier
  2016-05-21 23:17 [PATCH v7 0/9] connect: various cleanups Mike Hommey
  2016-05-21 23:17 ` [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port Mike Hommey
@ 2016-05-21 23:17 ` Mike Hommey
  2016-05-21 23:17 ` [PATCH v7 3/9] connect: re-derive a host:port string from the separate host and port variables Mike Hommey
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Mike Hommey @ 2016-05-21 23:17 UTC (permalink / raw)
  To: git; +Cc: gitster, tboegi

Currently, get_host_and_port() is called in git_connect() for the ssh
protocol, and in git_tcp_connect_sock() for the git protocol. Instead
of doing this, just call it from a single place, right after
parse_connect_url(), and pass the host and port separately to
git_*_connect() functions.

We however preserve hostandport, at least for now.

Note that in git_tcp_connect_sock, the port was set to "<none>" in a
case that never can happen, so that code path was removed.

Signed-off-by: Mike Hommey <mh@glandium.org>
---
 connect.c | 47 +++++++++++++++++++++++------------------------
 1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/connect.c b/connect.c
index caa2a3c..0cdbeb2 100644
--- a/connect.c
+++ b/connect.c
@@ -343,18 +343,16 @@ static const char *ai_name(const struct addrinfo *ai)
 /*
  * Returns a connected socket() fd, or else die()s.
  */
-static int git_tcp_connect_sock(char *host, int flags)
+static int git_tcp_connect_sock(const char *host, const char *port, int flags)
 {
 	struct strbuf error_message = STRBUF_INIT;
 	int sockfd = -1;
-	const char *port = STR(DEFAULT_GIT_PORT);
 	struct addrinfo hints, *ai0, *ai;
 	int gai;
 	int cnt = 0;
 
-	get_host_and_port(&host, &port);
-	if (!*port)
-		port = "<none>";
+	if (!port)
+		port = STR(DEFAULT_GIT_PORT);
 
 	memset(&hints, 0, sizeof(hints));
 	if (flags & CONNECT_IPV4)
@@ -411,11 +409,10 @@ static int git_tcp_connect_sock(char *host, int flags)
 /*
  * Returns a connected socket() fd, or else die()s.
  */
-static int git_tcp_connect_sock(char *host, int flags)
+static int git_tcp_connect_sock(const char *host, const char *port, int flags)
 {
 	struct strbuf error_message = STRBUF_INIT;
 	int sockfd = -1;
-	const char *port = STR(DEFAULT_GIT_PORT);
 	char *ep;
 	struct hostent *he;
 	struct sockaddr_in sa;
@@ -423,7 +420,8 @@ static int git_tcp_connect_sock(char *host, int flags)
 	unsigned int nport;
 	int cnt;
 
-	get_host_and_port(&host, &port);
+	if (!port)
+		port = STR(DEFAULT_GIT_PORT);
 
 	if (flags & CONNECT_VERBOSE)
 		fprintf(stderr, "Looking up %s ... ", host);
@@ -482,9 +480,10 @@ static int git_tcp_connect_sock(char *host, int flags)
 #endif /* NO_IPV6 */
 
 
-static void git_tcp_connect(int fd[2], char *host, int flags)
+static void git_tcp_connect(int fd[2], const char *host, const char *port,
+			    int flags)
 {
-	int sockfd = git_tcp_connect_sock(host, flags);
+	int sockfd = git_tcp_connect_sock(host, port, flags);
 
 	fd[0] = sockfd;
 	fd[1] = dup(sockfd);
@@ -550,18 +549,16 @@ static int git_use_proxy(const char *host)
 	return (git_proxy_command && *git_proxy_command);
 }
 
-static struct child_process *git_proxy_connect(int fd[2], char *host)
+static struct child_process *git_proxy_connect(int fd[2], const char *host,
+					       const char *port)
 {
-	const char *port = STR(DEFAULT_GIT_PORT);
 	struct child_process *proxy;
 
-	get_host_and_port(&host, &port);
-
 	proxy = xmalloc(sizeof(*proxy));
 	child_process_init(proxy);
 	argv_array_push(&proxy->args, git_proxy_command);
 	argv_array_push(&proxy->args, host);
-	argv_array_push(&proxy->args, port);
+	argv_array_push(&proxy->args, port ? port : STR(DEFAULT_GIT_PORT));
 	proxy->in = -1;
 	proxy->out = -1;
 	if (start_command(proxy))
@@ -672,7 +669,8 @@ static struct child_process no_fork = CHILD_PROCESS_INIT;
 struct child_process *git_connect(int fd[2], const char *url,
 				  const char *prog, int flags)
 {
-	char *hostandport, *path;
+	char *hostandport, *path, *host;
+	const char *port = NULL;
 	struct child_process *conn = &no_fork;
 	enum protocol protocol;
 	struct strbuf cmd = STRBUF_INIT;
@@ -683,6 +681,8 @@ struct child_process *git_connect(int fd[2], const char *url,
 	signal(SIGCHLD, SIG_DFL);
 
 	protocol = parse_connect_url(url, &hostandport, &path);
+	host = xstrdup(hostandport);
+	get_host_and_port(&host, &port);
 	if ((flags & CONNECT_DIAG_URL) && (protocol != PROTO_SSH)) {
 		printf("Diag: url=%s\n", url ? url : "NULL");
 		printf("Diag: protocol=%s\n", prot_name(protocol));
@@ -707,9 +707,9 @@ struct child_process *git_connect(int fd[2], const char *url,
 		 * cannot connect.
 		 */
 		if (git_use_proxy(hostandport))
-			conn = git_proxy_connect(fd, hostandport);
+			conn = git_proxy_connect(fd, host, port);
 		else
-			git_tcp_connect(fd, hostandport, flags);
+			git_tcp_connect(fd, host, port, flags);
 		/*
 		 * Separate original protocol components prog and path
 		 * from extended host header with a NUL byte.
@@ -737,10 +737,7 @@ struct child_process *git_connect(int fd[2], const char *url,
 		if (protocol == PROTO_SSH) {
 			const char *ssh;
 			int putty = 0, tortoiseplink = 0;
-			char *ssh_host = hostandport;
-			const char *port = NULL;
 			transport_check_allowed("ssh");
-			get_host_and_port(&ssh_host, &port);
 
 			/* get_host_and_port may not return a port even when
 			 * there is one: In the [host:port]:path case,
@@ -749,16 +746,17 @@ struct child_process *git_connect(int fd[2], const char *url,
 			 * In that specific case, we still need to split the
 			 * port. */
 			if (!port)
-				port = get_port(ssh_host);
+				port = get_port(host);
 
 			if (flags & CONNECT_DIAG_URL) {
 				printf("Diag: url=%s\n", url ? url : "NULL");
 				printf("Diag: protocol=%s\n", prot_name(protocol));
-				printf("Diag: userandhost=%s\n", ssh_host ? ssh_host : "NULL");
+				printf("Diag: userandhost=%s\n", host ? host : "NULL");
 				printf("Diag: port=%s\n", port ? port : "NONE");
 				printf("Diag: path=%s\n", path ? path : "NULL");
 
 				free(hostandport);
+				free(host);
 				free(path);
 				free(conn);
 				return NULL;
@@ -804,7 +802,7 @@ struct child_process *git_connect(int fd[2], const char *url,
 				argv_array_push(&conn->args, putty ? "-P" : "-p");
 				argv_array_push(&conn->args, port);
 			}
-			argv_array_push(&conn->args, ssh_host);
+			argv_array_push(&conn->args, host);
 		} else {
 			transport_check_allowed("file");
 		}
@@ -818,6 +816,7 @@ struct child_process *git_connect(int fd[2], const char *url,
 		strbuf_release(&cmd);
 	}
 	free(hostandport);
+	free(host);
 	free(path);
 	return conn;
 }
-- 
2.8.3.401.ga81c606.dirty

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v7 3/9] connect: re-derive a host:port string from the separate host and port variables
  2016-05-21 23:17 [PATCH v7 0/9] connect: various cleanups Mike Hommey
  2016-05-21 23:17 ` [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port Mike Hommey
  2016-05-21 23:17 ` [PATCH v7 2/9] connect: call get_host_and_port() earlier Mike Hommey
@ 2016-05-21 23:17 ` Mike Hommey
  2016-05-21 23:17 ` [PATCH v7 4/9] connect: make parse_connect_url() return separated host and port Mike Hommey
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Mike Hommey @ 2016-05-21 23:17 UTC (permalink / raw)
  To: git; +Cc: gitster, tboegi

The last uses of the hostandport variable, besides being strdup'ed
before being split into host and port, is to fill the host header in the
git protocol and to test whether to proxy the request.

Instead of relying on parse_connect_url() to return a host:port string
that makes sense there, re-derive one from the host and port variables.
This will allow to refactor parse_connect_url() to return separate host
and port strings.

Signed-off-by: Mike Hommey <mh@glandium.org>
---
 connect.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/connect.c b/connect.c
index 0cdbeb2..7cdaed1 100644
--- a/connect.c
+++ b/connect.c
@@ -695,18 +695,32 @@ struct child_process *git_connect(int fd[2], const char *url,
 		 * connect, unless the user has overridden us in
 		 * the environment.
 		 */
-		char *target_host = getenv("GIT_OVERRIDE_VIRTUAL_HOST");
-		if (target_host)
-			target_host = xstrdup(target_host);
-		else
-			target_host = xstrdup(hostandport);
+		struct strbuf target_host = STRBUF_INIT;
+		struct strbuf virtual_host = STRBUF_INIT;
+		const char *colon = strchr(host, ':');
+		char *override_vhost = getenv("GIT_OVERRIDE_VIRTUAL_HOST");
+
+		/* If the host contains a colon (ipv6 address), it needs to
+		 * be enclosed with square brackets. */
+		if (colon)
+			strbuf_addch(&target_host, '[');
+		strbuf_addstr(&target_host, host);
+		if (colon)
+			strbuf_addch(&target_host, ']');
+		if (port) {
+			strbuf_addch(&target_host, ':');
+			strbuf_addstr(&target_host, port);
+		}
+
+		strbuf_addstr(&virtual_host, override_vhost ? override_vhost
+							    : target_host.buf);
 
 		transport_check_allowed("git");
 
 		/* These underlying connection commands die() if they
 		 * cannot connect.
 		 */
-		if (git_use_proxy(hostandport))
+		if (git_use_proxy(target_host.buf))
 			conn = git_proxy_connect(fd, host, port);
 		else
 			git_tcp_connect(fd, host, port, flags);
@@ -720,8 +734,9 @@ struct child_process *git_connect(int fd[2], const char *url,
 		packet_write(fd[1],
 			     "%s %s%chost=%s%c",
 			     prog, path, 0,
-			     target_host, 0);
-		free(target_host);
+			     virtual_host.buf, 0);
+		strbuf_release(&virtual_host);
+		strbuf_release(&target_host);
 	} else {
 		conn = xmalloc(sizeof(*conn));
 		child_process_init(conn);
-- 
2.8.3.401.ga81c606.dirty

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v7 4/9] connect: make parse_connect_url() return separated host and port
  2016-05-21 23:17 [PATCH v7 0/9] connect: various cleanups Mike Hommey
                   ` (2 preceding siblings ...)
  2016-05-21 23:17 ` [PATCH v7 3/9] connect: re-derive a host:port string from the separate host and port variables Mike Hommey
@ 2016-05-21 23:17 ` Mike Hommey
  2016-05-21 23:17 ` [PATCH v7 5/9] connect: group CONNECT_DIAG_URL handling code Mike Hommey
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Mike Hommey @ 2016-05-21 23:17 UTC (permalink / raw)
  To: git; +Cc: gitster, tboegi

Now that nothing besides CONNECT_DIAG_URL is using hostandport, we can
have parse_connect_url() itself do the host and port splitting.

This still leaves "user@" part of the host, if there is one, which will
be addressed in a subsequent change. This however does add /some/
handling of the "user@" part of the host, in order to pick the port
properly.

Signed-off-by: Mike Hommey <mh@glandium.org>
---
 connect.c             | 43 +++++++++++++++++++++++++------------------
 t/t5500-fetch-pack.sh | 32 +++++++++++++++++++++-----------
 2 files changed, 46 insertions(+), 29 deletions(-)

diff --git a/connect.c b/connect.c
index 7cdaed1..edbf0e2 100644
--- a/connect.c
+++ b/connect.c
@@ -589,10 +589,11 @@ static char *get_port(char *host)
  * The caller must free() the returned strings.
  */
 static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
-				       char **ret_path)
+				       char **ret_port, char **ret_path)
 {
 	char *url;
 	char *host, *path;
+	const char *port = NULL;
 	char *end;
 	int separator = '/';
 	enum protocol protocol = PROTO_LOCAL;
@@ -647,7 +648,24 @@ static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
 	path = xstrdup(path);
 	*end = '\0';
 
+	get_host_and_port(&host, &port);
+
+	if (*host && !port) {
+		/* get_host_and_port may not return a port even when there is
+		 * one: In the [host:port]:path case, get_host_and_port is
+		 * called with "[host:port]" and returns "host:port" and NULL.
+		 * In that specific case, we still need to split the port.
+		 * "host:port" may also look like "user@host:port". As the
+		 * `user` portion tends to be less strict than `host:port`,
+		 * we first put it out of the equation: since a hostname
+		 * cannot contain a '@', we start from the last '@' in the
+		 * string. */
+		char *end_user = strrchr(host, '@');
+		port = get_port(end_user ? end_user : host);
+	}
+
 	*ret_host = xstrdup(host);
+	*ret_port = port ? xstrdup(port) : NULL;
 	*ret_path = path;
 	free(url);
 	return protocol;
@@ -669,8 +687,7 @@ static struct child_process no_fork = CHILD_PROCESS_INIT;
 struct child_process *git_connect(int fd[2], const char *url,
 				  const char *prog, int flags)
 {
-	char *hostandport, *path, *host;
-	const char *port = NULL;
+	char *host, *port, *path;
 	struct child_process *conn = &no_fork;
 	enum protocol protocol;
 	struct strbuf cmd = STRBUF_INIT;
@@ -680,13 +697,12 @@ struct child_process *git_connect(int fd[2], const char *url,
 	 */
 	signal(SIGCHLD, SIG_DFL);
 
-	protocol = parse_connect_url(url, &hostandport, &path);
-	host = xstrdup(hostandport);
-	get_host_and_port(&host, &port);
+	protocol = parse_connect_url(url, &host, &port, &path);
 	if ((flags & CONNECT_DIAG_URL) && (protocol != PROTO_SSH)) {
 		printf("Diag: url=%s\n", url ? url : "NULL");
 		printf("Diag: protocol=%s\n", prot_name(protocol));
-		printf("Diag: hostandport=%s\n", hostandport ? hostandport : "NULL");
+		printf("Diag: userandhost=%s\n", host ? host : "NULL");
+		printf("Diag: port=%s\n", port ? port : "NONE");
 		printf("Diag: path=%s\n", path ? path : "NULL");
 		conn = NULL;
 	} else if (protocol == PROTO_GIT) {
@@ -754,15 +770,6 @@ struct child_process *git_connect(int fd[2], const char *url,
 			int putty = 0, tortoiseplink = 0;
 			transport_check_allowed("ssh");
 
-			/* get_host_and_port may not return a port even when
-			 * there is one: In the [host:port]:path case,
-			 * get_host_and_port is called with "[host:port]" and
-			 * returns "host:port" and NULL.
-			 * In that specific case, we still need to split the
-			 * port. */
-			if (!port)
-				port = get_port(host);
-
 			if (flags & CONNECT_DIAG_URL) {
 				printf("Diag: url=%s\n", url ? url : "NULL");
 				printf("Diag: protocol=%s\n", prot_name(protocol));
@@ -770,8 +777,8 @@ struct child_process *git_connect(int fd[2], const char *url,
 				printf("Diag: port=%s\n", port ? port : "NONE");
 				printf("Diag: path=%s\n", path ? path : "NULL");
 
-				free(hostandport);
 				free(host);
+				free(port);
 				free(path);
 				free(conn);
 				return NULL;
@@ -830,8 +837,8 @@ struct child_process *git_connect(int fd[2], const char *url,
 		fd[1] = conn->in;  /* write to child's stdin */
 		strbuf_release(&cmd);
 	}
-	free(hostandport);
 	free(host);
+	free(port);
 	free(path);
 	return conn;
 }
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 91a69fc..739c6b1 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -553,7 +553,7 @@ check_prot_path () {
 	Diag: protocol=$2
 	Diag: path=$3
 	EOF
-	git fetch-pack --diag-url "$1" | grep -v hostandport= >actual &&
+	git fetch-pack --diag-url "$1" | egrep -v '(host|port)=' >actual &&
 	test_cmp expected actual
 }
 
@@ -562,22 +562,17 @@ check_prot_host_port_path () {
 	case "$2" in
 		*ssh*)
 		pp=ssh
-		uah=userandhost
-		ehost=$(echo $3 | tr -d "[]")
-		diagport="Diag: port=$4"
 		;;
 		*)
-		pp=$p
-		uah=hostandport
-		ehost=$(echo $3$4 | sed -e "s/22$/:22/" -e "s/NONE//")
-		diagport=""
+		pp=$2
 		;;
 	esac
+	ehost=$(echo $3 | tr -d "[]")
 	cat >exp <<-EOF &&
 	Diag: url=$1
 	Diag: protocol=$pp
-	Diag: $uah=$ehost
-	$diagport
+	Diag: userandhost=$ehost
+	Diag: port=$4
 	Diag: path=$5
 	EOF
 	grep -v "^$" exp >expected
@@ -585,7 +580,7 @@ check_prot_host_port_path () {
 	test_cmp expected actual
 }
 
-for r in repo re:po re/po
+for r in repo re:po re/po re@po
 do
 	# git or ssh with scheme
 	for p in "ssh+git" "git+ssh" git ssh
@@ -608,6 +603,9 @@ do
 			test_expect_success "fetch-pack --diag-url $p://$h:22/$r" '
 				check_prot_host_port_path $p://$h:22/$r $p "$h" 22 "/$r"
 			'
+			test_expect_success "fetch-pack --diag-url $p://$h:22/$r" '
+				check_prot_host_port_path $p://$h:22/$r $p "$h" 22 "/$r"
+			'
 		done
 	done
 	# file with scheme
@@ -644,6 +642,18 @@ do
 			check_prot_host_port_path $h:/~$r $p "$h" NONE "~$r"
 		'
 	done
+	#ssh without scheme with port
+	p=ssh
+	for h in host user@host
+	do
+		test_expect_success "fetch-pack --diag-url [$h:22]:$r" '
+			check_prot_host_port_path [$h:22]:$r $p $h 22 "$r"
+		'
+		# Do "/~" -> "~" conversion
+		test_expect_success "fetch-pack --diag-url [$h:22]:/~$r" '
+			check_prot_host_port_path [$h:22]:/~$r $p $h 22 "~$r"
+		'
+	done
 done
 
 test_expect_success MINGW 'fetch-pack --diag-url file://c:/repo' '
-- 
2.8.3.401.ga81c606.dirty

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v7 5/9] connect: group CONNECT_DIAG_URL handling code
  2016-05-21 23:17 [PATCH v7 0/9] connect: various cleanups Mike Hommey
                   ` (3 preceding siblings ...)
  2016-05-21 23:17 ` [PATCH v7 4/9] connect: make parse_connect_url() return separated host and port Mike Hommey
@ 2016-05-21 23:17 ` Mike Hommey
  2016-05-21 23:17 ` [PATCH v7 6/9] connect: make parse_connect_url() return the user part of the url as a separate value Mike Hommey
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Mike Hommey @ 2016-05-21 23:17 UTC (permalink / raw)
  To: git; +Cc: gitster, tboegi

Previous changes made both branches handling CONNECT_DIAG_URL identical.
We can now remove one of those branches and have CONNECT_DIAG_URL be
handled in one place.

Signed-off-by: Mike Hommey <mh@glandium.org>
---
 connect.c | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/connect.c b/connect.c
index edbf0e2..c0fad4f 100644
--- a/connect.c
+++ b/connect.c
@@ -698,7 +698,7 @@ struct child_process *git_connect(int fd[2], const char *url,
 	signal(SIGCHLD, SIG_DFL);
 
 	protocol = parse_connect_url(url, &host, &port, &path);
-	if ((flags & CONNECT_DIAG_URL) && (protocol != PROTO_SSH)) {
+	if (flags & CONNECT_DIAG_URL) {
 		printf("Diag: url=%s\n", url ? url : "NULL");
 		printf("Diag: protocol=%s\n", prot_name(protocol));
 		printf("Diag: userandhost=%s\n", host ? host : "NULL");
@@ -770,20 +770,6 @@ struct child_process *git_connect(int fd[2], const char *url,
 			int putty = 0, tortoiseplink = 0;
 			transport_check_allowed("ssh");
 
-			if (flags & CONNECT_DIAG_URL) {
-				printf("Diag: url=%s\n", url ? url : "NULL");
-				printf("Diag: protocol=%s\n", prot_name(protocol));
-				printf("Diag: userandhost=%s\n", host ? host : "NULL");
-				printf("Diag: port=%s\n", port ? port : "NONE");
-				printf("Diag: path=%s\n", path ? path : "NULL");
-
-				free(host);
-				free(port);
-				free(path);
-				free(conn);
-				return NULL;
-			}
-
 			ssh = getenv("GIT_SSH_COMMAND");
 			if (!ssh) {
 				const char *base;
-- 
2.8.3.401.ga81c606.dirty

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v7 6/9] connect: make parse_connect_url() return the user part of the url as a separate value
  2016-05-21 23:17 [PATCH v7 0/9] connect: various cleanups Mike Hommey
                   ` (4 preceding siblings ...)
  2016-05-21 23:17 ` [PATCH v7 5/9] connect: group CONNECT_DIAG_URL handling code Mike Hommey
@ 2016-05-21 23:17 ` Mike Hommey
  2016-05-21 23:17 ` [PATCH v7 7/9] connect: change the --diag-url output to separate user and host Mike Hommey
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Mike Hommey @ 2016-05-21 23:17 UTC (permalink / raw)
  To: git; +Cc: gitster, tboegi

Signed-off-by: Mike Hommey <mh@glandium.org>
---
 connect.c | 51 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 17 deletions(-)

diff --git a/connect.c b/connect.c
index c0fad4f..48d9cd2 100644
--- a/connect.c
+++ b/connect.c
@@ -588,11 +588,13 @@ static char *get_port(char *host)
  * Extract protocol and relevant parts from the specified connection URL.
  * The caller must free() the returned strings.
  */
-static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
-				       char **ret_port, char **ret_path)
+static enum protocol parse_connect_url(const char *url_orig, char **ret_user,
+				       char **ret_host, char **ret_port,
+				       char **ret_path)
 {
 	char *url;
 	char *host, *path;
+	const char *user = NULL;
 	const char *port = NULL;
 	char *end;
 	int separator = '/';
@@ -650,20 +652,27 @@ static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
 
 	get_host_and_port(&host, &port);
 
-	if (*host && !port) {
-		/* get_host_and_port may not return a port even when there is
-		 * one: In the [host:port]:path case, get_host_and_port is
-		 * called with "[host:port]" and returns "host:port" and NULL.
-		 * In that specific case, we still need to split the port.
-		 * "host:port" may also look like "user@host:port". As the
-		 * `user` portion tends to be less strict than `host:port`,
-		 * we first put it out of the equation: since a hostname
-		 * cannot contain a '@', we start from the last '@' in the
-		 * string. */
+	if (*host) {
+		/* At this point, the host part may look like user@host:port.
+		 * As the `user` portion tends to be less strict than
+		 * `host:port`, we first put it out of the equation: since a
+		 * hostname cannot contain a '@', we start from the last '@'
+		 * in the string. */
 		char *end_user = strrchr(host, '@');
-		port = get_port(end_user ? end_user : host);
+		if (end_user) {
+			*end_user = '\0';
+			user = host;
+			host = end_user + 1;
+		}
 	}
+	/* get_host_and_port may not have returned a port even when there is
+	 * one: In the [host:port]:path case, get_host_and_port is called
+	 * with "[host:port]" and returns "host:port" and NULL.
+	 * In that specific case, we still need to split the port. */
+	if (!port)
+		port = get_port(host);
 
+	*ret_user = user ? xstrdup(user) : NULL;
 	*ret_host = xstrdup(host);
 	*ret_port = port ? xstrdup(port) : NULL;
 	*ret_path = path;
@@ -687,7 +696,7 @@ static struct child_process no_fork = CHILD_PROCESS_INIT;
 struct child_process *git_connect(int fd[2], const char *url,
 				  const char *prog, int flags)
 {
-	char *host, *port, *path;
+	char *user, *host, *port, *path;
 	struct child_process *conn = &no_fork;
 	enum protocol protocol;
 	struct strbuf cmd = STRBUF_INIT;
@@ -697,11 +706,14 @@ struct child_process *git_connect(int fd[2], const char *url,
 	 */
 	signal(SIGCHLD, SIG_DFL);
 
-	protocol = parse_connect_url(url, &host, &port, &path);
+	protocol = parse_connect_url(url, &user, &host, &port, &path);
 	if (flags & CONNECT_DIAG_URL) {
 		printf("Diag: url=%s\n", url ? url : "NULL");
 		printf("Diag: protocol=%s\n", prot_name(protocol));
-		printf("Diag: userandhost=%s\n", host ? host : "NULL");
+		if (user)
+			printf("Diag: userandhost=%s@%s\n", user, host);
+		else
+			printf("Diag: userandhost=%s\n", host ? host : "NULL");
 		printf("Diag: port=%s\n", port ? port : "NONE");
 		printf("Diag: path=%s\n", path ? path : "NULL");
 		conn = NULL;
@@ -810,7 +822,11 @@ struct child_process *git_connect(int fd[2], const char *url,
 				argv_array_push(&conn->args, putty ? "-P" : "-p");
 				argv_array_push(&conn->args, port);
 			}
-			argv_array_push(&conn->args, host);
+			if (user)
+				argv_array_pushf(&conn->args, "%s@%s",
+						 user, host);
+			else
+				argv_array_push(&conn->args, host);
 		} else {
 			transport_check_allowed("file");
 		}
@@ -823,6 +839,7 @@ struct child_process *git_connect(int fd[2], const char *url,
 		fd[1] = conn->in;  /* write to child's stdin */
 		strbuf_release(&cmd);
 	}
+	free(user);
 	free(host);
 	free(port);
 	free(path);
-- 
2.8.3.401.ga81c606.dirty

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v7 7/9] connect: change the --diag-url output to separate user and host
  2016-05-21 23:17 [PATCH v7 0/9] connect: various cleanups Mike Hommey
                   ` (5 preceding siblings ...)
  2016-05-21 23:17 ` [PATCH v7 6/9] connect: make parse_connect_url() return the user part of the url as a separate value Mike Hommey
@ 2016-05-21 23:17 ` Mike Hommey
  2016-05-21 23:17 ` [PATCH v7 8/9] connect: actively reject git:// urls with a user part Mike Hommey
  2016-05-21 23:17 ` [PATCH v7 9/9] connect: move ssh command line preparation to a separate function Mike Hommey
  8 siblings, 0 replies; 19+ messages in thread
From: Mike Hommey @ 2016-05-21 23:17 UTC (permalink / raw)
  To: git; +Cc: gitster, tboegi

Signed-off-by: Mike Hommey <mh@glandium.org>
---
 connect.c             |  6 ++----
 t/t5500-fetch-pack.sh | 14 ++++++++++++--
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/connect.c b/connect.c
index 48d9cd2..52d34c7 100644
--- a/connect.c
+++ b/connect.c
@@ -710,10 +710,8 @@ struct child_process *git_connect(int fd[2], const char *url,
 	if (flags & CONNECT_DIAG_URL) {
 		printf("Diag: url=%s\n", url ? url : "NULL");
 		printf("Diag: protocol=%s\n", prot_name(protocol));
-		if (user)
-			printf("Diag: userandhost=%s@%s\n", user, host);
-		else
-			printf("Diag: userandhost=%s\n", host ? host : "NULL");
+		printf("Diag: user=%s\n", user ? user : "NULL");
+		printf("Diag: host=%s\n", host ? host : "NULL");
 		printf("Diag: port=%s\n", port ? port : "NONE");
 		printf("Diag: path=%s\n", path ? path : "NULL");
 		conn = NULL;
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 739c6b1..2d9c4be 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -553,7 +553,7 @@ check_prot_path () {
 	Diag: protocol=$2
 	Diag: path=$3
 	EOF
-	git fetch-pack --diag-url "$1" | egrep -v '(host|port)=' >actual &&
+	git fetch-pack --diag-url "$1" | egrep -v "(user|host|port)=" >actual &&
 	test_cmp expected actual
 }
 
@@ -568,10 +568,20 @@ check_prot_host_port_path () {
 		;;
 	esac
 	ehost=$(echo $3 | tr -d "[]")
+	case "$ehost" in
+		*@*)
+		user=${ehost%@*}
+		ehost=${ehost#$user@}
+		;;
+		*)
+		user=NULL
+		;;
+	esac
 	cat >exp <<-EOF &&
 	Diag: url=$1
 	Diag: protocol=$pp
-	Diag: userandhost=$ehost
+	Diag: user=$user
+	Diag: host=$ehost
 	Diag: port=$4
 	Diag: path=$5
 	EOF
-- 
2.8.3.401.ga81c606.dirty

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v7 8/9] connect: actively reject git:// urls with a user part
  2016-05-21 23:17 [PATCH v7 0/9] connect: various cleanups Mike Hommey
                   ` (6 preceding siblings ...)
  2016-05-21 23:17 ` [PATCH v7 7/9] connect: change the --diag-url output to separate user and host Mike Hommey
@ 2016-05-21 23:17 ` Mike Hommey
  2016-05-21 23:17 ` [PATCH v7 9/9] connect: move ssh command line preparation to a separate function Mike Hommey
  8 siblings, 0 replies; 19+ messages in thread
From: Mike Hommey @ 2016-05-21 23:17 UTC (permalink / raw)
  To: git; +Cc: gitster, tboegi

Currently, urls of the for git://user@host don't work because user@host
is not resolving at the DNS level, but we shouldn't be relying on it
being an invalid host name, and actively reject it for containing a
username in the first place.

Signed-off-by: Mike Hommey <mh@glandium.org>
---
 connect.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/connect.c b/connect.c
index 52d34c7..04ce210 100644
--- a/connect.c
+++ b/connect.c
@@ -726,6 +726,9 @@ struct child_process *git_connect(int fd[2], const char *url,
 		const char *colon = strchr(host, ':');
 		char *override_vhost = getenv("GIT_OVERRIDE_VIRTUAL_HOST");
 
+		if (user)
+			die("user@host is not allowed in git:// urls");
+
 		/* If the host contains a colon (ipv6 address), it needs to
 		 * be enclosed with square brackets. */
 		if (colon)
-- 
2.8.3.401.ga81c606.dirty

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v7 9/9] connect: move ssh command line preparation to a separate function
  2016-05-21 23:17 [PATCH v7 0/9] connect: various cleanups Mike Hommey
                   ` (7 preceding siblings ...)
  2016-05-21 23:17 ` [PATCH v7 8/9] connect: actively reject git:// urls with a user part Mike Hommey
@ 2016-05-21 23:17 ` Mike Hommey
  8 siblings, 0 replies; 19+ messages in thread
From: Mike Hommey @ 2016-05-21 23:17 UTC (permalink / raw)
  To: git; +Cc: gitster, tboegi

Signed-off-by: Mike Hommey <mh@glandium.org>
---
 connect.c | 108 +++++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 58 insertions(+), 50 deletions(-)

diff --git a/connect.c b/connect.c
index 04ce210..ddfda4e 100644
--- a/connect.c
+++ b/connect.c
@@ -680,6 +680,61 @@ static enum protocol parse_connect_url(const char *url_orig, char **ret_user,
 	return protocol;
 }
 
+static int prepare_ssh_command(struct argv_array *cmd, const char *user,
+			       const char *host, const char *port, int flags)
+{
+	const char *ssh;
+	int putty = 0, tortoiseplink = 0, use_shell = 1;
+	transport_check_allowed("ssh");
+
+	ssh = getenv("GIT_SSH_COMMAND");
+	if (!ssh) {
+		const char *base;
+		char *ssh_dup;
+
+		/*
+		 * GIT_SSH is the no-shell version of
+		 * GIT_SSH_COMMAND (and must remain so for
+		 * historical compatibility).
+		 */
+		use_shell = 0;
+
+		ssh = getenv("GIT_SSH");
+		if (!ssh)
+			ssh = "ssh";
+
+		ssh_dup = xstrdup(ssh);
+		base = basename(ssh_dup);
+
+		tortoiseplink = !strcasecmp(base, "tortoiseplink") ||
+			!strcasecmp(base, "tortoiseplink.exe");
+		putty = tortoiseplink ||
+			!strcasecmp(base, "plink") ||
+			!strcasecmp(base, "plink.exe");
+
+		free(ssh_dup);
+	}
+
+	argv_array_push(cmd, ssh);
+	if (flags & CONNECT_IPV4)
+		argv_array_push(cmd, "-4");
+	else if (flags & CONNECT_IPV6)
+		argv_array_push(cmd, "-6");
+	if (tortoiseplink)
+		argv_array_push(cmd, "-batch");
+	if (port) {
+		/* P is for PuTTY, p is for OpenSSH */
+		argv_array_push(cmd, putty ? "-P" : "-p");
+		argv_array_push(cmd, port);
+	}
+	if (user)
+		argv_array_pushf(cmd, "%s@%s", user, host);
+	else
+		argv_array_push(cmd, host);
+
+	return use_shell;
+}
+
 static struct child_process no_fork = CHILD_PROCESS_INIT;
 
 /*
@@ -776,59 +831,12 @@ struct child_process *git_connect(int fd[2], const char *url,
 
 		/* remove repo-local variables from the environment */
 		conn->env = local_repo_env;
-		conn->use_shell = 1;
 		conn->in = conn->out = -1;
 		if (protocol == PROTO_SSH) {
-			const char *ssh;
-			int putty = 0, tortoiseplink = 0;
-			transport_check_allowed("ssh");
-
-			ssh = getenv("GIT_SSH_COMMAND");
-			if (!ssh) {
-				const char *base;
-				char *ssh_dup;
-
-				/*
-				 * GIT_SSH is the no-shell version of
-				 * GIT_SSH_COMMAND (and must remain so for
-				 * historical compatibility).
-				 */
-				conn->use_shell = 0;
-
-				ssh = getenv("GIT_SSH");
-				if (!ssh)
-					ssh = "ssh";
-
-				ssh_dup = xstrdup(ssh);
-				base = basename(ssh_dup);
-
-				tortoiseplink = !strcasecmp(base, "tortoiseplink") ||
-					!strcasecmp(base, "tortoiseplink.exe");
-				putty = tortoiseplink ||
-					!strcasecmp(base, "plink") ||
-					!strcasecmp(base, "plink.exe");
-
-				free(ssh_dup);
-			}
-
-			argv_array_push(&conn->args, ssh);
-			if (flags & CONNECT_IPV4)
-				argv_array_push(&conn->args, "-4");
-			else if (flags & CONNECT_IPV6)
-				argv_array_push(&conn->args, "-6");
-			if (tortoiseplink)
-				argv_array_push(&conn->args, "-batch");
-			if (port) {
-				/* P is for PuTTY, p is for OpenSSH */
-				argv_array_push(&conn->args, putty ? "-P" : "-p");
-				argv_array_push(&conn->args, port);
-			}
-			if (user)
-				argv_array_pushf(&conn->args, "%s@%s",
-						 user, host);
-			else
-				argv_array_push(&conn->args, host);
+			conn->use_shell = prepare_ssh_command(
+				&conn->args, user, host, port, flags);
 		} else {
+			conn->use_shell = 1;
 			transport_check_allowed("file");
 		}
 		argv_array_push(&conn->args, cmd.buf);
-- 
2.8.3.401.ga81c606.dirty

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port
  2016-05-21 23:17 ` [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port Mike Hommey
@ 2016-05-22  2:40   ` Eric Sunshine
  2016-05-22  6:07   ` Torsten Bögershausen
  1 sibling, 0 replies; 19+ messages in thread
From: Eric Sunshine @ 2016-05-22  2:40 UTC (permalink / raw)
  To: Mike Hommey; +Cc: Git List, Junio C Hamano, Torsten Bögershausen

On Sat, May 21, 2016 at 7:17 PM, Mike Hommey <mh@glandium.org> wrote:
> Signed-off-by: Mike Hommey <mh@glandium.org>
> ---
> diff --git a/connect.c b/connect.c
> @@ -742,6 +742,12 @@ struct child_process *git_connect(int fd[2], const char *url,
>                         transport_check_allowed("ssh");
>                         get_host_and_port(&ssh_host, &port);
>
> +                       /* get_host_and_port may not return a port even when
> +                        * there is one: In the [host:port]:path case,
> +                        * get_host_and_port is called with "[host:port]" and
> +                        * returns "host:port" and NULL.
> +                        * In that specific case, we still need to split the
> +                        * port. */

Style:

    /*
     * This is a multi-
     * line comment.
     */

>                         if (!port)
>                                 port = get_port(ssh_host);

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port
  2016-05-21 23:17 ` [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port Mike Hommey
  2016-05-22  2:40   ` Eric Sunshine
@ 2016-05-22  6:07   ` Torsten Bögershausen
  2016-05-22  8:03     ` Mike Hommey
  1 sibling, 1 reply; 19+ messages in thread
From: Torsten Bögershausen @ 2016-05-22  6:07 UTC (permalink / raw)
  To: Mike Hommey, git; +Cc: gitster, tboegi

On 22.05.16 01:17, Mike Hommey wrote:
> Signed-off-by: Mike Hommey <mh@glandium.org>
> ---
>  connect.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/connect.c b/connect.c
> index c53f3f1..caa2a3c 100644
> --- a/connect.c
> +++ b/connect.c
> @@ -742,6 +742,12 @@ struct child_process *git_connect(int fd[2], const char *url,
>  			transport_check_allowed("ssh");
>  			get_host_and_port(&ssh_host, &port);
>  
> +			/* get_host_and_port may not return a port even when
> +			 * there is one: In the [host:port]:path case,
> +			 * get_host_and_port is called with "[host:port]" and
> +			 * returns "host:port" and NULL.
> +			 * In that specific case, we still need to split the
> +			 * port. */
Is it worth to mention that this case is "still supported legacy" ?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port
  2016-05-22  6:07   ` Torsten Bögershausen
@ 2016-05-22  8:03     ` Mike Hommey
  2016-05-23  4:31       ` Torsten Bögershausen
  0 siblings, 1 reply; 19+ messages in thread
From: Mike Hommey @ 2016-05-22  8:03 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git, gitster

On Sun, May 22, 2016 at 08:07:05AM +0200, Torsten Bögershausen wrote:
> On 22.05.16 01:17, Mike Hommey wrote:
> > Signed-off-by: Mike Hommey <mh@glandium.org>
> > ---
> >  connect.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/connect.c b/connect.c
> > index c53f3f1..caa2a3c 100644
> > --- a/connect.c
> > +++ b/connect.c
> > @@ -742,6 +742,12 @@ struct child_process *git_connect(int fd[2], const char *url,
> >  			transport_check_allowed("ssh");
> >  			get_host_and_port(&ssh_host, &port);
> >  
> > +			/* get_host_and_port may not return a port even when
> > +			 * there is one: In the [host:port]:path case,
> > +			 * get_host_and_port is called with "[host:port]" and
> > +			 * returns "host:port" and NULL.
> > +			 * In that specific case, we still need to split the
> > +			 * port. */
> Is it worth to mention that this case is "still supported legacy" ?

If it's worth mentioning anywhere, it seems to me it would start with
urls.txt?

Mike

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port
  2016-05-22  8:03     ` Mike Hommey
@ 2016-05-23  4:31       ` Torsten Bögershausen
  2016-05-23 21:30         ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Torsten Bögershausen @ 2016-05-23  4:31 UTC (permalink / raw)
  To: Mike Hommey, Torsten Bögershausen; +Cc: git, gitster

On 05/22/2016 10:03 AM, Mike Hommey wrote:
> On Sun, May 22, 2016 at 08:07:05AM +0200, Torsten Bögershausen wrote:
>> On 22.05.16 01:17, Mike Hommey wrote:
>>> Signed-off-by: Mike Hommey <mh@glandium.org>
>>> ---
>>>   connect.c | 6 ++++++
>>>   1 file changed, 6 insertions(+)
>>>
>>> diff --git a/connect.c b/connect.c
>>> index c53f3f1..caa2a3c 100644
>>> --- a/connect.c
>>> +++ b/connect.c
>>> @@ -742,6 +742,12 @@ struct child_process *git_connect(int fd[2], const char *url,
>>>   			transport_check_allowed("ssh");
>>>   			get_host_and_port(&ssh_host, &port);
>>>   
>>> +			/* get_host_and_port may not return a port even when
>>> +			 * there is one: In the [host:port]:path case,
>>> +			 * get_host_and_port is called with "[host:port]" and
>>> +			 * returns "host:port" and NULL.
>>> +			 * In that specific case, we still need to split the
>>> +			 * port. */
>> Is it worth to mention that this case is "still supported legacy" ?
> If it's worth mentioning anywhere, it seems to me it would start with
> urls.txt?
>
> Mike
>
I don't know.
urls.txt is for Git users, and connect.c is for Git developers.
urls.txt does not mention that Git follows any RFC when parsing the
URLS', it doesn't claim to be 100% compliant.
Even if it makes sense to do so, as many user simply expect Git to accept
RFC compliant URL's, and it makes the development easier, if there is an 
already
written specification, that describes all the details.
The parser is not 100% RFC compliant, one example:
- old-style usgage like "git clone [host:222]:~/path/to/repo are supported
To easy the live for developers, it could make sense why the code is as 
it is,
simply to avoid that people around the world suddenly run into problems,
when they upgrade the Git version.
So if there is a comment in the code, it could help new developers to 
understand
things easier.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port
  2016-05-23  4:31       ` Torsten Bögershausen
@ 2016-05-23 21:30         ` Junio C Hamano
  2016-05-23 21:50           ` Mike Hommey
  2016-05-24  4:44           ` Torsten Bögershausen
  0 siblings, 2 replies; 19+ messages in thread
From: Junio C Hamano @ 2016-05-23 21:30 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Mike Hommey, git

Torsten Bögershausen <tboegi@web.de> writes:

>>>>   			get_host_and_port(&ssh_host, &port);
>>>>   +			/* get_host_and_port may not return a port
>>>> even when
>>>> +			 * there is one: In the [host:port]:path case,
>>>> +			 * get_host_and_port is called with "[host:port]" and
>>>> +			 * returns "host:port" and NULL.
>>>> +			 * In that specific case, we still need to split the
>>>> +			 * port. */
>>> Is it worth to mention that this case is "still supported legacy" ?
>> If it's worth mentioning anywhere, it seems to me it would start with
>> urls.txt?
>>
>> Mike
>>
> I don't know.
> urls.txt is for Git users, and connect.c is for Git developers.
> urls.txt does not mention that Git follows any RFC when parsing the
> URLS', it doesn't claim to be 100% compliant.
> Even if it makes sense to do so, as many user simply expect Git to accept
> RFC compliant URL's, and it makes the development easier, if there is
> an already
> written specification, that describes all the details.
> The parser is not 100% RFC compliant, one example:
> - old-style usgage like "git clone [host:222]:~/path/to/repo are supported

Is it an option to fix get_host_and_port() so that it returns what
the caller expects even when it is given "[host:port]"?  When the
caller passes other forms like "host:port", it expects to get "host"
and "port" parsed out into two variables.  Why can't the caller
expect to see the same happen when feeding "[host:port]" to the
function?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port
  2016-05-23 21:30         ` Junio C Hamano
@ 2016-05-23 21:50           ` Mike Hommey
  2016-05-24  4:44           ` Torsten Bögershausen
  1 sibling, 0 replies; 19+ messages in thread
From: Mike Hommey @ 2016-05-23 21:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Torsten Bögershausen, git

On Mon, May 23, 2016 at 02:30:57PM -0700, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
> 
> >>>>   			get_host_and_port(&ssh_host, &port);
> >>>>   +			/* get_host_and_port may not return a port
> >>>> even when
> >>>> +			 * there is one: In the [host:port]:path case,
> >>>> +			 * get_host_and_port is called with "[host:port]" and
> >>>> +			 * returns "host:port" and NULL.
> >>>> +			 * In that specific case, we still need to split the
> >>>> +			 * port. */
> >>> Is it worth to mention that this case is "still supported legacy" ?
> >> If it's worth mentioning anywhere, it seems to me it would start with
> >> urls.txt?
> >>
> >> Mike
> >>
> > I don't know.
> > urls.txt is for Git users, and connect.c is for Git developers.
> > urls.txt does not mention that Git follows any RFC when parsing the
> > URLS', it doesn't claim to be 100% compliant.
> > Even if it makes sense to do so, as many user simply expect Git to accept
> > RFC compliant URL's, and it makes the development easier, if there is
> > an already
> > written specification, that describes all the details.
> > The parser is not 100% RFC compliant, one example:
> > - old-style usgage like "git clone [host:222]:~/path/to/repo are supported

"This parser handles the urls described in urls.txt" is about as much as
there is to say IMHO.

> Is it an option to fix get_host_and_port() so that it returns what
> the caller expects even when it is given "[host:port]"?  When the
> caller passes other forms like "host:port", it expects to get "host"
> and "port" parsed out into two variables.  Why can't the caller
> expect to see the same happen when feeding "[host:port]" to the
> function?

After this series, there's only one use of get_host_and_port(). As well
as one use of host_end() and get_port(). They become implementation
details of the parse_connect_url function. And as I mentioned in the
cover letter this round, parse_connect_url could be further modified to
avoid the kind of back-and-forth that's going on between these
functions. But I'd rather leave that for later.

Mike

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port
  2016-05-23 21:30         ` Junio C Hamano
  2016-05-23 21:50           ` Mike Hommey
@ 2016-05-24  4:44           ` Torsten Bögershausen
  2016-05-25 23:34             ` Mike Hommey
  1 sibling, 1 reply; 19+ messages in thread
From: Torsten Bögershausen @ 2016-05-24  4:44 UTC (permalink / raw)
  To: Junio C Hamano, Torsten Bögershausen; +Cc: Mike Hommey, git

On 05/23/2016 11:30 PM, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
>
>>>>>    			get_host_and_port(&ssh_host, &port);
>>>>>    +			/* get_host_and_port may not return a port
>>>>> even when
>>>>> +			 * there is one: In the [host:port]:path case,
>>>>> +			 * get_host_and_port is called with "[host:port]" and
>>>>> +			 * returns "host:port" and NULL.
>>>>> +			 * In that specific case, we still need to split the
>>>>> +			 * port. */
>>>> Is it worth to mention that this case is "still supported legacy" ?
>>> If it's worth mentioning anywhere, it seems to me it would start with
>>> urls.txt?
>>>
>>> Mike
>>>
>> I don't know.
>> urls.txt is for Git users, and connect.c is for Git developers.
>> urls.txt does not mention that Git follows any RFC when parsing the
>> URLS', it doesn't claim to be 100% compliant.
>> Even if it makes sense to do so, as many user simply expect Git to accept
>> RFC compliant URL's, and it makes the development easier, if there is
>> an already
>> written specification, that describes all the details.
>> The parser is not 100% RFC compliant, one example:
>> - old-style usgage like "git clone [host:222]:~/path/to/repo are supported
> Is it an option to fix get_host_and_port() so that it returns what
> the caller expects even when it is given "[host:port]"?  When the
> caller passes other forms like "host:port", it expects to get "host"
> and "port" parsed out into two variables.  Why can't the caller
> expect to see the same happen when feeding "[host:port]" to the
> function?
>
This is somewhat out of my head:
git clone   git://[example.com:123]:/test        #illegal, malformated URL
git clone   [example.com:123]:/test               #scp-like URL, legacy
git clone   ssh://[example.com:123]:/test       #illegal, but supported 
as legacy, because
git clone  ssh://[user@::1]/test                       # was the only 
way to specify a user name at a literal IPv6 address

May be we should have some  more test cases for malformated git:// URLs?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port
  2016-05-24  4:44           ` Torsten Bögershausen
@ 2016-05-25 23:34             ` Mike Hommey
  2016-05-26  5:35               ` Torsten Bögershausen
  0 siblings, 1 reply; 19+ messages in thread
From: Mike Hommey @ 2016-05-25 23:34 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: Junio C Hamano, git

On Tue, May 24, 2016 at 06:44:26AM +0200, Torsten Bögershausen wrote:
> On 05/23/2016 11:30 PM, Junio C Hamano wrote:
> > Torsten Bögershausen <tboegi@web.de> writes:
> > 
> > > > > >    			get_host_and_port(&ssh_host, &port);
> > > > > >    +			/* get_host_and_port may not return a port
> > > > > > even when
> > > > > > +			 * there is one: In the [host:port]:path case,
> > > > > > +			 * get_host_and_port is called with "[host:port]" and
> > > > > > +			 * returns "host:port" and NULL.
> > > > > > +			 * In that specific case, we still need to split the
> > > > > > +			 * port. */
> > > > > Is it worth to mention that this case is "still supported legacy" ?
> > > > If it's worth mentioning anywhere, it seems to me it would start with
> > > > urls.txt?
> > > > 
> > > > Mike
> > > > 
> > > I don't know.
> > > urls.txt is for Git users, and connect.c is for Git developers.
> > > urls.txt does not mention that Git follows any RFC when parsing the
> > > URLS', it doesn't claim to be 100% compliant.
> > > Even if it makes sense to do so, as many user simply expect Git to accept
> > > RFC compliant URL's, and it makes the development easier, if there is
> > > an already
> > > written specification, that describes all the details.
> > > The parser is not 100% RFC compliant, one example:
> > > - old-style usgage like "git clone [host:222]:~/path/to/repo are supported
> > Is it an option to fix get_host_and_port() so that it returns what
> > the caller expects even when it is given "[host:port]"?  When the
> > caller passes other forms like "host:port", it expects to get "host"
> > and "port" parsed out into two variables.  Why can't the caller
> > expect to see the same happen when feeding "[host:port]" to the
> > function?
> > 
> This is somewhat out of my head:
> git clone   git://[example.com:123]:/test        #illegal, malformated URL
> git clone   [example.com:123]:/test               #scp-like URL, legacy
> git clone   ssh://[example.com:123]:/test       #illegal, but supported as
> legacy, because
> git clone  ssh://[user@::1]/test                       # was the only way to
> specify a user name at a literal IPv6 address
> 
> May be we should have some  more test cases for malformated git:// URLs?

None of these malformed urls are rejected with or without my series
applied:

Without:
$ git fetch-pack --diag-url git://[example.com:123]:/test 
Diag: url=git://[example.com:123]:/test
Diag: protocol=git
Diag: hostandport=[example.com:123]:
Diag: path=/test
$ git fetch-pack --diag-url
ssh://[example.com:123]:/test
Diag: url=ssh://[example.com:123]:/test
Diag: protocol=ssh
Diag: userandhost=example.com
Diag: port=123
Diag: path=/test

With:
$ git fetch-pack --diag-url git://[example.com:123]:/test 
Diag: url=git://[example.com:123]:/test
Diag: protocol=git
Diag: user=NULL
Diag: host=example.com
Diag: port=123
Diag: path=/test
$ git fetch-pack --diag-url ssh://[example.com:123]:/test
Diag: url=ssh://[example.com:123]:/test
Diag: protocol=ssh
Diag: user=NULL
Diag: host=example.com
Diag: port=123
Diag: path=/test

Note in the first case, hostandport is "[example.com:123]:", and that
is treated as host=example.com:123 and port=NULL further down, so my
series is changing something here, but arguably, it makes it less worse.
(note that both with and without my series,
"git://[example.com:123]:42/path" is treated the same, so only a corner
case changed)

Can we go forward with the current series (modulo the comment style
thing Eric noted, and maybe adding a note about the parser handling urls
as per urls.txt), and not bloat scope it? If anything, the state of the
code after the series should make further parser changes easier.

Cheers,

Mike

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port
  2016-05-25 23:34             ` Mike Hommey
@ 2016-05-26  5:35               ` Torsten Bögershausen
  0 siblings, 0 replies; 19+ messages in thread
From: Torsten Bögershausen @ 2016-05-26  5:35 UTC (permalink / raw)
  To: Mike Hommey; +Cc: Junio C Hamano, git

On 05/26/2016 01:34 AM, Mike Hommey wrote:
> On Tue, May 24, 2016 at 06:44:26AM +0200, Torsten Bögershausen wrote:
>> On 05/23/2016 11:30 PM, Junio C Hamano wrote:
>>> Torsten Bögershausen <tboegi@web.de> writes:
>>>
>>>>>>>     			get_host_and_port(&ssh_host, &port);
>>>>>>>     +			/* get_host_and_port may not return a port
>>>>>>> even when
>>>>>>> +			 * there is one: In the [host:port]:path case,
>>>>>>> +			 * get_host_and_port is called with "[host:port]" and
>>>>>>> +			 * returns "host:port" and NULL.
>>>>>>> +			 * In that specific case, we still need to split the
>>>>>>> +			 * port. */
>>>>>> Is it worth to mention that this case is "still supported legacy" ?
>>>>> If it's worth mentioning anywhere, it seems to me it would start with
>>>>> urls.txt?
>>>>>
>>>>> Mike
>>>>>
>>>> I don't know.
>>>> urls.txt is for Git users, and connect.c is for Git developers.
>>>> urls.txt does not mention that Git follows any RFC when parsing the
>>>> URLS', it doesn't claim to be 100% compliant.
>>>> Even if it makes sense to do so, as many user simply expect Git to accept
>>>> RFC compliant URL's, and it makes the development easier, if there is
>>>> an already
>>>> written specification, that describes all the details.
>>>> The parser is not 100% RFC compliant, one example:
>>>> - old-style usgage like "git clone [host:222]:~/path/to/repo are supported
>>> Is it an option to fix get_host_and_port() so that it returns what
>>> the caller expects even when it is given "[host:port]"?  When the
>>> caller passes other forms like "host:port", it expects to get "host"
>>> and "port" parsed out into two variables.  Why can't the caller
>>> expect to see the same happen when feeding "[host:port]" to the
>>> function?
>>>
>> This is somewhat out of my head:
>> git clone   git://[example.com:123]:/test        #illegal, malformated URL
>> git clone   [example.com:123]:/test               #scp-like URL, legacy
>> git clone   ssh://[example.com:123]:/test       #illegal, but supported as
>> legacy, because
>> git clone  ssh://[user@::1]/test                       # was the only way to
>> specify a user name at a literal IPv6 address
>>
>> May be we should have some  more test cases for malformated git:// URLs?
>
> None of these malformed urls are rejected with or without my series
> applied:
>
> Without:
> $ git fetch-pack --diag-url git://[example.com:123]:/test
> Diag: url=git://[example.com:123]:/test
> Diag: protocol=git
> Diag: hostandport=[example.com:123]:
> Diag: path=/test
> $ git fetch-pack --diag-url
> ssh://[example.com:123]:/test
> Diag: url=ssh://[example.com:123]:/test
> Diag: protocol=ssh
> Diag: userandhost=example.com
> Diag: port=123
> Diag: path=/test
>
> With:
> $ git fetch-pack --diag-url git://[example.com:123]:/test
> Diag: url=git://[example.com:123]:/test
> Diag: protocol=git
> Diag: user=NULL
> Diag: host=example.com
> Diag: port=123
> Diag: path=/test
> $ git fetch-pack --diag-url ssh://[example.com:123]:/test
> Diag: url=ssh://[example.com:123]:/test
> Diag: protocol=ssh
> Diag: user=NULL
> Diag: host=example.com
> Diag: port=123
> Diag: path=/test
>
> Note in the first case, hostandport is "[example.com:123]:", and that
> is treated as host=example.com:123 and port=NULL further down, so my
> series is changing something here, but arguably, it makes it less worse.
> (note that both with and without my series,
> "git://[example.com:123]:42/path" is treated the same, so only a corner
> case changed)
>
> Can we go forward with the current series (modulo the comment style
> thing Eric noted, and maybe adding a note about the parser handling urls
> as per urls.txt), and not bloat scope it? If anything, the state of the
> code after the series should make further parser changes easier.
>
> Cheers,
>
> Mike
>


Thanks for digging.

How about something like this:

/*
  * get_host_and_port may not return a port in the [host:port]:path case.
  * To support this undocumented legacy we still need to split the port.
*/

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2016-05-26  5:36 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-21 23:17 [PATCH v7 0/9] connect: various cleanups Mike Hommey
2016-05-21 23:17 ` [PATCH v7 1/9] connect: document why we sometimes call get_port after get_host_and_port Mike Hommey
2016-05-22  2:40   ` Eric Sunshine
2016-05-22  6:07   ` Torsten Bögershausen
2016-05-22  8:03     ` Mike Hommey
2016-05-23  4:31       ` Torsten Bögershausen
2016-05-23 21:30         ` Junio C Hamano
2016-05-23 21:50           ` Mike Hommey
2016-05-24  4:44           ` Torsten Bögershausen
2016-05-25 23:34             ` Mike Hommey
2016-05-26  5:35               ` Torsten Bögershausen
2016-05-21 23:17 ` [PATCH v7 2/9] connect: call get_host_and_port() earlier Mike Hommey
2016-05-21 23:17 ` [PATCH v7 3/9] connect: re-derive a host:port string from the separate host and port variables Mike Hommey
2016-05-21 23:17 ` [PATCH v7 4/9] connect: make parse_connect_url() return separated host and port Mike Hommey
2016-05-21 23:17 ` [PATCH v7 5/9] connect: group CONNECT_DIAG_URL handling code Mike Hommey
2016-05-21 23:17 ` [PATCH v7 6/9] connect: make parse_connect_url() return the user part of the url as a separate value Mike Hommey
2016-05-21 23:17 ` [PATCH v7 7/9] connect: change the --diag-url output to separate user and host Mike Hommey
2016-05-21 23:17 ` [PATCH v7 8/9] connect: actively reject git:// urls with a user part Mike Hommey
2016-05-21 23:17 ` [PATCH v7 9/9] connect: move ssh command line preparation to a separate function Mike Hommey

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).