git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] upload-pack: handle unexpected v2 delim packets
@ 2020-03-27  8:02 Jeff King
  2020-03-27  8:03 ` [PATCH 1/2] test-lib-functions: make packetize() more efficient Jeff King
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Jeff King @ 2020-03-27  8:02 UTC (permalink / raw)
  To: git

We saw an upload-pack segfault in the wild today at GitHub. It's caused
by a client sending bogus v2 protocol lines (a "delim" packet instead of
a "flush"). So the client is broken and our only option is to break the
network connection, but we shouldn't segfault while doing so. :)

I don't think the broken client was Git. It didn't send an "agent"
capability at all, which makes me suspect it was somebody generating the
request manually (nor was there anything interesting in the transport
layer agent; it was just an openssh client).

The fix was simple enough, and is in the second patch. The first one is
just a small cleanup / refactor in preparation.

  [1/2]: test-lib-functions: make packetize() more efficient
  [2/2]: upload-pack: handle unexpected delim packets

 ls-refs.c                              |  5 +++-
 t/t5562-http-backend-content-length.sh | 19 +++++++++------
 t/t5704-protocol-violations.sh         | 33 ++++++++++++++++++++++++++
 t/test-lib-functions.sh                | 23 ++++++++++++------
 upload-pack.c                          |  5 +++-
 5 files changed, 69 insertions(+), 16 deletions(-)
 create mode 100755 t/t5704-protocol-violations.sh

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/2] test-lib-functions: make packetize() more efficient
  2020-03-27  8:02 [PATCH 0/2] upload-pack: handle unexpected v2 delim packets Jeff King
@ 2020-03-27  8:03 ` Jeff King
  2020-03-27 15:16   ` Taylor Blau
  2020-03-27 19:18   ` Junio C Hamano
  2020-03-27  8:03 ` [PATCH 2/2] upload-pack: handle unexpected delim packets Jeff King
  2020-03-27 15:18 ` [PATCH 0/2] upload-pack: handle unexpected v2 " Taylor Blau
  2 siblings, 2 replies; 16+ messages in thread
From: Jeff King @ 2020-03-27  8:03 UTC (permalink / raw)
  To: git

The packetize() function takes its input on stdin, and requires 4
separate sub-processes to format a simple string. We can do much better
by getting the length via the shell's "${#packet}" construct. The one
caveat is that the shell can't put a NUL into a variable, so we'll have
to continue to provide the stdin form for a few calls.

There are a few other cleanups here in the touched code:

 - the stdin form of packetize() had an extra stray "%s" when printing
   the packet

 - the converted calls in t5562 can be made simpler by redirecting
   output as a block, rather than repeated appending

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5562-http-backend-content-length.sh | 19 ++++++++++++-------
 t/test-lib-functions.sh                | 23 ++++++++++++++++-------
 2 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh
index 4a110b307e..3f4ac71f83 100755
--- a/t/t5562-http-backend-content-length.sh
+++ b/t/t5562-http-backend-content-length.sh
@@ -53,15 +53,20 @@ test_expect_success 'setup' '
 	test_commit c1 &&
 	hash_head=$(git rev-parse HEAD) &&
 	hash_prev=$(git rev-parse HEAD~1) &&
-	printf "want %s" "$hash_head" | packetize >fetch_body &&
-	printf 0000 >>fetch_body &&
-	printf "have %s" "$hash_prev" | packetize >>fetch_body &&
-	printf done | packetize >>fetch_body &&
+	{
+		packetize "want $hash_head" &&
+		printf 0000 &&
+		packetize "have $hash_prev" &&
+		packetize "done"
+	} >fetch_body &&
 	test_copy_bytes 10 <fetch_body >fetch_body.trunc &&
 	hash_next=$(git commit-tree -p HEAD -m next HEAD^{tree}) &&
-	printf "%s %s refs/heads/newbranch\\0report-status\\n" "$ZERO_OID" "$hash_next" | packetize >push_body &&
-	printf 0000 >>push_body &&
-	echo "$hash_next" | git pack-objects --stdout >>push_body &&
+	{
+		printf "%s %s refs/heads/newbranch\\0report-status\\n" \
+			"$ZERO_OID" "$hash_next" | packetize &&
+		printf 0000 &&
+		echo "$hash_next" | git pack-objects --stdout
+	} >push_body &&
 	test_copy_bytes 10 <push_body >push_body.trunc &&
 	: >empty_body
 '
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 352c213d52..216918a58c 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1362,14 +1362,23 @@ nongit () {
 	)
 } 7>&2 2>&4
 
-# convert stdin to pktline representation; note that empty input becomes an
-# empty packet, not a flush packet (for that you can just print 0000 yourself).
+# convert function arguments or stdin (if not arguments given) to pktline
+# representation. If multiple arguments are given, they are separated by
+# whitespace and put in a single packet. Note that data containing NULs must be
+# given on stdin, and that empty input becomes an empty packet, not a flush
+# packet (for that you can just print 0000 yourself).
 packetize() {
-	cat >packetize.tmp &&
-	len=$(wc -c <packetize.tmp) &&
-	printf '%04x%s' "$(($len + 4))" &&
-	cat packetize.tmp &&
-	rm -f packetize.tmp
+	if test $# -gt 0
+	then
+		packet="$*"
+		printf '%04x%s' "$((4 + ${#packet}))" "$packet"
+	else
+		cat >packetize.tmp &&
+		len=$(wc -c <packetize.tmp) &&
+		printf '%04x' "$(($len + 4))" &&
+		cat packetize.tmp &&
+		rm -f packetize.tmp
+	fi
 }
 
 # Parse the input as a series of pktlines, writing the result to stdout.
-- 
2.26.0.581.g322f77c0ee


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/2] upload-pack: handle unexpected delim packets
  2020-03-27  8:02 [PATCH 0/2] upload-pack: handle unexpected v2 delim packets Jeff King
  2020-03-27  8:03 ` [PATCH 1/2] test-lib-functions: make packetize() more efficient Jeff King
@ 2020-03-27  8:03 ` Jeff King
  2020-03-27 15:17   ` Taylor Blau
  2020-03-27 15:18 ` [PATCH 0/2] upload-pack: handle unexpected v2 " Taylor Blau
  2 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2020-03-27  8:03 UTC (permalink / raw)
  To: git

When processing the arguments list for a v2 ls-refs or fetch command, we
loop like this:

  while (packet_reader_read(request) != PACKET_READ_FLUSH) {
          const char *arg = request->line;
	  ...handle arg...
  }

to read and handle packets until we see a flush. The hidden assumption
here is that anything except PACKET_READ_FLUSH will give us valid packet
data to read. But that's not true; PACKET_READ_DELIM or PACKET_READ_EOF
will leave packet->line as NULL, and we'll segfault trying to look at
it.

Instead, we should follow the more careful model demonstrated on the
client side (e.g., in process_capabilities_v2): keep looping as long
as we get normal packets, and then make sure that we broke out of the
loop due to a real flush. That fixes the segfault and correctly
diagnoses any unexpected input from the client.

Signed-off-by: Jeff King <peff@peff.net>
---
 ls-refs.c                      |  5 ++++-
 t/t5704-protocol-violations.sh | 33 +++++++++++++++++++++++++++++++++
 upload-pack.c                  |  5 ++++-
 3 files changed, 41 insertions(+), 2 deletions(-)
 create mode 100755 t/t5704-protocol-violations.sh

diff --git a/ls-refs.c b/ls-refs.c
index 818aef70a0..50d86866c6 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -93,7 +93,7 @@ int ls_refs(struct repository *r, struct argv_array *keys,
 
 	git_config(ls_refs_config, NULL);
 
-	while (packet_reader_read(request) != PACKET_READ_FLUSH) {
+	while (packet_reader_read(request) == PACKET_READ_NORMAL) {
 		const char *arg = request->line;
 		const char *out;
 
@@ -105,6 +105,9 @@ int ls_refs(struct repository *r, struct argv_array *keys,
 			argv_array_push(&data.prefixes, out);
 	}
 
+	if (request->status != PACKET_READ_FLUSH)
+		die(_("expected flush after ls-refs arguments"));
+
 	head_ref_namespaced(send_ref, &data);
 	for_each_namespaced_ref(send_ref, &data);
 	packet_flush(1);
diff --git a/t/t5704-protocol-violations.sh b/t/t5704-protocol-violations.sh
new file mode 100755
index 0000000000..950cfb21fe
--- /dev/null
+++ b/t/t5704-protocol-violations.sh
@@ -0,0 +1,33 @@
+#!/bin/sh
+
+test_description='Test responses to violations of the network protocol. In most
+of these cases it will generally be acceptable for one side to break off
+communications if the other side says something unexpected. We are mostly
+making sure that we do not segfault or otherwise behave badly.'
+. ./test-lib.sh
+
+test_expect_success 'extra delim packet in v2 ls-refs args' '
+	{
+		packetize command=ls-refs &&
+		printf 0001 &&
+		# protocol expects 0000 flush here
+		printf 0001
+	} >input &&
+	test_must_fail env GIT_PROTOCOL=version=2 \
+		git upload-pack . <input 2>err &&
+	test_i18ngrep "expected flush after ls-refs arguments" err
+'
+
+test_expect_success 'extra delim packet in v2 fetch args' '
+	{
+		packetize command=fetch &&
+		printf 0001 &&
+		# protocol expects 0000 flush here
+		printf 0001
+	} >input &&
+	test_must_fail env GIT_PROTOCOL=version=2 \
+		git upload-pack . <input 2>err &&
+	test_i18ngrep "expected flush after fetch arguments" err
+'
+
+test_done
diff --git a/upload-pack.c b/upload-pack.c
index c53249cac1..902d0ad5e1 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -1252,7 +1252,7 @@ static void process_args(struct packet_reader *request,
 			 struct upload_pack_data *data,
 			 struct object_array *want_obj)
 {
-	while (packet_reader_read(request) != PACKET_READ_FLUSH) {
+	while (packet_reader_read(request) == PACKET_READ_NORMAL) {
 		const char *arg = request->line;
 		const char *p;
 
@@ -1321,6 +1321,9 @@ static void process_args(struct packet_reader *request,
 		/* ignore unknown lines maybe? */
 		die("unexpected line: '%s'", arg);
 	}
+
+	if (request->status != PACKET_READ_FLUSH)
+		die(_("expected flush after fetch arguments"));
 }
 
 static int process_haves(struct oid_array *haves, struct oid_array *common,
-- 
2.26.0.581.g322f77c0ee

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] test-lib-functions: make packetize() more efficient
  2020-03-27  8:03 ` [PATCH 1/2] test-lib-functions: make packetize() more efficient Jeff King
@ 2020-03-27 15:16   ` Taylor Blau
  2020-03-28 12:25     ` Jeff King
  2020-03-27 19:18   ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Taylor Blau @ 2020-03-27 15:16 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Fri, Mar 27, 2020 at 04:03:00AM -0400, Jeff King wrote:
> The packetize() function takes its input on stdin, and requires 4
> separate sub-processes to format a simple string. We can do much better
> by getting the length via the shell's "${#packet}" construct. The one
> caveat is that the shell can't put a NUL into a variable, so we'll have
> to continue to provide the stdin form for a few calls.
>
> There are a few other cleanups here in the touched code:
>
>  - the stdin form of packetize() had an extra stray "%s" when printing
>    the packet
>
>  - the converted calls in t5562 can be made simpler by redirecting
>    output as a block, rather than repeated appending
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/t5562-http-backend-content-length.sh | 19 ++++++++++++-------
>  t/test-lib-functions.sh                | 23 ++++++++++++++++-------
>  2 files changed, 28 insertions(+), 14 deletions(-)
>
> diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh
> index 4a110b307e..3f4ac71f83 100755
> --- a/t/t5562-http-backend-content-length.sh
> +++ b/t/t5562-http-backend-content-length.sh
> @@ -53,15 +53,20 @@ test_expect_success 'setup' '
>  	test_commit c1 &&
>  	hash_head=$(git rev-parse HEAD) &&
>  	hash_prev=$(git rev-parse HEAD~1) &&
> -	printf "want %s" "$hash_head" | packetize >fetch_body &&
> -	printf 0000 >>fetch_body &&
> -	printf "have %s" "$hash_prev" | packetize >>fetch_body &&
> -	printf done | packetize >>fetch_body &&
> +	{
> +		packetize "want $hash_head" &&
> +		printf 0000 &&
> +		packetize "have $hash_prev" &&
> +		packetize "done"
> +	} >fetch_body &&

Nicely done, this is an obvious improvement in clarity over the
appending '>>' that was here before. The new version reads mouch more
cleanly.

>  	test_copy_bytes 10 <fetch_body >fetch_body.trunc &&
>  	hash_next=$(git commit-tree -p HEAD -m next HEAD^{tree}) &&
> -	printf "%s %s refs/heads/newbranch\\0report-status\\n" "$ZERO_OID" "$hash_next" | packetize >push_body &&
> -	printf 0000 >>push_body &&
> -	echo "$hash_next" | git pack-objects --stdout >>push_body &&
> +	{
> +		printf "%s %s refs/heads/newbranch\\0report-status\\n" \
> +			"$ZERO_OID" "$hash_next" | packetize &&
> +		printf 0000 &&
> +		echo "$hash_next" | git pack-objects --stdout
> +	} >push_body &&
>  	test_copy_bytes 10 <push_body >push_body.trunc &&
>  	: >empty_body
>  '
> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
> index 352c213d52..216918a58c 100644
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -1362,14 +1362,23 @@ nongit () {
>  	)
>  } 7>&2 2>&4
>
> -# convert stdin to pktline representation; note that empty input becomes an
> -# empty packet, not a flush packet (for that you can just print 0000 yourself).
> +# convert function arguments or stdin (if not arguments given) to pktline
> +# representation. If multiple arguments are given, they are separated by
> +# whitespace and put in a single packet. Note that data containing NULs must be
> +# given on stdin, and that empty input becomes an empty packet, not a flush
> +# packet (for that you can just print 0000 yourself).
>  packetize() {
> -	cat >packetize.tmp &&
> -	len=$(wc -c <packetize.tmp) &&
> -	printf '%04x%s' "$(($len + 4))" &&
> -	cat packetize.tmp &&
> -	rm -f packetize.tmp
> +	if test $# -gt 0
> +	then
> +		packet="$*"

Mentioned off-list in a discussion already, but: though I find this
behavior of joining multiple arguments by a whitespace character a
little confusing (i.e., what would callers thing this function does if
they hadn't read the documentation?) I think that this is probably the
most sane thing that you could do here.

On the other hand, nowhere in this patch do we currently call packetize
with multiple arguments, so perhaps it would be made simpler if we
instead wrote "$1" anywhere there was "$packet".

> +		printf '%04x%s' "$((4 + ${#packet}))" "$packet"
> +	else
> +		cat >packetize.tmp &&
> +		len=$(wc -c <packetize.tmp) &&
> +		printf '%04x' "$(($len + 4))" &&
> +		cat packetize.tmp &&
> +		rm -f packetize.tmp
> +	fi
>  }

Right: if the contents contains an unrepresentable byte, then it has to
be passed over stdin. This part is obviously correct.

>  # Parse the input as a series of pktlines, writing the result to stdout.
> --
> 2.26.0.581.g322f77c0ee

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] upload-pack: handle unexpected delim packets
  2020-03-27  8:03 ` [PATCH 2/2] upload-pack: handle unexpected delim packets Jeff King
@ 2020-03-27 15:17   ` Taylor Blau
  0 siblings, 0 replies; 16+ messages in thread
From: Taylor Blau @ 2020-03-27 15:17 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Fri, Mar 27, 2020 at 04:03:38AM -0400, Jeff King wrote:
> When processing the arguments list for a v2 ls-refs or fetch command, we
> loop like this:
>
>   while (packet_reader_read(request) != PACKET_READ_FLUSH) {
>           const char *arg = request->line;
> 	  ...handle arg...
>   }
>
> to read and handle packets until we see a flush. The hidden assumption
> here is that anything except PACKET_READ_FLUSH will give us valid packet
> data to read. But that's not true; PACKET_READ_DELIM or PACKET_READ_EOF
> will leave packet->line as NULL, and we'll segfault trying to look at
> it.
>
> Instead, we should follow the more careful model demonstrated on the
> client side (e.g., in process_capabilities_v2): keep looping as long
> as we get normal packets, and then make sure that we broke out of the
> loop due to a real flush. That fixes the segfault and correctly
> diagnoses any unexpected input from the client.

This is a very clean fix, thank you.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  ls-refs.c                      |  5 ++++-
>  t/t5704-protocol-violations.sh | 33 +++++++++++++++++++++++++++++++++
>  upload-pack.c                  |  5 ++++-
>  3 files changed, 41 insertions(+), 2 deletions(-)
>  create mode 100755 t/t5704-protocol-violations.sh
>
> diff --git a/ls-refs.c b/ls-refs.c
> index 818aef70a0..50d86866c6 100644
> --- a/ls-refs.c
> +++ b/ls-refs.c
> @@ -93,7 +93,7 @@ int ls_refs(struct repository *r, struct argv_array *keys,
>
>  	git_config(ls_refs_config, NULL);
>
> -	while (packet_reader_read(request) != PACKET_READ_FLUSH) {
> +	while (packet_reader_read(request) == PACKET_READ_NORMAL) {
>  		const char *arg = request->line;
>  		const char *out;
>
> @@ -105,6 +105,9 @@ int ls_refs(struct repository *r, struct argv_array *keys,
>  			argv_array_push(&data.prefixes, out);
>  	}
>
> +	if (request->status != PACKET_READ_FLUSH)
> +		die(_("expected flush after ls-refs arguments"));
> +

...and it's implemented faithfully here. Thanks.

>  	head_ref_namespaced(send_ref, &data);
>  	for_each_namespaced_ref(send_ref, &data);
>  	packet_flush(1);
> diff --git a/t/t5704-protocol-violations.sh b/t/t5704-protocol-violations.sh
> new file mode 100755
> index 0000000000..950cfb21fe
> --- /dev/null
> +++ b/t/t5704-protocol-violations.sh
> @@ -0,0 +1,33 @@
> +#!/bin/sh
> +
> +test_description='Test responses to violations of the network protocol. In most
> +of these cases it will generally be acceptable for one side to break off
> +communications if the other side says something unexpected. We are mostly
> +making sure that we do not segfault or otherwise behave badly.'
> +. ./test-lib.sh
> +
> +test_expect_success 'extra delim packet in v2 ls-refs args' '
> +	{
> +		packetize command=ls-refs &&
> +		printf 0001 &&
> +		# protocol expects 0000 flush here
> +		printf 0001
> +	} >input &&
> +	test_must_fail env GIT_PROTOCOL=version=2 \
> +		git upload-pack . <input 2>err &&
> +	test_i18ngrep "expected flush after ls-refs arguments" err
> +'
> +
> +test_expect_success 'extra delim packet in v2 fetch args' '
> +	{
> +		packetize command=fetch &&
> +		printf 0001 &&
> +		# protocol expects 0000 flush here
> +		printf 0001
> +	} >input &&
> +	test_must_fail env GIT_PROTOCOL=version=2 \
> +		git upload-pack . <input 2>err &&
> +	test_i18ngrep "expected flush after fetch arguments" err
> +'
>
> +test_done
> diff --git a/upload-pack.c b/upload-pack.c
> index c53249cac1..902d0ad5e1 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -1252,7 +1252,7 @@ static void process_args(struct packet_reader *request,
>  			 struct upload_pack_data *data,
>  			 struct object_array *want_obj)
>  {
> -	while (packet_reader_read(request) != PACKET_READ_FLUSH) {
> +	while (packet_reader_read(request) == PACKET_READ_NORMAL) {
>  		const char *arg = request->line;
>  		const char *p;
>
> @@ -1321,6 +1321,9 @@ static void process_args(struct packet_reader *request,
>  		/* ignore unknown lines maybe? */
>  		die("unexpected line: '%s'", arg);
>  	}
> +
> +	if (request->status != PACKET_READ_FLUSH)
> +		die(_("expected flush after fetch arguments"));
>  }

...and here, too :).

>  static int process_haves(struct oid_array *haves, struct oid_array *common,
> --
> 2.26.0.581.g322f77c0ee

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] upload-pack: handle unexpected v2 delim packets
  2020-03-27  8:02 [PATCH 0/2] upload-pack: handle unexpected v2 delim packets Jeff King
  2020-03-27  8:03 ` [PATCH 1/2] test-lib-functions: make packetize() more efficient Jeff King
  2020-03-27  8:03 ` [PATCH 2/2] upload-pack: handle unexpected delim packets Jeff King
@ 2020-03-27 15:18 ` Taylor Blau
  2 siblings, 0 replies; 16+ messages in thread
From: Taylor Blau @ 2020-03-27 15:18 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Hi Peff,

On Fri, Mar 27, 2020 at 04:02:10AM -0400, Jeff King wrote:
> We saw an upload-pack segfault in the wild today at GitHub. It's caused
> by a client sending bogus v2 protocol lines (a "delim" packet instead of
> a "flush"). So the client is broken and our only option is to break the
> network connection, but we shouldn't segfault while doing so. :)
>
> I don't think the broken client was Git. It didn't send an "agent"
> capability at all, which makes me suspect it was somebody generating the
> request manually (nor was there anything interesting in the transport
> layer agent; it was just an openssh client).
>
> The fix was simple enough, and is in the second patch. The first one is
> just a small cleanup / refactor in preparation.
>
>   [1/2]: test-lib-functions: make packetize() more efficient
>   [2/2]: upload-pack: handle unexpected delim packets
>
>  ls-refs.c                              |  5 +++-
>  t/t5562-http-backend-content-length.sh | 19 +++++++++------
>  t/t5704-protocol-violations.sh         | 33 ++++++++++++++++++++++++++
>  t/test-lib-functions.sh                | 23 ++++++++++++------
>  upload-pack.c                          |  5 +++-
>  5 files changed, 69 insertions(+), 16 deletions(-)
>  create mode 100755 t/t5704-protocol-violations.sh

Thanks. This series looks good to me, and is certainly improving things.

  Reviewed-by: Taylor Blau <me@ttaylorr.com>

> -Peff

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] test-lib-functions: make packetize() more efficient
  2020-03-27  8:03 ` [PATCH 1/2] test-lib-functions: make packetize() more efficient Jeff King
  2020-03-27 15:16   ` Taylor Blau
@ 2020-03-27 19:18   ` Junio C Hamano
  2020-03-28 11:20     ` Jeff King
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2020-03-27 19:18 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> The packetize() function takes its input on stdin, and requires 4
> separate sub-processes to format a simple string. We can do much better
> by getting the length via the shell's "${#packet}" construct. The one
> caveat is that the shell can't put a NUL into a variable, so we'll have
> to continue to provide the stdin form for a few calls.

Yuck.  Binary protocol and shell variables do not mix well.

Documentation/CodingGuidelines forbids ${#parameter} in the first
place and we seem to use it only when we know we are using bash.

Perhaps we should start considering to lift it.  I dunno.

> +# convert function arguments or stdin (if not arguments given) to pktline
> +# representation. If multiple arguments are given, they are separated by
> +# whitespace and put in a single packet. Note that data containing NULs must be
> +# given on stdin, and that empty input becomes an empty packet, not a flush
> +# packet (for that you can just print 0000 yourself).

>  packetize() {
> +	if test $# -gt 0
> +	then
> +		packet="$*"
> +		printf '%04x%s' "$((4 + ${#packet}))" "$packet"

This allows 

		packetize "want $hash_head"

to be written like so:

		packetize want "$hash_head"

which maybe is a handy thing to do.

> +	else
> +		cat >packetize.tmp &&
> +		len=$(wc -c <packetize.tmp) &&
> +		printf '%04x' "$(($len + 4))" &&
> +		cat packetize.tmp &&
> +		rm -f packetize.tmp

	perl -e '
		my $data = do { local $?; <STDIN> };
                printf "%04x%s", length($data), $data;
	'

That's one process but much heavier than cat/wc/printf/cat, I guess.

> +	fi
>  }
>  
>  # Parse the input as a series of pktlines, writing the result to stdout.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] test-lib-functions: make packetize() more efficient
  2020-03-27 19:18   ` Junio C Hamano
@ 2020-03-28 11:20     ` Jeff King
  2020-03-29  0:11       ` Junio C Hamano
  2020-03-29 15:02       ` [PATCH] test-lib-functions: simplify packetize() stdin code Jeff King
  0 siblings, 2 replies; 16+ messages in thread
From: Jeff King @ 2020-03-28 11:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Mar 27, 2020 at 12:18:34PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The packetize() function takes its input on stdin, and requires 4
> > separate sub-processes to format a simple string. We can do much better
> > by getting the length via the shell's "${#packet}" construct. The one
> > caveat is that the shell can't put a NUL into a variable, so we'll have
> > to continue to provide the stdin form for a few calls.
> 
> Yuck.  Binary protocol and shell variables do not mix well.
> 
> Documentation/CodingGuidelines forbids ${#parameter} in the first
> place and we seem to use it only when we know we are using bash.
> 
> Perhaps we should start considering to lift it.  I dunno.

Hmph. I had a vague recollection there but checked beforehand:

 - we do use it in t9010, which is /bin/sh (and have since 2010). I
   thought it might not be run on obscure platforms because it's
   svn-related, but I think it doesn't actually require svn.

 - it's in POSIX, at least as far back as 2004 (I couldn't find an easy
   copy of the 2001 version). That doesn't prove there aren't
   problematic systems, of course, but it at least passes the bar of
   "not even in POSIX".

 - it's not in check-non-portable-shell.pl. :) That doesn't mean
   CodingGuidelines is wrong, but we should probably reconcile them.

Given that the first point means we've had a 10-year weather balloon,
and that the original rule seems to have come from a big list of
rules[1] rather than a specific known problem shell, I think we should
declare it available.

[1] https://lore.kernel.org/git/7vtznzf5jb.fsf@gitster.siamese.dyndns.org/

> >  packetize() {
> > +	if test $# -gt 0
> > +	then
> > +		packet="$*"
> > +		printf '%04x%s' "$((4 + ${#packet}))" "$packet"
> 
> This allows 
> 
> 		packetize "want $hash_head"
> 
> to be written like so:
> 
> 		packetize want "$hash_head"
> 
> which maybe is a handy thing to do.

Yeah. I admit I don't care overly much between that and doing something
else sensible with multiple arguments (like putting each one in its own
packet). I just really didn't want to silently ignore anything after
"$1". This at least behaves like "echo".

> > +	else
> > +		cat >packetize.tmp &&
> > +		len=$(wc -c <packetize.tmp) &&
> > +		printf '%04x' "$(($len + 4))" &&
> > +		cat packetize.tmp &&
> > +		rm -f packetize.tmp
> 
> 	perl -e '
> 		my $data = do { local $?; <STDIN> };
>                 printf "%04x%s", length($data), $data;
> 	'
> 
> That's one process but much heavier than cat/wc/printf/cat, I guess.

Yeah, I was tempted to do that, but ${#packet} is even one process
shorter. :) It might be worth simplifying the stdin path above, but it's
much less important if most of those call-sites go away.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] test-lib-functions: make packetize() more efficient
  2020-03-27 15:16   ` Taylor Blau
@ 2020-03-28 12:25     ` Jeff King
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff King @ 2020-03-28 12:25 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git

On Fri, Mar 27, 2020 at 09:16:07AM -0600, Taylor Blau wrote:

> >  packetize() {
> > -	cat >packetize.tmp &&
> > -	len=$(wc -c <packetize.tmp) &&
> > -	printf '%04x%s' "$(($len + 4))" &&
> > -	cat packetize.tmp &&
> > -	rm -f packetize.tmp
> > +	if test $# -gt 0
> > +	then
> > +		packet="$*"
> 
> Mentioned off-list in a discussion already, but: though I find this
> behavior of joining multiple arguments by a whitespace character a
> little confusing (i.e., what would callers thing this function does if
> they hadn't read the documentation?) I think that this is probably the
> most sane thing that you could do here.
> 
> On the other hand, nowhere in this patch do we currently call packetize
> with multiple arguments, so perhaps it would be made simpler if we
> instead wrote "$1" anywhere there was "$packet".

Of all the options, I like that the least because somebody doing:

  packetize foo bar

would have the "bar" silently ignored. And it's more lines of code to
check and complain about that than it is to just do something sensible.

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] test-lib-functions: make packetize() more efficient
  2020-03-28 11:20     ` Jeff King
@ 2020-03-29  0:11       ` Junio C Hamano
  2020-03-29  3:05         ` Junio C Hamano
  2020-03-29 14:52         ` Jeff King
  2020-03-29 15:02       ` [PATCH] test-lib-functions: simplify packetize() stdin code Jeff King
  1 sibling, 2 replies; 16+ messages in thread
From: Junio C Hamano @ 2020-03-29  0:11 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

>> Documentation/CodingGuidelines forbids ${#parameter} in the first
>> place and we seem to use it only when we know we are using bash.
>> 
>> Perhaps we should start considering to lift it.  I dunno.
>
> Hmph. I had a vague recollection there but checked beforehand:
>
>  - we do use it in t9010, which is /bin/sh (and have since 2010). I
>    thought it might not be run on obscure platforms because it's
>    svn-related, but I think it doesn't actually require svn.

I do not think I ran git-svn stuff myself for the past 10 years,
though, after a few projects that matter to me migrated away from
svn ;-)

>  - it's in POSIX, at least as far back as 2004 (I couldn't find an easy
>    copy of the 2001 version). That doesn't prove there aren't
>    problematic systems, of course, but it at least passes the bar of
>    "not even in POSIX".

Yeah, IIRC the list was written in response to a request for _some_
guidance, so it largely came from in-house rules of my previous
life, back when I had to deal with various flavours of UNIXen.

>  - it's not in check-non-portable-shell.pl. :) That doesn't mean
>    CodingGuidelines is wrong, but we should probably reconcile them.

That checker came much much later than the guidelines so it is not
surprising at all for it to be "buggy", in the sense that it does
not check everything the guidelines ask.  Yes, we may need bugfixes
and there may be other bugs, too.

> Yeah, I was tempted to do that, but ${#packet} is even one process
> shorter. :) It might be worth simplifying the stdin path above, but it's
> much less important if most of those call-sites go away.

The largest issue (which is not that large, though) I felt with the
approach when I saw the patch for the first time was that we need
the big warning comment before the helper, i.e.

> +# Note that data containing NULs must be given on stdin,...

But after thinking about it a bit more, I think it is probably OK.
I do not think you can assign a string with NUL in it to a variable
and you can use "$(command substitution)" as an argument to the
helper to pass such a string, either (not portably anyway).  If such
a string cannot be made into "$*", ${#packet} won't have to worry
about counting bytes in a string with NUL in it to begin with, so
the above note won't even be necessary (it would have to say "you
cannot pass data containing NULs as arguments---you are welcome to
try, but you won't be able to do so, not portably anyway"), anyway
;-).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] test-lib-functions: make packetize() more efficient
  2020-03-29  0:11       ` Junio C Hamano
@ 2020-03-29  3:05         ` Junio C Hamano
  2020-03-29 14:53           ` Jeff King
  2020-03-29 14:52         ` Jeff King
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2020-03-29  3:05 UTC (permalink / raw)
  To: git; +Cc: Jeff King

Junio C Hamano <gitster@pobox.com> writes:

I guess I didn't give an explicit conclusion in my message.

>>  - it's in POSIX, at least as far back as 2004 (I couldn't find an easy
>>    copy of the 2001 version). That doesn't prove there aren't
>>    problematic systems, of course, but it at least passes the bar of
>>    "not even in POSIX".
>
> Yeah, IIRC the list was written in response to a request for _some_
> guidance, so it largely came from in-house rules of my previous
> life, back when I had to deal with various flavours of UNIXen.

I strongly suspect that most of these historical curiosity systems
died out or learned ${#posix}; I wouldn't at all be surprised if
/bin/sh on Solaris back then was one of the motivating systems that
led to the forbidding of the use of ${#parameter} in our in-house
rules, but luckily, we have written it off as unsalvageable in this
project ;-)

It has been in POSIX long enough, and it is useful at times, so
let's drop it from the list of guidelines (patch?).

>>  - it's not in check-non-portable-shell.pl. :) That doesn't mean
>>    CodingGuidelines is wrong, but we should probably reconcile them.
>
> That checker came much much later than the guidelines so it is not
> surprising at all for it to be "buggy", in the sense that it does
> not check everything the guidelines ask.  Yes, we may need bugfixes
> and there may be other bugs, too.

This still gives us something to keep an eye on.

-- >8 --
Subject: CodingGuidelines: allow ${#posix} == strlen($posix)

The construct has been in POSIX for the past 10+ years, and we have
used in t9xxx (subversion) series of the tests, so we know it is at
least portable across systems that subversion Perl bindings have
been ported to.

Let's loosen the rule; luckily, the check-non-portable-shell script
does not have any rule to find its use, so the only change needed is
a removal of one paragraph from the documentation.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 Documentation/CodingGuidelines | 2 --
 1 file changed, 2 deletions(-)

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index ed4e443a3c..390ceece52 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -91,8 +91,6 @@ For shell scripts specifically (not exhaustive):
 
    - No shell arrays.
 
-   - No strlen ${#parameter}.
-
    - No pattern replacement ${parameter/pattern/string}.
 
  - We use Arithmetic Expansion $(( ... )).

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] test-lib-functions: make packetize() more efficient
  2020-03-29  0:11       ` Junio C Hamano
  2020-03-29  3:05         ` Junio C Hamano
@ 2020-03-29 14:52         ` Jeff King
  1 sibling, 0 replies; 16+ messages in thread
From: Jeff King @ 2020-03-29 14:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Sat, Mar 28, 2020 at 05:11:59PM -0700, Junio C Hamano wrote:

> >  - we do use it in t9010, which is /bin/sh (and have since 2010). I
> >    thought it might not be run on obscure platforms because it's
> >    svn-related, but I think it doesn't actually require svn.
> 
> I do not think I ran git-svn stuff myself for the past 10 years,
> though, after a few projects that matter to me migrated away from
> svn ;-)

Me either. I don't even run the git-svn tests, usually. t9010 is about
the svn fast-importer, though. We've all been running its tests for
years, even though AFAIK it's not really used for anything.

> The largest issue (which is not that large, though) I felt with the
> approach when I saw the patch for the first time was that we need
> the big warning comment before the helper, i.e.
> 
> > +# Note that data containing NULs must be given on stdin,...
> 
> But after thinking about it a bit more, I think it is probably OK.
> I do not think you can assign a string with NUL in it to a variable
> and you can use "$(command substitution)" as an argument to the
> helper to pass such a string, either (not portably anyway).  If such
> a string cannot be made into "$*", ${#packet} won't have to worry
> about counting bytes in a string with NUL in it to begin with, so
> the above note won't even be necessary (it would have to say "you
> cannot pass data containing NULs as arguments---you are welcome to
> try, but you won't be able to do so, not portably anyway"), anyway
> ;-).

Yes. You could probably screw yourself with:

  packetize "$(printf "0020git-upload-pack\0host=example.com")"

but presumably you'd quickly notice when your test fails (both dash and
bash just drop the NUL byte entirely, and bash even issues a warning).

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] test-lib-functions: make packetize() more efficient
  2020-03-29  3:05         ` Junio C Hamano
@ 2020-03-29 14:53           ` Jeff King
  2020-03-29 15:44             ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff King @ 2020-03-29 14:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Sat, Mar 28, 2020 at 08:05:10PM -0700, Junio C Hamano wrote:

> -- >8 --
> Subject: CodingGuidelines: allow ${#posix} == strlen($posix)

I'm in favor of this patch, but...

> The construct has been in POSIX for the past 10+ years, and we have
> used in t9xxx (subversion) series of the tests, so we know it is at
> least portable across systems that subversion Perl bindings have
> been ported to.

It is even stronger than that. t9010 is run for everybody (with the
exception of the last test, which needs svnadmin).

-Peff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] test-lib-functions: simplify packetize() stdin code
  2020-03-28 11:20     ` Jeff King
  2020-03-29  0:11       ` Junio C Hamano
@ 2020-03-29 15:02       ` Jeff King
  2020-03-29 15:49         ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Jeff King @ 2020-03-29 15:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Sat, Mar 28, 2020 at 07:20:10AM -0400, Jeff King wrote:

> > 	perl -e '
> > 		my $data = do { local $?; <STDIN> };
> >                 printf "%04x%s", length($data), $data;
> > 	'
> > 
> > That's one process but much heavier than cat/wc/printf/cat, I guess.
> 
> Yeah, I was tempted to do that, but ${#packet} is even one process
> shorter. :) It might be worth simplifying the stdin path above, but it's
> much less important if most of those call-sites go away.

Here it is as a patch on top. I doubt it matters that much (there are
only two stdin calls in the whole suite). So I'm not sure if this patch
should be "eh, why not?" or "meh, pointless churn".

-- >8 --
Subject: [PATCH] test-lib-functions: simplify packetize() stdin code

The code path in packetize() for reading stdin needs to handle NUL
bytes, so we can't rely on shell variables. However, the current code
takes a whopping 4 processes and uses a temporary file. We can do this
much more simply and efficiently by using a single perl invocation (and
we already rely on perl in the matching depacketize() function).

We'll keep the non-stdin code path as it is, since that uses zero extra
processes.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/test-lib-functions.sh | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 216918a58c..88b7dbd69a 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1373,11 +1373,10 @@ packetize() {
 		packet="$*"
 		printf '%04x%s' "$((4 + ${#packet}))" "$packet"
 	else
-		cat >packetize.tmp &&
-		len=$(wc -c <packetize.tmp) &&
-		printf '%04x' "$(($len + 4))" &&
-		cat packetize.tmp &&
-		rm -f packetize.tmp
+		perl -e '
+			my $packet = do { local $/; <STDIN> };
+			printf "%04x%s", 4 + length($packet), $packet;
+		'
 	fi
 }
 
-- 
2.26.0.581.g322f77c0ee


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] test-lib-functions: make packetize() more efficient
  2020-03-29 14:53           ` Jeff King
@ 2020-03-29 15:44             ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2020-03-29 15:44 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Sat, Mar 28, 2020 at 08:05:10PM -0700, Junio C Hamano wrote:
>
>> -- >8 --
>> Subject: CodingGuidelines: allow ${#posix} == strlen($posix)
>
> I'm in favor of this patch, but...
>
>> The construct has been in POSIX for the past 10+ years, and we have
>> used in t9xxx (subversion) series of the tests, so we know it is at
>> least portable across systems that subversion Perl bindings have
>> been ported to.
>
> It is even stronger than that. t9010 is run for everybody (with the
> exception of the last test, which needs svnadmin).

Ah, you're right.  Lemme make it stronger.

    The construct has been in POSIX for the past 10+ years, and we have
    used in t9xxx (subversion) series of the tests, so we know it is at
    portable across systems that people have run those tests, which is
    almost everything we'd care about.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] test-lib-functions: simplify packetize() stdin code
  2020-03-29 15:02       ` [PATCH] test-lib-functions: simplify packetize() stdin code Jeff King
@ 2020-03-29 15:49         ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2020-03-29 15:49 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> Here it is as a patch on top. I doubt it matters that much (there are
> only two stdin calls in the whole suite). So I'm not sure if this patch
> should be "eh, why not?" or "meh, pointless churn".

Me, neither. ;-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-03-29 15:49 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-27  8:02 [PATCH 0/2] upload-pack: handle unexpected v2 delim packets Jeff King
2020-03-27  8:03 ` [PATCH 1/2] test-lib-functions: make packetize() more efficient Jeff King
2020-03-27 15:16   ` Taylor Blau
2020-03-28 12:25     ` Jeff King
2020-03-27 19:18   ` Junio C Hamano
2020-03-28 11:20     ` Jeff King
2020-03-29  0:11       ` Junio C Hamano
2020-03-29  3:05         ` Junio C Hamano
2020-03-29 14:53           ` Jeff King
2020-03-29 15:44             ` Junio C Hamano
2020-03-29 14:52         ` Jeff King
2020-03-29 15:02       ` [PATCH] test-lib-functions: simplify packetize() stdin code Jeff King
2020-03-29 15:49         ` Junio C Hamano
2020-03-27  8:03 ` [PATCH 2/2] upload-pack: handle unexpected delim packets Jeff King
2020-03-27 15:17   ` Taylor Blau
2020-03-27 15:18 ` [PATCH 0/2] upload-pack: handle unexpected v2 " Taylor Blau

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).