git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/9] reducing memory allocations for v2 servers
@ 2021-09-14 15:29 Jeff King
  2021-09-14 15:30 ` [PATCH 1/9] serve: rename is_command() to parse_command() Jeff King
                   ` (12 more replies)
  0 siblings, 13 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 15:29 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason

While looking at [1], I noticed that v2 servers will read a few bits of
client input into strvecs. Even though we expect these to be small-ish,
there's nothing preventing a client from sending us a bunch of junk and
wasting memory.

This series changes that, putting a cap on how much data we'll receive.
The two spots are the "capabilities" list we receive (before we even
dispatch to a particular command like ls-refs or fetch), and the
ref-prefix list we receive for ls-refs.

To deal with the capabilities issue, we'll just handle each capability
line as it comes in, rather than storing a list. This requires a bit of
cleanup, which is why it takes up the first 6 patches. It also needs a
few other cleanups from ab/serve-cleanup (and so this series is based on
that). The dependencies there are both textual (e.g., using designated
initializers in the capabilities table) and functional (dropping the
"keys" parameter from v2 command() functions).

Patch 7 fixes the ls-refs issue by degrading when we see too many
prefixes (which works because the capability is optional in the first
place).

The final two patches aren't strictly necessary. They're a tightening of
the protocol against some bogus input that I noticed while doing the
earlier cleanups. That bogus input isn't really hurting anything, but I
think it makes sense to reject nonsense from the client rather than
ignoring it. The obvious risk is that some client for some reason would
send us nonsense, but I don't think Git ever has.

Once again, these are based on ab/serve-cleanup, which is currently in
next.

  [1/9]: serve: rename is_command() to parse_command()
  [2/9]: serve: return capability "value" from get_capability()
  [3/9]: serve: add "receive" method for v2 capabilities table
  [4/9]: serve: provide "receive" function for object-format capability
  [5/9]: serve: provide "receive" function for session-id capability
  [6/9]: serve: drop "keys" strvec
  [7/9]: ls-refs: ignore very long ref-prefix counts
  [8/9]: serve: reject bogus v2 "command=ls-refs=foo"
  [9/9]: serve: reject commands used as capabilities

 ls-refs.c            |  19 ++++++-
 serve.c              | 120 +++++++++++++++++++++++--------------------
 t/t5701-git-serve.sh |  60 ++++++++++++++++++++++
 3 files changed, 142 insertions(+), 57 deletions(-)

-Peff

[1] https://lore.kernel.org/git/YT54CNYgtGcqexwq@coredump.intra.peff.net/

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH 1/9] serve: rename is_command() to parse_command()
  2021-09-14 15:29 [PATCH 0/9] reducing memory allocations for v2 servers Jeff King
@ 2021-09-14 15:30 ` Jeff King
  2021-09-14 15:30 ` [PATCH 2/9] serve: return capability "value" from get_capability() Jeff King
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 15:30 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason

The is_command() function not only tells us whether the pktline is a
valid command string, but it also parses out the command (and complains
if we see a duplicate). Let's rename it to make those extra functions
a bit more obvious.

Signed-off-by: Jeff King <peff@peff.net>
---
Obviously not strictly necessary, but the name caused me a fair bit of
confusion at first while touching this code.

 serve.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/serve.c b/serve.c
index 1817edc7f5..fd88b95343 100644
--- a/serve.c
+++ b/serve.c
@@ -163,7 +163,7 @@ static int is_valid_capability(const char *key)
 	return c && c->advertise(the_repository, NULL);
 }
 
-static int is_command(const char *key, struct protocol_capability **command)
+static int parse_command(const char *key, struct protocol_capability **command)
 {
 	const char *out;
 
@@ -251,7 +251,7 @@ static int process_request(void)
 			BUG("Should have already died when seeing EOF");
 		case PACKET_READ_NORMAL:
 			/* collect request; a sequence of keys and values */
-			if (is_command(reader.line, &command) ||
+			if (parse_command(reader.line, &command) ||
 			    is_valid_capability(reader.line))
 				strvec_push(&keys, reader.line);
 			else
-- 
2.33.0.887.g5b1f44e68d


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 2/9] serve: return capability "value" from get_capability()
  2021-09-14 15:29 [PATCH 0/9] reducing memory allocations for v2 servers Jeff King
  2021-09-14 15:30 ` [PATCH 1/9] serve: rename is_command() to parse_command() Jeff King
@ 2021-09-14 15:30 ` Jeff King
  2021-09-14 15:31 ` [PATCH 3/9] serve: add "receive" method for v2 capabilities table Jeff King
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 15:30 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason

When the client sends v2 capabilities, they may be simple, like:

  foo

or have a value like:

  foo=bar

(all of the current capabilities actually expect a value, but the
protocol allows for boolean ones).

We use get_capability() to make sure the client's pktline matches a
capability. In doing so, we parse enough to see the "=" and the value
(if any), but we immediately forget it. Nobody cares for now, because they end
up parsing the values out later using has_capability(). But in
preparation for changing that, let's pass back a pointer so the callers
know what we found.

Note that unlike has_capability(), we'll return NULL for a "simple"
capability. Distinguishing these will be useful for some future patches.

Signed-off-by: Jeff King <peff@peff.net>
---
We get rid of has_capability() later, so the inconsistency in return
types will go away.

 serve.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/serve.c b/serve.c
index fd88b95343..78a4e83554 100644
--- a/serve.c
+++ b/serve.c
@@ -139,7 +139,7 @@ void protocol_v2_advertise_capabilities(void)
 	strbuf_release(&value);
 }
 
-static struct protocol_capability *get_capability(const char *key)
+static struct protocol_capability *get_capability(const char *key, const char **value)
 {
 	int i;
 
@@ -149,16 +149,25 @@ static struct protocol_capability *get_capability(const char *key)
 	for (i = 0; i < ARRAY_SIZE(capabilities); i++) {
 		struct protocol_capability *c = &capabilities[i];
 		const char *out;
-		if (skip_prefix(key, c->name, &out) && (!*out || *out == '='))
+		if (!skip_prefix(key, c->name, &out))
+			continue;
+		if (!*out) {
+			*value = NULL;
 			return c;
+		}
+		if (*out++ == '=') {
+			*value = out;
+			return c;
+		}
 	}
 
 	return NULL;
 }
 
 static int is_valid_capability(const char *key)
 {
-	const struct protocol_capability *c = get_capability(key);
+	const char *value;
+	const struct protocol_capability *c = get_capability(key, &value);
 
 	return c && c->advertise(the_repository, NULL);
 }
@@ -168,7 +177,8 @@ static int parse_command(const char *key, struct protocol_capability **command)
 	const char *out;
 
 	if (skip_prefix(key, "command=", &out)) {
-		struct protocol_capability *cmd = get_capability(out);
+		const char *value;
+		struct protocol_capability *cmd = get_capability(out, &value);
 
 		if (*command)
 			die("command '%s' requested after already requesting command '%s'",
-- 
2.33.0.887.g5b1f44e68d


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 3/9] serve: add "receive" method for v2 capabilities table
  2021-09-14 15:29 [PATCH 0/9] reducing memory allocations for v2 servers Jeff King
  2021-09-14 15:30 ` [PATCH 1/9] serve: rename is_command() to parse_command() Jeff King
  2021-09-14 15:30 ` [PATCH 2/9] serve: return capability "value" from get_capability() Jeff King
@ 2021-09-14 15:31 ` Jeff King
  2021-09-14 15:31 ` [PATCH 4/9] serve: provide "receive" function for object-format capability Jeff King
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 15:31 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason

We have a capabilities table that tells us what we should tell the
client we are capable of, and what to do when a client gives us a
particular command (e.g., "command=ls-refs"). But it doesn't tell us
what to do when the client sends us back a capability (e.g.,
"object-format=sha256"). We just collect them all in a strvec and hope
somebody can use them later.

Instead, let's provide a function pointer in the table to act on these.
This will eventually help us avoid collecting the strings, which will be
more efficient and less prone to mischief.

Using the new method is optional, which helps in two ways:

  - we can move existing capabilities over to this new system gradually
    in individual commits

  - some capabilities we don't actually do anything with anyway. For
    example, the client is free to say "agent=git/1.2.3" to us, but we
    do not act on the information in any way.

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/serve.c b/serve.c
index 78a4e83554..a161189984 100644
--- a/serve.c
+++ b/serve.c
@@ -70,6 +70,16 @@ struct protocol_capability {
 	 * This field should be NULL for capabilities which are not commands.
 	 */
 	int (*command)(struct repository *r, struct packet_reader *request);
+
+	/*
+	 * Function called when a client requests the capability as a
+	 * non-command. This may be NULL if the capability does nothing.
+	 *
+	 * For a capability of the form "foo=bar", the value string points to
+	 * the content after the "=" (i.e., "bar"). For simple capabilities
+	 * (just "foo"), it is NULL.
+	 */
+	void (*receive)(struct repository *r, const char *value);
 };
 
 static struct protocol_capability capabilities[] = {
@@ -164,12 +174,17 @@ static struct protocol_capability *get_capability(const char *key, const char **
 	return NULL;
 }
 
-static int is_valid_capability(const char *key)
+static int receive_client_capability(const char *key)
 {
 	const char *value;
 	const struct protocol_capability *c = get_capability(key, &value);
 
-	return c && c->advertise(the_repository, NULL);
+	if (!c || !c->advertise(the_repository, NULL))
+		return 0;
+
+	if (c->receive)
+		c->receive(the_repository, value);
+	return 1;
 }
 
 static int parse_command(const char *key, struct protocol_capability **command)
@@ -262,7 +277,7 @@ static int process_request(void)
 		case PACKET_READ_NORMAL:
 			/* collect request; a sequence of keys and values */
 			if (parse_command(reader.line, &command) ||
-			    is_valid_capability(reader.line))
+			    receive_client_capability(reader.line))
 				strvec_push(&keys, reader.line);
 			else
 				die("unknown capability '%s'", reader.line);
-- 
2.33.0.887.g5b1f44e68d


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 4/9] serve: provide "receive" function for object-format capability
  2021-09-14 15:29 [PATCH 0/9] reducing memory allocations for v2 servers Jeff King
                   ` (2 preceding siblings ...)
  2021-09-14 15:31 ` [PATCH 3/9] serve: add "receive" method for v2 capabilities table Jeff King
@ 2021-09-14 15:31 ` Jeff King
  2021-09-14 18:59   ` Martin Ågren
  2021-09-14 15:33 ` [PATCH 5/9] serve: provide "receive" function for session-id capability Jeff King
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 77+ messages in thread
From: Jeff King @ 2021-09-14 15:31 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason

We get any "object-format" specified by the client by searching for it
in the collected list of capabilities the client sent. We can instead
just handle it as soon as they send it. This is slightly more efficient,
and gets us one step closer to dropping that collected list.

Note that we do still have to do our final hash check after receiving
all capabilities (because they might not have sent an object-format line
at all, and we still have to check that the default matches our
repository algorithm). Since the check_algorithm() function would now be
done to a single if() statement, I've just inlined it in its only
caller.

There should be no change of behavior here, except for two
broken-protocol cases:

  - if the client sends multiple conflicting object-format capabilities
    (which they should not), we'll now choose the last one rather than
    the first. We could also detect and complain about the duplicates
    quite easily now, which we could not before, but I didn't do so
    here.

  - if the client sends a bogus "object-format" with no equals sign,
    we'll now say so, rather than "unknown object format: ''"

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/serve.c b/serve.c
index a161189984..f6ea2953eb 100644
--- a/serve.c
+++ b/serve.c
@@ -10,6 +10,7 @@
 #include "upload-pack.h"
 
 static int advertise_sid = -1;
+static int client_hash_algo = GIT_HASH_SHA1;
 
 static int always_advertise(struct repository *r,
 			    struct strbuf *value)
@@ -33,6 +34,17 @@ static int object_format_advertise(struct repository *r,
 	return 1;
 }
 
+static void object_format_receive(struct repository *r,
+				  const char *algo_name)
+{
+	if (!algo_name)
+		die("object-format capability requires an argument");
+
+	client_hash_algo = hash_algo_by_name(algo_name);
+	if (client_hash_algo == GIT_HASH_UNKNOWN)
+		die("unknown object format '%s'", algo_name);
+}
+
 static int session_id_advertise(struct repository *r, struct strbuf *value)
 {
 	if (advertise_sid == -1 &&
@@ -104,6 +116,7 @@ static struct protocol_capability capabilities[] = {
 	{
 		.name = "object-format",
 		.advertise = object_format_advertise,
+		.receive = object_format_receive,
 	},
 	{
 		.name = "session-id",
@@ -228,22 +241,6 @@ static int has_capability(const struct strvec *keys, const char *capability,
 	return 0;
 }
 
-static void check_algorithm(struct repository *r, struct strvec *keys)
-{
-	int client = GIT_HASH_SHA1, server = hash_algo_by_ptr(r->hash_algo);
-	const char *algo_name;
-
-	if (has_capability(keys, "object-format", &algo_name)) {
-		client = hash_algo_by_name(algo_name);
-		if (client == GIT_HASH_UNKNOWN)
-			die("unknown object format '%s'", algo_name);
-	}
-
-	if (client != server)
-		die("mismatched object format: server %s; client %s\n",
-		    r->hash_algo->name, hash_algos[client].name);
-}
-
 enum request_state {
 	PROCESS_REQUEST_KEYS,
 	PROCESS_REQUEST_DONE,
@@ -317,7 +314,10 @@ static int process_request(void)
 	if (!command)
 		die("no command requested");
 
-	check_algorithm(the_repository, &keys);
+	if (client_hash_algo != hash_algo_by_ptr(the_repository->hash_algo))
+		die("mismatched object format: server %s; client %s\n",
+		    the_repository->hash_algo->name,
+		    hash_algos[client_hash_algo].name);
 
 	if (has_capability(&keys, "session-id", &client_sid))
 		trace2_data_string("transfer", NULL, "client-sid", client_sid);
-- 
2.33.0.887.g5b1f44e68d


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 5/9] serve: provide "receive" function for session-id capability
  2021-09-14 15:29 [PATCH 0/9] reducing memory allocations for v2 servers Jeff King
                   ` (3 preceding siblings ...)
  2021-09-14 15:31 ` [PATCH 4/9] serve: provide "receive" function for object-format capability Jeff King
@ 2021-09-14 15:33 ` Jeff King
  2021-09-14 16:55   ` Taylor Blau
  2021-09-14 19:02   ` Martin Ågren
  2021-09-14 15:33 ` [PATCH 6/9] serve: drop "keys" strvec Jeff King
                   ` (7 subsequent siblings)
  12 siblings, 2 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 15:33 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason

Rather than pulling the session-id string from the list of collected
capabilities, we can handle it as soon as we receive it. This gets us
closer to dropping the collected list entirely.

The behavior should be the same, with one exception. Previously if the
client sent us multiple session-id lines, we'd report only the first.
Now we'll pass each one along to trace2. This shouldn't matter in
practice, since clients shouldn't do that (and if they do, it's probably
sensible to log them all).

As this removes the last caller of the static has_capability(), we can
remove it, as well (and in fact we must to avoid -Wunused-function
complaining).

Signed-off-by: Jeff King <peff@peff.net>
---
I had originally dropped has_capability() in a separate patch, to keep
this one more readable. That breaks bisectability, but only with
-Werror. I'm not sure where we should fall on that spectrum (I generally
bisect with -Wno-error just because warnings may come and go when
working with different compilers than what was normal at the time).

Not that big a deal either way for this patch, but I wonder if people
have opinions in general.

 serve.c | 33 +++++++++------------------------
 1 file changed, 9 insertions(+), 24 deletions(-)

diff --git a/serve.c b/serve.c
index f6ea2953eb..6bbf54cbbe 100644
--- a/serve.c
+++ b/serve.c
@@ -57,6 +57,14 @@ static int session_id_advertise(struct repository *r, struct strbuf *value)
 	return 1;
 }
 
+static void session_id_receive(struct repository *r,
+			       const char *client_sid)
+{
+	if (!client_sid)
+		client_sid = "";
+	trace2_data_string("transfer", NULL, "client-sid", client_sid);
+}
+
 struct protocol_capability {
 	/*
 	 * The name of the capability.  The server uses this name when
@@ -121,6 +129,7 @@ static struct protocol_capability capabilities[] = {
 	{
 		.name = "session-id",
 		.advertise = session_id_advertise,
+		.receive = session_id_receive,
 	},
 	{
 		.name = "object-info",
@@ -221,26 +230,6 @@ static int parse_command(const char *key, struct protocol_capability **command)
 	return 0;
 }
 
-static int has_capability(const struct strvec *keys, const char *capability,
-			  const char **value)
-{
-	int i;
-	for (i = 0; i < keys->nr; i++) {
-		const char *out;
-		if (skip_prefix(keys->v[i], capability, &out) &&
-		    (!*out || *out == '=')) {
-			if (value) {
-				if (*out == '=')
-					out++;
-				*value = out;
-			}
-			return 1;
-		}
-	}
-
-	return 0;
-}
-
 enum request_state {
 	PROCESS_REQUEST_KEYS,
 	PROCESS_REQUEST_DONE,
@@ -252,7 +241,6 @@ static int process_request(void)
 	struct packet_reader reader;
 	struct strvec keys = STRVEC_INIT;
 	struct protocol_capability *command = NULL;
-	const char *client_sid;
 
 	packet_reader_init(&reader, 0, NULL, 0,
 			   PACKET_READ_CHOMP_NEWLINE |
@@ -319,9 +307,6 @@ static int process_request(void)
 		    the_repository->hash_algo->name,
 		    hash_algos[client_hash_algo].name);
 
-	if (has_capability(&keys, "session-id", &client_sid))
-		trace2_data_string("transfer", NULL, "client-sid", client_sid);
-
 	command->command(the_repository, &reader);
 
 	strvec_clear(&keys);
-- 
2.33.0.887.g5b1f44e68d


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 6/9] serve: drop "keys" strvec
  2021-09-14 15:29 [PATCH 0/9] reducing memory allocations for v2 servers Jeff King
                   ` (4 preceding siblings ...)
  2021-09-14 15:33 ` [PATCH 5/9] serve: provide "receive" function for session-id capability Jeff King
@ 2021-09-14 15:33 ` Jeff King
  2021-09-14 16:59   ` Taylor Blau
  2021-09-14 15:37 ` [PATCH 7/9] ls-refs: ignore very long ref-prefix counts Jeff King
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 77+ messages in thread
From: Jeff King @ 2021-09-14 15:33 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason

We collect the set of capabilities the client sends us in a strvec.
While this is usually small, there's no limit to the number of
capabilities the client can send us (e.g., they could just send us
"agent" pkt-lines over and over, and we'd keep adding them to the list).

Since all code has been converted away from using this list, let's get
rid of it. This avoids a potential attack where clients waste our
memory.

Note that we do have to replace it with a flag, because some of the
flush-packet logic checks whether we've seen any valid commands or keys.

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/serve.c b/serve.c
index 6bbf54cbbe..baa0a17502 100644
--- a/serve.c
+++ b/serve.c
@@ -239,7 +239,7 @@ static int process_request(void)
 {
 	enum request_state state = PROCESS_REQUEST_KEYS;
 	struct packet_reader reader;
-	struct strvec keys = STRVEC_INIT;
+	int seen_capability_or_command = 0;
 	struct protocol_capability *command = NULL;
 
 	packet_reader_init(&reader, 0, NULL, 0,
@@ -263,10 +263,11 @@ static int process_request(void)
 			/* collect request; a sequence of keys and values */
 			if (parse_command(reader.line, &command) ||
 			    receive_client_capability(reader.line))
-				strvec_push(&keys, reader.line);
+				seen_capability_or_command = 1;
 			else
 				die("unknown capability '%s'", reader.line);
 
+
 			/* Consume the peeked line */
 			packet_reader_read(&reader);
 			break;
@@ -275,7 +276,7 @@ static int process_request(void)
 			 * If no command and no keys were given then the client
 			 * wanted to terminate the connection.
 			 */
-			if (!keys.nr)
+			if (!seen_capability_or_command)
 				return 1;
 
 			/*
@@ -309,7 +310,6 @@ static int process_request(void)
 
 	command->command(the_repository, &reader);
 
-	strvec_clear(&keys);
 	return 0;
 }
 
-- 
2.33.0.887.g5b1f44e68d


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 7/9] ls-refs: ignore very long ref-prefix counts
  2021-09-14 15:29 [PATCH 0/9] reducing memory allocations for v2 servers Jeff King
                   ` (5 preceding siblings ...)
  2021-09-14 15:33 ` [PATCH 6/9] serve: drop "keys" strvec Jeff King
@ 2021-09-14 15:37 ` Jeff King
  2021-09-14 17:18   ` Taylor Blau
                     ` (3 more replies)
  2021-09-14 15:37 ` [PATCH 8/9] serve: reject bogus v2 "command=ls-refs=foo" Jeff King
                   ` (5 subsequent siblings)
  12 siblings, 4 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 15:37 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason

Because each "ref-prefix" capability from the client comes in its own
pkt-line, there's no limit to the number of them that a misbehaving
client may send. We read them all into a strvec, which means the client
can waste arbitrary amounts of our memory by just sending us "ref-prefix
foo" over and over.

One possible solution is to just drop the connection when the limit is
reached. If we set it high enough, then only misbehaving or malicious
clients would hit it. But "high enough" is vague, and it's unfriendly if
we guess wrong and a legitimate client hits this.

But we can do better. Since supporting the ref-prefix capability is
optional anyway, the client has to further cull the response based on
their own patterns. So we can simply ignore the patterns once we cross a
certain threshold. Note that we have to ignore _all_ patterns, not just
the ones past our limit (since otherwise we'd send too little data).

The limit here is fairly arbitrary, and probably much higher than anyone
would need in practice. It might be worth limiting it further, if only
because we check it linearly (so with "m" local refs and "n" patterns,
we do "m * n" string comparisons). But if we care about optimizing this,
an even better solution may be a more advanced data structure anyway.

I didn't bother making the limit configurable, since it's so high and
since Git should behave correctly in either case. It wouldn't be too
hard to do, but it makes both the code and documentation more complex.

Signed-off-by: Jeff King <peff@peff.net>
---
We're perhaps bending "optional" a little here. The client does know if
we said "yes, we support ref-prefix" and until now, that meant they
could trust us to cull. But no version of Git has ever relied on that
(we tell the transport code "if you can limit by these prefixes, go for
it" but then just post-process the result).

The other option is that we could just say "no, you're sending too many
prefixes" and hangup. This seemed friendlier to me (though either way, I
really find it quite unlikely anybody would legitimately hit this
limit).

 ls-refs.c            | 19 +++++++++++++++++--
 t/t5701-git-serve.sh | 31 +++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/ls-refs.c b/ls-refs.c
index a1a0250607..839fb0caa9 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -40,6 +40,12 @@ static void ensure_config_read(void)
 	config_read = 1;
 }
 
+/*
+ * The maximum number of "ref-prefix" lines we'll allow the client to send.
+ * If they go beyond this, we'll avoid using the prefix feature entirely.
+ */
+#define MAX_ALLOWED_PREFIXES 65536
+
 /*
  * Check if one of the prefixes is a prefix of the ref.
  * If no prefixes were provided, all refs match.
@@ -141,6 +147,7 @@ static int ls_refs_config(const char *var, const char *value, void *data)
 int ls_refs(struct repository *r, struct packet_reader *request)
 {
 	struct ls_refs_data data;
+	int too_many_prefixes = 0;
 
 	memset(&data, 0, sizeof(data));
 	strvec_init(&data.prefixes);
@@ -156,8 +163,16 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 			data.peel = 1;
 		else if (!strcmp("symrefs", arg))
 			data.symrefs = 1;
-		else if (skip_prefix(arg, "ref-prefix ", &out))
-			strvec_push(&data.prefixes, out);
+		else if (skip_prefix(arg, "ref-prefix ", &out)) {
+			if (too_many_prefixes) {
+				/* ignore any further ones */
+			} else if (data.prefixes.nr >= MAX_ALLOWED_PREFIXES) {
+				strvec_clear(&data.prefixes);
+				too_many_prefixes = 1;
+			} else {
+				strvec_push(&data.prefixes, out);
+			}
+		}
 		else if (!strcmp("unborn", arg))
 			data.unborn = allow_unborn;
 	}
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index 930721f053..b095bfa0ac 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -158,6 +158,37 @@ test_expect_success 'refs/heads prefix' '
 	test_cmp expect actual
 '
 
+test_expect_success 'ignore very large set of prefixes' '
+	# generate a large number of ref-prefixes that we expect
+	# to match nothing; the value here exceeds MAX_ALLOWED_PREFIXES
+	# from ls-refs.c.
+	{
+		echo command=ls-refs &&
+		echo object-format=$(test_oid algo)
+		echo 0001 &&
+		perl -le "print \"refs/heads/$_\" for (1..65536+1)" &&
+		echo 0000
+	} |
+	test-tool pkt-line pack >in &&
+
+	# and then confirm that we see unmatched prefixes anyway (i.e.,
+	# that the prefix was not applied).
+	cat >expect <<-EOF &&
+	$(git rev-parse HEAD) HEAD
+	$(git rev-parse refs/heads/dev) refs/heads/dev
+	$(git rev-parse refs/heads/main) refs/heads/main
+	$(git rev-parse refs/heads/release) refs/heads/release
+	$(git rev-parse refs/tags/annotated-tag) refs/tags/annotated-tag
+	$(git rev-parse refs/tags/one) refs/tags/one
+	$(git rev-parse refs/tags/two) refs/tags/two
+	0000
+	EOF
+
+	test-tool serve-v2 --stateless-rpc <in >out &&
+	test-tool pkt-line unpack <out >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'peel parameter' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=ls-refs
-- 
2.33.0.887.g5b1f44e68d


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 8/9] serve: reject bogus v2 "command=ls-refs=foo"
  2021-09-14 15:29 [PATCH 0/9] reducing memory allocations for v2 servers Jeff King
                   ` (6 preceding siblings ...)
  2021-09-14 15:37 ` [PATCH 7/9] ls-refs: ignore very long ref-prefix counts Jeff King
@ 2021-09-14 15:37 ` Jeff King
  2021-09-14 17:21   ` Taylor Blau
  2021-09-14 15:37 ` [PATCH 9/9] serve: reject commands used as capabilities Jeff King
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 77+ messages in thread
From: Jeff King @ 2021-09-14 15:37 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason

When we see a line from the client like "command=ls-refs", we parse
everything after the equals sign as a capability, which we check against
our capabilities table. If we don't recognize the command (e.g.,
"command=foo"), we'll reject it. But we use the same parser that checks
for regular capabilities like "object-format=sha256". And so we'll
accept "ls-refs=foo", even though everything after the equals is bogus,
and simply ignored.

This isn't really hurting anything, but the request does violate the
spec. Let's tighten it up to prevent any surprising behavior.

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c              |  2 +-
 t/t5701-git-serve.sh | 10 ++++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/serve.c b/serve.c
index baa0a17502..123abbaa83 100644
--- a/serve.c
+++ b/serve.c
@@ -220,7 +220,7 @@ static int parse_command(const char *key, struct protocol_capability **command)
 		if (*command)
 			die("command '%s' requested after already requesting command '%s'",
 			    out, (*command)->name);
-		if (!cmd || !cmd->advertise(the_repository, NULL) || !cmd->command)
+		if (!cmd || !cmd->advertise(the_repository, NULL) || !cmd->command || value)
 			die("invalid command '%s'", out);
 
 		*command = cmd;
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index b095bfa0ac..ebb41657ab 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -72,6 +72,16 @@ test_expect_success 'request invalid command' '
 	test_i18ngrep "invalid command" err
 '
 
+test_expect_success 'requested command is command=value' '
+	test-tool pkt-line pack >in <<-\EOF &&
+	command=ls-refs=whatever
+	object-format=$(test_oid algo)
+	0000
+	EOF
+	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
+	grep invalid.command.*ls-refs=whatever err
+'
+
 test_expect_success 'wrong object-format' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=fetch
-- 
2.33.0.887.g5b1f44e68d


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 9/9] serve: reject commands used as capabilities
  2021-09-14 15:29 [PATCH 0/9] reducing memory allocations for v2 servers Jeff King
                   ` (7 preceding siblings ...)
  2021-09-14 15:37 ` [PATCH 8/9] serve: reject bogus v2 "command=ls-refs=foo" Jeff King
@ 2021-09-14 15:37 ` Jeff King
  2021-09-14 17:30 ` [PATCH 0/9] reducing memory allocations for v2 servers Taylor Blau
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 15:37 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason

Our table of v2 "capabilities" contains everything we might tell the
client we support. But there are differences in how we expect the client
to respond. Some of the entries are true capabilities (i.e., we expect
the client to say "yes, I support this"), and some are ones we expect
them to send as commands (with "command=ls-refs" or similar).

When we receive a capability used as a command, we complain about that.
But when we receive a command used as a capability (e.g., just "ls-refs"
in a pkt-line by itself), we silently ignore it.

This isn't really hurting anything (clients shouldn't send it, and we'll
ignore it), but we can tighten up the protocol to match what we expect
to happen.

There are two new tests here. The first one checks a capability used as
a command, which already passes. The second tests a command as a
capability, which this patch fixes.

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c              |  2 +-
 t/t5701-git-serve.sh | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/serve.c b/serve.c
index 123abbaa83..9149fbb2f2 100644
--- a/serve.c
+++ b/serve.c
@@ -201,7 +201,7 @@ static int receive_client_capability(const char *key)
 	const char *value;
 	const struct protocol_capability *c = get_capability(key, &value);
 
-	if (!c || !c->advertise(the_repository, NULL))
+	if (!c || c->command || !c->advertise(the_repository, NULL))
 		return 0;
 
 	if (c->receive)
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index ebb41657ab..209122b747 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -72,6 +72,25 @@ test_expect_success 'request invalid command' '
 	test_i18ngrep "invalid command" err
 '
 
+test_expect_success 'request capability as command' '
+	test-tool pkt-line pack >in <<-\EOF &&
+	command=agent
+	0000
+	EOF
+	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
+	grep invalid.command.*agent err
+'
+
+test_expect_success 'request command as capability' '
+	test-tool pkt-line pack >in <<-\EOF &&
+	command=ls-refs
+	fetch
+	0000
+	EOF
+	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
+	grep unknown.capability err
+'
+
 test_expect_success 'requested command is command=value' '
 	test-tool pkt-line pack >in <<-\EOF &&
 	command=ls-refs=whatever
-- 
2.33.0.887.g5b1f44e68d

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH 5/9] serve: provide "receive" function for session-id capability
  2021-09-14 15:33 ` [PATCH 5/9] serve: provide "receive" function for session-id capability Jeff King
@ 2021-09-14 16:55   ` Taylor Blau
  2021-09-14 17:06     ` Jeff King
  2021-09-14 19:02   ` Martin Ågren
  1 sibling, 1 reply; 77+ messages in thread
From: Taylor Blau @ 2021-09-14 16:55 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Ævar Arnfjörð Bjarmason

On Tue, Sep 14, 2021 at 11:33:09AM -0400, Jeff King wrote:
> I had originally dropped has_capability() in a separate patch, to keep
> this one more readable. That breaks bisectability, but only with
> -Werror. I'm not sure where we should fall on that spectrum (I generally
> bisect with -Wno-error just because warnings may come and go when
> working with different compilers than what was normal at the time).

I tend to fall the same way, especially when bisecting things in ancient
(to me) versions of Git where my current compiler complains. So I think
the approach that you took here is just fine.

(This patch, as with all leading up to it, looks good to me.)

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 6/9] serve: drop "keys" strvec
  2021-09-14 15:33 ` [PATCH 6/9] serve: drop "keys" strvec Jeff King
@ 2021-09-14 16:59   ` Taylor Blau
  2021-09-14 17:16     ` Jeff King
  0 siblings, 1 reply; 77+ messages in thread
From: Taylor Blau @ 2021-09-14 16:59 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Ævar Arnfjörð Bjarmason

On Tue, Sep 14, 2021 at 11:33:18AM -0400, Jeff King wrote:
> We collect the set of capabilities the client sends us in a strvec.
> While this is usually small, there's no limit to the number of
> capabilities the client can send us (e.g., they could just send us
> "agent" pkt-lines over and over, and we'd keep adding them to the list).
>
> Since all code has been converted away from using this list, let's get
> rid of it. This avoids a potential attack where clients waste our
> memory.

...because now we only bother to tell capabilities about information the
client sent as it happened, instead of accumulating an unbounded set of
strings together into a single strvec.

> Note that we do have to replace it with a flag, because some of the
> flush-packet logic checks whether we've seen any valid commands or keys.

Makes sense.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  serve.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/serve.c b/serve.c
> index 6bbf54cbbe..baa0a17502 100644
> --- a/serve.c
> +++ b/serve.c
> @@ -239,7 +239,7 @@ static int process_request(void)
>  {
>  	enum request_state state = PROCESS_REQUEST_KEYS;
>  	struct packet_reader reader;
> -	struct strvec keys = STRVEC_INIT;
> +	int seen_capability_or_command = 0;
>  	struct protocol_capability *command = NULL;
>
>  	packet_reader_init(&reader, 0, NULL, 0,
> @@ -263,10 +263,11 @@ static int process_request(void)
>  			/* collect request; a sequence of keys and values */
>  			if (parse_command(reader.line, &command) ||
>  			    receive_client_capability(reader.line))
> -				strvec_push(&keys, reader.line);
> +				seen_capability_or_command = 1;
>  			else
>  				die("unknown capability '%s'", reader.line);
>
> +

Nit; unnecessary whitespace change (but obviously not worth a re-roll on
its own).

>  			/* Consume the peeked line */
>  			packet_reader_read(&reader);
>  			break;
> @@ -275,7 +276,7 @@ static int process_request(void)
>  			 * If no command and no keys were given then the client
>  			 * wanted to terminate the connection.
>  			 */
> -			if (!keys.nr)
> +			if (!seen_capability_or_command)
>  				return 1;
>
>  			/*
> @@ -309,7 +310,6 @@ static int process_request(void)
>
>  	command->command(the_repository, &reader);
>
> -	strvec_clear(&keys);
>  	return 0;
>  }
>
> --
> 2.33.0.887.g5b1f44e68d

The rest of this change looks obviously good to me.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 5/9] serve: provide "receive" function for session-id capability
  2021-09-14 16:55   ` Taylor Blau
@ 2021-09-14 17:06     ` Jeff King
  2021-09-14 17:12       ` Taylor Blau
  0 siblings, 1 reply; 77+ messages in thread
From: Jeff King @ 2021-09-14 17:06 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Ævar Arnfjörð Bjarmason

On Tue, Sep 14, 2021 at 12:55:01PM -0400, Taylor Blau wrote:

> On Tue, Sep 14, 2021 at 11:33:09AM -0400, Jeff King wrote:
> > I had originally dropped has_capability() in a separate patch, to keep
> > this one more readable. That breaks bisectability, but only with
> > -Werror. I'm not sure where we should fall on that spectrum (I generally
> > bisect with -Wno-error just because warnings may come and go when
> > working with different compilers than what was normal at the time).
> 
> I tend to fall the same way, especially when bisecting things in ancient
> (to me) versions of Git where my current compiler complains. So I think
> the approach that you took here is just fine.

To be clear, the approach here is conservative: it will bisect even with
-Werror. I think what you're saying is I _could_ have done the other
approach, and put the removal into its own commit?

-Peff

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 5/9] serve: provide "receive" function for session-id capability
  2021-09-14 17:06     ` Jeff King
@ 2021-09-14 17:12       ` Taylor Blau
  0 siblings, 0 replies; 77+ messages in thread
From: Taylor Blau @ 2021-09-14 17:12 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, Ævar Arnfjörð Bjarmason

On Tue, Sep 14, 2021 at 01:06:42PM -0400, Jeff King wrote:
> On Tue, Sep 14, 2021 at 12:55:01PM -0400, Taylor Blau wrote:
>
> > On Tue, Sep 14, 2021 at 11:33:09AM -0400, Jeff King wrote:
> > > I had originally dropped has_capability() in a separate patch, to keep
> > > this one more readable. That breaks bisectability, but only with
> > > -Werror. I'm not sure where we should fall on that spectrum (I generally
> > > bisect with -Wno-error just because warnings may come and go when
> > > working with different compilers than what was normal at the time).
> >
> > I tend to fall the same way, especially when bisecting things in ancient
> > (to me) versions of Git where my current compiler complains. So I think
> > the approach that you took here is just fine.
>
> To be clear, the approach here is conservative: it will bisect even with
> -Werror. I think what you're saying is I _could_ have done the other
> approach, and put the removal into its own commit?

Yes. I would have been fine with dropping has_capability() in its own
patch, since the result would have been more readable (and since I never
bisect with `-Werror`). But this (the conservative approach of
persevering bisect-ability even with `-Werror` is fine with me, too).

> -Peff

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 6/9] serve: drop "keys" strvec
  2021-09-14 16:59   ` Taylor Blau
@ 2021-09-14 17:16     ` Jeff King
  0 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 17:16 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Ævar Arnfjörð Bjarmason

On Tue, Sep 14, 2021 at 12:59:18PM -0400, Taylor Blau wrote:

> On Tue, Sep 14, 2021 at 11:33:18AM -0400, Jeff King wrote:
> > We collect the set of capabilities the client sends us in a strvec.
> > While this is usually small, there's no limit to the number of
> > capabilities the client can send us (e.g., they could just send us
> > "agent" pkt-lines over and over, and we'd keep adding them to the list).
> >
> > Since all code has been converted away from using this list, let's get
> > rid of it. This avoids a potential attack where clients waste our
> > memory.
> 
> ...because now we only bother to tell capabilities about information the
> client sent as it happened, instead of accumulating an unbounded set of
> strings together into a single strvec.

Right. By the way, one thing I noticed while working on this (that is
unchanged by my series) is that some capabilities are ignored. That make
sense for "agent", where the primary goal is telling the client about
our version (and letting them speak their version, which might be
helpful for debug logs but doesn't impact the protocol itself). But we
do nothing at all with "server-option".

The client wouldn't send one by default, but we do support "git fetch
--server-option". So I assume that it would be a sort of "I'll tell you
this string, and you might feed it to hooks, etc" function, like
push-options. But we just throw it away (well, before my series we
stuffed it into a strvec that nobody looked at).

So possibly we could stop advertising it, but I wonder if any clients
would get unhappy.

I also wonder if jgit uses it. Grepping in jgit.git shows that it is
recorded but I don't see anybody doing anything useful with it; but
maybe some user of the library does.

> > @@ -263,10 +263,11 @@ static int process_request(void)
> >  			/* collect request; a sequence of keys and values */
> >  			if (parse_command(reader.line, &command) ||
> >  			    receive_client_capability(reader.line))
> > -				strvec_push(&keys, reader.line);
> > +				seen_capability_or_command = 1;
> >  			else
> >  				die("unknown capability '%s'", reader.line);
> >
> > +
> 
> Nit; unnecessary whitespace change (but obviously not worth a re-roll on
> its own).

Thanks. While this section ended up quite simple, it got rebased a whole
lot of times as I figured out all of quirks that led to the final
protocol-tightening patches. :)

I don't know how I missed the extra line on my final read-through.

-Peff

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 7/9] ls-refs: ignore very long ref-prefix counts
  2021-09-14 15:37 ` [PATCH 7/9] ls-refs: ignore very long ref-prefix counts Jeff King
@ 2021-09-14 17:18   ` Taylor Blau
  2021-09-14 17:27     ` Jeff King
  2021-09-14 17:23   ` Jeff King
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 77+ messages in thread
From: Taylor Blau @ 2021-09-14 17:18 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Ævar Arnfjörð Bjarmason

On Tue, Sep 14, 2021 at 11:37:06AM -0400, Jeff King wrote:
> Because each "ref-prefix" capability from the client comes in its own
> pkt-line, there's no limit to the number of them that a misbehaving
> client may send. We read them all into a strvec, which means the client
> can waste arbitrary amounts of our memory by just sending us "ref-prefix
> foo" over and over.
>
> One possible solution is to just drop the connection when the limit is
> reached. If we set it high enough, then only misbehaving or malicious
> clients would hit it. But "high enough" is vague, and it's unfriendly if
> we guess wrong and a legitimate client hits this.
>
> But we can do better. Since supporting the ref-prefix capability is
> optional anyway, the client has to further cull the response based on
> their own patterns. So we can simply ignore the patterns once we cross a
> certain threshold. Note that we have to ignore _all_ patterns, not just
> the ones past our limit (since otherwise we'd send too little data).

Right, because each ref-prefix line *adds* references to the advertised
set instead of culling them out. So as soon as we start ignoring even a
single ref-prefix line, we have to ignore all of them. Makes sense.

> The limit here is fairly arbitrary, and probably much higher than anyone
> would need in practice. It might be worth limiting it further, if only
> because we check it linearly (so with "m" local refs and "n" patterns,
> we do "m * n" string comparisons). But if we care about optimizing this,
> an even better solution may be a more advanced data structure anyway.
>
> I didn't bother making the limit configurable, since it's so high and
> since Git should behave correctly in either case. It wouldn't be too
> hard to do, but it makes both the code and documentation more complex.

Agreed. I don't think it's worth making it configurable because the
limit is so absurdly high that probably nobody will ever want to tweak
it.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
> We're perhaps bending "optional" a little here. The client does know if
> we said "yes, we support ref-prefix" and until now, that meant they
> could trust us to cull. But no version of Git has ever relied on that
> (we tell the transport code "if you can limit by these prefixes, go for
> it" but then just post-process the result).
>
> The other option is that we could just say "no, you're sending too many
> prefixes" and hangup. This seemed friendlier to me (though either way, I
> really find it quite unlikely anybody would legitimately hit this
> limit).

FWIW, either (dropping the connection or the approach you took here)
would have been fine with me, but I find it unlikely that any real users
will notice ;).

>  ls-refs.c            | 19 +++++++++++++++++--
>  t/t5701-git-serve.sh | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 48 insertions(+), 2 deletions(-)
>
> diff --git a/ls-refs.c b/ls-refs.c
> index a1a0250607..839fb0caa9 100644
> --- a/ls-refs.c
> +++ b/ls-refs.c
> @@ -40,6 +40,12 @@ static void ensure_config_read(void)
>  	config_read = 1;
>  }
>
> +/*
> + * The maximum number of "ref-prefix" lines we'll allow the client to send.
> + * If they go beyond this, we'll avoid using the prefix feature entirely.
> + */
> +#define MAX_ALLOWED_PREFIXES 65536
> +
>  /*
>   * Check if one of the prefixes is a prefix of the ref.
>   * If no prefixes were provided, all refs match.
> @@ -141,6 +147,7 @@ static int ls_refs_config(const char *var, const char *value, void *data)
>  int ls_refs(struct repository *r, struct packet_reader *request)
>  {
>  	struct ls_refs_data data;
> +	int too_many_prefixes = 0;
>
>  	memset(&data, 0, sizeof(data));
>  	strvec_init(&data.prefixes);
> @@ -156,8 +163,16 @@ int ls_refs(struct repository *r, struct packet_reader *request)
>  			data.peel = 1;
>  		else if (!strcmp("symrefs", arg))
>  			data.symrefs = 1;
> -		else if (skip_prefix(arg, "ref-prefix ", &out))
> -			strvec_push(&data.prefixes, out);
> +		else if (skip_prefix(arg, "ref-prefix ", &out)) {
> +			if (too_many_prefixes) {
> +				/* ignore any further ones */
> +			} else if (data.prefixes.nr >= MAX_ALLOWED_PREFIXES) {
> +				strvec_clear(&data.prefixes);
> +				too_many_prefixes = 1;
> +			} else {
> +				strvec_push(&data.prefixes, out);
> +			}
> +		}

The order of this if-statement is a little odd to me, but obviously
correct. I might have wrote:

    if (too_many_prefixes)
      continue;

    if (data.prefixes.nr < MAX_ALLOWED_PREFIXES) {
      strvec_push(&data.prefixes, out);
    } else {
      too_many_prefixes = 1;
      strvec_clear(&data.prefixes);
    }

But certainly what you wrote here works just fine (so this is a cosmetic
comment, and not a functional one).

>  		else if (!strcmp("unborn", arg))
>  			data.unborn = allow_unborn;
>  	}
> diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
> index 930721f053..b095bfa0ac 100755
> --- a/t/t5701-git-serve.sh
> +++ b/t/t5701-git-serve.sh
> @@ -158,6 +158,37 @@ test_expect_success 'refs/heads prefix' '
>  	test_cmp expect actual
>  '
>
> +test_expect_success 'ignore very large set of prefixes' '
> +	# generate a large number of ref-prefixes that we expect
> +	# to match nothing; the value here exceeds MAX_ALLOWED_PREFIXES
> +	# from ls-refs.c.
> +	{
> +		echo command=ls-refs &&
> +		echo object-format=$(test_oid algo)
> +		echo 0001 &&
> +		perl -le "print \"refs/heads/$_\" for (1..65536+1)" &&
> +		echo 0000
> +	} |
> +	test-tool pkt-line pack >in &&
> +
> +	# and then confirm that we see unmatched prefixes anyway (i.e.,
> +	# that the prefix was not applied).
> +	cat >expect <<-EOF &&
> +	$(git rev-parse HEAD) HEAD
> +	$(git rev-parse refs/heads/dev) refs/heads/dev
> +	$(git rev-parse refs/heads/main) refs/heads/main
> +	$(git rev-parse refs/heads/release) refs/heads/release
> +	$(git rev-parse refs/tags/annotated-tag) refs/tags/annotated-tag
> +	$(git rev-parse refs/tags/one) refs/tags/one
> +	$(git rev-parse refs/tags/two) refs/tags/two

You could have written this as a loop over the unmatched prefixes, but I
vastly prefer the result you came up with, which is much more explicit
and doesn't require readers to parse out what the loop does.

So this part looks very good to me.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 8/9] serve: reject bogus v2 "command=ls-refs=foo"
  2021-09-14 15:37 ` [PATCH 8/9] serve: reject bogus v2 "command=ls-refs=foo" Jeff King
@ 2021-09-14 17:21   ` Taylor Blau
  0 siblings, 0 replies; 77+ messages in thread
From: Taylor Blau @ 2021-09-14 17:21 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Ævar Arnfjörð Bjarmason

On Tue, Sep 14, 2021 at 11:37:21AM -0400, Jeff King wrote:
> This isn't really hurting anything, but the request does violate the
> spec. Let's tighten it up to prevent any surprising behavior.

Yeah. I seriously doubt that anybody is relying on this obviously broken
behavior, and it's definitely a spec violation. So I have no trouble
rejecting 'command=ls-refs=foo' over the wire.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 7/9] ls-refs: ignore very long ref-prefix counts
  2021-09-14 15:37 ` [PATCH 7/9] ls-refs: ignore very long ref-prefix counts Jeff King
  2021-09-14 17:18   ` Taylor Blau
@ 2021-09-14 17:23   ` Jeff King
  2021-09-14 19:06   ` Martin Ågren
  2021-09-14 22:09   ` Jeff King
  3 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 17:23 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason

On Tue, Sep 14, 2021 at 11:37:06AM -0400, Jeff King wrote:

> The limit here is fairly arbitrary, and probably much higher than anyone
> would need in practice. It might be worth limiting it further, if only
> because we check it linearly (so with "m" local refs and "n" patterns,
> we do "m * n" string comparisons). But if we care about optimizing this,
> an even better solution may be a more advanced data structure anyway.

The limit I picked is 65536, because it seemed round and high. But note
that somebody can put up to almost-64k in a single ref-prefix line,
which means ultimately you can allocate 4GB. I do wonder if dropping
this to something like 1024 might be reasonable.

In practice I'd expect it to be a handful in most cases (refs/heads/*,
refs/tags/*, HEAD). But if you do something like:

  git fetch $remote 1 2 3 4 5 6 7 ...

then we'll prefix-expand those names with the usual lookup rules into
refs/1, refs/heads/1, refs/2, refs/heads/2, and so on.

At some point it becomes silly and works counter to the purpose of the
optimization (you send more prefix constraints than the actual ref
advertisement, not to mention that client bandwidth may not be
symmetric). I'm not sure what we want to declare as a reasonable limit.

And this is just about protecting the server; probably it makes sense
for the client to realize it's going to send a ridiculous number of
prefixes and just skip the feature entirely (since that's what actually
saves the bandwidth).

-Peff

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 7/9] ls-refs: ignore very long ref-prefix counts
  2021-09-14 17:18   ` Taylor Blau
@ 2021-09-14 17:27     ` Jeff King
  0 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 17:27 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Ævar Arnfjörð Bjarmason

On Tue, Sep 14, 2021 at 01:18:06PM -0400, Taylor Blau wrote:

> > @@ -156,8 +163,16 @@ int ls_refs(struct repository *r, struct packet_reader *request)
> >  			data.peel = 1;
> >  		else if (!strcmp("symrefs", arg))
> >  			data.symrefs = 1;
> > -		else if (skip_prefix(arg, "ref-prefix ", &out))
> > -			strvec_push(&data.prefixes, out);
> > +		else if (skip_prefix(arg, "ref-prefix ", &out)) {
> > +			if (too_many_prefixes) {
> > +				/* ignore any further ones */
> > +			} else if (data.prefixes.nr >= MAX_ALLOWED_PREFIXES) {
> > +				strvec_clear(&data.prefixes);
> > +				too_many_prefixes = 1;
> > +			} else {
> > +				strvec_push(&data.prefixes, out);
> > +			}
> > +		}
> 
> The order of this if-statement is a little odd to me, but obviously
> correct. I might have wrote:
> 
>     if (too_many_prefixes)
>       continue;
> 
>     if (data.prefixes.nr < MAX_ALLOWED_PREFIXES) {
>       strvec_push(&data.prefixes, out);
>     } else {
>       too_many_prefixes = 1;
>       strvec_clear(&data.prefixes);
>     }
> 
> But certainly what you wrote here works just fine (so this is a cosmetic
> comment, and not a functional one).

My view of it was: check every case that may avoid us pushing a prefix,
and then finally push one. But that may have been related to my goal in
writing the patch. :)

> > +test_expect_success 'ignore very large set of prefixes' '
> > +	# generate a large number of ref-prefixes that we expect
> > +	# to match nothing; the value here exceeds MAX_ALLOWED_PREFIXES
> > +	# from ls-refs.c.
> > +	{
> > +		echo command=ls-refs &&
> > +		echo object-format=$(test_oid algo)
> > +		echo 0001 &&
> > +		perl -le "print \"refs/heads/$_\" for (1..65536+1)" &&
> > +		echo 0000
> > +	} |
> > +	test-tool pkt-line pack >in &&
> > +
> > +	# and then confirm that we see unmatched prefixes anyway (i.e.,
> > +	# that the prefix was not applied).
> > +	cat >expect <<-EOF &&
> > +	$(git rev-parse HEAD) HEAD
> > +	$(git rev-parse refs/heads/dev) refs/heads/dev
> > +	$(git rev-parse refs/heads/main) refs/heads/main
> > +	$(git rev-parse refs/heads/release) refs/heads/release
> > +	$(git rev-parse refs/tags/annotated-tag) refs/tags/annotated-tag
> > +	$(git rev-parse refs/tags/one) refs/tags/one
> > +	$(git rev-parse refs/tags/two) refs/tags/two
> 
> You could have written this as a loop over the unmatched prefixes, but I
> vastly prefer the result you came up with, which is much more explicit
> and doesn't require readers to parse out what the loop does.

I actually think that:

  git for-each-ref --format='%(objectname) %(refname)' >expect

is pretty readable (and is much more efficient, and nicely avoids the
master/main brittle-ness, which I ran into while backporting this). But:

  - this matches what the rest of the script does

  - for-each-ref doesn't report on HEAD, so we have to add that in
    separately

  - the "pkt-line unpack" will include the flush packet, so we'd have to
    add that in, too.

-Peff

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 0/9] reducing memory allocations for v2 servers
  2021-09-14 15:29 [PATCH 0/9] reducing memory allocations for v2 servers Jeff King
                   ` (8 preceding siblings ...)
  2021-09-14 15:37 ` [PATCH 9/9] serve: reject commands used as capabilities Jeff King
@ 2021-09-14 17:30 ` Taylor Blau
  2021-09-14 18:00 ` Junio C Hamano
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 77+ messages in thread
From: Taylor Blau @ 2021-09-14 17:30 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Ævar Arnfjörð Bjarmason

On Tue, Sep 14, 2021 at 11:29:46AM -0400, Jeff King wrote:
> While looking at [1], I noticed that v2 servers will read a few bits of
> client input into strvecs. Even though we expect these to be small-ish,
> there's nothing preventing a client from sending us a bunch of junk and
> wasting memory.
>
> This series changes that, putting a cap on how much data we'll receive.
> The two spots are the "capabilities" list we receive (before we even
> dispatch to a particular command like ls-refs or fetch), and the
> ref-prefix list we receive for ls-refs.
>
> [...]

Thanks. I reviewed this series carefully and all of my comments were
either of the form "you could have written it this way, but I'm equally
happy with what you wrote here" or "this behavior change won't affect
real users and so I'm OK with it".

    Reviewed-by: Taylor Blau <me@ttaylorr.com>

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 0/9] reducing memory allocations for v2 servers
  2021-09-14 15:29 [PATCH 0/9] reducing memory allocations for v2 servers Jeff King
                   ` (9 preceding siblings ...)
  2021-09-14 17:30 ` [PATCH 0/9] reducing memory allocations for v2 servers Taylor Blau
@ 2021-09-14 18:00 ` Junio C Hamano
  2021-09-14 18:38   ` Jeff King
  2021-09-14 23:51 ` [PATCH v2 0/11] limit " Jeff King
  2021-09-15  0:25 ` [PATCH 0/9] reducing memory allocations for v2 servers Ævar Arnfjörð Bjarmason
  12 siblings, 1 reply; 77+ messages in thread
From: Junio C Hamano @ 2021-09-14 18:00 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Ævar Arnfjörð Bjarmason

Jeff King <peff@peff.net> writes:

> While looking at [1], I noticed that v2 servers will read a few bits of
> client input into strvecs. Even though we expect these to be small-ish,
> there's nothing preventing a client from sending us a bunch of junk and
> wasting memory.

The title of the topic says "reducing", but it smells more to me
like this is about "limiting" allocations to protect ourselves.  Am
I reading the series correctly?

> This series changes that, putting a cap on how much data we'll receive.
> The two spots are the "capabilities" list we receive (before we even
> dispatch to a particular command like ls-refs or fetch), and the
> ref-prefix list we receive for ls-refs.
>
> To deal with the capabilities issue, we'll just handle each capability
> line as it comes in, rather than storing a list. This requires a bit of
> cleanup, which is why it takes up the first 6 patches. It also needs a
> few other cleanups from ab/serve-cleanup (and so this series is based on
> that). The dependencies there are both textual (e.g., using designated
> initializers in the capabilities table) and functional (dropping the
> "keys" parameter from v2 command() functions).
>
> Patch 7 fixes the ls-refs issue by degrading when we see too many
> prefixes (which works because the capability is optional in the first
> place).
>
> The final two patches aren't strictly necessary. They're a tightening of
> the protocol against some bogus input that I noticed while doing the
> earlier cleanups. That bogus input isn't really hurting anything, but I
> think it makes sense to reject nonsense from the client rather than
> ignoring it. The obvious risk is that some client for some reason would
> send us nonsense, but I don't think Git ever has.

Your cover letters (as proposed log messages for individual commits)
are always to the point and very pleasant to read.  Very much
appreciated.

Thanks.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 0/9] reducing memory allocations for v2 servers
  2021-09-14 18:00 ` Junio C Hamano
@ 2021-09-14 18:38   ` Jeff King
  0 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 18:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ævar Arnfjörð Bjarmason

On Tue, Sep 14, 2021 at 11:00:56AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > While looking at [1], I noticed that v2 servers will read a few bits of
> > client input into strvecs. Even though we expect these to be small-ish,
> > there's nothing preventing a client from sending us a bunch of junk and
> > wasting memory.
> 
> The title of the topic says "reducing", but it smells more to me
> like this is about "limiting" allocations to protect ourselves.  Am
> I reading the series correctly?

Yeah, I think "limiting" is probably a better word. We will still
allocate for ref-prefix, of course, but we'll limit the number of
allocations. We did "reduce" the number of spots that allocate, since
capabilities no longer do so. :)

But I think your understanding of the series is correct.

-Peff

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 4/9] serve: provide "receive" function for object-format capability
  2021-09-14 15:31 ` [PATCH 4/9] serve: provide "receive" function for object-format capability Jeff King
@ 2021-09-14 18:59   ` Martin Ågren
  0 siblings, 0 replies; 77+ messages in thread
From: Martin Ågren @ 2021-09-14 18:59 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, Ævar Arnfjörð Bjarmason

On Tue, 14 Sept 2021 at 17:33, Jeff King <peff@peff.net> wrote:
> repository algorithm). Since the check_algorithm() function would now be
> done to a single if() statement, I've just inlined it in its only
> caller.

s/done/down/, I believe.

> There should be no change of behavior here, except for two
> broken-protocol cases:

FWIW, I agree with this.


Martin

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 5/9] serve: provide "receive" function for session-id capability
  2021-09-14 15:33 ` [PATCH 5/9] serve: provide "receive" function for session-id capability Jeff King
  2021-09-14 16:55   ` Taylor Blau
@ 2021-09-14 19:02   ` Martin Ågren
  2021-09-14 19:14     ` Jeff King
  1 sibling, 1 reply; 77+ messages in thread
From: Martin Ågren @ 2021-09-14 19:02 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, Ævar Arnfjörð Bjarmason

On Tue, 14 Sept 2021 at 17:34, Jeff King <peff@peff.net> wrote:
>
> Rather than pulling the session-id string from the list of collected
> capabilities, we can handle it as soon as we receive it. This gets us
> closer to dropping the collected list entirely.

Looking good.

> As this removes the last caller of the static has_capability(), we can
> remove it, as well (and in fact we must to avoid -Wunused-function
> complaining).

> I had originally dropped has_capability() in a separate patch, to keep
> this one more readable. That breaks bisectability, but only with
> -Werror. I'm not sure where we should fall on that spectrum (I generally
> bisect with -Wno-error just because warnings may come and go when
> working with different compilers than what was normal at the time).
>
> Not that big a deal either way for this patch, but I wonder if people
> have opinions in general.

First of all, agreed about the "not that big a deal" part. Just a random
thought: You could do the opposite of what Elijah sometimes does by
first adding a "MAYBE_UNUSED" function, then actually using it. You'd
add "MAYBE_UNUSED" here, then the next commit would drop the whole
thing. It could be worth it if you're removing many many lines so that
the "actual" change gets lost in the noise. But this patch isn't near
any such threshold, IMHO (if there even is such a "threshold").

> +static void session_id_receive(struct repository *r,
> +                              const char *client_sid)
> +{
> +       if (!client_sid)
> +               client_sid = "";
> +       trace2_data_string("transfer", NULL, "client-sid", client_sid);
> +}

Handling NULL. Nice. :)


Martin

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 7/9] ls-refs: ignore very long ref-prefix counts
  2021-09-14 15:37 ` [PATCH 7/9] ls-refs: ignore very long ref-prefix counts Jeff King
  2021-09-14 17:18   ` Taylor Blau
  2021-09-14 17:23   ` Jeff King
@ 2021-09-14 19:06   ` Martin Ågren
  2021-09-14 19:22     ` Jeff King
  2021-09-14 22:09   ` Jeff King
  3 siblings, 1 reply; 77+ messages in thread
From: Martin Ågren @ 2021-09-14 19:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, Ævar Arnfjörð Bjarmason

On Tue, 14 Sept 2021 at 17:38, Jeff King <peff@peff.net> wrote:
>
> One possible solution is to just drop the connection when the limit is
> reached. If we set it high enough, then only misbehaving or malicious
> clients would hit it. But "high enough" is vague, and it's unfriendly if
> we guess wrong and a legitimate client hits this.
>
> But we can do better. Since supporting the ref-prefix capability is
> optional anyway, the client has to further cull the response based on
> their own patterns. So we can simply ignore the patterns once we cross a
> certain threshold. Note that we have to ignore _all_ patterns, not just
> the ones past our limit (since otherwise we'd send too little data).

This all makes sense to me. At some point, we should be able to go "I
don't know what you're trying to do, but let me just ignore all this
craziness and instead try to give you a useful result sooner rather than
later".

I do wonder if we should document that the client can't trust us to
actually do all this culling. In general, I find that it's a matter of
hygiene for the client to do its own checks, but with this change they
actually *need* to do them. (Unless they know our limit and that they're
on the right side of it, but that kind of magic is even less hygienic.)

> +               else if (skip_prefix(arg, "ref-prefix ", &out)) {
> +                       if (too_many_prefixes) {
> +                               /* ignore any further ones */
> +                       } else if (data.prefixes.nr >= MAX_ALLOWED_PREFIXES) {
> +                               strvec_clear(&data.prefixes);
> +                               too_many_prefixes = 1;
> +                       } else {
> +                               strvec_push(&data.prefixes, out);
> +                       }
> +               }

Is it easier to reason about with something like this
(whitespace-damaged) on top?

diff --git a/ls-refs.c b/ls-refs.c
index 839fb0caa9..b3101ff361 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -147,7 +147,6 @@ static int ls_refs_config(const char *var, const
char *value, void *data)
 int ls_refs(struct repository *r, struct packet_reader *request)
 {
        struct ls_refs_data data;
-       int too_many_prefixes = 0;

        memset(&data, 0, sizeof(data));
        strvec_init(&data.prefixes);
@@ -164,14 +163,8 @@ int ls_refs(struct repository *r, struct
packet_reader *request)
                else if (!strcmp("symrefs", arg))
                        data.symrefs = 1;
                else if (skip_prefix(arg, "ref-prefix ", &out)) {
-                       if (too_many_prefixes) {
-                               /* ignore any further ones */
-                       } else if (data.prefixes.nr >= MAX_ALLOWED_PREFIXES) {
-                               strvec_clear(&data.prefixes);
-                               too_many_prefixes = 1;
-                       } else {
+                       if (data.prefixes.nr <= MAX_ALLOWED_PREFIXES)
                                strvec_push(&data.prefixes, out);
-                       }
                }
                else if (!strcmp("unborn", arg))
                        data.unborn = allow_unborn;
@@ -180,6 +173,9 @@ int ls_refs(struct repository *r, struct
packet_reader *request)
        if (request->status != PACKET_READ_FLUSH)
                die(_("expected flush after ls-refs arguments"));

+       if (data.prefixes.nr > MAX_ALLOWED_PREFIXES)
+               strvec_clear(&data.prefixes);
+
        send_possibly_unborn_head(&data);
        if (!data.prefixes.nr)
                strvec_push(&data.prefixes, "");

Maybe even name the macro TOO_MANY_PREFIXES (and bump it by one)
to make the logic instead be

        if (data.prefixes.nr < TOO_MANY_PREFIXES)
                strvec_push(&data.prefixes, out);
 ...
        if (data.prefixes.nr >= TOO_MANY_PREFIXES)
                strvec_clear(&data.prefixes);

Just a thought. I'm reaching to try to find a way to improve this
series. ;-) It was a nice read.


Martin

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH 5/9] serve: provide "receive" function for session-id capability
  2021-09-14 19:02   ` Martin Ågren
@ 2021-09-14 19:14     ` Jeff King
  0 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 19:14 UTC (permalink / raw)
  To: Martin Ågren
  Cc: Git Mailing List, Ævar Arnfjörð Bjarmason

On Tue, Sep 14, 2021 at 09:02:10PM +0200, Martin Ågren wrote:

> > I had originally dropped has_capability() in a separate patch, to keep
> > this one more readable. That breaks bisectability, but only with
> > -Werror. I'm not sure where we should fall on that spectrum (I generally
> > bisect with -Wno-error just because warnings may come and go when
> > working with different compilers than what was normal at the time).
> >
> > Not that big a deal either way for this patch, but I wonder if people
> > have opinions in general.
> 
> First of all, agreed about the "not that big a deal" part. Just a random
> thought: You could do the opposite of what Elijah sometimes does by
> first adding a "MAYBE_UNUSED" function, then actually using it. You'd
> add "MAYBE_UNUSED" here, then the next commit would drop the whole
> thing. It could be worth it if you're removing many many lines so that
> the "actual" change gets lost in the noise. But this patch isn't near
> any such threshold, IMHO (if there even is such a "threshold").

Yeah, I considered that (because I had seen Elijah do it; I didn't think
of it myself). I don't love it, if only because now the extra
MAYBE_UNUSED is a head-scratcher for somebody reading the patch. I think
it makes sense if the code will exist in that maybe-unused state for a
while, but here it's just going away immediately anyway. I dunno.

> > +static void session_id_receive(struct repository *r,
> > +                              const char *client_sid)
> > +{
> > +       if (!client_sid)
> > +               client_sid = "";
> > +       trace2_data_string("transfer", NULL, "client-sid", client_sid);
> > +}
> 
> Handling NULL. Nice. :)

Otherwise segfault if the client just says "session-id". :)

To be clear, the old code behaved the same way. It's just that
has_capability() returned the empty string for this case instead of
NULL. I changed get_capability() to distinguish the two so that the
later fixes for "command=ls-refs=whatever" could treat them differently.

I didn't add tests for this case (nor for "object-format" without a
value), but we could do that if anybody cares.

-Peff

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 7/9] ls-refs: ignore very long ref-prefix counts
  2021-09-14 19:06   ` Martin Ågren
@ 2021-09-14 19:22     ` Jeff King
  0 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 19:22 UTC (permalink / raw)
  To: Martin Ågren
  Cc: Git Mailing List, Ævar Arnfjörð Bjarmason

On Tue, Sep 14, 2021 at 09:06:55PM +0200, Martin Ågren wrote:

> > But we can do better. Since supporting the ref-prefix capability is
> > optional anyway, the client has to further cull the response based on
> > their own patterns. So we can simply ignore the patterns once we cross a
> > certain threshold. Note that we have to ignore _all_ patterns, not just
> > the ones past our limit (since otherwise we'd send too little data).
> 
> This all makes sense to me. At some point, we should be able to go "I
> don't know what you're trying to do, but let me just ignore all this
> craziness and instead try to give you a useful result sooner rather than
> later".
> 
> I do wonder if we should document that the client can't trust us to
> actually do all this culling. In general, I find that it's a matter of
> hygiene for the client to do its own checks, but with this change they
> actually *need* to do them. (Unless they know our limit and that they're
> on the right side of it, but that kind of magic is even less hygienic.)

Perhaps we could say so more explicitly in the v2 protocol spec. I'll
take a look.

> > +               else if (skip_prefix(arg, "ref-prefix ", &out)) {
> > +                       if (too_many_prefixes) {
> > +                               /* ignore any further ones */
> > +                       } else if (data.prefixes.nr >= MAX_ALLOWED_PREFIXES) {
> > +                               strvec_clear(&data.prefixes);
> > +                               too_many_prefixes = 1;
> > +                       } else {
> > +                               strvec_push(&data.prefixes, out);
> > +                       }
> > +               }
> 
> Is it easier to reason about with something like this
> (whitespace-damaged) on top?

You're the second person to complain about this if-else chain. I'll take
the hint. ;)

> diff --git a/ls-refs.c b/ls-refs.c
> index 839fb0caa9..b3101ff361 100644
> --- a/ls-refs.c
> +++ b/ls-refs.c
> @@ -147,7 +147,6 @@ static int ls_refs_config(const char *var, const
> char *value, void *data)
>  int ls_refs(struct repository *r, struct packet_reader *request)
>  {
>         struct ls_refs_data data;
> -       int too_many_prefixes = 0;
> 
>         memset(&data, 0, sizeof(data));
>         strvec_init(&data.prefixes);
> @@ -164,14 +163,8 @@ int ls_refs(struct repository *r, struct
> packet_reader *request)
>                 else if (!strcmp("symrefs", arg))
>                         data.symrefs = 1;
>                 else if (skip_prefix(arg, "ref-prefix ", &out)) {
> -                       if (too_many_prefixes) {
> -                               /* ignore any further ones */
> -                       } else if (data.prefixes.nr >= MAX_ALLOWED_PREFIXES) {
> -                               strvec_clear(&data.prefixes);
> -                               too_many_prefixes = 1;
> -                       } else {
> +                       if (data.prefixes.nr <= MAX_ALLOWED_PREFIXES)
>                                 strvec_push(&data.prefixes, out);
> -                       }
>                 }

Hmm. At first I liked this, because it reduces the number of cases (and
variables!). But there's something really subtle going on here. I
thought at first it should be "<", but you are intentionally
over-allocating by one entry to indicate the overflow. I.e., you've
essentially stuffed the too_many_prefixes boolean into the count.

> @@ -180,6 +173,9 @@ int ls_refs(struct repository *r, struct
> packet_reader *request)
>         if (request->status != PACKET_READ_FLUSH)
>                 die(_("expected flush after ls-refs arguments"));
> 
> +       if (data.prefixes.nr > MAX_ALLOWED_PREFIXES)
> +               strvec_clear(&data.prefixes);
> +

This is far from the parser, but I think that's OK. I'd probably couple
it with a comment explaining why we need to clear rather than using what
we got.

> Maybe even name the macro TOO_MANY_PREFIXES (and bump it by one)
> to make the logic instead be
> 
>         if (data.prefixes.nr < TOO_MANY_PREFIXES)
>                 strvec_push(&data.prefixes, out);
>  ...
>         if (data.prefixes.nr >= TOO_MANY_PREFIXES)
>                 strvec_clear(&data.prefixes);

At first I thought this was just being cute, but it's an attempt to
compensate for the off-by-one subtlety in the early check. I'll give it
some thought. I kind of like it, but the fact that it took me a minute
or three to be sure the code is correct makes me worried it's being too
clever.

-Peff

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 7/9] ls-refs: ignore very long ref-prefix counts
  2021-09-14 15:37 ` [PATCH 7/9] ls-refs: ignore very long ref-prefix counts Jeff King
                     ` (2 preceding siblings ...)
  2021-09-14 19:06   ` Martin Ågren
@ 2021-09-14 22:09   ` Jeff King
  2021-09-14 22:11     ` Taylor Blau
  3 siblings, 1 reply; 77+ messages in thread
From: Jeff King @ 2021-09-14 22:09 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason

On Tue, Sep 14, 2021 at 11:37:06AM -0400, Jeff King wrote:

> +test_expect_success 'ignore very large set of prefixes' '
> +	# generate a large number of ref-prefixes that we expect
> +	# to match nothing; the value here exceeds MAX_ALLOWED_PREFIXES
> +	# from ls-refs.c.
> +	{
> +		echo command=ls-refs &&
> +		echo object-format=$(test_oid algo)
> +		echo 0001 &&
> +		perl -le "print \"refs/heads/$_\" for (1..65536+1)" &&
> +		echo 0000
> +	} |
> +	test-tool pkt-line pack >in &&

Yuck. While double-checking some refactoring, I realized this test does
not actually generate the correct input!

It omits the "ref-prefix" header. _And_ it accidentally expands $_ in
the shell rather than in perl.

The test does work once corrected. And in fact I had originally written
it correctly as a "test_seq | sed" pipeline, but generating 64k+ lines
in the shell seemed a bit much (we do avoid sub-processes, but the "set
-x" output is unwieldy).

I'll fix it in a re-roll.

-Peff

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 7/9] ls-refs: ignore very long ref-prefix counts
  2021-09-14 22:09   ` Jeff King
@ 2021-09-14 22:11     ` Taylor Blau
  2021-09-14 22:15       ` Jeff King
  0 siblings, 1 reply; 77+ messages in thread
From: Taylor Blau @ 2021-09-14 22:11 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Ævar Arnfjörð Bjarmason

On Tue, Sep 14, 2021 at 06:09:45PM -0400, Jeff King wrote:
> On Tue, Sep 14, 2021 at 11:37:06AM -0400, Jeff King wrote:
>
> > +test_expect_success 'ignore very large set of prefixes' '
> > +	# generate a large number of ref-prefixes that we expect
> > +	# to match nothing; the value here exceeds MAX_ALLOWED_PREFIXES
> > +	# from ls-refs.c.
> > +	{
> > +		echo command=ls-refs &&
> > +		echo object-format=$(test_oid algo)
> > +		echo 0001 &&
> > +		perl -le "print \"refs/heads/$_\" for (1..65536+1)" &&
> > +		echo 0000
> > +	} |
> > +	test-tool pkt-line pack >in &&
>
> Yuck. While double-checking some refactoring, I realized this test does
> not actually generate the correct input!
>
> It omits the "ref-prefix" header. _And_ it accidentally expands $_ in
> the shell rather than in perl.

Hah, nice find. You'd think that one of us would have caught it earlier
given that we both discussed it.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 7/9] ls-refs: ignore very long ref-prefix counts
  2021-09-14 22:11     ` Taylor Blau
@ 2021-09-14 22:15       ` Jeff King
  0 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 22:15 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Ævar Arnfjörð Bjarmason

On Tue, Sep 14, 2021 at 06:11:32PM -0400, Taylor Blau wrote:

> On Tue, Sep 14, 2021 at 06:09:45PM -0400, Jeff King wrote:
> > On Tue, Sep 14, 2021 at 11:37:06AM -0400, Jeff King wrote:
> >
> > > +test_expect_success 'ignore very large set of prefixes' '
> > > +	# generate a large number of ref-prefixes that we expect
> > > +	# to match nothing; the value here exceeds MAX_ALLOWED_PREFIXES
> > > +	# from ls-refs.c.
> > > +	{
> > > +		echo command=ls-refs &&
> > > +		echo object-format=$(test_oid algo)
> > > +		echo 0001 &&
> > > +		perl -le "print \"refs/heads/$_\" for (1..65536+1)" &&
> > > +		echo 0000
> > > +	} |
> > > +	test-tool pkt-line pack >in &&
> >
> > Yuck. While double-checking some refactoring, I realized this test does
> > not actually generate the correct input!
> >
> > It omits the "ref-prefix" header. _And_ it accidentally expands $_ in
> > the shell rather than in perl.
> 
> Hah, nice find. You'd think that one of us would have caught it earlier
> given that we both discussed it.

Really, I'd have thought that ls-refs would complain about a totally
bogus capability. I'll see if I can fix that, as well.

-Peff

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH v2 0/11] limit memory allocations for v2 servers
  2021-09-14 15:29 [PATCH 0/9] reducing memory allocations for v2 servers Jeff King
                   ` (10 preceding siblings ...)
  2021-09-14 18:00 ` Junio C Hamano
@ 2021-09-14 23:51 ` Jeff King
  2021-09-14 23:51   ` [PATCH v2 01/11] serve: rename is_command() to parse_command() Jeff King
                     ` (12 more replies)
  2021-09-15  0:25 ` [PATCH 0/9] reducing memory allocations for v2 servers Ævar Arnfjörð Bjarmason
  12 siblings, 13 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 23:51 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

Here's a re-roll of my series to limit the memory a v2 server is willing
to allocate on behalf of a client. See v1:

  https://lore.kernel.org/git/YUC%2F6n1hhUbMJiLw@coredump.intra.peff.net/

for an overview. The existing patches are mostly small fixups pointed
out by reviewers (thanks!), but I did take Martin's TOO_MANY_PREFIXES
suggestion in patch 7 (without the change to the name of the constant it
seemed too clever to me, but with it it seems just the right amount of
clever).

There are two new patches here:

 - patch 8 explicitly documents the v2 ref-prefix "best effort" behavior

 - patch 11 teaches ls-refs to reject bogus arguments, which violates
   the spec (the current behavior caused a broken test to go unnoticed)

Full range-diff is below the diffstat.

  [01/11]: serve: rename is_command() to parse_command()
  [02/11]: serve: return capability "value" from get_capability()
  [03/11]: serve: add "receive" method for v2 capabilities table
  [04/11]: serve: provide "receive" function for object-format capability
  [05/11]: serve: provide "receive" function for session-id capability
  [06/11]: serve: drop "keys" strvec
  [07/11]: ls-refs: ignore very long ref-prefix counts
  [08/11]: docs/protocol-v2: clarify some ls-refs ref-prefix details
  [09/11]: serve: reject bogus v2 "command=ls-refs=foo"
  [10/11]: serve: reject commands used as capabilities
  [11/11]: ls-refs: reject unknown arguments

 Documentation/technical/protocol-v2.txt |   6 +-
 ls-refs.c                               |  22 ++++-
 serve.c                                 | 119 +++++++++++++-----------
 t/t5701-git-serve.sh                    |  73 +++++++++++++++
 4 files changed, 162 insertions(+), 58 deletions(-)

 1:  eb8e7b21a1 =  1:  eb8e7b21a1 serve: rename is_command() to parse_command()
 2:  8cc66cae41 =  2:  8cc66cae41 serve: return capability "value" from get_capability()
 3:  3343f9bb0a =  3:  3343f9bb0a serve: add "receive" method for v2 capabilities table
 4:  c4cc80fe7a !  4:  0319b69881 serve: provide "receive" function for object-format capability
    @@ Commit message
         all capabilities (because they might not have sent an object-format line
         at all, and we still have to check that the default matches our
         repository algorithm). Since the check_algorithm() function would now be
    -    done to a single if() statement, I've just inlined it in its only
    +    down to a single if() statement, I've just inlined it in its only
         caller.
     
         There should be no change of behavior here, except for two
 5:  c8527ca5a7 =  5:  efe207c35c serve: provide "receive" function for session-id capability
 6:  250e4723ba !  6:  463aa7faa3 serve: drop "keys" strvec
    @@ serve.c: static int process_request(void)
      			else
      				die("unknown capability '%s'", reader.line);
      
    -+
    - 			/* Consume the peeked line */
    - 			packet_reader_read(&reader);
    - 			break;
     @@ serve.c: static int process_request(void)
      			 * If no command and no keys were given then the client
      			 * wanted to terminate the connection.
 7:  1218d62247 !  7:  da2043f42f ls-refs: ignore very long ref-prefix counts
    @@ ls-refs.c: static void ensure_config_read(void)
      }
      
     +/*
    -+ * The maximum number of "ref-prefix" lines we'll allow the client to send.
    -+ * If they go beyond this, we'll avoid using the prefix feature entirely.
    ++ * If we see this many or more "ref-prefix" lines from the client, we consider
    ++ * it "too many" and will avoid using the prefix feature entirely.
     + */
    -+#define MAX_ALLOWED_PREFIXES 65536
    ++#define TOO_MANY_PREFIXES 65536
     +
      /*
       * Check if one of the prefixes is a prefix of the ref.
       * If no prefixes were provided, all refs match.
    -@@ ls-refs.c: static int ls_refs_config(const char *var, const char *value, void *data)
    - int ls_refs(struct repository *r, struct packet_reader *request)
    - {
    - 	struct ls_refs_data data;
    -+	int too_many_prefixes = 0;
    - 
    - 	memset(&data, 0, sizeof(data));
    - 	strvec_init(&data.prefixes);
     @@ ls-refs.c: int ls_refs(struct repository *r, struct packet_reader *request)
      			data.peel = 1;
      		else if (!strcmp("symrefs", arg))
      			data.symrefs = 1;
     -		else if (skip_prefix(arg, "ref-prefix ", &out))
     -			strvec_push(&data.prefixes, out);
     +		else if (skip_prefix(arg, "ref-prefix ", &out)) {
    -+			if (too_many_prefixes) {
    -+				/* ignore any further ones */
    -+			} else if (data.prefixes.nr >= MAX_ALLOWED_PREFIXES) {
    -+				strvec_clear(&data.prefixes);
    -+				too_many_prefixes = 1;
    -+			} else {
    ++			if (data.prefixes.nr < TOO_MANY_PREFIXES)
     +				strvec_push(&data.prefixes, out);
    -+			}
     +		}
      		else if (!strcmp("unborn", arg))
      			data.unborn = allow_unborn;
      	}
    + 
    + 	if (request->status != PACKET_READ_FLUSH)
    + 		die(_("expected flush after ls-refs arguments"));
    + 
    ++	/*
    ++	 * If we saw too many prefixes, we must avoid using them at all; as
    ++	 * soon as we have any prefix, they are meant to form a comprehensive
    ++	 * list.
    ++	 */
    ++	if (data.prefixes.nr >= TOO_MANY_PREFIXES)
    ++		strvec_clear(&data.prefixes);
    ++
    + 	send_possibly_unborn_head(&data);
    + 	if (!data.prefixes.nr)
    + 		strvec_push(&data.prefixes, "");
     
      ## t/t5701-git-serve.sh ##
     @@ t/t5701-git-serve.sh: test_expect_success 'refs/heads prefix' '
    @@ t/t5701-git-serve.sh: test_expect_success 'refs/heads prefix' '
      
     +test_expect_success 'ignore very large set of prefixes' '
     +	# generate a large number of ref-prefixes that we expect
    -+	# to match nothing; the value here exceeds MAX_ALLOWED_PREFIXES
    ++	# to match nothing; the value here exceeds TOO_MANY_PREFIXES
     +	# from ls-refs.c.
     +	{
     +		echo command=ls-refs &&
     +		echo object-format=$(test_oid algo)
     +		echo 0001 &&
    -+		perl -le "print \"refs/heads/$_\" for (1..65536+1)" &&
    ++		perl -le "print \"ref-prefix refs/heads/\$_\" for (1..65536)" &&
     +		echo 0000
     +	} |
     +	test-tool pkt-line pack >in &&
 -:  ---------- >  8:  ee540a4ef7 docs/protocol-v2: clarify some ls-refs ref-prefix details
 8:  b1567fdc82 =  9:  481c07cfac serve: reject bogus v2 "command=ls-refs=foo"
 9:  9786b9a11f = 10:  dff965c1d2 serve: reject commands used as capabilities
 -:  ---------- > 11:  f7339f924b ls-refs: reject unknown arguments

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH v2 01/11] serve: rename is_command() to parse_command()
  2021-09-14 23:51 ` [PATCH v2 0/11] limit " Jeff King
@ 2021-09-14 23:51   ` Jeff King
  2021-09-14 23:51   ` [PATCH v2 02/11] serve: return capability "value" from get_capability() Jeff King
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 23:51 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

The is_command() function not only tells us whether the pktline is a
valid command string, but it also parses out the command (and complains
if we see a duplicate). Let's rename it to make those extra functions
a bit more obvious.

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/serve.c b/serve.c
index 1817edc7f5..fd88b95343 100644
--- a/serve.c
+++ b/serve.c
@@ -163,7 +163,7 @@ static int is_valid_capability(const char *key)
 	return c && c->advertise(the_repository, NULL);
 }
 
-static int is_command(const char *key, struct protocol_capability **command)
+static int parse_command(const char *key, struct protocol_capability **command)
 {
 	const char *out;
 
@@ -251,7 +251,7 @@ static int process_request(void)
 			BUG("Should have already died when seeing EOF");
 		case PACKET_READ_NORMAL:
 			/* collect request; a sequence of keys and values */
-			if (is_command(reader.line, &command) ||
+			if (parse_command(reader.line, &command) ||
 			    is_valid_capability(reader.line))
 				strvec_push(&keys, reader.line);
 			else
-- 
2.33.0.917.gae6ecbedc7


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v2 02/11] serve: return capability "value" from get_capability()
  2021-09-14 23:51 ` [PATCH v2 0/11] limit " Jeff King
  2021-09-14 23:51   ` [PATCH v2 01/11] serve: rename is_command() to parse_command() Jeff King
@ 2021-09-14 23:51   ` Jeff King
  2021-09-14 23:51   ` [PATCH v2 03/11] serve: add "receive" method for v2 capabilities table Jeff King
                     ` (10 subsequent siblings)
  12 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 23:51 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

When the client sends v2 capabilities, they may be simple, like:

  foo

or have a value like:

  foo=bar

(all of the current capabilities actually expect a value, but the
protocol allows for boolean ones).

We use get_capability() to make sure the client's pktline matches a
capability. In doing so, we parse enough to see the "=" and the value
(if any), but we immediately forget it. Nobody cares for now, because they end
up parsing the values out later using has_capability(). But in
preparation for changing that, let's pass back a pointer so the callers
know what we found.

Note that unlike has_capability(), we'll return NULL for a "simple"
capability. Distinguishing these will be useful for some future patches.

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/serve.c b/serve.c
index fd88b95343..78a4e83554 100644
--- a/serve.c
+++ b/serve.c
@@ -139,7 +139,7 @@ void protocol_v2_advertise_capabilities(void)
 	strbuf_release(&value);
 }
 
-static struct protocol_capability *get_capability(const char *key)
+static struct protocol_capability *get_capability(const char *key, const char **value)
 {
 	int i;
 
@@ -149,16 +149,25 @@ static struct protocol_capability *get_capability(const char *key)
 	for (i = 0; i < ARRAY_SIZE(capabilities); i++) {
 		struct protocol_capability *c = &capabilities[i];
 		const char *out;
-		if (skip_prefix(key, c->name, &out) && (!*out || *out == '='))
+		if (!skip_prefix(key, c->name, &out))
+			continue;
+		if (!*out) {
+			*value = NULL;
 			return c;
+		}
+		if (*out++ == '=') {
+			*value = out;
+			return c;
+		}
 	}
 
 	return NULL;
 }
 
 static int is_valid_capability(const char *key)
 {
-	const struct protocol_capability *c = get_capability(key);
+	const char *value;
+	const struct protocol_capability *c = get_capability(key, &value);
 
 	return c && c->advertise(the_repository, NULL);
 }
@@ -168,7 +177,8 @@ static int parse_command(const char *key, struct protocol_capability **command)
 	const char *out;
 
 	if (skip_prefix(key, "command=", &out)) {
-		struct protocol_capability *cmd = get_capability(out);
+		const char *value;
+		struct protocol_capability *cmd = get_capability(out, &value);
 
 		if (*command)
 			die("command '%s' requested after already requesting command '%s'",
-- 
2.33.0.917.gae6ecbedc7


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v2 03/11] serve: add "receive" method for v2 capabilities table
  2021-09-14 23:51 ` [PATCH v2 0/11] limit " Jeff King
  2021-09-14 23:51   ` [PATCH v2 01/11] serve: rename is_command() to parse_command() Jeff King
  2021-09-14 23:51   ` [PATCH v2 02/11] serve: return capability "value" from get_capability() Jeff King
@ 2021-09-14 23:51   ` Jeff King
  2021-09-15  0:31     ` Ævar Arnfjörð Bjarmason
  2021-09-15 16:41     ` Junio C Hamano
  2021-09-14 23:51   ` [PATCH v2 04/11] serve: provide "receive" function for object-format capability Jeff King
                     ` (9 subsequent siblings)
  12 siblings, 2 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 23:51 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

We have a capabilities table that tells us what we should tell the
client we are capable of, and what to do when a client gives us a
particular command (e.g., "command=ls-refs"). But it doesn't tell us
what to do when the client sends us back a capability (e.g.,
"object-format=sha256"). We just collect them all in a strvec and hope
somebody can use them later.

Instead, let's provide a function pointer in the table to act on these.
This will eventually help us avoid collecting the strings, which will be
more efficient and less prone to mischief.

Using the new method is optional, which helps in two ways:

  - we can move existing capabilities over to this new system gradually
    in individual commits

  - some capabilities we don't actually do anything with anyway. For
    example, the client is free to say "agent=git/1.2.3" to us, but we
    do not act on the information in any way.

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/serve.c b/serve.c
index 78a4e83554..a161189984 100644
--- a/serve.c
+++ b/serve.c
@@ -70,6 +70,16 @@ struct protocol_capability {
 	 * This field should be NULL for capabilities which are not commands.
 	 */
 	int (*command)(struct repository *r, struct packet_reader *request);
+
+	/*
+	 * Function called when a client requests the capability as a
+	 * non-command. This may be NULL if the capability does nothing.
+	 *
+	 * For a capability of the form "foo=bar", the value string points to
+	 * the content after the "=" (i.e., "bar"). For simple capabilities
+	 * (just "foo"), it is NULL.
+	 */
+	void (*receive)(struct repository *r, const char *value);
 };
 
 static struct protocol_capability capabilities[] = {
@@ -164,12 +174,17 @@ static struct protocol_capability *get_capability(const char *key, const char **
 	return NULL;
 }
 
-static int is_valid_capability(const char *key)
+static int receive_client_capability(const char *key)
 {
 	const char *value;
 	const struct protocol_capability *c = get_capability(key, &value);
 
-	return c && c->advertise(the_repository, NULL);
+	if (!c || !c->advertise(the_repository, NULL))
+		return 0;
+
+	if (c->receive)
+		c->receive(the_repository, value);
+	return 1;
 }
 
 static int parse_command(const char *key, struct protocol_capability **command)
@@ -262,7 +277,7 @@ static int process_request(void)
 		case PACKET_READ_NORMAL:
 			/* collect request; a sequence of keys and values */
 			if (parse_command(reader.line, &command) ||
-			    is_valid_capability(reader.line))
+			    receive_client_capability(reader.line))
 				strvec_push(&keys, reader.line);
 			else
 				die("unknown capability '%s'", reader.line);
-- 
2.33.0.917.gae6ecbedc7


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v2 04/11] serve: provide "receive" function for object-format capability
  2021-09-14 23:51 ` [PATCH v2 0/11] limit " Jeff King
                     ` (2 preceding siblings ...)
  2021-09-14 23:51   ` [PATCH v2 03/11] serve: add "receive" method for v2 capabilities table Jeff King
@ 2021-09-14 23:51   ` Jeff King
  2021-09-15 16:54     ` Junio C Hamano
  2021-09-14 23:51   ` [PATCH v2 05/11] serve: provide "receive" function for session-id capability Jeff King
                     ` (8 subsequent siblings)
  12 siblings, 1 reply; 77+ messages in thread
From: Jeff King @ 2021-09-14 23:51 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

We get any "object-format" specified by the client by searching for it
in the collected list of capabilities the client sent. We can instead
just handle it as soon as they send it. This is slightly more efficient,
and gets us one step closer to dropping that collected list.

Note that we do still have to do our final hash check after receiving
all capabilities (because they might not have sent an object-format line
at all, and we still have to check that the default matches our
repository algorithm). Since the check_algorithm() function would now be
down to a single if() statement, I've just inlined it in its only
caller.

There should be no change of behavior here, except for two
broken-protocol cases:

  - if the client sends multiple conflicting object-format capabilities
    (which they should not), we'll now choose the last one rather than
    the first. We could also detect and complain about the duplicates
    quite easily now, which we could not before, but I didn't do so
    here.

  - if the client sends a bogus "object-format" with no equals sign,
    we'll now say so, rather than "unknown object format: ''"

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/serve.c b/serve.c
index a161189984..f6ea2953eb 100644
--- a/serve.c
+++ b/serve.c
@@ -10,6 +10,7 @@
 #include "upload-pack.h"
 
 static int advertise_sid = -1;
+static int client_hash_algo = GIT_HASH_SHA1;
 
 static int always_advertise(struct repository *r,
 			    struct strbuf *value)
@@ -33,6 +34,17 @@ static int object_format_advertise(struct repository *r,
 	return 1;
 }
 
+static void object_format_receive(struct repository *r,
+				  const char *algo_name)
+{
+	if (!algo_name)
+		die("object-format capability requires an argument");
+
+	client_hash_algo = hash_algo_by_name(algo_name);
+	if (client_hash_algo == GIT_HASH_UNKNOWN)
+		die("unknown object format '%s'", algo_name);
+}
+
 static int session_id_advertise(struct repository *r, struct strbuf *value)
 {
 	if (advertise_sid == -1 &&
@@ -104,6 +116,7 @@ static struct protocol_capability capabilities[] = {
 	{
 		.name = "object-format",
 		.advertise = object_format_advertise,
+		.receive = object_format_receive,
 	},
 	{
 		.name = "session-id",
@@ -228,22 +241,6 @@ static int has_capability(const struct strvec *keys, const char *capability,
 	return 0;
 }
 
-static void check_algorithm(struct repository *r, struct strvec *keys)
-{
-	int client = GIT_HASH_SHA1, server = hash_algo_by_ptr(r->hash_algo);
-	const char *algo_name;
-
-	if (has_capability(keys, "object-format", &algo_name)) {
-		client = hash_algo_by_name(algo_name);
-		if (client == GIT_HASH_UNKNOWN)
-			die("unknown object format '%s'", algo_name);
-	}
-
-	if (client != server)
-		die("mismatched object format: server %s; client %s\n",
-		    r->hash_algo->name, hash_algos[client].name);
-}
-
 enum request_state {
 	PROCESS_REQUEST_KEYS,
 	PROCESS_REQUEST_DONE,
@@ -317,7 +314,10 @@ static int process_request(void)
 	if (!command)
 		die("no command requested");
 
-	check_algorithm(the_repository, &keys);
+	if (client_hash_algo != hash_algo_by_ptr(the_repository->hash_algo))
+		die("mismatched object format: server %s; client %s\n",
+		    the_repository->hash_algo->name,
+		    hash_algos[client_hash_algo].name);
 
 	if (has_capability(&keys, "session-id", &client_sid))
 		trace2_data_string("transfer", NULL, "client-sid", client_sid);
-- 
2.33.0.917.gae6ecbedc7


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v2 05/11] serve: provide "receive" function for session-id capability
  2021-09-14 23:51 ` [PATCH v2 0/11] limit " Jeff King
                     ` (3 preceding siblings ...)
  2021-09-14 23:51   ` [PATCH v2 04/11] serve: provide "receive" function for object-format capability Jeff King
@ 2021-09-14 23:51   ` Jeff King
  2021-09-15 16:56     ` Junio C Hamano
  2021-09-14 23:51   ` [PATCH v2 06/11] serve: drop "keys" strvec Jeff King
                     ` (7 subsequent siblings)
  12 siblings, 1 reply; 77+ messages in thread
From: Jeff King @ 2021-09-14 23:51 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

Rather than pulling the session-id string from the list of collected
capabilities, we can handle it as soon as we receive it. This gets us
closer to dropping the collected list entirely.

The behavior should be the same, with one exception. Previously if the
client sent us multiple session-id lines, we'd report only the first.
Now we'll pass each one along to trace2. This shouldn't matter in
practice, since clients shouldn't do that (and if they do, it's probably
sensible to log them all).

As this removes the last caller of the static has_capability(), we can
remove it, as well (and in fact we must to avoid -Wunused-function
complaining).

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c | 33 +++++++++------------------------
 1 file changed, 9 insertions(+), 24 deletions(-)

diff --git a/serve.c b/serve.c
index f6ea2953eb..6bbf54cbbe 100644
--- a/serve.c
+++ b/serve.c
@@ -57,6 +57,14 @@ static int session_id_advertise(struct repository *r, struct strbuf *value)
 	return 1;
 }
 
+static void session_id_receive(struct repository *r,
+			       const char *client_sid)
+{
+	if (!client_sid)
+		client_sid = "";
+	trace2_data_string("transfer", NULL, "client-sid", client_sid);
+}
+
 struct protocol_capability {
 	/*
 	 * The name of the capability.  The server uses this name when
@@ -121,6 +129,7 @@ static struct protocol_capability capabilities[] = {
 	{
 		.name = "session-id",
 		.advertise = session_id_advertise,
+		.receive = session_id_receive,
 	},
 	{
 		.name = "object-info",
@@ -221,26 +230,6 @@ static int parse_command(const char *key, struct protocol_capability **command)
 	return 0;
 }
 
-static int has_capability(const struct strvec *keys, const char *capability,
-			  const char **value)
-{
-	int i;
-	for (i = 0; i < keys->nr; i++) {
-		const char *out;
-		if (skip_prefix(keys->v[i], capability, &out) &&
-		    (!*out || *out == '=')) {
-			if (value) {
-				if (*out == '=')
-					out++;
-				*value = out;
-			}
-			return 1;
-		}
-	}
-
-	return 0;
-}
-
 enum request_state {
 	PROCESS_REQUEST_KEYS,
 	PROCESS_REQUEST_DONE,
@@ -252,7 +241,6 @@ static int process_request(void)
 	struct packet_reader reader;
 	struct strvec keys = STRVEC_INIT;
 	struct protocol_capability *command = NULL;
-	const char *client_sid;
 
 	packet_reader_init(&reader, 0, NULL, 0,
 			   PACKET_READ_CHOMP_NEWLINE |
@@ -319,9 +307,6 @@ static int process_request(void)
 		    the_repository->hash_algo->name,
 		    hash_algos[client_hash_algo].name);
 
-	if (has_capability(&keys, "session-id", &client_sid))
-		trace2_data_string("transfer", NULL, "client-sid", client_sid);
-
 	command->command(the_repository, &reader);
 
 	strvec_clear(&keys);
-- 
2.33.0.917.gae6ecbedc7


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v2 06/11] serve: drop "keys" strvec
  2021-09-14 23:51 ` [PATCH v2 0/11] limit " Jeff King
                     ` (4 preceding siblings ...)
  2021-09-14 23:51   ` [PATCH v2 05/11] serve: provide "receive" function for session-id capability Jeff King
@ 2021-09-14 23:51   ` Jeff King
  2021-09-15 17:01     ` Junio C Hamano
  2021-09-14 23:51   ` [PATCH v2 07/11] ls-refs: ignore very long ref-prefix counts Jeff King
                     ` (6 subsequent siblings)
  12 siblings, 1 reply; 77+ messages in thread
From: Jeff King @ 2021-09-14 23:51 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

We collect the set of capabilities the client sends us in a strvec.
While this is usually small, there's no limit to the number of
capabilities the client can send us (e.g., they could just send us
"agent" pkt-lines over and over, and we'd keep adding them to the list).

Since all code has been converted away from using this list, let's get
rid of it. This avoids a potential attack where clients waste our
memory.

Note that we do have to replace it with a flag, because some of the
flush-packet logic checks whether we've seen any valid commands or keys.

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/serve.c b/serve.c
index 6bbf54cbbe..5ea6c915cb 100644
--- a/serve.c
+++ b/serve.c
@@ -239,7 +239,7 @@ static int process_request(void)
 {
 	enum request_state state = PROCESS_REQUEST_KEYS;
 	struct packet_reader reader;
-	struct strvec keys = STRVEC_INIT;
+	int seen_capability_or_command = 0;
 	struct protocol_capability *command = NULL;
 
 	packet_reader_init(&reader, 0, NULL, 0,
@@ -263,7 +263,7 @@ static int process_request(void)
 			/* collect request; a sequence of keys and values */
 			if (parse_command(reader.line, &command) ||
 			    receive_client_capability(reader.line))
-				strvec_push(&keys, reader.line);
+				seen_capability_or_command = 1;
 			else
 				die("unknown capability '%s'", reader.line);
 
@@ -275,7 +275,7 @@ static int process_request(void)
 			 * If no command and no keys were given then the client
 			 * wanted to terminate the connection.
 			 */
-			if (!keys.nr)
+			if (!seen_capability_or_command)
 				return 1;
 
 			/*
@@ -309,7 +309,6 @@ static int process_request(void)
 
 	command->command(the_repository, &reader);
 
-	strvec_clear(&keys);
 	return 0;
 }
 
-- 
2.33.0.917.gae6ecbedc7


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v2 07/11] ls-refs: ignore very long ref-prefix counts
  2021-09-14 23:51 ` [PATCH v2 0/11] limit " Jeff King
                     ` (5 preceding siblings ...)
  2021-09-14 23:51   ` [PATCH v2 06/11] serve: drop "keys" strvec Jeff King
@ 2021-09-14 23:51   ` Jeff King
  2021-09-15  4:16     ` Taylor Blau
  2021-09-15  5:00     ` Eric Sunshine
  2021-09-14 23:52   ` [PATCH v2 08/11] docs/protocol-v2: clarify some ls-refs ref-prefix details Jeff King
                     ` (5 subsequent siblings)
  12 siblings, 2 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 23:51 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

Because each "ref-prefix" capability from the client comes in its own
pkt-line, there's no limit to the number of them that a misbehaving
client may send. We read them all into a strvec, which means the client
can waste arbitrary amounts of our memory by just sending us "ref-prefix
foo" over and over.

One possible solution is to just drop the connection when the limit is
reached. If we set it high enough, then only misbehaving or malicious
clients would hit it. But "high enough" is vague, and it's unfriendly if
we guess wrong and a legitimate client hits this.

But we can do better. Since supporting the ref-prefix capability is
optional anyway, the client has to further cull the response based on
their own patterns. So we can simply ignore the patterns once we cross a
certain threshold. Note that we have to ignore _all_ patterns, not just
the ones past our limit (since otherwise we'd send too little data).

The limit here is fairly arbitrary, and probably much higher than anyone
would need in practice. It might be worth limiting it further, if only
because we check it linearly (so with "m" local refs and "n" patterns,
we do "m * n" string comparisons). But if we care about optimizing this,
an even better solution may be a more advanced data structure anyway.

I didn't bother making the limit configurable, since it's so high and
since Git should behave correctly in either case. It wouldn't be too
hard to do, but it makes both the code and documentation more complex.

Signed-off-by: Jeff King <peff@peff.net>
---
 ls-refs.c            | 20 ++++++++++++++++++--
 t/t5701-git-serve.sh | 31 +++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/ls-refs.c b/ls-refs.c
index a1a0250607..18c4f41e87 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -40,6 +40,12 @@ static void ensure_config_read(void)
 	config_read = 1;
 }
 
+/*
+ * If we see this many or more "ref-prefix" lines from the client, we consider
+ * it "too many" and will avoid using the prefix feature entirely.
+ */
+#define TOO_MANY_PREFIXES 65536
+
 /*
  * Check if one of the prefixes is a prefix of the ref.
  * If no prefixes were provided, all refs match.
@@ -156,15 +162,25 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 			data.peel = 1;
 		else if (!strcmp("symrefs", arg))
 			data.symrefs = 1;
-		else if (skip_prefix(arg, "ref-prefix ", &out))
-			strvec_push(&data.prefixes, out);
+		else if (skip_prefix(arg, "ref-prefix ", &out)) {
+			if (data.prefixes.nr < TOO_MANY_PREFIXES)
+				strvec_push(&data.prefixes, out);
+		}
 		else if (!strcmp("unborn", arg))
 			data.unborn = allow_unborn;
 	}
 
 	if (request->status != PACKET_READ_FLUSH)
 		die(_("expected flush after ls-refs arguments"));
 
+	/*
+	 * If we saw too many prefixes, we must avoid using them at all; as
+	 * soon as we have any prefix, they are meant to form a comprehensive
+	 * list.
+	 */
+	if (data.prefixes.nr >= TOO_MANY_PREFIXES)
+		strvec_clear(&data.prefixes);
+
 	send_possibly_unborn_head(&data);
 	if (!data.prefixes.nr)
 		strvec_push(&data.prefixes, "");
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index 930721f053..3bc96ebcde 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -158,6 +158,37 @@ test_expect_success 'refs/heads prefix' '
 	test_cmp expect actual
 '
 
+test_expect_success 'ignore very large set of prefixes' '
+	# generate a large number of ref-prefixes that we expect
+	# to match nothing; the value here exceeds TOO_MANY_PREFIXES
+	# from ls-refs.c.
+	{
+		echo command=ls-refs &&
+		echo object-format=$(test_oid algo)
+		echo 0001 &&
+		perl -le "print \"ref-prefix refs/heads/\$_\" for (1..65536)" &&
+		echo 0000
+	} |
+	test-tool pkt-line pack >in &&
+
+	# and then confirm that we see unmatched prefixes anyway (i.e.,
+	# that the prefix was not applied).
+	cat >expect <<-EOF &&
+	$(git rev-parse HEAD) HEAD
+	$(git rev-parse refs/heads/dev) refs/heads/dev
+	$(git rev-parse refs/heads/main) refs/heads/main
+	$(git rev-parse refs/heads/release) refs/heads/release
+	$(git rev-parse refs/tags/annotated-tag) refs/tags/annotated-tag
+	$(git rev-parse refs/tags/one) refs/tags/one
+	$(git rev-parse refs/tags/two) refs/tags/two
+	0000
+	EOF
+
+	test-tool serve-v2 --stateless-rpc <in >out &&
+	test-tool pkt-line unpack <out >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'peel parameter' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=ls-refs
-- 
2.33.0.917.gae6ecbedc7


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v2 08/11] docs/protocol-v2: clarify some ls-refs ref-prefix details
  2021-09-14 23:51 ` [PATCH v2 0/11] limit " Jeff King
                     ` (6 preceding siblings ...)
  2021-09-14 23:51   ` [PATCH v2 07/11] ls-refs: ignore very long ref-prefix counts Jeff King
@ 2021-09-14 23:52   ` Jeff King
  2021-09-14 23:52   ` [PATCH v2 09/11] serve: reject bogus v2 "command=ls-refs=foo" Jeff King
                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 23:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

We've never documented the fact that a client can provide multiple
ref-prefix capabilities. Let's describe the behavior.

We also never discussed the "best effort" nature of the prefixes. The
client side of git.git has always treated them this way, filtering the
result with local patterns. And indeed any client must do this, because
the prefix patterns are not sufficient to express the usual refspecs
(and so for "foo" we ask for "refs/heads/foo", "refs/tags/foo", and so
on).

So this may be considered a change in the spec with respect to client
expectations / requirements, but it's mostly codifying existing
behavior.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/technical/protocol-v2.txt | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index 213538f1d0..9347b2ad13 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -193,7 +193,11 @@ ls-refs takes in the following arguments:
 	Show peeled tags.
     ref-prefix <prefix>
 	When specified, only references having a prefix matching one of
-	the provided prefixes are displayed.
+	the provided prefixes are displayed. Multiple instances may be
+	given, in which case references matching any prefix will be
+	shown. Note that this is purely for optimization; a server MAY
+	show refs not matching the prefix if it chooses, and clients
+	should filter the result themselves.
 
 If the 'unborn' feature is advertised the following argument can be
 included in the client's request.
-- 
2.33.0.917.gae6ecbedc7


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v2 09/11] serve: reject bogus v2 "command=ls-refs=foo"
  2021-09-14 23:51 ` [PATCH v2 0/11] limit " Jeff King
                     ` (7 preceding siblings ...)
  2021-09-14 23:52   ` [PATCH v2 08/11] docs/protocol-v2: clarify some ls-refs ref-prefix details Jeff King
@ 2021-09-14 23:52   ` Jeff King
  2021-09-15  0:27     ` Ævar Arnfjörð Bjarmason
                       ` (2 more replies)
  2021-09-14 23:52   ` [PATCH v2 10/11] serve: reject commands used as capabilities Jeff King
                     ` (3 subsequent siblings)
  12 siblings, 3 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 23:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

When we see a line from the client like "command=ls-refs", we parse
everything after the equals sign as a capability, which we check against
our capabilities table. If we don't recognize the command (e.g.,
"command=foo"), we'll reject it. But we use the same parser that checks
for regular capabilities like "object-format=sha256". And so we'll
accept "ls-refs=foo", even though everything after the equals is bogus,
and simply ignored.

This isn't really hurting anything, but the request does violate the
spec. Let's tighten it up to prevent any surprising behavior.

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c              |  2 +-
 t/t5701-git-serve.sh | 10 ++++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/serve.c b/serve.c
index 5ea6c915cb..63ee1be7ff 100644
--- a/serve.c
+++ b/serve.c
@@ -220,7 +220,7 @@ static int parse_command(const char *key, struct protocol_capability **command)
 		if (*command)
 			die("command '%s' requested after already requesting command '%s'",
 			    out, (*command)->name);
-		if (!cmd || !cmd->advertise(the_repository, NULL) || !cmd->command)
+		if (!cmd || !cmd->advertise(the_repository, NULL) || !cmd->command || value)
 			die("invalid command '%s'", out);
 
 		*command = cmd;
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index 3bc96ebcde..ab15078bc0 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -72,6 +72,16 @@ test_expect_success 'request invalid command' '
 	test_i18ngrep "invalid command" err
 '
 
+test_expect_success 'requested command is command=value' '
+	test-tool pkt-line pack >in <<-\EOF &&
+	command=ls-refs=whatever
+	object-format=$(test_oid algo)
+	0000
+	EOF
+	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
+	grep invalid.command.*ls-refs=whatever err
+'
+
 test_expect_success 'wrong object-format' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=fetch
-- 
2.33.0.917.gae6ecbedc7


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v2 10/11] serve: reject commands used as capabilities
  2021-09-14 23:51 ` [PATCH v2 0/11] limit " Jeff King
                     ` (8 preceding siblings ...)
  2021-09-14 23:52   ` [PATCH v2 09/11] serve: reject bogus v2 "command=ls-refs=foo" Jeff King
@ 2021-09-14 23:52   ` Jeff King
  2021-09-14 23:54   ` [PATCH v2 11/11] ls-refs: reject unknown arguments Jeff King
                     ` (2 subsequent siblings)
  12 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-14 23:52 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

Our table of v2 "capabilities" contains everything we might tell the
client we support. But there are differences in how we expect the client
to respond. Some of the entries are true capabilities (i.e., we expect
the client to say "yes, I support this"), and some are ones we expect
them to send as commands (with "command=ls-refs" or similar).

When we receive a capability used as a command, we complain about that.
But when we receive a command used as a capability (e.g., just "ls-refs"
in a pkt-line by itself), we silently ignore it.

This isn't really hurting anything (clients shouldn't send it, and we'll
ignore it), but we can tighten up the protocol to match what we expect
to happen.

There are two new tests here. The first one checks a capability used as
a command, which already passes. The second tests a command as a
capability, which this patch fixes.

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c              |  2 +-
 t/t5701-git-serve.sh | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/serve.c b/serve.c
index 63ee1be7ff..0636b79f92 100644
--- a/serve.c
+++ b/serve.c
@@ -201,7 +201,7 @@ static int receive_client_capability(const char *key)
 	const char *value;
 	const struct protocol_capability *c = get_capability(key, &value);
 
-	if (!c || !c->advertise(the_repository, NULL))
+	if (!c || c->command || !c->advertise(the_repository, NULL))
 		return 0;
 
 	if (c->receive)
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index ab15078bc0..b027ba9b06 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -72,6 +72,25 @@ test_expect_success 'request invalid command' '
 	test_i18ngrep "invalid command" err
 '
 
+test_expect_success 'request capability as command' '
+	test-tool pkt-line pack >in <<-\EOF &&
+	command=agent
+	0000
+	EOF
+	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
+	grep invalid.command.*agent err
+'
+
+test_expect_success 'request command as capability' '
+	test-tool pkt-line pack >in <<-\EOF &&
+	command=ls-refs
+	fetch
+	0000
+	EOF
+	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
+	grep unknown.capability err
+'
+
 test_expect_success 'requested command is command=value' '
 	test-tool pkt-line pack >in <<-\EOF &&
 	command=ls-refs=whatever
-- 
2.33.0.917.gae6ecbedc7


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v2 11/11] ls-refs: reject unknown arguments
  2021-09-14 23:51 ` [PATCH v2 0/11] limit " Jeff King
                     ` (9 preceding siblings ...)
  2021-09-14 23:52   ` [PATCH v2 10/11] serve: reject commands used as capabilities Jeff King
@ 2021-09-14 23:54   ` Jeff King
  2021-09-15  0:09     ` Ævar Arnfjörð Bjarmason
  2021-09-15  4:17   ` [PATCH v2 0/11] limit memory allocations for v2 servers Taylor Blau
  2021-09-15 18:33   ` Jeff King
  12 siblings, 1 reply; 77+ messages in thread
From: Jeff King @ 2021-09-14 23:54 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

The v2 ls-refs command may receive extra arguments from the client, one
per pkt-line. The spec is pretty clear that the arguments must come from
a specified set, but we silently ignore any unknown entries. For a
well-behaved client this doesn't matter, but it makes testing and
debugging more confusing. Let's tighten this up to match the spec.

In theory this liberal behavior _could_ be useful for extending the
protocol. But:

  - every other part of the protocol requires that the server first
    indicate that it supports the argument; this includes the fetch and
    object-info commands, plus the "unborn" capability added to ls-refs
    itself

  - it's not a very good extension mechanism anyway; without the server
    advertising support, clients would have no idea if the argument was
    silently ignored, or accepted and simply had no effect

So we're not really losing anything by tightening this.

Signed-off-by: Jeff King <peff@peff.net>
---
 ls-refs.c            |  2 ++
 t/t5701-git-serve.sh | 13 +++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/ls-refs.c b/ls-refs.c
index 18c4f41e87..460ac9b229 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -168,6 +168,8 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 		}
 		else if (!strcmp("unborn", arg))
 			data.unborn = allow_unborn;
+		else
+			die(_("unexpected line: '%s'"), arg);
 	}
 
 	if (request->status != PACKET_READ_FLUSH)
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index b027ba9b06..e4d60bc605 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -145,6 +145,19 @@ test_expect_success 'basics of ls-refs' '
 	test_cmp expect actual
 '
 
+test_expect_success 'ls-refs complains about unknown options' '
+	test-tool pkt-line pack >in <<-EOF &&
+	command=ls-refs
+	object-format=$(test_oid algo)
+	0001
+	no-such-arg
+	0000
+	EOF
+
+	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
+	grep unexpected.line.*no-such-arg err
+'
+
 test_expect_success 'basic ref-prefixes' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=ls-refs
-- 
2.33.0.917.gae6ecbedc7

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 11/11] ls-refs: reject unknown arguments
  2021-09-14 23:54   ` [PATCH v2 11/11] ls-refs: reject unknown arguments Jeff King
@ 2021-09-15  0:09     ` Ævar Arnfjörð Bjarmason
  2021-09-15 16:25       ` Jeff King
  0 siblings, 1 reply; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-15  0:09 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Taylor Blau, Martin Ågren


On Tue, Sep 14 2021, Jeff King wrote:

> The v2 ls-refs command may receive extra arguments from the client, one
> per pkt-line. The spec is pretty clear that the arguments must come from
> a specified set, but we silently ignore any unknown entries. For a
> well-behaved client this doesn't matter, but it makes testing and
> debugging more confusing. Let's tighten this up to match the spec.
>
> In theory this liberal behavior _could_ be useful for extending the
> protocol. But:
>
>   - every other part of the protocol requires that the server first
>     indicate that it supports the argument; this includes the fetch and
>     object-info commands, plus the "unborn" capability added to ls-refs
>     itself
>
>   - it's not a very good extension mechanism anyway; without the server
>     advertising support, clients would have no idea if the argument was
>     silently ignored, or accepted and simply had no effect
>
> So we're not really losing anything by tightening this.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  ls-refs.c            |  2 ++
>  t/t5701-git-serve.sh | 13 +++++++++++++
>  2 files changed, 15 insertions(+)
>
> diff --git a/ls-refs.c b/ls-refs.c
> index 18c4f41e87..460ac9b229 100644
> --- a/ls-refs.c
> +++ b/ls-refs.c
> @@ -168,6 +168,8 @@ int ls_refs(struct repository *r, struct packet_reader *request)
>  		}
>  		else if (!strcmp("unborn", arg))
>  			data.unborn = allow_unborn;
> +		else
> +			die(_("unexpected line: '%s'"), arg);
>  	}
>  
>  	if (request->status != PACKET_READ_FLUSH)
> diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
> index b027ba9b06..e4d60bc605 100755
> --- a/t/t5701-git-serve.sh
> +++ b/t/t5701-git-serve.sh
> @@ -145,6 +145,19 @@ test_expect_success 'basics of ls-refs' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'ls-refs complains about unknown options' '
> +	test-tool pkt-line pack >in <<-EOF &&
> +	command=ls-refs
> +	object-format=$(test_oid algo)
> +	0001
> +	no-such-arg
> +	0000
> +	EOF
> +
> +	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
> +	grep unexpected.line.*no-such-arg err
> +'
> +
>  test_expect_success 'basic ref-prefixes' '
>  	test-tool pkt-line pack >in <<-EOF &&
>  	command=ls-refs

Looks good. For what it's worth some of this series is stuff I've been
meaning to submit after my current in-flight series, so I guess we'll be
playing some rebase ping-pong in this area.

In my version of this I'd led with a change to do changes like these for
all the protocol verbs:
	
	diff --git a/connect.c b/connect.c
	index 70b13389ba5..5e991a92279 100644
	--- a/connect.c
	+++ b/connect.c
	@@ -514,6 +514,8 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
	        packet_write_fmt(fd_out, "symrefs\n");
	        if (server_supports_feature("ls-refs", "unborn", 0))
	                packet_write_fmt(fd_out, "unborn\n");
	+       if (git_env_bool("GIT_TEST_PROTOCOL_BAD_LS_REFS", 0))
	+               packet_write_fmt(fd_out, "test-bad-client\n");
	        for (i = 0; ref_prefixes && i < ref_prefixes->nr; i++) {
	                packet_write_fmt(fd_out, "ref-prefix %s\n",
	                                 ref_prefixes->v[i]);

So the below patch isn't anywhere near applying, but you can see the
test coverage (those tests are new) & what changes if we instrument a
client to actually send this bad data.

That packet_client_error() function is new, part of what I'm doing there
is converting all of this error emitting from die() to properly sending
ERR packets.

I think a bug in your versio is that you're using _() here, if your
server program happens to be started in Chinese you probably still want
to emit errors to clients in LANG=C.

That's why I'm using N_() in my version, it knows to emit the die()
localized, but the ERR packet in LANG=C, no you'll get the i18n message
in the server terminal, but the client gets LANG=C.

Of course that actually working is subject to various races that I may
or may not be able to untangle...


-- >8 --
protocol v2: correct & adjust "ls-refs" ERR behavior

Change the protocol v2 "ls-refs" to error out on unknown arguments,
before this we'd silently ignore unknown arguments.

This brings us in line with the behavior we've had in
"fetch" (i.e. upload-pack.c) since 3145ea957d2 (upload-pack: introduce
fetch server command, 2018-03-15) (see the "/* ignore unknown lines
maybe? */" comment in "process_args()".

The looser behavior in "ls-refs" seems to have been unintentionally
added in 72d0ea0056b (ls-refs: introduce ls-refs server command,
2018-03-15).

I've also changed the wording of the "expected flush" message in
"ls-refs" to be consistent with "fetch" and "object-info".

In the changes leading up to this we had to grep out the racy "fatal:
the remote end hung up unexpectedly" messages. Now we need to guard
the full test_cmp with a test_expect_failure. I could also write a
looser test, but this makes subsequent changes smaller, and is more
accurate.

Most of the time the test_expect_failure() succeeds, but in some cases
the two messages will be out-of-order, or e.g. only the STDERR one
will make it, and not the ERR message, or the other way
around. I.e. now that we emit both an ERR line and die() on the
remote, we might see ERR before die(), or die() before ERR.

This race does not happen in the t5701-git-serve.sh and other
"test-tool pkt-line" tests, which only use one process.

Ideally we should only emit ERR and not also something on STDERR when
we die() on the server. Subsequent commits will address that, but for
now let's keep that behavior. I.e. we'll now emit two mostly duplicate
errors in some cases from the client, or one where we emitted none (or
one that made no sense) before.

The current client implementation doesn't expect the server to be this
overly helpful in giving us errors both via pkt-line and STDERR, so we
usually get two lines, and in some cases just one (but never
zero).

Subsequent commits will address this. I.e. we can assume that the
client can format the ERR lines, and that we can just exit(128) or
disconnect on the server, not die().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 ls-refs.c                      |  6 +++++-
 t/t5703-protocol-v2-file.sh    | 24 +++++++++++++++++-------
 t/t5704-protocol-v2-git.sh     | 20 +++++++++++++-------
 t/t5705-protocol-v2-http.sh    | 25 +++++++++++++++++++------
 t/t5710-protocol-violations.sh |  2 +-
 5 files changed, 55 insertions(+), 22 deletions(-)

diff --git a/ls-refs.c b/ls-refs.c
index 1e50e785665..f9bd577dcd2 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -150,11 +150,15 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 			strvec_push(&data.prefixes, out);
 		else if (!strcmp("unborn", arg))
 			data.unborn = allow_unborn;
+		else
+			packet_client_error(&data.writer,
+					    N_("ls-refs: unexpected argument: '%s'"),
+					    request->line);
 	}
 
 	if (request->status != PACKET_READ_FLUSH)
 		packet_client_error(&data.writer,
-				    N_("expected flush after ls-refs arguments"));
+				    N_("ls-refs: expected flush after arguments"));
 
 	send_possibly_unborn_head(&data);
 	if (!data.prefixes.nr)
diff --git a/t/t5703-protocol-v2-file.sh b/t/t5703-protocol-v2-file.sh
index e572767ee00..79457eaf101 100755
--- a/t/t5703-protocol-v2-file.sh
+++ b/t/t5703-protocol-v2-file.sh
@@ -32,18 +32,28 @@ test_expect_success 'list refs with file:// using protocol v2' '
 test_expect_success 'ls-remote handling a bad client using file:// protocol v2' '
 	test_when_finished "rm -f log" &&
 
-	cat >expect <<-EOF &&
-	$(git -C file_parent rev-parse refs/heads/main)$(printf "\t")refs/heads/main
+	cat >log.expect <<-\EOF &&
+	packet:  upload-pack> ERR ls-refs: unexpected argument: '"'"'test-bad-client'"'"'
+	packet:          git< ERR ls-refs: unexpected argument: '"'"'test-bad-client'"'"'
 	EOF
-	env \
+	cat >err.expect <<-\EOF &&
+	fatal: ls-refs: unexpected argument: '"'"'test-bad-client'"'"'
+	fatal: remote error: ls-refs: unexpected argument: '"'"'test-bad-client'"'"'
+	EOF
+	test_must_fail env \
 		GIT_TRACE_PACKET="$(pwd)/log" \
 		GIT_TEST_PROTOCOL_BAD_LS_REFS=true \
 		git -c protocol.version=2 \
-		ls-remote "file://$(pwd)/file_parent" main >actual 2>err &&
+		ls-remote "file://$(pwd)/file_parent" main >out 2>err.actual &&
 
-	test_must_be_empty err &&
-	test_cmp expect actual &&
-	grep "git> test-bad-client$" log
+	grep "unexpected argument.*test-bad-client" err.actual &&
+	test_must_be_empty out &&
+	grep ERR log >log.actual &&
+	test_cmp log.expect log.actual
+'
+
+test_expect_failure 'ls-remote ERR and die() is racy under file:// protocol v2' '
+	test_cmp err.expect err.actual
 '
 
 test_expect_success 'ref advertisement is filtered with ls-remote using protocol v2' '
diff --git a/t/t5704-protocol-v2-git.sh b/t/t5704-protocol-v2-git.sh
index c9293bbaad2..123c3f56ef8 100755
--- a/t/t5704-protocol-v2-git.sh
+++ b/t/t5704-protocol-v2-git.sh
@@ -48,19 +48,25 @@ test_expect_success 'ref advertisement is filtered with ls-remote using protocol
 	test_cmp expect actual
 '
 
-test_expect_success 'ls-remote handling a bad client using git:// protocol v2' '
+test_expect_success 'ls-remote handling a bad client using protocol v2' '
 	test_when_finished "rm -f log" &&
 
-	git ls-remote "$daemon_parent" >expect &&
-	env \
+	cat >err.expect <<-EOF &&
+	fatal: remote error: ls-refs: unexpected argument: '"'"'test-bad-client'"'"'
+	EOF
+	test_must_fail env \
 		GIT_TRACE_PACKET="$(pwd)/log" \
 		GIT_TEST_PROTOCOL_BAD_LS_REFS=true \
 		git -c protocol.version=2 \
-		ls-remote "$GIT_DAEMON_URL/parent" >actual 2>err &&
+		ls-remote "$GIT_DAEMON_URL/parent" main >out.actual 2>err.actual &&
 
-	test_must_be_empty err &&
-	test_cmp expect actual &&
-	grep "git> test-bad-client$" log
+	test_must_be_empty out.actual &&
+	grep "unexpected argument.*test-bad-client" err.actual &&
+	grep "ERR ls-refs: unexpected argument.*test-bad-client" log
+'
+
+test_expect_failure  'ls-remote ERR and die() is racy under file:// protocol v2' '
+	test_cmp err.expect err.actual
 '
 
 test_expect_success 'clone with git:// using protocol v2' '
diff --git a/t/t5705-protocol-v2-http.sh b/t/t5705-protocol-v2-http.sh
index 5982c38743e..6f526e0829f 100755
--- a/t/t5705-protocol-v2-http.sh
+++ b/t/t5705-protocol-v2-http.sh
@@ -192,16 +192,29 @@ test_expect_success 'fetch from namespaced repo respects namespaces' '
 test_expect_success 'ls-remote handling a bad client using http:// protocol v2' '
 	test_when_finished "rm -f log" &&
 
-	git ls-remote "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" >expect &&
-	env \
+	cat >log.expect <<-\EOF &&
+	packet:  upload-pack> ERR ls-refs: unexpected argument: '"'"'test-bad-client'"'"'
+	packet:          git< ERR ls-refs: unexpected argument: '"'"'test-bad-client'"'"'
+	EOF
+
+	cat >err.expect <<-\EOF &&
+	fatal: ls-refs: unexpected argument: '"'"'test-bad-client'"'"'
+	fatal: remote error: ls-refs: unexpected argument: '"'"'test-bad-client'"'"'
+	EOF
+	test_must_fail env \
 		GIT_TRACE_PACKET="$(pwd)/log" \
 		GIT_TEST_PROTOCOL_BAD_LS_REFS=true \
 		git -c protocol.version=2 \
-		ls-remote "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" >actual 2>err &&
+		ls-remote "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" >out 2>err.actual &&
 
-	test_must_be_empty err &&
-	test_cmp expect actual &&
-	grep "git> test-bad-client$" log
+	grep "unexpected argument.*test-bad-client" err.actual &&
+	test_must_be_empty out &&
+	grep ERR log >log.actual &&
+	test_cmp log.expect log.actual
+'
+
+test_expect_failure  'ls-remote ERR and die() is racy under http:// protocol v2' '
+	test_cmp err.expect err.actual
 '
 
 test_expect_success 'ls-remote with v2 http sends only one POST' '
diff --git a/t/t5710-protocol-violations.sh b/t/t5710-protocol-violations.sh
index 44e2c0d3ded..75a457d39ca 100755
--- a/t/t5710-protocol-violations.sh
+++ b/t/t5710-protocol-violations.sh
@@ -16,7 +16,7 @@ test_expect_success 'extra delim packet in v2 ls-refs args' '
 	EOF
 
 	cat >err.expect <<-\EOF &&
-	fatal: expected flush after ls-refs arguments
+	fatal: ls-refs: expected flush after arguments
 	EOF
 	test_must_fail env GIT_PROTOCOL=version=2 \
 		git upload-pack . <input 2>err.actual &&
-- 
2.33.0.1013.ge8323766266


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH 0/9] reducing memory allocations for v2 servers
  2021-09-14 15:29 [PATCH 0/9] reducing memory allocations for v2 servers Jeff King
                   ` (11 preceding siblings ...)
  2021-09-14 23:51 ` [PATCH v2 0/11] limit " Jeff King
@ 2021-09-15  0:25 ` Ævar Arnfjörð Bjarmason
  2021-09-15 16:41   ` Jeff King
  12 siblings, 1 reply; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-15  0:25 UTC (permalink / raw)
  To: Jeff King; +Cc: git


On Tue, Sep 14 2021, Jeff King wrote:

> While looking at [1], I noticed that v2 servers will read a few bits of
> client input into strvecs. Even though we expect these to be small-ish,
> there's nothing preventing a client from sending us a bunch of junk and
> wasting memory.

[...]

>
> [1] https://lore.kernel.org/git/YT54CNYgtGcqexwq@coredump.intra.peff.net/

When I first read this I expected it to be a link to
https://lore.kernel.org/git/YPCsDLoiiAG%2FC%2Fft@coredump.intra.peff.net/;
i.e. that the object-info leak discussion back in July had encouraged
you to work on this ... :)

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 09/11] serve: reject bogus v2 "command=ls-refs=foo"
  2021-09-14 23:52   ` [PATCH v2 09/11] serve: reject bogus v2 "command=ls-refs=foo" Jeff King
@ 2021-09-15  0:27     ` Ævar Arnfjörð Bjarmason
  2021-09-15 16:28       ` Jeff King
  2021-09-15  5:09     ` Eric Sunshine
  2021-09-15 17:33     ` Junio C Hamano
  2 siblings, 1 reply; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-15  0:27 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Taylor Blau, Martin Ågren


On Tue, Sep 14 2021, Jeff King wrote:

> When we see a line from the client like "command=ls-refs", we parse
> everything after the equals sign as a capability, which we check against
> our capabilities table. If we don't recognize the command (e.g.,
> "command=foo"), we'll reject it. But we use the same parser that checks
> for regular capabilities like "object-format=sha256". And so we'll
> accept "ls-refs=foo", even though everything after the equals is bogus,
> and simply ignored.
>
> This isn't really hurting anything, but the request does violate the
> spec. Let's tighten it up to prevent any surprising behavior.

Doesn't need a re-roll, but just for my own sanity:

By violating the spec you mean it doesn't coform to the "key" in the
"Capability Advertisement" section of protocol-v2.txt, one might skim
this and think values with "=" in them are OK, but a command=WHATEVER
has a WHATEVER as a "key", not a "value", that's for other parts of the
dialog.

But you could also have meant that it violates the spec because there's
no such command as "ls-refs=whatever", just like there isn't "foobar",
but I don't think that's what you mean...

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  serve.c              |  2 +-
>  t/t5701-git-serve.sh | 10 ++++++++++
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/serve.c b/serve.c
> index 5ea6c915cb..63ee1be7ff 100644
> --- a/serve.c
> +++ b/serve.c
> @@ -220,7 +220,7 @@ static int parse_command(const char *key, struct protocol_capability **command)
>  		if (*command)
>  			die("command '%s' requested after already requesting command '%s'",
>  			    out, (*command)->name);
> -		if (!cmd || !cmd->advertise(the_repository, NULL) || !cmd->command)
> +		if (!cmd || !cmd->advertise(the_repository, NULL) || !cmd->command || value)
>  			die("invalid command '%s'", out);
>  
>  		*command = cmd;
> diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
> index 3bc96ebcde..ab15078bc0 100755
> --- a/t/t5701-git-serve.sh
> +++ b/t/t5701-git-serve.sh
> @@ -72,6 +72,16 @@ test_expect_success 'request invalid command' '
>  	test_i18ngrep "invalid command" err
>  '
>  
> +test_expect_success 'requested command is command=value' '
> +	test-tool pkt-line pack >in <<-\EOF &&
> +	command=ls-refs=whatever
> +	object-format=$(test_oid algo)
> +	0000
> +	EOF
> +	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
> +	grep invalid.command.*ls-refs=whatever err
> +'
> +
>  test_expect_success 'wrong object-format' '
>  	test-tool pkt-line pack >in <<-EOF &&
>  	command=fetch


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 03/11] serve: add "receive" method for v2 capabilities table
  2021-09-14 23:51   ` [PATCH v2 03/11] serve: add "receive" method for v2 capabilities table Jeff King
@ 2021-09-15  0:31     ` Ævar Arnfjörð Bjarmason
  2021-09-15 16:35       ` Jeff King
  2021-09-15 16:41     ` Junio C Hamano
  1 sibling, 1 reply; 77+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-15  0:31 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Taylor Blau, Martin Ågren


On Tue, Sep 14 2021, Jeff King wrote:

> -static int is_valid_capability(const char *key)
> +static int receive_client_capability(const char *key)
>  {
>  	const char *value;
>  	const struct protocol_capability *c = get_capability(key, &value);
>  
> -	return c && c->advertise(the_repository, NULL);
> +	if (!c || !c->advertise(the_repository, NULL))
> +		return 0;
> +
> +	if (c->receive)
> +		c->receive(the_repository, value);
> +	return 1;
>  }
>  

I haven't actually run this yet (and need to zZzZ soon), but AFAICT at
the end of this series you leave the existing advertise semantics of
advertise() be (which is fine). I have this unsubmitted patch locally
which you may or may not want to work into this.

I considered splitting up the advertise() method as well, i.e. we could
have a new "is_advertised" boolean callback, and then a
"capability_string" or whatever. "server-option" and "object-info" never
add anything, so they could leave that as NULL.

But it's probably not worth it, just food for thought...

-- >8 -- serve: document that "advertise" is called in two modes

If we're being called with a non-NULL argument we're sending out the
advertisement line, but if it's NULL we're actually in the middle of a
request.

So you can use the check for NULL to emit your own "die" on "return
0", like "wtf, I said I don't support command xyz, why are you calling
it?", as opposed to the default "invalid command '%s'".

Maybe nobody cares, and we can't assume that we're going from an
advertisement to a command for the same request anyway (can
we?). I.e. are we serving multiple clients?

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 serve.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/serve.c b/serve.c
index aa8209f147e..b187ce26911 100644
--- a/serve.c
+++ b/serve.c
@@ -55,6 +55,12 @@ struct protocol_capability {
 	 * Optionally a value can be specified by adding it to 'value'.
 	 * If a value is added to 'value', the server will advertise this
 	 * capability as "<name>=<value>" instead of "<name>".
+	 *
+	 * When called with a NULL value we're past the advertisement
+	 * itself, and are checking during a request whether we
+	 * support this command. This can be checked and used used to
+	 * e.g. emit a die("we support this, but don't have it
+	 * enabled!") error.
 	 */
 	int (*advertise)(struct repository *r, struct strbuf *value);
 
-- 
2.33.0.1013.ge8323766266


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 07/11] ls-refs: ignore very long ref-prefix counts
  2021-09-14 23:51   ` [PATCH v2 07/11] ls-refs: ignore very long ref-prefix counts Jeff King
@ 2021-09-15  4:16     ` Taylor Blau
  2021-09-15 16:39       ` Jeff King
  2021-09-15  5:00     ` Eric Sunshine
  1 sibling, 1 reply; 77+ messages in thread
From: Taylor Blau @ 2021-09-15  4:16 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

On Tue, Sep 14, 2021 at 07:51:35PM -0400, Jeff King wrote:
> @@ -156,15 +162,25 @@ int ls_refs(struct repository *r, struct packet_reader *request)
>  			data.peel = 1;
>  		else if (!strcmp("symrefs", arg))
>  			data.symrefs = 1;
> -		else if (skip_prefix(arg, "ref-prefix ", &out))
> -			strvec_push(&data.prefixes, out);
> +		else if (skip_prefix(arg, "ref-prefix ", &out)) {
> +			if (data.prefixes.nr < TOO_MANY_PREFIXES)
> +				strvec_push(&data.prefixes, out);
> +		}
>  		else if (!strcmp("unborn", arg))
>  			data.unborn = allow_unborn;
>  	}
>
>  	if (request->status != PACKET_READ_FLUSH)
>  		die(_("expected flush after ls-refs arguments"));
>
> +	/*
> +	 * If we saw too many prefixes, we must avoid using them at all; as
> +	 * soon as we have any prefix, they are meant to form a comprehensive
> +	 * list.
> +	 */
> +	if (data.prefixes.nr >= TOO_MANY_PREFIXES)
> +		strvec_clear(&data.prefixes);
> +

Great, I find this version even better than what I suggested where the
case where the list already has too many prefixes `continue`s through
the loop before calling strvec_add().

Just noting aloud, the `>` part of this conditional will never happen
(since data.prefixes.nr will be at most equal to TOO_MANY_PREFIXES, but
never larger than it).

Obviously not wrong, but I figured it'd be worth mentioning in case it
caught the attention of other reviewers.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 0/11] limit memory allocations for v2 servers
  2021-09-14 23:51 ` [PATCH v2 0/11] limit " Jeff King
                     ` (10 preceding siblings ...)
  2021-09-14 23:54   ` [PATCH v2 11/11] ls-refs: reject unknown arguments Jeff King
@ 2021-09-15  4:17   ` Taylor Blau
  2021-09-15 18:33   ` Jeff King
  12 siblings, 0 replies; 77+ messages in thread
From: Taylor Blau @ 2021-09-15  4:17 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

On Tue, Sep 14, 2021 at 07:51:06PM -0400, Jeff King wrote:
> Here's a re-roll of my series to limit the memory a v2 server is willing
> to allocate on behalf of a client. See v1:
>
>   https://lore.kernel.org/git/YUC%2F6n1hhUbMJiLw@coredump.intra.peff.net/
>
> for an overview. The existing patches are mostly small fixups pointed
> out by reviewers (thanks!), but I did take Martin's TOO_MANY_PREFIXES
> suggestion in patch 7 (without the change to the name of the constant it
> seemed too clever to me, but with it it seems just the right amount of
> clever).

Thanks for this. The two new patches look good to me, and I took a light
skim over the ones which were modified since v1. Everything I saw seemed
very reasonable to me.

    Reviewed-by: Taylor Blau <me@ttaylorr.com>

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 07/11] ls-refs: ignore very long ref-prefix counts
  2021-09-14 23:51   ` [PATCH v2 07/11] ls-refs: ignore very long ref-prefix counts Jeff King
  2021-09-15  4:16     ` Taylor Blau
@ 2021-09-15  5:00     ` Eric Sunshine
  2021-09-15 16:40       ` Jeff King
  1 sibling, 1 reply; 77+ messages in thread
From: Eric Sunshine @ 2021-09-15  5:00 UTC (permalink / raw)
  To: Jeff King
  Cc: Git List, Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

On Tue, Sep 14, 2021 at 7:51 PM Jeff King <peff@peff.net> wrote:
> diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
> @@ -158,6 +158,37 @@ test_expect_success 'refs/heads prefix' '
> +test_expect_success 'ignore very large set of prefixes' '
> +       # generate a large number of ref-prefixes that we expect
> +       # to match nothing; the value here exceeds TOO_MANY_PREFIXES
> +       # from ls-refs.c.
> +       {
> +               echo command=ls-refs &&
> +               echo object-format=$(test_oid algo)
> +               echo 0001 &&
> +               perl -le "print \"ref-prefix refs/heads/\$_\" for (1..65536)" &&
> +               echo 0000
> +       } |
> +       test-tool pkt-line pack >in &&

Broken &&-chain in {...}.

(Granted, it doesn't matter in this case since the exit code gets lost
down the pipe, but better to be consistent about it.)

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 09/11] serve: reject bogus v2 "command=ls-refs=foo"
  2021-09-14 23:52   ` [PATCH v2 09/11] serve: reject bogus v2 "command=ls-refs=foo" Jeff King
  2021-09-15  0:27     ` Ævar Arnfjörð Bjarmason
@ 2021-09-15  5:09     ` Eric Sunshine
  2021-09-15 16:32       ` Jeff King
  2021-09-15 17:33     ` Junio C Hamano
  2 siblings, 1 reply; 77+ messages in thread
From: Eric Sunshine @ 2021-09-15  5:09 UTC (permalink / raw)
  To: Jeff King
  Cc: Git List, Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

On Tue, Sep 14, 2021 at 7:52 PM Jeff King <peff@peff.net> wrote:
> When we see a line from the client like "command=ls-refs", we parse
> everything after the equals sign as a capability, which we check against
> our capabilities table. If we don't recognize the command (e.g.,
> "command=foo"), we'll reject it. But we use the same parser that checks
> for regular capabilities like "object-format=sha256". And so we'll
> accept "ls-refs=foo", even though everything after the equals is bogus,
> and simply ignored.
>
> This isn't really hurting anything, but the request does violate the
> spec. Let's tighten it up to prevent any surprising behavior.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
> @@ -72,6 +72,16 @@ test_expect_success 'request invalid command' '
> +test_expect_success 'requested command is command=value' '
> +       test-tool pkt-line pack >in <<-\EOF &&
> +       command=ls-refs=whatever
> +       object-format=$(test_oid algo)
> +       0000
> +       EOF

This here-doc uses <<-\EOF yet (presumably) expects $(test_oid algo)
to be expanded. I'm guessing that you meant <<-EOF but never
noticed...

> +       test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&

... because of this test_must_fail().

> +       grep invalid.command.*ls-refs=whatever err
> +'

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 11/11] ls-refs: reject unknown arguments
  2021-09-15  0:09     ` Ævar Arnfjörð Bjarmason
@ 2021-09-15 16:25       ` Jeff King
  0 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 16:25 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Taylor Blau, Martin Ågren

On Wed, Sep 15, 2021 at 02:09:16AM +0200, Ævar Arnfjörð Bjarmason wrote:

> So the below patch isn't anywhere near applying, but you can see the
> test coverage (those tests are new) & what changes if we instrument a
> client to actually send this bad data.
> 
> That packet_client_error() function is new, part of what I'm doing there
> is converting all of this error emitting from die() to properly sending
> ERR packets.

Right, I noticed that none of this is likely to make it to a client
(unless they are using file:// or an ssh channel which passes back
stderr).

I agree that we should probably be passing these back via ERR packets,
but note that the client is racy here. If the server reports an error
and dies while the client is still writing, they'll see SIGPIPE and exit
without showing the user the ERR packet. The solution may be something
along these lines:

  https://public-inbox.org/git/20200422163357.27056-1-chriscool@tuxfamily.org/

Alternatively, the server can pump the client data to /dev/null until we
hit a flush, and _then_ write the ERR. But that doesn't work for some
protocol-level errors (like "oops, I'm having trouble reading your
pkt-lines).

> I think a bug in your versio is that you're using _() here, if your
> server program happens to be started in Chinese you probably still want
> to emit errors to clients in LANG=C.

I'm not sure it's a bug. If you set LANG=zh on your server, that might
be harmful (if you serve a multi-lingual international audience), or it
might be helpful (if it's a company server where everybody speaks the
language). Likewise, for a file:// or ssh:// operation, your local LANG
would kick in.

I don't really have a strong opinion either way on whether it's helpful
or hindering overall. But I did follow the convention of nearby code in
this series, so I think we don't need to worry about it now (it also
seemed inconsistent to me; serve.c does not mark for translation, but
ls-refs.c does).

> Of course that actually working is subject to various races that I may
> or may not be able to untangle...

Oh good, so you know about them. :)

-Peff

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 09/11] serve: reject bogus v2 "command=ls-refs=foo"
  2021-09-15  0:27     ` Ævar Arnfjörð Bjarmason
@ 2021-09-15 16:28       ` Jeff King
  0 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 16:28 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Taylor Blau, Martin Ågren

On Wed, Sep 15, 2021 at 02:27:35AM +0200, Ævar Arnfjörð Bjarmason wrote:

> On Tue, Sep 14 2021, Jeff King wrote:
> 
> > When we see a line from the client like "command=ls-refs", we parse
> > everything after the equals sign as a capability, which we check against
> > our capabilities table. If we don't recognize the command (e.g.,
> > "command=foo"), we'll reject it. But we use the same parser that checks
> > for regular capabilities like "object-format=sha256". And so we'll
> > accept "ls-refs=foo", even though everything after the equals is bogus,
> > and simply ignored.
> >
> > This isn't really hurting anything, but the request does violate the
> > spec. Let's tighten it up to prevent any surprising behavior.
> 
> Doesn't need a re-roll, but just for my own sanity:
> 
> By violating the spec you mean it doesn't coform to the "key" in the
> "Capability Advertisement" section of protocol-v2.txt, one might skim
> this and think values with "=" in them are OK, but a command=WHATEVER
> has a WHATEVER as a "key", not a "value", that's for other parts of the
> dialog.
> 
> But you could also have meant that it violates the spec because there's
> no such command as "ls-refs=whatever", just like there isn't "foobar",
> but I don't think that's what you mean...

Both, I think. :) The line "command=foo=bar" is obvious nonsense by the
spec, no matter which way you interpret it:

  - "foo" is a command name, but adding a value does not make sense when
    it is used in this context, so it's invalid

  - "foo=bar" is not a command name

Our parser does pick out "foo", so in terms of the code it is more like
the former. But since it is invalid either way, I'm not sure the
distinction matters.

-Peff

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 09/11] serve: reject bogus v2 "command=ls-refs=foo"
  2021-09-15  5:09     ` Eric Sunshine
@ 2021-09-15 16:32       ` Jeff King
  0 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 16:32 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

On Wed, Sep 15, 2021 at 01:09:19AM -0400, Eric Sunshine wrote:

> > diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
> > @@ -72,6 +72,16 @@ test_expect_success 'request invalid command' '
> > +test_expect_success 'requested command is command=value' '
> > +       test-tool pkt-line pack >in <<-\EOF &&
> > +       command=ls-refs=whatever
> > +       object-format=$(test_oid algo)
> > +       0000
> > +       EOF
> 
> This here-doc uses <<-\EOF yet (presumably) expects $(test_oid algo)
> to be expanded. I'm guessing that you meant <<-EOF but never
> noticed...

Ah, thanks. It's muscle memory to inhibit expansion in my here-docs. :)

> > +       test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
> 
> ... because of this test_must_fail().
> 
> > +       grep invalid.command.*ls-refs=whatever err
> > +'

Actually, it's a little more complicated. The "grep" here would notice
if we failed for the wrong reason. But we never actually look at the
object-format line at all, because we barf as soon as we see the bogus
command line.

So the object-format line is superfluous in this test. I added it mostly
for consistency with the other tests (including the existing
unknown-command test, which is in the same boat). But it also makes the
test slightly more robust, in that the command is more likely to succeed
(and thus fail the test) if the code accidentally did not notice the
wrong command line.

But obviously it's worth fixing. Thanks for noticing.

-Peff

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 03/11] serve: add "receive" method for v2 capabilities table
  2021-09-15  0:31     ` Ævar Arnfjörð Bjarmason
@ 2021-09-15 16:35       ` Jeff King
  0 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 16:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Taylor Blau, Martin Ågren

On Wed, Sep 15, 2021 at 02:31:28AM +0200, Ævar Arnfjörð Bjarmason wrote:

> I haven't actually run this yet (and need to zZzZ soon), but AFAICT at
> the end of this series you leave the existing advertise semantics of
> advertise() be (which is fine). I have this unsubmitted patch locally
> which you may or may not want to work into this.
>
> I considered splitting up the advertise() method as well, i.e. we could
> have a new "is_advertised" boolean callback, and then a
> "capability_string" or whatever. "server-option" and "object-info" never
> add anything, so they could leave that as NULL.
> 
> But it's probably not worth it, just food for thought...

Yes, I noticed the advertise() function is really doing double-duty
here. Nothing changes with respect to it in my series, but I agree it's
a bit confusing.

The functions are simple enough that splitting up the two functions of
advertise() might make sense. Likewise, your documentation seems
reasonable. I'd prefer to punt on it for now, though, as my series isn't
otherwise touching this function.

I'm sure there are textual conflicts with what I'm doing here, since
it's nearby, but I'd prefer to just build more cleanups on top later.

-Peff

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 07/11] ls-refs: ignore very long ref-prefix counts
  2021-09-15  4:16     ` Taylor Blau
@ 2021-09-15 16:39       ` Jeff King
  0 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 16:39 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Junio C Hamano, Martin Ågren,
	Ævar Arnfjörð Bjarmason

On Wed, Sep 15, 2021 at 12:16:04AM -0400, Taylor Blau wrote:

> > +	/*
> > +	 * If we saw too many prefixes, we must avoid using them at all; as
> > +	 * soon as we have any prefix, they are meant to form a comprehensive
> > +	 * list.
> > +	 */
> > +	if (data.prefixes.nr >= TOO_MANY_PREFIXES)
> > +		strvec_clear(&data.prefixes);
> > +
> 
> Great, I find this version even better than what I suggested where the
> case where the list already has too many prefixes `continue`s through
> the loop before calling strvec_add().
> 
> Just noting aloud, the `>` part of this conditional will never happen
> (since data.prefixes.nr will be at most equal to TOO_MANY_PREFIXES, but
> never larger than it).
> 
> Obviously not wrong, but I figured it'd be worth mentioning in case it
> caught the attention of other reviewers.

Yeah, your analysis is right. Long ago I read some advice (I think from
K&R, either the C book or a related article) that suggested always using
bounding checks like this rather than equality, because they do the
right thing even if the variable ends up with a surprising value.

I can certainly see an argument against it (like that if you're so
worried about it, maybe you ought to detect it and figure out who is
setting the variable to be unexpectedly high). But it seems like a
reasonable defensive measure to me (against an off-by-one mis-analysis,
or against the earlier code changing in the future).

-Peff

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 07/11] ls-refs: ignore very long ref-prefix counts
  2021-09-15  5:00     ` Eric Sunshine
@ 2021-09-15 16:40       ` Jeff King
  0 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 16:40 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

On Wed, Sep 15, 2021 at 01:00:01AM -0400, Eric Sunshine wrote:

> On Tue, Sep 14, 2021 at 7:51 PM Jeff King <peff@peff.net> wrote:
> > diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
> > @@ -158,6 +158,37 @@ test_expect_success 'refs/heads prefix' '
> > +test_expect_success 'ignore very large set of prefixes' '
> > +       # generate a large number of ref-prefixes that we expect
> > +       # to match nothing; the value here exceeds TOO_MANY_PREFIXES
> > +       # from ls-refs.c.
> > +       {
> > +               echo command=ls-refs &&
> > +               echo object-format=$(test_oid algo)
> > +               echo 0001 &&
> > +               perl -le "print \"ref-prefix refs/heads/\$_\" for (1..65536)" &&
> > +               echo 0000
> > +       } |
> > +       test-tool pkt-line pack >in &&
> 
> Broken &&-chain in {...}.

Thanks, will fix.

> (Granted, it doesn't matter in this case since the exit code gets lost
> down the pipe, but better to be consistent about it.)

Yep. I think what happened is that I started to convert this to a
here-doc to match the nearby tests, but then realized that expanding the
multi-line bit was weird (even more so when it was a shell loop with a
pipe, before I switched it to perl).

-Peff

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 03/11] serve: add "receive" method for v2 capabilities table
  2021-09-14 23:51   ` [PATCH v2 03/11] serve: add "receive" method for v2 capabilities table Jeff King
  2021-09-15  0:31     ` Ævar Arnfjörð Bjarmason
@ 2021-09-15 16:41     ` Junio C Hamano
  2021-09-15 16:57       ` Jeff King
  1 sibling, 1 reply; 77+ messages in thread
From: Junio C Hamano @ 2021-09-15 16:41 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

Jeff King <peff@peff.net> writes:

> +	/*
> +	 * Function called when a client requests the capability as a
> +	 * non-command. This may be NULL if the capability does nothing.
> +	 *
> +	 * For a capability of the form "foo=bar", the value string points to
> +	 * the content after the "=" (i.e., "bar"). For simple capabilities
> +	 * (just "foo"), it is NULL.
> +	 */
> +	void (*receive)(struct repository *r, const char *value);

What does "as a non-command" mean?  To put it another way, when a
client requests the capability as a command, what does the receive
method do differently?

> @@ -164,12 +174,17 @@ static struct protocol_capability *get_capability(const char *key, const char **
>  	return NULL;
>  }
>  
> -static int is_valid_capability(const char *key)
> +static int receive_client_capability(const char *key)
>  {
>  	const char *value;
>  	const struct protocol_capability *c = get_capability(key, &value);
>  
> -	return c && c->advertise(the_repository, NULL);
> +	if (!c || !c->advertise(the_repository, NULL))
> +		return 0;
> +
> +	if (c->receive)
> +		c->receive(the_repository, value);
> +	return 1;
>  }
>  
>  static int parse_command(const char *key, struct protocol_capability **command)
> @@ -262,7 +277,7 @@ static int process_request(void)
>  		case PACKET_READ_NORMAL:
>  			/* collect request; a sequence of keys and values */

The comment tentatively gets slightly stale here, but that will be
corrected at the end, so it would be fine ;-)

>  			if (parse_command(reader.line, &command) ||
> -			    is_valid_capability(reader.line))
> +			    receive_client_capability(reader.line))
>  				strvec_push(&keys, reader.line);
>  			else
>  				die("unknown capability '%s'", reader.line);

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 0/9] reducing memory allocations for v2 servers
  2021-09-15  0:25 ` [PATCH 0/9] reducing memory allocations for v2 servers Ævar Arnfjörð Bjarmason
@ 2021-09-15 16:41   ` Jeff King
  0 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 16:41 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

On Wed, Sep 15, 2021 at 02:25:49AM +0200, Ævar Arnfjörð Bjarmason wrote:

> On Tue, Sep 14 2021, Jeff King wrote:
> 
> > While looking at [1], I noticed that v2 servers will read a few bits of
> > client input into strvecs. Even though we expect these to be small-ish,
> > there's nothing preventing a client from sending us a bunch of junk and
> > wasting memory.
> 
> [...]
> 
> >
> > [1] https://lore.kernel.org/git/YT54CNYgtGcqexwq@coredump.intra.peff.net/
> 
> When I first read this I expected it to be a link to
> https://lore.kernel.org/git/YPCsDLoiiAG%2FC%2Fft@coredump.intra.peff.net/;
> i.e. that the object-info leak discussion back in July had encouraged
> you to work on this ... :)

Nope, I got terrified by the "yes | upload-pack" example I showed
earlier. :)

I was really worried it could turn into a heap overflow, but it turns
out that it cannot. But I still think tightening things up (and avoiding
any memory-consumption attacks) is worth doing.

-Peff

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 04/11] serve: provide "receive" function for object-format capability
  2021-09-14 23:51   ` [PATCH v2 04/11] serve: provide "receive" function for object-format capability Jeff King
@ 2021-09-15 16:54     ` Junio C Hamano
  0 siblings, 0 replies; 77+ messages in thread
From: Junio C Hamano @ 2021-09-15 16:54 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

Jeff King <peff@peff.net> writes:

> There should be no change of behavior here, except for two
> broken-protocol cases:
>
>   - if the client sends multiple conflicting object-format capabilities
>     (which they should not), we'll now choose the last one rather than
>     the first. We could also detect and complain about the duplicates
>     quite easily now, which we could not before, but I didn't do so
>     here.

I'd imagine that it would have been easy to retain the "use the
first one" behaviour if we wanted to; I do not think this behaviour
change really matters in practice, so it's OK.

>   - if the client sends a bogus "object-format" with no equals sign,
>     we'll now say so, rather than "unknown object format: ''"

OK.

> @@ -228,22 +241,6 @@ static int has_capability(const struct strvec *keys, const char *capability,
>  	return 0;
>  }
>  
> -static void check_algorithm(struct repository *r, struct strvec *keys)
> -{
> -	int client = GIT_HASH_SHA1, server = hash_algo_by_ptr(r->hash_algo);
> -	const char *algo_name;
> -
> -	if (has_capability(keys, "object-format", &algo_name)) {
> -		client = hash_algo_by_name(algo_name);
> -		if (client == GIT_HASH_UNKNOWN)
> -			die("unknown object format '%s'", algo_name);
> -	}
> -
> -	if (client != server)
> -		die("mismatched object format: server %s; client %s\n",
> -		    r->hash_algo->name, hash_algos[client].name);
> -}
> -
>  enum request_state {
>  	PROCESS_REQUEST_KEYS,
>  	PROCESS_REQUEST_DONE,
> @@ -317,7 +314,10 @@ static int process_request(void)
>  	if (!command)
>  		die("no command requested");
>  
> -	check_algorithm(the_repository, &keys);
> +	if (client_hash_algo != hash_algo_by_ptr(the_repository->hash_algo))
> +		die("mismatched object format: server %s; client %s\n",
> +		    the_repository->hash_algo->name,
> +		    hash_algos[client_hash_algo].name);

OK.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 05/11] serve: provide "receive" function for session-id capability
  2021-09-14 23:51   ` [PATCH v2 05/11] serve: provide "receive" function for session-id capability Jeff King
@ 2021-09-15 16:56     ` Junio C Hamano
  0 siblings, 0 replies; 77+ messages in thread
From: Junio C Hamano @ 2021-09-15 16:56 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

Jeff King <peff@peff.net> writes:

> Rather than pulling the session-id string from the list of collected
> capabilities, we can handle it as soon as we receive it. This gets us
> closer to dropping the collected list entirely.
>
> The behavior should be the same, with one exception. Previously if the
> client sent us multiple session-id lines, we'd report only the first.
> Now we'll pass each one along to trace2. This shouldn't matter in
> practice, since clients shouldn't do that (and if they do, it's probably
> sensible to log them all).
>
> As this removes the last caller of the static has_capability(), we can
> remove it, as well (and in fact we must to avoid -Wunused-function
> complaining).

Nice.

>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  serve.c | 33 +++++++++------------------------
>  1 file changed, 9 insertions(+), 24 deletions(-)
>
> diff --git a/serve.c b/serve.c
> index f6ea2953eb..6bbf54cbbe 100644
> --- a/serve.c
> +++ b/serve.c
> @@ -57,6 +57,14 @@ static int session_id_advertise(struct repository *r, struct strbuf *value)
>  	return 1;
>  }
>  
> +static void session_id_receive(struct repository *r,
> +			       const char *client_sid)
> +{
> +	if (!client_sid)
> +		client_sid = "";
> +	trace2_data_string("transfer", NULL, "client-sid", client_sid);
> +}
> +
>  struct protocol_capability {
>  	/*
>  	 * The name of the capability.  The server uses this name when
> @@ -121,6 +129,7 @@ static struct protocol_capability capabilities[] = {
>  	{
>  		.name = "session-id",
>  		.advertise = session_id_advertise,
> +		.receive = session_id_receive,
>  	},
>  	{
>  		.name = "object-info",
> @@ -221,26 +230,6 @@ static int parse_command(const char *key, struct protocol_capability **command)
>  	return 0;
>  }
>  
> -static int has_capability(const struct strvec *keys, const char *capability,
> -			  const char **value)
> -{
> -	int i;
> -	for (i = 0; i < keys->nr; i++) {
> -		const char *out;
> -		if (skip_prefix(keys->v[i], capability, &out) &&
> -		    (!*out || *out == '=')) {
> -			if (value) {
> -				if (*out == '=')
> -					out++;
> -				*value = out;
> -			}
> -			return 1;
> -		}
> -	}
> -
> -	return 0;
> -}
> -
>  enum request_state {
>  	PROCESS_REQUEST_KEYS,
>  	PROCESS_REQUEST_DONE,
> @@ -252,7 +241,6 @@ static int process_request(void)
>  	struct packet_reader reader;
>  	struct strvec keys = STRVEC_INIT;
>  	struct protocol_capability *command = NULL;
> -	const char *client_sid;
>  
>  	packet_reader_init(&reader, 0, NULL, 0,
>  			   PACKET_READ_CHOMP_NEWLINE |
> @@ -319,9 +307,6 @@ static int process_request(void)
>  		    the_repository->hash_algo->name,
>  		    hash_algos[client_hash_algo].name);
>  
> -	if (has_capability(&keys, "session-id", &client_sid))
> -		trace2_data_string("transfer", NULL, "client-sid", client_sid);
> -
>  	command->command(the_repository, &reader);
>  
>  	strvec_clear(&keys);

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 03/11] serve: add "receive" method for v2 capabilities table
  2021-09-15 16:41     ` Junio C Hamano
@ 2021-09-15 16:57       ` Jeff King
  0 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 16:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

On Wed, Sep 15, 2021 at 09:41:28AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > +	/*
> > +	 * Function called when a client requests the capability as a
> > +	 * non-command. This may be NULL if the capability does nothing.
> > +	 *
> > +	 * For a capability of the form "foo=bar", the value string points to
> > +	 * the content after the "=" (i.e., "bar"). For simple capabilities
> > +	 * (just "foo"), it is NULL.
> > +	 */
> > +	void (*receive)(struct repository *r, const char *value);
> 
> What does "as a non-command" mean?  To put it another way, when a
> client requests the capability as a command, what does the receive
> method do differently?

For each entry in this capability table, clients can say:

  command=foo

or just:

  foo

The latter is a non-command. The "receive" function is not called at all
if it is a "command". I think this is a bit more clear when read
together with the existing function just above (which you can't see in
the diff context):

        /*
         * Function called when a client requests the capability as a command.
         * Will be provided a struct packet_reader 'request' which it should
         * use to read the command specific part of the request.  Every command
         * MUST read until a flush packet is seen before sending a response.
         *
         * This field should be NULL for capabilities which are not commands.
         */
        int (*command)(struct repository *r, struct packet_reader *request);

I guess these could define "as a command", but I think it's pretty clear
in the context.

> >  static int parse_command(const char *key, struct protocol_capability **command)
> > @@ -262,7 +277,7 @@ static int process_request(void)
> >  		case PACKET_READ_NORMAL:
> >  			/* collect request; a sequence of keys and values */
> 
> The comment tentatively gets slightly stale here, but that will be
> corrected at the end, so it would be fine ;-)

Hmm. I think it is not stale here, as we are still collecting the "keys"
strvec. But it _does_ get stale by the end, when we stop doing so.

I'm preparing a re-roll for the test issues Eric noted, so I'll drop
that comment in the appropriate patch.

-Peff

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 06/11] serve: drop "keys" strvec
  2021-09-14 23:51   ` [PATCH v2 06/11] serve: drop "keys" strvec Jeff King
@ 2021-09-15 17:01     ` Junio C Hamano
  0 siblings, 0 replies; 77+ messages in thread
From: Junio C Hamano @ 2021-09-15 17:01 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

Jeff King <peff@peff.net> writes:

> We collect the set of capabilities the client sends us in a strvec.
> While this is usually small, there's no limit to the number of
> capabilities the client can send us (e.g., they could just send us
> "agent" pkt-lines over and over, and we'd keep adding them to the list).
>
> Since all code has been converted away from using this list, let's get
> rid of it. This avoids a potential attack where clients waste our
> memory.
>
> Note that we do have to replace it with a flag, because some of the
> flush-packet logic checks whether we've seen any valid commands or keys.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  serve.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/serve.c b/serve.c
> index 6bbf54cbbe..5ea6c915cb 100644
> --- a/serve.c
> +++ b/serve.c
> @@ -239,7 +239,7 @@ static int process_request(void)
>  {
>  	enum request_state state = PROCESS_REQUEST_KEYS;
>  	struct packet_reader reader;
> -	struct strvec keys = STRVEC_INIT;
> +	int seen_capability_or_command = 0;
>  	struct protocol_capability *command = NULL;
>  
>  	packet_reader_init(&reader, 0, NULL, 0,
> @@ -263,7 +263,7 @@ static int process_request(void)
>  			/* collect request; a sequence of keys and values */
>  			if (parse_command(reader.line, &command) ||
>  			    receive_client_capability(reader.line))
> -				strvec_push(&keys, reader.line);
> +				seen_capability_or_command = 1;

OK, we no longer "collect" request in the keys strvec, but I guess
what receive_client_capability() does still counts as "collecting",
so the "tentatively stale" comment is not wrong after all at the end
(we have tentatively been collecting in two different places and one
of them is dropped here).

> @@ -275,7 +275,7 @@ static int process_request(void)
>  			 * If no command and no keys were given then the client
>  			 * wanted to terminate the connection.
>  			 */
> -			if (!keys.nr)
> +			if (!seen_capability_or_command)
>  				return 1;
>  
>  			/*
> @@ -309,7 +309,6 @@ static int process_request(void)
>  
>  	command->command(the_repository, &reader);
>  
> -	strvec_clear(&keys);
>  	return 0;
>  }

Nice.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 09/11] serve: reject bogus v2 "command=ls-refs=foo"
  2021-09-14 23:52   ` [PATCH v2 09/11] serve: reject bogus v2 "command=ls-refs=foo" Jeff King
  2021-09-15  0:27     ` Ævar Arnfjörð Bjarmason
  2021-09-15  5:09     ` Eric Sunshine
@ 2021-09-15 17:33     ` Junio C Hamano
  2021-09-15 17:39       ` Jeff King
  2 siblings, 1 reply; 77+ messages in thread
From: Junio C Hamano @ 2021-09-15 17:33 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

Jeff King <peff@peff.net> writes:

> When we see a line from the client like "command=ls-refs", we parse
> everything after the equals sign as a capability, which we check against
> our capabilities table. If we don't recognize the command (e.g.,
> "command=foo"), we'll reject it. But we use the same parser that checks
> for regular capabilities like "object-format=sha256". And so we'll
> accept "ls-refs=foo", even though everything after the equals is bogus,
> and simply ignored.

Maybe I am slow but I had to read the above a few times and finally
look at the implementation of parse_command() to realize that what
the last sentence describes is:

    When parse_command() is fed "command=ls-refs=foo", it strips
    "command=", feeds "ls-refs=foo" to get_capability(), and because
    we do not ensure value is NULL, we silently ignore "=foo" that
    is bogus.

And it makes sense.  It would probably have helped if I peeked the
updated test ;-)

> This isn't really hurting anything, but the request does violate the
> spec. Let's tighten it up to prevent any surprising behavior.

Good.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 09/11] serve: reject bogus v2 "command=ls-refs=foo"
  2021-09-15 17:33     ` Junio C Hamano
@ 2021-09-15 17:39       ` Jeff King
  0 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 17:39 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

On Wed, Sep 15, 2021 at 10:33:32AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > When we see a line from the client like "command=ls-refs", we parse
> > everything after the equals sign as a capability, which we check against
> > our capabilities table. If we don't recognize the command (e.g.,
> > "command=foo"), we'll reject it. But we use the same parser that checks
> > for regular capabilities like "object-format=sha256". And so we'll
> > accept "ls-refs=foo", even though everything after the equals is bogus,
> > and simply ignored.
> 
> Maybe I am slow but I had to read the above a few times and finally
> look at the implementation of parse_command() to realize that what
> the last sentence describes is:
> 
>     When parse_command() is fed "command=ls-refs=foo", it strips
>     "command=", feeds "ls-refs=foo" to get_capability(), and because
>     we do not ensure value is NULL, we silently ignore "=foo" that
>     is bogus.
> 
> And it makes sense.  It would probably have helped if I peeked the
> updated test ;-)

Since I'm re-rolling anyway, I'll expand it a bit (and also cover Ævar's
"what exactly does violate mean here" question).

-Peff

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 0/11] limit memory allocations for v2 servers
  2021-09-14 23:51 ` [PATCH v2 0/11] limit " Jeff King
                     ` (11 preceding siblings ...)
  2021-09-15  4:17   ` [PATCH v2 0/11] limit memory allocations for v2 servers Taylor Blau
@ 2021-09-15 18:33   ` Jeff King
  2021-09-15 18:34     ` [PATCH v3 " Jeff King
  12 siblings, 1 reply; 77+ messages in thread
From: Jeff King @ 2021-09-15 18:33 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

OK, hopefully third time's the charm. The changes from v2 are pretty
minor:

 - test typos noticed by Eric (none affected the outcome of the tests)

 - I added "object-format=$(test_oid algo)" to the input of a few of the
   new tests even where it doesn't change the outcome, for consistency
   with the existing tests

 - dropped out-dated "collect" comment noticed by Junio

 - explained the command=ls-refs=foo case a little further, including
   specific references in how it violates the spec

Range-diff is below.

  [01/11]: serve: rename is_command() to parse_command()
  [02/11]: serve: return capability "value" from get_capability()
  [03/11]: serve: add "receive" method for v2 capabilities table
  [04/11]: serve: provide "receive" function for object-format capability
  [05/11]: serve: provide "receive" function for session-id capability
  [06/11]: serve: drop "keys" strvec
  [07/11]: ls-refs: ignore very long ref-prefix counts
  [08/11]: docs/protocol-v2: clarify some ls-refs ref-prefix details
  [09/11]: serve: reject bogus v2 "command=ls-refs=foo"
  [10/11]: serve: reject commands used as capabilities
  [11/11]: ls-refs: reject unknown arguments

 Documentation/technical/protocol-v2.txt |   6 +-
 ls-refs.c                               |  22 ++++-
 serve.c                                 | 120 +++++++++++++-----------
 t/t5701-git-serve.sh                    |  75 +++++++++++++++
 4 files changed, 164 insertions(+), 59 deletions(-)

 1:  e642ced1e8 =  1:  71003eb21a serve: rename is_command() to parse_command()
 2:  75d119ae49 =  2:  1e86c31477 serve: return capability "value" from get_capability()
 3:  9fb21cab1d =  3:  12a159d5c8 serve: add "receive" method for v2 capabilities table
 4:  5b661c9ed5 =  4:  f5c29b5cdf serve: provide "receive" function for object-format capability
 5:  34e0ce5c12 =  5:  063cb60d1e serve: provide "receive" function for session-id capability
 6:  8e42fa9aec !  6:  0ed0b946ea serve: drop "keys" strvec
    @@ serve.c: static int process_request(void)
      
      	packet_reader_init(&reader, 0, NULL, 0,
     @@ serve.c: static int process_request(void)
    - 			/* collect request; a sequence of keys and values */
    + 		case PACKET_READ_EOF:
    + 			BUG("Should have already died when seeing EOF");
    + 		case PACKET_READ_NORMAL:
    +-			/* collect request; a sequence of keys and values */
      			if (parse_command(reader.line, &command) ||
      			    receive_client_capability(reader.line))
     -				strvec_push(&keys, reader.line);
 7:  de7ac11ad3 !  7:  a9392d0e68 ls-refs: ignore very long ref-prefix counts
    @@ t/t5701-git-serve.sh: test_expect_success 'refs/heads prefix' '
     +	# from ls-refs.c.
     +	{
     +		echo command=ls-refs &&
    -+		echo object-format=$(test_oid algo)
    ++		echo object-format=$(test_oid algo) &&
     +		echo 0001 &&
     +		perl -le "print \"ref-prefix refs/heads/\$_\" for (1..65536)" &&
     +		echo 0000
 8:  3f78422fd3 =  8:  0a982f676a docs/protocol-v2: clarify some ls-refs ref-prefix details
 9:  6c55a7412d !  9:  553db6f83e serve: reject bogus v2 "command=ls-refs=foo"
    @@ Commit message
         When we see a line from the client like "command=ls-refs", we parse
         everything after the equals sign as a capability, which we check against
         our capabilities table. If we don't recognize the command (e.g.,
    -    "command=foo"), we'll reject it. But we use the same parser that checks
    -    for regular capabilities like "object-format=sha256". And so we'll
    -    accept "ls-refs=foo", even though everything after the equals is bogus,
    -    and simply ignored.
    +    "command=foo"), we'll reject it.
     
    -    This isn't really hurting anything, but the request does violate the
    -    spec. Let's tighten it up to prevent any surprising behavior.
    +    But in parse_command(), we use the same get_capability() parser for
    +    parsing non-command lines. So if we see "command=ls-refs=foo", we will
    +    feed "ls-refs=foo" to get_capability(), which will say "OK, that's
    +    ls-refs, with value 'foo'". But then we simply ignore the value
    +    entirely.
    +
    +    The client is violating the spec here, which says:
    +
    +          command = PKT-LINE("command=" key LF)
    +          key = 1*(ALPHA | DIGIT | "-_")
    +
    +    I.e., the key is not even allowed to have an equals sign in it. Whereas
    +    a real non-command capability does allow a value:
    +
    +          capability = PKT-LINE(key[=value] LF)
    +
    +    So by reusing the same get_capability() parser, we are mixing up the
    +    "key" and "capability" tokens. However, since that parser tells us
    +    whether it saw an "=", we can still use it; we just reject any input
    +    that produces a non-NULL value field.
    +
    +    The current behavior isn't really hurting anything (the client should
    +    never send such a request, and if it does, we just ignore the "value"
    +    part). But since it does violate the spec, let's tighten it up to
    +    prevent any surprising behavior.
     
         Signed-off-by: Jeff King <peff@peff.net>
     
    @@ t/t5701-git-serve.sh: test_expect_success 'request invalid command' '
      '
      
     +test_expect_success 'requested command is command=value' '
    -+	test-tool pkt-line pack >in <<-\EOF &&
    ++	test-tool pkt-line pack >in <<-EOF &&
     +	command=ls-refs=whatever
     +	object-format=$(test_oid algo)
     +	0000
10:  bbb12669b9 ! 10:  93216b9eaa serve: reject commands used as capabilities
    @@ t/t5701-git-serve.sh: test_expect_success 'request invalid command' '
      '
      
     +test_expect_success 'request capability as command' '
    -+	test-tool pkt-line pack >in <<-\EOF &&
    ++	test-tool pkt-line pack >in <<-EOF &&
     +	command=agent
    ++	object-format=$(test_oid algo)
     +	0000
     +	EOF
     +	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
     +	grep invalid.command.*agent err
     +'
     +
     +test_expect_success 'request command as capability' '
    -+	test-tool pkt-line pack >in <<-\EOF &&
    ++	test-tool pkt-line pack >in <<-EOF &&
     +	command=ls-refs
    ++	object-format=$(test_oid algo)
     +	fetch
     +	0000
     +	EOF
    @@ t/t5701-git-serve.sh: test_expect_success 'request invalid command' '
     +'
     +
      test_expect_success 'requested command is command=value' '
    - 	test-tool pkt-line pack >in <<-\EOF &&
    + 	test-tool pkt-line pack >in <<-EOF &&
      	command=ls-refs=whatever
11:  375a85b9a6 = 11:  30802eeb54 ls-refs: reject unknown arguments

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH v3 0/11] limit memory allocations for v2 servers
  2021-09-15 18:33   ` Jeff King
@ 2021-09-15 18:34     ` Jeff King
  2021-09-15 18:35       ` [PATCH v3 01/11] serve: rename is_command() to parse_command() Jeff King
                         ` (10 more replies)
  0 siblings, 11 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 18:34 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

On Wed, Sep 15, 2021 at 02:33:53PM -0400, Jeff King wrote:

> OK, hopefully third time's the charm. The changes from v2 are pretty
> minor:
> 
>  - test typos noticed by Eric (none affected the outcome of the tests)
> 
>  - I added "object-format=$(test_oid algo)" to the input of a few of the
>    new tests even where it doesn't change the outcome, for consistency
>    with the existing tests
> 
>  - dropped out-dated "collect" comment noticed by Junio
> 
>  - explained the command=ls-refs=foo case a little further, including
>    specific references in how it violates the spec
> 
> Range-diff is below.

Er, that should have been "v3" in the subject, as this one now is.

-Peff

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH v3 01/11] serve: rename is_command() to parse_command()
  2021-09-15 18:34     ` [PATCH v3 " Jeff King
@ 2021-09-15 18:35       ` Jeff King
  2021-09-15 18:35       ` [PATCH v3 02/11] serve: return capability "value" from get_capability() Jeff King
                         ` (9 subsequent siblings)
  10 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 18:35 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

The is_command() function not only tells us whether the pktline is a
valid command string, but it also parses out the command (and complains
if we see a duplicate). Let's rename it to make those extra functions
a bit more obvious.

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/serve.c b/serve.c
index 1817edc7f5..fd88b95343 100644
--- a/serve.c
+++ b/serve.c
@@ -163,7 +163,7 @@ static int is_valid_capability(const char *key)
 	return c && c->advertise(the_repository, NULL);
 }
 
-static int is_command(const char *key, struct protocol_capability **command)
+static int parse_command(const char *key, struct protocol_capability **command)
 {
 	const char *out;
 
@@ -251,7 +251,7 @@ static int process_request(void)
 			BUG("Should have already died when seeing EOF");
 		case PACKET_READ_NORMAL:
 			/* collect request; a sequence of keys and values */
-			if (is_command(reader.line, &command) ||
+			if (parse_command(reader.line, &command) ||
 			    is_valid_capability(reader.line))
 				strvec_push(&keys, reader.line);
 			else
-- 
2.33.0.917.g33ebf6a5f6


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 02/11] serve: return capability "value" from get_capability()
  2021-09-15 18:34     ` [PATCH v3 " Jeff King
  2021-09-15 18:35       ` [PATCH v3 01/11] serve: rename is_command() to parse_command() Jeff King
@ 2021-09-15 18:35       ` Jeff King
  2021-09-15 18:35       ` [PATCH v3 03/11] serve: add "receive" method for v2 capabilities table Jeff King
                         ` (8 subsequent siblings)
  10 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 18:35 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

When the client sends v2 capabilities, they may be simple, like:

  foo

or have a value like:

  foo=bar

(all of the current capabilities actually expect a value, but the
protocol allows for boolean ones).

We use get_capability() to make sure the client's pktline matches a
capability. In doing so, we parse enough to see the "=" and the value
(if any), but we immediately forget it. Nobody cares for now, because they end
up parsing the values out later using has_capability(). But in
preparation for changing that, let's pass back a pointer so the callers
know what we found.

Note that unlike has_capability(), we'll return NULL for a "simple"
capability. Distinguishing these will be useful for some future patches.

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/serve.c b/serve.c
index fd88b95343..78a4e83554 100644
--- a/serve.c
+++ b/serve.c
@@ -139,7 +139,7 @@ void protocol_v2_advertise_capabilities(void)
 	strbuf_release(&value);
 }
 
-static struct protocol_capability *get_capability(const char *key)
+static struct protocol_capability *get_capability(const char *key, const char **value)
 {
 	int i;
 
@@ -149,16 +149,25 @@ static struct protocol_capability *get_capability(const char *key)
 	for (i = 0; i < ARRAY_SIZE(capabilities); i++) {
 		struct protocol_capability *c = &capabilities[i];
 		const char *out;
-		if (skip_prefix(key, c->name, &out) && (!*out || *out == '='))
+		if (!skip_prefix(key, c->name, &out))
+			continue;
+		if (!*out) {
+			*value = NULL;
 			return c;
+		}
+		if (*out++ == '=') {
+			*value = out;
+			return c;
+		}
 	}
 
 	return NULL;
 }
 
 static int is_valid_capability(const char *key)
 {
-	const struct protocol_capability *c = get_capability(key);
+	const char *value;
+	const struct protocol_capability *c = get_capability(key, &value);
 
 	return c && c->advertise(the_repository, NULL);
 }
@@ -168,7 +177,8 @@ static int parse_command(const char *key, struct protocol_capability **command)
 	const char *out;
 
 	if (skip_prefix(key, "command=", &out)) {
-		struct protocol_capability *cmd = get_capability(out);
+		const char *value;
+		struct protocol_capability *cmd = get_capability(out, &value);
 
 		if (*command)
 			die("command '%s' requested after already requesting command '%s'",
-- 
2.33.0.917.g33ebf6a5f6


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 03/11] serve: add "receive" method for v2 capabilities table
  2021-09-15 18:34     ` [PATCH v3 " Jeff King
  2021-09-15 18:35       ` [PATCH v3 01/11] serve: rename is_command() to parse_command() Jeff King
  2021-09-15 18:35       ` [PATCH v3 02/11] serve: return capability "value" from get_capability() Jeff King
@ 2021-09-15 18:35       ` Jeff King
  2021-09-15 18:35       ` [PATCH v3 04/11] serve: provide "receive" function for object-format capability Jeff King
                         ` (7 subsequent siblings)
  10 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 18:35 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

We have a capabilities table that tells us what we should tell the
client we are capable of, and what to do when a client gives us a
particular command (e.g., "command=ls-refs"). But it doesn't tell us
what to do when the client sends us back a capability (e.g.,
"object-format=sha256"). We just collect them all in a strvec and hope
somebody can use them later.

Instead, let's provide a function pointer in the table to act on these.
This will eventually help us avoid collecting the strings, which will be
more efficient and less prone to mischief.

Using the new method is optional, which helps in two ways:

  - we can move existing capabilities over to this new system gradually
    in individual commits

  - some capabilities we don't actually do anything with anyway. For
    example, the client is free to say "agent=git/1.2.3" to us, but we
    do not act on the information in any way.

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/serve.c b/serve.c
index 78a4e83554..a161189984 100644
--- a/serve.c
+++ b/serve.c
@@ -70,6 +70,16 @@ struct protocol_capability {
 	 * This field should be NULL for capabilities which are not commands.
 	 */
 	int (*command)(struct repository *r, struct packet_reader *request);
+
+	/*
+	 * Function called when a client requests the capability as a
+	 * non-command. This may be NULL if the capability does nothing.
+	 *
+	 * For a capability of the form "foo=bar", the value string points to
+	 * the content after the "=" (i.e., "bar"). For simple capabilities
+	 * (just "foo"), it is NULL.
+	 */
+	void (*receive)(struct repository *r, const char *value);
 };
 
 static struct protocol_capability capabilities[] = {
@@ -164,12 +174,17 @@ static struct protocol_capability *get_capability(const char *key, const char **
 	return NULL;
 }
 
-static int is_valid_capability(const char *key)
+static int receive_client_capability(const char *key)
 {
 	const char *value;
 	const struct protocol_capability *c = get_capability(key, &value);
 
-	return c && c->advertise(the_repository, NULL);
+	if (!c || !c->advertise(the_repository, NULL))
+		return 0;
+
+	if (c->receive)
+		c->receive(the_repository, value);
+	return 1;
 }
 
 static int parse_command(const char *key, struct protocol_capability **command)
@@ -262,7 +277,7 @@ static int process_request(void)
 		case PACKET_READ_NORMAL:
 			/* collect request; a sequence of keys and values */
 			if (parse_command(reader.line, &command) ||
-			    is_valid_capability(reader.line))
+			    receive_client_capability(reader.line))
 				strvec_push(&keys, reader.line);
 			else
 				die("unknown capability '%s'", reader.line);
-- 
2.33.0.917.g33ebf6a5f6


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 04/11] serve: provide "receive" function for object-format capability
  2021-09-15 18:34     ` [PATCH v3 " Jeff King
                         ` (2 preceding siblings ...)
  2021-09-15 18:35       ` [PATCH v3 03/11] serve: add "receive" method for v2 capabilities table Jeff King
@ 2021-09-15 18:35       ` Jeff King
  2021-09-15 18:35       ` [PATCH v3 05/11] serve: provide "receive" function for session-id capability Jeff King
                         ` (6 subsequent siblings)
  10 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 18:35 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

We get any "object-format" specified by the client by searching for it
in the collected list of capabilities the client sent. We can instead
just handle it as soon as they send it. This is slightly more efficient,
and gets us one step closer to dropping that collected list.

Note that we do still have to do our final hash check after receiving
all capabilities (because they might not have sent an object-format line
at all, and we still have to check that the default matches our
repository algorithm). Since the check_algorithm() function would now be
down to a single if() statement, I've just inlined it in its only
caller.

There should be no change of behavior here, except for two
broken-protocol cases:

  - if the client sends multiple conflicting object-format capabilities
    (which they should not), we'll now choose the last one rather than
    the first. We could also detect and complain about the duplicates
    quite easily now, which we could not before, but I didn't do so
    here.

  - if the client sends a bogus "object-format" with no equals sign,
    we'll now say so, rather than "unknown object format: ''"

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/serve.c b/serve.c
index a161189984..f6ea2953eb 100644
--- a/serve.c
+++ b/serve.c
@@ -10,6 +10,7 @@
 #include "upload-pack.h"
 
 static int advertise_sid = -1;
+static int client_hash_algo = GIT_HASH_SHA1;
 
 static int always_advertise(struct repository *r,
 			    struct strbuf *value)
@@ -33,6 +34,17 @@ static int object_format_advertise(struct repository *r,
 	return 1;
 }
 
+static void object_format_receive(struct repository *r,
+				  const char *algo_name)
+{
+	if (!algo_name)
+		die("object-format capability requires an argument");
+
+	client_hash_algo = hash_algo_by_name(algo_name);
+	if (client_hash_algo == GIT_HASH_UNKNOWN)
+		die("unknown object format '%s'", algo_name);
+}
+
 static int session_id_advertise(struct repository *r, struct strbuf *value)
 {
 	if (advertise_sid == -1 &&
@@ -104,6 +116,7 @@ static struct protocol_capability capabilities[] = {
 	{
 		.name = "object-format",
 		.advertise = object_format_advertise,
+		.receive = object_format_receive,
 	},
 	{
 		.name = "session-id",
@@ -228,22 +241,6 @@ static int has_capability(const struct strvec *keys, const char *capability,
 	return 0;
 }
 
-static void check_algorithm(struct repository *r, struct strvec *keys)
-{
-	int client = GIT_HASH_SHA1, server = hash_algo_by_ptr(r->hash_algo);
-	const char *algo_name;
-
-	if (has_capability(keys, "object-format", &algo_name)) {
-		client = hash_algo_by_name(algo_name);
-		if (client == GIT_HASH_UNKNOWN)
-			die("unknown object format '%s'", algo_name);
-	}
-
-	if (client != server)
-		die("mismatched object format: server %s; client %s\n",
-		    r->hash_algo->name, hash_algos[client].name);
-}
-
 enum request_state {
 	PROCESS_REQUEST_KEYS,
 	PROCESS_REQUEST_DONE,
@@ -317,7 +314,10 @@ static int process_request(void)
 	if (!command)
 		die("no command requested");
 
-	check_algorithm(the_repository, &keys);
+	if (client_hash_algo != hash_algo_by_ptr(the_repository->hash_algo))
+		die("mismatched object format: server %s; client %s\n",
+		    the_repository->hash_algo->name,
+		    hash_algos[client_hash_algo].name);
 
 	if (has_capability(&keys, "session-id", &client_sid))
 		trace2_data_string("transfer", NULL, "client-sid", client_sid);
-- 
2.33.0.917.g33ebf6a5f6


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 05/11] serve: provide "receive" function for session-id capability
  2021-09-15 18:34     ` [PATCH v3 " Jeff King
                         ` (3 preceding siblings ...)
  2021-09-15 18:35       ` [PATCH v3 04/11] serve: provide "receive" function for object-format capability Jeff King
@ 2021-09-15 18:35       ` Jeff King
  2021-09-15 18:35       ` [PATCH v3 06/11] serve: drop "keys" strvec Jeff King
                         ` (5 subsequent siblings)
  10 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 18:35 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

Rather than pulling the session-id string from the list of collected
capabilities, we can handle it as soon as we receive it. This gets us
closer to dropping the collected list entirely.

The behavior should be the same, with one exception. Previously if the
client sent us multiple session-id lines, we'd report only the first.
Now we'll pass each one along to trace2. This shouldn't matter in
practice, since clients shouldn't do that (and if they do, it's probably
sensible to log them all).

As this removes the last caller of the static has_capability(), we can
remove it, as well (and in fact we must to avoid -Wunused-function
complaining).

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c | 33 +++++++++------------------------
 1 file changed, 9 insertions(+), 24 deletions(-)

diff --git a/serve.c b/serve.c
index f6ea2953eb..6bbf54cbbe 100644
--- a/serve.c
+++ b/serve.c
@@ -57,6 +57,14 @@ static int session_id_advertise(struct repository *r, struct strbuf *value)
 	return 1;
 }
 
+static void session_id_receive(struct repository *r,
+			       const char *client_sid)
+{
+	if (!client_sid)
+		client_sid = "";
+	trace2_data_string("transfer", NULL, "client-sid", client_sid);
+}
+
 struct protocol_capability {
 	/*
 	 * The name of the capability.  The server uses this name when
@@ -121,6 +129,7 @@ static struct protocol_capability capabilities[] = {
 	{
 		.name = "session-id",
 		.advertise = session_id_advertise,
+		.receive = session_id_receive,
 	},
 	{
 		.name = "object-info",
@@ -221,26 +230,6 @@ static int parse_command(const char *key, struct protocol_capability **command)
 	return 0;
 }
 
-static int has_capability(const struct strvec *keys, const char *capability,
-			  const char **value)
-{
-	int i;
-	for (i = 0; i < keys->nr; i++) {
-		const char *out;
-		if (skip_prefix(keys->v[i], capability, &out) &&
-		    (!*out || *out == '=')) {
-			if (value) {
-				if (*out == '=')
-					out++;
-				*value = out;
-			}
-			return 1;
-		}
-	}
-
-	return 0;
-}
-
 enum request_state {
 	PROCESS_REQUEST_KEYS,
 	PROCESS_REQUEST_DONE,
@@ -252,7 +241,6 @@ static int process_request(void)
 	struct packet_reader reader;
 	struct strvec keys = STRVEC_INIT;
 	struct protocol_capability *command = NULL;
-	const char *client_sid;
 
 	packet_reader_init(&reader, 0, NULL, 0,
 			   PACKET_READ_CHOMP_NEWLINE |
@@ -319,9 +307,6 @@ static int process_request(void)
 		    the_repository->hash_algo->name,
 		    hash_algos[client_hash_algo].name);
 
-	if (has_capability(&keys, "session-id", &client_sid))
-		trace2_data_string("transfer", NULL, "client-sid", client_sid);
-
 	command->command(the_repository, &reader);
 
 	strvec_clear(&keys);
-- 
2.33.0.917.g33ebf6a5f6


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 06/11] serve: drop "keys" strvec
  2021-09-15 18:34     ` [PATCH v3 " Jeff King
                         ` (4 preceding siblings ...)
  2021-09-15 18:35       ` [PATCH v3 05/11] serve: provide "receive" function for session-id capability Jeff King
@ 2021-09-15 18:35       ` Jeff King
  2021-09-15 18:35       ` [PATCH v3 07/11] ls-refs: ignore very long ref-prefix counts Jeff King
                         ` (4 subsequent siblings)
  10 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 18:35 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

We collect the set of capabilities the client sends us in a strvec.
While this is usually small, there's no limit to the number of
capabilities the client can send us (e.g., they could just send us
"agent" pkt-lines over and over, and we'd keep adding them to the list).

Since all code has been converted away from using this list, let's get
rid of it. This avoids a potential attack where clients waste our
memory.

Note that we do have to replace it with a flag, because some of the
flush-packet logic checks whether we've seen any valid commands or keys.

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/serve.c b/serve.c
index 6bbf54cbbe..1a7c8a118f 100644
--- a/serve.c
+++ b/serve.c
@@ -239,7 +239,7 @@ static int process_request(void)
 {
 	enum request_state state = PROCESS_REQUEST_KEYS;
 	struct packet_reader reader;
-	struct strvec keys = STRVEC_INIT;
+	int seen_capability_or_command = 0;
 	struct protocol_capability *command = NULL;
 
 	packet_reader_init(&reader, 0, NULL, 0,
@@ -260,10 +260,9 @@ static int process_request(void)
 		case PACKET_READ_EOF:
 			BUG("Should have already died when seeing EOF");
 		case PACKET_READ_NORMAL:
-			/* collect request; a sequence of keys and values */
 			if (parse_command(reader.line, &command) ||
 			    receive_client_capability(reader.line))
-				strvec_push(&keys, reader.line);
+				seen_capability_or_command = 1;
 			else
 				die("unknown capability '%s'", reader.line);
 
@@ -275,7 +274,7 @@ static int process_request(void)
 			 * If no command and no keys were given then the client
 			 * wanted to terminate the connection.
 			 */
-			if (!keys.nr)
+			if (!seen_capability_or_command)
 				return 1;
 
 			/*
@@ -309,7 +308,6 @@ static int process_request(void)
 
 	command->command(the_repository, &reader);
 
-	strvec_clear(&keys);
 	return 0;
 }
 
-- 
2.33.0.917.g33ebf6a5f6


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 07/11] ls-refs: ignore very long ref-prefix counts
  2021-09-15 18:34     ` [PATCH v3 " Jeff King
                         ` (5 preceding siblings ...)
  2021-09-15 18:35       ` [PATCH v3 06/11] serve: drop "keys" strvec Jeff King
@ 2021-09-15 18:35       ` Jeff King
  2021-09-15 18:35       ` [PATCH v3 08/11] docs/protocol-v2: clarify some ls-refs ref-prefix details Jeff King
                         ` (3 subsequent siblings)
  10 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 18:35 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

Because each "ref-prefix" capability from the client comes in its own
pkt-line, there's no limit to the number of them that a misbehaving
client may send. We read them all into a strvec, which means the client
can waste arbitrary amounts of our memory by just sending us "ref-prefix
foo" over and over.

One possible solution is to just drop the connection when the limit is
reached. If we set it high enough, then only misbehaving or malicious
clients would hit it. But "high enough" is vague, and it's unfriendly if
we guess wrong and a legitimate client hits this.

But we can do better. Since supporting the ref-prefix capability is
optional anyway, the client has to further cull the response based on
their own patterns. So we can simply ignore the patterns once we cross a
certain threshold. Note that we have to ignore _all_ patterns, not just
the ones past our limit (since otherwise we'd send too little data).

The limit here is fairly arbitrary, and probably much higher than anyone
would need in practice. It might be worth limiting it further, if only
because we check it linearly (so with "m" local refs and "n" patterns,
we do "m * n" string comparisons). But if we care about optimizing this,
an even better solution may be a more advanced data structure anyway.

I didn't bother making the limit configurable, since it's so high and
since Git should behave correctly in either case. It wouldn't be too
hard to do, but it makes both the code and documentation more complex.

Signed-off-by: Jeff King <peff@peff.net>
---
 ls-refs.c            | 20 ++++++++++++++++++--
 t/t5701-git-serve.sh | 31 +++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/ls-refs.c b/ls-refs.c
index a1a0250607..18c4f41e87 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -40,6 +40,12 @@ static void ensure_config_read(void)
 	config_read = 1;
 }
 
+/*
+ * If we see this many or more "ref-prefix" lines from the client, we consider
+ * it "too many" and will avoid using the prefix feature entirely.
+ */
+#define TOO_MANY_PREFIXES 65536
+
 /*
  * Check if one of the prefixes is a prefix of the ref.
  * If no prefixes were provided, all refs match.
@@ -156,15 +162,25 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 			data.peel = 1;
 		else if (!strcmp("symrefs", arg))
 			data.symrefs = 1;
-		else if (skip_prefix(arg, "ref-prefix ", &out))
-			strvec_push(&data.prefixes, out);
+		else if (skip_prefix(arg, "ref-prefix ", &out)) {
+			if (data.prefixes.nr < TOO_MANY_PREFIXES)
+				strvec_push(&data.prefixes, out);
+		}
 		else if (!strcmp("unborn", arg))
 			data.unborn = allow_unborn;
 	}
 
 	if (request->status != PACKET_READ_FLUSH)
 		die(_("expected flush after ls-refs arguments"));
 
+	/*
+	 * If we saw too many prefixes, we must avoid using them at all; as
+	 * soon as we have any prefix, they are meant to form a comprehensive
+	 * list.
+	 */
+	if (data.prefixes.nr >= TOO_MANY_PREFIXES)
+		strvec_clear(&data.prefixes);
+
 	send_possibly_unborn_head(&data);
 	if (!data.prefixes.nr)
 		strvec_push(&data.prefixes, "");
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index 930721f053..520672f842 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -158,6 +158,37 @@ test_expect_success 'refs/heads prefix' '
 	test_cmp expect actual
 '
 
+test_expect_success 'ignore very large set of prefixes' '
+	# generate a large number of ref-prefixes that we expect
+	# to match nothing; the value here exceeds TOO_MANY_PREFIXES
+	# from ls-refs.c.
+	{
+		echo command=ls-refs &&
+		echo object-format=$(test_oid algo) &&
+		echo 0001 &&
+		perl -le "print \"ref-prefix refs/heads/\$_\" for (1..65536)" &&
+		echo 0000
+	} |
+	test-tool pkt-line pack >in &&
+
+	# and then confirm that we see unmatched prefixes anyway (i.e.,
+	# that the prefix was not applied).
+	cat >expect <<-EOF &&
+	$(git rev-parse HEAD) HEAD
+	$(git rev-parse refs/heads/dev) refs/heads/dev
+	$(git rev-parse refs/heads/main) refs/heads/main
+	$(git rev-parse refs/heads/release) refs/heads/release
+	$(git rev-parse refs/tags/annotated-tag) refs/tags/annotated-tag
+	$(git rev-parse refs/tags/one) refs/tags/one
+	$(git rev-parse refs/tags/two) refs/tags/two
+	0000
+	EOF
+
+	test-tool serve-v2 --stateless-rpc <in >out &&
+	test-tool pkt-line unpack <out >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'peel parameter' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=ls-refs
-- 
2.33.0.917.g33ebf6a5f6


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 08/11] docs/protocol-v2: clarify some ls-refs ref-prefix details
  2021-09-15 18:34     ` [PATCH v3 " Jeff King
                         ` (6 preceding siblings ...)
  2021-09-15 18:35       ` [PATCH v3 07/11] ls-refs: ignore very long ref-prefix counts Jeff King
@ 2021-09-15 18:35       ` Jeff King
  2021-09-15 18:36       ` [PATCH v3 09/11] serve: reject bogus v2 "command=ls-refs=foo" Jeff King
                         ` (2 subsequent siblings)
  10 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 18:35 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

We've never documented the fact that a client can provide multiple
ref-prefix capabilities. Let's describe the behavior.

We also never discussed the "best effort" nature of the prefixes. The
client side of git.git has always treated them this way, filtering the
result with local patterns. And indeed any client must do this, because
the prefix patterns are not sufficient to express the usual refspecs
(and so for "foo" we ask for "refs/heads/foo", "refs/tags/foo", and so
on).

So this may be considered a change in the spec with respect to client
expectations / requirements, but it's mostly codifying existing
behavior.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/technical/protocol-v2.txt | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index 213538f1d0..9347b2ad13 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -193,7 +193,11 @@ ls-refs takes in the following arguments:
 	Show peeled tags.
     ref-prefix <prefix>
 	When specified, only references having a prefix matching one of
-	the provided prefixes are displayed.
+	the provided prefixes are displayed. Multiple instances may be
+	given, in which case references matching any prefix will be
+	shown. Note that this is purely for optimization; a server MAY
+	show refs not matching the prefix if it chooses, and clients
+	should filter the result themselves.
 
 If the 'unborn' feature is advertised the following argument can be
 included in the client's request.
-- 
2.33.0.917.g33ebf6a5f6


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 09/11] serve: reject bogus v2 "command=ls-refs=foo"
  2021-09-15 18:34     ` [PATCH v3 " Jeff King
                         ` (7 preceding siblings ...)
  2021-09-15 18:35       ` [PATCH v3 08/11] docs/protocol-v2: clarify some ls-refs ref-prefix details Jeff King
@ 2021-09-15 18:36       ` Jeff King
  2021-09-15 18:36       ` [PATCH v3 10/11] serve: reject commands used as capabilities Jeff King
  2021-09-15 18:36       ` [PATCH v3 11/11] ls-refs: reject unknown arguments Jeff King
  10 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 18:36 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

When we see a line from the client like "command=ls-refs", we parse
everything after the equals sign as a capability, which we check against
our capabilities table. If we don't recognize the command (e.g.,
"command=foo"), we'll reject it.

But in parse_command(), we use the same get_capability() parser for
parsing non-command lines. So if we see "command=ls-refs=foo", we will
feed "ls-refs=foo" to get_capability(), which will say "OK, that's
ls-refs, with value 'foo'". But then we simply ignore the value
entirely.

The client is violating the spec here, which says:

      command = PKT-LINE("command=" key LF)
      key = 1*(ALPHA | DIGIT | "-_")

I.e., the key is not even allowed to have an equals sign in it. Whereas
a real non-command capability does allow a value:

      capability = PKT-LINE(key[=value] LF)

So by reusing the same get_capability() parser, we are mixing up the
"key" and "capability" tokens. However, since that parser tells us
whether it saw an "=", we can still use it; we just need to reject any
input that produces a non-NULL value field.

The current behavior isn't really hurting anything (the client should
never send such a request, and if it does, we just ignore the "value"
part). But since it does violate the spec, let's tighten it up to
prevent any surprising behavior.

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c              |  2 +-
 t/t5701-git-serve.sh | 10 ++++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/serve.c b/serve.c
index 1a7c8a118f..db5ecfed2d 100644
--- a/serve.c
+++ b/serve.c
@@ -220,7 +220,7 @@ static int parse_command(const char *key, struct protocol_capability **command)
 		if (*command)
 			die("command '%s' requested after already requesting command '%s'",
 			    out, (*command)->name);
-		if (!cmd || !cmd->advertise(the_repository, NULL) || !cmd->command)
+		if (!cmd || !cmd->advertise(the_repository, NULL) || !cmd->command || value)
 			die("invalid command '%s'", out);
 
 		*command = cmd;
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index 520672f842..2e51886def 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -72,6 +72,16 @@ test_expect_success 'request invalid command' '
 	test_i18ngrep "invalid command" err
 '
 
+test_expect_success 'requested command is command=value' '
+	test-tool pkt-line pack >in <<-EOF &&
+	command=ls-refs=whatever
+	object-format=$(test_oid algo)
+	0000
+	EOF
+	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
+	grep invalid.command.*ls-refs=whatever err
+'
+
 test_expect_success 'wrong object-format' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=fetch
-- 
2.33.0.917.g33ebf6a5f6


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 10/11] serve: reject commands used as capabilities
  2021-09-15 18:34     ` [PATCH v3 " Jeff King
                         ` (8 preceding siblings ...)
  2021-09-15 18:36       ` [PATCH v3 09/11] serve: reject bogus v2 "command=ls-refs=foo" Jeff King
@ 2021-09-15 18:36       ` Jeff King
  2021-09-15 18:36       ` [PATCH v3 11/11] ls-refs: reject unknown arguments Jeff King
  10 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 18:36 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

Our table of v2 "capabilities" contains everything we might tell the
client we support. But there are differences in how we expect the client
to respond. Some of the entries are true capabilities (i.e., we expect
the client to say "yes, I support this"), and some are ones we expect
them to send as commands (with "command=ls-refs" or similar).

When we receive a capability used as a command, we complain about that.
But when we receive a command used as a capability (e.g., just "ls-refs"
in a pkt-line by itself), we silently ignore it.

This isn't really hurting anything (clients shouldn't send it, and we'll
ignore it), but we can tighten up the protocol to match what we expect
to happen.

There are two new tests here. The first one checks a capability used as
a command, which already passes. The second tests a command as a
capability, which this patch fixes.

Signed-off-by: Jeff King <peff@peff.net>
---
 serve.c              |  2 +-
 t/t5701-git-serve.sh | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/serve.c b/serve.c
index db5ecfed2d..b3fe9b5126 100644
--- a/serve.c
+++ b/serve.c
@@ -201,7 +201,7 @@ static int receive_client_capability(const char *key)
 	const char *value;
 	const struct protocol_capability *c = get_capability(key, &value);
 
-	if (!c || !c->advertise(the_repository, NULL))
+	if (!c || c->command || !c->advertise(the_repository, NULL))
 		return 0;
 
 	if (c->receive)
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index 2e51886def..3928424e1b 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -72,6 +72,27 @@ test_expect_success 'request invalid command' '
 	test_i18ngrep "invalid command" err
 '
 
+test_expect_success 'request capability as command' '
+	test-tool pkt-line pack >in <<-EOF &&
+	command=agent
+	object-format=$(test_oid algo)
+	0000
+	EOF
+	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
+	grep invalid.command.*agent err
+'
+
+test_expect_success 'request command as capability' '
+	test-tool pkt-line pack >in <<-EOF &&
+	command=ls-refs
+	object-format=$(test_oid algo)
+	fetch
+	0000
+	EOF
+	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
+	grep unknown.capability err
+'
+
 test_expect_success 'requested command is command=value' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=ls-refs=whatever
-- 
2.33.0.917.g33ebf6a5f6


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 11/11] ls-refs: reject unknown arguments
  2021-09-15 18:34     ` [PATCH v3 " Jeff King
                         ` (9 preceding siblings ...)
  2021-09-15 18:36       ` [PATCH v3 10/11] serve: reject commands used as capabilities Jeff King
@ 2021-09-15 18:36       ` Jeff King
  10 siblings, 0 replies; 77+ messages in thread
From: Jeff King @ 2021-09-15 18:36 UTC (permalink / raw)
  To: git
  Cc: Eric Sunshine, Junio C Hamano, Taylor Blau, Martin Ågren,
	Ævar Arnfjörð Bjarmason

The v2 ls-refs command may receive extra arguments from the client, one
per pkt-line. The spec is pretty clear that the arguments must come from
a specified set, but we silently ignore any unknown entries. For a
well-behaved client this doesn't matter, but it makes testing and
debugging more confusing. Let's tighten this up to match the spec.

In theory this liberal behavior _could_ be useful for extending the
protocol. But:

  - every other part of the protocol requires that the server first
    indicate that it supports the argument; this includes the fetch and
    object-info commands, plus the "unborn" capability added to ls-refs
    itself

  - it's not a very good extension mechanism anyway; without the server
    advertising support, clients would have no idea if the argument was
    silently ignored, or accepted and simply had no effect

So we're not really losing anything by tightening this.

Signed-off-by: Jeff King <peff@peff.net>
---
 ls-refs.c            |  2 ++
 t/t5701-git-serve.sh | 13 +++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/ls-refs.c b/ls-refs.c
index 18c4f41e87..460ac9b229 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -168,6 +168,8 @@ int ls_refs(struct repository *r, struct packet_reader *request)
 		}
 		else if (!strcmp("unborn", arg))
 			data.unborn = allow_unborn;
+		else
+			die(_("unexpected line: '%s'"), arg);
 	}
 
 	if (request->status != PACKET_READ_FLUSH)
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index 3928424e1b..aa1827d841 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -147,6 +147,19 @@ test_expect_success 'basics of ls-refs' '
 	test_cmp expect actual
 '
 
+test_expect_success 'ls-refs complains about unknown options' '
+	test-tool pkt-line pack >in <<-EOF &&
+	command=ls-refs
+	object-format=$(test_oid algo)
+	0001
+	no-such-arg
+	0000
+	EOF
+
+	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
+	grep unexpected.line.*no-such-arg err
+'
+
 test_expect_success 'basic ref-prefixes' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=ls-refs
-- 
2.33.0.917.g33ebf6a5f6

^ permalink raw reply related	[flat|nested] 77+ messages in thread

end of thread, other threads:[~2021-09-15 18:36 UTC | newest]

Thread overview: 77+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-14 15:29 [PATCH 0/9] reducing memory allocations for v2 servers Jeff King
2021-09-14 15:30 ` [PATCH 1/9] serve: rename is_command() to parse_command() Jeff King
2021-09-14 15:30 ` [PATCH 2/9] serve: return capability "value" from get_capability() Jeff King
2021-09-14 15:31 ` [PATCH 3/9] serve: add "receive" method for v2 capabilities table Jeff King
2021-09-14 15:31 ` [PATCH 4/9] serve: provide "receive" function for object-format capability Jeff King
2021-09-14 18:59   ` Martin Ågren
2021-09-14 15:33 ` [PATCH 5/9] serve: provide "receive" function for session-id capability Jeff King
2021-09-14 16:55   ` Taylor Blau
2021-09-14 17:06     ` Jeff King
2021-09-14 17:12       ` Taylor Blau
2021-09-14 19:02   ` Martin Ågren
2021-09-14 19:14     ` Jeff King
2021-09-14 15:33 ` [PATCH 6/9] serve: drop "keys" strvec Jeff King
2021-09-14 16:59   ` Taylor Blau
2021-09-14 17:16     ` Jeff King
2021-09-14 15:37 ` [PATCH 7/9] ls-refs: ignore very long ref-prefix counts Jeff King
2021-09-14 17:18   ` Taylor Blau
2021-09-14 17:27     ` Jeff King
2021-09-14 17:23   ` Jeff King
2021-09-14 19:06   ` Martin Ågren
2021-09-14 19:22     ` Jeff King
2021-09-14 22:09   ` Jeff King
2021-09-14 22:11     ` Taylor Blau
2021-09-14 22:15       ` Jeff King
2021-09-14 15:37 ` [PATCH 8/9] serve: reject bogus v2 "command=ls-refs=foo" Jeff King
2021-09-14 17:21   ` Taylor Blau
2021-09-14 15:37 ` [PATCH 9/9] serve: reject commands used as capabilities Jeff King
2021-09-14 17:30 ` [PATCH 0/9] reducing memory allocations for v2 servers Taylor Blau
2021-09-14 18:00 ` Junio C Hamano
2021-09-14 18:38   ` Jeff King
2021-09-14 23:51 ` [PATCH v2 0/11] limit " Jeff King
2021-09-14 23:51   ` [PATCH v2 01/11] serve: rename is_command() to parse_command() Jeff King
2021-09-14 23:51   ` [PATCH v2 02/11] serve: return capability "value" from get_capability() Jeff King
2021-09-14 23:51   ` [PATCH v2 03/11] serve: add "receive" method for v2 capabilities table Jeff King
2021-09-15  0:31     ` Ævar Arnfjörð Bjarmason
2021-09-15 16:35       ` Jeff King
2021-09-15 16:41     ` Junio C Hamano
2021-09-15 16:57       ` Jeff King
2021-09-14 23:51   ` [PATCH v2 04/11] serve: provide "receive" function for object-format capability Jeff King
2021-09-15 16:54     ` Junio C Hamano
2021-09-14 23:51   ` [PATCH v2 05/11] serve: provide "receive" function for session-id capability Jeff King
2021-09-15 16:56     ` Junio C Hamano
2021-09-14 23:51   ` [PATCH v2 06/11] serve: drop "keys" strvec Jeff King
2021-09-15 17:01     ` Junio C Hamano
2021-09-14 23:51   ` [PATCH v2 07/11] ls-refs: ignore very long ref-prefix counts Jeff King
2021-09-15  4:16     ` Taylor Blau
2021-09-15 16:39       ` Jeff King
2021-09-15  5:00     ` Eric Sunshine
2021-09-15 16:40       ` Jeff King
2021-09-14 23:52   ` [PATCH v2 08/11] docs/protocol-v2: clarify some ls-refs ref-prefix details Jeff King
2021-09-14 23:52   ` [PATCH v2 09/11] serve: reject bogus v2 "command=ls-refs=foo" Jeff King
2021-09-15  0:27     ` Ævar Arnfjörð Bjarmason
2021-09-15 16:28       ` Jeff King
2021-09-15  5:09     ` Eric Sunshine
2021-09-15 16:32       ` Jeff King
2021-09-15 17:33     ` Junio C Hamano
2021-09-15 17:39       ` Jeff King
2021-09-14 23:52   ` [PATCH v2 10/11] serve: reject commands used as capabilities Jeff King
2021-09-14 23:54   ` [PATCH v2 11/11] ls-refs: reject unknown arguments Jeff King
2021-09-15  0:09     ` Ævar Arnfjörð Bjarmason
2021-09-15 16:25       ` Jeff King
2021-09-15  4:17   ` [PATCH v2 0/11] limit memory allocations for v2 servers Taylor Blau
2021-09-15 18:33   ` Jeff King
2021-09-15 18:34     ` [PATCH v3 " Jeff King
2021-09-15 18:35       ` [PATCH v3 01/11] serve: rename is_command() to parse_command() Jeff King
2021-09-15 18:35       ` [PATCH v3 02/11] serve: return capability "value" from get_capability() Jeff King
2021-09-15 18:35       ` [PATCH v3 03/11] serve: add "receive" method for v2 capabilities table Jeff King
2021-09-15 18:35       ` [PATCH v3 04/11] serve: provide "receive" function for object-format capability Jeff King
2021-09-15 18:35       ` [PATCH v3 05/11] serve: provide "receive" function for session-id capability Jeff King
2021-09-15 18:35       ` [PATCH v3 06/11] serve: drop "keys" strvec Jeff King
2021-09-15 18:35       ` [PATCH v3 07/11] ls-refs: ignore very long ref-prefix counts Jeff King
2021-09-15 18:35       ` [PATCH v3 08/11] docs/protocol-v2: clarify some ls-refs ref-prefix details Jeff King
2021-09-15 18:36       ` [PATCH v3 09/11] serve: reject bogus v2 "command=ls-refs=foo" Jeff King
2021-09-15 18:36       ` [PATCH v3 10/11] serve: reject commands used as capabilities Jeff King
2021-09-15 18:36       ` [PATCH v3 11/11] ls-refs: reject unknown arguments Jeff King
2021-09-15  0:25 ` [PATCH 0/9] reducing memory allocations for v2 servers Ævar Arnfjörð Bjarmason
2021-09-15 16:41   ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).