git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] promisor-remote: fix segfault when remote URL is missing
@ 2025-03-10  7:40 Christian Couder
  2025-03-10 16:29 ` Junio C Hamano
  2025-03-11 15:24 ` [PATCH v2] " Christian Couder
  0 siblings, 2 replies; 35+ messages in thread
From: Christian Couder @ 2025-03-10  7:40 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder, Christian Couder

Pushing NULL into a 'strvec' results in a segfault, as 'strvec' is a
kind of NULL terminated array that is designed to be compatible with
'argv' variables used on the command line.

So when an URL is missing from the config, let's push an empty string
instead of NULL into the 'strvec' that stores URLs.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---

This is a fix for cc/lop-remote. Sorry for sending it late in the cycle.

 promisor-remote.c                     | 10 +++++++---
 t/t5710-promisor-remote-capability.sh | 16 ++++++++++++++++
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/promisor-remote.c b/promisor-remote.c
index 6a0a61382f..56567c6e45 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -323,11 +323,15 @@ static void promisor_info_vecs(struct repository *repo,
 	promisor_remote_init(repo);
 
 	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
-		char *url;
+		char *url = NULL;
+		const char *url_pushed = "";
 		char *url_key = xstrfmt("remote.%s.url", r->name);
 
+		if (!git_config_get_string(url_key, &url) && url)
+			url_pushed = url;
+
 		strvec_push(names, r->name);
-		strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
+		strvec_push(urls, url_pushed);
 
 		free(url);
 		free(url_key);
@@ -356,7 +360,7 @@ char *promisor_remote_info(struct repository *repo)
 			strbuf_addch(&sb, ';');
 		strbuf_addstr(&sb, "name=");
 		strbuf_addstr_urlencode(&sb, names.v[i], allow_unsanitized);
-		if (urls.v[i]) {
+		if (urls.v[i] && *urls.v[i]) {
 			strbuf_addstr(&sb, ",url=");
 			strbuf_addstr_urlencode(&sb, urls.v[i], allow_unsanitized);
 		}
diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index d2cc69a17e..d9c5676e4d 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -193,6 +193,22 @@ test_expect_success "clone with 'KnownName' and different remote names" '
 	initialize_server 1 "$oid"
 '
 
+test_expect_success "clone with 'KnownName' and missing URL in the config" '
+	git -C server config promisor.advertise true &&
+
+	# Clone from server to create a client
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c promisor.acceptfromserver=KnownName \
+		--no-local --filter="blob:limit=5k" server client &&
+	test_when_finished "rm -rf client" &&
+
+	# Check that the largest object is not missing on the server
+	check_missing_objects server 0 "" &&
+
+	# Reinitialize server so that the largest object is missing again
+	initialize_server 1 "$oid"
+'
+
 test_expect_success "clone with promisor.acceptfromserver set to 'KnownUrl'" '
 	git -C server config promisor.advertise true &&
 
-- 
2.49.0.rc1.84.gc0d5bab425.dirty



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH] promisor-remote: fix segfault when remote URL is missing
  2025-03-10  7:40 [PATCH] promisor-remote: fix segfault when remote URL is missing Christian Couder
@ 2025-03-10 16:29 ` Junio C Hamano
  2025-03-11 15:24   ` Christian Couder
  2025-03-11 15:24 ` [PATCH v2] " Christian Couder
  1 sibling, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2025-03-10 16:29 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> Pushing NULL into a 'strvec' results in a segfault, as 'strvec' is a
> kind of NULL terminated array that is designed to be compatible with
> 'argv' variables used on the command line.

It is good that you corrected a caller that tries to make a strvec
into such a state, but shouldn't strvec_push() protect itself with a
BUG or something, I have to wonder.

> So when an URL is missing from the config, let's push an empty string
> instead of NULL into the 'strvec' that stores URLs.

How will these strings in the "names" strvec used?  When URLs are
present, I'm sure we are going to use it as URL, but when they are
missing, what happens?  The second hunk of this patch seems to
ignore such a broken entry with an empty URL, but that smells like
sweeping a problem under the rug.  Shouldn't such a promisor be
either diagnosed as an error, or skipped when you accumulate URLs to
be used into the strvec, so that the later code (i.e. the second
hunk) does not even have to worry about it?

> @@ -356,7 +360,7 @@ char *promisor_remote_info(struct repository *repo)
>  			strbuf_addch(&sb, ';');
>  		strbuf_addstr(&sb, "name=");
>  		strbuf_addstr_urlencode(&sb, names.v[i], allow_unsanitized);
> -		if (urls.v[i]) {
> +		if (urls.v[i] && *urls.v[i]) {

Why does urls.v[i] need to be checked?  Didn't you just make sure
that the strvec would not have NULL in it?


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v2] promisor-remote: fix segfault when remote URL is missing
  2025-03-10  7:40 [PATCH] promisor-remote: fix segfault when remote URL is missing Christian Couder
  2025-03-10 16:29 ` Junio C Hamano
@ 2025-03-11 15:24 ` Christian Couder
  2025-03-11 16:59   ` Junio C Hamano
                     ` (3 more replies)
  1 sibling, 4 replies; 35+ messages in thread
From: Christian Couder @ 2025-03-11 15:24 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder, Christian Couder

Using strvec_push() to push `NULL` into a 'strvec' results in a
segfault, because `xstrdup(NULL)` crashes.

So when an URL is missing from the config, let's push an empty string
instead of `NULL` into the 'strvec' that stores URLs.

We could have modified strvec_push() to behave like
strvec_push_nodup() and accept `NULL`, but it's not clear that it's
the right thing to do for the strvec API. 'strvec' is a kind of NULL
terminated array that is designed to be compatible with 'argv'
variables used on the command line. So we might want to disallow
pushing any `NULL` in it instead.

It's also not clear if `xstrdup(NULL)` should crash or BUG or just
return NULL.

For all these reasons, let's just focus on fixing the issue in
"promisor-remote.c" and let's leave improving the strvec API and/or
xstrdup() for a future effort.

While at it let's warn and reject the remote, in the 'KnownUrl'
case, when no URL is advertised by the server or no URL is
configured on the client for a remote name advertised by the server
and configured on the client. This is on par with a warning already
emitted when URLs are different in the same case.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 promisor-remote.c                     | 20 +++++++++++---
 t/t5710-promisor-remote-capability.sh | 39 +++++++++++++++++++++++++++
 2 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/promisor-remote.c b/promisor-remote.c
index 6a0a61382f..92786d72b4 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -323,11 +323,15 @@ static void promisor_info_vecs(struct repository *repo,
 	promisor_remote_init(repo);
 
 	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
-		char *url;
+		char *url = NULL;
+		const char *url_pushed = "";
 		char *url_key = xstrfmt("remote.%s.url", r->name);
 
+		if (!git_config_get_string(url_key, &url) && url)
+			url_pushed = url;
+
 		strvec_push(names, r->name);
-		strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
+		strvec_push(urls, url_pushed);
 
 		free(url);
 		free(url_key);
@@ -356,7 +360,7 @@ char *promisor_remote_info(struct repository *repo)
 			strbuf_addch(&sb, ';');
 		strbuf_addstr(&sb, "name=");
 		strbuf_addstr_urlencode(&sb, names.v[i], allow_unsanitized);
-		if (urls.v[i]) {
+		if (*urls.v[i]) {
 			strbuf_addstr(&sb, ",url=");
 			strbuf_addstr_urlencode(&sb, urls.v[i], allow_unsanitized);
 		}
@@ -409,6 +413,16 @@ static int should_accept_remote(enum accept_promisor accept,
 	if (accept != ACCEPT_KNOWN_URL)
 		BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
 
+	if (!remote_url) {
+		warning(_("no URL advertised for remote '%s'"), remote_name);
+		return 0;
+	}
+
+	if (!*urls->v[i]) {
+		warning(_("no URL configured for remote '%s'"), remote_name);
+		return 0;
+	}
+
 	if (!strcmp(urls->v[i], remote_url))
 		return 1;
 
diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index d2cc69a17e..23203814d6 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -193,6 +193,25 @@ test_expect_success "clone with 'KnownName' and different remote names" '
 	initialize_server 1 "$oid"
 '
 
+test_expect_success "clone with 'KnownName' and missing URL in the config" '
+	git -C server config promisor.advertise true &&
+
+	# Clone from server to create a client
+	# Lazy fetching by the client from the LOP will fail because of the
+	# missing URL in the client config, so the server will have to lazy
+	# fetch from the LOP.
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c promisor.acceptfromserver=KnownName \
+		--no-local --filter="blob:limit=5k" server client &&
+	test_when_finished "rm -rf client" &&
+
+	# Check that the largest object is not missing on the server
+	check_missing_objects server 0 "" &&
+
+	# Reinitialize server so that the largest object is missing again
+	initialize_server 1 "$oid"
+'
+
 test_expect_success "clone with promisor.acceptfromserver set to 'KnownUrl'" '
 	git -C server config promisor.advertise true &&
 
@@ -228,6 +247,26 @@ test_expect_success "clone with 'KnownUrl' and different remote urls" '
 	initialize_server 1 "$oid"
 '
 
+test_expect_success "clone with 'KnownUrl' and url not advertised" '
+	git -C server config promisor.advertise true &&
+
+	git -C server config unset remote.lop.url &&
+	test_when_finished "git -C server config set remote.lop.url \"file://$(pwd)/lop\"" &&
+
+	# Clone from server to create a client
+	# It should fail because the client will reject the LOP
+	# as URLs are different, and the server cannot lazy fetch as
+	# the LOP URL is missing.
+	GIT_NO_LAZY_FETCH=0 test_must_fail git clone -c remote.lop.promisor=true \
+		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.lop.url="file://$(pwd)/lop" \
+		-c promisor.acceptfromserver=KnownUrl \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
 test_expect_success "clone with promisor.advertise set to 'true' but don't delete the client" '
 	git -C server config promisor.advertise true &&
 
-- 
2.49.0.rc2.1.gb8a6f6852f



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH] promisor-remote: fix segfault when remote URL is missing
  2025-03-10 16:29 ` Junio C Hamano
@ 2025-03-11 15:24   ` Christian Couder
  2025-03-11 16:57     ` Junio C Hamano
  0 siblings, 1 reply; 35+ messages in thread
From: Christian Couder @ 2025-03-11 15:24 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder

On Mon, Mar 10, 2025 at 5:29 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > Pushing NULL into a 'strvec' results in a segfault, as 'strvec' is a
> > kind of NULL terminated array that is designed to be compatible with
> > 'argv' variables used on the command line.
>
> It is good that you corrected a caller that tries to make a strvec
> into such a state, but shouldn't strvec_push() protect itself with a
> BUG or something, I have to wonder.

Actually strvec_push() uses xstrdup() on the value it is passed and
xstrdup() crashes if that value is NULL. So another way to avoid
crashes would be to make xstrdup() BUG when it's passed NULL. Or maybe
xstrdup() should just return NULL in this case?

Also it looks like strvec_push_nodup() kind of works if it is passed a
NULL. (It adds the NULL to the array and grows it.) So I wonder if the
right solution for strvec_push() would be to make it kind of work in
the same way.

Anyway I think these are separate issues that deserve their own
discussions and can wait for after the 2.49.0 release. Here I am just
providing a hotfix for the "promisor-remote" protocol capability.

> > So when an URL is missing from the config, let's push an empty string
> > instead of NULL into the 'strvec' that stores URLs.
>
> How will these strings in the "names" strvec used?  When URLs are
> present, I'm sure we are going to use it as URL, but when they are
> missing, what happens?

The 'names' strvec and 'urls' strvec contain what exists in the client
config. They are only used by the promisor remote code to compare the
names and maybe urls in the config with what the server advertises in
case 'promisor.acceptfromserver' is set to KnownName or KnownUrl. This
is done to decide if an advertised promisor remote is accepted or not
by the client.

When 'promisor.acceptfromserver' is set to KnownUrl, a remote should
be rejected in any of the following cases:

  - the server doesn't advertise an URL for that remote,
  - the client doesn't have an URL configured for that remote.

When 'promisor.acceptfromserver' is set to KnownName, URLs should not
be taken into account to decide if an advertised promisor remote is
accepted or not.

Note that the promisor remote code added by this series doesn't change
the code that actually uses the remote names and urls to clone or
fetch objects, so there is no change there. In particular, if the
client doesn't have an URL configured for a remote, even if the remote
is accepted and the server provides an URL, the client will not be
able to fetch or clone from the remote as it will not use the server
provided URL. The test called "clone with 'KnownName' and missing URL
in the config" shows that.

> The second hunk of this patch seems to
> ignore such a broken entry with an empty URL, but that smells like
> sweeping a problem under the rug.  Shouldn't such a promisor be
> either diagnosed as an error, or skipped when you accumulate URLs to
> be used into the strvec, so that the later code (i.e. the second
> hunk) does not even have to worry about it?

If 'promisor.acceptfromserver' is set to KnownName, I think it is
simpler to just ignore URLs altogether. Such a behavior is easier to
document and implement.

Also if we ever develop a mode where the advertised URL can be used
for cloning or fetching by the client, then it won't matter if no URL
is configured on the client side. In fact it might be the common case.

Skipping remotes with no URL would make it more difficult to explain
why a remote was rejected in the KnownUrl case. In the next version, I
have added checks and warnings to help diagnose why a remote is
rejected when a URL is missing.

I have also added a test case where the LOP URL is not configured on
the server side, so not advertised.

> > @@ -356,7 +360,7 @@ char *promisor_remote_info(struct repository *repo)
> >                       strbuf_addch(&sb, ';');
> >               strbuf_addstr(&sb, "name=");
> >               strbuf_addstr_urlencode(&sb, names.v[i], allow_unsanitized);
> > -             if (urls.v[i]) {
> > +             if (urls.v[i] && *urls.v[i]) {
>
> Why does urls.v[i] need to be checked?  Didn't you just make sure
> that the strvec would not have NULL in it?

Yeah, right. I changed this to just `if (*urls.v[i])` in the next version.

Thanks for your review.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] promisor-remote: fix segfault when remote URL is missing
  2025-03-11 15:24   ` Christian Couder
@ 2025-03-11 16:57     ` Junio C Hamano
  0 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2025-03-11 16:57 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> Actually strvec_push() uses xstrdup() on the value it is passed and
> xstrdup() crashes if that value is NULL. So another way to avoid
> crashes would be to make xstrdup() BUG when it's passed NULL. Or maybe
> xstrdup() should just return NULL in this case?
>
> Also it looks like strvec_push_nodup() kind of works if it is passed a
> NULL. (It adds the NULL to the array and grows it.) So I wonder if the
> right solution for strvec_push() would be to make it kind of work in
> the same way.

Meaning "xstrdup" -> "xstrdup_or_null"?  

It actually may be a reasonable thing to do, to remain parallel to
the _nodup() variant, but given strvec is about making it easier to
do argc/argv[], allowing NULL in it does not sound like it is in
line with its stated purpose.

I also do not think checking inside xstrdup() is a good idea; it is
at way too low a level.

> Anyway I think these are separate issues that deserve their own
> discussions and can wait for after the 2.49.0 release.

Absolutely.  

> Here I am just
> providing a hotfix for the "promisor-remote" protocol capability.

Ooof, I didn't realize that cc/lop-remote escaped
'seen' with such a bug X-<.

Thanks.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] promisor-remote: fix segfault when remote URL is missing
  2025-03-11 15:24 ` [PATCH v2] " Christian Couder
@ 2025-03-11 16:59   ` Junio C Hamano
  2025-03-12 11:48     ` Christian Couder
  2025-03-11 20:48   ` Junio C Hamano
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2025-03-11 16:59 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> Using strvec_push() to push `NULL` into a 'strvec' results in a
> segfault, because `xstrdup(NULL)` crashes.
>
> So when an URL is missing from the config, let's push an empty string
> instead of `NULL` into the 'strvec' that stores URLs.
>

> We could have modified strvec_push() to behave like
> strvec_push_nodup() and accept `NULL`, but it's not clear that it's
> the right thing to do for the strvec API. 'strvec' is a kind of NULL
> terminated array that is designed to be compatible with 'argv'
> variables used on the command line. So we might want to disallow
> pushing any `NULL` in it instead.
>
> It's also not clear if `xstrdup(NULL)` should crash or BUG or just
> return NULL.

Yup, the above two paragraphs are irrelevant, I would think.

What we could have done may be to ignore (or error out) a
configuration entry that lacks URL as an error.  After all, isn't
this caused by a misconfiguration?  How such a misconfiguration is
swept under the rug, whether with an empty string or NULL, is
secondary, I would think.

> For all these reasons, let's just focus on fixing the issue in
> "promisor-remote.c" and let's leave improving the strvec API and/or
> xstrdup() for a future effort.

Absolutely.

> While at it let's warn and reject the remote, in the 'KnownUrl'
> case, when no URL is advertised by the server or no URL is
> configured on the client for a remote name advertised by the server
> and configured on the client. This is on par with a warning already
> emitted when URLs are different in the same case.

Yup.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] promisor-remote: fix segfault when remote URL is missing
  2025-03-11 15:24 ` [PATCH v2] " Christian Couder
  2025-03-11 16:59   ` Junio C Hamano
@ 2025-03-11 20:48   ` Junio C Hamano
  2025-03-12 11:47     ` Christian Couder
  2025-03-11 23:06   ` Jeff King
  2025-03-12 11:46   ` [PATCH v3] " Christian Couder
  3 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2025-03-11 20:48 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> +	GIT_NO_LAZY_FETCH=0 test_must_fail git clone -c remote.lop.promisor=true \

This one triggers test-lint violation.

diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index 23203814d6..4c5c3c7656 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -257,7 +257,7 @@ test_expect_success "clone with 'KnownUrl' and url not advertised" '
 	# It should fail because the client will reject the LOP
 	# as URLs are different, and the server cannot lazy fetch as
 	# the LOP URL is missing.
-	GIT_NO_LAZY_FETCH=0 test_must_fail git clone -c remote.lop.promisor=true \
+	test_must_fail env GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
 		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
 		-c remote.lop.url="file://$(pwd)/lop" \
 		-c promisor.acceptfromserver=KnownUrl \


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] promisor-remote: fix segfault when remote URL is missing
  2025-03-11 15:24 ` [PATCH v2] " Christian Couder
  2025-03-11 16:59   ` Junio C Hamano
  2025-03-11 20:48   ` Junio C Hamano
@ 2025-03-11 23:06   ` Jeff King
  2025-03-11 23:36     ` Junio C Hamano
  2025-03-12 11:47     ` Christian Couder
  2025-03-12 11:46   ` [PATCH v3] " Christian Couder
  3 siblings, 2 replies; 35+ messages in thread
From: Jeff King @ 2025-03-11 23:06 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Junio C Hamano, Patrick Steinhardt, Taylor Blau,
	Eric Sunshine, Karthik Nayak, Kristoffer Haugsbakk,
	brian m . carlson, Randall S . Becker, Christian Couder

On Tue, Mar 11, 2025 at 04:24:13PM +0100, Christian Couder wrote:

> Using strvec_push() to push `NULL` into a 'strvec' results in a
> segfault, because `xstrdup(NULL)` crashes.
> 
> So when an URL is missing from the config, let's push an empty string
> instead of `NULL` into the 'strvec' that stores URLs.

Is a configured remote with out a url key really a missing url, though?
In other contexts it defaults to the name of the remote. E.g.:

  # make a repo so "foo" is a valid url
  git init foo
  git -C foo commit --allow-empty bar

  # configure a fetch refspec, but no url!
  git init
  git config remote.foo.fetch '+refs/heads/*:refs/remotes/foo/*'

  # now fetching will use the configured refspec with a url of "foo"
  git fetch foo

  # and git-remote will report it, along with its url
  git remote ;# shows "foo"
  git remote --get-url foo ;# also shows "foo"

This is obviously a weird thing to be doing, so I admit I don't really
care all that much. But it feels like the most natural thing is just:

diff --git a/promisor-remote.c b/promisor-remote.c
index 6a0a61382f..761eb1dbd5 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -327,7 +327,7 @@ static void promisor_info_vecs(struct repository *repo,
 		char *url_key = xstrfmt("remote.%s.url", r->name);
 
 		strvec_push(names, r->name);
-		strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
+		strvec_push(urls, git_config_get_string(url_key, &url) ? r->name : url);
 
 		free(url);
 		free(url_key);

> We could have modified strvec_push() to behave like
> strvec_push_nodup() and accept `NULL`, but it's not clear that it's
> the right thing to do for the strvec API. 'strvec' is a kind of NULL
> terminated array that is designed to be compatible with 'argv'
> variables used on the command line. So we might want to disallow
> pushing any `NULL` in it instead.
> 
> It's also not clear if `xstrdup(NULL)` should crash or BUG or just
> return NULL.

We have xstrdup_or_null() for the latter suggestion. There was some
light discussion at the time about having xstrdup(NULL) handle this
automatically:

  https://lore.kernel.org/git/20150112231231.GA4023@peff.net/

but it was mostly negative. I don't think anybody really dug into the
thought experiment beyond a general "it might propagate NULL places you
wouldn't expect" vibe, though.

For the same reason I'd be a little hesitant to bless NULLs inside
strvec structures. I think "nodup" allowing them is mostly an unintended
consequence.

> For all these reasons, let's just focus on fixing the issue in
> "promisor-remote.c" and let's leave improving the strvec API and/or
> xstrdup() for a future effort.

This part I certainly agree with. ;)

>  	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
> -		char *url;
> +		char *url = NULL;
> +		const char *url_pushed = "";
>  		char *url_key = xstrfmt("remote.%s.url", r->name);
>  
> +		if (!git_config_get_string(url_key, &url) && url)
> +			url_pushed = url;
> +
>  		strvec_push(names, r->name);
> -		strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
> +		strvec_push(urls, url_pushed);
>  
>  		free(url);

Probably not super important, but while reading this I noticed that
using git_config_get_string_tmp() would make the memory management a
little simpler (since you do not need to free "url", you are free to
point it to at the empty string and do not need a separate url_pushed).

-Peff


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] promisor-remote: fix segfault when remote URL is missing
  2025-03-11 23:06   ` Jeff King
@ 2025-03-11 23:36     ` Junio C Hamano
  2025-03-12 11:47     ` Christian Couder
  1 sibling, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2025-03-11 23:36 UTC (permalink / raw)
  To: Jeff King
  Cc: Christian Couder, git, Patrick Steinhardt, Taylor Blau,
	Eric Sunshine, Karthik Nayak, Kristoffer Haugsbakk,
	brian m . carlson, Randall S . Becker, Christian Couder

Jeff King <peff@peff.net> writes:

> Is a configured remote with out a url key really a missing url, though?
> In other contexts it defaults to the name of the remote. E.g.:
>
>   # make a repo so "foo" is a valid url
>   git init foo
>   git -C foo commit --allow-empty bar
>
>   # configure a fetch refspec, but no url!
>   git init
>   git config remote.foo.fetch '+refs/heads/*:refs/remotes/foo/*'
>
>   # now fetching will use the configured refspec with a url of "foo"
>   git fetch foo
>
>   # and git-remote will report it, along with its url
>   git remote ;# shows "foo"
>   git remote --get-url foo ;# also shows "foo"

Yeah, that does sound like a more natural way to look at it.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3] promisor-remote: fix segfault when remote URL is missing
  2025-03-11 15:24 ` [PATCH v2] " Christian Couder
                     ` (2 preceding siblings ...)
  2025-03-11 23:06   ` Jeff King
@ 2025-03-12 11:46   ` Christian Couder
  2025-03-12 17:02     ` Junio C Hamano
  2025-03-13 10:38     ` [PATCH v4] " Christian Couder
  3 siblings, 2 replies; 35+ messages in thread
From: Christian Couder @ 2025-03-12 11:46 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Patrick Steinhardt, Taylor Blau,
	Eric Sunshine, Karthik Nayak, Kristoffer Haugsbakk,
	brian m . carlson, Randall S . Becker, Christian Couder,
	Christian Couder

Using strvec_push() to push `NULL` into a 'strvec' results in a
segfault, because `xstrdup(NULL)` crashes.

So when an URL is missing from the config, let's push the remote name
instead of `NULL` into the 'strvec' that stores URLs. This is
similar to what other Git commands do. For example `git fetch` uses
the remote name to fetch if no url has been configured. Similarly
`git remote get-url` reports the remote name if no url has been
configured.

We leave improving the strvec API and/or xstrdup() for a future
separate effort.

Note that an empty URL can still be configured using something like
`git remote add foo ""`.

While at it, let's warn and reject the remote, in the 'KnownUrl' case,
when no URL or an empty URL is advertised by the server, or when an
empty URL is configured on the client for a remote name advertised by
the server and configured on the client. This is on par with a warning
already emitted when URLs are different in the same case.

Let's also warn if the remote is rejected because name and url are the
same, as it could mean the url has not been configured.

While at it, let's also use git_config_get_string_tmp() instead of
git_config_get_string() to simplify memory management.

Also let's spell "URL" with uppercase letters in all the warnings.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 promisor-remote.c                     | 46 ++++++++++++++++++---
 t/t5710-promisor-remote-capability.sh | 59 +++++++++++++++++++++++++++
 2 files changed, 100 insertions(+), 5 deletions(-)

diff --git a/promisor-remote.c b/promisor-remote.c
index 6a0a61382f..479b9a27af 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -323,13 +323,19 @@ static void promisor_info_vecs(struct repository *repo,
 	promisor_remote_init(repo);
 
 	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
-		char *url;
+		const char *url;
 		char *url_key = xstrfmt("remote.%s.url", r->name);
 
 		strvec_push(names, r->name);
-		strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
 
-		free(url);
+		/*
+		 * No URL defaults to the name of the remote, like
+		 * elsewhere in Git (e.g. `git fetch` or `git remote
+		 * get-url`). It's still possible that an empty URL is
+		 * configured.
+		 */
+		strvec_push(urls, git_config_get_string_tmp(url_key, &url) ? r->name : url);
+
 		free(url_key);
 	}
 }
@@ -356,7 +362,7 @@ char *promisor_remote_info(struct repository *repo)
 			strbuf_addch(&sb, ';');
 		strbuf_addstr(&sb, "name=");
 		strbuf_addstr_urlencode(&sb, names.v[i], allow_unsanitized);
-		if (urls.v[i]) {
+		if (*urls.v[i]) {
 			strbuf_addstr(&sb, ",url=");
 			strbuf_addstr_urlencode(&sb, urls.v[i], allow_unsanitized);
 		}
@@ -409,12 +415,42 @@ static int should_accept_remote(enum accept_promisor accept,
 	if (accept != ACCEPT_KNOWN_URL)
 		BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
 
+	if (!remote_url) {
+		warning(_("no URL advertised for remote '%s'"), remote_name);
+		return 0;
+	}
+
+	if (!*remote_url) {
+		/*
+		 * This shouldn't happen with a Git server, but not
+		 * sure how other servers will be implemented in the
+		 * future.
+		 */
+		warning(_("empty URL advertised for remote '%s'"), remote_name);
+		return 0;
+	}
+
+	if (!*urls->v[i]) {
+		warning(_("empty URL configured for remote '%s'"), remote_name);
+		return 0;
+	}
+
 	if (!strcmp(urls->v[i], remote_url))
 		return 1;
 
-	warning(_("known remote named '%s' but with url '%s' instead of '%s'"),
+	warning(_("known remote named '%s' but with URL '%s' instead of '%s'"),
 		remote_name, urls->v[i], remote_url);
 
+	if (!strcmp(remote_name, urls->v[i]))
+		warning(_("remote name and URL are the same '%s', "
+			  "maybe the URL is not configured locally"),
+			remote_name);
+
+	if (!strcmp(remote_name, remote_url))
+		warning(_("remote name and URL are the same '%s', "
+			  "maybe the URL is not configured on the remote side"),
+			remote_name);
+
 	return 0;
 }
 
diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index d2cc69a17e..05ae96d1f7 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -193,6 +193,25 @@ test_expect_success "clone with 'KnownName' and different remote names" '
 	initialize_server 1 "$oid"
 '
 
+test_expect_success "clone with 'KnownName' and missing URL in the config" '
+	git -C server config promisor.advertise true &&
+
+	# Clone from server to create a client
+	# Lazy fetching by the client from the LOP will fail because of the
+	# missing URL in the client config, so the server will have to lazy
+	# fetch from the LOP.
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c promisor.acceptfromserver=KnownName \
+		--no-local --filter="blob:limit=5k" server client &&
+	test_when_finished "rm -rf client" &&
+
+	# Check that the largest object is not missing on the server
+	check_missing_objects server 0 "" &&
+
+	# Reinitialize server so that the largest object is missing again
+	initialize_server 1 "$oid"
+'
+
 test_expect_success "clone with promisor.acceptfromserver set to 'KnownUrl'" '
 	git -C server config promisor.advertise true &&
 
@@ -228,6 +247,46 @@ test_expect_success "clone with 'KnownUrl' and different remote urls" '
 	initialize_server 1 "$oid"
 '
 
+test_expect_success "clone with 'KnownUrl' and url not configured on the server" '
+	git -C server config promisor.advertise true &&
+
+	git -C server config unset remote.lop.url &&
+	test_when_finished "git -C server config set remote.lop.url \"file://$(pwd)/lop\"" &&
+
+	# Clone from server to create a client
+	# It should fail because the client will reject the LOP as URLs are
+	# different, and the server cannot lazy fetch as the LOP URL is
+	# missing, so the remote name will be used instead which will fail.
+	test_must_fail env GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.lop.url="file://$(pwd)/lop" \
+		-c promisor.acceptfromserver=KnownUrl \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
+test_expect_success "clone with 'KnownUrl' and empty url, so not advertised" '
+	git -C server config promisor.advertise true &&
+
+	git -C server config set remote.lop.url "" &&
+	test_when_finished "git -C server config set remote.lop.url \"file://$(pwd)/lop\"" &&
+
+	# Clone from server to create a client
+	# It should fail because the client will reject the LOP as an empty URL is
+	# not advertised, and the server cannot lazy fetch as the LOP URL is empty,
+	# so the remote name will be used instead which will fail.
+	test_must_fail env GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.lop.url="file://$(pwd)/lop" \
+		-c promisor.acceptfromserver=KnownUrl \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
 test_expect_success "clone with promisor.advertise set to 'true' but don't delete the client" '
 	git -C server config promisor.advertise true &&
 
-- 
2.49.0.rc2.1.g28c2a23e4a



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] promisor-remote: fix segfault when remote URL is missing
  2025-03-11 20:48   ` Junio C Hamano
@ 2025-03-12 11:47     ` Christian Couder
  0 siblings, 0 replies; 35+ messages in thread
From: Christian Couder @ 2025-03-12 11:47 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder

On Tue, Mar 11, 2025 at 9:48 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > +     GIT_NO_LAZY_FETCH=0 test_must_fail git clone -c remote.lop.promisor=true \
>
> This one triggers test-lint violation.
>
> diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
> index 23203814d6..4c5c3c7656 100755
> --- a/t/t5710-promisor-remote-capability.sh
> +++ b/t/t5710-promisor-remote-capability.sh
> @@ -257,7 +257,7 @@ test_expect_success "clone with 'KnownUrl' and url not advertised" '
>         # It should fail because the client will reject the LOP
>         # as URLs are different, and the server cannot lazy fetch as
>         # the LOP URL is missing.
> -       GIT_NO_LAZY_FETCH=0 test_must_fail git clone -c remote.lop.promisor=true \
> +       test_must_fail env GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
>                 -c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
>                 -c remote.lop.url="file://$(pwd)/lop" \
>                 -c promisor.acceptfromserver=KnownUrl \

Sorry for forgetting to check that and thanks for the fix. I use it in
the next version.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] promisor-remote: fix segfault when remote URL is missing
  2025-03-11 23:06   ` Jeff King
  2025-03-11 23:36     ` Junio C Hamano
@ 2025-03-12 11:47     ` Christian Couder
  1 sibling, 0 replies; 35+ messages in thread
From: Christian Couder @ 2025-03-12 11:47 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Patrick Steinhardt, Taylor Blau,
	Eric Sunshine, Karthik Nayak, Kristoffer Haugsbakk,
	brian m . carlson, Randall S . Becker, Christian Couder

On Wed, Mar 12, 2025 at 12:06 AM Jeff King <peff@peff.net> wrote:
>
> On Tue, Mar 11, 2025 at 04:24:13PM +0100, Christian Couder wrote:
>
> > Using strvec_push() to push `NULL` into a 'strvec' results in a
> > segfault, because `xstrdup(NULL)` crashes.
> >
> > So when an URL is missing from the config, let's push an empty string
> > instead of `NULL` into the 'strvec' that stores URLs.
>
> Is a configured remote with out a url key really a missing url, though?
> In other contexts it defaults to the name of the remote. E.g.:
>
>   # make a repo so "foo" is a valid url
>   git init foo
>   git -C foo commit --allow-empty bar
>
>   # configure a fetch refspec, but no url!
>   git init
>   git config remote.foo.fetch '+refs/heads/*:refs/remotes/foo/*'
>
>   # now fetching will use the configured refspec with a url of "foo"
>   git fetch foo
>
>   # and git-remote will report it, along with its url
>   git remote ;# shows "foo"
>   git remote --get-url foo ;# also shows "foo"
>
> This is obviously a weird thing to be doing, so I admit I don't really
> care all that much. But it feels like the most natural thing is just:
>
> diff --git a/promisor-remote.c b/promisor-remote.c
> index 6a0a61382f..761eb1dbd5 100644
> --- a/promisor-remote.c
> +++ b/promisor-remote.c
> @@ -327,7 +327,7 @@ static void promisor_info_vecs(struct repository *repo,
>                 char *url_key = xstrfmt("remote.%s.url", r->name);
>
>                 strvec_push(names, r->name);
> -               strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
> +               strvec_push(urls, git_config_get_string(url_key, &url) ? r->name : url);
>
>                 free(url);
>                 free(url_key);

Yeah, right I am using this in the next version. I have added warnings
to help debug this in the case a remote is rejected because urls are
different, as I think it could confuse users.

> > We could have modified strvec_push() to behave like
> > strvec_push_nodup() and accept `NULL`, but it's not clear that it's
> > the right thing to do for the strvec API. 'strvec' is a kind of NULL
> > terminated array that is designed to be compatible with 'argv'
> > variables used on the command line. So we might want to disallow
> > pushing any `NULL` in it instead.
> >
> > It's also not clear if `xstrdup(NULL)` should crash or BUG or just
> > return NULL.
>
> We have xstrdup_or_null() for the latter suggestion.

Yeah, I forgot about it. I think it makes sense to replace xstrdup()
with xstrdup_or_null() in strvec_push().

If we ever want a mode (possibly the default one) that forbids NULL in
strvec, we could add that on top. But right now as strvec_push_nodup()
accepts NULL, I think it makes sense for strvec_push() to accept NULL
too.

Anyway this is something we can work on after the release.

> There was some
> light discussion at the time about having xstrdup(NULL) handle this
> automatically:
>
>   https://lore.kernel.org/git/20150112231231.GA4023@peff.net/
>
> but it was mostly negative. I don't think anybody really dug into the
> thought experiment beyond a general "it might propagate NULL places you
> wouldn't expect" vibe, though.

I don't mind having both xstrdup() and xstrdup_or_null(). At least it
gives a hint to readers about NULL being expected or not.

> For the same reason I'd be a little hesitant to bless NULLs inside
> strvec structures. I think "nodup" allowing them is mostly an unintended
> consequence.

Yeah, but then if we ever need a strvec like struct that can contain
NULL, it would be kind of sad to have a separate struct with its own
files mostly duplicating the strvec code. I think we would then be
better with strvec having two modes, one accepting NULL and one
rejecting it.

> > For all these reasons, let's just focus on fixing the issue in
> > "promisor-remote.c" and let's leave improving the strvec API and/or
> > xstrdup() for a future effort.
>
> This part I certainly agree with. ;)
>
> >       for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
> > -             char *url;
> > +             char *url = NULL;
> > +             const char *url_pushed = "";
> >               char *url_key = xstrfmt("remote.%s.url", r->name);
> >
> > +             if (!git_config_get_string(url_key, &url) && url)
> > +                     url_pushed = url;
> > +
> >               strvec_push(names, r->name);
> > -             strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
> > +             strvec_push(urls, url_pushed);
> >
> >               free(url);
>
> Probably not super important, but while reading this I noticed that
> using git_config_get_string_tmp() would make the memory management a
> little simpler (since you do not need to free "url", you are free to
> point it to at the empty string and do not need a separate url_pushed).

Yeah, I will use this in the next version.

Thanks for the review.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2] promisor-remote: fix segfault when remote URL is missing
  2025-03-11 16:59   ` Junio C Hamano
@ 2025-03-12 11:48     ` Christian Couder
  0 siblings, 0 replies; 35+ messages in thread
From: Christian Couder @ 2025-03-12 11:48 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder

On Tue, Mar 11, 2025 at 5:59 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:

> > We could have modified strvec_push() to behave like
> > strvec_push_nodup() and accept `NULL`, but it's not clear that it's
> > the right thing to do for the strvec API. 'strvec' is a kind of NULL
> > terminated array that is designed to be compatible with 'argv'
> > variables used on the command line. So we might want to disallow
> > pushing any `NULL` in it instead.
> >
> > It's also not clear if `xstrdup(NULL)` should crash or BUG or just
> > return NULL.
>
> Yup, the above two paragraphs are irrelevant, I would think.

I have removed them.

Thanks.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3] promisor-remote: fix segfault when remote URL is missing
  2025-03-12 11:46   ` [PATCH v3] " Christian Couder
@ 2025-03-12 17:02     ` Junio C Hamano
  2025-03-13 10:39       ` Christian Couder
  2025-03-13 10:38     ` [PATCH v4] " Christian Couder
  1 sibling, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2025-03-12 17:02 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Jeff King, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

>  	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
> +		const char *url;
>  		char *url_key = xstrfmt("remote.%s.url", r->name);
>  
>  		strvec_push(names, r->name);
>  
> +		/*
> +		 * No URL defaults to the name of the remote, like
> +		 * elsewhere in Git (e.g. `git fetch` or `git remote
> +		 * get-url`). It's still possible that an empty URL is
> +		 * configured.
> +		 */

Not a huge deal as it is not telling any lies, but does the second
sentence need to be said?  An element in the urls strvec being an
empty string is not all that more interesting than it being an
incorrect or malformed URL to those who are reading this piece of
code, is it?  It is also possible that an unreachable URL or
misspelt URL is configured, but it is not a job of this piece of
code to worry about them, just like it is none of the business of
this code if the configured URL is an empty string, no?

> +		strvec_push(urls, git_config_get_string_tmp(url_key, &url) ? r->name : url);

More on this below.  Unlike "git fetch" and "git push" used as the
source and destination, the remote URL used in this context are
exposed to the outside world, and I am not sure the usual r->name
fallback makes sense.

>  		free(url_key);
>  	}
>  }
> @@ -356,7 +362,7 @@ char *promisor_remote_info(struct repository *repo)
>  			strbuf_addch(&sb, ';');
>  		strbuf_addstr(&sb, "name=");
>  		strbuf_addstr_urlencode(&sb, names.v[i], allow_unsanitized);
> -		if (urls.v[i]) {
> +		if (*urls.v[i]) {
>  			strbuf_addstr(&sb, ",url=");
>  			strbuf_addstr_urlencode(&sb, urls.v[i], allow_unsanitized);

We used to advertise an empty string name to the other end, but we
no longer do, which is a good hygiene to be strict on what we send
out.

But now our updated promisor_info_vecs() pushes our local name
r->name as a fallback. The idea of r->name fallback is to use it as
a local directory path for "git fetch" and friends, but the local
pathname has no meaning to the other side, does it?  Is it something
we want to let the other side even know???

What other uses do the name/url vectors prepared by
promisor_info_vecs() have?  Is it that we use them only to advertise
with this code, and then match with what they advertise?  If we are
not using these names and urls locally to fetch from in code paths,
I am inclined to suggest that promisor_info_vecs() should not shove
these fallback URLs (local directory name implicitly inferred) into
the names/urls vectors.

On the other hand, if other callsites that use the names/urls
obtained from that function do want to see such local pathnames, we
cannot lose information at the source, so we'd somehow need to
filter them at various places, I guess.  And this place that builds
up the string to be sent as capability response should be one of
these places that must filter.

> @@ -409,12 +415,42 @@ static int should_accept_remote(enum accept_promisor accept,
>  	if (accept != ACCEPT_KNOWN_URL)
>  		BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
>  
> +	if (!remote_url) {
> +		warning(_("no URL advertised for remote '%s'"), remote_name);
> +		return 0;
> +	}

Except for the above "no URL advertised" warning and returning,
which is absolutely a good thing to do, I am still not sure how
relevant various checks for an empty string new code added by this
patch makes are ...

> +	if (!*remote_url) {
> +		/*
> +		 * This shouldn't happen with a Git server, but not
> +		 * sure how other servers will be implemented in the
> +		 * future.
> +		 */
> +		warning(_("empty URL advertised for remote '%s'"), remote_name);
> +		return 0;
> +	}
> +
> +	if (!*urls->v[i]) {
> +		warning(_("empty URL configured for remote '%s'"), remote_name);
> +		return 0;
> +	}
> +

... would it be so different to pass an empty string as to pass a
misspelt URL received from the other end?  Wouldn't the end result
the same (i.e., we thought we had a URL usable as a promisor remote,
but it turns out that we cannot reach it)?

>  	if (!strcmp(urls->v[i], remote_url))
>  		return 1;

Past this point, I am not sure what the points of these checks and
warnings are; even with these "problematic" remote_name and remote_url
combinations these warnings attempt to warn against are used, as long
as the above check said it is OK, we'd silently said "should accept"
already to the caller.

> -	warning(_("known remote named '%s' but with url '%s' instead of '%s'"),
> +	warning(_("known remote named '%s' but with URL '%s' instead of '%s'"),
>  		remote_name, urls->v[i], remote_url);
>  
> +	if (!strcmp(remote_name, urls->v[i]))

The 'i' was obtained by calling remote_nick_find(), which uses
strcasecmp() to find named remote (which I doubt it is a sensible
design by the way).  This code should be consistent with whatever
comparison used there.

> +		warning(_("remote name and URL are the same '%s', "
> +			  "maybe the URL is not configured locally"),
> +			remote_name);
> +
> +	if (!strcmp(remote_name, remote_url))

This is matching what r->name fallback did so it is correct to be
strcmp().  But (1) it may be way too late after the above "return
1", and (2) if we are *not* going to use it, perhaps we shouldn't
place it in the resulting strvec from promisor_info_vecs() in the
first place?

> +		warning(_("remote name and URL are the same '%s', "
> +			  "maybe the URL is not configured on the remote side"),
> +			remote_name);
> +
>  	return 0;
>  }


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v4] promisor-remote: fix segfault when remote URL is missing
  2025-03-12 11:46   ` [PATCH v3] " Christian Couder
  2025-03-12 17:02     ` Junio C Hamano
@ 2025-03-13 10:38     ` Christian Couder
  2025-03-13 16:28       ` Junio C Hamano
  2025-03-14 14:12       ` [PATCH v5 0/3] "promisor-remote" capability fixes Christian Couder
  1 sibling, 2 replies; 35+ messages in thread
From: Christian Couder @ 2025-03-13 10:38 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Patrick Steinhardt, Taylor Blau,
	Eric Sunshine, Karthik Nayak, Kristoffer Haugsbakk,
	brian m . carlson, Randall S . Becker, Christian Couder,
	Christian Couder

Using strvec_push() to push `NULL` into a 'strvec' results in a
segfault, because `xstrdup(NULL)` crashes.

So when an URL is missing from the config, let's push the remote name
instead of `NULL` into the 'strvec' that stores URLs. This is
similar to what other Git commands do. For example `git fetch` uses
the remote name to fetch if no url has been configured. Similarly
`git remote get-url` reports the remote name if no url has been
configured.

We leave improving the strvec API and/or xstrdup() for a future
separate effort.

Note that an empty URL can still be configured using something like
`git remote add foo ""`.

While at it, let's warn and reject the remote, in the 'KnownUrl' case,
when no URL or an empty URL is advertised by the server, or when an
empty URL is configured on the client for a remote name advertised by
the server and configured on the client. This is on par with a warning
already emitted when URLs are different in the same case.

Let's also warn if the remote is rejected because name and url are the
same, as it could mean the url has not been configured.

While at it, let's also use git_config_get_string_tmp() instead of
git_config_get_string() to simplify memory management.

Also let's spell "URL" with uppercase letters in all the warnings.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---

Only 2 small changes since v3:

  - a sentence in a code comment has been removed,
  - a strcmp() has been replaced with strcasecmp()

Range diff since v3:

1:  ea3c8347de ! 1:  f94452eaa2 promisor-remote: fix segfault when remote URL is missing
    @@ promisor-remote.c: static void promisor_info_vecs(struct repository *repo,
      
     -          free(url);
     +          /*
    -+           * No URL defaults to the name of the remote, like
    -+           * elsewhere in Git (e.g. `git fetch` or `git remote
    -+           * get-url`). It's still possible that an empty URL is
    -+           * configured.
    ++           * No URL defaults to the name of the remote, like elsewhere
    ++           * in Git (e.g. `git fetch` or `git remote get-url`).
     +           */
     +          strvec_push(urls, git_config_get_string_tmp(url_key, &url) ? r->name : url);
     +
    @@ promisor-remote.c: static int should_accept_remote(enum accept_promisor accept,
     +  warning(_("known remote named '%s' but with URL '%s' instead of '%s'"),
                remote_name, urls->v[i], remote_url);
      
    -+  if (!strcmp(remote_name, urls->v[i]))
    ++  if (!strcasecmp(remote_name, urls->v[i]))
     +          warning(_("remote name and URL are the same '%s', "
     +                    "maybe the URL is not configured locally"),
     +                  remote_name);

 promisor-remote.c                     | 44 +++++++++++++++++---
 t/t5710-promisor-remote-capability.sh | 59 +++++++++++++++++++++++++++
 2 files changed, 98 insertions(+), 5 deletions(-)

diff --git a/promisor-remote.c b/promisor-remote.c
index 6a0a61382f..dd589dd4ea 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -323,13 +323,17 @@ static void promisor_info_vecs(struct repository *repo,
 	promisor_remote_init(repo);
 
 	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
-		char *url;
+		const char *url;
 		char *url_key = xstrfmt("remote.%s.url", r->name);
 
 		strvec_push(names, r->name);
-		strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
 
-		free(url);
+		/*
+		 * No URL defaults to the name of the remote, like elsewhere
+		 * in Git (e.g. `git fetch` or `git remote get-url`).
+		 */
+		strvec_push(urls, git_config_get_string_tmp(url_key, &url) ? r->name : url);
+
 		free(url_key);
 	}
 }
@@ -356,7 +360,7 @@ char *promisor_remote_info(struct repository *repo)
 			strbuf_addch(&sb, ';');
 		strbuf_addstr(&sb, "name=");
 		strbuf_addstr_urlencode(&sb, names.v[i], allow_unsanitized);
-		if (urls.v[i]) {
+		if (*urls.v[i]) {
 			strbuf_addstr(&sb, ",url=");
 			strbuf_addstr_urlencode(&sb, urls.v[i], allow_unsanitized);
 		}
@@ -409,12 +413,42 @@ static int should_accept_remote(enum accept_promisor accept,
 	if (accept != ACCEPT_KNOWN_URL)
 		BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
 
+	if (!remote_url) {
+		warning(_("no URL advertised for remote '%s'"), remote_name);
+		return 0;
+	}
+
+	if (!*remote_url) {
+		/*
+		 * This shouldn't happen with a Git server, but not
+		 * sure how other servers will be implemented in the
+		 * future.
+		 */
+		warning(_("empty URL advertised for remote '%s'"), remote_name);
+		return 0;
+	}
+
+	if (!*urls->v[i]) {
+		warning(_("empty URL configured for remote '%s'"), remote_name);
+		return 0;
+	}
+
 	if (!strcmp(urls->v[i], remote_url))
 		return 1;
 
-	warning(_("known remote named '%s' but with url '%s' instead of '%s'"),
+	warning(_("known remote named '%s' but with URL '%s' instead of '%s'"),
 		remote_name, urls->v[i], remote_url);
 
+	if (!strcasecmp(remote_name, urls->v[i]))
+		warning(_("remote name and URL are the same '%s', "
+			  "maybe the URL is not configured locally"),
+			remote_name);
+
+	if (!strcmp(remote_name, remote_url))
+		warning(_("remote name and URL are the same '%s', "
+			  "maybe the URL is not configured on the remote side"),
+			remote_name);
+
 	return 0;
 }
 
diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index d2cc69a17e..05ae96d1f7 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -193,6 +193,25 @@ test_expect_success "clone with 'KnownName' and different remote names" '
 	initialize_server 1 "$oid"
 '
 
+test_expect_success "clone with 'KnownName' and missing URL in the config" '
+	git -C server config promisor.advertise true &&
+
+	# Clone from server to create a client
+	# Lazy fetching by the client from the LOP will fail because of the
+	# missing URL in the client config, so the server will have to lazy
+	# fetch from the LOP.
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c promisor.acceptfromserver=KnownName \
+		--no-local --filter="blob:limit=5k" server client &&
+	test_when_finished "rm -rf client" &&
+
+	# Check that the largest object is not missing on the server
+	check_missing_objects server 0 "" &&
+
+	# Reinitialize server so that the largest object is missing again
+	initialize_server 1 "$oid"
+'
+
 test_expect_success "clone with promisor.acceptfromserver set to 'KnownUrl'" '
 	git -C server config promisor.advertise true &&
 
@@ -228,6 +247,46 @@ test_expect_success "clone with 'KnownUrl' and different remote urls" '
 	initialize_server 1 "$oid"
 '
 
+test_expect_success "clone with 'KnownUrl' and url not configured on the server" '
+	git -C server config promisor.advertise true &&
+
+	git -C server config unset remote.lop.url &&
+	test_when_finished "git -C server config set remote.lop.url \"file://$(pwd)/lop\"" &&
+
+	# Clone from server to create a client
+	# It should fail because the client will reject the LOP as URLs are
+	# different, and the server cannot lazy fetch as the LOP URL is
+	# missing, so the remote name will be used instead which will fail.
+	test_must_fail env GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.lop.url="file://$(pwd)/lop" \
+		-c promisor.acceptfromserver=KnownUrl \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
+test_expect_success "clone with 'KnownUrl' and empty url, so not advertised" '
+	git -C server config promisor.advertise true &&
+
+	git -C server config set remote.lop.url "" &&
+	test_when_finished "git -C server config set remote.lop.url \"file://$(pwd)/lop\"" &&
+
+	# Clone from server to create a client
+	# It should fail because the client will reject the LOP as an empty URL is
+	# not advertised, and the server cannot lazy fetch as the LOP URL is empty,
+	# so the remote name will be used instead which will fail.
+	test_must_fail env GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.lop.url="file://$(pwd)/lop" \
+		-c promisor.acceptfromserver=KnownUrl \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
 test_expect_success "clone with promisor.advertise set to 'true' but don't delete the client" '
 	git -C server config promisor.advertise true &&
 
-- 
2.49.0.rc2.1.gf94452eaa2



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v3] promisor-remote: fix segfault when remote URL is missing
  2025-03-12 17:02     ` Junio C Hamano
@ 2025-03-13 10:39       ` Christian Couder
  2025-03-13 16:40         ` Junio C Hamano
  0 siblings, 1 reply; 35+ messages in thread
From: Christian Couder @ 2025-03-13 10:39 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder

On Wed, Mar 12, 2025 at 6:02 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> >       for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
> > +             const char *url;
> >               char *url_key = xstrfmt("remote.%s.url", r->name);
> >
> >               strvec_push(names, r->name);
> >
> > +             /*
> > +              * No URL defaults to the name of the remote, like
> > +              * elsewhere in Git (e.g. `git fetch` or `git remote
> > +              * get-url`). It's still possible that an empty URL is
> > +              * configured.
> > +              */
>
> Not a huge deal as it is not telling any lies, but does the second
> sentence need to be said?  An element in the urls strvec being an
> empty string is not all that more interesting than it being an
> incorrect or malformed URL to those who are reading this piece of
> code, is it?  It is also possible that an unreachable URL or
> misspelt URL is configured, but it is not a job of this piece of
> code to worry about them, just like it is none of the business of
> this code if the configured URL is an empty string, no?

Yeah, right, I have removed the second sentence in the next version.

> > +             strvec_push(urls, git_config_get_string_tmp(url_key, &url) ? r->name : url);
>
> More on this below.  Unlike "git fetch" and "git push" used as the
> source and destination, the remote URL used in this context are
> exposed to the outside world, and I am not sure the usual r->name
> fallback makes sense.
>
> >               free(url_key);
> >       }
> >  }
> > @@ -356,7 +362,7 @@ char *promisor_remote_info(struct repository *repo)
> >                       strbuf_addch(&sb, ';');
> >               strbuf_addstr(&sb, "name=");
> >               strbuf_addstr_urlencode(&sb, names.v[i], allow_unsanitized);
> > -             if (urls.v[i]) {
> > +             if (*urls.v[i]) {
> >                       strbuf_addstr(&sb, ",url=");
> >                       strbuf_addstr_urlencode(&sb, urls.v[i], allow_unsanitized);
>
> We used to advertise an empty string name to the other end, but we
> no longer do, which is a good hygiene to be strict on what we send
> out.
>
> But now our updated promisor_info_vecs() pushes our local name
> r->name as a fallback. The idea of r->name fallback is to use it as
> a local directory path for "git fetch" and friends, but the local
> pathname has no meaning to the other side, does it?  Is it something
> we want to let the other side even know???

It could happen that the server, the client and the common promisor
remote are all on the same filesystem. Then it would make sense for
both the server and the client to rely on just the remote name,
without any URL configured, to access the promisor remote. So if we
want things to work in this case, then I think the server should
advertise the remote name in the "url=" field.

Also it's not like the server is giving away secret information as it
already passes the remote name anyway in the "name=" field.

And yeah, the client could be configured with "KnownName" instead of
"KnownURL" in this case, but that wouldn't work if there are other
promisor remotes that the client and the server want to share and that
are not local and therefore need a URL configured on both sides.

> What other uses do the name/url vectors prepared by
> promisor_info_vecs() have?  Is it that we use them only to advertise
> with this code, and then match with what they advertise?

Yes, I think so.

> If we are
> not using these names and urls locally to fetch from in code paths,
> I am inclined to suggest that promisor_info_vecs() should not shove
> these fallback URLs (local directory name implicitly inferred) into
> the names/urls vectors.

We could do that but I think it would make it more difficult to make
things work in the case I discussed above (where the client and the
common promisor remote are all on the same filesystem, and both the
server and the client rely on just the remote name to access the
promisor remote).

> On the other hand, if other callsites that use the names/urls
> obtained from that function do want to see such local pathnames, we
> cannot lose information at the source, so we'd somehow need to
> filter them at various places, I guess.  And this place that builds
> up the string to be sent as capability response should be one of
> these places that must filter.

Other call sites don't use promisor_info_vecs(). It was introduced by
the lop patch series which doesn't change how other code gets the
remote names and URLs.

> > @@ -409,12 +415,42 @@ static int should_accept_remote(enum accept_promisor accept,
> >       if (accept != ACCEPT_KNOWN_URL)
> >               BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
> >
> > +     if (!remote_url) {
> > +             warning(_("no URL advertised for remote '%s'"), remote_name);
> > +             return 0;
> > +     }
>
> Except for the above "no URL advertised" warning and returning,
> which is absolutely a good thing to do, I am still not sure how
> relevant various checks for an empty string new code added by this
> patch makes are ...
>
> > +     if (!*remote_url) {
> > +             /*
> > +              * This shouldn't happen with a Git server, but not
> > +              * sure how other servers will be implemented in the
> > +              * future.
> > +              */
> > +             warning(_("empty URL advertised for remote '%s'"), remote_name);
> > +             return 0;
> > +     }
> > +
> > +     if (!*urls->v[i]) {
> > +             warning(_("empty URL configured for remote '%s'"), remote_name);
> > +             return 0;
> > +     }
> > +
>
> ... would it be so different to pass an empty string as to pass a
> misspelt URL received from the other end?  Wouldn't the end result
> the same (i.e., we thought we had a URL usable as a promisor remote,
> but it turns out that we cannot reach it)?

Perhaps but I think it would be weird if URLs are matching when they
are empty on both sides. I think it makes more sense and is more
helpful to warn with a clear error message and just reject the remote
if any of the URL is empty.

> >       if (!strcmp(urls->v[i], remote_url))
> >               return 1;
>
> Past this point, I am not sure what the points of these checks and
> warnings are; even with these "problematic" remote_name and remote_url
> combinations these warnings attempt to warn against are used, as long
> as the above check said it is OK, we'd silently said "should accept"
> already to the caller.

Past this point we are in the case where remote names matched but
remote URLs didn't match. So I think we should help diagnose things
(with warnings) because it's likely that the intent was for the URL to
also match but a mistake prevented that from happening.

> > -     warning(_("known remote named '%s' but with url '%s' instead of '%s'"),
> > +     warning(_("known remote named '%s' but with URL '%s' instead of '%s'"),
> >               remote_name, urls->v[i], remote_url);
> >
> > +     if (!strcmp(remote_name, urls->v[i]))
>
> The 'i' was obtained by calling remote_nick_find(), which uses
> strcasecmp() to find named remote (which I doubt it is a sensible
> design by the way).  This code should be consistent with whatever
> comparison used there.

I think comparing remote names case insensitively is fair. It's likely
to just make things a bit easier for users.

In the next version, I have changed the comparison to strcasecmp()
here as I agree it could help if the comparisons are consistent.

> > +             warning(_("remote name and URL are the same '%s', "
> > +                       "maybe the URL is not configured locally"),
> > +                     remote_name);
> > +
> > +     if (!strcmp(remote_name, remote_url))
>
> This is matching what r->name fallback did so it is correct to be
> strcmp().  But (1) it may be way too late after the above "return
> 1", and (2) if we are *not* going to use it, perhaps we shouldn't
> place it in the resulting strvec from promisor_info_vecs() in the
> first place?

We are in the case the URLs didn't match. So yeah we are not going to
use the remote info because we are going to reject the remote (with
`return 0;`) a few lines below. But it would be nice if we can help
users diagnose what happened.

If we notice that remote_name and remote_url are the same it might be
because the URL is not configured on the server side, so the server
passed the remote name instead. It can be nice to tell users that this
might have happened to help them debug.

> > +             warning(_("remote name and URL are the same '%s', "
> > +                       "maybe the URL is not configured on the remote side"),
> > +                     remote_name);
> > +
> >       return 0;
> >  }


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4] promisor-remote: fix segfault when remote URL is missing
  2025-03-13 10:38     ` [PATCH v4] " Christian Couder
@ 2025-03-13 16:28       ` Junio C Hamano
  2025-03-13 17:23         ` Junio C Hamano
  2025-03-14 14:10         ` Christian Couder
  2025-03-14 14:12       ` [PATCH v5 0/3] "promisor-remote" capability fixes Christian Couder
  1 sibling, 2 replies; 35+ messages in thread
From: Junio C Hamano @ 2025-03-13 16:28 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Jeff King, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> While at it, let's warn and reject the remote, in the 'KnownUrl' case,
> when no URL or an empty URL is advertised by the server, or when an
> empty URL is configured on the client for a remote name advertised by
> the server and configured on the client. This is on par with a warning
> already emitted when URLs are different in the same case.

That explanation makes it unclear why we need a new one.  If the
configured and davertised are both empty and the same, according to
that "warning already emitted", that is not a warning-worthy event,
is it?

> Let's also warn if the remote is rejected because name and url are the
> same, as it could mean the url has not been configured.

Are we rejecting a remote _because_ r->name is used?  I thought the
code did something quite different.  We reject because the url does
not match, and then after that give an extra warning if remote nick
was used as a fallback URL.  Even if URL is configured as 'orogin'
for a remote with nick 'origin', the code would have rejected the
remote with the same logic in the same code path, wouldn't it?  It
is a bit confusiong to call such a situation "rejected because name
and URL are the same".

In any case, why do we want to keep these unconfigured remotes in
the list of candidate lop-remotes in the first place, and why do we
want to treat empty URLs as being so special, more special than say
a randomly misspelt URL?  I think I asked these questions on the
previous round and I do not think I saw them addressed at all in
this round.

Thanks.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3] promisor-remote: fix segfault when remote URL is missing
  2025-03-13 10:39       ` Christian Couder
@ 2025-03-13 16:40         ` Junio C Hamano
  2025-03-14 14:09           ` Christian Couder
  0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2025-03-13 16:40 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Jeff King, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> It could happen that the server, the client and the common promisor
> remote are all on the same filesystem. Then it would make sense for
> both the server and the client to rely on just the remote name,
> without any URL configured, to access the promisor remote. So if we
> want things to work in this case, then I think the server should
> advertise the remote name in the "url=" field.

Meaning the server and all the clients share the short-and-sweet
string that is suitable as a remote nickname (i.e. something you the
client driver would type to "git fetch" command) as a pathname that
is relative to the current working directory, and because the server
and these clients must refer to the same repository using this
mechanism, this in turn means that the server and all the clients
share the same current working directory?

It may be possible, but would that make _any_ practical sense?  I
doubt it.  I would understand perfectly well that the local single
machine situation as a good justification to use file:// URL in such
a setting, but not for the r->name fallback.

>> What other uses do the name/url vectors prepared by
>> promisor_info_vecs() have?  Is it that we use them only to advertise
>> with this code, and then match with what they advertise?
>
> Yes, I think so.
> ...
> Other call sites don't use promisor_info_vecs(). It was introduced by
> the lop patch series which doesn't change how other code gets the
> remote names and URLs.

Then it should be simpler to remove r->name entries at the source in
that function, than having to filter it from the strvec whenever the
strvec elements are used.

>> ... would it be so different to pass an empty string as to pass a
>> misspelt URL received from the other end?  Wouldn't the end result
>> the same (i.e., we thought we had a URL usable as a promisor remote,
>> but it turns out that we cannot reach it)?
>
> Perhaps but I think it would be weird if URLs are matching when they
> are empty on both sides. I think it makes more sense and is more
> helpful to warn with a clear error message and just reject the remote
> if any of the URL is empty.

Smells like over-engineering for nonexisting case to me.

>> The 'i' was obtained by calling remote_nick_find(), which uses
>> strcasecmp() to find named remote (which I doubt it is a sensible
>> design by the way).  This code should be consistent with whatever
>> comparison used there.
>
> I think comparing remote names case insensitively is fair. It's likely
> to just make things a bit easier for users.

Meaning it makes it impossible to have two remotes with nicknames
that are different in case?

Because the "[remote "nick"] fetch = ..." configuration variables
have the nickname in the second part, the nicknames are case
insensitive, unlike the first and the third component (i.e.
"remote.origin.fetch" and "Remote.origin.FETCH" are the same thing,
but "remote.Origin.fetch" and "remote.origin.fetch" are different).


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4] promisor-remote: fix segfault when remote URL is missing
  2025-03-13 16:28       ` Junio C Hamano
@ 2025-03-13 17:23         ` Junio C Hamano
  2025-03-14 14:10         ` Christian Couder
  1 sibling, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2025-03-13 17:23 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Jeff King, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder

Junio C Hamano <gitster@pobox.com> writes:

> In any case, why do we want to keep these unconfigured remotes in
> the list of candidate lop-remotes in the first place, and why do we
> want to treat empty URLs as being so special, more special than say
> a randomly misspelt URL?  I think I asked these questions on the
> previous round and I do not think I saw them addressed at all in
> this round.

Ah, sorry, I saw this new round before seeing the response to v3
review, in which these are covered.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3] promisor-remote: fix segfault when remote URL is missing
  2025-03-13 16:40         ` Junio C Hamano
@ 2025-03-14 14:09           ` Christian Couder
  2025-03-14 17:28             ` Junio C Hamano
  0 siblings, 1 reply; 35+ messages in thread
From: Christian Couder @ 2025-03-14 14:09 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder

On Thu, Mar 13, 2025 at 5:40 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > It could happen that the server, the client and the common promisor
> > remote are all on the same filesystem. Then it would make sense for
> > both the server and the client to rely on just the remote name,
> > without any URL configured, to access the promisor remote. So if we
> > want things to work in this case, then I think the server should
> > advertise the remote name in the "url=" field.
>
> Meaning the server and all the clients share the short-and-sweet
> string that is suitable as a remote nickname (i.e. something you the
> client driver would type to "git fetch" command) as a pathname that
> is relative to the current working directory, and because the server
> and these clients must refer to the same repository using this
> mechanism, this in turn means that the server and all the clients
> share the same current working directory?

Maybe they don't share the same directory but there is a symlink to or
a mount of the remote directory. I agree it's a rare case, but the
case with no URL is a rare case too.

Also I just tested the following:

$ mkdir test_git
$ cd test_git
$ git init
$ git config remote."../git".fetch '+refs/heads/*:refs/remotes/git/*'
$ git fetch "../git"

which works if ../git is a valid path to a repo.

So even if `git remote add "../stuff" url` is rejected, one can
actually create working remotes with names that point to any directory
on the current filesystem.

> It may be possible, but would that make _any_ practical sense?  I
> doubt it.  I would understand perfectly well that the local single
> machine situation as a good justification to use file:// URL in such
> a setting, but not for the r->name fallback.
>
> >> What other uses do the name/url vectors prepared by
> >> promisor_info_vecs() have?  Is it that we use them only to advertise
> >> with this code, and then match with what they advertise?
> >
> > Yes, I think so.
> > ...
> > Other call sites don't use promisor_info_vecs(). It was introduced by
> > the lop patch series which doesn't change how other code gets the
> > remote names and URLs.
>
> Then it should be simpler to remove r->name entries at the source in
> that function, than having to filter it from the strvec whenever the
> strvec elements are used.

I am fine with this solution. The case with no URL isn't likely to
happen in the first place, and if needed, it can be easily worked
around by just specifying an URL that can be the same as the remote
name. So in the next version, only remotes with an URL configured are
pushed into the 'names' and 'urls' strvecs.

> >> ... would it be so different to pass an empty string as to pass a
> >> misspelt URL received from the other end?  Wouldn't the end result
> >> the same (i.e., we thought we had a URL usable as a promisor remote,
> >> but it turns out that we cannot reach it)?
> >
> > Perhaps but I think it would be weird if URLs are matching when they
> > are empty on both sides. I think it makes more sense and is more
> > helpful to warn with a clear error message and just reject the remote
> > if any of the URL is empty.
>
> Smells like over-engineering for nonexisting case to me.

I am fine with not worrying about this. Then I think it's just simpler
to ignore any remote with an empty URL and not even push them into the
strvecs in the first place, like we are now also doing for remote with
no URL. So I have implemented this in the next version. It just
simplifies a lot of things.

It also seems that when an URL is empty, Git uses the remote name to
fetch, like when the URL is missing. So it makes sense to process
missing and empty URLs in the same way.

> >> The 'i' was obtained by calling remote_nick_find(), which uses
> >> strcasecmp() to find named remote (which I doubt it is a sensible
> >> design by the way).  This code should be consistent with whatever
> >> comparison used there.
> >
> > I think comparing remote names case insensitively is fair. It's likely
> > to just make things a bit easier for users.
>
> Meaning it makes it impossible to have two remotes with nicknames
> that are different in case?
>
> Because the "[remote "nick"] fetch = ..." configuration variables
> have the nickname in the second part, the nicknames are case
> insensitive, unlike the first and the third component (i.e.

I think you mean "the nicknames are case sensitive" here.

> "remote.origin.fetch" and "Remote.origin.FETCH" are the same thing,
> but "remote.Origin.fetch" and "remote.origin.fetch" are different).

Ok, then I have fixed this in a separate patch in the next version.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v4] promisor-remote: fix segfault when remote URL is missing
  2025-03-13 16:28       ` Junio C Hamano
  2025-03-13 17:23         ` Junio C Hamano
@ 2025-03-14 14:10         ` Christian Couder
  1 sibling, 0 replies; 35+ messages in thread
From: Christian Couder @ 2025-03-14 14:10 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder

On Thu, Mar 13, 2025 at 5:28 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > While at it, let's warn and reject the remote, in the 'KnownUrl' case,
> > when no URL or an empty URL is advertised by the server, or when an
> > empty URL is configured on the client for a remote name advertised by
> > the server and configured on the client. This is on par with a warning
> > already emitted when URLs are different in the same case.
>
> That explanation makes it unclear why we need a new one.  If the
> configured and davertised are both empty and the same, according to
> that "warning already emitted", that is not a warning-worthy event,
> is it?

We have to check that remote_url is not NULL before using it in
strcmp(). If it is NULL, we need to reject the remote, and it makes
sense to warn before doing that with `return 0;` because we warn
otherwise when a remote is rejected to try to help diagnose things at
the end of the function.

And while we are checking that remote_url is not NULL and warning if
it is, it makes sense to also help diagnose the case where remote_url
is empty with something like:

    if (!remote_url || !*remote_url) {
        warning(_("no or empty URL advertised for remote '%s'"), remote_name);
        return 0;
    }

I have used the above in the next version. Also I think this part
deserves its own patch too, so it is in a separate patch in the next
version.

> > Let's also warn if the remote is rejected because name and url are the
> > same, as it could mean the url has not been configured.
>
> Are we rejecting a remote _because_ r->name is used?  I thought the
> code did something quite different.  We reject because the url does
> not match, and then after that give an extra warning if remote nick
> was used as a fallback URL.  Even if URL is configured as 'orogin'
> for a remote with nick 'origin', the code would have rejected the
> remote with the same logic in the same code path, wouldn't it?  It
> is a bit confusiong to call such a situation "rejected because name
> and URL are the same".

Yeah, I have removed the above code as it's not needed anyway if we
don't process the remotes when they don't have a non-empty URL
configured.

Thanks.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v5 0/3] "promisor-remote" capability fixes
  2025-03-13 10:38     ` [PATCH v4] " Christian Couder
  2025-03-13 16:28       ` Junio C Hamano
@ 2025-03-14 14:12       ` Christian Couder
  2025-03-14 14:12         ` [PATCH v5 1/3] promisor-remote: fix segfault when remote URL is missing Christian Couder
                           ` (3 more replies)
  1 sibling, 4 replies; 35+ messages in thread
From: Christian Couder @ 2025-03-14 14:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Patrick Steinhardt, Taylor Blau,
	Eric Sunshine, Karthik Nayak, Kristoffer Haugsbakk,
	brian m . carlson, Randall S . Becker, Christian Couder

As a number of different issues and fixes were found and discussed in
the previous iterations, I thought it made sense to split the patch
into a small 3 patch series.

There are a lot of changes again compared to the previous version, so
I don't think it makes sense to provide a range-diff.

Christian Couder (3):
  promisor-remote: fix segfault when remote URL is missing
  promisor-remote: fix possible issue when no URL is advertised
  promisor-remote: compare remote names case sensitively

 Documentation/config/promisor.adoc    |  4 +-
 promisor-remote.c                     | 27 +++++++-----
 t/t5710-promisor-remote-capability.sh | 59 +++++++++++++++++++++++++++
 3 files changed, 77 insertions(+), 13 deletions(-)

-- 
2.49.0.rc2.1.gf94452eaa2



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v5 1/3] promisor-remote: fix segfault when remote URL is missing
  2025-03-14 14:12       ` [PATCH v5 0/3] "promisor-remote" capability fixes Christian Couder
@ 2025-03-14 14:12         ` Christian Couder
  2025-03-14 18:59           ` Junio C Hamano
  2025-03-14 14:12         ` [PATCH v5 2/3] promisor-remote: fix possible issue when no URL is advertised Christian Couder
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 35+ messages in thread
From: Christian Couder @ 2025-03-14 14:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Patrick Steinhardt, Taylor Blau,
	Eric Sunshine, Karthik Nayak, Kristoffer Haugsbakk,
	brian m . carlson, Randall S . Becker, Christian Couder,
	Christian Couder

Using strvec_push() to push `NULL` into a 'strvec' results in a
segfault, because `xstrdup(NULL)` crashes.

So when an URL is missing from the config, let's not push the remote
name and URL into the 'strvec's.

While at it, let's also not push them in case the URL is empty. It's
just not worth the trouble and it's consistent with how Git otherwise
treats missing and empty URLs in the same way.

Note that in case of missing or empty URL, Git uses the remote name to
fetch, which can work if the remote is on the same filesystem. So
configurations where the client, server and remote are all on the same
filesystem may need URLs to be configured even if they are the same as
the remote names. But this is a rare case, and the work around is easy
enough.

We leave improving the strvec API and/or xstrdup() for a future
separate effort.

While at it, let's also use git_config_get_string_tmp() instead of
git_config_get_string() to simplify memory management.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 promisor-remote.c                     | 16 ++++----
 t/t5710-promisor-remote-capability.sh | 59 +++++++++++++++++++++++++++
 2 files changed, 67 insertions(+), 8 deletions(-)

diff --git a/promisor-remote.c b/promisor-remote.c
index 6a0a61382f..ba80240f12 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -323,13 +323,15 @@ static void promisor_info_vecs(struct repository *repo,
 	promisor_remote_init(repo);
 
 	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
-		char *url;
+		const char *url;
 		char *url_key = xstrfmt("remote.%s.url", r->name);
 
-		strvec_push(names, r->name);
-		strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
+		/* Only add remotes with a non empty URL */
+		if (!git_config_get_string_tmp(url_key, &url) && *url) {
+			strvec_push(names, r->name);
+			strvec_push(urls, url);
+		}
 
-		free(url);
 		free(url_key);
 	}
 }
@@ -356,10 +358,8 @@ char *promisor_remote_info(struct repository *repo)
 			strbuf_addch(&sb, ';');
 		strbuf_addstr(&sb, "name=");
 		strbuf_addstr_urlencode(&sb, names.v[i], allow_unsanitized);
-		if (urls.v[i]) {
-			strbuf_addstr(&sb, ",url=");
-			strbuf_addstr_urlencode(&sb, urls.v[i], allow_unsanitized);
-		}
+		strbuf_addstr(&sb, ",url=");
+		strbuf_addstr_urlencode(&sb, urls.v[i], allow_unsanitized);
 	}
 
 	strvec_clear(&names);
diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index d2cc69a17e..05ae96d1f7 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -193,6 +193,25 @@ test_expect_success "clone with 'KnownName' and different remote names" '
 	initialize_server 1 "$oid"
 '
 
+test_expect_success "clone with 'KnownName' and missing URL in the config" '
+	git -C server config promisor.advertise true &&
+
+	# Clone from server to create a client
+	# Lazy fetching by the client from the LOP will fail because of the
+	# missing URL in the client config, so the server will have to lazy
+	# fetch from the LOP.
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c promisor.acceptfromserver=KnownName \
+		--no-local --filter="blob:limit=5k" server client &&
+	test_when_finished "rm -rf client" &&
+
+	# Check that the largest object is not missing on the server
+	check_missing_objects server 0 "" &&
+
+	# Reinitialize server so that the largest object is missing again
+	initialize_server 1 "$oid"
+'
+
 test_expect_success "clone with promisor.acceptfromserver set to 'KnownUrl'" '
 	git -C server config promisor.advertise true &&
 
@@ -228,6 +247,46 @@ test_expect_success "clone with 'KnownUrl' and different remote urls" '
 	initialize_server 1 "$oid"
 '
 
+test_expect_success "clone with 'KnownUrl' and url not configured on the server" '
+	git -C server config promisor.advertise true &&
+
+	git -C server config unset remote.lop.url &&
+	test_when_finished "git -C server config set remote.lop.url \"file://$(pwd)/lop\"" &&
+
+	# Clone from server to create a client
+	# It should fail because the client will reject the LOP as URLs are
+	# different, and the server cannot lazy fetch as the LOP URL is
+	# missing, so the remote name will be used instead which will fail.
+	test_must_fail env GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.lop.url="file://$(pwd)/lop" \
+		-c promisor.acceptfromserver=KnownUrl \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
+test_expect_success "clone with 'KnownUrl' and empty url, so not advertised" '
+	git -C server config promisor.advertise true &&
+
+	git -C server config set remote.lop.url "" &&
+	test_when_finished "git -C server config set remote.lop.url \"file://$(pwd)/lop\"" &&
+
+	# Clone from server to create a client
+	# It should fail because the client will reject the LOP as an empty URL is
+	# not advertised, and the server cannot lazy fetch as the LOP URL is empty,
+	# so the remote name will be used instead which will fail.
+	test_must_fail env GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.lop.url="file://$(pwd)/lop" \
+		-c promisor.acceptfromserver=KnownUrl \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
 test_expect_success "clone with promisor.advertise set to 'true' but don't delete the client" '
 	git -C server config promisor.advertise true &&
 
-- 
2.49.0.rc2.1.gf94452eaa2



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 2/3] promisor-remote: fix possible issue when no URL is advertised
  2025-03-14 14:12       ` [PATCH v5 0/3] "promisor-remote" capability fixes Christian Couder
  2025-03-14 14:12         ` [PATCH v5 1/3] promisor-remote: fix segfault when remote URL is missing Christian Couder
@ 2025-03-14 14:12         ` Christian Couder
  2025-03-14 14:12         ` [PATCH v5 3/3] promisor-remote: compare remote names case sensitively Christian Couder
  2025-03-18 11:00         ` [PATCH v6 0/4] "promisor-remote" capability fixes Christian Couder
  3 siblings, 0 replies; 35+ messages in thread
From: Christian Couder @ 2025-03-14 14:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Patrick Steinhardt, Taylor Blau,
	Eric Sunshine, Karthik Nayak, Kristoffer Haugsbakk,
	brian m . carlson, Randall S . Becker, Christian Couder,
	Christian Couder

In the 'KnownUrl' case, in should_accept_remote(), let's check that
`remote_url` is not NULL before we use strcmp() to compare it with
the local URL. This could avoid crashes if a server starts to not
advertise any URL in the future.

If `remote_url` is NULL, we should reject the URL. Let's also warn in
this case because we warn otherwise when a remote is rejected to try
to help diagnose things at the end of the function.

And while we are checking that remote_url is not NULL and warning if
it is, it makes sense to also help diagnose the case where remote_url
is empty.

Also while at it, let's spell "URL" with uppercase letters in all the
warnings.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 promisor-remote.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/promisor-remote.c b/promisor-remote.c
index ba80240f12..0b7b1ec45a 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -409,10 +409,15 @@ static int should_accept_remote(enum accept_promisor accept,
 	if (accept != ACCEPT_KNOWN_URL)
 		BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
 
+	if (!remote_url || !*remote_url) {
+		warning(_("no or empty URL advertised for remote '%s'"), remote_name);
+		return 0;
+	}
+
 	if (!strcmp(urls->v[i], remote_url))
 		return 1;
 
-	warning(_("known remote named '%s' but with url '%s' instead of '%s'"),
+	warning(_("known remote named '%s' but with URL '%s' instead of '%s'"),
 		remote_name, urls->v[i], remote_url);
 
 	return 0;
-- 
2.49.0.rc2.1.gf94452eaa2



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 3/3] promisor-remote: compare remote names case sensitively
  2025-03-14 14:12       ` [PATCH v5 0/3] "promisor-remote" capability fixes Christian Couder
  2025-03-14 14:12         ` [PATCH v5 1/3] promisor-remote: fix segfault when remote URL is missing Christian Couder
  2025-03-14 14:12         ` [PATCH v5 2/3] promisor-remote: fix possible issue when no URL is advertised Christian Couder
@ 2025-03-14 14:12         ` Christian Couder
  2025-03-14 17:28           ` Junio C Hamano
  2025-03-18 11:00         ` [PATCH v6 0/4] "promisor-remote" capability fixes Christian Couder
  3 siblings, 1 reply; 35+ messages in thread
From: Christian Couder @ 2025-03-14 14:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Patrick Steinhardt, Taylor Blau,
	Eric Sunshine, Karthik Nayak, Kristoffer Haugsbakk,
	brian m . carlson, Randall S . Becker, Christian Couder,
	Christian Couder

Because the "[remote "nick"] fetch = ..." configuration variables
have the nickname in the second part, the nicknames are case
sensitive, unlike the first and the third component (i.e.
"remote.origin.fetch" and "Remote.origin.FETCH" are the same thing,
but "remote.Origin.fetch" and "remote.origin.fetch" are different).

Let's follow the way Git works in general and compare the remote
names case sensitively when processing advertised remotes.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/config/promisor.adoc | 4 ++--
 promisor-remote.c                  | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/config/promisor.adoc b/Documentation/config/promisor.adoc
index 9192acfd24..2638b01f83 100644
--- a/Documentation/config/promisor.adoc
+++ b/Documentation/config/promisor.adoc
@@ -26,5 +26,5 @@ promisor.acceptFromServer::
 	server will be accepted. By accepting a promisor remote, the
 	client agrees that the server might omit objects that are
 	lazily fetchable from this promisor remote from its responses
-	to "fetch" and "clone" requests from the client. See
-	linkgit:gitprotocol-v2[5].
+	to "fetch" and "clone" requests from the client. Name and URL
+	comparisons are case sensitive. See linkgit:gitprotocol-v2[5].
diff --git a/promisor-remote.c b/promisor-remote.c
index 0b7b1ec45a..5801ebfd9b 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -370,13 +370,13 @@ char *promisor_remote_info(struct repository *repo)
 
 /*
  * Find first index of 'nicks' where there is 'nick'. 'nick' is
- * compared case insensitively to the strings in 'nicks'. If not found
+ * compared case sensitively to the strings in 'nicks'. If not found
  * 'nicks->nr' is returned.
  */
 static size_t remote_nick_find(struct strvec *nicks, const char *nick)
 {
 	for (size_t i = 0; i < nicks->nr; i++)
-		if (!strcasecmp(nicks->v[i], nick))
+		if (!strcmp(nicks->v[i], nick))
 			return i;
 	return nicks->nr;
 }
-- 
2.49.0.rc2.1.gf94452eaa2



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 3/3] promisor-remote: compare remote names case sensitively
  2025-03-14 14:12         ` [PATCH v5 3/3] promisor-remote: compare remote names case sensitively Christian Couder
@ 2025-03-14 17:28           ` Junio C Hamano
  2025-03-18 11:04             ` Christian Couder
  0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2025-03-14 17:28 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Jeff King, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> Because the "[remote "nick"] fetch = ..." configuration variables
> have the nickname in the second part, the nicknames are case
> sensitive, unlike the first and the third component (i.e.
> "remote.origin.fetch" and "Remote.origin.FETCH" are the same thing,
> but "remote.Origin.fetch" and "remote.origin.fetch" are different).

I double-checked what the control flow that passes through
remote.c:handle_config() does, and the above is in line with what
remote_get() does.  

remote.c:read_config() populates the nickname-to-remote hashmap by
using handle_config() callback, which calls make_remote() with the
second level name (e.g. "Origin" and "origin" in the last example of
the above), which is passed to memhash() not memihash() when looking
up or registering the remote.

If we used case insensitive comparison in the new code, a malicious
large-object promisor remote could have told us to use "Origin" as
an extra promisor and in response the new code may noticed that we
have "origin" and tried to equate it with what the other side told
us.  But when the existing code actually interacts with the promisor
remote, it wouldn't have found any configured remote under the name
"Origin", and something funny would start from there.  By using the
right remote consistently throughout the system, we would not get
confused that way, which is good.

> Let's follow the way Git works in general and compare the remote
> names case sensitively when processing advertised remotes.
>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
>  Documentation/config/promisor.adoc | 4 ++--
>  promisor-remote.c                  | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)

Looking good.  Thanks.

> diff --git a/Documentation/config/promisor.adoc b/Documentation/config/promisor.adoc
> index 9192acfd24..2638b01f83 100644
> --- a/Documentation/config/promisor.adoc
> +++ b/Documentation/config/promisor.adoc
> @@ -26,5 +26,5 @@ promisor.acceptFromServer::
>  	server will be accepted. By accepting a promisor remote, the
>  	client agrees that the server might omit objects that are
>  	lazily fetchable from this promisor remote from its responses
> -	to "fetch" and "clone" requests from the client. See
> -	linkgit:gitprotocol-v2[5].
> +	to "fetch" and "clone" requests from the client. Name and URL
> +	comparisons are case sensitive. See linkgit:gitprotocol-v2[5].
> diff --git a/promisor-remote.c b/promisor-remote.c
> index 0b7b1ec45a..5801ebfd9b 100644
> --- a/promisor-remote.c
> +++ b/promisor-remote.c
> @@ -370,13 +370,13 @@ char *promisor_remote_info(struct repository *repo)
>  
>  /*
>   * Find first index of 'nicks' where there is 'nick'. 'nick' is
> - * compared case insensitively to the strings in 'nicks'. If not found
> + * compared case sensitively to the strings in 'nicks'. If not found
>   * 'nicks->nr' is returned.
>   */
>  static size_t remote_nick_find(struct strvec *nicks, const char *nick)
>  {
>  	for (size_t i = 0; i < nicks->nr; i++)
> -		if (!strcasecmp(nicks->v[i], nick))
> +		if (!strcmp(nicks->v[i], nick))
>  			return i;
>  	return nicks->nr;
>  }


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3] promisor-remote: fix segfault when remote URL is missing
  2025-03-14 14:09           ` Christian Couder
@ 2025-03-14 17:28             ` Junio C Hamano
  0 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2025-03-14 17:28 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Jeff King, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

>> is relative to the current working directory, and because the server
>> and these clients must refer to the same repository using this
>> mechanism, this in turn means that the server and all the clients
>> share the same current working directory?
>
> Maybe they don't share the same directory but there is a symlink
> to or a mount of the remote directory.  I agree it's a rare case,
> but the case with no URL is a rare case too.

The meaning of the two instances of the word "rare" is different in
the above sentence, no?

The first one, the setting somebody could use if they really wanted
to is "if you re really pressed to choose between possible and
impossible, you cannot say it is impossible".

It is dubious how such a set-up is useful.  The server is telling
that the other side should now use this new thing (perhaps locally
reachable) so that "a symlink to or a mount of" must be prepared
beforehand in order to use the "the user is told ../foo as a new
lop, and at ../foo there is already usable (remote) object store
available over the network filesystem".  Wouldn't such a $CORP
sysadmin who can prepare "../foo" on the client machines for their
client repositories be able to instead prepare ".git/config" for
them with the same labor?  IOW, the "is dubious how such a set up is
useful" I said is not questioning if it works.  It questions if that
is the way how people _want_ their system to work.

The latter "rare" is "when there is no URL (i.e. r->name fallback
needs to be taken), it would be much more common that the command
'git fetch ../foo main' was given without configuring ../foo remote
(hence URL is missing, but so is fetch refspec), than somebody added
remote."../foo".fetch refspec but choose not to add URL".  There is
absolutely no reason a user would deliberately do so, unless the
only thing they are interested in saying is "hey this works because
of r->name fallback", without considering how much time of others
(e.g. who read their config when they have problem with Git to help
diagnose the problem for them, and then notice and wonder why .URL
is missing there) doing so would waste.

So whichever "rare" we are talking about, ...

>> It may be possible, but would that make _any_ practical sense?  I
>> doubt it.  I would understand perfectly well that the local single
>> machine situation as a good justification to use file:// URL in such
>> a setting, but not for the r->name fallback.

... this point still stands.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 1/3] promisor-remote: fix segfault when remote URL is missing
  2025-03-14 14:12         ` [PATCH v5 1/3] promisor-remote: fix segfault when remote URL is missing Christian Couder
@ 2025-03-14 18:59           ` Junio C Hamano
  2025-03-18 11:03             ` Christian Couder
  0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2025-03-14 18:59 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Jeff King, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> While at it, let's also use git_config_get_string_tmp() instead of
> git_config_get_string() to simplify memory management.
> ...
> -		strvec_push(names, r->name);
> -		strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
> +		/* Only add remotes with a non empty URL */
> +		if (!git_config_get_string_tmp(url_key, &url) && *url) {
> +			strvec_push(names, r->name);
> +			strvec_push(urls, url);
> +		}
>  
> -		free(url);

Nice.

> +test_expect_success "clone with 'KnownName' and missing URL in the config" '
> +	git -C server config promisor.advertise true &&
> +
> +	# Clone from server to create a client
> +	# Lazy fetching by the client from the LOP will fail because of the
> +	# missing URL in the client config, so the server will have to lazy
> +	# fetch from the LOP.
> +	GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
> +		-c promisor.acceptfromserver=KnownName \
> +		--no-local --filter="blob:limit=5k" server client &&
> +	test_when_finished "rm -rf client" &&

These are the other way around.  When 'clone' fails, test_when_finished
is not run, so nobody arranges the new directory 'client' to be removed.
"git clone" does try to remove in such a case, but we are protecting
against a failing "clone", so swapping them around, i.e. arrange to
remove it and then try to create it, would make more sense.

> +test_expect_success "clone with 'KnownUrl' and url not configured on the server" '
> +	git -C server config promisor.advertise true &&
> +
> +	git -C server config unset remote.lop.url &&
> +	test_when_finished "git -C server config set remote.lop.url \"file://$(pwd)/lop\"" &&

Probably the same principle applies here, but the case where "git
config" fails, it is likely that the file is not touched at all, or
it gets so corrupt beyond salvaging with another "config set", so it
matters much less than the previous one.

> +	# Clone from server to create a client
> +	# It should fail because the client will reject the LOP as URLs are
> +	# different, and the server cannot lazy fetch as the LOP URL is
> +	# missing, so the remote name will be used instead which will fail.
> +	test_must_fail env GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
> +		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
> +		-c remote.lop.url="file://$(pwd)/lop" \
> +		-c promisor.acceptfromserver=KnownUrl \
> +		--no-local --filter="blob:limit=5k" server client &&
> +
> +	# Check that the largest object is still missing on the server
> +	check_missing_objects server 1 "$oid"
> +'
> +
> +test_expect_success "clone with 'KnownUrl' and empty url, so not advertised" '
> +	git -C server config promisor.advertise true &&
> +
> +	git -C server config set remote.lop.url "" &&
> +	test_when_finished "git -C server config set remote.lop.url \"file://$(pwd)/lop\"" &&
> +
> +	# Clone from server to create a client
> +	# It should fail because the client will reject the LOP as an empty URL is
> +	# not advertised, and the server cannot lazy fetch as the LOP URL is empty,
> +	# so the remote name will be used instead which will fail.
> +	test_must_fail env GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
> +		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
> +		-c remote.lop.url="file://$(pwd)/lop" \
> +		-c promisor.acceptfromserver=KnownUrl \
> +		--no-local --filter="blob:limit=5k" server client &&
> +
> +	# Check that the largest object is still missing on the server
> +	check_missing_objects server 1 "$oid"
> +'

The test clone is identical to the previous one except for the
four-line comment in the middle.  The set-up on the other side is
different (the server has remote.lop.url set in the previous one to
empty, and unset in this one, which should amount to the same
thing).  Makes sense.

>  test_expect_success "clone with promisor.advertise set to 'true' but don't delete the client" '
>  	git -C server config promisor.advertise true &&


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v6 0/4] "promisor-remote" capability fixes
  2025-03-14 14:12       ` [PATCH v5 0/3] "promisor-remote" capability fixes Christian Couder
                           ` (2 preceding siblings ...)
  2025-03-14 14:12         ` [PATCH v5 3/3] promisor-remote: compare remote names case sensitively Christian Couder
@ 2025-03-18 11:00         ` Christian Couder
  2025-03-18 11:00           ` [PATCH v6 1/4] t5710: arrange to delete the client before cloning Christian Couder
                             ` (3 more replies)
  3 siblings, 4 replies; 35+ messages in thread
From: Christian Couder @ 2025-03-18 11:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Patrick Steinhardt, Taylor Blau,
	Eric Sunshine, Karthik Nayak, Kristoffer Haugsbakk,
	brian m . carlson, Randall S . Becker, Christian Couder

This addresses a number of different issues found after the
"promisor-remote" capability patch series was merged.

Changes since v5:

There are only changes in the "t5710-promisor-remote-capability.sh"
test script since v5:

  - Patch 1/4 ("t5710: arrange to delete the client before cloning")
    is new. It moves `test_when_finished "rm -rf client"` before we
    clone. It's a preparatory patch to avoid possible issues in the
    following tests if a clone ever fails in a weird way.

  - In patch 2/4 ("promisor-remote: fix segfault when remote URL is
    missing"), the tests have been improved in a few ways:

      - a `test_when_finished "rm -rf client"` has been moved before a
        clone instruction to avoid possible issues in the following
        tests if the clone fails in a weird way,

      - some instructions to reset `remote.lop.url` on the server to
        it's original value ("file://$(pwd)/lop") have been moved
        before the instructions that unset it or set it to an empty
        value,

      - some `test_when_finished "rm -rf client"` have been added into
        the new tests where the clone is expected to fail, to avoid
        possible issues in the following tests in case the clone
        actually succeeds.

Range diff since v5:

-:  ---------- > 1:  12e6251c65 t5710: arrange to delete the client before cloning
1:  f01457943d ! 2:  9fe0844b72 promisor-remote: fix segfault when remote URL is missing
    @@ t/t5710-promisor-remote-capability.sh: test_expect_success "clone with 'KnownNam
      
     +test_expect_success "clone with 'KnownName' and missing URL in the config" '
     +  git -C server config promisor.advertise true &&
    ++  test_when_finished "rm -rf client" &&
     +
     +  # Clone from server to create a client
     +  # Lazy fetching by the client from the LOP will fail because of the
    @@ t/t5710-promisor-remote-capability.sh: test_expect_success "clone with 'KnownNam
     +  GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
     +          -c promisor.acceptfromserver=KnownName \
     +          --no-local --filter="blob:limit=5k" server client &&
    -+  test_when_finished "rm -rf client" &&
     +
     +  # Check that the largest object is not missing on the server
     +  check_missing_objects server 0 "" &&
    @@ t/t5710-promisor-remote-capability.sh: test_expect_success "clone with 'KnownNam
     +
      test_expect_success "clone with promisor.acceptfromserver set to 'KnownUrl'" '
        git -C server config promisor.advertise true &&
    - 
    +   test_when_finished "rm -rf client" &&
     @@ t/t5710-promisor-remote-capability.sh: test_expect_success "clone with 'KnownUrl' and different remote urls" '
        initialize_server 1 "$oid"
      '
      
     +test_expect_success "clone with 'KnownUrl' and url not configured on the server" '
     +  git -C server config promisor.advertise true &&
    ++  test_when_finished "rm -rf client" &&
     +
    -+  git -C server config unset remote.lop.url &&
     +  test_when_finished "git -C server config set remote.lop.url \"file://$(pwd)/lop\"" &&
    ++  git -C server config unset remote.lop.url &&
     +
     +  # Clone from server to create a client
     +  # It should fail because the client will reject the LOP as URLs are
    @@ t/t5710-promisor-remote-capability.sh: test_expect_success "clone with 'KnownUrl
     +
     +test_expect_success "clone with 'KnownUrl' and empty url, so not advertised" '
     +  git -C server config promisor.advertise true &&
    ++  test_when_finished "rm -rf client" &&
     +
    -+  git -C server config set remote.lop.url "" &&
     +  test_when_finished "git -C server config set remote.lop.url \"file://$(pwd)/lop\"" &&
    ++  git -C server config set remote.lop.url "" &&
     +
     +  # Clone from server to create a client
     +  # It should fail because the client will reject the LOP as an empty URL is
2:  8981eb9dae = 3:  f56dccc5e2 promisor-remote: fix possible issue when no URL is advertised
3:  a8a9f9b33b = 4:  81387f61c3 promisor-remote: compare remote names case sensitively

Christian Couder (4):
  t5710: arrange to delete the client before cloning
  promisor-remote: fix segfault when remote URL is missing
  promisor-remote: fix possible issue when no URL is advertised
  promisor-remote: compare remote names case sensitively

 Documentation/config/promisor.adoc    |  4 +-
 promisor-remote.c                     | 27 ++++++----
 t/t5710-promisor-remote-capability.sh | 75 ++++++++++++++++++++++++---
 3 files changed, 86 insertions(+), 20 deletions(-)

-- 
2.49.0.1.g12e6251c65



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v6 1/4] t5710: arrange to delete the client before cloning
  2025-03-18 11:00         ` [PATCH v6 0/4] "promisor-remote" capability fixes Christian Couder
@ 2025-03-18 11:00           ` Christian Couder
  2025-03-18 11:00           ` [PATCH v6 2/4] promisor-remote: fix segfault when remote URL is missing Christian Couder
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 35+ messages in thread
From: Christian Couder @ 2025-03-18 11:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Patrick Steinhardt, Taylor Blau,
	Eric Sunshine, Karthik Nayak, Kristoffer Haugsbakk,
	brian m . carlson, Randall S . Becker, Christian Couder,
	Christian Couder

If `test_when_finished "rm -rf client"` is run after we clone, it
will not run if the clone failed, so the "client" directory might
not be removed at the end of the test.

`git clone` does try to remove the directory when it fails, but
let's be safe and try to protect against possibly weird clone
failures by moving `test_when_finished "rm -rf client"` before
the clone. It just makes more sense this way around.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t5710-promisor-remote-capability.sh | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index d2cc69a17e..e26a97f588 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -93,6 +93,7 @@ test_expect_success "setup for testing promisor remote advertisement" '
 
 test_expect_success "clone with promisor.advertise set to 'true'" '
 	git -C server config promisor.advertise true &&
+	test_when_finished "rm -rf client" &&
 
 	# Clone from server to create a client
 	GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
@@ -100,7 +101,6 @@ test_expect_success "clone with promisor.advertise set to 'true'" '
 		-c remote.lop.url="file://$(pwd)/lop" \
 		-c promisor.acceptfromserver=All \
 		--no-local --filter="blob:limit=5k" server client &&
-	test_when_finished "rm -rf client" &&
 
 	# Check that the largest object is still missing on the server
 	check_missing_objects server 1 "$oid"
@@ -108,6 +108,7 @@ test_expect_success "clone with promisor.advertise set to 'true'" '
 
 test_expect_success "clone with promisor.advertise set to 'false'" '
 	git -C server config promisor.advertise false &&
+	test_when_finished "rm -rf client" &&
 
 	# Clone from server to create a client
 	GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
@@ -115,7 +116,6 @@ test_expect_success "clone with promisor.advertise set to 'false'" '
 		-c remote.lop.url="file://$(pwd)/lop" \
 		-c promisor.acceptfromserver=All \
 		--no-local --filter="blob:limit=5k" server client &&
-	test_when_finished "rm -rf client" &&
 
 	# Check that the largest object is not missing on the server
 	check_missing_objects server 0 "" &&
@@ -126,6 +126,7 @@ test_expect_success "clone with promisor.advertise set to 'false'" '
 
 test_expect_success "clone with promisor.acceptfromserver set to 'None'" '
 	git -C server config promisor.advertise true &&
+	test_when_finished "rm -rf client" &&
 
 	# Clone from server to create a client
 	GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
@@ -133,7 +134,6 @@ test_expect_success "clone with promisor.acceptfromserver set to 'None'" '
 		-c remote.lop.url="file://$(pwd)/lop" \
 		-c promisor.acceptfromserver=None \
 		--no-local --filter="blob:limit=5k" server client &&
-	test_when_finished "rm -rf client" &&
 
 	# Check that the largest object is not missing on the server
 	check_missing_objects server 0 "" &&
@@ -144,8 +144,8 @@ test_expect_success "clone with promisor.acceptfromserver set to 'None'" '
 
 test_expect_success "init + fetch with promisor.advertise set to 'true'" '
 	git -C server config promisor.advertise true &&
-
 	test_when_finished "rm -rf client" &&
+
 	mkdir client &&
 	git -C client init &&
 	git -C client config remote.lop.promisor true &&
@@ -162,6 +162,7 @@ test_expect_success "init + fetch with promisor.advertise set to 'true'" '
 
 test_expect_success "clone with promisor.acceptfromserver set to 'KnownName'" '
 	git -C server config promisor.advertise true &&
+	test_when_finished "rm -rf client" &&
 
 	# Clone from server to create a client
 	GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
@@ -169,7 +170,6 @@ test_expect_success "clone with promisor.acceptfromserver set to 'KnownName'" '
 		-c remote.lop.url="file://$(pwd)/lop" \
 		-c promisor.acceptfromserver=KnownName \
 		--no-local --filter="blob:limit=5k" server client &&
-	test_when_finished "rm -rf client" &&
 
 	# Check that the largest object is still missing on the server
 	check_missing_objects server 1 "$oid"
@@ -177,6 +177,7 @@ test_expect_success "clone with promisor.acceptfromserver set to 'KnownName'" '
 
 test_expect_success "clone with 'KnownName' and different remote names" '
 	git -C server config promisor.advertise true &&
+	test_when_finished "rm -rf client" &&
 
 	# Clone from server to create a client
 	GIT_NO_LAZY_FETCH=0 git clone -c remote.serverTwo.promisor=true \
@@ -184,7 +185,6 @@ test_expect_success "clone with 'KnownName' and different remote names" '
 		-c remote.serverTwo.url="file://$(pwd)/lop" \
 		-c promisor.acceptfromserver=KnownName \
 		--no-local --filter="blob:limit=5k" server client &&
-	test_when_finished "rm -rf client" &&
 
 	# Check that the largest object is not missing on the server
 	check_missing_objects server 0 "" &&
@@ -195,6 +195,7 @@ test_expect_success "clone with 'KnownName' and different remote names" '
 
 test_expect_success "clone with promisor.acceptfromserver set to 'KnownUrl'" '
 	git -C server config promisor.advertise true &&
+	test_when_finished "rm -rf client" &&
 
 	# Clone from server to create a client
 	GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
@@ -202,7 +203,6 @@ test_expect_success "clone with promisor.acceptfromserver set to 'KnownUrl'" '
 		-c remote.lop.url="file://$(pwd)/lop" \
 		-c promisor.acceptfromserver=KnownUrl \
 		--no-local --filter="blob:limit=5k" server client &&
-	test_when_finished "rm -rf client" &&
 
 	# Check that the largest object is still missing on the server
 	check_missing_objects server 1 "$oid"
@@ -212,6 +212,7 @@ test_expect_success "clone with 'KnownUrl' and different remote urls" '
 	ln -s lop serverTwo &&
 
 	git -C server config promisor.advertise true &&
+	test_when_finished "rm -rf client" &&
 
 	# Clone from server to create a client
 	GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
@@ -219,7 +220,6 @@ test_expect_success "clone with 'KnownUrl' and different remote urls" '
 		-c remote.lop.url="file://$(pwd)/serverTwo" \
 		-c promisor.acceptfromserver=KnownUrl \
 		--no-local --filter="blob:limit=5k" server client &&
-	test_when_finished "rm -rf client" &&
 
 	# Check that the largest object is not missing on the server
 	check_missing_objects server 0 "" &&
-- 
2.49.0.1.g12e6251c65



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v6 2/4] promisor-remote: fix segfault when remote URL is missing
  2025-03-18 11:00         ` [PATCH v6 0/4] "promisor-remote" capability fixes Christian Couder
  2025-03-18 11:00           ` [PATCH v6 1/4] t5710: arrange to delete the client before cloning Christian Couder
@ 2025-03-18 11:00           ` Christian Couder
  2025-03-18 11:00           ` [PATCH v6 3/4] promisor-remote: fix possible issue when no URL is advertised Christian Couder
  2025-03-18 11:00           ` [PATCH v6 4/4] promisor-remote: compare remote names case sensitively Christian Couder
  3 siblings, 0 replies; 35+ messages in thread
From: Christian Couder @ 2025-03-18 11:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Patrick Steinhardt, Taylor Blau,
	Eric Sunshine, Karthik Nayak, Kristoffer Haugsbakk,
	brian m . carlson, Randall S . Becker, Christian Couder,
	Christian Couder

Using strvec_push() to push `NULL` into a 'strvec' results in a
segfault, because `xstrdup(NULL)` crashes.

So when an URL is missing from the config, let's not push the remote
name and URL into the 'strvec's.

While at it, let's also not push them in case the URL is empty. It's
just not worth the trouble and it's consistent with how Git otherwise
treats missing and empty URLs in the same way.

Note that in case of missing or empty URL, Git uses the remote name to
fetch, which can work if the remote is on the same filesystem. So
configurations where the client, server and remote are all on the same
filesystem may need URLs to be configured even if they are the same as
the remote names. But this is a rare case, and the work around is easy
enough.

We leave improving the strvec API and/or xstrdup() for a future
separate effort.

While at it, let's also use git_config_get_string_tmp() instead of
git_config_get_string() to simplify memory management.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 promisor-remote.c                     | 16 +++----
 t/t5710-promisor-remote-capability.sh | 61 +++++++++++++++++++++++++++
 2 files changed, 69 insertions(+), 8 deletions(-)

diff --git a/promisor-remote.c b/promisor-remote.c
index 6a0a61382f..ba80240f12 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -323,13 +323,15 @@ static void promisor_info_vecs(struct repository *repo,
 	promisor_remote_init(repo);
 
 	for (r = repo->promisor_remote_config->promisors; r; r = r->next) {
-		char *url;
+		const char *url;
 		char *url_key = xstrfmt("remote.%s.url", r->name);
 
-		strvec_push(names, r->name);
-		strvec_push(urls, git_config_get_string(url_key, &url) ? NULL : url);
+		/* Only add remotes with a non empty URL */
+		if (!git_config_get_string_tmp(url_key, &url) && *url) {
+			strvec_push(names, r->name);
+			strvec_push(urls, url);
+		}
 
-		free(url);
 		free(url_key);
 	}
 }
@@ -356,10 +358,8 @@ char *promisor_remote_info(struct repository *repo)
 			strbuf_addch(&sb, ';');
 		strbuf_addstr(&sb, "name=");
 		strbuf_addstr_urlencode(&sb, names.v[i], allow_unsanitized);
-		if (urls.v[i]) {
-			strbuf_addstr(&sb, ",url=");
-			strbuf_addstr_urlencode(&sb, urls.v[i], allow_unsanitized);
-		}
+		strbuf_addstr(&sb, ",url=");
+		strbuf_addstr_urlencode(&sb, urls.v[i], allow_unsanitized);
 	}
 
 	strvec_clear(&names);
diff --git a/t/t5710-promisor-remote-capability.sh b/t/t5710-promisor-remote-capability.sh
index e26a97f588..b35b774235 100755
--- a/t/t5710-promisor-remote-capability.sh
+++ b/t/t5710-promisor-remote-capability.sh
@@ -193,6 +193,25 @@ test_expect_success "clone with 'KnownName' and different remote names" '
 	initialize_server 1 "$oid"
 '
 
+test_expect_success "clone with 'KnownName' and missing URL in the config" '
+	git -C server config promisor.advertise true &&
+	test_when_finished "rm -rf client" &&
+
+	# Clone from server to create a client
+	# Lazy fetching by the client from the LOP will fail because of the
+	# missing URL in the client config, so the server will have to lazy
+	# fetch from the LOP.
+	GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c promisor.acceptfromserver=KnownName \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that the largest object is not missing on the server
+	check_missing_objects server 0 "" &&
+
+	# Reinitialize server so that the largest object is missing again
+	initialize_server 1 "$oid"
+'
+
 test_expect_success "clone with promisor.acceptfromserver set to 'KnownUrl'" '
 	git -C server config promisor.advertise true &&
 	test_when_finished "rm -rf client" &&
@@ -228,6 +247,48 @@ test_expect_success "clone with 'KnownUrl' and different remote urls" '
 	initialize_server 1 "$oid"
 '
 
+test_expect_success "clone with 'KnownUrl' and url not configured on the server" '
+	git -C server config promisor.advertise true &&
+	test_when_finished "rm -rf client" &&
+
+	test_when_finished "git -C server config set remote.lop.url \"file://$(pwd)/lop\"" &&
+	git -C server config unset remote.lop.url &&
+
+	# Clone from server to create a client
+	# It should fail because the client will reject the LOP as URLs are
+	# different, and the server cannot lazy fetch as the LOP URL is
+	# missing, so the remote name will be used instead which will fail.
+	test_must_fail env GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.lop.url="file://$(pwd)/lop" \
+		-c promisor.acceptfromserver=KnownUrl \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
+test_expect_success "clone with 'KnownUrl' and empty url, so not advertised" '
+	git -C server config promisor.advertise true &&
+	test_when_finished "rm -rf client" &&
+
+	test_when_finished "git -C server config set remote.lop.url \"file://$(pwd)/lop\"" &&
+	git -C server config set remote.lop.url "" &&
+
+	# Clone from server to create a client
+	# It should fail because the client will reject the LOP as an empty URL is
+	# not advertised, and the server cannot lazy fetch as the LOP URL is empty,
+	# so the remote name will be used instead which will fail.
+	test_must_fail env GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
+		-c remote.lop.fetch="+refs/heads/*:refs/remotes/lop/*" \
+		-c remote.lop.url="file://$(pwd)/lop" \
+		-c promisor.acceptfromserver=KnownUrl \
+		--no-local --filter="blob:limit=5k" server client &&
+
+	# Check that the largest object is still missing on the server
+	check_missing_objects server 1 "$oid"
+'
+
 test_expect_success "clone with promisor.advertise set to 'true' but don't delete the client" '
 	git -C server config promisor.advertise true &&
 
-- 
2.49.0.1.g12e6251c65



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v6 3/4] promisor-remote: fix possible issue when no URL is advertised
  2025-03-18 11:00         ` [PATCH v6 0/4] "promisor-remote" capability fixes Christian Couder
  2025-03-18 11:00           ` [PATCH v6 1/4] t5710: arrange to delete the client before cloning Christian Couder
  2025-03-18 11:00           ` [PATCH v6 2/4] promisor-remote: fix segfault when remote URL is missing Christian Couder
@ 2025-03-18 11:00           ` Christian Couder
  2025-03-18 11:00           ` [PATCH v6 4/4] promisor-remote: compare remote names case sensitively Christian Couder
  3 siblings, 0 replies; 35+ messages in thread
From: Christian Couder @ 2025-03-18 11:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Patrick Steinhardt, Taylor Blau,
	Eric Sunshine, Karthik Nayak, Kristoffer Haugsbakk,
	brian m . carlson, Randall S . Becker, Christian Couder,
	Christian Couder

In the 'KnownUrl' case, in should_accept_remote(), let's check that
`remote_url` is not NULL before we use strcmp() to compare it with
the local URL. This could avoid crashes if a server starts to not
advertise any URL in the future.

If `remote_url` is NULL, we should reject the URL. Let's also warn in
this case because we warn otherwise when a remote is rejected to try
to help diagnose things at the end of the function.

And while we are checking that remote_url is not NULL and warning if
it is, it makes sense to also help diagnose the case where remote_url
is empty.

Also while at it, let's spell "URL" with uppercase letters in all the
warnings.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 promisor-remote.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/promisor-remote.c b/promisor-remote.c
index ba80240f12..0b7b1ec45a 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -409,10 +409,15 @@ static int should_accept_remote(enum accept_promisor accept,
 	if (accept != ACCEPT_KNOWN_URL)
 		BUG("Unhandled 'enum accept_promisor' value '%d'", accept);
 
+	if (!remote_url || !*remote_url) {
+		warning(_("no or empty URL advertised for remote '%s'"), remote_name);
+		return 0;
+	}
+
 	if (!strcmp(urls->v[i], remote_url))
 		return 1;
 
-	warning(_("known remote named '%s' but with url '%s' instead of '%s'"),
+	warning(_("known remote named '%s' but with URL '%s' instead of '%s'"),
 		remote_name, urls->v[i], remote_url);
 
 	return 0;
-- 
2.49.0.1.g12e6251c65



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v6 4/4] promisor-remote: compare remote names case sensitively
  2025-03-18 11:00         ` [PATCH v6 0/4] "promisor-remote" capability fixes Christian Couder
                             ` (2 preceding siblings ...)
  2025-03-18 11:00           ` [PATCH v6 3/4] promisor-remote: fix possible issue when no URL is advertised Christian Couder
@ 2025-03-18 11:00           ` Christian Couder
  3 siblings, 0 replies; 35+ messages in thread
From: Christian Couder @ 2025-03-18 11:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Patrick Steinhardt, Taylor Blau,
	Eric Sunshine, Karthik Nayak, Kristoffer Haugsbakk,
	brian m . carlson, Randall S . Becker, Christian Couder,
	Christian Couder

Because the "[remote "nick"] fetch = ..." configuration variables
have the nickname in the second part, the nicknames are case
sensitive, unlike the first and the third component (i.e.
"remote.origin.fetch" and "Remote.origin.FETCH" are the same thing,
but "remote.Origin.fetch" and "remote.origin.fetch" are different).

Let's follow the way Git works in general and compare the remote
names case sensitively when processing advertised remotes.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/config/promisor.adoc | 4 ++--
 promisor-remote.c                  | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/config/promisor.adoc b/Documentation/config/promisor.adoc
index 9192acfd24..2638b01f83 100644
--- a/Documentation/config/promisor.adoc
+++ b/Documentation/config/promisor.adoc
@@ -26,5 +26,5 @@ promisor.acceptFromServer::
 	server will be accepted. By accepting a promisor remote, the
 	client agrees that the server might omit objects that are
 	lazily fetchable from this promisor remote from its responses
-	to "fetch" and "clone" requests from the client. See
-	linkgit:gitprotocol-v2[5].
+	to "fetch" and "clone" requests from the client. Name and URL
+	comparisons are case sensitive. See linkgit:gitprotocol-v2[5].
diff --git a/promisor-remote.c b/promisor-remote.c
index 0b7b1ec45a..5801ebfd9b 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -370,13 +370,13 @@ char *promisor_remote_info(struct repository *repo)
 
 /*
  * Find first index of 'nicks' where there is 'nick'. 'nick' is
- * compared case insensitively to the strings in 'nicks'. If not found
+ * compared case sensitively to the strings in 'nicks'. If not found
  * 'nicks->nr' is returned.
  */
 static size_t remote_nick_find(struct strvec *nicks, const char *nick)
 {
 	for (size_t i = 0; i < nicks->nr; i++)
-		if (!strcasecmp(nicks->v[i], nick))
+		if (!strcmp(nicks->v[i], nick))
 			return i;
 	return nicks->nr;
 }
-- 
2.49.0.1.g12e6251c65



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 1/3] promisor-remote: fix segfault when remote URL is missing
  2025-03-14 18:59           ` Junio C Hamano
@ 2025-03-18 11:03             ` Christian Couder
  0 siblings, 0 replies; 35+ messages in thread
From: Christian Couder @ 2025-03-18 11:03 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder

On Fri, Mar 14, 2025 at 7:59 PM Junio C Hamano <gitster@pobox.com> wrote:

> > +test_expect_success "clone with 'KnownName' and missing URL in the config" '
> > +     git -C server config promisor.advertise true &&
> > +
> > +     # Clone from server to create a client
> > +     # Lazy fetching by the client from the LOP will fail because of the
> > +     # missing URL in the client config, so the server will have to lazy
> > +     # fetch from the LOP.
> > +     GIT_NO_LAZY_FETCH=0 git clone -c remote.lop.promisor=true \
> > +             -c promisor.acceptfromserver=KnownName \
> > +             --no-local --filter="blob:limit=5k" server client &&
> > +     test_when_finished "rm -rf client" &&
>
> These are the other way around.  When 'clone' fails, test_when_finished
> is not run, so nobody arranges the new directory 'client' to be removed.
> "git clone" does try to remove in such a case, but we are protecting
> against a failing "clone", so swapping them around, i.e. arrange to
> remove it and then try to create it, would make more sense.

Yeah, right. I made this change in the next version.

I also think it would make more sense for all the tests in this test
script to arrange to remove it before cloning in case the clone fails
in weird ways. So I am adding a preparatory patch to do that in the
next version then.

> > +test_expect_success "clone with 'KnownUrl' and url not configured on the server" '
> > +     git -C server config promisor.advertise true &&
> > +
> > +     git -C server config unset remote.lop.url &&
> > +     test_when_finished "git -C server config set remote.lop.url \"file://$(pwd)/lop\"" &&
>
> Probably the same principle applies here, but the case where "git
> config" fails, it is likely that the file is not touched at all, or
> it gets so corrupt beyond salvaging with another "config set", so it
> matters much less than the previous one.

I agree it would be a bit better, so, in the next version, I moved the
"test_when_finished ..." line above the other one. I did that for the
"clone with 'KnownUrl' and empty url, so not advertised" test below
too.

While at it, I think it's also a bit better to add `test_when_finished
"rm -rf client"` to these 2 tests, to protect the following test in
case the clone actually succeeds. So I have added that in the next
version.

Thanks.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 3/3] promisor-remote: compare remote names case sensitively
  2025-03-14 17:28           ` Junio C Hamano
@ 2025-03-18 11:04             ` Christian Couder
  0 siblings, 0 replies; 35+ messages in thread
From: Christian Couder @ 2025-03-18 11:04 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Patrick Steinhardt, Taylor Blau, Eric Sunshine,
	Karthik Nayak, Kristoffer Haugsbakk, brian m . carlson,
	Randall S . Becker, Christian Couder

On Fri, Mar 14, 2025 at 6:28 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > Because the "[remote "nick"] fetch = ..." configuration variables
> > have the nickname in the second part, the nicknames are case
> > sensitive, unlike the first and the third component (i.e.
> > "remote.origin.fetch" and "Remote.origin.FETCH" are the same thing,
> > but "remote.Origin.fetch" and "remote.origin.fetch" are different).
>
> I double-checked what the control flow that passes through
> remote.c:handle_config() does, and the above is in line with what
> remote_get() does.
>
> remote.c:read_config() populates the nickname-to-remote hashmap by
> using handle_config() callback, which calls make_remote() with the
> second level name (e.g. "Origin" and "origin" in the last example of
> the above), which is passed to memhash() not memihash() when looking
> up or registering the remote.
>
> If we used case insensitive comparison in the new code, a malicious
> large-object promisor remote could have told us to use "Origin" as
> an extra promisor and in response the new code may noticed that we
> have "origin" and tried to equate it with what the other side told
> us.  But when the existing code actually interacts with the promisor
> remote, it wouldn't have found any configured remote under the name
> "Origin", and something funny would start from there.  By using the
> right remote consistently throughout the system, we would not get
> confused that way, which is good.

Yeah, right. Thanks for looking into it.

I don't think this needs to be in the commit message, so I haven't
changed this patch in the next version. But I would be fine with
adding your explanations or something similar if someone thinks it's
worth it.


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2025-03-18 11:06 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-10  7:40 [PATCH] promisor-remote: fix segfault when remote URL is missing Christian Couder
2025-03-10 16:29 ` Junio C Hamano
2025-03-11 15:24   ` Christian Couder
2025-03-11 16:57     ` Junio C Hamano
2025-03-11 15:24 ` [PATCH v2] " Christian Couder
2025-03-11 16:59   ` Junio C Hamano
2025-03-12 11:48     ` Christian Couder
2025-03-11 20:48   ` Junio C Hamano
2025-03-12 11:47     ` Christian Couder
2025-03-11 23:06   ` Jeff King
2025-03-11 23:36     ` Junio C Hamano
2025-03-12 11:47     ` Christian Couder
2025-03-12 11:46   ` [PATCH v3] " Christian Couder
2025-03-12 17:02     ` Junio C Hamano
2025-03-13 10:39       ` Christian Couder
2025-03-13 16:40         ` Junio C Hamano
2025-03-14 14:09           ` Christian Couder
2025-03-14 17:28             ` Junio C Hamano
2025-03-13 10:38     ` [PATCH v4] " Christian Couder
2025-03-13 16:28       ` Junio C Hamano
2025-03-13 17:23         ` Junio C Hamano
2025-03-14 14:10         ` Christian Couder
2025-03-14 14:12       ` [PATCH v5 0/3] "promisor-remote" capability fixes Christian Couder
2025-03-14 14:12         ` [PATCH v5 1/3] promisor-remote: fix segfault when remote URL is missing Christian Couder
2025-03-14 18:59           ` Junio C Hamano
2025-03-18 11:03             ` Christian Couder
2025-03-14 14:12         ` [PATCH v5 2/3] promisor-remote: fix possible issue when no URL is advertised Christian Couder
2025-03-14 14:12         ` [PATCH v5 3/3] promisor-remote: compare remote names case sensitively Christian Couder
2025-03-14 17:28           ` Junio C Hamano
2025-03-18 11:04             ` Christian Couder
2025-03-18 11:00         ` [PATCH v6 0/4] "promisor-remote" capability fixes Christian Couder
2025-03-18 11:00           ` [PATCH v6 1/4] t5710: arrange to delete the client before cloning Christian Couder
2025-03-18 11:00           ` [PATCH v6 2/4] promisor-remote: fix segfault when remote URL is missing Christian Couder
2025-03-18 11:00           ` [PATCH v6 3/4] promisor-remote: fix possible issue when no URL is advertised Christian Couder
2025-03-18 11:00           ` [PATCH v6 4/4] promisor-remote: compare remote names case sensitively Christian Couder

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).