git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC PATCH 0/2] Conditional config includes based on remote URL
@ 2021-10-12 22:57 Jonathan Tan
  2021-10-12 22:57 ` [RFC PATCH 1/2] config: make git_config_include() static Jonathan Tan
                   ` (11 more replies)
  0 siblings, 12 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-10-12 22:57 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Previously [1], I sent a patch set for remote-suggested configs that are
transmitted when fetching, but there were some security concerns. Here
is another way that remote repo administators can provide recommended
configs - through conditionally included files based on the configured
remote. Git itself neither transmits nor prompts for these files, which
hopefully reduces people's concerns.

More information is in the commit message of patch 2.

[1] https://lore.kernel.org/git/cover.1623881977.git.jonathantanmy@google.com/

Jonathan Tan (2):
  config: make git_config_include() static
  config: include file if remote URL matches a glob

 config.c          | 80 ++++++++++++++++++++++++++++++++++++++++++-----
 config.h          | 37 +++-------------------
 t/t1300-config.sh | 27 ++++++++++++++++
 3 files changed, 103 insertions(+), 41 deletions(-)

-- 
2.33.0.882.g93a45727a2-goog


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [RFC PATCH 1/2] config: make git_config_include() static
  2021-10-12 22:57 [RFC PATCH 0/2] Conditional config includes based on remote URL Jonathan Tan
@ 2021-10-12 22:57 ` Jonathan Tan
  2021-10-12 23:07   ` Jeff King
                     ` (2 more replies)
  2021-10-12 22:57 ` [RFC PATCH 2/2] config: include file if remote URL matches a glob Jonathan Tan
                   ` (10 subsequent siblings)
  11 siblings, 3 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-10-12 22:57 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

It is not used from outside the file in which it is declared.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 config.c | 12 +++++++++++-
 config.h | 37 ++++---------------------------------
 2 files changed, 15 insertions(+), 34 deletions(-)

diff --git a/config.c b/config.c
index 2edf835262..365d57833b 100644
--- a/config.c
+++ b/config.c
@@ -120,6 +120,16 @@ static long config_buf_ftell(struct config_source *conf)
 	return conf->u.buf.pos;
 }
 
+struct config_include_data {
+	int depth;
+	config_fn_t fn;
+	void *data;
+	const struct config_options *opts;
+};
+#define CONFIG_INCLUDE_INIT { 0 }
+
+static int git_config_include(const char *var, const char *value, void *data);
+
 #define MAX_INCLUDE_DEPTH 10
 static const char include_depth_advice[] = N_(
 "exceeded maximum include depth (%d) while including\n"
@@ -306,7 +316,7 @@ static int include_condition_is_true(const struct config_options *opts,
 	return 0;
 }
 
-int git_config_include(const char *var, const char *value, void *data)
+static int git_config_include(const char *var, const char *value, void *data)
 {
 	struct config_include_data *inc = data;
 	const char *cond, *key;
diff --git a/config.h b/config.h
index 147f5e0490..b11b0be09a 100644
--- a/config.h
+++ b/config.h
@@ -126,6 +126,8 @@ int git_default_config(const char *, const char *, void *);
 /**
  * Read a specific file in git-config format.
  * This function takes the same callback and data parameters as `git_config`.
+ *
+ * Unlike git_config(), this function does not respect includes.
  */
 int git_config_from_file(config_fn_t fn, const char *, void *);
 
@@ -158,6 +160,8 @@ void read_very_early_config(config_fn_t cb, void *data);
  * will first feed the user-wide one to the callback, and then the
  * repo-specific one; by overwriting, the higher-priority repo-specific
  * value is left at the end).
+ *
+ * Unlike git_config_from_file(), this function respects includes.
  */
 void git_config(config_fn_t fn, void *);
 
@@ -339,39 +343,6 @@ const char *current_config_origin_type(void);
 const char *current_config_name(void);
 int current_config_line(void);
 
-/**
- * Include Directives
- * ------------------
- *
- * By default, the config parser does not respect include directives.
- * However, a caller can use the special `git_config_include` wrapper
- * callback to support them. To do so, you simply wrap your "real" callback
- * function and data pointer in a `struct config_include_data`, and pass
- * the wrapper to the regular config-reading functions. For example:
- *
- * -------------------------------------------
- * int read_file_with_include(const char *file, config_fn_t fn, void *data)
- * {
- * struct config_include_data inc = CONFIG_INCLUDE_INIT;
- * inc.fn = fn;
- * inc.data = data;
- * return git_config_from_file(git_config_include, file, &inc);
- * }
- * -------------------------------------------
- *
- * `git_config` respects includes automatically. The lower-level
- * `git_config_from_file` does not.
- *
- */
-struct config_include_data {
-	int depth;
-	config_fn_t fn;
-	void *data;
-	const struct config_options *opts;
-};
-#define CONFIG_INCLUDE_INIT { 0 }
-int git_config_include(const char *name, const char *value, void *data);
-
 /*
  * Match and parse a config key of the form:
  *
-- 
2.33.0.882.g93a45727a2-goog


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [RFC PATCH 2/2] config: include file if remote URL matches a glob
  2021-10-12 22:57 [RFC PATCH 0/2] Conditional config includes based on remote URL Jonathan Tan
  2021-10-12 22:57 ` [RFC PATCH 1/2] config: make git_config_include() static Jonathan Tan
@ 2021-10-12 22:57 ` Jonathan Tan
  2021-10-12 23:30   ` Jeff King
  2021-10-12 23:48   ` Junio C Hamano
  2021-10-13  0:46 ` [RFC PATCH 0/2] Conditional config includes based on remote URL brian m. carlson
                   ` (9 subsequent siblings)
  11 siblings, 2 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-10-12 22:57 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

This is a feature that supports config file inclusion conditional on
whether the repo has a remote with a URL that matches a glob.

Similar to my previous work on remote-suggested hooks [1], the main
motivation is to allow remote repo administrators to provide recommended
configs in a way that can be consumed more easily (e.g. through a
package installable by a package manager). But this can also be used by,
say, an individual that wants certain configs to apply to a certain set
of local repos but not others.

I marked this as RFC because there are some design points that need to
be resolved:

 - The existing "include" and "includeIf" instructions are executed
   immediately, whereas in order to be useful, the execution of
   "includeIf hasremoteurl" needs to be delayed until all config files
   are read. Are there better ways to do this?

 - Is the conditionally-included file allowed to have its own
   "include{,If}" instructions? I'm thinking that we should forbid it
   because, for example, if we had 4 files as follows: A includes B and
   C includes D, and we include A and C in our main config (in that
   order), it wouldn't be clear whether B (because A was first included)
   or C (because we should execute everything at the same depth first)
   should be executed first. (In this patch, I didn't do anything about
   includes.)

 - A small one: the exact format of the glob. I probably will treat the
   URL like a path.

[1] https://lore.kernel.org/git/cover.1623881977.git.jonathantanmy@google.com/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 config.c          | 70 +++++++++++++++++++++++++++++++++++++++++------
 t/t1300-config.sh | 27 ++++++++++++++++++
 2 files changed, 89 insertions(+), 8 deletions(-)

diff --git a/config.c b/config.c
index 365d57833b..448509d549 100644
--- a/config.c
+++ b/config.c
@@ -125,8 +125,20 @@ struct config_include_data {
 	config_fn_t fn;
 	void *data;
 	const struct config_options *opts;
+
+	/*
+	 * All remote URLs discovered when reading all config files.
+	 */
+	struct string_list remote_urls;
+
+	/*
+	 * All "includeif.hasremoteurl:" entries. The item is the URL glob and the
+	 * util is the path (must be freed).
+	 */
+	struct string_list include_url_glob_to_path;
 };
-#define CONFIG_INCLUDE_INIT { 0 }
+#define CONFIG_INCLUDE_INIT { .remote_urls = STRING_LIST_INIT_DUP, \
+	.include_url_glob_to_path = STRING_LIST_INIT_DUP }
 
 static int git_config_include(const char *var, const char *value, void *data);
 
@@ -319,10 +331,18 @@ static int include_condition_is_true(const struct config_options *opts,
 static int git_config_include(const char *var, const char *value, void *data)
 {
 	struct config_include_data *inc = data;
+	const char *remote_name;
+	size_t remote_name_len;
 	const char *cond, *key;
 	size_t cond_len;
 	int ret;
 
+	if (!parse_config_key(var, "remote", &remote_name, &remote_name_len,
+			      &key) &&
+	    remote_name &&
+	    !strcmp(key, "url"))
+		string_list_append(&inc->remote_urls, value);
+
 	/*
 	 * Pass along all values, including "include" directives; this makes it
 	 * possible to query information on the includes themselves.
@@ -335,9 +355,18 @@ static int git_config_include(const char *var, const char *value, void *data)
 		ret = handle_path_include(value, inc);
 
 	if (!parse_config_key(var, "includeif", &cond, &cond_len, &key) &&
-	    (cond && include_condition_is_true(inc->opts, cond, cond_len)) &&
-	    !strcmp(key, "path"))
-		ret = handle_path_include(value, inc);
+	    cond && !strcmp(key, "path")) {
+		if (skip_prefix_mem(cond, cond_len, "hasremoteurl:", &cond,
+				    &cond_len)) {
+			struct string_list_item *item = string_list_append_nodup(
+				&inc->include_url_glob_to_path,
+				xmemdupz(cond, cond_len));
+			item->util = xstrdup(value);
+			ret = 0;
+		} else if (include_condition_is_true(inc->opts, cond, cond_len)) {
+			ret = handle_path_include(value, inc);
+		}
+	}
 
 	return ret;
 }
@@ -1951,6 +1980,8 @@ int config_with_options(config_fn_t fn, void *data,
 			const struct config_options *opts)
 {
 	struct config_include_data inc = CONFIG_INCLUDE_INIT;
+	int ret;
+	struct string_list_item *glob_item;
 
 	if (opts->respect_includes) {
 		inc.fn = fn;
@@ -1968,17 +1999,40 @@ int config_with_options(config_fn_t fn, void *data,
 	 * regular lookup sequence.
 	 */
 	if (config_source && config_source->use_stdin) {
-		return git_config_from_stdin(fn, data);
+		ret = git_config_from_stdin(fn, data);
 	} else if (config_source && config_source->file) {
-		return git_config_from_file(fn, config_source->file, data);
+		ret = git_config_from_file(fn, config_source->file, data);
 	} else if (config_source && config_source->blob) {
 		struct repository *repo = config_source->repo ?
 			config_source->repo : the_repository;
-		return git_config_from_blob_ref(fn, repo, config_source->blob,
+		ret = git_config_from_blob_ref(fn, repo, config_source->blob,
 						data);
+	} else {
+		ret = do_git_config_sequence(opts, fn, data);
+	}
+
+	for_each_string_list_item(glob_item, &inc.include_url_glob_to_path) {
+		struct strbuf pattern = STRBUF_INIT;
+		struct string_list_item *url_item;
+		int found = 0;
+
+		strbuf_addstr(&pattern, glob_item->string);
+		add_trailing_starstar_for_dir(&pattern);
+		for_each_string_list_item(url_item, &inc.remote_urls) {
+			if (!wildmatch(pattern.buf, url_item->string, WM_PATHNAME)) {
+				found = 1;
+				break;
+			}
+		}
+		strbuf_release(&pattern);
+		if (found) {
+			handle_path_include(glob_item->util, &inc);
+		}
 	}
 
-	return do_git_config_sequence(opts, fn, data);
+	string_list_clear(&inc.remote_urls, 0);
+	string_list_clear(&inc.include_url_glob_to_path, 1);
+	return ret;
 }
 
 static void configset_iter(struct config_set *cs, config_fn_t fn, void *data)
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 9ff46f3b04..4803155f89 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -2387,4 +2387,31 @@ test_expect_success '--get and --get-all with --fixed-value' '
 	test_must_fail git config --file=config --get-regexp --fixed-value fixed+ non-existent
 '
 
+test_expect_success 'includeIf.hasremoteurl' '
+	test_create_repo hasremoteurlTest &&
+
+	cat >"$(pwd)"/include-this <<-\EOF &&
+	[user]
+		this = this-is-included
+	EOF
+	cat >"$(pwd)"/dont-include-that <<-\EOF &&
+	[user]
+		that = that-is-not-included
+	EOF
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[includeIf "hasremoteurl:foo"]
+		path = "$(pwd)/include-this"
+	[includeIf "hasremoteurl:bar"]
+		path = "$(pwd)/dont-include-that"
+	[remote "foo"]
+		url = foo
+	EOF
+
+	echo this-is-included >expect-this &&
+	git -C hasremoteurlTest config --get user.this >actual-this &&
+	test_cmp expect-this actual-this &&
+
+	test_must_fail git -C hasremoteurlTest config --get user.that
+'
+
 test_done
-- 
2.33.0.882.g93a45727a2-goog


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 1/2] config: make git_config_include() static
  2021-10-12 22:57 ` [RFC PATCH 1/2] config: make git_config_include() static Jonathan Tan
@ 2021-10-12 23:07   ` Jeff King
  2021-10-12 23:26   ` Junio C Hamano
  2021-10-13  8:26   ` Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 87+ messages in thread
From: Jeff King @ 2021-10-12 23:07 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Oct 12, 2021 at 03:57:22PM -0700, Jonathan Tan wrote:

> It is not used from outside the file in which it is declared.

Makes sense. We used to use it from builtin/config.c, but that went away
in e895589883 (git-config: use git_config_with_options, 2012-10-23).

> diff --git a/config.h b/config.h
> index 147f5e0490..b11b0be09a 100644
> --- a/config.h
> +++ b/config.h
> @@ -126,6 +126,8 @@ int git_default_config(const char *, const char *, void *);
>  /**
>   * Read a specific file in git-config format.
>   * This function takes the same callback and data parameters as `git_config`.
> + *
> + * Unlike git_config(), this function does not respect includes.
>   */

Breaking out the relevant caller-facing parts of the documentation like
this is a nice touch.

And the rest of the patch looks good to me.

-Peff

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 1/2] config: make git_config_include() static
  2021-10-12 22:57 ` [RFC PATCH 1/2] config: make git_config_include() static Jonathan Tan
  2021-10-12 23:07   ` Jeff King
@ 2021-10-12 23:26   ` Junio C Hamano
  2021-10-13  8:26   ` Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 87+ messages in thread
From: Junio C Hamano @ 2021-10-12 23:26 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> It is not used from outside the file in which it is declared.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  config.c | 12 +++++++++++-
>  config.h | 37 ++++---------------------------------
>  2 files changed, 15 insertions(+), 34 deletions(-)

Nice.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 2/2] config: include file if remote URL matches a glob
  2021-10-12 22:57 ` [RFC PATCH 2/2] config: include file if remote URL matches a glob Jonathan Tan
@ 2021-10-12 23:30   ` Jeff King
  2021-10-13 18:33     ` Jonathan Tan
  2021-10-12 23:48   ` Junio C Hamano
  1 sibling, 1 reply; 87+ messages in thread
From: Jeff King @ 2021-10-12 23:30 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Oct 12, 2021 at 03:57:23PM -0700, Jonathan Tan wrote:

> This is a feature that supports config file inclusion conditional on
> whether the repo has a remote with a URL that matches a glob.
> 
> Similar to my previous work on remote-suggested hooks [1], the main
> motivation is to allow remote repo administrators to provide recommended
> configs in a way that can be consumed more easily (e.g. through a
> package installable by a package manager). But this can also be used by,
> say, an individual that wants certain configs to apply to a certain set
> of local repos but not others.

OK. I was a little wary after reading the subject that this would be
"when we are using such a URL", which is full of all kinds of odd corner
cases. But if it is "a remote is defined with a matching URL" that makes
it a property of the repository, not the operation.

I think in general this kind of feature is currently served by just
correlating filesystem paths with their function. So with your patch I
could do:

  [includeIf "hasremoteurl:https://myjob.example.com"]
  path = foo.conf

But in general, I'd imagine most people put their repository in ~/work
or similar, and just do:

  [includeIf "gitdir:~/work"]
  path = foo.conf

(and of course you can imagine more subdivisions as necessary). So I
find the use-case only sort-of compelling. In general, I'm in favor of
adding new includeIf directions even if they're only moderately
convenient. But this one is rather sticky, because it is dependent on
other config keys being defined. So it introduces a new and complicated
ordering issue. Is it worth it? Maybe I'm not being imaginative enough
in seeing the use cases.

> I marked this as RFC because there are some design points that need to
> be resolved:
> 
>  - The existing "include" and "includeIf" instructions are executed
>    immediately, whereas in order to be useful, the execution of
>    "includeIf hasremoteurl" needs to be delayed until all config files
>    are read. Are there better ways to do this?

Note that this violates the "as if they had been found at the location
of the include directive" rule which we advertise to users. I'd imagine
that most of the time it doesn't matter, but this is a pretty big
exception we'll need to document.

Just brainstorming some alternatives:

  - We could stop the world while we are parsing and do a _new_ parse
    that just looks at the remote config (in fact, this is the natural
    thing if you were consulting the regular remote.c code for the list
    of remotes, because it does its own config parse).

    That does mean that the remote-conditional includes cannot
    themselves define new remotes. But I think that is already the case
    with your patch (and violating that gets you into weird circular
    problems).

  - We could simply document that if you want to depend on conditional
    includes based on a particular remote.*.url existing, then that
    remote config must appear earlier in the sequence.

    This is a bit ugly, because I'm sure it will bite somebody
    eventually. But at the same time, it resolves all of the weird
    timing issues, and does so in a way that will be easy to match if we
    have any other config dependencies.

>  - Is the conditionally-included file allowed to have its own
>    "include{,If}" instructions? I'm thinking that we should forbid it
>    because, for example, if we had 4 files as follows: A includes B and
>    C includes D, and we include A and C in our main config (in that
>    order), it wouldn't be clear whether B (because A was first included)
>    or C (because we should execute everything at the same depth first)
>    should be executed first. (In this patch, I didn't do anything about
>    includes.)

I'd say that A would expand B at the moment it is parsed, by the usual
as-if rule. If it has a recursive includeIf on remotes, then my head may
explode. I'd argue that we should refuse to do recursive remote-ifs in
that case (though all of this is a consequence of the after-the-fact
parsing; I'd much prefer one of the alternatives I gave earlier).

>  - A small one: the exact format of the glob. I probably will treat the
>    URL like a path.

You might want to use the matcher from urlmatch.[ch], which understands
things like wildcards. Of course remote "URLs" are not always real
syntactically valid URLs, which may make that awkward.

Barring that the usual fnmatch glob is probably our best bet.

> @@ -319,10 +331,18 @@ static int include_condition_is_true(const struct config_options *opts,
>  static int git_config_include(const char *var, const char *value, void *data)
>  {
>  	struct config_include_data *inc = data;
> +	const char *remote_name;
> +	size_t remote_name_len;
>  	const char *cond, *key;
>  	size_t cond_len;
>  	int ret;
>  
> +	if (!parse_config_key(var, "remote", &remote_name, &remote_name_len,
> +			      &key) &&
> +	    remote_name &&
> +	    !strcmp(key, "url"))
> +		string_list_append(&inc->remote_urls, value);

So we make a copy of every remote name on the off chance that somebody
has an includeIf which looks at it. That feels wasteful, though in
practice it's probably not that big a deal.

By doing the config parsing ourselves here we're missing out on any
other forms of remote, like .git/remotes. Those are old and not widely
used, and I'd be OK with skipping them. But we should clearly document
that this is matching remote.*.url, not any of the other mechanisms.

> [...]

I only lightly read the rest of the patch. I didn't see anything
obviously wrong, but I think the goal at this point is figuring out the
design.

-Peff

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 2/2] config: include file if remote URL matches a glob
  2021-10-12 22:57 ` [RFC PATCH 2/2] config: include file if remote URL matches a glob Jonathan Tan
  2021-10-12 23:30   ` Jeff King
@ 2021-10-12 23:48   ` Junio C Hamano
  2021-10-13 19:52     ` Jonathan Tan
  1 sibling, 1 reply; 87+ messages in thread
From: Junio C Hamano @ 2021-10-12 23:48 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> I marked this as RFC because there are some design points that need to
> be resolved:
>
>  - The existing "include" and "includeIf" instructions are executed
>    immediately, whereas in order to be useful, the execution of
>    "includeIf hasremoteurl" needs to be delayed until all config files
>    are read. Are there better ways to do this?

An interesting chicken-and-egg problem.  Even if an included
configuration file does not have further "include", you may discover
there are more remotes, which may add new includes to fire from the
top-level configuration file.

What if we have multiple remotes?  Is it a sufficient match for only
one of them match what is mentioned in the includeIf condition?
Should all of them must match the pattern instead?  Majority,
perhaps?  Something else?

>  - Is the conditionally-included file allowed to have its own
>    "include{,If}" instructions? I'm thinking that we should forbid it
>    because, for example, if we had 4 files as follows: A includes B and
>    C includes D, and we include A and C in our main config (in that
>    order), it wouldn't be clear whether B (because A was first included)
>    or C (because we should execute everything at the same depth first)
>    should be executed first. (In this patch, I didn't do anything about
>    includes.)

Interesting.  The order of real inclusion obviously would affect the
outcome of the "last one wins" rule.  And this does not have to be
limited to this "hasremote" condition, so we need to design it with
a bit of care.

Would it be possible for a newly included file to invalidate an
earlier condition that was used to decide whether to include another
file or not?  If not, then you can take a two-pass approach where
the first pass is used to decide solely to discover which
conditionally included files are taken, clear the slate and the
parse these files in the textual order.  In the case of your example
above, the early part of the primary config would be the first to be
read, then comes A's early part, then comes B in its entirety, then
the rest of A, and then the middle part of the primary config, then
C's early part, then D, and then the rest of C,... you got the idea.

If it is possible for an included file to invalidate a condition we
have already evaluated to make a decision, it would become messy.
For example, we might decide to include another file based on the
value we discovered for a config variable:

    === .git/config ===
    [my] variable
    [your] variable = false

    [includeIf "configEq:my.variable==true"]
            file = fileA

but the included file may override the condition, e.g.

    === fileA ===
    [my] variable = false
    [your] variable = true

and applying the "last one wins" rule becomes messy.  I do not
offhand know what these

    $ git config --bool my.variable
    $ git config --bool your.variable

should say, and do not have a good explanation for possible
outcomes.

Maybe the above example can serve as a way to guide us when we
design the types of conditionals we allow in includeIf.  This
example tells us that it is probably a terrible idea to allow using
values of configuration variables as part of "includeIf" condition.

There may lead to similar "'hasremoteurl' makes a well-behaved
condition, because [remote] are additive and not 'last-one-wins',
but we cannot add 'lacksremoteurl' as a condition, because a file we
decide to include based on a 'lacks' predicate may invalidate the
'lacks' condition by defining such a remote" design decisions you'd
need to make around the URLs of the remotes defined for the
repository.



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 0/2] Conditional config includes based on remote URL
  2021-10-12 22:57 [RFC PATCH 0/2] Conditional config includes based on remote URL Jonathan Tan
  2021-10-12 22:57 ` [RFC PATCH 1/2] config: make git_config_include() static Jonathan Tan
  2021-10-12 22:57 ` [RFC PATCH 2/2] config: include file if remote URL matches a glob Jonathan Tan
@ 2021-10-13  0:46 ` brian m. carlson
  2021-10-13 18:17   ` Jonathan Tan
  2021-10-18 20:48 ` Jonathan Tan
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 87+ messages in thread
From: brian m. carlson @ 2021-10-13  0:46 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 920 bytes --]

On 2021-10-12 at 22:57:21, Jonathan Tan wrote:
> Previously [1], I sent a patch set for remote-suggested configs that are
> transmitted when fetching, but there were some security concerns. Here
> is another way that remote repo administators can provide recommended
> configs - through conditionally included files based on the configured
> remote. Git itself neither transmits nor prompts for these files, which
> hopefully reduces people's concerns.
> 
> More information is in the commit message of patch 2.

I won't go into the details of the patches, since I'm a little low on
time at the moment, but I think from what I've seen of the cover letter
and the commit messages, this approach is much better from a security
perspective and, provided we can get the kinks mentioned downthread
ironed out, I'd be happy to see this merged.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 1/2] config: make git_config_include() static
  2021-10-12 22:57 ` [RFC PATCH 1/2] config: make git_config_include() static Jonathan Tan
  2021-10-12 23:07   ` Jeff King
  2021-10-12 23:26   ` Junio C Hamano
@ 2021-10-13  8:26   ` Ævar Arnfjörð Bjarmason
  2021-10-13 17:00     ` Junio C Hamano
  2 siblings, 1 reply; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-13  8:26 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git


On Tue, Oct 12 2021, Jonathan Tan wrote:

> It is not used from outside the file in which it is declared.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  config.c | 12 +++++++++++-
>  config.h | 37 ++++---------------------------------
>  2 files changed, 15 insertions(+), 34 deletions(-)
>
> diff --git a/config.c b/config.c
> index 2edf835262..365d57833b 100644
> --- a/config.c
> +++ b/config.c
> @@ -120,6 +120,16 @@ static long config_buf_ftell(struct config_source *conf)
>  	return conf->u.buf.pos;
>  }
>  
> +struct config_include_data {
> +	int depth;
> +	config_fn_t fn;
> +	void *data;
> +	const struct config_options *opts;
> +};
> +#define CONFIG_INCLUDE_INIT { 0 }
> +
> +static int git_config_include(const char *var, const char *value, void *data);

Can't we just move the function here?

>  #define MAX_INCLUDE_DEPTH 10
>  static const char include_depth_advice[] = N_(
>  "exceeded maximum include depth (%d) while including\n"
> @@ -306,7 +316,7 @@ static int include_condition_is_true(const struct config_options *opts,
>  	return 0;
>  }
>  
> -int git_config_include(const char *var, const char *value, void *data)
> +static int git_config_include(const char *var, const char *value, void *data)

...and avoid the forward declaration?

I've seen that in a few places, making the diff smaller here doesn't
seem worth it v.s. maintaining the definition in two places.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 1/2] config: make git_config_include() static
  2021-10-13  8:26   ` Ævar Arnfjörð Bjarmason
@ 2021-10-13 17:00     ` Junio C Hamano
  2021-10-13 18:13       ` Jonathan Tan
  0 siblings, 1 reply; 87+ messages in thread
From: Junio C Hamano @ 2021-10-13 17:00 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Jonathan Tan, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Tue, Oct 12 2021, Jonathan Tan wrote:
>
>> It is not used from outside the file in which it is declared.
>>
>> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
>> ---
>>  config.c | 12 +++++++++++-
>>  config.h | 37 ++++---------------------------------
>>  2 files changed, 15 insertions(+), 34 deletions(-)
>>
>> diff --git a/config.c b/config.c
>> index 2edf835262..365d57833b 100644
>> --- a/config.c
>> +++ b/config.c
>> @@ -120,6 +120,16 @@ static long config_buf_ftell(struct config_source *conf)
>>  	return conf->u.buf.pos;
>>  }
>>  
>> +struct config_include_data {
>> +	int depth;
>> +	config_fn_t fn;
>> +	void *data;
>> +	const struct config_options *opts;
>> +};
>> +#define CONFIG_INCLUDE_INIT { 0 }
>> +
>> +static int git_config_include(const char *var, const char *value, void *data);
>
> Can't we just move the function here?
>
>>  #define MAX_INCLUDE_DEPTH 10
>>  static const char include_depth_advice[] = N_(
>>  "exceeded maximum include depth (%d) while including\n"
>> @@ -306,7 +316,7 @@ static int include_condition_is_true(const struct config_options *opts,
>>  	return 0;
>>  }
>>  
>> -int git_config_include(const char *var, const char *value, void *data)
>> +static int git_config_include(const char *var, const char *value, void *data)
>
> ...and avoid the forward declaration?
>
> I've seen that in a few places, making the diff smaller here doesn't
> seem worth it v.s. maintaining the definition in two places.

Sounds good.  If we are moving things around anyway, it is probably
a good time to do that, too ;-)

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 1/2] config: make git_config_include() static
  2021-10-13 17:00     ` Junio C Hamano
@ 2021-10-13 18:13       ` Jonathan Tan
  0 siblings, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-10-13 18:13 UTC (permalink / raw)
  To: gitster; +Cc: avarab, jonathantanmy, git

> > ...and avoid the forward declaration?
> >
> > I've seen that in a few places, making the diff smaller here doesn't
> > seem worth it v.s. maintaining the definition in two places.
> 
> Sounds good.  If we are moving things around anyway, it is probably
> a good time to do that, too ;-)

OK, I'll do this.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 0/2] Conditional config includes based on remote URL
  2021-10-13  0:46 ` [RFC PATCH 0/2] Conditional config includes based on remote URL brian m. carlson
@ 2021-10-13 18:17   ` Jonathan Tan
  0 siblings, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-10-13 18:17 UTC (permalink / raw)
  To: sandals; +Cc: git, Jonathan Tan

> I won't go into the details of the patches, since I'm a little low on
> time at the moment, but I think from what I've seen of the cover letter
> and the commit messages, this approach is much better from a security
> perspective and, provided we can get the kinks mentioned downthread
> ironed out, I'd be happy to see this merged.

Thanks - I really appreciate this note. Thanks also for all your
thoughts up to now about the security perspective.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 2/2] config: include file if remote URL matches a glob
  2021-10-12 23:30   ` Jeff King
@ 2021-10-13 18:33     ` Jonathan Tan
  2021-10-27 11:40       ` Jeff King
  0 siblings, 1 reply; 87+ messages in thread
From: Jonathan Tan @ 2021-10-13 18:33 UTC (permalink / raw)
  To: peff; +Cc: jonathantanmy, git

> OK. I was a little wary after reading the subject that this would be
> "when we are using such a URL", which is full of all kinds of odd corner
> cases. But if it is "a remote is defined with a matching URL" that makes
> it a property of the repository, not the operation.
> 
> I think in general this kind of feature is currently served by just
> correlating filesystem paths with their function. So with your patch I
> could do:
> 
>   [includeIf "hasremoteurl:https://myjob.example.com"]
>   path = foo.conf
> 
> But in general, I'd imagine most people put their repository in ~/work
> or similar, and just do:
> 
>   [includeIf "gitdir:~/work"]
>   path = foo.conf
> 
> (and of course you can imagine more subdivisions as necessary). So I
> find the use-case only sort-of compelling. In general, I'm in favor of
> adding new includeIf directions even if they're only moderately
> convenient. But this one is rather sticky, because it is dependent on
> other config keys being defined. So it introduces a new and complicated
> ordering issue. Is it worth it? Maybe I'm not being imaginative enough
> in seeing the use cases.

My main use case is for a remote repo administrator to offer a
recommended config to anyone who clones that repo. For this, I don't
think we can prescribe a local directory structure (e.g. "~/work")
without being too restrictive or broad (that is, if the user ends up
creating a repo that so happens to match our glob but did not intend the
config to apply to it).

I did bring up the idea that an individual could use this to have config
in one place that affects a subset of remotes, but you're right that
they could just do this by putting repositories at different places in
the filesystem.

> > I marked this as RFC because there are some design points that need to
> > be resolved:
> > 
> >  - The existing "include" and "includeIf" instructions are executed
> >    immediately, whereas in order to be useful, the execution of
> >    "includeIf hasremoteurl" needs to be delayed until all config files
> >    are read. Are there better ways to do this?
> 
> Note that this violates the "as if they had been found at the location
> of the include directive" rule which we advertise to users. I'd imagine
> that most of the time it doesn't matter, but this is a pretty big
> exception we'll need to document.

Yes, that's true. Another thing I just thought of is to add a new
"deferIncludeIf" which makes clear the different semantics (deferred
include, and perhaps not allow recursive includes).

> Just brainstorming some alternatives:
> 
>   - We could stop the world while we are parsing and do a _new_ parse
>     that just looks at the remote config (in fact, this is the natural
>     thing if you were consulting the regular remote.c code for the list
>     of remotes, because it does its own config parse).
> 
>     That does mean that the remote-conditional includes cannot
>     themselves define new remotes. But I think that is already the case
>     with your patch (and violating that gets you into weird circular
>     problems).

Hmm...yes, having a special-case rule that such an included file cannot
define new remotes would be complex.

>   - We could simply document that if you want to depend on conditional
>     includes based on a particular remote.*.url existing, then that
>     remote config must appear earlier in the sequence.
> 
>     This is a bit ugly, because I'm sure it will bite somebody
>     eventually. But at the same time, it resolves all of the weird
>     timing issues, and does so in a way that will be easy to match if we
>     have any other config dependencies.

My main issue with this is that different config files are read at
different times, and the repo config (that usually contains the remote)
is read last.

> >  - Is the conditionally-included file allowed to have its own
> >    "include{,If}" instructions? I'm thinking that we should forbid it
> >    because, for example, if we had 4 files as follows: A includes B and
> >    C includes D, and we include A and C in our main config (in that
> >    order), it wouldn't be clear whether B (because A was first included)
> >    or C (because we should execute everything at the same depth first)
> >    should be executed first. (In this patch, I didn't do anything about
> >    includes.)
> 
> I'd say that A would expand B at the moment it is parsed, by the usual
> as-if rule. If it has a recursive includeIf on remotes, then my head may
> explode. I'd argue that we should refuse to do recursive remote-ifs in
> that case (though all of this is a consequence of the after-the-fact
> parsing; I'd much prefer one of the alternatives I gave earlier).

If we can't expand in place, I would say that any recursive includes
should be refused. But as you said, we could still think about whether
in-place expansion can be done before addressing this question.

> >  - A small one: the exact format of the glob. I probably will treat the
> >    URL like a path.
> 
> You might want to use the matcher from urlmatch.[ch], which understands
> things like wildcards. Of course remote "URLs" are not always real
> syntactically valid URLs, which may make that awkward.
> 
> Barring that the usual fnmatch glob is probably our best bet.

OK.

> So we make a copy of every remote name on the off chance that somebody
> has an includeIf which looks at it. That feels wasteful, though in
> practice it's probably not that big a deal.
> 
> By doing the config parsing ourselves here we're missing out on any
> other forms of remote, like .git/remotes. Those are old and not widely
> used, and I'd be OK with skipping them. But we should clearly document
> that this is matching remote.*.url, not any of the other mechanisms.

Sounds good.

> I only lightly read the rest of the patch. I didn't see anything
> obviously wrong, but I think the goal at this point is figuring out the
> design.

Yes, that's right.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 2/2] config: include file if remote URL matches a glob
  2021-10-12 23:48   ` Junio C Hamano
@ 2021-10-13 19:52     ` Jonathan Tan
  0 siblings, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-10-13 19:52 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > I marked this as RFC because there are some design points that need to
> > be resolved:
> >
> >  - The existing "include" and "includeIf" instructions are executed
> >    immediately, whereas in order to be useful, the execution of
> >    "includeIf hasremoteurl" needs to be delayed until all config files
> >    are read. Are there better ways to do this?
> 
> An interesting chicken-and-egg problem.  Even if an included
> configuration file does not have further "include", you may discover
> there are more remotes, which may add new includes to fire from the
> top-level configuration file.

That's true. We might need to say that such conditional includes are
based only on what happened during the main config parsing.

> What if we have multiple remotes?  Is it a sufficient match for only
> one of them match what is mentioned in the includeIf condition?
> Should all of them must match the pattern instead?  Majority,
> perhaps?  Something else?

I think at least one remote should match.

> >  - Is the conditionally-included file allowed to have its own
> >    "include{,If}" instructions? I'm thinking that we should forbid it
> >    because, for example, if we had 4 files as follows: A includes B and
> >    C includes D, and we include A and C in our main config (in that
> >    order), it wouldn't be clear whether B (because A was first included)
> >    or C (because we should execute everything at the same depth first)
> >    should be executed first. (In this patch, I didn't do anything about
> >    includes.)
> 
> Interesting.  The order of real inclusion obviously would affect the
> outcome of the "last one wins" rule.  And this does not have to be
> limited to this "hasremote" condition, so we need to design it with
> a bit of care.
> 
> Would it be possible for a newly included file to invalidate an
> earlier condition that was used to decide whether to include another
> file or not?  If not, then you can take a two-pass approach where
> the first pass is used to decide solely to discover which
> conditionally included files are taken, clear the slate and the
> parse these files in the textual order.  In the case of your example
> above, the early part of the primary config would be the first to be
> read, then comes A's early part, then comes B in its entirety, then
> the rest of A, and then the middle part of the primary config, then
> C's early part, then D, and then the rest of C,... you got the idea.
>
> If it is possible for an included file to invalidate a condition we
> have already evaluated to make a decision, it would become messy.
> For example, we might decide to include another file based on the
> value we discovered for a config variable:
> 
>     === .git/config ===
>     [my] variable
>     [your] variable = false
> 
>     [includeIf "configEq:my.variable==true"]
>             file = fileA
> 
> but the included file may override the condition, e.g.
> 
>     === fileA ===
>     [my] variable = false
>     [your] variable = true
> 
> and applying the "last one wins" rule becomes messy.  I do not
> offhand know what these
> 
>     $ git config --bool my.variable
>     $ git config --bool your.variable
> 
> should say, and do not have a good explanation for possible
> outcomes.

In this case, it makes sense to me to think that files are included
entirely or not at all, so my.variable would be false and your.variable
would be true. I guess the tricky part is something like:

  === .git/config ===
  [my] variable = true
  [your] variable = false
  [includeIf "configEq:my.variable==true"]
    file = fileA
  [includeIf "configEq:my.variable==false"]
    file = fileB
  === fileA ===
    my.variable = false
  === fileB ===
    your.variable = true

and what my.variable and your.variable would end up being.

> Maybe the above example can serve as a way to guide us when we
> design the types of conditionals we allow in includeIf.  This
> example tells us that it is probably a terrible idea to allow using
> values of configuration variables as part of "includeIf" condition.

Hmm...well, remote.foo.url is a configuration variable. I think that the
two-pass approach you describe would work if we prohibit subsequent
inclusions.

> There may lead to similar "'hasremoteurl' makes a well-behaved
> condition, because [remote] are additive and not 'last-one-wins',
> but we cannot add 'lacksremoteurl' as a condition, because a file we
> decide to include based on a 'lacks' predicate may invalidate the
> 'lacks' condition by defining such a remote" design decisions you'd
> need to make around the URLs of the remotes defined for the
> repository.

And if we implement two-pass with no subsequent inclusions, "lacks"
would work the same way.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 0/2] Conditional config includes based on remote URL
  2021-10-12 22:57 [RFC PATCH 0/2] Conditional config includes based on remote URL Jonathan Tan
                   ` (2 preceding siblings ...)
  2021-10-13  0:46 ` [RFC PATCH 0/2] Conditional config includes based on remote URL brian m. carlson
@ 2021-10-18 20:48 ` Jonathan Tan
  2021-10-22  3:12   ` Emily Shaffer
  2021-10-27 11:55   ` Jeff King
  2021-10-25 13:03 ` Ævar Arnfjörð Bjarmason
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-10-18 20:48 UTC (permalink / raw)
  To: jonathantanmy; +Cc: git, peff, gitster

After some in-office discussion, here are the alternatives as I see it:

 (1) Introduce a "includeAfterIf" (or "deferIncludeIf", or some other
     name) command that is executed after all config files are read. (If
     there are multiple, they are executed in order of appearance.)
     Files included by this mechanism cannot directly or indirectly
     contain another "includeAfterIf". This is the same as what was
     introduced in this patch set, except for the name of the directive.

 (2) Leave the name as "includeIf", and when it is encountered with a
     remote-URL condition: continue parsing the config files, skipping
     all "includeIf hasRemoteUrl", only looking for remote.*.url. After
     that, resume the reading of config files at the first "includeIf
     hasRemoteUrl", using the prior remote.*.url information gathered to
     determine which files to include when "includeIf hasRemoteUrl" is
     encountered. Files included by this mechanism cannot contain any
     "remote.*.url" variables.

In all cases, the include is executed if at least one remote URL
matches.

There are other ideas including:

 (3) remote.*.url must appear before a "includeIf hasRemoteUrl" that
     wants to match it. (But this doesn't fit our use case, in which a
     repo config has the URL but a system or user config has the
     include.)

 (4) "includeIf hasRemoteUrl" triggers a search of the repo config just
     for remote.*.url. (I think this out-of-order config search is more
     complicated than (2), though.)

For (2), I think that prohibiting "remote.*.url" from any "includeIf
hasRemoteUrl" files sidesteps questions like "what happens when an
included file overrides the URL that made us include this file in the
first place" or "what happens if an included file includes a
remote.*.url that validates or invalidates a prior or subsequent file",
because now that cannot happen at all. My main concern with this
prohibition was that if we were to introduce another similar condition
(say, one based on remote names), what would happen? But I think this is
solvable - make the prohibitions based only on all the conditions that
the actually used, so if the user only uses conditions on remote URLs,
then the user can still set refspecs (for example), even after the
remote-name-condition feature is introduced in Git.

For (1), it is simpler in concept (and also in implementation, I think).
The user just needs to know that certain includes are on-the-spot and
certain includes (the ones with "after" in the name) are deferred - in
particular, if a config variable isn't the value they expect, they'll
need to check that it wasn't introduced in an includeAfterIf file. (And
the user also needs to figure out that if they want to override such a
variable, they'll need to make their own includeAfterIf with an
always-true condition.)

From the prior replies, I think that people will be more interested in
(2) as it preserves the "last config wins" rule, and I'm inclined to go
for (2) too. I'll see if others have any other opinions, and if not I'll
see how the implementation of (2) will look like.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 0/2] Conditional config includes based on remote URL
  2021-10-18 20:48 ` Jonathan Tan
@ 2021-10-22  3:12   ` Emily Shaffer
  2021-10-27 11:55   ` Jeff King
  1 sibling, 0 replies; 87+ messages in thread
From: Emily Shaffer @ 2021-10-22  3:12 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, gitster

On Mon, Oct 18, 2021 at 01:48:03PM -0700, Jonathan Tan wrote:
> 
> After some in-office discussion, here are the alternatives as I see it:
> 
>  (1) Introduce a "includeAfterIf" (or "deferIncludeIf", or some other
>      name) command that is executed after all config files are read. (If
>      there are multiple, they are executed in order of appearance.)
>      Files included by this mechanism cannot directly or indirectly
>      contain another "includeAfterIf". This is the same as what was
>      introduced in this patch set, except for the name of the directive.
> 
>  (2) Leave the name as "includeIf", and when it is encountered with a
>      remote-URL condition: continue parsing the config files, skipping
>      all "includeIf hasRemoteUrl", only looking for remote.*.url. After
>      that, resume the reading of config files at the first "includeIf
>      hasRemoteUrl", using the prior remote.*.url information gathered to
>      determine which files to include when "includeIf hasRemoteUrl" is
>      encountered. Files included by this mechanism cannot contain any
>      "remote.*.url" variables.
> 
> In all cases, the include is executed if at least one remote URL
> matches.
> 
> There are other ideas including:
> 
>  (3) remote.*.url must appear before a "includeIf hasRemoteUrl" that
>      wants to match it. (But this doesn't fit our use case, in which a
>      repo config has the URL but a system or user config has the
>      include.)
> 
>  (4) "includeIf hasRemoteUrl" triggers a search of the repo config just
>      for remote.*.url. (I think this out-of-order config search is more
>      complicated than (2), though.)
> 
> For (2), I think that prohibiting "remote.*.url" from any "includeIf
> hasRemoteUrl" files sidesteps questions like "what happens when an
> included file overrides the URL that made us include this file in the
> first place" or "what happens if an included file includes a
> remote.*.url that validates or invalidates a prior or subsequent file",
> because now that cannot happen at all. My main concern with this
> prohibition was that if we were to introduce another similar condition
> (say, one based on remote names), what would happen? But I think this is
> solvable - make the prohibitions based only on all the conditions that
> the actually used, so if the user only uses conditions on remote URLs,
> then the user can still set refspecs (for example), even after the
> remote-name-condition feature is introduced in Git.
> 
> For (1), it is simpler in concept (and also in implementation, I think).
> The user just needs to know that certain includes are on-the-spot and
> certain includes (the ones with "after" in the name) are deferred - in
> particular, if a config variable isn't the value they expect, they'll
> need to check that it wasn't introduced in an includeAfterIf file. (And
> the user also needs to figure out that if they want to override such a
> variable, they'll need to make their own includeAfterIf with an
> always-true condition.)
> 
> From the prior replies, I think that people will be more interested in
> (2) as it preserves the "last config wins" rule, and I'm inclined to go
> for (2) too. I'll see if others have any other opinions, and if not I'll
> see how the implementation of (2) will look like.

Another concern which came up for me in a private conversation today -

How difficult will it be for users to override this include directive if
it is set somewhere outside of their control? For example:

/etc/gitconfig:
[includeIf hasRemoteUrl.https://example.com/example.git] // or whatever
  path = /etc/some-special-config

Will it be possible for a user to "un-include" /etc/some-special-config
themselves?

I don't think this should change your patch much - if my understanding
is correct, we also don't have a way to "un-include" existing include or
includeIf directives made outside of the user's control. But I wonder if
it'd be useful to think about some way to do that. Maybe we can teach
the config parse how to include a config file in reverse? Maybe we need
a "neverInclude" directive? Food for thought, anyway.

Sorry, but I won't have time to take a look at the rest of this series
til next week.

 - Emily


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 0/2] Conditional config includes based on remote URL
  2021-10-12 22:57 [RFC PATCH 0/2] Conditional config includes based on remote URL Jonathan Tan
                   ` (3 preceding siblings ...)
  2021-10-18 20:48 ` Jonathan Tan
@ 2021-10-25 13:03 ` Ævar Arnfjörð Bjarmason
  2021-10-25 18:53   ` Jonathan Tan
  2021-10-29 17:31 ` [WIP v2 " Jonathan Tan
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-25 13:03 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git


On Tue, Oct 12 2021, Jonathan Tan wrote:

I tried sending the below (sans some last minute spellchecking now)
around October 19th, but for some reason it didn't make it
on-list. Trying again now, apologies for [near-]duplicates, if any (I
elaborated a bit at the end just now).

> Previously [1], I sent a patch set for remote-suggested configs that are
> transmitted when fetching, but there were some security concerns. Here
> is another way that remote repo administators can provide recommended
> configs - through conditionally included files based on the configured
> remote. Git itself neither transmits nor prompts for these files, which
> hopefully reduces people's concerns.

I had some concerns about the specifics of the implementation/what
seemed to be tailoring it a bit too closely to one use-case[1][2], not
inherently with the idea (although I think e.g. for brian that more
closely reflects his thoughts).

Anyway, just saying that aside from this RFC I don't think we were at
the point of really fleshing out what this would look like, and there
being some hard "no", so I think that idea could still be pursued.

On this proposal: this also applies globally to all history, but I don't
have the same concern with that as the 1=1 mapping of remote-suggested
hooks, our path includes work that way, after all.

I think it would be nice if you could think about if/how this and the
"onbranch" include would work together though to serve the general case
better.

Also if you have a repo with N remotes each where "origin" tracks URLs
at git.example.com, and you add a "dev" tracking dev.example.com, will
the config apply if you're say on a branch tracking the "live" server,
if you've said "include this for repos matching dev.example.com?

Arguably that's what you want, but perhaps something that those more
used to the centralized workflows wouldn't consider as being unintuitive
for users who might want to add this config only for their main "origin"
remote. We don't really have a way of marking that special-ness though,
except maybe checkout.defaultRemote.

I'm also still somewhat mystified at how this would better serve your
userbase than the path-based included, i.e. the selling point of the
remote-suggested configuration was that it would Just Work.

But for this the users would either need to setup the config themselves
for your remote, but that would be easier than pro-actively cloning in
"work" or whatever? I guess, just wondering if I'm missing something.

Or if it's a partly-automated system where some automation is dropping
in a /etc/gitconfig.d/google-remote-config-include I wonder if this
whole thing wouldn't be better for users with such special-needs if we
just supported an "early config hook".

i.e. similar to how we read trace2 config from /etc/gitconfig early, we
could start picking up a hook that just so happens to conform to the
config schema Emily's config-based hooks use.

So the /etc/gitconfig would have say:

    hook.ourConfigThingy.command=/usr/bin/googly-git-config
    hook.ourConfigThingy.event=include-config

That hook would just produce a config snippet to be included on STDOUT.

Since it's an arbitrary external command it would nicely get around any
chicken and egg problems in git itself, it could run "git remote -v",
inspect the equivalent of an "onbranch" etc. etc, then just dynamically
produce config-to-be-included.

Please don't take this as some objection to your current proposal, just
a thought on something that might entirely bypass odd edge cases and
arbitrary limitations associated with doing this all in the "main"
process on-the-fly.

The special-ness with that one would need to be that we'd say it
wouldn't have the normal "last set wins" semantics, or maybe we could do
that and just note that we saw it, and execute the "include" when we
detect the end of the full config parsing (I'm not familiar enough with
those bits to say where that is).

Both of those seem easier than dealing with any chicken & egg problems
in parsing the config stream itself, since such a hook could just invoke
"git remote -v" and the like itself, after e.g. setting some environment
variable of its own to guard against its own recursion (or we'd do it
for it for such hooks...).

1. https://lore.kernel.org/git/87k0mn2dd3.fsf@evledraar.gmail.com/
2. https://lore.kernel.org/git/87o8awvglr.fsf@evledraar.gmail.com/

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 0/2] Conditional config includes based on remote URL
  2021-10-25 13:03 ` Ævar Arnfjörð Bjarmason
@ 2021-10-25 18:53   ` Jonathan Tan
  2021-10-26 10:12     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 87+ messages in thread
From: Jonathan Tan @ 2021-10-25 18:53 UTC (permalink / raw)
  To: avarab; +Cc: jonathantanmy, git

> I had some concerns about the specifics of the implementation/what
> seemed to be tailoring it a bit too closely to one use-case[1][2], not
> inherently with the idea (although I think e.g. for brian that more
> closely reflects his thoughts).
> 
> Anyway, just saying that aside from this RFC I don't think we were at
> the point of really fleshing out what this would look like, and there
> being some hard "no", so I think that idea could still be pursued.

Which idea specifically do you think could still be pursued?

> On this proposal: this also applies globally to all history, but I don't
> have the same concern with that as the 1=1 mapping of remote-suggested
> hooks, our path includes work that way, after all.
> 
> I think it would be nice if you could think about if/how this and the
> "onbranch" include would work together though to serve the general case
> better.
> 
> Also if you have a repo with N remotes each where "origin" tracks URLs
> at git.example.com, and you add a "dev" tracking dev.example.com, will
> the config apply if you're say on a branch tracking the "live" server,
> if you've said "include this for repos matching dev.example.com?

Right now, the feature is only dependent on remote URLs configured
through remote.?.url. It wouldn't work with "onbranch" because there's
no way to combine conditions (and I have no plans to do that). I think
that if you have something that you want depending on which branch
you're on, you can just use the existing "onbranch" feature.

> Arguably that's what you want, but perhaps something that those more
> used to the centralized workflows wouldn't consider as being unintuitive
> for users who might want to add this config only for their main "origin"
> remote. We don't really have a way of marking that special-ness though,
> except maybe checkout.defaultRemote.

What do you mean by adding a config for a specific remote?

> I'm also still somewhat mystified at how this would better serve your
> userbase than the path-based included, i.e. the selling point of the
> remote-suggested configuration was that it would Just Work.
> 
> But for this the users would either need to setup the config themselves
> for your remote, but that would be easier than pro-actively cloning in
> "work" or whatever? I guess, just wondering if I'm missing something.
> 
> Or if it's a partly-automated system where some automation is dropping
> in a /etc/gitconfig.d/google-remote-config-include 

Yes, the config is meant to be handled e.g. through a package manager
like apt. We don't want to prescribe directory structures like "work",
which is why the include is conditional upon the remote URL.

Even if the user pro-actively clones into "work", the user still needs
to set up the conditional config, so I don't see how that is a net
benefit.

> I wonder if this
> whole thing wouldn't be better for users with such special-needs if we
> just supported an "early config hook".
> 
> i.e. similar to how we read trace2 config from /etc/gitconfig early, we
> could start picking up a hook that just so happens to conform to the
> config schema Emily's config-based hooks use.
> 
> So the /etc/gitconfig would have say:
> 
>     hook.ourConfigThingy.command=/usr/bin/googly-git-config
>     hook.ourConfigThingy.event=include-config
> 
> That hook would just produce a config snippet to be included on STDOUT.
> 
> Since it's an arbitrary external command it would nicely get around any
> chicken and egg problems in git itself, it could run "git remote -v",
> inspect the equivalent of an "onbranch" etc. etc, then just dynamically
> produce config-to-be-included.

I see that later on, you suggest an environment variable to guard
against recursion.

One thing is that if there are multiple such hooks, each one won't be
able to see what the other hooks have produced.

If the feature you described already existed in Git, I think I could use
that, but if we're deciding between implementing the config hook you
describe versus something with more constraints, I think the one I
proposed is better for now. Some design points that have already been
discussed are whether setting a config during processing of an included
file would then invalidate the include and also the order of operations,
both of which would be much more difficult to control with config hooks.

> Please don't take this as some objection to your current proposal, just
> a thought on something that might entirely bypass odd edge cases and
> arbitrary limitations associated with doing this all in the "main"
> process on-the-fly.
> 
> The special-ness with that one would need to be that we'd say it
> wouldn't have the normal "last set wins" semantics, or maybe we could do
> that and just note that we saw it, and execute the "include" when we
> detect the end of the full config parsing (I'm not familiar enough with
> those bits to say where that is).

The "last set" would be those set by the hooks, so yes, a user would
need to know to make their own hook in order to override anything set by
the hooks. The end of the full config parsing is in
config_with_options().

> Both of those seem easier than dealing with any chicken & egg problems
> in parsing the config stream itself, since such a hook could just invoke
> "git remote -v" and the like itself, after e.g. setting some environment
> variable of its own to guard against its own recursion (or we'd do it
> for it for such hooks...).
> 
> 1. https://lore.kernel.org/git/87k0mn2dd3.fsf@evledraar.gmail.com/
> 2. https://lore.kernel.org/git/87o8awvglr.fsf@evledraar.gmail.com/

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 0/2] Conditional config includes based on remote URL
  2021-10-25 18:53   ` Jonathan Tan
@ 2021-10-26 10:12     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-26 10:12 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git


On Mon, Oct 25 2021, Jonathan Tan wrote:

>> I had some concerns about the specifics of the implementation/what
>> seemed to be tailoring it a bit too closely to one use-case[1][2], not
>> inherently with the idea (although I think e.g. for brian that more
>> closely reflects his thoughts).
>> 
>> Anyway, just saying that aside from this RFC I don't think we were at
>> the point of really fleshing out what this would look like, and there
>> being some hard "no", so I think that idea could still be pursued.
>
> Which idea specifically do you think could still be pursued?

I meant the whole in-repo .gitconfig. I.e. to the extent that you're
submitting this as an alternative to that because of the negative
feedback on that RFC.

>> On this proposal: this also applies globally to all history, but I don't
>> have the same concern with that as the 1=1 mapping of remote-suggested
>> hooks, our path includes work that way, after all.
>> 
>> I think it would be nice if you could think about if/how this and the
>> "onbranch" include would work together though to serve the general case
>> better.
>> 
>> Also if you have a repo with N remotes each where "origin" tracks URLs
>> at git.example.com, and you add a "dev" tracking dev.example.com, will
>> the config apply if you're say on a branch tracking the "live" server,
>> if you've said "include this for repos matching dev.example.com?
>
> Right now, the feature is only dependent on remote URLs configured
> through remote.?.url. It wouldn't work with "onbranch" because there's
> no way to combine conditions (and I have no plans to do that). I think
> that if you have something that you want depending on which branch
> you're on, you can just use the existing "onbranch" feature.

I mean with this and the below...

>> Arguably that's what you want, but perhaps something that those more
>> used to the centralized workflows wouldn't consider as being unintuitive
>> for users who might want to add this config only for their main "origin"
>> remote. We don't really have a way of marking that special-ness though,
>> except maybe checkout.defaultRemote.
>
> What do you mean by adding a config for a specific remote?

...what happens if you add a google.com remote for a repository that
"lives" on github.com. I.e. are the semantics "match any remote", or
"match the 'primary' remote (origin?" etc.

>> I'm also still somewhat mystified at how this would better serve your
>> userbase than the path-based included, i.e. the selling point of the
>> remote-suggested configuration was that it would Just Work.
>> 
>> But for this the users would either need to setup the config themselves
>> for your remote, but that would be easier than pro-actively cloning in
>> "work" or whatever? I guess, just wondering if I'm missing something.
>> 
>> Or if it's a partly-automated system where some automation is dropping
>> in a /etc/gitconfig.d/google-remote-config-include 
>
> Yes, the config is meant to be handled e.g. through a package manager
> like apt. We don't want to prescribe directory structures like "work",
> which is why the include is conditional upon the remote URL.
>
> Even if the user pro-actively clones into "work", the user still needs
> to set up the conditional config, so I don't see how that is a net
> benefit.

Ah, that explains it. I assumed both cases would be ones where the user
would need to manually enable the 'configuration' (or cloning to a given
subdir).

>> I wonder if this
>> whole thing wouldn't be better for users with such special-needs if we
>> just supported an "early config hook".
>> 
>> i.e. similar to how we read trace2 config from /etc/gitconfig early, we
>> could start picking up a hook that just so happens to conform to the
>> config schema Emily's config-based hooks use.
>> 
>> So the /etc/gitconfig would have say:
>> 
>>     hook.ourConfigThingy.command=/usr/bin/googly-git-config
>>     hook.ourConfigThingy.event=include-config
>> 
>> That hook would just produce a config snippet to be included on STDOUT.
>> 
>> Since it's an arbitrary external command it would nicely get around any
>> chicken and egg problems in git itself, it could run "git remote -v",
>> inspect the equivalent of an "onbranch" etc. etc, then just dynamically
>> produce config-to-be-included.
>
> I see that later on, you suggest an environment variable to guard
> against recursion.
>
> One thing is that if there are multiple such hooks, each one won't be
> able to see what the other hooks have produced.

Yes, although aside from this hook that's a general caveat with the
proposed config-based hooks, I think if you need a hook that does that
(whether it's this, or pre-receive etc.) our answer is "put it in your
own wrapper".

> If the feature you described already existed in Git, I think I could use
> that, but if we're deciding between implementing the config hook you
> describe versus something with more constraints, I think the one I
> proposed is better for now. Some design points that have already been
> discussed are whether setting a config during processing of an included
> file would then invalidate the include and also the order of operations,
> both of which would be much more difficult to control with config hooks.

I suggested it because maybe it would be a lot simpler, i.e. we don't
need such a feature to be aware of remote config at all, or having to
"read forward" to find it, maybe it would be more complex. I haven't
tried to implement it.

>> Please don't take this as some objection to your current proposal, just
>> a thought on something that might entirely bypass odd edge cases and
>> arbitrary limitations associated with doing this all in the "main"
>> process on-the-fly.
>> 
>> The special-ness with that one would need to be that we'd say it
>> wouldn't have the normal "last set wins" semantics, or maybe we could do
>> that and just note that we saw it, and execute the "include" when we
>> detect the end of the full config parsing (I'm not familiar enough with
>> those bits to say where that is).
>
> The "last set" would be those set by the hooks, so yes, a user would
> need to know to make their own hook in order to override anything set by
> the hooks. The end of the full config parsing is in
> config_with_options().

On the "user would need to know" that's the same if it's config? I.e. in
either case it would be in /etc/gitconfig or whatever shipped by the
*.deb package.

Anyway, I really just meant this as a suggestion, and one that might
make things simpler. If you don't think it makes sense...

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 2/2] config: include file if remote URL matches a glob
  2021-10-13 18:33     ` Jonathan Tan
@ 2021-10-27 11:40       ` Jeff King
  2021-10-27 17:23         ` Jonathan Tan
  0 siblings, 1 reply; 87+ messages in thread
From: Jeff King @ 2021-10-27 11:40 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Wed, Oct 13, 2021 at 11:33:41AM -0700, Jonathan Tan wrote:

> > But in general, I'd imagine most people put their repository in ~/work
> > or similar, and just do:
> > 
> >   [includeIf "gitdir:~/work"]
> >   path = foo.conf
> > 
> > (and of course you can imagine more subdivisions as necessary). So I
> > find the use-case only sort-of compelling. In general, I'm in favor of
> > adding new includeIf directions even if they're only moderately
> > convenient. But this one is rather sticky, because it is dependent on
> > other config keys being defined. So it introduces a new and complicated
> > ordering issue. Is it worth it? Maybe I'm not being imaginative enough
> > in seeing the use cases.
> 
> My main use case is for a remote repo administrator to offer a
> recommended config to anyone who clones that repo. For this, I don't
> think we can prescribe a local directory structure (e.g. "~/work")
> without being too restrictive or broad (that is, if the user ends up
> creating a repo that so happens to match our glob but did not intend the
> config to apply to it).

Yeah, I agree that it's not quite as turnkey if you have to assume
something about the user's directory structure. On the other hand, they
have to decide to put the included config file somewhere, too, so it
seems like you need to give the user "do something like this"
instructions rather than purely something they can copy and paste.

I dunno. I guess you can assume they'll put it in ~/.gitconfig-foo or
similar, and come up with copy-and-pastable directions from that.

I agree that the "match the remote" rule makes things _more_ convenient.
Mostly I was just wondering if it changed things enough to merit the
complications it introduces. I'm not sure I have an answer, and clearly
it's pretty subjective.

> > Just brainstorming some alternatives:
> > 
> >   - We could stop the world while we are parsing and do a _new_ parse
> >     that just looks at the remote config (in fact, this is the natural
> >     thing if you were consulting the regular remote.c code for the list
> >     of remotes, because it does its own config parse).
> > 
> >     That does mean that the remote-conditional includes cannot
> >     themselves define new remotes. But I think that is already the case
> >     with your patch (and violating that gets you into weird circular
> >     problems).
> 
> Hmm...yes, having a special-case rule that such an included file cannot
> define new remotes would be complex.

I think that's mostly true of your "defer" system, too, unless you keep
applying it recursively. The rule is slightly different there: it's not
"you can't define new remotes", but rather "you can't do a
remote-conditional include based on a remote included by
remote-conditional".

> >   - We could simply document that if you want to depend on conditional
> >     includes based on a particular remote.*.url existing, then that
> >     remote config must appear earlier in the sequence.
> > 
> >     This is a bit ugly, because I'm sure it will bite somebody
> >     eventually. But at the same time, it resolves all of the weird
> >     timing issues, and does so in a way that will be easy to match if we
> >     have any other config dependencies.
> 
> My main issue with this is that different config files are read at
> different times, and the repo config (that usually contains the remote)
> is read last.

Ah, right. I was thinking of the definitions within a single file, but
you're right that the common case would be having the include in
~/.gitconfig, and the remotes defined in $GIT_DIR/config. So yeah, any
ordering constraint like that is a non-starter, I'd think.

-Peff

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 0/2] Conditional config includes based on remote URL
  2021-10-18 20:48 ` Jonathan Tan
  2021-10-22  3:12   ` Emily Shaffer
@ 2021-10-27 11:55   ` Jeff King
  2021-10-27 17:52     ` Jonathan Tan
  1 sibling, 1 reply; 87+ messages in thread
From: Jeff King @ 2021-10-27 11:55 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster

On Mon, Oct 18, 2021 at 01:48:03PM -0700, Jonathan Tan wrote:

>  (1) Introduce a "includeAfterIf" (or "deferIncludeIf", or some other
>      name) command that is executed after all config files are read. (If
>      there are multiple, they are executed in order of appearance.)
>      Files included by this mechanism cannot directly or indirectly
>      contain another "includeAfterIf". This is the same as what was
>      introduced in this patch set, except for the name of the directive.

I think this works in terms of having self-consistent rules that make
sense. But deferring things does introduce new complications in terms of
overrides, because we rely on last-one-wins. Emily asked elsewhere about
overriding the inclusion of a file. We don't have a way to do that now,
and I think it would be tricky to add. But what about overriding a
single variable?

Right now this works:

  git config --global foo.bar true
  git config --local foo.bar false

to give you "false". But imagining there was a world of deferred config,
then:

  git config --file ~/.gitconfig-foo foo.bar true
  git config --global deferInclude.path .gitconfig-foo
  git config --local foo.bar false

gives "true". We'd read .gitconfig-foo after everything else, overriding
the repo-level config.

If the deferred includes were processed at the end of each individual
file, that would solve that. You're still left with the slight oddness
that a deferred include may override options within the same file that
come after it, but that's inherent to the "defer" concept, and the
answer is probably "don't do that". It's only when it crosses file
boundaries (which are explicitly ordered by priority) that it really
hurts.

>  (2) Leave the name as "includeIf", and when it is encountered with a
>      remote-URL condition: continue parsing the config files, skipping
>      all "includeIf hasRemoteUrl", only looking for remote.*.url. After
>      that, resume the reading of config files at the first "includeIf
>      hasRemoteUrl", using the prior remote.*.url information gathered to
>      determine which files to include when "includeIf hasRemoteUrl" is
>      encountered. Files included by this mechanism cannot contain any
>      "remote.*.url" variables.

I think doing this as "continue parsing" and "resume" is hard to do.
Because you can't look at other non-remote.*.url entries here (otherwise
you'd see them out of order). So you have to either:

  - complete the parse, stashing all the other variables away, and then
    resolve the include, and then look at all the stashed variables as
    if you were parsing them anew.

  - teach our config parser how to save and restore state, including
    both intra-file state and the progress across the set of files

I think it's much easier if you think of it as "start a new config parse
that does not respect hasRemoteURL". And the easiest way to do that is
to just let remote.c's existing git_config() start that parse (probably
by calling git_config_with_options() and telling it "don't respect
hasRemoteURL includes"). You may also need to teach the config parser to
be reentrant. We did some work on that a while ago, pushing the state
int config_source which functions as a stack, but I don't offhand know
if you can call git_config() from within a config callback.

> There are other ideas including:
> 
>  (3) remote.*.url must appear before a "includeIf hasRemoteUrl" that
>      wants to match it. (But this doesn't fit our use case, in which a
>      repo config has the URL but a system or user config has the
>      include.)

Yeah, I agree this won't work.

>  (4) "includeIf hasRemoteUrl" triggers a search of the repo config just
>      for remote.*.url. (I think this out-of-order config search is more
>      complicated than (2), though.)

I think this is what I described above, and actually is less
complicated. ;)

-Peff

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 2/2] config: include file if remote URL matches a glob
  2021-10-27 11:40       ` Jeff King
@ 2021-10-27 17:23         ` Jonathan Tan
  0 siblings, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-10-27 17:23 UTC (permalink / raw)
  To: peff; +Cc: jonathantanmy, git

> Yeah, I agree that it's not quite as turnkey if you have to assume
> something about the user's directory structure. On the other hand, they
> have to decide to put the included config file somewhere, too, so it
> seems like you need to give the user "do something like this"
> instructions rather than purely something they can copy and paste.

They can copy and paste instructions to add a package repository (e.g.
by editing /etc/apt/sources.list) and then install a package.

> I dunno. I guess you can assume they'll put it in ~/.gitconfig-foo or
> similar, and come up with copy-and-pastable directions from that.
> 
> I agree that the "match the remote" rule makes things _more_ convenient.
> Mostly I was just wondering if it changed things enough to merit the
> complications it introduces. I'm not sure I have an answer, and clearly
> it's pretty subjective.

I am almost done with the implementation, so maybe the community could
look at it and concretely see the extent of the complication.

> > > Just brainstorming some alternatives:
> > > 
> > >   - We could stop the world while we are parsing and do a _new_ parse
> > >     that just looks at the remote config (in fact, this is the natural
> > >     thing if you were consulting the regular remote.c code for the list
> > >     of remotes, because it does its own config parse).
> > > 
> > >     That does mean that the remote-conditional includes cannot
> > >     themselves define new remotes. But I think that is already the case
> > >     with your patch (and violating that gets you into weird circular
> > >     problems).
> > 
> > Hmm...yes, having a special-case rule that such an included file cannot
> > define new remotes would be complex.
> 
> I think that's mostly true of your "defer" system, too, unless you keep
> applying it recursively. The rule is slightly different there: it's not
> "you can't define new remotes", but rather "you can't do a
> remote-conditional include based on a remote included by
> remote-conditional".

I was thinking that deferred includes cannot themselves have other
deferred includes and that which deferred includes are included would be
computed only once, and those would be the only rules. (But I guess this
is moot now - we're not doing this approach.)

> > >   - We could simply document that if you want to depend on conditional
> > >     includes based on a particular remote.*.url existing, then that
> > >     remote config must appear earlier in the sequence.
> > > 
> > >     This is a bit ugly, because I'm sure it will bite somebody
> > >     eventually. But at the same time, it resolves all of the weird
> > >     timing issues, and does so in a way that will be easy to match if we
> > >     have any other config dependencies.
> > 
> > My main issue with this is that different config files are read at
> > different times, and the repo config (that usually contains the remote)
> > is read last.
> 
> Ah, right. I was thinking of the definitions within a single file, but
> you're right that the common case would be having the include in
> ~/.gitconfig, and the remotes defined in $GIT_DIR/config. So yeah, any
> ordering constraint like that is a non-starter, I'd think.
> 
> -Peff

Yeah. Thanks for continuing to take a look at this.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 0/2] Conditional config includes based on remote URL
  2021-10-27 11:55   ` Jeff King
@ 2021-10-27 17:52     ` Jonathan Tan
  2021-10-27 20:32       ` Jeff King
  0 siblings, 1 reply; 87+ messages in thread
From: Jonathan Tan @ 2021-10-27 17:52 UTC (permalink / raw)
  To: peff; +Cc: jonathantanmy, git, gitster

> On Mon, Oct 18, 2021 at 01:48:03PM -0700, Jonathan Tan wrote:
> 
> >  (1) Introduce a "includeAfterIf" (or "deferIncludeIf", or some other
> >      name) command that is executed after all config files are read. (If
> >      there are multiple, they are executed in order of appearance.)
> >      Files included by this mechanism cannot directly or indirectly
> >      contain another "includeAfterIf". This is the same as what was
> >      introduced in this patch set, except for the name of the directive.
> 
> I think this works in terms of having self-consistent rules that make
> sense. But deferring things does introduce new complications in terms of
> overrides, because we rely on last-one-wins. Emily asked elsewhere about
> overriding the inclusion of a file. We don't have a way to do that now,
> and I think it would be tricky to add. But what about overriding a
> single variable?
> 
> Right now this works:
> 
>   git config --global foo.bar true
>   git config --local foo.bar false
> 
> to give you "false". But imagining there was a world of deferred config,
> then:
> 
>   git config --file ~/.gitconfig-foo foo.bar true
>   git config --global deferInclude.path .gitconfig-foo
>   git config --local foo.bar false
> 
> gives "true". We'd read .gitconfig-foo after everything else, overriding
> the repo-level config.
> 
> If the deferred includes were processed at the end of each individual
> file, that would solve that. You're still left with the slight oddness
> that a deferred include may override options within the same file that
> come after it, but that's inherent to the "defer" concept, and the
> answer is probably "don't do that". It's only when it crosses file
> boundaries (which are explicitly ordered by priority) that it really
> hurts.

This would indeed solve the issue of the user needing to know the trick
to override variables set by deferred includes. But this wouldn't solve
our primary use case in which a system-level config defines a
conditional include but the repo config defines the URL, I think.

> >  (2) Leave the name as "includeIf", and when it is encountered with a
> >      remote-URL condition: continue parsing the config files, skipping
> >      all "includeIf hasRemoteUrl", only looking for remote.*.url. After
> >      that, resume the reading of config files at the first "includeIf
> >      hasRemoteUrl", using the prior remote.*.url information gathered to
> >      determine which files to include when "includeIf hasRemoteUrl" is
> >      encountered. Files included by this mechanism cannot contain any
> >      "remote.*.url" variables.
> 
> I think doing this as "continue parsing" and "resume" is hard to do.
> Because you can't look at other non-remote.*.url entries here (otherwise
> you'd see them out of order). So you have to either:
> 
>   - complete the parse, stashing all the other variables away, and then
>     resolve the include, and then look at all the stashed variables as
>     if you were parsing them anew.
> 
>   - teach our config parser how to save and restore state, including
>     both intra-file state and the progress across the set of files

I am implementing something similar to your first approach (stashing
things). It's almost done so hopefully we'll have something concrete to
discuss soon.

> I think it's much easier if you think of it as "start a new config parse
> that does not respect hasRemoteURL". And the easiest way to do that is
> to just let remote.c's existing git_config() start that parse (probably
> by calling git_config_with_options() and telling it "don't respect
> hasRemoteURL includes"). You may also need to teach the config parser to
> be reentrant. We did some work on that a while ago, pushing the state
> int config_source which functions as a stack, but I don't offhand know
> if you can call git_config() from within a config callback.

Besides the reentrancy (which may be difficult, as there are some global
variables, but from a glance, some code seems to take care to save and
restore them, so it may already be reentrant or not too difficult to
make reentrant), we would have to bubble down the config (struct
git_config_source and struct config_options) into all the places that
could potentially start the parse and also have a place to store the
URLs we get. If we're already going to stash URLs, it may be easier to
stash the variables instead.

> > There are other ideas including:
> > 
> >  (3) remote.*.url must appear before a "includeIf hasRemoteUrl" that
> >      wants to match it. (But this doesn't fit our use case, in which a
> >      repo config has the URL but a system or user config has the
> >      include.)
> 
> Yeah, I agree this won't work.
> 
> >  (4) "includeIf hasRemoteUrl" triggers a search of the repo config just
> >      for remote.*.url. (I think this out-of-order config search is more
> >      complicated than (2), though.)
> 
> I think this is what I described above, and actually is less
> complicated. ;)
> 
> -Peff

Well, let me finish up (2), and let's see.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [RFC PATCH 0/2] Conditional config includes based on remote URL
  2021-10-27 17:52     ` Jonathan Tan
@ 2021-10-27 20:32       ` Jeff King
  0 siblings, 0 replies; 87+ messages in thread
From: Jeff King @ 2021-10-27 20:32 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster

On Wed, Oct 27, 2021 at 10:52:59AM -0700, Jonathan Tan wrote:

> > If the deferred includes were processed at the end of each individual
> > file, that would solve that. You're still left with the slight oddness
> > that a deferred include may override options within the same file that
> > come after it, but that's inherent to the "defer" concept, and the
> > answer is probably "don't do that". It's only when it crosses file
> > boundaries (which are explicitly ordered by priority) that it really
> > hurts.
> 
> This would indeed solve the issue of the user needing to know the trick
> to override variables set by deferred includes. But this wouldn't solve
> our primary use case in which a system-level config defines a
> conditional include but the repo config defines the URL, I think.

Doh, of course. I forgot that was the whole point of the defer. ;)

> I am implementing something similar to your first approach (stashing
> things). It's almost done so hopefully we'll have something concrete to
> discuss soon.

Sounds good.

-Peff

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [WIP v2 0/2] Conditional config includes based on remote URL
  2021-10-12 22:57 [RFC PATCH 0/2] Conditional config includes based on remote URL Jonathan Tan
                   ` (4 preceding siblings ...)
  2021-10-25 13:03 ` Ævar Arnfjörð Bjarmason
@ 2021-10-29 17:31 ` Jonathan Tan
  2021-10-29 17:31   ` [WIP v2 1/2] config: make git_config_include() static Jonathan Tan
  2021-10-29 17:31   ` [WIP v2 2/2] config: include file if remote URL matches a glob Jonathan Tan
  2021-11-16  0:00 ` [PATCH v3 0/2] Conditional config includes based on remote URL Jonathan Tan
                   ` (5 subsequent siblings)
  11 siblings, 2 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-10-29 17:31 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, gitster, avarab

Here's a version that implements the URL lookahead. Compared to an
approach in which we rerun config in order to just get the remote URLs,
I think that the main benefit of this is that we don't read from disk
twice. One inelegant thing is that we now have essentially two caches of
config variables - the one in struct repository (struct config_set
*config) and the one newly introduced by this patch set. (Although I
think that many commands don't use the cache in struct repository,
instead just reading the config in one pass.)

(Of course, there is also the possibility that we should have the remote
repo administrator provide the config in another way.)

I have marked this WIP. This patch set is mostly done, except for the
following:
 - Prohibiting remote.?.url from any files included directly or
   indirectly by a URL-conditional include.
 - Checking that memory everywhere is freed when no longer needed.
 - Documentation (as mentioned in the NEEDSWORK comment in patch 2).
 - Tests that check what the glob matches and doesn't match.

No range-diff included because this version is substantially different.

Jonathan Tan (2):
  config: make git_config_include() static
  config: include file if remote URL matches a glob

 config.c          | 142 ++++++++++++++++++++++++++++++++++++++++++----
 config.h          |  37 ++----------
 t/t1300-config.sh |  60 ++++++++++++++++++++
 3 files changed, 194 insertions(+), 45 deletions(-)

-- 
2.33.1.1089.g2158813163f-goog


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [WIP v2 1/2] config: make git_config_include() static
  2021-10-29 17:31 ` [WIP v2 " Jonathan Tan
@ 2021-10-29 17:31   ` Jonathan Tan
  2021-11-05 19:45     ` Emily Shaffer
  2021-10-29 17:31   ` [WIP v2 2/2] config: include file if remote URL matches a glob Jonathan Tan
  1 sibling, 1 reply; 87+ messages in thread
From: Jonathan Tan @ 2021-10-29 17:31 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, gitster, avarab

It is not used from outside the file in which it is declared.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 config.c | 12 +++++++++++-
 config.h | 37 ++++---------------------------------
 2 files changed, 15 insertions(+), 34 deletions(-)

diff --git a/config.c b/config.c
index 2dcbe901b6..94ad5ce913 100644
--- a/config.c
+++ b/config.c
@@ -120,6 +120,16 @@ static long config_buf_ftell(struct config_source *conf)
 	return conf->u.buf.pos;
 }
 
+struct config_include_data {
+	int depth;
+	config_fn_t fn;
+	void *data;
+	const struct config_options *opts;
+};
+#define CONFIG_INCLUDE_INIT { 0 }
+
+static int git_config_include(const char *var, const char *value, void *data);
+
 #define MAX_INCLUDE_DEPTH 10
 static const char include_depth_advice[] = N_(
 "exceeded maximum include depth (%d) while including\n"
@@ -306,7 +316,7 @@ static int include_condition_is_true(const struct config_options *opts,
 	return 0;
 }
 
-int git_config_include(const char *var, const char *value, void *data)
+static int git_config_include(const char *var, const char *value, void *data)
 {
 	struct config_include_data *inc = data;
 	const char *cond, *key;
diff --git a/config.h b/config.h
index f119de0130..48a5e472ca 100644
--- a/config.h
+++ b/config.h
@@ -126,6 +126,8 @@ int git_default_config(const char *, const char *, void *);
 /**
  * Read a specific file in git-config format.
  * This function takes the same callback and data parameters as `git_config`.
+ *
+ * Unlike git_config(), this function does not respect includes.
  */
 int git_config_from_file(config_fn_t fn, const char *, void *);
 
@@ -158,6 +160,8 @@ void read_very_early_config(config_fn_t cb, void *data);
  * will first feed the user-wide one to the callback, and then the
  * repo-specific one; by overwriting, the higher-priority repo-specific
  * value is left at the end).
+ *
+ * Unlike git_config_from_file(), this function respects includes.
  */
 void git_config(config_fn_t fn, void *);
 
@@ -338,39 +342,6 @@ const char *current_config_origin_type(void);
 const char *current_config_name(void);
 int current_config_line(void);
 
-/**
- * Include Directives
- * ------------------
- *
- * By default, the config parser does not respect include directives.
- * However, a caller can use the special `git_config_include` wrapper
- * callback to support them. To do so, you simply wrap your "real" callback
- * function and data pointer in a `struct config_include_data`, and pass
- * the wrapper to the regular config-reading functions. For example:
- *
- * -------------------------------------------
- * int read_file_with_include(const char *file, config_fn_t fn, void *data)
- * {
- * struct config_include_data inc = CONFIG_INCLUDE_INIT;
- * inc.fn = fn;
- * inc.data = data;
- * return git_config_from_file(git_config_include, file, &inc);
- * }
- * -------------------------------------------
- *
- * `git_config` respects includes automatically. The lower-level
- * `git_config_from_file` does not.
- *
- */
-struct config_include_data {
-	int depth;
-	config_fn_t fn;
-	void *data;
-	const struct config_options *opts;
-};
-#define CONFIG_INCLUDE_INIT { 0 }
-int git_config_include(const char *name, const char *value, void *data);
-
 /*
  * Match and parse a config key of the form:
  *
-- 
2.33.1.1089.g2158813163f-goog


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [WIP v2 2/2] config: include file if remote URL matches a glob
  2021-10-29 17:31 ` [WIP v2 " Jonathan Tan
  2021-10-29 17:31   ` [WIP v2 1/2] config: make git_config_include() static Jonathan Tan
@ 2021-10-29 17:31   ` Jonathan Tan
  2021-11-05 20:24     ` Emily Shaffer
  1 sibling, 1 reply; 87+ messages in thread
From: Jonathan Tan @ 2021-10-29 17:31 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, peff, gitster, avarab

This is a feature that supports config file inclusion conditional on
whether the repo has a remote with a URL that matches a glob.

Similar to my previous work on remote-suggested hooks [1], the main
motivation is to allow remote repo administrators to provide recommended
configs in a way that can be consumed more easily (e.g. through a
package installable by a package manager).

NEEDSWORK: The way this works is that if we see such an include, we
shunt all subsequent configs into a stash (while looking for URLs), then
process the stash. In particular, this means that more memory is needed,
and the nature of error reporting changes (currently, if a callback
returns nonzero for a variable, processing halts immediately, but with
this patch, all the config might be read from disk before the callback
even sees the variable). I'll need to expand on this and write a
documentation section.

One alternative is to rerun the config parsing mechanism upon noticing
the first URL-conditional include in order to find all URLs. This would
require the config files to be read from disk twice, though.

[1] https://lore.kernel.org/git/cover.1623881977.git.jonathantanmy@google.com/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 config.c          | 132 +++++++++++++++++++++++++++++++++++++++++-----
 t/t1300-config.sh |  60 +++++++++++++++++++++
 2 files changed, 180 insertions(+), 12 deletions(-)

diff --git a/config.c b/config.c
index 94ad5ce913..63a37e0a5d 100644
--- a/config.c
+++ b/config.c
@@ -120,13 +120,30 @@ static long config_buf_ftell(struct config_source *conf)
 	return conf->u.buf.pos;
 }
 
+struct stashed_var {
+	char *var;
+	char *value;
+	int depth;
+
+	char *url;
+};
+
 struct config_include_data {
 	int depth;
 	config_fn_t fn;
 	void *data;
 	const struct config_options *opts;
+
+	/*
+	 * All remote URLs discovered when reading all config files.
+	 */
+	struct string_list remote_urls;
+
+	struct stashed_var *stashed;
+	size_t stashed_nr, stashed_alloc;
+	int current_stash_depth;
 };
-#define CONFIG_INCLUDE_INIT { 0 }
+#define CONFIG_INCLUDE_INIT { .remote_urls = STRING_LIST_INIT_DUP }
 
 static int git_config_include(const char *var, const char *value, void *data);
 
@@ -316,28 +333,110 @@ static int include_condition_is_true(const struct config_options *opts,
 	return 0;
 }
 
+static int execute_stashed(struct config_include_data *inc)
+{
+	size_t i = 0;
+	while (i < inc->stashed_nr) {
+		int ret = inc->fn(inc->stashed[i].var, inc->stashed[i].value,
+				  inc->data);
+		if (ret)
+			return ret;
+
+		/*
+		 * If it is an include, skip to next entry of the same depth if
+		 * the URL doesn't match
+		 */
+		if (inc->stashed[i].url) {
+			struct strbuf pattern = STRBUF_INIT;
+			struct string_list_item *url_item;
+			int found = 0;
+
+			strbuf_addstr(&pattern, inc->stashed[i].url);
+			add_trailing_starstar_for_dir(&pattern);
+			for_each_string_list_item(url_item, &inc->remote_urls) {
+				if (!wildmatch(pattern.buf, url_item->string,
+					       WM_PATHNAME)) {
+					found = 1;
+					break;
+				}
+			}
+			strbuf_release(&pattern);
+			if (found) {
+				i++;
+			} else {
+				int depth = inc->stashed[i].depth;
+
+				i++;
+				while (i < inc->stashed_nr &&
+				       inc->stashed[i].depth != depth)
+					i++;
+			}
+		} else {
+			i++;
+		}
+	}
+	return 0;
+}
+
 static int git_config_include(const char *var, const char *value, void *data)
 {
 	struct config_include_data *inc = data;
+	const char *remote_name;
+	size_t remote_name_len;
 	const char *cond, *key;
 	size_t cond_len;
-	int ret;
+	int ret = 0;
+
+	if (!parse_config_key(var, "remote", &remote_name, &remote_name_len,
+			      &key) &&
+	    remote_name &&
+	    !strcmp(key, "url"))
+		string_list_append(&inc->remote_urls, value);
 
 	/*
 	 * Pass along all values, including "include" directives; this makes it
 	 * possible to query information on the includes themselves.
 	 */
-	ret = inc->fn(var, value, inc->data);
-	if (ret < 0)
-		return ret;
+	if (inc->stashed_nr || starts_with(var, "includeif.hasremoteurl:")) {
+		struct stashed_var *last;
+
+		/*
+		 * Start or continue using the stash. (A false positive on
+		 * "includeif.hasremoteurl:?.path" is fine here - this just
+		 * means that some config variables unnecessarily go through
+		 * the stash before being passed to the callback.)
+		 */
+		ALLOC_GROW_BY(inc->stashed, inc->stashed_nr, 1,
+			      inc->stashed_alloc);
+		last = &inc->stashed[inc->stashed_nr - 1];
+		last->var = xstrdup(var);
+		last->value = xstrdup(value);
+		last->depth = inc->current_stash_depth;
+	} else {
+		ret = inc->fn(var, value, inc->data);
+		if (ret < 0)
+			return ret;
+	}
 
 	if (!strcmp(var, "include.path"))
 		ret = handle_path_include(value, inc);
 
 	if (!parse_config_key(var, "includeif", &cond, &cond_len, &key) &&
-	    (cond && include_condition_is_true(inc->opts, cond, cond_len)) &&
-	    !strcmp(key, "path"))
-		ret = handle_path_include(value, inc);
+	    cond && !strcmp(key, "path")) {
+		const char *url;
+		size_t url_len;
+
+		if (skip_prefix_mem(cond, cond_len, "hasremoteurl:", &url,
+				    &url_len)) {
+			inc->stashed[inc->stashed_nr - 1].url =
+				xmemdupz(url, url_len);
+			inc->current_stash_depth++;
+			ret = handle_path_include(value, inc);
+			inc->current_stash_depth--;
+		} else if (include_condition_is_true(inc->opts, cond, cond_len)) {
+			ret = handle_path_include(value, inc);
+		}
+	}
 
 	return ret;
 }
@@ -1933,6 +2032,7 @@ int config_with_options(config_fn_t fn, void *data,
 			const struct config_options *opts)
 {
 	struct config_include_data inc = CONFIG_INCLUDE_INIT;
+	int ret;
 
 	if (opts->respect_includes) {
 		inc.fn = fn;
@@ -1950,17 +2050,25 @@ int config_with_options(config_fn_t fn, void *data,
 	 * regular lookup sequence.
 	 */
 	if (config_source && config_source->use_stdin) {
-		return git_config_from_stdin(fn, data);
+		ret = git_config_from_stdin(fn, data);
 	} else if (config_source && config_source->file) {
-		return git_config_from_file(fn, config_source->file, data);
+		ret = git_config_from_file(fn, config_source->file, data);
 	} else if (config_source && config_source->blob) {
 		struct repository *repo = config_source->repo ?
 			config_source->repo : the_repository;
-		return git_config_from_blob_ref(fn, repo, config_source->blob,
+		ret = git_config_from_blob_ref(fn, repo, config_source->blob,
 						data);
+	} else {
+		ret = do_git_config_sequence(opts, fn, data);
 	}
 
-	return do_git_config_sequence(opts, fn, data);
+	if (inc.stashed_nr) {
+		execute_stashed(&inc);
+		inc.stashed_nr = 0;
+	}
+
+	string_list_clear(&inc.remote_urls, 0);
+	return ret;
 }
 
 static void configset_iter(struct config_set *cs, config_fn_t fn, void *data)
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 9ff46f3b04..ea15f7fd46 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -2387,4 +2387,64 @@ test_expect_success '--get and --get-all with --fixed-value' '
 	test_must_fail git config --file=config --get-regexp --fixed-value fixed+ non-existent
 '
 
+test_expect_success 'includeIf.hasremoteurl' '
+	test_create_repo hasremoteurlTest &&
+
+	cat >"$(pwd)"/include-this <<-\EOF &&
+	[user]
+		this = this-is-included
+	EOF
+	cat >"$(pwd)"/dont-include-that <<-\EOF &&
+	[user]
+		that = that-is-not-included
+	EOF
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[includeIf "hasremoteurl:foo"]
+		path = "$(pwd)/include-this"
+	[includeIf "hasremoteurl:bar"]
+		path = "$(pwd)/dont-include-that"
+	[remote "foo"]
+		url = foo
+	EOF
+
+	echo this-is-included >expect-this &&
+	git -C hasremoteurlTest config --get user.this >actual-this &&
+	test_cmp expect-this actual-this &&
+
+	test_must_fail git -C hasremoteurlTest config --get user.that
+'
+
+test_expect_success 'includeIf.hasremoteurl respects last-config-wins' '
+	test_create_repo hasremoteurlTestOverride &&
+
+	cat >"$(pwd)"/include-two-three <<-\EOF &&
+	[user]
+		two = included-config
+		three = included-config
+	EOF
+	cat >>hasremoteurlTestOverride/.git/config <<-EOF &&
+	[remote "foo"]
+		url = foo
+	[user]
+		one = main-config
+		two = main-config
+	[includeIf "hasremoteurl:foo"]
+		path = "$(pwd)/include-two-three"
+	[user]
+		three = main-config
+	EOF
+
+	echo main-config >expect-main-config &&
+	echo included-config >expect-included-config &&
+
+	git -C hasremoteurlTestOverride config --get user.one >actual &&
+	test_cmp expect-main-config actual &&
+
+	git -C hasremoteurlTestOverride config --get user.two >actual &&
+	test_cmp expect-included-config actual &&
+
+	git -C hasremoteurlTestOverride config --get user.three >actual &&
+	test_cmp expect-main-config actual
+'
+
 test_done
-- 
2.33.1.1089.g2158813163f-goog


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [WIP v2 1/2] config: make git_config_include() static
  2021-10-29 17:31   ` [WIP v2 1/2] config: make git_config_include() static Jonathan Tan
@ 2021-11-05 19:45     ` Emily Shaffer
  0 siblings, 0 replies; 87+ messages in thread
From: Emily Shaffer @ 2021-11-05 19:45 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, gitster, avarab

On Fri, Oct 29, 2021 at 10:31:09AM -0700, Jonathan Tan wrote:
> 
> It is not used from outside the file in which it is declared.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  config.c | 12 +++++++++++-
>  config.h | 37 ++++---------------------------------
>  2 files changed, 15 insertions(+), 34 deletions(-)
> 
> diff --git a/config.c b/config.c
> index 2dcbe901b6..94ad5ce913 100644
> --- a/config.c
> +++ b/config.c
> @@ -120,6 +120,16 @@ static long config_buf_ftell(struct config_source *conf)
>  	return conf->u.buf.pos;
>  }
>  
> +struct config_include_data {
> +	int depth;
> +	config_fn_t fn;
> +	void *data;
> +	const struct config_options *opts;
> +};
> +#define CONFIG_INCLUDE_INIT { 0 }
> +
> +static int git_config_include(const char *var, const char *value, void *data);
> +
>  #define MAX_INCLUDE_DEPTH 10
>  static const char include_depth_advice[] = N_(
>  "exceeded maximum include depth (%d) while including\n"
> @@ -306,7 +316,7 @@ static int include_condition_is_true(const struct config_options *opts,
>  	return 0;
>  }
>  
> -int git_config_include(const char *var, const char *value, void *data)
> +static int git_config_include(const char *var, const char *value, void *data)
>  {
>  	struct config_include_data *inc = data;
>  	const char *cond, *key;
> diff --git a/config.h b/config.h
> index f119de0130..48a5e472ca 100644
> --- a/config.h
> +++ b/config.h
> @@ -126,6 +126,8 @@ int git_default_config(const char *, const char *, void *);
>  /**
>   * Read a specific file in git-config format.
>   * This function takes the same callback and data parameters as `git_config`.
> + *
> + * Unlike git_config(), this function does not respect includes.
>   */
>  int git_config_from_file(config_fn_t fn, const char *, void *);
>  
> @@ -158,6 +160,8 @@ void read_very_early_config(config_fn_t cb, void *data);
>   * will first feed the user-wide one to the callback, and then the
>   * repo-specific one; by overwriting, the higher-priority repo-specific
>   * value is left at the end).
> + *
> + * Unlike git_config_from_file(), this function respects includes.
>   */
>  void git_config(config_fn_t fn, void *);
>  
> @@ -338,39 +342,6 @@ const char *current_config_origin_type(void);
>  const char *current_config_name(void);
>  int current_config_line(void);
>  
> -/**
> - * Include Directives
> - * ------------------
> - *
> - * By default, the config parser does not respect include directives.
> - * However, a caller can use the special `git_config_include` wrapper
> - * callback to support them. To do so, you simply wrap your "real" callback
> - * function and data pointer in a `struct config_include_data`, and pass
> - * the wrapper to the regular config-reading functions. For example:
> - *
> - * -------------------------------------------
> - * int read_file_with_include(const char *file, config_fn_t fn, void *data)
> - * {
> - * struct config_include_data inc = CONFIG_INCLUDE_INIT;
> - * inc.fn = fn;
> - * inc.data = data;
> - * return git_config_from_file(git_config_include, file, &inc);
> - * }
> - * -------------------------------------------
> - *
> - * `git_config` respects includes automatically. The lower-level
> - * `git_config_from_file` does not.
> - *
> - */

It is a shame to lose the comprehensive usage documentation. Can we move
it into the source near the static definition instead?

> -struct config_include_data {
> -	int depth;
> -	config_fn_t fn;
> -	void *data;
> -	const struct config_options *opts;
> -};
> -#define CONFIG_INCLUDE_INIT { 0 }
> -int git_config_include(const char *name, const char *value, void *data);

I wondered why we even had this here, if we were only calling
'git_config_include()' from config.c. The last time this definition was
touched was when it was moved out of cache.h (e67a57fc518) and the time
before that was when it was introduced in 2012 (9b25a0b52e0). At the
time of its introduction it was only called in config.c, anyways. So I
guess it is just a matter of history.

It's still a WIP so I won't leave a reviewed-by line, but this patch
looks fine.

 - Emily

> -
>  /*
>   * Match and parse a config key of the form:
>   *
> -- 
> 2.33.1.1089.g2158813163f-goog
> 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [WIP v2 2/2] config: include file if remote URL matches a glob
  2021-10-29 17:31   ` [WIP v2 2/2] config: include file if remote URL matches a glob Jonathan Tan
@ 2021-11-05 20:24     ` Emily Shaffer
  2021-11-06  4:41       ` Ævar Arnfjörð Bjarmason
  2021-11-09  0:22       ` Jonathan Tan
  0 siblings, 2 replies; 87+ messages in thread
From: Emily Shaffer @ 2021-11-05 20:24 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, gitster, avarab

On Fri, Oct 29, 2021 at 10:31:10AM -0700, Jonathan Tan wrote:
> 
> This is a feature that supports config file inclusion conditional on
> whether the repo has a remote with a URL that matches a glob.
> 
> Similar to my previous work on remote-suggested hooks [1], the main
> motivation is to allow remote repo administrators to provide recommended
> configs in a way that can be consumed more easily (e.g. through a
> package installable by a package manager).

To expand a little more on this:

At Google we ship /etc/gitconfig, as well as /usr/share/git-core/. Our
/etc/gitconfig looks basically like:

 [include]
   path = /usr/share/git-core/gitconfig
   path = /usr/share/git-core/some-specific-config
   path = /usr/share/git-core/other-specific-config

Jonathan's WIP allows us to append lines to /etc/gitconfig sort of like

 [includeIf "hasRemoteUrl:https://internal-google/big-project"]
   path = /usr/share/big-project/gitconfig

That's approach #1 to shipping a config, which we might use for a
project that makes up a significant portion of our userbase. We ship
(and own) the /etc/gitconfig; BigProject team ships and owns their own
gitconfig; everybody internally who works on BigProject, whether it's
just once to fix a small thing or every day as their main job, gets the
relevant configs for BigProject.

Approach #2 I think is also still a useful one, and maybe more
interesting outside of Google:

When I run 'sudo apt install big-oss-project-devkit', a few things
happen:
1. /usr/share/big-oss-project/gitconfig appears
2. `git config --global \
		'includeIf.hasRemoteUrl:https://github/big-oss-project/*' \
		'/usr/share/big-oss-project/gitconfig'` is run
3. whatever other special tools, scripts, etc. are installed

That way regardless of which project I'm working on -
big-oss-project/translation, big-oss-project/docs,
big-oss-project/big-oss-project - I still get configs and style checkers
and whatever else.

With this approach #2, it's still possible for someone to do a drive-by
contribution without ever running 'apt install big-oss-project-devkit',
so it's not quite as strong a recommendation as the former
"remote-suggested-hooks" topic. User would still want to take a look at
the README for big-oss-project to learn they're supposed to be
installing that package ahead of time. But it's still a oneshot setup
for nice things like partial clone filters, maybe sparsity filters,
maybe config-based hooks, etc., especially if big-oss-project already
was shipping some project-specific tooling (like maybe a special
debugger or a docker image or I don't know).

The nice thing about 'hasRemoteUrl' in this case is that we don't need
to know the location of the user's big-oss-project/ checkout on disk. We
can set that config globally and they can checkout big-oss-project as
many times and as many places as they wish. It wouldn't be possible to
ship configs via a package manager or other automated script without it.

> 
> NEEDSWORK: The way this works is that if we see such an include, we
> shunt all subsequent configs into a stash (while looking for URLs), then
> process the stash. In particular, this means that more memory is needed,
> and the nature of error reporting changes (currently, if a callback
> returns nonzero for a variable, processing halts immediately, but with
> this patch, all the config might be read from disk before the callback
> even sees the variable). I'll need to expand on this and write a
> documentation section.

Hm. I'm not so sure about making another structure for storing config
into memory, because we already do that during the regular config parse
(to make things like git_config_get_string() fast). Can we not re-walk
the in-memory config at the end of the normal parse, rather than reading
from disk twice?

I think git_config()/repo_config() callback even does that for you for free...?

2304 void repo_config(struct repository *repo, config_fn_t fn, void
*data)
2305 {
2306         git_config_check_init(repo);
2307         configset_iter(repo->config, fn, data);
2308 }

> 
> One alternative is to rerun the config parsing mechanism upon noticing
> the first URL-conditional include in order to find all URLs. This would
> require the config files to be read from disk twice, though.

What's the easiest way to "try it and see", to add tooling and find out
whether the config files would be reopened during the second parse?
Because I suspect that we won't actually reopen those files, due to the
config cache.

So couldn't we do something like....

pass #1:
 if (include)
   if (not hasRemoteUrl)
     open up path & parse
 put config into in-memory cache normally
pass #2: (and this pass would need to be added to repo_config() probably)
 if (include)
   if (hasRemoteUrl)
     open up path & parse
     insert in-order into in-memory cache
 don't touch existing configs otherwise

I think it's in practice similar to the approach you're using (getting
around the weird ordering with a cache in memory), but we could reuse
the existing config cache rather than creating a new and different one.

 - Emily

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [WIP v2 2/2] config: include file if remote URL matches a glob
  2021-11-05 20:24     ` Emily Shaffer
@ 2021-11-06  4:41       ` Ævar Arnfjörð Bjarmason
  2021-11-09  0:25         ` Jonathan Tan
  2021-11-09  0:22       ` Jonathan Tan
  1 sibling, 1 reply; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-06  4:41 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Jonathan Tan, git, peff, gitster


On Fri, Nov 05 2021, Emily Shaffer wrote:

> On Fri, Oct 29, 2021 at 10:31:10AM -0700, Jonathan Tan wrote:
> [...]
>> 
>> One alternative is to rerun the config parsing mechanism upon noticing
>> the first URL-conditional include in order to find all URLs. This would
>> require the config files to be read from disk twice, though.
>
> What's the easiest way to "try it and see", to add tooling and find out
> whether the config files would be reopened during the second parse?
> Because I suspect that we won't actually reopen those files, due to the
> config cache.

strace -f?

> So couldn't we do something like....
>
> pass #1:
>  if (include)
>    if (not hasRemoteUrl)
>      open up path & parse
>  put config into in-memory cache normally
> pass #2: (and this pass would need to be added to repo_config() probably)
>  if (include)
>    if (hasRemoteUrl)
>      open up path & parse
>      insert in-order into in-memory cache
>  don't touch existing configs otherwise
>
> I think it's in practice similar to the approach you're using (getting
> around the weird ordering with a cache in memory), but we could reuse
> the existing config cache rather than creating a new and different one.

I don't know enough to say if this two-step approach is better (although
I'm slightly biased in that direction, since it seems simpler), but this
just seems like premature optimization.

I.e. let's just read the files twice, they'll be in the OS's FS cache,
which is unlikely to be a bottleneck for the amount of files involved.

That being said we do have exactly this cache already. See [1] and
3c8687a73ee (add `config_set` API for caching config-like files,
2014-07-28).

But I think that was added due to *very* frequent re-parsing of the
entire config every time someone needed a config variable, not due to
the I/O overhead (but I may be wrong).

So if we've got 100 config variables we need and 10 config files then
10*100 is probably starting to hurt, but if for whatever reason we
needed 2*10 here that's probably no big deal, and in any case would only
happen if this new include mechanism was in play.

1. https://lore.kernel.org/git/1404631162-18556-1-git-send-email-tanayabh@gmail.com/ 


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [WIP v2 2/2] config: include file if remote URL matches a glob
  2021-11-05 20:24     ` Emily Shaffer
  2021-11-06  4:41       ` Ævar Arnfjörð Bjarmason
@ 2021-11-09  0:22       ` Jonathan Tan
  1 sibling, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-11-09  0:22 UTC (permalink / raw)
  To: emilyshaffer; +Cc: jonathantanmy, git, peff, gitster, avarab

> To expand a little more on this:

[snip]

> The nice thing about 'hasRemoteUrl' in this case is that we don't need
> to know the location of the user's big-oss-project/ checkout on disk. We
> can set that config globally and they can checkout big-oss-project as
> many times and as many places as they wish. It wouldn't be possible to
> ship configs via a package manager or other automated script without it.

Ah, thanks for the elaboration!

> > NEEDSWORK: The way this works is that if we see such an include, we
> > shunt all subsequent configs into a stash (while looking for URLs), then
> > process the stash. In particular, this means that more memory is needed,
> > and the nature of error reporting changes (currently, if a callback
> > returns nonzero for a variable, processing halts immediately, but with
> > this patch, all the config might be read from disk before the callback
> > even sees the variable). I'll need to expand on this and write a
> > documentation section.
> 
> Hm. I'm not so sure about making another structure for storing config
> into memory, because we already do that during the regular config parse
> (to make things like git_config_get_string() fast). Can we not re-walk
> the in-memory config at the end of the normal parse, rather than reading
> from disk twice?
> 
> I think git_config()/repo_config() callback even does that for you for free...?

The main thing is that we wouldn't know if an entry would have been
overridden by a value from an includeif.hasremoteurl or not.

> What's the easiest way to "try it and see", to add tooling and find out
> whether the config files would be reopened during the second parse?
> Because I suspect that we won't actually reopen those files, due to the
> config cache.

As Ævar says, strace should work. The hard part is implementing the
recursive config parse, but it looks like the way to go, so I'll try it
and see how it goes.

[1] https://lore.kernel.org/git/211106.8635o9hogz.gmgdl@evledraar.gmail.com/

> So couldn't we do something like....
> 
> pass #1:
>  if (include)
>    if (not hasRemoteUrl)
>      open up path & parse
>  put config into in-memory cache normally
> pass #2: (and this pass would need to be added to repo_config() probably)
>  if (include)
>    if (hasRemoteUrl)
>      open up path & parse
>      insert in-order into in-memory cache
>  don't touch existing configs otherwise
>
> I think it's in practice similar to the approach you're using (getting
> around the weird ordering with a cache in memory), but we could reuse
> the existing config cache rather than creating a new and different one.

What do you mean by "insert in-order"? If you mean figuring out which
variables would be overridden (and for multi-valued variables, what
order to put all the values in), I think that's the hard part.

Another thing is that at the point where we read the config
(config_with_options()), we have a callback, so we would need to make
sure that we're writing to the in-memory cache in the first place (as
opposed to passing a callback that does something else). That might be
doable by changing the API, but in ay case, I'll try the recursive
config parse first.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [WIP v2 2/2] config: include file if remote URL matches a glob
  2021-11-06  4:41       ` Ævar Arnfjörð Bjarmason
@ 2021-11-09  0:25         ` Jonathan Tan
  0 siblings, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-11-09  0:25 UTC (permalink / raw)
  To: avarab; +Cc: emilyshaffer, jonathantanmy, git, peff, gitster

> > What's the easiest way to "try it and see", to add tooling and find out
> > whether the config files would be reopened during the second parse?
> > Because I suspect that we won't actually reopen those files, due to the
> > config cache.
> 
> strace -f?

Thanks - this might work.

> > So couldn't we do something like....
> >
> > pass #1:
> >  if (include)
> >    if (not hasRemoteUrl)
> >      open up path & parse
> >  put config into in-memory cache normally
> > pass #2: (and this pass would need to be added to repo_config() probably)
> >  if (include)
> >    if (hasRemoteUrl)
> >      open up path & parse
> >      insert in-order into in-memory cache
> >  don't touch existing configs otherwise
> >
> > I think it's in practice similar to the approach you're using (getting
> > around the weird ordering with a cache in memory), but we could reuse
> > the existing config cache rather than creating a new and different one.
> 
> I don't know enough to say if this two-step approach is better (although
> I'm slightly biased in that direction, since it seems simpler), but this
> just seems like premature optimization.
> 
> I.e. let's just read the files twice, they'll be in the OS's FS cache,
> which is unlikely to be a bottleneck for the amount of files involved.

OK - let me try this.

> That being said we do have exactly this cache already. See [1] and
> 3c8687a73ee (add `config_set` API for caching config-like files,
> 2014-07-28).
> 
> But I think that was added due to *very* frequent re-parsing of the
> entire config every time someone needed a config variable, not due to
> the I/O overhead (but I may be wrong).
> 
> So if we've got 100 config variables we need and 10 config files then
> 10*100 is probably starting to hurt, but if for whatever reason we
> needed 2*10 here that's probably no big deal, and in any case would only
> happen if this new include mechanism was in play.
> 
> 1. https://lore.kernel.org/git/1404631162-18556-1-git-send-email-tanayabh@gmail.com/ 

This might not work for the reasons I described in my reply to Emily
[1]. I'll try the read-twice version first.

[1] https://lore.kernel.org/git/20211109002255.1110653-1-jonathantanmy@google.com/

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v3 0/2] Conditional config includes based on remote URL
  2021-10-12 22:57 [RFC PATCH 0/2] Conditional config includes based on remote URL Jonathan Tan
                   ` (5 preceding siblings ...)
  2021-10-29 17:31 ` [WIP v2 " Jonathan Tan
@ 2021-11-16  0:00 ` Jonathan Tan
  2021-11-16  0:00   ` [PATCH v3 1/2] config: make git_config_include() static Jonathan Tan
  2021-11-16  0:00   ` [PATCH v3 2/2] config: include file if remote URL matches a glob Jonathan Tan
  2021-11-29 20:23 ` [PATCH v4 0/2] Conditional config includes based on remote URL Jonathan Tan
                   ` (4 subsequent siblings)
  11 siblings, 2 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-11-16  0:00 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, emilyshaffer, peff, avarab, gitster

Here's a version that starts a second traversal of the config file read
when encountering the first remote URL. No user-visible changes from v2,
but hopefully the algorithm is simpler.

I've also added the user-facing documentation and a test of what the
glob pattern should match.

Some people have suggested avoiding the forward declaration in patch 1,
but I found that there are 2 functions that call each other, so the
forward declaration cannot be avoided.

Jonathan Tan (2):
  config: make git_config_include() static
  config: include file if remote URL matches a glob

 Documentation/config.txt |  11 ++++
 config.c                 | 133 ++++++++++++++++++++++++++++++++++++---
 config.h                 |  44 ++++---------
 t/t1300-config.sh        | 100 +++++++++++++++++++++++++++++
 4 files changed, 246 insertions(+), 42 deletions(-)

-- 
2.34.0.rc1.387.gb447b232ab-goog


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v3 1/2] config: make git_config_include() static
  2021-11-16  0:00 ` [PATCH v3 0/2] Conditional config includes based on remote URL Jonathan Tan
@ 2021-11-16  0:00   ` Jonathan Tan
  2021-11-16  0:00   ` [PATCH v3 2/2] config: include file if remote URL matches a glob Jonathan Tan
  1 sibling, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-11-16  0:00 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, emilyshaffer, peff, avarab, gitster

It is not used from outside the file in which it is declared.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 config.c | 12 +++++++++++-
 config.h | 37 ++++---------------------------------
 2 files changed, 15 insertions(+), 34 deletions(-)

diff --git a/config.c b/config.c
index 2dcbe901b6..94ad5ce913 100644
--- a/config.c
+++ b/config.c
@@ -120,6 +120,16 @@ static long config_buf_ftell(struct config_source *conf)
 	return conf->u.buf.pos;
 }
 
+struct config_include_data {
+	int depth;
+	config_fn_t fn;
+	void *data;
+	const struct config_options *opts;
+};
+#define CONFIG_INCLUDE_INIT { 0 }
+
+static int git_config_include(const char *var, const char *value, void *data);
+
 #define MAX_INCLUDE_DEPTH 10
 static const char include_depth_advice[] = N_(
 "exceeded maximum include depth (%d) while including\n"
@@ -306,7 +316,7 @@ static int include_condition_is_true(const struct config_options *opts,
 	return 0;
 }
 
-int git_config_include(const char *var, const char *value, void *data)
+static int git_config_include(const char *var, const char *value, void *data)
 {
 	struct config_include_data *inc = data;
 	const char *cond, *key;
diff --git a/config.h b/config.h
index f119de0130..48a5e472ca 100644
--- a/config.h
+++ b/config.h
@@ -126,6 +126,8 @@ int git_default_config(const char *, const char *, void *);
 /**
  * Read a specific file in git-config format.
  * This function takes the same callback and data parameters as `git_config`.
+ *
+ * Unlike git_config(), this function does not respect includes.
  */
 int git_config_from_file(config_fn_t fn, const char *, void *);
 
@@ -158,6 +160,8 @@ void read_very_early_config(config_fn_t cb, void *data);
  * will first feed the user-wide one to the callback, and then the
  * repo-specific one; by overwriting, the higher-priority repo-specific
  * value is left at the end).
+ *
+ * Unlike git_config_from_file(), this function respects includes.
  */
 void git_config(config_fn_t fn, void *);
 
@@ -338,39 +342,6 @@ const char *current_config_origin_type(void);
 const char *current_config_name(void);
 int current_config_line(void);
 
-/**
- * Include Directives
- * ------------------
- *
- * By default, the config parser does not respect include directives.
- * However, a caller can use the special `git_config_include` wrapper
- * callback to support them. To do so, you simply wrap your "real" callback
- * function and data pointer in a `struct config_include_data`, and pass
- * the wrapper to the regular config-reading functions. For example:
- *
- * -------------------------------------------
- * int read_file_with_include(const char *file, config_fn_t fn, void *data)
- * {
- * struct config_include_data inc = CONFIG_INCLUDE_INIT;
- * inc.fn = fn;
- * inc.data = data;
- * return git_config_from_file(git_config_include, file, &inc);
- * }
- * -------------------------------------------
- *
- * `git_config` respects includes automatically. The lower-level
- * `git_config_from_file` does not.
- *
- */
-struct config_include_data {
-	int depth;
-	config_fn_t fn;
-	void *data;
-	const struct config_options *opts;
-};
-#define CONFIG_INCLUDE_INIT { 0 }
-int git_config_include(const char *name, const char *value, void *data);
-
 /*
  * Match and parse a config key of the form:
  *
-- 
2.34.0.rc1.387.gb447b232ab-goog


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 2/2] config: include file if remote URL matches a glob
  2021-11-16  0:00 ` [PATCH v3 0/2] Conditional config includes based on remote URL Jonathan Tan
  2021-11-16  0:00   ` [PATCH v3 1/2] config: make git_config_include() static Jonathan Tan
@ 2021-11-16  0:00   ` Jonathan Tan
  2021-11-22 22:59     ` Glen Choo
                       ` (2 more replies)
  1 sibling, 3 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-11-16  0:00 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, emilyshaffer, peff, avarab, gitster

This is a feature that supports config file inclusion conditional on
whether the repo has a remote with a URL that matches a glob.

Similar to my previous work on remote-suggested hooks [1], the main
motivation is to allow remote repo administrators to provide recommended
configs in a way that can be consumed more easily (e.g. through a
package installable by a package manager - it could, for example,
contain a file to be included conditionally and a post-install script
that adds the include directive to the system-wide config file).

In order to do this, Git reruns the config parsing mechanism upon
noticing the first URL-conditional include in order to find all remote
URLs, and these remote URLs are then used to determine if that first and
all subsequent includes are executed. Remote URLs are not allowed to be
configued in any URL-conditionally-included file.

[1] https://lore.kernel.org/git/cover.1623881977.git.jonathantanmy@google.com/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/config.txt |  11 ++++
 config.c                 | 121 ++++++++++++++++++++++++++++++++++++---
 config.h                 |   7 +++
 t/t1300-config.sh        | 100 ++++++++++++++++++++++++++++++++
 4 files changed, 231 insertions(+), 8 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0c0e6b859f..93d18b2fe9 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -159,6 +159,17 @@ all branches that begin with `foo/`. This is useful if your branches are
 organized hierarchically and you would like to apply a configuration to
 all the branches in that hierarchy.
 
+`hasremoteurl`::
+	The data that follows the keyword `hasremoteurl:` is taken to
+	be a pattern with standard globbing wildcards and two
+	additional ones, `**/` and `/**`, that can match multiple
+	components. The rest of the config files will be scanned for
+	remote URLs, and then if there at least one remote URL that
+	matches this pattern, the include condition is met.
++
+Files included by this option (directly or indirectly) are not allowed
+to contain remote URLs.
+
 A few more notes on matching via `gitdir` and `gitdir/i`:
 
  * Symlinks in `$GIT_DIR` are not resolved before matching.
diff --git a/config.c b/config.c
index 94ad5ce913..4ffc1e87e9 100644
--- a/config.c
+++ b/config.c
@@ -125,6 +125,12 @@ struct config_include_data {
 	config_fn_t fn;
 	void *data;
 	const struct config_options *opts;
+	struct git_config_source *config_source;
+
+	/*
+	 * All remote URLs discovered when reading all config files.
+	 */
+	struct string_list *remote_urls;
 };
 #define CONFIG_INCLUDE_INIT { 0 }
 
@@ -316,12 +322,83 @@ static int include_condition_is_true(const struct config_options *opts,
 	return 0;
 }
 
+static int add_remote_url(const char *var, const char *value, void *data)
+{
+	struct string_list *remote_urls = data;
+	const char *remote_name;
+	size_t remote_name_len;
+	const char *key;
+
+	if (!parse_config_key(var, "remote", &remote_name, &remote_name_len,
+			      &key) &&
+	    remote_name &&
+	    !strcmp(key, "url"))
+		string_list_append(remote_urls, value);
+	return 0;
+}
+
+static void populate_remote_urls(struct config_include_data *inc)
+{
+	struct config_options opts;
+
+	struct config_source *store_cf = cf;
+	struct key_value_info *store_kvi = current_config_kvi;
+	enum config_scope store_scope = current_parsing_scope;
+
+	opts = *inc->opts;
+	opts.unconditional_remote_url = 1;
+
+	cf = NULL;
+	current_config_kvi = NULL;
+	current_parsing_scope = 0;
+
+	inc->remote_urls = xmalloc(sizeof(*inc->remote_urls));
+	string_list_init_dup(inc->remote_urls);
+	config_with_options(add_remote_url, inc->remote_urls, inc->config_source, &opts);
+
+	cf = store_cf;
+	current_config_kvi = store_kvi;
+	current_parsing_scope = store_scope;
+}
+
+static int forbid_remote_url(const char *var, const char *value, void *data)
+{
+	const char *remote_name;
+	size_t remote_name_len;
+	const char *key;
+
+	if (!parse_config_key(var, "remote", &remote_name, &remote_name_len,
+			      &key) &&
+	    remote_name &&
+	    !strcmp(key, "url"))
+		die(_("remote URLs cannot be configured in file directly or indirectly included by includeIf.hasremoteurl"));
+	return 0;
+}
+
+static int at_least_one_url_matches_glob(const char *glob, int glob_len,
+					 struct string_list *remote_urls)
+{
+	struct strbuf pattern = STRBUF_INIT;
+	struct string_list_item *url_item;
+	int found = 0;
+
+	strbuf_add(&pattern, glob, glob_len);
+	for_each_string_list_item(url_item, remote_urls) {
+		if (!wildmatch(pattern.buf, url_item->string, WM_PATHNAME)) {
+			found = 1;
+			break;
+		}
+	}
+	strbuf_release(&pattern);
+	return found;
+}
+
 static int git_config_include(const char *var, const char *value, void *data)
 {
 	struct config_include_data *inc = data;
 	const char *cond, *key;
 	size_t cond_len;
-	int ret;
+	int ret = 0;
 
 	/*
 	 * Pass along all values, including "include" directives; this makes it
@@ -335,9 +412,29 @@ static int git_config_include(const char *var, const char *value, void *data)
 		ret = handle_path_include(value, inc);
 
 	if (!parse_config_key(var, "includeif", &cond, &cond_len, &key) &&
-	    (cond && include_condition_is_true(inc->opts, cond, cond_len)) &&
-	    !strcmp(key, "path"))
-		ret = handle_path_include(value, inc);
+	    cond && !strcmp(key, "path")) {
+		const char *url;
+		size_t url_len;
+
+		if (skip_prefix_mem(cond, cond_len, "hasremoteurl:", &url,
+				    &url_len)) {
+			if (inc->opts->unconditional_remote_url) {
+				config_fn_t old_fn = inc->fn;
+
+				inc->fn = forbid_remote_url;
+				ret = handle_path_include(value, inc);
+				inc->fn = old_fn;
+			} else {
+				if (!inc->remote_urls)
+					populate_remote_urls(inc);
+				if (at_least_one_url_matches_glob(
+						url, url_len, inc->remote_urls))
+					ret = handle_path_include(value, inc);
+			}
+		} else if (include_condition_is_true(inc->opts, cond, cond_len)) {
+			ret = handle_path_include(value, inc);
+		}
+	}
 
 	return ret;
 }
@@ -1933,11 +2030,13 @@ int config_with_options(config_fn_t fn, void *data,
 			const struct config_options *opts)
 {
 	struct config_include_data inc = CONFIG_INCLUDE_INIT;
+	int ret;
 
 	if (opts->respect_includes) {
 		inc.fn = fn;
 		inc.data = data;
 		inc.opts = opts;
+		inc.config_source = config_source;
 		fn = git_config_include;
 		data = &inc;
 	}
@@ -1950,17 +2049,23 @@ int config_with_options(config_fn_t fn, void *data,
 	 * regular lookup sequence.
 	 */
 	if (config_source && config_source->use_stdin) {
-		return git_config_from_stdin(fn, data);
+		ret = git_config_from_stdin(fn, data);
 	} else if (config_source && config_source->file) {
-		return git_config_from_file(fn, config_source->file, data);
+		ret = git_config_from_file(fn, config_source->file, data);
 	} else if (config_source && config_source->blob) {
 		struct repository *repo = config_source->repo ?
 			config_source->repo : the_repository;
-		return git_config_from_blob_ref(fn, repo, config_source->blob,
+		ret = git_config_from_blob_ref(fn, repo, config_source->blob,
 						data);
+	} else {
+		ret = do_git_config_sequence(opts, fn, data);
 	}
 
-	return do_git_config_sequence(opts, fn, data);
+	if (inc.remote_urls) {
+		string_list_clear(inc.remote_urls, 0);
+		FREE_AND_NULL(inc.remote_urls);
+	}
+	return ret;
 }
 
 static void configset_iter(struct config_set *cs, config_fn_t fn, void *data)
diff --git a/config.h b/config.h
index 48a5e472ca..c24458b10a 100644
--- a/config.h
+++ b/config.h
@@ -89,6 +89,13 @@ struct config_options {
 	unsigned int ignore_worktree : 1;
 	unsigned int ignore_cmdline : 1;
 	unsigned int system_gently : 1;
+
+	/*
+	 * For internal use. Include all includeif.hasremoteurl paths without
+	 * checking if the repo has that remote URL.
+	 */
+	unsigned int unconditional_remote_url : 1;
+
 	const char *commondir;
 	const char *git_dir;
 	config_parser_event_fn_t event_fn;
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 9ff46f3b04..9daab4c6da 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -2387,4 +2387,104 @@ test_expect_success '--get and --get-all with --fixed-value' '
 	test_must_fail git config --file=config --get-regexp --fixed-value fixed+ non-existent
 '
 
+test_expect_success 'includeIf.hasremoteurl' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	cat >"$(pwd)"/include-this <<-\EOF &&
+	[user]
+		this = this-is-included
+	EOF
+	cat >"$(pwd)"/dont-include-that <<-\EOF &&
+	[user]
+		that = that-is-not-included
+	EOF
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[includeIf "hasremoteurl:foo"]
+		path = "$(pwd)/include-this"
+	[includeIf "hasremoteurl:bar"]
+		path = "$(pwd)/dont-include-that"
+	[remote "foo"]
+		url = foo
+	EOF
+
+	echo this-is-included >expect-this &&
+	git -C hasremoteurlTest config --get user.this >actual-this &&
+	test_cmp expect-this actual-this &&
+
+	test_must_fail git -C hasremoteurlTest config --get user.that
+'
+
+test_expect_success 'includeIf.hasremoteurl respects last-config-wins' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	cat >"$(pwd)"/include-two-three <<-\EOF &&
+	[user]
+		two = included-config
+		three = included-config
+	EOF
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[remote "foo"]
+		url = foo
+	[user]
+		one = main-config
+		two = main-config
+	[includeIf "hasremoteurl:foo"]
+		path = "$(pwd)/include-two-three"
+	[user]
+		three = main-config
+	EOF
+
+	echo main-config >expect-main-config &&
+	echo included-config >expect-included-config &&
+
+	git -C hasremoteurlTest config --get user.one >actual &&
+	test_cmp expect-main-config actual &&
+
+	git -C hasremoteurlTest config --get user.two >actual &&
+	test_cmp expect-included-config actual &&
+
+	git -C hasremoteurlTest config --get user.three >actual &&
+	test_cmp expect-main-config actual
+'
+
+test_expect_success 'includeIf.hasremoteurl globs' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	printf "[user]\ndss = yes\n" >"$(pwd)/double-star-start" &&
+	printf "[user]\ndse = yes\n" >"$(pwd)/double-star-end" &&
+	printf "[user]\ndsm = yes\n" >"$(pwd)/double-star-middle" &&
+	printf "[user]\nssm = yes\n" >"$(pwd)/single-star-middle" &&
+	printf "[user]\nno = no\n" >"$(pwd)/no" &&
+
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[remote "foo"]
+		url = https://foo/bar/baz
+	[includeIf "hasremoteurl:**/baz"]
+		path = "$(pwd)/double-star-start"
+	[includeIf "hasremoteurl:**/nomatch"]
+		path = "$(pwd)/no"
+	[includeIf "hasremoteurl:https:/**"]
+		path = "$(pwd)/double-star-end"
+	[includeIf "hasremoteurl:nomatch:/**"]
+		path = "$(pwd)/no"
+	[includeIf "hasremoteurl:https:/**/baz"]
+		path = "$(pwd)/double-star-middle"
+	[includeIf "hasremoteurl:https:/**/nomatch"]
+		path = "$(pwd)/no"
+	[includeIf "hasremoteurl:https://*/bar/baz"]
+		path = "$(pwd)/single-star-middle"
+	[includeIf "hasremoteurl:https://*/baz"]
+		path = "$(pwd)/no"
+	EOF
+
+	git -C hasremoteurlTest config --get user.dss &&
+	git -C hasremoteurlTest config --get user.dse &&
+	git -C hasremoteurlTest config --get user.dsm &&
+	git -C hasremoteurlTest config --get user.ssm &&
+	test_must_fail git -C hasremoteurlTest config --get user.no
+'
+
 test_done
-- 
2.34.0.rc1.387.gb447b232ab-goog


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 2/2] config: include file if remote URL matches a glob
  2021-11-16  0:00   ` [PATCH v3 2/2] config: include file if remote URL matches a glob Jonathan Tan
@ 2021-11-22 22:59     ` Glen Choo
  2021-11-29 17:53       ` Jonathan Tan
  2021-11-23  1:22     ` Junio C Hamano
  2021-11-23  1:27     ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 87+ messages in thread
From: Glen Choo @ 2021-11-22 22:59 UTC (permalink / raw)
  To: Jonathan Tan, git; +Cc: Jonathan Tan, emilyshaffer, peff, avarab, gitster

Jonathan Tan <jonathantanmy@google.com> writes:

> +`hasremoteurl`::
> +	The data that follows the keyword `hasremoteurl:` is taken to
> +	be a pattern with standard globbing wildcards and two
> +	additional ones, `**/` and `/**`, that can match multiple
> +	components. The rest of the config files will be scanned for
> +	remote URLs, and then if there at least one remote URL that

  if there {is,exists}* at least one remote URL that

> +	matches this pattern, the include condition is met.
> ++
> +Files included by this option (directly or indirectly) are not allowed
> +to contain remote URLs.

As Jeff mentioned earlier in this thread, this "last-config-wins" is a
pretty big exception to the existing semantics, as
Documentation/config.txt reads:

  The contents of the included file are inserted immediately, as if they
  had been found at the location of the include directive.

At minimum, I think we should call out this exception in
Documentation/config.txt and the commit message, but calling out *just*
hasremoteurl makes this exception seem like a strange anomaly at first
glance, even though we actually have a good idea of when and why we are
doing this (which is that it simplifies includes that rely on config
values).

I was a big fan of your includeIfDeferred proposal, and I still think
that it's easier for users to understand if we explicitly require
"includeIfDeferred" instead of counting on them to remember when
"includeIf" behaves as it always did vs this new 'deferred' behavior.
That said, I doubt most users actually rely on the inclusion order, and 
I am ok with this approach as long as we document the different
inclusion order.


> +static void populate_remote_urls(struct config_include_data *inc)
> +{
> +	struct config_options opts;
> +
> +	struct config_source *store_cf = cf;
> +	struct key_value_info *store_kvi = current_config_kvi;
> +	enum config_scope store_scope = current_parsing_scope;
> +
> +	opts = *inc->opts;
> +	opts.unconditional_remote_url = 1;
> +
> +	cf = NULL;
> +	current_config_kvi = NULL;
> +	current_parsing_scope = 0;
> +
> +	inc->remote_urls = xmalloc(sizeof(*inc->remote_urls));
> +	string_list_init_dup(inc->remote_urls);
> +	config_with_options(add_remote_url, inc->remote_urls, inc->config_source, &opts);
> +
> +	cf = store_cf;
> +	current_config_kvi = store_kvi;
> +	current_parsing_scope = store_scope;
> +}

The algorithm is easy to understand and reuses config_with_options(),
which is great.

> +static int forbid_remote_url(const char *var, const char *value, void *data)
> +{
> +	const char *remote_name;
> +	size_t remote_name_len;
> +	const char *key;
> +
> +	if (!parse_config_key(var, "remote", &remote_name, &remote_name_len,
> +			      &key) &&
> +	    remote_name &&
> +	    !strcmp(key, "url"))
> +		die(_("remote URLs cannot be configured in file directly or indirectly included by includeIf.hasremoteurl"));
> +	return 0;
> +}
> +
> +static int at_least_one_url_matches_glob(const char *glob, int glob_len,
> +					 struct string_list *remote_urls)
> +{
> +	struct strbuf pattern = STRBUF_INIT;
> +	struct string_list_item *url_item;
> +	int found = 0;
> +
> +	strbuf_add(&pattern, glob, glob_len);
> +	for_each_string_list_item(url_item, remote_urls) {
> +		if (!wildmatch(pattern.buf, url_item->string, WM_PATHNAME)) {
> +			found = 1;
> +			break;
> +		}
> +	}
> +	strbuf_release(&pattern);
> +	return found;
> +}
> +
>  static int git_config_include(const char *var, const char *value, void *data)
>  {
>  	struct config_include_data *inc = data;
>  	const char *cond, *key;
>  	size_t cond_len;
> -	int ret;
> +	int ret = 0;
>  
>  	/*
>  	 * Pass along all values, including "include" directives; this makes it
> @@ -335,9 +412,29 @@ static int git_config_include(const char *var, const char *value, void *data)
>  		ret = handle_path_include(value, inc);
>  
>  	if (!parse_config_key(var, "includeif", &cond, &cond_len, &key) &&
> -	    (cond && include_condition_is_true(inc->opts, cond, cond_len)) &&
> -	    !strcmp(key, "path"))
> -		ret = handle_path_include(value, inc);
> +	    cond && !strcmp(key, "path")) {
> +		const char *url;
> +		size_t url_len;
> +
> +		if (skip_prefix_mem(cond, cond_len, "hasremoteurl:", &url,
> +				    &url_len)) {
> +			if (inc->opts->unconditional_remote_url) {
> +				config_fn_t old_fn = inc->fn;
> +
> +				inc->fn = forbid_remote_url;

When unconditional_remote_url is true, we forbid remote urls in the
included files as expected, but...

> +				ret = handle_path_include(value, inc);
> +				inc->fn = old_fn;
> +			} else {
> +				if (!inc->remote_urls)
> +					populate_remote_urls(inc);
> +				if (at_least_one_url_matches_glob(
> +						url, url_len, inc->remote_urls))
> +					ret = handle_path_include(value, inc);
> +			}
> +		} else if (include_condition_is_true(inc->opts, cond, cond_len)) {
> +			ret = handle_path_include(value, inc);
> +		}
> +	}
>  
>  	return ret;
>  }

It's not clear to me whether we are forbidding the remote urls correctly
when uncondition_remote_url is false. I would be convinced if we had
tests that convered this behavior, but I did not find any such test
cases.

> diff --git a/t/t1300-config.sh b/t/t1300-config.sh
> index 9ff46f3b04..9daab4c6da 100755
> --- a/t/t1300-config.sh
> +++ b/t/t1300-config.sh
> @@ -2387,4 +2387,104 @@ test_expect_success '--get and --get-all with --fixed-value' '
>  	test_must_fail git config --file=config --get-regexp --fixed-value fixed+ non-existent
>  '
>  
> +test_expect_success 'includeIf.hasremoteurl' '
> +	git init hasremoteurlTest &&
> +	test_when_finished "rm -rf hasremoteurlTest" &&
> +
> +	cat >"$(pwd)"/include-this <<-\EOF &&
> +	[user]
> +		this = this-is-included
> +	EOF
> +	cat >"$(pwd)"/dont-include-that <<-\EOF &&
> +	[user]
> +		that = that-is-not-included
> +	EOF
> +	cat >>hasremoteurlTest/.git/config <<-EOF &&
> +	[includeIf "hasremoteurl:foo"]
> +		path = "$(pwd)/include-this"
> +	[includeIf "hasremoteurl:bar"]
> +		path = "$(pwd)/dont-include-that"
> +	[remote "foo"]
> +		url = foo
> +	EOF
> +
> +	echo this-is-included >expect-this &&
> +	git -C hasremoteurlTest config --get user.this >actual-this &&
> +	test_cmp expect-this actual-this &&
> +
> +	test_must_fail git -C hasremoteurlTest config --get user.that
> +'
> +
> +test_expect_success 'includeIf.hasremoteurl respects last-config-wins' '
> +	git init hasremoteurlTest &&
> +	test_when_finished "rm -rf hasremoteurlTest" &&
> +
> +	cat >"$(pwd)"/include-two-three <<-\EOF &&
> +	[user]
> +		two = included-config
> +		three = included-config
> +	EOF
> +	cat >>hasremoteurlTest/.git/config <<-EOF &&
> +	[remote "foo"]
> +		url = foo
> +	[user]
> +		one = main-config
> +		two = main-config
> +	[includeIf "hasremoteurl:foo"]
> +		path = "$(pwd)/include-two-three"
> +	[user]
> +		three = main-config
> +	EOF
> +
> +	echo main-config >expect-main-config &&
> +	echo included-config >expect-included-config &&
> +
> +	git -C hasremoteurlTest config --get user.one >actual &&
> +	test_cmp expect-main-config actual &&
> +
> +	git -C hasremoteurlTest config --get user.two >actual &&
> +	test_cmp expect-included-config actual &&
> +
> +	git -C hasremoteurlTest config --get user.three >actual &&
> +	test_cmp expect-main-config actual
> +'
> +
> +test_expect_success 'includeIf.hasremoteurl globs' '
> +	git init hasremoteurlTest &&
> +	test_when_finished "rm -rf hasremoteurlTest" &&
> +
> +	printf "[user]\ndss = yes\n" >"$(pwd)/double-star-start" &&
> +	printf "[user]\ndse = yes\n" >"$(pwd)/double-star-end" &&
> +	printf "[user]\ndsm = yes\n" >"$(pwd)/double-star-middle" &&
> +	printf "[user]\nssm = yes\n" >"$(pwd)/single-star-middle" &&
> +	printf "[user]\nno = no\n" >"$(pwd)/no" &&
> +
> +	cat >>hasremoteurlTest/.git/config <<-EOF &&
> +	[remote "foo"]
> +		url = https://foo/bar/baz
> +	[includeIf "hasremoteurl:**/baz"]
> +		path = "$(pwd)/double-star-start"
> +	[includeIf "hasremoteurl:**/nomatch"]
> +		path = "$(pwd)/no"
> +	[includeIf "hasremoteurl:https:/**"]
> +		path = "$(pwd)/double-star-end"
> +	[includeIf "hasremoteurl:nomatch:/**"]
> +		path = "$(pwd)/no"

As mentioned above, I would have expected to find test cases that test
whether or not we forbid the remote urls correctly, but the tests are
pretty clear.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 2/2] config: include file if remote URL matches a glob
  2021-11-16  0:00   ` [PATCH v3 2/2] config: include file if remote URL matches a glob Jonathan Tan
  2021-11-22 22:59     ` Glen Choo
@ 2021-11-23  1:22     ` Junio C Hamano
  2021-11-29 18:18       ` Jonathan Tan
  2021-11-23  1:27     ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 87+ messages in thread
From: Junio C Hamano @ 2021-11-23  1:22 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, emilyshaffer, peff, avarab

Jonathan Tan <jonathantanmy@google.com> writes:

> This is a feature that supports config file inclusion conditional on
> whether the repo has a remote with a URL that matches a glob.
>
> Similar to my previous work on remote-suggested hooks [1], the main
> motivation is to allow remote repo administrators to provide recommended
> configs in a way that can be consumed more easily (e.g. through a
> package installable by a package manager - it could, for example,
> contain a file to be included conditionally and a post-install script
> that adds the include directive to the system-wide config file).
>
> In order to do this, Git reruns the config parsing mechanism upon
> noticing the first URL-conditional include in order to find all remote
> URLs, and these remote URLs are then used to determine if that first and
> all subsequent includes are executed. Remote URLs are not allowed to be
> configued in any URL-conditionally-included file.
>
> [1] https://lore.kernel.org/git/cover.1623881977.git.jonathantanmy@google.com/
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  Documentation/config.txt |  11 ++++
>  config.c                 | 121 ++++++++++++++++++++++++++++++++++++---
>  config.h                 |   7 +++
>  t/t1300-config.sh        | 100 ++++++++++++++++++++++++++++++++
>  4 files changed, 231 insertions(+), 8 deletions(-)

Here is just a design level comment, without trying to outline the
implementation in my head like I usually do before making any
suggestion, but it strikes me somewhat sad that config.c needs to
know specifically about "remote_url".

I wonder if this can be a more generalized framework that allows us
to say "we introduce a new [includeIf] variant to get another file
included only if some condition is met for the configuration
variables we read without the includeIf directive", with variations
of "condition" including

 - a literal X is among the values of multi-valued variable Y.
 - a pattern X matches one of the values of multi-valued variable Y.
 - a literal Y is the name of an existing configuration variable.
 - a pattern Y matches the name of an existing configuration variable.

If that is done, I would imagine that the feature can become a thin
specialization e.g. "there is an existing configuration variable
whose name is 'remotes.https://github.com/janathantanmy/git.url'"

Perhaps I am dreaming?

Thanks.




^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 2/2] config: include file if remote URL matches a glob
  2021-11-16  0:00   ` [PATCH v3 2/2] config: include file if remote URL matches a glob Jonathan Tan
  2021-11-22 22:59     ` Glen Choo
  2021-11-23  1:22     ` Junio C Hamano
@ 2021-11-23  1:27     ` Ævar Arnfjörð Bjarmason
  2021-11-29 18:33       ` Jonathan Tan
  2 siblings, 1 reply; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-23  1:27 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, emilyshaffer, peff, gitster


On Mon, Nov 15 2021, Jonathan Tan wrote:

> +`hasremoteurl`::
> +	The data that follows the keyword `hasremoteurl:` is taken to

Both here..

> +		die(_("remote URLs cannot be configured in file directly or indirectly included by includeIf.hasremoteurl"));

..and here...

> +		if (skip_prefix_mem(cond, cond_len, "hasremoteurl:", &url,

...but not here (C code)..

> +	 * For internal use. Include all includeif.hasremoteurl paths without

..but here..

> +test_expect_success 'includeIf.hasremoteurl' '

..and also here etc., let's consistently camelCase config keys whenever
we're not using them for lookups in the C
code.

I.e. "includeIf.hasRemoteUrl" (possibly "includeIf.hasRemoteURL"?). It
makes them a lot easier to read, and makes the end-user documentation &
messaging more consistent.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 2/2] config: include file if remote URL matches a glob
  2021-11-22 22:59     ` Glen Choo
@ 2021-11-29 17:53       ` Jonathan Tan
  0 siblings, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-11-29 17:53 UTC (permalink / raw)
  To: chooglen; +Cc: jonathantanmy, git, emilyshaffer, peff, avarab, gitster

Glen Choo <chooglen@google.com> writes:
> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > +`hasremoteurl`::
> > +	The data that follows the keyword `hasremoteurl:` is taken to
> > +	be a pattern with standard globbing wildcards and two
> > +	additional ones, `**/` and `/**`, that can match multiple
> > +	components. The rest of the config files will be scanned for
> > +	remote URLs, and then if there at least one remote URL that
> 
>   if there {is,exists}* at least one remote URL that

Ah, good catch.

> > +	matches this pattern, the include condition is met.
> > ++
> > +Files included by this option (directly or indirectly) are not allowed
> > +to contain remote URLs.
> 
> As Jeff mentioned earlier in this thread, this "last-config-wins" is a
> pretty big exception to the existing semantics, as
> Documentation/config.txt reads:
> 
>   The contents of the included file are inserted immediately, as if they
>   had been found at the location of the include directive.
> 
> At minimum, I think we should call out this exception in
> Documentation/config.txt and the commit message, but calling out *just*
> hasremoteurl makes this exception seem like a strange anomaly at first
> glance, even though we actually have a good idea of when and why we are
> doing this (which is that it simplifies includes that rely on config
> values).

I've switched it to expand-in-place semantics. The scanning for remote
URLs does not mean that those configs are applied before the include.
I'll add a note to the documentation about that, but if you can think of
a better way to explain that, that would be great.

The patch includes a test "includeIf.hasremoteurl respects
last-config-wins". Take a look and see if it matches your expected
behavior, and let me know if it could be clearer.

> I was a big fan of your includeIfDeferred proposal, and I still think
> that it's easier for users to understand if we explicitly require
> "includeIfDeferred" instead of counting on them to remember when
> "includeIf" behaves as it always did vs this new 'deferred' behavior.
> That said, I doubt most users actually rely on the inclusion order, and 
> I am ok with this approach as long as we document the different
> inclusion order.

The user still needs to know that config variables in the future can
affect the behavior of the include, but perhaps that will be easier than
remembering that certain configs are deferred.

> It's not clear to me whether we are forbidding the remote urls correctly
> when uncondition_remote_url is false. I would be convinced if we had
> tests that convered this behavior, but I did not find any such test
> cases.

[snip]

> As mentioned above, I would have expected to find test cases that test
> whether or not we forbid the remote urls correctly, but the tests are
> pretty clear.

Ah yes, I should include a test for this. I'll include it in the next
reroll.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 2/2] config: include file if remote URL matches a glob
  2021-11-23  1:22     ` Junio C Hamano
@ 2021-11-29 18:18       ` Jonathan Tan
  2021-12-01 18:51         ` Junio C Hamano
  0 siblings, 1 reply; 87+ messages in thread
From: Jonathan Tan @ 2021-11-29 18:18 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, emilyshaffer, peff, avarab

Junio C Hamano <gitster@pobox.com> writes:
> Here is just a design level comment, without trying to outline the
> implementation in my head like I usually do before making any
> suggestion, but it strikes me somewhat sad that config.c needs to
> know specifically about "remote_url".
> 
> I wonder if this can be a more generalized framework that allows us
> to say "we introduce a new [includeIf] variant to get another file
> included only if some condition is met for the configuration
> variables we read without the includeIf directive", with variations
> of "condition" including
> 
>  - a literal X is among the values of multi-valued variable Y.
>  - a pattern X matches one of the values of multi-valued variable Y.
>  - a literal Y is the name of an existing configuration variable.
>  - a pattern Y matches the name of an existing configuration variable.
> 
> If that is done, I would imagine that the feature can become a thin
> specialization e.g. "there is an existing configuration variable
> whose name is 'remotes.https://github.com/janathantanmy/git.url'"
> 
> Perhaps I am dreaming?
> 
> Thanks.

I think that the hard part of this is how to present this to the end
user - right now, we just have one pattern of variable (remote.*.url,
where "*" is a wildcard) and one pattern of value with specific
properties (e.g. this is a glob, not a regular expression, and the
special value "**" is supported).

Once we figure that out, I would think that we could implement it in a
way similar to this patch. As for whether we should wait until the full
feature before merging any code that does includeIf based on a variable
(in order to avoid having code that would quickly be replaced by other
code), in this case, unless there is another use case for this, I think
we should proceed with the use case that we know about first
(conditional include of a file supplied by a remote repo administrator).

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 2/2] config: include file if remote URL matches a glob
  2021-11-23  1:27     ` Ævar Arnfjörð Bjarmason
@ 2021-11-29 18:33       ` Jonathan Tan
  2021-11-29 20:50         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 87+ messages in thread
From: Jonathan Tan @ 2021-11-29 18:33 UTC (permalink / raw)
  To: avarab; +Cc: jonathantanmy, git, emilyshaffer, peff, gitster

> On Mon, Nov 15 2021, Jonathan Tan wrote:
> 
> > +`hasremoteurl`::
> > +	The data that follows the keyword `hasremoteurl:` is taken to
> 
> Both here..
> 
> > +		die(_("remote URLs cannot be configured in file directly or indirectly included by includeIf.hasremoteurl"));
> 
> ..and here...
> 
> > +		if (skip_prefix_mem(cond, cond_len, "hasremoteurl:", &url,
> 
> ...but not here (C code)..
> 
> > +	 * For internal use. Include all includeif.hasremoteurl paths without
> 
> ..but here..
> 
> > +test_expect_success 'includeIf.hasremoteurl' '
> 
> ..and also here etc., let's consistently camelCase config keys whenever
> we're not using them for lookups in the C
> code.
> 
> I.e. "includeIf.hasRemoteUrl" (possibly "includeIf.hasRemoteURL"?). It
> makes them a lot easier to read, and makes the end-user documentation &
> messaging more consistent.

The middle part is not case-insensitive, though - I tried changing it in
the test and the test now fails. (Unless you mean that we should also
change the code to make it case-insensitive - but I would think that
it's better for the URL to be case-sensitive, and by extension, the
"hasremoteurl:" part connected to it.)

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v4 0/2] Conditional config includes based on remote URL
  2021-10-12 22:57 [RFC PATCH 0/2] Conditional config includes based on remote URL Jonathan Tan
                   ` (6 preceding siblings ...)
  2021-11-16  0:00 ` [PATCH v3 0/2] Conditional config includes based on remote URL Jonathan Tan
@ 2021-11-29 20:23 ` Jonathan Tan
  2021-11-29 20:23   ` [PATCH v4 1/2] config: make git_config_include() static Jonathan Tan
                     ` (2 more replies)
  2021-12-02 23:31 ` [PATCH v5 " Jonathan Tan
                   ` (3 subsequent siblings)
  11 siblings, 3 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-11-29 20:23 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, chooglen, gitster, avarab

Thanks everyone for your comments. Here's an update.

Jonathan Tan (2):
  config: make git_config_include() static
  config: include file if remote URL matches a glob

 Documentation/config.txt |  12 ++++
 config.c                 | 133 ++++++++++++++++++++++++++++++++++++---
 config.h                 |  44 ++++---------
 t/t1300-config.sh        | 118 ++++++++++++++++++++++++++++++++++
 4 files changed, 265 insertions(+), 42 deletions(-)

Range-diff against v3:
1:  b2dcae03ed = 1:  b2dcae03ed config: make git_config_include() static
2:  1c1a07a0b6 ! 2:  3b3af0da98 config: include file if remote URL matches a glob
    @@ Documentation/config.txt: all branches that begin with `foo/`. This is useful if
     +	The data that follows the keyword `hasremoteurl:` is taken to
     +	be a pattern with standard globbing wildcards and two
     +	additional ones, `**/` and `/**`, that can match multiple
    -+	components. The rest of the config files will be scanned for
    -+	remote URLs, and then if there at least one remote URL that
    -+	matches this pattern, the include condition is met.
    ++	components. The first time this keyword is seen, the rest of
    ++	the config files will be scanned for remote URLs (without
    ++	applying any values). If there exists at least one remote URL
    ++	that matches this pattern, the include condition is met.
     ++
     +Files included by this option (directly or indirectly) are not allowed
     +to contain remote URLs.
    @@ t/t1300-config.sh: test_expect_success '--get and --get-all with --fixed-value'
     +	git -C hasremoteurlTest config --get user.ssm &&
     +	test_must_fail git -C hasremoteurlTest config --get user.no
     +'
    ++
    ++test_expect_success 'includeIf.hasremoteurl forbids remote url in such included files' '
    ++	git init hasremoteurlTest &&
    ++	test_when_finished "rm -rf hasremoteurlTest" &&
    ++
    ++	cat >"$(pwd)"/include-with-url <<-\EOF &&
    ++	[remote "bar"]
    ++		url = bar
    ++	EOF
    ++	cat >>hasremoteurlTest/.git/config <<-EOF &&
    ++	[includeIf "hasremoteurl:foo"]
    ++		path = "$(pwd)/include-with-url"
    ++	EOF
    ++
    ++	# test with any Git command
    ++	test_must_fail git -C hasremoteurlTest status 2>err &&
    ++	grep "fatal: remote URLs cannot be configured in file directly or indirectly included by includeIf.hasremoteurl" err
    ++'
     +
      test_done
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v4 1/2] config: make git_config_include() static
  2021-11-29 20:23 ` [PATCH v4 0/2] Conditional config includes based on remote URL Jonathan Tan
@ 2021-11-29 20:23   ` Jonathan Tan
  2021-11-29 20:23   ` [PATCH v4 2/2] config: include file if remote URL matches a glob Jonathan Tan
  2021-11-29 20:48   ` [PATCH v4 0/2] Conditional config includes based on remote URL Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-11-29 20:23 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, chooglen, gitster, avarab

It is not used from outside the file in which it is declared.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 config.c | 12 +++++++++++-
 config.h | 37 ++++---------------------------------
 2 files changed, 15 insertions(+), 34 deletions(-)

diff --git a/config.c b/config.c
index 2dcbe901b6..94ad5ce913 100644
--- a/config.c
+++ b/config.c
@@ -120,6 +120,16 @@ static long config_buf_ftell(struct config_source *conf)
 	return conf->u.buf.pos;
 }
 
+struct config_include_data {
+	int depth;
+	config_fn_t fn;
+	void *data;
+	const struct config_options *opts;
+};
+#define CONFIG_INCLUDE_INIT { 0 }
+
+static int git_config_include(const char *var, const char *value, void *data);
+
 #define MAX_INCLUDE_DEPTH 10
 static const char include_depth_advice[] = N_(
 "exceeded maximum include depth (%d) while including\n"
@@ -306,7 +316,7 @@ static int include_condition_is_true(const struct config_options *opts,
 	return 0;
 }
 
-int git_config_include(const char *var, const char *value, void *data)
+static int git_config_include(const char *var, const char *value, void *data)
 {
 	struct config_include_data *inc = data;
 	const char *cond, *key;
diff --git a/config.h b/config.h
index f119de0130..48a5e472ca 100644
--- a/config.h
+++ b/config.h
@@ -126,6 +126,8 @@ int git_default_config(const char *, const char *, void *);
 /**
  * Read a specific file in git-config format.
  * This function takes the same callback and data parameters as `git_config`.
+ *
+ * Unlike git_config(), this function does not respect includes.
  */
 int git_config_from_file(config_fn_t fn, const char *, void *);
 
@@ -158,6 +160,8 @@ void read_very_early_config(config_fn_t cb, void *data);
  * will first feed the user-wide one to the callback, and then the
  * repo-specific one; by overwriting, the higher-priority repo-specific
  * value is left at the end).
+ *
+ * Unlike git_config_from_file(), this function respects includes.
  */
 void git_config(config_fn_t fn, void *);
 
@@ -338,39 +342,6 @@ const char *current_config_origin_type(void);
 const char *current_config_name(void);
 int current_config_line(void);
 
-/**
- * Include Directives
- * ------------------
- *
- * By default, the config parser does not respect include directives.
- * However, a caller can use the special `git_config_include` wrapper
- * callback to support them. To do so, you simply wrap your "real" callback
- * function and data pointer in a `struct config_include_data`, and pass
- * the wrapper to the regular config-reading functions. For example:
- *
- * -------------------------------------------
- * int read_file_with_include(const char *file, config_fn_t fn, void *data)
- * {
- * struct config_include_data inc = CONFIG_INCLUDE_INIT;
- * inc.fn = fn;
- * inc.data = data;
- * return git_config_from_file(git_config_include, file, &inc);
- * }
- * -------------------------------------------
- *
- * `git_config` respects includes automatically. The lower-level
- * `git_config_from_file` does not.
- *
- */
-struct config_include_data {
-	int depth;
-	config_fn_t fn;
-	void *data;
-	const struct config_options *opts;
-};
-#define CONFIG_INCLUDE_INIT { 0 }
-int git_config_include(const char *name, const char *value, void *data);
-
 /*
  * Match and parse a config key of the form:
  *
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v4 2/2] config: include file if remote URL matches a glob
  2021-11-29 20:23 ` [PATCH v4 0/2] Conditional config includes based on remote URL Jonathan Tan
  2021-11-29 20:23   ` [PATCH v4 1/2] config: make git_config_include() static Jonathan Tan
@ 2021-11-29 20:23   ` Jonathan Tan
  2021-12-02  6:57     ` Junio C Hamano
  2021-11-29 20:48   ` [PATCH v4 0/2] Conditional config includes based on remote URL Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 87+ messages in thread
From: Jonathan Tan @ 2021-11-29 20:23 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, chooglen, gitster, avarab

This is a feature that supports config file inclusion conditional on
whether the repo has a remote with a URL that matches a glob.

Similar to my previous work on remote-suggested hooks [1], the main
motivation is to allow remote repo administrators to provide recommended
configs in a way that can be consumed more easily (e.g. through a
package installable by a package manager - it could, for example,
contain a file to be included conditionally and a post-install script
that adds the include directive to the system-wide config file).

In order to do this, Git reruns the config parsing mechanism upon
noticing the first URL-conditional include in order to find all remote
URLs, and these remote URLs are then used to determine if that first and
all subsequent includes are executed. Remote URLs are not allowed to be
configued in any URL-conditionally-included file.

[1] https://lore.kernel.org/git/cover.1623881977.git.jonathantanmy@google.com/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/config.txt |  12 ++++
 config.c                 | 121 ++++++++++++++++++++++++++++++++++++---
 config.h                 |   7 +++
 t/t1300-config.sh        | 118 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 250 insertions(+), 8 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0c0e6b859f..bfc9e22d78 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -159,6 +159,18 @@ all branches that begin with `foo/`. This is useful if your branches are
 organized hierarchically and you would like to apply a configuration to
 all the branches in that hierarchy.
 
+`hasremoteurl`::
+	The data that follows the keyword `hasremoteurl:` is taken to
+	be a pattern with standard globbing wildcards and two
+	additional ones, `**/` and `/**`, that can match multiple
+	components. The first time this keyword is seen, the rest of
+	the config files will be scanned for remote URLs (without
+	applying any values). If there exists at least one remote URL
+	that matches this pattern, the include condition is met.
++
+Files included by this option (directly or indirectly) are not allowed
+to contain remote URLs.
+
 A few more notes on matching via `gitdir` and `gitdir/i`:
 
  * Symlinks in `$GIT_DIR` are not resolved before matching.
diff --git a/config.c b/config.c
index 94ad5ce913..4ffc1e87e9 100644
--- a/config.c
+++ b/config.c
@@ -125,6 +125,12 @@ struct config_include_data {
 	config_fn_t fn;
 	void *data;
 	const struct config_options *opts;
+	struct git_config_source *config_source;
+
+	/*
+	 * All remote URLs discovered when reading all config files.
+	 */
+	struct string_list *remote_urls;
 };
 #define CONFIG_INCLUDE_INIT { 0 }
 
@@ -316,12 +322,83 @@ static int include_condition_is_true(const struct config_options *opts,
 	return 0;
 }
 
+static int add_remote_url(const char *var, const char *value, void *data)
+{
+	struct string_list *remote_urls = data;
+	const char *remote_name;
+	size_t remote_name_len;
+	const char *key;
+
+	if (!parse_config_key(var, "remote", &remote_name, &remote_name_len,
+			      &key) &&
+	    remote_name &&
+	    !strcmp(key, "url"))
+		string_list_append(remote_urls, value);
+	return 0;
+}
+
+static void populate_remote_urls(struct config_include_data *inc)
+{
+	struct config_options opts;
+
+	struct config_source *store_cf = cf;
+	struct key_value_info *store_kvi = current_config_kvi;
+	enum config_scope store_scope = current_parsing_scope;
+
+	opts = *inc->opts;
+	opts.unconditional_remote_url = 1;
+
+	cf = NULL;
+	current_config_kvi = NULL;
+	current_parsing_scope = 0;
+
+	inc->remote_urls = xmalloc(sizeof(*inc->remote_urls));
+	string_list_init_dup(inc->remote_urls);
+	config_with_options(add_remote_url, inc->remote_urls, inc->config_source, &opts);
+
+	cf = store_cf;
+	current_config_kvi = store_kvi;
+	current_parsing_scope = store_scope;
+}
+
+static int forbid_remote_url(const char *var, const char *value, void *data)
+{
+	const char *remote_name;
+	size_t remote_name_len;
+	const char *key;
+
+	if (!parse_config_key(var, "remote", &remote_name, &remote_name_len,
+			      &key) &&
+	    remote_name &&
+	    !strcmp(key, "url"))
+		die(_("remote URLs cannot be configured in file directly or indirectly included by includeIf.hasremoteurl"));
+	return 0;
+}
+
+static int at_least_one_url_matches_glob(const char *glob, int glob_len,
+					 struct string_list *remote_urls)
+{
+	struct strbuf pattern = STRBUF_INIT;
+	struct string_list_item *url_item;
+	int found = 0;
+
+	strbuf_add(&pattern, glob, glob_len);
+	for_each_string_list_item(url_item, remote_urls) {
+		if (!wildmatch(pattern.buf, url_item->string, WM_PATHNAME)) {
+			found = 1;
+			break;
+		}
+	}
+	strbuf_release(&pattern);
+	return found;
+}
+
 static int git_config_include(const char *var, const char *value, void *data)
 {
 	struct config_include_data *inc = data;
 	const char *cond, *key;
 	size_t cond_len;
-	int ret;
+	int ret = 0;
 
 	/*
 	 * Pass along all values, including "include" directives; this makes it
@@ -335,9 +412,29 @@ static int git_config_include(const char *var, const char *value, void *data)
 		ret = handle_path_include(value, inc);
 
 	if (!parse_config_key(var, "includeif", &cond, &cond_len, &key) &&
-	    (cond && include_condition_is_true(inc->opts, cond, cond_len)) &&
-	    !strcmp(key, "path"))
-		ret = handle_path_include(value, inc);
+	    cond && !strcmp(key, "path")) {
+		const char *url;
+		size_t url_len;
+
+		if (skip_prefix_mem(cond, cond_len, "hasremoteurl:", &url,
+				    &url_len)) {
+			if (inc->opts->unconditional_remote_url) {
+				config_fn_t old_fn = inc->fn;
+
+				inc->fn = forbid_remote_url;
+				ret = handle_path_include(value, inc);
+				inc->fn = old_fn;
+			} else {
+				if (!inc->remote_urls)
+					populate_remote_urls(inc);
+				if (at_least_one_url_matches_glob(
+						url, url_len, inc->remote_urls))
+					ret = handle_path_include(value, inc);
+			}
+		} else if (include_condition_is_true(inc->opts, cond, cond_len)) {
+			ret = handle_path_include(value, inc);
+		}
+	}
 
 	return ret;
 }
@@ -1933,11 +2030,13 @@ int config_with_options(config_fn_t fn, void *data,
 			const struct config_options *opts)
 {
 	struct config_include_data inc = CONFIG_INCLUDE_INIT;
+	int ret;
 
 	if (opts->respect_includes) {
 		inc.fn = fn;
 		inc.data = data;
 		inc.opts = opts;
+		inc.config_source = config_source;
 		fn = git_config_include;
 		data = &inc;
 	}
@@ -1950,17 +2049,23 @@ int config_with_options(config_fn_t fn, void *data,
 	 * regular lookup sequence.
 	 */
 	if (config_source && config_source->use_stdin) {
-		return git_config_from_stdin(fn, data);
+		ret = git_config_from_stdin(fn, data);
 	} else if (config_source && config_source->file) {
-		return git_config_from_file(fn, config_source->file, data);
+		ret = git_config_from_file(fn, config_source->file, data);
 	} else if (config_source && config_source->blob) {
 		struct repository *repo = config_source->repo ?
 			config_source->repo : the_repository;
-		return git_config_from_blob_ref(fn, repo, config_source->blob,
+		ret = git_config_from_blob_ref(fn, repo, config_source->blob,
 						data);
+	} else {
+		ret = do_git_config_sequence(opts, fn, data);
 	}
 
-	return do_git_config_sequence(opts, fn, data);
+	if (inc.remote_urls) {
+		string_list_clear(inc.remote_urls, 0);
+		FREE_AND_NULL(inc.remote_urls);
+	}
+	return ret;
 }
 
 static void configset_iter(struct config_set *cs, config_fn_t fn, void *data)
diff --git a/config.h b/config.h
index 48a5e472ca..c24458b10a 100644
--- a/config.h
+++ b/config.h
@@ -89,6 +89,13 @@ struct config_options {
 	unsigned int ignore_worktree : 1;
 	unsigned int ignore_cmdline : 1;
 	unsigned int system_gently : 1;
+
+	/*
+	 * For internal use. Include all includeif.hasremoteurl paths without
+	 * checking if the repo has that remote URL.
+	 */
+	unsigned int unconditional_remote_url : 1;
+
 	const char *commondir;
 	const char *git_dir;
 	config_parser_event_fn_t event_fn;
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 9ff46f3b04..9bd299d3f8 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -2387,4 +2387,122 @@ test_expect_success '--get and --get-all with --fixed-value' '
 	test_must_fail git config --file=config --get-regexp --fixed-value fixed+ non-existent
 '
 
+test_expect_success 'includeIf.hasremoteurl' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	cat >"$(pwd)"/include-this <<-\EOF &&
+	[user]
+		this = this-is-included
+	EOF
+	cat >"$(pwd)"/dont-include-that <<-\EOF &&
+	[user]
+		that = that-is-not-included
+	EOF
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[includeIf "hasremoteurl:foo"]
+		path = "$(pwd)/include-this"
+	[includeIf "hasremoteurl:bar"]
+		path = "$(pwd)/dont-include-that"
+	[remote "foo"]
+		url = foo
+	EOF
+
+	echo this-is-included >expect-this &&
+	git -C hasremoteurlTest config --get user.this >actual-this &&
+	test_cmp expect-this actual-this &&
+
+	test_must_fail git -C hasremoteurlTest config --get user.that
+'
+
+test_expect_success 'includeIf.hasremoteurl respects last-config-wins' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	cat >"$(pwd)"/include-two-three <<-\EOF &&
+	[user]
+		two = included-config
+		three = included-config
+	EOF
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[remote "foo"]
+		url = foo
+	[user]
+		one = main-config
+		two = main-config
+	[includeIf "hasremoteurl:foo"]
+		path = "$(pwd)/include-two-three"
+	[user]
+		three = main-config
+	EOF
+
+	echo main-config >expect-main-config &&
+	echo included-config >expect-included-config &&
+
+	git -C hasremoteurlTest config --get user.one >actual &&
+	test_cmp expect-main-config actual &&
+
+	git -C hasremoteurlTest config --get user.two >actual &&
+	test_cmp expect-included-config actual &&
+
+	git -C hasremoteurlTest config --get user.three >actual &&
+	test_cmp expect-main-config actual
+'
+
+test_expect_success 'includeIf.hasremoteurl globs' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	printf "[user]\ndss = yes\n" >"$(pwd)/double-star-start" &&
+	printf "[user]\ndse = yes\n" >"$(pwd)/double-star-end" &&
+	printf "[user]\ndsm = yes\n" >"$(pwd)/double-star-middle" &&
+	printf "[user]\nssm = yes\n" >"$(pwd)/single-star-middle" &&
+	printf "[user]\nno = no\n" >"$(pwd)/no" &&
+
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[remote "foo"]
+		url = https://foo/bar/baz
+	[includeIf "hasremoteurl:**/baz"]
+		path = "$(pwd)/double-star-start"
+	[includeIf "hasremoteurl:**/nomatch"]
+		path = "$(pwd)/no"
+	[includeIf "hasremoteurl:https:/**"]
+		path = "$(pwd)/double-star-end"
+	[includeIf "hasremoteurl:nomatch:/**"]
+		path = "$(pwd)/no"
+	[includeIf "hasremoteurl:https:/**/baz"]
+		path = "$(pwd)/double-star-middle"
+	[includeIf "hasremoteurl:https:/**/nomatch"]
+		path = "$(pwd)/no"
+	[includeIf "hasremoteurl:https://*/bar/baz"]
+		path = "$(pwd)/single-star-middle"
+	[includeIf "hasremoteurl:https://*/baz"]
+		path = "$(pwd)/no"
+	EOF
+
+	git -C hasremoteurlTest config --get user.dss &&
+	git -C hasremoteurlTest config --get user.dse &&
+	git -C hasremoteurlTest config --get user.dsm &&
+	git -C hasremoteurlTest config --get user.ssm &&
+	test_must_fail git -C hasremoteurlTest config --get user.no
+'
+
+test_expect_success 'includeIf.hasremoteurl forbids remote url in such included files' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	cat >"$(pwd)"/include-with-url <<-\EOF &&
+	[remote "bar"]
+		url = bar
+	EOF
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[includeIf "hasremoteurl:foo"]
+		path = "$(pwd)/include-with-url"
+	EOF
+
+	# test with any Git command
+	test_must_fail git -C hasremoteurlTest status 2>err &&
+	grep "fatal: remote URLs cannot be configured in file directly or indirectly included by includeIf.hasremoteurl" err
+'
+
 test_done
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH v4 0/2] Conditional config includes based on remote URL
  2021-11-29 20:23 ` [PATCH v4 0/2] Conditional config includes based on remote URL Jonathan Tan
  2021-11-29 20:23   ` [PATCH v4 1/2] config: make git_config_include() static Jonathan Tan
  2021-11-29 20:23   ` [PATCH v4 2/2] config: include file if remote URL matches a glob Jonathan Tan
@ 2021-11-29 20:48   ` Ævar Arnfjörð Bjarmason
  2021-11-30  7:51     ` Junio C Hamano
  2 siblings, 1 reply; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-29 20:48 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, chooglen, gitster


On Mon, Nov 29 2021, Jonathan Tan wrote:

> Thanks everyone for your comments. Here's an update.

Just from skimming this (minor) feedback on v3 still applies:
https://lore.kernel.org/git/211123.86pmqrwtf2.gmgdl@evledraar.gmail.com/

I.e. s/hasremoteurl/hasRemoteURL/ etc. in appropriate places.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 2/2] config: include file if remote URL matches a glob
  2021-11-29 18:33       ` Jonathan Tan
@ 2021-11-29 20:50         ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-29 20:50 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, emilyshaffer, peff, gitster


On Mon, Nov 29 2021, Jonathan Tan wrote:

>> On Mon, Nov 15 2021, Jonathan Tan wrote:
>> 
>> > +`hasremoteurl`::
>> > +	The data that follows the keyword `hasremoteurl:` is taken to
>> 
>> Both here..
>> 
>> > +		die(_("remote URLs cannot be configured in file directly or indirectly included by includeIf.hasremoteurl"));
>> 
>> ..and here...
>> 
>> > +		if (skip_prefix_mem(cond, cond_len, "hasremoteurl:", &url,
>> 
>> ...but not here (C code)..
>> 
>> > +	 * For internal use. Include all includeif.hasremoteurl paths without
>> 
>> ..but here..
>> 
>> > +test_expect_success 'includeIf.hasremoteurl' '
>> 
>> ..and also here etc., let's consistently camelCase config keys whenever
>> we're not using them for lookups in the C
>> code.
>> 
>> I.e. "includeIf.hasRemoteUrl" (possibly "includeIf.hasRemoteURL"?). It
>> makes them a lot easier to read, and makes the end-user documentation &
>> messaging more consistent.
>
> The middle part is not case-insensitive, though - I tried changing it in
> the test and the test now fails. (Unless you mean that we should also
> change the code to make it case-insensitive - but I would think that
> it's better for the URL to be case-sensitive, and by extension, the
> "hasremoteurl:" part connected to it.)

Ah, I forgot about that edge case. sorry. And sent [1] without having
seen this as a reminder on v4. Makes sense.

(I seem to be getting really slow delivery from kernel.org to GMail
these days, sometimes I can see things on lore.kernel.org hours or half
a day before it pops up in my mail...)

1. https://lore.kernel.org/git/211129.864k7ug02c.gmgdl@evledraar.gmail.com/

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v4 0/2] Conditional config includes based on remote URL
  2021-11-29 20:48   ` [PATCH v4 0/2] Conditional config includes based on remote URL Ævar Arnfjörð Bjarmason
@ 2021-11-30  7:51     ` Junio C Hamano
  0 siblings, 0 replies; 87+ messages in thread
From: Junio C Hamano @ 2021-11-30  7:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Jonathan Tan, git, chooglen

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Mon, Nov 29 2021, Jonathan Tan wrote:
>
>> Thanks everyone for your comments. Here's an update.
>
> Just from skimming this (minor) feedback on v3 still applies:
> https://lore.kernel.org/git/211123.86pmqrwtf2.gmgdl@evledraar.gmail.com/
>
> I.e. s/hasremoteurl/hasRemoteURL/ etc. in appropriate places.

Is there any appropriate place, though?

"hasremoteurl" is a new directive to be used as the leading part of
<condition> in the name of `includeIf.<condition>.path` variable.
The <condtion> part is case sensitive, and we do not want people to
spell it, and the existing "gitdir", "gitdir/i", and "onbranch", in
mixed cases.

See config.c::include_condition_is_true() function and its use of
skip_prefix_mem() to locate these existing conditions.

It is troubling that this patch is *NOT* extend the implementation
of include_condition_is_true() function (which gives a very clean
abstraction and makes the caller very readable); it instead mucks
with the caller of include_condition_is_true() and adds a parallel
logic that include_condition_is_true() does not know about.  It may
have been an expedite way to implement this, and the result may not
seem to hurt when include_condition_is_true() is called by only one
caller, but I find the resulting code structure unnecessarily ugly.

Can't the body of if (skip_prefix_mem(..."hasremoteurl:", ...)) block
become include_by_remoteurl() function, similar to include_by_foo()
functions include_condition_is_true() already calls?

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 2/2] config: include file if remote URL matches a glob
  2021-11-29 18:18       ` Jonathan Tan
@ 2021-12-01 18:51         ` Junio C Hamano
  2021-12-02 23:14           ` Jonathan Tan
  0 siblings, 1 reply; 87+ messages in thread
From: Junio C Hamano @ 2021-12-01 18:51 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, emilyshaffer, peff, avarab

Jonathan Tan <jonathantanmy@google.com> writes:

>> variables we read without the includeIf directive", with variations
>> of "condition" including
>> 
>>  - a literal X is among the values of multi-valued variable Y.
>>  - a pattern X matches one of the values of multi-valued variable Y.
>>  - a literal Y is the name of an existing configuration variable.
>>  - a pattern Y matches the name of an existing configuration variable.
> ...
> code), in this case, unless there is another use case for this, I think
> we should proceed with the use case that we know about first
> (conditional include of a file supplied by a remote repo administrator).

Doing it that way without thinking flexibility through will paint us
into a corner, from which we cannot get out of, doesn't it?

People will start asking "Why should we even have
'hasremoteurl:$URL' variant in 'includeIf' conditions, when one of
the 'variableExists:Y' and friends can express the same thing",
somebody new who is not yet in this community today will propose
deprecating hasremoteurl in favor of more generalized approach and
we have to give a sad answer "no, we earlier made a mistake of
starting with a special case variant for expediency's sake, without
thinking the general cases through.  Because existing users depend
on it, we have to support it til the end of time."

We have the same regret with "why do we need grep.extendedRegexp
when grep.patternType suffices?"  I am reluctant to see us knowingly
commit the same mistake here, unless there is a very good reason.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v4 2/2] config: include file if remote URL matches a glob
  2021-11-29 20:23   ` [PATCH v4 2/2] config: include file if remote URL matches a glob Jonathan Tan
@ 2021-12-02  6:57     ` Junio C Hamano
  2021-12-02 17:41       ` Jonathan Tan
  0 siblings, 1 reply; 87+ messages in thread
From: Junio C Hamano @ 2021-12-02  6:57 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, chooglen, avarab

Jonathan Tan <jonathantanmy@google.com> writes:

> +static int add_remote_url(const char *var, const char *value, void *data)
> +{
> +...
> +}
> +
> +static void populate_remote_urls(struct config_include_data *inc)
> +{
> +...
> +}
> +
> +static int forbid_remote_url(const char *var, const char *value, void *data)
> +{
> +...
> +}
> +
> +static int at_least_one_url_matches_glob(const char *glob, int glob_len,
> +					 struct string_list *remote_urls)
> +{
> +...
> +}

All of the above makes sense; you prepare the remote URLs defined in
a string list, and have these helper functions that can determine if
the value given to hasremoteurl:* condition satisfies the condition.

>  static int git_config_include(const char *var, const char *value, void *data)
>  {
>  	struct config_include_data *inc = data;
>  	const char *cond, *key;
>  	size_t cond_len;
> -	int ret;
> +	int ret = 0;
>  
>  	/*
>  	 * Pass along all values, including "include" directives; this makes it
> @@ -335,9 +412,29 @@ static int git_config_include(const char *var, const char *value, void *data)
>  		ret = handle_path_include(value, inc);
>  
>  	if (!parse_config_key(var, "includeif", &cond, &cond_len, &key) &&
> -	    (cond && include_condition_is_true(inc->opts, cond, cond_len)) &&
> -	    !strcmp(key, "path"))
> -		ret = handle_path_include(value, inc);
> +	    cond && !strcmp(key, "path")) {
> +		const char *url;
> +		size_t url_len;
> +
> +		if (skip_prefix_mem(cond, cond_len, "hasremoteurl:", &url,
> +				    &url_len)) {
> +			if (inc->opts->unconditional_remote_url) {
> +				config_fn_t old_fn = inc->fn;
> +
> +				inc->fn = forbid_remote_url;
> +				ret = handle_path_include(value, inc);
> +				inc->fn = old_fn;
> +			} else {
> +				if (!inc->remote_urls)
> +					populate_remote_urls(inc);
> +				if (at_least_one_url_matches_glob(
> +						url, url_len, inc->remote_urls))
> +					ret = handle_path_include(value, inc);
> +			}
> +		} else if (include_condition_is_true(inc->opts, cond, cond_len)) {
> +			ret = handle_path_include(value, inc);
> +		}

This looks iffy, especialy in a patch that is not marked as [RFC].

I can see that include_condition_is_true() only passes inc->opts and
you need some other parts of inc for your purpose, and it may be the
primary reason why you munge this caller instead of adding function
include_by_remoteurl() and making include_condition_is_true() call
it.  But wouldn't it be sufficient to pass inc (not inc->opts) to
include_condition_is_true(), and have it dereference inc->opts when
calling include_by_gitdir() and friends that want opts, while
passing inc to include_by_remoteurl()?

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v4 2/2] config: include file if remote URL matches a glob
  2021-12-02  6:57     ` Junio C Hamano
@ 2021-12-02 17:41       ` Jonathan Tan
  0 siblings, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-12-02 17:41 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, chooglen, avarab

Junio C Hamano <gitster@pobox.com> writes:
> > @@ -335,9 +412,29 @@ static int git_config_include(const char *var, const char *value, void *data)
> >  		ret = handle_path_include(value, inc);
> >  
> >  	if (!parse_config_key(var, "includeif", &cond, &cond_len, &key) &&
> > -	    (cond && include_condition_is_true(inc->opts, cond, cond_len)) &&
> > -	    !strcmp(key, "path"))
> > -		ret = handle_path_include(value, inc);
> > +	    cond && !strcmp(key, "path")) {
> > +		const char *url;
> > +		size_t url_len;
> > +
> > +		if (skip_prefix_mem(cond, cond_len, "hasremoteurl:", &url,
> > +				    &url_len)) {
> > +			if (inc->opts->unconditional_remote_url) {
> > +				config_fn_t old_fn = inc->fn;
> > +
> > +				inc->fn = forbid_remote_url;
> > +				ret = handle_path_include(value, inc);
> > +				inc->fn = old_fn;
> > +			} else {
> > +				if (!inc->remote_urls)
> > +					populate_remote_urls(inc);
> > +				if (at_least_one_url_matches_glob(
> > +						url, url_len, inc->remote_urls))
> > +					ret = handle_path_include(value, inc);
> > +			}
> > +		} else if (include_condition_is_true(inc->opts, cond, cond_len)) {
> > +			ret = handle_path_include(value, inc);
> > +		}
> 
> This looks iffy, especialy in a patch that is not marked as [RFC].
> 
> I can see that include_condition_is_true() only passes inc->opts and
> you need some other parts of inc for your purpose, and it may be the
> primary reason why you munge this caller instead of adding function
> include_by_remoteurl() and making include_condition_is_true() call
> it.  But wouldn't it be sufficient to pass inc (not inc->opts) to
> include_condition_is_true(), and have it dereference inc->opts when
> calling include_by_gitdir() and friends that want opts, while
> passing inc to include_by_remoteurl()?

Thanks for taking a look. I think the primary reason why I wrote it like
this was because originally (in v1) I included at the end, not inline,
but of course that is no longer the case so I should revisit it. I'll
have to pay attention to how inc->fn is swapped for forbid_remote_url
(to prevent remote URLs from being configured in included files), but
perhaps handle_path_include() can do that. I'll take a look.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 2/2] config: include file if remote URL matches a glob
  2021-12-01 18:51         ` Junio C Hamano
@ 2021-12-02 23:14           ` Jonathan Tan
  0 siblings, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-12-02 23:14 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, emilyshaffer, peff, avarab

Junio C Hamano <gitster@pobox.com> writes:
> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> >> variables we read without the includeIf directive", with variations
> >> of "condition" including
> >> 
> >>  - a literal X is among the values of multi-valued variable Y.
> >>  - a pattern X matches one of the values of multi-valued variable Y.
> >>  - a literal Y is the name of an existing configuration variable.
> >>  - a pattern Y matches the name of an existing configuration variable.
> > ...
> > code), in this case, unless there is another use case for this, I think
> > we should proceed with the use case that we know about first
> > (conditional include of a file supplied by a remote repo administrator).
> 
> Doing it that way without thinking flexibility through will paint us
> into a corner, from which we cannot get out of, doesn't it?
> 
> People will start asking "Why should we even have
> 'hasremoteurl:$URL' variant in 'includeIf' conditions, when one of
> the 'variableExists:Y' and friends can express the same thing",
> somebody new who is not yet in this community today will propose
> deprecating hasremoteurl in favor of more generalized approach and
> we have to give a sad answer "no, we earlier made a mistake of
> starting with a special case variant for expediency's sake, without
> thinking the general cases through.  Because existing users depend
> on it, we have to support it til the end of time."
> 
> We have the same regret with "why do we need grep.extendedRegexp
> when grep.patternType suffices?"  I am reluctant to see us knowingly
> commit the same mistake here, unless there is a very good reason.

Hmm...that's true. I was thinking that there wouldn't be a way to
predict exactly what we'll need, but perhaps making the config variable
name be of the form 'includeIf."hasconfig:remote.*.url".path' might give
us enough flexibility for the future.. I'll take a look.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v5 0/2] Conditional config includes based on remote URL
  2021-10-12 22:57 [RFC PATCH 0/2] Conditional config includes based on remote URL Jonathan Tan
                   ` (7 preceding siblings ...)
  2021-11-29 20:23 ` [PATCH v4 0/2] Conditional config includes based on remote URL Jonathan Tan
@ 2021-12-02 23:31 ` Jonathan Tan
  2021-12-02 23:31   ` [PATCH v5 1/2] config: make git_config_include() static Jonathan Tan
                     ` (2 more replies)
  2021-12-07 23:23 ` [PATCH v6 " Jonathan Tan
                   ` (2 subsequent siblings)
  11 siblings, 3 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-12-02 23:31 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, chooglen, gitster

Thanks, Junio, for your comments. I think the code is more clearly laid                                  
out now.                                                                                                 
                                                                                                         
The main changes from v4 are that I've maintained the existing code
structure more, and changed the keyword used to something that hopefully
will be more forwards compatible. I've also updated the documentation to
explain the forwards compatibility idea.

Jonathan Tan (2):
  config: make git_config_include() static
  config: include file if remote URL matches a glob

 Documentation/config.txt |  16 +++++
 config.c                 | 134 +++++++++++++++++++++++++++++++++++----
 config.h                 |  46 ++++----------
 t/t1300-config.sh        | 118 ++++++++++++++++++++++++++++++++++
 4 files changed, 270 insertions(+), 44 deletions(-)

Range-diff against v4:
1:  b2dcae03ed = 1:  b2dcae03ed config: make git_config_include() static
2:  3b3af0da98 ! 2:  d3b8e00717 config: include file if remote URL matches a glob
    @@ Documentation/config.txt: all branches that begin with `foo/`. This is useful if
      organized hierarchically and you would like to apply a configuration to
      all the branches in that hierarchy.
      
    -+`hasremoteurl`::
    -+	The data that follows the keyword `hasremoteurl:` is taken to
    ++`hasconfig:remote.*.url:`::
    ++	The data that follows this keyword is taken to
     +	be a pattern with standard globbing wildcards and two
     +	additional ones, `**/` and `/**`, that can match multiple
     +	components. The first time this keyword is seen, the rest of
    @@ Documentation/config.txt: all branches that begin with `foo/`. This is useful if
     ++
     +Files included by this option (directly or indirectly) are not allowed
     +to contain remote URLs.
    +++
    ++This keyword is designed to be forwards compatible with a naming
    ++scheme that supports more variable-based include conditions, but
    ++currently Git only supports the exact keyword described above.
     +
      A few more notes on matching via `gitdir` and `gitdir/i`:
      
    @@ config.c: struct config_include_data {
      };
      #define CONFIG_INCLUDE_INIT { 0 }
      
    -@@ config.c: static int include_condition_is_true(const struct config_options *opts,
    - 	return 0;
    +@@ config.c: static int include_by_branch(const char *cond, size_t cond_len)
    + 	return ret;
      }
      
    +-static int include_condition_is_true(const struct config_options *opts,
     +static int add_remote_url(const char *var, const char *value, void *data)
     +{
     +	struct string_list *remote_urls = data;
    @@ config.c: static int include_condition_is_true(const struct config_options *opts
     +			      &key) &&
     +	    remote_name &&
     +	    !strcmp(key, "url"))
    -+		die(_("remote URLs cannot be configured in file directly or indirectly included by includeIf.hasremoteurl"));
    ++		die(_("remote URLs cannot be configured in file directly or indirectly included by includeIf.hasconfig:remote.*.url"));
     +	return 0;
     +}
     +
    @@ config.c: static int include_condition_is_true(const struct config_options *opts
     +	return found;
     +}
     +
    - static int git_config_include(const char *var, const char *value, void *data)
    ++static int include_condition_is_true(struct config_include_data *inc,
    + 				     const char *cond, size_t cond_len)
      {
    - 	struct config_include_data *inc = data;
    - 	const char *cond, *key;
    - 	size_t cond_len;
    --	int ret;
    -+	int ret = 0;
    ++	const struct config_options *opts = inc->opts;
    + 
    +-	if (skip_prefix_mem(cond, cond_len, "gitdir:", &cond, &cond_len))
    ++	if (skip_prefix_mem(cond, cond_len, "gitdir:", &cond, &cond_len)) {
    + 		return include_by_gitdir(opts, cond, cond_len, 0);
    +-	else if (skip_prefix_mem(cond, cond_len, "gitdir/i:", &cond, &cond_len))
    ++	} else if (skip_prefix_mem(cond, cond_len, "gitdir/i:", &cond, &cond_len)) {
    + 		return include_by_gitdir(opts, cond, cond_len, 1);
    +-	else if (skip_prefix_mem(cond, cond_len, "onbranch:", &cond, &cond_len))
    ++	} else if (skip_prefix_mem(cond, cond_len, "onbranch:", &cond, &cond_len)) {
    + 		return include_by_branch(cond, cond_len);
    ++	} else if (skip_prefix_mem(cond, cond_len, "hasconfig:remote.*.url:", &cond,
    ++				   &cond_len)) {
    ++		if (inc->opts->unconditional_remote_url)
    ++			return 1;
    ++		if (!inc->remote_urls)
    ++			populate_remote_urls(inc);
    ++		return at_least_one_url_matches_glob(cond, cond_len,
    ++						     inc->remote_urls);
    ++	}
      
    - 	/*
    - 	 * Pass along all values, including "include" directives; this makes it
    + 	/* unknown conditionals are always false */
    + 	return 0;
     @@ config.c: static int git_config_include(const char *var, const char *value, void *data)
      		ret = handle_path_include(value, inc);
      
      	if (!parse_config_key(var, "includeif", &cond, &cond_len, &key) &&
     -	    (cond && include_condition_is_true(inc->opts, cond, cond_len)) &&
     -	    !strcmp(key, "path"))
    --		ret = handle_path_include(value, inc);
    -+	    cond && !strcmp(key, "path")) {
    -+		const char *url;
    -+		size_t url_len;
    -+
    -+		if (skip_prefix_mem(cond, cond_len, "hasremoteurl:", &url,
    -+				    &url_len)) {
    -+			if (inc->opts->unconditional_remote_url) {
    -+				config_fn_t old_fn = inc->fn;
    -+
    -+				inc->fn = forbid_remote_url;
    -+				ret = handle_path_include(value, inc);
    -+				inc->fn = old_fn;
    -+			} else {
    -+				if (!inc->remote_urls)
    -+					populate_remote_urls(inc);
    -+				if (at_least_one_url_matches_glob(
    -+						url, url_len, inc->remote_urls))
    -+					ret = handle_path_include(value, inc);
    -+			}
    -+		} else if (include_condition_is_true(inc->opts, cond, cond_len)) {
    -+			ret = handle_path_include(value, inc);
    -+		}
    ++	    cond && include_condition_is_true(inc, cond, cond_len) &&
    ++	    !strcmp(key, "path")) {
    ++		config_fn_t old_fn = inc->fn;
    ++
    ++		if (inc->opts->unconditional_remote_url)
    ++			inc->fn = forbid_remote_url;
    + 		ret = handle_path_include(value, inc);
    ++		if (inc->opts->unconditional_remote_url)
    ++			inc->fn = old_fn;
     +	}
      
      	return ret;
    @@ config.h: struct config_options {
     +
     +	/*
     +	 * For internal use. Include all includeif.hasremoteurl paths without
    -+	 * checking if the repo has that remote URL.
    ++	 * checking if the repo has that remote URL, and when doing so, verify
    ++	 * that files included in this way do not configure any remote URLs
    ++	 * themselves.
     +	 */
     +	unsigned int unconditional_remote_url : 1;
     +
    @@ t/t1300-config.sh: test_expect_success '--get and --get-all with --fixed-value'
      	test_must_fail git config --file=config --get-regexp --fixed-value fixed+ non-existent
      '
      
    -+test_expect_success 'includeIf.hasremoteurl' '
    ++test_expect_success 'includeIf.hasconfig:remote.*.url' '
     +	git init hasremoteurlTest &&
     +	test_when_finished "rm -rf hasremoteurlTest" &&
     +
    @@ t/t1300-config.sh: test_expect_success '--get and --get-all with --fixed-value'
     +		that = that-is-not-included
     +	EOF
     +	cat >>hasremoteurlTest/.git/config <<-EOF &&
    -+	[includeIf "hasremoteurl:foo"]
    ++	[includeIf "hasconfig:remote.*.url:foo"]
     +		path = "$(pwd)/include-this"
    -+	[includeIf "hasremoteurl:bar"]
    ++	[includeIf "hasconfig:remote.*.url:bar"]
     +		path = "$(pwd)/dont-include-that"
     +	[remote "foo"]
     +		url = foo
    @@ t/t1300-config.sh: test_expect_success '--get and --get-all with --fixed-value'
     +	test_must_fail git -C hasremoteurlTest config --get user.that
     +'
     +
    -+test_expect_success 'includeIf.hasremoteurl respects last-config-wins' '
    ++test_expect_success 'includeIf.hasconfig:remote.*.url respects last-config-wins' '
     +	git init hasremoteurlTest &&
     +	test_when_finished "rm -rf hasremoteurlTest" &&
     +
    @@ t/t1300-config.sh: test_expect_success '--get and --get-all with --fixed-value'
     +	[user]
     +		one = main-config
     +		two = main-config
    -+	[includeIf "hasremoteurl:foo"]
    ++	[includeIf "hasconfig:remote.*.url:foo"]
     +		path = "$(pwd)/include-two-three"
     +	[user]
     +		three = main-config
    @@ t/t1300-config.sh: test_expect_success '--get and --get-all with --fixed-value'
     +	test_cmp expect-main-config actual
     +'
     +
    -+test_expect_success 'includeIf.hasremoteurl globs' '
    ++test_expect_success 'includeIf.hasconfig:remote.*.url globs' '
     +	git init hasremoteurlTest &&
     +	test_when_finished "rm -rf hasremoteurlTest" &&
     +
    @@ t/t1300-config.sh: test_expect_success '--get and --get-all with --fixed-value'
     +	cat >>hasremoteurlTest/.git/config <<-EOF &&
     +	[remote "foo"]
     +		url = https://foo/bar/baz
    -+	[includeIf "hasremoteurl:**/baz"]
    ++	[includeIf "hasconfig:remote.*.url:**/baz"]
     +		path = "$(pwd)/double-star-start"
    -+	[includeIf "hasremoteurl:**/nomatch"]
    ++	[includeIf "hasconfig:remote.*.url:**/nomatch"]
     +		path = "$(pwd)/no"
    -+	[includeIf "hasremoteurl:https:/**"]
    ++	[includeIf "hasconfig:remote.*.url:https:/**"]
     +		path = "$(pwd)/double-star-end"
    -+	[includeIf "hasremoteurl:nomatch:/**"]
    ++	[includeIf "hasconfig:remote.*.url:nomatch:/**"]
     +		path = "$(pwd)/no"
    -+	[includeIf "hasremoteurl:https:/**/baz"]
    ++	[includeIf "hasconfig:remote.*.url:https:/**/baz"]
     +		path = "$(pwd)/double-star-middle"
    -+	[includeIf "hasremoteurl:https:/**/nomatch"]
    ++	[includeIf "hasconfig:remote.*.url:https:/**/nomatch"]
     +		path = "$(pwd)/no"
    -+	[includeIf "hasremoteurl:https://*/bar/baz"]
    ++	[includeIf "hasconfig:remote.*.url:https://*/bar/baz"]
     +		path = "$(pwd)/single-star-middle"
    -+	[includeIf "hasremoteurl:https://*/baz"]
    ++	[includeIf "hasconfig:remote.*.url:https://*/baz"]
     +		path = "$(pwd)/no"
     +	EOF
     +
    @@ t/t1300-config.sh: test_expect_success '--get and --get-all with --fixed-value'
     +	test_must_fail git -C hasremoteurlTest config --get user.no
     +'
     +
    -+test_expect_success 'includeIf.hasremoteurl forbids remote url in such included files' '
    ++test_expect_success 'includeIf.hasconfig:remote.*.url forbids remote url in such included files' '
     +	git init hasremoteurlTest &&
     +	test_when_finished "rm -rf hasremoteurlTest" &&
     +
    @@ t/t1300-config.sh: test_expect_success '--get and --get-all with --fixed-value'
     +		url = bar
     +	EOF
     +	cat >>hasremoteurlTest/.git/config <<-EOF &&
    -+	[includeIf "hasremoteurl:foo"]
    ++	[includeIf "hasconfig:remote.*.url:foo"]
     +		path = "$(pwd)/include-with-url"
     +	EOF
     +
     +	# test with any Git command
     +	test_must_fail git -C hasremoteurlTest status 2>err &&
    -+	grep "fatal: remote URLs cannot be configured in file directly or indirectly included by includeIf.hasremoteurl" err
    ++	grep "fatal: remote URLs cannot be configured in file directly or indirectly included by includeIf.hasconfig:remote.*.url" err
     +'
     +
      test_done
-- 
2.34.1.400.ga245620fadb-goog


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v5 1/2] config: make git_config_include() static
  2021-12-02 23:31 ` [PATCH v5 " Jonathan Tan
@ 2021-12-02 23:31   ` Jonathan Tan
  2021-12-02 23:31   ` [PATCH v5 2/2] config: include file if remote URL matches a glob Jonathan Tan
  2021-12-06 18:57   ` [PATCH v5 0/2] Conditional config includes based on remote URL Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-12-02 23:31 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, chooglen, gitster

It is not used from outside the file in which it is declared.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 config.c | 12 +++++++++++-
 config.h | 37 ++++---------------------------------
 2 files changed, 15 insertions(+), 34 deletions(-)

diff --git a/config.c b/config.c
index 2dcbe901b6..94ad5ce913 100644
--- a/config.c
+++ b/config.c
@@ -120,6 +120,16 @@ static long config_buf_ftell(struct config_source *conf)
 	return conf->u.buf.pos;
 }
 
+struct config_include_data {
+	int depth;
+	config_fn_t fn;
+	void *data;
+	const struct config_options *opts;
+};
+#define CONFIG_INCLUDE_INIT { 0 }
+
+static int git_config_include(const char *var, const char *value, void *data);
+
 #define MAX_INCLUDE_DEPTH 10
 static const char include_depth_advice[] = N_(
 "exceeded maximum include depth (%d) while including\n"
@@ -306,7 +316,7 @@ static int include_condition_is_true(const struct config_options *opts,
 	return 0;
 }
 
-int git_config_include(const char *var, const char *value, void *data)
+static int git_config_include(const char *var, const char *value, void *data)
 {
 	struct config_include_data *inc = data;
 	const char *cond, *key;
diff --git a/config.h b/config.h
index f119de0130..48a5e472ca 100644
--- a/config.h
+++ b/config.h
@@ -126,6 +126,8 @@ int git_default_config(const char *, const char *, void *);
 /**
  * Read a specific file in git-config format.
  * This function takes the same callback and data parameters as `git_config`.
+ *
+ * Unlike git_config(), this function does not respect includes.
  */
 int git_config_from_file(config_fn_t fn, const char *, void *);
 
@@ -158,6 +160,8 @@ void read_very_early_config(config_fn_t cb, void *data);
  * will first feed the user-wide one to the callback, and then the
  * repo-specific one; by overwriting, the higher-priority repo-specific
  * value is left at the end).
+ *
+ * Unlike git_config_from_file(), this function respects includes.
  */
 void git_config(config_fn_t fn, void *);
 
@@ -338,39 +342,6 @@ const char *current_config_origin_type(void);
 const char *current_config_name(void);
 int current_config_line(void);
 
-/**
- * Include Directives
- * ------------------
- *
- * By default, the config parser does not respect include directives.
- * However, a caller can use the special `git_config_include` wrapper
- * callback to support them. To do so, you simply wrap your "real" callback
- * function and data pointer in a `struct config_include_data`, and pass
- * the wrapper to the regular config-reading functions. For example:
- *
- * -------------------------------------------
- * int read_file_with_include(const char *file, config_fn_t fn, void *data)
- * {
- * struct config_include_data inc = CONFIG_INCLUDE_INIT;
- * inc.fn = fn;
- * inc.data = data;
- * return git_config_from_file(git_config_include, file, &inc);
- * }
- * -------------------------------------------
- *
- * `git_config` respects includes automatically. The lower-level
- * `git_config_from_file` does not.
- *
- */
-struct config_include_data {
-	int depth;
-	config_fn_t fn;
-	void *data;
-	const struct config_options *opts;
-};
-#define CONFIG_INCLUDE_INIT { 0 }
-int git_config_include(const char *name, const char *value, void *data);
-
 /*
  * Match and parse a config key of the form:
  *
-- 
2.34.1.400.ga245620fadb-goog


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v5 2/2] config: include file if remote URL matches a glob
  2021-12-02 23:31 ` [PATCH v5 " Jonathan Tan
  2021-12-02 23:31   ` [PATCH v5 1/2] config: make git_config_include() static Jonathan Tan
@ 2021-12-02 23:31   ` Jonathan Tan
  2021-12-06 22:32     ` Glen Choo
  2021-12-06 18:57   ` [PATCH v5 0/2] Conditional config includes based on remote URL Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 87+ messages in thread
From: Jonathan Tan @ 2021-12-02 23:31 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, chooglen, gitster

This is a feature that supports config file inclusion conditional on
whether the repo has a remote with a URL that matches a glob.

Similar to my previous work on remote-suggested hooks [1], the main
motivation is to allow remote repo administrators to provide recommended
configs in a way that can be consumed more easily (e.g. through a
package installable by a package manager - it could, for example,
contain a file to be included conditionally and a post-install script
that adds the include directive to the system-wide config file).

In order to do this, Git reruns the config parsing mechanism upon
noticing the first URL-conditional include in order to find all remote
URLs, and these remote URLs are then used to determine if that first and
all subsequent includes are executed. Remote URLs are not allowed to be
configued in any URL-conditionally-included file.

[1] https://lore.kernel.org/git/cover.1623881977.git.jonathantanmy@google.com/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/config.txt |  16 +++++
 config.c                 | 122 +++++++++++++++++++++++++++++++++++----
 config.h                 |   9 +++
 t/t1300-config.sh        | 118 +++++++++++++++++++++++++++++++++++++
 4 files changed, 255 insertions(+), 10 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0c0e6b859f..e0e5ca558e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -159,6 +159,22 @@ all branches that begin with `foo/`. This is useful if your branches are
 organized hierarchically and you would like to apply a configuration to
 all the branches in that hierarchy.
 
+`hasconfig:remote.*.url:`::
+	The data that follows this keyword is taken to
+	be a pattern with standard globbing wildcards and two
+	additional ones, `**/` and `/**`, that can match multiple
+	components. The first time this keyword is seen, the rest of
+	the config files will be scanned for remote URLs (without
+	applying any values). If there exists at least one remote URL
+	that matches this pattern, the include condition is met.
++
+Files included by this option (directly or indirectly) are not allowed
+to contain remote URLs.
++
+This keyword is designed to be forwards compatible with a naming
+scheme that supports more variable-based include conditions, but
+currently Git only supports the exact keyword described above.
+
 A few more notes on matching via `gitdir` and `gitdir/i`:
 
  * Symlinks in `$GIT_DIR` are not resolved before matching.
diff --git a/config.c b/config.c
index 94ad5ce913..d2cf95add2 100644
--- a/config.c
+++ b/config.c
@@ -125,6 +125,12 @@ struct config_include_data {
 	config_fn_t fn;
 	void *data;
 	const struct config_options *opts;
+	struct git_config_source *config_source;
+
+	/*
+	 * All remote URLs discovered when reading all config files.
+	 */
+	struct string_list *remote_urls;
 };
 #define CONFIG_INCLUDE_INIT { 0 }
 
@@ -301,16 +307,97 @@ static int include_by_branch(const char *cond, size_t cond_len)
 	return ret;
 }
 
-static int include_condition_is_true(const struct config_options *opts,
+static int add_remote_url(const char *var, const char *value, void *data)
+{
+	struct string_list *remote_urls = data;
+	const char *remote_name;
+	size_t remote_name_len;
+	const char *key;
+
+	if (!parse_config_key(var, "remote", &remote_name, &remote_name_len,
+			      &key) &&
+	    remote_name &&
+	    !strcmp(key, "url"))
+		string_list_append(remote_urls, value);
+	return 0;
+}
+
+static void populate_remote_urls(struct config_include_data *inc)
+{
+	struct config_options opts;
+
+	struct config_source *store_cf = cf;
+	struct key_value_info *store_kvi = current_config_kvi;
+	enum config_scope store_scope = current_parsing_scope;
+
+	opts = *inc->opts;
+	opts.unconditional_remote_url = 1;
+
+	cf = NULL;
+	current_config_kvi = NULL;
+	current_parsing_scope = 0;
+
+	inc->remote_urls = xmalloc(sizeof(*inc->remote_urls));
+	string_list_init_dup(inc->remote_urls);
+	config_with_options(add_remote_url, inc->remote_urls, inc->config_source, &opts);
+
+	cf = store_cf;
+	current_config_kvi = store_kvi;
+	current_parsing_scope = store_scope;
+}
+
+static int forbid_remote_url(const char *var, const char *value, void *data)
+{
+	const char *remote_name;
+	size_t remote_name_len;
+	const char *key;
+
+	if (!parse_config_key(var, "remote", &remote_name, &remote_name_len,
+			      &key) &&
+	    remote_name &&
+	    !strcmp(key, "url"))
+		die(_("remote URLs cannot be configured in file directly or indirectly included by includeIf.hasconfig:remote.*.url"));
+	return 0;
+}
+
+static int at_least_one_url_matches_glob(const char *glob, int glob_len,
+					 struct string_list *remote_urls)
+{
+	struct strbuf pattern = STRBUF_INIT;
+	struct string_list_item *url_item;
+	int found = 0;
+
+	strbuf_add(&pattern, glob, glob_len);
+	for_each_string_list_item(url_item, remote_urls) {
+		if (!wildmatch(pattern.buf, url_item->string, WM_PATHNAME)) {
+			found = 1;
+			break;
+		}
+	}
+	strbuf_release(&pattern);
+	return found;
+}
+
+static int include_condition_is_true(struct config_include_data *inc,
 				     const char *cond, size_t cond_len)
 {
+	const struct config_options *opts = inc->opts;
 
-	if (skip_prefix_mem(cond, cond_len, "gitdir:", &cond, &cond_len))
+	if (skip_prefix_mem(cond, cond_len, "gitdir:", &cond, &cond_len)) {
 		return include_by_gitdir(opts, cond, cond_len, 0);
-	else if (skip_prefix_mem(cond, cond_len, "gitdir/i:", &cond, &cond_len))
+	} else if (skip_prefix_mem(cond, cond_len, "gitdir/i:", &cond, &cond_len)) {
 		return include_by_gitdir(opts, cond, cond_len, 1);
-	else if (skip_prefix_mem(cond, cond_len, "onbranch:", &cond, &cond_len))
+	} else if (skip_prefix_mem(cond, cond_len, "onbranch:", &cond, &cond_len)) {
 		return include_by_branch(cond, cond_len);
+	} else if (skip_prefix_mem(cond, cond_len, "hasconfig:remote.*.url:", &cond,
+				   &cond_len)) {
+		if (inc->opts->unconditional_remote_url)
+			return 1;
+		if (!inc->remote_urls)
+			populate_remote_urls(inc);
+		return at_least_one_url_matches_glob(cond, cond_len,
+						     inc->remote_urls);
+	}
 
 	/* unknown conditionals are always false */
 	return 0;
@@ -335,9 +422,16 @@ static int git_config_include(const char *var, const char *value, void *data)
 		ret = handle_path_include(value, inc);
 
 	if (!parse_config_key(var, "includeif", &cond, &cond_len, &key) &&
-	    (cond && include_condition_is_true(inc->opts, cond, cond_len)) &&
-	    !strcmp(key, "path"))
+	    cond && include_condition_is_true(inc, cond, cond_len) &&
+	    !strcmp(key, "path")) {
+		config_fn_t old_fn = inc->fn;
+
+		if (inc->opts->unconditional_remote_url)
+			inc->fn = forbid_remote_url;
 		ret = handle_path_include(value, inc);
+		if (inc->opts->unconditional_remote_url)
+			inc->fn = old_fn;
+	}
 
 	return ret;
 }
@@ -1933,11 +2027,13 @@ int config_with_options(config_fn_t fn, void *data,
 			const struct config_options *opts)
 {
 	struct config_include_data inc = CONFIG_INCLUDE_INIT;
+	int ret;
 
 	if (opts->respect_includes) {
 		inc.fn = fn;
 		inc.data = data;
 		inc.opts = opts;
+		inc.config_source = config_source;
 		fn = git_config_include;
 		data = &inc;
 	}
@@ -1950,17 +2046,23 @@ int config_with_options(config_fn_t fn, void *data,
 	 * regular lookup sequence.
 	 */
 	if (config_source && config_source->use_stdin) {
-		return git_config_from_stdin(fn, data);
+		ret = git_config_from_stdin(fn, data);
 	} else if (config_source && config_source->file) {
-		return git_config_from_file(fn, config_source->file, data);
+		ret = git_config_from_file(fn, config_source->file, data);
 	} else if (config_source && config_source->blob) {
 		struct repository *repo = config_source->repo ?
 			config_source->repo : the_repository;
-		return git_config_from_blob_ref(fn, repo, config_source->blob,
+		ret = git_config_from_blob_ref(fn, repo, config_source->blob,
 						data);
+	} else {
+		ret = do_git_config_sequence(opts, fn, data);
 	}
 
-	return do_git_config_sequence(opts, fn, data);
+	if (inc.remote_urls) {
+		string_list_clear(inc.remote_urls, 0);
+		FREE_AND_NULL(inc.remote_urls);
+	}
+	return ret;
 }
 
 static void configset_iter(struct config_set *cs, config_fn_t fn, void *data)
diff --git a/config.h b/config.h
index 48a5e472ca..ab0106d287 100644
--- a/config.h
+++ b/config.h
@@ -89,6 +89,15 @@ struct config_options {
 	unsigned int ignore_worktree : 1;
 	unsigned int ignore_cmdline : 1;
 	unsigned int system_gently : 1;
+
+	/*
+	 * For internal use. Include all includeif.hasremoteurl paths without
+	 * checking if the repo has that remote URL, and when doing so, verify
+	 * that files included in this way do not configure any remote URLs
+	 * themselves.
+	 */
+	unsigned int unconditional_remote_url : 1;
+
 	const char *commondir;
 	const char *git_dir;
 	config_parser_event_fn_t event_fn;
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 9ff46f3b04..0f7bae31b4 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -2387,4 +2387,122 @@ test_expect_success '--get and --get-all with --fixed-value' '
 	test_must_fail git config --file=config --get-regexp --fixed-value fixed+ non-existent
 '
 
+test_expect_success 'includeIf.hasconfig:remote.*.url' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	cat >"$(pwd)"/include-this <<-\EOF &&
+	[user]
+		this = this-is-included
+	EOF
+	cat >"$(pwd)"/dont-include-that <<-\EOF &&
+	[user]
+		that = that-is-not-included
+	EOF
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[includeIf "hasconfig:remote.*.url:foo"]
+		path = "$(pwd)/include-this"
+	[includeIf "hasconfig:remote.*.url:bar"]
+		path = "$(pwd)/dont-include-that"
+	[remote "foo"]
+		url = foo
+	EOF
+
+	echo this-is-included >expect-this &&
+	git -C hasremoteurlTest config --get user.this >actual-this &&
+	test_cmp expect-this actual-this &&
+
+	test_must_fail git -C hasremoteurlTest config --get user.that
+'
+
+test_expect_success 'includeIf.hasconfig:remote.*.url respects last-config-wins' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	cat >"$(pwd)"/include-two-three <<-\EOF &&
+	[user]
+		two = included-config
+		three = included-config
+	EOF
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[remote "foo"]
+		url = foo
+	[user]
+		one = main-config
+		two = main-config
+	[includeIf "hasconfig:remote.*.url:foo"]
+		path = "$(pwd)/include-two-three"
+	[user]
+		three = main-config
+	EOF
+
+	echo main-config >expect-main-config &&
+	echo included-config >expect-included-config &&
+
+	git -C hasremoteurlTest config --get user.one >actual &&
+	test_cmp expect-main-config actual &&
+
+	git -C hasremoteurlTest config --get user.two >actual &&
+	test_cmp expect-included-config actual &&
+
+	git -C hasremoteurlTest config --get user.three >actual &&
+	test_cmp expect-main-config actual
+'
+
+test_expect_success 'includeIf.hasconfig:remote.*.url globs' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	printf "[user]\ndss = yes\n" >"$(pwd)/double-star-start" &&
+	printf "[user]\ndse = yes\n" >"$(pwd)/double-star-end" &&
+	printf "[user]\ndsm = yes\n" >"$(pwd)/double-star-middle" &&
+	printf "[user]\nssm = yes\n" >"$(pwd)/single-star-middle" &&
+	printf "[user]\nno = no\n" >"$(pwd)/no" &&
+
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[remote "foo"]
+		url = https://foo/bar/baz
+	[includeIf "hasconfig:remote.*.url:**/baz"]
+		path = "$(pwd)/double-star-start"
+	[includeIf "hasconfig:remote.*.url:**/nomatch"]
+		path = "$(pwd)/no"
+	[includeIf "hasconfig:remote.*.url:https:/**"]
+		path = "$(pwd)/double-star-end"
+	[includeIf "hasconfig:remote.*.url:nomatch:/**"]
+		path = "$(pwd)/no"
+	[includeIf "hasconfig:remote.*.url:https:/**/baz"]
+		path = "$(pwd)/double-star-middle"
+	[includeIf "hasconfig:remote.*.url:https:/**/nomatch"]
+		path = "$(pwd)/no"
+	[includeIf "hasconfig:remote.*.url:https://*/bar/baz"]
+		path = "$(pwd)/single-star-middle"
+	[includeIf "hasconfig:remote.*.url:https://*/baz"]
+		path = "$(pwd)/no"
+	EOF
+
+	git -C hasremoteurlTest config --get user.dss &&
+	git -C hasremoteurlTest config --get user.dse &&
+	git -C hasremoteurlTest config --get user.dsm &&
+	git -C hasremoteurlTest config --get user.ssm &&
+	test_must_fail git -C hasremoteurlTest config --get user.no
+'
+
+test_expect_success 'includeIf.hasconfig:remote.*.url forbids remote url in such included files' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	cat >"$(pwd)"/include-with-url <<-\EOF &&
+	[remote "bar"]
+		url = bar
+	EOF
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[includeIf "hasconfig:remote.*.url:foo"]
+		path = "$(pwd)/include-with-url"
+	EOF
+
+	# test with any Git command
+	test_must_fail git -C hasremoteurlTest status 2>err &&
+	grep "fatal: remote URLs cannot be configured in file directly or indirectly included by includeIf.hasconfig:remote.*.url" err
+'
+
 test_done
-- 
2.34.1.400.ga245620fadb-goog


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH v5 0/2] Conditional config includes based on remote URL
  2021-12-02 23:31 ` [PATCH v5 " Jonathan Tan
  2021-12-02 23:31   ` [PATCH v5 1/2] config: make git_config_include() static Jonathan Tan
  2021-12-02 23:31   ` [PATCH v5 2/2] config: include file if remote URL matches a glob Jonathan Tan
@ 2021-12-06 18:57   ` Ævar Arnfjörð Bjarmason
  2021-12-07 17:46     ` Jonathan Tan
  2 siblings, 1 reply; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-12-06 18:57 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, chooglen, gitster


On Thu, Dec 02 2021, Jonathan Tan wrote:

> Thanks, Junio, for your comments. I think the code is more clearly laid                                  
> out now.                                                                                                 
>                                                                                                          
> The main changes from v4 are that I've maintained the existing code
> structure more, and changed the keyword used to something that hopefully
> will be more forwards compatible. I've also updated the documentation to
> explain the forwards compatibility idea.

I read through this and came up with the below as a proposed squash-in
just while reading through it. These may or may not help. Changes:

 * There was some needless "$(pwd)" in the tests
 * Inlining the "remote_urls" in the struct makes its management easier;
   and the free/NULL checks just check .nr now, and string_list_clear() can be
   unconditional.
 * Created a include_by_remote_url() function. Makes the overall diff smaller
   since you don't need to add braces to everything in include_condition_is_true()

Other comments (not related to the below):

 * It would be nice if e.g. the "includeIf.hasconfig:remote.*.url globs" test
   were split up by condition, but maybe that's a hassle (would need a small helper).

   Just something that would have helped while hacking on this, i.e. now most of it
   was an all-or-nothing failure & peek at the trace output

 * Your last test appears to entirely forbid recursion. I.e. we die if you include config
   which in turn tries to use this include mechanism, right?

   That's probably wise, and it is explicitly documented.

   But as far as the documentation about this being a forward-compatible facility, do we
   think that this limitation would apply to any future config key? I.e. if I include based
   on "user.email" nothing in that to-be-included can set user.email?

   That's probably OK, just wondering. In any case it can always be expanded later on.

diff --git a/config.c b/config.c
index 39ac38e0e78..91b0a328e59 100644
--- a/config.c
+++ b/config.c
@@ -130,9 +130,11 @@ struct config_include_data {
 	/*
 	 * All remote URLs discovered when reading all config files.
 	 */
-	struct string_list *remote_urls;
+	struct string_list remote_urls;
 };
-#define CONFIG_INCLUDE_INIT { 0 }
+#define CONFIG_INCLUDE_INIT { \
+	.remote_urls = STRING_LIST_INIT_DUP, \
+}
 
 static int git_config_include(const char *var, const char *value, void *data);
 
@@ -340,9 +342,7 @@ static void populate_remote_urls(struct config_include_data *inc)
 	current_config_kvi = NULL;
 	current_parsing_scope = 0;
 
-	inc->remote_urls = xmalloc(sizeof(*inc->remote_urls));
-	string_list_init_dup(inc->remote_urls);
-	config_with_options(add_remote_url, inc->remote_urls, inc->config_source, &opts);
+	config_with_options(add_remote_url, &inc->remote_urls, inc->config_source, &opts);
 
 	cf = store_cf;
 	current_config_kvi = store_kvi;
@@ -381,26 +381,31 @@ static int at_least_one_url_matches_glob(const char *glob, int glob_len,
 	return found;
 }
 
+static int include_by_remote_url(struct config_include_data *inc,
+				 const char *cond, size_t cond_len)
+{
+	if (inc->opts->unconditional_remote_url)
+		return 1;
+	if (!inc->remote_urls.nr)
+		populate_remote_urls(inc);
+	return at_least_one_url_matches_glob(cond, cond_len,
+					     &inc->remote_urls);
+}
+
 static int include_condition_is_true(struct config_include_data *inc,
 				     const char *cond, size_t cond_len)
 {
 	const struct config_options *opts = inc->opts;
 
-	if (skip_prefix_mem(cond, cond_len, "gitdir:", &cond, &cond_len)) {
+	if (skip_prefix_mem(cond, cond_len, "gitdir:", &cond, &cond_len))
 		return include_by_gitdir(opts, cond, cond_len, 0);
-	} else if (skip_prefix_mem(cond, cond_len, "gitdir/i:", &cond, &cond_len)) {
+	else if (skip_prefix_mem(cond, cond_len, "gitdir/i:", &cond, &cond_len))
 		return include_by_gitdir(opts, cond, cond_len, 1);
-	} else if (skip_prefix_mem(cond, cond_len, "onbranch:", &cond, &cond_len)) {
+	else if (skip_prefix_mem(cond, cond_len, "onbranch:", &cond, &cond_len))
 		return include_by_branch(cond, cond_len);
-	} else if (skip_prefix_mem(cond, cond_len, "hasconfig:remote.*.url:", &cond,
-				   &cond_len)) {
-		if (inc->opts->unconditional_remote_url)
-			return 1;
-		if (!inc->remote_urls)
-			populate_remote_urls(inc);
-		return at_least_one_url_matches_glob(cond, cond_len,
-						     inc->remote_urls);
-	}
+	else if (skip_prefix_mem(cond, cond_len, "hasconfig:remote.*.url:", &cond,
+				   &cond_len))
+		return include_by_remote_url(inc, cond, cond_len);
 
 	/* unknown conditionals are always false */
 	return 0;
@@ -2061,10 +2066,7 @@ int config_with_options(config_fn_t fn, void *data,
 		ret = do_git_config_sequence(opts, fn, data);
 	}
 
-	if (inc.remote_urls) {
-		string_list_clear(inc.remote_urls, 0);
-		FREE_AND_NULL(inc.remote_urls);
-	}
+	string_list_clear(&inc.remote_urls, 0);
 	return ret;
 }
 
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 0f7bae31b4b..8310562b842 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -2391,11 +2391,11 @@ test_expect_success 'includeIf.hasconfig:remote.*.url' '
 	git init hasremoteurlTest &&
 	test_when_finished "rm -rf hasremoteurlTest" &&
 
-	cat >"$(pwd)"/include-this <<-\EOF &&
+	cat >include-this <<-\EOF &&
 	[user]
 		this = this-is-included
 	EOF
-	cat >"$(pwd)"/dont-include-that <<-\EOF &&
+	cat >dont-include-that <<-\EOF &&
 	[user]
 		that = that-is-not-included
 	EOF
@@ -2419,7 +2419,7 @@ test_expect_success 'includeIf.hasconfig:remote.*.url respects last-config-wins'
 	git init hasremoteurlTest &&
 	test_when_finished "rm -rf hasremoteurlTest" &&
 
-	cat >"$(pwd)"/include-two-three <<-\EOF &&
+	cat >include-two-three <<-\EOF &&
 	[user]
 		two = included-config
 		three = included-config
@@ -2453,11 +2453,11 @@ test_expect_success 'includeIf.hasconfig:remote.*.url globs' '
 	git init hasremoteurlTest &&
 	test_when_finished "rm -rf hasremoteurlTest" &&
 
-	printf "[user]\ndss = yes\n" >"$(pwd)/double-star-start" &&
-	printf "[user]\ndse = yes\n" >"$(pwd)/double-star-end" &&
-	printf "[user]\ndsm = yes\n" >"$(pwd)/double-star-middle" &&
-	printf "[user]\nssm = yes\n" >"$(pwd)/single-star-middle" &&
-	printf "[user]\nno = no\n" >"$(pwd)/no" &&
+	printf "[user]\ndss = yes\n" >double-star-start &&
+	printf "[user]\ndse = yes\n" >double-star-end &&
+	printf "[user]\ndsm = yes\n" >double-star-middle &&
+	printf "[user]\nssm = yes\n" >single-star-middle &&
+	printf "[user]\nno = no\n" >no &&
 
 	cat >>hasremoteurlTest/.git/config <<-EOF &&
 	[remote "foo"]
@@ -2491,7 +2491,7 @@ test_expect_success 'includeIf.hasconfig:remote.*.url forbids remote url in such
 	git init hasremoteurlTest &&
 	test_when_finished "rm -rf hasremoteurlTest" &&
 
-	cat >"$(pwd)"/include-with-url <<-\EOF &&
+	cat >include-with-url <<-\EOF &&
 	[remote "bar"]
 		url = bar
 	EOF


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH v5 2/2] config: include file if remote URL matches a glob
  2021-12-02 23:31   ` [PATCH v5 2/2] config: include file if remote URL matches a glob Jonathan Tan
@ 2021-12-06 22:32     ` Glen Choo
  2021-12-07 17:53       ` Jonathan Tan
  0 siblings, 1 reply; 87+ messages in thread
From: Glen Choo @ 2021-12-06 22:32 UTC (permalink / raw)
  To: Jonathan Tan, git; +Cc: Jonathan Tan, gitster

Jonathan Tan <jonathantanmy@google.com> writes:

>  Documentation/config.txt |  16 +++++
>  config.c                 | 122 +++++++++++++++++++++++++++++++++++----
>  config.h                 |   9 +++
>  t/t1300-config.sh        | 118 +++++++++++++++++++++++++++++++++++++
>  4 files changed, 255 insertions(+), 10 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 0c0e6b859f..e0e5ca558e 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -159,6 +159,22 @@ all branches that begin with `foo/`. This is useful if your branches are
>  organized hierarchically and you would like to apply a configuration to
>  all the branches in that hierarchy.
>  
> +`hasconfig:remote.*.url:`::
> +	The data that follows this keyword is taken to
> +	be a pattern with standard globbing wildcards and two
> +	additional ones, `**/` and `/**`, that can match multiple
> +	components. The first time this keyword is seen, the rest of
> +	the config files will be scanned for remote URLs (without
> +	applying any values). If there exists at least one remote URL
> +	that matches this pattern, the include condition is met.
> ++
> +Files included by this option (directly or indirectly) are not allowed
> +to contain remote URLs.
> ++
> +This keyword is designed to be forwards compatible with a naming
> +scheme that supports more variable-based include conditions, but
> +currently Git only supports the exact keyword described above.
> +

A reader of this description doesn't have any reason to think that
`hasconfig:remote.*.url` wouldn't respect in-place semantics, so my
concern in [1] is addressed.

`hasconfig:foo.*.bar` seems reasonable from a forwards-compatibility
perspective. Ideally, it would be nice to see a generic implementation
that actually handles config values beyond `remote.*.url`, but unless we
take a closer look at all config values and the conditions we would like
to support, a generic implementation seems like a premature
optimization that won't age well.

So OK to having a forward-compatible name without a forward compatible
implementation.

> +static int include_condition_is_true(struct config_include_data *inc,
>  				     const char *cond, size_t cond_len)
>  {
> +	const struct config_options *opts = inc->opts;
>  
> -	if (skip_prefix_mem(cond, cond_len, "gitdir:", &cond, &cond_len))
> +	if (skip_prefix_mem(cond, cond_len, "gitdir:", &cond, &cond_len)) {
>  		return include_by_gitdir(opts, cond, cond_len, 0);
> -	else if (skip_prefix_mem(cond, cond_len, "gitdir/i:", &cond, &cond_len))
> +	} else if (skip_prefix_mem(cond, cond_len, "gitdir/i:", &cond, &cond_len)) {
>  		return include_by_gitdir(opts, cond, cond_len, 1);
> -	else if (skip_prefix_mem(cond, cond_len, "onbranch:", &cond, &cond_len))
> +	} else if (skip_prefix_mem(cond, cond_len, "onbranch:", &cond, &cond_len)) {
>  		return include_by_branch(cond, cond_len);
> +	} else if (skip_prefix_mem(cond, cond_len, "hasconfig:remote.*.url:", &cond,
> +				   &cond_len)) {
> +		if (inc->opts->unconditional_remote_url)
> +			return 1;
> +		if (!inc->remote_urls)
> +			populate_remote_urls(inc);
> +		return at_least_one_url_matches_glob(cond, cond_len,
> +						     inc->remote_urls);
> +	}
>  
>  	/* unknown conditionals are always false */
>  	return 0;

Nit: I have a preference for Ævar's version [2], which looks more
consistent with the rest of the function i.e. handling the match using a
helper function.

> +test_expect_success 'includeIf.hasconfig:remote.*.url forbids remote url in such included files' '
> +	git init hasremoteurlTest &&
> +	test_when_finished "rm -rf hasremoteurlTest" &&
> +
> +	cat >"$(pwd)"/include-with-url <<-\EOF &&
> +	[remote "bar"]
> +		url = bar
> +	EOF
> +	cat >>hasremoteurlTest/.git/config <<-EOF &&
> +	[includeIf "hasconfig:remote.*.url:foo"]
> +		path = "$(pwd)/include-with-url"
> +	EOF
> +
> +	# test with any Git command
> +	test_must_fail git -C hasremoteurlTest status 2>err &&
> +	grep "fatal: remote URLs cannot be configured in file directly or indirectly included by includeIf.hasconfig:remote.*.url" err
> +'
> +
>  test_done
> -- 
> 2.34.1.400.ga245620fadb-goog

This addresses the test coverage comment in [1]. Great!

[1] https://lore.kernel.org/git/kl6lilwjre3m.fsf@chooglen-macbookpro.roam.corp.google.com
[2] https://lore.kernel.org/git/211206.86zgpdpmyy.gmgdl@evledraar.gmail.com

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v5 0/2] Conditional config includes based on remote URL
  2021-12-06 18:57   ` [PATCH v5 0/2] Conditional config includes based on remote URL Ævar Arnfjörð Bjarmason
@ 2021-12-07 17:46     ` Jonathan Tan
  2021-12-07 17:56       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 87+ messages in thread
From: Jonathan Tan @ 2021-12-07 17:46 UTC (permalink / raw)
  To: avarab; +Cc: jonathantanmy, git, chooglen, gitster

> I read through this and came up with the below as a proposed squash-in
> just while reading through it. These may or may not help. Changes:
> 
>  * There was some needless "$(pwd)" in the tests

Ah, thanks for catching that.

>  * Inlining the "remote_urls" in the struct makes its management easier;
>    and the free/NULL checks just check .nr now, and string_list_clear() can be
>    unconditional.

I don't think we can do this - nr might still be 0 after a scan if we
don't have remote URLs for some reason, so we still need to distinguish
between not-scanned and scanned-with-zero-URLs.

>  * Created a include_by_remote_url() function. Makes the overall diff smaller
>    since you don't need to add braces to everything in include_condition_is_true()

Ah, good idea. I'll do this.

> Other comments (not related to the below):
> 
>  * It would be nice if e.g. the "includeIf.hasconfig:remote.*.url globs" test
>    were split up by condition, but maybe that's a hassle (would need a small helper).
> 
>    Just something that would have helped while hacking on this, i.e. now most of it
>    was an all-or-nothing failure & peek at the trace output

What do you mean by condition? There seems to only be one condition
(whether the URL is there or not), unless you were thinking of smaller
subdivisions.

>  * Your last test appears to entirely forbid recursion. I.e. we die if you include config
>    which in turn tries to use this include mechanism, right?
> 
>    That's probably wise, and it is explicitly documented.
> 
>    But as far as the documentation about this being a forward-compatible facility, do we
>    think that this limitation would apply to any future config key? I.e. if I include based
>    on "user.email" nothing in that to-be-included can set user.email?
> 
>    That's probably OK, just wondering. In any case it can always be expanded later on.

We can decide later what the future facility will be, but I envision
that we will not allow included files to set config that can affect any
include directives in use. So, for example, if I have a user.email-based
include, none of my config-conditionally included files can set user.email.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v5 2/2] config: include file if remote URL matches a glob
  2021-12-06 22:32     ` Glen Choo
@ 2021-12-07 17:53       ` Jonathan Tan
  0 siblings, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-12-07 17:53 UTC (permalink / raw)
  To: chooglen; +Cc: jonathantanmy, git, gitster

Glen Choo <chooglen@google.com> writes:
> A reader of this description doesn't have any reason to think that
> `hasconfig:remote.*.url` wouldn't respect in-place semantics, so my
> concern in [1] is addressed.
> 
> `hasconfig:foo.*.bar` seems reasonable from a forwards-compatibility
> perspective. Ideally, it would be nice to see a generic implementation
> that actually handles config values beyond `remote.*.url`, but unless we
> take a closer look at all config values and the conditions we would like
> to support, a generic implementation seems like a premature
> optimization that won't age well.
> 
> So OK to having a forward-compatible name without a forward compatible
> implementation.

Thanks for taking a look at this.

> Nit: I have a preference for Ævar's version [2], which looks more
> consistent with the rest of the function i.e. handling the match using a
> helper function.

I agree - I'll use it.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v5 0/2] Conditional config includes based on remote URL
  2021-12-07 17:46     ` Jonathan Tan
@ 2021-12-07 17:56       ` Ævar Arnfjörð Bjarmason
  2021-12-07 18:52         ` Jonathan Tan
  0 siblings, 1 reply; 87+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-12-07 17:56 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, chooglen, gitster


On Tue, Dec 07 2021, Jonathan Tan wrote:

>> I read through this and came up with the below as a proposed squash-in
>> just while reading through it. These may or may not help. Changes:
>> 
>>  * There was some needless "$(pwd)" in the tests
>
> Ah, thanks for catching that.
>
>>  * Inlining the "remote_urls" in the struct makes its management easier;
>>    and the free/NULL checks just check .nr now, and string_list_clear() can be
>>    unconditional.
>
> I don't think we can do this - nr might still be 0 after a scan if we
> don't have remote URLs for some reason, so we still need to distinguish
> between not-scanned and scanned-with-zero-URLs.

You mean so that we don't double-free? The way string_list_clear()
protects against that, but maybe there's something else.

Whatever it is (if there's anything) it could use test coverage then :)

>>  * Created a include_by_remote_url() function. Makes the overall diff smaller
>>    since you don't need to add braces to everything in include_condition_is_true()
>
> Ah, good idea. I'll do this.
>
>> Other comments (not related to the below):
>> 
>>  * It would be nice if e.g. the "includeIf.hasconfig:remote.*.url globs" test
>>    were split up by condition, but maybe that's a hassle (would need a small helper).
>> 
>>    Just something that would have helped while hacking on this, i.e. now most of it
>>    was an all-or-nothing failure & peek at the trace output
>
> What do you mean by condition? There seems to only be one condition
> (whether the URL is there or not), unless you were thinking of smaller
> subdivisions.

Maybe I'm just misunderstanding the intent here, but aren't you trying
to guard against the case of having a ~/.gitconfig that includes
~/.gitconfig.d/for-this-url, and *that* file in turns changes the
remote's "url" in its config, followed by another "include if url
matches" condition therein?

I.e. I read (more like skimmed) the documentation & test at the end as
forbidding that, but maybe that's OK?

>>  * Your last test appears to entirely forbid recursion. I.e. we die if you include config
>>    which in turn tries to use this include mechanism, right?
>> 
>>    That's probably wise, and it is explicitly documented.
>> 
>>    But as far as the documentation about this being a forward-compatible facility, do we
>>    think that this limitation would apply to any future config key? I.e. if I include based
>>    on "user.email" nothing in that to-be-included can set user.email?
>> 
>>    That's probably OK, just wondering. In any case it can always be expanded later on.
>
> We can decide later what the future facility will be, but I envision
> that we will not allow included files to set config that can affect any
> include directives in use. So, for example, if I have a user.email-based
> include, none of my config-conditionally included files can set user.email.

I didn't look deeply at the implementation at all, but why would this be
a problem?

You parse ~/.gitconfig, it has user.name=foo, then right after in that
file we do:

    [includeIf "hasconfig:user.name:*foo*"]
    path = ~/.gitconfig.d/foo

Now the top of  ~/.gitconfig.d/foo we have:

    [user]
    name = bar
    [includeIf "hasconfig:user.name:*bar*"]
    path = ~/.gitconfig.d/bar

Why would it matter that we included on user.name=foo before?

Doesn't that only matter *while* we process that first "path" line? Once
we move past it we update our configset to user.name=bar once we hit the
"name" line of the included file.

Then when we get another "hasconfig:user.name" we just match it to our
current user.name=*bar*.

No?

Anyway, I think it's fine to punt on it for now or whatever, just
curious...

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v5 0/2] Conditional config includes based on remote URL
  2021-12-07 17:56       ` Ævar Arnfjörð Bjarmason
@ 2021-12-07 18:52         ` Jonathan Tan
  0 siblings, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-12-07 18:52 UTC (permalink / raw)
  To: avarab; +Cc: jonathantanmy, git, chooglen, gitster

> >>  * Inlining the "remote_urls" in the struct makes its management easier;
> >>    and the free/NULL checks just check .nr now, and string_list_clear() can be
> >>    unconditional.
> >
> > I don't think we can do this - nr might still be 0 after a scan if we
> > don't have remote URLs for some reason, so we still need to distinguish
> > between not-scanned and scanned-with-zero-URLs.
> 
> You mean so that we don't double-free? The way string_list_clear()
> protects against that, but maybe there's something else.
> 
> Whatever it is (if there's anything) it could use test coverage then :)

No - we only want to do one scan per config read. If we scan and there
are no remote URLs, with your scheme, next time we encounter another
includeIf.hasconfig, we would need to scan again (because nr is still
0). With my scheme, we can see that the pointer is non-NULL, so we know
that we have already scanned.

> >>  * It would be nice if e.g. the "includeIf.hasconfig:remote.*.url globs" test
> >>    were split up by condition, but maybe that's a hassle (would need a small helper).
> >> 
> >>    Just something that would have helped while hacking on this, i.e. now most of it
> >>    was an all-or-nothing failure & peek at the trace output
> >
> > What do you mean by condition? There seems to only be one condition
> > (whether the URL is there or not), unless you were thinking of smaller
> > subdivisions.
> 
> Maybe I'm just misunderstanding the intent here, but aren't you trying
> to guard against the case of having a ~/.gitconfig that includes
> ~/.gitconfig.d/for-this-url, and *that* file in turns changes the
> remote's "url" in its config, followed by another "include if url
> matches" condition therein?
> 
> I.e. I read (more like skimmed) the documentation & test at the end as
> forbidding that, but maybe that's OK?

If we're including "~/.gitconfig.d/for-this-url" by includeIf.hasconfig,
then yes, I'm guarding against that and other similar conditions.

> > We can decide later what the future facility will be, but I envision
> > that we will not allow included files to set config that can affect any
> > include directives in use. So, for example, if I have a user.email-based
> > include, none of my config-conditionally included files can set user.email.
> 
> I didn't look deeply at the implementation at all, but why would this be
> a problem?
> 
> You parse ~/.gitconfig, it has user.name=foo, then right after in that
> file we do:
> 
>     [includeIf "hasconfig:user.name:*foo*"]
>     path = ~/.gitconfig.d/foo
> 
> Now the top of  ~/.gitconfig.d/foo we have:
> 
>     [user]
>     name = bar
>     [includeIf "hasconfig:user.name:*bar*"]
>     path = ~/.gitconfig.d/bar
> 
> Why would it matter that we included on user.name=foo before?
> 
> Doesn't that only matter *while* we process that first "path" line? Once
> we move past it we update our configset to user.name=bar once we hit the
> "name" line of the included file.
> 
> Then when we get another "hasconfig:user.name" we just match it to our
> current user.name=*bar*.
> 
> No?
> 
> Anyway, I think it's fine to punt on it for now or whatever, just
> curious...

Well, we can't punt on it because what you describe also applies to
remote URL :-)

So what you're saying is that once we have decided to include a file, we
always include it in its entirety regardless of whether the condition
changes during the file's include. That's reasonable, but other people
could have differing opinions. In this case, I think it's fine just to
prohibit it entirely. In the future, we may look into relaxing this
condition.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v6 0/2] Conditional config includes based on remote URL
  2021-10-12 22:57 [RFC PATCH 0/2] Conditional config includes based on remote URL Jonathan Tan
                   ` (8 preceding siblings ...)
  2021-12-02 23:31 ` [PATCH v5 " Jonathan Tan
@ 2021-12-07 23:23 ` Jonathan Tan
  2021-12-07 23:23   ` [PATCH v6 1/2] config: make git_config_include() static Jonathan Tan
  2021-12-07 23:23   ` [PATCH v6 2/2] config: include file if remote URL matches a glob Jonathan Tan
  2021-12-14 21:31 ` [PATCH v7 0/2] Conditional config includes based on remote URL Jonathan Tan
  2022-01-18 17:47 ` [PATCH v8 " Jonathan Tan
  11 siblings, 2 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-12-07 23:23 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, chooglen, avarab

Here's a reroll addressing Ævar's comments about needless $(pwd) and
separating out include_by_remote_url() to make things more consistent.

Jonathan Tan (2):
  config: make git_config_include() static
  config: include file if remote URL matches a glob

 Documentation/config.txt |  16 +++++
 config.c                 | 133 ++++++++++++++++++++++++++++++++++++---
 config.h                 |  46 ++++----------
 t/t1300-config.sh        | 118 ++++++++++++++++++++++++++++++++++
 4 files changed, 272 insertions(+), 41 deletions(-)

Range-diff against v5:
1:  b2dcae03ed = 1:  b2dcae03ed config: make git_config_include() static
2:  d3b8e00717 ! 2:  de2be06818 config: include file if remote URL matches a glob
    @@ config.c: static int include_by_branch(const char *cond, size_t cond_len)
     +	return found;
     +}
     +
    ++static int include_by_remote_url(struct config_include_data *inc,
    ++		const char *cond, size_t cond_len)
    ++{
    ++	if (inc->opts->unconditional_remote_url)
    ++		return 1;
    ++	if (!inc->remote_urls)
    ++		populate_remote_urls(inc);
    ++	return at_least_one_url_matches_glob(cond, cond_len,
    ++					     inc->remote_urls);
    ++}
    ++
     +static int include_condition_is_true(struct config_include_data *inc,
      				     const char *cond, size_t cond_len)
      {
     +	const struct config_options *opts = inc->opts;
      
    --	if (skip_prefix_mem(cond, cond_len, "gitdir:", &cond, &cond_len))
    -+	if (skip_prefix_mem(cond, cond_len, "gitdir:", &cond, &cond_len)) {
    + 	if (skip_prefix_mem(cond, cond_len, "gitdir:", &cond, &cond_len))
      		return include_by_gitdir(opts, cond, cond_len, 0);
    --	else if (skip_prefix_mem(cond, cond_len, "gitdir/i:", &cond, &cond_len))
    -+	} else if (skip_prefix_mem(cond, cond_len, "gitdir/i:", &cond, &cond_len)) {
    +@@ config.c: static int include_condition_is_true(const struct config_options *opts,
      		return include_by_gitdir(opts, cond, cond_len, 1);
    --	else if (skip_prefix_mem(cond, cond_len, "onbranch:", &cond, &cond_len))
    -+	} else if (skip_prefix_mem(cond, cond_len, "onbranch:", &cond, &cond_len)) {
    + 	else if (skip_prefix_mem(cond, cond_len, "onbranch:", &cond, &cond_len))
      		return include_by_branch(cond, cond_len);
    -+	} else if (skip_prefix_mem(cond, cond_len, "hasconfig:remote.*.url:", &cond,
    -+				   &cond_len)) {
    -+		if (inc->opts->unconditional_remote_url)
    -+			return 1;
    -+		if (!inc->remote_urls)
    -+			populate_remote_urls(inc);
    -+		return at_least_one_url_matches_glob(cond, cond_len,
    -+						     inc->remote_urls);
    -+	}
    ++	else if (skip_prefix_mem(cond, cond_len, "hasconfig:remote.*.url:", &cond,
    ++				   &cond_len))
    ++		return include_by_remote_url(inc, cond, cond_len);
      
      	/* unknown conditionals are always false */
      	return 0;
    @@ t/t1300-config.sh: test_expect_success '--get and --get-all with --fixed-value'
     +	git init hasremoteurlTest &&
     +	test_when_finished "rm -rf hasremoteurlTest" &&
     +
    -+	cat >"$(pwd)"/include-this <<-\EOF &&
    ++	cat >include-this <<-\EOF &&
     +	[user]
     +		this = this-is-included
     +	EOF
    -+	cat >"$(pwd)"/dont-include-that <<-\EOF &&
    ++	cat >dont-include-that <<-\EOF &&
     +	[user]
     +		that = that-is-not-included
     +	EOF
    @@ t/t1300-config.sh: test_expect_success '--get and --get-all with --fixed-value'
     +	git init hasremoteurlTest &&
     +	test_when_finished "rm -rf hasremoteurlTest" &&
     +
    -+	cat >"$(pwd)"/include-two-three <<-\EOF &&
    ++	cat >include-two-three <<-\EOF &&
     +	[user]
     +		two = included-config
     +		three = included-config
    @@ t/t1300-config.sh: test_expect_success '--get and --get-all with --fixed-value'
     +	git init hasremoteurlTest &&
     +	test_when_finished "rm -rf hasremoteurlTest" &&
     +
    -+	printf "[user]\ndss = yes\n" >"$(pwd)/double-star-start" &&
    -+	printf "[user]\ndse = yes\n" >"$(pwd)/double-star-end" &&
    -+	printf "[user]\ndsm = yes\n" >"$(pwd)/double-star-middle" &&
    -+	printf "[user]\nssm = yes\n" >"$(pwd)/single-star-middle" &&
    -+	printf "[user]\nno = no\n" >"$(pwd)/no" &&
    ++	printf "[user]\ndss = yes\n" >double-star-start &&
    ++	printf "[user]\ndse = yes\n" >double-star-end &&
    ++	printf "[user]\ndsm = yes\n" >double-star-middle &&
    ++	printf "[user]\nssm = yes\n" >single-star-middle &&
    ++	printf "[user]\nno = no\n" >no &&
     +
     +	cat >>hasremoteurlTest/.git/config <<-EOF &&
     +	[remote "foo"]
    @@ t/t1300-config.sh: test_expect_success '--get and --get-all with --fixed-value'
     +	git init hasremoteurlTest &&
     +	test_when_finished "rm -rf hasremoteurlTest" &&
     +
    -+	cat >"$(pwd)"/include-with-url <<-\EOF &&
    ++	cat >include-with-url <<-\EOF &&
     +	[remote "bar"]
     +		url = bar
     +	EOF
-- 
2.34.1.400.ga245620fadb-goog


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v6 1/2] config: make git_config_include() static
  2021-12-07 23:23 ` [PATCH v6 " Jonathan Tan
@ 2021-12-07 23:23   ` Jonathan Tan
  2021-12-07 23:23   ` [PATCH v6 2/2] config: include file if remote URL matches a glob Jonathan Tan
  1 sibling, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-12-07 23:23 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, chooglen, avarab

It is not used from outside the file in which it is declared.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 config.c | 12 +++++++++++-
 config.h | 37 ++++---------------------------------
 2 files changed, 15 insertions(+), 34 deletions(-)

diff --git a/config.c b/config.c
index 2dcbe901b6..94ad5ce913 100644
--- a/config.c
+++ b/config.c
@@ -120,6 +120,16 @@ static long config_buf_ftell(struct config_source *conf)
 	return conf->u.buf.pos;
 }
 
+struct config_include_data {
+	int depth;
+	config_fn_t fn;
+	void *data;
+	const struct config_options *opts;
+};
+#define CONFIG_INCLUDE_INIT { 0 }
+
+static int git_config_include(const char *var, const char *value, void *data);
+
 #define MAX_INCLUDE_DEPTH 10
 static const char include_depth_advice[] = N_(
 "exceeded maximum include depth (%d) while including\n"
@@ -306,7 +316,7 @@ static int include_condition_is_true(const struct config_options *opts,
 	return 0;
 }
 
-int git_config_include(const char *var, const char *value, void *data)
+static int git_config_include(const char *var, const char *value, void *data)
 {
 	struct config_include_data *inc = data;
 	const char *cond, *key;
diff --git a/config.h b/config.h
index f119de0130..48a5e472ca 100644
--- a/config.h
+++ b/config.h
@@ -126,6 +126,8 @@ int git_default_config(const char *, const char *, void *);
 /**
  * Read a specific file in git-config format.
  * This function takes the same callback and data parameters as `git_config`.
+ *
+ * Unlike git_config(), this function does not respect includes.
  */
 int git_config_from_file(config_fn_t fn, const char *, void *);
 
@@ -158,6 +160,8 @@ void read_very_early_config(config_fn_t cb, void *data);
  * will first feed the user-wide one to the callback, and then the
  * repo-specific one; by overwriting, the higher-priority repo-specific
  * value is left at the end).
+ *
+ * Unlike git_config_from_file(), this function respects includes.
  */
 void git_config(config_fn_t fn, void *);
 
@@ -338,39 +342,6 @@ const char *current_config_origin_type(void);
 const char *current_config_name(void);
 int current_config_line(void);
 
-/**
- * Include Directives
- * ------------------
- *
- * By default, the config parser does not respect include directives.
- * However, a caller can use the special `git_config_include` wrapper
- * callback to support them. To do so, you simply wrap your "real" callback
- * function and data pointer in a `struct config_include_data`, and pass
- * the wrapper to the regular config-reading functions. For example:
- *
- * -------------------------------------------
- * int read_file_with_include(const char *file, config_fn_t fn, void *data)
- * {
- * struct config_include_data inc = CONFIG_INCLUDE_INIT;
- * inc.fn = fn;
- * inc.data = data;
- * return git_config_from_file(git_config_include, file, &inc);
- * }
- * -------------------------------------------
- *
- * `git_config` respects includes automatically. The lower-level
- * `git_config_from_file` does not.
- *
- */
-struct config_include_data {
-	int depth;
-	config_fn_t fn;
-	void *data;
-	const struct config_options *opts;
-};
-#define CONFIG_INCLUDE_INIT { 0 }
-int git_config_include(const char *name, const char *value, void *data);
-
 /*
  * Match and parse a config key of the form:
  *
-- 
2.34.1.400.ga245620fadb-goog


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v6 2/2] config: include file if remote URL matches a glob
  2021-12-07 23:23 ` [PATCH v6 " Jonathan Tan
  2021-12-07 23:23   ` [PATCH v6 1/2] config: make git_config_include() static Jonathan Tan
@ 2021-12-07 23:23   ` Jonathan Tan
  2021-12-08 19:19     ` Glen Choo
  2021-12-08 19:55     ` Glen Choo
  1 sibling, 2 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-12-07 23:23 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, chooglen, avarab

This is a feature that supports config file inclusion conditional on
whether the repo has a remote with a URL that matches a glob.

Similar to my previous work on remote-suggested hooks [1], the main
motivation is to allow remote repo administrators to provide recommended
configs in a way that can be consumed more easily (e.g. through a
package installable by a package manager - it could, for example,
contain a file to be included conditionally and a post-install script
that adds the include directive to the system-wide config file).

In order to do this, Git reruns the config parsing mechanism upon
noticing the first URL-conditional include in order to find all remote
URLs, and these remote URLs are then used to determine if that first and
all subsequent includes are executed. Remote URLs are not allowed to be
configued in any URL-conditionally-included file.

[1] https://lore.kernel.org/git/cover.1623881977.git.jonathantanmy@google.com/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/config.txt |  16 ++++++
 config.c                 | 121 ++++++++++++++++++++++++++++++++++++---
 config.h                 |   9 +++
 t/t1300-config.sh        | 118 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 257 insertions(+), 7 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0c0e6b859f..e0e5ca558e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -159,6 +159,22 @@ all branches that begin with `foo/`. This is useful if your branches are
 organized hierarchically and you would like to apply a configuration to
 all the branches in that hierarchy.
 
+`hasconfig:remote.*.url:`::
+	The data that follows this keyword is taken to
+	be a pattern with standard globbing wildcards and two
+	additional ones, `**/` and `/**`, that can match multiple
+	components. The first time this keyword is seen, the rest of
+	the config files will be scanned for remote URLs (without
+	applying any values). If there exists at least one remote URL
+	that matches this pattern, the include condition is met.
++
+Files included by this option (directly or indirectly) are not allowed
+to contain remote URLs.
++
+This keyword is designed to be forwards compatible with a naming
+scheme that supports more variable-based include conditions, but
+currently Git only supports the exact keyword described above.
+
 A few more notes on matching via `gitdir` and `gitdir/i`:
 
  * Symlinks in `$GIT_DIR` are not resolved before matching.
diff --git a/config.c b/config.c
index 94ad5ce913..f17053d91b 100644
--- a/config.c
+++ b/config.c
@@ -125,6 +125,12 @@ struct config_include_data {
 	config_fn_t fn;
 	void *data;
 	const struct config_options *opts;
+	struct git_config_source *config_source;
+
+	/*
+	 * All remote URLs discovered when reading all config files.
+	 */
+	struct string_list *remote_urls;
 };
 #define CONFIG_INCLUDE_INIT { 0 }
 
@@ -301,9 +307,92 @@ static int include_by_branch(const char *cond, size_t cond_len)
 	return ret;
 }
 
-static int include_condition_is_true(const struct config_options *opts,
+static int add_remote_url(const char *var, const char *value, void *data)
+{
+	struct string_list *remote_urls = data;
+	const char *remote_name;
+	size_t remote_name_len;
+	const char *key;
+
+	if (!parse_config_key(var, "remote", &remote_name, &remote_name_len,
+			      &key) &&
+	    remote_name &&
+	    !strcmp(key, "url"))
+		string_list_append(remote_urls, value);
+	return 0;
+}
+
+static void populate_remote_urls(struct config_include_data *inc)
+{
+	struct config_options opts;
+
+	struct config_source *store_cf = cf;
+	struct key_value_info *store_kvi = current_config_kvi;
+	enum config_scope store_scope = current_parsing_scope;
+
+	opts = *inc->opts;
+	opts.unconditional_remote_url = 1;
+
+	cf = NULL;
+	current_config_kvi = NULL;
+	current_parsing_scope = 0;
+
+	inc->remote_urls = xmalloc(sizeof(*inc->remote_urls));
+	string_list_init_dup(inc->remote_urls);
+	config_with_options(add_remote_url, inc->remote_urls, inc->config_source, &opts);
+
+	cf = store_cf;
+	current_config_kvi = store_kvi;
+	current_parsing_scope = store_scope;
+}
+
+static int forbid_remote_url(const char *var, const char *value, void *data)
+{
+	const char *remote_name;
+	size_t remote_name_len;
+	const char *key;
+
+	if (!parse_config_key(var, "remote", &remote_name, &remote_name_len,
+			      &key) &&
+	    remote_name &&
+	    !strcmp(key, "url"))
+		die(_("remote URLs cannot be configured in file directly or indirectly included by includeIf.hasconfig:remote.*.url"));
+	return 0;
+}
+
+static int at_least_one_url_matches_glob(const char *glob, int glob_len,
+					 struct string_list *remote_urls)
+{
+	struct strbuf pattern = STRBUF_INIT;
+	struct string_list_item *url_item;
+	int found = 0;
+
+	strbuf_add(&pattern, glob, glob_len);
+	for_each_string_list_item(url_item, remote_urls) {
+		if (!wildmatch(pattern.buf, url_item->string, WM_PATHNAME)) {
+			found = 1;
+			break;
+		}
+	}
+	strbuf_release(&pattern);
+	return found;
+}
+
+static int include_by_remote_url(struct config_include_data *inc,
+		const char *cond, size_t cond_len)
+{
+	if (inc->opts->unconditional_remote_url)
+		return 1;
+	if (!inc->remote_urls)
+		populate_remote_urls(inc);
+	return at_least_one_url_matches_glob(cond, cond_len,
+					     inc->remote_urls);
+}
+
+static int include_condition_is_true(struct config_include_data *inc,
 				     const char *cond, size_t cond_len)
 {
+	const struct config_options *opts = inc->opts;
 
 	if (skip_prefix_mem(cond, cond_len, "gitdir:", &cond, &cond_len))
 		return include_by_gitdir(opts, cond, cond_len, 0);
@@ -311,6 +400,9 @@ static int include_condition_is_true(const struct config_options *opts,
 		return include_by_gitdir(opts, cond, cond_len, 1);
 	else if (skip_prefix_mem(cond, cond_len, "onbranch:", &cond, &cond_len))
 		return include_by_branch(cond, cond_len);
+	else if (skip_prefix_mem(cond, cond_len, "hasconfig:remote.*.url:", &cond,
+				   &cond_len))
+		return include_by_remote_url(inc, cond, cond_len);
 
 	/* unknown conditionals are always false */
 	return 0;
@@ -335,9 +427,16 @@ static int git_config_include(const char *var, const char *value, void *data)
 		ret = handle_path_include(value, inc);
 
 	if (!parse_config_key(var, "includeif", &cond, &cond_len, &key) &&
-	    (cond && include_condition_is_true(inc->opts, cond, cond_len)) &&
-	    !strcmp(key, "path"))
+	    cond && include_condition_is_true(inc, cond, cond_len) &&
+	    !strcmp(key, "path")) {
+		config_fn_t old_fn = inc->fn;
+
+		if (inc->opts->unconditional_remote_url)
+			inc->fn = forbid_remote_url;
 		ret = handle_path_include(value, inc);
+		if (inc->opts->unconditional_remote_url)
+			inc->fn = old_fn;
+	}
 
 	return ret;
 }
@@ -1933,11 +2032,13 @@ int config_with_options(config_fn_t fn, void *data,
 			const struct config_options *opts)
 {
 	struct config_include_data inc = CONFIG_INCLUDE_INIT;
+	int ret;
 
 	if (opts->respect_includes) {
 		inc.fn = fn;
 		inc.data = data;
 		inc.opts = opts;
+		inc.config_source = config_source;
 		fn = git_config_include;
 		data = &inc;
 	}
@@ -1950,17 +2051,23 @@ int config_with_options(config_fn_t fn, void *data,
 	 * regular lookup sequence.
 	 */
 	if (config_source && config_source->use_stdin) {
-		return git_config_from_stdin(fn, data);
+		ret = git_config_from_stdin(fn, data);
 	} else if (config_source && config_source->file) {
-		return git_config_from_file(fn, config_source->file, data);
+		ret = git_config_from_file(fn, config_source->file, data);
 	} else if (config_source && config_source->blob) {
 		struct repository *repo = config_source->repo ?
 			config_source->repo : the_repository;
-		return git_config_from_blob_ref(fn, repo, config_source->blob,
+		ret = git_config_from_blob_ref(fn, repo, config_source->blob,
 						data);
+	} else {
+		ret = do_git_config_sequence(opts, fn, data);
 	}
 
-	return do_git_config_sequence(opts, fn, data);
+	if (inc.remote_urls) {
+		string_list_clear(inc.remote_urls, 0);
+		FREE_AND_NULL(inc.remote_urls);
+	}
+	return ret;
 }
 
 static void configset_iter(struct config_set *cs, config_fn_t fn, void *data)
diff --git a/config.h b/config.h
index 48a5e472ca..ab0106d287 100644
--- a/config.h
+++ b/config.h
@@ -89,6 +89,15 @@ struct config_options {
 	unsigned int ignore_worktree : 1;
 	unsigned int ignore_cmdline : 1;
 	unsigned int system_gently : 1;
+
+	/*
+	 * For internal use. Include all includeif.hasremoteurl paths without
+	 * checking if the repo has that remote URL, and when doing so, verify
+	 * that files included in this way do not configure any remote URLs
+	 * themselves.
+	 */
+	unsigned int unconditional_remote_url : 1;
+
 	const char *commondir;
 	const char *git_dir;
 	config_parser_event_fn_t event_fn;
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 9ff46f3b04..8310562b84 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -2387,4 +2387,122 @@ test_expect_success '--get and --get-all with --fixed-value' '
 	test_must_fail git config --file=config --get-regexp --fixed-value fixed+ non-existent
 '
 
+test_expect_success 'includeIf.hasconfig:remote.*.url' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	cat >include-this <<-\EOF &&
+	[user]
+		this = this-is-included
+	EOF
+	cat >dont-include-that <<-\EOF &&
+	[user]
+		that = that-is-not-included
+	EOF
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[includeIf "hasconfig:remote.*.url:foo"]
+		path = "$(pwd)/include-this"
+	[includeIf "hasconfig:remote.*.url:bar"]
+		path = "$(pwd)/dont-include-that"
+	[remote "foo"]
+		url = foo
+	EOF
+
+	echo this-is-included >expect-this &&
+	git -C hasremoteurlTest config --get user.this >actual-this &&
+	test_cmp expect-this actual-this &&
+
+	test_must_fail git -C hasremoteurlTest config --get user.that
+'
+
+test_expect_success 'includeIf.hasconfig:remote.*.url respects last-config-wins' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	cat >include-two-three <<-\EOF &&
+	[user]
+		two = included-config
+		three = included-config
+	EOF
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[remote "foo"]
+		url = foo
+	[user]
+		one = main-config
+		two = main-config
+	[includeIf "hasconfig:remote.*.url:foo"]
+		path = "$(pwd)/include-two-three"
+	[user]
+		three = main-config
+	EOF
+
+	echo main-config >expect-main-config &&
+	echo included-config >expect-included-config &&
+
+	git -C hasremoteurlTest config --get user.one >actual &&
+	test_cmp expect-main-config actual &&
+
+	git -C hasremoteurlTest config --get user.two >actual &&
+	test_cmp expect-included-config actual &&
+
+	git -C hasremoteurlTest config --get user.three >actual &&
+	test_cmp expect-main-config actual
+'
+
+test_expect_success 'includeIf.hasconfig:remote.*.url globs' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	printf "[user]\ndss = yes\n" >double-star-start &&
+	printf "[user]\ndse = yes\n" >double-star-end &&
+	printf "[user]\ndsm = yes\n" >double-star-middle &&
+	printf "[user]\nssm = yes\n" >single-star-middle &&
+	printf "[user]\nno = no\n" >no &&
+
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[remote "foo"]
+		url = https://foo/bar/baz
+	[includeIf "hasconfig:remote.*.url:**/baz"]
+		path = "$(pwd)/double-star-start"
+	[includeIf "hasconfig:remote.*.url:**/nomatch"]
+		path = "$(pwd)/no"
+	[includeIf "hasconfig:remote.*.url:https:/**"]
+		path = "$(pwd)/double-star-end"
+	[includeIf "hasconfig:remote.*.url:nomatch:/**"]
+		path = "$(pwd)/no"
+	[includeIf "hasconfig:remote.*.url:https:/**/baz"]
+		path = "$(pwd)/double-star-middle"
+	[includeIf "hasconfig:remote.*.url:https:/**/nomatch"]
+		path = "$(pwd)/no"
+	[includeIf "hasconfig:remote.*.url:https://*/bar/baz"]
+		path = "$(pwd)/single-star-middle"
+	[includeIf "hasconfig:remote.*.url:https://*/baz"]
+		path = "$(pwd)/no"
+	EOF
+
+	git -C hasremoteurlTest config --get user.dss &&
+	git -C hasremoteurlTest config --get user.dse &&
+	git -C hasremoteurlTest config --get user.dsm &&
+	git -C hasremoteurlTest config --get user.ssm &&
+	test_must_fail git -C hasremoteurlTest config --get user.no
+'
+
+test_expect_success 'includeIf.hasconfig:remote.*.url forbids remote url in such included files' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	cat >include-with-url <<-\EOF &&
+	[remote "bar"]
+		url = bar
+	EOF
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[includeIf "hasconfig:remote.*.url:foo"]
+		path = "$(pwd)/include-with-url"
+	EOF
+
+	# test with any Git command
+	test_must_fail git -C hasremoteurlTest status 2>err &&
+	grep "fatal: remote URLs cannot be configured in file directly or indirectly included by includeIf.hasconfig:remote.*.url" err
+'
+
 test_done
-- 
2.34.1.400.ga245620fadb-goog


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH v6 2/2] config: include file if remote URL matches a glob
  2021-12-07 23:23   ` [PATCH v6 2/2] config: include file if remote URL matches a glob Jonathan Tan
@ 2021-12-08 19:19     ` Glen Choo
  2021-12-09 22:16       ` Jonathan Tan
  2021-12-08 19:55     ` Glen Choo
  1 sibling, 1 reply; 87+ messages in thread
From: Glen Choo @ 2021-12-08 19:19 UTC (permalink / raw)
  To: Jonathan Tan, git; +Cc: Jonathan Tan, avarab

Jonathan Tan <jonathantanmy@google.com> writes:

> @@ -335,9 +427,16 @@ static int git_config_include(const char *var, const char *value, void *data)
>  		ret = handle_path_include(value, inc);
>  
>  	if (!parse_config_key(var, "includeif", &cond, &cond_len, &key) &&
> -	    (cond && include_condition_is_true(inc->opts, cond, cond_len)) &&
> -	    !strcmp(key, "path"))
> +	    cond && include_condition_is_true(inc, cond, cond_len) &&
> +	    !strcmp(key, "path")) {
> +		config_fn_t old_fn = inc->fn;
> +
> +		if (inc->opts->unconditional_remote_url)
> +			inc->fn = forbid_remote_url;
>  		ret = handle_path_include(value, inc);
> +		if (inc->opts->unconditional_remote_url)
> +			inc->fn = old_fn;
> +	}
>  
>  	return ret;
>  }

Minor nit: it looks like we don't need to restore inc->fn conditionally,
so instead of:

	if (inc->opts->unconditional_remote_url)
			inc->fn = old_fn;

we could just have:

  inc->fn = old_fn;

which (purely as a matter of personal taste) looks a bit more consistent
with the unconditional assignment of:

  config_fn_t old_fn = inc->fn;



No comments on the rest of the patch; it looks clean and
easy-to-understand :)

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v6 2/2] config: include file if remote URL matches a glob
  2021-12-07 23:23   ` [PATCH v6 2/2] config: include file if remote URL matches a glob Jonathan Tan
  2021-12-08 19:19     ` Glen Choo
@ 2021-12-08 19:55     ` Glen Choo
  2021-12-09 22:39       ` Jonathan Tan
  1 sibling, 1 reply; 87+ messages in thread
From: Glen Choo @ 2021-12-08 19:55 UTC (permalink / raw)
  To: Jonathan Tan, git; +Cc: Jonathan Tan, avarab

Jonathan Tan <jonathantanmy@google.com> writes:

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 0c0e6b859f..e0e5ca558e 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -159,6 +159,22 @@ all branches that begin with `foo/`. This is useful if your branches are
>  organized hierarchically and you would like to apply a configuration to
>  all the branches in that hierarchy.
>  
> +`hasconfig:remote.*.url:`::
> +	The data that follows this keyword is taken to
> +	be a pattern with standard globbing wildcards and two
> +	additional ones, `**/` and `/**`, that can match multiple
> +	components. The first time this keyword is seen, the rest of
> +	the config files will be scanned for remote URLs (without
> +	applying any values). If there exists at least one remote URL
> +	that matches this pattern, the include condition is met.
> ++
> +Files included by this option (directly or indirectly) are not allowed
> +to contain remote URLs.

Wondering out loud.. Reading this and Ævar's comment [1], I wonder if we
should make it clear to users *why* we choose to forbid remote URLs.

Since this series is setting a precedent for future "hasconfig:"
conditions (files included by "hasconfig:foo.*.bar" cannot contain any
"foo.*.bar" values), it would be useful to git developers to explain
*why* we chose to do this. And if we're documenting it for ourselves,
we might as well write it in the public docs. That way, users would know
that this is more of a guardrail (because it's simpler to understand
this way) than a hard limitation.

[1] https://lore.kernel.org/git/211207.86k0ggnvfo.gmgdl@evledraar.gmail.com

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v6 2/2] config: include file if remote URL matches a glob
  2021-12-08 19:19     ` Glen Choo
@ 2021-12-09 22:16       ` Jonathan Tan
  0 siblings, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-12-09 22:16 UTC (permalink / raw)
  To: chooglen; +Cc: jonathantanmy, git, avarab

Glen Choo <chooglen@google.com> writes:
> Minor nit: it looks like we don't need to restore inc->fn conditionally,
> so instead of:
> 
> 	if (inc->opts->unconditional_remote_url)
> 			inc->fn = old_fn;
> 
> we could just have:
> 
>   inc->fn = old_fn;
> 
> which (purely as a matter of personal taste) looks a bit more consistent
> with the unconditional assignment of:
> 
>   config_fn_t old_fn = inc->fn;
> 
> 
> 
> No comments on the rest of the patch; it looks clean and
> easy-to-understand :)

Thanks for taking a look. This is a good suggestion - I'll use it.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v6 2/2] config: include file if remote URL matches a glob
  2021-12-08 19:55     ` Glen Choo
@ 2021-12-09 22:39       ` Jonathan Tan
  2021-12-09 23:33         ` Glen Choo
  2021-12-10 21:45         ` Junio C Hamano
  0 siblings, 2 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-12-09 22:39 UTC (permalink / raw)
  To: chooglen; +Cc: jonathantanmy, git, avarab

Glen Choo <chooglen@google.com> writes:
> > +`hasconfig:remote.*.url:`::
> > +	The data that follows this keyword is taken to
> > +	be a pattern with standard globbing wildcards and two
> > +	additional ones, `**/` and `/**`, that can match multiple
> > +	components. The first time this keyword is seen, the rest of
> > +	the config files will be scanned for remote URLs (without
> > +	applying any values). If there exists at least one remote URL
> > +	that matches this pattern, the include condition is met.
> > ++
> > +Files included by this option (directly or indirectly) are not allowed
> > +to contain remote URLs.
> 
> Wondering out loud.. Reading this and Ævar's comment [1], I wonder if we
> should make it clear to users *why* we choose to forbid remote URLs.
> 
> Since this series is setting a precedent for future "hasconfig:"
> conditions (files included by "hasconfig:foo.*.bar" cannot contain any
> "foo.*.bar" values), it would be useful to git developers to explain
> *why* we chose to do this. And if we're documenting it for ourselves,
> we might as well write it in the public docs. That way, users would know
> that this is more of a guardrail (because it's simpler to understand
> this way) than a hard limitation.
> 
> [1] https://lore.kernel.org/git/211207.86k0ggnvfo.gmgdl@evledraar.gmail.com

The explanation is rather long, though. It goes something like this:

  If the main config is:

  [remote a]
    url = bar
  [includeif hasconfig:remote.*.url:foo]
    path = foo
  [includeif hasconfig:remote.*.url:bar]
    path = bar

  and "bar" contains:

  [remote b]
    url = foo

  Should "foo" be included? For now, we avoid these situations
  completely by prohibiting URLs from being configured in "includeif
  hasconfig".

If you can think of a concise explanation, maybe we can include it.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v6 2/2] config: include file if remote URL matches a glob
  2021-12-09 22:39       ` Jonathan Tan
@ 2021-12-09 23:33         ` Glen Choo
  2021-12-13 23:35           ` Jonathan Tan
  2021-12-10 21:45         ` Junio C Hamano
  1 sibling, 1 reply; 87+ messages in thread
From: Glen Choo @ 2021-12-09 23:33 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: jonathantanmy, git, avarab

Jonathan Tan <jonathantanmy@google.com> writes:

>> > +`hasconfig:remote.*.url:`::
>> > +	The data that follows this keyword is taken to
>> > +	be a pattern with standard globbing wildcards and two
>> > +	additional ones, `**/` and `/**`, that can match multiple
>> > +	components. The first time this keyword is seen, the rest of
>> > +	the config files will be scanned for remote URLs (without
>> > +	applying any values). If there exists at least one remote URL
>> > +	that matches this pattern, the include condition is met.
>> > ++
>> > +Files included by this option (directly or indirectly) are not allowed
>> > +to contain remote URLs.
>> 
>> Wondering out loud.. Reading this and Ævar's comment [1], I wonder if we
>> should make it clear to users *why* we choose to forbid remote URLs.
>> 
>> Since this series is setting a precedent for future "hasconfig:"
>> conditions (files included by "hasconfig:foo.*.bar" cannot contain any
>> "foo.*.bar" values), it would be useful to git developers to explain
>> *why* we chose to do this. And if we're documenting it for ourselves,
>> we might as well write it in the public docs. That way, users would know
>> that this is more of a guardrail (because it's simpler to understand
>> this way) than a hard limitation.
>> 
>> [1] https://lore.kernel.org/git/211207.86k0ggnvfo.gmgdl@evledraar.gmail.com
>
> The explanation is rather long, though. It goes something like this:
>
>   If the main config is:
>
>   [remote a]
>     url = bar
>   [includeif hasconfig:remote.*.url:foo]
>     path = foo
>   [includeif hasconfig:remote.*.url:bar]
>     path = bar
>
>   and "bar" contains:
>
>   [remote b]
>     url = foo
>
>   Should "foo" be included? For now, we avoid these situations
>   completely by prohibiting URLs from being configured in "includeif
>   hasconfig".
>
> If you can think of a concise explanation, maybe we can include it.

Yeah, I can't think of a concise-yet-clear way to convey this to users
(if I had thought of one, I wouldn't have prefaced my original comment
with "Wondering out loud").

Spitballing here...

  `hasconfig:remote.*.url:`::
    The data that follows this keyword is taken to
    be a pattern with standard globbing wildcards and two
    additional ones, `**/` and `/**`, that can match multiple
    components. The first time this keyword is seen, the rest of
    the config files will be scanned for remote URLs (without
    applying any values). If there exists at least one remote URL
    that matches this pattern, the include condition is met.

  - Files included by this option (directly or indirectly) are not allowed
  - to contain remote URLs.
  + Because new remote URLs might affect the correctness of the include
  + condition, files included by this option (directly or indirectly) are
  + not allowed to contain remote URLs.

Although, upon further reflection, I wonder if this approach of banning
config variables really gives us the safety we want after all. Reworking
your example, say we expand "hasconfig" to include
"hasconfig:branch.*.merge" then we can have this in the main config:

   [remote a]
     url = baz
   [branch c]
     merge = bar

   [includeif hasconfig:remote.*.url:foo]
     path = foo
   [includeif hasconfig:branch.*.merge:bar]
     path = bar

and "bar" contains:

   [remote b]
     url = foo

we end up with the exact same question of "Should "foo" be included?".
This shows that the rule isn't actually "files included by
hasconfig:remote.*.url cannot include remote.*.url", but the much more
restrictive "files included by hasconfig:<anything> cannot include any
config values that can appear in hasconfig". This sounds pretty unusable
to me..

But I think that with the semantics you've defined, we don't really need
to forbid config variables. This section describes:

  The first time this keyword is seen, the rest of the config files will
  be scanned for remote URLs (without applying any values). If there
  exists at least one remote URL that matches this pattern, the include
  condition is met.

which, to me, gives us a pass to say "the first time we see a hasconfig,
we will do an additional scan without applying values". That doesn't
sound _too_ confusing to me, but I don't know how it looks to someone
with fresh eyes.

Forgive me if this exact suggestion came up before on-list (I know we've
discussed this exact approach off-list).

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v6 2/2] config: include file if remote URL matches a glob
  2021-12-09 22:39       ` Jonathan Tan
  2021-12-09 23:33         ` Glen Choo
@ 2021-12-10 21:45         ` Junio C Hamano
  2021-12-13 23:37           ` Jonathan Tan
  1 sibling, 1 reply; 87+ messages in thread
From: Junio C Hamano @ 2021-12-10 21:45 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: chooglen, git, avarab

Jonathan Tan <jonathantanmy@google.com> writes:

> The explanation is rather long, though. It goes something like this:
>
>   If the main config is:
>
>   [remote a]
>     url = bar
>   [includeif hasconfig:remote.*.url:foo]
>     path = foo
>   [includeif hasconfig:remote.*.url:bar]
>     path = bar
>
>   and "bar" contains:
>
>   [remote b]
>     url = foo
>
>   Should "foo" be included? For now, we avoid these situations
>   completely by prohibiting URLs from being configured in "includeif
>   hasconfig".
>
> If you can think of a concise explanation, maybe we can include it.

Perhaps it is easier to approach it from the viewpoint of a new
user who is unfamiliar with what you designed.

I would imagine that most users would find it natural if a single
pass precedure read and processed lines as it sees them.

That is, when the first includeif is evaluated, we have seen only
'remote.a.url' whose value is 'bar', so the condition does not hold.
and then when the second includeif is evaluated, it gets included,
and we read 'bar'.  But that is wher configuration reading ends;
remote.b.url is not asked for after we process the second includeif
til the end.

If you explain

 (1) why such a simplest design would not work well; and

 (2) how the actual design is different from that simplest design to
     overcome it.

it would be easier to grok?

Thanks.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v6 2/2] config: include file if remote URL matches a glob
  2021-12-09 23:33         ` Glen Choo
@ 2021-12-13 23:35           ` Jonathan Tan
  0 siblings, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-12-13 23:35 UTC (permalink / raw)
  To: chooglen; +Cc: jonathantanmy, git, avarab

Glen Choo <chooglen@google.com> writes:
> Yeah, I can't think of a concise-yet-clear way to convey this to users
> (if I had thought of one, I wouldn't have prefaced my original comment
> with "Wondering out loud").
> 
> Spitballing here...
> 
>   `hasconfig:remote.*.url:`::
>     The data that follows this keyword is taken to
>     be a pattern with standard globbing wildcards and two
>     additional ones, `**/` and `/**`, that can match multiple
>     components. The first time this keyword is seen, the rest of
>     the config files will be scanned for remote URLs (without
>     applying any values). If there exists at least one remote URL
>     that matches this pattern, the include condition is met.
> 
>   - Files included by this option (directly or indirectly) are not allowed
>   - to contain remote URLs.
>   + Because new remote URLs might affect the correctness of the include
>   + condition, files included by this option (directly or indirectly) are
>   + not allowed to contain remote URLs.

Junio suggested another approach [1] - I'll try that and see what I come
up with.

[1] https://lore.kernel.org/git/xmqqmtl8m8wj.fsf@gitster.g/

> Although, upon further reflection, I wonder if this approach of banning
> config variables really gives us the safety we want after all. Reworking
> your example, say we expand "hasconfig" to include
> "hasconfig:branch.*.merge" then we can have this in the main config:
> 
>    [remote a]
>      url = baz
>    [branch c]
>      merge = bar
> 
>    [includeif hasconfig:remote.*.url:foo]
>      path = foo
>    [includeif hasconfig:branch.*.merge:bar]
>      path = bar
> 
> and "bar" contains:
> 
>    [remote b]
>      url = foo
> 
> we end up with the exact same question of "Should "foo" be included?".
> This shows that the rule isn't actually "files included by
> hasconfig:remote.*.url cannot include remote.*.url", but the much more
> restrictive "files included by hasconfig:<anything> cannot include any
> config values that can appear in hasconfig". This sounds pretty unusable
> to me..

This was my original idea actually (using any config variable anywhere
bans you from that config variable in all "includeif hasconfig"). I
think it would still be usable - you just have to be careful in which
config variables you use. But we don't have plans to include other
variables now anyway.

> But I think that with the semantics you've defined, we don't really need
> to forbid config variables. This section describes:
> 
>   The first time this keyword is seen, the rest of the config files will
>   be scanned for remote URLs (without applying any values). If there
>   exists at least one remote URL that matches this pattern, the include
>   condition is met.
> 
> which, to me, gives us a pass to say "the first time we see a hasconfig,
> we will do an additional scan without applying values". That doesn't
> sound _too_ confusing to me, but I don't know how it looks to someone
> with fresh eyes.
> 
> Forgive me if this exact suggestion came up before on-list (I know we've
> discussed this exact approach off-list).

This "additional scan without applying values" is not very well-defined,
though. In the scenario I described in [2], should "foo" be included?
"Yes" because it is referenced (even though at that time, nobody has
ever head of the URL "foo") or "no" because at that point in time in the
scan, nobody has ever heard of the URL "foo"?

[2] https://lore.kernel.org/git/20211209223919.513113-1-jonathantanmy@google.com/

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v6 2/2] config: include file if remote URL matches a glob
  2021-12-10 21:45         ` Junio C Hamano
@ 2021-12-13 23:37           ` Jonathan Tan
  0 siblings, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-12-13 23:37 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, chooglen, git, avarab

Junio C Hamano <gitster@pobox.com> writes:
> Perhaps it is easier to approach it from the viewpoint of a new
> user who is unfamiliar with what you designed.
> 
> I would imagine that most users would find it natural if a single
> pass precedure read and processed lines as it sees them.
> 
> That is, when the first includeif is evaluated, we have seen only
> 'remote.a.url' whose value is 'bar', so the condition does not hold.
> and then when the second includeif is evaluated, it gets included,
> and we read 'bar'.  But that is wher configuration reading ends;
> remote.b.url is not asked for after we process the second includeif
> til the end.
> 
> If you explain
> 
>  (1) why such a simplest design would not work well; and
> 
>  (2) how the actual design is different from that simplest design to
>      overcome it.
> 
> it would be easier to grok?
> 
> Thanks.

Thanks - this sounds like a good approach. I'll try this.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v7 0/2] Conditional config includes based on remote URL
  2021-10-12 22:57 [RFC PATCH 0/2] Conditional config includes based on remote URL Jonathan Tan
                   ` (9 preceding siblings ...)
  2021-12-07 23:23 ` [PATCH v6 " Jonathan Tan
@ 2021-12-14 21:31 ` Jonathan Tan
  2021-12-14 21:31   ` [PATCH v7 1/2] config: make git_config_include() static Jonathan Tan
                     ` (3 more replies)
  2022-01-18 17:47 ` [PATCH v8 " Jonathan Tan
  11 siblings, 4 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-12-14 21:31 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, chooglen, gitster

Thanks, everyone, for your comments. I've followed Glen's code
suggestion and Junio's documentation suggestion, as you can see in the
range-diff.

Jonathan Tan (2):
  config: make git_config_include() static
  config: include file if remote URL matches a glob

 Documentation/config.txt |  27 ++++++++
 config.c                 | 132 ++++++++++++++++++++++++++++++++++++---
 config.h                 |  46 ++++----------
 t/t1300-config.sh        | 118 ++++++++++++++++++++++++++++++++++
 4 files changed, 282 insertions(+), 41 deletions(-)

Range-diff against v6:
1:  b2dcae03ed = 1:  b2dcae03ed config: make git_config_include() static
2:  de2be06818 ! 2:  7c70089074 config: include file if remote URL matches a glob
    @@ Documentation/config.txt: all branches that begin with `foo/`. This is useful if
     +Files included by this option (directly or indirectly) are not allowed
     +to contain remote URLs.
     ++
    -+This keyword is designed to be forwards compatible with a naming
    -+scheme that supports more variable-based include conditions, but
    -+currently Git only supports the exact keyword described above.
    ++Note that unlike other includeIf conditions, resolving this condition
    ++relies on information that is not yet known at the point of reading the
    ++condition. A typical use case is this option being present as a
    ++system-level or global-level config, and the remote URL being in a
    ++local-level config; hence the need to scan ahead when resolving this
    ++condition. In order to avoid the chicken-and-egg problem in which
    ++potentially-included files can affect whether such files are potentially
    ++included, Git breaks the cycle by prohibiting these files from affecting
    ++the resolution of these conditions (thus, prohibiting them from
    ++declaring remote URLs).
    +++
    ++As for the naming of this keyword, it is for forwards compatibiliy with
    ++a naming scheme that supports more variable-based include conditions,
    ++but currently Git only supports the exact keyword described above.
     +
      A few more notes on matching via `gitdir` and `gitdir/i`:
      
    @@ config.c: static int git_config_include(const char *var, const char *value, void
     +		if (inc->opts->unconditional_remote_url)
     +			inc->fn = forbid_remote_url;
      		ret = handle_path_include(value, inc);
    -+		if (inc->opts->unconditional_remote_url)
    -+			inc->fn = old_fn;
    ++		inc->fn = old_fn;
     +	}
      
      	return ret;
-- 
2.34.1.173.g76aa8bc2d0-goog


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v7 1/2] config: make git_config_include() static
  2021-12-14 21:31 ` [PATCH v7 0/2] Conditional config includes based on remote URL Jonathan Tan
@ 2021-12-14 21:31   ` Jonathan Tan
  2021-12-14 21:31   ` [PATCH v7 2/2] config: include file if remote URL matches a glob Jonathan Tan
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-12-14 21:31 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, chooglen, gitster

It is not used from outside the file in which it is declared.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 config.c | 12 +++++++++++-
 config.h | 37 ++++---------------------------------
 2 files changed, 15 insertions(+), 34 deletions(-)

diff --git a/config.c b/config.c
index 2dcbe901b6..94ad5ce913 100644
--- a/config.c
+++ b/config.c
@@ -120,6 +120,16 @@ static long config_buf_ftell(struct config_source *conf)
 	return conf->u.buf.pos;
 }
 
+struct config_include_data {
+	int depth;
+	config_fn_t fn;
+	void *data;
+	const struct config_options *opts;
+};
+#define CONFIG_INCLUDE_INIT { 0 }
+
+static int git_config_include(const char *var, const char *value, void *data);
+
 #define MAX_INCLUDE_DEPTH 10
 static const char include_depth_advice[] = N_(
 "exceeded maximum include depth (%d) while including\n"
@@ -306,7 +316,7 @@ static int include_condition_is_true(const struct config_options *opts,
 	return 0;
 }
 
-int git_config_include(const char *var, const char *value, void *data)
+static int git_config_include(const char *var, const char *value, void *data)
 {
 	struct config_include_data *inc = data;
 	const char *cond, *key;
diff --git a/config.h b/config.h
index f119de0130..48a5e472ca 100644
--- a/config.h
+++ b/config.h
@@ -126,6 +126,8 @@ int git_default_config(const char *, const char *, void *);
 /**
  * Read a specific file in git-config format.
  * This function takes the same callback and data parameters as `git_config`.
+ *
+ * Unlike git_config(), this function does not respect includes.
  */
 int git_config_from_file(config_fn_t fn, const char *, void *);
 
@@ -158,6 +160,8 @@ void read_very_early_config(config_fn_t cb, void *data);
  * will first feed the user-wide one to the callback, and then the
  * repo-specific one; by overwriting, the higher-priority repo-specific
  * value is left at the end).
+ *
+ * Unlike git_config_from_file(), this function respects includes.
  */
 void git_config(config_fn_t fn, void *);
 
@@ -338,39 +342,6 @@ const char *current_config_origin_type(void);
 const char *current_config_name(void);
 int current_config_line(void);
 
-/**
- * Include Directives
- * ------------------
- *
- * By default, the config parser does not respect include directives.
- * However, a caller can use the special `git_config_include` wrapper
- * callback to support them. To do so, you simply wrap your "real" callback
- * function and data pointer in a `struct config_include_data`, and pass
- * the wrapper to the regular config-reading functions. For example:
- *
- * -------------------------------------------
- * int read_file_with_include(const char *file, config_fn_t fn, void *data)
- * {
- * struct config_include_data inc = CONFIG_INCLUDE_INIT;
- * inc.fn = fn;
- * inc.data = data;
- * return git_config_from_file(git_config_include, file, &inc);
- * }
- * -------------------------------------------
- *
- * `git_config` respects includes automatically. The lower-level
- * `git_config_from_file` does not.
- *
- */
-struct config_include_data {
-	int depth;
-	config_fn_t fn;
-	void *data;
-	const struct config_options *opts;
-};
-#define CONFIG_INCLUDE_INIT { 0 }
-int git_config_include(const char *name, const char *value, void *data);
-
 /*
  * Match and parse a config key of the form:
  *
-- 
2.34.1.173.g76aa8bc2d0-goog


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v7 2/2] config: include file if remote URL matches a glob
  2021-12-14 21:31 ` [PATCH v7 0/2] Conditional config includes based on remote URL Jonathan Tan
  2021-12-14 21:31   ` [PATCH v7 1/2] config: make git_config_include() static Jonathan Tan
@ 2021-12-14 21:31   ` Jonathan Tan
  2021-12-16 21:54     ` Glen Choo
  2021-12-28  0:55     ` Elijah Newren
  2021-12-16 21:57   ` [PATCH v7 0/2] Conditional config includes based on remote URL Glen Choo
  2021-12-28  1:13   ` Elijah Newren
  3 siblings, 2 replies; 87+ messages in thread
From: Jonathan Tan @ 2021-12-14 21:31 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, chooglen, gitster

This is a feature that supports config file inclusion conditional on
whether the repo has a remote with a URL that matches a glob.

Similar to my previous work on remote-suggested hooks [1], the main
motivation is to allow remote repo administrators to provide recommended
configs in a way that can be consumed more easily (e.g. through a
package installable by a package manager - it could, for example,
contain a file to be included conditionally and a post-install script
that adds the include directive to the system-wide config file).

In order to do this, Git reruns the config parsing mechanism upon
noticing the first URL-conditional include in order to find all remote
URLs, and these remote URLs are then used to determine if that first and
all subsequent includes are executed. Remote URLs are not allowed to be
configued in any URL-conditionally-included file.

[1] https://lore.kernel.org/git/cover.1623881977.git.jonathantanmy@google.com/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/config.txt |  27 +++++++++
 config.c                 | 120 ++++++++++++++++++++++++++++++++++++---
 config.h                 |   9 +++
 t/t1300-config.sh        | 118 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 267 insertions(+), 7 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0c0e6b859f..9b3480779e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -159,6 +159,33 @@ all branches that begin with `foo/`. This is useful if your branches are
 organized hierarchically and you would like to apply a configuration to
 all the branches in that hierarchy.
 
+`hasconfig:remote.*.url:`::
+	The data that follows this keyword is taken to
+	be a pattern with standard globbing wildcards and two
+	additional ones, `**/` and `/**`, that can match multiple
+	components. The first time this keyword is seen, the rest of
+	the config files will be scanned for remote URLs (without
+	applying any values). If there exists at least one remote URL
+	that matches this pattern, the include condition is met.
++
+Files included by this option (directly or indirectly) are not allowed
+to contain remote URLs.
++
+Note that unlike other includeIf conditions, resolving this condition
+relies on information that is not yet known at the point of reading the
+condition. A typical use case is this option being present as a
+system-level or global-level config, and the remote URL being in a
+local-level config; hence the need to scan ahead when resolving this
+condition. In order to avoid the chicken-and-egg problem in which
+potentially-included files can affect whether such files are potentially
+included, Git breaks the cycle by prohibiting these files from affecting
+the resolution of these conditions (thus, prohibiting them from
+declaring remote URLs).
++
+As for the naming of this keyword, it is for forwards compatibiliy with
+a naming scheme that supports more variable-based include conditions,
+but currently Git only supports the exact keyword described above.
+
 A few more notes on matching via `gitdir` and `gitdir/i`:
 
  * Symlinks in `$GIT_DIR` are not resolved before matching.
diff --git a/config.c b/config.c
index 94ad5ce913..ac4534ecf2 100644
--- a/config.c
+++ b/config.c
@@ -125,6 +125,12 @@ struct config_include_data {
 	config_fn_t fn;
 	void *data;
 	const struct config_options *opts;
+	struct git_config_source *config_source;
+
+	/*
+	 * All remote URLs discovered when reading all config files.
+	 */
+	struct string_list *remote_urls;
 };
 #define CONFIG_INCLUDE_INIT { 0 }
 
@@ -301,9 +307,92 @@ static int include_by_branch(const char *cond, size_t cond_len)
 	return ret;
 }
 
-static int include_condition_is_true(const struct config_options *opts,
+static int add_remote_url(const char *var, const char *value, void *data)
+{
+	struct string_list *remote_urls = data;
+	const char *remote_name;
+	size_t remote_name_len;
+	const char *key;
+
+	if (!parse_config_key(var, "remote", &remote_name, &remote_name_len,
+			      &key) &&
+	    remote_name &&
+	    !strcmp(key, "url"))
+		string_list_append(remote_urls, value);
+	return 0;
+}
+
+static void populate_remote_urls(struct config_include_data *inc)
+{
+	struct config_options opts;
+
+	struct config_source *store_cf = cf;
+	struct key_value_info *store_kvi = current_config_kvi;
+	enum config_scope store_scope = current_parsing_scope;
+
+	opts = *inc->opts;
+	opts.unconditional_remote_url = 1;
+
+	cf = NULL;
+	current_config_kvi = NULL;
+	current_parsing_scope = 0;
+
+	inc->remote_urls = xmalloc(sizeof(*inc->remote_urls));
+	string_list_init_dup(inc->remote_urls);
+	config_with_options(add_remote_url, inc->remote_urls, inc->config_source, &opts);
+
+	cf = store_cf;
+	current_config_kvi = store_kvi;
+	current_parsing_scope = store_scope;
+}
+
+static int forbid_remote_url(const char *var, const char *value, void *data)
+{
+	const char *remote_name;
+	size_t remote_name_len;
+	const char *key;
+
+	if (!parse_config_key(var, "remote", &remote_name, &remote_name_len,
+			      &key) &&
+	    remote_name &&
+	    !strcmp(key, "url"))
+		die(_("remote URLs cannot be configured in file directly or indirectly included by includeIf.hasconfig:remote.*.url"));
+	return 0;
+}
+
+static int at_least_one_url_matches_glob(const char *glob, int glob_len,
+					 struct string_list *remote_urls)
+{
+	struct strbuf pattern = STRBUF_INIT;
+	struct string_list_item *url_item;
+	int found = 0;
+
+	strbuf_add(&pattern, glob, glob_len);
+	for_each_string_list_item(url_item, remote_urls) {
+		if (!wildmatch(pattern.buf, url_item->string, WM_PATHNAME)) {
+			found = 1;
+			break;
+		}
+	}
+	strbuf_release(&pattern);
+	return found;
+}
+
+static int include_by_remote_url(struct config_include_data *inc,
+		const char *cond, size_t cond_len)
+{
+	if (inc->opts->unconditional_remote_url)
+		return 1;
+	if (!inc->remote_urls)
+		populate_remote_urls(inc);
+	return at_least_one_url_matches_glob(cond, cond_len,
+					     inc->remote_urls);
+}
+
+static int include_condition_is_true(struct config_include_data *inc,
 				     const char *cond, size_t cond_len)
 {
+	const struct config_options *opts = inc->opts;
 
 	if (skip_prefix_mem(cond, cond_len, "gitdir:", &cond, &cond_len))
 		return include_by_gitdir(opts, cond, cond_len, 0);
@@ -311,6 +400,9 @@ static int include_condition_is_true(const struct config_options *opts,
 		return include_by_gitdir(opts, cond, cond_len, 1);
 	else if (skip_prefix_mem(cond, cond_len, "onbranch:", &cond, &cond_len))
 		return include_by_branch(cond, cond_len);
+	else if (skip_prefix_mem(cond, cond_len, "hasconfig:remote.*.url:", &cond,
+				   &cond_len))
+		return include_by_remote_url(inc, cond, cond_len);
 
 	/* unknown conditionals are always false */
 	return 0;
@@ -335,9 +427,15 @@ static int git_config_include(const char *var, const char *value, void *data)
 		ret = handle_path_include(value, inc);
 
 	if (!parse_config_key(var, "includeif", &cond, &cond_len, &key) &&
-	    (cond && include_condition_is_true(inc->opts, cond, cond_len)) &&
-	    !strcmp(key, "path"))
+	    cond && include_condition_is_true(inc, cond, cond_len) &&
+	    !strcmp(key, "path")) {
+		config_fn_t old_fn = inc->fn;
+
+		if (inc->opts->unconditional_remote_url)
+			inc->fn = forbid_remote_url;
 		ret = handle_path_include(value, inc);
+		inc->fn = old_fn;
+	}
 
 	return ret;
 }
@@ -1933,11 +2031,13 @@ int config_with_options(config_fn_t fn, void *data,
 			const struct config_options *opts)
 {
 	struct config_include_data inc = CONFIG_INCLUDE_INIT;
+	int ret;
 
 	if (opts->respect_includes) {
 		inc.fn = fn;
 		inc.data = data;
 		inc.opts = opts;
+		inc.config_source = config_source;
 		fn = git_config_include;
 		data = &inc;
 	}
@@ -1950,17 +2050,23 @@ int config_with_options(config_fn_t fn, void *data,
 	 * regular lookup sequence.
 	 */
 	if (config_source && config_source->use_stdin) {
-		return git_config_from_stdin(fn, data);
+		ret = git_config_from_stdin(fn, data);
 	} else if (config_source && config_source->file) {
-		return git_config_from_file(fn, config_source->file, data);
+		ret = git_config_from_file(fn, config_source->file, data);
 	} else if (config_source && config_source->blob) {
 		struct repository *repo = config_source->repo ?
 			config_source->repo : the_repository;
-		return git_config_from_blob_ref(fn, repo, config_source->blob,
+		ret = git_config_from_blob_ref(fn, repo, config_source->blob,
 						data);
+	} else {
+		ret = do_git_config_sequence(opts, fn, data);
 	}
 
-	return do_git_config_sequence(opts, fn, data);
+	if (inc.remote_urls) {
+		string_list_clear(inc.remote_urls, 0);
+		FREE_AND_NULL(inc.remote_urls);
+	}
+	return ret;
 }
 
 static void configset_iter(struct config_set *cs, config_fn_t fn, void *data)
diff --git a/config.h b/config.h
index 48a5e472ca..ab0106d287 100644
--- a/config.h
+++ b/config.h
@@ -89,6 +89,15 @@ struct config_options {
 	unsigned int ignore_worktree : 1;
 	unsigned int ignore_cmdline : 1;
 	unsigned int system_gently : 1;
+
+	/*
+	 * For internal use. Include all includeif.hasremoteurl paths without
+	 * checking if the repo has that remote URL, and when doing so, verify
+	 * that files included in this way do not configure any remote URLs
+	 * themselves.
+	 */
+	unsigned int unconditional_remote_url : 1;
+
 	const char *commondir;
 	const char *git_dir;
 	config_parser_event_fn_t event_fn;
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 9ff46f3b04..8310562b84 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -2387,4 +2387,122 @@ test_expect_success '--get and --get-all with --fixed-value' '
 	test_must_fail git config --file=config --get-regexp --fixed-value fixed+ non-existent
 '
 
+test_expect_success 'includeIf.hasconfig:remote.*.url' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	cat >include-this <<-\EOF &&
+	[user]
+		this = this-is-included
+	EOF
+	cat >dont-include-that <<-\EOF &&
+	[user]
+		that = that-is-not-included
+	EOF
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[includeIf "hasconfig:remote.*.url:foo"]
+		path = "$(pwd)/include-this"
+	[includeIf "hasconfig:remote.*.url:bar"]
+		path = "$(pwd)/dont-include-that"
+	[remote "foo"]
+		url = foo
+	EOF
+
+	echo this-is-included >expect-this &&
+	git -C hasremoteurlTest config --get user.this >actual-this &&
+	test_cmp expect-this actual-this &&
+
+	test_must_fail git -C hasremoteurlTest config --get user.that
+'
+
+test_expect_success 'includeIf.hasconfig:remote.*.url respects last-config-wins' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	cat >include-two-three <<-\EOF &&
+	[user]
+		two = included-config
+		three = included-config
+	EOF
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[remote "foo"]
+		url = foo
+	[user]
+		one = main-config
+		two = main-config
+	[includeIf "hasconfig:remote.*.url:foo"]
+		path = "$(pwd)/include-two-three"
+	[user]
+		three = main-config
+	EOF
+
+	echo main-config >expect-main-config &&
+	echo included-config >expect-included-config &&
+
+	git -C hasremoteurlTest config --get user.one >actual &&
+	test_cmp expect-main-config actual &&
+
+	git -C hasremoteurlTest config --get user.two >actual &&
+	test_cmp expect-included-config actual &&
+
+	git -C hasremoteurlTest config --get user.three >actual &&
+	test_cmp expect-main-config actual
+'
+
+test_expect_success 'includeIf.hasconfig:remote.*.url globs' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	printf "[user]\ndss = yes\n" >double-star-start &&
+	printf "[user]\ndse = yes\n" >double-star-end &&
+	printf "[user]\ndsm = yes\n" >double-star-middle &&
+	printf "[user]\nssm = yes\n" >single-star-middle &&
+	printf "[user]\nno = no\n" >no &&
+
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[remote "foo"]
+		url = https://foo/bar/baz
+	[includeIf "hasconfig:remote.*.url:**/baz"]
+		path = "$(pwd)/double-star-start"
+	[includeIf "hasconfig:remote.*.url:**/nomatch"]
+		path = "$(pwd)/no"
+	[includeIf "hasconfig:remote.*.url:https:/**"]
+		path = "$(pwd)/double-star-end"
+	[includeIf "hasconfig:remote.*.url:nomatch:/**"]
+		path = "$(pwd)/no"
+	[includeIf "hasconfig:remote.*.url:https:/**/baz"]
+		path = "$(pwd)/double-star-middle"
+	[includeIf "hasconfig:remote.*.url:https:/**/nomatch"]
+		path = "$(pwd)/no"
+	[includeIf "hasconfig:remote.*.url:https://*/bar/baz"]
+		path = "$(pwd)/single-star-middle"
+	[includeIf "hasconfig:remote.*.url:https://*/baz"]
+		path = "$(pwd)/no"
+	EOF
+
+	git -C hasremoteurlTest config --get user.dss &&
+	git -C hasremoteurlTest config --get user.dse &&
+	git -C hasremoteurlTest config --get user.dsm &&
+	git -C hasremoteurlTest config --get user.ssm &&
+	test_must_fail git -C hasremoteurlTest config --get user.no
+'
+
+test_expect_success 'includeIf.hasconfig:remote.*.url forbids remote url in such included files' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	cat >include-with-url <<-\EOF &&
+	[remote "bar"]
+		url = bar
+	EOF
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[includeIf "hasconfig:remote.*.url:foo"]
+		path = "$(pwd)/include-with-url"
+	EOF
+
+	# test with any Git command
+	test_must_fail git -C hasremoteurlTest status 2>err &&
+	grep "fatal: remote URLs cannot be configured in file directly or indirectly included by includeIf.hasconfig:remote.*.url" err
+'
+
 test_done
-- 
2.34.1.173.g76aa8bc2d0-goog


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH v7 2/2] config: include file if remote URL matches a glob
  2021-12-14 21:31   ` [PATCH v7 2/2] config: include file if remote URL matches a glob Jonathan Tan
@ 2021-12-16 21:54     ` Glen Choo
  2021-12-28  0:55     ` Elijah Newren
  1 sibling, 0 replies; 87+ messages in thread
From: Glen Choo @ 2021-12-16 21:54 UTC (permalink / raw)
  To: Jonathan Tan, git; +Cc: Jonathan Tan, gitster

Jonathan Tan <jonathantanmy@google.com> writes:

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 0c0e6b859f..9b3480779e 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -159,6 +159,33 @@ all branches that begin with `foo/`. This is useful if your branches are
>  organized hierarchically and you would like to apply a configuration to
>  all the branches in that hierarchy.
>  
> +`hasconfig:remote.*.url:`::
> +	The data that follows this keyword is taken to
> +	be a pattern with standard globbing wildcards and two
> +	additional ones, `**/` and `/**`, that can match multiple
> +	components. The first time this keyword is seen, the rest of
> +	the config files will be scanned for remote URLs (without
> +	applying any values). If there exists at least one remote URL
> +	that matches this pattern, the include condition is met.
> ++
> +Files included by this option (directly or indirectly) are not allowed
> +to contain remote URLs.
> ++
> +Note that unlike other includeIf conditions, resolving this condition
> +relies on information that is not yet known at the point of reading the
> +condition. A typical use case is this option being present as a
> +system-level or global-level config, and the remote URL being in a
> +local-level config; hence the need to scan ahead when resolving this
> +condition. In order to avoid the chicken-and-egg problem in which
> +potentially-included files can affect whether such files are potentially
> +included, Git breaks the cycle by prohibiting these files from affecting
> +the resolution of these conditions (thus, prohibiting them from
> +declaring remote URLs).

Putting myself in the shoes of someone who is unfamiliar with the
implementation, I think that this becomes clear if you read it enough
times (but also, I'm not a good reader), so this is ok.

It would be nice for this to be reviewed by someone who is _actually_
unfamiliar, though.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v7 0/2] Conditional config includes based on remote URL
  2021-12-14 21:31 ` [PATCH v7 0/2] Conditional config includes based on remote URL Jonathan Tan
  2021-12-14 21:31   ` [PATCH v7 1/2] config: make git_config_include() static Jonathan Tan
  2021-12-14 21:31   ` [PATCH v7 2/2] config: include file if remote URL matches a glob Jonathan Tan
@ 2021-12-16 21:57   ` Glen Choo
  2021-12-28  1:13   ` Elijah Newren
  3 siblings, 0 replies; 87+ messages in thread
From: Glen Choo @ 2021-12-16 21:57 UTC (permalink / raw)
  To: Jonathan Tan, git; +Cc: Jonathan Tan, gitster

Jonathan Tan <jonathantanmy@google.com> writes:

> Thanks, everyone, for your comments. I've followed Glen's code
> suggestion and Junio's documentation suggestion, as you can see in the
> range-diff.
>
> Jonathan Tan (2):
>   config: make git_config_include() static
>   config: include file if remote URL matches a glob
>
>  Documentation/config.txt |  27 ++++++++
>  config.c                 | 132 ++++++++++++++++++++++++++++++++++++---
>  config.h                 |  46 ++++----------
>  t/t1300-config.sh        | 118 ++++++++++++++++++++++++++++++++++
>  4 files changed, 282 insertions(+), 41 deletions(-)
>
> Range-diff against v6:
> 1:  b2dcae03ed = 1:  b2dcae03ed config: make git_config_include() static
> 2:  de2be06818 ! 2:  7c70089074 config: include file if remote URL matches a glob
>     @@ Documentation/config.txt: all branches that begin with `foo/`. This is useful if
>      +Files included by this option (directly or indirectly) are not allowed
>      +to contain remote URLs.
>      ++
>     -+This keyword is designed to be forwards compatible with a naming
>     -+scheme that supports more variable-based include conditions, but
>     -+currently Git only supports the exact keyword described above.
>     ++Note that unlike other includeIf conditions, resolving this condition
>     ++relies on information that is not yet known at the point of reading the
>     ++condition. A typical use case is this option being present as a
>     ++system-level or global-level config, and the remote URL being in a
>     ++local-level config; hence the need to scan ahead when resolving this
>     ++condition. In order to avoid the chicken-and-egg problem in which
>     ++potentially-included files can affect whether such files are potentially
>     ++included, Git breaks the cycle by prohibiting these files from affecting
>     ++the resolution of these conditions (thus, prohibiting them from
>     ++declaring remote URLs).
>     +++
>     ++As for the naming of this keyword, it is for forwards compatibiliy with
>     ++a naming scheme that supports more variable-based include conditions,
>     ++but currently Git only supports the exact keyword described above.
>      +
>       A few more notes on matching via `gitdir` and `gitdir/i`:
>       
>     @@ config.c: static int git_config_include(const char *var, const char *value, void
>      +		if (inc->opts->unconditional_remote_url)
>      +			inc->fn = forbid_remote_url;
>       		ret = handle_path_include(value, inc);
>     -+		if (inc->opts->unconditional_remote_url)
>     -+			inc->fn = old_fn;
>     ++		inc->fn = old_fn;
>      +	}
>       
>       	return ret;
> -- 
> 2.34.1.173.g76aa8bc2d0-goog

The implementation looks good, and I think that the precedent we are
setting with "hasconfig:" is pretty well captured on this thread. 

This looks good to me, though I'm not an expert in this area, so it
would be good for others to chime in.

Reviewed-by: Glen Choo <chooglen@google.com>

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v7 2/2] config: include file if remote URL matches a glob
  2021-12-14 21:31   ` [PATCH v7 2/2] config: include file if remote URL matches a glob Jonathan Tan
  2021-12-16 21:54     ` Glen Choo
@ 2021-12-28  0:55     ` Elijah Newren
  2022-01-10 18:58       ` Jonathan Tan
  1 sibling, 1 reply; 87+ messages in thread
From: Elijah Newren @ 2021-12-28  0:55 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git Mailing List, Glen Choo, Junio C Hamano

On Wed, Dec 15, 2021 at 7:01 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> This is a feature that supports config file inclusion conditional on
> whether the repo has a remote with a URL that matches a glob.
>
> Similar to my previous work on remote-suggested hooks [1], the main
> motivation is to allow remote repo administrators to provide recommended
> configs in a way that can be consumed more easily (e.g. through a
> package installable by a package manager - it could, for example,
> contain a file to be included conditionally and a post-install script
> that adds the include directive to the system-wide config file).
>
> In order to do this, Git reruns the config parsing mechanism upon
> noticing the first URL-conditional include in order to find all remote
> URLs, and these remote URLs are then used to determine if that first and
> all subsequent includes are executed. Remote URLs are not allowed to be
> configued in any URL-conditionally-included file.
>
> [1] https://lore.kernel.org/git/cover.1623881977.git.jonathantanmy@google.com/
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  Documentation/config.txt |  27 +++++++++
>  config.c                 | 120 ++++++++++++++++++++++++++++++++++++---
>  config.h                 |   9 +++
>  t/t1300-config.sh        | 118 ++++++++++++++++++++++++++++++++++++++
>  4 files changed, 267 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 0c0e6b859f..9b3480779e 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -159,6 +159,33 @@ all branches that begin with `foo/`. This is useful if your branches are
>  organized hierarchically and you would like to apply a configuration to
>  all the branches in that hierarchy.
>
> +`hasconfig:remote.*.url:`::
> +       The data that follows this keyword is taken to
> +       be a pattern with standard globbing wildcards and two
> +       additional ones, `**/` and `/**`, that can match multiple
> +       components. The first time this keyword is seen, the rest of
> +       the config files will be scanned for remote URLs (without
> +       applying any values). If there exists at least one remote URL
> +       that matches this pattern, the include condition is met.
> ++
> +Files included by this option (directly or indirectly) are not allowed
> +to contain remote URLs.
> ++
> +Note that unlike other includeIf conditions, resolving this condition
> +relies on information that is not yet known at the point of reading the
> +condition. A typical use case is this option being present as a
> +system-level or global-level config, and the remote URL being in a
> +local-level config; hence the need to scan ahead when resolving this
> +condition. In order to avoid the chicken-and-egg problem in which
> +potentially-included files can affect whether such files are potentially
> +included, Git breaks the cycle by prohibiting these files from affecting
> +the resolution of these conditions (thus, prohibiting them from
> +declaring remote URLs).
> ++
> +As for the naming of this keyword, it is for forwards compatibiliy with
> +a naming scheme that supports more variable-based include conditions,
> +but currently Git only supports the exact keyword described above.
> +
>  A few more notes on matching via `gitdir` and `gitdir/i`:
>
>   * Symlinks in `$GIT_DIR` are not resolved before matching.
> diff --git a/config.c b/config.c
> index 94ad5ce913..ac4534ecf2 100644
> --- a/config.c
> +++ b/config.c
> @@ -125,6 +125,12 @@ struct config_include_data {
>         config_fn_t fn;
>         void *data;
>         const struct config_options *opts;
> +       struct git_config_source *config_source;
> +
> +       /*
> +        * All remote URLs discovered when reading all config files.
> +        */
> +       struct string_list *remote_urls;
>  };
>  #define CONFIG_INCLUDE_INIT { 0 }
>
> @@ -301,9 +307,92 @@ static int include_by_branch(const char *cond, size_t cond_len)
>         return ret;
>  }
>
> -static int include_condition_is_true(const struct config_options *opts,
> +static int add_remote_url(const char *var, const char *value, void *data)
> +{
> +       struct string_list *remote_urls = data;
> +       const char *remote_name;
> +       size_t remote_name_len;
> +       const char *key;
> +
> +       if (!parse_config_key(var, "remote", &remote_name, &remote_name_len,
> +                             &key) &&
> +           remote_name &&
> +           !strcmp(key, "url"))
> +               string_list_append(remote_urls, value);
> +       return 0;
> +}
> +
> +static void populate_remote_urls(struct config_include_data *inc)
> +{
> +       struct config_options opts;
> +
> +       struct config_source *store_cf = cf;
> +       struct key_value_info *store_kvi = current_config_kvi;
> +       enum config_scope store_scope = current_parsing_scope;
> +
> +       opts = *inc->opts;
> +       opts.unconditional_remote_url = 1;
> +
> +       cf = NULL;
> +       current_config_kvi = NULL;
> +       current_parsing_scope = 0;
> +
> +       inc->remote_urls = xmalloc(sizeof(*inc->remote_urls));
> +       string_list_init_dup(inc->remote_urls);
> +       config_with_options(add_remote_url, inc->remote_urls, inc->config_source, &opts);
> +
> +       cf = store_cf;
> +       current_config_kvi = store_kvi;
> +       current_parsing_scope = store_scope;
> +}
> +
> +static int forbid_remote_url(const char *var, const char *value, void *data)
> +{
> +       const char *remote_name;
> +       size_t remote_name_len;
> +       const char *key;
> +
> +       if (!parse_config_key(var, "remote", &remote_name, &remote_name_len,
> +                             &key) &&
> +           remote_name &&
> +           !strcmp(key, "url"))
> +               die(_("remote URLs cannot be configured in file directly or indirectly included by includeIf.hasconfig:remote.*.url"));
> +       return 0;
> +}
> +
> +static int at_least_one_url_matches_glob(const char *glob, int glob_len,
> +                                        struct string_list *remote_urls)
> +{
> +       struct strbuf pattern = STRBUF_INIT;
> +       struct string_list_item *url_item;
> +       int found = 0;
> +
> +       strbuf_add(&pattern, glob, glob_len);
> +       for_each_string_list_item(url_item, remote_urls) {
> +               if (!wildmatch(pattern.buf, url_item->string, WM_PATHNAME)) {
> +                       found = 1;
> +                       break;
> +               }
> +       }
> +       strbuf_release(&pattern);
> +       return found;
> +}
> +
> +static int include_by_remote_url(struct config_include_data *inc,
> +               const char *cond, size_t cond_len)
> +{
> +       if (inc->opts->unconditional_remote_url)
> +               return 1;
> +       if (!inc->remote_urls)
> +               populate_remote_urls(inc);
> +       return at_least_one_url_matches_glob(cond, cond_len,
> +                                            inc->remote_urls);
> +}
> +
> +static int include_condition_is_true(struct config_include_data *inc,
>                                      const char *cond, size_t cond_len)
>  {
> +       const struct config_options *opts = inc->opts;
>
>         if (skip_prefix_mem(cond, cond_len, "gitdir:", &cond, &cond_len))
>                 return include_by_gitdir(opts, cond, cond_len, 0);
> @@ -311,6 +400,9 @@ static int include_condition_is_true(const struct config_options *opts,
>                 return include_by_gitdir(opts, cond, cond_len, 1);
>         else if (skip_prefix_mem(cond, cond_len, "onbranch:", &cond, &cond_len))
>                 return include_by_branch(cond, cond_len);
> +       else if (skip_prefix_mem(cond, cond_len, "hasconfig:remote.*.url:", &cond,
> +                                  &cond_len))
> +               return include_by_remote_url(inc, cond, cond_len);
>
>         /* unknown conditionals are always false */
>         return 0;
> @@ -335,9 +427,15 @@ static int git_config_include(const char *var, const char *value, void *data)
>                 ret = handle_path_include(value, inc);
>
>         if (!parse_config_key(var, "includeif", &cond, &cond_len, &key) &&
> -           (cond && include_condition_is_true(inc->opts, cond, cond_len)) &&
> -           !strcmp(key, "path"))
> +           cond && include_condition_is_true(inc, cond, cond_len) &&
> +           !strcmp(key, "path")) {
> +               config_fn_t old_fn = inc->fn;
> +
> +               if (inc->opts->unconditional_remote_url)
> +                       inc->fn = forbid_remote_url;
>                 ret = handle_path_include(value, inc);
> +               inc->fn = old_fn;
> +       }
>
>         return ret;
>  }
> @@ -1933,11 +2031,13 @@ int config_with_options(config_fn_t fn, void *data,
>                         const struct config_options *opts)
>  {
>         struct config_include_data inc = CONFIG_INCLUDE_INIT;
> +       int ret;
>
>         if (opts->respect_includes) {
>                 inc.fn = fn;
>                 inc.data = data;
>                 inc.opts = opts;
> +               inc.config_source = config_source;
>                 fn = git_config_include;
>                 data = &inc;
>         }
> @@ -1950,17 +2050,23 @@ int config_with_options(config_fn_t fn, void *data,
>          * regular lookup sequence.
>          */
>         if (config_source && config_source->use_stdin) {
> -               return git_config_from_stdin(fn, data);
> +               ret = git_config_from_stdin(fn, data);
>         } else if (config_source && config_source->file) {
> -               return git_config_from_file(fn, config_source->file, data);
> +               ret = git_config_from_file(fn, config_source->file, data);
>         } else if (config_source && config_source->blob) {
>                 struct repository *repo = config_source->repo ?
>                         config_source->repo : the_repository;
> -               return git_config_from_blob_ref(fn, repo, config_source->blob,
> +               ret = git_config_from_blob_ref(fn, repo, config_source->blob,
>                                                 data);
> +       } else {
> +               ret = do_git_config_sequence(opts, fn, data);
>         }
>
> -       return do_git_config_sequence(opts, fn, data);
> +       if (inc.remote_urls) {
> +               string_list_clear(inc.remote_urls, 0);
> +               FREE_AND_NULL(inc.remote_urls);
> +       }
> +       return ret;
>  }
>
>  static void configset_iter(struct config_set *cs, config_fn_t fn, void *data)
> diff --git a/config.h b/config.h
> index 48a5e472ca..ab0106d287 100644
> --- a/config.h
> +++ b/config.h
> @@ -89,6 +89,15 @@ struct config_options {
>         unsigned int ignore_worktree : 1;
>         unsigned int ignore_cmdline : 1;
>         unsigned int system_gently : 1;
> +
> +       /*
> +        * For internal use. Include all includeif.hasremoteurl paths without
> +        * checking if the repo has that remote URL, and when doing so, verify
> +        * that files included in this way do not configure any remote URLs
> +        * themselves.
> +        */
> +       unsigned int unconditional_remote_url : 1;
> +
>         const char *commondir;
>         const char *git_dir;
>         config_parser_event_fn_t event_fn;
> diff --git a/t/t1300-config.sh b/t/t1300-config.sh
> index 9ff46f3b04..8310562b84 100755
> --- a/t/t1300-config.sh
> +++ b/t/t1300-config.sh
> @@ -2387,4 +2387,122 @@ test_expect_success '--get and --get-all with --fixed-value' '
>         test_must_fail git config --file=config --get-regexp --fixed-value fixed+ non-existent
>  '
>
> +test_expect_success 'includeIf.hasconfig:remote.*.url' '
> +       git init hasremoteurlTest &&
> +       test_when_finished "rm -rf hasremoteurlTest" &&
> +
> +       cat >include-this <<-\EOF &&
> +       [user]
> +               this = this-is-included
> +       EOF
> +       cat >dont-include-that <<-\EOF &&
> +       [user]
> +               that = that-is-not-included
> +       EOF
> +       cat >>hasremoteurlTest/.git/config <<-EOF &&
> +       [includeIf "hasconfig:remote.*.url:foo"]
> +               path = "$(pwd)/include-this"
> +       [includeIf "hasconfig:remote.*.url:bar"]
> +               path = "$(pwd)/dont-include-that"
> +       [remote "foo"]
> +               url = foo

Which "foo" is relevant here?  The remote name, or the url value?
Could they be given different values so that the testcase is a bit
easier to read and understand?

> +       EOF
> +
> +       echo this-is-included >expect-this &&
> +       git -C hasremoteurlTest config --get user.this >actual-this &&
> +       test_cmp expect-this actual-this &&
> +
> +       test_must_fail git -C hasremoteurlTest config --get user.that
> +'
> +
> +test_expect_success 'includeIf.hasconfig:remote.*.url respects last-config-wins' '
> +       git init hasremoteurlTest &&
> +       test_when_finished "rm -rf hasremoteurlTest" &&
> +
> +       cat >include-two-three <<-\EOF &&
> +       [user]
> +               two = included-config
> +               three = included-config
> +       EOF
> +       cat >>hasremoteurlTest/.git/config <<-EOF &&
> +       [remote "foo"]
> +               url = foo

...similarly here.

> +       [user]
> +               one = main-config
> +               two = main-config
> +       [includeIf "hasconfig:remote.*.url:foo"]
> +               path = "$(pwd)/include-two-three"
> +       [user]
> +               three = main-config
> +       EOF
> +
> +       echo main-config >expect-main-config &&
> +       echo included-config >expect-included-config &&
> +
> +       git -C hasremoteurlTest config --get user.one >actual &&
> +       test_cmp expect-main-config actual &&
> +
> +       git -C hasremoteurlTest config --get user.two >actual &&
> +       test_cmp expect-included-config actual &&
> +
> +       git -C hasremoteurlTest config --get user.three >actual &&
> +       test_cmp expect-main-config actual
> +'
> +
> +test_expect_success 'includeIf.hasconfig:remote.*.url globs' '
> +       git init hasremoteurlTest &&
> +       test_when_finished "rm -rf hasremoteurlTest" &&
> +
> +       printf "[user]\ndss = yes\n" >double-star-start &&
> +       printf "[user]\ndse = yes\n" >double-star-end &&
> +       printf "[user]\ndsm = yes\n" >double-star-middle &&
> +       printf "[user]\nssm = yes\n" >single-star-middle &&
> +       printf "[user]\nno = no\n" >no &&
> +
> +       cat >>hasremoteurlTest/.git/config <<-EOF &&
> +       [remote "foo"]
> +               url = https://foo/bar/baz

This example is nicer as the below matches make it clearer that it's
about the url value.

> +       [includeIf "hasconfig:remote.*.url:**/baz"]
> +               path = "$(pwd)/double-star-start"
> +       [includeIf "hasconfig:remote.*.url:**/nomatch"]
> +               path = "$(pwd)/no"
> +       [includeIf "hasconfig:remote.*.url:https:/**"]
> +               path = "$(pwd)/double-star-end"
> +       [includeIf "hasconfig:remote.*.url:nomatch:/**"]
> +               path = "$(pwd)/no"
> +       [includeIf "hasconfig:remote.*.url:https:/**/baz"]
> +               path = "$(pwd)/double-star-middle"
> +       [includeIf "hasconfig:remote.*.url:https:/**/nomatch"]
> +               path = "$(pwd)/no"
> +       [includeIf "hasconfig:remote.*.url:https://*/bar/baz"]
> +               path = "$(pwd)/single-star-middle"
> +       [includeIf "hasconfig:remote.*.url:https://*/baz"]
> +               path = "$(pwd)/no"
> +       EOF
> +
> +       git -C hasremoteurlTest config --get user.dss &&
> +       git -C hasremoteurlTest config --get user.dse &&
> +       git -C hasremoteurlTest config --get user.dsm &&
> +       git -C hasremoteurlTest config --get user.ssm &&
> +       test_must_fail git -C hasremoteurlTest config --get user.no
> +'
> +
> +test_expect_success 'includeIf.hasconfig:remote.*.url forbids remote url in such included files' '
> +       git init hasremoteurlTest &&
> +       test_when_finished "rm -rf hasremoteurlTest" &&
> +
> +       cat >include-with-url <<-\EOF &&
> +       [remote "bar"]
> +               url = bar
> +       EOF
> +       cat >>hasremoteurlTest/.git/config <<-EOF &&
> +       [includeIf "hasconfig:remote.*.url:foo"]
> +               path = "$(pwd)/include-with-url"
> +       EOF
> +
> +       # test with any Git command
> +       test_must_fail git -C hasremoteurlTest status 2>err &&
> +       grep "fatal: remote URLs cannot be configured in file directly or indirectly included by includeIf.hasconfig:remote.*.url" err
> +'
> +
>  test_done
> --
> 2.34.1.173.g76aa8bc2d0-goog

The testcases are very helpful.  I found myself confused when reading
just the documentation about how it would be used.  Perhaps an example
or two should be added to the documentation?

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v7 0/2] Conditional config includes based on remote URL
  2021-12-14 21:31 ` [PATCH v7 0/2] Conditional config includes based on remote URL Jonathan Tan
                     ` (2 preceding siblings ...)
  2021-12-16 21:57   ` [PATCH v7 0/2] Conditional config includes based on remote URL Glen Choo
@ 2021-12-28  1:13   ` Elijah Newren
  2021-12-28 23:13     ` Glen Choo
  2022-01-10 19:22     ` Jonathan Tan
  3 siblings, 2 replies; 87+ messages in thread
From: Elijah Newren @ 2021-12-28  1:13 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: Git Mailing List, Glen Choo, Junio C Hamano, Derrick Stolee,
	Johannes Schindelin

On Wed, Dec 15, 2021 at 7:25 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Thanks, everyone, for your comments. I've followed Glen's code
> suggestion and Junio's documentation suggestion, as you can see in the
> range-diff.

So, the basic idea is, in a setting like Google's, you can have users
install additional files on their system out-of-band, and have the
users specify a simple line in their configuration to make use of
those additional files -- or portions thereof.  It's a way of easily
providing potentially large blocks of pre-vetted configuration for
users.

Seems to make sense.  (and I've read over the code lightly, so feel
free to take this as an Acked-by.)


But can I back up and comment on a bigger picture item?

This mechanism requires somehow getting additional files to the user
separately; projects that span companies (git.git, linux.git, etc.)
won't likely be able to make use of this.

Scalar also has a mechanism for providing potentially large blocks of
pre-vetted configuration for users.  It does so as part of a new
top-level command.  And it does so with a very opinionated set of
values that are not configurable.  Thus, while I'd like to use it,
they use a configuration option that would break things badly at my
$DAYJOB.  (Too many gradle plugins using jgit, which doesn't
understand index.version=4 and will blow up with a very suboptimal
error message when they see it.)  And, it's very specific to scalar;
we probably don't want to add a new toplevel command everytime someone
wants common configuration to be easily grabbed by some user.

It would be nice if we could find some more generic solution.
Granted, I can't think of any, and I don't think this comment should
block this particular series (nor the scalar one), but I am worrying a
little bit that we're getting multiple completely different solutions
for the same general problem, and each brings caveats big enough to
preclude many (most?) potential users.  I don't know what to do about
that, especially since configuration that is too easy to propagate
comes with big security problems, but I wanted to at least raise the
issue and hope others have good ideas.  If nothing else, I want to
raise awareness to avoid proliferation of similar
pre-vetted-configuration-deployment mechanisms.  I'm CC'ing a couple
scalar folks as well for that point.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v7 0/2] Conditional config includes based on remote URL
  2021-12-28  1:13   ` Elijah Newren
@ 2021-12-28 23:13     ` Glen Choo
  2022-01-10 19:22     ` Jonathan Tan
  1 sibling, 0 replies; 87+ messages in thread
From: Glen Choo @ 2021-12-28 23:13 UTC (permalink / raw)
  To: Elijah Newren, Jonathan Tan
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee,
	Johannes Schindelin

Elijah Newren <newren@gmail.com> writes:

> But can I back up and comment on a bigger picture item?
>
> This mechanism requires somehow getting additional files to the user
> separately; projects that span companies (git.git, linux.git, etc.)
> won't likely be able to make use of this.
>
> Scalar also has a mechanism for providing potentially large blocks of
> pre-vetted configuration for users.  It does so as part of a new
> top-level command.  And it does so with a very opinionated set of
> values that are not configurable.  Thus, while I'd like to use it,
> they use a configuration option that would break things badly at my
> $DAYJOB.  (Too many gradle plugins using jgit, which doesn't
> understand index.version=4 and will blow up with a very suboptimal
> error message when they see it.)  And, it's very specific to scalar;
> we probably don't want to add a new toplevel command everytime someone
> wants common configuration to be easily grabbed by some user.
>
> It would be nice if we could find some more generic solution.
> Granted, I can't think of any, and I don't think this comment should
> block this particular series (nor the scalar one), but I am worrying a
> little bit that we're getting multiple completely different solutions
> for the same general problem, and each brings caveats big enough to
> preclude many (most?) potential users.  I don't know what to do about
> that, especially since configuration that is too easy to propagate
> comes with big security problems, but I wanted to at least raise the
> issue and hope others have good ideas.  If nothing else, I want to
> raise awareness to avoid proliferation of similar
> pre-vetted-configuration-deployment mechanisms.  I'm CC'ing a couple
> scalar folks as well for that point.

Yes, that's an accurate description. To reiterate what Jonathan said in
his first cover letter [1], the primary motivation is that we want to be
able to 'suggest' hooks to users. There was an RFC for this
'remote-suggested hooks feature' (docs [2], RFC implementation [3]) but
it ultimately stalled due to security concerns I believe (this was
before I joined the team, so I'm not the most familiar with this).

It might be worth re-reading those threads since they tread on pretty
much the same ground of shipping pre-vetted config (this is directed at
me too, since I haven't read through those in detail). I've also been
told that we're (aka Google) still looking for feedback on [2], so feel
free to share any thoughts on that thread too.

[1] https://lore.kernel.org/git/cover.1634077795.git.jonathantanmy@google.com
[2] https://lore.kernel.org/git/pull.908.v4.git.1620241892929.gitgitgadget@gmail.com/
[3] https://lore.kernel.org/git/cover.1623881977.git.jonathantanmy@google.com/

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v7 2/2] config: include file if remote URL matches a glob
  2021-12-28  0:55     ` Elijah Newren
@ 2022-01-10 18:58       ` Jonathan Tan
  0 siblings, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2022-01-10 18:58 UTC (permalink / raw)
  To: newren; +Cc: jonathantanmy, git, chooglen, gitster

Elijah Newren <newren@gmail.com> writes:
> > +test_expect_success 'includeIf.hasconfig:remote.*.url' '
> > +       git init hasremoteurlTest &&
> > +       test_when_finished "rm -rf hasremoteurlTest" &&
> > +
> > +       cat >include-this <<-\EOF &&
> > +       [user]
> > +               this = this-is-included
> > +       EOF
> > +       cat >dont-include-that <<-\EOF &&
> > +       [user]
> > +               that = that-is-not-included
> > +       EOF
> > +       cat >>hasremoteurlTest/.git/config <<-EOF &&
> > +       [includeIf "hasconfig:remote.*.url:foo"]
> > +               path = "$(pwd)/include-this"
> > +       [includeIf "hasconfig:remote.*.url:bar"]
> > +               path = "$(pwd)/dont-include-that"
> > +       [remote "foo"]
> > +               url = foo
> 
> Which "foo" is relevant here?  The remote name, or the url value?
> Could they be given different values so that the testcase is a bit
> easier to read and understand?

Thanks for taking a look. Sorry for the late reply - I just got back
from vacation.

This is a good point - I'll change one of them.

> > +test_expect_success 'includeIf.hasconfig:remote.*.url respects last-config-wins' '
> > +       git init hasremoteurlTest &&
> > +       test_when_finished "rm -rf hasremoteurlTest" &&
> > +
> > +       cat >include-two-three <<-\EOF &&
> > +       [user]
> > +               two = included-config
> > +               three = included-config
> > +       EOF
> > +       cat >>hasremoteurlTest/.git/config <<-EOF &&
> > +       [remote "foo"]
> > +               url = foo
> 
> ...similarly here.

Noted.

> The testcases are very helpful.  I found myself confused when reading
> just the documentation about how it would be used.  Perhaps an example
> or two should be added to the documentation?

Will do. I notice that there is a section with examples - I'll add it
there.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v7 0/2] Conditional config includes based on remote URL
  2021-12-28  1:13   ` Elijah Newren
  2021-12-28 23:13     ` Glen Choo
@ 2022-01-10 19:22     ` Jonathan Tan
  2022-01-10 20:17       ` Elijah Newren
  1 sibling, 1 reply; 87+ messages in thread
From: Jonathan Tan @ 2022-01-10 19:22 UTC (permalink / raw)
  To: newren; +Cc: jonathantanmy, git, chooglen, gitster, stolee,
	Johannes.Schindelin

Elijah Newren <newren@gmail.com> writes:
> On Wed, Dec 15, 2021 at 7:25 AM Jonathan Tan <jonathantanmy@google.com> wrote:
> >
> > Thanks, everyone, for your comments. I've followed Glen's code
> > suggestion and Junio's documentation suggestion, as you can see in the
> > range-diff.
> 
> So, the basic idea is, in a setting like Google's, you can have users
> install additional files on their system out-of-band, and have the
> users specify a simple line in their configuration to make use of
> those additional files -- or portions thereof.  It's a way of easily
> providing potentially large blocks of pre-vetted configuration for
> users.
> 
> Seems to make sense.  (and I've read over the code lightly, so feel
> free to take this as an Acked-by.)

Thanks.

> But can I back up and comment on a bigger picture item?
> 
> This mechanism requires somehow getting additional files to the user
> separately; projects that span companies (git.git, linux.git, etc.)
> won't likely be able to make use of this.

Yes, they would also need to use a separate mechanism in addition to
Git.

> Scalar also has a mechanism for providing potentially large blocks of
> pre-vetted configuration for users.  It does so as part of a new
> top-level command.  And it does so with a very opinionated set of
> values that are not configurable.  Thus, while I'd like to use it,
> they use a configuration option that would break things badly at my
> $DAYJOB.  (Too many gradle plugins using jgit, which doesn't
> understand index.version=4 and will blow up with a very suboptimal
> error message when they see it.)  And, it's very specific to scalar;
> we probably don't want to add a new toplevel command everytime someone
> wants common configuration to be easily grabbed by some user.

Do you have more information on this? The closest thing I've seen is
"Scalar Config" under "Modifying Configuration Values" in [1], which
seems to be more about bundling additional tools (which may change
config, of course).

Unless you're referring to the config bundled in the Scalar tool itself,
in which case this patch set seems orthogonal and potentially
complementary - I was envisioning config being provided by a package
manager package, but Scalar could provide some too for users to use at
their own discretion.

[1] https://github.com/microsoft/git/blob/7a514b4c2d5df7fdd2f66f048010d8ddcb412d0b/contrib/scalar/docs/troubleshooting.md

> It would be nice if we could find some more generic solution.
> Granted, I can't think of any, and I don't think this comment should
> block this particular series (nor the scalar one), but I am worrying a
> little bit that we're getting multiple completely different solutions
> for the same general problem, and each brings caveats big enough to
> preclude many (most?) potential users.  I don't know what to do about
> that, especially since configuration that is too easy to propagate
> comes with big security problems, but I wanted to at least raise the
> issue and hope others have good ideas.  If nothing else, I want to
> raise awareness to avoid proliferation of similar
> pre-vetted-configuration-deployment mechanisms.  I'm CC'ing a couple
> scalar folks as well for that point.

That's a good point. As Glen said [2], it seems like transmitting config
itself (or, at least, hooks) through Git is something that we (the Git
project) don't want to do, so I have been working from the basis that
Git should just make use of config/hooks delivered through a non-Git
mechanism, and not deliver the config/hooks itself.

[2] https://lore.kernel.org/git/kl6lee5w5nng.fsf@chooglen-macbookpro.roam.corp.google.com/

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v7 0/2] Conditional config includes based on remote URL
  2022-01-10 19:22     ` Jonathan Tan
@ 2022-01-10 20:17       ` Elijah Newren
  2022-01-25 13:26         ` Scalar vs JGit, was " Johannes Schindelin
  0 siblings, 1 reply; 87+ messages in thread
From: Elijah Newren @ 2022-01-10 20:17 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: Git Mailing List, Glen Choo, Junio C Hamano, Derrick Stolee,
	Johannes Schindelin

On Mon, Jan 10, 2022 at 11:22 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
> > On Wed, Dec 15, 2021 at 7:25 AM Jonathan Tan <jonathantanmy@google.com> wrote:
> > >
> > > Thanks, everyone, for your comments. I've followed Glen's code
> > > suggestion and Junio's documentation suggestion, as you can see in the
> > > range-diff.
> >
> > So, the basic idea is, in a setting like Google's, you can have users
> > install additional files on their system out-of-band, and have the
> > users specify a simple line in their configuration to make use of
> > those additional files -- or portions thereof.  It's a way of easily
> > providing potentially large blocks of pre-vetted configuration for
> > users.
> >
> > Seems to make sense.  (and I've read over the code lightly, so feel
> > free to take this as an Acked-by.)
>
> Thanks.
>
> > But can I back up and comment on a bigger picture item?
> >
> > This mechanism requires somehow getting additional files to the user
> > separately; projects that span companies (git.git, linux.git, etc.)
> > won't likely be able to make use of this.
>
> Yes, they would also need to use a separate mechanism in addition to
> Git.
>
> > Scalar also has a mechanism for providing potentially large blocks of
> > pre-vetted configuration for users.  It does so as part of a new
> > top-level command.  And it does so with a very opinionated set of
> > values that are not configurable.  Thus, while I'd like to use it,
> > they use a configuration option that would break things badly at my
> > $DAYJOB.  (Too many gradle plugins using jgit, which doesn't
> > understand index.version=4 and will blow up with a very suboptimal
> > error message when they see it.)  And, it's very specific to scalar;
> > we probably don't want to add a new toplevel command everytime someone
> > wants common configuration to be easily grabbed by some user.
>
> Do you have more information on this? The closest thing I've seen is
> "Scalar Config" under "Modifying Configuration Values" in [1], which
> seems to be more about bundling additional tools (which may change
> config, of course).
>
> Unless you're referring to the config bundled in the Scalar tool itself,
> in which case this patch set seems orthogonal and potentially
> complementary - I was envisioning config being provided by a package
> manager package, but Scalar could provide some too for users to use at
> their own discretion.
>
> [1] https://github.com/microsoft/git/blob/7a514b4c2d5df7fdd2f66f048010d8ddcb412d0b/contrib/scalar/docs/troubleshooting.md

Yes, I was referring to the config hardcoded in the Scalar tool itself
(see set_recommended_config() in
https://lore.kernel.org/git/4439ab4de0bc3f48a6bdcf4b5165b16fad792ebd.1638538470.git.gitgitgadget@gmail.com/).

I agree they are different solutions to "help others setup config in a
pre-vetted way", that the two don't seem to conflict, and one can't be
implemented in terms of the other.  It might even be possible for
someone somewhere to simultaneously take advantage of both (not sure
if anyone would try, but I don't forsee problems in doing so, except
in the narrow case that both schemes try to set the same config and
there are worries about which one "wins", which might boil down to
whether the include directive came first in the config file or the
specific config value that scalar set).

> > It would be nice if we could find some more generic solution.
> > Granted, I can't think of any, and I don't think this comment should
> > block this particular series (nor the scalar one), but I am worrying a
> > little bit that we're getting multiple completely different solutions
> > for the same general problem, and each brings caveats big enough to
> > preclude many (most?) potential users.  I don't know what to do about
> > that, especially since configuration that is too easy to propagate
> > comes with big security problems, but I wanted to at least raise the
> > issue and hope others have good ideas.  If nothing else, I want to
> > raise awareness to avoid proliferation of similar
> > pre-vetted-configuration-deployment mechanisms.  I'm CC'ing a couple
> > scalar folks as well for that point.
>
> That's a good point. As Glen said [2], it seems like transmitting config
> itself (or, at least, hooks) through Git is something that we (the Git
> project) don't want to do, so I have been working from the basis that
> Git should just make use of config/hooks delivered through a non-Git
> mechanism, and not deliver the config/hooks itself.
>
> [2] https://lore.kernel.org/git/kl6lee5w5nng.fsf@chooglen-macbookpro.roam.corp.google.com/

Yeah, makes sense.  And I don't know any better solutions.  I guess
all I'm really saying is that if a third narrowly targetted way to
provide pre-vetted configuration shows up on the list, it may be time
to ask folks to step back and try to find a more generic solution.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v8 0/2] Conditional config includes based on remote URL
  2021-10-12 22:57 [RFC PATCH 0/2] Conditional config includes based on remote URL Jonathan Tan
                   ` (10 preceding siblings ...)
  2021-12-14 21:31 ` [PATCH v7 0/2] Conditional config includes based on remote URL Jonathan Tan
@ 2022-01-18 17:47 ` Jonathan Tan
  2022-01-18 17:47   ` [PATCH v8 1/2] config: make git_config_include() static Jonathan Tan
                     ` (2 more replies)
  11 siblings, 3 replies; 87+ messages in thread
From: Jonathan Tan @ 2022-01-18 17:47 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, chooglen, newren

Thanks everyone for your review. v8 follows Elijah Newren's comments
about making URLs more identifiable in tests, and also including an
example in documentation.

Jonathan Tan (2):
  config: make git_config_include() static
  config: include file if remote URL matches a glob

 Documentation/config.txt |  35 +++++++++++
 config.c                 | 132 ++++++++++++++++++++++++++++++++++++---
 config.h                 |  46 ++++----------
 t/t1300-config.sh        | 118 ++++++++++++++++++++++++++++++++++
 4 files changed, 290 insertions(+), 41 deletions(-)

Range-diff against v7:
1:  b2dcae03ed = 1:  b2dcae03ed config: make git_config_include() static
2:  7c70089074 ! 2:  6691e39c82 config: include file if remote URL matches a glob
    @@ Documentation/config.txt: all branches that begin with `foo/`. This is useful if
      A few more notes on matching via `gitdir` and `gitdir/i`:
      
       * Symlinks in `$GIT_DIR` are not resolved before matching.
    +@@ Documentation/config.txt: Example
    + ; currently checked out
    + [includeIf "onbranch:foo-branch"]
    + 	path = foo.inc
    ++
    ++; include only if a remote with the given URL exists (note
    ++; that such a URL may be provided later in a file or in a
    ++; file read after this file is read, as seen in this example)
    ++[includeIf "hasconfig:remote.*.url:https://example.com/**"]
    ++	path = foo.inc
    ++[remote "origin"]
    ++	url = https://example.com/git
    + ----
    + 
    + Values
     
      ## config.c ##
     @@ config.c: struct config_include_data {
    @@ t/t1300-config.sh: test_expect_success '--get and --get-all with --fixed-value'
     +		that = that-is-not-included
     +	EOF
     +	cat >>hasremoteurlTest/.git/config <<-EOF &&
    -+	[includeIf "hasconfig:remote.*.url:foo"]
    ++	[includeIf "hasconfig:remote.*.url:foourl"]
     +		path = "$(pwd)/include-this"
    -+	[includeIf "hasconfig:remote.*.url:bar"]
    ++	[includeIf "hasconfig:remote.*.url:barurl"]
     +		path = "$(pwd)/dont-include-that"
     +	[remote "foo"]
    -+		url = foo
    ++		url = foourl
     +	EOF
     +
     +	echo this-is-included >expect-this &&
    @@ t/t1300-config.sh: test_expect_success '--get and --get-all with --fixed-value'
     +	EOF
     +	cat >>hasremoteurlTest/.git/config <<-EOF &&
     +	[remote "foo"]
    -+		url = foo
    ++		url = foourl
     +	[user]
     +		one = main-config
     +		two = main-config
    -+	[includeIf "hasconfig:remote.*.url:foo"]
    ++	[includeIf "hasconfig:remote.*.url:foourl"]
     +		path = "$(pwd)/include-two-three"
     +	[user]
     +		three = main-config
    @@ t/t1300-config.sh: test_expect_success '--get and --get-all with --fixed-value'
     +
     +	cat >include-with-url <<-\EOF &&
     +	[remote "bar"]
    -+		url = bar
    ++		url = barurl
     +	EOF
     +	cat >>hasremoteurlTest/.git/config <<-EOF &&
    -+	[includeIf "hasconfig:remote.*.url:foo"]
    ++	[includeIf "hasconfig:remote.*.url:foourl"]
     +		path = "$(pwd)/include-with-url"
     +	EOF
     +
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v8 1/2] config: make git_config_include() static
  2022-01-18 17:47 ` [PATCH v8 " Jonathan Tan
@ 2022-01-18 17:47   ` Jonathan Tan
  2022-01-18 17:47   ` [PATCH v8 2/2] config: include file if remote URL matches a glob Jonathan Tan
  2022-01-18 20:54   ` [PATCH v8 0/2] Conditional config includes based on remote URL Elijah Newren
  2 siblings, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2022-01-18 17:47 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, chooglen, newren

It is not used from outside the file in which it is declared.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 config.c | 12 +++++++++++-
 config.h | 37 ++++---------------------------------
 2 files changed, 15 insertions(+), 34 deletions(-)

diff --git a/config.c b/config.c
index 2dcbe901b6..94ad5ce913 100644
--- a/config.c
+++ b/config.c
@@ -120,6 +120,16 @@ static long config_buf_ftell(struct config_source *conf)
 	return conf->u.buf.pos;
 }
 
+struct config_include_data {
+	int depth;
+	config_fn_t fn;
+	void *data;
+	const struct config_options *opts;
+};
+#define CONFIG_INCLUDE_INIT { 0 }
+
+static int git_config_include(const char *var, const char *value, void *data);
+
 #define MAX_INCLUDE_DEPTH 10
 static const char include_depth_advice[] = N_(
 "exceeded maximum include depth (%d) while including\n"
@@ -306,7 +316,7 @@ static int include_condition_is_true(const struct config_options *opts,
 	return 0;
 }
 
-int git_config_include(const char *var, const char *value, void *data)
+static int git_config_include(const char *var, const char *value, void *data)
 {
 	struct config_include_data *inc = data;
 	const char *cond, *key;
diff --git a/config.h b/config.h
index f119de0130..48a5e472ca 100644
--- a/config.h
+++ b/config.h
@@ -126,6 +126,8 @@ int git_default_config(const char *, const char *, void *);
 /**
  * Read a specific file in git-config format.
  * This function takes the same callback and data parameters as `git_config`.
+ *
+ * Unlike git_config(), this function does not respect includes.
  */
 int git_config_from_file(config_fn_t fn, const char *, void *);
 
@@ -158,6 +160,8 @@ void read_very_early_config(config_fn_t cb, void *data);
  * will first feed the user-wide one to the callback, and then the
  * repo-specific one; by overwriting, the higher-priority repo-specific
  * value is left at the end).
+ *
+ * Unlike git_config_from_file(), this function respects includes.
  */
 void git_config(config_fn_t fn, void *);
 
@@ -338,39 +342,6 @@ const char *current_config_origin_type(void);
 const char *current_config_name(void);
 int current_config_line(void);
 
-/**
- * Include Directives
- * ------------------
- *
- * By default, the config parser does not respect include directives.
- * However, a caller can use the special `git_config_include` wrapper
- * callback to support them. To do so, you simply wrap your "real" callback
- * function and data pointer in a `struct config_include_data`, and pass
- * the wrapper to the regular config-reading functions. For example:
- *
- * -------------------------------------------
- * int read_file_with_include(const char *file, config_fn_t fn, void *data)
- * {
- * struct config_include_data inc = CONFIG_INCLUDE_INIT;
- * inc.fn = fn;
- * inc.data = data;
- * return git_config_from_file(git_config_include, file, &inc);
- * }
- * -------------------------------------------
- *
- * `git_config` respects includes automatically. The lower-level
- * `git_config_from_file` does not.
- *
- */
-struct config_include_data {
-	int depth;
-	config_fn_t fn;
-	void *data;
-	const struct config_options *opts;
-};
-#define CONFIG_INCLUDE_INIT { 0 }
-int git_config_include(const char *name, const char *value, void *data);
-
 /*
  * Match and parse a config key of the form:
  *
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v8 2/2] config: include file if remote URL matches a glob
  2022-01-18 17:47 ` [PATCH v8 " Jonathan Tan
  2022-01-18 17:47   ` [PATCH v8 1/2] config: make git_config_include() static Jonathan Tan
@ 2022-01-18 17:47   ` Jonathan Tan
  2022-01-18 20:54   ` [PATCH v8 0/2] Conditional config includes based on remote URL Elijah Newren
  2 siblings, 0 replies; 87+ messages in thread
From: Jonathan Tan @ 2022-01-18 17:47 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, chooglen, newren

This is a feature that supports config file inclusion conditional on
whether the repo has a remote with a URL that matches a glob.

Similar to my previous work on remote-suggested hooks [1], the main
motivation is to allow remote repo administrators to provide recommended
configs in a way that can be consumed more easily (e.g. through a
package installable by a package manager - it could, for example,
contain a file to be included conditionally and a post-install script
that adds the include directive to the system-wide config file).

In order to do this, Git reruns the config parsing mechanism upon
noticing the first URL-conditional include in order to find all remote
URLs, and these remote URLs are then used to determine if that first and
all subsequent includes are executed. Remote URLs are not allowed to be
configued in any URL-conditionally-included file.

[1] https://lore.kernel.org/git/cover.1623881977.git.jonathantanmy@google.com/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/config.txt |  35 ++++++++++++
 config.c                 | 120 ++++++++++++++++++++++++++++++++++++---
 config.h                 |   9 +++
 t/t1300-config.sh        | 118 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 275 insertions(+), 7 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0c0e6b859f..5a5205952e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -159,6 +159,33 @@ all branches that begin with `foo/`. This is useful if your branches are
 organized hierarchically and you would like to apply a configuration to
 all the branches in that hierarchy.
 
+`hasconfig:remote.*.url:`::
+	The data that follows this keyword is taken to
+	be a pattern with standard globbing wildcards and two
+	additional ones, `**/` and `/**`, that can match multiple
+	components. The first time this keyword is seen, the rest of
+	the config files will be scanned for remote URLs (without
+	applying any values). If there exists at least one remote URL
+	that matches this pattern, the include condition is met.
++
+Files included by this option (directly or indirectly) are not allowed
+to contain remote URLs.
++
+Note that unlike other includeIf conditions, resolving this condition
+relies on information that is not yet known at the point of reading the
+condition. A typical use case is this option being present as a
+system-level or global-level config, and the remote URL being in a
+local-level config; hence the need to scan ahead when resolving this
+condition. In order to avoid the chicken-and-egg problem in which
+potentially-included files can affect whether such files are potentially
+included, Git breaks the cycle by prohibiting these files from affecting
+the resolution of these conditions (thus, prohibiting them from
+declaring remote URLs).
++
+As for the naming of this keyword, it is for forwards compatibiliy with
+a naming scheme that supports more variable-based include conditions,
+but currently Git only supports the exact keyword described above.
+
 A few more notes on matching via `gitdir` and `gitdir/i`:
 
  * Symlinks in `$GIT_DIR` are not resolved before matching.
@@ -226,6 +253,14 @@ Example
 ; currently checked out
 [includeIf "onbranch:foo-branch"]
 	path = foo.inc
+
+; include only if a remote with the given URL exists (note
+; that such a URL may be provided later in a file or in a
+; file read after this file is read, as seen in this example)
+[includeIf "hasconfig:remote.*.url:https://example.com/**"]
+	path = foo.inc
+[remote "origin"]
+	url = https://example.com/git
 ----
 
 Values
diff --git a/config.c b/config.c
index 94ad5ce913..ac4534ecf2 100644
--- a/config.c
+++ b/config.c
@@ -125,6 +125,12 @@ struct config_include_data {
 	config_fn_t fn;
 	void *data;
 	const struct config_options *opts;
+	struct git_config_source *config_source;
+
+	/*
+	 * All remote URLs discovered when reading all config files.
+	 */
+	struct string_list *remote_urls;
 };
 #define CONFIG_INCLUDE_INIT { 0 }
 
@@ -301,9 +307,92 @@ static int include_by_branch(const char *cond, size_t cond_len)
 	return ret;
 }
 
-static int include_condition_is_true(const struct config_options *opts,
+static int add_remote_url(const char *var, const char *value, void *data)
+{
+	struct string_list *remote_urls = data;
+	const char *remote_name;
+	size_t remote_name_len;
+	const char *key;
+
+	if (!parse_config_key(var, "remote", &remote_name, &remote_name_len,
+			      &key) &&
+	    remote_name &&
+	    !strcmp(key, "url"))
+		string_list_append(remote_urls, value);
+	return 0;
+}
+
+static void populate_remote_urls(struct config_include_data *inc)
+{
+	struct config_options opts;
+
+	struct config_source *store_cf = cf;
+	struct key_value_info *store_kvi = current_config_kvi;
+	enum config_scope store_scope = current_parsing_scope;
+
+	opts = *inc->opts;
+	opts.unconditional_remote_url = 1;
+
+	cf = NULL;
+	current_config_kvi = NULL;
+	current_parsing_scope = 0;
+
+	inc->remote_urls = xmalloc(sizeof(*inc->remote_urls));
+	string_list_init_dup(inc->remote_urls);
+	config_with_options(add_remote_url, inc->remote_urls, inc->config_source, &opts);
+
+	cf = store_cf;
+	current_config_kvi = store_kvi;
+	current_parsing_scope = store_scope;
+}
+
+static int forbid_remote_url(const char *var, const char *value, void *data)
+{
+	const char *remote_name;
+	size_t remote_name_len;
+	const char *key;
+
+	if (!parse_config_key(var, "remote", &remote_name, &remote_name_len,
+			      &key) &&
+	    remote_name &&
+	    !strcmp(key, "url"))
+		die(_("remote URLs cannot be configured in file directly or indirectly included by includeIf.hasconfig:remote.*.url"));
+	return 0;
+}
+
+static int at_least_one_url_matches_glob(const char *glob, int glob_len,
+					 struct string_list *remote_urls)
+{
+	struct strbuf pattern = STRBUF_INIT;
+	struct string_list_item *url_item;
+	int found = 0;
+
+	strbuf_add(&pattern, glob, glob_len);
+	for_each_string_list_item(url_item, remote_urls) {
+		if (!wildmatch(pattern.buf, url_item->string, WM_PATHNAME)) {
+			found = 1;
+			break;
+		}
+	}
+	strbuf_release(&pattern);
+	return found;
+}
+
+static int include_by_remote_url(struct config_include_data *inc,
+		const char *cond, size_t cond_len)
+{
+	if (inc->opts->unconditional_remote_url)
+		return 1;
+	if (!inc->remote_urls)
+		populate_remote_urls(inc);
+	return at_least_one_url_matches_glob(cond, cond_len,
+					     inc->remote_urls);
+}
+
+static int include_condition_is_true(struct config_include_data *inc,
 				     const char *cond, size_t cond_len)
 {
+	const struct config_options *opts = inc->opts;
 
 	if (skip_prefix_mem(cond, cond_len, "gitdir:", &cond, &cond_len))
 		return include_by_gitdir(opts, cond, cond_len, 0);
@@ -311,6 +400,9 @@ static int include_condition_is_true(const struct config_options *opts,
 		return include_by_gitdir(opts, cond, cond_len, 1);
 	else if (skip_prefix_mem(cond, cond_len, "onbranch:", &cond, &cond_len))
 		return include_by_branch(cond, cond_len);
+	else if (skip_prefix_mem(cond, cond_len, "hasconfig:remote.*.url:", &cond,
+				   &cond_len))
+		return include_by_remote_url(inc, cond, cond_len);
 
 	/* unknown conditionals are always false */
 	return 0;
@@ -335,9 +427,15 @@ static int git_config_include(const char *var, const char *value, void *data)
 		ret = handle_path_include(value, inc);
 
 	if (!parse_config_key(var, "includeif", &cond, &cond_len, &key) &&
-	    (cond && include_condition_is_true(inc->opts, cond, cond_len)) &&
-	    !strcmp(key, "path"))
+	    cond && include_condition_is_true(inc, cond, cond_len) &&
+	    !strcmp(key, "path")) {
+		config_fn_t old_fn = inc->fn;
+
+		if (inc->opts->unconditional_remote_url)
+			inc->fn = forbid_remote_url;
 		ret = handle_path_include(value, inc);
+		inc->fn = old_fn;
+	}
 
 	return ret;
 }
@@ -1933,11 +2031,13 @@ int config_with_options(config_fn_t fn, void *data,
 			const struct config_options *opts)
 {
 	struct config_include_data inc = CONFIG_INCLUDE_INIT;
+	int ret;
 
 	if (opts->respect_includes) {
 		inc.fn = fn;
 		inc.data = data;
 		inc.opts = opts;
+		inc.config_source = config_source;
 		fn = git_config_include;
 		data = &inc;
 	}
@@ -1950,17 +2050,23 @@ int config_with_options(config_fn_t fn, void *data,
 	 * regular lookup sequence.
 	 */
 	if (config_source && config_source->use_stdin) {
-		return git_config_from_stdin(fn, data);
+		ret = git_config_from_stdin(fn, data);
 	} else if (config_source && config_source->file) {
-		return git_config_from_file(fn, config_source->file, data);
+		ret = git_config_from_file(fn, config_source->file, data);
 	} else if (config_source && config_source->blob) {
 		struct repository *repo = config_source->repo ?
 			config_source->repo : the_repository;
-		return git_config_from_blob_ref(fn, repo, config_source->blob,
+		ret = git_config_from_blob_ref(fn, repo, config_source->blob,
 						data);
+	} else {
+		ret = do_git_config_sequence(opts, fn, data);
 	}
 
-	return do_git_config_sequence(opts, fn, data);
+	if (inc.remote_urls) {
+		string_list_clear(inc.remote_urls, 0);
+		FREE_AND_NULL(inc.remote_urls);
+	}
+	return ret;
 }
 
 static void configset_iter(struct config_set *cs, config_fn_t fn, void *data)
diff --git a/config.h b/config.h
index 48a5e472ca..ab0106d287 100644
--- a/config.h
+++ b/config.h
@@ -89,6 +89,15 @@ struct config_options {
 	unsigned int ignore_worktree : 1;
 	unsigned int ignore_cmdline : 1;
 	unsigned int system_gently : 1;
+
+	/*
+	 * For internal use. Include all includeif.hasremoteurl paths without
+	 * checking if the repo has that remote URL, and when doing so, verify
+	 * that files included in this way do not configure any remote URLs
+	 * themselves.
+	 */
+	unsigned int unconditional_remote_url : 1;
+
 	const char *commondir;
 	const char *git_dir;
 	config_parser_event_fn_t event_fn;
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 9ff46f3b04..c6b5911c4d 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -2387,4 +2387,122 @@ test_expect_success '--get and --get-all with --fixed-value' '
 	test_must_fail git config --file=config --get-regexp --fixed-value fixed+ non-existent
 '
 
+test_expect_success 'includeIf.hasconfig:remote.*.url' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	cat >include-this <<-\EOF &&
+	[user]
+		this = this-is-included
+	EOF
+	cat >dont-include-that <<-\EOF &&
+	[user]
+		that = that-is-not-included
+	EOF
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[includeIf "hasconfig:remote.*.url:foourl"]
+		path = "$(pwd)/include-this"
+	[includeIf "hasconfig:remote.*.url:barurl"]
+		path = "$(pwd)/dont-include-that"
+	[remote "foo"]
+		url = foourl
+	EOF
+
+	echo this-is-included >expect-this &&
+	git -C hasremoteurlTest config --get user.this >actual-this &&
+	test_cmp expect-this actual-this &&
+
+	test_must_fail git -C hasremoteurlTest config --get user.that
+'
+
+test_expect_success 'includeIf.hasconfig:remote.*.url respects last-config-wins' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	cat >include-two-three <<-\EOF &&
+	[user]
+		two = included-config
+		three = included-config
+	EOF
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[remote "foo"]
+		url = foourl
+	[user]
+		one = main-config
+		two = main-config
+	[includeIf "hasconfig:remote.*.url:foourl"]
+		path = "$(pwd)/include-two-three"
+	[user]
+		three = main-config
+	EOF
+
+	echo main-config >expect-main-config &&
+	echo included-config >expect-included-config &&
+
+	git -C hasremoteurlTest config --get user.one >actual &&
+	test_cmp expect-main-config actual &&
+
+	git -C hasremoteurlTest config --get user.two >actual &&
+	test_cmp expect-included-config actual &&
+
+	git -C hasremoteurlTest config --get user.three >actual &&
+	test_cmp expect-main-config actual
+'
+
+test_expect_success 'includeIf.hasconfig:remote.*.url globs' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	printf "[user]\ndss = yes\n" >double-star-start &&
+	printf "[user]\ndse = yes\n" >double-star-end &&
+	printf "[user]\ndsm = yes\n" >double-star-middle &&
+	printf "[user]\nssm = yes\n" >single-star-middle &&
+	printf "[user]\nno = no\n" >no &&
+
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[remote "foo"]
+		url = https://foo/bar/baz
+	[includeIf "hasconfig:remote.*.url:**/baz"]
+		path = "$(pwd)/double-star-start"
+	[includeIf "hasconfig:remote.*.url:**/nomatch"]
+		path = "$(pwd)/no"
+	[includeIf "hasconfig:remote.*.url:https:/**"]
+		path = "$(pwd)/double-star-end"
+	[includeIf "hasconfig:remote.*.url:nomatch:/**"]
+		path = "$(pwd)/no"
+	[includeIf "hasconfig:remote.*.url:https:/**/baz"]
+		path = "$(pwd)/double-star-middle"
+	[includeIf "hasconfig:remote.*.url:https:/**/nomatch"]
+		path = "$(pwd)/no"
+	[includeIf "hasconfig:remote.*.url:https://*/bar/baz"]
+		path = "$(pwd)/single-star-middle"
+	[includeIf "hasconfig:remote.*.url:https://*/baz"]
+		path = "$(pwd)/no"
+	EOF
+
+	git -C hasremoteurlTest config --get user.dss &&
+	git -C hasremoteurlTest config --get user.dse &&
+	git -C hasremoteurlTest config --get user.dsm &&
+	git -C hasremoteurlTest config --get user.ssm &&
+	test_must_fail git -C hasremoteurlTest config --get user.no
+'
+
+test_expect_success 'includeIf.hasconfig:remote.*.url forbids remote url in such included files' '
+	git init hasremoteurlTest &&
+	test_when_finished "rm -rf hasremoteurlTest" &&
+
+	cat >include-with-url <<-\EOF &&
+	[remote "bar"]
+		url = barurl
+	EOF
+	cat >>hasremoteurlTest/.git/config <<-EOF &&
+	[includeIf "hasconfig:remote.*.url:foourl"]
+		path = "$(pwd)/include-with-url"
+	EOF
+
+	# test with any Git command
+	test_must_fail git -C hasremoteurlTest status 2>err &&
+	grep "fatal: remote URLs cannot be configured in file directly or indirectly included by includeIf.hasconfig:remote.*.url" err
+'
+
 test_done
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH v8 0/2] Conditional config includes based on remote URL
  2022-01-18 17:47 ` [PATCH v8 " Jonathan Tan
  2022-01-18 17:47   ` [PATCH v8 1/2] config: make git_config_include() static Jonathan Tan
  2022-01-18 17:47   ` [PATCH v8 2/2] config: include file if remote URL matches a glob Jonathan Tan
@ 2022-01-18 20:54   ` Elijah Newren
  2 siblings, 0 replies; 87+ messages in thread
From: Elijah Newren @ 2022-01-18 20:54 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git Mailing List, Glen Choo

On Tue, Jan 18, 2022 at 9:47 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Thanks everyone for your review. v8 follows Elijah Newren's comments
> about making URLs more identifiable in tests, and also including an
> example in documentation.

Thanks for that; this version looks good to me.

> Jonathan Tan (2):
>   config: make git_config_include() static
>   config: include file if remote URL matches a glob
>
>  Documentation/config.txt |  35 +++++++++++
>  config.c                 | 132 ++++++++++++++++++++++++++++++++++++---
>  config.h                 |  46 ++++----------
>  t/t1300-config.sh        | 118 ++++++++++++++++++++++++++++++++++
>  4 files changed, 290 insertions(+), 41 deletions(-)
>
> Range-diff against v7:
> 1:  b2dcae03ed = 1:  b2dcae03ed config: make git_config_include() static
> 2:  7c70089074 ! 2:  6691e39c82 config: include file if remote URL matches a glob
>     @@ Documentation/config.txt: all branches that begin with `foo/`. This is useful if
>       A few more notes on matching via `gitdir` and `gitdir/i`:
>
>        * Symlinks in `$GIT_DIR` are not resolved before matching.
>     +@@ Documentation/config.txt: Example
>     + ; currently checked out
>     + [includeIf "onbranch:foo-branch"]
>     +   path = foo.inc
>     ++
>     ++; include only if a remote with the given URL exists (note
>     ++; that such a URL may be provided later in a file or in a
>     ++; file read after this file is read, as seen in this example)
>     ++[includeIf "hasconfig:remote.*.url:https://example.com/**"]
>     ++  path = foo.inc
>     ++[remote "origin"]
>     ++  url = https://example.com/git
>     + ----
>     +
>     + Values
>
>       ## config.c ##
>      @@ config.c: struct config_include_data {
>     @@ t/t1300-config.sh: test_expect_success '--get and --get-all with --fixed-value'
>      +          that = that-is-not-included
>      +  EOF
>      +  cat >>hasremoteurlTest/.git/config <<-EOF &&
>     -+  [includeIf "hasconfig:remote.*.url:foo"]
>     ++  [includeIf "hasconfig:remote.*.url:foourl"]
>      +          path = "$(pwd)/include-this"
>     -+  [includeIf "hasconfig:remote.*.url:bar"]
>     ++  [includeIf "hasconfig:remote.*.url:barurl"]
>      +          path = "$(pwd)/dont-include-that"
>      +  [remote "foo"]
>     -+          url = foo
>     ++          url = foourl
>      +  EOF
>      +
>      +  echo this-is-included >expect-this &&
>     @@ t/t1300-config.sh: test_expect_success '--get and --get-all with --fixed-value'
>      +  EOF
>      +  cat >>hasremoteurlTest/.git/config <<-EOF &&
>      +  [remote "foo"]
>     -+          url = foo
>     ++          url = foourl
>      +  [user]
>      +          one = main-config
>      +          two = main-config
>     -+  [includeIf "hasconfig:remote.*.url:foo"]
>     ++  [includeIf "hasconfig:remote.*.url:foourl"]
>      +          path = "$(pwd)/include-two-three"
>      +  [user]
>      +          three = main-config
>     @@ t/t1300-config.sh: test_expect_success '--get and --get-all with --fixed-value'
>      +
>      +  cat >include-with-url <<-\EOF &&
>      +  [remote "bar"]
>     -+          url = bar
>     ++          url = barurl
>      +  EOF
>      +  cat >>hasremoteurlTest/.git/config <<-EOF &&
>     -+  [includeIf "hasconfig:remote.*.url:foo"]
>     ++  [includeIf "hasconfig:remote.*.url:foourl"]
>      +          path = "$(pwd)/include-with-url"
>      +  EOF
>      +
> --
> 2.34.1.703.g22d0c6ccf7-goog
>

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Scalar vs JGit, was Re: [PATCH v7 0/2] Conditional config includes based on remote URL
  2022-01-10 20:17       ` Elijah Newren
@ 2022-01-25 13:26         ` Johannes Schindelin
  0 siblings, 0 replies; 87+ messages in thread
From: Johannes Schindelin @ 2022-01-25 13:26 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Jonathan Tan, Git Mailing List, Glen Choo, Junio C Hamano,
	Derrick Stolee

Hi Elijah,

On Mon, 10 Jan 2022, Elijah Newren wrote:

> On Mon, Jan 10, 2022 at 11:22 AM Jonathan Tan <jonathantanmy@google.com> wrote:
> >
> > Elijah Newren <newren@gmail.com> writes:
> >
> > > Scalar also has a mechanism for providing potentially large blocks
> > > of pre-vetted configuration for users.  It does so as part of a new
> > > top-level command.  And it does so with a very opinionated set of
> > > values that are not configurable.  Thus, while I'd like to use it,
> > > they use a configuration option that would break things badly at my
> > > $DAYJOB.  (Too many gradle plugins using jgit, which doesn't
> > > understand index.version=4 and will blow up with a very suboptimal
> > > error message when they see it.)  And, it's very specific to scalar;
> > > we probably don't want to add a new toplevel command everytime
> > > someone wants common configuration to be easily grabbed by some
> > > user.
> >
> > Do you have more information on this? The closest thing I've seen is
> > "Scalar Config" under "Modifying Configuration Values" in [1], which
> > seems to be more about bundling additional tools (which may change
> > config, of course).
> >
> > Unless you're referring to the config bundled in the Scalar tool itself,
> > in which case this patch set seems orthogonal and potentially
> > complementary - I was envisioning config being provided by a package
> > manager package, but Scalar could provide some too for users to use at
> > their own discretion.
> >
> > [1] https://github.com/microsoft/git/blob/7a514b4c2d5df7fdd2f66f048010d8ddcb412d0b/contrib/scalar/docs/troubleshooting.md
>
> Yes, I was referring to the config hardcoded in the Scalar tool itself
> (see set_recommended_config() in
> https://lore.kernel.org/git/4439ab4de0bc3f48a6bdcf4b5165b16fad792ebd.1638538470.git.gitgitgadget@gmail.com/).

I was kind of thinking that such problems might be solved via introducing
e.g. `scalar.ensureJGitCompatibility = true` (which should be a relatively
trivial patch to write).

What do you think?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 87+ messages in thread

end of thread, other threads:[~2022-01-25 13:35 UTC | newest]

Thread overview: 87+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-12 22:57 [RFC PATCH 0/2] Conditional config includes based on remote URL Jonathan Tan
2021-10-12 22:57 ` [RFC PATCH 1/2] config: make git_config_include() static Jonathan Tan
2021-10-12 23:07   ` Jeff King
2021-10-12 23:26   ` Junio C Hamano
2021-10-13  8:26   ` Ævar Arnfjörð Bjarmason
2021-10-13 17:00     ` Junio C Hamano
2021-10-13 18:13       ` Jonathan Tan
2021-10-12 22:57 ` [RFC PATCH 2/2] config: include file if remote URL matches a glob Jonathan Tan
2021-10-12 23:30   ` Jeff King
2021-10-13 18:33     ` Jonathan Tan
2021-10-27 11:40       ` Jeff King
2021-10-27 17:23         ` Jonathan Tan
2021-10-12 23:48   ` Junio C Hamano
2021-10-13 19:52     ` Jonathan Tan
2021-10-13  0:46 ` [RFC PATCH 0/2] Conditional config includes based on remote URL brian m. carlson
2021-10-13 18:17   ` Jonathan Tan
2021-10-18 20:48 ` Jonathan Tan
2021-10-22  3:12   ` Emily Shaffer
2021-10-27 11:55   ` Jeff King
2021-10-27 17:52     ` Jonathan Tan
2021-10-27 20:32       ` Jeff King
2021-10-25 13:03 ` Ævar Arnfjörð Bjarmason
2021-10-25 18:53   ` Jonathan Tan
2021-10-26 10:12     ` Ævar Arnfjörð Bjarmason
2021-10-29 17:31 ` [WIP v2 " Jonathan Tan
2021-10-29 17:31   ` [WIP v2 1/2] config: make git_config_include() static Jonathan Tan
2021-11-05 19:45     ` Emily Shaffer
2021-10-29 17:31   ` [WIP v2 2/2] config: include file if remote URL matches a glob Jonathan Tan
2021-11-05 20:24     ` Emily Shaffer
2021-11-06  4:41       ` Ævar Arnfjörð Bjarmason
2021-11-09  0:25         ` Jonathan Tan
2021-11-09  0:22       ` Jonathan Tan
2021-11-16  0:00 ` [PATCH v3 0/2] Conditional config includes based on remote URL Jonathan Tan
2021-11-16  0:00   ` [PATCH v3 1/2] config: make git_config_include() static Jonathan Tan
2021-11-16  0:00   ` [PATCH v3 2/2] config: include file if remote URL matches a glob Jonathan Tan
2021-11-22 22:59     ` Glen Choo
2021-11-29 17:53       ` Jonathan Tan
2021-11-23  1:22     ` Junio C Hamano
2021-11-29 18:18       ` Jonathan Tan
2021-12-01 18:51         ` Junio C Hamano
2021-12-02 23:14           ` Jonathan Tan
2021-11-23  1:27     ` Ævar Arnfjörð Bjarmason
2021-11-29 18:33       ` Jonathan Tan
2021-11-29 20:50         ` Ævar Arnfjörð Bjarmason
2021-11-29 20:23 ` [PATCH v4 0/2] Conditional config includes based on remote URL Jonathan Tan
2021-11-29 20:23   ` [PATCH v4 1/2] config: make git_config_include() static Jonathan Tan
2021-11-29 20:23   ` [PATCH v4 2/2] config: include file if remote URL matches a glob Jonathan Tan
2021-12-02  6:57     ` Junio C Hamano
2021-12-02 17:41       ` Jonathan Tan
2021-11-29 20:48   ` [PATCH v4 0/2] Conditional config includes based on remote URL Ævar Arnfjörð Bjarmason
2021-11-30  7:51     ` Junio C Hamano
2021-12-02 23:31 ` [PATCH v5 " Jonathan Tan
2021-12-02 23:31   ` [PATCH v5 1/2] config: make git_config_include() static Jonathan Tan
2021-12-02 23:31   ` [PATCH v5 2/2] config: include file if remote URL matches a glob Jonathan Tan
2021-12-06 22:32     ` Glen Choo
2021-12-07 17:53       ` Jonathan Tan
2021-12-06 18:57   ` [PATCH v5 0/2] Conditional config includes based on remote URL Ævar Arnfjörð Bjarmason
2021-12-07 17:46     ` Jonathan Tan
2021-12-07 17:56       ` Ævar Arnfjörð Bjarmason
2021-12-07 18:52         ` Jonathan Tan
2021-12-07 23:23 ` [PATCH v6 " Jonathan Tan
2021-12-07 23:23   ` [PATCH v6 1/2] config: make git_config_include() static Jonathan Tan
2021-12-07 23:23   ` [PATCH v6 2/2] config: include file if remote URL matches a glob Jonathan Tan
2021-12-08 19:19     ` Glen Choo
2021-12-09 22:16       ` Jonathan Tan
2021-12-08 19:55     ` Glen Choo
2021-12-09 22:39       ` Jonathan Tan
2021-12-09 23:33         ` Glen Choo
2021-12-13 23:35           ` Jonathan Tan
2021-12-10 21:45         ` Junio C Hamano
2021-12-13 23:37           ` Jonathan Tan
2021-12-14 21:31 ` [PATCH v7 0/2] Conditional config includes based on remote URL Jonathan Tan
2021-12-14 21:31   ` [PATCH v7 1/2] config: make git_config_include() static Jonathan Tan
2021-12-14 21:31   ` [PATCH v7 2/2] config: include file if remote URL matches a glob Jonathan Tan
2021-12-16 21:54     ` Glen Choo
2021-12-28  0:55     ` Elijah Newren
2022-01-10 18:58       ` Jonathan Tan
2021-12-16 21:57   ` [PATCH v7 0/2] Conditional config includes based on remote URL Glen Choo
2021-12-28  1:13   ` Elijah Newren
2021-12-28 23:13     ` Glen Choo
2022-01-10 19:22     ` Jonathan Tan
2022-01-10 20:17       ` Elijah Newren
2022-01-25 13:26         ` Scalar vs JGit, was " Johannes Schindelin
2022-01-18 17:47 ` [PATCH v8 " Jonathan Tan
2022-01-18 17:47   ` [PATCH v8 1/2] config: make git_config_include() static Jonathan Tan
2022-01-18 17:47   ` [PATCH v8 2/2] config: include file if remote URL matches a glob Jonathan Tan
2022-01-18 20:54   ` [PATCH v8 0/2] Conditional config includes based on remote URL Elijah Newren

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).