From: Junio C Hamano <gitster@pobox.com>
To: Patrick Steinhardt <patrick.steinhardt@elego.de>
Cc: git@vger.kernel.org, Patrick Steinhardt <ps@pks.im>,
Philip Oakley <philipoakley@iee.org>
Subject: Re: [PATCH v3 4/4] urlmatch: allow globbing for the URL host part
Date: Thu, 26 Jan 2017 12:43:31 -0800 [thread overview]
Message-ID: <xmqq7f5h4kng.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: 20170125095648.4116-5-patrick.steinhardt@elego.de
Patrick Steinhardt <patrick.steinhardt@elego.de> writes:
> The URL matching function computes for two URLs whether they match not.
> The match is performed by splitting up the URL into different parts and
> then doing an exact comparison with the to-be-matched URL.
>
> The main user of `urlmatch` is the configuration subsystem. It allows to
> set certain configurations based on the URL which is being connected to
> via keys like `http.<url>.*`. A common use case for this is to set
> proxies for only some remotes which match the given URL. Unfortunately,
> having exact matches for all parts of the URL can become quite tedious
> in some setups. Imagine for example a corporate network where there are
> dozens or even hundreds of subdomains, which would have to be configured
> individually.
>
> This commit introduces the ability to use globbing in the host-part of
> the URLs. A user can simply specify a `*` as part of the host name to
> match all subdomains at this level. For example adding a configuration
> key `http.https://*.example.com.proxy` will match all subdomains of
> `https://example.com`.
This is probably a useful improvement.
Having said that, when I mentioned "glob", I meant to also support
something like this:
https://www[1-4].ibm.com/
And when people read "glob", that is what they expect.
So calling this "the ability to use globbing" is misleading.
The last paragraph in the log message above needs a bit of
tweaking, perhaps like this:
Allow users to write an asterisk '*' in place of any 'host'
or 'subdomain' label as part of the host name. For example,
"http.https://*.example.com.proxy" sets "http.proxy" for all
direct subdomains of "https://example.com",
e.g. "https://foo.example.com", but not
"https://foo.bar.example.com".
Fortunately, your update to config.txt, which is facing the end
users, does not misuse the word and instead is explicit that the
only thing the matcher does is to match '*' to a single hierarchy.
It is clear that even http://www*.ibm.com/ is not supported from
the description, which is good.
> . Host/domain name (e.g., `example.com` in `https://example.com/`).
> - This field must match exactly between the config key and the URL.
> + This field must match between the config key and the URL. It is
> + possible to specify a `*` as part of the host name to match all subdomains
> + at this level. `https://*.example.com/` for example would match
> + `https://foo.example.com/`, but not `https://foo.bar.example.com/`.
This is good as-is.
> . Port number (e.g., `8080` in `http://example.com:8080/`).
> This field must match exactly between the config key and the URL.
> diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
> index 923bfc5a2..ec545e092 100755
> --- a/t/t1300-repo-config.sh
> +++ b/t/t1300-repo-config.sh
> @@ -1177,6 +1177,42 @@ test_expect_success 'urlmatch' '
> test_cmp expect actual
> '
>
> +test_expect_success 'glob-based urlmatch' '
This is not "glob". A more generic term "wildcard" is OK.
> + cat >.git/config <<-\EOF &&
> + [http]
> + sslVerify
> ...
> +static int match_host(const struct url_info *url_info,
> + const struct url_info *pattern_info)
> +{
> + char *url = xmemdupz(url_info->url + url_info->host_off, url_info->host_len);
> + char *pat = xmemdupz(pattern_info->url + pattern_info->host_off, pattern_info->host_len);
> + char *url_tok, *pat_tok, *url_save, *pat_save;
> + int matching;
> +
> + url_tok = strtok_r(url, ".", &url_save);
> + pat_tok = strtok_r(pat, ".", &pat_save);
Hmph, this will be the first use of strtok_r() in our codebase.
Does everybody have it?
For a use like this where your delimiter set is a singleton, it may
be simpler to do the usual strchrnul() or memchr() based loop. The
attached is my attempt to do so on top of this patch.
> +
> + for (; url_tok && pat_tok; url_tok = strtok_r(NULL, ".", &url_save),
> + pat_tok = strtok_r(NULL, ".", &pat_save)) {
> + if (!strcmp(pat_tok, "*"))
> + continue; /* a simple glob matches everything */
s/glob/asterisk/
Other than that, the patch looks OK.
diff --git a/urlmatch.c b/urlmatch.c
index 53ff972a60..8dfc7fd28a 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -63,36 +63,47 @@ static int append_normalized_escapes(struct strbuf *buf,
return 1;
}
+static const char *end_of_token(const char *s, int c, size_t n)
+{
+ const char *next = memchr(s, c, n);
+ if (!next)
+ next = s + n;
+ return next;
+}
+
static int match_host(const struct url_info *url_info,
const struct url_info *pattern_info)
{
- char *url = xmemdupz(url_info->url + url_info->host_off, url_info->host_len);
- char *pat = xmemdupz(pattern_info->url + pattern_info->host_off, pattern_info->host_len);
- char *url_tok, *pat_tok, *url_save, *pat_save;
- int matching;
-
- url_tok = strtok_r(url, ".", &url_save);
- pat_tok = strtok_r(pat, ".", &pat_save);
-
- for (; url_tok && pat_tok; url_tok = strtok_r(NULL, ".", &url_save),
- pat_tok = strtok_r(NULL, ".", &pat_save)) {
- if (!strcmp(pat_tok, "*"))
- continue; /* a simple glob matches everything */
-
- if (strcmp(url_tok, pat_tok)) {
- /* subdomains do not match */
- matching = 0;
- break;
- }
+ const char *url = url_info->url + url_info->host_off;
+ const char *pat = pattern_info->url + pattern_info->host_off;
+ int url_len = url_info->host_len;
+ int pat_len = pattern_info->host_len;
+
+ while (url_len && pat_len) {
+ const char *url_next = end_of_token(url, '.', url_len);
+ const char *pat_next = end_of_token(pat, '.', pat_len);
+
+ if (pat_next == pat + 1 && pat[0] == '*')
+ /* wildcard matches anything */
+ ;
+ else if ((pat_next - pat) == (url_next - url) &&
+ !memcmp(url, pat, url_next - url))
+ /* the components are the same */
+ ;
+ else
+ return 0; /* found an unmatch */
+
+ if (url_next < url + url_len)
+ url_next++;
+ url_len -= url_next - url;
+ url = url_next;
+ if (pat_next < pat + pat_len)
+ pat_next++;
+ pat_len -= pat_next - pat;
+ pat = pat_next;
}
- /* matching if both URL and pattern are at their ends */
- matching = (url_tok == NULL && pat_tok == NULL);
-
- free(url);
- free(pat);
-
- return matching;
+ return 1;
}
static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs)
next prev parent reply other threads:[~2017-01-26 20:47 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-01-23 13:06 [PATCH v1 0/2] urlmatch: allow regexp-based matches Patrick Steinhardt
2017-01-23 13:06 ` [PATCH v1 1/2] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
2017-01-23 13:06 ` [PATCH v1 2/2] urlmatch: allow regex-based URL matching Patrick Steinhardt
2017-01-23 19:53 ` [PATCH v1 0/2] urlmatch: allow regexp-based matches Junio C Hamano
2017-01-24 11:29 ` Patrick Steinhardt
2017-01-24 17:00 ` [PATCH v2 0/4] " Patrick Steinhardt
2017-01-24 17:00 ` [PATCH v2 1/4] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
2017-01-24 17:00 ` [PATCH v2 2/4] urlmatch: enable normalization of URLs with globs Patrick Steinhardt
2017-01-24 17:00 ` [PATCH v2 3/4] urlmatch: split host and port fields in `struct url_info` Patrick Steinhardt
2017-01-24 17:00 ` [PATCH v2 4/4] urlmatch: allow globbing for the URL host part Patrick Steinhardt
2017-01-24 17:52 ` Philip Oakley
2017-01-25 9:57 ` Patrick Steinhardt
2017-01-25 9:56 ` [PATCH v3 0/4] urlmatch: allow regexp-based matches Patrick Steinhardt
2017-01-25 9:56 ` [PATCH v3 1/4] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
2017-01-25 9:56 ` [PATCH v3 2/4] urlmatch: enable normalization of URLs with globs Patrick Steinhardt
2017-01-25 9:56 ` [PATCH v3 3/4] urlmatch: split host and port fields in `struct url_info` Patrick Steinhardt
2017-01-25 9:56 ` [PATCH v3 4/4] urlmatch: allow globbing for the URL host part Patrick Steinhardt
2017-01-26 20:43 ` Junio C Hamano [this message]
2017-01-26 20:49 ` Junio C Hamano
2017-01-26 21:12 ` Junio C Hamano
2017-01-27 6:21 ` Patrick Steinhardt
2017-01-27 17:45 ` Junio C Hamano
2017-01-27 10:32 ` [PATCH v4 0/5] urlmatch: allow wildcard-based matches Patrick Steinhardt
2017-01-30 22:00 ` Junio C Hamano
2017-01-30 22:52 ` Junio C Hamano
2017-01-31 8:26 ` Patrick Steinhardt
2017-01-27 10:32 ` [PATCH v4 1/5] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
2017-01-27 10:32 ` [PATCH v4 2/5] urlmatch: enable normalization of URLs with globs Patrick Steinhardt
2017-01-27 10:32 ` [PATCH v4 3/5] urlmatch: split host and port fields in `struct url_info` Patrick Steinhardt
2017-01-27 10:32 ` [PATCH v4 4/5] urlmatch: include host and port in urlmatch length Patrick Steinhardt
2017-01-27 10:32 ` [PATCH v4 5/5] urlmatch: allow globbing for the URL host part Patrick Steinhardt
2017-01-31 9:01 ` [PATCH v5 0/5] urlmatch: allow wildcard-based matches Patrick Steinhardt
2017-01-31 9:01 ` [PATCH v5 1/5] mailmap: add Patrick Steinhardt's work address Patrick Steinhardt
2017-01-31 9:01 ` [PATCH v5 2/5] urlmatch: enable normalization of URLs with globs Patrick Steinhardt
2017-01-31 9:01 ` [PATCH v5 3/5] urlmatch: split host and port fields in `struct url_info` Patrick Steinhardt
2017-01-31 9:01 ` [PATCH v5 4/5] urlmatch: include host in urlmatch ranking Patrick Steinhardt
2017-01-31 9:01 ` [PATCH v5 5/5] urlmatch: allow globbing for the URL host part Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqq7f5h4kng.fsf@gitster.mtv.corp.google.com \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=patrick.steinhardt@elego.de \
--cc=philipoakley@iee.org \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).