From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Kyle J. McKay"
Subject: [PATCH v7 2/4] config: improve support for http.<url>.* settings
Date: Sun, 21 Jul 2013 20:18:36 -0700
Message-ID: <242e892174353ddcea572934ce891e1@f74d39fa044aa309eaea14b9f57fe79>
Cc: David Aguilar, Petr Baudis, Junio C Hamano, Richard Hartmann,
 Jeff King, Daniel Knittl-Frank, Jan Krüger, Alejandro Mery,
 Aaron Schrab, Eric Sunshine
To: git@vger.kernel.org

Improve on the http.<url>.* url matching behavior by first
normalizing the urls before they are compared.

With this change, for example, the following configuration
section:

[http "https://example.com/path"]
	useragent = example-agent
	sslVerify = false

will properly match a "HTTPS://example.COM/p%61th" url which
is equivalent.

The normalization rules are based on RFC 3986 and should
result in any two equivalent urls being a match.

Signed-off-by: Kyle J. McKay
---
 Documentation/config.txt |  19 ++-
 http.c                   | 311 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 318 insertions(+), 12 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 41cab91..e461f32 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1517,16 +1517,15 @@ http.<url>.*::
 	Any of the http.* options above can be applied selectively to some urls.
 	For example "http.https://example.com.useragent" would set the user
 	agent only for https connections to example.com. The value
-	matches a url if it is an exact match or a prefix of the url matching
-	at a "/" boundary. Longer matches take precedence over shorter
-	ones with the environment variable settings taking precedence over all.
-	Note that <url> must match the url passed to git exactly (other than
-	possibly being a prefix). This means any user, password and/or port
-	setting that appears in a url as well as any %XX escapes that are
-	present must also appear in <url> to have a successful match. The urls
-	that are matched against are those given directly to git commands. In
-	other words, use exactly the same url that was passed to git (possibly
-	shortened) for the value of the config setting.
+	matches a url if it is an exact match or if it is a prefix of the url
+	matching at a "/" boundary. Longer matches take precedence over
+	shorter ones with the environment variable settings taking precedence
+	over all. The urls are normalized before testing for a match. Note,
+	however, that any user, password and/or port setting that appears in a
+	url must also match that part of <url> to have a successful match. The
+	urls that are matched against are those given directly to git commands.
+	This means any urls visited as a result of a redirection do not
+	participate in matching.
 
 i18n.commitEncoding::
 	Character encoding the commit messages are stored in; Git itself
diff --git a/http.c b/http.c
index 1531ffa..29e119c 100644
--- a/http.c
+++ b/http.c
@@ -169,6 +169,300 @@ static void process_curl_messages(void)
 }
 #endif
 
+#define URL_ALPHA "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
+#define URL_DIGIT "0123456789"
+#define URL_ALPHADIGIT URL_ALPHA URL_DIGIT
+#define URL_SCHEME_CHARS URL_ALPHADIGIT "+.-"
+#define URL_HOST_CHARS URL_ALPHADIGIT ".-[:]" /* IPv6 literals need [:] */
+#define URL_UNSAFE_CHARS " <>\"%{}|\\^`" /* plus 0x00-0x1F,0x7F-0xFF */
+#define URL_GEN_RESERVED ":/?#[]@"
+#define URL_SUB_RESERVED "!$&'()*+,;="
+#define URL_RESERVED URL_GEN_RESERVED URL_SUB_RESERVED /* only allowed delims */
+
+static int append_normalized_escapes(struct strbuf *buf,
+				     const char *from,
+				     size_t from_len,
+				     const char *esc_extra,
+				     const char *esc_ok)
+{
+	/*
+	 * Append to strbuf 'buf' characters from string 'from' with length
+	 * 'from_len' while unescaping characters that do not need to be escaped
+	 * and escaping characters that do. The set of characters to escape
+	 * (the complement of which is unescaped) starts out as the RFC 3986
+	 * unsafe characters (0x00-0x1F,0x7F-0xFF," <>\"%{}|\\^`"). If
+	 * 'esc_extra' is not NULL, those additional characters will also always
+	 * be escaped. If 'esc_ok' is not NULL, those characters will be left
+	 * escaped if found that way, but will not be unescaped otherwise (used
+	 * for delimiters). If a %-escape sequence is encountered that is not
+	 * followed by 2 hexadecimal digits, the sequence is invalid and
+	 * false (0) will be returned. Otherwise true (1) will be returned for
+	 * success.
+	 *
+	 * Note that all %-escape sequences will be normalized to UPPERCASE
+	 * as indicated in RFC 3986. Unless included in esc_extra or esc_ok
+	 * alphanumerics and "-._~" will always be unescaped as per RFC 3986.
+	 */
+
+	while (from_len) {
+		int ch = *from++;
+		int was_esc = 0;
+
+		from_len--;
+		if (ch == '%') {
+			if (from_len < 2 ||
+			    !isxdigit((unsigned char)from[0]) ||
+			    !isxdigit((unsigned char)from[1]))
+				return 0;
+			ch = hexval_table[(unsigned char)*from++] << 4;
+			ch |= hexval_table[(unsigned char)*from++];
+			from_len -= 2;
+			was_esc = 1;
+		}
+		if ((unsigned char)ch <= 0x1F || (unsigned char)ch >= 0x7F ||
+		    strchr(URL_UNSAFE_CHARS, ch) ||
+		    (esc_extra && strchr(esc_extra, ch)) ||
+		    (was_esc && esc_ok && strchr(esc_ok, ch)))
+			strbuf_addf(buf, "%%%02X", (unsigned char)ch);
+		else
+			strbuf_addch(buf, ch);
+	}
+
+	return 1;
+}
+
+static char *http_options_url_normalize(const char *url)
+{
+	/*
+	 * Normalize NUL-terminated url using the following rules:
+	 *
+	 * 1. Case-insensitive parts of url will be converted to lower case
+	 * 2. %-encoded characters that do not need to be will be unencoded
+	 * 3. Characters that are not %-encoded and must be will be encoded
+	 * 4. All %-encodings will be converted to upper case hexadecimal
+	 * 5. Leading 0s are removed from port numbers
+	 * 6. If the default port for the scheme is given it will be removed
+	 * 7. A path part (including empty) not starting with '/' has one added
+	 * 8. Any dot segments (. or ..) in the path are resolved and removed
+	 * 9. IPv6 host literals are allowed (but not normalized or validated)
+	 *
+	 * The rules are based on information in RFC 3986.
+	 *
+	 * Please note this function requires a full URL including a scheme
+	 * and host part (except for file: URLs which may have an empty host).
+	 *
+	 * The return value is a newly allocated string that must be freed
+	 * or NULL if the url is not valid.
+	 *
+	 * This is NOT a URL validation function. Full URL validation is NOT
+	 * performed. Some invalid host names are passed through this function
+	 * undetected. However, almost all other problems that make a URL invalid
+	 * will be detected (including a missing host for non file: URLs).
+	 */
+
+	size_t url_len = strlen(url);
+	struct strbuf norm;
+	size_t spanned;
+	const char *slash_ptr, *at_ptr, *colon_ptr, *path_start;
+	int found_host = 0;
+
+
+	/*
+	 * Copy lowercased scheme and :// suffix, %-escapes are not allowed
+	 * First character of scheme must be URL_ALPHA
+	 */
+	spanned = strspn(url, URL_SCHEME_CHARS);
+	if (!spanned || !isalpha(url[0]) || spanned + 3 > url_len ||
+	    url[spanned] != ':' || url[spanned+1] != '/' || url[spanned+2] != '/')
+		return NULL; /* Bad scheme and/or missing "://" part */
+	strbuf_init(&norm, url_len);
+	spanned += 3;
+	url_len -= spanned;
+	while (spanned--)
+		strbuf_addch(&norm, tolower(*url++));
+
+
+	/*
+	 * Copy any username:password if present normalizing %-escapes
+	 */
+	at_ptr = strchr(url, '@');
+	slash_ptr = url + strcspn(url, "/?#");
+	if (at_ptr && at_ptr < slash_ptr) {
+		if (at_ptr > url) {
+			if (!append_normalized_escapes(&norm, url, at_ptr - url,
+						       "", URL_RESERVED)) {
+				strbuf_release(&norm);
+				return NULL;
+			}
+		}
+		strbuf_addch(&norm, '@');
+		url_len -= (++at_ptr - url);
+		url = at_ptr;
+	}
+
+
+	/*
+	 * Copy the host part excluding any port part, no %-escapes allowed
+	 */
+	if (!url_len || strchr(":/?#", *url)) {
+		/* Missing host invalid for all URL schemes except file */
+		if (strncmp(norm.buf, "file:", 5)) {
+			strbuf_release(&norm);
+			return NULL;
+		}
+	} else {
+		found_host = 1;
+	}
+	colon_ptr = slash_ptr - 1;
+	while (colon_ptr > url && *colon_ptr != ':' && *colon_ptr != ']')
+		colon_ptr--;
+	if (*colon_ptr != ':') {
+		colon_ptr = slash_ptr;
+	} else if (!found_host && colon_ptr < slash_ptr && colon_ptr + 1 != slash_ptr) {
+		/* file: URLs may not have a port number */
+		strbuf_release(&norm);
+		return NULL;
+	}
+	spanned = strspn(url, URL_HOST_CHARS);
+	if (spanned < colon_ptr - url) {
+		/* Host name has invalid characters */
+		strbuf_release(&norm);
+		return NULL;
+	}
+	while (url < colon_ptr) {
+		strbuf_addch(&norm, tolower(*url++));
+		url_len--;
+	}
+
+
+	/*
+	 * Check the port part and copy if not the default (after removing any
+	 * leading 0s); no %-escapes allowed
+	 */
+	if (colon_ptr < slash_ptr) {
+		/* skip the ':' and leading 0s but not the last one if all 0s */
+		url++;
+		url += strspn(url, "0");
+		if (url == slash_ptr && url[-1] == '0')
+			url--;
+		if (url == slash_ptr) {
+			/* Skip ":" port with no number, it's same as default */
+		} else if (slash_ptr - url == 2 &&
+			   !strncmp(norm.buf, "http:", 5) &&
+			   !strncmp(url, "80", 2)) {
+			/* Skip http :80 as it's the default */
+		} else if (slash_ptr - url == 3 &&
+			   !strncmp(norm.buf, "https:", 6) &&
+			   !strncmp(url, "443", 3)) {
+			/* Skip https :443 as it's the default */
+		} else {
+			/*
+			 * Port number must be all digits with leading 0s removed
+			 * and since all the protocols we deal with have a 16-bit
+			 * port number it must also be in the range 1..65535
+			 * 0 is not allowed because that means "next available"
+			 * on just about every system and therefore cannot be used
+			 */
+			unsigned long pnum = 0;
+			spanned = strspn(url, URL_DIGIT);
+			if (spanned < slash_ptr - url) {
+				/* port number has invalid characters */
+				strbuf_release(&norm);
+				return NULL;
+			}
+			if (slash_ptr - url <= 5)
+				pnum = strtoul(url, NULL, 10);
+			if (pnum == 0 || pnum > 65535) {
+				/* port number not in range 1..65535 */
+				strbuf_release(&norm);
+				return NULL;
+			}
+			strbuf_addch(&norm, ':');
+			strbuf_add(&norm, url, slash_ptr - url);
+		}
+		url_len -= slash_ptr - colon_ptr;
+		url = slash_ptr;
+	}
+
+
+	/*
+	 * Now copy the path resolving any . and .. segments being careful not
+	 * to corrupt the URL by unescaping any delimiters, but do add an
+	 * initial '/' if it's missing and do normalize any %-escape sequences.
+	 */
+	path_start = norm.buf + norm.len;
+	strbuf_addch(&norm, '/');
+	if (*url == '/') {
+		url++;
+		url_len--;
+	}
+	for (;;) {
+		const char *seg_start = norm.buf + norm.len;
+		const char *next_slash = url + strcspn(url, "/?#");
+		int skip_add_slash = 0;
+		/*
+		 * RFC 3986 indicates that any . or .. segments should be
+		 * unescaped before being checked for.
+		 */
+		if (!append_normalized_escapes(&norm, url, next_slash - url, "",
+					       URL_RESERVED)) {
+			strbuf_release(&norm);
+			return NULL;
+		}
+		if (!strcmp(seg_start, ".")) {
+			/* ignore a . segment; be careful not to remove initial '/' */
+			if (seg_start == path_start + 1) {
+				strbuf_setlen(&norm, norm.len - 1);
+				skip_add_slash = 1;
+			} else {
+				strbuf_setlen(&norm, norm.len - 2);
+			}
+		} else if (!strcmp(seg_start, "..")) {
+			/*
+			 * ignore a .. segment and remove the previous segment;
+			 * be careful not to remove initial '/' from path
+			 */
+			const char *prev_slash = norm.buf + norm.len - 3;
+			if (prev_slash == path_start) {
+				/* invalid .. because no previous segment to remove */
+				strbuf_release(&norm);
+				return NULL;
+			}
+			while (*--prev_slash != '/') {}
+			if (prev_slash == path_start) {
+				strbuf_setlen(&norm, prev_slash - norm.buf + 1);
+				skip_add_slash = 1;
+			} else {
+				strbuf_setlen(&norm, prev_slash - norm.buf);
+			}
+		}
+		url_len -= next_slash - url;
+		url = next_slash;
+		/* if the next char is not '/' done with the path */
+		if (*url != '/')
+			break;
+		url++;
+		url_len--;
+		if (!skip_add_slash)
+			strbuf_addch(&norm, '/');
+	}
+
+
+	/*
+	 * Now simply copy the rest, if any, only normalizing %-escapes and
+	 * being careful not to corrupt the URL by unescaping any delimiters.
+	 */
+	if (*url) {
+		if (!append_normalized_escapes(&norm, url, url_len, "", URL_RESERVED)) {
+			strbuf_release(&norm);
+			return NULL;
+		}
+	}
+
+
+	return strbuf_detach(&norm, NULL);
+}
+
 static size_t http_options_url_match_prefix(const char *url,
 					    const char *url_prefix,
 					    size_t url_prefix_len)
@@ -185,8 +479,13 @@ static size_t http_options_url_match_prefix(const char *url,
 	 * The return value is the length of the match in characters (excluding
 	 * any final '/') or 0 for no match. Passing "/" as url_prefix will
 	 * always cause 0 to be returned.
+	 *
+	 * Passing NULL as url and/or url_prefix will always cause 0 to be
+	 * returned without causing any faults.
 	 */
 	size_t url_len;
+	if (!url || !url_prefix)
+		return 0;
 	if (url_prefix_len && url_prefix[url_prefix_len - 1] == '/')
 		url_prefix_len--;
 	if (!url_prefix_len || strncmp(url, url_prefix, url_prefix_len))
@@ -233,7 +532,13 @@ static int http_options(const char *var, const char *value, void *cb)
 	 */
 	dot = strrchr(key, '.');
 	if (dot) {
-		matchlen = http_options_url_match_prefix(url, key, dot - key);
+		char *config_url = xmemdupz(key, dot - key);
+		char *norm_url = http_options_url_normalize(config_url);
+		free(config_url);
+		if (!norm_url)
+			return 0;
+		matchlen = http_options_url_match_prefix(url, norm_url, strlen(norm_url));
+		free(norm_url);
 		if (!matchlen)
 			return 0;
 		key = dot + 1;
@@ -469,11 +774,13 @@ void http_init(struct remote *remote, const char *url, int proactive_auth)
 {
 	char *low_speed_limit;
 	char *low_speed_time;
+	char *norm_url = http_options_url_normalize(url);
 
 	http_is_verbose = 0;
 
 	memset(http_option_max_matched_len, 0, sizeof(http_option_max_matched_len));
-	git_config(http_options, (void *)url);
+	git_config(http_options, norm_url);
+	free(norm_url);
 
 	curl_global_init(CURL_GLOBAL_ALL);
 
-- 
1.8.3
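
For readers who want to try the %-escape rules outside of git, the following is a minimal standalone sketch of what append_normalized_escapes() does when called with esc_extra "" and esc_ok URL_RESERVED, as the patch does for the userinfo, path and trailing parts. It is not code from this patch: the name sketch_normalize_escapes, the caller-supplied output buffer, and the helper functions are all assumptions made for illustration, and plain libc stands in for strbuf and hexval_table.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Value of one hexadecimal digit; callers check isxdigit() first. */
static int hex_digit_value(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	return tolower(c) - 'a' + 10;
}

/* RFC 3986 "unreserved" characters never need to be %-escaped. */
static int is_unreserved(int c)
{
	return c > 0 && c < 0x80 && (isalnum(c) || strchr("-._~", c) != NULL);
}

/* Characters that are always %-escaped (same set as URL_UNSAFE_CHARS). */
static int is_unsafe(int c)
{
	return c <= 0x1F || c >= 0x7F || strchr(" <>\"%{}|\\^`", c) != NULL;
}

/*
 * Hypothetical helper (not part of git): normalize the %-escapes of the
 * NUL-terminated string 'in' into 'out' (capacity 'out_size' bytes).
 * Returns 1 on success, 0 on a malformed escape or if 'out' is too small.
 */
int sketch_normalize_escapes(const char *in, char *out, size_t out_size)
{
	size_t used = 0;

	assert(out && out_size > 0);
	while (*in) {
		int ch = (unsigned char)*in++;
		int was_esc = 0;

		if (ch == '%') {
			/* A '%' must be followed by exactly two hex digits. */
			if (!isxdigit((unsigned char)in[0]) ||
			    !isxdigit((unsigned char)in[1]))
				return 0;
			ch = hex_digit_value((unsigned char)in[0]) << 4;
			ch |= hex_digit_value((unsigned char)in[1]);
			in += 2;
			was_esc = 1;
		}
		if (!is_unreserved(ch) && (was_esc || is_unsafe(ch))) {
			/* Emit (or keep) the escape, with upper case hex. */
			if (out_size - used < 4)
				return 0;
			snprintf(out + used, 4, "%%%02X", (unsigned)ch);
			used += 3;
		} else {
			/* Unreserved, or a delimiter that was not escaped. */
			if (out_size - used < 2)
				return 0;
			out[used++] = (char)ch;
		}
	}
	out[used] = '\0';
	return 1;
}
```

Under these rules "p%61th" comes out as "path" (the commit message example), an escaped delimiter such as "a%2fb" stays escaped but becomes "a%2Fb", and "a b" becomes "a%20b". The sketch covers only the escape handling; scheme, host, port and dot-segment processing are left to http_options_url_normalize() in the patch itself.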