git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Duy Nguyen <pclouds@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Eric Sunshine <sunshine@sunshineco.com>,
	Git List <git@vger.kernel.org>,
	Konstantin Kharlamov <hi-angel@yandex.ru>,
	Ramsay Jones <ramsay@ramsayjones.plus.com>
Subject: Re: [PATCH v3 1/1] worktree add: sanitize worktree names
Date: Mon, 4 Mar 2019 19:04:24 +0700	[thread overview]
Message-ID: <20190304120424.GA7966@ash> (raw)
In-Reply-To: <CACsJy8D0o6-ihNcpmfhCfQPNo-t2i=NySp65Y8h2e3md2GvXVw@mail.gmail.com>

On Mon, Mar 04, 2019 at 06:19:15PM +0700, Duy Nguyen wrote:
> On Wed, Feb 27, 2019 at 11:05 PM Jeff King <peff@peff.net> wrote:
> >
> > On Wed, Feb 27, 2019 at 09:23:33AM -0500, Eric Sunshine wrote:
> >
> > > > If we just cared about saying "is this worktree name valid", I'd suggest
> > > > actually constructing a sample refname with the worktree name embedded
> > > > in it and feeding that to check_refname_format(). But because you want
> > > > to actually sanitize, I don't think there's an easy way to reuse it.
> > > >
> > > > So this approach is probably the best we can do, though I do still think
> > > > it's worth renaming that function (and/or putting a big warning comment
> > > > in front of it).
> > >
> > > The above arguments seem to suggest the introduction of a companion to
> > > check_refname_format() for sanitizing, perhaps named
> > > sanitize_refname_format(), in ref.[hc]. The potential difficulty with
> > > that is defining exactly what "sanitize" means. Will it be contextual?
> > > (That is, will git-worktree have differently sanitation needs than
> > > some other facility?) If so, perhaps a 'flags' argument could control
> > > how sanitization is done.
> >
> > I agree that sanitize_refname_format() would be nice, but I'm pretty
> > sure it's going to end up having to duplicate many of the rules from
> > check_refname_format(). Which is ugly if the two ever get out of sync.
> >
> > But if we could write it in a way that keeps the actual policy logic in
> > one factored-out portion, I think it would be worth doing.
> 
> I think we could make check_refname_format() returns the bad position
> and several different error codes depending on context. Then
> sanitize_.. can just repeatedly call check_refname_format and fix up
> whatever error it reports. Performance goes straight to hell but I
> don't think that's a big deal for git-worktree, and it keeps
> check_refname_format() simple (relatively speaking).

The new refs.c code would look something like this.
do_check_refname_component() does not look so bad.

-- 8< --
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 21469eb52c..ca63dd3df6 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -262,36 +262,6 @@ static void validate_worktree_add(const char *path, const struct add_opts *opts)
 	free_worktrees(worktrees);
 }
 
-static void sanitize_worktree_name(struct strbuf *name)
-{
-	struct strbuf sb = STRBUF_INIT;
-	int i;
-
-	for (i = 0; i < name->len; i++) {
-		int ch = name->buf[i];
-
-		if (char_allowed_in_refname(ch))
-			strbuf_addch(&sb, ch);
-		else if (sb.len > 0 && sb.buf[sb.len - 1] != '-')
-			strbuf_addch(&sb, '-');
-	}
-	if (sb.len > 0 && sb.buf[sb.len - 1] == '-')
-		strbuf_setlen(&sb, sb.len - 1);
-	/*
-	 * a worktree name of only special chars would be reduced to
-	 * an empty string
-	 */
-	if (sb.len == 0)
-		strbuf_addstr(&sb, "worktree");
-
-	if (check_refname_format(sb.buf, REFNAME_ALLOW_ONELEVEL))
-		BUG("worktree name '%s' (from '%s') is not a valid refname",
-		    sb.buf, name->buf);
-
-	strbuf_swap(&sb, name);
-	strbuf_release(&sb);
-}
-
 static int add_worktree(const char *path, const char *refname,
 			const struct add_opts *opts)
 {
@@ -322,7 +292,7 @@ static int add_worktree(const char *path, const char *refname,
 
 	name = worktree_basename(path, &len);
 	strbuf_add(&sb_name, name, path + len - name);
-	sanitize_worktree_name(&sb_name);
+	sanitize_worktree_refname(&sb_name);
 	name = sb_name.buf;
 	git_path_buf(&sb_repo, "worktrees/%s", name);
 	len = sb_repo.len;
diff --git a/refs.c b/refs.c
index f23f583db1..2d9730e792 100644
--- a/refs.c
+++ b/refs.c
@@ -63,6 +63,17 @@ int char_allowed_in_refname(int ch)
 		refname_disposition[ch] == 0;
 }
 
+enum check_code {
+	 refname_ok = 0,
+	 refname_contains_dotdot,
+	 refname_contains_atopen,
+	 refname_has_badchar,
+	 refname_contains_wildcard,
+	 refname_starts_with_dot,
+	 refname_ends_with_dotlock,
+	 refname_component_has_zero_length
+};
+
 /*
  * Try to read one refname component from the front of refname.
  * Return the length of the component found, or -1 if the component is
@@ -78,10 +89,11 @@ int char_allowed_in_refname(int ch)
  * - it ends with ".lock", or
  * - it contains a "@{" portion
  */
-static int check_refname_component(const char *refname, int *flags)
+static enum check_code do_check_refname_component(const char *refname, int *flags, const char **cp_out)
 {
 	const char *cp;
 	char last = '\0';
+	enum check_code ret = refname_ok;
 
 	for (cp = refname; ; cp++) {
 		int ch = *cp & 255;
@@ -90,18 +102,26 @@ static int check_refname_component(const char *refname, int *flags)
 		case 1:
 			goto out;
 		case 2:
-			if (last == '.')
-				return -1; /* Refname contains "..". */
+			if (last == '.') {
+				ret = refname_contains_dotdot;
+				goto done;
+			}
 			break;
 		case 3:
-			if (last == '@')
-				return -1; /* Refname contains "@{". */
+			if (last == '@') {
+				ret = refname_contains_atopen; /* @{ */
+				goto done;
+			}
 			break;
 		case 4:
-			return -1;
+			ret = refname_has_badchar;
+			goto done;
 		case 5:
-			if (!(*flags & REFNAME_REFSPEC_PATTERN))
-				return -1; /* refspec can't be a pattern */
+			if (!(*flags & REFNAME_REFSPEC_PATTERN)) {
+				/* refspec can't be a pattern */
+				ret = refname_contains_wildcard;
+				goto done;
+			}
 
 			/*
 			 * Unset the pattern flag so that we only accept
@@ -113,16 +133,67 @@ static int check_refname_component(const char *refname, int *flags)
 		last = ch;
 	}
 out:
-	if (cp == refname)
-		return 0; /* Component has zero length. */
-	if (refname[0] == '.')
-		return -1; /* Component starts with '.'. */
+	if (cp == refname) {
+		ret = refname_component_has_zero_length;
+		goto done;
+	}
+	if (refname[0] == '.') {
+		ret = refname_starts_with_dot;
+		cp = refname;
+		goto done;
+	}
 	if (cp - refname >= LOCK_SUFFIX_LEN &&
-	    !memcmp(cp - LOCK_SUFFIX_LEN, LOCK_SUFFIX, LOCK_SUFFIX_LEN))
-		return -1; /* Refname ends with ".lock". */
+	    !memcmp(cp - LOCK_SUFFIX_LEN, LOCK_SUFFIX, LOCK_SUFFIX_LEN)) {
+		cp -= LOCK_SUFFIX_LEN;
+		ret = refname_ends_with_dotlock;
+	}
+done:
+	*cp_out = cp;
+	return ret;
+}
+
+static int check_refname_component(const char *refname, int *flags)
+{
+	const char *cp;
+	enum check_code ret;
+
+	ret = do_check_refname_component(refname, flags, &cp);
+	if (ret)
+		return -1;
 	return cp - refname;
 }
 
+void sanitize_worktree_refname(struct strbuf *name)
+{
+	int last_length = -1;
+	int flags = 0;
+
+	while (1) {
+		const char *cp;
+
+		enum check_code ret = do_check_refname_component(name->buf, &flags, &cp);
+		if (last_length != -1 && cp - name->buf == last_length)
+			BUG("stuck in infinite loop! pos = %d buf = %s",
+			    last_length, name->buf);
+		last_length = cp - name->buf;
+		switch (ret) {
+		case refname_ok:
+			return;
+		case refname_contains_dotdot:
+		case refname_contains_atopen:
+		case refname_has_badchar:
+		case refname_contains_wildcard:
+		case refname_ends_with_dotlock:
+		case refname_starts_with_dot:
+			name->buf[last_length] = '-';
+			break;
+		case refname_component_has_zero_length:
+			strbuf_addstr(name, "worktree");
+			return;
+		}
+	}
+}
+
 int check_refname_format(const char *refname, int flags)
 {
 	int component_len, component_count = 0;
diff --git a/refs.h b/refs.h
index 61b4073f76..3b65b8d27a 100644
--- a/refs.h
+++ b/refs.h
@@ -459,7 +459,7 @@ int for_each_reflog(each_ref_fn fn, void *cb_data);
  * repeated slashes are accepted.
  */
 int check_refname_format(const char *refname, int flags);
-int char_allowed_in_refname(int ch);
+void sanitize_worktree_refname(struct strbuf *name);
 
 const char *prettify_refname(const char *refname);
 
diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
index ea22207361..24c574f365 100755
--- a/t/t2025-worktree-add.sh
+++ b/t/t2025-worktree-add.sh
@@ -571,10 +571,10 @@ test_expect_success '"add" an existing locked but missing worktree' '
 '
 
 test_expect_success 'sanitize generated worktree name' '
-	git worktree add --detach ".  weird*..?.lock.lock." &&
-	test -d .git/worktrees/weird-lock-lock &&
+	git worktree add --detach ".  weird*..?.lock.lock" &&
+	test -d .git/worktrees/---weird-.--.lock-lock &&
 	git worktree add --detach .... &&
-	test -d .git/worktrees/worktree
+	test -d .git/worktrees/--.-
 '
 
 test_done
-- 8< --

--
Duy

  reply	other threads:[~2019-03-04 12:04 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-18 14:36 git gc fails with "unable to resolve reference" for worktree hi-angel
2019-02-18 15:02 ` Duy Nguyen
2019-02-18 15:09   ` hi-angel
2019-02-18 15:18     ` Duy Nguyen
2019-02-20 14:34       ` hi-angel
2019-02-21 11:00 ` [PATCH] worktree add: sanitize worktree names Nguyễn Thái Ngọc Duy
2019-02-21 11:28   ` Konstantin Kharlamov
2019-02-21 11:38     ` Duy Nguyen
2019-02-21 11:44       ` Konstantin Kharlamov
2019-02-21 11:52         ` Duy Nguyen
2019-02-21 13:23           ` Jeff King
2019-02-21 12:19   ` [PATCH v2 0/1] " Nguyễn Thái Ngọc Duy
2019-02-21 12:19     ` [PATCH v2 1/1] " Nguyễn Thái Ngọc Duy
2019-02-21 13:22       ` Jeff King
2019-02-21 17:41       ` Ramsay Jones
2019-02-22  9:21         ` Duy Nguyen
2019-02-26 10:58     ` [PATCH v3 0/1] " Nguyễn Thái Ngọc Duy
2019-02-26 10:58       ` [PATCH v3 1/1] " Nguyễn Thái Ngọc Duy
2019-02-27 12:08         ` Jeff King
2019-02-27 14:23           ` Eric Sunshine
2019-02-27 16:04             ` Jeff King
2019-03-03  1:22               ` Junio C Hamano
2019-03-04 11:19               ` Duy Nguyen
2019-03-04 12:04                 ` Duy Nguyen [this message]
2019-03-04 15:06         ` Johannes Schindelin
2019-03-05 12:08       ` [PATCH v4 0/2] " Nguyễn Thái Ngọc Duy
2019-03-05 12:08         ` [PATCH v4 1/2] refs.c: refactor check_refname_component() Nguyễn Thái Ngọc Duy
2019-03-06 21:49           ` Jeff King
2019-03-07 23:24             ` Eric Sunshine
2019-03-05 12:08         ` [PATCH v4 2/2] worktree add: sanitize worktree names Nguyễn Thái Ngọc Duy
2019-03-08  9:28         ` [PATCH v5 0/1] " Nguyễn Thái Ngọc Duy
2019-03-08  9:28           ` [PATCH v5 1/1] " Nguyễn Thái Ngọc Duy
2019-03-10  2:02             ` Eric Sunshine
2019-03-11  6:20               ` Junio C Hamano
2019-03-11  9:24                 ` Duy Nguyen
2019-03-11 22:39                   ` Jeff King
2019-03-12  6:32                     ` Junio C Hamano
2019-03-11  6:36             ` Junio C Hamano
2019-03-11  9:27               ` Duy Nguyen
2019-03-11 13:05             ` Johannes Schindelin
2019-03-12  6:45               ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190304120424.GA7966@ash \
    --to=pclouds@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=hi-angel@yandex.ru \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).