git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git gc fails with "unable to resolve reference" for worktree
@ 2019-02-18 14:36 hi-angel
  2019-02-18 15:02 ` Duy Nguyen
  2019-02-21 11:00 ` [PATCH] worktree add: sanitize worktree names Nguyễn Thái Ngọc Duy
  0 siblings, 2 replies; 41+ messages in thread
From: hi-angel @ 2019-02-18 14:36 UTC (permalink / raw)
  To: git

# Steps to reproduce (in terms of terminal commands)

    $ mkdir foo
    $ cd foo
    $ git init
    Initialized empty Git repository in /tmp/foo/.git/
    $ echo hello > testfile
    $ git add testfile && git commit -m "my commit1"
    [master (root-commit) d5f0b47] my commit1
    1 file changed, 1 insertion(+)
    create mode 100644 testfile
    $ git checkout -b bar
    Switched to a new branch 'bar'
    $ git worktree add ../bar\ \(worktree\) master
    Preparing worktree (checking out 'master')
    HEAD is now at d5f0b47 my commit1
    $ git gc
    error: cannot lock ref 'worktrees/bar (worktree)/HEAD': unable to 
resolve reference 'worktrees/bar (worktree)/HEAD': Invalid argument
    fatal: failed to run reflog

# Expected

No errors

# Actual

error: cannot lock ref 'worktrees/bar (worktree)/HEAD': unable to 
resolve reference 'worktrees/bar (worktree)/HEAD': Invalid argument



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: git gc fails with "unable to resolve reference" for worktree
  2019-02-18 14:36 git gc fails with "unable to resolve reference" for worktree hi-angel
@ 2019-02-18 15:02 ` Duy Nguyen
  2019-02-18 15:09   ` hi-angel
  2019-02-21 11:00 ` [PATCH] worktree add: sanitize worktree names Nguyễn Thái Ngọc Duy
  1 sibling, 1 reply; 41+ messages in thread
From: Duy Nguyen @ 2019-02-18 15:02 UTC (permalink / raw)
  To: hi-angel; +Cc: Git Mailing List

On Mon, Feb 18, 2019 at 9:44 PM <hi-angel@yandex.ru> wrote:
>
> # Steps to reproduce (in terms of terminal commands)
>
>     $ mkdir foo
>     $ cd foo
>     $ git init
>     Initialized empty Git repository in /tmp/foo/.git/
>     $ echo hello > testfile
>     $ git add testfile && git commit -m "my commit1"
>     [master (root-commit) d5f0b47] my commit1
>     1 file changed, 1 insertion(+)
>     create mode 100644 testfile
>     $ git checkout -b bar
>     Switched to a new branch 'bar'
>     $ git worktree add ../bar\ \(worktree\) master
>     Preparing worktree (checking out 'master')
>     HEAD is now at d5f0b47 my commit1
>     $ git gc
>     error: cannot lock ref 'worktrees/bar (worktree)/HEAD': unable to
> resolve reference 'worktrees/bar (worktree)/HEAD': Invalid argument

Thanks for reporting. This is not a valid reference and causes the
problem. The worktree's name has to sanitized. I'll fix it tomorrow.

>     fatal: failed to run reflog
>
> # Expected
>
> No errors
>
> # Actual
>
> error: cannot lock ref 'worktrees/bar (worktree)/HEAD': unable to
> resolve reference 'worktrees/bar (worktree)/HEAD': Invalid argument
>
>


-- 
Duy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: git gc fails with "unable to resolve reference" for worktree
  2019-02-18 15:02 ` Duy Nguyen
@ 2019-02-18 15:09   ` hi-angel
  2019-02-18 15:18     ` Duy Nguyen
  0 siblings, 1 reply; 41+ messages in thread
From: hi-angel @ 2019-02-18 15:09 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List



On Пн, Feb 18, 2019 at 6:02 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Mon, Feb 18, 2019 at 9:44 PM <hi-angel@yandex.ru> wrote:
>> 
>>  # Steps to reproduce (in terms of terminal commands)
>> 
>>      $ mkdir foo
>>      $ cd foo
>>      $ git init
>>      Initialized empty Git repository in /tmp/foo/.git/
>>      $ echo hello > testfile
>>      $ git add testfile && git commit -m "my commit1"
>>      [master (root-commit) d5f0b47] my commit1
>>      1 file changed, 1 insertion(+)
>>      create mode 100644 testfile
>>      $ git checkout -b bar
>>      Switched to a new branch 'bar'
>>      $ git worktree add ../bar\ \(worktree\) master
>>      Preparing worktree (checking out 'master')
>>      HEAD is now at d5f0b47 my commit1
>>      $ git gc
>>      error: cannot lock ref 'worktrees/bar (worktree)/HEAD': unable 
>> to
>>  resolve reference 'worktrees/bar (worktree)/HEAD': Invalid argument
> 
> Thanks for reporting. This is not a valid reference and causes the
> problem. The worktree's name has to sanitized. I'll fix it tomorrow.
>> 

You mean, you want to prohibit such directory names as a worktree? But 
it's a proper directory naming, can perhaps git do the sanitizing 
transparently for end-user?



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: git gc fails with "unable to resolve reference" for worktree
  2019-02-18 15:09   ` hi-angel
@ 2019-02-18 15:18     ` Duy Nguyen
  2019-02-20 14:34       ` hi-angel
  0 siblings, 1 reply; 41+ messages in thread
From: Duy Nguyen @ 2019-02-18 15:18 UTC (permalink / raw)
  To: hi-angel; +Cc: Git Mailing List

On Mon, Feb 18, 2019 at 10:09 PM <hi-angel@yandex.ru> wrote:
>
>
>
> On Пн, Feb 18, 2019 at 6:02 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> > On Mon, Feb 18, 2019 at 9:44 PM <hi-angel@yandex.ru> wrote:
> >>
> >>  # Steps to reproduce (in terms of terminal commands)
> >>
> >>      $ mkdir foo
> >>      $ cd foo
> >>      $ git init
> >>      Initialized empty Git repository in /tmp/foo/.git/
> >>      $ echo hello > testfile
> >>      $ git add testfile && git commit -m "my commit1"
> >>      [master (root-commit) d5f0b47] my commit1
> >>      1 file changed, 1 insertion(+)
> >>      create mode 100644 testfile
> >>      $ git checkout -b bar
> >>      Switched to a new branch 'bar'
> >>      $ git worktree add ../bar\ \(worktree\) master
> >>      Preparing worktree (checking out 'master')
> >>      HEAD is now at d5f0b47 my commit1
> >>      $ git gc
> >>      error: cannot lock ref 'worktrees/bar (worktree)/HEAD': unable
> >> to
> >>  resolve reference 'worktrees/bar (worktree)/HEAD': Invalid argument
> >
> > Thanks for reporting. This is not a valid reference and causes the
> > problem. The worktree's name has to sanitized. I'll fix it tomorrow.
> >>
>
> You mean, you want to prohibit such directory names as a worktree? But
> it's a proper directory naming, can perhaps git do the sanitizing
> transparently for end-user?

No, not inhibiting. When you do "git add ../(abc)" then the internal
name could be simply "abc" or something like that instead of "(abc)"
which is invalid to reflog and other commands. The worktree's location
is still '(abc)'. There's also a work-in-progress option to let the
user control this worktree name directly.
-- 
Duy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: git gc fails with "unable to resolve reference" for worktree
  2019-02-18 15:18     ` Duy Nguyen
@ 2019-02-20 14:34       ` hi-angel
  0 siblings, 0 replies; 41+ messages in thread
From: hi-angel @ 2019-02-20 14:34 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

I see, thanks!

On Пн, Feb 18, 2019 at 6:18 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Mon, Feb 18, 2019 at 10:09 PM <hi-angel@yandex.ru> wrote:
>> 
>> 
>> 
>>  On Пн, Feb 18, 2019 at 6:02 PM, Duy Nguyen <pclouds@gmail.com> 
>> wrote:
>>  > On Mon, Feb 18, 2019 at 9:44 PM <hi-angel@yandex.ru> wrote:
>>  >>
>>  >>  # Steps to reproduce (in terms of terminal commands)
>>  >>
>>  >>      $ mkdir foo
>>  >>      $ cd foo
>>  >>      $ git init
>>  >>      Initialized empty Git repository in /tmp/foo/.git/
>>  >>      $ echo hello > testfile
>>  >>      $ git add testfile && git commit -m "my commit1"
>>  >>      [master (root-commit) d5f0b47] my commit1
>>  >>      1 file changed, 1 insertion(+)
>>  >>      create mode 100644 testfile
>>  >>      $ git checkout -b bar
>>  >>      Switched to a new branch 'bar'
>>  >>      $ git worktree add ../bar\ \(worktree\) master
>>  >>      Preparing worktree (checking out 'master')
>>  >>      HEAD is now at d5f0b47 my commit1
>>  >>      $ git gc
>>  >>      error: cannot lock ref 'worktrees/bar (worktree)/HEAD': 
>> unable
>>  >> to
>>  >>  resolve reference 'worktrees/bar (worktree)/HEAD': Invalid 
>> argument
>>  >
>>  > Thanks for reporting. This is not a valid reference and causes the
>>  > problem. The worktree's name has to sanitized. I'll fix it 
>> tomorrow.
>>  >>
>> 
>>  You mean, you want to prohibit such directory names as a worktree? 
>> But
>>  it's a proper directory naming, can perhaps git do the sanitizing
>>  transparently for end-user?
> 
> No, not inhibiting. When you do "git add ../(abc)" then the internal
> name could be simply "abc" or something like that instead of "(abc)"
> which is invalid to reflog and other commands. The worktree's location
> is still '(abc)'. There's also a work-in-progress option to let the
> user control this worktree name directly.
> --
> Duy



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] worktree add: sanitize worktree names
  2019-02-18 14:36 git gc fails with "unable to resolve reference" for worktree hi-angel
  2019-02-18 15:02 ` Duy Nguyen
@ 2019-02-21 11:00 ` Nguyễn Thái Ngọc Duy
  2019-02-21 11:28   ` Konstantin Kharlamov
  2019-02-21 12:19   ` [PATCH v2 0/1] " Nguyễn Thái Ngọc Duy
  1 sibling, 2 replies; 41+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2019-02-21 11:00 UTC (permalink / raw)
  To: hi-angel; +Cc: git, Eric Sunshine, Nguyễn Thái Ngọc Duy

Worktree names are based on $(basename $GIT_WORK_TREE). They aren't
significant until 3a3b9d8cde (refs: new ref types to make per-worktree
refs visible to all worktrees - 2018-10-21), where worktree name could
be part of a refname and must follow refname rules.

Update 'worktree add' code to remove special characters to follow
these rules. The code could replace chars with '-' more than
necessary, but it keeps the code simple. In the future the user will
be able to specify the worktree name by themselves if they're not
happy with this dumb character substitution.

Reported-by: hi-angel@yandex.ru
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/worktree.c      | 47 ++++++++++++++++++++++++++++++++++++++++-
 t/t2025-worktree-add.sh |  5 +++++
 2 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 3f9907fcc9..ff36838a33 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -262,6 +262,46 @@ static void validate_worktree_add(const char *path, const struct add_opts *opts)
 	free_worktrees(worktrees);
 }
 
+/*
+ * worktree name is part of refname and has to pass
+ * check_refname_component(). Remove unallowed characters to make it
+ * valid.
+ */
+static void sanitize_worktree_name(struct strbuf *name)
+{
+	int i;
+
+	/* no ending with .lock */
+	if (ends_with(name->buf, ".lock"))
+		strbuf_remove(name, name->len - strlen(".lock"),
+			      strlen(".lock"));
+
+	/*
+	 * All special chars replaced with dashes. See
+	 * check_refname_component() for reference.
+	 */
+	for (i = 0; i < name->len; i++) {
+		if (strchr(":?[]\\~ \t@{}*/.", name->buf[i]))
+			name->buf[i] = '-';
+	}
+
+	/* remove consecutive dashes, leading or trailing dashes */
+	for (i = 0; i < name->len; i++) {
+		while (name->buf[i] == '-' &&
+		       (i == 0 ||
+			i == name->len - 1 ||
+			(i < name->len - 1 && name->buf[i + 1] == '-')))
+			strbuf_remove(name, i, 1);
+	}
+
+	/* last resort, should never ever happen in practice */
+	if (name->len == 0)
+		strbuf_addstr(name, "worktree");
+
+	if (check_refname_format(name->buf, REFNAME_ALLOW_ONELEVEL))
+		BUG("worktree name '%s' is not a valid refname", name->buf);
+}
+
 static int add_worktree(const char *path, const char *refname,
 			const struct add_opts *opts)
 {
@@ -275,6 +315,7 @@ static int add_worktree(const char *path, const char *refname,
 	struct strbuf symref = STRBUF_INIT;
 	struct commit *commit = NULL;
 	int is_branch = 0;
+	struct strbuf sb_name = STRBUF_INIT;
 
 	validate_worktree_add(path, opts);
 
@@ -290,7 +331,10 @@ static int add_worktree(const char *path, const char *refname,
 		die(_("invalid reference: %s"), refname);
 
 	name = worktree_basename(path, &len);
-	git_path_buf(&sb_repo, "worktrees/%.*s", (int)(path + len - name), name);
+	strbuf_add(&sb_name, name, path + len - name);
+	sanitize_worktree_name(&sb_name);
+	name = sb_name.buf;
+	git_path_buf(&sb_repo, "worktrees/%s", name);
 	len = sb_repo.len;
 	if (safe_create_leading_directories_const(sb_repo.buf))
 		die_errno(_("could not create leading directories of '%s'"),
@@ -415,6 +459,7 @@ static int add_worktree(const char *path, const char *refname,
 	strbuf_release(&symref);
 	strbuf_release(&sb_repo);
 	strbuf_release(&sb_git);
+	strbuf_release(&sb_name);
 	return ret;
 }
 
diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
index 286bba35d8..0d465adb54 100755
--- a/t/t2025-worktree-add.sh
+++ b/t/t2025-worktree-add.sh
@@ -570,4 +570,9 @@ test_expect_success '"add" an existing locked but missing worktree' '
 	git worktree add --force --force --detach gnoo
 '
 
+test_expect_success 'sanitize generated worktree name' '
+	git worktree add --detach ".  weird*..?.lock" &&
+	test -d .git/worktrees/weird
+'
+
 test_done
-- 
2.21.0.rc1.337.gdf7f8d0522


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH] worktree add: sanitize worktree names
  2019-02-21 11:00 ` [PATCH] worktree add: sanitize worktree names Nguyễn Thái Ngọc Duy
@ 2019-02-21 11:28   ` Konstantin Kharlamov
  2019-02-21 11:38     ` Duy Nguyen
  2019-02-21 12:19   ` [PATCH v2 0/1] " Nguyễn Thái Ngọc Duy
  1 sibling, 1 reply; 41+ messages in thread
From: Konstantin Kharlamov @ 2019-02-21 11:28 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, Eric Sunshine



On Чт, Feb 21, 2019 at 2:00 PM, 
=?UTF-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41j?= Duy <pclouds@gmail.com> wrote:
> Worktree names are based on $(basename $GIT_WORK_TREE). They aren't
> significant until 3a3b9d8cde (refs: new ref types to make per-worktree
> refs visible to all worktrees - 2018-10-21), where worktree name could
> be part of a refname and must follow refname rules.
> 
> Update 'worktree add' code to remove special characters to follow
> these rules. The code could replace chars with '-' more than
> necessary, but it keeps the code simple. In the future the user will
> be able to specify the worktree name by themselves if they're not
> happy with this dumb character substitution.
> 
> Reported-by: hi-angel@yandex.ru
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  builtin/worktree.c      | 47 
> ++++++++++++++++++++++++++++++++++++++++-
>  t/t2025-worktree-add.sh |  5 +++++
>  2 files changed, 51 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> index 3f9907fcc9..ff36838a33 100644
> --- a/builtin/worktree.c
> +++ b/builtin/worktree.c
> @@ -262,6 +262,46 @@ static void validate_worktree_add(const char 
> *path, const struct add_opts *opts)
>  	free_worktrees(worktrees);
>  }
> 
> +/*
> + * worktree name is part of refname and has to pass
> + * check_refname_component(). Remove unallowed characters to make it
> + * valid.
> + */
> +static void sanitize_worktree_name(struct strbuf *name)
> +{
> +	int i;
> +
> +	/* no ending with .lock */
> +	if (ends_with(name->buf, ".lock"))
> +		strbuf_remove(name, name->len - strlen(".lock"),
> +			      strlen(".lock"));
> +
> +	/*
> +	 * All special chars replaced with dashes. See
> +	 * check_refname_component() for reference.
> +	 */
> +	for (i = 0; i < name->len; i++) {
> +		if (strchr(":?[]\\~ \t@{}*/.", name->buf[i]))
> +			name->buf[i] = '-';
> +	}
> +
> +	/* remove consecutive dashes, leading or trailing dashes */
> +	for (i = 0; i < name->len; i++) {
> +		while (name->buf[i] == '-' &&
> +		       (i == 0 ||
> +			i == name->len - 1 ||
> +			(i < name->len - 1 && name->buf[i + 1] == '-')))
> +			strbuf_remove(name, i, 1);
> +	}
> +
> +	/* last resort, should never ever happen in practice */
> +	if (name->len == 0)
> +		strbuf_addstr(name, "worktree");

I assume this means a user have passed a zero-sized worktree name? But 
zero-sized file/directory names are not possible anyway, would it make 
sense to just return an error in this case?

> +
> +	if (check_refname_format(name->buf, REFNAME_ALLOW_ONELEVEL))
> +		BUG("worktree name '%s' is not a valid refname", name->buf);
> +}
> +
>  static int add_worktree(const char *path, const char *refname,
>  			const struct add_opts *opts)
>  {
> @@ -275,6 +315,7 @@ static int add_worktree(const char *path, const 
> char *refname,
>  	struct strbuf symref = STRBUF_INIT;
>  	struct commit *commit = NULL;
>  	int is_branch = 0;
> +	struct strbuf sb_name = STRBUF_INIT;
> 
>  	validate_worktree_add(path, opts);
> 
> @@ -290,7 +331,10 @@ static int add_worktree(const char *path, const 
> char *refname,
>  		die(_("invalid reference: %s"), refname);
> 
>  	name = worktree_basename(path, &len);
> -	git_path_buf(&sb_repo, "worktrees/%.*s", (int)(path + len - name), 
> name);
> +	strbuf_add(&sb_name, name, path + len - name);
> +	sanitize_worktree_name(&sb_name);
> +	name = sb_name.buf;
> +	git_path_buf(&sb_repo, "worktrees/%s", name);
>  	len = sb_repo.len;
>  	if (safe_create_leading_directories_const(sb_repo.buf))
>  		die_errno(_("could not create leading directories of '%s'"),
> @@ -415,6 +459,7 @@ static int add_worktree(const char *path, const 
> char *refname,
>  	strbuf_release(&symref);
>  	strbuf_release(&sb_repo);
>  	strbuf_release(&sb_git);
> +	strbuf_release(&sb_name);
>  	return ret;
>  }
> 
> diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
> index 286bba35d8..0d465adb54 100755
> --- a/t/t2025-worktree-add.sh
> +++ b/t/t2025-worktree-add.sh
> @@ -570,4 +570,9 @@ test_expect_success '"add" an existing locked but 
> missing worktree' '
>  	git worktree add --force --force --detach gnoo
>  '
> 
> +test_expect_success 'sanitize generated worktree name' '
> +	git worktree add --detach ".  weird*..?.lock" &&
> +	test -d .git/worktrees/weird
> +'
> +
>  test_done
> --
> 2.21.0.rc1.337.gdf7f8d0522
> 



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] worktree add: sanitize worktree names
  2019-02-21 11:28   ` Konstantin Kharlamov
@ 2019-02-21 11:38     ` Duy Nguyen
  2019-02-21 11:44       ` Konstantin Kharlamov
  0 siblings, 1 reply; 41+ messages in thread
From: Duy Nguyen @ 2019-02-21 11:38 UTC (permalink / raw)
  To: Konstantin Kharlamov; +Cc: Git Mailing List, Eric Sunshine

On Thu, Feb 21, 2019 at 6:28 PM Konstantin Kharlamov <hi-angel@yandex.ru> wrote:
>
>
>
> On Чт, Feb 21, 2019 at 2:00 PM,
> =?UTF-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41j?= Duy <pclouds@gmail.com> wrote:
> > Worktree names are based on $(basename $GIT_WORK_TREE). They aren't
> > significant until 3a3b9d8cde (refs: new ref types to make per-worktree
> > refs visible to all worktrees - 2018-10-21), where worktree name could
> > be part of a refname and must follow refname rules.
> >
> > Update 'worktree add' code to remove special characters to follow
> > these rules. The code could replace chars with '-' more than
> > necessary, but it keeps the code simple. In the future the user will
> > be able to specify the worktree name by themselves if they're not
> > happy with this dumb character substitution.
> >
> > Reported-by: hi-angel@yandex.ru
> > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> > ---
> >  builtin/worktree.c      | 47
> > ++++++++++++++++++++++++++++++++++++++++-
> >  t/t2025-worktree-add.sh |  5 +++++
> >  2 files changed, 51 insertions(+), 1 deletion(-)
> >
> > diff --git a/builtin/worktree.c b/builtin/worktree.c
> > index 3f9907fcc9..ff36838a33 100644
> > --- a/builtin/worktree.c
> > +++ b/builtin/worktree.c
> > @@ -262,6 +262,46 @@ static void validate_worktree_add(const char
> > *path, const struct add_opts *opts)
> >       free_worktrees(worktrees);
> >  }
> >
> > +/*
> > + * worktree name is part of refname and has to pass
> > + * check_refname_component(). Remove unallowed characters to make it
> > + * valid.
> > + */
> > +static void sanitize_worktree_name(struct strbuf *name)
> > +{
> > +     int i;
> > +
> > +     /* no ending with .lock */
> > +     if (ends_with(name->buf, ".lock"))
> > +             strbuf_remove(name, name->len - strlen(".lock"),
> > +                           strlen(".lock"));
> > +
> > +     /*
> > +      * All special chars replaced with dashes. See
> > +      * check_refname_component() for reference.
> > +      */
> > +     for (i = 0; i < name->len; i++) {
> > +             if (strchr(":?[]\\~ \t@{}*/.", name->buf[i]))
> > +                     name->buf[i] = '-';
> > +     }
> > +
> > +     /* remove consecutive dashes, leading or trailing dashes */
> > +     for (i = 0; i < name->len; i++) {
> > +             while (name->buf[i] == '-' &&
> > +                    (i == 0 ||
> > +                     i == name->len - 1 ||
> > +                     (i < name->len - 1 && name->buf[i + 1] == '-')))
> > +                     strbuf_remove(name, i, 1);
> > +     }
> > +
> > +     /* last resort, should never ever happen in practice */
> > +     if (name->len == 0)
> > +             strbuf_addstr(name, "worktree");
>
> I assume this means a user have passed a zero-sized worktree name? But
> zero-sized file/directory names are not possible anyway, would it make
> sense to just return an error in this case?

It could happen if you do "git worktree add .lock". The ".lock" part
will be stripped out, leaving us with an empty string.
-- 
Duy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] worktree add: sanitize worktree names
  2019-02-21 11:38     ` Duy Nguyen
@ 2019-02-21 11:44       ` Konstantin Kharlamov
  2019-02-21 11:52         ` Duy Nguyen
  0 siblings, 1 reply; 41+ messages in thread
From: Konstantin Kharlamov @ 2019-02-21 11:44 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Eric Sunshine



On Чт, Feb 21, 2019 at 2:38 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Thu, Feb 21, 2019 at 6:28 PM Konstantin Kharlamov 
> <hi-angel@yandex.ru> wrote:
>> 
>> 
>> 
>>  On Чт, Feb 21, 2019 at 2:00 PM,
>>  =?UTF-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41j?= Duy <pclouds@gmail.com> 
>> wrote:
>>  > Worktree names are based on $(basename $GIT_WORK_TREE). They 
>> aren't
>>  > significant until 3a3b9d8cde (refs: new ref types to make 
>> per-worktree
>>  > refs visible to all worktrees - 2018-10-21), where worktree name 
>> could
>>  > be part of a refname and must follow refname rules.
>>  >
>>  > Update 'worktree add' code to remove special characters to follow
>>  > these rules. The code could replace chars with '-' more than
>>  > necessary, but it keeps the code simple. In the future the user 
>> will
>>  > be able to specify the worktree name by themselves if they're not
>>  > happy with this dumb character substitution.
>>  >
>>  > Reported-by: hi-angel@yandex.ru
>>  > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>>  > ---
>>  >  builtin/worktree.c      | 47
>>  > ++++++++++++++++++++++++++++++++++++++++-
>>  >  t/t2025-worktree-add.sh |  5 +++++
>>  >  2 files changed, 51 insertions(+), 1 deletion(-)
>>  >
>>  > diff --git a/builtin/worktree.c b/builtin/worktree.c
>>  > index 3f9907fcc9..ff36838a33 100644
>>  > --- a/builtin/worktree.c
>>  > +++ b/builtin/worktree.c
>>  > @@ -262,6 +262,46 @@ static void validate_worktree_add(const char
>>  > *path, const struct add_opts *opts)
>>  >       free_worktrees(worktrees);
>>  >  }
>>  >
>>  > +/*
>>  > + * worktree name is part of refname and has to pass
>>  > + * check_refname_component(). Remove unallowed characters to 
>> make it
>>  > + * valid.
>>  > + */
>>  > +static void sanitize_worktree_name(struct strbuf *name)
>>  > +{
>>  > +     int i;
>>  > +
>>  > +     /* no ending with .lock */
>>  > +     if (ends_with(name->buf, ".lock"))
>>  > +             strbuf_remove(name, name->len - strlen(".lock"),
>>  > +                           strlen(".lock"));
>>  > +
>>  > +     /*
>>  > +      * All special chars replaced with dashes. See
>>  > +      * check_refname_component() for reference.
>>  > +      */
>>  > +     for (i = 0; i < name->len; i++) {
>>  > +             if (strchr(":?[]\\~ \t@{}*/.", name->buf[i]))
>>  > +                     name->buf[i] = '-';
>>  > +     }
>>  > +
>>  > +     /* remove consecutive dashes, leading or trailing dashes */
>>  > +     for (i = 0; i < name->len; i++) {
>>  > +             while (name->buf[i] == '-' &&
>>  > +                    (i == 0 ||
>>  > +                     i == name->len - 1 ||
>>  > +                     (i < name->len - 1 && name->buf[i + 1] == 
>> '-')))
>>  > +                     strbuf_remove(name, i, 1);
>>  > +     }
>>  > +
>>  > +     /* last resort, should never ever happen in practice */
>>  > +     if (name->len == 0)
>>  > +             strbuf_addstr(name, "worktree");
>> 
>>  I assume this means a user have passed a zero-sized worktree name? 
>> But
>>  zero-sized file/directory names are not possible anyway, would it 
>> make
>>  sense to just return an error in this case?
> 
> It could happen if you do "git worktree add .lock". The ".lock" part
> will be stripped out, leaving us with an empty string.

Ah, I see. Then, would it maybe make sense to just sanitize the ".lock" 
out the same way as you did with special symbols, i.e. with dashes?

(I am not a git developer, so not sure if that's a good question, but I 
would also question why ".lock" needs to be deleted. I guess git uses 
the postfix internally, but why can't it be okay with "name.lock.lock")



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] worktree add: sanitize worktree names
  2019-02-21 11:44       ` Konstantin Kharlamov
@ 2019-02-21 11:52         ` Duy Nguyen
  2019-02-21 13:23           ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Duy Nguyen @ 2019-02-21 11:52 UTC (permalink / raw)
  To: Konstantin Kharlamov; +Cc: Git Mailing List, Eric Sunshine

On Thu, Feb 21, 2019 at 6:44 PM Konstantin Kharlamov <hi-angel@yandex.ru> wrote:
> >>  > +
> >>  > +     /* last resort, should never ever happen in practice */
> >>  > +     if (name->len == 0)
> >>  > +             strbuf_addstr(name, "worktree");
> >>
> >>  I assume this means a user have passed a zero-sized worktree name?
> >> But
> >>  zero-sized file/directory names are not possible anyway, would it
> >> make
> >>  sense to just return an error in this case?
> >
> > It could happen if you do "git worktree add .lock". The ".lock" part
> > will be stripped out, leaving us with an empty string.
>
> Ah, I see. Then, would it maybe make sense to just sanitize the ".lock"
> out the same way as you did with special symbols, i.e. with dashes?

Hmm.. I actually did not think of that. Then we could return the error
if "name" is empty.

> (I am not a git developer, so not sure if that's a good question, but I
> would also question why ".lock" needs to be deleted. I guess git uses

It's because "foo.lock" is usually a temporary file to prepare things
before we do an atomic update to "foo". And the "refs guys" were just
being careful when they designed reference names so they reject any
reference names that end with .lock. You can try to create a branch
named something.lock, it will not go through. This is actually
documented in "git help check-ref-format".

> the postfix internally, but why can't it be okay with "name.lock.lock")

Uh oh I miss this case. I only delete ".lock" once, "name.lock" would
still be rejected. Thanks for noticing.
-- 
Duy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 0/1] worktree add: sanitize worktree names
  2019-02-21 11:00 ` [PATCH] worktree add: sanitize worktree names Nguyễn Thái Ngọc Duy
  2019-02-21 11:28   ` Konstantin Kharlamov
@ 2019-02-21 12:19   ` Nguyễn Thái Ngọc Duy
  2019-02-21 12:19     ` [PATCH v2 1/1] " Nguyễn Thái Ngọc Duy
  2019-02-26 10:58     ` [PATCH v3 0/1] " Nguyễn Thái Ngọc Duy
  1 sibling, 2 replies; 41+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2019-02-21 12:19 UTC (permalink / raw)
  To: pclouds; +Cc: git, hi-angel, sunshine

v2 fixes bad ".lock" handling in v1. I keep the "name->len == 0" part
though because I found another valid case that could end up there.

Nguyễn Thái Ngọc Duy (1):
  worktree add: sanitize worktree names

 builtin/worktree.c      | 51 ++++++++++++++++++++++++++++++++++++++++-
 t/t2025-worktree-add.sh |  7 ++++++
 2 files changed, 57 insertions(+), 1 deletion(-)

Range-diff dựa trên v1:
1:  42a3144874 ! 1:  d1b6e1c55b worktree add: sanitize worktree names
    @@ -13,7 +13,7 @@
         be able to specify the worktree name by themselves if they're not
         happy with this dumb character substitution.
     
    -    Reported-by: hi-angel@yandex.ru
    +    Reported-by: Konstantin Kharlamov <hi-angel@yandex.ru>
         Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
     
      diff --git a/builtin/worktree.c b/builtin/worktree.c
    @@ -30,16 +30,14 @@
     + */
     +static void sanitize_worktree_name(struct strbuf *name)
     +{
    ++	char *orig_name = xstrdup(name->buf);
     +	int i;
     +
    -+	/* no ending with .lock */
    -+	if (ends_with(name->buf, ".lock"))
    -+		strbuf_remove(name, name->len - strlen(".lock"),
    -+			      strlen(".lock"));
    -+
     +	/*
     +	 * All special chars replaced with dashes. See
     +	 * check_refname_component() for reference.
    ++	 * Note that .lock is also turned to -lock, removing its
    ++	 * special status.
     +	 */
     +	for (i = 0; i < name->len; i++) {
     +		if (strchr(":?[]\\~ \t@{}*/.", name->buf[i]))
    @@ -55,12 +53,18 @@
     +			strbuf_remove(name, i, 1);
     +	}
     +
    -+	/* last resort, should never ever happen in practice */
    ++	/*
    ++	 * a worktree name of only special chars would be reduced to
    ++	 * an empty string
    ++	 */
     +	if (name->len == 0)
     +		strbuf_addstr(name, "worktree");
     +
     +	if (check_refname_format(name->buf, REFNAME_ALLOW_ONELEVEL))
    -+		BUG("worktree name '%s' is not a valid refname", name->buf);
    ++		BUG("worktree name '%s' (from '%s') is not a valid refname",
    ++		    name->buf, orig_name);
    ++
    ++	free(orig_name);
     +}
     +
      static int add_worktree(const char *path, const char *refname,
    @@ -103,8 +107,10 @@
      '
      
     +test_expect_success 'sanitize generated worktree name' '
    -+	git worktree add --detach ".  weird*..?.lock" &&
    -+	test -d .git/worktrees/weird
    ++	git worktree add --detach ".  weird*..?.lock.lock" &&
    ++	test -d .git/worktrees/weird-lock-lock &&
    ++	git worktree add --detach .... &&
    ++	test -d .git/worktrees/worktree
     +'
     +
      test_done
-- 
2.21.0.rc1.337.gdf7f8d0522


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 1/1] worktree add: sanitize worktree names
  2019-02-21 12:19   ` [PATCH v2 0/1] " Nguyễn Thái Ngọc Duy
@ 2019-02-21 12:19     ` Nguyễn Thái Ngọc Duy
  2019-02-21 13:22       ` Jeff King
  2019-02-21 17:41       ` Ramsay Jones
  2019-02-26 10:58     ` [PATCH v3 0/1] " Nguyễn Thái Ngọc Duy
  1 sibling, 2 replies; 41+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2019-02-21 12:19 UTC (permalink / raw)
  To: pclouds; +Cc: git, hi-angel, sunshine

Worktree names are based on $(basename $GIT_WORK_TREE). They aren't
significant until 3a3b9d8cde (refs: new ref types to make per-worktree
refs visible to all worktrees - 2018-10-21), where worktree name could
be part of a refname and must follow refname rules.

Update 'worktree add' code to remove special characters to follow
these rules. The code could replace chars with '-' more than
necessary, but it keeps the code simple. In the future the user will
be able to specify the worktree name by themselves if they're not
happy with this dumb character substitution.

Reported-by: Konstantin Kharlamov <hi-angel@yandex.ru>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/worktree.c      | 51 ++++++++++++++++++++++++++++++++++++++++-
 t/t2025-worktree-add.sh |  7 ++++++
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 3f9907fcc9..53e41db229 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -262,6 +262,50 @@ static void validate_worktree_add(const char *path, const struct add_opts *opts)
 	free_worktrees(worktrees);
 }
 
+/*
+ * worktree name is part of refname and has to pass
+ * check_refname_component(). Remove unallowed characters to make it
+ * valid.
+ */
+static void sanitize_worktree_name(struct strbuf *name)
+{
+	char *orig_name = xstrdup(name->buf);
+	int i;
+
+	/*
+	 * All special chars replaced with dashes. See
+	 * check_refname_component() for reference.
+	 * Note that .lock is also turned to -lock, removing its
+	 * special status.
+	 */
+	for (i = 0; i < name->len; i++) {
+		if (strchr(":?[]\\~ \t@{}*/.", name->buf[i]))
+			name->buf[i] = '-';
+	}
+
+	/* remove consecutive dashes, leading or trailing dashes */
+	for (i = 0; i < name->len; i++) {
+		while (name->buf[i] == '-' &&
+		       (i == 0 ||
+			i == name->len - 1 ||
+			(i < name->len - 1 && name->buf[i + 1] == '-')))
+			strbuf_remove(name, i, 1);
+	}
+
+	/*
+	 * a worktree name of only special chars would be reduced to
+	 * an empty string
+	 */
+	if (name->len == 0)
+		strbuf_addstr(name, "worktree");
+
+	if (check_refname_format(name->buf, REFNAME_ALLOW_ONELEVEL))
+		BUG("worktree name '%s' (from '%s') is not a valid refname",
+		    name->buf, orig_name);
+
+	free(orig_name);
+}
+
 static int add_worktree(const char *path, const char *refname,
 			const struct add_opts *opts)
 {
@@ -275,6 +319,7 @@ static int add_worktree(const char *path, const char *refname,
 	struct strbuf symref = STRBUF_INIT;
 	struct commit *commit = NULL;
 	int is_branch = 0;
+	struct strbuf sb_name = STRBUF_INIT;
 
 	validate_worktree_add(path, opts);
 
@@ -290,7 +335,10 @@ static int add_worktree(const char *path, const char *refname,
 		die(_("invalid reference: %s"), refname);
 
 	name = worktree_basename(path, &len);
-	git_path_buf(&sb_repo, "worktrees/%.*s", (int)(path + len - name), name);
+	strbuf_add(&sb_name, name, path + len - name);
+	sanitize_worktree_name(&sb_name);
+	name = sb_name.buf;
+	git_path_buf(&sb_repo, "worktrees/%s", name);
 	len = sb_repo.len;
 	if (safe_create_leading_directories_const(sb_repo.buf))
 		die_errno(_("could not create leading directories of '%s'"),
@@ -415,6 +463,7 @@ static int add_worktree(const char *path, const char *refname,
 	strbuf_release(&symref);
 	strbuf_release(&sb_repo);
 	strbuf_release(&sb_git);
+	strbuf_release(&sb_name);
 	return ret;
 }
 
diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
index 286bba35d8..71aa6ab9c1 100755
--- a/t/t2025-worktree-add.sh
+++ b/t/t2025-worktree-add.sh
@@ -570,4 +570,11 @@ test_expect_success '"add" an existing locked but missing worktree' '
 	git worktree add --force --force --detach gnoo
 '
 
+test_expect_success 'sanitize generated worktree name' '
+	git worktree add --detach ".  weird*..?.lock.lock" &&
+	test -d .git/worktrees/weird-lock-lock &&
+	git worktree add --detach .... &&
+	test -d .git/worktrees/worktree
+'
+
 test_done
-- 
2.21.0.rc1.337.gdf7f8d0522


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/1] worktree add: sanitize worktree names
  2019-02-21 12:19     ` [PATCH v2 1/1] " Nguyễn Thái Ngọc Duy
@ 2019-02-21 13:22       ` Jeff King
  2019-02-21 17:41       ` Ramsay Jones
  1 sibling, 0 replies; 41+ messages in thread
From: Jeff King @ 2019-02-21 13:22 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, hi-angel, sunshine

On Thu, Feb 21, 2019 at 07:19:43PM +0700, Nguyễn Thái Ngọc Duy wrote:

> +/*
> + * worktree name is part of refname and has to pass
> + * check_refname_component(). Remove unallowed characters to make it
> + * valid.
> + */
> +static void sanitize_worktree_name(struct strbuf *name)
> +{
> +	char *orig_name = xstrdup(name->buf);
> +	int i;
> +
> +	/*
> +	 * All special chars replaced with dashes. See
> +	 * check_refname_component() for reference.
> +	 * Note that .lock is also turned to -lock, removing its
> +	 * special status.
> +	 */
> +	for (i = 0; i < name->len; i++) {
> +		if (strchr(":?[]\\~ \t@{}*/.", name->buf[i]))
> +			name->buf[i] = '-';
> +	}

This is reject-known-bad, but I think there are still some other
characters that are not allowed in refnames (e.g., ASCII control
characters). Which would lead to us hitting the BUG() below.

It might make sense to provide access to refname_disposition() and use
it here. Alternatively, I think if we did an allow-known-good, it might
be OK to have a slightly more restrictive scheme (say, alnum plus
dashes, plus high-bit chars).

> +	/* remove consecutive dashes, leading or trailing dashes */
> +	for (i = 0; i < name->len; i++) {
> +		while (name->buf[i] == '-' &&
> +		       (i == 0 ||
> +			i == name->len - 1 ||
> +			(i < name->len - 1 && name->buf[i + 1] == '-')))
> +			strbuf_remove(name, i, 1);
> +	}

I think this is correct, though it is possibly to be quadratic in the
string length due to the O(n) remove. I think this kind of sanitizing is
more readable if done between two strings rather than in-place, like:

  for (i = 0; i < name->len; i++) {
	if (is_allowed(name->buf[i])) {
		strbuf_addch(&dest, name->buf[i]);
		last_was_dash = 0;
	} else if (!last_was_dash && dest->len)
		strbuf_addch(&dest, '-');
		last_was_dash = 1;
	}
  }
  /* still must handle removal from end of stray "-" and ".lock" */
  strbuf_swap(name, &dest);
  strbuf_release(&dest);

but that may just be personal preference. I'm OK with it if you prefer
the in-place way.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] worktree add: sanitize worktree names
  2019-02-21 11:52         ` Duy Nguyen
@ 2019-02-21 13:23           ` Jeff King
  0 siblings, 0 replies; 41+ messages in thread
From: Jeff King @ 2019-02-21 13:23 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Konstantin Kharlamov, Git Mailing List, Eric Sunshine

On Thu, Feb 21, 2019 at 06:52:05PM +0700, Duy Nguyen wrote:

> > the postfix internally, but why can't it be okay with "name.lock.lock")
> 
> Uh oh I miss this case. I only delete ".lock" once, "name.lock" would
> still be rejected. Thanks for noticing.

Another tricky case is "refs/heads/foo.lock/bar.lock", which would need
both ".lock"s removed. I think your v2 handles this correctly, though
(because it disallows "." entirely).

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/1] worktree add: sanitize worktree names
  2019-02-21 12:19     ` [PATCH v2 1/1] " Nguyễn Thái Ngọc Duy
  2019-02-21 13:22       ` Jeff King
@ 2019-02-21 17:41       ` Ramsay Jones
  2019-02-22  9:21         ` Duy Nguyen
  1 sibling, 1 reply; 41+ messages in thread
From: Ramsay Jones @ 2019-02-21 17:41 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, hi-angel, sunshine



On 21/02/2019 12:19, Nguyễn Thái Ngọc Duy wrote:
> Worktree names are based on $(basename $GIT_WORK_TREE). They aren't
> significant until 3a3b9d8cde (refs: new ref types to make per-worktree
> refs visible to all worktrees - 2018-10-21), where worktree name could
> be part of a refname and must follow refname rules.
> 
> Update 'worktree add' code to remove special characters to follow
> these rules. The code could replace chars with '-' more than
> necessary, but it keeps the code simple. In the future the user will
> be able to specify the worktree name by themselves if they're not
> happy with this dumb character substitution.
> 
> Reported-by: Konstantin Kharlamov <hi-angel@yandex.ru>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  builtin/worktree.c      | 51 ++++++++++++++++++++++++++++++++++++++++-
>  t/t2025-worktree-add.sh |  7 ++++++
>  2 files changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> index 3f9907fcc9..53e41db229 100644
> --- a/builtin/worktree.c
> +++ b/builtin/worktree.c
> @@ -262,6 +262,50 @@ static void validate_worktree_add(const char *path, const struct add_opts *opts)
>  	free_worktrees(worktrees);
>  }
>  
> +/*
> + * worktree name is part of refname and has to pass
> + * check_refname_component(). Remove unallowed characters to make it
> + * valid.
> + */
> +static void sanitize_worktree_name(struct strbuf *name)
> +{
> +	char *orig_name = xstrdup(name->buf);
> +	int i;
> +
> +	/*
> +	 * All special chars replaced with dashes. See
> +	 * check_refname_component() for reference.
> +	 * Note that .lock is also turned to -lock, removing its
> +	 * special status.
> +	 */
> +	for (i = 0; i < name->len; i++) {
> +		if (strchr(":?[]\\~ \t@{}*/.", name->buf[i]))
> +			name->buf[i] = '-';
> +	}
> +
> +	/* remove consecutive dashes, leading or trailing dashes */

Why? So, '[fred]' will be 'sanitized' to 'fred' (rather than '-fred-'),
which would increase the chance of a 'collision' with the 'fred'
worktree (not very likely, but still). Is that useful? How about
'x86_64-*-gnu' which now becomes 'x86_64-gnu'?
 
> +	for (i = 0; i < name->len; i++) {
> +		while (name->buf[i] == '-' &&
> +		       (i == 0 ||
> +			i == name->len - 1 ||
> +			(i < name->len - 1 && name->buf[i + 1] == '-')))
> +			strbuf_remove(name, i, 1);
> +	}
> +
> +	/*
> +	 * a worktree name of only special chars would be reduced to
> +	 * an empty string
> +	 */> +	if (name->len == 0)
> +		strbuf_addstr(name, "worktree");

If you didn't 'collapse' the name above, you could check for
an empty name at the top and wouldn't need this (presumably
an empty name would not be valid).

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/1] worktree add: sanitize worktree names
  2019-02-21 17:41       ` Ramsay Jones
@ 2019-02-22  9:21         ` Duy Nguyen
  0 siblings, 0 replies; 41+ messages in thread
From: Duy Nguyen @ 2019-02-22  9:21 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Git Mailing List, Konstantin Kharlamov, Eric Sunshine

On Fri, Feb 22, 2019 at 12:42 AM Ramsay Jones
<ramsay@ramsayjones.plus.com> wrote:
> > +static void sanitize_worktree_name(struct strbuf *name)
> > +{
> > +     char *orig_name = xstrdup(name->buf);
> > +     int i;
> > +
> > +     /*
> > +      * All special chars replaced with dashes. See
> > +      * check_refname_component() for reference.
> > +      * Note that .lock is also turned to -lock, removing its
> > +      * special status.
> > +      */
> > +     for (i = 0; i < name->len; i++) {
> > +             if (strchr(":?[]\\~ \t@{}*/.", name->buf[i]))
> > +                     name->buf[i] = '-';
> > +     }
> > +
> > +     /* remove consecutive dashes, leading or trailing dashes */
>
> Why? So, '[fred]' will be 'sanitized' to 'fred' (rather than '-fred-'),
> which would increase the chance of a 'collision' with the 'fred'
> worktree (not very likely, but still). Is that useful? How about
> 'x86_64-*-gnu' which now becomes 'x86_64-gnu'?

It is useful when you want to specify HEAD of [fred] for example.
Writing worktrees/fred/HEAD is a bit better than
worktrees/-fred-/HEAD. I haven't done it yet, but these names will be
shown in "git worktree list" too and lots of dashes does not improve
readability. Collision is not a problem because if fred is taken, the
final name would be fred1 or fred<some other number>.

If you're really bothered with this, you will be able to specify the
name you want (you can't, yet). You still have to pass the valid
refname check, but you have a lot more flexibility.

So this code only needs to work mostly ok for the common case and I
could go either way, clean up consecutive dashes or not. I suppose
simpler code would be the tie breaker.
-- 
Duy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 0/1] worktree add: sanitize worktree names
  2019-02-21 12:19   ` [PATCH v2 0/1] " Nguyễn Thái Ngọc Duy
  2019-02-21 12:19     ` [PATCH v2 1/1] " Nguyễn Thái Ngọc Duy
@ 2019-02-26 10:58     ` Nguyễn Thái Ngọc Duy
  2019-02-26 10:58       ` [PATCH v3 1/1] " Nguyễn Thái Ngọc Duy
  2019-03-05 12:08       ` [PATCH v4 0/2] " Nguyễn Thái Ngọc Duy
  1 sibling, 2 replies; 41+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2019-02-26 10:58 UTC (permalink / raw)
  To: pclouds; +Cc: git, hi-angel, sunshine, Jeff King, Ramsay Jones

v3 is rewritten to use refname_disposition[] to cover all invalid
chars.

Nguyễn Thái Ngọc Duy (1):
  worktree add: sanitize worktree names

 builtin/worktree.c      | 37 ++++++++++++++++++++++++++++++++++++-
 refs.c                  |  6 ++++++
 refs.h                  |  1 +
 t/t2025-worktree-add.sh |  7 +++++++
 4 files changed, 50 insertions(+), 1 deletion(-)

-- 
2.21.0.rc1.337.gdf7f8d0522


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 1/1] worktree add: sanitize worktree names
  2019-02-26 10:58     ` [PATCH v3 0/1] " Nguyễn Thái Ngọc Duy
@ 2019-02-26 10:58       ` Nguyễn Thái Ngọc Duy
  2019-02-27 12:08         ` Jeff King
  2019-03-04 15:06         ` Johannes Schindelin
  2019-03-05 12:08       ` [PATCH v4 0/2] " Nguyễn Thái Ngọc Duy
  1 sibling, 2 replies; 41+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2019-02-26 10:58 UTC (permalink / raw)
  To: pclouds; +Cc: git, hi-angel, sunshine, Jeff King, Ramsay Jones

Worktree names are based on $(basename $GIT_WORK_TREE). They aren't
significant until 3a3b9d8cde (refs: new ref types to make per-worktree
refs visible to all worktrees - 2018-10-21), where worktree name could
be part of a refname and must follow refname rules.

Update 'worktree add' code to remove special characters to follow
these rules. The code could replace chars with '-' more than
necessary, but it keeps the code simple. In the future the user will
be able to specify the worktree name by themselves if they're not
happy with this dumb character substitution.

Reported-by: Konstantin Kharlamov <hi-angel@yandex.ru>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/worktree.c      | 37 ++++++++++++++++++++++++++++++++++++-
 refs.c                  |  6 ++++++
 refs.h                  |  1 +
 t/t2025-worktree-add.sh |  7 +++++++
 4 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 3f9907fcc9..21469eb52c 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -262,6 +262,36 @@ static void validate_worktree_add(const char *path, const struct add_opts *opts)
 	free_worktrees(worktrees);
 }
 
+static void sanitize_worktree_name(struct strbuf *name)
+{
+	struct strbuf sb = STRBUF_INIT;
+	int i;
+
+	for (i = 0; i < name->len; i++) {
+		int ch = name->buf[i];
+
+		if (char_allowed_in_refname(ch))
+			strbuf_addch(&sb, ch);
+		else if (sb.len > 0 && sb.buf[sb.len - 1] != '-')
+			strbuf_addch(&sb, '-');
+	}
+	if (sb.len > 0 && sb.buf[sb.len - 1] == '-')
+		strbuf_setlen(&sb, sb.len - 1);
+	/*
+	 * a worktree name of only special chars would be reduced to
+	 * an empty string
+	 */
+	if (sb.len == 0)
+		strbuf_addstr(&sb, "worktree");
+
+	if (check_refname_format(sb.buf, REFNAME_ALLOW_ONELEVEL))
+		BUG("worktree name '%s' (from '%s') is not a valid refname",
+		    sb.buf, name->buf);
+
+	strbuf_swap(&sb, name);
+	strbuf_release(&sb);
+}
+
 static int add_worktree(const char *path, const char *refname,
 			const struct add_opts *opts)
 {
@@ -275,6 +305,7 @@ static int add_worktree(const char *path, const char *refname,
 	struct strbuf symref = STRBUF_INIT;
 	struct commit *commit = NULL;
 	int is_branch = 0;
+	struct strbuf sb_name = STRBUF_INIT;
 
 	validate_worktree_add(path, opts);
 
@@ -290,7 +321,10 @@ static int add_worktree(const char *path, const char *refname,
 		die(_("invalid reference: %s"), refname);
 
 	name = worktree_basename(path, &len);
-	git_path_buf(&sb_repo, "worktrees/%.*s", (int)(path + len - name), name);
+	strbuf_add(&sb_name, name, path + len - name);
+	sanitize_worktree_name(&sb_name);
+	name = sb_name.buf;
+	git_path_buf(&sb_repo, "worktrees/%s", name);
 	len = sb_repo.len;
 	if (safe_create_leading_directories_const(sb_repo.buf))
 		die_errno(_("could not create leading directories of '%s'"),
@@ -415,6 +449,7 @@ static int add_worktree(const char *path, const char *refname,
 	strbuf_release(&symref);
 	strbuf_release(&sb_repo);
 	strbuf_release(&sb_git);
+	strbuf_release(&sb_name);
 	return ret;
 }
 
diff --git a/refs.c b/refs.c
index 142888a40a..f23f583db1 100644
--- a/refs.c
+++ b/refs.c
@@ -57,6 +57,12 @@ static unsigned char refname_disposition[256] = {
 	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 4, 4
 };
 
+int char_allowed_in_refname(int ch)
+{
+	return 0 <= ch && ch < ARRAY_SIZE(refname_disposition) &&
+		refname_disposition[ch] == 0;
+}
+
 /*
  * Try to read one refname component from the front of refname.
  * Return the length of the component found, or -1 if the component is
diff --git a/refs.h b/refs.h
index 308fa1f03b..61b4073f76 100644
--- a/refs.h
+++ b/refs.h
@@ -459,6 +459,7 @@ int for_each_reflog(each_ref_fn fn, void *cb_data);
  * repeated slashes are accepted.
  */
 int check_refname_format(const char *refname, int flags);
+int char_allowed_in_refname(int ch);
 
 const char *prettify_refname(const char *refname);
 
diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
index 286bba35d8..ea22207361 100755
--- a/t/t2025-worktree-add.sh
+++ b/t/t2025-worktree-add.sh
@@ -570,4 +570,11 @@ test_expect_success '"add" an existing locked but missing worktree' '
 	git worktree add --force --force --detach gnoo
 '
 
+test_expect_success 'sanitize generated worktree name' '
+	git worktree add --detach ".  weird*..?.lock.lock." &&
+	test -d .git/worktrees/weird-lock-lock &&
+	git worktree add --detach .... &&
+	test -d .git/worktrees/worktree
+'
+
 test_done
-- 
2.21.0.rc1.337.gdf7f8d0522


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 1/1] worktree add: sanitize worktree names
  2019-02-26 10:58       ` [PATCH v3 1/1] " Nguyễn Thái Ngọc Duy
@ 2019-02-27 12:08         ` Jeff King
  2019-02-27 14:23           ` Eric Sunshine
  2019-03-04 15:06         ` Johannes Schindelin
  1 sibling, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-02-27 12:08 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, hi-angel, sunshine, Ramsay Jones

On Tue, Feb 26, 2019 at 05:58:51PM +0700, Nguyễn Thái Ngọc Duy wrote:

> Worktree names are based on $(basename $GIT_WORK_TREE). They aren't
> significant until 3a3b9d8cde (refs: new ref types to make per-worktree
> refs visible to all worktrees - 2018-10-21), where worktree name could
> be part of a refname and must follow refname rules.
> 
> Update 'worktree add' code to remove special characters to follow
> these rules. The code could replace chars with '-' more than
> necessary, but it keeps the code simple. In the future the user will
> be able to specify the worktree name by themselves if they're not
> happy with this dumb character substitution.

So notably this gets around ".." and ".lock" by just disallowing "."
entirely. I think I'm OK with that for worktrees. It does make me a
little nervous to see this new public function, though:

> +int char_allowed_in_refname(int ch)
> +{
> +	return 0 <= ch && ch < ARRAY_SIZE(refname_disposition) &&
> +		refname_disposition[ch] == 0;
> +}

because it's not entirely accurate, as you noted above. I wonder if we
could name this differently to warn people that the refname rules are
not so simple.

If we just cared about saying "is this worktree name valid", I'd suggest
actually constructing a sample refname with the worktree name embedded
in it and feeding that to check_refname_format(). But because you want
to actually sanitize, I don't think there's an easy way to reuse it.

So this approach is probably the best we can do, though I do still think
it's worth renaming that function (and/or putting a big warning comment
in front of it).

Other than that, I didn't see anything objectionable in the patch.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 1/1] worktree add: sanitize worktree names
  2019-02-27 12:08         ` Jeff King
@ 2019-02-27 14:23           ` Eric Sunshine
  2019-02-27 16:04             ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Sunshine @ 2019-02-27 14:23 UTC (permalink / raw)
  To: Jeff King
  Cc: Nguyễn Thái Ngọc Duy, Git List, hi-angel,
	Ramsay Jones

On Wed, Feb 27, 2019 at 7:09 AM Jeff King <peff@peff.net> wrote:
> On Tue, Feb 26, 2019 at 05:58:51PM +0700, Nguyễn Thái Ngọc Duy wrote:
> > Update 'worktree add' code to remove special characters to follow
> > these rules. The code could replace chars with '-' more than
> > necessary, but it keeps the code simple. In the future the user will
> > be able to specify the worktree name by themselves if they're not
> > happy with this dumb character substitution.
>
> So notably this gets around ".." and ".lock" by just disallowing "."
> entirely. I think I'm OK with that for worktrees. It does make me a
> little nervous to see this new public function, though:
>
> > +int char_allowed_in_refname(int ch) [...]
>
> because it's not entirely accurate, as you noted above. I wonder if we
> could name this differently to warn people that the refname rules are
> not so simple.
>
> If we just cared about saying "is this worktree name valid", I'd suggest
> actually constructing a sample refname with the worktree name embedded
> in it and feeding that to check_refname_format(). But because you want
> to actually sanitize, I don't think there's an easy way to reuse it.
>
> So this approach is probably the best we can do, though I do still think
> it's worth renaming that function (and/or putting a big warning comment
> in front of it).

The above arguments seem to suggest the introduction of a companion to
check_refname_format() for sanitizing, perhaps named
sanitize_refname_format(), in ref.[hc]. The potential difficulty with
that is defining exactly what "sanitize" means. Will it be contextual?
(That is, will git-worktree have differently sanitation needs than
some other facility?) If so, perhaps a 'flags' argument could control
how sanitization is done.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 1/1] worktree add: sanitize worktree names
  2019-02-27 14:23           ` Eric Sunshine
@ 2019-02-27 16:04             ` Jeff King
  2019-03-03  1:22               ` Junio C Hamano
  2019-03-04 11:19               ` Duy Nguyen
  0 siblings, 2 replies; 41+ messages in thread
From: Jeff King @ 2019-02-27 16:04 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Nguyễn Thái Ngọc Duy, Git List, hi-angel,
	Ramsay Jones

On Wed, Feb 27, 2019 at 09:23:33AM -0500, Eric Sunshine wrote:

> > If we just cared about saying "is this worktree name valid", I'd suggest
> > actually constructing a sample refname with the worktree name embedded
> > in it and feeding that to check_refname_format(). But because you want
> > to actually sanitize, I don't think there's an easy way to reuse it.
> >
> > So this approach is probably the best we can do, though I do still think
> > it's worth renaming that function (and/or putting a big warning comment
> > in front of it).
> 
> The above arguments seem to suggest the introduction of a companion to
> check_refname_format() for sanitizing, perhaps named
> sanitize_refname_format(), in ref.[hc]. The potential difficulty with
> that is defining exactly what "sanitize" means. Will it be contextual?
> (That is, will git-worktree have differently sanitation needs than
> some other facility?) If so, perhaps a 'flags' argument could control
> how sanitization is done.

I agree that sanitize_refname_format() would be nice, but I'm pretty
sure it's going to end up having to duplicate many of the rules from
check_refname_format(). Which is ugly if the two ever get out of sync.

But if we could write it in a way that keeps the actual policy logic in
one factored-out portion, I think it would be worth doing.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 1/1] worktree add: sanitize worktree names
  2019-02-27 16:04             ` Jeff King
@ 2019-03-03  1:22               ` Junio C Hamano
  2019-03-04 11:19               ` Duy Nguyen
  1 sibling, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2019-03-03  1:22 UTC (permalink / raw)
  To: Jeff King
  Cc: Eric Sunshine, Nguyễn Thái Ngọc Duy, Git List,
	hi-angel, Ramsay Jones

Jeff King <peff@peff.net> writes:

> I agree that sanitize_refname_format() would be nice, but I'm pretty
> sure it's going to end up having to duplicate many of the rules from
> check_refname_format(). Which is ugly if the two ever get out of sync.
>
> But if we could write it in a way that keeps the actual policy logic in
> one factored-out portion, I think it would be worth doing.

Yeah, I do too.

In the meantime, let's call v3 sufficient improvement from the
current state for now and queue it.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 1/1] worktree add: sanitize worktree names
  2019-02-27 16:04             ` Jeff King
  2019-03-03  1:22               ` Junio C Hamano
@ 2019-03-04 11:19               ` Duy Nguyen
  2019-03-04 12:04                 ` Duy Nguyen
  1 sibling, 1 reply; 41+ messages in thread
From: Duy Nguyen @ 2019-03-04 11:19 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Sunshine, Git List, Konstantin Kharlamov, Ramsay Jones

On Wed, Feb 27, 2019 at 11:05 PM Jeff King <peff@peff.net> wrote:
>
> On Wed, Feb 27, 2019 at 09:23:33AM -0500, Eric Sunshine wrote:
>
> > > If we just cared about saying "is this worktree name valid", I'd suggest
> > > actually constructing a sample refname with the worktree name embedded
> > > in it and feeding that to check_refname_format(). But because you want
> > > to actually sanitize, I don't think there's an easy way to reuse it.
> > >
> > > So this approach is probably the best we can do, though I do still think
> > > it's worth renaming that function (and/or putting a big warning comment
> > > in front of it).
> >
> > The above arguments seem to suggest the introduction of a companion to
> > check_refname_format() for sanitizing, perhaps named
> > sanitize_refname_format(), in ref.[hc]. The potential difficulty with
> > that is defining exactly what "sanitize" means. Will it be contextual?
> > (That is, will git-worktree have differently sanitation needs than
> > some other facility?) If so, perhaps a 'flags' argument could control
> > how sanitization is done.
>
> I agree that sanitize_refname_format() would be nice, but I'm pretty
> sure it's going to end up having to duplicate many of the rules from
> check_refname_format(). Which is ugly if the two ever get out of sync.
>
> But if we could write it in a way that keeps the actual policy logic in
> one factored-out portion, I think it would be worth doing.

I think we could make check_refname_format() returns the bad position
and several different error codes depending on context. Then
sanitize_.. can just repeatedly call check_refname_format and fix up
whatever error it reports. Performance goes straight to hell but I
don't think that's a big deal for git-worktree, and it keeps
check_refname_format() simple (relatively speaking).

The second option is make check_refname_format() call some callback
instead of returning error. This allows sanitize_ to fix up in one go
(inside the callback), but check_refname_format could be a lot uglier,
and verifying all refs (I think pack-refs does this?) could also be
slowed down.
-- 
Duy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 1/1] worktree add: sanitize worktree names
  2019-03-04 11:19               ` Duy Nguyen
@ 2019-03-04 12:04                 ` Duy Nguyen
  0 siblings, 0 replies; 41+ messages in thread
From: Duy Nguyen @ 2019-03-04 12:04 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Sunshine, Git List, Konstantin Kharlamov, Ramsay Jones

On Mon, Mar 04, 2019 at 06:19:15PM +0700, Duy Nguyen wrote:
> On Wed, Feb 27, 2019 at 11:05 PM Jeff King <peff@peff.net> wrote:
> >
> > On Wed, Feb 27, 2019 at 09:23:33AM -0500, Eric Sunshine wrote:
> >
> > > > If we just cared about saying "is this worktree name valid", I'd suggest
> > > > actually constructing a sample refname with the worktree name embedded
> > > > in it and feeding that to check_refname_format(). But because you want
> > > > to actually sanitize, I don't think there's an easy way to reuse it.
> > > >
> > > > So this approach is probably the best we can do, though I do still think
> > > > it's worth renaming that function (and/or putting a big warning comment
> > > > in front of it).
> > >
> > > The above arguments seem to suggest the introduction of a companion to
> > > check_refname_format() for sanitizing, perhaps named
> > > sanitize_refname_format(), in ref.[hc]. The potential difficulty with
> > > that is defining exactly what "sanitize" means. Will it be contextual?
> > > (That is, will git-worktree have differently sanitation needs than
> > > some other facility?) If so, perhaps a 'flags' argument could control
> > > how sanitization is done.
> >
> > I agree that sanitize_refname_format() would be nice, but I'm pretty
> > sure it's going to end up having to duplicate many of the rules from
> > check_refname_format(). Which is ugly if the two ever get out of sync.
> >
> > But if we could write it in a way that keeps the actual policy logic in
> > one factored-out portion, I think it would be worth doing.
> 
> I think we could make check_refname_format() returns the bad position
> and several different error codes depending on context. Then
> sanitize_.. can just repeatedly call check_refname_format and fix up
> whatever error it reports. Performance goes straight to hell but I
> don't think that's a big deal for git-worktree, and it keeps
> check_refname_format() simple (relatively speaking).

The new refs.c code would look something like this.
do_check_refname_component() does not look so bad.

-- 8< --
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 21469eb52c..ca63dd3df6 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -262,36 +262,6 @@ static void validate_worktree_add(const char *path, const struct add_opts *opts)
 	free_worktrees(worktrees);
 }
 
-static void sanitize_worktree_name(struct strbuf *name)
-{
-	struct strbuf sb = STRBUF_INIT;
-	int i;
-
-	for (i = 0; i < name->len; i++) {
-		int ch = name->buf[i];
-
-		if (char_allowed_in_refname(ch))
-			strbuf_addch(&sb, ch);
-		else if (sb.len > 0 && sb.buf[sb.len - 1] != '-')
-			strbuf_addch(&sb, '-');
-	}
-	if (sb.len > 0 && sb.buf[sb.len - 1] == '-')
-		strbuf_setlen(&sb, sb.len - 1);
-	/*
-	 * a worktree name of only special chars would be reduced to
-	 * an empty string
-	 */
-	if (sb.len == 0)
-		strbuf_addstr(&sb, "worktree");
-
-	if (check_refname_format(sb.buf, REFNAME_ALLOW_ONELEVEL))
-		BUG("worktree name '%s' (from '%s') is not a valid refname",
-		    sb.buf, name->buf);
-
-	strbuf_swap(&sb, name);
-	strbuf_release(&sb);
-}
-
 static int add_worktree(const char *path, const char *refname,
 			const struct add_opts *opts)
 {
@@ -322,7 +292,7 @@ static int add_worktree(const char *path, const char *refname,
 
 	name = worktree_basename(path, &len);
 	strbuf_add(&sb_name, name, path + len - name);
-	sanitize_worktree_name(&sb_name);
+	sanitize_worktree_refname(&sb_name);
 	name = sb_name.buf;
 	git_path_buf(&sb_repo, "worktrees/%s", name);
 	len = sb_repo.len;
diff --git a/refs.c b/refs.c
index f23f583db1..2d9730e792 100644
--- a/refs.c
+++ b/refs.c
@@ -63,6 +63,17 @@ int char_allowed_in_refname(int ch)
 		refname_disposition[ch] == 0;
 }
 
+enum check_code {
+	 refname_ok = 0,
+	 refname_contains_dotdot,
+	 refname_contains_atopen,
+	 refname_has_badchar,
+	 refname_contains_wildcard,
+	 refname_starts_with_dot,
+	 refname_ends_with_dotlock,
+	 refname_component_has_zero_length
+};
+
 /*
  * Try to read one refname component from the front of refname.
  * Return the length of the component found, or -1 if the component is
@@ -78,10 +89,11 @@ int char_allowed_in_refname(int ch)
  * - it ends with ".lock", or
  * - it contains a "@{" portion
  */
-static int check_refname_component(const char *refname, int *flags)
+static enum check_code do_check_refname_component(const char *refname, int *flags, const char **cp_out)
 {
 	const char *cp;
 	char last = '\0';
+	enum check_code ret = refname_ok;
 
 	for (cp = refname; ; cp++) {
 		int ch = *cp & 255;
@@ -90,18 +102,26 @@ static int check_refname_component(const char *refname, int *flags)
 		case 1:
 			goto out;
 		case 2:
-			if (last == '.')
-				return -1; /* Refname contains "..". */
+			if (last == '.') {
+				ret = refname_contains_dotdot;
+				goto done;
+			}
 			break;
 		case 3:
-			if (last == '@')
-				return -1; /* Refname contains "@{". */
+			if (last == '@') {
+				ret = refname_contains_atopen; /* @{ */
+				goto done;
+			}
 			break;
 		case 4:
-			return -1;
+			ret = refname_has_badchar;
+			goto done;
 		case 5:
-			if (!(*flags & REFNAME_REFSPEC_PATTERN))
-				return -1; /* refspec can't be a pattern */
+			if (!(*flags & REFNAME_REFSPEC_PATTERN)) {
+				/* refspec can't be a pattern */
+				ret = refname_contains_wildcard;
+				goto done;
+			}
 
 			/*
 			 * Unset the pattern flag so that we only accept
@@ -113,16 +133,67 @@ static int check_refname_component(const char *refname, int *flags)
 		last = ch;
 	}
 out:
-	if (cp == refname)
-		return 0; /* Component has zero length. */
-	if (refname[0] == '.')
-		return -1; /* Component starts with '.'. */
+	if (cp == refname) {
+		ret = refname_component_has_zero_length;
+		goto done;
+	}
+	if (refname[0] == '.') {
+		ret = refname_starts_with_dot;
+		cp = refname;
+		goto done;
+	}
 	if (cp - refname >= LOCK_SUFFIX_LEN &&
-	    !memcmp(cp - LOCK_SUFFIX_LEN, LOCK_SUFFIX, LOCK_SUFFIX_LEN))
-		return -1; /* Refname ends with ".lock". */
+	    !memcmp(cp - LOCK_SUFFIX_LEN, LOCK_SUFFIX, LOCK_SUFFIX_LEN)) {
+		cp -= LOCK_SUFFIX_LEN;
+		ret = refname_ends_with_dotlock;
+	}
+done:
+	*cp_out = cp;
+	return ret;
+}
+
+static int check_refname_component(const char *refname, int *flags)
+{
+	const char *cp;
+	enum check_code ret;
+
+	ret = do_check_refname_component(refname, flags, &cp);
+	if (ret)
+		return -1;
 	return cp - refname;
 }
 
+void sanitize_worktree_refname(struct strbuf *name)
+{
+	int last_length = -1;
+	int flags = 0;
+
+	while (1) {
+		const char *cp;
+
+		enum check_code ret = do_check_refname_component(name->buf, &flags, &cp);
+		if (last_length != -1 && cp - name->buf == last_length)
+			BUG("stuck in infinite loop! pos = %d buf = %s",
+			    last_length, name->buf);
+		last_length = cp - name->buf;
+		switch (ret) {
+		case refname_ok:
+			return;
+		case refname_contains_dotdot:
+		case refname_contains_atopen:
+		case refname_has_badchar:
+		case refname_contains_wildcard:
+		case refname_ends_with_dotlock:
+		case refname_starts_with_dot:
+			name->buf[last_length] = '-';
+			break;
+		case refname_component_has_zero_length:
+			strbuf_addstr(name, "worktree");
+			return;
+		}
+	}
+}
+
 int check_refname_format(const char *refname, int flags)
 {
 	int component_len, component_count = 0;
diff --git a/refs.h b/refs.h
index 61b4073f76..3b65b8d27a 100644
--- a/refs.h
+++ b/refs.h
@@ -459,7 +459,7 @@ int for_each_reflog(each_ref_fn fn, void *cb_data);
  * repeated slashes are accepted.
  */
 int check_refname_format(const char *refname, int flags);
-int char_allowed_in_refname(int ch);
+void sanitize_worktree_refname(struct strbuf *name);
 
 const char *prettify_refname(const char *refname);
 
diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
index ea22207361..24c574f365 100755
--- a/t/t2025-worktree-add.sh
+++ b/t/t2025-worktree-add.sh
@@ -571,10 +571,10 @@ test_expect_success '"add" an existing locked but missing worktree' '
 '
 
 test_expect_success 'sanitize generated worktree name' '
-	git worktree add --detach ".  weird*..?.lock.lock." &&
-	test -d .git/worktrees/weird-lock-lock &&
+	git worktree add --detach ".  weird*..?.lock.lock" &&
+	test -d .git/worktrees/---weird-.--.lock-lock &&
 	git worktree add --detach .... &&
-	test -d .git/worktrees/worktree
+	test -d .git/worktrees/--.-
 '
 
 test_done
-- 8< --

--
Duy

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 1/1] worktree add: sanitize worktree names
  2019-02-26 10:58       ` [PATCH v3 1/1] " Nguyễn Thái Ngọc Duy
  2019-02-27 12:08         ` Jeff King
@ 2019-03-04 15:06         ` Johannes Schindelin
  1 sibling, 0 replies; 41+ messages in thread
From: Johannes Schindelin @ 2019-03-04 15:06 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, hi-angel, sunshine, Jeff King, Ramsay Jones

[-- Attachment #1: Type: text/plain, Size: 1102 bytes --]

Hi Duy,

On Tue, 26 Feb 2019, Nguyễn Thái Ngọc Duy wrote:

> diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
> index 286bba35d8..ea22207361 100755
> --- a/t/t2025-worktree-add.sh
> +++ b/t/t2025-worktree-add.sh
> @@ -570,4 +570,11 @@ test_expect_success '"add" an existing locked but missing worktree' '
>  	git worktree add --force --force --detach gnoo
>  '
>  
> +test_expect_success 'sanitize generated worktree name' '
> +	git worktree add --detach ".  weird*..?.lock.lock." &&
> +	test -d .git/worktrees/weird-lock-lock &&
> +	git worktree add --detach .... &&
> +	test -d .git/worktrees/worktree
> +'
> +
>  test_done

You probably missed that this added test fails on Windows:

https://dev.azure.com/gitgitgadget/git/_build/results?buildId=3782&view=logs

The reason is that you use a "funny name" which cannot be represented on
every filesystem.

Please use the FUNNYNAMES prerequisite (or introduce another one, if you
are uncomfortable with using a test whether tabs are valid in filenames to
indiciate whether wildcard characters are valid in filenames).

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v4 0/2] worktree add: sanitize worktree names
  2019-02-26 10:58     ` [PATCH v3 0/1] " Nguyễn Thái Ngọc Duy
  2019-02-26 10:58       ` [PATCH v3 1/1] " Nguyễn Thái Ngọc Duy
@ 2019-03-05 12:08       ` Nguyễn Thái Ngọc Duy
  2019-03-05 12:08         ` [PATCH v4 1/2] refs.c: refactor check_refname_component() Nguyễn Thái Ngọc Duy
                           ` (2 more replies)
  1 sibling, 3 replies; 41+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2019-03-05 12:08 UTC (permalink / raw)
  To: pclouds
  Cc: git, hi-angel, peff, ramsay, sunshine, Junio C Hamano,
	Johannes Schindelin

v4 refactors check_refname_component() so that we could do more accurate
substitution (and leave fewer traps).

Performance of sanitize_worktree_refname() goes back to horrible
again. But since it's not really a big deal (no body is going to add
200 worktrees per second), I don't feel like we should optimize it.
That may involve removing the for loop in do_check_refname_component()
and making things uglier.

The test is also updated to have FUNNYNAMES prerequisite, which is
always unset on Windows. This should fix the breakage there.

Nguyễn Thái Ngọc Duy (2):
  refs.c: refactor check_refname_component()
  worktree add: sanitize worktree names

 builtin/worktree.c      |   7 ++-
 refs.c                  | 114 ++++++++++++++++++++++++++++++++++------
 refs.h                  |   1 +
 t/t2025-worktree-add.sh |   5 ++
 4 files changed, 110 insertions(+), 17 deletions(-)

-- 
2.21.0.rc1.337.gdf7f8d0522


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v4 1/2] refs.c: refactor check_refname_component()
  2019-03-05 12:08       ` [PATCH v4 0/2] " Nguyễn Thái Ngọc Duy
@ 2019-03-05 12:08         ` Nguyễn Thái Ngọc Duy
  2019-03-06 21:49           ` Jeff King
  2019-03-05 12:08         ` [PATCH v4 2/2] worktree add: sanitize worktree names Nguyễn Thái Ngọc Duy
  2019-03-08  9:28         ` [PATCH v5 0/1] " Nguyễn Thái Ngọc Duy
  2 siblings, 1 reply; 41+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2019-03-05 12:08 UTC (permalink / raw)
  To: pclouds
  Cc: git, hi-angel, peff, ramsay, sunshine, Junio C Hamano,
	Johannes Schindelin

The core logic of this function is factored out to provide more
information when the refname is invalid: what part that is and exact
what is wrong with it. This will be useful for a coming function that
has to turn a string into a valid refname component.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 refs.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 62 insertions(+), 16 deletions(-)

diff --git a/refs.c b/refs.c
index 142888a40a..70c55ea1b6 100644
--- a/refs.c
+++ b/refs.c
@@ -57,10 +57,21 @@ static unsigned char refname_disposition[256] = {
 	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 4, 4
 };
 
+enum refname_check_code {
+	 refname_ok = 0,
+	 refname_contains_dotdot,
+	 refname_contains_atopen,
+	 refname_has_badchar,
+	 refname_contains_wildcard,
+	 refname_starts_with_dot,
+	 refname_ends_with_dotlock,
+	 refname_component_has_zero_length
+};
+
 /*
  * Try to read one refname component from the front of refname.
- * Return the length of the component found, or -1 if the component is
- * not legal.  It is legal if it is something reasonable to have under
+ * If the component is legal, return the end of the component in cp_out.
+ * It is legal if it is something reasonable to have under
  * ".git/refs/"; We do not like it if:
  *
  * - any path component of it begins with ".", or
@@ -71,11 +82,15 @@ static unsigned char refname_disposition[256] = {
  * - it ends with a "/", or
  * - it ends with ".lock", or
  * - it contains a "@{" portion
+ *
+ * in which case cp_out points to the beginning of the illegal part.
  */
-static int check_refname_component(const char *refname, int *flags)
+static enum refname_check_code do_check_refname_component(
+	const char *refname, int *flags, const char **cp_out)
 {
 	const char *cp;
 	char last = '\0';
+	enum refname_check_code ret = refname_ok;
 
 	for (cp = refname; ; cp++) {
 		int ch = *cp & 255;
@@ -84,18 +99,28 @@ static int check_refname_component(const char *refname, int *flags)
 		case 1:
 			goto out;
 		case 2:
-			if (last == '.')
-				return -1; /* Refname contains "..". */
+			if (last == '.') {
+				cp--;
+				ret = refname_contains_dotdot;
+				goto done;
+			}
 			break;
 		case 3:
-			if (last == '@')
-				return -1; /* Refname contains "@{". */
+			if (last == '@') {
+				cp--;
+				ret = refname_contains_atopen; /* @{ */
+				goto done;
+			}
 			break;
 		case 4:
-			return -1;
+			ret = refname_has_badchar;
+			goto done;
 		case 5:
-			if (!(*flags & REFNAME_REFSPEC_PATTERN))
-				return -1; /* refspec can't be a pattern */
+			if (!(*flags & REFNAME_REFSPEC_PATTERN)) {
+				/* refspec can't be a pattern */
+				ret = refname_contains_wildcard;
+				goto done;
+			}
 
 			/*
 			 * Unset the pattern flag so that we only accept
@@ -107,13 +132,34 @@ static int check_refname_component(const char *refname, int *flags)
 		last = ch;
 	}
 out:
-	if (cp == refname)
-		return 0; /* Component has zero length. */
-	if (refname[0] == '.')
-		return -1; /* Component starts with '.'. */
+	if (cp == refname) {
+		ret = refname_component_has_zero_length;
+		goto done;
+	}
+	if (refname[0] == '.') {
+		cp = refname;
+		ret = refname_starts_with_dot;
+		goto done;
+	}
 	if (cp - refname >= LOCK_SUFFIX_LEN &&
-	    !memcmp(cp - LOCK_SUFFIX_LEN, LOCK_SUFFIX, LOCK_SUFFIX_LEN))
-		return -1; /* Refname ends with ".lock". */
+	    !memcmp(cp - LOCK_SUFFIX_LEN, LOCK_SUFFIX, LOCK_SUFFIX_LEN)) {
+		cp -= LOCK_SUFFIX_LEN;
+		ret = refname_ends_with_dotlock;
+	}
+done:
+	*cp_out = cp;
+	return ret;
+}
+
+/* Return the length of the component if it's legal otherwise -1 */
+static int check_refname_component(const char *refname, int *flags)
+{
+	const char *cp;
+	enum refname_check_code ret;
+
+	ret = do_check_refname_component(refname, flags, &cp);
+	if (ret)
+		return -1;
 	return cp - refname;
 }
 
-- 
2.21.0.rc1.337.gdf7f8d0522


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v4 2/2] worktree add: sanitize worktree names
  2019-03-05 12:08       ` [PATCH v4 0/2] " Nguyễn Thái Ngọc Duy
  2019-03-05 12:08         ` [PATCH v4 1/2] refs.c: refactor check_refname_component() Nguyễn Thái Ngọc Duy
@ 2019-03-05 12:08         ` Nguyễn Thái Ngọc Duy
  2019-03-08  9:28         ` [PATCH v5 0/1] " Nguyễn Thái Ngọc Duy
  2 siblings, 0 replies; 41+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2019-03-05 12:08 UTC (permalink / raw)
  To: pclouds
  Cc: git, hi-angel, peff, ramsay, sunshine, Junio C Hamano,
	Johannes Schindelin

Worktree names are based on $(basename $GIT_WORK_TREE). They aren't
significant until 3a3b9d8cde (refs: new ref types to make per-worktree
refs visible to all worktrees - 2018-10-21), where worktree name could
be part of a refname and must follow refname rules.

Update 'worktree add' code to remove special characters to follow these
rules. In the future the user will be able to specify the worktree name
by themselves if they're not happy with this dumb character
substitution.

Reported-by: Konstantin Kharlamov <hi-angel@yandex.ru>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/worktree.c      |  7 ++++++-
 refs.c                  | 36 ++++++++++++++++++++++++++++++++++++
 refs.h                  |  1 +
 t/t2025-worktree-add.sh |  5 +++++
 4 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 3f9907fcc9..ca63dd3df6 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -275,6 +275,7 @@ static int add_worktree(const char *path, const char *refname,
 	struct strbuf symref = STRBUF_INIT;
 	struct commit *commit = NULL;
 	int is_branch = 0;
+	struct strbuf sb_name = STRBUF_INIT;
 
 	validate_worktree_add(path, opts);
 
@@ -290,7 +291,10 @@ static int add_worktree(const char *path, const char *refname,
 		die(_("invalid reference: %s"), refname);
 
 	name = worktree_basename(path, &len);
-	git_path_buf(&sb_repo, "worktrees/%.*s", (int)(path + len - name), name);
+	strbuf_add(&sb_name, name, path + len - name);
+	sanitize_worktree_refname(&sb_name);
+	name = sb_name.buf;
+	git_path_buf(&sb_repo, "worktrees/%s", name);
 	len = sb_repo.len;
 	if (safe_create_leading_directories_const(sb_repo.buf))
 		die_errno(_("could not create leading directories of '%s'"),
@@ -415,6 +419,7 @@ static int add_worktree(const char *path, const char *refname,
 	strbuf_release(&symref);
 	strbuf_release(&sb_repo);
 	strbuf_release(&sb_git);
+	strbuf_release(&sb_name);
 	return ret;
 }
 
diff --git a/refs.c b/refs.c
index 70c55ea1b6..a23a84e431 100644
--- a/refs.c
+++ b/refs.c
@@ -163,6 +163,42 @@ static int check_refname_component(const char *refname, int *flags)
 	return cp - refname;
 }
 
+void sanitize_worktree_refname(struct strbuf *name)
+{
+	int flags = 0, i, max_tries;
+	const char *cp;
+	enum refname_check_code ret;
+
+	/*
+	 * name->len should be enough because we should never need to
+	 * substitute any position more than once, but let's just add
+	 * a couple more to be on the safe side.
+	 */
+	max_tries = name->len + 10;
+	for (i = 0; i < max_tries; i++) {
+		ret = do_check_refname_component(name->buf, &flags, &cp);
+		switch (ret) {
+		case refname_ok:
+			strbuf_setlen(name, cp - name->buf);
+			return;
+
+		case refname_component_has_zero_length:
+			strbuf_addstr(name, "worktree");
+			return;
+
+		case refname_contains_dotdot:
+		case refname_contains_atopen:
+		case refname_has_badchar:
+		case refname_contains_wildcard:
+		case refname_ends_with_dotlock:
+		case refname_starts_with_dot:
+			*(char *)cp = '-';
+			break;
+		}
+	}
+	BUG("stuck in infinite loop! buf = %s", name->buf);
+}
+
 int check_refname_format(const char *refname, int flags)
 {
 	int component_len, component_count = 0;
diff --git a/refs.h b/refs.h
index 308fa1f03b..3b65b8d27a 100644
--- a/refs.h
+++ b/refs.h
@@ -459,6 +459,7 @@ int for_each_reflog(each_ref_fn fn, void *cb_data);
  * repeated slashes are accepted.
  */
 int check_refname_format(const char *refname, int flags);
+void sanitize_worktree_refname(struct strbuf *name);
 
 const char *prettify_refname(const char *refname);
 
diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
index 286bba35d8..6e2b90c84f 100755
--- a/t/t2025-worktree-add.sh
+++ b/t/t2025-worktree-add.sh
@@ -570,4 +570,9 @@ test_expect_success '"add" an existing locked but missing worktree' '
 	git worktree add --force --force --detach gnoo
 '
 
+test_expect_success FUNNYNAMES 'sanitize generated worktree name' '
+	git worktree add --detach ".  weird*..?.lock.lock" &&
+	test -d .git/worktrees/---weird--.-.lock-lock
+'
+
 test_done
-- 
2.21.0.rc1.337.gdf7f8d0522


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 1/2] refs.c: refactor check_refname_component()
  2019-03-05 12:08         ` [PATCH v4 1/2] refs.c: refactor check_refname_component() Nguyễn Thái Ngọc Duy
@ 2019-03-06 21:49           ` Jeff King
  2019-03-07 23:24             ` Eric Sunshine
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-03-06 21:49 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, hi-angel, ramsay, sunshine, Junio C Hamano,
	Johannes Schindelin

On Tue, Mar 05, 2019 at 07:08:33PM +0700, Nguyễn Thái Ngọc Duy wrote:

> @@ -71,11 +82,15 @@ static unsigned char refname_disposition[256] = {
>   * - it ends with a "/", or
>   * - it ends with ".lock", or
>   * - it contains a "@{" portion
> + *
> + * in which case cp_out points to the beginning of the illegal part.
>   */
> -static int check_refname_component(const char *refname, int *flags)
> +static enum refname_check_code do_check_refname_component(
> +	const char *refname, int *flags, const char **cp_out)

Hmm, OK, so we get to know what type of problem, but also the exact
character where we found it. And then we just keep mutating that char
until we have something that passes.

I can't think of any reason that wouldn't work. As you note, it's
possibly quadratic, though that might be OK for our purposes.

I had envisioned just sanitizing each character into an output buffer as
we did the checks. It does introduce some complexities, though, because
now the checking function is doing the replacement (so it has to know
the right sanitizing rule for each case).

The patch below is a rough cut at that, just for discussion.  You can
ignore the check-ref-format bits; they were just to make poking at it
easier, though perhaps we'd want something like that in the long run.

I suspect check_refname_component() could be made a bit more readable by
reordering a few bits. E.g., why do we check for a leading "." at the
_end_, after having parsed the entire rest of the component for errors?

I dunno. I think I can live with what you've got in your series, but I
figured I'd share this for the sake of completeness. If you really love
it, feel free to adapt it.

diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index bc67d3f0a8..41b5434be2 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -56,6 +56,7 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 	int i;
 	int normalize = 0;
 	int flags = 0;
+	int sanitize = 0;
 	const char *refname;
 
 	if (argc == 2 && !strcmp(argv[1], "-h"))
@@ -73,13 +74,22 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 			flags &= ~REFNAME_ALLOW_ONELEVEL;
 		else if (!strcmp(argv[i], "--refspec-pattern"))
 			flags |= REFNAME_REFSPEC_PATTERN;
+		else if (!strcmp(argv[i], "--sanitize"))
+			sanitize = 1;
 		else
 			usage(builtin_check_ref_format_usage);
 	}
 	if (! (i == argc - 1))
 		usage(builtin_check_ref_format_usage);
 
 	refname = argv[i];
+	if (sanitize) {
+		struct strbuf out = STRBUF_INIT;
+		sanitize_refname(refname, &out);
+		printf("%s\n", out.buf);
+		strbuf_release(&out);
+		return 0;
+	}
 	if (normalize)
 		refname = collapse_slashes(refname);
 	if (check_refname_format(refname, flags))
diff --git a/refs.c b/refs.c
index 142888a40a..2a0c0c6338 100644
--- a/refs.c
+++ b/refs.c
@@ -72,30 +72,58 @@ static unsigned char refname_disposition[256] = {
  * - it ends with ".lock", or
  * - it contains a "@{" portion
  */
-static int check_refname_component(const char *refname, int *flags)
+static int check_refname_component(const char *refname, int *flags,
+				   struct strbuf *sanitized)
 {
 	const char *cp;
 	char last = '\0';
+	size_t component_start;
+
+	if (sanitized)
+		component_start = sanitized->len;
 
 	for (cp = refname; ; cp++) {
 		int ch = *cp & 255;
 		unsigned char disp = refname_disposition[ch];
+
+		if (sanitized && disp != 1)
+			strbuf_addch(sanitized, ch);
+
 		switch (disp) {
 		case 1:
 			goto out;
 		case 2:
-			if (last == '.')
-				return -1; /* Refname contains "..". */
+			if (last == '.') {
+				/* Refname contains "..". */
+				if (sanitized)
+					sanitized->len--; /* collapse ".." to single "." */
+				else
+					return -1;
+			}
 			break;
 		case 3:
-			if (last == '@')
-				return -1; /* Refname contains "@{". */
+			if (last == '@') {
+				/* Refname contains "@{". */
+				if (sanitized)
+					sanitized->buf[sanitized->len-1] = '-';
+				else
+					return -1;
+			}
 			break;
 		case 4:
-			return -1;
+			/* forbidden char */
+			if (sanitized)
+				sanitized->buf[sanitized->len-1] = '-';
+			else
+				return -1;
+			break;
 		case 5:
-			if (!(*flags & REFNAME_REFSPEC_PATTERN))
-				return -1; /* refspec can't be a pattern */
+			if (!(*flags & REFNAME_REFSPEC_PATTERN)) {
+				if (sanitized)
+					sanitized->buf[sanitized->len-1] = '-';
+				else
+					return -1; /* refspec can't be a pattern */
+			}
 
 			/*
 			 * Unset the pattern flag so that we only accept
@@ -109,26 +137,48 @@ static int check_refname_component(const char *refname, int *flags)
 out:
 	if (cp == refname)
 		return 0; /* Component has zero length. */
-	if (refname[0] == '.')
-		return -1; /* Component starts with '.'. */
+
+	if (refname[0] == '.') {
+		/* Component starts with '.'. */
+		if (sanitized)
+			sanitized->buf[component_start] = '-';
+		else
+			return -1;
+	}
 	if (cp - refname >= LOCK_SUFFIX_LEN &&
-	    !memcmp(cp - LOCK_SUFFIX_LEN, LOCK_SUFFIX, LOCK_SUFFIX_LEN))
-		return -1; /* Refname ends with ".lock". */
+	    !memcmp(cp - LOCK_SUFFIX_LEN, LOCK_SUFFIX, LOCK_SUFFIX_LEN)) {
+		/* Refname ends with ".lock". */
+		if (sanitized)
+			strbuf_strip_suffix(sanitized, LOCK_SUFFIX);
+		else
+			return -1;
+	}
 	return cp - refname;
 }
 
-int check_refname_format(const char *refname, int flags)
+static int check_or_sanitize_refname(const char *refname, int flags,
+				     struct strbuf *sanitized)
 {
 	int component_len, component_count = 0;
 
-	if (!strcmp(refname, "@"))
+	if (!strcmp(refname, "@")) {
 		/* Refname is a single character '@'. */
-		return -1;
+		if (sanitized)
+			strbuf_addch(sanitized, '-');
+		else
+			return -1;
+	}
 
 	while (1) {
+		if (sanitized && sanitized->len)
+			strbuf_complete(sanitized, '/');
+
 		/* We are at the start of a path component. */
-		component_len = check_refname_component(refname, &flags);
-		if (component_len <= 0)
+		component_len = check_refname_component(refname, &flags,
+							sanitized);
+		if (sanitized && component_len == 0)
+			; /* OK, omit empty component */
+		else if (component_len <= 0)
 			return -1;
 
 		component_count++;
@@ -138,13 +188,29 @@ int check_refname_format(const char *refname, int flags)
 		refname += component_len + 1;
 	}
 
-	if (refname[component_len - 1] == '.')
-		return -1; /* Refname ends with '.'. */
+	if (refname[component_len - 1] == '.') {
+		/* Refname ends with '.'. */
+		if (sanitized)
+			; /* omit ending dot */
+		else
+			return -1;
+	}
 	if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
 		return -1; /* Refname has only one component. */
 	return 0;
 }
 
+int check_refname_format(const char *refname, int flags)
+{
+	return check_or_sanitize_refname(refname, flags, NULL);
+}
+
+void sanitize_refname(const char *refname, struct strbuf *out)
+{
+	if (check_or_sanitize_refname(refname, 0, out))
+		BUG("sanitizing refname check returned error");
+}
+
 int refname_is_safe(const char *refname)
 {
 	const char *rest;
diff --git a/refs.h b/refs.h
index 308fa1f03b..b99c309dd9 100644
--- a/refs.h
+++ b/refs.h
@@ -460,6 +460,12 @@ int for_each_reflog(each_ref_fn fn, void *cb_data);
  */
 int check_refname_format(const char *refname, int flags);
 
+/*
+ * Apply the rules from check_refname_format, but mutate the result until it
+ * is acceptable, and place the result in "out".
+ */
+void sanitize_refname(const char *refname, struct strbuf *out);
+
 const char *prettify_refname(const char *refname);
 
 char *shorten_unambiguous_ref(const char *refname, int strict);

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v4 1/2] refs.c: refactor check_refname_component()
  2019-03-06 21:49           ` Jeff King
@ 2019-03-07 23:24             ` Eric Sunshine
  0 siblings, 0 replies; 41+ messages in thread
From: Eric Sunshine @ 2019-03-07 23:24 UTC (permalink / raw)
  To: Jeff King
  Cc: Nguyễn Thái Ngọc Duy, Git List, Yagamy Light,
	Ramsay Jones, Junio C Hamano, Johannes Schindelin

On Wed, Mar 6, 2019 at 4:49 PM Jeff King <peff@peff.net> wrote:
> On Tue, Mar 05, 2019 at 07:08:33PM +0700, Nguyễn Thái Ngọc Duy wrote:
> I had envisioned just sanitizing each character into an output buffer as
> we did the checks. It does introduce some complexities, though, because
> now the checking function is doing the replacement (so it has to know
> the right sanitizing rule for each case).
>
> The patch below is a rough cut at that, just for discussion.  You can
> ignore the check-ref-format bits; they were just to make poking at it
> easier, though perhaps we'd want something like that in the long run.

This is more along the lines of what I had envisioned, as well, after
looking over the implementation of check_refname_component(). It's a
bit noisy and loud but easy to follow, and doesn't give rise to
concerns about quadratic behavior, etc.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v5 0/1] worktree add: sanitize worktree names
  2019-03-05 12:08       ` [PATCH v4 0/2] " Nguyễn Thái Ngọc Duy
  2019-03-05 12:08         ` [PATCH v4 1/2] refs.c: refactor check_refname_component() Nguyễn Thái Ngọc Duy
  2019-03-05 12:08         ` [PATCH v4 2/2] worktree add: sanitize worktree names Nguyễn Thái Ngọc Duy
@ 2019-03-08  9:28         ` Nguyễn Thái Ngọc Duy
  2019-03-08  9:28           ` [PATCH v5 1/1] " Nguyễn Thái Ngọc Duy
  2 siblings, 1 reply; 41+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2019-03-08  9:28 UTC (permalink / raw)
  To: pclouds; +Cc: Johannes.Schindelin, git, gitster, hi-angel, peff, ramsay,
	sunshine

v5 is basically Jeff's version from one of the replies in v4, where
check_refname_component is enhanced to optionally sanitize.

I was reluctant to go this way because it makes check_refname_component
more complex (turns out still manageable) and burns worktree rules in
it. But there may never be the second sanitization user, we deal with
it when it comes.

As said, refs.c is pretty much Jeff's except two major changes:

 - handle foo.lock.lock correctly by stripping .lock repeatedly

 - sanitize refname _components_ instead of full refs. I could construct
   worktrees/<name> and pass to Jeff's sanitize_refname. But then I need
   to strip worktrees/ after that.

I took credits so that bugs come to me first (then I'll blame him
anyway while doing some evil laughs)

Nguyễn Thái Ngọc Duy (1):
  worktree add: sanitize worktree names

 builtin/worktree.c      |  10 +++-
 refs.c                  | 103 ++++++++++++++++++++++++++++++++--------
 refs.h                  |   6 +++
 t/t2400-worktree-add.sh |   5 ++
 4 files changed, 104 insertions(+), 20 deletions(-)

-- 
2.21.0.rc1.337.gdf7f8d0522


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v5 1/1] worktree add: sanitize worktree names
  2019-03-08  9:28         ` [PATCH v5 0/1] " Nguyễn Thái Ngọc Duy
@ 2019-03-08  9:28           ` Nguyễn Thái Ngọc Duy
  2019-03-10  2:02             ` Eric Sunshine
                               ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2019-03-08  9:28 UTC (permalink / raw)
  To: pclouds; +Cc: Johannes.Schindelin, git, gitster, hi-angel, peff, ramsay,
	sunshine

Worktree names are based on $(basename $GIT_WORK_TREE). They aren't
significant until 3a3b9d8cde (refs: new ref types to make per-worktree
refs visible to all worktrees - 2018-10-21), where worktree name could
be part of a refname and must follow refname rules.

Update 'worktree add' code to remove special characters to follow
these rules. In the future the user will be able to specify the
worktree name by themselves if they're not happy with this dumb
character substitution.

Reported-by: Konstantin Kharlamov <hi-angel@yandex.ru>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/worktree.c      |  10 +++-
 refs.c                  | 103 ++++++++++++++++++++++++++++++++--------
 refs.h                  |   6 +++
 t/t2400-worktree-add.sh |   5 ++
 4 files changed, 104 insertions(+), 20 deletions(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 6cc094a453..756cf3a417 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -275,6 +275,7 @@ static int add_worktree(const char *path, const char *refname,
 	struct strbuf symref = STRBUF_INIT;
 	struct commit *commit = NULL;
 	int is_branch = 0;
+	struct strbuf sb_name = STRBUF_INIT;
 
 	validate_worktree_add(path, opts);
 
@@ -290,7 +291,13 @@ static int add_worktree(const char *path, const char *refname,
 		die(_("invalid reference: %s"), refname);
 
 	name = worktree_basename(path, &len);
-	git_path_buf(&sb_repo, "worktrees/%.*s", (int)(path + len - name), name);
+	strbuf_add(&sb, name, path + len - name);
+	sanitize_refname_component(sb.buf, &sb_name);
+	if (!sb_name.len)
+		BUG("How come '%s' becomes empty after sanitization?", sb.buf);
+	strbuf_reset(&sb);
+	name = sb_name.buf;
+	git_path_buf(&sb_repo, "worktrees/%s", name);
 	len = sb_repo.len;
 	if (safe_create_leading_directories_const(sb_repo.buf))
 		die_errno(_("could not create leading directories of '%s'"),
@@ -416,6 +423,7 @@ static int add_worktree(const char *path, const char *refname,
 	strbuf_release(&symref);
 	strbuf_release(&sb_repo);
 	strbuf_release(&sb_git);
+	strbuf_release(&sb_name);
 	return ret;
 }
 
diff --git a/refs.c b/refs.c
index 142888a40a..e9f83018f0 100644
--- a/refs.c
+++ b/refs.c
@@ -72,30 +72,57 @@ static unsigned char refname_disposition[256] = {
  * - it ends with ".lock", or
  * - it contains a "@{" portion
  */
-static int check_refname_component(const char *refname, int *flags)
+static int check_refname_component(const char *refname, int *flags,
+				   struct strbuf *sanitized)
 {
 	const char *cp;
 	char last = '\0';
+	size_t component_start;
+
+	if (sanitized)
+		component_start = sanitized->len;
 
 	for (cp = refname; ; cp++) {
 		int ch = *cp & 255;
 		unsigned char disp = refname_disposition[ch];
+
+		if (sanitized && disp != 1)
+			strbuf_addch(sanitized, ch);
+
 		switch (disp) {
 		case 1:
 			goto out;
 		case 2:
-			if (last == '.')
-				return -1; /* Refname contains "..". */
+			if (last == '.') { /* Refname contains "..". */
+				if (sanitized)
+					sanitized->len--; /* collapse ".." to single "." */
+				else
+					return -1;
+			}
 			break;
 		case 3:
-			if (last == '@')
-				return -1; /* Refname contains "@{". */
+			if (last == '@') { /* Refname contains "@{". */
+				if (sanitized)
+					sanitized->buf[sanitized->len-1] = '-';
+				else
+					return -1;
+			}
 			break;
 		case 4:
-			return -1;
+			/* forbidden char */
+			if (sanitized)
+				sanitized->buf[sanitized->len-1] = '-';
+			else
+				return -1;
+			break;
 		case 5:
-			if (!(*flags & REFNAME_REFSPEC_PATTERN))
-				return -1; /* refspec can't be a pattern */
+			if (!(*flags & REFNAME_REFSPEC_PATTERN)) {
+				/* refspec can't be a pattern */
+				if (sanitized)
+					sanitized->buf[sanitized->len-1] = '-';
+				else
+					return -1;
+			}
 
 			/*
 			 * Unset the pattern flag so that we only accept
@@ -109,26 +136,48 @@ static int check_refname_component(const char *refname, int *flags)
 out:
 	if (cp == refname)
 		return 0; /* Component has zero length. */
-	if (refname[0] == '.')
-		return -1; /* Component starts with '.'. */
+
+	if (refname[0] == '.') { /* Component starts with '.'. */
+		if (sanitized)
+			sanitized->buf[component_start] = '-';
+		else
+			return -1;
+	}
 	if (cp - refname >= LOCK_SUFFIX_LEN &&
-	    !memcmp(cp - LOCK_SUFFIX_LEN, LOCK_SUFFIX, LOCK_SUFFIX_LEN))
-		return -1; /* Refname ends with ".lock". */
+	    !memcmp(cp - LOCK_SUFFIX_LEN, LOCK_SUFFIX, LOCK_SUFFIX_LEN)) {
+		if (!sanitized)
+			return -1;
+		/* Refname ends with ".lock". */
+		while (strbuf_strip_suffix(sanitized, LOCK_SUFFIX)) {
+			/* try again in case we have .lock.lock */
+		}
+	}
 	return cp - refname;
 }
 
-int check_refname_format(const char *refname, int flags)
+static int check_or_sanitize_refname(const char *refname, int flags,
+				     struct strbuf *sanitized)
 {
 	int component_len, component_count = 0;
 
-	if (!strcmp(refname, "@"))
+	if (!strcmp(refname, "@")) {
 		/* Refname is a single character '@'. */
-		return -1;
+		if (sanitized)
+			strbuf_addch(sanitized, '-');
+		else
+			return -1;
+	}
 
 	while (1) {
+		if (sanitized && sanitized->len)
+			strbuf_complete(sanitized, '/');
+
 		/* We are at the start of a path component. */
-		component_len = check_refname_component(refname, &flags);
-		if (component_len <= 0)
+		component_len = check_refname_component(refname, &flags,
+							sanitized);
+		if (sanitized && component_len == 0)
+			; /* OK, omit empty component */
+		else if (component_len <= 0)
 			return -1;
 
 		component_count++;
@@ -138,13 +187,29 @@ int check_refname_format(const char *refname, int flags)
 		refname += component_len + 1;
 	}
 
-	if (refname[component_len - 1] == '.')
-		return -1; /* Refname ends with '.'. */
+	if (refname[component_len - 1] == '.') {
+		/* Refname ends with '.'. */
+		if (sanitized)
+			; /* omit ending dot */
+		else
+			return -1;
+	}
 	if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
 		return -1; /* Refname has only one component. */
 	return 0;
 }
 
+int check_refname_format(const char *refname, int flags)
+{
+	return check_or_sanitize_refname(refname, flags, NULL);
+}
+
+void sanitize_refname_component(const char *refname, struct strbuf *out)
+{
+	if (check_or_sanitize_refname(refname, REFNAME_ALLOW_ONELEVEL, out))
+		BUG("sanitizing refname '%s' check returned error", refname);
+}
+
 int refname_is_safe(const char *refname)
 {
 	const char *rest;
diff --git a/refs.h b/refs.h
index 308fa1f03b..4d8c5465c3 100644
--- a/refs.h
+++ b/refs.h
@@ -460,6 +460,12 @@ int for_each_reflog(each_ref_fn fn, void *cb_data);
  */
 int check_refname_format(const char *refname, int flags);
 
+/*
+ * Apply the rules from check_refname_format, but mutate the result until it
+ * is acceptable, and place the result in "out".
+ */
+void sanitize_refname_component(const char *refname, struct strbuf *out);
+
 const char *prettify_refname(const char *refname);
 
 char *shorten_unambiguous_ref(const char *refname, int strict);
diff --git a/t/t2400-worktree-add.sh b/t/t2400-worktree-add.sh
index 286bba35d8..c989dbe321 100755
--- a/t/t2400-worktree-add.sh
+++ b/t/t2400-worktree-add.sh
@@ -570,4 +570,9 @@ test_expect_success '"add" an existing locked but missing worktree' '
 	git worktree add --force --force --detach gnoo
 '
 
+test_expect_success FUNNYNAMES 'sanitize generated worktree name' '
+	git worktree add --detach ".  weird*..?.lock.lock" &&
+	test -d .git/worktrees/---weird-.-
+'
+
 test_done
-- 
2.21.0.rc1.337.gdf7f8d0522


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 1/1] worktree add: sanitize worktree names
  2019-03-08  9:28           ` [PATCH v5 1/1] " Nguyễn Thái Ngọc Duy
@ 2019-03-10  2:02             ` Eric Sunshine
  2019-03-11  6:20               ` Junio C Hamano
  2019-03-11  6:36             ` Junio C Hamano
  2019-03-11 13:05             ` Johannes Schindelin
  2 siblings, 1 reply; 41+ messages in thread
From: Eric Sunshine @ 2019-03-10  2:02 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: Johannes Schindelin, Git List, Junio C Hamano, Yagamy Light,
	Jeff King, Ramsay Jones

On Fri, Mar 8, 2019 at 4:28 AM Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> Worktree names are based on $(basename $GIT_WORK_TREE). They aren't
> significant until 3a3b9d8cde (refs: new ref types to make per-worktree
> refs visible to all worktrees - 2018-10-21), where worktree name could
> be part of a refname and must follow refname rules.
>
> Update 'worktree add' code to remove special characters to follow
> these rules. In the future the user will be able to specify the
> worktree name by themselves if they're not happy with this dumb
> character substitution.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
> diff --git a/refs.c b/refs.c
> @@ -72,30 +72,57 @@ static unsigned char refname_disposition[256] = {
> +static int check_refname_component(const char *refname, int *flags,
> +                                  struct strbuf *sanitized)
>  {
>         for (cp = refname; ; cp++) {
>                 unsigned char disp = refname_disposition[ch];
> +               if (sanitized && disp != 1)
> +                       strbuf_addch(sanitized, ch);
> +
>                 switch (disp) {
>                 case 1:
>                         goto out;
>                 case 2:
> +                       if (last == '.') { /* Refname contains "..". */
> +                               if (sanitized)
> +                                       sanitized->len--; /* collapse ".." to single "." */

I think this needs to be:

    strbuf_setlen(sanitized, sanitized->len - 1);

to ensure that NUL-terminator ends up in the correct place if this "."
is the very last character in 'refname'. (Otherwise, the NUL will
remain after the second ".", thus ".." won't be collapsed to "." at
all.)

> +                               else
> +                                       return -1;
> +                       }
>                         break;

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 1/1] worktree add: sanitize worktree names
  2019-03-10  2:02             ` Eric Sunshine
@ 2019-03-11  6:20               ` Junio C Hamano
  2019-03-11  9:24                 ` Duy Nguyen
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2019-03-11  6:20 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Nguyễn Thái Ngọc Duy, Johannes Schindelin,
	Git List, Yagamy Light, Jeff King, Ramsay Jones

Eric Sunshine <sunshine@sunshineco.com> writes:

>>                 case 2:
>> +                       if (last == '.') { /* Refname contains "..". */
>> +                               if (sanitized)
>> +                                       sanitized->len--; /* collapse ".." to single "." */
>
> I think this needs to be:
>
>     strbuf_setlen(sanitized, sanitized->len - 1);
>
> to ensure that NUL-terminator ends up in the correct place if this "."
> is the very last character in 'refname'. (Otherwise, the NUL will
> remain after the second ".", thus ".." won't be collapsed to "." at
> all.)

True.  Why doesn't it do the similar "replace with -" it does for
other unfortunate characters, though?


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 1/1] worktree add: sanitize worktree names
  2019-03-08  9:28           ` [PATCH v5 1/1] " Nguyễn Thái Ngọc Duy
  2019-03-10  2:02             ` Eric Sunshine
@ 2019-03-11  6:36             ` Junio C Hamano
  2019-03-11  9:27               ` Duy Nguyen
  2019-03-11 13:05             ` Johannes Schindelin
  2 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2019-03-11  6:36 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: Johannes.Schindelin, git, hi-angel, peff, ramsay, sunshine

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> Update 'worktree add' code to remove special characters to follow
> these rules. In the future the user will be able to specify the
> worktree name by themselves if they're not happy with this dumb
> character substitution.

This replaces both of the two patches in v4, and applies to a more
recent tip of 'master' (post 7d0c1f4556).

> diff --git a/refs.c b/refs.c
> index 142888a40a..e9f83018f0 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -72,30 +72,57 @@ static unsigned char refname_disposition[256] = {
>   * - it ends with ".lock", or
>   * - it contains a "@{" portion
>   */

The comment above needs modernizing (see attached at the end).

>  		case 2:
> -			if (last == '.')
> -				return -1; /* Refname contains "..". */
> +			if (last == '.') { /* Refname contains "..". */
> +				if (sanitized)
> +					sanitized->len--; /* collapse ".." to single "." */

As Eric points out, this needs to be fixed.  

I'll use the strbuf_setlen() version suggested by Eric in the
meantime, but "sanitized->buf[sanitized->len-1] = '-'" as done to
everything else in the function may be a better idea, especially
since they'll be able to name the worktree themselves in the future
anyway.

> +
> +	if (refname[0] == '.') { /* Component starts with '.'. */
> +		if (sanitized)
> +			sanitized->buf[component_start] = '-';

... and a dot turns into a dash in some cases anyway.

> +		else
> +			return -1;
> +	}
>  	if (cp - refname >= LOCK_SUFFIX_LEN &&
> -	    !memcmp(cp - LOCK_SUFFIX_LEN, LOCK_SUFFIX, LOCK_SUFFIX_LEN))
> -		return -1; /* Refname ends with ".lock". */
> +	    !memcmp(cp - LOCK_SUFFIX_LEN, LOCK_SUFFIX, LOCK_SUFFIX_LEN)) {
> +		if (!sanitized)
> +			return -1;
> +		/* Refname ends with ".lock". */
> +		while (strbuf_strip_suffix(sanitized, LOCK_SUFFIX)) {
> +			/* try again in case we have .lock.lock */
> +		}

No need for {}; just have an empty statment

		while (...) 
			; /* try again ... */

This "strip all .lock repeatedly" made me stop and think a bit; this
will never make the component empty, as the only way for this loop
to make it empty is if we have a string that match "^\(.lock)\*$" in
it, but the first dot would have already been turned into a dash, so
we'll end up with "-lock", which is not empty.

> +	}
>  	return cp - refname;
>  }

See below for a possible further polishment.

 * The first hunk is not about this patch but a long-standing issue
   after the comment was given to this function for a single level
   (I do not know or care how it happened--perhaps we had a single
   function that verifies multiple levels which later was split into
   a caller that loops and this function that checks a single level,
   and the comment for the multi-level function was left stale).

 * check_refname_component() can now try to sanitize; document it.

 * The last hunk is from Eric's comment.

 refs.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/refs.c b/refs.c
index e9f83018f0..3a1b2a8c3c 100644
--- a/refs.c
+++ b/refs.c
@@ -63,7 +63,7 @@ static unsigned char refname_disposition[256] = {
  * not legal.  It is legal if it is something reasonable to have under
  * ".git/refs/"; We do not like it if:
  *
- * - any path component of it begins with ".", or
+ * - it begins with ".", or
  * - it has double dots "..", or
  * - it has ASCII control characters, or
  * - it has ":", "?", "[", "\", "^", "~", SP, or TAB anywhere, or
@@ -71,6 +71,10 @@ static unsigned char refname_disposition[256] = {
  * - it ends with a "/", or
  * - it ends with ".lock", or
  * - it contains a "@{" portion
+ *
+ * When sanitized is not NULL, instead of rejecting the input refname
+ * as an error, try to come up with a usable replacement for the input
+ * refname in it.
  */
 static int check_refname_component(const char *refname, int *flags,
 				   struct strbuf *sanitized)
@@ -95,7 +99,8 @@ static int check_refname_component(const char *refname, int *flags,
 		case 2:
 			if (last == '.') { /* Refname contains "..". */
 				if (sanitized)
-					sanitized->len--; /* collapse ".." to single "." */
+					/* collapse ".." to single "." */
+					strbuf_setlen(sanitized, sanitized->len - 1);
 				else
 					return -1;
 			}

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 1/1] worktree add: sanitize worktree names
  2019-03-11  6:20               ` Junio C Hamano
@ 2019-03-11  9:24                 ` Duy Nguyen
  2019-03-11 22:39                   ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Duy Nguyen @ 2019-03-11  9:24 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Eric Sunshine, Johannes Schindelin, Git List, Yagamy Light,
	Jeff King, Ramsay Jones

On Mon, Mar 11, 2019 at 1:20 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Eric Sunshine <sunshine@sunshineco.com> writes:
>
> >>                 case 2:
> >> +                       if (last == '.') { /* Refname contains "..". */
> >> +                               if (sanitized)
> >> +                                       sanitized->len--; /* collapse ".." to single "." */
> >
> > I think this needs to be:
> >
> >     strbuf_setlen(sanitized, sanitized->len - 1);
> >
> > to ensure that NUL-terminator ends up in the correct place if this "."
> > is the very last character in 'refname'. (Otherwise, the NUL will
> > remain after the second ".", thus ".." won't be collapsed to "." at
> > all.)
>
> True.  Why doesn't it do the similar "replace with -" it does for
> other unfortunate characters, though?
>

I think Jeff saw an opportunity to keep it cleaner ("." looks better
than ".-") and took it.
-- 
Duy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 1/1] worktree add: sanitize worktree names
  2019-03-11  6:36             ` Junio C Hamano
@ 2019-03-11  9:27               ` Duy Nguyen
  0 siblings, 0 replies; 41+ messages in thread
From: Duy Nguyen @ 2019-03-11  9:27 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Git Mailing List, Konstantin Kharlamov,
	Jeff King, Ramsay Jones, Eric Sunshine

On Mon, Mar 11, 2019 at 1:36 PM Junio C Hamano <gitster@pobox.com> wrote:
>                 while (...)
>                         ; /* try again ... */
>
> This "strip all .lock repeatedly" made me stop and think a bit; this
> will never make the component empty, as the only way for this loop
> to make it empty is if we have a string that match "^\(.lock)\*$" in
> it, but the first dot would have already been turned into a dash, so
> we'll end up with "-lock", which is not empty.

Yep. I added a BUG() check anyway in worktree.c just in case something
slips through in the future.

> > +     }
> >       return cp - refname;
> >  }
>
> See below for a possible further polishment.
>
>  * The first hunk is not about this patch but a long-standing issue
>    after the comment was given to this function for a single level
>    (I do not know or care how it happened--perhaps we had a single
>    function that verifies multiple levels which later was split into
>    a caller that loops and this function that checks a single level,
>    and the comment for the multi-level function was left stale).
>
>  * check_refname_component() can now try to sanitize; document it.
>
>  * The last hunk is from Eric's comment.

Thanks.
-- 
Duy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 1/1] worktree add: sanitize worktree names
  2019-03-08  9:28           ` [PATCH v5 1/1] " Nguyễn Thái Ngọc Duy
  2019-03-10  2:02             ` Eric Sunshine
  2019-03-11  6:36             ` Junio C Hamano
@ 2019-03-11 13:05             ` Johannes Schindelin
  2019-03-12  6:45               ` Junio C Hamano
  2 siblings, 1 reply; 41+ messages in thread
From: Johannes Schindelin @ 2019-03-11 13:05 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, gitster, hi-angel, peff, ramsay, sunshine

[-- Attachment #1: Type: text/plain, Size: 3151 bytes --]

Hi Duy,

On Fri, 8 Mar 2019, Nguyễn Thái Ngọc Duy wrote:

> diff --git a/refs.c b/refs.c
> index 142888a40a..e9f83018f0 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -72,30 +72,57 @@ static unsigned char refname_disposition[256] = {
>   * - it ends with ".lock", or
>   * - it contains a "@{" portion
>   */
> -static int check_refname_component(const char *refname, int *flags)
> +static int check_refname_component(const char *refname, int *flags,
> +				   struct strbuf *sanitized)
>  {
>  	const char *cp;
>  	char last = '\0';
> +	size_t component_start;

This variable is uninitialized. It is then...

> +
> +	if (sanitized)
> +		component_start = sanitized->len;

... initialized only when `sanitized` is not `NULL`, and subsequently...

>  
>  	for (cp = refname; ; cp++) {
>  		int ch = *cp & 255;
>  		unsigned char disp = refname_disposition[ch];
> +
> +		if (sanitized && disp != 1)
> +			strbuf_addch(sanitized, ch);
> +
>  		switch (disp) {
>  		case 1:
>  			goto out;
>  		case 2:
> -			if (last == '.')
> -				return -1; /* Refname contains "..". */
> +			if (last == '.') { /* Refname contains "..". */
> +				if (sanitized)
> +					sanitized->len--; /* collapse ".." to single "." */
> +				else
> +					return -1;
> +			}
>  			break;
>  		case 3:
> -			if (last == '@')
> -				return -1; /* Refname contains "@{". */
> +			if (last == '@') { /* Refname contains "@{". */
> +				if (sanitized)
> +					sanitized->buf[sanitized->len-1] = '-';
> +				else
> +					return -1;
> +			}
>  			break;
>  		case 4:
> -			return -1;
> +			/* forbidden char */
> +			if (sanitized)
> +				sanitized->buf[sanitized->len-1] = '-';
> +			else
> +				return -1;
> +			break;
>  		case 5:
> -			if (!(*flags & REFNAME_REFSPEC_PATTERN))
> -				return -1; /* refspec can't be a pattern */
> +			if (!(*flags & REFNAME_REFSPEC_PATTERN)) {
> +				/* refspec can't be a pattern */
> +				if (sanitized)
> +					sanitized->buf[sanitized->len-1] = '-';
> +				else
> +					return -1;
> +			}
>  
>  			/*
>  			 * Unset the pattern flag so that we only accept
> @@ -109,26 +136,48 @@ static int check_refname_component(const char *refname, int *flags)
>  out:
>  	if (cp == refname)
>  		return 0; /* Component has zero length. */
> -	if (refname[0] == '.')
> -		return -1; /* Component starts with '.'. */
> +
> +	if (refname[0] == '.') { /* Component starts with '.'. */
> +		if (sanitized)
> +			sanitized->buf[component_start] = '-';

... used a loooooooong time after that, also only if `sanitized` is not
`NULL`.

Apparently for some GCC versions, this is too cute, and it complains that
this variable might be used uninitialized:
https://dev.azure.com/gitgitgadget/git/_build/results?buildId=4352&view=logs

And quite honestly, even for mere humans it is not all *that* clear that
`sanitized` cannot be changed from `NULL` to non-`NULL` in the code in
between, *in particular* because the changes extend over two hunks, the
code between is not shown.

I would strongly advise against trying to be so cute, and just initialize
the variable already. Over-optimization in such instances makes the code a
lot harder to reason about.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 1/1] worktree add: sanitize worktree names
  2019-03-11  9:24                 ` Duy Nguyen
@ 2019-03-11 22:39                   ` Jeff King
  2019-03-12  6:32                     ` Junio C Hamano
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-03-11 22:39 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Junio C Hamano, Eric Sunshine, Johannes Schindelin, Git List,
	Yagamy Light, Ramsay Jones

On Mon, Mar 11, 2019 at 04:24:13PM +0700, Duy Nguyen wrote:

> > > I think this needs to be:
> > >
> > >     strbuf_setlen(sanitized, sanitized->len - 1);
> > >
> > > to ensure that NUL-terminator ends up in the correct place if this "."
> > > is the very last character in 'refname'. (Otherwise, the NUL will
> > > remain after the second ".", thus ".." won't be collapsed to "." at
> > > all.)
> >
> > True.  Why doesn't it do the similar "replace with -" it does for
> > other unfortunate characters, though?
> 
> I think Jeff saw an opportunity to keep it cleaner ("." looks better
> than ".-") and took it.

Yeah, that was one thing I was going to comment on your patch. The
"rules" I made up were pretty ad-hoc as I was walking through the
function (note it also drops ".lock" instead of sanitizing it into
"-lock").

But it may make sense to make things more consistent (even if the result
isn't entirely reversible).

Another option _is_ to actually make it reversible. I.e., use "%2e"
instead of ".", which would also necessitate replacing "%". I don't know
if that has a huge value for this use-case, but it's a nice property
that two sanitized names can't collide (unless they originally
identical).

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 1/1] worktree add: sanitize worktree names
  2019-03-11 22:39                   ` Jeff King
@ 2019-03-12  6:32                     ` Junio C Hamano
  0 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2019-03-12  6:32 UTC (permalink / raw)
  To: Jeff King
  Cc: Duy Nguyen, Eric Sunshine, Johannes Schindelin, Git List,
	Yagamy Light, Ramsay Jones

Jeff King <peff@peff.net> writes:

> Another option _is_ to actually make it reversible. I.e., use "%2e"
> instead of ".", which would also necessitate replacing "%". I don't know
> if that has a huge value for this use-case, but it's a nice property
> that two sanitized names can't collide (unless they originally
> identical).

Yeah.  That is a good property to have.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 1/1] worktree add: sanitize worktree names
  2019-03-11 13:05             ` Johannes Schindelin
@ 2019-03-12  6:45               ` Junio C Hamano
  0 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2019-03-12  6:45 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Nguyễn Thái Ngọc Duy, git, hi-angel, peff,
	ramsay, sunshine

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> +static int check_refname_component(const char *refname, int *flags,
>> +				   struct strbuf *sanitized)
>>  {
>>  	const char *cp;
>>  	char last = '\0';
>> +	size_t component_start;
>
> This variable is uninitialized. It is then...
>
>> +
>> +	if (sanitized)
>> +		component_start = sanitized->len;
>
> ... initialized only when `sanitized` is not `NULL`, and subsequently...
> ...
>> +	if (refname[0] == '.') { /* Component starts with '.'. */
>> +		if (sanitized)
>> +			sanitized->buf[component_start] = '-';
> ...
> ... used a loooooooong time after that, also only if `sanitized` is not
> `NULL`.
>
> Apparently for some GCC versions, this is too cute, and it complains that

It does require humans (well, at least it did to this one) to be
careful when reading the code to know that component_start is valid
when it is used.

There unfortunately is no good "default" value to initialize the
variable to.  When checking a later component in a series of
components, it would be looking at non-zero position, so even
initializing it to 0 in this function is *not* a more sensible
fallback default value than any other random garbage value (which
would squelch the compiler, but it would mislead the humans
nevertheless).

And that (i.e. the lack of any sensible default value when sanitized
is NULL) is the reason why the variable is left uninitialized by the
patch, I think.  I do not think the code is trying to be cute at
all.

I wonder if we make the caller pass a pointer to

	struct {
		struct strbuf result;
		size_t component_start;
	} sanitized;

	sanitized.component_start = sanitized.result.len
	check_refname_component(refname, flags, &sanitized);

and get rid of the assignment to component_start done by the callee,
it would appease compilers and makes the code easier to vet.  It does
introduce one more ad-hoc type, which is a certain downside.

I dunno.

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2019-03-12  6:45 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-18 14:36 git gc fails with "unable to resolve reference" for worktree hi-angel
2019-02-18 15:02 ` Duy Nguyen
2019-02-18 15:09   ` hi-angel
2019-02-18 15:18     ` Duy Nguyen
2019-02-20 14:34       ` hi-angel
2019-02-21 11:00 ` [PATCH] worktree add: sanitize worktree names Nguyễn Thái Ngọc Duy
2019-02-21 11:28   ` Konstantin Kharlamov
2019-02-21 11:38     ` Duy Nguyen
2019-02-21 11:44       ` Konstantin Kharlamov
2019-02-21 11:52         ` Duy Nguyen
2019-02-21 13:23           ` Jeff King
2019-02-21 12:19   ` [PATCH v2 0/1] " Nguyễn Thái Ngọc Duy
2019-02-21 12:19     ` [PATCH v2 1/1] " Nguyễn Thái Ngọc Duy
2019-02-21 13:22       ` Jeff King
2019-02-21 17:41       ` Ramsay Jones
2019-02-22  9:21         ` Duy Nguyen
2019-02-26 10:58     ` [PATCH v3 0/1] " Nguyễn Thái Ngọc Duy
2019-02-26 10:58       ` [PATCH v3 1/1] " Nguyễn Thái Ngọc Duy
2019-02-27 12:08         ` Jeff King
2019-02-27 14:23           ` Eric Sunshine
2019-02-27 16:04             ` Jeff King
2019-03-03  1:22               ` Junio C Hamano
2019-03-04 11:19               ` Duy Nguyen
2019-03-04 12:04                 ` Duy Nguyen
2019-03-04 15:06         ` Johannes Schindelin
2019-03-05 12:08       ` [PATCH v4 0/2] " Nguyễn Thái Ngọc Duy
2019-03-05 12:08         ` [PATCH v4 1/2] refs.c: refactor check_refname_component() Nguyễn Thái Ngọc Duy
2019-03-06 21:49           ` Jeff King
2019-03-07 23:24             ` Eric Sunshine
2019-03-05 12:08         ` [PATCH v4 2/2] worktree add: sanitize worktree names Nguyễn Thái Ngọc Duy
2019-03-08  9:28         ` [PATCH v5 0/1] " Nguyễn Thái Ngọc Duy
2019-03-08  9:28           ` [PATCH v5 1/1] " Nguyễn Thái Ngọc Duy
2019-03-10  2:02             ` Eric Sunshine
2019-03-11  6:20               ` Junio C Hamano
2019-03-11  9:24                 ` Duy Nguyen
2019-03-11 22:39                   ` Jeff King
2019-03-12  6:32                     ` Junio C Hamano
2019-03-11  6:36             ` Junio C Hamano
2019-03-11  9:27               ` Duy Nguyen
2019-03-11 13:05             ` Johannes Schindelin
2019-03-12  6:45               ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).