git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
* [PATCH 0/8] Harden the sparse-checkout builtin
@ 2020-01-14 19:25 Derrick Stolee via GitGitGadget
  2020-01-14 19:25 ` [PATCH 1/8] t1091: use check_files to reduce boilerplate Derrick Stolee via GitGitGadget
                   ` (10 more replies)
  0 siblings, 11 replies; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-14 19:25 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee

This series is based on ds/sparse-list-in-cone-mode.

This series attempts to clean up some rough edges in the sparse-checkout
feature, especially around the cone mode.

Unfortunately, after the v2.25.0 release, we noticed an issue with the "git
clone --sparse" option when using a URL instead of a local path. This is
fixed and properly tested here.

Also, let's improve Git's response to these more complicated scenarios:

 1. Running "git sparse-checkout init" in a worktree would complain because
    the "info" dir doesn't exist.
 2. Tracked paths that include "*" and "" in their filenames.
 3. If a user edits the sparse-checkout file to have non-cone pattern, such
    as "*" anywhere or "" in the wrong place, then we should respond
    appropriately. That is: warn that the patterns are not cone-mode, then
    revert to the old logic.

Thanks, -Stolee

Derrick Stolee (8):
  t1091: use check_files to reduce boilerplate
  sparse-checkout: create leading directories
  clone: fix --sparse option with URLs
  sparse-checkout: cone mode does not recognize "**"
  sparse-checkout: detect short patterns
  sparse-checkout: warn on incorrect '*' in patterns
  sparse-checkout: properly match escaped characters
  sparse-checkout: write escaped patterns in cone mode

 builtin/clone.c                    |   2 +-
 builtin/sparse-checkout.c          |  52 ++++-
 dir.c                              |  69 ++++++-
 dir.h                              |   1 +
 t/t1091-sparse-checkout-builtin.sh | 320 ++++++++++++++++-------------
 5 files changed, 296 insertions(+), 148 deletions(-)


base-commit: 4fd683b6a35eabd23dd5183da7f654a1e1f00325
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-513%2Fderrickstolee%2Fsparse-harden-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-513/derrickstolee/sparse-harden-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/513
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 1/8] t1091: use check_files to reduce boilerplate
  2020-01-14 19:25 [PATCH 0/8] Harden the sparse-checkout builtin Derrick Stolee via GitGitGadget
@ 2020-01-14 19:25 ` Derrick Stolee via GitGitGadget
  2020-01-16 21:40   ` Junio C Hamano
  2020-01-14 19:25 ` [PATCH 2/8] sparse-checkout: create leading directories Derrick Stolee via GitGitGadget
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-14 19:25 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When testing the sparse-checkout feature, we need to compare the
contents of the working-directory against some expected output.
Using here-docs was useful in the beginning, but became repetetive
as the test script grew.

Create a check_files helper to make the tests simpler and easier
to extend. It also reduces instances of bad here-doc whitespace.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1091-sparse-checkout-builtin.sh | 215 ++++++++++-------------------
 1 file changed, 71 insertions(+), 144 deletions(-)

diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index ff7f8f7a1f..20caefe155 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -12,6 +12,13 @@ list_files() {
 	(cd "$1" && printf '%s\n' *)
 }
 
+check_files() {
+	DIR=$1
+	printf "%s\n" $2 >expect &&
+	list_files $DIR >actual &&
+	test_cmp expect actual
+}
+
 test_expect_success 'setup' '
 	git init repo &&
 	(
@@ -39,11 +46,11 @@ test_expect_success 'git sparse-checkout list (empty)' '
 
 test_expect_success 'git sparse-checkout list (populated)' '
 	test_when_finished rm -f repo/.git/info/sparse-checkout &&
-	cat >repo/.git/info/sparse-checkout <<-EOF &&
-		/folder1/*
-		/deep/
-		**/a
-		!*bin*
+	cat >repo/.git/info/sparse-checkout <<-\EOF &&
+	/folder1/*
+	/deep/
+	**/a
+	!*bin*
 	EOF
 	cp repo/.git/info/sparse-checkout expect &&
 	git -C repo sparse-checkout list >list &&
@@ -52,22 +59,20 @@ test_expect_success 'git sparse-checkout list (populated)' '
 
 test_expect_success 'git sparse-checkout init' '
 	git -C repo sparse-checkout init &&
-	cat >expect <<-EOF &&
-		/*
-		!/*/
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
 	EOF
 	test_cmp expect repo/.git/info/sparse-checkout &&
 	test_cmp_config -C repo true core.sparsecheckout &&
-	list_files repo >dir  &&
-	echo a >expect &&
-	test_cmp expect dir
+	check_files repo a
 '
 
 test_expect_success 'git sparse-checkout list after init' '
 	git -C repo sparse-checkout list >actual &&
-	cat >expect <<-EOF &&
-		/*
-		!/*/
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
 	EOF
 	test_cmp expect actual
 '
@@ -75,32 +80,24 @@ test_expect_success 'git sparse-checkout list after init' '
 test_expect_success 'init with existing sparse-checkout' '
 	echo "*folder*" >> repo/.git/info/sparse-checkout &&
 	git -C repo sparse-checkout init &&
-	cat >expect <<-EOF &&
-		/*
-		!/*/
-		*folder*
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
+	*folder*
 	EOF
 	test_cmp expect repo/.git/info/sparse-checkout &&
-	list_files repo >dir  &&
-	cat >expect <<-EOF &&
-		a
-		folder1
-		folder2
-	EOF
-	test_cmp expect dir
+	check_files repo "a folder1 folder2"
 '
 
 test_expect_success 'clone --sparse' '
 	git clone --sparse repo clone &&
 	git -C clone sparse-checkout list >actual &&
-	cat >expect <<-EOF &&
-		/*
-		!/*/
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
 	EOF
 	test_cmp expect actual &&
-	list_files clone >dir &&
-	echo a >expect &&
-	test_cmp expect dir
+	check_files clone a
 '
 
 test_expect_success 'set enables config' '
@@ -119,41 +116,29 @@ test_expect_success 'set enables config' '
 
 test_expect_success 'set sparse-checkout using builtin' '
 	git -C repo sparse-checkout set "/*" "!/*/" "*folder*" &&
-	cat >expect <<-EOF &&
-		/*
-		!/*/
-		*folder*
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
+	*folder*
 	EOF
 	git -C repo sparse-checkout list >actual &&
 	test_cmp expect actual &&
 	test_cmp expect repo/.git/info/sparse-checkout &&
-	list_files repo >dir  &&
-	cat >expect <<-EOF &&
-		a
-		folder1
-		folder2
-	EOF
-	test_cmp expect dir
+	check_files repo "a folder1 folder2"
 '
 
 test_expect_success 'set sparse-checkout using --stdin' '
-	cat >expect <<-EOF &&
-		/*
-		!/*/
-		/folder1/
-		/folder2/
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
+	/folder1/
+	/folder2/
 	EOF
 	git -C repo sparse-checkout set --stdin <expect &&
 	git -C repo sparse-checkout list >actual &&
 	test_cmp expect actual &&
 	test_cmp expect repo/.git/info/sparse-checkout &&
-	list_files repo >dir  &&
-	cat >expect <<-EOF &&
-		a
-		folder1
-		folder2
-	EOF
-	test_cmp expect dir
+	check_files repo "a folder1 folder2"
 '
 
 test_expect_success 'cone mode: match patterns' '
@@ -162,13 +147,7 @@ test_expect_success 'cone mode: match patterns' '
 	git -C repo read-tree -mu HEAD 2>err &&
 	test_i18ngrep ! "disabling cone patterns" err &&
 	git -C repo reset --hard &&
-	list_files repo >dir  &&
-	cat >expect <<-EOF &&
-		a
-		folder1
-		folder2
-	EOF
-	test_cmp expect dir
+	check_files repo "a folder1 folder2"
 '
 
 test_expect_success 'cone mode: warn on bad pattern' '
@@ -185,14 +164,7 @@ test_expect_success 'sparse-checkout disable' '
 	test_path_is_file repo/.git/info/sparse-checkout &&
 	git -C repo config --list >config &&
 	test_must_fail git config core.sparseCheckout &&
-	list_files repo >dir &&
-	cat >expect <<-EOF &&
-		a
-		deep
-		folder1
-		folder2
-	EOF
-	test_cmp expect dir
+	check_files repo "a deep folder1 folder2"
 '
 
 test_expect_success 'cone mode: init and set' '
@@ -204,52 +176,31 @@ test_expect_success 'cone mode: init and set' '
 	test_cmp expect dir &&
 	git -C repo sparse-checkout set deep/deeper1/deepest/ 2>err &&
 	test_must_be_empty err &&
-	list_files repo >dir  &&
-	cat >expect <<-EOF &&
-		a
-		deep
-	EOF
-	test_cmp expect dir &&
-	list_files repo/deep >dir  &&
-	cat >expect <<-EOF &&
-		a
-		deeper1
-	EOF
-	test_cmp expect dir &&
-	list_files repo/deep/deeper1 >dir  &&
-	cat >expect <<-EOF &&
-		a
-		deepest
-	EOF
-	test_cmp expect dir &&
-	cat >expect <<-EOF &&
-		/*
-		!/*/
-		/deep/
-		!/deep/*/
-		/deep/deeper1/
-		!/deep/deeper1/*/
-		/deep/deeper1/deepest/
+	check_files repo "a deep" &&
+	check_files repo/deep "a deeper1" &&
+	check_files repo/deep/deeper1 "a deepest" &&
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
+	/deep/
+	!/deep/*/
+	/deep/deeper1/
+	!/deep/deeper1/*/
+	/deep/deeper1/deepest/
 	EOF
 	test_cmp expect repo/.git/info/sparse-checkout &&
-	git -C repo sparse-checkout set --stdin 2>err <<-EOF &&
-		folder1
-		folder2
+	git -C repo sparse-checkout set --stdin 2>err <<-\EOF &&
+	folder1
+	folder2
 	EOF
 	test_must_be_empty err &&
-	cat >expect <<-EOF &&
-		a
-		folder1
-		folder2
-	EOF
-	list_files repo >dir &&
-	test_cmp expect dir
+	check_files repo "a folder1 folder2"
 '
 
 test_expect_success 'cone mode: list' '
-	cat >expect <<-EOF &&
-		folder1
-		folder2
+	cat >expect <<-\EOF &&
+	folder1
+	folder2
 	EOF
 	git -C repo sparse-checkout set --stdin <expect &&
 	git -C repo sparse-checkout list >actual 2>err &&
@@ -260,10 +211,10 @@ test_expect_success 'cone mode: list' '
 test_expect_success 'cone mode: set with nested folders' '
 	git -C repo sparse-checkout set deep deep/deeper1/deepest 2>err &&
 	test_line_count = 0 err &&
-	cat >expect <<-EOF &&
-		/*
-		!/*/
-		/deep/
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
+	/deep/
 	EOF
 	test_cmp repo/.git/info/sparse-checkout expect
 '
@@ -275,13 +226,7 @@ test_expect_success 'revert to old sparse-checkout on bad update' '
 	test_must_fail git -C repo sparse-checkout set deep/deeper1 2>err &&
 	test_i18ngrep "cannot set sparse-checkout patterns" err &&
 	test_cmp repo/.git/info/sparse-checkout expect &&
-	list_files repo/deep >dir &&
-	cat >expect <<-EOF &&
-		a
-		deeper1
-		deeper2
-	EOF
-	test_cmp dir expect
+	check_files repo/deep "a deeper1 deeper2"
 '
 
 test_expect_success 'revert to old sparse-checkout on empty update' '
@@ -326,18 +271,13 @@ test_expect_success 'sparse-checkout (init|set|disable) fails with dirty status'
 test_expect_success 'cone mode: set with core.ignoreCase=true' '
 	git -C repo sparse-checkout init --cone &&
 	git -C repo -c core.ignoreCase=true sparse-checkout set folder1 &&
-	cat >expect <<-EOF &&
-		/*
-		!/*/
-		/folder1/
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
+	/folder1/
 	EOF
 	test_cmp expect repo/.git/info/sparse-checkout &&
-	list_files repo >dir &&
-	cat >expect <<-EOF &&
-		a
-		folder1
-	EOF
-	test_cmp expect dir
+	check_files repo "a folder1"
 '
 
 test_expect_success 'interaction with submodules' '
@@ -351,21 +291,8 @@ test_expect_success 'interaction with submodules' '
 		git sparse-checkout init --cone &&
 		git sparse-checkout set folder1
 	) &&
-	list_files super >dir &&
-	cat >expect <<-\EOF &&
-		a
-		folder1
-		modules
-	EOF
-	test_cmp expect dir &&
-	list_files super/modules/child >dir &&
-	cat >expect <<-\EOF &&
-		a
-		deep
-		folder1
-		folder2
-	EOF
-	test_cmp expect dir
+	check_files super "a folder1 modules" &&
+	check_files super/modules/child "a deep folder1 folder2"
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 2/8] sparse-checkout: create leading directories
  2020-01-14 19:25 [PATCH 0/8] Harden the sparse-checkout builtin Derrick Stolee via GitGitGadget
  2020-01-14 19:25 ` [PATCH 1/8] t1091: use check_files to reduce boilerplate Derrick Stolee via GitGitGadget
@ 2020-01-14 19:25 ` Derrick Stolee via GitGitGadget
  2020-01-16 21:46   ` Junio C Hamano
  2020-01-14 19:25 ` [PATCH 3/8] clone: fix --sparse option with URLs Derrick Stolee via GitGitGadget
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-14 19:25 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The 'git init' command creates the ".git/info" directory and fills it
with some default files. However, 'git worktree add' does not create
the info directory for that worktree. This causes a problem when running
"git sparse-checkout init" inside a worktree. While care was taken to
allow the sparse-checkout config to be specific to a worktree, this
initialization was untested.

Safely create the leading directories for the sparse-checkout file. This
is the safest thing to do even without worktrees, as a user could delete
their ".git/info" directory and expect Git to recover safely.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/sparse-checkout.c          |  4 ++++
 t/t1091-sparse-checkout-builtin.sh | 10 ++++++++++
 2 files changed, 14 insertions(+)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index b3bed891cb..3cee8ab46e 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -199,6 +199,10 @@ static int write_patterns_and_update(struct pattern_list *pl)
 	int result;
 
 	sparse_filename = get_sparse_checkout_filename();
+
+	if (safe_create_leading_directories(sparse_filename))
+		die(_("failed to create directory for sparse-checkout file"));
+
 	fd = hold_lock_file_for_update(&lk, sparse_filename,
 				      LOCK_DIE_ON_ERROR);
 
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 20caefe155..37365dc668 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -295,4 +295,14 @@ test_expect_success 'interaction with submodules' '
 	check_files super/modules/child "a deep folder1 folder2"
 '
 
+test_expect_success 'different sparse-checkouts with worktrees' '
+	git -C repo worktree add --detach ../worktree &&
+	check_files worktree "a deep folder1 folder2" &&
+	git -C worktree sparse-checkout init --cone &&
+	git -C repo sparse-checkout set folder1 &&
+	git -C worktree sparse-checkout set deep/deeper1 &&
+	check_files repo "a folder1" &&
+	check_files worktree "a deep"
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 3/8] clone: fix --sparse option with URLs
  2020-01-14 19:25 [PATCH 0/8] Harden the sparse-checkout builtin Derrick Stolee via GitGitGadget
  2020-01-14 19:25 ` [PATCH 1/8] t1091: use check_files to reduce boilerplate Derrick Stolee via GitGitGadget
  2020-01-14 19:25 ` [PATCH 2/8] sparse-checkout: create leading directories Derrick Stolee via GitGitGadget
@ 2020-01-14 19:25 ` Derrick Stolee via GitGitGadget
  2020-01-14 19:30   ` Taylor Blau
  2020-01-14 19:25 ` [PATCH 4/8] sparse-checkout: cone mode does not recognize "**" Derrick Stolee via GitGitGadget
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-14 19:25 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The --sparse option was added to the clone builtin in d89f09c (clone:
add --sparse mode, 2019-11-21) and was tested with a local path clone
in t1091-sparse-checkout-builtin.sh. However, due to a difference in
how local paths are handled versus URLs, this mechanism does not work
with URLs.

Modify the test to use a "file://" URL, which would output this error
before the code change:

  Cloning into 'clone'...
  fatal: cannot change to 'file://.../repo': No such file or directory
  error: failed to initialize sparse-checkout

These errors are due to using a "-C <path>" option to call 'git -C
<path> sparse-checkout init' but the URL is being given instead of
the target directory.

Update that target directory to evaluate this correctly. I have also
manually tested that https:// URLs are handled correctly as well.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/clone.c                    | 2 +-
 t/t1091-sparse-checkout-builtin.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 4348d962c9..2caefc44fb 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1130,7 +1130,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (option_required_reference.nr || option_optional_reference.nr)
 		setup_reference();
 
-	if (option_sparse_checkout && git_sparse_checkout_init(repo))
+	if (option_sparse_checkout && git_sparse_checkout_init(dir))
 		return 1;
 
 	remote = remote_get(option_origin);
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 37365dc668..58d9c69163 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -90,7 +90,7 @@ test_expect_success 'init with existing sparse-checkout' '
 '
 
 test_expect_success 'clone --sparse' '
-	git clone --sparse repo clone &&
+	git clone --sparse "file://$(pwd)/repo" clone &&
 	git -C clone sparse-checkout list >actual &&
 	cat >expect <<-\EOF &&
 	/*
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 4/8] sparse-checkout: cone mode does not recognize "**"
  2020-01-14 19:25 [PATCH 0/8] Harden the sparse-checkout builtin Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2020-01-14 19:25 ` [PATCH 3/8] clone: fix --sparse option with URLs Derrick Stolee via GitGitGadget
@ 2020-01-14 19:25 ` Derrick Stolee via GitGitGadget
  2020-01-14 21:16   ` Jeff King
  2020-01-14 19:25 ` [PATCH 5/8] sparse-checkout: detect short patterns Derrick Stolee via GitGitGadget
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-14 19:25 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When core.sparseCheckoutCone is enabled, the 'git sparse-checkout set'
command creates a restricted set of possible patterns that are used
by a custom algorithm to quickly match those patterns.

If a user manually edits the sparse-checkout file, then they could
create patterns that do not match these expectations. The cone-mode
matching algorithm can return incorrect results. The solution is to
detect these incorrect patterns, warn that we do not recognize them,
and revert to the standard algorithm.

Check each pattern for the "**" substring, and revert to the old
logic if seen. While technically a "/<dir>/**" pattern matches
the meaning of "/<dir>/", it is not one that would be written by
the sparse-checkout builtin in cone mode. Attempting to accept that
pattern change complicates the logic and instead we punt and do
not accept any instance of "**".

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c                              |  7 ++++++
 t/t1091-sparse-checkout-builtin.sh | 34 ++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/dir.c b/dir.c
index 22d08e61c2..f8e350dda2 100644
--- a/dir.c
+++ b/dir.c
@@ -651,6 +651,13 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
 		return;
 	}
 
+	if (strstr(given->pattern, "**")) {
+		/* Not a cone pattern. */
+		pl->use_cone_patterns = 0;
+		warning(_("unrecognized pattern: '%s'"), given->pattern);
+		goto clear_hashmaps;
+	}
+
 	if (given->patternlen > 2 &&
 	    !strcmp(given->pattern + given->patternlen - 2, "/*")) {
 		if (!(given->flags & PATTERN_FLAG_NEGATIVE)) {
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 58d9c69163..e532a52f89 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -305,4 +305,38 @@ test_expect_success 'different sparse-checkouts with worktrees' '
 	check_files worktree "a deep"
 '
 
+check_read_tree_errors () {
+	REPO=$1
+	FILES=$2
+	ERRORS=$3
+	git -C $REPO read-tree -mu HEAD 2>err &&
+	if test -z "$ERRORS"
+	then
+		test_must_be_empty err
+	else
+		test_i18ngrep "$ERRORS" err
+	fi &&
+	check_files $REPO "$FILES"
+}
+
+test_expect_success 'pattern-checks: /A/**' '
+	cat >repo/.git/info/sparse-checkout <<-\EOF &&
+	/*
+	!/*/
+	/folder1/**
+	EOF
+	check_read_tree_errors repo "a folder1" "disabling cone pattern matching"
+'
+
+test_expect_success 'pattern-checks: /A/**/B/' '
+	cat >repo/.git/info/sparse-checkout <<-\EOF &&
+	/*
+	!/*/
+	/deep/**/deepest
+	EOF
+	check_read_tree_errors repo "a deep" "disabling cone pattern matching" &&
+	check_files repo/deep "deeper1" &&
+	check_files repo/deep/deeper1 "deepest"
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 5/8] sparse-checkout: detect short patterns
  2020-01-14 19:25 [PATCH 0/8] Harden the sparse-checkout builtin Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2020-01-14 19:25 ` [PATCH 4/8] sparse-checkout: cone mode does not recognize "**" Derrick Stolee via GitGitGadget
@ 2020-01-14 19:25 ` Derrick Stolee via GitGitGadget
  2020-01-14 19:26 ` [PATCH 6/8] sparse-checkout: warn on incorrect '*' in patterns Derrick Stolee via GitGitGadget
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-14 19:25 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In cone mode, the shortest pattern the sparse-checkout command will
write into the sparse-checkout file is "/*". This is handled carefully
in add_pattern_to_hashsets(), so warn if any other pattern is this
short. This will assist future pattern checks by allowing us to assume
there are at least three characters in the pattern.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c                              | 3 ++-
 t/t1091-sparse-checkout-builtin.sh | 9 +++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index f8e350dda2..1c96ddf5e3 100644
--- a/dir.c
+++ b/dir.c
@@ -651,7 +651,8 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
 		return;
 	}
 
-	if (strstr(given->pattern, "**")) {
+	if (given->patternlen <= 2 ||
+	    strstr(given->pattern, "**")) {
 		/* Not a cone pattern. */
 		pl->use_cone_patterns = 0;
 		warning(_("unrecognized pattern: '%s'"), given->pattern);
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index e532a52f89..974a4fec8f 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -339,4 +339,13 @@ test_expect_success 'pattern-checks: /A/**/B/' '
 	check_files repo/deep/deeper1 "deepest"
 '
 
+test_expect_success 'pattern-checks: too short' '
+	cat >repo/.git/info/sparse-checkout <<-\EOF &&
+	/*
+	!/*/
+	/a
+	EOF
+	check_read_tree_errors repo "a" "disabling cone pattern matching"
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 6/8] sparse-checkout: warn on incorrect '*' in patterns
  2020-01-14 19:25 [PATCH 0/8] Harden the sparse-checkout builtin Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2020-01-14 19:25 ` [PATCH 5/8] sparse-checkout: detect short patterns Derrick Stolee via GitGitGadget
@ 2020-01-14 19:26 ` Derrick Stolee via GitGitGadget
  2020-01-14 19:26 ` [PATCH 7/8] sparse-checkout: properly match escaped characters Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-14 19:26 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In cone mode, the sparse-checkout commmand will write patterns that
allow faster pattern matching. This matching only works if the patterns
in the sparse-checkout file are those written by that command. Users
can edit the sparse-checkout file and create patterns that cause the
cone mode matching to fail.

The cone mode patterns may end in "/*" but otherwise an un-escaped
asterisk is invalid. Add checks to disable cone mode when seeing these
values.

A later change will properly handle escaped asterisks.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c                              | 30 ++++++++++++++++++++++++++++++
 t/t1091-sparse-checkout-builtin.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/dir.c b/dir.c
index 1c96ddf5e3..150c05f4de 100644
--- a/dir.c
+++ b/dir.c
@@ -635,6 +635,7 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
 	struct pattern_entry *translated;
 	char *truncated;
 	char *data = NULL;
+	const char *prev, *cur, *next;
 
 	if (!pl->use_cone_patterns)
 		return;
@@ -652,6 +653,7 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
 	}
 
 	if (given->patternlen <= 2 ||
+	    *given->pattern == '*' ||
 	    strstr(given->pattern, "**")) {
 		/* Not a cone pattern. */
 		pl->use_cone_patterns = 0;
@@ -659,6 +661,34 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
 		goto clear_hashmaps;
 	}
 
+	prev = given->pattern;
+	cur = given->pattern + 1;
+	next = given->pattern + 2;
+
+	while (*cur) {
+		/* We care about *cur == '*' */
+		if (*cur != '*')
+			goto increment;
+
+		/* But only if *prev != '\\' */
+		if (*prev == '\\')
+			goto increment;
+
+		/* But a trailing '/' then '*' is fine */
+		if (*prev == '/' && *next == 0)
+			goto increment;
+
+		/* Not a cone pattern. */
+		pl->use_cone_patterns = 0;
+		warning(_("unrecognized pattern: '%s'"), given->pattern);
+		goto clear_hashmaps;
+
+	increment:
+		prev++;
+		cur++;
+		next++;
+	}
+
 	if (given->patternlen > 2 &&
 	    !strcmp(given->pattern + given->patternlen - 2, "/*")) {
 		if (!(given->flags & PATTERN_FLAG_NEGATIVE)) {
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 974a4fec8f..5b50be53a4 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -348,4 +348,31 @@ test_expect_success 'pattern-checks: too short' '
 	check_read_tree_errors repo "a" "disabling cone pattern matching"
 '
 
+test_expect_success 'pattern-checks: trailing "*"' '
+	cat >repo/.git/info/sparse-checkout <<-\EOF &&
+	/*
+	!/*/
+	/a*
+	EOF
+	check_read_tree_errors repo "a" "disabling cone pattern matching"
+'
+
+test_expect_success 'pattern-checks: starting "*"' '
+	cat >repo/.git/info/sparse-checkout <<-\EOF &&
+	/*
+	!/*/
+	*eep/
+	EOF
+	check_read_tree_errors repo "a deep" "disabling cone pattern matching"
+'
+
+test_expect_success 'pattern-checks: escaped "*"' '
+	cat >repo/.git/info/sparse-checkout <<-\EOF &&
+	/*
+	!/*/
+	/does\*not\*exist/
+	EOF
+	check_read_tree_errors repo "a" ""
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 7/8] sparse-checkout: properly match escaped characters
  2020-01-14 19:25 [PATCH 0/8] Harden the sparse-checkout builtin Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2020-01-14 19:26 ` [PATCH 6/8] sparse-checkout: warn on incorrect '*' in patterns Derrick Stolee via GitGitGadget
@ 2020-01-14 19:26 ` Derrick Stolee via GitGitGadget
  2020-01-14 21:21   ` Jeff King
  2020-01-14 19:26 ` [PATCH 8/8] sparse-checkout: write escaped patterns in cone mode Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-14 19:26 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In cone mode, the sparse-checkout feature uses hashset containment
queries to match paths. Make this algorithm respect escaped asterisk
(*) and backslash (\) characters.

Create dup_and_filter_pattern() method to convert a pattern by
removing escape characters and dropping an optional "/*" at the end.
This method is available in dir.h as we will use it in
builtin/sparse-chekcout.c in a later change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c                              | 31 +++++++++++++++++++++++++++---
 dir.h                              |  1 +
 t/t1091-sparse-checkout-builtin.sh | 22 +++++++++++++++++----
 3 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/dir.c b/dir.c
index 150c05f4de..2840bacd40 100644
--- a/dir.c
+++ b/dir.c
@@ -630,6 +630,32 @@ int pl_hashmap_cmp(const void *unused_cmp_data,
 	return strncmp(ee1->pattern, ee2->pattern, min_len);
 }
 
+char *dup_and_filter_pattern(const char *pattern)
+{
+	char *set, *read;
+	char *result = xstrdup(pattern);
+
+	set = result;
+	read = result;
+
+	while (*read) {
+		/* skip escape characters (once) */
+		if (*read == '\\')
+			read++;
+
+		*set = *read;
+
+		set++;
+		read++;
+	}
+	*set = 0;
+
+	if (*(read - 2) == '/' && *(read - 1) == '*')
+		*(read - 2) = 0;
+
+	return result;
+}
+
 static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern *given)
 {
 	struct pattern_entry *translated;
@@ -698,8 +724,7 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
 			goto clear_hashmaps;
 		}
 
-		truncated = xstrdup(given->pattern);
-		truncated[given->patternlen - 2] = 0;
+		truncated = dup_and_filter_pattern(given->pattern);
 
 		translated = xmalloc(sizeof(struct pattern_entry));
 		translated->pattern = truncated;
@@ -733,7 +758,7 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
 
 	translated = xmalloc(sizeof(struct pattern_entry));
 
-	translated->pattern = xstrdup(given->pattern);
+	translated->pattern = dup_and_filter_pattern(given->pattern);
 	translated->patternlen = given->patternlen;
 	hashmap_entry_init(&translated->ent,
 			   ignore_case ?
diff --git a/dir.h b/dir.h
index 77a43dbf89..6dcd9d33e7 100644
--- a/dir.h
+++ b/dir.h
@@ -304,6 +304,7 @@ int pl_hashmap_cmp(const void *unused_cmp_data,
 		   const struct hashmap_entry *a,
 		   const struct hashmap_entry *b,
 		   const void *key);
+char *dup_and_filter_pattern(const char *pattern);
 int hashmap_contains_parent(struct hashmap *map,
 			    const char *path,
 			    struct strbuf *buffer);
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 5b50be53a4..051c1f3bf2 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -366,13 +366,27 @@ test_expect_success 'pattern-checks: starting "*"' '
 	check_read_tree_errors repo "a deep" "disabling cone pattern matching"
 '
 
-test_expect_success 'pattern-checks: escaped "*"' '
-	cat >repo/.git/info/sparse-checkout <<-\EOF &&
+test_expect_success BSLASHPSPEC 'pattern-checks: escaped "*"' '
+	git clone repo escaped &&
+	TREEOID=$(git -C escaped rev-parse HEAD:folder1) &&
+	NEWTREE=$(git -C escaped mktree <<-EOF
+	$(git -C escaped ls-tree HEAD)
+	040000 tree $TREEOID	zbad\\dir
+	040000 tree $TREEOID	zdoes*exist
+	EOF
+	) &&
+	COMMIT=$(git -C escaped commit-tree $NEWTREE -p HEAD) &&
+	git -C escaped reset --hard $COMMIT &&
+	check_files escaped "a deep folder1 folder2 zbad\\dir zdoes*exist" &&
+	git -C escaped sparse-checkout init --cone &&
+	cat >escaped/.git/info/sparse-checkout <<-\EOF &&
 	/*
 	!/*/
-	/does\*not\*exist/
+	/zbad\\dir/
+	/zdoes\*not\*exist/
+	/zdoes\*exist/
 	EOF
-	check_read_tree_errors repo "a" ""
+	check_read_tree_errors escaped "a zbad\\dir zdoes*exist"
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 8/8] sparse-checkout: write escaped patterns in cone mode
  2020-01-14 19:25 [PATCH 0/8] Harden the sparse-checkout builtin Derrick Stolee via GitGitGadget
                   ` (6 preceding siblings ...)
  2020-01-14 19:26 ` [PATCH 7/8] sparse-checkout: properly match escaped characters Derrick Stolee via GitGitGadget
@ 2020-01-14 19:26 ` Derrick Stolee via GitGitGadget
  2020-01-14 21:25   ` Jeff King
  2020-01-14 19:34 ` [PATCH 0/8] Harden the sparse-checkout builtin Taylor Blau
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-14 19:26 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

If a user somehow creates a directory with an asterisk (*) or backslash
(\), then the "git sparse-checkout set" command will struggle to provide
the correct pattern in the sparse-checkout file. When not in cone mode,
the provided pattern is written directly into the sparse-checkout file.
However, in cone mode we expect a list of paths to directories and then
we convert those into patterns.

Even more specifically, the goal is to always allow the following from
the root of a repo:

  git ls-tree --name-only -d HEAD | git sparse-checkout set --stdin

The ls-tree command provides directory names with an unescaped asterisk.
It also quotes the directories that contain an escaped backslash. We
must remove these quotes, then keep the escaped backslashes.

However, there is some care needed for the timing of these escapes. The
in-memory pattern list is used to update the working directory before
writing the patterns to disk. Thus, we need the command to have the
unescaped names in the hashsets for the cone comparisons, then escape
the patterns later.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/sparse-checkout.c          | 48 ++++++++++++++++++++++++++++--
 t/t1091-sparse-checkout-builtin.sh | 21 +++++++++++--
 2 files changed, 64 insertions(+), 5 deletions(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 3cee8ab46e..61d2c30036 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -140,6 +140,22 @@ static int update_working_directory(struct pattern_list *pl)
 	return result;
 }
 
+static char *escaped_pattern(char *pattern)
+{
+	char *p = pattern;
+	struct strbuf final = STRBUF_INIT;
+
+	while (*p) {
+		if (*p == '*' || *p == '\\')
+			strbuf_addch(&final, '\\');
+
+		strbuf_addch(&final, *p);
+		p++;
+	}
+
+	return strbuf_detach(&final, NULL);
+}
+
 static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
 {
 	int i;
@@ -164,10 +180,11 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
 	fprintf(fp, "/*\n!/*/\n");
 
 	for (i = 0; i < sl.nr; i++) {
-		char *pattern = sl.items[i].string;
+		char *pattern = escaped_pattern(sl.items[i].string);
 
 		if (strlen(pattern))
 			fprintf(fp, "%s/\n!%s/*/\n", pattern, pattern);
+		free(pattern);
 	}
 
 	string_list_clear(&sl, 0);
@@ -185,8 +202,9 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
 	string_list_remove_duplicates(&sl, 0);
 
 	for (i = 0; i < sl.nr; i++) {
-		char *pattern = sl.items[i].string;
+		char *pattern = escaped_pattern(sl.items[i].string);
 		fprintf(fp, "%s/\n", pattern);
+		free(pattern);
 	}
 }
 
@@ -337,7 +355,9 @@ static void insert_recursive_pattern(struct pattern_list *pl, struct strbuf *pat
 {
 	struct pattern_entry *e = xmalloc(sizeof(*e));
 	e->patternlen = path->len;
-	e->pattern = strbuf_detach(path, NULL);
+	e->pattern = dup_and_filter_pattern(path->buf);
+	strbuf_release(path);
+
 	hashmap_entry_init(&e->ent,
 			   ignore_case ?
 			   strihash(e->pattern) :
@@ -369,6 +389,7 @@ static void insert_recursive_pattern(struct pattern_list *pl, struct strbuf *pat
 
 static void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
 {
+	int i;
 	strbuf_trim(line);
 
 	strbuf_trim_trailing_dir_sep(line);
@@ -376,6 +397,27 @@ static void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
 	if (!line->len)
 		return;
 
+	for (i = 0; i < line->len; i++) {
+		if (line->buf[i] == '*') {
+			strbuf_insert(line, i, "\\", 1);
+			i++;
+		}
+
+		if (line->buf[i] == '\\') {
+			if (i < line->len - 1 && line->buf[i + 1] == '\\')
+				i++;
+			else
+				strbuf_insert(line, i, "\\", 1);
+
+			i++;
+		}
+	}
+
+	if (line->buf[0] == '"' && line->buf[line->len - 1] == '"') {
+		strbuf_remove(line, 0, 1);
+		strbuf_remove(line, line->len - 1, 1);
+	}
+
 	if (line->buf[0] != '/')
 		strbuf_insert(line, 0, "/", 1);
 
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 051c1f3bf2..3da7c10bd9 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -309,6 +309,9 @@ check_read_tree_errors () {
 	REPO=$1
 	FILES=$2
 	ERRORS=$3
+	git -C $REPO -c core.sparseCheckoutCone=false read-tree -mu HEAD 2>err &&
+	test_must_be_empty err &&
+	check_files $REPO "$FILES" &&
 	git -C $REPO read-tree -mu HEAD 2>err &&
 	if test -z "$ERRORS"
 	then
@@ -379,14 +382,28 @@ test_expect_success BSLASHPSPEC 'pattern-checks: escaped "*"' '
 	git -C escaped reset --hard $COMMIT &&
 	check_files escaped "a deep folder1 folder2 zbad\\dir zdoes*exist" &&
 	git -C escaped sparse-checkout init --cone &&
-	cat >escaped/.git/info/sparse-checkout <<-\EOF &&
+	git -C escaped sparse-checkout set zbad\\dir zdoes\*not\*exist zdoes\*exist &&
+	cat >expect <<-\EOF &&
 	/*
 	!/*/
 	/zbad\\dir/
+	/zdoes\*exist/
 	/zdoes\*not\*exist/
+	EOF
+	test_cmp expect escaped/.git/info/sparse-checkout &&
+	check_read_tree_errors escaped "a zbad\\dir zdoes*exist" &&
+	git -C escaped ls-tree -d --name-only HEAD | git -C escaped sparse-checkout set --stdin &&
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
+	/deep/
+	/folder1/
+	/folder2/
+	/zbad\\dir/
 	/zdoes\*exist/
 	EOF
-	check_read_tree_errors escaped "a zbad\\dir zdoes*exist"
+	test_cmp expect escaped/.git/info/sparse-checkout &&
+	check_files escaped "a deep folder1 folder2 zbad\\dir zdoes*exist"
 '
 
 test_done
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/8] clone: fix --sparse option with URLs
  2020-01-14 19:25 ` [PATCH 3/8] clone: fix --sparse option with URLs Derrick Stolee via GitGitGadget
@ 2020-01-14 19:30   ` Taylor Blau
  0 siblings, 0 replies; 38+ messages in thread
From: Taylor Blau @ 2020-01-14 19:30 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, me, peff, newren, Derrick Stolee

Hi Stolee,

On Tue, Jan 14, 2020 at 07:25:57PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The --sparse option was added to the clone builtin in d89f09c (clone:
> add --sparse mode, 2019-11-21) and was tested with a local path clone
> in t1091-sparse-checkout-builtin.sh. However, due to a difference in
> how local paths are handled versus URLs, this mechanism does not work
> with URLs.

As we discussed off-list, both of us (as well as Peff) were able to
reproduce this issue. I think that this paragraph is a good description
of what's going on heee.

> Modify the test to use a "file://" URL, which would output this error
> before the code change:
>
>   Cloning into 'clone'...
>   fatal: cannot change to 'file://.../repo': No such file or directory
>   error: failed to initialize sparse-checkout

Nice, this should give us confidence that there won't be a regression
here in the future. I don't think that the explanation is complicated
enough for a single commit which introduced an expected failure, so
grouping it all together in this patch seems good to me.

> These errors are due to using a "-C <path>" option to call 'git -C
> <path> sparse-checkout init' but the URL is being given instead of
> the target directory.
>
> Update that target directory to evaluate this correctly. I have also
> manually tested that https:// URLs are handled correctly as well.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/clone.c                    | 2 +-
>  t/t1091-sparse-checkout-builtin.sh | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 4348d962c9..2caefc44fb 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -1130,7 +1130,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  	if (option_required_reference.nr || option_optional_reference.nr)
>  		setup_reference();
>
> -	if (option_sparse_checkout && git_sparse_checkout_init(repo))
> +	if (option_sparse_checkout && git_sparse_checkout_init(dir))

I agree that 'dir' is the right thing to use here. It's the string we
read from to print "Cloning into ...", which always displays the
directory relative to the cwd. Looking at the implementation in
'git_sparse_checkout_init', this matches my understanding, too.

>  		return 1;
>
>  	remote = remote_get(option_origin);
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index 37365dc668..58d9c69163 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -90,7 +90,7 @@ test_expect_success 'init with existing sparse-checkout' '
>  '
>
>  test_expect_success 'clone --sparse' '
> -	git clone --sparse repo clone &&
> +	git clone --sparse "file://$(pwd)/repo" clone &&
>  	git -C clone sparse-checkout list >actual &&
>  	cat >expect <<-\EOF &&
>  	/*
> --
> gitgitgadget

This all looks good to me.

  Acked-by: Taylor Blau <me@ttaylorr.com>

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/8] Harden the sparse-checkout builtin
  2020-01-14 19:25 [PATCH 0/8] Harden the sparse-checkout builtin Derrick Stolee via GitGitGadget
                   ` (7 preceding siblings ...)
  2020-01-14 19:26 ` [PATCH 8/8] sparse-checkout: write escaped patterns in cone mode Derrick Stolee via GitGitGadget
@ 2020-01-14 19:34 ` Taylor Blau
  2020-01-14 19:44   ` Derrick Stolee
  2020-01-15 19:16 ` Junio C Hamano
  2020-01-24 21:19 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
  10 siblings, 1 reply; 38+ messages in thread
From: Taylor Blau @ 2020-01-14 19:34 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, me, peff, newren, Derrick Stolee

Hi Stolee,

On Tue, Jan 14, 2020 at 07:25:54PM +0000, Derrick Stolee via GitGitGadget wrote:
> This series is based on ds/sparse-list-in-cone-mode.
>
> This series attempts to clean up some rough edges in the sparse-checkout
> feature, especially around the cone mode.
>
> Unfortunately, after the v2.25.0 release, we noticed an issue with the "git
> clone --sparse" option when using a URL instead of a local path. This is
> fixed and properly tested here.

I haven't had a chance to look at the other patches (besides the one
that I have already reviewed on- and off-list), so take my comments here
with a grain of salt.

It's too bad that 'git clone --sparse' isn't working with URLs in
v2.25.0, but it happens, and I don't think that this is a grave-enough
issue to warrant a new release. At least, since '--sparse' is new in
v2.25.0, we're not breaking existing workflows that have already relied
on it.

And, since sparse-checkout is still relatively niche (at, least for now
;-)), I think that this not handling cloning from URLs is fine until
v2.26.0.

Of course, if there's ever another need for v2.25.1, I don't think that
this would *hurt* to release then, which is to say that we definitely
should have these patches in a release, soon, but I don't think that
there's a terrible sense of urgency in the meantime.

> Also, let's improve Git's response to these more complicated scenarios:
>
>  1. Running "git sparse-checkout init" in a worktree would complain because
>     the "info" dir doesn't exist.
>  2. Tracked paths that include "*" and "" in their filenames.
>  3. If a user edits the sparse-checkout file to have non-cone pattern, such
>     as "*" anywhere or "" in the wrong place, then we should respond
>     appropriately. That is: warn that the patterns are not cone-mode, then
>     revert to the old logic.
>
> Thanks, -Stolee
>
> Derrick Stolee (8):
>   t1091: use check_files to reduce boilerplate
>   sparse-checkout: create leading directories
>   clone: fix --sparse option with URLs
>   sparse-checkout: cone mode does not recognize "**"
>   sparse-checkout: detect short patterns
>   sparse-checkout: warn on incorrect '*' in patterns
>   sparse-checkout: properly match escaped characters
>   sparse-checkout: write escaped patterns in cone mode
>
>  builtin/clone.c                    |   2 +-
>  builtin/sparse-checkout.c          |  52 ++++-
>  dir.c                              |  69 ++++++-
>  dir.h                              |   1 +
>  t/t1091-sparse-checkout-builtin.sh | 320 ++++++++++++++++-------------
>  5 files changed, 296 insertions(+), 148 deletions(-)
>
>
> base-commit: 4fd683b6a35eabd23dd5183da7f654a1e1f00325
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-513%2Fderrickstolee%2Fsparse-harden-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-513/derrickstolee/sparse-harden-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/513
> --
> gitgitgadget
Thanks,
Taylor

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/8] Harden the sparse-checkout builtin
  2020-01-14 19:34 ` [PATCH 0/8] Harden the sparse-checkout builtin Taylor Blau
@ 2020-01-14 19:44   ` Derrick Stolee
  2020-01-14 21:31     ` Jeff King
  0 siblings, 1 reply; 38+ messages in thread
From: Derrick Stolee @ 2020-01-14 19:44 UTC (permalink / raw)
  To: Taylor Blau, Derrick Stolee via GitGitGadget
  Cc: git, peff, newren, Derrick Stolee

On 1/14/2020 2:34 PM, Taylor Blau wrote:
> Hi Stolee,
> 
> On Tue, Jan 14, 2020 at 07:25:54PM +0000, Derrick Stolee via GitGitGadget wrote:
>> This series is based on ds/sparse-list-in-cone-mode.
>>
>> This series attempts to clean up some rough edges in the sparse-checkout
>> feature, especially around the cone mode.
>>
>> Unfortunately, after the v2.25.0 release, we noticed an issue with the "git
>> clone --sparse" option when using a URL instead of a local path. This is
>> fixed and properly tested here.
> 
> I haven't had a chance to look at the other patches (besides the one
> that I have already reviewed on- and off-list), so take my comments here
> with a grain of salt.
> 
> It's too bad that 'git clone --sparse' isn't working with URLs in
> v2.25.0, but it happens, and I don't think that this is a grave-enough
> issue to warrant a new release. At least, since '--sparse' is new in
> v2.25.0, we're not breaking existing workflows that have already relied
> on it.
> 
> And, since sparse-checkout is still relatively niche (at, least for now
> ;-)), I think that this not handling cloning from URLs is fine until
> v2.26.0.

Since we've already worked out the workaround to be:

	git clone --no-checkout <url> <dir>
	cd <dir>
	git sparse-checkout init --cone

there is no rush to fix this. Users _may_ discover the --sparse option
from the clone docs and complain, but we can point them to the above
directions for now.

> Of course, if there's ever another need for v2.25.1, I don't think that
> this would *hurt* to release then, which is to say that we definitely
> should have these patches in a release, soon, but I don't think that
> there's a terrible sense of urgency in the meantime.

I wouldn't complain to have patches 1-3 in an otherwise-warranted .1 release.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 4/8] sparse-checkout: cone mode does not recognize "**"
  2020-01-14 19:25 ` [PATCH 4/8] sparse-checkout: cone mode does not recognize "**" Derrick Stolee via GitGitGadget
@ 2020-01-14 21:16   ` Jeff King
  0 siblings, 0 replies; 38+ messages in thread
From: Jeff King @ 2020-01-14 21:16 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, me, newren, Derrick Stolee

On Tue, Jan 14, 2020 at 07:25:58PM +0000, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
> 
> When core.sparseCheckoutCone is enabled, the 'git sparse-checkout set'
> command creates a restricted set of possible patterns that are used
> by a custom algorithm to quickly match those patterns.
> 
> If a user manually edits the sparse-checkout file, then they could
> create patterns that do not match these expectations. The cone-mode
> matching algorithm can return incorrect results. The solution is to
> detect these incorrect patterns, warn that we do not recognize them,
> and revert to the standard algorithm.
> 
> Check each pattern for the "**" substring, and revert to the old
> logic if seen. While technically a "/<dir>/**" pattern matches
> the meaning of "/<dir>/", it is not one that would be written by
> the sparse-checkout builtin in cone mode. Attempting to accept that
> pattern change complicates the logic and instead we punt and do
> not accept any instance of "**".

That all makes sense.

> diff --git a/dir.c b/dir.c
> index 22d08e61c2..f8e350dda2 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -651,6 +651,13 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
>  		return;
>  	}
>  
> +	if (strstr(given->pattern, "**")) {
> +		/* Not a cone pattern. */
> +		pl->use_cone_patterns = 0;
> +		warning(_("unrecognized pattern: '%s'"), given->pattern);
> +		goto clear_hashmaps;
> +	}

The clear_hashmaps label already unsets pl->use_cone_patterns, so the
first line is redundant (the same is true of existing goto jumps, as
well, though).

I wondered whether this warning could be triggered accidentally by
somebody who just happened to add such a pattern. But we'd exit
immediately from add_pattern_to_hashsets() immediately unless the user
has set core.sparseCheckoutCone. And if that's set, then warning is
definitely the right thing to do.

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 7/8] sparse-checkout: properly match escaped characters
  2020-01-14 19:26 ` [PATCH 7/8] sparse-checkout: properly match escaped characters Derrick Stolee via GitGitGadget
@ 2020-01-14 21:21   ` Jeff King
  2020-01-14 22:08     ` Derrick Stolee
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2020-01-14 21:21 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, me, newren, Derrick Stolee

On Tue, Jan 14, 2020 at 07:26:01PM +0000, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
> 
> In cone mode, the sparse-checkout feature uses hashset containment
> queries to match paths. Make this algorithm respect escaped asterisk
> (*) and backslash (\) characters.
> 
> Create dup_and_filter_pattern() method to convert a pattern by
> removing escape characters and dropping an optional "/*" at the end.
> This method is available in dir.h as we will use it in
> builtin/sparse-chekcout.c in a later change.

s/chekcout/checkout/

It took me a minute to understand the problem here, but I think it's: if
a path in the sparse-checkout file has "\*" in it, we'd try to match a
literal "\*" in the hash, not "*"?

But we wouldn't run into that yet because we don't properly _write_ the
escaped names until patch 8.

Is that right?

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 8/8] sparse-checkout: write escaped patterns in cone mode
  2020-01-14 19:26 ` [PATCH 8/8] sparse-checkout: write escaped patterns in cone mode Derrick Stolee via GitGitGadget
@ 2020-01-14 21:25   ` Jeff King
  2020-01-14 22:11     ` Derrick Stolee
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2020-01-14 21:25 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, me, newren, Derrick Stolee

On Tue, Jan 14, 2020 at 07:26:02PM +0000, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
> 
> If a user somehow creates a directory with an asterisk (*) or backslash
> (\), then the "git sparse-checkout set" command will struggle to provide
> the correct pattern in the sparse-checkout file. When not in cone mode,
> the provided pattern is written directly into the sparse-checkout file.
> However, in cone mode we expect a list of paths to directories and then
> we convert those into patterns.
> 
> Even more specifically, the goal is to always allow the following from
> the root of a repo:
> 
>   git ls-tree --name-only -d HEAD | git sparse-checkout set --stdin
> 
> The ls-tree command provides directory names with an unescaped asterisk.
> It also quotes the directories that contain an escaped backslash. We
> must remove these quotes, then keep the escaped backslashes.

Do we need to document these rules somewhere? Naively I'd expect
"--stdin" to take in literal pathnames. But of course it can't represent
a path with a newline. So perhaps it makes sense to take quoted names by
default, and allow literal NUL-separated input with "-z" if anybody
wants it.

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/8] Harden the sparse-checkout builtin
  2020-01-14 19:44   ` Derrick Stolee
@ 2020-01-14 21:31     ` Jeff King
  0 siblings, 0 replies; 38+ messages in thread
From: Jeff King @ 2020-01-14 21:31 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Taylor Blau, Derrick Stolee via GitGitGadget, git, newren,
	Derrick Stolee

On Tue, Jan 14, 2020 at 02:44:50PM -0500, Derrick Stolee wrote:

> Since we've already worked out the workaround to be:
> 
> 	git clone --no-checkout <url> <dir>
> 	cd <dir>
> 	git sparse-checkout init --cone
> 
> there is no rush to fix this. Users _may_ discover the --sparse option
> from the clone docs and complain, but we can point them to the above
> directions for now.

Yes, but part of the beauty of the new system is just having to say
"--sparse" to make something useful happen. :)

> > Of course, if there's ever another need for v2.25.1, I don't think that
> > this would *hurt* to release then, which is to say that we definitely
> > should have these patches in a release, soon, but I don't think that
> > there's a terrible sense of urgency in the meantime.
> 
> I wouldn't complain to have patches 1-3 in an otherwise-warranted .1 release.

Agreed. 1-3 look obviously correct to me. The quoting bits I'm a little
more fuzzy on, just because I haven't really looked hard into cone mode.
Ditto for the "disable cone mode" checks.

My gut instinct is that you should be able to deduce whether the pattern
hashmap can be used purely from the patterns you see, and
core.sparseCheckoutCone would not be needed (and so if you violate the
rules by writing something manual, then it just gets slower; or maybe
we're even able to apply the literal cone-mode rules quickly and handle
the other separately). But it's much more likely I'm showing off my lack
of knowledge of the details of the problem space. You can feel free to
educate me, and/or roll your eyes and ignore me if this was already
discussed earlier.

By the way, I did notice this while poking about, which could go on top
(or hopefully be lumped in with the 1-3 as "obviously correct"):

-- >8 --

Subject: [PATCH] sparse-checkout: fix documentation typo for core.sparseCheckoutCone

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-sparse-checkout.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index 974ade2238..285893b069 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -106,7 +106,7 @@ The full pattern set allows for arbitrary pattern matches and complicated
 inclusion/exclusion rules. These can result in O(N*M) pattern matches when
 updating the index, where N is the number of patterns and M is the number
 of paths in the index. To combat this performance issue, a more restricted
-pattern set is allowed when `core.spareCheckoutCone` is enabled.
+pattern set is allowed when `core.sparseCheckoutCone` is enabled.
 
 The accepted patterns in the cone pattern set are:
 
-- 
2.25.0.639.gb9b1511416


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 7/8] sparse-checkout: properly match escaped characters
  2020-01-14 21:21   ` Jeff King
@ 2020-01-14 22:08     ` Derrick Stolee
  0 siblings, 0 replies; 38+ messages in thread
From: Derrick Stolee @ 2020-01-14 22:08 UTC (permalink / raw)
  To: Jeff King, Derrick Stolee via GitGitGadget
  Cc: git, me, newren, Derrick Stolee

On 1/14/2020 4:21 PM, Jeff King wrote:
> On Tue, Jan 14, 2020 at 07:26:01PM +0000, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> In cone mode, the sparse-checkout feature uses hashset containment
>> queries to match paths. Make this algorithm respect escaped asterisk
>> (*) and backslash (\) characters.
>>
>> Create dup_and_filter_pattern() method to convert a pattern by
>> removing escape characters and dropping an optional "/*" at the end.
>> This method is available in dir.h as we will use it in
>> builtin/sparse-chekcout.c in a later change.
> 
> s/chekcout/checkout/

Thanks.

> It took me a minute to understand the problem here, but I think it's: if
> a path in the sparse-checkout file has "\*" in it, we'd try to match a
> literal "\*" in the hash, not "*"?

Yes, the hashset would have the string "\*" instead of the string "*". This
would lead to missing directories when cone mode is enabled compared to
cone mode not being enabled.
 
> But we wouldn't run into that yet because we don't properly _write_ the
> escaped names until patch 8.

We wouldn't run into it when using the builtin, but also a user could
edit their sparse-checkout file manually OR figure out how to get the
"right" pattern by running "git sparse-checkout set "my\\*dir" (where the
escaped backslash is collapsed by the shell and Git sees "my\*dir".

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 8/8] sparse-checkout: write escaped patterns in cone mode
  2020-01-14 21:25   ` Jeff King
@ 2020-01-14 22:11     ` Derrick Stolee
  2020-01-14 22:48       ` Jeff King
  0 siblings, 1 reply; 38+ messages in thread
From: Derrick Stolee @ 2020-01-14 22:11 UTC (permalink / raw)
  To: Jeff King, Derrick Stolee via GitGitGadget
  Cc: git, me, newren, Derrick Stolee

On 1/14/2020 4:25 PM, Jeff King wrote:
> On Tue, Jan 14, 2020 at 07:26:02PM +0000, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> If a user somehow creates a directory with an asterisk (*) or backslash
>> (\), then the "git sparse-checkout set" command will struggle to provide
>> the correct pattern in the sparse-checkout file. When not in cone mode,
>> the provided pattern is written directly into the sparse-checkout file.
>> However, in cone mode we expect a list of paths to directories and then
>> we convert those into patterns.
>>
>> Even more specifically, the goal is to always allow the following from
>> the root of a repo:
>>
>>   git ls-tree --name-only -d HEAD | git sparse-checkout set --stdin
>>
>> The ls-tree command provides directory names with an unescaped asterisk.
>> It also quotes the directories that contain an escaped backslash. We
>> must remove these quotes, then keep the escaped backslashes.
> 
> Do we need to document these rules somewhere? Naively I'd expect
> "--stdin" to take in literal pathnames. But of course it can't represent
> a path with a newline. So perhaps it makes sense to take quoted names by
> default, and allow literal NUL-separated input with "-z" if anybody
> wants it.

This is worth thinking about the right way to describe the rules:

1. You don't _need_ quotes. They happen to come along for the ride in
  'git ls-tree' so it doesn't mess up shell scripts that iterate on
  those entries. At least, that's why I think they are quoted.

2. If you use quotes, the first layer of quotes will be removed.

How much of this needs to be documented explicitly, or how much should
we say "The input format matches what we would expect from 'git ls-tree
--name-only'"?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 8/8] sparse-checkout: write escaped patterns in cone mode
  2020-01-14 22:11     ` Derrick Stolee
@ 2020-01-14 22:48       ` Jeff King
  2020-01-24 21:10         ` Derrick Stolee
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2020-01-14 22:48 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, me, newren, Derrick Stolee

On Tue, Jan 14, 2020 at 05:11:03PM -0500, Derrick Stolee wrote:

> > Do we need to document these rules somewhere? Naively I'd expect
> > "--stdin" to take in literal pathnames. But of course it can't represent
> > a path with a newline. So perhaps it makes sense to take quoted names by
> > default, and allow literal NUL-separated input with "-z" if anybody
> > wants it.
> 
> This is worth thinking about the right way to describe the rules:
> 
> 1. You don't _need_ quotes. They happen to come along for the ride in
>   'git ls-tree' so it doesn't mess up shell scripts that iterate on
>   those entries. At least, that's why I think they are quoted.

It's not just shell scripts. Without quoting, the syntax becomes
ambiguous (e.g., imagine a file with a newline in it). So most Git
output that shows a filename will quote it if necessary, unless
NUL separators are being used.

> 2. If you use quotes, the first layer of quotes will be removed.

I take this to mean that anything starting with a double-quote will have
the outer layer removed, and backslash escapes inside expanded. And
anything without a starting double quote (even if it has internal
backslash escapes!) will be taken literally.

That would match how things like "update-index --index-info" work.

As far as implementation, I know you're trying to keep some of the
escaping, but I think it might make more sense to do use
unquote_c_style() to parse the input (see update-index's use for some
prior art), and then re-quote as necessary to put things into the
sparse-checkout file (I guess quoting more than just quote_c_style()
would do, since you need to quote glob metacharacters like '*' and
probably "!"). But as much as possible, I think you'd want literal
strings inside the program, and just quoting/unquoting at the edges.

> How much of this needs to be documented explicitly, or how much should
> we say "The input format matches what we would expect from 'git ls-tree
> --name-only'"?

I think it's fine to say that, and maybe call attention to the quoting.
Like:

  The input format matches the output of `git ls-tree --name-only`. This
  includes interpreting pathnames that begin with a double quote (") as
  C-style quoted strings.

Disappointingly, update-index does not seem to explain the rules
anywhere. fast-import does cover it. Maybe it's something that ought to
be hoisted out into gitcli(7) or similar (or maybe it has been and I
just can't find it).

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/8] Harden the sparse-checkout builtin
  2020-01-14 19:25 [PATCH 0/8] Harden the sparse-checkout builtin Derrick Stolee via GitGitGadget
                   ` (8 preceding siblings ...)
  2020-01-14 19:34 ` [PATCH 0/8] Harden the sparse-checkout builtin Taylor Blau
@ 2020-01-15 19:16 ` Junio C Hamano
  2020-01-15 20:32   ` Derrick Stolee
  2020-01-24 21:19 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
  10 siblings, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2020-01-15 19:16 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, me, peff, newren, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Also, let's improve Git's response to these more complicated scenarios:
>
>  1. Running "git sparse-checkout init" in a worktree would complain because
>     the "info" dir doesn't exist.
>  2. Tracked paths that include "*" and "" in their filenames.
>  3. If a user edits the sparse-checkout file to have non-cone pattern, such
>     as "*" anywhere or "" in the wrong place, then we should respond
>     appropriately. That is: warn that the patterns are not cone-mode, then
>     revert to the old logic.

It seems somebody ate a letter to make "<some letter>" into "" an
empty string, so I cannot quite grok the above list---two out of
three bullet points are not quite readable.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/8] Harden the sparse-checkout builtin
  2020-01-15 19:16 ` Junio C Hamano
@ 2020-01-15 20:32   ` Derrick Stolee
  0 siblings, 0 replies; 38+ messages in thread
From: Derrick Stolee @ 2020-01-15 20:32 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, me, peff, newren, Derrick Stolee

On 1/15/2020 2:16 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> Also, let's improve Git's response to these more complicated scenarios:
>>
>>  1. Running "git sparse-checkout init" in a worktree would complain because
>>     the "info" dir doesn't exist.
>>  2. Tracked paths that include "*" and "" in their filenames.
>>  3. If a user edits the sparse-checkout file to have non-cone pattern, such
>>     as "*" anywhere or "" in the wrong place, then we should respond
>>     appropriately. That is: warn that the patterns are not cone-mode, then
>>     revert to the old logic.
> 
> It seems somebody ate a letter to make "<some letter>" into "" an
> empty string, so I cannot quite grok the above list---two out of
> three bullet points are not quite readable.

In 2., the "" should be "\".

In 3., the "*" and "" should be "**" and "*" respectively.

These are things that are being collapsed by GitHub's markdown
processing. Sorry that this affects GitGitGadget's cover letter.

By escaping appropriately, these show up correctly and hopefully
will be fixed in v2.

-Stolee

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/8] t1091: use check_files to reduce boilerplate
  2020-01-14 19:25 ` [PATCH 1/8] t1091: use check_files to reduce boilerplate Derrick Stolee via GitGitGadget
@ 2020-01-16 21:40   ` Junio C Hamano
  0 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2020-01-16 21:40 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, me, peff, newren, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> When testing the sparse-checkout feature, we need to compare the
> contents of the working-directory against some expected output.
> Using here-docs was useful in the beginning, but became repetetive
> as the test script grew.
>
> Create a check_files helper to make the tests simpler and easier
> to extend. It also reduces instances of bad here-doc whitespace.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1091-sparse-checkout-builtin.sh | 215 ++++++++++-------------------
>  1 file changed, 71 insertions(+), 144 deletions(-)
>
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index ff7f8f7a1f..20caefe155 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -12,6 +12,13 @@ list_files() {
>  	(cd "$1" && printf '%s\n' *)
>  }
>  
> +check_files() {
> +	DIR=$1
> +	printf "%s\n" $2 >expect &&
> +	list_files $DIR >actual  &&

It is unclear if the script is being deliberate or sloppy.

It turns out that not quoting $2 is deliberate (i.e. it wants to
pass more than one words in $2, have them split at $IFS and show
each of them on a separate line), at the same time not quoting $DIR
is simply sloppy.

And it is totally unnecessary to confuse readers like this.

Unless you plan to extend this helper further, I think this would be
much less burdensome to the readers:

        check_files () {
                list_files "$1" >actual  &&
                shift &&
                printf "%s\n" "$@" >expect &&
                test_cmp expect actual
        }

This ...

>  	test_cmp expect repo/.git/info/sparse-checkout &&
> -	list_files repo >dir  &&
> -	cat >expect <<-EOF &&
> -		a
> -		folder1
> -		folder2
> -	EOF
> -	test_cmp expect dir
> +	check_files repo "a folder1 folder2"

... is a kind of change that the log message advertises, which is a
very nice rewrite.

And ...

>  test_expect_success 'clone --sparse' '
>  	git clone --sparse repo clone &&
>  	git -C clone sparse-checkout list >actual &&
> -	cat >expect <<-EOF &&
> -		/*
> -		!/*/
> +	cat >expect <<-\EOF &&
> +	/*
> +	!/*/
>  	EOF

... this is a style-fix that is another nice rewrite but in a
different category.  I wonder if they should be done in separate
commits.

Other than that, makes sense.

Thanks.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/8] sparse-checkout: create leading directories
  2020-01-14 19:25 ` [PATCH 2/8] sparse-checkout: create leading directories Derrick Stolee via GitGitGadget
@ 2020-01-16 21:46   ` Junio C Hamano
  0 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2020-01-16 21:46 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, me, peff, newren, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The 'git init' command creates the ".git/info" directory and fills it
> with some default files. However, 'git worktree add' does not create
> the info directory for that worktree. This causes a problem when running
> "git sparse-checkout init" inside a worktree. While care was taken to
> allow the sparse-checkout config to be specific to a worktree, this
> initialization was untested.
>
> Safely create the leading directories for the sparse-checkout file. This
> is the safest thing to do even without worktrees, as a user could delete
> their ".git/info" directory and expect Git to recover safely.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/sparse-checkout.c          |  4 ++++
>  t/t1091-sparse-checkout-builtin.sh | 10 ++++++++++
>  2 files changed, 14 insertions(+)
>
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index b3bed891cb..3cee8ab46e 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -199,6 +199,10 @@ static int write_patterns_and_update(struct pattern_list *pl)
>  	int result;
>  
>  	sparse_filename = get_sparse_checkout_filename();
> +
> +	if (safe_create_leading_directories(sparse_filename))
> +		die(_("failed to create directory for sparse-checkout file"));
> +

The use of safe_create_leading_directories() here, which uses
adjust_shared_perm(), is the right thing to do.

Looks good.

> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index 20caefe155..37365dc668 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -295,4 +295,14 @@ test_expect_success 'interaction with submodules' '
>  	check_files super/modules/child "a deep folder1 folder2"
>  '
>  
> +test_expect_success 'different sparse-checkouts with worktrees' '
> +	git -C repo worktree add --detach ../worktree &&
> +	check_files worktree "a deep folder1 folder2" &&
> +	git -C worktree sparse-checkout init --cone &&
> +	git -C repo sparse-checkout set folder1 &&
> +	git -C worktree sparse-checkout set deep/deeper1 &&
> +	check_files repo "a folder1" &&
> +	check_files worktree "a deep"
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 8/8] sparse-checkout: write escaped patterns in cone mode
  2020-01-14 22:48       ` Jeff King
@ 2020-01-24 21:10         ` Derrick Stolee
  2020-01-24 21:42           ` Jeff King
  0 siblings, 1 reply; 38+ messages in thread
From: Derrick Stolee @ 2020-01-24 21:10 UTC (permalink / raw)
  To: Jeff King
  Cc: Derrick Stolee via GitGitGadget, git, me, newren, Derrick Stolee

On 1/14/2020 5:48 PM, Jeff King wrote:
> On Tue, Jan 14, 2020 at 05:11:03PM -0500, Derrick Stolee wrote:
> 
>>> Do we need to document these rules somewhere? Naively I'd expect
>>> "--stdin" to take in literal pathnames. But of course it can't represent
>>> a path with a newline. So perhaps it makes sense to take quoted names by
>>> default, and allow literal NUL-separated input with "-z" if anybody
>>> wants it.
>>
>> This is worth thinking about the right way to describe the rules:
>>
>> 1. You don't _need_ quotes. They happen to come along for the ride in
>>   'git ls-tree' so it doesn't mess up shell scripts that iterate on
>>   those entries. At least, that's why I think they are quoted.
> 
> It's not just shell scripts. Without quoting, the syntax becomes
> ambiguous (e.g., imagine a file with a newline in it). So most Git
> output that shows a filename will quote it if necessary, unless
> NUL separators are being used.

Good to know.

>> 2. If you use quotes, the first layer of quotes will be removed.
> 
> I take this to mean that anything starting with a double-quote will have
> the outer layer removed, and backslash escapes inside expanded. And
> anything without a starting double quote (even if it has internal
> backslash escapes!) will be taken literally.

Hm. Perhaps you are right! The ls-tree output for the test example
is:

	deep
	folder1
	folder2
	"zbad\\dir"
	zdoes*exist

so the "zdoes*exist" value is not escaped. I believe the current
logic does something extra: consider supplying this input to
'git sparse-checkout set --stdin':

	deep
	folder1
	folder2
	"zbad\\dir"
	zdoes\*exist

then should we un-escape "\*" to "*"? Or is this not a valid input
because a backslash should have been quoted into C-style quotes?

The behavior in the current series allows this output that would
never be written by "git ls-tree".

> That would match how things like "update-index --index-info" work.
> 
> As far as implementation, I know you're trying to keep some of the
> escaping, but I think it might make more sense to do use
> unquote_c_style() to parse the input (see update-index's use for some
> prior art), and then re-quote as necessary to put things into the
> sparse-checkout file (I guess quoting more than just quote_c_style()
> would do, since you need to quote glob metacharacters like '*' and
> probably "!"). But as much as possible, I think you'd want literal
> strings inside the program, and just quoting/unquoting at the edges.

I was playing around with this, and I think that quote_c_style() is
necessary for the output, but we have a strange in-memory situation
for the other escaping: we both fill the hashsets with the un-escaped
data and fill the pattern list with the escaped patterns.

I'll add a commit with the quote_c_style() calls during the 'list'
subcommand.

>> How much of this needs to be documented explicitly, or how much should
>> we say "The input format matches what we would expect from 'git ls-tree
>> --name-only'"?
> 
> I think it's fine to say that, and maybe call attention to the quoting.
> Like:
> 
>   The input format matches the output of `git ls-tree --name-only`. This
>   includes interpreting pathnames that begin with a double quote (") as
>   C-style quoted strings.
> 
> Disappointingly, update-index does not seem to explain the rules
> anywhere. fast-import does cover it. Maybe it's something that ought to
> be hoisted out into gitcli(7) or similar (or maybe it has been and I
> just can't find it).

I'll start the process by using your recommended language. I noticed
also that the 'set' command doesn't actually document what happens
when in cone mode, so I will correct that, too.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 00/12] Harden the sparse-checkout builtin
  2020-01-14 19:25 [PATCH 0/8] Harden the sparse-checkout builtin Derrick Stolee via GitGitGadget
                   ` (9 preceding siblings ...)
  2020-01-15 19:16 ` Junio C Hamano
@ 2020-01-24 21:19 ` " Derrick Stolee via GitGitGadget
  2020-01-24 21:19   ` [PATCH v2 01/12] t1091: use check_files to reduce boilerplate Derrick Stolee via GitGitGadget
                     ` (11 more replies)
  10 siblings, 12 replies; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-24 21:19 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee

This series is based on ds/sparse-list-in-cone-mode.

This series attempts to clean up some rough edges in the sparse-checkout
feature, especially around the cone mode.

Unfortunately, after the v2.25.0 release, we noticed an issue with the "git
clone --sparse" option when using a URL instead of a local path. This is
fixed and properly tested here.

Also, let's improve Git's response to these more complicated scenarios:

 1. Running "git sparse-checkout init" in a worktree would complain because
    the "info" dir doesn't exist.
 2. Tracked paths that include "*" and "\" in their filenames.
 3. If a user edits the sparse-checkout file to have non-cone pattern, such
    as "**" anywhere or "*" in the wrong place, then we should respond
    appropriately. That is: warn that the patterns are not cone-mode, then
    revert to the old logic.

Updates in V2:

 * Added C-style quoting to the output of "git sparse-checkout list" in cone
   mode.
 * Improved documentation.
 * Responded to most style feedback. Hopefully I didn't miss anything.
 * I was lingering on this a little to see if I could also fix the issue
   raised in [1], but I have not figured that one out, yet.

[1] 
https://lore.kernel.org/git/062301d5d0bc$c3e17760$4ba46620$@Frontier.com/

Thanks, -Stolee

Derrick Stolee (11):
  t1091: use check_files to reduce boilerplate
  t1091: improve here-docs
  sparse-checkout: create leading directories
  clone: fix --sparse option with URLs
  sparse-checkout: cone mode does not recognize "**"
  sparse-checkout: detect short patterns
  sparse-checkout: warn on incorrect '*' in patterns
  sparse-checkout: properly match escaped characters
  sparse-checkout: write escaped patterns in cone mode
  sparse-checkout: use C-style quotes in 'list' subcommand
  sparse-checkout: improve docs around 'set' in cone mode

Jeff King (1):
  sparse-checkout: fix documentation typo for core.sparseCheckoutCone

 Documentation/git-sparse-checkout.txt |  19 +-
 builtin/clone.c                       |   2 +-
 builtin/sparse-checkout.c             |  59 ++++-
 dir.c                                 |  68 +++++-
 dir.h                                 |   1 +
 t/t1091-sparse-checkout-builtin.sh    | 323 +++++++++++++++-----------
 6 files changed, 317 insertions(+), 155 deletions(-)


base-commit: 4fd683b6a35eabd23dd5183da7f654a1e1f00325
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-513%2Fderrickstolee%2Fsparse-harden-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-513/derrickstolee/sparse-harden-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/513

Range-diff vs v1:

  -:  ---------- >  1:  1cc825412f t1091: use check_files to reduce boilerplate
  1:  9f7791ae5e !  2:  b7a6ad145a t1091: use check_files to reduce boilerplate
     @@ -1,14 +1,11 @@
      Author: Derrick Stolee <dstolee@microsoft.com>
      
     -    t1091: use check_files to reduce boilerplate
     +    t1091: improve here-docs
      
     -    When testing the sparse-checkout feature, we need to compare the
     -    contents of the working-directory against some expected output.
     -    Using here-docs was useful in the beginning, but became repetetive
     -    as the test script grew.
     -
     -    Create a check_files helper to make the tests simpler and easier
     -    to extend. It also reduces instances of bad here-doc whitespace.
     +    t1091-sparse-checkout-builtin.sh uses here-docs to populate the
     +    expected contents of the sparse-checkout file. These do not use
     +    shell interpolation, so use "-\EOF" instead of "-EOF". Also use
     +    proper tabbing.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ -16,20 +13,6 @@
       --- a/t/t1091-sparse-checkout-builtin.sh
       +++ b/t/t1091-sparse-checkout-builtin.sh
      @@
     - 	(cd "$1" && printf '%s\n' *)
     - }
     - 
     -+check_files() {
     -+	DIR=$1
     -+	printf "%s\n" $2 >expect &&
     -+	list_files $DIR >actual &&
     -+	test_cmp expect actual
     -+}
     -+
     - test_expect_success 'setup' '
     - 	git init repo &&
     - 	(
     -@@
       
       test_expect_success 'git sparse-checkout list (populated)' '
       	test_when_finished rm -f repo/.git/info/sparse-checkout &&
     @@ -59,11 +42,7 @@
       	EOF
       	test_cmp expect repo/.git/info/sparse-checkout &&
       	test_cmp_config -C repo true core.sparsecheckout &&
     --	list_files repo >dir  &&
     --	echo a >expect &&
     --	test_cmp expect dir
     -+	check_files repo a
     - '
     +@@
       
       test_expect_success 'git sparse-checkout list after init' '
       	git -C repo sparse-checkout list >actual &&
     @@ -90,16 +69,8 @@
      +	*folder*
       	EOF
       	test_cmp expect repo/.git/info/sparse-checkout &&
     --	list_files repo >dir  &&
     --	cat >expect <<-EOF &&
     --		a
     --		folder1
     --		folder2
     --	EOF
     --	test_cmp expect dir
     -+	check_files repo "a folder1 folder2"
     - '
     - 
     + 	check_files repo a folder1 folder2
     +@@
       test_expect_success 'clone --sparse' '
       	git clone --sparse repo clone &&
       	git -C clone sparse-checkout list >actual &&
     @@ -111,13 +82,7 @@
      +	!/*/
       	EOF
       	test_cmp expect actual &&
     --	list_files clone >dir &&
     --	echo a >expect &&
     --	test_cmp expect dir
     -+	check_files clone a
     - '
     - 
     - test_expect_success 'set enables config' '
     + 	check_files clone a
      @@
       
       test_expect_success 'set sparse-checkout using builtin' '
     @@ -133,15 +98,7 @@
       	EOF
       	git -C repo sparse-checkout list >actual &&
       	test_cmp expect actual &&
     - 	test_cmp expect repo/.git/info/sparse-checkout &&
     --	list_files repo >dir  &&
     --	cat >expect <<-EOF &&
     --		a
     --		folder1
     --		folder2
     --	EOF
     --	test_cmp expect dir
     -+	check_files repo "a folder1 folder2"
     +@@
       '
       
       test_expect_success 'set sparse-checkout using --stdin' '
     @@ -158,72 +115,10 @@
       	EOF
       	git -C repo sparse-checkout set --stdin <expect &&
       	git -C repo sparse-checkout list >actual &&
     - 	test_cmp expect actual &&
     - 	test_cmp expect repo/.git/info/sparse-checkout &&
     --	list_files repo >dir  &&
     --	cat >expect <<-EOF &&
     --		a
     --		folder1
     --		folder2
     --	EOF
     --	test_cmp expect dir
     -+	check_files repo "a folder1 folder2"
     - '
     - 
     - test_expect_success 'cone mode: match patterns' '
     -@@
     - 	git -C repo read-tree -mu HEAD 2>err &&
     - 	test_i18ngrep ! "disabling cone patterns" err &&
     - 	git -C repo reset --hard &&
     --	list_files repo >dir  &&
     --	cat >expect <<-EOF &&
     --		a
     --		folder1
     --		folder2
     --	EOF
     --	test_cmp expect dir
     -+	check_files repo "a folder1 folder2"
     - '
     - 
     - test_expect_success 'cone mode: warn on bad pattern' '
      @@
     - 	test_path_is_file repo/.git/info/sparse-checkout &&
     - 	git -C repo config --list >config &&
     - 	test_must_fail git config core.sparseCheckout &&
     --	list_files repo >dir &&
     --	cat >expect <<-EOF &&
     --		a
     --		deep
     --		folder1
     --		folder2
     --	EOF
     --	test_cmp expect dir
     -+	check_files repo "a deep folder1 folder2"
     - '
     - 
     - test_expect_success 'cone mode: init and set' '
     -@@
     - 	test_cmp expect dir &&
     - 	git -C repo sparse-checkout set deep/deeper1/deepest/ 2>err &&
     - 	test_must_be_empty err &&
     --	list_files repo >dir  &&
     --	cat >expect <<-EOF &&
     --		a
     --		deep
     --	EOF
     --	test_cmp expect dir &&
     --	list_files repo/deep >dir  &&
     --	cat >expect <<-EOF &&
     --		a
     --		deeper1
     --	EOF
     --	test_cmp expect dir &&
     --	list_files repo/deep/deeper1 >dir  &&
     --	cat >expect <<-EOF &&
     --		a
     --		deepest
     --	EOF
     --	test_cmp expect dir &&
     + 	check_files repo a deep &&
     + 	check_files repo/deep a deeper1 &&
     + 	check_files repo/deep/deeper1 a deepest &&
      -	cat >expect <<-EOF &&
      -		/*
      -		!/*/
     @@ -232,9 +127,6 @@
      -		/deep/deeper1/
      -		!/deep/deeper1/*/
      -		/deep/deeper1/deepest/
     -+	check_files repo "a deep" &&
     -+	check_files repo/deep "a deeper1" &&
     -+	check_files repo/deep/deeper1 "a deepest" &&
      +	cat >expect <<-\EOF &&
      +	/*
      +	!/*/
     @@ -253,14 +145,7 @@
      +	folder2
       	EOF
       	test_must_be_empty err &&
     --	cat >expect <<-EOF &&
     --		a
     --		folder1
     --		folder2
     --	EOF
     --	list_files repo >dir &&
     --	test_cmp expect dir
     -+	check_files repo "a folder1 folder2"
     + 	check_files repo a folder1 folder2
       '
       
       test_expect_success 'cone mode: list' '
     @@ -288,21 +173,6 @@
       	EOF
       	test_cmp repo/.git/info/sparse-checkout expect
       '
     -@@
     - 	test_must_fail git -C repo sparse-checkout set deep/deeper1 2>err &&
     - 	test_i18ngrep "cannot set sparse-checkout patterns" err &&
     - 	test_cmp repo/.git/info/sparse-checkout expect &&
     --	list_files repo/deep >dir &&
     --	cat >expect <<-EOF &&
     --		a
     --		deeper1
     --		deeper2
     --	EOF
     --	test_cmp dir expect
     -+	check_files repo/deep "a deeper1 deeper2"
     - '
     - 
     - test_expect_success 'revert to old sparse-checkout on empty update' '
      @@
       test_expect_success 'cone mode: set with core.ignoreCase=true' '
       	git -C repo sparse-checkout init --cone &&
     @@ -317,37 +187,4 @@
      +	/folder1/
       	EOF
       	test_cmp expect repo/.git/info/sparse-checkout &&
     --	list_files repo >dir &&
     --	cat >expect <<-EOF &&
     --		a
     --		folder1
     --	EOF
     --	test_cmp expect dir
     -+	check_files repo "a folder1"
     - '
     - 
     - test_expect_success 'interaction with submodules' '
     -@@
     - 		git sparse-checkout init --cone &&
     - 		git sparse-checkout set folder1
     - 	) &&
     --	list_files super >dir &&
     --	cat >expect <<-\EOF &&
     --		a
     --		folder1
     --		modules
     --	EOF
     --	test_cmp expect dir &&
     --	list_files super/modules/child >dir &&
     --	cat >expect <<-\EOF &&
     --		a
     --		deep
     --		folder1
     --		folder2
     --	EOF
     --	test_cmp expect dir
     -+	check_files super "a folder1 modules" &&
     -+	check_files super/modules/child "a deep folder1 folder2"
     - '
     - 
     - test_done
     + 	check_files repo a folder1
  2:  53a266f9aa !  3:  5497ad8778 sparse-checkout: create leading directories
     @@ -34,7 +34,7 @@
       --- a/t/t1091-sparse-checkout-builtin.sh
       +++ b/t/t1091-sparse-checkout-builtin.sh
      @@
     - 	check_files super/modules/child "a deep folder1 folder2"
     + 	check_files super/modules/child a deep folder1 folder2
       '
       
      +test_expect_success 'different sparse-checkouts with worktrees' '
     @@ -43,8 +43,8 @@
      +	git -C worktree sparse-checkout init --cone &&
      +	git -C repo sparse-checkout set folder1 &&
      +	git -C worktree sparse-checkout set deep/deeper1 &&
     -+	check_files repo "a folder1" &&
     -+	check_files worktree "a deep"
     ++	check_files repo a folder1 &&
     ++	check_files worktree a deep
      +'
      +
       test_done
  3:  3ef8e021a5 !  4:  4991a51f6d clone: fix --sparse option with URLs
     @@ -22,6 +22,7 @@
          Update that target directory to evaluate this correctly. I have also
          manually tested that https:// URLs are handled correctly as well.
      
     +    Acked-by: Taylor Blau <me@ttaylorr.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       diff --git a/builtin/clone.c b/builtin/clone.c
  -:  ---------- >  5:  ae78c3069b sparse-checkout: fix documentation typo for core.sparseCheckoutCone
  4:  dfa7e20444 !  6:  2ad4d3e467 sparse-checkout: cone mode does not recognize "**"
     @@ -30,7 +30,6 @@
       
      +	if (strstr(given->pattern, "**")) {
      +		/* Not a cone pattern. */
     -+		pl->use_cone_patterns = 0;
      +		warning(_("unrecognized pattern: '%s'"), given->pattern);
      +		goto clear_hashmaps;
      +	}
     @@ -38,12 +37,17 @@
       	if (given->patternlen > 2 &&
       	    !strcmp(given->pattern + given->patternlen - 2, "/*")) {
       		if (!(given->flags & PATTERN_FLAG_NEGATIVE)) {
     + 			/* Not a cone pattern. */
     +-			pl->use_cone_patterns = 0;
     + 			warning(_("unrecognized pattern: '%s'"), given->pattern);
     + 			goto clear_hashmaps;
     + 		}
      
       diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
       --- a/t/t1091-sparse-checkout-builtin.sh
       +++ b/t/t1091-sparse-checkout-builtin.sh
      @@
     - 	check_files worktree "a deep"
     + 	check_files worktree a deep
       '
       
      +check_read_tree_errors () {
     @@ -57,7 +61,7 @@
      +	else
      +		test_i18ngrep "$ERRORS" err
      +	fi &&
     -+	check_files $REPO "$FILES"
     ++	check_files $REPO $FILES
      +}
      +
      +test_expect_success 'pattern-checks: /A/**' '
  5:  9be49908fd !  7:  aace064510 sparse-checkout: detect short patterns
     @@ -21,8 +21,8 @@
      +	if (given->patternlen <= 2 ||
      +	    strstr(given->pattern, "**")) {
       		/* Not a cone pattern. */
     - 		pl->use_cone_patterns = 0;
       		warning(_("unrecognized pattern: '%s'"), given->pattern);
     + 		goto clear_hashmaps;
      
       diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
       --- a/t/t1091-sparse-checkout-builtin.sh
  6:  77a514f50b !  8:  d2a510a3bb sparse-checkout: warn on incorrect '*' in patterns
     @@ -34,8 +34,7 @@
      +	    *given->pattern == '*' ||
       	    strstr(given->pattern, "**")) {
       		/* Not a cone pattern. */
     - 		pl->use_cone_patterns = 0;
     -@@
     + 		warning(_("unrecognized pattern: '%s'"), given->pattern);
       		goto clear_hashmaps;
       	}
       
     @@ -57,7 +56,6 @@
      +			goto increment;
      +
      +		/* Not a cone pattern. */
     -+		pl->use_cone_patterns = 0;
      +		warning(_("unrecognized pattern: '%s'"), given->pattern);
      +		goto clear_hashmaps;
      +
  7:  09dbe1f902 !  9:  65c53d7526 sparse-checkout: properly match escaped characters
     @@ -9,7 +9,7 @@
          Create dup_and_filter_pattern() method to convert a pattern by
          removing escape characters and dropping an optional "/*" at the end.
          This method is available in dir.h as we will use it in
     -    builtin/sparse-chekcout.c in a later change.
     +    builtin/sparse-checkout.c in a later change.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
  8:  79b6e9a565 = 10:  c27a17a2fc sparse-checkout: write escaped patterns in cone mode
  -:  ---------- > 11:  526d5becbc sparse-checkout: use C-style quotes in 'list' subcommand
  -:  ---------- > 12:  1b5858adee sparse-checkout: improve docs around 'set' in cone mode

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 01/12] t1091: use check_files to reduce boilerplate
  2020-01-24 21:19 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
@ 2020-01-24 21:19   ` Derrick Stolee via GitGitGadget
  2020-01-24 21:19   ` [PATCH v2 02/12] t1091: improve here-docs Derrick Stolee via GitGitGadget
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-24 21:19 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When testing the sparse-checkout feature, we need to compare the
contents of the working-directory against some expected output.
Using here-docs was useful in the beginning, but became repetetive
as the test script grew.

Create a check_files helper to make the tests simpler and easier
to extend. It also reduces instances of bad here-doc whitespace.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1091-sparse-checkout-builtin.sh | 117 ++++++-----------------------
 1 file changed, 22 insertions(+), 95 deletions(-)

diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index ff7f8f7a1f..e058a20ad6 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -12,6 +12,13 @@ list_files() {
 	(cd "$1" && printf '%s\n' *)
 }
 
+check_files() {
+	list_files "$1" >actual &&
+	shift &&
+	printf "%s\n" $@ >expect &&
+	test_cmp expect actual
+}
+
 test_expect_success 'setup' '
 	git init repo &&
 	(
@@ -58,9 +65,7 @@ test_expect_success 'git sparse-checkout init' '
 	EOF
 	test_cmp expect repo/.git/info/sparse-checkout &&
 	test_cmp_config -C repo true core.sparsecheckout &&
-	list_files repo >dir  &&
-	echo a >expect &&
-	test_cmp expect dir
+	check_files repo a
 '
 
 test_expect_success 'git sparse-checkout list after init' '
@@ -81,13 +86,7 @@ test_expect_success 'init with existing sparse-checkout' '
 		*folder*
 	EOF
 	test_cmp expect repo/.git/info/sparse-checkout &&
-	list_files repo >dir  &&
-	cat >expect <<-EOF &&
-		a
-		folder1
-		folder2
-	EOF
-	test_cmp expect dir
+	check_files repo a folder1 folder2
 '
 
 test_expect_success 'clone --sparse' '
@@ -98,9 +97,7 @@ test_expect_success 'clone --sparse' '
 		!/*/
 	EOF
 	test_cmp expect actual &&
-	list_files clone >dir &&
-	echo a >expect &&
-	test_cmp expect dir
+	check_files clone a
 '
 
 test_expect_success 'set enables config' '
@@ -127,13 +124,7 @@ test_expect_success 'set sparse-checkout using builtin' '
 	git -C repo sparse-checkout list >actual &&
 	test_cmp expect actual &&
 	test_cmp expect repo/.git/info/sparse-checkout &&
-	list_files repo >dir  &&
-	cat >expect <<-EOF &&
-		a
-		folder1
-		folder2
-	EOF
-	test_cmp expect dir
+	check_files repo a folder1 folder2
 '
 
 test_expect_success 'set sparse-checkout using --stdin' '
@@ -147,13 +138,7 @@ test_expect_success 'set sparse-checkout using --stdin' '
 	git -C repo sparse-checkout list >actual &&
 	test_cmp expect actual &&
 	test_cmp expect repo/.git/info/sparse-checkout &&
-	list_files repo >dir  &&
-	cat >expect <<-EOF &&
-		a
-		folder1
-		folder2
-	EOF
-	test_cmp expect dir
+	check_files repo "a folder1 folder2"
 '
 
 test_expect_success 'cone mode: match patterns' '
@@ -162,13 +147,7 @@ test_expect_success 'cone mode: match patterns' '
 	git -C repo read-tree -mu HEAD 2>err &&
 	test_i18ngrep ! "disabling cone patterns" err &&
 	git -C repo reset --hard &&
-	list_files repo >dir  &&
-	cat >expect <<-EOF &&
-		a
-		folder1
-		folder2
-	EOF
-	test_cmp expect dir
+	check_files repo a folder1 folder2
 '
 
 test_expect_success 'cone mode: warn on bad pattern' '
@@ -185,14 +164,7 @@ test_expect_success 'sparse-checkout disable' '
 	test_path_is_file repo/.git/info/sparse-checkout &&
 	git -C repo config --list >config &&
 	test_must_fail git config core.sparseCheckout &&
-	list_files repo >dir &&
-	cat >expect <<-EOF &&
-		a
-		deep
-		folder1
-		folder2
-	EOF
-	test_cmp expect dir
+	check_files repo a deep folder1 folder2
 '
 
 test_expect_success 'cone mode: init and set' '
@@ -204,24 +176,9 @@ test_expect_success 'cone mode: init and set' '
 	test_cmp expect dir &&
 	git -C repo sparse-checkout set deep/deeper1/deepest/ 2>err &&
 	test_must_be_empty err &&
-	list_files repo >dir  &&
-	cat >expect <<-EOF &&
-		a
-		deep
-	EOF
-	test_cmp expect dir &&
-	list_files repo/deep >dir  &&
-	cat >expect <<-EOF &&
-		a
-		deeper1
-	EOF
-	test_cmp expect dir &&
-	list_files repo/deep/deeper1 >dir  &&
-	cat >expect <<-EOF &&
-		a
-		deepest
-	EOF
-	test_cmp expect dir &&
+	check_files repo a deep &&
+	check_files repo/deep a deeper1 &&
+	check_files repo/deep/deeper1 a deepest &&
 	cat >expect <<-EOF &&
 		/*
 		!/*/
@@ -237,13 +194,7 @@ test_expect_success 'cone mode: init and set' '
 		folder2
 	EOF
 	test_must_be_empty err &&
-	cat >expect <<-EOF &&
-		a
-		folder1
-		folder2
-	EOF
-	list_files repo >dir &&
-	test_cmp expect dir
+	check_files repo a folder1 folder2
 '
 
 test_expect_success 'cone mode: list' '
@@ -275,13 +226,7 @@ test_expect_success 'revert to old sparse-checkout on bad update' '
 	test_must_fail git -C repo sparse-checkout set deep/deeper1 2>err &&
 	test_i18ngrep "cannot set sparse-checkout patterns" err &&
 	test_cmp repo/.git/info/sparse-checkout expect &&
-	list_files repo/deep >dir &&
-	cat >expect <<-EOF &&
-		a
-		deeper1
-		deeper2
-	EOF
-	test_cmp dir expect
+	check_files repo/deep a deeper1 deeper2
 '
 
 test_expect_success 'revert to old sparse-checkout on empty update' '
@@ -332,12 +277,7 @@ test_expect_success 'cone mode: set with core.ignoreCase=true' '
 		/folder1/
 	EOF
 	test_cmp expect repo/.git/info/sparse-checkout &&
-	list_files repo >dir &&
-	cat >expect <<-EOF &&
-		a
-		folder1
-	EOF
-	test_cmp expect dir
+	check_files repo a folder1
 '
 
 test_expect_success 'interaction with submodules' '
@@ -351,21 +291,8 @@ test_expect_success 'interaction with submodules' '
 		git sparse-checkout init --cone &&
 		git sparse-checkout set folder1
 	) &&
-	list_files super >dir &&
-	cat >expect <<-\EOF &&
-		a
-		folder1
-		modules
-	EOF
-	test_cmp expect dir &&
-	list_files super/modules/child >dir &&
-	cat >expect <<-\EOF &&
-		a
-		deep
-		folder1
-		folder2
-	EOF
-	test_cmp expect dir
+	check_files super a folder1 modules &&
+	check_files super/modules/child a deep folder1 folder2
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 02/12] t1091: improve here-docs
  2020-01-24 21:19 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
  2020-01-24 21:19   ` [PATCH v2 01/12] t1091: use check_files to reduce boilerplate Derrick Stolee via GitGitGadget
@ 2020-01-24 21:19   ` Derrick Stolee via GitGitGadget
  2020-01-24 21:19   ` [PATCH v2 03/12] sparse-checkout: create leading directories Derrick Stolee via GitGitGadget
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-24 21:19 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

t1091-sparse-checkout-builtin.sh uses here-docs to populate the
expected contents of the sparse-checkout file. These do not use
shell interpolation, so use "-\EOF" instead of "-EOF". Also use
proper tabbing.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1091-sparse-checkout-builtin.sh | 98 +++++++++++++++---------------
 1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index e058a20ad6..e28e1c797f 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -46,11 +46,11 @@ test_expect_success 'git sparse-checkout list (empty)' '
 
 test_expect_success 'git sparse-checkout list (populated)' '
 	test_when_finished rm -f repo/.git/info/sparse-checkout &&
-	cat >repo/.git/info/sparse-checkout <<-EOF &&
-		/folder1/*
-		/deep/
-		**/a
-		!*bin*
+	cat >repo/.git/info/sparse-checkout <<-\EOF &&
+	/folder1/*
+	/deep/
+	**/a
+	!*bin*
 	EOF
 	cp repo/.git/info/sparse-checkout expect &&
 	git -C repo sparse-checkout list >list &&
@@ -59,9 +59,9 @@ test_expect_success 'git sparse-checkout list (populated)' '
 
 test_expect_success 'git sparse-checkout init' '
 	git -C repo sparse-checkout init &&
-	cat >expect <<-EOF &&
-		/*
-		!/*/
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
 	EOF
 	test_cmp expect repo/.git/info/sparse-checkout &&
 	test_cmp_config -C repo true core.sparsecheckout &&
@@ -70,9 +70,9 @@ test_expect_success 'git sparse-checkout init' '
 
 test_expect_success 'git sparse-checkout list after init' '
 	git -C repo sparse-checkout list >actual &&
-	cat >expect <<-EOF &&
-		/*
-		!/*/
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
 	EOF
 	test_cmp expect actual
 '
@@ -80,10 +80,10 @@ test_expect_success 'git sparse-checkout list after init' '
 test_expect_success 'init with existing sparse-checkout' '
 	echo "*folder*" >> repo/.git/info/sparse-checkout &&
 	git -C repo sparse-checkout init &&
-	cat >expect <<-EOF &&
-		/*
-		!/*/
-		*folder*
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
+	*folder*
 	EOF
 	test_cmp expect repo/.git/info/sparse-checkout &&
 	check_files repo a folder1 folder2
@@ -92,9 +92,9 @@ test_expect_success 'init with existing sparse-checkout' '
 test_expect_success 'clone --sparse' '
 	git clone --sparse repo clone &&
 	git -C clone sparse-checkout list >actual &&
-	cat >expect <<-EOF &&
-		/*
-		!/*/
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
 	EOF
 	test_cmp expect actual &&
 	check_files clone a
@@ -116,10 +116,10 @@ test_expect_success 'set enables config' '
 
 test_expect_success 'set sparse-checkout using builtin' '
 	git -C repo sparse-checkout set "/*" "!/*/" "*folder*" &&
-	cat >expect <<-EOF &&
-		/*
-		!/*/
-		*folder*
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
+	*folder*
 	EOF
 	git -C repo sparse-checkout list >actual &&
 	test_cmp expect actual &&
@@ -128,11 +128,11 @@ test_expect_success 'set sparse-checkout using builtin' '
 '
 
 test_expect_success 'set sparse-checkout using --stdin' '
-	cat >expect <<-EOF &&
-		/*
-		!/*/
-		/folder1/
-		/folder2/
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
+	/folder1/
+	/folder2/
 	EOF
 	git -C repo sparse-checkout set --stdin <expect &&
 	git -C repo sparse-checkout list >actual &&
@@ -179,28 +179,28 @@ test_expect_success 'cone mode: init and set' '
 	check_files repo a deep &&
 	check_files repo/deep a deeper1 &&
 	check_files repo/deep/deeper1 a deepest &&
-	cat >expect <<-EOF &&
-		/*
-		!/*/
-		/deep/
-		!/deep/*/
-		/deep/deeper1/
-		!/deep/deeper1/*/
-		/deep/deeper1/deepest/
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
+	/deep/
+	!/deep/*/
+	/deep/deeper1/
+	!/deep/deeper1/*/
+	/deep/deeper1/deepest/
 	EOF
 	test_cmp expect repo/.git/info/sparse-checkout &&
-	git -C repo sparse-checkout set --stdin 2>err <<-EOF &&
-		folder1
-		folder2
+	git -C repo sparse-checkout set --stdin 2>err <<-\EOF &&
+	folder1
+	folder2
 	EOF
 	test_must_be_empty err &&
 	check_files repo a folder1 folder2
 '
 
 test_expect_success 'cone mode: list' '
-	cat >expect <<-EOF &&
-		folder1
-		folder2
+	cat >expect <<-\EOF &&
+	folder1
+	folder2
 	EOF
 	git -C repo sparse-checkout set --stdin <expect &&
 	git -C repo sparse-checkout list >actual 2>err &&
@@ -211,10 +211,10 @@ test_expect_success 'cone mode: list' '
 test_expect_success 'cone mode: set with nested folders' '
 	git -C repo sparse-checkout set deep deep/deeper1/deepest 2>err &&
 	test_line_count = 0 err &&
-	cat >expect <<-EOF &&
-		/*
-		!/*/
-		/deep/
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
+	/deep/
 	EOF
 	test_cmp repo/.git/info/sparse-checkout expect
 '
@@ -271,10 +271,10 @@ test_expect_success 'sparse-checkout (init|set|disable) fails with dirty status'
 test_expect_success 'cone mode: set with core.ignoreCase=true' '
 	git -C repo sparse-checkout init --cone &&
 	git -C repo -c core.ignoreCase=true sparse-checkout set folder1 &&
-	cat >expect <<-EOF &&
-		/*
-		!/*/
-		/folder1/
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
+	/folder1/
 	EOF
 	test_cmp expect repo/.git/info/sparse-checkout &&
 	check_files repo a folder1
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 03/12] sparse-checkout: create leading directories
  2020-01-24 21:19 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
  2020-01-24 21:19   ` [PATCH v2 01/12] t1091: use check_files to reduce boilerplate Derrick Stolee via GitGitGadget
  2020-01-24 21:19   ` [PATCH v2 02/12] t1091: improve here-docs Derrick Stolee via GitGitGadget
@ 2020-01-24 21:19   ` Derrick Stolee via GitGitGadget
  2020-01-24 21:19   ` [PATCH v2 04/12] clone: fix --sparse option with URLs Derrick Stolee via GitGitGadget
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-24 21:19 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The 'git init' command creates the ".git/info" directory and fills it
with some default files. However, 'git worktree add' does not create
the info directory for that worktree. This causes a problem when running
"git sparse-checkout init" inside a worktree. While care was taken to
allow the sparse-checkout config to be specific to a worktree, this
initialization was untested.

Safely create the leading directories for the sparse-checkout file. This
is the safest thing to do even without worktrees, as a user could delete
their ".git/info" directory and expect Git to recover safely.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/sparse-checkout.c          |  4 ++++
 t/t1091-sparse-checkout-builtin.sh | 10 ++++++++++
 2 files changed, 14 insertions(+)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index b3bed891cb..3cee8ab46e 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -199,6 +199,10 @@ static int write_patterns_and_update(struct pattern_list *pl)
 	int result;
 
 	sparse_filename = get_sparse_checkout_filename();
+
+	if (safe_create_leading_directories(sparse_filename))
+		die(_("failed to create directory for sparse-checkout file"));
+
 	fd = hold_lock_file_for_update(&lk, sparse_filename,
 				      LOCK_DIE_ON_ERROR);
 
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index e28e1c797f..43d1f7520c 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -295,4 +295,14 @@ test_expect_success 'interaction with submodules' '
 	check_files super/modules/child a deep folder1 folder2
 '
 
+test_expect_success 'different sparse-checkouts with worktrees' '
+	git -C repo worktree add --detach ../worktree &&
+	check_files worktree "a deep folder1 folder2" &&
+	git -C worktree sparse-checkout init --cone &&
+	git -C repo sparse-checkout set folder1 &&
+	git -C worktree sparse-checkout set deep/deeper1 &&
+	check_files repo a folder1 &&
+	check_files worktree a deep
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 04/12] clone: fix --sparse option with URLs
  2020-01-24 21:19 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
                     ` (2 preceding siblings ...)
  2020-01-24 21:19   ` [PATCH v2 03/12] sparse-checkout: create leading directories Derrick Stolee via GitGitGadget
@ 2020-01-24 21:19   ` Derrick Stolee via GitGitGadget
  2020-01-24 21:19   ` [PATCH v2 05/12] sparse-checkout: fix documentation typo for core.sparseCheckoutCone Jeff King via GitGitGadget
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-24 21:19 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The --sparse option was added to the clone builtin in d89f09c (clone:
add --sparse mode, 2019-11-21) and was tested with a local path clone
in t1091-sparse-checkout-builtin.sh. However, due to a difference in
how local paths are handled versus URLs, this mechanism does not work
with URLs.

Modify the test to use a "file://" URL, which would output this error
before the code change:

  Cloning into 'clone'...
  fatal: cannot change to 'file://.../repo': No such file or directory
  error: failed to initialize sparse-checkout

These errors are due to using a "-C <path>" option to call 'git -C
<path> sparse-checkout init' but the URL is being given instead of
the target directory.

Update that target directory to evaluate this correctly. I have also
manually tested that https:// URLs are handled correctly as well.

Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/clone.c                    | 2 +-
 t/t1091-sparse-checkout-builtin.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 4348d962c9..2caefc44fb 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1130,7 +1130,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (option_required_reference.nr || option_optional_reference.nr)
 		setup_reference();
 
-	if (option_sparse_checkout && git_sparse_checkout_init(repo))
+	if (option_sparse_checkout && git_sparse_checkout_init(dir))
 		return 1;
 
 	remote = remote_get(option_origin);
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 43d1f7520c..cf4a595c86 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -90,7 +90,7 @@ test_expect_success 'init with existing sparse-checkout' '
 '
 
 test_expect_success 'clone --sparse' '
-	git clone --sparse repo clone &&
+	git clone --sparse "file://$(pwd)/repo" clone &&
 	git -C clone sparse-checkout list >actual &&
 	cat >expect <<-\EOF &&
 	/*
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 05/12] sparse-checkout: fix documentation typo for core.sparseCheckoutCone
  2020-01-24 21:19 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
                     ` (3 preceding siblings ...)
  2020-01-24 21:19   ` [PATCH v2 04/12] clone: fix --sparse option with URLs Derrick Stolee via GitGitGadget
@ 2020-01-24 21:19   ` Jeff King via GitGitGadget
  2020-01-24 21:19   ` [PATCH v2 06/12] sparse-checkout: cone mode does not recognize "**" Derrick Stolee via GitGitGadget
                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: Jeff King via GitGitGadget @ 2020-01-24 21:19 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee, Jeff King

From: Jeff King <peff@peff.net>

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-sparse-checkout.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index 3b341cf0fc..4834fb434d 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -106,7 +106,7 @@ The full pattern set allows for arbitrary pattern matches and complicated
 inclusion/exclusion rules. These can result in O(N*M) pattern matches when
 updating the index, where N is the number of patterns and M is the number
 of paths in the index. To combat this performance issue, a more restricted
-pattern set is allowed when `core.spareCheckoutCone` is enabled.
+pattern set is allowed when `core.sparseCheckoutCone` is enabled.
 
 The accepted patterns in the cone pattern set are:
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 06/12] sparse-checkout: cone mode does not recognize "**"
  2020-01-24 21:19 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
                     ` (4 preceding siblings ...)
  2020-01-24 21:19   ` [PATCH v2 05/12] sparse-checkout: fix documentation typo for core.sparseCheckoutCone Jeff King via GitGitGadget
@ 2020-01-24 21:19   ` Derrick Stolee via GitGitGadget
  2020-01-24 21:19   ` [PATCH v2 07/12] sparse-checkout: detect short patterns Derrick Stolee via GitGitGadget
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-24 21:19 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When core.sparseCheckoutCone is enabled, the 'git sparse-checkout set'
command creates a restricted set of possible patterns that are used
by a custom algorithm to quickly match those patterns.

If a user manually edits the sparse-checkout file, then they could
create patterns that do not match these expectations. The cone-mode
matching algorithm can return incorrect results. The solution is to
detect these incorrect patterns, warn that we do not recognize them,
and revert to the standard algorithm.

Check each pattern for the "**" substring, and revert to the old
logic if seen. While technically a "/<dir>/**" pattern matches
the meaning of "/<dir>/", it is not one that would be written by
the sparse-checkout builtin in cone mode. Attempting to accept that
pattern change complicates the logic and instead we punt and do
not accept any instance of "**".

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c                              |  7 +++++-
 t/t1091-sparse-checkout-builtin.sh | 34 ++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 22d08e61c2..40fed73a94 100644
--- a/dir.c
+++ b/dir.c
@@ -651,11 +651,16 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
 		return;
 	}
 
+	if (strstr(given->pattern, "**")) {
+		/* Not a cone pattern. */
+		warning(_("unrecognized pattern: '%s'"), given->pattern);
+		goto clear_hashmaps;
+	}
+
 	if (given->patternlen > 2 &&
 	    !strcmp(given->pattern + given->patternlen - 2, "/*")) {
 		if (!(given->flags & PATTERN_FLAG_NEGATIVE)) {
 			/* Not a cone pattern. */
-			pl->use_cone_patterns = 0;
 			warning(_("unrecognized pattern: '%s'"), given->pattern);
 			goto clear_hashmaps;
 		}
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index cf4a595c86..e2e45dc7fd 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -305,4 +305,38 @@ test_expect_success 'different sparse-checkouts with worktrees' '
 	check_files worktree a deep
 '
 
+check_read_tree_errors () {
+	REPO=$1
+	FILES=$2
+	ERRORS=$3
+	git -C $REPO read-tree -mu HEAD 2>err &&
+	if test -z "$ERRORS"
+	then
+		test_must_be_empty err
+	else
+		test_i18ngrep "$ERRORS" err
+	fi &&
+	check_files $REPO $FILES
+}
+
+test_expect_success 'pattern-checks: /A/**' '
+	cat >repo/.git/info/sparse-checkout <<-\EOF &&
+	/*
+	!/*/
+	/folder1/**
+	EOF
+	check_read_tree_errors repo "a folder1" "disabling cone pattern matching"
+'
+
+test_expect_success 'pattern-checks: /A/**/B/' '
+	cat >repo/.git/info/sparse-checkout <<-\EOF &&
+	/*
+	!/*/
+	/deep/**/deepest
+	EOF
+	check_read_tree_errors repo "a deep" "disabling cone pattern matching" &&
+	check_files repo/deep "deeper1" &&
+	check_files repo/deep/deeper1 "deepest"
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 07/12] sparse-checkout: detect short patterns
  2020-01-24 21:19 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
                     ` (5 preceding siblings ...)
  2020-01-24 21:19   ` [PATCH v2 06/12] sparse-checkout: cone mode does not recognize "**" Derrick Stolee via GitGitGadget
@ 2020-01-24 21:19   ` Derrick Stolee via GitGitGadget
  2020-01-24 21:19   ` [PATCH v2 08/12] sparse-checkout: warn on incorrect '*' in patterns Derrick Stolee via GitGitGadget
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-24 21:19 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In cone mode, the shortest pattern the sparse-checkout command will
write into the sparse-checkout file is "/*". This is handled carefully
in add_pattern_to_hashsets(), so warn if any other pattern is this
short. This will assist future pattern checks by allowing us to assume
there are at least three characters in the pattern.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c                              | 3 ++-
 t/t1091-sparse-checkout-builtin.sh | 9 +++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 40fed73a94..c2e585607e 100644
--- a/dir.c
+++ b/dir.c
@@ -651,7 +651,8 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
 		return;
 	}
 
-	if (strstr(given->pattern, "**")) {
+	if (given->patternlen <= 2 ||
+	    strstr(given->pattern, "**")) {
 		/* Not a cone pattern. */
 		warning(_("unrecognized pattern: '%s'"), given->pattern);
 		goto clear_hashmaps;
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index e2e45dc7fd..2e57534799 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -339,4 +339,13 @@ test_expect_success 'pattern-checks: /A/**/B/' '
 	check_files repo/deep/deeper1 "deepest"
 '
 
+test_expect_success 'pattern-checks: too short' '
+	cat >repo/.git/info/sparse-checkout <<-\EOF &&
+	/*
+	!/*/
+	/a
+	EOF
+	check_read_tree_errors repo "a" "disabling cone pattern matching"
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 08/12] sparse-checkout: warn on incorrect '*' in patterns
  2020-01-24 21:19 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
                     ` (6 preceding siblings ...)
  2020-01-24 21:19   ` [PATCH v2 07/12] sparse-checkout: detect short patterns Derrick Stolee via GitGitGadget
@ 2020-01-24 21:19   ` Derrick Stolee via GitGitGadget
  2020-01-24 21:19   ` [PATCH v2 09/12] sparse-checkout: properly match escaped characters Derrick Stolee via GitGitGadget
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-24 21:19 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In cone mode, the sparse-checkout commmand will write patterns that
allow faster pattern matching. This matching only works if the patterns
in the sparse-checkout file are those written by that command. Users
can edit the sparse-checkout file and create patterns that cause the
cone mode matching to fail.

The cone mode patterns may end in "/*" but otherwise an un-escaped
asterisk is invalid. Add checks to disable cone mode when seeing these
values.

A later change will properly handle escaped asterisks.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c                              | 29 +++++++++++++++++++++++++++++
 t/t1091-sparse-checkout-builtin.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/dir.c b/dir.c
index c2e585607e..7cb78c8b87 100644
--- a/dir.c
+++ b/dir.c
@@ -635,6 +635,7 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
 	struct pattern_entry *translated;
 	char *truncated;
 	char *data = NULL;
+	const char *prev, *cur, *next;
 
 	if (!pl->use_cone_patterns)
 		return;
@@ -652,12 +653,40 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
 	}
 
 	if (given->patternlen <= 2 ||
+	    *given->pattern == '*' ||
 	    strstr(given->pattern, "**")) {
 		/* Not a cone pattern. */
 		warning(_("unrecognized pattern: '%s'"), given->pattern);
 		goto clear_hashmaps;
 	}
 
+	prev = given->pattern;
+	cur = given->pattern + 1;
+	next = given->pattern + 2;
+
+	while (*cur) {
+		/* We care about *cur == '*' */
+		if (*cur != '*')
+			goto increment;
+
+		/* But only if *prev != '\\' */
+		if (*prev == '\\')
+			goto increment;
+
+		/* But a trailing '/' then '*' is fine */
+		if (*prev == '/' && *next == 0)
+			goto increment;
+
+		/* Not a cone pattern. */
+		warning(_("unrecognized pattern: '%s'"), given->pattern);
+		goto clear_hashmaps;
+
+	increment:
+		prev++;
+		cur++;
+		next++;
+	}
+
 	if (given->patternlen > 2 &&
 	    !strcmp(given->pattern + given->patternlen - 2, "/*")) {
 		if (!(given->flags & PATTERN_FLAG_NEGATIVE)) {
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 2e57534799..470900f6f4 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -348,4 +348,31 @@ test_expect_success 'pattern-checks: too short' '
 	check_read_tree_errors repo "a" "disabling cone pattern matching"
 '
 
+test_expect_success 'pattern-checks: trailing "*"' '
+	cat >repo/.git/info/sparse-checkout <<-\EOF &&
+	/*
+	!/*/
+	/a*
+	EOF
+	check_read_tree_errors repo "a" "disabling cone pattern matching"
+'
+
+test_expect_success 'pattern-checks: starting "*"' '
+	cat >repo/.git/info/sparse-checkout <<-\EOF &&
+	/*
+	!/*/
+	*eep/
+	EOF
+	check_read_tree_errors repo "a deep" "disabling cone pattern matching"
+'
+
+test_expect_success 'pattern-checks: escaped "*"' '
+	cat >repo/.git/info/sparse-checkout <<-\EOF &&
+	/*
+	!/*/
+	/does\*not\*exist/
+	EOF
+	check_read_tree_errors repo "a" ""
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 09/12] sparse-checkout: properly match escaped characters
  2020-01-24 21:19 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
                     ` (7 preceding siblings ...)
  2020-01-24 21:19   ` [PATCH v2 08/12] sparse-checkout: warn on incorrect '*' in patterns Derrick Stolee via GitGitGadget
@ 2020-01-24 21:19   ` Derrick Stolee via GitGitGadget
  2020-01-24 21:19   ` [PATCH v2 10/12] sparse-checkout: write escaped patterns in cone mode Derrick Stolee via GitGitGadget
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-24 21:19 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

In cone mode, the sparse-checkout feature uses hashset containment
queries to match paths. Make this algorithm respect escaped asterisk
(*) and backslash (\) characters.

Create dup_and_filter_pattern() method to convert a pattern by
removing escape characters and dropping an optional "/*" at the end.
This method is available in dir.h as we will use it in
builtin/sparse-checkout.c in a later change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 dir.c                              | 31 +++++++++++++++++++++++++++---
 dir.h                              |  1 +
 t/t1091-sparse-checkout-builtin.sh | 22 +++++++++++++++++----
 3 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/dir.c b/dir.c
index 7cb78c8b87..6d8abc09c3 100644
--- a/dir.c
+++ b/dir.c
@@ -630,6 +630,32 @@ int pl_hashmap_cmp(const void *unused_cmp_data,
 	return strncmp(ee1->pattern, ee2->pattern, min_len);
 }
 
+char *dup_and_filter_pattern(const char *pattern)
+{
+	char *set, *read;
+	char *result = xstrdup(pattern);
+
+	set = result;
+	read = result;
+
+	while (*read) {
+		/* skip escape characters (once) */
+		if (*read == '\\')
+			read++;
+
+		*set = *read;
+
+		set++;
+		read++;
+	}
+	*set = 0;
+
+	if (*(read - 2) == '/' && *(read - 1) == '*')
+		*(read - 2) = 0;
+
+	return result;
+}
+
 static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern *given)
 {
 	struct pattern_entry *translated;
@@ -695,8 +721,7 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
 			goto clear_hashmaps;
 		}
 
-		truncated = xstrdup(given->pattern);
-		truncated[given->patternlen - 2] = 0;
+		truncated = dup_and_filter_pattern(given->pattern);
 
 		translated = xmalloc(sizeof(struct pattern_entry));
 		translated->pattern = truncated;
@@ -730,7 +755,7 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
 
 	translated = xmalloc(sizeof(struct pattern_entry));
 
-	translated->pattern = xstrdup(given->pattern);
+	translated->pattern = dup_and_filter_pattern(given->pattern);
 	translated->patternlen = given->patternlen;
 	hashmap_entry_init(&translated->ent,
 			   ignore_case ?
diff --git a/dir.h b/dir.h
index 77a43dbf89..6dcd9d33e7 100644
--- a/dir.h
+++ b/dir.h
@@ -304,6 +304,7 @@ int pl_hashmap_cmp(const void *unused_cmp_data,
 		   const struct hashmap_entry *a,
 		   const struct hashmap_entry *b,
 		   const void *key);
+char *dup_and_filter_pattern(const char *pattern);
 int hashmap_contains_parent(struct hashmap *map,
 			    const char *path,
 			    struct strbuf *buffer);
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 470900f6f4..0a21a5e15d 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -366,13 +366,27 @@ test_expect_success 'pattern-checks: starting "*"' '
 	check_read_tree_errors repo "a deep" "disabling cone pattern matching"
 '
 
-test_expect_success 'pattern-checks: escaped "*"' '
-	cat >repo/.git/info/sparse-checkout <<-\EOF &&
+test_expect_success BSLASHPSPEC 'pattern-checks: escaped "*"' '
+	git clone repo escaped &&
+	TREEOID=$(git -C escaped rev-parse HEAD:folder1) &&
+	NEWTREE=$(git -C escaped mktree <<-EOF
+	$(git -C escaped ls-tree HEAD)
+	040000 tree $TREEOID	zbad\\dir
+	040000 tree $TREEOID	zdoes*exist
+	EOF
+	) &&
+	COMMIT=$(git -C escaped commit-tree $NEWTREE -p HEAD) &&
+	git -C escaped reset --hard $COMMIT &&
+	check_files escaped "a deep folder1 folder2 zbad\\dir zdoes*exist" &&
+	git -C escaped sparse-checkout init --cone &&
+	cat >escaped/.git/info/sparse-checkout <<-\EOF &&
 	/*
 	!/*/
-	/does\*not\*exist/
+	/zbad\\dir/
+	/zdoes\*not\*exist/
+	/zdoes\*exist/
 	EOF
-	check_read_tree_errors repo "a" ""
+	check_read_tree_errors escaped "a zbad\\dir zdoes*exist"
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 10/12] sparse-checkout: write escaped patterns in cone mode
  2020-01-24 21:19 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
                     ` (8 preceding siblings ...)
  2020-01-24 21:19   ` [PATCH v2 09/12] sparse-checkout: properly match escaped characters Derrick Stolee via GitGitGadget
@ 2020-01-24 21:19   ` Derrick Stolee via GitGitGadget
  2020-01-24 21:19   ` [PATCH v2 11/12] sparse-checkout: use C-style quotes in 'list' subcommand Derrick Stolee via GitGitGadget
  2020-01-24 21:19   ` [PATCH v2 12/12] sparse-checkout: improve docs around 'set' in cone mode Derrick Stolee via GitGitGadget
  11 siblings, 0 replies; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-24 21:19 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

If a user somehow creates a directory with an asterisk (*) or backslash
(\), then the "git sparse-checkout set" command will struggle to provide
the correct pattern in the sparse-checkout file. When not in cone mode,
the provided pattern is written directly into the sparse-checkout file.
However, in cone mode we expect a list of paths to directories and then
we convert those into patterns.

Even more specifically, the goal is to always allow the following from
the root of a repo:

  git ls-tree --name-only -d HEAD | git sparse-checkout set --stdin

The ls-tree command provides directory names with an unescaped asterisk.
It also quotes the directories that contain an escaped backslash. We
must remove these quotes, then keep the escaped backslashes.

However, there is some care needed for the timing of these escapes. The
in-memory pattern list is used to update the working directory before
writing the patterns to disk. Thus, we need the command to have the
unescaped names in the hashsets for the cone comparisons, then escape
the patterns later.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/sparse-checkout.c          | 48 ++++++++++++++++++++++++++++--
 t/t1091-sparse-checkout-builtin.sh | 21 +++++++++++--
 2 files changed, 64 insertions(+), 5 deletions(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 3cee8ab46e..61d2c30036 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -140,6 +140,22 @@ static int update_working_directory(struct pattern_list *pl)
 	return result;
 }
 
+static char *escaped_pattern(char *pattern)
+{
+	char *p = pattern;
+	struct strbuf final = STRBUF_INIT;
+
+	while (*p) {
+		if (*p == '*' || *p == '\\')
+			strbuf_addch(&final, '\\');
+
+		strbuf_addch(&final, *p);
+		p++;
+	}
+
+	return strbuf_detach(&final, NULL);
+}
+
 static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
 {
 	int i;
@@ -164,10 +180,11 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
 	fprintf(fp, "/*\n!/*/\n");
 
 	for (i = 0; i < sl.nr; i++) {
-		char *pattern = sl.items[i].string;
+		char *pattern = escaped_pattern(sl.items[i].string);
 
 		if (strlen(pattern))
 			fprintf(fp, "%s/\n!%s/*/\n", pattern, pattern);
+		free(pattern);
 	}
 
 	string_list_clear(&sl, 0);
@@ -185,8 +202,9 @@ static void write_cone_to_file(FILE *fp, struct pattern_list *pl)
 	string_list_remove_duplicates(&sl, 0);
 
 	for (i = 0; i < sl.nr; i++) {
-		char *pattern = sl.items[i].string;
+		char *pattern = escaped_pattern(sl.items[i].string);
 		fprintf(fp, "%s/\n", pattern);
+		free(pattern);
 	}
 }
 
@@ -337,7 +355,9 @@ static void insert_recursive_pattern(struct pattern_list *pl, struct strbuf *pat
 {
 	struct pattern_entry *e = xmalloc(sizeof(*e));
 	e->patternlen = path->len;
-	e->pattern = strbuf_detach(path, NULL);
+	e->pattern = dup_and_filter_pattern(path->buf);
+	strbuf_release(path);
+
 	hashmap_entry_init(&e->ent,
 			   ignore_case ?
 			   strihash(e->pattern) :
@@ -369,6 +389,7 @@ static void insert_recursive_pattern(struct pattern_list *pl, struct strbuf *pat
 
 static void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
 {
+	int i;
 	strbuf_trim(line);
 
 	strbuf_trim_trailing_dir_sep(line);
@@ -376,6 +397,27 @@ static void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
 	if (!line->len)
 		return;
 
+	for (i = 0; i < line->len; i++) {
+		if (line->buf[i] == '*') {
+			strbuf_insert(line, i, "\\", 1);
+			i++;
+		}
+
+		if (line->buf[i] == '\\') {
+			if (i < line->len - 1 && line->buf[i + 1] == '\\')
+				i++;
+			else
+				strbuf_insert(line, i, "\\", 1);
+
+			i++;
+		}
+	}
+
+	if (line->buf[0] == '"' && line->buf[line->len - 1] == '"') {
+		strbuf_remove(line, 0, 1);
+		strbuf_remove(line, line->len - 1, 1);
+	}
+
 	if (line->buf[0] != '/')
 		strbuf_insert(line, 0, "/", 1);
 
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 0a21a5e15d..2bb30cbe29 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -309,6 +309,9 @@ check_read_tree_errors () {
 	REPO=$1
 	FILES=$2
 	ERRORS=$3
+	git -C $REPO -c core.sparseCheckoutCone=false read-tree -mu HEAD 2>err &&
+	test_must_be_empty err &&
+	check_files $REPO "$FILES" &&
 	git -C $REPO read-tree -mu HEAD 2>err &&
 	if test -z "$ERRORS"
 	then
@@ -379,14 +382,28 @@ test_expect_success BSLASHPSPEC 'pattern-checks: escaped "*"' '
 	git -C escaped reset --hard $COMMIT &&
 	check_files escaped "a deep folder1 folder2 zbad\\dir zdoes*exist" &&
 	git -C escaped sparse-checkout init --cone &&
-	cat >escaped/.git/info/sparse-checkout <<-\EOF &&
+	git -C escaped sparse-checkout set zbad\\dir zdoes\*not\*exist zdoes\*exist &&
+	cat >expect <<-\EOF &&
 	/*
 	!/*/
 	/zbad\\dir/
+	/zdoes\*exist/
 	/zdoes\*not\*exist/
+	EOF
+	test_cmp expect escaped/.git/info/sparse-checkout &&
+	check_read_tree_errors escaped "a zbad\\dir zdoes*exist" &&
+	git -C escaped ls-tree -d --name-only HEAD | git -C escaped sparse-checkout set --stdin &&
+	cat >expect <<-\EOF &&
+	/*
+	!/*/
+	/deep/
+	/folder1/
+	/folder2/
+	/zbad\\dir/
 	/zdoes\*exist/
 	EOF
-	check_read_tree_errors escaped "a zbad\\dir zdoes*exist"
+	test_cmp expect escaped/.git/info/sparse-checkout &&
+	check_files escaped "a deep folder1 folder2 zbad\\dir zdoes*exist"
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 11/12] sparse-checkout: use C-style quotes in 'list' subcommand
  2020-01-24 21:19 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
                     ` (9 preceding siblings ...)
  2020-01-24 21:19   ` [PATCH v2 10/12] sparse-checkout: write escaped patterns in cone mode Derrick Stolee via GitGitGadget
@ 2020-01-24 21:19   ` Derrick Stolee via GitGitGadget
  2020-01-24 21:19   ` [PATCH v2 12/12] sparse-checkout: improve docs around 'set' in cone mode Derrick Stolee via GitGitGadget
  11 siblings, 0 replies; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-24 21:19 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When in cone mode, the 'git sparse-checkout list' subcommand lists
the directories included in the sparse cone. When these directories
contain odd characters, such as a backslash, then we need to use
C-style quotes similar to 'git ls-tree'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/sparse-checkout.c          | 7 +++++--
 t/t1091-sparse-checkout-builtin.sh | 7 +++++--
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 61d2c30036..83c8e9bb0c 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -13,6 +13,7 @@
 #include "resolve-undo.h"
 #include "unpack-trees.h"
 #include "wt-status.h"
+#include "quote.h"
 
 static const char *empty_base = "";
 
@@ -77,8 +78,10 @@ static int sparse_checkout_list(int argc, const char **argv)
 
 		string_list_sort(&sl);
 
-		for (i = 0; i < sl.nr; i++)
-			printf("%s\n", sl.items[i].string);
+		for (i = 0; i < sl.nr; i++) {
+			quote_c_style(sl.items[i].string, NULL, stdout, 0);
+			printf("\n");
+		}
 
 		return 0;
 	}
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 2bb30cbe29..16dd924291 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -392,7 +392,8 @@ test_expect_success BSLASHPSPEC 'pattern-checks: escaped "*"' '
 	EOF
 	test_cmp expect escaped/.git/info/sparse-checkout &&
 	check_read_tree_errors escaped "a zbad\\dir zdoes*exist" &&
-	git -C escaped ls-tree -d --name-only HEAD | git -C escaped sparse-checkout set --stdin &&
+	git -C escaped ls-tree -d --name-only HEAD >list-expect &&
+	git -C escaped sparse-checkout set --stdin <list-expect &&
 	cat >expect <<-\EOF &&
 	/*
 	!/*/
@@ -403,7 +404,9 @@ test_expect_success BSLASHPSPEC 'pattern-checks: escaped "*"' '
 	/zdoes\*exist/
 	EOF
 	test_cmp expect escaped/.git/info/sparse-checkout &&
-	check_files escaped "a deep folder1 folder2 zbad\\dir zdoes*exist"
+	check_files escaped "a deep folder1 folder2 zbad\\dir zdoes*exist" &&
+	git -C escaped sparse-checkout list >list-actual &&
+	test_cmp list-expect list-actual
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 12/12] sparse-checkout: improve docs around 'set' in cone mode
  2020-01-24 21:19 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
                     ` (10 preceding siblings ...)
  2020-01-24 21:19   ` [PATCH v2 11/12] sparse-checkout: use C-style quotes in 'list' subcommand Derrick Stolee via GitGitGadget
@ 2020-01-24 21:19   ` Derrick Stolee via GitGitGadget
  11 siblings, 0 replies; 38+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-01-24 21:19 UTC (permalink / raw)
  To: git; +Cc: me, peff, newren, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The existing documentation does not clarify how the 'set' subcommand
changes when core.sparseCheckoutCone is enabled. Correct this by
changing some language around the "A/B/C" example. Also include a
description of the input format matching the output of 'git ls-tree
--name-only'.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-sparse-checkout.txt | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index 4834fb434d..0914619881 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -50,6 +50,14 @@ To avoid interfering with other worktrees, it first enables the
 +
 When the `--stdin` option is provided, the patterns are read from
 standard in as a newline-delimited list instead of from the arguments.
++
+When `core.sparseCheckoutCone` is enabled, the input list is considered a
+list of directories instead of sparse-checkout patterns. The command writes
+patterns to the sparse-checkout file to include all files contained in those
+directories (recursively) as well as files that are siblings of ancestor
+directories. The input format matches the output of `git ls-tree --name-only`.
+This includes interpreting pathnames that begin with a double quote (") as
+C-style quoted strings.
 
 'disable'::
 	Disable the `core.sparseCheckout` config setting, and restore the
@@ -128,9 +136,12 @@ the following patterns:
 ----------------
 
 This says "include everything in root, but nothing two levels below root."
-If we then add the folder `A/B/C` as a recursive pattern, the folders `A` and
-`A/B` are added as parent patterns. The resulting sparse-checkout file is
-now
+
+When in cone mode, the `git sparse-checkout set` subcommand takes a list of
+directories instead of a list of sparse-checkout patterns. In this mode,
+the command `git sparse-checkout set A/B/C` sets the directory `A/B/C` as
+a recursive pattern, the directories `A` and `A/B` are added as parent
+patterns. The resulting sparse-checkout file is now
 
 ----------------
 /*
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 8/8] sparse-checkout: write escaped patterns in cone mode
  2020-01-24 21:10         ` Derrick Stolee
@ 2020-01-24 21:42           ` Jeff King
  0 siblings, 0 replies; 38+ messages in thread
From: Jeff King @ 2020-01-24 21:42 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, me, newren, Derrick Stolee

On Fri, Jan 24, 2020 at 04:10:21PM -0500, Derrick Stolee wrote:

> Hm. Perhaps you are right! The ls-tree output for the test example
> is:
> 
> 	deep
> 	folder1
> 	folder2
> 	"zbad\\dir"
> 	zdoes*exist
> 
> so the "zdoes*exist" value is not escaped. I believe the current
> logic does something extra: consider supplying this input to
> 'git sparse-checkout set --stdin':
> 
> 	deep
> 	folder1
> 	folder2
> 	"zbad\\dir"
> 	zdoes\*exist
> 
> then should we un-escape "\*" to "*"? Or is this not a valid input
> because a backslash should have been quoted into C-style quotes?

I'd think we should not un-escape anything, because we weren't told that
this was a C-style quoted string by the presence of an opening
double-quote. And that's how, say, update-index behaves:

  $ blob=$(echo foo | git hash-object -w --stdin)
  $ printf '100644 %s\t%s\n' \
      $blob 'just*asterisk' \
      $blob 'backslash\without\quotes' \
      $blob '"backslash\\with\\quotes"' |
    git update-index --index-info

which results in:

  $ git ls-files
  "backslash\\with\\quotes"
  "backslash\\without\\quotes"
  just*asterisk

  [same, but without quoting]
  $ git ls-files -z | tr '\0' '\n'
  backslash\with\quotes
  backslash\without\quotes
  just*asterisk

> The behavior in the current series allows this output that would
> never be written by "git ls-tree".

Yes, I think we'd never write that, because ls-tree would quote anything
with a backslash in it, even though it's not strictly necessary. But it
would be valid input to specify a file that has backslashes but not
double-quotes, and I think sparse-checkout should be changed to match
update-index here.

> I was playing around with this, and I think that quote_c_style() is
> necessary for the output, but we have a strange in-memory situation
> for the other escaping: we both fill the hashsets with the un-escaped
> data and fill the pattern list with the escaped patterns.

Yeah, but I think that the syntactic escaping on input might not have
identical rules to the escaping needed for the patterns.

So it makes sense to me to handle input as a separate mechanism, get a
pristine copy of what the user was trying to communicate to us, and then
re-escape whatever we need to put into the pattern list.

And ultimately the flow would be something like:

  - read input
    - if argument is from command-line, use it verbatim
    - else if reading stdin with "-z", use it verbatim
    - else if line starts with double-quote, unquote_c_style()
    - else use line verbatim
    - the result is a single pristine filename
  - fill hashset with pristine filenames
  - generate pattern list to write to sparse file, escaping filenames as
    necessary according to sparse-pattern rules

Obviously you don't have a "-z" yet, but I think it's something we'd
probably want in the long run. And anything coming from the command-line
shouldn't need quoting to get it to us either (and so we'd need to
escape before writing to the sparse file).

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, back to index

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-14 19:25 [PATCH 0/8] Harden the sparse-checkout builtin Derrick Stolee via GitGitGadget
2020-01-14 19:25 ` [PATCH 1/8] t1091: use check_files to reduce boilerplate Derrick Stolee via GitGitGadget
2020-01-16 21:40   ` Junio C Hamano
2020-01-14 19:25 ` [PATCH 2/8] sparse-checkout: create leading directories Derrick Stolee via GitGitGadget
2020-01-16 21:46   ` Junio C Hamano
2020-01-14 19:25 ` [PATCH 3/8] clone: fix --sparse option with URLs Derrick Stolee via GitGitGadget
2020-01-14 19:30   ` Taylor Blau
2020-01-14 19:25 ` [PATCH 4/8] sparse-checkout: cone mode does not recognize "**" Derrick Stolee via GitGitGadget
2020-01-14 21:16   ` Jeff King
2020-01-14 19:25 ` [PATCH 5/8] sparse-checkout: detect short patterns Derrick Stolee via GitGitGadget
2020-01-14 19:26 ` [PATCH 6/8] sparse-checkout: warn on incorrect '*' in patterns Derrick Stolee via GitGitGadget
2020-01-14 19:26 ` [PATCH 7/8] sparse-checkout: properly match escaped characters Derrick Stolee via GitGitGadget
2020-01-14 21:21   ` Jeff King
2020-01-14 22:08     ` Derrick Stolee
2020-01-14 19:26 ` [PATCH 8/8] sparse-checkout: write escaped patterns in cone mode Derrick Stolee via GitGitGadget
2020-01-14 21:25   ` Jeff King
2020-01-14 22:11     ` Derrick Stolee
2020-01-14 22:48       ` Jeff King
2020-01-24 21:10         ` Derrick Stolee
2020-01-24 21:42           ` Jeff King
2020-01-14 19:34 ` [PATCH 0/8] Harden the sparse-checkout builtin Taylor Blau
2020-01-14 19:44   ` Derrick Stolee
2020-01-14 21:31     ` Jeff King
2020-01-15 19:16 ` Junio C Hamano
2020-01-15 20:32   ` Derrick Stolee
2020-01-24 21:19 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 01/12] t1091: use check_files to reduce boilerplate Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 02/12] t1091: improve here-docs Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 03/12] sparse-checkout: create leading directories Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 04/12] clone: fix --sparse option with URLs Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 05/12] sparse-checkout: fix documentation typo for core.sparseCheckoutCone Jeff King via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 06/12] sparse-checkout: cone mode does not recognize "**" Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 07/12] sparse-checkout: detect short patterns Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 08/12] sparse-checkout: warn on incorrect '*' in patterns Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 09/12] sparse-checkout: properly match escaped characters Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 10/12] sparse-checkout: write escaped patterns in cone mode Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 11/12] sparse-checkout: use C-style quotes in 'list' subcommand Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 12/12] sparse-checkout: improve docs around 'set' in cone mode Derrick Stolee via GitGitGadget

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git