git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: me@ttaylorr.com, peff@peff.net, newren@gmail.com,
	Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH v4 00/15] Harden the sparse-checkout builtin
Date: Fri, 31 Jan 2020 20:16:00 +0000	[thread overview]
Message-ID: <pull.513.v4.git.1580501775.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.513.v3.git.1580236003.gitgitgadget@gmail.com>

This series is based on ds/sparse-list-in-cone-mode.

This series attempts to clean up some rough edges in the sparse-checkout
feature, especially around the cone mode.

Unfortunately, after the v2.25.0 release, we noticed an issue with the "git
clone --sparse" option when using a URL instead of a local path. This is
fixed and properly tested here.

Also, let's improve Git's response to these more complicated scenarios:

 1. Running "git sparse-checkout init" in a worktree would complain because
    the "info" dir doesn't exist.
 2. Tracked paths that include "*" and "\" in their filenames.
 3. If a user edits the sparse-checkout file to have non-cone pattern, such
    as "**" anywhere or "*" in the wrong place, then we should respond
    appropriately. That is: warn that the patterns are not cone-mode, then
    revert to the old logic.

Updates in V2:

 * Added C-style quoting to the output of "git sparse-checkout list" in cone
   mode.
 * Improved documentation.
 * Responded to most style feedback. Hopefully I didn't miss anything.
 * I was lingering on this a little to see if I could also fix the issue
   raised in [1], but I have not figured that one out, yet.

Update in V3:

 * Input now uses Peff's recommended pattern: unquote C-style strings over
   stdin and otherwise do not un-escape input.

[1] 
https://lore.kernel.org/git/062301d5d0bc$c3e17760$4ba46620$@Frontier.com/

Thanks, -Stolee

Derrick Stolee (14):
  t1091: use check_files to reduce boilerplate
  t1091: improve here-docs
  sparse-checkout: create leading directories
  clone: fix --sparse option with URLs
  sparse-checkout: cone mode does not recognize "**"
  sparse-checkout: detect short patterns
  sparse-checkout: warn on globs in cone patterns
  sparse-checkout: properly match escaped characters
  sparse-checkout: write escaped patterns in cone mode
  sparse-checkout: unquote C-style strings over --stdin
  sparse-checkout: use C-style quotes in 'list' subcommand
  sparse-checkout: escape all glob characters on write
  sparse-checkout: improve docs around 'set' in cone mode
  sparse-checkout: fix cone mode behavior mismatch

Jeff King (1):
  sparse-checkout: fix documentation typo for core.sparseCheckoutCone

 Documentation/git-sparse-checkout.txt |  19 +-
 builtin/clone.c                       |   2 +-
 builtin/sparse-checkout.c             |  48 +++-
 dir.c                                 |  79 +++++-
 t/t1091-sparse-checkout-builtin.sh    | 352 +++++++++++++++-----------
 unpack-trees.c                        |   2 +-
 6 files changed, 346 insertions(+), 156 deletions(-)


base-commit: 4fd683b6a35eabd23dd5183da7f654a1e1f00325
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-513%2Fderrickstolee%2Fsparse-harden-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-513/derrickstolee/sparse-harden-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/513

Range-diff vs v3:

  1:  1cc825412f =  1:  1cc825412f t1091: use check_files to reduce boilerplate
  2:  b7a6ad145a =  2:  b7a6ad145a t1091: improve here-docs
  3:  5497ad8778 =  3:  5497ad8778 sparse-checkout: create leading directories
  4:  4991a51f6d =  4:  4991a51f6d clone: fix --sparse option with URLs
  5:  ae78c3069b =  5:  ae78c3069b sparse-checkout: fix documentation typo for core.sparseCheckoutCone
  6:  2ad4d3e467 =  6:  2ad4d3e467 sparse-checkout: cone mode does not recognize "**"
  7:  aace064510 =  7:  aace064510 sparse-checkout: detect short patterns
  8:  d2a510a3bb !  8:  66caabef5f sparse-checkout: warn on incorrect '*' in patterns
     @@ -1,6 +1,6 @@
      Author: Derrick Stolee <dstolee@microsoft.com>
      
     -    sparse-checkout: warn on incorrect '*' in patterns
     +    sparse-checkout: warn on globs in cone patterns
      
          In cone mode, the sparse-checkout commmand will write patterns that
          allow faster pattern matching. This matching only works if the patterns
     @@ -9,10 +9,10 @@
          cone mode matching to fail.
      
          The cone mode patterns may end in "/*" but otherwise an un-escaped
     -    asterisk is invalid. Add checks to disable cone mode when seeing these
     -    values.
     +    asterisk or other glob character is invalid. Add checks to disable
     +    cone mode when seeing these values.
      
     -    A later change will properly handle escaped asterisks.
     +    A later change will properly handle escaped globs.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ -43,16 +43,23 @@
      +	next = given->pattern + 2;
      +
      +	while (*cur) {
     -+		/* We care about *cur == '*' */
     -+		if (*cur != '*')
     ++		/* Watch for glob characters '*', '\', '[', '?' */
     ++		if (!is_glob_special(*cur))
      +			goto increment;
      +
      +		/* But only if *prev != '\\' */
      +		if (*prev == '\\')
      +			goto increment;
      +
     ++		/* But allow the initial '\' */
     ++		if (*cur == '\\' &&
     ++		    is_glob_special(*next))
     ++			goto increment;
     ++
      +		/* But a trailing '/' then '*' is fine */
     -+		if (*prev == '/' && *next == 0)
     ++		if (*prev == '/' &&
     ++		    *cur == '*' &&
     ++		    *next == 0)
      +			goto increment;
      +
      +		/* Not a cone pattern. */
     @@ -94,6 +101,18 @@
      +	check_read_tree_errors repo "a deep" "disabling cone pattern matching"
      +'
      +
     ++test_expect_success 'pattern-checks: contained glob characters' '
     ++	for c in "[a]" "\\" "?" "*"
     ++	do
     ++		cat >repo/.git/info/sparse-checkout <<-EOF &&
     ++		/*
     ++		!/*/
     ++		something$c-else/
     ++		EOF
     ++		check_read_tree_errors repo "a" "disabling cone pattern matching"
     ++	done
     ++'
     ++
      +test_expect_success 'pattern-checks: escaped "*"' '
      +	cat >repo/.git/info/sparse-checkout <<-\EOF &&
      +	/*
  9:  9ea69e9069 !  9:  4c86d01f0e sparse-checkout: properly match escaped characters
     @@ -23,6 +23,7 @@
      +static char *dup_and_filter_pattern(const char *pattern)
      +{
      +	char *set, *read;
     ++	size_t count  = 0;
      +	char *result = xstrdup(pattern);
      +
      +	set = result;
     @@ -37,11 +38,14 @@
      +
      +		set++;
      +		read++;
     ++		count++;
      +	}
      +	*set = 0;
      +
     -+	if (*(read - 2) == '/' && *(read - 1) == '*')
     -+		*(read - 2) = 0;
     ++	if (count > 2 &&
     ++	    *(set - 1) == '*' &&
     ++	    *(set - 2) == '/')
     ++		*(set - 2) = 0;
      +
      +	return result;
      +}
     @@ -73,7 +77,7 @@
       --- a/t/t1091-sparse-checkout-builtin.sh
       +++ b/t/t1091-sparse-checkout-builtin.sh
      @@
     - 	check_read_tree_errors repo "a deep" "disabling cone pattern matching"
     + 	done
       '
       
      -test_expect_success 'pattern-checks: escaped "*"' '
     @@ -96,6 +100,7 @@
       	!/*/
      -	/does\*not\*exist/
      +	/zbad\\dir/
     ++	!/zbad\\dir/*/
      +	/zdoes\*not\*exist/
      +	/zdoes\*exist/
       	EOF
 10:  e2f9afc70c ! 10:  0b9346f67b sparse-checkout: write escaped patterns in cone mode
     @@ -9,26 +9,12 @@
          However, in cone mode we expect a list of paths to directories and then
          we convert those into patterns.
      
     -    Even more specifically, the goal is to always allow the following from
     -    the root of a repo:
     -
     -      git ls-tree --name-only -d HEAD | git sparse-checkout set --stdin
     -
     -    The ls-tree command provides directory names with an unescaped asterisk.
     -    It also quotes the directories that contain an escaped backslash. We
     -    must remove these quotes, then keep the escaped backslashes.
     -
          However, there is some care needed for the timing of these escapes. The
          in-memory pattern list is used to update the working directory before
          writing the patterns to disk. Thus, we need the command to have the
          unescaped names in the hashsets for the cone comparisons, then escape
          the patterns later.
      
     -    Use unquote_c_style() when parsing lines from stdin. Command-line
     -    arguments will be parsed as-is, assuming the user can do the correct
     -    level of escaping from their environment to match the exact directory
     -    names.
     -
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
     @@ -89,29 +75,6 @@
       	}
       }
       
     -@@
     - 		pl.use_cone_patterns = 1;
     - 
     - 		if (set_opts.use_stdin) {
     --			while (!strbuf_getline(&line, stdin))
     -+			struct strbuf unquoted = STRBUF_INIT;
     -+			while (!strbuf_getline(&line, stdin)) {
     -+				if (line.buf[0] == '"') {
     -+					strbuf_setlen(&unquoted, 0);
     -+					if (unquote_c_style(&unquoted, line.buf, NULL))
     -+						die(_("unable to unquote C-style string '%s'"),
     -+						line.buf);
     -+
     -+					strbuf_swap(&unquoted, &line);
     -+				}
     -+
     - 				strbuf_to_cone_pattern(&line, &pl);
     -+			}
     -+
     -+			strbuf_release(&unquoted);
     - 		} else {
     - 			for (i = 0; i < argc; i++) {
     - 				strbuf_setlen(&line, 0);
      
       diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
       --- a/t/t1091-sparse-checkout-builtin.sh
     @@ -131,29 +94,18 @@
       	check_files escaped "a deep folder1 folder2 zbad\\dir zdoes*exist" &&
       	git -C escaped sparse-checkout init --cone &&
      -	cat >escaped/.git/info/sparse-checkout <<-\EOF &&
     -+	git -C escaped sparse-checkout set zbad\\dir "zdoes*not*exist" "zdoes*exist" &&
     ++	git -C escaped sparse-checkout set zbad\\dir/bogus "zdoes*not*exist" "zdoes*exist" &&
      +	cat >expect <<-\EOF &&
       	/*
       	!/*/
       	/zbad\\dir/
     -+	/zdoes\*exist/
     - 	/zdoes\*not\*exist/
     -+	EOF
     -+	test_cmp expect escaped/.git/info/sparse-checkout &&
     -+	check_read_tree_errors escaped "a zbad\\dir zdoes*exist" &&
     -+	git -C escaped ls-tree -d --name-only HEAD | git -C escaped sparse-checkout set --stdin &&
     -+	cat >expect <<-\EOF &&
     -+	/*
     -+	!/*/
     -+	/deep/
     -+	/folder1/
     -+	/folder2/
     -+	/zbad\\dir/
     + 	!/zbad\\dir/*/
     +-	/zdoes\*not\*exist/
     ++	/zbad\\dir/bogus/
       	/zdoes\*exist/
     ++	/zdoes\*not\*exist/
       	EOF
     --	check_read_tree_errors escaped "a zbad\\dir zdoes*exist"
      +	test_cmp expect escaped/.git/info/sparse-checkout &&
     -+	check_files escaped "a deep folder1 folder2 zbad\\dir zdoes*exist"
     + 	check_read_tree_errors escaped "a zbad\\dir zdoes*exist"
       '
       
     - test_done
  -:  ---------- > 11:  9f682e6076 sparse-checkout: unquote C-style strings over --stdin
 11:  ec714a4cf0 = 12:  e2c6f85617 sparse-checkout: use C-style quotes in 'list' subcommand
  -:  ---------- > 13:  54be8e89eb sparse-checkout: escape all glob characters on write
 12:  1867746d97 = 14:  3dd8f97b3a sparse-checkout: improve docs around 'set' in cone mode
  -:  ---------- > 15:  5e9fcce75f sparse-checkout: fix cone mode behavior mismatch

-- 
gitgitgadget

  parent reply	other threads:[~2020-01-31 20:16 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-14 19:25 [PATCH 0/8] Harden the sparse-checkout builtin Derrick Stolee via GitGitGadget
2020-01-14 19:25 ` [PATCH 1/8] t1091: use check_files to reduce boilerplate Derrick Stolee via GitGitGadget
2020-01-16 21:40   ` Junio C Hamano
2020-01-14 19:25 ` [PATCH 2/8] sparse-checkout: create leading directories Derrick Stolee via GitGitGadget
2020-01-16 21:46   ` Junio C Hamano
2020-01-14 19:25 ` [PATCH 3/8] clone: fix --sparse option with URLs Derrick Stolee via GitGitGadget
2020-01-14 19:30   ` Taylor Blau
2020-01-14 19:25 ` [PATCH 4/8] sparse-checkout: cone mode does not recognize "**" Derrick Stolee via GitGitGadget
2020-01-14 21:16   ` Jeff King
2020-01-14 19:25 ` [PATCH 5/8] sparse-checkout: detect short patterns Derrick Stolee via GitGitGadget
2020-01-14 19:26 ` [PATCH 6/8] sparse-checkout: warn on incorrect '*' in patterns Derrick Stolee via GitGitGadget
2020-01-14 19:26 ` [PATCH 7/8] sparse-checkout: properly match escaped characters Derrick Stolee via GitGitGadget
2020-01-14 21:21   ` Jeff King
2020-01-14 22:08     ` Derrick Stolee
2020-01-14 19:26 ` [PATCH 8/8] sparse-checkout: write escaped patterns in cone mode Derrick Stolee via GitGitGadget
2020-01-14 21:25   ` Jeff King
2020-01-14 22:11     ` Derrick Stolee
2020-01-14 22:48       ` Jeff King
2020-01-24 21:10         ` Derrick Stolee
2020-01-24 21:42           ` Jeff King
2020-01-28 15:03             ` Derrick Stolee
2020-01-14 19:34 ` [PATCH 0/8] Harden the sparse-checkout builtin Taylor Blau
2020-01-14 19:44   ` Derrick Stolee
2020-01-14 21:31     ` Jeff King
2020-01-15 19:16 ` Junio C Hamano
2020-01-15 20:32   ` Derrick Stolee
2020-01-24 21:19 ` [PATCH v2 00/12] " Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 01/12] t1091: use check_files to reduce boilerplate Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 02/12] t1091: improve here-docs Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 03/12] sparse-checkout: create leading directories Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 04/12] clone: fix --sparse option with URLs Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 05/12] sparse-checkout: fix documentation typo for core.sparseCheckoutCone Jeff King via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 06/12] sparse-checkout: cone mode does not recognize "**" Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 07/12] sparse-checkout: detect short patterns Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 08/12] sparse-checkout: warn on incorrect '*' in patterns Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 09/12] sparse-checkout: properly match escaped characters Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 10/12] sparse-checkout: write escaped patterns in cone mode Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 11/12] sparse-checkout: use C-style quotes in 'list' subcommand Derrick Stolee via GitGitGadget
2020-01-24 21:19   ` [PATCH v2 12/12] sparse-checkout: improve docs around 'set' in cone mode Derrick Stolee via GitGitGadget
2020-01-28 18:26   ` [PATCH v3 00/12] Harden the sparse-checkout builtin Derrick Stolee via GitGitGadget
2020-01-28 18:26     ` [PATCH v3 01/12] t1091: use check_files to reduce boilerplate Derrick Stolee via GitGitGadget
2020-01-28 18:26     ` [PATCH v3 02/12] t1091: improve here-docs Derrick Stolee via GitGitGadget
2020-01-28 18:26     ` [PATCH v3 03/12] sparse-checkout: create leading directories Derrick Stolee via GitGitGadget
2020-01-28 18:26     ` [PATCH v3 04/12] clone: fix --sparse option with URLs Derrick Stolee via GitGitGadget
2020-01-28 18:26     ` [PATCH v3 05/12] sparse-checkout: fix documentation typo for core.sparseCheckoutCone Jeff King via GitGitGadget
2020-01-28 18:26     ` [PATCH v3 06/12] sparse-checkout: cone mode does not recognize "**" Derrick Stolee via GitGitGadget
2020-01-28 18:26     ` [PATCH v3 07/12] sparse-checkout: detect short patterns Derrick Stolee via GitGitGadget
2020-01-28 18:26     ` [PATCH v3 08/12] sparse-checkout: warn on incorrect '*' in patterns Derrick Stolee via GitGitGadget
2020-01-28 18:26     ` [PATCH v3 09/12] sparse-checkout: properly match escaped characters Derrick Stolee via GitGitGadget
2020-01-29 10:03       ` Jeff King
2020-01-29 13:58         ` Derrick Stolee
2020-01-29 14:04           ` Derrick Stolee
2020-01-28 18:26     ` [PATCH v3 10/12] sparse-checkout: write escaped patterns in cone mode Derrick Stolee via GitGitGadget
2020-01-29 10:17       ` Jeff King
2020-01-29 10:33         ` Jeff King
2020-01-29 14:16           ` Derrick Stolee
2020-01-29 14:39             ` Derrick Stolee
2020-01-30  7:29             ` Jeff King
2020-01-30 15:01               ` Derrick Stolee
2020-01-28 18:26     ` [PATCH v3 11/12] sparse-checkout: use C-style quotes in 'list' subcommand Derrick Stolee via GitGitGadget
2020-01-29 10:23       ` Jeff King
2020-01-28 18:26     ` [PATCH v3 12/12] sparse-checkout: improve docs around 'set' in cone mode Derrick Stolee via GitGitGadget
2020-01-31 20:16     ` Derrick Stolee via GitGitGadget [this message]
2020-01-31 20:16       ` [PATCH v4 01/15] t1091: use check_files to reduce boilerplate Derrick Stolee via GitGitGadget
2020-01-31 20:16       ` [PATCH v4 02/15] t1091: improve here-docs Derrick Stolee via GitGitGadget
2020-01-31 20:16       ` [PATCH v4 03/15] sparse-checkout: create leading directories Derrick Stolee via GitGitGadget
2020-01-31 20:16       ` [PATCH v4 04/15] clone: fix --sparse option with URLs Derrick Stolee via GitGitGadget
2020-01-31 20:16       ` [PATCH v4 05/15] sparse-checkout: fix documentation typo for core.sparseCheckoutCone Jeff King via GitGitGadget
2020-01-31 20:16       ` [PATCH v4 06/15] sparse-checkout: cone mode does not recognize "**" Derrick Stolee via GitGitGadget
2020-01-31 20:16       ` [PATCH v4 07/15] sparse-checkout: detect short patterns Derrick Stolee via GitGitGadget
2020-01-31 20:16       ` [PATCH v4 08/15] sparse-checkout: warn on globs in cone patterns Derrick Stolee via GitGitGadget
2020-01-31 20:16       ` [PATCH v4 09/15] sparse-checkout: properly match escaped characters Derrick Stolee via GitGitGadget
2020-01-31 20:16       ` [PATCH v4 10/15] sparse-checkout: write escaped patterns in cone mode Derrick Stolee via GitGitGadget
2020-01-31 20:16       ` [PATCH v4 11/15] sparse-checkout: unquote C-style strings over --stdin Derrick Stolee via GitGitGadget
2020-01-31 20:16       ` [PATCH v4 12/15] sparse-checkout: use C-style quotes in 'list' subcommand Derrick Stolee via GitGitGadget
2020-01-31 20:16       ` [PATCH v4 13/15] sparse-checkout: escape all glob characters on write Derrick Stolee via GitGitGadget
2020-01-31 20:16       ` [PATCH v4 14/15] sparse-checkout: improve docs around 'set' in cone mode Derrick Stolee via GitGitGadget
2020-01-31 20:16       ` [PATCH v4 15/15] sparse-checkout: fix cone mode behavior mismatch Derrick Stolee via GitGitGadget
2020-01-31 20:36       ` [PATCH v4 00/15] Harden the sparse-checkout builtin Elijah Newren
2020-02-03 14:09         ` Derrick Stolee
2020-02-08 23:32           ` Taylor Blau
2020-02-09 17:27             ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.513.v4.git.1580501775.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).