git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: newren@gmail.com, gitster@pobox.com, matheus.bernardino@usp.br,
	stolee@gmail.com, vdye@github.com,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Derrick Stolee" <derrickstolee@github.com>
Subject: [PATCH v4 00/13] Sparse-checkout: modify 'git add', 'git rm', and 'git mv' behavior
Date: Fri, 24 Sep 2021 15:39:01 +0000	[thread overview]
Message-ID: <pull.1018.v4.git.1632497954.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1018.v3.git.1632159937.gitgitgadget@gmail.com>

This series is based on ds/mergies-with-sparse-index.

As requested, this series looks to update the behavior of git add, git rm,
and git mv when they attempt to modify paths outside of the sparse-checkout
cone. In particular, this care is expanded to not just cache entries with
the SKIP_WORKTREE bit, but also paths that do not match the sparse-checkout
definition.

This means that commands that worked before this series can now fail. In
particular, if 'git merge' results in a conflict outside of the
sparse-checkout cone, then 'git add ' will now fail.

In order to allow users to circumvent these protections, a new '--sparse'
option is added that ignores the sparse-checkout patterns and the
SKIP_WORKTREE bit. The message for advice.updateSparsePath is adjusted to
assist with discovery of this option.

There is a subtle issue with git mv in that it does not check the index
until it discovers a directory and then uses the index to find the contained
entries. This means that in non-cone-mode patterns, a pattern such as
"sub/dir" will not match the path "sub" and this can cause an issue.

In order to allow for checking arbitrary paths against the sparse-checkout
patterns, some changes to the underlying pattern matching code is required.
It turns out that there are some bugs in the methods as advertised, but
these bugs were never discovered because of the way methods like
unpack_trees() will check a directory for a pattern match before checking
its contained paths. Our new "check patterns on-demand" approach pokes holes
in that approach, specifically with patterns that match entire directories.


Updates in v4
=============

 * Instead of using 'git status' and 'grep' to detect staged changes, we use
   'git diff --staged'. t1092 uses an additional --diff-filter because it
   tests with merge conflicts, so it needs this extra flag.

 * Patches 3 and 4 are merged into the new patch 3 to avoid temporarily
   having a poorly named method.


Updates in v3
=============

 * Fixed an incorrectly-squashed commit. Spread out some changes in a better
   way. For example, I don't add --sparse to tests before introducing the
   option.

 * Use a NULL struct strbuf pointer to indicate an uninitialized value
   instead of relying on an internal member.

 * Use grep over test_i18ngrep.

 * Fixed line wrapping for error messages.

 * Use strbuf_setlen() over modifying the len member manually.


Updates in v2
=============

 * I got no complaints about these restrictions, so this is now a full
   series, not RFC.

 * Thanks to Matheus, several holes are filled with extra testing and
   bugfixes.

 * New patches add --chmod and --renormalize improvements. These are added
   after the --sparse option to make them be one change each.

Thanks, -Stolee

Derrick Stolee (13):
  t3705: test that 'sparse_entry' is unstaged
  t1092: behavior for adding sparse files
  dir: select directories correctly
  dir: fix pattern matching on dirs
  add: fail when adding an untracked sparse file
  add: skip tracked paths outside sparse-checkout cone
  add: implement the --sparse option
  add: update --chmod to skip sparse paths
  add: update --renormalize to skip sparse paths
  rm: add --sparse option
  rm: skip sparse paths with missing SKIP_WORKTREE
  mv: refuse to move sparse paths
  advice: update message to suggest '--sparse'

 Documentation/git-add.txt                |   9 +-
 Documentation/git-rm.txt                 |   6 +
 advice.c                                 |  11 +-
 builtin/add.c                            |  32 +++-
 builtin/mv.c                             |  52 +++++--
 builtin/rm.c                             |  10 +-
 dir.c                                    |  56 ++++++-
 pathspec.c                               |   5 +-
 t/t1091-sparse-checkout-builtin.sh       |   4 +-
 t/t1092-sparse-checkout-compatibility.sh |  75 +++++++--
 t/t3602-rm-sparse-checkout.sh            |  40 ++++-
 t/t3705-add-sparse-checkout.sh           |  68 +++++++-
 t/t7002-mv-sparse-checkout.sh            | 189 +++++++++++++++++++++++
 13 files changed, 505 insertions(+), 52 deletions(-)
 create mode 100755 t/t7002-mv-sparse-checkout.sh


base-commit: 516680ba7704c473bb21628aa19cabbd787df4db
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1018%2Fderrickstolee%2Fsparse-index%2Fadd-rm-mv-behavior-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1018/derrickstolee/sparse-index/add-rm-mv-behavior-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1018

Range-diff vs v3:

  1:  ea940f10a7c !  1:  642b05fc020 t3705: test that 'sparse_entry' is unstaged
     @@ t/t3705-add-sparse-checkout.sh: setup_gitignore () {
       }
       
      +test_sparse_entry_unstaged () {
     -+	git status --porcelain >actual &&
     -+	! grep "^[MDARCU][M ] sparse_entry\$" actual
     ++	git diff --staged -- sparse_entry >diff &&
     ++	test_must_be_empty diff
      +}
      +
       test_expect_success 'setup' "
  2:  c7dedb41291 !  2:  58389edc76c t1092: behavior for adding sparse files
     @@ t/t1092-sparse-checkout-compatibility.sh: test_sparse_match () {
      +	file=$1 &&
      +	for repo in sparse-checkout sparse-index
      +	do
     -+		git -C $repo status --porcelain >$repo-out &&
     -+		! grep "^A  $file\$" $repo-out &&
     -+		! grep "^M  $file\$" $repo-out || return 1
     ++		# Skip "unmerged" paths
     ++		git -C $repo diff --staged --diff-filter=ACDMRTXB -- "$file" >diff &&
     ++		test_must_be_empty diff || return 1
      +	done
      +}
      +
  3:  b1f6468f9cd <  -:  ----------- dir: extract directory-matching logic
  4:  0252c7ee15c !  3:  2ebaf8e68c2 dir: select directories correctly
     @@ Commit message
          contained files. However, other commands will start matching individual
          files against pattern lists without that recursive approach.
      
     -    We modify path_matches_dir_pattern() to take a strbuf pointer
     -    'path_parent' that is used to store the parent directory of 'pathname'
     -    between multiple pattern matching tests. This is loaded lazily, only on
     -    the first pattern it finds that has the PATTERN_FLAG_MUSTBEDIR flag.
     +    The last_matching_pattern_from_list() logic performs some checks on the
     +    filetype of a path within the index when the PATTERN_FLAG_MUSTBEDIR flag
     +    is set. This works great when setting SKIP_WORKTREE bits within
     +    unpack_trees(), but doesn't work well when passing an arbitrary path
     +    such as a file within a matching directory.
     +
     +    We extract the logic around determining the file type, but attempt to
     +    avoid checking the filesystem if the parent directory already matches
     +    the sparse-checkout patterns. The new path_matches_dir_pattern() method
     +    includes a 'path_parent' parameter that is used to store the parent
     +    directory of 'pathname' between multiple pattern matching tests. This is
     +    loaded lazily, only on the first pattern it finds that has the
     +    PATTERN_FLAG_MUSTBEDIR flag.
      
          If we find that a path has a parent directory, we start by checking to
          see if that parent directory matches the pattern. If so, then we do not
     @@ Commit message
      
       ## dir.c ##
      @@ dir.c: int match_pathname(const char *pathname, int pathlen,
     + 				 WM_PATHNAME) == 0;
     + }
       
     - static int path_matches_dir_pattern(const char *pathname,
     - 				    int pathlen,
     ++static int path_matches_dir_pattern(const char *pathname,
     ++				    int pathlen,
      +				    struct strbuf **path_parent,
     - 				    int *dtype,
     - 				    struct path_pattern *pattern,
     - 				    struct index_state *istate)
     - {
     ++				    int *dtype,
     ++				    struct path_pattern *pattern,
     ++				    struct index_state *istate)
     ++{
      +	if (!*path_parent) {
      +		char *slash;
      +		CALLOC_ARRAY(*path_parent, 1);
     @@ dir.c: int match_pathname(const char *pathname, int pathlen,
      +			   pattern->patternlen, pattern->flags))
      +		return 1;
      +
     - 	*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
     - 	if (*dtype != DT_DIR)
     - 		return 0;
     ++	*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
     ++	if (*dtype != DT_DIR)
     ++		return 0;
     ++
     ++	return 1;
     ++}
     ++
     + /*
     +  * Scan the given exclude list in reverse to see whether pathname
     +  * should be ignored.  The first match (i.e. the last on the list), if
      @@ dir.c: static struct path_pattern *last_matching_pattern_from_list(const char *pathname
       {
       	struct path_pattern *res = NULL; /* undecided */
     @@ dir.c: static struct path_pattern *last_matching_pattern_from_list(const char *p
       		const char *exclude = pattern->pattern;
       		int prefix = pattern->nowildcardlen;
       
     --		if ((pattern->flags & PATTERN_FLAG_MUSTBEDIR) &&
     --		    !path_matches_dir_pattern(pathname, pathlen,
     +-		if (pattern->flags & PATTERN_FLAG_MUSTBEDIR) {
     +-			*dtype = resolve_dtype(*dtype, istate, pathname, pathlen);
     +-			if (*dtype != DT_DIR)
     +-				continue;
     +-		}
      +		if (pattern->flags & PATTERN_FLAG_MUSTBEDIR &&
      +		    !path_matches_dir_pattern(pathname, pathlen, &path_parent,
     - 					      dtype, pattern, istate))
     - 			continue;
     ++					      dtype, pattern, istate))
     ++			continue;
       
     + 		if (pattern->flags & PATTERN_FLAG_NODIR) {
     + 			if (match_basename(basename,
      @@ dir.c: static struct path_pattern *last_matching_pattern_from_list(const char *pathname
       			break;
       		}
  5:  c6d17df5e5d =  4:  24bffdab139 dir: fix pattern matching on dirs
  6:  3dd1d6c228c =  5:  e3a749e3182 add: fail when adding an untracked sparse file
  7:  15039e031e5 =  6:  2c5c834bc9f add: skip tracked paths outside sparse-checkout cone
  8:  6014ac8ab9e =  7:  430ab44e4f1 add: implement the --sparse option
  9:  2bd3448be5f =  8:  4f7b5cdfa36 add: update --chmod to skip sparse paths
 10:  131beda1bc3 =  9:  30ec6096939 add: update --renormalize to skip sparse paths
 11:  837a9314893 = 10:  99d50921ef4 rm: add --sparse option
 12:  cc25ce17162 = 11:  47a1444115b rm: skip sparse paths with missing SKIP_WORKTREE
 13:  63a9cd80ade = 12:  28e703d80d3 mv: refuse to move sparse paths
 14:  79a3518dc15 = 13:  9fbc88ee0da advice: update message to suggest '--sparse'

-- 
gitgitgadget

  parent reply	other threads:[~2021-09-24 15:39 UTC|newest]

Thread overview: 116+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-24 21:54 [PATCH 00/13] [RFC] Sparse-checkout: modify 'git add', 'git rm', and 'git add' behavior Derrick Stolee via GitGitGadget
2021-08-24 21:54 ` [PATCH 01/13] t1092: behavior for adding sparse files Derrick Stolee via GitGitGadget
2021-08-24 21:54 ` [PATCH 02/13] dir: extract directory-matching logic Derrick Stolee via GitGitGadget
2021-08-24 21:54 ` [PATCH 03/13] dir: select directories correctly Derrick Stolee via GitGitGadget
2021-09-24  7:44   ` René Scharfe
2021-08-24 21:54 ` [PATCH 04/13] dir: fix pattern matching on dirs Derrick Stolee via GitGitGadget
2021-08-24 21:54 ` [PATCH 05/13] add: fail when adding an untracked sparse file Derrick Stolee via GitGitGadget
2021-08-27 21:06   ` Matheus Tavares Bernardino
2021-08-27 22:50     ` Matheus Tavares Bernardino
2021-09-08 17:54     ` Derrick Stolee
2021-08-24 21:54 ` [PATCH 06/13] add: skip paths that are outside sparse-checkout cone Derrick Stolee via GitGitGadget
2021-08-27 21:13   ` Matheus Tavares
2021-09-08 19:46     ` Derrick Stolee
2021-09-08 20:02       ` Derrick Stolee
2021-09-08 21:06     ` Derrick Stolee
2021-08-24 21:54 ` [PATCH 07/13] add: implement the --sparse option Derrick Stolee via GitGitGadget
2021-08-27 21:14   ` Matheus Tavares Bernardino
2021-08-24 21:54 ` [PATCH 08/13] add: prevent adding sparse conflict files Derrick Stolee via GitGitGadget
2021-08-27 21:16   ` Matheus Tavares Bernardino
2021-08-24 21:54 ` [PATCH 09/13] rm: add --sparse option Derrick Stolee via GitGitGadget
2021-08-27 21:17   ` Matheus Tavares Bernardino
2021-09-08 18:04     ` Derrick Stolee
2021-08-24 21:54 ` [PATCH 10/13] rm: skip sparse paths with missing SKIP_WORKTREE Derrick Stolee via GitGitGadget
2021-08-27 21:18   ` Matheus Tavares Bernardino
2021-08-24 21:54 ` [PATCH 11/13] mv: refuse to move sparse paths Derrick Stolee via GitGitGadget
2021-08-27 21:20   ` Matheus Tavares Bernardino
2021-08-27 23:44     ` Matheus Tavares Bernardino
2021-09-08 18:41     ` Derrick Stolee
2021-08-24 21:54 ` [PATCH 12/13] mv: add '--sparse' option to ignore sparse-checkout Derrick Stolee via GitGitGadget
2021-08-28 14:18   ` Matheus Tavares Bernardino
2021-08-24 21:54 ` [PATCH 13/13] advice: update message to suggest '--sparse' Derrick Stolee via GitGitGadget
2021-09-12 13:23 ` [PATCH v2 00/14] Sparse-checkout: modify 'git add', 'git rm', and 'git add' behavior Derrick Stolee via GitGitGadget
2021-09-12 13:23   ` [PATCH v2 01/14] t3705: test that 'sparse_entry' is unstaged Derrick Stolee via GitGitGadget
2021-09-15  5:22     ` Elijah Newren
2021-09-15 16:17       ` Derrick Stolee
2021-09-15 16:32     ` Matheus Tavares
2021-09-15 16:42       ` Derrick Stolee
2021-09-12 13:23   ` [PATCH v2 02/14] t1092: behavior for adding sparse files Derrick Stolee via GitGitGadget
2021-09-12 22:17     ` Ævar Arnfjörð Bjarmason
2021-09-13 15:02       ` Derrick Stolee
2021-09-12 13:23   ` [PATCH v2 03/14] dir: extract directory-matching logic Derrick Stolee via GitGitGadget
2021-09-12 13:23   ` [PATCH v2 04/14] dir: select directories correctly Derrick Stolee via GitGitGadget
2021-09-12 22:21     ` Ævar Arnfjörð Bjarmason
2021-09-15 14:41       ` Derrick Stolee
2021-09-15 14:54     ` Elijah Newren
2021-09-15 16:43       ` Derrick Stolee
2021-09-12 13:23   ` [PATCH v2 05/14] dir: fix pattern matching on dirs Derrick Stolee via GitGitGadget
2021-09-12 13:23   ` [PATCH v2 06/14] add: fail when adding an untracked sparse file Derrick Stolee via GitGitGadget
2021-09-12 13:23   ` [PATCH v2 07/14] add: skip tracked paths outside sparse-checkout cone Derrick Stolee via GitGitGadget
2021-09-12 13:23   ` [PATCH v2 08/14] add: implement the --sparse option Derrick Stolee via GitGitGadget
2021-09-15 16:59     ` Elijah Newren
2021-09-20 15:45       ` Derrick Stolee
2021-09-12 13:23   ` [PATCH v2 09/14] add: update --chmod to skip sparse paths Derrick Stolee via GitGitGadget
2021-09-12 13:23   ` [PATCH v2 10/14] add: update --renormalize " Derrick Stolee via GitGitGadget
2021-09-12 13:23   ` [PATCH v2 11/14] rm: add --sparse option Derrick Stolee via GitGitGadget
2021-09-12 13:23   ` [PATCH v2 12/14] rm: skip sparse paths with missing SKIP_WORKTREE Derrick Stolee via GitGitGadget
2021-09-12 13:23   ` [PATCH v2 13/14] mv: refuse to move sparse paths Derrick Stolee via GitGitGadget
2021-09-12 13:23   ` [PATCH v2 14/14] advice: update message to suggest '--sparse' Derrick Stolee via GitGitGadget
2021-09-12 21:58     ` Ævar Arnfjörð Bjarmason
2021-09-15 16:54       ` Derrick Stolee
2021-09-15 20:18   ` [PATCH v2 00/14] Sparse-checkout: modify 'git add', 'git rm', and 'git add' behavior Elijah Newren
2021-09-20 17:45   ` [PATCH v3 " Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 01/14] t3705: test that 'sparse_entry' is unstaged Derrick Stolee via GitGitGadget
2021-09-22 22:52       ` Junio C Hamano
2021-09-20 17:45     ` [PATCH v3 02/14] t1092: behavior for adding sparse files Derrick Stolee via GitGitGadget
2021-09-22 23:06       ` Junio C Hamano
2021-09-23 13:37         ` Derrick Stolee
2021-09-20 17:45     ` [PATCH v3 03/14] dir: extract directory-matching logic Derrick Stolee via GitGitGadget
2021-09-22 23:13       ` Junio C Hamano
2021-09-23 13:39         ` Derrick Stolee
2021-09-23 13:42           ` Derrick Stolee
2021-09-23 18:23             ` Junio C Hamano
2021-09-24 13:29               ` Derrick Stolee
2021-09-20 17:45     ` [PATCH v3 04/14] dir: select directories correctly Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 05/14] dir: fix pattern matching on dirs Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 06/14] add: fail when adding an untracked sparse file Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 07/14] add: skip tracked paths outside sparse-checkout cone Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 08/14] add: implement the --sparse option Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 09/14] add: update --chmod to skip sparse paths Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 10/14] add: update --renormalize " Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 11/14] rm: add --sparse option Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 12/14] rm: skip sparse paths with missing SKIP_WORKTREE Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 13/14] mv: refuse to move sparse paths Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 14/14] advice: update message to suggest '--sparse' Derrick Stolee via GitGitGadget
2021-09-24  6:08     ` [PATCH v3 00/14] Sparse-checkout: modify 'git add', 'git rm', and 'git add' behavior Elijah Newren
2021-09-24 15:39     ` Derrick Stolee via GitGitGadget [this message]
2021-09-24 15:39       ` [PATCH v4 01/13] t3705: test that 'sparse_entry' is unstaged Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 02/13] t1092: behavior for adding sparse files Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 03/13] dir: select directories correctly Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 04/13] dir: fix pattern matching on dirs Derrick Stolee via GitGitGadget
2021-11-02  0:15         ` Glen Choo
2021-11-02  0:34           ` Junio C Hamano
2021-11-02 13:42             ` Derrick Stolee
2021-11-02 14:50               ` Derrick Stolee
2021-11-02 15:33                 ` Ævar Arnfjörð Bjarmason
2021-11-03 14:40                   ` Derrick Stolee
2021-11-03 17:14                     ` Junio C Hamano
2021-09-24 15:39       ` [PATCH v4 05/13] add: fail when adding an untracked sparse file Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 06/13] add: skip tracked paths outside sparse-checkout cone Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 07/13] add: implement the --sparse option Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 08/13] add: update --chmod to skip sparse paths Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 09/13] add: update --renormalize " Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 10/13] rm: add --sparse option Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 11/13] rm: skip sparse paths with missing SKIP_WORKTREE Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 12/13] mv: refuse to move sparse paths Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 13/13] advice: update message to suggest '--sparse' Derrick Stolee via GitGitGadget
2021-09-27 15:51       ` [PATCH v4 00/13] Sparse-checkout: modify 'git add', 'git rm', and 'git mv' behavior Elijah Newren
2021-09-27 20:51         ` Junio C Hamano
2021-10-18 21:28   ` [PATCH v2 00/14] Sparse-checkout: modify 'git add', 'git rm', and 'git add' behavior Sean Christopherson
2021-10-19 12:29     ` Derrick Stolee
2021-10-19 16:50       ` Sean Christopherson
2021-10-20 13:28         ` Junio C Hamano
2021-10-20 14:28           ` Sean Christopherson
2021-10-22  2:28     ` [RFC PATCH] add|rm|mv: fix bug that prevent the update of non-sparse Matheus Tavares
2021-10-22  4:03       ` Matheus Tavares
2021-10-25 16:40       ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.1018.v4.git.1632497954.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=avarab@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=matheus.bernardino@usp.br \
    --cc=newren@gmail.com \
    --cc=stolee@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).