git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Matheus Tavares <matheus.bernardino@usp.br>,
	Sean Christopherson <seanjc@google.com>
Cc: git@vger.kernel.org, newren@gmail.com, gitster@pobox.com,
	vdye@github.com, derrickstolee@github.com,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Subject: Re: [RFC PATCH] add|rm|mv: fix bug that prevent the update of non-sparse
Date: Mon, 25 Oct 2021 12:40:17 -0400	[thread overview]
Message-ID: <a11d4fb0-184a-b811-64a0-e29e9cfd91a3@gmail.com> (raw)
In-Reply-To: <80b5ba61861193daf7132aa64b65fc7dde90dacb.1634866698.git.matheus.bernardino@usp.br>

On 10/21/2021 10:28 PM, Matheus Tavares wrote:
> On Mon, Oct 18, 2021 at 6:28 PM Sean Christopherson <seanjc@google.com> wrote:
>>
>> $ cat .git/info/sparse-checkout
>> !arch/*
>> !tools/arch/*
>> !virt/kvm/arm/*
>> /*
>> arch/.gitignore
>> arch/Kconfig
>> arch/x86
>> tools/arch/x86
>> tools/include/uapi/linux/kvm.h
>> !Documentation
>> !drivers
>>
>> $ git read-tree -mu HEAD
>>
>> $ rm arch/x86/kvm/x86.c
> [...]
>> $ git add arch/x86
>> The following paths and/or pathspecs matched paths that exist
>> outside of your sparse-checkout definition, so will not be
>> updated in the index:
>> arch/x86
> 
> I think the problem may be that we are performing pattern matching
> slightly different in add, mv, and rm, in comparison to "git
> sparse-checkout". On "git sparse-checkout init" (or reapply), we call
> clear_ce_flags() which calls path_matches_pattern_list() for each
> component of the working tree paths. If the full path gives a match
> result of UNDECIDED, we recursively try to use the match result from
> the parent dir (or NOT_MATCHED if we reach the top with UNDECIDED).

Yes! I think this is absolutely the problem. Thanks for pointing
this out!
 
> In Sean's example, we get UNDECIDED for "arch/x86/kvm/x86.c", but
> "arch/x86" gives MATCHED, so we end up using that for the full path.
> 
> However, in add|mv|rm we only call path_matches_pattern_list() for the
> full path and get UNDECIDED, which we consider the same as NOT_MATCHED,
> and end up disallowing the path update operation with a warning message.
> 
> The commands do work if we replace the sparsity pattern "arch/x86" with
> "arch/x86/" (with a trailing slash), but note that it only works
> because the pattern is relative to the root (see dir.c:1297). If we
> change it to "x86/", it would no longer work.
> 
> So far, the only way I could think of to fix this would be to perform
> pattern matching for the leading components of the paths too. That
> doesn't seem very nice, though, as it can probably be quite expensive...
> But here is a patch for discussion:

I agree that it is expensive, but that's already the case for the
non-cone sparse-checkout patterns. Hopefully it is sufficient that
these cases are restricted to modified files (in the case of `git add .`)
or specific pathspecs (in the case of `git mv` and `git rm`).

> -- >8 --
> Subject: [RFC PATCH] add|rm|mv: fix bug that prevent the update of non-sparse dirs
> 
> These three commands recently learned to avoid updating paths that do
> not match the sparse-checkout patterns even if they are missing the
> SKIP_WORKTREE bit. This is done using path_in_sparse_checkout(), which
> tries to match the path with the current set of sparsity rules using
> path_matches_pattern_list(). This is similar to what clear_ce_flags()
> does when we run "git sparse-checkout init" or "git sparse-checkout
> reapply". But note that clear_ce_flags() has a recursive behavior,
> calling path_matches_pattern_list() for each component in a path,
> whereas path_in_sparse_checkout() only calls it for the full path. This
> makes the function miss matches such as the one between path "a/b/c" and
> the pattern "b/". So if the user has the sparsity rules "!/a" and "b/",
> for example, add, rm, and mv will fail to update the path "a/b/c" and
> end up displaying a warning about "a/b/c" being outside the sparse
> checkout even though it isn't. Note that this problem only occurs with
> non-cone mode.
> 
> Fix this by making path_in_sparse_checkout() perform pattern matching
> for every component in the given path when cone mode is disabled. (This
> can be expensive, and we might want to do some form of caching for the
> match results of the leading components. However, this is not
> implemented in this patch.) Also add two tests for each command (add,
> rm, and mv) to check that they behave correctly with the said pattern
> matching. The first test would previously fail without this patch, while
> the second already succeeded. It is added mostly to make sure that we
> are not breaking the existing pattern matching for directories that are
> really sparse, and also as a protection against any future
> regressions.
> 
> Note that two other existing tests had to be changed: one test in t3602
> checks that "git rm -r <dir>" won't remove sparse entries, but it
> didn't allow the non-sparse entries inside <dir> to be removed. The
> other one, in t7002, tested that "git mv" would correctly display a
> warning message for sparse paths, but it accidentally expected the
> message to include two non-sparse paths as well.


> @@ -1504,8 +1504,9 @@ static int path_in_sparse_checkout_1(const char *path,
>  				     struct index_state *istate,
>  				     int require_cone_mode)
>  {
> -	const char *base;
>  	int dtype = DT_REG;
> +	enum pattern_match_result ret = NOT_MATCHED;
> +	const char *p, *last_slash = NULL;
>  
>  	/*
>  	 * We default to accepting a path if there are no patterns or
> @@ -1516,11 +1517,31 @@ static int path_in_sparse_checkout_1(const char *path,
>  	     !istate->sparse_checkout_patterns->use_cone_patterns))
>  		return 1;
>  
> -	base = strrchr(path, '/');
> -	return path_matches_pattern_list(path, strlen(path), base ? base + 1 : path,
> -					 &dtype,
> -					 istate->sparse_checkout_patterns,
> -					 istate) > 0;
> +	if (istate->sparse_checkout_patterns->use_cone_patterns) {
> +		const char *base = strrchr(path, '/');
> +		return path_matches_pattern_list(path, strlen(path),
> +				base ? base + 1 : path, &dtype,
> +				istate->sparse_checkout_patterns, istate) > 0;
> +	}
> +
> +	for (p = path; ; p++) {
> +		enum pattern_match_result match;
> +
> +		if (*p && *p != '/')
> +			continue;
> +
> +		match  = path_matches_pattern_list(path, p - path,
> +				last_slash ? last_slash + 1 : path, &dtype,
> +				istate->sparse_checkout_patterns, istate);
> +
> +		if (match != UNDECIDED)
> +			ret = match;
> +		if (!*p)
> +			break;
> +		last_slash = p;
> +	}
> +
> +	return ret;

This implementation makes sense to me.

>  test_expect_success 'recursive rm does not remove sparse entries' '
>  	git reset --hard &&
>  	git sparse-checkout set sub/dir &&
> -	test_must_fail git rm -r sub &&
> -	git rm --sparse -r sub &&
> +	git rm -r sub &&

Interesting that the new pattern-matching already presents a change of
behavior in this test case.

>  	git status --porcelain -uno >actual &&
>  	cat >expected <<-\EOF &&
> +	D  sub/dir/e
> +	EOF
> +	test_cmp expected actual &&

And here is why. Excellent. I suppose that setting the pattern to be
"sub/dir/" would have shown this behavior before.

> +
> +	git rm --sparse -r sub &&
> +	git status --porcelain -uno >actual2 &&
> +	cat >expected2 <<-\EOF &&
>  	D  sub/d
>  	D  sub/dir/e
>  	EOF
> -	test_cmp expected actual
> +	test_cmp expected2 actual2
>  '

The rest of the test cases add new checks that are very valuable.

I love this idea and I agree that it would be better to change the
loop direction to match the full path first (as you mention in your
response).

Thanks,
-Stolee

      parent reply	other threads:[~2021-10-25 16:40 UTC|newest]

Thread overview: 116+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-24 21:54 [PATCH 00/13] [RFC] Sparse-checkout: modify 'git add', 'git rm', and 'git add' behavior Derrick Stolee via GitGitGadget
2021-08-24 21:54 ` [PATCH 01/13] t1092: behavior for adding sparse files Derrick Stolee via GitGitGadget
2021-08-24 21:54 ` [PATCH 02/13] dir: extract directory-matching logic Derrick Stolee via GitGitGadget
2021-08-24 21:54 ` [PATCH 03/13] dir: select directories correctly Derrick Stolee via GitGitGadget
2021-09-24  7:44   ` René Scharfe
2021-08-24 21:54 ` [PATCH 04/13] dir: fix pattern matching on dirs Derrick Stolee via GitGitGadget
2021-08-24 21:54 ` [PATCH 05/13] add: fail when adding an untracked sparse file Derrick Stolee via GitGitGadget
2021-08-27 21:06   ` Matheus Tavares Bernardino
2021-08-27 22:50     ` Matheus Tavares Bernardino
2021-09-08 17:54     ` Derrick Stolee
2021-08-24 21:54 ` [PATCH 06/13] add: skip paths that are outside sparse-checkout cone Derrick Stolee via GitGitGadget
2021-08-27 21:13   ` Matheus Tavares
2021-09-08 19:46     ` Derrick Stolee
2021-09-08 20:02       ` Derrick Stolee
2021-09-08 21:06     ` Derrick Stolee
2021-08-24 21:54 ` [PATCH 07/13] add: implement the --sparse option Derrick Stolee via GitGitGadget
2021-08-27 21:14   ` Matheus Tavares Bernardino
2021-08-24 21:54 ` [PATCH 08/13] add: prevent adding sparse conflict files Derrick Stolee via GitGitGadget
2021-08-27 21:16   ` Matheus Tavares Bernardino
2021-08-24 21:54 ` [PATCH 09/13] rm: add --sparse option Derrick Stolee via GitGitGadget
2021-08-27 21:17   ` Matheus Tavares Bernardino
2021-09-08 18:04     ` Derrick Stolee
2021-08-24 21:54 ` [PATCH 10/13] rm: skip sparse paths with missing SKIP_WORKTREE Derrick Stolee via GitGitGadget
2021-08-27 21:18   ` Matheus Tavares Bernardino
2021-08-24 21:54 ` [PATCH 11/13] mv: refuse to move sparse paths Derrick Stolee via GitGitGadget
2021-08-27 21:20   ` Matheus Tavares Bernardino
2021-08-27 23:44     ` Matheus Tavares Bernardino
2021-09-08 18:41     ` Derrick Stolee
2021-08-24 21:54 ` [PATCH 12/13] mv: add '--sparse' option to ignore sparse-checkout Derrick Stolee via GitGitGadget
2021-08-28 14:18   ` Matheus Tavares Bernardino
2021-08-24 21:54 ` [PATCH 13/13] advice: update message to suggest '--sparse' Derrick Stolee via GitGitGadget
2021-09-12 13:23 ` [PATCH v2 00/14] Sparse-checkout: modify 'git add', 'git rm', and 'git add' behavior Derrick Stolee via GitGitGadget
2021-09-12 13:23   ` [PATCH v2 01/14] t3705: test that 'sparse_entry' is unstaged Derrick Stolee via GitGitGadget
2021-09-15  5:22     ` Elijah Newren
2021-09-15 16:17       ` Derrick Stolee
2021-09-15 16:32     ` Matheus Tavares
2021-09-15 16:42       ` Derrick Stolee
2021-09-12 13:23   ` [PATCH v2 02/14] t1092: behavior for adding sparse files Derrick Stolee via GitGitGadget
2021-09-12 22:17     ` Ævar Arnfjörð Bjarmason
2021-09-13 15:02       ` Derrick Stolee
2021-09-12 13:23   ` [PATCH v2 03/14] dir: extract directory-matching logic Derrick Stolee via GitGitGadget
2021-09-12 13:23   ` [PATCH v2 04/14] dir: select directories correctly Derrick Stolee via GitGitGadget
2021-09-12 22:21     ` Ævar Arnfjörð Bjarmason
2021-09-15 14:41       ` Derrick Stolee
2021-09-15 14:54     ` Elijah Newren
2021-09-15 16:43       ` Derrick Stolee
2021-09-12 13:23   ` [PATCH v2 05/14] dir: fix pattern matching on dirs Derrick Stolee via GitGitGadget
2021-09-12 13:23   ` [PATCH v2 06/14] add: fail when adding an untracked sparse file Derrick Stolee via GitGitGadget
2021-09-12 13:23   ` [PATCH v2 07/14] add: skip tracked paths outside sparse-checkout cone Derrick Stolee via GitGitGadget
2021-09-12 13:23   ` [PATCH v2 08/14] add: implement the --sparse option Derrick Stolee via GitGitGadget
2021-09-15 16:59     ` Elijah Newren
2021-09-20 15:45       ` Derrick Stolee
2021-09-12 13:23   ` [PATCH v2 09/14] add: update --chmod to skip sparse paths Derrick Stolee via GitGitGadget
2021-09-12 13:23   ` [PATCH v2 10/14] add: update --renormalize " Derrick Stolee via GitGitGadget
2021-09-12 13:23   ` [PATCH v2 11/14] rm: add --sparse option Derrick Stolee via GitGitGadget
2021-09-12 13:23   ` [PATCH v2 12/14] rm: skip sparse paths with missing SKIP_WORKTREE Derrick Stolee via GitGitGadget
2021-09-12 13:23   ` [PATCH v2 13/14] mv: refuse to move sparse paths Derrick Stolee via GitGitGadget
2021-09-12 13:23   ` [PATCH v2 14/14] advice: update message to suggest '--sparse' Derrick Stolee via GitGitGadget
2021-09-12 21:58     ` Ævar Arnfjörð Bjarmason
2021-09-15 16:54       ` Derrick Stolee
2021-09-15 20:18   ` [PATCH v2 00/14] Sparse-checkout: modify 'git add', 'git rm', and 'git add' behavior Elijah Newren
2021-09-20 17:45   ` [PATCH v3 " Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 01/14] t3705: test that 'sparse_entry' is unstaged Derrick Stolee via GitGitGadget
2021-09-22 22:52       ` Junio C Hamano
2021-09-20 17:45     ` [PATCH v3 02/14] t1092: behavior for adding sparse files Derrick Stolee via GitGitGadget
2021-09-22 23:06       ` Junio C Hamano
2021-09-23 13:37         ` Derrick Stolee
2021-09-20 17:45     ` [PATCH v3 03/14] dir: extract directory-matching logic Derrick Stolee via GitGitGadget
2021-09-22 23:13       ` Junio C Hamano
2021-09-23 13:39         ` Derrick Stolee
2021-09-23 13:42           ` Derrick Stolee
2021-09-23 18:23             ` Junio C Hamano
2021-09-24 13:29               ` Derrick Stolee
2021-09-20 17:45     ` [PATCH v3 04/14] dir: select directories correctly Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 05/14] dir: fix pattern matching on dirs Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 06/14] add: fail when adding an untracked sparse file Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 07/14] add: skip tracked paths outside sparse-checkout cone Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 08/14] add: implement the --sparse option Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 09/14] add: update --chmod to skip sparse paths Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 10/14] add: update --renormalize " Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 11/14] rm: add --sparse option Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 12/14] rm: skip sparse paths with missing SKIP_WORKTREE Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 13/14] mv: refuse to move sparse paths Derrick Stolee via GitGitGadget
2021-09-20 17:45     ` [PATCH v3 14/14] advice: update message to suggest '--sparse' Derrick Stolee via GitGitGadget
2021-09-24  6:08     ` [PATCH v3 00/14] Sparse-checkout: modify 'git add', 'git rm', and 'git add' behavior Elijah Newren
2021-09-24 15:39     ` [PATCH v4 00/13] Sparse-checkout: modify 'git add', 'git rm', and 'git mv' behavior Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 01/13] t3705: test that 'sparse_entry' is unstaged Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 02/13] t1092: behavior for adding sparse files Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 03/13] dir: select directories correctly Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 04/13] dir: fix pattern matching on dirs Derrick Stolee via GitGitGadget
2021-11-02  0:15         ` Glen Choo
2021-11-02  0:34           ` Junio C Hamano
2021-11-02 13:42             ` Derrick Stolee
2021-11-02 14:50               ` Derrick Stolee
2021-11-02 15:33                 ` Ævar Arnfjörð Bjarmason
2021-11-03 14:40                   ` Derrick Stolee
2021-11-03 17:14                     ` Junio C Hamano
2021-09-24 15:39       ` [PATCH v4 05/13] add: fail when adding an untracked sparse file Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 06/13] add: skip tracked paths outside sparse-checkout cone Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 07/13] add: implement the --sparse option Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 08/13] add: update --chmod to skip sparse paths Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 09/13] add: update --renormalize " Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 10/13] rm: add --sparse option Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 11/13] rm: skip sparse paths with missing SKIP_WORKTREE Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 12/13] mv: refuse to move sparse paths Derrick Stolee via GitGitGadget
2021-09-24 15:39       ` [PATCH v4 13/13] advice: update message to suggest '--sparse' Derrick Stolee via GitGitGadget
2021-09-27 15:51       ` [PATCH v4 00/13] Sparse-checkout: modify 'git add', 'git rm', and 'git mv' behavior Elijah Newren
2021-09-27 20:51         ` Junio C Hamano
2021-10-18 21:28   ` [PATCH v2 00/14] Sparse-checkout: modify 'git add', 'git rm', and 'git add' behavior Sean Christopherson
2021-10-19 12:29     ` Derrick Stolee
2021-10-19 16:50       ` Sean Christopherson
2021-10-20 13:28         ` Junio C Hamano
2021-10-20 14:28           ` Sean Christopherson
2021-10-22  2:28     ` [RFC PATCH] add|rm|mv: fix bug that prevent the update of non-sparse Matheus Tavares
2021-10-22  4:03       ` Matheus Tavares
2021-10-25 16:40       ` Derrick Stolee [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a11d4fb0-184a-b811-64a0-e29e9cfd91a3@gmail.com \
    --to=stolee@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=matheus.bernardino@usp.br \
    --cc=newren@gmail.com \
    --cc=seanjc@google.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).