From: Derrick Stolee <stolee@gmail.com>
To: Matheus Tavares <matheus.bernardino@usp.br>,
	Sean Christopherson <seanjc@google.com>
Cc: git@vger.kernel.org, newren@gmail.com, gitster@pobox.com,
	vdye@github.com, derrickstolee@github.com,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Subject: Re: [RFC PATCH] add|rm|mv: fix bug that prevent the update of non-sparse
Date: Mon, 25 Oct 2021 12:40:17 -0400
Message-ID: <a11d4fb0-184a-b811-64a0-e29e9cfd91a3@gmail.com>
In-Reply-To: <80b5ba61861193daf7132aa64b65fc7dde90dacb.1634866698.git.matheus.bernardino@usp.br>

On 10/21/2021 10:28 PM, Matheus Tavares wrote:
> On Mon, Oct 18, 2021 at 6:28 PM Sean Christopherson <seanjc@google.com> wrote:
>>
>> $ cat .git/info/sparse-checkout
>> !arch/*
>> !tools/arch/*
>> !virt/kvm/arm/*
>> /*
>> arch/.gitignore
>> arch/Kconfig
>> arch/x86
>> tools/arch/x86
>> tools/include/uapi/linux/kvm.h
>> !Documentation
>> !drivers
>>
>> $ git read-tree -mu HEAD
>>
>> $ rm arch/x86/kvm/x86.c
> [...]
>> $ git add arch/x86
>> The following paths and/or pathspecs matched paths that exist
>> outside of your sparse-checkout definition, so will not be
>> updated in the index:
>> arch/x86
>
> I think the problem may be that we are performing pattern matching
> slightly differently in add, mv, and rm, in comparison to "git
> sparse-checkout". On "git sparse-checkout init" (or reapply), we call
> clear_ce_flags() which calls path_matches_pattern_list() for each
> component of the working tree paths. If the full path gives a match
> result of UNDECIDED, we recursively try to use the match result from
> the parent dir (or NOT_MATCHED if we reach the top with UNDECIDED).

Yes! I think this is absolutely the problem. Thanks for pointing
this out!

> In Sean's example, we get UNDECIDED for "arch/x86/kvm/x86.c", but
> "arch/x86" gives MATCHED, so we end up using that for the full path.
>
> However, in add|mv|rm we only call path_matches_pattern_list() for the
> full path and get UNDECIDED, which we consider the same as NOT_MATCHED,
> and end up disallowing the path update operation with a warning message.
>
> The commands do work if we replace the sparsity pattern "arch/x86" with
> "arch/x86/" (with a trailing slash), but note that it only works
> because the pattern is relative to the root (see dir.c:1297). If we
> change it to "x86/", it would no longer work.
>
> So far, the only way I could think of to fix this would be to perform
> pattern matching for the leading components of the paths too. That
> doesn't seem very nice, though, as it can probably be quite expensive...
> But here is a patch for discussion:

I agree that it is expensive, but that's already the case for the
non-cone sparse-checkout patterns. Hopefully it is sufficient that
these cases are restricted to modified files (in the case of `git add .`)
or specific pathspecs (in the case of `git mv` and `git rm`).

> -- >8 --
> Subject: [RFC PATCH] add|rm|mv: fix bug that prevent the update of non-sparse dirs
>
> These three commands recently learned to avoid updating paths that do
> not match the sparse-checkout patterns even if they are missing the
> SKIP_WORKTREE bit. This is done using path_in_sparse_checkout(), which
> tries to match the path with the current set of sparsity rules using
> path_matches_pattern_list(). This is similar to what clear_ce_flags()
> does when we run "git sparse-checkout init" or "git sparse-checkout
> reapply". But note that clear_ce_flags() has a recursive behavior,
> calling path_matches_pattern_list() for each component in a path,
> whereas path_in_sparse_checkout() only calls it for the full path. This
> makes the function miss matches such as the one between path "a/b/c" and
> the pattern "b/". So if the user has the sparsity rules "!/a" and "b/",
> for example, add, rm, and mv will fail to update the path "a/b/c" and
> end up displaying a warning about "a/b/c" being outside the sparse
> checkout even though it isn't. Note that this problem only occurs with
> non-cone mode.
>
> Fix this by making path_in_sparse_checkout() perform pattern matching
> for every component in the given path when cone mode is disabled. (This
> can be expensive, and we might want to do some form of caching for the
> match results of the leading components. However, this is not
> implemented in this patch.) Also add two tests for each command (add,
> rm, and mv) to check that they behave correctly with the said pattern
> matching. The first test would previously fail without this patch, while
> the second already succeeded. It is added mostly to make sure that we
> are not breaking the existing pattern matching for directories that are
> really sparse, and also as a protection against any future
> regressions.
>
> Note that two other existing tests had to be changed: one test in t3602
> checks that "git rm -r <dir>" won't remove sparse entries, but it
> didn't allow the non-sparse entries inside <dir> to be removed. The
> other one, in t7002, tested that "git mv" would correctly display a
> warning message for sparse paths, but it accidentally expected the
> message to include two non-sparse paths as well.
> @@ -1504,8 +1504,9 @@ static int path_in_sparse_checkout_1(const char *path,
>  				     struct index_state *istate,
>  				     int require_cone_mode)
>  {
> -	const char *base;
>  	int dtype = DT_REG;
> +	enum pattern_match_result ret = NOT_MATCHED;
> +	const char *p, *last_slash = NULL;
>
>  	/*
>  	 * We default to accepting a path if there are no patterns or
> @@ -1516,11 +1517,31 @@ static int path_in_sparse_checkout_1(const char *path,
>  	     !istate->sparse_checkout_patterns->use_cone_patterns))
>  		return 1;
>
> -	base = strrchr(path, '/');
> -	return path_matches_pattern_list(path, strlen(path), base ? base + 1 : path,
> -					 &dtype,
> -					 istate->sparse_checkout_patterns,
> -					 istate) > 0;
> +	if (istate->sparse_checkout_patterns->use_cone_patterns) {
> +		const char *base = strrchr(path, '/');
> +		return path_matches_pattern_list(path, strlen(path),
> +				base ? base + 1 : path, &dtype,
> +				istate->sparse_checkout_patterns, istate) > 0;
> +	}
> +
> +	for (p = path; ; p++) {
> +		enum pattern_match_result match;
> +
> +		if (*p && *p != '/')
> +			continue;
> +
> +		match = path_matches_pattern_list(path, p - path,
> +				last_slash ? last_slash + 1 : path, &dtype,
> +				istate->sparse_checkout_patterns, istate);
> +
> +		if (match != UNDECIDED)
> +			ret = match;
> +		if (!*p)
> +			break;
> +		last_slash = p;
> +	}
> +
> +	return ret;

This implementation makes sense to me.

>  test_expect_success 'recursive rm does not remove sparse entries' '
>  	git reset --hard &&
>  	git sparse-checkout set sub/dir &&
> -	test_must_fail git rm -r sub &&
> -	git rm --sparse -r sub &&
> +	git rm -r sub &&

Interesting that the new pattern-matching already presents a change of
behavior in this test case.

>  	git status --porcelain -uno >actual &&
>  	cat >expected <<-\EOF &&
> +	D  sub/dir/e
> +	EOF
> +	test_cmp expected actual &&

And here is why. Excellent. I suppose that setting the pattern to be
"sub/dir/" would have shown this behavior before.

> +
> +	git rm --sparse -r sub &&
> +	git status --porcelain -uno >actual2 &&
> +	cat >expected2 <<-\EOF &&
>  	D  sub/d
>  	D  sub/dir/e
>  	EOF
> -	test_cmp expected actual
> +	test_cmp expected2 actual2
>  '

The rest of the test cases add new checks that are very valuable.

I love this idea and I agree that it would be better to change the
loop direction to match the full path first (as you mention in your
response).

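
To make the direction change concrete, here is a rough, self-contained
sketch of the idea. It is not the dir.c code: toy_match() is a
hypothetical stand-in for path_matches_pattern_list() that only compares
literal paths (no wildcards, no dtype), and struct toy_pattern and
toy_path_in_sparse_checkout() are made-up names as well. It only
illustrates trying the full path first, then walking up through the
parent directories, and stopping at the first result that is not
UNDECIDED.

```c
#include <stddef.h>
#include <string.h>

enum match_result { UNDECIDED = -1, NOT_MATCHED = 0, MATCHED = 1 };

/* Hypothetical, simplified pattern: a literal path plus a "!" flag. */
struct toy_pattern {
	const char *text;	/* literal path, no wildcard support */
	int negated;		/* 1 for a "!pattern" entry */
};

/*
 * Toy stand-in for path_matches_pattern_list(). Patterns are scanned
 * from last to first, and the first one whose text equals the first
 * "len" bytes of "path" decides the result (last match wins).
 */
enum match_result toy_match(const char *path, size_t len,
			    const struct toy_pattern *pl, size_t nr)
{
	size_t i;

	for (i = nr; i > 0; i--) {
		const struct toy_pattern *p = &pl[i - 1];

		if (strlen(p->text) == len && !strncmp(p->text, path, len))
			return p->negated ? NOT_MATCHED : MATCHED;
	}
	return UNDECIDED;
}

/*
 * Try the full path first, then each parent directory, and stop at the
 * first decided answer. If every level is UNDECIDED, the path is
 * treated as not matched.
 */
int toy_path_in_sparse_checkout(const char *path,
				const struct toy_pattern *pl, size_t nr)
{
	enum match_result match = UNDECIDED;
	const char *end = path + strlen(path);

	while (end > path && match == UNDECIDED) {
		match = toy_match(path, (size_t)(end - path), pl, nr);

		/* Step back to the previous '/' (or to the start). */
		do {
			end--;
		} while (end > path && *end != '/');
	}
	return match > 0;
}
```

With a hypothetical two-entry list equivalent to "!arch" followed by
"arch/x86", the walk for "arch/x86/kvm/x86.c" is UNDECIDED at the full
path and at "arch/x86/kvm", then stops at "arch/x86" and accepts the
path. For "arch/arm/foo.c" it reaches "arch" and rejects it. The early
exit is the main attraction: most paths would be decided after one or
two calls instead of one call per component.
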
Thanks,
-Stolee