From: Derrick Stolee <stolee@gmail.com>
To: Tim Renouf <tpr.ll@botech.co.uk>,
newren@gmail.com, dstolee@microsoft.com, git@vger.kernel.org
Subject: Re: [PATCH] unpack-trees: add core.sparseCheckoutRmFiles config
Date: Tue, 1 Jun 2021 22:00:50 -0400 [thread overview]
Message-ID: <f6d39636-308c-c846-55b5-3f16a155e69d@gmail.com> (raw)
In-Reply-To: <20210601183106.3084008-1-tpr.ll@botech.co.uk>
On 6/1/2021 2:31 PM, Tim Renouf wrote:
> When doing a checkout (or other index merge from a tree) causes the
> removal of a path that is outside sparse-checkout, the file is removed
> from the working tree, even if it is dirty.
>
> That is probably what you want if the file got there by being
> materialized in a merge conflict. But it is not what you want if you
> deliberately put the file there.
>
> This commit adds the above config item, defaulting to "true" to get the
> old behavior. Set it to "false" to avoid removing such a file from the
> worktree.
I don't think config is necessary here. This behavior is niche
enough to create a behavior-breaking change. However, we do want
to ensure that the full flow of eventually clearing the file when
clean is handled.
> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index a0eeaeb02e..31adb38b1d 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -111,6 +111,17 @@ the sparse-checkout file.
> To repopulate the working directory with all files, use the
> `git sparse-checkout disable` command.
>
> +The `core.sparseCheckoutRmFiles` config setting
If we _are_ going to go with a config option, then I'm not a big
fan of this name. I've also been thinking that the sparse-checkout
config has been squatting in the "core.*" space too long. Perhaps
it is time to create its own section?
What about something like sparseCheckout.keepDirtyFiles, which
defaults to false?
Alternatively, we could add things to the index.* space. We
already have "index.sparse" for the sparse index feature. For this
topic, a config option "index.keepDirtySparseFiles" would have a
similar meaning to my other suggestion.
At the least, you would need to update the appropriate file in
Documentation/config/*.txt.
> extern int core_apply_sparse_checkout;
> extern int core_sparse_checkout_cone;
> +extern int core_sparse_checkout_rm_files;
These previous variables being global is unfortunate and it
might be time to fix that. At minimum, I think this new
option might be able to inject somewhere else without
running a lot of git_config_get_value() calls in a loop.
Since your changes are within unpack-trees.c, maybe adding
a flag to 'struct unpack_trees_options' would suffice. It
could be initialized in unpack_trees() so only happen once
per index update.
> +test_expect_success 'core.sparseCheckoutRmFiles checkout that would remove file outside sparse-checkout removes file from disk' '
This test name is too long. Perhaps
'sparse-checkout removes dirty non-matching file'
> + git config core.sparseCheckout false &&
> + git checkout -f top &&
> + echo added3 >added3 &&
> + git add added3 &&
> + git commit -madded3 &&
> + git checkout top &&
> + test_path_is_missing added3 &&
> + git config core.sparseCheckout true &&
> + git config core.sparseCheckoutRmFiles true &&
> + echo init.t >.git/info/sparse-checkout &&
Perhaps we could use a more modern approach, such as
git sparse-checkout init &&
git sparse-checkout set init.t &&
(and use 'git sparse-checkout disable' at the start of the
test.)
> + git checkout HEAD@{1} &&
I'd typically expect 'git checkout HEAD~1' instead of
using the reflog, since it is more robust to changing
the test in the future. Better even to create a new
branch earlier and then switch between named branches.
> + test_path_is_missing added3 &&
> + echo dirty >added3 &&
I appreciate that you confirm the file is missing before
you create it.
> + git checkout top &&
> + test_path_is_missing added3
Thought: does this happen also with 'git sparse-checkout
reapply'?
> +'
> +
> +test_expect_success '!core.sparseCheckoutRmFiles checkout that would remove file outside sparse-checkout does not remove file from disk' '
and here:
'sparse-checkout keeps dirty non-matching file'
These tests are very similar. Perhaps they could be
grouped and just have a check at the end that makes
'added3' dirty and checks the behavior of 'git checkout'
with the two config settings?
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1528,7 +1528,9 @@ static void mark_new_skip_worktree(struct pattern_list *pl,
>
> /*
> * 1. Pretend the narrowest worktree: only unmerged entries
> - * are checked out
> + * are checked out. If core.sparseCheckoutRmFiles is off, then
> + * treat a file being removed as merged, so it does not get
> + * removed from the worktree.
> */
> for (i = 0; i < istate->cache_nr; i++) {
> struct cache_entry *ce = istate->cache[i];
> @@ -1536,7 +1538,8 @@ static void mark_new_skip_worktree(struct pattern_list *pl,
> if (select_flag && !(ce->ce_flags & select_flag))
> continue;
>
> - if (!ce_stage(ce) && !(ce->ce_flags & CE_CONFLICTED))
> + if ((!ce_stage(ce) && !(ce->ce_flags & CE_CONFLICTED)) ||
> + ((ce->ce_flags & CE_REMOVE) && !core_sparse_checkout_rm_files))
> ce->ce_flags |= skip_wt_flag;
> else
> ce->ce_flags &= ~skip_wt_flag;
> @@ -1681,12 +1684,15 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>
> if (!o->skip_sparse_checkout) {
> /*
> - * Sparse checkout loop #2: set NEW_SKIP_WORKTREE on entries not in loop #1
> + * Sparse checkout loop #2: set NEW_SKIP_WORKTREE on entries not in loop #1.
> + * If !core.sparseCheckoutRmFiles, include files being removed so ones
> + * outside sparse-checkout patterns do not get removed from the worktree.
> * If they will have NEW_SKIP_WORKTREE, also set CE_SKIP_WORKTREE
> * so apply_sparse_checkout() won't attempt to remove it from worktree
> */
> + int mask = core_sparse_checkout_rm_files ? CE_ADDED : CE_ADDED | CE_REMOVE;
> mark_new_skip_worktree(o->pl, &o->result,
> - CE_ADDED, CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE,
> + mask, CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE,
> o->verbose_update);
I'm a bit worried about this use of CE_REMOVE. I see its use listed in
cache-tree.c with this comment:
/*
* CE_REMOVE entries are removed before the index is
* written to disk. Skip them to remain consistent
* with the future on-disk index.
*/
This makes me think that the entries are actually not present in the
written index, which is incorrect. It will make it look like we have
staged a deletion of that file. Can you check that the output of
'git status' shows the file with no staged changes, but an unstaged
_modification_? Alternatively: the modification might be ignored since
it is a sparse entry, but we would probably want to fix that before
taking this change. If my understanding is correct*, then 'git status'
will show it as a staged deletion and an unstaged addition.
(*) This is a BIG IF.
Thank you for your interest here. Elijah is right that the area is
fraught with complexity, and I'm learning something new about it
every day as I work on my sparse-index feature. I'm still trying
to grasp the subtleties like this and their ramifications before
changing the existing behavior because I want to be _sure_ we are
moving in a more stable direction and not just another unstable
point that frustrates users.
This change seems like a focused attempt, but I think we need to
track down those other details before committing to such a change:
1. How does a user with a dirty, tracked, sparse file get back
into a state where that file is deleted? What commands do
they need to run? Can that be tested and added to the sparse-
checkout documentation?
2. What does 'git status' look like when a user is in this state?
Is that helpful?
Thanks,
-Stolee
next prev parent reply other threads:[~2021-06-02 2:00 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-28 20:14 bug report: git checkout deletes worktree file even though it is excluded by sparse-checkout Tim Renouf (open source)
2021-05-28 22:44 ` Elijah Newren
2021-06-01 18:31 ` [PATCH] unpack-trees: add core.sparseCheckoutRmFiles config Tim Renouf
2021-06-02 2:00 ` Derrick Stolee [this message]
2021-06-02 2:32 ` Junio C Hamano
2021-06-02 5:53 ` Elijah Newren
2021-06-02 6:13 ` Tim Renouf (open source)
2021-06-02 23:41 ` Elijah Newren
2021-06-05 15:33 ` Tim Renouf (open source)
2021-06-02 1:37 ` bug report: git checkout deletes worktree file even though it is excluded by sparse-checkout Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f6d39636-308c-c846-55b5-3f16a155e69d@gmail.com \
--to=stolee@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=newren@gmail.com \
--cc=tpr.ll@botech.co.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).