git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: warmsocks@gmail.com, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH] sparse-checkout: avoid staging deletions of all files
Date: Thu, 4 Jun 2020 10:48:27 -0400	[thread overview]
Message-ID: <7c34da8e-87b8-5236-4536-4d8fbc3f1e80@gmail.com> (raw)
In-Reply-To: <pull.801.git.git.1591258657818.gitgitgadget@gmail.com>

On 6/4/2020 4:17 AM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> sparse-checkout's purpose is to update the working tree to have it
> reflect a subset of the tracked files.  As such, it shouldn't be
> switching branches, making commits, downloading or uploading data, or
> staging or unstaging changes.  Other than updating the worktree, the
> only thing sparse-checkout should touch is the SKIP_WORKTREE bit of the
> index.  In particular, this sets up a nice invariant: running
> sparse-checkout will never change the status of any file in `git status`
> (reflecting the fact that we only set the SKIP_WORKTREE bit if the file
> is safe to delete, i.e. if the file is unmodified).
> 
> Traditionally, we did a _really_ bad job with this goal.  The
> predecessor to sparse-checkout involved manual editing of
> .git/info/sparse-checkout and running `git read-tree -mu HEAD`.  That
> command would stage and unstage changes and overwrite dirty changes in
> the working tree.
> 
> The initial implementation of the sparse-checkout command was no better;
> it simply invoked `git read-tree -mu HEAD` as a subprocess and had the
> same caveats, though this issue came up repeatedly in review comments
> and workarounds for the problems were put in place before the feature
> was merged[1, 2, 3, 4, 5, 6; especially see 4 & 6].
> 
> [1] https://lore.kernel.org/git/CABPp-BFT9A5n=_bx5LsjCvbogqwSjiwgr5amcjgbU1iAk4KLJg@mail.gmail.com/
> [2] https://lore.kernel.org/git/CABPp-BEmwSwg4tgJg6nVG8a3Hpn_g-=ZjApZF4EiJO+qVgu4uw@mail.gmail.com/
> [3] https://lore.kernel.org/git/CABPp-BFV7TA0qwZCQpHCqx9N+JifyRyuBQ-pZ_oGfe-NOgyh7A@mail.gmail.com/
> [4] https://lore.kernel.org/git/CABPp-BHYCCD+Vx5fq35jH82eHc1-P53Lz_aGNpHJNcx9kg2K-A@mail.gmail.com/
> [5] https://lore.kernel.org/git/CABPp-BF+JWYZfDqp2Tn4AEKVp4b0YMA=Mbz4Nz62D-gGgiduYQ@mail.gmail.com/
> [6] https://lore.kernel.org/git/20191121163706.GV23183@szeder.dev/
> 
> However, these workarounds, in addition to disabling the feature in a
> number of important cases, also missed one special case.  I'll get back
> to it later.
> 
> In the 2.27.0 cycle, the disabling of the feature was lifted by finally
> replacing the internal equivalent of `git read-tree -mu HEAD` with
> something that did what we wanted: the new update_sparsity() function in
> unpack-trees.c that only ever updates SKIP_WORKTREE bits in the index
> and updates the working tree to match.  This new function handles all
> the cases that were problematic for the old implementation, except that
> it breaks the same special case that avoided the workarounds of the old
> implementation, but broke it in a different way.
> 
> So...that brings us to the special case: a git clone performed with
> --no-checkout.  As per the meaning of the flag, --no-checkout does not
> check out any branch, with the implication that you aren't on one and
> need to switch to one after the clone.  Implementationally, HEAD is
> still set (so in some sense you are partially on a branch), but
>   * the index is "unborn" (non-existent)
>   * there are no files in the working tree (other than .git/)
>   * the next time git switch (or git checkout) is run it will run
>     unpack_trees with `initial_checkout` flag set to true.
> It is not until you run, e.g. `git switch <somebranch>` that the index
> will be written and files in the working tree populated.
> 
> With this special --no-checkout case, the traditional `read-tree -mu
> HEAD` behavior would have done the equivalent of acting like checkout --
> switch to the default branch (HEAD), write out an index that matches
> HEAD, and update the working tree to match.  This special case slipped
> through the avoid-making-changes checks in the original sparse-checkout
> command and thus continued there.
> 
> After update_sparsity() was introduced and used (see commit f56f31af03
> ("sparse-checkout: use new update_sparsity() function", 2020-03-27)),
> the behavior for the --no-checkout case changed:  Due to git's
> auto-vivification of an empty in-memory index (see do_read_index() and
> note that `must_exist` is false), and due to sparse-checkout's
> update_working_directory() code to always write out the index after it
> was done, we got a new bug.  That made it so that sparse-checkout would
> switch the repository from a clone with an "unborn" index (i.e. still
> needing an initial_checkout), to one that had a recorded index with no
> entries.  Thus, instead of all the files appearing deleted in `git
> status` being known to git as a special artifact of not yet being on a
> branch, our recording of an empty index made it suddenly look to git as
> though it was definitely on a branch with ALL files staged for deletion!
> A subsequent checkout or switch then had to contend with the fact that
> it wasn't on an initial_checkout but had a bunch of staged deletions.
 
Thank you for the detailed summary of how we got to this point and
what is going on. When I asked you to look into this, I did ask for
you to spend "less than an hour" only because I hoped the fix would
be easy. Clearly, there are a lot of things going on here, but you
seem to have a firm grasp on everything.

> Make sure that sparse-checkout changes nothing in the index other than
> the SKIP_WORKTREE bit; in particular, when the index is unborn we do not
> have any branch checked out so there is no sparsification or
> de-sparsification work to do.  Simply return from
> update_working_directory() early.

This makes sense to me. The sparse-checkout file will still be
modified, so when the user performs a checkout then Git will
respect the set patterns. Your test confirms this, too!

I think this behavior is a reasonable compromise from the
previous behavior and being as correct as possible now. Before,
the "git read-tree -mu HEAD" portion of "git sparse-chekcout set"
would populate the working directory, and now it will not.
This more closely matches how a user interacts with --no-checkout
in the non-sparse case.

> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>     sparse-checkout: avoid staging deletions of all files
> 
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-801%2Fnewren%2Fsparse-checkout-and-unborn-index-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-801/newren/sparse-checkout-and-unborn-index-v1
> Pull-Request: https://github.com/git/git/pull/801
> 
>  builtin/sparse-checkout.c          |  4 ++++
>  t/t1091-sparse-checkout-builtin.sh | 19 +++++++++++++++++++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 95d08824172..595463be68e 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -99,6 +99,10 @@ static int update_working_directory(struct pattern_list *pl)
>  	struct lock_file lock_file = LOCK_INIT;
>  	struct repository *r = the_repository;
>  
> +	/* If no branch has been checked out, there are no updates to make. */
> +	if (is_index_unborn(r->index))
> +		return UPDATE_SPARSITY_SUCCESS;
> +
>  	memset(&o, 0, sizeof(o));
>  	o.verbose_update = isatty(2);
>  	o.update = 1;
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index 88cdde255cd..bc287e5c1fa 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -100,6 +100,25 @@ test_expect_success 'clone --sparse' '
>  	check_files clone a
>  '
>  
> +test_expect_success 'interaction with clone --no-checkout (unborn index)' '
> +	git clone --no-checkout "file://$(pwd)/repo" clone_no_checkout &&
> +	git -C clone_no_checkout sparse-checkout init --cone &&
> +	git -C clone_no_checkout sparse-checkout set folder1 &&
> +	git -C clone_no_checkout sparse-checkout list >actual &&
> +	cat >expect <<-\EOF &&
> +	folder1
> +	EOF
> +	test_cmp expect actual &&
> +	ls clone_no_checkout >actual &&
> +	test_must_be_empty actual &&

My only comment on the test case is to see if you could use
the "check_files" macro instead of "ls". See 761e3d26
(sparse-checkout: improve OS ls compatibility, 2019-12-20)
for details.

> +	test_path_is_missing clone_no_checkout/.git/index &&
> +
> +	# No branch is checked out until we manually switch to one
> +	git -C clone_no_checkout switch master &&
> +	test_path_is_file clone_no_checkout/.git/index &&
> +	check_files clone_no_checkout a folder1
> +'

Thanks!
-Stolee


  reply	other threads:[~2020-06-04 14:48 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-04  8:17 [PATCH] sparse-checkout: avoid staging deletions of all files Elijah Newren via GitGitGadget
2020-06-04 14:48 ` Derrick Stolee [this message]
2020-06-04 15:05   ` Elijah Newren
2020-06-04 15:23     ` Derrick Stolee
2020-06-04 17:08       ` Junio C Hamano
2020-06-04 17:18     ` Junio C Hamano
2020-06-04 17:21       ` Junio C Hamano
2020-06-04 16:57 ` Junio C Hamano
2020-06-05  2:41 ` [PATCH v2] " Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7c34da8e-87b8-5236-4536-4d8fbc3f1e80@gmail.com \
    --to=stolee@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=newren@gmail.com \
    --cc=warmsocks@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).