git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: matheus.bernardino@usp.br, dstolee@microsoft.com,
	Elijah Newren <newren@gmail.com>,
	chris.torek@gmail.com, Elijah Newren <newren@gmail.com>
Subject: [PATCH v2 0/3] Fix stash apply in sparse checkouts (and a submodule test)
Date: Tue, 01 Dec 2020 22:25:15 +0000	[thread overview]
Message-ID: <pull.919.v2.git.git.1606861519.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.919.git.git.1605891222.gitgitgadget@gmail.com>

This series fixes some issues with git stash apply in sparse checkouts, when
the sparsity patterns have changed between when the stash was created and
when the stash is being applied. It also incidentally fixes a submodule
testcase unrelated to sparse checkouts.

Changes since v1:

 * Commit message cleanup, including a couple wording issues highlighted by
   Chris
 * Cleaned up the code a bit and commented it better, to try to make it
   easier for Junio (and others) to follow.

Elijah Newren (3):
  t7012: add a testcase demonstrating stash apply bugs in sparse
    checkouts
  stash: remove unnecessary process forking
  stash: fix stash application in sparse-checkouts

 builtin/stash.c                  | 165 ++++++++++++++++++++++---------
 t/lib-submodule-update.sh        |  16 +--
 t/t7012-skip-worktree-writing.sh |  88 +++++++++++++++++
 3 files changed, 212 insertions(+), 57 deletions(-)


base-commit: 898f80736c75878acc02dc55672317fcc0e0a5a6
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-919%2Fnewren%2Fsparse-checkout-fixups-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-919/newren/sparse-checkout-fixups-v2
Pull-Request: https://github.com/git/git/pull/919

Range-diff vs v1:

 1:  de4b4d207b ! 1:  2155bbfe20 t7012: add a testcase demonstrating stash apply bugs in sparse checkouts
     @@ Commit message
          t7012: add a testcase demonstrating stash apply bugs in sparse checkouts
      
          Applying stashes in sparse-checkouts, particularly when the patterns
     -    used to define the present paths have changed between when the stash was
     +    used to define the sparseness have changed between when the stash was
          created and when it is applied, has a number of bugs.  The primary
     -    problem is perhaps that stashes can be only partially applied (sometimes
     -    without any warning or error being displayed and with 0 exit status).
     -    In addition, there are cases where partial stash application comes with
     -    non-translated error messages and an early abort.  The first is when
     -    there are files present despite the SKIP_WORKTREE bit being set, in
     -    which case the error message shown is:
     +    problem is that stashes are sometimes only partially applied.  In most
     +    such cases, it does so silently without any warning or error being
     +    displayed and with 0 exit status.
     +
     +    There are, however, a few cases when non-translated error messages are
     +    shown and the stash application aborts early.  The first is when there
     +    are files present despite the SKIP_WORKTREE bit being set, in which case
     +    the error message shown is:
      
              error: Entry 'PATHNAME' not uptodate. Cannot merge.
      
          The other situation is when a stash contains new files to add to the
     -    working tree; in this case, the code aborts early (but after at least
     -    partial stash application) with the following error message being shown:
     +    working tree; in this case, the code aborts early but still has the
     +    stash partially applied, and shows the following error message:
      
              error: NEWFILE: does not exist and --remove not passed
              fatal: Unable to process path NEWFILE
      
     -    Add a test that can trigger all three of these problems that carefully
     -    checks that the working copy and SKIP_WORKTREE bits are as expected
     -    after the stash application...but currently marked as expected to fail.
     +    Add a test that can trigger all three of these problems.  Have it
     +    carefully check that the working copy and SKIP_WORKTREE bits are as
     +    expected after the stash application.  The test is currently marked as
     +    expected to fail, but subsequent commits will implement the fixes and
     +    toggle the expectation.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
 2:  eb9ebcf0bd ! 2:  1fa263cf3c stash: remove unnecessary process forking
     @@ Commit message
          into a library function that does the same thing.  (The read-tree
          --reset was already partially converted over to a library call, but as
          an independent piece.)  Note here that this came after a merge operation
     -    was performed.  The merge machinery always staged anything that cleanly
     -    merges, and the above code only runs if there were no conflicts.  Its
     +    was performed.  The merge machinery always stages anything that cleanly
     +    merges, and the above code only runs if there are no conflicts.  Its
          purpose is to make it so that when there are no conflicts, all the
          changes from the stash are unstaged.  However, that causes brand new
          files from the stash to become untracked, so the code above first saves
     @@ Commit message
          We replace the whole series of commands with a simple function that will
          unstage files that are not newly added.  This doesn't fix any bugs in
          the usage of these commands, it simply matches the existing behavior but
     -    making it an actual builtin that we can then operate on as a whole.  A
     -    subsequent commit will take advantage of this to fix issues with these
     -    commands in sparse-checkouts.
     +    makes it into a single atomic operation that we can then operate on as a
     +    whole.  A subsequent commit will take advantage of this to fix issues
     +    with these commands in sparse-checkouts.
      
          This conversion incidentally fixes t3906.1, because the separate
          update-index process would die with the following error messages:
     @@ builtin/stash.c: static void add_diff_to_buf(struct diff_queue_struct *q,
       }
       
      -static int get_newly_staged(struct strbuf *out, struct object_id *c_tree)
     -+static void unstage_changes_unless_new(struct object_id *cache_tree)
     - {
     +-{
      -	struct child_process cp = CHILD_PROCESS_INIT;
      -	const char *c_tree_hex = oid_to_hex(c_tree);
      -
     - 	/*
     +-	/*
      -	 * diff-index is very similar to diff-tree above, and should be
      -	 * converted together with update_index.
     -+	 * When we enter this function, there has been a clean merge of
     -+	 * relevant trees, and the merge logic always stages whatever merges
     -+	 * cleanly.  We want to unstage those changes, unless it corresponds
     -+	 * to a file that didn't exist as of cache_tree.
     - 	 */
     +-	 */
      -	cp.git_cmd = 1;
      -	strvec_pushl(&cp.args, "diff-index", "--cached", "--name-only",
      -		     "--diff-filter=A", NULL);
      -	strvec_push(&cp.args, c_tree_hex);
      -	return pipe_command(&cp, NULL, 0, out, 0, NULL, 0);
      -}
     - 
     +-
      -static int update_index(struct strbuf *out)
      -{
      -	struct child_process cp = CHILD_PROCESS_INIT;
     -+	struct diff_options diff_opts;
     -+	struct lock_file lock = LOCK_INIT;
     -+	int i;
     - 
     +-
      -	/*
      -	 * Update-index is very complicated and may need to have a public
      -	 * function exposed in order to remove this forking.
     @@ builtin/stash.c: static void add_diff_to_buf(struct diff_queue_struct *q,
      -	cp.git_cmd = 1;
      -	strvec_pushl(&cp.args, "update-index", "--add", "--stdin", NULL);
      -	return pipe_command(&cp, out->buf, out->len, NULL, 0, NULL, 0);
     +-}
     +-
     + static int restore_untracked(struct object_id *u_tree)
     + {
     + 	int res;
     +@@ builtin/stash.c: static int restore_untracked(struct object_id *u_tree)
     + 	return res;
     + }
     + 
     ++static void unstage_changes_unless_new(struct object_id *orig_tree)
     ++{
     ++	/*
     ++	 * When we enter this function, there has been a clean merge of
     ++	 * relevant trees, and the merge logic always stages whatever merges
     ++	 * cleanly.  We want to unstage those changes, unless it corresponds
     ++	 * to a file that didn't exist as of orig_tree.
     ++	 */
     ++
     ++	struct diff_options diff_opts;
     ++	struct lock_file lock = LOCK_INIT;
     ++	int i;
     ++
     ++	/*
     ++	 * Step 1: get a difference between orig_tree (which corresponding
     ++	 * to the index before a merge was run) and the current index
     ++	 * (reflecting the changes brought in by the merge).
     ++	 */
      +	diff_setup(&diff_opts);
      +	diff_opts.flags.recursive = 1;
      +	diff_opts.detect_rename = 0;
      +	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
      +	diff_setup_done(&diff_opts);
      +
     -+	do_diff_cache(cache_tree, &diff_opts);
     ++	do_diff_cache(orig_tree, &diff_opts);
      +	diffcore_std(&diff_opts);
      +
     ++	/* Iterate over the paths that changed due to the merge... */
      +	for (i = 0; i < diff_queued_diff.nr; i++) {
      +		struct diff_filepair *p;
      +		struct cache_entry *ce;
      +		int pos;
      +
     ++		/* Look up the path's position in the current index. */
      +		p = diff_queued_diff.queue[i];
      +		pos = index_name_pos(&the_index, p->two->path,
      +				     strlen(p->two->path));
     -+		if (pos < 0) {
     -+			assert(p->one->oid_valid);
     -+			assert(!p->two->oid_valid);
     -+			ce = make_cache_entry(&the_index,
     -+					      p->one->mode,
     -+					      &p->one->oid,
     -+					      p->one->path,
     -+					      0, 0);
     -+			add_index_entry(&the_index, ce, ADD_CACHE_OK_TO_ADD);
     -+			continue;
     -+		}
     -+		ce = active_cache[pos];
     ++
     ++		/*
     ++		 * Step 2: "unstage" changes, as long as they are still tracked
     ++		 */
      +		if (p->one->oid_valid) {
     ++			/*
     ++			 * Path existed in orig_tree; restore index entry
     ++			 * from that tree in order to "unstage" the changes.
     ++			 */
     ++			int option = ADD_CACHE_OK_TO_REPLACE;
     ++			if (pos < 0)
     ++				option = ADD_CACHE_OK_TO_ADD;
     ++
      +			ce = make_cache_entry(&the_index,
      +					      p->one->mode,
      +					      &p->one->oid,
      +					      p->one->path,
      +					      0, 0);
     -+			add_index_entry(&the_index, ce,
     -+					ADD_CACHE_OK_TO_REPLACE);
     ++			add_index_entry(&the_index, ce, option);
      +		}
      +	}
      +	diff_flush(&diff_opts);
      +
     ++	/*
     ++	 * Step 3: write the new index to disk
     ++	 */
      +	repo_hold_locked_index(the_repository, &lock, LOCK_DIE_ON_ERROR);
      +	if (write_locked_index(&the_index, &lock,
      +			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
      +		die(_("Unable to write index."));
     - }
     - 
     - static int restore_untracked(struct object_id *u_tree)
     ++}
     ++
     + static int do_apply_stash(const char *prefix, struct stash_info *info,
     + 			  int index, int quiet)
     + {
      @@ builtin/stash.c: static int do_apply_stash(const char *prefix, struct stash_info *info,
       		if (reset_tree(&index_tree, 0, 0))
       			return -1;
 3:  5143cba704 ! 3:  ccfedc7140 stash: fix stash application in sparse-checkouts
     @@ Commit message
          staged at the end and include more things that should be quasi-merged,
          (2) stash generally wants changes to NOT be staged.  It only provides
          exceptions when (a) some of the changes had conflicts and thus we want
     -    to use staged to denote the clean merges and higher order stages to
     +    to use stages to denote the clean merges and higher order stages to
          mark the conflicts, or (b) if there is a brand new file we don't want
          it to become untracked.
      
     @@ Commit message
              git update-index --add --stdin <"$a"
              rm -f "$a"
      
     -    You might that between the merge that proceeded these commands and these
     -    different commands here, that we have commands from each of the
     -    different types of special sparsity handling listed at the beginning of
     -    this message, and in fact this precisely led to the following buggy
     -    behaviors:
     +    Looking back at the different types of special sparsity handling listed
     +    at the beginning of this message, you may note that we have at least one
     +    of each type covered here: merge, diff-index, and read-tree.  The weird
     +    mix-and-match led to 3 different bugs:
      
          (1) If a path merged cleanly and it didn't match the sparsity patterns,
          the merge backend would know to avoid writing it to the working tree and
     @@ Commit message
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## builtin/stash.c ##
     -@@ builtin/stash.c: static void unstage_changes_unless_new(struct object_id *cache_tree)
     - 	 * When we enter this function, there has been a clean merge of
     +@@ builtin/stash.c: static void unstage_changes_unless_new(struct object_id *orig_tree)
       	 * relevant trees, and the merge logic always stages whatever merges
       	 * cleanly.  We want to unstage those changes, unless it corresponds
     --	 * to a file that didn't exist as of cache_tree.
     -+	 * to a file that didn't exist as of cache_tree.  However, if any
     -+	 * SKIP_WORKTREE path is modified relative to cache_tree, then we
     -+	 * want to clear the SKIP_WORKTREE bit and write it to the worktree
     -+	 * before unstaging.
     + 	 * to a file that didn't exist as of orig_tree.
     ++	 *
     ++	 * However, if any SKIP_WORKTREE path is modified relative to
     ++	 * orig_tree, then we want to clear the SKIP_WORKTREE bit and write
     ++	 * it to the worktree before unstaging.
       	 */
       
      +	struct checkout state = CHECKOUT_INIT;
     @@ builtin/stash.c: static void unstage_changes_unless_new(struct object_id *cache_
      +	state.refresh_cache = 1;
      +	state.istate = &the_index;
      +
     - 	diff_setup(&diff_opts);
     - 	diff_opts.flags.recursive = 1;
     - 	diff_opts.detect_rename = 0;
     -@@ builtin/stash.c: static void unstage_changes_unless_new(struct object_id *cache_tree)
     - 			continue;
     - 		}
     - 		ce = active_cache[pos];
     -+		if (ce_skip_worktree(ce)) {
     + 	/*
     + 	 * Step 1: get a difference between orig_tree (which corresponding
     + 	 * to the index before a merge was run) and the current index
     +@@ builtin/stash.c: static void unstage_changes_unless_new(struct object_id *orig_tree)
     + 				     strlen(p->two->path));
     + 
     + 		/*
     +-		 * Step 2: "unstage" changes, as long as they are still tracked
     ++		 * Step 2: Place changes in the working tree
     ++		 *
     ++		 * Stash is about restoring changes *to the working tree*.
     ++		 * So if the merge successfully got a new version of some
     ++		 * path, but left it out of the working tree, then clear the
     ++		 * SKIP_WORKTREE bit and write it to the working tree.
     ++		 */
     ++		if (pos >= 0 && ce_skip_worktree(active_cache[pos])) {
      +			struct stat st;
     ++
     ++			ce = active_cache[pos];
      +			if (!lstat(ce->name, &st)) {
     ++				/* Conflicting path present; relocate it */
      +				struct strbuf new_path = STRBUF_INIT;
      +				int fd;
      +
     @@ builtin/stash.c: static void unstage_changes_unless_new(struct object_id *cache_
      +				strbuf_release(&new_path);
      +			}
      +			checkout_entry(ce, &state, NULL, NULL);
     ++			ce->ce_flags &= ~CE_SKIP_WORKTREE;
      +		}
      +
     -+		ce->ce_flags &= ~CE_SKIP_WORKTREE;
     ++		/*
     ++		 * Step 3: "unstage" changes, as long as they are still tracked
     + 		 */
       		if (p->one->oid_valid) {
     - 			ce = make_cache_entry(&the_index,
     - 					      p->one->mode,
     + 			/*
     +@@ builtin/stash.c: static void unstage_changes_unless_new(struct object_id *orig_tree)
     + 	diff_flush(&diff_opts);
     + 
     + 	/*
     +-	 * Step 3: write the new index to disk
     ++	 * Step 4: write the new index to disk
     + 	 */
     + 	repo_hold_locked_index(the_repository, &lock, LOCK_DIE_ON_ERROR);
     + 	if (write_locked_index(&the_index, &lock,
      
       ## t/t7012-skip-worktree-writing.sh ##
      @@ t/t7012-skip-worktree-writing.sh: test_expect_success '--ignore-skip-worktree-entries leaves worktree alone' '

-- 
gitgitgadget

  parent reply	other threads:[~2020-12-01 22:26 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-20 16:53 [PATCH " Elijah Newren via GitGitGadget
2020-11-20 16:53 ` [PATCH 1/3] t7012: add a testcase demonstrating stash apply bugs in sparse checkouts Elijah Newren via GitGitGadget
2020-11-20 16:53 ` [PATCH 2/3] stash: remove unnecessary process forking Elijah Newren via GitGitGadget
2020-11-20 16:53 ` [PATCH 3/3] stash: fix stash application in sparse-checkouts Elijah Newren via GitGitGadget
2020-11-21 12:47   ` Chris Torek
2020-11-22  3:47     ` Elijah Newren
2020-11-25 22:14 ` [PATCH 0/3] Fix stash apply in sparse checkouts (and a submodule test) Junio C Hamano
2020-11-26  5:31   ` Elijah Newren
2020-12-01 22:25 ` Elijah Newren via GitGitGadget [this message]
2020-12-01 22:25   ` [PATCH v2 1/3] t7012: add a testcase demonstrating stash apply bugs in sparse checkouts Elijah Newren via GitGitGadget
2020-12-01 22:25   ` [PATCH v2 2/3] stash: remove unnecessary process forking Elijah Newren via GitGitGadget
2020-12-01 23:02     ` Junio C Hamano
2020-12-02 16:09       ` Elijah Newren
2020-12-01 22:25   ` [PATCH v2 3/3] stash: fix stash application in sparse-checkouts Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.919.v2.git.git.1606861519.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=chris.torek@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=matheus.bernardino@usp.br \
    --cc=newren@gmail.com \
    --subject='Re: [PATCH v2 0/3] Fix stash apply in sparse checkouts (and a submodule test)' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).