git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: newren@gmail.com, stolee@gmail.com, gitster@pobox.com,
	Taylor Blau <me@ttaylorr.com>,
	Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH v2 0/6] Sparse Index: Integrate with merge, cherry-pick, rebase, and revert
Date: Tue, 24 Aug 2021 21:52:39 +0000	[thread overview]
Message-ID: <pull.1019.v2.git.1629841965.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1019.git.1629220124.gitgitgadget@gmail.com>

This series integrates the sparse index with commands that perform merges
such as 'git merge', 'git cherry-pick', 'git revert' (free with
cherry-pick), and 'git rebase'.

When the ORT merge strategy is enabled, this allows most merges to succeed
without expanding the sparse index, leading to significant performance
gains. I tested these changes against an internal monorepo with over 2
million paths at HEAD but with a sparse-checkout that only has ~60,000 files
within the sparse-checkout cone. 'git merge' commands went from 5-6 seconds
to 0.750-1.250s.

In the case of the recursive merge strategy, the sparse index is expanded
before the recursive algorithm proceeds. We expect that this is as good as
we can get with that strategy. When the strategy shifts to ORT as the
default, then this will not be a problem except for users who decide to
change the behavior.

Most of the hard work was done by previous series, such as
ds/sparse-index-ignored-files (which this series is based on).


Updates in V2
=============

 * The tests no longer specify GIT_TEST_MERGE_ALGORITHM or directly
   reference "-s ort". By relaxing this condition, I found an issue with
   'git cherry-pick' and 'git rebase' when using the 'recursive' algorithm
   which is fixed in a new patch.

 * Use the pul.twohead config to specify the ORT merge algorithm to avoid
   expanding the sparse index when that is what we are testing.

 * Corrected some misstatements in my commit messages.

Thanks, -Stolee

Derrick Stolee (6):
  diff: ignore sparse paths in diffstat
  merge: make sparse-aware with ORT
  merge-ort: expand only for out-of-cone conflicts
  t1092: add cherry-pick, rebase tests
  sequencer: ensure full index if not ORT strategy
  sparse-index: integrate with cherry-pick and rebase

 builtin/merge.c                          |  3 +
 builtin/rebase.c                         |  6 ++
 builtin/revert.c                         |  3 +
 diff.c                                   |  8 +++
 merge-ort.c                              | 15 ++++
 merge-recursive.c                        |  3 +
 sequencer.c                              |  9 +++
 t/t1092-sparse-checkout-compatibility.sh | 92 +++++++++++++++++++++---
 8 files changed, 129 insertions(+), 10 deletions(-)


base-commit: 8d55a6ba2fdf64cee4eb51f3cb6f9808bd0b7505
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1019%2Fderrickstolee%2Fsparse-index%2Fmerge-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1019/derrickstolee/sparse-index/merge-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1019

Range-diff vs v1:

 1:  7cad9eee90b < -:  ----------- t1092: use ORT merge strategy
 2:  9f50f11d394 ! 1:  c5ae705648c diff: ignore sparse paths in diffstat
     @@ Commit message
          diff: ignore sparse paths in diffstat
      
          The diff_populate_filespec() method is used to describe the diff after a
     -    merge operation is complete, especially when a conflict appears. In
     -    order to avoid expanding a sparse index, the reuse_worktree_file() needs
     -    to be adapted to ignore files that are outside of the sparse-checkout
     -    cone. The file names and OIDs used for this check come from the merged
     -    tree in the case of the ORT strategy, not the index, hence the ability
     -    to look into these paths without having already expanded the index.
     +    merge operation is complete. In order to avoid expanding a sparse index,
     +    the reuse_worktree_file() needs to be adapted to ignore files that are
     +    outside of the sparse-checkout cone. The file names and OIDs used for
     +    this check come from the merged tree in the case of the ORT strategy,
     +    not the index, hence the ability to look into these paths without having
     +    already expanded the index.
     +
     +    The work done by reuse_worktree_file() is only an optimization, and
     +    requires the file being on disk for it to be of any value. Thus, it is
     +    safe to exit the method early if we do not expect the file on disk.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ diff.c
       #ifdef NO_FAST_WORKING_DIRECTORY
       #define FAST_WORKING_DIRECTORY 0
      @@ diff.c: static int reuse_worktree_file(struct index_state *istate,
     - 	if (!FAST_WORKING_DIRECTORY && !want_file && has_object_pack(oid))
     + 	if (!want_file && would_convert_to_git(istate, name))
       		return 0;
       
      +	/*
     @@ diff.c: static int reuse_worktree_file(struct index_state *istate,
      +	if (!path_in_sparse_checkout(name, istate))
      +		return 0;
      +
     - 	/*
     - 	 * Similarly, if we'd have to convert the file contents anyway, that
     - 	 * makes the optimization not worthwhile.
     + 	len = strlen(name);
     + 	pos = index_name_pos(istate, name, len);
     + 	if (pos < 0)
 3:  4c1104a0dd3 ! 2:  bb150483bcf merge: make sparse-aware with ORT
     @@ Commit message
             method. We expect sparse-index users to also have the 'ort' strategy
             enabled.
      
     -    2. If the merge results in a conflicted file, then we expand the index
     -       before updating the working tree. The loop that iterates over the
     -       worktree replaces index entries and tracks 'origintal_cache_nr' which
     -       can become completely wrong if the index expands in the middle of the
     -       operation. This safety valve is important before that loop starts. A
     -       later change will focus this to only expand if we indeed have a
     -       conflict outside of the sparse-checkout cone.
     +    2. With the 'ort' strategy, if the merge results in a conflicted file,
     +       then we expand the index before updating the working tree. The loop
     +       that iterates over the worktree replaces index entries and tracks
     +       'origintal_cache_nr' which can become completely wrong if the index
     +       expands in the middle of the operation. This safety valve is
     +       important before that loop starts. A later change will focus this
     +       to only expand if we indeed have a conflict outside of the
     +       sparse-checkout cone.
     +
     +    3. Other merge strategies are executed as a 'git merge-X' subcommand,
     +       and those strategies are currently protected with the
     +       'command_requires_full_index' guard.
      
          Some test updates are required, including a mistaken 'git checkout -b'
          that did not specify the base branch, causing merges to be fast-forward
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is n
      +	ensure_not_expanded add . &&
      +
      +	ensure_not_expanded checkout -f update-deep &&
     -+	ensure_not_expanded merge -s ort -m merge update-folder1 &&
     -+	ensure_not_expanded merge -s ort -m merge update-folder2
     ++	(
     ++		sane_unset GIT_TEST_MERGE_ALGORITHM &&
     ++		git -C sparse-index config pull.twohead ort &&
     ++		ensure_not_expanded merge -m merge update-folder1 &&
     ++		ensure_not_expanded merge -m merge update-folder2
     ++	)
       '
       
       # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
 4:  e47b15554e3 ! 3:  815b1b1cfbf merge-ort: expand only for out-of-cone conflicts
     @@ Commit message
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## merge-ort.c ##
     -@@
     - #include "tree.h"
     - #include "unpack-trees.h"
     - #include "xdiff-interface.h"
     -+#include "dir.h"
     - 
     - /*
     -  * We have many arrays of size 3.  Whenever we have such an array, the
      @@ merge-ort.c: static int record_conflicted_index_entries(struct merge_options *opt)
       
       	/*
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is e
      +	if test "$1" = "!"
      +	then
      +		shift &&
     -+		(
     -+			GIT_TRACE2_EVENT="$(pwd)/trace2.txt" &&
     -+			GIT_TRACE2_EVENT_NESTING=10 &&
     -+			export GIT_TRACE2_EVENT &&
     -+			export GIT_TRACE2_EVENT_NESTING &&
     -+			test_must_fail git -C sparse-index "$@" || return 1
     -+		)
     ++		test_must_fail env \
     ++			GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
     ++			git -C sparse-index "$@" || return 1
      +	else
      +		GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
      +			git -C sparse-index "$@" || return 1
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is e
       }
       
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is not expanded' '
     - 	ensure_not_expanded merge -s ort -m merge update-folder2
     + 	)
       '
       
      +test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is n
      +		git -C sparse-index commit -a -m "$side" || return 1
      +	done &&
      +
     -+	ensure_not_expanded ! merge -m merged expand-right
     ++	(
     ++		sane_unset GIT_TEST_MERGE_ALGORITHM &&
     ++		git -C sparse-index config pull.twohead ort &&
     ++		ensure_not_expanded ! merge -m merged expand-right
     ++	)
      +'
      +
       # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
 5:  ca23bf38bd9 ! 4:  8032154bc8a t1092: add cherry-pick, rebase tests
     @@ Commit message
          t1092: add cherry-pick, rebase tests
      
          Add tests to check that cherry-pick and rebase behave the same in the
     -    sparse-index case as in the full index cases.
     +    sparse-index case as in the full index cases. These tests are agnostic
     +    to GIT_TEST_MERGE_ALGORITHM, so a full CI test suite will check both the
     +    'ort' and 'recursive' strategies on this test.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'checkout and rese
      -	test_all_match git rev-parse HEAD^{tree} &&
      -	test_all_match git merge -m "folder2" update-folder2 &&
      -	test_all_match git rev-parse HEAD^{tree}
     -+	for OPERATION in "merge -s ort -m merge" cherry-pick rebase
     ++	for OPERATION in "merge -m merge" cherry-pick rebase
      +	do
      +		test_all_match git checkout -B temp update-deep &&
      +		test_all_match git $OPERATION update-folder1 &&
 -:  ----------- > 5:  90ac85500b8 sequencer: ensure full index if not ORT strategy
 6:  350ed86a453 ! 6:  df4bbec744f sparse-index: integrate with cherry-pick and rebase
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'merge with confli
       	init_repos &&
       
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is not expanded' '
     - 	echo >>sparse-index/untracked.txt &&
     - 	ensure_not_expanded add . &&
     - 
     --	ensure_not_expanded checkout -f update-deep &&
     --	ensure_not_expanded merge -s ort -m merge update-folder1 &&
     --	ensure_not_expanded merge -s ort -m merge update-folder2
     -+	for OPERATION in "merge -s ort -m merge" cherry-pick rebase
     -+	do
     -+		ensure_not_expanded checkout -f -B temp update-deep &&
     -+		ensure_not_expanded $OPERATION update-folder1 &&
     -+		ensure_not_expanded $OPERATION update-folder2 || return 1
     -+	done
     + 	(
     + 		sane_unset GIT_TEST_MERGE_ALGORITHM &&
     + 		git -C sparse-index config pull.twohead ort &&
     +-		ensure_not_expanded merge -m merge update-folder1 &&
     +-		ensure_not_expanded merge -m merge update-folder2
     ++		for OPERATION in "merge -m merge" cherry-pick rebase
     ++		do
     ++			ensure_not_expanded merge -m merge update-folder1 &&
     ++			ensure_not_expanded merge -m merge update-folder2 || return 1
     ++		done
     + 	)
       '
       
     - test_expect_success 'sparse-index is not expanded: merge conflict in cone' '

-- 
gitgitgadget

  parent reply	other threads:[~2021-08-24 21:53 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-17 17:08 [PATCH 0/6] Sparse Index: Integrate with merge, cherry-pick, rebase, and revert Derrick Stolee via GitGitGadget
2021-08-17 17:08 ` [PATCH 1/6] t1092: use ORT merge strategy Derrick Stolee via GitGitGadget
2021-08-18 17:16   ` Taylor Blau
2021-08-18 18:10   ` Junio C Hamano
2021-08-18 18:42     ` Derrick Stolee
2021-08-18 22:22       ` Junio C Hamano
2021-08-20 21:23       ` Elijah Newren
2021-08-20 23:32         ` Junio C Hamano
2021-08-21  0:20           ` Elijah Newren
2021-08-21  4:20             ` Junio C Hamano
2021-08-21 23:48               ` Elijah Newren
2021-08-17 17:08 ` [PATCH 2/6] diff: ignore sparse paths in diffstat Derrick Stolee via GitGitGadget
2021-08-20 21:32   ` Elijah Newren
2021-08-24 18:30     ` Derrick Stolee
2021-08-27 22:27       ` Elijah Newren
2021-08-17 17:08 ` [PATCH 3/6] merge: make sparse-aware with ORT Derrick Stolee via GitGitGadget
2021-08-20 21:40   ` Elijah Newren
2021-08-17 17:08 ` [PATCH 4/6] merge-ort: expand only for out-of-cone conflicts Derrick Stolee via GitGitGadget
2021-08-20 21:53   ` Elijah Newren
2021-08-17 17:08 ` [PATCH 5/6] t1092: add cherry-pick, rebase tests Derrick Stolee via GitGitGadget
2021-08-20 21:58   ` Elijah Newren
2021-08-17 17:08 ` [PATCH 6/6] sparse-index: integrate with cherry-pick and rebase Derrick Stolee via GitGitGadget
2021-08-21  0:12   ` Elijah Newren
2021-08-24 21:52 ` Derrick Stolee via GitGitGadget [this message]
2021-08-24 21:52   ` [PATCH v2 1/6] diff: ignore sparse paths in diffstat Derrick Stolee via GitGitGadget
2021-08-24 21:52   ` [PATCH v2 2/6] merge: make sparse-aware with ORT Derrick Stolee via GitGitGadget
2021-08-27 22:43     ` Elijah Newren
2021-08-30 17:18       ` Derrick Stolee
2021-09-08  1:49         ` Derrick Stolee
2021-08-24 21:52   ` [PATCH v2 3/6] merge-ort: expand only for out-of-cone conflicts Derrick Stolee via GitGitGadget
2021-08-27 22:47     ` Elijah Newren
2021-08-30 17:21       ` Derrick Stolee
2021-08-24 21:52   ` [PATCH v2 4/6] t1092: add cherry-pick, rebase tests Derrick Stolee via GitGitGadget
2021-08-24 21:52   ` [PATCH v2 5/6] sequencer: ensure full index if not ORT strategy Derrick Stolee via GitGitGadget
2021-08-24 21:52   ` [PATCH v2 6/6] sparse-index: integrate with cherry-pick and rebase Derrick Stolee via GitGitGadget
2021-08-27 22:56   ` [PATCH v2 0/6] Sparse Index: Integrate with merge, cherry-pick, rebase, and revert Elijah Newren
2021-08-30 17:26     ` Derrick Stolee
2021-09-08 11:23   ` [PATCH v3 " Derrick Stolee via GitGitGadget
2021-09-08 11:23     ` [PATCH v3 1/6] diff: ignore sparse paths in diffstat Derrick Stolee via GitGitGadget
2021-09-08 11:23     ` [PATCH v3 2/6] merge: make sparse-aware with ORT Derrick Stolee via GitGitGadget
2021-09-08 11:23     ` [PATCH v3 3/6] merge-ort: expand only for out-of-cone conflicts Derrick Stolee via GitGitGadget
2021-09-08 11:23     ` [PATCH v3 4/6] t1092: add cherry-pick, rebase tests Derrick Stolee via GitGitGadget
2021-09-08 11:24     ` [PATCH v3 5/6] sequencer: ensure full index if not ORT strategy Derrick Stolee via GitGitGadget
2021-09-08 11:24     ` [PATCH v3 6/6] sparse-index: integrate with cherry-pick and rebase Derrick Stolee via GitGitGadget
2021-09-09 14:16     ` [PATCH v3 0/6] Sparse Index: Integrate with merge, cherry-pick, rebase, and revert Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.1019.v2.git.1629841965.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).