git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] Sparse Index: diff and blame builtins
@ 2021-10-14 17:25 Lessley Dennington via GitGitGadget
  2021-10-14 17:25 ` [PATCH 1/2] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
                   ` (2 more replies)
  0 siblings, 3 replies; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-10-14 17:25 UTC (permalink / raw)
  To: git; +Cc: stolee, gitster, newren, Lessley Dennington

This series is based on vd/sparse-reset. It integrates the sparse index with
git diff and git blame and includes:

 1. tests added to t1092 and p2000 to establish the baseline functionality
    of the commands
 2. repository settings to enable the sparse index

The p2000 tests demonstrate a ~30% execution time reduction for 'git diff'
and a ~75% execution time reduction for 'git diff --staged' using a sparse
index. For 'git blame', the reduction time was ~60% for a file two levels
deep and ~30% for a file three levels deep.

Test                                         before  after
----------------------------------------------------------------
2000.30: git diff (full-v3)                  0.37    0.36 -2.7%
2000.31: git diff (full-v4)                  0.36    0.35 -2.8%
2000.32: git diff (sparse-v3)                0.46    0.30 -34.8%
2000.33: git diff (sparse-v4)                0.43    0.31 -27.9%
2000.34: git diff --staged (full-v3)         0.08    0.08 +0.0%
2000.35: git diff --staged (full-v4)         0.08    0.08 +0.0%
2000.36: git diff --staged (sparse-v3)       0.17    0.04 -76.5%
2000.37: git diff --staged (sparse-v4)       0.16    0.04 -75.0%
2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%


Thanks, Lessley

Lessley Dennington (2):
  diff: enable and test the sparse index
  blame: enable and test the sparse index

 builtin/blame.c                          |  3 ++
 builtin/diff.c                           |  3 ++
 t/perf/p2000-sparse-operations.sh        |  4 ++
 t/t1092-sparse-checkout-compatibility.sh | 66 +++++++++++++++++++++---
 4 files changed, 69 insertions(+), 7 deletions(-)


base-commit: 57049e844c80d5fe1394e6d65b28705eabfd6585
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1050%2Fldennington%2Fdiff-blame-sparse-index-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1050/ldennington/diff-blame-sparse-index-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1050
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH 1/2] diff: enable and test the sparse index
  2021-10-14 17:25 [PATCH 0/2] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
@ 2021-10-14 17:25 ` Lessley Dennington via GitGitGadget
  2021-10-15 16:46   ` Derrick Stolee
  2021-10-14 17:25 ` [PATCH 2/2] blame: " Lessley Dennington via GitGitGadget
  2021-10-15 21:20 ` [PATCH v2 0/2] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
  2 siblings, 1 reply; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-10-14 17:25 UTC (permalink / raw)
  To: git; +Cc: stolee, gitster, newren, Lessley Dennington, Lessley Dennington

From: Lessley Dennington <lessleydennington@gmail.com>

Enable the sparse index within the 'git diff' command. Its implementation
already safely integrates with the sparse index because it shares code with
the 'git status' and 'git checkout' commands that were already integrated.
The most interesting thing to do is to add tests that verify that 'git diff'
behaves correctly when the sparse index is enabled. These cases are:

1. The index is not expanded for 'diff' and 'diff --staged'
2. 'diff' and 'diff --staged' behave the same in full checkout, sparse
checkout, and sparse index repositories in the following partially-staged
scenarios (i.e. the index, HEAD, and working directory differ at a given
path):
    1. Path is within sparse-checkout cone
    2. Path is outside sparse-checkout cone
    3. A merge conflict exists for paths outside sparse-checkout cone

The `p2000` tests demonstrate a ~30% execution time reduction for 'git
diff' and a ~75% execution time reduction for 'git diff --staged' using a
sparse index:

Test                                      before  after
-------------------------------------------------------------
2000.30: git diff (full-v3)               0.37    0.36 -2.7%
2000.31: git diff (full-v4)               0.36    0.35 -2.8%
2000.32: git diff (sparse-v3)             0.46    0.30 -34.8%
2000.33: git diff (sparse-v4)             0.43    0.31 -27.9%
2000.34: git diff --staged (full-v3)      0.08    0.08 +0.0%
2000.35: git diff --staged (full-v4)      0.08    0.08 +0.0%
2000.36: git diff --staged (sparse-v3)    0.17    0.04 -76.5%
2000.37: git diff --staged (sparse-v4)    0.16    0.04 -75.0%

Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 builtin/diff.c                           |  3 ++
 t/perf/p2000-sparse-operations.sh        |  2 ++
 t/t1092-sparse-checkout-compatibility.sh | 42 ++++++++++++++++++++++++
 3 files changed, 47 insertions(+)

diff --git a/builtin/diff.c b/builtin/diff.c
index dd8ce688ba7..cbf7b51c7c0 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -437,6 +437,9 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
 
 	prefix = setup_git_directory_gently(&nongit);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	if (!no_index) {
 		/*
 		 * Treat git diff with at least one path outside of the
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index bfd332120c8..bff93f16e93 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -113,5 +113,7 @@ test_perf_on_all git checkout -f -
 test_perf_on_all git reset
 test_perf_on_all git reset --hard
 test_perf_on_all git reset -- does-not-exist
+test_perf_on_all git diff
+test_perf_on_all git diff --staged
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index f19c1b3e2eb..1070bff1a83 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -386,6 +386,43 @@ test_expect_success 'diff --staged' '
 	test_all_match git diff --staged
 '
 
+test_expect_success 'diff partially-staged' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	# Add file within cone
+	test_all_match git sparse-checkout set deep &&
+	run_on_all ../edit-contents deep/testfile &&
+	test_all_match git add deep/testfile &&
+	run_on_all ../edit-contents deep/testfile &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged &&
+
+	# Add file outside cone
+	test_all_match git reset --hard &&
+	run_on_all mkdir newdirectory &&
+	run_on_all ../edit-contents newdirectory/testfile &&
+	test_all_match git sparse-checkout set newdirectory &&
+	test_all_match git add newdirectory/testfile &&
+	run_on_all ../edit-contents newdirectory/testfile &&
+	test_all_match git sparse-checkout set &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged &&
+
+	# Merge conflict outside cone
+	test_all_match git reset --hard &&
+	test_all_match git checkout merge-left &&
+	test_all_match test_must_fail git merge merge-right &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged
+'
+
 # NEEDSWORK: sparse-checkout behaves differently from full-checkout when
 # running this test with 'df-conflict-2' after 'df-conflict-1'.
 test_expect_success 'diff with renames and conflicts' '
@@ -800,6 +837,11 @@ test_expect_success 'sparse-index is not expanded' '
 	# Wildcard identifies only full sparse directories, no index expansion
 	ensure_not_expanded reset deepest -- folder\* &&
 
+	echo a test change >>sparse-index/README.md &&
+	ensure_not_expanded diff &&
+	git -C sparse-index add README.md &&
+	ensure_not_expanded diff --staged &&
+
 	ensure_not_expanded checkout -f update-deep &&
 	test_config -C sparse-index pull.twohead ort &&
 	(
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH 2/2] blame: enable and test the sparse index
  2021-10-14 17:25 [PATCH 0/2] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
  2021-10-14 17:25 ` [PATCH 1/2] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
@ 2021-10-14 17:25 ` Lessley Dennington via GitGitGadget
  2021-11-23  7:57   ` Elijah Newren
  2021-10-15 21:20 ` [PATCH v2 0/2] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
  2 siblings, 1 reply; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-10-14 17:25 UTC (permalink / raw)
  To: git; +Cc: stolee, gitster, newren, Lessley Dennington, Lessley Dennington

From: Lessley Dennington <lessleydennington@gmail.com>

Enable the sparse index for the 'git blame' command. The index was already
not expanded with this command, so the most interesting thing to do is to
add tests that verify that 'git blame' behaves correctly when the sparse
index is enabled and that its performance improves. More specifically, these
cases are:

1. The index is not expanded for 'blame' when given paths in the sparse
checkout cone at multiple levels.

2. Performance measurably improves for 'blame' with sparse index when given
paths in the sparse checkout cone at multiple levels.

The `p2000` tests demonstrate a ~60% execution time reduction when running
'blame' for a file two levels deep and and a ~30% execution time reduction
for a file three levels deep.

Test                                         before  after
----------------------------------------------------------------
2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%

We do not include paths outside the sparse checkout cone because blame
currently does not support blaming files outside of the sparse definition.
Attempting to do so fails with the following error:

fatal: no such path '<path outside sparse definition>' in HEAD

Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 builtin/blame.c                          |  3 +++
 t/perf/p2000-sparse-operations.sh        |  2 ++
 t/t1092-sparse-checkout-compatibility.sh | 24 +++++++++++++++++-------
 3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 641523ff9af..af3d81e2bd4 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -902,6 +902,9 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 	long anchor;
 	const int hexsz = the_hash_algo->hexsz;
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	setup_default_color_by_age();
 	git_config(git_blame_config, &output_option);
 	repo_init_revisions(the_repository, &revs, NULL);
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index bff93f16e93..9ac76a049b8 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -115,5 +115,7 @@ test_perf_on_all git reset --hard
 test_perf_on_all git reset -- does-not-exist
 test_perf_on_all git diff
 test_perf_on_all git diff --staged
+test_perf_on_all git blame $SPARSE_CONE/a
+test_perf_on_all git blame $SPARSE_CONE/f3/a
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 1070bff1a83..54826e858a9 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -485,15 +485,16 @@ test_expect_success 'blame with pathspec inside sparse definition' '
 	test_all_match git blame deep/deeper1/deepest/a
 '
 
-# TODO: blame currently does not support blaming files outside of the
-# sparse definition. It complains that the file doesn't exist locally.
-test_expect_failure 'blame with pathspec outside sparse definition' '
+# Blame does not support blaming files outside of the sparse
+# definition, so we verify this scenario.
+test_expect_success 'blame with pathspec outside sparse definition' '
 	init_repos &&
 
-	test_all_match git blame folder1/a &&
-	test_all_match git blame folder2/a &&
-	test_all_match git blame deep/deeper2/a &&
-	test_all_match git blame deep/deeper2/deepest/a
+	test_sparse_match git sparse-checkout set &&
+	test_sparse_match test_must_fail git blame folder1/a &&
+	test_sparse_match test_must_fail git blame folder2/a &&
+	test_sparse_match test_must_fail git blame deep/deeper2/a &&
+	test_sparse_match test_must_fail git blame deep/deeper2/deepest/a
 '
 
 test_expect_success 'checkout and reset (mixed)' '
@@ -871,6 +872,15 @@ test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
 	)
 '
 
+test_expect_success 'sparse index is not expanded: blame' '
+	init_repos &&
+
+	ensure_not_expanded blame a &&
+	ensure_not_expanded blame deep/a &&
+	ensure_not_expanded blame deep/deeper1/a &&
+	ensure_not_expanded blame deep/deeper1/deepest/a
+'
+
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
 # in this scenario, but it shouldn't.
 test_expect_success 'reset mixed and checkout orphan' '
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/2] diff: enable and test the sparse index
  2021-10-14 17:25 ` [PATCH 1/2] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
@ 2021-10-15 16:46   ` Derrick Stolee
  0 siblings, 0 replies; 56+ messages in thread
From: Derrick Stolee @ 2021-10-15 16:46 UTC (permalink / raw)
  To: Lessley Dennington via GitGitGadget, git
  Cc: gitster, newren, Lessley Dennington

On 10/14/2021 1:25 PM, Lessley Dennington via GitGitGadget wrote:

There is a failure in 'seen', and it's due to a subtle reason
that we didn't catch in gitgitgadget PR builds. It's because
ds/add-rm-with-sparse-index wasn't in your history until it was
merged into 'seen'.

> +test_expect_success 'diff partially-staged' '
> +	init_repos &&
> +
> +	write_script edit-contents <<-\EOF &&
> +	echo text >>$1
> +	EOF
> +
> +	# Add file within cone
> +	test_all_match git sparse-checkout set deep &&

The root cause is that you should use "test_sparse_match" when
adjusting the sparse-checkout definition. The full-checkout repo
is getting the sparse-checkout set to a single pattern "deep",
but without cone mode.

> +	run_on_all ../edit-contents deep/testfile &&
> +	test_all_match git add deep/testfile &&

But the test fails here because "deep/testfile" doesn't match
the sparse-checkout definition in the full-checkout repo.

> +	run_on_all ../edit-contents deep/testfile &&
> +
> +	test_all_match git diff &&
> +	test_all_match git diff --staged &&
> +
> +	# Add file outside cone
> +	test_all_match git reset --hard &&
> +	run_on_all mkdir newdirectory &&
> +	run_on_all ../edit-contents newdirectory/testfile &&
> +	test_all_match git sparse-checkout set newdirectory &&

You'll want to change this line, too.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 0/2] Sparse Index: diff and blame builtins
  2021-10-14 17:25 [PATCH 0/2] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
  2021-10-14 17:25 ` [PATCH 1/2] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
  2021-10-14 17:25 ` [PATCH 2/2] blame: " Lessley Dennington via GitGitGadget
@ 2021-10-15 21:20 ` Lessley Dennington via GitGitGadget
  2021-10-15 21:20   ` [PATCH v2 1/2] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
                     ` (2 more replies)
  2 siblings, 3 replies; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-10-15 21:20 UTC (permalink / raw)
  To: git; +Cc: stolee, gitster, newren, Lessley Dennington

This series is based on vd/sparse-reset. It integrates the sparse index with
git diff and git blame and includes:

 1. tests added to t1092 and p2000 to establish the baseline functionality
    of the commands
 2. repository settings to enable the sparse index

The p2000 tests demonstrate a ~30% execution time reduction for 'git diff'
and a ~75% execution time reduction for 'git diff --staged' using a sparse
index. For 'git blame', the reduction time was ~60% for a file two levels
deep and ~30% for a file three levels deep.

Test                                         before  after
----------------------------------------------------------------
2000.30: git diff (full-v3)                  0.37    0.36 -2.7%
2000.31: git diff (full-v4)                  0.36    0.35 -2.8%
2000.32: git diff (sparse-v3)                0.46    0.30 -34.8%
2000.33: git diff (sparse-v4)                0.43    0.31 -27.9%
2000.34: git diff --staged (full-v3)         0.08    0.08 +0.0%
2000.35: git diff --staged (full-v4)         0.08    0.08 +0.0%
2000.36: git diff --staged (sparse-v3)       0.17    0.04 -76.5%
2000.37: git diff --staged (sparse-v4)       0.16    0.04 -75.0%
2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%



Changes since V1
================

 * Fix failing diff partially-staged test in
   t1092-sparse-checkout-compatibility.sh, which was breaking in seen.

Thanks, Lessley

Lessley Dennington (2):
  diff: enable and test the sparse index
  blame: enable and test the sparse index

 builtin/blame.c                          |  3 ++
 builtin/diff.c                           |  3 ++
 t/perf/p2000-sparse-operations.sh        |  4 ++
 t/t1092-sparse-checkout-compatibility.sh | 69 +++++++++++++++++++++---
 4 files changed, 72 insertions(+), 7 deletions(-)


base-commit: 57049e844c80d5fe1394e6d65b28705eabfd6585
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1050%2Fldennington%2Fdiff-blame-sparse-index-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1050/ldennington/diff-blame-sparse-index-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1050

Range-diff vs v1:

 1:  9a597233cf4 ! 1:  ac33159d020 diff: enable and test the sparse index
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'diff --staged' '
      +	EOF
      +
      +	# Add file within cone
     -+	test_all_match git sparse-checkout set deep &&
     ++	test_sparse_match git sparse-checkout set deep &&
      +	run_on_all ../edit-contents deep/testfile &&
      +	test_all_match git add deep/testfile &&
      +	run_on_all ../edit-contents deep/testfile &&
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'diff --staged' '
      +	test_all_match git reset --hard &&
      +	run_on_all mkdir newdirectory &&
      +	run_on_all ../edit-contents newdirectory/testfile &&
     -+	test_all_match git sparse-checkout set newdirectory &&
     ++	test_sparse_match git sparse-checkout set newdirectory &&
      +	test_all_match git add newdirectory/testfile &&
      +	run_on_all ../edit-contents newdirectory/testfile &&
     -+	test_all_match git sparse-checkout set &&
     ++	test_sparse_match git sparse-checkout set &&
      +
      +	test_all_match git diff &&
      +	test_all_match git diff --staged &&
      +
      +	# Merge conflict outside cone
     -+	test_all_match git reset --hard &&
     ++	# The sparse checkout will report a warning that is not in the
     ++	# full checkout, so we use `run_on_all` instead of
     ++	# `test_all_match`
     ++	run_on_all git reset --hard &&
      +	test_all_match git checkout merge-left &&
      +	test_all_match test_must_fail git merge merge-right &&
      +
 2:  ddcee003c92 = 2:  a0b6a152c75 blame: enable and test the sparse index

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 1/2] diff: enable and test the sparse index
  2021-10-15 21:20 ` [PATCH v2 0/2] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
@ 2021-10-15 21:20   ` Lessley Dennington via GitGitGadget
  2021-10-25 20:47     ` Taylor Blau
  2021-10-15 21:20   ` [PATCH v2 2/2] blame: " Lessley Dennington via GitGitGadget
  2021-11-01 21:27   ` [PATCH v3 0/2] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
  2 siblings, 1 reply; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-10-15 21:20 UTC (permalink / raw)
  To: git; +Cc: stolee, gitster, newren, Lessley Dennington, Lessley Dennington

From: Lessley Dennington <lessleydennington@gmail.com>

Enable the sparse index within the 'git diff' command. Its implementation
already safely integrates with the sparse index because it shares code with
the 'git status' and 'git checkout' commands that were already integrated.
The most interesting thing to do is to add tests that verify that 'git diff'
behaves correctly when the sparse index is enabled. These cases are:

1. The index is not expanded for 'diff' and 'diff --staged'
2. 'diff' and 'diff --staged' behave the same in full checkout, sparse
checkout, and sparse index repositories in the following partially-staged
scenarios (i.e. the index, HEAD, and working directory differ at a given
path):
    1. Path is within sparse-checkout cone
    2. Path is outside sparse-checkout cone
    3. A merge conflict exists for paths outside sparse-checkout cone

The `p2000` tests demonstrate a ~30% execution time reduction for 'git
diff' and a ~75% execution time reduction for 'git diff --staged' using a
sparse index:

Test                                      before  after
-------------------------------------------------------------
2000.30: git diff (full-v3)               0.37    0.36 -2.7%
2000.31: git diff (full-v4)               0.36    0.35 -2.8%
2000.32: git diff (sparse-v3)             0.46    0.30 -34.8%
2000.33: git diff (sparse-v4)             0.43    0.31 -27.9%
2000.34: git diff --staged (full-v3)      0.08    0.08 +0.0%
2000.35: git diff --staged (full-v4)      0.08    0.08 +0.0%
2000.36: git diff --staged (sparse-v3)    0.17    0.04 -76.5%
2000.37: git diff --staged (sparse-v4)    0.16    0.04 -75.0%

Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 builtin/diff.c                           |  3 ++
 t/perf/p2000-sparse-operations.sh        |  2 ++
 t/t1092-sparse-checkout-compatibility.sh | 45 ++++++++++++++++++++++++
 3 files changed, 50 insertions(+)

diff --git a/builtin/diff.c b/builtin/diff.c
index dd8ce688ba7..cbf7b51c7c0 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -437,6 +437,9 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
 
 	prefix = setup_git_directory_gently(&nongit);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	if (!no_index) {
 		/*
 		 * Treat git diff with at least one path outside of the
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index bfd332120c8..bff93f16e93 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -113,5 +113,7 @@ test_perf_on_all git checkout -f -
 test_perf_on_all git reset
 test_perf_on_all git reset --hard
 test_perf_on_all git reset -- does-not-exist
+test_perf_on_all git diff
+test_perf_on_all git diff --staged
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index f19c1b3e2eb..e5d15be9d45 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -386,6 +386,46 @@ test_expect_success 'diff --staged' '
 	test_all_match git diff --staged
 '
 
+test_expect_success 'diff partially-staged' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	# Add file within cone
+	test_sparse_match git sparse-checkout set deep &&
+	run_on_all ../edit-contents deep/testfile &&
+	test_all_match git add deep/testfile &&
+	run_on_all ../edit-contents deep/testfile &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged &&
+
+	# Add file outside cone
+	test_all_match git reset --hard &&
+	run_on_all mkdir newdirectory &&
+	run_on_all ../edit-contents newdirectory/testfile &&
+	test_sparse_match git sparse-checkout set newdirectory &&
+	test_all_match git add newdirectory/testfile &&
+	run_on_all ../edit-contents newdirectory/testfile &&
+	test_sparse_match git sparse-checkout set &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged &&
+
+	# Merge conflict outside cone
+	# The sparse checkout will report a warning that is not in the
+	# full checkout, so we use `run_on_all` instead of
+	# `test_all_match`
+	run_on_all git reset --hard &&
+	test_all_match git checkout merge-left &&
+	test_all_match test_must_fail git merge merge-right &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged
+'
+
 # NEEDSWORK: sparse-checkout behaves differently from full-checkout when
 # running this test with 'df-conflict-2' after 'df-conflict-1'.
 test_expect_success 'diff with renames and conflicts' '
@@ -800,6 +840,11 @@ test_expect_success 'sparse-index is not expanded' '
 	# Wildcard identifies only full sparse directories, no index expansion
 	ensure_not_expanded reset deepest -- folder\* &&
 
+	echo a test change >>sparse-index/README.md &&
+	ensure_not_expanded diff &&
+	git -C sparse-index add README.md &&
+	ensure_not_expanded diff --staged &&
+
 	ensure_not_expanded checkout -f update-deep &&
 	test_config -C sparse-index pull.twohead ort &&
 	(
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 2/2] blame: enable and test the sparse index
  2021-10-15 21:20 ` [PATCH v2 0/2] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
  2021-10-15 21:20   ` [PATCH v2 1/2] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
@ 2021-10-15 21:20   ` Lessley Dennington via GitGitGadget
  2021-10-25 20:53     ` Taylor Blau
  2021-11-01 21:27   ` [PATCH v3 0/2] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
  2 siblings, 1 reply; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-10-15 21:20 UTC (permalink / raw)
  To: git; +Cc: stolee, gitster, newren, Lessley Dennington, Lessley Dennington

From: Lessley Dennington <lessleydennington@gmail.com>

Enable the sparse index for the 'git blame' command. The index was already
not expanded with this command, so the most interesting thing to do is to
add tests that verify that 'git blame' behaves correctly when the sparse
index is enabled and that its performance improves. More specifically, these
cases are:

1. The index is not expanded for 'blame' when given paths in the sparse
checkout cone at multiple levels.

2. Performance measurably improves for 'blame' with sparse index when given
paths in the sparse checkout cone at multiple levels.

The `p2000` tests demonstrate a ~60% execution time reduction when running
'blame' for a file two levels deep and and a ~30% execution time reduction
for a file three levels deep.

Test                                         before  after
----------------------------------------------------------------
2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%

We do not include paths outside the sparse checkout cone because blame
currently does not support blaming files outside of the sparse definition.
Attempting to do so fails with the following error:

fatal: no such path '<path outside sparse definition>' in HEAD

Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 builtin/blame.c                          |  3 +++
 t/perf/p2000-sparse-operations.sh        |  2 ++
 t/t1092-sparse-checkout-compatibility.sh | 24 +++++++++++++++++-------
 3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 641523ff9af..af3d81e2bd4 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -902,6 +902,9 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 	long anchor;
 	const int hexsz = the_hash_algo->hexsz;
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	setup_default_color_by_age();
 	git_config(git_blame_config, &output_option);
 	repo_init_revisions(the_repository, &revs, NULL);
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index bff93f16e93..9ac76a049b8 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -115,5 +115,7 @@ test_perf_on_all git reset --hard
 test_perf_on_all git reset -- does-not-exist
 test_perf_on_all git diff
 test_perf_on_all git diff --staged
+test_perf_on_all git blame $SPARSE_CONE/a
+test_perf_on_all git blame $SPARSE_CONE/f3/a
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index e5d15be9d45..960ccf2d150 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -488,15 +488,16 @@ test_expect_success 'blame with pathspec inside sparse definition' '
 	test_all_match git blame deep/deeper1/deepest/a
 '
 
-# TODO: blame currently does not support blaming files outside of the
-# sparse definition. It complains that the file doesn't exist locally.
-test_expect_failure 'blame with pathspec outside sparse definition' '
+# Blame does not support blaming files outside of the sparse
+# definition, so we verify this scenario.
+test_expect_success 'blame with pathspec outside sparse definition' '
 	init_repos &&
 
-	test_all_match git blame folder1/a &&
-	test_all_match git blame folder2/a &&
-	test_all_match git blame deep/deeper2/a &&
-	test_all_match git blame deep/deeper2/deepest/a
+	test_sparse_match git sparse-checkout set &&
+	test_sparse_match test_must_fail git blame folder1/a &&
+	test_sparse_match test_must_fail git blame folder2/a &&
+	test_sparse_match test_must_fail git blame deep/deeper2/a &&
+	test_sparse_match test_must_fail git blame deep/deeper2/deepest/a
 '
 
 test_expect_success 'checkout and reset (mixed)' '
@@ -874,6 +875,15 @@ test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
 	)
 '
 
+test_expect_success 'sparse index is not expanded: blame' '
+	init_repos &&
+
+	ensure_not_expanded blame a &&
+	ensure_not_expanded blame deep/a &&
+	ensure_not_expanded blame deep/deeper1/a &&
+	ensure_not_expanded blame deep/deeper1/deepest/a
+'
+
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
 # in this scenario, but it shouldn't.
 test_expect_success 'reset mixed and checkout orphan' '
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 1/2] diff: enable and test the sparse index
  2021-10-15 21:20   ` [PATCH v2 1/2] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
@ 2021-10-25 20:47     ` Taylor Blau
  2021-10-26 16:10       ` Lessley Dennington
  0 siblings, 1 reply; 56+ messages in thread
From: Taylor Blau @ 2021-10-25 20:47 UTC (permalink / raw)
  To: Lessley Dennington via GitGitGadget
  Cc: git, stolee, gitster, newren, Lessley Dennington

On Fri, Oct 15, 2021 at 09:20:34PM +0000, Lessley Dennington via GitGitGadget wrote:
> From: Lessley Dennington <lessleydennington@gmail.com>
>
> Enable the sparse index within the 'git diff' command. Its implementation
> already safely integrates with the sparse index because it shares code with
> the 'git status' and 'git checkout' commands that were already integrated.

Good, it looks like most of the heavy-lifting to make `git diff` work
with the sparse index was already done elsewhere.

It may be helpful here to include either one of two things to help
readers and reviewers understand what's going on:

  - A summary of what `git status` and/or `git checkout` does to work
    with the sparse index.
  - Or the patches which make those commands work with the sparse index
    so that readers can refer back to them.

Having either of those would help readers who are unfamiliar with
builtin/diff.c convince themselves more easily that setting
'command_requires_full_index = 0' is all that's needed here.

> The most interesting thing to do is to add tests that verify that 'git diff'
> behaves correctly when the sparse index is enabled. These cases are:
>
> 1. The index is not expanded for 'diff' and 'diff --staged'
> 2. 'diff' and 'diff --staged' behave the same in full checkout, sparse
> checkout, and sparse index repositories in the following partially-staged
> scenarios (i.e. the index, HEAD, and working directory differ at a given
> path):
>     1. Path is within sparse-checkout cone
>     2. Path is outside sparse-checkout cone
>     3. A merge conflict exists for paths outside sparse-checkout cone

Nice, these are all of the test cases that I would expect to demonstrate
interesting behavior.

> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index f19c1b3e2eb..e5d15be9d45 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -386,6 +386,46 @@ test_expect_success 'diff --staged' '
>  	test_all_match git diff --staged
>  '
>
> +test_expect_success 'diff partially-staged' '
> +	init_repos &&
> +
> +	write_script edit-contents <<-\EOF &&
> +	echo text >>$1
> +	EOF
> +
> +	# Add file within cone
> +	test_sparse_match git sparse-checkout set deep &&
> +	run_on_all ../edit-contents deep/testfile &&
> +	test_all_match git add deep/testfile &&
> +	run_on_all ../edit-contents deep/testfile &&
> +
> +	test_all_match git diff &&
> +	test_all_match git diff --staged &&
> +
> +	# Add file outside cone
> +	test_all_match git reset --hard &&
> +	run_on_all mkdir newdirectory &&
> +	run_on_all ../edit-contents newdirectory/testfile &&
> +	test_sparse_match git sparse-checkout set newdirectory &&
> +	test_all_match git add newdirectory/testfile &&
> +	run_on_all ../edit-contents newdirectory/testfile &&
> +	test_sparse_match git sparse-checkout set &&
> +
> +	test_all_match git diff &&
> +	test_all_match git diff --staged &&
> +
> +	# Merge conflict outside cone
> +	# The sparse checkout will report a warning that is not in the
> +	# full checkout, so we use `run_on_all` instead of
> +	# `test_all_match`
> +	run_on_all git reset --hard &&
> +	test_all_match git checkout merge-left &&
> +	test_all_match test_must_fail git merge merge-right &&
> +
> +	test_all_match git diff &&
> +	test_all_match git diff --staged
> +'
> +
>  # NEEDSWORK: sparse-checkout behaves differently from full-checkout when
>  # running this test with 'df-conflict-2' after 'df-conflict-1'.
>  test_expect_success 'diff with renames and conflicts' '
> @@ -800,6 +840,11 @@ test_expect_success 'sparse-index is not expanded' '
>  	# Wildcard identifies only full sparse directories, no index expansion
>  	ensure_not_expanded reset deepest -- folder\* &&
>
> +	echo a test change >>sparse-index/README.md &&
> +	ensure_not_expanded diff &&

Thinking aloud here as somebody who is unfamiliar with the sparse-index
tests. ensure_not_expanded relies on the existence of the "sparse-index"
repository, and its top-level README.md is outside of the
sparse-checkout cone.

That makes sense, and when I create a repository with a file outside of
the sparse-checkout cone and then run `git diff`, I see no changes as
expected.

But isn't the top-level directory always part of the cone? If so, I
think that what this (and the below test) is demonstrating is that we
can show changes inside of the cone without expanding the sparse-index.

Having that test makes absolute sense to me. But I think it might also
make sense to have a test that creates some directory structure outside
of the cone, modifies it, and then ensures that both (a) those changes
aren't visible to `git diff` when the sparse-checkout is active and (b)
that running `git diff` doesn't cause the sparse-index to be expanded.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 2/2] blame: enable and test the sparse index
  2021-10-15 21:20   ` [PATCH v2 2/2] blame: " Lessley Dennington via GitGitGadget
@ 2021-10-25 20:53     ` Taylor Blau
  2021-10-26 16:17       ` Lessley Dennington
  0 siblings, 1 reply; 56+ messages in thread
From: Taylor Blau @ 2021-10-25 20:53 UTC (permalink / raw)
  To: Lessley Dennington via GitGitGadget
  Cc: git, stolee, gitster, newren, Lessley Dennington

On Fri, Oct 15, 2021 at 09:20:35PM +0000, Lessley Dennington via GitGitGadget wrote:
> From: Lessley Dennington <lessleydennington@gmail.com>
>
> Enable the sparse index for the 'git blame' command. The index was already
> not expanded with this command, so the most interesting thing to do is to
> add tests that verify that 'git blame' behaves correctly when the sparse
> index is enabled and that its performance improves. More specifically, these
> cases are:
>
> 1. The index is not expanded for 'blame' when given paths in the sparse
> checkout cone at multiple levels.
>
> 2. Performance measurably improves for 'blame' with sparse index when given
> paths in the sparse checkout cone at multiple levels.
>
> The `p2000` tests demonstrate a ~60% execution time reduction when running
> 'blame' for a file two levels deep and and a ~30% execution time reduction
> for a file three levels deep.

Eek. What's eating up the other 30% when we have to open up another
layer of trees?

>
> Test                                         before  after
> ----------------------------------------------------------------
> 2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
> 2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
> 2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
> 2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
> 2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
> 2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
> 2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
> 2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%
>
> We do not include paths outside the sparse checkout cone because blame
> currently does not support blaming files outside of the sparse definition.
> Attempting to do so fails with the following error:
>
> fatal: no such path '<path outside sparse definition>' in HEAD.

Small nit; this error message should be indented with a couple of space
characters to indicate that it's the output of running Git instead of
part of your patch message. Not worth a reroll on its own, but something
to keep in mind for your many future patches :).

>
> Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
> ---
>  builtin/blame.c                          |  3 +++
>  t/perf/p2000-sparse-operations.sh        |  2 ++
>  t/t1092-sparse-checkout-compatibility.sh | 24 +++++++++++++++++-------
>  3 files changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/builtin/blame.c b/builtin/blame.c
> index 641523ff9af..af3d81e2bd4 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -902,6 +902,9 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
>  	long anchor;
>  	const int hexsz = the_hash_algo->hexsz;
>
> +	prepare_repo_settings(the_repository);
> +	the_repository->settings.command_requires_full_index = 0;
> +

By now we're quite used to seeing this ;). Makes sense to me.

>  	setup_default_color_by_age();
>  	git_config(git_blame_config, &output_option);
>  	repo_init_revisions(the_repository, &revs, NULL);
> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> index bff93f16e93..9ac76a049b8 100755
> --- a/t/perf/p2000-sparse-operations.sh
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -115,5 +115,7 @@ test_perf_on_all git reset --hard
>  test_perf_on_all git reset -- does-not-exist
>  test_perf_on_all git diff
>  test_perf_on_all git diff --staged
> +test_perf_on_all git blame $SPARSE_CONE/a
> +test_perf_on_all git blame $SPARSE_CONE/f3/a

Good.

>  test_done
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index e5d15be9d45..960ccf2d150 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -488,15 +488,16 @@ test_expect_success 'blame with pathspec inside sparse definition' '
>  	test_all_match git blame deep/deeper1/deepest/a
>  '
>
> -# TODO: blame currently does not support blaming files outside of the
> -# sparse definition. It complains that the file doesn't exist locally.
> -test_expect_failure 'blame with pathspec outside sparse definition' '
> +# Blame does not support blaming files outside of the sparse
> +# definition, so we verify this scenario.
> +test_expect_success 'blame with pathspec outside sparse definition' '
>  	init_repos &&
>
> -	test_all_match git blame folder1/a &&
> -	test_all_match git blame folder2/a &&
> -	test_all_match git blame deep/deeper2/a &&
> -	test_all_match git blame deep/deeper2/deepest/a
> +	test_sparse_match git sparse-checkout set &&
> +	test_sparse_match test_must_fail git blame folder1/a &&
> +	test_sparse_match test_must_fail git blame folder2/a &&
> +	test_sparse_match test_must_fail git blame deep/deeper2/a &&
> +	test_sparse_match test_must_fail git blame deep/deeper2/deepest/a
>  '

test_must_fail used to allow for segfaults, but doesn't these days. So
this is a good test of "it should fail in sparse checkouts but not
crash", although I think it would be good to ensure that it's failing in
the way you expect (i.e., by checking that stderr contains "no such path
<xyz> in HEAD").
>
>  test_expect_success 'checkout and reset (mixed)' '
> @@ -874,6 +875,15 @@ test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
>  	)
>  '
>
> +test_expect_success 'sparse index is not expanded: blame' '
> +	init_repos &&
> +
> +	ensure_not_expanded blame a &&
> +	ensure_not_expanded blame deep/a &&
> +	ensure_not_expanded blame deep/deeper1/a &&
> +	ensure_not_expanded blame deep/deeper1/deepest/a
> +'

Makes sense. Probably just one of these is necessary, but I haven't
looked into init_repos (or the "setup" test) enough to know for sure.
Either way, not worth changing.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 1/2] diff: enable and test the sparse index
  2021-10-25 20:47     ` Taylor Blau
@ 2021-10-26 16:10       ` Lessley Dennington
  2021-10-26 16:15         ` Taylor Blau
  0 siblings, 1 reply; 56+ messages in thread
From: Lessley Dennington @ 2021-10-26 16:10 UTC (permalink / raw)
  To: Taylor Blau, Lessley Dennington via GitGitGadget
  Cc: git, stolee, gitster, newren



On 10/25/21 1:47 PM, Taylor Blau wrote:
> On Fri, Oct 15, 2021 at 09:20:34PM +0000, Lessley Dennington via GitGitGadget wrote:
>> From: Lessley Dennington <lessleydennington@gmail.com>
>>
>> Enable the sparse index within the 'git diff' command. Its implementation
>> already safely integrates with the sparse index because it shares code with
>> the 'git status' and 'git checkout' commands that were already integrated.
> 
> Good, it looks like most of the heavy-lifting to make `git diff` work
> with the sparse index was already done elsewhere.
> 
> It may be helpful here to include either one of two things to help
> readers and reviewers understand what's going on:
> 
>    - A summary of what `git status` and/or `git checkout` does to work
>      with the sparse index.
>    - Or the patches which make those commands work with the sparse index
>      so that readers can refer back to them.
> 
> Having either of those would help readers who are unfamiliar with
> builtin/diff.c convince themselves more easily that setting
> 'command_requires_full_index = 0' is all that's needed here.
> 
Great suggestion, thank you!
>> The most interesting thing to do is to add tests that verify that 'git diff'
>> behaves correctly when the sparse index is enabled. These cases are:
>>
>> 1. The index is not expanded for 'diff' and 'diff --staged'
>> 2. 'diff' and 'diff --staged' behave the same in full checkout, sparse
>> checkout, and sparse index repositories in the following partially-staged
>> scenarios (i.e. the index, HEAD, and working directory differ at a given
>> path):
>>      1. Path is within sparse-checkout cone
>>      2. Path is outside sparse-checkout cone
>>      3. A merge conflict exists for paths outside sparse-checkout cone
> 
> Nice, these are all of the test cases that I would expect to demonstrate
> interesting behavior.
> 
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index f19c1b3e2eb..e5d15be9d45 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -386,6 +386,46 @@ test_expect_success 'diff --staged' '
>>   	test_all_match git diff --staged
>>   '
>>
>> +test_expect_success 'diff partially-staged' '
>> +	init_repos &&
>> +
>> +	write_script edit-contents <<-\EOF &&
>> +	echo text >>$1
>> +	EOF
>> +
>> +	# Add file within cone
>> +	test_sparse_match git sparse-checkout set deep &&
>> +	run_on_all ../edit-contents deep/testfile &&
>> +	test_all_match git add deep/testfile &&
>> +	run_on_all ../edit-contents deep/testfile &&
>> +
>> +	test_all_match git diff &&
>> +	test_all_match git diff --staged &&
>> +
>> +	# Add file outside cone
>> +	test_all_match git reset --hard &&
>> +	run_on_all mkdir newdirectory &&
>> +	run_on_all ../edit-contents newdirectory/testfile &&
>> +	test_sparse_match git sparse-checkout set newdirectory &&
>> +	test_all_match git add newdirectory/testfile &&
>> +	run_on_all ../edit-contents newdirectory/testfile &&
>> +	test_sparse_match git sparse-checkout set &&
>> +
>> +	test_all_match git diff &&
>> +	test_all_match git diff --staged &&
>> +
>> +	# Merge conflict outside cone
>> +	# The sparse checkout will report a warning that is not in the
>> +	# full checkout, so we use `run_on_all` instead of
>> +	# `test_all_match`
>> +	run_on_all git reset --hard &&
>> +	test_all_match git checkout merge-left &&
>> +	test_all_match test_must_fail git merge merge-right &&
>> +
>> +	test_all_match git diff &&
>> +	test_all_match git diff --staged
>> +'
>> +
>>   # NEEDSWORK: sparse-checkout behaves differently from full-checkout when
>>   # running this test with 'df-conflict-2' after 'df-conflict-1'.
>>   test_expect_success 'diff with renames and conflicts' '
>> @@ -800,6 +840,11 @@ test_expect_success 'sparse-index is not expanded' '
>>   	# Wildcard identifies only full sparse directories, no index expansion
>>   	ensure_not_expanded reset deepest -- folder\* &&
>>
>> +	echo a test change >>sparse-index/README.md &&
>> +	ensure_not_expanded diff &&
> 
> Thinking aloud here as somebody who is unfamiliar with the sparse-index
> tests. ensure_not_expanded relies on the existence of the "sparse-index"
> repository, and its top-level README.md is outside of the
> sparse-checkout cone.
> 
> That makes sense, and when I create a repository with a file outside of
> the sparse-checkout cone and then run `git diff`, I see no changes as
> expected.
> 
> But isn't the top-level directory always part of the cone? If so, I
> think that what this (and the below test) is demonstrating is that we
> can show changes inside of the cone without expanding the sparse-index.
> 
> Having that test makes absolute sense to me. But I think it might also
> make sense to have a test that creates some directory structure outside
> of the cone, modifies it, and then ensures that both (a) those changes
> aren't visible to `git diff` when the sparse-checkout is active and (b)
> that running `git diff` doesn't cause the sparse-index to be expanded.
> 
README.md is actually within the sparse checkout cone - all files at 
root are included by default. So your understanding is correct - we are 
ensuring that making a change to a file in the cone and running both 
diff and diff --staged once the file is in the index doesn't expand the 
sparse index.

I like your idea of verifying that running diff against files outside 
the sparse checkout cone won't expand the index. I've updated the diff 
tests in v3 (which I will send out shortly) to do so.

> Thanks,
> Taylor
> 
Best,
Lessley

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 1/2] diff: enable and test the sparse index
  2021-10-26 16:10       ` Lessley Dennington
@ 2021-10-26 16:15         ` Taylor Blau
  0 siblings, 0 replies; 56+ messages in thread
From: Taylor Blau @ 2021-10-26 16:15 UTC (permalink / raw)
  To: Lessley Dennington
  Cc: Taylor Blau, Lessley Dennington via GitGitGadget, git, stolee,
	gitster, newren

On Tue, Oct 26, 2021 at 09:10:20AM -0700, Lessley Dennington wrote:
> > But isn't the top-level directory always part of the cone? If so, I
> > think that what this (and the below test) is demonstrating is that we
> > can show changes inside of the cone without expanding the sparse-index.
> >
> > Having that test makes absolute sense to me. But I think it might also
> > make sense to have a test that creates some directory structure outside
> > of the cone, modifies it, and then ensures that both (a) those changes
> > aren't visible to `git diff` when the sparse-checkout is active and (b)
> > that running `git diff` doesn't cause the sparse-index to be expanded.
> >
> README.md is actually within the sparse checkout cone - all files at root
> are included by default. So your understanding is correct - we are ensuring
> that making a change to a file in the cone and running both diff and diff
> --staged once the file is in the index doesn't expand the sparse index.
>
> I like your idea of verifying that running diff against files outside the
> sparse checkout cone won't expand the index. I've updated the diff tests in
> v3 (which I will send out shortly) to do so.

Great, thank you! There is no hurry to send out an updated revision from
me, either. It may be good to wait a day or two and see if any other
review trickles in before sending another revision to the list that way
you can batch together updates from multiple reviewers.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 2/2] blame: enable and test the sparse index
  2021-10-25 20:53     ` Taylor Blau
@ 2021-10-26 16:17       ` Lessley Dennington
  2021-11-21  1:32         ` Elijah Newren
  0 siblings, 1 reply; 56+ messages in thread
From: Lessley Dennington @ 2021-10-26 16:17 UTC (permalink / raw)
  To: Taylor Blau, Lessley Dennington via GitGitGadget
  Cc: git, stolee, gitster, newren

On 10/25/21 1:53 PM, Taylor Blau wrote:
> On Fri, Oct 15, 2021 at 09:20:35PM +0000, Lessley Dennington via GitGitGadget wrote:
>> From: Lessley Dennington <lessleydennington@gmail.com>
>>
>> Enable the sparse index for the 'git blame' command. The index was already
>> not expanded with this command, so the most interesting thing to do is to
>> add tests that verify that 'git blame' behaves correctly when the sparse
>> index is enabled and that its performance improves. More specifically, these
>> cases are:
>>
>> 1. The index is not expanded for 'blame' when given paths in the sparse
>> checkout cone at multiple levels.
>>
>> 2. Performance measurably improves for 'blame' with sparse index when given
>> paths in the sparse checkout cone at multiple levels.
>>
>> The `p2000` tests demonstrate a ~60% execution time reduction when running
>> 'blame' for a file two levels deep and and a ~30% execution time reduction
>> for a file three levels deep.
> 
> Eek. What's eating up the other 30% when we have to open up another
> layer of trees?
> 
I'm not sure to be totally honest. However, given these are both pretty 
good time reductions I don't think we should be terribly concerned.
>>
>> Test                                         before  after
>> ----------------------------------------------------------------
>> 2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
>> 2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
>> 2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
>> 2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
>> 2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
>> 2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
>> 2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
>> 2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%
>>
>> We do not include paths outside the sparse checkout cone because blame
>> currently does not support blaming files outside of the sparse definition.
>> Attempting to do so fails with the following error:
>>
>> fatal: no such path '<path outside sparse definition>' in HEAD.
> 
> Small nit; this error message should be indented with a couple of space
> characters to indicate that it's the output of running Git instead of
> part of your patch message. Not worth a reroll on its own, but something
> to keep in mind for your many future patches :).
> 
Eh, I'm making some changes based on your suggestions anyway, so I'm 
including this in v3. Thanks for letting me know!
>>
>> Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
>> ---
>>   builtin/blame.c                          |  3 +++
>>   t/perf/p2000-sparse-operations.sh        |  2 ++
>>   t/t1092-sparse-checkout-compatibility.sh | 24 +++++++++++++++++-------
>>   3 files changed, 22 insertions(+), 7 deletions(-)
>>
>> diff --git a/builtin/blame.c b/builtin/blame.c
>> index 641523ff9af..af3d81e2bd4 100644
>> --- a/builtin/blame.c
>> +++ b/builtin/blame.c
>> @@ -902,6 +902,9 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
>>   	long anchor;
>>   	const int hexsz = the_hash_algo->hexsz;
>>
>> +	prepare_repo_settings(the_repository);
>> +	the_repository->settings.command_requires_full_index = 0;
>> +
> 
> By now we're quite used to seeing this ;). Makes sense to me.
> 
>>   	setup_default_color_by_age();
>>   	git_config(git_blame_config, &output_option);
>>   	repo_init_revisions(the_repository, &revs, NULL);
>> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
>> index bff93f16e93..9ac76a049b8 100755
>> --- a/t/perf/p2000-sparse-operations.sh
>> +++ b/t/perf/p2000-sparse-operations.sh
>> @@ -115,5 +115,7 @@ test_perf_on_all git reset --hard
>>   test_perf_on_all git reset -- does-not-exist
>>   test_perf_on_all git diff
>>   test_perf_on_all git diff --staged
>> +test_perf_on_all git blame $SPARSE_CONE/a
>> +test_perf_on_all git blame $SPARSE_CONE/f3/a
> 
> Good.
> 
>>   test_done
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index e5d15be9d45..960ccf2d150 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -488,15 +488,16 @@ test_expect_success 'blame with pathspec inside sparse definition' '
>>   	test_all_match git blame deep/deeper1/deepest/a
>>   '
>>
>> -# TODO: blame currently does not support blaming files outside of the
>> -# sparse definition. It complains that the file doesn't exist locally.
>> -test_expect_failure 'blame with pathspec outside sparse definition' '
>> +# Blame does not support blaming files outside of the sparse
>> +# definition, so we verify this scenario.
>> +test_expect_success 'blame with pathspec outside sparse definition' '
>>   	init_repos &&
>>
>> -	test_all_match git blame folder1/a &&
>> -	test_all_match git blame folder2/a &&
>> -	test_all_match git blame deep/deeper2/a &&
>> -	test_all_match git blame deep/deeper2/deepest/a
>> +	test_sparse_match git sparse-checkout set &&
>> +	test_sparse_match test_must_fail git blame folder1/a &&
>> +	test_sparse_match test_must_fail git blame folder2/a &&
>> +	test_sparse_match test_must_fail git blame deep/deeper2/a &&
>> +	test_sparse_match test_must_fail git blame deep/deeper2/deepest/a
>>   '
> 
> test_must_fail used to allow for segfaults, but doesn't these days. So
> this is a good test of "it should fail in sparse checkouts but not
> crash", although I think it would be good to ensure that it's failing in
> the way you expect (i.e., by checking that stderr contains "no such path
> <xyz> in HEAD").
Good suggestion, coming in v3!
>>
>>   test_expect_success 'checkout and reset (mixed)' '
>> @@ -874,6 +875,15 @@ test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
>>   	)
>>   '
>>
>> +test_expect_success 'sparse index is not expanded: blame' '
>> +	init_repos &&
>> +
>> +	ensure_not_expanded blame a &&
>> +	ensure_not_expanded blame deep/a &&
>> +	ensure_not_expanded blame deep/deeper1/a &&
>> +	ensure_not_expanded blame deep/deeper1/deepest/a
>> +'
> 
> Makes sense. Probably just one of these is necessary, but I haven't
> looked into init_repos (or the "setup" test) enough to know for sure.
> Either way, not worth changing.
> 
> Thanks,
> Taylor
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v3 0/2] Sparse Index: diff and blame builtins
  2021-10-15 21:20 ` [PATCH v2 0/2] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
  2021-10-15 21:20   ` [PATCH v2 1/2] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
  2021-10-15 21:20   ` [PATCH v2 2/2] blame: " Lessley Dennington via GitGitGadget
@ 2021-11-01 21:27   ` Lessley Dennington via GitGitGadget
  2021-11-01 21:27     ` [PATCH v3 1/2] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
                       ` (2 more replies)
  2 siblings, 3 replies; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-11-01 21:27 UTC (permalink / raw)
  To: git; +Cc: stolee, gitster, newren, Taylor Blau, Lessley Dennington

This series is based on vd/sparse-reset. It integrates the sparse index with
git diff and git blame and includes:

 1. tests added to t1092 and p2000 to establish the baseline functionality
    of the commands
 2. repository settings to enable the sparse index

The p2000 tests demonstrate a ~30% execution time reduction for 'git diff'
and a ~75% execution time reduction for 'git diff --staged' using a sparse
index. For 'git blame', the reduction time was ~60% for a file two levels
deep and ~30% for a file three levels deep.

Test                                         before  after
----------------------------------------------------------------
2000.30: git diff (full-v3)                  0.37    0.36 -2.7%
2000.31: git diff (full-v4)                  0.36    0.35 -2.8%
2000.32: git diff (sparse-v3)                0.46    0.30 -34.8%
2000.33: git diff (sparse-v4)                0.43    0.31 -27.9%
2000.34: git diff --staged (full-v3)         0.08    0.08 +0.0%
2000.35: git diff --staged (full-v4)         0.08    0.08 +0.0%
2000.36: git diff --staged (sparse-v3)       0.17    0.04 -76.5%
2000.37: git diff --staged (sparse-v4)       0.16    0.04 -75.0%
2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%



Changes since V1
================

 * Fix failing diff partially-staged test in
   t1092-sparse-checkout-compatibility.sh, which was breaking in seen.


Changes since V2
================

 * Update diff commit description to include patches that make the checkout
   and status commands work with the sparse index for readers to reference.
 * Add new test case to verify diff behaves as expected when run against
   files outside the sparse checkout cone.
 * Indent error message in blame commit
 * Check error message in blame with pathspec outside sparse definition test
   matches expectations.
 * Loop blame tests (instead of running the same command multiple time
   against different files).

Thanks, Lessley

Lessley Dennington (2):
  diff: enable and test the sparse index
  blame: enable and test the sparse index

 builtin/blame.c                          |  3 +
 builtin/diff.c                           |  3 +
 t/perf/p2000-sparse-operations.sh        |  4 +
 t/t1092-sparse-checkout-compatibility.sh | 94 +++++++++++++++++++++---
 4 files changed, 93 insertions(+), 11 deletions(-)


base-commit: 7159bf518eed5c997cf4ff0f17d9cb69192a091c
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1050%2Fldennington%2Fdiff-blame-sparse-index-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1050/ldennington/diff-blame-sparse-index-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1050

Range-diff vs v2:

 1:  ac33159d020 ! 1:  991aaad37b4 diff: enable and test the sparse index
     @@ Commit message
          Enable the sparse index within the 'git diff' command. Its implementation
          already safely integrates with the sparse index because it shares code with
          the 'git status' and 'git checkout' commands that were already integrated.
     +    For more details see:
     +
     +    d76723ee53 (status: use sparse-index throughout, 2021-07-14)
     +    1ba5f45132 (checkout: stop expanding sparse indexes, 2021-06-29)
     +
          The most interesting thing to do is to add tests that verify that 'git diff'
          behaves correctly when the sparse index is enabled. These cases are:
      
     @@ t/perf/p2000-sparse-operations.sh: test_perf_on_all git checkout -f -
       test_done
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
     -@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'diff --staged' '
     - 	test_all_match git diff --staged
     +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
     + 	)
       '
       
     -+test_expect_success 'diff partially-staged' '
     ++test_expect_success 'sparse index is not expanded: diff' '
      +	init_repos &&
      +
      +	write_script edit-contents <<-\EOF &&
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'diff --staged' '
      +
      +	test_all_match git diff &&
      +	test_all_match git diff --staged &&
     ++	ensure_not_expanded diff &&
     ++	ensure_not_expanded diff --staged &&
      +
      +	# Add file outside cone
      +	test_all_match git reset --hard &&
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'diff --staged' '
      +
      +	test_all_match git diff &&
      +	test_all_match git diff --staged &&
     ++	ensure_not_expanded diff &&
     ++	ensure_not_expanded diff --staged &&
      +
      +	# Merge conflict outside cone
      +	# The sparse checkout will report a warning that is not in the
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'diff --staged' '
      +	test_all_match test_must_fail git merge merge-right &&
      +
      +	test_all_match git diff &&
     -+	test_all_match git diff --staged
     -+'
     -+
     - # NEEDSWORK: sparse-checkout behaves differently from full-checkout when
     - # running this test with 'df-conflict-2' after 'df-conflict-1'.
     - test_expect_success 'diff with renames and conflicts' '
     -@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is not expanded' '
     - 	# Wildcard identifies only full sparse directories, no index expansion
     - 	ensure_not_expanded reset deepest -- folder\* &&
     - 
     -+	echo a test change >>sparse-index/README.md &&
     ++	test_all_match git diff --staged &&
      +	ensure_not_expanded diff &&
     -+	git -C sparse-index add README.md &&
     -+	ensure_not_expanded diff --staged &&
     ++	ensure_not_expanded diff --staged
     ++'
      +
     - 	ensure_not_expanded checkout -f update-deep &&
     - 	test_config -C sparse-index pull.twohead ort &&
     - 	(
     + # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
     + # in this scenario, but it shouldn't.
     + test_expect_success 'reset mixed and checkout orphan' '
 2:  a0b6a152c75 ! 2:  cfdd33129ec blame: enable and test the sparse index
     @@ Commit message
          currently does not support blaming files outside of the sparse definition.
          Attempting to do so fails with the following error:
      
     -    fatal: no such path '<path outside sparse definition>' in HEAD
     +      fatal: no such path '<path outside sparse definition>' in HEAD
      
          Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
      
     @@ t/perf/p2000-sparse-operations.sh: test_perf_on_all git reset --hard
       test_done
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
     -@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'blame with pathspec inside sparse definition' '
     - 	test_all_match git blame deep/deeper1/deepest/a
     +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'log with pathspec outside sparse definition' '
     + test_expect_success 'blame with pathspec inside sparse definition' '
     + 	init_repos &&
     + 
     +-	test_all_match git blame a &&
     +-	test_all_match git blame deep/a &&
     +-	test_all_match git blame deep/deeper1/a &&
     +-	test_all_match git blame deep/deeper1/deepest/a
     ++	for file in a \
     ++			deep/a \
     ++			deep/deeper1/a \
     ++			deep/deeper1/deepest/a
     ++	do
     ++		test_all_match git blame $file
     ++	done
       '
       
      -# TODO: blame currently does not support blaming files outside of the
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'blame with pathsp
      +# definition, so we verify this scenario.
      +test_expect_success 'blame with pathspec outside sparse definition' '
       	init_repos &&
     ++	test_sparse_match git sparse-checkout set &&
       
      -	test_all_match git blame folder1/a &&
      -	test_all_match git blame folder2/a &&
      -	test_all_match git blame deep/deeper2/a &&
      -	test_all_match git blame deep/deeper2/deepest/a
     -+	test_sparse_match git sparse-checkout set &&
     -+	test_sparse_match test_must_fail git blame folder1/a &&
     -+	test_sparse_match test_must_fail git blame folder2/a &&
     -+	test_sparse_match test_must_fail git blame deep/deeper2/a &&
     -+	test_sparse_match test_must_fail git blame deep/deeper2/deepest/a
     ++	for file in a \
     ++			deep/a \
     ++			deep/deeper1/a \
     ++			deep/deeper1/deepest/a
     ++	do
     ++		test_sparse_match test_must_fail git blame $file &&
     ++		cat >expect <<-EOF &&
     ++		fatal: Cannot lstat '"'"'$file'"'"': No such file or directory
     ++		EOF
     ++		# We compare sparse-checkout-err and sparse-index-err in
     ++		# `test_sparse_match`. Given we know they are the same, we
     ++		# only check the content of sparse-index-err here.
     ++		test_cmp expect sparse-index-err
     ++	done
       '
       
       test_expect_success 'checkout and reset (mixed)' '
     -@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
     - 	)
     +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse index is not expanded: diff' '
     + 	ensure_not_expanded diff --staged
       '
       
      +test_expect_success 'sparse index is not expanded: blame' '
      +	init_repos &&
      +
     -+	ensure_not_expanded blame a &&
     -+	ensure_not_expanded blame deep/a &&
     -+	ensure_not_expanded blame deep/deeper1/a &&
     -+	ensure_not_expanded blame deep/deeper1/deepest/a
     ++	for file in a \
     ++			deep/a \
     ++			deep/deeper1/a \
     ++			deep/deeper1/deepest/a
     ++	do
     ++		ensure_not_expanded blame $file
     ++	done
      +'
      +
       # NEEDSWORK: a sparse-checkout behaves differently from a full checkout

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v3 1/2] diff: enable and test the sparse index
  2021-11-01 21:27   ` [PATCH v3 0/2] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
@ 2021-11-01 21:27     ` Lessley Dennington via GitGitGadget
  2021-11-03 17:05       ` Junio C Hamano
  2021-11-01 21:27     ` [PATCH v3 2/2] blame: " Lessley Dennington via GitGitGadget
  2021-11-22 22:42     ` [PATCH v4 0/4] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
  2 siblings, 1 reply; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-11-01 21:27 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Lessley Dennington,
	Lessley Dennington

From: Lessley Dennington <lessleydennington@gmail.com>

Enable the sparse index within the 'git diff' command. Its implementation
already safely integrates with the sparse index because it shares code with
the 'git status' and 'git checkout' commands that were already integrated.
For more details see:

d76723ee53 (status: use sparse-index throughout, 2021-07-14)
1ba5f45132 (checkout: stop expanding sparse indexes, 2021-06-29)

The most interesting thing to do is to add tests that verify that 'git diff'
behaves correctly when the sparse index is enabled. These cases are:

1. The index is not expanded for 'diff' and 'diff --staged'
2. 'diff' and 'diff --staged' behave the same in full checkout, sparse
checkout, and sparse index repositories in the following partially-staged
scenarios (i.e. the index, HEAD, and working directory differ at a given
path):
    1. Path is within sparse-checkout cone
    2. Path is outside sparse-checkout cone
    3. A merge conflict exists for paths outside sparse-checkout cone

The `p2000` tests demonstrate a ~30% execution time reduction for 'git
diff' and a ~75% execution time reduction for 'git diff --staged' using a
sparse index:

Test                                      before  after
-------------------------------------------------------------
2000.30: git diff (full-v3)               0.37    0.36 -2.7%
2000.31: git diff (full-v4)               0.36    0.35 -2.8%
2000.32: git diff (sparse-v3)             0.46    0.30 -34.8%
2000.33: git diff (sparse-v4)             0.43    0.31 -27.9%
2000.34: git diff --staged (full-v3)      0.08    0.08 +0.0%
2000.35: git diff --staged (full-v4)      0.08    0.08 +0.0%
2000.36: git diff --staged (sparse-v3)    0.17    0.04 -76.5%
2000.37: git diff --staged (sparse-v4)    0.16    0.04 -75.0%

Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 builtin/diff.c                           |  3 ++
 t/perf/p2000-sparse-operations.sh        |  2 ++
 t/t1092-sparse-checkout-compatibility.sh | 46 ++++++++++++++++++++++++
 3 files changed, 51 insertions(+)

diff --git a/builtin/diff.c b/builtin/diff.c
index dd8ce688ba7..cbf7b51c7c0 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -437,6 +437,9 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
 
 	prefix = setup_git_directory_gently(&nongit);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	if (!no_index) {
 		/*
 		 * Treat git diff with at least one path outside of the
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index bfd332120c8..bff93f16e93 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -113,5 +113,7 @@ test_perf_on_all git checkout -f -
 test_perf_on_all git reset
 test_perf_on_all git reset --hard
 test_perf_on_all git reset -- does-not-exist
+test_perf_on_all git diff
+test_perf_on_all git diff --staged
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 44d5e11c762..53524660759 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -832,6 +832,52 @@ test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
 	)
 '
 
+test_expect_success 'sparse index is not expanded: diff' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	# Add file within cone
+	test_sparse_match git sparse-checkout set deep &&
+	run_on_all ../edit-contents deep/testfile &&
+	test_all_match git add deep/testfile &&
+	run_on_all ../edit-contents deep/testfile &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged &&
+	ensure_not_expanded diff &&
+	ensure_not_expanded diff --staged &&
+
+	# Add file outside cone
+	test_all_match git reset --hard &&
+	run_on_all mkdir newdirectory &&
+	run_on_all ../edit-contents newdirectory/testfile &&
+	test_sparse_match git sparse-checkout set newdirectory &&
+	test_all_match git add newdirectory/testfile &&
+	run_on_all ../edit-contents newdirectory/testfile &&
+	test_sparse_match git sparse-checkout set &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged &&
+	ensure_not_expanded diff &&
+	ensure_not_expanded diff --staged &&
+
+	# Merge conflict outside cone
+	# The sparse checkout will report a warning that is not in the
+	# full checkout, so we use `run_on_all` instead of
+	# `test_all_match`
+	run_on_all git reset --hard &&
+	test_all_match git checkout merge-left &&
+	test_all_match test_must_fail git merge merge-right &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged &&
+	ensure_not_expanded diff &&
+	ensure_not_expanded diff --staged
+'
+
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
 # in this scenario, but it shouldn't.
 test_expect_success 'reset mixed and checkout orphan' '
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v3 2/2] blame: enable and test the sparse index
  2021-11-01 21:27   ` [PATCH v3 0/2] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
  2021-11-01 21:27     ` [PATCH v3 1/2] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
@ 2021-11-01 21:27     ` Lessley Dennington via GitGitGadget
  2021-11-03 16:47       ` Junio C Hamano
  2021-11-22 22:42     ` [PATCH v4 0/4] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
  2 siblings, 1 reply; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-11-01 21:27 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Lessley Dennington,
	Lessley Dennington

From: Lessley Dennington <lessleydennington@gmail.com>

Enable the sparse index for the 'git blame' command. The index was already
not expanded with this command, so the most interesting thing to do is to
add tests that verify that 'git blame' behaves correctly when the sparse
index is enabled and that its performance improves. More specifically, these
cases are:

1. The index is not expanded for 'blame' when given paths in the sparse
checkout cone at multiple levels.

2. Performance measurably improves for 'blame' with sparse index when given
paths in the sparse checkout cone at multiple levels.

The `p2000` tests demonstrate a ~60% execution time reduction when running
'blame' for a file two levels deep and and a ~30% execution time reduction
for a file three levels deep.

Test                                         before  after
----------------------------------------------------------------
2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%

We do not include paths outside the sparse checkout cone because blame
currently does not support blaming files outside of the sparse definition.
Attempting to do so fails with the following error:

  fatal: no such path '<path outside sparse definition>' in HEAD

Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 builtin/blame.c                          |  3 ++
 t/perf/p2000-sparse-operations.sh        |  2 +
 t/t1092-sparse-checkout-compatibility.sh | 48 ++++++++++++++++++------
 3 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 641523ff9af..af3d81e2bd4 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -902,6 +902,9 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 	long anchor;
 	const int hexsz = the_hash_algo->hexsz;
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	setup_default_color_by_age();
 	git_config(git_blame_config, &output_option);
 	repo_init_revisions(the_repository, &revs, NULL);
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index bff93f16e93..9ac76a049b8 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -115,5 +115,7 @@ test_perf_on_all git reset --hard
 test_perf_on_all git reset -- does-not-exist
 test_perf_on_all git diff
 test_perf_on_all git diff --staged
+test_perf_on_all git blame $SPARSE_CONE/a
+test_perf_on_all git blame $SPARSE_CONE/f3/a
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 53524660759..da442f3dcff 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -442,21 +442,35 @@ test_expect_success 'log with pathspec outside sparse definition' '
 test_expect_success 'blame with pathspec inside sparse definition' '
 	init_repos &&
 
-	test_all_match git blame a &&
-	test_all_match git blame deep/a &&
-	test_all_match git blame deep/deeper1/a &&
-	test_all_match git blame deep/deeper1/deepest/a
+	for file in a \
+			deep/a \
+			deep/deeper1/a \
+			deep/deeper1/deepest/a
+	do
+		test_all_match git blame $file
+	done
 '
 
-# TODO: blame currently does not support blaming files outside of the
-# sparse definition. It complains that the file doesn't exist locally.
-test_expect_failure 'blame with pathspec outside sparse definition' '
+# Blame does not support blaming files outside of the sparse
+# definition, so we verify this scenario.
+test_expect_success 'blame with pathspec outside sparse definition' '
 	init_repos &&
+	test_sparse_match git sparse-checkout set &&
 
-	test_all_match git blame folder1/a &&
-	test_all_match git blame folder2/a &&
-	test_all_match git blame deep/deeper2/a &&
-	test_all_match git blame deep/deeper2/deepest/a
+	for file in a \
+			deep/a \
+			deep/deeper1/a \
+			deep/deeper1/deepest/a
+	do
+		test_sparse_match test_must_fail git blame $file &&
+		cat >expect <<-EOF &&
+		fatal: Cannot lstat '"'"'$file'"'"': No such file or directory
+		EOF
+		# We compare sparse-checkout-err and sparse-index-err in
+		# `test_sparse_match`. Given we know they are the same, we
+		# only check the content of sparse-index-err here.
+		test_cmp expect sparse-index-err
+	done
 '
 
 test_expect_success 'checkout and reset (mixed)' '
@@ -878,6 +892,18 @@ test_expect_success 'sparse index is not expanded: diff' '
 	ensure_not_expanded diff --staged
 '
 
+test_expect_success 'sparse index is not expanded: blame' '
+	init_repos &&
+
+	for file in a \
+			deep/a \
+			deep/deeper1/a \
+			deep/deeper1/deepest/a
+	do
+		ensure_not_expanded blame $file
+	done
+'
+
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
 # in this scenario, but it shouldn't.
 test_expect_success 'reset mixed and checkout orphan' '
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 2/2] blame: enable and test the sparse index
  2021-11-01 21:27     ` [PATCH v3 2/2] blame: " Lessley Dennington via GitGitGadget
@ 2021-11-03 16:47       ` Junio C Hamano
  2021-11-05  0:04         ` Lessley Dennington
  2021-11-21  1:46         ` Elijah Newren
  0 siblings, 2 replies; 56+ messages in thread
From: Junio C Hamano @ 2021-11-03 16:47 UTC (permalink / raw)
  To: Lessley Dennington via GitGitGadget
  Cc: git, stolee, newren, Taylor Blau, Lessley Dennington

"Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> We do not include paths outside the sparse checkout cone because blame
> currently does not support blaming files outside of the sparse definition.
> Attempting to do so fails with the following error:
>
>   fatal: no such path '<path outside sparse definition>' in HEAD

Does this indicate that we need to update how the command line
safety in verify_working_tree_path() works in a sparsely checked out
working tree?  If foo/bar is outside the sparse definition,

    git blame HEAD foo/bar

may get such a message, but shouldn't

    git blame HEAD -- foo/bar

make it work?

> -# TODO: blame currently does not support blaming files outside of the
> -# sparse definition. It complains that the file doesn't exist locally.
> -test_expect_failure 'blame with pathspec outside sparse definition' '
> +# Blame does not support blaming files outside of the sparse
> +# definition, so we verify this scenario.

IOW, why is it a good idea to drop the "TODO" and "currently" and pretend
as if the current behaviour is the desirable one?

> +test_expect_success 'blame with pathspec outside sparse definition' '
>  	init_repos &&
> +	test_sparse_match git sparse-checkout set &&
>  
> -	test_all_match git blame folder1/a &&
> -	test_all_match git blame folder2/a &&
> -	test_all_match git blame deep/deeper2/a &&
> -	test_all_match git blame deep/deeper2/deepest/a
> +	for file in a \
> +			deep/a \
> +			deep/deeper1/a \
> +			deep/deeper1/deepest/a
> +	do
> +		test_sparse_match test_must_fail git blame $file &&
> +		cat >expect <<-EOF &&
> +		fatal: Cannot lstat '"'"'$file'"'"': No such file or directory
> +		EOF
> +		# We compare sparse-checkout-err and sparse-index-err in
> +		# `test_sparse_match`. Given we know they are the same, we
> +		# only check the content of sparse-index-err here.
> +		test_cmp expect sparse-index-err
> +	done
>  '
>  
>  test_expect_success 'checkout and reset (mixed)' '
> @@ -878,6 +892,18 @@ test_expect_success 'sparse index is not expanded: diff' '
>  	ensure_not_expanded diff --staged
>  '
>  
> +test_expect_success 'sparse index is not expanded: blame' '
> +	init_repos &&
> +
> +	for file in a \
> +			deep/a \
> +			deep/deeper1/a \
> +			deep/deeper1/deepest/a
> +	do
> +		ensure_not_expanded blame $file
> +	done
> +'
> +
>  # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
>  # in this scenario, but it shouldn't.
>  test_expect_success 'reset mixed and checkout orphan' '

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 1/2] diff: enable and test the sparse index
  2021-11-01 21:27     ` [PATCH v3 1/2] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
@ 2021-11-03 17:05       ` Junio C Hamano
  2021-11-04 23:55         ` Lessley Dennington
  0 siblings, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2021-11-03 17:05 UTC (permalink / raw)
  To: Lessley Dennington via GitGitGadget
  Cc: git, stolee, newren, Taylor Blau, Lessley Dennington

"Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> 2000.34: git diff --staged (full-v3)      0.08    0.08 +0.0%
> 2000.35: git diff --staged (full-v4)      0.08    0.08 +0.0%
> 2000.36: git diff --staged (sparse-v3)    0.17    0.04 -76.5%
> 2000.37: git diff --staged (sparse-v4)    0.16    0.04 -75.0%

Please do not add more use of the synonym to the test suite, other
than the one that makes sure the synonym works the same way as the
real option, which is "--cached".

> diff --git a/builtin/diff.c b/builtin/diff.c
> index dd8ce688ba7..cbf7b51c7c0 100644
> --- a/builtin/diff.c
> +++ b/builtin/diff.c
> @@ -437,6 +437,9 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
>  
>  	prefix = setup_git_directory_gently(&nongit);
>  
> +	prepare_repo_settings(the_repository);
> +	the_repository->settings.command_requires_full_index = 0;
> +

Doesn't the code need to be protected with

	if (!nongit) {
		prepare_repo_settings(the_repository);
		the_repository->settings.command_requires_full_index = 0;
	}

at the very least?  It may be that the code is getting lucky because
the_repository may be initialized with a random value (after all,
when we are not in a repository, there is nowhere to read the
on-disk settings from) and we may even be able to set a bit in the
settings structure without crashing, but conceptually, doing the
above when we _know_ we are not in any repository is simply wrong.

I wonder if prepare_repo_settings() needs be more strict.  For
example, shouldn't it check if we have a repository to begin with
and BUG() if it was called when there is not a repository?  After
all, it tries to read from the repository configuration file, so any
necessary set-up to discover where the gitdir is must have been done
already before it can be called.

With such a safety feature to catch a programmer errors, perhaps the
above could have been caught before the patch hit the list.

Thoughts?  Am I missing some chicken-and-egg situation where
prepare_repo_settings() must be callable before we know where the
repository is, or something, which justifies why the function is so
loose in its sanity checks in the current form?



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 1/2] diff: enable and test the sparse index
  2021-11-03 17:05       ` Junio C Hamano
@ 2021-11-04 23:55         ` Lessley Dennington
  0 siblings, 0 replies; 56+ messages in thread
From: Lessley Dennington @ 2021-11-04 23:55 UTC (permalink / raw)
  To: Junio C Hamano, Lessley Dennington via GitGitGadget
  Cc: git, stolee, newren, Taylor Blau

On 11/3/21 10:05 AM, Junio C Hamano wrote:
> "Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
>> 2000.34: git diff --staged (full-v3)      0.08    0.08 +0.0%
>> 2000.35: git diff --staged (full-v4)      0.08    0.08 +0.0%
>> 2000.36: git diff --staged (sparse-v3)    0.17    0.04 -76.5%
>> 2000.37: git diff --staged (sparse-v4)    0.16    0.04 -75.0%
> 
> Please do not add more use of the synonym to the test suite, other
> than the one that makes sure the synonym works the same way as the
> real option, which is "--cached".
>

Thank you, changed for v4.

>> diff --git a/builtin/diff.c b/builtin/diff.c
>> index dd8ce688ba7..cbf7b51c7c0 100644
>> --- a/builtin/diff.c
>> +++ b/builtin/diff.c
>> @@ -437,6 +437,9 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
>>   
>>   	prefix = setup_git_directory_gently(&nongit);
>>   
>> +	prepare_repo_settings(the_repository);
>> +	the_repository->settings.command_requires_full_index = 0;
>> +
> 
> Doesn't the code need to be protected with
> 
> 	if (!nongit) {
> 		prepare_repo_settings(the_repository);
> 		the_repository->settings.command_requires_full_index = 0;
> 	}
> 
> at the very least?  It may be that the code is getting lucky because
> the_repository may be initialized with a random value (after all,
> when we are not in a repository, there is nowhere to read the
> on-disk settings from) and we may even be able to set a bit in the
> settings structure without crashing, but conceptually, doing the
> above when we _know_ we are not in any repository is simply wrong.
> 
> I wonder if prepare_repo_settings() needs be more strict.  For
> example, shouldn't it check if we have a repository to begin with
> and BUG() if it was called when there is not a repository?  After
> all, it tries to read from the repository configuration file, so any
> necessary set-up to discover where the gitdir is must have been done
> already before it can be called.
> 
> With such a safety feature to catch a programmer errors, perhaps the
> above could have been caught before the patch hit the list.
> 
> Thoughts?  Am I missing some chicken-and-egg situation where
> prepare_repo_settings() must be callable before we know where the
> repository is, or something, which justifies why the function is so
> loose in its sanity checks in the current form?
> 
> 

This seems like a good idea. I've added both the nongit check and the 
prepare_repo_settings() updates you've suggested for v4, pending review 
by my team.

Best,
Lessley

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 2/2] blame: enable and test the sparse index
  2021-11-03 16:47       ` Junio C Hamano
@ 2021-11-05  0:04         ` Lessley Dennington
  2021-11-21  1:46         ` Elijah Newren
  1 sibling, 0 replies; 56+ messages in thread
From: Lessley Dennington @ 2021-11-05  0:04 UTC (permalink / raw)
  To: Junio C Hamano, Lessley Dennington via GitGitGadget
  Cc: git, stolee, newren, Taylor Blau

On 11/3/21 9:47 AM, Junio C Hamano wrote:
> "Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
>> We do not include paths outside the sparse checkout cone because blame
>> currently does not support blaming files outside of the sparse definition.
>> Attempting to do so fails with the following error:
>>
>>    fatal: no such path '<path outside sparse definition>' in HEAD
> 
> Does this indicate that we need to update how the command line
> safety in verify_working_tree_path() works in a sparsely checked out
> working tree?  If foo/bar is outside the sparse definition,
> 
>      git blame HEAD foo/bar
> 
> may get such a message, but shouldn't
> 
>      git blame HEAD -- foo/bar
> 
> make it work?
> 
This is something we could consider for the future, and I've updated the 
comment you called out below to better reflect that in the next version. 
However, I don't know that this change belongs in this series, which 
aims to enable the sparse index. Perhaps even updating the 'blame with 
pathspec outside sparse definition' test was a bit too far outside the 
target scope of this change - I can remove my updates if that is the case.

>> -# TODO: blame currently does not support blaming files outside of the
>> -# sparse definition. It complains that the file doesn't exist locally.
>> -test_expect_failure 'blame with pathspec outside sparse definition' '
>> +# Blame does not support blaming files outside of the sparse
>> +# definition, so we verify this scenario.
> 
> IOW, why is it a good idea to drop the "TODO" and "currently" and pretend
> as if the current behaviour is the desirable one?
> 
Thank you, updated for v4.

>> +test_expect_success 'blame with pathspec outside sparse definition' '
>>   	init_repos &&
>> +	test_sparse_match git sparse-checkout set &&
>>   
>> -	test_all_match git blame folder1/a &&
>> -	test_all_match git blame folder2/a &&
>> -	test_all_match git blame deep/deeper2/a &&
>> -	test_all_match git blame deep/deeper2/deepest/a
>> +	for file in a \
>> +			deep/a \
>> +			deep/deeper1/a \
>> +			deep/deeper1/deepest/a
>> +	do
>> +		test_sparse_match test_must_fail git blame $file &&
>> +		cat >expect <<-EOF &&
>> +		fatal: Cannot lstat '"'"'$file'"'"': No such file or directory
>> +		EOF
>> +		# We compare sparse-checkout-err and sparse-index-err in
>> +		# `test_sparse_match`. Given we know they are the same, we
>> +		# only check the content of sparse-index-err here.
>> +		test_cmp expect sparse-index-err
>> +	done
>>   '
>>   
>>   test_expect_success 'checkout and reset (mixed)' '
>> @@ -878,6 +892,18 @@ test_expect_success 'sparse index is not expanded: diff' '
>>   	ensure_not_expanded diff --staged
>>   '
>>   
>> +test_expect_success 'sparse index is not expanded: blame' '
>> +	init_repos &&
>> +
>> +	for file in a \
>> +			deep/a \
>> +			deep/deeper1/a \
>> +			deep/deeper1/deepest/a
>> +	do
>> +		ensure_not_expanded blame $file
>> +	done
>> +'
>> +
>>   # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
>>   # in this scenario, but it shouldn't.
>>   test_expect_success 'reset mixed and checkout orphan' '

Best,

Lessley

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 2/2] blame: enable and test the sparse index
  2021-10-26 16:17       ` Lessley Dennington
@ 2021-11-21  1:32         ` Elijah Newren
  0 siblings, 0 replies; 56+ messages in thread
From: Elijah Newren @ 2021-11-21  1:32 UTC (permalink / raw)
  To: Lessley Dennington
  Cc: Taylor Blau, Lessley Dennington via GitGitGadget,
	Git Mailing List, Derrick Stolee, Junio C Hamano

On Tue, Oct 26, 2021 at 9:17 AM Lessley Dennington
<lessleydennington@gmail.com> wrote:
>
> On 10/25/21 1:53 PM, Taylor Blau wrote:
> > On Fri, Oct 15, 2021 at 09:20:35PM +0000, Lessley Dennington via GitGitGadget wrote:
> >> From: Lessley Dennington <lessleydennington@gmail.com>
> >>
> >> Enable the sparse index for the 'git blame' command. The index was already
> >> not expanded with this command, so the most interesting thing to do is to
> >> add tests that verify that 'git blame' behaves correctly when the sparse
> >> index is enabled and that its performance improves. More specifically, these
> >> cases are:
> >>
> >> 1. The index is not expanded for 'blame' when given paths in the sparse
> >> checkout cone at multiple levels.
> >>
> >> 2. Performance measurably improves for 'blame' with sparse index when given
> >> paths in the sparse checkout cone at multiple levels.
> >>
> >> The `p2000` tests demonstrate a ~60% execution time reduction when running
> >> 'blame' for a file two levels deep and and a ~30% execution time reduction
> >> for a file three levels deep.
> >
> > Eek. What's eating up the other 30% when we have to open up another
> > layer of trees?
> >
> I'm not sure to be totally honest. However, given these are both pretty
> good time reductions I don't think we should be terribly concerned.

It's not something eating up more time in the sparse-index code; let's
look a bit closer...

> >>
> >> Test                                         before  after
> >> ----------------------------------------------------------------
> >> 2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
> >> 2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
> >> 2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
> >> 2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
> >> 2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
> >> 2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
> >> 2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
> >> 2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%

Time was ~0.55s for the full at two levels deep, and dropped by just
over 0.3s in sparse-index.
Time was ~1.05s for the full at three levels deep, and dropped by just
over 0.3s in sparse-index.

So, the sparse-index enabling saves us the same amount of time, it's
just that the overall execution time for the non-sparse-index
comparison point goes up.  Saving the same amount of time for the two
cases seems intuitive to me; both cases get to avoid looking at the
same number of index entries outside the sparsity paths.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v3 2/2] blame: enable and test the sparse index
  2021-11-03 16:47       ` Junio C Hamano
  2021-11-05  0:04         ` Lessley Dennington
@ 2021-11-21  1:46         ` Elijah Newren
  1 sibling, 0 replies; 56+ messages in thread
From: Elijah Newren @ 2021-11-21  1:46 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Lessley Dennington via GitGitGadget, Git Mailing List,
	Derrick Stolee, Taylor Blau, Lessley Dennington

On Wed, Nov 3, 2021 at 9:47 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > We do not include paths outside the sparse checkout cone because blame
> > currently does not support blaming files outside of the sparse definition.
> > Attempting to do so fails with the following error:
> >
> >   fatal: no such path '<path outside sparse definition>' in HEAD
>
> Does this indicate that we need to update how the command line
> safety in verify_working_tree_path() works in a sparsely checked out
> working tree?

I wondered the same thing, but no I don't think we need any extra
command line safety here.  The behavior Lessley is reporting here is
equivalent to the following in a regular (non-sparse) checkout:

$ rm t/test-lib.sh
$ git blame t/test-lib.sh
fatal: Cannot lstat 't/test-lib.sh': No such file or directory
$ git blame -- t/test-lib.sh
fatal: Cannot lstat 't/test-lib.sh': No such file or directory

blame without a revision has always failed for files not in the
working tree, regardless of whether those files are found in the index
or HEAD.

>  If foo/bar is outside the sparse definition,
>
>     git blame HEAD foo/bar

Actually, that works; there's no error and the code with Lessley's
patch will show the blame info for foo/bar (assuming foo/bar was a
path in HEAD, of course).

> may get such a message, but shouldn't
>
>     git blame HEAD -- foo/bar
>
> make it work?

This also works.  But both of these things are kind of testing
something different; when given a revision, the checkout is irrelevant
to git blame: git blame with a revision will work regardless of
whether the checkout is full, sparse, completely empty, or
non-existent (i.e. a bare clone).

> > -# TODO: blame currently does not support blaming files outside of the
> > -# sparse definition. It complains that the file doesn't exist locally.
> > -test_expect_failure 'blame with pathspec outside sparse definition' '
> > +# Blame does not support blaming files outside of the sparse
> > +# definition, so we verify this scenario.
>
> IOW, why is it a good idea to drop the "TODO" and "currently" and pretend
> as if the current behaviour is the desirable one?

I think dropping the TODO is correct, but the wording is confusing --
it has nothing to do with sparse checkouts.  I'd rather say, "Without
a specified revision, blame will only handle files present in the
current working directory and error on any other paths"

> > +test_expect_success 'blame with pathspec outside sparse definition' '
> >       init_repos &&
> > +     test_sparse_match git sparse-checkout set &&
> >
> > -     test_all_match git blame folder1/a &&
> > -     test_all_match git blame folder2/a &&
> > -     test_all_match git blame deep/deeper2/a &&
> > -     test_all_match git blame deep/deeper2/deepest/a
> > +     for file in a \
> > +                     deep/a \
> > +                     deep/deeper1/a \
> > +                     deep/deeper1/deepest/a
> > +     do
> > +             test_sparse_match test_must_fail git blame $file &&
> > +             cat >expect <<-EOF &&
> > +             fatal: Cannot lstat '"'"'$file'"'"': No such file or directory
> > +             EOF
> > +             # We compare sparse-checkout-err and sparse-index-err in
> > +             # `test_sparse_match`. Given we know they are the same, we
> > +             # only check the content of sparse-index-err here.
> > +             test_cmp expect sparse-index-err
> > +     done
> >  '
> >
> >  test_expect_success 'checkout and reset (mixed)' '
> > @@ -878,6 +892,18 @@ test_expect_success 'sparse index is not expanded: diff' '
> >       ensure_not_expanded diff --staged
> >  '
> >
> > +test_expect_success 'sparse index is not expanded: blame' '
> > +     init_repos &&
> > +
> > +     for file in a \
> > +                     deep/a \
> > +                     deep/deeper1/a \
> > +                     deep/deeper1/deepest/a
> > +     do
> > +             ensure_not_expanded blame $file
> > +     done
> > +'
> > +
> >  # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
> >  # in this scenario, but it shouldn't.
> >  test_expect_success 'reset mixed and checkout orphan' '

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v4 0/4] Sparse Index: diff and blame builtins
  2021-11-01 21:27   ` [PATCH v3 0/2] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
  2021-11-01 21:27     ` [PATCH v3 1/2] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
  2021-11-01 21:27     ` [PATCH v3 2/2] blame: " Lessley Dennington via GitGitGadget
@ 2021-11-22 22:42     ` Lessley Dennington via GitGitGadget
  2021-11-22 22:42       ` [PATCH v4 1/4] sparse index: enable only for git repos Lessley Dennington via GitGitGadget
                         ` (4 more replies)
  2 siblings, 5 replies; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-11-22 22:42 UTC (permalink / raw)
  To: git; +Cc: stolee, gitster, newren, Taylor Blau, Lessley Dennington

This series is based on vd/sparse-reset. It integrates the sparse index with
git diff and git blame and includes:

 1. tests added to t1092 and p2000 to establish the baseline functionality
    of the commands
 2. repository settings to enable the sparse index

The p2000 tests demonstrate a ~44% execution time reduction for 'git diff'
and a ~86% execution time reduction for 'git diff --staged' using a sparse
index. For 'git blame', the reduction time was ~60% for a file two levels
deep and ~30% for a file three levels deep.

Test                                         before  after
----------------------------------------------------------------
2000.30: git diff (full-v3)                  0.33    0.34 +3.0%
2000.31: git diff (full-v4)                  0.33    0.35 +6.1%
2000.32: git diff (sparse-v3)                0.53    0.31 -41.5%
2000.33: git diff (sparse-v4)                0.54    0.29 -46.3%
2000.34: git diff --cached (full-v3)         0.07    0.07 +0.0%
2000.35: git diff --cached (full-v4)         0.07    0.08 +14.3%
2000.36: git diff --cached (sparse-v3)       0.28    0.04 -85.7%
2000.37: git diff --cached (sparse-v4)       0.23    0.03 -87.0%
2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%



Changes since V1
================

 * Fix failing diff partially-staged test in
   t1092-sparse-checkout-compatibility.sh, which was breaking in seen.


Changes since V2
================

 * Update diff commit description to include patches that make the checkout
   and status commands work with the sparse index for readers to reference.
 * Add new test case to verify diff behaves as expected when run against
   files outside the sparse checkout cone.
 * Indent error message in blame commit
 * Check error message in blame with pathspec outside sparse definition test
   matches expectations.
 * Loop blame tests (instead of running the same command multiple time
   against different files).


Changes since V3
================

 * Update diff p2000 tests to use --cached instead of --staged. Execute new
   run and update results in commit description and cover letter.
 * Update comment on blame with pathspec outside sparse definition test in
   t1092-sparse-checkout-compatibility.sh to clarify that it tests the
   current state and could be improved in the future.
 * Ensure sparse index is only activated when diff is running against files
   in a Git repo.
 * BUG if prepare_repo_settings() is called outside a repository.
 * Ensure sparse index is not activated for calls to blame, checkout, or
   pack-object with -h.
 * Ensure commit-graph is only loaded if a git directory exists.

Thanks, Lessley

Lessley Dennington (4):
  sparse index: enable only for git repos
  test-read-cache: set up repo after git directory
  diff: enable and test the sparse index
  blame: enable and test the sparse index

 builtin/blame.c                          |  5 ++
 builtin/checkout.c                       |  6 +-
 builtin/diff.c                           |  5 ++
 builtin/pack-objects.c                   |  9 ++-
 commit-graph.c                           |  5 +-
 repo-settings.c                          |  3 +
 t/helper/test-read-cache.c               |  5 +-
 t/perf/p2000-sparse-operations.sh        |  4 +
 t/t1092-sparse-checkout-compatibility.sh | 95 +++++++++++++++++++++---
 9 files changed, 118 insertions(+), 19 deletions(-)


base-commit: 7159bf518eed5c997cf4ff0f17d9cb69192a091c
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1050%2Fldennington%2Fdiff-blame-sparse-index-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1050/ldennington/diff-blame-sparse-index-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1050

Range-diff vs v3:

 -:  ----------- > 1:  81e208cf454 sparse index: enable only for git repos
 -:  ----------- > 2:  5bc5e8465ab test-read-cache: set up repo after git directory
 1:  991aaad37b4 ! 3:  273ee16b74e diff: enable and test the sparse index
     @@ Commit message
          diff: enable and test the sparse index
      
          Enable the sparse index within the 'git diff' command. Its implementation
     -    already safely integrates with the sparse index because it shares code with
     -    the 'git status' and 'git checkout' commands that were already integrated.
     -    For more details see:
     +    already safely integrates with the sparse index because it shares code
     +    with the 'git status' and 'git checkout' commands that were already
     +    integrated.  For more details see:
      
     -    d76723ee53 (status: use sparse-index throughout, 2021-07-14)
     -    1ba5f45132 (checkout: stop expanding sparse indexes, 2021-06-29)
     +    d76723e (status: use sparse-index throughout, 2021-07-14)
     +    1ba5f45 (checkout: stop expanding sparse indexes, 2021-06-29)
      
     -    The most interesting thing to do is to add tests that verify that 'git diff'
     -    behaves correctly when the sparse index is enabled. These cases are:
     +    The most interesting thing to do is to add tests that verify that 'git
     +    diff' behaves correctly when the sparse index is enabled. These cases are:
      
          1. The index is not expanded for 'diff' and 'diff --staged'
          2. 'diff' and 'diff --staged' behave the same in full checkout, sparse
     @@ Commit message
              2. Path is outside sparse-checkout cone
              3. A merge conflict exists for paths outside sparse-checkout cone
      
     -    The `p2000` tests demonstrate a ~30% execution time reduction for 'git
     -    diff' and a ~75% execution time reduction for 'git diff --staged' using a
     +    The `p2000` tests demonstrate a ~44% execution time reduction for 'git
     +    diff' and a ~86% execution time reduction for 'git diff --staged' using a
          sparse index:
      
          Test                                      before  after
          -------------------------------------------------------------
     -    2000.30: git diff (full-v3)               0.37    0.36 -2.7%
     -    2000.31: git diff (full-v4)               0.36    0.35 -2.8%
     -    2000.32: git diff (sparse-v3)             0.46    0.30 -34.8%
     -    2000.33: git diff (sparse-v4)             0.43    0.31 -27.9%
     -    2000.34: git diff --staged (full-v3)      0.08    0.08 +0.0%
     -    2000.35: git diff --staged (full-v4)      0.08    0.08 +0.0%
     -    2000.36: git diff --staged (sparse-v3)    0.17    0.04 -76.5%
     -    2000.37: git diff --staged (sparse-v4)    0.16    0.04 -75.0%
     +    2000.30: git diff (full-v3)               0.33    0.34 +3.0%
     +    2000.31: git diff (full-v4)               0.33    0.35 +6.1%
     +    2000.32: git diff (sparse-v3)             0.53    0.31 -41.5%
     +    2000.33: git diff (sparse-v4)             0.54    0.29 -46.3%
     +    2000.34: git diff --cached (full-v3)      0.07    0.07 +0.0%
     +    2000.35: git diff --cached (full-v4)      0.07    0.08 +14.3%
     +    2000.36: git diff --cached (sparse-v3)    0.28    0.04 -85.7%
     +    2000.37: git diff --cached (sparse-v4)    0.23    0.03 -87.0%
      
          Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     @@ builtin/diff.c: int cmd_diff(int argc, const char **argv, const char *prefix)
       
       	prefix = setup_git_directory_gently(&nongit);
       
     -+	prepare_repo_settings(the_repository);
     -+	the_repository->settings.command_requires_full_index = 0;
     ++	if (!nongit) {
     ++		prepare_repo_settings(the_repository);
     ++		the_repository->settings.command_requires_full_index = 0;
     ++	}
      +
       	if (!no_index) {
       		/*
     @@ t/perf/p2000-sparse-operations.sh: test_perf_on_all git checkout -f -
       test_perf_on_all git reset --hard
       test_perf_on_all git reset -- does-not-exist
      +test_perf_on_all git diff
     -+test_perf_on_all git diff --staged
     ++test_perf_on_all git diff --cached
       
       test_done
      
 2:  cfdd33129ec ! 4:  7acf5118bf5 blame: enable and test the sparse index
     @@ builtin/blame.c: int cmd_blame(int argc, const char **argv, const char *prefix)
       	long anchor;
       	const int hexsz = the_hash_algo->hexsz;
       
     -+	prepare_repo_settings(the_repository);
     -+	the_repository->settings.command_requires_full_index = 0;
     ++	if (startup_info->have_repository) {
     ++		prepare_repo_settings(the_repository);
     ++		the_repository->settings.command_requires_full_index = 0;
     ++	}
      +
       	setup_default_color_by_age();
       	git_config(git_blame_config, &output_option);
       	repo_init_revisions(the_repository, &revs, NULL);
      
       ## t/perf/p2000-sparse-operations.sh ##
     -@@ t/perf/p2000-sparse-operations.sh: test_perf_on_all git reset --hard
     +@@ t/perf/p2000-sparse-operations.sh: test_perf_on_all git reset
     + test_perf_on_all git reset --hard
       test_perf_on_all git reset -- does-not-exist
       test_perf_on_all git diff
     - test_perf_on_all git diff --staged
     +-test_perf_on_all git diff --cached
     ++test_perf_on_all git diff --staged
      +test_perf_on_all git blame $SPARSE_CONE/a
      +test_perf_on_all git blame $SPARSE_CONE/f3/a
       
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'log with pathspec
      -# TODO: blame currently does not support blaming files outside of the
      -# sparse definition. It complains that the file doesn't exist locally.
      -test_expect_failure 'blame with pathspec outside sparse definition' '
     -+# Blame does not support blaming files outside of the sparse
     -+# definition, so we verify this scenario.
     ++# NEEDSWORK: This test documents the current behavior, but this could
     ++# change in the future if we decide to support blaming files outside
     ++# the sparse definition.
      +test_expect_success 'blame with pathspec outside sparse definition' '
       	init_repos &&
      +	test_sparse_match git sparse-checkout set &&

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v4 1/4] sparse index: enable only for git repos
  2021-11-22 22:42     ` [PATCH v4 0/4] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
@ 2021-11-22 22:42       ` Lessley Dennington via GitGitGadget
  2021-11-23  7:41         ` Elijah Newren
  2021-11-23 23:39         ` Junio C Hamano
  2021-11-22 22:42       ` [PATCH v4 2/4] test-read-cache: set up repo after git directory Lessley Dennington via GitGitGadget
                         ` (3 subsequent siblings)
  4 siblings, 2 replies; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-11-22 22:42 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Lessley Dennington,
	Lessley Dennington

From: Lessley Dennington <lessleydennington@gmail.com>

Check whether git dir exists before adding any repo settings. If it
does not exist, BUG with the message that one cannot add settings for an
uninitialized repository. If it does exist, proceed with adding repo
settings.

Additionally, ensure the above BUG is not triggered when users pass the -h
flag by adding a check for the repository to the checkout and pack-objects
builtins.

Finally, ensure the above BUG is not triggered for commit-graph by
returning early if the git directory does not exist.

Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 builtin/checkout.c     | 6 ++++--
 builtin/pack-objects.c | 9 ++++++---
 commit-graph.c         | 5 ++++-
 repo-settings.c        | 3 +++
 4 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 8c69dcdf72a..31632b07888 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -1588,8 +1588,10 @@ static int checkout_main(int argc, const char **argv, const char *prefix,
 
 	git_config(git_checkout_config, opts);
 
-	prepare_repo_settings(the_repository);
-	the_repository->settings.command_requires_full_index = 0;
+	if (startup_info->have_repository) {
+		prepare_repo_settings(the_repository);
+		the_repository->settings.command_requires_full_index = 0;
+	}
 
 	opts->track = BRANCH_TRACK_UNSPECIFIED;
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 1a3dd445f83..45dc2258dc7 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3976,9 +3976,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	read_replace_refs = 0;
 
 	sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1);
-	prepare_repo_settings(the_repository);
-	if (sparse < 0)
-		sparse = the_repository->settings.pack_use_sparse;
+
+	if (startup_info->have_repository) {
+		prepare_repo_settings(the_repository);
+		if (sparse < 0)
+			sparse = the_repository->settings.pack_use_sparse;
+	}
 
 	reset_pack_idx_option(&pack_idx_opts);
 	git_config(git_pack_config, NULL);
diff --git a/commit-graph.c b/commit-graph.c
index 2706683acfe..265c010122e 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -632,10 +632,13 @@ static int prepare_commit_graph(struct repository *r)
 	struct object_directory *odb;
 
 	/*
+	 * Early return if there is no git dir or if the commit graph is
+	 * disabled.
+	 *
 	 * This must come before the "already attempted?" check below, because
 	 * we want to disable even an already-loaded graph file.
 	 */
-	if (r->commit_graph_disabled)
+	if (!r->gitdir || r->commit_graph_disabled)
 		return 0;
 
 	if (r->objects->commit_graph_attempted)
diff --git a/repo-settings.c b/repo-settings.c
index b93e91a212e..00ca5571a1a 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -17,6 +17,9 @@ void prepare_repo_settings(struct repository *r)
 	char *strval;
 	int manyfiles;
 
+	if (!r->gitdir)
+		BUG("Cannot add settings for uninitialized repository");
+
 	if (r->settings.initialized++)
 		return;
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v4 2/4] test-read-cache: set up repo after git directory
  2021-11-22 22:42     ` [PATCH v4 0/4] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
  2021-11-22 22:42       ` [PATCH v4 1/4] sparse index: enable only for git repos Lessley Dennington via GitGitGadget
@ 2021-11-22 22:42       ` Lessley Dennington via GitGitGadget
  2021-11-23 23:42         ` Junio C Hamano
  2021-11-22 22:42       ` [PATCH v4 3/4] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
                         ` (2 subsequent siblings)
  4 siblings, 1 reply; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-11-22 22:42 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Lessley Dennington,
	Lessley Dennington

From: Lessley Dennington <lessleydennington@gmail.com>

Move repo setup to occur after git directory is set up. This will ensure
enabling the sparse index for `diff` (and guarding against the nongit
scenario) will not cause tests to start failing, since that change will include
adding a check to prepare_repo_settings() with the new BUG.

Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 t/helper/test-read-cache.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c
index b52c174acc7..0d9f08931a1 100644
--- a/t/helper/test-read-cache.c
+++ b/t/helper/test-read-cache.c
@@ -39,8 +39,6 @@ int cmd__read_cache(int argc, const char **argv)
 	int table = 0, expand = 0;
 
 	initialize_the_repository();
-	prepare_repo_settings(r);
-	r->settings.command_requires_full_index = 0;
 
 	for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) {
 		if (skip_prefix(*argv, "--print-and-refresh=", &name))
@@ -56,6 +54,9 @@ int cmd__read_cache(int argc, const char **argv)
 	setup_git_directory();
 	git_config(git_default_config, NULL);
 
+	prepare_repo_settings(r);
+	r->settings.command_requires_full_index = 0;
+
 	for (i = 0; i < cnt; i++) {
 		repo_read_index(r);
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v4 3/4] diff: enable and test the sparse index
  2021-11-22 22:42     ` [PATCH v4 0/4] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
  2021-11-22 22:42       ` [PATCH v4 1/4] sparse index: enable only for git repos Lessley Dennington via GitGitGadget
  2021-11-22 22:42       ` [PATCH v4 2/4] test-read-cache: set up repo after git directory Lessley Dennington via GitGitGadget
@ 2021-11-22 22:42       ` Lessley Dennington via GitGitGadget
  2021-11-23  7:47         ` Elijah Newren
  2021-11-23 23:48         ` Junio C Hamano
  2021-11-22 22:42       ` [PATCH v4 4/4] blame: " Lessley Dennington via GitGitGadget
  2021-12-03 21:15       ` [PATCH v5 0/7] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
  4 siblings, 2 replies; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-11-22 22:42 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Lessley Dennington,
	Lessley Dennington

From: Lessley Dennington <lessleydennington@gmail.com>

Enable the sparse index within the 'git diff' command. Its implementation
already safely integrates with the sparse index because it shares code
with the 'git status' and 'git checkout' commands that were already
integrated.  For more details see:

d76723e (status: use sparse-index throughout, 2021-07-14)
1ba5f45 (checkout: stop expanding sparse indexes, 2021-06-29)

The most interesting thing to do is to add tests that verify that 'git
diff' behaves correctly when the sparse index is enabled. These cases are:

1. The index is not expanded for 'diff' and 'diff --staged'
2. 'diff' and 'diff --staged' behave the same in full checkout, sparse
checkout, and sparse index repositories in the following partially-staged
scenarios (i.e. the index, HEAD, and working directory differ at a given
path):
    1. Path is within sparse-checkout cone
    2. Path is outside sparse-checkout cone
    3. A merge conflict exists for paths outside sparse-checkout cone

The `p2000` tests demonstrate a ~44% execution time reduction for 'git
diff' and a ~86% execution time reduction for 'git diff --staged' using a
sparse index:

Test                                      before  after
-------------------------------------------------------------
2000.30: git diff (full-v3)               0.33    0.34 +3.0%
2000.31: git diff (full-v4)               0.33    0.35 +6.1%
2000.32: git diff (sparse-v3)             0.53    0.31 -41.5%
2000.33: git diff (sparse-v4)             0.54    0.29 -46.3%
2000.34: git diff --cached (full-v3)      0.07    0.07 +0.0%
2000.35: git diff --cached (full-v4)      0.07    0.08 +14.3%
2000.36: git diff --cached (sparse-v3)    0.28    0.04 -85.7%
2000.37: git diff --cached (sparse-v4)    0.23    0.03 -87.0%

Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 builtin/diff.c                           |  5 +++
 t/perf/p2000-sparse-operations.sh        |  2 ++
 t/t1092-sparse-checkout-compatibility.sh | 46 ++++++++++++++++++++++++
 3 files changed, 53 insertions(+)

diff --git a/builtin/diff.c b/builtin/diff.c
index dd8ce688ba7..fa4683377eb 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -437,6 +437,11 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
 
 	prefix = setup_git_directory_gently(&nongit);
 
+	if (!nongit) {
+		prepare_repo_settings(the_repository);
+		the_repository->settings.command_requires_full_index = 0;
+	}
+
 	if (!no_index) {
 		/*
 		 * Treat git diff with at least one path outside of the
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index bfd332120c8..5cf94627383 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -113,5 +113,7 @@ test_perf_on_all git checkout -f -
 test_perf_on_all git reset
 test_perf_on_all git reset --hard
 test_perf_on_all git reset -- does-not-exist
+test_perf_on_all git diff
+test_perf_on_all git diff --cached
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 44d5e11c762..53524660759 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -832,6 +832,52 @@ test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
 	)
 '
 
+test_expect_success 'sparse index is not expanded: diff' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	# Add file within cone
+	test_sparse_match git sparse-checkout set deep &&
+	run_on_all ../edit-contents deep/testfile &&
+	test_all_match git add deep/testfile &&
+	run_on_all ../edit-contents deep/testfile &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged &&
+	ensure_not_expanded diff &&
+	ensure_not_expanded diff --staged &&
+
+	# Add file outside cone
+	test_all_match git reset --hard &&
+	run_on_all mkdir newdirectory &&
+	run_on_all ../edit-contents newdirectory/testfile &&
+	test_sparse_match git sparse-checkout set newdirectory &&
+	test_all_match git add newdirectory/testfile &&
+	run_on_all ../edit-contents newdirectory/testfile &&
+	test_sparse_match git sparse-checkout set &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged &&
+	ensure_not_expanded diff &&
+	ensure_not_expanded diff --staged &&
+
+	# Merge conflict outside cone
+	# The sparse checkout will report a warning that is not in the
+	# full checkout, so we use `run_on_all` instead of
+	# `test_all_match`
+	run_on_all git reset --hard &&
+	test_all_match git checkout merge-left &&
+	test_all_match test_must_fail git merge merge-right &&
+
+	test_all_match git diff &&
+	test_all_match git diff --staged &&
+	ensure_not_expanded diff &&
+	ensure_not_expanded diff --staged
+'
+
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
 # in this scenario, but it shouldn't.
 test_expect_success 'reset mixed and checkout orphan' '
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v4 4/4] blame: enable and test the sparse index
  2021-11-22 22:42     ` [PATCH v4 0/4] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
                         ` (2 preceding siblings ...)
  2021-11-22 22:42       ` [PATCH v4 3/4] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
@ 2021-11-22 22:42       ` Lessley Dennington via GitGitGadget
  2021-11-23 23:53         ` Junio C Hamano
  2021-12-03 21:15       ` [PATCH v5 0/7] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
  4 siblings, 1 reply; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-11-22 22:42 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Lessley Dennington,
	Lessley Dennington

From: Lessley Dennington <lessleydennington@gmail.com>

Enable the sparse index for the 'git blame' command. The index was already
not expanded with this command, so the most interesting thing to do is to
add tests that verify that 'git blame' behaves correctly when the sparse
index is enabled and that its performance improves. More specifically, these
cases are:

1. The index is not expanded for 'blame' when given paths in the sparse
checkout cone at multiple levels.

2. Performance measurably improves for 'blame' with sparse index when given
paths in the sparse checkout cone at multiple levels.

The `p2000` tests demonstrate a ~60% execution time reduction when running
'blame' for a file two levels deep and and a ~30% execution time reduction
for a file three levels deep.

Test                                         before  after
----------------------------------------------------------------
2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%

We do not include paths outside the sparse checkout cone because blame
currently does not support blaming files outside of the sparse definition.
Attempting to do so fails with the following error:

  fatal: no such path '<path outside sparse definition>' in HEAD

Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 builtin/blame.c                          |  5 +++
 t/perf/p2000-sparse-operations.sh        |  4 +-
 t/t1092-sparse-checkout-compatibility.sh | 49 ++++++++++++++++++------
 3 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 641523ff9af..247b9eaf88f 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -902,6 +902,11 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 	long anchor;
 	const int hexsz = the_hash_algo->hexsz;
 
+	if (startup_info->have_repository) {
+		prepare_repo_settings(the_repository);
+		the_repository->settings.command_requires_full_index = 0;
+	}
+
 	setup_default_color_by_age();
 	git_config(git_blame_config, &output_option);
 	repo_init_revisions(the_repository, &revs, NULL);
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index 5cf94627383..9ac76a049b8 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -114,6 +114,8 @@ test_perf_on_all git reset
 test_perf_on_all git reset --hard
 test_perf_on_all git reset -- does-not-exist
 test_perf_on_all git diff
-test_perf_on_all git diff --cached
+test_perf_on_all git diff --staged
+test_perf_on_all git blame $SPARSE_CONE/a
+test_perf_on_all git blame $SPARSE_CONE/f3/a
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 53524660759..d680dbd8867 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -442,21 +442,36 @@ test_expect_success 'log with pathspec outside sparse definition' '
 test_expect_success 'blame with pathspec inside sparse definition' '
 	init_repos &&
 
-	test_all_match git blame a &&
-	test_all_match git blame deep/a &&
-	test_all_match git blame deep/deeper1/a &&
-	test_all_match git blame deep/deeper1/deepest/a
+	for file in a \
+			deep/a \
+			deep/deeper1/a \
+			deep/deeper1/deepest/a
+	do
+		test_all_match git blame $file
+	done
 '
 
-# TODO: blame currently does not support blaming files outside of the
-# sparse definition. It complains that the file doesn't exist locally.
-test_expect_failure 'blame with pathspec outside sparse definition' '
+# NEEDSWORK: This test documents the current behavior, but this could
+# change in the future if we decide to support blaming files outside
+# the sparse definition.
+test_expect_success 'blame with pathspec outside sparse definition' '
 	init_repos &&
+	test_sparse_match git sparse-checkout set &&
 
-	test_all_match git blame folder1/a &&
-	test_all_match git blame folder2/a &&
-	test_all_match git blame deep/deeper2/a &&
-	test_all_match git blame deep/deeper2/deepest/a
+	for file in a \
+			deep/a \
+			deep/deeper1/a \
+			deep/deeper1/deepest/a
+	do
+		test_sparse_match test_must_fail git blame $file &&
+		cat >expect <<-EOF &&
+		fatal: Cannot lstat '"'"'$file'"'"': No such file or directory
+		EOF
+		# We compare sparse-checkout-err and sparse-index-err in
+		# `test_sparse_match`. Given we know they are the same, we
+		# only check the content of sparse-index-err here.
+		test_cmp expect sparse-index-err
+	done
 '
 
 test_expect_success 'checkout and reset (mixed)' '
@@ -878,6 +893,18 @@ test_expect_success 'sparse index is not expanded: diff' '
 	ensure_not_expanded diff --staged
 '
 
+test_expect_success 'sparse index is not expanded: blame' '
+	init_repos &&
+
+	for file in a \
+			deep/a \
+			deep/deeper1/a \
+			deep/deeper1/deepest/a
+	do
+		ensure_not_expanded blame $file
+	done
+'
+
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
 # in this scenario, but it shouldn't.
 test_expect_success 'reset mixed and checkout orphan' '
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v4 1/4] sparse index: enable only for git repos
  2021-11-22 22:42       ` [PATCH v4 1/4] sparse index: enable only for git repos Lessley Dennington via GitGitGadget
@ 2021-11-23  7:41         ` Elijah Newren
  2021-11-23 14:52           ` Lessley Dennington
  2021-11-23 23:39         ` Junio C Hamano
  1 sibling, 1 reply; 56+ messages in thread
From: Elijah Newren @ 2021-11-23  7:41 UTC (permalink / raw)
  To: Lessley Dennington via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Taylor Blau,
	Lessley Dennington

On Mon, Nov 22, 2021 at 2:42 PM Lessley Dennington via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Lessley Dennington <lessleydennington@gmail.com>
>
> Check whether git dir exists before adding any repo settings. If it
> does not exist, BUG with the message that one cannot add settings for an
> uninitialized repository. If it does exist, proceed with adding repo
> settings.
>
> Additionally, ensure the above BUG is not triggered when users pass the -h
> flag by adding a check for the repository to the checkout and pack-objects
> builtins.

Why only checkout and pack-objects?  Why don't the -h flags to all of
the following need it as well?:

$ git grep -l prepare_repo_settings | grep builtin/
builtin/add.c
builtin/blame.c
builtin/checkout.c
builtin/commit.c
builtin/diff.c
builtin/fetch.c
builtin/gc.c
builtin/merge.c
builtin/pack-objects.c
builtin/rebase.c
builtin/reset.c
builtin/revert.c
builtin/sparse-checkout.c
builtin/update-index.c

If none of these need it, was it because they put
prepare_repo_settings() calls after some other basic checks had been
done so more do not have to be added?  If so, is there a similar place
in checkout and pack-objects where their calls to
prepare_repo_settings() can be moved?  (Looking ahead, it appears you
moved some code in patch 2 to do something like this.  Are the similar
moves that could be done here?)

> Finally, ensure the above BUG is not triggered for commit-graph by
> returning early if the git directory does not exist.

If commit-graph needs a special case to avoid triggering the BUG,
wouldn't several of these need it too?:

$ git grep -l prepare_repo_settings | grep -v builtin/
commit-graph.c
fetch-negotiator.c
merge-recursive.c
midx.c
read-cache.c
repo-settings.c
repository.c
repository.h
sparse-index.c
t/helper/test-read-cache.c
t/helper/test-read-graph.c
unpack-trees.c

or are their calls to prepare_repo_settings() only done after gitdir
setup?  If the latter, perhaps the commit-graph function calls could
be moved after gitdir setup too to avoid the need to do extra checks
in it?

> Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
> ---
>  builtin/checkout.c     | 6 ++++--
>  builtin/pack-objects.c | 9 ++++++---
>  commit-graph.c         | 5 ++++-
>  repo-settings.c        | 3 +++
>  4 files changed, 17 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index 8c69dcdf72a..31632b07888 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -1588,8 +1588,10 @@ static int checkout_main(int argc, const char **argv, const char *prefix,
>
>         git_config(git_checkout_config, opts);
>
> -       prepare_repo_settings(the_repository);
> -       the_repository->settings.command_requires_full_index = 0;
> +       if (startup_info->have_repository) {
> +               prepare_repo_settings(the_repository);
> +               the_repository->settings.command_requires_full_index = 0;
> +       }
>
>         opts->track = BRANCH_TRACK_UNSPECIFIED;
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 1a3dd445f83..45dc2258dc7 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3976,9 +3976,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>         read_replace_refs = 0;
>
>         sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1);
> -       prepare_repo_settings(the_repository);
> -       if (sparse < 0)
> -               sparse = the_repository->settings.pack_use_sparse;
> +
> +       if (startup_info->have_repository) {
> +               prepare_repo_settings(the_repository);
> +               if (sparse < 0)
> +                       sparse = the_repository->settings.pack_use_sparse;
> +       }
>
>         reset_pack_idx_option(&pack_idx_opts);
>         git_config(git_pack_config, NULL);
> diff --git a/commit-graph.c b/commit-graph.c
> index 2706683acfe..265c010122e 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -632,10 +632,13 @@ static int prepare_commit_graph(struct repository *r)
>         struct object_directory *odb;
>
>         /*
> +        * Early return if there is no git dir or if the commit graph is
> +        * disabled.
> +        *
>          * This must come before the "already attempted?" check below, because
>          * we want to disable even an already-loaded graph file.
>          */
> -       if (r->commit_graph_disabled)
> +       if (!r->gitdir || r->commit_graph_disabled)
>                 return 0;
>
>         if (r->objects->commit_graph_attempted)
> diff --git a/repo-settings.c b/repo-settings.c
> index b93e91a212e..00ca5571a1a 100644
> --- a/repo-settings.c
> +++ b/repo-settings.c
> @@ -17,6 +17,9 @@ void prepare_repo_settings(struct repository *r)
>         char *strval;
>         int manyfiles;
>
> +       if (!r->gitdir)
> +               BUG("Cannot add settings for uninitialized repository");
> +
>         if (r->settings.initialized++)
>                 return;

I'm not what the BUG() is trying to help us catch, but I'm worried
that there are many additional places that now need workarounds to
avoid triggering bugs.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v4 3/4] diff: enable and test the sparse index
  2021-11-22 22:42       ` [PATCH v4 3/4] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
@ 2021-11-23  7:47         ` Elijah Newren
  2021-11-23 14:53           ` Lessley Dennington
  2021-11-23 23:48         ` Junio C Hamano
  1 sibling, 1 reply; 56+ messages in thread
From: Elijah Newren @ 2021-11-23  7:47 UTC (permalink / raw)
  To: Lessley Dennington via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Taylor Blau,
	Lessley Dennington

On Mon, Nov 22, 2021 at 2:42 PM Lessley Dennington via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Lessley Dennington <lessleydennington@gmail.com>
>
> Enable the sparse index within the 'git diff' command. Its implementation
> already safely integrates with the sparse index because it shares code
> with the 'git status' and 'git checkout' commands that were already
> integrated.  For more details see:
>
> d76723e (status: use sparse-index throughout, 2021-07-14)
> 1ba5f45 (checkout: stop expanding sparse indexes, 2021-06-29)

I preferred the references in your v3:

d76723ee53 (status: use sparse-index throughout, 2021-07-14)
1ba5f45132 (checkout: stop expanding sparse indexes, 2021-06-29)

because 7-character abbreviations aren't very future proof;
10-character seems better to me.

(Very micro nit.)

>
> The most interesting thing to do is to add tests that verify that 'git
> diff' behaves correctly when the sparse index is enabled. These cases are:
>
> 1. The index is not expanded for 'diff' and 'diff --staged'
> 2. 'diff' and 'diff --staged' behave the same in full checkout, sparse
> checkout, and sparse index repositories in the following partially-staged
> scenarios (i.e. the index, HEAD, and working directory differ at a given
> path):
>     1. Path is within sparse-checkout cone
>     2. Path is outside sparse-checkout cone
>     3. A merge conflict exists for paths outside sparse-checkout cone
>
> The `p2000` tests demonstrate a ~44% execution time reduction for 'git
> diff' and a ~86% execution time reduction for 'git diff --staged' using a
> sparse index:
>
> Test                                      before  after
> -------------------------------------------------------------
> 2000.30: git diff (full-v3)               0.33    0.34 +3.0%
> 2000.31: git diff (full-v4)               0.33    0.35 +6.1%
> 2000.32: git diff (sparse-v3)             0.53    0.31 -41.5%
> 2000.33: git diff (sparse-v4)             0.54    0.29 -46.3%
> 2000.34: git diff --cached (full-v3)      0.07    0.07 +0.0%
> 2000.35: git diff --cached (full-v4)      0.07    0.08 +14.3%
> 2000.36: git diff --cached (sparse-v3)    0.28    0.04 -85.7%
> 2000.37: git diff --cached (sparse-v4)    0.23    0.03 -87.0%
>
> Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
> ---
>  builtin/diff.c                           |  5 +++
>  t/perf/p2000-sparse-operations.sh        |  2 ++
>  t/t1092-sparse-checkout-compatibility.sh | 46 ++++++++++++++++++++++++
>  3 files changed, 53 insertions(+)
>
> diff --git a/builtin/diff.c b/builtin/diff.c
> index dd8ce688ba7..fa4683377eb 100644
> --- a/builtin/diff.c
> +++ b/builtin/diff.c
> @@ -437,6 +437,11 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
>
>         prefix = setup_git_directory_gently(&nongit);
>
> +       if (!nongit) {
> +               prepare_repo_settings(the_repository);
> +               the_repository->settings.command_requires_full_index = 0;
> +       }
> +
>         if (!no_index) {
>                 /*
>                  * Treat git diff with at least one path outside of the
> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> index bfd332120c8..5cf94627383 100755
> --- a/t/perf/p2000-sparse-operations.sh
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -113,5 +113,7 @@ test_perf_on_all git checkout -f -
>  test_perf_on_all git reset
>  test_perf_on_all git reset --hard
>  test_perf_on_all git reset -- does-not-exist
> +test_perf_on_all git diff
> +test_perf_on_all git diff --cached
>
>  test_done
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 44d5e11c762..53524660759 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -832,6 +832,52 @@ test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
>         )
>  '
>
> +test_expect_success 'sparse index is not expanded: diff' '
> +       init_repos &&
> +
> +       write_script edit-contents <<-\EOF &&
> +       echo text >>$1
> +       EOF
> +
> +       # Add file within cone
> +       test_sparse_match git sparse-checkout set deep &&
> +       run_on_all ../edit-contents deep/testfile &&
> +       test_all_match git add deep/testfile &&
> +       run_on_all ../edit-contents deep/testfile &&
> +
> +       test_all_match git diff &&
> +       test_all_match git diff --staged &&
> +       ensure_not_expanded diff &&
> +       ensure_not_expanded diff --staged &&
> +
> +       # Add file outside cone
> +       test_all_match git reset --hard &&
> +       run_on_all mkdir newdirectory &&
> +       run_on_all ../edit-contents newdirectory/testfile &&
> +       test_sparse_match git sparse-checkout set newdirectory &&
> +       test_all_match git add newdirectory/testfile &&
> +       run_on_all ../edit-contents newdirectory/testfile &&
> +       test_sparse_match git sparse-checkout set &&
> +
> +       test_all_match git diff &&
> +       test_all_match git diff --staged &&
> +       ensure_not_expanded diff &&
> +       ensure_not_expanded diff --staged &&
> +
> +       # Merge conflict outside cone
> +       # The sparse checkout will report a warning that is not in the
> +       # full checkout, so we use `run_on_all` instead of
> +       # `test_all_match`
> +       run_on_all git reset --hard &&
> +       test_all_match git checkout merge-left &&
> +       test_all_match test_must_fail git merge merge-right &&
> +
> +       test_all_match git diff &&
> +       test_all_match git diff --staged &&
> +       ensure_not_expanded diff &&
> +       ensure_not_expanded diff --staged

You've changed some of the --staged to --cached, but based on Junio's
comments on the previous round, you probably want to convert the
others too.

> +'
> +
>  # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
>  # in this scenario, but it shouldn't.
>  test_expect_success 'reset mixed and checkout orphan' '
> --
> gitgitgadget
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 2/2] blame: enable and test the sparse index
  2021-10-14 17:25 ` [PATCH 2/2] blame: " Lessley Dennington via GitGitGadget
@ 2021-11-23  7:57   ` Elijah Newren
  2021-11-23 14:57     ` Lessley Dennington
  0 siblings, 1 reply; 56+ messages in thread
From: Elijah Newren @ 2021-11-23  7:57 UTC (permalink / raw)
  To: Lessley Dennington via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Lessley Dennington

On Thu, Oct 14, 2021 at 10:25 AM Lessley Dennington via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Lessley Dennington <lessleydennington@gmail.com>
>
> Enable the sparse index for the 'git blame' command. The index was already
> not expanded with this command, so the most interesting thing to do is to
> add tests that verify that 'git blame' behaves correctly when the sparse
> index is enabled and that its performance improves. More specifically, these
> cases are:
>
> 1. The index is not expanded for 'blame' when given paths in the sparse
> checkout cone at multiple levels.
>
> 2. Performance measurably improves for 'blame' with sparse index when given
> paths in the sparse checkout cone at multiple levels.
>
> The `p2000` tests demonstrate a ~60% execution time reduction when running
> 'blame' for a file two levels deep and and a ~30% execution time reduction
> for a file three levels deep.
>
> Test                                         before  after
> ----------------------------------------------------------------
> 2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
> 2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
> 2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
> 2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
> 2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
> 2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
> 2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
> 2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%

Looks good.

> We do not include paths outside the sparse checkout cone because blame
> currently does not support blaming files outside of the sparse definition.
> Attempting to do so fails with the following error:
>
> fatal: no such path '<path outside sparse definition>' in HEAD

While technically accurate, this wording is misleading; it implies
that there is something unique to sparse checkouts, and perhaps even
to cone mode, affecting how blame handles files not in the working
directory.  That's not true, though; git blame without a revision has
always reported an error when given a file that does not exist in the
working tree.  Try this in git.git:

$ rm t/README
$ git blame t/README
fatal: Cannot lstat 't/README': No such file or directory

The reason is that with no revisions, calling git blame with a
filename means asking the question "Which commit did each line in that
file come from?"  If there's no file, the question just doesn't make
sense.  You could make sense of it by thinking in terms of some
revision of the file, but then you're passing a revision along --
which works just fine in a sparse checkout too.

>
> Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
> ---
>  builtin/blame.c                          |  3 +++
>  t/perf/p2000-sparse-operations.sh        |  2 ++
>  t/t1092-sparse-checkout-compatibility.sh | 24 +++++++++++++++++-------
>  3 files changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/builtin/blame.c b/builtin/blame.c
> index 641523ff9af..af3d81e2bd4 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -902,6 +902,9 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
>         long anchor;
>         const int hexsz = the_hash_algo->hexsz;
>
> +       prepare_repo_settings(the_repository);
> +       the_repository->settings.command_requires_full_index = 0;
> +
>         setup_default_color_by_age();
>         git_config(git_blame_config, &output_option);
>         repo_init_revisions(the_repository, &revs, NULL);
> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> index bff93f16e93..9ac76a049b8 100755
> --- a/t/perf/p2000-sparse-operations.sh
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -115,5 +115,7 @@ test_perf_on_all git reset --hard
>  test_perf_on_all git reset -- does-not-exist
>  test_perf_on_all git diff
>  test_perf_on_all git diff --staged
> +test_perf_on_all git blame $SPARSE_CONE/a
> +test_perf_on_all git blame $SPARSE_CONE/f3/a
>
>  test_done
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 1070bff1a83..54826e858a9 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -485,15 +485,16 @@ test_expect_success 'blame with pathspec inside sparse definition' '
>         test_all_match git blame deep/deeper1/deepest/a
>  '
>
> -# TODO: blame currently does not support blaming files outside of the
> -# sparse definition. It complains that the file doesn't exist locally.
> -test_expect_failure 'blame with pathspec outside sparse definition' '
> +# Blame does not support blaming files outside of the sparse
> +# definition, so we verify this scenario.

As above, this is misleading.  It'd be better to word it something like:

# Without a revision specified, blame will error if passed any file that
# is not present in the working directory (even if the file is tracked).
# Here we just verify that this is also true with sparse checkouts.

> +test_expect_success 'blame with pathspec outside sparse definition' '
>         init_repos &&
>
> -       test_all_match git blame folder1/a &&
> -       test_all_match git blame folder2/a &&
> -       test_all_match git blame deep/deeper2/a &&
> -       test_all_match git blame deep/deeper2/deepest/a
> +       test_sparse_match git sparse-checkout set &&
> +       test_sparse_match test_must_fail git blame folder1/a &&
> +       test_sparse_match test_must_fail git blame folder2/a &&
> +       test_sparse_match test_must_fail git blame deep/deeper2/a &&
> +       test_sparse_match test_must_fail git blame deep/deeper2/deepest/a
>  '
>
>  test_expect_success 'checkout and reset (mixed)' '
> @@ -871,6 +872,15 @@ test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
>         )
>  '
>
> +test_expect_success 'sparse index is not expanded: blame' '
> +       init_repos &&
> +
> +       ensure_not_expanded blame a &&
> +       ensure_not_expanded blame deep/a &&
> +       ensure_not_expanded blame deep/deeper1/a &&
> +       ensure_not_expanded blame deep/deeper1/deepest/a
> +'
> +
>  # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
>  # in this scenario, but it shouldn't.
>  test_expect_success 'reset mixed and checkout orphan' '
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v4 1/4] sparse index: enable only for git repos
  2021-11-23  7:41         ` Elijah Newren
@ 2021-11-23 14:52           ` Lessley Dennington
  0 siblings, 0 replies; 56+ messages in thread
From: Lessley Dennington @ 2021-11-23 14:52 UTC (permalink / raw)
  To: Elijah Newren, Lessley Dennington via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Taylor Blau



On 11/22/21 11:41 PM, Elijah Newren wrote:
> On Mon, Nov 22, 2021 at 2:42 PM Lessley Dennington via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Lessley Dennington <lessleydennington@gmail.com>
>>
>> Check whether git dir exists before adding any repo settings. If it
>> does not exist, BUG with the message that one cannot add settings for an
>> uninitialized repository. If it does exist, proceed with adding repo
>> settings.
>>
>> Additionally, ensure the above BUG is not triggered when users pass the -h
>> flag by adding a check for the repository to the checkout and pack-objects
>> builtins.
> 
> Why only checkout and pack-objects?  Why don't the -h flags to all of
> the following need it as well?:
> 
> $ git grep -l prepare_repo_settings | grep builtin/
> builtin/add.c
> builtin/blame.c
> builtin/checkout.c
> builtin/commit.c
> builtin/diff.c
> builtin/fetch.c
> builtin/gc.c
> builtin/merge.c
> builtin/pack-objects.c
> builtin/rebase.c
> builtin/reset.c
> builtin/revert.c
> builtin/sparse-checkout.c
> builtin/update-index.c
> 
> If none of these need it, was it because they put
> prepare_repo_settings() calls after some other basic checks had been
> done so more do not have to be added?  If so, is there a similar place
> in checkout and pack-objects where their calls to
> prepare_repo_settings() can be moved?  (Looking ahead, it appears you
> moved some code in patch 2 to do something like this.  Are the similar
> moves that could be done here?)
> 
Thank you for the quick feedback. Yes, I believe there are similar moves 
that can be done here. I was attempting to be explicit about the case I'm 
guarding against, but you're right - it shouldn't be done one way for some 
builtins and another way for others.
>> Finally, ensure the above BUG is not triggered for commit-graph by
>> returning early if the git directory does not exist.
> 
> If commit-graph needs a special case to avoid triggering the BUG,
> wouldn't several of these need it too?:
> 
> $ git grep -l prepare_repo_settings | grep -v builtin/
> commit-graph.c
> fetch-negotiator.c
> merge-recursive.c
> midx.c
> read-cache.c
> repo-settings.c
> repository.c
> repository.h
> sparse-index.c
> t/helper/test-read-cache.c
> t/helper/test-read-graph.c
> unpack-trees.c
> 
> or are their calls to prepare_repo_settings() only done after gitdir
> setup?  If the latter, perhaps the commit-graph function calls could
> be moved after gitdir setup too to avoid the need to do extra checks
> in it?
> 
Agreed, I can move this as well.
>> Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
>> ---
>>   builtin/checkout.c     | 6 ++++--
>>   builtin/pack-objects.c | 9 ++++++---
>>   commit-graph.c         | 5 ++++-
>>   repo-settings.c        | 3 +++
>>   4 files changed, 17 insertions(+), 6 deletions(-)
>>
>> diff --git a/builtin/checkout.c b/builtin/checkout.c
>> index 8c69dcdf72a..31632b07888 100644
>> --- a/builtin/checkout.c
>> +++ b/builtin/checkout.c
>> @@ -1588,8 +1588,10 @@ static int checkout_main(int argc, const char **argv, const char *prefix,
>>
>>          git_config(git_checkout_config, opts);
>>
>> -       prepare_repo_settings(the_repository);
>> -       the_repository->settings.command_requires_full_index = 0;
>> +       if (startup_info->have_repository) {
>> +               prepare_repo_settings(the_repository);
>> +               the_repository->settings.command_requires_full_index = 0;
>> +       }
>>
>>          opts->track = BRANCH_TRACK_UNSPECIFIED;
>>
>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>> index 1a3dd445f83..45dc2258dc7 100644
>> --- a/builtin/pack-objects.c
>> +++ b/builtin/pack-objects.c
>> @@ -3976,9 +3976,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>>          read_replace_refs = 0;
>>
>>          sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1);
>> -       prepare_repo_settings(the_repository);
>> -       if (sparse < 0)
>> -               sparse = the_repository->settings.pack_use_sparse;
>> +
>> +       if (startup_info->have_repository) {
>> +               prepare_repo_settings(the_repository);
>> +               if (sparse < 0)
>> +                       sparse = the_repository->settings.pack_use_sparse;
>> +       }
>>
>>          reset_pack_idx_option(&pack_idx_opts);
>>          git_config(git_pack_config, NULL);
>> diff --git a/commit-graph.c b/commit-graph.c
>> index 2706683acfe..265c010122e 100644
>> --- a/commit-graph.c
>> +++ b/commit-graph.c
>> @@ -632,10 +632,13 @@ static int prepare_commit_graph(struct repository *r)
>>          struct object_directory *odb;
>>
>>          /*
>> +        * Early return if there is no git dir or if the commit graph is
>> +        * disabled.
>> +        *
>>           * This must come before the "already attempted?" check below, because
>>           * we want to disable even an already-loaded graph file.
>>           */
>> -       if (r->commit_graph_disabled)
>> +       if (!r->gitdir || r->commit_graph_disabled)
>>                  return 0;
>>
>>          if (r->objects->commit_graph_attempted)
>> diff --git a/repo-settings.c b/repo-settings.c
>> index b93e91a212e..00ca5571a1a 100644
>> --- a/repo-settings.c
>> +++ b/repo-settings.c
>> @@ -17,6 +17,9 @@ void prepare_repo_settings(struct repository *r)
>>          char *strval;
>>          int manyfiles;
>>
>> +       if (!r->gitdir)
>> +               BUG("Cannot add settings for uninitialized repository");
>> +
>>          if (r->settings.initialized++)
>>                  return;
> 
> I'm not what the BUG() is trying to help us catch, but I'm worried
> that there are many additional places that now need workarounds to
> avoid triggering bugs.
> 
I see your point. We're trying to make sure we catch issues like the 
nongit scenario I overlooked in diff in earlier versions of this series. 
But if we feel this change is too disruptive and will likely cause issues 
beyond those I've fixed to ensure our tests pass, I can remove.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v4 3/4] diff: enable and test the sparse index
  2021-11-23  7:47         ` Elijah Newren
@ 2021-11-23 14:53           ` Lessley Dennington
  0 siblings, 0 replies; 56+ messages in thread
From: Lessley Dennington @ 2021-11-23 14:53 UTC (permalink / raw)
  To: Elijah Newren, Lessley Dennington via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Taylor Blau



On 11/22/21 11:47 PM, Elijah Newren wrote:
> On Mon, Nov 22, 2021 at 2:42 PM Lessley Dennington via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Lessley Dennington <lessleydennington@gmail.com>
>>
>> Enable the sparse index within the 'git diff' command. Its implementation
>> already safely integrates with the sparse index because it shares code
>> with the 'git status' and 'git checkout' commands that were already
>> integrated.  For more details see:
>>
>> d76723e (status: use sparse-index throughout, 2021-07-14)
>> 1ba5f45 (checkout: stop expanding sparse indexes, 2021-06-29)
> 
> I preferred the references in your v3:
> 
> d76723ee53 (status: use sparse-index throughout, 2021-07-14)
> 1ba5f45132 (checkout: stop expanding sparse indexes, 2021-06-29)
> 
> because 7-character abbreviations aren't very future proof;
> 10-character seems better to me.
> 
> (Very micro nit.)
> 
Appreciate the pointer - I'll return to the old formatting for v5.
>>
>> The most interesting thing to do is to add tests that verify that 'git
>> diff' behaves correctly when the sparse index is enabled. These cases are:
>>
>> 1. The index is not expanded for 'diff' and 'diff --staged'
>> 2. 'diff' and 'diff --staged' behave the same in full checkout, sparse
>> checkout, and sparse index repositories in the following partially-staged
>> scenarios (i.e. the index, HEAD, and working directory differ at a given
>> path):
>>      1. Path is within sparse-checkout cone
>>      2. Path is outside sparse-checkout cone
>>      3. A merge conflict exists for paths outside sparse-checkout cone
>>
>> The `p2000` tests demonstrate a ~44% execution time reduction for 'git
>> diff' and a ~86% execution time reduction for 'git diff --staged' using a
>> sparse index:
>>
>> Test                                      before  after
>> -------------------------------------------------------------
>> 2000.30: git diff (full-v3)               0.33    0.34 +3.0%
>> 2000.31: git diff (full-v4)               0.33    0.35 +6.1%
>> 2000.32: git diff (sparse-v3)             0.53    0.31 -41.5%
>> 2000.33: git diff (sparse-v4)             0.54    0.29 -46.3%
>> 2000.34: git diff --cached (full-v3)      0.07    0.07 +0.0%
>> 2000.35: git diff --cached (full-v4)      0.07    0.08 +14.3%
>> 2000.36: git diff --cached (sparse-v3)    0.28    0.04 -85.7%
>> 2000.37: git diff --cached (sparse-v4)    0.23    0.03 -87.0%
>>
>> Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
>> ---
>>   builtin/diff.c                           |  5 +++
>>   t/perf/p2000-sparse-operations.sh        |  2 ++
>>   t/t1092-sparse-checkout-compatibility.sh | 46 ++++++++++++++++++++++++
>>   3 files changed, 53 insertions(+)
>>
>> diff --git a/builtin/diff.c b/builtin/diff.c
>> index dd8ce688ba7..fa4683377eb 100644
>> --- a/builtin/diff.c
>> +++ b/builtin/diff.c
>> @@ -437,6 +437,11 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
>>
>>          prefix = setup_git_directory_gently(&nongit);
>>
>> +       if (!nongit) {
>> +               prepare_repo_settings(the_repository);
>> +               the_repository->settings.command_requires_full_index = 0;
>> +       }
>> +
>>          if (!no_index) {
>>                  /*
>>                   * Treat git diff with at least one path outside of the
>> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
>> index bfd332120c8..5cf94627383 100755
>> --- a/t/perf/p2000-sparse-operations.sh
>> +++ b/t/perf/p2000-sparse-operations.sh
>> @@ -113,5 +113,7 @@ test_perf_on_all git checkout -f -
>>   test_perf_on_all git reset
>>   test_perf_on_all git reset --hard
>>   test_perf_on_all git reset -- does-not-exist
>> +test_perf_on_all git diff
>> +test_perf_on_all git diff --cached
>>
>>   test_done
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index 44d5e11c762..53524660759 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -832,6 +832,52 @@ test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
>>          )
>>   '
>>
>> +test_expect_success 'sparse index is not expanded: diff' '
>> +       init_repos &&
>> +
>> +       write_script edit-contents <<-\EOF &&
>> +       echo text >>$1
>> +       EOF
>> +
>> +       # Add file within cone
>> +       test_sparse_match git sparse-checkout set deep &&
>> +       run_on_all ../edit-contents deep/testfile &&
>> +       test_all_match git add deep/testfile &&
>> +       run_on_all ../edit-contents deep/testfile &&
>> +
>> +       test_all_match git diff &&
>> +       test_all_match git diff --staged &&
>> +       ensure_not_expanded diff &&
>> +       ensure_not_expanded diff --staged &&
>> +
>> +       # Add file outside cone
>> +       test_all_match git reset --hard &&
>> +       run_on_all mkdir newdirectory &&
>> +       run_on_all ../edit-contents newdirectory/testfile &&
>> +       test_sparse_match git sparse-checkout set newdirectory &&
>> +       test_all_match git add newdirectory/testfile &&
>> +       run_on_all ../edit-contents newdirectory/testfile &&
>> +       test_sparse_match git sparse-checkout set &&
>> +
>> +       test_all_match git diff &&
>> +       test_all_match git diff --staged &&
>> +       ensure_not_expanded diff &&
>> +       ensure_not_expanded diff --staged &&
>> +
>> +       # Merge conflict outside cone
>> +       # The sparse checkout will report a warning that is not in the
>> +       # full checkout, so we use `run_on_all` instead of
>> +       # `test_all_match`
>> +       run_on_all git reset --hard &&
>> +       test_all_match git checkout merge-left &&
>> +       test_all_match test_must_fail git merge merge-right &&
>> +
>> +       test_all_match git diff &&
>> +       test_all_match git diff --staged &&
>> +       ensure_not_expanded diff &&
>> +       ensure_not_expanded diff --staged
> 
> You've changed some of the --staged to --cached, but based on Junio's
> comments on the previous round, you probably want to convert the
> others too.
> 
Will do for v5.
>> +'
>> +
>>   # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
>>   # in this scenario, but it shouldn't.
>>   test_expect_success 'reset mixed and checkout orphan' '
>> --
>> gitgitgadget
>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 2/2] blame: enable and test the sparse index
  2021-11-23  7:57   ` Elijah Newren
@ 2021-11-23 14:57     ` Lessley Dennington
  0 siblings, 0 replies; 56+ messages in thread
From: Lessley Dennington @ 2021-11-23 14:57 UTC (permalink / raw)
  To: Elijah Newren, Lessley Dennington via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano



On 11/22/21 11:57 PM, Elijah Newren wrote:
> On Thu, Oct 14, 2021 at 10:25 AM Lessley Dennington via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Lessley Dennington <lessleydennington@gmail.com>
>>
>> Enable the sparse index for the 'git blame' command. The index was already
>> not expanded with this command, so the most interesting thing to do is to
>> add tests that verify that 'git blame' behaves correctly when the sparse
>> index is enabled and that its performance improves. More specifically, these
>> cases are:
>>
>> 1. The index is not expanded for 'blame' when given paths in the sparse
>> checkout cone at multiple levels.
>>
>> 2. Performance measurably improves for 'blame' with sparse index when given
>> paths in the sparse checkout cone at multiple levels.
>>
>> The `p2000` tests demonstrate a ~60% execution time reduction when running
>> 'blame' for a file two levels deep and and a ~30% execution time reduction
>> for a file three levels deep.
>>
>> Test                                         before  after
>> ----------------------------------------------------------------
>> 2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
>> 2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
>> 2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
>> 2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
>> 2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
>> 2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
>> 2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
>> 2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%
> 
> Looks good.
> 
>> We do not include paths outside the sparse checkout cone because blame
>> currently does not support blaming files outside of the sparse definition.
>> Attempting to do so fails with the following error:
>>
>> fatal: no such path '<path outside sparse definition>' in HEAD
> 
> While technically accurate, this wording is misleading; it implies
> that there is something unique to sparse checkouts, and perhaps even
> to cone mode, affecting how blame handles files not in the working
> directory.  That's not true, though; git blame without a revision has
> always reported an error when given a file that does not exist in the
> working tree.  Try this in git.git:
> 
> $ rm t/README
> $ git blame t/README
> fatal: Cannot lstat 't/README': No such file or directory
> 
> The reason is that with no revisions, calling git blame with a
> filename means asking the question "Which commit did each line in that
> file come from?"  If there's no file, the question just doesn't make
> sense.  You could make sense of it by thinking in terms of some
> revision of the file, but then you're passing a revision along --
> which works just fine in a sparse checkout too.
> 
Thank you for clarifying that this is actually the expected behavior and 
isn't something we need to "fix" for sparse-checkout. I will update 
accordingly for v5.
>>
>> Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
>> ---
>>   builtin/blame.c                          |  3 +++
>>   t/perf/p2000-sparse-operations.sh        |  2 ++
>>   t/t1092-sparse-checkout-compatibility.sh | 24 +++++++++++++++++-------
>>   3 files changed, 22 insertions(+), 7 deletions(-)
>>
>> diff --git a/builtin/blame.c b/builtin/blame.c
>> index 641523ff9af..af3d81e2bd4 100644
>> --- a/builtin/blame.c
>> +++ b/builtin/blame.c
>> @@ -902,6 +902,9 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
>>          long anchor;
>>          const int hexsz = the_hash_algo->hexsz;
>>
>> +       prepare_repo_settings(the_repository);
>> +       the_repository->settings.command_requires_full_index = 0;
>> +
>>          setup_default_color_by_age();
>>          git_config(git_blame_config, &output_option);
>>          repo_init_revisions(the_repository, &revs, NULL);
>> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
>> index bff93f16e93..9ac76a049b8 100755
>> --- a/t/perf/p2000-sparse-operations.sh
>> +++ b/t/perf/p2000-sparse-operations.sh
>> @@ -115,5 +115,7 @@ test_perf_on_all git reset --hard
>>   test_perf_on_all git reset -- does-not-exist
>>   test_perf_on_all git diff
>>   test_perf_on_all git diff --staged
>> +test_perf_on_all git blame $SPARSE_CONE/a
>> +test_perf_on_all git blame $SPARSE_CONE/f3/a
>>
>>   test_done
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index 1070bff1a83..54826e858a9 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -485,15 +485,16 @@ test_expect_success 'blame with pathspec inside sparse definition' '
>>          test_all_match git blame deep/deeper1/deepest/a
>>   '
>>
>> -# TODO: blame currently does not support blaming files outside of the
>> -# sparse definition. It complains that the file doesn't exist locally.
>> -test_expect_failure 'blame with pathspec outside sparse definition' '
>> +# Blame does not support blaming files outside of the sparse
>> +# definition, so we verify this scenario.
> 
> As above, this is misleading.  It'd be better to word it something like:
> 
> # Without a revision specified, blame will error if passed any file that
> # is not present in the working directory (even if the file is tracked).
> # Here we just verify that this is also true with sparse checkouts.
> 
Thank you. Will update for v5.
>> +test_expect_success 'blame with pathspec outside sparse definition' '
>>          init_repos &&
>>
>> -       test_all_match git blame folder1/a &&
>> -       test_all_match git blame folder2/a &&
>> -       test_all_match git blame deep/deeper2/a &&
>> -       test_all_match git blame deep/deeper2/deepest/a
>> +       test_sparse_match git sparse-checkout set &&
>> +       test_sparse_match test_must_fail git blame folder1/a &&
>> +       test_sparse_match test_must_fail git blame folder2/a &&
>> +       test_sparse_match test_must_fail git blame deep/deeper2/a &&
>> +       test_sparse_match test_must_fail git blame deep/deeper2/deepest/a
>>   '
>>
>>   test_expect_success 'checkout and reset (mixed)' '
>> @@ -871,6 +872,15 @@ test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
>>          )
>>   '
>>
>> +test_expect_success 'sparse index is not expanded: blame' '
>> +       init_repos &&
>> +
>> +       ensure_not_expanded blame a &&
>> +       ensure_not_expanded blame deep/a &&
>> +       ensure_not_expanded blame deep/deeper1/a &&
>> +       ensure_not_expanded blame deep/deeper1/deepest/a
>> +'
>> +
>>   # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
>>   # in this scenario, but it shouldn't.
>>   test_expect_success 'reset mixed and checkout orphan' '
>> --
>> gitgitgadget

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v4 1/4] sparse index: enable only for git repos
  2021-11-22 22:42       ` [PATCH v4 1/4] sparse index: enable only for git repos Lessley Dennington via GitGitGadget
  2021-11-23  7:41         ` Elijah Newren
@ 2021-11-23 23:39         ` Junio C Hamano
  2021-11-24 14:41           ` Lessley Dennington
  1 sibling, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2021-11-23 23:39 UTC (permalink / raw)
  To: Lessley Dennington via GitGitGadget
  Cc: git, stolee, newren, Taylor Blau, Lessley Dennington

"Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Lessley Dennington <lessleydennington@gmail.com>
>
> Check whether git dir exists before adding any repo settings. If it
> does not exist, BUG with the message that one cannot add settings for an
> uninitialized repository. If it does exist, proceed with adding repo
> settings.
>
> Additionally, ensure the above BUG is not triggered when users pass the -h
> flag by adding a check for the repository to the checkout and pack-objects
> builtins.
>
> Finally, ensure the above BUG is not triggered for commit-graph by
> returning early if the git directory does not exist.
>
> Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
> ---
>  builtin/checkout.c     | 6 ++++--
>  builtin/pack-objects.c | 9 ++++++---
>  commit-graph.c         | 5 ++++-
>  repo-settings.c        | 3 +++
>  4 files changed, 17 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index 8c69dcdf72a..31632b07888 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -1588,8 +1588,10 @@ static int checkout_main(int argc, const char **argv, const char *prefix,
>  
>  	git_config(git_checkout_config, opts);
>  
> -	prepare_repo_settings(the_repository);
> -	the_repository->settings.command_requires_full_index = 0;
> +	if (startup_info->have_repository) {
> +		prepare_repo_settings(the_repository);
> +		the_repository->settings.command_requires_full_index = 0;
> +	}

I am kind-a surprised if the control reaches this deep if you are
not in a repository.  In git.c::commands[] list, all three primary
entry points that call checkout_main(), namely, cmd_checkout(),
cmd_restore(), and cmd_switch(), are marked with RUN_SETUP bit,
which makes us call setup_git_directory() even before we call the
cmd_X() function.  And setup_git_directory() dies with "not a git
repository (or any of the parent directories)" outside a repository.

So, how can startup_info->have_repository bit be false if the
control flow reaches here?  Or am I grossly misunderstanding what
you are trying to do?

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 1a3dd445f83..45dc2258dc7 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3976,9 +3976,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  	read_replace_refs = 0;

Ditto wrt RUN_SETUP.

> diff --git a/commit-graph.c b/commit-graph.c
> index 2706683acfe..265c010122e 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -632,10 +632,13 @@ static int prepare_commit_graph(struct repository *r)
>  	struct object_directory *odb;
>  	/*
> +	 * Early return if there is no git dir or if the commit graph is
> +	 * disabled.
> +	 *
>  	 * This must come before the "already attempted?" check below, because
>  	 * we want to disable even an already-loaded graph file.
>  	 */
> -	if (r->commit_graph_disabled)
> +	if (!r->gitdir || r->commit_graph_disabled)
>  		return 0;

I haven't followed the control flow, but this one probably is a good
addition (in other words, unlike cmd_pack_objects(), I cannot convince
myself that r->gitdir will never be NULL here).

> diff --git a/repo-settings.c b/repo-settings.c
> index b93e91a212e..00ca5571a1a 100644
> --- a/repo-settings.c
> +++ b/repo-settings.c
> @@ -17,6 +17,9 @@ void prepare_repo_settings(struct repository *r)
>  	char *strval;
>  	int manyfiles;
>  
> +	if (!r->gitdir)
> +		BUG("Cannot add settings for uninitialized repository");
> +

This is a very good idea.  If I recall correctly, I think I reviewed
a bugfix patch that would have been simplified if we had this check
here.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v4 2/4] test-read-cache: set up repo after git directory
  2021-11-22 22:42       ` [PATCH v4 2/4] test-read-cache: set up repo after git directory Lessley Dennington via GitGitGadget
@ 2021-11-23 23:42         ` Junio C Hamano
  2021-11-24 15:10           ` Lessley Dennington
  0 siblings, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2021-11-23 23:42 UTC (permalink / raw)
  To: Lessley Dennington via GitGitGadget
  Cc: git, stolee, newren, Taylor Blau, Lessley Dennington

"Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Lessley Dennington <lessleydennington@gmail.com>
>
> Move repo setup to occur after git directory is set up. This will ensure
> enabling the sparse index for `diff` (and guarding against the nongit
> scenario) will not cause tests to start failing, since that change will include
> adding a check to prepare_repo_settings() with the new BUG.

This looks obviously the right thing to do.  Would anything break
because of the "wrong" ordering of events in the original code?

IOW, can this "bugfix" be protected with a new test against
regression?

> Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
> ---
>  t/helper/test-read-cache.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c
> index b52c174acc7..0d9f08931a1 100644
> --- a/t/helper/test-read-cache.c
> +++ b/t/helper/test-read-cache.c
> @@ -39,8 +39,6 @@ int cmd__read_cache(int argc, const char **argv)
>  	int table = 0, expand = 0;
>  
>  	initialize_the_repository();
> -	prepare_repo_settings(r);
> -	r->settings.command_requires_full_index = 0;
>  
>  	for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) {
>  		if (skip_prefix(*argv, "--print-and-refresh=", &name))
> @@ -56,6 +54,9 @@ int cmd__read_cache(int argc, const char **argv)
>  	setup_git_directory();
>  	git_config(git_default_config, NULL);
>  
> +	prepare_repo_settings(r);
> +	r->settings.command_requires_full_index = 0;
> +
>  	for (i = 0; i < cnt; i++) {
>  		repo_read_index(r);

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v4 3/4] diff: enable and test the sparse index
  2021-11-22 22:42       ` [PATCH v4 3/4] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
  2021-11-23  7:47         ` Elijah Newren
@ 2021-11-23 23:48         ` Junio C Hamano
  1 sibling, 0 replies; 56+ messages in thread
From: Junio C Hamano @ 2021-11-23 23:48 UTC (permalink / raw)
  To: Lessley Dennington via GitGitGadget
  Cc: git, stolee, newren, Taylor Blau, Lessley Dennington

"Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> diff --git a/builtin/diff.c b/builtin/diff.c
> index dd8ce688ba7..fa4683377eb 100644
> --- a/builtin/diff.c
> +++ b/builtin/diff.c
> @@ -437,6 +437,11 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
>  
>  	prefix = setup_git_directory_gently(&nongit);
>  
> +	if (!nongit) {
> +		prepare_repo_settings(the_repository);
> +		the_repository->settings.command_requires_full_index = 0;
> +	}
> +

It is very pleasing to see that with all the preparations, it is
jsut a very simple matter of adding four lines to enable the big
feature.

At this point immediately after asking setup_git_directory_gently(),
we positively know if we are in a repository, so the placement of
the new code makes perfect sense.

Nice.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v4 4/4] blame: enable and test the sparse index
  2021-11-22 22:42       ` [PATCH v4 4/4] blame: " Lessley Dennington via GitGitGadget
@ 2021-11-23 23:53         ` Junio C Hamano
  2021-11-24 14:52           ` Lessley Dennington
  0 siblings, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2021-11-23 23:53 UTC (permalink / raw)
  To: Lessley Dennington via GitGitGadget
  Cc: git, stolee, newren, Taylor Blau, Lessley Dennington

"Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> diff --git a/builtin/blame.c b/builtin/blame.c
> index 641523ff9af..247b9eaf88f 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -902,6 +902,11 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
>  	long anchor;
>  	const int hexsz = the_hash_algo->hexsz;
>  
> +	if (startup_info->have_repository) {

The command is marked with RUN_SETUP bit in git.c::commands[] list,
so I would think we wouldn't even get called if we are not in a
repository here.

Under what condition can startup_info->have_repository be false at
this point in the control flow?  If there is such a case, it would
mean that startup_info->have_repository bit can be false even if we
are in a repository.  That sounds like a bug in some code (I do not
know where offhand) that is supposed to prepare the startup_info
before cmd_X() functions are called.

> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> index 5cf94627383..9ac76a049b8 100755
> --- a/t/perf/p2000-sparse-operations.sh
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -114,6 +114,8 @@ test_perf_on_all git reset
>  test_perf_on_all git reset --hard
>  test_perf_on_all git reset -- does-not-exist
>  test_perf_on_all git diff
> -test_perf_on_all git diff --cached
> +test_perf_on_all git diff --staged

That's a funny revert of what the previous step did; I thought this
step was about "blame" and not "diff".

> +test_perf_on_all git blame $SPARSE_CONE/a
> +test_perf_on_all git blame $SPARSE_CONE/f3/a
>  
>  test_done
> -# TODO: blame currently does not support blaming files outside of the
> -# sparse definition. It complains that the file doesn't exist locally.
> -test_expect_failure 'blame with pathspec outside sparse definition' '
> +# NEEDSWORK: This test documents the current behavior, but this could
> +# change in the future if we decide to support blaming files outside
> +# the sparse definition.

OK.  From the description it is clear that we do not support it
right now, which is OK by me.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v4 1/4] sparse index: enable only for git repos
  2021-11-23 23:39         ` Junio C Hamano
@ 2021-11-24 14:41           ` Lessley Dennington
  2021-11-24 18:23             ` Junio C Hamano
  0 siblings, 1 reply; 56+ messages in thread
From: Lessley Dennington @ 2021-11-24 14:41 UTC (permalink / raw)
  To: Junio C Hamano, Lessley Dennington via GitGitGadget
  Cc: git, stolee, newren, Taylor Blau



On 11/23/21 3:39 PM, Junio C Hamano wrote:
> "Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
>> From: Lessley Dennington <lessleydennington@gmail.com>
>>
>> Check whether git dir exists before adding any repo settings. If it
>> does not exist, BUG with the message that one cannot add settings for an
>> uninitialized repository. If it does exist, proceed with adding repo
>> settings.
>>
>> Additionally, ensure the above BUG is not triggered when users pass the -h
>> flag by adding a check for the repository to the checkout and pack-objects
>> builtins.
>>
>> Finally, ensure the above BUG is not triggered for commit-graph by
>> returning early if the git directory does not exist.
>>
>> Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
>> ---
>>   builtin/checkout.c     | 6 ++++--
>>   builtin/pack-objects.c | 9 ++++++---
>>   commit-graph.c         | 5 ++++-
>>   repo-settings.c        | 3 +++
>>   4 files changed, 17 insertions(+), 6 deletions(-)
>>
>> diff --git a/builtin/checkout.c b/builtin/checkout.c
>> index 8c69dcdf72a..31632b07888 100644
>> --- a/builtin/checkout.c
>> +++ b/builtin/checkout.c
>> @@ -1588,8 +1588,10 @@ static int checkout_main(int argc, const char **argv, const char *prefix,
>>   
>>   	git_config(git_checkout_config, opts);
>>   
>> -	prepare_repo_settings(the_repository);
>> -	the_repository->settings.command_requires_full_index = 0;
>> +	if (startup_info->have_repository) {
>> +		prepare_repo_settings(the_repository);
>> +		the_repository->settings.command_requires_full_index = 0;
>> +	}
> 
> I am kind-a surprised if the control reaches this deep if you are
> not in a repository.  In git.c::commands[] list, all three primary
> entry points that call checkout_main(), namely, cmd_checkout(),
> cmd_restore(), and cmd_switch(), are marked with RUN_SETUP bit,
> which makes us call setup_git_directory() even before we call the
> cmd_X() function.  And setup_git_directory() dies with "not a git
> repository (or any of the parent directories)" outside a repository.
> 
> So, how can startup_info->have_repository bit be false if the
> control flow reaches here?  Or am I grossly misunderstanding what
> you are trying to do?
> 
This was in reaction to the t0012-help.sh tests failing with the
new BUG in prepare_repo_settings. However, Elijah pointed out
that it's a better idea to move prepare_repo_settings farther
down (after parse_options) instead. So I will be issuing that
change as part of v5.
>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>> index 1a3dd445f83..45dc2258dc7 100644
>> --- a/builtin/pack-objects.c
>> +++ b/builtin/pack-objects.c
>> @@ -3976,9 +3976,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>>   	read_replace_refs = 0;
> 
> Ditto wrt RUN_SETUP.
> 
Updating with the same change as checkout for v5.
>> diff --git a/commit-graph.c b/commit-graph.c
>> index 2706683acfe..265c010122e 100644
>> --- a/commit-graph.c
>> +++ b/commit-graph.c
>> @@ -632,10 +632,13 @@ static int prepare_commit_graph(struct repository *r)
>>   	struct object_directory *odb;
>>   	/*
>> +	 * Early return if there is no git dir or if the commit graph is
>> +	 * disabled.
>> +	 *
>>   	 * This must come before the "already attempted?" check below, because
>>   	 * we want to disable even an already-loaded graph file.
>>   	 */
>> -	if (r->commit_graph_disabled)
>> +	if (!r->gitdir || r->commit_graph_disabled)
>>   		return 0;
> 
> I haven't followed the control flow, but this one probably is a good
> addition (in other words, unlike cmd_pack_objects(), I cannot convince
> myself that r->gitdir will never be NULL here).
> 
>> diff --git a/repo-settings.c b/repo-settings.c
>> index b93e91a212e..00ca5571a1a 100644
>> --- a/repo-settings.c
>> +++ b/repo-settings.c
>> @@ -17,6 +17,9 @@ void prepare_repo_settings(struct repository *r)
>>   	char *strval;
>>   	int manyfiles;
>>   
>> +	if (!r->gitdir)
>> +		BUG("Cannot add settings for uninitialized repository");
>> +
> 
> This is a very good idea.  If I recall correctly, I think I reviewed
> a bugfix patch that would have been simplified if we had this check
> here.
>Lessley

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v4 4/4] blame: enable and test the sparse index
  2021-11-23 23:53         ` Junio C Hamano
@ 2021-11-24 14:52           ` Lessley Dennington
  0 siblings, 0 replies; 56+ messages in thread
From: Lessley Dennington @ 2021-11-24 14:52 UTC (permalink / raw)
  To: Junio C Hamano, Lessley Dennington via GitGitGadget
  Cc: git, stolee, newren, Taylor Blau



On 11/23/21 3:53 PM, Junio C Hamano wrote:
> "Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
>> diff --git a/builtin/blame.c b/builtin/blame.c
>> index 641523ff9af..247b9eaf88f 100644
>> --- a/builtin/blame.c
>> +++ b/builtin/blame.c
>> @@ -902,6 +902,11 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
>>   	long anchor;
>>   	const int hexsz = the_hash_algo->hexsz;
>>   
>> +	if (startup_info->have_repository) {
> 
> The command is marked with RUN_SETUP bit in git.c::commands[] list,
> so I would think we wouldn't even get called if we are not in a
> repository here.
> 
> Under what condition can startup_info->have_repository be false at
> this point in the control flow?  If there is such a case, it would
> mean that startup_info->have_repository bit can be false even if we
> are in a repository.  That sounds like a bug in some code (I do not
> know where offhand) that is supposed to prepare the startup_info
> before cmd_X() functions are called.
> 
As with checkout and pack-objects, this was added to protect against
the new BUG() in prepare_repo_settings being triggered with the help
command (which was surfaced with failures in t0012-help.sh). As with
those builtins, I will be removing the conditional and moving
prepare_repo_settings after parse_options in v5.
>> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
>> index 5cf94627383..9ac76a049b8 100755
>> --- a/t/perf/p2000-sparse-operations.sh
>> +++ b/t/perf/p2000-sparse-operations.sh
>> @@ -114,6 +114,8 @@ test_perf_on_all git reset
>>   test_perf_on_all git reset --hard
>>   test_perf_on_all git reset -- does-not-exist
>>   test_perf_on_all git diff
>> -test_perf_on_all git diff --cached
>> +test_perf_on_all git diff --staged
> 
> That's a funny revert of what the previous step did; I thought this
> step was about "blame" and not "diff".
> 
>> +test_perf_on_all git blame $SPARSE_CONE/a
>> +test_perf_on_all git blame $SPARSE_CONE/f3/a
>>   
>>   test_done
>> -# TODO: blame currently does not support blaming files outside of the
>> -# sparse definition. It complains that the file doesn't exist locally.
>> -test_expect_failure 'blame with pathspec outside sparse definition' '
>> +# NEEDSWORK: This test documents the current behavior, but this could
>> +# change in the future if we decide to support blaming files outside
>> +# the sparse definition.
> 
> OK.  From the description it is clear that we do not support it
> right now, which is OK by me.
> 
Elijah made the point that blaming files outside the directory isn't
supported in full checkouts either. So I've updated the comment and
commit description a bit for v5.

Lessley

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v4 2/4] test-read-cache: set up repo after git directory
  2021-11-23 23:42         ` Junio C Hamano
@ 2021-11-24 15:10           ` Lessley Dennington
  2021-11-24 18:36             ` Junio C Hamano
  0 siblings, 1 reply; 56+ messages in thread
From: Lessley Dennington @ 2021-11-24 15:10 UTC (permalink / raw)
  To: Junio C Hamano, Lessley Dennington via GitGitGadget
  Cc: git, stolee, newren, Taylor Blau



On 11/23/21 3:42 PM, Junio C Hamano wrote:
> "Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
>> From: Lessley Dennington <lessleydennington@gmail.com>
>>
>> Move repo setup to occur after git directory is set up. This will ensure
>> enabling the sparse index for `diff` (and guarding against the nongit
>> scenario) will not cause tests to start failing, since that change will include
>> adding a check to prepare_repo_settings() with the new BUG.
> 
> This looks obviously the right thing to do.  Would anything break
> because of the "wrong" ordering of events in the original code?
> 
> IOW, can this "bugfix" be protected with a new test against
> regression?
> 
Yep! Tests 2, 3, 28, and 34 in t1092-sparse-checkout-compatibility.sh
will fail without this change.
>> Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
>> ---
>>   t/helper/test-read-cache.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c
>> index b52c174acc7..0d9f08931a1 100644
>> --- a/t/helper/test-read-cache.c
>> +++ b/t/helper/test-read-cache.c
>> @@ -39,8 +39,6 @@ int cmd__read_cache(int argc, const char **argv)
>>   	int table = 0, expand = 0;
>>   
>>   	initialize_the_repository();
>> -	prepare_repo_settings(r);
>> -	r->settings.command_requires_full_index = 0;
>>   
>>   	for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) {
>>   		if (skip_prefix(*argv, "--print-and-refresh=", &name))
>> @@ -56,6 +54,9 @@ int cmd__read_cache(int argc, const char **argv)
>>   	setup_git_directory();
>>   	git_config(git_default_config, NULL);
>>   
>> +	prepare_repo_settings(r);
>> +	r->settings.command_requires_full_index = 0;
>> +
>>   	for (i = 0; i < cnt; i++) {
>>   		repo_read_index(r);

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v4 1/4] sparse index: enable only for git repos
  2021-11-24 14:41           ` Lessley Dennington
@ 2021-11-24 18:23             ` Junio C Hamano
  2021-11-29 23:38               ` Lessley Dennington
  0 siblings, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2021-11-24 18:23 UTC (permalink / raw)
  To: Lessley Dennington
  Cc: Lessley Dennington via GitGitGadget, git, stolee, newren, Taylor Blau

Lessley Dennington <lessleydennington@gmail.com> writes:

>>> @@ -1588,8 +1588,10 @@ static int checkout_main(int argc, const char **argv, const char *prefix,
>>>     	git_config(git_checkout_config, opts);
>>>   -	prepare_repo_settings(the_repository);
>>> -	the_repository->settings.command_requires_full_index = 0;
>>> +	if (startup_info->have_repository) {
>>> +		prepare_repo_settings(the_repository);
>>> +		the_repository->settings.command_requires_full_index = 0;
>>> +	}
>> I am kind-a surprised if the control reaches this deep if you are
>> not in a repository.  In git.c::commands[] list, all three primary
>> entry points that call checkout_main(), namely, cmd_checkout(),
>> cmd_restore(), and cmd_switch(), are marked with RUN_SETUP bit,
>> which makes us call setup_git_directory() even before we call the
>> cmd_X() function.  And setup_git_directory() dies with "not a git
>> repository (or any of the parent directories)" outside a repository.
>> So, how can startup_info->have_repository bit be false if the
>> control flow reaches here?  Or am I grossly misunderstanding what
>> you are trying to do?
>> 
> This was in reaction to the t0012-help.sh tests failing with the
> new BUG in prepare_repo_settings. However, Elijah pointed out that
> it's a better idea to move prepare_repo_settings farther down
> (after parse_options) instead. So I will be issuing that change as
> part of v5.

I forgot that "git foo -h" for a builtin command 'foo' calls
run_builtin() but bypasses the RUN_SETUP and NEED_WORK_TREE handling
before it in turn calls cmd_foo().  We expect a call in cmd_foo() to
parse_options() emit a short-help and exit.

So, yes, there is a way to reach this point in the codeflow without
being in a repository (or even when in a repository, we may have
chosen not to realize it).  Feels ugly.

Now a bit of tangent.

I wonder if it is a problem to completely bypass RUN_SETUP in such a
case.  In general, we read the configuration to tweak the hardcoded
default behaviour, and then further override it by parsing command
line options.  In order to read configuration fully, we'd need to
know where the repository is.  So the start-up sequence must be in
this order:

 - discover where the repository is (either gently or with a hard failure)
 - read the configuration files
 - call parse_options()

And by completly bypassing RUN_SETUP, we are not reading per-repo
settings from the configuration files.

Something along this line (note: there is an always-taken if block
to reduce the patch noise for this illustration), perhaps.




 git.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git i/git.c w/git.c
index 5ff21be21f..50e258508e 100644
--- i/git.c
+++ w/git.c
@@ -421,25 +421,30 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 	int status, help;
 	struct stat st;
 	const char *prefix;
+	int run_setup = (p->option & (RUN_SETUP | RUN_SETUP_GENTLY));
 
 	prefix = NULL;
 	help = argc == 2 && !strcmp(argv[1], "-h");
-	if (!help) {
-		if (p->option & RUN_SETUP)
+	if (help && (run_setup & RUN_SETUP))
+		/* demote to GENTLY to allow 'git cmd -h' outside repo */
+		run_setup = RUN_SETUP_GENTLY;
+
+	if (1) {
+		if (run_setup & RUN_SETUP)
 			prefix = setup_git_directory();
-		else if (p->option & RUN_SETUP_GENTLY) {
+		else if (run_setup & RUN_SETUP_GENTLY) {
 			int nongit_ok;
 			prefix = setup_git_directory_gently(&nongit_ok);
 		}
 		precompose_argv_prefix(argc, argv, NULL);
-		if (use_pager == -1 && p->option & (RUN_SETUP | RUN_SETUP_GENTLY) &&
+		if (use_pager == -1 && run_setup &&
 		    !(p->option & DELAY_PAGER_CONFIG))
 			use_pager = check_pager_config(p->cmd);
 		if (use_pager == -1 && p->option & USE_PAGER)
 			use_pager = 1;
 
-		if ((p->option & (RUN_SETUP | RUN_SETUP_GENTLY)) &&
-		    startup_info->have_repository) /* get_git_dir() may set up repo, avoid that */
+		if (run_setup && startup_info->have_repository)
+			/* get_git_dir() may set up repo, avoid that */
 			trace_repo_setup(prefix);
 	}
 	commit_pager_choice();

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v4 2/4] test-read-cache: set up repo after git directory
  2021-11-24 15:10           ` Lessley Dennington
@ 2021-11-24 18:36             ` Junio C Hamano
  2021-11-29 23:01               ` Lessley Dennington
  0 siblings, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2021-11-24 18:36 UTC (permalink / raw)
  To: Lessley Dennington
  Cc: Lessley Dennington via GitGitGadget, git, stolee, newren, Taylor Blau

Lessley Dennington <lessleydennington@gmail.com> writes:

> On 11/23/21 3:42 PM, Junio C Hamano wrote:
>> "Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
>> writes:
>> 
>>> From: Lessley Dennington <lessleydennington@gmail.com>
>>>
>>> Move repo setup to occur after git directory is set up. This will ensure
>>> enabling the sparse index for `diff` (and guarding against the nongit
>>> scenario) will not cause tests to start failing, since that change will include
>>> adding a check to prepare_repo_settings() with the new BUG.

>> This looks obviously the right thing to do.  Would anything break
>> because of the "wrong" ordering of events in the original code?
>> IOW, can this "bugfix" be protected with a new test against
>> regression?
>> 
> Yep! Tests 2, 3, 28, and 34 in t1092-sparse-checkout-compatibility.sh
> will fail without this change.

I do not understand.  When 1/4 and 2/4 are applied, no tests in
t1092 fail for me.

I think the presentation order of this series is not reviewer
friendly; "the new BUG" is introduced in a separate step and
obscures the reason why this step is needed.  It is better than
adding "the new BUG" first and let some tests fail and then fix the
breakage the series caused in later steps, though.





^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v4 2/4] test-read-cache: set up repo after git directory
  2021-11-24 18:36             ` Junio C Hamano
@ 2021-11-29 23:01               ` Lessley Dennington
  0 siblings, 0 replies; 56+ messages in thread
From: Lessley Dennington @ 2021-11-29 23:01 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Lessley Dennington via GitGitGadget, git, stolee, newren, Taylor Blau



On 11/24/21 10:36 AM, Junio C Hamano wrote:
> Lessley Dennington <lessleydennington@gmail.com> writes:
> 
>> On 11/23/21 3:42 PM, Junio C Hamano wrote:
>>> "Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
>>> writes:
>>>
>>>> From: Lessley Dennington <lessleydennington@gmail.com>
>>>>
>>>> Move repo setup to occur after git directory is set up. This will ensure
>>>> enabling the sparse index for `diff` (and guarding against the nongit
>>>> scenario) will not cause tests to start failing, since that change will include
>>>> adding a check to prepare_repo_settings() with the new BUG.
> 
>>> This looks obviously the right thing to do.  Would anything break
>>> because of the "wrong" ordering of events in the original code?
>>> IOW, can this "bugfix" be protected with a new test against
>>> regression?
>>>
>> Yep! Tests 2, 3, 28, and 34 in t1092-sparse-checkout-compatibility.sh
>> will fail without this change.
> 
> I do not understand.  When 1/4 and 2/4 are applied, no tests in
> t1092 fail for me.
> 
> I think the presentation order of this series is not reviewer
> friendly; "the new BUG" is introduced in a separate step and
> obscures the reason why this step is needed.  It is better than
> adding "the new BUG" first and let some tests fail and then fix the
> breakage the series caused in later steps, though.
> 
> 
> 
> 
What I meant was that those tests fail with the first patch in the
series. Once this patch is applied they no longer fail.

That being said, I agree with this feedback - I have reorganized the
changes for v5 in a way in which I hope will make more sense and will
improve the reviewer experience.

Lessley

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v4 1/4] sparse index: enable only for git repos
  2021-11-24 18:23             ` Junio C Hamano
@ 2021-11-29 23:38               ` Lessley Dennington
  2021-11-30  6:32                 ` Junio C Hamano
  0 siblings, 1 reply; 56+ messages in thread
From: Lessley Dennington @ 2021-11-29 23:38 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Lessley Dennington via GitGitGadget, git, stolee, newren, Taylor Blau



On 11/24/21 10:23 AM, Junio C Hamano wrote:
> Lessley Dennington <lessleydennington@gmail.com> writes:
> 
>>>> @@ -1588,8 +1588,10 @@ static int checkout_main(int argc, const char **argv, const char *prefix,
>>>>      	git_config(git_checkout_config, opts);
>>>>    -	prepare_repo_settings(the_repository);
>>>> -	the_repository->settings.command_requires_full_index = 0;
>>>> +	if (startup_info->have_repository) {
>>>> +		prepare_repo_settings(the_repository);
>>>> +		the_repository->settings.command_requires_full_index = 0;
>>>> +	}
>>> I am kind-a surprised if the control reaches this deep if you are
>>> not in a repository.  In git.c::commands[] list, all three primary
>>> entry points that call checkout_main(), namely, cmd_checkout(),
>>> cmd_restore(), and cmd_switch(), are marked with RUN_SETUP bit,
>>> which makes us call setup_git_directory() even before we call the
>>> cmd_X() function.  And setup_git_directory() dies with "not a git
>>> repository (or any of the parent directories)" outside a repository.
>>> So, how can startup_info->have_repository bit be false if the
>>> control flow reaches here?  Or am I grossly misunderstanding what
>>> you are trying to do?
>>>
>> This was in reaction to the t0012-help.sh tests failing with the
>> new BUG in prepare_repo_settings. However, Elijah pointed out that
>> it's a better idea to move prepare_repo_settings farther down
>> (after parse_options) instead. So I will be issuing that change as
>> part of v5.
> 
> I forgot that "git foo -h" for a builtin command 'foo' calls
> run_builtin() but bypasses the RUN_SETUP and NEED_WORK_TREE handling
> before it in turn calls cmd_foo().  We expect a call in cmd_foo() to
> parse_options() emit a short-help and exit.
> 
> So, yes, there is a way to reach this point in the codeflow without
> being in a repository (or even when in a repository, we may have
> chosen not to realize it).  Feels ugly.
> 
> Now a bit of tangent.
> 
> I wonder if it is a problem to completely bypass RUN_SETUP in such a
> case.  In general, we read the configuration to tweak the hardcoded
> default behaviour, and then further override it by parsing command
> line options.  In order to read configuration fully, we'd need to
> know where the repository is.  So the start-up sequence must be in
> this order:
> 
>   - discover where the repository is (either gently or with a hard failure)
>   - read the configuration files
>   - call parse_options()
> 
> And by completly bypassing RUN_SETUP, we are not reading per-repo
> settings from the configuration files.
> 
> Something along this line (note: there is an always-taken if block
> to reduce the patch noise for this illustration), perhaps.
> 
> 
> 
> 
>   git.c | 17 +++++++++++------
>   1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git i/git.c w/git.c
> index 5ff21be21f..50e258508e 100644
> --- i/git.c
> +++ w/git.c
> @@ -421,25 +421,30 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
>   	int status, help;
>   	struct stat st;
>   	const char *prefix;
> +	int run_setup = (p->option & (RUN_SETUP | RUN_SETUP_GENTLY));
>   
>   	prefix = NULL;
>   	help = argc == 2 && !strcmp(argv[1], "-h");
> -	if (!help) {
> -		if (p->option & RUN_SETUP)
> +	if (help && (run_setup & RUN_SETUP))
> +		/* demote to GENTLY to allow 'git cmd -h' outside repo */
> +		run_setup = RUN_SETUP_GENTLY;
> +
> +	if (1) {
> +		if (run_setup & RUN_SETUP)
>   			prefix = setup_git_directory();
> -		else if (p->option & RUN_SETUP_GENTLY) {
> +		else if (run_setup & RUN_SETUP_GENTLY) {
>   			int nongit_ok;
>   			prefix = setup_git_directory_gently(&nongit_ok);
>   		}
>   		precompose_argv_prefix(argc, argv, NULL);
> -		if (use_pager == -1 && p->option & (RUN_SETUP | RUN_SETUP_GENTLY) &&
> +		if (use_pager == -1 && run_setup &&
>   		    !(p->option & DELAY_PAGER_CONFIG))
>   			use_pager = check_pager_config(p->cmd);
>   		if (use_pager == -1 && p->option & USE_PAGER)
>   			use_pager = 1;
>   
> -		if ((p->option & (RUN_SETUP | RUN_SETUP_GENTLY)) &&
> -		    startup_info->have_repository) /* get_git_dir() may set up repo, avoid that */
> +		if (run_setup && startup_info->have_repository)
> +			/* get_git_dir() may set up repo, avoid that */
>   			trace_repo_setup(prefix);
>   	}
>   	commit_pager_choice();
> 
This is cool! I applied it locally, and it seems to be working well. I
will plan to replace my changes to checkout and pack-objects with this
for v5.

Lessley

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v4 1/4] sparse index: enable only for git repos
  2021-11-29 23:38               ` Lessley Dennington
@ 2021-11-30  6:32                 ` Junio C Hamano
  2021-11-30 23:25                   ` Lessley Dennington
  0 siblings, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2021-11-30  6:32 UTC (permalink / raw)
  To: Lessley Dennington
  Cc: Lessley Dennington via GitGitGadget, git, stolee, newren, Taylor Blau

Lessley Dennington <lessleydennington@gmail.com> writes:

> This is cool! I applied it locally, and it seems to be working well. I
> will plan to replace my changes to checkout and pack-objects with this
> for v5.

I didn't write it to replace the changes you were preparing, though.

What the patch is meant to solve is that "git checkout -h" in a
repository, whose .git/config has some custom configuration
variables that affect how the short help text is shown, were
ignoring that per-repository settings by not even checking if we are
in a repository at all.

When "-h" is in effect, and if you are outside a repository, this
would cause the setup_git_directory_gently() function to be called,
and we'd reach checkout_main() without having discovered repository.
I do not think it removes the need for your "only when startup-info
says we have repository, do these" safety.  At least, I didn't write
the patch to remove that need.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v4 1/4] sparse index: enable only for git repos
  2021-11-30  6:32                 ` Junio C Hamano
@ 2021-11-30 23:25                   ` Lessley Dennington
  0 siblings, 0 replies; 56+ messages in thread
From: Lessley Dennington @ 2021-11-30 23:25 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Lessley Dennington via GitGitGadget, git, stolee, newren, Taylor Blau



On 11/29/21 10:32 PM, Junio C Hamano wrote:
> Lessley Dennington <lessleydennington@gmail.com> writes:
> 
>> This is cool! I applied it locally, and it seems to be working well. I
>> will plan to replace my changes to checkout and pack-objects with this
>> for v5.
> 
> I didn't write it to replace the changes you were preparing, though.
> 
> What the patch is meant to solve is that "git checkout -h" in a
> repository, whose .git/config has some custom configuration
> variables that affect how the short help text is shown, were
> ignoring that per-repository settings by not even checking if we are
> in a repository at all.
> 
> When "-h" is in effect, and if you are outside a repository, this
> would cause the setup_git_directory_gently() function to be called,
> and we'd reach checkout_main() without having discovered repository.
> I do not think it removes the need for your "only when startup-info
> says we have repository, do these" safety.  At least, I didn't write
> the patch to remove that need.
> 
To be clear, I am not replacing the change in repo-settings.c to BUG
if we're trying to prepare settings for an uninitialized repository.
I just removed the conditional I had added around the
prepare_repo_settings method in checkout and pack-objects, since the
failing tests for those builtins started passing with the application
of this patch.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v5 0/7] Sparse Index: diff and blame builtins
  2021-11-22 22:42     ` [PATCH v4 0/4] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
                         ` (3 preceding siblings ...)
  2021-11-22 22:42       ` [PATCH v4 4/4] blame: " Lessley Dennington via GitGitGadget
@ 2021-12-03 21:15       ` Lessley Dennington via GitGitGadget
  2021-12-03 21:15         ` [PATCH v5 1/7] git: esnure correct git directory setup with -h Lessley Dennington via GitGitGadget
                           ` (7 more replies)
  4 siblings, 8 replies; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-12-03 21:15 UTC (permalink / raw)
  To: git; +Cc: stolee, gitster, newren, Taylor Blau, Lessley Dennington

This series is based on vd/sparse-reset. It integrates the sparse index with
git diff and git blame and includes:

 1. tests added to t1092 and p2000 to establish the baseline functionality
    of the commands
 2. repository settings to enable the sparse index

The p2000 tests demonstrate a ~44% execution time reduction for 'git diff'
and a ~86% execution time reduction for 'git diff --staged' using a sparse
index. For 'git blame', the reduction time was ~60% for a file two levels
deep and ~30% for a file three levels deep.

Test                                         before  after
----------------------------------------------------------------
2000.30: git diff (full-v3)                  0.33    0.34 +3.0%
2000.31: git diff (full-v4)                  0.33    0.35 +6.1%
2000.32: git diff (sparse-v3)                0.53    0.31 -41.5%
2000.33: git diff (sparse-v4)                0.54    0.29 -46.3%
2000.34: git diff --cached (full-v3)         0.07    0.07 +0.0%
2000.35: git diff --cached (full-v4)         0.07    0.08 +14.3%
2000.36: git diff --cached (sparse-v3)       0.28    0.04 -85.7%
2000.37: git diff --cached (sparse-v4)       0.23    0.03 -87.0%
2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%



Changes since V1
================

 * Fix failing diff partially-staged test in
   t1092-sparse-checkout-compatibility.sh, which was breaking in seen.


Changes since V2
================

 * Update diff commit description to include patches that make the checkout
   and status commands work with the sparse index for readers to reference.
 * Add new test case to verify diff behaves as expected when run against
   files outside the sparse checkout cone.
 * Indent error message in blame commit
 * Check error message in blame with pathspec outside sparse definition test
   matches expectations.
 * Loop blame tests (instead of running the same command multiple time
   against different files).


Changes since V3
================

 * Update diff p2000 tests to use --cached instead of --staged. Execute new
   run and update results in commit description and cover letter.
 * Update comment on blame with pathspec outside sparse definition test in
   t1092-sparse-checkout-compatibility.sh to clarify that it tests the
   current state and could be improved in the future.
 * Ensure sparse index is only activated when diff is running against files
   in a Git repo.
 * BUG if prepare_repo_settings() is called outside a repository.
 * Ensure sparse index is not activated for calls to blame, checkout, or
   pack-object with -h.
 * Ensure commit-graph is only loaded if a git directory exists.


Changes since V4
================

 * Remove startup_info->have_repository check from checkout, pack-objects,
   and blame. Update git.c to no longer bypass setup when -h is passed
   instead.
 * Move commit-graph, test-read-cache, and repo-settings changes into their
   own patches with details in commit description of why the changes are
   being made.
 * Update t1092-sparse-checkout-compatibility.sh tests to use --cached
   instead of --staged.
 * Use 10-character hash abbreviations for commits referenced in diff commit
   message.
 * Clarify that being unable to blame files outside the working directory is
   not supported in either sparse or non-sparse checkouts both in comment on
   blame with pathspec outside sparse definition test in
   t1092-sparse-checkout-compatibility.sh and blame commit message.

Thanks, Lessley

Lessley Dennington (7):
  git: esnure correct git directory setup with -h
  commit-graph: return if there is no git directory
  test-read-cache: set up repo after git directory
  repo-settings: prepare_repo_settings only in git repos
  diff: replace --staged with --cached in t1092 tests
  diff: enable and test the sparse index
  blame: enable and test the sparse index

 builtin/blame.c                          |   3 +
 builtin/diff.c                           |   5 ++
 commit-graph.c                           |   5 +-
 git.c                                    |  37 ++++----
 repo-settings.c                          |   3 +
 t/helper/test-read-cache.c               |   5 +-
 t/perf/p2000-sparse-operations.sh        |   4 +
 t/t1092-sparse-checkout-compatibility.sh | 109 +++++++++++++++++++----
 8 files changed, 132 insertions(+), 39 deletions(-)


base-commit: f2a454e0a5e26c0f7b840970f69d195c37b16565
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1050%2Fldennington%2Fdiff-blame-sparse-index-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1050/ldennington/diff-blame-sparse-index-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/1050

Range-diff vs v4:

 -:  ----------- > 1:  09c2ff9f898 git: esnure correct git directory setup with -h
 1:  81e208cf454 ! 2:  9e53a6435e4 sparse index: enable only for git repos
     @@ Metadata
      Author: Lessley Dennington <lessleydennington@gmail.com>
      
       ## Commit message ##
     -    sparse index: enable only for git repos
     +    commit-graph: return if there is no git directory
      
     -    Check whether git dir exists before adding any repo settings. If it
     -    does not exist, BUG with the message that one cannot add settings for an
     -    uninitialized repository. If it does exist, proceed with adding repo
     -    settings.
     -
     -    Additionally, ensure the above BUG is not triggered when users pass the -h
     -    flag by adding a check for the repository to the checkout and pack-objects
     -    builtins.
     -
     -    Finally, ensure the above BUG is not triggered for commit-graph by
     -    returning early if the git directory does not exist.
     +    Return early if git directory does not exist. This will protect against
     +    test failures in the upcoming change to BUG in prepare_repo_settings if no
     +    git directory exists.
      
          Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
      
     - ## builtin/checkout.c ##
     -@@ builtin/checkout.c: static int checkout_main(int argc, const char **argv, const char *prefix,
     - 
     - 	git_config(git_checkout_config, opts);
     - 
     --	prepare_repo_settings(the_repository);
     --	the_repository->settings.command_requires_full_index = 0;
     -+	if (startup_info->have_repository) {
     -+		prepare_repo_settings(the_repository);
     -+		the_repository->settings.command_requires_full_index = 0;
     -+	}
     - 
     - 	opts->track = BRANCH_TRACK_UNSPECIFIED;
     - 
     -
     - ## builtin/pack-objects.c ##
     -@@ builtin/pack-objects.c: int cmd_pack_objects(int argc, const char **argv, const char *prefix)
     - 	read_replace_refs = 0;
     - 
     - 	sparse = git_env_bool("GIT_TEST_PACK_SPARSE", -1);
     --	prepare_repo_settings(the_repository);
     --	if (sparse < 0)
     --		sparse = the_repository->settings.pack_use_sparse;
     -+
     -+	if (startup_info->have_repository) {
     -+		prepare_repo_settings(the_repository);
     -+		if (sparse < 0)
     -+			sparse = the_repository->settings.pack_use_sparse;
     -+	}
     - 
     - 	reset_pack_idx_option(&pack_idx_opts);
     - 	git_config(git_pack_config, NULL);
     -
       ## commit-graph.c ##
      @@ commit-graph.c: static int prepare_commit_graph(struct repository *r)
       	struct object_directory *odb;
     @@ commit-graph.c: static int prepare_commit_graph(struct repository *r)
       		return 0;
       
       	if (r->objects->commit_graph_attempted)
     -
     - ## repo-settings.c ##
     -@@ repo-settings.c: void prepare_repo_settings(struct repository *r)
     - 	char *strval;
     - 	int manyfiles;
     - 
     -+	if (!r->gitdir)
     -+		BUG("Cannot add settings for uninitialized repository");
     -+
     - 	if (r->settings.initialized++)
     - 		return;
     - 
 2:  5bc5e8465ab ! 3:  219a4158b6a test-read-cache: set up repo after git directory
     @@ Metadata
       ## Commit message ##
          test-read-cache: set up repo after git directory
      
     -    Move repo setup to occur after git directory is set up. This will ensure
     -    enabling the sparse index for `diff` (and guarding against the nongit
     -    scenario) will not cause tests to start failing, since that change will include
     -    adding a check to prepare_repo_settings() with the new BUG.
     +    Move repo setup to occur after git directory is set up. This will protect
     +    against test failures in the upcoming change to BUG in
     +    prepare_repo_settings if no git directory exists.
      
          Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
      
 -:  ----------- > 4:  4d8d58c473b repo-settings: prepare_repo_settings only in git repos
 -:  ----------- > 5:  85e3e5c78e7 diff: replace --staged with --cached in t1092 tests
 3:  273ee16b74e ! 6:  4f16366e5ad diff: enable and test the sparse index
     @@ Commit message
          with the 'git status' and 'git checkout' commands that were already
          integrated.  For more details see:
      
     -    d76723e (status: use sparse-index throughout, 2021-07-14)
     -    1ba5f45 (checkout: stop expanding sparse indexes, 2021-06-29)
     +    d76723ee53 (status: use sparse-index throughout, 2021-07-14)
     +    1ba5f45132 (checkout: stop expanding sparse indexes, 2021-06-29)
      
          The most interesting thing to do is to add tests that verify that 'git
          diff' behaves correctly when the sparse index is enabled. These cases are:
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is n
      +	run_on_all ../edit-contents deep/testfile &&
      +
      +	test_all_match git diff &&
     -+	test_all_match git diff --staged &&
     ++	test_all_match git diff --cached &&
      +	ensure_not_expanded diff &&
     -+	ensure_not_expanded diff --staged &&
     ++	ensure_not_expanded diff --cached &&
      +
      +	# Add file outside cone
      +	test_all_match git reset --hard &&
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is n
      +	test_sparse_match git sparse-checkout set &&
      +
      +	test_all_match git diff &&
     -+	test_all_match git diff --staged &&
     ++	test_all_match git diff --cached &&
      +	ensure_not_expanded diff &&
     -+	ensure_not_expanded diff --staged &&
     ++	ensure_not_expanded diff --cached &&
      +
      +	# Merge conflict outside cone
      +	# The sparse checkout will report a warning that is not in the
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse-index is n
      +	test_all_match test_must_fail git merge merge-right &&
      +
      +	test_all_match git diff &&
     -+	test_all_match git diff --staged &&
     ++	test_all_match git diff --cached &&
      +	ensure_not_expanded diff &&
     -+	ensure_not_expanded diff --staged
     ++	ensure_not_expanded diff --cached
      +'
      +
       # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
 4:  7acf5118bf5 ! 7:  04532378734 blame: enable and test the sparse index
     @@ Commit message
          2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%
      
          We do not include paths outside the sparse checkout cone because blame
     -    currently does not support blaming files outside of the sparse definition.
     -    Attempting to do so fails with the following error:
     -
     -      fatal: no such path '<path outside sparse definition>' in HEAD
     +    does not support blaming files that are not present in the working
     +    directory. This is true in both sparse and full checkouts.
      
          Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
      
       ## builtin/blame.c ##
     -@@ builtin/blame.c: int cmd_blame(int argc, const char **argv, const char *prefix)
     - 	long anchor;
     - 	const int hexsz = the_hash_algo->hexsz;
     +@@ builtin/blame.c: parse_done:
     + 	revs.diffopt.flags.follow_renames = 0;
     + 	argc = parse_options_end(&ctx);
       
     -+	if (startup_info->have_repository) {
     -+		prepare_repo_settings(the_repository);
     -+		the_repository->settings.command_requires_full_index = 0;
     -+	}
     ++	prepare_repo_settings(the_repository);
     ++	the_repository->settings.command_requires_full_index = 0;
      +
     - 	setup_default_color_by_age();
     - 	git_config(git_blame_config, &output_option);
     - 	repo_init_revisions(the_repository, &revs, NULL);
     + 	if (incremental || (output_option & OUTPUT_PORCELAIN)) {
     + 		if (show_progress > 0)
     + 			die(_("--progress can't be used with --incremental or porcelain formats"));
      
       ## t/perf/p2000-sparse-operations.sh ##
     -@@ t/perf/p2000-sparse-operations.sh: test_perf_on_all git reset
     - test_perf_on_all git reset --hard
     +@@ t/perf/p2000-sparse-operations.sh: test_perf_on_all git reset --hard
       test_perf_on_all git reset -- does-not-exist
       test_perf_on_all git diff
     --test_perf_on_all git diff --cached
     -+test_perf_on_all git diff --staged
     + test_perf_on_all git diff --cached
      +test_perf_on_all git blame $SPARSE_CONE/a
      +test_perf_on_all git blame $SPARSE_CONE/f3/a
       
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'log with pathspec
      -# TODO: blame currently does not support blaming files outside of the
      -# sparse definition. It complains that the file doesn't exist locally.
      -test_expect_failure 'blame with pathspec outside sparse definition' '
     -+# NEEDSWORK: This test documents the current behavior, but this could
     -+# change in the future if we decide to support blaming files outside
     -+# the sparse definition.
     ++# Without a revision specified, blame will error if passed any file that
     ++# is not present in the working directory (even if the file is tracked).
     ++# Here we just verify that this is also true with sparse checkouts.
      +test_expect_success 'blame with pathspec outside sparse definition' '
       	init_repos &&
      +	test_sparse_match git sparse-checkout set &&
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'log with pathspec
       
       test_expect_success 'checkout and reset (mixed)' '
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse index is not expanded: diff' '
     - 	ensure_not_expanded diff --staged
     + 	ensure_not_expanded diff --cached
       '
       
      +test_expect_success 'sparse index is not expanded: blame' '

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v5 1/7] git: esnure correct git directory setup with -h
  2021-12-03 21:15       ` [PATCH v5 0/7] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
@ 2021-12-03 21:15         ` Lessley Dennington via GitGitGadget
  2021-12-04 18:41           ` Elijah Newren
  2021-12-04 19:58           ` Junio C Hamano
  2021-12-03 21:16         ` [PATCH v5 2/7] commit-graph: return if there is no git directory Lessley Dennington via GitGitGadget
                           ` (6 subsequent siblings)
  7 siblings, 2 replies; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-12-03 21:15 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Lessley Dennington,
	Lessley Dennington

From: Lessley Dennington <lessleydennington@gmail.com>

Ensure correct git directory setup when -h is passed with commands. This
specifically applies to repos with special help text configuration
variables and to commands run with -h outside a repository. This
will also protect against test failures in the upcoming change to BUG in
prepare_repo_settings if no git directory exists.

Note: this diff is better seen when ignoring whitespace changes.

Co-authored-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 git.c | 37 +++++++++++++++++++------------------
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/git.c b/git.c
index 60c2784be45..03a2dfa5174 100644
--- a/git.c
+++ b/git.c
@@ -421,27 +421,28 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 	int status, help;
 	struct stat st;
 	const char *prefix;
-
+	int run_setup = (p->option & (RUN_SETUP | RUN_SETUP_GENTLY));
 	prefix = NULL;
 	help = argc == 2 && !strcmp(argv[1], "-h");
-	if (!help) {
-		if (p->option & RUN_SETUP)
-			prefix = setup_git_directory();
-		else if (p->option & RUN_SETUP_GENTLY) {
-			int nongit_ok;
-			prefix = setup_git_directory_gently(&nongit_ok);
-		}
-		precompose_argv_prefix(argc, argv, NULL);
-		if (use_pager == -1 && p->option & (RUN_SETUP | RUN_SETUP_GENTLY) &&
-		    !(p->option & DELAY_PAGER_CONFIG))
-			use_pager = check_pager_config(p->cmd);
-		if (use_pager == -1 && p->option & USE_PAGER)
-			use_pager = 1;
-
-		if ((p->option & (RUN_SETUP | RUN_SETUP_GENTLY)) &&
-		    startup_info->have_repository) /* get_git_dir() may set up repo, avoid that */
-			trace_repo_setup(prefix);
+	if (help && (run_setup & RUN_SETUP))
+		/* demote to GENTLY to allow 'git cmd -h' outside repo */
+		run_setup = RUN_SETUP_GENTLY;
+
+	if (run_setup & RUN_SETUP)
+		prefix = setup_git_directory();
+	else if (run_setup & RUN_SETUP_GENTLY) {
+		int nongit_ok;
+		prefix = setup_git_directory_gently(&nongit_ok);
 	}
+	precompose_argv_prefix(argc, argv, NULL);
+	if (use_pager == -1 && run_setup &&
+		!(p->option & DELAY_PAGER_CONFIG))
+		use_pager = check_pager_config(p->cmd);
+	if (use_pager == -1 && p->option & USE_PAGER)
+		use_pager = 1;
+	if (run_setup && startup_info->have_repository)
+		/* get_git_dir() may set up repo, avoid that */
+		trace_repo_setup(prefix);
 	commit_pager_choice();
 
 	if (!help && get_super_prefix()) {
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v5 2/7] commit-graph: return if there is no git directory
  2021-12-03 21:15       ` [PATCH v5 0/7] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
  2021-12-03 21:15         ` [PATCH v5 1/7] git: esnure correct git directory setup with -h Lessley Dennington via GitGitGadget
@ 2021-12-03 21:16         ` Lessley Dennington via GitGitGadget
  2021-12-03 21:16         ` [PATCH v5 3/7] test-read-cache: set up repo after " Lessley Dennington via GitGitGadget
                           ` (5 subsequent siblings)
  7 siblings, 0 replies; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-12-03 21:16 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Lessley Dennington,
	Lessley Dennington

From: Lessley Dennington <lessleydennington@gmail.com>

Return early if git directory does not exist. This will protect against
test failures in the upcoming change to BUG in prepare_repo_settings if no
git directory exists.

Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 commit-graph.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 2706683acfe..265c010122e 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -632,10 +632,13 @@ static int prepare_commit_graph(struct repository *r)
 	struct object_directory *odb;
 
 	/*
+	 * Early return if there is no git dir or if the commit graph is
+	 * disabled.
+	 *
 	 * This must come before the "already attempted?" check below, because
 	 * we want to disable even an already-loaded graph file.
 	 */
-	if (r->commit_graph_disabled)
+	if (!r->gitdir || r->commit_graph_disabled)
 		return 0;
 
 	if (r->objects->commit_graph_attempted)
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v5 3/7] test-read-cache: set up repo after git directory
  2021-12-03 21:15       ` [PATCH v5 0/7] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
  2021-12-03 21:15         ` [PATCH v5 1/7] git: esnure correct git directory setup with -h Lessley Dennington via GitGitGadget
  2021-12-03 21:16         ` [PATCH v5 2/7] commit-graph: return if there is no git directory Lessley Dennington via GitGitGadget
@ 2021-12-03 21:16         ` Lessley Dennington via GitGitGadget
  2021-12-03 21:16         ` [PATCH v5 4/7] repo-settings: prepare_repo_settings only in git repos Lessley Dennington via GitGitGadget
                           ` (4 subsequent siblings)
  7 siblings, 0 replies; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-12-03 21:16 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Lessley Dennington,
	Lessley Dennington

From: Lessley Dennington <lessleydennington@gmail.com>

Move repo setup to occur after git directory is set up. This will protect
against test failures in the upcoming change to BUG in
prepare_repo_settings if no git directory exists.

Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 t/helper/test-read-cache.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c
index b52c174acc7..0d9f08931a1 100644
--- a/t/helper/test-read-cache.c
+++ b/t/helper/test-read-cache.c
@@ -39,8 +39,6 @@ int cmd__read_cache(int argc, const char **argv)
 	int table = 0, expand = 0;
 
 	initialize_the_repository();
-	prepare_repo_settings(r);
-	r->settings.command_requires_full_index = 0;
 
 	for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) {
 		if (skip_prefix(*argv, "--print-and-refresh=", &name))
@@ -56,6 +54,9 @@ int cmd__read_cache(int argc, const char **argv)
 	setup_git_directory();
 	git_config(git_default_config, NULL);
 
+	prepare_repo_settings(r);
+	r->settings.command_requires_full_index = 0;
+
 	for (i = 0; i < cnt; i++) {
 		repo_read_index(r);
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v5 4/7] repo-settings: prepare_repo_settings only in git repos
  2021-12-03 21:15       ` [PATCH v5 0/7] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
                           ` (2 preceding siblings ...)
  2021-12-03 21:16         ` [PATCH v5 3/7] test-read-cache: set up repo after " Lessley Dennington via GitGitGadget
@ 2021-12-03 21:16         ` Lessley Dennington via GitGitGadget
  2021-12-03 21:16         ` [PATCH v5 5/7] diff: replace --staged with --cached in t1092 tests Lessley Dennington via GitGitGadget
                           ` (3 subsequent siblings)
  7 siblings, 0 replies; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-12-03 21:16 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Lessley Dennington,
	Lessley Dennington

From: Lessley Dennington <lessleydennington@gmail.com>

Check whether git directory exists before adding any repo settings. If it
does not exist, BUG with the message that one cannot add settings for an
uninitialized repository. If it does exist, proceed with adding repo
settings.

Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 repo-settings.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/repo-settings.c b/repo-settings.c
index b93e91a212e..00ca5571a1a 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -17,6 +17,9 @@ void prepare_repo_settings(struct repository *r)
 	char *strval;
 	int manyfiles;
 
+	if (!r->gitdir)
+		BUG("Cannot add settings for uninitialized repository");
+
 	if (r->settings.initialized++)
 		return;
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v5 5/7] diff: replace --staged with --cached in t1092 tests
  2021-12-03 21:15       ` [PATCH v5 0/7] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
                           ` (3 preceding siblings ...)
  2021-12-03 21:16         ` [PATCH v5 4/7] repo-settings: prepare_repo_settings only in git repos Lessley Dennington via GitGitGadget
@ 2021-12-03 21:16         ` Lessley Dennington via GitGitGadget
  2021-12-03 21:16         ` [PATCH v5 6/7] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
                           ` (2 subsequent siblings)
  7 siblings, 0 replies; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-12-03 21:16 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Lessley Dennington,
	Lessley Dennington

From: Lessley Dennington <lessleydennington@gmail.com>

Replace uses of the synonym --staged in t1092 tests with --cached (which
is the real and preferred option). This will allow consistency in the new
tests to be added with the upcoming change to enable the sparse index for
diff.

Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 77e302a0ef3..203a594fa45 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -371,7 +371,7 @@ test_expect_success 'checkout and reset --hard' '
 	test_all_match git reset --hard update-folder2
 '
 
-test_expect_success 'diff --staged' '
+test_expect_success 'diff --cached' '
 	init_repos &&
 
 	write_script edit-contents <<-\EOF &&
@@ -380,10 +380,10 @@ test_expect_success 'diff --staged' '
 	run_on_all ../edit-contents &&
 
 	test_all_match git diff &&
-	test_all_match git diff --staged &&
+	test_all_match git diff --cached &&
 	test_all_match git add README.md &&
 	test_all_match git diff &&
-	test_all_match git diff --staged
+	test_all_match git diff --cached
 '
 
 # NEEDSWORK: sparse-checkout behaves differently from full-checkout when
@@ -400,8 +400,8 @@ test_expect_success 'diff with renames and conflicts' '
 		test_all_match git checkout rename-base &&
 		test_all_match git checkout $branch -- . &&
 		test_all_match git status --porcelain=v2 &&
-		test_all_match git diff --staged --no-renames &&
-		test_all_match git diff --staged --find-renames || return 1
+		test_all_match git diff --cached --no-renames &&
+		test_all_match git diff --cached --find-renames || return 1
 	done
 '
 
@@ -420,8 +420,8 @@ test_expect_success 'diff with directory/file conflicts' '
 		test_all_match git checkout $branch &&
 		test_all_match git checkout rename-base -- . &&
 		test_all_match git status --porcelain=v2 &&
-		test_all_match git diff --staged --no-renames &&
-		test_all_match git diff --staged --find-renames || return 1
+		test_all_match git diff --cached --no-renames &&
+		test_all_match git diff --cached --find-renames || return 1
 	done
 '
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v5 6/7] diff: enable and test the sparse index
  2021-12-03 21:15       ` [PATCH v5 0/7] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
                           ` (4 preceding siblings ...)
  2021-12-03 21:16         ` [PATCH v5 5/7] diff: replace --staged with --cached in t1092 tests Lessley Dennington via GitGitGadget
@ 2021-12-03 21:16         ` Lessley Dennington via GitGitGadget
  2021-12-03 21:16         ` [PATCH v5 7/7] blame: " Lessley Dennington via GitGitGadget
  2021-12-04 19:43         ` [PATCH v5 0/7] Sparse Index: diff and blame builtins Elijah Newren
  7 siblings, 0 replies; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-12-03 21:16 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Lessley Dennington,
	Lessley Dennington

From: Lessley Dennington <lessleydennington@gmail.com>

Enable the sparse index within the 'git diff' command. Its implementation
already safely integrates with the sparse index because it shares code
with the 'git status' and 'git checkout' commands that were already
integrated.  For more details see:

d76723ee53 (status: use sparse-index throughout, 2021-07-14)
1ba5f45132 (checkout: stop expanding sparse indexes, 2021-06-29)

The most interesting thing to do is to add tests that verify that 'git
diff' behaves correctly when the sparse index is enabled. These cases are:

1. The index is not expanded for 'diff' and 'diff --staged'
2. 'diff' and 'diff --staged' behave the same in full checkout, sparse
checkout, and sparse index repositories in the following partially-staged
scenarios (i.e. the index, HEAD, and working directory differ at a given
path):
    1. Path is within sparse-checkout cone
    2. Path is outside sparse-checkout cone
    3. A merge conflict exists for paths outside sparse-checkout cone

The `p2000` tests demonstrate a ~44% execution time reduction for 'git
diff' and a ~86% execution time reduction for 'git diff --staged' using a
sparse index:

Test                                      before  after
-------------------------------------------------------------
2000.30: git diff (full-v3)               0.33    0.34 +3.0%
2000.31: git diff (full-v4)               0.33    0.35 +6.1%
2000.32: git diff (sparse-v3)             0.53    0.31 -41.5%
2000.33: git diff (sparse-v4)             0.54    0.29 -46.3%
2000.34: git diff --cached (full-v3)      0.07    0.07 +0.0%
2000.35: git diff --cached (full-v4)      0.07    0.08 +14.3%
2000.36: git diff --cached (sparse-v3)    0.28    0.04 -85.7%
2000.37: git diff --cached (sparse-v4)    0.23    0.03 -87.0%

Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 builtin/diff.c                           |  5 +++
 t/perf/p2000-sparse-operations.sh        |  2 ++
 t/t1092-sparse-checkout-compatibility.sh | 46 ++++++++++++++++++++++++
 3 files changed, 53 insertions(+)

diff --git a/builtin/diff.c b/builtin/diff.c
index dd8ce688ba7..fa4683377eb 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -437,6 +437,11 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
 
 	prefix = setup_git_directory_gently(&nongit);
 
+	if (!nongit) {
+		prepare_repo_settings(the_repository);
+		the_repository->settings.command_requires_full_index = 0;
+	}
+
 	if (!no_index) {
 		/*
 		 * Treat git diff with at least one path outside of the
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index bfd332120c8..5cf94627383 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -113,5 +113,7 @@ test_perf_on_all git checkout -f -
 test_perf_on_all git reset
 test_perf_on_all git reset --hard
 test_perf_on_all git reset -- does-not-exist
+test_perf_on_all git diff
+test_perf_on_all git diff --cached
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 203a594fa45..abfb4994bb9 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -846,6 +846,52 @@ test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
 	)
 '
 
+test_expect_success 'sparse index is not expanded: diff' '
+	init_repos &&
+
+	write_script edit-contents <<-\EOF &&
+	echo text >>$1
+	EOF
+
+	# Add file within cone
+	test_sparse_match git sparse-checkout set deep &&
+	run_on_all ../edit-contents deep/testfile &&
+	test_all_match git add deep/testfile &&
+	run_on_all ../edit-contents deep/testfile &&
+
+	test_all_match git diff &&
+	test_all_match git diff --cached &&
+	ensure_not_expanded diff &&
+	ensure_not_expanded diff --cached &&
+
+	# Add file outside cone
+	test_all_match git reset --hard &&
+	run_on_all mkdir newdirectory &&
+	run_on_all ../edit-contents newdirectory/testfile &&
+	test_sparse_match git sparse-checkout set newdirectory &&
+	test_all_match git add newdirectory/testfile &&
+	run_on_all ../edit-contents newdirectory/testfile &&
+	test_sparse_match git sparse-checkout set &&
+
+	test_all_match git diff &&
+	test_all_match git diff --cached &&
+	ensure_not_expanded diff &&
+	ensure_not_expanded diff --cached &&
+
+	# Merge conflict outside cone
+	# The sparse checkout will report a warning that is not in the
+	# full checkout, so we use `run_on_all` instead of
+	# `test_all_match`
+	run_on_all git reset --hard &&
+	test_all_match git checkout merge-left &&
+	test_all_match test_must_fail git merge merge-right &&
+
+	test_all_match git diff &&
+	test_all_match git diff --cached &&
+	ensure_not_expanded diff &&
+	ensure_not_expanded diff --cached
+'
+
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
 # in this scenario, but it shouldn't.
 test_expect_success 'reset mixed and checkout orphan' '
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v5 7/7] blame: enable and test the sparse index
  2021-12-03 21:15       ` [PATCH v5 0/7] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
                           ` (5 preceding siblings ...)
  2021-12-03 21:16         ` [PATCH v5 6/7] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
@ 2021-12-03 21:16         ` Lessley Dennington via GitGitGadget
  2021-12-04 19:43         ` [PATCH v5 0/7] Sparse Index: diff and blame builtins Elijah Newren
  7 siblings, 0 replies; 56+ messages in thread
From: Lessley Dennington via GitGitGadget @ 2021-12-03 21:16 UTC (permalink / raw)
  To: git
  Cc: stolee, gitster, newren, Taylor Blau, Lessley Dennington,
	Lessley Dennington

From: Lessley Dennington <lessleydennington@gmail.com>

Enable the sparse index for the 'git blame' command. The index was already
not expanded with this command, so the most interesting thing to do is to
add tests that verify that 'git blame' behaves correctly when the sparse
index is enabled and that its performance improves. More specifically, these
cases are:

1. The index is not expanded for 'blame' when given paths in the sparse
checkout cone at multiple levels.

2. Performance measurably improves for 'blame' with sparse index when given
paths in the sparse checkout cone at multiple levels.

The `p2000` tests demonstrate a ~60% execution time reduction when running
'blame' for a file two levels deep and and a ~30% execution time reduction
for a file three levels deep.

Test                                         before  after
----------------------------------------------------------------
2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%

We do not include paths outside the sparse checkout cone because blame
does not support blaming files that are not present in the working
directory. This is true in both sparse and full checkouts.

Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
---
 builtin/blame.c                          |  3 ++
 t/perf/p2000-sparse-operations.sh        |  2 +
 t/t1092-sparse-checkout-compatibility.sh | 49 ++++++++++++++++++------
 3 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 641523ff9af..33e411d2203 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -940,6 +940,9 @@ parse_done:
 	revs.diffopt.flags.follow_renames = 0;
 	argc = parse_options_end(&ctx);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	if (incremental || (output_option & OUTPUT_PORCELAIN)) {
 		if (show_progress > 0)
 			die(_("--progress can't be used with --incremental or porcelain formats"));
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index 5cf94627383..cb777c74a24 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -115,5 +115,7 @@ test_perf_on_all git reset --hard
 test_perf_on_all git reset -- does-not-exist
 test_perf_on_all git diff
 test_perf_on_all git diff --cached
+test_perf_on_all git blame $SPARSE_CONE/a
+test_perf_on_all git blame $SPARSE_CONE/f3/a
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index abfb4994bb9..6187a997b7d 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -442,21 +442,36 @@ test_expect_success 'log with pathspec outside sparse definition' '
 test_expect_success 'blame with pathspec inside sparse definition' '
 	init_repos &&
 
-	test_all_match git blame a &&
-	test_all_match git blame deep/a &&
-	test_all_match git blame deep/deeper1/a &&
-	test_all_match git blame deep/deeper1/deepest/a
+	for file in a \
+			deep/a \
+			deep/deeper1/a \
+			deep/deeper1/deepest/a
+	do
+		test_all_match git blame $file
+	done
 '
 
-# TODO: blame currently does not support blaming files outside of the
-# sparse definition. It complains that the file doesn't exist locally.
-test_expect_failure 'blame with pathspec outside sparse definition' '
+# Without a revision specified, blame will error if passed any file that
+# is not present in the working directory (even if the file is tracked).
+# Here we just verify that this is also true with sparse checkouts.
+test_expect_success 'blame with pathspec outside sparse definition' '
 	init_repos &&
+	test_sparse_match git sparse-checkout set &&
 
-	test_all_match git blame folder1/a &&
-	test_all_match git blame folder2/a &&
-	test_all_match git blame deep/deeper2/a &&
-	test_all_match git blame deep/deeper2/deepest/a
+	for file in a \
+			deep/a \
+			deep/deeper1/a \
+			deep/deeper1/deepest/a
+	do
+		test_sparse_match test_must_fail git blame $file &&
+		cat >expect <<-EOF &&
+		fatal: Cannot lstat '"'"'$file'"'"': No such file or directory
+		EOF
+		# We compare sparse-checkout-err and sparse-index-err in
+		# `test_sparse_match`. Given we know they are the same, we
+		# only check the content of sparse-index-err here.
+		test_cmp expect sparse-index-err
+	done
 '
 
 test_expect_success 'checkout and reset (mixed)' '
@@ -892,6 +907,18 @@ test_expect_success 'sparse index is not expanded: diff' '
 	ensure_not_expanded diff --cached
 '
 
+test_expect_success 'sparse index is not expanded: blame' '
+	init_repos &&
+
+	for file in a \
+			deep/a \
+			deep/deeper1/a \
+			deep/deeper1/deepest/a
+	do
+		ensure_not_expanded blame $file
+	done
+'
+
 # NEEDSWORK: a sparse-checkout behaves differently from a full checkout
 # in this scenario, but it shouldn't.
 test_expect_success 'reset mixed and checkout orphan' '
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 1/7] git: esnure correct git directory setup with -h
  2021-12-03 21:15         ` [PATCH v5 1/7] git: esnure correct git directory setup with -h Lessley Dennington via GitGitGadget
@ 2021-12-04 18:41           ` Elijah Newren
  2021-12-04 19:58           ` Junio C Hamano
  1 sibling, 0 replies; 56+ messages in thread
From: Elijah Newren @ 2021-12-04 18:41 UTC (permalink / raw)
  To: Lessley Dennington via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Taylor Blau,
	Lessley Dennington

On Fri, Dec 3, 2021 at 1:16 PM Lessley Dennington via GitGitGadget
<gitgitgadget@gmail.com> wrote:

Simple typo in the subject of the commit message: "esnure" -> "ensure"

> From: Lessley Dennington <lessleydennington@gmail.com>
>
> Ensure correct git directory setup when -h is passed with commands. This
> specifically applies to repos with special help text configuration
> variables and to commands run with -h outside a repository. This
> will also protect against test failures in the upcoming change to BUG in
> prepare_repo_settings if no git directory exists.
>
> Note: this diff is better seen when ignoring whitespace changes.
>
> Co-authored-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
> ---
>  git.c | 37 +++++++++++++++++++------------------
>  1 file changed, 19 insertions(+), 18 deletions(-)
>
> diff --git a/git.c b/git.c
> index 60c2784be45..03a2dfa5174 100644
> --- a/git.c
> +++ b/git.c
> @@ -421,27 +421,28 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
>         int status, help;
>         struct stat st;
>         const char *prefix;
> -
> +       int run_setup = (p->option & (RUN_SETUP | RUN_SETUP_GENTLY));
>         prefix = NULL;
>         help = argc == 2 && !strcmp(argv[1], "-h");
> -       if (!help) {
> -               if (p->option & RUN_SETUP)
> -                       prefix = setup_git_directory();
> -               else if (p->option & RUN_SETUP_GENTLY) {
> -                       int nongit_ok;
> -                       prefix = setup_git_directory_gently(&nongit_ok);
> -               }
> -               precompose_argv_prefix(argc, argv, NULL);
> -               if (use_pager == -1 && p->option & (RUN_SETUP | RUN_SETUP_GENTLY) &&
> -                   !(p->option & DELAY_PAGER_CONFIG))
> -                       use_pager = check_pager_config(p->cmd);
> -               if (use_pager == -1 && p->option & USE_PAGER)
> -                       use_pager = 1;
> -
> -               if ((p->option & (RUN_SETUP | RUN_SETUP_GENTLY)) &&
> -                   startup_info->have_repository) /* get_git_dir() may set up repo, avoid that */
> -                       trace_repo_setup(prefix);
> +       if (help && (run_setup & RUN_SETUP))
> +               /* demote to GENTLY to allow 'git cmd -h' outside repo */
> +               run_setup = RUN_SETUP_GENTLY;
> +
> +       if (run_setup & RUN_SETUP)
> +               prefix = setup_git_directory();
> +       else if (run_setup & RUN_SETUP_GENTLY) {
> +               int nongit_ok;
> +               prefix = setup_git_directory_gently(&nongit_ok);
>         }
> +       precompose_argv_prefix(argc, argv, NULL);
> +       if (use_pager == -1 && run_setup &&
> +               !(p->option & DELAY_PAGER_CONFIG))
> +               use_pager = check_pager_config(p->cmd);
> +       if (use_pager == -1 && p->option & USE_PAGER)
> +               use_pager = 1;
> +       if (run_setup && startup_info->have_repository)
> +               /* get_git_dir() may set up repo, avoid that */
> +               trace_repo_setup(prefix);
>         commit_pager_choice();
>
>         if (!help && get_super_prefix()) {
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 0/7] Sparse Index: diff and blame builtins
  2021-12-03 21:15       ` [PATCH v5 0/7] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
                           ` (6 preceding siblings ...)
  2021-12-03 21:16         ` [PATCH v5 7/7] blame: " Lessley Dennington via GitGitGadget
@ 2021-12-04 19:43         ` Elijah Newren
  7 siblings, 0 replies; 56+ messages in thread
From: Elijah Newren @ 2021-12-04 19:43 UTC (permalink / raw)
  To: Lessley Dennington via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Junio C Hamano, Taylor Blau,
	Lessley Dennington

On Fri, Dec 3, 2021 at 1:16 PM Lessley Dennington via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This series is based on vd/sparse-reset. It integrates the sparse index with
> git diff and git blame and includes:
>
>  1. tests added to t1092 and p2000 to establish the baseline functionality
>     of the commands
>  2. repository settings to enable the sparse index
>
> The p2000 tests demonstrate a ~44% execution time reduction for 'git diff'
> and a ~86% execution time reduction for 'git diff --staged' using a sparse
> index. For 'git blame', the reduction time was ~60% for a file two levels
> deep and ~30% for a file three levels deep.
>
> Test                                         before  after
> ----------------------------------------------------------------
> 2000.30: git diff (full-v3)                  0.33    0.34 +3.0%
> 2000.31: git diff (full-v4)                  0.33    0.35 +6.1%
> 2000.32: git diff (sparse-v3)                0.53    0.31 -41.5%
> 2000.33: git diff (sparse-v4)                0.54    0.29 -46.3%
> 2000.34: git diff --cached (full-v3)         0.07    0.07 +0.0%
> 2000.35: git diff --cached (full-v4)         0.07    0.08 +14.3%
> 2000.36: git diff --cached (sparse-v3)       0.28    0.04 -85.7%
> 2000.37: git diff --cached (sparse-v4)       0.23    0.03 -87.0%
> 2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
> 2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
> 2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
> 2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
> 2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
> 2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
> 2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
> 2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%
>
>
>
> Changes since V1
> ================
>
>  * Fix failing diff partially-staged test in
>    t1092-sparse-checkout-compatibility.sh, which was breaking in seen.
>
>
> Changes since V2
> ================
>
>  * Update diff commit description to include patches that make the checkout
>    and status commands work with the sparse index for readers to reference.
>  * Add new test case to verify diff behaves as expected when run against
>    files outside the sparse checkout cone.
>  * Indent error message in blame commit
>  * Check error message in blame with pathspec outside sparse definition test
>    matches expectations.
>  * Loop blame tests (instead of running the same command multiple time
>    against different files).
>
>
> Changes since V3
> ================
>
>  * Update diff p2000 tests to use --cached instead of --staged. Execute new
>    run and update results in commit description and cover letter.
>  * Update comment on blame with pathspec outside sparse definition test in
>    t1092-sparse-checkout-compatibility.sh to clarify that it tests the
>    current state and could be improved in the future.
>  * Ensure sparse index is only activated when diff is running against files
>    in a Git repo.
>  * BUG if prepare_repo_settings() is called outside a repository.
>  * Ensure sparse index is not activated for calls to blame, checkout, or
>    pack-object with -h.
>  * Ensure commit-graph is only loaded if a git directory exists.
>
>
> Changes since V4
> ================
>
>  * Remove startup_info->have_repository check from checkout, pack-objects,
>    and blame. Update git.c to no longer bypass setup when -h is passed
>    instead.
>  * Move commit-graph, test-read-cache, and repo-settings changes into their
>    own patches with details in commit description of why the changes are
>    being made.
>  * Update t1092-sparse-checkout-compatibility.sh tests to use --cached
>    instead of --staged.
>  * Use 10-character hash abbreviations for commits referenced in diff commit
>    message.
>  * Clarify that being unable to blame files outside the working directory is
>    not supported in either sparse or non-sparse checkouts both in comment on
>    blame with pathspec outside sparse definition test in
>    t1092-sparse-checkout-compatibility.sh and blame commit message.

This round addresses all my concerns from previous rounds.  There's a
trivial typo in the subject of the new patch 1, but feel free to add
my

Reviewed-by: Elijah Newren <newren@gmail.com>

when you resubmit with that fix.  Nice work!

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v5 1/7] git: esnure correct git directory setup with -h
  2021-12-03 21:15         ` [PATCH v5 1/7] git: esnure correct git directory setup with -h Lessley Dennington via GitGitGadget
  2021-12-04 18:41           ` Elijah Newren
@ 2021-12-04 19:58           ` Junio C Hamano
  1 sibling, 0 replies; 56+ messages in thread
From: Junio C Hamano @ 2021-12-04 19:58 UTC (permalink / raw)
  To: Lessley Dennington via GitGitGadget
  Cc: git, stolee, newren, Taylor Blau, Lessley Dennington

"Lessley Dennington via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> @@ -421,27 +421,28 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
>  	int status, help;
>  	struct stat st;
>  	const char *prefix;
> -
> +	int run_setup = (p->option & (RUN_SETUP | RUN_SETUP_GENTLY));
>  	prefix = NULL;

Don't lose the blank line that separates the variable declarations
and the first statement.  Also, it probably makes sense to remove
the assignment of NULL to prefix from here (see below).

>  	help = argc == 2 && !strcmp(argv[1], "-h");
> -...
> +	if (help && (run_setup & RUN_SETUP))
> +		/* demote to GENTLY to allow 'git cmd -h' outside repo */
> +		run_setup = RUN_SETUP_GENTLY;

OK.

> +	if (run_setup & RUN_SETUP)
> +		prefix = setup_git_directory();
> +	else if (run_setup & RUN_SETUP_GENTLY) {
> +		int nongit_ok;
> +		prefix = setup_git_directory_gently(&nongit_ok);
>  	}

Here, we say "depending on how run_setup is specified, we compute
the prefix in different ways".  So the "we do not run setup, so set
prefix to NULL" naturally belongs here.  Perhaps something like...

	if (run_setup & RUN_SETUP) {
		prefix = setup_git_directory();
	} else if (run_setup & RUN_SETUP_GENTLY) {
		int nongit_ok;
		prefix = setup_git_directory_gently(&nongit_ok);
	} else {
		prefix = NULL;
	}

> +	precompose_argv_prefix(argc, argv, NULL);
> +	if (use_pager == -1 && run_setup &&
> +		!(p->option & DELAY_PAGER_CONFIG))
> +		use_pager = check_pager_config(p->cmd);
> +	if (use_pager == -1 && p->option & USE_PAGER)
> +		use_pager = 1;
> +	if (run_setup && startup_info->have_repository)
> +		/* get_git_dir() may set up repo, avoid that */
> +		trace_repo_setup(prefix);
>  	commit_pager_choice();
>  
>  	if (!help && get_super_prefix()) {

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2021-12-04 19:58 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-14 17:25 [PATCH 0/2] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
2021-10-14 17:25 ` [PATCH 1/2] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
2021-10-15 16:46   ` Derrick Stolee
2021-10-14 17:25 ` [PATCH 2/2] blame: " Lessley Dennington via GitGitGadget
2021-11-23  7:57   ` Elijah Newren
2021-11-23 14:57     ` Lessley Dennington
2021-10-15 21:20 ` [PATCH v2 0/2] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
2021-10-15 21:20   ` [PATCH v2 1/2] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
2021-10-25 20:47     ` Taylor Blau
2021-10-26 16:10       ` Lessley Dennington
2021-10-26 16:15         ` Taylor Blau
2021-10-15 21:20   ` [PATCH v2 2/2] blame: " Lessley Dennington via GitGitGadget
2021-10-25 20:53     ` Taylor Blau
2021-10-26 16:17       ` Lessley Dennington
2021-11-21  1:32         ` Elijah Newren
2021-11-01 21:27   ` [PATCH v3 0/2] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
2021-11-01 21:27     ` [PATCH v3 1/2] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
2021-11-03 17:05       ` Junio C Hamano
2021-11-04 23:55         ` Lessley Dennington
2021-11-01 21:27     ` [PATCH v3 2/2] blame: " Lessley Dennington via GitGitGadget
2021-11-03 16:47       ` Junio C Hamano
2021-11-05  0:04         ` Lessley Dennington
2021-11-21  1:46         ` Elijah Newren
2021-11-22 22:42     ` [PATCH v4 0/4] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
2021-11-22 22:42       ` [PATCH v4 1/4] sparse index: enable only for git repos Lessley Dennington via GitGitGadget
2021-11-23  7:41         ` Elijah Newren
2021-11-23 14:52           ` Lessley Dennington
2021-11-23 23:39         ` Junio C Hamano
2021-11-24 14:41           ` Lessley Dennington
2021-11-24 18:23             ` Junio C Hamano
2021-11-29 23:38               ` Lessley Dennington
2021-11-30  6:32                 ` Junio C Hamano
2021-11-30 23:25                   ` Lessley Dennington
2021-11-22 22:42       ` [PATCH v4 2/4] test-read-cache: set up repo after git directory Lessley Dennington via GitGitGadget
2021-11-23 23:42         ` Junio C Hamano
2021-11-24 15:10           ` Lessley Dennington
2021-11-24 18:36             ` Junio C Hamano
2021-11-29 23:01               ` Lessley Dennington
2021-11-22 22:42       ` [PATCH v4 3/4] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
2021-11-23  7:47         ` Elijah Newren
2021-11-23 14:53           ` Lessley Dennington
2021-11-23 23:48         ` Junio C Hamano
2021-11-22 22:42       ` [PATCH v4 4/4] blame: " Lessley Dennington via GitGitGadget
2021-11-23 23:53         ` Junio C Hamano
2021-11-24 14:52           ` Lessley Dennington
2021-12-03 21:15       ` [PATCH v5 0/7] Sparse Index: diff and blame builtins Lessley Dennington via GitGitGadget
2021-12-03 21:15         ` [PATCH v5 1/7] git: esnure correct git directory setup with -h Lessley Dennington via GitGitGadget
2021-12-04 18:41           ` Elijah Newren
2021-12-04 19:58           ` Junio C Hamano
2021-12-03 21:16         ` [PATCH v5 2/7] commit-graph: return if there is no git directory Lessley Dennington via GitGitGadget
2021-12-03 21:16         ` [PATCH v5 3/7] test-read-cache: set up repo after " Lessley Dennington via GitGitGadget
2021-12-03 21:16         ` [PATCH v5 4/7] repo-settings: prepare_repo_settings only in git repos Lessley Dennington via GitGitGadget
2021-12-03 21:16         ` [PATCH v5 5/7] diff: replace --staged with --cached in t1092 tests Lessley Dennington via GitGitGadget
2021-12-03 21:16         ` [PATCH v5 6/7] diff: enable and test the sparse index Lessley Dennington via GitGitGadget
2021-12-03 21:16         ` [PATCH v5 7/7] blame: " Lessley Dennington via GitGitGadget
2021-12-04 19:43         ` [PATCH v5 0/7] Sparse Index: diff and blame builtins Elijah Newren

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).