git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/4] Sparse index integration with 'git show'
@ 2022-04-07 16:37 Derrick Stolee via GitGitGadget
  2022-04-07 16:37 ` [PATCH 1/4] t1092: add compatibility tests for " Derrick Stolee via GitGitGadget
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-04-07 16:37 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee

This continues our sequence of integrating builtins with the sparse index.

'git show' is relatively simple to get working in a way that doesn't fail
when it would previously succeed, but there are some sutbleties when the
user passes a directory path. If that path happens to be a sparse directory
entry, we suddenly start succeeding and printing the tree information!

Since this behavior can change depending on the sparse checkout definition
and the state of index entries within that directory, this new behavior
would be more likely to confuse users than help them.

Here is an outline of this series:

 * Patch 1: Add more tests around 'git show :' in t1092.

 * Patch 2: Make 'git show' stop expanding the index by default. Make note
   of this behavior change in the tests.

 * Patches 3-4: Make the subtle changes to object-name.c that help us reject
   sparse directories (patch 3) and print the correct error message (patch
   4).

Patches 2-4 could realistically be squashed into a single commit, but I
thought it might be instructive to show these individual steps, especially
as an example for our GSoC project.

This series is based on the current 'master'. I know that Victoria intends
to submit her 'git stash' integration soon, and this provides a way to test
if our split of test changes in t1092 are easy to merge without conflict. If
that is successful, then I will likely submit my integration with the
'sparse-checkout' builtin after this series is complete.

Thanks, -Stolee

Derrick Stolee (4):
  t1092: add compatibility tests for 'git show'
  show: integrate with the sparse index
  object-name: reject trees found in the index
  object-name: diagnose trees in index properly

 builtin/log.c                            |  5 ++++
 object-name.c                            | 25 +++++++++++++++---
 t/t1092-sparse-checkout-compatibility.sh | 33 ++++++++++++++++++++++++
 3 files changed, 60 insertions(+), 3 deletions(-)


base-commit: 07330a41d66a2c9589b585a3a24ecdcf19994f19
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1207%2Fderrickstolee%2Fsparse-index%2Fgit-show-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1207/derrickstolee/sparse-index/git-show-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1207
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/4] t1092: add compatibility tests for 'git show'
  2022-04-07 16:37 [PATCH 0/4] Sparse index integration with 'git show' Derrick Stolee via GitGitGadget
@ 2022-04-07 16:37 ` Derrick Stolee via GitGitGadget
  2022-04-14 18:37   ` Josh Steadmon
  2022-04-07 16:37 ` [PATCH 2/4] show: integrate with the sparse index Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-04-07 16:37 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 236ab530284..74792b5ebbc 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1151,6 +1151,22 @@ test_expect_success 'clean' '
 	test_sparse_match test_path_is_dir folder1
 '
 
+test_expect_success 'show (cached blobs/trees)' '
+	init_repos &&
+
+	test_all_match git show :a &&
+	test_all_match git show :deep/a &&
+	test_sparse_match git show :folder1/a &&
+
+	# Asking "git show" for directories in the index
+	# does not work as implemented. The error message is
+	# different for a full checkout and a sparse checkout
+	# when the directory is outside of the cone.
+	test_all_match test_must_fail git show :deep/ &&
+	test_must_fail git -C full-checkout show :folder1/ &&
+	test_sparse_match test_must_fail git show :folder1/
+'
+
 test_expect_success 'submodule handling' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/4] show: integrate with the sparse index
  2022-04-07 16:37 [PATCH 0/4] Sparse index integration with 'git show' Derrick Stolee via GitGitGadget
  2022-04-07 16:37 ` [PATCH 1/4] t1092: add compatibility tests for " Derrick Stolee via GitGitGadget
@ 2022-04-07 16:37 ` Derrick Stolee via GitGitGadget
  2022-04-14 18:50   ` Josh Steadmon
  2022-04-07 16:37 ` [PATCH 3/4] object-name: reject trees found in the index Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-04-07 16:37 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The 'git show' command can take an input to request the state of an
object in the index. This can lead to parsing the index in order to load
a specific file entry. Without the change presented here, a sparse index
would expand to a full one, taking much longer than usual to access a
simple file.

There is one behavioral change that happens here, though: we now can
find a sparse directory entry within the index! Commands that previously
failed because we could not find an entry in the worktree or index now
succeed because we _do_ find an entry in the index.

There might be more work to do to make other situations succeed when
looking for an indexed tree, perhaps by looking at or updating the
cache-tree extension as needed. These situations include having a full
index or asking for a directory that is within the sparse-checkout cone
(and hence is not a sparse directory entry in the index).

For now, we demonstrate how the sparse index integration is extremely
simple for files outside of the cone as well as directories within the
cone. A later change will resolve this behavior around sparse
directories.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/log.c                            |  5 +++++
 t/t1092-sparse-checkout-compatibility.sh | 23 +++++++++++++++++++----
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index c211d66d1d0..8e2e9912ab9 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -661,6 +661,11 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 	init_log_defaults();
 	git_config(git_log_config, NULL);
 
+	if (the_repository->gitdir) {
+		prepare_repo_settings(the_repository);
+		the_repository->settings.command_requires_full_index = 0;
+	}
+
 	memset(&match_all, 0, sizeof(match_all));
 	repo_init_revisions(the_repository, &rev, prefix);
 	git_config(grep_config, &rev.grep_filter);
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 74792b5ebbc..f6a14e08b81 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1159,12 +1159,20 @@ test_expect_success 'show (cached blobs/trees)' '
 	test_sparse_match git show :folder1/a &&
 
 	# Asking "git show" for directories in the index
-	# does not work as implemented. The error message is
-	# different for a full checkout and a sparse checkout
-	# when the directory is outside of the cone.
+	# changes depending on the existence of a sparse index.
 	test_all_match test_must_fail git show :deep/ &&
 	test_must_fail git -C full-checkout show :folder1/ &&
-	test_sparse_match test_must_fail git show :folder1/
+	test_must_fail git -C sparse-checkout show :folder1/ &&
+
+	git -C sparse-index show :folder1/ >actual &&
+	git -C full-checkout show HEAD:folder1 >expect &&
+
+	# The output of "git show" includes the way we referenced the
+	# objects, so strip that out.
+	test_line_count = 4 actual &&
+	tail -n 2 actual >actual-trunc &&
+	tail -n 2 expect >expect-trunc &&
+	test_cmp expect-trunc actual-trunc
 '
 
 test_expect_success 'submodule handling' '
@@ -1388,6 +1396,13 @@ test_expect_success 'sparse index is not expanded: diff' '
 	ensure_not_expanded diff --cached
 '
 
+test_expect_success 'sparse index is not expanded: show' '
+	init_repos &&
+
+	ensure_not_expanded show :a &&
+	ensure_not_expanded show :deep/a
+'
+
 test_expect_success 'sparse index is not expanded: update-index' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/4] object-name: reject trees found in the index
  2022-04-07 16:37 [PATCH 0/4] Sparse index integration with 'git show' Derrick Stolee via GitGitGadget
  2022-04-07 16:37 ` [PATCH 1/4] t1092: add compatibility tests for " Derrick Stolee via GitGitGadget
  2022-04-07 16:37 ` [PATCH 2/4] show: integrate with the sparse index Derrick Stolee via GitGitGadget
@ 2022-04-07 16:37 ` Derrick Stolee via GitGitGadget
  2022-04-14 18:57   ` Josh Steadmon
  2022-04-07 16:37 ` [PATCH 4/4] object-name: diagnose trees in index properly Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-04-07 16:37 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The get_oid_with_context_1() method is used when parsing revision
arguments. One particular case is to take a ":<path>" string and search
the index for the given path.

In the case of a sparse index, this might find a sparse directory entry,
in which case the contained object is a tree. In the case of a full
index, this search within the index would fail.

In order to maintain identical return state as in a full index, inspect
the discovered cache entry to see if it is a sparse directory and reject
it. This requires being careful around the only_to_die option to be sure
we die only at the correct time.

This changes the behavior of 'git show :<sparse-dir>', but does not
bring it entirely into alignment with a full index case. It specifically
hits the wrong error message within diagnose_invalid_index_path(). That
error message will be corrected in a future change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 object-name.c                            | 19 ++++++++++++++++++-
 t/t1092-sparse-checkout-compatibility.sh | 11 ++---------
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/object-name.c b/object-name.c
index f0e327f91f5..2dc5d2549b8 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1881,6 +1881,20 @@ static char *resolve_relative_path(struct repository *r, const char *rel)
 			   rel);
 }
 
+static int reject_tree_in_index(struct repository *repo,
+				int only_to_die,
+				const struct cache_entry *ce,
+				int stage,
+				const char *prefix,
+				const char *cp)
+{
+	if (!S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+	if (only_to_die)
+		diagnose_invalid_index_path(repo, stage, prefix, cp);
+	return -1;
+}
+
 static enum get_oid_result get_oid_with_context_1(struct repository *repo,
 				  const char *name,
 				  unsigned flags,
@@ -1955,9 +1969,12 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo,
 			    memcmp(ce->name, cp, namelen))
 				break;
 			if (ce_stage(ce) == stage) {
+				free(new_path);
+				if (reject_tree_in_index(repo, only_to_die, ce,
+							 stage, prefix, cp))
+					return -1;
 				oidcpy(oid, &ce->oid);
 				oc->mode = ce->ce_mode;
-				free(new_path);
 				return 0;
 			}
 			pos++;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index f6a14e08b81..9d32361110d 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1164,15 +1164,8 @@ test_expect_success 'show (cached blobs/trees)' '
 	test_must_fail git -C full-checkout show :folder1/ &&
 	test_must_fail git -C sparse-checkout show :folder1/ &&
 
-	git -C sparse-index show :folder1/ >actual &&
-	git -C full-checkout show HEAD:folder1 >expect &&
-
-	# The output of "git show" includes the way we referenced the
-	# objects, so strip that out.
-	test_line_count = 4 actual &&
-	tail -n 2 actual >actual-trunc &&
-	tail -n 2 expect >expect-trunc &&
-	test_cmp expect-trunc actual-trunc
+	test_must_fail git -C sparse-index show :folder1/ 2>err &&
+	grep "is in the index, but not at stage 0" err
 '
 
 test_expect_success 'submodule handling' '
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/4] object-name: diagnose trees in index properly
  2022-04-07 16:37 [PATCH 0/4] Sparse index integration with 'git show' Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-04-07 16:37 ` [PATCH 3/4] object-name: reject trees found in the index Derrick Stolee via GitGitGadget
@ 2022-04-07 16:37 ` Derrick Stolee via GitGitGadget
  2022-04-07 20:46   ` Philip Oakley
  2022-04-12  6:32   ` Junio C Hamano
  2022-04-14 18:37 ` [PATCH 0/4] Sparse index integration with 'git show' Josh Steadmon
  2022-04-26 20:43 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
  5 siblings, 2 replies; 26+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-04-07 16:37 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When running 'git show :<path>' where '<path>' is a directory, then
there is a subtle difference between a full checkout and a sparse
checkout. The error message from diagnose_invalid_index_path() reports
whether the path is on disk or not. The full checkout will have the
directory on disk, but the path will not be in the index. The sparse
checokut could have the directory not exist, specifically when that
directory is outside of the sparse-checkout cone.

In the case of a sparse index, we have yet another state: the path can
be a sparse directory in the index. In this case, the error message from
diagnose_invalid_index_path() would erroneously say "path '<path>' is in
the index, but not at stage 0", which is false.

Add special casing around sparse directory entries so we get to the
correct error message. This requires two checks in order to get parity
with the normal sparse-checkout case.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 object-name.c                            |  6 ++++--
 t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++++--
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/object-name.c b/object-name.c
index 2dc5d2549b8..4d2746574cd 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1832,7 +1832,8 @@ static void diagnose_invalid_index_path(struct repository *r,
 		pos = -pos - 1;
 	if (pos < istate->cache_nr) {
 		ce = istate->cache[pos];
-		if (ce_namelen(ce) == namelen &&
+		if (!S_ISSPARSEDIR(ce->ce_mode) &&
+		    ce_namelen(ce) == namelen &&
 		    !memcmp(ce->name, filename, namelen))
 			die(_("path '%s' is in the index, but not at stage %d\n"
 			    "hint: Did you mean ':%d:%s'?"),
@@ -1848,7 +1849,8 @@ static void diagnose_invalid_index_path(struct repository *r,
 		pos = -pos - 1;
 	if (pos < istate->cache_nr) {
 		ce = istate->cache[pos];
-		if (ce_namelen(ce) == fullname.len &&
+		if (!S_ISSPARSEDIR(ce->ce_mode) &&
+		    ce_namelen(ce) == fullname.len &&
 		    !memcmp(ce->name, fullname.buf, fullname.len))
 			die(_("path '%s' is in the index, but not '%s'\n"
 			    "hint: Did you mean ':%d:%s' aka ':%d:./%s'?"),
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 9d32361110d..921cafb950a 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1164,8 +1164,17 @@ test_expect_success 'show (cached blobs/trees)' '
 	test_must_fail git -C full-checkout show :folder1/ &&
 	test_must_fail git -C sparse-checkout show :folder1/ &&
 
-	test_must_fail git -C sparse-index show :folder1/ 2>err &&
-	grep "is in the index, but not at stage 0" err
+	test_sparse_match test_must_fail git show :folder1/ &&
+
+	# Change the sparse cone for an extra case:
+	run_on_sparse git sparse-checkout set deep/deeper1 &&
+
+	# deep/deeper2 is a sparse directory in the sparse index.
+	test_sparse_match test_must_fail git show :deep/deeper2/ &&
+
+	# deep/deeper2/deepest is not in the sparse index, but
+	# will trigger an index expansion.
+	test_sparse_match test_must_fail git show :deep/deeper2/deepest/
 '
 
 test_expect_success 'submodule handling' '
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] object-name: diagnose trees in index properly
  2022-04-07 16:37 ` [PATCH 4/4] object-name: diagnose trees in index properly Derrick Stolee via GitGitGadget
@ 2022-04-07 20:46   ` Philip Oakley
  2022-04-12  6:32   ` Junio C Hamano
  1 sibling, 0 replies; 26+ messages in thread
From: Philip Oakley @ 2022-04-07 20:46 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: gitster, vdye, shaoxuan.yuan02, Derrick Stolee

spelling nit

On 07/04/2022 17:37, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
>
> When running 'git show :<path>' where '<path>' is a directory, then
> there is a subtle difference between a full checkout and a sparse
> checkout. The error message from diagnose_invalid_index_path() reports
> whether the path is on disk or not. The full checkout will have the
> directory on disk, but the path will not be in the index. The sparse
> checokut could have the directory not exist, specifically when that

s/checokut/checkout/
> directory is outside of the sparse-checkout cone.
[...]
--
Philip

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] object-name: diagnose trees in index properly
  2022-04-07 16:37 ` [PATCH 4/4] object-name: diagnose trees in index properly Derrick Stolee via GitGitGadget
  2022-04-07 20:46   ` Philip Oakley
@ 2022-04-12  6:32   ` Junio C Hamano
  2022-04-12 13:52     ` Derrick Stolee
  1 sibling, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2022-04-12  6:32 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, vdye, shaoxuan.yuan02, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> When running 'git show :<path>' where '<path>' is a directory, then
> there is a subtle difference between a full checkout and a sparse
> checkout. The error message from diagnose_invalid_index_path() reports
> whether the path is on disk or not. The full checkout will have the
> directory on disk, but the path will not be in the index. The sparse
> checokut could have the directory not exist, specifically when that
> directory is outside of the sparse-checkout cone.
>
> In the case of a sparse index, we have yet another state: the path can
> be a sparse directory in the index. In this case, the error message from
> diagnose_invalid_index_path() would erroneously say "path '<path>' is in
> the index, but not at stage 0", which is false.
>
> Add special casing around sparse directory entries so we get to the
> correct error message. This requires two checks in order to get parity
> with the normal sparse-checkout case.

That all makes sense, but let me ask a more basic (and possibly
stupid) question and a half. 

 - When running 'git cmd :<path>' where '<path>' is a directory,
   even before the "sparse-index" or "sparse-checkout" world, I
   sometimes wished that ":<path>" resolved to the object name of
   the tree recorded in the cache-tree, if we have one.  If we are
   living in the "sparse-index" world, and the paths underneath
   '<path>' are not in the index (because we are not interested in
   them), we should be able answer "git rev-parse :<path>" with the
   name of the tree object.  It is a "new feature" regardless of
   sparse-index, but I wonder if it is sensible to add more code to
   engrave in stone that we would not support it and keep treating
   the index as if it is a flat list of paths to blobs (and commits,
   for submodules)?

 - When running 'git cmd :<path>' where '<path>' is *not* a
   directory but is not in the index because of "sparse-index"
   (i.e. a leading directory of '<path>' is represented as a tree in
   the index), should ":<path>" index expand the "tree" index entry
   on-demand, so that <path> has its own entry in the index, as if
   no "sparse-index" is in effect?  The tests I saw in the series
   were mostly asserted with test_sparse_match, not test_all_match,
   so I couldn't tell what the expectations are.

 - More generally, if <leading> path is represented as a
   "sparse-dir" entry, should ":<leading>/<lower>" cause the index
   entry for <leading> path to be expanded on-demand?  I am guessing
   that the answer is yes, as we wouldn't be able to even tell if
   such a path <leading>/<lower> would exist in the index if the
   index were converted to full upfront.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] object-name: diagnose trees in index properly
  2022-04-12  6:32   ` Junio C Hamano
@ 2022-04-12 13:52     ` Derrick Stolee
  2022-04-12 15:45       ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Derrick Stolee @ 2022-04-12 13:52 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, vdye, shaoxuan.yuan02

On 4/12/2022 2:32 AM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> When running 'git show :<path>' where '<path>' is a directory, then
>> there is a subtle difference between a full checkout and a sparse
>> checkout. The error message from diagnose_invalid_index_path() reports
>> whether the path is on disk or not. The full checkout will have the
>> directory on disk, but the path will not be in the index. The sparse
>> checokut could have the directory not exist, specifically when that
>> directory is outside of the sparse-checkout cone.
>>
>> In the case of a sparse index, we have yet another state: the path can
>> be a sparse directory in the index. In this case, the error message from
>> diagnose_invalid_index_path() would erroneously say "path '<path>' is in
>> the index, but not at stage 0", which is false.
>>
>> Add special casing around sparse directory entries so we get to the
>> correct error message. This requires two checks in order to get parity
>> with the normal sparse-checkout case.
> 
> That all makes sense, but let me ask a more basic (and possibly
> stupid) question and a half.

All of these questions are good thoughts. 

>  - When running 'git cmd :<path>' where '<path>' is a directory,
>    even before the "sparse-index" or "sparse-checkout" world, I
>    sometimes wished that ":<path>" resolved to the object name of
>    the tree recorded in the cache-tree, if we have one.  If we are
>    living in the "sparse-index" world, and the paths underneath
>    '<path>' are not in the index (because we are not interested in
>    them), we should be able answer "git rev-parse :<path>" with the
>    name of the tree object.  It is a "new feature" regardless of
>    sparse-index, but...

I also considered "what if we made this 'just work' for trees in the
index?" The issue I found is that we might need to generate the cache-tree
extension when it does not exist (or is not up-to-date).

For this series, I erred on matching existing behavior before
adding such a feature. It is unfortunate that any work to show
trees this way would essentially revert patches 3 and 4 of the
current series.

>    ...I wonder if it is sensible to add more code to
>    engrave in stone that we would not support it and keep treating
>    the index as if it is a flat list of paths to blobs (and commits,
>    for submodules)?

I don't consider any code change to be "engraved in stone", including
these tests that I am adding. Tests document expected behavior. If we
decide to change that expected behavior, then we can change the tests.
Better to have tests that change when we are making a choice to change
behavior than to not have that evidence of a behavior change at all.

Unfortunately, we did not previously have tests for "git show :dir",
so this is the first time we are documenting the behavior.

>  - When running 'git cmd :<path>' where '<path>' is *not* a
>    directory but is not in the index because of "sparse-index"
>    (i.e. a leading directory of '<path>' is represented as a tree in
>    the index), should ":<path>" index expand the "tree" index entry
>    on-demand, so that <path> has its own entry in the index, as if
>    no "sparse-index" is in effect?

There was one case of test_sparse_match that could have been a
test_all_match: "git show :folder1/a". Since folder1/a is discovered in
the index, Git doesn't check the worktree for the existence of the file to
report any different behavior. Changing the test works without any code
change, so I'll update that in v2.

>    The tests I saw in the series
>    were mostly asserted with test_sparse_match, not test_all_match,
>    so I couldn't tell what the expectations are.

The reason for test_sparse_match instead of test_all_match is because the
error messages change depending on the existence of the path in the
worktree. For example "git show :folder1/" in the test script would have
this output difference:

+ diff -u full-checkout-err sparse-checkout-err
--- full-checkout-err   2022-04-12 13:35:51.430805689 +0000
+++ sparse-checkout-err 2022-04-12 13:35:51.430805689 +0000
@@ -1 +1 @@
-fatal: path 'folder1/' exists on disk, but not in the index
+fatal: path 'folder1/' does not exist (neither on disk nor in the index)

>  - More generally, if <leading> path is represented as a
>    "sparse-dir" entry, should ":<leading>/<lower>" cause the index
>    entry for <leading> path to be expanded on-demand?  I am guessing
>    that the answer is yes, as we wouldn't be able to even tell if
>    such a path <leading>/<lower> would exist in the index if the
>    index were converted to full upfront.

This is what happens, but it happens inside index_name_pos(), so there
isn't any change necessary in the builtin to handle this. Tests such as
"git show :folder1/a" demonstrate this behavior.

Now, there could be a _faster_ way to do this if we did not depend
directly on index_name_pos(). Using index_name_pos() will trigger the
index expansion, which is necessary for some compatibility with the sparse
index.

In this case, we don't really care what _position_ the cache entry is in,
we just want to know if the path exists in the index or not.

For that, we could extract a helper method out of the implementation of
index_name_pos() to be something like index_name_exists(). The common code
between the two implementations is to find a position in the index that
either corresponds to the exact path _or_ is a sparse directory (or the
path definitely doesn't exist in the index). From that point,
index_name_exists() could do a tree walk to find the path inside the tree.

All of this is _possible_, but I also understand that running "git show
:<path>" is not common for users to run directly. Instead, I've seen this
used by IDEs to find the staged version of a file. This is not likely to
be run for paths outside of the sparse-checkout cone.

I think that this is not the only place that could benefit from an
index_name_exists() method, but I'm not ready to focus on improving
performance for queries outside of the sparse-checkout cone.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] object-name: diagnose trees in index properly
  2022-04-12 13:52     ` Derrick Stolee
@ 2022-04-12 15:45       ` Junio C Hamano
  0 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2022-04-12 15:45 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, vdye, shaoxuan.yuan02

Derrick Stolee <derrickstolee@github.com> writes:

> The reason for test_sparse_match instead of test_all_match is because the
> error messages change depending on the existence of the path in the
> worktree. For example "git show :folder1/" in the test script would have
> this output difference:
>
> + diff -u full-checkout-err sparse-checkout-err
> --- full-checkout-err   2022-04-12 13:35:51.430805689 +0000
> +++ sparse-checkout-err 2022-04-12 13:35:51.430805689 +0000
> @@ -1 +1 @@
> -fatal: path 'folder1/' exists on disk, but not in the index
> +fatal: path 'folder1/' does not exist (neither on disk nor in the index)

Ah, because naturally folder1/ would not be on disk if it is outside
the sparse cones?  Makes sense.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/4] t1092: add compatibility tests for 'git show'
  2022-04-07 16:37 ` [PATCH 1/4] t1092: add compatibility tests for " Derrick Stolee via GitGitGadget
@ 2022-04-14 18:37   ` Josh Steadmon
  2022-04-18 12:23     ` Derrick Stolee
  0 siblings, 1 reply; 26+ messages in thread
From: Josh Steadmon @ 2022-04-14 18:37 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, gitster, vdye, shaoxuan.yuan02, Derrick Stolee,
	Derrick Stolee

On 2022.04.07 16:37, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 236ab530284..74792b5ebbc 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -1151,6 +1151,22 @@ test_expect_success 'clean' '
>  	test_sparse_match test_path_is_dir folder1
>  '
>  
> +test_expect_success 'show (cached blobs/trees)' '
> +	init_repos &&
> +
> +	test_all_match git show :a &&
> +	test_all_match git show :deep/a &&
> +	test_sparse_match git show :folder1/a &&
> +
> +	# Asking "git show" for directories in the index
> +	# does not work as implemented. The error message is
> +	# different for a full checkout and a sparse checkout
> +	# when the directory is outside of the cone.
> +	test_all_match test_must_fail git show :deep/ &&
> +	test_must_fail git -C full-checkout show :folder1/ &&
> +	test_sparse_match test_must_fail git show :folder1/
> +'

A reminder that directories are not present in a non-sparse index would
help those of us unfamiliar with the differences between
sparse/non-sparse indexes to understand why the full-checkout cases fail
here. Initially I was confused why any of these lookups would fail
because my mental model was "a sparse-index is a proper subset of the
non-sparse index".

> +
>  test_expect_success 'submodule handling' '
>  	init_repos &&
>  
> -- 
> gitgitgadget
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/4] Sparse index integration with 'git show'
  2022-04-07 16:37 [PATCH 0/4] Sparse index integration with 'git show' Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2022-04-07 16:37 ` [PATCH 4/4] object-name: diagnose trees in index properly Derrick Stolee via GitGitGadget
@ 2022-04-14 18:37 ` Josh Steadmon
  2022-04-14 21:14   ` Junio C Hamano
  2022-04-26 20:43 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
  5 siblings, 1 reply; 26+ messages in thread
From: Josh Steadmon @ 2022-04-14 18:37 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, gitster, vdye, shaoxuan.yuan02, Derrick Stolee

Hi Stolee,

Thanks for the series! All my comments are from the perspective of
someone without much knowledge of the index, much less sparse-index, or
sparse-checkout. I am not sure whether the project should cater to
reviewers in my position, but I did have trouble understanding most of
this series. I didn't have any concerns with the implementation, but the
cover letter, commit messages, and tests were fairly confusing and I
think can be explained better with a few changes:

On 2022.04.07 16:37, Derrick Stolee via GitGitGadget wrote:
> This continues our sequence of integrating builtins with the sparse index.
> 
> 'git show' is relatively simple to get working in a way that doesn't fail
> when it would previously succeed, but there are some sutbleties when the
> user passes a directory path. If that path happens to be a sparse directory
> entry, we suddenly start succeeding and printing the tree information!
>
> Since this behavior can change depending on the sparse checkout definition
> and the state of index entries within that directory, this new behavior
> would be more likely to confuse users than help them.

The two paragraphs above did not really convey to me on first
read-through what the problem was. IIUC the main points are:

* git-show does not currently work with the sparse index.
* A simple change fixes the above, but causes us to sometimes print
  unexpected information about trees.
* We can use the simple change in our implementation, but must do
  additional work to make sure we only print the expected information.

I think this could be clarified by:
* Including examples of the unhelpful output from the simple
  implementation.
* Describing in more detail the situations that trigger this.
* Describing why those situations don't currently happen without support
  for sparse indexes.

> Here is an outline of this series:
> 
>  * Patch 1: Add more tests around 'git show :' in t1092.
> 
>  * Patch 2: Make 'git show' stop expanding the index by default. Make note
>    of this behavior change in the tests.
> 
>  * Patches 3-4: Make the subtle changes to object-name.c that help us reject
>    sparse directories (patch 3) and print the correct error message (patch
>    4).
> 
> Patches 2-4 could realistically be squashed into a single commit, but I
> thought it might be instructive to show these individual steps, especially
> as an example for our GSoC project.
> 
> This series is based on the current 'master'. I know that Victoria intends
> to submit her 'git stash' integration soon, and this provides a way to test
> if our split of test changes in t1092 are easy to merge without conflict. If
> that is successful, then I will likely submit my integration with the
> 'sparse-checkout' builtin after this series is complete.
> 
> Thanks, -Stolee


Thanks,
Josh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] show: integrate with the sparse index
  2022-04-07 16:37 ` [PATCH 2/4] show: integrate with the sparse index Derrick Stolee via GitGitGadget
@ 2022-04-14 18:50   ` Josh Steadmon
  2022-04-18 12:28     ` Derrick Stolee
  0 siblings, 1 reply; 26+ messages in thread
From: Josh Steadmon @ 2022-04-14 18:50 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, gitster, vdye, shaoxuan.yuan02, Derrick Stolee,
	Derrick Stolee

On 2022.04.07 16:37, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> The 'git show' command can take an input to request the state of an
> object in the index. This can lead to parsing the index in order to load
> a specific file entry. Without the change presented here, a sparse index
> would expand to a full one, taking much longer than usual to access a
> simple file.
> 
> There is one behavioral change that happens here, though: we now can
> find a sparse directory entry within the index! Commands that previously
> failed because we could not find an entry in the worktree or index now
> succeed because we _do_ find an entry in the index.

As with the test in the previous commit, a reminder that sparse-indexes
are not necessarily subsets of a full index could be helpful here.


> There might be more work to do to make other situations succeed when
> looking for an indexed tree, perhaps by looking at or updating the
> cache-tree extension as needed. These situations include having a full
> index or asking for a directory that is within the sparse-checkout cone
> (and hence is not a sparse directory entry in the index).
> 
> For now, we demonstrate how the sparse index integration is extremely
> simple for files outside of the cone as well as directories within the
> cone. A later change will resolve this behavior around sparse
> directories.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  builtin/log.c                            |  5 +++++
>  t/t1092-sparse-checkout-compatibility.sh | 23 +++++++++++++++++++----
>  2 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/builtin/log.c b/builtin/log.c
> index c211d66d1d0..8e2e9912ab9 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -661,6 +661,11 @@ int cmd_show(int argc, const char **argv, const char *prefix)
>  	init_log_defaults();
>  	git_config(git_log_config, NULL);
>  
> +	if (the_repository->gitdir) {
> +		prepare_repo_settings(the_repository);
> +		the_repository->settings.command_requires_full_index = 0;
> +	}
> +
>  	memset(&match_all, 0, sizeof(match_all));
>  	repo_init_revisions(the_repository, &rev, prefix);
>  	git_config(grep_config, &rev.grep_filter);
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 74792b5ebbc..f6a14e08b81 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -1159,12 +1159,20 @@ test_expect_success 'show (cached blobs/trees)' '
>  	test_sparse_match git show :folder1/a &&
>  
>  	# Asking "git show" for directories in the index
> -	# does not work as implemented. The error message is
> -	# different for a full checkout and a sparse checkout
> -	# when the directory is outside of the cone.
> +	# changes depending on the existence of a sparse index.

The wording here seems awkward after these changes are applied. Without
other context, it makes it sound to me like the command(s) used to show
a directory change depending on the existence of a sparse index, rather
than the fact that the behavior of `git show` changes.


>  	test_all_match test_must_fail git show :deep/ &&
>  	test_must_fail git -C full-checkout show :folder1/ &&
> -	test_sparse_match test_must_fail git show :folder1/
> +	test_must_fail git -C sparse-checkout show :folder1/ &&
> +
> +	git -C sparse-index show :folder1/ >actual &&
> +	git -C full-checkout show HEAD:folder1 >expect &&
> +
> +	# The output of "git show" includes the way we referenced the
> +	# objects, so strip that out.
> +	test_line_count = 4 actual &&
> +	tail -n 2 actual >actual-trunc &&
> +	tail -n 2 expect >expect-trunc &&
> +	test_cmp expect-trunc actual-trunc
>  '

It's not specific to this commit, but in general I think the series of
changes to this test would be easier to follow if we used hard-coded
strings to compare against, rather than matching parts of files against
each other. It makes it more clear to the reader exactly which behavior
is changing, and can make it more obvious why certain output is
undesirable. However, it would make the test more brittle to future
changes.


>  test_expect_success 'submodule handling' '
> @@ -1388,6 +1396,13 @@ test_expect_success 'sparse index is not expanded: diff' '
>  	ensure_not_expanded diff --cached
>  '
>  
> +test_expect_success 'sparse index is not expanded: show' '
> +	init_repos &&
> +
> +	ensure_not_expanded show :a &&
> +	ensure_not_expanded show :deep/a
> +'
> +
>  test_expect_success 'sparse index is not expanded: update-index' '
>  	init_repos &&
>  
> -- 
> gitgitgadget
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] object-name: reject trees found in the index
  2022-04-07 16:37 ` [PATCH 3/4] object-name: reject trees found in the index Derrick Stolee via GitGitGadget
@ 2022-04-14 18:57   ` Josh Steadmon
  2022-04-18 12:31     ` Derrick Stolee
  0 siblings, 1 reply; 26+ messages in thread
From: Josh Steadmon @ 2022-04-14 18:57 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, gitster, vdye, shaoxuan.yuan02, Derrick Stolee

On 2022.04.07 16:37, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> The get_oid_with_context_1() method is used when parsing revision
> arguments. One particular case is to take a ":<path>" string and search
> the index for the given path.
> 
> In the case of a sparse index, this might find a sparse directory entry,
> in which case the contained object is a tree. In the case of a full
> index, this search within the index would fail.

Another case where my naive understanding of sparse-indexes caused a lot
of confusion for me. I don't know if we want to add reminders everywhere
in this series, but there should probably be at least one commit message
in the series that points out that sparse indexes are not subsets of
full indexes.


> In order to maintain identical return state as in a full index, inspect
> the discovered cache entry to see if it is a sparse directory and reject
> it. This requires being careful around the only_to_die option to be sure
> we die only at the correct time.
> 
> This changes the behavior of 'git show :<sparse-dir>', but does not
> bring it entirely into alignment with a full index case. It specifically
> hits the wrong error message within diagnose_invalid_index_path(). That
> error message will be corrected in a future change.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  object-name.c                            | 19 ++++++++++++++++++-
>  t/t1092-sparse-checkout-compatibility.sh | 11 ++---------
>  2 files changed, 20 insertions(+), 10 deletions(-)
> 
> diff --git a/object-name.c b/object-name.c
> index f0e327f91f5..2dc5d2549b8 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -1881,6 +1881,20 @@ static char *resolve_relative_path(struct repository *r, const char *rel)
>  			   rel);
>  }
>  
> +static int reject_tree_in_index(struct repository *repo,
> +				int only_to_die,
> +				const struct cache_entry *ce,
> +				int stage,
> +				const char *prefix,
> +				const char *cp)
> +{
> +	if (!S_ISSPARSEDIR(ce->ce_mode))
> +		return 0;
> +	if (only_to_die)
> +		diagnose_invalid_index_path(repo, stage, prefix, cp);
> +	return -1;
> +}
> +
>  static enum get_oid_result get_oid_with_context_1(struct repository *repo,
>  				  const char *name,
>  				  unsigned flags,
> @@ -1955,9 +1969,12 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo,
>  			    memcmp(ce->name, cp, namelen))
>  				break;
>  			if (ce_stage(ce) == stage) {
> +				free(new_path);
> +				if (reject_tree_in_index(repo, only_to_die, ce,
> +							 stage, prefix, cp))
> +					return -1;
>  				oidcpy(oid, &ce->oid);
>  				oc->mode = ce->ce_mode;
> -				free(new_path);
>  				return 0;
>  			}
>  			pos++;
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index f6a14e08b81..9d32361110d 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -1164,15 +1164,8 @@ test_expect_success 'show (cached blobs/trees)' '
>  	test_must_fail git -C full-checkout show :folder1/ &&
>  	test_must_fail git -C sparse-checkout show :folder1/ &&
>  
> -	git -C sparse-index show :folder1/ >actual &&
> -	git -C full-checkout show HEAD:folder1 >expect &&
> -
> -	# The output of "git show" includes the way we referenced the
> -	# objects, so strip that out.
> -	test_line_count = 4 actual &&
> -	tail -n 2 actual >actual-trunc &&
> -	tail -n 2 expect >expect-trunc &&
> -	test_cmp expect-trunc actual-trunc
> +	test_must_fail git -C sparse-index show :folder1/ 2>err &&
> +	grep "is in the index, but not at stage 0" err
>  '

It might be worth a note that we're demonstrating the current behavior
here, but this is not the desired end-state. In other words, explicitly
note that this is the "wrong error message" referred to in the commit
message.


>  
>  test_expect_success 'submodule handling' '
> -- 
> gitgitgadget
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/4] Sparse index integration with 'git show'
  2022-04-14 18:37 ` [PATCH 0/4] Sparse index integration with 'git show' Josh Steadmon
@ 2022-04-14 21:14   ` Junio C Hamano
  2022-04-18 12:42     ` Derrick Stolee
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2022-04-14 21:14 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: Josh Steadmon, git, vdye, shaoxuan.yuan02, Derrick Stolee

Josh Steadmon <steadmon@google.com> writes:

>> 'git show' is relatively simple to get working in a way that doesn't fail
>> when it would previously succeed, but there are some sutbleties when the
>> user passes a directory path. If that path happens to be a sparse directory
>> entry, we suddenly start succeeding and printing the tree information!
>>
>> Since this behavior can change depending on the sparse checkout definition
>> and the state of index entries within that directory, this new behavior
>> would be more likely to confuse users than help them.
>
> The two paragraphs above did not really convey to me on first
> read-through what the problem was. IIUC the main points are:
>
> * git-show does not currently work with the sparse index.
> * A simple change fixes the above, but causes us to sometimes print
>   unexpected information about trees.
> * We can use the simple change in our implementation, but must do
>   additional work to make sure we only print the expected information.
>
> I think this could be clarified by:
> * Including examples of the unhelpful output from the simple
>   implementation.
> * Describing in more detail the situations that trigger this.
> * Describing why those situations don't currently happen without support
>   for sparse indexes.

I think the issues patches 2-4 addresses are not limited to any
specific command, but how ":<path-in-the-index>" syntax is resolved
to an object name (including the way how error messages are issued
when the given path is not found in the index).

But the title of the series and presentation place undue stress on
"git show", which I suspect may be confusing to some readers.

For example, with these patches, wouldn't "git rev-parse" start
working better, even though the proposed log message for no commit
in the series talks about "rev-parse" and no tests are done with the
"rev-parse" command?

I am not suggesting that commands other than "show" should be also
discussed in detail or tested.  It would help readers if we said
that we are using 'show' merely as an example to demonstrate how
":<path-in-the-index>" syntax interacts with the sparse index
entries (1) before the series, and (2) how well the improved
object-name parsing logic works after the series.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/4] t1092: add compatibility tests for 'git show'
  2022-04-14 18:37   ` Josh Steadmon
@ 2022-04-18 12:23     ` Derrick Stolee
  0 siblings, 0 replies; 26+ messages in thread
From: Derrick Stolee @ 2022-04-18 12:23 UTC (permalink / raw)
  To: Josh Steadmon, Derrick Stolee via GitGitGadget, git, gitster,
	vdye, shaoxuan.yuan02, Derrick Stolee

On 4/14/2022 2:37 PM, Josh Steadmon wrote:
> On 2022.04.07 16:37, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
>> ---
>>  t/t1092-sparse-checkout-compatibility.sh | 16 ++++++++++++++++
>>  1 file changed, 16 insertions(+)
>>
>> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
>> index 236ab530284..74792b5ebbc 100755
>> --- a/t/t1092-sparse-checkout-compatibility.sh
>> +++ b/t/t1092-sparse-checkout-compatibility.sh
>> @@ -1151,6 +1151,22 @@ test_expect_success 'clean' '
>>  	test_sparse_match test_path_is_dir folder1
>>  '
>>  
>> +test_expect_success 'show (cached blobs/trees)' '
>> +	init_repos &&
>> +
>> +	test_all_match git show :a &&
>> +	test_all_match git show :deep/a &&
>> +	test_sparse_match git show :folder1/a &&
>> +
>> +	# Asking "git show" for directories in the index
>> +	# does not work as implemented. The error message is
>> +	# different for a full checkout and a sparse checkout
>> +	# when the directory is outside of the cone.
>> +	test_all_match test_must_fail git show :deep/ &&
>> +	test_must_fail git -C full-checkout show :folder1/ &&
>> +	test_sparse_match test_must_fail git show :folder1/
>> +'
> 
> A reminder that directories are not present in a non-sparse index would
> help those of us unfamiliar with the differences between
> sparse/non-sparse indexes to understand why the full-checkout cases fail
> here. Initially I was confused why any of these lookups would fail
> because my mental model was "a sparse-index is a proper subset of the
> non-sparse index".

At this point, it would be repetitive to explain the sparse index
every time we do anything involving it. Reviewers should expect to
be familiar with the topic, or consult the in-tree documentation [1, 2].

[1] https://github.com/git/git/blob/master/Documentation/technical/sparse-index.txt
[2] https://github.com/git/git/blob/4027e30c5395c9c1aeea85e99f51ac62f5148145/Documentation/technical/index-format.txt#L396-L406

Specific to this change, sparse directories are not being taken into
account, since 'git show' is still in the compatibility mode that
expands a sparse index to a full one [3]. Thus, the differences in this
patch are only related to full-checkout versus sparse-checkout.

[3] https://github.com/git/git/blob/4027e30c5395c9c1aeea85e99f51ac62f5148145/Documentation/technical/sparse-index.txt#L69-L74

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] show: integrate with the sparse index
  2022-04-14 18:50   ` Josh Steadmon
@ 2022-04-18 12:28     ` Derrick Stolee
  0 siblings, 0 replies; 26+ messages in thread
From: Derrick Stolee @ 2022-04-18 12:28 UTC (permalink / raw)
  To: Josh Steadmon, Derrick Stolee via GitGitGadget, git, gitster,
	vdye, shaoxuan.yuan02, Derrick Stolee

On 4/14/2022 2:50 PM, Josh Steadmon wrote:
> On 2022.04.07 16:37, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>

>>  	# Asking "git show" for directories in the index
>> -	# does not work as implemented. The error message is
>> -	# different for a full checkout and a sparse checkout
>> -	# when the directory is outside of the cone.
>> +	# changes depending on the existence of a sparse index.
> 
> The wording here seems awkward after these changes are applied. Without
> other context, it makes it sound to me like the command(s) used to show
> a directory change depending on the existence of a sparse index, rather
> than the fact that the behavior of `git show` changes.

I can see that.

>> +	# The output of "git show" includes the way we referenced the
>> +	# objects, so strip that out.
>> +	test_line_count = 4 actual &&
>> +	tail -n 2 actual >actual-trunc &&
>> +	tail -n 2 expect >expect-trunc &&
>> +	test_cmp expect-trunc actual-trunc
>>  '
> 
> It's not specific to this commit, but in general I think the series of
> changes to this test would be easier to follow if we used hard-coded
> strings to compare against, rather than matching parts of files against
> each other. It makes it more clear to the reader exactly which behavior
> is changing, and can make it more obvious why certain output is
> undesirable. However, it would make the test more brittle to future
> changes.

The tests here are designed with this approach in mind: demonstrate
success by comparing to existing behavior. We don't want to be
coupled to the exact behavior of these commands, but we _do_ want to
demonstrate that the sparse-checkout or sparse-index features do not
change from the full-checkout behavior (unless we are demonstrating an
expected difference).

In particular, using comparisons like this are also robust against
changes in the test repository data shape, which has been necessary to
update as bugs are found.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] object-name: reject trees found in the index
  2022-04-14 18:57   ` Josh Steadmon
@ 2022-04-18 12:31     ` Derrick Stolee
  0 siblings, 0 replies; 26+ messages in thread
From: Derrick Stolee @ 2022-04-18 12:31 UTC (permalink / raw)
  To: Josh Steadmon, Derrick Stolee via GitGitGadget, git, gitster,
	vdye, shaoxuan.yuan02

On 4/14/2022 2:57 PM, Josh Steadmon wrote:
> On 2022.04.07 16:37, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <derrickstolee@github.com>

>> This changes the behavior of 'git show :<sparse-dir>', but does not
>> bring it entirely into alignment with a full index case. It specifically
>> hits the wrong error message within diagnose_invalid_index_path(). That
>> error message will be corrected in a future change.

...

>> +	test_must_fail git -C sparse-index show :folder1/ 2>err &&
>> +	grep "is in the index, but not at stage 0" err
>>  '
> 
> It might be worth a note that we're demonstrating the current behavior
> here, but this is not the desired end-state. In other words, explicitly
> note that this is the "wrong error message" referred to in the commit
> message.

This is mentioned in the commit message. Since the line will be changed
in the next patch, I didn't think a test comment was worth it.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/4] Sparse index integration with 'git show'
  2022-04-14 21:14   ` Junio C Hamano
@ 2022-04-18 12:42     ` Derrick Stolee
  0 siblings, 0 replies; 26+ messages in thread
From: Derrick Stolee @ 2022-04-18 12:42 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: Josh Steadmon, git, vdye, shaoxuan.yuan02

On 4/14/2022 5:14 PM, Junio C Hamano wrote:
> Josh Steadmon <steadmon@google.com> writes:
> 
>>> 'git show' is relatively simple to get working in a way that doesn't fail
>>> when it would previously succeed, but there are some sutbleties when the
>>> user passes a directory path. If that path happens to be a sparse directory
>>> entry, we suddenly start succeeding and printing the tree information!
>>>
>>> Since this behavior can change depending on the sparse checkout definition
>>> and the state of index entries within that directory, this new behavior
>>> would be more likely to confuse users than help them.
>>
>> The two paragraphs above did not really convey to me on first
>> read-through what the problem was. IIUC the main points are:
>>
>> * git-show does not currently work with the sparse index.
>> * A simple change fixes the above, but causes us to sometimes print
>>   unexpected information about trees.
>> * We can use the simple change in our implementation, but must do
>>   additional work to make sure we only print the expected information.
>>
>> I think this could be clarified by:
>> * Including examples of the unhelpful output from the simple
>>   implementation.
>> * Describing in more detail the situations that trigger this.
>> * Describing why those situations don't currently happen without support
>>   for sparse indexes.
> 
> I think the issues patches 2-4 addresses are not limited to any
> specific command, but how ":<path-in-the-index>" syntax is resolved
> to an object name (including the way how error messages are issued
> when the given path is not found in the index).

Yes, this is the critical bit. It's the only part of "git show"
that cares about the index.
 
> But the title of the series and presentation place undue stress on
> "git show", which I suspect may be confusing to some readers.
> 
> For example, with these patches, wouldn't "git rev-parse" start
> working better, even though the proposed log message for no commit
> in the series talks about "rev-parse" and no tests are done with the
> "rev-parse" command?

Probably, assuming we unset the command_requires_full_index
protection on 'git rev-parse'. This might be a good case for
Shaoxuan to try.

> I am not suggesting that commands other than "show" should be also
> discussed in detail or tested.  It would help readers if we said
> that we are using 'show' merely as an example to demonstrate how
> ":<path-in-the-index>" syntax interacts with the sparse index
> entries (1) before the series, and (2) how well the improved
> object-name parsing logic works after the series.
 
I can see that.
Some background: as we shipped our sparse index integrations in
the microsoft/git fork and measured performance of real users, we
noticed that 'git show' was a frequently-used command that went
from ~0.5 seconds to ~4 seconds in monorepo situations. This was
unexpected, because we didn't know about the ":<path>" syntax.
Further, it seemed that some third-party tools were triggering
this behavior as a frequent way to check on staged content. That
motivated quickly shipping this integration. Performance improved
to ~0.1 seconds because of the reduced index size.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 0/5] Sparse index integration with 'git show'
  2022-04-07 16:37 [PATCH 0/4] Sparse index integration with 'git show' Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2022-04-14 18:37 ` [PATCH 0/4] Sparse index integration with 'git show' Josh Steadmon
@ 2022-04-26 20:43 ` Derrick Stolee via GitGitGadget
  2022-04-26 20:43   ` [PATCH v2 1/5] t1092: add compatibility tests for " Derrick Stolee via GitGitGadget
                     ` (5 more replies)
  5 siblings, 6 replies; 26+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-04-26 20:43 UTC (permalink / raw)
  To: git
  Cc: gitster, vdye, shaoxuan.yuan02, Philip Oakley, Josh Steadmon,
	Derrick Stolee

This continues our sequence of integrating builtins with the sparse index.

'git show' is relatively simple to get working in a way that doesn't fail
when it would previously succeed, but there are some subtleties when the
user passes a directory path using the ":" syntax to get the path out of the
index. If that path happens to be a sparse directory entry, we suddenly
start succeeding and printing the tree information!

Since this behavior can change depending on the sparse checkout definition
and the state of index entries within that directory, this new behavior
would be more likely to confuse users than help them.

This ":" syntax is shared by other commands like "git rev-parse", but we are
not adding those integrations at this point.

Some background: as we shipped our sparse index integrations in the
microsoft/git fork and measured performance of real users, we noticed that
'git show' was a frequently-used command that went from ~0.5 seconds to ~4
seconds in monorepo situations. This was unexpected, because we didn't know
about the ":" syntax. Further, it seemed that some third-party tools were
triggering this behavior as a frequent way to check on staged content. That
motivated quickly shipping this integration. Performance improved to ~0.1
seconds because of the reduced index size. While inspecting our rebase of
microsoft/git commits onto the 2.36.0 release candidates, I noticed this
integration would be a simpler example to demonstrate how sparse index
integrations should go when the behavior is not too complicated.

Here is an outline of this series:

 * Patch 1: Add more tests around 'git show :' in t1092. These tests are
   only to establish the existing differences between the full and
   sparse-checkout cases, since 'git show' is still protected by
   'command_requires_full_index'.

 * Patch 2: Make 'git show' stop expanding the index by default. Make note
   of this behavior change in the tests.

 * Patches 3-4: Make the subtle changes to object-name.c that help us reject
   sparse directories (patch 3) and print the correct error message (patch
   4).

 * Patch 5: Now that the common index-parsing code is updated, do the
   minimum change to 'git rev-parse' to avoid expanding the index to parse
   index entries for a sparse index.

Patches 2-4 could realistically be squashed into a single commit, but I
thought it might be instructive to show these individual steps, especially
as an example for our GSoC project.

I know that Victoria intends to submit her 'git stash' integration soon, and
this provides a way to test if our split of test changes in t1092 are easy
to merge without conflict. If that is successful, then I will likely submit
my integration with the 'sparse-checkout' builtin after this series is
complete. (UPDATE: we inserted a test in the same location of t1092, but
otherwise there are no textual or semantic conflicts.)


Updates in v2
=============

 * The test comment in patch 2 is updated.
 * A commit message typo in patch 4 is fixed.
 * Patch 4 simplified the behavior, but the previous version didn't clean up
   a test comment about that. It now cleans up the test to be simpler.
 * Patch 5 includes an integration with 'git rev-parse'.
 * The cover letter is expanded with more context.
 * The only conflict with Victoria's new 'git stash' patch series is that we
   both added a test in the same position of t1092. Including both new tests
   is the right way to resolve the conflict. Order does not matter.

[1]
https://lore.kernel.org/git/pull.1171.git.1650908957.gitgitgadget@gmail.com/

Thanks, -Stolee

Derrick Stolee (5):
  t1092: add compatibility tests for 'git show'
  show: integrate with the sparse index
  object-name: reject trees found in the index
  object-name: diagnose trees in index properly
  rev-parse: integrate with sparse index

 builtin/log.c                            |  5 ++++
 builtin/rev-parse.c                      |  3 ++
 object-name.c                            | 25 ++++++++++++++--
 t/t1092-sparse-checkout-compatibility.sh | 36 ++++++++++++++++++++++++
 4 files changed, 66 insertions(+), 3 deletions(-)


base-commit: 07330a41d66a2c9589b585a3a24ecdcf19994f19
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1207%2Fderrickstolee%2Fsparse-index%2Fgit-show-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1207/derrickstolee/sparse-index/git-show-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1207

Range-diff vs v1:

 1:  8c2fdb5a4fc = 1:  8c2fdb5a4fc t1092: add compatibility tests for 'git show'
 2:  27ab853a9b4 ! 2:  2e9d47ab09b show: integrate with the sparse index
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'show (cached blob
      -	# does not work as implemented. The error message is
      -	# different for a full checkout and a sparse checkout
      -	# when the directory is outside of the cone.
     -+	# changes depending on the existence of a sparse index.
     ++	# had different behavior depending on the existence
     ++	# of a sparse index.
       	test_all_match test_must_fail git show :deep/ &&
       	test_must_fail git -C full-checkout show :folder1/ &&
      -	test_sparse_match test_must_fail git show :folder1/
 3:  f5da5327673 = 3:  5a7561637f0 object-name: reject trees found in the index
 4:  99c09ccc240 ! 4:  b730457fccc object-name: diagnose trees in index properly
     @@ Commit message
          checkout. The error message from diagnose_invalid_index_path() reports
          whether the path is on disk or not. The full checkout will have the
          directory on disk, but the path will not be in the index. The sparse
     -    checokut could have the directory not exist, specifically when that
     +    checkout could have the directory not exist, specifically when that
          directory is outside of the sparse-checkout cone.
      
          In the case of a sparse index, we have yet another state: the path can
     @@ object-name.c: static void diagnose_invalid_index_path(struct repository *r,
      
       ## t/t1092-sparse-checkout-compatibility.sh ##
      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'show (cached blobs/trees)' '
     + 	test_all_match git show :deep/a &&
     + 	test_sparse_match git show :folder1/a &&
     + 
     +-	# Asking "git show" for directories in the index
     +-	# had different behavior depending on the existence
     +-	# of a sparse index.
     ++	# The error message differs depending on whether
     ++	# the directory exists in the worktree.
     + 	test_all_match test_must_fail git show :deep/ &&
       	test_must_fail git -C full-checkout show :folder1/ &&
     - 	test_must_fail git -C sparse-checkout show :folder1/ &&
     +-	test_must_fail git -C sparse-checkout show :folder1/ &&
     ++	test_sparse_match test_must_fail git show :folder1/ &&
       
      -	test_must_fail git -C sparse-index show :folder1/ 2>err &&
      -	grep "is in the index, but not at stage 0" err
     -+	test_sparse_match test_must_fail git show :folder1/ &&
     -+
      +	# Change the sparse cone for an extra case:
      +	run_on_sparse git sparse-checkout set deep/deeper1 &&
      +
 -:  ----------- > 5:  69efe637a18 rev-parse: integrate with sparse index

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 1/5] t1092: add compatibility tests for 'git show'
  2022-04-26 20:43 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
@ 2022-04-26 20:43   ` Derrick Stolee via GitGitGadget
  2022-04-26 20:43   ` [PATCH v2 2/5] show: integrate with the sparse index Derrick Stolee via GitGitGadget
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-04-26 20:43 UTC (permalink / raw)
  To: git
  Cc: gitster, vdye, shaoxuan.yuan02, Philip Oakley, Josh Steadmon,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 236ab530284..74792b5ebbc 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1151,6 +1151,22 @@ test_expect_success 'clean' '
 	test_sparse_match test_path_is_dir folder1
 '
 
+test_expect_success 'show (cached blobs/trees)' '
+	init_repos &&
+
+	test_all_match git show :a &&
+	test_all_match git show :deep/a &&
+	test_sparse_match git show :folder1/a &&
+
+	# Asking "git show" for directories in the index
+	# does not work as implemented. The error message is
+	# different for a full checkout and a sparse checkout
+	# when the directory is outside of the cone.
+	test_all_match test_must_fail git show :deep/ &&
+	test_must_fail git -C full-checkout show :folder1/ &&
+	test_sparse_match test_must_fail git show :folder1/
+'
+
 test_expect_success 'submodule handling' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 2/5] show: integrate with the sparse index
  2022-04-26 20:43 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
  2022-04-26 20:43   ` [PATCH v2 1/5] t1092: add compatibility tests for " Derrick Stolee via GitGitGadget
@ 2022-04-26 20:43   ` Derrick Stolee via GitGitGadget
  2022-04-26 20:43   ` [PATCH v2 3/5] object-name: reject trees found in the index Derrick Stolee via GitGitGadget
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-04-26 20:43 UTC (permalink / raw)
  To: git
  Cc: gitster, vdye, shaoxuan.yuan02, Philip Oakley, Josh Steadmon,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The 'git show' command can take an input to request the state of an
object in the index. This can lead to parsing the index in order to load
a specific file entry. Without the change presented here, a sparse index
would expand to a full one, taking much longer than usual to access a
simple file.

There is one behavioral change that happens here, though: we now can
find a sparse directory entry within the index! Commands that previously
failed because we could not find an entry in the worktree or index now
succeed because we _do_ find an entry in the index.

There might be more work to do to make other situations succeed when
looking for an indexed tree, perhaps by looking at or updating the
cache-tree extension as needed. These situations include having a full
index or asking for a directory that is within the sparse-checkout cone
(and hence is not a sparse directory entry in the index).

For now, we demonstrate how the sparse index integration is extremely
simple for files outside of the cone as well as directories within the
cone. A later change will resolve this behavior around sparse
directories.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/log.c                            |  5 +++++
 t/t1092-sparse-checkout-compatibility.sh | 24 ++++++++++++++++++++----
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index c211d66d1d0..8e2e9912ab9 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -661,6 +661,11 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 	init_log_defaults();
 	git_config(git_log_config, NULL);
 
+	if (the_repository->gitdir) {
+		prepare_repo_settings(the_repository);
+		the_repository->settings.command_requires_full_index = 0;
+	}
+
 	memset(&match_all, 0, sizeof(match_all));
 	repo_init_revisions(the_repository, &rev, prefix);
 	git_config(grep_config, &rev.grep_filter);
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 74792b5ebbc..3506c0216f0 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1159,12 +1159,21 @@ test_expect_success 'show (cached blobs/trees)' '
 	test_sparse_match git show :folder1/a &&
 
 	# Asking "git show" for directories in the index
-	# does not work as implemented. The error message is
-	# different for a full checkout and a sparse checkout
-	# when the directory is outside of the cone.
+	# had different behavior depending on the existence
+	# of a sparse index.
 	test_all_match test_must_fail git show :deep/ &&
 	test_must_fail git -C full-checkout show :folder1/ &&
-	test_sparse_match test_must_fail git show :folder1/
+	test_must_fail git -C sparse-checkout show :folder1/ &&
+
+	git -C sparse-index show :folder1/ >actual &&
+	git -C full-checkout show HEAD:folder1 >expect &&
+
+	# The output of "git show" includes the way we referenced the
+	# objects, so strip that out.
+	test_line_count = 4 actual &&
+	tail -n 2 actual >actual-trunc &&
+	tail -n 2 expect >expect-trunc &&
+	test_cmp expect-trunc actual-trunc
 '
 
 test_expect_success 'submodule handling' '
@@ -1388,6 +1397,13 @@ test_expect_success 'sparse index is not expanded: diff' '
 	ensure_not_expanded diff --cached
 '
 
+test_expect_success 'sparse index is not expanded: show' '
+	init_repos &&
+
+	ensure_not_expanded show :a &&
+	ensure_not_expanded show :deep/a
+'
+
 test_expect_success 'sparse index is not expanded: update-index' '
 	init_repos &&
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 3/5] object-name: reject trees found in the index
  2022-04-26 20:43 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
  2022-04-26 20:43   ` [PATCH v2 1/5] t1092: add compatibility tests for " Derrick Stolee via GitGitGadget
  2022-04-26 20:43   ` [PATCH v2 2/5] show: integrate with the sparse index Derrick Stolee via GitGitGadget
@ 2022-04-26 20:43   ` Derrick Stolee via GitGitGadget
  2022-04-26 20:43   ` [PATCH v2 4/5] object-name: diagnose trees in index properly Derrick Stolee via GitGitGadget
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-04-26 20:43 UTC (permalink / raw)
  To: git
  Cc: gitster, vdye, shaoxuan.yuan02, Philip Oakley, Josh Steadmon,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The get_oid_with_context_1() method is used when parsing revision
arguments. One particular case is to take a ":<path>" string and search
the index for the given path.

In the case of a sparse index, this might find a sparse directory entry,
in which case the contained object is a tree. In the case of a full
index, this search within the index would fail.

In order to maintain identical return state as in a full index, inspect
the discovered cache entry to see if it is a sparse directory and reject
it. This requires being careful around the only_to_die option to be sure
we die only at the correct time.

This changes the behavior of 'git show :<sparse-dir>', but does not
bring it entirely into alignment with a full index case. It specifically
hits the wrong error message within diagnose_invalid_index_path(). That
error message will be corrected in a future change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 object-name.c                            | 19 ++++++++++++++++++-
 t/t1092-sparse-checkout-compatibility.sh | 11 ++---------
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/object-name.c b/object-name.c
index f0e327f91f5..2dc5d2549b8 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1881,6 +1881,20 @@ static char *resolve_relative_path(struct repository *r, const char *rel)
 			   rel);
 }
 
+static int reject_tree_in_index(struct repository *repo,
+				int only_to_die,
+				const struct cache_entry *ce,
+				int stage,
+				const char *prefix,
+				const char *cp)
+{
+	if (!S_ISSPARSEDIR(ce->ce_mode))
+		return 0;
+	if (only_to_die)
+		diagnose_invalid_index_path(repo, stage, prefix, cp);
+	return -1;
+}
+
 static enum get_oid_result get_oid_with_context_1(struct repository *repo,
 				  const char *name,
 				  unsigned flags,
@@ -1955,9 +1969,12 @@ static enum get_oid_result get_oid_with_context_1(struct repository *repo,
 			    memcmp(ce->name, cp, namelen))
 				break;
 			if (ce_stage(ce) == stage) {
+				free(new_path);
+				if (reject_tree_in_index(repo, only_to_die, ce,
+							 stage, prefix, cp))
+					return -1;
 				oidcpy(oid, &ce->oid);
 				oc->mode = ce->ce_mode;
-				free(new_path);
 				return 0;
 			}
 			pos++;
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 3506c0216f0..08c9cfd359e 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1165,15 +1165,8 @@ test_expect_success 'show (cached blobs/trees)' '
 	test_must_fail git -C full-checkout show :folder1/ &&
 	test_must_fail git -C sparse-checkout show :folder1/ &&
 
-	git -C sparse-index show :folder1/ >actual &&
-	git -C full-checkout show HEAD:folder1 >expect &&
-
-	# The output of "git show" includes the way we referenced the
-	# objects, so strip that out.
-	test_line_count = 4 actual &&
-	tail -n 2 actual >actual-trunc &&
-	tail -n 2 expect >expect-trunc &&
-	test_cmp expect-trunc actual-trunc
+	test_must_fail git -C sparse-index show :folder1/ 2>err &&
+	grep "is in the index, but not at stage 0" err
 '
 
 test_expect_success 'submodule handling' '
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 4/5] object-name: diagnose trees in index properly
  2022-04-26 20:43 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-04-26 20:43   ` [PATCH v2 3/5] object-name: reject trees found in the index Derrick Stolee via GitGitGadget
@ 2022-04-26 20:43   ` Derrick Stolee via GitGitGadget
  2022-04-26 20:43   ` [PATCH v2 5/5] rev-parse: integrate with sparse index Derrick Stolee via GitGitGadget
  2022-04-26 20:55   ` [PATCH v2 0/5] Sparse index integration with 'git show' Junio C Hamano
  5 siblings, 0 replies; 26+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-04-26 20:43 UTC (permalink / raw)
  To: git
  Cc: gitster, vdye, shaoxuan.yuan02, Philip Oakley, Josh Steadmon,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

When running 'git show :<path>' where '<path>' is a directory, then
there is a subtle difference between a full checkout and a sparse
checkout. The error message from diagnose_invalid_index_path() reports
whether the path is on disk or not. The full checkout will have the
directory on disk, but the path will not be in the index. The sparse
checkout could have the directory not exist, specifically when that
directory is outside of the sparse-checkout cone.

In the case of a sparse index, we have yet another state: the path can
be a sparse directory in the index. In this case, the error message from
diagnose_invalid_index_path() would erroneously say "path '<path>' is in
the index, but not at stage 0", which is false.

Add special casing around sparse directory entries so we get to the
correct error message. This requires two checks in order to get parity
with the normal sparse-checkout case.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 object-name.c                            |  6 ++++--
 t/t1092-sparse-checkout-compatibility.sh | 18 ++++++++++++------
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/object-name.c b/object-name.c
index 2dc5d2549b8..4d2746574cd 100644
--- a/object-name.c
+++ b/object-name.c
@@ -1832,7 +1832,8 @@ static void diagnose_invalid_index_path(struct repository *r,
 		pos = -pos - 1;
 	if (pos < istate->cache_nr) {
 		ce = istate->cache[pos];
-		if (ce_namelen(ce) == namelen &&
+		if (!S_ISSPARSEDIR(ce->ce_mode) &&
+		    ce_namelen(ce) == namelen &&
 		    !memcmp(ce->name, filename, namelen))
 			die(_("path '%s' is in the index, but not at stage %d\n"
 			    "hint: Did you mean ':%d:%s'?"),
@@ -1848,7 +1849,8 @@ static void diagnose_invalid_index_path(struct repository *r,
 		pos = -pos - 1;
 	if (pos < istate->cache_nr) {
 		ce = istate->cache[pos];
-		if (ce_namelen(ce) == fullname.len &&
+		if (!S_ISSPARSEDIR(ce->ce_mode) &&
+		    ce_namelen(ce) == fullname.len &&
 		    !memcmp(ce->name, fullname.buf, fullname.len))
 			die(_("path '%s' is in the index, but not '%s'\n"
 			    "hint: Did you mean ':%d:%s' aka ':%d:./%s'?"),
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 08c9cfd359e..fa1d5603605 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1158,15 +1158,21 @@ test_expect_success 'show (cached blobs/trees)' '
 	test_all_match git show :deep/a &&
 	test_sparse_match git show :folder1/a &&
 
-	# Asking "git show" for directories in the index
-	# had different behavior depending on the existence
-	# of a sparse index.
+	# The error message differs depending on whether
+	# the directory exists in the worktree.
 	test_all_match test_must_fail git show :deep/ &&
 	test_must_fail git -C full-checkout show :folder1/ &&
-	test_must_fail git -C sparse-checkout show :folder1/ &&
+	test_sparse_match test_must_fail git show :folder1/ &&
 
-	test_must_fail git -C sparse-index show :folder1/ 2>err &&
-	grep "is in the index, but not at stage 0" err
+	# Change the sparse cone for an extra case:
+	run_on_sparse git sparse-checkout set deep/deeper1 &&
+
+	# deep/deeper2 is a sparse directory in the sparse index.
+	test_sparse_match test_must_fail git show :deep/deeper2/ &&
+
+	# deep/deeper2/deepest is not in the sparse index, but
+	# will trigger an index expansion.
+	test_sparse_match test_must_fail git show :deep/deeper2/deepest/
 '
 
 test_expect_success 'submodule handling' '
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 5/5] rev-parse: integrate with sparse index
  2022-04-26 20:43 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
                     ` (3 preceding siblings ...)
  2022-04-26 20:43   ` [PATCH v2 4/5] object-name: diagnose trees in index properly Derrick Stolee via GitGitGadget
@ 2022-04-26 20:43   ` Derrick Stolee via GitGitGadget
  2022-04-26 20:55   ` [PATCH v2 0/5] Sparse index integration with 'git show' Junio C Hamano
  5 siblings, 0 replies; 26+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-04-26 20:43 UTC (permalink / raw)
  To: git
  Cc: gitster, vdye, shaoxuan.yuan02, Philip Oakley, Josh Steadmon,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

It is not obvious that the 'git rev-parse' builtin would use the sparse
index, but it is possible to parse paths out of the index using the
":<path>" syntax. The 'git rev-parse' output is only the OID of the
object found at that location, but otherwise behaves similarly to 'git
show :<path>'. This includes the failure conditions on directories and
the error messages depending on whether a path is in the worktree or
not.

The only code change required is to change the
command_requires_full_index setting in builtin/rev-parse.c, and we can
re-use many existing 'git show' tests for the rev-parse case.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/rev-parse.c                      |  3 ++
 t/t1092-sparse-checkout-compatibility.sh | 45 +++++++++++++-----------
 2 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 8480a59f573..4fc6185b2d1 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -723,6 +723,9 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 			prefix = setup_git_directory();
 			git_config(git_default_config, NULL);
 			did_repo_setup = 1;
+
+			prepare_repo_settings(the_repository);
+			the_repository->settings.command_requires_full_index = 0;
 		}
 
 		if (!strcmp(arg, "--")) {
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index fa1d5603605..93bcfd20bbc 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1151,29 +1151,32 @@ test_expect_success 'clean' '
 	test_sparse_match test_path_is_dir folder1
 '
 
-test_expect_success 'show (cached blobs/trees)' '
-	init_repos &&
+for builtin in show rev-parse
+do
+	test_expect_success "$builtin (cached blobs/trees)" "
+		init_repos &&
 
-	test_all_match git show :a &&
-	test_all_match git show :deep/a &&
-	test_sparse_match git show :folder1/a &&
+		test_all_match git $builtin :a &&
+		test_all_match git $builtin :deep/a &&
+		test_sparse_match git $builtin :folder1/a &&
 
-	# The error message differs depending on whether
-	# the directory exists in the worktree.
-	test_all_match test_must_fail git show :deep/ &&
-	test_must_fail git -C full-checkout show :folder1/ &&
-	test_sparse_match test_must_fail git show :folder1/ &&
+		# The error message differs depending on whether
+		# the directory exists in the worktree.
+		test_all_match test_must_fail git $builtin :deep/ &&
+		test_must_fail git -C full-checkout $builtin :folder1/ &&
+		test_sparse_match test_must_fail git $builtin :folder1/ &&
 
-	# Change the sparse cone for an extra case:
-	run_on_sparse git sparse-checkout set deep/deeper1 &&
+		# Change the sparse cone for an extra case:
+		run_on_sparse git sparse-checkout set deep/deeper1 &&
 
-	# deep/deeper2 is a sparse directory in the sparse index.
-	test_sparse_match test_must_fail git show :deep/deeper2/ &&
+		# deep/deeper2 is a sparse directory in the sparse index.
+		test_sparse_match test_must_fail git $builtin :deep/deeper2/ &&
 
-	# deep/deeper2/deepest is not in the sparse index, but
-	# will trigger an index expansion.
-	test_sparse_match test_must_fail git show :deep/deeper2/deepest/
-'
+		# deep/deeper2/deepest is not in the sparse index, but
+		# will trigger an index expansion.
+		test_sparse_match test_must_fail git $builtin :deep/deeper2/deepest/
+	"
+done
 
 test_expect_success 'submodule handling' '
 	init_repos &&
@@ -1396,11 +1399,13 @@ test_expect_success 'sparse index is not expanded: diff' '
 	ensure_not_expanded diff --cached
 '
 
-test_expect_success 'sparse index is not expanded: show' '
+test_expect_success 'sparse index is not expanded: show and rev-parse' '
 	init_repos &&
 
 	ensure_not_expanded show :a &&
-	ensure_not_expanded show :deep/a
+	ensure_not_expanded show :deep/a &&
+	ensure_not_expanded rev-parse :a &&
+	ensure_not_expanded rev-parse :deep/a
 '
 
 test_expect_success 'sparse index is not expanded: update-index' '
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/5] Sparse index integration with 'git show'
  2022-04-26 20:43 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
                     ` (4 preceding siblings ...)
  2022-04-26 20:43   ` [PATCH v2 5/5] rev-parse: integrate with sparse index Derrick Stolee via GitGitGadget
@ 2022-04-26 20:55   ` Junio C Hamano
  2022-04-27 13:47     ` Derrick Stolee
  5 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2022-04-26 20:55 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, vdye, shaoxuan.yuan02, Philip Oakley, Josh Steadmon,
	Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This ":" syntax is shared by other commands like "git rev-parse", but we are
> not adding those integrations at this point.

This has been the most curious thing since the initial round.  The
changes in the series are mostly about the code that parses the
":<path>" syntax and yield an object name (when exists) or give an
error messages (otherwise) and die, before the computed object name
gets used by "git show", or any other command that takes an object
name from the command line.

I guess what has been confusing me was the phrase "integration",
that you seem to be using to refer only to the final step of setting
require_full_index to 0, while that is the least interesting part of
a series like this one.  The work done in patches 3/ and 4/ that
paves the way to allow us to set the require_full_index to 0 in any
command that needs to work with the ":<path>" syntax is much more
interesting part of the series, and when viewed from that angle, the
series is not about preparing "show" but about ":<path>" syntax to
work better with the sparse index.

> I know that Victoria intends to submit her 'git stash' integration soon, and
> this provides a way to test if our split of test changes in t1092 are easy
> to merge without conflict. If that is successful, then I will likely submit
> my integration with the 'sparse-checkout' builtin after this series is
> complete. (UPDATE: we inserted a test in the same location of t1092, but
> otherwise there are no textual or semantic conflicts.)

Yeah, I think I already queued Victoria's and the previous round of
this series in 'seen' without problematic conflicts.

Let's take a brief look at the range-diff before I go do something
else...

>  2:  27ab853a9b4 ! 2:  2e9d47ab09b show: integrate with the sparse index
>      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'show (cached blob
>       -	# does not work as implemented. The error message is
>       -	# different for a full checkout and a sparse checkout
>       -	# when the directory is outside of the cone.
>      -+	# changes depending on the existence of a sparse index.
>      ++	# had different behavior depending on the existence
>      ++	# of a sparse index.
>        	test_all_match test_must_fail git show :deep/ &&
>        	test_must_fail git -C full-checkout show :folder1/ &&
>       -	test_sparse_match test_must_fail git show :folder1/
>  3:  f5da5327673 = 3:  5a7561637f0 object-name: reject trees found in the index
>  4:  99c09ccc240 ! 4:  b730457fccc object-name: diagnose trees in index properly
>      @@ Commit message
>           checkout. The error message from diagnose_invalid_index_path() reports
>           whether the path is on disk or not. The full checkout will have the
>           directory on disk, but the path will not be in the index. The sparse
>      -    checokut could have the directory not exist, specifically when that
>      +    checkout could have the directory not exist, specifically when that
>           directory is outside of the sparse-checkout cone.
>       
>           In the case of a sparse index, we have yet another state: the path can
>      @@ object-name.c: static void diagnose_invalid_index_path(struct repository *r,
>       
>        ## t/t1092-sparse-checkout-compatibility.sh ##
>       @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'show (cached blobs/trees)' '
>      + 	test_all_match git show :deep/a &&
>      + 	test_sparse_match git show :folder1/a &&
>      + 
>      +-	# Asking "git show" for directories in the index
>      +-	# had different behavior depending on the existence
>      +-	# of a sparse index.
>      ++	# The error message differs depending on whether
>      ++	# the directory exists in the worktree.
>      + 	test_all_match test_must_fail git show :deep/ &&
>        	test_must_fail git -C full-checkout show :folder1/ &&
>      - 	test_must_fail git -C sparse-checkout show :folder1/ &&
>      +-	test_must_fail git -C sparse-checkout show :folder1/ &&
>      ++	test_sparse_match test_must_fail git show :folder1/ &&
>        
>       -	test_must_fail git -C sparse-index show :folder1/ 2>err &&
>       -	grep "is in the index, but not at stage 0" err
>      -+	test_sparse_match test_must_fail git show :folder1/ &&
>      -+
>       +	# Change the sparse cone for an extra case:
>       +	run_on_sparse git sparse-checkout set deep/deeper1 &&
>       +
>  -:  ----------- > 5:  69efe637a18 rev-parse: integrate with sparse index

OK, nothing unexpected other than the rev-parse stuff.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/5] Sparse index integration with 'git show'
  2022-04-26 20:55   ` [PATCH v2 0/5] Sparse index integration with 'git show' Junio C Hamano
@ 2022-04-27 13:47     ` Derrick Stolee
  0 siblings, 0 replies; 26+ messages in thread
From: Derrick Stolee @ 2022-04-27 13:47 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, vdye, shaoxuan.yuan02, Philip Oakley, Josh Steadmon

On 4/26/2022 4:55 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> This ":" syntax is shared by other commands like "git rev-parse", but we are
>> not adding those integrations at this point.
> 
> This has been the most curious thing since the initial round.  The
> changes in the series are mostly about the code that parses the
> ":<path>" syntax and yield an object name (when exists) or give an
> error messages (otherwise) and die, before the computed object name
> gets used by "git show", or any other command that takes an object
> name from the command line.
> 
> I guess what has been confusing me was the phrase "integration",
> that you seem to be using to refer only to the final step of setting
> require_full_index to 0, while that is the least interesting part of
> a series like this one.  The work done in patches 3/ and 4/ that
> paves the way to allow us to set the require_full_index to 0 in any
> command that needs to work with the ":<path>" syntax is much more
> interesting part of the series, and when viewed from that angle, the
> series is not about preparing "show" but about ":<path>" syntax to
> work better with the sparse index.

In general, yes, we do need to be teaching different parts of the codebase
about the sparse index. The way I've been tracking that progress is by
which builtins can stop expanding a sparse index to a full one upon parse.
This progress indicator also matches the testing strategy, which focuses
on preserving behavior for top-level Git commands (and checking that they
don't expand the sparse index when they don't need to).

I understand your thought that it would be better to sell the series to
reviewers by the interesting pieces under the hood that are changing. I
think this is one of the first times where only one of these systems is
sufficient to make an entire builtin (or two) work with the sparse index.

I'll keep this in mind for pitching future updates.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2022-04-27 13:49 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-07 16:37 [PATCH 0/4] Sparse index integration with 'git show' Derrick Stolee via GitGitGadget
2022-04-07 16:37 ` [PATCH 1/4] t1092: add compatibility tests for " Derrick Stolee via GitGitGadget
2022-04-14 18:37   ` Josh Steadmon
2022-04-18 12:23     ` Derrick Stolee
2022-04-07 16:37 ` [PATCH 2/4] show: integrate with the sparse index Derrick Stolee via GitGitGadget
2022-04-14 18:50   ` Josh Steadmon
2022-04-18 12:28     ` Derrick Stolee
2022-04-07 16:37 ` [PATCH 3/4] object-name: reject trees found in the index Derrick Stolee via GitGitGadget
2022-04-14 18:57   ` Josh Steadmon
2022-04-18 12:31     ` Derrick Stolee
2022-04-07 16:37 ` [PATCH 4/4] object-name: diagnose trees in index properly Derrick Stolee via GitGitGadget
2022-04-07 20:46   ` Philip Oakley
2022-04-12  6:32   ` Junio C Hamano
2022-04-12 13:52     ` Derrick Stolee
2022-04-12 15:45       ` Junio C Hamano
2022-04-14 18:37 ` [PATCH 0/4] Sparse index integration with 'git show' Josh Steadmon
2022-04-14 21:14   ` Junio C Hamano
2022-04-18 12:42     ` Derrick Stolee
2022-04-26 20:43 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
2022-04-26 20:43   ` [PATCH v2 1/5] t1092: add compatibility tests for " Derrick Stolee via GitGitGadget
2022-04-26 20:43   ` [PATCH v2 2/5] show: integrate with the sparse index Derrick Stolee via GitGitGadget
2022-04-26 20:43   ` [PATCH v2 3/5] object-name: reject trees found in the index Derrick Stolee via GitGitGadget
2022-04-26 20:43   ` [PATCH v2 4/5] object-name: diagnose trees in index properly Derrick Stolee via GitGitGadget
2022-04-26 20:43   ` [PATCH v2 5/5] rev-parse: integrate with sparse index Derrick Stolee via GitGitGadget
2022-04-26 20:55   ` [PATCH v2 0/5] Sparse index integration with 'git show' Junio C Hamano
2022-04-27 13:47     ` Derrick Stolee

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).