git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC PATCH 0/1] mv: integrate with sparse-index
@ 2022-03-15 10:01 Shaoxuan Yuan
  2022-03-15 10:01 ` [RFC PATCH 1/1] " Shaoxuan Yuan
  0 siblings, 1 reply; 22+ messages in thread
From: Shaoxuan Yuan @ 2022-03-15 10:01 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, vdye, Shaoxuan Yuan

Integrate `git mv` with sparse-index.
The performance tests and ensure_not_expanded tests are not added yet,
cause I want to see if the added tests in this patch are on the right
track.

Shaoxuan Yuan (1):
  mv: integrate with sparse-index

 builtin/mv.c                             |  3 +++
 t/t1092-sparse-checkout-compatibility.sh | 34 ++++++++++++++++++++++++
 2 files changed, 37 insertions(+)


base-commit: 1a4874565fa3b6668042216189551b98b4dc0b1b
-- 
2.35.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-15 10:01 [RFC PATCH 0/1] mv: integrate with sparse-index Shaoxuan Yuan
@ 2022-03-15 10:01 ` Shaoxuan Yuan
  2022-03-15 16:07   ` Victoria Dye
  2022-03-15 17:23   ` Junio C Hamano
  0 siblings, 2 replies; 22+ messages in thread
From: Shaoxuan Yuan @ 2022-03-15 10:01 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, vdye, Shaoxuan Yuan

Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
---
 builtin/mv.c                             |  3 +++
 t/t1092-sparse-checkout-compatibility.sh | 34 ++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/builtin/mv.c b/builtin/mv.c
index 83a465ba83..111360ebf5 100644
--- a/builtin/mv.c
+++ b/builtin/mv.c
@@ -138,6 +138,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 
 	git_config(git_default_config, NULL);
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	argc = parse_options(argc, argv, prefix, builtin_mv_options,
 			     builtin_mv_usage, 0);
 	if (--argc < 1)
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 2a04b532f9..0a8164c5f6 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1521,4 +1521,38 @@ test_expect_success 'checkout behaves oddly with df-conflict-2' '
 	test_cmp full-checkout-err sparse-index-err
 '
 
+test_expect_success 'mv' '
+	init_repos &&
+
+	# test first form <source> <destination>
+	test_all_match git mv deep/a deep/a_mod &&
+	test_all_match git mv deep/deeper1 deep/deeper1_mod &&
+	test_all_match git mv deep/deeper2/deeper1/deepest2/a \
+	deep/deeper2/deeper1/deepest2/a_mod &&
+
+	run_on_all git reset --hard &&
+
+	test_all_match git mv -f deep/a deep/before/a &&
+	test_all_match git mv -f deep/before/a deep/a &&
+
+	run_on_all git reset --hard &&
+
+	test_all_match git mv -k deep/a deep/before/a &&
+	test_all_match git mv -k deep/before/a deep/a &&
+
+	run_on_all git reset --hard &&
+
+	test_all_match git mv -v deep/a deep/a_mod &&
+	test_all_match git mv -v deep/deeper1 deep/deeper1_mod &&
+	test_all_match git mv -v deep/deeper2/deeper1/deepest2/a \
+	deep/deeper2/deeper1/deepest2/a_mod &&
+
+	# test second form <source> ... <destination directory>
+	run_on_all git reset --hard &&
+	run_on_all mkdir deep/folder &&
+	test_all_match git mv deep/a deep/folder &&
+	test_all_match git mv -v deep/deeper1 deep/folder &&
+	test_all_match git mv -f deep/deeper2/deeper1/deepest2/a deep/folder
+'
+
 test_done
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-15 10:01 ` [RFC PATCH 1/1] " Shaoxuan Yuan
@ 2022-03-15 16:07   ` Victoria Dye
  2022-03-15 17:14     ` Derrick Stolee
                       ` (2 more replies)
  2022-03-15 17:23   ` Junio C Hamano
  1 sibling, 3 replies; 22+ messages in thread
From: Victoria Dye @ 2022-03-15 16:07 UTC (permalink / raw)
  To: Shaoxuan Yuan, git; +Cc: derrickstolee

Shaoxuan Yuan wrote:
> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
> ---
>  builtin/mv.c                             |  3 +++
>  t/t1092-sparse-checkout-compatibility.sh | 34 ++++++++++++++++++++++++
>  2 files changed, 37 insertions(+)
> 

Hi Shaoxuan! 

I'll answer your question "are the tests on the right track?" [1] inline
with the tests here. 

[1] https://lore.kernel.org/git/20220315100145.214054-1-shaoxuan.yuan02@gmail.com/

> diff --git a/builtin/mv.c b/builtin/mv.c
> index 83a465ba83..111360ebf5 100644
> --- a/builtin/mv.c
> +++ b/builtin/mv.c
> @@ -138,6 +138,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>  
>  	git_config(git_default_config, NULL);
>  
> +	prepare_repo_settings(the_repository);
> +	the_repository->settings.command_requires_full_index = 0;
> +
>  	argc = parse_options(argc, argv, prefix, builtin_mv_options,
>  			     builtin_mv_usage, 0);
>  	if (--argc < 1)
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 2a04b532f9..0a8164c5f6 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -1521,4 +1521,38 @@ test_expect_success 'checkout behaves oddly with df-conflict-2' '
>  	test_cmp full-checkout-err sparse-index-err
>  '
>  
> +test_expect_success 'mv' '
> +	init_repos &&
> +

In 't1092', I've tried to write test cases around some of the
characteristics relevant to sparse checkout/sparse index. For example:

- files inside vs. outside of sparse cone (e.g., 'deep/a' vs 'folder1/a')
- "normal" directories vs. sparse directories (e.g., 'deep/' vs. 'folder1/')
- directories inside a sparse directory vs. "toplevel" sparse directories
  (e.g., 'folder1/0/' vs. 'folder1/')
- options that follow different code paths, especially if those code paths
  interact with the index differently (e.g., 'git reset --hard' vs 'git
  reset --mixed')
- (probably not relevant for 'git mv') files with vs. without staged changes
  in the index

I've found that exercising these characteristics provides good baseline
coverage for a sparse index integration, not leaving any major gaps. I'll
also typically add cases specific to any workarounds I need to add to a
command (like for 'git read-tree --prefix' [2]).

Also, if some of the information about the test repos (e.g., what's inside
vs. outside cone, or what's in the repos in the first place) isn't clear,
I'm happy to give a deeper dive into how they're set up.

With all of that in mind, let's go over the cases you have so far.

[2] https://lore.kernel.org/git/90ebcb7b8ff4b4f1ba09abcbe636d639fa597e74.1646166271.git.gitgitgadget@gmail.com/

> +	# test first form <source> <destination>
> +	test_all_match git mv deep/a deep/a_mod &&
> +	test_all_match git mv deep/deeper1 deep/deeper1_mod &&
> +	test_all_match git mv deep/deeper2/deeper1/deepest2/a \
> +	deep/deeper2/deeper1/deepest2/a_mod &&
> +

This is a good basis for "inside cone" to "inside cone" moves. That said, I
don't think you need all three (since they're all testing effectively the
same inside-to-inside cone move). I'd suggest instead adding cases like: 

	test_all_match git mv <inside cone> <outside cone> &&
	test_all_match git mv <outside cone> <inside cone> &&
	test_all_match git mv <outside cone> <outside cone> &&

to see how the sparse index behaves when files are moved in and out of
sparse directories. Similarly, you may want to try 'git mv <sparse
directory> <somewhere else>' to see if that triggers any unintended
behavior.

Additionally, I don't *think* 'git mv' prints out the state of the index, so
you'll probably want to follow these cases with:

	test_all_match git status --porcelain=v2 &&

which prints the status info in a machine-readable format.

> +	run_on_all git reset --hard &&
> +
> +	test_all_match git mv -f deep/a deep/before/a &&
> +	test_all_match git mv -f deep/before/a deep/a &&
> +

Good! The '-f' option will allow one file to overwrite another in the index,
which is definitely interesting in a sparse index. Same as above, though,
you should verify 'git status --porcelain=v2'.

> +	run_on_all git reset --hard &&
> +
> +	test_all_match git mv -k deep/a deep/before/a &&
> +	test_all_match git mv -k deep/before/a deep/a &&
> +

The '-k' option might be interesting in the context of the index, since it
pushes past errors that would normally make it exit early. However, if it
just skips things that fail rather than exiting with an error, it probably
isn't testing anything more than the 'git mv' cases. 

> +	run_on_all git reset --hard &&
> +
> +	test_all_match git mv -v deep/a deep/a_mod &&
> +	test_all_match git mv -v deep/deeper1 deep/deeper1_mod &&
> +	test_all_match git mv -v deep/deeper2/deeper1/deepest2/a \
> +	deep/deeper2/deeper1/deepest2/a_mod &&
> +

Looking at 'builtin/mv.c', the '-v' "verbose" option only controls whether
some verbose printouts are emitted. This might be relevant if the printouts
were printing index information that didn't match between 'full-checkout',
'sparse-checkout' and 'sparse-index', but if you haven't seen that, I'd
leave these cases out.

> +	# test second form <source> ... <destination directory>
> +	run_on_all git reset --hard &&
> +	run_on_all mkdir deep/folder &&
> +	test_all_match git mv deep/a deep/folder &&
> +	test_all_match git mv -v deep/deeper1 deep/folder &&
> +	test_all_match git mv -f deep/deeper2/deeper1/deepest2/a deep/folder

This is a good variation on the standard "inside cone" to "inside cone", and
I'd like to see something similar done inside sparse directories. And,
similar to above, I don't think '-v' needs to be tested.

> +'
> +
>  test_done

Overall, this is a great start! You've got a good pattern set up (it's very
clear to follow), I think it mainly needs some more variety to the test
cases. Also, if you find that this test gets way too large after adding more
cases, feel free to split it into multiple named tests if one gets too long
(e.g. "mv", "mv -f"). 

My recommendations:

- add tests covering outside-of-sparse-cone 'mv' arguments
- add tests covering 'mv' attempting to move directories (in-cone and
  sparse)
- add some "test_must_fail" tests to see what happens when you do something
  "wrong", e.g. to try to overwrite a file without '-f' (I've found some
  really interesting issues in the past where you expect something to fail
  and it doesn't)
- add 'git status --porcelain=v2' checks to confirm that the 'mv' worked the
  same across the different checkouts
- remove multiples of test cases that test the same general behavior (e.g.,
  'git mv <in-cone file> <in-cone file>' only needs to be done once)
- double-check whether '-v' and '-k' have the ability to affect
  full-checkout/sparse-checkout/sparse-index differently - if not, you
  probably don't need to test them

Thanks for working on this, and I hope this helps!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-15 16:07   ` Victoria Dye
@ 2022-03-15 17:14     ` Derrick Stolee
  2022-03-16  3:29       ` Shaoxuan Yuan
  2022-03-17  8:37       ` Shaoxuan Yuan
  2022-03-16  3:18     ` Shaoxuan Yuan
  2022-03-16 10:45     ` Shaoxuan Yuan
  2 siblings, 2 replies; 22+ messages in thread
From: Derrick Stolee @ 2022-03-15 17:14 UTC (permalink / raw)
  To: Victoria Dye, Shaoxuan Yuan, git

On 3/15/2022 12:07 PM, Victoria Dye wrote:
> Shaoxuan Yuan wrote:
>> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
>> ---
>>  builtin/mv.c                             |  3 +++
>>  t/t1092-sparse-checkout-compatibility.sh | 34 ++++++++++++++++++++++++
>>  2 files changed, 37 insertions(+)
>>
> 
> Hi Shaoxuan! 

Hello!
 
> I'll answer your question "are the tests on the right track?" [1] inline
> with the tests here. 
> In 't1092', I've tried to write test cases around some of the
> characteristics relevant to sparse checkout/sparse index. For example:
> 
> - files inside vs. outside of sparse cone (e.g., 'deep/a' vs 'folder1/a')
> - "normal" directories vs. sparse directories (e.g., 'deep/' vs. 'folder1/')
> - directories inside a sparse directory vs. "toplevel" sparse directories
>   (e.g., 'folder1/0/' vs. 'folder1/')
> - options that follow different code paths, especially if those code paths
>   interact with the index differently (e.g., 'git reset --hard' vs 'git
>   reset --mixed')
> - (probably not relevant for 'git mv') files with vs. without staged changes
>   in the index
> 
> I've found that exercising these characteristics provides good baseline
> coverage for a sparse index integration, not leaving any major gaps. I'll
> also typically add cases specific to any workarounds I need to add to a
> command (like for 'git read-tree --prefix' [2]).

This, and other advice that Victoria mentions, are really
good points to keep in mind.

> My recommendations:
> 
> - add tests covering outside-of-sparse-cone 'mv' arguments
> - add tests covering 'mv' attempting to move directories (in-cone and
>   sparse)
> - add some "test_must_fail" tests to see what happens when you do something
>   "wrong", e.g. to try to overwrite a file without '-f' (I've found some
>   really interesting issues in the past where you expect something to fail
>   and it doesn't)
> - add 'git status --porcelain=v2' checks to confirm that the 'mv' worked the
>   same across the different checkouts
> - remove multiples of test cases that test the same general behavior (e.g.,
>   'git mv <in-cone file> <in-cone file>' only needs to be done once)
> - double-check whether '-v' and '-k' have the ability to affect
>   full-checkout/sparse-checkout/sparse-index differently - if not, you
>   probably don't need to test them
> 
> Thanks for working on this, and I hope this helps!

You mention in your cover letter that the ensure_not_expanded tests
are not added yet (same with performance tests). Now that you've
gotten feedback on this version of the patch, I might recommend the
organization you might want for a full series:

1. Add these 'mv' tests to t1092 _without_ the code change. These
   tests should work when the index is expanded, and making the
   code change to not expand the index shouldn't change the
   behavior.

2. Add the performance test so we have a baseline to measure how
   well 'mv' does in the normal case (and how it is slower when
   expanding the index).

3. Make the code change and add the ensure_not_expanded test,
   since the functionality from the tests added in (1) will not
   change and we can report the results from the perf tests
   added in (2). The only thing to test is the new, internal
   behavior that the index is not expanded when doing these
   actions. (Keep in mind that we expect the index to be
   expanded for out-of-cone moves, but it's the in-cone moves
   that we expect to not expand.)

Thanks!
-Stolee

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-15 10:01 ` [RFC PATCH 1/1] " Shaoxuan Yuan
  2022-03-15 16:07   ` Victoria Dye
@ 2022-03-15 17:23   ` Junio C Hamano
  2022-03-15 20:00     ` Derrick Stolee
  1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2022-03-15 17:23 UTC (permalink / raw)
  To: Shaoxuan Yuan; +Cc: git, derrickstolee, vdye

Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> writes:

> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
> ---
>  builtin/mv.c                             |  3 +++
>  t/t1092-sparse-checkout-compatibility.sh | 34 ++++++++++++++++++++++++
>  2 files changed, 37 insertions(+)
>
> diff --git a/builtin/mv.c b/builtin/mv.c
> index 83a465ba83..111360ebf5 100644
> --- a/builtin/mv.c
> +++ b/builtin/mv.c
> @@ -138,6 +138,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>  
>  	git_config(git_default_config, NULL);
>  
> +	prepare_repo_settings(the_repository);
> +	the_repository->settings.command_requires_full_index = 0;
> +

The command used to be marked as one of the commands that require
full index to work correctly.  Why did it suddenly become not to
require it, especially without any other changes to make it so?

This patch needs a lot more explaining to do in itse proposed log
message.

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-15 17:23   ` Junio C Hamano
@ 2022-03-15 20:00     ` Derrick Stolee
  0 siblings, 0 replies; 22+ messages in thread
From: Derrick Stolee @ 2022-03-15 20:00 UTC (permalink / raw)
  To: Junio C Hamano, Shaoxuan Yuan; +Cc: git, vdye

On 3/15/2022 1:23 PM, Junio C Hamano wrote:
> Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> writes:
> 
>> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
>> ---
>>  builtin/mv.c                             |  3 +++
>>  t/t1092-sparse-checkout-compatibility.sh | 34 ++++++++++++++++++++++++
>>  2 files changed, 37 insertions(+)
>>
>> diff --git a/builtin/mv.c b/builtin/mv.c
>> index 83a465ba83..111360ebf5 100644
>> --- a/builtin/mv.c
>> +++ b/builtin/mv.c
>> @@ -138,6 +138,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>>  
>>  	git_config(git_default_config, NULL);
>>  
>> +	prepare_repo_settings(the_repository);
>> +	the_repository->settings.command_requires_full_index = 0;
>> +
> 
> The command used to be marked as one of the commands that require
> full index to work correctly.  Why did it suddenly become not to
> require it, especially without any other changes to make it so?
> 
> This patch needs a lot more explaining to do in itse proposed log
> message.

Right. Some builtins already work safely with the sparse index,
but we just were not sure without creating the proper tests for
it. In this case, I expect 'git mv' uses index_name_pos() to find
the locations of a given index entry, which can cause the index
to expand naturally.

I can definitely imagine a bug where index_name_pos() fixes the
location of the in-cone path within the sparse index, then the
index_name_pos() for the out-of-cone path expands the index and
causes the position of the in-cone path to no longer be correct.

Testing with a variety of in-cone and out-of-cone paths will
help here.

While I was writing this reply, I realized that our default cone
in `t1092` doesn't have a sparse directory before the typical in-cone
path of "deep/a". I set out to make one.

This diff _should_ apply to `t1092` without causing any failures:

--- >8 ---

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 2a04b532f91..e9533832aab 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -16,7 +16,9 @@ test_expect_success 'setup' '
 		echo "after deep" >e &&
 		echo "after folder1" >g &&
 		echo "after x" >z &&
-		mkdir folder1 folder2 deep x &&
+		mkdir folder1 folder2 deep before x &&
+		echo "before deep" >before/a &&
+		echo "before deep again" >before/b &&
 		mkdir deep/deeper1 deep/deeper2 deep/before deep/later &&
 		mkdir deep/deeper1/deepest &&
 		mkdir deep/deeper1/deepest2 &&
@@ -1311,6 +1313,7 @@ test_expect_success 'ls-files' '
 
 	cat >expect <<-\EOF &&
 	a
+	before/
 	deep/
 	e
 	folder1-
@@ -1358,6 +1361,7 @@ test_expect_success 'ls-files' '
 
 	cat >expect <<-\EOF &&
 	a
+	before/
 	deep/
 	e
 	folder1-

--- >8 ---

However, it causes failures in these tests:

1. `8 - add outside sparse cone`
2. `10 - status/add: outside sparse cone`
3. `21 - reset with pathspecs inside sparse definition`

After talking with @vdye about this, it seems that they are all
failing based on a common issue regarding an index-based diff.
Somehow the diff is not finding a version of the paths so is
reporting them as added.

For example, at the point of failure in `8 - add outside sparse
cone`, we have these results for some Git commands:

$ git diff
diff --git a/folder1/a b/folder1/a
index 7898192..8e27be7 100644
--- a/folder1/a
+++ b/folder1/a
@@ -1 +1 @@
-a
+text

$ git diff --staged

$ git diff --staged -- folder1/a
diff --git a/folder1/a b/folder1/a
new file mode 100644
index 0000000..7898192
--- /dev/null
+++ b/folder1/a
@@ -0,0 +1 @@
+a

$ git diff --staged -- deep
diff --git a/deep/later/a b/deep/later/a
new file mode 100644
index 0000000..7898192
--- /dev/null
+++ b/deep/later/a
@@ -0,0 +1 @@
+a
```

The impact must be pretty low and specific to these prefixed diffs
(the reset test also uses prefixed resets, so pathspecs are somehow
involved) which are rare for users to actually use. Still, we
should fix this and strengthen our tests.

After trying for an hour to fix this myself, I have failed to find
the root cause of this issue. I'm about to head out on vacation, so
I won't have time to look into this again until Monday. I wanted to
share this information so it doesn't cause Shaoxuan too much pain
while working on this 'git mv' change.

Thanks,
-Stolee

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-15 16:07   ` Victoria Dye
  2022-03-15 17:14     ` Derrick Stolee
@ 2022-03-16  3:18     ` Shaoxuan Yuan
  2022-03-16 10:45     ` Shaoxuan Yuan
  2 siblings, 0 replies; 22+ messages in thread
From: Shaoxuan Yuan @ 2022-03-16  3:18 UTC (permalink / raw)
  To: Victoria Dye; +Cc: git, derrickstolee

On Wed, Mar 16, 2022 at 12:07 AM Victoria Dye <vdye@github.com> wrote:
>
> Shaoxuan Yuan wrote:
> > Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
> > ---
> >  builtin/mv.c                             |  3 +++
> >  t/t1092-sparse-checkout-compatibility.sh | 34 ++++++++++++++++++++++++
> >  2 files changed, 37 insertions(+)
> >
>
> Hi Shaoxuan!

Hi Victoria!

> Thanks for working on this, and I hope this helps!

Thanks for all the feedback, they are really helpful! I'm still researching and
experimenting with various documents and examples regarding your
feedback. And I will try to submit a more refined patch in less than a
few days :)

-- 
Thanks & Regards,
Shaoxuan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-15 17:14     ` Derrick Stolee
@ 2022-03-16  3:29       ` Shaoxuan Yuan
  2022-03-17  8:37       ` Shaoxuan Yuan
  1 sibling, 0 replies; 22+ messages in thread
From: Shaoxuan Yuan @ 2022-03-16  3:29 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Victoria Dye, git

On Wed, Mar 16, 2022 at 1:14 AM Derrick Stolee <derrickstolee@github.com> wrote:
>
> Hello!

Hi Derrick,

>
> > I'll answer your question "are the tests on the right track?" [1] inline
> > with the tests here.
> > In 't1092', I've tried to write test cases around some of the
> > characteristics relevant to sparse checkout/sparse index. For example:
> >
> > - files inside vs. outside of sparse cone (e.g., 'deep/a' vs 'folder1/a')
> > - "normal" directories vs. sparse directories (e.g., 'deep/' vs. 'folder1/')
> > - directories inside a sparse directory vs. "toplevel" sparse directories
> >   (e.g., 'folder1/0/' vs. 'folder1/')
> > - options that follow different code paths, especially if those code paths
> >   interact with the index differently (e.g., 'git reset --hard' vs 'git
> >   reset --mixed')
> > - (probably not relevant for 'git mv') files with vs. without staged changes
> >   in the index
> >
> > I've found that exercising these characteristics provides good baseline
> > coverage for a sparse index integration, not leaving any major gaps. I'll
> > also typically add cases specific to any workarounds I need to add to a
> > command (like for 'git read-tree --prefix' [2]).
>
> This, and other advice that Victoria mentions, are really
> good points to keep in mind.
>
> > My recommendations:
> >
> > - add tests covering outside-of-sparse-cone 'mv' arguments
> > - add tests covering 'mv' attempting to move directories (in-cone and
> >   sparse)
> > - add some "test_must_fail" tests to see what happens when you do something
> >   "wrong", e.g. to try to overwrite a file without '-f' (I've found some
> >   really interesting issues in the past where you expect something to fail
> >   and it doesn't)
> > - add 'git status --porcelain=v2' checks to confirm that the 'mv' worked the
> >   same across the different checkouts
> > - remove multiples of test cases that test the same general behavior (e.g.,
> >   'git mv <in-cone file> <in-cone file>' only needs to be done once)
> > - double-check whether '-v' and '-k' have the ability to affect
> >   full-checkout/sparse-checkout/sparse-index differently - if not, you
> >   probably don't need to test them
> >
> > Thanks for working on this, and I hope this helps!
>
> You mention in your cover letter that the ensure_not_expanded tests
> are not added yet (same with performance tests). Now that you've
> gotten feedback on this version of the patch, I might recommend the
> organization you might want for a full series:
>
> 1. Add these 'mv' tests to t1092 _without_ the code change. These
>    tests should work when the index is expanded, and making the
>    code change to not expand the index shouldn't change the
>    behavior.
>
> 2. Add the performance test so we have a baseline to measure how
>    well 'mv' does in the normal case (and how it is slower when
>    expanding the index).
>
> 3. Make the code change and add the ensure_not_expanded test,
>    since the functionality from the tests added in (1) will not
>    change and we can report the results from the perf tests
>    added in (2). The only thing to test is the new, internal
>    behavior that the index is not expanded when doing these
>    actions. (Keep in mind that we expect the index to be
>    expanded for out-of-cone moves, but it's the in-cone moves
>    that we expect to not expand.)

Thanks for the recommendations, they are really helpful! I will try to
address them in the next patch :)

-- 
Thanks & Regards,
Shaoxuan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-15 16:07   ` Victoria Dye
  2022-03-15 17:14     ` Derrick Stolee
  2022-03-16  3:18     ` Shaoxuan Yuan
@ 2022-03-16 10:45     ` Shaoxuan Yuan
  2022-03-16 13:34       ` Derrick Stolee
  2 siblings, 1 reply; 22+ messages in thread
From: Shaoxuan Yuan @ 2022-03-16 10:45 UTC (permalink / raw)
  To: Victoria Dye; +Cc: git, derrickstolee

Hi Victoria,

Just found an interesting (probably) behavior.

In the `sparse-checkout` directory created in `init_repos()` in t1092, if I
say:

$ mkdir folder3
$ touch folder3/a
$ git mv folder3/a deep

and git will prompt:

"fatal: not under version control, source=folder3/a, destination=deep/a"

And if I say:

$ git mv folder3 deep

git will prompt:

"fatal: source directory is empty, source=folder3, destination=deep/folder3"

What I am wondering is that file `folder3/a` is outside of sparse-checkout cone,
should `git mv` instead prompts with `advise_on_updating_sparse_paths()` or this
"not under version control" alarm is acceptable?

-- 
Thanks & Regards,
Shaoxuan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-16 10:45     ` Shaoxuan Yuan
@ 2022-03-16 13:34       ` Derrick Stolee
  2022-03-16 14:46         ` Shaoxuan Yuan
  0 siblings, 1 reply; 22+ messages in thread
From: Derrick Stolee @ 2022-03-16 13:34 UTC (permalink / raw)
  To: Shaoxuan Yuan, Victoria Dye; +Cc: git

On 3/16/2022 6:45 AM, Shaoxuan Yuan wrote:
> Hi Victoria,
> 
> Just found an interesting (probably) behavior.
> 
> In the `sparse-checkout` directory created in `init_repos()` in t1092, if I
> say:
> 
> $ mkdir folder3
> $ touch folder3/a

The issue here is that this file is "untracked", not just outside
of the sparse-checkout cone.

> $ git mv folder3/a deep
> 
> and git will prompt:
> 
> "fatal: not under version control, source=folder3/a, destination=deep/a"
> 
> And if I say:
> 
> $ git mv folder3 deep
> 
> git will prompt:
> 
> "fatal: source directory is empty, source=folder3, destination=deep/folder3"
> 
> What I am wondering is that file `folder3/a` is outside of sparse-checkout cone,
> should `git mv` instead prompts with `advise_on_updating_sparse_paths()` or this
> "not under version control" alarm is acceptable?

Instead, what about

	git mv folder2/a deep/new

since folder2/a is a tracked file, just not in the working tree
since it is outside the sparse-checkout cone.

(If it fails, then it should fail the same with and without the
sparse index, which is what "test_sparse_match" is for.)

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-16 13:34       ` Derrick Stolee
@ 2022-03-16 14:46         ` Shaoxuan Yuan
  2022-03-17 21:57           ` Victoria Dye
  0 siblings, 1 reply; 22+ messages in thread
From: Shaoxuan Yuan @ 2022-03-16 14:46 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Victoria Dye, git

Hi Derrick,

On Wed, Mar 16, 2022 at 9:34 PM Derrick Stolee <derrickstolee@github.com> wrote:
> The issue here is that this file is "untracked", not just outside
> of the sparse-checkout cone.

Thanks for the succinct explanation, it makes much more sense now :)

> Instead, what about
>
>         git mv folder2/a deep/new
>
> since folder2/a is a tracked file, just not in the working tree
> since it is outside the sparse-checkout cone.
>
> (If it fails, then it should fail the same with and without the
> sparse index, which is what "test_sparse_match" is for.)

I tested this and it fails as expected with:
"fatal: bad source, source=folder2/a, destination=deep/new"

> Thanks,
> -Stolee

Thanks for the reply above!

Other than that, I also have found another issue (probably), with

$ mkdir folder2
$ git mv folder2 deep

After these I do:

$ git status

And the output indicates that the index is updated with the following changes:

        renamed:    folder2/0/0/0 -> deep/folder2/0/0/0
        renamed:    folder2/0/1 -> deep/folder2/0/1
        renamed:    folder2/a -> deep/folder2/a

Nothing fails, which is not what I expected. What I expect is `git mv` will
fail because it is being told to update a sparse-directory (which as I read the
blogs and sparse-index.txt is taken as a sparse-directory entry) outside of the
sparse-checkout cone. Unless `git mv` is supplied with `--sparse`, the command
will do nothing but fail, no?

What confuses me more is that the `folder2`, which is present in the index but
not in the working tree (due to sparse-checkout cone), seems to be "unlocked"
and re-picked up by Git after `mkdir folder2` and move `folder2` into
the cone area.
And still, the files under `deep/folder2` are not present in the
working tree (might
be relevant to the previous context).

I haven't run the gdb to see into the process, I just get somehow confused by
these discrepancies (seemingly to me). I think I should gdb into it though,
getting some info here from people can also be really helpful :)

-- 
Thanks & Regards,
Shaoxuan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-15 17:14     ` Derrick Stolee
  2022-03-16  3:29       ` Shaoxuan Yuan
@ 2022-03-17  8:37       ` Shaoxuan Yuan
  1 sibling, 0 replies; 22+ messages in thread
From: Shaoxuan Yuan @ 2022-03-17  8:37 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Victoria Dye, git

Hi Derrick and Victoria.

On Wed, Mar 16, 2022 at 1:14 AM Derrick Stolee <derrickstolee@github.com> wrote:
> You mention in your cover letter that the ensure_not_expanded tests
> are not added yet (same with performance tests). Now that you've
> gotten feedback on this version of the patch, I might recommend the
> organization you might want for a full series:
>
> 1. Add these 'mv' tests to t1092 _without_ the code change. These
>    tests should work when the index is expanded, and making the
>    code change to not expand the index shouldn't change the
>    behavior.
>
> 2. Add the performance test so we have a baseline to measure how
>    well 'mv' does in the normal case (and how it is slower when
>    expanding the index).

I'm a bit caught up here.

Do I just do a before-code-change test and after-code-change test, and
benchmark the after against the before?

Or do you mean I should also perf test out-of-cone arguments with 'mv' so
that the index could be expanded? According to my understanding, the
sparse-index could be required to expand when out-of-cone actions
happen and the 'ensure_full_index()' is called. And do a 3-way comparison
among before-code-change, after-code-change, and after-code-change-
index-expanded, no?

-- 
Thanks & Regards,
Shaoxuan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-16 14:46         ` Shaoxuan Yuan
@ 2022-03-17 21:57           ` Victoria Dye
  2022-03-18  1:00             ` Junio C Hamano
  2022-03-27  3:48             ` Shaoxuan Yuan
  0 siblings, 2 replies; 22+ messages in thread
From: Victoria Dye @ 2022-03-17 21:57 UTC (permalink / raw)
  To: Shaoxuan Yuan, Derrick Stolee; +Cc: git

Shaoxuan Yuan wrote:
> Hi Derrick,
> 
> On Wed, Mar 16, 2022 at 9:34 PM Derrick Stolee <derrickstolee@github.com> wrote:
>> The issue here is that this file is "untracked", not just outside
>> of the sparse-checkout cone.
> 
> Thanks for the succinct explanation, it makes much more sense now :)
> 
>> Instead, what about
>>
>>         git mv folder2/a deep/new
>>
>> since folder2/a is a tracked file, just not in the working tree
>> since it is outside the sparse-checkout cone.
>>
>> (If it fails, then it should fail the same with and without the
>> sparse index, which is what "test_sparse_match" is for.)
> 
> I tested this and it fails as expected with:
> "fatal: bad source, source=folder2/a, destination=deep/new"
> 

Great! This should then probably be turned into a "test_expect_fail" test in
't1092' - that'll make sure we get both the right behavior and right error
message with sparse index after it's enabled.

However, I also get the same result when I add the '--sparse' option. I
would expect the behavior to be "move 'folder2/a' to 'deep/new' and check it
out in the worktree" - this may be a good candidate for improving the
existing integration with sparse *checkout* before enabling sparse *index*
(e.g., like when 'git add' was updated to not add sparse files by default
[1]).

[1] https://lore.kernel.org/git/2c5c834bc9fb42aeaff7befbba477aec727184c0.1632497954.git.gitgitgadget@gmail.com/

>> Thanks,
>> -Stolee
> 
> Thanks for the reply above!
> 
> Other than that, I also have found another issue (probably), with
> 
> $ mkdir folder2
> $ git mv folder2 deep
> 
> After these I do:
> 
> $ git status
> 
> And the output indicates that the index is updated with the following changes:
> 
>         renamed:    folder2/0/0/0 -> deep/folder2/0/0/0
>         renamed:    folder2/0/1 -> deep/folder2/0/1
>         renamed:    folder2/a -> deep/folder2/a
> 
> Nothing fails, which is not what I expected. What I expect is `git mv` will
> fail because it is being told to update a sparse-directory (which as I read the
> blogs and sparse-index.txt is taken as a sparse-directory entry) outside of the
> sparse-checkout cone. Unless `git mv` is supplied with `--sparse`, the command
> will do nothing but fail, no?
> 

I think you're right that this is a bug. This appears to come from the fact
that 'mv' decides whether a directory is sparse only *after* it sees that it
doesn't exist on-disk. 

> What confuses me more is that the `folder2`, which is present in the index but
> not in the working tree (due to sparse-checkout cone), seems to be "unlocked"
> and re-picked up by Git after `mkdir folder2` and move `folder2` into
> the cone area.
> And still, the files under `deep/folder2` are not present in the
> working tree (might
> be relevant to the previous context).
> 
This is a consequence of how sparse-checkout is implemented. The files in
'folder2/' aren't on-disk (and aren't correspondingly shown as "deleted" in
'git status') because the index entries of 'folder2/' files are marked with
the "SKIP_WORKTREE" flag. This flag basically indicates to git "this file
shouldn't be in the worktree (on-disk), but it is part of the repository (in
the index)". In other words, it's what makes a file "sparse", and
sparse-checkout (when you run 'set' or 'add') assigns that flag based on the
user-specified patterns. However, the flag isn't constantly being
re-evaluated - only certain commands change SKIP_WORKTREE, and only because
they do so explicitly (e.g., 'git reset --mixed' [2]) - leading to
situations like what you're seeing, where 'folder2/' files (which have
SKIP_WORKTREE enabled) are moved into the sparse cone, but still have
SKIP_WORKTREE enabled.

So I think there are three potential things to fix here: 

1. When empty folder2/ is on-disk, 'git mv' (without '--sparse') doesn't
   fail with "bad source", even though it should.
2. When you try to move a sparse file with 'git mv --sparse', it still
   fails.
3. SKIP_WORKTREE is not removed from out-of-cone files moved into the sparse
   cone.

On a related note, there is precedent for needing to make fixes like this
before integrating with sparse index. For example: in addition to the
earlier examples in 'add' and 'reset', 'checkout-index' was changed to no
longer checkout SKIP_WORKTREE files by default [3]. It's a somewhat expected
part of this process because sparse-checkout is still "experimental", and
one of our secondary goals with this sparse index work is to improve the
behavior of sparse-checkout in the commands we integrate. All of this
combined will, ideally, make the experience of using sparse-checkout much
nicer for users (both from usability and performance perspectives).

[2] https://lore.kernel.org/git/b221b00b7e06a3b135b9f68ce87cffaa7d782581.1638201164.git.gitgitgadget@gmail.com/
[3] https://lore.kernel.org/git/601888606d1cf7d7752844dbdbc7fac20d4be8c4.1641924306.git.gitgitgadget@gmail.com/

> I haven't run the gdb to see into the process, I just get somehow confused by
> these discrepancies (seemingly to me). I think I should gdb into it though,
> getting some info here from people can also be really helpful :)
> 

Another tool that may help you here is 'git ls-files --sparse -t'. It lists
the files in the index and their "tags" ('H' is "normal" tracked files, 'S'
is SKIP_WORKTREE, etc. [4]), which can help identify when a file you'd
expect to be SKIP_WORKTREE is not and vice versa.

[4] https://git-scm.com/docs/git-ls-files#Documentation/git-ls-files.txt--t 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-17 21:57           ` Victoria Dye
@ 2022-03-18  1:00             ` Junio C Hamano
  2022-03-21 15:20               ` Derrick Stolee
  2022-03-27  3:48             ` Shaoxuan Yuan
  1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2022-03-18  1:00 UTC (permalink / raw)
  To: Victoria Dye; +Cc: Shaoxuan Yuan, Derrick Stolee, git

Victoria Dye <vdye@github.com> writes:

>> I tested this and it fails as expected with:
>> "fatal: bad source, source=folder2/a, destination=deep/new"
>
> Great! This should then probably be turned into a "test_expect_fail" test in
> 't1092' - that'll make sure we get both the right behavior and right error
> message with sparse index after it's enabled.
>
> However, I also get the same result when I add the '--sparse' option. I
> would expect the behavior to be "move 'folder2/a' to 'deep/new' and check it
> out in the worktree" - this may be a good candidate for improving the
> existing integration with sparse *checkout* before enabling sparse *index*
> (e.g., like when 'git add' was updated to not add sparse files by default
> [1]).
> ...
> I think you're right that this is a bug. This appears to come from the fact
> that 'mv' decides whether a directory is sparse only *after* it sees that it
> doesn't exist on-disk. 
> ...
> So I think there are three potential things to fix here: 
>
> 1. When empty folder2/ is on-disk, 'git mv' (without '--sparse') doesn't
>    fail with "bad source", even though it should.
> 2. When you try to move a sparse file with 'git mv --sparse', it still
>    fails.
> 3. SKIP_WORKTREE is not removed from out-of-cone files moved into the sparse
>    cone.
>
> On a related note, there is precedent for needing to make fixes like this
> before integrating with sparse index. For example: in addition to the
> earlier examples in 'add' and 'reset', 'checkout-index' was changed to no
> longer checkout SKIP_WORKTREE files by default [3]. It's a somewhat expected
> part of this process ...
> ...
> Another tool that may help you here is 'git ls-files --sparse -t'. It lists
> the files in the index and their "tags" ('H' is "normal" tracked files, 'S'
> is SKIP_WORKTREE, etc. [4]), which can help identify when a file you'd
> expect to be SKIP_WORKTREE is not and vice versa.

Wonderful.

Quite honestly, because the code will most likely compile correctly
if you just remove the unconditional "we first expand the in-core
index fully" code, and because the "sparse index" makes the existing
index walking code fail in unexpected and surprising ways, I
consider it unsuitably harder for people who are not yet familiar
with the system.  Without a good test coverage (which is hard to
give unless you are familiar with the code being tested X-<), one
can easily get confused and lost.

Thanks for guiding a new contributor with the usual process of
loosening "require-full-index".

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-18  1:00             ` Junio C Hamano
@ 2022-03-21 15:20               ` Derrick Stolee
  2022-03-21 19:14                 ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Derrick Stolee @ 2022-03-21 15:20 UTC (permalink / raw)
  To: Junio C Hamano, Victoria Dye; +Cc: Shaoxuan Yuan, git

On 3/17/2022 9:00 PM, Junio C Hamano wrote:
> Victoria Dye <vdye@github.com> writes:
> 
>>> I tested this and it fails as expected with:
>>> "fatal: bad source, source=folder2/a, destination=deep/new"
>>
>> Great! This should then probably be turned into a "test_expect_fail" test in
>> 't1092' - that'll make sure we get both the right behavior and right error
>> message with sparse index after it's enabled.
>>
>> However, I also get the same result when I add the '--sparse' option. I
>> would expect the behavior to be "move 'folder2/a' to 'deep/new' and check it
>> out in the worktree" - this may be a good candidate for improving the
>> existing integration with sparse *checkout* before enabling sparse *index*
>> (e.g., like when 'git add' was updated to not add sparse files by default
>> [1]).
>> ...
>> I think you're right that this is a bug. This appears to come from the fact
>> that 'mv' decides whether a directory is sparse only *after* it sees that it
>> doesn't exist on-disk. 
>> ...
>> So I think there are three potential things to fix here: 
>>
>> 1. When empty folder2/ is on-disk, 'git mv' (without '--sparse') doesn't
>>    fail with "bad source", even though it should.
>> 2. When you try to move a sparse file with 'git mv --sparse', it still
>>    fails.
>> 3. SKIP_WORKTREE is not removed from out-of-cone files moved into the sparse
>>    cone.
>>
>> On a related note, there is precedent for needing to make fixes like this
>> before integrating with sparse index. For example: in addition to the
>> earlier examples in 'add' and 'reset', 'checkout-index' was changed to no
>> longer checkout SKIP_WORKTREE files by default [3]. It's a somewhat expected
>> part of this process ...
>> ...
>> Another tool that may help you here is 'git ls-files --sparse -t'. It lists
>> the files in the index and their "tags" ('H' is "normal" tracked files, 'S'
>> is SKIP_WORKTREE, etc. [4]), which can help identify when a file you'd
>> expect to be SKIP_WORKTREE is not and vice versa.
> 
> Wonderful.
> 
> Quite honestly, because the code will most likely compile correctly
> if you just remove the unconditional "we first expand the in-core
> index fully" code, and because the "sparse index" makes the existing
> index walking code fail in unexpected and surprising ways, I
> consider it unsuitably harder for people who are not yet familiar
> with the system.  Without a good test coverage (which is hard to
> give unless you are familiar with the code being tested X-<), one
> can easily get confused and lost.

Certainly, 'git mv' is looking to be harder than expected, but there
is a lot of interesting exploration happening in the process.

Thanks for persisting on this one, Shaoxuan!

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-21 15:20               ` Derrick Stolee
@ 2022-03-21 19:14                 ` Junio C Hamano
  2022-03-21 19:45                   ` Derrick Stolee
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2022-03-21 19:14 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Victoria Dye, Shaoxuan Yuan, git

Derrick Stolee <derrickstolee@github.com> writes:

>>> Another tool that may help you here is 'git ls-files --sparse -t'. It lists
>>> the files in the index and their "tags" ('H' is "normal" tracked files, 'S'
>>> is SKIP_WORKTREE, etc. [4]), which can help identify when a file you'd
>>> expect to be SKIP_WORKTREE is not and vice versa.
>> 
>> Wonderful.
>> 
>> Quite honestly, because the code will most likely compile correctly
>> if you just remove the unconditional "we first expand the in-core
>> index fully" code, and because the "sparse index" makes the existing
>> index walking code fail in unexpected and surprising ways, I
>> consider it unsuitably harder for people who are not yet familiar
>> with the system.  Without a good test coverage (which is hard to
>> give unless you are familiar with the code being tested X-<), one
>> can easily get confused and lost.
>
> Certainly, 'git mv' is looking to be harder than expected, but there
> is a lot of interesting exploration happening in the process.

Yeah, I know.

I am suprised that it is harder than expected *to* *you*, though.
After having seen a few other topics, I thought that you should know
how deceptively easy to lose "require-full" and how hard to audit
the code that may expect "a flat list of paths" in the in-core index
;-).

> Thanks for persisting on this one, Shaoxuan!

Yes, thanks.  And thanks for mentoring Shaoxuan.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-21 19:14                 ` Junio C Hamano
@ 2022-03-21 19:45                   ` Derrick Stolee
  2022-03-22  8:38                     ` Shaoxuan Yuan
  0 siblings, 1 reply; 22+ messages in thread
From: Derrick Stolee @ 2022-03-21 19:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Victoria Dye, Shaoxuan Yuan, git

On 3/21/2022 3:14 PM, Junio C Hamano wrote:
> Derrick Stolee <derrickstolee@github.com> writes:
> 
>>>> Another tool that may help you here is 'git ls-files --sparse -t'. It lists
>>>> the files in the index and their "tags" ('H' is "normal" tracked files, 'S'
>>>> is SKIP_WORKTREE, etc. [4]), which can help identify when a file you'd
>>>> expect to be SKIP_WORKTREE is not and vice versa.
>>>
>>> Wonderful.
>>>
>>> Quite honestly, because the code will most likely compile correctly
>>> if you just remove the unconditional "we first expand the in-core
>>> index fully" code, and because the "sparse index" makes the existing
>>> index walking code fail in unexpected and surprising ways, I
>>> consider it unsuitably harder for people who are not yet familiar
>>> with the system.  Without a good test coverage (which is hard to
>>> give unless you are familiar with the code being tested X-<), one
>>> can easily get confused and lost.
>>
>> Certainly, 'git mv' is looking to be harder than expected, but there
>> is a lot of interesting exploration happening in the process.
> 
> Yeah, I know.
> 
> I am suprised that it is harder than expected *to* *you*, though.
> After having seen a few other topics, I thought that you should know
> how deceptively easy to lose "require-full" and how hard to audit
> the code that may expect "a flat list of paths" in the in-core index
> ;-).

I'm particularly surprised in how much 'git mv' doesn't work very
well in the sparse-checkout environment already, which makes things
more difficult than "just" doing the normal sparse index things.

It's good that we are discovering them and working to fix them.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-21 19:45                   ` Derrick Stolee
@ 2022-03-22  8:38                     ` Shaoxuan Yuan
  2022-03-23 13:10                       ` Derrick Stolee
  0 siblings, 1 reply; 22+ messages in thread
From: Shaoxuan Yuan @ 2022-03-22  8:38 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, Victoria Dye, git

Hi all,

On Tue, Mar 22, 2022 at 3:45 AM Derrick Stolee <derrickstolee@github.com> wrote:
> I'm particularly surprised in how much 'git mv' doesn't work very
> well in the sparse-checkout environment already, which makes things
> more difficult than "just" doing the normal sparse index things.
>
> It's good that we are discovering them and working to fix them.
>
> Thanks,
> -Stolee

Really appreciate the mentoring and tips here, I'm trying to make some progress
now. The problems facing here certainly push me to explore more and know
better about the codebase. Appreciate all the help :-)

-- 
Thanks & Regards,
Shaoxuan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-22  8:38                     ` Shaoxuan Yuan
@ 2022-03-23 13:10                       ` Derrick Stolee
  2022-03-23 21:33                         ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Derrick Stolee @ 2022-03-23 13:10 UTC (permalink / raw)
  To: Shaoxuan Yuan; +Cc: Junio C Hamano, Victoria Dye, git

On 3/22/2022 4:38 AM, Shaoxuan Yuan wrote:
> Hi all,
> 
> On Tue, Mar 22, 2022 at 3:45 AM Derrick Stolee <derrickstolee@github.com> wrote:
>> I'm particularly surprised in how much 'git mv' doesn't work very
>> well in the sparse-checkout environment already, which makes things
>> more difficult than "just" doing the normal sparse index things.
>>
>> It's good that we are discovering them and working to fix them.
>>
>> Thanks,
>> -Stolee
> 
> Really appreciate the mentoring and tips here, I'm trying to make some progress
> now. The problems facing here certainly push me to explore more and know
> better about the codebase. Appreciate all the help :-)

A thought occurred to me while thinking about these difficulties:
perhaps it is better to start with 'git rm' since that does only
half of what 'git mv' does. It should be a smaller lift as a first
contribution. There is even a clear loop that is marked with "TODO:
audit for interaction with sparse-index."

As we've discovered in this thread, the direction for integrating
a builtin with the sparse index should follow this outline:

0. Test the builtin in t1092 with interactions inside and outside
   of the sparse-checkout cone.*

1. Add command_requires_full_index = 0 line to the builtin.

2. Check for failures and diagnose them.

3. Check for index expansion and remove them as necessary.
   (Go back to 2.)

4. Run performance tests.

(*) This step is the one we failed to focus enough on previously.

Of course, if you've already gotten really far on 'mv' and don't
want to switch context, then keep at it.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-23 13:10                       ` Derrick Stolee
@ 2022-03-23 21:33                         ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2022-03-23 21:33 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Derrick Stolee, Shaoxuan Yuan, Victoria Dye, git

Derrick Stolee <derrickstolee@github.com> writes:

> On 3/22/2022 4:38 AM, Shaoxuan Yuan wrote:
>> Hi all,
>> 
>> On Tue, Mar 22, 2022 at 3:45 AM Derrick Stolee <derrickstolee@github.com> wrote:
>>> I'm particularly surprised in how much 'git mv' doesn't work very
>>> well in the sparse-checkout environment already, which makes things
>>> more difficult than "just" doing the normal sparse index things.
>>>
>>> It's good that we are discovering them and working to fix them.
>>>
>>> Thanks,
>>> -Stolee
>> 
>> Really appreciate the mentoring and tips here, I'm trying to make some progress
>> now. The problems facing here certainly push me to explore more and know
>> better about the codebase. Appreciate all the help :-)
>
> A thought occurred to me while thinking about these difficulties:
> perhaps it is better to start with 'git rm' since that does only
> half of what 'git mv' does. It should be a smaller lift as a first
> contribution. There is even a clear loop that is marked with "TODO:
> audit for interaction with sparse-index."
>
> As we've discovered in this thread, the direction for integrating
> a builtin with the sparse index should follow this outline:
>
> 0. Test the builtin in t1092 with interactions inside and outside
>    of the sparse-checkout cone.*
>
> 1. Add command_requires_full_index = 0 line to the builtin.
>
> 2. Check for failures and diagnose them.
>
> 3. Check for index expansion and remove them as necessary.
>    (Go back to 2.)
>
> 4. Run performance tests.
>
> (*) This step is the one we failed to focus enough on previously.

Jonathan, I thought you had a radically different approach that
ought to be much safer than the above---do you want to bring it up
for discussion?

> Of course, if you've already gotten really far on 'mv' and don't
> want to switch context, then keep at it.
>
> Thanks,
> -Stolee

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-17 21:57           ` Victoria Dye
  2022-03-18  1:00             ` Junio C Hamano
@ 2022-03-27  3:48             ` Shaoxuan Yuan
  2022-03-28 13:32               ` Derrick Stolee
  1 sibling, 1 reply; 22+ messages in thread
From: Shaoxuan Yuan @ 2022-03-27  3:48 UTC (permalink / raw)
  To: Victoria Dye; +Cc: Derrick Stolee, git, Junio C Hamano

On Fri, Mar 18, 2022 at 5:57 AM Victoria Dye <vdye@github.com> wrote:

Hi all,

It's been a busy week, I'm sorry that did not have much time to respond.

=================================
A brief summary of my latest investigations:

I think the 'git mv' command still has some questionable aspects, especially
with the 'git sparse-checkout' command. And I feel we have to sort things out
between 'git mv' and sparse-checkout first, then proceed to the issues with
sparse-index.
===========

I'm trying to fix the first 2 of the 3 potential things mentioned earlier.

> 1. When empty folder2/ is on-disk, 'git mv' (without '--sparse') doesn't
>    fail with "bad source", even though it should.

In this case, 'git mv' does not fail with "bad source" is something expected,
because this error is related to the existence of an on-disk file, not
a directory.
The closest thing that it should fail with, in my opinion, is by
calling the advise
function 'advise_on_updating_sparse_paths'.

With that being said, I now raise the first question: should we change
the sparse-
checkout cone check to be placed at the very beginning of the checking process,
or keep it at the end as a very final check (where it is right now).
My preference is
to place it at the very beginning, since the user should always be
cautious about
touching contents outside of sparse-checkout cone, no matter what.

If a certain move touches out-of-cone stuff, and at the same time it will fail
with the, for example, "destination exists" or "conflicted" error, I
think these errors
should come second, after being supplied with the "--sparse" flag.

> 2. When you try to move a sparse file with 'git mv --sparse', it still
>    fails.

This is also related to the first question, because in this case,
"folder2/a" is not
on-disk, then 'git mv' will fail fast at the first check and ignore
the "--sparse" flag.
Even if we modify the code to place the out-of-cone check at the very beginning,
and supply a "--sparse" flag, we have to make 'git mv' look into the
index to find
a sparse file. And here I raise my second question: by moving a sparse
file, which
is normally not on-disk, we have to alter the original 'git mv' logic
to make it grope
into the index for the missing sparse file (for now it does not care
about the index, except
when it receives a directory as <source>); can we make this change?

--
Thanks & Regards,
Shaoxuan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 1/1] mv: integrate with sparse-index
  2022-03-27  3:48             ` Shaoxuan Yuan
@ 2022-03-28 13:32               ` Derrick Stolee
  0 siblings, 0 replies; 22+ messages in thread
From: Derrick Stolee @ 2022-03-28 13:32 UTC (permalink / raw)
  To: Shaoxuan Yuan, Victoria Dye; +Cc: git, Junio C Hamano

On 3/26/2022 11:48 PM, Shaoxuan Yuan wrote:
> On Fri, Mar 18, 2022 at 5:57 AM Victoria Dye <vdye@github.com> wrote:
> 
> Hi all,
> 
> It's been a busy week, I'm sorry that did not have much time to respond.
> 
> =================================
> A brief summary of my latest investigations:
> 
> I think the 'git mv' command still has some questionable aspects, especially
> with the 'git sparse-checkout' command. And I feel we have to sort things out
> between 'git mv' and sparse-checkout first, then proceed to the issues with
> sparse-index.
> ===========
> 
> I'm trying to fix the first 2 of the 3 potential things mentioned earlier.
> 
>> 1. When empty folder2/ is on-disk, 'git mv' (without '--sparse') doesn't
>>    fail with "bad source", even though it should.

>> 2. When you try to move a sparse file with 'git mv --sparse', it still
>>    fails.

I think that these conditions are all related to the perspective as
described in the documentation of 'git mv':

  In the first form, it renames <source>, which **must exist** and be
  either a file, symlink or directory, to <destination>. In the second
  form, the last argument has to be **an existing** directory; the given
  sources will be moved into this directory.

  The index is updated after successful completion, but the change must
  still be committed.

(**emphasis mine**)

So, this documents the perspective that files must exist in the worktree
(and destination directories must exist in the worktree), and after all
of those checks, then the move is staged.

I think part of our issues here is that in the case of a sparse-checkout,
we can have index entries that don't exist in the worktree. The expected
behavior we are considering is that 'git mv' should stage the movement
of the file in the index, ignoring the worktree for paths outside of the
sparse-checkout definition.

One way to do this would be to flip the implementation's direction:
perform the index operation of moving the cache entry, then update the
worktree to reflect that change (if necessary).

The case that I can think about being a bit strange is if the user has
an unstaged deletion of the source file, then runs 'git mv'. Since the
worktree is missing the file, then we cannot do the equivalent 'mv'
operation in the worktree.

One other thing to keep in mind is that 'git mv <source> <destination>'
can act like 'mv <source> <destination> && git add <destination>' if
<source> is an untracked path. So, 'git mv' can succeed even if the
source is not in the index!

So the change here is to ignore a non-existing path when the same path
exists as a cache entry with the SKIP_WORKTREE bit. That bit does say
"ignore the worktree" so 'git mv' isn't doing the right thing already.

---

In conclusion, there might be multiple ways forward here:

 1. Keep the expectation that <source> is in the worktree as given,
    and let a tracked <source> outside of the sparse-checkout cone
    result in a failure (as it currently does). Consider adding an
    advice message if <source> is a tracked, sparse path.

 2. Change the expectation to be that <source> must either be a
    file in the worktree _or_ a tracked, sparse path (or both).

The nice thing here is that we can do (1) and then (2) later. The
key things to investigate for the sparse index, then are what
happens when <destination> is outside of the sparse-checkout cone?
I imagine it would be fine if the moved file still appears in the
worktree and is cleaned up by a later 'git sparse-checkout reapply'.

I'm eager to here alternate opinions and options on this topic.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-03-28 13:32 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-15 10:01 [RFC PATCH 0/1] mv: integrate with sparse-index Shaoxuan Yuan
2022-03-15 10:01 ` [RFC PATCH 1/1] " Shaoxuan Yuan
2022-03-15 16:07   ` Victoria Dye
2022-03-15 17:14     ` Derrick Stolee
2022-03-16  3:29       ` Shaoxuan Yuan
2022-03-17  8:37       ` Shaoxuan Yuan
2022-03-16  3:18     ` Shaoxuan Yuan
2022-03-16 10:45     ` Shaoxuan Yuan
2022-03-16 13:34       ` Derrick Stolee
2022-03-16 14:46         ` Shaoxuan Yuan
2022-03-17 21:57           ` Victoria Dye
2022-03-18  1:00             ` Junio C Hamano
2022-03-21 15:20               ` Derrick Stolee
2022-03-21 19:14                 ` Junio C Hamano
2022-03-21 19:45                   ` Derrick Stolee
2022-03-22  8:38                     ` Shaoxuan Yuan
2022-03-23 13:10                       ` Derrick Stolee
2022-03-23 21:33                         ` Junio C Hamano
2022-03-27  3:48             ` Shaoxuan Yuan
2022-03-28 13:32               ` Derrick Stolee
2022-03-15 17:23   ` Junio C Hamano
2022-03-15 20:00     ` Derrick Stolee

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).