git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / Atom feed
* [PATCH] rm: honor sparse checkout patterns
@ 2020-11-12 21:01 Matheus Tavares
  2020-11-12 23:54 ` Elijah Newren
                   ` (2 more replies)
  0 siblings, 3 replies; 56+ messages in thread
From: Matheus Tavares @ 2020-11-12 21:01 UTC (permalink / raw)
  To: git; +Cc: stolee, newren

Make git-rm honor the 'sparse.restrictCmds' setting, by restricting its
operation to the paths that match both the command line pathspecs and
the repository's sparsity patterns. This better matches the expectations
of users with sparse-checkout definitions, while still allowing them
to optionally enable the old behavior with 'sparse.restrictCmds=false'
or the global '--no-restrict-to-sparse-paths' option.

Suggested-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---

This is based on mt/grep-sparse-checkout.
Original feature request: https://github.com/gitgitgadget/git/issues/786

 Documentation/config/sparse.txt  |  3 ++-
 Documentation/git-rm.txt         |  9 +++++++++
 builtin/rm.c                     |  7 ++++++-
 t/t3600-rm.sh                    | 22 ++++++++++++++++++++++
 t/t7011-skip-worktree-reading.sh |  5 -----
 5 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
index 494761526e..79d7d173e9 100644
--- a/Documentation/config/sparse.txt
+++ b/Documentation/config/sparse.txt
@@ -12,7 +12,8 @@ When this option is true (default), some git commands may limit their behavior
 to the paths specified by the sparsity patterns, or to the intersection of
 those paths and any (like `*.c`) that the user might also specify on the
 command line. When false, the affected commands will work on full trees,
-ignoring the sparsity patterns. For now, only git-grep honors this setting.
+ignoring the sparsity patterns. For now, only git-grep and git-rm honor this
+setting.
 +
 Note: commands which export, integrity check, or create history will always
 operate on full trees (e.g. fast-export, format-patch, fsck, commit, etc.),
diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
index ab750367fd..25dda8ff35 100644
--- a/Documentation/git-rm.txt
+++ b/Documentation/git-rm.txt
@@ -25,6 +25,15 @@ When `--cached` is given, the staged content has to
 match either the tip of the branch or the file on disk,
 allowing the file to be removed from just the index.
 
+CONFIGURATION
+-------------
+
+sparse.restrictCmds::
+	By default, git-rm only matches and removes paths within the
+	sparse-checkout patterns. This behavior can be changed with the
+	`sparse.restrictCmds` setting or the global
+	`--no-restrict-to-sparse-paths` option. For more details, see the
+	full `sparse.restrictCmds` definition in linkgit:git-config[1].
 
 OPTIONS
 -------
diff --git a/builtin/rm.c b/builtin/rm.c
index 4858631e0f..e1fe71c321 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -14,6 +14,7 @@
 #include "string-list.h"
 #include "submodule.h"
 #include "pathspec.h"
+#include "sparse-checkout.h"
 
 static const char * const builtin_rm_usage[] = {
 	N_("git rm [<options>] [--] <file>..."),
@@ -254,7 +255,7 @@ static struct option builtin_rm_options[] = {
 int cmd_rm(int argc, const char **argv, const char *prefix)
 {
 	struct lock_file lock_file = LOCK_INIT;
-	int i;
+	int i, sparse_paths_only;
 	struct pathspec pathspec;
 	char *seen;
 
@@ -293,8 +294,12 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
 
 	seen = xcalloc(pathspec.nr, 1);
 
+	sparse_paths_only = restrict_to_sparse_paths(the_repository);
+
 	for (i = 0; i < active_nr; i++) {
 		const struct cache_entry *ce = active_cache[i];
+		if (sparse_paths_only && ce_skip_worktree(ce))
+			continue;
 		if (!ce_path_match(&the_index, ce, &pathspec, seen))
 			continue;
 		ALLOC_GROW(list.entry, list.nr + 1, list.alloc);
diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh
index efec8d13b6..7bf55b42eb 100755
--- a/t/t3600-rm.sh
+++ b/t/t3600-rm.sh
@@ -892,4 +892,26 @@ test_expect_success 'rm empty string should fail' '
 	test_must_fail git rm -rf ""
 '
 
+test_expect_success 'rm should respect --[no]-restrict-to-sparse-paths' '
+	git init sparse-repo &&
+	(
+		cd sparse-repo &&
+		touch a b c &&
+		git add -A &&
+		git commit -m files &&
+		git sparse-checkout set "/a" &&
+
+		# By default, it should not rm paths outside the sparse-checkout
+		test_must_fail git rm b 2>stderr &&
+		test_i18ngrep "fatal: pathspec .b. did not match any files" stderr &&
+
+		# But it should rm them with --no-restrict-to-sparse-paths
+		git --no-restrict-to-sparse-paths rm b &&
+
+		# And also with sparse.restrictCmds=false
+		git reset &&
+		git -c sparse.restrictCmds=false rm b
+	)
+'
+
 test_done
diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh
index 26852586ac..1761a2b1b9 100755
--- a/t/t7011-skip-worktree-reading.sh
+++ b/t/t7011-skip-worktree-reading.sh
@@ -132,11 +132,6 @@ test_expect_success 'diff-files does not examine skip-worktree dirty entries' '
 	test -z "$(git diff-files -- one)"
 '
 
-test_expect_success 'git-rm succeeds on skip-worktree absent entries' '
-	setup_absent &&
-	git rm 1
-'
-
 test_expect_success 'commit on skip-worktree absent entries' '
 	git reset &&
 	setup_absent &&
-- 
2.28.0


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] rm: honor sparse checkout patterns
  2020-11-12 21:01 [PATCH] rm: honor sparse checkout patterns Matheus Tavares
@ 2020-11-12 23:54 ` Elijah Newren
  2020-11-13 13:47   ` Derrick Stolee
  2020-11-16 13:58 ` [PATCH v2] " Matheus Tavares
  2020-11-16 20:14 ` [PATCH] rm: honor sparse checkout patterns Junio C Hamano
  2 siblings, 1 reply; 56+ messages in thread
From: Elijah Newren @ 2020-11-12 23:54 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: Git Mailing List, Derrick Stolee

Hi,

On Thu, Nov 12, 2020 at 1:02 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> Make git-rm honor the 'sparse.restrictCmds' setting, by restricting its
> operation to the paths that match both the command line pathspecs and
> the repository's sparsity patterns. This better matches the expectations
> of users with sparse-checkout definitions, while still allowing them
> to optionally enable the old behavior with 'sparse.restrictCmds=false'
> or the global '--no-restrict-to-sparse-paths' option.

(For Stolee:) Did this arise when a user specified a directory to
delete, and a (possibly small) part of that directory was in the
sparse checkout while other portions of it were outside?

I can easily see users thinking they are dealing with just the files
relevant to them, and expecting the directory deletion to only affect
that relevant subset, so this seems like a great idea.  We'd just want
to make sure we have a good error message if they explicitly list a
single path outside the sparse checkout.

> Suggested-by: Derrick Stolee <stolee@gmail.com>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>
> This is based on mt/grep-sparse-checkout.
> Original feature request: https://github.com/gitgitgadget/git/issues/786
>
>  Documentation/config/sparse.txt  |  3 ++-
>  Documentation/git-rm.txt         |  9 +++++++++
>  builtin/rm.c                     |  7 ++++++-
>  t/t3600-rm.sh                    | 22 ++++++++++++++++++++++
>  t/t7011-skip-worktree-reading.sh |  5 -----
>  5 files changed, 39 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
> index 494761526e..79d7d173e9 100644
> --- a/Documentation/config/sparse.txt
> +++ b/Documentation/config/sparse.txt
> @@ -12,7 +12,8 @@ When this option is true (default), some git commands may limit their behavior
>  to the paths specified by the sparsity patterns, or to the intersection of
>  those paths and any (like `*.c`) that the user might also specify on the
>  command line. When false, the affected commands will work on full trees,
> -ignoring the sparsity patterns. For now, only git-grep honors this setting.
> +ignoring the sparsity patterns. For now, only git-grep and git-rm honor this
> +setting.
>  +
>  Note: commands which export, integrity check, or create history will always
>  operate on full trees (e.g. fast-export, format-patch, fsck, commit, etc.),
> diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
> index ab750367fd..25dda8ff35 100644
> --- a/Documentation/git-rm.txt
> +++ b/Documentation/git-rm.txt
> @@ -25,6 +25,15 @@ When `--cached` is given, the staged content has to
>  match either the tip of the branch or the file on disk,
>  allowing the file to be removed from just the index.
>
> +CONFIGURATION
> +-------------
> +
> +sparse.restrictCmds::
> +       By default, git-rm only matches and removes paths within the
> +       sparse-checkout patterns. This behavior can be changed with the
> +       `sparse.restrictCmds` setting or the global
> +       `--no-restrict-to-sparse-paths` option. For more details, see the
> +       full `sparse.restrictCmds` definition in linkgit:git-config[1].

Hmm, I wonder what people will think who are reading through the
manual and have never used sparse-checkout.  This seems prone to
confusion for them.  Maybe instead we could word this as:

When sparse-checkouts are in use, by default git-rm will only match
and remove paths within the sparse-checkout patterns...

>
>  OPTIONS
>  -------
> diff --git a/builtin/rm.c b/builtin/rm.c
> index 4858631e0f..e1fe71c321 100644
> --- a/builtin/rm.c
> +++ b/builtin/rm.c
> @@ -14,6 +14,7 @@
>  #include "string-list.h"
>  #include "submodule.h"
>  #include "pathspec.h"
> +#include "sparse-checkout.h"
>
>  static const char * const builtin_rm_usage[] = {
>         N_("git rm [<options>] [--] <file>..."),
> @@ -254,7 +255,7 @@ static struct option builtin_rm_options[] = {
>  int cmd_rm(int argc, const char **argv, const char *prefix)
>  {
>         struct lock_file lock_file = LOCK_INIT;
> -       int i;
> +       int i, sparse_paths_only;
>         struct pathspec pathspec;
>         char *seen;
>
> @@ -293,8 +294,12 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
>
>         seen = xcalloc(pathspec.nr, 1);
>
> +       sparse_paths_only = restrict_to_sparse_paths(the_repository);
> +
>         for (i = 0; i < active_nr; i++) {
>                 const struct cache_entry *ce = active_cache[i];
> +               if (sparse_paths_only && ce_skip_worktree(ce))
> +                       continue;
>                 if (!ce_path_match(&the_index, ce, &pathspec, seen))
>                         continue;
>                 ALLOC_GROW(list.entry, list.nr + 1, list.alloc);
> diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh
> index efec8d13b6..7bf55b42eb 100755
> --- a/t/t3600-rm.sh
> +++ b/t/t3600-rm.sh
> @@ -892,4 +892,26 @@ test_expect_success 'rm empty string should fail' '
>         test_must_fail git rm -rf ""
>  '
>
> +test_expect_success 'rm should respect --[no]-restrict-to-sparse-paths' '
> +       git init sparse-repo &&
> +       (
> +               cd sparse-repo &&
> +               touch a b c &&
> +               git add -A &&
> +               git commit -m files &&
> +               git sparse-checkout set "/a" &&
> +
> +               # By default, it should not rm paths outside the sparse-checkout
> +               test_must_fail git rm b 2>stderr &&
> +               test_i18ngrep "fatal: pathspec .b. did not match any files" stderr &&

Ah, this answers my question about whether the user gets an error
message when they explicitly call out a single path outside the sparse
checkout.  I'm curious if we want to be slightly more verbose on the
error message when sparse-checkouts are in effect.  In particular, if
no paths match the sparsity patterns, but some paths would have
matched the pathspec ignoring the sparsity patterns, then perhaps the
error message should include a reference to the
--no-restrict-to-sparse-paths flag.

> +
> +               # But it should rm them with --no-restrict-to-sparse-paths
> +               git --no-restrict-to-sparse-paths rm b &&
> +
> +               # And also with sparse.restrictCmds=false
> +               git reset &&
> +               git -c sparse.restrictCmds=false rm b
> +       )
> +'
> +
>  test_done

Do we also want to include a testcase where the user specifies a
directory and part of that directory is within the sparsity paths and
part is out?  E.g.  'git sparse-checkout set /sub/dir && git rm -r
sub' ?

> diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh
> index 26852586ac..1761a2b1b9 100755
> --- a/t/t7011-skip-worktree-reading.sh
> +++ b/t/t7011-skip-worktree-reading.sh
> @@ -132,11 +132,6 @@ test_expect_success 'diff-files does not examine skip-worktree dirty entries' '
>         test -z "$(git diff-files -- one)"
>  '
>
> -test_expect_success 'git-rm succeeds on skip-worktree absent entries' '
> -       setup_absent &&
> -       git rm 1
> -'
> -
>  test_expect_success 'commit on skip-worktree absent entries' '
>         git reset &&
>         setup_absent &&
> --
> 2.28.0

Sweet, nice and simple.  Thanks for sending this in; I think it'll be very nice.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] rm: honor sparse checkout patterns
  2020-11-12 23:54 ` Elijah Newren
@ 2020-11-13 13:47   ` Derrick Stolee
  2020-11-15 20:12     ` Matheus Tavares Bernardino
  2020-11-16 14:30     ` Jeff Hostetler
  0 siblings, 2 replies; 56+ messages in thread
From: Derrick Stolee @ 2020-11-13 13:47 UTC (permalink / raw)
  To: Elijah Newren, Matheus Tavares; +Cc: Git Mailing List

On 11/12/2020 6:54 PM, Elijah Newren wrote:
> Hi,
> 
> On Thu, Nov 12, 2020 at 1:02 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
>>
>> Make git-rm honor the 'sparse.restrictCmds' setting, by restricting its
>> operation to the paths that match both the command line pathspecs and
>> the repository's sparsity patterns. This better matches the expectations
>> of users with sparse-checkout definitions, while still allowing them
>> to optionally enable the old behavior with 'sparse.restrictCmds=false'
>> or the global '--no-restrict-to-sparse-paths' option.
> 
> (For Stolee:) Did this arise when a user specified a directory to
> delete, and a (possibly small) part of that directory was in the
> sparse checkout while other portions of it were outside?

The user who suggested this used a command like 'git rm */*.csprojx' to
remove all paths with that file extension, but then realized that they
were deleting all of those files from the entire repo, not just the
current sparse-checkout.

> I can easily see users thinking they are dealing with just the files
> relevant to them, and expecting the directory deletion to only affect
> that relevant subset, so this seems like a great idea.  We'd just want
> to make sure we have a good error message if they explicitly list a
> single path outside the sparse checkout.

We should definitely consider how to make this more usable for users
who operate within a sparse-checkout but try to modify files outside
the sparse-checkout.

Is there a warning message such as "the supplied pathspec doesn't
match any known file" that we could extend to recommend possibly
disabling the sparse.restrictCmds config? (I see that you identify
one below.)

>> +CONFIGURATION
>> +-------------
>> +
>> +sparse.restrictCmds::
>> +       By default, git-rm only matches and removes paths within the
>> +       sparse-checkout patterns. This behavior can be changed with the
>> +       `sparse.restrictCmds` setting or the global
>> +       `--no-restrict-to-sparse-paths` option. For more details, see the
>> +       full `sparse.restrictCmds` definition in linkgit:git-config[1].
> 
> Hmm, I wonder what people will think who are reading through the
> manual and have never used sparse-checkout.  This seems prone to
> confusion for them.  Maybe instead we could word this as:
> 
> When sparse-checkouts are in use, by default git-rm will only match
> and remove paths within the sparse-checkout patterns...

A preface such as "When using sparse-checkouts..." can help users
ignore these config settings if they are unfamiliar with the
concept.
>> @@ -293,8 +294,12 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
>>
>>         seen = xcalloc(pathspec.nr, 1);
>>
>> +       sparse_paths_only = restrict_to_sparse_paths(the_repository);
>> +
>>         for (i = 0; i < active_nr; i++) {
>>                 const struct cache_entry *ce = active_cache[i];
>> +               if (sparse_paths_only && ce_skip_worktree(ce))
>> +                       continue;
>>                 if (!ce_path_match(&the_index, ce, &pathspec, seen))
>>                         continue;
>>                 ALLOC_GROW(list.entry, list.nr + 1, list.alloc);

This seems like an incredibly simple implementation! Excellent.

>> +test_expect_success 'rm should respect --[no]-restrict-to-sparse-paths' '
>> +       git init sparse-repo &&
>> +       (
>> +               cd sparse-repo &&
>> +               touch a b c &&
>> +               git add -A &&
>> +               git commit -m files &&
>> +               git sparse-checkout set "/a" &&
>> +
>> +               # By default, it should not rm paths outside the sparse-checkout
>> +               test_must_fail git rm b 2>stderr &&
>> +               test_i18ngrep "fatal: pathspec .b. did not match any files" stderr &&
> 
> Ah, this answers my question about whether the user gets an error
> message when they explicitly call out a single path outside the sparse
> checkout.  I'm curious if we want to be slightly more verbose on the
> error message when sparse-checkouts are in effect.  In particular, if
> no paths match the sparsity patterns, but some paths would have
> matched the pathspec ignoring the sparsity patterns, then perhaps the
> error message should include a reference to the
> --no-restrict-to-sparse-paths flag.

The error message could be modified similar to below:

if (!seen[i]) {
	if (!ignore_unmatch) {
		die(_("pathspec '%s' did not match any files%s"),
			original,
			sparse_paths_only
				? _("; disable sparse.restrictCmds if you intend to edit outside the current sparse-checkout definition")
				: "");
	}
}

>> +
>> +               # But it should rm them with --no-restrict-to-sparse-paths
>> +               git --no-restrict-to-sparse-paths rm b &&
>> +
>> +               # And also with sparse.restrictCmds=false
>> +               git reset &&
>> +               git -c sparse.restrictCmds=false rm b
>> +       )
>> +'
>> +
>>  test_done
> 
> Do we also want to include a testcase where the user specifies a
> directory and part of that directory is within the sparsity paths and
> part is out?  E.g.  'git sparse-checkout set /sub/dir && git rm -r
> sub' ?

That is definitely an interesting case. I'm not sure the current
implementation will do the "right" thing here. Definitely worth
testing, and it might require a more complicated implementation.

>> diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh
>> index 26852586ac..1761a2b1b9 100755
>> --- a/t/t7011-skip-worktree-reading.sh
>> +++ b/t/t7011-skip-worktree-reading.sh
>> @@ -132,11 +132,6 @@ test_expect_success 'diff-files does not examine skip-worktree dirty entries' '
>>         test -z "$(git diff-files -- one)"
>>  '
>>
>> -test_expect_success 'git-rm succeeds on skip-worktree absent entries' '
>> -       setup_absent &&
>> -       git rm 1
>> -'
>> -

Instead of deleting this case, perhaps we should just use "-c sparse.restrictCmds=false"
in the 'git rm' command, so we are still testing this case?

Thanks again! I appreciate that you jumped on this suggestion.

-Stolee


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] rm: honor sparse checkout patterns
  2020-11-13 13:47   ` Derrick Stolee
@ 2020-11-15 20:12     ` Matheus Tavares Bernardino
  2020-11-15 21:42       ` Johannes Sixt
  2020-11-16 14:30     ` Jeff Hostetler
  1 sibling, 1 reply; 56+ messages in thread
From: Matheus Tavares Bernardino @ 2020-11-15 20:12 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Elijah Newren, Git Mailing List

Hi, Stolee and Elijah

Thank you both for the comments. I'll try to send v2 soon.

On Fri, Nov 13, 2020 at 10:47 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 11/12/2020 6:54 PM, Elijah Newren wrote:
> >
> > Do we also want to include a testcase where the user specifies a
> > directory and part of that directory is within the sparsity paths and
> > part is out?  E.g.  'git sparse-checkout set /sub/dir && git rm -r
> > sub' ?
>
> That is definitely an interesting case.

I've added the test [1], but it's failing on Windows and I'm not quite
sure why. The trash dir artifact shows that `git sparse-checkout set
/sub/dir` produced the following path on the sparse-checkout file:
"D:/a/git/git/git-sdk-64-minimal/sub/dir".

If I change the setup cmd to `git sparse-checkout set sub/dir` (i.e.
without the root slash), it works as expected. Could this be a bug, or
am I missing something?

[1]: https://github.com/matheustavares/git/commit/656bffa1793ce86b638d7ad1da2452103ce8b14b#diff-69312bb98fb0cf46e6906e3384c11529f3f04713d331a85d67fc77a3e43944f9R919

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] rm: honor sparse checkout patterns
  2020-11-15 20:12     ` Matheus Tavares Bernardino
@ 2020-11-15 21:42       ` Johannes Sixt
  2020-11-16 12:37         ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 56+ messages in thread
From: Johannes Sixt @ 2020-11-15 21:42 UTC (permalink / raw)
  To: Matheus Tavares Bernardino, Derrick Stolee
  Cc: Elijah Newren, Git Mailing List

Am 15.11.20 um 21:12 schrieb Matheus Tavares Bernardino:
> Thank you both for the comments. I'll try to send v2 soon.
> 
> On Fri, Nov 13, 2020 at 10:47 AM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 11/12/2020 6:54 PM, Elijah Newren wrote:
>>>
>>> Do we also want to include a testcase where the user specifies a
>>> directory and part of that directory is within the sparsity paths and
>>> part is out?  E.g.  'git sparse-checkout set /sub/dir && git rm -r
>>> sub' ?
>>
>> That is definitely an interesting case.
> 
> I've added the test [1], but it's failing on Windows and I'm not quite
> sure why. The trash dir artifact shows that `git sparse-checkout set
> /sub/dir` produced the following path on the sparse-checkout file:
> "D:/a/git/git/git-sdk-64-minimal/sub/dir".

If 'git sparse-checkout' is run from a bash command line, I would not be 
surprised if the absolute path is munched in the way that you observe, 
provided that D:/a/git/git/git-sdk-64-minimal is where your MinGW 
subsystem is located. I that the case?

> If I change the setup cmd to `git sparse-checkout set sub/dir` (i.e.
> without the root slash), it works as expected. Could this be a bug, or
> am I missing something?

Not a bug, unless tell us that D:/a/git/git/git-sdk-64-minimal is a 
completely random path in your system or does not even exist.

-- Hannes

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] rm: honor sparse checkout patterns
  2020-11-15 21:42       ` Johannes Sixt
@ 2020-11-16 12:37         ` Matheus Tavares Bernardino
  2020-11-23 13:23           ` Johannes Schindelin
  0 siblings, 1 reply; 56+ messages in thread
From: Matheus Tavares Bernardino @ 2020-11-16 12:37 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Derrick Stolee, Elijah Newren, Git Mailing List

On Sun, Nov 15, 2020 at 6:42 PM Johannes Sixt <j6t@kdbg.org> wrote:
>
> Am 15.11.20 um 21:12 schrieb Matheus Tavares Bernardino:
> > Thank you both for the comments. I'll try to send v2 soon.
> >
> > On Fri, Nov 13, 2020 at 10:47 AM Derrick Stolee <stolee@gmail.com> wrote:
> >>
> >> On 11/12/2020 6:54 PM, Elijah Newren wrote:
> >>>
> >>> Do we also want to include a testcase where the user specifies a
> >>> directory and part of that directory is within the sparsity paths and
> >>> part is out?  E.g.  'git sparse-checkout set /sub/dir && git rm -r
> >>> sub' ?
> >>
> >> That is definitely an interesting case.
> >
> > I've added the test [1], but it's failing on Windows and I'm not quite
> > sure why. The trash dir artifact shows that `git sparse-checkout set
> > /sub/dir` produced the following path on the sparse-checkout file:
> > "D:/a/git/git/git-sdk-64-minimal/sub/dir".
>
> If 'git sparse-checkout' is run from a bash command line, I would not be
> surprised if the absolute path is munched in the way that you observe,
> provided that D:/a/git/git/git-sdk-64-minimal is where your MinGW
> subsystem is located. I that the case?

Yeah, that must be it, thanks. I didn't run the command myself as I'm
not on Windows, but D:/a/git/git/git-sdk-64-minimal must be the path
where MinGW was installed by our GitHub Actions script, then. I'll use
"sub/dir" without the root slash in t3600 to avoid the conversion.
Thanks again!

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2] rm: honor sparse checkout patterns
  2020-11-12 21:01 [PATCH] rm: honor sparse checkout patterns Matheus Tavares
  2020-11-12 23:54 ` Elijah Newren
@ 2020-11-16 13:58 ` Matheus Tavares
  2021-02-17 21:02   ` [RFC PATCH 0/7] add/rm: honor sparse checkout and warn on sparse paths Matheus Tavares
  2020-11-16 20:14 ` [PATCH] rm: honor sparse checkout patterns Junio C Hamano
  2 siblings, 1 reply; 56+ messages in thread
From: Matheus Tavares @ 2020-11-16 13:58 UTC (permalink / raw)
  To: git; +Cc: stolee, newren

Make git-rm honor the 'sparse.restrictCmds' setting, by restricting its
operation to the paths that match both the command line pathspecs and
the repository's sparsity patterns. This better matches the expectations
of users with sparse-checkout definitions, while still allowing them
to optionally enable the old behavior with 'sparse.restrictCmds=false'
or the global '--no-restrict-to-sparse-paths' option.

Suggested-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---

Based on mt/grep-sparse-checkout.

Changes since v1:
- Reworded git-rm docs to avoid confusion for those who never used
  sparse-checkout.
- Included an advice about disabling sparse.restrictCmds when the
  given pathspec doesn't match any files and sparse-checkout is enabled.
- Added test for `git rm -r` removing a dir that is only partially
  included in the sparse-checkout.
- Adjusted test in t7011 to use `-c sparse.restrictCmds=false`, instead
  of removing it.

 Documentation/config/sparse.txt  |  3 +-
 Documentation/git-rm.txt         | 10 +++++++
 builtin/rm.c                     | 24 ++++++++++------
 t/t3600-rm.sh                    | 47 ++++++++++++++++++++++++++++++++
 t/t7011-skip-worktree-reading.sh |  4 +--
 5 files changed, 77 insertions(+), 11 deletions(-)

diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
index 494761526e..79d7d173e9 100644
--- a/Documentation/config/sparse.txt
+++ b/Documentation/config/sparse.txt
@@ -12,7 +12,8 @@ When this option is true (default), some git commands may limit their behavior
 to the paths specified by the sparsity patterns, or to the intersection of
 those paths and any (like `*.c`) that the user might also specify on the
 command line. When false, the affected commands will work on full trees,
-ignoring the sparsity patterns. For now, only git-grep honors this setting.
+ignoring the sparsity patterns. For now, only git-grep and git-rm honor this
+setting.
 +
 Note: commands which export, integrity check, or create history will always
 operate on full trees (e.g. fast-export, format-patch, fsck, commit, etc.),
diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
index ab750367fd..33bec8c249 100644
--- a/Documentation/git-rm.txt
+++ b/Documentation/git-rm.txt
@@ -25,6 +25,16 @@ When `--cached` is given, the staged content has to
 match either the tip of the branch or the file on disk,
 allowing the file to be removed from just the index.
 
+CONFIGURATION
+-------------
+
+sparse.restrictCmds::
+	When sparse-checkouts are in use, by default git-rm will only
+	match and remove paths within the sparse-checkout patterns.
+	This behavior can be changed with the `sparse.restrictCmds`
+	setting or the global `--no-restrict-to-sparse-paths` option.
+	For more details, see the full `sparse.restrictCmds` definition
+	in linkgit:git-config[1].
 
 OPTIONS
 -------
diff --git a/builtin/rm.c b/builtin/rm.c
index 4858631e0f..90f6bb4cae 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -14,6 +14,7 @@
 #include "string-list.h"
 #include "submodule.h"
 #include "pathspec.h"
+#include "sparse-checkout.h"
 
 static const char * const builtin_rm_usage[] = {
 	N_("git rm [<options>] [--] <file>..."),
@@ -254,7 +255,7 @@ static struct option builtin_rm_options[] = {
 int cmd_rm(int argc, const char **argv, const char *prefix)
 {
 	struct lock_file lock_file = LOCK_INIT;
-	int i;
+	int i, sparse_paths_only;
 	struct pathspec pathspec;
 	char *seen;
 
@@ -293,8 +294,13 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
 
 	seen = xcalloc(pathspec.nr, 1);
 
+	sparse_paths_only = core_apply_sparse_checkout &&
+			    restrict_to_sparse_paths(the_repository);
+
 	for (i = 0; i < active_nr; i++) {
 		const struct cache_entry *ce = active_cache[i];
+		if (sparse_paths_only && ce_skip_worktree(ce))
+			continue;
 		if (!ce_path_match(&the_index, ce, &pathspec, seen))
 			continue;
 		ALLOC_GROW(list.entry, list.nr + 1, list.alloc);
@@ -310,14 +316,16 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
 		int seen_any = 0;
 		for (i = 0; i < pathspec.nr; i++) {
 			original = pathspec.items[i].original;
-			if (!seen[i]) {
-				if (!ignore_unmatch) {
-					die(_("pathspec '%s' did not match any files"),
-					    original);
-				}
-			}
-			else {
+			if (seen[i]) {
 				seen_any = 1;
+			} else if (!ignore_unmatch) {
+				const char *sparse_config_advice =
+					_("; disable sparse.restrictCmds if you intend to edit"
+					  " outside the current sparse-checkout definition");
+
+				die(_("pathspec '%s' did not match any files%s"),
+				    original,
+				    sparse_paths_only ? sparse_config_advice : "");
 			}
 			if (!recursive && seen[i] == MATCHED_RECURSIVELY)
 				die(_("not removing '%s' recursively without -r"),
diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh
index efec8d13b6..25cd7187fa 100755
--- a/t/t3600-rm.sh
+++ b/t/t3600-rm.sh
@@ -892,4 +892,51 @@ test_expect_success 'rm empty string should fail' '
 	test_must_fail git rm -rf ""
 '
 
+test_expect_success 'rm should respect --[no]-restrict-to-sparse-paths' '
+	git init sparse-repo &&
+	(
+		cd sparse-repo &&
+		touch a b c &&
+		git add -A &&
+		git commit -m files &&
+		git sparse-checkout set "/a" &&
+
+		# By default, it should not rm paths outside the sparse-checkout
+		test_must_fail git rm b 2>stderr &&
+		test_i18ngrep "fatal: pathspec .b. did not match any files" stderr &&
+		test_i18ngrep "disable sparse.restrictCmds if you intend to edit outside" stderr &&
+
+		# But it should rm them with --no-restrict-to-sparse-paths
+		git --no-restrict-to-sparse-paths rm b &&
+
+		# And also with sparse.restrictCmds=false
+		git reset &&
+		git -c sparse.restrictCmds=false rm b
+	)
+'
+
+test_expect_success 'recursive rm should respect --[no]-restrict-to-sparse-paths' '
+	git init sparse-repo-2 &&
+	(
+		cd sparse-repo-2 &&
+		mkdir -p sub/dir &&
+		touch sub/f1 sub/dir/f2 &&
+		git add -A &&
+		git commit -m files &&
+		git sparse-checkout set "sub/dir" &&
+
+		git rm -r sub &&
+		echo "D  sub/dir/f2" >expected &&
+		git status --porcelain -uno >actual &&
+		test_cmp expected actual &&
+
+		git reset &&
+		git --no-restrict-to-sparse-paths rm -r sub &&
+		echo "D  sub/dir/f2" >expected-no-restrict &&
+		echo "D  sub/f1"     >>expected-no-restrict &&
+		git status --porcelain -uno >actual-no-restrict &&
+		test_cmp expected-no-restrict actual-no-restrict
+	)
+'
+
 test_done
diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh
index 26852586ac..08ede90e14 100755
--- a/t/t7011-skip-worktree-reading.sh
+++ b/t/t7011-skip-worktree-reading.sh
@@ -132,9 +132,9 @@ test_expect_success 'diff-files does not examine skip-worktree dirty entries' '
 	test -z "$(git diff-files -- one)"
 '
 
-test_expect_success 'git-rm succeeds on skip-worktree absent entries' '
+test_expect_success 'git-rm succeeds on skip-worktree absent entries when sparse.restrictCmds=false' '
 	setup_absent &&
-	git rm 1
+	git -c sparse.restrictCmds=false rm 1
 '
 
 test_expect_success 'commit on skip-worktree absent entries' '
-- 
2.28.0


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] rm: honor sparse checkout patterns
  2020-11-13 13:47   ` Derrick Stolee
  2020-11-15 20:12     ` Matheus Tavares Bernardino
@ 2020-11-16 14:30     ` Jeff Hostetler
  2020-11-17  4:53       ` Elijah Newren
  1 sibling, 1 reply; 56+ messages in thread
From: Jeff Hostetler @ 2020-11-16 14:30 UTC (permalink / raw)
  To: Derrick Stolee, Elijah Newren, Matheus Tavares; +Cc: Git Mailing List



On 11/13/20 8:47 AM, Derrick Stolee wrote:
> On 11/12/2020 6:54 PM, Elijah Newren wrote:
>> Hi,
>>
>> On Thu, Nov 12, 2020 at 1:02 PM Matheus Tavares
>> <matheus.bernardino@usp.br> wrote:
>>>
>>> Make git-rm honor the 'sparse.restrictCmds' setting, by restricting its
>>> operation to the paths that match both the command line pathspecs and
>>> the repository's sparsity patterns. This better matches the expectations
>>> of users with sparse-checkout definitions, while still allowing them
>>> to optionally enable the old behavior with 'sparse.restrictCmds=false'
>>> or the global '--no-restrict-to-sparse-paths' option.
>>
>> (For Stolee:) Did this arise when a user specified a directory to
>> delete, and a (possibly small) part of that directory was in the
>> sparse checkout while other portions of it were outside?
> 
> The user who suggested this used a command like 'git rm */*.csprojx' to
> remove all paths with that file extension, but then realized that they
> were deleting all of those files from the entire repo, not just the
> current sparse-checkout.

Aren't the wildcards expanded by the shell before the command
line is given to Git?  So the Git command should only receive
command line args that actually match existing files, right??

Jeff

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] rm: honor sparse checkout patterns
  2020-11-12 21:01 [PATCH] rm: honor sparse checkout patterns Matheus Tavares
  2020-11-12 23:54 ` Elijah Newren
  2020-11-16 13:58 ` [PATCH v2] " Matheus Tavares
@ 2020-11-16 20:14 ` Junio C Hamano
  2020-11-17  5:20   ` Elijah Newren
  2 siblings, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2020-11-16 20:14 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, stolee, newren

Matheus Tavares <matheus.bernardino@usp.br> writes:

> Make git-rm honor the 'sparse.restrictCmds' setting, by restricting its
> operation to the paths that match both the command line pathspecs and
> the repository's sparsity patterns.

> This better matches the expectations
> of users with sparse-checkout definitions, while still allowing them
> to optionally enable the old behavior with 'sparse.restrictCmds=false'
> or the global '--no-restrict-to-sparse-paths' option.

Hmph.  Is "rm" the only oddball that ignores the sparse setting?

>  to the paths specified by the sparsity patterns, or to the intersection of
>  those paths and any (like `*.c`) that the user might also specify on the
>  command line. When false, the affected commands will work on full trees,
> -ignoring the sparsity patterns. For now, only git-grep honors this setting.
> +ignoring the sparsity patterns. For now, only git-grep and git-rm honor this
> +setting.

I am not sure if this is a good direction to go---can we make an
inventory of all commands that affect working tree files and see
which ones need the same treatment before going forward with just
"grep" and "rm"?  Documenting the decision on the ones that will not
get the same treatment may also be a good idea.  What I am aiming
for is to prevent users from having to know in which versions of Git
they can rely on the sparsity patterns with what commands, and doing
things piecemeal like these two topics would be a road to confusion.

Thanks.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] rm: honor sparse checkout patterns
  2020-11-16 14:30     ` Jeff Hostetler
@ 2020-11-17  4:53       ` Elijah Newren
  0 siblings, 0 replies; 56+ messages in thread
From: Elijah Newren @ 2020-11-17  4:53 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: Derrick Stolee, Matheus Tavares, Git Mailing List

On Mon, Nov 16, 2020 at 6:30 AM Jeff Hostetler <git@jeffhostetler.com> wrote:
>
> On 11/13/20 8:47 AM, Derrick Stolee wrote:
> > On 11/12/2020 6:54 PM, Elijah Newren wrote:
> >> Hi,
> >>
> >> On Thu, Nov 12, 2020 at 1:02 PM Matheus Tavares
> >> <matheus.bernardino@usp.br> wrote:
> >>>
> >>> Make git-rm honor the 'sparse.restrictCmds' setting, by restricting its
> >>> operation to the paths that match both the command line pathspecs and
> >>> the repository's sparsity patterns. This better matches the expectations
> >>> of users with sparse-checkout definitions, while still allowing them
> >>> to optionally enable the old behavior with 'sparse.restrictCmds=false'
> >>> or the global '--no-restrict-to-sparse-paths' option.
> >>
> >> (For Stolee:) Did this arise when a user specified a directory to
> >> delete, and a (possibly small) part of that directory was in the
> >> sparse checkout while other portions of it were outside?
> >
> > The user who suggested this used a command like 'git rm */*.csprojx' to
> > remove all paths with that file extension, but then realized that they
> > were deleting all of those files from the entire repo, not just the
> > current sparse-checkout.
>
> Aren't the wildcards expanded by the shell before the command
> line is given to Git?  So the Git command should only receive
> command line args that actually match existing files, right??

Good point.  I suspect, though, that the issue may still be a problem
if the user were to quote the wildcards; that may have been what
happened and the reporting of the case just lost them somewhere along
the way.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] rm: honor sparse checkout patterns
  2020-11-16 20:14 ` [PATCH] rm: honor sparse checkout patterns Junio C Hamano
@ 2020-11-17  5:20   ` Elijah Newren
  2020-11-20 17:06     ` Elijah Newren
  0 siblings, 1 reply; 56+ messages in thread
From: Elijah Newren @ 2020-11-17  5:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matheus Tavares, Git Mailing List, Derrick Stolee

Hi,

On Mon, Nov 16, 2020 at 12:14 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares <matheus.bernardino@usp.br> writes:
>
> > Make git-rm honor the 'sparse.restrictCmds' setting, by restricting its
> > operation to the paths that match both the command line pathspecs and
> > the repository's sparsity patterns.
>
> > This better matches the expectations
> > of users with sparse-checkout definitions, while still allowing them
> > to optionally enable the old behavior with 'sparse.restrictCmds=false'
> > or the global '--no-restrict-to-sparse-paths' option.
>
> Hmph.  Is "rm" the only oddball that ignores the sparse setting?

This might make you much less happy, but in general none of the
commands pay attention to the setting; I think a line or two in
merge-recursive.c is the only part of the codebase outside of
unpack_trees() that pays any attention to it at all.  This was noted
as a problem in the initial review of the sparse-checkout series at
[1], and was the biggest factor behind me requesting the following
being added to the manpage for sparse-checkout[2]:

THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER
COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN
THE FUTURE.

However, multiple groups were using sparse checkouts anyway, via
manually editing .git/info/sparse-checkout and running `git read-tree
-mu HEAD`, and adding various wrappers around it, and Derrick and I
thought there was value in getting _something_ out there to smooth it
out a little bit.  I'd still say it's pretty rough around the
edges...but useful nonetheless.

[1] https://lore.kernel.org/git/CABPp-BHJeuEHBDkf93m9sfSZ4rZB7+eFejiAXOsjLEUu5eT5FA@mail.gmail.com/
[2] https://lore.kernel.org/git/CABPp-BEryfaeYhuUsiDTaYdRKpK6GRi7hgZ5XSTVkoHVkx2qQA@mail.gmail.com/

> >  to the paths specified by the sparsity patterns, or to the intersection of
> >  those paths and any (like `*.c`) that the user might also specify on the
> >  command line. When false, the affected commands will work on full trees,
> > -ignoring the sparsity patterns. For now, only git-grep honors this setting.
> > +ignoring the sparsity patterns. For now, only git-grep and git-rm honor this
> > +setting.
>
> I am not sure if this is a good direction to go---can we make an
> inventory of all commands that affect working tree files and see
> which ones need the same treatment before going forward with just
> "grep" and "rm"?  Documenting the decision on the ones that will not
> get the same treatment may also be a good idea.  What I am aiming
> for is to prevent users from having to know in which versions of Git
> they can rely on the sparsity patterns with what commands, and doing
> things piecemeal like these two topics would be a road to confusion.

It's not just commands which affect the working tree that need to be
inventoried and adjusted.  We've made lists of commands in the past:

[3] https://lore.kernel.org/git/CABPp-BEbNCYk0pCuEDQ_ViB2=varJPBsVODxNvJs0EVRyBqjBg@mail.gmail.com/
[4] https://lore.kernel.org/git/xmqqy2y3ejwe.fsf@gitster-ct.c.googlers.com/

But the working-directory related ones are perhaps more problematic.
One additoinal example: I just got a report today that "git stash
apply" dies with a fatal error and the working directory in some
intermediate state when trying to apply a stash when the working
directory has a different set of sparsity paths than when the stash
was created.  (Granted, an error makes sense, but this was throwing
untranslated error messages, meaning they weren't in codepaths that
were meant to be triggered.)  This case may not be an apples to apples
comparison, but the testcase did involve adding new files before
stashing, so the stash apply would have been trying to remove files.
Anyway, I'll send more details on that issue in a separate thread
after I've had some time to dig into it.


Anyway, I'm not sure this helps, because I'm basically saying things
are kind of messy, and we're fixing as we go rather than having a full
implementation and all the fixes.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] rm: honor sparse checkout patterns
  2020-11-17  5:20   ` Elijah Newren
@ 2020-11-20 17:06     ` Elijah Newren
  2020-12-31 20:03       ` sparse-checkout questions and proposals [Was: Re: [PATCH] rm: honor sparse checkout patterns] Elijah Newren
  0 siblings, 1 reply; 56+ messages in thread
From: Elijah Newren @ 2020-11-20 17:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matheus Tavares, Git Mailing List, Derrick Stolee

Hi,

On Mon, Nov 16, 2020 at 9:20 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Mon, Nov 16, 2020 at 12:14 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Matheus Tavares <matheus.bernardino@usp.br> writes:
> >
> > > Make git-rm honor the 'sparse.restrictCmds' setting, by restricting its
> > > operation to the paths that match both the command line pathspecs and
> > > the repository's sparsity patterns.
> >
> > > This better matches the expectations
> > > of users with sparse-checkout definitions, while still allowing them
> > > to optionally enable the old behavior with 'sparse.restrictCmds=false'
> > > or the global '--no-restrict-to-sparse-paths' option.
> >
> > Hmph.  Is "rm" the only oddball that ignores the sparse setting?
>
> This might make you much less happy, but in general none of the
> commands pay attention to the setting; I think a line or two in

This isn't quite right; as noted at the just submitted [1], there are
three different classes of ways that existing commands at least
partially pay attention to the setting.

[1] https://lore.kernel.org/git/5143cba7047d25137b3d7f8c7811a875c1931aee.1605891222.git.gitgitgadget@gmail.com/

> merge-recursive.c is the only part of the codebase outside of
> unpack_trees() that pays any attention to it at all.  This was noted
> as a problem in the initial review of the sparse-checkout series at
> [1], and was the biggest factor behind me requesting the following
> being added to the manpage for sparse-checkout[2]:
>
> THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER
> COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN
> THE FUTURE.

The fact that commands have only somewhat paid attention to this
setting is still a problem, though.  In fact, it was apparently a
known problem as far back as 2009 just from looking at the short list
of TODOs at the end of that file.

> > >  to the paths specified by the sparsity patterns, or to the intersection of
> > >  those paths and any (like `*.c`) that the user might also specify on the
> > >  command line. When false, the affected commands will work on full trees,
> > > -ignoring the sparsity patterns. For now, only git-grep honors this setting.
> > > +ignoring the sparsity patterns. For now, only git-grep and git-rm honor this
> > > +setting.
> >
> > I am not sure if this is a good direction to go---can we make an
> > inventory of all commands that affect working tree files and see
> > which ones need the same treatment before going forward with just
> > "grep" and "rm"?  Documenting the decision on the ones that will not
> > get the same treatment may also be a good idea.  What I am aiming
> > for is to prevent users from having to know in which versions of Git
> > they can rely on the sparsity patterns with what commands, and doing
> > things piecemeal like these two topics would be a road to confusion.
>
> It's not just commands which affect the working tree that need to be
> inventoried and adjusted.  We've made lists of commands in the past:
>
> [3] https://lore.kernel.org/git/CABPp-BEbNCYk0pCuEDQ_ViB2=varJPBsVODxNvJs0EVRyBqjBg@mail.gmail.com/
> [4] https://lore.kernel.org/git/xmqqy2y3ejwe.fsf@gitster-ct.c.googlers.com/

So, I think there are a few other commands that need to be modified
the same way rm is here by Matheus, a longer list of commands than
what I previously linked to for other modifications, some warnings and
error messages that need to be cleaned up, and a fair amount of
additional testing needed.  I also think we need to revisit the flag
names for --restrict-to-sparse-paths and
--no-restrict-to-sparse-paths; some feedback I'm getting suggest they
might be more frequently used than I originally suspected and thus we
might want shorter names.  (--sparse and --dense?)  So we probably
want to wait off on both mt/grep-sparse-checkout and
mt/rm-sparse-checkout (sorry Matheus) and maybe my recently submitted
stash changes (though those don't have an exposed
--[no]-restrict-to-sparse-paths flag and are modelled on existing
merge behavior) until we have a bigger plan in place.

But I only dug into it a bit while working on the stash apply bug; I'm
going to dig more (probably just after Thanksgiving) and perhaps make
a Documentation/technical/ file of some sort to propose more plans
here.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] rm: honor sparse checkout patterns
  2020-11-16 12:37         ` Matheus Tavares Bernardino
@ 2020-11-23 13:23           ` Johannes Schindelin
  2020-11-24  2:48             ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 56+ messages in thread
From: Johannes Schindelin @ 2020-11-23 13:23 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: Johannes Sixt, Derrick Stolee, Elijah Newren, Git Mailing List

Hi Matheus,

On Mon, 16 Nov 2020, Matheus Tavares Bernardino wrote:

> On Sun, Nov 15, 2020 at 6:42 PM Johannes Sixt <j6t@kdbg.org> wrote:
> >
> > Am 15.11.20 um 21:12 schrieb Matheus Tavares Bernardino:
> > > Thank you both for the comments. I'll try to send v2 soon.
> > >
> > > On Fri, Nov 13, 2020 at 10:47 AM Derrick Stolee <stolee@gmail.com> wrote:
> > >>
> > >> On 11/12/2020 6:54 PM, Elijah Newren wrote:
> > >>>
> > >>> Do we also want to include a testcase where the user specifies a
> > >>> directory and part of that directory is within the sparsity paths and
> > >>> part is out?  E.g.  'git sparse-checkout set /sub/dir && git rm -r
> > >>> sub' ?
> > >>
> > >> That is definitely an interesting case.
> > >
> > > I've added the test [1], but it's failing on Windows and I'm not quite
> > > sure why. The trash dir artifact shows that `git sparse-checkout set
> > > /sub/dir` produced the following path on the sparse-checkout file:
> > > "D:/a/git/git/git-sdk-64-minimal/sub/dir".
> >
> > If 'git sparse-checkout' is run from a bash command line, I would not be
> > surprised if the absolute path is munched in the way that you observe,
> > provided that D:/a/git/git/git-sdk-64-minimal is where your MinGW
> > subsystem is located. I that the case?
>
> Yeah, that must be it, thanks. I didn't run the command myself as I'm
> not on Windows, but D:/a/git/git/git-sdk-64-minimal must be the path
> where MinGW was installed by our GitHub Actions script, then. I'll use
> "sub/dir" without the root slash in t3600 to avoid the conversion.
> Thanks again!

In the `windows-test` job, the construct `$(pwd)` will give you the
Windows form (`D:/a/git/git/git-sdk-64-minimal`) whereas the `$PWD` form
will give you the Unix-y form (`/`). What form to use depends on the
context (if the absolute path comes from a shell script, the Unix-y form,
if the absolute path comes from `git.exe` itself, the Windows form).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] rm: honor sparse checkout patterns
  2020-11-23 13:23           ` Johannes Schindelin
@ 2020-11-24  2:48             ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 56+ messages in thread
From: Matheus Tavares Bernardino @ 2020-11-24  2:48 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Johannes Sixt, Derrick Stolee, Elijah Newren, Git Mailing List

On Mon, Nov 23, 2020 at 10:23 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Matheus,
>
> On Mon, 16 Nov 2020, Matheus Tavares Bernardino wrote:
>
> > On Sun, Nov 15, 2020 at 6:42 PM Johannes Sixt <j6t@kdbg.org> wrote:
> > >
> > > Am 15.11.20 um 21:12 schrieb Matheus Tavares Bernardino:
> > > > Thank you both for the comments. I'll try to send v2 soon.
> > > >
> > > > On Fri, Nov 13, 2020 at 10:47 AM Derrick Stolee <stolee@gmail.com> wrote:
> > > >>
> > > >> On 11/12/2020 6:54 PM, Elijah Newren wrote:
> > > >>>
> > > >>> Do we also want to include a testcase where the user specifies a
> > > >>> directory and part of that directory is within the sparsity paths and
> > > >>> part is out?  E.g.  'git sparse-checkout set /sub/dir && git rm -r
> > > >>> sub' ?
> > > >>
> > > >> That is definitely an interesting case.
> > > >
> > > > I've added the test [1], but it's failing on Windows and I'm not quite
> > > > sure why. The trash dir artifact shows that `git sparse-checkout set
> > > > /sub/dir` produced the following path on the sparse-checkout file:
> > > > "D:/a/git/git/git-sdk-64-minimal/sub/dir".
> > >
> > > If 'git sparse-checkout' is run from a bash command line, I would not be
> > > surprised if the absolute path is munched in the way that you observe,
> > > provided that D:/a/git/git/git-sdk-64-minimal is where your MinGW
> > > subsystem is located. I that the case?
> >
> > Yeah, that must be it, thanks. I didn't run the command myself as I'm
> > not on Windows, but D:/a/git/git/git-sdk-64-minimal must be the path
> > where MinGW was installed by our GitHub Actions script, then. I'll use
> > "sub/dir" without the root slash in t3600 to avoid the conversion.
> > Thanks again!
>
> In the `windows-test` job, the construct `$(pwd)` will give you the
> Windows form (`D:/a/git/git/git-sdk-64-minimal`) whereas the `$PWD` form
> will give you the Unix-y form (`/`). What form to use depends on the
> context (if the absolute path comes from a shell script, the Unix-y form,
> if the absolute path comes from `git.exe` itself, the Windows form).

Got it, thanks for the explanation!

^ permalink raw reply	[flat|nested] 56+ messages in thread

* sparse-checkout questions and proposals [Was: Re: [PATCH] rm: honor sparse checkout patterns]
  2020-11-20 17:06     ` Elijah Newren
@ 2020-12-31 20:03       ` Elijah Newren
  2021-01-04  3:02         ` Derrick Stolee
  0 siblings, 1 reply; 56+ messages in thread
From: Elijah Newren @ 2020-12-31 20:03 UTC (permalink / raw)
  To: Matheus Tavares Bernardino, Derrick Stolee
  Cc: Git Mailing List, Junio C Hamano

Hi,

Sorry for the long delay...

On Fri, Nov 20, 2020 at 9:06 AM Elijah Newren <newren@gmail.com> wrote:
> On Mon, Nov 16, 2020 at 9:20 PM Elijah Newren <newren@gmail.com> wrote:
> > On Mon, Nov 16, 2020 at 12:14 PM Junio C Hamano <gitster@pobox.com> wrote:
> > > Matheus Tavares <matheus.bernardino@usp.br> writes:
> > >
> > > > Make git-rm honor the 'sparse.restrictCmds' setting, by restricting its
> > > > operation to the paths that match both the command line pathspecs and
> > > > the repository's sparsity patterns.
> > >
> > > > This better matches the expectations
> > > > of users with sparse-checkout definitions, while still allowing them
> > > > to optionally enable the old behavior with 'sparse.restrictCmds=false'
> > > > or the global '--no-restrict-to-sparse-paths' option.
> > >
> > > Hmph.  Is "rm" the only oddball that ignores the sparse setting?

> > > >  to the paths specified by the sparsity patterns, or to the intersection of
> > > >  those paths and any (like `*.c`) that the user might also specify on the
> > > >  command line. When false, the affected commands will work on full trees,
> > > > -ignoring the sparsity patterns. For now, only git-grep honors this setting.
> > > > +ignoring the sparsity patterns. For now, only git-grep and git-rm honor this
> > > > +setting.
> > >
> > > I am not sure if this is a good direction to go---can we make an
> > > inventory of all commands that affect working tree files and see
> > > which ones need the same treatment before going forward with just
> > > "grep" and "rm"?  Documenting the decision on the ones that will not
> > > get the same treatment may also be a good idea.  What I am aiming
> > > for is to prevent users from having to know in which versions of Git
> > > they can rely on the sparsity patterns with what commands, and doing
> > > things piecemeal like these two topics would be a road to confusion.
> >
> > It's not just commands which affect the working tree that need to be
> > inventoried and adjusted.  We've made lists of commands in the past:
> >
> > [3] https://lore.kernel.org/git/CABPp-BEbNCYk0pCuEDQ_ViB2=varJPBsVODxNvJs0EVRyBqjBg@mail.gmail.com/
> > [4] https://lore.kernel.org/git/xmqqy2y3ejwe.fsf@gitster-ct.c.googlers.com/
>
> So, I think there are a few other commands that need to be modified
> the same way rm is here by Matheus, a longer list of commands than
> what I previously linked to for other modifications, some warnings and
> error messages that need to be cleaned up, and a fair amount of
> additional testing needed.  I also think we need to revisit the flag
> names for --restrict-to-sparse-paths and
> --no-restrict-to-sparse-paths; some feedback I'm getting suggest they
> might be more frequently used than I originally suspected and thus we
> might want shorter names.  (--sparse and --dense?)  So we probably
> want to wait off on both mt/grep-sparse-checkout and
> mt/rm-sparse-checkout (sorry Matheus) and maybe my recently submitted
> stash changes (though those don't have an exposed
> --[no]-restrict-to-sparse-paths flag and are modelled on existing
> merge behavior) until we have a bigger plan in place.
>
> But I only dug into it a bit while working on the stash apply bug; I'm
> going to dig more (probably just after Thanksgiving) and perhaps make
> a Documentation/technical/ file of some sort to propose more plans
> here.

I apologize, this email is *very* lengthy but I have a summary up-front.
This email includes:

  * questions
  * short term proposals for unsticking sparse-related topics
    (en/stash-apply-sparse-checkout, mt/rm-sparse-checkout, and
    mt/grep-sparse-checkout)
  * longer term proposals for continued sparse-checkout work.

However, the core thing to get across is my main question, contained in
the next four lines:

  sparse-checkout's purpose is not fully defined.  Does it exist to:
    A) allow working on a subset of the repository?
    B) allow working with a subset of the repository checked out?
    C) something else?


You can think of (A) as marrying partial clones and sparse checkouts,
and trying to make the result feel like a smaller repository.  That
means that grep, diff, log, etc. cull output unrelated to your sparsity
paths.  (B) is treating the repo as dense history (so grep, diff, log do
not cull output), but the working directory sparse.  In my view, git
still doesn't (yet) provide either of these.

=== Why it matters ==

There are unfortunately *many* gray areas when you try to define how git
subcommands should behave in sparse-checkouts.  (The
implementation-level definition from a decade ago of "files are assumed
to be unchanged from HEAD when SKIP_WORKTREE is set, and we remove files
with that bit set from the working directory" definition from the past
provides no clear vision about how to resolve gray areas, and also leads
to various inconsistencies and surprises for users.)  I believe a
definition based around a usecase (or usecases) for the purpose of
sparse-checkouts would remove most of the gray areas.

Are there choices other than A & B that I proposed above that make
sense?  Traditionally, I thought of B as just a partial implementation
of A, and that A was the desired end-goal.  However, others have argued
for B as a preferred choice (some users at $DAYJOB even want both A and
B, meaning they'd like a simple short flag to switch between the two).
There may be others I'm unaware of.

git implements neither A nor B.  It might be nice to think of git's
current behavior as a partial implementation of B (enough to provide
some value, but still feel buggy/incomplete), and that after finishing B
we could add more work to allow A.  I'm not sure if the current
implementation is just a subset of B, though.

Let's dig in...

=== sparse-checkout demonstration -- diff, status, add, clean, rm ===

$ git init testing && cd testing
$ echo tracked >tracked
$ echo tracked-but-maybe-skipped >tracked-but-maybe-skipped
$ git add .
$ git commit -m initial
$ echo tracked-but-maybe-skipped-v2 >tracked-but-maybe-skipped
$ git commit -am second
$ echo tracked-but-maybe-skipped-v3 >tracked-but-maybe-skipped
$ git commit -am third

### In a non-sparse checkout...

$ ls -1
tracked
tracked-but-maybe-skipped
$ echo changed >tracked-but-maybe-skipped    # modify the file
$ git diff --stat                            # diff shows the change
 tracked-but-maybe-skipped | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git status --porcelain                     # status shows the change
 M tracked-but-maybe-skipped
$ git add tracked-but-maybe-skipped          # add stages the change
$ git status --porcelain                     # status shows the staged change
M  tracked-but-maybe-skipped
$ git clean -f tracked-but-maybe-skipped     # clean ignores it, it's tracked
$ git rm -f tracked-but-maybe-skipped        # rm removes the file
rm 'tracked-but-maybe-skipped'
$ git status --porcelain                     # status shows the removal
D  tracked-but-maybe-skipped
$ git reset --hard                           # undo all changes

### Compared to a sparse-checkout...

$ git sparse-checkout set tracked            # sparse-checkout...
$ ls -1                                      # ...removes non-matches
tracked
$ echo changed >tracked-but-maybe-skipped    # put the file back, modified
$ ls -1
tracked
tracked-but-maybe-skipped
$ git diff --stat                            # diff ignores changed file
$ git status --porcelain                     # status ignores changed file
$ git add tracked-but-maybe-skipped          # add...
$ git status --porcelain                     # ...also ignores changed file
$ git clean -f tracked-but-maybe-skipped     # maybe it's untracked?  but...
$ ls -1                                      # ...clean also ignores it
tracked
tracked-but-maybe-skipped
$ git rm -f tracked-but-maybe-skipped        # rm doesn't?!
rm 'tracked-but-maybe-skipped'
$ git status --porcelain
D  tracked-but-maybe-skipped
$ git reset --hard                           # undo changes, re-sparsify


With a direct filename some might question the behavior of add.
However, note here that add & rm could have used a glob such as '*.c',
or a directory like 'builtin/'.  In such a case, the add behavior seems
reasonable (though perhaps a warning would be nice if no paths end up
matching the pathspec, much like it does if you specify `git add
mistyped-filename`) and the rm behavior is quite dangerous.


=== more sparse-checkout discussion -- behavior A vs. B with history ===

Note that other forms of checkout/restore will also ignore paths that do
not match sparsity patterns:

$ git checkout HEAD tracked-but-maybe-skipped
error: pathspec 'tracked-but-maybe-skipped' did not match any file(s)
known to git
$ git restore --source=HEAD tracked-but-maybe-skipped
error: pathspec 'tracked-but-maybe-skipped' did not match any file(s)
known to git

The error message is suboptimal, but seems like this otherwise gives
desirable behavior as we want the checkout to be sparse.  If a user had
specified a glob or a directory, we'd only want to match the portion of
that glob or directory associated with the sparsity patterns.

Unfortunately, this behavior changes once you specify a different
version -- it actively not only ignores the sparse-checkout
specification but clears the SKIP_WORKTREE bit:

$ git checkout HEAD~1 tracked-but-maybe-skipped
Updated 1 path from 58916d9
$ ls -1
tracked
tracked-but-maybe-skipped
$ git ls-files -t
H tracked
H tracked-but-maybe-skipped
$ git reset --hard                         # Undo changes, re-sparsify


And it gets even worse when passing globs like '*.c' or directories like
'src/' to checkout because then sparsity patterns are ignored if and
only if the file in the index differs from the specified file:

$ git checkout -- '*skipped'
error: pathspec '*skipped' did not match any file(s) known to git
$ ls -1
tracked
$ git checkout HEAD~2 -- '*skipped'
$ ls -1
tracked
tracked-but-maybe-skipped

We get similar weirdness with directories:

$ git sparse-checkout set nomatches
$ ls -1
$ git checkout .
error: pathspec '.' did not match any file(s) known to git
$ git checkout HEAD~2 .
Updated 1 path from fb99ded
$ git ls-files -t
S tracked
H tracked-but-maybe-skipped
### Undo the above changes
$ git reset --hard
$ git sparse-checkout set tracked

Note that the above only updated 1 path, despite both files existing in
HEAD~2.  Only one of the files differed between HEAD~2 and the current
index, so it only updated that one.  When it updated that one, it
cleared the SKIP_WORKTREE bit for it, but left the other file, that did
exist in the older commit, with the SKIP_WORKTREE bit.


Since checkout ignores non-matching paths, users might expect other
commands like diff to do the same:

$ git ls-files -t
H tracked
S tracked-but-maybe-skipped
$ echo changed> tracked-but-maybe-skipped
$ git diff --stat tracked-but-maybe-skipped          # Yes, ignored
$ git diff --stat HEAD tracked-but-maybe-skipped     # Yes, ignored
$ git diff --stat HEAD~1 tracked-but-maybe-skipped   # Not ignored?!?
 tracked-but-maybe-skipped | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
$ rm tracked-but-maybe-skipped

Technically this is because SKIP_WORKTREE is treated as "file is assumed
to match HEAD", which might be implementationally well defined, but
yields some weird results for users to attempt to form a mental model.
Diff seems to match behavior B, whereas checkout with revisions and/or
pathspecs doesn't match either A or B.

There is a similar split in whether users would expect a search to
return results for folks who prefer (A) or (B):

$ git grep maybe HEAD~1
HEAD~1:tracked-but-maybe-skipped:tracked-but-maybe-skipped-v2

But, regardless of how history is treated (i.e. regardless of
preferences for behavior A or B), consistent with checkout and diff
above we'd expect a plain grep to not return anything:

$ git grep v3                                        # Should be empty
tracked-but-maybe-skipped:tracked-but-maybe-skipped-v3

Huh?!?  Very confusing.  Let's make it worse, though -- how about we
manually create that file again (despite it being SKIP_WORKTREE) and
give it different contents and see what it does:

$ echo changed >tracked-but-maybe-skipped
$ git grep v3
tracked-but-maybe-skipped:tracked-but-maybe-skipped-v3

WAT?!?  What part of "changed" does "v3" match??  Oh, right, none of it.
The pretend-the-working-directory-matches-HEAD implementation strikes
again.  It's a nice approximation to user-desired behavior, but I don't
see how it actually makes sense in general.


=== sparse-checkout other behaviors -- merges and apply ===

The merge machinery has traditionally done something different than
checkout/diff/status/commit in sparse-checkouts.  Like commit, it has to
include all paths in any created commit object.  Like checkout, it wants
to avoid writing files to the working directory that don't match
sparsity paths -- BUT files might have conflicts.  So, the merge
machinery clears the skip_worktree bit for conflicted files.  Also,
merge-recursive.c also clears the skip_worktree bit for other files
unnecessarily, due to difficulty of implementation of preserving it
(merge-ort should correct this).[1]

[1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/

Since the merge-machinery is used in multiple commands -- merge, rebase,
cherry-pick, revert, checkout -m, etc., this behavior is present in all
of those.

stash is somewhat modelled on a merge, so it should behave the same.  It
doesn't quite do so currently in certain cases.

The am and apply subcommands should also behave like merge; both will
need fixes.


=== sparse-checkout papercuts ===

Some simple examples showing that commands which otherwise work quite
nicely with sparse-checkout can have a few rough edges:

$ touch addme
$ git add addme
$ git ls-files -t
H addme
H tracked
S tracked-but-maybe-skipped
$ git reset --hard                           # usually works great
error: Path 'addme' not uptodate; will not remove from working tree.
HEAD is now at bdbbb6f third
$ git ls-files -t
H tracked
S tracked-but-maybe-skipped
$ ls -1
tracked

So, reset --hard worked correctly, but it made the user think that it
didn't with its error message.

$ git add mistyped-filename
fatal: pathspec 'mistyped-filename' did not match any files
$ echo $?
128
$ echo changed >tracked-but-maybe-skipped
$ git add tracked-but-maybe-skipped
$ echo $?
0
$ git status --procelain
$

So, in the case of a mistyped-filename or a glob that doesn't match any
files, `git add` prints an error and returns a non-zero exit code.  In
the case of SKIP_WORKTREE files, `git add` (correctly) refuses to add
them, but confusingly does so silently and with a clean exit status.


=== behavioral proposals ===

Short term version:

  * en/stash-apply-sparse-checkout: apply as-is.

  * mt/rm-sparse-checkout: modify it to ignore sparse.restrictCmds --
      `git rm` should be like `git add` and _always_ ignore
      SKIP_WORKTREE paths, but it should print a warning (and return
      with non-zero exit code) if only SKIP_WORKTREE'd paths match the
      pathspec.  If folks want to remove (or add) files outside current
      sparsity paths, they can either update their sparsity paths or use
      `git update-index`.

  * mt/grep-sparse-checkout: figure out shorter flag names.  Default to
      --no-restrict-to-sparse, for now.  Then merge it for git-2.31.


Longer term version:

I'll split these into categories...

--> Default behavior
  * Default to behavior B (--no-restrict-to-sparse from
    mt/grep-sparse-checkout) for now.  I think that's the wrong default
    for when we marry sparse-checkouts with partial clones, but we only
    have patches for behavior A for git grep; it may take a while to
    support behavior A in each command.  Slowly changing behavior of
    commands with each release is problematic.  We can discuss again
    after behavior A is fully supported what to make the defaults be.

--> Commands already working with sparse-checkouts; no known bugs:
  * status
  * switch, the "switch" parts of checkout

  * read-tree
  * update-index
  * ls-files

--> Enhancements
  * General
    * shorter flag names than --[no-]restrict-to-sparse.  --dense and
      --sparse?  --[no-]restrict?
  * sparse-checkout (After behavior A is implemented...)
    * Provide warning if sparse.restrictCmds not set (similar to git
      pull's warning with no pull.rebase, or git checkout's warning when
      detaching HEAD)
  * clone
      * Consider having clone set sparse.restrictCmds based on whether
      --partial is provided in addition to --sparse.

--> Commands with minor bugs/annoyances:
  * add
    * print a warning if pathspec only matches SKIP_WORKTREE files (much
      as it already does if the pathspec matches no files)

  * reset --hard
    * spurious and incorrect warning when removing a newly added file
  * merge, rebase, cherry-pick, revert
    * unnecessary unsparsification (merge-ort should fix this)
  * stash
    * similar to merge, but there are extra bugs from the pipeline
      design.  en/stash-apply-sparse-checkout fixes the known issues.

--> Buggy commands
  * am
    * should behave like merge commands -- (1) it needs to be okay for
      the file to not exist in the working directory; vivify it if
      necessary, (2) any conflicted paths must remain vivified, (3)
      paths which merge cleanly can be unvivified.
  * apply
    * See am
  * rm
    * should behave like add, skipping SKIP_WORKTREE entries.  See comments
      on mt/rm-sparse-checkout elsewhere
  * restore
    * with revisions and/or globs, sparsity patterns should be heeded
  * checkout
    * see restore

--> Commands that need no changes because commits are full-tree:
  * archive
  * bundle
  * commit
  * format-patch
  * fast-export
  * fast-import
  * commit-tree

--> Commands that would change for behavior A
  * bisect
    * Only consider commits touching paths matching sparsity patterns
  * diff
    * When given revisions, only show subset of files matching sparsity
      patterns.  If pathspecs are given, intersect them with sparsity
      patterns.
  * log
    * Only consider commits touching at least one path matching sparsity
      patterns.  If pathspecs are given, paths must match both the
      pathspecs and the sparsity patterns in order to be considered
      relevant and be shown.
  * gitk
    * See log
  * shortlog
    * See log
  * grep
    * See mt/grep-sparse-checkout; it's been discussed in detail..and is
      implemented.  (Other than that we don't want behavior A to be the
      default when so many commands do not support it yet.)

  * show-branch
    * See log
  * whatchanged
    * See log
  * show (at least for commits)
    * See diff

  * blame
    * With -C or -C -C, only detect lines moved/copied from files that match
      the sparsity paths.
  * annotate
    * See blame.

--> Commands whose behavior I'm still uncertain of:
  * worktree add
    * for behavior A (marrying sparse-checkout with partial clone), we
      should almost certainly copy sparsity paths from the previous
      worktree (we either have to do that or have some kind of
      specify-at-clone-time default set of sparsity paths)
    * for behavior B, we may also want to copy sparsity paths from the
      previous worktree (much like a new command line shell will copy
      $PWD from the previous one), but it's less clear.  Should it?
  * range-diff
    * is this considered to be log-like for format-patch-like in
      behavior?
  * cherry
    * see range-diff
  * plumbing -- diff-files, diff-index, diff-tree, ls-tree, rev-list
    * should these be tweaked or always operate full-tree?
  * checkout-index
    * should it be like checkout and pay attention to sparsity paths, or
      be considered special like update-index, ls-files, & read-tree and
      write to working tree anyway?
  * mv
    * I don't think mv can take a glob, and I think it currently happens to
      work.  Should we add a comment to the code that if anyone wants to
      support mv using pathspecs they might need to be careful about
      SKIP_WORKTREE?

--> Might need changes, but who cares?
  * merge-file
  * merge-index

--> Commands with no interaction with sparse-checkout:
  * branch
  * clean
  * describe
  * fetch
  * gc
  * init
  * maintenance
  * notes
  * pull (merge & rebase have the necessary changes)
  * push
  * submodule
  * tag

  * config
  * filter-branch (works in separate checkout without sparsity paths)
  * pack-refs
  * prune
  * remote
  * repack
  * replace

  * bugreport
  * count-objects
  * fsck
  * gitweb
  * help
  * instaweb
  * merge-tree
  * rerere
  * verify-commit
  * verify-tag

  * commit-graph
  * hash-object
  * index-pack
  * mktag
  * mktree
  * multi-pack-index
  * pack-objects
  * prune-packed
  * symbolic-ref
  * unpack-objects
  * update-ref
  * write-tree

  * for-each-ref
  * get-tar-commit-id
  * ls-remote
  * merge-base
  * name-rev
  * pack-redundant
  * rev-parse
  * show-index
  * show-ref
  * unpack-file
  * var
  * verify-pack

  * <Everything under 'Interacting with Others' in 'git help --all'>
  * <Everything under 'Low-level...Syncing' in 'git help --all'>
  * <Everything under 'Low-level...Internal Helpers' in 'git help --all'>
  * <Everything under 'External commands' in 'git help --all'>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sparse-checkout questions and proposals [Was: Re: [PATCH] rm: honor sparse checkout patterns]
  2020-12-31 20:03       ` sparse-checkout questions and proposals [Was: Re: [PATCH] rm: honor sparse checkout patterns] Elijah Newren
@ 2021-01-04  3:02         ` Derrick Stolee
  2021-01-06 19:15           ` Elijah Newren
  0 siblings, 1 reply; 56+ messages in thread
From: Derrick Stolee @ 2021-01-04  3:02 UTC (permalink / raw)
  To: Elijah Newren, Matheus Tavares Bernardino
  Cc: Git Mailing List, Junio C Hamano

On 12/31/2020 3:03 PM, Elijah Newren wrote:
> Sorry for the long delay...

Thanks for bringing us all back to the topic.

>   sparse-checkout's purpose is not fully defined.  Does it exist to:
>     A) allow working on a subset of the repository?
>     B) allow working with a subset of the repository checked out?
>     C) something else?

I think it's all of the above!

My main focus for sparse-checkout is a way for users who care about a
small fraction of a repository's files to only do work on those files.
This saves time because they are asking Git to do less, but also they
can use tools like IDEs that with "Open Folder" options without falling
over. Writing fewer files also affects things like their OS indexing
files for search or antivirus scanning written files.

Others use sparse-checkout to remove a few large files unless they
need them. I'm less interested in this case, myself.

Both perspectives get better with partial clone because the download
size shrinks significantly. While partial clone has a sparse-checkout
style filter, it is hard to compute on the server side. Further, it
is not very forgiving of someone wanting to change their sparse
definition after cloning. Tree misses are really expensive, and I find
that the extra network transfer of the full tree set is a price that is
worth paying.

I'm also focused on users that know that they are a part of a larger
whole. They know they are operating on a large repository but focus on
what they need to contribute their part. I expect multiple "roles" to
use very different, almost disjoint parts of the codebase. Some other
"architect" users operate across the entire tree or hop between different
sections of the codebase as necessary. In this situation, I'm wary of
scoping too many features to the sparse-checkout definition, especially
"git log," as it can be too confusing to have their view of the codebase
depend on your "point of view."

(In case we _do_ start changing behavior in this way, I'm going to use
the term "sparse parallax" to describe users being confused about their
repositories because they have different sparse-checkout definitions,
changing what they see from "git log" or "git diff".)
 
> === Why it matters ==
> 
> There are unfortunately *many* gray areas when you try to define how git
> subcommands should behave in sparse-checkouts.  (The
> implementation-level definition from a decade ago of "files are assumed
> to be unchanged from HEAD when SKIP_WORKTREE is set, and we remove files
> with that bit set from the working directory" definition from the past
> provides no clear vision about how to resolve gray areas, and also leads
> to various inconsistencies and surprises for users.)  I believe a
> definition based around a usecase (or usecases) for the purpose of
> sparse-checkouts would remove most of the gray areas.
> 
> Are there choices other than A & B that I proposed above that make
> sense?  Traditionally, I thought of B as just a partial implementation
> of A, and that A was the desired end-goal.  However, others have argued
> for B as a preferred choice (some users at $DAYJOB even want both A and
> B, meaning they'd like a simple short flag to switch between the two).
> There may be others I'm unaware of.
> 
> git implements neither A nor B.  It might be nice to think of git's
> current behavior as a partial implementation of B (enough to provide
> some value, but still feel buggy/incomplete), and that after finishing B
> we could add more work to allow A.  I'm not sure if the current
> implementation is just a subset of B, though.
> 
> Let's dig in...

I read your detailed message and I think you make some great points.

I think there are three possible situations:

1. sparse-checkout should not affect the behavior at all.

   An example for this is "git commit". We want the root tree to contain
   all of the subtrees and blobs that are out of the sparse-checkout
   definition. The underlying object model should never change.

2. sparse-checkout should change the default, but users can opt-out.

   The examples I think of here are 'git grep' and 'git rm', as we have
   discussed recently. Having a default of "you already chose to be in
   a sparse-checkout, so we think this behavior is better for you"
   should continue to be pursued.

3. Users can opt-in to a sparse-checkout version of a behavior.

   The example in this case is "git diff". Perhaps we would want to see
   a diff scoped only to our sparse definition, but that should not be
   the default. It is too risky to change the output here without an
   explicit choice by the user.

Let's get into your concrete details now:

> === behavioral proposals ===
> 
> Short term version:
> 
>   * en/stash-apply-sparse-checkout: apply as-is.
> 
>   * mt/rm-sparse-checkout: modify it to ignore sparse.restrictCmds --
>       `git rm` should be like `git add` and _always_ ignore
>       SKIP_WORKTREE paths, but it should print a warning (and return
>       with non-zero exit code) if only SKIP_WORKTREE'd paths match the
>       pathspec.  If folks want to remove (or add) files outside current
>       sparsity paths, they can either update their sparsity paths or use
>       `git update-index`.
> 
>   * mt/grep-sparse-checkout: figure out shorter flag names.  Default to
>       --no-restrict-to-sparse, for now.  Then merge it for git-2.31.

I don't want to derail your high-level conversation too much, but by the
end of January I hope to send an RFC to create a "sparse index" which allows
the index to store entries corresponding to a directory with the skip-
worktree bit on. The biggest benefit is that commands like 'git status' and
'git add' will actually change their performance based on the size of the
sparse-checkout definition and not the total number of paths at HEAD.

The other thing that happens once we have that idea is that these behaviors
in 'git grep' or 'git rm' actually become _easier_ to implement because we
don't even have an immediate reference to the blobs outside of the sparse
cone (assuming cone mode).

The tricky part (that I'm continuing to work on, hence no RFC today) is
enabling the part where a user can opt-in to the old behavior. This requires
parsing trees to expand the index as necessary. A simple approach is to
create an in-memory index that is the full expansion at HEAD, when necessary.
It will be better to do expansions in a targeted way.

(Your merge-ort algorithm is critical to the success here, since that doesn't
use the index as a data structure. I expect to make merge-ort the default for
users with a sparse index. Your algorithm will be done first.)

My point in bringing this up is that perhaps we should pause concrete work on
updating other builtins until we have a clearer idea of what a sparse index
could look like and how the implementation would change based on having one
or not. I hope that my RFC will be illuminating in this regard.

Ok, enough of that sidebar. I thought it important to bring up, but 

> Longer term version:
> 
> I'll split these into categories...
> 
> --> Default behavior
>   * Default to behavior B (--no-restrict-to-sparse from
>     mt/grep-sparse-checkout) for now.  I think that's the wrong default
>     for when we marry sparse-checkouts with partial clones, but we only
>     have patches for behavior A for git grep; it may take a while to
>     support behavior A in each command.  Slowly changing behavior of
>     commands with each release is problematic.  We can discuss again
>     after behavior A is fully supported what to make the defaults be.
> 
> --> Commands already working with sparse-checkouts; no known bugs:
>   * status
>   * switch, the "switch" parts of checkout
> 
>   * read-tree
>   * update-index
>   * ls-files
> 
> --> Enhancements
>   * General
>     * shorter flag names than --[no-]restrict-to-sparse.  --dense and
>       --sparse?  --[no-]restrict?

--full-workdir?

>   * sparse-checkout (After behavior A is implemented...)
>     * Provide warning if sparse.restrictCmds not set (similar to git
>       pull's warning with no pull.rebase, or git checkout's warning when
>       detaching HEAD)
>   * clone
>       * Consider having clone set sparse.restrictCmds based on whether
>       --partial is provided in addition to --sparse.

In general, we could use some strategies to help users opt-in to these
new behaviors more easily. We are very close to having the only real
feature of Scalar be that it sets these options automatically, and will
continue to push to the newest improvements as possible.
 
> --> Commands with minor bugs/annoyances:
>   * add
>     * print a warning if pathspec only matches SKIP_WORKTREE files (much
>       as it already does if the pathspec matches no files)
> 
>   * reset --hard
>     * spurious and incorrect warning when removing a newly added file
>   * merge, rebase, cherry-pick, revert
>     * unnecessary unsparsification (merge-ort should fix this)
>   * stash
>     * similar to merge, but there are extra bugs from the pipeline
>       design.  en/stash-apply-sparse-checkout fixes the known issues.
> 
> --> Buggy commands
>   * am
>     * should behave like merge commands -- (1) it needs to be okay for
>       the file to not exist in the working directory; vivify it if
>       necessary, (2) any conflicted paths must remain vivified, (3)
>       paths which merge cleanly can be unvivified.
>   * apply
>     * See am
>   * rm
>     * should behave like add, skipping SKIP_WORKTREE entries.  See comments
>       on mt/rm-sparse-checkout elsewhere
>   * restore
>     * with revisions and/or globs, sparsity patterns should be heeded
>   * checkout
>     * see restore
> 
> --> Commands that need no changes because commits are full-tree:
>   * archive
>   * bundle
>   * commit
>   * format-patch
>   * fast-export
>   * fast-import
>   * commit-tree
> 
> --> Commands that would change for behavior A
>   * bisect
>     * Only consider commits touching paths matching sparsity patterns
>   * diff
>     * When given revisions, only show subset of files matching sparsity
>       patterns.  If pathspecs are given, intersect them with sparsity
>       patterns.
>   * log
>     * Only consider commits touching at least one path matching sparsity
>       patterns.  If pathspecs are given, paths must match both the
>       pathspecs and the sparsity patterns in order to be considered
>       relevant and be shown.
>   * gitk
>     * See log
>   * shortlog
>     * See log
>   * grep
>     * See mt/grep-sparse-checkout; it's been discussed in detail..and is
>       implemented.  (Other than that we don't want behavior A to be the
>       default when so many commands do not support it yet.)
> 
>   * show-branch
>     * See log
>   * whatchanged
>     * See log
>   * show (at least for commits)
>     * See diff
> 
>   * blame
>     * With -C or -C -C, only detect lines moved/copied from files that match
>       the sparsity paths.
>   * annotate
>     * See blame.

this "behavior A" idea is the one I'm most skeptical about. Creating a
way to opt-in to a sparse definition might be nice. It might be nice to
run "git log --simplify-sparse" to see the simplified history when only
caring about commits that changed according to the current sparse-checkout
definitions. Expand that more when asking for diffs as part of that log,
and the way we specify the option becomes tricky.

But I also want to avoid doing this as a default or even behind a config
setting. We already get enough complains about "missing commits" when
someone does a bad merge so "git log -- file" simplifies away a commit
that exists in the full history. Imagine someone saying "on my machine,
'git log' shows the commit, but my colleague can't see it!" I would really
like to avoid adding to that confusion if possible.

> --> Commands whose behavior I'm still uncertain of:
>   * worktree add
>     * for behavior A (marrying sparse-checkout with partial clone), we
>       should almost certainly copy sparsity paths from the previous
>       worktree (we either have to do that or have some kind of
>       specify-at-clone-time default set of sparsity paths)
>     * for behavior B, we may also want to copy sparsity paths from the
>       previous worktree (much like a new command line shell will copy
>       $PWD from the previous one), but it's less clear.  Should it?

I think 'git worktree add' should at minimum continue using a sparse-
checkout if the current working directory has one. Worktrees are a
great way to scale the creation of multiple working directories for
the same repository without re-cloning all of the history. In a partial
clone case, it's really important that we don't explode the workdir in
the new worktree (or even download all those blobs).

Now, should we copy the sparse-checkout definitions, or start with the
"only files at root" default? That's a more subtle question.

>   * range-diff
>     * is this considered to be log-like for format-patch-like in
>       behavior?

If we stick with log acting on the full tree unless specified in the
command-line options, then range-diff can be the same. Seems like a
really low priority, though, because of the proximity to format-patch.

>   * cherry
>     * see range-diff
>   * plumbing -- diff-files, diff-index, diff-tree, ls-tree, rev-list
>     * should these be tweaked or always operate full-tree?
>   * checkout-index
>     * should it be like checkout and pay attention to sparsity paths, or
>       be considered special like update-index, ls-files, & read-tree and
>       write to working tree anyway?
>   * mv
>     * I don't think mv can take a glob, and I think it currently happens to
>       work.  Should we add a comment to the code that if anyone wants to
>       support mv using pathspecs they might need to be careful about
>       SKIP_WORKTREE?
> 
> --> Might need changes, but who cares?
>   * merge-file
>   * merge-index
> 
> --> Commands with no interaction with sparse-checkout:

(I agree with the list you included here.)

Thanks for starting the discussion. Perhaps more will pick it up as
they return from the holiday break.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sparse-checkout questions and proposals [Was: Re: [PATCH] rm: honor sparse checkout patterns]
  2021-01-04  3:02         ` Derrick Stolee
@ 2021-01-06 19:15           ` Elijah Newren
  2021-01-07 12:53             ` Derrick Stolee
  0 siblings, 1 reply; 56+ messages in thread
From: Elijah Newren @ 2021-01-06 19:15 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Matheus Tavares Bernardino, Git Mailing List, Junio C Hamano

On Sun, Jan 3, 2021 at 7:02 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/31/2020 3:03 PM, Elijah Newren wrote:
> > Sorry for the long delay...
>
> Thanks for bringing us all back to the topic.
>
> >   sparse-checkout's purpose is not fully defined.  Does it exist to:
> >     A) allow working on a subset of the repository?
> >     B) allow working with a subset of the repository checked out?
> >     C) something else?
>
> I think it's all of the above!

I think I understand your sentiment, but since my choice C was so
vague, your answer doesn't really help us figure out correct behavior
for commands that currently behave weirdly.  ;-)

> My main focus for sparse-checkout is a way for users who care about a
> small fraction of a repository's files to only do work on those files.
> This saves time because they are asking Git to do less, but also they
> can use tools like IDEs that with "Open Folder" options without falling
> over. Writing fewer files also affects things like their OS indexing
> files for search or antivirus scanning written files.

Interesting, IDE failure/overload was one of the big driving factors
in our adoption as well.

> Others use sparse-checkout to remove a few large files unless they
> need them. I'm less interested in this case, myself.
>
> Both perspectives get better with partial clone because the download
> size shrinks significantly. While partial clone has a sparse-checkout
> style filter, it is hard to compute on the server side. Further, it
> is not very forgiving of someone wanting to change their sparse
> definition after cloning. Tree misses are really expensive, and I find
> that the extra network transfer of the full tree set is a price that is
> worth paying.

Out of curiosity, is that because the promisor handling doesn't do
nice batching of trees to download, as is done for blobs, or is there
a more fundamental reason they are really expensive?  (I'm just
wondering if we are risking changing design in some areas based on
suboptimal implementation of other things.  I don't actually have
experience with partial clones yet, though, so I'm basically just
querying about random but interesting things without any experience to
back it up.)

> I'm also focused on users that know that they are a part of a larger
> whole. They know they are operating on a large repository but focus on
> what they need to contribute their part. I expect multiple "roles" to
> use very different, almost disjoint parts of the codebase. Some other
> "architect" users operate across the entire tree or hop between different
> sections of the codebase as necessary. In this situation, I'm wary of
> scoping too many features to the sparse-checkout definition, especially
> "git log," as it can be too confusing to have their view of the codebase
> depend on your "point of view."
>
> (In case we _do_ start changing behavior in this way, I'm going to use
> the term "sparse parallax" to describe users being confused about their
> repositories because they have different sparse-checkout definitions,
> changing what they see from "git log" or "git diff".)

Thanks for this extra perspective.

>
> > === Why it matters ==
> >
> > There are unfortunately *many* gray areas when you try to define how git
> > subcommands should behave in sparse-checkouts.  (The
> > implementation-level definition from a decade ago of "files are assumed
> > to be unchanged from HEAD when SKIP_WORKTREE is set, and we remove files
> > with that bit set from the working directory" definition from the past
> > provides no clear vision about how to resolve gray areas, and also leads
> > to various inconsistencies and surprises for users.)  I believe a
> > definition based around a usecase (or usecases) for the purpose of
> > sparse-checkouts would remove most of the gray areas.
> >
> > Are there choices other than A & B that I proposed above that make
> > sense?  Traditionally, I thought of B as just a partial implementation
> > of A, and that A was the desired end-goal.  However, others have argued
> > for B as a preferred choice (some users at $DAYJOB even want both A and
> > B, meaning they'd like a simple short flag to switch between the two).
> > There may be others I'm unaware of.
> >
> > git implements neither A nor B.  It might be nice to think of git's
> > current behavior as a partial implementation of B (enough to provide
> > some value, but still feel buggy/incomplete), and that after finishing B
> > we could add more work to allow A.  I'm not sure if the current
> > implementation is just a subset of B, though.
> >
> > Let's dig in...
>
> I read your detailed message and I think you make some great points.
>
> I think there are three possible situations:
>
> 1. sparse-checkout should not affect the behavior at all.
>
>    An example for this is "git commit". We want the root tree to contain
>    all of the subtrees and blobs that are out of the sparse-checkout
>    definition. The underlying object model should never change.
>
> 2. sparse-checkout should change the default, but users can opt-out.
>
>    The examples I think of here are 'git grep' and 'git rm', as we have
>    discussed recently. Having a default of "you already chose to be in
>    a sparse-checkout, so we think this behavior is better for you"
>    should continue to be pursued.
>
> 3. Users can opt-in to a sparse-checkout version of a behavior.
>
>    The example in this case is "git diff". Perhaps we would want to see
>    a diff scoped only to our sparse definition, but that should not be
>    the default. It is too risky to change the output here without an
>    explicit choice by the user.

I'm curious why you put grep and diff in different categories.  A
plain "git diff" without revisions will give the same output whether
or not it restricts to the sparsity paths (because the other paths are
unchanged), so restricting is purely an optimization question.  Making
"git diff REVISION" restrict to the sparsity paths would be a
behavioral change as you note, but "git grep [REVISION]" would also
require a behavioral change to limit to the sparsity paths.  If it's
too risky to change the output for git diff with revisions, why is it
not also too risky to do that with git grep with revisions?


Also, I think you are missing a really important category:

4. sparse-checkout changes the behavior of commands and there is no
opt-out or configurability provided.

The most obvious examples are switch and checkout -- their modified
behavior is really the /point/ of sparse-checkouts and if you want to
"opt out" then just don't use sparse-checkouts.  `reset --hard` can go
in the same bucket; it's modified in the same way.  However, some
commands are modified in a different way, but also have no opt-out --
for example, merge, rebase, cherry-pick, revert, and stash, all "try"
to avoid writing files to the working tree that match the sparsify
specifications, but will vivify files which have conflicts (and maybe
a few additional files based on implementation shortcomings).  Another
command that behaves differently than any of these, and is also
non-configurable in this change, is git-add.  It'll ignore any tracked
files with the SKIP_WORKTREE bit set, even if the file is present.
That's really helpful thing for "git add -A [GLOB_OR_DIRECTORY]" to
do, as we don't want sparsity to accidentally be treated as a
directive to remove files from the repository.

I think more commands should fall under this fourth category as well,
including rm.

However, I actually think 4 deserves to be broken up into separate
categories based on the type of behavior change.  Thus, I'd need a 4a,
4b, and 4c for my example commands above.

> Let's get into your concrete details now:
>
> > === behavioral proposals ===
> >
> > Short term version:
> >
> >   * en/stash-apply-sparse-checkout: apply as-is.
> >
> >   * mt/rm-sparse-checkout: modify it to ignore sparse.restrictCmds --
> >       `git rm` should be like `git add` and _always_ ignore
> >       SKIP_WORKTREE paths, but it should print a warning (and return
> >       with non-zero exit code) if only SKIP_WORKTREE'd paths match the
> >       pathspec.  If folks want to remove (or add) files outside current
> >       sparsity paths, they can either update their sparsity paths or use
> >       `git update-index`.
> >
> >   * mt/grep-sparse-checkout: figure out shorter flag names.  Default to
> >       --no-restrict-to-sparse, for now.  Then merge it for git-2.31.
>
> I don't want to derail your high-level conversation too much, but by the
> end of January I hope to send an RFC to create a "sparse index" which allows
> the index to store entries corresponding to a directory with the skip-
> worktree bit on. The biggest benefit is that commands like 'git status' and
> 'git add' will actually change their performance based on the size of the
> sparse-checkout definition and not the total number of paths at HEAD.

This is _awesome_; I think it'll be huge.  It'll cause even more
commands behavior to change, of course, but in a good way.  And I
don't consider this derailing at all but extending the discussion
complete with extra investigation work.  :-)

> The other thing that happens once we have that idea is that these behaviors
> in 'git grep' or 'git rm' actually become _easier_ to implement because we
> don't even have an immediate reference to the blobs outside of the sparse
> cone (assuming cone mode).
>
> The tricky part (that I'm continuing to work on, hence no RFC today) is
> enabling the part where a user can opt-in to the old behavior. This requires
> parsing trees to expand the index as necessary. A simple approach is to
> create an in-memory index that is the full expansion at HEAD, when necessary.
> It will be better to do expansions in a targeted way.

I'm not sure if you're just thinking of the old mt/rm-sparse-checkout
and commenting on it, or if you're actively disagreeing with my
proposal for rm.

> (Your merge-ort algorithm is critical to the success here, since that doesn't
> use the index as a data structure. I expect to make merge-ort the default for
> users with a sparse index. Your algorithm will be done first.)

Well, at 50 added/changed lines per patch, I've only got ~50 more
patches to go for ort after the ones I submitted Monday (mostly
optimization related).  If I submit 10 patches per week (starting next
week since I already sent a big patchset this week), then maybe
mid-to-late February.  That's a more aggressive pace than we've
managed so far, but maybe it gets easier towards the end?  Anyway,
hopefully that helps you with timing predictions.

On my end, this does make the ort work look like there's finally some
light at the end of the tunnel; I just hope it's not an oncoming
train. :-)

> My point in bringing this up is that perhaps we should pause concrete work on
> updating other builtins until we have a clearer idea of what a sparse index
> could look like and how the implementation would change based on having one
> or not. I hope that my RFC will be illuminating in this regard.

Are you suggesting to pause any work on those pieces of the proposal
that might be affected by your sparse index, or pause any work at all
on sparse-checkouts?  For example, I think
en/stash-apply-sparse-checkout that's been sitting in seen is good to
merge down to master now.  I suspect mt/rm-sparse-checkout WITH my
suggested changes (no configurability -- similar to git-add) and a
better warning/error message for git-add are some examples of cleanups
that could be done before your sparse index, but if you're worried
about conflicting I certainly don't want to derail your project.  (I
agree that anything with configurability and touching on "behavior A"
or "sparse parallelax", like mt/grep-sparse-checkout would be better
if we waited on.  I do feel pretty bad for how much we've made Matheus
wait on that series, but waiting does still seem best.)

> Ok, enough of that sidebar. I thought it important to bring up, but
>
> > Longer term version:
> >
> > I'll split these into categories...
> >
> > --> Default behavior
> >   * Default to behavior B (--no-restrict-to-sparse from
> >     mt/grep-sparse-checkout) for now.  I think that's the wrong default
> >     for when we marry sparse-checkouts with partial clones, but we only
> >     have patches for behavior A for git grep; it may take a while to
> >     support behavior A in each command.  Slowly changing behavior of
> >     commands with each release is problematic.  We can discuss again
> >     after behavior A is fully supported what to make the defaults be.
> >
> > --> Commands already working with sparse-checkouts; no known bugs:
> >   * status
> >   * switch, the "switch" parts of checkout
> >
> >   * read-tree
> >   * update-index
> >   * ls-files
> >
> > --> Enhancements
> >   * General
> >     * shorter flag names than --[no-]restrict-to-sparse.  --dense and
> >       --sparse?  --[no-]restrict?
>
> --full-workdir?

Hmm.  "workdir" sounds like an abbreviation of "working directory",
which is the place where the files are checked out.  And the working
directory is sparse in a sparse-checkout.  So isn't this misleading?
Or did you intend for this option to be the name for requesting a
sparser set?  (If so, isn't "full" in its name a bit weird?)

Also, what would the inverse name of --full-workdir be?  I was looking
to add options for both restricting the command to the sparser set and
for expanding to the full set of files.  Though I guess as you note
below, you perhaps might be in favor of only one of these without
configuration options to adjust defaults.

> >   * sparse-checkout (After behavior A is implemented...)
> >     * Provide warning if sparse.restrictCmds not set (similar to git
> >       pull's warning with no pull.rebase, or git checkout's warning when
> >       detaching HEAD)
> >   * clone
> >       * Consider having clone set sparse.restrictCmds based on whether
> >       --partial is provided in addition to --sparse.
>
> In general, we could use some strategies to help users opt-in to these
> new behaviors more easily. We are very close to having the only real
> feature of Scalar be that it sets these options automatically, and will
> continue to push to the newest improvements as possible.

Nice!

> > --> Commands with minor bugs/annoyances:
> >   * add
> >     * print a warning if pathspec only matches SKIP_WORKTREE files (much
> >       as it already does if the pathspec matches no files)
> >
> >   * reset --hard
> >     * spurious and incorrect warning when removing a newly added file
> >   * merge, rebase, cherry-pick, revert
> >     * unnecessary unsparsification (merge-ort should fix this)
> >   * stash
> >     * similar to merge, but there are extra bugs from the pipeline
> >       design.  en/stash-apply-sparse-checkout fixes the known issues.
> >
> > --> Buggy commands
> >   * am
> >     * should behave like merge commands -- (1) it needs to be okay for
> >       the file to not exist in the working directory; vivify it if
> >       necessary, (2) any conflicted paths must remain vivified, (3)
> >       paths which merge cleanly can be unvivified.
> >   * apply
> >     * See am
> >   * rm
> >     * should behave like add, skipping SKIP_WORKTREE entries.  See comments
> >       on mt/rm-sparse-checkout elsewhere
> >   * restore
> >     * with revisions and/or globs, sparsity patterns should be heeded
> >   * checkout
> >     * see restore
> >
> > --> Commands that need no changes because commits are full-tree:
> >   * archive
> >   * bundle
> >   * commit
> >   * format-patch
> >   * fast-export
> >   * fast-import
> >   * commit-tree
> >
> > --> Commands that would change for behavior A
> >   * bisect
> >     * Only consider commits touching paths matching sparsity patterns
> >   * diff
> >     * When given revisions, only show subset of files matching sparsity
> >       patterns.  If pathspecs are given, intersect them with sparsity
> >       patterns.
> >   * log
> >     * Only consider commits touching at least one path matching sparsity
> >       patterns.  If pathspecs are given, paths must match both the
> >       pathspecs and the sparsity patterns in order to be considered
> >       relevant and be shown.
> >   * gitk
> >     * See log
> >   * shortlog
> >     * See log
> >   * grep
> >     * See mt/grep-sparse-checkout; it's been discussed in detail..and is
> >       implemented.  (Other than that we don't want behavior A to be the
> >       default when so many commands do not support it yet.)
> >
> >   * show-branch
> >     * See log
> >   * whatchanged
> >     * See log
> >   * show (at least for commits)
> >     * See diff
> >
> >   * blame
> >     * With -C or -C -C, only detect lines moved/copied from files that match
> >       the sparsity paths.
> >   * annotate
> >     * See blame.
>
> this "behavior A" idea is the one I'm most skeptical about. Creating a
> way to opt-in to a sparse definition might be nice. It might be nice to
> run "git log --simplify-sparse" to see the simplified history when only
> caring about commits that changed according to the current sparse-checkout
> definitions. Expand that more when asking for diffs as part of that log,
> and the way we specify the option becomes tricky.

--simplify-sparse is a really long name to need to specify at every
invocation.  Also, if we have --[no]-restrict or --sparse/--dense
options at the git level (rather than the subcommand level), then I
think we don't want extra ones like this at the subcommand level.

Also, if the option appears at the global git level, doesn't that
remove the trickiness of revision traversal vs. diff outputting in
commands like log?  It just automatically applies to both.  (The only
trickiness would be if you wanted to somehow apply sparsity patterns
to just revision traversal or just diff outputting but not to both,
but that's already tricky in log with explicit pathspecs and we've
traditionally had files restrict both.)

> But I also want to avoid doing this as a default or even behind a config
> setting. We already get enough complains about "missing commits" when
> someone does a bad merge so "git log -- file" simplifies away a commit
> that exists in the full history. Imagine someone saying "on my machine,
> 'git log' shows the commit, but my colleague can't see it!" I would really
> like to avoid adding to that confusion if possible.

That's a good point.  A really good point.  Maybe we do only want to
allow explicit requests for this behavior -- and thus need very short
option name for it.

Here's a not-even-half-baked idea for thought: What if we allowed a
configuration option to control this, BUT whenever a command like
diff/grep/log restricts output based on the sparsity paths due solely
to the configuration option, it prints a small reminder on stderr at
the beginning of the output (e.g. "Note: output limited to sparsity
paths, as per sparse.restrictCmds setting")?

> > --> Commands whose behavior I'm still uncertain of:
> >   * worktree add
> >     * for behavior A (marrying sparse-checkout with partial clone), we
> >       should almost certainly copy sparsity paths from the previous
> >       worktree (we either have to do that or have some kind of
> >       specify-at-clone-time default set of sparsity paths)
> >     * for behavior B, we may also want to copy sparsity paths from the
> >       previous worktree (much like a new command line shell will copy
> >       $PWD from the previous one), but it's less clear.  Should it?
>
> I think 'git worktree add' should at minimum continue using a sparse-
> checkout if the current working directory has one. Worktrees are a
> great way to scale the creation of multiple working directories for
> the same repository without re-cloning all of the history. In a partial
> clone case, it's really important that we don't explode the workdir in
> the new worktree (or even download all those blobs).

Okay, sounds like you agree with me for the partial clone case -- it's
necessary.

But what about the non-partial clone case?  I think it should adopt
the sparsity in that case too, but Junio has objected in the past.
I'm pretty sure Junio wasn't thinking about the partial clone case,
where I think it seems obvious and compelling.  But I'm not sure how
best to convince him in the non-partial clone case (or maybe I already
did; he didn't respond further after his initial objection).

> Now, should we copy the sparse-checkout definitions, or start with the
> "only files at root" default? That's a more subtle question.

Ooh, good point.  Even if we adopt sparsity in the new worktree,
there's apparently a number of ways to do it.

> >   * range-diff
> >     * is this considered to be log-like for format-patch-like in
> >       behavior?
>
> If we stick with log acting on the full tree unless specified in the
> command-line options, then range-diff can be the same. Seems like a
> really low priority, though, because of the proximity to format-patch.
>
> >   * cherry
> >     * see range-diff
> >   * plumbing -- diff-files, diff-index, diff-tree, ls-tree, rev-list
> >     * should these be tweaked or always operate full-tree?
> >   * checkout-index
> >     * should it be like checkout and pay attention to sparsity paths, or
> >       be considered special like update-index, ls-files, & read-tree and
> >       write to working tree anyway?
> >   * mv
> >     * I don't think mv can take a glob, and I think it currently happens to
> >       work.  Should we add a comment to the code that if anyone wants to
> >       support mv using pathspecs they might need to be careful about
> >       SKIP_WORKTREE?
> >
> > --> Might need changes, but who cares?
> >   * merge-file
> >   * merge-index
> >
> > --> Commands with no interaction with sparse-checkout:
>
> (I agree with the list you included here.)
>
> Thanks for starting the discussion. Perhaps more will pick it up as
> they return from the holiday break.

Thanks for jumping in and pushing it much further with sparse indices
(or is it sparse indexes?)  I'm excited.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sparse-checkout questions and proposals [Was: Re: [PATCH] rm: honor sparse checkout patterns]
  2021-01-06 19:15           ` Elijah Newren
@ 2021-01-07 12:53             ` Derrick Stolee
  2021-01-07 17:36               ` Elijah Newren
  0 siblings, 1 reply; 56+ messages in thread
From: Derrick Stolee @ 2021-01-07 12:53 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Matheus Tavares Bernardino, Git Mailing List, Junio C Hamano

On 1/6/2021 2:15 PM, Elijah Newren wrote:
> On Sun, Jan 3, 2021 at 7:02 PM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 12/31/2020 3:03 PM, Elijah Newren wrote:
>> Others use sparse-checkout to remove a few large files unless they
>> need them. I'm less interested in this case, myself.
>>
>> Both perspectives get better with partial clone because the download
>> size shrinks significantly. While partial clone has a sparse-checkout
>> style filter, it is hard to compute on the server side. Further, it
>> is not very forgiving of someone wanting to change their sparse
>> definition after cloning. Tree misses are really expensive, and I find
>> that the extra network transfer of the full tree set is a price that is
>> worth paying.
> 
> Out of curiosity, is that because the promisor handling doesn't do
> nice batching of trees to download, as is done for blobs, or is there
> a more fundamental reason they are really expensive?  (I'm just
> wondering if we are risking changing design in some areas based on
> suboptimal implementation of other things.  I don't actually have
> experience with partial clones yet, though, so I'm basically just
> querying about random but interesting things without any experience to
> back it up.)

GitHub doesn't support pathspec filters for partial clone because it
is too expensive to calculate that initial packfile (cannot use
reachability bitmaps). Even outside of that initial cost, we have
problems.

The biggest problem is that we ask for the tree as a one-off request.
There are two ways to approach this:

1. Ask for all trees that are reachable from that tree so we can
   complete the tree walk (current behavior). This downloads trees we
   already have, most of the time.

2. Ask for only that tree and no extra objects. This causes the request
   count to increase significantly, especially during a 'git pull' or
   'git checkout' that spans a large distance.

In either case, commands like "git log -- README.md" are really bad in
a treeless clone (--filter=tree:0).

For the sparse-checkout case, we still need the trees outside of our
sparse cone in order to construct an index, even if we never actually
check out those files. (Maybe not forever, though...)

And maybe the solution would be to ask the server for your missing
trees in the entire history when you change sparse-checkout definition,
but what does that request look like?

 client> I have these commits with trees according to this pathspec.
 client> I want these commits with trees according to a new pathspec.
 server> *flips table*

>> I think there are three possible situations:
>>
>> 1. sparse-checkout should not affect the behavior at all.
>>
>>    An example for this is "git commit". We want the root tree to contain
>>    all of the subtrees and blobs that are out of the sparse-checkout
>>    definition. The underlying object model should never change.
>>
>> 2. sparse-checkout should change the default, but users can opt-out.
>>
>>    The examples I think of here are 'git grep' and 'git rm', as we have
>>    discussed recently. Having a default of "you already chose to be in
>>    a sparse-checkout, so we think this behavior is better for you"
>>    should continue to be pursued.
>>
>> 3. Users can opt-in to a sparse-checkout version of a behavior.
>>
>>    The example in this case is "git diff". Perhaps we would want to see
>>    a diff scoped only to our sparse definition, but that should not be
>>    the default. It is too risky to change the output here without an
>>    explicit choice by the user.
> 
> I'm curious why you put grep and diff in different categories.  A
> plain "git diff" without revisions will give the same output whether
> or not it restricts to the sparsity paths (because the other paths are
> unchanged), so restricting is purely an optimization question.  Making
> "git diff REVISION" restrict to the sparsity paths would be a
> behavioral change as you note, but "git grep [REVISION]" would also
> require a behavioral change to limit to the sparsity paths.  If it's
> too risky to change the output for git diff with revisions, why is it
> not also too risky to do that with git grep with revisions?

I generally think of 'grep' as being "search for something I care about"
which is easier to justify scoping to sparse-checkouts.

'diff' is something that I usually think of as "compare two git objects"
and it is operating on immutable data.

The practical difference comes into play with a blobless partial clone:
'diff' will download blobs that need a content comparison, so the cost
is relative to the number of changed paths in that region and relative
to the requested output. 'grep' will download every blob reachable from
the root tree. We've seen too many cases of users trying 'git grep' to
search the Windows codebase and complaining that it takes too long
(because they are downloading 3 million blobs one at a time).

> Also, I think you are missing a really important category:
> 
> 4. sparse-checkout changes the behavior of commands and there is no
> opt-out or configurability provided.
> 
> The most obvious examples are switch and checkout -- their modified
> behavior is really the /point/ of sparse-checkouts and if you want to
> "opt out" then just don't use sparse-checkouts.  `reset --hard` can go
> in the same bucket; it's modified in the same way.  However, some
> commands are modified in a different way, but also have no opt-out --
> for example, merge, rebase, cherry-pick, revert, and stash, all "try"
> to avoid writing files to the working tree that match the sparsify
> specifications, but will vivify files which have conflicts (and maybe
> a few additional files based on implementation shortcomings).  Another
> command that behaves differently than any of these, and is also
> non-configurable in this change, is git-add.  It'll ignore any tracked
> files with the SKIP_WORKTREE bit set, even if the file is present.
> That's really helpful thing for "git add -A [GLOB_OR_DIRECTORY]" to
> do, as we don't want sparsity to accidentally be treated as a
> directive to remove files from the repository.

True. Except for these, the opt-in/out is "git sparse-checkout init"
and "git sparse-checkout disable". If I want "git checkout" to behave
differently, then I modify my sparse-checkout definition or disable
it altogether.

Perhaps instead we should think of this category as the "core
functionality of sparse-checkout."

> I think more commands should fall under this fourth category as well,
> including rm.

The biggest issue with 'rm' is that users may want to use it to
delete paths outside of their sparse-checkout according to a
pathspec. This is especially true since it is the current
behavior, so if we change it by default we might discover more
complaints than the current requests for a way to limit to the
sparse-checkout definition.

>>>   * mt/grep-sparse-checkout: figure out shorter flag names.  Default to
>>>       --no-restrict-to-sparse, for now.  Then merge it for git-2.31.
>>
>> I don't want to derail your high-level conversation too much, but by the
>> end of January I hope to send an RFC to create a "sparse index" which allows
>> the index to store entries corresponding to a directory with the skip-
>> worktree bit on. The biggest benefit is that commands like 'git status' and
>> 'git add' will actually change their performance based on the size of the
>> sparse-checkout definition and not the total number of paths at HEAD.
> 
> This is _awesome_; I think it'll be huge.  It'll cause even more
> commands behavior to change, of course, but in a good way.  And I
> don't consider this derailing at all but extending the discussion
> complete with extra investigation work.  :-)
> 
>> The other thing that happens once we have that idea is that these behaviors
>> in 'git grep' or 'git rm' actually become _easier_ to implement because we
>> don't even have an immediate reference to the blobs outside of the sparse
>> cone (assuming cone mode).
>>
>> The tricky part (that I'm continuing to work on, hence no RFC today) is
>> enabling the part where a user can opt-in to the old behavior. This requires
>> parsing trees to expand the index as necessary. A simple approach is to
>> create an in-memory index that is the full expansion at HEAD, when necessary.
>> It will be better to do expansions in a targeted way.
> 
> I'm not sure if you're just thinking of the old mt/rm-sparse-checkout
> and commenting on it, or if you're actively disagreeing with my
> proposal for rm.

I remember the discussion around how making 'rm' sparse-aware was more
complicated than "only look at entries without CE_SKIP_WORKTREE" but
it might be easier with a sparse-index. So my intention here was to
see if we should _delay_ our investigation here until I can at least
get a prototype ready for inspection.

I'm also saying that perhaps we could redirect this discussion around
how to opt-in/out of these changes. Much like your "category 4" above
being "behavior expected when in a sparse-checkout," what if this
behavior of restricting to the sparse set was expected when using a
sparse-index instead of based on config options or run-time arguments?

What if we had something like "git update-index --[no-]sparse" to
toggle between the two states?

That's my intention with bringing up my half-baked idea before I have
code to show for it.

>> (Your merge-ort algorithm is critical to the success here, since that doesn't
>> use the index as a data structure. I expect to make merge-ort the default for
>> users with a sparse index. Your algorithm will be done first.)
> 
> Well, at 50 added/changed lines per patch, I've only got ~50 more
> patches to go for ort after the ones I submitted Monday (mostly
> optimization related).  If I submit 10 patches per week (starting next
> week since I already sent a big patchset this week), then maybe
> mid-to-late February.  That's a more aggressive pace than we've
> managed so far, but maybe it gets easier towards the end?  Anyway,
> hopefully that helps you with timing predictions.
> 
> On my end, this does make the ort work look like there's finally some
> light at the end of the tunnel; I just hope it's not an oncoming
> train. :-)

While I expect to have an RFC ready at the end of the month, I expect
I will be working on sparse-index for the entire 2021 calendar year
before it will be fully ready to use by end-users. I expect my RFC to
have fast "git status" and "git add" times, but other commands will
have a guard that expands a sparse-index into a "full" index before
proceeding. This protection will keep behavior consistent but will
cause performance problems. Iteratively removing these guards and
implementing "sparse-aware" versions of each index operation will take
time and care.

>> My point in bringing this up is that perhaps we should pause concrete work on
>> updating other builtins until we have a clearer idea of what a sparse index
>> could look like and how the implementation would change based on having one
>> or not. I hope that my RFC will be illuminating in this regard.
> 
> Are you suggesting to pause any work on those pieces of the proposal
> that might be affected by your sparse index, or pause any work at all
> on sparse-checkouts?  For example, I think
> en/stash-apply-sparse-checkout that's been sitting in seen is good to
> merge down to master now.  I suspect mt/rm-sparse-checkout WITH my
> suggested changes (no configurability -- similar to git-add) and a
> better warning/error message for git-add are some examples of cleanups
> that could be done before your sparse index, but if you're worried
> about conflicting I certainly don't want to derail your project.  (I
> agree that anything with configurability and touching on "behavior A"
> or "sparse parallelax", like mt/grep-sparse-checkout would be better
> if we waited on.  I do feel pretty bad for how much we've made Matheus
> wait on that series, but waiting does still seem best.)

I don't want to hold up valuable work. It's just tricky to navigate
parallel efforts in the same space. I'm asking for a little more time
to get my stuff together to see if it would influence your work.

But it is unreasonable for me to "squat" on the feature and keep others
from making valuable improvements.

>>>     * shorter flag names than --[no-]restrict-to-sparse.  --dense and
>>>       --sparse?  --[no-]restrict?
>>
>> --full-workdir?
> 
> Hmm.  "workdir" sounds like an abbreviation of "working directory",
> which is the place where the files are checked out.  And the working
> directory is sparse in a sparse-checkout.  So isn't this misleading?
> Or did you intend for this option to be the name for requesting a
> sparser set?  (If so, isn't "full" in its name a bit weird?)
> 
> Also, what would the inverse name of --full-workdir be?  I was looking
> to add options for both restricting the command to the sparser set and
> for expanding to the full set of files.  Though I guess as you note
> below, you perhaps might be in favor of only one of these without
> configuration options to adjust defaults.

Right. Perhaps --full-tree or --sparse-tree would be better? I was
trying to link the adjectives "full" and "sparse" to a noun that they
modify.

--dense already exists in rev-list to describe a form of history
simplification.

>>> --> Commands that would change for behavior A
>>>   * bisect
>>>     * Only consider commits touching paths matching sparsity patterns
>>>   * diff
>>>     * When given revisions, only show subset of files matching sparsity
>>>       patterns.  If pathspecs are given, intersect them with sparsity
>>>       patterns.
>>>   * log
>>>     * Only consider commits touching at least one path matching sparsity
>>>       patterns.  If pathspecs are given, paths must match both the
>>>       pathspecs and the sparsity patterns in order to be considered
>>>       relevant and be shown.
>>>   * gitk
>>>     * See log
>>>   * shortlog
>>>     * See log
>>>   * grep
>>>     * See mt/grep-sparse-checkout; it's been discussed in detail..and is
>>>       implemented.  (Other than that we don't want behavior A to be the
>>>       default when so many commands do not support it yet.)
>>>
>>>   * show-branch
>>>     * See log
>>>   * whatchanged
>>>     * See log
>>>   * show (at least for commits)
>>>     * See diff
>>>
>>>   * blame
>>>     * With -C or -C -C, only detect lines moved/copied from files that match
>>>       the sparsity paths.
>>>   * annotate
>>>     * See blame.
>>
>> this "behavior A" idea is the one I'm most skeptical about. Creating a
>> way to opt-in to a sparse definition might be nice. It might be nice to
>> run "git log --simplify-sparse" to see the simplified history when only
>> caring about commits that changed according to the current sparse-checkout
>> definitions. Expand that more when asking for diffs as part of that log,
>> and the way we specify the option becomes tricky.
> 
> --simplify-sparse is a really long name to need to specify at every
> invocation.  Also, if we have --[no]-restrict or --sparse/--dense
> options at the git level (rather than the subcommand level), then I
> think we don't want extra ones like this at the subcommand level.
> 
> Also, if the option appears at the global git level, doesn't that
> remove the trickiness of revision traversal vs. diff outputting in
> commands like log?  It just automatically applies to both.  (The only
> trickiness would be if you wanted to somehow apply sparsity patterns
> to just revision traversal or just diff outputting but not to both,
> but that's already tricky in log with explicit pathspecs and we've
> traditionally had files restrict both.)
> 
>> But I also want to avoid doing this as a default or even behind a config
>> setting. We already get enough complains about "missing commits" when
>> someone does a bad merge so "git log -- file" simplifies away a commit
>> that exists in the full history. Imagine someone saying "on my machine,
>> 'git log' shows the commit, but my colleague can't see it!" I would really
>> like to avoid adding to that confusion if possible.
> 
> That's a good point.  A really good point.  Maybe we do only want to
> allow explicit requests for this behavior -- and thus need very short
> option name for it.

And even though I mentioned earlier that "having a sparse-index might
be a good way to opt-in," I would still say that simplifying commit
history in 'git log' or reducing diff output would still require a
short command-line option.

> Here's a not-even-half-baked idea for thought: What if we allowed a
> configuration option to control this, BUT whenever a command like
> diff/grep/log restricts output based on the sparsity paths due solely
> to the configuration option, it prints a small reminder on stderr at
> the beginning of the output (e.g. "Note: output limited to sparsity
> paths, as per sparse.restrictCmds setting")?

I'm not thrilled with this idea, but perhaps the warning can be
toggled by an advice.* config option.

>>> --> Commands whose behavior I'm still uncertain of:
>>>   * worktree add
>>>     * for behavior A (marrying sparse-checkout with partial clone), we
>>>       should almost certainly copy sparsity paths from the previous
>>>       worktree (we either have to do that or have some kind of
>>>       specify-at-clone-time default set of sparsity paths)
>>>     * for behavior B, we may also want to copy sparsity paths from the
>>>       previous worktree (much like a new command line shell will copy
>>>       $PWD from the previous one), but it's less clear.  Should it?
>>
>> I think 'git worktree add' should at minimum continue using a sparse-
>> checkout if the current working directory has one. Worktrees are a
>> great way to scale the creation of multiple working directories for
>> the same repository without re-cloning all of the history. In a partial
>> clone case, it's really important that we don't explode the workdir in
>> the new worktree (or even download all those blobs).
> 
> Okay, sounds like you agree with me for the partial clone case -- it's
> necessary.
> 
> But what about the non-partial clone case?  I think it should adopt
> the sparsity in that case too, but Junio has objected in the past.
> I'm pretty sure Junio wasn't thinking about the partial clone case,
> where I think it seems obvious and compelling.  But I'm not sure how
> best to convince him in the non-partial clone case (or maybe I already
> did; he didn't respond further after his initial objection).

We might want to consider certain behavior to be on by default when
enough other optional features are enabled. A philosophy such as "We
see you are using partial clone and sparse-checkout, so we restricted
the search in 'git grep' for your own good" might be useful here.

>> Thanks for starting the discussion. Perhaps more will pick it up as
>> they return from the holiday break.
> 
> Thanks for jumping in and pushing it much further with sparse indices
> (or is it sparse indexes?)  I'm excited.

Another way to push this discussion further would be to create a
forward-looking documentation file in Documentation/technical.
We could use such a documentation as a place to organize thoughts
and plans, especially things like:

* How sparse-checkout works and why users need it.
* How it works (and doesn't work) with partial clone.
* Plans for modifying behavior in sparse scenarios:
  - Current behavior that is wrong or suspicious.
  - Commands that could have different default behavior.
  - Commands that could have different opt-in behavior.
  (This section would include a description of the planned flag
   that modifies behavior to limit to the sparse set.)

I would add a section about sparse-index into such a document, if it
existed.

As things get implemented, these items could be moved out of the
technical documentation and into Documentation/git-sparse-checkout.txt
so we have a central place for users to discover how sparse-checkout
can change their behavior.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: sparse-checkout questions and proposals [Was: Re: [PATCH] rm: honor sparse checkout patterns]
  2021-01-07 12:53             ` Derrick Stolee
@ 2021-01-07 17:36               ` Elijah Newren
  0 siblings, 0 replies; 56+ messages in thread
From: Elijah Newren @ 2021-01-07 17:36 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Matheus Tavares Bernardino, Git Mailing List, Junio C Hamano

On Thu, Jan 7, 2021 at 4:53 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 1/6/2021 2:15 PM, Elijah Newren wrote:
> > On Sun, Jan 3, 2021 at 7:02 PM Derrick Stolee <stolee@gmail.com> wrote:
> >>
> >> On 12/31/2020 3:03 PM, Elijah Newren wrote:
> >> Others use sparse-checkout to remove a few large files unless they
> >> need them. I'm less interested in this case, myself.
> >>
> >> Both perspectives get better with partial clone because the download
> >> size shrinks significantly. While partial clone has a sparse-checkout
> >> style filter, it is hard to compute on the server side. Further, it
> >> is not very forgiving of someone wanting to change their sparse
> >> definition after cloning. Tree misses are really expensive, and I find
> >> that the extra network transfer of the full tree set is a price that is
> >> worth paying.
> >
> > Out of curiosity, is that because the promisor handling doesn't do
> > nice batching of trees to download, as is done for blobs, or is there
> > a more fundamental reason they are really expensive?  (I'm just
> > wondering if we are risking changing design in some areas based on
> > suboptimal implementation of other things.  I don't actually have
> > experience with partial clones yet, though, so I'm basically just
> > querying about random but interesting things without any experience to
> > back it up.)
>
> GitHub doesn't support pathspec filters for partial clone because it
> is too expensive to calculate that initial packfile (cannot use
> reachability bitmaps). Even outside of that initial cost, we have
> problems.
>
> The biggest problem is that we ask for the tree as a one-off request.
> There are two ways to approach this:
>
> 1. Ask for all trees that are reachable from that tree so we can
>    complete the tree walk (current behavior). This downloads trees we
>    already have, most of the time.
>
> 2. Ask for only that tree and no extra objects. This causes the request
>    count to increase significantly, especially during a 'git pull' or
>    'git checkout' that spans a large distance.
>
> In either case, commands like "git log -- README.md" are really bad in
> a treeless clone (--filter=tree:0).
>
> For the sparse-checkout case, we still need the trees outside of our
> sparse cone in order to construct an index, even if we never actually
> check out those files. (Maybe not forever, though...)
>
> And maybe the solution would be to ask the server for your missing
> trees in the entire history when you change sparse-checkout definition,
> but what does that request look like?
>
>  client> I have these commits with trees according to this pathspec.
>  client> I want these commits with trees according to a new pathspec.
>  server> *flips table*

Ah, the good old *flips table* codepath.  That'd be a fun area of the
code to work on.  ;-)

To be serious, though, thanks for the extra info.

> >> I think there are three possible situations:
> >>
> >> 1. sparse-checkout should not affect the behavior at all.
> >>
> >>    An example for this is "git commit". We want the root tree to contain
> >>    all of the subtrees and blobs that are out of the sparse-checkout
> >>    definition. The underlying object model should never change.
> >>
> >> 2. sparse-checkout should change the default, but users can opt-out.
> >>
> >>    The examples I think of here are 'git grep' and 'git rm', as we have
> >>    discussed recently. Having a default of "you already chose to be in
> >>    a sparse-checkout, so we think this behavior is better for you"
> >>    should continue to be pursued.
> >>
> >> 3. Users can opt-in to a sparse-checkout version of a behavior.
> >>
> >>    The example in this case is "git diff". Perhaps we would want to see
> >>    a diff scoped only to our sparse definition, but that should not be
> >>    the default. It is too risky to change the output here without an
> >>    explicit choice by the user.
> >
> > I'm curious why you put grep and diff in different categories.  A
> > plain "git diff" without revisions will give the same output whether
> > or not it restricts to the sparsity paths (because the other paths are
> > unchanged), so restricting is purely an optimization question.  Making
> > "git diff REVISION" restrict to the sparsity paths would be a
> > behavioral change as you note, but "git grep [REVISION]" would also
> > require a behavioral change to limit to the sparsity paths.  If it's
> > too risky to change the output for git diff with revisions, why is it
> > not also too risky to do that with git grep with revisions?
>
> I generally think of 'grep' as being "search for something I care about"
> which is easier to justify scoping to sparse-checkouts.
>
> 'diff' is something that I usually think of as "compare two git objects"
> and it is operating on immutable data.
>
> The practical difference comes into play with a blobless partial clone:
> 'diff' will download blobs that need a content comparison, so the cost
> is relative to the number of changed paths in that region and relative
> to the requested output. 'grep' will download every blob reachable from
> the root tree. We've seen too many cases of users trying 'git grep' to
> search the Windows codebase and complaining that it takes too long
> (because they are downloading 3 million blobs one at a time).

Oh, is the primary difference here that you're complaining about a bug
in git grep, where without --cached it mixes worktree search results
with index search results?  That's just a flat out bug that should be
fixed.  See https://lore.kernel.org/git/CABPp-BGVO3QdbfE84uF_3QDF0-y2iHHh6G5FAFzNRfeRitkuHw@mail.gmail.com/.
grep is supposed to search the working tree by default, and *just* the
working tree -- as documented from the beginning.  If you specify
--cached, it'll search the index.  If you specify revisions, then it
searches revisions.  There is not supposed to currently be mixing and
matching of searches from multiple areas.  Someone could add such an
ability to search multiple locations, but then the user should have to
specify that behavior.  The fact that sparse-checkouts search multiple
areas is a *bug*, regardless of sparse indexes or not, regardless of
"behavior A" or "behavior B", etc.

The interesting bit, is whether `git grep ... REVISION` should
restrict to sparsity paths.  Unlike the working tree, REVISION
probably has many paths that don't match the sparsity paths.  Same for
--cached.  That part makes sense to make configurable.

So, I guess my question is, after the bug in git grep is fixed that
I've been railing about for nearly a year (and sadly got tied up in
other changes Matheus wanted to make so that the rest could be made
configurable), then do you consider the rest of git grep different
from git diff?

> > Also, I think you are missing a really important category:
> >
> > 4. sparse-checkout changes the behavior of commands and there is no
> > opt-out or configurability provided.
> >
> > The most obvious examples are switch and checkout -- their modified
> > behavior is really the /point/ of sparse-checkouts and if you want to
> > "opt out" then just don't use sparse-checkouts.  `reset --hard` can go
> > in the same bucket; it's modified in the same way.  However, some
> > commands are modified in a different way, but also have no opt-out --
> > for example, merge, rebase, cherry-pick, revert, and stash, all "try"
> > to avoid writing files to the working tree that match the sparsify
> > specifications, but will vivify files which have conflicts (and maybe
> > a few additional files based on implementation shortcomings).  Another
> > command that behaves differently than any of these, and is also
> > non-configurable in this change, is git-add.  It'll ignore any tracked
> > files with the SKIP_WORKTREE bit set, even if the file is present.
> > That's really helpful thing for "git add -A [GLOB_OR_DIRECTORY]" to
> > do, as we don't want sparsity to accidentally be treated as a
> > directive to remove files from the repository.
>
> True. Except for these, the opt-in/out is "git sparse-checkout init"
> and "git sparse-checkout disable". If I want "git checkout" to behave
> differently, then I modify my sparse-checkout definition or disable
> it altogether.
>
> Perhaps instead we should think of this category as the "core
> functionality of sparse-checkout."

The core functionality of sparse-checkout has always been only
partially implemented.  See the last few lines of t7012.  See the bugs
over the years with the merge machinery and SKIP_WORKTREE entries.
See the bug reports with git-stash.  See my long explanation to
Matheus on how git-grep without --cached or revisions is flat broken
in sparse-checkouts[1].  See Junio's comments about how "the sparse
checkout 'feature' itself is a hack'"[2] and that folks working in
other areas didn't need to provide full support for it.

So, that raises the question -- what else is "core functionality of
sparse-checkout" besides what I listed above?  I reject the idea that
whatever is currently implemented is the bright line.  The rest is
certainly up for discussion, but I don't like using the idea of
current behavior as the litmus test for whether something is core
functionality.  In fact this is the very reason why I so strongly
requested the huge warning in the sparse-checkout documentation[3].

[1] https://lore.kernel.org/git/CABPp-BGVO3QdbfE84uF_3QDF0-y2iHHh6G5FAFzNRfeRitkuHw@mail.gmail.com/
[2] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/
[3] https://lore.kernel.org/git/CABPp-BEryfaeYhuUsiDTaYdRKpK6GRi7hgZ5XSTVkoHVkx2qQA@mail.gmail.com/

And I'm arguing a change in rm behavior should be core functionality
of sparse-checkout; more on that below.

> > I think more commands should fall under this fourth category as well,
> > including rm.
>
> The biggest issue with 'rm' is that users may want to use it to
> delete paths outside of their sparse-checkout according to a
> pathspec. This is especially true since it is the current
> behavior, so if we change it by default we might discover more
> complaints than the current requests for a way to limit to the
> sparse-checkout definition.

Users cannot update files outside their sparsity paths using add,
though.  Even if they create the file from scratch with the necessary
changes and run 'git add' on it.  And that's core functionality.
Users must first change their sparsity paths before trying to add such
files.  If we were to allow them, I think things get weird.

I think that's a good model.  We should require them to do the same
for removing.  If they come to us requesting a way to delete paths
outside their sparse-checkout, we give them the same answer that we do
for updating paths outside their sparse-checkout: simply change your
sparsity definition first.  It's a simple solution.  No
configurability is required here, IMO, and it makes commands more
consistent with each other.

Of course, that's the high-level explanation.  Digging in to the
details, there we should also change or add warning/error messages for
both 'add' and 'rm' in some cases; see my original email in this
thread for that discussion.

> >>>   * mt/grep-sparse-checkout: figure out shorter flag names.  Default to
> >>>       --no-restrict-to-sparse, for now.  Then merge it for git-2.31.
> >>
> >> I don't want to derail your high-level conversation too much, but by the
> >> end of January I hope to send an RFC to create a "sparse index" which allows
> >> the index to store entries corresponding to a directory with the skip-
> >> worktree bit on. The biggest benefit is that commands like 'git status' and
> >> 'git add' will actually change their performance based on the size of the
> >> sparse-checkout definition and not the total number of paths at HEAD.
> >
> > This is _awesome_; I think it'll be huge.  It'll cause even more
> > commands behavior to change, of course, but in a good way.  And I
> > don't consider this derailing at all but extending the discussion
> > complete with extra investigation work.  :-)
> >
> >> The other thing that happens once we have that idea is that these behaviors
> >> in 'git grep' or 'git rm' actually become _easier_ to implement because we
> >> don't even have an immediate reference to the blobs outside of the sparse
> >> cone (assuming cone mode).
> >>
> >> The tricky part (that I'm continuing to work on, hence no RFC today) is
> >> enabling the part where a user can opt-in to the old behavior. This requires
> >> parsing trees to expand the index as necessary. A simple approach is to
> >> create an in-memory index that is the full expansion at HEAD, when necessary.
> >> It will be better to do expansions in a targeted way.
> >
> > I'm not sure if you're just thinking of the old mt/rm-sparse-checkout
> > and commenting on it, or if you're actively disagreeing with my
> > proposal for rm.
>
> I remember the discussion around how making 'rm' sparse-aware was more
> complicated than "only look at entries without CE_SKIP_WORKTREE" but
> it might be easier with a sparse-index. So my intention here was to
> see if we should _delay_ our investigation here until I can at least
> get a prototype ready for inspection.
>
> I'm also saying that perhaps we could redirect this discussion around
> how to opt-in/out of these changes. Much like your "category 4" above
> being "behavior expected when in a sparse-checkout," what if this
> behavior of restricting to the sparse set was expected when using a
> sparse-index instead of based on config options or run-time arguments?

All the delay was based on the configurability (which I initially
thought was needed too).  But after further recent investigation, I
think configurability for rm's behavior is just wrong.  rm should
restrict to sparse paths, just like add does.  I agree on delaying any
new features or changes that require configurability (and I agree with
you that the sparse-index could play a role in how to configure
things).  I just think that en/stash-apply-sparse-checkout and a
modified version of mt/rm-sparse-checkout are two cases that don't
require configurability and could go in first.

I also think pulling the bugfix out of mt/grep-sparse-checkout (don't
mix worktree and index searches) and merging it could happen now,
while waiting off on all the other bits that require configuration.

> What if we had something like "git update-index --[no-]sparse" to
> toggle between the two states?

Ooh, interesting idea.  We might have to discuss the name a bit; I'm
worried folks might struggle to differentiate between that command and
`git sparse-checkout {init,disable}`.

> That's my intention with bringing up my half-baked idea before I have
> code to show for it.
>
> >> (Your merge-ort algorithm is critical to the success here, since that doesn't
> >> use the index as a data structure. I expect to make merge-ort the default for
> >> users with a sparse index. Your algorithm will be done first.)
> >
> > Well, at 50 added/changed lines per patch, I've only got ~50 more
> > patches to go for ort after the ones I submitted Monday (mostly
> > optimization related).  If I submit 10 patches per week (starting next
> > week since I already sent a big patchset this week), then maybe
> > mid-to-late February.  That's a more aggressive pace than we've
> > managed so far, but maybe it gets easier towards the end?  Anyway,
> > hopefully that helps you with timing predictions.
> >
> > On my end, this does make the ort work look like there's finally some
> > light at the end of the tunnel; I just hope it's not an oncoming
> > train. :-)
>
> While I expect to have an RFC ready at the end of the month, I expect
> I will be working on sparse-index for the entire 2021 calendar year
> before it will be fully ready to use by end-users. I expect my RFC to
> have fast "git status" and "git add" times, but other commands will
> have a guard that expands a sparse-index into a "full" index before
> proceeding. This protection will keep behavior consistent but will
> cause performance problems. Iteratively removing these guards and
> implementing "sparse-aware" versions of each index operation will take
> time and care.
>
> >> My point in bringing this up is that perhaps we should pause concrete work on
> >> updating other builtins until we have a clearer idea of what a sparse index
> >> could look like and how the implementation would change based on having one
> >> or not. I hope that my RFC will be illuminating in this regard.
> >
> > Are you suggesting to pause any work on those pieces of the proposal
> > that might be affected by your sparse index, or pause any work at all
> > on sparse-checkouts?  For example, I think
> > en/stash-apply-sparse-checkout that's been sitting in seen is good to
> > merge down to master now.  I suspect mt/rm-sparse-checkout WITH my
> > suggested changes (no configurability -- similar to git-add) and a
> > better warning/error message for git-add are some examples of cleanups
> > that could be done before your sparse index, but if you're worried
> > about conflicting I certainly don't want to derail your project.  (I
> > agree that anything with configurability and touching on "behavior A"
> > or "sparse parallelax", like mt/grep-sparse-checkout would be better
> > if we waited on.  I do feel pretty bad for how much we've made Matheus
> > wait on that series, but waiting does still seem best.)
>
> I don't want to hold up valuable work. It's just tricky to navigate
> parallel efforts in the same space. I'm asking for a little more time
> to get my stuff together to see if it would influence your work.
>
> But it is unreasonable for me to "squat" on the feature and keep others
> from making valuable improvements.

I'm totally willing to hold off on bigger changes, new features, and
anything that requires configurability.  I'm also willing to hold off
on any bug fixes that I have reason to think might conflict with your
work.

However, I think that several bug fixes would be independent of your
current work.  For example, the already-submitted
en/stash-apply-sparse-checkout from a month ago (which I think should
be merged to master), and fixing up reset --hard.  You know more about
me than your changes, though; do you have reason to think these might
conflict?

> >>>     * shorter flag names than --[no-]restrict-to-sparse.  --dense and
> >>>       --sparse?  --[no-]restrict?
> >>
> >> --full-workdir?
> >
> > Hmm.  "workdir" sounds like an abbreviation of "working directory",
> > which is the place where the files are checked out.  And the working
> > directory is sparse in a sparse-checkout.  So isn't this misleading?
> > Or did you intend for this option to be the name for requesting a
> > sparser set?  (If so, isn't "full" in its name a bit weird?)
> >
> > Also, what would the inverse name of --full-workdir be?  I was looking
> > to add options for both restricting the command to the sparser set and
> > for expanding to the full set of files.  Though I guess as you note
> > below, you perhaps might be in favor of only one of these without
> > configuration options to adjust defaults.
>
> Right. Perhaps --full-tree or --sparse-tree would be better? I was
> trying to link the adjectives "full" and "sparse" to a noun that they
> modify.
>
> --dense already exists in rev-list to describe a form of history
> simplification.

I still think the flag should be a git global flag (like --no-pager,
--work-tree, or --git-dir), not a subcommand flag, so rev-list's
--dense flag isn't a collision even if we pick --dense/--sparse.

--full-tree and --sparse-tree are good considerations though to add to
the list we've come up with so far:
   --{full,sparse}-tree
   --dense/--sparse
   --[no-]restrict
   --[no-]restrict-to-sparse-paths

> >>> --> Commands that would change for behavior A
> >>>   * bisect
> >>>     * Only consider commits touching paths matching sparsity patterns
> >>>   * diff
> >>>     * When given revisions, only show subset of files matching sparsity
> >>>       patterns.  If pathspecs are given, intersect them with sparsity
> >>>       patterns.
> >>>   * log
> >>>     * Only consider commits touching at least one path matching sparsity
> >>>       patterns.  If pathspecs are given, paths must match both the
> >>>       pathspecs and the sparsity patterns in order to be considered
> >>>       relevant and be shown.
> >>>   * gitk
> >>>     * See log
> >>>   * shortlog
> >>>     * See log
> >>>   * grep
> >>>     * See mt/grep-sparse-checkout; it's been discussed in detail..and is
> >>>       implemented.  (Other than that we don't want behavior A to be the
> >>>       default when so many commands do not support it yet.)
> >>>
> >>>   * show-branch
> >>>     * See log
> >>>   * whatchanged
> >>>     * See log
> >>>   * show (at least for commits)
> >>>     * See diff
> >>>
> >>>   * blame
> >>>     * With -C or -C -C, only detect lines moved/copied from files that match
> >>>       the sparsity paths.
> >>>   * annotate
> >>>     * See blame.
> >>
> >> this "behavior A" idea is the one I'm most skeptical about. Creating a
> >> way to opt-in to a sparse definition might be nice. It might be nice to
> >> run "git log --simplify-sparse" to see the simplified history when only
> >> caring about commits that changed according to the current sparse-checkout
> >> definitions. Expand that more when asking for diffs as part of that log,
> >> and the way we specify the option becomes tricky.
> >
> > --simplify-sparse is a really long name to need to specify at every
> > invocation.  Also, if we have --[no]-restrict or --sparse/--dense
> > options at the git level (rather than the subcommand level), then I
> > think we don't want extra ones like this at the subcommand level.
> >
> > Also, if the option appears at the global git level, doesn't that
> > remove the trickiness of revision traversal vs. diff outputting in
> > commands like log?  It just automatically applies to both.  (The only
> > trickiness would be if you wanted to somehow apply sparsity patterns
> > to just revision traversal or just diff outputting but not to both,
> > but that's already tricky in log with explicit pathspecs and we've
> > traditionally had files restrict both.)
> >
> >> But I also want to avoid doing this as a default or even behind a config
> >> setting. We already get enough complains about "missing commits" when
> >> someone does a bad merge so "git log -- file" simplifies away a commit
> >> that exists in the full history. Imagine someone saying "on my machine,
> >> 'git log' shows the commit, but my colleague can't see it!" I would really
> >> like to avoid adding to that confusion if possible.
> >
> > That's a good point.  A really good point.  Maybe we do only want to
> > allow explicit requests for this behavior -- and thus need very short
> > option name for it.
>
> And even though I mentioned earlier that "having a sparse-index might
> be a good way to opt-in," I would still say that simplifying commit
> history in 'git log' or reducing diff output would still require a
> short command-line option.
>
> > Here's a not-even-half-baked idea for thought: What if we allowed a
> > configuration option to control this, BUT whenever a command like
> > diff/grep/log restricts output based on the sparsity paths due solely
> > to the configuration option, it prints a small reminder on stderr at
> > the beginning of the output (e.g. "Note: output limited to sparsity
> > paths, as per sparse.restrictCmds setting")?
>
> I'm not thrilled with this idea, but perhaps the warning can be
> toggled by an advice.* config option.

You've noted multiple times that you're leery of even providing such a
configuration option -- and you provided good rationale to be worried
about it.  So, if we did use this escape hatch to make it sane to
allow such a configuration option, I'd say this is probably one case
where we do not allow this advice to be turned off.  (In fact, we'd
not only avoid implementing it, we'd be careful to document that it's
intentionally not implemented so that someone doesn't come along later
and assume we just overlooked it.)

But that's still an 'if' and is just an idea I threw out there for us
to consider over the next year or two; we may well require the short
command-line option every time as you say.

> >>> --> Commands whose behavior I'm still uncertain of:
> >>>   * worktree add
> >>>     * for behavior A (marrying sparse-checkout with partial clone), we
> >>>       should almost certainly copy sparsity paths from the previous
> >>>       worktree (we either have to do that or have some kind of
> >>>       specify-at-clone-time default set of sparsity paths)
> >>>     * for behavior B, we may also want to copy sparsity paths from the
> >>>       previous worktree (much like a new command line shell will copy
> >>>       $PWD from the previous one), but it's less clear.  Should it?
> >>
> >> I think 'git worktree add' should at minimum continue using a sparse-
> >> checkout if the current working directory has one. Worktrees are a
> >> great way to scale the creation of multiple working directories for
> >> the same repository without re-cloning all of the history. In a partial
> >> clone case, it's really important that we don't explode the workdir in
> >> the new worktree (or even download all those blobs).
> >
> > Okay, sounds like you agree with me for the partial clone case -- it's
> > necessary.
> >
> > But what about the non-partial clone case?  I think it should adopt
> > the sparsity in that case too, but Junio has objected in the past.
> > I'm pretty sure Junio wasn't thinking about the partial clone case,
> > where I think it seems obvious and compelling.  But I'm not sure how
> > best to convince him in the non-partial clone case (or maybe I already
> > did; he didn't respond further after his initial objection).
>
> We might want to consider certain behavior to be on by default when
> enough other optional features are enabled. A philosophy such as "We
> see you are using partial clone and sparse-checkout, so we restricted
> the search in 'git grep' for your own good" might be useful here.

You're dodging the question.  ;-)  We already agree on the behavior
for sparse-checkouts with worktrees when partial clones are in effect;
it seems obvious any newly added worktree needs to also be sparse in
such a case.  The whole question was what about the non-partial-clone
case.  I'm of the opinion that it makes sense there too (if for no
other reason than making it easy to explain and understand what
worktree does without having to provide a list of cases to users), but
was curious if others had thoughts on the matter.  Maybe you don't
care since you always use partial clones?

> >> Thanks for starting the discussion. Perhaps more will pick it up as
> >> they return from the holiday break.
> >
> > Thanks for jumping in and pushing it much further with sparse indices
> > (or is it sparse indexes?)  I'm excited.
>
> Another way to push this discussion further would be to create a
> forward-looking documentation file in Documentation/technical.
> We could use such a documentation as a place to organize thoughts
> and plans, especially things like:
>
> * How sparse-checkout works and why users need it.
> * How it works (and doesn't work) with partial clone.
> * Plans for modifying behavior in sparse scenarios:
>   - Current behavior that is wrong or suspicious.
>   - Commands that could have different default behavior.
>   - Commands that could have different opt-in behavior.
>   (This section would include a description of the planned flag
>    that modifies behavior to limit to the sparse set.)
>
> I would add a section about sparse-index into such a document, if it
> existed.
>
> As things get implemented, these items could be moved out of the
> technical documentation and into Documentation/git-sparse-checkout.txt
> so we have a central place for users to discover how sparse-checkout
> can change their behavior.

Yeah, good idea.  This thread has a lot of that information, so I'd be
happy to start up such a document after I both hear back from Matheus
and hear back from you on my rm & grep bug fix ideas.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [RFC PATCH 0/7] add/rm: honor sparse checkout and warn on sparse paths
  2020-11-16 13:58 ` [PATCH v2] " Matheus Tavares
@ 2021-02-17 21:02   ` Matheus Tavares
  2021-02-17 21:02     ` [RFC PATCH 1/7] add --chmod: don't update index when --dry-run is used Matheus Tavares
                       ` (8 more replies)
  0 siblings, 9 replies; 56+ messages in thread
From: Matheus Tavares @ 2021-02-17 21:02 UTC (permalink / raw)
  To: git; +Cc: newren, stolee

This is based on the discussion at [1]. It makes `rm` honor sparse
checkouts and adds a warning to both `rm` and `add`, for the case where
a pathspec _only_ matches skip-worktree entries. The first two patches
are somewhat unrelated fixes, but they are used by the later patches.

[1]: https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/

Matheus Tavares (7):
  add --chmod: don't update index when --dry-run is used
  add: include magic part of pathspec on --refresh error
  t3705: add tests for `git add` in sparse checkouts
  add: make --chmod and --renormalize honor sparse checkouts
  pathspec: allow to ignore SKIP_WORKTREE entries on index matching
  add: warn when pathspec only matches SKIP_WORKTREE entries
  rm: honor sparse checkout patterns

 Documentation/config/advice.txt  |   4 +
 Documentation/git-rm.txt         |   4 +-
 advice.c                         |  19 +++++
 advice.h                         |   4 +
 builtin/add.c                    |  72 ++++++++++++++----
 builtin/check-ignore.c           |   2 +-
 builtin/rm.c                     |  35 ++++++---
 pathspec.c                       |  25 ++++++-
 pathspec.h                       |  13 +++-
 read-cache.c                     |   3 +-
 t/t3600-rm.sh                    |  54 ++++++++++++++
 t/t3700-add.sh                   |  26 +++++++
 t/t3705-add-sparse-checkout.sh   | 122 +++++++++++++++++++++++++++++++
 t/t7011-skip-worktree-reading.sh |   5 --
 t/t7012-skip-worktree-writing.sh |  19 -----
 15 files changed, 349 insertions(+), 58 deletions(-)
 create mode 100755 t/t3705-add-sparse-checkout.sh

-- 
2.29.2


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [RFC PATCH 1/7] add --chmod: don't update index when --dry-run is used
  2021-02-17 21:02   ` [RFC PATCH 0/7] add/rm: honor sparse checkout and warn on sparse paths Matheus Tavares
@ 2021-02-17 21:02     ` Matheus Tavares
  2021-02-17 21:45       ` Junio C Hamano
  2021-02-17 21:02     ` [RFC PATCH 2/7] add: include magic part of pathspec on --refresh error Matheus Tavares
                       ` (7 subsequent siblings)
  8 siblings, 1 reply; 56+ messages in thread
From: Matheus Tavares @ 2021-02-17 21:02 UTC (permalink / raw)
  To: git; +Cc: newren, stolee

`git add --chmod` applies the mode changes even when `--dry-run` is
used. Fix that and add some tests for this option combination.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/add.c  |  7 ++++---
 t/t3700-add.sh | 20 ++++++++++++++++++++
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index a825887c50..f757de45ea 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -38,7 +38,7 @@ struct update_callback_data {
 	int add_errors;
 };
 
-static void chmod_pathspec(struct pathspec *pathspec, char flip)
+static void chmod_pathspec(struct pathspec *pathspec, char flip, int show_only)
 {
 	int i;
 
@@ -48,7 +48,8 @@ static void chmod_pathspec(struct pathspec *pathspec, char flip)
 		if (pathspec && !ce_path_match(&the_index, ce, pathspec, NULL))
 			continue;
 
-		if (chmod_cache_entry(ce, flip) < 0)
+		if ((show_only && !S_ISREG(ce->ce_mode)) ||
+		    (!show_only && chmod_cache_entry(ce, flip) < 0))
 			fprintf(stderr, "cannot chmod %cx '%s'\n", flip, ce->name);
 	}
 }
@@ -609,7 +610,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 		exit_status |= add_files(&dir, flags);
 
 	if (chmod_arg && pathspec.nr)
-		chmod_pathspec(&pathspec, chmod_arg[0]);
+		chmod_pathspec(&pathspec, chmod_arg[0], show_only);
 	unplug_bulk_checkin();
 
 finish:
diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index b7d4ba608c..fc81f2ef00 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -386,6 +386,26 @@ test_expect_success POSIXPERM 'git add --chmod=[+-]x does not change the working
 	! test -x foo4
 '
 
+test_expect_success 'git add --chmod honors --dry-run' '
+	git reset --hard &&
+	echo foo >foo4 &&
+	git add foo4 &&
+	git add --chmod=+x --dry-run foo4 &&
+	test_mode_in_index 100644 foo4
+'
+
+test_expect_success 'git add --chmod --dry-run reports error for non regular files' '
+	git reset --hard &&
+	test_ln_s_add foo foo4 &&
+	git add --chmod=+x --dry-run foo4 2>stderr &&
+	grep "cannot chmod +x .foo4." stderr
+'
+
+test_expect_success 'git add --chmod --dry-run reports error for unmatched pathspec' '
+	test_must_fail git add --chmod=+x --dry-run nonexistent 2>stderr &&
+	test_i18ngrep "pathspec .nonexistent. did not match any files" stderr
+'
+
 test_expect_success 'no file status change if no pathspec is given' '
 	>foo5 &&
 	>foo6 &&
-- 
2.29.2


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [RFC PATCH 2/7] add: include magic part of pathspec on --refresh error
  2021-02-17 21:02   ` [RFC PATCH 0/7] add/rm: honor sparse checkout and warn on sparse paths Matheus Tavares
  2021-02-17 21:02     ` [RFC PATCH 1/7] add --chmod: don't update index when --dry-run is used Matheus Tavares
@ 2021-02-17 21:02     ` Matheus Tavares
  2021-02-17 22:20       ` Junio C Hamano
  2021-02-17 21:02     ` [RFC PATCH 3/7] t3705: add tests for `git add` in sparse checkouts Matheus Tavares
                       ` (6 subsequent siblings)
  8 siblings, 1 reply; 56+ messages in thread
From: Matheus Tavares @ 2021-02-17 21:02 UTC (permalink / raw)
  To: git; +Cc: newren, stolee

When `git add --refresh <pathspec>` doesn't find any matches for the
given pathspec, it prints an error message using the `match` field of
the `struct pathspec_item`. However, this field doesn't contain the
magic part of the pathspec. Instead, let's use the `original` field.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/add.c  | 2 +-
 t/t3700-add.sh | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/builtin/add.c b/builtin/add.c
index f757de45ea..8c96c23778 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -180,7 +180,7 @@ static void refresh(int verbose, const struct pathspec *pathspec)
 	for (i = 0; i < pathspec->nr; i++) {
 		if (!seen[i])
 			die(_("pathspec '%s' did not match any files"),
-			    pathspec->items[i].match);
+			    pathspec->items[i].original);
 	}
 	free(seen);
 }
diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index fc81f2ef00..fe72204066 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -196,6 +196,12 @@ test_expect_success 'git add --refresh with pathspec' '
 	grep baz actual
 '
 
+test_expect_success 'git add --refresh correctly reports no match error' "
+	echo \"fatal: pathspec ':(icase)nonexistent' did not match any files\" >expect &&
+	test_must_fail git add --refresh ':(icase)nonexistent' 2>actual &&
+	test_i18ncmp expect actual
+"
+
 test_expect_success POSIXPERM,SANITY 'git add should fail atomically upon an unreadable file' '
 	git reset --hard &&
 	date >foo1 &&
-- 
2.29.2


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [RFC PATCH 3/7] t3705: add tests for `git add` in sparse checkouts
  2021-02-17 21:02   ` [RFC PATCH 0/7] add/rm: honor sparse checkout and warn on sparse paths Matheus Tavares
  2021-02-17 21:02     ` [RFC PATCH 1/7] add --chmod: don't update index when --dry-run is used Matheus Tavares
  2021-02-17 21:02     ` [RFC PATCH 2/7] add: include magic part of pathspec on --refresh error Matheus Tavares
@ 2021-02-17 21:02     ` Matheus Tavares
  2021-02-17 23:01       ` Junio C Hamano
  2021-02-17 21:02     ` [RFC PATCH 4/7] add: make --chmod and --renormalize honor " Matheus Tavares
                       ` (5 subsequent siblings)
  8 siblings, 1 reply; 56+ messages in thread
From: Matheus Tavares @ 2021-02-17 21:02 UTC (permalink / raw)
  To: git; +Cc: newren, stolee

We already have a couple tests for `add` with SKIP_WORKTREE entries in
t7012, but these only cover the most basic scenarios. As we will be
changing how `add` deals with sparse paths in the subsequent commits,
let's move these two tests to their own file and add more test cases
for different `add` options and situations. This also demonstrates two
options that don't currently respect SKIP_WORKTREE entries: `--chmod`
and `--renormalize`.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 t/t3705-add-sparse-checkout.sh   | 92 ++++++++++++++++++++++++++++++++
 t/t7012-skip-worktree-writing.sh | 19 -------
 2 files changed, 92 insertions(+), 19 deletions(-)
 create mode 100755 t/t3705-add-sparse-checkout.sh

diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
new file mode 100755
index 0000000000..5530e796b5
--- /dev/null
+++ b/t/t3705-add-sparse-checkout.sh
@@ -0,0 +1,92 @@
+#!/bin/sh
+
+test_description='git add in sparse checked out working trees'
+
+. ./test-lib.sh
+
+SPARSE_ENTRY_BLOB=""
+
+# Optionally take a string for the entry's contents
+setup_sparse_entry()
+{
+	if test -f sparse_entry
+	then
+		rm sparse_entry
+	fi &&
+	git update-index --force-remove sparse_entry &&
+
+	if test "$#" -eq 1
+	then
+		printf "$1" >sparse_entry
+	else
+		printf "" >sparse_entry
+	fi &&
+	git add sparse_entry &&
+	git update-index --skip-worktree sparse_entry &&
+	SPARSE_ENTRY_BLOB=$(git rev-parse :sparse_entry)
+}
+
+test_sparse_entry_unchanged() {
+	echo "100644 $SPARSE_ENTRY_BLOB 0	sparse_entry" >expected &&
+	git ls-files --stage sparse_entry >actual &&
+	test_cmp expected actual
+}
+
+test_expect_success "git add does not remove SKIP_WORKTREE entries" '
+	setup_sparse_entry &&
+	rm sparse_entry &&
+	git add sparse_entry &&
+	test_sparse_entry_unchanged
+'
+
+test_expect_success "git add -A does not remove SKIP_WORKTREE entries" '
+	setup_sparse_entry &&
+	rm sparse_entry &&
+	git add -A &&
+	test_sparse_entry_unchanged
+'
+
+for opt in "" -f -u --ignore-removal
+do
+	if test -n "$opt"
+	then
+		opt=" $opt"
+	fi
+
+	test_expect_success "git add$opt does not update SKIP_WORKTREE entries" '
+		setup_sparse_entry &&
+		echo modified >sparse_entry &&
+		git add $opt sparse_entry &&
+		test_sparse_entry_unchanged
+	'
+done
+
+test_expect_success 'git add --refresh does not update SKIP_WORKTREE entries' '
+	setup_sparse_entry &&
+	test-tool chmtime -60 sparse_entry &&
+	git add --refresh sparse_entry &&
+
+	# We must unset the SKIP_WORKTREE bit, otherwise
+	# git diff-files would skip examining the file
+	git update-index --no-skip-worktree sparse_entry &&
+
+	echo sparse_entry >expected &&
+	git diff-files --name-only sparse_entry >actual &&
+	test_cmp actual expected
+'
+
+test_expect_failure 'git add --chmod does not update SKIP_WORKTREE entries' '
+	setup_sparse_entry &&
+	git add --chmod=+x sparse_entry &&
+	test_sparse_entry_unchanged
+'
+
+test_expect_failure 'git add --renormalize does not update SKIP_WORKTREE entries' '
+	test_config core.autocrlf false &&
+	setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
+	echo "sparse_entry text=auto" >.gitattributes &&
+	git add --renormalize sparse_entry &&
+	test_sparse_entry_unchanged
+'
+
+test_done
diff --git a/t/t7012-skip-worktree-writing.sh b/t/t7012-skip-worktree-writing.sh
index e5c6a038fb..217207c1ce 100755
--- a/t/t7012-skip-worktree-writing.sh
+++ b/t/t7012-skip-worktree-writing.sh
@@ -60,13 +60,6 @@ setup_absent() {
 	git update-index --skip-worktree 1
 }
 
-test_absent() {
-	echo "100644 $EMPTY_BLOB 0	1" > expected &&
-	git ls-files --stage 1 > result &&
-	test_cmp expected result &&
-	test ! -f 1
-}
-
 setup_dirty() {
 	git update-index --force-remove 1 &&
 	echo dirty > 1 &&
@@ -100,18 +93,6 @@ test_expect_success 'index setup' '
 	test_cmp expected result
 '
 
-test_expect_success 'git-add ignores worktree content' '
-	setup_absent &&
-	git add 1 &&
-	test_absent
-'
-
-test_expect_success 'git-add ignores worktree content' '
-	setup_dirty &&
-	git add 1 &&
-	test_dirty
-'
-
 test_expect_success 'git-rm fails if worktree is dirty' '
 	setup_dirty &&
 	test_must_fail git rm 1 &&
-- 
2.29.2


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [RFC PATCH 4/7] add: make --chmod and --renormalize honor sparse checkouts
  2021-02-17 21:02   ` [RFC PATCH 0/7] add/rm: honor sparse checkout and warn on sparse paths Matheus Tavares
                       ` (2 preceding siblings ...)
  2021-02-17 21:02     ` [RFC PATCH 3/7] t3705: add tests for `git add` in sparse checkouts Matheus Tavares
@ 2021-02-17 21:02     ` Matheus Tavares
  2021-02-17 21:02     ` [RFC PATCH 5/7] pathspec: allow to ignore SKIP_WORKTREE entries on index matching Matheus Tavares
                       ` (4 subsequent siblings)
  8 siblings, 0 replies; 56+ messages in thread
From: Matheus Tavares @ 2021-02-17 21:02 UTC (permalink / raw)
  To: git; +Cc: newren, stolee

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/add.c                  | 5 +++++
 t/t3705-add-sparse-checkout.sh | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index 8c96c23778..e10a039070 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -45,6 +45,9 @@ static void chmod_pathspec(struct pathspec *pathspec, char flip, int show_only)
 	for (i = 0; i < active_nr; i++) {
 		struct cache_entry *ce = active_cache[i];
 
+		if (ce_skip_worktree(ce))
+			continue;
+
 		if (pathspec && !ce_path_match(&the_index, ce, pathspec, NULL))
 			continue;
 
@@ -137,6 +140,8 @@ static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
 	for (i = 0; i < active_nr; i++) {
 		struct cache_entry *ce = active_cache[i];
 
+		if (ce_skip_worktree(ce))
+			continue;
 		if (ce_stage(ce))
 			continue; /* do not touch unmerged paths */
 		if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
index 5530e796b5..f7b0ea782e 100755
--- a/t/t3705-add-sparse-checkout.sh
+++ b/t/t3705-add-sparse-checkout.sh
@@ -75,13 +75,13 @@ test_expect_success 'git add --refresh does not update SKIP_WORKTREE entries' '
 	test_cmp actual expected
 '
 
-test_expect_failure 'git add --chmod does not update SKIP_WORKTREE entries' '
+test_expect_success 'git add --chmod does not update SKIP_WORKTREE entries' '
 	setup_sparse_entry &&
 	git add --chmod=+x sparse_entry &&
 	test_sparse_entry_unchanged
 '
 
-test_expect_failure 'git add --renormalize does not update SKIP_WORKTREE entries' '
+test_expect_success 'git add --renormalize does not update SKIP_WORKTREE entries' '
 	test_config core.autocrlf false &&
 	setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
 	echo "sparse_entry text=auto" >.gitattributes &&
-- 
2.29.2


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [RFC PATCH 5/7] pathspec: allow to ignore SKIP_WORKTREE entries on index matching
  2021-02-17 21:02   ` [RFC PATCH 0/7] add/rm: honor sparse checkout and warn on sparse paths Matheus Tavares
                       ` (3 preceding siblings ...)
  2021-02-17 21:02     ` [RFC PATCH 4/7] add: make --chmod and --renormalize honor " Matheus Tavares
@ 2021-02-17 21:02     ` Matheus Tavares
  2021-02-17 21:02     ` [RFC PATCH 6/7] add: warn when pathspec only matches SKIP_WORKTREE entries Matheus Tavares
                       ` (3 subsequent siblings)
  8 siblings, 0 replies; 56+ messages in thread
From: Matheus Tavares @ 2021-02-17 21:02 UTC (permalink / raw)
  To: git; +Cc: newren, stolee

Add the 'ignore_skip_worktree' boolean parameter to both
add_pathspec_matches_against_index() and
find_pathspecs_matching_against_index(). When true, these functions will
not try to match the given pathspec with SKIP_WORKTREE entries. This
will be used in a future patch to make `git add` display a hint
when the pathspec matches only sparse paths.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/add.c          |  4 ++--
 builtin/check-ignore.c |  2 +-
 pathspec.c             | 10 +++++++---
 pathspec.h             |  5 +++--
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index e10a039070..9f0f6ebaff 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -170,7 +170,7 @@ static char *prune_directory(struct dir_struct *dir, struct pathspec *pathspec,
 			*dst++ = entry;
 	}
 	dir->nr = dst - dir->entries;
-	add_pathspec_matches_against_index(pathspec, &the_index, seen);
+	add_pathspec_matches_against_index(pathspec, &the_index, seen, 0);
 	return seen;
 }
 
@@ -571,7 +571,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 		int i;
 
 		if (!seen)
-			seen = find_pathspecs_matching_against_index(&pathspec, &the_index);
+			seen = find_pathspecs_matching_against_index(&pathspec, &the_index, 0);
 
 		/*
 		 * file_exists() assumes exact match
diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c
index 3c652748d5..235b7fc905 100644
--- a/builtin/check-ignore.c
+++ b/builtin/check-ignore.c
@@ -100,7 +100,7 @@ static int check_ignore(struct dir_struct *dir,
 	 * should not be ignored, in order to be consistent with
 	 * 'git status', 'git add' etc.
 	 */
-	seen = find_pathspecs_matching_against_index(&pathspec, &the_index);
+	seen = find_pathspecs_matching_against_index(&pathspec, &the_index, 0);
 	for (i = 0; i < pathspec.nr; i++) {
 		full_path = pathspec.items[i].match;
 		pattern = NULL;
diff --git a/pathspec.c b/pathspec.c
index 7a229d8d22..e5e6b7458d 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -21,7 +21,7 @@
  */
 void add_pathspec_matches_against_index(const struct pathspec *pathspec,
 					const struct index_state *istate,
-					char *seen)
+					char *seen, int ignore_skip_worktree)
 {
 	int num_unmatched = 0, i;
 
@@ -38,6 +38,8 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
 		return;
 	for (i = 0; i < istate->cache_nr; i++) {
 		const struct cache_entry *ce = istate->cache[i];
+		if (ignore_skip_worktree && ce_skip_worktree(ce))
+			continue;
 		ce_path_match(istate, ce, pathspec, seen);
 	}
 }
@@ -51,10 +53,12 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
  * given pathspecs achieves against all items in the index.
  */
 char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
-					    const struct index_state *istate)
+					    const struct index_state *istate,
+					    int ignore_skip_worktree)
 {
 	char *seen = xcalloc(pathspec->nr, 1);
-	add_pathspec_matches_against_index(pathspec, istate, seen);
+	add_pathspec_matches_against_index(pathspec, istate, seen,
+					   ignore_skip_worktree);
 	return seen;
 }
 
diff --git a/pathspec.h b/pathspec.h
index 454ce364fa..8202882ecd 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -151,9 +151,10 @@ static inline int ps_strcmp(const struct pathspec_item *item,
 
 void add_pathspec_matches_against_index(const struct pathspec *pathspec,
 					const struct index_state *istate,
-					char *seen);
+					char *seen, int ignore_skip_worktree);
 char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
-					    const struct index_state *istate);
+					    const struct index_state *istate,
+					    int ignore_skip_worktree);
 int match_pathspec_attrs(const struct index_state *istate,
 			 const char *name, int namelen,
 			 const struct pathspec_item *item);
-- 
2.29.2


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [RFC PATCH 6/7] add: warn when pathspec only matches SKIP_WORKTREE entries
  2021-02-17 21:02   ` [RFC PATCH 0/7] add/rm: honor sparse checkout and warn on sparse paths Matheus Tavares
                       ` (4 preceding siblings ...)
  2021-02-17 21:02     ` [RFC PATCH 5/7] pathspec: allow to ignore SKIP_WORKTREE entries on index matching Matheus Tavares
@ 2021-02-17 21:02     ` Matheus Tavares
  2021-02-19  0:34       ` Junio C Hamano
  2021-02-17 21:02     ` [RFC PATCH 7/7] rm: honor sparse checkout patterns Matheus Tavares
                       ` (2 subsequent siblings)
  8 siblings, 1 reply; 56+ messages in thread
From: Matheus Tavares @ 2021-02-17 21:02 UTC (permalink / raw)
  To: git; +Cc: newren, stolee

`git add` already refrains from updating SKIP_WORKTREE entries, but it
silently succeeds when a pathspec only matches these entries. Instead,
let's warn the user and display a hint on how to update these entries.

Note that the warning is only shown if the pathspec matches no untracked
paths in the working tree and only matches index entries with the
SKIP_WORKTREE bit set. Performance-wise, this patch doesn't change the
number of ce_path_match() calls in the worst case scenario (because we
still need to check the sparse entries for the warning). But in the
general case, it avoids unnecessarily calling this function for each
SKIP_WORKTREE entry.

A warning message was chosen over erroring out right away to reproduce
the same behavior `add` already exhibits with ignored files. This also
allow users to continue their workflow without having to invoke `add`
again with only the matching pathspecs, as the matched files will have
already been added.

Note: refresh_index() was changed to only mark matches with
no-SKIP-WORKTREE entries in the `seen` output parameter. This is exactly
the behavior we want for `add`, and only `add` calls this function with
a non-NULL `seen` pointer. So the change brings no side effect on
other callers.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Documentation/config/advice.txt |  3 ++
 advice.c                        | 19 +++++++++++
 advice.h                        |  4 +++
 builtin/add.c                   | 60 ++++++++++++++++++++++++++-------
 pathspec.c                      | 15 +++++++++
 pathspec.h                      |  8 +++++
 read-cache.c                    |  3 +-
 t/t3705-add-sparse-checkout.sh  | 40 +++++++++++++++++++---
 8 files changed, 134 insertions(+), 18 deletions(-)

diff --git a/Documentation/config/advice.txt b/Documentation/config/advice.txt
index acbd0c09aa..d53eafa00b 100644
--- a/Documentation/config/advice.txt
+++ b/Documentation/config/advice.txt
@@ -119,4 +119,7 @@ advice.*::
 	addEmptyPathspec::
 		Advice shown if a user runs the add command without providing
 		the pathspec parameter.
+	updateSparsePath::
+		Advice shown if the pathspec given to linkgit:git-add[1] only
+		matches index entries outside the current sparse-checkout.
 --
diff --git a/advice.c b/advice.c
index 164742305f..cf22c1a6e5 100644
--- a/advice.c
+++ b/advice.c
@@ -2,6 +2,7 @@
 #include "config.h"
 #include "color.h"
 #include "help.h"
+#include "string-list.h"
 
 int advice_fetch_show_forced_updates = 1;
 int advice_push_update_rejected = 1;
@@ -136,6 +137,7 @@ static struct {
 	[ADVICE_STATUS_HINTS]				= { "statusHints", 1 },
 	[ADVICE_STATUS_U_OPTION]			= { "statusUoption", 1 },
 	[ADVICE_SUBMODULE_ALTERNATE_ERROR_STRATEGY_DIE] = { "submoduleAlternateErrorStrategyDie", 1 },
+	[ADVICE_UPDATE_SPARSE_PATH]			= { "updateSparsePath", 1 },
 	[ADVICE_WAITING_FOR_EDITOR]			= { "waitingForEditor", 1 },
 };
 
@@ -284,6 +286,23 @@ void NORETURN die_conclude_merge(void)
 	die(_("Exiting because of unfinished merge."));
 }
 
+void advise_on_updating_sparse_paths(struct string_list *pathspec_list)
+{
+	struct string_list_item *item;
+
+	if (!pathspec_list->nr)
+		return;
+
+	fprintf(stderr, _("The following pathspecs only matched index entries outside the current\n"
+			  "sparse checkout:\n"));
+	for_each_string_list_item(item, pathspec_list)
+		fprintf(stderr, "%s\n", item->string);
+
+	advise_if_enabled(ADVICE_UPDATE_SPARSE_PATH,
+			  _("Disable or modify the sparsity rules if you intend to update such entries."));
+
+}
+
 void detach_advice(const char *new_name)
 {
 	const char *fmt =
diff --git a/advice.h b/advice.h
index bc2432980a..bd26c385d0 100644
--- a/advice.h
+++ b/advice.h
@@ -3,6 +3,8 @@
 
 #include "git-compat-util.h"
 
+struct string_list;
+
 extern int advice_fetch_show_forced_updates;
 extern int advice_push_update_rejected;
 extern int advice_push_non_ff_current;
@@ -71,6 +73,7 @@ extern int advice_add_empty_pathspec;
 	ADVICE_STATUS_HINTS,
 	ADVICE_STATUS_U_OPTION,
 	ADVICE_SUBMODULE_ALTERNATE_ERROR_STRATEGY_DIE,
+	ADVICE_UPDATE_SPARSE_PATH,
 	ADVICE_WAITING_FOR_EDITOR,
 };
 
@@ -92,6 +95,7 @@ void advise_if_enabled(enum advice_type type, const char *advice, ...);
 int error_resolve_conflict(const char *me);
 void NORETURN die_resolve_conflict(const char *me);
 void NORETURN die_conclude_merge(void);
+void advise_on_updating_sparse_paths(struct string_list *pathspec_list);
 void detach_advice(const char *new_name);
 
 #endif /* ADVICE_H */
diff --git a/builtin/add.c b/builtin/add.c
index 9f0f6ebaff..b556c566c3 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -170,24 +170,41 @@ static char *prune_directory(struct dir_struct *dir, struct pathspec *pathspec,
 			*dst++ = entry;
 	}
 	dir->nr = dst - dir->entries;
-	add_pathspec_matches_against_index(pathspec, &the_index, seen, 0);
+	add_pathspec_matches_against_index(pathspec, &the_index, seen, 1);
 	return seen;
 }
 
-static void refresh(int verbose, const struct pathspec *pathspec)
+static int refresh(int verbose, const struct pathspec *pathspec)
 {
 	char *seen;
-	int i;
+	int i, ret = 0;
+	char *skip_worktree_seen = NULL;
+	struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
 
 	seen = xcalloc(pathspec->nr, 1);
 	refresh_index(&the_index, verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET,
 		      pathspec, seen, _("Unstaged changes after refreshing the index:"));
 	for (i = 0; i < pathspec->nr; i++) {
-		if (!seen[i])
-			die(_("pathspec '%s' did not match any files"),
-			    pathspec->items[i].original);
+		if (!seen[i]) {
+			if (matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {
+				string_list_append(&only_match_skip_worktree,
+						   pathspec->items[i].original);
+			} else {
+				die(_("pathspec '%s' did not match any files"),
+				    pathspec->items[i].original);
+			}
+		}
+	}
+
+	if (only_match_skip_worktree.nr) {
+		advise_on_updating_sparse_paths(&only_match_skip_worktree);
+		ret = 1;
 	}
+
 	free(seen);
+	free(skip_worktree_seen);
+	string_list_clear(&only_match_skip_worktree, 0);
+	return ret;
 }
 
 int run_add_interactive(const char *revision, const char *patch_mode,
@@ -563,15 +580,17 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	}
 
 	if (refresh_only) {
-		refresh(verbose, &pathspec);
+		exit_status |= refresh(verbose, &pathspec);
 		goto finish;
 	}
 
 	if (pathspec.nr) {
 		int i;
+		char *skip_worktree_seen = NULL;
+		struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
 
 		if (!seen)
-			seen = find_pathspecs_matching_against_index(&pathspec, &the_index, 0);
+			seen = find_pathspecs_matching_against_index(&pathspec, &the_index, 1);
 
 		/*
 		 * file_exists() assumes exact match
@@ -585,12 +604,20 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 		for (i = 0; i < pathspec.nr; i++) {
 			const char *path = pathspec.items[i].match;
+
 			if (pathspec.items[i].magic & PATHSPEC_EXCLUDE)
 				continue;
-			if (!seen[i] && path[0] &&
-			    ((pathspec.items[i].magic &
-			      (PATHSPEC_GLOB | PATHSPEC_ICASE)) ||
-			     !file_exists(path))) {
+			if (seen[i] || !path[0])
+				continue;
+
+			if (matches_skip_worktree(&pathspec, i, &skip_worktree_seen)) {
+				string_list_append(&only_match_skip_worktree,
+						   pathspec.items[i].original);
+				continue;
+			}
+
+			if ((pathspec.items[i].magic & (PATHSPEC_GLOB | PATHSPEC_ICASE)) ||
+			    !file_exists(path)) {
 				if (ignore_missing) {
 					int dtype = DT_UNKNOWN;
 					if (is_excluded(&dir, &the_index, path, &dtype))
@@ -601,7 +628,16 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 					    pathspec.items[i].original);
 			}
 		}
+
+
+		if (only_match_skip_worktree.nr) {
+			advise_on_updating_sparse_paths(&only_match_skip_worktree);
+			exit_status = 1;
+		}
+
 		free(seen);
+		free(skip_worktree_seen);
+		string_list_clear(&only_match_skip_worktree, 0);
 	}
 
 	plug_bulk_checkin();
diff --git a/pathspec.c b/pathspec.c
index e5e6b7458d..61f294fed5 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -62,6 +62,21 @@ char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
 	return seen;
 }
 
+char *find_pathspecs_matching_skip_worktree(const struct pathspec *pathspec)
+{
+	struct index_state *istate = the_repository->index;
+	char *seen = xcalloc(pathspec->nr, 1);
+	int i;
+
+	for (i = 0; i < istate->cache_nr; i++) {
+		struct cache_entry *ce = istate->cache[i];
+		if (ce_skip_worktree(ce))
+		    ce_path_match(istate, ce, pathspec, seen);
+	}
+
+	return seen;
+}
+
 /*
  * Magic pathspec
  *
diff --git a/pathspec.h b/pathspec.h
index 8202882ecd..f591ba625c 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -155,6 +155,14 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
 char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
 					    const struct index_state *istate,
 					    int ignore_skip_worktree);
+char *find_pathspecs_matching_skip_worktree(const struct pathspec *pathspec);
+static inline int matches_skip_worktree(const struct pathspec *pathspec,
+					int item, char **seen_ptr)
+{
+	if (!*seen_ptr)
+		*seen_ptr = find_pathspecs_matching_skip_worktree(pathspec);
+	return (*seen_ptr)[item];
+}
 int match_pathspec_attrs(const struct index_state *istate,
 			 const char *name, int namelen,
 			 const struct pathspec_item *item);
diff --git a/read-cache.c b/read-cache.c
index 29144cf879..cbede4ada3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1546,7 +1546,8 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (ignore_submodules && S_ISGITLINK(ce->ce_mode))
 			continue;
 
-		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
+		if (pathspec && !ce_path_match(istate, ce, pathspec,
+					       ce_skip_worktree(ce) ? NULL : seen))
 			filtered = 1;
 
 		if (ce_stage(ce)) {
diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
index f7b0ea782e..f66d369bf4 100755
--- a/t/t3705-add-sparse-checkout.sh
+++ b/t/t3705-add-sparse-checkout.sh
@@ -32,10 +32,22 @@ test_sparse_entry_unchanged() {
 	test_cmp expected actual
 }
 
+cat >sparse_entry_error <<-EOF
+The following pathspecs only matched index entries outside the current
+sparse checkout:
+sparse_entry
+EOF
+
+cat >error_and_hint sparse_entry_error - <<-EOF
+hint: Disable or modify the sparsity rules if you intend to update such entries.
+hint: Disable this message with "git config advice.updateSparsePath false"
+EOF
+
 test_expect_success "git add does not remove SKIP_WORKTREE entries" '
 	setup_sparse_entry &&
 	rm sparse_entry &&
-	git add sparse_entry &&
+	test_must_fail git add sparse_entry 2>stderr &&
+	test_i18ncmp error_and_hint stderr &&
 	test_sparse_entry_unchanged
 '
 
@@ -56,7 +68,8 @@ do
 	test_expect_success "git add$opt does not update SKIP_WORKTREE entries" '
 		setup_sparse_entry &&
 		echo modified >sparse_entry &&
-		git add $opt sparse_entry &&
+		test_must_fail git add $opt sparse_entry 2>stderr &&
+		test_i18ncmp error_and_hint stderr &&
 		test_sparse_entry_unchanged
 	'
 done
@@ -64,7 +77,8 @@ done
 test_expect_success 'git add --refresh does not update SKIP_WORKTREE entries' '
 	setup_sparse_entry &&
 	test-tool chmtime -60 sparse_entry &&
-	git add --refresh sparse_entry &&
+	test_must_fail git add --refresh sparse_entry 2>stderr &&
+	test_i18ncmp error_and_hint stderr &&
 
 	# We must unset the SKIP_WORKTREE bit, otherwise
 	# git diff-files would skip examining the file
@@ -77,7 +91,8 @@ test_expect_success 'git add --refresh does not update SKIP_WORKTREE entries' '
 
 test_expect_success 'git add --chmod does not update SKIP_WORKTREE entries' '
 	setup_sparse_entry &&
-	git add --chmod=+x sparse_entry &&
+	test_must_fail git add --chmod=+x sparse_entry 2>stderr &&
+	test_i18ncmp error_and_hint stderr &&
 	test_sparse_entry_unchanged
 '
 
@@ -85,8 +100,23 @@ test_expect_success 'git add --renormalize does not update SKIP_WORKTREE entries
 	test_config core.autocrlf false &&
 	setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
 	echo "sparse_entry text=auto" >.gitattributes &&
-	git add --renormalize sparse_entry &&
+	test_must_fail git add --renormalize sparse_entry 2>stderr &&
+	test_i18ncmp error_and_hint stderr &&
 	test_sparse_entry_unchanged
 '
 
+test_expect_success 'do not advice about sparse entries when they do not match the pathspec' '
+	setup_sparse_entry &&
+	test_must_fail git add nonexistent sp 2>stderr &&
+	test_i18ngrep "fatal: pathspec .nonexistent. did not match any files" stderr &&
+	test_i18ngrep ! "The following pathspecs only matched index entries" stderr
+'
+
+test_expect_success 'add obeys advice.updateSparsePath' '
+	setup_sparse_entry &&
+	test_must_fail git -c advice.updateSparsePath=false add sparse_entry 2>stderr &&
+	test_i18ncmp sparse_entry_error stderr
+
+'
+
 test_done
-- 
2.29.2


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [RFC PATCH 7/7] rm: honor sparse checkout patterns
  2021-02-17 21:02   ` [RFC PATCH 0/7] add/rm: honor sparse checkout and warn on sparse paths Matheus Tavares
                       ` (5 preceding siblings ...)
  2021-02-17 21:02     ` [RFC PATCH 6/7] add: warn when pathspec only matches SKIP_WORKTREE entries Matheus Tavares
@ 2021-02-17 21:02     ` Matheus Tavares
  2021-02-22 18:57     ` [RFC PATCH 0/7] add/rm: honor sparse checkout and warn on sparse paths Elijah Newren
  2021-02-24  4:05     ` [PATCH v2 " Matheus Tavares
  8 siblings, 0 replies; 56+ messages in thread
From: Matheus Tavares @ 2021-02-17 21:02 UTC (permalink / raw)
  To: git; +Cc: newren, stolee

`git add` refrains from adding or updating paths outside the sparsity
rules, but `git rm` doesn't follow the same restrictions. This is
somewhat counter-intuitive and inconsistent. So make `rm` honor the
sparse checkout and advise on how to remove SKIP_WORKTREE entries, just
like `add` does. Also add a few tests for the new behavior.

Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Documentation/config/advice.txt  |  5 +--
 Documentation/git-rm.txt         |  4 ++-
 builtin/rm.c                     | 35 ++++++++++++++-------
 t/t3600-rm.sh                    | 54 ++++++++++++++++++++++++++++++++
 t/t7011-skip-worktree-reading.sh |  5 ---
 5 files changed, 84 insertions(+), 19 deletions(-)

diff --git a/Documentation/config/advice.txt b/Documentation/config/advice.txt
index d53eafa00b..bdd423ade4 100644
--- a/Documentation/config/advice.txt
+++ b/Documentation/config/advice.txt
@@ -120,6 +120,7 @@ advice.*::
 		Advice shown if a user runs the add command without providing
 		the pathspec parameter.
 	updateSparsePath::
-		Advice shown if the pathspec given to linkgit:git-add[1] only
-		matches index entries outside the current sparse-checkout.
+		Advice shown if the pathspec given to linkgit:git-add[1] or
+		linkgit:git-rm[1] only matches index entries outside the
+		current sparse-checkout.
 --
diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
index ab750367fd..26e9b28470 100644
--- a/Documentation/git-rm.txt
+++ b/Documentation/git-rm.txt
@@ -23,7 +23,9 @@ branch, and no updates to their contents can be staged in the index,
 though that default behavior can be overridden with the `-f` option.
 When `--cached` is given, the staged content has to
 match either the tip of the branch or the file on disk,
-allowing the file to be removed from just the index.
+allowing the file to be removed from just the index. When
+sparse-checkouts are in use (see linkgit:git-sparse-checkout[1]),
+`git rm` will only remove paths within the sparse-checkout patterns.
 
 
 OPTIONS
diff --git a/builtin/rm.c b/builtin/rm.c
index 4858631e0f..d23a3b2164 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -5,6 +5,7 @@
  */
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "builtin.h"
+#include "advice.h"
 #include "config.h"
 #include "lockfile.h"
 #include "dir.h"
@@ -254,7 +255,7 @@ static struct option builtin_rm_options[] = {
 int cmd_rm(int argc, const char **argv, const char *prefix)
 {
 	struct lock_file lock_file = LOCK_INIT;
-	int i;
+	int i, ret = 0;
 	struct pathspec pathspec;
 	char *seen;
 
@@ -295,6 +296,8 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
 
 	for (i = 0; i < active_nr; i++) {
 		const struct cache_entry *ce = active_cache[i];
+		if (ce_skip_worktree(ce))
+			continue;
 		if (!ce_path_match(&the_index, ce, &pathspec, seen))
 			continue;
 		ALLOC_GROW(list.entry, list.nr + 1, list.alloc);
@@ -308,24 +311,34 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
 	if (pathspec.nr) {
 		const char *original;
 		int seen_any = 0;
+		char *skip_worktree_seen = NULL;
+		struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
+
 		for (i = 0; i < pathspec.nr; i++) {
 			original = pathspec.items[i].original;
-			if (!seen[i]) {
-				if (!ignore_unmatch) {
-					die(_("pathspec '%s' did not match any files"),
-					    original);
-				}
-			}
-			else {
+			if (seen[i])
 				seen_any = 1;
-			}
+			else if (ignore_unmatch)
+				continue;
+			else if (matches_skip_worktree(&pathspec, i, &skip_worktree_seen))
+				string_list_append(&only_match_skip_worktree, original);
+			else
+				die(_("pathspec '%s' did not match any files"), original);
+
 			if (!recursive && seen[i] == MATCHED_RECURSIVELY)
 				die(_("not removing '%s' recursively without -r"),
 				    *original ? original : ".");
 		}
 
+		if (only_match_skip_worktree.nr) {
+			advise_on_updating_sparse_paths(&only_match_skip_worktree);
+			ret = 1;
+		}
+		free(skip_worktree_seen);
+		string_list_clear(&only_match_skip_worktree, 0);
+
 		if (!seen_any)
-			exit(0);
+			exit(ret);
 	}
 
 	if (!index_only)
@@ -405,5 +418,5 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
 			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
 		die(_("Unable to write new index file"));
 
-	return 0;
+	return ret;
 }
diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh
index 7547f11a5c..0e5cf945d9 100755
--- a/t/t3600-rm.sh
+++ b/t/t3600-rm.sh
@@ -907,4 +907,58 @@ test_expect_success 'rm empty string should fail' '
 	test_must_fail git rm -rf ""
 '
 
+test_expect_success 'setup repo for tests with sparse-checkout' '
+	git init sparse &&
+	(
+		cd sparse &&
+		mkdir -p sub/dir &&
+		touch a b c sub/d sub/dir/e &&
+		git add -A &&
+		git commit -m files
+	) &&
+
+	cat >sparse_entry_b_error <<-EOF &&
+	The following pathspecs only matched index entries outside the current
+	sparse checkout:
+	b
+	EOF
+
+	cat >b_error_and_hint sparse_entry_b_error - <<-EOF
+	hint: Disable or modify the sparsity rules if you intend to update such entries.
+	hint: Disable this message with "git config advice.updateSparsePath false"
+	EOF
+'
+
+test_expect_success 'rm should respect sparse-checkout' '
+	git -C sparse sparse-checkout set "/a" &&
+	test_must_fail git -C sparse rm b 2>stderr &&
+	test_i18ncmp b_error_and_hint stderr
+'
+
+test_expect_success 'rm obeys advice.updateSparsePath' '
+	git -C sparse reset --hard &&
+	git -C sparse sparse-checkout set "/a" &&
+	test_must_fail git -C sparse -c advice.updateSparsePath=false rm b 2>stderr &&
+	test_i18ncmp sparse_entry_b_error stderr
+
+'
+
+test_expect_success 'recursive rm should respect sparse-checkout' '
+	(
+		cd sparse &&
+		git reset --hard &&
+		git sparse-checkout set "sub/dir" &&
+		git rm -r sub &&
+		git status --porcelain -uno >../actual
+	) &&
+	echo "D  sub/dir/e" >expected &&
+	test_cmp expected actual
+'
+
+test_expect_success 'do not advice about sparse entries when they do not match the pathspec' '
+	test_must_fail git -C sparse rm nonexistent 2>stderr &&
+	test_i18ngrep "fatal: pathspec .nonexistent. did not match any files" stderr &&
+	test_i18ngrep ! "The following pathspecs only matched index entries" stderr
+'
+
 test_done
diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh
index 37525cae3a..f87749951f 100755
--- a/t/t7011-skip-worktree-reading.sh
+++ b/t/t7011-skip-worktree-reading.sh
@@ -141,11 +141,6 @@ test_expect_success 'diff-files does not examine skip-worktree dirty entries' '
 	test -z "$(git diff-files -- one)"
 '
 
-test_expect_success 'git-rm succeeds on skip-worktree absent entries' '
-	setup_absent &&
-	git rm 1
-'
-
 test_expect_success 'commit on skip-worktree absent entries' '
 	git reset &&
 	setup_absent &&
-- 
2.29.2


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC PATCH 1/7] add --chmod: don't update index when --dry-run is used
  2021-02-17 21:02     ` [RFC PATCH 1/7] add --chmod: don't update index when --dry-run is used Matheus Tavares
@ 2021-02-17 21:45       ` Junio C Hamano
  2021-02-18  1:33         ` Matheus Tavares
  0 siblings, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2021-02-17 21:45 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, newren, stolee

Matheus Tavares <matheus.bernardino@usp.br> writes:

> `git add --chmod` applies the mode changes even when `--dry-run` is
> used. Fix that and add some tests for this option combination.

Well spotted.  I hope we can split this out of the series and fast
track, as it is an obvious bugfix.

I by mistake wrote error(_("...")) in the snippet below, but as a
bugfix, we should stick to the existing fprintf(stderr, "...") without
_().  i18n should be left outside the "bugfix" change.

> -static void chmod_pathspec(struct pathspec *pathspec, char flip)
> +static void chmod_pathspec(struct pathspec *pathspec, char flip, int show_only)
>  {
>  	int i;
>  
> @@ -48,7 +48,8 @@ static void chmod_pathspec(struct pathspec *pathspec, char flip)
>  		if (pathspec && !ce_path_match(&the_index, ce, pathspec, NULL))
>  			continue;
>  
> -		if (chmod_cache_entry(ce, flip) < 0)
> +		if ((show_only && !S_ISREG(ce->ce_mode)) ||
> +		    (!show_only && chmod_cache_entry(ce, flip) < 0))
>  			fprintf(stderr, "cannot chmod %cx '%s'\n", flip, ce->name);
>  	}
>  }

This is a bit dense, especially when the reader does not know by
heart that chmod_cache_entry() refuses to chmod anything that is not
a regular file.

Even when dry-run, we know chmod will fail when the thing is not a
regular file.  When not dry-run, we will try chmod and it will
report an failure.  And we report an error under these conditions.

	if (show_only
	    ? !S_ISREG(ce->ce_mode)
	    : chmod_cache_entry(ce, flip) < 0)
		error(_("cannot chmod ..."), ...);

may express the same idea in a way that is a bit easier to follow.

In any case, that "idea", while it is not wrong per-se, makes it as
if the primary purpose of this code is to give an error message,
which smells a bit funny.

	if (!show_only)
        	err = chmod_cache_entry(ce, flip);
	else 
        	err = S_ISREG(ce->ce_mode) ? 0 : -1;

	if (err < 0)
		error(_("cannot chmod ..."), ...);

would waste one extra variable, but may make the primary point
(i.e. we call chmod_cache_entry() unless dry-run) more clear.

> @@ -609,7 +610,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>  		exit_status |= add_files(&dir, flags);
>  
>  	if (chmod_arg && pathspec.nr)
> -		chmod_pathspec(&pathspec, chmod_arg[0]);
> +		chmod_pathspec(&pathspec, chmod_arg[0], show_only);
>  	unplug_bulk_checkin();

OK, this side is straight-forward.  We know if we are doing --dry-run;
we have to pass it down.

> diff --git a/t/t3700-add.sh b/t/t3700-add.sh
> index b7d4ba608c..fc81f2ef00 100755
> --- a/t/t3700-add.sh
> +++ b/t/t3700-add.sh
> @@ -386,6 +386,26 @@ test_expect_success POSIXPERM 'git add --chmod=[+-]x does not change the working
>  	! test -x foo4
>  '
>  
> +test_expect_success 'git add --chmod honors --dry-run' '
> +	git reset --hard &&
> +	echo foo >foo4 &&
> +	git add foo4 &&
> +	git add --chmod=+x --dry-run foo4 &&
> +	test_mode_in_index 100644 foo4
> +'
> +
> +test_expect_success 'git add --chmod --dry-run reports error for non regular files' '
> +	git reset --hard &&
> +	test_ln_s_add foo foo4 &&
> +	git add --chmod=+x --dry-run foo4 2>stderr &&
> +	grep "cannot chmod +x .foo4." stderr
> +'

Nice that test_ln_s_add lets write this without SYMLINKS
prerequisite, as all that matters is what is in the index
and not in the working tree.

> +test_expect_success 'git add --chmod --dry-run reports error for unmatched pathspec' '
> +	test_must_fail git add --chmod=+x --dry-run nonexistent 2>stderr &&
> +	test_i18ngrep "pathspec .nonexistent. did not match any files" stderr
> +'
> +
>  test_expect_success 'no file status change if no pathspec is given' '
>  	>foo5 &&
>  	>foo6 &&

Thanks.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC PATCH 2/7] add: include magic part of pathspec on --refresh error
  2021-02-17 21:02     ` [RFC PATCH 2/7] add: include magic part of pathspec on --refresh error Matheus Tavares
@ 2021-02-17 22:20       ` Junio C Hamano
  0 siblings, 0 replies; 56+ messages in thread
From: Junio C Hamano @ 2021-02-17 22:20 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, newren, stolee

Matheus Tavares <matheus.bernardino@usp.br> writes:

> When `git add --refresh <pathspec>` doesn't find any matches for the
> given pathspec, it prints an error message using the `match` field of
> the `struct pathspec_item`. However, this field doesn't contain the
> magic part of the pathspec. Instead, let's use the `original` field.

I assume that this is what already happens when there is no
"--refresh" on the command line?  I.e.

    $ cd t
    $ git add ':(icase)foo'
    fatal: pathspec ':(icase)foo' did not match any files
    $ git add --refresh ':(icase)foo'
    fatal: pathspec 't/foo' did not match any files

and you are unifying the discrepancy between the two to match the
former.

Makes sense.

>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>  builtin/add.c  | 2 +-
>  t/t3700-add.sh | 6 ++++++
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index f757de45ea..8c96c23778 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -180,7 +180,7 @@ static void refresh(int verbose, const struct pathspec *pathspec)
>  	for (i = 0; i < pathspec->nr; i++) {
>  		if (!seen[i])
>  			die(_("pathspec '%s' did not match any files"),
> -			    pathspec->items[i].match);
> +			    pathspec->items[i].original);
>  	}
>  	free(seen);
>  }
> diff --git a/t/t3700-add.sh b/t/t3700-add.sh
> index fc81f2ef00..fe72204066 100755
> --- a/t/t3700-add.sh
> +++ b/t/t3700-add.sh
> @@ -196,6 +196,12 @@ test_expect_success 'git add --refresh with pathspec' '
>  	grep baz actual
>  '
>  
> +test_expect_success 'git add --refresh correctly reports no match error' "
> +	echo \"fatal: pathspec ':(icase)nonexistent' did not match any files\" >expect &&
> +	test_must_fail git add --refresh ':(icase)nonexistent' 2>actual &&
> +	test_i18ncmp expect actual
> +"
> +
>  test_expect_success POSIXPERM,SANITY 'git add should fail atomically upon an unreadable file' '
>  	git reset --hard &&
>  	date >foo1 &&

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC PATCH 3/7] t3705: add tests for `git add` in sparse checkouts
  2021-02-17 21:02     ` [RFC PATCH 3/7] t3705: add tests for `git add` in sparse checkouts Matheus Tavares
@ 2021-02-17 23:01       ` Junio C Hamano
  2021-02-17 23:22         ` Eric Sunshine
                           ` (2 more replies)
  0 siblings, 3 replies; 56+ messages in thread
From: Junio C Hamano @ 2021-02-17 23:01 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, newren, stolee

Matheus Tavares <matheus.bernardino@usp.br> writes:

> We already have a couple tests for `add` with SKIP_WORKTREE entries in
> t7012, but these only cover the most basic scenarios. As we will be
> changing how `add` deals with sparse paths in the subsequent commits,
> let's move these two tests to their own file and add more test cases
> for different `add` options and situations. This also demonstrates two
> options that don't currently respect SKIP_WORKTREE entries: `--chmod`
> and `--renormalize`.

Nice.  It makes sense to describe what we want first, like this step..

> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>  t/t3705-add-sparse-checkout.sh   | 92 ++++++++++++++++++++++++++++++++
>  t/t7012-skip-worktree-writing.sh | 19 -------
>  2 files changed, 92 insertions(+), 19 deletions(-)
>  create mode 100755 t/t3705-add-sparse-checkout.sh
>
> diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
> new file mode 100755
> index 0000000000..5530e796b5
> --- /dev/null
> +++ b/t/t3705-add-sparse-checkout.sh
> @@ -0,0 +1,92 @@
> +#!/bin/sh
> +
> +test_description='git add in sparse checked out working trees'
> +
> +. ./test-lib.sh
> +
> +SPARSE_ENTRY_BLOB=""
> +
> +# Optionally take a string for the entry's contents
> +setup_sparse_entry()
> +{

Style.

	setup_sparse_entry () {

on a single line.

> +	if test -f sparse_entry
> +	then
> +		rm sparse_entry
> +	fi &&
> +	git update-index --force-remove sparse_entry &&

Why not an unconditional removal on the working tree side?

	rm -f sparse_entry &&
	git update-index --force-remove sparse_entry &&

Are there cases where we may have sparse_entry directory here?

> +
> +	if test "$#" -eq 1

No need to quote $# (we know it is a number).

> +	then
> +		printf "$1" >sparse_entry

Make sure the test writers know that they are passing a string that
will be interpreted as a printf format.  Review the comment before
the function and adjust it appropriately ("a string" is not what you
want to tell them).

> +	else
> +		printf "" >sparse_entry

Just

		>sparse_entry

is sufficient, no?

> +	fi &&
> +	git add sparse_entry &&
> +	git update-index --skip-worktree sparse_entry &&
> +	SPARSE_ENTRY_BLOB=$(git rev-parse :sparse_entry)
> +}
> +
> +test_sparse_entry_unchanged() {

Style.

	test_sparse_entry_unchanged () {

> +	echo "100644 $SPARSE_ENTRY_BLOB 0	sparse_entry" >expected &&
> +	git ls-files --stage sparse_entry >actual &&
> +	test_cmp expected actual

OK.  So the expected pattern is to first "setup", do stuff that
shouldn't affect the sparse entry in the index, and then call this
to make sure?

> +}

> +test_expect_success "git add does not remove SKIP_WORKTREE entries" '

We use the term SKIP_WORKTREE and SPARSE interchangeably here.  I
wonder if it is easier to understand if we stick to one e.g. by
saying "... does not remove 'sparse' entries" instead?  I dunno.

> +	setup_sparse_entry &&
> +	rm sparse_entry &&
> +	git add sparse_entry &&
> +	test_sparse_entry_unchanged

Wow.  OK.  Makes a reader wonder what should happen when the two
operations are replaced with "git rm sparse_entry"; let's read on.

> +'
> +
> +test_expect_success "git add -A does not remove SKIP_WORKTREE entries" '
> +	setup_sparse_entry &&
> +	rm sparse_entry &&
> +	git add -A &&
> +	test_sparse_entry_unchanged
> +'

OK.  As there is nothing other than sparse_entry in the working tree
or in the index, the above two should be equivalent.

I wonder what should happen if the "add -A" gets replaced with "add .";
it should behave the same way, I think.  Is it worth testing that
case as well, or is it redundant?

> +for opt in "" -f -u --ignore-removal
> +do
> +	if test -n "$opt"
> +	then
> +		opt=" $opt"
> +	fi

The above is cumulative, and as a consequence, "git add -u <path>"
is not tested, but "git add -f -u <path>" is.  Intended?  How was
the order of the options listed in "for opt in ..." chosen?

> +	test_expect_success "git add$opt does not update SKIP_WORKTREE entries" '
> +		setup_sparse_entry &&
> +		echo modified >sparse_entry &&
> +		git add $opt sparse_entry &&
> +		test_sparse_entry_unchanged
> +	'
> +done
> +
> +test_expect_success 'git add --refresh does not update SKIP_WORKTREE entries' '
> +	setup_sparse_entry &&
> +	test-tool chmtime -60 sparse_entry &&
> +	git add --refresh sparse_entry &&
> +
> +	# We must unset the SKIP_WORKTREE bit, otherwise
> +	# git diff-files would skip examining the file
> +	git update-index --no-skip-worktree sparse_entry &&
> +
> +	echo sparse_entry >expected &&
> +	git diff-files --name-only sparse_entry >actual &&
> +	test_cmp actual expected

Hmph, I am not sure what we are testing at this point.  One way to
make the final diff-files step show sparse_entry would be for "git
add --refresh" to be a no-op, in which case, the cached stat
information in the index would be different in mtime from the path
in the working tree.  But "update-index --no-skip-worktree" may be
buggy and further change or invalidate the cached stat information
to cause diff-files to report that the path may be different.

> +'
> +
> +test_expect_failure 'git add --chmod does not update SKIP_WORKTREE entries' '
> +	setup_sparse_entry &&
> +	git add --chmod=+x sparse_entry &&
> +	test_sparse_entry_unchanged

Hmph.  Should we also check if sparse_entry in the filesystem also
is not made executable, not just the entry in the index?

> +'
> +
> +test_expect_failure 'git add --renormalize does not update SKIP_WORKTREE entries' '
> +	test_config core.autocrlf false &&
> +	setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
> +	echo "sparse_entry text=auto" >.gitattributes &&
> +	git add --renormalize sparse_entry &&
> +	test_sparse_entry_unchanged

Makes sense.

What should "git diff sparse_entry" say at this point, I have to
wonder?

> +'
> +
> +test_done

Thanks.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC PATCH 3/7] t3705: add tests for `git add` in sparse checkouts
  2021-02-17 23:01       ` Junio C Hamano
@ 2021-02-17 23:22         ` Eric Sunshine
  2021-02-17 23:34           ` Junio C Hamano
  2021-02-18  3:11           ` Matheus Tavares Bernardino
  2021-02-18  3:07         ` Matheus Tavares Bernardino
  2021-02-22 18:53         ` Elijah Newren
  2 siblings, 2 replies; 56+ messages in thread
From: Eric Sunshine @ 2021-02-17 23:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matheus Tavares, Git List, Elijah Newren, Derrick Stolee

On Wed, Feb 17, 2021 at 6:04 PM Junio C Hamano <gitster@pobox.com> wrote:
> Matheus Tavares <matheus.bernardino@usp.br> writes:
> > +for opt in "" -f -u --ignore-removal
> > +do
> > +     if test -n "$opt"
> > +     then
> > +             opt=" $opt"
> > +     fi
> > +     test_expect_success "git add$opt does not update SKIP_WORKTREE entries" '
>
> The above is cumulative, and as a consequence, "git add -u <path>"
> is not tested, but "git add -f -u <path>" is.  Intended?  How was
> the order of the options listed in "for opt in ..." chosen?

I may be misreading, but I don't think this is cumulative (though it's
easy to mistake it as such due to the way it inserts a space before
$opt). My interpretation is that `opt` gets overwritten with a new
value on each iteration, and it is inserting the space merely to make
the test title print nicely. A more idiomatic way to do this would
have been:

    for opt in "" -f -u --ignore-removal
    do
        test_expect_success " git add${opt:+ $opt} does ..." '

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC PATCH 3/7] t3705: add tests for `git add` in sparse checkouts
  2021-02-17 23:22         ` Eric Sunshine
@ 2021-02-17 23:34           ` Junio C Hamano
  2021-02-18  3:11           ` Matheus Tavares Bernardino
  1 sibling, 0 replies; 56+ messages in thread
From: Junio C Hamano @ 2021-02-17 23:34 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Matheus Tavares, Git List, Elijah Newren, Derrick Stolee

Eric Sunshine <sunshine@sunshineco.com> writes:

> On Wed, Feb 17, 2021 at 6:04 PM Junio C Hamano <gitster@pobox.com> wrote:
>> Matheus Tavares <matheus.bernardino@usp.br> writes:
>> > +for opt in "" -f -u --ignore-removal
>> > +do
>> > +     if test -n "$opt"
>> > +     then
>> > +             opt=" $opt"
>> > +     fi
>> > +     test_expect_success "git add$opt does not update SKIP_WORKTREE entries" '
>>
>> The above is cumulative, and as a consequence, "git add -u <path>"
>> is not tested, but "git add -f -u <path>" is.  Intended?  How was
>> the order of the options listed in "for opt in ..." chosen?
>
> I may be misreading, but I don't think this is cumulative (though it's
> easy to mistake it as such due to the way it inserts a space before
> $opt).

Ah, yes, I misread it.

> ... A more idiomatic way to do this would
> have been:
>
>     for opt in "" -f -u --ignore-removal
>     do
>         test_expect_success " git add${opt:+ $opt} does ..." '

Yes, indeed.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC PATCH 1/7] add --chmod: don't update index when --dry-run is used
  2021-02-17 21:45       ` Junio C Hamano
@ 2021-02-18  1:33         ` Matheus Tavares
  0 siblings, 0 replies; 56+ messages in thread
From: Matheus Tavares @ 2021-02-18  1:33 UTC (permalink / raw)
  To: gitster; +Cc: git, newren, stolee

On Wed, Feb 17, 2021 at 6:46 PM Junio C Hamano <gitster@pobox.com>
wrote:
>
> Matheus Tavares <matheus.bernardino@usp.br> writes:
>
> > `git add --chmod` applies the mode changes even when `--dry-run` is
> > used. Fix that and add some tests for this option combination.
>
> Well spotted.  I hope we can split this out of the series and fast
> track, as it is an obvious bugfix.

Makes sense, should I send this as a standalone patch, after applying
the suggested changes?

> I by mistake wrote error(_("...")) in the snippet below, but as a
> bugfix, we should stick to the existing fprintf(stderr, "...") without
> _().  i18n should be left outside the "bugfix" change.

Hmm, when I read your snippet I thought that because this is a small fix
it wouldn't be bad to include the internationalization in the same patch
(with a "While we are here ..." note in the commit message). But are
there other reasons why it is better to do this as a follow-up step? 

> > -static void chmod_pathspec(struct pathspec *pathspec, char flip)
> > +static void chmod_pathspec(struct pathspec *pathspec, char flip, int show_only)
> >  {
> >       int i;
> >
> > @@ -48,7 +48,8 @@ static void chmod_pathspec(struct pathspec *pathspec, char flip)
> >               if (pathspec && !ce_path_match(&the_index, ce, pathspec, NULL))
> >                       continue;
> >
> > -             if (chmod_cache_entry(ce, flip) < 0)
> > +             if ((show_only && !S_ISREG(ce->ce_mode)) ||
> > +                 (!show_only && chmod_cache_entry(ce, flip) < 0))
> >                       fprintf(stderr, "cannot chmod %cx '%s'\n", flip, ce->name);
> >       }
> >  }
>
> This is a bit dense, especially when the reader does not know by
> heart that chmod_cache_entry() refuses to chmod anything that is not
> a regular file.
>
> Even when dry-run, we know chmod will fail when the thing is not a
> regular file.  When not dry-run, we will try chmod and it will
> report an failure.  And we report an error under these conditions.
>
>         if (show_only
>             ? !S_ISREG(ce->ce_mode)
>             : chmod_cache_entry(ce, flip) < 0)
>                 error(_("cannot chmod ..."), ...);
>
> may express the same idea in a way that is a bit easier to follow.
>
> In any case, that "idea", while it is not wrong per-se, makes it as
> if the primary purpose of this code is to give an error message,
> which smells a bit funny.
>
>         if (!show_only)
>                 err = chmod_cache_entry(ce, flip);
>         else
>                 err = S_ISREG(ce->ce_mode) ? 0 : -1;
>
>         if (err < 0)
>                 error(_("cannot chmod ..."), ...);
>
> would waste one extra variable, but may make the primary point
> (i.e. we call chmod_cache_entry() unless dry-run) more clear.

And that's easier to read too. Thanks!

Also, in a following patch, should we make chmod_pathspec() return `err`
so that we can do:

	exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only);

and have the chmod error reflected in `add`s exit code?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC PATCH 3/7] t3705: add tests for `git add` in sparse checkouts
  2021-02-17 23:01       ` Junio C Hamano
  2021-02-17 23:22         ` Eric Sunshine
@ 2021-02-18  3:07         ` Matheus Tavares Bernardino
  2021-02-18 14:38           ` Matheus Tavares
  2021-02-18 19:02           ` Junio C Hamano
  2021-02-22 18:53         ` Elijah Newren
  2 siblings, 2 replies; 56+ messages in thread
From: Matheus Tavares Bernardino @ 2021-02-18  3:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Elijah Newren, Derrick Stolee

On Wed, Feb 17, 2021 at 8:01 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares <matheus.bernardino@usp.br> writes:
> >
> > +test_expect_success "git add does not remove SKIP_WORKTREE entries" '
>
> We use the term SKIP_WORKTREE and SPARSE interchangeably here.  I
> wonder if it is easier to understand if we stick to one e.g. by
> saying "... does not remove 'sparse' entries" instead?  I dunno.

Good idea, thanks.

> > +     setup_sparse_entry &&
> > +     rm sparse_entry &&
> > +     git add sparse_entry &&
> > +     test_sparse_entry_unchanged
>
> Wow.  OK.  Makes a reader wonder what should happen when the two
> operations are replaced with "git rm sparse_entry"; let's read on.
>
> > +'
> > +
> > +test_expect_success "git add -A does not remove SKIP_WORKTREE entries" '
> > +     setup_sparse_entry &&
> > +     rm sparse_entry &&
> > +     git add -A &&
> > +     test_sparse_entry_unchanged
> > +'
>
> OK.  As there is nothing other than sparse_entry in the working tree
> or in the index, the above two should be equivalent.

I just realized that the "actual" file created by the previous
test_sparse_entry_unchanged would also be added to the index here.
This doesn't affect the current test or the next ones, but I guess we
could use `git add -A sparse_entry` to avoid any future problems.

> I wonder what should happen if the "add -A" gets replaced with "add .";
> it should behave the same way, I think.  Is it worth testing that
> case as well, or is it redundant?

Hmm, I think it might be better to test only `add -A sparse_entry`, to
avoid adding the "actual" file or others that might be introduced in
future changes.

> > +test_expect_success 'git add --refresh does not update SKIP_WORKTREE entries' '
> > +     setup_sparse_entry &&
> > +     test-tool chmtime -60 sparse_entry &&
> > +     git add --refresh sparse_entry &&
> > +
> > +     # We must unset the SKIP_WORKTREE bit, otherwise
> > +     # git diff-files would skip examining the file
> > +     git update-index --no-skip-worktree sparse_entry &&
> > +
> > +     echo sparse_entry >expected &&
> > +     git diff-files --name-only sparse_entry >actual &&
> > +     test_cmp actual expected
>
> Hmph, I am not sure what we are testing at this point.  One way to
> make the final diff-files step show sparse_entry would be for "git
> add --refresh" to be a no-op, in which case, the cached stat
> information in the index would be different in mtime from the path
> in the working tree.  But "update-index --no-skip-worktree" may be
> buggy and further change or invalidate the cached stat information
> to cause diff-files to report that the path may be different.

Oh, that is a problem... We could use `git ls-files --debug` and
directly compare the mtime field. But the ls-files doc says that
--debug format may change at any time... Any other idea?

> > +'
> > +
> > +test_expect_failure 'git add --chmod does not update SKIP_WORKTREE entries' '
> > +     setup_sparse_entry &&
> > +     git add --chmod=+x sparse_entry &&
> > +     test_sparse_entry_unchanged
>
> Hmph.  Should we also check if sparse_entry in the filesystem also
> is not made executable, not just the entry in the index?

I think so. This is already tested at the "core" --chmod tests in
t3700, but it certainly wouldn't hurt to test here too.

Thanks for all the great feedback

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC PATCH 3/7] t3705: add tests for `git add` in sparse checkouts
  2021-02-17 23:22         ` Eric Sunshine
  2021-02-17 23:34           ` Junio C Hamano
@ 2021-02-18  3:11           ` Matheus Tavares Bernardino
  1 sibling, 0 replies; 56+ messages in thread
From: Matheus Tavares Bernardino @ 2021-02-18  3:11 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Junio C Hamano, Git List, Elijah Newren, Derrick Stolee

On Wed, Feb 17, 2021 at 8:22 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Wed, Feb 17, 2021 at 6:04 PM Junio C Hamano <gitster@pobox.com> wrote:
> > Matheus Tavares <matheus.bernardino@usp.br> writes:
> > > +for opt in "" -f -u --ignore-removal
> > > +do
> > > +     if test -n "$opt"
> > > +     then
> > > +             opt=" $opt"
> > > +     fi
> > > +     test_expect_success "git add$opt does not update SKIP_WORKTREE entries" '
> >
> ... A more idiomatic way to do this would
> have been:
>
>     for opt in "" -f -u --ignore-removal
>     do
>         test_expect_success " git add${opt:+ $opt} does ..." '

That's much better, thanks :)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC PATCH 3/7] t3705: add tests for `git add` in sparse checkouts
  2021-02-18  3:07         ` Matheus Tavares Bernardino
@ 2021-02-18 14:38           ` Matheus Tavares
  2021-02-18 19:05             ` Junio C Hamano
  2021-02-18 19:02           ` Junio C Hamano
  1 sibling, 1 reply; 56+ messages in thread
From: Matheus Tavares @ 2021-02-18 14:38 UTC (permalink / raw)
  To: gitster; +Cc: git, newren, stolee

On Thu, Feb 18, 2021 at 12:07 AM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote:
>
> On Wed, Feb 17, 2021 at 8:01 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Matheus Tavares <matheus.bernardino@usp.br> writes:
> > >
> > > +test_expect_success 'git add --refresh does not update SKIP_WORKTREE entries' '
> > > +     setup_sparse_entry &&
> > > +     test-tool chmtime -60 sparse_entry &&
> > > +     git add --refresh sparse_entry &&
> > > +
> > > +     # We must unset the SKIP_WORKTREE bit, otherwise
> > > +     # git diff-files would skip examining the file
> > > +     git update-index --no-skip-worktree sparse_entry &&
> > > +
> > > +     echo sparse_entry >expected &&
> > > +     git diff-files --name-only sparse_entry >actual &&
> > > +     test_cmp actual expected
> >
> > Hmph, I am not sure what we are testing at this point.  One way to
> > make the final diff-files step show sparse_entry would be for "git
> > add --refresh" to be a no-op, in which case, the cached stat
> > information in the index would be different in mtime from the path
> > in the working tree.  But "update-index --no-skip-worktree" may be
> > buggy and further change or invalidate the cached stat information
> > to cause diff-files to report that the path may be different.
>
> Oh, that is a problem... We could use `git ls-files --debug` and
> directly compare the mtime field. But the ls-files doc says that
> --debug format may change at any time... Any other idea?

Or maybe we could use a test helper for this? Something like:

-- >8 --
#include "test-tool.h"
#include "cache.h"

int cmd__cached_mtime(int argc, const char **argv)
{
	int i;

	setup_git_directory();
	if (read_cache() < 0)
		die("could not read the index");

	for (i = 1; i < argc; i++) {
		const struct stat_data *sd;
		int pos = cache_name_pos(argv[i], strlen(argv[i]));

		if (pos < 0) {
			pos = -pos-1;
			if (pos < active_nr && !strcmp(active_cache[pos]->name, argv[i]))
				die("'%s' is unmerged", argv[i]);
			else
				die("'%s' is not in the index", argv[i]);
		}

		sd = &active_cache[pos]->ce_stat_data;
		printf("%s %u:%u\n", argv[i], sd->sd_mtime.sec, sd->sd_mtime.nsec);
	}

	discard_cache();
	return 0;
}
-- >8 --

Then, the test would become:

	setup_sparse_entry &&
	test-tool cached_mtime sparse_entry >before &&
	test-tool chmtime -60 sparse_entry &&
	git add --refresh sparse_entry &&
	test-tool cached_mtime sparse_entry >after &&
	test_cmp before after

What do you think?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC PATCH 3/7] t3705: add tests for `git add` in sparse checkouts
  2021-02-18  3:07         ` Matheus Tavares Bernardino
  2021-02-18 14:38           ` Matheus Tavares
@ 2021-02-18 19:02           ` Junio C Hamano
  1 sibling, 0 replies; 56+ messages in thread
From: Junio C Hamano @ 2021-02-18 19:02 UTC (permalink / raw)
  To: Matheus Tavares Bernardino; +Cc: git, Elijah Newren, Derrick Stolee

Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes:

>> > +test_expect_success "git add -A does not remove SKIP_WORKTREE entries" '
>> > +     setup_sparse_entry &&
>> > +     rm sparse_entry &&
>> > +     git add -A &&
>> > +     test_sparse_entry_unchanged
>> > +'
>>
>> OK.  As there is nothing other than sparse_entry in the working tree
>> or in the index, the above two should be equivalent.
>
> I just realized that the "actual" file created by the previous
> test_sparse_entry_unchanged would also be added to the index here.
> This doesn't affect the current test or the next ones, but I guess we
> could use `git add -A sparse_entry` to avoid any future problems.
> ...
> Hmm, I think it might be better to test only `add -A sparse_entry`, to
> avoid adding the "actual" file or others that might be introduced in
> future changes.

Rewriting 'git add -A' to 'git add -A sparse_entry' may not be wrong
but it will invite "does -A somehow misbehave without pathspec?" and
other puzzlements.

If adding 'actual' or 'expect' do not matter, I think it would be OK
to just add it, but if it bothers you, we can prepare an .gitignore
and list them early in the test with a comment that says "we will do
many 'git add .' and the like and do not want to be affected by what
the test needs to use to verify the result".

> Oh, that is a problem... We could use `git ls-files --debug` and
> directly compare the mtime field. But the ls-files doc says that
> --debug format may change at any time... Any other idea?

The option is to help us developers; if somebody wants to change it
and breaks your tests, they are responsible for rewriting their
change in such a way to keep your tests working or adjust your tests
to their new output format.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC PATCH 3/7] t3705: add tests for `git add` in sparse checkouts
  2021-02-18 14:38           ` Matheus Tavares
@ 2021-02-18 19:05             ` Junio C Hamano
  0 siblings, 0 replies; 56+ messages in thread
From: Junio C Hamano @ 2021-02-18 19:05 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, newren, stolee

Matheus Tavares <matheus.bernardino@usp.br> writes:

> Then, the test would become:
>
> 	setup_sparse_entry &&
> 	test-tool cached_mtime sparse_entry >before &&
> 	test-tool chmtime -60 sparse_entry &&
> 	git add --refresh sparse_entry &&
> 	test-tool cached_mtime sparse_entry >after &&
> 	test_cmp before after
>
> What do you think?

I do not see much point in introducing a duplicated "ls-files --debug"
that gives only a subset of its output.  Even if we add test-tool,
we would need to reserve the right to change its output format any time,
so I am not sure what we'd be gaining by avoiding the use of
existing one.

Thanks.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC PATCH 6/7] add: warn when pathspec only matches SKIP_WORKTREE entries
  2021-02-17 21:02     ` [RFC PATCH 6/7] add: warn when pathspec only matches SKIP_WORKTREE entries Matheus Tavares
@ 2021-02-19  0:34       ` Junio C Hamano
  2021-02-19 17:11         ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 56+ messages in thread
From: Junio C Hamano @ 2021-02-19  0:34 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, newren, stolee

Matheus Tavares <matheus.bernardino@usp.br> writes:

> `git add` already refrains from updating SKIP_WORKTREE entries, but it
> silently succeeds when a pathspec only matches these entries. Instead,
> let's warn the user and display a hint on how to update these entries.

"Silently succeeds" reads as if it succeeds to update, but that is
not what you meant.

I guess the warning is justified and is desirable because an attempt
to add an ignored path would result in a similar hint, e.g.

    $ echo '*~' >.gitignore
    $ git add x~
    hint: use -f if you really want to...
    $ git add .

It is curious why the latter does not warn (even when there is
nothing yet to be added that is not ignored), but that is what we
have now.  If we are modeling the new behaviour for sparse entries
after the ignored files, we should do the same, I think.

> A warning message was chosen over erroring out right away to reproduce
> the same behavior `add` already exhibits with ignored files. This also
> allow users to continue their workflow without having to invoke `add`
> again with only the matching pathspecs, as the matched files will have
> already been added.

Makes sense.

> Note: refresh_index() was changed to only mark matches with
> no-SKIP-WORKTREE entries in the `seen` output parameter. This is exactly
> the behavior we want for `add`, and only `add` calls this function with
> a non-NULL `seen` pointer. So the change brings no side effect on
> other callers.

And possible new callers that wants to learn from seen[] output
would want the same semantics, presumably?

> diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
> index f7b0ea782e..f66d369bf4 100755
> --- a/t/t3705-add-sparse-checkout.sh
> +++ b/t/t3705-add-sparse-checkout.sh
> @@ -32,10 +32,22 @@ test_sparse_entry_unchanged() {
>  	test_cmp expected actual
>  }
>  
> +cat >sparse_entry_error <<-EOF
> +The following pathspecs only matched index entries outside the current
> +sparse checkout:
> +sparse_entry
> +EOF
> +
> +cat >error_and_hint sparse_entry_error - <<-EOF
> +hint: Disable or modify the sparsity rules if you intend to update such entries.
> +hint: Disable this message with "git config advice.updateSparsePath false"
> +EOF
> +
>  test_expect_success "git add does not remove SKIP_WORKTREE entries" '
>  	setup_sparse_entry &&
>  	rm sparse_entry &&
> -	git add sparse_entry &&
> +	test_must_fail git add sparse_entry 2>stderr &&
> +	test_i18ncmp error_and_hint stderr &&

OK, this demonstrates what exactly you meant by "silently succeed".
We are not expecting side effects that are any different from before
(i.e. sparse_entry is not removed from the index), but the command
is taught to error out and hint, which makes sense.

>  	test_sparse_entry_unchanged
>  '
>  
> @@ -56,7 +68,8 @@ do
>  	test_expect_success "git add$opt does not update SKIP_WORKTREE entries" '
>  		setup_sparse_entry &&
>  		echo modified >sparse_entry &&
> -		git add $opt sparse_entry &&
> +		test_must_fail git add $opt sparse_entry 2>stderr &&
> +		test_i18ncmp error_and_hint stderr &&
>  		test_sparse_entry_unchanged
>  	'
>  done
> @@ -64,7 +77,8 @@ done
>  test_expect_success 'git add --refresh does not update SKIP_WORKTREE entries' '
>  	setup_sparse_entry &&
>  	test-tool chmtime -60 sparse_entry &&
> -	git add --refresh sparse_entry &&
> +	test_must_fail git add --refresh sparse_entry 2>stderr &&
> +	test_i18ncmp error_and_hint stderr &&
>  
>  	# We must unset the SKIP_WORKTREE bit, otherwise
>  	# git diff-files would skip examining the file
> @@ -77,7 +91,8 @@ test_expect_success 'git add --refresh does not update SKIP_WORKTREE entries' '
>  
>  test_expect_success 'git add --chmod does not update SKIP_WORKTREE entries' '
>  	setup_sparse_entry &&
> -	git add --chmod=+x sparse_entry &&
> +	test_must_fail git add --chmod=+x sparse_entry 2>stderr &&
> +	test_i18ncmp error_and_hint stderr &&
>  	test_sparse_entry_unchanged
>  '
>  
> @@ -85,8 +100,23 @@ test_expect_success 'git add --renormalize does not update SKIP_WORKTREE entries
>  	test_config core.autocrlf false &&
>  	setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
>  	echo "sparse_entry text=auto" >.gitattributes &&
> -	git add --renormalize sparse_entry &&
> +	test_must_fail git add --renormalize sparse_entry 2>stderr &&
> +	test_i18ncmp error_and_hint stderr &&
>  	test_sparse_entry_unchanged
>  '
>  
> +test_expect_success 'do not advice about sparse entries when they do not match the pathspec' '
> +	setup_sparse_entry &&
> +	test_must_fail git add nonexistent sp 2>stderr &&
> +	test_i18ngrep "fatal: pathspec .nonexistent. did not match any files" stderr &&
> +	test_i18ngrep ! "The following pathspecs only matched index entries" stderr

This is because both of the two pathspec elements given do not match
the sparse entries?  It is curious how the command behaves when
given a pathspec that is broader, e.g. "." (aka "everything under
the sun").  We could do "add --dry-run" for the test if we do not
want to set up .gitignore appropriately and do not want to smudge
the index with stderr, expect, actual etc.

> +'
> +
> +test_expect_success 'add obeys advice.updateSparsePath' '
> +	setup_sparse_entry &&
> +	test_must_fail git -c advice.updateSparsePath=false add sparse_entry 2>stderr &&
> +	test_i18ncmp sparse_entry_error stderr
> +
> +'

OK.

Thanks.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC PATCH 6/7] add: warn when pathspec only matches SKIP_WORKTREE entries
  2021-02-19  0:34       ` Junio C Hamano
@ 2021-02-19 17:11         ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 56+ messages in thread
From: Matheus Tavares Bernardino @ 2021-02-19 17:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Elijah Newren, Derrick Stolee

On Thu, Feb 18, 2021 at 9:34 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares <matheus.bernardino@usp.br> writes:
>
> > `git add` already refrains from updating SKIP_WORKTREE entries, but it
> > silently succeeds when a pathspec only matches these entries. Instead,
> > let's warn the user and display a hint on how to update these entries.
>
> "Silently succeeds" reads as if it succeeds to update, but that is
> not what you meant.

Ok, I will rephrase this section.

> I guess the warning is justified and is desirable because an attempt
> to add an ignored path would result in a similar hint, e.g.
>
>     $ echo '*~' >.gitignore
>     $ git add x~
>     hint: use -f if you really want to...
>     $ git add .
>
> It is curious why the latter does not warn (even when there is
> nothing yet to be added that is not ignored), but that is what we
> have now.

Yeah, this also happens with other directories:

    $ echo 'dir/file' >.gitignore
    $ mkdir dir
    $ touch dir/file
    $ git add dir
    <no warning>

In your previous example, `git add '*~'` and `git add '[xy]~'` also
wouldn't warn about 'x~'.  IIUC, that's because `add` uses
DIR_COLLECT_IGNORED for fill_directory(), and this option "Only
returns ignored files that match pathspec exactly (no wildcards)."

> > Note: refresh_index() was changed to only mark matches with
> > no-SKIP-WORKTREE entries in the `seen` output parameter. This is exactly
> > the behavior we want for `add`, and only `add` calls this function with
> > a non-NULL `seen` pointer. So the change brings no side effect on
> > other callers.
>
> And possible new callers that wants to learn from seen[] output
> would want the same semantics, presumably?

Hmm, TBH I haven't given much thought about new callers that might
want to use seen[]. Perhaps would it be better to implement this
behind a REFRESH_* flag?

> > +test_expect_success 'do not advice about sparse entries when they do not match the pathspec' '
> > +     setup_sparse_entry &&
> > +     test_must_fail git add nonexistent sp 2>stderr &&
> > +     test_i18ngrep "fatal: pathspec .nonexistent. did not match any files" stderr &&
> > +     test_i18ngrep ! "The following pathspecs only matched index entries" stderr
>
> This is because both of the two pathspec elements given do not match
> the sparse entries?

Oops, 'sp' should not be here. But yes, it doesn't display the advice
because the pathspec didn't match 'sparse_entry'.

> It is curious how the command behaves when
> given a pathspec that is broader, e.g. "." (aka "everything under
> the sun").  We could do "add --dry-run" for the test if we do not
> want to set up .gitignore appropriately and do not want to smudge
> the index with stderr, expect, actual etc.

I'll add a test for '.', and perhaps also another test for the case
where the pathspec matches both sparse and dense entries. For the '.'
case, I think we could use a simple .gitignore with '*' and
'!/sparse_entry'.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC PATCH 3/7] t3705: add tests for `git add` in sparse checkouts
  2021-02-17 23:01       ` Junio C Hamano
  2021-02-17 23:22         ` Eric Sunshine
  2021-02-18  3:07         ` Matheus Tavares Bernardino
@ 2021-02-22 18:53         ` Elijah Newren
  2 siblings, 0 replies; 56+ messages in thread
From: Elijah Newren @ 2021-02-22 18:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matheus Tavares, Git Mailing List, Derrick Stolee

On Wed, Feb 17, 2021 at 3:01 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares <matheus.bernardino@usp.br> writes:
>
> > We already have a couple tests for `add` with SKIP_WORKTREE entries in
> > t7012, but these only cover the most basic scenarios. As we will be
> > changing how `add` deals with sparse paths in the subsequent commits,
> > let's move these two tests to their own file and add more test cases
> > for different `add` options and situations. This also demonstrates two
> > options that don't currently respect SKIP_WORKTREE entries: `--chmod`
> > and `--renormalize`.
>
> Nice.  It makes sense to describe what we want first, like this step..
>
> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> > ---
> >  t/t3705-add-sparse-checkout.sh   | 92 ++++++++++++++++++++++++++++++++
> >  t/t7012-skip-worktree-writing.sh | 19 -------
> >  2 files changed, 92 insertions(+), 19 deletions(-)
> >  create mode 100755 t/t3705-add-sparse-checkout.sh
> >
> > diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
> > new file mode 100755
> > index 0000000000..5530e796b5
> > --- /dev/null
> > +++ b/t/t3705-add-sparse-checkout.sh
> > @@ -0,0 +1,92 @@
> > +#!/bin/sh
> > +
> > +test_description='git add in sparse checked out working trees'
> > +
> > +. ./test-lib.sh
> > +
> > +SPARSE_ENTRY_BLOB=""
> > +
> > +# Optionally take a string for the entry's contents
> > +setup_sparse_entry()
> > +{
>
> Style.
>
>         setup_sparse_entry () {
>
> on a single line.
>
> > +     if test -f sparse_entry
> > +     then
> > +             rm sparse_entry
> > +     fi &&
> > +     git update-index --force-remove sparse_entry &&
>
> Why not an unconditional removal on the working tree side?
>
>         rm -f sparse_entry &&
>         git update-index --force-remove sparse_entry &&
>
> Are there cases where we may have sparse_entry directory here?
>
> > +
> > +     if test "$#" -eq 1
>
> No need to quote $# (we know it is a number).
>
> > +     then
> > +             printf "$1" >sparse_entry
>
> Make sure the test writers know that they are passing a string that
> will be interpreted as a printf format.  Review the comment before
> the function and adjust it appropriately ("a string" is not what you
> want to tell them).
>
> > +     else
> > +             printf "" >sparse_entry
>
> Just
>
>                 >sparse_entry
>
> is sufficient, no?
>
> > +     fi &&
> > +     git add sparse_entry &&
> > +     git update-index --skip-worktree sparse_entry &&
> > +     SPARSE_ENTRY_BLOB=$(git rev-parse :sparse_entry)
> > +}
> > +
> > +test_sparse_entry_unchanged() {
>
> Style.
>
>         test_sparse_entry_unchanged () {
>
> > +     echo "100644 $SPARSE_ENTRY_BLOB 0       sparse_entry" >expected &&
> > +     git ls-files --stage sparse_entry >actual &&
> > +     test_cmp expected actual
>
> OK.  So the expected pattern is to first "setup", do stuff that
> shouldn't affect the sparse entry in the index, and then call this
> to make sure?
>
> > +}
>
> > +test_expect_success "git add does not remove SKIP_WORKTREE entries" '
>
> We use the term SKIP_WORKTREE and SPARSE interchangeably here.  I
> wonder if it is easier to understand if we stick to one e.g. by
> saying "... does not remove 'sparse' entries" instead?  I dunno.
>
> > +     setup_sparse_entry &&
> > +     rm sparse_entry &&
> > +     git add sparse_entry &&
> > +     test_sparse_entry_unchanged
>
> Wow.  OK.  Makes a reader wonder what should happen when the two
> operations are replaced with "git rm sparse_entry"; let's read on.
>
> > +'
> > +
> > +test_expect_success "git add -A does not remove SKIP_WORKTREE entries" '
> > +     setup_sparse_entry &&
> > +     rm sparse_entry &&
> > +     git add -A &&
> > +     test_sparse_entry_unchanged
> > +'
>
> OK.  As there is nothing other than sparse_entry in the working tree
> or in the index, the above two should be equivalent.
>
> I wonder what should happen if the "add -A" gets replaced with "add .";
> it should behave the same way, I think.  Is it worth testing that
> case as well, or is it redundant?
>
> > +for opt in "" -f -u --ignore-removal
> > +do
> > +     if test -n "$opt"
> > +     then
> > +             opt=" $opt"
> > +     fi
>
> The above is cumulative, and as a consequence, "git add -u <path>"
> is not tested, but "git add -f -u <path>" is.  Intended?  How was
> the order of the options listed in "for opt in ..." chosen?
>
> > +     test_expect_success "git add$opt does not update SKIP_WORKTREE entries" '
> > +             setup_sparse_entry &&
> > +             echo modified >sparse_entry &&
> > +             git add $opt sparse_entry &&
> > +             test_sparse_entry_unchanged
> > +     '
> > +done
> > +
> > +test_expect_success 'git add --refresh does not update SKIP_WORKTREE entries' '
> > +     setup_sparse_entry &&
> > +     test-tool chmtime -60 sparse_entry &&
> > +     git add --refresh sparse_entry &&
> > +
> > +     # We must unset the SKIP_WORKTREE bit, otherwise
> > +     # git diff-files would skip examining the file
> > +     git update-index --no-skip-worktree sparse_entry &&
> > +
> > +     echo sparse_entry >expected &&
> > +     git diff-files --name-only sparse_entry >actual &&
> > +     test_cmp actual expected
>
> Hmph, I am not sure what we are testing at this point.  One way to
> make the final diff-files step show sparse_entry would be for "git
> add --refresh" to be a no-op, in which case, the cached stat
> information in the index would be different in mtime from the path
> in the working tree.  But "update-index --no-skip-worktree" may be
> buggy and further change or invalidate the cached stat information
> to cause diff-files to report that the path may be different.
>
> > +'
> > +
> > +test_expect_failure 'git add --chmod does not update SKIP_WORKTREE entries' '
> > +     setup_sparse_entry &&
> > +     git add --chmod=+x sparse_entry &&
> > +     test_sparse_entry_unchanged
>
> Hmph.  Should we also check if sparse_entry in the filesystem also
> is not made executable, not just the entry in the index?
>
> > +'
> > +
> > +test_expect_failure 'git add --renormalize does not update SKIP_WORKTREE entries' '
> > +     test_config core.autocrlf false &&
> > +     setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
> > +     echo "sparse_entry text=auto" >.gitattributes &&
> > +     git add --renormalize sparse_entry &&
> > +     test_sparse_entry_unchanged
>
> Makes sense.
>
> What should "git diff sparse_entry" say at this point, I have to
> wonder?

It'll show nothing:

$ git show :0:sparse_entry
LINEONE
LINETWO
$ echo foobar >sparse_entry
$ cat sparse_entry
foobar
$ git diff sparse_entry
$

Likewise, `git status` will ignore SKIP_WORKTREE entries, and `reset
--hard` will fail to correct these files.  Several of these were
discussed at https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/;
Matheus is just cleaning up the first few candidates.

I thought that present-despite-SKIP_WORKTREE-setting files would be
extremely rare.  While they are somewhat rare, they show up more than
I thought.  The reason I've seen them appear for users is that they
hit Ctrl+C in the middle of progress updates for a `git
sparse-checkout {disable,add} ...` command; they decide mid-operation
that they specified the wrong set of sparsity paths or didn't really
want to unsparsify or whatever, and wrongly assume that operations are
atomic (individual files are checked out to the working copy first,
followed by an update of the index and .git/info/sparse-checkout file
at the end).

The same kind of issue exists without SKIP_WORKTREE when people use
checkout or switch to change branches and hit Ctrl+C in the middle,
but `git status` will warn users about those and `git reset --hard`
will clean them up.  Neither is true in the
present-despite-SKIP_WORKTREE files case.  That should also be fixed
up, but add & rm were easier low-hanging fruit.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC PATCH 0/7] add/rm: honor sparse checkout and warn on sparse paths
  2021-02-17 21:02   ` [RFC PATCH 0/7] add/rm: honor sparse checkout and warn on sparse paths Matheus Tavares
                       ` (6 preceding siblings ...)
  2021-02-17 21:02     ` [RFC PATCH 7/7] rm: honor sparse checkout patterns Matheus Tavares
@ 2021-02-22 18:57     ` Elijah Newren
  2021-02-24  4:05     ` [PATCH v2 " Matheus Tavares
  8 siblings, 0 replies; 56+ messages in thread
From: Elijah Newren @ 2021-02-22 18:57 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: Git Mailing List, Derrick Stolee

Hi,

Sorry for the delay in getting back to you.

On Wed, Feb 17, 2021 at 1:02 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> This is based on the discussion at [1]. It makes `rm` honor sparse
> checkouts and adds a warning to both `rm` and `add`, for the case where
> a pathspec _only_ matches skip-worktree entries. The first two patches
> are somewhat unrelated fixes, but they are used by the later patches.
>
> [1]: https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/

I thought you said you wouldn't have time to look at git-add.  Are
there other commands you don't have time to look at?  I've got a few
suggestions...  :-)

Thanks so much for working on this; much appreciated.  I've looked
over the patch series, and am obviously a fan of the general thrust.
I didn't spot anything additional to point out; Junio already looked
over it pretty closely it appears.  I did add a comment answering one
of his questions, but that's it.

> Matheus Tavares (7):
>   add --chmod: don't update index when --dry-run is used
>   add: include magic part of pathspec on --refresh error
>   t3705: add tests for `git add` in sparse checkouts
>   add: make --chmod and --renormalize honor sparse checkouts
>   pathspec: allow to ignore SKIP_WORKTREE entries on index matching
>   add: warn when pathspec only matches SKIP_WORKTREE entries
>   rm: honor sparse checkout patterns
>
>  Documentation/config/advice.txt  |   4 +
>  Documentation/git-rm.txt         |   4 +-
>  advice.c                         |  19 +++++
>  advice.h                         |   4 +
>  builtin/add.c                    |  72 ++++++++++++++----
>  builtin/check-ignore.c           |   2 +-
>  builtin/rm.c                     |  35 ++++++---
>  pathspec.c                       |  25 ++++++-
>  pathspec.h                       |  13 +++-
>  read-cache.c                     |   3 +-
>  t/t3600-rm.sh                    |  54 ++++++++++++++
>  t/t3700-add.sh                   |  26 +++++++
>  t/t3705-add-sparse-checkout.sh   | 122 +++++++++++++++++++++++++++++++
>  t/t7011-skip-worktree-reading.sh |   5 --
>  t/t7012-skip-worktree-writing.sh |  19 -----
>  15 files changed, 349 insertions(+), 58 deletions(-)
>  create mode 100755 t/t3705-add-sparse-checkout.sh
>
> --
> 2.29.2

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 0/7] add/rm: honor sparse checkout and warn on sparse paths
  2021-02-17 21:02   ` [RFC PATCH 0/7] add/rm: honor sparse checkout and warn on sparse paths Matheus Tavares
                       ` (7 preceding siblings ...)
  2021-02-22 18:57     ` [RFC PATCH 0/7] add/rm: honor sparse checkout and warn on sparse paths Elijah Newren
@ 2021-02-24  4:05     ` Matheus Tavares
  2021-02-24  4:05       ` [PATCH v2 1/7] add: include magic part of pathspec on --refresh error Matheus Tavares
                         ` (7 more replies)
  8 siblings, 8 replies; 56+ messages in thread
From: Matheus Tavares @ 2021-02-24  4:05 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, stolee

Make `rm` honor sparse checkouts, and make both `rm` and `add` warn
about pathspecs that only match sparse entries.

This is now based on mt/add-chmod-fixes.

Main changes since RFC:

Ejected "add --chmod: don't update index when --dry-run is used", which
is now part of mt/add-chmod-fixes.

Patch 2:
- Fixed style problems.
- Fixed comment on setup_sparse_entry() about argument being passed to
  printf.
- Standardized on 'sparse entries' on test names.
- Added test for 'git add .', and set up a .gitignore to run this and
  the 'git add -A' test, to avoid smudging the index.
- Added test for --dry-run.
- Modified `git add --refresh` test to use `git ls-files --debug`
  instead of diff-files.

Added patch 5

Patch 6:
- Improved commit message.
- Used the flag added in patch 5 instead of directly changing
  refresh_index(), so that new users of seen[] can still have the
  complete match array.
- Added test for --ignore-missing.
- Added test where the pathspec matches both sparse and dense entries.
- Changed 'git add .' behavior where . only contains sparse entries. [1]

Patch 7:
- Moved rm-sparse-checkout tests to their own file. They need a repo
  with a different setup than the one t3600 has, and the subshells (or
  git -C ...) were making the tests too hard to read.
- Added tests for -f, --dry-run and --ignore-unmatch.
- Added test where the pathspec matches both sparse and dense entries.
- Also test if the index entry is really not removed (besides testing
  `git rm` exit code and stderr).
- Passed "a" instead of "/a" to git sparse-checkout to avoid MSYS path
  conversions on Windows that would make some of the new tests fail.

[1]: Although the sparse entries warning is based on the ignored files
     warning, they have some differences. One of them is that running
     'git add dir' where 'dir/somefile' is ignored does not trigger a
     warning, whereas if 'somefile' is a sparse entry, we warn. In
     this sense, I think it's more coherent to also warn on 'git add .'
     when . only has sparse entries. This is also consistent with what
     'git rm -r .' does in the same scenario, after this series.

Matheus Tavares (7):
  add: include magic part of pathspec on --refresh error
  t3705: add tests for `git add` in sparse checkouts
  add: make --chmod and --renormalize honor sparse checkouts
  pathspec: allow to ignore SKIP_WORKTREE entries on index matching
  refresh_index(): add REFRESH_DONT_MARK_SPARSE_MATCHES flag
  add: warn when pathspec only matches SKIP_WORKTREE entries
  rm: honor sparse checkout patterns

 Documentation/config/advice.txt  |   4 +
 Documentation/git-rm.txt         |   4 +-
 advice.c                         |  19 ++++
 advice.h                         |   4 +
 builtin/add.c                    |  75 ++++++++++++---
 builtin/check-ignore.c           |   2 +-
 builtin/rm.c                     |  35 ++++---
 cache.h                          |  15 +--
 pathspec.c                       |  25 ++++-
 pathspec.h                       |  13 ++-
 read-cache.c                     |   5 +-
 t/t3602-rm-sparse-checkout.sh    |  76 +++++++++++++++
 t/t3700-add.sh                   |   6 ++
 t/t3705-add-sparse-checkout.sh   | 155 +++++++++++++++++++++++++++++++
 t/t7011-skip-worktree-reading.sh |   5 -
 t/t7012-skip-worktree-writing.sh |  19 ----
 16 files changed, 398 insertions(+), 64 deletions(-)
 create mode 100755 t/t3602-rm-sparse-checkout.sh
 create mode 100755 t/t3705-add-sparse-checkout.sh

Range-diff against v1:
1:  5612c57977 < -:  ---------- add --chmod: don't update index when --dry-run is used
2:  5a06223007 = 1:  2831fd5744 add: include magic part of pathspec on --refresh error
3:  4fc81a83b1 ! 2:  72b8787018 t3705: add tests for `git add` in sparse checkouts
    @@ t/t3705-add-sparse-checkout.sh (new)
     +
     +SPARSE_ENTRY_BLOB=""
     +
    -+# Optionally take a string for the entry's contents
    -+setup_sparse_entry()
    -+{
    -+	if test -f sparse_entry
    -+	then
    -+		rm sparse_entry
    -+	fi &&
    ++# Optionally take a printf format string to write to the sparse_entry file
    ++setup_sparse_entry () {
    ++	rm -f sparse_entry &&
     +	git update-index --force-remove sparse_entry &&
     +
    -+	if test "$#" -eq 1
    ++	if test $# -eq 1
     +	then
     +		printf "$1" >sparse_entry
     +	else
    -+		printf "" >sparse_entry
    ++		>sparse_entry
     +	fi &&
     +	git add sparse_entry &&
     +	git update-index --skip-worktree sparse_entry &&
     +	SPARSE_ENTRY_BLOB=$(git rev-parse :sparse_entry)
     +}
     +
    -+test_sparse_entry_unchanged() {
    ++test_sparse_entry_unchanged () {
     +	echo "100644 $SPARSE_ENTRY_BLOB 0	sparse_entry" >expected &&
     +	git ls-files --stage sparse_entry >actual &&
     +	test_cmp expected actual
     +}
     +
    -+test_expect_success "git add does not remove SKIP_WORKTREE entries" '
    ++setup_gitignore () {
    ++	test_when_finished rm -f .gitignore &&
    ++	cat >.gitignore <<-EOF
    ++	*
    ++	!/sparse_entry
    ++	EOF
    ++}
    ++
    ++test_expect_success 'git add does not remove sparse entries' '
     +	setup_sparse_entry &&
     +	rm sparse_entry &&
     +	git add sparse_entry &&
     +	test_sparse_entry_unchanged
     +'
     +
    -+test_expect_success "git add -A does not remove SKIP_WORKTREE entries" '
    ++test_expect_success 'git add -A does not remove sparse entries' '
     +	setup_sparse_entry &&
     +	rm sparse_entry &&
    ++	setup_gitignore &&
     +	git add -A &&
     +	test_sparse_entry_unchanged
     +'
     +
    -+for opt in "" -f -u --ignore-removal
    -+do
    -+	if test -n "$opt"
    -+	then
    -+		opt=" $opt"
    -+	fi
    ++test_expect_success 'git add . does not remove sparse entries' '
    ++	setup_sparse_entry &&
    ++	rm sparse_entry &&
    ++	setup_gitignore &&
    ++	git add . &&
    ++	test_sparse_entry_unchanged
    ++'
     +
    -+	test_expect_success "git add$opt does not update SKIP_WORKTREE entries" '
    ++for opt in "" -f -u --ignore-removal --dry-run
    ++do
    ++	test_expect_success "git add${opt:+ $opt} does not update sparse entries" '
     +		setup_sparse_entry &&
     +		echo modified >sparse_entry &&
     +		git add $opt sparse_entry &&
    @@ t/t3705-add-sparse-checkout.sh (new)
     +	'
     +done
     +
    -+test_expect_success 'git add --refresh does not update SKIP_WORKTREE entries' '
    ++test_expect_success 'git add --refresh does not update sparse entries' '
     +	setup_sparse_entry &&
    ++	git ls-files --debug sparse_entry | grep mtime >before &&
     +	test-tool chmtime -60 sparse_entry &&
     +	git add --refresh sparse_entry &&
    -+
    -+	# We must unset the SKIP_WORKTREE bit, otherwise
    -+	# git diff-files would skip examining the file
    -+	git update-index --no-skip-worktree sparse_entry &&
    -+
    -+	echo sparse_entry >expected &&
    -+	git diff-files --name-only sparse_entry >actual &&
    -+	test_cmp actual expected
    ++	git ls-files --debug sparse_entry | grep mtime >after &&
    ++	test_cmp before after
     +'
     +
    -+test_expect_failure 'git add --chmod does not update SKIP_WORKTREE entries' '
    ++test_expect_failure 'git add --chmod does not update sparse entries' '
     +	setup_sparse_entry &&
     +	git add --chmod=+x sparse_entry &&
    -+	test_sparse_entry_unchanged
    ++	test_sparse_entry_unchanged &&
    ++	! test -x sparse_entry
     +'
     +
    -+test_expect_failure 'git add --renormalize does not update SKIP_WORKTREE entries' '
    ++test_expect_failure 'git add --renormalize does not update sparse entries' '
     +	test_config core.autocrlf false &&
     +	setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
     +	echo "sparse_entry text=auto" >.gitattributes &&
4:  ccb05cc3d7 ! 3:  0f03adf241 add: make --chmod and --renormalize honor sparse checkouts
    @@ Commit message
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
      ## builtin/add.c ##
    -@@ builtin/add.c: static void chmod_pathspec(struct pathspec *pathspec, char flip, int show_only)
    - 	for (i = 0; i < active_nr; i++) {
    +@@ builtin/add.c: static int chmod_pathspec(struct pathspec *pathspec, char flip, int show_only)
      		struct cache_entry *ce = active_cache[i];
    + 		int err;
      
     +		if (ce_skip_worktree(ce))
     +			continue;
    @@ builtin/add.c: static int renormalize_tracked_files(const struct pathspec *paths
      		if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
     
      ## t/t3705-add-sparse-checkout.sh ##
    -@@ t/t3705-add-sparse-checkout.sh: test_expect_success 'git add --refresh does not update SKIP_WORKTREE entries' '
    - 	test_cmp actual expected
    +@@ t/t3705-add-sparse-checkout.sh: test_expect_success 'git add --refresh does not update sparse entries' '
    + 	test_cmp before after
      '
      
    --test_expect_failure 'git add --chmod does not update SKIP_WORKTREE entries' '
    -+test_expect_success 'git add --chmod does not update SKIP_WORKTREE entries' '
    +-test_expect_failure 'git add --chmod does not update sparse entries' '
    ++test_expect_success 'git add --chmod does not update sparse entries' '
      	setup_sparse_entry &&
      	git add --chmod=+x sparse_entry &&
    - 	test_sparse_entry_unchanged
    + 	test_sparse_entry_unchanged &&
    + 	! test -x sparse_entry
      '
      
    --test_expect_failure 'git add --renormalize does not update SKIP_WORKTREE entries' '
    -+test_expect_success 'git add --renormalize does not update SKIP_WORKTREE entries' '
    +-test_expect_failure 'git add --renormalize does not update sparse entries' '
    ++test_expect_success 'git add --renormalize does not update sparse entries' '
      	test_config core.autocrlf false &&
      	setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
      	echo "sparse_entry text=auto" >.gitattributes &&
5:  00786cba82 = 4:  a8a8af22a0 pathspec: allow to ignore SKIP_WORKTREE entries on index matching
-:  ---------- > 5:  d65b214dd1 refresh_index(): add REFRESH_DONT_MARK_SPARSE_MATCHES flag
6:  ce74a60e0d ! 6:  24e889ca9b add: warn when pathspec only matches SKIP_WORKTREE entries
    @@ Commit message
         add: warn when pathspec only matches SKIP_WORKTREE entries
     
         `git add` already refrains from updating SKIP_WORKTREE entries, but it
    -    silently succeeds when a pathspec only matches these entries. Instead,
    -    let's warn the user and display a hint on how to update these entries.
    +    silently exits with zero code when a pathspec only matches these
    +    entries. Instead, let's warn the user and display a hint on how to
    +    update these entries.
     
         Note that the warning is only shown if the pathspec matches no untracked
         paths in the working tree and only matches index entries with the
    -    SKIP_WORKTREE bit set. Performance-wise, this patch doesn't change the
    -    number of ce_path_match() calls in the worst case scenario (because we
    -    still need to check the sparse entries for the warning). But in the
    -    general case, it avoids unnecessarily calling this function for each
    -    SKIP_WORKTREE entry.
    -
    -    A warning message was chosen over erroring out right away to reproduce
    -    the same behavior `add` already exhibits with ignored files. This also
    -    allow users to continue their workflow without having to invoke `add`
    -    again with only the matching pathspecs, as the matched files will have
    -    already been added.
    -
    -    Note: refresh_index() was changed to only mark matches with
    -    no-SKIP-WORKTREE entries in the `seen` output parameter. This is exactly
    -    the behavior we want for `add`, and only `add` calls this function with
    -    a non-NULL `seen` pointer. So the change brings no side effect on
    -    other callers.
    +    SKIP_WORKTREE bit set. A warning message was chosen over erroring out
    +    right away to reproduce the same behavior `add` already exhibits with
    +    ignored files. This also allow users to continue their workflow without
    +    having to invoke `add` again with only the matching pathspecs, as the
    +    matched files will have already been added.
     
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
    @@ builtin/add.c: static char *prune_directory(struct dir_struct *dir, struct paths
     +	int i, ret = 0;
     +	char *skip_worktree_seen = NULL;
     +	struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
    ++	int flags = REFRESH_DONT_MARK_SPARSE_MATCHES |
    ++		    (verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET);
      
      	seen = xcalloc(pathspec->nr, 1);
    - 	refresh_index(&the_index, verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET,
    - 		      pathspec, seen, _("Unstaged changes after refreshing the index:"));
    +-	refresh_index(&the_index, verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET,
    +-		      pathspec, seen, _("Unstaged changes after refreshing the index:"));
    ++	refresh_index(&the_index, flags, pathspec, seen,
    ++		      _("Unstaged changes after refreshing the index:"));
      	for (i = 0; i < pathspec->nr; i++) {
     -		if (!seen[i])
     -			die(_("pathspec '%s' did not match any files"),
    @@ builtin/add.c: int cmd_add(int argc, const char **argv, const char *prefix)
     -			    ((pathspec.items[i].magic &
     -			      (PATHSPEC_GLOB | PATHSPEC_ICASE)) ||
     -			     !file_exists(path))) {
    -+			if (seen[i] || !path[0])
    ++			if (seen[i])
     +				continue;
     +
     +			if (matches_skip_worktree(&pathspec, i, &skip_worktree_seen)) {
    @@ builtin/add.c: int cmd_add(int argc, const char **argv, const char *prefix)
     +				continue;
     +			}
     +
    ++			/* Don't complain at 'git add .' inside empty repo. */
    ++			if (!path[0])
    ++				continue;
    ++
     +			if ((pathspec.items[i].magic & (PATHSPEC_GLOB | PATHSPEC_ICASE)) ||
     +			    !file_exists(path)) {
      				if (ignore_missing) {
    @@ pathspec.h: void add_pathspec_matches_against_index(const struct pathspec *paths
      			 const char *name, int namelen,
      			 const struct pathspec_item *item);
     
    - ## read-cache.c ##
    -@@ read-cache.c: int refresh_index(struct index_state *istate, unsigned int flags,
    - 		if (ignore_submodules && S_ISGITLINK(ce->ce_mode))
    - 			continue;
    - 
    --		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
    -+		if (pathspec && !ce_path_match(istate, ce, pathspec,
    -+					       ce_skip_worktree(ce) ? NULL : seen))
    - 			filtered = 1;
    - 
    - 		if (ce_stage(ce)) {
    -
      ## t/t3705-add-sparse-checkout.sh ##
    -@@ t/t3705-add-sparse-checkout.sh: test_sparse_entry_unchanged() {
    - 	test_cmp expected actual
    +@@ t/t3705-add-sparse-checkout.sh: setup_gitignore () {
    + 	EOF
      }
      
    -+cat >sparse_entry_error <<-EOF
    -+The following pathspecs only matched index entries outside the current
    -+sparse checkout:
    -+sparse_entry
    -+EOF
    ++test_expect_success 'setup' '
    ++	cat >sparse_error_header <<-EOF &&
    ++	The following pathspecs only matched index entries outside the current
    ++	sparse checkout:
    ++	EOF
    ++
    ++	cat >sparse_hint <<-EOF &&
    ++	hint: Disable or modify the sparsity rules if you intend to update such entries.
    ++	hint: Disable this message with "git config advice.updateSparsePath false"
    ++	EOF
     +
    -+cat >error_and_hint sparse_entry_error - <<-EOF
    -+hint: Disable or modify the sparsity rules if you intend to update such entries.
    -+hint: Disable this message with "git config advice.updateSparsePath false"
    -+EOF
    ++	echo sparse_entry | cat sparse_error_header - >sparse_entry_error &&
    ++	cat sparse_entry_error sparse_hint >error_and_hint
    ++'
     +
    - test_expect_success "git add does not remove SKIP_WORKTREE entries" '
    + test_expect_success 'git add does not remove sparse entries' '
      	setup_sparse_entry &&
      	rm sparse_entry &&
     -	git add sparse_entry &&
    @@ t/t3705-add-sparse-checkout.sh: test_sparse_entry_unchanged() {
      	test_sparse_entry_unchanged
      '
      
    +@@ t/t3705-add-sparse-checkout.sh: test_expect_success 'git add -A does not remove sparse entries' '
    + 	setup_sparse_entry &&
    + 	rm sparse_entry &&
    + 	setup_gitignore &&
    +-	git add -A &&
    ++	git add -A 2>stderr &&
    ++	test_must_be_empty stderr &&
    + 	test_sparse_entry_unchanged
    + '
    + 
    +@@ t/t3705-add-sparse-checkout.sh: test_expect_success 'git add . does not remove sparse entries' '
    + 	setup_sparse_entry &&
    + 	rm sparse_entry &&
    + 	setup_gitignore &&
    +-	git add . &&
    ++	test_must_fail git add . 2>stderr &&
    ++
    ++	cat sparse_error_header >expect &&
    ++	echo . >>expect &&
    ++	cat sparse_hint >>expect &&
    ++
    ++	test_i18ncmp expect stderr &&
    + 	test_sparse_entry_unchanged
    + '
    + 
     @@ t/t3705-add-sparse-checkout.sh: do
    - 	test_expect_success "git add$opt does not update SKIP_WORKTREE entries" '
    + 	test_expect_success "git add${opt:+ $opt} does not update sparse entries" '
      		setup_sparse_entry &&
      		echo modified >sparse_entry &&
     -		git add $opt sparse_entry &&
    @@ t/t3705-add-sparse-checkout.sh: do
      		test_sparse_entry_unchanged
      	'
      done
    -@@ t/t3705-add-sparse-checkout.sh: done
    - test_expect_success 'git add --refresh does not update SKIP_WORKTREE entries' '
    +@@ t/t3705-add-sparse-checkout.sh: test_expect_success 'git add --refresh does not update sparse entries' '
      	setup_sparse_entry &&
    + 	git ls-files --debug sparse_entry | grep mtime >before &&
      	test-tool chmtime -60 sparse_entry &&
     -	git add --refresh sparse_entry &&
     +	test_must_fail git add --refresh sparse_entry 2>stderr &&
     +	test_i18ncmp error_and_hint stderr &&
    + 	git ls-files --debug sparse_entry | grep mtime >after &&
    + 	test_cmp before after
    + '
      
    - 	# We must unset the SKIP_WORKTREE bit, otherwise
    - 	# git diff-files would skip examining the file
    -@@ t/t3705-add-sparse-checkout.sh: test_expect_success 'git add --refresh does not update SKIP_WORKTREE entries' '
    - 
    - test_expect_success 'git add --chmod does not update SKIP_WORKTREE entries' '
    + test_expect_success 'git add --chmod does not update sparse entries' '
      	setup_sparse_entry &&
     -	git add --chmod=+x sparse_entry &&
     +	test_must_fail git add --chmod=+x sparse_entry 2>stderr &&
     +	test_i18ncmp error_and_hint stderr &&
    - 	test_sparse_entry_unchanged
    + 	test_sparse_entry_unchanged &&
    + 	! test -x sparse_entry
      '
    - 
    -@@ t/t3705-add-sparse-checkout.sh: test_expect_success 'git add --renormalize does not update SKIP_WORKTREE entries
    +@@ t/t3705-add-sparse-checkout.sh: test_expect_success 'git add --renormalize does not update sparse entries' '
      	test_config core.autocrlf false &&
      	setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
      	echo "sparse_entry text=auto" >.gitattributes &&
    @@ t/t3705-add-sparse-checkout.sh: test_expect_success 'git add --renormalize does
      	test_sparse_entry_unchanged
      '
      
    ++test_expect_success 'git add --dry-run --ignore-missing warn on sparse path' '
    ++	setup_sparse_entry &&
    ++	rm sparse_entry &&
    ++	test_must_fail git add --dry-run --ignore-missing sparse_entry 2>stderr &&
    ++	test_i18ncmp error_and_hint stderr &&
    ++	test_sparse_entry_unchanged
    ++'
    ++
     +test_expect_success 'do not advice about sparse entries when they do not match the pathspec' '
     +	setup_sparse_entry &&
    -+	test_must_fail git add nonexistent sp 2>stderr &&
    ++	test_must_fail git add nonexistent 2>stderr &&
     +	test_i18ngrep "fatal: pathspec .nonexistent. did not match any files" stderr &&
     +	test_i18ngrep ! "The following pathspecs only matched index entries" stderr
     +'
     +
    ++test_expect_success 'do not warn when pathspec matches dense entries' '
    ++	setup_sparse_entry &&
    ++	echo modified >sparse_entry &&
    ++	>dense_entry &&
    ++	git add "*_entry" 2>stderr &&
    ++	test_must_be_empty stderr &&
    ++	test_sparse_entry_unchanged &&
    ++	git ls-files --error-unmatch dense_entry
    ++'
    ++
     +test_expect_success 'add obeys advice.updateSparsePath' '
     +	setup_sparse_entry &&
     +	test_must_fail git -c advice.updateSparsePath=false add sparse_entry 2>stderr &&
7:  e76c7c6999 ! 7:  08f0c32bfc rm: honor sparse checkout patterns
    @@ Commit message
         rules, but `git rm` doesn't follow the same restrictions. This is
         somewhat counter-intuitive and inconsistent. So make `rm` honor the
         sparse checkout and advise on how to remove SKIP_WORKTREE entries, just
    -    like `add` does. Also add a few tests for the new behavior.
    +    like `add` does. Also add some tests for the new behavior.
     
         Suggested-by: Elijah Newren <newren@gmail.com>
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
    @@ builtin/rm.c: int cmd_rm(int argc, const char **argv, const char *prefix)
     +	return ret;
      }
     
    - ## t/t3600-rm.sh ##
    -@@ t/t3600-rm.sh: test_expect_success 'rm empty string should fail' '
    - 	test_must_fail git rm -rf ""
    - '
    - 
    -+test_expect_success 'setup repo for tests with sparse-checkout' '
    -+	git init sparse &&
    -+	(
    -+		cd sparse &&
    -+		mkdir -p sub/dir &&
    -+		touch a b c sub/d sub/dir/e &&
    -+		git add -A &&
    -+		git commit -m files
    -+	) &&
    + ## t/t3602-rm-sparse-checkout.sh (new) ##
    +@@
    ++#!/bin/sh
    ++
    ++test_description='git rm in sparse checked out working trees'
    ++
    ++. ./test-lib.sh
    ++
    ++test_expect_success 'setup' '
    ++	mkdir -p sub/dir &&
    ++	touch a b c sub/d sub/dir/e &&
    ++	git add -A &&
    ++	git commit -m files &&
     +
     +	cat >sparse_entry_b_error <<-EOF &&
     +	The following pathspecs only matched index entries outside the current
    @@ t/t3600-rm.sh: test_expect_success 'rm empty string should fail' '
     +	EOF
     +'
     +
    -+test_expect_success 'rm should respect sparse-checkout' '
    -+	git -C sparse sparse-checkout set "/a" &&
    -+	test_must_fail git -C sparse rm b 2>stderr &&
    -+	test_i18ncmp b_error_and_hint stderr
    ++for opt in "" -f --dry-run
    ++do
    ++	test_expect_success "rm${opt:+ $opt} does not remove sparse entries" '
    ++		git sparse-checkout set a &&
    ++		test_must_fail git rm $opt b 2>stderr &&
    ++		test_i18ncmp b_error_and_hint stderr &&
    ++		git ls-files --error-unmatch b
    ++	'
    ++done
    ++
    ++test_expect_success 'recursive rm does not remove sparse entries' '
    ++	git reset --hard &&
    ++	git sparse-checkout set sub/dir &&
    ++	git rm -r sub &&
    ++	git status --porcelain -uno >actual &&
    ++	echo "D  sub/dir/e" >expected &&
    ++	test_cmp expected actual
     +'
     +
     +test_expect_success 'rm obeys advice.updateSparsePath' '
    -+	git -C sparse reset --hard &&
    -+	git -C sparse sparse-checkout set "/a" &&
    -+	test_must_fail git -C sparse -c advice.updateSparsePath=false rm b 2>stderr &&
    ++	git reset --hard &&
    ++	git sparse-checkout set a &&
    ++	test_must_fail git -c advice.updateSparsePath=false rm b 2>stderr &&
     +	test_i18ncmp sparse_entry_b_error stderr
    -+
    -+'
    -+
    -+test_expect_success 'recursive rm should respect sparse-checkout' '
    -+	(
    -+		cd sparse &&
    -+		git reset --hard &&
    -+		git sparse-checkout set "sub/dir" &&
    -+		git rm -r sub &&
    -+		git status --porcelain -uno >../actual
    -+	) &&
    -+	echo "D  sub/dir/e" >expected &&
    -+	test_cmp expected actual
     +'
     +
     +test_expect_success 'do not advice about sparse entries when they do not match the pathspec' '
    -+	test_must_fail git -C sparse rm nonexistent 2>stderr &&
    ++	git reset --hard &&
    ++	git sparse-checkout set a &&
    ++	test_must_fail git rm nonexistent 2>stderr &&
     +	test_i18ngrep "fatal: pathspec .nonexistent. did not match any files" stderr &&
     +	test_i18ngrep ! "The following pathspecs only matched index entries" stderr
     +'
     +
    - test_done
    ++test_expect_success 'do not warn about sparse entries when pathspec matches dense entries' '
    ++	git reset --hard &&
    ++	git sparse-checkout set a &&
    ++	git rm "[ba]" 2>stderr &&
    ++	test_must_be_empty stderr &&
    ++	git ls-files --error-unmatch b &&
    ++	test_must_fail git ls-files --error-unmatch a
    ++'
    ++
    ++test_expect_success 'do not warn about sparse entries with --ignore-unmatch' '
    ++	git reset --hard &&
    ++	git sparse-checkout set a &&
    ++	git rm --ignore-unmatch b 2>stderr &&
    ++	test_must_be_empty stderr &&
    ++	git ls-files --error-unmatch b
    ++'
    ++
    ++test_done
     
      ## t/t7011-skip-worktree-reading.sh ##
     @@ t/t7011-skip-worktree-reading.sh: test_expect_success 'diff-files does not examine skip-worktree dirty entries' '
-- 
2.30.1


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 1/7] add: include magic part of pathspec on --refresh error
  2021-02-24  4:05     ` [PATCH v2 " Matheus Tavares
@ 2021-02-24  4:05       ` Matheus Tavares
  2021-02-24  4:05       ` [PATCH v2 2/7] t3705: add tests for `git add` in sparse checkouts Matheus Tavares
                         ` (6 subsequent siblings)
  7 siblings, 0 replies; 56+ messages in thread
From: Matheus Tavares @ 2021-02-24  4:05 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, stolee

When `git add --refresh <pathspec>` doesn't find any matches for the
given pathspec, it prints an error message using the `match` field of
the `struct pathspec_item`. However, this field doesn't contain the
magic part of the pathspec. Instead, let's use the `original` field.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/add.c  | 2 +-
 t/t3700-add.sh | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/builtin/add.c b/builtin/add.c
index ea762a41e3..24ed7e25f3 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -187,7 +187,7 @@ static void refresh(int verbose, const struct pathspec *pathspec)
 	for (i = 0; i < pathspec->nr; i++) {
 		if (!seen[i])
 			die(_("pathspec '%s' did not match any files"),
-			    pathspec->items[i].match);
+			    pathspec->items[i].original);
 	}
 	free(seen);
 }
diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index d402c775c0..15ede17804 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -196,6 +196,12 @@ test_expect_success 'git add --refresh with pathspec' '
 	grep baz actual
 '
 
+test_expect_success 'git add --refresh correctly reports no match error' "
+	echo \"fatal: pathspec ':(icase)nonexistent' did not match any files\" >expect &&
+	test_must_fail git add --refresh ':(icase)nonexistent' 2>actual &&
+	test_i18ncmp expect actual
+"
+
 test_expect_success POSIXPERM,SANITY 'git add should fail atomically upon an unreadable file' '
 	git reset --hard &&
 	date >foo1 &&
-- 
2.30.1


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 2/7] t3705: add tests for `git add` in sparse checkouts
  2021-02-24  4:05     ` [PATCH v2 " Matheus Tavares
  2021-02-24  4:05       ` [PATCH v2 1/7] add: include magic part of pathspec on --refresh error Matheus Tavares
@ 2021-02-24  4:05       ` Matheus Tavares
  2021-02-24  5:15         ` Elijah Newren
  2021-02-24  4:05       ` [PATCH v2 3/7] add: make --chmod and --renormalize honor " Matheus Tavares
                         ` (5 subsequent siblings)
  7 siblings, 1 reply; 56+ messages in thread
From: Matheus Tavares @ 2021-02-24  4:05 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, stolee

We already have a couple tests for `add` with SKIP_WORKTREE entries in
t7012, but these only cover the most basic scenarios. As we will be
changing how `add` deals with sparse paths in the subsequent commits,
let's move these two tests to their own file and add more test cases
for different `add` options and situations. This also demonstrates two
options that don't currently respect SKIP_WORKTREE entries: `--chmod`
and `--renormalize`.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 t/t3705-add-sparse-checkout.sh   | 96 ++++++++++++++++++++++++++++++++
 t/t7012-skip-worktree-writing.sh | 19 -------
 2 files changed, 96 insertions(+), 19 deletions(-)
 create mode 100755 t/t3705-add-sparse-checkout.sh

diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
new file mode 100755
index 0000000000..9bb5dc2389
--- /dev/null
+++ b/t/t3705-add-sparse-checkout.sh
@@ -0,0 +1,96 @@
+#!/bin/sh
+
+test_description='git add in sparse checked out working trees'
+
+. ./test-lib.sh
+
+SPARSE_ENTRY_BLOB=""
+
+# Optionally take a printf format string to write to the sparse_entry file
+setup_sparse_entry () {
+	rm -f sparse_entry &&
+	git update-index --force-remove sparse_entry &&
+
+	if test $# -eq 1
+	then
+		printf "$1" >sparse_entry
+	else
+		>sparse_entry
+	fi &&
+	git add sparse_entry &&
+	git update-index --skip-worktree sparse_entry &&
+	SPARSE_ENTRY_BLOB=$(git rev-parse :sparse_entry)
+}
+
+test_sparse_entry_unchanged () {
+	echo "100644 $SPARSE_ENTRY_BLOB 0	sparse_entry" >expected &&
+	git ls-files --stage sparse_entry >actual &&
+	test_cmp expected actual
+}
+
+setup_gitignore () {
+	test_when_finished rm -f .gitignore &&
+	cat >.gitignore <<-EOF
+	*
+	!/sparse_entry
+	EOF
+}
+
+test_expect_success 'git add does not remove sparse entries' '
+	setup_sparse_entry &&
+	rm sparse_entry &&
+	git add sparse_entry &&
+	test_sparse_entry_unchanged
+'
+
+test_expect_success 'git add -A does not remove sparse entries' '
+	setup_sparse_entry &&
+	rm sparse_entry &&
+	setup_gitignore &&
+	git add -A &&
+	test_sparse_entry_unchanged
+'
+
+test_expect_success 'git add . does not remove sparse entries' '
+	setup_sparse_entry &&
+	rm sparse_entry &&
+	setup_gitignore &&
+	git add . &&
+	test_sparse_entry_unchanged
+'
+
+for opt in "" -f -u --ignore-removal --dry-run
+do
+	test_expect_success "git add${opt:+ $opt} does not update sparse entries" '
+		setup_sparse_entry &&
+		echo modified >sparse_entry &&
+		git add $opt sparse_entry &&
+		test_sparse_entry_unchanged
+	'
+done
+
+test_expect_success 'git add --refresh does not update sparse entries' '
+	setup_sparse_entry &&
+	git ls-files --debug sparse_entry | grep mtime >before &&
+	test-tool chmtime -60 sparse_entry &&
+	git add --refresh sparse_entry &&
+	git ls-files --debug sparse_entry | grep mtime >after &&
+	test_cmp before after
+'
+
+test_expect_failure 'git add --chmod does not update sparse entries' '
+	setup_sparse_entry &&
+	git add --chmod=+x sparse_entry &&
+	test_sparse_entry_unchanged &&
+	! test -x sparse_entry
+'
+
+test_expect_failure 'git add --renormalize does not update sparse entries' '
+	test_config core.autocrlf false &&
+	setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
+	echo "sparse_entry text=auto" >.gitattributes &&
+	git add --renormalize sparse_entry &&
+	test_sparse_entry_unchanged
+'
+
+test_done
diff --git a/t/t7012-skip-worktree-writing.sh b/t/t7012-skip-worktree-writing.sh
index e5c6a038fb..217207c1ce 100755
--- a/t/t7012-skip-worktree-writing.sh
+++ b/t/t7012-skip-worktree-writing.sh
@@ -60,13 +60,6 @@ setup_absent() {
 	git update-index --skip-worktree 1
 }
 
-test_absent() {
-	echo "100644 $EMPTY_BLOB 0	1" > expected &&
-	git ls-files --stage 1 > result &&
-	test_cmp expected result &&
-	test ! -f 1
-}
-
 setup_dirty() {
 	git update-index --force-remove 1 &&
 	echo dirty > 1 &&
@@ -100,18 +93,6 @@ test_expect_success 'index setup' '
 	test_cmp expected result
 '
 
-test_expect_success 'git-add ignores worktree content' '
-	setup_absent &&
-	git add 1 &&
-	test_absent
-'
-
-test_expect_success 'git-add ignores worktree content' '
-	setup_dirty &&
-	git add 1 &&
-	test_dirty
-'
-
 test_expect_success 'git-rm fails if worktree is dirty' '
 	setup_dirty &&
 	test_must_fail git rm 1 &&
-- 
2.30.1


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 3/7] add: make --chmod and --renormalize honor sparse checkouts
  2021-02-24  4:05     ` [PATCH v2 " Matheus Tavares
  2021-02-24  4:05       ` [PATCH v2 1/7] add: include magic part of pathspec on --refresh error Matheus Tavares
  2021-02-24  4:05       ` [PATCH v2 2/7] t3705: add tests for `git add` in sparse checkouts Matheus Tavares
@ 2021-02-24  4:05       ` Matheus Tavares
  2021-02-24  4:05       ` [PATCH v2 4/7] pathspec: allow to ignore SKIP_WORKTREE entries on index matching Matheus Tavares
                         ` (4 subsequent siblings)
  7 siblings, 0 replies; 56+ messages in thread
From: Matheus Tavares @ 2021-02-24  4:05 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, stolee

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/add.c                  | 5 +++++
 t/t3705-add-sparse-checkout.sh | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index 24ed7e25f3..5fec21a792 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -46,6 +46,9 @@ static int chmod_pathspec(struct pathspec *pathspec, char flip, int show_only)
 		struct cache_entry *ce = active_cache[i];
 		int err;
 
+		if (ce_skip_worktree(ce))
+			continue;
+
 		if (pathspec && !ce_path_match(&the_index, ce, pathspec, NULL))
 			continue;
 
@@ -144,6 +147,8 @@ static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
 	for (i = 0; i < active_nr; i++) {
 		struct cache_entry *ce = active_cache[i];
 
+		if (ce_skip_worktree(ce))
+			continue;
 		if (ce_stage(ce))
 			continue; /* do not touch unmerged paths */
 		if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
index 9bb5dc2389..6781620297 100755
--- a/t/t3705-add-sparse-checkout.sh
+++ b/t/t3705-add-sparse-checkout.sh
@@ -78,14 +78,14 @@ test_expect_success 'git add --refresh does not update sparse entries' '
 	test_cmp before after
 '
 
-test_expect_failure 'git add --chmod does not update sparse entries' '
+test_expect_success 'git add --chmod does not update sparse entries' '
 	setup_sparse_entry &&
 	git add --chmod=+x sparse_entry &&
 	test_sparse_entry_unchanged &&
 	! test -x sparse_entry
 '
 
-test_expect_failure 'git add --renormalize does not update sparse entries' '
+test_expect_success 'git add --renormalize does not update sparse entries' '
 	test_config core.autocrlf false &&
 	setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
 	echo "sparse_entry text=auto" >.gitattributes &&
-- 
2.30.1


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 4/7] pathspec: allow to ignore SKIP_WORKTREE entries on index matching
  2021-02-24  4:05     ` [PATCH v2 " Matheus Tavares
                         ` (2 preceding siblings ...)
  2021-02-24  4:05       ` [PATCH v2 3/7] add: make --chmod and --renormalize honor " Matheus Tavares
@ 2021-02-24  4:05       ` Matheus Tavares
  2021-02-24  5:23         ` Elijah Newren
  2021-02-24  4:05       ` [PATCH v2 5/7] refresh_index(): add REFRESH_DONT_MARK_SPARSE_MATCHES flag Matheus Tavares
                         ` (3 subsequent siblings)
  7 siblings, 1 reply; 56+ messages in thread
From: Matheus Tavares @ 2021-02-24  4:05 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, stolee

Add the 'ignore_skip_worktree' boolean parameter to both
add_pathspec_matches_against_index() and
find_pathspecs_matching_against_index(). When true, these functions will
not try to match the given pathspec with SKIP_WORKTREE entries. This
will be used in a future patch to make `git add` display a hint
when the pathspec matches only sparse paths.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/add.c          |  4 ++--
 builtin/check-ignore.c |  2 +-
 pathspec.c             | 10 +++++++---
 pathspec.h             |  5 +++--
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index 5fec21a792..e15b25a623 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -177,7 +177,7 @@ static char *prune_directory(struct dir_struct *dir, struct pathspec *pathspec,
 			*dst++ = entry;
 	}
 	dir->nr = dst - dir->entries;
-	add_pathspec_matches_against_index(pathspec, &the_index, seen);
+	add_pathspec_matches_against_index(pathspec, &the_index, seen, 0);
 	return seen;
 }
 
@@ -578,7 +578,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 		int i;
 
 		if (!seen)
-			seen = find_pathspecs_matching_against_index(&pathspec, &the_index);
+			seen = find_pathspecs_matching_against_index(&pathspec, &the_index, 0);
 
 		/*
 		 * file_exists() assumes exact match
diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c
index 3c652748d5..235b7fc905 100644
--- a/builtin/check-ignore.c
+++ b/builtin/check-ignore.c
@@ -100,7 +100,7 @@ static int check_ignore(struct dir_struct *dir,
 	 * should not be ignored, in order to be consistent with
 	 * 'git status', 'git add' etc.
 	 */
-	seen = find_pathspecs_matching_against_index(&pathspec, &the_index);
+	seen = find_pathspecs_matching_against_index(&pathspec, &the_index, 0);
 	for (i = 0; i < pathspec.nr; i++) {
 		full_path = pathspec.items[i].match;
 		pattern = NULL;
diff --git a/pathspec.c b/pathspec.c
index 7a229d8d22..e5e6b7458d 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -21,7 +21,7 @@
  */
 void add_pathspec_matches_against_index(const struct pathspec *pathspec,
 					const struct index_state *istate,
-					char *seen)
+					char *seen, int ignore_skip_worktree)
 {
 	int num_unmatched = 0, i;
 
@@ -38,6 +38,8 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
 		return;
 	for (i = 0; i < istate->cache_nr; i++) {
 		const struct cache_entry *ce = istate->cache[i];
+		if (ignore_skip_worktree && ce_skip_worktree(ce))
+			continue;
 		ce_path_match(istate, ce, pathspec, seen);
 	}
 }
@@ -51,10 +53,12 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
  * given pathspecs achieves against all items in the index.
  */
 char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
-					    const struct index_state *istate)
+					    const struct index_state *istate,
+					    int ignore_skip_worktree)
 {
 	char *seen = xcalloc(pathspec->nr, 1);
-	add_pathspec_matches_against_index(pathspec, istate, seen);
+	add_pathspec_matches_against_index(pathspec, istate, seen,
+					   ignore_skip_worktree);
 	return seen;
 }
 
diff --git a/pathspec.h b/pathspec.h
index 454ce364fa..8202882ecd 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -151,9 +151,10 @@ static inline int ps_strcmp(const struct pathspec_item *item,
 
 void add_pathspec_matches_against_index(const struct pathspec *pathspec,
 					const struct index_state *istate,
-					char *seen);
+					char *seen, int ignore_skip_worktree);
 char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
-					    const struct index_state *istate);
+					    const struct index_state *istate,
+					    int ignore_skip_worktree);
 int match_pathspec_attrs(const struct index_state *istate,
 			 const char *name, int namelen,
 			 const struct pathspec_item *item);
-- 
2.30.1


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 5/7] refresh_index(): add REFRESH_DONT_MARK_SPARSE_MATCHES flag
  2021-02-24  4:05     ` [PATCH v2 " Matheus Tavares
                         ` (3 preceding siblings ...)
  2021-02-24  4:05       ` [PATCH v2 4/7] pathspec: allow to ignore SKIP_WORKTREE entries on index matching Matheus Tavares
@ 2021-02-24  4:05       ` Matheus Tavares
  2021-02-24  4:05       ` [PATCH v2 6/7] add: warn when pathspec only matches SKIP_WORKTREE entries Matheus Tavares
                         ` (2 subsequent siblings)
  7 siblings, 0 replies; 56+ messages in thread
From: Matheus Tavares @ 2021-02-24  4:05 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, stolee

refresh_index() optionally takes a seen[] array to mark the pathspec
items that had matches in the index. This is used by `git add --refresh`
to find out if there were any pathspec without matches, and display an
error accordingly.

In the following patch, `git add` will also learn to warn about
pathspecs that only match sparse entries (which are not updated). But
for that, we will need a seen[] array marked exclusively with matches
from dense entries. To avoid having to call ce_path_match() again for
these entries after refresh_index() returns, add a flag that implements
this restriction inside the function itself.

Note that refresh_index() does not update sparse entries, regardless of
passing the flag or not. The flag only controls whether matches with
these entries should appear in the seen[] array.

While we are here, also realign the REFRESH_* flags and convert the hex
values to the more natural bit shift format, which makes it easier to
spot holes.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 cache.h      | 15 ++++++++-------
 read-cache.c |  5 ++++-
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/cache.h b/cache.h
index d928149614..eec864921a 100644
--- a/cache.h
+++ b/cache.h
@@ -879,13 +879,14 @@ int match_stat_data_racy(const struct index_state *istate,
 
 void fill_stat_cache_info(struct index_state *istate, struct cache_entry *ce, struct stat *st);
 
-#define REFRESH_REALLY		0x0001	/* ignore_valid */
-#define REFRESH_UNMERGED	0x0002	/* allow unmerged */
-#define REFRESH_QUIET		0x0004	/* be quiet about it */
-#define REFRESH_IGNORE_MISSING	0x0008	/* ignore non-existent */
-#define REFRESH_IGNORE_SUBMODULES	0x0010	/* ignore submodules */
-#define REFRESH_IN_PORCELAIN	0x0020	/* user friendly output, not "needs update" */
-#define REFRESH_PROGRESS	0x0040  /* show progress bar if stderr is tty */
+#define REFRESH_REALLY                   (1 << 0) /* ignore_valid */
+#define REFRESH_UNMERGED                 (1 << 1) /* allow unmerged */
+#define REFRESH_QUIET                    (1 << 2) /* be quiet about it */
+#define REFRESH_IGNORE_MISSING           (1 << 3) /* ignore non-existent */
+#define REFRESH_IGNORE_SUBMODULES        (1 << 4) /* ignore submodules */
+#define REFRESH_IN_PORCELAIN             (1 << 5) /* user friendly output, not "needs update" */
+#define REFRESH_PROGRESS                 (1 << 6) /* show progress bar if stderr is tty */
+#define REFRESH_DONT_MARK_SPARSE_MATCHES (1 << 7) /* don't mark sparse entries' matches on seen[] */
 int refresh_index(struct index_state *, unsigned int flags, const struct pathspec *pathspec, char *seen, const char *header_msg);
 /*
  * Refresh the index and write it to disk.
diff --git a/read-cache.c b/read-cache.c
index 29144cf879..485510845c 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1508,6 +1508,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	int quiet = (flags & REFRESH_QUIET) != 0;
 	int not_new = (flags & REFRESH_IGNORE_MISSING) != 0;
 	int ignore_submodules = (flags & REFRESH_IGNORE_SUBMODULES) != 0;
+	int no_sparse_on_seen = (flags & REFRESH_DONT_MARK_SPARSE_MATCHES) != 0;
 	int first = 1;
 	int in_porcelain = (flags & REFRESH_IN_PORCELAIN);
 	unsigned int options = (CE_MATCH_REFRESH |
@@ -1541,12 +1542,14 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		int cache_errno = 0;
 		int changed = 0;
 		int filtered = 0;
+		char *cur_seen;
 
 		ce = istate->cache[i];
 		if (ignore_submodules && S_ISGITLINK(ce->ce_mode))
 			continue;
 
-		if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
+		cur_seen = no_sparse_on_seen && ce_skip_worktree(ce) ? NULL : seen;
+		if (pathspec && !ce_path_match(istate, ce, pathspec, cur_seen))
 			filtered = 1;
 
 		if (ce_stage(ce)) {
-- 
2.30.1


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 6/7] add: warn when pathspec only matches SKIP_WORKTREE entries
  2021-02-24  4:05     ` [PATCH v2 " Matheus Tavares
                         ` (4 preceding siblings ...)
  2021-02-24  4:05       ` [PATCH v2 5/7] refresh_index(): add REFRESH_DONT_MARK_SPARSE_MATCHES flag Matheus Tavares
@ 2021-02-24  4:05       ` Matheus Tavares
  2021-02-24  6:50         ` Elijah Newren
  2021-02-24  4:05       ` [PATCH v2 7/7] rm: honor sparse checkout patterns Matheus Tavares
  2021-02-24  7:05       ` [PATCH v2 0/7] add/rm: honor sparse checkout and warn on sparse paths Elijah Newren
  7 siblings, 1 reply; 56+ messages in thread
From: Matheus Tavares @ 2021-02-24  4:05 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, stolee

`git add` already refrains from updating SKIP_WORKTREE entries, but it
silently exits with zero code when a pathspec only matches these
entries. Instead, let's warn the user and display a hint on how to
update these entries.

Note that the warning is only shown if the pathspec matches no untracked
paths in the working tree and only matches index entries with the
SKIP_WORKTREE bit set. A warning message was chosen over erroring out
right away to reproduce the same behavior `add` already exhibits with
ignored files. This also allow users to continue their workflow without
having to invoke `add` again with only the matching pathspecs, as the
matched files will have already been added.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Documentation/config/advice.txt |  3 ++
 advice.c                        | 19 +++++++++
 advice.h                        |  4 ++
 builtin/add.c                   | 70 ++++++++++++++++++++++++-------
 pathspec.c                      | 15 +++++++
 pathspec.h                      |  8 ++++
 t/t3705-add-sparse-checkout.sh  | 73 +++++++++++++++++++++++++++++----
 7 files changed, 171 insertions(+), 21 deletions(-)

diff --git a/Documentation/config/advice.txt b/Documentation/config/advice.txt
index acbd0c09aa..d53eafa00b 100644
--- a/Documentation/config/advice.txt
+++ b/Documentation/config/advice.txt
@@ -119,4 +119,7 @@ advice.*::
 	addEmptyPathspec::
 		Advice shown if a user runs the add command without providing
 		the pathspec parameter.
+	updateSparsePath::
+		Advice shown if the pathspec given to linkgit:git-add[1] only
+		matches index entries outside the current sparse-checkout.
 --
diff --git a/advice.c b/advice.c
index 164742305f..cf22c1a6e5 100644
--- a/advice.c
+++ b/advice.c
@@ -2,6 +2,7 @@
 #include "config.h"
 #include "color.h"
 #include "help.h"
+#include "string-list.h"
 
 int advice_fetch_show_forced_updates = 1;
 int advice_push_update_rejected = 1;
@@ -136,6 +137,7 @@ static struct {
 	[ADVICE_STATUS_HINTS]				= { "statusHints", 1 },
 	[ADVICE_STATUS_U_OPTION]			= { "statusUoption", 1 },
 	[ADVICE_SUBMODULE_ALTERNATE_ERROR_STRATEGY_DIE] = { "submoduleAlternateErrorStrategyDie", 1 },
+	[ADVICE_UPDATE_SPARSE_PATH]			= { "updateSparsePath", 1 },
 	[ADVICE_WAITING_FOR_EDITOR]			= { "waitingForEditor", 1 },
 };
 
@@ -284,6 +286,23 @@ void NORETURN die_conclude_merge(void)
 	die(_("Exiting because of unfinished merge."));
 }
 
+void advise_on_updating_sparse_paths(struct string_list *pathspec_list)
+{
+	struct string_list_item *item;
+
+	if (!pathspec_list->nr)
+		return;
+
+	fprintf(stderr, _("The following pathspecs only matched index entries outside the current\n"
+			  "sparse checkout:\n"));
+	for_each_string_list_item(item, pathspec_list)
+		fprintf(stderr, "%s\n", item->string);
+
+	advise_if_enabled(ADVICE_UPDATE_SPARSE_PATH,
+			  _("Disable or modify the sparsity rules if you intend to update such entries."));
+
+}
+
 void detach_advice(const char *new_name)
 {
 	const char *fmt =
diff --git a/advice.h b/advice.h
index bc2432980a..bd26c385d0 100644
--- a/advice.h
+++ b/advice.h
@@ -3,6 +3,8 @@
 
 #include "git-compat-util.h"
 
+struct string_list;
+
 extern int advice_fetch_show_forced_updates;
 extern int advice_push_update_rejected;
 extern int advice_push_non_ff_current;
@@ -71,6 +73,7 @@ extern int advice_add_empty_pathspec;
 	ADVICE_STATUS_HINTS,
 	ADVICE_STATUS_U_OPTION,
 	ADVICE_SUBMODULE_ALTERNATE_ERROR_STRATEGY_DIE,
+	ADVICE_UPDATE_SPARSE_PATH,
 	ADVICE_WAITING_FOR_EDITOR,
 };
 
@@ -92,6 +95,7 @@ void advise_if_enabled(enum advice_type type, const char *advice, ...);
 int error_resolve_conflict(const char *me);
 void NORETURN die_resolve_conflict(const char *me);
 void NORETURN die_conclude_merge(void);
+void advise_on_updating_sparse_paths(struct string_list *pathspec_list);
 void detach_advice(const char *new_name);
 
 #endif /* ADVICE_H */
diff --git a/builtin/add.c b/builtin/add.c
index e15b25a623..fde6462850 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -177,24 +177,43 @@ static char *prune_directory(struct dir_struct *dir, struct pathspec *pathspec,
 			*dst++ = entry;
 	}
 	dir->nr = dst - dir->entries;
-	add_pathspec_matches_against_index(pathspec, &the_index, seen, 0);
+	add_pathspec_matches_against_index(pathspec, &the_index, seen, 1);
 	return seen;
 }
 
-static void refresh(int verbose, const struct pathspec *pathspec)
+static int refresh(int verbose, const struct pathspec *pathspec)
 {
 	char *seen;
-	int i;
+	int i, ret = 0;
+	char *skip_worktree_seen = NULL;
+	struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
+	int flags = REFRESH_DONT_MARK_SPARSE_MATCHES |
+		    (verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET);
 
 	seen = xcalloc(pathspec->nr, 1);
-	refresh_index(&the_index, verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET,
-		      pathspec, seen, _("Unstaged changes after refreshing the index:"));
+	refresh_index(&the_index, flags, pathspec, seen,
+		      _("Unstaged changes after refreshing the index:"));
 	for (i = 0; i < pathspec->nr; i++) {
-		if (!seen[i])
-			die(_("pathspec '%s' did not match any files"),
-			    pathspec->items[i].original);
+		if (!seen[i]) {
+			if (matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {
+				string_list_append(&only_match_skip_worktree,
+						   pathspec->items[i].original);
+			} else {
+				die(_("pathspec '%s' did not match any files"),
+				    pathspec->items[i].original);
+			}
+		}
+	}
+
+	if (only_match_skip_worktree.nr) {
+		advise_on_updating_sparse_paths(&only_match_skip_worktree);
+		ret = 1;
 	}
+
 	free(seen);
+	free(skip_worktree_seen);
+	string_list_clear(&only_match_skip_worktree, 0);
+	return ret;
 }
 
 int run_add_interactive(const char *revision, const char *patch_mode,
@@ -570,15 +589,17 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	}
 
 	if (refresh_only) {
-		refresh(verbose, &pathspec);
+		exit_status |= refresh(verbose, &pathspec);
 		goto finish;
 	}
 
 	if (pathspec.nr) {
 		int i;
+		char *skip_worktree_seen = NULL;
+		struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
 
 		if (!seen)
-			seen = find_pathspecs_matching_against_index(&pathspec, &the_index, 0);
+			seen = find_pathspecs_matching_against_index(&pathspec, &the_index, 1);
 
 		/*
 		 * file_exists() assumes exact match
@@ -592,12 +613,24 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 		for (i = 0; i < pathspec.nr; i++) {
 			const char *path = pathspec.items[i].match;
+
 			if (pathspec.items[i].magic & PATHSPEC_EXCLUDE)
 				continue;
-			if (!seen[i] && path[0] &&
-			    ((pathspec.items[i].magic &
-			      (PATHSPEC_GLOB | PATHSPEC_ICASE)) ||
-			     !file_exists(path))) {
+			if (seen[i])
+				continue;
+
+			if (matches_skip_worktree(&pathspec, i, &skip_worktree_seen)) {
+				string_list_append(&only_match_skip_worktree,
+						   pathspec.items[i].original);
+				continue;
+			}
+
+			/* Don't complain at 'git add .' inside empty repo. */
+			if (!path[0])
+				continue;
+
+			if ((pathspec.items[i].magic & (PATHSPEC_GLOB | PATHSPEC_ICASE)) ||
+			    !file_exists(path)) {
 				if (ignore_missing) {
 					int dtype = DT_UNKNOWN;
 					if (is_excluded(&dir, &the_index, path, &dtype))
@@ -608,7 +641,16 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 					    pathspec.items[i].original);
 			}
 		}
+
+
+		if (only_match_skip_worktree.nr) {
+			advise_on_updating_sparse_paths(&only_match_skip_worktree);
+			exit_status = 1;
+		}
+
 		free(seen);
+		free(skip_worktree_seen);
+		string_list_clear(&only_match_skip_worktree, 0);
 	}
 
 	plug_bulk_checkin();
diff --git a/pathspec.c b/pathspec.c
index e5e6b7458d..61f294fed5 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -62,6 +62,21 @@ char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
 	return seen;
 }
 
+char *find_pathspecs_matching_skip_worktree(const struct pathspec *pathspec)
+{
+	struct index_state *istate = the_repository->index;
+	char *seen = xcalloc(pathspec->nr, 1);
+	int i;
+
+	for (i = 0; i < istate->cache_nr; i++) {
+		struct cache_entry *ce = istate->cache[i];
+		if (ce_skip_worktree(ce))
+		    ce_path_match(istate, ce, pathspec, seen);
+	}
+
+	return seen;
+}
+
 /*
  * Magic pathspec
  *
diff --git a/pathspec.h b/pathspec.h
index 8202882ecd..f591ba625c 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -155,6 +155,14 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
 char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
 					    const struct index_state *istate,
 					    int ignore_skip_worktree);
+char *find_pathspecs_matching_skip_worktree(const struct pathspec *pathspec);
+static inline int matches_skip_worktree(const struct pathspec *pathspec,
+					int item, char **seen_ptr)
+{
+	if (!*seen_ptr)
+		*seen_ptr = find_pathspecs_matching_skip_worktree(pathspec);
+	return (*seen_ptr)[item];
+}
 int match_pathspec_attrs(const struct index_state *istate,
 			 const char *name, int namelen,
 			 const struct pathspec_item *item);
diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
index 6781620297..fdfd8b085e 100755
--- a/t/t3705-add-sparse-checkout.sh
+++ b/t/t3705-add-sparse-checkout.sh
@@ -36,10 +36,26 @@ setup_gitignore () {
 	EOF
 }
 
+test_expect_success 'setup' '
+	cat >sparse_error_header <<-EOF &&
+	The following pathspecs only matched index entries outside the current
+	sparse checkout:
+	EOF
+
+	cat >sparse_hint <<-EOF &&
+	hint: Disable or modify the sparsity rules if you intend to update such entries.
+	hint: Disable this message with "git config advice.updateSparsePath false"
+	EOF
+
+	echo sparse_entry | cat sparse_error_header - >sparse_entry_error &&
+	cat sparse_entry_error sparse_hint >error_and_hint
+'
+
 test_expect_success 'git add does not remove sparse entries' '
 	setup_sparse_entry &&
 	rm sparse_entry &&
-	git add sparse_entry &&
+	test_must_fail git add sparse_entry 2>stderr &&
+	test_i18ncmp error_and_hint stderr &&
 	test_sparse_entry_unchanged
 '
 
@@ -47,7 +63,8 @@ test_expect_success 'git add -A does not remove sparse entries' '
 	setup_sparse_entry &&
 	rm sparse_entry &&
 	setup_gitignore &&
-	git add -A &&
+	git add -A 2>stderr &&
+	test_must_be_empty stderr &&
 	test_sparse_entry_unchanged
 '
 
@@ -55,7 +72,13 @@ test_expect_success 'git add . does not remove sparse entries' '
 	setup_sparse_entry &&
 	rm sparse_entry &&
 	setup_gitignore &&
-	git add . &&
+	test_must_fail git add . 2>stderr &&
+
+	cat sparse_error_header >expect &&
+	echo . >>expect &&
+	cat sparse_hint >>expect &&
+
+	test_i18ncmp expect stderr &&
 	test_sparse_entry_unchanged
 '
 
@@ -64,7 +87,8 @@ do
 	test_expect_success "git add${opt:+ $opt} does not update sparse entries" '
 		setup_sparse_entry &&
 		echo modified >sparse_entry &&
-		git add $opt sparse_entry &&
+		test_must_fail git add $opt sparse_entry 2>stderr &&
+		test_i18ncmp error_and_hint stderr &&
 		test_sparse_entry_unchanged
 	'
 done
@@ -73,14 +97,16 @@ test_expect_success 'git add --refresh does not update sparse entries' '
 	setup_sparse_entry &&
 	git ls-files --debug sparse_entry | grep mtime >before &&
 	test-tool chmtime -60 sparse_entry &&
-	git add --refresh sparse_entry &&
+	test_must_fail git add --refresh sparse_entry 2>stderr &&
+	test_i18ncmp error_and_hint stderr &&
 	git ls-files --debug sparse_entry | grep mtime >after &&
 	test_cmp before after
 '
 
 test_expect_success 'git add --chmod does not update sparse entries' '
 	setup_sparse_entry &&
-	git add --chmod=+x sparse_entry &&
+	test_must_fail git add --chmod=+x sparse_entry 2>stderr &&
+	test_i18ncmp error_and_hint stderr &&
 	test_sparse_entry_unchanged &&
 	! test -x sparse_entry
 '
@@ -89,8 +115,41 @@ test_expect_success 'git add --renormalize does not update sparse entries' '
 	test_config core.autocrlf false &&
 	setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
 	echo "sparse_entry text=auto" >.gitattributes &&
-	git add --renormalize sparse_entry &&
+	test_must_fail git add --renormalize sparse_entry 2>stderr &&
+	test_i18ncmp error_and_hint stderr &&
 	test_sparse_entry_unchanged
 '
 
+test_expect_success 'git add --dry-run --ignore-missing warn on sparse path' '
+	setup_sparse_entry &&
+	rm sparse_entry &&
+	test_must_fail git add --dry-run --ignore-missing sparse_entry 2>stderr &&
+	test_i18ncmp error_and_hint stderr &&
+	test_sparse_entry_unchanged
+'
+
+test_expect_success 'do not advice about sparse entries when they do not match the pathspec' '
+	setup_sparse_entry &&
+	test_must_fail git add nonexistent 2>stderr &&
+	test_i18ngrep "fatal: pathspec .nonexistent. did not match any files" stderr &&
+	test_i18ngrep ! "The following pathspecs only matched index entries" stderr
+'
+
+test_expect_success 'do not warn when pathspec matches dense entries' '
+	setup_sparse_entry &&
+	echo modified >sparse_entry &&
+	>dense_entry &&
+	git add "*_entry" 2>stderr &&
+	test_must_be_empty stderr &&
+	test_sparse_entry_unchanged &&
+	git ls-files --error-unmatch dense_entry
+'
+
+test_expect_success 'add obeys advice.updateSparsePath' '
+	setup_sparse_entry &&
+	test_must_fail git -c advice.updateSparsePath=false add sparse_entry 2>stderr &&
+	test_i18ncmp sparse_entry_error stderr
+
+'
+
 test_done
-- 
2.30.1


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 7/7] rm: honor sparse checkout patterns
  2021-02-24  4:05     ` [PATCH v2 " Matheus Tavares
                         ` (5 preceding siblings ...)
  2021-02-24  4:05       ` [PATCH v2 6/7] add: warn when pathspec only matches SKIP_WORKTREE entries Matheus Tavares
@ 2021-02-24  4:05       ` Matheus Tavares
  2021-02-24  6:59         ` Elijah Newren
  2021-02-24  7:05       ` [PATCH v2 0/7] add/rm: honor sparse checkout and warn on sparse paths Elijah Newren
  7 siblings, 1 reply; 56+ messages in thread
From: Matheus Tavares @ 2021-02-24  4:05 UTC (permalink / raw)
  To: git; +Cc: gitster, newren, stolee

`git add` refrains from adding or updating paths outside the sparsity
rules, but `git rm` doesn't follow the same restrictions. This is
somewhat counter-intuitive and inconsistent. So make `rm` honor the
sparse checkout and advise on how to remove SKIP_WORKTREE entries, just
like `add` does. Also add some tests for the new behavior.

Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Documentation/config/advice.txt  |  5 ++-
 Documentation/git-rm.txt         |  4 +-
 builtin/rm.c                     | 35 ++++++++++-----
 t/t3602-rm-sparse-checkout.sh    | 76 ++++++++++++++++++++++++++++++++
 t/t7011-skip-worktree-reading.sh |  5 ---
 5 files changed, 106 insertions(+), 19 deletions(-)
 create mode 100755 t/t3602-rm-sparse-checkout.sh

diff --git a/Documentation/config/advice.txt b/Documentation/config/advice.txt
index d53eafa00b..bdd423ade4 100644
--- a/Documentation/config/advice.txt
+++ b/Documentation/config/advice.txt
@@ -120,6 +120,7 @@ advice.*::
 		Advice shown if a user runs the add command without providing
 		the pathspec parameter.
 	updateSparsePath::
-		Advice shown if the pathspec given to linkgit:git-add[1] only
-		matches index entries outside the current sparse-checkout.
+		Advice shown if the pathspec given to linkgit:git-add[1] or
+		linkgit:git-rm[1] only matches index entries outside the
+		current sparse-checkout.
 --
diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
index ab750367fd..26e9b28470 100644
--- a/Documentation/git-rm.txt
+++ b/Documentation/git-rm.txt
@@ -23,7 +23,9 @@ branch, and no updates to their contents can be staged in the index,
 though that default behavior can be overridden with the `-f` option.
 When `--cached` is given, the staged content has to
 match either the tip of the branch or the file on disk,
-allowing the file to be removed from just the index.
+allowing the file to be removed from just the index. When
+sparse-checkouts are in use (see linkgit:git-sparse-checkout[1]),
+`git rm` will only remove paths within the sparse-checkout patterns.
 
 
 OPTIONS
diff --git a/builtin/rm.c b/builtin/rm.c
index 4858631e0f..d23a3b2164 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -5,6 +5,7 @@
  */
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "builtin.h"
+#include "advice.h"
 #include "config.h"
 #include "lockfile.h"
 #include "dir.h"
@@ -254,7 +255,7 @@ static struct option builtin_rm_options[] = {
 int cmd_rm(int argc, const char **argv, const char *prefix)
 {
 	struct lock_file lock_file = LOCK_INIT;
-	int i;
+	int i, ret = 0;
 	struct pathspec pathspec;
 	char *seen;
 
@@ -295,6 +296,8 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
 
 	for (i = 0; i < active_nr; i++) {
 		const struct cache_entry *ce = active_cache[i];
+		if (ce_skip_worktree(ce))
+			continue;
 		if (!ce_path_match(&the_index, ce, &pathspec, seen))
 			continue;
 		ALLOC_GROW(list.entry, list.nr + 1, list.alloc);
@@ -308,24 +311,34 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
 	if (pathspec.nr) {
 		const char *original;
 		int seen_any = 0;
+		char *skip_worktree_seen = NULL;
+		struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
+
 		for (i = 0; i < pathspec.nr; i++) {
 			original = pathspec.items[i].original;
-			if (!seen[i]) {
-				if (!ignore_unmatch) {
-					die(_("pathspec '%s' did not match any files"),
-					    original);
-				}
-			}
-			else {
+			if (seen[i])
 				seen_any = 1;
-			}
+			else if (ignore_unmatch)
+				continue;
+			else if (matches_skip_worktree(&pathspec, i, &skip_worktree_seen))
+				string_list_append(&only_match_skip_worktree, original);
+			else
+				die(_("pathspec '%s' did not match any files"), original);
+
 			if (!recursive && seen[i] == MATCHED_RECURSIVELY)
 				die(_("not removing '%s' recursively without -r"),
 				    *original ? original : ".");
 		}
 
+		if (only_match_skip_worktree.nr) {
+			advise_on_updating_sparse_paths(&only_match_skip_worktree);
+			ret = 1;
+		}
+		free(skip_worktree_seen);
+		string_list_clear(&only_match_skip_worktree, 0);
+
 		if (!seen_any)
-			exit(0);
+			exit(ret);
 	}
 
 	if (!index_only)
@@ -405,5 +418,5 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
 			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
 		die(_("Unable to write new index file"));
 
-	return 0;
+	return ret;
 }
diff --git a/t/t3602-rm-sparse-checkout.sh b/t/t3602-rm-sparse-checkout.sh
new file mode 100755
index 0000000000..34f4debacf
--- /dev/null
+++ b/t/t3602-rm-sparse-checkout.sh
@@ -0,0 +1,76 @@
+#!/bin/sh
+
+test_description='git rm in sparse checked out working trees'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	mkdir -p sub/dir &&
+	touch a b c sub/d sub/dir/e &&
+	git add -A &&
+	git commit -m files &&
+
+	cat >sparse_entry_b_error <<-EOF &&
+	The following pathspecs only matched index entries outside the current
+	sparse checkout:
+	b
+	EOF
+
+	cat >b_error_and_hint sparse_entry_b_error - <<-EOF
+	hint: Disable or modify the sparsity rules if you intend to update such entries.
+	hint: Disable this message with "git config advice.updateSparsePath false"
+	EOF
+'
+
+for opt in "" -f --dry-run
+do
+	test_expect_success "rm${opt:+ $opt} does not remove sparse entries" '
+		git sparse-checkout set a &&
+		test_must_fail git rm $opt b 2>stderr &&
+		test_i18ncmp b_error_and_hint stderr &&
+		git ls-files --error-unmatch b
+	'
+done
+
+test_expect_success 'recursive rm does not remove sparse entries' '
+	git reset --hard &&
+	git sparse-checkout set sub/dir &&
+	git rm -r sub &&
+	git status --porcelain -uno >actual &&
+	echo "D  sub/dir/e" >expected &&
+	test_cmp expected actual
+'
+
+test_expect_success 'rm obeys advice.updateSparsePath' '
+	git reset --hard &&
+	git sparse-checkout set a &&
+	test_must_fail git -c advice.updateSparsePath=false rm b 2>stderr &&
+	test_i18ncmp sparse_entry_b_error stderr
+'
+
+test_expect_success 'do not advice about sparse entries when they do not match the pathspec' '
+	git reset --hard &&
+	git sparse-checkout set a &&
+	test_must_fail git rm nonexistent 2>stderr &&
+	test_i18ngrep "fatal: pathspec .nonexistent. did not match any files" stderr &&
+	test_i18ngrep ! "The following pathspecs only matched index entries" stderr
+'
+
+test_expect_success 'do not warn about sparse entries when pathspec matches dense entries' '
+	git reset --hard &&
+	git sparse-checkout set a &&
+	git rm "[ba]" 2>stderr &&
+	test_must_be_empty stderr &&
+	git ls-files --error-unmatch b &&
+	test_must_fail git ls-files --error-unmatch a
+'
+
+test_expect_success 'do not warn about sparse entries with --ignore-unmatch' '
+	git reset --hard &&
+	git sparse-checkout set a &&
+	git rm --ignore-unmatch b 2>stderr &&
+	test_must_be_empty stderr &&
+	git ls-files --error-unmatch b
+'
+
+test_done
diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh
index 37525cae3a..f87749951f 100755
--- a/t/t7011-skip-worktree-reading.sh
+++ b/t/t7011-skip-worktree-reading.sh
@@ -141,11 +141,6 @@ test_expect_success 'diff-files does not examine skip-worktree dirty entries' '
 	test -z "$(git diff-files -- one)"
 '
 
-test_expect_success 'git-rm succeeds on skip-worktree absent entries' '
-	setup_absent &&
-	git rm 1
-'
-
 test_expect_success 'commit on skip-worktree absent entries' '
 	git reset &&
 	setup_absent &&
-- 
2.30.1


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 2/7] t3705: add tests for `git add` in sparse checkouts
  2021-02-24  4:05       ` [PATCH v2 2/7] t3705: add tests for `git add` in sparse checkouts Matheus Tavares
@ 2021-02-24  5:15         ` Elijah Newren
  0 siblings, 0 replies; 56+ messages in thread
From: Elijah Newren @ 2021-02-24  5:15 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: Git Mailing List, Junio C Hamano, Derrick Stolee

On Tue, Feb 23, 2021 at 8:05 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> We already have a couple tests for `add` with SKIP_WORKTREE entries in
> t7012, but these only cover the most basic scenarios. As we will be
> changing how `add` deals with sparse paths in the subsequent commits,
> let's move these two tests to their own file and add more test cases
> for different `add` options and situations. This also demonstrates two
> options that don't currently respect SKIP_WORKTREE entries: `--chmod`
> and `--renormalize`.
>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>  t/t3705-add-sparse-checkout.sh   | 96 ++++++++++++++++++++++++++++++++
>  t/t7012-skip-worktree-writing.sh | 19 -------
>  2 files changed, 96 insertions(+), 19 deletions(-)
>  create mode 100755 t/t3705-add-sparse-checkout.sh
>
> diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
> new file mode 100755
> index 0000000000..9bb5dc2389
> --- /dev/null
> +++ b/t/t3705-add-sparse-checkout.sh
> @@ -0,0 +1,96 @@
> +#!/bin/sh
> +
> +test_description='git add in sparse checked out working trees'
> +
> +. ./test-lib.sh
> +
> +SPARSE_ENTRY_BLOB=""
> +
> +# Optionally take a printf format string to write to the sparse_entry file
> +setup_sparse_entry () {
> +       rm -f sparse_entry &&

I think this is unnecessary.

> +       git update-index --force-remove sparse_entry &&

It might be worth adding a comment above this line that it is
necessary when sparse_entry starts out in the SKIP_WORKTREE state,
otherwise the subsequent git add below will ignore it.

> +
> +       if test $# -eq 1
> +       then
> +               printf "$1" >sparse_entry
> +       else
> +               >sparse_entry
> +       fi &&
> +       git add sparse_entry &&
> +       git update-index --skip-worktree sparse_entry &&
> +       SPARSE_ENTRY_BLOB=$(git rev-parse :sparse_entry)
> +}
> +
> +test_sparse_entry_unchanged () {
> +       echo "100644 $SPARSE_ENTRY_BLOB 0       sparse_entry" >expected &&
> +       git ls-files --stage sparse_entry >actual &&
> +       test_cmp expected actual
> +}
> +
> +setup_gitignore () {
> +       test_when_finished rm -f .gitignore &&
> +       cat >.gitignore <<-EOF
> +       *
> +       !/sparse_entry
> +       EOF
> +}
> +
> +test_expect_success 'git add does not remove sparse entries' '
> +       setup_sparse_entry &&
> +       rm sparse_entry &&
> +       git add sparse_entry &&
> +       test_sparse_entry_unchanged
> +'
> +
> +test_expect_success 'git add -A does not remove sparse entries' '
> +       setup_sparse_entry &&
> +       rm sparse_entry &&
> +       setup_gitignore &&
> +       git add -A &&
> +       test_sparse_entry_unchanged
> +'
> +
> +test_expect_success 'git add . does not remove sparse entries' '
> +       setup_sparse_entry &&
> +       rm sparse_entry &&
> +       setup_gitignore &&
> +       git add . &&
> +       test_sparse_entry_unchanged
> +'
> +
> +for opt in "" -f -u --ignore-removal --dry-run
> +do
> +       test_expect_success "git add${opt:+ $opt} does not update sparse entries" '
> +               setup_sparse_entry &&
> +               echo modified >sparse_entry &&
> +               git add $opt sparse_entry &&
> +               test_sparse_entry_unchanged
> +       '
> +done
> +
> +test_expect_success 'git add --refresh does not update sparse entries' '
> +       setup_sparse_entry &&
> +       git ls-files --debug sparse_entry | grep mtime >before &&
> +       test-tool chmtime -60 sparse_entry &&
> +       git add --refresh sparse_entry &&
> +       git ls-files --debug sparse_entry | grep mtime >after &&
> +       test_cmp before after
> +'
> +
> +test_expect_failure 'git add --chmod does not update sparse entries' '
> +       setup_sparse_entry &&
> +       git add --chmod=+x sparse_entry &&
> +       test_sparse_entry_unchanged &&
> +       ! test -x sparse_entry
> +'
> +
> +test_expect_failure 'git add --renormalize does not update sparse entries' '
> +       test_config core.autocrlf false &&
> +       setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
> +       echo "sparse_entry text=auto" >.gitattributes &&
> +       git add --renormalize sparse_entry &&
> +       test_sparse_entry_unchanged
> +'
> +
> +test_done
> diff --git a/t/t7012-skip-worktree-writing.sh b/t/t7012-skip-worktree-writing.sh
> index e5c6a038fb..217207c1ce 100755
> --- a/t/t7012-skip-worktree-writing.sh
> +++ b/t/t7012-skip-worktree-writing.sh
> @@ -60,13 +60,6 @@ setup_absent() {
>         git update-index --skip-worktree 1
>  }
>
> -test_absent() {
> -       echo "100644 $EMPTY_BLOB 0      1" > expected &&
> -       git ls-files --stage 1 > result &&
> -       test_cmp expected result &&
> -       test ! -f 1
> -}
> -
>  setup_dirty() {
>         git update-index --force-remove 1 &&
>         echo dirty > 1 &&
> @@ -100,18 +93,6 @@ test_expect_success 'index setup' '
>         test_cmp expected result
>  '
>
> -test_expect_success 'git-add ignores worktree content' '
> -       setup_absent &&
> -       git add 1 &&
> -       test_absent
> -'
> -
> -test_expect_success 'git-add ignores worktree content' '
> -       setup_dirty &&
> -       git add 1 &&
> -       test_dirty
> -'
> -
>  test_expect_success 'git-rm fails if worktree is dirty' '
>         setup_dirty &&
>         test_must_fail git rm 1 &&
> --
> 2.30.1

The rest looks good.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 4/7] pathspec: allow to ignore SKIP_WORKTREE entries on index matching
  2021-02-24  4:05       ` [PATCH v2 4/7] pathspec: allow to ignore SKIP_WORKTREE entries on index matching Matheus Tavares
@ 2021-02-24  5:23         ` Elijah Newren
  0 siblings, 0 replies; 56+ messages in thread
From: Elijah Newren @ 2021-02-24  5:23 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: Git Mailing List, Junio C Hamano, Derrick Stolee

On Tue, Feb 23, 2021 at 8:05 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> Add the 'ignore_skip_worktree' boolean parameter to both
> add_pathspec_matches_against_index() and
> find_pathspecs_matching_against_index(). When true, these functions will
> not try to match the given pathspec with SKIP_WORKTREE entries. This
> will be used in a future patch to make `git add` display a hint
> when the pathspec matches only sparse paths.
>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>  builtin/add.c          |  4 ++--
>  builtin/check-ignore.c |  2 +-
>  pathspec.c             | 10 +++++++---
>  pathspec.h             |  5 +++--
>  4 files changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index 5fec21a792..e15b25a623 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -177,7 +177,7 @@ static char *prune_directory(struct dir_struct *dir, struct pathspec *pathspec,
>                         *dst++ = entry;
>         }
>         dir->nr = dst - dir->entries;
> -       add_pathspec_matches_against_index(pathspec, &the_index, seen);
> +       add_pathspec_matches_against_index(pathspec, &the_index, seen, 0);

One thing to consider here is something Stolee has suggested to me
multiple times -- introducing an enum with self-documenting values.
For example:

enum ignore_skip_worktree_values {
  HEED_SKIP_WORKTREE = 0,
  IGNORE_SKIP_WORKTREE = 1
};

This would allow all the function callers to pass HEED_SKIP_WORKTREE
instead of a bare 0, and allows code readers to avoid looking up
function signatures to find the meaning behind the 0.  It's a minor
point, though.

>         return seen;
>  }
>
> @@ -578,7 +578,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>                 int i;
>
>                 if (!seen)
> -                       seen = find_pathspecs_matching_against_index(&pathspec, &the_index);
> +                       seen = find_pathspecs_matching_against_index(&pathspec, &the_index, 0);
>
>                 /*
>                  * file_exists() assumes exact match
> diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c
> index 3c652748d5..235b7fc905 100644
> --- a/builtin/check-ignore.c
> +++ b/builtin/check-ignore.c
> @@ -100,7 +100,7 @@ static int check_ignore(struct dir_struct *dir,
>          * should not be ignored, in order to be consistent with
>          * 'git status', 'git add' etc.
>          */
> -       seen = find_pathspecs_matching_against_index(&pathspec, &the_index);
> +       seen = find_pathspecs_matching_against_index(&pathspec, &the_index, 0);
>         for (i = 0; i < pathspec.nr; i++) {
>                 full_path = pathspec.items[i].match;
>                 pattern = NULL;
> diff --git a/pathspec.c b/pathspec.c
> index 7a229d8d22..e5e6b7458d 100644
> --- a/pathspec.c
> +++ b/pathspec.c
> @@ -21,7 +21,7 @@
>   */
>  void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>                                         const struct index_state *istate,
> -                                       char *seen)
> +                                       char *seen, int ignore_skip_worktree)
>  {
>         int num_unmatched = 0, i;
>
> @@ -38,6 +38,8 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>                 return;
>         for (i = 0; i < istate->cache_nr; i++) {
>                 const struct cache_entry *ce = istate->cache[i];
> +               if (ignore_skip_worktree && ce_skip_worktree(ce))
> +                       continue;
>                 ce_path_match(istate, ce, pathspec, seen);
>         }
>  }
> @@ -51,10 +53,12 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>   * given pathspecs achieves against all items in the index.
>   */
>  char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
> -                                           const struct index_state *istate)
> +                                           const struct index_state *istate,
> +                                           int ignore_skip_worktree)
>  {
>         char *seen = xcalloc(pathspec->nr, 1);
> -       add_pathspec_matches_against_index(pathspec, istate, seen);
> +       add_pathspec_matches_against_index(pathspec, istate, seen,
> +                                          ignore_skip_worktree);
>         return seen;
>  }
>
> diff --git a/pathspec.h b/pathspec.h
> index 454ce364fa..8202882ecd 100644
> --- a/pathspec.h
> +++ b/pathspec.h
> @@ -151,9 +151,10 @@ static inline int ps_strcmp(const struct pathspec_item *item,
>
>  void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>                                         const struct index_state *istate,
> -                                       char *seen);
> +                                       char *seen, int ignore_skip_worktree);
>  char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
> -                                           const struct index_state *istate);
> +                                           const struct index_state *istate,
> +                                           int ignore_skip_worktree);
>  int match_pathspec_attrs(const struct index_state *istate,
>                          const char *name, int namelen,
>                          const struct pathspec_item *item);
> --
> 2.30.1

Looks good.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 6/7] add: warn when pathspec only matches SKIP_WORKTREE entries
  2021-02-24  4:05       ` [PATCH v2 6/7] add: warn when pathspec only matches SKIP_WORKTREE entries Matheus Tavares
@ 2021-02-24  6:50         ` Elijah Newren
  2021-02-24 15:33           ` Matheus Tavares
  0 siblings, 1 reply; 56+ messages in thread
From: Elijah Newren @ 2021-02-24  6:50 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: Git Mailing List, Junio C Hamano, Derrick Stolee

On Tue, Feb 23, 2021 at 8:05 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> `git add` already refrains from updating SKIP_WORKTREE entries, but it
> silently exits with zero code when a pathspec only matches these
> entries. Instead, let's warn the user and display a hint on how to
> update these entries.
>
> Note that the warning is only shown if the pathspec matches no untracked
> paths in the working tree and only matches index entries with the
> SKIP_WORKTREE bit set. A warning message was chosen over erroring out
> right away to reproduce the same behavior `add` already exhibits with
> ignored files. This also allow users to continue their workflow without
> having to invoke `add` again with only the matching pathspecs, as the
> matched files will have already been added.
>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>  Documentation/config/advice.txt |  3 ++
>  advice.c                        | 19 +++++++++
>  advice.h                        |  4 ++
>  builtin/add.c                   | 70 ++++++++++++++++++++++++-------
>  pathspec.c                      | 15 +++++++
>  pathspec.h                      |  8 ++++
>  t/t3705-add-sparse-checkout.sh  | 73 +++++++++++++++++++++++++++++----
>  7 files changed, 171 insertions(+), 21 deletions(-)
>
> diff --git a/Documentation/config/advice.txt b/Documentation/config/advice.txt
> index acbd0c09aa..d53eafa00b 100644
> --- a/Documentation/config/advice.txt
> +++ b/Documentation/config/advice.txt
> @@ -119,4 +119,7 @@ advice.*::
>         addEmptyPathspec::
>                 Advice shown if a user runs the add command without providing
>                 the pathspec parameter.
> +       updateSparsePath::
> +               Advice shown if the pathspec given to linkgit:git-add[1] only
> +               matches index entries outside the current sparse-checkout.
>  --
> diff --git a/advice.c b/advice.c
> index 164742305f..cf22c1a6e5 100644
> --- a/advice.c
> +++ b/advice.c
> @@ -2,6 +2,7 @@
>  #include "config.h"
>  #include "color.h"
>  #include "help.h"
> +#include "string-list.h"
>
>  int advice_fetch_show_forced_updates = 1;
>  int advice_push_update_rejected = 1;
> @@ -136,6 +137,7 @@ static struct {
>         [ADVICE_STATUS_HINTS]                           = { "statusHints", 1 },
>         [ADVICE_STATUS_U_OPTION]                        = { "statusUoption", 1 },
>         [ADVICE_SUBMODULE_ALTERNATE_ERROR_STRATEGY_DIE] = { "submoduleAlternateErrorStrategyDie", 1 },
> +       [ADVICE_UPDATE_SPARSE_PATH]                     = { "updateSparsePath", 1 },
>         [ADVICE_WAITING_FOR_EDITOR]                     = { "waitingForEditor", 1 },
>  };
>
> @@ -284,6 +286,23 @@ void NORETURN die_conclude_merge(void)
>         die(_("Exiting because of unfinished merge."));
>  }
>
> +void advise_on_updating_sparse_paths(struct string_list *pathspec_list)
> +{
> +       struct string_list_item *item;
> +
> +       if (!pathspec_list->nr)
> +               return;
> +
> +       fprintf(stderr, _("The following pathspecs only matched index entries outside the current\n"
> +                         "sparse checkout:\n"));
> +       for_each_string_list_item(item, pathspec_list)
> +               fprintf(stderr, "%s\n", item->string);

Was the use of fprintf(stderr, ...) because of the fact that you want
to do multiple print statements?  I'm just curious if that was the
reason for avoiding the warning() function, or if there was another
consideration at play as well.

> +
> +       advise_if_enabled(ADVICE_UPDATE_SPARSE_PATH,
> +                         _("Disable or modify the sparsity rules if you intend to update such entries."));
> +
> +}
> +
>  void detach_advice(const char *new_name)
>  {
>         const char *fmt =
> diff --git a/advice.h b/advice.h
> index bc2432980a..bd26c385d0 100644
> --- a/advice.h
> +++ b/advice.h
> @@ -3,6 +3,8 @@
>
>  #include "git-compat-util.h"
>
> +struct string_list;
> +
>  extern int advice_fetch_show_forced_updates;
>  extern int advice_push_update_rejected;
>  extern int advice_push_non_ff_current;
> @@ -71,6 +73,7 @@ extern int advice_add_empty_pathspec;
>         ADVICE_STATUS_HINTS,
>         ADVICE_STATUS_U_OPTION,
>         ADVICE_SUBMODULE_ALTERNATE_ERROR_STRATEGY_DIE,
> +       ADVICE_UPDATE_SPARSE_PATH,
>         ADVICE_WAITING_FOR_EDITOR,
>  };
>
> @@ -92,6 +95,7 @@ void advise_if_enabled(enum advice_type type, const char *advice, ...);
>  int error_resolve_conflict(const char *me);
>  void NORETURN die_resolve_conflict(const char *me);
>  void NORETURN die_conclude_merge(void);
> +void advise_on_updating_sparse_paths(struct string_list *pathspec_list);
>  void detach_advice(const char *new_name);
>
>  #endif /* ADVICE_H */
> diff --git a/builtin/add.c b/builtin/add.c
> index e15b25a623..fde6462850 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -177,24 +177,43 @@ static char *prune_directory(struct dir_struct *dir, struct pathspec *pathspec,
>                         *dst++ = entry;
>         }
>         dir->nr = dst - dir->entries;
> -       add_pathspec_matches_against_index(pathspec, &the_index, seen, 0);
> +       add_pathspec_matches_against_index(pathspec, &the_index, seen, 1);

_If_ you add the enum as mentioned earlier, this 1 would become
whatever the other enum value was.  I'll omit making similar comments
for other call sites like this one, though there are a few more in the
patch.

>         return seen;
>  }
>
> -static void refresh(int verbose, const struct pathspec *pathspec)
> +static int refresh(int verbose, const struct pathspec *pathspec)
>  {
>         char *seen;
> -       int i;
> +       int i, ret = 0;
> +       char *skip_worktree_seen = NULL;
> +       struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
> +       int flags = REFRESH_DONT_MARK_SPARSE_MATCHES |
> +                   (verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET);
>
>         seen = xcalloc(pathspec->nr, 1);
> -       refresh_index(&the_index, verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET,
> -                     pathspec, seen, _("Unstaged changes after refreshing the index:"));
> +       refresh_index(&the_index, flags, pathspec, seen,
> +                     _("Unstaged changes after refreshing the index:"));
>         for (i = 0; i < pathspec->nr; i++) {
> -               if (!seen[i])
> -                       die(_("pathspec '%s' did not match any files"),
> -                           pathspec->items[i].original);
> +               if (!seen[i]) {
> +                       if (matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {
> +                               string_list_append(&only_match_skip_worktree,
> +                                                  pathspec->items[i].original);
> +                       } else {
> +                               die(_("pathspec '%s' did not match any files"),
> +                                   pathspec->items[i].original);
> +                       }
> +               }
> +       }
> +
> +       if (only_match_skip_worktree.nr) {
> +               advise_on_updating_sparse_paths(&only_match_skip_worktree);
> +               ret = 1;
>         }

On first reading, I missed that the code die()s if there are any
non-SKIP_WORKTREE entries matched, and that is the reason you know
that only SKIP_WORKTREE entries could have been matched for this last
if-statement.  Perhaps a short comment for the reader like me who is
prone to miss the obvious?

> +
>         free(seen);
> +       free(skip_worktree_seen);
> +       string_list_clear(&only_match_skip_worktree, 0);
> +       return ret;
>  }
>
>  int run_add_interactive(const char *revision, const char *patch_mode,
> @@ -570,15 +589,17 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>         }
>
>         if (refresh_only) {
> -               refresh(verbose, &pathspec);
> +               exit_status |= refresh(verbose, &pathspec);
>                 goto finish;
>         }
>
>         if (pathspec.nr) {
>                 int i;
> +               char *skip_worktree_seen = NULL;
> +               struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
>
>                 if (!seen)
> -                       seen = find_pathspecs_matching_against_index(&pathspec, &the_index, 0);
> +                       seen = find_pathspecs_matching_against_index(&pathspec, &the_index, 1);
>
>                 /*
>                  * file_exists() assumes exact match
> @@ -592,12 +613,24 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>
>                 for (i = 0; i < pathspec.nr; i++) {
>                         const char *path = pathspec.items[i].match;
> +
>                         if (pathspec.items[i].magic & PATHSPEC_EXCLUDE)
>                                 continue;
> -                       if (!seen[i] && path[0] &&
> -                           ((pathspec.items[i].magic &
> -                             (PATHSPEC_GLOB | PATHSPEC_ICASE)) ||
> -                            !file_exists(path))) {
> +                       if (seen[i])
> +                               continue;
> +
> +                       if (matches_skip_worktree(&pathspec, i, &skip_worktree_seen)) {
> +                               string_list_append(&only_match_skip_worktree,
> +                                                  pathspec.items[i].original);
> +                               continue;
> +                       }
> +
> +                       /* Don't complain at 'git add .' inside empty repo. */
> +                       if (!path[0])
> +                               continue;
> +
> +                       if ((pathspec.items[i].magic & (PATHSPEC_GLOB | PATHSPEC_ICASE)) ||
> +                           !file_exists(path)) {

Breaking up that if-statement into several with some continues seems
to make it a lot easier for me to read; thanks.  It also makes it
easier to add your new condition.

>                                 if (ignore_missing) {
>                                         int dtype = DT_UNKNOWN;
>                                         if (is_excluded(&dir, &the_index, path, &dtype))
> @@ -608,7 +641,16 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>                                             pathspec.items[i].original);
>                         }
>                 }
> +
> +
> +               if (only_match_skip_worktree.nr) {
> +                       advise_on_updating_sparse_paths(&only_match_skip_worktree);
> +                       exit_status = 1;
> +               }

Hmm...here's an interesting command sequence:

git init lame
cd lame
mkdir baz
touch baz/tracked
git add baz/tracked
git update-index --skip-worktree baz/tracked
rm baz/tracked.  # But leave the empty directory!
echo baz >.gitignore
git add --ignore-missing --dry-run baz


Reports the following:
"""
The following pathspecs only matched index entries outside the current
sparse checkout:
baz
hint: Disable or modify the sparsity rules if you intend to update such entries.
hint: Disable this message with "git config advice.updateSparsePath false"
The following paths are ignored by one of your .gitignore files:
baz
hint: Use -f if you really want to add them.
hint: Turn this message off by running
hint: "git config advice.addIgnoredFile false"
"""

That's probably okay because it does match both, but the "only
matched" in the first message followed by saying it matched something
else seems a little surprising at first.  It's not wrong, just
slightly surprising.  But then again, this setup is super weird...so
maybe it's all okay?

> +
>                 free(seen);
> +               free(skip_worktree_seen);
> +               string_list_clear(&only_match_skip_worktree, 0);
>         }
>
>         plug_bulk_checkin();
> diff --git a/pathspec.c b/pathspec.c
> index e5e6b7458d..61f294fed5 100644
> --- a/pathspec.c
> +++ b/pathspec.c
> @@ -62,6 +62,21 @@ char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
>         return seen;
>  }
>
> +char *find_pathspecs_matching_skip_worktree(const struct pathspec *pathspec)
> +{
> +       struct index_state *istate = the_repository->index;
> +       char *seen = xcalloc(pathspec->nr, 1);
> +       int i;
> +
> +       for (i = 0; i < istate->cache_nr; i++) {
> +               struct cache_entry *ce = istate->cache[i];
> +               if (ce_skip_worktree(ce))
> +                   ce_path_match(istate, ce, pathspec, seen);
> +       }
> +
> +       return seen;
> +}
> +
>  /*
>   * Magic pathspec
>   *
> diff --git a/pathspec.h b/pathspec.h
> index 8202882ecd..f591ba625c 100644
> --- a/pathspec.h
> +++ b/pathspec.h
> @@ -155,6 +155,14 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec,
>  char *find_pathspecs_matching_against_index(const struct pathspec *pathspec,
>                                             const struct index_state *istate,
>                                             int ignore_skip_worktree);
> +char *find_pathspecs_matching_skip_worktree(const struct pathspec *pathspec);
> +static inline int matches_skip_worktree(const struct pathspec *pathspec,
> +                                       int item, char **seen_ptr)
> +{
> +       if (!*seen_ptr)
> +               *seen_ptr = find_pathspecs_matching_skip_worktree(pathspec);
> +       return (*seen_ptr)[item];
> +}
>  int match_pathspec_attrs(const struct index_state *istate,
>                          const char *name, int namelen,
>                          const struct pathspec_item *item);
> diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
> index 6781620297..fdfd8b085e 100755
> --- a/t/t3705-add-sparse-checkout.sh
> +++ b/t/t3705-add-sparse-checkout.sh
> @@ -36,10 +36,26 @@ setup_gitignore () {
>         EOF
>  }
>
> +test_expect_success 'setup' '
> +       cat >sparse_error_header <<-EOF &&
> +       The following pathspecs only matched index entries outside the current
> +       sparse checkout:
> +       EOF
> +
> +       cat >sparse_hint <<-EOF &&
> +       hint: Disable or modify the sparsity rules if you intend to update such entries.
> +       hint: Disable this message with "git config advice.updateSparsePath false"
> +       EOF
> +
> +       echo sparse_entry | cat sparse_error_header - >sparse_entry_error &&
> +       cat sparse_entry_error sparse_hint >error_and_hint
> +'
> +
>  test_expect_success 'git add does not remove sparse entries' '
>         setup_sparse_entry &&
>         rm sparse_entry &&
> -       git add sparse_entry &&
> +       test_must_fail git add sparse_entry 2>stderr &&
> +       test_i18ncmp error_and_hint stderr &&
>         test_sparse_entry_unchanged
>  '
>
> @@ -47,7 +63,8 @@ test_expect_success 'git add -A does not remove sparse entries' '
>         setup_sparse_entry &&
>         rm sparse_entry &&
>         setup_gitignore &&
> -       git add -A &&
> +       git add -A 2>stderr &&
> +       test_must_be_empty stderr &&
>         test_sparse_entry_unchanged
>  '
>
> @@ -55,7 +72,13 @@ test_expect_success 'git add . does not remove sparse entries' '
>         setup_sparse_entry &&
>         rm sparse_entry &&
>         setup_gitignore &&
> -       git add . &&
> +       test_must_fail git add . 2>stderr &&
> +
> +       cat sparse_error_header >expect &&
> +       echo . >>expect &&
> +       cat sparse_hint >>expect &&
> +
> +       test_i18ncmp expect stderr &&
>         test_sparse_entry_unchanged
>  '
>
> @@ -64,7 +87,8 @@ do
>         test_expect_success "git add${opt:+ $opt} does not update sparse entries" '
>                 setup_sparse_entry &&
>                 echo modified >sparse_entry &&
> -               git add $opt sparse_entry &&
> +               test_must_fail git add $opt sparse_entry 2>stderr &&
> +               test_i18ncmp error_and_hint stderr &&
>                 test_sparse_entry_unchanged
>         '
>  done
> @@ -73,14 +97,16 @@ test_expect_success 'git add --refresh does not update sparse entries' '
>         setup_sparse_entry &&
>         git ls-files --debug sparse_entry | grep mtime >before &&
>         test-tool chmtime -60 sparse_entry &&
> -       git add --refresh sparse_entry &&
> +       test_must_fail git add --refresh sparse_entry 2>stderr &&
> +       test_i18ncmp error_and_hint stderr &&
>         git ls-files --debug sparse_entry | grep mtime >after &&
>         test_cmp before after
>  '
>
>  test_expect_success 'git add --chmod does not update sparse entries' '
>         setup_sparse_entry &&
> -       git add --chmod=+x sparse_entry &&
> +       test_must_fail git add --chmod=+x sparse_entry 2>stderr &&
> +       test_i18ncmp error_and_hint stderr &&
>         test_sparse_entry_unchanged &&
>         ! test -x sparse_entry
>  '
> @@ -89,8 +115,41 @@ test_expect_success 'git add --renormalize does not update sparse entries' '
>         test_config core.autocrlf false &&
>         setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
>         echo "sparse_entry text=auto" >.gitattributes &&
> -       git add --renormalize sparse_entry &&
> +       test_must_fail git add --renormalize sparse_entry 2>stderr &&
> +       test_i18ncmp error_and_hint stderr &&
>         test_sparse_entry_unchanged
>  '
>
> +test_expect_success 'git add --dry-run --ignore-missing warn on sparse path' '
> +       setup_sparse_entry &&
> +       rm sparse_entry &&
> +       test_must_fail git add --dry-run --ignore-missing sparse_entry 2>stderr &&
> +       test_i18ncmp error_and_hint stderr &&
> +       test_sparse_entry_unchanged

See also the slightly convoluted setup I responded above with.  I'm
not sure if we need to make it a test, in part because it's crazy
enough that I'm not quite convinced that the current behavior is right
-- or wrong.

> +'
> +
> +test_expect_success 'do not advice about sparse entries when they do not match the pathspec' '
> +       setup_sparse_entry &&
> +       test_must_fail git add nonexistent 2>stderr &&
> +       test_i18ngrep "fatal: pathspec .nonexistent. did not match any files" stderr &&
> +       test_i18ngrep ! "The following pathspecs only matched index entries" stderr
> +'
> +
> +test_expect_success 'do not warn when pathspec matches dense entries' '
> +       setup_sparse_entry &&
> +       echo modified >sparse_entry &&
> +       >dense_entry &&
> +       git add "*_entry" 2>stderr &&
> +       test_must_be_empty stderr &&
> +       test_sparse_entry_unchanged &&
> +       git ls-files --error-unmatch dense_entry
> +'
> +
> +test_expect_success 'add obeys advice.updateSparsePath' '
> +       setup_sparse_entry &&
> +       test_must_fail git -c advice.updateSparsePath=false add sparse_entry 2>stderr &&
> +       test_i18ncmp sparse_entry_error stderr
> +
> +'
> +
>  test_done
> --
> 2.30.1

Overall, a nice read.  Didn't spot any other issues, just the minor
points I highlighted above.  I'm curious if others have thoughts on my
really weird setup and the dual warnings, though.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 7/7] rm: honor sparse checkout patterns
  2021-02-24  4:05       ` [PATCH v2 7/7] rm: honor sparse checkout patterns Matheus Tavares
@ 2021-02-24  6:59         ` Elijah Newren
  0 siblings, 0 replies; 56+ messages in thread
From: Elijah Newren @ 2021-02-24  6:59 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: Git Mailing List, Junio C Hamano, Derrick Stolee

On Tue, Feb 23, 2021 at 8:05 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> `git add` refrains from adding or updating paths outside the sparsity
> rules, but `git rm` doesn't follow the same restrictions. This is
> somewhat counter-intuitive and inconsistent. So make `rm` honor the
> sparse checkout and advise on how to remove SKIP_WORKTREE entries, just
> like `add` does. Also add some tests for the new behavior.
>
> Suggested-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>  Documentation/config/advice.txt  |  5 ++-
>  Documentation/git-rm.txt         |  4 +-
>  builtin/rm.c                     | 35 ++++++++++-----
>  t/t3602-rm-sparse-checkout.sh    | 76 ++++++++++++++++++++++++++++++++
>  t/t7011-skip-worktree-reading.sh |  5 ---
>  5 files changed, 106 insertions(+), 19 deletions(-)
>  create mode 100755 t/t3602-rm-sparse-checkout.sh
>
> diff --git a/Documentation/config/advice.txt b/Documentation/config/advice.txt
> index d53eafa00b..bdd423ade4 100644
> --- a/Documentation/config/advice.txt
> +++ b/Documentation/config/advice.txt
> @@ -120,6 +120,7 @@ advice.*::
>                 Advice shown if a user runs the add command without providing
>                 the pathspec parameter.
>         updateSparsePath::
> -               Advice shown if the pathspec given to linkgit:git-add[1] only
> -               matches index entries outside the current sparse-checkout.
> +               Advice shown if the pathspec given to linkgit:git-add[1] or
> +               linkgit:git-rm[1] only matches index entries outside the
> +               current sparse-checkout.
>  --
> diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
> index ab750367fd..26e9b28470 100644
> --- a/Documentation/git-rm.txt
> +++ b/Documentation/git-rm.txt
> @@ -23,7 +23,9 @@ branch, and no updates to their contents can be staged in the index,
>  though that default behavior can be overridden with the `-f` option.
>  When `--cached` is given, the staged content has to
>  match either the tip of the branch or the file on disk,
> -allowing the file to be removed from just the index.
> +allowing the file to be removed from just the index. When
> +sparse-checkouts are in use (see linkgit:git-sparse-checkout[1]),
> +`git rm` will only remove paths within the sparse-checkout patterns.
>
>
>  OPTIONS
> diff --git a/builtin/rm.c b/builtin/rm.c
> index 4858631e0f..d23a3b2164 100644
> --- a/builtin/rm.c
> +++ b/builtin/rm.c
> @@ -5,6 +5,7 @@
>   */
>  #define USE_THE_INDEX_COMPATIBILITY_MACROS
>  #include "builtin.h"
> +#include "advice.h"
>  #include "config.h"
>  #include "lockfile.h"
>  #include "dir.h"
> @@ -254,7 +255,7 @@ static struct option builtin_rm_options[] = {
>  int cmd_rm(int argc, const char **argv, const char *prefix)
>  {
>         struct lock_file lock_file = LOCK_INIT;
> -       int i;
> +       int i, ret = 0;
>         struct pathspec pathspec;
>         char *seen;
>
> @@ -295,6 +296,8 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
>
>         for (i = 0; i < active_nr; i++) {
>                 const struct cache_entry *ce = active_cache[i];
> +               if (ce_skip_worktree(ce))
> +                       continue;
>                 if (!ce_path_match(&the_index, ce, &pathspec, seen))
>                         continue;
>                 ALLOC_GROW(list.entry, list.nr + 1, list.alloc);
> @@ -308,24 +311,34 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
>         if (pathspec.nr) {
>                 const char *original;
>                 int seen_any = 0;
> +               char *skip_worktree_seen = NULL;
> +               struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
> +
>                 for (i = 0; i < pathspec.nr; i++) {
>                         original = pathspec.items[i].original;
> -                       if (!seen[i]) {
> -                               if (!ignore_unmatch) {
> -                                       die(_("pathspec '%s' did not match any files"),
> -                                           original);
> -                               }
> -                       }
> -                       else {
> +                       if (seen[i])
>                                 seen_any = 1;
> -                       }
> +                       else if (ignore_unmatch)
> +                               continue;
> +                       else if (matches_skip_worktree(&pathspec, i, &skip_worktree_seen))
> +                               string_list_append(&only_match_skip_worktree, original);
> +                       else
> +                               die(_("pathspec '%s' did not match any files"), original);
> +
>                         if (!recursive && seen[i] == MATCHED_RECURSIVELY)
>                                 die(_("not removing '%s' recursively without -r"),
>                                     *original ? original : ".");
>                 }
>
> +               if (only_match_skip_worktree.nr) {
> +                       advise_on_updating_sparse_paths(&only_match_skip_worktree);
> +                       ret = 1;
> +               }
> +               free(skip_worktree_seen);
> +               string_list_clear(&only_match_skip_worktree, 0);
> +
>                 if (!seen_any)
> -                       exit(0);
> +                       exit(ret);
>         }
>
>         if (!index_only)
> @@ -405,5 +418,5 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
>                                COMMIT_LOCK | SKIP_IF_UNCHANGED))
>                 die(_("Unable to write new index file"));
>
> -       return 0;
> +       return ret;
>  }
> diff --git a/t/t3602-rm-sparse-checkout.sh b/t/t3602-rm-sparse-checkout.sh
> new file mode 100755
> index 0000000000..34f4debacf
> --- /dev/null
> +++ b/t/t3602-rm-sparse-checkout.sh
> @@ -0,0 +1,76 @@
> +#!/bin/sh
> +
> +test_description='git rm in sparse checked out working trees'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'setup' '
> +       mkdir -p sub/dir &&
> +       touch a b c sub/d sub/dir/e &&
> +       git add -A &&
> +       git commit -m files &&
> +
> +       cat >sparse_entry_b_error <<-EOF &&
> +       The following pathspecs only matched index entries outside the current
> +       sparse checkout:
> +       b
> +       EOF
> +
> +       cat >b_error_and_hint sparse_entry_b_error - <<-EOF
> +       hint: Disable or modify the sparsity rules if you intend to update such entries.
> +       hint: Disable this message with "git config advice.updateSparsePath false"
> +       EOF
> +'
> +
> +for opt in "" -f --dry-run
> +do
> +       test_expect_success "rm${opt:+ $opt} does not remove sparse entries" '
> +               git sparse-checkout set a &&
> +               test_must_fail git rm $opt b 2>stderr &&
> +               test_i18ncmp b_error_and_hint stderr &&
> +               git ls-files --error-unmatch b
> +       '
> +done
> +
> +test_expect_success 'recursive rm does not remove sparse entries' '
> +       git reset --hard &&
> +       git sparse-checkout set sub/dir &&
> +       git rm -r sub &&
> +       git status --porcelain -uno >actual &&
> +       echo "D  sub/dir/e" >expected &&
> +       test_cmp expected actual
> +'
> +
> +test_expect_success 'rm obeys advice.updateSparsePath' '
> +       git reset --hard &&
> +       git sparse-checkout set a &&
> +       test_must_fail git -c advice.updateSparsePath=false rm b 2>stderr &&
> +       test_i18ncmp sparse_entry_b_error stderr
> +'
> +
> +test_expect_success 'do not advice about sparse entries when they do not match the pathspec' '
> +       git reset --hard &&
> +       git sparse-checkout set a &&
> +       test_must_fail git rm nonexistent 2>stderr &&
> +       test_i18ngrep "fatal: pathspec .nonexistent. did not match any files" stderr &&
> +       test_i18ngrep ! "The following pathspecs only matched index entries" stderr
> +'
> +
> +test_expect_success 'do not warn about sparse entries when pathspec matches dense entries' '
> +       git reset --hard &&
> +       git sparse-checkout set a &&
> +       git rm "[ba]" 2>stderr &&
> +       test_must_be_empty stderr &&
> +       git ls-files --error-unmatch b &&
> +       test_must_fail git ls-files --error-unmatch a
> +'
> +
> +test_expect_success 'do not warn about sparse entries with --ignore-unmatch' '
> +       git reset --hard &&
> +       git sparse-checkout set a &&
> +       git rm --ignore-unmatch b 2>stderr &&
> +       test_must_be_empty stderr &&
> +       git ls-files --error-unmatch b
> +'
> +
> +test_done
> diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh
> index 37525cae3a..f87749951f 100755
> --- a/t/t7011-skip-worktree-reading.sh
> +++ b/t/t7011-skip-worktree-reading.sh
> @@ -141,11 +141,6 @@ test_expect_success 'diff-files does not examine skip-worktree dirty entries' '
>         test -z "$(git diff-files -- one)"
>  '
>
> -test_expect_success 'git-rm succeeds on skip-worktree absent entries' '
> -       setup_absent &&
> -       git rm 1
> -'
> -
>  test_expect_success 'commit on skip-worktree absent entries' '
>         git reset &&
>         setup_absent &&
> --
> 2.30.1

I already reviewed the source code previously a few times now; it
still looks good to me, and you didn't change it from the last
version.  You've added new tests and tweaked them a bit, but all those
changes look good to me.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 0/7] add/rm: honor sparse checkout and warn on sparse paths
  2021-02-24  4:05     ` [PATCH v2 " Matheus Tavares
                         ` (6 preceding siblings ...)
  2021-02-24  4:05       ` [PATCH v2 7/7] rm: honor sparse checkout patterns Matheus Tavares
@ 2021-02-24  7:05       ` Elijah Newren
  7 siblings, 0 replies; 56+ messages in thread
From: Elijah Newren @ 2021-02-24  7:05 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: Git Mailing List, Junio C Hamano, Derrick Stolee

Hi,

This time I'm responding a bit more promptly than the last few patch
series you sent out.  :-)

On Tue, Feb 23, 2021 at 8:05 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> Make `rm` honor sparse checkouts, and make both `rm` and `add` warn
> about pathspecs that only match sparse entries.
>
> This is now based on mt/add-chmod-fixes.
>
> Main changes since RFC:

I've read through the series.  I'm obviously a fan of the overall
direction, and after reading it over I think the implementation also
looks pretty good.  I had a few minor comments here and there, and one
weird testcase that I came up with based on reading your code where
I'm not sure what the right behavior is (because the testcase is
convoluted); at worst, the current code just prints two warnings when
we maybe want one, but two warnings might be considered the correct
behavior.  Maybe you or someone else can look at it with a fresh set
of eyes and have a reason to prefer one behavior over the other for
that testcase.

>
> Ejected "add --chmod: don't update index when --dry-run is used", which
> is now part of mt/add-chmod-fixes.
>
> Patch 2:
> - Fixed style problems.
> - Fixed comment on setup_sparse_entry() about argument being passed to
>   printf.
> - Standardized on 'sparse entries' on test names.
> - Added test for 'git add .', and set up a .gitignore to run this and
>   the 'git add -A' test, to avoid smudging the index.
> - Added test for --dry-run.
> - Modified `git add --refresh` test to use `git ls-files --debug`
>   instead of diff-files.
>
> Added patch 5
>
> Patch 6:
> - Improved commit message.
> - Used the flag added in patch 5 instead of directly changing
>   refresh_index(), so that new users of seen[] can still have the
>   complete match array.
> - Added test for --ignore-missing.
> - Added test where the pathspec matches both sparse and dense entries.
> - Changed 'git add .' behavior where . only contains sparse entries. [1]
>
> Patch 7:
> - Moved rm-sparse-checkout tests to their own file. They need a repo
>   with a different setup than the one t3600 has, and the subshells (or
>   git -C ...) were making the tests too hard to read.
> - Added tests for -f, --dry-run and --ignore-unmatch.
> - Added test where the pathspec matches both sparse and dense entries.
> - Also test if the index entry is really not removed (besides testing
>   `git rm` exit code and stderr).
> - Passed "a" instead of "/a" to git sparse-checkout to avoid MSYS path
>   conversions on Windows that would make some of the new tests fail.
>
> [1]: Although the sparse entries warning is based on the ignored files
>      warning, they have some differences. One of them is that running
>      'git add dir' where 'dir/somefile' is ignored does not trigger a
>      warning, whereas if 'somefile' is a sparse entry, we warn. In
>      this sense, I think it's more coherent to also warn on 'git add .'
>      when . only has sparse entries. This is also consistent with what
>      'git rm -r .' does in the same scenario, after this series.
>
> Matheus Tavares (7):
>   add: include magic part of pathspec on --refresh error
>   t3705: add tests for `git add` in sparse checkouts
>   add: make --chmod and --renormalize honor sparse checkouts
>   pathspec: allow to ignore SKIP_WORKTREE entries on index matching
>   refresh_index(): add REFRESH_DONT_MARK_SPARSE_MATCHES flag
>   add: warn when pathspec only matches SKIP_WORKTREE entries
>   rm: honor sparse checkout patterns
>
>  Documentation/config/advice.txt  |   4 +
>  Documentation/git-rm.txt         |   4 +-
>  advice.c                         |  19 ++++
>  advice.h                         |   4 +
>  builtin/add.c                    |  75 ++++++++++++---
>  builtin/check-ignore.c           |   2 +-
>  builtin/rm.c                     |  35 ++++---
>  cache.h                          |  15 +--
>  pathspec.c                       |  25 ++++-
>  pathspec.h                       |  13 ++-
>  read-cache.c                     |   5 +-
>  t/t3602-rm-sparse-checkout.sh    |  76 +++++++++++++++
>  t/t3700-add.sh                   |   6 ++
>  t/t3705-add-sparse-checkout.sh   | 155 +++++++++++++++++++++++++++++++
>  t/t7011-skip-worktree-reading.sh |   5 -
>  t/t7012-skip-worktree-writing.sh |  19 ----
>  16 files changed, 398 insertions(+), 64 deletions(-)
>  create mode 100755 t/t3602-rm-sparse-checkout.sh
>  create mode 100755 t/t3705-add-sparse-checkout.sh
>
> Range-diff against v1:
> 1:  5612c57977 < -:  ---------- add --chmod: don't update index when --dry-run is used
> 2:  5a06223007 = 1:  2831fd5744 add: include magic part of pathspec on --refresh error
> 3:  4fc81a83b1 ! 2:  72b8787018 t3705: add tests for `git add` in sparse checkouts
>     @@ t/t3705-add-sparse-checkout.sh (new)
>      +
>      +SPARSE_ENTRY_BLOB=""
>      +
>     -+# Optionally take a string for the entry's contents
>     -+setup_sparse_entry()
>     -+{
>     -+  if test -f sparse_entry
>     -+  then
>     -+          rm sparse_entry
>     -+  fi &&
>     ++# Optionally take a printf format string to write to the sparse_entry file
>     ++setup_sparse_entry () {
>     ++  rm -f sparse_entry &&
>      +  git update-index --force-remove sparse_entry &&
>      +
>     -+  if test "$#" -eq 1
>     ++  if test $# -eq 1
>      +  then
>      +          printf "$1" >sparse_entry
>      +  else
>     -+          printf "" >sparse_entry
>     ++          >sparse_entry
>      +  fi &&
>      +  git add sparse_entry &&
>      +  git update-index --skip-worktree sparse_entry &&
>      +  SPARSE_ENTRY_BLOB=$(git rev-parse :sparse_entry)
>      +}
>      +
>     -+test_sparse_entry_unchanged() {
>     ++test_sparse_entry_unchanged () {
>      +  echo "100644 $SPARSE_ENTRY_BLOB 0       sparse_entry" >expected &&
>      +  git ls-files --stage sparse_entry >actual &&
>      +  test_cmp expected actual
>      +}
>      +
>     -+test_expect_success "git add does not remove SKIP_WORKTREE entries" '
>     ++setup_gitignore () {
>     ++  test_when_finished rm -f .gitignore &&
>     ++  cat >.gitignore <<-EOF
>     ++  *
>     ++  !/sparse_entry
>     ++  EOF
>     ++}
>     ++
>     ++test_expect_success 'git add does not remove sparse entries' '
>      +  setup_sparse_entry &&
>      +  rm sparse_entry &&
>      +  git add sparse_entry &&
>      +  test_sparse_entry_unchanged
>      +'
>      +
>     -+test_expect_success "git add -A does not remove SKIP_WORKTREE entries" '
>     ++test_expect_success 'git add -A does not remove sparse entries' '
>      +  setup_sparse_entry &&
>      +  rm sparse_entry &&
>     ++  setup_gitignore &&
>      +  git add -A &&
>      +  test_sparse_entry_unchanged
>      +'
>      +
>     -+for opt in "" -f -u --ignore-removal
>     -+do
>     -+  if test -n "$opt"
>     -+  then
>     -+          opt=" $opt"
>     -+  fi
>     ++test_expect_success 'git add . does not remove sparse entries' '
>     ++  setup_sparse_entry &&
>     ++  rm sparse_entry &&
>     ++  setup_gitignore &&
>     ++  git add . &&
>     ++  test_sparse_entry_unchanged
>     ++'
>      +
>     -+  test_expect_success "git add$opt does not update SKIP_WORKTREE entries" '
>     ++for opt in "" -f -u --ignore-removal --dry-run
>     ++do
>     ++  test_expect_success "git add${opt:+ $opt} does not update sparse entries" '
>      +          setup_sparse_entry &&
>      +          echo modified >sparse_entry &&
>      +          git add $opt sparse_entry &&
>     @@ t/t3705-add-sparse-checkout.sh (new)
>      +  '
>      +done
>      +
>     -+test_expect_success 'git add --refresh does not update SKIP_WORKTREE entries' '
>     ++test_expect_success 'git add --refresh does not update sparse entries' '
>      +  setup_sparse_entry &&
>     ++  git ls-files --debug sparse_entry | grep mtime >before &&
>      +  test-tool chmtime -60 sparse_entry &&
>      +  git add --refresh sparse_entry &&
>     -+
>     -+  # We must unset the SKIP_WORKTREE bit, otherwise
>     -+  # git diff-files would skip examining the file
>     -+  git update-index --no-skip-worktree sparse_entry &&
>     -+
>     -+  echo sparse_entry >expected &&
>     -+  git diff-files --name-only sparse_entry >actual &&
>     -+  test_cmp actual expected
>     ++  git ls-files --debug sparse_entry | grep mtime >after &&
>     ++  test_cmp before after
>      +'
>      +
>     -+test_expect_failure 'git add --chmod does not update SKIP_WORKTREE entries' '
>     ++test_expect_failure 'git add --chmod does not update sparse entries' '
>      +  setup_sparse_entry &&
>      +  git add --chmod=+x sparse_entry &&
>     -+  test_sparse_entry_unchanged
>     ++  test_sparse_entry_unchanged &&
>     ++  ! test -x sparse_entry
>      +'
>      +
>     -+test_expect_failure 'git add --renormalize does not update SKIP_WORKTREE entries' '
>     ++test_expect_failure 'git add --renormalize does not update sparse entries' '
>      +  test_config core.autocrlf false &&
>      +  setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
>      +  echo "sparse_entry text=auto" >.gitattributes &&
> 4:  ccb05cc3d7 ! 3:  0f03adf241 add: make --chmod and --renormalize honor sparse checkouts
>     @@ Commit message
>          Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
>
>       ## builtin/add.c ##
>     -@@ builtin/add.c: static void chmod_pathspec(struct pathspec *pathspec, char flip, int show_only)
>     -   for (i = 0; i < active_nr; i++) {
>     +@@ builtin/add.c: static int chmod_pathspec(struct pathspec *pathspec, char flip, int show_only)
>                 struct cache_entry *ce = active_cache[i];
>     +           int err;
>
>      +          if (ce_skip_worktree(ce))
>      +                  continue;
>     @@ builtin/add.c: static int renormalize_tracked_files(const struct pathspec *paths
>                 if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
>
>       ## t/t3705-add-sparse-checkout.sh ##
>     -@@ t/t3705-add-sparse-checkout.sh: test_expect_success 'git add --refresh does not update SKIP_WORKTREE entries' '
>     -   test_cmp actual expected
>     +@@ t/t3705-add-sparse-checkout.sh: test_expect_success 'git add --refresh does not update sparse entries' '
>     +   test_cmp before after
>       '
>
>     --test_expect_failure 'git add --chmod does not update SKIP_WORKTREE entries' '
>     -+test_expect_success 'git add --chmod does not update SKIP_WORKTREE entries' '
>     +-test_expect_failure 'git add --chmod does not update sparse entries' '
>     ++test_expect_success 'git add --chmod does not update sparse entries' '
>         setup_sparse_entry &&
>         git add --chmod=+x sparse_entry &&
>     -   test_sparse_entry_unchanged
>     +   test_sparse_entry_unchanged &&
>     +   ! test -x sparse_entry
>       '
>
>     --test_expect_failure 'git add --renormalize does not update SKIP_WORKTREE entries' '
>     -+test_expect_success 'git add --renormalize does not update SKIP_WORKTREE entries' '
>     +-test_expect_failure 'git add --renormalize does not update sparse entries' '
>     ++test_expect_success 'git add --renormalize does not update sparse entries' '
>         test_config core.autocrlf false &&
>         setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
>         echo "sparse_entry text=auto" >.gitattributes &&
> 5:  00786cba82 = 4:  a8a8af22a0 pathspec: allow to ignore SKIP_WORKTREE entries on index matching
> -:  ---------- > 5:  d65b214dd1 refresh_index(): add REFRESH_DONT_MARK_SPARSE_MATCHES flag
> 6:  ce74a60e0d ! 6:  24e889ca9b add: warn when pathspec only matches SKIP_WORKTREE entries
>     @@ Commit message
>          add: warn when pathspec only matches SKIP_WORKTREE entries
>
>          `git add` already refrains from updating SKIP_WORKTREE entries, but it
>     -    silently succeeds when a pathspec only matches these entries. Instead,
>     -    let's warn the user and display a hint on how to update these entries.
>     +    silently exits with zero code when a pathspec only matches these
>     +    entries. Instead, let's warn the user and display a hint on how to
>     +    update these entries.
>
>          Note that the warning is only shown if the pathspec matches no untracked
>          paths in the working tree and only matches index entries with the
>     -    SKIP_WORKTREE bit set. Performance-wise, this patch doesn't change the
>     -    number of ce_path_match() calls in the worst case scenario (because we
>     -    still need to check the sparse entries for the warning). But in the
>     -    general case, it avoids unnecessarily calling this function for each
>     -    SKIP_WORKTREE entry.
>     -
>     -    A warning message was chosen over erroring out right away to reproduce
>     -    the same behavior `add` already exhibits with ignored files. This also
>     -    allow users to continue their workflow without having to invoke `add`
>     -    again with only the matching pathspecs, as the matched files will have
>     -    already been added.
>     -
>     -    Note: refresh_index() was changed to only mark matches with
>     -    no-SKIP-WORKTREE entries in the `seen` output parameter. This is exactly
>     -    the behavior we want for `add`, and only `add` calls this function with
>     -    a non-NULL `seen` pointer. So the change brings no side effect on
>     -    other callers.
>     +    SKIP_WORKTREE bit set. A warning message was chosen over erroring out
>     +    right away to reproduce the same behavior `add` already exhibits with
>     +    ignored files. This also allow users to continue their workflow without
>     +    having to invoke `add` again with only the matching pathspecs, as the
>     +    matched files will have already been added.
>
>          Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
>
>     @@ builtin/add.c: static char *prune_directory(struct dir_struct *dir, struct paths
>      +  int i, ret = 0;
>      +  char *skip_worktree_seen = NULL;
>      +  struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
>     ++  int flags = REFRESH_DONT_MARK_SPARSE_MATCHES |
>     ++              (verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET);
>
>         seen = xcalloc(pathspec->nr, 1);
>     -   refresh_index(&the_index, verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET,
>     -                 pathspec, seen, _("Unstaged changes after refreshing the index:"));
>     +-  refresh_index(&the_index, verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET,
>     +-                pathspec, seen, _("Unstaged changes after refreshing the index:"));
>     ++  refresh_index(&the_index, flags, pathspec, seen,
>     ++                _("Unstaged changes after refreshing the index:"));
>         for (i = 0; i < pathspec->nr; i++) {
>      -          if (!seen[i])
>      -                  die(_("pathspec '%s' did not match any files"),
>     @@ builtin/add.c: int cmd_add(int argc, const char **argv, const char *prefix)
>      -                      ((pathspec.items[i].magic &
>      -                        (PATHSPEC_GLOB | PATHSPEC_ICASE)) ||
>      -                       !file_exists(path))) {
>     -+                  if (seen[i] || !path[0])
>     ++                  if (seen[i])
>      +                          continue;
>      +
>      +                  if (matches_skip_worktree(&pathspec, i, &skip_worktree_seen)) {
>     @@ builtin/add.c: int cmd_add(int argc, const char **argv, const char *prefix)
>      +                          continue;
>      +                  }
>      +
>     ++                  /* Don't complain at 'git add .' inside empty repo. */
>     ++                  if (!path[0])
>     ++                          continue;
>     ++
>      +                  if ((pathspec.items[i].magic & (PATHSPEC_GLOB | PATHSPEC_ICASE)) ||
>      +                      !file_exists(path)) {
>                                 if (ignore_missing) {
>     @@ pathspec.h: void add_pathspec_matches_against_index(const struct pathspec *paths
>                          const char *name, int namelen,
>                          const struct pathspec_item *item);
>
>     - ## read-cache.c ##
>     -@@ read-cache.c: int refresh_index(struct index_state *istate, unsigned int flags,
>     -           if (ignore_submodules && S_ISGITLINK(ce->ce_mode))
>     -                   continue;
>     -
>     --          if (pathspec && !ce_path_match(istate, ce, pathspec, seen))
>     -+          if (pathspec && !ce_path_match(istate, ce, pathspec,
>     -+                                         ce_skip_worktree(ce) ? NULL : seen))
>     -                   filtered = 1;
>     -
>     -           if (ce_stage(ce)) {
>     -
>       ## t/t3705-add-sparse-checkout.sh ##
>     -@@ t/t3705-add-sparse-checkout.sh: test_sparse_entry_unchanged() {
>     -   test_cmp expected actual
>     +@@ t/t3705-add-sparse-checkout.sh: setup_gitignore () {
>     +   EOF
>       }
>
>     -+cat >sparse_entry_error <<-EOF
>     -+The following pathspecs only matched index entries outside the current
>     -+sparse checkout:
>     -+sparse_entry
>     -+EOF
>     ++test_expect_success 'setup' '
>     ++  cat >sparse_error_header <<-EOF &&
>     ++  The following pathspecs only matched index entries outside the current
>     ++  sparse checkout:
>     ++  EOF
>     ++
>     ++  cat >sparse_hint <<-EOF &&
>     ++  hint: Disable or modify the sparsity rules if you intend to update such entries.
>     ++  hint: Disable this message with "git config advice.updateSparsePath false"
>     ++  EOF
>      +
>     -+cat >error_and_hint sparse_entry_error - <<-EOF
>     -+hint: Disable or modify the sparsity rules if you intend to update such entries.
>     -+hint: Disable this message with "git config advice.updateSparsePath false"
>     -+EOF
>     ++  echo sparse_entry | cat sparse_error_header - >sparse_entry_error &&
>     ++  cat sparse_entry_error sparse_hint >error_and_hint
>     ++'
>      +
>     - test_expect_success "git add does not remove SKIP_WORKTREE entries" '
>     + test_expect_success 'git add does not remove sparse entries' '
>         setup_sparse_entry &&
>         rm sparse_entry &&
>      -  git add sparse_entry &&
>     @@ t/t3705-add-sparse-checkout.sh: test_sparse_entry_unchanged() {
>         test_sparse_entry_unchanged
>       '
>
>     +@@ t/t3705-add-sparse-checkout.sh: test_expect_success 'git add -A does not remove sparse entries' '
>     +   setup_sparse_entry &&
>     +   rm sparse_entry &&
>     +   setup_gitignore &&
>     +-  git add -A &&
>     ++  git add -A 2>stderr &&
>     ++  test_must_be_empty stderr &&
>     +   test_sparse_entry_unchanged
>     + '
>     +
>     +@@ t/t3705-add-sparse-checkout.sh: test_expect_success 'git add . does not remove sparse entries' '
>     +   setup_sparse_entry &&
>     +   rm sparse_entry &&
>     +   setup_gitignore &&
>     +-  git add . &&
>     ++  test_must_fail git add . 2>stderr &&
>     ++
>     ++  cat sparse_error_header >expect &&
>     ++  echo . >>expect &&
>     ++  cat sparse_hint >>expect &&
>     ++
>     ++  test_i18ncmp expect stderr &&
>     +   test_sparse_entry_unchanged
>     + '
>     +
>      @@ t/t3705-add-sparse-checkout.sh: do
>     -   test_expect_success "git add$opt does not update SKIP_WORKTREE entries" '
>     +   test_expect_success "git add${opt:+ $opt} does not update sparse entries" '
>                 setup_sparse_entry &&
>                 echo modified >sparse_entry &&
>      -          git add $opt sparse_entry &&
>     @@ t/t3705-add-sparse-checkout.sh: do
>                 test_sparse_entry_unchanged
>         '
>       done
>     -@@ t/t3705-add-sparse-checkout.sh: done
>     - test_expect_success 'git add --refresh does not update SKIP_WORKTREE entries' '
>     +@@ t/t3705-add-sparse-checkout.sh: test_expect_success 'git add --refresh does not update sparse entries' '
>         setup_sparse_entry &&
>     +   git ls-files --debug sparse_entry | grep mtime >before &&
>         test-tool chmtime -60 sparse_entry &&
>      -  git add --refresh sparse_entry &&
>      +  test_must_fail git add --refresh sparse_entry 2>stderr &&
>      +  test_i18ncmp error_and_hint stderr &&
>     +   git ls-files --debug sparse_entry | grep mtime >after &&
>     +   test_cmp before after
>     + '
>
>     -   # We must unset the SKIP_WORKTREE bit, otherwise
>     -   # git diff-files would skip examining the file
>     -@@ t/t3705-add-sparse-checkout.sh: test_expect_success 'git add --refresh does not update SKIP_WORKTREE entries' '
>     -
>     - test_expect_success 'git add --chmod does not update SKIP_WORKTREE entries' '
>     + test_expect_success 'git add --chmod does not update sparse entries' '
>         setup_sparse_entry &&
>      -  git add --chmod=+x sparse_entry &&
>      +  test_must_fail git add --chmod=+x sparse_entry 2>stderr &&
>      +  test_i18ncmp error_and_hint stderr &&
>     -   test_sparse_entry_unchanged
>     +   test_sparse_entry_unchanged &&
>     +   ! test -x sparse_entry
>       '
>     -
>     -@@ t/t3705-add-sparse-checkout.sh: test_expect_success 'git add --renormalize does not update SKIP_WORKTREE entries
>     +@@ t/t3705-add-sparse-checkout.sh: test_expect_success 'git add --renormalize does not update sparse entries' '
>         test_config core.autocrlf false &&
>         setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
>         echo "sparse_entry text=auto" >.gitattributes &&
>     @@ t/t3705-add-sparse-checkout.sh: test_expect_success 'git add --renormalize does
>         test_sparse_entry_unchanged
>       '
>
>     ++test_expect_success 'git add --dry-run --ignore-missing warn on sparse path' '
>     ++  setup_sparse_entry &&
>     ++  rm sparse_entry &&
>     ++  test_must_fail git add --dry-run --ignore-missing sparse_entry 2>stderr &&
>     ++  test_i18ncmp error_and_hint stderr &&
>     ++  test_sparse_entry_unchanged
>     ++'
>     ++
>      +test_expect_success 'do not advice about sparse entries when they do not match the pathspec' '
>      +  setup_sparse_entry &&
>     -+  test_must_fail git add nonexistent sp 2>stderr &&
>     ++  test_must_fail git add nonexistent 2>stderr &&
>      +  test_i18ngrep "fatal: pathspec .nonexistent. did not match any files" stderr &&
>      +  test_i18ngrep ! "The following pathspecs only matched index entries" stderr
>      +'
>      +
>     ++test_expect_success 'do not warn when pathspec matches dense entries' '
>     ++  setup_sparse_entry &&
>     ++  echo modified >sparse_entry &&
>     ++  >dense_entry &&
>     ++  git add "*_entry" 2>stderr &&
>     ++  test_must_be_empty stderr &&
>     ++  test_sparse_entry_unchanged &&
>     ++  git ls-files --error-unmatch dense_entry
>     ++'
>     ++
>      +test_expect_success 'add obeys advice.updateSparsePath' '
>      +  setup_sparse_entry &&
>      +  test_must_fail git -c advice.updateSparsePath=false add sparse_entry 2>stderr &&
> 7:  e76c7c6999 ! 7:  08f0c32bfc rm: honor sparse checkout patterns
>     @@ Commit message
>          rules, but `git rm` doesn't follow the same restrictions. This is
>          somewhat counter-intuitive and inconsistent. So make `rm` honor the
>          sparse checkout and advise on how to remove SKIP_WORKTREE entries, just
>     -    like `add` does. Also add a few tests for the new behavior.
>     +    like `add` does. Also add some tests for the new behavior.
>
>          Suggested-by: Elijah Newren <newren@gmail.com>
>          Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
>     @@ builtin/rm.c: int cmd_rm(int argc, const char **argv, const char *prefix)
>      +  return ret;
>       }
>
>     - ## t/t3600-rm.sh ##
>     -@@ t/t3600-rm.sh: test_expect_success 'rm empty string should fail' '
>     -   test_must_fail git rm -rf ""
>     - '
>     -
>     -+test_expect_success 'setup repo for tests with sparse-checkout' '
>     -+  git init sparse &&
>     -+  (
>     -+          cd sparse &&
>     -+          mkdir -p sub/dir &&
>     -+          touch a b c sub/d sub/dir/e &&
>     -+          git add -A &&
>     -+          git commit -m files
>     -+  ) &&
>     + ## t/t3602-rm-sparse-checkout.sh (new) ##
>     +@@
>     ++#!/bin/sh
>     ++
>     ++test_description='git rm in sparse checked out working trees'
>     ++
>     ++. ./test-lib.sh
>     ++
>     ++test_expect_success 'setup' '
>     ++  mkdir -p sub/dir &&
>     ++  touch a b c sub/d sub/dir/e &&
>     ++  git add -A &&
>     ++  git commit -m files &&
>      +
>      +  cat >sparse_entry_b_error <<-EOF &&
>      +  The following pathspecs only matched index entries outside the current
>     @@ t/t3600-rm.sh: test_expect_success 'rm empty string should fail' '
>      +  EOF
>      +'
>      +
>     -+test_expect_success 'rm should respect sparse-checkout' '
>     -+  git -C sparse sparse-checkout set "/a" &&
>     -+  test_must_fail git -C sparse rm b 2>stderr &&
>     -+  test_i18ncmp b_error_and_hint stderr
>     ++for opt in "" -f --dry-run
>     ++do
>     ++  test_expect_success "rm${opt:+ $opt} does not remove sparse entries" '
>     ++          git sparse-checkout set a &&
>     ++          test_must_fail git rm $opt b 2>stderr &&
>     ++          test_i18ncmp b_error_and_hint stderr &&
>     ++          git ls-files --error-unmatch b
>     ++  '
>     ++done
>     ++
>     ++test_expect_success 'recursive rm does not remove sparse entries' '
>     ++  git reset --hard &&
>     ++  git sparse-checkout set sub/dir &&
>     ++  git rm -r sub &&
>     ++  git status --porcelain -uno >actual &&
>     ++  echo "D  sub/dir/e" >expected &&
>     ++  test_cmp expected actual
>      +'
>      +
>      +test_expect_success 'rm obeys advice.updateSparsePath' '
>     -+  git -C sparse reset --hard &&
>     -+  git -C sparse sparse-checkout set "/a" &&
>     -+  test_must_fail git -C sparse -c advice.updateSparsePath=false rm b 2>stderr &&
>     ++  git reset --hard &&
>     ++  git sparse-checkout set a &&
>     ++  test_must_fail git -c advice.updateSparsePath=false rm b 2>stderr &&
>      +  test_i18ncmp sparse_entry_b_error stderr
>     -+
>     -+'
>     -+
>     -+test_expect_success 'recursive rm should respect sparse-checkout' '
>     -+  (
>     -+          cd sparse &&
>     -+          git reset --hard &&
>     -+          git sparse-checkout set "sub/dir" &&
>     -+          git rm -r sub &&
>     -+          git status --porcelain -uno >../actual
>     -+  ) &&
>     -+  echo "D  sub/dir/e" >expected &&
>     -+  test_cmp expected actual
>      +'
>      +
>      +test_expect_success 'do not advice about sparse entries when they do not match the pathspec' '
>     -+  test_must_fail git -C sparse rm nonexistent 2>stderr &&
>     ++  git reset --hard &&
>     ++  git sparse-checkout set a &&
>     ++  test_must_fail git rm nonexistent 2>stderr &&
>      +  test_i18ngrep "fatal: pathspec .nonexistent. did not match any files" stderr &&
>      +  test_i18ngrep ! "The following pathspecs only matched index entries" stderr
>      +'
>      +
>     - test_done
>     ++test_expect_success 'do not warn about sparse entries when pathspec matches dense entries' '
>     ++  git reset --hard &&
>     ++  git sparse-checkout set a &&
>     ++  git rm "[ba]" 2>stderr &&
>     ++  test_must_be_empty stderr &&
>     ++  git ls-files --error-unmatch b &&
>     ++  test_must_fail git ls-files --error-unmatch a
>     ++'
>     ++
>     ++test_expect_success 'do not warn about sparse entries with --ignore-unmatch' '
>     ++  git reset --hard &&
>     ++  git sparse-checkout set a &&
>     ++  git rm --ignore-unmatch b 2>stderr &&
>     ++  test_must_be_empty stderr &&
>     ++  git ls-files --error-unmatch b
>     ++'
>     ++
>     ++test_done
>
>       ## t/t7011-skip-worktree-reading.sh ##
>      @@ t/t7011-skip-worktree-reading.sh: test_expect_success 'diff-files does not examine skip-worktree dirty entries' '
> --
> 2.30.1
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 6/7] add: warn when pathspec only matches SKIP_WORKTREE entries
  2021-02-24  6:50         ` Elijah Newren
@ 2021-02-24 15:33           ` Matheus Tavares
  0 siblings, 0 replies; 56+ messages in thread
From: Matheus Tavares @ 2021-02-24 15:33 UTC (permalink / raw)
  To: newren; +Cc: git, gitster, stolee

On Wed, Feb 24, 2021 at 3:50 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Tue, Feb 23, 2021 at 8:05 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
> >
> > +void advise_on_updating_sparse_paths(struct string_list *pathspec_list)
> > +{
> > +       struct string_list_item *item;
> > +
> > +       if (!pathspec_list->nr)
> > +               return;
> > +
> > +       fprintf(stderr, _("The following pathspecs only matched index entries outside the current\n"
> > +                         "sparse checkout:\n"));
> > +       for_each_string_list_item(item, pathspec_list)
> > +               fprintf(stderr, "%s\n", item->string);
>
> Was the use of fprintf(stderr, ...) because of the fact that you want
> to do multiple print statements?  I'm just curious if that was the
> reason for avoiding the warning() function, or if there was another
> consideration at play as well.

Yes, that was one of the reasons. The other was to use the same style as
the ignored files message, which doesn't print the "warning:" prefix.
But I don't have any strong preference here, I'd be OK with using
warning() too.

> > -static void refresh(int verbose, const struct pathspec *pathspec)
> > +static int refresh(int verbose, const struct pathspec *pathspec)
> >  {
> >         char *seen;
> > -       int i;
> > +       int i, ret = 0;
> > +       char *skip_worktree_seen = NULL;
> > +       struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
> > +       int flags = REFRESH_DONT_MARK_SPARSE_MATCHES |
> > +                   (verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET);
> >
> >         seen = xcalloc(pathspec->nr, 1);
> > -       refresh_index(&the_index, verbose ? REFRESH_IN_PORCELAIN : REFRESH_QUIET,
> > -                     pathspec, seen, _("Unstaged changes after refreshing the index:"));
> > +       refresh_index(&the_index, flags, pathspec, seen,
> > +                     _("Unstaged changes after refreshing the index:"));
> >         for (i = 0; i < pathspec->nr; i++) {
> > -               if (!seen[i])
> > -                       die(_("pathspec '%s' did not match any files"),
> > -                           pathspec->items[i].original);
> > +               if (!seen[i]) {
> > +                       if (matches_skip_worktree(pathspec, i, &skip_worktree_seen)) {
> > +                               string_list_append(&only_match_skip_worktree,
> > +                                                  pathspec->items[i].original);
> > +                       } else {
> > +                               die(_("pathspec '%s' did not match any files"),
> > +                                   pathspec->items[i].original);
> > +                       }
> > +               }
> > +       }
> > +
> > +       if (only_match_skip_worktree.nr) {
> > +               advise_on_updating_sparse_paths(&only_match_skip_worktree);
> > +               ret = 1;
> >         }
>
> On first reading, I missed that the code die()s if there are any
> non-SKIP_WORKTREE entries matched, and that is the reason you know
> that only SKIP_WORKTREE entries could have been matched for this last
> if-statement.

Hmm, I may be misinterpreting your explanation, but I think the
reasoning is slightly different. The code die()s if there are _no_
matches either among sparse or dense entries. The reason why we know
that only sparse entries matched the pathspecs in this last if-statement
is because the `only_match_skip_worktree` list is only appended when a
pathspec is not marked on seen[] (dense entries only), but it is marked
on skip_worktree_seen[] (sparse entries only).

> Hmm...here's an interesting command sequence:
>
> git init lame
> cd lame
> mkdir baz
> touch baz/tracked
> git add baz/tracked
> git update-index --skip-worktree baz/tracked
> rm baz/tracked.  # But leave the empty directory!
> echo baz >.gitignore
> git add --ignore-missing --dry-run baz
>
>
> Reports the following:
> """
> The following pathspecs only matched index entries outside the current
> sparse checkout:
> baz
> hint: Disable or modify the sparsity rules if you intend to update such entries.
> hint: Disable this message with "git config advice.updateSparsePath false"
> The following paths are ignored by one of your .gitignore files:
> baz
> hint: Use -f if you really want to add them.
> hint: Turn this message off by running
> hint: "git config advice.addIgnoredFile false"
> """

That's interesting. You can also trigger this behavior with a plain add
(i.e. without "--ignore-missing --dry-run").

Since we only get the list of ignored paths from fill_directory(), we
can't really tell whether a specific pathspec item had matches among
ignored files or not. If we had this information, we could conditionally
skip the sparse warning.

I.e. something like this (WARNING: hacky and just briefly tested):

diff --git a/builtin/add.c b/builtin/add.c
index fde6462850..90614e7e76 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -597,3 +597,3 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 		int i;
-		char *skip_worktree_seen = NULL;
+		char *skip_worktree_seen = NULL, *ignored_seen = NULL;
 		struct string_list only_match_skip_worktree = STRING_LIST_INIT_NODUP;
@@ -621,3 +621,14 @@ int cmd_add(int argc, const char **argv, const char *prefix)

-			if (matches_skip_worktree(&pathspec, i, &skip_worktree_seen)) {
+			if (dir.ignored_nr) {
+				int j, prefix_len = common_prefix_len(&pathspec);
+				ignored_seen = xcalloc(pathspec.nr, 1);
+				for (j = 0; j < dir.ignored_nr; j++) {
+					dir_path_match(&the_index, dir.ignored[j],
+						       &pathspec, prefix_len,
+						       ignored_seen);
+				}
+			}
+
+			if (ignored_seen && !ignored_seen[i] &&
+			    matches_skip_worktree(&pathspec, i, &skip_worktree_seen)) {
 				string_list_append(&only_match_skip_worktree,
diff --git a/dir.c b/dir.c
index d153a63bbd..a19bc7aa0b 100644
--- a/dir.c
+++ b/dir.c
@@ -136,3 +136,3 @@ static int fnmatch_icase_mem(const char *pattern, int patternlen,

-static size_t common_prefix_len(const struct pathspec *pathspec)
+size_t common_prefix_len(const struct pathspec *pathspec)
 {
diff --git a/dir.h b/dir.h
index facfae4740..aa2d4aa71b 100644
--- a/dir.h
+++ b/dir.h
@@ -355,2 +355,3 @@ int simple_length(const char *match);
 int no_wildcard(const char *string);
+size_t common_prefix_len(const struct pathspec *pathspec);
 char *common_prefix(const struct pathspec *pathspec);


Now `git add baz` would only produce:

The following paths are ignored by one of your .gitignore files:
baz
hint: Use -f if you really want to add them.
hint: Turn this message off by running
hint: "git config advice.addIgnoredFile false"


^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2021-02-24 16:00 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-12 21:01 [PATCH] rm: honor sparse checkout patterns Matheus Tavares
2020-11-12 23:54 ` Elijah Newren
2020-11-13 13:47   ` Derrick Stolee
2020-11-15 20:12     ` Matheus Tavares Bernardino
2020-11-15 21:42       ` Johannes Sixt
2020-11-16 12:37         ` Matheus Tavares Bernardino
2020-11-23 13:23           ` Johannes Schindelin
2020-11-24  2:48             ` Matheus Tavares Bernardino
2020-11-16 14:30     ` Jeff Hostetler
2020-11-17  4:53       ` Elijah Newren
2020-11-16 13:58 ` [PATCH v2] " Matheus Tavares
2021-02-17 21:02   ` [RFC PATCH 0/7] add/rm: honor sparse checkout and warn on sparse paths Matheus Tavares
2021-02-17 21:02     ` [RFC PATCH 1/7] add --chmod: don't update index when --dry-run is used Matheus Tavares
2021-02-17 21:45       ` Junio C Hamano
2021-02-18  1:33         ` Matheus Tavares
2021-02-17 21:02     ` [RFC PATCH 2/7] add: include magic part of pathspec on --refresh error Matheus Tavares
2021-02-17 22:20       ` Junio C Hamano
2021-02-17 21:02     ` [RFC PATCH 3/7] t3705: add tests for `git add` in sparse checkouts Matheus Tavares
2021-02-17 23:01       ` Junio C Hamano
2021-02-17 23:22         ` Eric Sunshine
2021-02-17 23:34           ` Junio C Hamano
2021-02-18  3:11           ` Matheus Tavares Bernardino
2021-02-18  3:07         ` Matheus Tavares Bernardino
2021-02-18 14:38           ` Matheus Tavares
2021-02-18 19:05             ` Junio C Hamano
2021-02-18 19:02           ` Junio C Hamano
2021-02-22 18:53         ` Elijah Newren
2021-02-17 21:02     ` [RFC PATCH 4/7] add: make --chmod and --renormalize honor " Matheus Tavares
2021-02-17 21:02     ` [RFC PATCH 5/7] pathspec: allow to ignore SKIP_WORKTREE entries on index matching Matheus Tavares
2021-02-17 21:02     ` [RFC PATCH 6/7] add: warn when pathspec only matches SKIP_WORKTREE entries Matheus Tavares
2021-02-19  0:34       ` Junio C Hamano
2021-02-19 17:11         ` Matheus Tavares Bernardino
2021-02-17 21:02     ` [RFC PATCH 7/7] rm: honor sparse checkout patterns Matheus Tavares
2021-02-22 18:57     ` [RFC PATCH 0/7] add/rm: honor sparse checkout and warn on sparse paths Elijah Newren
2021-02-24  4:05     ` [PATCH v2 " Matheus Tavares
2021-02-24  4:05       ` [PATCH v2 1/7] add: include magic part of pathspec on --refresh error Matheus Tavares
2021-02-24  4:05       ` [PATCH v2 2/7] t3705: add tests for `git add` in sparse checkouts Matheus Tavares
2021-02-24  5:15         ` Elijah Newren
2021-02-24  4:05       ` [PATCH v2 3/7] add: make --chmod and --renormalize honor " Matheus Tavares
2021-02-24  4:05       ` [PATCH v2 4/7] pathspec: allow to ignore SKIP_WORKTREE entries on index matching Matheus Tavares
2021-02-24  5:23         ` Elijah Newren
2021-02-24  4:05       ` [PATCH v2 5/7] refresh_index(): add REFRESH_DONT_MARK_SPARSE_MATCHES flag Matheus Tavares
2021-02-24  4:05       ` [PATCH v2 6/7] add: warn when pathspec only matches SKIP_WORKTREE entries Matheus Tavares
2021-02-24  6:50         ` Elijah Newren
2021-02-24 15:33           ` Matheus Tavares
2021-02-24  4:05       ` [PATCH v2 7/7] rm: honor sparse checkout patterns Matheus Tavares
2021-02-24  6:59         ` Elijah Newren
2021-02-24  7:05       ` [PATCH v2 0/7] add/rm: honor sparse checkout and warn on sparse paths Elijah Newren
2020-11-16 20:14 ` [PATCH] rm: honor sparse checkout patterns Junio C Hamano
2020-11-17  5:20   ` Elijah Newren
2020-11-20 17:06     ` Elijah Newren
2020-12-31 20:03       ` sparse-checkout questions and proposals [Was: Re: [PATCH] rm: honor sparse checkout patterns] Elijah Newren
2021-01-04  3:02         ` Derrick Stolee
2021-01-06 19:15           ` Elijah Newren
2021-01-07 12:53             ` Derrick Stolee
2021-01-07 17:36               ` Elijah Newren

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for the project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git