git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] [RFC] diff: introduce scope option
@ 2022-10-31  4:11 ZheNing Hu via GitGitGadget
  2022-11-01  1:34 ` Taylor Blau
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2022-10-31  4:11 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Ævar Arnfjörð Bjarmason,
	Jeff King, Junio C Hamano, Derrick Stolee, Johannes Schindelin,
	Victoria Dye, Elijah Newren, Emily Shaffer,
	Matheus Tavares Bernardino, Shaoxuan Yuan, ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

When we use sparse-checkout, we often want the set of files
that some commands operate on to be restricted to the
sparse-checkout specification.

So introduce the `--scope` option to git diff, which have two
value: "sparse" and "all". "sparse" mean that diff is performed
restrict to paths which matching sparse-checkout specification,
"all" mean that diff is performed regardless of whether the path
meets the sparse-checkout specification. `--no-scope` is the default
option for now.

Add `diff.scope={sparse, all}` config, which can also have the same
capabilities as `--scope`, and it will be covered by `--scope` option.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
    [RFC] diff: introduce scope option
    
    In [1], we discovered that users working on different sparse-checkout
    specification may download unnecessary blobs from each other's
    specification in collaboration. In [2] Junio suggested that maybe we can
    restrict some git command's filespec in sparse-checkout specification to
    elegantly solve this problem above. In [3]: Newren and Derrick Stolee
    prefer to name the option --scope={sparse, all}.
    
    So this patch is attempt to do this thing on git diff:
    
    v1:
    
     1. add --restrict option to git diff, which restrict diff filespec in
        sparse-checkout specification. [4] v2.
     2. rename --restrict to --scope={sparse, all}, support --no-scope.
     3. add config: diff.scope={sparse,all}.
    
    Unresolved work:
    
     1. how to properly pass this --scope={sparse, all} to other commands
        like git log, git format-patch, etc.
     2. how to set the default value of scope for different diff commands.
    
    [1]:
    https://lore.kernel.org/git/CAOLTT8SHo66kGbvWr=+LQ9UVd1NHgqGGEYK2qq6==QgRCgLZqQ@mail.gmail.com/
    [2]: https://lore.kernel.org/git/xmqqzgeqw0sy.fsf@gitster.g/ [3]:
    https://lore.kernel.org/git/07a25d48-e364-0d9b-6ffa-41a5984eb5db@github.com/
    [4]:
    https://lore.kernel.org/git/pull.1368.git.1664036052741.gitgitgadget@gmail.com/

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1398%2Fadlternative%2Fzh%2Fdiff-scope-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1398/adlternative/zh/diff-scope-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1398

 Documentation/config/diff.txt         |  12 +
 Documentation/diff-options.txt        |  18 +
 builtin/diff.c                        |   4 +
 diff-lib.c                            |  36 +-
 diff-no-index.c                       |   4 +
 diff.c                                |  39 +++
 diff.h                                |  11 +
 t/t4070-diff-sparse-checkout-scope.sh | 469 ++++++++++++++++++++++++++
 tree-diff.c                           |   5 +
 9 files changed, 597 insertions(+), 1 deletion(-)
 create mode 100644 t/t4070-diff-sparse-checkout-scope.sh

diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt
index 35a7bf86d77..52707e1b2d6 100644
--- a/Documentation/config/diff.txt
+++ b/Documentation/config/diff.txt
@@ -201,6 +201,18 @@ diff.algorithm::
 --
 +
 
+diff.scope::
+	Choose diff scope. The variants are as follows:
++
+--
+`sparse`;;
+	Restrict diff paths to those matching sparse-checkout specification.
+`all`;;
+	Without restriction, diff is performed regardless of whether the path
+	meets the sparse-checkout specification.
+--
++
+
 diff.wsErrorHighlight::
 	Highlight whitespace errors in the `context`, `old` or `new`
 	lines of the diff.  Multiple values are separated by comma,
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 3674ac48e92..04bf83e8be1 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -195,6 +195,24 @@ For instance, if you configured the `diff.algorithm` variable to a
 non-default value and want to use the default one, then you
 have to use `--diff-algorithm=default` option.
 
+ifndef::git-format-patch[]
+ifndef::git-log[]
+
+--scope={sparse|all}::
+	Choose diff scope. The variants are as follows:
++
+--
+`--sparse`;;
+	Restrict diff paths to those matching sparse-checkout specification.
+`--all`;;
+	Without restriction, diff is performed regardless of whether the path
+	meets the sparse-checkout specification.
+--
++
+
+endif::git-log[]
+endif::git-format-patch[]
+
 --stat[=<width>[,<name-width>[,<count>]]]::
 	Generate a diffstat. By default, as much space as necessary
 	will be used for the filename part, and the rest for the graph
diff --git a/builtin/diff.c b/builtin/diff.c
index 854d2c5a5c4..6b450f7184c 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -54,6 +54,10 @@ static void stuff_change(struct diff_options *opt,
 	    oideq(old_oid, new_oid) && (old_mode == new_mode))
 		return;
 
+	if (opt->scope == DIFF_SCOPE_SPARSE &&
+	    !diff_paths_in_sparse_checkout(old_path, new_path))
+		return;
+
 	if (opt->flags.reverse_diff) {
 		SWAP(old_mode, new_mode);
 		SWAP(old_oid, new_oid);
diff --git a/diff-lib.c b/diff-lib.c
index 2edea41a234..a3381f2e0ff 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -88,6 +88,22 @@ static int match_stat_with_submodule(struct diff_options *diffopt,
 	return changed;
 }
 
+int diff_path_in_sparse_checkout(const char *path) {
+	if (core_sparse_checkout_cone)
+		return path_in_cone_mode_sparse_checkout(path, the_repository->index);
+	else
+		return path_in_sparse_checkout(path, the_repository->index);
+}
+
+int diff_paths_in_sparse_checkout(const char *one, const char*two) {
+	if (one == two || !strcmp(one, two))
+		return diff_path_in_sparse_checkout(one);
+	else
+		return diff_path_in_sparse_checkout(one) &&
+		       diff_path_in_sparse_checkout(two);
+}
+
+
 int run_diff_files(struct rev_info *revs, unsigned int option)
 {
 	int entries, i;
@@ -113,6 +129,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 
 		if (diff_can_quit_early(&revs->diffopt))
 			break;
+		if (revs->diffopt.scope == DIFF_SCOPE_SPARSE &&
+		    !diff_path_in_sparse_checkout(ce->name))
+			continue;
 
 		if (!ce_path_match(istate, ce, &revs->prune_data, NULL))
 			continue;
@@ -202,7 +221,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 				continue;
 		}
 
-		if (ce_uptodate(ce) || ce_skip_worktree(ce))
+		if (ce_uptodate(ce) ||
+		    (revs->diffopt.scope != DIFF_SCOPE_ALL && ce_skip_worktree(ce)))
 			continue;
 
 		/*
@@ -439,6 +459,20 @@ static void do_oneway_diff(struct unpack_trees_options *o,
 			return;	/* nothing to diff.. */
 	}
 
+	if (revs->diffopt.scope == DIFF_SCOPE_SPARSE) {
+		if (idx && tree) {
+			if (!diff_paths_in_sparse_checkout(idx->name, tree->name))
+				return;
+		} else if (idx) {
+			if (!diff_path_in_sparse_checkout(idx->name))
+				return;
+		} else if (tree) {
+			if (!diff_path_in_sparse_checkout(tree->name))
+				return;
+		} else
+			return;
+	}
+
 	/* if the entry is not checked out, don't examine work tree */
 	cached = o->index_only ||
 		(idx && ((idx->ce_flags & CE_VALID) || ce_skip_worktree(idx)));
diff --git a/diff-no-index.c b/diff-no-index.c
index 18edbdf4b59..ea94a104ea4 100644
--- a/diff-no-index.c
+++ b/diff-no-index.c
@@ -281,6 +281,10 @@ int diff_no_index(struct rev_info *revs,
 
 	fixup_paths(paths, &replacement);
 
+	if (revs->diffopt.scope == DIFF_SCOPE_SPARSE &&
+	    !diff_paths_in_sparse_checkout(paths[0], paths[1]))
+		goto out;
+
 	revs->diffopt.skip_stat_unmatch = 1;
 	if (!revs->diffopt.output_format)
 		revs->diffopt.output_format = DIFF_FORMAT_PATCH;
diff --git a/diff.c b/diff.c
index 285d6e2d575..9de4044ae05 100644
--- a/diff.c
+++ b/diff.c
@@ -48,6 +48,7 @@ static int diff_interhunk_context_default;
 static const char *diff_word_regex_cfg;
 static const char *external_diff_cmd_cfg;
 static const char *diff_order_file_cfg;
+static const char *external_diff_scope_cfg;
 int diff_auto_refresh_index = 1;
 static int diff_mnemonic_prefix;
 static int diff_no_prefix;
@@ -57,6 +58,7 @@ static int diff_dirstat_permille_default = 30;
 static struct diff_options default_diff_options;
 static long diff_algorithm;
 static unsigned ws_error_highlight_default = WSEH_NEW;
+static enum diff_scope external_diff_scope;
 
 static char diff_colors[][COLOR_MAXLEN] = {
 	GIT_COLOR_RESET,
@@ -423,6 +425,16 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (!strcmp(var, "diff.scope")) {
+		git_config_string(&external_diff_scope_cfg, var, value);
+		if (!strcmp(value, "all"))
+			external_diff_scope = DIFF_SCOPE_ALL;
+		else if (!strcmp(value, "sparse"))
+			external_diff_scope = DIFF_SCOPE_SPARSE;
+		else
+			return -1;
+	}
+
 	if (git_color_config(var, value, cb) < 0)
 		return -1;
 
@@ -4663,6 +4675,7 @@ void repo_diff_setup(struct repository *r, struct diff_options *options)
 
 	options->color_moved = diff_color_moved_default;
 	options->color_moved_ws_handling = diff_color_moved_ws_default;
+	options->scope = external_diff_scope;
 
 	prep_parse_options(options);
 }
@@ -4914,6 +4927,29 @@ static int parse_dirstat_opt(struct diff_options *options, const char *params)
 	return 1;
 }
 
+static int diff_opt_diff_scope(const struct option *option,
+				const char *optarg, int unset)
+{
+	struct diff_options *opt = option->value;
+
+	if (unset) {
+		opt->scope = DIFF_SCOPE_NONE;
+	} else if (optarg) {
+		if (!strcmp(optarg, "all")) {
+			if (core_apply_sparse_checkout) {
+				opt->scope = DIFF_SCOPE_ALL;
+			}
+		} else if (!strcmp(optarg, "sparse")) {
+			if (core_apply_sparse_checkout) {
+				opt->scope = DIFF_SCOPE_SPARSE;
+			}
+		} else
+			return error(_("invalid --scope value: %s"), optarg);
+	}
+
+	return 0;
+}
+
 static int diff_opt_diff_filter(const struct option *option,
 				const char *optarg, int unset)
 {
@@ -5683,6 +5719,9 @@ static void prep_parse_options(struct diff_options *options)
 		OPT_CALLBACK_F(0, "diff-filter", options, N_("[(A|C|D|M|R|T|U|X|B)...[*]]"),
 			       N_("select files by diff type"),
 			       PARSE_OPT_NONEG, diff_opt_diff_filter),
+		OPT_CALLBACK_F(0, "scope", options, N_("[sparse|all]"),
+			       N_("choose diff scope"),
+			       PARSE_OPT_OPTARG, diff_opt_diff_scope),
 		{ OPTION_CALLBACK, 0, "output", options, N_("<file>"),
 		  N_("output to a specific file"),
 		  PARSE_OPT_NONEG, NULL, 0, diff_opt_output },
diff --git a/diff.h b/diff.h
index 8ae18e5ab1e..90f7512034c 100644
--- a/diff.h
+++ b/diff.h
@@ -230,6 +230,12 @@ enum diff_submodule_format {
 	DIFF_SUBMODULE_INLINE_DIFF
 };
 
+enum diff_scope {
+	DIFF_SCOPE_NONE = 0,
+	DIFF_SCOPE_ALL,
+	DIFF_SCOPE_SPARSE,
+};
+
 /**
  * the set of options the calling program wants to affect the operation of
  * diffcore library with.
@@ -285,6 +291,9 @@ struct diff_options {
 	/* diff-filter bits */
 	unsigned int filter, filter_not;
 
+	/* diff sparse-checkout scope */
+	enum diff_scope scope;
+
 	int use_color;
 
 	/* Number of context lines to generate in patch output. */
@@ -696,4 +705,6 @@ void print_stat_summary(FILE *fp, int files,
 			int insertions, int deletions);
 void setup_diff_pager(struct diff_options *);
 
+int diff_path_in_sparse_checkout(const char *path);
+int diff_paths_in_sparse_checkout(const char *one, const char *two);
 #endif /* DIFF_H */
diff --git a/t/t4070-diff-sparse-checkout-scope.sh b/t/t4070-diff-sparse-checkout-scope.sh
new file mode 100644
index 00000000000..dca75a3308b
--- /dev/null
+++ b/t/t4070-diff-sparse-checkout-scope.sh
@@ -0,0 +1,469 @@
+#!/bin/sh
+
+test_description='diff sparse-checkout scope'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+
+test_expect_success 'setup' '
+	git init temp &&
+	(
+		cd temp &&
+		mkdir sub1 &&
+		mkdir sub2 &&
+		echo sub1/file1 >sub1/file1 &&
+		echo sub2/file2 >sub2/file2 &&
+		echo file1 >file1 &&
+		echo file2 >file2 &&
+		git add --all &&
+		git commit -m init &&
+		echo sub1/file1 >>sub1/file1 &&
+		echo sub1/file2 >>sub1/file2 &&
+		echo sub2/file1 >>sub2/file1 &&
+		echo sub2/file2 >>sub2/file2 &&
+		echo file1 >>file1 &&
+		echo file2 >>file2 &&
+		git add --all &&
+		git commit -m change1 &&
+		echo sub1/file1 >>sub1/file1 &&
+		echo sub1/file2 >>sub1/file2 &&
+		echo sub2/file1 >>sub2/file1 &&
+		echo sub2/file2 >>sub2/file2 &&
+		echo file1 >>file1 &&
+		echo file2 >>file2 &&
+		git add --all &&
+		git commit -m change2
+	)
+'
+
+reset_repo () {
+	rm -rf repo &&
+	git clone --no-checkout temp repo
+}
+
+reset_with_sparse_checkout() {
+	reset_repo &&
+	git -C repo sparse-checkout set $1 sub1 &&
+	git -C repo checkout
+}
+
+change_worktree_and_index() {
+	(
+		cd repo &&
+		mkdir sub2 sub3 &&
+		echo sub1/file3 >sub1/file3 &&
+		echo sub2/file3 >sub2/file3 &&
+		echo sub3/file3 >sub3/file3 &&
+		echo file3 >file3 &&
+		git add --all --sparse &&
+		echo sub1/file3 >>sub1/file3 &&
+		echo sub2/file3 >>sub2/file3 &&
+		echo sub3/file3 >>sub3/file3 &&
+		echo file3 >>file3
+	)
+}
+
+diff_scope() {
+	title=$1
+	need_change_worktree_and_index=$2
+	sparse_checkout_option=$3
+	scope_option=$4
+	expect=$5
+	shift 5
+	args=("$@")
+
+	test_expect_success "$title $sparse_checkout_option $scope_option" "
+		reset_with_sparse_checkout $sparse_checkout_option &&
+		if test \"$need_change_worktree_and_index\" = \"true\" ; then
+			change_worktree_and_index
+		fi &&
+		git -C repo diff $scope_option ${args[*]} >actual &&
+		if test -z \"$expect\" ; then
+			>expect
+		else
+			cat > expect <<-EOF
+$expect
+			EOF
+		fi &&
+		test_cmp expect actual
+	"
+}
+
+args=("--name-only" "HEAD" "HEAD~")
+diff_scope builtin_diff_tree false "--no-cone" "--scope=sparse" \
+"sub1/file1
+sub1/file2" "${args[@]}"
+
+diff_scope builtin_diff_tree false "--no-cone" "--scope=all" \
+"file1
+file2
+sub1/file1
+sub1/file2
+sub2/file1
+sub2/file2" "${args[@]}"
+
+diff_scope builtin_diff_tree false "--no-cone" "--no-scope" \
+"file1
+file2
+sub1/file1
+sub1/file2
+sub2/file1
+sub2/file2" "${args[@]}"
+
+diff_scope builtin_diff_tree false "--cone" "--scope=sparse" \
+"file1
+file2
+sub1/file1
+sub1/file2" "${args[@]}"
+
+diff_scope builtin_diff_tree false "--cone" "--scope=all" \
+"file1
+file2
+sub1/file1
+sub1/file2
+sub2/file1
+sub2/file2" "${args[@]}"
+
+diff_scope builtin_diff_tree false "--cone" "--no-scope" \
+"file1
+file2
+sub1/file1
+sub1/file2
+sub2/file1
+sub2/file2" "${args[@]}"
+
+args=("--name-only" "HEAD~")
+diff_scope builtin_diff_index true "--no-cone" "--scope=sparse" \
+"sub1/file1
+sub1/file2
+sub1/file3" "${args[@]}"
+
+diff_scope builtin_diff_index true "--no-cone" "--scope=all" \
+"file1
+file2
+file3
+sub1/file1
+sub1/file2
+sub1/file3
+sub2/file1
+sub2/file2
+sub2/file3
+sub3/file3" "${args[@]}"
+
+diff_scope builtin_diff_index true "--no-cone" "--no-scope" \
+"file1
+file2
+file3
+sub1/file1
+sub1/file2
+sub1/file3
+sub2/file1
+sub2/file2
+sub2/file3
+sub3/file3" "${args[@]}"
+
+diff_scope builtin_diff_index true "--cone" "--scope=sparse" \
+"file1
+file2
+file3
+sub1/file1
+sub1/file2
+sub1/file3" "${args[@]}"
+
+diff_scope builtin_diff_index true "--cone" "--scope=all" \
+"file1
+file2
+file3
+sub1/file1
+sub1/file2
+sub1/file3
+sub2/file1
+sub2/file2
+sub2/file3
+sub3/file3" "${args[@]}"
+
+diff_scope builtin_diff_index true "--cone" "--no-scope" \
+"file1
+file2
+file3
+sub1/file1
+sub1/file2
+sub1/file3
+sub2/file1
+sub2/file2
+sub2/file3
+sub3/file3" "${args[@]}"
+
+args=("--name-only" "file3" "sub1/" "sub2/")
+
+diff_scope builtin_diff_files true "--no-cone" "--scope=sparse" \
+"sub1/file3" "${args[@]}"
+
+diff_scope builtin_diff_files true "--no-cone" "--scope=all" \
+"file3
+sub1/file3
+sub2/file1
+sub2/file2
+sub2/file3" "${args[@]}"
+
+diff_scope builtin_diff_files true "--no-cone" "--no-scope" \
+"file3
+sub1/file3
+sub2/file3" "${args[@]}"
+
+diff_scope builtin_diff_files true "--cone" "--scope=sparse" \
+"file3
+sub1/file3" "${args[@]}"
+
+diff_scope builtin_diff_files true "--cone" "--scope=all" \
+"file3
+sub1/file3
+sub2/file1
+sub2/file2
+sub2/file3" "${args[@]}"
+
+diff_scope builtin_diff_files true "--cone" "--no-scope" \
+"file3
+sub1/file3
+sub2/file3" "${args[@]}"
+
+
+args=("--name-only" "HEAD~:sub2/file2" "sub1/file2")
+
+diff_scope builtin_diff_b_f true "--no-cone" "--scope=sparse" \
+"" "${args[@]}"
+
+diff_scope builtin_diff_b_f true "--no-cone" "--scope=all" \
+"sub1/file2" "${args[@]}"
+
+diff_scope builtin_diff_b_f true "--no-cone" "--no-scope" \
+"sub1/file2" "${args[@]}"
+
+args=("--name-only" "HEAD~:sub1/file1" "file3")
+
+diff_scope builtin_diff_b_f true "--cone" "--scope=sparse" \
+"file3" "${args[@]}"
+
+diff_scope builtin_diff_b_f true "--cone" "--scope=all" \
+"file3" "${args[@]}"
+
+diff_scope builtin_diff_b_f true "--cone" "--no-scope" \
+"file3" "${args[@]}"
+
+args=("--name-only" HEAD~:sub2/file2 HEAD:sub1/file2)
+
+diff_scope builtin_diff_blobs true "--no-cone" "--scope=sparse" \
+"" "${args[@]}"
+
+diff_scope builtin_diff_blobs true "--no-cone" "--scope=all" \
+"sub1/file2" "${args[@]}"
+
+diff_scope builtin_diff_blobs true "--no-cone" "--no-scope" \
+"sub1/file2" "${args[@]}"
+
+args=("--name-only" HEAD~:sub1/file1 HEAD:file2)
+
+diff_scope builtin_diff_blobs false "--cone" "--scope=sparse" \
+"file2" "${args[@]}"
+
+diff_scope builtin_diff_blobs false "--cone" "--scope=all" \
+"file2" "${args[@]}"
+
+diff_scope builtin_diff_blobs false "--cone" "--no-scope" \
+"file2" "${args[@]}"
+
+args=("--name-only" HEAD~2 HEAD~ HEAD)
+
+diff_scope builtin_diff_combined false "--no-cone" "--scope=sparse" \
+"sub1/file1
+sub1/file2" "${args[@]}"
+
+diff_scope builtin_diff_combined false "--no-cone" "--scope=all" \
+"file1
+file2
+sub1/file1
+sub1/file2
+sub2/file1
+sub2/file2" "${args[@]}"
+
+diff_scope builtin_diff_combined false "--no-cone" "--no-scope" \
+"file1
+file2
+sub1/file1
+sub1/file2
+sub2/file1
+sub2/file2" "${args[@]}"
+
+diff_scope builtin_diff_combined false "--cone" "--scope=sparse" \
+"file1
+file2
+sub1/file1
+sub1/file2" "${args[@]}"
+
+diff_scope builtin_diff_combined false "--cone" "--scope=all" \
+"file1
+file2
+sub1/file1
+sub1/file2
+sub2/file1
+sub2/file2" "${args[@]}"
+
+diff_scope builtin_diff_combined false "--cone" "--no-scope" \
+"file1
+file2
+sub1/file1
+sub1/file2
+sub2/file1
+sub2/file2" "${args[@]}"
+
+test_expect_success 'diff_no_index --no-cone, --scope=sparse' '
+	reset_with_sparse_checkout --no-cone &&
+	(
+		cd repo &&
+		mkdir sub3 &&
+		echo sub3/file3 >sub3/file3
+	) &&
+	test_expect_code 1 git -C repo diff --no-index --name-only --scope=sparse sub1/file1 sub1/file2 >actual &&
+	cat > expect <<-EOF &&
+sub1/file2
+	EOF
+	test_expect_code 1 git -C repo diff --no-index --name-only --scope=sparse sub1/file1 sub3/file3 >actual &&
+	>expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'diff_no_index --no-cone, --scope=all' '
+	reset_with_sparse_checkout --no-cone &&
+	(
+		cd repo &&
+		mkdir sub3 &&
+		echo sub3/file3 >sub3/file3
+	) &&
+	test_expect_code 1 git -C repo diff --no-index --name-only --scope=all sub1/file1 sub1/file2 >actual &&
+	cat > expect <<-EOF &&
+sub1/file2
+	EOF
+	test_expect_code 1 git -C repo diff --no-index --name-only --scope=all sub1/file1 sub3/file3 >actual &&
+	cat > expect <<-EOF &&
+sub3/file3
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'diff_no_index --no-cone, --no-scope' '
+	reset_with_sparse_checkout --no-cone &&
+	(
+		cd repo &&
+		mkdir sub3 &&
+		echo sub3/file3 >sub3/file3
+	) &&
+	test_expect_code 1 git -C repo diff --no-index --name-only --no-scope sub1/file1 sub1/file2 >actual &&
+	cat > expect <<-EOF &&
+sub1/file2
+	EOF
+	test_expect_code 1 git -C repo diff --no-index --name-only --no-scope sub1/file1 sub3/file3 >actual &&
+	cat > expect <<-EOF &&
+sub3/file3
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'diff_no_index --cone, --scope=sparse' '
+	reset_with_sparse_checkout --cone &&
+	(
+		cd repo &&
+		echo file3 >file3 &&
+		mkdir sub3 &&
+		echo sub3/file3 >sub3/file3
+	) &&
+	test_expect_code 1 git -C repo diff --no-index --name-only --scope=sparse sub1/file1 file3 >actual &&
+	cat > expect <<-EOF &&
+file3
+	EOF
+	test_expect_code 1 git -C repo diff --no-index --name-only --scope=sparse sub1/file1 sub3/file3 >actual &&
+	>expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'diff_no_index --cone, --scope=all' '
+	reset_with_sparse_checkout --cone &&
+	(
+		cd repo &&
+		echo file3 >file3 &&
+		mkdir sub3 &&
+		echo sub3/file3 >sub3/file3
+	) &&
+	test_expect_code 1 git -C repo diff --no-index --name-only --scope=all sub1/file1 file3 >actual &&
+	cat > expect <<-EOF &&
+file3
+	EOF
+	test_expect_code 1 git -C repo diff --no-index --name-only --scope=all sub1/file1 sub3/file3 >actual &&
+	cat > expect <<-EOF &&
+sub3/file3
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'diff_no_index --cone, --no-scope' '
+	reset_with_sparse_checkout --cone &&
+	(
+		cd repo &&
+		echo file3 >file3 &&
+		mkdir sub3 &&
+		echo sub3/file3 >sub3/file3
+	) &&
+	test_expect_code 1 git -C repo diff --no-index --name-only --no-scope sub1/file1 file3 >actual &&
+	cat > expect <<-EOF &&
+file3
+	EOF
+	test_expect_code 1 git -C repo diff --no-index --name-only --no-scope sub1/file1 sub3/file3 >actual &&
+	cat > expect <<-EOF &&
+sub3/file3
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'diff scope config sparse' '
+	reset_with_sparse_checkout --cone &&
+	git -C repo -c diff.scope=sparse diff --name-only HEAD~ >actual &&
+	cat > expect <<-EOF &&
+file1
+file2
+sub1/file1
+sub1/file2
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'diff scope config all' '
+	reset_with_sparse_checkout --cone &&
+	git -C repo -c diff.scope=all diff --name-only HEAD~ >actual &&
+	cat > expect <<-EOF &&
+file1
+file2
+sub1/file1
+sub1/file2
+sub2/file1
+sub2/file2
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'diff scope config override by option' '
+	reset_with_sparse_checkout --cone &&
+	git -C repo -c diff.scope=sparse diff --name-only --scope=all HEAD~ >actual &&
+	cat > expect <<-EOF &&
+file1
+file2
+sub1/file1
+sub1/file2
+sub2/file1
+sub2/file2
+	EOF
+	test_cmp expect actual
+'
+
+test_done
diff --git a/tree-diff.c b/tree-diff.c
index 69031d7cbae..67f99c8e4df 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -76,6 +76,11 @@ static int tree_entry_pathcmp(struct tree_desc *t1, struct tree_desc *t2)
 static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_diff_path *p)
 {
 	struct combine_diff_parent *p0 = &p->parent[0];
+
+	if (opt->scope == DIFF_SCOPE_SPARSE &&
+	    !diff_path_in_sparse_checkout(p->path))
+		return 0;
+
 	if (p->mode && p0->mode) {
 		opt->change(opt, p0->mode, p->mode, &p0->oid, &p->oid,
 			1, 1, p->path, 0, 0);

base-commit: 63bba4fdd86d80ef061c449daa97a981a9be0792
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] [RFC] diff: introduce scope option
  2022-10-31  4:11 [PATCH] [RFC] diff: introduce scope option ZheNing Hu via GitGitGadget
@ 2022-11-01  1:34 ` Taylor Blau
  2022-11-01  2:13   ` ZheNing Hu
  2022-11-01  5:18 ` Elijah Newren
  2022-11-25  2:45 ` [PATCH v2] [RFC] diff: introduce --scope option ZheNing Hu via GitGitGadget
  2 siblings, 1 reply; 11+ messages in thread
From: Taylor Blau @ 2022-11-01  1:34 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: git, Christian Couder, Ævar Arnfjörð Bjarmason,
	Jeff King, Junio C Hamano, Derrick Stolee, Johannes Schindelin,
	Victoria Dye, Elijah Newren, Emily Shaffer,
	Matheus Tavares Bernardino, Shaoxuan Yuan, ZheNing Hu

On Mon, Oct 31, 2022 at 04:11:52AM +0000, ZheNing Hu via GitGitGadget wrote:
>  t/t4070-diff-sparse-checkout-scope.sh | 469 ++++++++++++++++++++++++++

It looks like this test is non-executable, leading to the following
error:

    ~/s/git [nand] (zh/diff--scope) $ make test
        SUBDIR templates
    make -C t/ all
    make[1]: Entering directory '/home/ttaylorr/src/git/t'
    rm -f -r 'test-results'
    non-executable tests: t4070-diff-sparse-checkout-scope.sh
    make[1]: *** [Makefile:120: test-lint-executable] Error 1
    make[1]: *** Waiting for unfinished jobs....
    make[1]: Leaving directory '/home/ttaylorr/src/git/t'
    make: *** [Makefile:3083: test] Error 2

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] [RFC] diff: introduce scope option
  2022-11-01  1:34 ` Taylor Blau
@ 2022-11-01  2:13   ` ZheNing Hu
  0 siblings, 0 replies; 11+ messages in thread
From: ZheNing Hu @ 2022-11-01  2:13 UTC (permalink / raw)
  To: Taylor Blau
  Cc: ZheNing Hu via GitGitGadget, git, Christian Couder,
	Ævar Arnfjörð Bjarmason, Jeff King, Junio C Hamano,
	Derrick Stolee, Johannes Schindelin, Victoria Dye, Elijah Newren,
	Emily Shaffer, Matheus Tavares Bernardino, Shaoxuan Yuan

Taylor Blau <me@ttaylorr.com> 于2022年11月1日周二 09:34写道:
>
> On Mon, Oct 31, 2022 at 04:11:52AM +0000, ZheNing Hu via GitGitGadget wrote:
> >  t/t4070-diff-sparse-checkout-scope.sh | 469 ++++++++++++++++++++++++++
>
> It looks like this test is non-executable, leading to the following
> error:
>
>     ~/s/git [nand] (zh/diff--scope) $ make test
>         SUBDIR templates
>     make -C t/ all
>     make[1]: Entering directory '/home/ttaylorr/src/git/t'
>     rm -f -r 'test-results'
>     non-executable tests: t4070-diff-sparse-checkout-scope.sh
>     make[1]: *** [Makefile:120: test-lint-executable] Error 1
>     make[1]: *** Waiting for unfinished jobs....
>     make[1]: Leaving directory '/home/ttaylorr/src/git/t'
>     make: *** [Makefile:3083: test] Error 2
>

Thanks for pointing out that I had overlooked this problem.

> Thanks,
> Taylor

ZheNing Hu

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] [RFC] diff: introduce scope option
  2022-10-31  4:11 [PATCH] [RFC] diff: introduce scope option ZheNing Hu via GitGitGadget
  2022-11-01  1:34 ` Taylor Blau
@ 2022-11-01  5:18 ` Elijah Newren
  2022-11-06  2:11   ` ZheNing Hu
  2022-11-25  2:45 ` [PATCH v2] [RFC] diff: introduce --scope option ZheNing Hu via GitGitGadget
  2 siblings, 1 reply; 11+ messages in thread
From: Elijah Newren @ 2022-11-01  5:18 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: git, Christian Couder, Ævar Arnfjörð Bjarmason,
	Jeff King, Junio C Hamano, Derrick Stolee, Johannes Schindelin,
	Victoria Dye, Emily Shaffer, Matheus Tavares Bernardino,
	Shaoxuan Yuan, ZheNing Hu

Hi,

On Sun, Oct 30, 2022 at 9:11 PM ZheNing Hu via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: ZheNing Hu <adlternative@gmail.com>
>
> When we use sparse-checkout, we often want the set of files
> that some commands operate on to be restricted to the
> sparse-checkout specification.

It seems a bit premature to send this, when the guideline document[*]
detailing how these options should work is still in the "Needs Review"
state.  I know, it's been waiting for a while, but it's a _long_
document.

[*] https://lore.kernel.org/git/pull.1367.v3.git.1665269538608.gitgitgadget@gmail.com/

> So introduce the `--scope` option to git diff, which have two
> value: "sparse" and "all". "sparse" mean that diff is performed
> restrict to paths which matching sparse-checkout specification,
> "all" mean that diff is performed regardless of whether the path
> meets the sparse-checkout specification.

The wording probably needs some care to reflect the fact that it only
affects cases where either --cached or revisions are passed.  In
particular, your wording for "all" suggests behavior very different
from today, whereas "all" is probably best thought of as today's
current behavior.  For example, a plain `git diff` without --cached or
revisions, should be unaffected by either of these flags.

> `--no-scope` is the default
> option for now.

What does --no-scope even mean?  You didn't explain it, and I don't
see how it could make sense.  We explicitly avoided a `--no-` prefix
by allowing the --scope option to take a value.  I don't think there
should be a --no-scope option.

> Add `diff.scope={sparse, all}` config, which can also have the same
> capabilities as `--scope`, and it will be covered by `--scope` option.

This is not what we want.  The high level usecases should not need to
be configured per-command.  There should be a config option which
reflects the high level use cases (e.g. sparse.scope) and then all
relevant commands (diff, log, grep, etc.) can key off it.

> Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> ---
>     [RFC] diff: introduce scope option
>
>     In [1], we discovered that users working on different sparse-checkout
>     specification may download unnecessary blobs from each other's
>     specification in collaboration. In [2] Junio suggested that maybe we can
>     restrict some git command's filespec in sparse-checkout specification to
>     elegantly solve this problem above. In [3]: Newren and Derrick Stolee
>     prefer to name the option --scope={sparse, all}.
>
>     So this patch is attempt to do this thing on git diff:
>
>     v1:
>
>      1. add --restrict option to git diff, which restrict diff filespec in
>         sparse-checkout specification. [4] v2.
>      2. rename --restrict to --scope={sparse, all}, support --no-scope.
>      3. add config: diff.scope={sparse,all}.
>
>     Unresolved work:
>
>      1. how to properly pass this --scope={sparse, all} to other commands
>         like git log, git format-patch, etc.

log & grep should accept a similar flag.  format-patch should not, and
should ignore any config in this area.

>      2. how to set the default value of scope for different diff commands.

I don't understand this.

>     [1]:
>     https://lore.kernel.org/git/CAOLTT8SHo66kGbvWr=+LQ9UVd1NHgqGGEYK2qq6==QgRCgLZqQ@mail.gmail.com/
>     [2]: https://lore.kernel.org/git/xmqqzgeqw0sy.fsf@gitster.g/ [3]:
>     https://lore.kernel.org/git/07a25d48-e364-0d9b-6ffa-41a5984eb5db@github.com/
>     [4]:
>     https://lore.kernel.org/git/pull.1368.git.1664036052741.gitgitgadget@gmail.com/
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1398%2Fadlternative%2Fzh%2Fdiff-scope-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1398/adlternative/zh/diff-scope-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1398
>
>  Documentation/config/diff.txt         |  12 +
>  Documentation/diff-options.txt        |  18 +
>  builtin/diff.c                        |   4 +
>  diff-lib.c                            |  36 +-
>  diff-no-index.c                       |   4 +
>  diff.c                                |  39 +++
>  diff.h                                |  11 +
>  t/t4070-diff-sparse-checkout-scope.sh | 469 ++++++++++++++++++++++++++
>  tree-diff.c                           |   5 +
>  9 files changed, 597 insertions(+), 1 deletion(-)
>  create mode 100644 t/t4070-diff-sparse-checkout-scope.sh
>
> diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt
> index 35a7bf86d77..52707e1b2d6 100644
> --- a/Documentation/config/diff.txt
> +++ b/Documentation/config/diff.txt
> @@ -201,6 +201,18 @@ diff.algorithm::
>  --
>  +
>
> +diff.scope::
> +       Choose diff scope. The variants are as follows:
> ++
> +--
> +`sparse`;;
> +       Restrict diff paths to those matching sparse-checkout specification.
> +`all`;;
> +       Without restriction, diff is performed regardless of whether the path
> +       meets the sparse-checkout specification.

As noted above, this is the wrong level to specify things.  The
description for "all" is misleading as well and suggests something
other than "behavior B" from the direction document.

>  diff.wsErrorHighlight::
>         Highlight whitespace errors in the `context`, `old` or `new`
>         lines of the diff.  Multiple values are separated by comma,
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index 3674ac48e92..04bf83e8be1 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -195,6 +195,24 @@ For instance, if you configured the `diff.algorithm` variable to a
>  non-default value and want to use the default one, then you
>  have to use `--diff-algorithm=default` option.
>
> +ifndef::git-format-patch[]
> +ifndef::git-log[]
> +
> +--scope={sparse|all}::
> +       Choose diff scope. The variants are as follows:
> ++
> +--
> +`--sparse`;;
> +       Restrict diff paths to those matching sparse-checkout specification.
> +`--all`;;
> +       Without restriction, diff is performed regardless of whether the path
> +       meets the sparse-checkout specification.
> +--
> ++
> +
> +endif::git-log[]
> +endif::git-format-patch[]

What about diff-files, diff-index, diff-tree, and show?

> +
>  --stat[=<width>[,<name-width>[,<count>]]]::
>         Generate a diffstat. By default, as much space as necessary
>         will be used for the filename part, and the rest for the graph
> diff --git a/builtin/diff.c b/builtin/diff.c
> index 854d2c5a5c4..6b450f7184c 100644
> --- a/builtin/diff.c
> +++ b/builtin/diff.c
> @@ -54,6 +54,10 @@ static void stuff_change(struct diff_options *opt,
>             oideq(old_oid, new_oid) && (old_mode == new_mode))
>                 return;
>
> +       if (opt->scope == DIFF_SCOPE_SPARSE &&
> +           !diff_paths_in_sparse_checkout(old_path, new_path))
> +               return;

This can't be right.
   git diff c231e0f26fe9b2ea9ec46aa68ff95ba984ce592e
72d42bd856228c15f702fa3c353432f4f1defe03
(to directly diff two known blobs) will go through this function, with
old_path == c231e0f26fe9b2ea9ec46aa68ff95ba984ce592e and new_path ==
72d42bd856228c15f702fa3c353432f4f1defe03.  But those aren't real
paths, and sparse-checkout should not restrict what is shown in those
cases.

> +
>         if (opt->flags.reverse_diff) {
>                 SWAP(old_mode, new_mode);
>                 SWAP(old_oid, new_oid);
> diff --git a/diff-lib.c b/diff-lib.c
> index 2edea41a234..a3381f2e0ff 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -88,6 +88,22 @@ static int match_stat_with_submodule(struct diff_options *diffopt,
>         return changed;
>  }
>
> +int diff_path_in_sparse_checkout(const char *path) {
> +       if (core_sparse_checkout_cone)
> +               return path_in_cone_mode_sparse_checkout(path, the_repository->index);
> +       else
> +               return path_in_sparse_checkout(path, the_repository->index);
> +}

This says we are including the path if it matches the sparsity
patterns.  Thus, we have to be careful when we use this function,
because the relevant paths are ones that match the sparsity
specification.  The sparsity specification will always match the
sparsity patterns when diffing two commits, but when either the index
or the working tree is part of the diff, the sparsity specification
*might* be temporarily expanded.

> +int diff_paths_in_sparse_checkout(const char *one, const char*two) {
> +       if (one == two || !strcmp(one, two))
> +               return diff_path_in_sparse_checkout(one);
> +       else
> +               return diff_path_in_sparse_checkout(one) &&
> +                      diff_path_in_sparse_checkout(two);

Why && rather than || ?

> +}
> +
> +
>  int run_diff_files(struct rev_info *revs, unsigned int option)
>  {
>         int entries, i;
> @@ -113,6 +129,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>
>                 if (diff_can_quit_early(&revs->diffopt))
>                         break;
> +               if (revs->diffopt.scope == DIFF_SCOPE_SPARSE &&
> +                   !diff_path_in_sparse_checkout(ce->name))
> +                       continue;

Here you've cut off the possibility of showing diffs for anything
outside the sparsity patterns, which is a mistake.  We need to handle
a temporarily expanded sparse specification too.

>                 if (!ce_path_match(istate, ce, &revs->prune_data, NULL))
>                         continue;
> @@ -202,7 +221,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>                                 continue;
>                 }
>
> -               if (ce_uptodate(ce) || ce_skip_worktree(ce))
> +               if (ce_uptodate(ce) ||
> +                   (revs->diffopt.scope != DIFF_SCOPE_ALL && ce_skip_worktree(ce)))
>                         continue;

Here you make --scope=all show files even if they are skip-worktree,
making them appear to have been deleted.  I called out your
description earlier as potentially misleading, because it could imply
this behavior.  It looks like you were consistent with the description
and implementation, it just doesn't match what we want.

>                 /*
> @@ -439,6 +459,20 @@ static void do_oneway_diff(struct unpack_trees_options *o,

do_oneway_diff is for cases where we are diffing against the index...

>                         return; /* nothing to diff.. */
>         }
>
> +       if (revs->diffopt.scope == DIFF_SCOPE_SPARSE) {
> +               if (idx && tree) {
> +                       if (!diff_paths_in_sparse_checkout(idx->name, tree->name))
> +                               return;
> +               } else if (idx) {
> +                       if (!diff_path_in_sparse_checkout(idx->name))
> +                               return;
> +               } else if (tree) {
> +                       if (!diff_path_in_sparse_checkout(tree->name))
> +                               return;
> +               } else
> +                       return;
> +       }

...and you again mistakenly only compare to the sparsity patterns
instead of the sparse specification.

> +
>         /* if the entry is not checked out, don't examine work tree */
>         cached = o->index_only ||
>                 (idx && ((idx->ce_flags & CE_VALID) || ce_skip_worktree(idx)));



> diff --git a/diff-no-index.c b/diff-no-index.c
> index 18edbdf4b59..ea94a104ea4 100644
> --- a/diff-no-index.c
> +++ b/diff-no-index.c
> @@ -281,6 +281,10 @@ int diff_no_index(struct rev_info *revs,
>
>         fixup_paths(paths, &replacement);
>
> +       if (revs->diffopt.scope == DIFF_SCOPE_SPARSE &&
> +           !diff_paths_in_sparse_checkout(paths[0], paths[1]))
> +               goto out;

--no-index means we're diffing two files that are not tracked, or at
least treating them as not tracked.  sparse-checkout should not affect
such files.

> +
>         revs->diffopt.skip_stat_unmatch = 1;
>         if (!revs->diffopt.output_format)
>                 revs->diffopt.output_format = DIFF_FORMAT_PATCH;
> diff --git a/diff.c b/diff.c
> index 285d6e2d575..9de4044ae05 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -48,6 +48,7 @@ static int diff_interhunk_context_default;
>  static const char *diff_word_regex_cfg;
>  static const char *external_diff_cmd_cfg;
>  static const char *diff_order_file_cfg;
> +static const char *external_diff_scope_cfg;
>  int diff_auto_refresh_index = 1;
>  static int diff_mnemonic_prefix;
>  static int diff_no_prefix;
> @@ -57,6 +58,7 @@ static int diff_dirstat_permille_default = 30;
>  static struct diff_options default_diff_options;
>  static long diff_algorithm;
>  static unsigned ws_error_highlight_default = WSEH_NEW;
> +static enum diff_scope external_diff_scope;

Why is this called "external"?

>  static char diff_colors[][COLOR_MAXLEN] = {
>         GIT_COLOR_RESET,
> @@ -423,6 +425,16 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
>                 return 0;
>         }
>
> +       if (!strcmp(var, "diff.scope")) {
> +               git_config_string(&external_diff_scope_cfg, var, value);
> +               if (!strcmp(value, "all"))
> +                       external_diff_scope = DIFF_SCOPE_ALL;
> +               else if (!strcmp(value, "sparse"))
> +                       external_diff_scope = DIFF_SCOPE_SPARSE;
> +               else
> +                       return -1;
> +       }
> +
>         if (git_color_config(var, value, cb) < 0)
>                 return -1;
>
> @@ -4663,6 +4675,7 @@ void repo_diff_setup(struct repository *r, struct diff_options *options)
>
>         options->color_moved = diff_color_moved_default;
>         options->color_moved_ws_handling = diff_color_moved_ws_default;
> +       options->scope = external_diff_scope;
>
>         prep_parse_options(options);
>  }
> @@ -4914,6 +4927,29 @@ static int parse_dirstat_opt(struct diff_options *options, const char *params)
>         return 1;
>  }
>
> +static int diff_opt_diff_scope(const struct option *option,
> +                               const char *optarg, int unset)
> +{
> +       struct diff_options *opt = option->value;
> +
> +       if (unset) {
> +               opt->scope = DIFF_SCOPE_NONE;

I think we should instead have a
    BUG_ON_OPT_NEG(unset)
or, even better, a
    BUG_ON_OPT_NEG_NOARG(unset, optarg)
at the beginning of this function...

> +       } else if (optarg) {

...which would also allow you to drop this if and dedent the rest of
the function.

> +               if (!strcmp(optarg, "all")) {
> +                       if (core_apply_sparse_checkout) {
> +                               opt->scope = DIFF_SCOPE_ALL;
> +                       }
> +               } else if (!strcmp(optarg, "sparse")) {
> +                       if (core_apply_sparse_checkout) {
> +                               opt->scope = DIFF_SCOPE_SPARSE;
> +                       }

If core_apply_sparse_checkout is false, should we perhaps throw an
error instead of just silently ignoring the option the user passed?

> +               } else
> +                       return error(_("invalid --scope value: %s"), optarg);
> +       }

As written with no follow-on else clause here, wouldn't this silently
accept "--scope" without an "=<something>" argument without an error
and without properly initializing opt->scope?

> +
> +       return 0;
> +}
> +
>  static int diff_opt_diff_filter(const struct option *option,
>                                 const char *optarg, int unset)
>  {
> @@ -5683,6 +5719,9 @@ static void prep_parse_options(struct diff_options *options)
>                 OPT_CALLBACK_F(0, "diff-filter", options, N_("[(A|C|D|M|R|T|U|X|B)...[*]]"),
>                                N_("select files by diff type"),
>                                PARSE_OPT_NONEG, diff_opt_diff_filter),
> +               OPT_CALLBACK_F(0, "scope", options, N_("[sparse|all]"),
> +                              N_("choose diff scope"),

maybe "choose diff scope in sparse checkouts"?

> +                              PARSE_OPT_OPTARG, diff_opt_diff_scope),
>                 { OPTION_CALLBACK, 0, "output", options, N_("<file>"),
>                   N_("output to a specific file"),
>                   PARSE_OPT_NONEG, NULL, 0, diff_opt_output },
> diff --git a/diff.h b/diff.h
> index 8ae18e5ab1e..90f7512034c 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -230,6 +230,12 @@ enum diff_submodule_format {
>         DIFF_SUBMODULE_INLINE_DIFF
>  };
>
> +enum diff_scope {
> +       DIFF_SCOPE_NONE = 0,
> +       DIFF_SCOPE_ALL,
> +       DIFF_SCOPE_SPARSE,
> +};
> +
>  /**
>   * the set of options the calling program wants to affect the operation of
>   * diffcore library with.
> @@ -285,6 +291,9 @@ struct diff_options {
>         /* diff-filter bits */
>         unsigned int filter, filter_not;
>
> +       /* diff sparse-checkout scope */
> +       enum diff_scope scope;
> +
>         int use_color;
>
>         /* Number of context lines to generate in patch output. */
> @@ -696,4 +705,6 @@ void print_stat_summary(FILE *fp, int files,
>                         int insertions, int deletions);
>  void setup_diff_pager(struct diff_options *);
>
> +int diff_path_in_sparse_checkout(const char *path);
> +int diff_paths_in_sparse_checkout(const char *one, const char *two);
>  #endif /* DIFF_H */
> diff --git a/t/t4070-diff-sparse-checkout-scope.sh b/t/t4070-diff-sparse-checkout-scope.sh
> new file mode 100644

This needs to be fixed.

> index 00000000000..dca75a3308b
> --- /dev/null
> +++ b/t/t4070-diff-sparse-checkout-scope.sh
> @@ -0,0 +1,469 @@
> +#!/bin/sh
> +
> +test_description='diff sparse-checkout scope'
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> +
> +. ./test-lib.sh
> +
> +
> +test_expect_success 'setup' '
> +       git init temp &&
> +       (
> +               cd temp &&
> +               mkdir sub1 &&
> +               mkdir sub2 &&
> +               echo sub1/file1 >sub1/file1 &&
> +               echo sub2/file2 >sub2/file2 &&
> +               echo file1 >file1 &&
> +               echo file2 >file2 &&
> +               git add --all &&
> +               git commit -m init &&
> +               echo sub1/file1 >>sub1/file1 &&
> +               echo sub1/file2 >>sub1/file2 &&
> +               echo sub2/file1 >>sub2/file1 &&
> +               echo sub2/file2 >>sub2/file2 &&
> +               echo file1 >>file1 &&
> +               echo file2 >>file2 &&
> +               git add --all &&
> +               git commit -m change1 &&
> +               echo sub1/file1 >>sub1/file1 &&
> +               echo sub1/file2 >>sub1/file2 &&
> +               echo sub2/file1 >>sub2/file1 &&
> +               echo sub2/file2 >>sub2/file2 &&
> +               echo file1 >>file1 &&
> +               echo file2 >>file2 &&
> +               git add --all &&
> +               git commit -m change2
> +       )
> +'
> +
> +reset_repo () {
> +       rm -rf repo &&
> +       git clone --no-checkout temp repo

Why --no-checkout rather than say --sparse?

> +}
> +
> +reset_with_sparse_checkout() {
> +       reset_repo &&
> +       git -C repo sparse-checkout set $1 sub1 &&
> +       git -C repo checkout

Fixing the above would let us get rid of this really weird extra
checkout command too.

> +}
> +
> +change_worktree_and_index() {
> +       (
> +               cd repo &&
> +               mkdir sub2 sub3 &&
> +               echo sub1/file3 >sub1/file3 &&
> +               echo sub2/file3 >sub2/file3 &&
> +               echo sub3/file3 >sub3/file3 &&
> +               echo file3 >file3 &&
> +               git add --all --sparse &&
> +               echo sub1/file3 >>sub1/file3 &&
> +               echo sub2/file3 >>sub2/file3 &&
> +               echo sub3/file3 >>sub3/file3 &&
> +               echo file3 >>file3
> +       )
> +}

It would be nice to modify different paths in the working tree and
index, to see if we can handle cases where the sparse specification
does not match the sparsity patterns better.  (So, modify files inside
and outside the sparsity patterns, stage those changes, and then do a
`git sparse-checkout reapply` to make the files outside the sparsity
patterns disappear from the working tree...then modify different files
in the working tree both inside and outside the sparsity patterns.
And also remove some file that matches the sparsity patterns and
manually mark it as SKIP_WORKTREE.)  That'd give us much better
coverage.

> +
> +diff_scope() {
> +       title=$1
> +       need_change_worktree_and_index=$2
> +       sparse_checkout_option=$3
> +       scope_option=$4
> +       expect=$5
> +       shift 5
> +       args=("$@")
> +
> +       test_expect_success "$title $sparse_checkout_option $scope_option" "
> +               reset_with_sparse_checkout $sparse_checkout_option &&
> +               if test \"$need_change_worktree_and_index\" = \"true\" ; then
> +                       change_worktree_and_index
> +               fi &&
> +               git -C repo diff $scope_option ${args[*]} >actual &&
> +               if test -z \"$expect\" ; then
> +                       >expect
> +               else
> +                       cat > expect <<-EOF
> +$expect
> +                       EOF
> +               fi &&
> +               test_cmp expect actual
> +       "
> +}
> +
> +args=("--name-only" "HEAD" "HEAD~")
> +diff_scope builtin_diff_tree false "--no-cone" "--scope=sparse" \
> +"sub1/file1
> +sub1/file2" "${args[@]}"
> +
> +diff_scope builtin_diff_tree false "--no-cone" "--scope=all" \
> +"file1
> +file2
> +sub1/file1
> +sub1/file2
> +sub2/file1
> +sub2/file2" "${args[@]}"
> +
> +diff_scope builtin_diff_tree false "--no-cone" "--no-scope" \
> +"file1
> +file2
> +sub1/file1
> +sub1/file2
> +sub2/file1
> +sub2/file2" "${args[@]}"
> +
> +diff_scope builtin_diff_tree false "--cone" "--scope=sparse" \
> +"file1
> +file2
> +sub1/file1
> +sub1/file2" "${args[@]}"
> +
> +diff_scope builtin_diff_tree false "--cone" "--scope=all" \
> +"file1
> +file2
> +sub1/file1
> +sub1/file2
> +sub2/file1
> +sub2/file2" "${args[@]}"
> +
> +diff_scope builtin_diff_tree false "--cone" "--no-scope" \
> +"file1
> +file2
> +sub1/file1
> +sub1/file2
> +sub2/file1
> +sub2/file2" "${args[@]}"
> +
> +args=("--name-only" "HEAD~")
> +diff_scope builtin_diff_index true "--no-cone" "--scope=sparse" \
> +"sub1/file1
> +sub1/file2
> +sub1/file3" "${args[@]}"

Here's a good example where the testcase is doing the wrong thing.
The expected answer here would also include file3, sub2/file3, and
sub3/file3.

> +
> +diff_scope builtin_diff_index true "--no-cone" "--scope=all" \
> +"file1
> +file2
> +file3
> +sub1/file1
> +sub1/file2
> +sub1/file3
> +sub2/file1
> +sub2/file2
> +sub2/file3
> +sub3/file3" "${args[@]}"
> +
> +diff_scope builtin_diff_index true "--no-cone" "--no-scope" \
> +"file1
> +file2
> +file3
> +sub1/file1
> +sub1/file2
> +sub1/file3
> +sub2/file1
> +sub2/file2
> +sub2/file3
> +sub3/file3" "${args[@]}"
> +
> +diff_scope builtin_diff_index true "--cone" "--scope=sparse" \
> +"file1
> +file2
> +file3
> +sub1/file1
> +sub1/file2
> +sub1/file3" "${args[@]}"

This is also wrong; it's missing sub2/file3 and sub3/file3.

> +
> +diff_scope builtin_diff_index true "--cone" "--scope=all" \
> +"file1
> +file2
> +file3
> +sub1/file1
> +sub1/file2
> +sub1/file3
> +sub2/file1
> +sub2/file2
> +sub2/file3
> +sub3/file3" "${args[@]}"
> +
> +diff_scope builtin_diff_index true "--cone" "--no-scope" \
> +"file1
> +file2
> +file3
> +sub1/file1
> +sub1/file2
> +sub1/file3
> +sub2/file1
> +sub2/file2
> +sub2/file3
> +sub3/file3" "${args[@]}"
> +
> +args=("--name-only" "file3" "sub1/" "sub2/")
> +
> +diff_scope builtin_diff_files true "--no-cone" "--scope=sparse" \
> +"sub1/file3" "${args[@]}"

This should also include file3, sub2/file3, and sub3/file3.
`--scope=` should not affect diff output at all if neither --cached
nor revision arguments are supplied.

> +
> +diff_scope builtin_diff_files true "--no-cone" "--scope=all" \
> +"file3
> +sub1/file3
> +sub2/file1
> +sub2/file2
> +sub2/file3" "${args[@]}"

This is wrong due to including too much; it should not include
sub2/file1 or sub2/file2 (it is only including those because it is
showing them as deleted, when they are not deleted but are
SKIP_WORKTREE).

I think I'm going to stop reviewing here.  I'm probably just going to
keep repeating the same issues I identified earlier if I continue.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] [RFC] diff: introduce scope option
  2022-11-01  5:18 ` Elijah Newren
@ 2022-11-06  2:11   ` ZheNing Hu
  2022-11-06  6:58     ` Elijah Newren
  0 siblings, 1 reply; 11+ messages in thread
From: ZheNing Hu @ 2022-11-06  2:11 UTC (permalink / raw)
  To: Elijah Newren
  Cc: ZheNing Hu via GitGitGadget, git, Christian Couder,
	Ævar Arnfjörð Bjarmason, Jeff King, Junio C Hamano,
	Derrick Stolee, Johannes Schindelin, Victoria Dye, Emily Shaffer,
	Matheus Tavares Bernardino, Shaoxuan Yuan

   inHi,

Elijah Newren <newren@gmail.com> 于2022年11月1日周二 13:18写道:
>
> Hi,
>
> On Sun, Oct 30, 2022 at 9:11 PM ZheNing Hu via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > When we use sparse-checkout, we often want the set of files
> > that some commands operate on to be restricted to the
> > sparse-checkout specification.
>
> It seems a bit premature to send this, when the guideline document[*]
> detailing how these options should work is still in the "Needs Review"
> state.  I know, it's been waiting for a while, but it's a _long_
> document.
>
> [*] https://lore.kernel.org/git/pull.1367.v3.git.1665269538608.gitgitgadget@gmail.com/
>

Fine, I just want to start trying to experiment with this feature in
git-diff earlier,
and I can wait for the sparse-checkout.txt documentation to finish
first if needed :)

> > So introduce the `--scope` option to git diff, which have two
> > value: "sparse" and "all". "sparse" mean that diff is performed
> > restrict to paths which matching sparse-checkout specification,
> > "all" mean that diff is performed regardless of whether the path
> > meets the sparse-checkout specification.
>
> The wording probably needs some care to reflect the fact that it only
> affects cases where either --cached or revisions are passed.  In
> particular, your wording for "all" suggests behavior very different
> from today, whereas "all" is probably best thought of as today's
> current behavior.  For example, a plain `git diff` without --cached or
> revisions, should be unaffected by either of these flags.
>

Yes, after re-reading your sparse-checkout.txt patch, I realized that I
misinterpreted "--scope=sparse" as sparse patterns instead of sparse
specification, and misinterpreted "-scope=all" as diff on all files.

> > `--no-scope` is the default
> > option for now.
>
> What does --no-scope even mean?  You didn't explain it, and I don't
> see how it could make sense.  We explicitly avoided a `--no-` prefix
> by allowing the --scope option to take a value.  I don't think there
> should be a --no-scope option.
>

I think the “--no-scope” here does nothing, as if it were unaffected by scope
(just like correctly "--scope=all", right?)

> > Add `diff.scope={sparse, all}` config, which can also have the same
> > capabilities as `--scope`, and it will be covered by `--scope` option.
>
> This is not what we want.  The high level usecases should not need to
> be configured per-command.  There should be a config option which
> reflects the high level use cases (e.g. sparse.scope) and then all
> relevant commands (diff, log, grep, etc.) can key off it.
>

Ok, using a global config should indeed be more useful.

> > Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> > ---
> >     [RFC] diff: introduce scope option
> >
> >     In [1], we discovered that users working on different sparse-checkout
> >     specification may download unnecessary blobs from each other's
> >     specification in collaboration. In [2] Junio suggested that maybe we can
> >     restrict some git command's filespec in sparse-checkout specification to
> >     elegantly solve this problem above. In [3]: Newren and Derrick Stolee
> >     prefer to name the option --scope={sparse, all}.
> >
> >     So this patch is attempt to do this thing on git diff:
> >
> >     v1:
> >
> >      1. add --restrict option to git diff, which restrict diff filespec in
> >         sparse-checkout specification. [4] v2.
> >      2. rename --restrict to --scope={sparse, all}, support --no-scope.
> >      3. add config: diff.scope={sparse,all}.
> >
> >     Unresolved work:
> >
> >      1. how to properly pass this --scope={sparse, all} to other commands
> >         like git log, git format-patch, etc.
>
> log & grep should accept a similar flag.  format-patch should not, and
> should ignore any config in this area.
>
> >      2. how to set the default value of scope for different diff commands.
>
> I don't understand this.
>

Oh, I was just curious if the config defaults for scope needed to be configured
separately for the different diff commands  git diff-files, git diff-index,
git diff-no-index, git diff-tree, since sparse-checkout.txt mentions
the different
behavior of scope for them. Now I think this just needs to be handled in command
code logic and no additional command level configuration is needed.

> >     [1]:
> >     https://lore.kernel.org/git/CAOLTT8SHo66kGbvWr=+LQ9UVd1NHgqGGEYK2qq6==QgRCgLZqQ@mail.gmail.com/
> >     [2]: https://lore.kernel.org/git/xmqqzgeqw0sy.fsf@gitster.g/ [3]:
> >     https://lore.kernel.org/git/07a25d48-e364-0d9b-6ffa-41a5984eb5db@github.com/
> >     [4]:
> >     https://lore.kernel.org/git/pull.1368.git.1664036052741.gitgitgadget@gmail.com/
> >
> > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1398%2Fadlternative%2Fzh%2Fdiff-scope-v1
> > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1398/adlternative/zh/diff-scope-v1
> > Pull-Request: https://github.com/gitgitgadget/git/pull/1398
> >
> >  Documentation/config/diff.txt         |  12 +
> >  Documentation/diff-options.txt        |  18 +
> >  builtin/diff.c                        |   4 +
> >  diff-lib.c                            |  36 +-
> >  diff-no-index.c                       |   4 +
> >  diff.c                                |  39 +++
> >  diff.h                                |  11 +
> >  t/t4070-diff-sparse-checkout-scope.sh | 469 ++++++++++++++++++++++++++
> >  tree-diff.c                           |   5 +
> >  9 files changed, 597 insertions(+), 1 deletion(-)
> >  create mode 100644 t/t4070-diff-sparse-checkout-scope.sh
> >
> > diff --git a/Documentation/config/diff.txt b/Documentation/config/diff.txt
> > index 35a7bf86d77..52707e1b2d6 100644
> > --- a/Documentation/config/diff.txt
> > +++ b/Documentation/config/diff.txt
> > @@ -201,6 +201,18 @@ diff.algorithm::
> >  --
> >  +
> >
> > +diff.scope::
> > +       Choose diff scope. The variants are as follows:
> > ++
> > +--
> > +`sparse`;;
> > +       Restrict diff paths to those matching sparse-checkout specification.
> > +`all`;;
> > +       Without restriction, diff is performed regardless of whether the path
> > +       meets the sparse-checkout specification.
>
> As noted above, this is the wrong level to specify things.  The
> description for "all" is misleading as well and suggests something
> other than "behavior B" from the direction document.
>

So do we need to make "--scope=all" correspond to the "behavior B",
The correct description of it should be: "worktree-sparse-history-dense"...

> >  diff.wsErrorHighlight::
> >         Highlight whitespace errors in the `context`, `old` or `new`
> >         lines of the diff.  Multiple values are separated by comma,
> > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> > index 3674ac48e92..04bf83e8be1 100644
> > --- a/Documentation/diff-options.txt
> > +++ b/Documentation/diff-options.txt
> > @@ -195,6 +195,24 @@ For instance, if you configured the `diff.algorithm` variable to a
> >  non-default value and want to use the default one, then you
> >  have to use `--diff-algorithm=default` option.
> >
> > +ifndef::git-format-patch[]
> > +ifndef::git-log[]
> > +
> > +--scope={sparse|all}::
> > +       Choose diff scope. The variants are as follows:
> > ++
> > +--
> > +`--sparse`;;
> > +       Restrict diff paths to those matching sparse-checkout specification.
> > +`--all`;;
> > +       Without restriction, diff is performed regardless of whether the path
> > +       meets the sparse-checkout specification.
> > +--
> > ++
> > +
> > +endif::git-log[]
> > +endif::git-format-patch[]
>
> What about diff-files, diff-index, diff-tree, and show?
>

diff-options.txt included in their documents, and git-format-patch.txt,
git-log.txt, but should not in git-format-patch.txt. I don't know if it
should be included in git-diff-files.txt, because git diff-files compare
the files in the working tree and the index (so it's the same as
"simple" git-diff, which should not be affected by scope?)

> > +
> >  --stat[=<width>[,<name-width>[,<count>]]]::
> >         Generate a diffstat. By default, as much space as necessary
> >         will be used for the filename part, and the rest for the graph
> > diff --git a/builtin/diff.c b/builtin/diff.c
> > index 854d2c5a5c4..6b450f7184c 100644
> > --- a/builtin/diff.c
> > +++ b/builtin/diff.c
> > @@ -54,6 +54,10 @@ static void stuff_change(struct diff_options *opt,
> >             oideq(old_oid, new_oid) && (old_mode == new_mode))
> >                 return;
> >
> > +       if (opt->scope == DIFF_SCOPE_SPARSE &&
> > +           !diff_paths_in_sparse_checkout(old_path, new_path))
> > +               return;
>
> This can't be right.
>    git diff c231e0f26fe9b2ea9ec46aa68ff95ba984ce592e
> 72d42bd856228c15f702fa3c353432f4f1defe03
> (to directly diff two known blobs) will go through this function, with
> old_path == c231e0f26fe9b2ea9ec46aa68ff95ba984ce592e and new_path ==
> 72d42bd856228c15f702fa3c353432f4f1defe03.  But those aren't real
> paths, and sparse-checkout should not restrict what is shown in those
> cases.
>

Yeah, it's buggy that I forget to check two blobs without paths.

> > +
> >         if (opt->flags.reverse_diff) {
> >                 SWAP(old_mode, new_mode);
> >                 SWAP(old_oid, new_oid);
> > diff --git a/diff-lib.c b/diff-lib.c
> > index 2edea41a234..a3381f2e0ff 100644
> > --- a/diff-lib.c
> > +++ b/diff-lib.c
> > @@ -88,6 +88,22 @@ static int match_stat_with_submodule(struct diff_options *diffopt,
> >         return changed;
> >  }
> >
> > +int diff_path_in_sparse_checkout(const char *path) {
> > +       if (core_sparse_checkout_cone)
> > +               return path_in_cone_mode_sparse_checkout(path, the_repository->index);
> > +       else
> > +               return path_in_sparse_checkout(path, the_repository->index);
> > +}
>
> This says we are including the path if it matches the sparsity
> patterns.  Thus, we have to be careful when we use this function,
> because the relevant paths are ones that match the sparsity
> specification.  The sparsity specification will always match the
> sparsity patterns when diffing two commits, but when either the index
> or the working tree is part of the diff, the sparsity specification
> *might* be temporarily expanded.
>

Yes, I may have to look at more code to better understand how and when the
"sparsity specification" is extended. Any recommendations for places to read?

> > +int diff_paths_in_sparse_checkout(const char *one, const char*two) {
> > +       if (one == two || !strcmp(one, two))
> > +               return diff_path_in_sparse_checkout(one);
> > +       else
> > +               return diff_path_in_sparse_checkout(one) &&
> > +                      diff_path_in_sparse_checkout(two);
>
> Why && rather than || ?
>

Agree, we do need to use || here, because the one diff side in the sparse
specification, we should diff the two files.

> > +}
> > +
> > +
> >  int run_diff_files(struct rev_info *revs, unsigned int option)
> >  {
> >         int entries, i;
> > @@ -113,6 +129,9 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
> >
> >                 if (diff_can_quit_early(&revs->diffopt))
> >                         break;
> > +               if (revs->diffopt.scope == DIFF_SCOPE_SPARSE &&
> > +                   !diff_path_in_sparse_checkout(ce->name))
> > +                       continue;
>
> Here you've cut off the possibility of showing diffs for anything
> outside the sparsity patterns, which is a mistake.  We need to handle
> a temporarily expanded sparse specification too.
>

Agree.

> >                 if (!ce_path_match(istate, ce, &revs->prune_data, NULL))
> >                         continue;
> > @@ -202,7 +221,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
> >                                 continue;
> >                 }
> >
> > -               if (ce_uptodate(ce) || ce_skip_worktree(ce))
> > +               if (ce_uptodate(ce) ||
> > +                   (revs->diffopt.scope != DIFF_SCOPE_ALL && ce_skip_worktree(ce)))
> >                         continue;
>
> Here you make --scope=all show files even if they are skip-worktree,
> making them appear to have been deleted.  I called out your
> description earlier as potentially misleading, because it could imply
> this behavior.  It looks like you were consistent with the description
> and implementation, it just doesn't match what we want.
>

Agree too. So we should not do anything in run_diff_files.

> >                 /*
> > @@ -439,6 +459,20 @@ static void do_oneway_diff(struct unpack_trees_options *o,
>
> do_oneway_diff is for cases where we are diffing against the index...
>
> >                         return; /* nothing to diff.. */
> >         }
> >
> > +       if (revs->diffopt.scope == DIFF_SCOPE_SPARSE) {
> > +               if (idx && tree) {
> > +                       if (!diff_paths_in_sparse_checkout(idx->name, tree->name))
> > +                               return;
> > +               } else if (idx) {
> > +                       if (!diff_path_in_sparse_checkout(idx->name))
> > +                               return;
> > +               } else if (tree) {
> > +                       if (!diff_path_in_sparse_checkout(tree->name))
> > +                               return;
> > +               } else
> > +                       return;
> > +       }
>
> ...and you again mistakenly only compare to the sparsity patterns
> instead of the sparse specification.
>

So here we should use ce_skip_worktree(idx) to check if this index entry matches
sparse specification.

> > +
> >         /* if the entry is not checked out, don't examine work tree */
> >         cached = o->index_only ||
> >                 (idx && ((idx->ce_flags & CE_VALID) || ce_skip_worktree(idx)));
>
>
>
> > diff --git a/diff-no-index.c b/diff-no-index.c
> > index 18edbdf4b59..ea94a104ea4 100644
> > --- a/diff-no-index.c
> > +++ b/diff-no-index.c
> > @@ -281,6 +281,10 @@ int diff_no_index(struct rev_info *revs,
> >
> >         fixup_paths(paths, &replacement);
> >
> > +       if (revs->diffopt.scope == DIFF_SCOPE_SPARSE &&
> > +           !diff_paths_in_sparse_checkout(paths[0], paths[1]))
> > +               goto out;
>
> --no-index means we're diffing two files that are not tracked, or at
> least treating them as not tracked.  sparse-checkout should not affect
> such files.
>

Yeah, we should care about untracked files sparse-checkout only when
we are using git add/rm/update-index...

> > +
> >         revs->diffopt.skip_stat_unmatch = 1;
> >         if (!revs->diffopt.output_format)
> >                 revs->diffopt.output_format = DIFF_FORMAT_PATCH;
> > diff --git a/diff.c b/diff.c
> > index 285d6e2d575..9de4044ae05 100644
> > --- a/diff.c
> > +++ b/diff.c
> > @@ -48,6 +48,7 @@ static int diff_interhunk_context_default;
> >  static const char *diff_word_regex_cfg;
> >  static const char *external_diff_cmd_cfg;
> >  static const char *diff_order_file_cfg;
> > +static const char *external_diff_scope_cfg;
> >  int diff_auto_refresh_index = 1;
> >  static int diff_mnemonic_prefix;
> >  static int diff_no_prefix;
> > @@ -57,6 +58,7 @@ static int diff_dirstat_permille_default = 30;
> >  static struct diff_options default_diff_options;
> >  static long diff_algorithm;
> >  static unsigned ws_error_highlight_default = WSEH_NEW;
> > +static enum diff_scope external_diff_scope;
>
> Why is this called "external"?
>

Learn from external_diff_cmd_cfg, I should remove it.

> >  static char diff_colors[][COLOR_MAXLEN] = {
> >         GIT_COLOR_RESET,
> > @@ -423,6 +425,16 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
> >                 return 0;
> >         }
> >
> > +       if (!strcmp(var, "diff.scope")) {
> > +               git_config_string(&external_diff_scope_cfg, var, value);
> > +               if (!strcmp(value, "all"))
> > +                       external_diff_scope = DIFF_SCOPE_ALL;
> > +               else if (!strcmp(value, "sparse"))
> > +                       external_diff_scope = DIFF_SCOPE_SPARSE;
> > +               else
> > +                       return -1;
> > +       }
> > +
> >         if (git_color_config(var, value, cb) < 0)
> >                 return -1;
> >
> > @@ -4663,6 +4675,7 @@ void repo_diff_setup(struct repository *r, struct diff_options *options)
> >
> >         options->color_moved = diff_color_moved_default;
> >         options->color_moved_ws_handling = diff_color_moved_ws_default;
> > +       options->scope = external_diff_scope;
> >
> >         prep_parse_options(options);
> >  }
> > @@ -4914,6 +4927,29 @@ static int parse_dirstat_opt(struct diff_options *options, const char *params)
> >         return 1;
> >  }
> >
> > +static int diff_opt_diff_scope(const struct option *option,
> > +                               const char *optarg, int unset)
> > +{
> > +       struct diff_options *opt = option->value;
> > +
> > +       if (unset) {
> > +               opt->scope = DIFF_SCOPE_NONE;
>
> I think we should instead have a
>     BUG_ON_OPT_NEG(unset)
> or, even better, a
>     BUG_ON_OPT_NEG_NOARG(unset, optarg)
> at the beginning of this function...
>
> > +       } else if (optarg) {
>
> ...which would also allow you to drop this if and dedent the rest of
> the function.
>

Agree.

> > +               if (!strcmp(optarg, "all")) {
> > +                       if (core_apply_sparse_checkout) {
> > +                               opt->scope = DIFF_SCOPE_ALL;
> > +                       }
> > +               } else if (!strcmp(optarg, "sparse")) {
> > +                       if (core_apply_sparse_checkout) {
> > +                               opt->scope = DIFF_SCOPE_SPARSE;
> > +                       }
>
> If core_apply_sparse_checkout is false, should we perhaps throw an
> error instead of just silently ignoring the option the user passed?
>

Agree.

> > +               } else
> > +                       return error(_("invalid --scope value: %s"), optarg);
> > +       }
>
> As written with no follow-on else clause here, wouldn't this silently
> accept "--scope" without an "=<something>" argument without an error
> and without properly initializing opt->scope?
>

Because opt will be initializing with default_diff_options in repo_diff_setup(),
and opt->scope should respect config value first. So I don't think there's a
mistake here.

> > +
> > +       return 0;
> > +}
> > +
> >  static int diff_opt_diff_filter(const struct option *option,
> >                                 const char *optarg, int unset)
> >  {
> > @@ -5683,6 +5719,9 @@ static void prep_parse_options(struct diff_options *options)
> >                 OPT_CALLBACK_F(0, "diff-filter", options, N_("[(A|C|D|M|R|T|U|X|B)...[*]]"),
> >                                N_("select files by diff type"),
> >                                PARSE_OPT_NONEG, diff_opt_diff_filter),
> > +               OPT_CALLBACK_F(0, "scope", options, N_("[sparse|all]"),
> > +                              N_("choose diff scope"),
>
> maybe "choose diff scope in sparse checkouts"?
>

OK.

> > +                              PARSE_OPT_OPTARG, diff_opt_diff_scope),
> >                 { OPTION_CALLBACK, 0, "output", options, N_("<file>"),
> >                   N_("output to a specific file"),
> >                   PARSE_OPT_NONEG, NULL, 0, diff_opt_output },
> > diff --git a/diff.h b/diff.h
> > index 8ae18e5ab1e..90f7512034c 100644
> > --- a/diff.h
> > +++ b/diff.h
> > @@ -230,6 +230,12 @@ enum diff_submodule_format {
> >         DIFF_SUBMODULE_INLINE_DIFF
> >  };
> >
> > +enum diff_scope {
> > +       DIFF_SCOPE_NONE = 0,
> > +       DIFF_SCOPE_ALL,
> > +       DIFF_SCOPE_SPARSE,
> > +};
> > +
> >  /**
> >   * the set of options the calling program wants to affect the operation of
> >   * diffcore library with.
> > @@ -285,6 +291,9 @@ struct diff_options {
> >         /* diff-filter bits */
> >         unsigned int filter, filter_not;
> >
> > +       /* diff sparse-checkout scope */
> > +       enum diff_scope scope;
> > +
> >         int use_color;
> >
> >         /* Number of context lines to generate in patch output. */
> > @@ -696,4 +705,6 @@ void print_stat_summary(FILE *fp, int files,
> >                         int insertions, int deletions);
> >  void setup_diff_pager(struct diff_options *);
> >
> > +int diff_path_in_sparse_checkout(const char *path);
> > +int diff_paths_in_sparse_checkout(const char *one, const char *two);
> >  #endif /* DIFF_H */
> > diff --git a/t/t4070-diff-sparse-checkout-scope.sh b/t/t4070-diff-sparse-checkout-scope.sh
> > new file mode 100644
>
> This needs to be fixed.
>
> > index 00000000000..dca75a3308b
> > --- /dev/null
> > +++ b/t/t4070-diff-sparse-checkout-scope.sh
> > @@ -0,0 +1,469 @@
> > +#!/bin/sh
> > +
> > +test_description='diff sparse-checkout scope'
> > +
> > +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> > +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> > +
> > +. ./test-lib.sh
> > +
> > +
> > +test_expect_success 'setup' '
> > +       git init temp &&
> > +       (
> > +               cd temp &&
> > +               mkdir sub1 &&
> > +               mkdir sub2 &&
> > +               echo sub1/file1 >sub1/file1 &&
> > +               echo sub2/file2 >sub2/file2 &&
> > +               echo file1 >file1 &&
> > +               echo file2 >file2 &&
> > +               git add --all &&
> > +               git commit -m init &&
> > +               echo sub1/file1 >>sub1/file1 &&
> > +               echo sub1/file2 >>sub1/file2 &&
> > +               echo sub2/file1 >>sub2/file1 &&
> > +               echo sub2/file2 >>sub2/file2 &&
> > +               echo file1 >>file1 &&
> > +               echo file2 >>file2 &&
> > +               git add --all &&
> > +               git commit -m change1 &&
> > +               echo sub1/file1 >>sub1/file1 &&
> > +               echo sub1/file2 >>sub1/file2 &&
> > +               echo sub2/file1 >>sub2/file1 &&
> > +               echo sub2/file2 >>sub2/file2 &&
> > +               echo file1 >>file1 &&
> > +               echo file2 >>file2 &&
> > +               git add --all &&
> > +               git commit -m change2
> > +       )
> > +'
> > +
> > +reset_repo () {
> > +       rm -rf repo &&
> > +       git clone --no-checkout temp repo
>
> Why --no-checkout rather than say --sparse?
>

This is because I accidentally associated it with a
partial clone. I often use "git clone -filter=blob:none -no-checkout"
to do a partial clone, then "git sparse- checkout set <pattern>"
to set sparse-checkout patterns, and "git checkout" to download
the missing blobs and checkout to a branch. But in this
test file, we only need sparse-checkout, so it's true that I should
not do such strange no-checkout thing.

> > +}
> > +
> > +reset_with_sparse_checkout() {
> > +       reset_repo &&
> > +       git -C repo sparse-checkout set $1 sub1 &&
> > +       git -C repo checkout
>
> Fixing the above would let us get rid of this really weird extra
> checkout command too.
>
> > +}
> > +
> > +change_worktree_and_index() {
> > +       (
> > +               cd repo &&
> > +               mkdir sub2 sub3 &&
> > +               echo sub1/file3 >sub1/file3 &&
> > +               echo sub2/file3 >sub2/file3 &&
> > +               echo sub3/file3 >sub3/file3 &&
> > +               echo file3 >file3 &&
> > +               git add --all --sparse &&
> > +               echo sub1/file3 >>sub1/file3 &&
> > +               echo sub2/file3 >>sub2/file3 &&
> > +               echo sub3/file3 >>sub3/file3 &&
> > +               echo file3 >>file3
> > +       )
> > +}
>
> It would be nice to modify different paths in the working tree and
> index, to see if we can handle cases where the sparse specification
> does not match the sparsity patterns better.  (So, modify files inside
> and outside the sparsity patterns, stage those changes, and then do a
> `git sparse-checkout reapply` to make the files outside the sparsity
> patterns disappear from the working tree...then modify different files
> in the working tree both inside and outside the sparsity patterns.
> And also remove some file that matches the sparsity patterns and
> manually mark it as SKIP_WORKTREE.)  That'd give us much better
> coverage.
>

Nice addition. So I should use git update-index --skip-worktree to set
skip_worktree bit for index entries.

> > +
> > +diff_scope() {
> > +       title=$1
> > +       need_change_worktree_and_index=$2
> > +       sparse_checkout_option=$3
> > +       scope_option=$4
> > +       expect=$5
> > +       shift 5
> > +       args=("$@")
> > +
> > +       test_expect_success "$title $sparse_checkout_option $scope_option" "
> > +               reset_with_sparse_checkout $sparse_checkout_option &&
> > +               if test \"$need_change_worktree_and_index\" = \"true\" ; then
> > +                       change_worktree_and_index
> > +               fi &&
> > +               git -C repo diff $scope_option ${args[*]} >actual &&
> > +               if test -z \"$expect\" ; then
> > +                       >expect
> > +               else
> > +                       cat > expect <<-EOF
> > +$expect
> > +                       EOF
> > +               fi &&
> > +               test_cmp expect actual
> > +       "
> > +}
> > +
> > +args=("--name-only" "HEAD" "HEAD~")
> > +diff_scope builtin_diff_tree false "--no-cone" "--scope=sparse" \
> > +"sub1/file1
> > +sub1/file2" "${args[@]}"
> > +
> > +diff_scope builtin_diff_tree false "--no-cone" "--scope=all" \
> > +"file1
> > +file2
> > +sub1/file1
> > +sub1/file2
> > +sub2/file1
> > +sub2/file2" "${args[@]}"
> > +
> > +diff_scope builtin_diff_tree false "--no-cone" "--no-scope" \
> > +"file1
> > +file2
> > +sub1/file1
> > +sub1/file2
> > +sub2/file1
> > +sub2/file2" "${args[@]}"
> > +
> > +diff_scope builtin_diff_tree false "--cone" "--scope=sparse" \
> > +"file1
> > +file2
> > +sub1/file1
> > +sub1/file2" "${args[@]}"
> > +
> > +diff_scope builtin_diff_tree false "--cone" "--scope=all" \
> > +"file1
> > +file2
> > +sub1/file1
> > +sub1/file2
> > +sub2/file1
> > +sub2/file2" "${args[@]}"
> > +
> > +diff_scope builtin_diff_tree false "--cone" "--no-scope" \
> > +"file1
> > +file2
> > +sub1/file1
> > +sub1/file2
> > +sub2/file1
> > +sub2/file2" "${args[@]}"
> > +
> > +args=("--name-only" "HEAD~")
> > +diff_scope builtin_diff_index true "--no-cone" "--scope=sparse" \
> > +"sub1/file1
> > +sub1/file2
> > +sub1/file3" "${args[@]}"
>
> Here's a good example where the testcase is doing the wrong thing.
> The expected answer here would also include file3, sub2/file3, and
> sub3/file3.
>

Yeah. Files that are not part of the sparse-checkout patterns are temporarily
extended into the sparse specification.

> > +
> > +diff_scope builtin_diff_index true "--no-cone" "--scope=all" \
> > +"file1
> > +file2
> > +file3
> > +sub1/file1
> > +sub1/file2
> > +sub1/file3
> > +sub2/file1
> > +sub2/file2
> > +sub2/file3
> > +sub3/file3" "${args[@]}"
> > +
> > +diff_scope builtin_diff_index true "--no-cone" "--no-scope" \
> > +"file1
> > +file2
> > +file3
> > +sub1/file1
> > +sub1/file2
> > +sub1/file3
> > +sub2/file1
> > +sub2/file2
> > +sub2/file3
> > +sub3/file3" "${args[@]}"
> > +
> > +diff_scope builtin_diff_index true "--cone" "--scope=sparse" \
> > +"file1
> > +file2
> > +file3
> > +sub1/file1
> > +sub1/file2
> > +sub1/file3" "${args[@]}"
>
> This is also wrong; it's missing sub2/file3 and sub3/file3.
>
> > +
> > +diff_scope builtin_diff_index true "--cone" "--scope=all" \
> > +"file1
> > +file2
> > +file3
> > +sub1/file1
> > +sub1/file2
> > +sub1/file3
> > +sub2/file1
> > +sub2/file2
> > +sub2/file3
> > +sub3/file3" "${args[@]}"
> > +
> > +diff_scope builtin_diff_index true "--cone" "--no-scope" \
> > +"file1
> > +file2
> > +file3
> > +sub1/file1
> > +sub1/file2
> > +sub1/file3
> > +sub2/file1
> > +sub2/file2
> > +sub2/file3
> > +sub3/file3" "${args[@]}"
> > +
> > +args=("--name-only" "file3" "sub1/" "sub2/")
> > +
> > +diff_scope builtin_diff_files true "--no-cone" "--scope=sparse" \
> > +"sub1/file3" "${args[@]}"
>
> This should also include file3, sub2/file3, and sub3/file3.
> `--scope=` should not affect diff output at all if neither --cached
> nor revision arguments are supplied.
>

Agree.

> > +
> > +diff_scope builtin_diff_files true "--no-cone" "--scope=all" \
> > +"file3
> > +sub1/file3
> > +sub2/file1
> > +sub2/file2
> > +sub2/file3" "${args[@]}"
>
> This is wrong due to including too much; it should not include
> sub2/file1 or sub2/file2 (it is only including those because it is
> showing them as deleted, when they are not deleted but are
> SKIP_WORKTREE).
>

Agree.

> I think I'm going to stop reviewing here.  I'm probably just going to
> keep repeating the same issues I identified earlier if I continue.

Thank you very much for your review, you have pointed out very many
problems with this patch :)

--
ZheNing Hu

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] [RFC] diff: introduce scope option
  2022-11-06  2:11   ` ZheNing Hu
@ 2022-11-06  6:58     ` Elijah Newren
  2022-11-14  9:08       ` ZheNing Hu
  0 siblings, 1 reply; 11+ messages in thread
From: Elijah Newren @ 2022-11-06  6:58 UTC (permalink / raw)
  To: ZheNing Hu
  Cc: ZheNing Hu via GitGitGadget, git, Christian Couder,
	Ævar Arnfjörð Bjarmason, Jeff King, Junio C Hamano,
	Derrick Stolee, Johannes Schindelin, Victoria Dye, Emily Shaffer,
	Matheus Tavares Bernardino, Shaoxuan Yuan

Hi,

On Sat, Nov 5, 2022 at 7:11 PM ZheNing Hu <adlternative@gmail.com> wrote:
> Elijah Newren <newren@gmail.com> 于2022年11月1日周二 13:18写道:
> > On Sun, Oct 30, 2022 at 9:11 PM ZheNing Hu via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> > >
> > > From: ZheNing Hu <adlternative@gmail.com>
> > >
> > > When we use sparse-checkout, we often want the set of files
> > > that some commands operate on to be restricted to the
> > > sparse-checkout specification.
> >
> > It seems a bit premature to send this, when the guideline document[*]
> > detailing how these options should work is still in the "Needs Review"
> > state.  I know, it's been waiting for a while, but it's a _long_
> > document.
> >
> > [*] https://lore.kernel.org/git/pull.1367.v3.git.1665269538608.gitgitgadget@gmail.com/
> >
>
> Fine, I just want to start trying to experiment with this feature in
> git-diff earlier,
> and I can wait for the sparse-checkout.txt documentation to finish
> first if needed :)

Note that you may be able to help reduce the wait by reviewing the
document.  (You commented on v1 -- thanks! -- but no one has commented
on any of the newer versions, despite being out for over a month and
still being marked as "Needs Review" in the "What's cooking" reports.)

> > > `--no-scope` is the default
> > > option for now.
> >
> > What does --no-scope even mean?  You didn't explain it, and I don't
> > see how it could make sense.  We explicitly avoided a `--no-` prefix
> > by allowing the --scope option to take a value.  I don't think there
> > should be a --no-scope option.
>
> I think the “--no-scope” here does nothing, as if it were unaffected by scope
> (just like correctly "--scope=all", right?)

Reading your patch, I was unable to determine where you made
--no-scope behave differently than how you implemented --scope=all.
Maybe there was a difference and I missed it.

Anyway, I don't think there should be a --no-scope option.

> > > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> > > index 3674ac48e92..04bf83e8be1 100644
> > > --- a/Documentation/diff-options.txt
> > > +++ b/Documentation/diff-options.txt
> > > @@ -195,6 +195,24 @@ For instance, if you configured the `diff.algorithm` variable to a
> > >  non-default value and want to use the default one, then you
> > >  have to use `--diff-algorithm=default` option.
> > >
> > > +ifndef::git-format-patch[]
> > > +ifndef::git-log[]
> > > +
> > > +--scope={sparse|all}::
> > > +       Choose diff scope. The variants are as follows:
> > > ++
> > > +--
> > > +`--sparse`;;
> > > +       Restrict diff paths to those matching sparse-checkout specification.
> > > +`--all`;;
> > > +       Without restriction, diff is performed regardless of whether the path
> > > +       meets the sparse-checkout specification.
> > > +--
> > > ++
> > > +
> > > +endif::git-log[]
> > > +endif::git-format-patch[]
> >
> > What about diff-files, diff-index, diff-tree, and show?
> >
>
> diff-options.txt included in their documents, and git-format-patch.txt,
> git-log.txt, but should not in git-format-patch.txt. I don't know if it
> should be included in git-diff-files.txt, because git diff-files compare
> the files in the working tree and the index (so it's the same as
> "simple" git-diff, which should not be affected by scope?)

Oh, good point.  Yeah, git-diff-files.txt should not get this option
as it won't have any affect.  git-diff-index and git-diff-tree and
git-show and git-log should get it, though.

> > > +int diff_path_in_sparse_checkout(const char *path) {
> > > +       if (core_sparse_checkout_cone)
> > > +               return path_in_cone_mode_sparse_checkout(path, the_repository->index);
> > > +       else
> > > +               return path_in_sparse_checkout(path, the_repository->index);
> > > +}
> >
> > This says we are including the path if it matches the sparsity
> > patterns.  Thus, we have to be careful when we use this function,
> > because the relevant paths are ones that match the sparsity
> > specification.  The sparsity specification will always match the
> > sparsity patterns when diffing two commits, but when either the index
> > or the working tree is part of the diff, the sparsity specification
> > *might* be temporarily expanded.
>
> Yes, I may have to look at more code to better understand how and when the
> "sparsity specification" is extended. Any recommendations for places to read?

Yes, please review the newer version of my sparse-checkout.txt
directions file; it covers this in more detail.

> > > @@ -439,6 +459,20 @@ static void do_oneway_diff(struct unpack_trees_options *o,
> >
> > do_oneway_diff is for cases where we are diffing against the index...
> >
> > >                         return; /* nothing to diff.. */
> > >         }
> > >
> > > +       if (revs->diffopt.scope == DIFF_SCOPE_SPARSE) {
> > > +               if (idx && tree) {
> > > +                       if (!diff_paths_in_sparse_checkout(idx->name, tree->name))
> > > +                               return;
> > > +               } else if (idx) {
> > > +                       if (!diff_path_in_sparse_checkout(idx->name))
> > > +                               return;
> > > +               } else if (tree) {
> > > +                       if (!diff_path_in_sparse_checkout(tree->name))
> > > +                               return;
> > > +               } else
> > > +                       return;
> > > +       }
> >
> > ...and you again mistakenly only compare to the sparsity patterns
> > instead of the sparse specification.
> >
>
> So here we should use ce_skip_worktree(idx) to check if this index entry matches
> sparse specification.

ce_skip_worktree(idx) only checks for expansions to the sparse
specification in the working tree.  If dealing with index files, the
sparse specification is also expanded to handle all files in the index
that differ from HEAD.

> > > +               } else
> > > +                       return error(_("invalid --scope value: %s"), optarg);
> > > +       }
> >
> > As written with no follow-on else clause here, wouldn't this silently
> > accept "--scope" without an "=<something>" argument without an error
> > and without properly initializing opt->scope?
> >
>
> Because opt will be initializing with default_diff_options in repo_diff_setup(),
> and opt->scope should respect config value first. So I don't think there's a
> mistake here.

Okay, it's good that you've got the variables initialized somehow, but
that's only half the point here.  The main point is that the user can
specify something that makes no sense and if they do, we should throw
an error informing them of their mistake.  --scope should not be
passed without an argument (either "all" or "sparse", currently), but
this code allowed it.

> > > diff --git a/t/t4070-diff-sparse-checkout-scope.sh b/t/t4070-diff-sparse-checkout-scope.sh
> > > new file mode 100644
> >
> > This needs to be fixed.

Since you didn't comment on this but usually do comment on thing, I'll
just re-ping it to make sure you don't miss the comment about the
incorrect file mode here.

> > > +reset_repo () {
> > > +       rm -rf repo &&
> > > +       git clone --no-checkout temp repo
> >
> > Why --no-checkout rather than say --sparse?
> >
>
> This is because I accidentally associated it with a
> partial clone. I often use "git clone -filter=blob:none -no-checkout"
> to do a partial clone, then "git sparse- checkout set <pattern>"
> to set sparse-checkout patterns, and "git checkout" to download
> the missing blobs and checkout to a branch. But in this
> test file, we only need sparse-checkout, so it's true that I should
> not do such strange no-checkout thing.

Yeah, no need to involve partial clones here.  (As an aside, though, I
think even with a partial clone that --sparse makes more sense than
--no-checkout.)

> > > +change_worktree_and_index() {
> > > +       (
> > > +               cd repo &&
> > > +               mkdir sub2 sub3 &&
> > > +               echo sub1/file3 >sub1/file3 &&
> > > +               echo sub2/file3 >sub2/file3 &&
> > > +               echo sub3/file3 >sub3/file3 &&
> > > +               echo file3 >file3 &&
> > > +               git add --all --sparse &&
> > > +               echo sub1/file3 >>sub1/file3 &&
> > > +               echo sub2/file3 >>sub2/file3 &&
> > > +               echo sub3/file3 >>sub3/file3 &&
> > > +               echo file3 >>file3
> > > +       )
> > > +}
> >
> > It would be nice to modify different paths in the working tree and
> > index, to see if we can handle cases where the sparse specification
> > does not match the sparsity patterns better.  (So, modify files inside
> > and outside the sparsity patterns, stage those changes, and then do a
> > `git sparse-checkout reapply` to make the files outside the sparsity
> > patterns disappear from the working tree...then modify different files
> > in the working tree both inside and outside the sparsity patterns.
> > And also remove some file that matches the sparsity patterns and
> > manually mark it as SKIP_WORKTREE.)  That'd give us much better
> > coverage.
> >
>
> Nice addition. So I should use git update-index --skip-worktree to set
> skip_worktree bit for index entries.

Well, I'd say use normal "sparse-checkout" commands to set most of
them.  However, adding one or two extra SKIP_WORKTREE paths via
running `git update-index --skip-worktree $PATH` (and removing the
corresponding local file) would make the testcases more interesting.

> > I think I'm going to stop reviewing here.  I'm probably just going to
> > keep repeating the same issues I identified earlier if I continue.
>
> Thank you very much for your review, you have pointed out very many
> problems with this patch :)

I'm glad it was helpful.  :)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] [RFC] diff: introduce scope option
  2022-11-06  6:58     ` Elijah Newren
@ 2022-11-14  9:08       ` ZheNing Hu
  0 siblings, 0 replies; 11+ messages in thread
From: ZheNing Hu @ 2022-11-14  9:08 UTC (permalink / raw)
  To: Elijah Newren
  Cc: ZheNing Hu via GitGitGadget, git, Christian Couder,
	Ævar Arnfjörð Bjarmason, Jeff King, Junio C Hamano,
	Derrick Stolee, Johannes Schindelin, Victoria Dye, Emily Shaffer,
	Matheus Tavares Bernardino, Shaoxuan Yuan

Elijah Newren <newren@gmail.com> 于2022年11月6日周日 14:58写道:
>
> Hi,
>
> On Sat, Nov 5, 2022 at 7:11 PM ZheNing Hu <adlternative@gmail.com> wrote:
> > Elijah Newren <newren@gmail.com> 于2022年11月1日周二 13:18写道:
> > > On Sun, Oct 30, 2022 at 9:11 PM ZheNing Hu via GitGitGadget
> > > <gitgitgadget@gmail.com> wrote:
> > > >
> > > > From: ZheNing Hu <adlternative@gmail.com>
> > > >
> > > > When we use sparse-checkout, we often want the set of files
> > > > that some commands operate on to be restricted to the
> > > > sparse-checkout specification.
> > >
> > > It seems a bit premature to send this, when the guideline document[*]
> > > detailing how these options should work is still in the "Needs Review"
> > > state.  I know, it's been waiting for a while, but it's a _long_
> > > document.
> > >
> > > [*] https://lore.kernel.org/git/pull.1367.v3.git.1665269538608.gitgitgadget@gmail.com/
> > >
> >
> > Fine, I just want to start trying to experiment with this feature in
> > git-diff earlier,
> > and I can wait for the sparse-checkout.txt documentation to finish
> > first if needed :)
>
> Note that you may be able to help reduce the wait by reviewing the
> document.  (You commented on v1 -- thanks! -- but no one has commented
> on any of the newer versions, despite being out for over a month and
> still being marked as "Needs Review" in the "What's cooking" reports.)
>

I've skimmed v3 in general, but haven't looked at v4 yet, I'll take a closer
look at it.

> > > > `--no-scope` is the default
> > > > option for now.
> > >
> > > What does --no-scope even mean?  You didn't explain it, and I don't
> > > see how it could make sense.  We explicitly avoided a `--no-` prefix
> > > by allowing the --scope option to take a value.  I don't think there
> > > should be a --no-scope option.
> >
> > I think the “--no-scope” here does nothing, as if it were unaffected by scope
> > (just like correctly "--scope=all", right?)
>
> Reading your patch, I was unable to determine where you made
> --no-scope behave differently than how you implemented --scope=all.
> Maybe there was a difference and I missed it.
>
> Anyway, I don't think there should be a --no-scope option.
>

Fine, I'll get rid of it.

> > > > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> > > > index 3674ac48e92..04bf83e8be1 100644
> > > > --- a/Documentation/diff-options.txt
> > > > +++ b/Documentation/diff-options.txt
> > > > @@ -195,6 +195,24 @@ For instance, if you configured the `diff.algorithm` variable to a
> > > >  non-default value and want to use the default one, then you
> > > >  have to use `--diff-algorithm=default` option.
> > > >
> > > > +ifndef::git-format-patch[]
> > > > +ifndef::git-log[]
> > > > +
> > > > +--scope={sparse|all}::
> > > > +       Choose diff scope. The variants are as follows:
> > > > ++
> > > > +--
> > > > +`--sparse`;;
> > > > +       Restrict diff paths to those matching sparse-checkout specification.
> > > > +`--all`;;
> > > > +       Without restriction, diff is performed regardless of whether the path
> > > > +       meets the sparse-checkout specification.
> > > > +--
> > > > ++
> > > > +
> > > > +endif::git-log[]
> > > > +endif::git-format-patch[]
> > >
> > > What about diff-files, diff-index, diff-tree, and show?
> > >
> >
> > diff-options.txt included in their documents, and git-format-patch.txt,
> > git-log.txt, but should not in git-format-patch.txt. I don't know if it
> > should be included in git-diff-files.txt, because git diff-files compare
> > the files in the working tree and the index (so it's the same as
> > "simple" git-diff, which should not be affected by scope?)
>
> Oh, good point.  Yeah, git-diff-files.txt should not get this option
> as it won't have any affect.  git-diff-index and git-diff-tree and
> git-show and git-log should get it, though.
>
> > > > +int diff_path_in_sparse_checkout(const char *path) {
> > > > +       if (core_sparse_checkout_cone)
> > > > +               return path_in_cone_mode_sparse_checkout(path, the_repository->index);
> > > > +       else
> > > > +               return path_in_sparse_checkout(path, the_repository->index);
> > > > +}
> > >
> > > This says we are including the path if it matches the sparsity
> > > patterns.  Thus, we have to be careful when we use this function,
> > > because the relevant paths are ones that match the sparsity
> > > specification.  The sparsity specification will always match the
> > > sparsity patterns when diffing two commits, but when either the index
> > > or the working tree is part of the diff, the sparsity specification
> > > *might* be temporarily expanded.
> >
> > Yes, I may have to look at more code to better understand how and when the
> > "sparsity specification" is extended. Any recommendations for places to read?
>
> Yes, please review the newer version of my sparse-checkout.txt
> directions file; it covers this in more detail.
>

Ok, thanks.

> > > > @@ -439,6 +459,20 @@ static void do_oneway_diff(struct unpack_trees_options *o,
> > >
> > > do_oneway_diff is for cases where we are diffing against the index...
> > >
> > > >                         return; /* nothing to diff.. */
> > > >         }
> > > >
> > > > +       if (revs->diffopt.scope == DIFF_SCOPE_SPARSE) {
> > > > +               if (idx && tree) {
> > > > +                       if (!diff_paths_in_sparse_checkout(idx->name, tree->name))
> > > > +                               return;
> > > > +               } else if (idx) {
> > > > +                       if (!diff_path_in_sparse_checkout(idx->name))
> > > > +                               return;
> > > > +               } else if (tree) {
> > > > +                       if (!diff_path_in_sparse_checkout(tree->name))
> > > > +                               return;
> > > > +               } else
> > > > +                       return;
> > > > +       }
> > >
> > > ...and you again mistakenly only compare to the sparsity patterns
> > > instead of the sparse specification.
> > >
> >
> > So here we should use ce_skip_worktree(idx) to check if this index entry matches
> > sparse specification.
>
> ce_skip_worktree(idx) only checks for expansions to the sparse
> specification in the working tree.  If dealing with index files, the
> sparse specification is also expanded to handle all files in the index
> that differ from HEAD.
>

I will read more code to understand this detail.

> > > > +               } else
> > > > +                       return error(_("invalid --scope value: %s"), optarg);
> > > > +       }
> > >
> > > As written with no follow-on else clause here, wouldn't this silently
> > > accept "--scope" without an "=<something>" argument without an error
> > > and without properly initializing opt->scope?
> > >
> >
> > Because opt will be initializing with default_diff_options in repo_diff_setup(),
> > and opt->scope should respect config value first. So I don't think there's a
> > mistake here.
>
> Okay, it's good that you've got the variables initialized somehow, but
> that's only half the point here.  The main point is that the user can
> specify something that makes no sense and if they do, we should throw
> an error informing them of their mistake.  --scope should not be
> passed without an argument (either "all" or "sparse", currently), but
> this code allowed it.
>

Ah, it makes sense, I will fix it.

> > > > diff --git a/t/t4070-diff-sparse-checkout-scope.sh b/t/t4070-diff-sparse-checkout-scope.sh
> > > > new file mode 100644
> > >
> > > This needs to be fixed.
>
> Since you didn't comment on this but usually do comment on thing, I'll
> just re-ping it to make sure you don't miss the comment about the
> incorrect file mode here.
>

Yes, I've noticed the file permissions issue here, but there may be some other
issues with this test file: it fails when running CI tests, presumably
because of
some shell syntax incompatibility.

> > > > +reset_repo () {
> > > > +       rm -rf repo &&
> > > > +       git clone --no-checkout temp repo
> > >
> > > Why --no-checkout rather than say --sparse?
> > >
> >
> > This is because I accidentally associated it with a
> > partial clone. I often use "git clone -filter=blob:none -no-checkout"
> > to do a partial clone, then "git sparse- checkout set <pattern>"
> > to set sparse-checkout patterns, and "git checkout" to download
> > the missing blobs and checkout to a branch. But in this
> > test file, we only need sparse-checkout, so it's true that I should
> > not do such strange no-checkout thing.
>
> Yeah, no need to involve partial clones here.  (As an aside, though, I
> think even with a partial clone that --sparse makes more sense than
> --no-checkout.)
>

Using --sparse may indeed be better than -no-checkout, as there is no
need to perform an additional checkout after using clone --no-checkout
and sparse-checkout. But it uses sparse-checkout cone mode by default.

A little curious, why can't we specify --no-cone mode when doing git clone
and why can't we specify sparse-checkout patterns here? If such a feature
is available, git clone and git sparse-checkout will be combined in one step.

> > > > +change_worktree_and_index() {
> > > > +       (
> > > > +               cd repo &&
> > > > +               mkdir sub2 sub3 &&
> > > > +               echo sub1/file3 >sub1/file3 &&
> > > > +               echo sub2/file3 >sub2/file3 &&
> > > > +               echo sub3/file3 >sub3/file3 &&
> > > > +               echo file3 >file3 &&
> > > > +               git add --all --sparse &&
> > > > +               echo sub1/file3 >>sub1/file3 &&
> > > > +               echo sub2/file3 >>sub2/file3 &&
> > > > +               echo sub3/file3 >>sub3/file3 &&
> > > > +               echo file3 >>file3
> > > > +       )
> > > > +}
> > >
> > > It would be nice to modify different paths in the working tree and
> > > index, to see if we can handle cases where the sparse specification
> > > does not match the sparsity patterns better.  (So, modify files inside
> > > and outside the sparsity patterns, stage those changes, and then do a
> > > `git sparse-checkout reapply` to make the files outside the sparsity
> > > patterns disappear from the working tree...then modify different files
> > > in the working tree both inside and outside the sparsity patterns.
> > > And also remove some file that matches the sparsity patterns and
> > > manually mark it as SKIP_WORKTREE.)  That'd give us much better
> > > coverage.
> > >
> >
> > Nice addition. So I should use git update-index --skip-worktree to set
> > skip_worktree bit for index entries.
>
> Well, I'd say use normal "sparse-checkout" commands to set most of
> them.  However, adding one or two extra SKIP_WORKTREE paths via
> running `git update-index --skip-worktree $PATH` (and removing the
> corresponding local file) would make the testcases more interesting.
>

Get it.

> > > I think I'm going to stop reviewing here.  I'm probably just going to
> > > keep repeating the same issues I identified earlier if I continue.
> >
> > Thank you very much for your review, you have pointed out very many
> > problems with this patch :)
>
> I'm glad it was helpful.  :)

Thanks.

ZheNing Hu

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2] [RFC] diff: introduce --scope option
  2022-10-31  4:11 [PATCH] [RFC] diff: introduce scope option ZheNing Hu via GitGitGadget
  2022-11-01  1:34 ` Taylor Blau
  2022-11-01  5:18 ` Elijah Newren
@ 2022-11-25  2:45 ` ZheNing Hu via GitGitGadget
  2022-11-29 12:00   ` [PATCH v3 0/2] [RFC] diff: introduce scope option ZheNing Hu via GitGitGadget
  2 siblings, 1 reply; 11+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2022-11-25  2:45 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Ævar Arnfjörð Bjarmason,
	Jeff King, Junio C Hamano, Derrick Stolee, Johannes Schindelin,
	Victoria Dye, Elijah Newren, Emily Shaffer,
	Matheus Tavares Bernardino, Shaoxuan Yuan, Taylor Blau,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Many of git commands, such as "git grep", "git diff", they
will search the "full-tree" scope of the entire git repository,
which is reasonable under normal circumstances, but if the user
uses sparse checkout in a git monorepo, it's very possible that
he just wants to use files within the sparse specification,
perhaps because:

* He wants to be able to focus on his subprojects, the output
of other subprojects will only interfere with him.

* He's using partial cloning at the same time, and he doesn't
want to be able to execute the above git commands download a
large number of blobs which out of sparse specification, which
is a waste of time and may cause the size of the git repository
to gradually expand.

So we need a way to restrict git commands to the sparse
specification. Implementing "diff --scope" is the first step
in this plan. We are looking for a suitable option to choose:
restrict the path scope of diff to the sparse specification
or keep the full tree scope (default action now). "--scope=sparse",
"--scope=all" are the parameters corresponding to these two
cases.

It is worth noting that "--scope" option only works on diff
commands specify "--cached" or "REVISION", because normal
"git diff" has retrict the scope of diff files to the sparse
specificaiton by default, while "git diff --cached" or
"git diff REVSION" will compare to the commit history, and
"--scope" options can works here to restrict or not.

Add "--scope" option to git "diff-index" and "git diff-tree"
too, because they also meet the above: specify "--cached",
or "REVISION". Meanwhile, "git diff-no-index", "git diff-files"
don't have this option.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
    [RFC] diff: introduce scope option
    
    In [1], we discovered that users working on different sparse-checkout
    specification may download unnecessary blobs from each other's
    specification in collaboration. In [2] Junio suggested that maybe we can
    restrict some git command's filespec in sparse-checkout specification to
    elegantly solve this problem above. In [3]: Newren and Derrick Stolee
    prefer to name the option --scope={sparse, all}.
    
    So this patch is attempt to do this thing on git diff:
    
    v1:
    
     1. add --restrict option to git diff, which restrict diff filespec in
        sparse-checkout specification. [4]
    
    v2.
    
     1. rename --restrict to --scope={sparse, all}, support --no-scope.
     2. add config: diff.scope={sparse,all}.
    
    v3.
    
     1. with the help of newren's review, fix the wrong --scope behavior,
        its previous meaning was misrepresented as sparse patterns, and now
        it is fixed to match sparse specification. [5]
     2. remove wrong diff.scope config.
     3. apply --scope to git diff, git diff-index, git diff-tree.
    
    Since I split --scope into a separate option, this option will not be
    directly inherited by git commands such as git log, git format-patch,
    etc. If necessary, we can add it to git log or other commands in a
    similar way later.
    
    Global scope config haven’t implement yet... Since we haven't decided on
    an appropriate name for scope config. [6]
    
    [1]:
    https://lore.kernel.org/git/CAOLTT8SHo66kGbvWr=+LQ9UVd1NHgqGGEYK2qq6==QgRCgLZqQ@mail.gmail.com/
    [2]: https://lore.kernel.org/git/xmqqzgeqw0sy.fsf@gitster.g/ [3]:
    https://lore.kernel.org/git/07a25d48-e364-0d9b-6ffa-41a5984eb5db@github.com/
    [4]:
    https://lore.kernel.org/git/pull.1368.git.1664036052741.gitgitgadget@gmail.com/
    [5]:
    https://lore.kernel.org/git/CAOLTT8TceM-NpV2_hUCZj2Dx=W30f_9SHW8CcRH-pw32BRd-oA@mail.gmail.com/
    [6]:
    https://lore.kernel.org/git/CABPp-BGHMsMxP6e7p0HAZA=ugk+GY3XW6_EaTN=HzaLQYAzQYA@mail.gmail.com/

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1398%2Fadlternative%2Fzh%2Fdiff-scope-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1398/adlternative/zh/diff-scope-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1398

Contributor requested no range-diff. You can review it using these commands:
   git fetch https://github.com/gitgitgadget/git 63bba4fd 471a0691
   git range-diff <options> 63bba4fd..93ddbcbd c000d916..471a0691

 Documentation/diff-options.txt        |  33 +++
 builtin/diff-index.c                  |  29 ++-
 builtin/diff-tree.c                   |  15 ++
 builtin/diff.c                        |  46 ++++-
 cache.h                               |   5 +
 diff-lib.c                            |  44 ++++
 diff.c                                |   2 +
 diff.h                                |   8 +
 dir.c                                 |  20 ++
 dir.h                                 |   4 +
 t/t4070-diff-sparse-checkout-scope.sh | 286 ++++++++++++++++++++++++++
 tree-diff.c                           |  11 +
 12 files changed, 497 insertions(+), 6 deletions(-)
 create mode 100755 t/t4070-diff-sparse-checkout-scope.sh

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 3674ac48e92..778b22ae982 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -195,6 +195,39 @@ For instance, if you configured the `diff.algorithm` variable to a
 non-default value and want to use the default one, then you
 have to use `--diff-algorithm=default` option.
 
+ifdef::git-diff[]
+ifdef::git-diff-index[]
+ifdef::git-diff-tree[]
+
+--scope=[sparse|all]::
+	Restrict or not restrict diff path scope in sparse specification.
+	The variants are as follows:
+
++
+--
+`sparse`;;
+	When using diff to compare commit history, restrict the
+	scope of file path comparisons to the sparse specification.
+	See sparse specification in link:technical/sparse-checkout.html
+	[the sparse-checkout design document] for more information.
+`all`;;
+	When using diff to compare commit history, the file comparison
+	scope is full-tree. This is consistent with the current default
+	behavior.
+--
++
+
+Note that `--scope` option only take effect if diff command specify
+`--cached` or `REVISION`.
+
+The behavior of this `--scope` option is experimental and may change
+in the future. See link:technical/sparse-checkout.html [the sparse-checkout
+design document] for more information.
+
+endif::git-diff-tree[]
+endif::git-diff-index[]
+endif::git-diff[]
+
 --stat[=<width>[,<name-width>[,<count>]]]::
 	Generate a diffstat. By default, as much space as necessary
 	will be used for the filename part, and the rest for the graph
diff --git a/builtin/diff-index.c b/builtin/diff-index.c
index aea139b9d8f..9cf37af8990 100644
--- a/builtin/diff-index.c
+++ b/builtin/diff-index.c
@@ -20,6 +20,14 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix)
 	unsigned int option = 0;
 	int i;
 	int result;
+	enum sparse_scope scope;
+
+	struct option sparse_scope_options[] = {
+		OPT_CALLBACK_F(0, "scope", &scope, N_("[sparse|all]"),
+				N_("restrict path scope in sparse specification"),
+				PARSE_OPT_NONEG, diff_opt_sparse_scope),
+		OPT_END()
+	};
 
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage(diff_cache_usage);
@@ -36,6 +44,15 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix)
 	diff_merges_suppress_m_parsing();
 
 	argc = setup_revisions(argc, argv, &rev, NULL);
+
+	argc = parse_options(argc, argv, prefix, sparse_scope_options, NULL,
+			     PARSE_OPT_KEEP_DASHDASH |
+			     PARSE_OPT_KEEP_UNKNOWN_OPT |
+			     PARSE_OPT_KEEP_ARGV0 |
+			     PARSE_OPT_NO_INTERNAL_HELP);
+
+	rev.diffopt.scope = scope;
+
 	for (i = 1; i < argc; i++) {
 		const char *arg = argv[i];
 
@@ -66,9 +83,15 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix)
 			perror("read_cache_preload");
 			return -1;
 		}
-	} else if (read_cache() < 0) {
-		perror("read_cache");
-		return -1;
+	} else {
+		if (read_cache() < 0) {
+			perror("read_cache");
+			return -1;
+		}
+		if (rev.diffopt.scope == SPARSE_SCOPE_SPARSE &&
+		    strcmp(rev.pending.objects[0].name, "HEAD"))
+			diff_collect_changes_index(&rev.diffopt.pathspec,
+						   &rev.diffopt.change_index_files);
 	}
 	result = run_diff_index(&rev, option);
 	result = diff_result_code(&rev.diffopt, result);
diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
index 85e8c81e594..5da13bdb5ca 100644
--- a/builtin/diff-tree.c
+++ b/builtin/diff-tree.c
@@ -114,6 +114,14 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)
 	struct userformat_want w;
 	int read_stdin = 0;
 	int merge_base = 0;
+	enum sparse_scope scope;
+
+	struct option sparse_scope_options[] = {
+		OPT_CALLBACK_F(0, "scope", &scope, N_("[sparse|all]"),
+				N_("restrict path scope in sparse specification"),
+				PARSE_OPT_NONEG, diff_opt_sparse_scope),
+		OPT_END()
+	};
 
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage(diff_tree_usage);
@@ -131,6 +139,13 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)
 	prefix = precompose_argv_prefix(argc, argv, prefix);
 	argc = setup_revisions(argc, argv, opt, &s_r_opt);
 
+	argc = parse_options(argc, argv, prefix, sparse_scope_options, NULL,
+			     PARSE_OPT_KEEP_DASHDASH |
+			     PARSE_OPT_KEEP_UNKNOWN_OPT |
+			     PARSE_OPT_KEEP_ARGV0 |
+			     PARSE_OPT_NO_INTERNAL_HELP);
+	opt->diffopt.scope = scope;
+
 	memset(&w, 0, sizeof(w));
 	userformat_find_requirements(NULL, &w);
 
diff --git a/builtin/diff.c b/builtin/diff.c
index 854d2c5a5c4..e30b3548c78 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -161,9 +161,15 @@ static int builtin_diff_index(struct rev_info *revs,
 			perror("read_cache_preload");
 			return -1;
 		}
-	} else if (read_cache() < 0) {
-		perror("read_cache");
-		return -1;
+	} else {
+		if (read_cache() < 0) {
+			perror("read_cache");
+			return -1;
+		}
+		if (revs->diffopt.scope == SPARSE_SCOPE_SPARSE &&
+		    strcmp(revs->pending.objects[0].name, "HEAD"))
+			diff_collect_changes_index(&revs->diffopt.pathspec,
+						   &revs->diffopt.change_index_files);
 	}
 	return run_diff_index(revs, option);
 }
@@ -388,6 +394,25 @@ static void symdiff_prepare(struct rev_info *rev, struct symdiff *sym)
 	sym->skip = map;
 }
 
+int diff_opt_sparse_scope(const struct option *option,
+				const char *optarg, int unset)
+{
+	enum sparse_scope *scope = option->value;
+
+	BUG_ON_OPT_NEG_NOARG(unset, optarg);
+
+	if (!core_apply_sparse_checkout)
+		return error(_("this git repository don't "
+			       "use sparse-checkout, --scope option cannot be used"));
+	if (!strcmp(optarg, "all"))
+		*scope = SPARSE_SCOPE_ALL;
+	else if (!strcmp(optarg, "sparse"))
+		*scope = SPARSE_SCOPE_SPARSE;
+	else
+		return error(_("invalid --scope value: %s"), optarg);
+	return 0;
+}
+
 int cmd_diff(int argc, const char **argv, const char *prefix)
 {
 	int i;
@@ -399,6 +424,14 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
 	int nongit = 0, no_index = 0;
 	int result = 0;
 	struct symdiff sdiff;
+	enum sparse_scope scope;
+
+	struct option sparse_scope_options[] = {
+		OPT_CALLBACK_F(0, "scope", &scope, N_("[sparse|all]"),
+				N_("restrict path scope in sparse specification"),
+				PARSE_OPT_NONEG, diff_opt_sparse_scope),
+		OPT_END()
+	};
 
 	/*
 	 * We could get N tree-ish in the rev.pending_objects list.
@@ -504,6 +537,13 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
 		diff_setup_done(&rev.diffopt);
 	}
 
+	argc = parse_options(argc, argv, prefix, sparse_scope_options, NULL,
+			     PARSE_OPT_KEEP_DASHDASH |
+			     PARSE_OPT_KEEP_UNKNOWN_OPT |
+			     PARSE_OPT_KEEP_ARGV0 |
+			     PARSE_OPT_NO_INTERNAL_HELP);
+
+	rev.diffopt.scope = scope;
 	rev.diffopt.flags.recursive = 1;
 	rev.diffopt.rotate_to_strict = 1;
 
diff --git a/cache.h b/cache.h
index 26ed03bd6de..e3bb2f3dde0 100644
--- a/cache.h
+++ b/cache.h
@@ -1082,6 +1082,11 @@ extern int core_apply_sparse_checkout;
 extern int core_sparse_checkout_cone;
 extern int sparse_expect_files_outside_of_patterns;
 
+enum sparse_scope {
+	SPARSE_SCOPE_ALL = 0,
+	SPARSE_SCOPE_SPARSE,
+};
+
 /*
  * Returns the boolean value of $GIT_OPTIONAL_LOCKS (or the default value).
  */
diff --git a/diff-lib.c b/diff-lib.c
index 2edea41a234..69770ca62a6 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -445,6 +445,13 @@ static void do_oneway_diff(struct unpack_trees_options *o,
 
 	match_missing = revs->match_missing;
 
+	if (revs->diffopt.scope == SPARSE_SCOPE_SPARSE &&
+	    ((o->index_only && revs->pending.objects[0].name &&
+	      strcmp(revs->pending.objects[0].name, "HEAD") &&
+	      !index_file_in_sparse_specification(idx ? idx : tree, &revs->diffopt.change_index_files)) ||
+	     (!o->index_only && !worktree_file_in_sparse_specification(idx))))
+		return;
+
 	if (cached && idx && ce_stage(idx)) {
 		struct diff_filepair *pair;
 		pair = diff_unmerge(&revs->diffopt, idx->name);
@@ -598,6 +605,43 @@ void diff_get_merge_base(const struct rev_info *revs, struct object_id *mb)
 	free_commit_list(merge_bases);
 }
 
+static void diff_collect_updated_cb(struct diff_queue_struct *q,
+					 struct diff_options *options,
+					 void *data) {
+	int i;
+	struct strset *change_index_files = (struct strset *)data;
+
+	for (i = 0; i < q->nr; i++) {
+		struct diff_filepair *p = q->queue[i];
+
+		strset_add(change_index_files, p->two->path);
+		if (p->status == DIFF_STATUS_RENAMED)
+			strset_add(change_index_files, p->one->path);
+	}
+}
+
+void diff_collect_changes_index(struct pathspec *pathspec, struct strset *change_index_files)
+{
+	struct rev_info rev;
+	struct setup_revision_opt opt;
+
+	repo_init_revisions(the_repository, &rev, NULL);
+	memset(&opt, 0, sizeof(opt));
+	opt.def = "HEAD";
+	setup_revisions(0, NULL, &rev, &opt);
+
+	rev.diffopt.ita_invisible_in_index = 1;
+	rev.diffopt.output_format |= DIFF_FORMAT_CALLBACK;
+	rev.diffopt.format_callback = diff_collect_updated_cb;
+	rev.diffopt.format_callback_data = change_index_files;
+	rev.diffopt.flags.recursive = 1;
+
+	copy_pathspec(&rev.prune_data, pathspec);
+	run_diff_index(&rev, 1);
+	release_revisions(&rev);
+}
+
+
 int run_diff_index(struct rev_info *revs, unsigned int option)
 {
 	struct object_array_entry *ent;
diff --git a/diff.c b/diff.c
index 9f9a92ec9d2..cdbcacd7332 100644
--- a/diff.c
+++ b/diff.c
@@ -4663,6 +4663,7 @@ void repo_diff_setup(struct repository *r, struct diff_options *options)
 	options->color_moved = diff_color_moved_default;
 	options->color_moved_ws_handling = diff_color_moved_ws_default;
 
+	strset_init(&options->change_index_files);
 	prep_parse_options(options);
 }
 
@@ -6511,6 +6512,7 @@ void diff_free(struct diff_options *options)
 	diff_free_ignore_regex(options);
 	clear_pathspec(&options->pathspec);
 	FREE_AND_NULL(options->parseopts);
+	strset_clear(&options->change_index_files);
 }
 
 void diff_flush(struct diff_options *options)
diff --git a/diff.h b/diff.h
index fd33caeb25d..4098a0cb123 100644
--- a/diff.h
+++ b/diff.h
@@ -8,6 +8,7 @@
 #include "pathspec.h"
 #include "object.h"
 #include "oidset.h"
+#include "strmap.h"
 
 /**
  * The diff API is for programs that compare two sets of files (e.g. two trees,
@@ -285,6 +286,9 @@ struct diff_options {
 	/* diff-filter bits */
 	unsigned int filter, filter_not;
 
+	/* diff sparse-checkout scope */
+	enum sparse_scope scope;
+
 	int use_color;
 
 	/* Number of context lines to generate in patch output. */
@@ -397,6 +401,7 @@ struct diff_options {
 	struct option *parseopts;
 	struct strmap *additional_path_headers;
 
+	struct strset change_index_files;
 	int no_free;
 };
 
@@ -696,4 +701,7 @@ void print_stat_summary(FILE *fp, int files,
 			int insertions, int deletions);
 void setup_diff_pager(struct diff_options *);
 
+void diff_collect_changes_index(struct pathspec *pathspec, struct strset *files);
+int diff_opt_sparse_scope(const struct option *option, const char *optarg, int unset);
+
 #endif /* DIFF_H */
diff --git a/dir.c b/dir.c
index d604d1bab98..010e243f24a 100644
--- a/dir.c
+++ b/dir.c
@@ -1503,6 +1503,26 @@ int path_in_cone_mode_sparse_checkout(const char *path,
 	return path_in_sparse_checkout_1(path, istate, 1);
 }
 
+int path_in_sparse_patterns(const char *path) {
+	return path_in_sparse_checkout_1(path, the_repository->index, core_sparse_checkout_cone);
+}
+
+/* Expand sparse-checkout specification (worktree) */
+int worktree_file_in_sparse_specification(const struct cache_entry *worktree_check_ce)
+{
+	return worktree_check_ce && !ce_skip_worktree(worktree_check_ce);
+}
+
+/* Expand sparse-checkout specification (index) */
+int index_file_in_sparse_specification(const struct cache_entry *ce, struct strset *change_index_files)
+{
+	if (!ce->ce_namelen)
+		return 0;
+	if (change_index_files && strset_contains(change_index_files, ce->name))
+		return 1;
+	return path_in_sparse_patterns(ce->name);
+}
+
 static struct path_pattern *last_matching_pattern_from_lists(
 		struct dir_struct *dir, struct index_state *istate,
 		const char *pathname, int pathlen,
diff --git a/dir.h b/dir.h
index 674747d93af..254268bb9a0 100644
--- a/dir.h
+++ b/dir.h
@@ -4,6 +4,7 @@
 #include "cache.h"
 #include "hashmap.h"
 #include "strbuf.h"
+#include "strmap.h"
 
 /**
  * The directory listing API is used to enumerate paths in the work tree,
@@ -401,6 +402,9 @@ int path_in_sparse_checkout(const char *path,
 			    struct index_state *istate);
 int path_in_cone_mode_sparse_checkout(const char *path,
 				      struct index_state *istate);
+int path_in_sparse_patterns(const char *path);
+int index_file_in_sparse_specification(const struct cache_entry *ce, struct strset *change_index_files);
+int worktree_file_in_sparse_specification(const struct cache_entry *worktree_check_ce);
 
 struct dir_entry *dir_add_ignored(struct dir_struct *dir,
 				  struct index_state *istate,
diff --git a/t/t4070-diff-sparse-checkout-scope.sh b/t/t4070-diff-sparse-checkout-scope.sh
new file mode 100755
index 00000000000..b8e3bfbaf5b
--- /dev/null
+++ b/t/t4070-diff-sparse-checkout-scope.sh
@@ -0,0 +1,286 @@
+#!/bin/sh
+
+test_description='diff sparse-checkout scope'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	git init repo &&
+	(
+		cd repo &&
+		mkdir in out1 out2 &&
+		for i in $(test_seq 6)
+		do
+			echo "in $i" >in/"$i" &&
+			echo "out1 $i" >out1/"$i" &&
+			echo "out2 $i" >out2/"$i" || return 1
+		done &&
+		git add in out1 out2 &&
+		git commit -m init &&
+		for i in $(test_seq 6)
+		do
+			echo "in $i" >>in/"$i" &&
+			echo "out1 $i" >>out1/"$i" || return 1
+		done &&
+		git add in out1 &&
+		git commit -m change &&
+		git sparse-checkout set "in"
+	)
+'
+
+reset_sparse_checkout_state() {
+	git -C repo reset --hard HEAD &&
+	git -C repo sparse-checkout reapply
+}
+
+reset_and_change_index() {
+	reset_sparse_checkout_state &&
+	# add new ce
+	oid=$(echo "new thing" | git -C repo hash-object --stdin -w) &&
+	git -C repo update-index --add --cacheinfo 100644 $oid in/7 &&
+	git -C repo update-index --add --cacheinfo 100644 $oid out1/7 &&
+	# rm ce
+	git -C repo update-index --remove in/6 &&
+	git -C repo update-index --remove out1/6 &&
+	# modify ce
+	git -C repo update-index --cacheinfo 100644 $oid out1/5 &&
+	# mv ce1 -> ce2
+	oid=$(git -C repo ls-files --format="%(objectname)" in/4) &&
+	git -C repo update-index --add --cacheinfo 100644 $oid in/8 &&
+	git -C repo update-index --remove in/4 &&
+	oid=$(git -C repo ls-files --format="%(objectname)" out1/4) &&
+	git -C repo update-index --add --cacheinfo 100644 $oid out1/8 &&
+	git -C repo update-index --remove out1/4 &&
+	# chmod ce
+	git -C repo update-index --chmod +x in/3 &&
+	git -C repo update-index --chmod +x out1/3
+}
+
+reset_and_change_worktree() {
+	reset_sparse_checkout_state &&
+	rm -rf repo/out1 repo/out2 &&
+	mkdir repo/out1 repo/out2 &&
+	# add new file
+	echo "in 7" >repo/in/7 &&
+	echo "out1 7" >repo/out1/7 &&
+	git -C repo add --sparse in/7 out1/7 &&
+	# create out old file
+	>repo/out1/6 &&
+	# rm file
+	rm repo/in/6 &&
+	# modify file
+	echo "out1 x" >repo/out1/5 &&
+	# mv file1 -> file2
+	mv repo/in/4 repo/in/3 &&
+	# chmod file
+	chmod +x repo/in/2 &&
+	# add new file, mark skipworktree
+	echo "in 8" >repo/in/8 &&
+	echo "out1 8" >repo/out1/8 &&
+	echo "out2 8" >repo/out2/8 &&
+	git -C repo add --sparse in/8 out1/8 out2/8 &&
+	git -C repo update-index --skip-worktree in/8 &&
+	git -C repo update-index --skip-worktree out1/8 &&
+	git -C repo update-index --skip-worktree out2/8 &&
+	rm repo/in/8 repo/out1/8
+}
+
+# git diff --cached REV
+
+test_expect_success 'git diff --cached --scope=all' '
+	reset_and_change_index &&
+	cat >expected <<-EOF &&
+M	in/1
+M	in/2
+M	in/3
+M	in/4
+M	in/5
+M	in/6
+A	in/7
+A	in/8
+M	out1/1
+M	out1/2
+M	out1/3
+M	out1/5
+D	out1/6
+A	out1/7
+R050	out1/4	out1/8
+	EOF
+	git -C repo diff --name-status --cached --scope=all HEAD~ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'git diff --cached --scope=sparse' '
+	reset_and_change_index &&
+	cat >expected <<-EOF &&
+M	in/1
+M	in/2
+M	in/3
+M	in/4
+M	in/5
+M	in/6
+A	in/7
+A	in/8
+M	out1/3
+M	out1/5
+D	out1/6
+A	out1/7
+R050	out1/4	out1/8
+	EOF
+	git -C repo diff --name-status --cached --scope=sparse HEAD~ >actual &&
+	test_cmp expected actual
+'
+
+# git diff REV
+
+test_expect_success 'git diff REVISION --scope=all' '
+	reset_and_change_worktree &&
+	cat >expected <<-EOF &&
+M	in/1
+M	in/2
+M	in/3
+D	in/4
+M	in/5
+D	in/6
+A	in/7
+M	out1/1
+M	out1/2
+M	out1/3
+M	out1/4
+M	out1/5
+M	out1/6
+A	out1/7
+A	out2/8
+	EOF
+	git -C repo diff --name-status --scope=all HEAD~ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'git diff REVISION --scope=sparse' '
+	reset_and_change_worktree &&
+	cat >expected <<-EOF &&
+M	in/1
+M	in/2
+M	in/3
+D	in/4
+M	in/5
+D	in/6
+A	in/7
+M	out1/5
+M	out1/6
+A	out1/7
+A	out2/8
+	EOF
+	git -C repo diff --name-status --scope=sparse HEAD~ >actual &&
+	test_cmp expected actual
+'
+
+# git diff REV1 REV2
+
+test_expect_success 'git diff two REVISION --scope=all' '
+	reset_sparse_checkout_state &&
+	cat >expected <<-EOF &&
+M	in/1
+M	in/2
+M	in/3
+M	in/4
+M	in/5
+M	in/6
+M	out1/1
+M	out1/2
+M	out1/3
+M	out1/4
+M	out1/5
+M	out1/6
+	EOF
+	git -C repo diff --name-status --scope=all HEAD~ HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'git diff two REVISION --scope=sparse' '
+	reset_sparse_checkout_state &&
+	cat >expected <<-EOF &&
+M	in/1
+M	in/2
+M	in/3
+M	in/4
+M	in/5
+M	in/6
+	EOF
+	git -C repo diff --name-status --scope=sparse HEAD~ HEAD >actual &&
+	test_cmp expected actual
+'
+
+# git diff-index
+
+test_expect_success 'git diff-index --cached --scope=all' '
+	reset_and_change_index &&
+	cat >expected <<-EOF &&
+M	in/1
+M	in/2
+M	in/3
+M	in/4
+M	in/5
+M	in/6
+A	in/7
+A	in/8
+M	out1/1
+M	out1/2
+M	out1/3
+D	out1/4
+M	out1/5
+D	out1/6
+A	out1/7
+A	out1/8
+	EOF
+	git -C repo diff-index --name-status --cached --scope=all HEAD~ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'git diff-index --cached --scope=sparse' '
+	reset_and_change_index &&
+	cat >expected <<-EOF &&
+M	in/1
+M	in/2
+M	in/3
+M	in/4
+M	in/5
+M	in/6
+A	in/7
+A	in/8
+M	out1/3
+D	out1/4
+M	out1/5
+D	out1/6
+A	out1/7
+A	out1/8
+	EOF
+	git -C repo diff-index --name-status --cached --scope=sparse HEAD~ >actual &&
+	test_cmp expected actual
+'
+
+# git diff-tree
+
+test_expect_success 'git diff-tree --scope=all' '
+	reset_sparse_checkout_state &&
+	cat >expected <<-EOF &&
+M	in
+M	out1
+	EOF
+	git -C repo diff-tree --name-status --scope=all HEAD~ HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'git diff-tree --scope=sparse' '
+	reset_sparse_checkout_state &&
+	cat >expected <<-EOF &&
+M	in
+	EOF
+	git -C repo diff-tree --name-status --scope=sparse HEAD~ HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_done
diff --git a/tree-diff.c b/tree-diff.c
index 69031d7cbae..921665d0286 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -5,6 +5,7 @@
 #include "diff.h"
 #include "diffcore.h"
 #include "tree.h"
+#include "dir.h"
 
 /*
  * internal mode marker, saying a tree entry != entry of tp[imin]
@@ -76,6 +77,16 @@ static int tree_entry_pathcmp(struct tree_desc *t1, struct tree_desc *t2)
 static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_diff_path *p)
 {
 	struct combine_diff_parent *p0 = &p->parent[0];
+
+	if (opt->scope == SPARSE_SCOPE_SPARSE) {
+		struct strbuf sb = STRBUF_INIT;
+
+		strbuf_addstr(&sb, p->path);
+		if (S_ISDIR(p->mode) || S_ISDIR(p0->mode))
+			strbuf_addch(&sb, '/');
+		if (!path_in_sparse_patterns(sb.buf))
+			return 0;
+	}
 	if (p->mode && p0->mode) {
 		opt->change(opt, p0->mode, p->mode, &p0->oid, &p->oid,
 			1, 1, p->path, 0, 0);

base-commit: c000d916380bb59db69c78546928eadd076b9c7d
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 0/2] [RFC] diff: introduce scope option
  2022-11-25  2:45 ` [PATCH v2] [RFC] diff: introduce --scope option ZheNing Hu via GitGitGadget
@ 2022-11-29 12:00   ` ZheNing Hu via GitGitGadget
  2022-11-29 12:00     ` [PATCH v3 1/2] [RFC] diff: introduce --scope option ZheNing Hu via GitGitGadget
  2022-11-29 12:00     ` [PATCH v3 2/2] [RPC] grep: " ZheNing Hu via GitGitGadget
  0 siblings, 2 replies; 11+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2022-11-29 12:00 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Ævar Arnfjörð Bjarmason,
	Jeff King, Junio C Hamano, Derrick Stolee, Johannes Schindelin,
	Victoria Dye, Elijah Newren, Emily Shaffer,
	Matheus Tavares Bernardino, Shaoxuan Yuan, Taylor Blau,
	ZheNing Hu

In [1], we discovered that users working on different sparse-checkout
specification may download unnecessary blobs from each other's specification
in collaboration. In [2] Junio suggested that maybe we can restrict some git
command's filespec in sparse-checkout specification to elegantly solve this
problem above. In [3]: Newren and Derrick Stolee prefer to name the option
--scope={sparse, all}.

So this patch is attempt to do this thing on git diff:

v1:

 1. add --restrict option to git diff, which restrict diff filespec in
    sparse-checkout specification. [4]

v2.

 1. rename --restrict to --scope={sparse, all}, support --no-scope.
 2. add config: diff.scope={sparse,all}.

v3.

 1. with the help of newren's review, fix the wrong --scope behavior, its
    previous meaning was misrepresented as sparse patterns, and now it is
    fixed to match sparse specification. [5]
 2. remove wrong diff.scope config.
 3. apply --scope to git diff, git diff-index, git diff-tree.

v4.

 1. create a OPT_SPARSE_SCOPE macro for easier add --scope option to other
    git commands later.
 2. introduce --scope option to "git grep ".

Since I split --scope into a separate option, this option will not be
directly inherited by git commands such as git log, git format-patch, etc.
If necessary, we can add it to git log or other commands in a similar way
later.

Global scope config haven’t implement yet... Since we haven't decided on an
appropriate name for scope config. [6]

Note why I don't add --scope to "git grep --cached", because we can't use ""
with "--cached", whether we use --scope=sparse or not , the resulting set of
files it gets is the same.

[1]:
https://lore.kernel.org/git/CAOLTT8SHo66kGbvWr=+LQ9UVd1NHgqGGEYK2qq6==QgRCgLZqQ@mail.gmail.com/
[2]: https://lore.kernel.org/git/xmqqzgeqw0sy.fsf@gitster.g/ [3]:
https://lore.kernel.org/git/07a25d48-e364-0d9b-6ffa-41a5984eb5db@github.com/
[4]:
https://lore.kernel.org/git/pull.1368.git.1664036052741.gitgitgadget@gmail.com/
[5]:
https://lore.kernel.org/git/CAOLTT8TceM-NpV2_hUCZj2Dx=W30f_9SHW8CcRH-pw32BRd-oA@mail.gmail.com/
[6]:
https://lore.kernel.org/git/CABPp-BGHMsMxP6e7p0HAZA=ugk+GY3XW6_EaTN=HzaLQYAzQYA@mail.gmail.com/

ZheNing Hu (2):
  [RFC] diff: introduce --scope option
  [RPC] grep: introduce --scope option

 Documentation/diff-options.txt   |  33 ++++
 Documentation/git-grep.txt       |  24 +++
 builtin/diff-index.c             |  24 ++-
 builtin/diff-tree.c              |  11 ++
 builtin/diff.c                   |  23 ++-
 builtin/grep.c                   |  10 +
 cache.h                          |   5 +
 diff-lib.c                       |  43 +++++
 diff.c                           |   2 +
 diff.h                           |   7 +
 dir.c                            |  52 ++++++
 dir.h                            |   4 +
 grep.h                           |   2 +
 parse-options.h                  |   7 +
 t/t1090-sparse-checkout-scope.sh | 303 +++++++++++++++++++++++++++++++
 tree-diff.c                      |   7 +
 16 files changed, 551 insertions(+), 6 deletions(-)


base-commit: 815c1e82021edbd99a2c423cf27f28863f28cef3
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1398%2Fadlternative%2Fzh%2Fdiff-scope-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1398/adlternative/zh/diff-scope-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1398

Contributor requested no range-diff. You can review it using these commands:
   git fetch https://github.com/gitgitgadget/git c000d916 1d1b6618
   git range-diff <options> c000d916..471a0691 083e0127..1d1b6618
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 1/2] [RFC] diff: introduce --scope option
  2022-11-29 12:00   ` [PATCH v3 0/2] [RFC] diff: introduce scope option ZheNing Hu via GitGitGadget
@ 2022-11-29 12:00     ` ZheNing Hu via GitGitGadget
  2022-11-29 12:00     ` [PATCH v3 2/2] [RPC] grep: " ZheNing Hu via GitGitGadget
  1 sibling, 0 replies; 11+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2022-11-29 12:00 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Ævar Arnfjörð Bjarmason,
	Jeff King, Junio C Hamano, Derrick Stolee, Johannes Schindelin,
	Victoria Dye, Elijah Newren, Emily Shaffer,
	Matheus Tavares Bernardino, Shaoxuan Yuan, Taylor Blau,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Many of git commands, such as "git grep", "git diff", they
will search the "full-tree" scope of the entire git repository,
which is reasonable under normal circumstances, but if the user
uses sparse checkout in a git monorepo, it's very possible that
he just wants to use files within the sparse specification,
perhaps because:

* He wants to be able to focus on his subprojects, the output
of other subprojects will only interfere with him.

* He's using partial cloning at the same time, and he doesn't
want to be able to execute the above git commands download a
large number of blobs which out of sparse specification, which
is a waste of time and may cause the size of the git repository
to gradually expand.

So we need a way to restrict git commands to the sparse
specification. Implementing "diff --scope" is the first step
in this plan. We are looking for a suitable option to choose:
restrict the path scope of diff to the sparse specification
or keep the full tree scope (default action now). "--scope=sparse",
"--scope=all" are the parameters corresponding to these two
cases.

It is worth noting that "--scope" option only works on diff
commands specify "--cached" or "REVISION", because normal
"git diff" has retrict the scope of diff files to the sparse
specificaiton by default, while "git diff --cached" or
"git diff REVSION" will compare to the commit history, and
"--scope" options can works here to restrict or not.

Add "--scope" option to git "diff-index" and "git diff-tree"
too, because they also meet the above: specify "--cached",
or "REVISION". Meanwhile, "git diff-no-index", "git diff-files"
don't have this option.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 Documentation/diff-options.txt   |  33 ++++
 builtin/diff-index.c             |  24 ++-
 builtin/diff-tree.c              |  11 ++
 builtin/diff.c                   |  23 ++-
 cache.h                          |   5 +
 diff-lib.c                       |  43 +++++
 diff.c                           |   2 +
 diff.h                           |   7 +
 dir.c                            |  52 ++++++
 dir.h                            |   4 +
 parse-options.h                  |   7 +
 t/t1090-sparse-checkout-scope.sh | 276 +++++++++++++++++++++++++++++++
 tree-diff.c                      |   7 +
 13 files changed, 488 insertions(+), 6 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 3674ac48e92..778b22ae982 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -195,6 +195,39 @@ For instance, if you configured the `diff.algorithm` variable to a
 non-default value and want to use the default one, then you
 have to use `--diff-algorithm=default` option.
 
+ifdef::git-diff[]
+ifdef::git-diff-index[]
+ifdef::git-diff-tree[]
+
+--scope=[sparse|all]::
+	Restrict or not restrict diff path scope in sparse specification.
+	The variants are as follows:
+
++
+--
+`sparse`;;
+	When using diff to compare commit history, restrict the
+	scope of file path comparisons to the sparse specification.
+	See sparse specification in link:technical/sparse-checkout.html
+	[the sparse-checkout design document] for more information.
+`all`;;
+	When using diff to compare commit history, the file comparison
+	scope is full-tree. This is consistent with the current default
+	behavior.
+--
++
+
+Note that `--scope` option only take effect if diff command specify
+`--cached` or `REVISION`.
+
+The behavior of this `--scope` option is experimental and may change
+in the future. See link:technical/sparse-checkout.html [the sparse-checkout
+design document] for more information.
+
+endif::git-diff-tree[]
+endif::git-diff-index[]
+endif::git-diff[]
+
 --stat[=<width>[,<name-width>[,<count>]]]::
 	Generate a diffstat. By default, as much space as necessary
 	will be used for the filename part, and the rest for the graph
diff --git a/builtin/diff-index.c b/builtin/diff-index.c
index 35dc9b23eef..27a510c30da 100644
--- a/builtin/diff-index.c
+++ b/builtin/diff-index.c
@@ -20,6 +20,11 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix)
 	int i;
 	int result;
 
+	struct option sparse_scope_options[] = {
+		OPT_SPARSE_SCOPE(&rev.diffopt.scope),
+		OPT_END()
+	};
+
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage(diff_cache_usage);
 
@@ -35,6 +40,13 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix)
 	diff_merges_suppress_m_parsing();
 
 	argc = setup_revisions(argc, argv, &rev, NULL);
+
+	argc = parse_options(argc, argv, prefix, sparse_scope_options, NULL,
+			     PARSE_OPT_KEEP_DASHDASH |
+			     PARSE_OPT_KEEP_UNKNOWN_OPT |
+			     PARSE_OPT_KEEP_ARGV0 |
+			     PARSE_OPT_NO_INTERNAL_HELP);
+
 	for (i = 1; i < argc; i++) {
 		const char *arg = argv[i];
 
@@ -65,9 +77,15 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix)
 			perror("repo_read_index_preload");
 			return -1;
 		}
-	} else if (repo_read_index(the_repository) < 0) {
-		perror("repo_read_index");
-		return -1;
+	} else {
+		if (repo_read_index(the_repository) < 0) {
+			perror("read_cache");
+			return -1;
+		}
+		if (rev.diffopt.scope == SPARSE_SCOPE_SPARSE &&
+		    strcmp(rev.pending.objects[0].name, "HEAD"))
+			diff_collect_changes_index(&rev.diffopt.pathspec,
+						   &rev.diffopt.change_index_files);
 	}
 	result = run_diff_index(&rev, option);
 	result = diff_result_code(&rev.diffopt, result);
diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
index 25b853b85ca..31cac0c2614 100644
--- a/builtin/diff-tree.c
+++ b/builtin/diff-tree.c
@@ -115,6 +115,11 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)
 	int read_stdin = 0;
 	int merge_base = 0;
 
+	struct option sparse_scope_options[] = {
+		OPT_SPARSE_SCOPE(&opt->diffopt.scope),
+		OPT_END()
+	};
+
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage(diff_tree_usage);
 
@@ -131,6 +136,12 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)
 	prefix = precompose_argv_prefix(argc, argv, prefix);
 	argc = setup_revisions(argc, argv, opt, &s_r_opt);
 
+	argc = parse_options(argc, argv, prefix, sparse_scope_options, NULL,
+			     PARSE_OPT_KEEP_DASHDASH |
+			     PARSE_OPT_KEEP_UNKNOWN_OPT |
+			     PARSE_OPT_KEEP_ARGV0 |
+			     PARSE_OPT_NO_INTERNAL_HELP);
+
 	memset(&w, 0, sizeof(w));
 	userformat_find_requirements(NULL, &w);
 
diff --git a/builtin/diff.c b/builtin/diff.c
index 163f2c6a874..8c2a847ec94 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -162,9 +162,15 @@ static int builtin_diff_index(struct rev_info *revs,
 			perror("repo_read_index_preload");
 			return -1;
 		}
-	} else if (repo_read_index(the_repository) < 0) {
-		perror("repo_read_cache");
-		return -1;
+	} else {
+		if (repo_read_index(the_repository) < 0) {
+			perror("read_cache");
+			return -1;
+		}
+		if (revs->diffopt.scope == SPARSE_SCOPE_SPARSE &&
+		    strcmp(revs->pending.objects[0].name, "HEAD"))
+			diff_collect_changes_index(&revs->diffopt.pathspec,
+						   &revs->diffopt.change_index_files);
 	}
 	return run_diff_index(revs, option);
 }
@@ -403,6 +409,11 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
 	int result = 0;
 	struct symdiff sdiff;
 
+	struct option sparse_scope_options[] = {
+		OPT_SPARSE_SCOPE(&rev.diffopt.scope),
+		OPT_END()
+	};
+
 	/*
 	 * We could get N tree-ish in the rev.pending_objects list.
 	 * Also there could be M blobs there, and P pathspecs. --cached may
@@ -507,6 +518,12 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
 		diff_setup_done(&rev.diffopt);
 	}
 
+	argc = parse_options(argc, argv, prefix, sparse_scope_options, NULL,
+			     PARSE_OPT_KEEP_DASHDASH |
+			     PARSE_OPT_KEEP_UNKNOWN_OPT |
+			     PARSE_OPT_KEEP_ARGV0 |
+			     PARSE_OPT_NO_INTERNAL_HELP);
+
 	rev.diffopt.flags.recursive = 1;
 	rev.diffopt.rotate_to_strict = 1;
 
diff --git a/cache.h b/cache.h
index 07d40b0964b..2731656573e 100644
--- a/cache.h
+++ b/cache.h
@@ -1058,6 +1058,11 @@ extern int core_apply_sparse_checkout;
 extern int core_sparse_checkout_cone;
 extern int sparse_expect_files_outside_of_patterns;
 
+enum sparse_scope {
+	SPARSE_SCOPE_ALL = 0,
+	SPARSE_SCOPE_SPARSE,
+};
+
 /*
  * Returns the boolean value of $GIT_OPTIONAL_LOCKS (or the default value).
  */
diff --git a/diff-lib.c b/diff-lib.c
index 2edea41a234..d660ecf0c62 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -445,6 +445,13 @@ static void do_oneway_diff(struct unpack_trees_options *o,
 
 	match_missing = revs->match_missing;
 
+	if (revs->diffopt.scope == SPARSE_SCOPE_SPARSE &&
+	    ((o->index_only && revs->pending.objects[0].name &&
+	      strcmp(revs->pending.objects[0].name, "HEAD") &&
+	      !index_file_in_sparse_specification(idx ? idx : tree, &revs->diffopt.change_index_files)) ||
+	     (!o->index_only && !worktree_file_in_sparse_specification(idx))))
+		return;
+
 	if (cached && idx && ce_stage(idx)) {
 		struct diff_filepair *pair;
 		pair = diff_unmerge(&revs->diffopt, idx->name);
@@ -598,6 +605,42 @@ void diff_get_merge_base(const struct rev_info *revs, struct object_id *mb)
 	free_commit_list(merge_bases);
 }
 
+static void diff_collect_updated_cb(struct diff_queue_struct *q,
+					 struct diff_options *options,
+					 void *data) {
+	int i;
+	struct strset *change_index_files = (struct strset *)data;
+
+	for (i = 0; i < q->nr; i++) {
+		struct diff_filepair *p = q->queue[i];
+
+		strset_add(change_index_files, p->two->path);
+		if (p->status == DIFF_STATUS_RENAMED)
+			strset_add(change_index_files, p->one->path);
+	}
+}
+
+void diff_collect_changes_index(struct pathspec *pathspec, struct strset *change_index_files)
+{
+	struct rev_info rev;
+	struct setup_revision_opt opt;
+
+	repo_init_revisions(the_repository, &rev, NULL);
+	memset(&opt, 0, sizeof(opt));
+	opt.def = "HEAD";
+	setup_revisions(0, NULL, &rev, &opt);
+
+	rev.diffopt.ita_invisible_in_index = 1;
+	rev.diffopt.output_format |= DIFF_FORMAT_CALLBACK;
+	rev.diffopt.format_callback = diff_collect_updated_cb;
+	rev.diffopt.format_callback_data = change_index_files;
+	rev.diffopt.flags.recursive = 1;
+
+	copy_pathspec(&rev.prune_data, pathspec);
+	run_diff_index(&rev, 1);
+	release_revisions(&rev);
+}
+
 int run_diff_index(struct rev_info *revs, unsigned int option)
 {
 	struct object_array_entry *ent;
diff --git a/diff.c b/diff.c
index 1054a4b7329..c719e9779a9 100644
--- a/diff.c
+++ b/diff.c
@@ -4663,6 +4663,7 @@ void repo_diff_setup(struct repository *r, struct diff_options *options)
 	options->color_moved = diff_color_moved_default;
 	options->color_moved_ws_handling = diff_color_moved_ws_default;
 
+	strset_init(&options->change_index_files);
 	prep_parse_options(options);
 }
 
@@ -6514,6 +6515,7 @@ void diff_free(struct diff_options *options)
 	diff_free_ignore_regex(options);
 	clear_pathspec(&options->pathspec);
 	FREE_AND_NULL(options->parseopts);
+	strset_clear(&options->change_index_files);
 }
 
 void diff_flush(struct diff_options *options)
diff --git a/diff.h b/diff.h
index fd33caeb25d..31b255744db 100644
--- a/diff.h
+++ b/diff.h
@@ -8,6 +8,7 @@
 #include "pathspec.h"
 #include "object.h"
 #include "oidset.h"
+#include "strmap.h"
 
 /**
  * The diff API is for programs that compare two sets of files (e.g. two trees,
@@ -285,6 +286,9 @@ struct diff_options {
 	/* diff-filter bits */
 	unsigned int filter, filter_not;
 
+	/* diff sparse-checkout scope */
+	enum sparse_scope scope;
+
 	int use_color;
 
 	/* Number of context lines to generate in patch output. */
@@ -397,6 +401,7 @@ struct diff_options {
 	struct option *parseopts;
 	struct strmap *additional_path_headers;
 
+	struct strset change_index_files;
 	int no_free;
 };
 
@@ -696,4 +701,6 @@ void print_stat_summary(FILE *fp, int files,
 			int insertions, int deletions);
 void setup_diff_pager(struct diff_options *);
 
+void diff_collect_changes_index(struct pathspec *pathspec, struct strset *files);
+
 #endif /* DIFF_H */
diff --git a/dir.c b/dir.c
index d604d1bab98..5d47d5abf1a 100644
--- a/dir.c
+++ b/dir.c
@@ -18,6 +18,7 @@
 #include "ewah/ewok.h"
 #include "fsmonitor.h"
 #include "submodule-config.h"
+#include "parse-options.h"
 
 /*
  * Tells read_directory_recursive how a file or directory should be treated.
@@ -1503,6 +1504,57 @@ int path_in_cone_mode_sparse_checkout(const char *path,
 	return path_in_sparse_checkout_1(path, istate, 1);
 }
 
+int path_in_sparse_patterns(const char *path, int is_dir) {
+	struct strbuf sb = STRBUF_INIT;
+
+	strbuf_addstr(&sb, path);
+	if (!sb.len)
+		return 0;
+	if (is_dir && sb.buf[sb.len - 1] != '/')
+		strbuf_addch(&sb, '/');
+	if (!path_in_sparse_checkout_1(sb.buf,
+				       the_repository->index,
+				       core_sparse_checkout_cone))
+		return 0;
+	strbuf_release(&sb);
+	return 1;
+}
+
+/* Expand sparse-checkout specification (worktree) */
+int worktree_file_in_sparse_specification(const struct cache_entry *worktree_check_ce)
+{
+	return worktree_check_ce && !ce_skip_worktree(worktree_check_ce);
+}
+
+/* Expand sparse-checkout specification (index) */
+int index_file_in_sparse_specification(const struct cache_entry *ce, struct strset *change_index_files)
+{
+	if (!ce->ce_namelen)
+		return 0;
+	if (change_index_files && strset_contains(change_index_files, ce->name))
+		return 1;
+	return path_in_sparse_patterns(ce->name, 0);
+}
+
+int opt_sparse_scope(const struct option *option,
+				const char *optarg, int unset)
+{
+	enum sparse_scope *scope = option->value;
+
+	BUG_ON_OPT_NEG_NOARG(unset, optarg);
+
+	if (!core_apply_sparse_checkout)
+		return error(_("this git repository don't "
+			       "use sparse-checkout, --scope option cannot be used"));
+	if (!strcmp(optarg, "all"))
+		*scope = SPARSE_SCOPE_ALL;
+	else if (!strcmp(optarg, "sparse"))
+		*scope = SPARSE_SCOPE_SPARSE;
+	else
+		return error(_("invalid --scope value: %s"), optarg);
+	return 0;
+}
+
 static struct path_pattern *last_matching_pattern_from_lists(
 		struct dir_struct *dir, struct index_state *istate,
 		const char *pathname, int pathlen,
diff --git a/dir.h b/dir.h
index 674747d93af..14c1bac6e20 100644
--- a/dir.h
+++ b/dir.h
@@ -4,6 +4,7 @@
 #include "cache.h"
 #include "hashmap.h"
 #include "strbuf.h"
+#include "strmap.h"
 
 /**
  * The directory listing API is used to enumerate paths in the work tree,
@@ -401,6 +402,9 @@ int path_in_sparse_checkout(const char *path,
 			    struct index_state *istate);
 int path_in_cone_mode_sparse_checkout(const char *path,
 				      struct index_state *istate);
+int path_in_sparse_patterns(const char *path, int is_dir);
+int index_file_in_sparse_specification(const struct cache_entry *ce, struct strset *change_index_files);
+int worktree_file_in_sparse_specification(const struct cache_entry *worktree_check_ce);
 
 struct dir_entry *dir_add_ignored(struct dir_struct *dir,
 				  struct index_state *istate,
diff --git a/parse-options.h b/parse-options.h
index b6ef86e0d15..37ca2714f87 100644
--- a/parse-options.h
+++ b/parse-options.h
@@ -356,6 +356,13 @@ int parse_opt_passthru_argv(const struct option *, const char *, int);
 /* value is enum branch_track* */
 int parse_opt_tracking_mode(const struct option *, const char *, int);
 
+int opt_sparse_scope(const struct option *option,
+				const char *optarg, int unset);
+
+#define OPT_SPARSE_SCOPE(var) OPT_CALLBACK_F(0, "scope", (var), N_("[sparse|all]"), \
+				N_("restrict path scope in sparse specification"), \
+				PARSE_OPT_NONEG, opt_sparse_scope)
+
 #define OPT__VERBOSE(var, h)  OPT_COUNTUP('v', "verbose", (var), (h))
 #define OPT__QUIET(var, h)    OPT_COUNTUP('q', "quiet",   (var), (h))
 #define OPT__VERBOSITY(var) \
diff --git a/t/t1090-sparse-checkout-scope.sh b/t/t1090-sparse-checkout-scope.sh
index 3a14218b245..e6ec8e8c1e4 100755
--- a/t/t1090-sparse-checkout-scope.sh
+++ b/t/t1090-sparse-checkout-scope.sh
@@ -106,4 +106,280 @@ test_expect_success 'in partial clone, sparse checkout only fetches needed blobs
 	test_cmp expect actual
 '
 
+test_expect_success 'setup two' '
+	git init repo &&
+	(
+		cd repo &&
+		mkdir in out1 out2 &&
+		for i in $(test_seq 6)
+		do
+			echo "in $i" >in/"$i" &&
+			echo "out1 $i" >out1/"$i" &&
+			echo "out2 $i" >out2/"$i" || return 1
+		done &&
+		git add in out1 out2 &&
+		git commit -m init &&
+		for i in $(test_seq 6)
+		do
+			echo "in $i" >>in/"$i" &&
+			echo "out1 $i" >>out1/"$i" || return 1
+		done &&
+		git add in out1 &&
+		git commit -m change &&
+		git sparse-checkout set "in"
+	)
+'
+
+reset_sparse_checkout_state() {
+	git -C repo reset --hard HEAD &&
+	git -C repo sparse-checkout reapply
+}
+
+reset_and_change_index() {
+	reset_sparse_checkout_state &&
+	# add new ce
+	oid=$(echo "new thing" | git -C repo hash-object --stdin -w) &&
+	git -C repo update-index --add --cacheinfo 100644 $oid in/7 &&
+	git -C repo update-index --add --cacheinfo 100644 $oid out1/7 &&
+	# rm ce
+	git -C repo update-index --remove in/6 &&
+	git -C repo update-index --remove out1/6 &&
+	# modify ce
+	git -C repo update-index --cacheinfo 100644 $oid out1/5 &&
+	# mv ce1 -> ce2
+	oid=$(git -C repo ls-files --format="%(objectname)" in/4) &&
+	git -C repo update-index --add --cacheinfo 100644 $oid in/8 &&
+	git -C repo update-index --remove in/4 &&
+	oid=$(git -C repo ls-files --format="%(objectname)" out1/4) &&
+	git -C repo update-index --add --cacheinfo 100644 $oid out1/8 &&
+	git -C repo update-index --remove out1/4 &&
+	# chmod ce
+	git -C repo update-index --chmod +x in/3 &&
+	git -C repo update-index --chmod +x out1/3
+}
+
+reset_and_change_worktree() {
+	reset_sparse_checkout_state &&
+	rm -rf repo/out1 repo/out2 &&
+	mkdir repo/out1 repo/out2 &&
+	# add new file
+	echo "in 7" >repo/in/7 &&
+	echo "out1 7" >repo/out1/7 &&
+	git -C repo add --sparse in/7 out1/7 &&
+	# create out old file
+	>repo/out1/6 &&
+	# rm file
+	rm repo/in/6 &&
+	# modify file
+	echo "out1 x" >repo/out1/5 &&
+	# mv file1 -> file2
+	mv repo/in/4 repo/in/3 &&
+	# chmod file
+	chmod +x repo/in/2 &&
+	# add new file, mark skipworktree
+	echo "in 8" >repo/in/8 &&
+	echo "out1 8" >repo/out1/8 &&
+	echo "out2 8" >repo/out2/8 &&
+	git -C repo add --sparse in/8 out1/8 out2/8 &&
+	git -C repo update-index --skip-worktree in/8 &&
+	git -C repo update-index --skip-worktree out1/8 &&
+	git -C repo update-index --skip-worktree out2/8 &&
+	rm repo/in/8 repo/out1/8
+}
+
+# git diff --cached REV
+
+test_expect_success 'git diff --cached --scope=all' '
+	reset_and_change_index &&
+	cat >expected <<-EOF &&
+M	in/1
+M	in/2
+M	in/3
+M	in/4
+M	in/5
+M	in/6
+A	in/7
+A	in/8
+M	out1/1
+M	out1/2
+M	out1/3
+M	out1/5
+D	out1/6
+A	out1/7
+R050	out1/4	out1/8
+	EOF
+	git -C repo diff --name-status --cached --scope=all HEAD~ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'git diff --cached --scope=sparse' '
+	reset_and_change_index &&
+	cat >expected <<-EOF &&
+M	in/1
+M	in/2
+M	in/3
+M	in/4
+M	in/5
+M	in/6
+A	in/7
+A	in/8
+M	out1/3
+M	out1/5
+D	out1/6
+A	out1/7
+R050	out1/4	out1/8
+	EOF
+	git -C repo diff --name-status --cached --scope=sparse HEAD~ >actual &&
+	test_cmp expected actual
+'
+
+# git diff REV
+
+test_expect_success 'git diff REVISION --scope=all' '
+	reset_and_change_worktree &&
+	cat >expected <<-EOF &&
+M	in/1
+M	in/2
+M	in/3
+D	in/4
+M	in/5
+D	in/6
+A	in/7
+M	out1/1
+M	out1/2
+M	out1/3
+M	out1/4
+M	out1/5
+M	out1/6
+A	out1/7
+A	out2/8
+	EOF
+	git -C repo diff --name-status --scope=all HEAD~ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'git diff REVISION --scope=sparse' '
+	reset_and_change_worktree &&
+	cat >expected <<-EOF &&
+M	in/1
+M	in/2
+M	in/3
+D	in/4
+M	in/5
+D	in/6
+A	in/7
+M	out1/5
+M	out1/6
+A	out1/7
+A	out2/8
+	EOF
+	git -C repo diff --name-status --scope=sparse HEAD~ >actual &&
+	test_cmp expected actual
+'
+
+# git diff REV1 REV2
+
+test_expect_success 'git diff two REVISION --scope=all' '
+	reset_sparse_checkout_state &&
+	cat >expected <<-EOF &&
+M	in/1
+M	in/2
+M	in/3
+M	in/4
+M	in/5
+M	in/6
+M	out1/1
+M	out1/2
+M	out1/3
+M	out1/4
+M	out1/5
+M	out1/6
+	EOF
+	git -C repo diff --name-status --scope=all HEAD~ HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'git diff two REVISION --scope=sparse' '
+	reset_sparse_checkout_state &&
+	cat >expected <<-EOF &&
+M	in/1
+M	in/2
+M	in/3
+M	in/4
+M	in/5
+M	in/6
+	EOF
+	git -C repo diff --name-status --scope=sparse HEAD~ HEAD >actual &&
+	test_cmp expected actual
+'
+
+# git diff-index
+
+test_expect_success 'git diff-index --cached --scope=all' '
+	reset_and_change_index &&
+	cat >expected <<-EOF &&
+M	in/1
+M	in/2
+M	in/3
+M	in/4
+M	in/5
+M	in/6
+A	in/7
+A	in/8
+M	out1/1
+M	out1/2
+M	out1/3
+D	out1/4
+M	out1/5
+D	out1/6
+A	out1/7
+A	out1/8
+	EOF
+	git -C repo diff-index --name-status --cached --scope=all HEAD~ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'git diff-index --cached --scope=sparse' '
+	reset_and_change_index &&
+	cat >expected <<-EOF &&
+M	in/1
+M	in/2
+M	in/3
+M	in/4
+M	in/5
+M	in/6
+A	in/7
+A	in/8
+M	out1/3
+D	out1/4
+M	out1/5
+D	out1/6
+A	out1/7
+A	out1/8
+	EOF
+	git -C repo diff-index --name-status --cached --scope=sparse HEAD~ >actual &&
+	test_cmp expected actual
+'
+
+# git diff-tree
+
+test_expect_success 'git diff-tree --scope=all' '
+	reset_sparse_checkout_state &&
+	cat >expected <<-EOF &&
+M	in
+M	out1
+	EOF
+	git -C repo diff-tree --name-status --scope=all HEAD~ HEAD >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'git diff-tree --scope=sparse' '
+	reset_sparse_checkout_state &&
+	cat >expected <<-EOF &&
+M	in
+	EOF
+	git -C repo diff-tree --name-status --scope=sparse HEAD~ HEAD >actual &&
+	test_cmp expected actual
+'
+
 test_done
diff --git a/tree-diff.c b/tree-diff.c
index 69031d7cbae..72ca0ab38bc 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -5,6 +5,7 @@
 #include "diff.h"
 #include "diffcore.h"
 #include "tree.h"
+#include "dir.h"
 
 /*
  * internal mode marker, saying a tree entry != entry of tp[imin]
@@ -76,6 +77,12 @@ static int tree_entry_pathcmp(struct tree_desc *t1, struct tree_desc *t2)
 static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_diff_path *p)
 {
 	struct combine_diff_parent *p0 = &p->parent[0];
+
+	if (opt->scope == SPARSE_SCOPE_SPARSE &&
+	    !path_in_sparse_patterns(p->path,
+				     S_ISDIR(p->mode) ||
+				     S_ISDIR(p0->mode)))
+		return 0;
 	if (p->mode && p0->mode) {
 		opt->change(opt, p0->mode, p->mode, &p0->oid, &p->oid,
 			1, 1, p->path, 0, 0);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 2/2] [RPC] grep: introduce --scope option
  2022-11-29 12:00   ` [PATCH v3 0/2] [RFC] diff: introduce scope option ZheNing Hu via GitGitGadget
  2022-11-29 12:00     ` [PATCH v3 1/2] [RFC] diff: introduce --scope option ZheNing Hu via GitGitGadget
@ 2022-11-29 12:00     ` ZheNing Hu via GitGitGadget
  1 sibling, 0 replies; 11+ messages in thread
From: ZheNing Hu via GitGitGadget @ 2022-11-29 12:00 UTC (permalink / raw)
  To: git
  Cc: Christian Couder, Ævar Arnfjörð Bjarmason,
	Jeff King, Junio C Hamano, Derrick Stolee, Johannes Schindelin,
	Victoria Dye, Elijah Newren, Emily Shaffer,
	Matheus Tavares Bernardino, Shaoxuan Yuan, Taylor Blau,
	ZheNing Hu, ZheNing Hu

From: ZheNing Hu <adlternative@gmail.com>

Because we may want to restrict grep's file scope
to sparse specification, Apply the --scope option
we implemented in diff to grep as well.

`--scope=sparse` mean that the search file scope
restrict to sparse specification when we grep
something in commit history, and `--scope=all`
mean that the search file scope will be full-tree.

Note that `--scope` option only oly takes effect
when "git grep <tree>" is specified.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
---
 Documentation/git-grep.txt       | 24 ++++++++++++++++++++++++
 builtin/grep.c                   | 10 ++++++++++
 grep.h                           |  2 ++
 t/t1090-sparse-checkout-scope.sh | 27 +++++++++++++++++++++++++++
 4 files changed, 63 insertions(+)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index dabdbe8471d..b556f657306 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -28,6 +28,7 @@ SYNOPSIS
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
 	   [--recurse-submodules] [--parent-basename <basename>]
+	   [--scope=(sparse|all)]
 	   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...]
 	   [--] [<pathspec>...]
 
@@ -296,6 +297,29 @@ question doesn't support them.
 	Do not output matched lines; instead, exit with status 0 when
 	there is a match and with non-zero status when there isn't.
 
+--scope=(sparse|all)::
+	Restrict or not restrict grep path scope in sparse specification.
+	The variants are as follows:
+
++
+--
+`sparse`;;
+	When grep in commit history, restrict the scope of file path
+	to the sparse specification. See sparse specification in
+	link:technical/sparse-checkout.html [the sparse-checkout design
+	document] for more information.
+`all`;;
+	When grep in commit history, the file path scope is full-tree.
+	This is consistent with the current default behavior.
+--
++
+
+Note that `--scope` option only take effect if git command specify `<tree>`.
+
+The behavior of this `--scope` option is experimental and may change
+in the future. See link:technical/sparse-checkout.html [the sparse-checkout
+design document] for more information.
+
 <tree>...::
 	Instead of searching tracked files in the working tree, search
 	blobs in the given trees.
diff --git a/builtin/grep.c b/builtin/grep.c
index f7821c5fbba..8f7944fb924 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -640,6 +640,15 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 		}
 
 		strbuf_add(base, entry.path, te_len);
+		if (opt->scope == SPARSE_SCOPE_SPARSE &&
+			base->len != tn_len &&
+			!path_in_sparse_patterns(base->buf + tn_len,
+					S_ISDIR(entry.mode) ||
+					S_ISGITLINK(entry.mode))) {
+			strbuf_setlen(base, old_baselen);
+			continue;
+		}
+
 
 		if (S_ISREG(entry.mode)) {
 			hit |= grep_oid(opt, &entry.oid, base->buf, tn_len,
@@ -999,6 +1008,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			   PARSE_OPT_NOCOMPLETE),
 		OPT_INTEGER('m', "max-count", &opt.max_count,
 			N_("maximum number of results per file")),
+		OPT_SPARSE_SCOPE(&opt.scope),
 		OPT_END()
 	};
 	grep_prefix = prefix;
diff --git a/grep.h b/grep.h
index 6075f997e68..05010d4b166 100644
--- a/grep.h
+++ b/grep.h
@@ -22,6 +22,7 @@ typedef int pcre2_general_context;
 #endif
 #include "thread-utils.h"
 #include "userdiff.h"
+#include "cache.h"
 
 struct repository;
 
@@ -175,6 +176,7 @@ struct grep_opt {
 
 	void (*output)(struct grep_opt *opt, const void *data, size_t size);
 	void *output_priv;
+	enum sparse_scope scope;
 };
 
 #define GREP_OPT_INIT { \
diff --git a/t/t1090-sparse-checkout-scope.sh b/t/t1090-sparse-checkout-scope.sh
index e6ec8e8c1e4..c8f68fc7afa 100755
--- a/t/t1090-sparse-checkout-scope.sh
+++ b/t/t1090-sparse-checkout-scope.sh
@@ -382,4 +382,31 @@ M	in
 	test_cmp expected actual
 '
 
+# git grep TREE
+
+test_expect_success 'git grep --scope=all' '
+	reset_sparse_checkout_state &&
+	cat >expected <<-EOF &&
+HEAD~:in/1
+HEAD~:out1/1
+HEAD~:out1/2
+HEAD~:out1/3
+HEAD~:out1/4
+HEAD~:out1/5
+HEAD~:out1/6
+HEAD~:out2/1
+	EOF
+	git -C repo grep --name-only --scope=all 1 HEAD~ >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'git grep --scope=sparse' '
+	reset_sparse_checkout_state &&
+	cat >expected <<-EOF &&
+HEAD~:in/1
+	EOF
+	git -C repo grep --name-only --scope=sparse 1 HEAD~ >actual &&
+	test_cmp expected actual
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-11-29 12:02 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-31  4:11 [PATCH] [RFC] diff: introduce scope option ZheNing Hu via GitGitGadget
2022-11-01  1:34 ` Taylor Blau
2022-11-01  2:13   ` ZheNing Hu
2022-11-01  5:18 ` Elijah Newren
2022-11-06  2:11   ` ZheNing Hu
2022-11-06  6:58     ` Elijah Newren
2022-11-14  9:08       ` ZheNing Hu
2022-11-25  2:45 ` [PATCH v2] [RFC] diff: introduce --scope option ZheNing Hu via GitGitGadget
2022-11-29 12:00   ` [PATCH v3 0/2] [RFC] diff: introduce scope option ZheNing Hu via GitGitGadget
2022-11-29 12:00     ` [PATCH v3 1/2] [RFC] diff: introduce --scope option ZheNing Hu via GitGitGadget
2022-11-29 12:00     ` [PATCH v3 2/2] [RPC] grep: " ZheNing Hu via GitGitGadget

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).